ICMPC14
14th Biennial Meeting
International Conference on Music Perception and Cognition

Proceedings of the 14th International Conference on Music Perception and Cognition

July 5–9, 2016
Hyatt Regency Hotel
San Francisco, California
Proceedings of the 14th International Conference on Music Perception and Cognition. ISBN 1 876346 65 5,
Copyright © 2016 ICMPC14. Copyright of the content of an individual paper is held by the first-named (primary)
author of that paper. All rights reserved. No paper from this proceedings may be reproduced or transmitted in any
form or by any means, electronic or mechanical, including photocopying, recording, or by any information retrieval
systems, without permission in writing from the paper’s primary author. No other part of this proceedings may be
reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording,
or by any information retrieval systems, without permission in writing from ICMPC14 (icmpc14@icmpc.org). For
technical support please contact Causal Productions (info@causalproductions.com).
SPONSORS
WELCOME from ICMPC14

We are pleased you could join us for the 14th International Conference on Music Perception and Cognition (ICMPC14) and we welcome you to San Francisco. It has been a distinct privilege for us to host this event and we look forward to your involvement.

We believe the historical success of ICMPC stems from the rich diversity of the attendees, who despite differences in geographical location, educational background, and professional focus, come together biennially as a commitment to advance this truly interdisciplinary field of music perception and cognition. This year, ICMPC14 will bring together over 500 attendees from 32 different countries who boast a myriad of specializations, including psychologists, educators, theorists, therapists, performers, computer scientists, ethnomusicologists, and neuroscientists. And what better place to celebrate this diversity than San Francisco – a city renowned for its melding of cultures and ideas. As such, ICMPC14 is poised to continue the great tradition of blending world-class interdisciplinary research presentations with stimulating social opportunities, both inside and outside the conference venue.

On behalf of the organizing committee, the conference advisory board, and the conference staff and volunteers, we hope you enjoy your time at ICMPC14 and that you feel at home here in San Francisco. Please do not hesitate to ask if there is anything we can do to enhance your conference experience.

Sincerely,
Theodore Zanto
ICMPC14 Conference Chair
University of California San Francisco
WELCOME from SMPC

I am happy to welcome you to the 14th biennial International Conference on Music Perception and Cognition, hosted this year in North America and sponsored in part by the Society for Music Perception and Cognition. SMPC is an organization of scientists and scholars conducting research in music perception and cognition, broadly construed. Our membership, which numbers over 500, represents a broad range of disciplines, including music theory, psychology, psychophysics, linguistics, neurology, neurophysiology, ethology, ethnomusicology, artificial intelligence, computer technology, physics, and engineering, and we are witnessing an unprecedented proliferation of research labs and graduate programs dedicated to the scientific study of music. We are historically a North American society; however, we have members around the globe, including Europe, Asia, and Australia, and we are proud to participate in this year's meeting. To learn more about SMPC, please visit our website, www.musicperception.org, where you can read recent news, learn about upcoming events, find information about our laboratories and graduate programs, and watch videos and podcasts that feature the latest research in music perception and cognition.

I wish to express my sincere thanks to Dr. Theodore Zanto and all the organizers for planning and coordinating this year's meeting in San Francisco. Conferences like ICMPC are essential to the dissemination and coordination of research in our growing field, and Dr. Zanto has done a spectacular job of putting together this meeting in one of the most beautiful cities in North America. I am also pleased to announce that SMPC's next biennial meeting will be hosted at the University of California San Diego, with Dr. Sarah Creel as conference chair. The tentative dates for the meeting are July 31 – August 4, and we hope to have these and other details finalized soon.

Ed Large
President, Society for Music Perception and Cognition
CONFERENCE TEAM

Petr Janata (University of California Davis)
Julene Johnson (University of California San Francisco)
Harrison Conrad (University of California San Francisco)

Proceedings Coordinator: George Vokalek (Causal Productions)
CONFERENCE COMMITTEES

Edward Large (President SMPC)
Diana Deutsch (Co-Organizer 1st ICMPC)
Roger Kendall (Organizer 2nd ICMPC)
Irene Deliege (Organizer 3rd ICMPC)
Eugenia Costa-Giomi (Organizer 4th ICMPC)
Bruce Pennycook (Co-Organizer 4th ICMPC)
Suk Won Yi (Organizer 5th ICMPC)
John Sloboda (Organizer 6th ICMPC)
Susan O'Neill (Co-Organizer 6th ICMPC)
Kate Stevens (Organizer 7th ICMPC)
Scott Lipscomb (Organizer 8th ICMPC)
Anna Rita Addessi (Co-Organizer 9th ICMPC)
Mayumi Adachi (Organizer 10th ICMPC)
Steven Demorest (Organizer 11th ICMPC)
Emilios Cambouropoulos (Organizer 12th ICMPC)
Moo Kyoung Song (Organizer 13th ICMPC)

Anna Rita Addessi (University of Bologna)
Annabel Cohen (University of Prince Edward Island)
Barbara Tillmann (Lyon Neuroscience Research Center)
Charles Limb (University of California San Francisco)
David Temperley (University of Rochester)
Devin McAuley (Michigan State University)
Elizabeth Margulis (University of Arkansas)
Elvira Brattico (Aarhus University)
Emilios Cambouropoulos (Aristotle University of Thessaloniki)
Erin Hannon (University of Nevada Las Vegas)
Erkki Huovinen (University of Minnesota)
Eugenia Costa-Giomi (Ohio State University)
Glenn Schellenberg (University of Toronto at Mississauga)
Henkjan Honing (University of Amsterdam)
Jane Ginsborg (Royal Northern College of Music)
Jessica Grahn (Western University)
Joel Snyder (University of Nevada Las Vegas)
John Iversen (University of California San Diego)
John Sloboda (Guildhall School of Music & Drama)
Jonathan Berger (Stanford University)
Jukka Louhivuori (University of Jyväskylä)
Julene Johnson (University of California San Francisco)
Justin London (Carleton College)
Kate Stevens (University of Western Sydney)
Kyung Myun Lee (Seoul National University)
Laurel Trainor (McMaster University)
Lauren Stewart (Goldsmiths, University of London)
Lola Cuddy (Queen's University)
Marc Leman (University of Ghent)
Marcus Pearce (Queen Mary, University of London)
Mark Schmuckler (University of Toronto Scarborough)
Morwaread Farbood (New York University)
Nikki Rickard (Monash University)
Peter Keller (University of Western Sydney)
Peter Pfordresher (University at Buffalo, State University of New York)
Peter Vuust (Aarhus University)
Petr Janata (University of California Davis)
Petri Toiviainen (University of Jyväskylä)
Psyche Loui (Wesleyan University)
Ramesh Balasubramaniam (University of California Merced)
Reinhard Kopiez (Hanover University of Music)
Reyna Gordon (Vanderbilt University)
Richard Ashley (Northwestern University)
Rita Aiello (New York University)
Robert Gjerdingen (Northwestern University)
Sarah Creel (University of California San Diego)
Scott Lipscomb (University of Minnesota)
Stefan Koelsch (Freie Universität Berlin)
Stephen McAdams (McGill University)
Steven Brown (McMaster University)
Steven Morrison (University of Washington)
Suk Won Yi (Seoul National University)
Takako Fujioka (Stanford University)
Theodore Zanto (University of California San Francisco)
Tom Collins (De Montfort University)
Werner Goebl (University of Music and Performing Arts)
William Forde Thompson (Macquarie University)
AWARDS

ICMPC14 / SEMPRE Young Investigator Award

This award is granted to research submissions of exceptional quality that were undertaken by either graduate students or new researchers in the field. Each applicant for the award submitted a paper of up to 3000 words that was reviewed by the Young Investigator Award selection committee. The award winner received $1500 in travel and housing support and complimentary registration courtesy of SEMPRE & ICMPC14. The honorable mention recipient received complimentary registration and $1000 toward travel expenses.

Honorable Mention
Tan-Chyuan Chin (The University of Melbourne)
Predicting emotional well-being: The roles of emotional sensitivity to music and emotion regulation using music
Music & Health / Well-Being 2: Thursday, 9:30 am, Marina
SEMPRE Travel Awards

ICMPC14 would like to thank the Society for Education, Music, and Psychology Research (SEMPRE) for their unprecedented generosity in supporting ICMPC attendees. This year, SEMPRE provided over $23,000 in awards to conference attendees! Awardees received various amounts based on need. Congratulations to the 50 SEMPRE Travel Award winners:
OPENING KEYNOTE

I will discuss cross-modal associations from music to colors for several kinds of music, including classical orchestral pieces by Bach, Mozart, and Brahms, single-line piano melodies by Mozart, and 34 different genres of popular music from jazz to heavy-metal to salsa to country-western. When choosing the 3 colors (from 37) that “went best” with selections of classical music, choices by 48 US non-synesthetes were highly systematic: e.g., faster music in the major mode was strongly associated with more saturated, lighter, yellower colors. Further results strongly suggest that emotion mediates these music-to-color associations: e.g., the happy/sad ratings of the classical orchestral selections were highly correlated (.97) with the happy/sad ratings of the colors chosen as going best with them (Palmer, Schloss, Xu, & Prado-León, PNAS, 2013). Cross-cultural results from Mexican participants for the same classical selections were virtually identical to those from US participants. Analyses of the data for the 34 genres further reveal that color choices can be better predicted by 3 emotional dimensions (happy/sad, agitated/calm, and like/dislike) than by 8 perceived musical dimensions (electronic/acoustic, fast-strong/slow-weak, harmonious/disharmonious, etc.). Strong emotional effects were also present for two-note musical intervals, and weaker emotional effects for the timbre (or tone color) of individual instruments. Similar experiments were conducted with 12 music-to-color synesthetes, except that they chose the 3 colors (from the same 37) that were most similar to the colors they actually experienced while listening to the same musical selections. Synesthetes showed clear evidence of emotional effects for some musical variables (e.g., major versus minor) but not for others (e.g., slow versus fast tempi in the piano melodies). The complex nature of similarities and differences between synesthetes’ color experiences and non-synesthetes’ color associations will be discussed.
CLOSING KEYNOTE

Music involves physical interactions with musical instruments, acoustic transmission of sound waves, vibration of peripheral auditory structures, and dynamic coordination of whole-body movements. At the same time, music engages high-level cognitive and affective processes, and is often compared to language in terms of its structural and cognitive complexity. How can we bridge the gap between the raw physical experience of music and the complex cognitive computations that are assumed to enable it? In this talk, I propose a radically embodied approach to music cognition, consistent with the enactive approach of Varela and the ecological psychology of Gibson. The “radical” part is that it appeals to physical principles to explain music at all levels of description: patterns of activity in the brain, patterns of action in the body, and patterns of interaction among groups of individuals. At the neural level, emergent dynamic patterns — as opposed to special-purpose circuits — are considered to be the fundamental elements, and resonance — rather than computation — is considered to be the fundamental operation. However, this approach is not neurocentric. It recognizes that music cuts across brain–body–world divisions, and it provides the concepts and tools necessary to bridge this divide. I begin with rhythm, reviewing theoretical and empirical results that show why rhythm perception-action is best understood as resonance of the brain and body to sound. Using the example of synchronizing metronomes, I explain how we can use a single language — nonlinear dynamical systems — to cut across brain–body–world boundaries. I show how this approach replaces abstract cognitive “computation” with resonance of emergent physical patterns. Finally, I describe how the radically embodied approach leads to a view of music as a natural, emergent phenomenon of the physical world.
Table of Contents
10:30 Testing the Robustness of the Timbre Toolbox and the MIRtoolbox. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Savvas Kazazis, Nicholas Esterer, Philippe Depalle, Stephen McAdams
10:45 Musicians do Not Experience the Same Impairments as Non-Musicians During Postural
Manipulations in Tactile Temporal Order Judgment Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
James Brown, Emily Orchard-Mills, David Alais
11:00 Semantic Dimensions of Sound Mass Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Stephen McAdams, Jason Noble
11:15 Aging Adults and Gender Differences in Musical Nuance Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Jennifer A. Bugos
14:00 Insights into the Complexities of Music Listening for Hearing Aid Users. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Alinka Greasley, Harriet Crook, Robert Fulford, Jackie Salter
14:30 The Perception of Auditory Distortion Products from Orchestral Crotales. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Alex Chechile
15:00 Vibrotactile Perception of Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Frank A. Russo
Aesthetic Perception & Response 1
Wednesday, 10:30–12:30, Marina, Session Chair: Bill Thompson
10:30 Influence of Information: How Different Modes of Writing About Music Shape Music
Appreciation Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Timo Fischinger, Michaela Kaufmann, Wolff Schlotz
11:00 Cognitive and Affective Reactions to Repeating a Piece on a Concert Program . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Andrea R. Halpern, John Sloboda, Chloe H.K. Chan, Daniel Müllensiefen
11:30 Evaluating Recorded Performance: An Analysis of Critics’ Judgements of Beethoven Piano
Sonata Recordings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Elena Alessandri, Victoria J. Williamson, Hubert Eiholzer, Aaron Williamon
12:00 Clouds and Vectors in the Spiral Array as Measures of Tonal Tension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Dorien Herremans, Elaine Chew
16:30 The Constituents of Art-Induced Pleasure — A Critical Integrative Literature Review . . . . . . . . . . . . . . . . . . . 54
M. Tiihonen, Elvira Brattico, Suvi Saarikallio
16:30 “Leaky Features”: Influence of the Main on the Many. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Michael Barone, Matthew Woolhouse
16:30 The Silent Rhythm and Aesthetic Pleasure of Poetry Reading. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Natalie M. Phillips, Lauren Amick, Lana Grasser, Cody Mejeur, J. Devin McAuley
16:30 Visual Imagery and the Semantics of Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Paige Tavernaro, Elizabeth Hellmuth Margulis
16:30 Disliked Music: The Other Side of Musical Taste. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Taren Ackermann
16:30 On the Asymmetry of Cross-Modal Effects Between Music and Visual Stimuli . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Teruo Yamasaki
16:30 Zoning in or Tuning in? Absorption Within a Framework for Attentional Phases in Music
Listening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Thijs Vroegh
15:30 Effects of the Duration and the Frequency of Temporal Gaps on the Subjective Distortedness
of Music Fragments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Kunito Iida, Yoshitaka Nakajima, Kazuo Ueda, Gerard B. Remijn, Yukihiro Serizawa
15:45 Modeling Audiovisual Tension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Morwaread M. Farbood
16:00 Computational Modeling of Chord Progressions in Popular Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Stefanie Acevedo
Symposium: Beneath (or Beyond) the Surface: Corpus Studies of Tonal Harmony
Discussants: David Sears (McGill University); Claire Arthur (Ohio State University)
Cognitive Modeling of Music 3
Saturday, 08:30–10:30, Marina, Session Chair: Christopher White
8:30 Exploring the Temporal Capacity of Memory for Key-Change Music Empirically and
Computationally . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Érico Artioli Firmino, José Lino Oliveira Bueno, Carol Lynne Krumhansl
9:00 Form-Bearing Musical Motives: Perceiving Form in Boulez’s Anthèmes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Cecilia Taher, Robert Hasegawa, Stephen McAdams
9:30 Learning the Language of Affect: A New Model That Predicts Perceived Affect Across
Beethoven’s Piano Sonatas Using Music-Theoretic Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Joshua D. Albrecht
10:00 Modelling Melodic Discrimination Using Computational Models of Melodic Similarity and
Complexity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Peter M.C. Harrison, Daniel Müllensiefen
Cognitive Musicology 1
Wednesday, 08:30–09:50, Seacliff D
Symposium: Film, Television, and Music: Embodiment, Neurophysiology, Perception, and Cognition
Discussants: Siu-Lan Tan (Kalamazoo College); Mark Shevy (Northern Michigan University)
Film, Television, and Music: Embodiment, Neurophysiology, Perception, and Cognition . . . . . . . . . . . . . . 122
Siu-Lan Tan, Mark Shevy
8:30 Temporality and Embodiment in Film Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Juan Chattah
8:50 Neurophysiological Responses to Motion Pictures: Sound, Image, and Audio-Visual Integration. . . . . . . 124
Roger E. Dumas, Scott D. Lipscomb, Arthur C. Leuthold, Apostolos P. Georgopoulos
9:10 Exploring Incongruence: Shared Semantic Properties and Judgments of Appropriateness in
Film-Music Pairings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
David Ireland, W. Luke Windsor
9:30 Classical Music in Television Commercials: A Social-Psychological Perspective. . . . . . . . . . . . . . . . . . . . . . . . 126
Peter Kupfer
Cognitive Musicology 2
Saturday, 08:30–10:30, Seacliff D, Session Chair: Richard Ashley
16:00 Extracting the Musical Schemas of Traditional Japanese Folk Songs from Kyushu District . . . . . . . . . . . . 139
Akihiro Kawase
16:00 An Investigation on the Perception of Regional Style of Chinese Folk Song. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Cong Jiang
16:00 reNotate: The Crowdsourcing and Gamification of Symbolic Music Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . 146
Ben Taylor, Matthew Wolf, David John Baker, Jesse Allison, Daniel Shanahan
16:00 In the Blink of an Ear: A Critical Review of Very Short Musical Elements (Plinks) . . . . . . . . . . . . . . . . . . . . . . 147
Felix C. Thiesen, Reinhard Kopiez, Christoph Reuter, Isabella Czedik-Eysenberg, Kathrin Schlemmer
16:00 The Use of Prototype Theory for Understanding the Perception and Concept Formation of
Musical Styles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Florian Hantschel, Claudia Bullerjahn
16:00 Exploring Circadian Patterns in Musical Imagery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Freya Bailes
16:00 Musical Factors Influencing Tempo Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
Leigh VanHandel
16:00 Relative Salience in Polyphonic Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Marco Bussi, Sarah Sauvé, Marcus T. Pearce
16:00 Reading Pitches at the Piano: Disentangling the Difficulties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Marion Wood
16:00 A Machine-Learned Model of Perceived Musical Affect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Tess Dula, Paul Griesemer, Joshua D. Albrecht
Composition & Improvisation 1
Tuesday, 14:00–15:30, Bayview A, Session Chair: Andrew Goldman
16:30 All You Need to do is Ask? The Exhortation to be Creative Improves Creative Performance
More for Non-Expert Than Expert Jazz Musicians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
David S. Rosen, Youngmoo E. Kim, Daniel Mirman, John Kounios
16:30 Negentropy: A Compositional and Perceptual Study Using Variable Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Jason C. Rosenberg
16:30 A Method for Identifying Motor Pattern Boundaries in Jazz Piano Improvisations . . . . . . . . . . . . . . . . . . . . . 182
Martin Norgaard
16:30 The Meaning of Making — Mapping Strategies in Music Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Peter Falthin
16:30 Study of Instrument Combination Patterns in Orchestral Scores from 1701 to 2000 . . . . . . . . . . . . . . . . . . . 186
Song Hui Chon, Dana DeVlieger, David Huron
15:00 Reduced Cultural Sensitivity to Perceived Musical Tension in Congenital Amusia . . . . . . . . . . . . . . . . . . . . . 191
Cunmei Jiang, Fang Liu, Patrick C.M. Wong
15:15 A Comparison of Statistical and Empirical Representations of Cultural Distance . . . . . . . . . . . . . . . . . . . . . . . 192
Steven J. Morrison, Steven M. Demorest, Marcus T. Pearce
16:00 A Cross Cultural Study of Emotional Response to North Indian Classical Music . . . . . . . . . . . . . . . . . . . . . . . . 193
Avantika Mathur, Vishal Midya, Nandini Chatterjee Singh
16:00 Relating Musical Features and Social Function: Contrast Pattern Mining of Native American
Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
Daniel Shanahan, Kerstin Neubarth, Darrell Conklin
16:00 Effects of Repetitive Vocalization on Temporal Perception in a Group Setting. . . . . . . . . . . . . . . . . . . . . . . . . . 195
Jessica Pinkham, Zachary Wallmark
History of Music Cognition 1
Friday, 11:00–12:30, Seacliff D
Symposium: Empirical Musicology, 10 Years On: New Ways to Publish and the Empirical Paradigm in Music
Research
Discussant: Daniel Shanahan (Louisiana State University)
11:00 Empirical Musicology, 10 Years On: New Ways to Publish and the Empirical Paradigm in
Music Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Daniel Shanahan
14:00 Musical Structure as a Hierarchical Retrieval Organization: Serial Position Effects in Memory
for Performed Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Jane Ginsborg, Roger Chaffin, Alexander P. Demos, James Dixon
14:30 How Accurate is Implicit Memory for the Key of Unfamiliar Melodies? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
E. Glenn Schellenberg, Michael W. Weiss
15:00 Effects of Performance Cues on Expressive Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
Tânia Lisboa, Alexander P. Demos, Kristen Begosh, Roger Chaffin
15:30 A Non-Musician with Severe Alzheimer’s Dementia Learns a New Song. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Amee Baird, Heidi Umbach, William Thompson
15:45 Vocal Melodies are Remembered Better Than Piano Melodies but Sine-Wave Simulations are
not . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Michael W. Weiss, Sandra E. Trehub, E. Glenn Schellenberg
16:00 Perception of Leitmotives in Richard Wagner’s Der Ring des Nibelungen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
David John Baker, Daniel Müllensiefen
16:15 Music Identification from Harmony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
Ivan Jimenez, Tuire Kuusi
Memory & Music 3
Saturday, 11:00–12:00, Bayview B, Session Chair: Daniel Müllensiefen
11:00 Memory for the Meaningless: Expert Musicians’ Big Advantage at Recalling Unstructured
Material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Giovanni Sala, Fernand Gobet
11:15 Memory for Melody: How Previous Knowledge Shapes the Formation of New Memories. . . . . . . . . . . . . . 213
Steffen A. Herff, Kirk N. Olsen, Roger T. Dean
11:30 Generality of the Memory Advantage for Vocal Melodies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Michael W. Weiss, E. Glenn Schellenberg, Sandra E. Trehub
11:45 Musical Experience and Executive Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Brooke M. Okada, L. Robert Slevc
15:00 Divided Attention and Expertise in the Context of Continuous Melody Recognition . . . . . . . . . . . . . . . . . . . 216
Steffen A. Herff, Daniela Czernochowski
15:15 Generalizing the Learning of Instrument Identities Across Pitch Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Stephen McAdams, Ayla Tse, Grace Wang
15:30 Playing by Ear Beyond 100: A Continuing Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
Eleanor Selfridge-Field
15:45 The Effect of Visually Presented Lyrics on Song Recall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Yo-Jung Han
Music & Aging (Posters)
Saturday, 16:00, Seacliff A–C
16:00 Better Late Than Never (or Early): Late Music Lessons Confer Advantages in Decision-Making . . . . . . . 233
Kirsten E. Smayda, W. Todd Maddox, Darrell A. Worthy, Bharath Chandrasekaran
16:00 The Benefits of Music Making on Cognition in Older Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
LinChiat Chang, Anna M. Nápoles, Alexander Smith, Julene K. Johnson
16:00 Outcomes of Participation in a Piano-Based Recreational Music Making Group for Adults Over
50: A Retrospective Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Olivia Swedberg Yinger, Vicki McVay, Lori Gooding
16:00 Does Aging Affect the Voice of Trained Professional Singers? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
Thomas Akshintala, Rashmi Pai, G.S. Patil
8:30 The ‘rasa’ in the ‘raga’? Brain Networks of Emotion Responses to North Indian Classical ragas. . . . . . . 247
Avantika Mathur, Nandini Chatterjee Singh
9:00 Multimodal Affective Interactions: Comparing Auditory and Visual Components in Dramatic
Scenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
Moe Touizrar, Sarah Gates, Kit Soden, Bennett K. Smith, Stephen McAdams
9:30 Differences in Verbal Description of Music Listening Experiences Between College Students
with Total Blindness and Typical Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
Hye Young Park, Hyun Ju Chong
10:00 Social eMotions — Communicating Emotions in Contemporary Dance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Tommi Himberg, Klaus Förger, Maija Niinisalo, Johanna Nuutinen, Jarkko Lehmus
14:00 Rehearing Music: Schematic and Veridical Expectancy Ratings in Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Hayley Trower, Adam Ockelford, Evangelos Himonides, Arielle Bonneville-Roussy
14:15 Pleasurable Music is Serious, Exciting and Bodily Experienced, While Pleasurable Visual
Objects are Light, Calming, and Aesthetically Pleasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
J. Maksimainen, Suvi Saarikallio
14:30 Brazilian Children’s Emotional Responses to Wagner’s Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
Patricia Vanzella, Paulo E. Andrade, Olga V.C.A. Andrade, E. Glenn Schellenberg
14:45 The Relative Contributions of Composition and Visual and Auditory Performance Cues to
Emotion Perception: Comparing Piano and Violin Performances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
Marc R. Thompson, Jonna K. Vuoskoski, Eric Clarke
16:30 Imprinting Emotion on Music: Transferring Affective Information from Sight to Sound . . . . . . . . . . . . . . . 265
Caitlyn Trevor, Joseph Plazak
16:30 Matching Music to Brand Personality: A Semantic Differential Tool for Measuring Emotional
Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
David John Baker, Tabitha Trahan, Daniel Müllensiefen
16:30 Audience Emotional Response to Live and Recorded Musical Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
Elizabeth Monzingo
16:30 The Relationship Between Coupled Oscillations During Auditory Temporal Prediction and
Motor Trajectories in a Rhythmic Synchronization Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
Fiona C. Manning, Brandon T. Paul, Serena W. Tse
16:30 Inducing Emotion Through Size-Related Timbre Manipulations: A Pilot Investigation . . . . . . . . . . . . . . . . . 276
Joseph Plazak, Zachary Silver
16:30 The Effect of Repetitive Structure on Enjoyment in Uplifting Trance Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
Kat Agres, Louis Bigo, Dorien Herremans, Darrell Conklin
16:30 Emotions and Chills at the Interface Between Contrasting Musical Passages. . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Sabrina Sattmann, Richard Parncutt
16:30 The Influence of Odour on the Perception of Musical Emotions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
Scott Beveridge
16:30 Emotional Mediation in Crossmodal Correspondences from Music to Visual Texture . . . . . . . . . . . . . . . . . . 285
Thomas Langlois, Joshua Peterson, Stephen E. Palmer
16:30 Expressive Performance and Listeners’ Decoding of Performed Emotions: A Multi-Lab
Replication and Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
Renee Schapiro, Jessica Akkermans, Veronika Busch, Kai Lothwesen, Timo Fischinger, Klaus Frieler,
Kathrin Schlemmer, Daniel Shanahan, Kelly Jakubowski, Daniel Müllensiefen
Music & Emotions 6 (Posters)
Wednesday, 14:00, Seacliff A–C
14:00 Self-Chosen Music Listening at Home, Emotion and Mood Regulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
Alexandra Lamont, Laura Vassell
14:00 Hearing Wagner: Physiological Responses to Richard Wagner’s Der Ring des Nibelungen . . . . . . . . . . . . . 288
David John Baker, Daniel Müllensiefen
14:00 The Effects of Hearing Aids and Hearing Impairment on the Physiological Response to
Emotional Speech. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Gabriel A. Nespoli, Gurjit Singh, Frank A. Russo
14:00 Coping with Interpersonal Distress: Can Listening to Music Alleviate Loneliness? . . . . . . . . . . . . . . . . . . . . . 290
Katharina Schäfer, Tuomas Eerola
14:00 Does the Familiarity and Timing of the Music Affect Indirect Measures of Emotion?. . . . . . . . . . . . . . . . . . . 291
L. Edelman, P. Helm, S.C. Levine, S.D. Levine, G. Morrello
14:00 Emotional Reactions in Music Listeners Suffering from Depression: Differential Activation of
the Underlying Psychological Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Laura S. Sakka
14:00 Music Liking, Arousal and Cognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
Tessa Overboom, Jade Verheijen, Joyce J. de Bruin, Rebecca S. Schaefer
16:30 A New Measure of the Well-Being Benefits of Music Participation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
Amanda E. Krause, Jane W. Davidson, Adrian C. North
16:30 Musical Intensity in Affect Regulation: Interventions in Self-Harming Behavior . . . . . . . . . . . . . . . . . . . . . . . . 315
Diana Hereld
16:30 Music Engagement, Personality, Well-Being and Affect in a Sample of North Indian Young
Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
A. Chakraborty, D.K. Upadhyay, R. Shukla
16:30 Music Rehearsals and Well-Being: What Role Plays Personality? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Eva Matlschweiger, Sabrina Sattmann, Richard Parncutt
16:30 Motivating Stroke Rehabilitation Through Music: A Feasibility Study Using Digital Musical
Instruments in the Home . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
Pedro Kirk, Mick Grierson, Rebeka Bodak, Nick Ward, Fran Brander, Kate Kelly, Nickolas Newman,
Lauren Stewart
16:30 Salivary Biomarkers as Objective Data in Music Psychology Research: Comparing Live and
Recorded Music to Story Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
R. Orlando, C. Speelman, A. Wilkinson, V. Gupta
16:30 Chanting Meditation Improves Mood and Social Cohesion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Gemma Perry, Vince Polito, William Thompson
8:30 Music Training and Neuroplasticity: Cognitive Rehabilitation of Traumatic Brain Injury
Patients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
Berit Marie Vik, Geir Olve Skeie, Karsten Specht
9:00 Help Musicians UK Hearing Survey: Musicians’ Hearing and Hearing Protection . . . . . . . . . . . . . . . . . . . . . . . 337
Alinka Greasley, Robert Fulford, Nigel Hamilton, Maddy Pickard
9:30 Predicting Emotional Well-Being: The Roles of Emotional Sensitivity to Music and Emotion
Regulation Using Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
Tan-Chyuan Chin, Nikki S. Rickard
10:00 Efficacy of a Self-Chosen Music Listening Intervention in Regulating Induced Negative Affect:
A Randomised Controlled Trial. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
Jenny M. Groarke, Michael J. Hogan, Phoebe McKenna-Plumley
10:30 Music-Language Dissociation After Brain Damage: A New Look at the Famous Case of
Composer Vissarion Shebalin. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Meta Weiss
10:45 Using a Rhythmic Speech Production Paradigm to Elucidate the Rhythm-Grammar Link: A
Pilot Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
Rita Pfeiffer, Alison Williams, Carolyn Shivers, Reyna L. Gordon
11:00 Music Syntactic Processing Related to Integration of Local and Global Harmonic Structures in
Musicians and Non-Musicians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
Irán Román, Takako Fujioka
Music & Language 3
Wednesday, 08:30–09:50, Bayview B
8:30 Individuals with Congenital Amusia Show Impaired Talker Adaptation in Speech Perception. . . . . . . . . 360
Fang Liu, Yanjun Yin, Alice H.D. Chan, Virginia Yip, Patrick C.M. Wong
9:00 Music Supports the Processing of Song Lyrics and Changes Their Contents: Effects of Melody,
Silences and Accompaniment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361
Yke Schotanus
9:30 Music is More Resistant to Neurological Disease Than Language: Insights from Focal Epilepsy . . . . . . . 362
Laura J. Bird, Graeme D. Jackson, Sarah J. Wilson
10:00 An fMRI Study Comparing Rhythm Perception in Adults Who do and do not Stutter. . . . . . . . . . . . . . . . . . . 363
Elizabeth A. Wieland, J. Devin McAuley, Ho Ming Chow, David Zhu, Soo-Eun Chang
Music & Language 6
Saturday, 15:00–15:45, Bayview A, Session Chair: Ao Chen
15:00 Finding the Beat in Poetry: The Role of Meter, Rhyme, and Lexical Content in Speech
Production and Perception. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
Benjamin G. Schultz, Sonja A. Kotz
15:15 Vowel Perception by Congenital Amusics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
Jasmin Pfeifer, Silke Hamann
15:30 A Tool for the Quantitative Anthropology of Music: Use of the nPVI Equation to Analyze
Rhythmic Variability Within Long-Term Historical Patterns in Music. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
Joseph R. Daniele
Music & Language 8 (Posters)
Saturday, 16:00, Seacliff A–C
16:00 Music and Language: Syntactic Interaction Effects Found Without Violations, and Modulated
by Attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
Anna Fiveash, Genevieve McArthur, William Thompson
16:00 Creatures of Context: Recruiting Musical or Linguistic Knowledge for Spoken, Sung, and
Ambiguous Utterances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
Christina Vanden Bosch der Nederlanden, Erin E. Hannon, Joel S. Snyder
16:00 Perception of Vowel-Consonant Duration Contrasts in 8-Month-Old Infants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Christine D. Tsang, Simone Falk
16:00 A Novel Auditory Classification Paradigm for Developmental Music Cognition Research . . . . . . . . . . . . . 398
Justin Lane Black, Erin E. Hannon
16:00 Modality and Tempo of Metronome-Timed Speech Influence the Extent of Induced Fluency
Effect in Stuttering Speakers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
Lauren Amick, Katherine B. Jones, Juli Wade, Soo-Eun Chang, J. Devin McAuley
16:00 A Study on Analysis of Prosodic and Tonal Characteristics of Emotional Vocabularies in Korean
Language for Songwriting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
Soo Yon Yi, Hyun Ju Chong
16:00 The Association Between Musical Skills and Grammar Processing in Childhood . . . . . . . . . . . . . . . . . . . . . . . 401
Swathi Swaminathan, E. Glenn Schellenberg
14:00 Evaluating the Musical Protolanguage Hypothesis Using a Serial Reproduction Task . . . . . . . . . . . . . . . . . . 402
William Thompson, Anna Fiveash, Weiyi Ma
14:30 Instrumental Functionality and its Impact on Musical Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Morten Schuldt-Jensen
11:30 The Conductor as Guide: Gesture and the Perception of Musical Content. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
Anita B. Kumar, Steven J. Morrison
11:45 How Music Moves Us: Entraining to Musicians’ Movements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
Roger Chaffin, Alexander P. Demos
12:00 The Effect of Attention on Adaptive and Anticipatory Mechanisms Involved in Sensorimotor
Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Bronson Harry, Carla Pinto, Peter Keller
12:15 Gesture, Embodied Cognition, and Emotion: Comparing Hindustani Vocal Gharanas in
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
Hans Utter
Music & Movement 2
Thursday, 11:00–12:30, Seacliff D, Session Chair: Marc Thompson
8:30 The Loss and Regain of Coordinated Behavior in Musical Duos. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Andrea Schiavio, Ashley E. Walton, Matthew Rodger, Johanna Devaney
9:00 Impact of Event Density and Tempo on Synchronization Ability in Music-Induced Movement . . . . . . . . 414
Birgitta Burger, Justin London, Marc R. Thompson, Petri Toiviainen
9:30 Embodied Decision-Making in Improvisation: The Influence of Perceptual-Motor Priming. . . . . . . . . . . . . 415
Eric Clarke, Adam Linson
14:00 Validation of a Simple Tool to Characterize Postural Stability in the Setting of the Indian
Classical Dance Form, Bharatnatyam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Nandita K. Shankar, Ramesh Balasubramaniam
14:00 Singing About Sharing: Music Making and Preschoolers’ Prosocial Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
Sara Lynn Beck, John Rieser
14:00 Measuring Stress in Cases of Discrepancy Between a Task Description and the Accompanying
Gesture, Using the Example of Conducting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Sarah Platte, Rosalind Picard, Morten Schuldt-Jensen
14:00 Movement Restriction When Listening to Music Impacts Positive Emotional Responses to and
Liking of Pieces of Music. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Shannon Wright, Evan Caldbick, Mario Liotti
14:00 Auditory-Motor Interactions in the Music Production of Adults with Cochlear Implants. . . . . . . . . . . . . . . 438
Tara Vongpaisal, Daniela Caruso
16:00 Bobbing, Weaving, and Gyrating: Musician’s Sway Reflects Musical Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Alexander P. Demos, Roger Chaffin, Topher Logan
16:00 Short Latency Effects of Auditory Frequency Change on Human Motor Behavior . . . . . . . . . . . . . . . . . . . . . . 440
Amos Boasson, Roni Granot
16:00 The Wave Transformation: A Method for Analyzing Spontaneous Coordination from
Discretely Generated Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
Benjamin G. Schultz, Alexander P. Demos, Michael Schutz
16:00 Hearing the Gesture: Perception of Body Actions in Romantic Piano Performances . . . . . . . . . . . . . . . . . . . . 442
Catherine Massie-Laberge, Isabelle Cossette, Marcelo M. Wanderley
16:00 Participant Attitudes Toward Dalcroze Eurhythmics as a Community Fall Prevention Program
in Older Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
Hyun Gu Kang, Rodney Beaulieu, Shoko Hino, Lisa Brandt, Julius Saidro
16:00 Evaluating the Audience’s Perception of Real-Time Gestural Control and Mapping
Mechanisms in Electroacoustic Vocal Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
J. Cecilia Wu, Matt Wright
16:00 Walking to Music: How Instructions to Synchronize Alter Gait in Good and Poor Beat
Perceivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Lucy M. McGarry, Emily A. Ready, Cricia Rinchon, Jeff D. Holmes, Jessica A. Grahn
16:00 A Comparison of Three Interaction Measures That Examine Dyadic Entrainment . . . . . . . . . . . . . . . . . . . . . . 450
Marc R. Thompson, Birgitta Burger, Petri Toiviainen
16:00 Eye Movements While Listening to Various Types of Music. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
Marek Franěk, Denis Šefara, Roman Mlejnek
16:00 Periodic Body Motions as Underlying Pulse in Norwegian Telespringar. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 452
Mari Romarheim Haugen
16:00 Bouncing Synchronization to Vibrotactile Music in Hearing and Early Deaf People . . . . . . . . . . . . . . . . . . . . 453
Pauline Tranchant, Martha Shiell, Marcello Giordano, Alexis Nadeau, Marcelo M. Wanderley,
Isabelle Peretz, Robert Zatorre
16:00 Conducting — An Intuitive Gestural Language? Exploring the Actual Impact of Conducting
Gestures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
Sarah Platte, Morten Schuldt-Jensen
16:00 Dyadic Improvisation and Mirroring of Finger Movements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Tommi Himberg, Maija Niinisalo, Riitta Hari
16:00 Let’s Dance: Examining the Differences in Beat Perception and Production Between Musicians
and Dancers Using Motion Capture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Tram Nguyen, Jessica A. Grahn
Symposium: Exploring Music & Language Processing and Emotions with ECoG
Discussant: Petr Janata (UC Davis)
Conveners: Christian Mikutta (University of California Berkeley); Robert Knight (University of California Berkeley)
Exploring Music & Language Processing and Emotions with ECoG. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
C.A. Mikutta, Diana Omigie, Robert T. Knight, Petr Janata
11:00 Electrocorticography and Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
Robert T. Knight
11:30 Reconstruction of Music from Direct Cortical Recordings in Humans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
C.A. Mikutta, S. Martin, G. Schalk, P. Brunner, Robert T. Knight, B.N. Pasley
12:00 Hippocampus and Amygdala Interactions Associated with Musical Expectancy . . . . . . . . . . . . . . . . . . . . . . . . 473
Diana Omigie, Marcus T. Pearce, Severine Samson
Music & Neuroscience 4
Saturday, 08:30–10:30, Bayview B, Session Chair: Takako Fujioka
8:30 Neural Correlates of Auditory and Language Development in Children Engaged in Music
Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
Assal Habibi, B. Rael Cahn, Antonio Damasio, Hanna Damasio
9:00 Music Use Predicts Neurobiological Indices of Well-Being and Emotion Regulation Capacity . . . . . . . . . . 475
Tan-Chyuan Chin, Nikki S. Rickard
9:30 Neural Correlates of Dispositional Empathy in the Perception of Isolated Timbres . . . . . . . . . . . . . . . . . . . . 476
Zachary Wallmark, Choi Deblieck, Marco Iacoboni
10:00 SIMPHONY: Studying the Impact Music Practice Has on Neurodevelopment in Youth . . . . . . . . . . . . . . . . . 481
John R. Iversen, Johanna Baker, Dalouge Smith, Terry Jernigan
14:00 The Influence of Meter on Harmonic Syntactic Processing in Music: An ERP Study . . . . . . . . . . . . . . . . . . . . 495
Clemens Maidhof, Rie Asano
14:00 Effect of Musical Expertise on the Coupling of Auditory, Motor, and Limbic Brain Networks
During Music Listening. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
Iballa Burunat, Valeri Tsatsishvili, Elvira Brattico, Petri Toiviainen
14:00 Neural Changes After Multimodal Learning in Pianists — An fMRI Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Maria Celeste Fasano, Enrico Glerean, Ben Gold, Dana Sheng, Mikko Sams, Josef Rauschecker,
Peter Vuust, Elvira Brattico
14:00 Representation of Musical Beat in Scalp Recorded EEG Responses: A Comparison of Spatial
Filtering Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
Steven Losorelli, Blair Kaneshiro, Jonathan Berger
14:00 Reading the Musical Mind: From Perception to Imagery, via Pattern Classification of fMRI Data. . . . . . . 500
Tudor Popescu, Hannes Ruge, Oren Boneh, Fernando Bravo, Martin Rohrmeier
14:00 Time-Dependent Changes in the Neural Processing of Repetitive “Minimalist” Stimuli . . . . . . . . . . . . . . . . 501
Tysen Dauer, Audrey Ho, Takako Fujioka
16:00 Utilizing Multisensory Integration to Improve Psychoacoustic Alarm Design in the Intensive
Care Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
Joseph J. Schlesinger
16:00 Examining Error Processing in Musical Joint Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
Anita Paas, Giacomo Novembre, Claudia Lappe, Catherine J. Stevens, Peter Keller
16:00 Perceived Groove and Neuronal Entrainment to the Beat. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
Gabriel A. Nespoli, Frank A. Russo
16:00 Validation and Psychometric Features of a Music Perception Battery for Children from Five to
Eleven Years Old: Evidences of Construct Validity and M-Factor’s Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . 505
Hugo Cogo-Moreira
16:00 Musical EMDR: Music, Arousal and Emotional Processing in EEG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Marijn Coers, Daniel Myles, Joyce J. de Bruin, Rebecca S. Schaefer
16:00 The Influence of Activating versus Relaxing Music on Repetitive Finger Movement and
Associated Motor Cortical Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
Patricia Izbicki, Brandon Klinedinst, Paul Hibbing, Allison Brinkman, Elizabeth Stegemöller
16:00 Neural Activations of Metaphor Use in Music Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
Rebecca S. Schaefer, Maansi Desai, Deborah A. Barany, Scott T. Grafton
16:00 Audio-Visual Mismatch Negativity During Musical Score Reading in Musicians . . . . . . . . . . . . . . . . . . . . . . . . 513
Shu Yu Lin, Takako Fujioka
16:00 Auditory-Motor Processing in Neural Oscillatory Beta-Band Network After Music Training in
Healthy Older Adults. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
Takako Fujioka, Bernhard Ross
16:00 Mechanical Timing Enhances Sensory Processing Whereas Expressive Timing Enhances Later
Cognitive Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Takayuki Nakata, Laurel J. Trainor
16:00 Music-Induced Positive Mood Broadens the Scope of Auditory Selective Attention . . . . . . . . . . . . . . . . . . . . 516
Vesa Putkinen, Tommi Makkonen, Tuomas Eerola
16:00 Auditory Deficits Compensated for by Visual Information: Evidence from Congenital Amusia . . . . . . . 517
Xuejing Lu, Hao T. Ho, Yanan Sun, Blake W. Johnson, William Thompson
Music & Personality 1
Tuesday, 14:00–15:30, Bayview B, Session Chair: Birgitta Burger
14:00 The Association of Music Aptitude and Personality in 10- to 12-Year-Old Children and Adults . . . . . . . 521
Franziska Degé, Gudrun Schwarzer
14:30 MUSEBAQ: A Psychometrically Robust Questionnaire for Capturing the Many Voices of Music
Engagement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
Tan-Chyuan Chin, Nikki S. Rickard, Eduardo Coutinho, Klaus R. Scherer
15:00 Personality and Musical Preference Using Crowd-Sourced Excerpt-Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Emily Carlson, Pasi Saari, Birgitta Burger, Petri Toiviainen
Music Education 1
Tuesday, 11:30–12:30, Marina, Session Chair: Reinhard Kopiez
11:30 The Effect of Movement Instruction in Music Education on Cognitive, Linguistic, Musical and
Social Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
Emese Maróti, Edina Barabás, Gabriella Deszpot, Tamara Farnadi, László Norbert Nemes,
Borbála Szirányi, Ferenc Honbolygó
11:45 Singing for the Self: Exploring the Self-Directed Singing of Three and Four Year-Old Children
at Home . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Bronya Dean
12:00 Memory Ability in Children’s Instrumental Musical Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
Larissa Padula Ribeiro da Fonseca, Diana Santiago, Victoria J. Williamson
12:15 Development and First Results from the Musical Ear Training Assessment (META) . . . . . . . . . . . . . . . . . . . . 547
Anna Wolf, Reinhard Kopiez
Music Education 2
Wednesday, 16:30–18:00, Seacliff D, Session Chair: Bronya Dean
Music Performance 1
Tuesday, 14:00–15:00, Seacliff D, Session Chair: Theodore Zanto
Music Performance 2
Wednesday, 16:30–17:30, Marina, Session Chair: Stacey Davis
16:30 Vocal Learning — The Acquisition of Linguistic and Musical Generative Grammars . . . . . . . . . . . . . . . . . . . 563
Stefanie Stadler Elmer
17:00 The Influence of Preliminary Listening and Complexity Level of Music on Sight-Reading
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
Yeoeun Lim, Seung-yeon Rhyu, Kyung Myun Lee, Kyogu Lee, Jonghwa Park, Suk Won Yi
Music Performance 3
Thursday, 14:00–15:30, Seacliff D, Session Chair: Peter Keller
Music Performance 4
Friday, 08:30–10:00, Seacliff D, Session Chair: Kyung Myun Lee
8:30 A Classical Cello-Piano Duo’s Shared Understanding with Their Audience and with Each Other. . . . . . . 577
Neta Spiro, Michael F. Schober
9:00 The Role of String Register in Affective Performance Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
Caitlyn Trevor, David Huron
9:30 Expressive Consistency and Creativity: A Case Study of an Elite Violinist. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
Stacey Davis
Music Performance 5
Saturday, 14:00–15:00, Seacliff D, Session Chair: Neta Spiro
14:00 The Role Repertoire Choice Has in Shaping the Identity and Functionality of a Chamber Music
Ensemble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
Alana Blackburn
14:15 Negotiating Incongruent Performer Goals in Musical Ensemble Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
Jennifer MacRitchie, Andrea Procopio, Steffen A. Herff, Peter Keller
14:30 Limited Evidence for an Evaluation Advantage for Performance Played from Memory . . . . . . . . . . . . . . . . 589
Reinhard Kopiez, Anna Wolf, Friedrich Platz
14:45 Are Visual and Auditory Cues Reliable Predictors for Determining the Finalists of a Music
Competition?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
Friedrich Platz, Reinhard Kopiez, Anna Wolf, Felix C. Thiesen
Music Performance 6 (Posters)
Wednesday, 14:00, Seacliff A–C
14:00 Music, Animacy, and Rubato: What Makes Music Sound Human?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
Adrian S. Blust, David John Baker, Kaitlin Richard, Daniel Shanahan
14:00 Metaphor Use in Skilled Movement: Music Performance, Dance and Athletics. . . . . . . . . . . . . . . . . . . . . . . . . . 599
Jake D. Pietroniro, Joyce J. de Bruin, Rebecca S. Schaefer
14:00 Describing Self-Talk in Music Practice Cognition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600
Julie M. Stephens, Amy L. Simmons
14:00 Outliers in Performed Loudness Transitions: An Analysis of Chopin Mazurka Recordings. . . . . . . . . . . . 601
Katerina Kosta, Oscar F. Bandtlow, Elaine Chew
14:00 How Stable are Motor Pathways in Beginning Wind Instrument Study? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
Laura A. Stambaugh
14:00 Performers’ Motions Reflect Their Intention to Express Local or Global Structure in Melody . . . . . . . . . . 607
Madeline Huberth, Takako Fujioka
14:00 Cognitive and Affective Empathic Responses to Contemporary Western Solo Piano
Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
Mary C. Broughton, Jessie Dimmick
14:00 Silent Reading and Aural Models in Pianists’ Mental Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Nina Loimusalo, Erkki Huovinen
14:00 Auditory and Motor Learning for Skilled and Novice Performers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
Rachel M. Brown, Virginia Penhune
14:00 Artistic Practice: Expressive Goals Provide Structure for the Perceptual and Motor
Components of Music Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
Sarah E. Allen, Carla D. Cash, Amy L. Simmons, Robert A. Duke
14:00 Attentional Control and Music Performance Anxiety: Underlying Causes of Reduced
Performance Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
Sarah E. Lade, Serena W. Tse, Laurel J. Trainor
14:00 Effect of Room Acoustics on Listeners’ Preference of String Quartet Performances Recorded
in Virtual Acoustics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
Song Hui Chon, Sungyoung Kim, Doyuen Ko
14:00 Computational Analysis of Expressive Shape in Music Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
Taehun Kim, Stefan Weinzierl
14:00 Solo vs. Duet in Different Virtual Rooms: On the Consistency of Singing Quality Across
Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 624
Timo Fischinger, Gunter Kreutz, Pauline Larrouy-Maestri
Music Therapy 1
Tuesday, 11:30–12:30, Seacliff D, Session Chair: Charles Limb
12:15 Neuroimaging and Therapeutic Use of Psychedelic Drugs and Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
Mendel Kaelen, Romy Lorenz, Anna Simmonds, Leor Roseman, Amanda Feilding, Robert Leech,
David Nutt, Robin Carhart-Harris
16:00 Benefits of a Voice Therapy and Song Singing Group for People with ILD. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629
Merrill Tanner, Meena Kalluri, Janice Richman-Eisenstat
16:00 The Use of Therapeutic Singing Program to Enhance Vocal Quality and Alleviate Depression
of Parkinson’s Disease: Case Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
Eun Young Han, Ji Young Yun, Kyoung-Gyu Choi, Hyun Ju Chong
16:00 The Therapeutic Function of Music for Musical Contour Regulation Facilitation: A Model to
Facilitate Emotion Regulation Development in Preschool-Aged Children. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634
Kimberly Sena Moore, Deanna Hanson-Abromeit
16:00 Representing Change in Patterns of Interaction of Children with Communication Difficulties
in Music Therapy: What do the Therapists Think? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
Neta Spiro, Camilla Farrant, Tommi Himberg
16:00 A Two-Year Longitudinal Investigation Comparing Pitch Discrimination and Pitch Matching
Skills of Young Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
Debbie Lynn Wolf
16:00 How Masculine is a Flute? A Replication Study on Gender Stereotypes and Preferences for
Musical Instruments Among Young Children. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
Claudia Bullerjahn, Katharina Heller, Jan H. Hoffmann
16:00 The Importance of Musical Context on the Social Effects of Interpersonal Synchrony in Infants. . . . . . . 643
Laura Cirelli, Stephanie Wan, Christina Spinelli, Laurel J. Trainor
16:00 Melody Familiarity Facilitates Music Processing in 4–5-Year-Olds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
Sarah C. Creel
16:00 Effects of Dark and Bright Timbral Instructions on Acoustical and Perceptual Measures of
Trumpet Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
D. Gregory Springer, Amanda L. Schlegel
16:00 Listeners Rapidly Adapt to Musical Timbre . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
Elise A. Piazza, Frédéric E. Theunissen, David Wessel, David Whitney
16:00 All About That Bass: Timbre and Groove Perception in Synthesized Bass Lines. . . . . . . . . . . . . . . . . . . . . . . . 650
Ivan Tan, Ethan Lustig
10:30 Mobile Experience Sampling of Music Listening Using the MuPsych App . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
Will Randall, Suvi Saarikallio
11:00 BeatHealth: Bridging Rhythm and Technology to Improve Health and Wellness . . . . . . . . . . . . . . . . . . . . . . . 652
Simone Dalla Bella, Ramesh Balasubramaniam
11:20 Optimization of Rhythmic Auditory Cueing for Gait Rehabilitation in Parkinson’s Disease . . . . . . . . . . . 653
Simone Dalla Bella, Dobromir Dotov, Sophie Bayard, Valérie Cochen de Cock, Christian Geny,
Emilie Guettard, Petra Ihalainen, Benoît Bardy
11:40 When Music (De-)Synchronizes Biology During Running . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 654
Benoît Bardy, Loïc Damm, Fabien Blondel, Petra Ihalainen, Déborah Varoqui, Simone Dalla Bella
12:00 The Empowering Effect of Being Locked to the Beat of Music . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
Marc Leman, Edith Van Dyck, Bart Moens, Jeska Buhman
12:20 BeatHealth: A Platform for Synchronizing Music to Movement in the Real World . . . . . . . . . . . . . . . . . . . . . . 656
Rudi Villing, Joseph Timoney, Victor Lazzarini, Sean O’Leary, Dawid Czesak, Tomas Ward
Panel Discussion: The Intersection of Art, Medicine, Science and Industry: Improving Public Health and Well-Being
Through Music and Technology
Discussant: Benoit Bardy (University of Montpellier)
Convener: Theodore Zanto (University of California San Francisco)
Panel: Adam Gazzaley (University of California San Francisco)
Michael Winger (The Recording Academy, SF)
Matt Logan (UCSF Benioff Children’s Hospital)
Ketki Karanam (The Sync Project)
Musical Development 1
Tuesday, 10:30–11:30, Marina, Session Chair: Adam Tierney
10:30 The Role of Updating Executive Function in Sight Reading Performance Across Development. . . . . . . . 662
Laura Herrero, Nuria Carriedo, Antonio Corral, Pedro R. Montoro
10:45 Auditory and Visual Beat and Meter Perception in Children. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
Jessica E. Nave-Blodgett, Erin E. Hannon, Joel S. Snyder
11:00 Beat-Based Entrainment During Infant-Directed Singing Supports Social Engagement . . . . . . . . . . . . . . . . . . 669
Miriam Lense, Warren Jones
11:15 Infants’ Home Soundscapes: A Portrait of American Families. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 670
Eugenia Costa-Giomi
Musical Development 2
Wednesday, 15:00–16:00, Marina, Session Chair: Miriam Lense
Musical Development 3
Thursday, 11:00–12:30, Marina
Musical Development 4
Saturday, 11:00–12:30, Marina
Musical Timbre 1
Friday, 08:30–10:30, Bayview B, Session Chair: Jason Noble
8:30 Top-Down Modulation on the Perception and Categorization of Identical Pitch Contours in
Speech and Melody . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696
J.L. Weidema, M.P. Roncaglia-Denissen, Henkjan Honing
9:00 The Structure of Absolute Pitch Abilities and its Relationship to Musical Sophistication . . . . . . . . . . . . . . 697
Suzanne Ross, Karen Chow, Kelly Jakubowski, Marcus T. Pearce, Daniel Müllensiefen
9:30 Heptachord Shift: A Perceptually Based Approach to Objectively Tracking Consecutive Keys
in Works of J.S. Bach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 698
Marianne Ploger
10:00 Perception of Pitch Accuracy in Melodies: A Categorical or Continuous Phenomenon? . . . . . . . . . . . . . . . . 699
Pauline Larrouy-Maestri, Simone Franz, David Poeppel
Revisiting Music Perception Research: Issues in Measurement, Standardization, and Modeling . . . . . . . 709
Beatriz Ilari, Graziela Bortz, Hugo Cogo-Moreira, Nayana Di Giuseppe Germano
11:00 Absolute Pitch: In Search of a Testable Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
Nayana Di Giuseppe Germano, Hugo Cogo-Moreira, Graziela Bortz
11:30 Active and Receptive Melodic Perception: Evaluation and Validation of Criteria . . . . . . . . . . . . . . . . . . . . . . . 714
Graziela Bortz, Hugo Cogo-Moreira
12:00 How are We Measuring Music Perception and Cognition? Lack of Internal Construct Validity,
Overuse of Row Scores, and Misuses of Cronbach’s Alpha . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718
Hugo Cogo-Moreira
8:30 Information Dynamics of Boundary Perception: Entropy in Self-Paced Music Listening . . . . . . . . . . . . . . . 719
Haley E. Kragness, Niels Chr. Hansen, Peter Vuust, Laurel J. Trainor, Marcus T. Pearce
9:00 We are Fast Learners: The Implicit Learning of a Novel Music System from a Short Exposure . . . . . . . . 720
Yvonne Leung, Roger T. Dean
9:30 Tonal Cognition: A Dynamic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
Duilio D’Alfonso
10:00 On the Role of Semitone Intervals in Melodic Organization: Yearning vs. Baby Steps . . . . . . . . . . . . . . . . . 727
Hubert Léveillé Gauvin, David Huron, Daniel Shanahan
16:00 Statistical Learning of Novel Musical Material: Evidence from an Experiment Using a Modified
Probe Tone Profile Paradigm and a Discrimination Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
A.-X. Cui, C. Diercks, N.F. Troje, L.L. Cuddy
16:00 Robust Training Effects in Congenital Amusia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
Kelly L. Whiteford, Andrew J. Oxenham
16:00 Listeners’ Perception of Rock Chords: Further Evidence for a More Inclusive Harmonic
Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734
Lincoln Craton, Peter Krahe, Gina Micucci, Brittany Burkins
16:00 Comparative Judgement of Two Successive Pitch Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
Makiko Sadakata, Maartje Koning, Kengo Ohgushi
16:00 The Influence of Diatonicity, Tone Order, and Composition Context on Perception of
Similarity: An Experiment and Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
Mark Shevy, Douglas McLeod, Stephen Dembski
16:00 The Alberti Figure Visualized Challenges the Stability of Tonicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
Nancy Garniez
16:00 Understanding Modulations Through Harmonic Priming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
Olivia Podolak, Mark A. Schmuckler
16:00 The Use of Combined Cognitive Strategies to Solve More Successfully Musical Dictation . . . . . . . . . . . . . 741
Ruth Cruz de Menezes, Maria Teresa Moreno Sala, Matthieu J. Guitton
16:00 “Looseness” in Musical Scales: Interplay of Style and Instrument . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 742
Rytis Ambrazevičius
Pitch & Tonal Perception 6 (Posters)
Saturday, 16:00, Seacliff A–C
11:30 Visual Rhythm Perception of Dance Parallels Auditory Rhythm Perception of Music . . . . . . . . . . . . . . . . . . 767
Yi-Huang Su
11:45 Musical Rhythms Induce Long-Lasting Beat Perception in Listeners with and without Musical
Experience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 768
Karli Nave, Erin E. Hannon, Joel S. Snyder
12:00 Rhythmic Skills in Children, Adolescents and Adults Who Stutter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
Simone Falk, Marie-Charlotte Cuartero, Thilo Müller, Simone Dalla Bella
12:15 Assessment of Rhythm Abilities on a Tablet-Based Mobile Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 770
Theodore P. Zanto, Namita Padgaonkar, Adam Gazzaley
15:00 A Bayesian Approach to Studying the Mental Representation of Musical Rhythm Using the
Method of Serial Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 771
Elisa Kim Fromboluti, Patrycja Zdziarska, Qi Yang, J. Devin McAuley
15:15 Absolute versus Relative Judgments of Musical Tempo: The Roles of Auditory and
Sensorimotor Cues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 772
Justin London, Marc R. Thompson, Birgitta Burger, Petri Toiviainen
15:30 The Perception of a Dotted Rhythm Embedded in a Two-Four-Time Framework . . . . . . . . . . . . . . . . . . . . . . . 773
Chinami Onishi, Yoshitaka Nakajima, Emi Hasuo
15:45 It’s Open to Interpretation: Exploring the Relationship Between Rhythmic Structure and
Tempo Agreement in Bach’s Well Tempered Clavier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 774
Michael Schutz, Raven Hebert-Lee, Anna Siminoski
16:30 Beat Deaf Persons Entrain: Evidence for Neural Entrainment Despite Impaired Sensorimotor
Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 775
Benjamin G. Schultz, Pauline Tranchant, Baptiste Chemin, Sylvie Nozaradan, Isabelle Peretz
17:00 Beat Keeping in a Sea Lion as Coupled Oscillation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776
Andrew Rouse, Peter F. Cook, Edward W. Large, Colleen Reichmuth
17:30 Rhythmic Perceptual Priors Revealed by Iterated Reproduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777
Nori Jacoby, Josh McDermott
14:00 Ensemble Synchronization and Leadership in Jembe Drumming Music from Mali. . . . . . . . . . . . . . . . . . . . . . 778
Nori Jacoby, Justin London, Rainer Polak
14:30 Groove in Context: Effects of Meter and Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779
Richard Ashley
15:00 Single Gesture Cues for Interpersonal Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 780
Esther Coorevits, Dirk Moelants, Marc Leman
16:30 Perceived Difference Between Straight and Swinging Drum Rhythms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 781
Andrew V. Frane
16:30 Predicting Temporal Attention in Music with a Damped Oscillator Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782
Brian K. Hurley, Lauren K. Fink, Petr Janata
16:30 Rhythmic Complexity in Rap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783
Adam Waller, David Temperley
16:30 Exploring Rhythmic Synchronization Abilities in Musicians Using Training-Specific
Movements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 784
Anna Siminoski, Fiona C. Manning, Michael Schutz
16:30 Working Memory and Auditory Imagery Predict Synchronization with Expressive Music . . . . . . . . . . . . . 785
Ian Colley, Peter Keller, Andrea R. Halpern
16:30 Perception of Auditory and Visual Disruptions to the Beat and Meter in Music. . . . . . . . . . . . . . . . . . . . . . . . . 786
Jessica E. Nave-Blodgett, Erin E. Hannon, Joel S. Snyder
16:30 Metricality Modifies the Salience of Duration Accents but not Pitch Accents . . . . . . . . . . . . . . . . . . . . . . . . . . . 787
Jon B. Prince, Tim Rice
16:30 Pupillary and Eyeblink Responses to Auditory Stimuli Index Attention and Sensorimotor
Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 788
Lauren K. Fink, Brian K. Hurley, Joy J. Geng, Petr Janata
16:30 Rhythmic Timing in the Visual Domain Aids Visual Synchronization of Eye Movements in
Adults: Comparing Visual Alone to Audio-Visual Rhythms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789
Melissa Brandon, Nicole Willett, Diana Chase, Dillon Lutes, Richard Ravencroft
Rhythm, Meter & Timing 6 (Posters)
Thursday, 16:00, Seacliff A–C
16:00 Motor System Excitability Fluctuations During Rhythm and Beat Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
Daniel J. Cameron, Jana Celina Everling, T.C. Chiang, Jessica A. Grahn
16:00 Does Anticipating a Tempo Change Systematically Modulate EEG Beta-Band Power? . . . . . . . . . . . . . . . . . . 791
Emily Graber, Takako Fujioka
16:00 An EEG Examination of Neural Entrainment and Action Simulation During Rhythm Perception. . . . . . . 792
Jessica Ross, John R. Iversen, Scott Makeig, Ramesh Balasubramaniam
16:00 The BEAT Test: Training Fundamental Temporal Perceptual Motor Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
Joel Afreth, Scott Makeig, John R. Iversen
16:00 Detrended Cross-Correlation Analysis Reveals Long-Range Synchronization in Paired Tempo
Keeping Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 794
Masahiro Okano, Masahiro Shinya, Kazutoshi Kudo
16:00 Eye Movement Synchronization to Visual Rhythms in Infants and Adults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
Melissa Brandon, Sarah Hackett, Jessica Gagne, Allison Fay
16:00 Neural Entrainment During Beat Perception and its Relation to Psychophysical Performance. . . . . . . . . 799
Molly J. Henry, Jessica A. Grahn
16:00 Deconstructing nPVI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 800
Nathaniel Condit-Schultz
16:00 Quantifying Synchronization in Musical Duo Improvisation: Application of n:m Phase
Locking Analysis to Motion Capture Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805
Shinya Fujii, Masaya Hirashima, Hama Watanabe, Yoshihiko Nakamura, Gentaro Taga
16:00 Spared Motor Synchronization to the Beat of Music in the Presence of Poor Beat Perception . . . . . . . . . . 806
Valentin Bégel, Charles-Etienne Benoit, Angel Correa, Diana Cutanda, Sonja A. Kotz, Simone Dalla Bella
16:00 Rhythmic Consonance and Dissonance: Perceptual Ratings of Rhythmic Analogs of Musical
Pitch Intervals and Chords. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
Alek S. Razdan, Aniruddh D. Patel
16:00 The Effect of Short-Term Training on Synchronization to Complex-Meter Music . . . . . . . . . . . . . . . . . . . . . . . 813
Carson G. Miller Rigoli, Sarah C. Creel
16:00 Do Chord Changes Make Us Hear Downbeats? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 814
Christopher William White, Christopher Pandiscio
16:00 The Influence of Tempo on Subjective Metricization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815
Ève Poudrier
16:00 “Not on the Same Page” — A Cross Cultural Study on Pulse and Complex-Meter Perception. . . . . . . . . . . 816
Hsiang-Ning Dora Kung, Udo Will
16:00 Statistical Learning for Musical Meter Revisited: A Corpus Study of Malian Percussion Music . . . . . . . . 820
Justin London, Nori Jacoby, Rainer Polak
16:00 Entrainment to an Auditory Rhythm: Is Attention Involved? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 821
Richard Kunert, Suzanne R. Jongman
16:00 Walking Cadence is Influenced by Rhythmic Regularity in Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 822
Simone Falk, Elena Maslow, Thierry Legou, Simone Dalla Bella, Eckart Rupp
16:00 Hierarchical Structure of Musical and Visual Meter in Cross-Modal “Fit” Judgments . . . . . . . . . . . . . . . . . . . 823
Stephen E. Palmer, Joshua Peterson
16:00 Musicianship and Tone-Language Experience Differentially Influence Pitch-Duration
Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
Yu Fan, Udo Will
10:30 The Song Remains the Same: Biases in Musical Judgements and the Illusion of Hearing
Different Songs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832
Manuel Anglada-Tort, Daniel Müllensiefen
11:00 Being Moved by Unfamiliar Sad Music is Associated with Empathy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833
Tuomas Eerola, Jonna K. Vuoskoski
11:30 Situational and Dispositional Influences on the Functions of Music Listening . . . . . . . . . . . . . . . . . . . . . . . . . . 834
Fabian Greb, Wolff Schlotz, Jochen Steffens
12:00 Affording Social Interaction and Collaboration in Musical Joint Action. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 835
Ashley E. Walton, Auriel Washburn, Elaine Hollensbe, Michael J. Richardson
14:00 Anytime, Anyplace, Anywhere: The Holy Trinity of Music Streaming? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840
Geoff Luck
14:00 Is Complex Coordination More Important for Maintaining Social Bonds Than Synchronization?. . . . . . . 841
Jacques Launay, Robin I.M. Dunbar
14:00 Why Listen to Music Right Now? — Towards an Inventory Measuring the Functions of Music
Listening Under Situational Influences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 842
Jochen Steffens, Fabian Greb, Wolff Schlotz
14:00 The Effect of Music Listening on Theory of Mind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
Jonna K. Vuoskoski, David M. Greenberg, Tuomas Eerola, Simon Baron-Cohen, Jason Rentfrow
14:00 Musical Eclecticism in Terms of the Human Development Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
Michael Barone, Matthew Woolhouse
14:00 Empathy and Moral Judgment: The Effects of Music and Lyrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845
Naomi Ziv, Shani Esther Twana
14:00 The Effects of Music on Consumer Mood States and Subsequent Purchasing Behavior in an
Online Shopping Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 846
Alison Wong, Nicola Dibben
14:00 “Gonna Make a Difference” — Effects of Prosocial Engagement of Musicians on Music
Appraisal of Recipients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847
Nicolas Ruth
14:00 A Journey to Ecstasy: The Lived Experiences of Electronic Dance Music Festival Attendees . . . . . . . . . . 848
Noah Little, Birgitta Burger
14:00 A Psychological Approach to Understanding the Varied Functions That Different Music
Formats Serve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 849
Steven Caldwell Brown, Amanda E. Krause
14:00 Picking up Good Vibrations: The Effect of Collaborative Music Making on Social Bonding and
Enjoyment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 852
Stephen Ash, Joshua D. Albrecht
The Timbre Toolbox and the MIRtoolbox are the most popular Matlab toolboxes used for
audio feature extraction within the MIR community. They have recently been evaluated
in terms of the number of features offered, the user interface, and computational
efficiency, but the accuracy of the extracted features has rarely been assessed. In this
study we evaluate the robustness of the audio descriptors that the two toolboxes have in
common and that are putatively related to timbre. For
this purpose we synthesized various soundsets using additive synthesis, which allows for
a direct computation of the audio descriptors. We evaluate toolbox performance by
analyzing the soundsets with each toolbox and calculating the normalized RMS error
between their output and the theoretical values. Each soundset was designed to exhibit
specific sound qualities that are directly related to the descriptors being tested. In this
way we are able to systematically test the performance of the toolboxes by tracking under
which circumstances certain audio descriptors are poorly calculated. For the analysis we
used the default settings of each toolbox, and summary statistics were derived from the
frame-by-frame analyses as median values. For spectral descriptors the Timbre Toolbox
performs better, outperforming the MIRtoolbox on some soundsets, with the short-term
Fourier transform power representation being the most robust. The Timbre
Toolbox failed to analyze some sounds using the harmonic representation even though all
sounds were strictly harmonic. The MIRtoolbox's calculations of spectral centroid on
some sounds and spectral irregularity on a specific soundset proved to be numerically
unstable. For descriptors based on the estimation of the amplitude envelope, both
toolboxes perform almost equally, but poorly; here the errors depend both on the length
of the attack or decay times and on the shape of the slopes.
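The evaluation loop described here — synthesize sounds with known additive parameters, extract a descriptor, and score the normalized RMS error against the theoretical value — can be sketched as follows. This is an illustrative Python/NumPy reconstruction of the idea, not the authors' Matlab pipeline; the choice of the spectral centroid as the descriptor, and the frame size, hop, and Hann window, are assumptions made for the sketch.

```python
import numpy as np

def additive_tone(f0, amps, sr=44100, dur=1.0):
    """Synthesize a strictly harmonic tone by additive synthesis.
    amps[k] is the amplitude of partial k+1 at frequency (k+1)*f0."""
    t = np.arange(int(sr * dur)) / sr
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t)
               for k, a in enumerate(amps))

def theoretical_centroid(f0, amps):
    """Spectral centroid implied directly by the synthesis parameters:
    the amplitude-weighted mean of the partial frequencies."""
    a = np.asarray(amps, dtype=float)
    freqs = f0 * np.arange(1, len(a) + 1)
    return np.sum(freqs * a) / np.sum(a)

def estimated_centroid(x, sr=44100, n_fft=2048, hop=512):
    """Frame-wise spectral centroid from STFT magnitudes, summarized
    over frames by the median (mirroring the study's use of median
    values as summary statistics)."""
    win = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        mag = np.abs(np.fft.rfft(x[start:start + n_fft] * win))
        frames.append(np.sum(freqs * mag) / np.sum(mag))
    return float(np.median(frames))

def normalized_rms_error(estimates, truths):
    """RMS of the per-sound relative errors between the extracted
    descriptor values and the theoretical ones."""
    e = np.asarray(estimates, dtype=float)
    t = np.asarray(truths, dtype=float)
    return float(np.sqrt(np.mean(((e - t) / t) ** 2)))
```

For example, a tone with partial amplitudes [1, 0.5, 0.25] at f0 = 440 Hz has a theoretical centroid of 1210/1.75 ≈ 691.4 Hz; the STFT-based estimate should land close to it, with any deviation driven mainly by spectral leakage.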
ISBN 1-876346-65-5 © ICMPC14 1 ICMPC14, July 5–9, 2016, San Francisco, USA
In a temporal order judgment (TOJ) task, participants are required to judge the order of
two or more stimuli presented close together in time (e.g. which stimulus occurred first).
Studies of tactile perception often use TOJ tasks to draw inferences about the temporal
and spatial limits of the tactile sense (i.e. how precisely the tactile sense can locate a
stimulus in time or in external space). When a tactile stimulus is
received at the skin, it is thought to be processed with regard to two properties:
somatotopic (i.e. what part of the skin received the stimulus) and external spatial (i.e.
where that part of the skin was in external space when it received the stimulus). Postural
manipulations (e.g. crossing the hands into opposite external hemifields) can cause a
conflict between these two properties and impair the temporal and spatial limits of the
tactile sense in that context. Given their purpose, TOJ tasks can therefore help assess the
impairment. Previous studies by other researchers, and pilot studies by the current
authors, suggest that musicians have finer temporal and spatial acuity in the tactile
sense; it is therefore of interest whether the temporal and spatial impairments associated
with postural manipulations affect musicians in the same way as non-musicians. A TOJ
task with both normal and postural-manipulation conditions was administered to a group
of musicians and non-musicians. Musicians
performed significantly better than non-musicians in all conditions, with the largest
differences in conditions with conflict between somatotopic and external spatial
properties. Possible reasons for the differences in performance, and the apparent
resilience of musicians to impairment by postural manipulation, are briefly discussed.
ISBN 1-876346-65-5 © ICMPC14 2 ICMPC14, July 5–9, 2016, San Francisco, USA
Sound mass, the grouping of multiple sound events or sources into perceptual units that
retain an impression of multiplicity, is essential to daily auditory experience (e.g. when
we recognize a crowd, traffic, or rain as a “single” sound). It is also highly relevant to
musical experience: composers including Ligeti, Xenakis, and many others have
exploited it at length. Such composers often describe their aesthetic aims in vividly
evocative terms – clouds, galaxies, crystal lattices – but it remains unclear to what extent,
if any, such metaphors relate to the experiences of listeners.
ISBN 1-876346-65-5 © ICMPC14 3 ICMPC14, July 5–9, 2016, San Francisco, USA
Background
Musical nuance, the fine-grained deviations in pitch, loudness, timing, and tone quality
within a musical phrase, is critical to listeners' perceptions. Research suggests that music
training may influence nuance perception in young adults; however, little is known about
the extent to which aging or other demographic variables, such as gender, influence
musical nuance perception (Bugos, Heller, & Batcheller, 2015).
Aims
The aims were to evaluate the role of gender on musical nuance perception in older adults
and to compare the performance of older adults with younger adults.
Methods
One hundred twenty healthy older adults (47 males, 73 females) were recruited from an
independent living facility and screened for cognitive deficits and hearing loss. Younger
adults (n=120) were recruited from a local urban university. Participants completed a
short demographic survey and the Musical Nuance Task (MNT). The MNT is a highly
reliable 30-item measure: in 15 items, three short musical phrases are performed on the
same instrument (cello, clarinet, or piano); in the other 15 items, the three phrases are
each performed on a different one of these instruments. In each item, two of the three
performed phrases were considered the "same" in nuance and one was considered
"different" in nuance.
Results
Results of a MANOVA on the MNT showed significantly faster reaction times for older
adult females compared to males. While no differences were found in the total number
of errors on the MNT, older adult females committed fewer errors on the last 15 trials,
the items presented with multiple timbres. No differences in error rates were found
between younger and older adults; however, younger adults showed faster reaction
times than older adults.
Conclusions
Nuance perception is more difficult with contrasting timbres. Gender differences in
auditory processing may influence nuance perception.
Insights into the complexities of music listening for hearing aid users
Alinka Greasley1, Harriet Crook2, Robert Fulford1, Jackie Salter1
1
School of Music, University of Leeds, Leeds, UK
2
Sheffield Teaching Hospitals NHS Foundation Trust, Sheffield, UK
Music listening has significant health and well-being benefits, including for people with
all levels of hearing impairment. Digital hearing aids (HA) are optimised for speech
amplification and can present difficulties for music perception. A UK-based project is
currently exploring how music listening behaviour is shaped by hearing impairment and
the use of HA technology. Findings from two studies will be reported. First, a clinical
questionnaire explored the extent of music listening issues and the frequency and success
of discussions with audiologists about music. Data from 176 HA users (age range 21-93,
average=60.35, SD=18.07) showed that problems with music listening were often
experienced and almost half reported that this negatively affects their quality of life.
Participants described issues listening to live music performances, hearing words in
songs, the loss of music from their lives and associated social isolation. Most had never
talked with their audiologist about music and, for those that had, improvements were
rarely reported. A second study explored listening experiences of HA users in greater
depth, with the collection of pure tone audiometry to facilitate interpretation of accounts.
The sample (n=22, age range 24-82, average=62.05, SD=24.07) included participants
with a range of hearing impairments and levels of musical training. Participants described
a variety of listening behaviours and preferences combined with different patterns of HA
and Assistive Listening Device use. Analyses showed interactions between contextual
factors such as the individual nature of hearing loss (e.g. type, level, duration), levels of
musical engagement (e.g. daily exposure, training), and listening contexts (e.g. recorded
music at home or while travelling, live performances). These studies provide new data on a poorly
understood topic that will be used to inform the design of a national survey which seeks
to identify patterns in the listening behaviour of a wider population of HA users.
On the Sensations of Tone is a series of compositions that use specific acoustic material
to create a separate stream of music generated by the body. Upon the simultaneous
presentation of two closely related pure-tone frequencies, auditory distortion products are
produced on the basilar membrane in the cochlea and result in the perception of
additional tones not present in the acoustic space. From 18th century concert music to
contemporary electronic music, composers have implemented “combination tones” in
pieces using instruments ranging from the violin to computer-based synthesizers. The aim
of the present work was to examine the acoustic properties of the orchestral crotales
within the context of combination tones and the composition On the Sensations of Tone
VIII. The crotales are small cymbals chromatically tuned in the range of C6 to C8 (or F8),
and are usually played with mallets or sometimes bowed. Pre-compositional methods
involved conducting a spectral analysis in MATLAB of the upper octave crotales,
calculating the appropriate distortion products, and matching the acoustic tones with
synthesized frequencies. The results indicated clearly perceivable additional tones, which
informed the compositional process. A case study of the piece illustrates how distortion
products and virtual pitches allow for additional harmonic material. Distortion products
also allow for expanded spatial depth between acoustic material and material generated in
the ear. Macroscopic and microscopic listening is possible by shifting attention between
the different streams, and contributes to a unique listening experience.
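The distortion products described above follow directly from the two stimulus frequencies: the quadratic difference tone at f2 − f1 and the cubic difference tone at 2f1 − f2. A minimal sketch (the input frequencies below are illustrative values near the crotale range, not measured partials from the piece):

```python
def distortion_products(f1, f2):
    """Frequencies (Hz) of two prominent auditory distortion products
    evoked by simultaneous pure tones f1 < f2: the quadratic difference
    tone f2 - f1 and the cubic difference tone 2*f1 - f2."""
    assert 0 < f1 < f2
    return {"quadratic": f2 - f1, "cubic": 2 * f1 - f2}

# Illustrative pair of closely related frequencies (C7 is roughly 2093 Hz):
dp = distortion_products(2093.0, 2500.0)
# quadratic: 407.0 Hz, cubic: 1686.0 Hz -- both well below the acoustic tones,
# which is what creates the separate "stream" generated in the ear.
```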
While music is primarily experienced in the auditory domain, many aspects of music can
also be experienced through vibrotactile stimulation alone. This paper provides an
overview of research in our lab that has assessed the perception of vibrotactile music.
Specific features of music considered include pitch, rhythm, and timbre. We also
consider differences in vibrotactile and auditory perception with regard to higher-level
constructs such as melody and emotion. Finally, we consider methodological issues in
this research and the manner in which the utilization of cues may change as a function of
hearing loss, experience, and magnitude of stimulation.
ABSTRACT
How can we improve abstract pattern recognition in music? Can principles enhancing
visual learning be extended to auditory stimuli, such as music? Perceptual learning (PL),
improvement in the pickup of information from experience, is well established in both
vision and audition. Most work has focused on low-level discriminations rather than on
the extraction of relations in complex domains. In vision, recent research suggests that PL
can be systematically accelerated in complex domains, including aviation, medicine, and
mathematics, through perceptual and adaptive learning modules (PLMs). PLMs use brief
interactive classification trials along with adaptive spacing to accelerate perceptual
category learning. We developed a Composer PLM to apply these techniques to
recognizing composers' styles, a challenging, abstract PL task involving complex
relations over time. We investigated (1) whether participants could learn composers'
styles, and (2) whether the PLM framework could be successfully extended to rich
auditory domains. On each PLM trial, participants listened to a 15-second clip of solo
piano music by one of 4 composers (2 Baroque: Bach, Handel; 2 Romantic: Chopin,
Schumann), attempted to identify the composer, and received feedback. Before and after
multiple PLM sessions, we tested participants' ability to distinguish clips from composers
in the PLM, unlearned composers in trained periods (Baroque: Scarlatti; Romantic:
Mendelssohn), and composers in untrained periods (Renaissance: Byrd; post-Romantic:
Debussy). At pretest, participants' sensitivity was at chance, but they showed robustly
improved classification at posttest on both composer styles (p < .01) and period styles
(p < .001). Results indicate that PLM training can improve participants' recognition of
composers' styles, demonstrating that (1) composer style can be learned, and (2)
PL-based interventions are effective in complex auditory domains.

I. INTRODUCTION
Humans have impressive abilities to recognize structure and patterns in the world, in
multiple modalities and many domains. For example, we can recognize familiar voices
on the phone and identify objects at unusual sizes and angles. Music in particular is a
useful domain in which to study pattern recognition: its many patterns, at many levels of
complexity and abstraction, allow research into auditory perception across the spectrum
from basic processing to high-level real-world tasks. The simplest patterns in music are
specific note durations and pitches. Themes are more complex patterns, but even
untrained listeners can recognize a theme at its restatement in a piece they have not heard
before (Java, Kaminska, & Gardiner, 1995). The styles of composers are perhaps the
most abstract and complex patterns that humans can detect in music, because these
patterns transcend individual pieces or works. Research on perceptual learning, as well as
research specifically on composer style, contributes insight into our ability to recognize
composer styles, which are abstract auditory patterns.

A. Perceptual Learning
Perceptual learning (PL) refers to improvements in the pickup of information as a
function of experience (Gibson, 1969). In particular, PL has been shown to accelerate the
development of expert pattern recognition, contributing to the development of expertise.
PL is well established in both vision (Sagi, 2011) and audition (Sabin, Eddins, & Wright,
2012). Much work in both modalities has focused on expertise in low-level
discriminations. For example, many studies have demonstrated PL in auditory duration
discrimination (e.g. Karmarkar & Buonomano, 2003) or for oriented visual gratings (e.g.
Song, Peng, Lu, Liu, & Li, 2007). However, perceptual learning also supports the
extraction of relations in complex domains. In vision, recent research suggests that PL
can be systematically accelerated in aviation (Kellman & Kaiser, 1994), medicine (Thai,
Krasne, & Kellman, 2015), mathematics (e.g. Kellman & Massey, 2013; Kellman,
Massey, & Son, 2010; Landy & Goldstone, 2007), and other complex domains.
Perceptual and adaptive learning modules (PALMs) accomplish this acceleration by
using brief, interactive classification trials (e.g. Mettler & Kellman, 2014; Mettler,
Massey, & Kellman, 2011). PALMs have so far been created only for visual stimuli,
which do not share the temporal nature of auditory stimuli. Can principles enhancing
visual learning be extended to auditory stimuli, such as music?

B. Composer Style
Psychologists studying musical style perception have focused primarily on documenting
human sensitivity to period styles (e.g. Hasenfus, Martindale, & Birnbaum, 1983),
leading to the development of hundreds of machine learning programs that simulate this
impressive human ability (e.g. Widmer, 2005). However, while the human ability to
recognize patterns in period styles has been well documented, psychologists have done
very little empirical work on whether humans have a similar sensitivity to composer
styles, a topic that musicologists have long discussed. Tyler (1946) played three
movements of a Mozart sonata and three movements each from trios by Beethoven and
Schubert to students in a music appreciation course and found that students could
distinguish between the composers better than chance, but she did not control for musical
form (e.g. sonata or ballad), nor did she study learning. Crump (2002) chose J. S. Bach
and Mozart because the composers were "stylistically distinct", but used only the
Goldberg Variations for Bach and minuets for Mozart. Participants and a machine
learning algorithm successfully learned to distinguish these composers, but confounding
musical form with composer limits the generalizability of these results. Given the
limitations of these studies, the empirical evidence for composer style, and for its
learnability, remains uncertain.
C. Current Study
We developed a Composer PALM to apply perceptual learning technology to
recognizing composers' styles, a challenging, abstract perceptual learning task involving
complex relations over time. We investigated (1) whether participants could learn
composers' styles, and (2) whether the PALM framework could be successfully extended
to a rich auditory domain.

II. METHOD
D. Participants
Forty-three undergraduate students (19 men, 24 women) at the University of California,
Los Angeles participated for course credit in psychology or linguistics courses. Two
participants were excluded for non-completion of the posttest. Included participants had
learned to play an instrument for 7.17 years on average, and piano specifically for 6.88
years on average. Seventeen participants had no prior musical training.

E. Materials
1) Music Stimuli. The stimuli were clips from eight composers spanning four time
periods (Renaissance: Byrd; Baroque: Bach, Handel, and Scarlatti; Romantic: Chopin,
Mendelssohn, and Schumann; post-Romantic: Debussy). The composers were chosen
partly on the basis of the size and availability of their musical output. The stimuli were
15-second clips of classical piano music collected from Youtube.com or the UCLA
Music Library and were balanced with respect to stylistic features such as tempo and
form. We used only piano music so that instrumentation would not be confounded with
composer or musical period; timbral information was thus relatively constant across
recordings, periods, and composers, and participants had to rely on more temporally
extended patterns to learn to distinguish composers. We used only modern solo piano
recordings (no harpsichord or four-hand recordings).

2) Assessments. The pretest and posttest assessments consisted of twenty-four clips
from the four composers trained in the Composer PALM (Bach, Handel, Schumann, and
Chopin) and four untrained composers (Byrd, Scarlatti, Mendelssohn, and Debussy).

Sixteen of the clips were from the trained composers (four each). For each trained
composer, about two clips were included in the PALM and about two were novel clips
not included in the PALM, to test both for learning in the PALM and for transfer. For
each composer, the clips were divided between the composer's typical and atypical
composition styles. Experienced listeners determined typicality by listening extensively
to all of the collected clips and subjectively categorizing each as typical or atypical of the
composer's style.

The other eight clips were from the untrained composers (two each): four from trained
periods (Baroque and Romantic) and four from untrained periods (Renaissance and
post-Romantic).

In each of the 24 trials, participants listened to a clip and chose who they thought
composed it. They were given seven response options, clustered by period. Four of the
options were the four PALM-trained composers; the remaining three were "Other
Baroque" (Scarlatti), "Other Romantic" (Mendelssohn), and "Other Period" (Byrd or
Debussy) (see Figure 1).

There were three versions of the assessment (A, B, C), all containing unique clips. The
number of clips per composer in each version was held constant. Participants were
randomly assigned versions for their pretest and posttest, without replacement, such that
no participant received the same version at posttest as at pretest.

Figure 1. Response options for assessment. The "Other Baroque" composer was
Scarlatti, the "Other Romantic" composer was Mendelssohn, and the "Other Period"
composers were Byrd (Renaissance) and Debussy (post-Romantic).

3) Intervention. The Composer PALM consisted of 400 clips: 100 from each of the
four trained composers (Baroque: Bach and Handel; Romantic: Chopin and Schumann).
Participants were presented with many trials, each containing one of the fifteen-second
clips. In each trial, participants listened to a music clip and attempted to choose the
correct composer from the four given composers (Bach, Handel, Chopin, and
Schumann). If they answered correctly, the PALM showed that their answer was correct
and let them proceed to the next trial (see Figure 2, Panel A). If their response was
incorrect, participants were shown the correct answer and listened to the remainder of the
music clip (see Figure 2, Panel B).

For every twenty trials completed, participants were given a feedback summary of their
average accuracy and average response time over the block. Participants were also given
feedback on mastery, using objective criteria: a participant reached mastery for a given
composer when they correctly answered four of five consecutive trials of that composer,
with a response time of less than 21 s on those correct trials. Mastery level, the number of
composers mastered, was indicated in the PALM by four circles at the bottom of the
screen; for each composer mastered, one circle was filled in green. The module was
delivered online, via a web browser.

F. Procedure
Participants took the pretest in the laboratory. Afterwards, they were instructed to
undergo PALM training on their own time, using their own computer, for 45 minutes
every day for seven days or until they completed the PALM by reaching mastery for all
four composers. We sent daily emails reminding participants to complete their training
for the day and updating them on their progress. Following completion of the
PALM or a maximum of seven days of training, the participants returned to the
laboratory to take the posttest.

Figure 2. Screenshots of PALM trials showing feedback. Panel A shows a correctly
answered PLM trial. Each trained composer is a response option, and their period is
noted in parentheses. Feedback on accuracy was given on every trial. Panel B shows an
incorrectly answered PLM trial. Participants saw which composer they had selected and
which composed the music in the clip. After incorrect answers, participants listened to
the rest of the clip.

G. Dependent Measures
This study used sensitivity (d') as the dependent measure, with a correction of
1/(10n + 1), where n is the number of signal trials for hit rates and the number of noise
trials for false-alarm rates. For example, of the twenty-four assessment trials, four were
music clips by Bach, so Bach had a signal-trial correction of 1/(10(4) + 1) = 1/41 and a
noise-trial correction of 1/(10(20) + 1) = 1/201.

III. RESULTS
H. PALM
On average, participants completed 482.10 (range: 94 - 979,

Assessment (A, B, C) as a between-subjects variable and Period as a within-subjects
variable. There was no interaction, but there was a statistically significant main effect of
Assessment such that performance on version A (N = 12, M = -0.26, SE = 0.14) was
significantly worse than performance on versions B (N = 12, M = 0.29, SE = 0.14) and C
(N = 17, M = 0.24, SE = 0.12), Pillai's Trace F(1,40) = 4.91, p = .01,
partial-eta-squared = 0.21. Because of this performance difference, we analyzed
improvement from pretest to posttest using analyses of covariance (ANCOVAs) of
sensitivity change (d' change = posttest d' - pretest d'), controlling for pretest sensitivity.

J. Composers
One of our primary goals was to empirically investigate whether composer style could be
identified and learned, by testing sensitivity to clips' composers. We conducted a custom
hypothesis test of the hypothesis of no sensitivity change (d' change = 0) in a
repeated-measures ANCOVA for Composer (Bach, Chopin, Handel, Mendelssohn,
Scarlatti, Schumann, Other Period [Byrd+Debussy]) on sensitivity change, covarying out
pretest sensitivity. The custom hypothesis test confirmed that participants significantly
improved their sensitivity to composers (see Figure 3) from pretest to posttest (M = 0.61,
SE = 0.21), F(1,33) = 8.29, p < .01, partial-eta-squared = 0.20. We also found a
significant main effect of Composer: participants improved on all composers (Bach
M = 0.26, SE = 0.14; Chopin M = 0.36, SE = 0.14; Mendelssohn M = 0.43, SE = 0.10;
Scarlatti M = 0.29, SE = 0.10; Schumann M = 0.59, SE = 0.14; Byrd and Debussy
M = 0.38, SE = 0.12) except Handel (M = -0.17, SE = 0.14), Pillai's Trace
F(6,28) = 3.05, p = .02, partial-eta-squared = 0.40.

Figure 3. Pretest and posttest sensitivity (d') by composer (only the d' axis labels and
the Pretest/Posttest legend are recoverable from the source).
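The 1/(10n + 1) correction in the Dependent Measures section can be applied in more than one way; the sketch below assumes it clamps hit and false-alarm rates away from 0 and 1 before the z-transform, which is one common convention (the function name and the clamping choice are ours, not the paper's):

```python
from statistics import NormalDist

def corrected_dprime(hits, n_signal, false_alarms, n_noise):
    """Sensitivity d' with each rate kept at least 1/(10n + 1) away from
    0 and 1, where n is the number of signal trials (hit rate) or noise
    trials (false-alarm rate), matching the text's Bach example of
    corrections 1/41 and 1/201."""
    eps_s = 1.0 / (10 * n_signal + 1)
    eps_n = 1.0 / (10 * n_noise + 1)
    hit_rate = min(max(hits / n_signal, eps_s), 1.0 - eps_s)
    fa_rate = min(max(false_alarms / n_noise, eps_n), 1.0 - eps_n)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Bach in one assessment: 4 signal trials, 20 noise trials.
# A perfect score still yields a finite d' thanks to the correction:
d = corrected_dprime(4, 4, 0, 20)
```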
change in sensitivity (d' change = 0) in a repeated-measures ANCOVA of Period
(Baroque, Romantic, Other Period [Renaissance + post-Romantic]) on sensitivity change,
covarying out pretest sensitivity to periods. The custom hypothesis test demonstrated that
participants significantly improved their sensitivity to periods (M = 1.18, SE = 0.17)
through incidental exposure during the PALM (see Figure 4), F(1,37) = 40.06, p < .001,
partial-eta-squared = 0.52. The ANCOVA revealed no main effect of Period: participants
performed equally well on the Baroque (M = 0.59, SE = 0.13) and Romantic (M = 0.78,
SE = 0.15) periods and on the combination of the Renaissance and post-Romantic periods
(M = 0.38, SE = 0.12).

Figure 4. Pretest and posttest sensitivity (d') by period (only the d' axis labels and the
Pretest/Posttest legend are recoverable from the source).

F(1,40) = 12.81, p < .001, partial-eta-squared = 0.24. Participants were actually more
sensitive to novel Bach (M = 0.38, SE = 0.18) and novel Handel (M = 0.72, SE = 0.22)
clips than to familiar Bach (M = -0.23, SE = 0.18) and familiar Handel (M = -0.09,
SE = 0.17) clips. This contradicts a clip-memorization explanation, because memorization
predicts better performance on learned clips, not better performance on novel clips.
There is no obvious explanation for the superiority of novel clips, but one possibility is
that the clips used in training were somewhat less representative of the composer, or
more difficult in some way, than the clips used in the assessments.

We could not include Chopin and Schumann in the above analysis because a
programming error had caused assessment version B to have only novel clips (no clips
included in the PALM) for Schumann, and assessment version C to have only novel clips
for Chopin. To analyze familiarity for these composers, we conducted paired-samples
t-tests on the data of participants not affected by the error. For Chopin, participants
(N = 27) were equally sensitive to familiar (M = 0.33, SE = 0.25)
empirical evidence for the existence and learnability of composer style in humans.

Perhaps psychologists have rarely studied composer style before because of its
complexity and the difficulty of learning it. Undergraduates who had learned to play an
instrument (e.g. piano) for 7.17 years on average showed no sensitivity to composer or
period styles at pretest; with PALM training they showed significant sensitivity to both
composer styles and period styles. Seven years of musical training was insufficient for
learning composer and period styles, yet with only four and a half hours of PALM
training, participants gained expertise in these styles that would otherwise take years to
acquire. This is remarkable.

Our results demonstrate that perceptual learning technology is effective for learning
auditory stimuli, even in complex and challenging domains. This suggests that perceptual
learning interventions could accelerate learning in other complex auditory domains.
Future work might use perceptual learning technology to investigate these domains,
including important domains such as language.

ACKNOWLEDGMENT
The authors gratefully acknowledge support from NSF REESE award #110922 and the
US Department of Education, Institute of Education Sciences (IES), Cognition and
Student Learning (CASL) Program awards R305A120288 and R305H060070. We thank
Timothy Burke, Joel Zucker, Rachel Older, and the Human Perception Lab research
assistants for expert assistance.

REFERENCES
Crump, M. (2002, May). A principal components approach to the perception of musical
style. Paper presented at the Banff Annual Seminar in Cognitive Science (BASICS),
Banff, Alberta.
Gibson, E. J. (1969). Principles of perceptual learning and development. New York, NY:
Appleton-Century-Crofts.
Hasenfus, N., Martindale, C., & Birnbaum, D. (1983). Psychological reality of
cross-media artistic styles: Human perception and performance. Journal of
Experimental Psychology, 9, 841–863.
Java, R. I., Kaminska, Z., & Gardiner, J. M. (1995). Recognition memory and awareness
for famous and obscure musical themes. European Journal of Cognitive Psychology,
7, 41–53.
Karmarkar, U. R., & Buonomano, D. V. (2003). Temporal specificity of perceptual
learning in an auditory discrimination task. Learning & Memory, 10(2), 141–147.
Kellman, P. J., & Kaiser, M. K. (1994, October). Perceptual learning modules in flight
training. In Proceedings of the Human Factors and Ergonomics Society Annual
Meeting (Vol. 38, No. 18, pp. 1183–1187). SAGE Publications.
Kellman, P. J., Massey, C. M., & Son, J. Y. (2010). Perceptual learning modules in
mathematics: Enhancing students' pattern recognition, structure extraction, and
fluency. Topics in Cognitive Science, 2(2), 285–305.
Mettler, E., & Kellman, P. J. (2014). Adaptive response-time-based category sequencing
in perceptual learning. Vision Research, 99, 111–123.
Mettler, E., Massey, C., & Kellman, P. J. (2011). Improving adaptive learning
technology through the use of response times. In L. Carlson, C. Holscher, & T.
Shipley (Eds.), Proceedings of the 33rd Annual Conference of the Cognitive Science
Society (pp. 2532–2537). Boston, MA: Cognitive Science Society.
Sabin, A. T., Eddins, D. A., & Wright, B. A. (2012). Perceptual learning evidence for
tuning to spectrotemporal modulation in the human auditory system. The Journal of
Neuroscience, 32(19), 6542–6549.
Sagi, D. (2011). Perceptual learning in vision research. Vision Research, 51(13),
1552–1566.
Song, Y., Peng, D., Lu, C., Liu, C., Li, X., Liu, P., et al. (2007). An event-related
potential study on perceptual learning in grating orientation discrimination.
Neuroreport, 18(9), 945–948.
Thai, K. P., Krasne, S., & Kellman, P. J. (2015). Adaptive perceptual learning in
electrocardiography: The synergy of passive and active classification. In Proceedings
of the 37th Annual Conference of the Cognitive Science Society. Austin, TX:
Cognitive Science Society.
Tyler, L. E. (1946). An exploratory study of discrimination of composer style. The
Journal of General Psychology, 34(2), 153–163.
Widmer, G. (2005). Studying a creative act with computers: Music performance studies
with automated discovery methods. Musicae Scientiae, 9(1), 11–30.
Offset detection has received relatively little attention in the field of auditory perception
compared to onset detection. Previous studies of offset detection thresholds for musical
elements have examined harmonic offsets within a complex tone (Zera & Green, 1993),
as well as temporal offsets of pairs of tones (Pastore, 1983). In a temporal offset
detection task, two tones with identical onsets but slightly different offsets are played
through headphones, and a listener is asked to determine which tone ended first. By
varying the offset asynchrony of the tones across trials, one can estimate the listener’s
temporal offset judgment (TOJ) threshold: the minimum offset asynchrony required to
detect that the tones end at different times. The durations of the tones used in these
studies are typically very short (e.g., ranging in duration from 10 to 300 ms in Pastore,
1983). In contrast, notes in a musical piece, whether played by an individual instrument
or an instrumental section, can last for several seconds. Heise, Hardy, and Long (2015)
reported a TOJ threshold of approximately 100 ms for tones lasting 1500-2000 ms, but
this result warrants validation as it is much higher than reported by Pastore and assumes
100% detection (vs. 70.7% by Pastore). Our current project estimates TOJ thresholds for
a set of tones with durations similar to what listeners might encounter in a piece of music
played at a moderate tempo (120 bpm). We are conducting a psychophysical experiment
to estimate TOJ thresholds for two inharmonic tones (256 Hz and 400 Hz) played at 125,
250, 500, 1000, 1500, and 2000 ms (corresponding to sixteenth, eighth, quarter, half,
dotted-half, and whole notes, respectively). We predict that TOJ thresholds will increase
logarithmically as note duration increases. A function that relates note duration and TOJ
threshold will be useful for designing a musical transcription system or other sound event
detection algorithm that matches human perception.
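The mapping from note values to the stimulus durations above is simple arithmetic: at 120 bpm a quarter note lasts 60000/120 = 500 ms. A quick sketch of that conversion:

```python
def note_duration_ms(beats, bpm=120.0):
    """Duration in milliseconds of a note lasting `beats` quarter-note
    beats at the given tempo."""
    return 60000.0 / bpm * beats

# The six note values used in the study, in quarter-note beats:
note_values = {"sixteenth": 0.25, "eighth": 0.5, "quarter": 1.0,
               "half": 2.0, "dotted-half": 3.0, "whole": 4.0}
durations = {name: note_duration_ms(b) for name, b in note_values.items()}
# -> 125, 250, 500, 1000, 1500, 2000 ms, matching the stimulus durations
```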
Music training has been associated with improvements in both musical and non-musical cognitive
domains, as reported by a number of studies. Conflicting findings, however, make it difficult to
conclude exactly in what fashion musicians outperform non-musicians through systematic
reviews alone. To address this question, two meta-analyses were conducted to delineate whether
musicians and non-musicians differ across a range of cognitive and music processing domains,
and to determine the magnitude of such differences. Moreover, potential covariates (derived from
existing literature) were included as moderator variables. The most common of such covariates
was the amount of training that musicians have received; thus, it was hypothesized that years of
training would predict performance across domains. Additional variables believed to contribute to
differences between musicians included age of training onset, type of training, and so on, and
were similarly employed as moderator variables. As predicted, musicians significantly differed
from non-musicians on all music processing domains. Musicians also differed on many cognitive
domains: Construction and Motor Performance (d=.68), Executive Functions (d=.65), Memory
(d=.29), Neuropsychological Assessment (d=.31), Orientation and Attention (d=.40), Perception
(d=.35), and Verbal Functions and Academic Skills (d=.43); but not Concept Formation and
Reasoning (d=.18). More surprising was the finding from meta-regression analyses that years of
training were not related to performance in most of the cognitive and music processing
domains. In fact, years of training predicted superior performance only in Construction and Motor
Performance and in Temporal Perception (p<.05). This pattern indicates that, although musicians
and non-musicians clearly differ across a range of musical and cognitive processing dimensions,
the amount of training does not explain those differences. These results
highlight the need for a standardized protocol for defining musician samples.
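The effect sizes reported above are Cohen's d values. For reference, the standard pooled-standard-deviation computation looks like this (this is the textbook formula, not code from the meta-analyses themselves, and the numbers below are illustrative):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: the standardized mean difference between two groups,
    scaled by their pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Equal SDs of 1.0 and a raw mean difference of 0.65 give d = 0.65,
# comparable in size to the Executive Functions effect reported above.
d = cohens_d(0.65, 1.0, 50, 0.0, 1.0, 50)
```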
In electroacoustic music, the way in which sound spectra evolve over time often
resembles geometric shapes (e.g., Smalley's 'spectromorphology'). For instance, such
shapes may delineate a spectrotemporal evolution in which two sides of a triangle diverge
in frequency over time. Although the perceptual clarity of the shape could be implied by
the diverging sides' trajectories, the spectral content enclosed therein could also bear
some morphological significance, suggesting that sound shapes could involve several
perceptual aspects. This study explored the acoustic factors influencing the perception of
sound shapes and how they contribute to several perceptual attributes. Triangular sound
shapes, constant in duration and frequency extent, were studied as a function of principles
known from auditory scene analysis and different scalings of frequency and amplitude. A
listening experiment employed stimuli created by granular synthesis, which allowed
control over the density of sinusoidal grains. Two independent variables varied the
proximity among sound grains along frequency and time, respectively. Two remaining
independent variables involved different frequency and amplitude scalings, based on
either linear or psychoacoustic functions. The dependent variables considered several
ratings, assessing the perceived degree of clarity, opacity or balance of the implied shape
and the sound's degree of heterogeneity. Results from a pilot study suggest that all
investigated factors influenced perceptual aspects of sound shape. Clarity of shape
appeared to be largely unaffected by granular density, whereas density appeared to
influence other perceptual aspects. These preliminary trends require further confirmation
in the forthcoming months, which may allow associating acoustic factors with the
perceptual aspects by means of multivariate regression. Further insight into the
underlying factors will inform more complex perceptual studies on electroacoustic music.
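The stimulus construction described above can be sketched computationally. The sketch below is a minimal illustration of granular synthesis of a triangular sound shape, assuming Hann-windowed sinusoidal grains scattered over a frequency band that widens linearly over time; the function name and all parameter values are illustrative assumptions, not the study's actual settings.

```python
import numpy as np

def triangular_grain_cloud(dur=2.0, f_lo=500.0, f_hi=2000.0,
                           n_grains=200, grain_dur=0.05, sr=44100,
                           rng=None):
    """Synthesize a 'triangular' sound shape: sinusoidal grains whose
    allowed frequency band widens linearly over time, so the upper and
    lower edges of the shape diverge like two sides of a triangle."""
    rng = np.random.default_rng(rng)
    out = np.zeros(int(dur * sr))
    g = int(grain_dur * sr)
    env = np.hanning(g)                       # per-grain amplitude envelope
    t = np.arange(g) / sr
    for _ in range(n_grains):
        onset = rng.uniform(0.0, dur - grain_dur)
        frac = onset / dur                    # 0 at the apex, 1 at the base
        f_mid = (f_lo + f_hi) / 2
        half = frac * (f_hi - f_lo) / 2       # band half-width grows linearly
        freq = rng.uniform(f_mid - half, f_mid + half) if half > 0 else f_mid
        i = int(onset * sr)
        out[i:i + g] += env * np.sin(2 * np.pi * freq * t)
    return out / np.max(np.abs(out))          # normalize to [-1, 1]
```

Raising `n_grains` or lengthening `grain_dur` increases grain proximity in time, while shrinking the per-onset frequency band increases proximity in frequency, roughly in the spirit of the two density manipulations described in the abstract.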
Earlier studies have shown that harmonic (e.g., Bigand & Pineau, 1997) as well as tonal
expectations (e.g., Marmel et al., 2008; Marmel et al., 2011) influence pitch processing.
Moreover, we know that different musical systems have different pitch representations.
Turkish makam music, for instance, is a microtonal musical system that divides an octave
into 24 or 36 instead of seven intervals. Hence one should expect listeners from that
culture to have a different categorical representation of pitch compared to Western
listeners, which in turn might influence their sensitivity to minute pitch deviations. The
current study aims to test the influence of both tonal expectation and musical culture on
pitch perception. In our experiment, Turkish makam musicians, Turkish nonmusicians,
and Western listeners are asked to judge whether a final note within a Turkish makam or
Western melodic sequence is in- or out-of-tune. In some cases, the final note completes
the melodic sequence (“full cadence”), in some it does not (“half cadence”). We expect
faster and more accurate out-of-tune judgments for notes that complete a cadence (cf.
Bigand & Pineau, 1997). Furthermore, given the microtonal system, we expect listeners
enculturated and entrained within Turkish makam music to be more accurate in judging
pitch deviation, as they potentially possess narrower pitch category representations in
frequency. It is likely that Western listeners will show a response bias effect in the
makam music context compared to the Western music context. Results will be discussed
within the framework of top-down and attention-related processes. The study also plans
to explore ERP correlates of expectation effects on pitch accuracy judgments.
I. BACKGROUND

In every concert or while listening to a CD we are familiarized with the music heard by informative texts, such as program notes and CD booklets. Already with its integration into nineteenth-century concert life, informative texts had three central functions: 1) providing knowledge of the work in the narrow sense, e.g. information on the history of a work's creation or a composer's biography; 2) providing knowledge regarding classification and valuation, e.g. the significance of a work's compositional and cultural history; 3) providing knowledge of interpretation and reception.

IV. RESULTS

Preliminary analyses suggest that the manipulation of the information provided influenced the way the music was perceived and evaluated. Ratings of the dependent variable liking were significantly higher in the second experiment if the music was preceded by texts with an emotional-expressive compared to a structural description of the music [main effect of text: F(1, 166) = 11.40, p < .001, η² = .064], while no effect of composer, no interaction between the two variables text and composer, and no effect of musical expertise (Gold-MSI) was found. Further analyses of the semantic differential data as well as the data from the first experiment will be reported at the conference.
A. The content of recording critical review

Figure 2 visualizes the thematic outcomes of Analyses I and II. For the sake of brevity the model summarizes the emergent themes down to the fourth hierarchical level. Levels five and six are reported in detail in [27].

[Figure 2: hierarchical model of review content (themes include Critic's Perspective, Meta-Criticism, Descriptors, Character, Qualities, Performer, Understanding, Emotion, Difficulty, Dialogue, Process, Supplementaries, Recording Production, Recorded Sound, Factual Information, and Market), with statement counts per theme.]

[…] affective states, moral qualities, or intentionality: 421). The construct of musical expression emerged in reviews as multi-layered: the term 'express' and its correlates were used in relation to Primary and Supervenient Descriptors to indicate the way performers use their performance options (e.g., timing or dynamics), the portrayal of music structural patterns, or the outer manifestation of (defined or undefined) inner states.

Besides Performance, reviews entailed, on average, discussion of at least three other recording elements. These were mainly characterizations of the Composition (80) in […] (Cronbach's α = .986 for performance statements and α = .962 for extra-performance statements).

Besides Descriptive Judgements, reviews encompassed comments on facts (historical or current) that describe or contextualize the recorded performance (Factual Information, 257). Most Factual Information was about the Composition performed (78), the Performer (68), and the Recording Production (65). A subgroup of Factual Information discussed the recording in the context of the wider music market (Market, 121), for instance indicating the
nature of the recording object (Collectability). 14.65% of performance and 6.42% of extra-performance Evaluative Judgements were comparisons, in which the recorded performance was discussed against one or more other recordings. Reviews were also characterized by a mixture of absolute and relative (taste-dependent) Evaluative Judgements, with taste-dependent judgements present in 47% of reviews. Finally, a sub-group of Evaluative Judgements (24) addressed the record producers (labels, performers), criticizing their course of action or expressing a desire for a certain product. These comments fell within the themes Supplementaries, Composition, Recording Production, and Performer and are highlighted with an asterisk in Figure 3.

[…] Evaluative Judgements and implicitly through Descriptors that also entailed an evaluative connotation (e.g., 'nimble fingers' vs. 'overtaxed fingers'). In total, 87.57% of Performance statements in reviews were valence loaded. Comments on Difficulty (Figure 2) also shaped evaluations, emphasizing the value of a performative outcome as the performer's, composer's, or engineer's achievement.

The systematic examination of the relationship between descriptors and valence-loaded statements showed that critical evaluations can be explained through a small number of basic criteria used reliably by all critics (Kendall's coefficient of concordance W = 0.81, p < 0.001): The first seven criteria were linked to the Suitability of all the criteria given the musical context, to the Aesthetic value of the recorded performance (Intensity, Coherence, and Complexity) and to its Achievement value (Sureness, Comprehension, and Endeavour). A tension emerged between these seven criteria: critics discussed them as if they were interdependent, so that an increase in one criterion could add or mar other criteria (Tightrope).

[Figure 3: model of the evaluative criteria applied to the Performance (1,502 statements) and other recording elements, including Suitability, Context-dependency, Intensity, Coherence, Complexity, and Sureness, with statement counts; asterisks mark themes containing comments addressed to record producers.]

IV. DISCUSSION

We conducted the largest analysis to date on a sizeable corpus of piano recording reviews in order to outline the performance features and other recording elements that critics discuss and the criteria they adduce to support their value judgements. The final result is two descriptive models that capture critics' listening focus and what they think makes for a great recorded performance.

The overall emergent picture describes the content of recorded performance critical review in terms of Descriptive and Evaluative Judgements of the Performance plus eight other recording elements, particularly the Composition, the Recorded Sound, and the Recording Production. Critics also provided Factual Information on the recording itself and on the surrounding recording market, as well as meta-reflections on the process and nature of critical writing (Meta-Criticism). Evaluative Judgements emerged as the primary critical activity in reviews, one that is embedded in a rich variety of
achievement could be efficiently integrated in educational assessment schemes.

The different descriptors and criteria we have identified were used reliably by all our critics across almost 90 years of published review. However, the criterion Suitability, the tension between aesthetic and achievement criteria (Tightrope), and the presence of comparative and taste-dependent judgements all emphasize the context-dependency of performance evaluations, in line with Sibley's [38] context-aware generalism. This resonates with interactionist theories of the aesthetic experience such as the 'uniformity-in-variety' theory [39] and the processing fluency framework [6], and suggests that future developments of assessment protocols should account for the complex and dynamic interaction between criteria.

In addition to the Aesthetic and Achievement values, two recording-specific criteria emerged that focus on the recording as a collectable and on its ability to offer a listening experience akin to a concert (Collectability and Live-Performance Impact). These two criteria, and the weight given in reviews to extra-performance elements, identify the recorded performance as a unique artistic product, distinct from live performance [40], [41]. In particular, they point at what [42] defined as the tension between 'creating virtual worlds' and 'capturing performances' and support the view that recorded performance judgements need to account for the way in which the sound "got onto the disc" [40].

Finally, the polarization of reviews around a few pianists and performances, the use of comparative judgements, comments on Price and Market, and recommendations addressed to external agents all emphasize the critics' role as mediators between producers and consumers. They aim to act as a guide for consumers when deciding what product to buy [43]-[45]. The next step for this research will explore the extent to which this purpose is consciously intentional and its impact on consumer choices.

V. CONCLUSIONS

We analyzed a vast corpus of critical review to map out what expert critics write about and how they justify their evaluations of recorded performance. Despite the focused nature of the data source (Gramophone), repertoire (Beethoven's 32 piano sonatas), and format (recorded performance), the material has provided multiple insights relevant to research and musical practice.

The findings support but also challenge theories on aesthetic appreciation; they provide musicians with a detailed account of critics' focus in their assessment of Beethoven's sonatas – essential in any pianist's repertoire – and offer for the first time two models of critical evaluation that account for the specific nature of recorded performance as an artistic product.

A main aim of this research was to offer a new perspective on the performance evaluation discourse by looking at what we can learn from expert critics' writings. Performance critical review emerged as a rich source material apt to systematic investigation of both content and purpose: the present methods and models offer solid empirical grounds for future comparative investigations of other forms, genres, corpora, or aspects of music critical review.

ACKNOWLEDGMENT

We thank the Gramophone for permission to study their archive for this research and the Institute for Quantitative Social Science at Harvard University for permission to use their text analysis algorithm. We also thank Alessandro Cervino for his assistance in collecting data and Simone Alessandri for his technical support. EA was supported by a grant from the Swiss National Science Foundation (13DPD6_130269) and by the Swiss State Secretariat for Education, Research, and Innovation. VJW was supported by a Vice-Chancellor's Fellowship from the University of Sheffield, UK.

REFERENCES
[1] D. Hume, "Of the standard of taste," in Four Dissertations. London, 1757. Retrieved September 26, 2014 from <http://www.davidhume.org/texts/fd.html>
[2] H. Helmholtz, Die Lehre von den Tonempfindungen als physiologische Grundlage für die Theorie der Musik, 4th ed. Braunschweig: Friedrich Vieweg und Sohn Verlag, 1877.
[3] J. H. McDermott, "Auditory preferences and aesthetics: Music, voices, and everyday sounds," in Neuroscience of preference and choice: Cognitive and neural mechanisms, R. J. Dolan & T. Sharot, Eds. London: Academic Press, 2012, pp. 227–256.
[4] P. N. Juslin, S. Isaksson, "Subjective criteria for choice and aesthetic judgment of music: A comparison of psychology and music students," Research Studies in Music Education, vol. 36, pp. 179–198, 2014. doi: 10.1177/1321103X14540259
[5] S. Thompson, "Determinants of listeners' enjoyment of a performance," Psychology of Music, vol. 35, pp. 20–36, 2007.
[6] R. Reber, N. Schwarz, P. Winkielman, "Processing fluency and aesthetic pleasure: Is beauty in the perceiver's processing experience?" Personality and Social Psychology Review, vol. 8, pp. 364–382, 2004.
[7] M. J. Bergee, "Faculty interjudge reliability of music performance evaluation," Journal of Research in Music Education, vol. 51, pp. 137–150, 2003. doi: 10.2307/3345847
[8] D. W. Kinney, "Internal consistency of performance evaluations as a function of music expertise and excerpt familiarity," Journal of Research in Music Education, vol. 56, pp. 322–337, 2009. doi: 10.1177/0022429408328934
[9] W. J. Wrigley, S. B. Emmerson, "Ecological development and validation of a music performance rating scale for five instrument families," Psychology of Music, vol. 41, pp. 97–118, 2013. doi: 10.1177/0305735611418552
[10] S. Thompson, A. Williamon, "Evaluating evaluation: musical performance assessment as a research tool," Music Perception, vol. 21, pp. 21–41, 2003.
[11] G. McPherson, E. Schubert, "Measuring performance enhancement in music," in Musical Excellence, A. Williamon, Ed. New York: Oxford University Press, 2004, pp. 61–82.
[12] C. Ryan, J. Wapnick, N. Lacaille, A. A. Darrow, "The effects of various physical characteristics of high-level performers on adjudicators' performance ratings," Psychology of Music, vol. 34, pp. 559–572, 2006.
[13] B. W. Vines, C. L. Krumhansl, M. M. Wanderley, D. J. Levitin, "Cross-modal interactions in the perception of musical performance," Cognition, vol. 101, pp. 80–113, 2006.
[14] M. Schutz, S. Lipscomb, "Hearing gestures, seeing music: Vision influences perceived tone duration," Perception, vol. 36, pp. 888–897, 2007.
[15] N. K. Griffiths, "Posh music should equal posh dress: an investigation into the concert dress and physical appearance of female soloists," Psychology of Music, vol. 38, pp. 159–177, 2010.
[16] M. Lehmann, R. Kopiez, "The influence of on-stage behavior on the subjective evaluation of rock guitar performances," Musicae Scientiae, vol. 17, pp. 472–494, 2013. doi: 10.1177/1029864913493922
[17] F. Platz, R. Kopiez, "When the first impression counts: music performers, audience, and the evaluation of stage entrance behavior," Musicae Scientiae, vol. 17, pp. 167–197, 2013. doi: 10.1177/1029864913486369
[18] A. Gabrielsson, "Music performance research at the millennium," Psychology of Music, vol. 31, pp. 221–272, 2003.
[19] J. H. McDermott, "Auditory preferences and aesthetics: Music, voices, and everyday sounds," in Neuroscience of preference and choice: Cognitive and neural mechanisms, R. J. Dolan & T. Sharot, Eds. London: Academic Press, 2012, pp. 227–256.
[20] D. E. Lundy, "A test of consensus in aesthetic evaluation among professional critics of modern music," Empirical Studies of the Arts, vol. 28, pp. 243–258, 2010.
[21] A. Van Venrooij, V. Schmutz, "The evaluation of popular music in the United States, Germany and The Netherlands: a comparison of the use of high art and popular aesthetic criteria," Cultural Sociology, vol. 4, pp. 395–421, 2010. doi: 10.1177/1749975510385444
[22] G. Guest, K. M. MacQueen, E. E. Namey, Applied thematic analysis. California: SAGE Publications, 2012.
[23] E. Alessandri, H. Eiholzer, A. Williamon, "Reviewing critical practice: An analysis of Gramophone's reviews of Beethoven's piano sonatas, 1923–2010," Musicae Scientiae, vol. 18, pp. 131–149, 2014. doi: 10.1177/1029864913519466
[24] D. J. Hopkins, G. King, "A method of automated nonparametric content analysis for social science," American Journal of Political Science, vol. 54, pp. 229–247, 2010.
[25] Y. R. Tausczik, J. W. Pennebaker, "The psychological meaning of words: LIWC and computerized text analysis methods," Journal of Language and Social Psychology, vol. 29, pp. 24–54, 2010.
[26] E. E. Namey, G. Guest, L. Thairu, L. Johnson, "Data reduction techniques for large qualitative data sets," in Handbook for team-based qualitative research, G. Guest & K. M. MacQueen, Eds. Maryland: Altamira Press, 2008, pp. 137–161.
[27] E. Alessandri, V. J. Williamson, H. Eiholzer, A. Williamon, "Beethoven recordings reviewed: a systematic method for mapping the content of music performance criticism," Frontiers in Psychology, vol. 6:57, 2015. doi: 10.3389/fpsyg.2015.00057
[28] E. Alessandri, "The notion of expression in music criticism," in Expressiveness in music performance: Empirical approaches across styles and cultures, D. Fabian, R. Timmers & E. Schubert, Eds. Oxford: Oxford University Press, 2014, pp. 22–33.
[29] E. Alessandri, V. J. Williamson, H. Eiholzer, A. Williamon, "A critical ear: Analysis of value judgements in reviews of Beethoven's piano sonata recordings," Frontiers in Psychology, vol. 7:391, 2016. doi: 10.3389/fpsyg.2016.00391
[30] V. J. Williamson, S. R. Jilka, J. Fry, S. Finkel, D. Müllensiefen, L. Stewart, "How do earworms start? Classifying the everyday circumstances of involuntary musical imagery," Psychology of Music, vol. 40, pp. 259–284, 2012. doi: 10.1177/0305735611418553
[31] V. J. Williamson, S. R. Jilka, "Experiencing earworms: an interview study of involuntary musical imagery," Psychology of Music, vol. 42, pp. 653–670, 2013. doi: 10.1177/0305735613483848
[32] N. Carroll, On criticism. New York: Routledge, 2009.
[33] M. C. Beardsley, "On the generality of critical reason," Journal of Philosophy, vol. 59, pp. 477–486, 1962.
[34] M. C. Beardsley, "The classification of critical reason," Journal of Aesthetic Education, vol. 2, pp. 55–63, 1968.
[35] S. Kaplan, R. Kaplan, The Experience of Nature: a Psychological Perspective. New York: Cambridge University Press, 1989.
[36] D. Davies, "Against enlightened empiricism," in Contemporary debates in aesthetics and the philosophy of art, M. Kieran, Ed. Oxford: Blackwell, 2006, pp. 22–36.
[37] G. Graham, "Aesthetic empiricism and the challenge of fakes and ready-mades," in Contemporary Debates in Aesthetics and the Philosophy of Art, M. Kieran, Ed. Oxford: Blackwell, 2006, pp. 11–21.
[38] G. Dickie, "Beardsley, Sibley and critical principles," The Journal of Aesthetics and Art Criticism, vol. 46, pp. 229–237, 1987.
[39] D. E. Berlyne, Aesthetics and psychobiology. New York: Appleton, 1971.
[40] R. Philip, Performing music in the age of recording. New Haven and London: Yale University Press, 2004.
[41] M. Katz, Capturing sound: How technology has changed music. Berkeley and Los Angeles: University of California Press, 2004.
[42] D. N. C. Patmore, E. F. Clarke, "Making and hearing virtual worlds: John Culshaw and the art of record production," Musicae Scientiae, vol. 11, pp. 269–293, 2007. doi: 10.1177/102986490701100206
[43] S. Frith, "Going critical. Writing about recordings," in The Cambridge Companion to Recorded Music, N. Cook, E. F. Clarke, D. Leech-Wilkinson & J. Rink, Eds. New York: Cambridge University Press, 2009, pp. 267–282.
[44] S. Debenedetti, "The role of media critics in the cultural industries," International Journal of Arts Management, vol. 8, pp. 30–42, 2006. doi: 10.2307/41064885
[45] A. Pollard, Gramophone: The first 75 years. Harrow, Middlesex: Gramophone Publications Limited, 1998.
Musical tension is a complex composite concept that is not easy to define or quantify. It
consists of many aspects, usually rooted in either the domain of psychology (e.g.
expectation and emotion) or music (rhythm, timbre, and tonal perception). This research
focuses on the latter category, more specifically tonal tension. Four vector-based
methods are proposed for quantifying and visualizing aspects of tonal tension based on
the spiral array, a 3D model for tonality. The methods are evaluated empirically using a
variety of musical excerpts.
Our approach first segments a musical excerpt into equal length subdivisions and maps
the notes to clouds of points in the spiral array. Using vector-based methods, four aspects
of tonal tension are identified from these clouds. A first aspect, related to dissonance, is
captured by the cloud diameter, which shows the dispersion of the note clusters in tonal
space. The second aspect, related to tonal change, is the linear cloud momentum, which
captures pitch set movements in the spiral array. Tensile strain measures how far the
tonal center of each cloud is displaced from the global key. Finally, angular cloud
momentum measures the directional change for movements in tonal space.
These methods are implemented in a system that visualizes their results as tension
ribbons over the musical score, allowing for easy interpretation. The system was tested
on stimuli used in Farbood's (2012) empirical study. The results show that the user-
annotated tension is almost always captured by one of the four methods, thus
corroborating the effectiveness of our methods. The study reveals that tension is a
composite characteristic that should not be captured by one fixed global measure, but by
varying combinations of aspects such as those in this study. These results are verified
through evaluation of snippets containing the Tristan Chord, excerpts from a Haydn
string quartet, a Beethoven piano sonata, and Schoenfield's Trio and Café Music.
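The four vector-based measures can be illustrated on generic clouds of points in a 3D tonal space. The sketch below is a minimal illustration only: the helix mapping is a simplified stand-in for the spiral array, and all function names and parameters are assumptions for this example, not the authors' implementation.

```python
import numpy as np

def pitch_to_helix(fifths_steps, r=1.0, h=0.4):
    """Toy stand-in for the spiral array: place each pitch (indexed by
    steps along the line of fifths) on a helix, one quarter-turn per fifth."""
    k = np.asarray(fifths_steps, dtype=float)
    return np.stack([r * np.cos(k * np.pi / 2),
                     r * np.sin(k * np.pi / 2),
                     h * k], axis=-1)

def cloud_diameter(cloud):
    """Largest pairwise distance: dispersion of the notes in tonal space
    (the dissonance-related aspect)."""
    diffs = cloud[:, None, :] - cloud[None, :, :]
    return np.linalg.norm(diffs, axis=-1).max()

def linear_cloud_momentum(prev_cloud, cloud):
    """Centroid displacement between successive clouds (tonal change)."""
    return np.linalg.norm(cloud.mean(axis=0) - prev_cloud.mean(axis=0))

def tensile_strain(cloud, key_center):
    """Displacement of the cloud's tonal center from the global key."""
    return np.linalg.norm(cloud.mean(axis=0) - key_center)

def angular_cloud_momentum(c0, c1, c2):
    """Directional change (radians) between successive centroid moves."""
    v1 = c1.mean(axis=0) - c0.mean(axis=0)
    v2 = c2.mean(axis=0) - c1.mean(axis=0)
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))
```

In this toy mapping a C–G–E cloud has a nonzero diameter while a repeated single pitch collapses to zero, mirroring the intended dissonance reading; the three momentum/strain measures likewise reduce to centroid geometry over successive score segments.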
Despite colloquial accounts of hearing stories while listening to music, little empirical
work has investigated this form of engagement. A series of studies in our lab examined
the factors that contribute to narrative hearings of orchestral excerpts. Some people,
however, tend to encounter orchestral music almost exclusively in the context of film.
Therefore, we designed new studies that use instrumental excerpts from pop music as
stimuli. People tend to hear this kind of music in a wider variety of circumstances,
making it less exclusively reliant on cinematic associations. By comparing results, we can
understand more about the relationship between enculturation and narrative listening.
Fifty participants heard 90s instrumental pop excerpts that either did or did not feature
noticeable contrast (as identified by a preliminary study), pursuing the hypothesis that
people resort to narrative to make sense of music featuring significant contrast (cf.
Almén, 2008). After listening to each excerpt, participants completed the Narrative
Engagement Scale for Music (NESM), adapted from Busselle & Bilandzic (2009), and
responded to a number of other questions about their listening experience. At the end of
the experiment, participants completed several measures including the Absorption in
Music Scale (Sandstrom & Russo, 2013).
As in the studies that used classical excerpts, results showed a higher tendency to
narrativize music that featured contrast. Also similar to previous studies, there was a
surprising amount of convergence in free response descriptions of the narratives, with
topical overlap that sometimes surpassed 80%. In contrast, however, the content of the
free response descriptions provided in response to the pop excerpts differed in intriguing
ways from those provided in response to the classical excerpts. The last part of the talk
explores these differences and outlines an agenda for further work on the widespread
behavior of narrative listening.
indeed, implies this by suggesting that as soon as music is heard in gestural terms, there is also an implication of agency, because a musical gesture "communicates information about the gesturer" (Hatten 2004, 224).

Apart from semiotic theories aiming at detailed, interpretative analyses of particular musical works, some scholars have also presented full-scale persona theories which go a step further toward generalized claims. Foreshadowing the above-mentioned distinction between story-level agents and an overarching narrative persona, Edward T. Cone (1974) made a distinction between vocal or instrumental protagonists, on the one hand, and what he called the complete musical persona, on the other. For Cone, "any instrumental composition […] can be interpreted as the symbolic utterance of a virtual persona. […] [I]n every case there is a musical personality that is the experiencing subject of the entire composition, in whose thought the play, or narrative, or reverie, takes place—whose inner life the music communicates by means of symbolic gesture" (Cone 1974, 94). The complete musical persona is "by no means identical with the composer; it is a projection of his musical intelligence, constituting the mind, so to speak, of the composition in question" (ibid., 57). In discussing Cone's theory, Robinson (2005, 329) suggests that it may help the critic "explain what appear to be anomalies in a piece when it is interpreted as a 'purely musical' structure of tones." Again, the question would not be whether and how impersonation actually figures in uninformed musical listening, but rather whether and how one might find justification for critical practices drawing on personification. In Robinson's (2005, 332) view, "it is appropriate to interpret a piece of instrumental music as containing characters if this interpretation is consistent with what is known about the composer, including his compositional practices and beliefs, and if interpreting the piece in this way is part of a consistent interpretation that makes sense out of the piece as a whole."

The philosopher Jerrold Levinson has gone further, however, in proposing a roughly similar theory as a general definition of musical expressiveness. Briefly, he takes a piece of music to be expressive of a given psychological condition α if and only if the piece is "most readily and aptly heard by the appropriate reference class of listeners as (or as if it were) a sui generis personal expression of α by some (imaginatively indeterminate) individual" (Levinson 1990, 338). One problem with such a formulation is the logical equivalence relation ("if and only if") that suggests musical expressivity to be present in music in all and only those situations in which the appropriate reference class of listeners will tend to hear the music as an expression of an imagined persona. This may be too strong a claim (for a discussion, see Huovinen & Pontara 2011). For present purposes, what is more important is the mere fact that Cone, Levinson, and Hatten would all suggest a way of hearing music as a temporally extended experience of or expression by a persona in the music, fluctuating in its nuances as the music changes and progresses. We should ask: do other listeners, too, more generally hear music in such terms?

In discussing this issue with music students, I have found that even despite their interest in conceptualizing music-related phenomena, they often find it hard to accept persona theories as intuitively credible, or even as intelligible. This could be because they are not as musically sensitive as our theorist troika, or not sophisticated enough in their critical acumen—or because the persona theorists themselves are completely misguided. Such explanations would leave persona theories the sole province of critical scholasticism. Another explanation might be that the persona theorists, in hypostatizing entities that might seem foreign to many listeners' everyday musical understanding, have been unable to communicate the idea of music's impersonation in an accessible enough manner. In this paper, I suppose that this is indeed the case, and I want to make the suggestion that the source of persona theories lies simply in the ease with which listeners may associate music with various significant spheres and aspects of human life, and hence, potentially, with significant dimensions of the human personality as we know it. In other words, to say that a piece of music is easily heard as an expressive utterance of depression by a virtual persona is perhaps just a way of saying that when listeners' mindsets or contexts, for one reason or another, suggest associating the music with thoughts about oneself or a fellow human being, the complex associations awoken (in the reference listener population) tend to be toward the depressive side of human existence. From this angle, claims concerning the impersonation of music would not hinge on positing virtual personae or other mysterious entities "in the music." Rather, the core claim would simply be that in a given cultural setting, a call to associate human personality characteristics with a range of musical genres, styles, and textures may produce a roughly consistent mapping.

In the following study, this claim will be evaluated by assuming the received Five-Factor Model (Costa & McCrae 1992; see, e.g., Matthews et al. 2003) as a conveniently fine-grained model of human personality, and by using the participant group's own favorite musics as a basis for sampling the range of culturally relevant musical sounds. This study addresses two questions: (1) Can listeners' extramusical response to music reproduce the dimensions of human personality, as given by the Five-Factor Model, and (2) Is this dependent on listeners' cultural familiarity with the music? Basically, then, we want to know whether heard music is personified in listeners' understanding in ways that convey the "Big Five" traits of Neuroticism (N), Extraversion (E), Openness (O), Agreeableness (A), and Conscientiousness (C).

These questions are approached by finding the "qualitative extremes" of the favorite instrumental music of a listener group in one culture, and by subjecting this music to a kind of "personality analysis" by this group of listeners as well as by another group of listeners in a different culture. The basic idea is simple: on a first round of listenings, the participants would imagine and write down scripts for associated movie scenes involving a central protagonist. On another round of listenings, they would then be asked to fill out standard personality questionnaires concerning the imagined protagonists. Finally, factor-analytical solutions of the questionnaire results would, for both of the listener groups, indicate the extent to which listeners' imagery of human characters in response to the music might reproduce something like the dimensions of the Five-Factor Model. The working hypothesis was that this would indeed happen in the "home culture" from which a qualitatively varied sample of music is culled, but that such reproduction of the Five-Factor structure might not succeed with a group of listeners less culturally familiar with the
28
Table of Contents
for this manuscript
music. If this is right, the “personality dimensions” emerging from the confrontation of listeners with culturally foreign music might be either relatively coarse (with fewer than five factors emerging in the analysis), or they might be differently organized (reflecting a less clear understanding of the music as “naturally” expressive of human personality).

II. EXPERIMENT 1

With reference to the overall research plan presented above, Experiment 1 was intended to test the hypothesis that listeners would tend to associate music from their own musical culture with human characteristics in such a manner as to roughly reproduce the dimensions of the Five-Factor Model of personality. Accordingly, the Finnish listeners taking part in Experiment 1 would first be asked to select their favorite instrumental music to be used in the study. In the experiment, they would hear music representing qualitative extremes of musical genres known and valued by themselves as a group—by young Finnish adults with an interest in music.

A. Method

1) Participants. The participants were 38 Finnish university students of musicology (incl. 28 majors), with a mean age of 25.4 years (sd = 7.7). They had played or sung music actively for an average of 14.5 years (sd = 7.8), and had participated in formal music education for an average of 9.4 years (sd = 5.7). 18 of them reported playing or singing music daily, and the rest at least once a week. Most of them indicated having had some experience of improvising (33) and/or composing (21) music during the past year. The participants also reported averages of 16.5 hours of weekly listening to music (sd = 15.9), 7.6 hours of playing or singing music (sd = 6.6), and 6.2 hours of watching movies or TV (sd = 6.5).

2) Preparatory Phase: Music Sampling. The purpose of this preparatory phase was to sample the qualitative extremes of the participants’ musical environment by having them evaluate each other’s favorite pieces of music with regard to their mood and their closeness to existential aspects of human life. Each participant was given the task to select one recorded piece of instrumental music (in any genre, style, or tradition) which was personally important for him/her, and to upload it as a soundfile on a course website. With two students selecting the same piece, and one self-recorded excerpt discarded, the resulting pool of musical selections comprised 36 pieces that, as a whole, gave an impression of the musical tastes among the target group of listeners. In all, there were 16 selections of classical music, 8 pieces representing various genres of rock music, 5 pieces of film or TV music, 3 musical excerpts from video games, as well as some jazz (2), folk (1), and gospel (1) titles.

These pieces were edited to briefer versions of a 1’30’’–1’35’’ duration (with a 3 s linear fade-out), starting either at the beginning of the original recording, or, in cases where the participant had suggested a more specific favorite location, starting from there (with a 3 s fade-in, if needed). The edited excerpts were arranged in random order and uploaded again on the course website, where the participants could individually listen to them at their leisure. To counterbalance effects of tedium, half of the participants saw the list of musical excerpts in the opposite order. After listening to each piece, each participant wrote down the name and composer or performer of the excerpt if s/he knew it, and judged the “fittingness of given opposite words to describe the mood of the music.” The 12 bipolar, 7-point rating scales (between –3 and 3) were conceived in terms of six adjectival mood variables (three dynamic ones and three sensory ones) as well as six nominal human variables (three with teleological and three with affirmative import; see Table 1 for English translations of the original Finnish terms). The task was carried out in the time frame of 4–14 days before the main experiment, thus familiarizing the participants with the music beforehand.

Table 1. The mood and human variables used in the preparatory phase of Experiment 1

              mood variables                       human variables
  dynamic         sensory          teleological          affirmative
  moving/static   warm/cold        future/past           reality/imagination
  tense/relaxed   light/dark       desire/contentment    life/death
  strong/weak     soft/hard        fate/chance           knowledge/doubt

The participants’ overall familiarity with each other’s favorite music was not very high: on average, they knew only 6.1 of the other participants’ music selections by either the name or the composer/performer (sd = 3.7), and only 3.8 by both (sd = 3.0).

Out of the 36 musical excerpts evaluated, a set of ten pieces was selected to be used in the main experiment. The selection process aimed at sampling the qualitative extremes of this pool of music, by first arranging the 36 pieces in 12 ranking orders according to the mean results on each of the scales in Table 1. Those pieces that most often appeared at rank orders 1–3, counted from either end of the ranking, were taken as the core of the stimulus set. Hence, as can be gleaned from Table 2, the dramatic beginning of Shostakovich’s Fifth Symphony would appear at these rank orders for no less than seven of the scales: in relation to the other music heard during Phase 1, this music was experienced as extremely tense, strong, cold, and dark, as well as reflecting desire, death, and doubt. Conversely, the 1972 rock instrumental Sylvia by the Dutch group Focus would be heard as light (vs. dark), and full of contentment, reality, life, and knowledge. Other pieces with such extremely clear profiles included the beginning of Prokofiev’s Piano Sonata No. 7 (moving, tense, cold, hard, chance, doubt), a piece by the French post-metal band Year of No Light (tense, strong), and jazz pianist Bill Evans’ signature piece Waltz for Debby (relaxed, warm, soft). As the mean rankings shown in Table 2 reveal, these pieces tended to appear near the extremes for most of the scales. In order to maximize variety, additional selections were made from those pieces that had extreme values on one or two of the scales and did not match any of the already chosen pieces in their qualitative profiles. Hence the opening of Alfredo Casella’s harp sonata was experienced as congenial to imagination, the
Table 2. The 10 pieces of music eliciting the strongest responses, and their rankings on the mood and human variables (in the original, the top and bottom three rank orders are bolded)

                                      moving tense strong warm light soft future desire fate reality life knowledge  mean ranking¹
  CLASSICAL MUSIC
  Casella, Sonata per arpa               6    18    29    16    17   16    11     25    32    36     14    29         11.00
  Prokofiev, Piano Sonata No. 7          1     1     5    35    30   35     4     10    34    32     22    35          4.75
  Satie, Gymnopédie 1                   36    35    33    10    13    5    25     26    23    21     15    19         10.08
  Shostakovich, Symphony No. 5          24     3     3    36    36   33    26      3     5     8     35    36          4.58
  Sibelius, Violin Concerto             22     5    12    31    24   27    17      1    13    10     31    33          9.33
  ROCK MUSIC
  Focus, Sylvia                          5    28     9     7     2   17     5     36    27     2      1     1          5.75
  Gary Moore, The Loner                 18    17     7    21    21   24    35     11     2     1     22    11         10.75
  Year of No Light, Persephone           8     1     1    32    33   32     7      6    10     7     28     9          6.00
  JAZZ MUSIC
  Bill Evans, Waltz for Debby           12    36    30     1     9    3     8     32    33    17      5     7          6.58
  GAME MUSIC
  Uematsu, Interrupted by Fireworks     23    32    36    13     3   10     9     30    22    34     12    13          8.75

¹ Means for ranking numbers taken from the closer end of the scale (36 = 1, 35 = 2, etc.).
opening of Sibelius’ violin concerto sounded of desire, Erik Satie’s simplistic piano piece Gymnopédie 1 was static and relaxed, Max Middleton’s rock ballad The Loner in guitarist Gary Moore’s 1987 rendition resonated with the past, fate, and reality, and Nobuo Uematsu’s music for the 1997 video game Final Fantasy VII was experienced as weak and light, and relevant for imagination.

The set of music sampled thus not only evoked strong immediate judgments in the participant group, but also did this in sharply distinct fashions. It may be noted that none of the film and TV music—supposedly engineered to please as large groups of listeners as possible—or the pieces most familiar to the participant group (Tchaikovsky’s Swan Lake and Beethoven’s Moonlight Sonata) made it to the selection based on strong responses. Indeed, prior familiarity did not seem to play much of a role in eliciting strong responses: Shostakovich’s Fifth Symphony, for instance, was only recognized by three of the participants, and hence the darkness, doubt, and death heard in it could not stem from associations with the composer’s harsh conditions under Soviet rule.

3) Procedure, Phase 1: Imagined Scenes. In the first phase of the main experiment, the participants listened to the musical excerpts in a quasi-random order in a classroom setting. On listening to each excerpt, their task was to imagine and describe a human character that the music could suitably depict. So as to avoid alienating persona constructions, the task was framed with reference to choices standardly made in film production—that is, in cultural practices the participants would be aware of and probably esteem. The following instructions were given before the experiment:

    You will hear ten musical excerpts. With each of these pieces of music, imagine that you are a film director who has carefully selected music for a scene depicting a character in the film. Listen to the music with your eyes closed, trying to be sensitive to the mood of the music, and imagine what sort of a character and scene could be in question. Do not ponder too much, but try to grasp your first intuition!

    When you have arrived at a suitable character and a scene, open your eyes and write a description addressing at least (1) who the person depicted in the scene is, (2) what he or she is like as a person, and (3) what is happening in the scene. Try to write freely and spontaneously, without unnecessary self-criticism. The more descriptive and detailed your writing, the easier it will be to carry out Part 2 of this experiment. Therefore, the richness of your descriptions is more important than linguistic correctness or textual structure.

4) Procedure, Phase 2: Personality Analysis. After completing Phase 1, and after a short break, Phase 2 of the main experiment involved listening to the pieces of music again and filling out standard personality questionnaires for each of the previously imagined protagonists. The instructions were as follows:

    You will now hear the previous musical excerpts once again. Imagine that for each piece of music, your task as film director is to describe to the actor or actress what sort of character he or she would be portraying in the scene underscored by the music.
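The extreme-sampling step described under Method—ranking the 36 pieces on each of the 12 scales and keeping the pieces that most often land within the top or bottom three ranks—can be sketched roughly as follows. This is an illustrative reconstruction, not the authors’ code; the data structure, function name, and parameters are hypothetical, and ties as well as the variety-maximizing second selection pass are ignored:

```python
# Illustrative reconstruction of the extreme-sampling step: rank each
# piece on each rating scale and keep the pieces that most often fall
# within the n_core most extreme ranks of any scale.

def select_extremes(ratings, n_core=3, n_select=10):
    """ratings: dict mapping piece -> dict mapping scale -> mean rating.
    Returns (piece, count) pairs for the n_select pieces with the most
    appearances at extreme ranks, counted from either end of each ranking."""
    pieces = list(ratings)
    scales = list(next(iter(ratings.values())))
    counts = {p: 0 for p in pieces}
    for scale in scales:
        ordered = sorted(pieces, key=lambda p: ratings[p][scale])
        # bottom n_core and top n_core ranks both count as "extreme"
        for p in ordered[:n_core] + ordered[-n_core:]:
            counts[p] += 1
    return sorted(counts.items(), key=lambda kv: -kv[1])[:n_select]
```

Applied to a 36 × 12 table of mean ratings, this reproduces the “rank orders 1–3, counted from either end of the ranking” criterion used to form the core of the stimulus set.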
in standard applications of personality inventories. Instead, the number of distinct factors remained the same in the South African and Finnish cases and, more importantly, the dimensional structure of personality judgments for the imagined protagonists actually approached the standard human pattern in the case of culturally more distant listeners. It seems, then, that despite any cultural “irrelevance” that might be claimed for using such “foreign” music in a music-psychological experiment with South African participants, these listeners were, in fact, highly capable of grasping the music as expressive of important dimensions of human existence. While they may have been lacking in some of the implicit cultural knowledge concerning typical extramusical links for the relevant musical sounds and genres, the South African listeners approached the spectrum of heard musical sounds with a spectrum of more general human characteristics in mind.

IV. GENERAL DISCUSSION

In audiovisual media, music is often used to illustrate distinctions in human psychology. In classical Hollywood movies, character types were conspicuously connected with musical signifiers (Gorbman 1987), but there are also subtler possibilities for using music in cinema to convey a sense of represented subjectivity: music may sometimes offer a glimpse, as it were, into the film character’s inner world (Stillwell 2007; Pontara 2014). In these and other ways, our omnipresent media cultures create and strengthen associations between music and human personality. This is but one aspect of a broader phenomenon of musical topicality, whereby sets of musical pieces, styles, and genres are gradually furnished with extramusical meaning in populations of listeners (see Huovinen & Kaila 2015). Such cultural processes of collective meaning-making take place on a vast time scale, making it hard to draw distinctions between culturally contingent associations and ones that might be more deeply rooted in, say, analogical gestural patterns between the music and the extramusical world. This article has been an attempt to address such distinctions—not by searching for universally recognizable “musemes” (small units of musical meaning; see, e.g., Tagg 2013), but rather by focusing on music-driven associations with complex, high-level psychological properties, and by examining the internal structuredness of such associations in cases of smaller and greater cultural distance between the music and the listeners.

In this study, Finnish and South African listeners listened to pieces of music representing some of the favorite music of the Finnish group, and were asked to imagine film characters that the music might suitably depict. By having the participants fill out personality questionnaires for their imagined characters, we could employ the standard factor-analytical methods of personality research to extract the dimensions of judgments involved in the listeners’ human imagery. The methodological basis for the study was given by the Five-Factor Model of personality, which has been found to have cross-cultural validity as a model of human personality trait structure (McCrae & Costa 1997). The result that our South African listeners’ “personality analysis” of the music-inspired protagonists closely matched the trait structure of the Five-Factor Model suggests that the same dimensional structure that cross-culturally emerges from people’s judgments concerning their own personality can also emerge from their associative imagery catalyzed by music—even by music with some cultural distance from their own cultural setting. This could be viewed as another way of putting forth the old idea of music as a “universal language”: different expressive varieties of music may cross-culturally function as roughly reliable markers of universally important dimensions of human personality.

Even though associative features related to human personality may thus be cross-culturally available as tools for the appreciation of unfamiliar music, specific cultural knowledge may further bias such associations towards culturally established stereotypes. In the present study, the Finnish group of listeners, in their associative imagery evoked by “their own music,” did not conform to the dimensional structure suggested by the Five-Factor Model. The whole pattern of results was, then, contrary to the working hypothesis behind the study. It had been originally supposed that more specific cultural knowledge of the musical genres and stylistic features might allow listeners to better appreciate a range of such genres and styles as bearers of information concerning the fine distinctions in human personality. Even if this were true to some small extent, any such increase in acuity was here clearly superseded by the cultural insiders’ knowledge of the topical associations between varieties of music and varieties of human personality. For these listeners, the narrative imagery evoked by the music did not so much revolve around “normal” human characters as it did around stereotypical figures ingrained by popular culture, such as the generic “good person” or the “action hero.” In sum, it seems that music is a medium with potential universal significance as a signifier of human personality characteristics, but such significance can be overridden by culturally constructed and instilled understandings of extramusical meaning in music.

With regard to the persona theories discussed in the introduction, these findings suggest that the personification of music does not need to be treated solely as a question of careful critical interpretation of musical works. The participants in the present study were admittedly music students with perhaps a keener interest in music’s meanings than one would find in a general population, but they did not have access to musical scores or information about the genesis of the pieces heard, and they could not approach the fleeting experimental tasks with the kind of rational deliberation and consideration of alternative interpretive possibilities that is characteristic of semiotic persona theorizing. Instead, they improvised their imagined scenes on the fly while listening, grasping the music in more intuitive terms, and nevertheless appeared to describe a world as rich in its dimensions of personality information as the real human world is.

In assessing such results, there is no reason to claim, à la Levinson, that musical expressivity consists in impersonation, or should be defined in terms of it. It is
enough to say that at will, listeners across cultures, in response to heard music, can be expected to be capable of easily directing their thoughts in ways that delineate either more universal or, perhaps, more culturally impregnated conceptions of human personality. As soon as the listener, for one reason or another, activates person-related thoughts or emotions in the listening situation, the music will be readily available to shape, sharpen, focus, or rechannel this human imagery. No hypothetical musical personae are needed to make this happen. It is just that our lives as human persons, in societies made up of other human persons, are so imbued with person-related thinking that personality-relevant associative modes are always waiting to be activated when heard music gets our minds to wander into our memories, fears, hopes, plans, and dreams.

ACKNOWLEDGMENT

This research was supported by a research fellowship in the research niche area Musical Arts in South Africa: Resources and Applications (MASARA) at the North-West University, Potchefstroom campus, South Africa. Thanks are also due to Prof. Jan-Erik Lönnqvist at the University of Helsinki for advice in using the Short Five personality inventory.

REFERENCES

Chamorro-Premuzic, T. & Furnham, A. (2007). Personality and music: Can traits explain how people use music in everyday life? British Journal of Psychology, 98(2), 175–185.
Cone, E. (1974). The composer’s voice. Berkeley, Los Angeles, and London: University of California Press.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources, Inc.
Gorbman, C. (1987). Unheard melodies: Narrative film music. Bloomington & Indianapolis, IN: Indiana University Press.
Hatten, R. S. (2004). Interpreting musical gestures, topics, and tropes: Mozart, Beethoven, Schubert. Bloomington and Indianapolis: Indiana University Press.
Huovinen, E. & Kaila, A.-K. (2015). The semantics of musical topoi: An empirical approach. Music Perception, 33(2), 217–243.
Huovinen, E. & Pontara, T. (2011). Methodology in aesthetics: The case of musical expressivity. Philosophical Studies, 155(1), 45–64.
Kemp, A. E. (1996). The musical temperament: Psychology and personality of musicians. New York: Oxford University Press.
Konstabel, K., Lönnqvist, J.-E., Walkowitz, G., Konstabel, K., & Verkasalo, M. (2012). The “Short Five” (S5): Measuring personality traits using comprehensive single items. European Journal of Personality, 26(1), 13–29.
Kramer, L. (1990). Music as cultural practice, 1800–1900. Berkeley, Los Angeles, and Oxford: University of California Press.
Ladinig, O. & Schellenberg, E. G. (2012). Liking unfamiliar music: Effects of felt emotion and individual differences. Psychology of Aesthetics, Creativity, and the Arts, 6(2), 146–154.
Levinson, J. (1990). Music, art, and metaphysics: Essays in philosophical aesthetics. Ithaca and London: Cornell University Press.
Matthews, G., Deary, I. J., & Whiteman, M. C. (2003). Personality traits (3rd ed.). Cambridge and New York: Cambridge University Press.
McCrae, R. R. & Costa, P. T., Jr. (1997). Personality trait structure as a human universal. American Psychologist, 52(5), 509–516.
Pontara, T. (2014). Bach at the space station: Hermeneutic pliability and multiplying gaps in Andrei Tarkovsky’s Solaris. Music, Sound, and the Moving Image, 8(1), 1–23.
Rentfrow, P. J. & Gosling, S. D. (2003). The do re mi’s of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84(6), 1236–1256.
Robinson, J. (2005). Deeper than reason: Emotion and its role in literature, music, and art. New York: Oxford University Press.
Spitzer, M. (2013). Sad flowers: Analyzing affective trajectory in Schubert’s “Trockne Blumen.” In T. Cochrane, B. Fantini, & K. R. Scherer (Eds.), The emotional power of music: Multidisciplinary perspectives on musical arousal, expression, and social control (pp. 7–21). Oxford: Oxford University Press.
Stillwell, R. (2007). The fantastical gap between the diegetic and nondiegetic. In D. Goldmark, L. Kramer, & R. Leppert (Eds.), Beyond the soundtrack: Representing music in cinema (pp. 184–202). Berkeley & Los Angeles: University of California Press.
Tagg, P. (2013). Music’s meanings: A modern musicology for non-musos. New York & Huddersfield: The Mass Media Music Scholar’s Press, Inc.
Tarasti, E. (1994). A theory of musical semiotics. Bloomington and Indianapolis: Indiana University Press.
It has been suggested that aesthetic experience is optimal when stimuli have a moderate
level of complexity. In this experiment we test this hypothesis, focusing on two musical
factors that have not been previously examined in this regard: range and repetition. Our
hypothesis is that melodies that are more difficult to encode will seem more complex.
Encoding should be inhibited by a large melodic range (since this creates large melodic
intervals, which are low in probability) and variability, i.e., lack of repetition (since
repetition allows a more parsimonious encoding of the melody).
Melodies were generated by a stochastic process that chooses pitches from a uniform
distribution over a defined range. All melodies were 49 notes long and entirely within the
G major scale. There were three levels of range: small (2 notes), medium (6 notes), and
large (29 notes). There were also three levels of variability: in low-variability melodies,
each four-note group simply repeated the first (randomly generated) four-note group; in
medium-variability melodies, each even-numbered four-note group repeated the previous
odd-numbered group; and high-variability melodies contained no repetition (except that
arising by chance). All combinations of range and variability were used. It was predicted
that melodies of medium range and variability would obtain the highest ratings, and that
there would be an interaction between range and variability such that high-variability
melodies were liked more when the range was smaller.
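The generative procedure described above can be sketched as follows. This is a minimal illustration consistent with the description, not the authors’ actual stimulus code; the MIDI register, the handling of the truncated final group, and all names are my assumptions:

```python
import random

# G major scale degrees as MIDI note numbers, starting from G4 = 67.
G_MAJOR = [67, 69, 71, 72, 74, 76, 78]  # G A B C D E F#

def scale_pitches(n):
    """Return the first n consecutive G-major scale degrees, ascending."""
    return [G_MAJOR[i % 7] + 12 * (i // 7) for i in range(n)]

def generate_melody(range_size, variability, length=49, seed=None):
    """Pitches are drawn uniformly from a G-major pitch set of
    range_size notes; `variability` ('low', 'medium', 'high') sets the
    repetition structure over four-note groups, as in the abstract."""
    rng = random.Random(seed)
    pitches = scale_pitches(range_size)
    groups = []
    n_groups = (length + 3) // 4  # 49 notes -> 13 groups, last truncated
    for g in range(n_groups):
        if variability == "low" and g > 0:
            groups.append(groups[0])      # every group repeats the first
        elif variability == "medium" and g % 2 == 1:
            groups.append(groups[g - 1])  # even groups (1-indexed) repeat
        else:
            groups.append([rng.choice(pitches) for _ in range(4)])
    return [p for grp in groups for p in grp][:length]
```

For example, `generate_melody(2, "low")` yields a 49-note melody cycling a single randomly generated four-note figure over a two-note range, while `generate_melody(29, "high")` draws every note independently over 29 scale steps.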
Subjects (musicians and nonmusicians) heard the melodies and indicated their liking for
them on a Likert scale. As predicted, in both the range and variability factors, the medium
level was preferred over the low and high levels, though the difference between medium
and high levels was marginal. The predicted interaction between range and variability
was also observed. Implications will be discussed with regard to liking for melodies and
melodic encoding.
ISBN 1-876346-65-5 © ICMPC14 35 ICMPC14, July 5–9, 2016, San Francisco, USA
Background
Music holds a valued place in the marketer’s toolset for persuasion, whether in the form
of jingles, sonic branding, or background music in stores or in TV ads. Like the
application of music in medicine and in education, the application of music in marketing
provides a real-world test case for investigating the effects of music on the mind and
brain. Though there have been a number of studies investigating music in marketing over
the last decades, the most recent review of research on this topic was published back in
1990 (Bruner). We are overdue for a fresh look at this area of research that also takes into
account advances in basic research on music.
Results
Three primary veins of research into music and marketing have developed.
First, studies show that music has a significant impact on consumers in stores. The choice
of music relative to the products on sale, the style of music, its tempo, familiarity, and
even volume influence how consumers move through the store, as well as what and how
much they purchase. Second, the music that accompanies TV advertising impacts not
only the interpretation of the storyline in ads, but also the propensity for ads to drive
behavior change. Third, by means of its influence on mood, music can alter decision-
making and interact with the experience of other senses to color the overall experience of
products.
Conclusion
Applied research on the impact of music in marketing confirms the potential of music to
influence the human mind and brain. It also opens new avenues of inquiry into
multisensory integration and broadens our understanding of the effects of music. This
paper sheds light on research investigating music and marketing to date and relates that
work to what we currently know from basic research into the psychology and
neuroscience of music.
(2011) further refined five dimensions/factors in the structure (soothing, relaxing, etc.), musical characteristic (allegro, light,
of musical preference. They included musical attributes, etc.), genre (heavy metal, folk, etc.), musical period
psychological attributes, and genres to enhance the concept of (romantic, modern, etc.), and performing form (choir,
the multifaceted nature of music styles. The five dimensions orchestral, etc.). Finally, 10 factors of music (Table 1) were
are mellow (smooth, relaxing, romantic, R&B, and soft rock), obtained. The most popular factor was pop songs (Cantopop,
unpretentious (country/singer-songwriter, relaxed, not Mandopop, K-pop, J-pop, and English pop), accounting for 64
distorted, and not intelligent), sophisticated (complex, out of the 242 answers or 26.45%. The second factor was the
intelligent, inspiring, classical, world, and jazz), intense (loud, creation of uplifting music (soothing, relaxing, easy listening,
forceful, energetic, aggressive, not relaxing, classic rock, and soft, slow, light, new age as used for yoga, allegro, and
heavy metal), and contemporary (rhythmic, percussive, rap, positive) (19.01%). The third factor was the musical period
electronica, funk, and acid jazz). (classical, romantic, and modern) (14.46%). The fourth factor
Various methods of defining music genres for preference consisted of band, rock, indie, post hardcore, heavy metal, and
measurement have been presented in previous studies. The alternative rock music (12.40%). The fifth factor included
common method utilized is to simply list a range of genre R&B, soul, hip pop, dance, reggae, blues, and rap (10.74%).
labels and assess based on a Likert scale. Other studies The sixth factor was folk (folk, oldies, and country music)
adopted a highly complicated procedure by playing musical/ with 4.55%. The seventh factor covered instrumental music
audio stimuli of a representative genre for a range of selected (pure music, orchestral, and piano music) (4.13%). The eighth
genres. The first step is to determine the music samples that factor was jazz (jazz and bossa) with 3.31%. The ninth factor
can fully depict the study subjects’ musical preferences. The covered acoustic music (electronic, anime, and acoustic)
samples can be audio or verbal descriptors. Several studies (2.89%). The least liked one was choral (hymn, duets, church,
looked into the featured musical elements (e.g., tempo, chant, and choir) (2.07%).
rhythm, melody, and instrument) in musical works to express
musical preference (Kopacz, 2005). Preference researchers Table 1. Genre descriptors.
agree that “musical genres are always contextualized in a
particular culture, region and time” (Ferrer, Eerola, & Music Frequency Percentage
Vuoskoski, 2012, p. 500). To accurately match the genre
Pop, K-Pop, J-Pop, Cantopop, English
labels found in commercial or academic arenas with Macau
song, Mandopop 64 26.45
listeners’ usage, a certain procedure is implemented to clarify
what to put into the preference measurement and how. Soothing, relax, easy listening, soft,
The present study serves as a pilot study to investigate the slow, light, New Age, allegro, positive 46 19.01
Macau youth’s music preference and its relation to Classic, Romantic, Modern 35 14.46
personality. The first step is to explore their expression of
musical preference: what (genre, label, style, feature, category, etc.) do they use to describe their favorite music and how? The second step is to identify the differences and correlations between musical preference and the Big Five personality traits.

II. METHOD

A survey was conducted on 130 participants (57 males and 73 females) aged 18 to 25 years old (mean = 20.27, SD = 1.28) who are non-music major undergraduates enrolled in one of the music elective courses in a Macau university. The participants were informed of the purpose of the survey; they expressed their willingness to participate in the study. The survey was administered during the final 10 minutes of each class session.

The questionnaire is composed of a Big Five personality inventory (the 50-item version) translated to Chinese. To obtain information on musical preference, an item asks the respondents to write three of their favorite music genres in free form. The item does not suggest any terms or genres, so as to determine how the participants express their own musical preference.

III. RESULTS

The item on three favorite music genres received 48 terms/labels of music with a total frequency of 242. The researcher consolidated the labels according to psychological expression:

Labels                                                      Frequency    %
Band, rock, indie, post hardcore, heavy metal,
  alternative rock                                             30       12.40
R & B, soul, hip-hop, dance, reggae, blues, rap                26       10.74
Folk, oldies, country music                                    11        4.55
Pure music, instrumental, orchestral, piano                    10        4.13
Jazz, bossa                                                     8        3.31
Acoustic, electronic, anime                                     7        2.89
Hymn, duets, church, chant, choir                               5        2.07

The personality score was rated on a seven-point Likert scale (see Table 2). Among the Macau youth participants, agreeableness was the most commonly observed trait (M = 4.66), followed by conscientiousness (M = 4.42), openness (M = 4.24), extroversion (M = 3.96), and emotional stability (M = 3.64).

ANOVA was performed to obtain the differences between the Big Five traits and music preferences. No significant difference was found between any of the personality traits and music preferences.

Spearman's rho correlation was utilized to analyze any correlation between the five personality traits and music preferences. No significant correlation was found (extroversion: 0.0, agreeableness: −0.09, conscientiousness: −0.10, emotional stability: −0.04, and openness: −0.05).
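The two analysis steps reported above (a one-way ANOVA of each trait across preference groups, then Spearman's rho between trait scores and preference categories) can be sketched in Python with SciPy. The data below are randomly generated placeholders, not the survey's responses; only the sample size (130) and the seven consolidated label groups come from the paper.

```python
# Sketch of the study's analysis steps on hypothetical data: a one-way ANOVA
# of a trait score across preference groups, then Spearman's rho between the
# trait scores and a coded preference variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 130                          # sample size reported in the paper
trait = rng.normal(4.5, 0.8, n)  # placeholder trait scores (7-point Likert means)
group = rng.integers(0, 7, n)    # coded preference category (7 label groups)

# ANOVA: does the trait mean differ across preference groups?
samples = [trait[group == g] for g in range(7)]
f_val, p_anova = stats.f_oneway(*samples)

# Spearman's rho: monotonic association between trait and preference code
rho, p_rho = stats.spearmanr(trait, group)

print(f"ANOVA F={f_val:.3f}, p={p_anova:.3f}; Spearman rho={rho:.3f}, p={p_rho:.3f}")
```

With randomly generated data, both tests should come out non-significant, mirroring the null pattern the study reports; with the real survey data the same two calls would reproduce the reported F and rho values.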
Table 2. Mean, SD, Max, Min, F Value and P Value of Big Five Personality.

Personality Trait     Mean   SD    Max    Min    F Value   P Value
Agreeableness         4.66   .69   6.13   1.75    .598      .813
Conscientiousness     4.42   .87   7.00   2.00    .622      .793
Openness              4.24   .68   6.70   2.20   1.223      .283
Extraversion          3.96   .78   5.90   1.80   1.460      .163
Emotional stability   3.64   .88   5.78   1.11    .866      .567

IV. DISCUSSION AND CONCLUSION

The raw data on the participants' preferred music indicate that the answers mainly covered the most accessible music, such as pop (in different languages), and functional music to enhance their daily living. The most frequently used descriptors are positive, soothing, and relaxing. Music on the Internet has permeated the lives of young people; many genres, such as heavy metal and jazz, have become user-friendly. Young people depend on the Internet to search for and share patterns to create their own listening lists. In the old days, when musical knowledge was limited to the school, Macau students did not have many choices of music genres to express their preference other than Cantopop, the culture's mainstream genre. Many music labels, such as classical, romantic, and hymns, were probably learned by the participants through the general music education and elective music courses they completed. Future studies can consider modifying the five-factor structure of Rentfrow et al. (2011) to design a music preference inventory, because it comprises several detailed samples of music using psychological attributes and musical attributes/elements. Instead of asking free-form open-ended questions, future studies should provide musical excerpts for the preference items.

The lack of significant differences and correlations between personality traits and music genres can be explained by the method of preference assessment. Another explanation might be the homogeneous education background of the participants. The effects of music training, school culture, and media provision from their 11 years of foundational education can distort the results. Despite widespread Internet use, the training for independent thinking might not have been successful. As exhibited by the distribution of personality traits, agreeable individuals are the majority. According to the literature, two traits, namely extroversion and openness, are robust predictors of music preferences. However, the present study's results do not agree with the trends found in other cultures (Rentfrow & Gosling, 2003).

ACKNOWLEDGMENT

This report, as part of the research project, is supported by the University of Macau.

REFERENCES

Brown, R. A. (2012). Music preferences and personality among Japanese university students. International Journal of Psychology, 47(4), 259-268. doi: 10.1080/00207594.2011.631544
Burt, C. (1939). The factorial analysis of emotional traits. Character and Personality, 7, 238-290.
Chamorro-Premuzic, T., & Furnham, A. (2007). Personality and music: Can traits explain how people use music in everyday life? British Journal of Psychology, 98, 175-185. doi: 10.1348/000712606X111177
Ercegovac, I. R., Dobrota, S., & Kuščević, D. (2015). Relationship between music and visual art preferences and some personality traits. Empirical Studies of the Arts, 33, 207-227.
Ferrer, R., Eerola, T., & Vuoskoski, J. K. (2013). Enhancing genre-based measures of music preference by user-defined liking and social tags. Psychology of Music, 41, 499-518.
Klimas-Kuchtowa, E. (2000). Music preferences and music prevention. Publication of the Academy of Music in Wroclaw, 76.
Kopacz, M. (2005). Personality and music preferences: The influence of personality traits on preferences regarding musical elements. Journal of Music Therapy, 42, 216-239. doi: 10.1093/jmt/42.3.216
Langmeyer, A., Guglhör-Rudan, A., & Tarnai, C. (2012). What do music preferences reveal about personality? Journal of Individual Differences, 33(2), 119-130. doi: 10.1027/1614-0001/a000082
Litle, P., & Zuckerman, M. (1986). Sensation seeking and music preferences. Personality and Individual Differences, 7, 575-57.
Rawlings, D., Hodge, M., Sherr, D., & Dempsey, A. (1995). Toughmindedness and preferences for musical excerpts, categories, and triads. Psychology of Music, 23, 63-80.
Rentfrow, P. J., Goldberg, L. R., & Levitin, D. J. (2011). The structure of musical preferences: A five-factor model. Journal of Personality and Social Psychology, 100, 1139-1157. doi: 10.1037/a0022406
Rentfrow, P. J., & Gosling, S. D. (2003). The do re mi's of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84, 1236-1256. doi: 10.1037/0022-3514.84.6.1236
Schafer, T., & Sedlmeier, P. (2009). From the functions of music to music preference. Psychology of Music, 37, 279-300. doi: 10.1177/0305735608097247
Schwartz, K. D., & Fouts, G. T. (2003). Music preferences, personality style, and developmental issues of adolescents. Journal of Youth and Adolescence, 32, 205-213.
Swami, V., Malpass, F., Havard, D., Benford, K., Costescu, A., Sofitiki, A., & Taylor, D. (2013). Metalheads: The influence of personality and individual differences on preference for heavy metal. Psychology of Aesthetics, Creativity, and the Arts, 7, 377.
Zweigenhaft, R. L. (2008). A do re mi encore: A closer look at the personality correlates of music preferences. Journal of Individual Differences, 29, 45-55. doi: 10.1027/1614-0001.29.1.45
Short, looping segments of very catchy songs that spontaneously come to mind are
known as earworms (Williamson, 2012). It has been estimated that over 90% of people
experience this type of involuntary musical imagery (INMI) every week (Liikkanen,
2012a). Despite our understanding of the incidence of INMI, less is known about specific
underlying musical qualities of earworms, or more broadly, what qualities make a song
catchy and likely to become an earworm. This research considered what musical qualities
make music catchy, using our knowledge of how familiarity and liking play a role in
determining musical preferences. The first study investigated the “catchiness” of the top
Billboard year-end hits from 1910-2009 and aimed to identify musical features associated
with “catchiness”. After adjusting for the strong relationship between catchiness and
familiarity, three musical factors exhibited the strongest relationship with catchiness:
tempo, harmonic complexity (measured as the number of different chords in the
chorus/hook), and the incidence of chain rhyming in the chorus/hook. A follow-up study
was conducted to explore whether these three factors still hold in unfamiliar musical stimuli,
with a particular focus on genre to determine whether these preferences are genre-specific.
Results indicated different patterns of preferences across the five genres studied
(Country, Hip-Hop, Pop, R&B and Rock). The current study concludes that the musical
qualities that make a song catchy, and therefore more likely to become an earworm, tend
to make songs easier to sing, making them intrinsically more memorable.
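The harmonic-complexity measure defined above (the number of different chords in the chorus/hook) is straightforward to operationalize. A minimal Python sketch, using a hypothetical chord annotation in place of one of the study's Billboard transcriptions:

```python
# Minimal sketch of the harmonic-complexity measure described above: count the
# distinct chords in a chorus/hook. The chord sequence below is a hypothetical
# example, not one of the study's annotated hooks.
def harmonic_complexity(chords: list[str]) -> int:
    """Number of different chords used in a chorus/hook annotation."""
    return len(set(chords))

hook = ["C", "G", "Am", "F", "C", "G", "F", "F"]  # hypothetical I-V-vi-IV hook
print(harmonic_complexity(hook))  # → 4
```

The same counting scheme applies regardless of how the chords are labeled, as long as repeated occurrences of a chord use an identical label.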
ISBN 1-876346-65-5 © ICMPC14 40 ICMPC14, July 5–9, 2016, San Francisco, USA
Gomez & Danuser 2007; Russo, Vempala, & Sandstrom, 2013), while heart rate sometimes increased in response to more arousing music (Russo, Vempala, & Sandstrom, 2013), but often produced no statistically significant change from baseline measures (Gomez & Danuser, 2004; Rickard, 2004; Steinbeis, Koelsch, & Sloboda, 2006; Gomez & Danuser 2007; Koelsch et al., 2008).

Ongoing behavioral responses to music along various dimensions, including locus of attention (Madsen & Geringer, 1990), 'aesthetic experiences' (Madsen, Brittin, & Capperella-Sheldon, 1993), and combined valence and arousal measures (Schubert, 2004), have been collected using continuous response interfaces. A recent study of particular relevance to the present study involves continuous behavioral ratings of engagement with a live dance performance (Schubert, Vincs, & Stevens, 2013). This study utilizes a definition of engagement that we will use here. The authors additionally consider both the level of engagement and the agreement among participant responses over time. One finding, which we will return to later, is that so-called 'gem moments' – in that study described as moments of surprise – bring about sudden rises in engagement level, but periods of heightened agreement tend to occur more when audience expectations have been established and are not interrupted.

In the present study, we aim to validate the application of RCA and EEG-ISCs to study cortical responses to a complete and self-contained musical excerpt from the classical repertoire (in this case a movement from a concerto). Our proposed collection of responses will lead to measures intended to index both engagement (EEG, continuous behavioral responses) and arousal (physiological responses). The responses can additionally be categorized as objective (neurophysiological responses) or subjective (continuous behavioral responses). As our stimulus of interest expresses large fluctuations in arousal, we further seek to interpret the cortical responses among both the levels and the synchronicity of physiological and continuous behavioral responses. Our present analysis focuses on specific periods of very high or low arousal within the musical excerpt, as well as on passages of increasing harmonic and melodic tension and the introductions of principal and salient musical themes.

II. EXPERIMENTAL DESIGN

A. Stimuli

Stimuli for this experiment were drawn from the first movement of Elgar's Cello Concerto in E Minor, Op. 85, composed in 1919. This work has been shown to induce chills in past experiments (Grewe et al., 2007b; Grewe et al., 2010). The version used here is the recording of the 1965 Jacqueline du Pré performance with Sir John Barbirolli and the London Symphony Orchestra, considered to be a definitive and influential performance of the piece (Solomon, 2009).

In descriptive music analysis, salient events are often designated as structurally relevant. These events can include climactic 'highpoints' (Agawu, 1984); the introduction and reprise of principal thematic materials; a sudden, generally unexpected pause; and significant changes of timbre or texture, such as a change in orchestration or the entrance of a soloist. Moments of this type in the Barbirolli/du Pré recording of the Elgar concerto are noted in Table 1.

To prepare the experimental stimuli, we used Audacity software to remix the original digital .m4a audio file of the recording to mono and export to .wav format. We then loaded the .wav file in Matlab and added a linear fade-in and fade-out to the first and last 1000 msec of the piece, respectively. A reversed condition was then created by reversing the stimulus waveform. Finally, we added a second audio channel containing intermittent clicks, which were sent directly to the EEG amplifier for precise time-stamping of the stimuli and were not heard by participants, and saved the outputs as stereo .wav files.

Table 1. Points of interest in Elgar's Cello Concerto. We identify a set of salient musical events in the excerpt, including points at which the primary theme is introduced (A1, A2) and the start of sections of buildup (B1, B2), which later culminate in highpoints of the piece (C1, C2). We additionally identify a point of low arousal (D). Labels correspond to the labels in Figure 3.

Time   Event                                      Label
1:23   Entrance of cello theme                    A1
2:07   Cello theme, building to high point        B1
2:31   Highpoint                                  C1
3:19   Break in content                           D
6:15   Entrance of cello theme                    A2
6:39   Orchestral theme, building to high point   B2
7:02   High point                                 C2

B. Participants

For this experiment we sought healthy, right-handed participants between 18-35 years old, with normal hearing, who were fluent in English, and who had no cognitive or decisional impairments. As formal musical training has been shown to produce enhanced cortical responses to music (Pantev et al., 1998), we sought participants with at least five years of formal training in classical music. Qualifying forms of training included private lessons, AP or college-level music theory courses, and composition lessons. Years of training did not need to be continuous. Because we wished to avoid involuntary motor activations found to occur in response to hearing music played by one's own instrument (Haueisen & Knösche, 2001), we recruited participants who had no training or experience with the cello. Finally, as we are interested in listener engagement, we sought participants who reported that they enjoyed listening to classical music, at least occasionally.

We collected usable data from 13 participants (four males, nine females) ranging from 18-34 years of age (mean 23.08 years). All participants met the eligibility requirements for the experiment. Years of formal musical training ranged from 7-17 years (mean 11.65 years). Eight participants were currently involved in musical activities at the time of their experimental session. Music listening ranged from 3-35 hours per week (mean 13.96 hours). Data from three additional participants were excluded during preprocessing due to gross noise artifacts. This study was approved by Stanford University's Institutional Review Board, and all participants delivered written informed consent prior to their participation in the experiment.

C. Experimental Procedure

The experiment was organized in two blocks. The first block involved the simultaneous EEG and physiological recordings. Here, the participant was instructed to sit still and
listen attentively to the stimuli as they played (no other task was performed) while viewing a fixation image on the monitor in front of them. Each stimulus was presented once in random order and was always preceded by a 1-minute baseline recording, during which time pink noise was played at a low volume. At the conclusion of each trial, the participant used a computer keyboard to rate the pleasantness, arousal, interestingness, predictability, and familiarity of the excerpt just heard. We additionally collected a rating of genre exposure after the original stimulus only.

In the second block, continuous behavioral measures of engagement were recorded. We used the definition from the Schubert, Vincs, & Stevens (2013) study: Being 'compelled, drawn in, connected to what is happening, interested in what will happen next.' Participants were shown this definition prior to the trials in this block, and indicated their understanding of the definition and the task before proceeding with the trials. The participant operated a mouse-controlled slider whose position was displayed on the computer monitor. Each stimulus was presented once in random order.

The continuous behavioral block was always second in the experimental session because we did not want the specific definition of engagement presented there to influence participants' listening behavior during the neurophysiological block, which was intended to involve no cognitive task or effort on the part of the participant beyond listening to the stimuli. This of course introduces a potential confound into the present study, but we found it to be the most appropriate design, as we wanted to collect the full set of responses from each participant. We note that Steinbeis, Koelsch, & Sloboda (2006), who also employed separate neurophysiological and continuous behavioral recordings, presented the continuous behavioral block first. However, participants in that study were given a separate task during the latter block (comparing lengths of stimuli), and thus were likely not impacted by the task of the previous block.

D. Stimulus Delivery and Data Acquisition

Experiments for both blocks were programmed using the Matlab Psychophysics Toolbox (Brainard, 1997) with Matlab R2013b on a Dell Inspiron 3521 laptop computer running the Windows 7 Professional operating system. Audio was output from the stimulus computer via USB to a Native Instruments Komplete Audio 6 sound card, from which the audio channel containing the stimulus was sent to a Behringer Xenyx 502 mixer, split, and presented through two magnetically shielded Genelec 1030A speakers. The second audio channel, containing the timing clicks, was sent directly to the EEG amplifier through one pin of a modified DB-9 cable, which also sent stimulus trigger labels from Matlab and key press events from the participant. The slider interface for the continuous behavioral response block was programmed using a custom implementation within the Psychophysics Toolbox scheme.

Neurophysiological responses were recorded using the Electrical Geodesics, Inc. (EGI) GES 300 system. Dense-array EEG was recorded with vertex reference using unshielded HCGSN 110 and 130 nets. ECG responses were recorded using a two-lead configuration with Hydrocel Ag-Cl snap electrodes. Respiratory activity was measured using thorax and abdomen belts that plugged into a Z-Rip Belt Transducer Module. The ECG and respiratory sensors, leads, and apparatus were obtained from EGI and were approved for use with their Polygraph Input Box (PIB); the electrode net provided the ground for these inputs. While EGI does not currently provide an apparatus for measuring GSR, we attempted to collect it using a custom apparatus which plugged into the PIB. The electrode net and the PIB connected to the Net Amps 300 amplifier. All responses were recorded simultaneously at a sampling rate of 1 kHz. The following filters were applied to the physiological responses at acquisition:
• ECG and respiratory: Highpass 0.1 Hz, lowpass 100 Hz, notch 60 Hz;
• GSR: Highpass 0.05 Hz, lowpass 3 Hz, notch 60 Hz.
No filtering was applied to the EEG responses at acquisition. Physiological signals and amplified EEG signals were recorded using EGI Net Station software, version 4.5.7, on a Power Mac G5 desktop computer running the OSX operating system, version 10.6.8. Continuous behavioral responses were recorded to the stimulus laptop at a sampling rate of 20 Hz.

III. DATA ANALYSIS

E. EEG Preprocessing and Analysis

Initial preprocessing steps were performed using EGI's Net Station software. EEG recordings were bandpass filtered between 0.3-50 Hz and temporally downsampled by a factor of 8 to a sampling rate of 125 Hz. Data records were then exported to .mat format.

All subsequent analyses were performed in Matlab using a custom software implementation. Individual recordings were annotated, and trials were epoched using the precisely timed click events sent from the secondary audio channel directly to the EEG amplifier. Despite the filtering step described above, we found it necessary to DC-correct and detrend each trial epoch of data. Following these procedures, we retained electrodes 1 through 124 for analysis, excluding electrodes on the face. Bad electrodes were identified and removed from the data frame (i.e., the number of rows in the data decreased). Ocular artifacts were removed from concatenated trials of each EEG recording in a semi-automatic procedure using Extended Infomax ICA (Bell & Sejnowski, 1995; Jung et al., 1998), as implemented in the Matlab EEGLAB Toolbox (Delorme & Makeig, 2004). Following this, any electrode for which at least 10% of voltage magnitudes exceeded 50 μV across the recording was flagged as a bad electrode; if any such electrodes were identified, the preprocessing process was re-started with these channels of data removed. Once no further electrodes were identified in this stage, any electrode for which at least 10% of voltage magnitudes exceeded 50 μV for a given trial was removed from that trial only. As final preprocessing steps, we removed noisy transient spikes from the data in an iterative procedure, setting to NaN any data samples whose magnitude voltage exceeded four standard deviations of its channel's mean power. Next, the removed rows corresponding to bad electrodes were reconstituted with rows of NaNs. Finally, a row of zeros representing the reference channel was added to the data matrix, and the data were converted to average reference.

Data from 3 (out of 16) participants who completed the experiment had to be excluded from further analysis, resulting
in the 13 usable datasets. Excluded participants were identified during preprocessing on the basis of gross noise artifacts (20 or more bad electrodes). We also excluded the physiological and behavioral responses collected from these participants so that the comparison of results would always involve the same population of participants.

Cleaned data frames for each stimulus were aggregated across participants into time-by-electrodes-by-participant matrices. RCA was computed over responses to both stimuli using a publicly available Matlab implementation (Dmochowski, Greaves, & Norcia, 2015), according to the procedure described in Dmochowski et al. (2012). This is the RCA output that we use in the computation of ISCs. For visual comparison of the topographies only, we also ran RCA separately for the set of responses corresponding to each stimulus condition.

F. Physiological Response Preprocessing

Over the course of data collection, we discovered that the GSR apparatus was not working correctly. This response is therefore excluded from analysis. It appears that the apparatus also introduced substantial noise into the EEG recordings. However, we used the GSR apparatus with all participants in order to maintain a consistent experimental procedure.

The Net Station waveform tool used to filter and downsample the EEG data would perform only the downsampling step on physiological responses. Therefore, to avoid possible aliasing artifacts that could result from downsampling without a prior lowpass filtering step, we exported the physiological responses at the original sampling rate (1 kHz) and performed all filtering operations in Matlab.

Full recordings of ECG responses (prior to epoching) were lowpass filtered to 50 Hz using a zero-phase 8th-order Chebyshev Type I filter, and then downsampled by a factor of 8. We then epoched the data into baseline and stimulus trials. Each ECG signal was converted to a time-resolved measure of heart rate (HR), in beats per minute, using time intervals between successive R peaks, and then spline interpolated to a sampling rate of 125 Hz. We baseline corrected each HR vector by subtracting out the mean bpm across 25-55 seconds of the corresponding baseline recording.

We found the chest and abdomen respiratory activations to be highly correlated, and focus our analyses here on the chest activations only, which tended to be larger in amplitude. Full recordings of respiratory responses were lowpass filtered using a zero-phase 8th-order Butterworth filter with a cutoff frequency of 1 Hz, and then temporally downsampled by a factor of 8. The data were then epoched into baseline and stimulus trials. While from the ECG responses we computed only HR, we can derive two measures of interest from the respiratory activations: respiratory rate and respiratory amplitude. Our first step in computing these measures over time was to employ a peak-finding algorithm to identify the positive and negative peaks in the response. From these, respiratory rate over time (RRate) was computed from the time intervals between successive positive peaks. Respiratory amplitude over time (RAmpl) was computed from successive peak-to-trough and trough-to-peak amplitude differences. Finally, both response vectors were spline-interpolated to a sampling rate of 125 Hz. As with the HR data, these measures were baseline corrected using mean values computed from 25-55 seconds of the respective baseline recording.

Due to possible orienting responses at the start of a trial (Lundqvist et al., 2009), as well as spline-interpolation artifacts at the start and end of the trials, we discarded the first and last 5 seconds of data from all physiological responses.

G. Continuous Behavioral Response Preprocessing

Continuous behavioral (CB) responses were aggregated across participants into a time-by-participants matrix. While the interpolation artifacts mentioned above were not an issue here, this type of response is known to suffer from reliability issues at the start and end of the stimulus (Schubert, 2013). Therefore, to account for that and for any orienting responses, we discarded the first and last 10 seconds of response from further analysis. Once these potentially transient portions of the data were removed, we z-scored each response.

H. Inter-Subject Correlations

Inter-subject correlations of EEG, HR, RAmpl, RRate, and CB responses were each computed using a 10-second sliding window that advanced in 1-second increments, resulting in an effective temporal resolution of 1 Hz. In each time window, ISCs were computed across all unique pairs of participants; for 13 participants, this comes out to (13 choose 2) = 78 pairs. We present mean ISCs across participant pairs for every temporal window, mapped to the midpoint times of the windows.

I. Statistical Analyses

We performed paired two-tailed t-tests on the participant ratings that were delivered after each trial of the neurophysiological block, using the Bonferroni correction for multiple comparisons (McDonald, 2014).

To assess the statistical significance of mean HR, RAmpl, RRate, and CB activity responses, we performed two-tailed Wilcoxon signed-rank tests (Lehmann, 2006) across the collection of responses at every time point. This assessed whether the sample of responses at a given time point was drawn from a zero-median distribution. We report statistically significant results at α=0.05 after controlling for False Discovery Rate (Benjamini & Yekutieli, 2001).

Statistical significance of time-resolved ISCs was assessed using a permutation test (Fisher, 1971). In every permutation iteration, each participant's response vector was shuffled according to non-overlapping 5-second segments. ISCs were then computed over the collection of the shuffled responses. This procedure was repeated for 500 iterations. The 0.95 quantile of mean ISCs across all permutation iterations designates the statistical significance threshold.

IV. RESULTS

J. Behavioral Ratings

We first analyzed the behavioral ratings collected during the neurophysiological block to determine whether participant ratings of pleasantness, arousal, interestingness, predictability, and familiarity differed significantly according to stimulus condition. The distribution of responses, along with responses from individual participants and p-values from the tests, is shown in Figure 1. Using the Bonferroni-corrected p-value
threshold of 0.01, only the dimensions of pleasantness and experimental blocks, we may assess the following measures in
predictability are found to vary significantly according to the tandem: EEG-ISCs, CB activity, CB-ISCs, HR activity,
stimulus condition. In both cases, the original (forward) HR-ISCs, RAmpl activity, RAmpl-ISCs, RRate activity, and
stimulus garners higher ratings. Interestingly, we note that RRate ISCs. While we do not attempt to relate the responses
while familiarity with the stimuli was low overall, two of the to one another quantitatively at this time, several interesting
13 participants reported that they were at least moderately preliminary insights emerge from visual inspection of the
familiar with the reversed stimulus. Finally, ratings of genre responses, in particular regarding their respective regions of
exposure range from 2-9 with a mean value of 5.08, verifying statistical significance, in relation to points of interest in the
that all participants met the inclusion criterion of listening to stimulus that we defined in Table 1.
classical music at least occasionally. The responses are shown in aggregate in Figure 3. The gray
Pleasant Arousing Interesting Predictable Familiar Genre
9 9 9 9 9 9
line plots in the top portion of the figure show scale-free
8 8 8 8 8 8 activations or ISCs of the responses over the course of the
7
6
7
6
7
6
7
6
7
6
7
6
musical excerpt. The colored regions denote points at which
Rating
3
4
3
4
3
4
3
4
3
4
3
portion of the figure shows the waveform of the stimulus and
2 2 2 2 2 2 the stimulus spectrogram. The vertical dotted lines denote the
1
Orig Rev
1
Orig Rev
1
Orig Rev
1
Orig Rev
1
Orig Rev
1
22
time points of interest as described in Table 1.
p=0.0001 p=0.0371 p=0.0656 p=0.003 p=0.1928 As can be seen from the plot, all responses except HR
Figure 1. Behavioral ratings of the stimuli. Participants activity produce statistically significant activity or synchrony
delivered ratings along dimensions of pleasantness, arousal, at some point over the course of the excerpt. The EEG-ISCs,
interestingness, predictability, and familiarity after hearing each our primary measure of interest, produce the greatest
stimulus for the first time. Participants reported their exposure proportion of statistically significant results. Following that,
to the genre of the original stimulus only. Boxplots as well as CB activity and CB-ISCs achieve the greatest proportions of
individual participant ratings are shown. Ratings of Pleasantness statistical significance across the excerpt.
and Predictability are found to differ significantly by stimulus In relation to our identified points of interest throughout the
condition using a corrected p-value threshold of 0.01.
K. RC Topographies

The RCA algorithm returns a mixing matrix W, which is used to project the EEG data from electrode space to component space. The scalp projections of the weights, useful for visualizing the topographies of components, can be computed by means of a forward model (Parra et al., 2005; Dmochowski et al., 2012). The projections corresponding to the weights of the most reliable component (RC1) are shown in Figure 2. As shown in the figure, a similar fronto-central topography emerges whether RCA is computed over the responses to both stimuli together, or over the responses to the original or reversed stimulus only. While the present topographies are roughly consistent with the RC1 topography produced in response to intact musical excerpts by Kaneshiro et al. (2014), they are now somewhat less symmetric and smooth, which we believe was due to the added noise introduced to the EEG recordings by the GSR apparatus.

Figure 2. Scalp topographies of maximally reliable EEG components. Left: RC1 was computed over responses to both stimuli. Center, Right: RC1 was computed separately over responses to each stimulus condition. Topographies are roughly consistent with past RC1 topographies in response to music.

L. Combined Responses

Our present focus for the remaining results concerns responses to the original stimulus only. To summarize the neurophysiological and continuous behavioral measures that were derived from the responses recorded across the two […] piece, it appears that different measures are significant at different times. The first entrance of the cello theme (A1) brings about significant CB activity, as well as RRate-ISCs, which, from inspection of the RRate plot above it, appear linked to an overall decrease in respiratory rate. The second entrance of the principal theme (A2) is accompanied by significant EEG-ISCs; while CB activity is not significant here, HR-ISCs are significant for a short time. The period of rising tension beginning at B1 and culminating in the first highpoint at C1 implicates EEG-ISCs and CB activity, with the onset of the highpoint itself co-occurring with significant CB activity, CB-ISCs, RAmpl-ISCs, and RRate-ISCs. Somewhat similar observations can be made in the buildup to the second highpoint, between B2 and C2. Interestingly, point D, which we identified as a point of interest due to its absence of activity, corresponds with a significant region of RAmpl activity, and is followed closely by significant CB-ISCs, HR-ISCs, and RAmpl-ISCs.

It appears that for the responses from which we derived dual activity and ISC measures (all responses except EEG), significant regions of activity do not necessarily coincide with significant regions of synchrony. In particular, the continuous behavioral responses appear to alternate in significance, and in fact point back to the finding reported by Schubert, Vincs, & Stevens (2013) regarding ‘gem moments’. Based on the activations around C1 and C2, it appears that for the present experiment, the ‘gem moments’ might be the highpoints of the piece, at which time the continuous behavioral activity is significantly above baseline. It may also be the case that significant CB-ISCs relate to the building of tension around many of the structural segmentation points of interest.

V. DISCUSSION

In this study, we have further validated the use of a state-of-the-art analysis method to study cortical responses to naturalistic music excerpts presented in a single-listen
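The projection from electrode space to component space, and the forward-model computation behind the scalp maps in Figure 2, can be sketched as follows. This is a minimal illustration under assumed array shapes, not the authors' code; for a single spatial filter w, the forward model a = Σw / (wᵀΣw) follows the recipe described by Parra et al. (2005).

```python
import numpy as np

def component_and_forward_model(X, w):
    """Project EEG to component space and compute its scalp projection.

    X : (n_electrodes, n_samples) EEG data
    w : (n_electrodes,) spatial filter (one column of the mixing matrix W)

    Returns the component time course y = w^T X and the forward-model
    projection a, which gives the scalp topography of the component.
    """
    y = w @ X                     # component time course
    cov = X @ X.T                 # (unnormalized) spatial covariance
    a = cov @ w / (w @ cov @ w)   # forward model: a = Sigma w / (w' Sigma w)
    return y, a
```

Plotting `a` on the electrode layout yields a topography of the kind shown in Figure 2.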
Figure 3. Aggregate neurophysiological and continuous behavioral measures. Top: The measures, in order, are EEG inter-subject
correlations (EEG-ISCs), continuous behavioral median z-scored activity and inter-subject correlations (CB, CB-ISC), heart rate
median deviation from baseline and inter-subject correlations (HR, HR-ISC), respiratory amplitude median deviation from baseline
and inter-subject correlations (RAmpl, RAmpl-ISC), and respiratory rate median deviation from baseline and inter-subject
correlations (RRate, RRate-ISC). The colored regions on each line plot denote periods of statistically significant activity or synchrony.
Middle/Bottom: The stimulus waveform and spectrogram are plotted for reference. Dashed vertical lines correspond to musical events
of interest. Annotations at the top of the plot (A1, B1, C1, D, A2, B2, C2) correspond to the labels in Table 1.
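The windowed ISC measures summarized in Figure 3 can be sketched as follows: a minimal illustration (not the authors' pipeline) of inter-subject correlation computed as the mean pairwise Pearson correlation over 10-second windows, with an optional backward time shift of the kind discussed for stimulus-to-response lags. The sampling rate, the use of non-overlapping windows, and the lag value are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def windowed_isc(responses, fs, win_sec=10.0, lag_sec=0.0):
    """Mean pairwise Pearson ISC in non-overlapping windows.

    responses : (n_subjects, n_samples) array of one response measure
    fs        : sampling rate in Hz
    win_sec   : window length in seconds (10 s, as in the ISC analysis)
    lag_sec   : drop this many seconds from the start of each response,
                i.e., shift responses back toward the stimulus events
    """
    shift = int(round(lag_sec * fs))
    r = responses[:, shift:] if shift > 0 else responses
    win = int(round(win_sec * fs))
    n_win = r.shape[1] // win
    isc = np.empty(n_win)
    for k in range(n_win):
        seg = r[:, k * win:(k + 1) * win]
        pair_r = [np.corrcoef(seg[i], seg[j])[0, 1]
                  for i, j in combinations(range(seg.shape[0]), 2)]
        isc[k] = np.mean(pair_r)  # mean over all subject pairs
    return isc
```

Each window's ISC could then be tested against a null distribution (e.g., from phase-scrambled or circularly shifted responses) to mark the significant regions shown in the figure.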
paradigm using EEG. Furthermore, we took first steps toward analyzing these responses alongside various physiological and behavioral responses, seeking points of correspondence among the responses and in relation to the musical stimulus. We analyzed both the activity and the synchrony of the physiological and continuous behavioral responses, and found that these dual measures reached statistical significance at times in tandem, at times in alternation. Finally, by assessing measures intended to denote engagement in conjunction with measures intended to denote arousal, we took a first step toward disentangling these two constructs.

A number of issues can be improved upon in future iterations of this research. First, more work is needed to integrate a GSR recording modality into the EGI PIB for simultaneous collection with the EEG responses. Next, given that our stimulus was chosen for its variation in arousal, which is arguably related to loudness, it may be useful to introduce other control stimuli that retain the amplitude envelope of the original version while manipulating the musicality and temporal coherence of the underlying content. Finally, it is important to note that each response analyzed here occurs at some point after its corresponding musical event; these lags vary according to the measure and have not been definitively determined. For example, while we may assume that EEG responses occur within 500 msec of a stimulus and, in the context of the 10-second ISC window, can be treated as concurrent with the stimulus, more care must be taken with the other responses. Stimulus-to-response time has been estimated to range from 1-5 seconds for physiological responses (Schubert & Dunsmuir, 1999; Grewe et al., 2007a; Grewe, Kopiez, & Altenmüller, 2009; Bracken et al., 2014), and roughly 1-3 seconds for behavioral responses (Krumhansl, 1996; Schubert, 2004; Sammler et al., 2007; Egermann et al., 2013). From inspection of the regions of significance in Figure 3, it appears that shifting the continuous behavioral and physiological responses back in time by such amounts (to the corresponding stimulus events) might in fact improve their alignment with our musical events of interest in some cases, and lead to alternate interpretations in others. Thus, this will be an interesting topic to consider further in future research.

In conclusion, EEG-ISCs, physiological responses, and continuous self-reports suggest a promising combined framework with which to characterize musical engagement. Fluctuations in the quantitative physiological measures reflect fluctuations in arousal levels which, when they change in approximate simultaneity across measures and in correspondence with changes in the musical signal (such as changes in spectral energy or intensity, or a rise in musical tension), can be interpreted in conjunction with both cortical and continuous behavioral measures of engagement.

ACKNOWLEDGMENT

This research was supported by a grant from the Wallenberg Network Initiative: Culture, Brain, Learning. The authors thank Jay Kadis for his assistance with the custom GSR apparatus.

REFERENCES

Agawu, V. K. (1984). Structural ‘Highpoints’ in Schumann’s ‘Dichterliebe’. Music Analysis, 3(2), 159-180.
Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6), 1129-1159.
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29(4), 1165-1188.
Bracken, B. K., Alexander, V., Zak, P. J., Romero, V., & Barraza, J. A. (2014). Physiological synchronization is associated with narrative emotionality and subsequent behavioral response. In Foundations of Augmented Cognition, 3-13. Springer.
Brainard, D. H. (1997). The psychophysics toolbox. Spatial Vision, 10, 433-436.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1), 9-21.
Dmochowski, J. P., Sajda, P., Dias, J., & Parra, L. C. (2012). Correlated components of ongoing EEG point to emotionally laden attention - a possible marker of engagement? Frontiers in Human Neuroscience, 6(112).
Dmochowski, J. P., Bezdek, M. A., Abelson, B. P., Johnson, J. S., Schumacher, E. H., & Parra, L. C. (2014). Audience preferences are predicted by temporal reliability of neural processing. Nature Communications, 5.
Dmochowski, J. P., Greaves, A. S., & Norcia, A. M. (2015). Maximally reliable spatial filtering of steady state visual evoked potentials. NeuroImage, 109, 63-72.
Egermann, H., Pearce, M. T., Wiggins, G. A., & McAdams, S. (2013). Probabilistic models of expectation violation predict psychophysiological emotional responses to live concert music. Cognitive, Affective, & Behavioral Neuroscience, 13, 533-553.
Fisher, R. A. (1971). The Design of Experiments. New York: Macmillan Publishing Co.
Gomez, P., & Danuser, B. (2004). Affective and physiological responses to environmental noises and music. International Journal of Psychophysiology, 53, 91-103.
Gomez, P., & Danuser, B. (2007). Relationships between musical structure and psychophysiological measures of emotion. Emotion, 7(2), 377-387.
Grewe, O., Nagel, F., Kopiez, R., & Altenmüller, E. (2007a). Emotions over time: Synchronicity and development of subjective, physiological, and facial affective reactions to music. Emotion, 7(4), 774-778.
Grewe, O., Nagel, F., Kopiez, R., & Altenmüller, E. (2007b). Listening to music as a re-creative process: Physiological, psychological, and psychoacoustical correlates of chills and strong emotions. Music Perception, 24(3), 297-314.
Grewe, O., Kopiez, R., & Altenmüller, E. (2009). The chill parameter: Goose bumps and shivers as promising measures in emotion research. Music Perception, 27(1), 61-74.
Grewe, O., Katzur, B., Kopiez, R., & Altenmüller, E. (2010). Chills in different sensory domains: Frisson elicited by acoustical, visual, tactile and gustatory stimuli. Psychology of Music, 39(2), 220-239.
Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject synchronization of cortical activity during natural vision. Science, 303(5664), 1634-1640.
Haueisen, J., & Knösche, T. R. (2001). Involuntary motor activity in pianists evoked by music perception. Journal of Cognitive Neuroscience, 13(6), 786-792.
Jung, T. P., Humphries, C., Lee, T. W., Makeig, S., McKeown, M. J., Iragui, V., & Sejnowski, T. J. (1998). Extended ICA removes artifacts from electroencephalographic recordings. Advances in Neural Information Processing Systems, 894-900.
Kaneshiro, B., Dmochowski, J. P., Norcia, A. M., & Berger, J. (2014). Toward an objective measure of listener engagement with natural music using inter-subject EEG correlation. In ICMPC13-APSCOM5.
Kaneshiro, B., & Dmochowski, J. P. (2015). Neuroimaging methods for music information retrieval: Current findings and future prospects. In ISMIR’15.
Koelsch, S., Kilches, S., Steinbeis, N., & Schelinski, S. (2008). Effects of unexpected chords and of performer's expression on brain responses and electrodermal activity. PLoS ONE, 3(7), e2631.
Krumhansl, C. L. (1996). A perceptual analysis of Mozart's Piano Sonata K. 282: Segmentation, tension, and musical ideas. Music Perception, 13(3), 401-432.
Lehmann, E. L. (2006). Nonparametrics: Statistical Methods Based on Ranks. Springer.
Lundqvist, L. O., Carlsson, F., Hilmersson, P., & Juslin, P. (2009). Emotional responses to music: Experience, expression, and physiology. Psychology of Music, 37(1), 61-90.
Madsen, C. K., & Geringer, J. M. (1990). Differential patterns of music listening: Focus of attention of musicians versus nonmusicians. Bulletin of the Council for Research in Music Education, 45-57.
Madsen, C. K., Brittin, R. V., & Capperella-Sheldon, D. A. (1993). An empirical method for measuring the aesthetic experience to music. Journal of Research in Music Education, 41(1), 57-69.
McDonald, J. H. (2014). Handbook of Biological Statistics. Baltimore: Sparky House Publishing.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392(6678), 811-814.
Parra, L. C., Spence, C. D., Gerson, A. D., & Sajda, P. (2005). Recipes for the linear analysis of EEG. NeuroImage, 28(2), 326-341.
Rickard, N. S. (2004). Intense emotional responses to music: A test of the physiological arousal hypothesis. Psychology of Music, 32(4), 371-388.
Russo, F. A., Vempala, N. N., & Sandstrom, G. M. (2013). Predicting musically induced emotions from physiological inputs: Linear and neural network models. Frontiers in Psychology, 4, article 468.
Sammler, D., Grigutsch, M., Fritz, T., & Koelsch, S. (2007). Music and emotion: Electrophysiological correlates of the processing of pleasant and unpleasant music. Psychophysiology, 44(2), 293-304.
Schmälzle, R., Häcker, F. E., Honey, C. J., & Hasson, U. (2015). Engaged listeners: Shared neural processing of powerful political speeches. Social Cognitive and Affective Neuroscience, nsu168.
Schubert, E., & Dunsmuir, W. (1999). Regression modelling continuous data in music psychology. Music, Mind, and Science, 298-352.
Schubert, E. (2004). Modeling perceived emotion with continuous musical features. Music Perception, 21(4), 561-585.
Schubert, E. (2013). Reliability issues regarding the beginning, middle and end of continuous emotion ratings to music. Psychology of Music, 41(3), 350-371.
Schubert, E., Vincs, K., & Stevens, C. J. (2013). Identifying regions of good agreement among responders in engagement with a piece of live dance. Empirical Studies of the Arts, 31(1), 1-20.
Solomon, J. (2009). Deconstructing the definitive recording: Elgar’s Cello Concerto and the influence of Jacqueline du Pré. Unpublished manuscript.
Steinbeis, N., Koelsch, S., & Sloboda, J. A. (2006). The role of harmonic expectancy violations in musical emotions: Evidence from subjective, physiological, and neural responses. Journal of Cognitive Neuroscience, 18(8), 1380-1393.
This study uses electroencephalography (EEG) to determine how neural functions for
auditory rhyme processing are related to expertise in freestyle lyricism, a form of Hip-
Hop rap improvisation involving the spontaneous composition of rhyming lyrics.
Previous event-related potential studies have shown that rhyming congruency is reflected
in the phonological N400 component, and that musical expertise modulates the difference between technical and aesthetic judgments of musical stimuli, as reflected in the contingent negative variation (CNV) component preceding the critical moment. Here, we
asked expert freestyle lyricists and laypersons to listen to pseudo-word triplets in which
the first two words always rhymed while the third one either fully rhymed (e.g., STEEK,
PREEK; FLEEK), half-rhymed (e.g., STEEK, PREEK; FREET), or did not rhyme at all
(e.g., STEEK, PREEK; YAME). For each trial, upon hearing the third word, participants
had to answer one of two types of questions: one aesthetic (do you ‘like’ the rhyme?) and
one descriptive (is the rhyme ‘perfect’?). Responses were indicated by a button-press
(yes/no). Rhyme effect was investigated as the difference between full-rhyme and non-
rhyme conditions, and was measured beginning from the vowel onset of the critical word.
Rhyme effects were observed as a fronto-central negativity, strongest in left and right
lateral sites in lyricists; this effect was highly left-lateralized in laypersons. A stronger
CNV in the ‘perfect’ task, leading up to the third word, was observed for both lyricists and laypersons, which suggests that determining whether rhymes are perfect requires more effort than making aesthetic judgments of rhyme. This task effect was observed in
lyricists over fronto-central electrode sites while appearing at the midline for laypersons.
The results indicate that expertise in freestyle lyricism may be associated with more
expansive and bilateral neurocognitive networks for processing of rhyme information.
ISBN 1-876346-65-5 © ICMPC14 48 ICMPC14, July 5–9, 2016, San Francisco, USA
Individual scale tones are known to evoke stable qualia for experienced listeners (Arthur,
2015; Huron, 2006). In this study, we ask whether melodic intervals evoke distinctive
qualia, and whether these qualia are independent of the scale degree qualia of the
constituent tones. The first study was an open-ended exploratory task in which musician
listeners provided free association terms in response to ascending and descending
melodic intervals in both tonal and atonal contexts. Content analysis was carried out on
the resulting terms using independent assessors. In order to test the validity of the qualia
dimensions, a second study asked independent participants to judge melodic intervals
according to the dimensions arising from the content analysis from the first study. In
particular, we tested for intersubjective reliability suggesting that interval qualia are
stable experiences shared by Western-enculturated listeners. Even if the dimensions are
shown to exhibit high reliability, a possible objection would be that listeners are actually
responding to the qualia evoked by the constituent scale degrees rather than by the
intervals per se. Accordingly, a third study was conducted in which participants again
heard melodic intervals in both key contexts and atonal contexts. Intervals were judged
using only those dimensions validated in Study 2. Two analyses were used to interpret
the results of Study 3. First, do the qualia judgments for key-contextualized intervals
converge with the qualia judgments for the a-contextualized intervals? That is, is there
any evidence that intervals have stable qualia independent of key context? Second, can
we predict the qualia of the contextualized intervals as some combination of the qualia of
the constituent tones? For example, if the qualia of an interval are similar to the qualia of
the final tone forming that interval, then the idea that intervals evoke independent qualia
would be thrown into question.
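The final question, whether interval qualia can be predicted as some combination of the qualia of the constituent tones, can be posed in sketch form as a least-squares fit; the arrays and the linear-combination assumption below are illustrative, not the study's analysis.

```python
import numpy as np

def fit_tone_combination(first_tone_q, second_tone_q, interval_q):
    """Fit interval qualia as w1*first-tone + w2*second-tone qualia.

    first_tone_q, second_tone_q : (n_intervals, n_dimensions) qualia
                                  ratings of the constituent scale degrees
    interval_q                  : (n_intervals, n_dimensions) judged
                                  qualia of the intervals themselves

    Returns (w1, w2) minimizing squared error across all dimensions.
    Weights near (0, 1) would suggest interval qualia simply track the
    final tone, calling independent interval qualia into question.
    """
    # Stack predictors: each row pairs one first-tone and one second-tone value
    X = np.column_stack([first_tone_q.ravel(), second_tone_q.ravel()])
    y = interval_q.ravel()
    (w1, w2), *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(w1), float(w2)
```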
The digitization and streaming of music online have resulted in unprecedented access to large quantities of music-information and user-consumption data. Publicly available resources such as the Echo Nest have analyzed and extracted audio features for millions of
songs. We investigate whether preferences for audio features in a listener's main genre
are expressed in the other genres they download, a phenomenon we refer to as “leaky
features”. For example, do people who prefer Metal listen to faster and louder Classical
or Country music (Metal typically has a higher tempo than many other genres)? A
music-consumption database, consisting of over 1.3 billion downloads from 17 million
users between 2007 and 2014, is used to explore 10 high-level audio features, grouped into 3
categories: rhythmic (danceability, tempo, duration), environmental (liveness,
acousticness, instrumentalness), and psychological (valence, energy, speechiness,
loudness). The study was made possible with a data-sharing and cooperation agreement
between McMaster University and Nokia Music, a global digital music streaming service.
Genre information was provided directly by the labels. Audio features for 7 million songs
were collected using the Echo Nest API and linked to the Nokia Music database. Although
not exhaustive, audio features utilized in this study represent a core set of objective
values by which music audio is currently analyzed. We found audio features specific to
users’ main genres “leaked” into other downloaded genres. Audio features that tend to be
more exaggerated in one genre, such as “speechiness” in Rap music, typically
demonstrate more leakiness than genres without this characteristic, such as Classical.
These results have implications for music-recommendation systems, and future music
cognition studies. Additionally, this work demonstrates a unique approach to conducting
psychological research exploring the extent to which musical features remain tied to
specific genres.
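One way to operationalize “leaky features”, sketched under assumed data structures rather than the study's actual pipeline, is to correlate each user's mean feature value inside a main genre with their mean value in the other genres they download; the function and tuple format below are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def leakiness(downloads, main_genre):
    """Correlate a feature's mean value inside vs. outside a main genre.

    downloads  : iterable of (user_id, genre, feature_value) tuples
    main_genre : genre treated as each user's main genre here (a
                 simplifying assumption; the study derived it per user)

    Returns the Pearson r, across users, between the mean feature value
    in main_genre and the mean value in all other downloaded genres.
    A strong positive r is consistent with the feature "leaking".
    """
    inside, outside = defaultdict(list), defaultdict(list)
    for user, genre, value in downloads:
        (inside if genre == main_genre else outside)[user].append(value)
    users = sorted(set(inside) & set(outside))  # users seen in both groups
    x = [np.mean(inside[u]) for u in users]
    y = [np.mean(outside[u]) for u in users]
    return float(np.corrcoef(x, y)[0, 1])
```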
In the past decade, there has been increasing interest in the relation between music and
language processing. One area of overlapping research is in the domain of rhythm. In this
study, we examined individual differences in aesthetic responses during silent reading of
sonnets – a highly structured literary form in iambic pentameter that shares a number of
rhythmic characteristics with music. Participants read sixteen sonnets at their own self-
defined pace, first for familiarity and then to make judgments about different
characteristics of the poems. For the second reading of each sonnet, participants judged
vividness of the poem’s imagery, valence of poem theme, valence of personal feelings
about the poem, strength/intensity of their feelings, and aesthetic pleasure (AP). To
examine individual differences in poetry perceptions, we also assessed, prior to the first
reading, both positive and negative affect using the PANAS, and considered the role of
visual and auditory imagery abilities using the VVIQ and BAIS survey measures,
respectively. We also considered potential differences associated with music and literary
training. As an online measure of AP, participants with literary training highlighted words
within the sonnets that they felt were particularly aesthetically pleasing (positive) or
displeasing (negative). Results revealed that strength/intensity of feelings, valence of
feelings, and positive affect accounted for approximately 76% of the variance in AP
ratings. Moreover, we found that positive highlighting was positively correlated with AP
ratings; to our surprise, this contrasted with negative highlighting, which had no impact
on AP ratings. Although neither imagery vividness nor visual/auditory imagery abilities
contributed substantially to AP, vividness was a greater predictor of AP for individuals
with high imagery abilities. Findings will be discussed in the context of the relation
between rhythm and aesthetic pleasure in music and language more broadly.
Music triggers visual imagery quite easily (Osborne, 1980). This imagery influences the
meaning people ascribe to music and the emotions they experience while listening (Juslin
& Västfjäll, 2008). The musical structure, its contextual associations, and characteristics
of the listener likely all play a role in stimulating visual imagery. This study aims to
understand how visual imagery responses are affected by extramusical descriptions, and
what role individual differences play. One hundred forty participants were randomly
assigned to one of three groups: congruent, incongruent, or no description. Each group
heard the same nine orchestral excerpts in a randomized order. A prior study with twenty-
four participants established a set of brief descriptions of visual imagery that seemed
congruent or incongruent with each excerpt. Participants in the congruent group and
incongruent group read a description of congruent or incongruent visual imagery,
respectively, before each excerpt. Participants in the third group listened without an
imagery prompt. After each excerpt, participants answered questions about their
experience of the music. At the end of the study, participants completed the Goldsmiths-
MSI and the Absorption in Music Scale. Linear mixed models were conducted with the
group as the independent variable and imagery experience, emotional reaction to music,
enjoyment of music, and engagement with music as the dependent variables. Results
revealed that incongruent descriptions had a negative impact on participants’ visual
imagery to the music. The congruency or incongruency of the prompts also had
interesting effects on the free description reports of visual imagery. Additionally,
absorption in music was found to correlate positively with the overall imagery experience
of the listener. This study lays the foundation for future work on visual imagery to music.
Musical taste, understood as an attitude towards music, plays an important part in the way
music listeners in Western cultures perceive and construct their self-concept. Listeners
like or dislike specific music not only to satisfy their emotional and communicative
needs, but also to create and affirm their own identity. But until now, research has
focused mainly on the positive aspects of musical taste. To get to a better understanding
of it, qualitative in-depth-interviews (N = 21) were conducted to explore the different
dimensions and justifications individual participants offered in respect to music they
particularly dislike. Prior to interview sessions the participants were asked to list their
musical dislikes. Then, during the interviews they were asked to give reasons for why
they dislike each musical piece, artist or style given on their list and rate each item on a
10-point scale (0=neutral to 10=worst possible item). All interviews were analyzed using
qualitative content analysis. The results from this analysis show that reasons for likes and
dislikes do not necessarily form opposite pairs, but are in some instances identical. In this respect, liking and disliking serve the same function: listeners use both to express their identity and to encourage social contact and cohesion. But disliking certain music plays a
prominent role when listeners want to avoid negative emotional states and moods,
physical harm or unpleasant social situations. Also, all participants confirmed that they
can draw links between individual dislikes and their own identity. Ultimately, the study
provides further insight into the dimensions and functions of musical taste in general and
disliked music in particular, and into how specifically individual dislikes are relevant for
the creation and affirmation of one’s self-concept.
There are two lines of findings on the cross-modal effects between music and visual
stimuli. One claims that the effect of music on the impression of visual stimuli dominates the effect of visual stimuli on the impression of music; the other claims the reverse. In the author’s previous study, the cross-modal effect of music dominated that of paintings under relatively long exposure, while this dominance of music disappeared under brief exposure. One possible explanation is the steadiness of the stimuli: music changes continuously, but paintings do not. With long exposure, the effect of music accumulates, whereas the effect of paintings may not grow because of habituation. This
study examined the influence of steadiness on the cross-modal effects. In order to control
the steadiness, an affective picture was presented for 20 seconds in the single-picture condition, and 20 pictures were presented in succession for one second each in the 20-pictures condition. The music excerpts were 4 piano solos. Participants rated the potency and
brightness impression of music and pictures individually in the first part of the
experiment. In the second part, participants were presented music and pictures at the
same time and rated the impression of music or pictures. To compare the cross-modal
effects between music and pictures, musical impact and pictorial impact were calculated.
Musical impact was defined as the difference between the potency (or brightness) scores
of a visual stimulus with and without a musical excerpt. Pictorial impact was also defined
in the same way. For potency, an ANOVA revealed that the musical impact was significantly larger than the pictorial one in the single-picture condition; in the 20-pictures condition, no significant difference was found. For brightness, a similar tendency was observed. These results suggest that the steadiness of
stimulus influences the cross-modal effects between music and visual stimuli.
In terms of theoretical and empirical initiatives, Herbert (2011) put music absorption
back on the map of scholarly interests. However, having focused exclusively on everyday
life experiences, she and others have left absorbed listening in settings that generally
seem to require and assume full attention largely unexplored. The main aim of this study
was to gain insight into the underlying structure of ‘aesthetic absorption’ while listening to
music. A second aim was to distinguish between possibly different types of absorption
which may occur as part of aesthetic experiences, and to understand them in relation to
other types of attentional behavior when listening to music. Via an online survey (N=228)
participants were asked to listen to a self-chosen piece of highly-involving music. Data
were gathered on a variety of parameters related to consciousness and attention. Factor
analysis (PAF) and the Schmid-Leiman procedure provided a deeper and more parsimonious understanding of the higher-order structure of consciousness while being musically
engaged. Its cognitive-motivational part was used to further define the dimensionality of
absorption, and was linked to Tellegen’s model of absorption concerning the experiential
and instrumental sets. Next, a hierarchical cluster analysis was conducted in order to
explore different types of attentional behavior. Results suggested distinguishing between
two types of musical absorption which differ in the presence of meta-awareness. These
were termed ‘zoning-in’ and ‘tuning-in’ (cf. Smallwood 2007). Finally, a dynamic
framework was developed which integrates these types with other attentional phases
(e.g., mind wandering) applicable to music listening. It is argued that this framework may
be a realistic alternative to the often problematic distinction between everyday aesthetics
and art-centered aesthetic experiences.
The sound quality of car radios is often degraded by poor radio-wave reception. Ways to
estimate the quality of sounds transmitted by radio are therefore sought. We attempted to
establish a psychophysical way to evaluate the subjective distortedness of music in which
some parts were interrupted, as is often the case in car radio sound. A magnitude
estimation experiment was conducted. Two music sources of about 5 s, classic and rock,
were degraded by replacing some parts with temporal gaps. The number of gaps per
second was 2-16, and the duration of each gap was 6-48 ms, with 17 stimuli for each
music source. Twelve participants estimated the subjective distortedness of each
stimulus, responding with a number above or equal to zero. Zero meant that no distortion
was perceived. The results showed that magnitude estimation turned out to be a reliable
method to measure the subjective distortedness of music degraded by temporal gaps: the
subjective distortedness was described as a power function (with an exponent of about 0.7) of the total gap duration per second.
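The reported power-law relation can be sketched as follows. The scaling constant k is a hypothetical free parameter (the abstract reports only the exponent of about 0.7), so this is an illustration of the functional form in the style of Stevens' magnitude scaling, not the authors' fitted model.

```python
def predicted_distortedness(gaps_per_second, gap_ms, k=1.0, exponent=0.7):
    """Subjective distortedness as a power function of the total gap
    duration per second (gap duration summed over one second).

    gaps_per_second : number of gaps per second (2-16 in the study)
    gap_ms          : duration of each gap in milliseconds (6-48 ms)
    k               : hypothetical scaling constant (not reported)
    exponent        : about 0.7, as estimated from magnitude estimation
    """
    total_gap_sec = gaps_per_second * gap_ms / 1000.0
    return k * total_gap_sec ** exponent
```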
Table 1. Correlations between features and responses for each stimulus. Abbreviations: Aud = musical excerpt; Vis = animation; Lou =
loudness; Cen = spectral centroid; Dev = spectral deviation; Inh = inharmonicity; Rou = roughness; Fla = spectral flatness; Pit = pitch; Ons
= onset frequency; Spe = speed of motion; Coh = coherence of motion; Con = contrast; Bri = brightness.
Stimulus      Audio features                                          Visual features
Aud   Vis     Lou    Cen    Dev    Inh    Rou    Fla    Pit    Ons    Spe    Coh    Con    Bri
A1    -       .71    .04    .20    .02    .21    .29    -.01   .68    -      -      -      -
A2    -       .53    -.44   -.63   .04    .58    -.61   .28    .57    -      -      -      -
A3    -       .69    .65    .63    .40    .28    .53    .18    .12    -      -      -      -
A1    V1      .55    .17    .32    .09    -.01   .41    -.10   .69    .26    .20    .42    -.33
A1    V2      .71    .10    .24    -.04   .10    .34    .02    .73    .41    .39    -.06   .22
A1    V3      .75    .10    .24    -.01   .21    .26    .04    .62    .26    .17    -.28   -.23
A2    V1      .22    -.22   -.25   .07    .30    -.28   .39    .44    .37    -.16   .27    -.28
A2    V2      .30    -.31   -.38   .10    .38    -.40   .40    .48    .56    .52    .03    -.07
A2    V3      .36    -.42   -.42   .03    .33    -.43   .25    .46    .42    .22    -.32   -.14
A3    V1      .41    .32    .23    .21    .39    .20    .23    .17    .33    -.21   .04    -.03
A3    V2      .26    .22    .23    .17    .12    .18    .17    .36    .41    -.16   .30    -.42
A3    V3      .37    .35    .39    .31    .18    .29    .09    -.08   .39    .06    .17    .08
-     V1      -      -      -      -      -      -      -      -      .38    -.22   .39    .06
-     V2      -      -      -      -      -      -      -      -      .58    .29    .39    -.13
-     V3      -      -      -      -      -      -      -      -      .59    -.06   -.16   -.30
Mean          .49    .05    .07    .12    .26    .07    .16    .44    .41    .09    .10    -.13
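The values in Table 1 are correlations between feature time-series and continuous tension responses. A minimal, self-contained sketch of that computation (with hypothetical series, not the study's data) might look like:

```python
# Minimal sketch of the computation behind Table 1: a Pearson correlation
# between one feature time-series (e.g., loudness) and the mean continuous
# tension response. Both series below are hypothetical illustrations.
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

loudness = [0.2, 0.4, 0.5, 0.7, 0.6, 0.8]   # hypothetical feature stream
tension  = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]   # hypothetical mean response

r = pearson(loudness, tension)
```

Each cell of Table 1 is one such coefficient, computed per stimulus and per feature.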
When tested on stimuli with both audio and visual components, the best results were obtained when audio features were given twice the weight (or more) of the visual features. The optimized model for the audio + visual stimuli resulted in a mean correlation value of .67 (min of .31, max of .87). In general the model did quite well, particularly in predicting more local tension changes.

IV. CONCLUSION

This paper reports a preliminary analysis of an experiment that explored the perception of audiovisual tension. Subjects were asked to judge perceived tension when watching/listening to stimuli featuring three musical excerpts taken from electronic compositions paired with random-dot animations. A trend-salience model was utilized as a way of determining which auditory and visual features were influencing tension perception and the relative contributions of those features. When the model was optimized to fit the mean responses, the feature with the greatest weight was found to be loudness, followed by spectral centroid, roughness, onset frequency, and speed of motion (all equal in weight), then by contrast, and finally with much smaller negative contributions from coherence of motion and brightness. In general, the auditory features contributed significantly more to perceived tension than the visual features.

REFERENCES

Bailes, F., & Dean, R. T. (2012). Comparative time series analysis of perceptual responses to electroacoustic music. Music Perception, 29(4), 359–375.
Baker, C. L., & Braddick, O. J. (1982). The basis of area and dot number effects in random dot motion perception. Vision Research, 22(10), 1253–1259.
Dean, R. T., & Bailes, F. (2010). Time series analysis as a method to examine acoustical influences on real-time perception of music. Empirical Musicology Review, 5(4), 152–175.
Farbood, M. M. (2012). A parametric, temporal model of musical tension. Music Perception, 29(4), 387–428.
Farbood, M. M., & Price, K. (2014). Timbral features contributing to perceived auditory and musical tension. In Proceedings of the 13th International Conference of Music Perception and Cognition, Seoul, Korea.
Paraskeva, S., & McAdams, S. (1997). Influence of timbre, presence/absence of tonal hierarchy and musical training on the perception of musical tension and relaxation schemas. In Proceedings of the 1997 International Computer Music Conference, 438–441.
Pressnitzer, D., McAdams, S., Winsberg, S., & Fineberg, J. (2000). Perception of music tension for nontonal orchestral timbres and its relation to psychoacoustic roughness. Perception & Psychophysics, 62(1), 66–80.
Schubert, E. (2004). Modeling perceived emotion with continuous musical features. Music Perception, 21(4), 561–585.
Schütz, A. C., Braun, D. I., Movshon, J. A., & Gegenfurtner, K. R. (2010). Does the noise matter? Effects of different kinematogram types on smooth pursuit eye movements and perception. Journal of Vision, 10(13), 26–26.
For over two centuries, scholars have observed that tonal harmony, like language, is
characterized by the logical ordering of successive events, what has commonly been
called harmonic syntax. Yet despite the considerable attention devoted to characterizing
the behavior of chord events at relatively local levels of musical organization, the
application of statistical modeling procedures—advanced in the cognitive sciences to
explain how humans learn and mentally represent languages—has yet to gain sufficient
traction in music research.
At present, few harmony corpora exist in the scholarly community, and many consist of
symbolic representations of homo-rhythmic genres that rarely feature in everyday
listening. Perhaps worse, corpus studies often suffer from the contiguity fallacy, the
assumption that note or chord events on the musical surface depend only on their
immediate neighbors, despite the wealth of counter evidence from studies of sequence
memory that listeners form non-contiguous associations for stimuli demonstrating
hierarchical structure.
This session presents four corpus studies of tonal harmony that look beneath (or beyond)
the surface, examining problems relating to the representation and analysis of tonal
materials using techniques for pattern discovery, classification, and style analysis. John
Ashley Burgoyne (The Netherlands) examines the evolution of expected per-song chord
distributions over the period 1958–1991, based on the McGill Billboard Project
(http://billboard.music.mcgill.ca/), an open-access corpus of expert harmonic
transcriptions for over 1,000 songs from the Billboard “Hot” 100. David Sears (Canada)
demonstrates the utility of non-contiguous n-grams for the discovery of recurrent
harmonic patterns in a corpus of Haydn’s string quartets, where non-contiguous events
often serve as focal points in the syntax. Christopher White (USA) presents an algorithm
that derives a reduced chord vocabulary from the surface statistics in the Yale Classical-
Archives Corpus (http://ycac.yale.edu/) without recourse to pre-made harmonic templates
or knowledge of tonal harmony in general, with the resulting vocabulary and syntax
conforming to those produced by humans. Finally, Claire Arthur (USA) considers
whether algorithms based only on melodic features—metric position, duration, and
interval of approach and departure—can identify non-chord tones in classical and popular
corpora, using harmonic transcriptions as a basis for comparison.
Using MIR techniques on an audio corpus, Mauch et al. (2015) identified three moments
of change in Western popular music style: the years 1964, 1983, and 1991. As yet, their
claims have not been tested against human transcriptions of the musical material. We are
seeking to verify and enrich these findings with the McGill Billboard corpus
(http://billboard.music.mcgill.ca/), a collection of over 1000 expert transcriptions of the
harmony and formal structure of songs from the U.S. pop charts.
Symbolic corpus analyses are bedeviled by two common challenges: (1) correctly
handling the sum-to-one constraint implicit in considering relative frequencies of chords
within pieces, composers, or styles; and (2) measuring evolution smoothly over time,
allowing inflection points to appear naturally, without introducing arbitrary cut-points.
We converted the relative frequencies of chord roots and bigrams to so-called “balances”
using the principles of compositional data analysis (Burgoyne et al., 2013). This approach
avoids spurious correlations that can appear with techniques that ignore the sum-to-one
constraint. We then analyzed the evolution of these balances using time-series analysis,
rather than grouping according to fixed time spans.
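The sum-to-one problem and one standard compositional-data remedy can be illustrated with the centred log-ratio (clr) transform. This is a simplified stand-in for the "balances" (log-ratio contrasts) the study actually uses; the chord frequencies below are hypothetical.

```python
# Sketch of removing the sum-to-one constraint via the centred log-ratio (clr)
# transform from compositional data analysis. The study uses "balances"
# (log-ratio contrasts); clr is shown here as a simpler, related illustration.
from math import log

def clr(proportions):
    """Centred log-ratio: log of each part over the geometric mean."""
    logs = [log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

# Hypothetical per-song relative frequencies of chord roots (sum to 1).
chord_freqs = [0.40, 0.25, 0.20, 0.10, 0.05]
balanced = clr(chord_freqs)

# clr coordinates sum to zero and are unconstrained in sign, so standard
# correlation and time-series methods can be applied without spurious effects.
```

Because the transformed coordinates no longer sum to a constant, correlations among them are not forced to be negative by construction.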
The data do not support a strong inflection point in 1964, though there is a small
reduction in the usage of vi chords and descending minor third progressions. There is
more support for 1983, where the use of bVII chords, minor keys, and stepwise
progressions peaks strongly. The corpus ends in 1991, making it impossible to verify the
third potential inflection point. Together, these results are consistent with Mauch et al.'s
audio findings: a small stylistic change in the mid-1960s, followed by smooth and steady
evolution until a sharp break occurred in the early 1980s. Principal component analysis
further reveals that the variation within harmonic vocabulary is substantially more
complex than that of the root progressions themselves.
Pattern discovery is an essential task in many fields, but particularly so in the cognitive
sciences, where a substantial body of evidence suggests that humans are more likely to
learn and remember the patterns they encounter most often. In corpus linguistics,
researchers often discover recurrent patterns by dividing the corpus into contiguous
sequences of n events, called n-grams, and then determining the frequency of each
distinct pattern in the corpus. And yet for stimuli demonstrating hierarchical structure,
non-contiguous events often serve as focal points in the syntax. This problem is
particularly acute for corpus studies of tonal harmony, where the musical surface contains
considerable repetition, and many of the vertical sonorities from the notated score do not
represent triads or seventh chords, thereby obscuring the most recurrent patterns.
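The contrast between contiguous and non-contiguous n-grams can be sketched as follows; the chord sequence and window size are hypothetical choices, not drawn from the corpus discussed here.

```python
# Sketch contrasting contiguous n-grams with non-contiguous ("skip") n-grams
# over a toy chord sequence. The chord labels and window size are hypothetical.
from collections import Counter
from itertools import combinations

def contiguous_ngrams(seq, n):
    """Frequencies of all adjacent n-event patterns."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def skip_ngrams(seq, n, window):
    """Frequencies of all ordered n-event combinations drawn from a sliding
    window, so non-adjacent chords can also form a pattern."""
    grams = Counter()
    for i in range(len(seq) - window + 1):
        grams.update(combinations(seq[i:i + window], n))
    return grams

chords = ["I", "vi", "IV", "V", "I", "ii", "V", "I"]
bigrams = contiguous_ngrams(chords, 2)
skips = skip_ngrams(chords, 2, window=3)
```

Here ("V", "I") is counted twice as a contiguous bigram, while skip-grams additionally surface non-adjacent pairs such as ("IV", "I") that a strictly contiguous count would miss.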
Figure 1. A toy analysis of Mozart K. 284, iii, mm. 1-8

How do we undertake this key finding task? Much emphasis in music-perception research has focused on the distribution of pitches in tonal contexts to answer this question, comparing the relative amount of onsets or durations of each pitch to some ideal template. For instance, listeners generally associate the most frequent tone with the first degree of the major or minor scale, the second most frequent tone with scale-degree five, and so on (Krumhansl & Shepard, 1979; Castellano, Bharucha, & Krumhansl, 1984; Oram & Cuddy, 1995; Creel & Newport, 2002; Smith & Schmuckler, 2004). In Figure 1's initial window, D and A occur most often, thereby aligning this passage with a D major key center.

However ubiquitous this method might be within pedagogy and practice, few computational models have been implemented that use corpus data to build a key-finding system that relies on chord events (Bharucha, 1987; Hörnel & Ragg, 1996; Barthelemy & Bonardi, 2001; Raphael & Stoddard, 2004; Bellman, 2005; Quinn, 2010; White, 2013), and (to the author's knowledge) none have been evaluated against human behavior. Instead, most key-orientation studies that model event successions have focused on pitch events as well, especially those successions that outline rare intervals that indicate particular key centres (Brown & Butler, 1981; Brown, Butler & Jones 1994; Matsunaga & Abe 2005; 2012).
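The pitch-distribution approach described above can be sketched as template matching in the Krumhansl-Schmuckler style: correlate a passage's pitch-class duration profile with an ideal key profile at every transposition and keep the best match. The profile values below are the published Krumhansl-Kessler major-key ratings; the duration vector is a hypothetical, D-major-weighted example.

```python
# Sketch of pitch-distribution key finding: rotate an ideal major-key profile
# through all 12 transpositions and correlate each rotation with the observed
# pitch-class durations. The durations below are hypothetical.
from math import sqrt

# Krumhansl-Kessler major-key profile (C major orientation, C = index 0).
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def correlation(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def best_major_key(durations):
    """Return the tonic pitch class (0 = C ... 11 = B) whose rotated major
    profile correlates best with the observed duration profile."""
    scores = []
    for tonic in range(12):
        rotated = [MAJOR_PROFILE[(pc - tonic) % 12] for pc in range(12)]
        scores.append(correlation(durations, rotated))
    return max(range(12), key=lambda t: scores[t])

# Hypothetical durations emphasizing D, A, and G: best match is D major (2).
durations = [0.5, 0.0, 4.0, 0.0, 2.5, 1.0, 1.5, 3.0, 0.0, 2.0, 0.0, 1.0]
```

The chordal models surveyed in the following pages replace this pitch-template correlation with probabilities over successions of chord events.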
This paper presents a chordal set-based probabilistic key finder – or, SPOKE – that a) models statistical learning by deriving parameters directly from a corpus, and b) can be compared to human behavior. Such comparisons will then be undertaken in three ways: 1) comparing SPOKE's key assessments to those of an annotated corpus, 2) comparing the model's Roman-numeral analyses to those of undergraduate music majors, and 3) asking professional music theorists to distinguish the computer's output from that of undergraduate music majors. These tests are designed to provide first steps toward entertaining a chordal approach to key finding as a supplement to more established pitch-based tonal modeling.

II. A CHORDAL KEY-FINDING MODEL

A. SPOKE's Key-Finding Method

Models that use chord progressions to find a passage's key translate successions of pitch sets into successions of scale-degree sets, or sets of values oriented within a scale (rather than within absolute pitch space). For instance, consider an F-major triad <F-A-C> followed by a C major triad <C-E-G>. Represented as pitch sets, they show no tonal orientation: they could be in almost any key. However, when converted to F-major's scale degrees, the notes are defined by their position within the F-major scale. These two sets become triads on the first and fifth scale degrees, designated respectively as I and V in standard Roman-numeral notation.

The translation between pitch sets and scale-degree sets has been engineered in several ways, using neural networks (Bharucha, 1987; 1991; Hörnel & Ragg, 1996), Hidden Markov modeling (Raphael & Stoddard 2004), and n-gram Markov chains (Quinn 2010, White in press); however, only the last of these has aggressively relied on large corpora to set their models' parameters. This study will therefore rely on this established paradigm. Such a process essentially mimics that used in the Roman-numeral analysis of Figure 1. Imagine the thickness and direction of the arrows of Figure 2 as representing 2-gram probabilities (i.e., the probability of each chord moving to the next chord). From this standpoint, the D major and A major readings are ideal since their Roman numerals produce the highest 2-gram probabilities compared to those from other tonal orientations. It is then a small step to derive these transition probabilities from a corpus by tracking how frequently each chord moves to each other chord within that corpus.

Equation 1 formalizes this process (following White 2014). The equation acts on a progression of events (o_{j-n} ... o_j), a series of n pitch sets o between two timepoints j-n and j, and finds the key judgment x that transposes that series to scale-degree sets. This key judgment at time point j is the equation's end result, x̂_j. More specifically, we use integers [0…11] to represent the initial unanalyzed notes as pitch classes (where C is 0, C# is 1, D is 2, B is 11, and so on) and the resulting scale-degree sets as comprised of scale-degree classes (where tonic is 0, the leading tone is 11, supertonic is 2, and so on), and apply modulo 12 arithmetic to transpose the former to the latter (Forte 1973). Here, x acts as the example's tonal interpretation, transposing the series to one of the 12 possible tonal centers (i.e., the key of C would be 0, C-sharp would be 1, F would be 5, and so on). To maximize the argument, then, one chooses the key level at which transitions between scale-degree sets produce the highest overall n-gram probabilities. In terms of Figure 1, this means that aligning D with tonic (x = 2) would find the ideal key – the x̂_j that maximizes the probability of the window, argmax_x(P(...)), at that particular time point j. The contrasting pitch material of the second window's time point changes the probability of the chord sequence: now aligning A with tonic (x = 9) creates the most probable series of sets in this window.

x̂_j = argmax_x P([o_{j-n} - x] mod 12 ... [o_j - x] mod 12),  x ∈ [0…11]    (1)

B. SPOKE's Vocabulary Reduction

But it is not always obvious which sets should constitute an analytical model's vocabulary, or the universe of sets ("chords") possible in the model's tonal analyses. Consider the two dotted boxes of Figure 1. Both boxes have the pitch-class content <C#, D, E, F#, B>. While our toy analysis initially assumed an underlying vocabulary of triads and sevenths, without a priori rules or templates there is no immediate way of culling a B minor triad from that larger set of pitches. To resolve this problem using the properties of the corpus itself, White (2013) proposed a machine-learning reduction method for deriving a limited vocabulary of scale-degree sets from a dataset of chord progressions. First, the method observes which progressions of scale-degree sets occur most frequently within some corpus. Those progressions that occur above some frequency threshold comprise the model's initial vocabulary. When presented with some series of scale-degree sets not in this vocabulary, the reduction method adds and subtracts pitches from the observed sets until they are transformed into vocabulary items. As a demonstration of this reduction process, again consider the dotted boxes of Figure 1, imagining that the syntax of Figure 2 represents a corpus's most-frequent progressions. Unreduced, the pitch content of these boxes would not produce chords within the syntax; however, the reduction process would find that, by removing the C# and E, the pitch content of those windows can be transformed into the ii and vi triads, respectively, and therefore participate in syntactic chord progressions.

This process can be iterated over an entire corpus to produce a reduced vocabulary: by reducing a corpus's scale-degree transitions, the process "edits" the corpus's
less-frequent progressions into their more frequent subsets or supersets (e.g., editing a set like <C#, D, E, F#, B> into <B, D, F#>, or a B minor triad). The process can be repeated – editing the edits – until no more changes are made. Having mapped all improbable sets onto probable ones, the resulting reduced series of chords comprises a vocabulary smaller and more constrained than one culled from the raw musical surface.

Formally, we can represent a series of musical observations O with time points 1 to n such that O = o_1, …, o_n. We can reduce each o in O to a related reduced chord s such that S = s_1, …, s_n and |o ∩ s| ≥ 1, where the cardinality of the intersection between each o and its corresponding s is at least 1 (i.e., they share at least one scale-degree). Equation 2 then produces the reduced series S by maximizing the contextual probabilities P(s | k) and the proximity of the two sets, prox(s, o). Here, P(s | k) is the probability that a given s would occur in the context k in which we observe the corresponding o. While this could in principle be any probabilistic relationship, in our case this is a simple bigram probability. prox(s, o), then, is the proximity (NB: "prox" for "proximity") between the two sets, s and o. While in principle this could be any type of similarity relation, in the current implementation prox is equal to the amount of overlap (or intersection cardinality) between the two sets. This method has been shown to produce a vocabulary primarily of triads and seventh chords that conforms to basic intuitions surrounding classical tonal harmony (White 2012). For additional specifics on the implementation, see the study's online supplement, available at chriswmwhite.com/research.

S = argmax ∏ P(s | k) · prox(s, o)    (2)

SPOKE then combines this reduction process with the key-finding method of Equation 1 to analyze a piece of music. The full process, crucially, determines a musical key by relying only on parameters learned from a corpus. The model therefore not only produces key-finding assessments that can be measured against human behavior, but it also learns how to undertake this task by replicating exposure-based learning. In what follows, we conduct several tests to evaluate the model's performance as it relates to human behavior. Test 1 examines how accurately the model assesses the key of a ground-truth corpus compared to other established key-finding models.

III. TESTS

C. Test 1 – Key-Finding Accuracy

1) Materials and Methods. The reduction process was implemented on the Yale-Classical Archives corpus (YCAC). This corpus is described in White & Quinn (in press) and is available at www.ycac.yale.edu. It consists of 13,769 MIDI files by 571 composers yielding 14,051,144 salami slices, or sets of notes each time a pitch is added or subtracted from the texture. Slices are identified with their onset time. The corpus has also been tonally analysed, with sections of each piece identified either with a key and mode or labelled as ambiguous. "Start" and "End" tokens were added to the beginnings and endings of each file. The files were divided into lists of scale-degree sets, and divisions were made at key changes. Each slice was compiled as an unordered set of scale degrees. To further simplify the dataset, singletons and adjacent subsets of cardinalities < 3 were deleted (e.g., the set <C,E> would be deleted if it preceded <C,E,G>), as were repeats. These processes (and those following) were implemented in the Python language using the music21 software package (as described in Cuthbert & Ariza, 2011).

The reduction process' input and parameters were defined as follows. The YCAC's lists of scale-degree sets were used both to provide the 2-gram probabilities within the reduction process and as the material for the reduction process itself. Any repeated chords that arose during the reduction process were conflated into a single chord event. The distribution of chords in the reduced series contained a long tail: the top 22 chords accounted for 85.3% of all chord occurrences, with the remaining chords accounting for only 14.7%. These infrequent chords were removed from the reduced series, and chord transitions were revised. This series was used to create a reduced vocabulary of scale-degree sets and a reduced 2-gram Markov model. The resulting vocabulary and syntax and its overlap with textbook chordal vocabularies are described in more detail in the online supplement.

The MIDI files of each excerpt in the Kostka-Payne corpus (Temperley, 2009) were chosen as the musical material to be analyzed. The corpus contains 46 examples of Western European common-practice music drawn from the Kostka-Payne textbook, ranging from the Baroque to Romantic era. Five pieces were removed due to difficulties converting them into digital representations usable by this study's methods. The ground truth was taken as the opening key indicated in the textbook instructor's edition.

To compare this model's behavior to other models, three key-profile implementations were used to assess the tonal orientations of the same examples: the Krumhansl-Schmuckler (Krumhansl 1990), the Temperley-Kostka-Payne (Temperley 2007), and Bellman-Budge (Bellman 2005) weightings. All three were available in music21's library of key-finding functions. Various window-lengths were attempted for these analyses, and it was found that using the first six offsets of each piece produced the best results.

2) Results. As shown in Figure 2, of the 41 pieces analyzed, SPOKE's Reduced-YCAC model assigns the same tonic triad as the textbook 87.8% of the time. Of the 5 that did not overlap, twice the model judged the key to be the
passage's relative major, twice the keys differed by fifth, and once the passage was too scalar for the model to recognize the correct underlying chords. Figure 2 also shows how often each key-profile model produced correct answers. The profiles of Krumhansl-Schmuckler, Temperley-Kostka-Payne, and Bellman-Budge produced rates of 78.0%, 85.4%, and 73.2% correct, respectively. (These findings track but differ from Albrecht and Shanahan (2013), likely because their specific implementation and music21's differ.)

Figure 2: The percentage of key assessments on the Kostka-Payne corpus that agreed with the instructor's edition for each model

While this test addresses the output of key finding models, it does not test whether SPOKE executes the key-finding task in a way similar to a human, namely whether the model identifies chords, prolongations, and points of modulation in the same way a human would. The following test therefore compares the output of the analytical model to analyses of the same excerpts produced by undergraduate music students.

D. Test 2 – Comparing the Model to Human Annotations

1) Materials and Method. Of the Kostka-Payne excerpts whose key was successfully analysed by the reduced YCAC model, 32 were chosen to be analysed in depth by both humans and the model. The examples were selected to control for the passages' different characteristics, dividing into four categories with 8 examples in each group. Characteristics were labelled "Simple," "Modulating," "Chromatic," and "Chromatic Modulating." (These were determined by the author in relation to their placement in the Kostka-Payne textbook and the type of analytical knowledge they seemed designed to teach.)

The model's parameters were set as in Test 1, with several additions to allow for the entire excerpts to be analysed. For the automated analysis, the algorithm was run using a moving window designed to provide the model with a consistent number of non-repeating chords. The window began with four chords, but if the reduction process reduced that span to fewer than four chords, the window was extended until the process produced a four-chord analysis. The window then moved forward, progressing through the piece. Results were transcribed using a "voting" process. If three or more windows assigned a particular scale-degree set at a given timepoint, the Roman-numeral annotation of that set was placed in the score at the appropriate timepoint. Using this voting process allowed the model to interpret modulations: for instance, if three earlier timepoints read a C-major triad as I, and three later timepoints analyse it as IV, then both those Roman numerals were placed under that moment in the score. If there was no agreement, a question mark was placed in the score. Since the final windows in the piece would only produce one or two annotations, the automated annotations ended before the example's final measure in several instances. If a series of chords was reduced to a single annotation, a line was used to indicate the prolongation of that single chord. (The reduced vocabulary included two non-triadic sets, the 11th and 21st most frequent chords in the vocabulary; since these chords were outliers and annotating a chord with a non-triadic structure would "give away" the origin of the analysis and unduly bias graders, these non-triadic structures were removed from the model's vocabulary.)

The 32 examples were also analysed by 32 undergraduates in their fourth and final semester of a music-theory sequence at the University of North Carolina at Greensboro's School of Music, Theatre and Dance. (Six subjects were drawn from the author's theory section, 26 were not.) Students were given 10 minutes to analyse their example, and were asked to use Roman numerals without noting inversion. The full instruction page, as well as all examples and analyses, can be found in the supplementary material.

Eight music theory faculty (each from different institutions, all currently teaching a music theory or fundamentals class) were then asked to grade a randomly selected group of 8 analyses, not knowing which of the set was algorithmically or human generated. The theorists were asked to provide a grade from 0 to 10 for each example, indicating the level of expertise in Roman-numeral analysis the annotations seem to convey (0 = a student with no music theory experience, 10 = the expertise of a professional).

2) Results. As shown in Figure 3, a two-sided t-test comparing the average grades of the human and computer analyses is nearly significant (p = .07): the computer was given consistently higher grades, but the variation renders the grades statistically indistinguishable. An Analysis of Variance (ANOVA) found no significant three-way interaction between the primary factors in this model (the grader, the example's characteristic, and whether the analysis had been produced by a human or computer) and the grades given to the examples (F(1, 25) = .123, MSE = .615, p = .729), nor was there an interaction between the grade and the computer/human producer (F(1, 25) = .265, MSE = 1.324, p = .611). The one significant effect was the interaction between grade and the example's characteristic (F(3, 25) = 3.053, MSE = 15.27, p = .047), due to the fact that "simple" excerpts received significantly higher grades.
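The key-finding rule of Equation 1, which underlies all of these tests, can be sketched with a toy bigram model: transpose a window of pitch-class sets by each candidate key x, score the resulting scale-degree-set transitions, and keep the best x. The chords, probabilities, and function names below are illustrative assumptions, not SPOKE's learned parameters.

```python
# Toy sketch of Equation 1's key finding: transpose a window of pitch-class
# sets by each candidate key x, score the resulting scale-degree-set
# transitions with bigram probabilities, and keep the best x. The bigram
# table and chord window are hypothetical, not SPOKE's learned values.

# Bigram probabilities over scale-degree sets (tonic triad = {0, 4, 7}, etc.).
BIGRAMS = {
    (frozenset({0, 4, 7}), frozenset({7, 11, 2})): 0.30,   # I  -> V
    (frozenset({7, 11, 2}), frozenset({0, 4, 7})): 0.40,   # V  -> I
    (frozenset({5, 9, 0}), frozenset({7, 11, 2})): 0.20,   # IV -> V
}
FLOOR = 0.001  # smoothing probability for unseen transitions

def window_probability(window, x):
    """Product of bigram probabilities after transposing the window by -x."""
    degrees = [frozenset((pc - x) % 12 for pc in chord) for chord in window]
    prob = 1.0
    for a, b in zip(degrees, degrees[1:]):
        prob *= BIGRAMS.get((a, b), FLOOR)
    return prob

def best_key(window):
    """Equation 1's argmax over the 12 candidate transpositions."""
    return max(range(12), key=lambda x: window_probability(window, x))

# A window in D major: G major -> A major -> D major triads as pitch classes.
window = [{7, 11, 2}, {9, 1, 4}, {2, 6, 9}]
```

With this toy table, only the transposition x = 2 (aligning D with tonic) turns the window into the high-probability succession IV -> V -> I, mirroring the paper's Figure 1 example.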
Figure 3. Comparing graders' assessments for both groups of analyses with standard error

E. Test 3 – Human or Computer?

1) Introduction, Materials, and Methods. While the previous tests show SPOKE to be performing well, it might behave in a way noticeably different from human behaviour. To investigate such potential differences, a final test was undertaken in which expert theorists were asked to distinguish between human and algorithmic analyses.

Both the human and automated analyses of Test 2 were used, now reordered into pairs. Each pair included the same excerpt analyzed twice, once by a human and once by the algorithm, with sequential and pairwise ordering randomized. Eight music theory faculty (each from different institutions, all currently teaching a music theory or fundamentals class, with two having participated in the earlier grading task) were presented with eight pairs of analyses. Each excerpt was analyzed twice by different graders. The task was to report which of the two they believed to be created by the computer and to provide a short written explanation of their choice. If the sorts of analytical choices made by humans were different than the algorithm's behaviors, the theorists would perform better than chance in their choices. (The full packet of analysis pairs appears in the online supplement.)

2) Results. Figure 4 shows the number of times the theorists correctly identified the computer-generated analysis, first shown as an overall percentage (in white) and then grouped by the excerpt's characteristic (in grey). Overall, the analyses were statistically indistinguishable: of the 64 choices made by the graders, only 35 were correct, falling well within the error of a P(.5) binomial distribution. However, excerpts from the "chromatic" category were significantly distinguishable: of the 16 choices made by theorists, 75% (12) were correct (p = .0384).

Figure 4. Dotted line shows the window for p < 0.05 according to a P(.5) binomial distribution

IV. DISCUSSION

These tests indicate that SPOKE, a chordal reduction process combined with an established analytical algorithm, seems to conform to human Roman-numeral annotations. Test 1 found the model to outperform key profiles when judging the opening key of excerpts within the Kostka-Payne corpus, while Tests 2 and 3 showed that analyses generated by the model were judged comparable to and – with the exception of analyses of chromatic passages – indistinguishable from those produced by undergraduate music majors. These findings suggest that a chordal key-finding process can, in fact, both draw its parameters from a musical corpus and also simulate the key finding behaviour and Roman-numeral annotations of humans.

The goal of this research is not to supplant pitch-based key finding models – their connections to music cognition are unassailable. However, these results suggest that tonal cognition might be multifaceted, drawing on multiple musical domains. For instance, a chord-based model loses its power in scalar passages and in monophonic music, while it might perform better in more thickly-textured music. Furthermore, the research shows that salient chordal information can be learned by exposure to a corpus. These results provide a fundamental proof-of-concept: if this model indeed captures the properties of common-practice music used to construct traditional triadic Roman-numeral vocabularies, then listeners may learn tonal vocabularies and analyse music in a way similar to how the model learns its vocabulary and assigns keys.

Nevertheless, these observations are preliminary and speculative. This model of tonal cognition seems to adhere to many behavioral aspects of the key-finding task, but the cognitive validity of much of its engineering remains to be more extensively tested against human behavior. After all, the value of computational work to music psychology is not only to model what is already known about some cognitive process, but to model what might be true given what we know about that cognitive process. Such empirically-based speculative models suggest new hypotheses, plausible explanations, and directions for future experimental research.

ACKNOWLEDGMENT

Many thanks to the expert graders: Damian Blattler, Rebecca Leydon, Patrick McCreless, David Crean, Bryn Hughes, Ben Crouch, Chris Brody, Megan Long, Drew Nobile, Andrew Aziz, Michael Buchler, Brent Auerbach, and Guy Capuzzo. I am grateful to David Sears, David Temperley, Trevor deClercq, and Josh Albrecht for comments on earlier drafts of this essay.
72
Table of Contents
for this manuscript
73
Table of Contents
for this manuscript
ISBN 1-876346-65-5 © ICMPC14 74 ICMPC14, July 5–9, 2016, San Francisco, USA
structural nature of a chord and what it means to consider a tone as "structural" or as "elaboration." Moreover, melodies and their harmonic accompaniments are typically not independent. As such, when analyzing a piece of music to determine the harmony (as was done in all of the corpora used in this study), the melodic notes are typically taken into consideration when determining the underlying harmony. Thus, while investigating the stylistic differences in melody between and across genres, this study will simultaneously help shed light on some of these contentious issues.

III. AIMS

Multiple corpora are examined to compare stylistic usage of NCTs in classical versus popular music. Specifically, a corpus of classical music consisting of Bach chorales, Haydn quartets, and theme and variations by Mozart and Beethoven, and a corpus of popular music consisting of the McGill "Billboard" corpus with melodic information added, are compared and contrasted in terms of their usage of NCTs.

It is hypothesized that NCTs in classical music will be largely predicted by melodic factors, whereas NCTs (based on the traditional definition) in popular music will not be so easily identified. Within the genre of classical music, it is hypothesized that theme and variations will have the greatest incidence of NCTs compared with quartets and especially chorales, due to the nature of variations as the quintessence of embellishment.

Exploratory analysis will investigate which melodic factors carry the greatest predictive value for identifying NCTs both within and across genres. In addition, based on Temperley's (2007) model, it is hypothesized that the proportion of NCTs will be greater in verse sections than chorus sections in popular melodies. Finally, the question of whether popular melodies exhibit the tendency to move freely among the notes of the pentatonic scale (as Temperley, 2007 suggests) or whether the melodies tend to outline the tonic triad (as Nobile, 2015 suggests) can be directly evaluated.

IV. METHOD

From each of the corpora, information is extracted from the melodies, such as the metric position of each note onset, the duration of each note, and the interval of approach and departure. All information is encoded in Humdrum format (Huron, 1995). An algorithm will first be applied to each melodic feature in isolation, and optimized to determine which feature in isolation is the best predictor of NCTs. For example, in classical music, based on rules of composition, a note that is leapt to and from is likely to be a chord tone, regardless of metric position; however, ignoring melodic intervals, notes on strong beats are less likely to be NCTs than those on weak beats. Then, using the information from the combination of all melodic features, an algorithm will be optimized to predict the likelihood of an NCT in the melody. The success of the algorithm will be measured against harmonic analyses which were added manually to each corpus. Melodic tones will be classified as a member or non-member of the underlying harmonic accompaniment.

It is anticipated that algorithms for the most complex melodic corpus will work for less complex melodic corpora, but not vice versa. In addition, it is predicted that the algorithm for identifying NCTs in the classical corpus will not work for the popular music corpus, and vice versa. That is, multiple algorithms are likely to be needed for the separate corpora.

As an example, the set of Bach chorales was compared against a test set of the theme and variations (specifically, just the themes) in their ability to predict a chord tone (CT) or NCT simply from the beat position of the given note (either 'on the beat' or 'off the beat'). As can be seen from Table 1, because of the frequent chord changes and relatively scarce off-beat melody notes in the chorales, this simple algorithm can correctly predict whether a note is a CT or an NCT 83% of the time. However, when applied to the set of themes using only the beat position as a predictor, only 59% of CTs and NCTs were successfully predicted. However, the intervals of approach and departure may provide some additional information. As shown in Figure 1, the proportion of CTs approached or left by leap is larger than the proportion of CTs approached or left by step. Also, it is more likely that a note that is 'on the beat' and approached by step will be a CT than a note that is 'off the beat' and approached by step. If we then combine this information about the intervals of approach and departure with the beat position information and apply it to the set of themes, as shown in Table 1c, the model has improved its prediction rate and can now successfully predict CTs and NCTs at an accuracy rate of 74%.

Finally, because of the tendency for popular music to exhibit melodies and harmonies that are relatively "independent", it may be necessary to redefine "non-chord tone" status in popular music such that notes may be classified as more or less "stable" regardless of their relationship to the underlying harmony, and instead based primarily on melodic factors such as note duration, interval of approach and departure, and/or relation to the tonic triad. Note that this last factor would be consistent with Nobile's (2015) hypothesis that "the notes of the tonic triad…can under certain circumstances act as stable tones even if they are dissonant with the foreground harmony."

Note that in terms of the interval of approach and departure, the primary scale type will need to be considered. That is, as noted by Temperley (2007), what may appear to be resolution by "skip" may in fact simply reflect the fact that popular melodies are often within a pentatonic, rather than diatonic, collection, and therefore a "step" must include a minor third.
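The beat-position and interval-of-approach heuristics described above can be sketched in a few lines. This is an illustrative toy classifier under invented thresholds and data, not the study's optimized algorithm; in particular, treating any interval larger than two semitones as a "leap" is a diatonic convention that, as noted above, would need adjustment for pentatonic melodies:

```python
# Toy CT/NCT rules: beat position alone, then beat position combined with
# the intervals of approach and departure (in semitones).

def classify_beat_only(on_beat):
    # Chorale-style baseline: call every on-beat note a chord tone.
    return "CT" if on_beat else "NCT"

def classify_combined(on_beat, approach, departure):
    # A note leapt both to and from suggests a chord tone, regardless of
    # metric position; an off-beat note approached by step suggests an NCT.
    if abs(approach) > 2 and abs(departure) > 2:
        return "CT"
    return "CT" if on_beat else "NCT"

# Invented melody: (on_beat, approach, departure, ground-truth label)
notes = [
    (True,  4, -2, "CT"),   # on-beat note, leapt to
    (False, 2, -2, "NCT"),  # off-beat passing tone
    (False, 5, -5, "CT"),   # off-beat arpeggiation
    (True,  2,  2, "CT"),   # on-beat stepwise note
]

correct = sum(classify_combined(b, a, d) == t for b, a, d, t in notes)
print(f"combined rule: {correct}/{len(notes)} correct")  # 4/4 on this toy data
```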
The literature contains different estimates of the temporal capacity of memory for music
containing key-changes. We addressed this in three time reproduction experiments in
which a modulating chord sequence was followed by instructions to produce a silent time
interval with a computer stopwatch that matched the duration of the sequence just
presented. Time estimates were affected by three variables: (1) the duration of each key,
(2) the response delay (the time between the end of the sequence and the participant’s
response), and (3) the distance between the keys. The sequences in Exp 1 and 2 were 60 s
in duration. Exp 1 sequences modulated from C to Gb and the relative duration of the two
keys was varied. Exp 2 sequences modulated from C to Eb to Gb and the longest key was
either the first, second, or third. Exps 1 and 2 showed that when the longest key was the
first, the reproductions were shorter. However, when compared to non-modulating
controls, short earlier keys still resulted in time underestimations. Exp 3 sequences were
20 s in duration, and contained modulations from C to F (close) or C to Gb (distant); the
response delay was 25, 50, or 100 s. The farther the interkey distance, the shorter the time
reproduction. All these results confirm predictions of the Expected Development Fraction
(EDF) Model whereby time underestimates result from a mismatch between the actual
and expected duration of key-changes (Firmino & Bueno, 2008). Exp 3 showed, in
addition, that when the response delay was 25 s, the time reproductions were larger than
those at the longer response delays. We further developed the EDF Model to take into account the
greater interfering effect of extra-musical information (e.g., verbal, visual) on time
estimates of music when the response delay is short. The model dynamically describes
the long capacity of memory for key-change music throughout the stimulus-response
period.
Theoretical analyses of the form of Anthèmes 1 and its enlargement in Anthèmes 2 build
on the findings from two experiments that explore musicians’ perception of variations of
selected motives representing the motivic families of Anthèmes. Participants hear each
motive, subsequently listening to one version of Anthèmes and pressing a button when
they recognize a transformation of the motive. The findings suggest that predictability
(repeating successions of motives or electronic effects consistently associated with a
motive), formal blend (motivic continuity and lack of surface contrast), and elapsed time
affect listeners’ apprehension of the hierarchy. Musical events that are predictable or
associated with low formal blend lead listeners’ attention to the local/indivisible level.
Unpredictable and formally blended events and the passing of musical time shift attention
to higher/divisible levels.
Learning the Language of Affect: A New Model That Predicts Perceived Affect
Across Beethoven’s Piano Sonatas Using Music-Theoretic Parameters
Joshua D. Albrecht
Music Department, The University of Mary Hardin-Baylor, Belton, TX 76513, USA
jalbrecht@umhb.edu
model was constructed from several different movements with a greater variability of affective expression.

A. Affective Categories

Zentner et al. (2008) argue that the best paradigm for studying musical affect is to use an eclectic list of affect terms specifically suited to the music under investigation. Albrecht (2014) conducted a series of three studies with different methodologies to empirically derive an eclectic list of affective dimensions suitable for studying the piano sonatas of Beethoven. The results of that study suggest nine eclectic affect terms: angry, anticipating, anxious, calm, dreamy, exciting, happy, in love, and sad. While these terms cover many of the common affective dimensions (relating to both the basic emotions and the arousal/valence paradigm), several less common terms are also present, such as dreamy and in love. The terms were also chosen to be largely orthogonal. This paper uses these nine terms.

B. Stimuli

Of primary importance to this study is the selection of excerpts to investigate. In the selection process, there are three primary considerations: which excerpts to select, how long the excerpts should be, and which recordings should be used.

One sampling method would involve the principal investigator selecting excerpts deemed to exhibit strong affective expression. However, this approach may introduce researcher bias. A second approach of randomly sampling from the repertoire might suffer from selecting affectively neutral excerpts, such as Alberti bass passages. To avoid these problems, three sources of expert musicians were polled: the Society for Music Theory listserv, the American Musicological Society listserv, and the Piano Street online forum. In these posts, respondents were asked to submit specific measures or excerpts of "the most emotionally expressive moments in the Beethoven sonata literature." As a result of this query, 30 excerpts were attained, divided into a training set and a test set. The two sets were matched for both compositional period (early, middle, late) and inner/outer movement. The excerpts attained from the poll are shown in Table 1.

To try to minimize the effect of performer bias, recordings were selected randomly from the compact disc holdings of the University of Texas at Austin. In total, there were 155 Beethoven piano sonata discs or collections. Of these, historical recordings with poor quality were eliminated, as were any recordings by "Joyce Hatto." No more than one recording was selected per CD. The performers and year of release of chosen recordings are provided in Table 2.

A final consideration is how long the excerpts should be. Respondents' descriptions of their suggested excerpts are provided parenthetically in Table 1. Some of the passages described are very short and specific (specific measures) while others are very long (themes). To control for excerpt length, and in keeping with Albrecht (2012), excerpts of five seconds in length were chosen. If the excerpt provided by the respondent was longer than five seconds, the five seconds including the moment most closely matching the description were chosen. If there was no description of the passage, the first five seconds were used.

Table 2. Thirty recordings chosen randomly from 155 compact disc recordings at the University of Texas at Austin's library.

Performer                 Date    Performer              Date
Badura-Skoda, Paul        1988    Landowski, Wanda       1927
Barenboim, Daniel         1984    Levin, Beth            2013
Benedetti, Arturo         ?       Lortie, Louis          1994
Brendel II, Alfred        1998    Lortie, Louis          1998
Chodos, Gabriel           2007    Michelangeli           1960
Fischer, Edwin            1987    Nissman, Barbara       2007
Goode, Richard            1991    Perahia, Murray        1986
Haskil, Clara             1960    Ratusiñski, Andrzej    1981
Horszowsky, Mieczyslaw    1991    Richter, Sviatoslav    1960
Jandó, Jenö               1987    Richter, Svjatoslav    1971
Katahn, Enid              2000    Roberts, Bernard       1988
Kovacevich, Steven        1960    Rose, Jerome           1996
Kovacevich, Steven        1960    Rosen, Charles         1989
Kovacevich, Stephen       2001    Wild, Earl             1987
Kuerti, Anton             1975    Vogel, Edith           ?
Table 1. Thirty excerpts attained from a survey of the Society for Music Theory listserv, the American Musicological Society listserv,
and the Piano Street forum. Respondents were asked for the “most emotionally expressive excerpts” in the Beethoven piano literature,
and to list the measures, themes, or sections along with the movements (descriptions are provided in parentheses). Movements were
paired by compositional period and outer/inner movement, and then were randomly assigned to the training set or the test set.
                         Training Set                                    Test Set

Early Period, Inner Movement:
    Op. 2, No. 1, II (2nd theme)                    Op. 7, II (2nd theme)
    Pathétique, Op. 13, II (entire mvt.)            Op. 10, No. 3, II (development)
Early Period, Outer Movement:
    Op. 2, No. 1, IV (mm. 20-50)                    Op. 10, No. 3, I (no section identified)
Middle Period, First Movement:
    Op. 26, I (resolutions in Var. III)             Op. 28, I (mm. 109-127, climax of A chord)
    Op. 27, No. 2, I (transition to the A1 section) Op. 31, No. 2, I (recit. at close of movement)
    Op. 31, No. 3, I (ms. 99)                       Appassionata, Op. 57, I (start of 2nd theme)
    Waldstein, Op. 53, I (beginning of 2nd theme)   Moonlight, Op. 27, No. 2, III (opening theme)
Middle Period, Last Movement:
    Op. 26, IV (main theme)                         Waldstein, Op. 53, III (major 7th arpeggios)
    Tempest, Op. 31, No. 2, III (mm. 169-173)       Les Adieux, Op. 81a, III (ending, new tempo)
Late Period, First Movement:
    Op. 109, I (retransition)                       Hammerklavier, Op. 106, I (beginning)
    Op. 110, I (sad part)                           Op. 111, I (beginning)
Late Period, Middle Movement:
    Op. 81a, II (2nd theme)                         Hammerklavier, Op. 106, II (beginning)
    Op. 90, II (canon before the conclusion)        Op. 101, III (intro to final movement)
Late Period, Last Movement:
    Hammerklavier, Op. 106, III (mm. 60 or 145)     Op. 109, III (Var. VI)
    Op. 111, II (mm. 116-121)                       Op. 110, III (mm. 21-23)
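The stratified assignment described in Table 1's caption (excerpts paired by compositional period and movement type, then split at random into training and test sets) can be sketched as follows. The pairs below are illustrative placeholders, not the study's actual pairings:

```python
import random

# Each tuple: (stratum, excerpt A, excerpt B); one member of each pair is
# randomly assigned to the training set, the other to the test set.
pairs = [
    ("early/inner", "Op. 2/1, II",  "Op. 7, II"),
    ("early/outer", "Op. 2/1, IV",  "Op. 10/3, I"),
    ("late/first",  "Op. 109, I",   "Op. 106, I"),
]

random.seed(0)  # fixed seed so the split is reproducible
training, test = [], []
for stratum, a, b in pairs:
    first, second = random.sample([a, b], k=2)  # random order within the pair
    training.append((stratum, first))
    test.append((stratum, second))

print("training:", training)
print("test:    ", test)
```

Because assignment happens within each stratum, the two sets stay matched on period and movement type by construction.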
C. Participants

This study involved 68 participants from the University of Mary Hardin-Baylor, consisting of freshman and sophomore music majors. These participants were randomly divided into two groups, a group for training regression models and a group for testing the regression models. Across both groups, there were 33 females, a mean age of 19.4 (SD = 1.5), a mean number of years of musical training of 7.1 (SD = 3.7), and a mean Ollen Musical Sophistication Index (Ollen, 2006) of 455.0 (SD = 243.9). As a rough gauge of the extent of the participants' familiarity with Beethoven's piano sonatas, participants were also asked how familiar they were with early Romantic piano music and how often they listened to it (from 1-5). The means across both groups were 2.6 (SD = .95) and 2.8 (SD = 1.1), respectively. There were no significant differences between the two groups regarding gender, age, years of musical training, OMSI score, or familiarity with or frequency of listening.

D. Procedure

Subjects participated in groups of up to 8 in a computer lab. After reading the experiment's instructions, participants listened to two trial excerpts to ensure they understood the instructions and interface. Participants then listened to all fifteen excerpts from their set and rated each one along nine different affective dimensions, including one extra trial per affect to test for intra-subjective reliability. Participants listened to the excerpts on headphones adjusted to a comfortable listening level, and rated each excerpt's affective expression on a 7-point Likert scale.

…parameter and the way that each was operationalized. The score and recording were both consulted for each 5” excerpt.

Table 4. Fourteen regression parameters used in the model and their operationalization.

Regression parameter    Operationalization
Dynamic level           Performed relative dynamic level: coded pp–ff (1-5)
Crescendo               Crescendo or decrescendo: coded "strong dim. – strong cresc." (-2–2)
Density                 Maximum number of concurrently attacked
Surface rhythm          The total number of onset moments
Tempo                   Performed relative tempo: coded 1-5
Articulation            Performed articulation: coded 1-5 (1 = legato)
Pitch height            Two separate values: highest and lowest pitch in semitones from middle C
Pitch direction         Correlation of pitch height/time: coded -2–2 (all descending – all ascending)
Tendency tone           Presence of tendency tones: coded 0/1/2 (0 = none; 1 = LT or Dom. 7; 2 = suspension or melodic chromatic half-step motion)
Mode                    Predominantly major or minor harmonies: coded -1–1 (1 = major; 0 = neither)
Harmonic dissonance     Presence of harmonic dissonance: coded 0-2 (0 = only M or m triads; 1 = Dom. 7; 2 = dim. or other dissonant/chromatic harmonies)
Harmonic tempo          Number of harmonies per segment
Closure/cadence         Presence of closure/cadence: coded 0-2 (2 = authentic; 1 = other cadence)
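The modeling step this section sets up (mean Likert ratings for one affect regressed on coded musical parameters from Table 4) can be sketched with ordinary least squares. The data, weights, and choice of three predictors below are invented for illustration; this is not the study's data or code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_excerpts = 15  # one set of excerpts

# Coded predictors: dynamic level (1-5), tempo (1-5), mode (-1..1), intercept
X = np.column_stack([
    rng.integers(1, 6, n_excerpts),
    rng.integers(1, 6, n_excerpts),
    rng.integers(-1, 2, n_excerpts),
    np.ones(n_excerpts),
])

# Simulated mean "exciting" ratings on the 7-point scale (invented weights)
beta_true = np.array([0.6, 0.5, 0.2, 1.0])
y = np.clip(X @ beta_true + rng.normal(0, 0.3, n_excerpts), 1, 7)

# Fit the regression; predictions in 'Likert space' are X @ beta_hat
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
print("fitted coefficients:", np.round(beta_hat, 2))
```

Applying the fitted coefficients to the coded parameters of *new* excerpts yields the 'Likert space' predictions whose proximity to participant ratings is evaluated in section E.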
E. Affect Predictions

To test the generalizability of these models, each of the fourteen musical parameters used to construct the nine training set models was measured for the fifteen excerpts in the test set. These parameters were fed into each affect's regression model, resulting in a prediction in 'Likert space' for each affective dimension for each excerpt in the test set. Importantly, both the excerpts used and the participants tested were new for the test set. As a result, the proximity of the model predictions to the participant ratings provides a metric for whether the musical structures associated with affective expression are generalizable across the Beethoven piano sonatas and across listeners.

[Figure 1. Model predictions by affect for Op. 109, I and Op. 81a, III in the test set (panels "Model Predictions by Affect for Op. 109, I" and "Model Predictions by Affect for Op. 81a, III"; y-axis: rating, 1-7; x-axis: the nine affects). Red asterisks show predictions made by the training set models, and dots surrounded by the blue 95% confidence intervals show the mean participant ratings of the excerpt.]

In Figure 2, model predictions and participant ratings for the exciting and sad affective dimensions are plotted. Figure 2 provides a profile not of an excerpt but of an affective model. In short, the distance between the predictions and participant ratings provides a metric of model success and, consequently, of the generalizability of affective expression across the repertoire. Although most of the asterisks are relatively close to the participant ratings, some of the model predictions are fairly distant from attained ratings, especially for the sad affect model.

[Figure 2. Model predictions for two of the nine affective dimensions, by excerpt: "Model Predictions by Excerpt for Exciting (r = +.84***)" and "Model Predictions by Excerpt for Sad (r = +.74**)" (x-axis: excerpts labeled Opus.Movement; y-axis: rating, 1-7). As in Fig. 1, red asterisks show predictions made by the training set models, and dots surrounded by the blue 95% confidence intervals show mean participant ratings. Although all three models shown are significantly correlated with participant ratings, some excerpts reveal slight inaccuracies in prediction.]
with the idea that those affective dimensions can be reliably modeled on the Beethoven piano sonatas at large.

Another way of testing whether or not the regression models can accurately predict participants' responses is suggested by the null hypothesis, which states that the relationship between musical structures and participant responses in one set of excerpts would be unrelated to the relationship in another set. If that is the case, then the predictions from one model applied to the predictors in a different set of excerpts would provide a poor fit equivalent to random noise rather than good predictions, and should not be significantly different from participant responses generated by any other unrelated model.

This hypothesis can be directly tested by examining the distances in 'Likert space' between model predictions and participant responses. Specifically, random noise distributions can be generated by measuring the distance on the 7-point Likert scale between predictions of different affects' models and matching affect participant ratings. For example, the predictions made by every affect model but the angry model could be compared to participant responses for the angry affect. The Likert distances between the unmatched model predictions and participant ratings for each affect can be made into a 'noise distribution,' which can then be compared to a distribution of distances between the angry model predictions and angry participant ratings.

The results from this test are given in the right column of Table 6. Figure 3 displays the results graphically. Specifically, distances between participant ratings and matched model predictions for each affect are shown in blue, and distances between ratings and unmatched model predictions are shown in red. In Fig. 3, the smaller the Likert distance, the closer the prediction and the more successful the model. As can be seen in Table 6 and Fig. 3, matched model predictions were significantly more accurate at predicting participants' responses than chance, as measured by unmatched model prediction distances.

Table 6. Accuracy metrics of each training set regression model.

Affective     Correlation (R) between model          Mean distance (SD) closer to
dimension     predictions and mean participant       rating than 'noise distribution'
              ratings
Angry         r = +.71 (.10), p = .02                1.15 (1.6), p < .0001
Anticipating  r = +.50 (.09), p = .02                0.65 (1.5), p < .0001
Anxious       r = +.53 (.16), p = .02                0.66 (1.5), p < .0001
Calm          r = +.82 (.40), p = .0004              1.39 (1.8), p < .0001
Dreamy        r = +.70 (.28), p = .005               1.30 (1.6), p < .0001
Exciting      r = +.87 (.71), p = .0001              1.73 (1.4), p < .0001
Happy         r = +.81 (.58), p = .0002              1.27 (1.6), p < .0001
In love       r = +.48 (.27), p = .002               0.69 (1.5), p < .0001
Sad           r = +.77 (.55), p = .003               1.16 (1.8), p < .0001

[Figure 3. "Distance from Model Predictions to Participant Responses": Likert distances (roughly 0.5-2.0 points) for each of the nine affects, plotted separately for matched models and unmatched models.]

V. DISCUSSION

For each affective dimension, the regression models generated from one set of participants on the training set excerpts predicted ratings from new participants on new excerpts at rates better than chance. However, a further question could be raised about regression models that could be generated from the test set. In other words, given that the data from each set were collected independently, the study could theoretically be run backwards, in which the regression models generated from the test set predict participant responses from the training set.

It should be mentioned here that this reversal is post hoc, performed after the models were built, tested, and examined. Caution should be advised in the treatment of the following data, in the first case because the author's having seen the first set of models may bias the construction of the second set of models. In the second case, the way that the two sets of regression models interact is likely highly correlated; even though the predictors chosen may be different, if a model generated from dataset A performs well on dataset B, then it would not be surprising for the dataset B model to perform well on dataset A. Therefore, the following further examination of the data should be taken as post hoc, and care should be taken in its interpretation.

A. Model comparison

Nine new regression models, one per affective dimension, were generated for the 'test set' dataset in a similar way as the 'training set' models were constructed (see section IV: D). These models, along with the amount of variance they account for, are shown in Table 7. The test set regression models used 2-4 predictors each, accounting for between 10.3% and 67.1% (mean of 47.1%) of the variance in participant ratings.

Again, variable selection is not an exact science. The variables included in the model were chosen to be as orthogonal as possible, to explain as much of the variance in participant ratings as possible, and to do so with as few variables as possible. In many cases, the variables included as predictors mirrored the original set of models, but more predictors were unique than shared. Of the variables shared
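The matched-versus-unmatched ('noise distribution') comparison described above can be sketched directly. The ratings and predictions below are invented; the real analysis uses the nine affect models and mean participant ratings:

```python
import numpy as np

rng = np.random.default_rng(1)
affects = ["angry", "calm", "exciting"]  # three of the nine, for brevity
n_excerpts = 15

# Mean participant ratings per affect per excerpt (7-point Likert scale)
ratings = {a: rng.uniform(1, 7, n_excerpts) for a in affects}
# Model predictions, simulated to sit close to the matching affect's ratings
preds = {a: np.clip(ratings[a] + rng.normal(0, 0.5, n_excerpts), 1, 7)
         for a in affects}

matched, unmatched = [], []
for a in affects:
    # distance between the affect-a model and affect-a ratings
    matched.extend(np.abs(preds[a] - ratings[a]))
    for b in affects:
        if b != a:
            # 'noise distribution': affect-a model vs. affect-b ratings
            unmatched.extend(np.abs(preds[a] - ratings[b]))

print("matched mean Likert distance:  ", round(float(np.mean(matched)), 2))
print("unmatched mean Likert distance:", round(float(np.mean(unmatched)), 2))
```

A standard two-sample test on the two distance distributions would then give the kind of matched-versus-noise p-values reported in the right column of Table 6.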
Table 7. Regression models derived from the test set. Adj. R² values give the percent of the variance explained.

Affective     Regression model                                          Adj. R²
dimension
Angry         More tendency tones, minor mode, higher highest pitch     0.378
Anticipating  More dissonance, minor mode, higher highest pitch         0.203
Anxious       Higher highest pitch, faster rhythms, more tendency       0.229
              tones, more legato articulations
Calm          Quieter dynamic, slower tempo, higher lowest pitch        0.602
Dreamy        Slower tempo, lighter texture, fewer tendency tones,      0.401
              faster harmonic tempo
Exciting      Louder dynamic, faster tempo                              0.671
Happy         Faster tempo, more staccato, major mode                   0.589
In love       Crescendo, major mode, less dissonance                    0.103
Sad           Quieter dynamic, more tendency tones, slower tempo,       0.579
              minor mode

…of generalizing across many excerpts. Moreover, it might be the case that the group of 34 participants represented in the test set may have been attending to different musical elements than the 34 participants in the training set to make their determinations. Again, the relatively accurate predictions made by the models are remarkable given the sizable differences between the predictors chosen by the different models, and are consistent with the notion that composers use quite a bit of redundancy in the communication of affective signals.

B. Reverse Model Predictions

To complete the reverse comparison, predictions were made by the 'test set' models on the 'training set' data and compared against the participant ratings. These results are shown in Table 8.

Table 8. Accuracy metrics of each test set regression model.

Affective     Correlation (R) between model          Mean distance (SD) closer to
dimension     predictions and mean participant       rating than 'noise distribution'
              ratings
Angry         r = +.59 (.35), p = .02                1.15 (1.6), p < .0001
Anticipating  r = +.58 (.34), p = .02                0.65 (1.5), p < .0001
Anxious       r = +.58 (.34), p = .02                0.66 (1.5), p < .0001
Calm          r = +.79 (.62), p = .0004              1.39 (1.8), p < .0001
Dreamy        r = +.67 (.45), p = .006               1.30 (1.6), p < .0001
Exciting      r = +.84 (.71), p = .0001              1.73 (1.4), p < .0001
Happy         r = +.82 (.67), p = .0002              1.27 (1.6), p < .0001
In love       r = +.74 (.55), p = .002               0.69 (1.5), p < .0001
Sad           r = +.70 (.49), p = .004               1.16 (1.8), p < .0001

The middle column of Table 8 shows correlations between the means of each set of participant ratings and the model predictions. Again, these correlations only involve two sets of 15. Nevertheless, each set of predictions is significantly positively correlated with participant responses. These test set model predictions are better correlated with the training set ratings than the reverse (see IV: E). One possible reason for this result might be that the participants in the test set provided less 'noisy' responses. Another potential reason could be that the excerpts in the test set covered a wider range of affective expression, and so the models derived from the data are more generalizable. Yet another possible explanation could be that the model selection process was biased by information from the first dataset.

As another test, Likert distances were calculated between affect ratings and matched model predictions and between the 'noise distribution' of unmatched model predictions. These results are shown in the rightmost column of Table 8 and graphically in Figure 4. As with the training set, Likert distances for regression models were significantly closer than chance.

[Figure 4 shows "Distance from Model Predictions to Participant Responses" in Likert points for each of the nine affects.] Figure 4. Distance, in Likert points on a 7-point scale, between model predictions and participant responses. This figure is the conceptual reverse of Figure 3. Specifically, the distance between test set model predictions was measured against participant ratings in the training set. Again, the blue distances represent matched affect model/response comparisons, and the red distances represent the distances between model predictions for affects other than the one the participants rated. All distances are significantly closer at p < .0001.

C. Supermodel construction

As with the training set models, the test set models were significantly predictive of participant ratings in the training set. These results are consistent with the hypothesis that the regression models do not involve excessive overfitting, but instead truly capture some of the musical structures correlated with affective expression in the Beethoven piano sonatas. The high levels of accuracy in prediction are particularly noteworthy in light of how different the models are between the training and test sets in terms of the predictor variables used.

As a result of the inter-model accuracy of the models' predictions, it is appropriate to consider the entire set of 30 excerpts as one large dataset. From this large dataset, one supermodel for each affective dimension can be created that will explain the musical structures associated with affective expression in ways that are even more generalizable.

One benefit of using the larger dataset is that there is more variance in the way that each affective dimension is represented. Coupled with the increased size of the dataset, the extra variance allows for the statistical inclusion of more predictor variables that are significantly correlated with affective expression. As a result, the supermodels promise to
more fully model the way that affective expression in this repertoire is communicated in different ways, and to provide a deeper understanding of the nuances in the relationship between musical structures and that affective expression.

The supermodels are shown in Table 9. As in Tables 5 and 7, the musical parameters used in each supermodel, along with the direction of their effect, are shown in the center column of Table 9. The adjusted R² values are in the right column, showing how much of the variance in participants' ratings the models are able to explain. Of note is the tendency for adjusted R² values to decrease from the smaller dataset models. This effect is largely due to the increase of variance in the total dataset. Because the bigger dataset contains excerpts with a greater range of expression for each affective dimension, and because there is a greater number of human raters, less of the variance can be accounted for. However, as a direct consequence, it is likely that the supermodels would be better at predicting new listeners' ratings for new excerpts from the Beethoven piano sonatas. Knowing the extent to which these models can successfully predict new listeners' ratings for new excerpts must await direct testing in a new experiment.

Table 9. Regression supermodels derived from the entire data set. Adj. R² values give the percent of the variance explained.

Affective     Regression model                                          Adj.
dimension

…affective expression in the Beethoven piano sonatas. Again, further testing on a new dataset is necessary before being able to ascertain how accurate these models might be.

ACKNOWLEDGMENT

I would like to thank Aaron Baggett for spending several of his valuable hours talking with me at length about the unique statistical problems involved in this study. He not only provided a great sounding board for thinking through these issues, but was an enthusiastic conversation partner, and for that I am deeply grateful.

REFERENCES

Albrecht, J. (2012). A model of perceived musical affect accurately predicts self-reported affect ratings. In Proceedings of the 12th International Conference on Music Perception and Cognition (pp. 35-43). Thessaloniki, Greece: Aristotle University.
Albrecht, J. (2014). Naming the abstract: Building a repertoire-appropriate taxonomy of affective expression. In Proceedings of the ICMPC-APSCOM 2014 Joint Conference (pp. 128-135). Seoul, South Korea: Yonsei University.
Balkwill, L. L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural cues. Music Perception: An Interdisciplinary Journal, 17, 43-64.
Balkwill, L. L., Thompson, W. F., & Matsunaga, R. (2004). Recognition of emotion in Japanese, North Indian, and Western music by Japanese listeners. Japanese Psychological Research, 46, 337-349.
Eerola, T., Lartillot, O., & Toiviainen, P. (2009). Prediction of
multidimensional ratings in music from audio using multivariate
dimension R
Angry
Minor mode, more tendency tones, lower
lowest pitch, louder dynamic
0.353
regression models. In Proceedings of 10th International Conference on
Music Information Retrieval (ISMIR 2009) (pp. 621-626). Kobe, Japan.
Egermann, H., Fernando, N., Chuen, L, & McAdams, S. (2015). Music
Anticipating More dissonance, minor mode, more
staccato, higher highest pitch
0.168 induces universal emotion-related psychophysiological responses:
Comparing Canadian listeners to Congolese Pygmies. Frontiers in
Anxious
More dissonance, minor mode, more
staccato, higher highest pitch
0.238
Psychology, 5, 1341. doi: 10.3389/fpsyg.2014.01341
Imberty, M. (1979). Entendre la musique. Paris: Dunod.
Juslin, P., & Lindström, E. (2011). Musical expression of emotions:
Calm
Quieter dynamic, slower tempo, less dense
texture, lower highest pitch
0.539 Modeling listeners’ judgments of composed and performed features.
Special Issue: Music and Emotion. Music Analysis, 29(1-3), 334-364.
Ladinig, O. & Schellenberg, G. (2011). Liking unfamiliar music: Effects of
Dreamy
Slower tempo, quieter dynamic, fewer
tendency tones, more dissonance
0.402 felt emotion and individual differences. Psychology of Aesthetics,
Creativity, and the Arts, 6, 146-154.
Exciting
Faster surface speed, faster tempo, louder
dynamic, staccato articulations
0.678
Ollen, J. (2006). A Criterion-related Validity Test of Selected Indicators of
Musical Sophistication Using Expert Ratings (Doctoral dissertation).
Columbus, OH: Ohio State University.
Happy
Major mode, faster tempo, staccato
articulations, descending melodies, fewer
0.493 Rentfrow, P. J. & Gosling, S. D. (2003). The do re mi’s of everyday life:
The structure and personality correlates of musical preference. Journal
In love
tendency tones
Major mode, fewer tendency tones, higher 0.193
of Personality and Social Psychology, 82, 1236-1256.
Schubert, E. (2004). Modeling perceived emotion with continuous musical
features. Music Perception: An Interdisciplinary Journal, 21(4),
Sad
lowest pitch, crescendo
Slower tempo, slower surface rhythms, 0.457
561-585.
Sloboda, J. (1991.) Music structure and emotional response: Some empirical
more legato, minor mode
findings. Psychology of Music, 19, 110-120.
Tagg, P. (2006). Music, moving images, semiotics, and the democratic right
to know. In Ulrik Volgsten (Ed.), Music and Manipulation: On the
Social Uses and Social Control of Music (pp. 163-186). New York, NY:
VI. CONCLUSION Berghahn Books.
Although the predictions of the regression models Zentner, M. R., Grandjean, D., & Scherer, K. R. (2008). Emotions evoked
by the sound of music: Characterization, classification, and
generated from the training set did not always fall within the measurement. Emotion, 8, 494-521.
confidence intervals of the participant ratings of the training
set, the results are consistent with the hypothesis that the
models capture some generalizable elements of affective
expression, at least in Beethoven’s piano sonatas. Moreover,
testing the reverse regression models provided even closer
estimates of listener perception of affective expression.
Despite differences in the model structures as seen in
predictor variable selection, regression models were
significantly better than chance in every case. Given the
similarity between test set and training set models, the post
hoc supermodels may provide the best approximation of a
generalizable estimate of musical structures associated with
Melodic discrimination has formed the basis of much research, both in the experimental
psychology tradition (e.g. Dowling, 1971) and in the individual differences tradition (e.g.
Seashore, 1919). In the typical melodic discrimination paradigm, participants are tasked
with detecting differences between two successively played versions of the same melody.
Experimental psychology studies have investigated how melodic features influence task
performance, identifying effects from a range of sources including contour, length, and
tonal structure. Meanwhile, individual difference studies have found that melodic
discrimination ability provides a good proxy for musical expertise, and as a result the task
is often used in musical test batteries (Law & Zentner, 2012). However, there is
surprisingly little consensus concerning what cognitive ability or abilities the melodic
discrimination test actually measures, raising serious problems for its construct validity.
The present research aimed to address these problems by forming and testing an explicit
cognitive model of melodic discrimination. This cognitive model comprises four main
stages: perceptual encoding, memory retention, similarity comparison, and decision-
making. On the basis of this model, we hypothesised that item difficulty should be
predicted by the complexity of the reference melody (i.e. first version in the pair), the
tonalness of the reference melody, and the perceptual similarity between melody
versions. We operationalise these three features using computational models (SIMILE,
Müllensiefen & Frieler, 2003; FANTASTIC, Müllensiefen, 2009). We then test the
influence of these three features on melodic discrimination performance by applying
explanatory item response modelling to response data collected for three pre-existing
melodic discrimination tests (N = 317). Perceptual similarity and complexity measures
significantly predicted task difficulty, supporting the proposed four-stage cognitive model
and contributing to the construct validity of the melodic discrimination test.
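The explanatory item response modelling described above can be sketched as a linear logistic test model, in which item difficulty is a linear function of item features. The sketch below uses invented feature values and weights (stand-ins for the complexity and similarity measures named in the abstract; nothing here is the study's data), and for simplicity treats person abilities as known while recovering only the feature weights.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical item features standing in for melodic complexity and
# inter-version similarity (values are invented).
n_persons, n_items = 200, 30
complexity = rng.normal(0, 1, n_items)
similarity = rng.normal(0, 1, n_items)
ability = rng.normal(0, 1, n_persons)

# Linear logistic test model: difficulty = w1*complexity + w2*similarity,
# P(correct) = sigmoid(ability - difficulty).
true_w = np.array([0.8, -0.6])  # assumed feature weights
difficulty = true_w[0] * complexity + true_w[1] * similarity
p_correct = sigmoid(ability[:, None] - difficulty[None, :])
responses = (rng.random((n_persons, n_items)) < p_correct).astype(float)

# Recover the feature weights by gradient ascent on the log-likelihood,
# treating the abilities as known (a full explanatory IRT fit would
# estimate person and item parameters jointly).
X = np.column_stack([complexity, similarity])
w = np.zeros(2)
for _ in range(3000):
    p = sigmoid(ability[:, None] - (X @ w)[None, :])
    grad = ((p - responses).sum(axis=0) @ X) / (n_persons * n_items)
    w += 2.0 * grad  # ascent step on dL/dw
print(np.round(w, 2))
```

With enough simulated responses, the recovered weights approach the assumed values, which is the sense in which item features "predict difficulty" in such a model.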
ISBN 1-876346-65-5 © ICMPC14 86 ICMPC14, July 5–9, 2016, San Francisco, USA
As shown in Figure 3, although the IR model is capturing some of the variance in the data, it leaves much unexplained. In particular, it does a poor job of distinguishing between the AC and NC experimental conditions, underpredicting tonic responses in the AC condition and overpredicting tonic responses in the NC condition. To further explore the predictions of the IR model in these two conditions, we re-fit the multinomial discrete-choice model on data from each condition separately.

C. Results: Non Cadence condition

Results of the model fit to NC condition responses reveal significant effects of both Proximity (β = -0.23, t = -12.85, p < 0.001) and Pitch Reversal (β = 0.14, t = 3.02, p < 0.01) in the predicted direction. McFadden's R2 for this model is 0.047. As before, we use the fitted model to predict the distribution of responses for each melody and compare it against the true empirical distribution. The mean L1 error is 0.099 (sd = 0.032). On all these measures, the NC model appears comparable to the both-conditions model.

D. Results: Authentic Cadence condition

Results of the model fit to AC condition responses reveal a significant effect of Proximity (β = -0.15, t = -6.95, p < 0.001) in the predicted direction and a significant effect of Pitch Reversal opposite the predicted direction (β = -0.19, t = -2.44, p = 0.01). McFadden's R2 for this model is 0.038. The mean L1 error is 0.050 (sd = 0.023). Although the absolute (L1) error is lower for the AC model than for the NC or both-conditions models (indicating a better fit between predictions and true responses), McFadden's R2 is also lower, indicating that less variance in the data is explained by this model. This paradox reflects the fact that the response data itself is less variable in the AC condition, as responses in this condition predominantly consist of the tonic. The AC model has better absolute performance by virtue of learning an intercept term that strongly favors always choosing the tonic, but the remaining variance is not well accounted for. Moreover, note that the model's ability to learn an intercept term that favors the tonic is independent of the IR model's predictions. Thus it appears that the IR model itself is unable to account for the fact that the tonic is strongly favored in these melodies.

IV. CONCLUSION

Our results indicate that the principles of the IR model do significantly affect participants' responses, but that they still leave much of the variance in participants' responses unexplained. The Proximity factor in particular is reliably significant in the predicted direction, while we found mixed results for the Pitch Reversal factor.

A comparison of the AC and NC melodies revealed that the IR model is unable to predict the strong preference for tonic completions in the AC melodies. In other words, the note-to-note principles encoded by this model are unable to capture the underlying harmonic structure that generates strong expectations for an authentic cadence.

In future work, we plan to expand upon these regression analyses to explicitly compare different quantitative models of melodic expectancy, such as the IDyOM model (Pearce, 2005) and hierarchical models (Koelsch et al., 2013), which, by taking into account more (and different) sources of context, may be better able to account for expectations such as those in the AC melodies. Predictions of multiple models can be entered into the same regression model in order to test which better predicts the given data. This will allow us to directly test whether models focusing on local versus global context, or on learning statistical relationships versus innate principles, are better predictors of participants' responses.
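The fit measures reported above can be computed directly: McFadden's R2 compares the fitted model's log-likelihood to that of an intercept-only null model, and the L1 error measures the distance between predicted and empirical response distributions. A minimal sketch with made-up numbers (the per-melody normalization shown is one plausible convention; the exact convention is not spelled out above):

```python
import numpy as np

def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo-R^2: 1 - LL(model) / LL(null)."""
    return 1.0 - ll_model / ll_null

def mean_l1_error(predicted, empirical):
    """L1 distance between predicted and empirical response
    distributions per melody, averaged over melodies."""
    predicted = np.asarray(predicted)
    empirical = np.asarray(empirical)
    return np.abs(predicted - empirical).sum(axis=1).mean()

# Made-up example: two melodies, three completion categories each.
pred = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]]
emp = [[0.55, 0.25, 0.2], [0.5, 0.3, 0.2]]
print(round(mcfadden_r2(ll_model=-950.0, ll_null=-1000.0), 3))  # → 0.05
print(round(mean_l1_error(pred, emp), 3))  # → 0.15
```

Because the null log-likelihood already reflects the marginal response frequencies, McFadden's R2 values in the 0.03 to 0.05 range, as above, indicate a modest but real improvement over chance.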
REFERENCES
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random
effects structure for confirmatory hypothesis testing: Keep it
maximal. Journal of Memory and Language, 68(3), 255–278.
Croissant, Y. (2013). mlogit: multinomial logit model. R package
version 0.2-4. URL
https://cran.r-project.org/web/packages/mlogit/index.html.
Eerola, T., and Toiviainen, P. (2004). MIDI Toolbox: MATLAB Tools
for Music Research. Jyväskylä: University of Jyväskylä.
Fogel, A. R., Rosenberg, J. C., Lehman, F. M., Kuperberg, G. R., &
Patel, A. D. (2015). Studying Musical and Linguistic Prediction
in Comparable Ways: The Melodic Cloze Probability Method.
Frontiers in Psychology, 6:1718.
Huron, D. (2006). Sweet Anticipation: Music and the Psychology of
Expectation. Cambridge, MA: MIT Press.
Koelsch, S., Rohrmeier, M., Torrecuso, R., & Jentschke, S. (2013).
Processing of hierarchical syntactic structure in music.
Proceedings of the National Academy of Sciences of the United
States of America, 110(38), 15443-15448.
Krumhansl, C. L., Louhivuori, J., Toiviainen, P., Järvinen, T., &
Eerola, T. (1999). Melodic expectation in Finnish spiritual folk
hymns: Convergence of statistical, behavioral, and
computational approaches. Music Perception, 17, 151-195.
Large, E. W., & Jones, M. R. (1999). The dynamics of attending:
How people track time-varying events. Psychological Review,
106, 119.
Margulis, E. H. (2005). A model of melodic expectation. Music
Perception, 22(4), 663-714.
McFadden, D. (1978). Quantitative Methods for Analysing Travel
Behaviour of Individuals: Some Recent Developments. In D. A.
Hensher & P. R. Stopher (Ed.), Behavioural Travel Modeling (pp.
279-318). London: Croom Helm.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic
Structures: The Implication-Realization Model. Chicago:
University of Chicago Press.
Pearce, M. T. (2005). The Construction and Evaluation of Statistical
Models of Melodic Structure in Music Perception and
Composition. Ph.D. Thesis, City University, London.
Pearce, M. T., Ruiz, M. H., Kapasi, S., Wiggins, G. A., &
Bhattacharya, J. (2010). Unsupervised statistical learning
underpins computational, behavioural, and neural manifestations
of musical expectation. NeuroImage, 50(1), 302-313.
Schellenberg, E. G. (1996). Expectancy in melody: Tests of the
implication-realization model. Cognition, 58(1), 75-125.
Schellenberg, E. G. (1997). Simplifying the implication-realization
model of melodic expectancy. Music Perception, 14(3), 295-318.
Schmuckler, M. A. (1989). Expectation in music: investigation of
melodic and harmonic processes. Music Perception, 7, 109–149.
transitional probabilities of certain words, and the correlation of the word frequencies to a corpus of confirmed authorship. Music researchers have occasionally used such techniques to determine compositional authorship. For example, Backer and Kranenburg (2005) used a linear discriminant transformation to examine the features of works by J.S. Bach and J.L. Krebs (see also Kranenburg, 2007), and Bellman (2011) employs computational methods to quantify musical style, engaging with stylometric questions throughout his study. Speiser and Gupta (2013) use principal component analysis to train classifiers (similarly to how this paper approaches the problem). The current study makes use of these previously implemented approaches to better understand the authorship of the works of Josquin.

A. High-Level Musical Features

This study makes use of what we might consider "high-level" features, meant to accompany the currently employed "low-level" features. The former category includes the presence of 9-8 suspensions, as well as the rate of occurrence of the specific types of contrapuntal motion: parallel, similar, contrary, and oblique. Many other commonly-discussed high-level features are not employed in the current study. For example, musicologists often debate the uniqueness of the texture around cadence points in Josquin's music, arguing that textures become sparser as cadences approach (Wegman, 2008). Such features can be difficult to quantify, and therefore difficult to operationalize. Other features commonly discussed but left out of the current study include "relaxed" contrapuntal writing and an overreliance on repetition of material. These both require a level of interpretation that we decided not to employ for the current study.

Additionally, a feature commonly referred to as the Satzfehler has been discussed and debated. Originally coined by Helmuth Osthoff (Osthoff, 1962), the Satzfehler was considered to be a possible error in voice-leading made by Josquin during his early years, in which leading-tone motion occurred between two voices simultaneously: one voice dips down from the tonic to the leading-tone and then ascends back to the tonic, while another voice preemptively performs the leading-tone before the first voice has had a chance to resolve. Later, Edgar Sparks argued that this device was not an error but was simply a common feature used not only by Josquin but also by many of his contemporaries (Sparks, 1976). While the appropriateness of this technique is still being questioned, it is one of the few techniques that have explicitly been applied to Josquin's music as an identifier of his compositional style, and it has been documented at length (Wegman, 2008). For the current study, however, we decided to exclude it from our searching: once again, there were many flexibilities to the definition, and computational searching yields too many false positives.

B. Low-Level Musical Features

For low-level features, we examined note-to-note transitions by scale degree. Admittedly, the notion of scale degree does not easily map onto the music of the Renaissance, but it nevertheless provides a concise method for looking at relationships between notes. These were then examined along with other similar low-level features, including rhythmic variability, measured with the normalized pairwise variability index (see Grabe and Low, 2002; Daniele and Patel, 2004; Patel and Daniele, 2013), the average entropy of each melodic voice, and the correlation to the pitch distributions in secure Josquin works.

Table 1. Features contributing to the models.

High-Level         Low-Level                              Metadata
9-8 suspensions    average melodic entropy (per voice)    primary listed composer
oblique motion     nPVI                                   title
contrary motion    pitch-range correlation to             JRP attribution level
                   secure Josquin works
similar motion     note-to-note transition                genre
                   probabilities (by scale degree)
parallel motion

III. BETWEEN-COMPOSER COMPARISON

We compare the music of Josquin to that of other composers in a range of styles. This helps to identify features that represent differences between styles, are common to a particular style, or are unique to a particular composer. A comparison between Josquin and J.S. Bach's four-part chorales is used as an initial calibration comparison: almost any untrained listener should be able to distinguish between these two repertories, separated by two centuries. Slightly more difficult are comparisons to Ockeghem and Du Fay, who predate Josquin by one and two generations. These comparisons are more difficult for the average listener, but relatively easy for experienced listeners of Renaissance music. Finally, challenging comparisons are made to de Orto and La Rue. These composers are Josquin contemporaries who all compose in a common style, and their works are often misattributed amongst each other.

A. Dimensionality Reduction

Before training a classifier, we employed principal component analysis (PCA) to reduce the 53 input features to a more manageable number of dimensions. This maintains as much variance between features as possible and is used to drop dimensions containing little information. High-level feature comparisons seem to be more salient between genres and epochs, whereas low-level features function best within them. To anticipate the results of this portion of the study, comparisons between disparate styles (Josquin and Bach) were actually hindered by the inclusion of low-level features, and the classifiers performed better when provided only the high-level information. In contrast, low-level features do increase the separation abilities of PCA and can train more accurate classifiers when the music provided is similar in style to that of Josquin.
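The dimensionality-reduction step described above, keeping just enough principal components to reach a variance threshold, can be sketched with a plain SVD-based PCA. The feature matrix below is random stand-in data, not the study's 53 corpus features:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in feature matrix: 300 pieces x 53 features, built as a low-rank
# signal plus a little noise so that few PCs carry most of the variance.
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 53))
X += 0.1 * rng.normal(size=X.shape)

# Center the features, then compute principal components via SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)  # variance ratio per component

# Keep the smallest number of PCs whose cumulative variance reaches 85%.
k = int(np.searchsorted(np.cumsum(explained), 0.85)) + 1
scores = Xc @ Vt[:k].T  # pieces projected into the reduced space
print(k, scores.shape)
```

The same cumulative-variance criterion applied to the real feature set is what yields counts like the 27 components reported below for the 48-feature model.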
F. Josquin and La Rue

Pierre de la Rue (c1452–1518) is more problematic than de Orto. His works are most commonly mistaken for Josquin's (and vice versa), and the two are the most stylistically similar of all of the composers examined in the present study. Figure 8 shows an interesting pattern: the repetition of the fifth is anti-correlated, and the strongest correlation coefficients are provided by 3-4 scale degree motions.

should be fed into a classifier. In order to achieve this goal, the current study uses the results of the principal component analysis as a way of training a multi-composer classifier. As this study is primarily exploratory in nature, we opted to train three types of classifiers and compare the results. A future goal, as will be discussed below, would be to employ an ensemble-learning model, in which multiple models are used and a decision tree is implemented to take the results of the model that performs best in a given context. The current study employs a k-nearest neighbor (KNN) classifier, a support vector machine, and a decision tree (the last as a way of better understanding the role of each feature).

A. K-Nearest Neighbor Classifier

A principal component analysis was performed on all of the above composers at once, using 48 features. In order to account for 85% of the variance (the standard used above), 27 PCs were needed to create the model. A two-dimensional representation of the separation between the first two PCs can be seen below in Figure 10. Notice that the first two PCs can be used to anticipate the similarity of style between the composers. Bach (red) is well separated from all of the Renaissance composers, while Du Fay (green) is the second most differentiated. The first two PCs cannot differentiate by themselves between the other composers' music (Ockeghem, de Orto, La Rue, and Josquin).
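The KNN classification used above can be written in a few lines of plain NumPy over points in a reduced PC space. The sketch below uses toy two-class data (two well-separated Gaussian clusters standing in for two composers), not the real corpus:

```python
import numpy as np

rng = np.random.default_rng(3)

def knn_predict(train_X, train_y, test_X, k=5):
    """Majority vote among the k nearest training points (Euclidean)."""
    preds = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)       # distances to all points
        nearest = train_y[np.argsort(d)[:k]]          # labels of k nearest
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)

# Toy data: two "composers" as Gaussian clusters in a 2-PC space.
X0 = rng.normal([0, 0], 0.5, size=(40, 2))
X1 = rng.normal([3, 3], 0.5, size=(40, 2))
train_X = np.vstack([X0, X1])
train_y = np.array([0] * 40 + [1] * 40)

test_X = np.array([[0.1, -0.2], [2.8, 3.1]])
print(knn_predict(train_X, train_y, test_X))  # → [0 1]
```

In practice the train/test points would be the PCA scores of individual pieces, and the labels the candidate composers.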
Du Fay was most confused with Ockeghem, which makes sense, given their relative proximity in time period and similar compositional styles. Interestingly, however, this was not a bidirectional confusion: Ockeghem pieces were rarely confused for those written by Du Fay, and instead were more likely to be ascribed to La Rue (20% of the time). The model did surprisingly well at discerning Josquin from La Rue, which was predicted to be the most difficult comparison. Although 21.2% of Josquin pieces were confused for La Rue pieces, the model guessed correctly on 60.6% of the pieces. For La Rue, the model was accurate 80.6% of the time, and ascribed pieces to Josquin only 16.7% of the time. While at first sight some of these accuracy levels might seem a little low (we had hoped for at least 75% accuracy with Josquin), it should be noted that each composer performed significantly better than the baseline (which would be around 16.7%, given a uniform random comparison between six composers).

Table 2. Confusion matrix for the KNN model. Results are in percentage of responses. The correct answer is listed in the first column, and predictions are listed in the top row.

            Bach   de Orto  Du Fay  Ock.   Josquin  La Rue
Bach        94.5   0.9      0       0      0.9      3.6
de Orto     5.6    38.9     0       11.1   11.1     33.3
Du Fay      14.3   0        42.9    28.6   0        14.3
Ockeghem    0      6.7      3.3     70     0        20
Josquin     0      9        3       6.1    60.6     21.2
La Rue      0      2.8      0       0      16.7     80.6

Table 3. Confusion matrix for the SVM using a radial basis function. Results are in percentage of responses. The correct answer is listed in the first column; the top row lists the predicted composer.

            Bach   de Orto  Du Fay  Ock.   Josquin  La Rue
Bach        98.5   0        0       0      0        1.5
de Orto     33.3   33.3     0       0      0        33.3
Du Fay      50     0        25      25     0        0
Ockeghem    6.7    0        0       60     13.3     20
Josquin     16     0        4       4      60       16
La Rue      0      0        0       0      12.9     87.1

C. Future Work

Future work will examine the specifics of each feature more thoroughly as a way of identifying the differences between composers. For example, the features of all of the composers studied were placed within a decision tree model to better understand how such aspects might contribute to creating better models in the future. As can be seen below, the most important features in discerning the four composers seem to be the employment of similar motion, as well as treatment of the sixth scale degree, followed by motion of scale degree two to scale degree five. While the decision tree below divides the pieces into eight, rather than six, compositional styles, it allows us to better understand the differences that constitute each composer's style and provides some context for the above PCA and subsequent classifiers.
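The row percentages in Tables 2 and 3, along with the roughly 16.7% chance baseline, follow directly from raw confusion counts. A sketch with hypothetical counts (invented, chosen only to approximately reproduce the KNN percentages above):

```python
import numpy as np

# Hypothetical raw confusion counts for six composers
# (rows = true composer, columns = predicted composer).
counts = np.array([
    [104,  1, 0,  0,  1,  4],
    [  2, 14, 0,  4,  4, 12],
    [  2,  0, 6,  4,  0,  2],
    [  0,  2, 1, 21,  0,  6],
    [  0,  3, 1,  2, 20,  7],
    [  0,  1, 0,  0,  6, 29],
])

# Row-normalize counts into percentages, as in Tables 2 and 3.
percent = 100 * counts / counts.sum(axis=1, keepdims=True)

# Per-composer accuracy is the diagonal; uniform chance over six
# composers gives the ~16.7% baseline mentioned above.
accuracy = np.diag(percent)
baseline = 100 / counts.shape[1]
print(np.round(accuracy, 1))
print(round(baseline, 1))  # → 16.7
```

Comparing each diagonal entry against the baseline is the quick sanity check that every composer's classifier beats uniform guessing.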
compositional style undergoes minimal levels of change throughout the course of a composer's career. Future work hopes to be able to use these elements, although the sources are often sparse and apocryphal. Additionally, we hope to examine more features in future work, including harmonic entropy, rhythmic transition probabilities, and the incorporation of the aforementioned Satzfehler. Once these are implemented, we hope to provide a more accurate model that can serve as a tool for researchers investigating this music.

ACKNOWLEDGMENTS

We would like to thank Jesse Rodin and the entire team at the Josquin Research Project for all of the work they do on the project, and Blake Howe for his helpful advice and insight into the music of Josquin.

REFERENCES

Backer, E., & Kranenburg, P. van. (2005). On musical stylometry—a pattern recognition approach. Pattern Recognition Letters, 26(3), 299–309.
Bharucha, J. J. (1991). Pitch, harmony, and neural nets: A psychological perspective. In P. Todd & G. Loy (Eds.), Music and Connectionism (pp. 84–99). Cambridge, MA: MIT Press.
Daniele, J. R., & Patel, A. D. (2013). An empirical study of historical patterns in musical rhythm: Analysis of German & Italian classical music using the nPVI equation. Music Perception: An Interdisciplinary Journal, 31(1), 10–18.
Grabe, E., & Low, E. L. (2002). Durational variability in speech and the Rhythm Class Hypothesis. In Laboratory Phonology 7 (pp. 515–546). Berlin: Mouton de Gruyter.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Jas, E. (2014). What Other Josquin? Early Music History, 33, 109–142.
Juola, P. (2015). The Rowling case: A proposed standard analytic protocol for authorship questions. Digital Scholarship in the Humanities, 30(suppl 1), i100–i113.
van Kranenburg, P. (2006). Composer attribution by quantifying compositional strategies. In ISMIR Proceedings (pp. 375–376).
Levitt, M. (2001). Joyce, yet again. Journal of Modern Literature, 24(3), 507–511.
Love, H. (2002). Attributing Authorship. Cambridge: Cambridge University Press.
Mosteller, F., & Wallace, D. (1964). Inference and Disputed Authorship: The Federalist. Addison-Wesley.
Osthoff, H. (1962). Josquin des Prez. In MGG1, vol. 7 (pp. 190–214).
Parncutt, R., & McPherson, G. E. (Eds.) (2002). The science and psychology of music performance: Creative strategies for teaching and learning. New York: Oxford University Press.
Patel, A. D., & Daniele, J. R. (2003). An empirical comparison of rhythm in language and music. Cognition, 87(1).
Rodin, J., & Sapp, C. (2010). The Josquin Research Project. http://josquin.stanford.edu, accessed on 5/5/2016.
Speiser, J., & Gupta, V. (2013). Composer style attribution. Project report for CS 229: Machine Learning, Stanford University. cs229.stanford.edu/proj2013/CS229-FinalProject.pdf
Sparks, E. H. (1971). Problems of authenticity in Josquin's motets. In E. E. Lowinsky & B. J. Blackburn (Eds.), Proceedings of the International Josquin Festival-Conference (London, 1976), pp. 345–359.
Wegman, R. C. (2008). The Other Josquin. Tijdschrift van de Koninklijke Vereniging voor Nederlandse Muziekgeschiedenis, 58(1), 33–68.
Playing by ear is considered a foundational musical skill, yet few studies examine the
mental processes involved in it. As externalizing internal music involves multimodal
imagery, this study aims to illuminate the mental imagery musicians without absolute
pitch may experience while playing by ear. Employing a qualitative case study design,
we will purposively select six proficient ear players without absolute pitch, enculturated
in Western tonal musical traditions, representing multiple learning backgrounds and
musical instruments.
The first two phases of data collection will facilitate participants’ metacognition, often a
challenge with automatized processes such as playing by ear. First, participants will be
asked to play several melodic passages by ear, designed to require varying levels of
abstraction, and verbalize their mental imagery using a think-aloud protocol. Participants
will then refine their descriptions for one week in a reflective journal, supplemented by
daily prompts from researchers. Finally, semi-structured interviews will allow researchers
to gather information about participants’ backgrounds and to clarify their previous
descriptions. Participants will then check the accuracy of researchers’ transcription and
interpretation of their descriptions. While analyzing data using systematic coding
techniques, we will be particularly cognizant of participants’ usage of imagery as they
employ strategies which may include the tracing of contour through intervals, the use of
intervals to relate pitches to the tonic, a sense of tonal hierarchy, and acquired motor
associations. Based on existing literature, we expect formal and informal learning
experiences, exposure to music notation, and features of participants’ musical instruments
to influence the multimodal imagery used while playing by ear.
Nakano presented an automatic system that evaluates the Table 1. Statistics of the 2013-2014 FBA audition dataset
singing skill of the users (Nakano, Goto, & Hiraga, 2006).
The system is trained based on the extracted pitch interval # Files Total Duration Avg. Duration
accuracy and vibrato features without any score input. The (mins) (mins)
evaluation results show that the system is able to classify the Middle 1099 5460 4.96
performance into two classes (good or poor) with high School
accuracy. Mion and De Poli proposed a system that classifies Concert 1046 5280 5.04
music expressions based on score-independent audio features Band
(Mion & De Poli, 2008). With instantaneous and event-based Symphonic 1199 6540 5.45
features such as spectral centroid, residual energy, and notes Band
per second, the system is able to recognize four different List of All Instruments
musical expressions of violin, flute, and guitar performances. Alto Sax, Baritone Sax, Bass Clarinet, Bass Trombone, Bassoon,
These examples show the potential of analyzing a recorded Bb Clarinet, Bb Contrabass Clarinet, Eb Clarinet, English Horn,
Euphonium, Flute, French Horn, Oboe, Percussion, Piccolo,
music performance without the need for the underlying score.
Tenor Sax, Trombone, Trumpet, Tuba
III. DATASET

The dataset used for this study is kindly provided by the Florida Bandmasters Association (FBA). The dataset contains 3344 audio recordings of the 2013–2014 Florida all-state auditions with accompanying expert assessments. The participating students are divided into three groups, namely middle school (7th and 8th grade), concert band (9th and 10th grade), and symphonic band (11th and 12th grade). A total of 19 types of instruments are played during the auditions. More details are shown in Table 1. All of the auditions are recorded at a sampling rate of 44100 Hz and are encoded with MPEG-1 Layer 3.

For pitched instruments, each audition session includes 5 different exercises: lyrical etude, technical etude, chromatic scale, 12 major scales, and sight-reading. For percussion instruments, the audition session also includes 5 different exercises: mallet etude, snare etude, chromatic scale (xylophone), 12 major scales (xylophone), and sight-reading (snare). Each exercise is graded by human experts with respect to different assessment categories, such as musicality, note accuracy, rhythmic accuracy, tone quality, artistry, and articulation. In our experiments, all of the ratings are normalized to a range between 0 and 1, with 0 being the minimum and 1 being the maximum score.

To narrow the scope of this study, only a small subset of this original dataset is used. This subset includes the recordings of one exercise from one pitched instrument (alto sax) and one percussion instrument (snare drum) performed by middle school students. These two instruments have been selected because they have a relatively higher number of recordings in the dataset (122 and 98, respectively).

IV. EXPERIMENTS

A. Feature Extraction

To represent the audio signals in the feature space, two types of features are extracted: 1) baseline features and 2) designed features.

The baseline features (d = 17, where d is the dimensionality) are computed block-by-block with a window size of 1024 samples and a hop size of 256 samples. A Hann window is applied to each block. The baseline features include: spectral centroid, spectral rolloff, spectral flux, zero-crossing rate, and 13 Mel Frequency Cepstral Coefficients (MFCCs). The features are implemented according to the definitions in (Lerch, 2012). To represent each recording with one feature vector, a two-stage feature aggregation process has been applied. In the first stage, the block-wise features within a 250 ms texture window are first aggregated by their mean and standard deviation. In the second stage, all of these meta-features are aggregated again into one single vector with their mean and standard deviation, resulting in one baseline feature vector (d = 68) per recording.

The designed features for the percussion instruments are used to capture the rhythmic aspects of the audio signal. The resulting features (d = 18) per recording are as follows:

• Inter-Onset-Interval (IOI) histogram statistics (d = 7): This set of features is designed to describe the rhythmic characteristics of the played onsets. An IOI histogram is computed with 50 bins. Next, the standard statistical measures crest, skewness, kurtosis, rolloff, flatness, tonal power ratio, and the histogram resolution are extracted.

• Amplitude histogram statistics (d = 11): This set of features is designed to capture the amplitude variations of the played onsets. The amplitude of the waveform is first converted into dB, and the amplitude histogram is calculated with 50 bins. Next, the same statistical measures as above are extracted. Additionally, the length of the exercise, the standard deviation of the amplitude in both linear and dB scale, and the Root Mean Square (RMS) of the entire waveform are included as features.

The designed features for the pitched instruments are intended to describe various dimensions of the performances, namely the pitch, dynamics, and timing characteristics. Most of these features are extracted at the note level after a simple segmentation process based on the quantized pitch contour. The pitches are detected using an autocorrelation-based pitch tracker. The note-level features are:

• Note steadiness (d = 2): These two features are designed to find fluctuations in the pitch of a note. For each note, the standard deviation of pitch values and the percentage of values deviating more than one standard deviation from the mean are computed.
• Amplitude deviation (d = 1): This feature aims to find the uniformity of the amplitude of a note. For each note, the standard deviation of the RMS is computed.

• Amplitude envelope spikes (d = 1): This feature describes the spikiness of the note amplitude over time. The number of local maxima of the smoothed derivative of the RMS is computed per note.

Once the above features are extracted, their mean, maximum, minimum, and standard deviation across all the notes are calculated to represent the recording. In addition, the following exercise-level features are extracted:

• Average pitch accuracy (d = 1): This feature shows the consistency of the notes played. The histogram of the pitch deviation from the closest equally tempered pitch is extracted with a 10 cent resolution. The area under the window (width: 30 cent) centered around the highest peak is considered as the feature.

• Percentage of correct notes (d = 1): For this feature, each note is labeled either correct or incorrect, and the percentage of correct notes across the entire exercise is computed as the feature. A note is labeled correct if the percentage of pitch values deviating from the mean pitch is lower than a pre-defined threshold.

• Timing accuracy (d = 7): These features are computed from the IOI histogram of the note onsets. Note onsets are computed from the pitch contour. The features used are the same as those used for the percussive instruments.

The resulting feature vector for the designed features for pitched instruments has the dimension d = 25.

B. Regression Modeling

Using the extracted features from the audio signals, a Support Vector Regression (SVR) model with a linear kernel function is trained to predict the human expert ratings. The libsvm (Chang & Lin, 2011) implementation of this model is used with default parameter settings. A Leave-One-Out cross-validation scheme is adopted to train and evaluate the models.

C. Experimental Setup

To compare the effectiveness of the baseline features versus the designed features, two sets of experiments are conducted. In the first set of experiments, the baseline and designed features are extracted from the pitched instrument recordings and used to build the regression models for predicting labels such as artistry, musicality, note accuracy, rhythmic accuracy, and tone quality. In the second set of experiments, the same procedure is used to predict musicality, note accuracy, and rhythmic accuracy for the percussion instrument recordings. An outlier removal process is included for all the experiments. This process removes the training data with the highest prediction residual (prediction minus actual rating) and is repeated until 5% of the dataset is eliminated. By removing the outliers, the regression models should be able to better capture the underlying patterns in the data.

D. Evaluation Metrics

The performance of the models is evaluated by the following standard statistical metrics: the correlation coefficient r and its corresponding p-value, the R² value, and the standard error. These metrics are typically used to measure the strength of the regression relationship between the predictions and the actual ratings. More details of the mathematical formulations can be found in (McClave & Sincich, 2003).

V. RESULTS AND DISCUSSION

The results of the first and second sets of experiments are shown in Table 2. Most of the entries have significant p-values except for artistry in the baseline features and note accuracy in both feature sets of the pitched instrument; therefore, the p-values are not shown in the table.

The following trends can be observed: first, the baseline feature set is not able to capture enough meaningful information to create a usable model.

Second, compared to the baseline features, the designed features lead to general improvements in all the metrics for both pitched and percussion instruments. In most cases, the R² values increase by at least 30%, illustrating the effectiveness of the designed features at characterizing the music performances.

Third, musicality is the assessment category that is modeled best for both pitched and percussion instruments. The highest correlation coefficient and R², 0.7307 and 0.5254 respectively, are achieved when using designed features to predict musicality for pitched instrument recordings. This could be explained by the fact that musicality is a relatively abstract description that covers most aspects of a performance, and therefore, it is likely to be related to the general quality of the performance, making it relatively easy to model. Table 3 shows the result of an inter-label investigation, correlating one type of rating with the others. It is found that musicality tends to be highly correlated with the other assessment categories. This result further confirms the relationship between musicality and overall performance quality.

Fourth, note accuracy for the pitched instrument is the worst performing category. Compared to a percussion instrument, more information might be required to model the note accuracy of a pitched instrument properly. The proposed features might be unsuitable for capturing the relevant information. This result could imply the necessity of including score information in order to improve the model of certain categories.

Last but not least, the baseline features did not perform better than the designed features in predicting the tone quality. This seems to contradict the intuition that timbre features are directly related to the tone quality. On the one hand, this could mean that other, not extracted low-level characteristics such as the initial attack phase of each note might have a larger impact on the tone quality than the spectral envelope modeled by our features. On the other hand, this result might suggest a connection between the perceived tone quality and the high-level information captured by the designed features, such as note steadiness and pitch accuracy.
Table 2. Experiment results of the regression models (body not recoverable from this extraction; one legible row: Percussion (Snare), Note Acc.: 0.4853, 0.0695, 0.1592)

Table 3. Inter-label cross-correlation results for musicality (body not recoverable from this extraction)
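As a sketch (ours, not the authors' code), the four metrics of Section IV-D can be obtained with a single SciPy call; the toy prediction/rating vectors below are hypothetical:

```python
import numpy as np
from scipy import stats

def regression_metrics(predictions, ratings):
    """Correlation coefficient r, its p-value, R^2, and the standard
    error of the regression, as listed in Section IV-D."""
    result = stats.linregress(predictions, ratings)
    return {"r": result.rvalue, "p": result.pvalue,
            "R2": result.rvalue ** 2, "stderr": result.stderr}

# Hypothetical toy data; the real ratings are normalized to [0, 1] (Section III)
predicted = np.array([0.2, 0.4, 0.5, 0.7, 0.9])
actual = np.array([0.25, 0.35, 0.55, 0.65, 0.95])
metrics = regression_metrics(predicted, actual)
```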
Abstract—Successfully predicting missing components from complex multipart musical textures has attracted researchers of music information retrieval and music theory. Solutions have been limited to either two-part melody and accompaniment (MA) textures or four-part Soprano-Alto-Tenor-Bass (SATB) textures. This paper proposes a robust framework applicable to both textures using a Bidirectional Long Short-Term Memory (BLSTM) recurrent neural network. The BLSTM system was evaluated using frame-wise accuracies on the Nottingham Folk Song dataset and the J. S. Bach Chorales. Experimental results demonstrated that the BLSTM significantly outperforms other neural-network based methods by 4.6% on average for four-part SATB and two-part MA textures. The high accuracies obtained with the BLSTM on both textures indicate that it is the most robust and applicable structure for predicting missing components from multipart musical textures.

Keywords—bidirectional long short-term memory neural network, multipart musical textures, music information retrieval

I. INTRODUCTION

how notes can and cannot be played together in complex multipart textures. Such rules change over time and are subject to multiple factors [2]. As artificial intelligence and machine-learning research advances, it is natural that computer scientists apply such techniques to music analysis in order to elucidate these rules [3]. Two popular tasks investigated in this area are (1) generating chord accompaniments for a given melody in a two-part MA texture and (2) generating a missing voice for an incomplete four-part SATB texture. Successfully accomplishing either task manually is time-consuming and requires considerable style-specific knowledge; the applications discussed below are designed to automate this process and help non-professional musicians compose and analyze music.

Approaches that treat these problems can be categorized into two types according to the level of human engagement in discovering and applying music rules. Early works that handle incomplete four-part SATB textures were mostly knowledge-based models. Steels [4] proposed a representation system to encode musical information and exploit heuristic search,
each note when different chords are being played. The optimal chord sequence is then generated using dynamic programming (the Viterbi algorithm). HMMs are also used by Allan [13][14] to harmonize four-part chorales in the style of J. S. Bach. In addition to HMMs, Markov models and Bayesian networks are alternative models used for four-part textures by Biyikoglu [15] and Suzuki et al. [16]. Raczynski et al. [17] proposed a statistical model that combines multiple simple sub-models. Each sub-model captures different musical aspects such as metric and pitch information, and all of them are then interpolated into a single model. Paiement et al. [18] proposed a multi-level graphical model, which is proved to capture the long-term dependency among chord progressions better than traditional HMMs. One drawback of probabilistic models is that they cannot correctly handle data that are not seen in the training data. Chuan and Chew [19] reduced this problem by using a hybrid system for style-specific chord sequence generation that combines a statistical learning approach with music theory. In [20], Chuan compared and evaluated the rule-based Harmonic Analyzer [21], the probabilistic-model based MySong [12], and the hybrid system proposed in [19] on the task of style-specific chord generation for melodies.

Neural networks have also been used. Gang et al. [22] were among the earliest to use neural networks to produce chord harmonization for given melodies. Their sequential neural network consisted of a sub-net that learned to identify chord notes for the melody in each measure, and the result was fed into the network to learn the relationship between melodies and chords. The network was later adopted in real-time applications [23][24]. Consisting of 3 layers, the input layer takes pitch, metric information, and the current chord context, and the output layer predicts the next chord. Cunha et al. [25] also proposed a real-time chord harmonization system using multi-layer perceptron (MLP) neural networks and a rule-based sequence tracker that analyzes the structure of the song in real time, which provides additional information on the context of the notes being played.

Hoover et al. [26] used two Artificial Neural Networks (ANNs) to model the relationship between melodies and accompaniment as a function of time. The system was later extended to generate multi-voice accompaniment by increasing the size of the output layer in [27]. Bellgard and Tsang [3] trained an effective Boltzmann machine and incorporated external constraints so that harmonization follows the rules of chorales. Feulner developed a feed-forward neural network that harmonizes melodies in specific styles in [28]. De Prisco et al. [29] proposed a neural network that finds appropriate chords to harmonize given bass lines in four-part SATB chorales by combining three base networks, each of which models contexts of different time lengths.

Although these previous studies provide valuable insights, a number of constraints exist in their applications. Most rules encoded in knowledge-based systems are style-specific, making them hard to apply to other types of music efficiently. Probabilistic models and neural networks, on the other hand, provide a much more adaptable solution that can be applied to music of different styles by learning rules from different styles of training data. Nevertheless, many of the probabilistic models can only handle music pieces of fixed length. In addition, the transition matrix of probabilistic models has to be learned using a specific music representation (e.g. chords) and cannot be generalized to other representations (e.g. SATB). Moreover, probabilistic models tend to ignore long-term dependency among music components as they mainly focus on local transitions between two consecutive components. Existing studies using neural networks captured long-term dependencies in music and are also capable of dealing with music pieces of arbitrary lengths. However, neural networks have been notoriously hard to train, and their ability to utilize long-term information was limited until the introduction of Long Short-Term Memory (LSTM) cells.

The current study builds on the LSTM model and has the advantages of several other models without their restrictions. Like probabilistic models, neural networks can learn music “rules” of different styles of music from training data. Yet unlike probabilistic models, where a transition matrix among chords is required, neural networks enable the model to deal with flexible polyphonic accompaniment that does not necessarily have to be in the form of chords. Such a structure also allows the model to incorporate additional musical quantities easily by adjusting the number of input neurons. Finally, adding bidirectional links and LSTM cells improves a neural network’s ability to employ additional timing information. All of the above contributes to the fact that the proposed BLSTM model is flexible and effective in generating the missing component in an incomplete texture.

II. BACKGROUND

A. Feed-Forward Neural Network

The most common neural network is a feed-forward multi-layered perceptron (MLP) network as shown in Fig. 1, which consists of three layers: an input layer, a hidden layer, and an output layer. The network is usually trained via back-propagation. Feed-forward neural networks were used extensively in music-related research such as harmony generation [22][25], onset detection [30], and algorithmic composition [31].

Though being the simplest among various neural network structures, feed-forward neural networks cannot effectively capture rhythmic patterns and structures in music owing to the fact that they do not have a mechanism to keep track of the notes played in the past. Since each input frame is processed individually, feed-forward neural networks are totally deterministic unless context is provided to the network, such as using a sliding window as in [30].

B. Recurrent Neural Network

Another way to present past information is to add recurrent links to the network, resulting in a recurrent neural network (RNN). An RNN has at least one feedback connection from one or more of its units to another unit, forming cyclic paths in the network. Fig. 2 shows a simple example of an RNN with an input layer, a hidden layer, and an output layer with recurrent links from the output layer to the input layer. RNNs are known to be able to approximate dynamical systems due to internal states that act as internal memory to process sequences of inputs through time. The fact that music is a complex structure that has both short-term and long-term dependency, just as language models do, makes the RNN an ideal structure for solving music-related problems. Mozer [32] used a fully-connected RNN to generate music note-by-note. Boulanger-Lewandowski et al. also developed RNN-based models to recognize chords in audio files [33] and construct polyphonic music [34] by using the restricted Boltzmann machine (RBM) and the recurrent temporal RBM (RTRBM).

Gradient-based training algorithms, such as Back-Propagation Through Time (BPTT) [39], Real-Time Recurrent Learning (RTRL) [40], and their combinations, update the network by flowing errors “back in time.” As the error propagates from layer to layer, it tends to either explode or shrink exponentially depending on the magnitude of the weights. Therefore, the network fails to learn long-term dependency between inputs and outputs. Tasks with time lags greater than 5-10 time steps are already difficult to learn, not to mention that dependency in music usually spans tens to hundreds of notes in time, which contributes to music’s unique phrase structures. The Long Short-Term Memory (LSTM) [38] algorithm was designed to tackle the error-flow problem.
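The exponential decay or growth of the back-propagated error described above can be illustrated with a toy calculation (our illustration, not from the paper): an error signal repeatedly scaled by the same recurrent weight over 20 time steps.

```python
def backprop_error(weight, time_steps, error=1.0):
    """Toy model of error flow through time: the back-propagated error
    is scaled by the same recurrent weight at every step."""
    for _ in range(time_steps):
        error *= weight
    return error

vanishing = backprop_error(0.5, 20)  # |w| < 1: the error shrinks toward zero
exploding = backprop_error(1.5, 20)  # |w| > 1: the error blows up
```

With a weight of magnitude below one, the error is already negligible after 20 steps, which is why time lags of tens to hundreds of notes are out of reach for a plain RNN.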
irrelevant memory contents; a forget gate learns to control when it is time to forget an already remembered value, i.e. to reset the memory cell. When the gates are closed, irrelevant information does not enter the cell and the state of the cell is not altered. The outputs of all memory blocks are fed back recurrently to all memory blocks to remember past values.

In this project, we use Bidirectional Recurrent Neural Networks, as their recurrent links grant the network access to information both in the past and in the future. We also use LSTM cells in the network to avoid vanishing gradient problems, the details of which are covered below.

III. METHODS

A. Music Representation

MIDI files are used as input in both the training and testing phases of this project. Multiple input and output neurons are used to represent different pitches. At each time, the value of the neuron associated with the particular pitch played at that time is 1.0. The values of the rest of the neurons are 0.0. We avoid distributed encodings and other dimension reduction techniques and represent the data in this simple form because this representation is common and assumes that neural networks can learn a more distributed representation within hidden layers. Monophonic inputs are used because of their adaptability to both monophonic and polyphonic data. We leave further consideration and evaluation of distributed representations to future work.

The music is split into time frames, and the length of each frame depends on the type of music. Finding missing music components can then be formulated as a supervised classification problem. For a song of length t1, for every time t from t0 to t1, given the input x(t), the notes played at time t, find the output y(t), which is the missing component we try to predict. In other words, for two-part MA textures, y(t) is the chord played at time t, while for four-part SATB textures, y(t) is the pitch of the missing part at time t.

B. Generating Accompaniment in Two-Part MA Texture

1) Input and Output

The MIDI files are split into eighth-note fractions. The inputs at time t, x(t), are the notes of the melody played at time t. Instead of representing the notes by their MIDI number, which spans the whole range of 88 notes on a keyboard, we used a pitch-class representation to encode pitches into their corresponding pitch-class number. Pitch class, also known as “chroma,” is the set of all pitches regardless of their octave. That is, all C notes (C0, C1, ..., Cn, etc.) are classified as pitch-class C. All notes are represented with one of the 12 numbers corresponding to the 12 semitones in an octave. In addition to pitch-class information, two additional values are added as inputs: a Note-Begin unit and a Beat-On unit. In order to be able to tell when a note ends, the Note-Begin unit is used to differentiate two consecutive notes of the same pitch from one note that is held for two time frames, as was done by [31]. If the note in the melody begins at time n, the value of the Note-Begin unit is 1.0; if the note is sounding but is held from a previous time n, or is not played at all, the value of the unit is 0.0. The Beat-On unit, on the other hand, provides metric information to the network. If the time t is on a beat, the value of the Beat-On unit is 1.0, otherwise 0.0. If it is at a rest, the values of all input neurons are 0.0. The time signature information is obtained via meta-data in the MIDI files.

The output at time t, y(t), is the chord played at time t. We limit chord selection to major, minor, diminished, suspended, and augmented triads as in [12], resulting in 52 chords in total. The output units represent these 52 chords in a manner similar to the input neurons: the neuron corresponding to the chord played at that time has a value of 1.0, and the values of the rest of the neurons are all 0.0.

2) Training the Network

The input layer has 14 input neurons: 12 neurons, one for each pitch class, one neuron for the Note-Begin unit, and one for the Beat-On unit. The network consists of two hidden layers for both the forward and backward states, resulting in four hidden layers in total. In every hidden layer are 20 LSTM blocks with one memory cell each. The output layer uses the softmax activation function and the cross-entropy error function as in [35]. The softmax function is a standard function for multi-class classification that squashes a K-dimensional vector x into the range (0, 1), and takes the form

  softmax(x)_i = exp(x_i) / sum_{j=1..K} exp(x_j).

The softmax function ensures that all the output neurons sum to one at every time step, and the outputs can thus be regarded as the probability of each output chord given the inputs at that time.

Each music piece is presented to the network one at a time, frame by frame. The network is trained via standard gradient-descent back-propagation. A split of the data is used as the validation set for early stopping in order to avoid over-fitting to the training data. If there is no improvement on the validation set for 30 epochs, training is finished and the network setting with the lowest classification error on the validation set is used for testing.

3) Markov Model as Post-Processing

The network trained in III-B2 can then be used to predict the chord associated with each melody note by choosing the output neuron that has the highest activation at each time point. However, the predicted chord at each time is independent of the chords predicted at the previous and succeeding times. While there are forward and backward links in the hidden layers of the network, there are no recurrent connections from the final neuron output back to the network. A chord might sound good with the melody, but the transition from one chord to another might not make sense at all. In fact, how one chord transitions from and to another typically follows specific chord-progression rules depending on the music style. A bi-gram Markov model is thus added to learn the probability of transitioning from each chord to its possible successors independent of the melody, which will be referred to as the transition matrix. The transition matrix is smoothed using linear interpolation with a unigram model. The model also learns the statistics of the start chords.

Instead of selecting the output neuron with the highest activation, the first k neurons with the highest activations are
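A sketch of the post-processing idea from Section III-B3 (the function names and the k-best search details are our illustration, not the authors' implementation): the k most active output chords of each frame are re-scored with the learned bigram transition matrix in a Viterbi-style search.

```python
import numpy as np

def rerank(activations, transition, start_prob, k=3):
    """Choose one chord per frame by combining softmax activations with
    bigram chord-transition probabilities (Viterbi-style search over the
    k most active chords of each frame)."""
    n_frames = activations.shape[0]
    topk = np.argsort(activations, axis=1)[:, -k:]  # k best chords per frame
    # score and path for each candidate chord of the first frame
    best = {c: (start_prob[c] * activations[0, c], [c]) for c in topk[0]}
    for t in range(1, n_frames):
        nxt = {}
        for c in topk[t]:
            # best predecessor under the smoothed transition matrix
            score, path = max(
                ((s * transition[pc, c], pth + [c])
                 for pc, (s, pth) in best.items()),
                key=lambda x: x[0])
            nxt[c] = (score * activations[t, c], path)
        best = nxt
    return max(best.values(), key=lambda x: x[0])[1]
```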
[10] A. Freitas and F. Guimaraes, “Melody harmonization in evolutionary [36] F. Eyben, S. Bo ̈ ck, B. Schuller, and A. Graves, “Universal onset detec-
music using multiobjective genetic algorithms,” in Proceedings of the tion with bidirectional long short-term memory neural networks.” in
Sound and Music Computing Conference, 2011. ISMIR, 2010, pp. 589–594.
[11] H.-R. Lee and J.-S. Jang, “i-ring: A system for humming transcription [37] M. Schuster and K. K. Paliwal, “Bidirectional recurrent neural net-
and chord generation,” in Multimedia and Expo, 2004. ICME’04. 2004 works,” Signal Processing, IEEE Transactions on, vol. 45, no. 11, pp.
IEEE International Conference on, vol. 2. IEEE, 2004, pp. 1031–1034. 2673–2681, 1997.
[12] I. Simon, D. Morris, and S. Basu, “Mysong: automatic accompaniment [38] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural
generation for vocal melodies,” in Proceedings of the SIGCHI Con- fer- computation, vol. 9, no. 8, pp. 1735–1780, 1997.
ence on Human Factors in Computing Systems. ACM, 2008, pp. 725– [39] P. J. Werbos, “Back-propagation through time: what it does and how to
734. do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
[13] M. Allan, “Harmonising chorales in the style of Johann Sebastian Bach,” [40] A. Robinson and F. Fallside, The utility driven dynamic error propaga-
Master’s Thesis, School of Informatics, University of Edinburgh, 2002. tion network. University of Cambridge Department of Engineering,1987.
[14] M. Allan and C. K. Williams, “Harmonising chorales by probabilistic [41] J. Schmidhuber, “Long short-term memory: Tutorial on lstm recurrent
inference,” Advances in neural information processing systems, vol. 17, networks,” http://people.idsia.ch/∼juergen/lstm/ index.htm, 2003, on-
pp. 25–32, 2005. line resource.
[15] K. M. Biyikoglu, “A markov model for chorale harmonization,” in Pre- [42] E. Foxley, “Nottingham dataset,” http://ifdo.ca/∼sey-mour/nottingham/
ceedings of the 5 th Triennial ESCOM Conference, 2003, pp. 81–84. nottingham.html, 2011, accessed: 04-19-2015.
[16] S. Suzuki, T. Kitahara, and N. University, “Four-part harmonization [43] D. Temperley, The Cognition of Basic Musical Structures. Cambridge:
using probabilistic models: Comparison of models with and without MIT Press, 2001.
chord nodes,” Stockholm, Sweden, pp. 628–633, 2013. [44] M. S. Cuthbert and C. Ariza, “Music21: A toolkit for computer-aided
[17] S. A. Raczyński, S. Fukayama, and E. Vincent, “Melody harmonization musicology and symbolic music data,” 2010.
with interpolated probabilistic models,” Journal of New Music Research, [45] M. Greentree. (1996) http://www.jsbchorales.net/index.shtml. Accessed:
vol. 42, no. 3, pp. 223–235, 2013. 04-19-2015.
[18] J.-F. Paiement, D. Eck, and S. Bengio, “Probabilistic melodic harmoni-
zation,” in Advances in Artificial Intelligence. Springer, 2006, pp. 218–
229.
[19] C.-H. Chuan and E. Chew, “A hybrid system for automatic generation of
style-specific accompaniment,” in 4th Intl Joint Workshop on Computa-
tional Creativity, 2007.
[20] C.-H. Chuan, “A comparison of statistical and rule-based models for
style-specific harmonization.” in ISMIR, 2011, pp. 221–226.
[21] D. Temperley and D. Sleator, “The temperley-sleator harmonic ana-
lyzer,” 1996.
[22] D. Gang and D. Lehmann, “An artificial neural net for harmonizing
melodies,” Proceedings of the International Computer Music Associa-
tion, 1995.
[23] D. Gang, D. Lehmann, and N. Wagner, “Harmonizing melodies in real-
time: the connectionist approach,” in Proceedings of the International
Computer Music Association, Thessaloniki, Greece, 1997.
[24] D. Gang, D. Lehman, and N. Wagner, “Tuning a neural network for
harmonizing melodies in real-time,” in Proceedings of the International
Computer Music Conference, Ann Arbor, Michigan, 1998.
[25] U. S. Cunha and G. Ramalho, “An intelligent hybrid model for chord
prediction,” Organised Sound, vol. 4, no. 02, pp. 115–119, 1999.
[26] A. K. Hoover, P. A. Szerlip, and K. O. Stanley, “Generating musical
accompaniment through functional scaffolding,” in Proceedings of the
Eighth Sound and Music Computing Conference (SMC 2011), 2011.
[27] A. K. Hoover, P. A. Szerlip, M. E. Norton, T. A. Brindle, Z. Merritt, and
K. O. Stanley, “Generating a complete multipart musical composition
from a single monophonic melody with functional scaffolding,” in In-
ternational Conference on Computational Creativity, 2012, p. 111.
[28] J. Feulner, “Neural networks that learn and reproduce various styles of
harmonization,” in Proceedings of the International Computer Music
Conference. International Computer Music Association, 1993, pp. 236–
236.
[29] R. DePrisco, A. Eletto, A. Torre, and R. Zaccagnino, “A neural network
for bass functional harmonization,” in Applications of Evolutionary
Computation. Springer, 2010, pp. 351–360.
[30] H. Goksu, P. Pigg, and V. Dixit, “Music composition using genetic
algorithms (ga) and multilayer perceptrons (mlp),” in Advances in Natu-
ral Computation. Springer, 2005, pp. 1242–1250.
[31] P. M. Todd, “A connectionist approach to algorithmic composition,”
Computer Music Journal, pp. 27–43, 1989.
[32] M. C. Mozer, “Neural network music composition by prediction: Ex-
ploring the benefits of psychoacoustic constraints and multi-scale proc-
essing,” Connection Science, vol. 6, no. 2-3, pp. 247–280, 1994.
[33] N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, “Audio chord
recognition with recurrent neural networks.” in ISMIR, 2013, pp. 335–
340.
[34] ——,“Modeling temporal dependencies in high-dimensional sequences:
Application to polyphonic music generation and transcription,” arXiv
preprint arXiv:1206.6392, 2012.
[35] A. Graves and J. Schmidhuber, “Framewise phoneme classification with bidirectional LSTM and other neural network architectures,” Neural Networks, vol. 18, no. 5, pp. 602–610, 2005.
In just the past decade, a host of new technologies have become available for studying
musical corpora (such as the Music21 toolkit and the MATLAB MIDI Toolbox) that
allow us to ask questions and discover features in music that would otherwise be too
difficult or time-consuming to pursue using traditional methods.
This talk presents a new utility toward this effort: a standalone piece of software that
both discovers phrase-level patterns in music, including voice-leading schemas, and
provides a robust search engine specifically designed to identify schematic information
in musical corpora. This project is distinct from other work in that the discovery and
search engine presented here were designed to be as cognitively informed as practically
possible, incorporating ideas from schema theory (Gjerdingen, 2007), corpus linguistics
(Stefanowitsch and Gries, 2004), and computer vision (Marr, 1982), which brings with it
interesting consequences. For one thing, the application can be used to reveal patterning
in music at various structural scales, and it can identify similarity in music despite a
great deal of variety at the musical surface. In this way, the algorithm performs search
and finds correspondences that may more closely align with the intuitions of human
listeners.
The goal of this talk is to demonstrate the numerous applications of this utility as a tool
for music studies, and to discuss relevant music theoretic and cognitive issues that can be
investigated using the application. In particular, I demonstrate how these ideas apply to
theories of contour, such as those proposed by Morris (1993) and Quinn (1997), and
address the problem of equivalence using methods adopted from applied mathematics as
an alternative to set theory.
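To make the contour component concrete: in Morris-style contour theory, a melody is reduced to a contour segment (CSEG) by replacing each pitch with its rank within the segment. A minimal sketch in Python (illustrative code, not the software described in this talk):

```python
def cseg(pitches):
    """Contour segment (CSEG): replace each pitch with its rank (0 = lowest).

    Equal pitches share a rank, so two melodies with different intervals
    but the same up/down shape map to the same contour class.
    """
    levels = sorted(set(pitches))
    return [levels.index(p) for p in pitches]

# Two melodies with different intervals but the same contour class <0 3 1 2>:
print(cseg([60, 67, 64, 65]))  # C4 G4 E4 F4 -> [0, 3, 1, 2]
print(cseg([50, 59, 52, 55]))  # D3 B3 E3 G3 -> [0, 3, 1, 2]
```

Contour equivalence of this kind is one of the coarser similarity criteria a cognitively informed search utility can exploit when matching patterns across varied musical surfaces.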
ISBN 1-876346-65-5 © ICMPC14 111 ICMPC14, July 5–9, 2016, San Francisco, USA
but rather exists on a continuum whereby some auditory objects may be considered more or less musical than other auditory objects. We suggest that identification of an auditory object as being more musical than another begins with a modularized analysis of features. The results of this analysis produce a percept that may be strongly or weakly associated with memories in the musical lexicon. Musicality, therefore, can be thought of as a property of sound that emerges when certain organizational features are presented in specific ways. By first probing an auditory object's degree of musicality and, second, investigating the features and processes that give rise to the degree of musicality, we can determine how listeners perceive an auditory object as musical.

Music theorists, empirical musicologists, and music psychologists have approached feature identification and analysis in a number of ways. One way is to examine structural features of existing music with the premise that, if a particular feature is significantly present in a specific way, then there is a mental representation or process that has developed in response to this feature. Another approach is to posit a general idea about "what music is" from the experimenter's experience. In both cases, features lead to testable hypotheses, and the importance of a feature is established when observable behavior can be modulated by manipulating that feature. In both approaches, however, an auditory object's status as "music" is taken a priori. While the assumption that Mozart's "Hunt Quartet" or an ethnic folk song is axiomatically music is appropriate in some contexts, it can inhibit us from understanding what a subject-oriented mental representation of music might look like. In the current study we seek to uncover an operational definition of music by first observing subjects' behavior and then examining the stimuli in an attempt to understand how stimuli modulate behaviors. To test this, we asked subjects to listen to randomly generated pure-tone sequences and then rate each sequence on its musicality. Our first hypothesis is that subjects will be able to effectively group these sequences according to their musicality. Our second hypothesis is that a measure can be devised to quantitatively describe the psychological processes involved in judging the musicality of simple auditory stimuli. We created a profile for each sequence, composed of a set of metrics that describe structural features commonly discussed in the music theory and music psychology literature. A principal component analysis (PCA) was performed on the ratings and the resulting components were correlated with the profiles to understand which features best account for the variance found in the musicality ratings. This approach allows for an exploration of the low-level auditory features that give rise to the perception of musicality in auditory objects. The following describes our two experiments; discussion is reserved for the end.

II. EXPERIMENT 1

A. Methods

1) Subjects
30 participants (17 female, 13 male) were recruited using the Carnegie Mellon University Center for Behavioral and Decision Research subject pool. Ages ranged from 19 to 58 years (mean 30.26, SD 13). All subjects self-identified as having normal hearing and normal or corrected-to-normal vision. All were considered non-musicians, with a mean of 2.45 years of formal music training. All were native speakers of American English. No subjects reported having absolute pitch (AP) or knowledge of any family members with AP.

2) Stimuli
We used 50 sequences, each with 10 pure tones of 500 ms duration and an inter-stimulus interval of 500 ms. Each of the 10 tones was chosen randomly from the diatonic collection corresponding to the G-major scale (G4-F#5, or 392 Hz-740 Hz). Sequences were randomly transposed to all 12 pitch-class levels and then transposed back to the original G4-F#5 span, creating an equal distribution of pitches across the chromatic scale.

3) Task
Subjects were asked to provide a rating of musicality for each sequence. They were instructed to rate each sequence on a Likert scale of 1 to 5. If they thought the sequence was very musical, they were asked to press the '5' key on the keyboard; if they thought the sequence was not musical at all, they were asked to press the '1' key. Subjects were asked to use the entire range of ratings and were given 1.5 seconds to respond. If they responded before the sequence had completed or failed to respond within 1.5 seconds, the response was not recorded and the experiment moved on to the next trial.

4) Design
Subjects completed a practice block of six trials, after which they were asked if they understood the task. Subjects were given the option to complete additional practice trials or to move on to the main part of the experiment. The 50 sequences were played eight times apiece for a total of 400 trials. Trials were presented in 20 blocks of 20 trials, with a forced 20-second break between blocks. Stimuli were presented via headphones (Sennheiser HD210) at a fixed, comfortable listening volume (~82 dBA SPL). The paradigm ran on an Apple Mac Mini and responses were recorded on a standard computer keyboard. The paradigm was coded using MATLAB with Psychtoolbox extensions [3][33].

Fig. 1 Group (N=30) z-scored stimulus ratings of 50 sequences. Lower scores indicate less-musical sequences and higher scores indicate more-musical sequences.

5) Results
The results, shown in Fig. 1, show clear separability between the most musical and the least musical sequences, as well as a graded representation across the scale. Subjects used the entire scale in rating the sequences. For analysis, ratings were divided into two equal-sized groups (lower 25 and upper
(e.g., such as varied rhythm) that our melodic stimuli do not possess. Because our sequences are relatively feature-poor, we used prediction by partial matching (PPM) to measure the compressibility of each sequence. PPM is an adaptive statistical compression technique that predicts the nth symbol in a string by using a context that optimally varies in length according to its predictive success [26]. We used a PPM encoder on consecutive intervals in each sequence and limited the context size to a maximum of 5. Since motive-rich music is more highly valued than motive-poor music by some music scholars [31][6][12][49], we expect that greater compressibility will correlate with greater musicality.

5) Entropy
Probabilistic entropy has a rich history in music scholarship, largely centered on style analysis [48][20][41]. It has also been used to model melodic complexity, musical expectation, and aesthetic experience [25][10][23]. Entropy is closely related to the ideas of compression found in our PPM analysis and the probabilistic distribution found in our key-finding analysis. Relative entropy (Hr) is a convenient and standard metric for characterizing how varied the distribution of symbols (pitches) is given a fixed alphabet (the diatonic scale). It is difficult to make predictions about how varied levels of uncertainty might affect musicality. We might predict that a 10-note sequence composed using a single pitch might be equally as musical as a 10-note sequence employing all seven diatonic pitches. Nevertheless, because of its continued appearance in music-related studies, we included relative entropy in the sequence profile. Fig. 3 shows the aforementioned metrics associated with the three most- and least-musical sequences.

Fig. 3 Feature metrics for the three most- and three least-musical sequences shown.

IV. RESULTS
The first three intervallic features (Range, Mean, and SD) are strongly negatively correlated (p < 0.001) with z-mean scores. Subjects found sequences with smaller range, smaller mean interval size, and smaller standard deviation of the mean to be more musical (Table 1). This confirms our prediction and supports Huron's detailed work showing that, cross-culturally, sequences privilege small intervals [19]. Interestingly, features such as MIV, Contour, Key, Hr, and PPM do not significantly correlate with the ratings.

We hypothesized that subjects' ratings would have uncorrelated principal components that would each correlate with the different musical features described above. That is, we expected components to be separable in terms of these features, suggesting that these features are used, to varying degrees, in combination when subjects make musicality judgments of pitch sequences. PCA of the ratings was performed and permutation testing showed that only the first three components were significant, explaining a combined 38% of the variance; however, this is a conservative measure. Because information is often embedded in additional components, we include the top five. The feature profile of each tune was correlated with the eigenvalues of the significant PCA components in order to identify dominant strategies listeners use to make decisions about musicality. The PCA analysis is shown in Fig. 4.

TABLE 1
SIGNIFICANT CORRELATIONS BETWEEN FEATURES AND RATINGS.

Table 2 shows correlations between eigenvalues of the first five components and the eight metrics. Significant correlations are highlighted in red. Of the eight, only the intervallic features Range, Mean, and SD correlate significantly with the first component. Mean also correlates strongly with the second component, as does Key; Hr and PPM appear less strongly, followed by MIV. Contour correlates with the third component, followed closely by a reappearance of PPM.

Fig. 4 A principal components analysis (PCA) on subjects' ratings returns three significant uncorrelated components explaining 38% of the variance. Gray bars indicate critical R2 values from permutation testing; blue bars indicate R2 values from PCA.

V. DISCUSSION
The goal of this study was to understand the conditions under which randomly generated sequences are organized into
(or perceived as) auditory objects. Despite the extensive work that has been done on auditory stream formation, few studies have examined how this process plays out in novel musical contexts. Notable treatments address how bistable auditory streams are formed in polyphonic music and how acoustic features, such as timbre, contribute to object formation [47][24][45]. Unlike previous studies that present multi-stable auditory scenes, this study presents a single stream in a non-competitive auditory environment. The streams were controlled for range, timbre, tempo, and event duration, so as to minimize the possibility that any one stream could be interpreted as multi-stable. However, the perception of multiple streams and how it contributes to musical object formation remains an open question.

eigenvalues. The pattern is more diffuse, with a slight bilateral distribution of high versus low musicality across the x axis, but no grouping along the y axis. We interpret this as evidence that the non-significant metrics (Key, Hr, PPM, MIV, and Contour) are not contributing meaningfully to the variance at this level.

TABLE II
SIGNIFICANT CORRELATIONS (IN RED) BETWEEN FEATURES AND COEFFICIENTS
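The analysis pipeline described above (PCA on the ratings, with permutation testing to decide which components are significant) can be sketched in Python/NumPy. This is an illustrative reconstruction, not the authors' MATLAB code; it assumes a subjects-by-sequences ratings matrix and builds the null distribution by shuffling each column independently:

```python
import numpy as np

def pca_explained_variance(X):
    """Fraction of variance explained by each principal component."""
    Xc = X - X.mean(axis=0)                  # center each column (sequence)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values
    var = s ** 2
    return var / var.sum()

def significant_components(X, n_perm=200, alpha=0.05, seed=None):
    """Permutation test: a component is significant when its explained
    variance exceeds the (1 - alpha) quantile of the permuted null."""
    rng = np.random.default_rng(seed)
    observed = pca_explained_variance(X)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        # Shuffle each column independently to destroy between-column structure
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = pca_explained_variance(Xp)
    critical = np.quantile(null, 1 - alpha, axis=0)
    return observed > critical, observed, critical
```

With ratings from 30 subjects over 50 sequences, `significant_components(ratings)` returns one boolean flag per component alongside the observed and critical explained-variance values, analogous to the blue and gray bars of a plot like Fig. 4.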
represent musical experiences. Experiment 2 is constrained by the set of features we have chosen. Whether or not a feature has explanatory power, of course, depends on the encoding of that feature. It is possible that we are limited in thinking about musical features in specific ways, finding just enough evidence to continue considering them an important part of our mental representations of music.

Fig. 6 Linear combination of metrics significantly correlated with component 1 plotted against eigenvalues for the 10 most musical (blue) and 10 least musical (green) sequences.

Fig. 7 Linear combination of metrics not significantly correlated with component 1 plotted against eigenvalues. Z-score for each sequence is color coded (red = high, blue = low).

Our approach to explicating these mental representations provides a new way to examine music. This study sought to understand the boundary conditions for how auditory objects can become musical objects. We show that musicality is a variable quality that auditory objects possess in greater or lesser degrees. We also show that there is considerable inter-subjective agreement about what constitutes a highly musical object. We see that musicality is a property of auditory objects that emerges under specific conditions, and exists on a continuum. Placement on this continuum is the result of multiple interacting features. Our results are informative for future music studies, and our empirically derived collection of 10-tone sequences is fertile territory for exploration. Future work includes identifying the neural correlates of the varied perception of auditory objects.

REFERENCES
[1] J. J. Bharucha, "Music cognition and perceptual facilitation: A connectionist framework," Music Perception, pp. 1–30, 1987.
[2] J. K. Bizley and Y. E. Cohen, "The what, where and how of auditory-object perception," Nature Reviews Neuroscience, vol. 14, no. 10, pp. 693–707, 2013.
[3] D. H. Brainard, "The Psychophysics Toolbox," Spatial Vision, vol. 10, pp. 433–436, 1997.
[4] A. S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press, 1994.
[5] E. C. Cherry, "Some experiments on the recognition of speech, with one and with two ears," The Journal of the Acoustical Society of America, vol. 25, no. 5, pp. 975–979, 1953.
[6] E. T. Cone, "On derivation: Syntax and rhetoric," Music Analysis, vol. 6, no. 3, pp. 237–255, 1987.
[7] M. S. Cuthbert and C. Ariza, "Music21: A toolkit for computer-aided musicology and symbolic music data," 2010.
[8] W. J. Dowling and D. S. Fujitani, "Contour, interval, and pitch recognition in memory for melodies," The Journal of the Acoustical Society of America, vol. 49, no. 2B, pp. 524–531, 1971.
[9] W. J. Dowling, "Scale and contour: Two components of a theory of memory for melodies," Psychological Review, vol. 85, no. 4, p. 341, 1978.
[10] T. Eerola and P. Toiviainen, "MIDI Toolbox: MATLAB tools for music research," 2004.
[11] M. M. Farbood, G. Marcus, and D. Poeppel, "Temporal dynamics and the identification of musical key," Journal of Experimental Psychology: Human Perception and Performance, vol. 39, no. 4, p. 911, 2013.
[12] R. O. Gjerdingen, "Revisiting Meyer's 'grammatical simplicity and relational richness,'" 1992.
[13] T. D. Griffiths and J. D. Warren, "What is an auditory object?," Nature Reviews Neuroscience, vol. 5, no. 11, pp. 887–892, 2004.
[14] L. L. Holt and A. J. Lotto, "Speech perception within an auditory cognitive science framework," Current Directions in Psychological Science, vol. 17, no. 1, pp. 42–46, Feb. 2008.
[15] D. Huron, "The Humdrum Toolkit: Software for music research," Center for Computer Assisted Research in the Humanities, Ohio State University, 1993.
[16] D. Huron, "The melodic arch in Western folksongs," Computing in Musicology, vol. 10, pp. 3–23, 1996.
[17] D. Huron, "Music research using Humdrum: A user's guide," Center for Computer Assisted Research in the Humanities, Stanford, California, 1999.
[18] D. Huron, "Music information processing using the Humdrum Toolkit: Concepts, examples, and lessons," Computer Music Journal, vol. 26, no. 2, pp. 11–26, 2002.
[19] D. B. Huron, Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, 2006.
[20] L. Knopoff and W. Hutchinson, "Entropy as a measure of style: The influence of sample length," Journal of Music Theory, vol. 27, no. 1, pp. 75–97, 1983.
[21] C. L. Krumhansl and M. A. Schmuckler, "Key-finding in music: An algorithm based on pattern matching to tonal hierarchies," in Mathematical Psychology Meeting, 1986.
[22] C. L. Krumhansl, "Perceptual structures for tonal music," Music Perception, pp. 28–62, 1983.
[23] E. H. Margulis and A. P. Beatty, "Musical style, psychoaesthetics, and prospects for entropy as an analytic tool," Computer Music Journal, vol. 32, no. 4, pp. 64–78, 2008.
[24] S. McAdams, "Perspectives on the contribution of timbre to musical structure," Computer Music Journal, vol. 23, no. 3, pp. 85–102, 1999.
[25] L. B. Meyer, Emotion and Meaning in Music. University of Chicago Press, 2008.
[26] A. Moffat, "Implementing the PPM data compression scheme," IEEE Transactions on Communications, vol. 38, no. 11, pp. 1917–1921, 1990.
[27] B. C. Moore and H. Gockel, "Factors influencing sequential stream
segregation,” Acta Acustica United with Acustica, vol. 88, no. 3, pp.
320–333, 2002.
[28] B. C. Moore and H. E. Gockel, “Properties of auditory stream forma-
tion,” Philosophical Transactions of the Royal Society B: Biological
Sciences, vol. 367, no. 1591, pp. 919–931, 2012.
[29] R. Morris, “Composition with Pitch-classes,” Yale University, 1987.
[30] R. D. Morris, “New directions in the theory and analysis of musical
contour,” Music Theory Spectrum, vol. 15, no. 2, pp. 205–228, 1993.
[31] E. Narmour, The analysis and cognition of melodic complexity: The
implication-realization model. University of Chicago Press, 1992.
[32] A. D. Patel, Music, Language, and the Brain. Oxford University Press, 2010.
[33] D. G. Pelli, “The VideoToolbox software for visual psychophysics:
Transforming numbers into movies,” Spatial vision, vol. 10, no. 4, pp.
437–442, 1997.
[34] I. Peretz and M. Coltheart, “Modularity of music processing,” Nature
neuroscience, vol. 6, no. 7, pp. 688–691, 2003.
[35] I. Peretz and K. L. Hyde, “What is specific to music processing? In-
sights from congenital amusia,” Trends in cognitive sciences, vol. 7,
no. 8, pp. 362–367, 2003.
[36] I. Peretz, “The nature of music from a biological perspective,” Cogni-
tion, vol. 100, no. 1, pp. 1–32, 2006.
[37] L. Quinto, W. F. Thompson, and A. Taylor, “The contributions of
compositional structure and performance expression to the communi-
cation of emotion in music,” Psychology of Music, p.
0305735613482023, 2013.
[38] H. Schaffrath and D. Huron, “The Essen folksong collection in the
humdrum kern format,” Menlo Park, CA: Center for Computer As-
sisted Research in the Humanities, 1995.
[39] E. G. Schellenberg, “Simplifying the implication-realization model of
melodic expectancy,” Music Perception: An Interdisciplinary Journal,
vol. 14, no. 3, pp. 295–318, 1997.
[40] M. D. Schulkind, R. J. Posner, and D. C. Rubin, "Musical features that facilitate melody identification: How do you know it's 'your' song when they finally play it?," Music Perception: An Interdisciplinary Journal, vol. 21, no. 2, pp. 217–249, 2003.
[41] J. L. Snyder, “Entropy as a measure of musical style: the influence of
a priori assumptions,” Music Theory Spectrum, vol. 12, no. 1, pp.
121–160, 1990.
[42] D. L. Strait, N. Kraus, A. Parbery-Clark, and R. Ashley, “Musical
experience shapes top-down auditory mechanisms: evidence from
masking and auditory attention performance,” Hearing research, vol.
261, no. 1, pp. 22–29, 2010.
[43] D. Temperley, “What’s key for key? The Krumhansl-Schmuckler key-
finding algorithm reconsidered,” Music Perception, pp. 65–100, 1999.
[44] D. Temperley, Music and Probability. MIT Press, Cambridge, Mass., 2007.
[45] M. Uhlig, M. T. Fairhurst, and P. E. Keller, “The importance of inte-
gration and top-down salience when listening to complex multi-part
musical stimuli,” Neuroimage, vol. 77, pp. 52–61, 2013.
[46] N. L. Wood and N. Cowan, “The cocktail party phenomenon revis-
ited: attention and memory in the classic selective listening procedure
of Cherry (1953).,” Journal of Experimental Psychology: General,
vol. 124, no. 3, p. 243, 1995.
[47] J. K. Wright and A. S. Bregman, “Auditory stream segregation and the
control of dissonance in polyphonic music,” Contemporary Music Re-
view, vol. 2, no. 1, pp. 63–92, 1987.
[48] J. E. Youngblood, “Style as information,” Journal of Music Theory,
vol. 2, no. 1, pp. 24–35, 1958.
[49] L. M. Zbikowski, “Musical coherence, motive, and categorization,”
Music Perception: An Interdisciplinary Journal, vol. 17, no. 1, pp. 5–
42, 1999.
Table 1. Dissonance rating between two intervals (Cook, 1999; Nordmark & Fahlén, 1988)

Interval Name    Number of Semitones    Dissonance Rating (1-7)
Octave           12                     1.7
Fifth            7                      1.7
Fourth           5                      2.0
Major third      4                      2.0
Major sixth      9                      2.4
Minor third      3                      2.6
Minor sixth      8                      3.0
Minor seventh    10                     3.3
Major second     2                      3.9
Tritone          6                      4.0
Major seventh    11                     5.3
Minor second     1                      5.7
Minor ninth      13                     5.8

Figure 1. The circle of fifths

3) Harmonic relations: The third substitution cost function is defined based on theoretical harmonic relations (Lerdahl, 2001). The distances between chord pairs, all of which belong to harmonic functions in C major (as we transposed all chords to this key), are selected from the theoretical harmonic relations (see Table 2). The values between 5 and 8 are scaled to the range between 1 and 2 as a substitution cost.

Table 2. Theoretical harmonic relations from Lerdahl (2001), for the seven diatonic triads

        I    ii   iii  IV   V    vi   vii
I       0
ii      8    0
iii     7    8    0
IV      5    7    8    0
V       5    5    7    8    0
vi      7    5    5    7    8    0
vii     8    7    5    5    7    8    0

4) Tonal pitch space: For the fourth substitution cost function, the tonal pitch space (TPS) model (Lerdahl, 2001) was used to calculate a distance between two chords. The basis of the model is the basic space (see Figure 2), which comprises five hierarchical levels (a-e) consisting of pitch-class subsets ordered from root level to chromatic level. The chordal distance is calculated as the number of non-common pitch classes within the basic spaces of the two chords, divided by two. We calculated the distance from level a to c (levels a-c), as we limited our chord selection to triads only, resulting in a maximum distance of 6. Again, the non-zero chord distance is scaled to the range between 1 and 2 as a substitution cost.

(a) root level: 0
(b) fifth level: 0 7
(c) triadic level: 0 4 7
(d) diatonic level: 0 2 4 5 7 9 11
(e) chromatic level: 0 1 2 3 4 5 6 7 8 9 10 11

Figure 2. Diatonic basic space for C major triad (C=0, C#=1 ... B=11)

5) Hierarchy of harmonic stability: For the fifth substitution cost function, distances derived from the hierarchy of harmonic stability (Bharucha & Krumhansl, 1983) are applied. In this model, the psychological distances between chords reflect both key membership and stability within the key: 1) chords from the same key are perceived as more closely related than those from different keys; 2) chords in the harmonic core (tonic, subdominant, and dominant) are perceived as more closely related to each other than to chords that are in the key but not in the core. Based on this model, the substitution cost between two different chords C1 and C2 is computed as follows:

cost = 0 when C1 = C2
cost = 1 when C1, C2 ∈ K and C1, C2 ∈ S
cost = 2 when C1, C2 ∈ K and only one of C1, C2 ∈ S
cost = 3 when C1, C2 ∈ K and C1, C2 ∉ S

where K contains the seven triads built upon the seven degrees of the diatonic scale and S is the set containing the three harmonic chords (I, IV, and V) of the harmonic core. The non-zero substitution cost is scaled to the range between 1 and 2 to be used in the edit distance.

B. Study 2: Human Survey Experiment

The objective of the survey experiment is to gather human similarity judgment data to compare with the computed similarity values. A total of 28 subjects participated in the experiment. Participants were 24-58 years old (mean μ = 27.7) and were recruited without regard to their musical training.

Chord sequence pairs generated using US copyright infringement cases from Study 1 were also used, in audio form, in Study 2. Chords were generated in a C major key, within a range of two octaves centred at middle C. Chord sequences were composed of 4 chords, each lasting one quarter-note beat (140 bpm), resulting in a total audio length of 6 seconds. Human experiments were conducted as an on-line survey. Before starting the questionnaire, participants were informed about the task and presented with examples of very similar and very dissimilar chord sequences. During the survey, participants listened to the audio clips for each chord sequence pair and were asked to rate the similarity of these pairs on a scale from 1 to 4 (1 = very similar, 2 = similar, 3 = dissimilar, and 4 = very dissimilar). The survey took approximately 30 minutes to finish.
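The similarity computation the paper builds on, an edit distance over chord sequences with fixed insertion/deletion costs and a model-specific substitution cost, can be sketched as follows. This is an illustrative Python reconstruction, not the authors' code: `stability_cost` mirrors the four-level hierarchy-of-harmonic-stability scheme above, with non-zero costs rescaled into [1, 2].

```python
def edit_distance(seq_a, seq_b, sub_cost, indel_cost=1.0):
    """Wagner-Fischer dynamic programming with a custom substitution cost."""
    n, m = len(seq_a), len(seq_b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,      # deletion
                d[i][j - 1] + indel_cost,      # insertion
                d[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),
            )
    return d[n][m]

# S: the harmonic core (I, IV, V); chords here are illustrative Roman-numeral strings.
CORE = {"I", "IV", "V"}

def stability_cost(c1, c2):
    """Substitution cost per the hierarchy-of-harmonic-stability scheme:
    0 if identical; otherwise raw cost 1, 2, or 3 by core membership,
    rescaled from [1, 3] into [1, 2] as described in the paper."""
    if c1 == c2:
        return 0.0
    raw = {2: 1.0, 1: 2.0, 0: 3.0}[(c1 in CORE) + (c2 in CORE)]
    return 1.0 + (raw - 1.0) / 2.0

# One substitution (IV -> ii, only one chord in the core) costs 1.5:
print(edit_distance(["I", "IV", "V", "I"], ["I", "ii", "V", "I"], stability_cost))
```

Swapping in a different `sub_cost` (for example, one derived from Table 1's dissonance ratings or from the TPS basic-space distance) yields the other similarity measures compared in Figure 3.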
III. RESULTS

The comparison results are summarized in Figure 3. Pearson's correlation coefficient was used to compare the results between the computational measures and the human survey data.

Tonal dissonance of intervals: r = 0.6241**
Circle of fifths: r = 0.6303**
Harmonic relations: r = 0.6500**
Tonal pitch space (TPS): r = 0.6597**
Hierarchy of harmonic stability: r = 0.6694**
Combination of TPS & hierarchy of harmonic stability: r = 0.6710**
**p < 0.01.
Figure 3. Pearson's correlation coefficients between computational measures and human judgements of chord sequence similarity

As shown in Figure 3, all of the computed similarity measures correlated significantly with human judgments of similarity (p < 0.01). The highest correlation was achieved by the similarity measure based on Bharucha-Krumhansl's hierarchy of harmonic stability model (r = 0.669, p < 0.01), followed by the TPS model (r = 0.659, p < 0.01). This suggests that key membership and harmonic stability play an important role in the perception of similarity based on harmony, as do the common pitch classes shared by two chords. Every correlation for measures computed at the chord level was higher than those using root notes only, which implies that more complex chord representations can enhance the performance of harmonic similarity measurements. A combination of results from the two highest-ranking models, hierarchy of harmonic stability and TPS, provides even higher correlation with the human judgements. This suggests that considering multiple tonality models can yield better results than relying on a single model only.

IV. CONCLUSIONS

We presented five different types of chord sequence similarity measures based on different psychological distances of harmony. We also gathered human judgements of similarity between pairs of chord sequences. The main contribution of this study is to provide theoretical and empirical evidence on the relationship between computational approaches and human judgements of harmonic similarity. Furthermore, the results show that using a combination of similarity measures from different tonality models can improve correlation with human survey data. However, numerous issues remain to be addressed in the measurement of chord sequence similarity. In this study, we handled only the substitution cost, fixing the deletion and insertion costs to a constant value. Considering all cases separately and in conjunction will help in comparing more diverse chord sequences and in assessing the importance of each operation. Finally, future work needs to take contextual information within a chord sequence into account.

REFERENCES

Bharucha, J., & Krumhansl, C. L. (1983). The representation of harmonic structure in music: Hierarchies of stability as a function of context. Cognition, 13(1), 63-102.
Casey, M. A., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., & Slaney, M. (2008). Content-based music information retrieval: Current directions and future challenges. Proceedings of the IEEE, 96(4), 668-696.
Cook, P. R. (1999). Music, cognition, and computerized sound: An introduction to psychoacoustics. The MIT Press.
De Haas, B., Veltkamp, R. C., & Wiering, F. (2008). Tonal pitch step distance: A similarity measure for chord progressions. Paper presented at ISMIR.
De Haas, W. B., Wiering, F., & Veltkamp, R. C. (2013). A geometrical distance measure for determining the similarity of musical harmony. International Journal of Multimedia Information Retrieval, 2(3), 189-202.
Freedman, D. (2015). Correlational harmonic metrics: Bridging computational and human notions of musical harmony.
Hanna, P., Robine, M., & Rocher, T. (2009). An alignment based system for chord sequence retrieval. Paper presented at the Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries.
Lerdahl, F. (2001). Tonal pitch space. Oxford University Press.
Nordmark, J., & Fahlén, L. E. (1988). Beat theories of musical consonance. KTH Royal Institute of Technology STL-QPSR, 29(1), 111-122.
Rocher, T., Robine, M., Hanna, P., & Desainte-Catherine, M. (2010). A survey of chord distances with comparison for chord analysis. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.
With the rapid, global growth of audio-visual media use in daily life, theory and research
focusing on the perception and cognition of multimedia presentation have never been more
important or more timely. Until recently, however, there has been little empirical focus
on the role of music in the multimedia experience, though the work in this area is gaining
momentum (Tan, Cohen, Lipscomb, & Kendall, 2013). The research presented in this
symposium breaks new ground in theory, methodology, and empirical investigation with
respect to the unique physiological and mental processes involved when humans engage
with music and moving images in the context of film and television. The first
presentation applies analytical frameworks of embodied cognition to various facets of the
temporal domain of music within film. Departing from the behavioral measures typically
used in the study of music and multimedia, the next presentation provides data collected
using magnetoencephalography (MEG) to explore how the combination of audio and
visuals in film may stimulate brain activity that differs from that caused by audio or
visuals alone. Delving further into the semantic resonances of the film-music
relationship, the third presentation considers film-music pairings, with particular attention
to pairs perceived as incongruent. The last presentation investigates the possible links
between music used in television commercials and various sociodemographic variables.
Following the four presentations, the discussant will lead an extended discussion with the
presenters and attendees, highlighting the diversity of methods and interdisciplinary
character of multimodal research (as represented by our symposium). The discussion
will provide the opportunity to raise questions, delve into topics in more depth, and draw
rich thematic connections between the speakers’ topics and within the field of music
psychology at large.
ISBN 1-876346-65-5 © ICMPC14 122 ICMPC14, July 5–9, 2016, San Francisco, USA
Although most film-scoring techniques have gradually emerged through the intuitive use
of music, recent research in embodied cognition lends itself to examining these
techniques from theoretical and empirical perspectives: grounded cognition (e.g.
Barsalou, 2008) maintains that meaning is generated and stored in sensorimotor
experiences; ecological and evolutionary approaches (e.g. Clarke, 2005) take the notion of
‘affordances’ (Gibson, 1983) as a point of departure; conceptual metaphor and schema
theories (e.g. Lakoff & Johnson, 2003) propose that cognitive representations and
operations emerge from sensorimotor processes; and the discovery of the mirror neuron
system in macaques (e.g. Rizzolatti, Fogassi, & Gallese, 2001) reveals neurophysiological
mechanisms that underpin social cognition. This presentation focuses exclusively on the
temporal domain of music within film, and traces the logic that motivates embodied
meaning while shedding light on a wide range of phenomena, including parallels between
the visual and aural domains, the effect of tempo fluctuations on arousal level, the role of
beat anticipation and predictability in relation to perceptual judgment of closure and
stability, and the influence of meter in depicting collective motor coordination that may
lead to social bonding and the construction of individual and group identities. By
applying analytical frameworks from embodied cognition to a number of film music
examples, this presentation extends beyond cognitive mappings of physicality to include
cultural practices and values, and elucidates how musical temporality supports,
highlights, or comments on other facets of the cinematic experience.
Ideas of congruence recur in studies that address the impact of various properties of the
film-music relationship on audience perception and response, and relate to shared
properties of structural, semantic or holistic dimensions of the stimuli. Less research has
focused on audiovisual incongruence, which is often conflated with judgments of
inappropriateness in a film-music relationship. Yet, such judgments are multi-
dimensional, context-dependent and subjective. This study explores the relationship
between the perceived appropriateness of film-music pairings, and the number of
semantic properties shared by the component extracts that comprise them. Participants
(N=40) rated four music and four film excerpts, which reflected differing states on visual
analogue scales labeled “agitated-calm” and “happy-sad.” They also rated all possible
combinations of these excerpts (i.e., 16 pairings), using the same scales and a third scale
labeled “audiovisual pairing is appropriate-audiovisual pairing is inappropriate.” Results
show that, in general, pairings with shared properties on both the “agitated-calm” and
“happy-sad” scales were rated as more appropriate; pairings with fewer shared properties
were rated as less appropriate. Significant differences exist between appropriateness
ratings for pairings that share properties on both “agitated-calm” and “happy-sad” scales
and pairings that share properties on just one of these scales, or on neither scale. Yet,
some of the film-music pairings challenge these broader patterns, including those using
film rated as “calm.” Focusing on one such pairing, which combines film from The
Shawshank Redemption and music from the score for Gladiator, this paper will question
ideas of additivity used to explain film-music interactions. The researchers will contend
that a psycho-semiotic approach, involving musicological, semiotic, and conceptual
analyses, can complement empirical studies to more holistically account for the
complexity of film-music incongruence.
Previous studies have uncovered relationships between listeners’ musical preferences and
certain sociodemographic factors (e.g., sex, age, income, education, and musical
training). Similarly, much of television advertising is targeted at consumers based on
relevant sociodemographic categories. This study thus aimed to measure differences
between sociodemographic groups regarding their reception of commercials (as measured
by appeal and congruency ratings) based on the type of music employed. Because
classical music is a relatively clearly defined genre that is recognizably different from
most popular styles, and because, as previous studies have shown, its listeners typically
fall into clear sociodemographic groups (e.g., older, richer, and more educated), the focus
of this study was on classical music, though distractor tracks using various other styles
were included as well. The 557 respondents were divided into four sociodemographically
equivalent groups as follows: Group 1 listened only to the musical tracks of the stimuli
(i.e., without image tracks); Group 2 watched only the video tracks (i.e., without music);
Group 3 watched the original commercials with music and video; and Group 4 watched
“recombined” commercials in which music tracks were swapped. The stimuli comprised
nine nationally aired commercials for products ostensibly appealing to different
sociodemographic groups. In order to control for the effect of music, the chosen
commercials consisted largely or only of music and images, five using classical music,
four using other styles. The results of the study showed that while there were significant
correlations between respondents’ sociodemographic background and their preferences
for certain musical genres, and between their sociodemographic background and
preferences for certain ads/products, these differences disappeared when music and
images were combined, regardless of the musical style used. The findings suggest that
sociodemographic differences do not play a central role in the processing of music in
commercials, classical or otherwise.
entirely new as a melodic corpus, having been used in previous studies before, this corpus contains solos (N=59) with added harmonic information and metadata (such as recording information and information on the sidemen) in **kern format for processing with David Huron's Humdrum toolkit (Huron, 1995). We also introduce solos (N=57) from the Complete Transcriptions of Clifford Brown by Marc Lewis (1991). While equal in number of solos to the Omnibook, this Brown corpus contains almost twice as much data and additionally contains realized chord progressions. We also use some transcriptions of Dizzy Gillespie (N=12). All corpus material is available for use upon request.

III. IDIOMATICISM AND KEY CHOICE

Our first task was to replicate the findings of Huron and Berec (2009), which showed that compositional choices were more idiomatic for an instrument if the composer of the piece was themselves a composer for that instrument. Our study is not an exact replication in that we rely on improvisatory material as opposed to composed music. Below we outline how these data were obtained. These ratings are then used throughout the rest of the studies.

A. Obtaining Ratings of Idiomaticism

A web study was set up to collect data from advanced players on the perceived difficulty of note-to-note transitions by instrument. Respondents who identified as advanced-level players of alto saxophone (N=11), tenor saxophone (N=7), and trumpet (N=15) made ratings on random subsets of a collection of notes presented on the treble clef, drawn from a pool of every note transition from G3 to G5. A priori, we decided to use only natural, sharp, and flat notes, excluding all double sharps and flats, since this study was not based on orthography. Participants were presented with the musical notation and asked to make a rating on the keyboard using a 7-point Likert scale, with 1 indicating "One of the Easiest Intervals to Play" and 7 indicating "One of the Most Difficult Intervals to Play". Participants also had the option to respond with a 0 to indicate that an interval was unplayable on their instrument, since the pool of notes used was not specific to either instrument's range and will serve as the basis for collecting future data on note-to-note transitions on other instruments.

Ratings from each instrument were collected, pooled, averaged, and then mapped onto each corpus. Solos from each corpus were additionally transposed up and down 11 half steps, with the appropriate difficulty ratings mapped onto the new note transitions. Solos were transposed both up and down to further explore effects of range.

B. Improvisation and Key Choice

First, we examined the works of a trumpeter (much like the Huron and Berec study), performing a t-test between the original key in which Clifford Brown played his solos and two half steps up and down, comparing average idiomaticism ratings of the original key with its transpositions. On average, solos played at Brown's original transposition scored lower, with a rating of 1.31 compared to the 1.39 rating from a half step below (p < .001). Interestingly, no statistically significant difference was found in the idiomaticism ratings when transposed a half step above. This could be due to the fact that the study took all pieces and averaged them together, or we simply need more musicians to rate the pieces to provide a more accurate model.

We also decided to examine the relationship between the instrument's transposed pitch and its concert pitch. We performed a t-test between the original key in which Charlie Parker played his solos and two half steps up and down, comparing average idiomaticism ratings of the original key with its transpositions. On average, original-key solos scored an average idiomaticism rating of 1.25, with a half step above scoring 1.48 on average (p < .001) and a half step below scoring 1.47 (p < .001). Significance was still maintained after using the Bonferroni correction for multiple t-tests.

Additionally, difficulty ratings for each instrument were mapped onto each corpus to explore the effect of difficulty ratings on transpositions. The charts below (Figures 1-2) show how ratings from the three instruments map onto the Charlie Parker corpus. Although individual differences between keys are not overwhelming, there is a clear effect of range.

Figure 1. Difficulty ratings mapped onto Charlie Parker solos. The middle of the graph is the untransposed rating, with transpositions up twelve semitones to the right, and down twelve semitones to the left.
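The mapping-and-comparison procedure described in this section can be sketched as follows. This is our reconstruction, not the authors' code: the difficulty table is a randomly generated stand-in for the pooled web-study ratings, the solos are random pitch sequences, and `paired_t` computes only the t statistic (no p-value or Bonferroni step).

```python
# Hedged sketch: average pooled note-to-note difficulty ratings over each solo,
# repeat at a transposition, and compare the two with a paired t statistic.
import math
import random
import statistics

random.seed(1)
LOW, HIGH = 50, 85  # MIDI range covered by the toy rating table
# Toy stand-in for the pooled ratings: mean 1-7 difficulty per transition.
difficulty = {(a, b): random.uniform(1, 7)
              for a in range(LOW, HIGH) for b in range(LOW, HIGH)}

def mean_difficulty(solo, semitones=0):
    """Mean transition difficulty of a solo transposed by `semitones`."""
    pitches = [p + semitones for p in solo]
    return statistics.mean(difficulty[t] for t in zip(pitches, pitches[1:]))

def paired_t(xs, ys):
    """Paired t statistic for two equal-length samples."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Toy "corpus": 59 solos of 30 random pitches each.
solos = [[random.randrange(55, 80) for _ in range(30)] for _ in range(59)]
orig = [mean_difficulty(s) for s in solos]
up = [mean_difficulty(s, +1) for s in solos]  # a half step above
t = paired_t(orig, up)  # large |t| -> the keys differ in mean difficulty
```

With two planned comparisons (a half step up and a half step down), a Bonferroni-corrected threshold would be .05 / 2 = .025 per test.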
V. CONCLUSION
Our current study attempted to demonstrate the role of
note-to-note affordances on improvisation. While there does
seem to be some effect on key choice, the effect was not
replicated when going up a half step in the trumpet
ratings, a result that came as a surprise to us (including the jazz
trumpeter in the group). Our second hypothesis regarding a
trend in difficulty ratings over the course of an improvisation
showed no significant trends in either direction, and we
therefore failed to reject our null hypothesis. Future work will
gather more data on these note-to-note transitions, in the hopes
of having more reliable ratings, and will expand to look at the
interactions between difficulty ratings, entropy measures, and
rhythmic aspects, as well as the top-down effects of harmony,
improvisational interaction, and schema.
ACKNOWLEDGMENT
We would like to thank Craig Sapp for his assistance with
stimuli design and a transposition script.
REFERENCES
Aebersold, J., & Slone, K. (1978). Charlie Parker Omnibook. Atlantic
Music Corp.
Gjerdingen, R.O. (1986). The formation and deformation of
classic/romantic phrase schemata: A theoretical model and
historical study. Music Theory Spectrum, 8, 25-43.
Goldsen, M. H. (1978). Charlie Parker Omnibook: For C Instruments.
Atlantic Music Corp.
Huron, D., & Berec, J. (2009). Characterizing idiomatic organization
in music: A theory and case study of musical affordances.
Huron, D. B. (1994). The Humdrum Toolkit: Reference Manual.
Center for Computer Assisted Research in the Humanities.
Johnson-Laird, P. N. (2002). How Jazz Musicians Improvise. Music
Perception: An Interdisciplinary Journal, 19(3), 415-442.
The lack of a single widely accepted definition of improvisation has not hindered the
development of the field of improvisation studies. In fact, a critical literature is
flourishing by questioning its historical, sociological, and philosophical foundations.
However, in order to do scientific experiments on improvisation, an interim concrete
definition must be chosen. Much psychological and neuroscientific work has defined the
phenomenon in the problematic and culturally contingent terms of novelty and
spontaneity. In this theoretical paper, I argue that without being more critically sensitive
to concepts like “novel” and “spontaneous,” the scientific theories that are built around
them remain equally ambiguous and problematic. I propose a different conceptual
foundation to guide future empirical work. Improvisation can be considered as a way of
knowing. Cognitive-scientific research has elsewhere shown that musical training
changes how people perceive and perform music (i.e., how they know about it). Given
that musicians learn through different methods, it is reasonable to theorize that there are
differences between the ways musicians know about and use musical structures.
Improvisers train in ways that emphasize links between perception and action, not just in
expressive aspects of performance, but also learning which movements create which
harmonies and melodies in a way that may differ from score-dependent musicians.
Improvisers may have different knowledge of sensorimotor contingencies. Perception-
action coupling experimental methods could be adapted to compare groups of
improvisers and “non-improvisers,” categorized on the basis of questionnaires that
evaluate practice methods and musical experiences. Exploring these cognitive differences
can serve as a productive alternative foundation for empirical work that characterizes
what it means to be an improviser, and how improvisers acquire and use their knowledge.
considerably better than people with less musical training in any score-based perception task. Although musical training should also facilitate performance on the recording-based perception task, the difference in performance between people of various levels of musical training may be smaller in the case of the recording-based task. No hypothesis was made about whether people with a high level of musical training would be better at the score-based or recording-based perception tasks, since, to some degree, they are different skills.

As mentioned in the introduction, the structural features operationalized and measured in Warrenburg and Huron (2015) have been shown to differ between first and second themes. These features included durational pace (a proxy for tempo), average interval size, rhythmic smoothness (measured using the nPVI), mode, key, articulation, and dynamics. Accordingly, when differentiating between two themes, musicians should be able to use these criteria to make their judgments. Additional criteria may be used to make these distinctions as well, but no a priori hypotheses were made regarding other criteria. Importantly, harmony and phrase structure were not considered as a priori hypotheses. In order to compare the perceptual and corpus studies, it was decided to retain the same criteria in the perceptual study as were tested in the preceding corpus study. Harmonic implications will be discussed further at the end of the paper.

III. SAMPLE

For the purposes of this study, a sample of 22 piano works was taken from Harold Barlow and Sam Morgenstern's Dictionary of Musical Themes (1948). Sampled items were limited to works containing precisely two themes. That is, pieces containing an introductory theme or entailing more than two themes were excluded. Approximately half of the sample consisted of works in sonata form, with the remaining half written in other structural forms. The sampled piano works were composed between 1777 and 1939, with most works composed during the Classical and Romantic periods. Using the Humdrum Toolkit (Huron, 1993), the normalized pairwise variability index (Low, Grabe, & Nolan, 2000), average pace value, average interval size, mode, and key were estimated. In addition to these measures, manual coding of articulation and dynamics was carried out for all 22 piano works. Dynamics were coded using the traditional Italian levels: ppp, pp, p, mp, mf, f, ff, and fff. Sforzandos, crescendos, and diminuendos were ignored; only the basic dynamic level was coded. In the analysis, the eight levels were treated as ordinal data: ppp was coded as 1, pp as 2, and so on up to fff as 8. Similarly, articulation markings in the thematic passages were subjectively coded as one of five possible designations, again on an ordinal scale: very legato (coded 1), generally legato (coded 2), balanced/unclear (coded 3), generally staccato (coded 4), and very staccato (coded 5).

The comparison of thematic pairs from a single musical work could lead musically trained participants to respond in one of a few ways. The first outcome is that participants could distinguish between the musical cues written by the composer, using one or more characteristics of the music in order to make their judgment. Participants may use subtle criteria, such as variations in interval sizes, articulation, and dynamics, to distinguish between two themes. Another possibility is that participants use a heuristic to categorize first and second themes. A common heuristic would be that a major theme is a second theme, whereas its paired minor theme is a first theme, given conventions about primary and secondary themes in sonata form. Attempts were made to circumvent the use of such heuristics. Therefore, the sample consisted of first and second themes that are in the same mode. Additionally, all of the samples were transposed to have a tonal center of C: musical pairs in a major mode were transposed to C major and musical pairs in a minor mode were transposed to C minor. The transposed samples prevented key relationships (such as I–V or i–III) from affecting the discrimination between first and second themes. Therefore, the categorization of a pair of musical themes must be done with regard to more subtle musical criteria, instead of only basic knowledge of music theory.

IV. METHOD

The study consisted of two parts, both based on a two-alternative forced-choice task. The first part asked participants to examine the musical score of a pair of musical themes that came from the same work; they were asked to identify which theme they believed comes first in the music. The second part asked the same participants to listen to recordings of the same thematic pairs and to identify which theme was presented first in the music. Both the score-based task and the recording-based task consisted of two conditions: A) musical samples with dynamic and articulative information and B) musical samples without dynamic and articulative information.

A. Score-Based Perception

One of the hypotheses of this study is that participants are able to categorize first and second musical themes by looking at the information contained in the musical score. The task used to test this hypothesis required participants to distinguish between a pair of themes that came from the same musical work. In order to avoid variability between different published musical editions, all themes were reprinted using the same notation software (Finale 2014). The themes were notated as they first appear in the musical score. All external cues were eliminated from the Finale notation, including information such as measure numbers, tempo markings, and headers like "Minuet" or "Trio." Performance indications such as "animato," "dolce," and "furioso" were also removed from the thematic samples. In this study, the accompaniment of the theme was included, allowing some harmonic and textural cues to be available to the participants. Further discussion of this methodological choice is deferred to the end of the paper.

The hypothesis also specified that participants should perform better on this task when they are given access to articulation and dynamic information. Therefore, two versions of the musical samples were notated using Finale 2014. The first version consisted of the musical notes, dynamic markings, and articulation markings; this version was deemed the "With Dynamics and Articulation (With DA)" sample. A second version removed the dynamic and articulation markings and was deemed the "Without Dynamics and Articulation (Without DA)" sample. Therefore, in the "Without DA" sample, participants could only use certain features of the musical structure (such as nPVI, average interval size, and pace value) to make the determination of "first" and "second"
themes. An example of each of the two conditions is shown in Figure 1.

Figure 1. The same musical score shown in the "Without Dynamics and Articulation" condition (top line) and in the "With Dynamics and Articulation" condition (bottom line). The music is the first theme of Dvořák's Waltzes, Op. 54, No. 3.

Participants were presented with two themes from a musical work on a single screen. They were asked to categorize the pair of themes by indicating which theme they predicted came first in the music and which theme they judged to appear second in the music. Additionally, they were asked to rate the confidence of their response on a 10-point scale. The order of presentation for the musical works was randomized, as was the order of the first and second themes for each musical work.

In an attempt to circumvent demand characteristics, it was decided that each participant would not be presented with both the "With DA" and "Without DA" versions of each musical pair. Instead, the study consisted of two experimental groups, utilizing a between-subjects design. The groups were counterbalanced with regard to the dynamic and articulative condition. In Participant Group 1, 11 of the musical pairs were given "With DA" and 11 musical pairs were shown "Without DA." In Participant Group 2, these conditions were reversed, so that the musical pairs that were in the "With DA" condition in Participant Group 1 were the musical pairs that were in the "Without DA" condition in Participant Group 2. Therefore, every musical thematic pair was used in the experiment with dynamics and articulation and without dynamics and articulation, but each participant only saw each thematic pair in one of the two conditions. This design allows overall performance of the music categorization to be compared across the two conditions. Presumably, people might be expected to perform better in the "With DA" condition than in the "Without DA" condition.

B. Recording-Based Perception

The second major hypothesis is that listeners should be able to discriminate between first and second themes from audio recordings. The corollary hypothesis is that recordings including dynamics and articulation should result in better overall performance than recordings without these two features. Therefore, similar to the score-based perception task, the recording-based task involved two conditions: one with the inclusion of articulation and dynamic information ("With DA") and one without these features ("Without DA").

In order to compare a single participant's performance on the score-based and recording-based perceptual tasks, a within-subjects design was maintained across the two major parts of the study. This means that for a given musical sample, the condition (either "With DA" or "Without DA") was maintained across both parts. If a participant saw the score to a Mozart sonata with articulation and dynamics and saw the score to a Beethoven sonata without articulation and dynamics, he or she would hear the recording of the Mozart sonata with articulation and dynamics and hear the Beethoven recording without articulation and dynamics. The musical recordings were therefore counterbalanced in a parallel manner to the presentation of the notated scores. In summary, every musical recording in the experiment was heard with and without dynamics and articulation, but each participant only heard one of the two versions of the recordings.

To eliminate bias from real human performance, each recording of the piano music was made from the Finale notation file, using an automated setting called "Human Playback." The tempo chosen was the one indicated by the score or by judgment of the first author (who was familiar with all of the piano pieces in the sample). Two separate recordings of the piano works were obtained from Finale: a control condition (akin to "Without DA") and a performance condition (akin to "With DA"). In the performance condition, the audio file included articulation and dynamic markings.

The design of the recording-based task was the same as the design of the score-based task. Participants heard audio files of the first and second themes from the twenty-two piano works. They were told they could listen to each theme as many times as they wished. Their task was to indicate which theme came first in the music and which theme appeared second in the music. They were also asked to rate how confident they were about their response using a 10-point scale. The order of presentation was randomized, as was the order of the first and second themes for each musical work. Two musical samples were repeated during the course of the experiment in order to assess test-retest reliability.

C. Protocol

Participants completed the two tasks online, via the Qualtrics Research Suite. In order to account for individual variability, the study was created as a within-subjects design, so participants completed the recording- and score-based tasks with the same conditions of the musical pairs (either with or without dynamic and articulative information). Once again, participants were asked to indicate which of the two themes was first and their confidence.

After the participants completed both the score-based and recording-based tasks, listeners were asked to rank-order the possible criteria they may have used in making their determinations. In addition to the musical criteria from the corpus study (excluding mode and key, which were fixed), five blanks were provided so that participants could include other criteria in their rankings. Participants were also asked whether first or second themes were easier to classify (generalizable/had more of a prototype) and whether it was easier to classify first and second themes from the notated scores or from sound recordings. A final question was posed where participants were asked to comment on the relationship between thematic order (first and second themes) and thematic hierarchy (main theme versus subservient theme). After completing these tasks, participants were asked basic demographic questions about their musical experience: estimated years of music theory training, years of instrumental or vocal training, amount of time per week spent listening to classical music, whether or not they were familiar with
sonata-allegro form, and their overall familiarity with Finale and MIDI recordings.

V. RESULTS

A. Demographics

Forty-four participants, primarily from the Ohio State University School of Music, took part in the experiment. The median age was 20 (range 19 to 39). Of these participants, twenty-one (48%) were female. Participants exhibited a wide range of musical training, with 0-25 years of formal music theory training (mean = 3.66); six participants had one or fewer years of music theory training. Participants had 1-30 years of formal instrumental or vocal training (mean = 9.43). About half of the participants (48%) claimed that they had learned sonata-allegro form, and the majority of the participants had medium or no experience with Finale or MIDI (77% had medium to no Finale experience; 82% had medium to no experience with MIDI).

B. Statistical Analysis

Between the two parts of the study (score-based and recording-based perception), each subject made 44 judgments about first and second themes, with an additional 4 judgments for test-retest trials. With 44 participants, this means that 1936 judgments about first and second themes were made (2112 including test-retest trials). Of these judgments, participants selected the correct answer 57% of the time (1094/1936), which was significantly different from chance, χ²(1) = 32.802, p < 0.00001. Although the effect size is small, the results are nevertheless consistent with the hypothesis that, on average, participants are able to use structural information from the music to determine which theme from a pair comes first in a musical work.

One of the a priori hypotheses was that participants could distinguish first and second themes by looking at the information in the musical score. The data suggest that participants are able to do this at a level statistically different from chance, with 56% of the judgments (539/968) classified as the correct response, χ²(1) = 12.50, p = 0.0004. The results are shown graphically in the left half of Figure 2. The corollary to this hypothesis was that participants should perform better when the provided scores include dynamic and articulative information than when the given scores do not

participants would be better at categorizing themes from sound recordings when dynamics and articulation were included than from sound recordings without this information. The results of this corollary hypothesis were significant according to McNemar's test, with 58.9% correct judgments in the recordings with dynamics and articulation and 55.8% correct judgments in the recordings without dynamics and articulation, χ²(1) = 9.820, p = 0.0017. The results are shown graphically in the right half of Figure 2.

Participants performed with slightly higher accuracy in the recording-based task (57.3% correct judgments) than in the score-based task (55.7% correct judgments), McNemar's χ²(1) = 16.413, p < 0.0001. In response to a question about whether it was easier to classify first and second themes from the scores or the recorded excerpts, 61% (27/44) of the participants indicated that the recording-based task was easier than the score-based task.

Figure 2. Participants perform slightly better in situations when the music contains articulation and dynamic information ("With D.A.") than when the music does not contain this information ("Without D.A.") for both the score-based and recording-based judgments.

C. Criteria Used in Decisions

After the participants finished the two tasks, they were asked to rank-order the criteria they used to make their determinations. In addition to the criteria from the corpus study, five blanks were provided so participants could list their own criteria. The results are shown in Table 1. Of the five features expected to differ between first and second themes, rhythmic smoothness (an indication of nPVI) was
considered to be the most important feature to aid in
include this information. The individual differences from both
discrimination between first and second themes. Articulation
participant groups were combined to use McNemar’s test, a and dynamics were consistently rated on the lower half of the
nonparametric X2 test used to compare differences in list of criteria. This is not surprising, as only half of the
categorical data. The results were consistent with this samples seen by participants contained dynamic and
corollary hypothesis, with 56.6% correct judgments in the articulative information (the “With DA” condition, as opposed
scores with dynamics and articulation and 54.8% correct to the “Without DA” condition). The category marked “Other”
judgments in the scores without dynamic and articulation was also considered to be important in making the
markings. Although these values are close in magnitude, they determinations of first and second themes. Participants listed
are statistically different using McNemar’s test, X2(1) = 5.915, features such as the note density (texture), harmonic
p = 0.015. The results are graphically depicted in the left half complexity, melodic range, cadence type, phrase length, type
of Figure 2. of meter (simple or compound), speed, and difficulty
The other main a priori hypothesis was that participants (virtuosity) as features that helped make the determinations
could categorize first and second themes when listening to between first and second themes. These additional features are
MIDI sound recordings. As shown in the right half of Figure 2, listed in Table 2.
participants were able to perform this task, with 57% of the
judgments being the correct answer (555/968), X2(1) = 20.831,
p < 0.0001. The corollary to this hypothesis was that
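As a cross-check, the chance-level chi-square statistics reported in this section can be reproduced directly from the stated counts; for one degree of freedom the p-value has a closed form via the complementary error function. A minimal sketch:

```python
import math

# Goodness-of-fit chi-square against chance (50%) for the
# 1094/1936 correct judgments reported in the text.
correct, total = 1094, 1936
expected = total / 2
chi2 = ((correct - expected) ** 2 / expected
        + ((total - correct) - expected) ** 2 / expected)

# Survival function of the chi-square distribution with df = 1.
p = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 3), p)  # chi2 ≈ 32.802, p < 0.00001
```

The same computation reproduces the other reported values: 539/968 correct gives X2 = 12.50 (score-based judgments) and 555/968 correct gives X2 ≈ 20.83 (recording-based judgments).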
Table 1. Rank-ordered criteria used to determine which theme from a pair comes first or second in the music. The mean importance of each criterion is shown, where the smallest number means it is considered to be the most important (closest to rank number 1).

Criteria              Scores   Recordings
Articulation          3.56     3.80
Dynamics              3.47     3.13
Durational Pace       2.68     2.95
Interval Size         4.15     4.12
Rhythmic Smoothness   2.22     2.52
Other                 2.77     2.23

Table 2. Additional criteria listed by participants as features that helped them perform the discrimination task, with the number of mentions for each feature. Criteria with no mentions in each task are italicized.

Criteria                               Scores   Recordings
Variations                             1        1
Texture                                4        2
Sequences                              2        2
Elaboration                            1        1
Declamatory Style                      1        1
Harmonic Complexity                    4        5
Difficulty                             2        1
Other musical markings (e.g. pedal)    2        0
Cadences                               2        1
Modulation                             2        2
Harmonic Rhythm                        2        1
Melodic Range                          1        0
Chromaticism                           3        1
Mode                                   1        2
Melodic Smoothness                     1        1
Tempo/Speed                            0        3
Meter                                  0        1
Phrase Length                          0        1
Phrase Completeness                    0        1

D. Listener Attributes

A few post-hoc tests were conducted regarding characteristics of the listeners. The first consideration regarded the amount of music theoretical training in which the participants had engaged. The mean years of theory training in the participant group was 3.66 years. It was thought that music theory graduate students and upper level undergraduate music theory majors might perform better on the tasks. The participants were accordingly split into two groups. The "Less Theory" group (n = 34) consisted of those who had less than 4 years of music theory training, whereas the "More Theory" group (n = 10) consisted of those who had 4 or more years of training. As expected, the "More Theory" group performed better on the tasks (59.5% correct) than did the "Less Theory" group (55.6% correct), McNemar's X2(1) = 173.65, Bonferroni-corrected p < 0.001. The participants with higher musical training performed better on the recording task (61.8% correct) than on the score-based task (57.3% correct), Bonferroni-corrected p < 0.05. The participants with less training also performed better on the audio task (56.0% correct) than on the notation task (55.2% correct), although the effect size is small, Bonferroni-corrected p < 0.05.

The second post-hoc test examined whether or not participants with training in sonata theory (n = 23) would do better on the task than participants without training in sonata theory (n = 21). Although skewed in the expected direction, the difference was not statistically significant, Bonferroni-corrected p = 0.126.

It might be expected that listening to and engaging with music on a regular basis probably also contributes to perceptual judgments about first and second themes. Accordingly, two further tests were carried out. Participants were categorized by the amount of instrumental training they had received. The average amount of training in the participant groups was 9.4 years of instrumental instruction. The participants were divided according to a mean split into a "High Training" (10 or more years of training) and a "Low Training" category (n = 24 and n = 20, respectively). The groups performed equally well on the tasks, Bonferroni-corrected p = 2.83.

Finally, participants were split into two categories based on how much time they spent listening to classical music. A mean split was performed (mean = 4.33 hours), resulting in "High Listening" (n = 31) and "Low Listening" (n = 13) categories. Those in the high listening category performed significantly better on the task (59.4% correct) than did those in the low listening category (55.3% correct), McNemar's X2(1) = 76.169, Bonferroni-corrected p < 0.00001.

Of course, we must be cautious in interpreting these results. There is a certain arbitrariness in dividing a continuous variable into categories, even when well-intentioned. Moreover, many of the groups performed similarly to each other, despite the statistical significance between the groups. Two important trends do emerge, however. Those with more theory training and those who regularly listen to classical music performed the best on the discrimination tasks. This follows basic intuition about musical training.

VI. IMPORTANCE OF HARMONY

A modified version of the perceptual task was conducted after the data analysis from the original perceptual study was carried out. Recall that when participants were asked to describe the criteria they used to make their decisions, many participants indicated that they used harmonic considerations. In this modified task, the same sample of musical works was utilized. However, the harmony was removed so that the musical excerpts consisted of only a single melodic line (see Figure 3). The task protocol remained the same.

Twelve participants completed this modified task, none of whom had taken the original perceptual task. The average age was 20 years (range from 18-25), and 58% were female (7/12 participants). The participants had an average of 4.6 years of music theory training (range from 1.5-14 years) and an average of 8.5 years of instrumental or vocal training (range from 1-14). The results are not consistent with the hypothesis that participants can differentiate first and second themes,
with 48% accuracy in judgments, p = 0.408. There was no difference in performance for music with or without dynamics and articulation, X2(1) = 0.329, p = 0.566. Participants performed equally well on the score-based and recording-based tasks, X2(1) = 0.291, p = 0.589. A median split was performed to divide participants into groups with more and less music theory training. Both of these groups performed equally well on the tasks, X2(1) = 0.307, p = 0.580. The results seem to indicate that participants are not able to distinguish first and second musical themes without access to the underlying harmony.

Figure 3. First theme (top line) and second theme (bottom line) from Dvořák's Waltzes, Op. 54, No. 3.

VII. CONCLUSION

The goal of the current study was to determine whether people could differentiate between a pair of musical themes using visual and auditory factors. Overall, participants were able to use structural musical features to differentiate first and second themes from the same musical work; they could do this both from sound recordings and from musical scores. In general, participants performed better in the conditions where dynamics and articulation were available than in conditions without these features; however, the performance across these two conditions was similar. In general, performance was similar across many of the conditions of the study. For example, in both the score-based task and recording-based task, accuracy in judgments was consistently around 57-60%. This is significantly different from chance, but is small in magnitude, indicating that the task was difficult. Participants with more theory training and those who regularly listen to classical music performed better on the discrimination tasks. Score-reading and close-listening are related, but distinct musical skills. Despite the nearly identical performance on the score-based and recording-based tasks, participants seem to indicate that the recording-based task was easier than was the score-based task (61% indicated that the recording-based task was easier, but only 18% indicated that the score-based task was easier).

The corpus study carried out by Warrenburg and Huron (2015) established that there are indeed broad structural differences between first and second themes in classical Western instrumental repertoire, consistent with observations made by music scholars. The question addressed in the current study is whether musician listeners are sensitive to these differences.

On the one hand, the enhanced performance of participants who regularly listen to classical music suggests that an ability to discriminate first and second themes might arise through simple exposure involving implicit learning. However, the enhanced performance of participants with theory training suggests that the ability to carry out the task may depend primarily on explicit theoretical knowledge. Those participants who listen regularly to classical music are also likely to have received greater theoretical training. In any event, the small effect size indicates that the task of discriminating first and second themes is far from easy. Sensitive listeners may indeed be aware that a given theme is likely to appear first. However, the current study suggests that it is only a small component of music perception, even for musician listeners.

REFERENCES

Barlow, H., & Morgenstern, S. (1948). A dictionary of musical themes. New York, NY: Crown Publishers.

Burnham, S. (2002). Form. In T. S. Christensen (Ed.), The Cambridge history of Western music theory (pp. 880-906). Cambridge: Cambridge University Press.

Caplin, W. E. (1998). Classical form: A theory of formal functions for the instrumental music of Haydn, Mozart and Beethoven. New York: Oxford University Press.

Caplin, W. E., Hepokoski, J., & Webster, J. (2010). Musical form, forms, & Formenlehre: Three methodological reflections (2nd ed.). Pieter Bergé (Ed.). Leuven, Belgium: Leuven University Press.

Churgin, B. (1968). Francesco Galeazzi's description (1796) of sonata form. Journal of the American Musicological Society, 21(2), 181-199.

Cook, N. (2002). Epistemologies of music theory. In T. S. Christensen (Ed.), The Cambridge history of Western music theory (pp. 78-105). Cambridge: Cambridge University Press.

Dunsby, J. (2002). Thematic and motivic analysis. In T. S. Christensen (Ed.), The Cambridge history of Western music theory (pp. 907-926). Cambridge: Cambridge University Press.

Hepokoski, J. A., & Darcy, W. (2006). Elements of sonata theory: Norms, types, and deformations in the late-eighteenth-century sonata. New York: Oxford University Press.

Huron, D. (1993). The Humdrum Toolkit: Software for music research. Center for Computer Assisted Research in the Humanities.

Koch, H. C. (1983). Introductory essay on composition: The mechanical rules of melody, sections 3 and 4. (N. K. Baker, Trans.). New Haven: Yale University Press. (Original work published 1793).

Kollmann, A. F. C. (1973). An essay on practical musical composition: According to the nature of that science and the principles of the greatest musical authors. New York: Da Capo Press. (Original work published 1799).

Low, E. L., Grabe, E., & Nolan, F. (2000). Quantitative characterizations of speech rhythm: Syllable-timing in Singapore English. Language & Speech, 43(4), 377-401.

Marx, A. B. (1845). Die Lehre von der musikalischen Komposition, praktisch-theoretisch. Vol. 3. Leipzig: Breitkopf und Härtel.

Newman, W. S. (1963). The sonata in the classic era. Chapel Hill: University of North Carolina Press.

Rosen, C. (1971). The Classical style: Haydn, Mozart, Beethoven. New York: Viking Press.

Schellenberg, E. G., Corrigall, K. A., Ladinig, O., & Huron, D. (2012). Changing the tune: Listeners like music that expresses a contrasting emotion. Frontiers in Psychology, 3(574), 1-9.

Warrenburg, L. A., & Huron, D. (2015, August). On contrasting expressive content between primary and secondary musical themes. Poster presented at the Society for Music Perception and Cognition, Nashville, TN.
Experiencing music can be accompanied by the state of being absorbed by the music, a state in which individuals allow the music to draw them into an intense listening experience. We investigated whether being absorbed by music might be related to eye movement control, pupil size, and blinking activity. All three measurements index cognitive workload. If they relate to felt absorption, they might pick up cognitive processing as part of state absorption. We presented musical excerpts to listeners in two experiments. In Exp. 1 we instructed participants to fixate centrally, measuring microsaccades, the miniature eye movements that occur during fixation. In the free viewing task of Exp. 2 we measured larger eye movements, called saccades. Eye movements, pupil dilation, and blink rate were recorded. Musical excerpts were selected from a broad range of styles. The listeners rated each excerpt for felt absorption, valence, arousal, preference, and familiarity. Acoustic properties of the music were analyzed using the MIR toolbox. We used linear mixed models to predict the subjective ratings from the physical properties of the eye. In Exp. 1 we found clear evidence that microsaccade and blink rate predict absorption and liking. Predicting valence included an additional effect of pupil size. Familiarity (knowing) and arousal were not reliably related to our measurements. In Exp. 2 fixational microsaccades were not expected to be informative, as the eyes moved freely. Here, some of the effects shifted to mean saccadic amplitude. Acoustic properties contributed to the predictions of the subjective ratings in both experiments as well. Our results are important for two reasons. First, objective measurements can predict subjective states during music listening. Second, the construct of state absorption might be related to cognitive processing. In sum, eye tracking seems to be a fruitful tool to investigate musical processing during listening to music.
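The modeling step described here (subjective ratings predicted from eye measurements, with listener as a grouping factor) can be sketched with a random-intercept mixed model in `statsmodels`. The variable names and simulated data below are illustrative stand-ins, not the study's data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated stand-in data: absorption ratings plus microsaccade and
# blink rates for 10 listeners x 20 excerpts (hypothetical values).
n_subj, n_trial = 10, 20
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_trial),
    "microsaccade_rate": rng.normal(1.5, 0.3, n_subj * n_trial),
    "blink_rate": rng.normal(12.0, 3.0, n_subj * n_trial),
})
subj_intercept = rng.normal(0.0, 0.3, n_subj)  # per-listener offset
df["absorption"] = (3.0
                    - 0.8 * df["microsaccade_rate"]
                    + 0.05 * df["blink_rate"]
                    + subj_intercept[df["subject"]]
                    + rng.normal(0.0, 0.2, len(df)))

# Random-intercept model: rating predicted by eye measures,
# with listener as the grouping factor.
model = smf.mixedlm("absorption ~ microsaccade_rate + blink_rate",
                    df, groups=df["subject"]).fit()
print(model.params)
```

The per-listener random intercept absorbs baseline differences in how each listener uses the rating scale, which is why a mixed model is preferred over pooling all trials into one regression.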
ISBN 1-876346-65-5 © ICMPC14 138 ICMPC14, July 5–9, 2016, San Francisco, USA
ABSTRACT

A. Background

We have sampled and digitized the five largest song genres within the music corpora of the "Nihon Min'yo Taikan" (1944-1993, NHK), consisting of 1,794 Japanese folk song pieces from 45 Japanese prefectures, and have clarified that the differences in musical characteristics almost match the East-West division in folkloristics from a broader perspective (Kawase & Tokosumi, 2011). However, to conduct a more detailed analysis in order to empirically clarify the structures by which music has spread and changed in traditional settlements, it is necessary to expand the data and make comparisons based on the old Japanese provinces (Kawase, 2015).

B. Aim of the study

The main purpose of this study is to extract pitch transition patterns from pieces of traditional Japanese folk songs and, by making comparisons with Koizumi's tetrachord theory (Koizumi, 1958), to classify the regions according to their tendencies in pitch information.

Influenced by the methods of Western comparative musicology, the Japanese musicologist Fumio Koizumi conceived of a scale based not on the octave unit but rather on the interval of a perfect fourth, and developed his tetrachord theory to account for traditional Japanese music (Koizumi, 1958). The tetrachord is a unit consisting of two stable outlining tones called kaku-on (nuclear tones) and one unstable intermediate tone located between the nuclear tones. Depending on the position of the intermediate tone, four different tetrachords can be formed (see Table 1 and Figure 2).

Table 1. Koizumi's four basic tetrachords.

Type   Name           Pitch intervals
I      Min'yo         minor third + major second
II     Miyako-bushi   minor second + major third
III    Ritsu          major second + minor third
IV     Ryu'kyu        major third + minor second

C. Methods

As a case study, we extracted the musical notes for works included in the "Nihon Min'yo Taikan" (Anthology of Japanese Folk Songs) (1944-1993, NHK), which is a collection of catalogued scores for Japanese folk songs. We digitized the entire scores from each province in the Kyushu district (geographically located in the southern part of Japan) into MusicXML file format. In the experiment, (a) 89 Buzen song pieces, (b) 99 Bungo song pieces, (c) 46 Chikuzen song pieces, (d) 29 Chikugo song pieces, (e) 163 Hizen song pieces, (f) 150 Higo song pieces, (g) 107 Hyuga song pieces, (h) 77 Satsuma song pieces, (i) 116 Ohsumi song pieces, (j) 11 Tsushima song pieces, and (k) 9 Iki song pieces were selected. In total there were 382,653 tones in the sample of 896 songs for the 11 provinces (e.g. Figure 1).

Figure 1. Geographical divisions of the Kyushu district under the old province system, between the Sea of Japan and the Pacific Ocean. Provinces: a: Buzen, b: Bungo, c: Chikuzen, d: Chikugo, e: Hizen, f: Higo, g: Hyuga, h: Satsuma, i: Ohsumi, j: Tsushima, k: Iki.

In order to digitize the song pieces, we generated sequences of notes as follows. In a classification experiment on folk music using hidden Markov models, Chai and Vercoe represented melodies in several ways. They suggested that the "interval representation" is more compatible with human perception, as people employ interval information when they memorize, distinguish, or sing a melody (Chai & Vercoe, 2001). Here we adopt their interval representation by digitizing each note in terms of its relative pitch. By subtracting each pitch from the next, it is possible to generate a pitch sequence T = (t(1), t(2), t(3), …, t(i), …, t(n)) that carries information about the pitch interval to the next note, where t(i) corresponds to note i and n is the number of elements in the sequence T.

We quantitatively analyzed the pitch transition patterns using N-gram modeling, and conducted a hierarchical cluster analysis based on the frequency of occurrence of the tetrachords to examine the differences among provinces.

D. Results

Results (which will be reported at the conference in more detail) indicate that the pitch transition patterns of Japanese folk songs of the Kyushu district have an overwhelming tendency either to form perfect fourth pitches or to return to the initial pitch within two or three steps. In addition, a difference among provinces is observed when comparing the frequencies of the tetrachords identified by F. Koizumi.
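The interval representation and N-gram counting described above can be sketched as follows; the note numbers are hypothetical MIDI pitches for illustration, not values taken from the corpus:

```python
from collections import Counter

def interval_sequence(pitches):
    """Relative-pitch ("interval") representation: the signed
    difference in semitones from each note to the next."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def ngram_counts(intervals, n=2):
    """Frequencies of interval N-grams, the basis of the
    pitch-transition analysis."""
    return Counter(tuple(intervals[i:i + n])
                   for i in range(len(intervals) - n + 1))

# Example: A3-C4-D4 outlines a Min'yo tetrachord
# (minor third + major second, spanning a perfect fourth).
melody = [57, 60, 62, 60, 57]      # hypothetical MIDI note numbers
steps = interval_sequence(melody)  # [3, 2, -2, -3]
bigrams = ngram_counts(steps, n=2)
```

Counting such N-grams per province yields the occurrence frequencies on which the cluster analysis operates.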
Figure 2. Example of the four different types of F. Koizumi's tetrachord when taking A-D as kaku-on (nuclear tones). [Notated examples: each tetrachord spans a perfect fourth; Min'yo (minor third + major second), Miyako-bushi (minor second + major third), Ritsu (major second + minor third), Ryukyu (major third + minor second).]
The dendrogram shown in Figure 3 is the result of the hierarchical cluster analysis, obtained by applying the Ward method to the extracted probabilities for each province. As a result, we constructed the possibility that the melodic features in the Kyushu district spread by land and sea routes, based on quantitative analysis (see Figure 4).

Figure 3. [Dendrogram of the hierarchical cluster analysis, showing clusters C1-C4.]

Figure 4. [Map of the Kyushu district and the Pacific Ocean, showing the geographical distribution of clusters C1-C4.]

E. Conclusions

This study focused on the melodies of endangered Japanese folk songs and examined the characteristics of their schemata from a quantitative perspective. It is possible to clarify the structural commonalities and differences between areas by conducting analysis on musical corpora nationwide, including folk pieces from neighboring regions, which has not really been pursued in existing humanities research fields. We believe this will empirically clarify the musical culture phenomena by which music spreads and changes.

ACKNOWLEDGMENT

This work was mainly supported by the Japanese Society for the Promotion of Science (JSPS) Grants-in-Aid for Scientific Research (15K21601) and the Suntory Foundation Research Grants for Young Scholars.

REFERENCES

Chai, W., & Vercoe, B. (2001). Folk music classification using hidden Markov models. In Proceedings of International
Cong Jiang*1
* College of Music, Capital Normal University, China
1 conger108@yahoo.com
modal particles were added); some other versions changed the lyrics to narrate other love stories, such as the Ningxia version.

B. Analysis of four versions

This study involved only four versions, from Jiangsu, Hebei, Heilongjiang, and the northeast area of China, for which audio recordings were available. Only the Heilongjiang version was sung in a male voice; the other three versions were sung by women.

The common features of the four versions are:

- there are 4 phrases;
- the rhythm is in 2/4 meter (except the Heilongjiang version: 4/4);
- the scale is pentatonic (except the Hebei version: hexatonic);
- they are in zhi mode (ending on sol in the movable-do system).

The differences in melodic contour and register, motif, and tempo of the four versions are compared here.

1) Melodic contour & register

The melodic contour of each song recording was analyzed with Praat (version 5.4.06), which generated a pitch list for each phrase in the song; the contour was then drawn in Excel from the pitch list. The melodic contour in this study is therefore not a perceived contour but an algorithmic one.

The melodic contours presented here involve only the first phrase, "hao yi duo molihua", which is repeated once in each version. The Jiangsu and Heilongjiang versions (see figures 1a and 1b) are more melodic, smooth, and legato, while the Hebei and Northeast versions (see figures 1c and 1d) contain more intervals, because the lyrics in these two versions contain more modal particles.

For the Jiangsu and Heilongjiang versions, the two melodic contours are generally quite similar, differing only at the beginning and in the later part. The Jiangsu version covers an interval of a minor seventh. It starts more flatly, then gradually goes up and comes down, so the contour is like an arch; the Heilongjiang version covers an interval of a minor sixth. It starts more dramatically and goes up first, then comes down a little, rises again, and decreases, so the contour is like a two-humped camel.

For the Hebei and Northeast versions, both cover an interval of a minor sixth and have some modal particles in the lyrics. The Hebei version also fluctuates twice, but the second peak of the contour comes a little later; the generated melodic contour of the Northeast version is not so clear, because the singing contains more staccato, and the contour is similar to the Heilongjiang version but flatter, with only one peak.

Figure 1c. Melodic contour of the first phrase in the Hebei version.
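The contour step described above (a Praat pitch list reduced to a drawable contour, with the register span read off in intervals) can be sketched as follows; the (time, F0) pairs are made up for illustration:

```python
import math

def hz_to_semitones(f0_hz, ref_hz=440.0):
    """Convert F0 in Hz to semitones relative to a reference pitch."""
    return 12 * math.log2(f0_hz / ref_hz)

def contour(pitch_list):
    """Reduce a Praat-style pitch list of (time, F0) pairs to a
    semitone contour, skipping unvoiced frames (F0 == 0)."""
    return [(t, hz_to_semitones(f)) for t, f in pitch_list if f > 0]

def semitone_span(pitch_list):
    """Overall register span of the contour in semitones
    (e.g. a minor sixth is 8, a minor seventh is 10)."""
    st = [s for _, s in contour(pitch_list)]
    return max(st) - min(st)

# Hypothetical pitch list for a short arch-shaped phrase;
# the 0.0 Hz frame stands for an unvoiced stretch.
points = [(0.0, 220.0), (0.2, 0.0), (0.4, 330.0), (0.6, 262.0)]
```

Plotting the `contour` output against time gives the same kind of algorithmic (rather than perceived) contour the study drew in Excel.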
This intervallic relationship could be presented directly in the melody, indirectly in the melodic structure, or simplified in the melody. For instance, the Heilongjiang and Hebei versions directly show the motif in the melody of the first phrase; the Jiangsu version shows the motif in its melodic structure, that is, some tones are added to the motif "frame" to decorate the melody; and the motif in the Northeast version is shortened to [0250] (sol - la - do - sol), without the minor third, in its melodic structure.

Tempo

The durations of the four audio recordings are quite different (see Table 1). The Jiangsu version is the fastest, while the Heilongjiang and Northeast versions are the slowest. There are no modal particles in the lyrics of the Jiangsu version, so six words correspond to 12 notes; two words were added at the end of the lyrics of the Heilongjiang version, so in total 8 words correspond to 16 notes; three words were added in the middle and at the end of the lyrics of the Hebei version, so in total 9 words correspond to 18 notes; and two words were added in the middle and at the end of the lyrics of the Northeast version, so in total 8 words correspond to 11 notes, while another 5 modal particles prolong or complement the phrase for another 2 beats.

Table 1.

                  Jiangsu   Heilongjiang   Hebei     Northeast
Duration          3.25s     6.88s          7.14s     4.45s
No. of tones      12        16             18        11
No. of beats      4 (2/4)   6 (4/4)        8 (2/4)   4 (2/4)
Tempo (bpm, ca.)  73-74     52-53          67-68     53-54

Thus, considering melodic contour, motif, and tempo, the Jiangsu version may sound delicate because of its fast tempo and the decoration tones in the motif; the Heilongjiang version may sound unrestrained because of its dramatic contour and slower tempo; the Hebei version may sound witty because of its modal particles and the greater number of tones in the melody; and the Northeast version may sound melodious because of its slower tempo and less varied melodic contour.

III. AIM

From the music analysis, it is easy to find the similarities of the four Molihua versions in melodic contour, motif, scale, mode, rhythm, tempo, and lyrics. However, it is interesting that "tongzong" songs also have regional styles, and these characteristics are mixed with different musical elements. If people can tell that a set of "tongzong" songs belongs together based on lyrics, motif, or melodic contour, etc., can they also distinguish the regional styles of the songs? What aspects influence their judgment? And what is the relationship between the features of the "tongzong" category and the features of regional style? This study tried to investigate these questions in the perception of listeners.

IV. Method

A. Materials

The four versions were collected from published recordings. Although many singers have sung and recorded the songs, not all of them sound very "regional" or "local". So it was preferable to find versions that were sung by native singers of a certain region. Here is the list of recordings and singers:

- Jiangsu version: Xiufang Ju (1934-), from Jiangsu province;
- Heilongjiang version: Song Guo (1931-), from Shenyang, who studied and lived in Heilongjiang province for many years;
- Hebei version: Lan Si, a folk vocal singer from Anhui province;
- Northeast version: the singer is unknown.

B. Participants

Participants included both music students and non-music students. There were 117 music students, majoring in instrumental performance, vocal performance, music education, and recording; and 81 non-music students, majoring in mathematics, physics, biology, chemistry, computer science, and architecture. All of them were university freshmen, 18 or 19 years old, and they come from more than 20 provinces in different parts of China.

For each version, only the first verse of the song was presented, and it was played twice. Students were asked to answer where the song originally comes from; their judgment could be made according to their musical experience or based on their intuition. The answer should be as precise as possible; for instance, it would be better to state "Jiangsu province" than "east part of China".

Besides, they were also asked to indicate what they based their judgment on, either lyrics or melody (for both music and non-music students), or even instrumentation (only for music students); and to evaluate how confident they were about their answers on a 5-point Likert scale.

V. Result

All the perceived regions were converted into numbers indicating the distance between the original region and the perceived region; the distance was measured not in kilometers but in the number of provinces. For instance, if a student answered Jiangsu province for the Jiangsu version, he would get 1 point; if a student answered Hunan province for the Jiangsu version, it was checked on the map of China that one would go across Jiangsu, Anhui, and Hubei, or across Jiangsu, Anhui, and Jiangxi, to get to Hunan, so the student would get 3 points.

A. Distribution of perceived origins

Generally speaking, music students responded more correctly than non-music students, and the incorrect answers of music students did not scatter as widely as those of non-music students (see figures 2a & 2b).

For the Jiangsu version, which is the most popular version in China, there were 114 valid answers from music students, and 57.89% answered correctly; all the answers of the 81 non-music students were valid, but only 37.84% answered correctly, and the
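The scoring rule described in Section V (distance counted in provinces crossed, with a correct answer scoring 1 point) can be sketched as a shortest-path search over a province adjacency map. The adjacency table below covers only the provinces in the worked example and is an illustrative subset, not the full map of China used in the study:

```python
from collections import deque

# Partial adjacency between Chinese provinces (illustrative subset).
ADJACENT = {
    "Jiangsu": {"Anhui", "Shandong", "Zhejiang"},
    "Anhui": {"Jiangsu", "Zhejiang", "Jiangxi", "Hubei", "Shandong"},
    "Zhejiang": {"Jiangsu", "Anhui", "Jiangxi"},
    "Jiangxi": {"Anhui", "Zhejiang", "Hubei", "Hunan"},
    "Hubei": {"Anhui", "Jiangxi", "Hunan"},
    "Hunan": {"Jiangxi", "Hubei"},
    "Shandong": {"Jiangsu", "Anhui"},
}

def region_score(origin, answer):
    """Provinces crossed from the origin to the perceived region,
    counting the origin but not the destination; a correct answer
    therefore scores 1 (breadth-first search for the shortest path)."""
    if origin == answer:
        return 1
    seen = {origin}
    queue = deque([(origin, 0)])
    while queue:
        province, hops = queue.popleft()
        for nxt in ADJACENT.get(province, ()):
            if nxt == answer:
                return hops + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # not reachable within this partial map
```

With this rule, answering "Hunan" for the Jiangsu version scores 3 points (crossing Jiangsu, Anhui, and Hubei), matching the paper's worked example.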
Non-music students also presented significant differences among different distant regions (p<0.01). For the Hebei and Heilongjiang versions, the students from the furthest regions were separated from the other four groups; for the Northeast versions, the students from the further, much further and furthest regions were separated from the other two groups, and the latter two groups were in the same homogeneous subset.

VI. Conclusion

This is a pilot study to explore the perception of the regional style of "tongzong" folk songs. This study took a broad sense of the "tongzong" category, which involves similarities in both music and lyrics, and the folk songs were also chosen with a sung version, so it was assumed that listeners could categorize these songs according to features either in the music or in the lyrics. That is, it was taken as a default that when students heard the first phrases of the four "Molihua" versions, they could tell that the four versions shared the same topic. Based on this assumption, it was investigated whether students could tell the regional style of each version.

From the accuracy of students' responses on regional styles, it seems that familiarity with the songs plays an important role, because the Jiangsu version is popular with nearly every Chinese listener. The influence of such familiarity may be even stronger than familiarity with the culture of their hometown, which could be more implicit as part of home identity. Thus, the students from the province where a folk song originates got a higher percentage of accuracy than the students from other areas.

The similarly low confidence of music students and non-music students may not reflect the same situation. Although the music students rated their confidence low, they seem to have a smaller "hesitant area" than the non-music students; that is, they were quite sure about whether they knew the regional styles of the songs or not, while some non-music students tried to seek the solution in modal particles in the lyrics, which also present some local features.

How do musical elements play a role in their judgments? It is a little difficult to answer this question in this study. In the musical analysis, the most common feature of the four "Molihua" versions is the motif, which is presented directly in the melodies or indirectly in the melodic structures, and also the rhythm, which is in the same or a comparable meter. As former studies mentioned, a characteristic motif is the most important element for tune family categorization (Volk & van Kranenburg, 2012; Cowdery, 1984), while lyrics could hardly be the main reason for categorization (Volk & van Kranenburg, 2012). The motif could also play an important role in "tongzong" family categorization, either as a local or a global feature, but this study did not deal with this issue. However, it can be investigated in further studies with non-lyric versions.

Different from the study by Volk and van Kranenburg, in which they tried to find the musical similarities of 360 folk song melodies in 26 tune families, this study focused on whether listeners could distinguish the regional style within a "tongzong" family. That is, the characteristics of the "tongzong" family and those of regional style are different categorizations. In music analysis, similarities of motif, melodic contour and comparable rhythm are usually mentioned for tune families or "tongzong" families (Selfridge-Field, 2007; Volk & van Kranenburg, 2012; Cowdery, 1984; Feng, 1998; Xu, 2001), while styles in different regions or nations often refer to the fluctuation of melodic contour, modes and rhythm-tempo in Chinese folk song analysis (Yang, 2002). Thus, the perception of regional style in a "tongzong" family is a crossover study. It would be better to compare it with the perception of folk songs in a certain region, so that it would be clear whether the common regional features of some versions in the "tongzong" category work in the same way or not. However, such research is relatively rare at the moment.

In consideration of the musical analysis of the four versions, despite the common features in motif, rhythm and modes, the fluctuations of melodic contour and tempo gave direction for the region judgment. Furthermore, the strong connection between dialects and the tunes of folk songs can also play an important role. These points have to be investigated in future studies.

ACKNOWLEDGMENT

This study is a part of the project "Implicit musical knowledge of Chinese students", sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, 2014.

REFERENCES

Cowdery, J. R. (1984). A fresh look at the concept of tune family. Ethnomusicology, 28(3), 495-504.
Deliège, I. (2007). Similarity relations in listening to music: How do they come into play? Musicae Scientiae, 11, 9-37.
Feng, G. Y. (1998). Chinese Tongzong Folk Songs. Beijing: China Federation of Literary and Art Circles Publishing House.
Klusen, E., Moog, H., & Piel, W. (1978). Experimente zur mündlichen Tradition von Melodien. Jahrbuch für Volksliedforschung, 2, 11-32.
Selfridge-Field, E. (2007). Social dimensions of melodic identity, cognition, and association. Musicae Scientiae, Discussion Forum 4A, 77-97.
Volk, A., & van Kranenburg, P. (2012). Melodic similarity among folk songs: An annotation study on similarity-based categorization in music. Musicae Scientiae, 16, 317-328.
Xu, Y. Y. (2001). My understanding of Tongzong folk songs -- discussion with Prof. Feng. Huangzhong (Journal of Wuhan Music Conservatory), 3, 64-69.
Yang, L. L. (2014). A Brief History of Moo-Lee-Chwa in Modern Times. Dissertation, Northeast Normal University.
Yang, R. Q. (2002). Melodic Form of Chinese Folk Songs. Shanghai: Shanghai Music Publishing House.
Despite advances in music information retrieval, extracting symbolic notation from audio files is still problematic. Current options for encoding symbolic notation entail either manually encoding scores (which is time-consuming) or using optical music recognition software, which requires a human to correct errors and is limited to repertoires for which Western musical notation is available. Similar struggles in the digitization of print media have been remedied using systems such as "Captcha" and "reCaptcha", which crowdsource the task of validation. This paper introduces a similar approach that both crowdsources and gamifies the process of symbolic music encoding. The project provides a paradigm that focuses on the transcription of non-traditional repertoire that would also be difficult for MIR-related technologies to capture properly. The platform encodes audio data into symbolic notation by transforming the encoding process into a game. Users provide parameters such as tempo, onsets, contour, and eventually melody, and are rewarded for accuracy within a framework that encourages accurate identification of musical material. Data are identified over the course of multiple stages: users play levels of the game specific to identifying tempo, onset of melodic material, and melodic contour. Once a reliable tempo is determined for a piece by enough users, participants can then tap onsets to allow for the calculation of rhythmic values. A third level asks participants to identify the appropriate contour of a three-note musical event, and a final level implements a probe-tone paradigm that asks listeners to rate the "goodness-of-fit" of selected pitches within the appropriate contour. The project is able to identify tempi and onsets with a high degree of success, allowing the rhythmic values of melodic lines to be encoded. The first two levels are available as both a web service and a mobile application.
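The crowd-aggregation step for the tempo level can be sketched in a few lines. The function name, the minimum-user count and the agreement threshold below are illustrative assumptions, not the platform's actual API:

```python
import statistics

def aggregate_tempo(tapped_bpms, min_users=5, tolerance=0.04):
    """Accept a crowd tempo estimate once enough users agree.

    tapped_bpms: one BPM value per user, derived from their taps.
    Returns the median BPM if at least `min_users` estimates fall
    within `tolerance` (relative) of the median, else None, meaning
    more plays of the level are needed before the piece advances.
    """
    if len(tapped_bpms) < min_users:
        return None
    median = statistics.median(tapped_bpms)
    agreeing = [b for b in tapped_bpms if abs(b - median) / median <= tolerance]
    return round(median, 1) if len(agreeing) >= min_users else None
```

Once a tempo passes this gate, later levels (onset tapping, contour, probe-tone ratings) can quantize user input against it.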
ISBN 1-876346-65-5 © ICMPC14 146 ICMPC14, July 5–9, 2016, San Francisco, USA
In the Blink of an Ear: A Critical Review of Very Short Musical Elements (Plinks)

Felix C. Thiesen,*1 Reinhard Kopiez,*2 Christoph Reuter,#3 Isabella Czedik-Eysenberg,#4 Kathrin Schlemmer*5
* Hanover Music Lab, Hanover University of Music, Drama and Media, Hanover, Germany
# University of Vienna, Austria
* Catholic University of Eichstätt-Ingolstadt, Germany
1 felix.thiesen@hmtm-hannover.de, 2 reinhard.kopiez@hmtm-hannover.de, 3 christoph.reuter@univie.ac.at, 4 isabella@czedik.net, 5 kathrin.schlemmer@ku.de
ABSTRACT

At the turn of the millennium, several much-noticed studies on the rapid assessment of musical genres from very short excerpts ("plinks") were published. While at first the research mostly focused on clips of popular pieces of music, the recognition of elements in the millisecond range has since been attracting growing attention in general. Although previous studies point to surprisingly short durations of exposure for above-chance genre recognition (mostly between 125 and 250 ms), interpretations of the results are ambiguous due to blurring factors. For example, varying duration thresholds for genre recognition could have been caused by coarse-meshed iterations, unstable audio quality of the sources used, unsatisfactory sample sizes, and insufficient theoretical foundation. In a pilot study based on a selection of original stimuli used in related precedent studies, we aim to (a) identify essential spectral low-level audio features of the stimuli; (b) determine spectral cues which are obsolete for rapid assessment by means of a correlation analysis between audio features and recognition rates; (c) test for the reliability of genre recognition performance; and (d) compare the selected excerpt to other sections of the particular song (spectral matching). Spectral analyses of audio features are based on the MIR toolbox and on the software package dBSONIC. Replication of genre recognition is based on online experiments, and the comparison between sound excerpts and the full song is being conducted with a researcher-developed software application. The work is currently in progress, but first results indicate that recognition rate strongly depends on the more or less random selection of excerpts in previous studies, which might not always be representative of an entire song. In future studies, researchers should exert more control over the sound material: for instance, by selecting multiple excerpts from one song which should be controlled for their spectral fit to the whole piece.

I. INTRODUCTION

Gray (1942) carried out the first empirical study on the minimal durational values required for perception. He limited tape recordings of phonetic vocals to varying short extracts of 3 to 520 ms. Gray observed that even the shortest fragments of acoustic information could be recognized correctly at above-chance levels. He noted that recordings of phonetic vocals with low fundamental frequencies could be reliably detected by 75% of the subjects when only one oscillation of f0 had been presented to them. For higher-frequency sounds, performance levels of the subjects rose from 30% at one entire oscillation of f0 to 65% at four.

Referring to Gladwell's (2007) popular scientific bestseller, Blink: The Power of Thinking Without Thinking, Krumhansl (2010) introduced the term "plink" for a very short acoustic stimulus ("thin slice of musical information"). According to her article "Plink: 'Thin Slices' of Music," stimuli in the millisecond range bear potential in the investigation of acoustic cues that are crucial for a rapid and intuitive classification of musical information. Here she referred to a ground-breaking publication by Gjerdingen and Perrott (2008). 400 short stimuli (250, 325, 400, 475, 3000 ms) were derived from prototypical popular compositions representing 10 individual genres (Blues, Classical, Country, Dance, Jazz, Latin, Pop, R&B, Rap and Rock). While the 3-second stimuli were used for the investigation of ceiling effects, the remaining 320 shorter stimuli were presented in a randomized manner. For all the shorter stimuli, the experimenters reported above-chance recognition levels, increasing with ascending stimulus durations.

In the same year, Schellenberg, Iverson and McKinnon (1999) circumvented the difficulty of unclear genre distinctions by focusing on recognition rates of specific compositions. By asking their participants to name the titles and interpreters of popular songs, they found absolute recognition rates for stimuli of 100 ms and 200 ms. Rentfrow and Gosling (2003) tried to avoid unclear categories with a four-factor model of preference (reflective and complex; intense and rebellious; upbeat and conventional; energetic and rhythmic), which they verified with an extensive sample of n = 1704. A similar approach was followed by Bigand et al. (2005a, 2005b), who tested emotional qualities of short musical excerpts in a model of multidimensional scaling. The different approaches found in these studies, however, do not allow for consistent statements about the processes of recognition on a chronological basis.

II. THEORY

A. Lack of Objectivity in Stimulus Creation

One reason for the variety of minimal durational values proposed for rapid genre assessment (as depicted in Figure 1 on the following page) can be found in the stimulus creation processes. None of the studies reviewed for this overview used a standardized process for obtaining stimulus materials. On the contrary, most of the short excerpts seem to have been obtained using unclear criteria. This prevents any statements on how representative the excerpts are of the construct investigated, be it genre, instrumentation or any other perceptible criterion.

Another blurring factor lies in the quality of the source material used for stimulus extraction. Lossy audio compression codecs lead to artefacts that are impossible to control. Some studies do not include any description of the production processes applied, adding to the impression that many blurring factors have mostly been ignored (such as missing fade-in and -out, which leads to audible "crackles"). By controlling for these avoidable pitfalls, a higher grade of accuracy in perceived stimulus information can be achieved.
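Aim (b) of the pilot study, screening audio features by correlating them with recognition rates, can be sketched as follows. The feature values and recognition rates below are invented placeholders, not data from the study:

```python
from math import sqrt

# Invented placeholder data: one value per plink stimulus.
spectral_centroid_hz = [1800.0, 2400.0, 1500.0, 3100.0, 2050.0]
recognition_rate = [0.42, 0.55, 0.38, 0.61, 0.47]

def pearson_r(xs, ys):
    """Pearson correlation between an audio feature and recognition
    performance. Features with near-zero |r| across stimuli would be
    candidates for the 'obsolete for rapid assessment' set."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r(spectral_centroid_hz, recognition_rate)
```

In practice the feature vectors would come from a toolkit such as the MIR toolbox mentioned in the abstract, with one correlation computed per low-level feature.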
III. METHOD

The shortcomings in research on rapid assessment processes can be tackled by a strict standardization of stimulus generation. We reviewed the stimuli of Krumhansl's 2010 publication. As the original stimuli used in this study were provided as MP3 files, our first aim was to reconstruct the uncompressed sources used for stimulus generation. Especially for the analysis of timbral texture features, the uncompressed original titles are obligatory to achieve the highest possible accuracy. This necessitated the recovery of the full-quality source material. 23 out of 28 original albums were found in second-hand CD shops in northern Germany. As many successful albums (especially those dating back to the 1960s) have been remastered in different versions throughout the years, we decided to use only CDs with (re)release dates earlier than the date of Krumhansl's publication. CDs advertised with extensive remastering were excluded from the selection. Based on this reconstruction, we were able to use the uncompressed audio signals for the following steps of data analysis.
noises, we defined fade-in and -out durations before starting the file extraction.

For the first in a row of studies, we designed the "Matryoshka method" of stimulus generation (see Figure 3). This algorithm describes the creation of stimuli with descending durations, derived from longer files: a first "plink" with a duration of 800 ms was created. For the extraction of the following 400 ms "plink", the extraction range was set to 3-797 ms of the first one, excluding the fade-in and -out areas. Stimuli of 800, 400, 200, 100 and 50 ms were generated to uncover suspected ceiling and floor effects of stimulus duration on recognition rates alike. Figure 3 depicts a flow chart of the stimulus production process.

B. Evaluation of Stimulus Content

In a second step of standardization, experts rated the musical content that was perceptible in the short audio files generated, namely the presence or absence of voice (and therefore gender) as well as the use of percussive and melodic instruments. As the experimental design was to include an online study with predefined answer alternatives, this formed an important reference for the statements gained from the participants. In addition, we offered a free text field for recording additional recognitions of specific genre, interpreter and/or title data. To our knowledge, this is the first study investigating concrete rapid recognition rates of previously determined criteria in terms of musical events.
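The nested "Matryoshka" extraction can be sketched as follows. The 3 ms fade exclusion comes from the text; the random placement of each inner sub-window is an assumption, since the paper does not state how the inner offsets were chosen:

```python
import random

FADE_MS = 3  # fade-in/-out zones excluded from further extraction

def matryoshka_ranges(durations_ms=(800, 400, 200, 100, 50), seed=0):
    """Return (start, end) offsets in ms of each plink, nested
    Matryoshka-style: each shorter stimulus is cut from the interior
    of the previous one, avoiding the faded edges."""
    rng = random.Random(seed)
    ranges = [(0, durations_ms[0])]  # the first 800 ms plink
    for dur in durations_ms[1:]:
        p_start, p_end = ranges[-1]
        lo, hi = p_start + FADE_MS, p_end - FADE_MS  # usable interior
        start = rng.randint(lo, hi - dur)
        ranges.append((start, start + dur))
    return ranges
```

Each returned pair can then be applied to the audio buffer of the parent stimulus, with new fades rendered at the cut points.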
The Use of Prototype Theory for Understanding the Perception and Concept Formation of Musical Styles

Florian Hantschel,*1 Claudia Bullerjahn*2
* Institut für Musikwissenschaft und Musikpädagogik, Justus-Liebig-Universität, Giessen, Hesse, Germany
1 florian.hantschel@musik.uni-giessen.de, 2 claudia.bullerjahn@musik.uni-giessen.de
C. Prototype theory

As a theory, the prototype approach belongs to the cluster of theories that focus on pattern recognition. These are not congruent with so-called gestalt theory. In principle, the fundamentals of prototype theory are a result of the linguistically and psychologically orientated studies of Eleanor Rosch (1975, 1978 & 1981). Rosch et al. initially investigated the cognitive categorical representation of so-called »natural objects« (colors, forms) and »categories of language usage in everyday life« (animals or furniture). Subsequently, this research paradigm was transferred to the social and human sciences. Within the scope of empirical psychological research on aesthetics, cognitive music psychology and music theory have adapted prototype theory to investigate conceptual models and categorization in human thought.

Helga de la Motte-Haber (2005, p. 63) theorizes that a prototype is a form of cognitive typification. Underlying characteristics occurring in various examples are abstracted and determine the shape of a prototype. Thus, prototypes are acquired through experience and learning. This learning process can happen incidentally, e.g. while listening to music. Moreover, according to Niketta (1990), the general prototype approach postulates that:
• boundaries between those cognitive categories formed through abstraction are fluid
• members of categories can vary in their typicality, i.e. in how representative they are of that certain category
• for every category various contrastive categories exist
• the ideal representative of a category is called the prototype

However, Gregory L. Murphy (2004, p. 41) states that prototype theory uses prototypes not only in the sense of »best examples« of categories. More recently, the research paradigm also includes the assumption of prototypes as »summary representations« of concepts. These are portrayable through weighted feature lists and schemata. Furthermore, the German musicologist Martin Rohrmeier (2015, p. 290) states that:

»From a formal perspective, schema or prototype theories and grammar approaches may be construed as the opposite ends of a complexity spectrum. Their primary difference involves a tradeoff in terms of compression: If a pattern evinces both regularity and combinatorial freedom, grammars will be more suitable to describe it; however, if a musical structure exhibits more standardization and less variability, schema-theoretical (and exemplar-based) approaches will be more appropriate.«

The benefit of mentally represented prototypes is seen in fast reaction times, less erroneous classification and less forgetting in the recognition of patterns (la Motte-Haber, 2005, p. 63). One of the relevant taxonomic processes involved in cognitive categorization and prototype formation is the »efficiency principle«. This minimizes the number of categories while at the same time guaranteeing a maximum of informative content (cf. Thorau, 2010, p. 218). Through the »prototype effect«, new category candidates are cognitive-statistically judged in relation to existing features of the current category (cf. ibid.).

In his model of the schematic processing of musical structures, Stoffer (1985) already implied that types of style and form schemata could be part of a global cognitive music schema. He thereby implicitly assumed that music styles could display a prototypical structure. Martindale and Moore (1989) proclaimed that the perception of music is based on categorizations of mental schemata and prototypicality preference judgements (cf. Cohrdes et al., 2011). Subsequently, based on the perceived prototypicality of rock and jazz music, Niketta (1990) was the first music psychologist to investigate the correlation between a certain music-style concept, collative traits and aesthetic-evaluative judgements in an experimental setting.

Because of the idea of pattern recognition, Meyer's musicological theory (1989) is closely related to the cognitive prototype concept Niketta applied. This connection provides the basis for joining the knowledge acquired through music analysis with prototype theory for the purpose of understanding the perception and concept formation of musical styles in a multi-methodological approach.

III. METHOD

The musicological groundwork, i.e. a preliminary study combined with a research literature review, resulted in a large corpus of research-relevant black-metal music.3 The musical parameters were examined qualitatively and quantitatively using methods of popular music analysis and statistics to identify the relevant stylistic aspects of the music.

Our exploratory, quasi-experimental elicitation is based on a self-developed LimeSurvey online questionnaire (n1 = 764)4 and a pre-test expert rating (n2 = 4). It was published by the server host of Justus-Liebig-University, Giessen, Germany.5 A promotional campaign was carried out using Facebook, notices, circular email, a newsletter and personal/face-to-face contact. Data were collected April through May 2015. As a further motivation, participants were presented with the prospect of participating in a lottery and winning a 50€ coupon for concert tickets.

The main section of the online survey consists of five-point Likert-scale typicality ratings for twenty audio stimuli. Each stimulus lasts twelve seconds (incl. fade-in and -out).6 Audio stimuli were presented one at a time through a Soundcloud-based embedded player. While filling in the ratings, participants were able to listen multiple times. Once they had rated a stimulus and continued to the next one, moving backward or changing the previous rating was no longer possible.

The twenty audio test stimuli7 were taken from an item pool (lexical approach/lexical hypothesis) of 84 existing black-metal songs. The stimuli were selected in accordance with, or in deviation from, the stylistically relevant parameters of the pre-test. Afterwards, in a double-blind experimental setting, a typicality rating by four experts8 for the black-metal style was conducted to secure the fitness of the selected test items. Another reason for this rating was the subsequent comparison with the ratings of the survey applicants.

The questionnaire participants were grouped into six different expertise levels by means of a self-assessment question regarding black-metal music. Applicants with no knowledge of this music were excluded from the survey:9
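The survey's basic dependent measure, the mean typicality rating per stimulus and expertise group, can be computed as in this sketch; the records are invented sample data, not the study's:

```python
from collections import defaultdict
from statistics import mean

# Invented sample records: (expertise_group 1-6, stimulus_id, 5-point rating)
ratings = [
    (1, 20, 5), (1, 20, 4), (6, 20, 5),
    (1, 9, 2), (6, 9, 1), (6, 9, 1),
]

def mean_typicality(records):
    """Mean Likert typicality rating per (group, stimulus) cell."""
    cells = defaultdict(list)
    for group, stimulus, rating in records:
        cells[(group, stimulus)].append(rating)
    return {cell: mean(vals) for cell, vals in cells.items()}
```

Comparing these cell means across expertise groups is what allows statements such as "stimulus 20 is judged typical by all six groups" to be made.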
Only expertise group 6 can also identify music excerpts without the characteristic vocals (stimulus 6) as typical. Importantly, the audio test stimuli 12, 13, 15 and 20, assessed as being very typical by the four experts, are also on average judged as being typical by all six groups from the survey. This means that the recognition of prototypical black-metal music is independent of expertise.

As confirmed by a non-parametric group comparison of the audio test stimuli, all of the stylistically relevant musical parameters detected in the pre-test analysis (screeching/screaming vocals, tremolo-picking drones and blast-beat drum patterns) lead to high typicality ratings. Yet an even more significant and therefore fundamental criterion of typicality perception is the aesthetics of lo-fi production. The regression analysis (dependent variable: mean typicality rating; predictor: standard deviation) shows (p≤0.01; R² = 0.670) that with increasing levels of perceived prototypicality, the expertise groups 1 and 6 independently displayed higher levels of inter-individual consistency of judgement.

On the one hand, with 88% of participants rating it as a typical example, the audio test stimulus 20 (Tatthogghua ‒ Status Stürmer) appears to be the best example and thus can be called the prototype of the study. On the other hand, example 9 (Lantlôs ‒ Minusmensch), with 91% of survey contributors classifying it as the most atypical representative, can consequently be called the antitype.

The results of the analysis of free-text answers show that 68.6% of the respondents identify one of nine bands as being a typical black-metal band, most of which are Norwegian (in decreasing order: Darkthrone, Immortal, Mayhem, Dimmu Borgir, Gorgoroth, Behemoth, Marduk, Burzum, Venom). Altogether, higher expertise groups' answers are more versatile. The most frequent free-text answer is the song »Transilvanian Hunger« by the Norwegian band Darkthrone, which can thus be called the verbal prototype of the survey.

V. DISCUSSION

All in all, a point of criticism is the quasi-experimental setting of our study. In the future, controlled laboratory experiments should be performed to minimize the possible impact of confounding factors. Further, although groups 1 to 6 reflect a representative cross-section of German society, they should be approximated to near equal size and gender distribution to ensure comparability. Then, other statistical methods such as parametric test procedures (e.g. t-test; ANOVA) would be possible as well. Another criticism concerning our study is that the expertise for black-metal music was only measured indirectly. Hence, it can be assumed that the self-assessment assignment was not sufficiently objective. Certainly, the choice of twenty audio test stimuli for an exploratory and hypothesis-testing investigation was reasonable. Given the small number of test stimuli, it is also obvious that no controlled results for each individual stylistically relevant parameter are possible. That is why we recommend that selected features be examined in a controlled manner using a large corpus of test stimuli and expert raters. Another possibility is to reflect the holistic processing of music perception and cognition. With this in mind, one might apply the typicality rating paradigm in an experimental context to a corpus of pre-existing full-song examples instead of short cues. Under specific instructional conditions (e.g. "Pay special attention to…"), repeated measures would be possible as well. Unlike Niketta (1990), we chose to use no contrast categories and no exclusion of items (e.g. "Is not Black Metal" was not a rating option). Future experiments should also implement multipolar ratings or rating scales for various different style categories.

Different from Rosch et al. (ibid.), the inter-rater reliabilities concerning sounding aesthetic objects are, in line with expectations, lower than for »natural objects« and language. This could be due to:
• category vagueness (debates on the so-called distinction between the first and second wave of black metal compared to proto-black and black metal proper are still relatively new)
• the test subjects generally overestimating their expertise
• overlooked confounding variables having systematic or non-systematic impact
• uncertainty about the category because of its specificity
• forced involvement of language processing and declarative knowledge (both rely on slower and conscious paths of cognition compared to auditory perception of music and/or emotional processing)

Hence, keeping in mind that prototypes are believed to emerge incidentally as a product of implicit learning, it is quite reasonable to assume that the very immediate process of categorization has to be captured wordlessly (for example with imaging methods such as EEG or fMRI). Moreover, it might be useful to adjust the particular scale in use (insofar as one wants to use rating scales at all) according to the individual expertise of the test subject (for professionals a multi-stage measurement, and for laymen one as easy and understandable as possible). Following Murphy's review and interpretation (2006), it is quite possible that laymen judge in a more exemplar-based manner and professionals by means of an internalized prototype. The overall question is how the typicality rating comes into being and which parameters are influential. The multivariate tests suggest that, in the questionnaire, socio-demographic variables and also verbally recorded music preferences have no statistically significant impact on the typicality rating. The only variable that comes into question is membership of a black-metal expertise group. Clearly, the regression analysis shows that subjects of a higher group affiliation judge most similarly to the expert raters from the pre-test. Despite the highly significant differences, all in all only ten of the twenty test stimuli were rated significantly differently, and it must be noted that:
• the very typical representatives were evaluated in a comparable manner in each black-metal expertise group
• only for example 7 are the average ratings substantially different, with group 1 diametrically opposed to group 6, which suggests an effect of expertise
• e.g. for the prototype stimulus 20, which, after all, was rated as very typical by all expertise groups, the statistical significance is most likely also due to the very large sample size

Following the above-mentioned assumptions of prototype theory, this leads us to the conclusion that these typicality effects suggest an inter-subjectively shared underlying concept of what prototypical black-metal music is.
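The form of the regression reported in the results (mean typicality rating predicted from the per-stimulus rating standard deviation) can be reproduced in outline. The per-stimulus numbers below are invented placeholders and will not yield the paper's R² = 0.670:

```python
# Invented per-stimulus values: mean typicality and rating SD.
mean_typ = [4.6, 4.2, 3.1, 2.4, 1.8, 1.3]
rating_sd = [0.5, 0.7, 1.3, 1.2, 0.9, 0.6]

def ols_r_squared(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return R²,
    the share of variance in y explained by the predictor x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

r2 = ols_r_squared(rating_sd, mean_typ)
```

A low rating SD here acts as a proxy for inter-individual consistency: the more the raters agree on a stimulus, the smaller its SD.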
VI. CONCLUSION

To conclude, most recent investigations concerning the perception of musical styles either use the linguistic approach of defining specific grammatical rules (e.g. Martin Rohrmeier's Generative Syntax Model of tonal harmony, 2005, 2009, 2015) or analyze non-musical parameters, for example musicians' image or fan culture. Instead, we argue that psychology's research paradigm of cognitive prototypes provides valuable information for understanding the concept formation and perception processes underlying the differentiation of musical styles. We used the example of black-metal music, which is an extreme sub-style of heavy metal music. Excluding those participants who had never heard of this music before, ratings of twenty audio test stimuli revealed that typicality effects govern the intersubjective perception of what are typical and untypical sound examples of this music. Moreover, and very surprisingly in our quantitative-empirical study with 764 participants from very different backgrounds and expertise levels, there is substantial agreement concerning the very ends of the typicality continuum. Against all odds that black metal is a minority's music where only a chosen few are in on the secret, prototypical black-metal music could be identified expertise-independently. Concerning future research, more experiments following the paradigms established by Rosch et al. (ibid.), especially regarding the very basic meta-categories of everyday social life, such as rock, classical, jazz or electronic dance music, are preferable.

ACKNOWLEDGMENT

We thank the German National Academic Foundation, the heavy-metal supporters' community Undergrounded, Ursula Kruck-Hantschel, Thilo Marauhn, Philipp Dietzel, Joey Sato, Nils Wilken and Lydia Scott for their support.

REFERENCES

Frith, S. (1996). Performing Rites: On the Value of Popular Music. Oxford: Oxford University Press.
Haas, B. (2004). Die neue Tonalität von Schubert bis Webern: Hören und Analysieren nach Albert Simon. Wilhelmshaven: Noetzel.
Hagen, R. (2011). Musical style, ideology, and mythology in Norwegian black metal. In J. Wallach, H. M. Berger & P. D. Greene (Eds.), Metal Rules the Globe: Heavy Metal Music Around the World (pp. 180-199). Durham & London: Duke University Press.
Hainaut, B. (2012). "Fear and Wonder": Le fantastique sombre et l'harmonie des médiantes, de Hollywood au black metal. Volume! The French Journal of Popular Music Studies, 9(2) (=Contre-cultures), 179-197.
La Motte-Haber, H. de (2004). Ästhetische Erfahrung: Wahrnehmung, Wirkung, Ich-Beteiligung. In H. de la Motte-Haber & E. Tramsen (Eds.), Musikästhetik. Handbuch der systematischen Musikwissenschaft (pp. 408-429). Laaber: Laaber.
La Motte-Haber, H. de (2005). Modelle der musikalischen Wahrnehmung: Psychophysik ‒ Gestalt ‒ Invarianten ‒ Mustererkennen ‒ Neuronale Netze ‒ Sprachmetapher. In H. de la Motte-Haber & G. Rötter (Eds.), Musikpsychologie. Handbuch der systematischen Musikwissenschaft (pp. 55-73). Laaber: Laaber.
La Motte-Haber, H. de (2010). Wahrnehmung. In H. de la Motte-Haber, H. v. Loesch, G. Rötter & C. Utz (Eds.), Lexikon der systematischen Musikwissenschaft (pp. 552-526). Laaber: Laaber.
Martindale, C., & Moore, K. (1989). Relationship of musical preference to collative, ecological, and psychophysical variables. Music Perception, 6(4), 431-445.
Marx, W. (2004). Klassifikation und Gattungsbegriff in der Musikwissenschaft. Hildesheim: Georg Olms Verlag.
Meyer, L. B. (1989). Style and Music: Theory, History, Ideology. Philadelphia: University of Chicago Press.
Moore, A. F. (2001). Rock: The Primary Text. Developing a Musicology of Rock. Burlington: Ashgate.
Moore, A. F. (2012). Song Means: Analysing and Interpreting Recorded Popular Song. Burlington: Ashgate.
Murphy, G. L. (2004). The Big Book of Concepts. Cambridge,
REFERENCES Massachusetts: MIT Press.
Chaker, S. (2011a). Extreme Music for Extreme People: Black and Niketta, R. (1990). Was ist prototypische Rockmusik? Zum
Death Metal. Put to the Test in a Comparative Empirical Study. In: Zusammenhang zwischen Prototypikalität, Komplexität und
The Metal Void. First Gatherings. N. W. Scott & I. v. Helden ästhetischem Urteil. In K-E. Behne, G. Kleinen & H. de la
(Eds.). Oxford: Inter-Disciplinary Press, 265-278. Motte-Haber (Eds.), Musikpsychologie. Empirische Forschungen
Chaker, S. (2011b). Götter des Gemetzels. Mediale Inszenierung von ‒ Ästhetische Experimente. Jahrbuch der deutschen Gesellschaft
Tod und Gewalt im Death Metal. In R. Bartosch (Ed.), Heavy für Musikpsychologie (pp. 35-57). Wilhelmshaven: Florian
Metal Studies. Band 1: Lyrics und Intertextualität. Oberhausen: Noetzel Verlag.
Verlag Nicole Schmenk. Reyes, I. (2013). Blacker than Death: Recollecting the "Black Turn" in
Chaker, S. (2014). Schwarzmetall und Todesblei. Über den Umgang Metal Aesthetics. Journal of Popular Music Studies, 25 (2),
mit Musik in den Death- und Black-Metal-Szenen Deutschlands. 240-257.
Berlin: Archiv der Jugendkulturen. Rohrmeier, M. (2009). Implicit learning of musical structure:
Cohn, R. (2004). Resemblances: Tonal Signification in the Freudian Experimental and computational approaches. PhD Dissertation.
Age. In Journal of the American Musicological Society, 57 (2), University of Cambridge.
285-324. Rohrmeier, M. & Neuwirth, M. (2015). Towards a Syntax of the
Cohrdes, C., Lehmann, M. & Kopiez, R. (2011). Typikalität, Classical Cadence. In M. Neuwirth & P. Bergé (Eds.), What Is A
Musiker-Image und die Musikbewertung durch Jugendliche. Cadence. Theoretical and Analytical Perspectives on Cadences in
Musicae Scientae, 16 (1), 81-101. Classical Repertoire (pp. 287-338). Leuven: Leuven University
Cohrdes, C. & Kopiez, R. (2014). Optimal distinctivness and Press.
adolescent music appreciation: Development of music and Rohrmeier, M. (2005). Towards modelling movement in music:
image-related typicality scales. Psychology of Music, 1-18. Analysing properties and dynamic aspects of pc set sequences in
Dietzel, P. (2013). Blizzard Bea(s)ts. Die musikalische Ästhetik des Bach’s chorales. MPhil dissertation. University of Cambridge.
norwegischen Black Metal in der ersten Hälfte der 1990er Jahre. Rosch, E. & Mervis, C. B. (1981). Categorization of natural objects.
Magister-thesis. Justus-Liebig-University, Giessen. Annual Review of Psychology, 32, 89-115.
Eckes, T. (1991). Psychologie der Begriffe. Strukturen des Wissens Rosch, E. & B.B. Lloyd (1978). Cognition and categorization.
und Prozesse der Kategorisierung. Göttingen.: Hogrefe. Hillsdale, New Jersey: Erlbaum.
Elflein, D. (2010). Schwermetallanalysen. Die musikalische Sprache Rosch, E. & Mervis, C. B. (1975). Family resemblances: Studies in
des Heavy Metal. Bielefeld: Transcript. the internal structure of categories. Cognitive Psychology, 7,
573-605.
ISBN 1-876346-65-5 © ICMPC14 157 ICMPC14, July 5–9, 2016, San Francisco, USA
Leigh VanHandel
College of Music, Michigan State University, East Lansing, MI, USA
In the experiment, subjects are presented with novel musical stimuli designed to study one of the musical characteristics. Subjects manipulate the tempo of the stimuli in real time, with no pitch shifting, via a controller; their task is to determine what they believe is an appropriate tempo for each excerpt. When subjects settle on an appropriate tempo, their response is recorded and they move on to the next excerpt.
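This method-of-adjustment procedure can be made concrete with a short sketch. The code below is entirely hypothetical (the paper does not describe its software); it only illustrates the idea that the subject nudges the tempo repeatedly and the final value is recorded as the response:

```python
def adjust_tempo(initial_bpm, controller_moves):
    """Method-of-adjustment sketch: the subject nudges the playback
    tempo (multiplicative changes, no pitch shifting) until satisfied;
    the final tempo is recorded as the response for that excerpt."""
    bpm = initial_bpm
    for factor in controller_moves:
        bpm *= factor
    return round(bpm, 1)

# A subject speeds the excerpt up by 10%, then eases back by 5%, and settles.
print(adjust_tempo(120, [1.10, 0.95]))  # 125.4
```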
32 participants (10 female, mean age 27.4, SD = 10.2) took part in this experiment: 16 musicians, with at least 7 years of formal training, and 16 non-musicians, with less than 1 year of formal training.

B. Behavioural stimuli

For the purpose of the experiment a series of 7-voice polyphonic stimuli was created. The seven voices are written to cover an entire orchestral range and are played by the following instruments: one bass organ, two bassoons, one clarinet and three flutes, respectively located in the following subjective registers: very-low, low, medium-low, medium, medium-high, high and very-high.

The stimuli are composed according to the basic rules of harmony and voice leading and according to standard rules of functional harmony, as systematized by Schönberg (Schoenberg & Stein, 1969).

In line with the proposition that salience is linked to complexity, stimuli were constructed to be equally complex as a baseline, with 7 different voices, 7 different rhythmic duration values, 7 different pitches and 7 different chord species. The latter three were then systematically and independently manipulated in order to vary complexity and evaluate its effect on perceptual salience. Similarly to Marozeau and colleagues (Marozeau et al., 2013), we designed our stimuli with one target (the middle) voice and six distractor (top and bottom three) voices. The target melody, played by the clarinet, is the same in every composition save for the experimental manipulation described below, while the distractor voices are systematically manipulated in complexity.

The target melody was composed according to very strict criteria: it has seven different pitches (C major diatonic scale), seven different rhythmic durations (proportions 1:1, 1:2, 1:3, 1:4, 1:6, 1:8 and 1:10, where the shortest value is a sixteenth note) and seven different direct (between adjacent notes) and indirect (between non-adjacent notes) intervals (minor second, major second, minor third, major third, fourth, tritone and fifth). The melody has a well-defined, convex melodic contour, developing over a ninth range with only one high peak, reached in its final part. Therefore, the participant should be able to pick out the target melody from among the other voices by recognizing its characteristic contour.

Further to the above specifications, the target melody was manipulated for the purposes of the same/different experimental task. Four changes were possible: one altered note in the middle of the melody, one altered note duration in the middle of the melody, one altered final note or one altered final rhythmic duration. Alterations are illustrated in Figure 1.

The distractor voices are also constructed according to very strict constraints, varying on three parameters: harmonic progression, melodic complexity and number of rhythmic durations, where each parameter varies among three levels of difficulty: simple, medium and complex.

[Figure 1. Target melody and four change typologies]

The rhythmic durations used in the distractor voices are selected from the same set as in the target melody. The simple level has only two rhythmic durations in each and across all six voices. These two values are the longest rhythmic values possible, in accordance with the frequency of changes in the harmonic progression. In the medium level there are three rhythmic durations in each and across all six voices. In the complex level there are at least four different rhythmic durations in each voice and all seven durations across all six voices.

As for melodic complexity, the three levels of difficulty are determined by qualitative change. In the simple level only harmonic tones are used. In this way harmonic progression and melody are consistent and make no deviations between the synchronic and diachronic dimensions. This helps to fuse together the distractor voices and makes it easier to perceive the target melody. In the medium level the distractor voices are composed using harmonic tones and non-harmonic tones on weak beats (e.g. passing notes). In this level each line assumes a stronger melodic character, becoming more salient in its own right and drawing more attention from the listener. Finally, in the complex level, we find harmonic tones, non-harmonic tones on weak beats and non-harmonic tones on strong beats (e.g. appoggiaturas). In this last case, in contrast with the simplest level, harmonic progression and melody are not always consistent. This provokes some conflicts between harmony and melody, potentially drawing the attention of the listener away from the target melody in the strongest way possible.

Finally, for harmonic progression the difference between the levels is both qualitative and quantitative. In each level and each stimulus, seven different harmonic species are used: major triad (I, major third + minor third), minor triad (VI, minor third + major third), diminished triad (VII, minor third + minor third), first species seventh chord (V7, major third + minor third + minor third), second species seventh chord (II7, minor third + major third + minor third), third species seventh chord (VII7, minor third + minor third + major third) and fourth species seventh chord (IV7, major third + minor third + major third). Each species is linked to a particular harmonic function: the I chord is linked to a tonic function; the VI chord is linked both to a subdominant and a tonic function, depending on the context; the VII, VII7 and V7 chords are associated with a dominant function; while the II7 and IV7 chords are exclusively associated with a subdominant function. In accordance with these characteristics, the three levels of difficulty are created using an increasing number of chords (quantitative difference), which implies an increasing number of different harmonic functions (qualitative difference), from the simplest level to the most complex one. The complexity of the harmonic progression is hence created through an intensification of the harmonic functions and harmonic rhythm.

Overlap between voices was avoided as much as possible and each voice was composed in vocal style to enhance the independence of each melodic line. Where overlap was not avoidable it was usually sufficiently distant in time (i.e. more than one measure away) not to disturb the independence of each voice.

As the three parameters can vary independently on three levels of difficulty each, 27 different stimuli were created, using Apple Logic X software with the built-in sound library supplied with the program. The balance between the channels
was regulated in order to have the target melody, played by a clarinet, and the two central voices, medium-high, played by a flute, and medium-low, played by a cello, equally distributed on the left and right channels. The very-high voice, played by a flute, and the very-low, played by a bass organ, were louder in the left channel; while the high, played by a flute, and the low, played by a cello, were louder in the right one. This was intended to spatialize the high and low registers, while keeping the medium range equally balanced in the center. Each voice was manually adjusted by the first author to be of perceptually equal loudness. A -10 dB linear fade-in was applied to the six distractor voices on the first bar only, to prime participant focus to the target melody (also determined through pilot testing). Stimuli were randomly produced in one of three slightly different tempos (87, 90 and 93 BPM) and three different keys (Bb major, C major, D major) to avoid training effects as much as possible. All stimuli can be found online at http://music-cognition.eecs.qmul.ac.uk/data/BussiSauvePearce2016.zip.

C. Implementation of corpus analysis

1) Onset and offset synchrony. Onset synchrony (Duane, 2013) was calculated as the percentage of pitches in a bar that have the same onset. In our implementation, the proportion of streams (out of the six distractor streams) with the same onset was averaged across every possible onset, in other words every sixteenth note (256 units of MIDI time), beginning with time 0, to produce one value for the composition. Offset (i.e. note ending) synchrony was calculated in the same way, beginning at time 256 (i.e. the end of the first possible sixteenth note).

2) Pitch comodulation. Pitch comodulation (Duane, 2013) was calculated for every vertical pair of pitches as it moves to another pair, with the following rules: a) when the two neighbouring harmonic intervals are the same, the scoring function should equal 1; b) the scoring function should decrease as the difference between the two intervals increases, since larger interval differences are progressively further from parallel motion. The scoring function is as follows:

score = 1 - e^(α/d)    (1)

where d is the absolute difference in semitones between the two harmonic intervals and α is a scaling constant, set to 2 ln 0.5 so that interval differences of 0 score 1, differences of 1 score .75, differences of 2 score .5 and so on. In our implementation, pitch comodulation was calculated for every possible pair of lines amongst the six distractor streams and averaged to produce a score for the composition.

3) Harmonic overlap. Harmonic overlap (Duane, 2013) calculates the degree of overlapping harmonics between two pitches (interval score) and is weighted by rhythmic duration. The interval score is calculated as follows:

IntervalScore = 2 / (numerator + denominator)    (2)

where the numerator and denominator are those of the interval's just frequency ratio (i.e. octave = 2/1, fifth = 3/2). An interval score is calculated for each pair of pitches with identical onsets between any two given streams. In our implementation, harmonic overlap is calculated in the same way as pitch comodulation: the interval score is multiplied by a rhythmic weight, where a quarter note is equal to 1, a half note to 2, an eighth note to .5 and so on, for every possible pair of lines amongst the six distractor streams, and averaged to produce a score for the composition.

D. Procedure

After reading the information sheet and providing informed consent, each participant was acquainted with the task and participated in a practice session, with an opportunity to ask questions before beginning the experiment. Participants listened through headphones (Bose, Sound True In-Ear) at a comfortable volume and sat at a comfortable distance from a laptop computer. The task was a same/different judgment in which participants were to indicate whether the target melody they heard presented in a polyphonic context was the same as the target melody they heard in a monophonic context. The monophonic melody was presented at the very beginning of the experiment and was present throughout, so that participants could remind themselves of what it sounded like whenever they wished, but did not need to hear it on each trial (decided through piloting). In total, participants were presented with 2 excerpts of each of the 27 stimulus types for a total of 54 trials. One of each pair of excerpts contained the same target melody, while the other was altered in one of the four possible ways explained above. Moreover, in order to have an equal number of trials for each type of target melody change across the whole experiment, participants were randomly divided into 4 groups (8 subjects per group, 4 musicians and 4 non-musicians), in which the four alteration types were rotated. The 54 stimuli were randomized and the subjects could hear each stimulus a maximum of twice. The experiment was implemented in Max MSP and run on an Apple MacBook Pro laptop computer.

III. RESULTS

A. Behavioural experiment

Accuracy was measured as a correct identification of same or different for each trial. Results showed a relatively low overall accuracy in completing the task, with a mean of .66, 95% CI [.64, .69], where musicians, with mean .76, [.73, .79], differed from non-musicians, with mean .56, [.53, .60]. A mixed-effects logistic regression predicting participant accuracy with type of target melody alteration as a 5-level fixed effect (4 types plus no change), musicianship as a fixed effect, and participants as an intercept-only random effect yielded the best model, χ²(5) = 51.12, p < .01, as compared to a null model with only the random effect of participant as predictor. See Table 1 for details of the model. A potential interaction between type of target melody alteration and musicianship was explored and, while the model was significantly better than without an interaction, χ²(4) = 13.54, p = .008, the interaction coefficients were highly correlated with the corresponding type of target melody alteration coefficients and were therefore dropped. We can see by looking at the coefficients in Table 1 that melody change types 2 and 4 contribute significantly to the model as compared to typologies 1 and 3, corresponding to changes in the last measure of the target melody as opposed to the middle. Musicianship is also a highly significant predictor, indicating that musicians and non-musicians performed differently, as outlined in overall performance above.
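The stimulus crossing and the corpus measures described in sections II and III.C can be sketched in a few lines of Python. This is a minimal reimplementation based on our reading of the definitions in the text, not the authors' code; all function and variable names are ours, and onset synchrony is simplified so that each stream is just a set of onset times:

```python
import math
from itertools import product

# The 3 x 3 x 3 crossing of the distractor parameters yields the 27 stimulus types.
LEVELS = ["simple", "medium", "complex"]
PARAMETERS = ["harmonic_progression", "melodic_complexity", "rhythmic_durations"]
stimulus_types = [dict(zip(PARAMETERS, combo)) for combo in product(LEVELS, repeat=3)]

ALPHA = 2 * math.log(0.5)  # the scaling constant of equation (1)

def comodulation_score(d):
    """Equation (1): score = 1 - e^(ALPHA / d), where d is the absolute
    difference in semitones between two successive harmonic intervals.
    Identical intervals (d = 0) score 1, taken as the limit d -> 0+."""
    if d == 0:
        return 1.0
    return 1 - math.exp(ALPHA / d)

def interval_score(numerator, denominator):
    """Equation (2): 2 / (numerator + denominator), from the interval's
    just frequency ratio (octave = 2/1, fifth = 3/2)."""
    return 2 / (numerator + denominator)

def onset_synchrony(streams, grid_times):
    """For each sixteenth-note grid time, the proportion of streams with
    an onset at that time, averaged over the grid (one value per piece)."""
    props = [sum(t in s for s in streams) / len(streams) for t in grid_times]
    return sum(props) / len(props)

# Worked values from the text: differences of 1 score .75, of 2 score .5.
print(len(stimulus_types), round(comodulation_score(1), 2), round(comodulation_score(2), 2))
```

Note that α = 2 ln 0.5 is negative, so e^(α/d) grows toward 1 as d increases and the score falls toward 0, matching the requirement that larger interval differences lie further from parallel motion.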
Table 1. Model specifications for behavioural and computational analysis models.

Model           Fixed effect        Estimate    Standard Error
Behavioural     (Intercept)           .12          .13
                Type1                 .03          .16
                Type2                 .48          .17
                Type3                -.05          .16
                Type4                 .87          .18
                Musicianship          .99          .18
Computational   (Intercept)          6.20         2.62
                Harmonic Overlap    -3.83         1.69
                Musicianship          .98          .18

B. Computational analysis

In a similar analysis to Duane (2013), onset synchrony, offset synchrony, harmonic overlap and pitch comodulation were calculated as outlined above and tested as fixed effects to predict participant accuracy, along with a fixed effect of musicianship and an intercept-only random effect of participants. The best model, as compared to a null model with only the random effect of participants, contained harmonic overlap and musicianship as predictors as well as an intercept, χ²(5) = 5.06, p = .02. See Table 1 for details of the model. Once again, the interaction between harmonic overlap and musicianship was significant, but the interaction coefficient was highly correlated with harmonic overlap and was therefore dropped.

IV. DISCUSSION

Overall, task accuracy was lower than targeted (~75%), particularly for non-musicians, who performed only slightly, though significantly, above chance. This outcome is likely due to the difficulty, for non-musicians, of detecting the small differences in the target melody required by the task, and does not allow us to infer any explanation of how melodic, harmonic and rhythmic information is processed in this context.

Despite this limitation, the current study still provides some interesting results and implications. First, it corroborates the importance of musical training in melodic segregation, as previously suggested (Marozeau et al., 2013; Zendel & Alain, 2009). Second, we have, to our knowledge, for the first time assessed stream segregation with complex and ecological polyphonic musical stimuli that systematically vary complexity in three musical parameters. Our results are contrary to predictions by Prince and colleagues (2009), which would suggest that pitch is a more salient feature than time because of its inherently greater complexity. This may be due to the deliberate orthogonal manipulation of pitch structure and rhythmic structure in our stimuli. The results are also contrary to those of Duane (2013), who found evidence for the importance of onset synchrony, offset synchrony and pitch comodulation, and none for harmonic overlap, in the stream segregation of music. We suggest that this is due to the greater polyphonic complexity of our stimuli, both in terms of number of voices and harmonic complexity. However, it seems likely that the greater complexity of our stimuli also had a negative effect on task performance, suggesting the need for larger alterations, greater separation between the voices or fewer voices to make the task easier in future research. Third, the location of the alteration had a significant effect on detection, where changes at the end of the target melody were more easily detected than those in the middle. This is likely to be a direct consequence of musical phrasing, where phrase endings, typically familiar, formulaic closures culminating on more or less definite cadences recognized across cultures (Huron, 1996), simplify detection of differences. This possibility is reinforced by both rhythm- and pitch-altered endings violating rules of musical closure.

ACKNOWLEDGMENT

This work is funded by Erasmus+/KAI1 Istruzione Superiore Mobility Consortia - North South Traineeship (2014/53), awarded to author MB.

REFERENCES

Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press.
Duane, B. (2013). Auditory Streaming Cues in Eighteenth- and Early Nineteenth-Century String Quartets: A Corpus-Based Study. Music Perception: An Interdisciplinary Journal, 31(1), 46-58. http://doi.org/10.1525/mp.2013.31.1.46
Huron, D. (1996). The melodic arch in Western folksongs. Computing in Musicology, 10, 3-23.
Iverson, P. (1995). Auditory stream segregation by musical timbre: effects of static and dynamic acoustic attributes. Journal of Experimental Psychology: Human Perception and Performance, 21(4), 751-763.
Marozeau, J., Innes-Brown, H., & Blamey, P. J. (2013). The Effect of Timbre and Loudness on Melody Segregation. Music Perception: An Interdisciplinary Journal, 30(3), 259-274. http://doi.org/10.1525/mp.2012.30.3.259
McDermott, J. H., Lehr, A. J., & Oxenham, A. J. (2008). Is Relative Pitch Specific to Pitch? Psychological Science, 19(12), 1263-1271. http://doi.org/10.1111/j.1467-9280.2008.02235.x
Prince, J. B., Thompson, W. F., & Schmuckler, M. A. (2009). Pitch and time, tonality and meter: how do musical dimensions combine? Journal of Experimental Psychology: Human Perception and Performance, 35(5), 1598-1617. http://doi.org/10.1037/a0016456
Schoenberg, A., & Stein, L. (1969). Structural functions of harmony. W. W. Norton & Company.
Van Noorden, L. (1975). Temporal Coherence in the Perception of Tone Sequences. Technical University Eindhoven, Eindhoven.
Vliegen, J., Moore, B. C. J., & Oxenham, A. J. (1999). The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task. Journal of the Acoustical Society of America, 106(2), 938-945. http://doi.org/10.1121/1.427140
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II [Investigations in Gestalt Theory: II. Laws of organization in perceptual forms]. Psychologische Forschung, 4, 301-350.
Zendel, B. R., & Alain, C. (2009). Concurrent sound segregation is enhanced in musicians. Journal of Cognitive Neuroscience, 21(8), 1488-1498. http://doi.org/10.1162/jocn.2009.21140
I. INTRODUCTION

Watching a pianist playing in a recital, often without any printed music, an audience member is often struck by the physical characteristics of the performance: the ability of the pianist to locate and play many notes in a complex and apparently effortless sequence. At the other end of the professional spectrum is the 'reluctant organist' attempting to play four hymns in a church service, with or without the added difficulty of coordinating notes on a pedalboard. We could be forgiven for thinking that motor coordination is the primary issue in studying a keyboard instrument. However, a major issue preventing the exploration of a wide range of music that lies within the musician's technical grasp appears to be the difficulty of reading the printed notation (see Fourie, 2004, for example).

One of the problems in studying music reading is the interaction of the temporal (rhythmic) and physical (pitch) dimensions. This study takes as its focus the pitch aspect alone. By omitting the rhythm dimension, it is possible to apply insights from the considerable literature of existing response time (RT) research to the issue of music notation.

Musical notation, considered as a semiotic system, is not a very effective map of the physical space of the piano keyboard. It does not illustrate the octave-repeating pattern of the keyboard, and identical visual symbols or clusters of symbols must be executed differently by the two different clefs/hands.

Existing research into piano sight-reading (Sloboda, 1974) suggests that expert sight-readers may process common musical configurations as 'chunks' to a greater extent than novices. However, as we see below, the notation system is not contributing effectively to the recognition of common musical patterns.

[Figure 1. Ten different musical 'meanings', each with a specific motor response pattern, represented by a single visual fragment.]

Research into the mental chronometry of visual processing suggests that 'multivalent' stimuli (i.e. visual symbols that can have more than one possible meaning) are processed more slowly than univalent symbols. Frequent changes to the response rules ('task-switches') slow the response time further, particularly in the first response after a switch (Vandierendonck, 2010).

Eye-tracking work by Goolsby (1994) showed pianists' gaze moving between treble and bass clef very frequently, sometimes on alternate saccades. If the two rule sets for treble and bass clefs can be considered a task-switch, clearly this would have implications for the speed of responding in piano sight-reading.

II. METHOD

Three separate experiments used a similar experimental setup, with three pitches presented on a computer screen at a time. The participant was requested to respond by playing the notes in the correct order, but as quickly as possible, as soon as they saw the stimulus. The key signature was given at the beginning of a block of 40-50 trials, and feedback on average response time and errors was given at the end of each block.
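The per-trial measurement used throughout these experiments (response time taken at the third keypress, and error scores of -1 for a single slip down to -3 for a whole-chord mistake, as described in the results) can be sketched as follows. This is hypothetical illustration code, not the experiment software; all names are ours:

```python
def score_trial(expected_notes, played_notes, press_times, stimulus_onset):
    """Score one three-note trial (illustrative sketch).
    Response time is taken at the third keypress; the error score is
    minus the number of wrong notes, so a single slip scores -1 and a
    whole-chord mistake (wrong hand or wrong clef) scores -3."""
    rt = press_times[2] - stimulus_onset
    errors = -sum(e != p for e, p in zip(expected_notes, played_notes))
    return rt, errors

# A correct trial: three notes played in order, RT measured at the third press.
print(score_trial(["C4", "E4", "G4"], ["C4", "E4", "G4"], [1.2, 1.5, 1.9], 0.0))
```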
C. Third Experiment
A third experiment focuses on the more difficult key
signatures of 2#, 3#, 4# and 2b, 3b, 4b. Taking the same basic
stimulus set as the second experiment, alternate blocks in the
Figure 2. Experimental setup with screenshot of a repeat trial in
third experiment use a form of clarified notation (figure 5)
the treble clef. A reminder key signature remains at the left that resolves the confusion between multivalent stimuli. 15
during the whole block. blocks present the six difficult key signatures (plus the central
key C major/A minor) in one of four maximally confusing key
A. First Experiment orders, to try to evaluate the effect of one key signature on
The first experiment used a classic alternating runs another, and establish to what extent visual ambiguity adds to
task-switching paradigm (Monsell 2003), in which two trials response time.
were presented to one hand/clef, followed by two trials to the Data from an initial group of 10 moderate participants is
other. Task-switch trials (immediately following a change of reported here, recruited from in and around Münster,
clef) may then be compared with task-repeat trials to measure Germany.
the effect of the changing the task. A set of visually balanced
triads was used as stimuli, shown in figure 3. This total set
was subdivided so that not every triad appeared in every block,
and the key signature was changed every two blocks. Thus a
contrast could also be made between triads that had been Figure 5. Modified noteheads, showing the chords Bb major and
recently rehearsed in the same key, or had recently appeared D major. Notes with the left half filled are flat (b), and with the
in a different key. right half filled, sharp (#). Short barlines clarify the clefs.
The whole experiment contained 18 blocks covering 9 keys,
and were presented in two main key signature orders. 22
participants of widely varying age and musical background took part, recruited by word of mouth from in and around Exeter, UK, and their data were partitioned later into two equal-sized groups, of expert or moderate proficiency.

Figure 3. The stimulus set of experiment 1.

B. Second Experiment
The second experiment removed the task-switching element of the design (which, although significant, was not found to be large in comparison to other effects) and also presented a more rigorously balanced set of experimental stimuli (figure 4), using seven root-position chords. Every stimulus appeared in every block of every key. In this experiment each triad was also presented in reverse order. Nine key signatures were given in a randomly shuffled order.

Participants were all attending Dartington Summer School and were recruited through the summer school choir, so that in addition to their piano skills they were also all reasonably experienced choral singers. 12 participants had average RTs that approximately matched those of the moderate group from the first experiment.

III. RESULTS
A more detailed account of these results may be found in the proceedings of the TENOR 2016 conference, where the effects and applications of clarified notation are further discussed. Results are summarised here in brief, framing a discussion of the confounds in standard music-reading.

The data analysis relies on averaging the mean RT over groups of participants across the cells of the design. Although it would be possible to normalise the data across all participants, there are some aspects of motor coordination and cognitive architecture which are common to all levels of competence. Reaction time is a direct reflection of a physical quantity (processing duration) and is consequently not usually transformed in reporting experiments of this type.

Response time was taken at the third keypress, and data were averaged across correct trials for analysis.

Error scores of -1 were mostly single errors of execution in the correct hand in the right general area of the keyboard, whereas errors of type -3 were almost all mistakes of switching (the wrong hand used, or wrong clef read).

A. Average Response Times
Participants in the ‘expert’ group of the first experiment had average response times between 800 and 1500ms, and those in the ‘moderate’ group had response times between 1500 and 2500ms. Participants from the second experiment classed as ‘moderate’ had response times between 1150 and 2300ms, and from the third experiment between 1400 and 2450ms.
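The aggregation described above (RT logged at the third keypress, error trials excluded, mean RT computed per cell of the design) can be sketched as follows; the trial records and field names are hypothetical, not the authors' actual data format:

```python
# Sketch of the RT aggregation in section III: response time is logged at
# the third keypress, error trials are excluded, and mean RT is computed
# per cell of the design (here: proficiency group x clef).
# Trial records and field names are hypothetical.
from collections import defaultdict

def cell_means(trials):
    """Mean RT (ms) over correct trials for each (group, clef) cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for t in trials:
        if t["correct"]:                      # drop error trials (-1, -3, ...)
            acc = sums[(t["group"], t["clef"])]
            acc[0] += t["rt_ms"]              # RT already taken at 3rd keypress
            acc[1] += 1
    return {cell: s / n for cell, (s, n) in sums.items()}

trials = [
    {"group": "expert", "clef": "treble", "rt_ms": 900,  "correct": True},
    {"group": "expert", "clef": "treble", "rt_ms": 1100, "correct": True},
    {"group": "expert", "clef": "bass",   "rt_ms": 1200, "correct": False},
    {"group": "moderate", "clef": "bass", "rt_ms": 2100, "correct": True},
]
means = cell_means(trials)
```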
B. Effect of Clef
The effect of clef was highly significant across all experimental and pilot data, with the Treble clef generally faster than the bass clef. This Treble clef advantage included a number of self-reported left-handers, bass singers and ’cellists who might be expected to read bass clef more fluently.

In the expert group of the first experiment the contrast between the averages for each clef was 118ms, and in the moderate group it was 209ms. In the moderate group of the second experiment the impact of clef was proportionally lower but still highly significant, at 105ms. The contrast in the third experiment was 202ms, closely comparable to the moderate group of experiment one.

C. Effect of Switch of Clef
During the first experiment, in the expert group in particular, a number of very experienced individuals found it very hard not to alternate between the clefs/hands, and frequently hesitated or moved the wrong hand slightly in repeat trials.

Nevertheless a small but highly significant effect of ‘task-switch’ was found, in which the average response time when the clef had just been changed was greater than the average response time in trials where the clef remained the same. This difference was 46ms in the expert group, and 138ms in the moderate group.

D. Effect of Last-seen Clef Congruence
In the first experiment, the effect of stimuli on one another within the experiment was analysed in two ways.

1) On a global scale, three subsets of stimuli were rotated so that half the trials in each block were from a ‘repeat set’ – i.e. they were also shown in the previous block – and half from a ‘novel set’ that had been absent in the previous block.

2) At the local level, within each block, each stimulus appeared four times, once in each clef-switch/repeat condition, i.e. twice in each clef in each block. Investigating whether the RT of a stimulus is affected by its most recent previous appearance, trials were coded according to whether the stimulus had most recently been seen in the same (similar) clef, or in the other clef (different): see Figure 6.

Figure 6. Illustration of last-seen-clef similarity. (Stimuli numbered arbitrarily.)

Comparing the two subsets of ‘novel’ and ‘repeat’ stimuli within blocks where the key signature remained the same, no significant effect was found in either the expert or moderate groups, or in the error rates. The variable describing ‘last-seen-clef’ congruence, however, was found to be highly significant in both expert and moderate groups, with contrasts of 45ms and 96ms respectively.

E. Effect of Presentation Order
In the second and third experiments, the order of presentation was varied, with average response times to rising triads found to be faster than falling triads. In the second experiment this difference was 100ms, and in the third, 145ms.

F. Effect of Inversion
In the first experiment the effect of inversion was unexpectedly found to be significant, with no interaction with the number of black notes, or their position in the triad. In the expert group, this effect was small but highly significant, with a difference between the root and the first inversion of 8ms, and between the first and second inversions of 28ms, a combined difference of 29ms. In the moderate group, this effect was much larger, with a difference between root and first inversion of 168ms and between first and second inversions of 48ms, or 216ms combined.

Closer inspection of individual keypress data suggests that this may relate to the experimental procedure; the participants tended to arpeggiate their response in a single ‘gesture’. The greater distance between two wider-spaced notes of an inverted chord is reflected in a slight delay between the keypresses. This effect appears to account for about half of the variation in inversion response times.

G. Effect of Diatonic Chord
The position of the chord within the key can be analysed according to its diatonic function, with chord 1 (I) the major key chord, and chord 6 (vi) the minor key chord. Chords 1-6 all appear in different positions in neighbouring keys, but chord 7 (vii), the diminished chord, is unique to each key signature and thus less practiced across the literature and within this experimental paradigm.

The contrast between chord 7 and the rest was significant in every group. The difference between the average response time for chord 1 and the average response time for chord 7 in the first experiment was 143ms in the expert group and 182ms in the moderate group; a proportionally much greater difference for the experts. In the second and third fully-balanced experiments this range was 219ms and 232ms respectively, with some pairwise comparisons between chords also significant.

Figure 7. Diatonic Chord profiles: mean RT (ms) by diatonic chord (I to vii), shown for the Expert Group and for the Moderate Groups of Experiments 1 and 2.
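The trial coding used in section III.D can be sketched as below: each trial is labelled by whether its stimulus was last seen in the same clef or the other clef. The function and data layout are illustrative, not the authors' implementation:

```python
# Sketch of the 'last-seen-clef' coding in section III.D: for each trial,
# look up the clef in which the same stimulus last appeared and label the
# trial 'similar' (same clef), 'different' (other clef), or None (first
# appearance). Data layout is illustrative.
def code_last_seen_clef(trials):
    """trials: sequence of (stimulus_id, clef) in presentation order."""
    last_clef = {}
    codes = []
    for stim, clef in trials:
        prev = last_clef.get(stim)
        if prev is None:
            codes.append(None)                # no previous sighting
        elif prev == clef:
            codes.append("similar")
        else:
            codes.append("different")
        last_clef[stim] = clef
    return codes

codes = code_last_seen_clef([
    (1, "treble"), (2, "bass"), (1, "treble"), (1, "bass"), (2, "bass"),
])
```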
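The claim in section G — that chords I-vi recur in neighbouring keys while the diminished vii chord is unique to its key signature — can be checked with a short pitch-class sketch (not from the paper):

```python
# Sketch checking the claim in section G (Effect of Diatonic Chord): among
# the diatonic triads of the 12 major keys, every major and minor triad
# belongs to several keys, but each diminished (vii) triad belongs to
# exactly one key.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone steps of a major scale

def diatonic_triads(tonic):
    """The seven diatonic triads of a major key, as pitch-class sets."""
    scale = [(tonic + step) % 12 for step in MAJOR_SCALE]
    return [frozenset(scale[(d + i) % 7] for i in (0, 2, 4)) for d in range(7)]

keys_containing = {}
for tonic in range(12):
    for degree, triad in enumerate(diatonic_triads(tonic)):
        keys_containing.setdefault(triad, []).append((tonic, degree))

# How many keys contain each triad that occurs as vii (degree index 6)?
vii_counts = {len(keys) for triad, keys in keys_containing.items()
              if any(degree == 6 for _, degree in keys)}
```

Running this shows each diminished triad appears in exactly one major key, whereas the major and minor triads each appear in three.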
(Figures: mean RT (ms) by key signature, 4b to 4#; panels include Experiment 2.)
IV. DISCUSSION
In principle there may be a number of factors at work to explain the variation in response times reported here. Hand-eye coordination in this situation requires at least three steps: decoding of the instructions, some transformation into an action plan, and then the execution of that plan in space. Delays may be incurred by extra difficulties at any of these stages.

The only evidence for significant motor difficulty in executing common chords at the keyboard was provided by the very small delay for chords with two black notes in the expert group of the first experiment. The contrast between clef/hand is unlikely to be a coordination problem because it remained evident in left-handers.

A likely example of a transformation difficulty is the effect of presentation order (III.E). In the context of piano playing, ascending figures are congruent to the order in which the notes are to be executed, shown in figure 11 below. Processing notes from left to right and then executing them in the other direction requires a mirror transformation that may confer a delay.

The difference between the clefs themselves can also be considered from this point of view. The two hands are mirror images of one another, operating in an environment that repeats consistently from left to right. So either a chord or a visual pattern that is fingered 1-3-5 in the right hand will be executed 5-3-1 in the left hand; a similar mirror transformation. This would not explain why the right hand/treble clef is faster in left-handed individuals, but provides one hypothesis for the difference between hands.

Figure 11. Playing/Reading incongruence in falling triads.

In terms of decoding the instructions, several factors may be at work. We expect a two-fold effect of multivalent notation, both in obscuring the actual requirements at the point of execution, and in obstructing the long-term pattern learning that we know is a pre-requisite for good sight-reading. Disentangling these effects from one another, and from middle-term recency effects of one key signature on another, is a complex but fascinating challenge. Pianists represent a population with thousands of hours of practice at a very complex and interlocking set of task-mappings, which is unmatched in normal experimental settings.

The effect of last-seen congruence reported in section III.D is an important finding. Apparently the clef of the last sighting of a visual configuration (either in the same clef or the other clef) was more significant than whether that stimulus had been rehearsed in the previous block, where it would have been shown four times, twice in each clef. This suggests that beyond the small ‘task-switch’ effect, the switch might in itself have the capacity to disrupt recent learning.

In terms of longer term learning, there are two main lenses we might apply. The ’cellists (including one who was also left-handed), on being informed that they read more fluently in the right hand, expressed little surprise, stating that they had learned the Treble clef first as children, and had always felt comfortable reading it. If early experience of a set of task rules forms the basis for music-reading, with other mappings either overlaid in conflict, or extended by some further transformation, we might also be able to explain the overwhelming advantage observed for the central key of C major/A minor, which most piano teachers introduce first.

On the other hand, accumulated exposure may play a substantial role in shaping long term learning. The treble clef effect may simply reflect the fact that the timbral qualities of the instrument mean there are usually more notes to play in the right hand than the left, those notes are somewhat less likely to form patterns that can be taken in at a glance, and consequently treble clef reading is more practiced. Whether the same can be said of the central key C major/A minor is more questionable, as any young learner who has spent many hours playing the Moonlight Sonata will attest.

There is some indication in the data that some keys may be read as ‘extensions’ of other related keys. Some 50% of all the single-note errors in the first and second experiments were semitone errors at the 4th or 7th degree of the scale, i.e. forgetting the last flat or sharp of the key signature. This seemed to be irrespective of the immediately preceding key signature. In addition, some of the diatonic graphs showed signs of skew towards a particular reading strategy. When chord IV appears to be the most fluently read in the key of 1#, for example, we may suppose that the key is essentially being read as “C major with an extra sharp”. Additionally some participants’ individual graphs in 3b show an advantage for chords vi and iii (when iii is usually the most disadvantaged chord after chord 7), which strongly suggests that 3b is being read preferentially as C minor, rather than as Eb major. These ‘extension’ strategies appear to be quite individual, however, and may reduce or disappear in experts; more data from balanced experiments is required to investigate this effect.

Key signatures may interfere with each other in a number of ways. One might imagine that A major (3#) followed by A flat major (4b) might present a particular set of difficulties, as every single note mapping will be altered. On the other hand, the diatonic chords remain in the same locations on the musical staff, which may be an advantage. In terms of pure spatial congruence, 3b is the mirror inverse of 3#, and playing these one after the other might also be expected to cause a particular pattern of disruption between the hands. A further possibility, that the major and minor key signatures of the same keynote either enable or disrupt one another, could also be considered. None of these questions provide a direct hypothesis for why 3# should show longer average response times than 3b, but they provide plausible avenues of enquiry which form the basis of the third experiment.
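The mirror relation discussed in section IV — a pattern fingered 1-3-5 in the right hand is executed 5-3-1 in the left — amounts to a simple reversal, sketched here for illustration only:

```python
# Sketch of the mirror transformation discussed in section IV: for the same
# ascending keys, the left hand's finger order is the reverse of the right
# hand's (thumb = 1). Illustrative only.
def mirror_fingering(right_hand):
    """Right-hand fingering for an ascending figure -> left-hand fingering."""
    return list(reversed(right_hand))

left = mirror_fingering([1, 3, 5])
```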
V. CONCLUSION
The multivalent nature of music notation, as discussed in
the introduction, may provide a reasonable hypothesis for the
particular difficulty of sight-reading in two clefs and in many
keys. This experiment was set up to encourage ‘chunking’ of
three-note groups into patterns that are amongst the most
common in the literature, and to observe how better pattern
recognition of these chunks may be acquired.
Further work aims to collect more data from expert
participants to contrast with the moderate group of the second
experiment, and to complete the key signature comparison
design of the third experiment. Further study of the ways in
which individuals select or extend core key signature
mappings, and of the learning that takes place during the
experiment is also planned.
ACKNOWLEDGMENTS
Many thanks to all the participants for their time, interest,
suggestions and anecdotal insights, to Professor Stephen
Monsell for his supervision in devising the experiment, and to
the staff of Dartington Summer School for making the second
experiment possible.
REFERENCES
Fourie, E. (2004). The processing of music notation: Some implications for piano sight-reading. Journal of the Musical Arts in Africa, 1, 1–23.
Goolsby, T. W. (1994). Eye movement in music reading: Effects of reading ability, notational complexity, and encounters. Music Perception: An Interdisciplinary Journal, 12(1), 77–96.
Monsell, S. (2003). Task switching. Trends in Cognitive Sciences, 7(3), 134–140. http://doi.org/10.1016/S1364-6613(03)00028-7
Sloboda, J. (1974). The eye-hand span: An approach to the study of sight reading. Psychology of Music, 2(2), 4–10. http://doi.org/10.1177/030573567422001
Vandierendonck, A., Liefooghe, B., & Verbruggen, F. (2010). Task switching: Interplay of reconfiguration and interference control. Psychological Bulletin, 136(4), 601–626. http://doi.org/10.1037/a0019791
Wood, M. (2016). Visual confusion in piano notation. In Proceedings of the Second International Conference on Technologies for Music Notation and Representation (in press).
ISBN 1-876346-65-5 © ICMPC14 169 ICMPC14, July 5–9, 2016, San Francisco, USA
For the purpose of the cognitive model, six acoustic features were automatically extracted from each of these excerpts. As a first step in building a neural network examining the relationship between these acoustic features and listener ratings of affect, regression models were constructed. These regression models can be taken as proxies of correlations between the surface musical features and listener perceptions of affective expression.

A. Stimuli
For this study, it was important to use excerpts that would be representative of strong affective expressions. Therefore, the Society for Music Theory listserv, the American Musicological Society listserv, and the Piano Street forum were polled for those excerpts that were deemed to be “the most emotionally expressive moments in the Beethoven piano sonata literature.” As a result of this poll, thirty excerpts were provided, along with the descriptions of the excerpts provided in parentheses.

To control for excerpt length, excerpts of five seconds in length were chosen. If the excerpt provided by the respondent was longer than five seconds, the five seconds including the moment most closely matching the description was chosen. If there was no description of the passage, the first five seconds were used. More detail on recordings chosen can be found in Albrecht (2016). The excerpts used are listed in Table 1.

Table 1. The thirty excerpts used in this study as a result of polling expert musicians.

Excerpt | Excerpt
Op. 2, No. 1, II (2nd theme) | Op. 53, III (major 7th arpeg.)
Op. 2, No. 1, IV (mm. 20-50) | Op. 57, I (start of 2nd theme)
Op. 7, II (2nd theme) | Op. 81a, III (ending)
Op. 10, No. 3, I (no section) | Op. 81a, II (2nd theme)
Op. 10, No. 3, II (development) | Op. 90, II (canon)
Op. 13, No. 1, II (entire mvt.) | Op. 101, III (intro to final mvt.)
Op. 26, I (resolutions in Var. III) | Op. 106, I (beginning)
Op. 26, IV (main theme) | Op. 106, II (beginning)
Op. 27, No. 2, I (trans. to A1) | Op. 106, III (mm. 60 or 145)
Op. 27, No. 2, III (opening) | Op. 109, I (retransition)
Op. 28, I (mm. 109-127) | Op. 109, III (Var. VI)
Op. 31, No. 2, I (recit. at close) | Op. 110, I (sad part)
Op. 31, No. 2, III (mm. 169-173) | Op. 110, III (mm. 21-23)
Op. 31, No. 3, I (ms. 99) | Op. 111, I (beginning)
Op. 53, I (2nd theme) | Op. 111, II (mm. 116-121)

B. Affect terms
Affect terms in this study were taken from Albrecht (2014), who conducted a series of three experiments to empirically determine which affect labels would be appropriate for studying the Beethoven piano sonatas. From these studies, he obtained nine affective dimensions, used in this study: angry, anticipating, anxious, calm, dreamy, exciting, happy, in love, and sad.

C. Participants
This study involved 68 participants from the University of Mary Hardin-Baylor, consisting of freshmen and sophomore music majors. Among these participants there were 33 females, a mean age of 19.4 (SD = 1.5), a mean number of years of musical training of 7.1 (SD = 3.7), and a mean Ollen Musical Sophistication Index (Ollen, 2006) of 455.0 (SD = 243.9). More detail about the participants can be found in Albrecht (2016).

D. Procedure
Subjects participated in groups of up to 8 in a computer lab. After reading the experiment’s instructions, participants listened to two trial excerpts to ensure they understood the instructions and interface. Participants listened to the excerpts on headphones adjusted to a comfortable listening level, and rated each excerpt’s affective expression on a 7-point Likert scale.

E. Automatic feature extraction
For the purpose of constructing models of musical affect, six acoustic features were automatically extracted from each of the thirty five-second audio excerpts. These features were extracted using LabVIEW 2015, version 15.02f. For the purpose of continuously tracking certain audio features across the five-second excerpt, excerpts were subdivided into fifty 100-millisecond windows. A fast Fourier transform was performed to determine the amplitudes of the frequencies present in each window. The algorithms for the extraction of each of the six audio features are described further below.

1. Timbre. The spectral centroid, f_c, of each excerpt was used as a measure of the timbre of the piece (see Schubert, Wolfe, and Tarnopolsky, 2004). The fast Fourier transform produces a power spectrum that relates amplitudes of a signal, A, to the corresponding frequency, f. Using this information, the spectral centroid was computed as the weighted average of frequencies by amplitude,

f_c = Σ(A_i · f_i) / Σ(A_i).

2. Maximum Amplitude. The measurement of dynamic range was extracted from each 100-millisecond splice. Amplitude was recorded at the peak of each note-produced sine wave. The maximum amplitude of each splice was collected. The largest maximum of all of the splices was then selected as the maximum amplitude of the excerpt.

3. Average Amplitude. Using the power spectrum produced by the fast Fourier transform, the amplitudes of the two strongest signals were acquired for each 100-millisecond window. These amplitudes were then averaged for each splice throughout the piece to gain a measured mean volume.

4. Roughness. Vassilakis (2005) showed that auditory roughness corresponds to auditory consonance. In this study roughness was measured within each 100-millisecond splice using the following formula from Vassilakis:

R = X^0.1 · 0.5 · Y^3.11 · Z,

where:

X = A_min · A_max
Y = 2·A_min / (A_min + A_max)
Z = e^(−b1·s·(f_max − f_min)) − e^(−b2·s·(f_max − f_min)),

with, in Vassilakis (2005), b1 = 3.5, b2 = 5.75, and s = 0.24 / (0.0207·f_min + 18.96).
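The windowing, spectral-centroid, and centroid rate-of-change computations described in section E can be sketched with numpy; the signal here is synthetic and the code illustrates the formulas, not the authors' LabVIEW implementation:

```python
# Sketch of features 1 and 5: split a signal into 100 ms windows, compute
# the amplitude-weighted spectral centroid f_c of each window, then sum the
# absolute centroid changes between consecutive windows. Synthetic signal;
# not the authors' LabVIEW code.
import numpy as np

def spectral_centroids(signal, fs, window_s=0.1):
    n = int(fs * window_s)                       # samples per 100 ms window
    centroids = []
    for start in range(0, len(signal) - n + 1, n):
        amps = np.abs(np.fft.rfft(signal[start:start + n]))
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        centroids.append(np.sum(amps * freqs) / np.sum(amps))  # f_c
    return np.array(centroids)

def centroid_rate_of_change(centroids):
    """Sum of |f_c,i - f_c,i-1| over consecutive windows (feature 5)."""
    return float(np.sum(np.abs(np.diff(centroids))))

fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)                  # five 100 ms windows
tone = np.sin(2 * np.pi * 440 * t)               # steady 440 Hz sine
fc = spectral_centroids(tone, fs)
```

For a steady sine tone the per-window centroids sit at the tone's frequency and the rate of change is near zero, as expected.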
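The pair-roughness computation can be sketched as below. The typeset equation did not survive extraction, so the constants used here (3.5, 5.75, 0.0207, 18.96, 0.24) are the commonly published Vassilakis (2005) values and should be treated as an assumption:

```python
# Sketch of Vassilakis-style roughness R = X**0.1 * 0.5 * Y**3.11 * Z for a
# pair of sine components. The constants are the commonly published
# Vassilakis (2005) values, assumed here because the printed equation was
# garbled in extraction.
import math

def pair_roughness(f1, a1, f2, a2):
    f_min, f_max = min(f1, f2), max(f1, f2)
    a_min, a_max = min(a1, a2), max(a1, a2)
    x = a_min * a_max                         # X: intensity dependence
    y = 2.0 * a_min / (a_min + a_max)         # Y: amplitude fluctuation degree
    s = 0.24 / (0.0207 * f_min + 18.96)       # frequency-dependent scaling
    d = f_max - f_min
    z = math.exp(-3.5 * s * d) - math.exp(-5.75 * s * d)  # Z: fluctuation rate
    return x ** 0.1 * 0.5 * y ** 3.11 * z

r_semitone = pair_roughness(440.0, 1.0, 466.0, 1.0)   # rough interval
r_octave = pair_roughness(440.0, 1.0, 880.0, 1.0)     # smooth interval
```

Unison pairs yield zero roughness, and a semitone yields far more roughness than an octave, matching the intended behaviour of the model.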
A_min and A_max are the minimum and maximum amplitudes of the two strongest frequencies in the power spectrum of each 100-millisecond splice. f_min and f_max are the minimum and maximum frequencies, respectively, of the same two components. X corresponds to the dependence of roughness on intensity (amplitude) of accounted sine waves. Y denotes the fluctuation degree of amplitude. Z represents the relation of roughness to the fluctuation rate of amplitude.

5. Spectral Centroid Rate of Change. As defined above, the spectral centroid was measured in each 100-millisecond splice. To find the spectral centroid rate of change, the following equation was applied to each consecutive pair of splices:

Σ |f_c,i − f_c,i−1|.

Spectral centroid rate of change measures the velocity of the spectral centroid throughout the piece, a calculation of the speed at which sound quality fluctuated throughout the five-second excerpt. The summation of individual splice measurements provided a numeric measurement for spectral rate of change over the entirety of the five-second musical excerpt.

6. Oscillation Mode of the Dynamic Signal. The final predictor in this study is the velocity of the amplitude signal throughout the excerpt. Amplitude measurements for each 100-millisecond splice were recorded and plotted against time. An example of this plot is shown in Figure 1 for Op. 31, No. 3, I. The resulting wave was analyzed with a Fourier transform, and the dominant frequency was recorded. This frequency is the principal mode of oscillation of the dynamic signal. For the example shown, the frequency was found to be 0.26 Hz.

Figure 1. Example of the dynamic signal for excerpt Op. 31, No. 3, I, with a frequency of 0.26 Hz.

IV. RESULTS

A. Predictor Selection
Once values were assigned to each excerpt according to the six low-level acoustic features defined above (see III.E), each variable was treated as a predictor variable in a linear regression model for each of the nine affective categories. Model selection is not a straightforward task, and suffers from dangers of overfitting and covariance. Regression models were selected that a) maximized variance accounted for, b) minimized overfitting by limiting the number of predictors, and c) selected for theoretically meaningful predictors. One model was created for each affective dimension.

Derived models included 2-5 predictor variables each, accounting for between 6.9%-41.5% (mean of 20.9%) of the variance in participant ratings. The musical parameters used in each model, along with the directionality of the effect, are shown in Table 2.

Table 2. Regression models derived from the training set. Adj. R2 values give the percent of variance explained.

Affective dimension | Regression model | Adj. R2
Angry | Overall maximum amplitude, timbre, spectral centroid rate of change | 0.111
Anticipating | Overall maximum amplitude, oscillation mode of the dynamic signal, spectral centroid rate of change | 0.071
Anxious | Overall maximum amplitude, oscillation mode of the dynamic signal, spectral centroid rate of change | 0.082
Calm | Overall maximum amplitude, average amplitude, timbre, roughness, spectral centroid rate of change | 0.401
Dreamy | Overall maximum amplitude, average amplitude, timbre, spectral centroid rate of change | 0.287
Exciting | Overall maximum amplitude, average amplitude, roughness | 0.415
Happy | Overall maximum amplitude, average amplitude, roughness | 0.223
In love | Overall maximum amplitude, timbre, spectral centroid rate of change | 0.072
Sad | Average amplitude, roughness | 0.222

V. DISCUSSION
The results from this study are consistent with the hypothesis that low-level automatically extracted acoustic features can predict listener ratings of perceived affect in Beethoven's piano sonatas with statistically significant accuracy. Nevertheless, the study's methodology and results should be contextualized with a few caveats.

A. Additional acoustic variables
In the first place, it is important to note that the six acoustic features examined in this study are somewhat limited. Several important elements of a musical signal are not fully accounted for by these predictors. For example, although average spectral centroid gives an approximation of timbre and pitch height, more detailed pitch information could be taken into consideration. The partials present in the signal could be algorithmically reduced to approximations of complex pitches made up of multiple partials in the cochlea. From this algorithm, a measure could be made of the highest and lowest pitches, as well as the density of chords.

Another variable that could be further explored is melodic progression. 100 ms is a short time, but many musical passages use notes shorter than 100 ms. Further divisions in
window size could be made, allowing for a more detailed investigation of melodic contour and motion. A third variable measured could be mode, which has been shown to be a significant indicator of affective expression. Further research could quantify the 'majorness' or 'minorness' of each window length and so estimate the mode of each excerpt.

Finally, variables related to rhythm and tempo could be extracted from the signal. The number of onset moments in each five-second excerpt could be calculated by looking at the inter-onset intervals between amplitude spikes, providing an estimate of rhythmic activity. In addition, a tempo variable could be algorithmically derived based on rhythmic and accentual regularities in the acoustic signal.

B. Model Strength
The 'missing variables' discussed in the preceding section almost certainly play a role in reducing the amount of variance in participant ratings the regression models reported here can explain. Of course, listeners who are parsing the acoustic signal in real time have access to all of the variables described in the preceding section, and the failure to include them in these models likely limits the success of the models' predictions.

Although each regression model listed can explain a significant amount of the variance in listener ratings, some of the models do a poor job of fitting the data. For example, the anticipating regression model can only account for 7.1% of the variance, while in love and anxious only account for 7.2% and 8.2%, respectively. It is likely that these numbers would increase with the inclusion of more variables.

Some models can explain up to 41.5% (exciting) or 40.1% (calm) of the listener variance. These higher values offer more confidence that the models are more accurately measuring musical parameters associated with these emotions. Nevertheless, in comparison to Eerola, Friberg, and Bresin (2013), Schubert (2004), or Eerola, Lartillot, and Toiviainen (2009), there is still room for improvement.

C. Overfitting
As mentioned above, the regression models derived from this dataset may be overfit to the data. Although adjusted R2 values attempt to correct for overfitting by penalizing extra variable selection, it is still possible that the models may be overly specific with respect to participants, excerpt selection, and affective communication. Further research and algorithms will need to be developed in order to address these categories of overfitting.

VI. CONCLUSION
The purpose of this research effort is to construct machine-learning algorithms to analyze and predict the correlation between low-level musical features automatically extracted from an audio excerpt and participants' evaluation of the emotional expression of that excerpt. One way to begin an effort to build a machine-learned neural net exploring these relationships is to first investigate correlations using a multiple regression model. The regression models reported here show that the low-level acoustic features automatically extracted from these audio signals are significantly correlated with perceived emotion. It is therefore fitting to further expand this work by training a machine-learned neural network on the same task. It is possible that a machine-learned model may provide further insight into not only the relationships embedded in the signal, but also the way humans process a cognitively complex signal like musical emotion.

In future work, the addition of further parameters will be explored due to the low adjusted R2 values observed in the current set. Examples of further parameters include: melodic progression, rhythm, and tempo throughout the musical excerpt.

The next step in this research program is to define the machine learning algorithms that will employ the automatic extraction algorithms described in the current work. These algorithms will be trained on the data set described above and the results will be compared to current statistical approaches to the problem of modeling the emotional response to complex pieces of music.

REFERENCES
Albrecht, J. (2014). Naming the abstract: Building a repertoire-appropriate taxonomy of affective expression. In Proceedings of ICMPC-APSCOM 2014 Joint Conference (pp. 128-135). Seoul, South Korea: Yonsei University.
Albrecht, J. (2016). Learning the language of affect: A new model that predicts perceived affect across Beethoven’s piano sonatas using music-theoretic parameters. In Proceedings of the ICMPC-SMPC 2016 Joint Conference, San Francisco, USA.
Eerola, T., Friberg, A., & Bresin, R. (2013). Emotional expression in music: Contribution, linearity, and additivity of primary musical cues. Frontiers in Psychology, 4:487. doi:10.3389/fpsyg.2013.00487
Eerola, T., Lartillot, O., & Toiviainen, P. (2009). Prediction of multidimensional ratings in music from audio using multivariate regression models. In Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR 2009) (pp. 621-626). Kobe, Japan.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences, 31(05), 559-575.
Juslin, P. N., & Lindstrom, E. (2010). Musical expression of emotions: Modelling listeners' judgements of composed and performed features. Music Analysis, 29(1-3), 334-364.
Ollen, J. (2006). A criterion-related validity test of selected indicators of musical sophistication using expert ratings (Doctoral dissertation). Columbus, OH: Ohio State University.
Schubert, E. (2004). Modeling perceived emotion with continuous musical features. Music Perception: An Interdisciplinary Journal, 21(4), 561-585.
Schubert, E., Wolfe, J., & Tarnopolsky, A. (2004, August). Spectral centroid and timbre in complex, multiple instrumental textures. In Proc. 8th Int. Conf. on Music Perception & Cognition (ICMPC), Evanston.
Vassilakis, P. N. (2005). Auditory roughness as means of musical expression. Selected Reports in Ethnomusicology, 12, 119-144.
Research shows that active music instruction with K-12 populations may enhance
academic achievement. This enhancement may be due to better auditory processing in
students who participate in music. We believe there are additional possible advantages to
active music instruction with a focus on musical improvisation. Improvisation involves
combining discrete elements (notes, musical figures) in real time following musical rules.
In addition, improvisers plan architectural features of upcoming passages and evaluate
whether the played output corresponds to these plans. The evaluation process also
identifies errors in note choices that may not comply with the given tonal and rhythmic
context. Therefore, we hypothesize that students who participate in intensive jazz
improvisation instruction exhibit enhanced general cognitive abilities on measures related
to inhibition, cognitive flexibility, and working memory. We are testing this hypothesis in
an ongoing longitudinal study in which middle school students in a large suburban band
program complete cognitive tests before and after receiving intensive jazz improvisation
instruction. Initial between-subjects results show that students with improvisation
experience (N=24) make significantly fewer errors on the Wisconsin Card Sort Task
(M=11.9, % perseverative errors) than a control group (M=14.7, N=141). However, there
were no significant differences between the two groups in measures of cognitive
inhibition and working memory performance. We are currently engaging a much larger
group in intensive jazz instruction (N=70). Results of these within-subjects pre/post tests
will be available by the time of the proposed presentation. We believe this is the first
investigation of far-transfer effects of music improvisation instruction.
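The between-groups comparison reported above can be sketched as follows. This is an illustration on synthetic data patterned on the reported group sizes and means (the standard deviations are assumed, and this is not the authors' analysis code); Welch's t statistic is used because the group sizes are very unequal.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic % perseverative error scores (SDs are assumed for illustration):
improv = rng.normal(11.9, 4.0, 24)     # improvisation group
control = rng.normal(14.7, 5.0, 141)   # control group

def welch_t(a, b):
    """Welch's t statistic: robust to unequal variances and group sizes."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

t = welch_t(improv, control)
print(round(t, 2))  # negative: improvisers make fewer errors
```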
between expectation and perceptual imagery, both of which are widely studied with well-established paradigms in music perception and cognition in behavioral (psychophysical) tests, and in time-sensitive measures of electrical brain activity (Janata & Paroo, 2006; Koelsch et al., 2000).

D. The Present Research
Here we apply psychophysical, electrophysiological, and psychometric tools from music perception and cognition research to clarify our understanding of creativity. Specifically, we examine the roles of divergent thinking, expectation, and perceptual imagery in jazz musicians as a model of creativity, compared with non-improvising musicians and nonmusician control groups. A major advantage of the following tests is that they offer specific, controlled stimuli to couple with neural measures, thus cutting down the problem space for a more rigorous understanding of jazz improvisation as a domain of creativity.

II. AIMS
A. Overall hypothesis
Here we combine psychophysical measures of auditory imagination and perception (Janata & Paroo, 2006), behavioral and electrophysiological measures of musical expectation (Koelsch et al., 2000), and domain-general measures of divergent thinking (Torrance, 1968), to test the hypothesis that spontaneous musical creativity depends on 1) heightened perceptual awareness and more accurate mental imagery, 2) increased sensitivity to and awareness of unexpected events, and 3) heightened domain-general divergent thinking abilities. We test this hypothesis using jazz musicians as a model of spontaneous musical creativity, compared with non-improvising musicians and non-musician controls.

III. METHOD
A. Subjects
Subjects were recruited from Wesleyan University or the Hartt School of Music in exchange for monetary compensation or partial course credit. Subjects gave informed consent as approved by the Institutional Review Boards of Wesleyan University and Hartford Hospital.

Table 1. Subject characteristics in Jazz musician, Control (non-jazz) musician, and Non-musician groups

| Group | N | % Female | Age (years) M(SD) | Pitch Discrim (Hz) M(SD) | Raw IQ (Shipley, 1940) M(SD) | Training Onset (years) M(SD) | Training Duration (years) M(SD) |
| Jazz Musicians | 17 | 11.8 | 20.1 (1.5) | 4.1 (3.2) | 17.5 (1.8) | 7.9 (2.8) | 8.7 (3.4) |
| Control Musicians | 16 | 50.0 | 22.4 (5.9) | 5.0 (2.2) | 16.5 (1.6) | 8.8 (3.4) | 10.0 (4.6) |
| Non-musicians | 24 | 62.5 | 19.0 (1.1) | 14.4 (13.6) | 16.6 (2.3) | 9.3 (2.8) | 2.1 (2.1) |

B. Scale Imagery Task
Participants listened to scales (either major, harmonic minor, or blues) and judged whether the last note was modified in pitch. The scales were played using Max/MSP, and each note lasted for 250 ms with an inter-onset interval of 600 ms. All 12 keys were used randomly throughout the experiment, with starting notes of F#3 (184.997 Hz) through F4 (349.228 Hz). There were 108 trials total, with a block of 36 trials for each scale type, for which the order was rotated for each participant. The last-note alterations were ±0, 25, 50, 75, and 100 cents, and these alterations were randomized within each scale block.
There were two conditions: perception and imagery. For the perception condition, participants were asked to judge whether the last note was higher, lower, or the same as the expected pitch. In the imagery condition, the two second-to-last notes of the scale were silent, and participants were asked to imagine these two notes in the silent gap and still judge the last note. For each condition there was a practice round of 10 trials during which participants received feedback on the screen and the experimenter monitored their accuracy to make sure they understood the task.
Linear psychometric functions were fitted to yield the slope for the imagery and perception conditions for each individual. General accuracy was compared between groups, in addition to the slopes of the psychometric functions.

C. EEG Harmonic Expectation Task
Stimuli consisted of chord progressions that were either expected, slightly unexpected, or highly unexpected (Figure 1). The participants were instructed to listen to each chord progression and rate their preference for it on a scale from 1 to 4, with 1 being dislike and 4 being like. The trials were arranged in blocks of 60, and each participant completed at least 3 blocks (maximum 6 blocks). EEG was recorded using PyCorder software from a 64-channel BrainVision actiCHamp setup with electrodes corresponding to the international 10-20 EEG system. The recording was continuous with a raw sampling rate of 1000 Hz. EEG recording took place in a sound-attenuated, electrically shielded booth.

Figure 1. Example high, medium, and low expectation chord progressions.

Raw EEG data were imported to BrainVision Analyzer for analysis. Preprocessing included applying infinite impulse response filters with a low-pass cutoff of 30 Hz and a high-pass cutoff of .5 Hz. Raw data inspection was used to exclude data points with a high gradient (>50 µV/msec),
high mins and maxes (>200 µV), and extreme amplitudes (−200 to 200 µV). Ocular correction ICA was also done for each participant. The data were then segmented into chords, and the trials were averaged and baseline corrected. We compared ERP traces for high, medium, and low expectation chords among the groups. We also plotted difference waves for medium minus high expectation and for low minus high expectation. Peaks for each subject were then exported from BrainVision Analyzer and analysed separately in SPSS.

2) ERP Data. A mixed factor ANOVA on the dependent variable of ERP amplitude during the last chord, with the between-subjects factor of group and the within-subjects factor of expectancy (low vs. high), showed a significant main effect of expectancy and a significant interaction between expectancy and group for electrodes P2, P4, and PO4 between 410–480 ms and F8 and FT8 between 220 and 260 ms (see Table 2 for F and p values).
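The amplitude and gradient rejection thresholds described in the preprocessing above can be expressed compactly. This is an assumed re-implementation for illustration, not the BrainVision Analyzer routine:

```python
import numpy as np

def epoch_is_clean(epoch_uv, srate_hz=1000.0,
                   max_gradient_uv_per_ms=50.0, max_abs_uv=200.0):
    """Return True if a single-channel epoch (in microvolts) passes both
    criteria: sample-to-sample gradient below 50 uV/ms, and amplitude
    staying within the -200..+200 uV range."""
    ms_per_sample = 1000.0 / srate_hz
    gradient = np.abs(np.diff(epoch_uv)) / ms_per_sample  # uV per ms
    if gradient.max() > max_gradient_uv_per_ms:
        return False
    if np.abs(epoch_uv).max() > max_abs_uv:
        return False
    return True

clean = np.sin(np.linspace(0, 2 * np.pi, 1000)) * 20  # smooth, +/-20 uV
blink = clean.copy()
blink[500] += 300                                     # large ocular transient
print(epoch_is_clean(clean), epoch_is_clean(blink))   # True False
```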
D. Divergent Thinking Task

IV. RESULTS
A. Scale Imagery Task
A mixed factor ANOVA on the dependent variable of accuracy, with the between-subjects factor of group (Jazz musicians, Non-jazz musicians, Non-musicians) and the within-subjects factor of task (perception vs. imagery), showed significant main effects of group (F(2,33) = 22.8, p < .001) and task (F(1,33) = 21.0, p < .001) but no task-by-group interaction.

[Figure: accuracy as a function of last-note deviation (25–100 cents) in the imagery and perception conditions, plotted for Nonmusicians, Musicians, and Jazz Musicians.]
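The linear psychometric-function fitting described in the Method can be sketched as follows. The response proportions here are hypothetical, and the simple least-squares line is an illustrative stand-in for the authors' fitting procedure:

```python
import numpy as np

deviations = np.array([0, 25, 50, 75, 100])  # last-note alteration, in cents
# Hypothetical proportions of trials judged "different" at each deviation:
p_perception = np.array([0.05, 0.30, 0.60, 0.85, 0.95])
p_imagery = np.array([0.10, 0.25, 0.45, 0.70, 0.85])

def psychometric_slope(x, p):
    """Fit the least-squares line p ~ slope * x + intercept; return the
    slope, used as a per-participant sensitivity measure."""
    slope, _intercept = np.polyfit(x, p, deg=1)
    return slope

slope_perc = psychometric_slope(deviations, p_perception)
slope_imag = psychometric_slope(deviations, p_imagery)
print(slope_perc, slope_imag)  # steeper slope = finer discrimination
```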
= .04), 3 (F(2,31) = 13.1, p < .001), 4 (F(2,31) = 12.0, p < .001), 5 (F(2,31) = 8.7, p = .001), and 6 (F(2,31) = 5.5, p = .01). The jazz musicians scored the highest of the three groups on questions 4, 5, and 6 for fluency and on questions 3, 4, 5, and 6 for originality, followed by musicians and then non-musicians.

[Figure: (a) mean preference ratings for high, medium, and low expectation chords in Non-jazz and Jazz musicians; (b–d) difference-wave scalp topographies, 410–480 ms, scale −10 to 10 µV/m².]

memory-dependent strategy, and are thus more dependent on training compared to the lower-level pitch discrimination task.

[Figure: (a) fluency and (b) originality scores by question number (1–6) for Nonmusicians, Musicians, and Jazz Musicians; asterisks mark significant group differences.]
abilities and age of onset or number of years of musical training between non-jazz musicians and jazz musicians (as shown in the control pitch discrimination tasks), and no differences in IQ among the three groups.
The Jazz musicians' high performance on the DTT on 4/6 questions compared to Non-jazz musicians and Non-musicians indicates that Jazz musicians have a general advantage in creativity that transcends the domain of music. We believe that questions 1 and 2 may not have captured significant differences among the three groups because 1) the questions were especially ambiguous and resulted in extremely divergent answers, and 2) there were possible order effects, as participants might have needed time to engage themselves fully in the task of divergent thinking. Nevertheless, the differences between the Jazz musicians and the other two groups for the rest of the questions are striking given that there are no differences in IQ among the three groups.

VI. CONCLUSION
Jazz musicians in our present sample scored higher on domain-general creativity tasks. Psychophysical and electrophysiological measures suggest that they also possess heightened perceptual awareness of and sensitivity to unexpected events within a musical context. Taken together, results from domain-specific as well as domain-general tasks suggest that creativity entails being open to unexpected events within one's domain, as well as being more fluent and original in idea generation. The present results validate the use of jazz improvisation as a model system for understanding creativity, and further suggest that systematic violations of domain-specific expectations may provide a time-sensitive measure of the rapid and flexible real-time creative process.

ACKNOWLEDGMENT
Supported by the Imagination Institute. We thank all our participants.

REFERENCES
Berkowitz, A. L., & Ansari, D. (2008). Generation of novel motor sequences: the neural correlates of musical improvisation. Neuroimage, 41(2), 535-543.
Biasutti, M. (2015). Pedagogical applications of the cognitive research on music improvisation. Frontiers in Psychology, 6.
Campbell, D. T. (1960). Blind variation and selective retentions in creative thought as in other knowledge processes. Psychological Review, 67(6), 380-400.
Csikszentmihalyi, M. (1996). Creativity: Flow and the psychology of discovery and invention. New York: HarperCollins.
Dietrich, A., & Kanso, R. (2010). A review of EEG, ERP, and neuroimaging studies of creativity and insight. Psychol Bull, 136(5), 822-848.
Donnay, G. F., Rankin, S. K., Lopez-Gonzalez, M., Jiradejvong, P., & Limb, C. J. (2014). Neural substrates of interactive musical improvisation: An fMRI study of 'Trading Fours' in jazz. PLoS ONE, 9(2), e88665.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Ivry, R. B., & Robertson, L. (1997). The two sides of perception. Cambridge, MA: MIT Press.
Janata, P., & Paroo, K. (2006). Acuity of auditory images in pitch and time. Percept Psychophys, 68(5), 829-844.
Koelsch, S., Gunter, T., Friederici, A. D., & Schroger, E. (2000). Brain indices of music processing: "nonmusicians" are musical. J Cogn Neurosci, 12(3), 520-541.
Limb, C. J., & Braun, A. R. (2008). Neural substrates of spontaneous musical performance: An fMRI study of jazz improvisation. PLoS One, 3(2), e1679.
Liu, S., Chow, H. M., Xu, Y., Erkkinen, M. G., Swett, K. E., Eagle, M. W., . . . Braun, A. R. (2012). Neural correlates of lyrical improvisation: An fMRI study of freestyle rap. Sci Rep, 2, 834.
Loui, P., & Wessel, D. (2007). Harmonic expectation and affect in Western music: Effects of attention and training. Perception & Psychophysics, 69(7), 1084-1092.
Meyer, L. (1956). Emotion and meaning in music. Chicago: University of Chicago Press.
Navarro Cebrian, A., & Janata, P. (2010). Electrophysiological correlates of accurate mental image formation in auditory perception and imagery tasks. Brain Res, 1342, 39-54.
Pinho, A. L., de Manzano, Ö., Fransson, P., Eriksson, H., & Ullén, F. (2014). Connecting to create: Expertise in musical improvisation is associated with increased functional connectivity between premotor and prefrontal areas. The Journal of Neuroscience, 34(18), 6156-6163.
Runco, M. A. (1991). Divergent thinking. Westport, CT: Ablex Publishing.
Shipley, W. C. (1940). A self-administering scale for measuring intellectual impairment and deterioration. Journal of Psychology, 9, 371-377.
Sternberg, R. J., Lubart, T. I., Kaufman, J. C., & Pretz, J. E. (2005). Creativity. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 351-369). Cambridge: Cambridge University Press.
Torrance, E. P. (1968). Examples and rationales of test tasks for assessing creative abilities. Journal of Creative Behavior, 2(3), 165-178.
Villarreal, M. F., Cerquetti, D., Caruso, S., Schwarcz Lopez Aranguren, V., Gerschcovich, E. R., Frega, A. L., & Leiguarda, R. C. (2013). Neural correlates of musical creativity: Differences between high and low creative subjects. PLoS One, 8(9), e75427.
Current creativity research continues to disagree about the nature of creative thought,
specifically, whether it is primarily based on Type-1 (automatic, associative) or Type-2
(executive, controlled) cognitive processes. We proposed that Type-1 and Type-2
processes make differential contributions to creative production depending on domain
expertise and situational factors such as task instructions. We tested this hypothesis on
jazz pianists who were instructed to improvise to a novel chord sequence and rhythm
accompaniment. Using a staggered-baseline design, participants were introduced to
explicit creativity instructions, which, via goal activation, are thought to prompt the
musicians to engage Type-2 processes. Jazz experts rated all performances with high
degrees of inter-rater reliability. Multilevel regression models revealed a significant
interaction between Expertise and Instructions (standard / ‘be creative’) when predicting
musicians’ improvisation ratings. Overall, performances by more experienced pianists
were rated as superior, and creativity instructions resulted in higher ratings. However,
creativity instructions benefited less-experienced pianists more than more-experienced
pianists, suggesting differential contributions by Type-1 and Type-2
processes depending on situational factors and individual differences. These findings
indicate that training and experience afford jazz pianists the ability to develop efficient
creative processes, relying more on implicit, unconscious cognitive systems than novices.
Since explicit “be creative” instructions are a challenging goal that occupies the
conscious mind, they interfere with the optimal creative processes of expert jazz
musicians; however, for less experienced musicians, consciously attending to a creative
goal can help them avoid cognitive fixation, assisting musicians to alter their
performances in ways that facilitate more creativity.
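The Expertise × Instructions interaction reported above can be sketched as a regression with an interaction term. This is a simplified illustration on simulated data: ordinary least squares is used in place of the authors' multilevel models (which account for repeated performances nested within pianists), and all coefficients and scales are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
expertise = rng.uniform(0, 10, n)      # years of experience (hypothetical scale)
be_creative = rng.integers(0, 2, n)    # 0 = standard, 1 = 'be creative' instructions

# Simulate the reported pattern: expertise helps, instructions help, but the
# instruction benefit shrinks with expertise (negative interaction).
rating = (3.0 + 0.30 * expertise + 1.00 * be_creative
          - 0.08 * expertise * be_creative + rng.normal(0, 0.3, n))

# Design matrix: intercept, two main effects, and the interaction term.
X = np.column_stack([np.ones(n), expertise, be_creative,
                     expertise * be_creative])
beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
print(beta)  # [intercept, expertise, instructions, interaction]
```

A negative interaction coefficient is the pattern described in the abstract: the rating boost from "be creative" instructions is largest for the least experienced pianists.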
Variable form compositions permit multiple orderings of their sections and can thus be used
to study preferences in musical form. This study uses a contemporary musical
composition with variable form (created by the author, a composer-researcher) to
investigate 1) perceived similarity of musical material and 2) preferences when creating
large-scale forms. The composition, titled Negentropy, is a stand-alone concert piece for
solo flute, intended to double as a music perception task. It is divided into 9 sections
(between 25 and 55 seconds each) that can be variably sequenced, resulting in just under
a million possible unique sequences. Participants (7 professional composers and 7 non-
musicians) characterized each of the 9 sections, then sorted the sections into groups based
on musical similarity, and then ordered the sections (omitting up to 3) into their own
“compositions”.
Preliminary results reveal that in the sorting task divergent sections were reliably placed
into the same groups, suggesting that listeners prioritized surface similarities in their
classifications rather than form-bearing dimensions, such as structural organization. Data
from the sequencing task reveal an unexpected agreement between composers’ and non-
musicians’ sequences. 71% of participants chose a sequence with a symmetrical
compositional structure consisting of tender, contemplative outer sections and energetic,
rhythmic inner sections. That is, composers and non-musicians alike preferred an arch
form ordering. It was anticipated that composers would choose more unconventional
sequences. In contrast, the two flautists who have performed the piece thus far chose
non-traditional forms, thereby implying that increased familiarity with the musical material,
rather than musical experience, may result in more daring structural decisions. In sum,
the Negentropy project contributes to our understanding of preference in large-scale
musical form, and presents an integration of scientific inquiry with contemporary musical
composition.
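The count stated above can be checked directly: ordering the 9 sections while omitting up to 3 of them means counting ordered arrangements (permutations) of length 6 through 9 drawn from 9 distinct sections.

```python
from math import perm

# Sequences keep at least 6 of the 9 sections, in some order:
total = sum(perm(9, k) for k in range(6, 10))
print(total)  # 967680, i.e. "just under a million possible unique sequences"
```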
purpose or to generate seed material for composition. 2. Sketch materials or models to structure musical form or regulate construction principles. 3. Symbolic representations to make extramusical references as an integral part of the musical expression.

III. MUSICAL TOPOLOGY
Relations between spatial notions and music can be approached from different angles. That physical gesture is inherent in musical gesture was a point of departure for Smalley when developing spectromorphology, as a technique for notating gestural qualities of sonic events [2]. Smalley mentions four orders of surrogacy that can be engaged whenever a physical gesture is propagated in sound, going from concrete to abstract relations, and discusses the problem of source-bonding as limiting to extrinsic meaning making in music.
Tim has developed techniques for graphical sketching of musical ideas, based on an experienced strong connection between spatiality and music. The sketches employ symbolic representation with a certain degree of conceptual flexibility. The symbols have a kind of nodal structure that is context dependent. According to Tim, items in the drawings represent topoi in the composer's mind that correspond to certain musical expressions. Sometimes the sketches imply a time axis; other times their function is more geographical, like mind maps over musical ideas and their contexts. In order to elaborate the technique and esthetic principles, Tim practiced this kind of sketching separate from compositional work, as a discipline in itself (fig. 1), "to develop a language of musical drawing" as he put it. Sketching in this way offers a kind of abstraction: a meta-level for organizing musical thinking, with an option to infer quite specific musical ideas prior to designing the musical activities to shape the musical expression.
Tim has developed the sketching into a notation technique, often used in combination with elements of traditional notation. An important feature in the early phases of this was a grid to guide the musician in the awareness of different degrees of presence in the music by means of a simple spatial model. The degree of presence in the music could concern such different aspects as the freedom of improvisation in relation to the score, or a sense of abstraction of being in the world.

IV. GENERATIVE STRUCTURE
Generative structures are commonplace in music composition and typically engage computerized algorithms that mimic some aspect(s) of creation in nature, often inspired by evolutionary theories [3][4]. Ultimately this addresses the question 'what is life?' and problems of the existence of free will. In a series of compositions, Vincent has dealt with a different approach to generative structures in music composition, wherein a set of instructions enrols the musicians as generators of musical events and gestures. The overarching structure of the instructions, together with the actions of the musicians, provides the local and global context; the form of the piece.
Vincent has provided very few sketches within the frame of the study. Instead his mapping strategies go right into the scores, which typically focus on musical activity rather than convey an image of a musical idea. Some of his pieces have a game-like structure where the actions of the musician have a major influence on the musical structure. This is true in a piece for any instrument, which is set up like a maze where the musician is faced with a series of choices. The choices made lead to a path that brings certain obligations and then to new forks in the path.
III. RESULTS
The resulting sample set included a total of 165 parts employing 91 unique instruments. Many of these parts were for common orchestral instruments (such as violin, flute, and trumpet), whereas some parts disappeared in history (such as continuo). Some other instruments appeared more often with advancing years, including percussion and wind instruments often exhibiting extended pitch and timbre ranges (such as alto flute or contrabassoon). Analyses were carried out using instrumentation presence and instrument usage.
First, we might want to study the change in ensemble size over time. Due to the practice of sometimes having more than one instrument playing the same part, we do not have data for the total number of instruments required to perform a given piece of music. However, we can still examine the total number of parts notated in the score, which is likely to be proportional to the ensemble size. Figure 1 shows the total number of parts for the 180 orchestral scores in our sample. Each dot represents a score, with the corresponding

[Figure 2. Instrumentation presence (top) and instrument usage (bottom) of string instruments.]

In Figure 3, we see that the presence of four instruments—oboe, flute, bassoon, and clarinet—is noticeably higher than the others. Oboe, flute, bassoon, and clarinet were available earlier in history, which probably helped establish their presence in orchestras, whereas the other four instruments were introduced later, presumably to provide a wider range of pitch and timbre. This separation of two groups can still be observed in the usage pattern, although not as clearly.

early 19th century, their use was reserved for occasional special effects.
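The two measures named above can be sketched on toy data. The definitions here are assumptions for illustration (the paper does not define them formally at this point): "presence" is taken as the share of scores whose instrumentation includes an instrument, and "usage" as the share of sampled sonorities, within scores that include it, in which it actually sounds.

```python
# Toy corpus; the real data set has 180 scores and 1800 sampled sonorities.
scores = [
    {"year": 1720, "parts": {"violin", "viola", "cello", "harpsichord"},
     "sonorities": [{"violin", "cello"}, {"violin", "viola", "cello"}]},
    {"year": 1890, "parts": {"violin", "viola", "cello", "flute"},
     "sonorities": [{"violin", "flute"}, {"cello"}]},
]

def presence(instrument):
    """Share of scores whose instrumentation includes the instrument."""
    return sum(instrument in s["parts"] for s in scores) / len(scores)

def usage(instrument):
    """Share of sampled sonorities in which the instrument sounds,
    among scores that include it at all."""
    hits = total = 0
    for s in scores:
        if instrument in s["parts"]:
            total += len(s["sonorities"])
            hits += sum(instrument in son for son in s["sonorities"])
    return hits / total if total else 0.0

print(presence("violin"), usage("violin"))  # 1.0 0.75
```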
pattern was not very popular in 1701−1750. Table 3 shows that the use of string instruments showed a slight decrease through the years, in contrast to wind instruments.

Table 2. Most common usage patterns in 300 years

| Number of instances | Instrument combination |
| 67 | Violin + Violin + Viola + Cello + Contrabass |
| 27 | Violin + Violin + Viola + Cello + Contrabass + Oboe + Oboe |
| 20 | Violin + Violin + Viola + Cello + Contrabass + Harpsichord |
| 19 | Violin + Violin + Viola |
| 19 | Violin + Violin |
| 19 | Violin + Violin + Viola + Cello |
| 18 | Violin + Violin + Viola + Continuo |
| 17 | Piano |
| 14 | Violin |
| 12 | Violin + Violin + Viola + Cello + Contrabass + Bassoon |
| 10 | Violin + Violin + Violin + Viola + Cello + Contrabass |
| 10 | Violin + Violin + Continuo |
| 10 | Cello + Contrabass |
| 9 | Violin + Violin + Violin |
| 9 | Violin + Violin + Viola + Cello + Contrabass + Piano |
| 9 | Violin + Violin + Viola + Cello + Harpsichord + Continuo |

Table 3. Most common usage patterns per epoch

| Epoch | Number of instances | Instrumentation pattern |
| 1701–1750 | 19 | Violin + Violin + Viola + Cello + Contrabass + Harpsichord |
| 1701–1750 | 15 | Violin + Violin + Viola + Continuo |
| 1701–1750 | 14 | Violin + Violin + Viola + Cello + Contrabass + Oboe + Oboe + Bassoon |
| 1751–1800 | 27 | Violin + Violin + Viola + Cello + Contrabass |
| 1751–1800 | 13 | Violin + Violin + Viola + Cello + Contrabass + Oboe + Oboe + Bassoon |
| 1751–1800 | 9 | Violin + Violin + Viola + Cello + Contrabass + Harpsichord + Continuo |
| 1801–1850 | 10 | Violin + Violin + Viola + Cello + Contrabass |
| 1801–1850 | 9 | Piano |
| 1801–1850 | 7 | Violin + Violin + Viola + Cello + Contrabass + Piano |
| 1851–1900 | 4 | Violin + Violin + Viola + Cello |
| 1851–1900 | 3 | Violin |
| 1851–1900 | 3 | Violin + Violin + Viola + Cello + Contrabass |
| 1901–1950 | 17 | Violin + Violin + Viola + Cello + Contrabass |
| 1901–1950 | 6 | Violin + Viola + Cello + Contrabass |
| 1901–1950 | 5 | Violin + Violin + Viola + Cello |
| 1901–1950 | 5 | Piano |
| 1951–2000 | 4 | Violin + Violin + Violin + Violin + Viola + Viola + Cello + Contrabass |
| 1951–2000 | 3 | Violin + Violin + Viola + Cello + Contrabass |
| 1951–2000 | 3 | Cello |
| 1951–2000 | 3 | Harp |

Interested in how various instruments were combined with each other, we carried out a hierarchical clustering analysis on the usage patterns of every pair of 22 typical orchestral instruments (violin, viola, cello, contrabass, piccolo, flute, alto flute, oboe, English horn, clarinet, bass clarinet, bassoon, contrabassoon, French horn, trumpet, trombone, bass trombone, contrabass trombone, tuba, bass tuba, harp, and timpani). Figure 6 shows the resulting dendrogram. At around a distance of 16, we observe four clusters: at the top is a big cluster containing all string instruments and all brass instruments except the French horn, plus harp and timpani (cello, contrabass, viola, contrabass trombone, bass tuba, bass trombone, tuba, harp, timpani, violin, trumpet, and trombone). These instruments probably form the core of the orchestra. Next is a cluster of four woodwinds offering extended pitch and timbral color (bass clarinet, contrabassoon, English horn, and alto flute). Strangely, piccolo is in its own cluster. The last cluster consists of five wind instruments (flute, oboe, clarinet, bassoon, and French horn) that showed higher presence than others (evident in Figures 2 and 3). If we take the liberty of including piccolo in the second cluster with the other woodwinds, we have three clusters that reflect potentially different compositional functions. The first might be called the Core cluster, where all string instruments seem to provide the core of an orchestral scene; the second contains five woodwinds (bass clarinet, contrabassoon, English horn, alto flute, and piccolo) that offer extended pitch and timbre ranges, and is therefore dubbed the Extended Woodwind cluster; the third might be deemed the Standard Wind cluster, as it contains five wind instruments that are rather standard in orchestras (flute, oboe, clarinet, bassoon, and French horn). The second and third clusters probably reflect different purposes in the instrumentation of woodwinds: the instruments in the third cluster tend to be used more consistently, complementing those in the core cluster, whereas those in the second cluster are more likely to

[Figure 6. Hierarchical clustering of 22 orchestral instruments.]
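A hierarchical clustering of instruments from usage data, as in the dendrogram of Figure 6, can be sketched as follows. This is an assumed pipeline on an invented toy 0/1 matrix (rows = instruments, columns = scores), not the authors' data or code:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

instruments = ["violin", "viola", "cello", "flute", "oboe", "piccolo"]
# 1 = instrument appears in the score's instrumentation, 0 = it does not.
usage = np.array([
    [1, 1, 1, 1, 1, 1],   # violin: in every score
    [1, 1, 1, 1, 1, 0],   # viola
    [1, 1, 1, 1, 0, 1],   # cello
    [0, 1, 0, 1, 1, 1],   # flute
    [0, 1, 1, 1, 1, 0],   # oboe
    [0, 0, 0, 0, 0, 1],   # piccolo: rare, special effects only
])

# Average-linkage clustering on Euclidean distances between usage vectors.
Z = linkage(usage, method="average", metric="euclidean")
labels = fcluster(Z, t=2, criterion="maxclust")  # cut dendrogram into 2 clusters
print(dict(zip(instruments, labels)))
```

With this toy matrix the rarely used piccolo separates from the frequently co-occurring instruments, mirroring the singleton piccolo cluster described above.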
be used for specific effects in pitch and timbre. Our clusters did not perfectly agree with Johnson's (2011) SPC model. However, like Johnson's model, our clusters nevertheless appear to point to different compositional functions.

IV. DISCUSSION
In this paper, we surveyed the instrumentation patterns in orchestral music composed between 1701 and 2000, concentrating on instrumentation presence and instrument usage. From our samples we could clearly see the constant increase in the total number of parts required to perform a piece of music through history, up until the mid-20th century.
In contrast to the increase of the number of parts in an orchestral work, the usage pattern of an instrument showed a decreasing trend in general. This decline in usage may simply reflect the increased palette of instruments from which a composer might choose; that is, when the composer can choose from many instruments, any given instrument need not play constantly.
In terms of instrumentation presence, string instruments appear to show the highest presence. Strings have been a core of the orchestra since the 18th century and are therefore present in nearly every piece of orchestral music (Weaver, 2006). The wind instruments in current orchestras seem to belong to three groups depending on the compositional purposes they serve—the Standard Wind group (flute, oboe, clarinet, bassoon, and French horn), the Core (brass) group (trumpet, trombone, bass trombone, contrabass trombone, tuba, bass tuba), and the Extended Woodwind group (bass clarinet, contrabassoon, English horn, alto flute, and piccolo). The instruments in the Standard Wind group have been available for a longer time and tend to sound more often than others. These instruments offer a set of stable timbres to the orchestra, which steadily increased with the inclusion of instruments that provide more diversity in timbre and pitch (especially those in the Extended Woodwind group). The Core brass instruments add more weight to the skeleton of an orchestral image painted by the string instruments. Harp and timpani, which belong to the same cluster, also appear to serve the same purpose of adding weight. Our three clusters did not entirely replicate Johnson's SPC model, possibly because our data included samples from a longer period than the Romantic era, whereas Johnson focused exclusively on the Romantic period.

V. CONCLUSION
To our knowledge, this is the first empirical longitudinal study of orchestration. The general patterns we found mostly agree with what has been qualitatively observed by music scholars (Harnoncourt, 1988; Stauffer, 2006; Weaver, 2006). Many of our analyses were performed on the entire set of samples for the purpose of statistical power, but this might have led to glossing over possible changes observable per epoch. Therefore our findings need to be interpreted carefully. Although this study required extensive labor to sample and code 1800 sonorities from 180 works, a larger collection of data is necessary in order to survey detailed longitudinal trends. Further analyses of presence and usage patterns in relation to pitch, tempo, and dynamics are expected to follow, as well as examination of occurrences of specific patterns mentioned in orchestral treatises.

REFERENCES
Adler, S. (2002). The study of orchestration. New York: W.W. Norton.
Berlioz, L.H. (1855). Berlioz's orchestration treatise: A translation and commentary (Cambridge musical texts and monographs) (H. Macdonald, Ed.). Cambridge, England: Cambridge University Press. (Republished in 2002).
Dubal, D. (2001). The essential canon of classical music. New York: North Point Press.
Harnoncourt, N. (1988). Baroque music today: Music as speech: Ways to a new understanding of music (R.G. Pauly, Trans.). Portland, OR: Amadeus Press.
Johnson, R. (2011). The Standard, Power, and Color model of instrument combination in Romantic-era symphonic works. Empirical Musicology Review, 6(1), 2-19.
Jolly, J. (2010). The Gramophone classical music guide. Haymarket Consumer Media.
Koechlin, C. (1954-9). Traité de l'orchestration: en quatre volumes. Paris: M. Eschig.
Piston, W. (1955). Orchestration. New York: W.W. Norton.
Rimsky-Korsakov, N. (1964). Principles of orchestration (M. Steinberg, Ed.; E. Agate, Trans.). New York: Dover Publications. (Original work published 1912).
Stauffer, G.B. (2006). The modern orchestra: A creation of the late eighteenth century. In J. Peyser (Ed.), The Orchestra (pp. 41-72). Milwaukee, WI: Hal Leonard.
Weaver, R.L. (2006). The consolidation of the main elements of the orchestra: 1470-1768. In J. Peyser (Ed.), The Orchestra (pp. 7-40). Milwaukee, WI: Hal Leonard.
Widor, C.-M. (1904/1906). Technique de l'orchestre moderne (E. Suddard, Trans.: The technique of the modern orchestra: A manual of practical instrumentation. London: J. Williams, 1906). Paris, 1904.
The aim of the present study is to investigate whether amusic individuals demonstrate
such an in-culture bias, in order to assess whether implicit learning, and emotional responses
to what has been implicitly learned, constitute a deficit domain in amusia. The literature
reporting on core deficit domains in amusia typically focuses on musical expectancy
including evaluating melodic and harmonic structures. Yet how failing at musical
expectancy may lead to reduced sensitivity to more global musical phenomena such as
culture and tension is unknown. We seek to examine whether sensitivity to such global
musical phenomena is impaired in amusia.
Twenty-six amusics and 26 matched controls listened to 32 short Western and Indian
melodies, played either on the piano or sitar, with or without syntactic violations (two in-
key and two out-of-key variants), and were asked to rate the tension induced by each
melody using a Continuous Response Digital Interface dial.
These findings indicate that amusics have reduced cultural sensitivity to perceived
tension in music. Since musical tension rating taps into emotional responses, this suggests
that amusics may also have impaired affective responses to music.
ISBN 1-876346-65-5 © ICMPC14 191 ICMPC14, July 5–9, 2016, San Francisco, USA
Individuals, regardless of music training, demonstrate better memory for music that is
culturally familiar than music that is culturally unfamiliar. Among Western listeners,
memory advantage for own-culture music persists even when music examples are
reduced to isochronous pitch sequences of uniform timbre, suggesting that melodic
expectancy in the form of the pitch-to-pitch intervallic content may be a useful measure
of enculturation. We have hypothesized that the cognitive distance between the musics of
two cultures may be represented by differences in the statistical content of melodies from
those cultures.
In the first part of this study we tested this hypothesis by examining a corpus of Western
and of Turkish melodies to determine if culture-specific typicality of interval content,
melodic contour, rhythm content, or some combination of these parameters showed a
relationship to cross-cultural memory performance. Interval content demonstrated the
strongest correlation with preexisting memory performance data.
In the second part of this study currently underway we further test this finding. Using
melodies that demonstrated the strongest typicality within their respective corpora, adult
listeners will complete an open sorting task in which they will organize 24 Western and
Turkish melodies into two categories of their own devising. This will be followed by a
recall test in which participants will identify whether brief music excerpts were or were
not taken from the sorted melodies. We predict that (1) sorting differentiation will best
reflect culture-specific measures of pitch content, (2) memory will be more accurate for
melodies derived from the culturally familiar corpus, and (3) memory performance will
demonstrate the strongest relationship with own-culture pitch content typicality.
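The pitch-to-pitch interval-content measure described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the MIDI-number representation and the helper name `interval_distribution` are assumptions:

```python
from collections import Counter

def interval_distribution(midi_pitches):
    """Normalized distribution of signed pitch-to-pitch intervals (in semitones)."""
    intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    counts = Counter(intervals)
    total = sum(counts.values())
    return {iv: n / total for iv, n in counts.items()}

# Hypothetical melody: C4-D4-E4-D4-C4 as MIDI note numbers.
melody = [60, 62, 64, 62, 60]
dist = interval_distribution(melody)  # {2: 0.5, -2: 0.5}
```

Comparing such distributions between corpora with some distance measure is one plausible way to quantify the "cognitive distance" the authors hypothesize.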
Emotion occupies centre stage amongst the numerous expressive qualities of music.
Recently, it was shown that ragas, which are specific combinations of tonic intervals, are
capable of eliciting distinct emotions. It was also shown that alaap and gat, the two
modes of presentation of a raga that differ in rhythmic regularity, differ in the level of
emotion arousal. While tonality (estimated in terms of the minor-to-major ratio) determined the emotion experienced for a raga, rhythmic regularity and tempo modulated the level of
arousal. The primary objective of this study was to investigate cultural similarities and
differences in experienced emotion across enculturated and non-enculturated populations
in such raga stimuli. 143 enculturated and 112 non-enculturated participants provided
responses of ‘experienced emotion’ to 12 Hindustani classical ragas presented in alaap
and gat on a Likert scale of 0-4. Analysis of responses revealed similar emotion
experienced by the two groups, while factor analysis revealed differences in the way
emotion labels were grouped together. ‘Longing’ was grouped with the ‘negative’ factor (i.e., sad, tense, and angry emotions) by enculturated participants, whereas it was grouped with the ‘positive’ factor (i.e., happy, romantic, calm, and devotional) by the non-enculturated
group. To investigate level of emotion arousal, ratings of highest experienced emotion
were correlated with tonality for both alaap and gat. Results revealed significant
correlations between tonality and emotion rating for both alaap and gat in enculturated
participants, while non-enculturated participants showed a significant correlation only
during gat. While our results suggest universality in the emotions experienced by culturally distinct populations, the differences in level of emotion arousal suggest that rhythmic cues are used similarly by the two populations, whereas tonality cues may modulate emotional responses differentially.
One of the most striking benefits of the use of large corpora in music research has been
the ability to examine relationships and commonalities between cultures, along with the associated linguistic, emotional, and social universals underpinning musical functions.
The current study takes an alternative approach, examining contrasts rather than
commonalities within Native American music, while also discussing how data mining
can allow for an understanding of the interaction between specific social functions and
musical features.
In this study, we build upon Densmore’s own analyses and Gundlach’s early comparisons
in Native American music to further explore the relationships between musical features
and social function. For example, results reveal certain sequential patterns of melodic
intervals and contour, as well as melodic interval global features, that are significantly
overrepresented in love songs, game songs, and healing songs. These results confirm and
extend earlier observations on Native American music.
I. BACKGROUND

Spiritual musicking is ubiquitous across human cultures (Ellingson, 1987). As evidenced in practices as diverse as Balinese bebutan trancing, Krishna kirtan singing, and Vedic mantra chanting, spiritual musicking has the capacity to modulate arousal levels and distort temporal perception (Becker, 2004; Black Brown, 2014). Though the cultural value and phenomenological efficacy of music in these and many other cultural practices is widely acknowledged by ethnomusicologists (Rouget, 1985), the perceptual correlates of spiritual musicking are poorly understood. In this study, we examined one shared trait among the practices in question: sustained, synchronized vocalization of a small number of syllables on a single pitch, as exemplified by mantra-style chant.

II. AIMS

We aimed to explore the effects of sustained, synchronized vocalization on temporal perception, in comparison to silence, listening, and speaking tasks. Accuracy of time estimates was used as an indicator of participants’ retrospective experience of “clock time.” The difference between subjects’ estimates and the actual “clock time” that elapsed during each task indicated relative levels of temporal distortion experienced by the subjects. We hypothesized that the mantra-style vocalization (i.e., chanting) would produce the greatest variability of time estimations.

III. METHOD

Participants were two groups of students at SMU (N = 58): 28 male, 30 female; 35 music majors, 23 non-majors; age 18–30 (M = 19.55, SD = 2.09); with 0–16 years of musical training (M = 8.49, SD = 5.44). The study consisted of four conditions: (1) sitting in silence as a group, (2) listening to a recording of a group chanting the neutral syllables “shaun” (ʃa–ɔ–un), (3) speaking the neutral syllables “shaun” repetitively as a group, and (4) chanting “shaun” repetitively as a group. For each condition, participants were asked to estimate as accurately as possible, in seconds, how much time had elapsed during the performance of the task.

Conditions were presented in randomized order between the groups of students, and each task was randomly assigned a duration of 45 s, 60 s, 75 s, or 90 s. To compare data from the same tasks across different time intervals, we defined accuracy of guesses as the ± standardized differences (in percentage) of participants’ time estimations from the actual time. Three outliers were removed due to a 100% or greater standardized difference from the actual time.

IV. RESULTS

A one-way repeated measures ANOVA revealed a significant effect of the task on temporal judgment, F(3, 171) = 8.29, p < .0001. As shown in Fig. 1, subjects consistently underestimated the duration of time that had elapsed: the standardized mean difference (SMD) of time estimates for all four tasks was less than the actual time (silence [SMD = –.28], listening [SMD = –.31], speaking [SMD = –.23], and chanting [SMD = –.15]). However, while chanting produced the most “accurate” temporal evaluation according to group means, it exhibited the highest standard deviation by a substantial margin. Variability increased linearly from silence to chanting tasks (silence [SD = .18], listening [SD = .21], speaking [SD = .22], and chanting [SD = .32]). The high variability of time estimations during the chanting task provides tentative support for our hypothesis that synchronized repetitive vocalization alters temporal perception to a greater degree than listening or speaking.

Music majors were more accurate at time estimation than non-majors during all tasks (music majors SMD = –.17, non-majors SMD = –.33), and the chanting mean for music majors was the most accurate of the whole dataset (SMD = –.01). This finding held when assessing the relationship between time estimation and years of musical training as well: musical training is weakly though significantly correlated with estimation accuracy during silence (r = .31, p = .02) and chanting (r = .36, p < .01).

Figure 1: Time estimation (N = 58) in four conditions

V. CONCLUSION

This study provides preliminary insight into the ubiquity and efficacy of chant practices. Sustained group vocalizations may more effectively lead to a distortion of time perception than merely speaking or listening to the same syllables.

REFERENCES

Becker, J. (2004). Deep Listeners: Music, Emotion, and Trancing. Bloomington, IN: Indiana University Press.
Black Brown, S. (2014). Krishna, Christians, and colors: The socially binding influence of Kirtan singing at a Utah Hare Krishna festival. Ethnomusicology, 58(3), 455–474.
Ellingson, T. (1987). “Music and Religion.” In Encyclopedia of Religion, Vol. 9, ed. M. Eliade. London: Macmillan.
Rouget, G. (1985). Music and Trance, trans. B. Biebuyck. Chicago: University of Chicago Press.
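The accuracy measure defined in the Method section (a signed standardized difference, in percent, from the actual time) can be sketched as follows; the function name and the example values are illustrative, not the authors' code:

```python
def standardized_difference(estimate_s, actual_s):
    """Signed percentage deviation of a time estimate from the actual duration."""
    return (estimate_s - actual_s) / actual_s * 100.0

# Hypothetical estimates (in seconds) for a 60 s task; negative values
# indicate underestimation of the elapsed time.
estimates = [40, 55, 80, 200]
actual = 60
diffs = [standardized_difference(e, actual) for e in estimates]

# Outlier rule from the study: remove estimates whose standardized
# difference from the actual time is 100% or greater.
kept = [d for d in diffs if abs(d) < 100.0]
```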
Empirical Musicology, 10 Years On: New Ways to publish and the empirical
paradigm in music research
Daniel Shanahan¹
¹School of Music, Louisiana State University, Baton Rouge, LA, USA
2016 marks the 10th anniversary of the open-access journal Empirical Musicology Review
(http://emusicology.org/), which has followed the open peer-review model of Behavioral and
Brain Sciences, in which articles are published alongside commentaries by peers, in an attempt to
open the lines of discourse in the field. Since the journal began, however, the academic
publishing world has seen seismic shifts in attitudes toward access, peer-review, and the
publishing model in general. At the same time, empirical musicology as a research field has
developed substantially over the past ten years along with developments in neighbouring areas
such as music information retrieval, music and neurosciences, embodied music cognition and
digital musicology. These neighbouring research areas all subscribe to the empirical research
paradigm and the idea that musicological and scientific insights need to be based on empirical
evidence. However, while they share the empirical approach to music research, their individual
research perspectives and their ways of presenting themselves as current areas of music research
differ markedly. Given this diversity of empirical approaches to music research, can and should
empirical musicology provide a conceptual bracket that facilitates exchange of research ideas
and empirical findings between these areas?
This symposium will address several short-topic questions related to the recent changes in scientific publishing, as well as the role of empirical musicology as a paradigm for modern music research. Topics will be presented by several contributors in short position papers (5–7 minutes), followed by a roundtable discussion. Specifically, topics will focus on the role of open access in music research, what questions empirical musicology should seek to answer, what tools and methods are currently lacking, the benefits and disadvantages of open peer review, and a broader focus on the past, present, and future of empirical musicology.
Although this deviates from a typical symposium consisting of longer talks, we are hopeful that
the representation of scholars with varying interests and from many different countries will
provide an equally rich discussion.
How can music theory, music psychology and music informatics strengthen
each other as disciplines?
Anja Volk¹
¹Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands
Can new publishing models benefit researchers who read and respond
constructively to work by others?
Alan Marsden¹
¹Department of Music, Lancaster University, Lancaster, UK
How and why does empirical data make music research more impactful?
Alexandra Lamont¹
¹School of Psychology, Keele University, Keele, UK
Where will empirical musicology venture over the next 10 years and what
does it take to get there?
Elizabeth Hellmuth Margulis¹
¹School of Music, University of Arkansas, Fayetteville, AR, USA
The future of the specialised music research journal in the age of digital
publishing
Daniel Müllensiefen¹
¹Department of Psychology, Goldsmiths, University of London, London, UK
The field of music cognition has developed for approximately 150 years. However,
extended discussion of its origins and growth in a historical sense has emerged only
recently. Investigating how and why music became one of the domains of brain research
in the twentieth and twenty-first centuries can lead to a deeper understanding of
psychology and cognitive science today. Exploring the roots of music research takes us
back to the nineteenth century. Most music psychologists agree there was an increasing
interest in music and brain in the late nineteenth century, particularly in the work of
Hermann von Helmholtz and the beginnings of psychoacoustics. The late nineteenth
century also saw the emergence of musicology, psychology and neurology as new fields
of study. Literature in these areas shows that Helmholtz was not alone, but that in fact,
researchers in all of these disciplines were interested in music and the brain. By the end
of the nineteenth century, the study of music and brain had formed a separate identity
within the fields of neurology, psychology and musicology. Many issues that were
explored in the late nineteenth and early twentieth centuries have become part of
twenty-first-century investigations in music cognition and in neuroscience: localization of music function, music impairments, perceptual and cognitive processing, and the relationship between music and emotion. During the early and mid-twentieth century,
however, the field of music cognition did not continue to grow as rapidly as during the
late-nineteenth century. The first talk will focus on the contributions of neurologists to
the study of music cognition in the late nineteenth century. The second talk will explore
the overlapping interest of late nineteenth-century neurologists, psychologists, and
musicologists who shared an interest in music cognition. The final talk will focus on the
impact of American Behaviorism on the field of music psychology during the early and
mid-twentieth century.
The historical roots of the current interest in studying music and brain can be traced back
to the second half of the nineteenth century when neurology and psychology were
gaining momentum as respected fields. During this time, music was used to examine and
better understand higher cognitive functioning in persons with brain damage, particularly
those with aphasia (i.e., difficulty speaking or understanding language). Early
neurologists were fascinated by the paradox that some patients who were unable to speak
could sing. This seemingly simple observation inspired decades of clinical inquiry about
music and brain, and these early neurologists developed theories about how music was
processed by the brain. The purpose of this talk is to summarize observations of music
and brain through the eyes of three important nineteenth-century neurologists: August
Knoblauch, Hermann Oppenheim, and Jean-Martin Charcot. We used historical methods
to review archival material. The presentation will discuss similarities and differences
among these three figures. Music played an important role in early thinking about
cognition and brain and helped shape the emerging field of neurology. After various music abilities had been investigated over time and in different patients, it became apparent that music involves a complex cognitive system and multiple brain functions.
As part of a new interest in how the brain processes music, prominent nineteenth-century
neurologists began conducting investigations of music cognition in patients with
neurological impairments. After careful observations of many patients, these German,
French and English scientists identified a new neurological syndrome referred to as
“amusia”—a systematic breakdown of musical abilities as a result of brain damage. This
led to discussion of localization of music function within the brain. At the same time,
investigations by psychologists and musicologists discussed similar concepts. However,
research by psychologists and musicologists followed a different path from the
neurologists, one focused on multiple levels of sensory processing and mental
representation. The purpose of this talk is to summarize the work of nineteenth-century
psychologists and musicologists who were interested in music and brain, in particular
Carl Stumpf and Richard Wallaschek, and to discuss parallels with the work of
contemporary neurologists (Knoblauch and Oppenheim, discussed in the first talk, and
English neurologist John Hughlings Jackson). We used historical methods to review archival material, which reveals that music played a greater role in nineteenth-century scientific thought within psychology and neurology than has previously been acknowledged.
Today’s favorable climate for music psychology contrasts with the climate found within
early and mid twentieth-century American Behaviorism. Behaviorism was both a
negative reaction to earlier decades that had produced a seemingly unproductive introspective psychology and a positive reaction to the success of Pavlov’s learning laws derived from animal research. Between 1913 and 1975 (the cusp of the Cognitive
Revolution), Behaviorism promoted the study of “objectively observable behavior” and
demoted studies of mind and imagery. New laboratories for experiments with rats and
pigeons created an atmosphere unconducive to music research. A few researchers such
as Carl Seashore soldiered on in the music realm. But PsycInfo reveals a dip in the
number of papers referring to “music” (as compared to “psychology” and “behavior”, for
example) during this Behaviorist period. Several milestones mark a return to the
legitimacy of music psychology. In 1969, Diana Deutsch published an article in
Psychological Review entitled “Music Recognition,” applying a hierarchical physiological model of feature detectors to account for melodic transposition. There followed, under the editorship of George Mandler, several more articles on music themes. In 1978, Roger
Shepard organized a symposium on the cognitive foundations of music pitch at the
annual meeting of the Western Psychological Association. It took place in this very hotel,
introducing, to a standing-room-only crowd, Carol Krumhansl and the tonal hierarchy
along with several other (then young) researchers active to this day. Thus began the post-Behaviorism legitimacy of music cognition research. As this recognition became more widespread, specialty journals sprang up. In spite of the downside to Behaviorism, music psychology benefited indirectly from Behaviorism’s rigor and from developments in statistics. The historical context highlights what a privilege it is to study music psychology in the current cognitive/neuroscience zeitgeist.
When musicians first learn pieces of music, serial cuing activates memory for what
follows, as each passage played or sung is linked to the next. During memorization, serial
memory is overlaid with different types of performance cues (PCs), enabling musicians to
recall a specific passage, independent of musical context, simply by thinking of it, e.g.,
“coda”. The locations of retrieval cues can be identified by the presence of serial position
effects in free recall. We looked for such effects in the written recall of an experienced
singer (the first author) who wrote out the score of a song memorized for public
performance six times over a period of five years. The singer recorded all of her practice.
After the performance she marked the locations of phrases and sections in the score, and
the locations of the PCs that she had used. Previously, we have reported how serial
position effects for PCs were different for different types of PC. Here, we use logistic and
growth-curve mixed models to examine how serial position effects for PCs were affected
by practice, showing how recall changed over time for better and worse. Initially, recall
was perfect. Fourteen months after the performance, there was a normal primacy effect at
Expressive PCs marking musical feelings, and an inverse primacy effect at Word PCs
marking nuances of stress and pronunciation. By the time of the last recall, five years
after performance, these effects had reversed. The normal effect for Expressive PCs
became inverse, and vice versa for Word PCs; and there was also an inverse effect for
Basic PCs marking details of technique such as breathing. PCs acted as retrieval cues,
providing content addressable access to an associatively chained memory and countering
the effect of weakening links between passages. The direction of primacy effects
depended on the salience of each type of PC, which changed over time with the singer’s
interest in the piece, initially as performer and later as researcher.
Musically untrained listeners remember the key of previously unheard melodies, but the
key manipulations in earlier research were large (6 semitones). Here we sought to
document the accuracy of implicit memory for key. Our 24 stimulus melodies were taken
from the vocal lines of unknown Broadway songs and presented in piano timbre. Each
was saved in five different keys. The standard (lowest) key had a median pitch of G4.
Other versions had a median pitch 1, 2, 3, and 6 semitones (ST) higher than the standard.
Musically untrained participants were tested in one of four conditions (1, 2, 3, and 6 ST)
based on the magnitude of the key change. Before the experiment began, listeners learned
that key is irrelevant to a melody’s identity. During an exposure phase, listeners heard 12
of 24 melodies (selected randomly): 6 in the standard key, 6 in the higher key. After a 10-
min break, they heard all 24 melodies. The 12 new melodies were divided equally among
the standard and higher keys. Six of the 12 old melodies were presented in the original
key. The other six were transposed from the standard to higher key or vice versa.
Listeners rated how confident they were that they had heard the melody in the exposure
phase. Across conditions, ratings were much higher for old than new melodies, indicating
that the melodies were remembered well. Memory for old melodies was also superior
when they were re-presented in the original key, indicating memory for key. This
original-key advantage was evident in the 2, 3 and 6 ST conditions. It disappeared,
however, in the 1 ST condition. Overall recognition (ignoring the key change) was also
better in the 1 and 2 ST conditions than in the 3 and 6 ST conditions. The findings
indicate that (1) implicit memory for the key of previously unfamiliar melodies is
remarkably accurate, and (2) memory for a corpus of novel melodies is superior when the
overall pitch range is restricted.
Background
Performance cues (PCs) are the landmarks of a piece of music that a performer attends to
during performance. Experienced performers strategically select aspects of the music to
attend to during performance in order to achieve their musical and technical goals; these
are their PCs. While most aspects of performance become automatic with practice, PCs
provide musicians with a means of consciously controlling the otherwise automatic motor
sequences required in performance.
The hallmark symptom of Alzheimer’s Dementia (AD) is impaired memory, but memory
for familiar music can be relatively spared. We investigated whether a non-musician with
severe AD could learn a new song and explored the mechanisms underlying this ability.
NC, a 91-year-old woman with severe AD (diagnosed 10 years ago) who loved singing
but had no formal music training, was taught an unfamiliar Norwegian tune (sung to ‘la’)
over 3 sessions. We examined (1) immediate learning and recall, (2) 24 hour recall and
re-learning and (3) 2 week delayed recall. She completed standardised assessments of
cognition, pitch and rhythm perception, famous melody recognition and a music
engagement questionnaire (MEQ, informant version), in addition to a purpose-developed two-word recall task (words presented as song lyrics, as a proverb, or as a word-stem completion task). She showed impaired functioning across all cognitive domains, especially verbal recall (0/3 words after a 3-minute delay). In contrast, her pitch and rhythm
perception was intact (errorless performance of 6 items each) and she could identify
familiar music. Her score on the ‘response’ (responsiveness to music) scale of the MEQ
was in the superior range relative to healthy elderly participants. On the experimental
tasks she recalled 0/2 words that were presented in song lyrics or proverbs, but correctly
completed both word stems, suggesting intact implicit memory function. She could sing
along to the learnt song on immediate and delayed recall (24 hours and 2 weeks later,
both video recorded), and with prompting could sing it alone. On objective ratings of
pitch and rhythm accuracy during song recall she showed better recall of the rhythm. This
is the first study to demonstrate preserved ability to learn a new song in a non-musician
with severe AD. Underlying mechanisms may include intact implicit memory, superior
responsiveness to music and increased emotional arousal during song compared with
word learning.
Vocal melodies are remembered better than piano melodies but sine-wave simulations are not
Michael W. Weiss¹, Sandra E. Trehub¹, and E. Glenn Schellenberg¹
¹Department of Psychology, University of Toronto Mississauga, Mississauga, Ontario, Canada
Listeners remember vocal melodies (la la) better than instrumental melodies (Weiss et
al., 2012, 2015a, b), but it is unclear which acoustic features are responsible for the
memory advantage. To date, the instrumental timbres tested (piano, banjo, marimba)
have had fixed pitches and percussive amplitude envelopes, in contrast to the voice,
which incorporates dynamic variation in pitch and amplitude, rich spectral information,
and indexical cues about the singer. In the present study, we asked whether artificial
stimuli with voice-like pitch and amplitude variation could elicit a memory advantage
like that observed with natural vocal stimuli. Vocal renditions from Weiss et al. (2012)
were used as source materials for generating “sine-wave” renditions in Praat.
Specifically, continuous pitch (fundamental frequency) and amplitude information were
extracted from the natural vocal performances and applied to a continuous tone of
equivalent duration. The resulting melodies sounded more like dynamic theremin
performances than like human singing. In the familiarization phase, listeners heard 16
melodies (half natural or sine-wave voices, half piano) and rated their liking of each
melody. Half of the listeners received naturally sung melodies and half received sine-
wave versions (n = 30 per group). In the test phase, listeners heard the same melodies and
16 timbre-matched foils, and they judged whether each was old or new on a 7-point scale
(definitely new to definitely old). Listeners exhibited better memory for natural vocal
melodies than for piano melodies (p = .012), as in previous research, but they performed
no differently on sine-wave melodies and piano melodies (p = .631). These findings
suggest that dynamic pitch and amplitude information from singing, in the absence of
natural vocal timbre, is insufficient to generate enhanced processing. Instead, such
processing advantages may depend on the full suite of acoustic features that is unique to
human voices.
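The sine-wave resynthesis described above (a pitch contour and amplitude envelope applied to a continuous tone) can be approximated outside Praat. A minimal NumPy sketch, with hypothetical contours standing in for ones extracted from a vocal recording:

```python
import numpy as np

def sine_wave_rendition(f0_hz, amp, sr=44100):
    """Apply a time-varying F0 contour and amplitude envelope to a single tone."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    amp = np.asarray(amp, dtype=float)
    # Integrate the instantaneous frequency to obtain the carrier phase.
    phase = 2.0 * np.pi * np.cumsum(f0_hz) / sr
    return amp * np.sin(phase)

# Hypothetical contours: a 1 s glide from A4 (440 Hz) to E5 (~659 Hz)
# shaped by a smooth (Hann) amplitude envelope.
sr = 44100
f0 = np.linspace(440.0, 659.0, sr)
env = np.hanning(sr)
tone = sine_wave_rendition(f0, env, sr)
```

In the study itself the contours were extracted from the natural vocal performances; only the resynthesis step is sketched here.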
Listeners naive to a musical genre are thought to process structures on the musical
surface differently than experts familiar with the genre. However, only a small amount of research has explored claims of differences in musical processing abilities associated with expertise beyond musical training. In this paper, we investigate the perception of leitmotives in Wagner’s Der Ring des Nibelungen in two experiments
requiring listeners to recall musical material based on a short excerpt. We hypothesized
that both individual differences as well as musical parameters of the leitmotives would
contribute to the memory performance of the listeners. In the experiment participants
(N=100 and N=30) listened to a ten-minute excerpt from the opera Siegfried and
completed a memory test to indicate which leitmotives they remembered hearing in the
excerpt. The independent variables measured were musical training, German-speaking ability, measures of Wagner affinity and knowledge, and acoustic similarity as determined by chroma distance. As dependent variables, participants provided ratings of
implicit and explicit memory as well as affective ratings for each leitmotive in the
memory test. A structural equation model of the data indicates that Wagnerism, a latent
trait combining affinity to and knowledge of Wagner’s music is almost twice as strong as
a predictor for participants’ leitmotive memory than musical training. Regression analysis
of the musical parameters indicated that acoustic chroma was a significant predictor for
the individual difficulty of leitmotives but subjective arousal ratings were not. Results
suggest that stylistic expertise can be independent from general musical expertise and that
stylistic expertise can contribute strongly to a listener's perception. We suggest that
incorporating stylistic expertise into models of music cognition can account for
differences in listeners’ performance scores and can thus make models of musical
cognition more robust.
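The abstract does not specify how chroma distance was computed; one common approach, shown here purely as a hypothetical sketch, is the cosine distance between time-averaged 12-bin chroma (pitch-class) profiles of two excerpts.

```python
import numpy as np

def chroma_distance(chroma_a, chroma_b):
    """Cosine distance between two 12-bin pitch-class profiles
    (one bin per pitch class C, C#, ..., B), e.g. time-averaged
    chromagrams of two leitmotive recordings.
    Returns 0.0 for identical profiles, 1.0 for orthogonal ones."""
    a = np.asarray(chroma_a, dtype=float)
    b = np.asarray(chroma_b, dtype=float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Whatever the exact metric used, higher distances between leitmotive recordings would indicate lower acoustic similarity in the regression analysis described above.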
pieces. The difficulty of the identification task and the high number of items in the list
reduced the likelihood that listeners would identify pieces by process of elimination. The
twelve pieces that were most often identified in the pilot study were selected for the main
experiment: The Animals, House of the Rising Sun; The Beatles, Let It Be; Capital Cities,
Safe and Sound; Coldplay, Clocks; Coldplay, Viva La Vida; Daft Punk, Get Lucky; Elgar,
Pomp and Circumstance; Led Zeppelin, Stairway to Heaven; Nirvana, Smells Like Teen
Spirit; Pachelbel, Canon in D Major; Red Hot Chili Peppers, Snow (Hey Oh); Tchaikovsky,
Dance of the Sugar Plum Fairy.

III. MAIN EXPERIMENT

A. Participants
There were one hundred participants in this experiment. Two participants were rejected
since they had very limited familiarity with Western popular music. Eighty-seven
participants were undergraduate students enrolled in one or more music classes at the
University of Pittsburgh. Of the other eleven participants, five were composition doctoral
or master's students, two were music theorists, two were instrument instructors, one was
a professional musician, and one was an amateur musician. The participants were divided
into four groups according to their musical background. In the first group, labeled
professionals (N = 17; 13 male), the participants were professional musicians or
professional music students. In the second group, labeled serious amateur musicians
(N = 16; 7 male), the participants had no professional musical training but had had
private instrument lessons for more than five years. The third group, labeled amateur
musicians (N = 40; 23 male), consisted of participants with no professional musical
training but who had had 5 years or less of private instrument lessons or had played an
instrument for more than 5 years. The participants in the fourth group, labeled
non-musicians (N = 25; 15 male), had not studied music nor did they actively play any
instrument.

B. Stimuli
The chord sequences were first composed using Finale 2007 software. The harmonic
reductions were composed so that each new harmony of the original commercial recording
was composed into a chord. The specific notes for each chord in the reductions were
chosen according to the same criteria used in the pilot experiment. For the nine harmonic
reductions taken from popular songs, the duration of the chords matched the original
duration in the most viewed version of the song on YouTube. For the three harmonic
reductions taken from classical pieces, the duration of the chords was determined by
averaging the duration of every chord in the ten most viewed versions of the piece on
YouTube.
Both piano tones and Shepard tones were used in the harmonic reductions. Shepard tones
are vague in terms of pitch register, which greatly reduces the clarity of melodic gestures,
voicing, and chordal inversions. For this reason Shepard tones have been used regularly in
experiments where researchers want to minimize the effects of melodic cues in tasks that
aim to examine participants' responses to harmony [17]-[21]. Audio excerpts for the
second part of the experiment were extracted from commercial recordings of the twelve
pieces used in the experiment. Excerpts lasted 15 seconds and contained the harmonic
reductions used in the first part of the experiment. Most excerpts from songs included
vocals, but no excerpt contained words from the title of the song.

C. Transposition
Although most listeners can identify familiar melodies when transposed to a different key
based on relative pitch information (e.g., Happy Birthday), we predicted that for pieces
that are usually performed at the same pitch level, the presence of absolute pitch cues
(i.e., hearing the music at its most typical pitch level) would be an important factor in the
identification of a piece from harmonic reduction, that is, in the absence of important
rhythmic, timbral, and textural cues. Thus, we expected that playing harmonic reductions
in atypical keys would make identification more difficult. Six different pitch levels were
used for the main experiment: original, one semitone down, one semitone up, two
semitones down, two semitones up, and tritone.

D. Procedure
In the first part of the experiment, the 12 harmonic reductions representing the 12 pieces
were presented using either piano tones or Shepard tones. After hearing each reduction,
the participants were asked to respond either with the name of the piece, some words of
the lyrics, or some description of the piece if they could not recall the exact name or
lyrics. In the second part of the experiment, the tunes were presented as commercial
recordings, and the participants were asked to name the pieces as well. Participants were
also asked how often they had heard the piece or if they had played the piece themselves.

E. Results
The scores were summarized for each timbre and each group of participants. If a
participant did not identify a piece from the commercial recording, all his/her responses
for harmonic reductions of the piece were omitted from analysis. There were a few
instances where participants mentioned a piece that was not in our list but that matched
the chord progressions featured in the harmonic reduction. Based on their scarcity and the
fact that our harmonic reductions had a less direct musical connection to those pieces
than to the pieces in our list, we decided to exclude those pieces from our main analysis.
Fig. 1 shows the percentages of identified harmonic reductions played using the two
timbres for the four groups of participants. The figure shows that the professionals
identified more pieces from their harmonic reductions than did the other participants.
The difference between professionals and other participants was relatively large, but the
differences between the three other subgroups were small, especially when the reductions
were played using Shepard tones. As can also be seen, the reductions played using
Shepard tones were generally more difficult to identify than the reductions played using
piano tones. A two-factor ANOVA analysis was made using two
timbres of the reductions and the musical background of the participants as experimental
variables. The analysis confirmed the results: both the timbre (F(1, 97) = 10.031,
p = .002) and the musical training of participants (F(3, 291) = 24.798, p < .001) were
statistically significant factors in explaining the responses. The group × timbre
interaction was not statistically significant (p = .545). The Bonferroni post-hoc test
confirmed that the subgroup of professional musicians differed from the other participant
subgroups (p < .001) and that there was no difference between the other participant
subgroups. Since the subgroup of professionals differed from the other participant
subgroups, and since the responses of the other three subgroups did not differ, the three
subgroups were merged into one group of non-professionals, and the rest of the analyses
were made using data from only two groups of participants labeled as professionals and
non-professionals.

that create a pattern similar to that of the bass in Pachelbel's Canon.
The effect of transposition on music identification was also analyzed. Fig. 2 shows the
percentages of pieces identified from the six transpositional levels, separately for
professionals and non-professionals. ANOVA analysis showed that the effect of
transposition was statistically significant neither for the non-professionals
(F(5, 400) = 1.127, p = .345) nor for the professionals (F(5, 80) = 1.750, p = .131).
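The Shepard tones used for the harmonic reductions can be synthesized by summing octave-spaced sinusoids under a fixed bell-shaped spectral envelope, which keeps pitch class audible while obscuring register. The sketch below is illustrative only; the envelope width, octave count, and sample rate are assumptions, not the parameters of the actual stimuli.

```python
import numpy as np

def shepard_tone(pitch_class_hz=261.63, dur=1.0, sr=44100, n_octaves=8):
    """Sum octave-spaced sinusoids whose amplitudes follow a Gaussian
    envelope over log-frequency, so the pitch class is clear but the
    octave (register) is ambiguous."""
    t = np.arange(int(dur * sr)) / sr
    base = pitch_class_hz / 2 ** (n_octaves // 2)  # start several octaves down
    center = np.log2(pitch_class_hz)               # peak of the spectral envelope
    tone = np.zeros_like(t)
    for k in range(n_octaves):
        f = base * 2 ** k
        # Components near the envelope center are loud; the extremes fade
        # out, hiding where the complex "begins" and "ends" in register.
        w = np.exp(-0.5 * ((np.log2(f) - center) / 1.5) ** 2)
        tone += w * np.sin(2 * np.pi * f * t)
    return tone / np.abs(tone).max()               # normalize to [-1, 1]
```

Playing a chord then amounts to mixing one such tone per chord pitch class, which is why melodic gestures, voicings, and inversions become unclear.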
extra-musical associations (e.g., weddings, graduations, Christmas), pervasive presence
in popular culture, and distinctive sequential pattern may have facilitated its
identification, the rhythmic resemblance between the block chords in our stimuli and the
beginning of Pachelbel's piece is likely to have simplified the process of identification as
well. It is important to note that for most of the pieces we selected for our experiment,
deviation from the slow, regular rhythm of our block-chord stimuli also meant deviation
in terms of melodic content. That is, the more rhythmically dense the original excerpts
were, the more melodic gestures were omitted in the harmonic reduction. The correlations
between rhythmic similarity and identification rates for individual pieces from piano
chordal reductions are consistent with the idea that rhythmic similarity does indeed
facilitate identification from harmonic reductions. The role that melodic and rhythmic
cues played in the identification of music from harmonic reductions in the present study
provides further evidence of their influence on the mental representation of harmonic
activity.
In addition to testing the influence of musical training on the identification of music from
harmony, our experiment also tested the influence of transposition. Although music was
most often identified from harmonic reductions played at their typical pitch level in our
experiment, we found no significant effect of transposition on music identification.

V. CONCLUSIONS AND FUTURE WORK
This study showed that music identification from harmonic reductions is possible, yet
there are many factors affecting the identification process. It appears that the use of
harmonic information in identification is easiest for professional musicians, especially
composers and theorists. Melodic and rhythmic cues are important, and they are always
present in chord progressions and harmonic reductions. The association of a harmonic
reduction with a specific song is just one way that the interaction of harmony and
episodic memory can influence our perception of music, suggesting that the seemingly
idiosyncratic task of explicitly identifying music from harmonic reductions can improve
our general understanding of how listeners make sense of music. Future research on
music identification from harmony is needed to clarify whether the association of
harmony with episodic memory occurs implicitly or not, as well as how such unconscious
associations impact our experiences with music.

ACKNOWLEDGMENT
The authors would like to thank Matti Strahlendorff for the Shepard tone stimuli used in
this study and Lynne Sunderman for her proofreading of this manuscript.

REFERENCES
[1] A. R. Halpern and J. C. Bartlett, "Memory for melodies," in Music Perception. New York: Springer, 2010, pp. 233-258.
[2] D. J. Povel and R. Van Egmond, "The function of accompanying chords in the recognition of melodic fragments," Music Perception, vol. 11, pp. 101-115, 1993.
[3] T. Kuusi, "Tune recognition from melody, rhythm and harmony," in Proc. 7th Triennial Conference of the European Society for the Cognitive Sciences of Music (ESCOM), Jyväskylä, Finland, 2009.
[4] R. J. Scott, Money Chords: A Songwriter's Sourcebook of Popular Chord Progressions. Bloomington, IN: iUniverse, 2000.
[5] N. Stoia, "The common stock of schemes in early blues and country music," Music Theory Spectrum, vol. 35, pp. 194-234, 2013.
[6] C. Czerny, School of Practical Composition: Complete Treatise on the Composition of All Kinds of Music (J. Beshop, Trans.). New York: Da Capo Press, 1979 [reprinted from Die Schule der Praktischen Tonsetzkunst, Bonn: Simrock, 1849].
[7] E. Aldwell, C. Schachter, and A. C. Cadwallader, Harmony & Voice Leading, 4th ed. Boston, MA: Schirmer/Cengage Learning, 2011.
[8] J. P. Clendinning and E. W. Marvin, The Musician's Guide to Theory and Analysis, 2nd ed. New York: W. W. Norton, 2011.
[9] M. L. Serafine, N. Glassman, and C. Overbeeke, "The cognitive reality of hierarchic structure in music," Music Perception, vol. 6, pp. 397-430, 1989.
[10] N. Dibben, "The cognitive reality of hierarchic structure in tonal and atonal music," Music Perception, vol. 12, pp. 1-25, 1994.
[11] C. L. Krumhansl, "Plink: 'thin slices' of music," Music Perception, vol. 27, no. 5, pp. 337-354, 2010.
[12] K. VanWeelden, "Classical music as popular music: adolescents' recognition of Western art music," Applications of Research in Music Education, vol. 31, pp. 1-11, 2012.
[13] K. VanWeelden, "Preservice music teachers' recognition of Western art music found in popular culture," Journal of Music Teacher Education, vol. 24, pp. 65-75, 2014.
[14] K. Frieler, T. Fischinger, K. Schlemmer, K. Lothwesen, K. Jakubowski, and D. Müllensiefen, "Absolute memory for pitch: A comparative replication of Levitin's 1994 study in six European labs," Musicae Scientiae, vol. 17, pp. 334-349, 2013.
[15] T. deClercq and D. Temperley, "A corpus analysis of rock harmony," Popular Music, vol. 30, pp. 47-70, 2011.
[16] D. Parry, The 50 Greatest Pieces of Classical Music. London Philharmonic Orchestra. X5 Music Group. ASIN: B002WMP162, 2009.
[17] J. J. Bharucha, "Anchoring effects in music: The resolution of dissonance," Cognitive Psychology, vol. 16, pp. 485-518, 1984.
[18] J. J. Bharucha and K. Stoeckig, "Reaction time and musical expectancy: Priming of chords," Journal of Experimental Psychology: Human Perception and Performance, vol. 12, pp. 403-410, 1986.
[19] É. A. Firmino, J. L. O. Bueno, and E. Bigand, "Travelling through pitch space speeds up musical time," Music Perception, vol. 26, pp. 205-209, 2009.
[20] T. Kuusi, "Interval-class and order of presentation affect interval discrimination," Journal of New Music Research, vol. 36, pp. 95-104, 2007.
[21] C. L. Krumhansl, J. J. Bharucha, and E. J. Kessler, "Perceived harmonic structure of chords in three related musical keys," Journal of Experimental Psychology: Human Perception and Performance, vol. 8, pp. 24-36, 1982.
[22] E. Cao, M. Lotstein, and P. N. Johnson-Laird, "Similarity and families of musical rhythms," Music Perception, vol. 31, pp. 444-469, 2014.
[23] J. Forth, Cognitively-Motivated Geometric Methods of Pattern Discovery and Models of Similarity in Music. PhD dissertation, Goldsmiths, University of London, 2012.
[24] G. T. Toussaint, The Geometry of Musical Rhythm: What Makes a "Good" Rhythm Good? Boca Raton, FL: CRC Press/Taylor & Francis Group, 2013.
[25] M. M. Farbood, "A parametric, temporal model of musical tension," Music Perception, vol. 29, pp. 387-428, 2012.
[26] R. Kopiez and F. Platz, "The role of listening expertise, attention, and musical style in the perception of clash of keys," Music Perception, vol. 26, no. 4, pp. 321-334, 2009.
[27] D. Sears, W. E. Caplin, and S. McAdams, "Perceiving the classical cadence," Music Perception, vol. 31, no. 5, pp. 397-417, 2014.
[28] L. R. Williams, The Effect of Musical Training and Musical Complexity on Focus of Attention to Melody or Harmony. Unpublished doctoral dissertation, Florida State University, Tallahassee, FL, 2004.
[29] R. S. Wolpert, "Attention to key in a non-directed music listening task: Musicians vs. nonmusicians," Music Perception, vol. 18, pp. 225-230, 2000.
When recalling sequences of notes briefly presented on a musical stave, expert musicians
show better performance than novices. Interestingly, this skill effect occurs even with
unstructured material, such as random sequences of notes. Classically, the advantage of
experts at recalling domain-specific meaningless material, such as shuffled chess
positions or random notes, has been convincingly explained by chunking theory. In fact,
theories assuming small memory structures (i.e., chunks) predict a moderate skill effect,
because even in unstructured material some small meaningful structures, which experts
are more likely to recognize and recall than novices, can occur by chance.
The aim of this meta-analysis is thus to establish (a) the size of experts' superiority at
recalling unstructured domain-specific material, and (b) the possible differences between
diverse domains of expertise (music, programming, board games, and sports). A
systematic search strategy is used to find studies comparing experts' and novices'
performance at recalling random material. A random-effects meta-analytic model (K = 28)
is then run to calculate the overall correlation between expertise and performance. The
results show (a) a moderate overall correlation (ρ = .332, 95% CI [.221, .434], p < .001)
and, surprisingly, (b) a significantly stronger correlation (ρ = .706, 95% CI [.539, .820],
b = 0.455, z = 2.54, p = .011) in the domain of music compared to all the other domains
considered.
These outcomes suggest that expert memory is qualitatively different in music, because
such a large skill effect cannot be accounted for only by meaningful chunks of notes
occurring by chance. Possibly, whilst experts in other fields base their superiority on a
greater number of small meaningful domain-specific structures stored in long-term
memory, expert musicians benefit from a retrieval structure specific to notes on the
musical stave, which would explain such a strong correlation.
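The random-effects pooling step can be illustrated with a minimal DerSimonian-Laird estimator applied to Fisher-z-transformed correlations. This is a generic sketch of the method, not the study's analysis code, and the example correlations and sample sizes in the usage note are invented.

```python
import math

def fisher_z(r):
    """Fisher z-transform of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def random_effects_pool(rs, ns):
    """Pool study correlations rs (sample sizes ns) with a DerSimonian-Laird
    random-effects model; returns the pooled correlation and its 95% CI."""
    zs = [fisher_z(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]          # within-study variance of z
    w = [1.0 / v for v in vs]                 # fixed-effect weights
    z_fixed = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    q = sum(wi * (zi - z_fixed) ** 2 for wi, zi in zip(w, zs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in vs]   # random-effects weights
    z_re = sum(wi * zi for wi, zi in zip(w_star, zs)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    lo, hi = z_re - 1.96 * se, z_re + 1.96 * se
    # back-transform from z to r (tanh inverts the Fisher transform)
    return math.tanh(z_re), (math.tanh(lo), math.tanh(hi))
```

For example, random_effects_pool([0.30, 0.45, 0.25, 0.50], [40, 60, 50, 30]) yields a pooled correlation lying between the smallest and largest study-level values, with a confidence interval around it.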
Most memory domains show clear performance decrements as the number of intervening
items between a stimulus’ first exposure and its reoccurrence increases. For example,
spoken word recognition decreases as the number of additional intervening words
increases between a target word’s first and second occurrence. However, using
continuous recognition paradigms in the context of Western tonal music, our work
demonstrates that this is not the case in memory for melody: the first intervening melody
results in a significant performance decrement; however, every additional intervening
melody does not impede melody recognition. We have replicated this observation
multiple times using different melody corpora, implicit/explicit measurements, and up to
190 intervening melodies. To explain these results, we posit a model describing memory
for melody as the formation of multiple representations that can regenerate one another;
specifically, a representation of the melody as an integrated whole, and a representation
of its underlying parts. The model predicts that removing a familiar musical context
established by previous musical experience disrupts integration of melodies as a whole.
In this case, decrements in recognition performance should be observed as the number of
intervening melodies increases. This hypothesis was tested in the present study using
novel melodies in a microtonal tuning system unfamiliar to participants. The study used
the same continuous recognition paradigm that previously produced no effect of the
number of intervening items in response to Western tonal stimuli. Results provide
preliminary support for the model’s predictions and show that when presented with an
unfamiliar tonal system, memory for melodies does indeed deteriorate as the number of
intervening items continuously increases beyond immediate repetition. Overall, the
results advance our understanding of memory mechanisms and pave the way for future
work designed to predict melody recognition.
Adults (musicians and nonmusicians) and older children exhibit better memory for vocal
melodies (without lyrics) than for instrumental melodies (Weiss et al., 2012, 2015a, b),
which implies processing advantages for a biologically significant timbre. Because
previous studies used a single female vocalist, it is possible that something about her
voice or about female voices in general facilitated melodic processing and memory. In
the present study, we compared adults’ memory for vocal and instrumental melodies, as
before, but with two additional singers, one female (same pitch level as the original
female) and one male (7 semitones lower). Instrumental melodies were MIDI versions at
the same pitch level as the corresponding vocal recording (i.e., male or female). (Previous
research showed no memory differences for natural and MIDI versions.) During the
exposure phase, participants rated their liking of 24 melodies (6 each in voice, piano,
banjo, and marimba), with random assignment to one of three voices (n = 29, female
from previous research; n = 32, new female; n = 29, male). After a short break,
recognition memory was tested for the same melodies plus 24 timbre-matched foils (6 per
timbre). A mixed-model ANOVA and follow-up pairwise comparisons revealed better
recognition memory for vocal melodies than for melodies in every other timbre (ps <
.001), replicating previous findings. Importantly, the memory advantage was comparable
across voices (i.e., no interaction, F < 1) despite the fact that liking ratings for vocal
melodies differed by singer. Our results provide support for the notion that the vocal
advantage in memory for melodies is independent of the idiosyncrasies of specific singers
and of vocal attractiveness, arising instead from heightened arousal or attention to a
biologically significant timbre.
Learning and performing music draws on a host of cognitive abilities; might musicians
then have advantages in related cognitive processes? One aspect of cognition likely to be
related to musical training is executive function (EF). EF is a set of top-down
processes that regulate behavior and cognition according to task demands (Miyake et al.,
2000). Models of EF postulate three related, but separable components: inhibition, used
to override a prepotent response; shifting, used to switch between task demands; and
updating, used to constantly add and delete items from working memory.
Musicians likely draw on EF; for example, using inhibition to resolve musical
ambiguities, shifting to coordinate attention when playing with others, and updating when
sight-reading music. Because of this, many studies have investigated whether music
training is related to more general EF advantages. However, results from such studies are
mixed and difficult to compare. In part, this is because most studies look at only one
specific cognitive process, and even studies looking at the same process use different
experimental tasks.
The current study addresses these issues by investigating how individual differences in all
three EF components relate to musical training. By administering a well-validated EF
battery of multiple tasks tapping each EF component and a comprehensive measure of
musical training and sophistication (Müllensiefen et al., 2014), we can obtain reliable
measures of individual differences in EF and musical experience.
Abstract—The subject ME, who was introduced at the age of 95 in a study presented at
ICMPC 11, retains performance abilities at 101 that astound many listeners. Vascular
dementia is held to account for a loss of memory for biographical detail and some
spatial-recognition skills. Her performance of musical tasks and her ability to learn new
music on aural exposure have survived a recent stroke and other physical maladies intact.
This update provides additional data on health impacts, trends in her recent
performances, additional information on her background, and consideration of recent
studies.

Keywords—vascular (non-Alzheimer’s) dementia, lexical memory, semantic memory,
executive function, playing by ear (pertinent skills).

I. INTRODUCTION

as violist in a women’s orchestra in Nashville. She occasionally played trombone as well.
ME was married in 1940. She and her husband welcomed two sons over the next few
years. In 1946 the family moved to Jacksonville (FL). There she engaged in no substantial
musical activities utilizing her earlier skills until she was in her 90s, but informal piano
playing was a routine activity at holiday gatherings. After their children were grown, ME
and her spouse took up square dancing and visited other square-dance venues on regular
trips across the US to visit their sons. She also played in a hand-bell choir in her
retirement years.
In her 80s ME had a TIA, but its effects were so slight that no extensive medical
examination was conducted. Only when
II. MUSICAL RESOURCES FOR ANALYSIS
Data have been gathered in several ways. We have a video (1990) of a family gathering in
Tennessee in which several family members of ME’s generation played the piano. One also
sang. We have a one-hour Disklavier recording from Dec. 2009, with extracted MIDI files
that were discussed in [1]. Over the past six years ME’s family has made 16 audio
recordings of her appearances in the two homes and the church. These typically contain
30 minutes of music. When possible, direct observation has been made.

A. Repertory overview
One of ME’s sons began compiling a list of pieces she “remembered” when she first
entered assisted living. The director wanted some way of asking for what was possible.
Additions continue to be made, as requests that she has satisfied are added. (She can
almost always satisfy an ad hoc request.) The database currently contains almost 400
pieces, and it is clear that ME can in principle play many hundreds more, since her
performances always amount to a fresh reconstruction of something aurally familiar. One
value of the growing database is that at each appearance she is rotated to a new starting
position on the virtual dial. A list of titles (as many as 30) is usually prepared in advance.

B. Peak acquisition period
While it is not possible to determine when any given piece of music first entered the
recall lexicon of musicians who play by ear, we have used dates of first publication to
give a sketchy profile of the periods during which the piece was well known. At present
315 pieces have been anchored to such a year. These come with the caveat that
sheet-music publication rises and falls with economic contingencies. Many pieces in ME’s
repertory were reissued in new arrangements from era to era.

TABLE I
CHRONOLOGICAL DISTRIBUTION OF REPERTORY

Time span    No.  Percentage
1586-1860     23      7.4%
1861-1880     13      4.1%
1880-1900     15      4.7%
1901-1910     20      6.3%
1911-1920     36     11.5%
1921-1930     81     25.7%
1931-1940     58     18.4%
1941-1950     26      8.2%
1951-1960     25      7.9%
1961-1980      6      1.9%
1981-2000      2      0.7%
2001-2016     10      3.2%

Dates indicate year of original publication.

Fifty-five titles first appeared before ME was born. The overall year range is 1586-2016.
(ME continues to learn new music.) In the distribution of publication dates (TABLE I),
one can see a rapid decline in the acquisition of new titles between 1961 and 2000. Note
that her rate of acquiring new material in recent years has modestly increased,
presumably in response to greater musical engagement. This distribution also shows that
the bulk of ME’s performing repertory comes from the decades in which she was most
active musically (to 1940), but because of the likelihood that she learned some titles later
than indicated here, her learning curve may peak slightly later. The musical culture in
which ME was immersed was a live one in which nearly everyone with whom she
associated made music.

C. Genres and tonal properties
ME’s repertory consists largely of popular music, but under that rubric we find instances
of every kind of music in common use in southern Appalachia during ME’s formative and
early adult years. Of the 365 titles so far identified, popular titles number 242 (66.3%).
Other genres include songs popularized in Broadway shows and films of musicals (8.2%)
and hymns (6.8%). Music of a more indigenous character (ballads, folk and traditional
music, barbershop, country, Dixie, gospel songs, spirituals, jazz, and swing) collectively
makes up 13%. The remainder await classification.
With respect to tonal orientation, most pieces (82%) are in major keys. A further 7% are
in minor or modal frameworks, 7% are unidentified, 3% are in “blues” scales, and a few
(1%, probably Appalachian) are pentatonic.

III. MENTAL, PHYSICAL, AND COGNITIVE STATUS
Health professionals classify ME’s dementia as one of vascular origin. No suggestion of
corollary Alzheimer’s has been made. One of my colleagues has commented that her
profile suggests amnesia. (No instance of physical trauma has been identified to
substantiate this hypothesis.) Her mini-mental status exam (MMSE) score is static at 23.
Her greatest strengths are in digit span and calculation generally, although the profiles of
outstanding subjects in experimental work on digit span, such as [2], diverge
substantially from hers in being much younger and in being tested in abstract rather than
practical tasks. ME’s weaknesses involve lapses in visual memory (for people, places, and
regular routes of local travel). Her eyesight is sufficiently good not to require correction.
Her hearing in one ear has declined, but age-adjusted she still hears very well. Her aural
memory is superior.
ME’s short-term memory deficiencies progress towards recency. The retrograde aspect of
her memory has been static over the past six years. What she remembers is selective. Her
autobiographical memory dwells in the years from ages 3 to 8. Her visual memory falters
in early adulthood. She recognizes family members and those whom she has met prior to
the age of 75 or 80. For her 100th birthday a sister 13 years her junior and a middle-aged
niece traveled from the east coast to join the celebration. ME recognized both instantly
and found the reunion deeply affecting. (ME’s sister worked as a radio announcer for
about 35 years.) When the local mayor came to her facility to read a birthday
proclamation she appreciated the associated festivity but seemed unaware that it related
to her.
ME does not hesitate to initiate conversations. She shares with many dementia patients a
tendency to propose generic topics of conversation. “Beautiful day today, isn’t it?” is one
of her standard openers. Her most prominent social quality is a perennially cheerful
disposition (possibly enhanced by short-term memory loss, since she never retains any
recollection of her numerous physical disabilities).
Over the past six years ME had been hospitalized twice for pneumonia, three times for falls, and once for a stroke. A fall that resulted in several broken ribs, giving rise to a 4-month-long convalescence, has probably contributed greater disability than her 2014 stroke. ME has been ordered to use a wheelchair, though she never misses an opportunity to stand in place. The stroke did not cause noticeable cognitive decline, but a concomitant advance of distal arthritis, particularly affecting the knuckles of her left hand, has sometimes interfered with her ability to play as fast as she would like. The combination of the wheelchair (which distances her from the keyboard) and decreasing manual dexterity has not discouraged her from playing. Her determination and optimism compensate.

Common methods of subject investigation (including fMRI) are deemed inappropriate for a person of ME's age. Her medical doctor stands firm in his opposition to tests (mental or physical) that "do not benefit a patient's condition."

ME occasionally acknowledges her loss of short-term memory with a wry comment. Asked recently whether she heard the noise from one of Northern California's rare thunderstorms the night before, she answered "You know I wouldn't know the answer to a question like that."

She clearly has an emotional response to music but it is not definably autobiographical. A lack of living witnesses to her early life is an acknowledged drawback to answering some questions about her past. Her colleague MC occasionally chides her that a song she picks is "too old" for him. (MC is 12-15 years ME's junior.)

Music appears to enhance ME's cognizance of her present circumstances. She likes to accompany one singer she met on a visit to California 30 years ago. Yet she is never sure whether the place in which she is currently playing is her home or not. To find her room she relies on the nameplate on the door. She does not know whether she shares the room (she does). At the same time her uptake of new musical material on the strength of one or two hearings is still remarkable. Her daughter-in-law is responsible for exposing her to new material.

IV. PERFORMING QUALITIES

In recordings made by ME's family and friends in recent years, audio quality varies from venue to venue. All recordings contain background noise (conversation, requests, staff instruction, and in one venue the cheerful contributions of resident canaries). MC's clarinet and sundry singers are captured in combination with ME's piano playing in several instances. MC's participation cannot be anticipated. He sometimes attends ME's performances without participating.

We have used the Disklavier recordings from December 2009 [1] and their corresponding MIDI files to compare with subsequent performances of the same works, although with 400 pieces to choose from the same titles do not reappear rapidly. Because of her physical limitations, it has not been feasible to bring ME to the Disklavier in the intervening years.

Only one piece has been fully transcribed from a performance. It shows her own composition, a song called "Give me a rainbow" (September 2005). Its lyrics were written by a then nonagenarian friend of ME's. Her son's transcription defines the status of her harmonic vocabulary then and reveals elements of her rhythmic and harmonic practice in ways that audio resources cannot make clear.

A. Harmonic scope

One of the outstanding features of ME's playing is the rich harmonic vocabulary she utilizes. It establishes that her chord usage is more dependent on jazz and blues idioms than on harmonic practices based on European art music. The melody out of context would suggest many fewer chords than she actually uses (TABLE II). Among these, various types of tonic chords are particularly numerous. Heavy emphasis is placed on various mutations of degrees I and II. The treatment of chords based on degrees IV, V, and VI is similar. Seventh chords with various alterations are plentiful. A few sixth chords and altered roots also occur. (Alternative spellings are possible in some cases.) No mediant root (III) is ever employed, and only a few instances of the VII root occur.

TABLE II
CHORD TYPES IN ME'S 2005 COMPOSITION

ROOT C      ROOT D      ROOT F      ROOT G      ROOT A      ROOT B
I maj       II maj      IV maj      V maj       VI maj
I min       II min                              VI min
I dim       II dim
I 7                     IV 7        V 7         VI 7
I b7                    IV b7       V b7        VI b7
I #3/b7     II #3/f7    IV #3/b7    V #3/n7
            II #3/#7
                                                            VII dim/b7
I #5
I #5/b7
            II #5/n7
                                                VI dim7

Roman numerals indicate conventional chord degrees.

B. Metrical and rhythmic features

ME's performances are full of crisp rhythms. In successive iterations of a single piece she uses unexpected syncopations, unexpected rests, and staccato passages for variety. In multiple performances of the same pieces her tempos are not identical.

C. ME with and without other musicians

In accompanying other performers, such as the clarinetist MC, ME's ability to emphasize the harmonic framework and omit the melody (entirely by ear) seems to be instinctive. She transitions in and out of accompaniment mode without the slightest hesitation. She does not remember (unless reminded) that clarinetists prefer to play in Bb. The usual result of this lapse is that MC tends to drop out of modulatory B sections. Because of breathing difficulties, he rarely plays for more than one verse. ME in consequence is also inclined to stop after one verse when playing with MC.

In her best form ME plays three verses of a piece. No two are played the same way. Of the second and third, one is usually in a high register, the other in a low one. There is no set order. She employs a wide range of articulatory methods to differentiate each verse. Her arrangements vary somewhat by musical genre. ME has a clear sense of what is appropriate where.
When she has an engaged audience (something she can assess accurately) she plays more pieces, and plays them with more variations than she would otherwise do. Her programs are intended to last 30 minutes, but the number of works and their duration are reciprocal. If playing with fewer repetitions, she plays more works. The maximum number of pieces in her programs is 30. The average length of a single work is 1'40". Six years ago it was well above 2 minutes, but fewer titles appeared on her programs.

D. Non-imitative contrapuntal interpolations

A familiar feature of ME's playing is her filling in of intermediate cadences with melodic interjections. It is difficult to characterize them verbally, but collectively they suggest an instinctive avoidance of static harmonies and drab melodies. Considering the degree to which she emphasizes accents (evident in the spikes on sonograms) and her frequent use of sudden rests to dramatize the approach of a cadence, we can suggest that she is highly skilled at eliciting the attention of listeners. Her elderly listeners are attentive. Feet tap on wheelchair footrests even while heads droop. Fingers tap on armrests, or in the severely impaired, eyelids may flicker in time to the music. Somnolent residents occasionally burst forth in song as they suddenly recognize an old favorite. There may be valuable lessons here beyond a reckoning of ME's skills.

COGNITIVE ASPECTS OF SKILL RETENTION

At 101 ME does not play with the same vigor as she did at 75. Being the consummate musician that she is, ME has developed her own methods to compensate for her disabilities. Yet over the six-year period in which we have observed her, most aspects of her playing are little changed.

She consistently retains three features: (1) a strong, steady beat, (2) an instinctive differentiation of musical foreground and background, and (3) an excellent sense of musical structure. The third may be the key to her confidence in playing for others. Additionally, she is not distracted by external auditory stimuli, including her own occasional errors.

Her single composition is in 3/4. An anticipatory introduction (which was a common feature in sheet music up to the 1960s) leads to a substantial piece in AABA form. A coda (echoing the final line) is appended in a higher register. Phrase structure is dictated by the underlying verse, which is slightly irregular. The A section consists of four 6-bar couplets, the B section (really B-B1) of two 8-bar lines. A four-octave chromatic glissando (G2-G6) serves as a cadenza leading to the recapitulation. The total length is 110 bars.

In comparison to performances of the same piece over the past decade, this schematic detail silhouettes ME's gradual drift towards simplification of detail. Basic architecture, meter, and accent are never altered. In recent performances of this piece the introduction, cadenza, and coda have usually been omitted. When ME is tired she may abbreviate the form to AABA (excluding the introduction and coda) or even AA. When her fingers do not obey her ear, she is learning to make a virtue of necessity. We may hear tone clusters where chromatic runs used to be, but the clusters always fall on the beat and most listeners do not notice the difference.

V. DISCUSSION

Playing by ear is widely considered to depend on musical memory, but the individual skills it requires are substantially different from those required to memorize music learned from a notated source. Recognizing this distinction is essential to appreciating the elements of ME's performances that are really special.

Playing by ear relies on a sense of structure with which ME is well endowed. It is used in combination with extensive (previously stored) lexicons of melodic and harmonic knowledge, which themselves interact with other processes that range from the general to the specific. The interplay of cognitive skills that operate along this continuum in live performance relies heavily on the synchronization of musical learning, cognitive schemata, and motor memory. Each aspect must accommodate aural stimuli as they occur. Playing by ear relies more nearly on improvisation skills than on memorization. The governing of these processes could well lie within the province of executive function.

One model that may be concordant with the cognitive model we describe in ME comes from recent work by Omar [3], [4]. The two studies represent a total of three subjects. One has no prior musical training. Another is almost 50 years younger than ME. Unlike ME, these three subjects have semantic dementia. They retain knowledge of musical "objects" and "knowledge" but lack several other skills well preserved in ME. We could say that their losses are the inverse of ME's strengths.

ME plays without premeditation. (When asked, ME likes to say that what she does is easy. She finds the starting note and her fingers do all the rest.) Her insistence that she has never read music is contradicted by her ability to sight-read choral music, photographic evidence of her participation in an orchestra, and two degrees in music. In contrast to the scholarly literature on music and dementia, anecdotal evidence of other accomplished musicians who seem to function somewhat like ME at ages above 90 is accumulating in newspapers and online videos.

One area of current research that may be worth evaluating in relation to studies of the preservation of musical function can be found in the cognitively enhanced music-theoretic literature such as the works of Temperley [5]-[6]. Another fruitful area for reverse "engineering" may exist in the explorations of audio segmentation in a series of papers by Pearce, Müllensiefen, and Wiggins [7]-[8]. They compare statistical and rule-based approaches to segmentation, such that in their melodic grouping formulae the critical features are rests, the length of attack points, changes of register, and changes of phrase length. Features 1, 3, and 4 figure prominently in ME's methods of structural articulation.

VI. THERAPEUTIC IMPLICATIONS

Therapeutic investigations bring further perspectives that may be relevant. Reference [9] found that giving older subjects four months of piano instruction led to improvements in
executive function, visual scanning, and motor ability. This followed studies [10]-[11] that concentrated on the value of piano instruction for executive function in adults aged 60-85. Looking at a slightly different aspect of music in the presence of dementia, Ren and Luo [12] found positive benefits in cognitive processing speed and working memory but not in fluid intelligence in elderly subjects. Playing by ear requires fluid application of learned principles, while reading from a score does not. However, reading fluently from a score relies on fluid recall of kinesthetic memory. In the Community of Voices project [13] amateur singers are recruited from senior centers in order to evaluate the benefits of musical participation without prior training.

A paucity of studies of multi-dimensional skill coordination in aged populations leaves much room for speculation and future research. Subjects such as ME provide unique case histories. At all stages of her life ME has to have been an exceptionally observant listener and a quick-witted performer. Given the near impossibility of finding matches for subjects beyond the age of 90, individual case studies seem to offer the only practical way into the terra incognita of musical skill retention at advanced ages.

ACKNOWLEDGMENTS

ME's family (unnamed here to protect ME's anonymity) has provided generous access and audio support. Craig Sapp has contributed valuable technical assistance. Suggestions by D. Gordon Bower, K. Anders Ericsson, Caroline Palmer, Petr Janata, Julene Johnson, Dang Vu-Phan, and Anthony Wagner have helped to steer my thinking in productive directions.

REFERENCES

[1] E. Selfridge-Field, "Playing by ear at 95: A case study," ICMPC 11 Abstracts (Seattle, 2011), ed. S. M. Demorest, S. J. Morrison, and P. S. Campbell, p. 91.
[2] Y. Hu and K. A. Ericsson, "Memorization and recall of very long lists accounted for within the Long-Term Working Memory framework," Cognitive Psychology, vol. 64 (2012), pp. 235-266.
[3] J. C. Hailstone, R. Omar, and J. D. Warren, "Relatively preserved knowledge in semantic dementia," J. Neurol. Neurosurg. Psychiatry, vol. 80/7 (2009), pp. 808-809.
[4] R. Omar, J. C. Hailstone, J. E. Warren, S. J. Crutch, and J. D. Warren, "The cognitive organization of music knowledge: a clinical analysis," Brain, vol. 10 (2010), pp. 1200-1213. doi:10.1093/brain/awp345.
[5] D. Temperley, The Cognition of Basic Musical Structures, Cambridge: MIT Press, 2001.
[6] D. Temperley, Music and Probability, Cambridge: MIT Press, 2007.
[7] M. T. Pearce, D. Müllensiefen, and G. A. Wiggins, "A comparison of statistical and rule-based models of melodic segmentation," ISMIR 2008 Proceedings, ed. J. P. Bello, E. Chew, and D. Turnbull, http://www.mirlab.org/conference_papers/International_Conference/ISMIR%202008/papers/ISMIR2008_228.pdf.
[8] M. T. Pearce, D. Müllensiefen, and G. A. Wiggins, "Melodic grouping in music information retrieval: new methods and applications," Advances in Music Information Retrieval, ed. Z. W. Raś and A. A. Wieczorkowska (Berlin, 2010), pp. 364-389.
[9] S. Seinfeld, H. Figueroa, J. Ortiz-Gil, and M. V. Sanchez-Vives, "Effects of music learning and piano practice on cognitive function, mood, and quality of life in older adults," Frontiers in Psychology, vol. 4 (Nov. 1, 2013). doi:10.3389/fpsyg.2013.00810.
[10] J. A. Bugos, "The effects of individualized piano instruction on executive functions in older adults (ages 60-85)," U. Florida, 2004.
[11] J. A. Bugos, W. M. Perlstein, C. S. McCrae, T. S. Brophy, and P. H. Bedenbaugh, "Individualized piano instruction enhances executive function and working memory in older adults," Aging & Mental Health, vol. 11/4 (Jul 2007), pp. 464-471.
[12] J. Ren and X. Luo, "Effect on cognitive processing speed, working memory, and fluid intelligence by learning to play the piano in senior years," Chinese Journal of Clinical Psychology, vol. 17/4 (Aug 25, 2009), pp. 396-399.
[13] J. K. Johnson, A. Nápoles, A. Stewart, and W. Max, http://communityofvoices.org/home/meet-the-team/meet-the-team-university-of-california-san-francisco/
When learning a song aurally, verbal information (i.e., text) and musical information
(e.g., pitches and rhythm) are perceived through the aural channel. However, if a song
has much verbal and/or musical information, only using the aural channel might hinder
one’s ability to learn the song. This study assumed the cognitive load imposed on the
aural channel could be reduced if the verbal information is processed in both visual and
aural channels by seeing the lyrics. This reduction might provide more capacity to
process musical information, leading to better recall of the learned song. Therefore, this study aimed to investigate the effect of visually presented lyrics on song recall while controlling for individual differences in phonological working memory. In addition, the
effects of perceived task difficulty, preferred learning modality, and music major were
investigated. Members of auditioned choirs were invited to participate. While aurally
learning an unfamiliar song, one group saw the lyrics and another group did not. After
one individual instructional session, each participant sang the learned song from memory
and their singing was recorded. Additionally, participants took a phonological working
memory test, a perceived task difficulty report, and a learning style inventory. The recall
accuracy of their singing was rated by two experienced music teachers in terms of lyrics,
pitches, and rhythm. In the two-way (Group × Major) MANCOVA with phonological working memory as a covariate, a main effect of major and an interaction between group and major were revealed. Seeing lyrics while learning a song was helpful for non-music
majors in recalling pitches and rhythm, but not helpful for music majors. For music
majors, seeing lyrics might be redundant, which is congruent with the expertise reversal
effect. They may process verbal and musical information in a more integrated way than
non-music majors.
ISBN 1-876346-65-5 © ICMPC14 223 ICMPC14, July 5–9, 2016, San Francisco, USA
and audience-listening activities", depending on whether the participants were from Group 1 or Group 2.

Following participation in the second condition, the second autobiographical interview and distracter activity were carried out. The third song indicated by the participants was used in the "activity with clay and background music" or "composing, performing and audience-listening activities", depending on whether the participants were from Group 1 or Group 2. The third interview was carried out after the third condition, and the Positive and Negative Affect Schedule (PANAS) was applied to verify their mood.

D. Autobiographical Interview

The autobiographical interview (Levine et al., 2002) quantifies the content of autobiographical memories. There are previously defined categories, called details, which can be internal, if they are directly related to the main event, or external, if they are more general memories not related to the main event.

Internal details may be an event (unfolding of the story, including who was there, reactions/emotions in others, time, one's clothing, and the physical state and actions of others), place (country, city, street, apartment, rooms and location within a room), time (year, season, month, date, day of the week, time of day or clock time), perception (auditory, olfactory, tactile/pain, taste, visual, spatial-temporal) and emotion/thought (feeling states, thought, opinion, expectation or belief).

The external details are semantic (general knowledge or facts), repetitions (do not add new information) and other details (metacognitive statements, editorializing and inferences). The scoring for the detail categories is awarded according to the emission frequency and wealth of content. In addition to the detail categories, the Autobiographical Memory Interview - AMI (specific memory in time and place), time integration (memory of the event within a wider time scale) and episodic richness (impression of experiencing the event again) categories were also included.

Scoring is only attributed for these categories if there is a specific event. The autobiographical interview is held in two parts: a general and a specific investigation. In the general investigation, the interviewer invites the participant to recount an autobiographical memory, then asks if the participant wishes to say anything else. In the specific investigation, verbal clues are given in order to add further details to the participant's memory. Only the specific investigation of the autobiographical interview was used, to preserve the involuntary nature of the MEAMs (El Haj, Fasotti, & Allain, 2012) and also on account of the overgenerality effect (Ford, Rubin, & Giovanello, 2014).

E. Data analysis

The statistical comparisons between Groups 1 and 2 were carried out according to the logic of intra- and inter-group comparisons. A test of normality was performed, using the Kolmogorov-Smirnov test. As the distributions were not normal, it was decided to adopt nonparametric tests: chi-squared, to compare categorical data; Mann-Whitney for the comparisons between Groups 1 and 2 regarding the scores obtained; and Friedman for comparisons between the distributions of each group in the three measurements used (dependent samples).

Firstly, the quality and quantity of the emissions from the elderly people in each of the groups in the three measurement situations were investigated. Then the groups were compared regarding the cognitive evaluation data (MMSE) and two affectivity evaluations (GDS and PANAS). The Statistical Analysis System (SAS) for Windows program, version 9.2, SAS Institute Inc, 2002-2008, Cary, NC, USA, was used for the statistical analysis.

III. RESULTS

The emission of details related to internal and external events related by the two groups in the three situations to evaluate autobiographical memories (condition 1 – passive listening to music, condition 2 – an activity with clay and background music, and condition 3 – composing, performing and audience-listening activities for Group 1; condition 1 – passive listening to music, condition 3 – composing, performing and audience-listening activities, and condition 2 – an activity with clay and background music for Group 2) was counted.

The emission frequency was calculated as per the categories and sub-categories established (Levine et al., 2002). The "Other" category was excluded because it included emissions which did not represent autobiographical memories and thus were not of interest to this study. The One Sample Kolmogorov-Smirnov test, applied to these data, revealed that the response frequency for condition 3 was statistically different from those observed in conditions 1 and 2, whereas the frequencies observed in conditions 1 and 2 did not differ in a statistically significant manner (p < 0.001).

Intra-group comparisons were performed in relation to the performances observed in the three conditions to evaluate the contents of the autobiographical interviews. In Group 1, a statistically significant increase was observed in the scores for the AMI category from condition 2 to condition 3; for the episodic richness category from condition 1 to condition 3; and for the total evaluation between conditions 1 and 3.

In Group 2, there was an improvement in the scoring from condition 1 to condition 3 in the total emotion/thought, internal emotion/internal thought, AMI, episodic richness and total evaluation categories. Similarly, an improvement in the scoring in these categories between conditions 2 and 3 was also noted. However, there was a reduction in the score in the integrated time category between conditions 1 and 2, but an increased score between conditions 2 and 3.

IV. DISCUSSION

Condition 3 "composing, performing and audience-listening activities" had the potential to evoke a higher number of autobiographical memories than condition 1 "passive listening to music" and condition 2 "activity with clay and background music". This demonstrates the importance of direct involvement with music, and not only passive listening, for increasing autobiographical memories. It also shows that even the execution of another creative artistic activity, such as the activity with clay and background music, did not evoke more autobiographical memories than making music. Composing, performing and audience-listening activities are the
fundamental processes of music and, together with the fundamental possibilities of direct involvement with music, represent the basic categories of musical behavior (França & Swanwick, 2002).

In Group 1, the increased AMI classification following condition 3 when compared with condition 2 showed that the participants detailed more specific autobiographical memories in time and place and not more general memories. Although the elderly people recounted more general memories, direct involvement with music through composing, performing and audience-listening activities favoured evoking more specific memories due to the fact that making music was related to identity and emotion (Creech et al., 2013; Dabback & Smith, 2012; Hallam, 2010). Identity and emotion are also related to autobiographical memory, which is fundamental for the self, emotions and personal experiences (Conway & Pleydell-Pearce, 2000). In Group 2, in addition to the increased AMI classification in condition 3 when compared with condition 2, there was also an increase in condition 3 when compared to condition 1, which demonstrated that the content of the autobiographical memories remained more general in condition 1. It was concluded that there is greater specificity of autobiographical memory in time and place when the subjects are involved in composing, performing and audience-listening activities when compared to passive listening to music and an activity with clay and background music.

In Group 1, the episodic richness was higher after condition 3, when compared to condition 1. Evoking more general memories after passive listening to music did not bring a feeling of experiencing the past, as occurred following composing, performing and audience-listening activities. In Group 2, the episodic richness was greater in condition 3, when compared to the other categories; that is, participation in condition 3 before condition 2 evoked a greater sensation of going back to a specific moment in time and place from the centrality of making music. It is observed that there is a greater intensity of evoking autobiographical memory when the person is involved in making music, when compared to passive listening to music and an activity with clay and background music. Direct contact with music makes the experience of evoking memories more intensive.

The order of participating in conditions 2 and 3 (Group 2) showed differences in the content of the total emotion/thought, internal emotion/thought and integrated time categories. The participants evoked a richer emotional content in condition 3 than in conditions 1 and 2. In addition, a stronger integration of remembering an episodic event within a wider timescale occurred with the Group 2 participants; that is, there was a more detailed description of contextual information which took place before or after the recollected event. The fact that these results only took place in Group 2 demonstrates that the Group 1 participants did not have increased emotional content or integrated time in condition 3 because they were previously involved with condition 2. It can be inferred that the other artistic activity carried out before composing, performing and audience-listening activities could minimize the potential of making music to evoke a higher emotional and integrated time content.

In Groups 1 and 2, the total evaluation of the content of the autobiographical memories was higher following condition 3 than conditions 1 and 2. The sum total of the scoring attributed by the content of the details of place, time, perception and emotion/thought, integrated time, episodic richness and episodic repetition (total score) expresses the wealth of content of autobiographical memories, which was greater following composing, performing and audience-listening activities.

These results, together with those of a higher frequency of emissions of autobiographical memories following condition 3, when compared to conditions 1 and 2 in both groups, support our hypothesis that direct involvement with music through composing, performing and audience-listening activities increases the content of autobiographical memories. Autobiographical memories increased both in quantity (emission frequency) and quality (memory content) following direct involvement with music through making music.

ACKNOWLEDGMENT

This study was conducted as part of the first author's Doctorate program in Music, held at the School of Music of the Federal University of Bahia (UFBA), Brazil, and with support from the National Council for Scientific and Technological Development (CNPq) – Brazil.

REFERENCES

Belfi, A. M., Karlan, B., & Tranel, D. (2015). Music evokes vivid autobiographical memories. Memory, 10, 1-11.
Conway, M. A., & Pleydell-Pearce, C. W. (2000). The construction of autobiographical memories in the self-memory system. Psychological Review, 107(2), 261-288.
Creech, A., Hallam, S., Gaunt, H., Pincas, A., McQueen, H., & Varvarigou, M. (2013). The power of music in the lives of older adults. Research Studies in Music Education, 35(1), 83-98.
Dabback, W. M., & Smith, D. S. (2012). Elders and music: empowering learning, valuing life experience, and considering the needs of aging adult learners. In: McPherson, G. E., & Welch, G. F. (Eds.), The Oxford Handbook of Music Education, vol. II. New York: Oxford University Press, 229-242.
El Haj, M., Fasotti, L., & Allain, P. (2012). The involuntary nature of music-evoked autobiographical memories in Alzheimer's disease. Consciousness and Cognition, 21, 238-246.
Ford, J. H., Rubin, D. C., & Giovanello, K. S. (2014). Effects of task instruction on autobiographical memory specificity in young and older adults. Memory, 22(6).
França, C. C., & Swanwick, K. (2002). Composing, audience-listening and performance in music education: theory, research and practice. Em Pauta, 13(21), 5-41.
Hallam, S. (2010). Music education: the role of affect. In: Juslin, P. N., & Sloboda, J. (Eds.), Music and Emotion: Theory, Research, Applications. New York: Oxford University Press, 791-817.
Janata, P., Tomic, S. T., & Rakowski, S. K. (2007). Characterisation of music-evoked autobiographical memories. Memory, 15(8), 845-860.
Levine, B., Svoboda, E., Hay, J. F., & Winocur, G. (2002). Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17(4), 677-689.
Platz, F., Kopiez, R., Hasselhorn, J., & Wolf, A. (2015). The impact of song-specific age and affective qualities of popular songs on music-evoked autobiographical memories (MEAMs). Musicae Scientiae, 19(4), 327-349.
Schulkind, M. D., Hennis, L. K., & Rubin, D. C. (1999). Music, emotion, and autobiographical memory: They're playing your song. Memory & Cognition, 27(6), 948-955.
Studies of verbal memory suggest that the presence of a regular auditory beat aids
learning and memory. We investigate facilitation from an auditory beat for working
memory for actions. On each trial of a recognition task, four whole body actions were
presented as a study set, followed immediately by a test set of four actions. The task was
to recognize whether the test set actions were the same or different from the study set.
Half of the trials were "different" trials, in which a single action was replaced by another drawn from a set of eight actions.
The presence of an auditory beat coinciding in time with presentation of the visual action
stimulus was manipulated across study and test sets. The crossing of the auditory beat
being either present or absent across study and test produced four conditions: beat study-
beat test; silent study-silent test; beat study-silent test; silent study-beat test. A fifth
condition was included, to decouple visual and auditory entrainment, wherein the visual
action and auditory beat were mismatched in time. It was hypothesized that i) recognition
is greater when an auditory cue is synchronized with the action items during study than
when an auditory cue is mismatched or absent; and ii) relatively high accuracy is elicited
when study and test conditions match (“encoding specificity”). Thirty participants
completed 100 randomised trials. The first hypothesis was supported with significantly
greater accuracy recorded in the beat study-beat test condition (d’=1.15) than in the beat
study-silent test condition (d’=0.95). Poorest accuracy, as hypothesized, was recorded in
the mismatched condition (d’=0.83). Relatively good performance when silent study and
test conditions matched, supported the hypothesis (d’=0.94). Facilitation in accuracy did
not manifest in reaction time; there was no speed-accuracy trade-off. Results are
interpreted using Dynamic Attending Theory: the regular auditory cue heightens arousal
and accuracy whereas response timing is locked to the cue.
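The accuracy values above are d′ (sensitivity) scores from signal detection theory. A minimal sketch of how d′ is computed from same/different recognition counts is given below; the function, correction method, and example counts are illustrative assumptions, not the authors' analysis code.

```python
# Sketch: computing d' (sensitivity) for a same/different recognition task.
# A log-linear correction keeps hit/false-alarm rates of 0 or 1 from
# producing infinite z-scores; the example counts are invented.
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate)."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Example: 40 'different' trials (30 hits) and 40 'same' trials (12 false alarms)
print(round(d_prime(30, 10, 12, 28), 2))
```

Higher hit rates at a fixed false-alarm rate yield larger d′, which is why d′ rather than raw percent correct is used to compare the beat and silent conditions.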
ISBN 1-876346-65-5 © ICMPC14 228 ICMPC14, July 5–9, 2016, San Francisco, USA
How do you remember all the words? Singing along facilitates memory
for song lyrics
Lucy M. McGarry1, Adrian M. Owen1, Jessica A. Grahn1
1
The Brain and Mind Institute, Department of Psychology, University of Western Ontario, Canada
Background
Many individuals have an exceptional capacity to remember the words to numerous
songs from the past. Several factors may enhance memory for song lyrics versus memory
for other spoken items such as speeches or poetry; these include repeated listening,
liking, and the presence of melody to assist with encoding of words. Individuals also tend
to interact with music by singing or dancing in synchrony, and this could add an
additional source of motoric encoding.
Aims
Here we examine whether singing during encoding enhances subsequent lyric memory;
specifically, whether singing along vs. speaking along or simply listening quietly to song
facilitates memory for song lyrics. We also examined context (a mixture of sung/listened
trials or spoken/listened trials versus blocks of sung-only, spoken-only, or listened-only
trials).
Methods
Sung and spoken sentences were presented audio-visually. Participants produced or
listened to each sentence in a ‘mixed’ list (produced and listened) or ‘blocked’ list
(produced only or listened only), followed by free recall tests.
Results
We found significant ‘production effects’ for song and speech. As in prior research, this
difference occurred only in a mixed list context, implicating distinctiveness as a factor in
the production effect.
Conclusions
Singing along to music improved word recall, as did speaking. Enhanced long-term
memory for lyrics may partially be due to the tendency to sing in synchrony with the
piece. It is likely people sing along with song more often than they recite along with
speech in everyday life, which helps to explain findings of enhanced long-term memory
for lyrics versus speech even though both have the capability to enhance memory.
Previous studies have shown differences between musicians and non-musicians in visuo-
spatial memory, verbal memory and auditory working memory. The research presented
aimed to explore possible differences between musicians, delineated by the multi-tasking
requirements of the instrument they play. Participants were musicians (N=32) divided
equally into four categories: singers, brass players, woodwind/string players and finally
organists, percussionists and conductors. Participants were asked to complete a
questionnaire and undertake three memory tests: digit-span, Corsi block and a newly
designed music-span test in both forwards and backwards conditions. The music-span
consisted of atonal note rows selected from a totally chromatic sequence of notes with no
reference to a diatonic framework. These increased in length following the format of the
digit-span test. The results indicated that singers were strongest across the forwards tests.
Multi-tasking musicians (organists, percussionists and conductors) performed best in the forwards and backwards digit-span tasks, but not in the spatial test.
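The music-span test described above follows the digit-span format: note rows drawn from the full chromatic set, increasing in length trial by trial. A hedged sketch of generating such items is below; the row lengths, sampling scheme, and function names are assumptions for illustration, not the authors' stimulus-construction procedure.

```python
# Illustrative sketch: span-test items as atonal note rows sampled from the
# chromatic set, increasing in length as in the digit-span format.
import random

CHROMATIC = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def make_note_row(length, rng):
    """Sample a row of distinct chromatic pitch classes (no diatonic bias)."""
    return rng.sample(CHROMATIC, length)

def make_span_trials(min_len=3, max_len=9, seed=1):
    """One trial per length, from min_len up to max_len notes."""
    rng = random.Random(seed)
    return [make_note_row(n, rng) for n in range(min_len, max_len + 1)]

for row in make_span_trials():
    print(' '.join(row))
```

Sampling without replacement from all twelve pitch classes avoids privileging any diatonic framework, matching the test's stated design.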
Better late than never (or early): Late music lessons confer advantages
in decision-making
Kirsten E. Smayda1, W. Todd Maddox1, Darrell A. Worthy2, Bharath Chandrasekaran1,3
1
Psychology Department, The University of Texas at Austin, Austin, TX, USA
2
Psychology Department, Texas A & M, College Station, TX, USA
3
Communication Sciences and Disorders, The University of Texas at Austin, Austin, TX, USA
Music training early in childhood, relative to later, confers cross-domain sensory and
motor advantages. Both the sensory and motor systems mature early in life, and training
during this critical period in development, relative to later training, is suggested to confer
a greater benefit on the sensory and motor systems of the brain that persist past childhood.
Here we ask whether late music training enhances decisional processes, given that brain
regions that mediate cognitive and decisional processes show a more protracted
maturation into adolescence. We recruited young adult musicians who began music
training before the age of eight (early-trained; ET), after the age of eight (late-trained;
LT) and non-musicians (NM) to perform a classic decision-making task, the Iowa
Gambling Task. We found that late-trained musicians showed an advantage in decision-
making relative to both early-trained musicians and non-musicians. To explore possible
mechanisms of the late-musician performance advantage, we conducted a number of
individual differences analyses between and within groups. Results suggest no group
differences in measures of working memory, inhibition, and short-term memory. Lastly,
we found that working memory measures were predictive of performance in the Iowa
Gambling Task in only the early musician group, suggesting differential mechanisms for
decision-making depending on when an individual began playing music. These results
argue for a potential benefit of music instruction later in childhood on critical decisional
processes.
Background
Research has demonstrated that older adults who sing or play music have enhanced
cognitive functioning, but most studies focused on persons of higher socioeconomic
status.
Aims
This study examines the link between music and cognition in a diverse sample of older
adults and whether propounded benefits vary by socioeconomic status.
Methods
Data was extracted from the 2014 Health and Retirement Study (HRS), designed to
represent the general population of Americans over 50 (mean age=68). Analyses
compared older adults who sing or play music with those who do not (n=1,496) on verbal
memory (immediate and delayed word recall), executive function (serial 7’s, numeracy,
backwards count), and semantic memory (object, person naming). Multivariate regression
models tested whether potential benefits of music on cognition vary by socioeconomic
attributes including education, income and assets, race/ethnicity, sex, and age. Sample
weights were applied to correct for nonresponse bias, design effects, panel attrition and
mortality.
Results
Older adults who sing or play music performed better than those who do not make music
on tests of verbal memory (p < 0.001), but the two groups did not differ in performance
on executive function or semantic memory tests. The positive effect of music making on
verbal memory remained robust (p < 0.001) even after controlling for socioeconomic
variables. A significant interaction with education (p < 0.05) implied stronger effects of
music making on verbal memory among older adults with more years of education than
those with fewer years. A marginal interaction with age (p < 0.10) suggested that
potential benefits of music making on verbal memory may weaken with advancing age.
Effects of music on cognition did not vary by income and assets, race/ethnicity, or sex.
Conclusion
Results suggest that music making is related to superior verbal memory among older
adults, particularly among those with more years of education.
Musical participation has many benefits, including improved health, reduced stress, and
social connectedness, that have been linked to successful aging. The purpose of this study
was to investigate perceptions of a 10-week piano-based music and wellness group
program designed for adults age 50 and older. Specifically, researchers examined which
program aspects participants perceived were beneficial, needed improvement, and
impacted their quality of life and emotional well-being. This study was a retrospective
review of data gathered from participants (N = 26) who took part in at least one 10-week
piano-based music and wellness group. Measures included a feedback form and quality
of life questionnaire. Responses to Likert-type scale questions were analyzed using
descriptive statistics. Written comments were analyzed using Linguistic Inquiry and
Word Count (LIWC) software and content analysis. Results indicated that participants
enjoyed the group learning environment, gained both cognitive and stress reduction
benefits, and were highly likely to continue to play the piano. Suggested improvements
included increasing class offerings, increasing access to pianos for practice, and
providing email/social media notifications. Participants reported high levels of quality of
life, particularly as it relates to independence, health, and support from family, friends, or
neighbors. Linguistic analysis of comments revealed higher percentages of positive
emotion words, social words, and leisure words compared to negative emotion words. A
content analysis of written comments revealed reasons for joining the program, including
stress relief, self-satisfaction, cognitive stimulation, love of music, mental health benefits,
personal enjoyment, and coping skills. This study provides information about a
recreational music-making program specifically designed for older adults which can be
used to develop music programs that encourage lifelong musical participation and
successful aging.
Abstract— Music is the art of arranging sounds in time so as to produce a continuous, unified and evocative composition through melody, harmony and rhythm. The music of India includes multiple varieties of folk music, pop and classical music. Indian classical music is one of the oldest forms of music in the world. It has its roots in diverse areas such as the ancient religion of the Vedic hymns, tribal chants, devotional temple music, and folk music. Indian classical music is a raga system. Vocal aging is a gradual process, with several vocal changes occurring at different stages of an adult's life. With aging, the larynx undergoes anatomical and physiological changes that have an impact on vocal quality. The aim of the study was to compare the age-related changes in intensity, frequency and HNR among trained professional Carnatic singers. 31 trained female Carnatic singers in the age range of 13 to 60 years, divided into 3 groups (below 20 years, 20-40 years and 40-60 years), were considered for the study. Voice recording was carried out in a quiet environment using a digital voice recorder placed at a distance of 6 cm from the singer's mouth. Subjects were asked to sing "Shankarabharanam". The ragas were recited in the form of "alap" (phonating) at the comfortable level used by them. Parameters considered for the study were mean frequency and intensity, minimum frequency and intensity, maximum frequency and intensity, frequency and intensity range, and HNR. Acoustic analysis was carried out using Dr. Speech version 4 software. A paired sample t test was applied to compare the parameters of the various age groups using SPSS 17 software. The mean and standard deviation values obtained were compared across the three groups. Mean frequency, minimum frequency and maximum frequency were found to be higher in Group II; mean intensity, maximum intensity, intensity range and HNR were higher in Group I. Although the literature shows changes in voice with ageing, no significant difference was observed in frequency, intensity or HNR. It was statistically evident from the present study that ageing has very minimal effect on trained professional singers. No statistically significant difference was observed among the three age groups. This could be attributed to their years of practice, vocal warm-ups, habits, vocal hygiene and similar factors.
Keywords— Aalap, Acoustic parameters of voice, Carnatic music, ageing, professional singers.
T. Akshintala is with the Institute of Speech and Audiology as a Clinical Supervisor; Sweekaar Academy of Rehabilitation Sciences; Secunderabad; Osmania University, INDIA (Phone: +91 9491353712; E-mail: akshintalathomas@gmail.com)
R. Pai is an Associate Professor at the Institute of Speech and Audiology; Sweekaar Academy of Rehabilitation Sciences; Secunderabad; Osmania University, INDIA (E-mail: dessairashmi@gmail.com)
G. S. Patil is with Ali Yaver Jung National Institute for Hearing Handicapped, SRC, Secunderabad; Osmania University, INDIA (E-mail: gourishanker07@rediffmail.com)
I. INTRODUCTION
Voice can be defined as laryngeal modification of the pulmonary air stream and requires systemic coordination between the respiratory, phonatory, resonatory and articulatory apparatus. Aging is a complex process of biological events that changes the structure and function of various parts of the body. Vocal aging is a gradual process, with several vocal changes occurring at different stages of an adult's life. With aging, the larynx undergoes anatomical and physiological changes that have an impact on vocal quality (Linville, 2001).
E. T. Stathoulas, J. E. Huber and J. E. Sussman (2001) carried out a study investigating changes in acoustic characteristics of voice across the life span among 192 individuals. Fundamental frequency (F0) and signal-to-noise ratio (SNR) showed gender differences across the life span, unlike SPL. Variability of F0, sound pressure level (SPL) and SNR followed a nonlinear trend, which was found to be higher in the younger than in the older age group.
E. D'haeseleer et al., in their study, measured and described the effects of aging on middle-aged premenopausal women. Both subjective and objective tests of vocal assessment were used. Results of the study showed reduced fundamental frequency, a more restricted vocal frequency range, lower minimum fundamental frequency, reduced intensity range, and reduced maximum intensity levels among middle-aged premenopausal women when compared to young women.
Singing is the product of a delicate balancing of physiological control, artistry, and technique (Teachey, Kahane & Beckford, 1991). The uniqueness of Indian classical music is in its melodic structure, which is based on the "Raga" system. The Indian classical musical system may be broadly classified into Carnatic and Hindustani systems. The scale of the Carnatic music system comprises seven tones or swaras – sa, ri, ga, ma, pa, da, and ni. Such a scale is known as 'sampurna raga'. Indian classical music is predominantly improvisatory.
Effects of training on the process of vocal aging can be speculated upon. Brown et al. (1991), in their study, suggested that professional singers are more resistant to the effects of aging, as they master the techniques to preserve optimal laryngeal functioning with aging and also practice good vocal hygiene. A study by D. Hazelett and M. J. Ball (1996) compared
acoustic characteristics among two trained singers with varying years of experience. The results showed lower F0 and lower jitter in the subject with greater experience and a history of smoking. Virtually identical HNR scores were observed, though subject 2, with less experience and no history of smoking, showed greater hoarseness compared to subject 1. VOT indicated no significant difference between the speakers.
Koufman et al. (1996) and Sonnien & Hurme (1998), in their studies, concluded that singers exhibit less muscular tension than non-singers while singing. They also suggested that singers tend to sound younger as they advance in age than their non-professional counterparts, due to their usage of voice conservation strategies.
Brown et al. (1991, 1993) suggested that professional singers do not follow the patterns of aging change observed in non-singers and are somewhat immune to the effects of ageing. Professional singers tend to maintain stable fundamental frequency (F0) levels throughout their adult life, in contrast to non-singers, and also do not follow the patterns of ageing seen in others.
II. NEED FOR THE STUDY
The review of literature shows evident changes in the voice of a normal individual, but there is a dearth of literature on the effect of ageing in trained professional singers. Hence the present study aimed at comparing the age-related changes in frequency, intensity and harmonic-to-noise ratio parameters among professional Carnatic singers.
The aim of the study was to profile and compare the age-related changes in intensity, frequency and harmonics-to-noise ratio among trained professional Carnatic singers.
III. METHOD
The present study was carried out at Sweekaar Academy of Rehabilitation Sciences, Secunderabad, Telangana, to compare the acoustic parameters of voice across age groups. 31 trained female Carnatic singers in the age range of 13 to 60 years were considered for the study. Subjects were divided into 3 groups: below 20 years, 20-40 years and 40-60 years.
Inclusion criteria:
Trained professional Carnatic singers
Minimum of 4 years of formal vocal training
Singing practice of at least 20 hours per week
Exclusion criteria:
No history of speech, language, or hearing problems
No history of alcohol or tobacco intake
No history of smoking
No history of laryngeal trauma
ascending order was provided to the participants. The ragas were recited in the form of "alap" (phonating) at the comfortable level generally used by the singer. The recordings were transferred to a computer system with the voice analysis software Dr. Speech installed. Dr. Speech is a comprehensive speech/voice assessment software system intended for use by professionals in the voice and speech fields. Parameters considered for the study were mean frequency and intensity, minimum frequency and intensity, maximum frequency and intensity, frequency and intensity range, and harmonic-to-noise ratio (HNR). Acoustic analysis of the obtained samples was carried out. Multiple analysis of variance (SPSS 17.0) was applied to compare the parameters of the various age groups considered in the study.
Null hypothesis: There will be no significant difference in voice parameters between the three groups.
IV. RESULTS AND DISCUSSION
The present study compared the fundamental frequency (F0), frequency range, harmonics-to-noise ratio (HNR), mean intensity, minimum and maximum intensity, and intensity range among professional Carnatic singers. The mean and standard deviation values obtained were compared across the three groups.
Figure 1. Pitch of the three groups
Mean frequency, minimum frequency and maximum frequency were found to be higher among Group II participants, in the age range of 20 to 40 years, whereas frequency range and harmonic-to-noise ratio were found to be higher in Group I, which comprised the younger participants, as shown in figure 3. Statistically, no significant difference was observed across the groups for any of the frequency parameters or the harmonic-to-noise ratio.
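The per-group comparison of frequency parameters described above (mean, minimum, maximum and range of F0) can be sketched as summary statistics over each group's measurements. The F0 values below are invented placeholders; in the study these parameters were extracted with Dr. Speech.

```python
# Sketch: per-group summary of F0 parameters (mean, min, max, range, SD).
# The sample values are placeholders, not the study's measurements.
from statistics import mean, stdev

def f0_summary(f0_values):
    """Summary statistics for a list of F0 measurements in Hz."""
    return {
        'mean': mean(f0_values),
        'min': min(f0_values),
        'max': max(f0_values),
        'range': max(f0_values) - min(f0_values),
        'sd': stdev(f0_values),
    }

groups = {
    'Group I (<20 y)': [228.0, 241.5, 233.2, 250.1],
    'Group II (20-40 y)': [235.4, 248.0, 242.7, 255.9],
    'Group III (40-60 y)': [224.8, 230.3, 239.6, 236.1],
}
for name, f0s in groups.items():
    s = f0_summary(f0s)
    print(f"{name}: mean={s['mean']:.1f} Hz, range={s['range']:.1f} Hz")
```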
[12] Durga, S. A. K. (1997). Voice culture with special reference to South Indian music. Indian Musicology Society, Baroda, India.
[13] Stathoulas, E. T., Huber, J. E., & Sussman, J. E. (2011). Changes in acoustic characteristics of voice across the life span: Measures from individuals 4 to 93 years of age. JSLHR, 54, 1011-1021.
[14] Greene, M. C. L. (1972). The voice and its disorders (3rd ed.). London: Pitman Medical Publishing Co.
[15] http://www.indianmelody.com
[16] Greene & Mathieson (1995). Voice disorders. London: Whurr Publishers.
[17] Hirano, M., Kiyokawa, K., & Kurita, S. (1988). Laryngeal muscles and glottic shaping. In O. Fujimura (Ed.), Vocal physiology: Voice production mechanisms and functions (pp. 49-65). New York: Raven Press.
[18] Jairazbhoy, N. A. (1971). The rags of North Indian music: Their structure and evolution. London: Faber and Faber.
[19] Jiang, J., Lin, E., & Hanson, D. G. (2000). Voice disorders and phonosurgery I: Vocal fold physiology. Otolaryngologic Clinics of North America, 33, 699-718.
[20] Koufman, J., et al. (1996). Laryngeal reflex consensus conference report. Journal of Voice, 10(3), 215-216.
[21] Krishnaswamy, A. (2003). Application of pitch tracking to South Indian classical music. In IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 557-560), Hong Kong, China.
[23] Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. Oxford: Oxford University Press.
[24] Kiesgen, P. (2005). Breathing. Journal of Singing, 62(2), 169-171.
[25] Lawrence, V. L. (1979). Laryngological observations on belting. Journal of Research in Singing, 2, 26-28.
[26] Luchsinger, R., & Arnold, G. E. (1965). Voice, speech and language. Constable and Co. Ltd.
[27] Linville, S. E. (2001). Aging of the larynx. In S. E. Linville (Ed.), Vocal aging (pp. 37-53). Canada: Singular Thomson Learning.
[33] Ross, J. (1992). Formant frequency in Estonian folk singing. Journal of the Acoustical Society of America, 91, 3532-3539.
[34] Rammohan, A. (2000). The handbook of Tamil culture and heritage. International Tamil Language Foundation. PA: National Publishing Co.
[35] Miller (1985). Intraindividual parameters of the singer's formant. Folia Phoniatrica, 37, 31-35.
[36] Savitri, S. R. (2006). Overview of professional voice care. Journal of ITC Sangeet Research Academy, 20, 258-26
During a musical performance, does the listener perceive the emotion the performer is
expressing? Is there affective information composed into the melodic material of a
dramatic work? How is the intended expression influenced or affected by the range of
timbral variation available on a given instrument? Our study seeks to understand
the communication process involving composer, performer, and listener by
abstracting melodic lines from their original context within operatic works, and then
reinterpreting these excerpts using different timbres and expressive affects. Three short
excerpts from Romantic-era operas were chosen, based on their clarity in signifying one
of five basic discrete emotions (anger, fear, happiness, love, sadness). For the
reinterpretations, soprano voice and five instruments (flute, English horn, clarinet,
trumpet, cello) were chosen to include a wide variety of timbral and expressive
characteristics. Six professional musicians performed the excerpts while attempting to
express one of each of these five basic emotions during their performance, yielding a
total of 90 excerpts. Participants (N=40) were asked to listen to each excerpt and to
complete two tasks: 1) during listening they continuously rated the overall emotional
intensity from weak to strong using a slider that provided force feedback; 2) at the end of
the excerpt they picked the emotion(s) that they perceived as being expressed
and rated their strength on a scale of low to high. The analysis of the results examined
both the extent to which expressive intent in a musical performance was perceived as a
function of the melody and the instrument playing it, and what timbre’s role is in this
communication process. Results showed that although there was a communication of
affective information, the musical communication process first and foremost depended
heavily upon the excerpt, and then additionally, the instrument and/or instrumentalist, and
then the intended emotion.
Regression analyses revealed timing, pitch height and mode were predictive for mean
ratings of valence among preludes (R2=0.82) and fugues (R2=0.86). Timing accounted for
a larger amount of variance across both musical styles (62%), while pitch height and
modality accounted for low to moderate variance in both preludes and fugues. Mean
intensity ratings were also significantly predicted by timing within preludes and fugues,
however pitch height and modality were not significant. Therefore, it appears timing was
the strongest predictor of participants’ emotional responses. Cue strength however, varied
with respect to style; attack rate accounted for 66% of variance in emotional responses to
fugues, and comparatively 76% in responses to preludes. In addition both pitch height
and modality cues explained more variance for preludes than fugues. Overall these three
cues accounted for more variability within ratings for preludes (R2=0.80) performance
than fugues (R2=0.68). We will further discuss implications of these differences and the
importance of timing as a cue on understanding the communication of emotion in
composed music.
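The cue-regression analysis described above can be sketched as an ordinary least-squares fit predicting mean valence ratings from timing, pitch height, and mode, with R² as the variance explained. The data below are random placeholders, not the study's ratings; variable names and coefficients are assumptions for illustration.

```python
# Minimal sketch of a cue regression: valence ~ timing + pitch height + mode,
# with R^2 computed from the residuals. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 24                                   # e.g. 24 preludes
timing = rng.uniform(1, 8, n)            # attack rate (events/sec)
pitch_height = rng.uniform(40, 80, n)    # mean pitch (MIDI note numbers)
mode = rng.integers(0, 2, n)             # 0 = minor, 1 = major
# Synthetic ratings generated from the cues plus noise:
valence = 0.6 * timing + 0.02 * pitch_height + 1.5 * mode + rng.normal(0, 0.5, n)

# Design matrix with intercept; least-squares fit
X = np.column_stack([np.ones(n), timing, pitch_height, mode])
beta, *_ = np.linalg.lstsq(X, valence, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((valence - pred) ** 2) / np.sum((valence - valence.mean()) ** 2)
print(f"R^2 = {r2:.2f}")
```

Comparing such R² values across styles (preludes vs. fugues) is what grounds the claim that timing carries the most predictive weight.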
i. Pleasant Group: 15 pleasant images were selected from the IAPS image database. The valence was equal to or greater than 5 (mean pleasure rating = 7.05, SD = 0.63, range = 5.00-8.34; mean arousal rating = 4.87, SD = 0.98, range = 2.90-7.35).
ii. Unpleasant Group: 15 unpleasant images were selected from the IAPS image database. The valence was less than the neutral midpoint of 5 (mean pleasure rating = 3.05, SD = 0.84, range = 1.45-4.59; mean arousal rating = 5.56, SD = 0.92, range = 2.63-7.35).
iii. Neutral Group: This group served as a control group. Subjects were shown 15 images with arousal and valence in the range 3.5-4.5 based on IAPS normative ratings. All the IAPS pictures were classified as per the recommendation of Mikels et al. (2015).
Musical stimulus: A set of 12 different ragas was taken from the SwarGanga Music Foundation. The duration of each raga was 3 minutes, and the ragas were classified as Happy, Sad, Angry and Calm based on their rasa (see Table 1).
Table 1. This table lists the ragas used in the study and the rasa used by the artist to play the raga
Therefore, in total there were 3 ragas of the same rasa. The above classification was made in accordance with previous literature (Balkwill & Thompson, 1999; Mathur et al., 2015). The musical structure and the sequences chosen are essentially fragments of ragas. Ragas were played randomly to each participant, and participants were asked to report their Mood, Emotion and Tension level on a 5-point Likert scale. Mood was rated on a 1 to 5 scale, with 1 being extremely annoyed, 2 annoyed, 3 neutral, 4 appeased and 5 extremely appeased. Emotion was also rated on a 1 to 5 scale, with 1 being extremely sad, 2 sad, 3 neutral, 4 happy and 5 extremely happy. Lastly, for tension level the scale was 1 being extremely tensed, 2 tensed, 3 neutral, 4 relaxed and 5 extremely relaxed.
III. PROCEDURE
Participants were tested individually in a dimly lit soundproof room, approximately 57 cm from a 17.8 × 14.2 inch computer monitor. The loudness was set at a comfortable level by the participants at the beginning of the experiment. When the participants arrived at the laboratory, they were given a brief idea of the protocol related to the GSR recording so that they understood the procedure and artifacts could be reduced. Participants were explicitly instructed to avoid hand movement, as the electrodes were placed on the index and middle fingers. They were also instructed on screen which steps to follow, e.g. pressing the spacebar after a visual stimulus, directions for giving ratings, etc. The electrodes were attached to the participants 10-15 minutes before the experiment to record their baseline skin conductance. After receiving the instructions and having their electrodes attached, participants rated their initial mood. Participants were asked to report their emotional quotient in 3 categories, namely mood, emotion and tension. Most of the participants rated their emotional quotient as calm before going on to the main experiment. All the participants were randomly assigned to 3 groups, namely pleasant, unpleasant and neutral, and were presented with different emotional pictures depending on the group; e.g. in the pleasant group, participants were shown 15 pleasant pictures and were again asked to rate their mood, to make sure that they were in the intended emotional context, before they were allowed to take part in the main experiment.
IV. RESULTS
The analysis was performed in two steps: 1) pre-processing of the GSR data and 2) statistical analysis. The GSR data was first down-sampled to 25 Hz to perform adaptive smoothing. Adaptive smoothing was carried out to ensure that artefact removal did not cause distortions in the data. The GSR data was then divided into chunks of time-blocks for each raga that was played, using markers embedded by the PsychoPy software.
The GSR data were analysed using a two-way ANOVA with the effect of context (group) on the musical stimulus as a between-subjects factor and the effect of the rasa of the raga as a within-subjects factor. The analysis was performed in MATLAB 2014b and SPSS version 16.0.
Results show a significant effect of the group (Pleasant, Unpleasant and Neutral) in which participants were placed (F(2, 60) = 3.874, p = 0.026) for the first 60 seconds, which is also reported in our previous findings (Mehrotra et al., in submission). An interaction effect was also observed between group and ragas (F(6, 60) = 2.585, p = 0.027) for the first 60 seconds. However, while considering 60 seconds to 120
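The two pre-processing steps described in the Results (down-sampling the GSR signal to 25 Hz, smoothing, and cutting it into per-raga time blocks at marker positions) can be sketched as follows. The input sampling rate, the use of a plain moving average in place of the adaptive smoothing, and the marker format are all assumptions for illustration.

```python
# Sketch of the GSR pre-processing pipeline: down-sample to 25 Hz, smooth,
# then split into per-raga blocks at marker positions. Parameters are assumed.
import numpy as np

def downsample(signal, fs_in, fs_out=25):
    """Naive decimation by averaging consecutive samples (fs_in % fs_out == 0)."""
    step = fs_in // fs_out
    trimmed = signal[: len(signal) // step * step]
    return trimmed.reshape(-1, step).mean(axis=1)

def smooth(signal, window=5):
    """Simple moving average, standing in for the adaptive smoothing used."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode='same')

def split_blocks(signal, marker_samples, block_len):
    """Cut the signal into fixed-length chunks starting at each marker."""
    return [signal[m:m + block_len] for m in marker_samples]

fs_in = 100
raw = np.random.default_rng(0).normal(0, 1, fs_in * 10)  # 10 s of fake GSR
gsr = smooth(downsample(raw, fs_in))                     # now at 25 Hz
blocks = split_blocks(gsr, marker_samples=[0, 75, 150], block_len=75)
print(len(gsr), [len(b) for b in blocks])
```

Averaging within each block then gives the per-raga, per-interval GSR potentials that feed the two-way ANOVA.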
seconds, we did not find a significant effect of the group in which participants were placed (F(2, 60) = 1.686, p = 0.194). A similar trend was observed for 120–180 seconds (F(2, 60) = 1.356, p = 0.265). In figures 2 and 3, we can see that the context has no dependence on the emotional group. For 60–120 seconds, in figure 1, the Neutral group has the maximum average GSR potential, followed by the Unpleasant and Pleasant groups, for the angry raga; but by the end of 120 secs the difference in GSR potential between the Neutral and Unpleasant groups has reduced, pointing towards adaptation taking place across the groups. Further, this observation complements the statistical result. We observe a similar trend for the other ragas.

Figure 2. GSR Potential: Calm Raga for 60–120 Secs time interval

Figure 3. GSR Potential: Happy Raga for 60–120 Secs time interval

Figure 4. GSR Potential: Sad Raga for 60–120 Secs time interval

For 120–180 seconds, the Pleasant and Unpleasant groups have relatively the same average GSR potential, followed by the Neutral group, for the angry raga in figure 5. In figure 6, continuing from figure 2, the Neutral group achieves the highest GSR potential, followed by the Pleasant and Unpleasant groups, for the calm raga. In figure 7, the same trend continues from figure 3: the Pleasant group has the maximum GSR potential, closely followed by the Unpleasant group, while the minimum GSR potential is exhibited by the Neutral group for the happy raga. We interpret this as an adaptation effect of the musical stimuli that is reaching a plateau over time. The adaptation effect is found in both the 60–120 secs and 120–180 secs intervals for the calm, sad, angry, and happy ragas. In figure 8, the Unpleasant group has the highest GSR potential, followed by the Neutral and Pleasant groups, for the sad raga. The sad raga shows the strongest adaptation effect for all three groups (Pleasant, Unpleasant, and Neutral).

Figure 6. GSR Potential: Calm Raga for 120–180 Secs time interval

Figure 7. GSR Potential: Happy Raga for 120–180 Secs time interval
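The time-block segmentation described in the procedure can be sketched as follows. This is a minimal illustration assuming a 1-D GSR trace and a list of raga-onset marker times; all function and variable names here are ours, not from the authors' PsychoPy/MATLAB pipeline.

```python
import numpy as np

def segment_gsr(signal, fs, onsets, block_s=60, n_blocks=3):
    """Split a GSR trace into fixed-length time blocks per raga.

    signal : 1-D array of GSR samples
    fs     : sampling rate in Hz (integer)
    onsets : marker times in seconds at which each raga started
    Returns {onset: [block arrays]} with n_blocks blocks of block_s
    seconds each, counted from that raga's onset marker.
    """
    blocks = {}
    for t0 in onsets:
        start = int(t0 * fs)
        blocks[t0] = [
            signal[start + i * block_s * fs : start + (i + 1) * block_s * fs]
            for i in range(n_blocks)
        ]
    return blocks
```

Each block (e.g. 0–60 s, 60–120 s, 120–180 s) can then be averaged and fed to the mixed-design ANOVA described above.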
ACKNOWLEDGMENT
We are grateful to all our participants for taking part in the study. We thank Prof. S. Bapi Raju for his helpful suggestions. This work was funded in part by the Kohli Center on Intelligent Systems (KCIS) and by the International Institute of Information Technology, India.
REFERENCES
Balkwill, L.-L., & Thompson, W. F. (1999). A cross-cultural investigation of the perception of emotion in music: Psychophysical and cultural cues. Music Perception: An Interdisciplinary Journal, 17(1), 43–64.
Carifio, J., & Perla, R. J. (2007). Ten common misunderstandings, misconceptions, persistent myths and urban legends about Likert scales and Likert response formats and their antidotes. Journal of Social Sciences, 106–116. doi:10.3844/jssp.2007.106.116
Dade, P. (2010). Review of A. M. Colman, A Dictionary of Psychology (3rd ed.). Oxford: Oxford University Press, 2009. xi + 882 pp. ISBN 978-0-19-953406-7.
Huron, D. (2003). Is music an evolutionary adaptation? In The Cognitive Neuroscience of Music (pp. 57–75).
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (2008). International Affective Picture System (IAPS): Affective ratings of pictures and instruction manual (Technical Report A-8). Gainesville, FL: University of Florida.
Levitin, D. (2007). This Is Your Brain on Music. London: Atlantic Books. 320 pp. ISBN 978-1-84354-715-0.
Mathur, A., Vijayakumar, S. H., Chakrabarti, B., & Singh, N. C. (2015). Emotional responses to Hindustani raga music: The role of musical structure. Frontiers in Psychology, 6. doi:10.3389/fpsyg.2015.00513
The central notion in North Indian Classical Music is that of a raga. The word ‘raga’
originates in Sanskrit and is defined as 'the act of colouring or dyeing' (the mind and
mood/emotions in this context) and therefore refers metaphorically to 'any feeling of
love, affection, sympathy, desire, interest, motivation, joy, or delight’. It is believed that
an artist uses particular note combinations, to create a mood (rasa) that is unique to the
raga. We conducted a behavioural study in which 122 participants from India rated their experienced emotion for 12 ragas on a Likert scale of 0–4 (0 being 'not at all felt' and 4 being 'felt the most') for each of the following emotions: happy, romantic, devotional, calm, angry, longing, tensed, and sad. The results of the behavioural study, conducted online, revealed that ragas were indeed capable of eliciting distinct emotions. Eight ragas rated highest on the 'happy' and 'tensed' emotions (four in each category) were chosen for a subsequent functional neuroimaging experiment to investigate the neural
circuits underlying these emotional responses. A separate group of 29 participants rated
raga sequences selected from the behavioural study for ‘happy’ and ‘tensed’ emotion.
The raga condition contrasted with rest revealed a network of areas implicated in
emotion processing namely, bilateral mOFC, ACC, caudate, nucleus accumbens, insula,
precuneus, auditory association areas, and the right lOFC, amygdala, hippocampus–parahippocampal gyrus complex, and hypothalamus. Further, a flexible factorial analysis revealed a main effect of emotion (happy/tensed) in the bilateral mOFC. Our results provide evidence, for the first time, that ragas engage basic emotion circuitry. They also reveal that valence information may be coded in the mOFC. Together, these results demonstrate that North Indian Classical ragas are tone sequences capable of eliciting distinct emotional responses.
ISBN 1-876346-65-5 © ICMPC14 247 ICMPC14, July 5–9, 2016, San Francisco, USA
Links between music and emotion in dramatic art works (film, opera or other theatrical forms)
have long been recognized as integral to a work’s success. A key question concerns how the
auditory and visual components in dramatic works correspond and interact in the elicitation of
affect. We aim to identify the components of dramatic music that are able to clearly represent basic emotional categories, using empirical methods to isolate and measure auditory and visual interactions. By separating visual and audio components, we are able to control for,
coordinate and compare their individual contributions. Using stimuli from opera and film noir, we
collected a rich set of data (N=120) that will make it possible for us to segment and annotate
musical scores with collated affect descriptors for subsequent score analysis. Custom software
was developed that enables participants successively to record real-time emotional intensity, to
create event segmentations and to apply overlapping, multi-level affect labels to stimuli in audio,
visual and audiovisual conditions. Our findings suggest that intensity profiles across conditions
are similar, but that the auditory component is rated stronger than the video or audiovisual
components. Descriptor data show congruency in responses based on six basic emotion
categories, and suggest that the auditory component elicits a wider array of affective responses,
whereas the video and audiovisual conditions elicit more consolidated responses. These data will
enable a new type of score analysis based entirely on emotional intensity and emotion category,
with applications in music perception, music theory, composition, and musicology.
ABSTRACT
The aim of this study is to examine the types of verbal expressions used by college students with and without visual impairments. Participants in the in-depth interviews were students at universities in Seoul and Chungcheong Province; ten were visually impaired and ten were normally sighted. Content analysis of the in-depth interviews revealed the following implications. First, visually oriented verbalism is observed among visually impaired people even when they have the same level of musical experience as normally sighted people. Verbalism is a linguistic characteristic in which visually impaired people use words in an abstract way. The verbalism most often observed drew on contextual information naturally obtained in daily life, rather than on information gained from the media or from didactic learning. Second, K-LIWC was employed to examine the types of words used for description by the two groups, and the results showed a high ratio of pronoun use among the visually impaired. Third, in terms of content, words for negative emotions were used more frequently. Fourth, compared to normally sighted people, visually impaired people more frequently use affective description and description via senses other than vision, which implies that they use non-visual senses to substitute for or compensate for their visual impairment. Last, with regard to episodic memory and situational depiction, normally sighted people frequently narrate episodic memories, while visually impaired people more frequently focus on situational depiction.

I. INTRODUCTION
Visual imagery is one of the major mechanisms for inducing musical emotion. It has been found that people with visual impairment (VI) produce fewer analogies and metaphorical expressions than those with normal vision (NV), owing to the listener's ability to process music and organize it into visual images. Limited visual experience affects abstract information processing in the music listening experience.

The factors that influence VI listeners' experience of music include the conceptualization of music and the visual imagery process (Darrow & Novak, 2007). While some studies suggest that VI listeners' tactile sense coincides with normally sighted people's visual sense (Walker, 1985), studies of the different amounts of imagery used by visually impaired and normally sighted people, using similar measuring tools, suggest that limited visual experience has negative impacts on the formation of musical conceptualization for VI listeners (Madsen & Darrow, 1989). A study that revealed limitations in their processing of abstract data reported that, when asked to comment freely about music after listening, VI listeners tend to present more data-driven description, such as the instruments used and musical factors, while presenting fewer analogical or metaphorical descriptions compared with the normally sighted (Flowers & Wang, 2002). Despite their visual impairment, they can still articulate music listening experiences in their own words.

Meanwhile, although language is less affected by visual impairment, since it is learned through listening (Anderson & Olson, 1981; Civelli, 1983), it is hard to find studies analyzing VI listeners' verbal descriptions of their emotional reactions to music. Since the visually impaired communicate with others mainly through listening, it is necessary to pay attention to verbal descriptions of music after listening in order to examine the music listening experiences of the visually impaired and the normally sighted.

Research on the influence of visual impairments on verbal description in everyday life has been carried forward by scholars in visual impairment education. Regarding the linguistic characteristics caused by the loss of vision, Cutsforth (1951) used the word "verbalism" for the first time, to denote the linguistic characteristic of visually impaired people who use, in an abstract way, words they have heard but do not fully understand. In particular, visually oriented verbalism refers to congenitally visually impaired people's use of words related to visual information, such as colour, shade, and chroma, in everyday life without any direct experience of them (Vinter, Fernandes, Orlandi, & Morgan, 2013). This is closely related to age, as the verbal similarity between the visually impaired and the normally sighted tends to increase as they get older (Rosel et al., 2005).

This study aimed to investigate the differences in music listening experiences between adults with normal sight and adults with visual impairment, focusing on the characteristics of their verbal descriptions. The research questions were:
First, what are the differences in the descriptive factors of the two groups of participants after the music listening experience?
Second, what are the differences in the perceptual factors of the two groups of participants after the music listening experience?
Third, what are the differences in the referential factors of the two groups of participants after the music listening experience?

II. METHOD
Participants (10 VI, 10 NV) were students at five universities in Korea. Self-volunteered participants listened to the selected music for approximately 3 minutes and freely described their emotions, physical reactions, related images, episodic memories, and specific emotions related to musical factors. For the in-depth interview, the authors played the selected emotion-inducing music to the participant, the participant freely expressed and described the emotions he or she had felt using adjectives, and the authors recorded this process. The authors collected all the raw recorded data, systematically
(Table continued from the previous page.)

Contextual verbalism — 20 (57.2):
- Understands the atmosphere of a specific place through TV dramas
- Learns about facial expressions and attitudes through song lyrics
- Understands the symbolism of colours through conversation with friends
- Infers the size of an object through hearing a portrayal of a specific scene
- Learns about changes of light through situational description
- Understands the characteristics of an object by combining survival sense and visual information

Total: 35 (100)

B. Form of Description
The form of description was analyzed by breaking down the interview recordings along a linguistic dimension and a psychological dimension using the Korean Linguistic Inquiry and Word Count (K-LIWC). By looking into syntactic aspects, including word, morpheme, and word vs. sentence counts, we aimed to find additional information about totally blind college students' use of words. The linguistic dimension is a method for understanding the characteristics of totally blind college students' word use through a syntactic approach. It includes the major linguistic dimension's six lower-ranked factors: noun, pronoun, verb, adjective, adverb, and exclamation (see Table 2).

<Table 2> Linguistic characteristics of the descriptive factor in the music listening experience (N=20)

Types        VI (n=10)  NV (n=10)  z       p
Noun         14.6       15.1       -1.369  0.17
Pronoun      1.8        0.7        -2.389  0.04*
Verb         10.8       13.3       -1.579  0.11
Adjective    5.5        5.7        -0.684  0.49
Adverb       2.9        2.6        -0.526  0.59
Exclamation  0.8        0.3        -0.919  0.35
* p < .05

C. Perceptual Factor
Since this study is a comparative analysis of the characteristics of linguistic descriptions of emotional reactions triggered by music between totally blind and normally sighted college students, visual adjectives were categorized separately, while the other senses, including hearing, smell, and touch, were categorized together. The words included in the emotional category are all words that could describe a reaction to external stimulation (see Table 3).

<Table 3> Analysis of perceptual factor content of the music listening experience

Category           Subcategory  VI (%)      NV (%)     Total (%)
Perceptual Factor  Visual       52 (16.0)   78 (25.2)  130 (20.4)
                   Sensual      99 (30.6)   71 (23.0)  170 (26.8)
                   Emotional    130 (40.3)  95 (30.6)  225 (35.6)
                   Behavioral   42 (13.1)   66 (21.2)  108 (17.2)
Total                           323 (100)   310 (100)  633 (100)

D. Referential Factor
The unit of analysis for the referential factor in the music listening experience is the sentence, which needs to have a subject and a predicate. Even when the subject is omitted, as happens in interviews, the utterance is treated as a unit of analysis if its meaning is clear. Episodic memory triggered by music is a description of an incident, people, or place the participant has experienced. Situational depiction is a description of storytelling or an imaginary image triggered by the music (see Table 4).

<Table 4> Analysis of the referential factor in the music listening experience (N=20)

Category            Subcategory            VI (%)     NV (%)     Total (%)
Referential Factor  Episodic Memory        20 (35.0)  27 (49.1)  47 (41.9)
                    Situational Depiction  37 (64.8)  28 (50.9)  65 (58.1)
Total                                      57 (100)   55 (100)   112 (100)

IV. CONCLUSION
The conclusions derived from the analysis of verbal descriptions of music listening are as follows.

First, the visually impaired showed a high level of visually oriented verbalism, and its visual information was mostly acquired through contextual verbalism, followed by verbalism acquired from the media and from didactic learning. This implies that the use of verbalism itself has contextual characteristics, and shows that information obtained from everyday life or conversation with others has a bigger impact than information obtained from books, movies, television, or education. In other words, VI individuals tend to place higher trust in visual information obtained contextually than in that obtained from media or education, and tend to incorporate it into their ordinary verbal and linguistic habits (Jaworska-Biskup, 2011). Generally, the amount of linguistic habit obtained contextually through everyday language use is much greater than that obtained from education or the media, and there is higher trust in contextually obtained visual information (Kemp, 1981). As such words are used continuously in casual conversation with others, they are silently approved, and the accepted contextual verbalism is used frequently as a way of communication, applying this characteristic of language.

Secondly, through the syntactic factor analysis of the Korean Linguistic Inquiry and Word Count (K-LIWC), carried out in order to find differences in the forms of description, it was found that the VI group tends to use pronouns frequently (Perez-Pereira, 1999). This could indicate that the concentration level of the VI participants was higher than that of the NV participants, and that they believed their use of a pronoun would convey to the interviewer the actual noun it referred to. Another
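The dictionary-based word counting underlying the K-LIWC analysis can be sketched as a per-category rate over a transcript's tokens. This is a toy, language-agnostic illustration: the real K-LIWC additionally performs Korean morphological analysis, and the lexicon and function names below are ours.

```python
def category_rates(tokens, lexicon):
    """Return the percentage of tokens falling into each category.

    tokens  : list of words from one interview transcript
    lexicon : maps a category name (e.g. "pronoun") to a set of words
    A minimal stand-in for LIWC/K-LIWC dictionary-based counting.
    """
    total = len(tokens)
    return {
        cat: 100.0 * sum(1 for t in tokens if t in words) / total
        for cat, words in lexicon.items()
    }
```

Rates computed this way per participant can then be compared between groups, as in Table 2.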
Every social interaction is also communication of emotions through our voices, postures,
and body movements. Emotions are contagious; they are in constant flux and have a
big, yet often subconscious, effect on our social relations. Dance and music have the
ability to bring emotional communication to the foreground, and a wide spectrum of
emotions, their intensities, and their social consequences can be explored through them.
We combined contemporary dance and movement science to explore the social, dynamic,
and embodied facets of emotions. Two dancers studied social emotions and emotional
contagion in their artistic research, and then performed a short choreography with
different emotional conditions (“loathing”, neutral, “loving”, pride & shame; pair having
same or different emotions; static, changing together, or one contracting the emotion of
the other). The performances were recorded using optical motion capture, and kinematic
analysis was conducted to investigate which movement features correspond with which
emotional states and scenarios.
Principal component analysis (PCA) of the kinematic features indicates that the first
component mainly captures the “vigorousness” of the dance, separating “loving” from
“loathing”, with neutral performances in between. PC2 captures the social dimension, or
movement similarity between dancers. PC3 is loaded with acceleration-related features,
and teases apart depictions of pride and shame from the others.
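The PCA step described above can be sketched as follows: performances are rows, kinematic features are columns, and components are the directions of greatest variance. This is a minimal sketch with illustrative names, assuming features are already extracted; it is not the authors' actual pipeline.

```python
import numpy as np

def pca(features, n_components=3):
    """Project performances (rows) described by kinematic features
    (columns) onto their first principal components via SVD.

    Returns (scores, explained): per-performance PC scores and the
    fraction of total variance captured by each retained component.
    """
    X = features - features.mean(axis=0)           # centre each feature
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T               # PC scores per performance
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained
```

Scores on PC1–PC3 can then be compared across the emotional conditions, as in the analysis above.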
assessed the participant's general emotional reactivity (general strength of emotional reaction). The following six questions assessed their emotional responsiveness towards the selected musical or visual object. These covered the following aspects of emotional responsiveness: Emotion strength, Pleasure strength, Aesthetic experience, Attention, Ambivalence, and Bodily experience. Ratings were provided on a 7-point Likert scale (see scale dimensions in Appendix 2). For the analyses and reporting of the results, the scrutinized concepts were translated from Finnish to English using back-translation.

C. Analysis
The analysis was predominantly descriptive and exploratory in nature. We first explored the mean values of the bipolar adjective scales in order to identify object characteristics and/or associations, and the differences in these between the modalities. In terms of statistical analyses, MANOVA and subsequent t-tests were calculated to detect significant differences between the modalities. Secondly, similarities and differences between the modalities regarding the seven constituents of emotional responsiveness were investigated by comparing their mean values (MANOVA and subsequent t-tests). Finally, the most distinctive features of emotional responsiveness in each modality were correlated with the perceived pleasantness of the object.

III. RESULTS
A. Object characteristics
The results showed several distinctions between how pleasurable music and pleasurable visual objects are perceived. First, we searched for characteristics that were most similarly rated in both modalities. Object characteristics were selected as "similarities" if the mean ratings for the two object types were practically identical, i.e. differed by ≤0.1 units. These characteristics are presented in Table 1. Bolding in Table 1 indicates towards which end of the SDS the mean values in each scale settled. Both modalities were equally associated with being highly pleasant and timeless, and somewhat nostalgic, sophisticated, and imposing (rather than unpleasant, temporal, oriented towards novelty, rough, and modest, respectively).

TABLE 1
SIMILARLY PERCEIVED OBJECT CHARACTERISTICS

Object characteristics                Mean / Music (N=228)  Mean / Visual (N=236)
Pleasant - Unpleasant                 1.5                   1.5
Temporal - Timeless                   6.4                   6.3
Nostalgic - Oriented towards Novelty  3.6                   3.5
Rough - Sophisticated                 4.8                   4.9
Modest - Imposing                     4.6                   4.6

Next, we focused on identifying the differences between the modalities. MANOVA indicated a significant difference between the modalities, F(33, 430) = 7.80, p ≤ .001. In order to identify the clearest differences and avoid bias from multiple comparisons, we selected only object characteristics that had a mean difference of ≥0.5 units between the modalities and showed a statistically significant difference at the ≤.001 significance level. These object characteristics are presented in Table 2. Again, bolding indicates towards which end of the SDS the mean values settled.

TABLE 2
DIFFERENTLY PERCEIVED OBJECT CHARACTERISTICS

Object characteristics     Mean / Music (N=228)  Mean / Visual (N=236)
Ratings at opposite sides of the SD scale:
Playful - Serious          4.3                   3.5
Conventional - Rebellious  4.2                   3.6
Mystical - Down to earth   3.7                   4.5
Discreet - Bold            4.5                   3.9
Ratings on the same side of the SD scale:
Light - Heavy              3.7                   3.0
Calming - Exciting         3.2                   2.3
Energetic - Calm           4.3                   5.0
Funny - Serious            4.8                   4.0
Tranquil - Restless        2.7                   2.1
Sensual - Non-erotic       4.0                   4.7

The "differences" were categorized into two sub-groups. The first section of Table 2 represents dimensions in which the mean values were generally close to the midpoint of the scale, but towards different ends of the scale in the two modalities. Among these, musical objects inducing pleasure were perceived to be more towards serious, rebellious, mystical, and bold, whereas visual objects inducing pleasure were perceived to be more towards playful, conventional, down to earth, and discreet.

The second section of Table 2 presents dimensions in which the mean values settled on the same side of the scale in both modalities, associating both object types with being light, calming, calm, serious, tranquil, and non-erotic. However, the mean value differences indicated that the perceived characteristics of music, in comparison to the visual domain, were inclined more towards heavy, exciting, energetic, serious, restless, and sensual. In contrast, the characteristics of visual objects were perceived as more towards light, calming, calm, funny, tranquil, and non-erotic.

B. Emotional Responsiveness
MANOVA revealed a significant difference between the object types in the emotional responsiveness factors, F(7, 239) = 20.43, p ≤ .001, and subsequent t-tests were conducted to investigate the differences in each of the seven features (Table 3). The general tendency for emotional responsiveness (reactivity) appeared to be somewhat stronger within those
who selected music in comparison to those who selected a visual object, and an even stronger difference was observed between the modalities for the emotional strength experienced towards the selected object, showing that music was experienced significantly more emotionally than the visual objects. Pleasure strength was scored high in both groups (M=6.1/music, M=6.0/visual) and did not show a significant difference between the modalities. Specific aspects of emotional reactivity also differed significantly between the object types, with significantly higher ratings for attention and bodily experience in relation to pleasure evoked by music than to pleasure evoked by visual objects. In contrast, aesthetic experience received significantly higher ratings in the visual domain than in music. Mean ratings for the experience of emotional ambivalence were somewhat higher for music than for visual objects, but the difference between the modalities did not reach significance.

TABLE 3
EMOTIONAL RESPONSIVENESS

Variable      Modality  Mean  Std.  t-test results
Reactivity    Music     5.3   1.3   t(462) = 2.19; p = .029
              Visual    5.0   1.4
EmoStrength   Music     5.7   .90   t(448.67) = 3.82; p ≤ .001
              Visual    5.4   1.1
PleaStrength  Music     6.1   .91   t(462) = 1.38; ns
              Visual    6.0   1.0
AesthExp      Music     5.7   1.5   t(435.33) = -3.49; p ≤ .001
              Visual    6.1   1.2
Attention     Music     6.0   .92   t(421.39) = 9.12; p ≤ .001
              Visual    5.0   1.4
Ambivalence   Music     3.2   2.0   t(449.72) = 1.93; ns
              Visual    2.8   1.7
BodilyExp     Music     5.0   1.5   t(452.55) = 4.91; p ≤ .001
              Visual    4.2   1.8

C. Correlations
Since Aesthetic experience, Attention, and Bodily experience demonstrated clear differences between the object types in emotional responsiveness, we further investigated whether they also showed connections to perceiving the object as pleasurable. The object characteristic "Unpleasant-Pleasant" was therefore correlated with these emotional responsiveness features, separately for music (Table 4) and for visual objects (Table 5).

TABLE 4
OBJECT PLEASANTNESS & EMOTIONAL RESPONSIVENESS IN MUSIC

Variable                             AesthExp  Attention  BodilyExp
Object feature: Unpleasant-Pleasant  .16*      .00        .19**
AesthExp                             1         .26**      .37**
Attention                            .26**     1          .33**
BodilyExp                            .37**     .33**      1
* p ≤ .05; ** p ≤ .01; *** p ≤ .001

In music, the perceived pleasantness of the object was significantly correlated with the strength of Aesthetic experience and Bodily experience. For visual objects, the perceived pleasantness of the object correlated significantly only with the strength of Aesthetic experience. These correlations were relatively low, but in line with earlier observations on modality differences in emotional reactivity towards the object.

TABLE 5
OBJECT PLEASANTNESS & EMOTIONAL RESPONSIVENESS IN VISUAL OBJECTS

Variable                             AesthExp  Attention  BodilyExp
Object feature: Unpleasant-Pleasant  .13*      .09        .01
AesthExp                             1         .35**      .33**
Attention                            .35**     1          .32**
BodilyExp                            .32**     .32**      1
* p ≤ .05; ** p ≤ .01; *** p ≤ .001

With regard to the components of emotional reactivity (Aesthetic experience, Attention, and Bodily experience), all were moderately positively correlated with each other, for both music and visual objects.

IV. DISCUSSION
The results provided evidence of distinct characteristics of pleasurable music and pleasurable visual objects, and of the relation of these to emotional responsiveness. First, it was shown that both pleasurable music and pleasurable visual objects were perceived as pleasant, timeless, nostalgic, sophisticated, and imposing on the bipolar scales (as opposed to unpleasant, temporal, oriented towards novelty, rough, and modest, respectively).

With regard to differences between the modalities, it was found that pleasurable music was characterized more towards serious, rebellious, mystical, bold, heavy, exciting, energetic, restless, and sensual. In contrast, pleasurable visual objects were characterized more towards playful and funny, conventional, down to earth, discreet, light, calming, calm, tranquil, and non-erotic. These differences present pleasurable music and pleasurable visual objects as distinct and appreciated in different ways. Music presents itself as a source of serious, mystical, heavy, and rebellious excitement, while visual objects offer playful, conventional, and light tranquility.

Results regarding emotional responsiveness provide an interesting additional perspective. Music received higher ratings in the strength of attention and bodily experience, and was generally reacted to more strongly emotionally than visual objects. Pleasure evoked by visual objects was experienced predominantly as aesthetic in nature. These differences were further supported by the correlations between the emotional responsiveness components and the object feature unpleasant-pleasant: object pleasantness was correlated only with the strength of aesthetic response for visual objects, but with both aesthetic and bodily response for music.

The limitations of the study concern sample biases in terms of gender, age, and education level, so one should be careful in making generalizations to other types of samples. Also, the cultural environment may have influenced how emotions evoked by daily experiences of music and visual objects are perceived and conceptualized, and the results may be somewhat reflective of Finnish culture.
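Correlation matrices of the kind reported in Tables 4 and 5 can be computed as plain Pearson correlations over the rating columns. A minimal sketch, assuming ratings are stored as one array per variable; the names are illustrative, not the authors' code.

```python
import numpy as np

def correlation_matrix(columns):
    """Pearson correlations between rating variables.

    columns : dict mapping a variable name (e.g. "AesthExp") to a 1-D
    array of participant ratings. Returns (names, r), where r[i][j] is
    the correlation between variables i and j.
    """
    names = list(columns)
    data = np.vstack([columns[n] for n in names])  # rows = variables
    return names, np.corrcoef(data)
```

The diagonal of the returned matrix is 1 by construction, matching the layout of Tables 4 and 5.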
Music induces emotions and influences moods. Mode, tempo, loudness, and complexity
have systematic associations with specific emotions. In previous research on children’s
sensitivity to emotional cues in music, however, real music has rarely been used. Our
participants were middle-class children (7-10 years of age, n = 195), adult musicians (n =
57), and adult nonmusicians (n = 51), recruited in Brazil. The task required participants
to match emotion labels (e.g., happiness) with 1-min instrumental excerpts taken from
Wagner’s operas. Eight excerpts, from five operas, were selected so that they varied
systematically in terms of arousal and valence, and so that two came from each quadrant
of emotion space as defined by the circumplex model. Excerpts were presented to groups
of participants over speakers. For each excerpt, listeners chose one emotion from a set of
eight emotion words that varied identically in terms of arousal and valence (happy,
elated, loving, peaceful, angry, fearful, sad, and distressed). Emoticons were presented
with each emotion term to make the task easier for children. Responses were considered
correct if the word and excerpt were in the same quadrant. Children’s performance
greatly exceeded chance levels and matched that of adult nonmusicians. There was also
monotonic improvement as a function of grade level, and musicians outperformed
nonmusicians and children. Group differences arose primarily from the ability to match
words and music on the basis of valence cues. Sensitivity to arousal cues varied little
across groups or grades. Thus, even 7-year-olds are capable of making consistent
emotional evaluations of complex music. The results imply that such music could be used
more frequently with young children to enrich their musical environment.
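The scoring rule described above (a response counts as correct when the chosen emotion word and the excerpt fall in the same quadrant of circumplex space) can be sketched as follows. The quadrant assignments are an assumption based on the standard circumplex model and the order in which the eight words are listed; the example trial is invented for illustration.

```python
# Sketch of the quadrant-based scoring rule: a response is "correct"
# when the chosen emotion word lies in the same valence/arousal
# quadrant of the circumplex model as the target excerpt.
# Quadrant assignments are assumed from the standard circumplex model.

# (valence, arousal) signs per emotion word: +1 = positive/high, -1 = negative/low
EMOTION_QUADRANTS = {
    "happy":      (+1, +1),
    "elated":     (+1, +1),
    "loving":     (+1, -1),
    "peaceful":   (+1, -1),
    "angry":      (-1, +1),
    "fearful":    (-1, +1),
    "sad":        (-1, -1),
    "distressed": (-1, -1),
}

def same_quadrant(word: str, excerpt_quadrant: tuple) -> bool:
    """True if the chosen word shares a circumplex quadrant with the excerpt."""
    return EMOTION_QUADRANTS[word] == excerpt_quadrant

def score_responses(responses, excerpt_quadrants):
    """Proportion of responses falling in the target quadrant."""
    hits = sum(same_quadrant(w, q) for w, q in zip(responses, excerpt_quadrants))
    return hits / len(responses)
```

A listener choosing "elated" for a happy-quadrant excerpt would thus be scored correct, even though "happy" was the nominal target word.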
ISBN 1-876346-65-5 © ICMPC14 259 ICMPC14, July 5–9, 2016, San Francisco, USA
Visual and auditory performance cues, as well as structural features of the music, can all
communicate emotional information to observers. However, little is known about the
relative contributions of these components to emotions perceived in music, or whether
the relative contributions change across different instruments. In two experiments, we
systematically varied the emotion conveyed by the structural features of the music, as
well as the emotional expression communicated by the performer. In Experiment 1, a
pianist performed four short pieces (sad, happy, scary, and tender-sounding) with four
different expressive intentions: sad, happy, angry, and deadpan. The body movements of
the pianist were tracked using optical motion capture. Subsequently, 31 participants rated
the emotions that they perceived in the performances in audio-only, video-only, and
audiovisual rating conditions. In Experiment 2, we replicated the experimental design
using violin performances (a violinist performed the melody of the same four short pieces)
with 34 different participants. To investigate the possible influence of visual performance
cues on emotion perception in the audiovisual rating condition, the four deadpan audio
tracks were paired with motion capture animations representing the four different
emotional expressions (of the same piece). A time-warping algorithm was applied to the
motion capture animations so that the audio and video would appear synchronized. The
ratings from each study were analyzed using repeated-measures ANOVAs. Our
presentation focuses on how emotion ratings change depending on which instrument is
being observed. For instance, in the video-only condition, participants viewing the piano
stimuli accurately recognized the emotional expression conveyed by the performer, but
those viewing violin stimuli confused emotional expressions. The varying results suggest
that pianists and violinists convey emotional information in different ways to observers.
Research on music uses has shown that regulation functions are transversely present
across affect dimensions (e.g., emotions, energy levels, motivation) and populations.
Nevertheless, a recent literature review (Baltazar & Saarikallio, submitted) revealed that
no comprehensive model for the phenomenon exists and that music use is not equally
studied across the affective dimensions. The current study aimed to empirically explore
and clarify how contextual and individual factors, music features, affective goals, tactics,
strategies, and mechanisms interact in the regulation process in order to develop a new
model for musical affect regulation. Data were collected through an online survey completed by 571 participants aged between 13 and 30. Preliminary results support a general model
of musical affect regulation, as 34.7% reported a co-presence of two or more dimensions
and 29.4% could not decide on one dimension. The most salient features of the process
for the participants were strategies and mechanisms, chosen significantly more often as
the motive for music engagement than goals or tactics. The most used strategies were distraction, concentration, and manipulation of affective states, and the most common mechanisms were rhythm, lyrics, and aesthetics. However, the use of strategies and
mechanisms varied depending on which of these two was more relevant for the
participant. Detailed analyses regarding the co-occurrence and interrelation of these
elements and the mediator role of individual-contextual factors are being conducted. The
results indicate a process that, although based on the experienced state and individual-
situational factors, is perceptually initiated with a focus on one or multiple regulatory
strategies or mechanisms. The motivational focus influences which elements become part
of the process. A new integrative model of affect regulation through music, combining
the elements that emerged both from theoretical and empirical work, is proposed.
ABSTRACT

In this study, we investigated the transferability of visual emotional information from film clips onto emotionally "neutral" musical works. We solicited short piano compositions that aimed to convey a sense of "neutrality," or at least the absence of strong emotions, from a pool of composers, and then paired each neutral composition with emotional film clips. Using a between-subjects design, 20 participants provided subjective ratings of valence and arousal after watching these short film clips. For a given neutral composition, half of the participants heard the music paired with a nominally happy video clip, whereas the other half heard the same music with a nominally sad video clip. After a short distraction task, participants listened to the previously heard music samples, this time without the accompanying visual information, and were asked to again provide subjective ratings of valence and arousal levels. We hypothesized that affective visual information would "imprint" onto neutral compositions, such that those who heard neutral music paired with a nominally happy film would rate the neutral music more positively than those who first heard the music paired with a nominally sad film. Our results investigated the reciprocal relationship between visual and aurally inferred emotional information, specifically by examining the influence of visual information on the aural experience.

I. INTRODUCTION

Much of the research on music and film is focused on how music affects the interpretation of the film (Cohen, 2001). For example, Marshall and Cohen (1988) showed that the perception of a film with animated geometric figures could be influenced by an accompanying musical sample. In their discussion, they outlined an explanatory model referred to as the congruency-associationist model (Figure 1) (Marshall & Cohen, 1988). In this figure, the Venn circles represent the total meanings of the film and the music. When experiencing both the meaning of the film and the music, more attention is drawn towards characteristics that are the same (a). Information from the music itself (x) is then associated with the congruent characteristics of the film and music, thereby allowing the music to alter the meaning the film has when shown by itself.

Figure 1. The Congruency-Associationist Model (Marshall & Cohen, 1988)

While much of multimodal research demonstrates how aural information can influence the perception of visual information, other studies have shown that the opposite relationship also exists: visual information can influence aural experience. For example, McGurk and MacDonald (1976) demonstrated that when participants were exposed to films of a young woman mouthing the syllable "ga" with a dubbed voice saying "ba", they reported hearing the syllable "da." The visual information from the woman's mouthing had altered the participants' perception of the aural information from the pronounced syllable. Another study demonstrated this phenomenon using filmed and recorded marimba tones. Schutz & Lipscomb (2007) showed that switching the video of a percussionist playing short and long marimba tones with the audio of these short and long tones could alter the perceived length of each tone. Both of these studies demonstrated that it is possible for visual information to alter the perception of aural information.

Given this possible "reversed relationship" between the senses, the question that this study explored was: can the congruency-associationist model also operate in reverse? Boltz, Ebendorf, & Field (2009) tested this conjecture by pairing positively and negatively valenced visuals with ambiguous musical samples and found that the visual information affected the perception of the musical samples in a mood-congruent manner. Specifically, ambiguous musical samples were rated more positively when paired with a positively affective visual and were rated more negatively when paired with a negatively affective visual.

Here we conducted a similar study by investigating the influence of positively or negatively valenced film clips on emotionally "neutral" music. Specifically, we predicted the following: (H1) Affective information from silent films will alter the perceived emotionality of "neutral" music.
Specifically, affective information from short film clips will "bind" to relatively neutral music.

II. STUDY 1: EMOTIONALLY NEUTRAL MUSIC

A. Motivation

For our study design, it was necessary to acquire emotionally neutral music to accompany visual stimuli. The task of finding existing music that could be described as emotionally neutral proved to be difficult. Therefore, we elected to explicitly solicit emotionally "neutral" music from student composers. This solicitation presented a unique opportunity to study what musical characteristics might be perceived as emotionally neutral; an interesting question that will only be briefly discussed in this short report.

B. Participants

Six composition students from Illinois Wesleyan University between the ages of 18 and 22 were recruited to write emotionally "neutral" compositions.

C. Methodology

In an open call, composers were asked to write a short (no longer than 45 seconds) piano piece using standard music notation that expressed a sense of emotional neutrality. The composers were solely responsible for interpreting the meaning of "emotional neutrality." The instructions that they received were as follows: "Composer participants will write a short piece (< 45 s) for piano that they consider to be emotionally 'neutral.' Please interpret 'neutral' however you see fit."

D. Analysis

Each piece was analysed with the Humdrum toolkit as well as other standard techniques. A brief descriptive analysis appears in Table 1.

E. Results and Discussion

The general characteristics of these pieces include an average length of 20.5 measures with a median of 4.12 notes per measure. The texture of these short pieces was generally fairly sparse, with an average of 3 parts. The pitch range spanned just less than three octaves, from B2 to Ab5; a range that falls comfortably within the average span of the human voice. In particular, this is an interesting finding given that more extreme range choices have been shown to yield a stronger emotional response, with higher range being rated as more tense and less stable (Costa, Bitti, & Bonfiglioli, 2000; Gabrielsson & Lindström, 2010). Within each piece, however, the median range was 25 semitones, just over two octaves. Meters within our corpus of emotionally neutral music included duple and quadruple. With regards to metric regularity, we noted an absence of syncopation. Regarding tonality, it is interesting to note that the compositions tended to avoid major or minor thirds or sixths, both melodically and harmonically. Mode is often indicated by the quality of thirds and sixths, and further, mode is commonly associated with valence (Gabrielsson & Lindström, 2010). It is worth noting that some of the most common melodic and harmonic intervals, the P5, P4, P1, and P8, are described in music and emotion research as being more "carefree" (Maher & Berlyne, 1982; Gabrielsson & Lindström, 2010).

Table 1. Characteristics of the "neutral" compositions

Average # of Measures: 20.5 measures
Median # of Notes per Measure (all voices): 4.12 notes
Average # of Independent Parts: 3 parts
Pitch Range (all pieces): B2-Ab5
Median Range: 25 semitones
Most Common Melodic Intervals: Unison, Octave, Major 2nd
Most Common Harmonic Intervals: Perfect 5th, Perfect 4th, Major 2nd
Meter: All compositions in duple or quadruple
Metric Regularity: Absence of syncopation

III. STUDY 2: METHODOLOGY

In our main study, we investigated whether film can leave a programmatic emotional impression on a piece of music that accompanies it. Below, the details of the main study are outlined.

F. Participants

Twenty participants were recruited from the Ohio State School of Music aural skills classes. These were primarily music majors between the ages of 18 and 22.

G. Materials

All neutral compositions collected from the first study were paired with short, 45-second "valenced video clips." A total of eight video clips, taken from pre-existing online films, were selected according to the following criteria. Each video was required to be clearly positively or negatively valenced, highly unfamiliar, 45 seconds in length, and not involving overly graphic visuals, such as the subjects of war or death. Audio and visual components were combined using iMovie.

H. Procedure

Participants were seated in front of a computer in a quiet room. Experimental software, designed using Max/MSP/Jitter, presented all of the experimental stimuli and collected all participant data (Cycling '74). The experimenter first familiarized each participant with the experiment software and allowed them to adjust the audio volume to a comfortable level. Care was taken to ensure that each participant understood the experimental interface (see Figure 3), the meaning of valence, arousal, and the circumplex model (Russell, 1980). Once participants were comfortable with the
experiment interface, the experimenter read the following instructions:

"Six video clips will now be shown to you. Each video clip is approximately 45 seconds long. Between each clip, there will be a ten second pause. After the clips, you will be asked to rate the valence and arousal of each clip on the circumplex model you were previously shown. This will take approximately 7 minutes. Following this will be some paperwork about your basic information and demographics. There will then be a second stage in which you will be asked to listen to six audio files. After each audio file, you will be asked to again rate the valence and arousal on the circumplex model. The entire study will take approximately 20 minutes. If you become uncomfortable with any part of this study at any time, you may leave immediately."

Figure 2. The Circumplex Model of Emotions (Russell, 1980).

Data collection took part in two stages. In the first stage, participants were asked to watch 6 short movie clips, and then rate valence and arousal levels via the circumplex model after each clip. After inputting their responses, participants clicked a button to initiate the next trial. Participants were only allowed to watch each video clip once. Stage 1 required approximately 7 minutes to complete. Participants completed a short distractor task between stage 1 and stage 2. Specifically, participants were asked to fill out a questionnaire consisting of basic demographic information. This distractor task took approximately 5 minutes. Stage two of the experiment was identical to stage 1, except that visual information was omitted. Participants heard the same 6 musical excerpts that were presented in stage 1, but this time without the accompanying "valenced videos". Participants were only allowed to listen to each audio file once, after which they again provided ratings of valence and arousal via the circumplex model (Figure 2). Stage 2 also required approximately 7 minutes to complete.

I. Experimental Design

Participants were randomly assigned to 1 of 2 experimental groups. All participants heard the 6 neutral compositions from study 1. However, groups were balanced so that when listening to any given neutral composition, half of the participants viewed a positively valenced video clip, whereas the other half of participants viewed a negatively valenced video clip.

IV. RESULTS

Recall that our hypothesis predicted that the neutral compositions would be rated differently depending on which movie they were seen with, due to the differences in valence and arousal between the movie clips. First it was necessary to verify that participants perceived different valence and arousal levels from the counterbalanced pairs of films. Therefore, t-tests for each pair of video clips paired with the same soundtrack were calculated. For the majority of the pairings, there was a significant difference in the valence ratings (Table 1).

Table 1. The results for the valence ratings of the video and audio condition in which the ratings of the counterbalanced videos with the same audio track are compared (p-values were Bonferroni-corrected, α = 0.0083)

Track #  Emotion of Video Clip  Mean Valence  t-value  p-value
1        Happy                   24.7         -4.3     0.0005*
         Sad                    -46.5
2        Happy                    6.1         -3.67    0.004*
         Sad                    -52.6
3        Happy                  -19.9         -3.13    0.008*
         Sad                    -66.3
4        Happy                   -2.3          1.82    0.09
         Sad                    -34.4
5        Happy                   72.6          5.89    0.00002*
         Sad                    -38.3
6        Happy                   45.6          3.13    0.006*
         Sad                    -26.1

T-tests were also run for the arousal ratings of the video condition. There were no significant differences between the arousal ratings of the counterbalanced video pairs. With this difference established between the perceived valence of the film and audio pairs, we next tested our main hypothesis with t-tests comparing the valence and arousal ratings between the counterbalanced soundtracks in the audio-only condition. These results are presented in Tables 3 and 4.
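The Bonferroni criterion used for these six track-wise comparisons can be sketched as follows; the p-values are the ones reported for the valence ratings in Table 1, and the corrected threshold is simply α = .05 divided by the number of comparisons.

```python
# Sketch of the Bonferroni criterion applied to the six track-wise
# t-tests: each comparison is evaluated against alpha = .05 / 6 ≈ .0083.
# The p-values below are those reported for tracks 1-6 in Table 1.
p_values = [0.0005, 0.004, 0.008, 0.09, 0.00002, 0.006]

alpha = 0.05 / len(p_values)  # Bonferroni-corrected threshold, ~0.0083

significant = [p < alpha for p in p_values]
# Track 4 (p = .09) is the only comparison that fails the criterion,
# matching the single unstarred row in the table.
```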
Table 2. The results for the arousal ratings of the video and audio condition in which the ratings of the counterbalanced videos with the same audio track are compared (Bonferroni-corrected, α = 0.0083)

As can be seen in the tables, none of the ratings of the audio-only counterbalanced soundtracks were significantly different from each other. Therefore, our results did not provide support for our hypothesis.
1davidjohnbaker1@gmail.com, 2tabi@soundout.com, 3d.mullensiefen@gold.ac.uk
Table 1. Table of 39 Initial Attributes used in Study

Victorious Angry Peaceful Heroic
Raging Calm Stately Cruel
Relaxed Majestic Hateful Gentle
Determined Frustrated Pleasant Vibrant
Depressive Contemplative Vigorous Dreary
Reflective Exuberant Blue Serene
Comical Sad Tranquil Humorous
Gloomy Loving Amusing Yearning
Tender Playful Longing Beautiful
Cheerful Lonely Romantic

B. Factor Analysis and Removal of Items

An exploratory factor analysis with oblimin rotation was run on the data. After the initial factor analysis, we then sought to reduce the number of items from 39 to 15. Our first step in the removal of items was to remove items that did not seem to effectively describe each factor as a whole. Here we removed the adjectives victorious, heroic, stately, majestic, determined, angry, raging, cruel, hateful, and pleasant. After removing these items, we then looked at each item's factor score across all three factors and removed items that did not load clearly on a single factor, which we operationalized as having a loading of less than .5 on all factors. This excluded cheerful, longing, and contemplative.

Next we eliminated items that scored high on item uniqueness, thus excluding vigorous, frustrated, relaxed, loving, and beautiful. Our last exclusion criterion to reduce the number of items was to remove items that appeared to be synonyms of other included items, which then removed the word reflective, since two native English speakers deemed it synonymous with contemplative. The final subset of items retained, along with their factor loadings, can be seen in Table 2. The three factors were interpreted as Vibrant, Dark, and Tranquil in collaboration with industry professionals who were going to implement the tool.

Table 2. Factors, Items, and Their Loadings from EFA Model

FACTOR    ITEM        LOADING
Vibrant   Vibrant     .623
          Exuberant   .635
          Humorous    .956
          Amusing     .954
          Playful     .711
Dark      Depressive  .891
          Blue        .727
          Sad         .941
          Lonely      .768
Tranquil  Peaceful    .903
          Calm        .837
          Gentle      .835
          Serene      .783
          Tranquil    .867
          Tender      .662

C. Confirmatory Factor Analysis

After the EFA, the 15 items were then presented to participants in a web-study via Soundout's SliceThePie.com interface, and N = 6006 ratings over 60 songs (each participant rated one song) were collected. Participants were asked the same prompt as in the original study, "Please rate this track as if it were a person; to what extent would you describe them on each dimension?", but instead responded on a ten-point scale. Participants also made ratings on a liking prompt and a similarity prompt that are to be used in future tool creation. The similarity and liking data were not used in the confirmatory factor analysis.

D. Confirmatory Factor Analysis Model Fitting

The initial confirmatory factor analysis accounted for all 15 items on their respective factors. While some of the fit indices in Model 1 are acceptable in terms of confirming the initial hypothesis (Standardized Root Mean Square Residual, SRMR < .08), the Root Mean Square Error of Approximation (RMSEA) did not reach an acceptable threshold of < .1. This could be due to the fact that modifications were made post hoc to the initial EFA in order to accommodate industry concerns. In order to investigate this, we removed five items (comical, victorious, relaxed, loving, romantic) from the CFA that were only present in the initial EFA. Removal of these items did decrease the RMSEA from .102 to .085, which has been deemed an acceptable level of CFA fit. Additionally, the problem of fit was also explored by reducing the number of items in the Tranquil factor, due to a non-significant relationship between Vibrancy and Tranquil in the first model. A third analysis with a reduced number of items from each factor was created. By dropping some of the items that did not load as highly in the initial EFA, the RMSEA dropped again to .071. Additionally, this final model, with 6 fewer items than the first model, eliminated the non-significant covariance between the Vibrancy and Tranquil factors. The three resulting factors yielded Cronbach's alpha scores of .88 for the Vibrancy factor, .85 for the Dark factor, and .87 for the Tranquil factor.

E. Calculation of Euclidian Distance from the Semantic Differential

Using the item ratings from Table 3, it is then possible to create a semantic differential relationship between any two items measured with the tool. If a client's goal is to match music to brand in order to calculate brand fit, a brand manager would then need to rate his or her perception of their brand using the same prompt used for the music. We highly recommend that multiple brand managers make these ratings independently of one another and check for inter-rater reliability.

After this process is complete, each rating can then be multiplied by the factor scores shown in Table 3, which creates a metric against which the songs can be objectively compared. This same process is done for each song that was rated. This standardized metric can then be used to calculate the distance measure on each of the three factors shown in Table 3, or a combined metric can be computed taking all three factors in the table into account simultaneously. An example of this can be seen in Figure 1. In Figure 1, each line represents the distance, in two-dimensional Euclidian space, between a song or brand and the others. If a client is looking to match song and brand, they will look for lines that appear to match up with one another. This process can also be done quantitatively by looking at the distance measures. If a line extends in a different dimension from the target, that choice should be avoided, since it does not match accordingly.

Figure 1. Example of Mapping Euclidian Distance

III. DISCUSSION

Unlike tools created in the past to examine the relationship between music and brand, this new tool can now provide an objective reference point to frame discussions regarding the relationship between music and brand. The tool has been dubbed SightSync and is used in a suite of music market research tools now implemented by the two companies involved in its creation: Soundout (https://www.soundout.com/) and Hush (http://www.hushmusic.tv/). The tool is easy to use and does not require any commercial statistical software to run. By simply entering the factor values into most spreadsheet software, ratings can then be compared between brand personality and music.

Beyond providing a starting point for conversations between brand managers and creatives, the tool can also serve as a perceptual similarity tool for finding similarity between tracks of music that share the same affective categorization but may not share the same features, such as copyright details. This would be helpful for an advertising company looking to use a certain track for a campaign or ad but limited in terms of funding for rights and royalties. If the advertising company knew they wanted an effect similar to a song that may be out of their price range, they could simply rate that song with the SightSync tool, then use those scores to find similar songs from a larger database that scored closely to the target song. This would allow the marketing company to hit their target affect without having to pay expensive licensing fees.

The development and creation of this tool also generated interest in future directions and refinements of our current model. The largest critique from industry was that the language used in the original collection of items was dated and did not reflect current industry terminology. Industry professionals noted that a similar tool measuring value judgments of music, as opposed to perceived emotion, would be helpful. A second version of the tool is planned that will follow a similar methodology but draw on more industry oversight in the creation of the original list of terms relating to different facets of value judgments of music, which would then be analyzed further using more sophisticated language-processing techniques, such as Latent Semantic Analysis or Natural Language Processing, to get a more accurate understanding of the factors used to group the music.
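The combined brand-fit metric described in Section E, which takes all three factor scores into account simultaneously, can be sketched as a Euclidean distance. The factor names come from the tool; the ten-point example scores below are invented for illustration.

```python
import math

# Sketch of the combined brand-fit metric from Section E: a brand and
# each song are scored on the three factors (Vibrant, Dark, Tranquil),
# and the combined metric is the Euclidean distance between the two
# score vectors. Example scores are invented for illustration.
FACTORS = ("Vibrant", "Dark", "Tranquil")

def brand_fit_distance(brand_scores: dict, song_scores: dict) -> float:
    """Euclidean distance across the three factor scores (smaller = better fit)."""
    return math.sqrt(sum((brand_scores[f] - song_scores[f]) ** 2 for f in FACTORS))

brand  = {"Vibrant": 7.0, "Dark": 2.0, "Tranquil": 4.0}
song_a = {"Vibrant": 7.0, "Dark": 2.0, "Tranquil": 4.0}  # identical profile
song_b = {"Vibrant": 4.0, "Dark": 6.0, "Tranquil": 4.0}  # poorer fit

# A client would prefer the song with the smaller distance to the brand.
```

Ranking a larger database of pre-rated songs by this distance gives exactly the kind of "closest affective match" search described for licensing-constrained campaigns.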
Table 1. Experimental conditions.

Live performance: Condition 1 (audio and visual access to performance), Condition 2 (audio access, no visual access to performance)
Recorded performance: Condition 3 (audio and visual access to performance), Condition 4 (audio access, no visual access to performance)

B. Stimuli

A Master of Music student performed roughly ten minutes of piano music, including: Piano Sonata no. 32 in C, op. 111, I, by Beethoven; Anamorfosi, by Sciarrino; and Liza, by Gershwin and Wild. Every thirty seconds, a researcher displayed a prompt to the participants to indicate their present emotional response via two five-point Likert scales, found in Figures 1 and 2.

Figure 1. Pleasantness scale.

Figure 2. Arousal scale.

C. Procedure

In each condition, participants first completed the Ollen Musical Sophistication Index on paper. The researcher then gave participants the following prepared instructions for the experiment:

"You will hear a musical performance. At various moments during the performance, you will see a numbered cue card reading, 'Please record your current emotional response.' When you see the cue, you will find its corresponding number on your response sheet and circle the phrase that best describes your current emotional response on both scales given. The musical performance will not pause for you to respond; respond quickly and continue to listen to the musical performance. The first scale asks you to record your emotional response on a scale from 'Very Unpleasant' to 'Very Pleasant.' The second scale asks you to record your emotional response on a scale from 'Not Aroused at All' to 'Very Aroused.' Arousal is a state of heightened physiological activity. For example, if you were to feel slightly guilty, you might assess your emotion as 'Unpleasant' as well as 'Aroused,' whereas if you were to feel very guilty, you might assess your emotion as 'Very Unpleasant' as well as 'Very Aroused.' For another example, consider the emotion, 'sleepy.' You might assess this emotion as 'Pleasant' as well as 'Not Aroused at All.'"

As the pianist played, a researcher held up a cue card every thirty seconds, prompting the participants to indicate their present emotional response on their response sheets.

D. Participants

79 university students (37 females and 42 males, ages ranging from 18 to 40 [M = 20]) participated in this experiment.

III. RESULTS

A two-way ANOVA was conducted to examine the effect of performance setting (live or recorded performance) and visual access (audio and visual access to the performance, or only audio access to the performance) on responses on both the pleasantness scale and the emotional arousal scale. There was a statistically significant interaction between the effects of performance setting and visual access on pleasantness scale responses one minute into the Beethoven piece, F(2, 78) = 4.777, p = .032; 2.5 minutes into the Beethoven piece, F(2, 78) = 4.886, p = .030; and one minute into the Sciarrino piece, F(2, 78) = 4.010, p = .049. Interactions are displayed in Figures 3, 4, and 5.

Participants attending the live performance indicated significantly higher pleasantness at 2 minutes into the Gershwin/Wild piece, F(1, 75) = 4.756, p = .032, and 3.5 minutes into the same piece, F(1, 75) = 4.860, p = .031. Participants attending the live performance also indicated significantly higher emotional arousal at 30 seconds into the Beethoven piece, F(1, 75) = 4.141, p = .045; 2.5 minutes into the Gershwin/Wild piece, F(1, 75) = 13.377, p < .001; and 3 minutes into the Gershwin/Wild piece, F(1, 75) = 4.677, p = .034.

Musical sophistication was not found to significantly affect responses on either the pleasantness or the emotional arousal scale in any condition.

Figure 3. Interaction between the effects of performance setting and visual access on pleasantness scale response 2, F(1, 75) = 4.777, p = .032.
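The two-way ANOVA used above (performance setting × visual access) can be sketched for a balanced 2 × 2 design using the standard sums-of-squares decomposition. The data values below are invented for illustration, not taken from the experiment.

```python
# Sketch of a balanced two-way (2 x 2) ANOVA of the kind reported
# above (performance setting x visual access), computed by hand with
# the standard balanced-design formulas. Data values are invented.
from statistics import mean

def two_way_anova(cells):
    """cells[(i, j)] -> list of observations for level i of A, level j of B.

    Returns (F_A, F_B, F_AB) for a balanced design.
    """
    a_levels = sorted({i for i, _ in cells})
    b_levels = sorted({j for _, j in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    a, b = len(a_levels), len(b_levels)

    grand = mean(x for obs in cells.values() for x in obs)
    mean_a = {i: mean(x for j in b_levels for x in cells[(i, j)]) for i in a_levels}
    mean_b = {j: mean(x for i in a_levels for x in cells[(i, j)]) for j in b_levels}
    mean_ab = {k: mean(v) for k, v in cells.items()}

    ss_a = b * n * sum((mean_a[i] - grand) ** 2 for i in a_levels)
    ss_b = a * n * sum((mean_b[j] - grand) ** 2 for j in b_levels)
    ss_cells = n * sum((mean_ab[k] - grand) ** 2 for k in cells)
    ss_ab = ss_cells - ss_a - ss_b
    ss_within = sum((x - mean_ab[k]) ** 2 for k in cells for x in cells[k])

    ms_within = ss_within / (a * b * (n - 1))
    f_a = (ss_a / (a - 1)) / ms_within
    f_b = (ss_b / (b - 1)) / ms_within
    f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_within
    return f_a, f_b, f_ab

# A pure-interaction example: no main effects, a crossover interaction.
data = {
    (0, 0): [0.0, 2.0], (0, 1): [2.0, 4.0],
    (1, 0): [2.0, 4.0], (1, 1): [0.0, 2.0],
}
f_a, f_b, f_ab = two_way_anova(data)
```

A significant interaction F, as found for the pleasantness responses during the Beethoven and Sciarrino excerpts, indicates that the effect of visual access differs between the live and recorded settings.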
Figure 4. Interaction between the effects of performance setting and visual access on pleasantness scale response 5, F(1, 75) = 4.886, p = .030.
Our ability to predict upcoming events in music is thought to arise from a coupling of
auditory and motor brain networks that establish expectations for when events (i.e.,
rhythmic sounds) are likely to occur. Evidence for this view is derived from
neuroimaging studies showing engagement of low- (e.g., delta) and high- (e.g., beta)
frequency oscillations sourced to auditory and motor regions that are time-locked to
predictable auditory events, even when no overt movement is made by the listener. Despite
these findings, no direct relationship has been established between neural correlates of
audio-motor coupling during auditory prediction and motor trajectory phases. Here we
test for this relationship. Musicians and nonmusicians (right-handed) listened to
isochronous tone sequences at two tempos while we recorded EEG and motion capture
data simultaneously in separate phases of listening and tapping. When listening,
participants identified deviant (louder) tones and did not synchronize movement to the
sequence. In the tapping task, participants synchronized right index finger tapping with
the sequence. Data collection and analysis are ongoing. Our analysis will correlate the
strength of delta-beta coupling localized to auditory and motor regions in the listening
task to consistency of movement trajectory phases. In a second analysis, we will compute
the amount of mutual information shared between time series of neural activity during the
listening task, neural activity during the tapping task, and motion trajectory time series
during tapping. If we find high correspondence between auditory prediction networks and
those involved in sensorimotor synchronization, we suggest auditory-motor linkages
serve as a substrate for both rhythmically timed prediction and action, consistent with
current views. If weak correspondence is found, we suggest an alternative view wherein
auditory temporal prediction networks are complementary to those involved in timed
actions.
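The planned second analysis — the amount of mutual information shared between neural and motion time series — can be illustrated with a minimal sketch. The plug-in histogram estimator, the bin count, and the log base below are our assumptions for illustration, not the authors' pipeline:

```python
# Illustrative sketch (not the authors' pipeline): a plug-in histogram
# estimator of mutual information between two discretized time series.
from collections import Counter
from math import log2

def discretize(series, n_bins=4):
    """Map each sample to an equal-width bin index over the series' range."""
    lo, hi = min(series), max(series)
    if hi == lo:                       # constant signal -> single bin
        return [0] * len(series)
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in series]

def mutual_information(xs, ys, n_bins=4):
    """MI in bits between two equal-length series via joint/marginal counts."""
    assert len(xs) == len(ys)
    bx, by = discretize(xs, n_bins), discretize(ys, n_bins)
    n = len(bx)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts cancelled out
        mi += (c / n) * log2(c * n / (px[i] * py[j]))
    return mi
```

In the study's terms, `xs` and `ys` would stand in for, e.g., neural activity during listening and the motion-trajectory time series during tapping; high MI would indicate shared structure between the two.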
ISBN 1-876346-65-5 © ICMPC14 275 ICMPC14, July 5–9, 2016, San Francisco, USA
University Institutional Review Board certified this study for compliance with ethical standards.

B. Apparatus/Materials
Skin conductance measurements were collected by attaching two small gold-plated brass sensors (NeuroDyne Medical) to the participant's non-dominant index and ring fingers. Measurements were collected via the open-source e-Health biofeedback shield for biomedical applications (Libelium Comunicaciones) at a sampling rate of 100 Hz. Physiological data and audio markers were stored in plain-text format and uploaded into a Matlab-based software platform for the analysis of skin conductance data (Ledalab). The analytical process is detailed in the Results section below.

Stimuli were recorded at the beginning of each experimental session via Matlab. Participants were asked to sing and record a sustained pitch (approximately 3 seconds long) near the middle portion of their vocal register. After recording this tone, the spectral envelope of the participant's sung pitch was re-synthesized using the Matlab-based TANDEM-STRAIGHT vocoder. Four unique spectral envelope ratio (SER) manipulations were applied to the recorded tone; these manipulations were nominally labeled as very small (SER × 0.7143), small (SER × 0.833), large (SER × 1.2), and very large (SER × 1.4) stimuli. Visualizations of the stimuli can be found in Figure 1.

Figure 1a-d. Example waveforms and spectral slices for each of the four SER manipulations used in the study. Spectral modifications were accomplished via the TANDEM-STRAIGHT vocoder. Figure 1a (upper left) = very small, Figure 1b (upper right) = small, Figure 1c (lower left) = large, Figure 1d (lower right) = very large.

In order to quickly generate experimental trials for each participant, we used the SoX command-line audio tool to concatenate their resynthesized tones. Stimulus order within each trial was quasi-randomized via 24 unique concatenation scripts; these scripts ensured that any given stimulus was not presented consecutively, and that all four stimuli appeared in each ordinal position. Following a paradigm similar to that of Olsen & Stevens (2013), each of our trials consisted of 8 stimuli (i.e., each stimulus was presented twice) with approximately 15 seconds of silence between each stimulus.

C. Procedure
We collected baseline skin conductance data for a period of two minutes before presenting each participant with their uniquely generated trial. Participants were seated in a comfortable chair and instructed to refrain from moving their non-dominant hand. We then read them the following instructions:
“You will now hear a collection of sounds. While you are listening, all you must do is relax and keep your non-dominant hand still. This portion of the experiment will last approximately 4:30.”
After listening to the complete trial, we removed the skin conductance electrodes, and then participants once again listened to their uniquely generated trial. This time, participants provided spoken feedback to indicate whether the stimuli were perceived as originating from a larger or smaller sound source, relative to themselves. The experimenter transcribed these data points directly into a spreadsheet.

III. Results
When assessing the perceived size of each sound, participants on average correctly identified the sound source size as being relatively larger or smaller than themselves 85.05% (SD = 15.99%) of the time. Of the 29 participants, 11 correctly assessed the size of all 8 sounds, 7 misidentified only 1 sound, 4 misidentified 2 sounds, 6 misidentified 3 sounds, and 1 participant misidentified 4 sounds. No participant misidentified 5 or more sounds.

Skin conductance recordings were analyzed using Ledalab. The majority of our participants were unfortunately deemed to be non-responders; that is, their phasic skin conductance data did not reveal any significant SCRs. Our primary conjecture for this finding was that the room in which our experiment was conducted was relatively cold and potentially inhibited participants' normal electrodermal activity. Methodological studies on skin conductance recording suggest that measurements are optimal at an ambient temperature between 22-24 °C (Braithwaite et al., 2015; MacNeill & Bradley, 2016); our lab space fluctuates between 17-19 °C. Therefore, we reviewed our initial dataset and removed non-responders via visual inspection of the data. Because we are here reporting on a pilot study, we will offer a few modest descriptive findings from our small subset of responders (n = 5).

Each participant's trial data were processed with Ledalab's adaptive smoothing feature, followed by a Continuous Decomposition Analysis (Benedek & Kaernbach, 2010). Prior to analysis, we ensured that the audio markers for the onset of each stimulus were correct. The Continuous Decomposition Analysis (CDA) then analyzed a response window from 1 s to 4 s after each stimulus onset. The relevant event data that we chose to incorporate into our statistical models included the sum of significant skin conductance response (SCR) amplitudes, the maximum value of phasic activity, and the response latency of the first significant SCR
within the response window. Due to the small dataset, we reduced the four levels (much smaller, smaller, larger, much larger) of our independent variable (relative size manipulation) into two groups (smaller & larger).

Our three skin conductance variables of interest are plotted as a function of size condition (i.e., smaller & larger) in Figures 2a-c. In our pilot data, two of the variables were skewed in the direction predicted by our hypothesis. Specifically, both the sum of SCR amplitudes and the phasic maximum tended to increase as SER manipulations of size increased, as predicted. However, it bears reminding that these results are based on a very small sample of responders. We also expected that SCR latency times would be shortest for sounds that were perceived to be larger; however, these data were skewed opposite to our predicted direction.

IV. DISCUSSION
Our goal for this pilot study was to determine whether listeners would be capable of classifying sounds as being larger or smaller than themselves, and further, to determine whether relatively larger sounds would be more likely to result in induced emotional responses. The results, although preliminary, warrant further research. On average, participants in our study were 85% accurate when judging sound source size relative to their own size. For our small subset of responder participants, we observed a positive relationship between relatively larger sound sources and both summed SCR amplitudes and SCR phasic maximums. We did not observe shorter SCR latency times for larger stimuli.

We believe that our results from this pilot study provide a novel insight towards understanding induced musical affect
by “normalizing” sound stimuli to each specific listener. This
approach, combined with a better understanding of how sound
source size is inferred for instrumental timbres, has a number
of musical implications. First, listener-normalized emotional
responses might reveal how various orchestrations, from
sparse to grandiose, can reliably induce certain types of
emotional responses via music. Secondly, listener-normalized
emotional responses might be useful for understanding certain
types of affective human computer interactions. Finally,
listener-centered approaches to emotional communication
within music might inform us about variability in previously
conducted studies of induced emotion.
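The three event-level SCR measures entering the statistical models above (summed SCR amplitudes, phasic maximum, and first-SCR latency within the 1-4 s post-onset window) can be illustrated with a deliberately simplified sketch. Ledalab's Continuous Decomposition Analysis is far more sophisticated (it separates tonic and phasic drivers); the naive peak picking and the 0.01 µS threshold here are our illustrative assumptions:

```python
# Simplified sketch of the three event-level SCR measures; not Ledalab's
# CDA. Peak picking and the 0.01 muS threshold are assumptions.
def scr_features(phasic, onset_s, fs=100, win=(1.0, 4.0), thresh=0.01):
    """phasic: phasic skin-conductance samples (muS) sampled at fs Hz.
    Analyzes win seconds after the stimulus onset (onset_s, in seconds)."""
    lo, hi = int((onset_s + win[0]) * fs), int((onset_s + win[1]) * fs)
    window = phasic[lo:hi]
    # Local maxima above threshold count as SCR peaks in this toy version
    peaks = [(i, window[i]) for i in range(1, len(window) - 1)
             if window[i] >= thresh
             and window[i - 1] < window[i] >= window[i + 1]]
    amp_sum = sum(amp for _, amp in peaks)          # summed SCR amplitudes
    phasic_max = max(window, default=0.0)           # maximum phasic value
    latency = win[0] + peaks[0][0] / fs if peaks else None
    return amp_sum, phasic_max, latency

# Synthetic trial: a single 0.05 muS response peaking 2.5 s after onset
phasic = [0.0] * 600
for i in range(240, 261):
    phasic[i] = 0.05 - abs(i - 250) * 0.002
```

On this synthetic trial the sketch recovers one peak of 0.05 µS at a 2.5 s latency; real phasic data would of course contain drift, noise, and overlapping responses that the CDA is designed to handle.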
ACKNOWLEDGMENTS
The authors both wish to thank Illinois Wesleyan
University for providing financial support to realize this
research. Joe Plazak wishes to thank both Stephen McAdams
and Marc Pell (McGill University) for their support
throughout his research leave.
REFERENCES
Benedek, M., & Kaernbach, C. (2010). A continuous measure of phasic electrodermal activity. Journal of Neuroscience Methods, 190(1), 80-91.
Boucsein, W. (2012). Electrodermal activity (2nd ed.). New York: Springer Science+Business Media, LLC.
Braithwaite, J. J., & Watson, D. G. (2015). Issues surrounding the normalization and standardisation of skin conductance responses (SCRs). Technical Research Note. Selective Attention & Awareness Laboratory (SAAL), Behavioural Brain Sciences Centre, School of Psychology, University of Birmingham.
Clarke, E. F. (2005). Ways of listening: An ecological approach to the perception of musical meaning. Oxford University Press.
Gabrielsson, A. (2002). Emotion perceived and emotion felt: Same or different? Musicae Scientiae, 5(1 Suppl.), 123-147.
Gaver, W. W. (1993). What in the world do we hear?: An ecological approach to auditory event perception. Ecological Psychology, 5(1), 1-29.
Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3), 217-238.
Kawahara, H., Takahashi, T., Morise, M., & Banno, H. (2009). Development of exploratory research tools based on TANDEM-STRAIGHT. In Proceedings: APSIPA ASC 2009: Asia-Pacific

Figure 2a-c. Three SCR measurements as a function of experimental stimuli (relatively smaller or larger than each participant).
pattern has a duration of 16 bars.

Table 1. The 14 semiotic structures used in the experiment.

AABB-AAAA    ABCD-ABCD
AABB-AABB    ABCD-AECF
AABB-AACC    ABCD-BCDB
AABB-BBAA    ABCD-CECF
AABB-CCAA    ABCD-DADB
AABB-CCAB    ABCD-EFAB
AABB-CCDE    ABCD-EFGH

The semiotic patterns were created such that the first 8 bars were either ABCD or AABB. The second 8 bars fell into three categories when considering the particular chord and its metrical position in the pattern: either the semiotic structure of the first half was completely preserved (as in AABB-AABB and ABCD-ABCD), some of the semiotic structure of the first half was preserved (e.g., AABB-AACC and ABCD-CECF, where the repeated chords are in bold), or none of the semiotic structure of the first half was preserved (e.g., AABB-CCDE and ABCD-DADB). Finally, the number of unique chords in the second half of the semiotic structure ranged from 1 chord (AAAA) to 4 chords (ABCD or EFGH). The semiotic structures used in the listening experiment are displayed in Table 1.

The listeners were presented with a total of 56 UT anthems (4 instances per unique semiotic pattern), each 30 seconds in duration, and the order of trials was randomised across participants. Two practice trials were played before the experiment to help the participant become familiar with the experimental procedure. After listening to each trial, the listeners' task was to provide enjoyment ratings on a 7-point Likert scale, where 1 was equivalent to ‘Did not enjoy at all’ and 7 represented ‘Enjoyed very much.’

The experiment was conducted on a MacBook Pro laptop using a GUI constructed for data collection for this experiment. Participants listened to the stimuli through headphones set at a comfortable listening volume, in a quiet room.

C. Participants
Twenty volunteers, recruited via email advertisement and flyers around the Queen Mary University of London campus, participated in the experiment (mean age = 28.6 yrs, SD = 7.9 yrs; 6 female and 14 male). Participants included undergraduate students, graduate students, and staff, all of whom reported prior experience listening to trance music. Each volunteer received £7 for his or her participation.

D. Stimulus generation
To avoid any potential confound of the actual chords presented (as opposed to the overarching semiotic structure), four different stimuli were generated for each semiotic structure listed in Table 1. To this end, the first chord of each sequence was either Bmin, Dmaj, Emin, or Gmaj. The generation was performed using a method which allows sampling high-probability chord sequences with respect to a given semiotic structure (Conklin, 2015). Sequences were generated from a statistical model that encodes chord transition probabilities of a corpus of 100 uplifting trance chord loops designed specifically for this study¹. The sequences with the highest probability between successive chords were selected, and key modulations within the sequences were avoided as much as possible. Each chord sequence was then rendered within the Digital Audio Workstation Logic Pro X (LPX) by starting from an existing uplifting trance template² and applying as few pitch modifications as possible to make its harmonic structure fit the generated chord sequence (Bigo & Conklin, 2015). As this transformation task leaves rhythm, instrumentation, and audio effects unchanged, the generation results in a set of stimuli that differ only in their harmonic properties. The entire generation process was automated by using a system which interacts with LPX through the MIDI protocol (Conklin & Bigo, 2015).

E. Results
To test the impact of repetition on enjoyment ratings, a 2 × 3 × 4 ANOVA was conducted for three types of repetition found within the semiotic structures as described above. These three repetition variables were: 1) the structure of the first 8 bars (AABB or ABCD), 2) the number of chords in common, and in the same metrical position, between the first and second 8 bars of the sequence (all the same, partially the same, or no common chords), and 3) the number of unique chords in the second 8 bars (ranging from 1 to 4 chords). There was a significant effect of variable 3 (the number of unique chords in the second 8 bars), F = 4.24, p < 0.01, such that more chord diversity yields higher enjoyment ratings. A significant interaction was also found between variable 1 (semiotic structure of the first 8 bars) and variable 2 (all, some, or no chords in common between the first and second 8 bars), F = 4.09, p < 0.05. The highest enjoyment ratings were elicited when all the same chords were present in the first and second half of the stimulus (AABB-AABB and ABCD-ABCD). For stimuli beginning with ABCD, high enjoyment ratings were also found when none of the same chords were present in the first and second half of the stimuli. The lowest enjoyment ratings were elicited when the second 8 bars contained only some of the same chords as the first 8 bars, especially for stimuli beginning AABB. Lastly, a post hoc analysis indicated that semiotic patterns occurring in the Uplifting Trance Anthem Loops Corpus (i.e., native structures) were preferred over those not in the corpus, t = 2.36, p < 0.01.

F. Discussion
Although there was a trend of increasing enjoyment for stimuli with less repetition in the last 8 bars (that is, more unique chords in the second half of the stimulus), this was not true for sequences beginning AABB. The findings indicate that enjoyment is elicited by semiotic patterns that are either very repetitive, namely AABB-AABB and ABCD-ABCD, or fairly complex, such as ABCD-EFAB and ABCD-BCDB; but enjoyment does not result from sequences that are only slightly repetitive, such as AABB-AACC. Violations of

¹ The Uplifting Trance Anthem Loops Corpus (UTALC), which was constructed for this study, may be found here: http://www.lacl.fr/~lbigo/utalc
² Uplifting Trance Logic Pro X Template by Insight, DAW Templates, Germany.
ACKNOWLEDGMENT
This research is supported by the project Lrn2Cre8 which is
funded by the Future and Emerging Technologies (FET)
programme within the Seventh Framework Programme for
Research of the European Commission, under FET grant
number 610859. This project also received funding from the
European Union’s Horizon 2020 research and innovation
programme under grant agreement number 658914.
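The three repetition variables that the trance study's 2 × 3 × 4 ANOVA crossed can be derived mechanically from a semiotic-structure label such as "AABB-CCAB". The coding below is our illustration, not the authors' implementation:

```python
# Sketch of coding the three repetition variables from a semiotic label;
# the function name and coding are ours, not the authors'.
def repetition_variables(pattern):
    """For a label like "AABB-CCAB", return (first-half structure,
    number of chords in common at the same metrical position between
    the two halves, number of unique chords in the second half)."""
    first, second = pattern.split("-")
    in_common = sum(a == b for a, b in zip(first, second))
    return first, in_common, len(set(second))
```

Here `in_common == 4` corresponds to "all the same" (AABB-AABB, ABCD-ABCD), `in_common == 0` to "no common chords" (e.g., ABCD-EFGH), and anything in between to "partially the same" (e.g., AABB-AACC).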
Which structural features in music evoke chills, and how are chills related to emotion? 20
participants each brought 3 chill-eliciting pieces of their own choice. In an interview, they
indicated the chill-inducing passages verbally and evaluated the intensity of felt emotions
using the Geneva Emotional Music Scale (GEMS-25; Zentner et al., 2008),
complemented by items which were found to be important in a previous study. On
average, each participant indicated 5.9 chill-inducing sections, making a total of 118
passages. On factor level, the emotion most strongly associated with chills was wonder,
followed by power, transcendence and joyful activation, all of which were rated higher
than tension and sadness. On item level, we found highest ratings for fascinated, strong,
liberated (item added by authors), happy and allured, whereas emotions like sad, tense,
soothed, nostalgic or tearful received very low ratings. This contradicts Panksepp (1995),
who suggested that musical chills are associated with the emotions sad/melancholy and
thoughtful/nostalgic. Analysis of the 118 passages showed that chills often occur after a
relatively monotonous section, characterized by a repetition of harmony, rhythm and
melody, a delay of progression, a reduced number of instruments or short silence. The
chill-inducing passage itself featured a sudden increase in loudness, an entry or change of
instrument(s), a melodic peak, a change in melodic range, or a sudden
rhythmic/harmonic change. In sum, the emotions associated with chills are attributed
to two contrasting factors: wonder and transcendence (sublimity factor) on the one hand,
power and joyful activation (vitality factor) on the other hand; this could give further
insight into understanding the phenomenon of chills. As observed by Grewe et al. (2007),
chills were often evoked by sudden changes; in addition, we found that the preceding
passage was characterized by a noticeable lack of change or activity, suggesting that the
chill was evoked by the contrast between the two passages.
Affective evaluations of music can often involve integration of multiple sensory cues.
Previous work in this area has focused on the role of visual cues that performers combine
with acoustic cues to express particular emotions. These visual cues include facial
expression and body movement/posture. This paper examines the role of olfactory stimuli
in music emotion perception. Despite evidence that odours can affect general perceptual
and cognitive processing, little work has been done in the context of music. Of particular
interest is the possibility of affect-congruent judgment biases in relation to music. As
valence is a primary dimension in characterizing odour, we are interested in the effect
this may have on the perceived valence of music.
The aim of this study is to investigate the effect of odour on the perception of musical
emotions. Specifically, we are interested in changes in strength of perceived emotions
and the time taken to make emotion judgements.
Preliminary findings show that in the odour condition, the strength of ratings for
negatively valenced music is reduced. Furthermore, the response times of emotion
judgements are significantly shorter in the odour condition.
This early study suggests that environmental odour can modulate emotional judgements
of music. More specifically, sad music is perceived as less sad in the presence of a
positively valenced, pleasing odour. In addition, judgements of perceived emotions are
made in less time in the presence of a pleasing odour.
Background
Previous work has revealed systematic music-to-color cross-modal associations in non-
synesthetes that are mediated by emotion (e.g., Palmer et al., 2013; Langlois et al., VSS-
2013; Whiteford et al., VSS-2013).
Aims
We asked what identifiable features might mediate cross-modal associations from music to
visual texture, using single-line melodies that vary along highly controlled musical
dimensions: register (low/high), mode (major/minor), note-rate (slow/medium/fast), and
timbre (piano-sound/cello-sound). Are these associations mediated by emotional and/or
non-emotional features?
Methods
Non-synesthetic participants made visual texture associations to each of 32 variations on
4 melodies by Mozart from a 7×4 array of black-and-white visual line-based textures.
Results
Cross-modal melody-to-texture associations were at least partly mediated by emotion,
because the correlations between the emotional ratings of the music and the emotional
ratings of the chosen textures were high (Angry/Not-Angry: r=.79, Calm/Agitated: r=.91,
Active/Passive: r=.76). However, unlike music-to-color associations, the Happy/Sad
correlation (r=.31) was not significant. Melody-to-texture associations were also
mediated by sensory-perceptual features (e.g., Sharp/Smooth: r=.96, Curved/Straight:
r=.92, Simple/Complex: r=.89, Granular/Fibrous: r=.80). Importantly, the cross-modal
melody-to-texture associations corresponded to specific musical features. k-means
clustering analyses revealed strong timbre and note-rate effects: cello melodies
were paired with straight/sharp/fibrous textures, and piano melodies with
curved/smooth/granular textures. Different textures were also chosen for different note-
rates, with more granular/separate textures being chosen with slow note-rate melodies,
and more fibrous/connected textures being chosen with faster note-rate melodies.
Conclusions
Both emotional and sensory-perceptual features are important for understanding melody-
to-texture associations.
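The mediation evidence above rests on correlating two rating series, e.g., the Calm/Agitated ratings of each melody against the Calm/Agitated ratings of the textures chosen for it. As a minimal sketch — the two rating vectors below are invented stand-ins, not the study's data — the Pearson r used for such comparisons is:

```python
# Minimal sketch of the mediation test: Pearson r between emotion ratings
# of melodies and of their chosen textures. Rating vectors are invented.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

melody_calm = [1.2, 2.0, 3.1, 4.5, 5.0]    # invented per-melody ratings
texture_calm = [1.0, 2.2, 2.9, 4.4, 5.3]   # invented per-texture ratings
```

A value near the reported r = .91 would mean melodies rated calmer were matched with textures independently rated calmer.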
Until recently, there has been a lack of replication studies conducted in the field of music
psychology. A highly cited paper in the field by Juslin and Gabrielsson (1996) reported
that performers' intended emotional expressions were decoded by listeners with a high
degree of accuracy. While there have been related studies published on this topic, there
has yet to be a direct replication of this paper. The present experiment joins the recent
replication effort by producing a multi-lab replication using the original methodology of
Juslin and Gabrielsson. Expressive performances of various emotions (e.g., happy, sad,
angry, etc.) by professional musicians were recorded using the same melodies from the
original study and are subsequently being presented to participants for emotional
decoding (i.e., participants will rate the emotional quality of each excerpt using a 0-10
scale). The same instruments from the original study have been used (i.e., violin, voice,
and flute), with the addition of piano. Furthermore, this experiment investigates potential
factors (e.g., musicality, emotional intelligence, emotional contagion) that might explain
individual differences in the decoding process. Finally, acoustic features in the recordings
will be analyzed post-hoc to assess which musical features contribute to the effective
communication of emotions. The results of the acoustic and individual differences
analyses will contribute to a more comprehensive understanding of when music can be an
effective vehicle for emotional expression.
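The planned post-hoc acoustic analysis is not specified in the abstract; as a hedged illustration only, two of the simplest features such an analysis might extract per excerpt (overall energy and a rough noisiness/brightness proxy) can be computed directly from the waveform. Real analyses typically rely on dedicated audio toolboxes:

```python
# Hedged sketch of two elementary acoustic features; the choice of
# features is our assumption, not the replication's actual feature set.
from math import sqrt

def rms(samples):
    """Root-mean-square energy of a mono waveform (list of floats)."""
    return sqrt(sum(s * s for s in samples) / len(samples))

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum((a < 0) != (b < 0) for a, b in zip(samples, samples[1:]))
    return crossings / (len(samples) - 1)
```

Features like these, computed per recording, could then be correlated with decoding accuracy across the happy, sad, and angry performances.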
In Western industrialised society, music listening occurs in a wide range of settings and
serves a wide range of functions. Studies have begun to focus on specific contexts to
explore how these might influence the nature of music listening (e.g. driving: Dibben &
Williamson, 2007; travelling: Heye & Lamont, 2010). This presentation focuses on self-
chosen music listening in the home as a specific context with potential for emotion
regulation (Saarikallio, 2011): listeners have access to their entire music collection and
are able to express and work through emotions more readily in private. The study
explores the fine-grained emotions evoked by music listening using the novel technique
of Subjective Evidence-Based Ethnography (SEBE; Glaveanu & Lahlou, 2012). This
technique obtains first-person digital audio-video recordings from a camera worn at eye level, which form
the starting point for an interview which allows participants to reflect on moment-to-
moment experiences in greater depth than a typical retrospective interview can. Twenty
young adults will wear the glasses and make recordings at home while they are listening
to music over one week. Given the prevalence of music listening in everyday life, this
will generate a large number of musical episodes, some longer and others shorter, that
can be used as a prompt to engage participants in a detailed discussion of the emotions
they experience through music and to situate this in the context of their everyday lives.
Given the fleeting nature of many emotional responses to music and emotions
experienced in music through the processes of self-regulation, this data will shed
considerable light on the ways in which people use music in everyday life to regulate
their emotions. Pilot testing is under way with the glasses to establish the precise nature
of instructions to participants and early results indicate that music listening at home tends
to comprise longer episodes than are typical in other settings.
The relationship between music and emotion has frequently been studied either in
controlled settings or through retrospective reports. However, these approaches lack the
long-term perspective of emotional developments across real concert experiences and
often require participants to disengage from the concert experience. In contrast, this paper
presents results from two experiments in which participants listened to the entirety of
Richard Wagner’s Der Ring des Nibelungen, while being continuously assessed for their
arousal levels. We intend to explore how unconscious arousal affects perception and
memory for musical materials, namely leitmotives in The Ring, as well as use the
physiological data to explore musicologically motivated questions. The first experiment
was conducted live in a large opera house (N=7) and the second in a group screening condition
(N=9). Physiological reactions from all participants were measured via their heart rate,
electrodermal activity, and small scale movements. In addition, individual differences
measures such as musical sophistication, Wagner affinity and knowledge, auditory
working memory span and personality were taken from participants. Participants'
familiarity and recognition memory for individual leitmotives were assessed prior to each
opera. Results from the live experiment suggest that memory for musical material
increases with exposure time for both novices and experts of Wagner’s music alike,
although participants with a higher Wagner affinity generally show stronger arousal
reactions. The strongest reactions from both groups were observed for the opera
Siegfried. Further results, including comparisons between participants in live and screen
conditions and results regarding individual leitmotives are being analysed. The results
suggest that given the differences in physiological and behavioral responses, factors
beyond musical training should be further investigated to better understand the listening
experience.
Age-related hearing loss negatively impacts the perception of speech. While hearing aids
can ameliorate these deficits to some extent, questions exist about their efficacy for
supporting perception of emotion. In normal hearing adults, characteristic physiological
responses are triggered in response to speech emotion. The current study assessed the
extent to which these responses vary across normal hearing, hearing impaired, and
hearing aided older adults. Participants were presented with audio-only samples of
linguistically neutral sentences spoken in a happy, sad, angry, or calm manner, and were
asked to respond with the expressed emotion. Physiological responses included skin
conductance, heart rate, respiration rate, and facial electromyography on cheek (smiling)
and brow (frowning) muscles. Normal hearing participants were both faster and more
accurate in their responses than the hearing-impaired or hearing-aided groups. Normal
hearing participants exhibited an increase in skin conductance in response to negatively-
valenced and high-arousal stimuli (i.e. angry, and to some extent happy and sad). This
increase was not present in hearing-impaired participants but was recovered in hearing-
aided participants. These findings raise important questions about the efficacy of signal
processing strategies employed in modern digital hearing aids.
Loneliness has been identified as a major risk factor for our health, but it can be
influenced positively by the use of media. In order to feel connected to others, people
often immerse themselves emotionally in narratives or form para-social relationships with media
figures (Hawkley & Cacioppo, 2010). There is some empirical evidence (Derrick et al.,
2009; Saarikallio & Erkkilä, 2007) that music could act as a “social surrogate”.
To test the effect of music listening on loneliness, a between-subject design with two
factors was implemented: (i) The need to belong, which comprised three conditions of
autobiographical recall (interpersonal distress, task-related distress, and control), and (ii)
an imagined music listening situation, which included either preferred or casual music.
Both factors were operationalized as writing tasks. Mood was assessed before, between
and after the writing. After the tasks, measures of loneliness and attachment were
collected.
The results of 141 participants indicate that preferred music does not buffer against the
negative emotional effects of interpersonal distress. On the contrary, emotional loneliness
was heightened through the imagination of a situation where one would listen to one’s
preferred music. Yet at the same time, those who were thinking of their preferred music
were in a significantly better mood. So, even though preferred music boosted feelings of
loneliness, it also raised the mood simultaneously.
These intriguing findings can be explained from the perspective of emotion regulation:
Thinking about interpersonal distress may have guided the participants to choose music
that promotes insights and clarification. This ‘mental work’ (Saarikallio et al., 2007) may
have enhanced the feeling of loneliness, but helped them to deal with that conflict, which
raised their mood. So, even if music enhanced loneliness in the short term, it could
contribute to mental health in the long run as it supports healthy coping.
Abstract— Extensive research has concluded that music can induce emotions in a listener. Much of the research has been based on self-report measures. Research suggests indirect measures avoid the biases inherent in self-report. The current study uses an indirect measure (word stem completion task) to look at the subconscious effect of music on emotions. We hypothesized that when participants hear happy, sad, or scary music, they would be primed to generate words associated with the corresponding emotion. In addition, we added a familiarity condition including a familiar and unfamiliar song for each emotion. We hypothesized that participants would have stronger emotional responses to more familiar music. Results showed that the tone of the music did affect the number of related emotion words written.

Keywords— Indirect measures, emotions, familiarity,

I. INTRODUCTION

There has been extensive research confirming that music can induce emotions in listeners [1]. Researchers have used a wide variety of response measures, including self-report rank ordering of emotional responses [2], standard emotion scales [3], and scales developed specifically for music [4]. The self-report scales have been criticized for being confounded by demand characteristics [5]: participants may guess what the researcher is looking for and respond accordingly. Others have used physiological responses, which are thought to be less influenced by demand characteristics [6]. Physiological measures are good for differentiating the intensity of the emotional response; however, they do not clearly differentiate the tone of the emotional response [7]. A previous researcher suggested the use of indirect measures of emotion to avoid demand characteristics while retaining the emotional tone of the participants’ responses [5].

Previous research has shown that a word stem completion task can be used as an implicit measure of emotional responses to happy and sad music [8]. However, across three studies, the word stem completion task only worked as a response measure when the music was familiar.

The current study expands the use of the word stem completion task by comparing music with three different emotional tones (happy, sad, scary) and varying the familiarity of the music. Additionally, since the music can be thought of as priming the participants to think about a particular emotion, we varied whether the music was played before or during the word stem completion task. Our hypotheses were that participants would form word completions that matched the emotional tone of the music, and that the effect would be stronger for more familiar music. We reasoned that playing the music before participants started the task would have them primed before they started on the first words, and therefore the results would be more clear-cut than having them start the task before they heard much of the music.

II. METHOD

A. Participants
The sample included 134 participants, all over the age of 18.

B. Design
A 3 × 2 × 3 mixed design was used. The between-subjects independent variables were the emotional tone of the music (happy, sad, or scary) and the familiarity of the music (familiar or unfamiliar). The within-subjects independent variable was the potential emotional tone of the word stems: happy, sad, or scary. The dependent measures were the completion of the word stems and ratings of the music.

C. Stimuli
We used instrumental music to avoid associations with the lyrics:
• Happy familiar: Build Me Up Buttercup by Pentatonic Karaoke Band
• Happy unfamiliar: Happy Whistling Ukulele by Seastock
• Sad familiar: Arms of an Angel by The O’Neill Brothers Group
• Sad unfamiliar: Hazel Emergency from the original score of The Fault in our Stars
• Scary familiar: Jaws by The City of Prague Philharmonic
• Scary unfamiliar: The Shadow from the original score of Psycho

D. Materials
The materials consisted of a page of 20 word stems, which provided the first two letters of the word to be completed. All of the word stems had potential completions that were neutral in emotional tone. There were five word stems with emotionally congruent completions for each of the three emotional tones of the music, and a final five that had only neutral completions.

In the post-test questionnaire participants rated the music on its emotional tone, familiarity, liking, and how soothing, relaxing, and pleasant they found it. All of these were rated on a seven-point scale, with higher numbers meaning more. Participants also indicated whether they believed the music influenced their performance on the word stem task and whether or not they had musical training. If participants indicated that the music did influence their responses, they were asked to explain how the music influenced them.

L. Edelman is with the Psychology Department, Muhlenberg College, Allentown, PA 18104 USA (phone: 484-664-3426; e-mail: ledelman@muhlenberg.edu).
P. Helm is with the Music Department, Muhlenberg College, Allentown, PA 18104 USA (e-mail: pbhelm@muhlenberg.edu).
S.C. Levine, S.D. Levine, & G. Morrello are students in the Psychology Department, Muhlenberg College, Allentown, PA 18104 USA.
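The scoring implied by the Materials section, counting how many of a participant's completions match each emotional tone, can be sketched as a simple tally. This is a minimal sketch: the category word lists below are hypothetical illustrations, not the study's actual word stems or completions.

```python
# Sketch of scoring word-stem completions by emotional tone.
# The category word lists are hypothetical, not the actual stimuli.
EMOTION_WORDS = {
    "happy": {"smile", "sunny", "glad", "grin", "laugh"},
    "sad": {"cry", "gloom", "tear", "mourn", "lonely"},
    "scary": {"fear", "ghost", "panic", "scream", "dread"},
}

def score_completions(completions):
    """Count how many completed words fall into each emotional category."""
    counts = {tone: 0 for tone in EMOTION_WORDS}
    for word in completions:
        w = word.strip().lower()
        for tone, words in EMOTION_WORDS.items():
            if w in words:
                counts[tone] += 1
    return counts

# A participant in the happy-music condition might produce:
responses = ["smile", "table", "laugh", "tear", "sunny"]
print(score_completions(responses))  # {'happy': 3, 'sad': 1, 'scary': 0}
```

Per-participant counts of this kind would be the dependent measure entering the mixed ANOVA described in the Design section.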
E. Pretest
Previous researchers played the music while participants completed the word stems [8]. We hypothesized that priming the participants with the music before the word stem task would strengthen their responses. We ran the happy and sad conditions by playing the music either before or during the word stem task. There were no statistically significant differences in the results based on when the music was played. Therefore we decided to shorten the experiment by playing the music during the word stem task.

F. Procedure
Participants were told we were investigating the effects of music on word processing. They were given a word stem completion task while listening to either happy, sad, or scary music (either familiar or unfamiliar) and then given the post-test questionnaire to complete.

III. RESULTS

The data were analyzed using a 3-way mixed analysis of variance. There was a significant interaction between the tone of the music and the tone of word stem completion, F(4, 244) = 10.22, p < .01, ηp² = .14. Fig. 1 shows word stem completion was congruent with the tone of the music that was playing.

Participants that completed more happy words liked it more, r(173) = .22, p < .01, and found it more pleasant, r(173) = .26, p < .01. There were no significant correlations based on the sad words completed. Participants that completed more scary words rated the music as less familiar, r(173) = -.17, p = .02, more scary, r(173) = .24, p < .01, less relaxing, r(173) = -.29, p < .01, and found it less pleasant, r(173) = -.25, p < .01.

There were several potential problems with our music choices. Although the pretest data indicated clear categorization for each piece of music, the ratings by the participants in the word stem completion task did not match the pretest ratings. There was a significant difference in liking based on the emotional tone of the music. Happy music was liked much more (M = 6.0, SD = 1.17) than the sad (M = 4.88, SD = 1.74) or scary music (M = 3.75, SD = 1.84), F(1, 167) = 27.06, p < .01, ηp² = .25. Happy music was rated as more familiar (M = 5.80, SD = 1.42) than either the sad music (M = 4.02, SD = 2.04) or the scary music (M = 3.75, SD = 2.25), F(2, 167) = 23.97, p < .01, ηp² = .22. Although the unfamiliar music was always rated as less familiar, the difference in familiarity (between the unfamiliar and familiar pieces) varied by the tone of the music. Thus we were unable to draw any conclusions based on the role of familiarity.

IV. DISCUSSION

Because pretesting did not show a significant difference between whether participants heard the music before or during the word stem completion task, this condition was not included in a final analysis of
the results. However, the results of the main study supported our
hypothesis that indirect measures of emotion can be used in studies of
musically induced emotions. We were able to confirm, via the tone of
the word stems completed, that happy sounding music induced
positive emotions within participants, sad sounding music induced
negative emotions, and scary music induced scary emotions. The
correlations also support this finding as they showed that the
participants’ ratings of the music played a role in the word stem
completion task. We expected the emotional tone ratings to correlate
with the word stem completions, but were surprised that other
ratings such as relaxing, soothing, and pleasant, influenced the word-
stem completion task. Our results for familiarity were less clear. We
had several confounds for our music categorizations. For example,
the happy unfamiliar music was rated higher on familiarity than the sad
and scary music in both the familiar and unfamiliar conditions. The
finding that the happy music was liked more than the
sad or scary music may have also influenced the results.
Another limitation in our study may be in our sad music
choices. For both the happy and the scary music there were
correlations between the word stem completions and the participants’
ratings of the music. However, there were no correlations between
the number of sad words completed and the ratings of the sad music.
This seems to indicate that our sad music choices did not work for
our participants. Future researchers should use songs that are a better
representation of each condition, especially in regards to familiarity.
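As an arithmetic check on effect sizes of the kind reported above, partial eta squared can be recovered from an F ratio and its degrees of freedom (ηp² = F·df1 / (F·df1 + df2)), and a correlation can be converted to its t statistic (t = r·√df / √(1 − r²)). A minimal sketch, using only standard formulas and the values reported in the Results:

```python
import math

def partial_eta_squared(f, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)

def r_to_t(r, df):
    """Convert a Pearson correlation to its t statistic."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Interaction of music tone x word-stem tone: F(4, 244) = 10.22
print(round(partial_eta_squared(10.22, 4, 244), 2))  # 0.14

# Familiarity ratings by tone of music: F(2, 167) = 23.97
print(round(partial_eta_squared(23.97, 2, 167), 2))  # 0.22

# Happy-word completions vs. liking: r(173) = .22
# t is about 2.97, beyond the two-tailed .01 criterion at df = 173
print(round(r_to_t(0.22, 173), 2))  # 2.97
```

Both recovered ηp² values agree with the figures reported in the text.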
REFERENCES

[8] Edelman, L., Baron, C., Rybnick, E., & Zimel, C. (2015). Music and Emotion: Indirect Measures. Poster presented at SMPC, Nashville, TN.
Music has the potential to elicit emotions in listeners, and recent research shows that
these emotions are mediated by a number of underlying psychological mechanisms, such
as contagion and memories. Depression has been associated with negative biases in
cognitive processes (e.g., memory, attention). The aim of this study was to explore
whether such cognitive biases also influence the activation of mechanisms during music
listening in depressed people. Depressed and healthy-control individuals participated in a
music listening experiment, which featured musical stimuli that targeted specific
mechanisms (i.e., brain stem reflex, contagion, and episodic memory). After each piece,
listeners rated scales measuring the BRECVEMA mechanisms and their felt emotions.
Participants also completed psychometric tests of depression (BDI) and anxiety (BAI).
Depressed people showed differences in emotional reactions to music compared to
healthy controls, and these could be partly explained by a differential activation of the
mechanisms. These results suggest that cognitive biases may be evident in the activation
of mechanisms during music listening, influencing the way depressed people experience
music. Knowledge of this kind may be of potential use for music-therapeutic
interventions against depression, where focus lies on the underlying mechanisms
mediating the induction of musical emotions.
cortex [35] and corpus callosum [36], [37]. There is also strong evidence suggesting that the acquisition of absolute pitch and its persistence into adulthood depends on a critical period that ends around age six [38] and that the probability of acquiring absolute pitch is related to onset age [39], [40]. Further, onset age of music training positively influences performance on visual- and auditory-motor synchronization tasks [41]-[43]. Given these relationships, it is plausible that musicians who begin studying music during a sensitive period show increased levels of empathy in adulthood.

The aim for this study was to investigate whether onset age of music training during childhood can impact development of socio-emotional skills, including theory of mind and empathy. We hypothesized that musicians with an early onset age of music training would outperform musicians with late onset age of training as well as nonmusicians in both measures of empathy and theory of mind.

TABLE I
PARTICIPANT DEMOGRAPHICS

                                      Musicians (n = 52)
                                 Early onset    Late onset     Nonmusicians
                                 (n = 18)       (n = 34)       (n = 39)
Mean age (SD)                    24.39 (5.55)   23.06 (4.03)   22.28 (3.32)
Females                          14             20             23
Males                            4              14             16
Mean onset age of training (SD)  4.67 (.49)     9.27 (1.97)
Mean years of training (SD)      19.72 (5.42)   13.79 (4.63)
Instrument
  Piano                          12             9
  Strings                        2              7
  Winds                          0              7
  Brass                          1              4
  Voice                          3              5
  Percussion                     0              2
Ensemble?
  Yes                            8              25
  No                             10             9
explored the extent to which empathy is improved by music practice in the long term. Given influences of music on emotion and the various reported cognitive benefits of music training especially from a young age, we hypothesized that early onset musicians may have enhanced socio-emotional skills as a result of their training.

When comparing the three groups (early onset musicians, late onset musicians, and nonmusicians), we found that early onset musicians performed significantly better than late onset musicians and nonmusicians on the RMET. Scores on the RMET were further correlated with the age at which musicians began training, but not with their number of years

                      Early onset    Late onset     Nonmusicians
RMET                  30.61 (2.45)   28.06 (3.60)   28.44 (2.52)
Voice Test            19.06 (2.90)   17.88 (2.66)   18.72 (1.57)
Perspective-Taking    17.61 (4.33)   18.00 (3.11)   19.64 (4.99)
Fantasy               19.17 (5.87)   19.32 (5.12)   20.03 (5.05)
Empathic Concern      21.94 (3.80)   21.06 (3.43)   19.85 (5.26)
Personal Distress     12.39 (4.10)   10.00 (5.26)   10.87 (4.46)

Participant scores on all empathy measures. Early onset musicians, n = 18; Late onset musicians, n = 34; Nonmusicians, n = 39. RMET = Reading the Mind in the Eyes Test; Voice Test = Reading the Mind in the Voice Test; Perspective-Taking, Fantasy, Empathic Concern, & Personal Distress are subscales of the Interpersonal Reactivity Index (IRI).
would have given us more specific data. On the other hand, gathering data from a group of musicians with different specifications provides us with a broader look at the differences between musicians and nonmusicians. Nevertheless, future research should isolate various music training environments in order to investigate the potentially different influences ensemble training has on empathy compared to solo training.

The present study provides evidence for the possibility of a sensitive period for development of associations of music training and empathy skills. These findings support previous work that shows associations between music training and emotion recognition skills [27], [51]. While the results from this study deepen our understanding of the relationship between music training and empathic abilities, future research should look towards a predictive model that would take a variety of training methods, ensemble involvement, environmental and innate factors into account.

REFERENCES

[1] J. Chobert, C. Francois, J. Velay, and M. Besson, "Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time," Cerebral Cortex, vol. 24, no. 4, pp. 956–967, Apr. 2014.
[2] C. Francois, J. Chobert, M. Besson, and D. Schon, "Music training for the development of speech segmentation," Cerebral Cortex, vol. 23, no. 9, pp. 2038–2043, Sep. 2013.
[3] D. L. Strait and N. Kraus, "Biological impact of auditory expertise across the life span: Musicians as a model of auditory learning," Hearing Research, vol. 308, pp. 109–121, Feb. 2014.
[4] R. Brochard, A. Dufour, and O. Després, "Effect of musical expertise on visuospatial abilities: Evidence from reaction times and mental imagery," Brain and Cognition, vol. 54, no. 2, pp. 103–109, Mar. 2004.
[5] V. Sluming, J. Brooks, M. Howard, J. J. Downes, and N. Roberts, "Broca's area supports enhanced Visuospatial Cognition in orchestral musicians," Journal of Neuroscience, vol. 27, no. 14, pp. 3799–3806, Apr. 2007.
[6] S. Moreno, C. Marques, A. Santos, M. Santos, S. L. Castro, and M. Besson, "Musical training influences linguistic abilities in 8-year-old children: More evidence for brain plasticity," Cerebral Cortex, vol. 19, no. 3, pp. 712–723, Mar. 2009.
[7] S. Moreno, D. Friesen, and E. Bialystok, "Effect of music training on promoting preliteracy skills: Preliminary causal evidence," Music Perception: An Interdisciplinary Journal, vol. 29, no. 2, pp. 165–172, Dec. 2011.
[8] F. Degé and G. Schwarzer, "The effect of a music program on phonological awareness in preschoolers," Frontiers in Psychology, vol. 2, no. 24, pp. 7-13, Jun. 2011.
[9] J. M. Piro and C. Ortiz, "The effect of piano lessons on the vocabulary and verbal sequencing skills of primary grade students," Psychology of Music, vol. 37, no. 3, pp. 325–347, Jul. 2009.
[10] A. Tierney and N. Kraus, "Music training for the development of reading skills," in Progress in Brain Research, vol. 207, M. M. Merzenich, M. Nahum, T. M. Van Vleet, Eds. Burlington: Academic Press, 2013, pp. 209-241.
[11] E. Glenn Schellenberg, "Music lessons enhance IQ," Psychological Science, vol. 15, no. 8, pp. 511–514, Aug. 2004.
[12] E. G. Schellenberg, "Music and cognitive abilities," Current Directions in Psychological Science, vol. 14, no. 6, pp. 317–320, Dec. 2005.
[13] E. Glenn Schellenberg, "Music lessons, emotional intelligence, and IQ," Music Perception: An Interdisciplinary Journal, vol. 29, no. 2, pp. 185–194, Dec. 2011.
[14] E. G. Schellenberg, "Examining the association between music lessons and intelligence," British Journal of Psychology, vol. 102, no. 3, pp. 283–302, Aug. 2011.
[15] K. R. Fitzpatrick, "The effect of instrumental music participation and socioeconomic status on Ohio fourth-, sixth-, and ninth-grade proficiency test performance," Journal of Research in Music Education, vol. 54, no. 1, pp. 73–84, Apr. 2006.
[16] E. G. Schellenberg, "Long-term positive associations between music lessons and IQ," Journal of Educational Psychology, vol. 98, no. 2, pp. 457–468, May 2006.
[17] V. Sluming, T. Barrick, M. Howard, E. Cezayirli, A. Mayes, and N. Roberts, "Voxel-Based Morphometry reveals increased gray matter density in Broca's area in male symphony orchestra musicians," NeuroImage, vol. 17, no. 3, pp. 1613–1622, Nov. 2002.
[18] C. Gaser and G. Schlaug, "Brain structures differ between musicians and non-musicians," Journal of Neuroscience: The Official Journal of the Society for Neuroscience, vol. 23, no. 27, pp. 9240–9245, Oct. 2003.
[19] G. Schlaug, "Effects of music training on the child's brain and cognitive development," Annals of the New York Academy of Sciences, vol. 1060, no. 1, pp. 219–230, Dec. 2005.
[20] A. Habibi and A. Damasio, "Music, feelings, and the human brain," Psychomusicology: Music, Mind, and Brain, vol. 24, no. 1, pp. 92–102, 2014.
[21] P. G. Hunter and E. G. Schellenberg, "Music and emotion," in Music perception. New York: Springer, 2010, pp. 129-164.
[22] P. N. Juslin, S. Liljeström, D. Västfjäll, G. Barradas, and A. Silva, "An experience sampling study of emotional reactions to music: Listener, music, and situation," Emotion, vol. 8, no. 5, pp. 668–683, Oct. 2008.
[23] M. Zentner, D. Grandjean, and K. R. Scherer, "Emotions evoked by the sound of music: Characterization, classification, and measurement," Emotion, vol. 8, no. 4, pp. 494–521, Aug. 2008.
[24] T. DeNora, "Emotion as social emergence: Perspectives from music sociology," in Handbook of music and emotion: Theory, research, applications, P. N. Juslin, and J. A. Sloboda, Eds. Oxford: Oxford University Press, 2010, pp. 159–183.
[25] P. N. Juslin and R. Timmers, "Expression and communication of emotion in music performance," in Handbook of music and emotion: Theory, research, applications, P. N. Juslin, and J. A. Sloboda, Eds. Oxford: Oxford University Press, 2010, pp. 453-489.
[26] T. R. Goldstein and E. Winner, "Enhancing empathy and theory of mind," Journal of Cognition and Development, vol. 13, no. 1, pp. 19–37, Jan. 2012.
[27] T. Rabinowitch, I. Cross, and P. Burnard, "Long-term musical group interaction has a positive influence on empathy in children," Psychology of Music, vol. 41, no. 4, pp. 484–498, Jul. 2013.
[28] S. Kirschner and M. Tomasello, "Joint music making promotes prosocial behavior in 4-year-old children," Evolution and Human Behavior, vol. 31, no. 5, pp. 354–364, Sep. 2010.
[29] F. Laurence, "Music and empathy," in Music and conflict transformation: Harmonies and dissonances in geopolitics, O. Urbain, Ed. London: I.B Tauris, 2008, pp. 13-25.
[30] J. K. Vuoskoski and T. Eerola, "Can sad music really make you sad? Indirect measures of affective states induced by music and autobiographical memories," Psychology of Aesthetics, Creativity, and the Arts, vol. 6, no. 3, pp. 204–213, Aug. 2012.
[31] J. K. Vuoskoski and W. F. Thompson, "Who enjoys listening to sad music and why?," Music Perception: An Interdisciplinary Journal, vol. 29, no. 3, pp. 311–317, Feb. 2012.
[32] C. Wöllner, "Is empathy related to the perception of emotional expression in music? A multimodal time-series analysis," Psychology of Aesthetics, Creativity, and the Arts, vol. 6, no. 3, pp. 214–223, Aug. 2012.
[33] H. Egermann and S. McAdams, "Empathy and emotional contagion as a link between recognized and felt emotions in music listening," Music Perception: An Interdisciplinary Journal, vol. 31, no. 2, pp. 139–156, Dec. 2013.
[34] L. J. Trainor, "Are there critical periods for musical development?," Developmental Psychobiology, vol. 46, no. 3, pp. 262–278, Apr. 2005.
[35] K. Amunts et al., "Motor cortex and hand motor skills: Structural compliance in the human brain," Human Brain Mapping, vol. 5, no. 3, pp. 206–215, Jan. 1997.
[36] G. Schlaug, "Increased corpus callosum size in musicians," Neuropsychologia, vol. 33, no. 8, pp. 1047–1055, Aug. 1995.
[37] C. J. Steele, J. A. Bailey, R. J. Zatorre, and V. B. Penhune, "Early musical training and white-matter plasticity in the corpus callosum: Evidence for a sensitive period," Journal of Neuroscience, vol. 33, no. 3, pp. 1282–1290, Jan. 2013.
[38] H. Takeuchi and S. H. Hulse, "Absolute pitch," Psychological Bulletin, vol. 113, no. 2, pp. 345–361, Mar. 1993.
There is growing evidence that musical perception and ability are strengths of individuals
with autism spectrum disorder (ASD). To date, most studies have examined how
individuals with ASD process either the emotional or cognitive aspects of music. Thus,
the question of what draws individuals with ASD to music remains unresolved. The
current study aims to assess whether (sub-clinical) autism symptomatology in the general
population impacts the relationship between processing of musical emotions and
structure. We propose a dual experimental task using a common set of musical stimuli,
where participants must (1) select which emotion (happy, sad, or scary; forced-choice
paradigm) best describes the stimuli and (2) detect chromatic or diatonic variations in the
structure of the stimuli (same-different paradigm). Higher autism symptomatology, as
measured with the general population Adult Autism Spectrum Quotient and Social
Responsiveness Scale, should be associated with greater discrepancy in reaction time and
total score between musical task 1 (emotion selection) and 2 (structure detection)*. NEO
personality scores will also be correlated with the discrepancy between task 1 and 2.
Musical interest and experience will be controlled for using the Goldsmiths Musical
Sophistication Index. Findings will be discussed within the context of Baron-Cohen’s
empathizing-systemizing theory, which suggests that discrepancies between socio-
emotional and cognitive profiles are distributed along a continuum in the general
population, with ASD representing one extreme of this continuum.
*Data collection is ongoing and should be completed before the conference
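The planned analysis, correlating each participant's emotion-vs-structure discrepancy with autism symptomatology, might be sketched as follows. All of the data values and the plain-Python pearson_r helper are illustrative assumptions, not study results.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-participant data: mean reaction times (ms) on
# task 1 (emotion selection) and task 2 (structure detection),
# plus an Autism Spectrum Quotient (AQ) score.
rt_emotion = [820, 910, 1005, 1150, 1230]
rt_structure = [790, 850, 880, 905, 920]
aq_scores = [12, 16, 21, 27, 33]

# Discrepancy score: how much slower emotion processing is
# than structure processing for each participant.
discrepancy = [e - s for e, s in zip(rt_emotion, rt_structure)]

# A positive correlation would be consistent with the hypothesis
# that higher AQ predicts a larger emotion-vs-structure gap.
print(round(pearson_r(discrepancy, aq_scores), 2))  # 0.99
```

The same discrepancy logic would apply to total scores, and the correlations with NEO personality scores mentioned above would follow the same pattern.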
useful for normalizing sound stimuli relative to a listener. Psychomechanical attributes of sound include characteristics that can be inferred about a source from sound cues alone, such as perceived sound source size, perceived sound source energy, sound source material, sound source proximity, etc. One unique feature of these cues is that they are common to both the sound source and the listener; both the source and the receiver will have a physical size, amount of energy, physical location, etc. For example, via psychomechanical sound descriptors, it is possible to assess if a given sound source is much smaller or much larger than a particular listener, as might be the case when comparing performances of ukulele music with pipe organ music. The former sound source is likely to be relatively smaller than the listener, whereas the latter sound source is likely to be much bigger. A similar comparison could be made between an energetic sound source and a lethargic sound receiver.

The conceptual framework brought forward here is that induced musical emotions stem from cues that are normalized relative to a given musical receiver (i.e. listener), and perceived musical emotions stem from cues that are normalized to a given musical sender (i.e. instrument or ensemble). In many ways, this idea represents what is already inherent within the connotative meaning of the words “perceived” and “felt.” Perceived emotion, by definition, implies a sense of outwardness, especially with regards to perceiving emotion within music, whereas felt emotion, on the other hand, implies a sense of inwardness. Therefore, the process of normalization may play an important role towards understanding the difference between these two emotional loci.

IV. DOUBLE-NORMALIZATION

In order to illustrate the utility of “double-normalizing” affective musical communication, that is, normalizing cues relative to both the sound source (perceived emotion) and the sound receiver (felt emotion), consider the various ways in which one might respond to the sound of a human vocalizing an angry growl.

Being able to perceive that a sound source is expressing anger involves multiple cues, such as sharp amplitude envelopes (Juslin, 1997), lower f0 (Morton, 1977), large f0 range (Williams & Stevens, 1972), etc. Recall that these cues are not absolute, but instead perceived relative to a given sound source. Without the ability to normalize the pitch and intensity ranges of the sound source, it might not be possible to understand how we perceive expressive intent. Normalization facilitates the ability to hear sounds as relatively low and/or relatively sharp, and thereafter “perceive” that a given sound source may be angry.

Beyond sound source-normalization, listener-normalization may provide insight into plausible induced emotional responses to this angry vocalization. For simplicity, a singular psychomechanical property will be used in this example: sound source size. Assume that the listener not only perceives

size (an angry 10-foot tall giant). In all three cases, the “perceived” emotional message is the same (i.e. anger), yet through the process of listener-normalization, the “felt” emotional response to the angry vocalization could be manifested in many different ways.

Hypothetically, if an angry vocalization was produced by a sound source of approximately the same size as the listener, the felt affective responses might be expected to vary widely, ranging from indifference to fear. In the second example, in which the angry vocalization was produced from a much smaller dwarf-like sound source, different induced responses might be expected, such as laughter, ridicule, or compassion. In the final example, in which the angry vocalization was produced by a much larger giant-like sound source, one might expect induced responses such as terror or panic. These hypothetical responses are closely related to the sound receiver’s ability to cope with the angry sound source.

All of the potential induced responses in the example above are theoretically derived from normalizing a single psychomechanical source cue relative to the listener. However, many concurrent psychomechanical cues should be expected to play a role in induced emotional responses (e.g. source energy, source proximity, etc.). Sound source size is therefore but one of many descriptors that may prove to be useful in understanding felt emotional responses.

V. CONCLUSIONS

The utility of listener-normalized musical affect is largely unknown. In order to develop a complete listener-normalized music perception framework, more research is needed on the perception of psychomechanical aspects of music. By better understanding the perception of sound source attributes within music, more meaningful conclusions may be drawn. In this brief paper, the role of listener-normalized affect has only been discussed in relation to isolated sounds (some of which were musical). Actual music making typically involves constant changes of psychomechanical attributes (energy, proximity, etc.), and therefore, the combination of these changing attributes results in a very complex sound source; perhaps even the most complex sound source regularly encountered in everyday life.

Despite limited knowledge on this topic, we can still speculate about the implications of listener-normalized musical affect, both within and beyond music perception. From a developmental perspective, listener-normalized emotional responses might be useful for understanding how music perception changes from infancy through adulthood. To an infant, we should expect that the world "sounds much bigger" than it does to an adult listener. A recent study by Masapollo, Polka, & Menard (2015) found that infants preferred listening to synthesized speech with infant vocal properties relative to adult vocal properties. In this case, listener-normalization (i.e. normalizing sounds to an infant listener) provides an interesting conceptual lens for examining these experimental results. From a theoretical perspective, listener-normalized
that the sound source is angry (via the cues mentioned above), music affect might be useful for developing a "scaled" theory
but also simultaneously perceives the physical size of the of emotional responses, such that normalizing
sound source (see Patterson, et al. 2008 for an explanation of psychomechanical musical features to a specific listener might
auditory size perception). Here, we will examine three source provide insight into how and why that listener prefers certain
size exemplars: same source size (an angry adult), much types of music for certain purposes.
smaller source size (an angry dwarf), and much larger source
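The sender- vs. receiver-normalization contrast developed above can be illustrated with a minimal numerical sketch. This is not the author's model: the function, the growl f0, and the reference ranges below are all hypothetical values chosen only to show how the same cue can be "somewhat low for the sender" yet "very low for the listener."

```python
from statistics import fmean, pstdev

# Hypothetical sketch (values invented, not from the paper): the same
# acoustic cue, fundamental frequency (f0) in Hz, normalized two ways.
# Source-normalization expresses the cue relative to the sender's own
# typical range (supporting *perceived* emotion); listener-normalization
# expresses it relative to the receiver's reference range (supporting
# *felt*, i.e. induced, emotion).

def normalize(cue_hz, reference_hz):
    """Z-score a cue value against a reference set of values."""
    return (cue_hz - fmean(reference_hz)) / pstdev(reference_hz)

growl_f0 = 90.0                        # an angry, low-pitched vocalization
sender_range = [85, 100, 120, 140]     # the sender's habitual f0 values (Hz)
listener_range = [180, 220, 260, 300]  # the listener's own f0 values (Hz)

source_norm = normalize(growl_f0, sender_range)      # low *for the sender*?
listener_norm = normalize(growl_f0, listener_range)  # low *for the listener*?
```

Under these made-up values the growl sits about one standard deviation below the sender's own range but several standard deviations below the listener's, which is the sense in which one cue can support both a source-relative (perceived) and a listener-relative (felt) reading.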
ACKNOWLEDGMENT
The author wishes to thank Illinois Wesleyan University
for supporting a junior faculty research leave proposal that
facilitated this research. The author also wishes to thank
Stephen McAdams, Marc Pell, David Huron, and Zachary
Silver for their support, feedback, and suggestions during this
leave.
REFERENCES
Gabrielsson, A. (2002). Emotion perceived and emotion felt: Same
or different? Musicae Scientiae, 5(1 suppl), 123-147.
Huron, D., Kinney, D., & Precoda, K. (2006). Influence of pitch
height on the perception of submissiveness and threat in musical
passages. Empirical Musicology Review, 1(3), 170-177.
Jiang, X., & Pell, M. D. (2015). On how the brain decodes vocal cues
about speaker confidence. Cortex, 66, 9-34.
Juslin, P. N. (1997). Perceived emotional expression in synthesized
performances of a short melody: Capturing the listener's
judgment policy. Musicae Scientiae, 1(2), 225-256.
Juslin, P. N., & Timmers, R. (2010). Expression and communication
of emotion in music performance. Handbook of music and
emotion: Theory, research, applications, 453-489.
Masapollo, M., Polka, L., & Ménard, L. (2015). When infants talk,
infants listen: Prebabbling infants prefer listening to speech with
infant vocal properties. Developmental Science, 19(2), 318-328.
McAdams, S., Chaigne, A., & Roussarie, V. (2004). The
psychomechanics of simulated sound sources: Material
properties of impacted bars. The Journal of the Acoustical
Society of America, 115(3), 1306-1320.
Morton, E. S. (1977). On the occurrence and significance of
motivation-structural rules in some bird and mammal sounds.
American Naturalist, 111(981), 855-869.
Patterson, R. D., Smith, D. R., van Dinther, R., & Walters, T. C.
(2008). Size information in the production and perception of
communication sounds. In Auditory perception of sound
sources (pp. 43-75). Springer US.
Schubert, E. (2013). Emotion felt by the listener and expressed by
the music: literature review and theoretical perspectives. Frontiers
in Psychology, 4:837.
Stoelinga, C. (2009). A psychomechanical study of rolling sounds.
VDM Publishing.
Williams, C. E., & Stevens, K. N. (1972). Emotions and speech:
Some acoustical correlates. The Journal of the Acoustical Society
of America, 52(4B), 1238-1250.
ISBN 1-876346-65-5 © ICMPC14 307 ICMPC14, July 5–9, 2016, San Francisco, USA
Sadness experienced in everyday life is largely unpleasant, yet when expressed through
music, it is often found pleasurable. Previous research attempting to elucidate this
paradoxical concept has shown that liking sad music is correlated with trait empathy, yet
which subtype is most predictive of the enjoyment of sad music remains a matter of
debate. In addition, how a person’s mood, past experiences, and current situations might
influence this relationship between empathy and liking sad music has largely been
unexplored. To address these questions, we conducted an online survey (N = 429) in
which participants reported the type of music they would most likely listen to in various
situations and the reasons why they chose that type of music. Multiple regression and
structural equation modeling were used to assess the relationship between four subtypes of
empathy, emotional responses to sad music, and the enjoyment of sad music in various
situations. Results show that fantasy was the strongest predictor of the enjoyment of sad
music, and this association was partially mediated by the experience of sublime feelings
in response to sad music. Moreover, people who score high on fantasy report liking sad
music because it fosters strong, positive feelings, such as feeling touched or moved, and
releases negative feelings, such as tension or sadness. Finally, people who prefer listening
to sad sounding music when experiencing a negative situation, such as after a breakup or
when feeling lonely, score higher on fantasy and personal distress, whereas people who
prefer listening to happier sounding music when in such negative situations score higher
on perspective taking. The findings allow for a more complete understanding of how trait
empathy influences pleasurable responses to negatively valenced stimuli within various
contexts. Such results could provide new ways of comprehending and treating mood
disorders in which the ability to express and experience emotions is attenuated.
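The partial mediation reported above (fantasy → sublime feelings → enjoyment) can be sketched as a Baron-Kenny-style pair of regressions. The data below are simulated and the coefficients invented; this shows only the shape of such an analysis, not the authors' survey data or their actual structural equation model.

```python
import numpy as np

# Illustrative mediation sketch with simulated data (not the study's data).
rng = np.random.default_rng(0)
n = 429                                      # sample size reported in the abstract
fantasy = rng.normal(size=n)                 # predictor: trait empathy subscale
sublime = 0.5 * fantasy + rng.normal(size=n)             # mediator
enjoyment = 0.3 * fantasy + 0.4 * sublime + rng.normal(size=n)  # outcome

def slopes(y, predictors):
    """OLS coefficients of y on the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

total = slopes(enjoyment, [fantasy])[0]            # path c: total effect
direct = slopes(enjoyment, [fantasy, sublime])[0]  # path c': controlling the mediator
# Partial mediation: the direct effect shrinks but does not vanish.
```

With these simulated effects, the fantasy coefficient drops once the mediator is included but remains positive, which is the pattern a "partially mediated" association describes.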
Pitch and timing information work hand in hand to create a coherent piece of music; but
what happens when this information goes against the norm? While much research in
music and emotion explores what emotions are elicited, this work will explore a potential
explanation for the elicitation of those very emotions. Meyer (1956) suggested decades
ago that musical expectancy triggers emotions in response to music and Juslin and
Västfjäll (2008) also propose expectancy as a potential mechanism for musical emotion.
The aim of this research is to understand the effect of tonal and temporal structure on
expectedness and musical affect as they unfold in real time, and whether these effects are
independent or not. Forty participants listened to 32 melodies, 16 of which were original
and 16 of which were modified. Each group of 16 was categorized by high and low
pitch and time expectancy (as measured by information content). Half the participants
continuously rated the level of perceived expectedness on a 7-point Likert scale while the
other half rated the perceived arousal/valence on a two-dimensional circumplex model of
emotion (Russell, 1980). Participants were further divided into two sub-groups by musical
training. A window-based analysis shows that notes with high pitch unpredictability that
occur on predictable onset times elicit stronger negative emotions, suggesting that pitch
and time information are not processed independently. While expectedness and valence
were modulated by the differences in, and manipulations of, tonal and temporal structure,
arousal was not. Arousal ratings were instead correlated with tempo. Results did not
significantly differ between musicians and non-musicians, which can be explained by a
shared cultural background and extra-opus knowledge. For a more thorough investigation
of the data, time series analysis is underway. Individual differences in rating strategies
are also being investigated. Preliminary analysis suggests that tonal and temporal
structure do affect perceived expectedness and musical affect as music unfolds over time,
and that these structures are processed in a dependent way; however, completion of
analysis will solidify the conclusions that can be drawn from this research.
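The expectancy measure named above, information content, is the negative log-probability of an event under a predictive model. The following toy sketch is not the corpus or model used in the study (which the abstract does not specify); it uses a first-order Markov (bigram) model over scale degrees, trained on the melody itself, purely to show how an unexpected note receives high information content.

```python
import math
from collections import defaultdict

# Toy sketch (hypothetical model and melody): information content of each
# note transition, IC = -log2 p(next | previous), under a bigram model.
def bigram_information_content(melody):
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(melody, melody[1:]):
        counts[prev][nxt] += 1
    ics = []
    for prev, nxt in zip(melody, melody[1:]):
        total = sum(counts[prev].values())
        p = counts[prev][nxt] / total     # conditional probability of the note
        ics.append(-math.log2(p))         # information content in bits
    return ics

melody = [1, 2, 3, 1, 2, 3, 1, 2, 7]      # scale degrees; the final 7 breaks the pattern
ics = bigram_information_content(melody)
# The unexpected transition (2 -> 7) carries the highest information content.
```

In this sense "high information content" and "low expectedness" are the same claim stated in bits rather than in ratings.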
Abstract— Previous research found that both synchronous movement and music increased perceptions of bonding for targets in a video. Using a 2x3x2 mixed groups design, the current study investigated the effect of having participants engage in music-making and physical movement. Two confederates participated in each condition along with one participant. In half the conditions the participant and the two confederates sang “Row, Row, Row Your Boat” in unison. In the other half of the conditions there was no singing. There were three movement conditions. In one condition, the participant and confederates pumped their arms up and down in unison to the beat of a metronome. In another condition the participant and confederates pumped their arms up and down out of synchrony with each other. In the final movement condition, all three people kept their arms at their sides (no movement). The singing and movement conditions were completely crossed. All conditions were the same duration. At the end of the manipulation participants and confederates completed a questionnaire to measure entitativity, rapport, mood, and manipulation checks. Singing increased bonding compared to not singing. Synchronous movement increased bonding over no movement, but asynchronous movement tended to decrease bonding. Singing alone, moving alone, and singing and moving together all produced similar levels of bonding. Putting both activities together, singing and moving, did not produce greater bonding than either one alone. The asynchronous moving decreased bonding in the singing condition, but was equivalent to no movement in the no singing condition. The results will be discussed in terms of theories of group cohesion, mirror neurons, and neurohormonal mechanisms.

Keywords— social bonding, synchrony, singing, moving

I. INTRODUCTION

connection [5]–[8]. The third theory is that both movement and music trigger mirror neuron responses and those responses stimulate social bonding [9]–[10]. More recently, research has suggested that both music and movement trigger neurohormonal mechanisms that stimulate social bonding [11]–[12].

Previous research found that both synchronous movement and music increased perceptions of bonding for targets in a video [13]. In that study participants were simply listening to music and had no active role in making music. Additionally, they watched a video of other people walking synchronously or asynchronously, but again did not actually engage in movement themselves. Even so, they perceived that the synchronous video walkers were socially connected, but the participants did not express a connection to the walkers in the video [14]. The current study investigated the effect of having participants engage in music-making and physical movement. We assessed the degree to which the participants felt a social bond to the other people in the room who sang or moved with them. There were two singing conditions: singing or not singing. There were three movement conditions: not moving, moving synchronously, and moving asynchronously. We hypothesized that the strongest bonding would be in the condition in which participants both sang and moved in synchrony. We expected that singing alone and moving alone would produce an equal amount of bonding and that the condition with no singing and no moving would not generate perceptions of bonding. Finally, we expected that the asynchronous movement would result in the lowest amount of bonding, regardless of the singing.

II. METHOD

A. Participants

The sample included 91 participants all over the age of 18.
D. Procedure
high exertion synchronous movements (as measured by an increase in pain threshold). However, in the same study, low exertion synchronous movements also did not cause an increase in the release of endorphins. In contrast, in other studies low exertion activities such as synchronized finger tapping resulted in greater social bonding [21].

Given that social coordination and synchrony can promote a sense of belongingness to the group [3], moving out of sync with group members may serve to increase focus on the self, leading to negative self-evaluations and lower self-esteem [20]. The post-experimental questionnaire revealed that most participants in the asynchronous conditions reported that they were aware that the three individuals were moving their arms at different rates. This awareness may have dampened the need to connect with others who created barriers to coordination with the participant. In other words, the asynchronous movement may have decreased participants’ motivation to connect with these peers as a way to restore self-esteem and personal worth.

The current study showed that both singing and synchronous movement led to increased social bonding. Moreover, moving out of sync with other individuals reduces social connection even when the group is singing together. These results appear to imply that asynchronous movement has a more powerful and negative effect on perceptions of groupness; however, we can’t make that claim since we did not include a condition where there was asynchronous singing. Future research should further examine the role that both synchrony and asynchrony play in large motor movement and in music-making to more clearly distinguish between the social bonding effects of synchrony alone and of music alone.

IV. REFERENCES

[1] S. R. Livingstone and W. F. Thompson, “The emergence of music from the Theory of Mind,” Musicae Scientiae, Special Issue, pp. 83-115, 2009.
[2] J. G. Roederer, “The search for a survival value of music,” Music Perception, vol. 1, no. 3, pp. 350-356, 1984.
[3] D. Campbell, “Common fate, similarity, and other indices of the status of aggregates of persons as social entities,” Behavioral Science, vol. 3, no. 1, pp. 14-25, 1958. doi:10.1002/bs.3830030103
[4] I. Molnar-Szakacs and K. Overy, “Music and mirror neurons: From motion to 'e'motion,” Social Cognitive and Affective Neuroscience, vol. 1, no. 3, pp. 235-241, 2006. doi:10.1093/scan/nsl029
[5] D. Lakens and M. Stel, “If they move in sync, they must feel in sync: Movement synchrony leads to attributions of rapport and entitativity,” Social Cognition, vol. 29, no. 1, pp. 1-14, 2011. doi:10.1521/soco.2011.29.1.1
[6] M. J. Hove and J. L. Risen, “It's all in the timing: Interpersonal synchrony increases affiliation,” Social Cognition, vol. 27, no. 6, pp. 949-961, 2009.
[7] C. N. Macrae, O. K. Duffy, L. K. Miles, and J. Lawrence, “A case of hand waving: Action synchrony and person perception,” Cognition, vol. 109, pp. 152-156, 2008.
[8] L. K. Miles, L. K. Nind, and C. N. Macrae, “The rhythm of rapport: Interpersonal synchrony and social perception,” Journal of Experimental Social Psychology, vol. 45, pp. 585-589, 2009.
[9] K. Overy and I. Molnar-Szakacs, “Being together in time: Musical experience and the mirror neuron system,” Music Perception, vol. 26, no. 5, pp. 489-504, 2009. doi:10.1525/mp.2009.26.5.489
[10] J. R. Matyja, “The next step: Mirror neurons, music, and mechanistic explanation,” Frontiers in Psychology, vol. 6, 2016.
[11] R. M. Dunbar, K. Kaskatis, I. MacDonald, and V. Barra, “Performance of music elevates pain threshold and positive affect: Implications for the evolutionary function of music,” Evolutionary Psychology, vol. 10, no. 4, pp. 688-702, 2012.
[12] L. Gebauer, M. L. Kringelbach, and P. Vuust, “Ever-changing cycles of musical pleasure: The role of dopamine and anticipation,” Psychomusicology: Music, Mind, and Brain, vol. 22, no. 2, pp. 152-167, 2012.
[13] L. L. Edelman and K. E. Harring, “Music and social bonding: The role of non-diegetic music and synchrony on perceptions of videotaped walkers,” Current Psychology: A Journal for Diverse Perspectives on Diverse Psychological Issues, 2014. doi:10.1007/s12144-014-9273-y
[14] L. L. Edelman and K. E. Harring, “Music bonds the video walkers but not the participants,” unpublished manuscript, 2014.
[15] B. Tarr, J. Launay, and R. M. Dunbar, “Music and social bonding: 'Self-other' merging and neurohormonal mechanisms,” Frontiers in Psychology, vol. 5, 2014.
[16] U. Nilsson, “Soothing music can increase oxytocin levels during bed rest after open-heart surgery: A randomised control trial,” Journal of Clinical Nursing, vol. 18, no. 15, pp. 2153-2161, 2009. doi:10.1111/j.1365-2702.2008.02718.x
[17] D. Huron, “Is music an evolutionary adaptation?” in R. J. Zatorre and I. Peretz (Eds.), The Biological Foundations of Music, New York, NY, US: New York Academy of Sciences, pp. 43-61, 2001.
[18] K. MacDonald and T. M. MacDonald, “The peptide that binds: A systematic review of oxytocin and its prosocial effects in humans,” Harvard Review of Psychiatry, vol. 18, no. 1, pp. 1-21, 2010. doi:10.3109/10673220903523615
[19] J. A. Barraza and P. J. Zak, “Empathy toward strangers triggers oxytocin release and subsequent generosity,” in O. Vilarroya, S. Altran, A. Navarro, K. Ochsner, and A. Tobeña (Eds.), Values, Empathy, and Fairness Across Social Barriers, New York, NY, US: New York Academy of Sciences, pp. 182-189, 2009.
[20] J. Lumsden, L. K. Miles, and C. N. Macrae, “Sync or sink? Interpersonal synchrony impacts self-esteem,” Frontiers in Psychology, vol. 5, 2014.
[21] O. Oullier, G. de Guzman, K. Jantzen, J. Lagarde, and J. Kelso, “Social coordination dynamics: Measuring human bonding,” Social Neuroscience, vol. 3, no. 2, pp. 178-192, 2008.
Of all music genres, heavy metal and gangsta rap are the most socially, politically, and
aesthetically polarizing, with a deep divide between their dedicated fan-bases and the
larger community, which cannot comprehend the appeal of music that seems to be
uniformly angry, vulgar, and violent. This paper will present a theory of emotion and
aesthetics in these musical genres which is grounded in evolutionary psychology. I will
argue that the core aesthetic appeal of these genres is not anger or violence but the
expression of a feeling of being powerful, with deep evolutionary roots in male-male
competition. Sexual competition is an important driver of evolutionary selection and has
thus shaped human nature and culture. As a result, the psychology of human males,
especially adolescents, is heavily influenced by an innate desire to dominate—to feel
physically, mentally, socially, and sexually powerful. I will discuss the artistic and
aesthetic roles of these feelings in music, as well as their ethical and social ramifications.
Prior research associates listening to heavy music with reduced suicide risk, especially
among teenage girls when utilized for vicarious release. Nevertheless, few studies
consider the active use of heavy music in self-regulation for those who suffer from
thoughts of self-harm and/or mental illness. In order to better understand the
mechanisms by which engaging with heavy and intense music may circumvent self-
harming behavior, a pilot study of 283 subjects is presented. The majority of those
surveyed report suffering from thoughts of self-harm or mental disorders. To examine the
use of affect regulation via both generic (non-specified) and heavy, intense, and highly
emotive music, we created the Music in Affect Regulation Questionnaire (MARQ),
utilizing music in mood regulation (MMR) strategies from the work of Saarikallio. We
identify heavy music by the presence of capacious, distorted riffs; loud, pervasive
percussion; or an overall feeling of ‘raw power,’ emotion, and affective intensity
stemming from the instrumental or vocal parts. Our findings collectively show that heavy
music listeners (and those who have thoughts of self-harm, in particular) interact with
definitively heavy, intense, or highly emotive music differently than with generic music,
especially in modulating negative mood. These findings seem less related to
genre-specific categories than certain musical commonalities collectively understood as
intensity, and provide significant evidence for heavy music's ability to circumvent self-
destructive impulses, especially when applied in tandem with specific listening strategies
of affect-regulation. Additional evidence from prior case studies further suggests the
value of deeper investigation of the conscientious use of heavy music as a potential
intervention for those suffering from affect dysregulation and self-harm.
Abstract—The current research, comprised of two studies, explored the nature of music engagement and its relationship with various psychological constructs in a sample of North Indian young adults. Study 1 examined the relationship between music preferences (MP), listening styles (LS), functions of music (FoM), perceived rasa (emotion), and personality traits (PT - Big Five Factors). A sample of 77 young adults (male = 39; female = 38; mean age = 22.7 years) completed measures of the above constructs and data were analysed via correlations, one-way ANOVA, post hoc tests, and t-tests. Significant correlations were found between LS & PT; MP & PT; FoM & PT; FoM & emotion; and LS & FoM. Findings indicated stronger preferences for genres namely Romantic songs, Soft songs, and Filmy (Sad) songs. Gender differences existed in terms of MP, perceived emotions, and LS. Music listening mainly served as a ‘source of pleasure and enjoyment’ which ‘calms, motivates, or reminds of past events’. Musical genres inducing santoṣa rasa were perceived significantly higher by female participants. Based on the findings, a 'Music Engagement Model (MEM) for Young Adults' describing their music behaviour has been proposed.

Study 2 examined affect (thoughts, feelings and actions) while listening to the lyrics of various music genres in young adults with varied personality traits. This study further examined which personality factor was more associated with thoughts, feelings and action tendencies generated through songs of various genres. A sample of 60 young adults (30 boys and 30 girls) of age group 18-27 years, pursuing graduation and post-graduation degrees from Amity University, Lucknow campus, were selected. Tools used were the Big Five Inventory (by John & Srivastava) and a 4-point Cognitive, Affective and Conative (CAC) scale (developed by the authors). Findings provide insights about the significance of music as media in the day-to-day lives of young adults, particularly on their cognition and the amount of affect based on their personality factors. Young adulthood, which has been called the most crucial age, needs to be exposed to music which not only proves to be a source of chills and enjoyment but also fosters well-being.

Keywords—Music preference; rasa; personality; psychological well-being; affect.

Ms A. Chakraborty, student of MA (Clinical Psychology), is with the Amity Institute of Behaviour and Allied Sciences, Amity Uttar Pradesh, Lucknow, India, 226028 (corresponding author; phone: +91-7071758062; e-mail: ahelic28@gmail.com).
Dr. D. K. Upadhyay, Assistant Professor, is with the Amity Institute of Behaviour and Allied Sciences, Amity Uttar Pradesh, Lucknow, India, 226028 (e-mail: dkupadhyay@lko.amity.edu).
Dr. R. Shukla, Assistant Professor, is with the Amity Institute of Behaviour and Allied Sciences, Amity Uttar Pradesh, Lucknow, India, 226028 (e-mail: rshukla1@lko.amity.edu).

I. INTRODUCTION

All humans across all cultures are exposed to music and potentially possess the innate ability to understand and respond to music. In this modern era, music is so pervasive it is unavoidable. The music of India (one of the oldest unbroken musical traditions in the world) is inextricably interwoven not only with the ritualistic and devotional side of religious lives but also with day-to-day life experiences. Music accompanies a person from birth until death. However, we have little understanding of how people use and experience music in their daily lives [3], and how cultural background, gender, personality, listening habits and other factors may influence the uses of music. Involvement in musical activities has been shown to have positive effects on mood [22], quality of life [4] and engagement [5], and to be a very rewarding leisure activity [12]. A challenge to such an investigation is that music is used for many different purposes.

In recent years, researchers have been particularly interested in adolescents and young adults who are marginalized and/or experiencing major psychological issues and have found that they prefer heavier forms of music such as heavy metal and hard rock [21] [6] [23]. It is presumed that these music preferences reflect their values, conflicts, and developmental issues with which these youth are dealing. Music listening is one of the most enjoyable activities reported by young adults. Karl Mannheim relates young adulthood with the ability to question and reflect upon life and experiences. To him, this time begins about the age of seventeen [13]. He justifies his age differentiation in further expressing the importance of living in the “present”: “the up-to-dateness of youth therefore consists in their being closer to the ‘present’ problems” (pp. 300–301). “The time phrase [young adulthood] embraces what has been called the most crucial age range for the creation of a distinctive and self-conscious generation” [8]. Young adults are those who have reached sexual maturity, but are not married [18].

Young adults’ subsequent transition to adulthood is notoriously stressful, placing demands on young people’s coping resources and putting them at greater risk of developing mental health problems [1] [11]. There is, therefore, a need to investigate everyday strategies that young people use to support their well-being. There is a growing recognition amongst scientists that musical behaviour is
central to our humanity, to what it means to be human. Not subscales: extraversion, agreeableness, conscientiousness,
least, we engage in musical activities because doing so neuroticism, and openness. The inventory seeks responses on
sweetens and structures our leisure time and thereby makes us a 5-point rating scale ranging from ‘‘totally agree” to “totally
happy and increases our well-being [7] [15]. All these disagree”. Certain items in the inventory are reverse scored.
assumptions, however, lack an explanation for how music can Reliability of the BFI ranges from 0.79 to 0.88.
lead to such positive effects [17]. Survey
Viewed against this backdrop, the current research, Based on previous literature, for example on musical
comprised of two studies, explored the nature of music engagement [24], rasa [10] [14], personality factors [9], and
engagement and its relationship with various psychological music preferences (Upadhyay, in press), a survey with ratings
constructs in a sample of North Indian young adults. Study 1 scales was administered individually on the participants.
examined the relationship between music preferences (MP),
B. Results and Discussion
listening styles (LS), functions of music (FoM), perceived
rasa (emotion), personality traits (PT - Big Five Factors) and Results revealed that majority of 55.90% respondents
psychological well-being (PWB). Study 2 examined the actively listened to music up to 6 hours, 27.3% listened
impact of various songs (with lyrics) of various genres had between 6 to 10 hours, and 13% listened more than 10 hours
any effect on the thoughts (cognitive), feelings (affective) and per week. Active listening was defined as ‘listening carefully
actions (conative) of young adults with varied personality or on purpose, instead of doing other things as well’. Only
traits. 3.90% respondents did not actively listen to music. Passive
listening was defined as ‘listening in the background, while
II. STUDY I

A. Method

Participants
A total of 77 respondents (male = 39; female = 38; mean age = 22.7 years), pursuing undergraduate and postgraduate programs at different schools/institutions of Amity University, completed the survey.

Measures
Demographic Data Sheet: The data sheet was used to gather demographic details of the respondents (e.g. age, gender, socioeconomic status, and music background).
Music Preference Scale: A Music Preference Scale developed by the researcher was used to determine the music preferences of the respondents. The scale included 23 music genres, each rated on a seven-point Likert scale to indicate preference for listening to that genre. The scale also included one open-ended item asking respondents to add genres, if any, to those already listed. Each music genre was accompanied by one open-ended item asking respondents 'when (time, place, mood, etc.) do you prefer listening to this music genre?' Cronbach's alpha of the scale was found to be 0.85.
Music Engagement Scale: This Likert-type rating scale was used to assess the music engagement of the participants [24]. It covered items pertaining to the number of hours devoted per week by each participant and their preferred ways of listening to music.
Functions of Music Scale: This Likert-type rating scale [19], comprising 10 items, was used. Cronbach's alpha for this scale was 0.76.
Music Emotion Scale: This rating scale, proposed by the author, was adapted from Koduri and Indurkhya (2010) [10] and Misra (2014) [14]. It has 11 rasa (with clusters of emotions) to be rated on a seven-point scale. The reliability coefficient (Cronbach's alpha) is 0.83.
Big Five Inventory (BFI): To ascertain the personality type of the respondents, the 'Big Five Inventory' developed by John and Srivastava (1999) [9] was used. The BFI contains five factors.

you do other things'. 53.3% of respondents listened up to 6 hours, 19.5% between 6 and 10 hours, and 24.5% listened more than 10 hours per week in the background; 2.6% indicated 'not (listening) at all' in the background. The male respondents (M = 4.33, SD = 1.91) were more engaged in active listening than the female respondents (M = 3.55, SD = 1.69).

Respondents' responses to the open-ended question 'when do you prefer listening to a particular music genre' were put into three categories: time, context and mood. Respondents indicated that they listened to Bollywood sad (slow) songs anytime (50%), when alone (100%), and when they were in a bad (low) mood (60%) or felt gloomy/sad (40%). On the other hand, most of the respondents (87.5%) enjoyed listening to romantic (love) songs when alone but in a happy mood (90%). Genres such as rock songs (66.67%), hip-hop (85.71%), remix (57.14%) and rap (57.14%) were preferred in clubs/parties. Bhajan (88.89%) was preferred in the morning, whereas patriotic songs (100%) were preferred on special occasions. Respondents also showed preferences for particular genres in terms of the activities they are involved in while listening. While driving, respondents generally preferred romantic (love) songs, melodious film songs, English songs, sufi and rock songs; when they felt like dancing, they indicated rap, hip-hop, pop, remix, new age, Punjabi and rock songs; and to relax they preferred genres like trance, instrumental, sufi, classic, jazz and ghazal.

Findings indicated preferences for listening to selected music genres. Results of a one-sample test against the centre of the 7-point rating scale (test value = 4) indicated whether ratings were higher or lower than 'somewhat enjoyable'. Romantic/love songs as well as film songs were generally favoured over other genres. On seven-point scales, mean ratings of 'romantic/love songs' were significantly higher in female respondents (M = 6.00, SD = 1.21) than male respondents (M = 5.33, SD = 1.61). In contrast, mean ratings of 'rock song', 'pop', 'blues', 'trance', 'jazz', and 'instrumental music' were significantly higher in male respondents than female respondents.
Respondents also indicated their listening styles on a seven-point scale. The highest ratings were for 'emotional listening' (i.e., experiencing personal emotional responses to music, M = 4.74, SD = 1.85), followed by 'moving the body or parts of the body' (M = 4.31, SD = 1.88), 'having visual or other associations' (M = 4.25, SD = 1.79), and 'analytical listening/concentrating on the structure' (M = 3.94, SD = 1.89). Moreover, mean ratings of 'emotional listening' were significantly higher in female respondents (M = 5.26, SD = 1.65) than male respondents (M = 4.23, SD = 1.99), t(77) = 2.47. Mean ratings of 'moving the body or parts of the body' were also higher in female respondents (M = 4.87, SD = 1.60) than male respondents (M = 3.77, SD = 1.99), t(77) = 2.67.

Music genres were correlated with personality factors. Genres such as rock song, ghazal, patriotic songs, sufi, classic, hip-hop, English songs, blues, jazz, trance and instrumental music were significantly correlated with 'openness'. 'Extraversion' was correlated with patriotic songs, melodious film songs and folk, whereas 'neuroticism' was negatively correlated with rock and rap, and 'conscientiousness' with English songs.

Significant correlations with listening styles were found for two personality factors. 'Neuroticism' was correlated with 'emotional listening', r(77) = .28, p < .05, and 'openness' with 'analytical listening', r(77) = .27, p < .05.

Respondents were asked what functions active or passive music listening have for them. Results indicated that music mainly serves as a 'source of pleasure and enjoyment', 'calms/releases stress/relaxes', 'motivates' or 'reminds of past events'. Significant correlations were found between hours spent in active listening and music that 'motivates', r(77) = .25, p < .05, and between passive listening and music that 'reminds of past events', r(77) = .38, p < .01.

Respondents rated their experienced rasa while listening to music. The highest ratings were for Śṛṅgāra rasa (representing emotions like arousal, longing, desire, naughtiness, romance and love, M = 5.16, SD = 1.79), followed by Santoṣa rasa (representing emotions like joy, pleasure, excitement, and contentment, M = 4.58, SD = 2.10), and Śānta rasa (representing emotions like steadiness, rest, and peace, M = 4.25, SD = 2.07). A significant difference in ratings of Santoṣa rasa was found between female respondents (M = 4.05, SD = 2.43) and male respondents (M = 5.10, SD = 1.59), t(77) = 2.25, p < .05.

The listening style 'visual and other associations' was significantly correlated with five functions served by music listening: It reminds me of past events, r(77) = .33, p < .01; It evokes visual images, r(77) = .42, p < .01; It moves me to tears/chills/other bodily reactions, r(77) = .24, p < .05; It functions as catharsis, r(77) = .44, p < .01; and It motivates, r(77) = .34, p < .01. This listening style was also significantly correlated with four rasa, namely Śṛṅgāra (r = .34, p < .01), Bhayānaka (r = .25, p < .05), Adbhuta (r = .27, p < .05), and Santoṣa (r = .25, p < .05), evoked by music listening. This indicates that respondents with this listening style could be more engaged and affected by the music they listen to.

Significant correlations with the 'openness' personality factor were found with five functions served by music listening. Previous research argues that, after exercise, music is the second most commonly used mood-regulation strategy in young people [20]. Various forms of musical engagement have been related to well-being through emotion regulation strategies [16]. Although this study did not investigate correlations between young adults' musical engagement and their psychological well-being, it can be hypothesized that young adults' day-to-day musical engagement may predict their psychological well-being.

This study proposes a tentative Music Engagement Model (MEM) for Young Adults (see Figure 1), which indicates gender differences in terms of music preferences, listening styles, types of listening, functions of music and rasa perceived. Music preferences and listening styles are correlated with the rasa perceived. Functions served by music listening are correlated with type of listening, listening styles and the rasa perceived. Personality factors are correlated with the rasa perceived, music preferences, functions of music, listening styles and type of listening (active/passive). Moreover, music listening occurs in a specific psychosocial context (e.g., place, time, mood). The proposed model assumes that variables like types of rasa, preferences for music genres, listening styles and functions served by music listening may predict psychological well-being. The proposed model and its assumptions are subject to verification.

C. Conclusion

In conclusion, investigating young adults' musical engagement, their background, and their personality enhances our understanding of their musical behaviour. This study puts forth several questions to be answered. For example, how do young adults develop preferences for music genres? How do they use music strategically to regulate their moods/emotions or to deal with their sufferings? When do they prefer listening to particular music? How many hours do they spend listening to a particular music genre in a day or a week? Future research could explore the musical experiences of young adults of both music and non-music backgrounds through in-depth interviews. Answers to these questions may help researchers explain the musical behaviour of young adults in ways that support their well-being.
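The genre/personality and listening-style associations reported in Study I are Pearson correlations with their two-tailed p-values. A minimal sketch of that computation, assuming scipy is available; the scores below are invented for illustration:

```python
# Pearson correlation between two sets of ratings, as in the r(77) values
# reported above. The scores are hypothetical, not the study's data.
from scipy import stats

neuroticism = [2, 3, 4, 3, 5, 2, 4, 5, 3, 4]          # invented BFI scores
emotional_listening = [3, 3, 5, 4, 6, 2, 5, 6, 3, 5]  # invented style ratings

r, p = stats.pearsonr(neuroticism, emotional_listening)
print(f"r = {r:.2f}, p = {p:.4f}")
```

With n = 77 respondents, as in the study, even modest correlations (r around .25) reach p < .05.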
Numerous studies have shown that personality traits play an important role in
determining well-being, especially extraversion being a strong predictor of positive affect
(e.g. Steel, Schmidt & Shultz, 2008). In music, positive effects of (choral) singing on
well-being have been reported (e.g. Beck et al., 2000). We explored the relationship
between personality and well-being in three amateur choirs (n = 57, 44 female, mean age
59.7 years), two amateur brass bands (n = 54, 20 female, mean age 34.1 years) and three
amateur theater groups (n = 34, 21 female, mean age 32.1 years). Participants completed
the Positive and Negative Affect Schedule (PANAS), the Perceived Stress Questionnaire
(PSQ) and the State-Trait Anxiety Inventory (STAI; state questionnaire only) before and
after a 1.5-hour rehearsal, as well as the short version of the Big Five Inventory (BFI-K)
after the session. We found differences between groups for extraversion and openness to
experience, indicating that members of the theater groups are more extraverted than choir
singers and that members of the brass bands are less open to experience than those of
choirs and theater groups. Multiple linear regression analyses show that the difference
score for positive affect between pre- and post-measurements can be predicted by
extraversion; a higher score on the extraversion scale accounts for a bigger increase of
positive affect. Members of the theater groups benefit more than brass band members. As
for the difference score of STAI, higher ratings for openness to experience were found to
predict the decrease of anxiety. Anxiety is predicted to decrease more for members of the
theater groups than members of the brass bands. Results are of relevance for hospitals,
retirement homes, and therapists that organize or recommend active musical activities for
their clients.
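The multiple linear regression described above predicts the pre/post difference score from a personality scale via ordinary least squares. A minimal sketch of that idea, using numpy's least-squares solver on invented data:

```python
# OLS regression of an affect difference score (post minus pre) on
# extraversion plus an intercept. All numbers are invented for illustration.
import numpy as np

extraversion = np.array([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 2.5, 3.8])
affect_change = np.array([0.1, 0.4, 0.5, 0.8, 0.9, 1.2, 0.2, 0.7])  # post - pre

X = np.column_stack([np.ones_like(extraversion), extraversion])  # design matrix
coef, *_ = np.linalg.lstsq(X, affect_change, rcond=None)
intercept, slope = coef
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
# A positive slope mirrors the reported pattern: higher extraversion
# accounts for a bigger increase in positive affect.
```

In the study itself further predictors (e.g., group membership) enter the same design matrix as additional columns.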
ISBN 1-876346-65-5 © ICMPC14 321 ICMPC14, July 5–9, 2016, San Francisco, USA
Saliva contains many of the chemical markers found in blood. It provides a non-invasive,
readily accessible and repeatable individual psychobiological snapshot. This exploratory
study examined the comparative effects of live and recorded music on pain, anxiety and
immune function of 50 tertiary students, 15 palliative care and 9 surgical patients. The
paper describes the interventions as well as the saliva collection procedures and
processes.
It was anticipated that the live interventions would generate a greater reduction in
measures of pain and anxiety, and greater improvement of immune function, than the
audiovisual, which in turn would have a greater effect than the audio-only conditions. It
was also predicted that music overall would have a larger effect on the outcome measures
than stories. Results partly supported these hypotheses, in particular with the clinical
population. Methodological issues and benefits of this psychoneuroendocrine approach to
music psychology research will be discussed in light of the results.
ABSTRACT

Chanting is a pervasive practice in almost every tradition all over the world. It has been found to improve attention and reduce depressive symptoms, stress and anxiety. The current study aimed to determine whether chanting "Om" for 10 minutes would improve attention and positive mood and increase feelings of social cohesion. The effects of vocal and silent chanting as a meditation practice were compared, as well as the effects of chanting for experienced and inexperienced chanters. It was hypothesized that vocal chanting would have a greater effect than silent chanting and that experienced chanters would report stronger effects. Experienced and inexperienced chanters were randomly allocated to one of two conditions: vocal chanting or silent chanting. Prior to and following chanting, participants completed the Digit-letter Substitution task, the Positive and Negative Affect Schedule, the Multidimensional Measure of Empathy and the Adapted Self-Report Altruism Scale. Following chanting, participants also completed a Social Connectedness Questionnaire and a manipulation check. Results showed that positive affect and altruism increased more following vocal than silent chanting. Furthermore, whereas altruism increased following both vocal and silent chanting for experienced participants, it only increased following vocal chanting for inexperienced participants. No significant differences between vocal and silent conditions were observed for empathy, attention, or social connectedness. Overall, the results indicate that chanting has a positive effect on mood and social cognition. The findings are discussed in view of current understandings of the psychological and emotional effects of music and synchronization.

I. INTRODUCTION

Although chanting is a pervasive practice around the world, used in many traditions as a way of deepening spiritual awareness, there is very little understanding of the psychological, emotional and social implications of this widespread practice.

There are many different styles of chanting, but all styles fall into two broad categories: vocalized and silent. Vocal chanting may be defined as the repetition of words or syllables that are either spoken or sung on the same note or a series of notes (Shearing, 2004). In contrast, "silent chanting" may be conceptualized as the repetition of imagined words or syllables in the absence of any vocalization.

As illustrated in Figure 1, vocal and silent chanting are forms of focused-attention (FA) meditation, a technique of concentration involving intense or prolonged focus on a single point (Bormann et al., 2006a). The most well-known form of FA meditation is mantra meditation, which involves focusing on the mental repetition of a specific sound or phrase, known as the mantra (Lutz et al., 2008; Bormann et al., 2014). Both vocal and silent chanting are forms of mantra meditation, with the sound or phrase of concentration being the mantra.

Figure 1. Chanting practices as a form of FA meditation.

Although a small body of research has identified some of the emotional and cognitive effects of chanting (Bernardi et al., 2001a; Kenny, Bernier & DeMartini, 2005; Pradhan & Derle, 2012; Wolf & Abell, 2003), no studies have compared the effects of silent and vocalized chanting, or the mediating effects of experience. The aim of the current study was to compare the effects of vocal and silent chanting on cognitive and affective states in both experienced and inexperienced chanters. Past research on explicit group synchronization has revealed that it leads to increased social cohesion (Valdesolo, Ouyang & DeSteno, 2010; Wiltermuth & Heath, 2009). Therefore, it is possible that explicit vocal chanting will enhance synchronization, such that the beneficial effects are stronger for this form of chanting than for silent chanting, which permits desynchronisation given the absence of an explicit joint-action task.

II. MATERIALS AND METHOD

A. Participants

The final analysis consisted of 45 inexperienced chanters (individuals who had chanted less than 5 times in total), consisting of 37 females and 8 males with ages ranging from 18 to 68 years (M = 25.11, SD = 13.07), and 27 experienced chanters (individuals engaging in chanting at least once a month for over 12 months), consisting of 13 females and 14 males with ages ranging from 18 to 68 years (M = 38.22, SD = 14.31). The inexperienced participants were recruited through the Macquarie University online participant pool or through social media. The experienced participants were recruited through social media and through a yoga studio. Participants recruited through Macquarie University were offered course credit for their participation. Members of the general public went into a draw to win 1 of 3 $50 vouchers for a yoga studio.
B. Materials

Participants completed the Digit-letter Substitution task (DLST; Natu & Agarwal, 1995), the Positive and Negative Affect Schedule (PANAS), the Multidimensional Measure of Empathy (MME; Davis, 1980), and the Adapted Self-Report Altruism Scale (SRA; Rushton, Chrisjohn & Fekken, 1981) prior to the experimental phase of the research. Following the experimental phase, participants completed the above measures again and also completed the Social Connectedness Questionnaire (SCQ) and the manipulation check. The experimental phase of the research involved listening to a recording of chanting, which was either chanted along with in the vocal condition, or listened to in the silent condition. This recording can be found at: https://www.youtube.com/watch?v=yoYrLM5rGX8&list=RDyoYrLM5rGX8#t=22.

C. Procedure

III. RESULTS

A. Assumptions of Normality

Variables were examined for their conformity to the assumptions of a two-way between-subjects analysis of variance (ANOVA). Assumptions of normality were met for all variables with the exception of negative affect, which had a positive skew and was slightly leptokurtic. It was deemed unnecessary to remove outliers as the dependent variables were normally distributed. Furthermore, the two-way ANOVA is robust to small violations of assumptions of normality (Kenny & Judd, 1986). Levene's tests were non-significant for all variables examined, indicating that homogeneity of variance was met. An alpha level of .05 was used for all significance tests.

B. Descriptive Statistics

The SCQ was completed post-chanting, and revealed that social connectedness was somewhat higher for experienced chanters in the silent condition, M = 3.15, SD = 0.69, than for experienced chanters in the vocal chanting condition, M = 2.71, SD = 1.27, inexperienced chanters in the silent condition, M = 2.83, SD = 0.937, or inexperienced chanters in the vocal condition, M = 2.64, SD = 0.953. For other measures, the means and standard deviations of pre-chanting and post-chanting scores for each of the four conditions are displayed in Tables 1-4.

Table 1. Experienced participants' scores before and after silent chanting (N = 13).

Table 2. Experienced participants' scores before and after vocal chanting (N = 14).

Table 4. Inexperienced participants' scores before and after vocal chanting (N = 22).
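The pre/post comparisons reported below rest on paired-sample t-tests and on post-minus-pre change scores. A minimal sketch of both steps, assuming scipy is available; the scores are invented for illustration:

```python
# Paired-sample t-test of pre- vs post-chanting scores for one group, plus
# the change score (post minus pre) used as the ANOVA dependent variable.
# All scores are hypothetical, not the study's data.
from scipy import stats

pre = [22, 25, 19, 30, 24, 27, 21, 26]   # invented positive-affect scores, pre
post = [25, 28, 23, 31, 27, 30, 24, 28]  # same participants, post

change = [b - a for a, b in zip(pre, post)]  # positive = post-experiment increase
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"mean change = {sum(change) / len(change):.2f}, "
      f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In the study, one such change score per participant then enters a 2 (vocal/silent) x 2 (experienced/inexperienced) ANOVA.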
C. Analyses

Paired-sample t-tests were used to examine the effects of chanting (before versus after) for each of the groups. Measures that significantly increased or decreased following the interventions are indicated in Tables 1-4 by one (p < 0.05) or two (p < 0.01) asterisks. The change scores for attention, positive affect, negative affect and empathy were calculated by subtracting the 'pre score' from the 'post score', with the change score used as the dependent variable. For example, the dependent variable for altruism was calculated by subtracting the pre-experiment altruism score from the post-experiment altruism score. As such, positive scores on the dependent variable represent a post-experiment increase in that variable, whereas a negative score represents a decrease. For each dependent variable, a two-way ANOVA was carried out using the independent variables of engagement and experience.

D. Positive Affect

A two-way ANOVA using the mean difference scores showed a significant interaction between type of chanting and amount of experience on positive affect, F(3, 68) = 6.320, p = .014, ηp² = .085. Positive affect increased more in the vocal chanting condition (M = 1.00, SD = 1.00) compared with the silent chanting condition (M = 0.95, SD = 0.98), and inexperienced chanters in the vocal condition (M = 2.55, SD = 1.22) showed a greater increase in positive affect than experienced chanters in the vocal condition (M = -0.65, SD = 1.53).

Post hoc comparisons were conducted using a Bonferroni-adjusted alpha level of 0.025. These comparisons revealed that inexperienced chanters showed a significant overall increase in positive affect (p = 0.003), whereas experienced chanters showed no significant increase. Furthermore, inexperienced chanters showed a significant increase in positive affect only for the vocal chanting condition (M = 2.55, SD = 5.70) and not the silent chanting condition (M = -2.91, SD = 5.95).

E. Altruism

A two-way ANOVA using the mean difference scores showed a significant overall effect of type of chanting on altruism, F(3, 68) = 9.097, p = .004, ηp² = .118. Altruism increased to a greater extent following vocal chanting (M = 4.18, SD = 0.65) than silent chanting (M = 1.38, SD = 0.66). Chanting experience did not significantly affect altruism, F(3, 68) = 2.935, p = .091, ηp² = .041, and there was no significant interaction between level of experience and type of chanting on altruism, F(3, 68) = 0.002, p = .961, ηp² = .000.

F. Manipulation Check

To determine whether participants were chanting in both conditions, a manipulation check was conducted by measuring how much participants felt they engaged in the practice. The results of a two-way ANOVA revealed significantly higher engagement scores in the vocal chanting condition compared with the silent chanting condition, F(3, 68) = 31.451, p = .007, ηp² = .101. Also, engagement scores were significantly higher for experienced chanters than for inexperienced chanters, F(3, 68) = 7.634, p < .001, ηp² = .316.

IV. DISCUSSION

Consistent with previous research, chanting increased positive mood, decreased negative mood and improved attention. Furthermore, altruism increased more following vocal chanting than silent chanting, suggesting that an explicit joint-action activity is more effective at creating feelings of social connectedness than a silent group activity. These findings indicate that chanting not only has a positive effect on mood and cognition; it also has the ability to improve feelings of social connectedness. Table 5 provides a synopsis of the significant increases and decreases of each variable following vocal and silent chanting. Significant benefits were observed in all four participant groups, but it is especially striking to note that inexperienced chanters exhibited significant benefits following vocal chanting in all five measures of mood and social cohesion. Although the source of these effects has yet to be determined, Figure 2 illustrates some hypothetical mechanisms underlying these benefits.

Table 5. Significant increases and decreases of measures for experienced and inexperienced participants following silent and vocal chanting conditions.

Figure 2. Factors associated with the benefits of chanting.

V. CONCLUSION

Chanting is a pervasive practice that has been part of human behaviour for thousands of years, yet the emotional and cognitive effects of chanting are not well understood. The results of this investigation demonstrate that the benefits of vocal chanting may be mediated by three factors: group synchronization, which increases feelings of social cohesion; physiological changes (breath control and singing), which may contribute to increased positive mood; and focused attention, which may inhibit ruminative thinking and lead to increased positive mood.

As one of the first studies to systematically investigate the benefits of chanting, the current study provides a basis for future research on the emotional, social, and cognitive benefits of this pervasive human activity. As revealed by this investigation, chanting has emotional and social benefits for both experienced and inexperienced individuals, and these benefits can be observed following either vocal or silent chanting. The extent of these benefits was dependent on the type of chanting meditation and the experience of participants, with inexperienced participants who engaged in vocal chanting reaping the greatest number of significant benefits. Such results are encouraging, and suggest that chanting may be an effective tool for enhancing mood and creating a sense of social cohesion. These benefits, in turn, may be associated with increased health and wellbeing.

ACKNOWLEDGMENTS

This research was supported by an Australian Research Council Discovery Grant DP130101084 awarded to WFT and by a fellowship awarded to VP by the Australian Research Council Centre of Excellence in Cognition and its Disorders.

REFERENCES

Bernardi, L., Gabutti, A., Porta, C., & Spicuzza, L. (2001a). Slow breathing reduces chemoreflex response to hypoxia and hypercapnia, and increases baroreflex sensitivity. Journal of Hypertension, 19(12), 2221-2229.

Bernardi, L., Sleight, P., Bandinelli, G., Cencetti, S., Fattorini, L., Wdowczyc-Szulc, J., & Lagi, A. (2001b). Effect of rosary prayer and yoga mantras on autonomic cardiovascular rhythms: Comparative study. British Medical Journal, 323(7327), 1446-1449.

Bormann, J. E., Becker, S., Gershwin, M., Kelly, A., Pada, L., Smith, T. L., & Gifford, A. L. (2006a). Relationship of frequent mantra repetition to emotional and spiritual wellbeing in healthcare workers. Journal of Continuing Education in Nursing, 37(5), 218-224.

Bormann, J. E., Oman, D., Walter, K. H., & Johnson, B. D. (2014). Mindful attention increases and mediates psychological outcomes following mantra repetition practice in Veterans with posttraumatic stress disorder. Medical Care, 52, S13-S18.

Davis, M. H. (1980). A multidimensional approach to individual differences in empathy. Catalog of Selected Documents in Psychology, 10(85). Retrieved from http://www.researchgate.net/publication/34891073_A_Multidimensional_Approach_to_Individual_Differences_in_Empathy

Kenny, M., Bernier, R., & DeMartini, C. (2005). Chant and be happy: The effects of chanting on respiratory function and general wellbeing in individuals diagnosed with depression. International Journal of Yoga Therapy, 15, 61-64.

Lutz, A., Slagter, H. A., Dunne, J. D., & Davidson, R. J. (2008). Attention regulation and monitoring in meditation. Trends in Cognitive Sciences, 12(4), 163-169.

Natu, M. V., & Agarwal, A. K. (1995). Digit letter substitution test (DLST) as an alternative to digit symbol substitution test (DSST). Human Psychopharmacology: Clinical and Experimental, 10, 339-343. doi: 10.1002/hup.470100414

Pradhan, B., & Derle, S. G. (2012). Comparison of effect of Gayatri Mantra and Poem Chanting on Digit Letter Substitution Task. Ancient Science of Life, 32, 89-92.

Rushton, J. P., Chrisjohn, R. D., & Fekken, G. C. (1981). The altruistic personality and the self-report altruism scale. Personality and Individual Differences, 2(4), 293-302. Retrieved from http://www.subjectpool.com/ed_teach/y3project/Rushton1981THE_ALTRUISTIC_PERSONALITY.pdf

Shearing, C. R. (2004). Gregorian chants: An illustrated history of religious chanting. London: Mercury Books.

Valdesolo, P., Ouyang, J., & DeSteno, D. (2010). The rhythm of joint action: Synchrony promotes cooperative ability. Journal of Experimental Social Psychology, 46(4), 693-695. doi: 10.1016/j.jesp.2010.03.004

Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science, 20(1), 1-5. doi: 10.1111/j.1467-9280.2008.02253.x

Wolf, D. B., & Abell, N. (2003). Examining the effects of meditation techniques on psychosocial functioning. Research on Social Work Practice, 13, 27-42.
In recent years choirs have become increasingly popular, and this is likely due to the
effects that they can have on people’s wellbeing, and the capacity they have for
rebuilding a sense of community. This symposium brings together researchers who are
investigating the positive role choirs and group singing can play in a variety of contexts.
This includes the role of choirs on intercultural bonding on college campuses, in people
with Parkinson’s Disease, in building communities in London, and in older adults taking
group singing lessons. This is relevant to anyone with an interest in the ways that
community and health can be influenced by singing in groups, and in particular those
who are interested in the ways in which singing can produce these strongly positive
effects. Presenters will discuss the impact that their choral groups have had on the
health, social and intellectual lives of singers, as well as the mechanisms that might
underpin these changes. Challenges will also be discussed. Under the auspices of the
AIRS (Advancing Interdisciplinary Research in Singing) project, presentations will
primarily address ongoing work, but will also outline prospective projects that aim to
expand beyond the populations currently studied. Daniel Weinstein and his colleagues
(UK) explore the effect of choral singing on the fostering of social closeness. In particular
they considered whether increasing size of the group is a limiting factor and found that
this was not the case. Frank Russo (Canada) describes the formation of a choral group for
persons with Parkinson’s disease. Jennifer Bugos (US) reports on a study in which older
adults engaged in group singing lessons for a period of six weeks. Annabel Cohen and
colleagues (Canada/Malta) describe the challenges of creating a multicultural choir on a
college campus. Jacques Launay (UK) will discuss the papers from the point of view of
social bonding and health benefits but also from the perspective of connecting research to
practice, helping to answer the question of whether our research knowledge can benefit
the public engaged in singing activities and how group singing can inform understanding
of the power of musical shared engagement.
Prior research has shown that singing has many positive physical health benefits and can
also foster prosocial behavior. We investigated group singing as a means of social
bonding in both small and large choir contexts. While previous research has shown that
music making in small group contexts is an effective means of maintaining social
cohesion, the question of whether this effect could occur in larger groups remains
unanswered. We set out to compare the social bonding effects of singing in small,
familiar contexts versus a large, unfamiliar context. The current study recruited
individuals from a community choir, Popchoir, that meets in small choirs weekly, and
comes together annually to form a large choir combining individuals from the smaller
subchoirs. Participants gave self-report measures of affect and social bonding and had
pain threshold measurements taken (as a proxy for endorphin release) before and after 90
minutes of singing. Positive affect, social bonding measures, and pain thresholds all
increased across singing rehearsals. We found no significant difference for pain
thresholds between the small versus large group context. Additionally, levels of social
bonding were found to be greater at pre- and post-levels for the small (familiar) choir
context. However, the large choir context showed a greater change in social bonding
survey measures as compared to the small context. We find that group singing is an
effective means to foster social closeness, even in large group contexts where many
individuals are unfamiliar with one another.
People with Parkinson’s disease (PD) experience difficulty producing expressive aspects
of vocal and facial communication, resulting in a permanently flat expression that has
been referred to as “masked face syndrome”. It has also been found that PD patients have
decreased ability to correctly identify emotion in the facial expressions of others (e.g.,
Dujardin et al., 2004; Buxton, McDonald & Tippett, 2013). Identification deficits appear
to be due in part to deficits in automatic facial mimicry (Livingstone, Vezer, McGarry,
Lang & Russo, under review). Because expressive aspects of vocal and facial
communication are an important part of social interaction, impairments can lead to an
array of emotional health issues such as anxiety, depression and reduced quality of life
(Cummings, 1992; McDonald, Richard & DeLong, 2003). Despite its importance for
social and emotional wellbeing, few rehabilitative programs have targeted expressive
aspects of vocal and facial communication in PD patients. Based on recent research that
our lab has conducted demonstrating a positive effect of singing on expressive aspects of
communication in PD, the current study reports on a 13-week choir for people with PD
that emphasizes vocal and facial expressiveness. We hypothesize that the training
program will improve participants’ expressiveness as well as their emotional mimicry and
their ability to accurately perceive emotion in others, ultimately enhancing interpersonal
communication and quality of life.
Music training can contribute to melodic expectations. Research suggests that vocal
improvisations by young adult musicians end on the tonic, and those by young adult non-
musicians are less likely to do so (Pan & Cohen, 2015). Little is known regarding the
effects of training on vocal improvisations in novice older adults. Vocal improvisation
requires the ability to generate musical phrases, and similarly, verbal fluency is the ability
to generate words. Deficits in verbal fluency are one of the first symptoms of cognitive
impairment in aging adults (Laisney et al., 2009). Thus, we examined these areas in
conjunction with processing speed, an area known to affect verbal fluency. The purpose
of this study was to evaluate vocal improvisations and cognitive performance in novice
older adult vocal students following group vocal lessons. Thirteen healthy, older adults
were recruited from independent living facilities, and screened for cognitive impairment.
Participants completed measures of verbal fluency, simple processing speed, music
aptitude, and vocal achievement pre- and post-training. Participants attended eight weeks
of group vocal instruction. Each two-hour class included exercises in vocal technique,
music reading, and vocal independence. Vocal improvisation exercises were performed
with better intonation post-training; however, most participants repeated the same
material multiple times prior to ending the phrase. Consistent melodic contour for
improvised endings was found pre- and post-training. In addition, familiar melodies were
represented in repeated material. Results of a paired-samples t-test showed significant
increases in letter fluency, post-training. Group vocal training can instill confidence in
novice vocalists allowing them to improvise more freely. Improvisation exercises within
the context of a vocal training course may contribute to enhanced verbal fluency, an area
that can impact the quality of life in older adults.
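The pre/post comparison of letter fluency described above is a standard paired-samples t-test, in which each participant serves as their own control. A minimal sketch with illustrative scores (not the study's data):

```python
from scipy import stats

# Hypothetical pre- and post-training letter-fluency scores for five
# participants -- illustrative values only, not the study's data.
pre = [8, 9, 7, 10, 6]
post = [11, 11, 10, 14, 8]

# Paired-samples t-test on the within-participant differences.
t_stat, p_value = stats.ttest_rel(post, pre)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With only a handful of participants (as here, n = 13), the paired design is important: it removes stable between-person differences in fluency from the error term.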
Establishing social bonds is one of the first challenges students face on arriving at a
university for the first time, and this challenge is greater for those who “come from
away”. Joining a choir might provide one means to meet this challenge; however, many
campus choirs are exclusive audition choirs, and their aim is often to perfect a
performance program. In contrast, a song circle approach might have wider appeal and
offer common ground for friendship and appreciation of one’s own and others’ cultures.
Over a five-year period a protocol has been in development for a campus “multicultural
choir and song circle” in which participants are encouraged to share in the responsibility
for the repertoire while they are provided the scaffolding to lead with the help of a
somewhat “behind-the-scenes” team consisting of the nominal choir director,
collaborative pianist, and research director and associates. A questionnaire before and
after a 10-week session revealed that the goals of the 19 participants in joining were met
and in some cases were exceeded. While members share the overriding goal of having
fun, their diverse interests, such as singing to perform versus singing to share in a joint
activity, can sometimes conflict. Pressure to meet standards for external performance requests can
upset the group cohesion. A solution entailing two groups – one that performs and one
that does not—is not likely practical, at least on a small campus, and ways of achieving
compromise will be discussed. The concept of the multicultural choir and song circle
initiated at a small Canadian university has been duplicated at a larger European
university for two consecutive years, suggesting the transferability of the concept. The
research suggests that the university multicultural choir and song circle adds social
capital to the parent university as well as to the broader community.
recovering within the first two years) with music intervention, one
control group of 11 healthy participants with music intervention and
one baseline control group of 11 healthy participants without music.

Study design

An 8-week piano-tuition intervention program was developed. All
participants were non-musicians. The participants received two
lessons a week in the lab with an instructor, 30 minutes each lesson,
with instructions to practice a minimum of 15 minutes each day.
Mean training time per week was 3 hours. The curriculum comprised
20 short pieces for beginners, the last pieces for two hands, with the
melody in the right hand and chords in the left.

Fig. 1. ANOVA for repeated measures shows no significant interaction
of the groups. Effect only in the two control groups. No effect in the
patient group.
Neuropsychological tests
Fig. 3. Training effects of eight-week piano training: functional
activation changes in Orbitofrontal cortex, patient group.

III. DISCUSSION

In analysing fMRI data we found significant changes in orbitofrontal
cortex. This network regulates higher-order cognitive processing such
as executive functions, including attention, concentration, decision-
making, impulse control and social interaction. The OC has widely
distributed interconnected neural networks reaching almost all areas of
the brain. Auditory processing and motor areas are densely interconnected
with other prefrontal cortical regions, reflecting integration for
executive motor control, which is evident in playing the piano. The OC's
connectivity to association cortices is of special interest, as this is one
of the most important connectivities for developing new pathways with
reference to the NMT intervention applied (Schlaug, 2009). This neural
process may link up new pathways and explain the functional and
structural changes in the neural networks reported in this study.

Interestingly, there were significant differences between the two
music groups in structural changes post intervention. In the control
group with music there was significantly increased synaptic density in
the Basal Ganglia (BG), the motor area. There were no BG changes in
the TBI patient group. However, this group revealed significant changes
in the Lingual Gyrus, Occipital cortex, the neural network for
sight-reading musical notation. There may be several reasons for this
difference. Firstly, the control group may employ less effort in reading
the notation, while the patient group concentrated more on learning and
reading notes. This area is active in analysing logical conditions and
recognition of words, which some TBI patients reported as their specific
deficit. Secondly, the control group had a larger repertoire, and
therefore spent more training time stimulating the Basal Ganglia. The
training time may be responsible for the differences in structural
changes in BG and explain why the patient group did not have any
significant changes in the motor area.

Another possible explanation of the structural changes in the TBI
group's Lingual Gyrus (LG) and the changes in Orbitofrontal cortex may
be the connectivity between the Temporal Pole (TP) and the Ventral
Visual Stream during social interaction. A recent study demonstrated
that the TP integrates information from different modalities (in this
study, music) and top-down modulates lower-level perceptual areas in
the ventral visual stream as a function of integration demands (Pehrs
et al., 2015). The TP is part of the association cortex and is involved
in multimodal sensory integration (Olson et al., 2007; Skipper et al.,
2011). The results indicate interactive neural activity within the
stream, consistent with findings from object recognition. The authors
suggest there may be bottom-up perceptual information flow from the TP
to higher-order cognitive areas, for example via functional connectivity
to the orbitofrontal cortex (Kahnt et al., 2012). We suggest a possible
explanation of the TBI group's structural changes in Lingual Gyrus and
Orbitofrontal cortex in line with this theory, namely a bottom-up
connectivity from TP to OC responsible for the functional and structural
changes in the neural networks.

A limitation of the study is the small number of participants in the TBI
patient group. Future studies should also include a second TBI patient

IV. CONCLUSION

The novel key finding of this study is that regular piano playing during
8 weeks can lead to structural and functional reorganization of the
neural networks in TBI patients with cognitive impairment. The results
demonstrated significant training-related neuroplasticity with enhanced
cognitive performance in both music groups, supporting the aim of the
intervention: to restore cognitive function in TBI patients. Additional
research may consider the variables of intervention length and
practising time, together with an increased number of participants, to
establish an intervention adapted for future cognitive music-supported
rehabilitation programs for TBI patients.

REFERENCES

Altenmüller, E. & Schlaug, G. (2015). Apollo's gift: new aspects of
neurologic music therapy. Progress in Brain Research, Vol. 217.
ISSN 0079-6123.

Balanger, H.G., Vanderploeg, R.D., Curtiss, G. & Warden, D.L. (2007).
Recent neuroimaging techniques in Mild Traumatic Brain Injury.
Journal of Neuropsychiatry and Clinical Neurosciences, 19, 5-20. In
Sigurdadottir, S. (2010). A Prospective Study of Traumatic Brain
Injury. PhD dissertation, Department of Psychology, Faculty of Social
Sciences, University of Oslo, Oslo, Norway.

Brown, S., Martinez, M.J. & Parsons, L.M. (2006). Music and language
side by side in the brain: a PET study of the generation of melodies
and sentences. European Journal of Neuroscience, 23, 2791-2803.

Hegde, S. (2014). Music-based cognitive remediation therapy for
patients with traumatic brain injury. Frontiers in Neurology.
http://journal.frontiersin.org/article/10.3389/fneur.2014.00034/full

Herholz, S.C. & Zatorre, R.J. (2012). Musical training as a framework
for brain plasticity: behaviour, function, and structure. Neuron, 76,
486-502. doi:10.1016/j.neuron.2012.10.011

Herholz, S.C., Coffey, E.B.J., Pantev, C. & Zatorre, R.J. (2015).
Dissociation of neural networks for predisposition and for
training-related plasticity in auditory-motor learning. Cerebral
Cortex, 1-10.

Kahnt, T., Chang, L.J., Park, S.Q., Heinzle, J. & Haynes, J.D. (2012).
Connectivity-based parcellation of the human orbitofrontal cortex.
Journal of Neuroscience, 32, 6240-6250.

Koelsch, S., Fritz, T., Schulze, K., Alsop, D. & Schlaug, G. (2005).
Adults and children processing music: an fMRI study. NeuroImage, 25,
1068-1076.

Koelsch, S. (2012). Response to target article "Language, music, and
the brain: a resource-sharing framework". In Language and Music as
Cognitive Systems. Oxford: Oxford University Press.

Olson, I.R., Plotzker, A. & Ezzyat, Y. (2007). The enigmatic temporal
pole: a review of findings on social and emotional processing. Brain,
130, 1718-1731.

Parsons, L., Sergent, J., Hodges, D.A. & Fox, P.T. (2005). The brain
basis of piano performance. Neuropsychologia, 43, 199-215.

Patel, A.D., Peretz, I., Tramo, M. & Labreque, R. (1998). Processing
prosodic and musical patterns: a neuropsychological investigation.
Brain and Language, 61, 123-144. In Brown, S., Martinez, M.J. &
Parsons, L.M. (2006), Music and language side by side in the brain: a
PET study of the generation of melodies and sentences, European
Journal of Neuroscience, 23, 2791-2803.
The health of musicians’ hearing has received attention due to increased legislation
protecting employees from noise in working environments. Help Musicians UK (HMUK)
is a charity for professional musicians of all genres throughout their careers. In 2014,
HMUK conducted a national survey into musicians’ health, and found that 47% of the
sample (n=552) reported that they had experienced hearing loss. In response, HMUK
conducted another survey investigating the prevalence and type of hearing problems,
help-seeking behaviour, worry about noise at work, and use of hearing protection. A
sample of 693 professional musicians took part, the majority being orchestral or
instrumental musicians. Quantitative and qualitative data were analysed using SPSS and
NVivo software. Results revealed contrasting patterns of associations between subjective
perceptions of risk, experiencing symptoms, help seeking behaviour, hearing loss (HL)
diagnosis, worry about noise at work, awareness of noise legislation and knowing
colleagues with hearing loss. A large proportion of the sample had experienced HL or
other hearing issues and many attributed hearing problems to their musical careers.
Whilst two-thirds reported wearing hearing protection, frequently used, inexpensive
forms of protection were given much lower effectiveness ratings than infrequently used
expensive forms of protection. Orchestral and instrumental musicians were significantly
more likely to use protection than singers and pianists. No association was found between
having a test and the use of protection, suggesting that hearing tests should be
accompanied by advice about the risks of NIHL to support improved protection
behaviour and uptake. Results will be interpreted in relation to existing literature on
musicians’ hearing and health promotion behaviour, and the data used to delineate the
potential roles and responsibilities of employers, musicians' charities and musicians
themselves in improving hearing protection uptake.
Music engagement has been associated with a wide range of health and well-being
benefits. There are however differential outcomes depending on a variety of factors such
as listener contexts and emotion regulation styles. This study explored the relationships
between the various motivations for engaging with music and self-reported perceptions of
musical capacity as captured by the MUSEBAQ, a comprehensive and modular
instrument for assessing musical engagement, in a sample of 2964 adults (aged 18-87;
59% females; 55% without formal music training). This study further examined the
mediating role of emotional sensitivity to music between emotion regulation using music
and emotional well-being. Findings from mediation analyses suggest that the purposeful
use of music to regulate emotions may not necessarily predict better well-being
outcomes. Higher levels of emotional well-being were predicted only with a greater
capacity to experience strong emotions through music. This study suggests that when
evaluating benefits of music engagement on well-being outcomes, individual variations
in music capacity and engagement need to be carefully considered and further researched.
Background
Affect regulation is a common function of music listening. However, experimental
findings regarding its efficacy are mixed. The majority of previous studies have
employed university-aged samples, a silent control condition, and researcher-prescribed
classical, or relaxing music.
Aims
The current study evaluates the efficacy of a self-chosen music listening intervention with
older (OA) and younger adults (YA).
Method
Forty YA (18-30 years, M = 19.75, SD = 2.57, 14 males) and forty OA (60- 81 years, M =
69.06, SD = 5.83, 21 males) completed a measure of music listening functions and
training, demographic information, and were asked to make a playlist with 15 minutes of
music they would listen to in a stressful situation. Eligible participants visited the
laboratory and were randomised to either the intervention (10 minutes of music listening)
or the control condition (10 minutes of listening to a radio documentary). Negative affect
(NA) was induced in all participants using the Trier Social Stress Test, followed by the
intervention/control condition. Measures of self-reported affect (Visual Analogue Scales:
Excited-Depressed, Happy-Sad, Calm-Tense, Content-Upset, Relaxed-Nervous, Serene-
Stressed, Alert-Bored, Active-Fatigued) were taken at baseline, post-induction and post-
intervention.
Results
A 3 (baseline, post-induction, post-intervention) x 2 (control, intervention) x 2 (younger,
older adult) mixed-ANOVA found significant time*group interaction effects for Happy-
Sad, Relaxed-Nervous, and Active-Fatigued, and both time*group and time*age
interactions for Excited-Depressed and Alert-Bored. A self-chosen music listening intervention was more
effective than an active control at reducing induced NA, and was most effective for older
adults - reducing NA below baseline levels.
Conclusions
OA have rarely been included in music studies. Findings here suggest music listening
may support advanced emotion regulation capabilities in older adulthood, and will be
discussed in relation to socio-emotional selectivity theory. Age differences in intervention
effects may also result from differences in music chosen for the purposes of affect
regulation. 66% of the OA sample chose classical music pieces, similar to those used in
previous research; however, only 3% of the YA sample included classical music, with
more choosing upbeat music from rock, pop, and dance genres. Results from this self-
chosen music listening intervention are in line with past studies using classical pieces or
music to induce particular moods. We found that self-chosen music for affect regulation
significantly reduced negative affect, particularly in older adults for whom it decreased
below baseline levels, in comparison to the control condition.
Music is a complex auditory stimulus and a universal feature of human societies. The use
and effects of music in clinical settings is of growing interest. Studies have shown that
listening to music before or during a surgery can reduce patients’ anxiety and pain levels
and that music therapy can add beneficial value in health care and rehabilitation.
Research in this area has recently become more robust by combining more diverse
disciplines, but there is a clear need for research looking at the complexity of music
perception and the extent and limits of the treatment effects of music. There is a demand
for research looking at the specificities of music and the mechanisms of how these
influence wellbeing and mood. Additionally, small sample sizes and the lack of studies
combining subjective and objective outcome measures evaluating anxiety and pain levels
or benefits in health care restrict the significance of reported effects. Therefore, the
present symposium combines fundamental research and applied science in order to
highlight new findings on the perception of music, the biological mechanisms of the
effects of music and demonstrates how music can be used in clinical settings. More
precisely, the exploration of the neurochemistry and social flow of singing will be
examined and music perception in individuals with cochlear implants will be discussed.
Further, the symposium includes a paper presenting clinical data investigating the effects
of listening to music during a caesarean section on patients’ subjective and objective
anxiety levels. The aim of the symposium is to bring together new research looking at the
effective use of music in the medical sector and to give impulses for future research in
this area.
People often report a feeling of unity and trust when they perform music together;
however, little is known about the neurobiological processes that are associated with that
experience. Increased understanding of these neurobiological mechanisms could have
clinical implications for people with social deficits in particular. In this feasibility study
we explored neurobiological and self-reported measures associated with social bonding,
stress, and flow state during group vocal jazz improvisation. Neurobiological measures
included the neuropeptide oxytocin (previously shown to regulate social behavior and
trust) and adrenocorticotropic hormone (ACTH), which may in part mediate the
engagement and arousal effects of music. Flow state is a positive intrapersonal state
experienced when the difficulty of a task matches or slightly exceeds one’s current ability
to perform the task. Flow has been positively correlated with the quality of interpersonal
relationships and may be achieved through various music tasks. A within-subjects,
repeated measures design was used to assess changes in oxytocin and ACTH during
group vocal jazz (quartet) singing in two conditions: pre-composed and improvised. Self-
reported experiences of flow were measured after both conditions. The primary aim of
this small-sample feasibility study was to evaluate the research protocols. To our
knowledge, this study was among the first to assess neurobiological measures in vocal
jazz improvisation. Our procedures were implemented as planned and only mild
modifications of the protocol are indicated prior to a larger scale endeavor. Though not
the primary focus of this study, initial neurobiological and flow state data were generated.
Group singing was associated with a decrease in ACTH and a mean increase in oxytocin.
Flow state was experienced by all participants in both conditions. Although preliminary,
group singing and vocal improvisation may influence social relationships and reduce the
biological stress response.
Cochlear implants represent a true modern miracle of healthcare today. The ability to
restore hearing in children with profound hearing difficulty was once regarded as science
fiction. Today, we routinely provide deaf children with the ability to gain high-level
speech perception through cochlear implantation. Although this success has been
inspiring on many levels, it has also encouraged both deaf patients and the biomedical
community to pursue ever more demanding auditory challenges. The ability to hear
music represents the most sophisticated and difficult of these challenges, and is arguably
the pinnacle of hearing. Through the use of music as both an inspiring target for which to
strive and also as a tool from which to learn, we have learned a great deal about the
limitations of current cochlear implant technologies. Pitch represents one of the most
fundamental attributes of music, and is also the aspect of music that is the most difficult
for CI users. In this presentation, we will discuss the implications of limited pitch
perception for CI users, and consider several case studies of CI users that have attempted
to pursue musical activities despite these obstacles.
Several studies have shown that music interventions before and during surgery can lead
to reduced anxiety and pain levels of the patient. The use of this positive effect in the case
of caesarean sections could be beneficial for the expectant mother but has not found
much attention in the literature yet. The present study investigates the effects of music
interventions during the caesarean section on subjective (State Trait Anxiety Inventory,
visual analogue scale for anxiety) and objective (cortisol and amylase from saliva)
measures of anxiety, as well as heart rate measured before, during and after the caesarean
on the day of surgery. The patients are randomly allocated to the experimental group,
listening to music in the operating room during the caesarean section, or to the control
group, undergoing the procedure without any music intervention. Data from a pilot phase
(N = 45) revealed significantly lower heart rates (p = .048) and a trend (p = .061) of
reduced cortisol levels during the caesarean section in the experimental group compared
to the control group. The subjective measures showed descriptively lower anxiety levels
in women listening to music. These first results are promising and suggest that listening
to music during a caesarean section leads to a reduction of anxiety and stress experienced
by the expectant mother. This positive effect could also beneficially influence pain
perception and recovery from the operation which are interesting areas for future
research. 170 participants have completed the study so far and results from a larger
sample (N ~ 250) will be presented at the conference.
The Russian composer Vissarion Shebalin (1902-1963) holds a famous place in music
psychology. As documented by Alexander Luria (1965), Shebalin suffered strokes that
left him severely aphasic, yet he was still able to compose, and produced works admired
by his contemporaries. His case is often taken as evidence that music and language
processing can be completely dissociated in the brain. However, the musical structure of
his compositions pre- and post-stroke has never been systematically examined. By
collecting and analyzing pre- and post-stroke string quartets, the current study finds
evidence that Shebalin’s post-stroke music showed a marked shift to a pseudo-atonal
style and the use of nonfunctional harmony, as well as a more compact formal
construction. These findings are interesting in light of recent research and theorizing in
cognitive neuroscience regarding relations between the processing of linguistic and
musical syntax (e.g., the ‘Shared syntactic integration resource hypothesis’, Patel, 2003).
The complexity of his case is increased by the historical context of Shebalin’s work. By
conducting archival research and interviews in Russia, I found that his post-stroke
output also coincided with a more relaxed political environment, which may have
encouraged musical experimentation. This makes it difficult to ascribe the change in his
compositional style entirely to his strokes, but supports the larger point that Shebalin’s
music was saliently different before and after his strokes, and that he is thus not a simple
example of music-language dissociation after brain damage.
A growing body of work points to a relationship between rhythm abilities and language
skills, including grammar. Individual differences in sensitivity to rhythmic aspects of
speech could be a mechanism underlying the association between musical rhythm and
grammar skills found in our preliminary work. To assess this, we first modified a
rhythmic speech production task and conducted a pilot study to determine whether
typically developing six-year-olds would be able to do the following: 1) consistently say
a given phrase with the same rhythm across multiple trials; 2) maintain a steady internal
pacing; and, 3) synchronize their speech with an isochronous metronome.
We employed a modified version of the speech cycling task (Cummins and Port, 1998),
in which participants speak short, metrically regular phrases along with an isochronous
metronome. Twelve six-year-old children repeated a set of ten phrases (e.g., “beg for a
dime”) with two patterns (whole note metronome: one click per cycle, and half-note
metronome: two per cycle).
Primary speech beats within each phrase were marked using an algorithm and analyzed
as follows: synchrony with the metronome (external phase), synchrony between phrases
(period), and synchrony within each phrase (internal phase). Preliminary findings show
clear metrical organization of speech beats within each phrase. Both the period and
external phase analyses revealed that metronome pattern (i.e., presence of a subdivided
beat) significantly influenced children’s performance on the task, suggesting the
importance of external rhythmic cues in scaffolding children’s underlying synchronization.
Pilot results show promise for future use of the paradigm to test an overlap of speech
rhythm sensitivity, musical rhythm perception, and expressive grammar abilities.
Ongoing work entails validating the speech cycling paradigm and enhancing its
feasibility for assessment and intervention purposes.
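The period and phase analyses described above rest on circular measures of how speech beats align with an isochronous metronome. A minimal sketch of one such measure (the mean resultant length of the relative phases), using hypothetical beat times rather than the study's data:

```python
import numpy as np

def relative_phase(beat_times, period):
    """Phase of each speech beat relative to an isochronous metronome
    with the given inter-click interval; phase lies in [0, 1)."""
    return (np.asarray(beat_times) % period) / period

def synchrony(phases):
    """Mean resultant length of phases on the circle: 1.0 means a
    perfectly consistent phase; values near 0 mean no consistent
    alignment with the metronome."""
    angles = 2 * np.pi * np.asarray(phases)
    return float(np.abs(np.mean(np.exp(1j * angles))))

# Hypothetical beat times (seconds) against a 1.5 s metronome cycle --
# illustrative values only.
tight = [0.02, 1.52, 3.01, 4.49]   # beats close to each click
loose = [0.10, 1.90, 2.60, 4.80]   # beats drifting off the clicks

print(synchrony(relative_phase(tight, 1.5)))  # near 1
print(synchrony(relative_phase(loose, 1.5)))  # clearly lower
```

Working on the circle (via complex exponentials) rather than averaging raw phases matters because a beat just before a click (phase 0.99) and one just after (phase 0.01) are nearly identical in timing, and the circular mean treats them as such.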
addition, we assessed a variety of potential confounding factors, including handedness, bilingualism, dialect exposure, and perceptual acuity. If musical ability can predict unique variance in the ability to adapt to a foreign accent even after controlling for these potential confounds, this would suggest that musicians' advantages in speech sound processing extend to the ability to rapidly adjust and adapt to non-standard segmental (non-pitch-based) aspects of phonology.

II. METHODS

A. Participants
Eighty-three native English-speaking participants without Korean language experience but with normal hearing (by self-report) were recruited from the University of Maryland, College Park and surrounding area. Fifty of the participants (60.2%) self-identified as musicians. Participants either received course credit or $10/hour for their participation.

B. Stimuli and Procedure
1) Accent Adaptation Task. Word and sentence stimuli were selected from the Wildcat Corpus of Native- and Foreign-Accented English (Van Engen et al., 2010) and the Korean-English Intelligibility Database, both via the Online Speech/Corpora Archive and Analysis Resource (OSCAAR). Both databases contain words and sentences recited in English by native Korean speakers. Words and sentences were selected such that, given the speaker's accent, they could easily be interpreted as sounding like either of two distinct words (e.g., "ball" vs. "bowl"), both of which could easily be represented with a picture. Pictures corresponding to the target and foil referents of each stimulus were selected and judged for appropriateness and representational aptness by […]

2) […] abilities, and emotions) as well as a general sophistication factor. Because these factors are highly related, for the present study we focused on the general sophistication factor. The Musical Ear Task (Wallentin et al., 2010) requires participants to make same/different judgments on pairs of musical stimuli. In one block (MET-Melody), half of the 52 pairs differ in a single pitch, and half of these pitch differences also change the melodic contour. In the second block (MET-Rhythm), half of 52 pairs of rhythms (played in wood block beats) contain a single rhythmic change. Performance on each subtest was evaluated with discriminability (d-prime) scores.

3) Perceptual Acuity Measures. Perceptual discrimination thresholds were assessed with a set of auditory discrimination tasks, which use an adaptive psychophysical procedure to yield an estimate of the minimum detectable difference between two types of stimuli (from the Maximum Likelihood Procedure Matlab toolbox; Grassi & Soranzo, 2009). The four tasks measured discrimination of pitch, temporal order, gap duration, and intensity.

4) Additional Measures. Participants also completed a questionnaire assessing handedness (Oldfield, 1971), language experience (based on the LEAP-Q; Marian et al., 2007), and their degree of exposure to different dialects. Note that these participants also participated in two additional speech sound learning paradigms, which are not reported here.

[Figure: example trial display showing distractor and target pictures; mouse trajectories plotted by X and Y coordinate]
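Discriminability scores of this kind are standard signal-detection quantities. As an illustrative sketch only (the paper does not give the authors' scoring code), d′ for a same/different task can be computed from hit and false-alarm rates, with a common 1/(2N) correction so that rates of 0 or 1 remain z-transformable:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Discriminability (d-prime) from same/different judgments.

    'Different' trials yield hits vs. misses; 'same' trials yield
    false alarms vs. correct rejections. Rates of 0 or 1 are nudged
    by 1/(2N) so the inverse-normal transform stays finite.
    """
    n_diff = hits + misses
    n_same = false_alarms + correct_rejections
    hit_rate = min(max(hits / n_diff, 1 / (2 * n_diff)), 1 - 1 / (2 * n_diff))
    fa_rate = min(max(false_alarms / n_same, 1 / (2 * n_same)), 1 - 1 / (2 * n_same))
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 22 hits / 4 misses on "different" pairs,
# 6 false alarms / 20 correct rejections on "same" pairs.
print(round(d_prime(22, 4, 6, 20), 2))
```

The 1/(2N) correction is one common convention; log-linear corrections are equally defensible.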
III. RESULTS

[Figure: scatter plot of MD reduction (residuals); individual data points were lost in extraction]

A. Correlations between IVs.
The different factors from the Gold-MSI questionnaire were generally well correlated. We relied on the general […] factors (rs between .55 and .89, except for active engagement; r = .15). General sophistication also correlated highly with the […] correlated (r = .41, p < .0001), the MET rhythm subtest was […]
may reflect a lessening in confidence (reflected by more variable mouse movements) as participants learned that the speaker produces phonemes in an unusual way. For those more musical participants, this may have been balanced out or overcome by learning the systematic ways in which the speaker's accent differs from standard English pronunciation. From these data, it is not clear if musically sophisticated participants learned something about the accent generally or about the specific speaker they heard (cf. Bradlow & Bent, 2008). Nor is it possible, from this correlational study, to know the extent to which this musical ability advantage relates to musical experience, specific types of musical experience, or to musical aptitude. While musical training can affect auditory acuity and speech sound learning (e.g., Tierney et al., 2015), and the type of musical experience is likely relevant (e.g., Carey et al., 2015), there are also clearly notable differences between individuals who do and do not pursue musical experiences (e.g., Schellenberg, 2015), and at least some of these pre-existing differences probably also relate to sound category learning abilities.

V. CONCLUSION

These data indicate that individuals with higher levels of musical ability show more robust adaptation to foreign-accented speech, even after controlling for other factors including dialect exposure and individual differences in perceptual acuity. This suggests that musical ability is related to the ability to rapidly adjust perceptual categories for speech sounds in order to accommodate systematic deviations from native-like speech that are attributable to a foreign accent. These data complement evidence for other types of speech processing advantages associated with musical ability, including the ability to perceive speech in noise (e.g., Slater et al., 2015) and to successfully learn the phonology of a second language (e.g., Slevc & Miyake, 2006).

ACKNOWLEDGMENT

The work reported here was supported by a grant from the GRAMMY Foundation. We thank Rochelle Newman, Mattson Ogg, and Nabil Morad for assistance and advice.

REFERENCES

Banks, B., Gowen, E., Munro, K.J., & Adank, P. (2015). Cognitive predictors of perceptual adaptation to accented speech. Journal of the Acoustical Society of America, 137, 2015-24.
Bradlow, A.R. & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106, 707-29.
Carey, D., Rosen, S., Krishnan, S., Pearce, M.T., Shepherd, A., Aydelott, J., & Dick, F. (2015). Generality and specificity in the effects of musical expertise on perception and cognition. Cognition, 137, 81-105.
Chobert, J. & Besson, M. (2013). Musical expertise and second language learning. Brain Sciences, 3, 923-940.
Christiner, M. & Reiterer, S.M. (2015). A Mozart is not a Pavarotti: Singers outperform instrumentalists on foreign accent imitation. Frontiers in Human Neuroscience, 9, 482.
Clarke, C.M. & Garrett, M.F. (2004). Rapid adaptation to foreign-accented English. Journal of the Acoustical Society of America, 116, 3647-58.
Flege, J.E., Yeni-Komshian, G.H., & Liu, S. (1999). Factors affecting degree of foreign accent in an L2: A review. Journal of Memory and Language, 41, 78-104.
Freeman, J.B. & Ambady, N. (2010). MouseTracker: Software for studying real-time mental processing using a computer mouse-tracking method. Behavior Research Methods, 42, 226-41.
Grassi, M. & Soranzo, A. (2009). MLP: A MATLAB toolbox for rapid and reliable auditory threshold estimation. Behavior Research Methods, 41, 20-8.
Janse, E. & Adank, P. (2012). Predicting foreign-accent adaptation in older adults. Quarterly Journal of Experimental Psychology, 65, 1563-85.
Kempe, V., Bublitz, D., & Brooks, P.J. (2015). Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information. British Journal of Psychology, 106(2), 349-66.
Marian, V., Blumenfeld, H.K., & Kaushanskaya, M. (2007). The Language Experience and Proficiency Questionnaire (LEAP-Q): Assessing language profiles in bilinguals and multilinguals. Journal of Speech Language and Hearing Research, 50, 940-67.
Milovanov, R. & Tervaniemi, M. (2011). The interplay between musical and linguistic aptitudes: A review. Frontiers in Psychology, 2, 321.
Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS ONE, 9(2), e89642.
Oldfield, R.C. (1971). The assessment and analysis of handedness: The Edinburgh inventory. Neuropsychologia, 9, 97-113.
Parbery-Clark, A., Anderson, S., Hittner, E., & Kraus, N. (2012). Musical experience strengthens the neural representation of sounds important for communication in middle-aged adults. Frontiers in Aging Neuroscience, 4, 30.
Patel, A.D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hearing Research, 308, 98-108.
Posedel, J., Emery, L., Souza, B., & Fountain, C. (2011). Pitch perception, working memory, and second-language phonological production. Psychology of Music, 40(4), 508-517.
Schellenberg, E.G. (2015). Music training and speech perception: A gene-environment interaction. Annals of the New York Academy of Sciences, 1337, 170-7.
Slater, J., Skoe, E., Strait, D.L., O'Connell, S., Thompson, S., & Kraus, N. (2015). Music training improves speech-in-noise perception: Longitudinal evidence from a community-based music program. Behavioral Brain Research, 291, 244-252.
Slevc, L.R. & Miyake, A. (2006). Individual differences in second-language proficiency: Does musical ability matter? Psychological Science, 17, 675-81.
Thompson, W.F., Schellenberg, E.G., & Husain, G. (2004). Decoding speech prosody: Do music lessons help? Emotion, 4, 46-64.
Tierney, A.T., Krizman, J., & Kraus, N. (2015). Music training alters the course of adolescent auditory development. Proceedings of the National Academy of Sciences, 112, 10062-7.
Van Engen, K.J., Baese-Berk, M., Baker, R.E., Choi, A., Kim, M., & Bradlow, A.R. (2010). The Wildcat Corpus of Native- and Foreign-Accented English: Communicative efficiency across conversational dyads with varying language alignment profiles. Language and Speech, 53, 510-540.
Wallentin, M., Nielsen, A.H., Friis-Olivarius, M., Vuust, C., & Vuust, P. (2010). The Musical Ear Test, a new reliable test for measuring musical competence. Learning and Individual Differences, 20, 188-196.
Wayland, R., Herrera, E., & Kaan, E. (2010). Effects of musical experience and training on pitch contour perception. Journal of Phonetics, 38, 654-62.
Witteman, M.J., Bardhan, N.P., Weber, A., & McQueen, J.M. (2015). Automaticity and stability of adaptation to foreign-accented speech. Language and Speech, 52, 168-89.
Music Training Correlates with Visual but not Phonological Foreign Language
Learning Skills
Peter Martens,*1 Kimi Nakatsukasa,#2 Hannah Percival*3
*School of Music, Texas Tech University, Lubbock, TX, USA
#Department of Classical and Modern Languages and Literature, Texas Tech University, Lubbock, TX, USA
1peter.martens@ttu.edu, 2kimi.nakatsukasa@ttu.edu, 3hannah.percival@ttu.edu
ISBN 1-876346-65-5 © ICMPC14 352 ICMPC14, July 5–9, 2016, San Francisco, USA
III. EXPERIMENT 2

[Figure: bar chart of percentage correct (y-axis 70–90) comparing Music Majors (n=79) with Rogers et al.]

A. Method
If the superior performance by music majors observed in […]
intellectual basis for the correlations above may play into the
association of general intelligence with music aptitude and
training (Schellenberg, 2012). The fundamental musical
aptitudes or skills typically assessed within
performance-focused music programs (i.e. pitch and rhythm
perception and execution) may not represent the sole or even
primary enhancements to learning (L2 or otherwise) that
correlate with music training.
ACKNOWLEDGMENT
The authors wish to thank David Villarreal and his staff in
the Texas Tech University Language Learning Lab for their
logistical assistance with data-gathering.
REFERENCES
Forgeard, M., Winner, E., Norton, A., Schlaug, G. (2008). Practicing
a Musical Instrument in Childhood is Associated with Enhanced
Verbal Ability and Nonverbal Reasoning. PLoS ONE 3(10):
e3566.
Gordon, E. E. (1989). Advanced Measures of Music Audiation
(Grade 7 – Adult). Chicago, IL: GIA Publications.
Margulis, E. H. (2014). On repeat: How music plays the mind.
Oxford: Oxford University Press.
Meara, P. M. (2005). Llama language aptitude test. Swansea, UK:
Lognostics.
Rogers, V., Barnett-Legh, T., Curry, C., Davie, E., & Meara, P. (2015). Validating the LLAMA aptitude tests. Presented to the European Second Language Association (EUROSLA25), August 27-29, 2015, Aix-en-Provence, France.
Schellenberg, E. G. & Weiss, M. W. (2012). Music and cognitive abilities. In Deutsch, D. (Ed.), The Psychology of Music (3rd ed.), pp. 499-550. Cambridge, MA: Academic Press.
Slevc, L. R. & Miyake, A. (2006). Individual differences in second-language proficiency: Does musical ability matter? Psychological Science, 17, 675-681.
Sloboda, J. (2011). Musical expertise. In Levitin, D. J. (Ed.), Foundations of Cognitive Psychology: Core Readings (2nd ed.). Boston: Allyn & Bacon.
The importance of rhythmic ability for normal development of language skills, in speech
and in print, is increasingly well-documented. The broad range of neurodevelopmental
disorders that implicate rhythm and/or temporal processing as a core deficit is also quite
striking. These include developmental stuttering, developmental dyslexia, and specific
language impairment (SLI), among others. This symposium brings together four scholars
working at the intersection of musical rhythm and language to consider (a) the role of
rhythm in typical language development and (b) the specific nature of rhythm and/or
temporal processing deficits in different language disorders.
Musical rhythm and linguistic grammar are both hierarchically organized, and the
timing of the speech signal contains important cues to grammar, such as pauses at
clause boundaries. Here we present a series of experiments aimed at investigating
the role of rhythm in grammatical development, in children with normal and disordered
language. Our preliminary work in n=25 typically developing (TD) six-year-olds found a
robust association (r=0.70) between music rhythm perception and grammar skills.
Performance on rhythm perception tasks accounted for individual differences in
children’s production of complex sentence structures (complex syntax: r=0.46; and
transformation/reorganization: r=0.42). Preliminary findings of EEG patterns (recorded
during listening to simple rhythms) indicate that higher grammar scorers show enhanced
neural sensitivity to accented tones (p=0.02) at early latencies. Pilot behavioral testing in
n=3 children with language impairment (SLI) also shows lower musical rhythm scores
compared to their TD peers (d=-0.88). To examine mechanisms underlying these
associations, we are investigating speech rhythm as a possible mediator between musical
rhythm and grammar. First, we are probing children’s ability to synchronize speech with
an isochronous metronome, in a modified speech cycling paradigm. Preliminary results
point to correlations (r=0.60) with grammar skills. Second, we are testing the ability of
children with SLI and TD to learn novel word-forms from syllable streams that are
rhythmically biased to facilitate particular groupings. Findings will be discussed in the
context of our ongoing work exploring the dynamics of rhythm deficits in SLI and
studying whether the facilitative effects of rhythm on grammar are due to auditory working
memory, speech rhythm sensitivity, or both. This line of research contributes to
knowledge about the role of rhythm in acquisition and processing of syntactic structure.
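Effect sizes such as the d = −0.88 above are standardized mean differences. A quick sketch of Cohen's d with a pooled standard deviation (toy numbers, not the study's data):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Toy numbers: a small SLI group scoring below its TD comparison group.
print(round(cohens_d(12.0, 2.0, 3, 14.0, 2.5, 25), 2))
```

With so few children in the clinical group, any such estimate is of course very noisy, which is why the abstract frames it as pilot data.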
Studies have reported a link between phonological awareness and musical rhythmic
abilities. Both speech perception and musical rhythm perception involve sensory
encoding of timing/intensity patterns, short term memory of temporal patterns, and
perception of grouping/phrasing. Beat processing is an important component of musical
rhythm, but since speech is not periodic, the association between the processes is not
clear. This study investigated the relationships among phonological, musical rhythm, and beat processing, both behaviorally and in terms of their neural correlates. As part of a longitudinal
functional neuroimaging study of reading ability, children (N=35) performed a
phonological (first-sound matching contrasted with voice matching) in-scanner task in
kindergarten. In 2nd grade these children were administered a rhythm discrimination task
with strong beat (SB) and weak beat (WB) patterns, as well as cognitive and pre-literacy
measures. Behavioral results indicated a significant positive association between SB (but
not WB) performance and children’s phonological awareness and reading skills
(p<0.001). Mediation analysis revealed a significant indirect effect of rhythm
discrimination on literacy skills, b=.59, BCa CI [.118, 1.39]. Hierarchical regression
analysis demonstrated that temporally regular patterns, but not irregular patterns,
accounted for unique variance in reading (p<0.01). Children’s whole-brain functional
activation for first sound matching versus voice matching contrast was significantly
(p<0.01, corrected) correlated with their subsequent performance on SB and WB patterns.
The pattern of correlation differed between SB and WB patterns. Implications of these
results will be discussed.
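The indirect effect and bootstrap interval reported above come from a mediation analysis. The following sketch uses simulated data with hypothetical variable names and a simple percentile bootstrap (the abstract reports a bias-corrected and accelerated interval), so it shows only the general shape of such an analysis:

```python
import numpy as np

def indirect_effect(x, m, y):
    """a*b product of paths: a = slope of M on X; b = slope of Y on M
    controlling for X (ordinary least squares)."""
    a = np.polyfit(x, m, 1)[0]
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]
    return a * b

rng = np.random.default_rng(0)
n = 200
rhythm = rng.normal(size=n)                 # rhythm discrimination (predictor)
phono = 0.6 * rhythm + rng.normal(size=n)   # hypothetical mediator
reading = 0.7 * phono + rng.normal(size=n)  # literacy outcome

# Percentile bootstrap of the indirect effect.
boots = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boots.append(indirect_effect(rhythm[idx], phono[idx], reading[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"indirect effect 95% CI: [{lo:.2f}, {hi:.2f}]")
```

An interval that excludes zero, as in the abstract, is the usual criterion for a significant indirect effect.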
Musicians show better performance on linguistic tasks such as reading (e.g., Hurwitz et al., 1975) and speech perception in noise (e.g., Parbery-Clark et al., 2009). Recently,
Tierney and Kraus (2014) suggested that precise auditory timing is a core mechanism
underlying enhanced phonological abilities in musicians. Consistent with this hypothesis,
Tierney and Kraus (2013) found a correlation between tapping performance and reading
skills also in the general population. We empirically tested the precise auditory timing
hypothesis (PATH) on three groups of subjects. In the first study we evaluated the
entrainment, reading skills and other cognitive abilities of a group of (N=56) normal
adults. In the second, we examined the difference between dyslexic (N=26) and non-
dyslexic adults (N=19). In the third, we looked at the contrast between dyslexic (N=26)
and non-dyslexic professional musicians (N=42). In the first study we replicated Tierney
and Kraus’s results, finding correlations between reading skills and entrainment in the
normal population (word reading r=-0.40, p=0.034; nonword reading r=-0.46, p=0.013).
However, upon measuring a larger set of additional cognitive measures, we found that
these correlations could be explained by a more prominent correlation between
entrainment and working memory (WM), peaking in a non-auditory visual N-back task
(r=-0.52 p=0.0059). In the second study we found no difference in the entrainment
abilities of dyslexic and non-dyslexic adults. In the third study, only dyslexic musicians
displayed a correlation with WM abilities (r=.45, p=0.02). Our findings suggest an alternative to the PATH: we propose that more general cognitive
mechanisms strongly associated with WM mediate the link between reading and
entrainment. This interpretation is also consistent with recent computational models of
reading disabilities (Sagi et al. 2015).
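The claim that the reading-entrainment correlation is "explained by" working memory is, statistically, a partial-correlation argument: the association shrinks once the shared factor is regressed out. A minimal sketch with simulated data (not the study's data; variable names are hypothetical):

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing out z from both."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(1)
n = 300
wm = rng.normal(size=n)                   # working memory as common factor
reading = 0.6 * wm + rng.normal(size=n)
entrain = 0.6 * wm + rng.normal(size=n)

raw = np.corrcoef(reading, entrain)[0, 1]     # looks like a direct link
controlled = partial_corr(reading, entrain, wm)  # near zero once WM is out
print(round(raw, 2), round(controlled, 2))
```

When the raw correlation survives a third variable but the partial one does not, mediation by that variable becomes the more parsimonious account, which is the argument the abstract makes.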
Fang Liu1, Yanjun Yin2, Alice H. D. Chan3, Virginia Yip2 and Patrick C. M. Wong2,4
1Department of Language and Linguistics, University of Essex, Colchester, Essex, UK
2Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Hong Kong SAR, China
3Division of Linguistics and Multilingual Studies, School of Humanities and Social Sciences, Nanyang Technological University, Singapore
4Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong SAR, China
For tones with no context, amusics exhibited a shallower slope (i.e., reduced categorical
perception) than controls when categorization rates were plotted as a function of initial
F0. While controls’ lexical tone categorization demonstrated a significant context effect
due to talker adaptation, amusics showed similar categorization patterns across both
contexts. However, normal listeners’ FFRs to tones in different contexts did not show a
talker adaptation effect.
These findings suggest that congenital amusia impacts the extraction of speaker-specific
information in speech perception. Given that talker adaptation appears to be a high-order
cognitive process above the brainstem level, the deficits in amusia may originate from the
cortex.
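The "shallower slope" measure refers to the steepness of the psychometric function fitted to categorization rates along the F0 continuum. A hedged sketch of how such slopes are commonly estimated (logistic fit on illustrative values, not the study's data):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(f0, midpoint, slope):
    """Proportion of one tone-category response along an F0 continuum."""
    return 1.0 / (1.0 + np.exp(-slope * (f0 - midpoint)))

f0_steps = np.linspace(0, 10, 11)        # hypothetical 11-step F0 continuum
control = logistic(f0_steps, 5.0, 2.0)   # steep: categorical perception
amusic = logistic(f0_steps, 5.0, 0.5)    # shallow: more continuous

(_, slope_c), _ = curve_fit(logistic, f0_steps, control, p0=[5, 1])
(_, slope_a), _ = curve_fit(logistic, f0_steps, amusic, p0=[5, 1])
print(slope_c > slope_a)  # the group difference reduces to this comparison
```

In practice the fit is done per participant on observed response proportions, and the fitted slope parameters are then compared between groups.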
Yke Schotanus1
1ICON, Universiteit Utrecht, Utrecht, The Netherlands
A growing body of evidence indicates that singing might affect the perception of
language both in a supportive and in a detrimental way. Accompaniment might be a
factor of interest, because loud rests (on-beat silences) and out-of-key notes may confuse
listeners, while accompaniment might elucidate rhythm and harmony and reduce the number of silences. This might facilitate music processing and concentration. The main aim
of the experiment was to test this hypothesis in a classroom situation and to investigate at the same time whether school children might benefit from the use of song, even after a
single exposure. A total of 274 pupils (average age 15, SD 0.6), spread over twelve fixed
groups, listened to four complete songs in six different conditions (spoken or sung, with
and without accompaniment, sung with ‘lala’ instead of lyrics, and accompaniment only).
In a regular lesson Dutch Language and Literature each group listened to five different
tracks and completed a questionnaire after each track. The questions concerned
processing fluency, valence, comprehension, recall, emotion and repetition. Listeners
rated accompanied songs as more beautiful and less exhausting than unaccompanied songs or spoken lyrics, and felt that in these songs the text was more intelligible and comprehensible. With accompaniment the singer's voice was considered more relaxed and
in tune than without (although the same recordings were used). Furthermore, the silences
in unaccompanied versions were rated more distracting than the interludes or the
accompaniment in the other conditions. Lyrics were considered more happy, funny,
sensitive and energetic, and less sad, heavy and nagging in the sung versions. Finally,
verbatim repetition of words was less accepted when spoken, and added (mainly) emotional
meaning to those words in the sung conditions. These findings indicate that accompanied
singing supports the transfer of verbal information, although it affects the contents of the
words.
Language and music are higher cortical functions supporting verbal and non-verbal
human communication. Despite their shared basis in auditory processing, music appears
more resistant to neurological disease than language. Focal epilepsy provides a good
model to test this hypothesis, selectively disrupting unilateral network functions.
Language may also reorganise due to early onset left focal seizures, while effects on
music functions are less clear. We compared the effects of focal epilepsy on music and
language functions relative to age of seizure onset, laterality and seizure network. We
expected music to be more resistant to left or right frontal or temporal network seizures
than language, reflecting a greater bilateral cortical representation. We tested 73 pre-
surgical epilepsy patients and 31 matched controls on a comprehensive battery of music
tests (pitch, melody, rhythm, meter, pitch and melodic memory, melodic learning),
language tests (verbal fluency, learning, memory, verbal reasoning, naming), visuospatial
and executive functions, and IQ. Controlling for age and IQ, patients performed worse
than controls on verbal fluency, verbal reasoning, and word list recall (all ps <.05),
associative learning of semantically-linked (p =.065) and arbitrary-linked word pairs (p
<.01) and their delayed recall (p <.01). This was particularly evident for left frontal or
temporal seizure foci. A specific rhythm deficit was seen in left foci patients (p <.05), but
no other differences in music skills between patients and controls. There were no effects
for age of seizure onset (all ps >.05). So while disruption of left-sided networks impairs
language, verbal learning and memory, music skills were largely preserved for left or
right-sided network disease, irrespective of seizure onset. This supports theories of
proximal but distinct neural systems for language and music, with left specialisation for
temporal processing, and with music benefitting from greater bilaterality.
Stuttering is a disorder that affects the timing and rhythmic flow of speech; however,
when speech is synchronized with an external pacing signal, stuttering can be markedly
alleviated. Several neuroimaging studies have pointed to deficiencies in connectivity of
the basal ganglia thalamocortical (BGTC) network in people who stutter, suggesting that
stuttering may be associated with deficient auditory-motor integration, timing, and
rhythm processing. This study compared brain activity patterns in adults who stutter
(AWS) and matched controls performing an auditory rhythm discrimination task while
undergoing functional magnetic resonance imaging (fMRI; n = 18 for each group). We
hypothesized that AWS would show (1) worse rhythm discrimination performance,
particularly for complex rhythms, and (2) aberrant brain activity patterns in the BGTC
network during the rhythm discrimination task, particularly for complex rhythms. Inside
the scanner participants heard two presentations of a standard rhythm and judged whether
a third comparison rhythm was the same or different. Rhythms either had an explicit
periodic accent (simple) or did not (complex). As expected, simple rhythms were easier
to discriminate than complex rhythms for both groups (p < .001). Although no main
group difference was found, there was a significant rhythm type by group interaction (p =
.038), where AWS had greater difficulty discriminating complex rhythms than controls.
The fMRI results demonstrated that AWS had greater (and more right-lateralized)
activation than controls in the BGTC network (e.g., right STG, right insula). Our results
suggest that AWS are utilizing the rhythm processing network more during the task than
controls, and may mean that AWS have to “work harder” during the task or are less
familiar with utilizing this network. This difference may underlie deficiencies in internal
rhythm generation, in turn affecting the timing of speech movements that result in
disfluent speech.
Language and music are both high-level cognitive domains that highly rely on prosody.
Whether these two domains correlate has received much attention, but whether experience may shape such a correlation is under-investigated. To fill this void,
the current study explores pitch perception in both domains among listeners speaking
different languages, and we ask the question whether language experience may influence
the correlation between perception of pitch in speech and music. We tested Chinese-
English, Dutch-English, and Turkish-English non-musician bilinguals on their
sensitivity to pitch information in speech and music, with their English proficiency
controlled for. Chinese is a syllable-timed tone language, whereas Dutch and Turkish are stress-timed, with Dutch having a predominantly strong-weak metrical structure and Turkish the opposite. For the speech domain, the listeners were tested on their
discrimination of Chinese lexical tones. For the music domain, they were tested with the
melody part of the Musical Ear Test (MET), where they needed to discriminate between two
melodic musical phrases. We found no significant difference between the three groups for lexical tone discrimination, while, for the MET, the Chinese-English and
Turkish-English listeners outperformed the Dutch-English listeners. Importantly, for both
the Dutch-English and the Turkish-English listeners, a significant correlation was found
between the lexical tone discrimination and the MET, whereas no such correlation was
found among the Chinese listeners. Our results suggest that 1) knowing two prosodically
different languages enhances musical pitch perception, and 2) unlike non-tone language
listeners who perceive both lexical tones and musical pitch psycho-acoustically, tone
language listeners perceive their lexical tones phonemically, and they seem to “split”
lexical tones from other non-phonemic pitch variations, resulting in the lack of cross-
domain correlation.
Within music theory, Narmour's Implication-Realization model of melody (IR) [1]–[3] is unusual in that it treats a musical melody as a series of pitched tones not fundamentally perceived in terms of harmony or discrete pitch identity. IR would thus seem to be readily applicable to speech intonation, a type of melody matching the same description. IR holds promise for the study of intonational phonology because it analyzes melody in terms of a system of perceptual binary contrasts, an essential element of phonological description. A crucial element of prosody is its use of accents to convey informational coherence between elements of discourse, and here yet again IR holds promise, if it can be shown that IR's network of implications and realizations within a melody is analogous to intonation's cohesive discursive relationships.

In this paper, I outline the possible use of IR to describe linguistic intonation, partly to suggest what IR may offer to linguistics and partly to deepen our understanding of music. IR is especially well suited to comparison with the Autosegmental-Metrical model of intonation (AM) [4]–[8]. AM's contributions include its interpretation of the spoken pitch contour as a series of discrete accents or points of attention; its conception of accent as a phenomenon realizable in features other than pitch, such as intensity or duration; and its relationship to discourse and information structure.

[…] substance of Narmour's IR. The adaptation allows the description of pitch as a series of melodic intervals of varied sizes, leading to possibilities not available in linguistic approaches that rest on the classification of discrete pitch levels.

IR models melodic hearing as a process of successively comparing the intervals formed by consecutive pairs of tones. From this series of comparisons a network of both binary and

Fig. 1. Illustration of IR analysis. Bracketed structural label as used in [1], [2] at top; this article's labeling system in middle and at bottom.
TABLE 1. IMPLICATION-REALIZATION SYMBOLS
(columns: interval type; feature described; implication/closure symbol — several arrow glyphs did not survive extraction and appear as "?" sequences)

O    initial interval of zero magnitude    ?>
P    Process (continuation of previous interval's direction)    ??
R    Reversal of previous direction    <?
D    Second consecutive interval of zero magnitude    ??
+    Interval contrastively larger than previous    ?>
-    Interval contrastively smaller than previous    <?
0    Interval precisely same size as previous    ??
`    Interval approximately same size as previous    ??
P=   Non-zero interval following zero interval    ??
R=   Zero-magnitude interval after non-zero interval    <?
}    Full closure
| or ??|    Implication suppressed (closure promoted) by feature other than interval or pitch
\    Closure suppressed by non-pitch feature
( ) [parentheses]    Retrospective (unexpected) realization
(no parentheses)    Prospectively expected interval
{    Initial tone
S    small initial interval    ?>>
L    large initial interval    ?>>
O    Initial interval of magnitude 0    ?>
Non-italic letter    Ascending interval
Italic letter    Descending interval (somewhat closural)

TABLE 2. PROVISIONAL VALUES OF BOUNDARIES BETWEEN IR'S ANALYTICAL CATEGORIES

Threshold    Value in music [1], [2]    Value in speech
L to S    6 semitones (st)    4 semitones (st)
R+ to R~    3 st    2.5 st
R~ to R0    N/A    .25 st
R~ to R-    3 st    2.5 st
P+ to P~    4 st    3 st
P~ to P0    N/A    .25 st
P~ to P-    4 st    3 st
Duration suppressing previous implication    1.5 times duration of previous tone    1.2 times duration of previous tone

[…] previous tone) are indicated by a small, high arrow \. A tone's net implication is:

net implication = (i − c) + (sc − si)   (1)

where i = arrowheads and c = tails occurring since the beginning of the structure, and where sc = small arrows and si = vertical bars applying to the tone itself.

If net implication > 0, the listener proceeds to the next tone, now treating the old "later" interval as the new "earlier" interval against which to compare the new "later" interval formed by the new tone. If net implication ≤ 0, the tone is fully closural } and is expressed on the next hierarchical level, where it interacts with the earlier and later tones that project to that level.

The initial interval of a segment—including any interval following a break or begun by a closural tone—is not evaluated in comparison to the previous interval. Depending on its size, its type is labeled L or S, its initial tone is marked {, and its second tone normally receives two arrowheads (see Table 1).
scaled contrasts emerges in which pitch relationships are For comparison, Narmour does not use the symbols L and
specified fairly precisely. The tones express both long-term S, and his letter symbols refer to three-note structures, not to
and short-term implicative connections between tones, and two-note intervals. Analyses are often presented in a kind of
these connections can frame events as either expected or shorthand involving brackets that explicitly denote only the
surprising. Given two consecutive intervals (spanning three hierarchically salient beginnings and endings of structures (see
tones), IR in effect describes the later interval by comparing it the tops of Figs. 1 and 2).
to the earlier interval, using interval-size labels that combine The labeling system of IR rests on the premise that there are
the symbols P, R, and D with the symbols +, -, ~, and 0 as thresholds of difference above which two intervals are
shown in Table 1. A reversal in direction R or a sufficient perceived to have categorical contrast. These thresholds
decrease in size - from the earlier interval contributes closure, account for the distinctions between symbols such as L and S
or P+ and P~ and thus serve as the basis for categorical interval
and any such closural feature is indicated by an arrowtail ⤚?
sizes here. In principle these thresholds vary depending on
above the final tone. Conversely, a sufficient increase in
context; the values used in this study are given in Table 2.
interval size is implicative (anticlosural) and is indicated by an
In applying IR to speech, two additional difficulties present
arrowhead ?>. (This usage of arrowheads and tails departs
themselves. First, IR uses as input a series of discrete musical
somewhat from Narmour’s, where arrowheads and tails
tones instead of speech’s continuously varying pitch contour.
typically describe implicative connections. Here they describe
Thus the continuous contour is converted to a stylized series
changes in what Margulis calls expectancy [9].) Non-pitch
of steady pitches, generally one per syllable, using weighted
features that contribute closure (for example, a clear increase
averages supplemented by human judgement. Second, the
of a tone’s duration over that of the previous tone) are
appropriate categorical thresholds appear to be larger for
indicated by vertical bars | . Features that suppress closure
music than for speech. The values shown in Table 3 are
(such as a tone’s clearly decreased duration with respect to the
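The bookkeeping of Eq. (1) is simple enough to sketch in a few lines of code. This is an illustrative sketch only; the function name and the example counts are hypothetical, not part of the paper:

```python
def net_implication(i, c, sc, si):
    """Eq. (1): (i - c) + (sc - si), where i = arrowheads and c = tails
    accumulated since the beginning of the structure, and sc = small
    arrows and si = vertical bars applying to the tone itself."""
    return (i - c) + (sc - si)

# A hypothetical tone carrying two arrowheads, one tail, and one
# closure-suppressing small arrow:
value = net_implication(2, 1, 1, 0)  # -> 2
# value > 0: the listener proceeds to the next tone, treating the old
# "later" interval as the new "earlier" one; value <= 0: the tone is
# fully closural and is expressed on the next hierarchical level.
```

The sign conventions follow the prose above: arrowheads and small arrows are anticlosural (positive), tails and vertical bars are closural (negative).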
Fig. 3. College baseball coach quoted by broadcaster Vin Scully: “On a pop fly, you don’t say, ‘I got it, I got it,’ you say, ‘I have it, I have it.’ ”
AM annotation appears above the words. IR analysis appears below the words. The numbers at bottom indicate syllables’ stylized frequencies.
Thick lines graph spoken pitch. Thin lines graph spoken intensity.
III. OBSERVATIONS
Figs. 3–6, a selection of juxtaposed AM and IR analyses of spoken utterances, reveal a number of consistent points of comparison between AM and IR:
1. Tones that appear as suffix boundary tones (symbolized %) in AM appear as non-closed tones at shallow hierarchical levels in IR. (See Figs. 3 and 6.)
2. In English and German, tones that AM treats as H* or L* tend to be closural on shallow hierarchical levels. Thus they emerge at deeper levels, where they can interact implicatively with tones from earlier and later speech segments. These implicative interactions correspond to the cohesive relationships between accented words as described by AM.
3. Where AM relies on pitch peaks and troughs to identify H* and L* accents, IR's employment of pitch and durational factors also results in robust identifications of these accents. This is even true in cases where the peak pitch occurs after the accented syllable, e.g. "YOU don't say" in Fig. 3 [11] and "malilita" in Fig. 4 [12]. While the AM analyses parse these two cases differently, IR has a single approach to both: the accented tones are initiations which emerge on deeper levels while the subsequent rise in pitch functions implicatively at the shallow level and thus does not figure at deeper levels.

Figure 4. Chickasaw utterance: "Does the wolf run?"

TABLE 3
AUTOSEGMENTAL-METRICAL SYMBOLS

Symbol   Phonological meaning      Symbol   Function meaning
H        high pitch                *        lexical accent
L        low pitch                 –        phrase accent
^ or <   pitch upstep              %        boundary tone
!        pitch downstep

Figure 5. German utterance: "Is yours the blue camper?"

4. While there are not always literal mappings between particular AM accent labels and particular IR labels, IR nevertheless generates robust descriptions of pitch and its functions that are consistent with AM's accounting of the features' functions. They are also more precise. AM's L+H* accent, whose sharply rising contour draws attention to contrasting information, is associated with
Figure 6. AM and IR analyses of the utterance “You want an example? Mother Teresa.”
Dotted lines graph stylized pitch.
increasing interval size +; hence the (R+) on "blaue" in Fig. 5 [13] and the P+ on the first "I have it" in Fig. 3. The accent on "Teresa" in Fig. 6 [14], described as !H downstepped from the previous H in AM, is precisely specified in IR by being described as (r+), p~, and s in relation to earlier pitches on each of three levels.
5. IR captures resemblances between instances whose intonation differs superficially but coincides functionally. The German rising yes/no-question in Fig. 5 is seen to be similar in the IR analysis to the falling Chickasaw yes/no-question in Fig. 4. In these questions the answer is unknown to the speaker. IR can also make distinctions between instances whose intonation is superficially similar but differs functionally. In the rising yes/no question of Fig. 6, the speaker has already decided the answer, and IR offers an analysis different from those of Figs. 4 and 5.
6. IR is able to describe intonational mimesis of relationships between items in discourse. Symbols such as L and R+ denote intervals whose pitch evinces perceptual contrast; S and P- denote non-contrast. Thus it is significant that the contrasting words "have" and "got" in Fig. 3 are linked by the contrastive interval L and that the non-contrastive relationship between the topics of the two halves of Fig. 6 is signaled by the non-contrastive s.

IV. CONCLUSION
In these analyses, IR is significant not merely for its assertions about expectation but also for its contributions to the study of segmentation, cohesion, and phonological categorization in sound streams. If IR successfully describes spoken intonation, it becomes possible to understand more deeply the musical analyses possible in IR, and thus the nature of music itself. IR's interval-based approach may contribute usefully to the study of intonation. In addition, studies of speech-song hybrids such as chant, recitation, Sprechmelodie, parlando, and rap music would illuminate IR, and vice versa. All of these aspects of IR draw attention to the expressive power of phonological processes that appear to be shared by music and language.

REFERENCES
[1] E. Narmour, The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. Chicago: University of Chicago Press, 1990.
[2] E. Narmour, The Analysis and Cognition of Melodic Complexity: The Implication-Realization Model. Chicago: University of Chicago Press, 1992.
[3] E. Narmour, "Toward a Unified Theory of the I-R Model (Part I): Parametric Scales and their Analogically Isomorphic Structures," Music Perception, vol. 33, no. 1, pp. 32–69, 2015.
[4] J. B. Pierrehumbert, "The phonology and phonetics of English intonation," Ph.D. dissertation, Massachusetts Institute of Technology, 1980.
[5] J. Pierrehumbert and J. Hirschberg, "The Meaning of Intonational Contours in the Interpretation of Discourse," in Intentions in Communication, P. R. Cohen, J. L. Morgan, and M. E. Pollack, Eds. MIT Press, 1990, pp. 271–312.
[6] A. Wennerstrom, The Music of Everyday Speech: Prosody and Discourse Analysis. Oxford: Oxford University Press, 2001.
[7] M. E. Beckman, J. Hirschberg, and S. Shattuck-Hufnagel, "The Original ToBI System and the Evolution of the ToBI Framework," in Prosodic Typology: The Phonology of Intonation and Phrasing, S.-A. Jun, Ed. Oxford: Oxford University Press, 2005, pp. 9–54.
[8] D. R. Ladd, Intonational Phonology, 2nd ed. Cambridge: Cambridge University Press, 2008.
[9] E. H. Margulis, "A Model of Melodic Expectation," Music Perception, vol. 22, no. 4, pp. 663–714, Summer 2005.
[10] P. Boersma and D. Weenink, Praat: Doing Phonetics by Computer [computer program]. 2014.
[11] "Vin Scully college grammar story," YouTube, Sep. 13, 2011 [Video file]. Available: https://www.youtube.com/watch?v=1doxB2AZnaI. [Accessed: April 23, 2016].
[12] M. Gordon, "An autosegmental-metrical model of Chickasaw intonation," in Prosodic Typology: The Phonology of Intonation and Phrasing, S.-A. Jun, Ed. Oxford: Oxford University Press, 2005, pp. 301–330.
[13] M. Grice, S. Baumann, and R. Benzmüller, "German Intonation in Autosegmental-Metrical Phonology," in Prosodic Typology: The Phonology of Intonation and Phrasing, S.-A. Jun, Ed. Oxford: Oxford University Press, 2005, pp. 55–83.
[14] A. Brugos, S. Shattuck-Hufnagel, and N. Veilleux, "Transcribing Prosodic Structure of Spoken Utterances with ToBI." MIT OpenCourseWare, ch. 2.7. [Accessed: Nov. 15, 2015].
ISBN 1-876346-65-5 © ICMPC14 369 ICMPC14, July 5–9, 2016, San Francisco, USA
song after each repetition. We measured the musical characteristics of each stimulus using computational models of melodic structure (Temperley 2007) and musical beat structure (Ellis 2007). Our prediction was that these musical characteristics would explain additional variance in the extent to which each stimulus transformed into song with repetition, even after syllable pitch flatness was accounted for. Furthermore, we predicted that musical characteristics would correlate with the change in song ratings due to repetition but not with song ratings after a single repetition.

II. Experiment 1

A. Methods
1) Participants. 32 participants were tested (14 male). The mean age was 33.7 (standard deviation 9.4). The mean amount of musical training was 1.9 years (standard deviation 3.8).
2) Stimuli. Stimuli consisted of the 24 Illusion stimuli and 24 Control stimuli described in Tierney et al. (2013). These were short phrases (mean 6.6 (sd 1.5) syllables) extracted from audiobooks.
3) Procedures. Participants were recruited via Mechanical Turk, a website which enables the recruitment of participants for internet-based work of various kinds. This study was conducted online using the Ibex paradigm for internet-based psycholinguistics experiments (http://spellout.net/ibexfarm). Participants were told that they would hear a series of spoken phrases, each repeated eight times. After each presentation they were given three seconds to indicate, on a scale from 1 to 10, how much the stimuli sounded like song versus like speech. Judgments could be made either by clicking on boxes which contained the numbers or by pressing the number keys. Order of presentation of the stimuli was randomized. Participants also heard, interspersed throughout the test, four "catch" trials in which the stimulus actually changed from a spoken phrase to a sung phrase between the fourth and fifth repetitions. Data from participants whose judgments of the sung portions of the catch trials were not at least 1.5 points higher than their judgments of the spoken portions were excluded from analysis. This constraint resulted in the exclusion of 8 participants from Experiment 1, 7 participants from Experiment 2, and 8 participants from Experiment 3. Excluded participants are not included in the subject counts of demographic descriptions within the Participants section for each Experiment.
4) Analysis. To minimize the influence of inter-subject differences on baseline ratings, ratings were normalized prior to analysis with the following procedure: each subject's mean rating across all repetitions for all stimuli was subtracted from each data point for that subject. Data were then averaged across stimuli within a single subject. This generated, for each subject, average scores for each repetition for Illusion and Control stimuli. A repeated measures ANOVA with two within-subjects factors (condition, two levels; repetition, eight levels) was then run; an interaction between repetition and condition was predicted, indicating that the Illusion stimuli transformed to a greater extent than the Control stimuli after repetition.
To investigate the stimulus factors contributing to the speech-song illusion, for each stimulus initial and final ratings were averaged across all subjects. The initial rating and the rating change between the first and last repetition were then calculated and correlated with several stimulus characteristics.
First, we measured the average pitch flatness within syllables by manually marking the onset and offset of each syllable, tracking the pitch contour using autocorrelation in Praat, calculating the change in pitch between each time point (in fractions of a semitone), and then dividing by the length of the syllable. Thus, pitch flatness was measured in semitones per second.
Second, we determined the best fit of the pitches in each sequence to a diatonic key using a Bayesian key-fitting algorithm (Temperley, 2007) which evaluates the extent to which pitch sequences fit characteristics of melodies from western tonal music by assessing them along a number of dimensions, including the extent to which the distribution of interval sizes fits the interval size distribution of Western vocal music and fit to tonal key profiles.
The model considers four sets of probabilities—key profile, central pitch profile, range profile, and proximity profile—which are all empirically generated from the Essen Folksong Collection, a corpus of 6217 European folk songs. The key profile is a vector of 12 values indicating the probability of occurrence for each of the 12 scale tones in a melody from a specific key, normalized to sum to 1. For example, on average 18.4% of the notes in a melody in a Major key are scale degree 1 (e.g., C in C Major), while only 0.1% are scale degree #1 (e.g., C# in C Major). The central pitch profile (c) is a normal distribution of pitches represented as integers (C4 = 60) with a mean of 68 (Ab4) and variance of 13.2, which captures the fact that melodies in the Essen corpus are centered within a specific pitch range. The range profile is a normal distribution with a mean of the first note of the melody, and variance of 29, which captures the fact that melodies in the Essen corpus are constrained in their range. The proximity profile is a normal distribution with a mean of the previous note, and variance of 8.69, which captures the fact that melodies in the Essen corpus have relatively small intervals between adjacent notes. The final parameter of the model is the RPK profile, which is calculated at each new note as the product of the key profile, range profile, and proximity profile. The inclusion of the RPK profile captures the fact that specific notes are more probable after some tones than others.
Calculating the probability of each of the 24 (major and minor) diatonic keys given a set of notes is done using the equation below:

    P(k, c, melody) = P(k) P(c) Πn RPKn

P(k) is the probability of any key (k) being chosen (higher for major than minor keys), P(c) is the probability of a central pitch being chosen, and RPKn is the RPK profile value for all pitches of the melody given the key, central pitch, and prior pitch for each note. We defined melodic structure as the best fit of each sequence to the key (k) that maximized key fit in the equation.
Finally, we used a computer algorithm designed to find beat times in music (Ellis 2007) to determine the location of each
beat, and then we calculated the standard deviation of inter-beat times to measure beat regularity.
The beat tracking algorithm works as follows. First, it divides a sound sequence into 40 equally-spaced Mel frequency bands, and extracts the onset envelope for each band by taking the first derivative across time. These 40 onset envelopes are then averaged, giving a one-dimensional vector of onset strength across time. Next, the global tempo of the sequence is estimated by taking the autocorrelation of the onset envelope, then multiplying the result by a Gaussian weighting function. Since we did not have a strong a priori hypothesis for the tempo of each sequence, the center of the Gaussian weighting function was set at 120 BPM, and the standard deviation was set at 1.5 octaves. The peak of the weighted autocorrelation function is then chosen as the global tempo of the sequence. Finally, beat onset times are chosen using a dynamic programming algorithm which maximizes both the onset strength at chosen beat onset times and the fit between intervals between beats and the global tempo. A variable called "tightness" sets the relative weighting of onset strength and fit to the global tempo; this value was set to 100, which allowed a moderate degree of variation from the target tempo.
The beat times chosen by this algorithm tend to correspond to onsets of stressed syllables, but can also appear at other times (including silences) provided that there is enough evidence for beat continuation from surrounding stressed syllable onsets. This algorithm permits non-isochronous beats, and is therefore ideal for extracting beat times from speech, despite the absence of metronomic regularity (Schultz et al. 2015). As this procedure was only possible when phrases contained at least three identifiable beats, we were only able to calculate beat variability for 42 of the 48 stimuli. As a result, correlational and regression analyses were only run on these 42 stimuli.

B. Results
Song ratings increased with repetition across all stimuli (main effect of repetition, F(1.5, 48.0) = 47.9, p < 0.001). However, this increase was larger for the Illusion stimuli (interaction between condition and repetition, F(1.5, 46.8) = 62.7, p < 0.001). Moreover, song ratings were greater overall for the Illusion stimuli compared to the Control stimuli (main effect of condition, F(1, 31) = 83.6, p < 0.001). (See Figure 1, black lines, for a visual display of song ratings across repetitions for Illusion and Control stimuli in Experiment 1.)
Table 1 displays correlations between initial ratings and rating changes, and the three stimulus attributes. After a single repetition, subjects' reported song perception was only correlated with beat variability. However, rating change was correlated with pitch flatness, melodic structure, and beat variability.

Table 1. Correlation between song ratings and stimulus characteristics. Bolded cells indicate significance at p < 0.05 (Bonferroni corrected).

r-values            Initial rating   Rating change
Pitch flatness      0.151            0.588
Melodic structure   0.141            0.541
Beat variability    0.456            0.402

We used hierarchical linear regression to determine whether pitch flatness, melodic structure, and beat variability contributed independent variance to song ratings. By itself, pitch flatness predicted 34.6% of variance in song ratings (ΔR2 = 0.346, F = 21.2, p < 0.001). Adding melodic structure increased the variance predicted to 45.2% (ΔR2 = 0.105, F = 7.5, p < 0.01). Adding beat variability increased the variance predicted to 55.6% (ΔR2 = 0.105, F = 7.5, p < 0.01).

C. Discussion
As reported previously (Tierney et al. 2013), stimuli for which the speech-song transformation was stronger had flatter pitch contours within syllables. However, this characteristic was not sufficient to fully explain why some stimuli transformed more than others. Adding musical cues, namely melodic structure and beat variability, improved the model. This result suggests that listeners who are relatively musically inexperienced rely on melodic and rhythmic structure when judging the musical qualities of spoken phrases.
We also found, contrary to our prediction, that musical beat variability correlated not just with the increase in song perception with repetition but also with song ratings after a single repetition. Melodic structure and pitch flatness, however, only correlated with song ratings after repetition. This result suggests that the rhythmic aspects important for the perception of song can be perceived immediately, while the melodic aspects take time to extract. This finding could provide a partial explanation for why repetition is necessary for the speech-song illusion to take place, but does not rule out the possibility that satiation of speech perception resources is playing an additional role.
Although these results suggest that syllable pitch flatness, melodic structure, and beat variability all influence speech-song transformation, our correlational design was unable to show that these factors played a causal role. To begin to investigate this issue we ran two follow-up experiments in which syllable pitch flatness (Experiment 2) and melodic structure (Experiment 3) were experimentally varied. We predicted that the removal of either of these cues would diminish the speech-song effect.

III. Experiment 2

D. Methods
1) Participants. 27 participants were tested (17 male). The mean age was 33.1 years (standard deviation 7.4). The mean amount of musical training was 1.0 years (standard deviation 1.6).
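For concreteness, the probability computation at the heart of the Temperley (2007) key-fitting model used in Experiment 1 can be sketched as below. This is a simplified stand-in, not the published implementation: only the two key-profile values quoted in the text (18.4% for scale degree 1, 0.1% for #1) come from the model; the other profile entries are illustrative placeholders, P(k) is flattened to 1/24, and the RPK product is left unnormalized.

```python
import math

# Major-key profile: probability of each chromatic scale degree.
# Only degrees 1 (0.184) and #1 (0.001) are quoted in the text; the
# remaining ten values are illustrative placeholders summing to 1.
MAJOR_KEY_PROFILE = [0.184, 0.001, 0.110, 0.002, 0.128, 0.099,
                     0.002, 0.192, 0.002, 0.122, 0.004, 0.154]

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def melody_probability(notes, tonic, central_pitch):
    """Sketch of P(k, c, melody) = P(k) P(c) * product of RPK values:
    at each note, multiply the key profile, the range profile (normal
    around the first note, variance 29), and the proximity profile
    (normal around the previous note, variance 8.69)."""
    p = (1 / 24) * normal_pdf(central_pitch, 68, 13.2)  # simplified P(k); P(c)
    first, prev = notes[0], None
    for n in notes:
        key_p = MAJOR_KEY_PROFILE[(n - tonic) % 12]
        range_p = normal_pdf(n, first, 29)
        prox_p = normal_pdf(n, prev, 8.69) if prev is not None else 1.0
        p *= key_p * range_p * prox_p  # RPK product
        prev = n
    return p
```

Even with placeholder profiles, a C-major triad outline such as [60, 64, 67] scores far higher against the C-major key than a chromatic cluster such as [60, 61, 63], which is the contrast the melodic-structure measure exploits.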
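The tempo-estimation step of the beat tracker described above, and the beat-variability measure computed from the chosen beat times, can be sketched as follows. This simplified sketch keeps the Gaussian-weighted autocorrelation idea but omits the 40 Mel-band onset analysis and the dynamic-programming beat picker; the function names are illustrative only.

```python
import math
import statistics

def estimate_tempo(onset_env, frame_rate, center_bpm=120.0, spread_octaves=1.5):
    """Pick the autocorrelation lag of the onset envelope that scores
    highest after weighting by a log-Gaussian centred on center_bpm
    (frame_rate = onset-envelope frames per second)."""
    best_lag, best_score = 1, -math.inf
    for lag in range(1, len(onset_env)):
        ac = sum(onset_env[t] * onset_env[t - lag] for t in range(lag, len(onset_env)))
        bpm = 60.0 * frame_rate / lag
        weight = math.exp(-0.5 * (math.log2(bpm / center_bpm) / spread_octaves) ** 2)
        if ac * weight > best_score:
            best_lag, best_score = lag, ac * weight
    return 60.0 * frame_rate / best_lag

def beat_variability(beat_times):
    """Standard deviation of the inter-beat intervals, the regularity
    measure used for the analyzable stimuli (needs >= 3 beats)."""
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return statistics.stdev(ibis)

# An impulse train every 4 frames at 8 frames/s implies 120 BPM:
estimate_tempo([1, 0, 0, 0] * 8, 8)  # -> 120.0
```

The `beat_variability` helper also makes explicit why at least three identifiable beats were required: two inter-beat intervals are the minimum for a standard deviation.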
2) Stimuli. Illusion and Control stimuli from Experiment 1 were altered for Experiment 2. First, the ratio of pitch variability within syllables in the Control stimuli and Illusion stimuli was measured. On average pitch variability within syllables was 1.49 times greater for Control stimuli than for Illusion stimuli. This ratio was then used to alter the pitch contours of syllables (using Praat) such that pitch movements within syllables for Illusion stimuli were multiplied by 1.49, while pitch movements within syllables for Control stimuli were divided by 1.49. This process switched the pitch flatness of Control and Illusion stimuli while maintaining other characteristics such as beat variability and melodic structure.
3) Procedures. Same as Experiment 1.
4) Analysis. To determine whether the pitch flatness manipulation altered song perception ratings, data from Experiment 1 and Experiment 2 were compared using a repeated measures ANOVA with two within-subject factors (condition, two levels; repetition, eight levels) and one between-subject factor (experiment).

E. Results
Similar to the data from Experiment 1, across all stimuli for both experiments there was an increase in song ratings with repetition (main effect of repetition, F(1.4, 81.9) = 82.2, p < 0.001), although this increase was larger for the Illusion stimuli (interaction between condition and repetition, F(1.6, 88.6) = 99.9, p < 0.001). Song ratings were also greater overall for the Illusion stimuli compared to the Control stimuli (main effect of condition, F(1, 57) = 133.9, p < 0.001). However, the pitch flatness manipulation had no measurable effect on song perception ratings (no interaction between experiment and repetition, F(1, 57) = 1.0, p > 0.1; no interaction between experiment, repetition, and condition, F(1.6, 88.6) = 0.56, p > 0.1). See Figure 1 for a visual comparison between song ratings from Experiment 1 and from Experiment 2.

Figure 1. Song ratings for Illusion and Control stimuli, unaltered (black line) and with increased pitch variation for the Illusion stimuli and decreased pitch variation for the Control stimuli (red line).

F. Discussion
Contrary to our predictions, switching the syllable pitch flatness characteristics of the Illusion and Control stimuli did not affect the magnitude of the speech-song transformation. Falk et al. (2014), on the other hand, found that increasing tonal target stability boosted the speech-song transformation; however, our pitch manipulations in this study were much smaller than those used by Falk et al. (2014). While these results do not rule out a role for syllable pitch flatness entirely, they do suggest that this factor does not play a major role in distinguishing transforming from non-transforming stimuli in this particular corpus.

IV. Experiment 3

G. Methods
1) Participants. 31 participants were tested (21 male). The mean age was 35.5 years (standard deviation 7.5). The mean amount of musical training was 1.1 years (standard deviation 2.3).
2) Stimuli. Illusion and Control stimuli from Experiment 1 were altered for Experiment 3 using a Monte Carlo approach with 250 iterations across each of the 48 stimuli. For each iteration the pitch of each syllable was randomly shifted to between 3 semitones below and 3 semitones above its original value. Temperley's (2007) algorithm was then used to determine which of the 250 randomizations resulted in the worst fit to Western melodic structure, and this randomization was used to construct the final stimulus. Note that this manipulation does not affect the pitch flatness within syllables.
3) Procedures. Same as Experiment 1.
4) Analysis. To determine whether the melodic structure manipulation altered song perception ratings, data from Experiment 1 and Experiment 3 were compared using a repeated measures ANOVA with two within-subject factors (condition, two levels; repetition, eight levels) and one between-subject factor (experiment).

H. Results
Similar to the data from Experiment 1, across all stimuli for both experiments there was an increase in song ratings with repetition (main effect of repetition, F(1.4, 86.0) = 81.6, p < 0.001), although this increase was larger for the Illusion stimuli (interaction between condition and repetition, F(1.5, 91.2) = 105.6, p < 0.001). There was also a tendency for the Illusion stimuli to sound more song-like overall (main effect of condition, F(1, 62) = 161.9, p < 0.001). Importantly, however, the melodic structure manipulation changed song perception ratings: the repetition effect for Illusion stimuli was larger for the original stimuli than for the altered stimuli (3-way interaction between repetition, condition, and experiment, F(1.5, 91.2) = 6.1, p < 0.01). The melodic structure manipulation also had different overall effects for the two classes of stimuli, decreasing song perception ratings for the Illusion stimuli but increasing song perception ratings for the Control stimuli (interaction between condition and experiment, F(1, 62) = 6.3, p < 0.05). See Figure 2 for a visual comparison between song ratings from Experiment 1 and from Experiment 3.
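The Monte Carlo manipulation used to build the Experiment 3 stimuli (250 random candidates per stimulus, keeping the one that fits Western melodic structure worst) can be sketched as follows. The melodic-fit scorer is passed in as a caller-supplied stand-in for Temperley's (2007) algorithm, and all names here are illustrative, not from the paper.

```python
import random

def worst_fit_randomization(syllable_pitches, fit_score, n_iter=250, max_shift=3.0):
    """Shift each syllable's pitch by a random amount in [-3, +3]
    semitones, score every candidate with the supplied melodic-fit
    function, and return the worst-fitting (lowest-scoring) candidate."""
    worst, worst_score = None, float("inf")
    for _ in range(n_iter):
        candidate = [p + random.uniform(-max_shift, max_shift)
                     for p in syllable_pitches]
        score = fit_score(candidate)
        if score < worst_score:
            worst, worst_score = candidate, score
    return worst
```

Because every candidate alters only the per-syllable pitch targets, the within-syllable pitch contours (and hence pitch flatness) are untouched, exactly as the method requires.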
Poetry is an artistic form of communication that can contain meter, rhyme, or lexical
content. We examined how the temporal regularity of spoken German poetry varied as a
function of these features. Using a beat tracker that was previously used to measure
speech rate based on moments of acoustic stress, we examined how stress rate and
variability were influenced by meter, rhyme, and lexical content during read poetry
performances. Theories of cognitive fluency state that less cognitively demanding tasks
are performed more quickly and less variably than more cognitively demanding tasks.
Based on evidence suggesting that meter, rhyme, and lexical words increase cognitive
fluency, we hypothesized that the stress rate would be faster and less variable for: 1)
poems with an implied meter compared to those without, 2) rhyming poems compared to
non-rhyming poems, and 3) poems with lexical words compared to pseudo-words. A
German poet performed poetry readings consisting of single phrases where the written
stanzas varied in meter (meter, no meter), rhyme (rhyme, no rhyme), and lexical content
(lexical words, pseudo words). Recordings were subjected to beat tracking analyses that
yielded the mean and standard deviation of the inter-beat intervals for each stanza
measuring the stress rate and variability, respectively. In line with our hypothesis, poems
with a meter were spoken at a faster rate and with less variability than poems without
meter (ps < .001). Bayes factor t-tests revealed substantial evidence for the null
hypothesis indicating no difference in stress rate or stress variability for the factors rhyme
and lexical content. Results indicate that, during readings of poetry, meter increases stress
rate and decreases variability, suggesting that an implied meter may increase cognitive
fluency. However, rhyme and lexical content do not influence stress rate or variability.
The perception of the poetry readings will be examined using recollection tasks and
EEG recordings.
The present study tests German amusics’ perception of vowels, focusing on durational
and qualitative (i.e. spectral) characteristics. We hypothesize that vowel pairs with small
differences in duration or quality are difficult for amusics to discriminate, and that
amusics employ a different cue weighting of spectral and durational cues compared to
normal listeners. Furthermore, we hypothesize that amusics have severe problems
perceiving vowel differences if these are presented with a short inter-stimulus interval
(ISI), as amusics have shown problems with rapid auditory temporal processing
(Williamson et al. 2010).
We tested this with vowel stimuli synthesized on the basis of the German front mid
vowels /e:/ (long and tense) and /ɛ/ (short and lax): By varying the spectral (tense vs. lax)
as well as durational information (long vs. short), we created four separate continua (see
e.g. Boersma & Escudero 2005). These continua were used in a counterbalanced AXB
task, in which the ISI was either 0.2 s or 1.2 s.
Preliminary findings show that the amusics’ overall performance is worse than that of the
controls, and that they have severe problems with an ISI of 0.2 s (almost chance
performance).
This study thus supports earlier findings that amusia is not music-specific but affects
general auditory mechanisms that are also relevant for language. In addition, it shows that
these impairments affect the perception of vowel quality and duration.
ABSTRACT: The development of musical style across time and geography is of particular interest to
historians and musicologists, yet quantitative evidence to support these trends has been lacking. This paper
illustrates a novel application of the nPVI (‘normalized pairwise variability index’) equation to probe and
quantify the rhythmic components of music over time. The nPVI equation quantifies the average difference
between adjacent events in a sequence (e.g. musical notes in a melody, successive vowels in a spoken
sentence). Building upon an earlier finding that German/Austrian composer nPVI values increased steadily
from 1600 to 1950 (while Italian composers showed no such increase), the nPVI ‘distribution’ of themes
from individual composers was quantitatively explored. Interestingly, the proportion of ‘low nPVI’ or
‘Italianate’ themes decreases rapidly with time while ‘high nPVI’ (more Germanic) themes concomitantly
increase in frequency. ‘Middle range nPVIs’ exhibit a constant incidence, arguing for a replacement of ‘low
nPVIs’ (Italianate) with ‘high nPVIs’ over a short time instead of a more modest, long-term progressive
shift. Thus, the precise rhythmic components of complex stylistic shifts in music can be quantitatively
extracted from music and support the historical record and theory.
1. BACKGROUND
The nPVI (‘normalized pairwise variability index’) is an equation originally developed for linguistic
analysis that quantifies the average amount of durational contrast between successive events in a sequence
(e.g. musical notes in a melody or successive vowels in a spoken sentence). A high nPVI value, for
instance, indicates greater durational contrast between adjacent events in a sequence (cf. the Appendix from
[1] for an example of nPVI computation). The nPVI was originally developed by phoneticians to show that
“stress-timed” languages (British English, German, Dutch) have naturally higher nPVI values than
“syllable-timed” languages (French, Spanish, Italian); Patel and Daniele [2] then used the nPVI to show that
a composer’s native language directly influences the rhythms he/she writes [3]–[5] (for recent data on vocalic nPVI in English,
German, Italian, and Spanish, see Arvaniti 2012, Figure 2b [6]). Recent studies of the nPVI have begun to
explore the variation of this statistic with respect to time and culture. More specifically, Daniele and Patel
have demonstrated that the nPVI can reveal patterns in the historical analysis of music [1], [7]–[9]. By
plotting mean nPVI vs. midpoint year for German/Austrian and Italian composers who lived between
~1600 and 1950, a steady increase in nPVI values was observed over this period for German/Austrian music
while Italian music showed no such increase [1]. These data are consistent with the idea (from historical
musicology) that the influence of Italian music on German music began to wane in the second half of the
1700s due to a rise of musical nationalism in Germany [10]. Of note, these findings were replicated and
expanded upon to include an initial increase and decrease in music from 34 French composers during this
era [11].
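The nPVI computation referenced above (cf. the Appendix of [1]) can be sketched in a few lines; this is a straightforward transcription of the standard formula, not the authors' code:

```python
from typing import Sequence

def npvi(durations: Sequence[float]) -> float:
    """Normalized pairwise variability index of a duration sequence.

    Averages 100 * |d[k] - d[k+1]| / ((d[k] + d[k+1]) / 2) over all
    successive pairs; higher values mean greater durational contrast
    between adjacent events.
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2)
        for a, b in zip(durations, durations[1:])
    ]
    return 100 * sum(terms) / len(terms)

# A perfectly even rhythm scores 0; a strongly dotted one scores high.
print(npvi([1, 1, 1, 1]))          # → 0.0
print(npvi([1.5, 0.5, 1.5, 0.5]))  # → 100.0
```

The pairwise normalization (dividing each difference by the local mean duration) is what makes the index insensitive to overall tempo.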
It is important to note that the analyses performed by Daniele & Patel, e.g. [1], [8], were based on
assigning each composer a single, average nPVI value. More recently, when composers’ lives were
demarcated into different compositional epochs (by historical musicologists) [7] it was found that the mean
nPVI (for each compositional period) does not vary dramatically. Nonetheless, many questions remain. For
instance, what is the rhythmic “distribution” of each composer during this transition from 1760-1800? To
what extent might historical factors (the “Italian” influence) affect nPVI distribution over time?
2. AIMS
To test if nPVI could be used to detect dramatic historical shifts in music, the change in nPVI distribution
was explored with respect to the aforementioned 1760-1800 stylistic transition in German music and
compared to a culture where no change in nPVI would be expected (e.g. Italian). A simulation in which
nPVI was modeled to increase modestly over a long period of time was then compared to the calculated
data. Thus, this paper illustrates the utility of the nPVI equation and method in quantifying the precise
rhythmic underpinnings of a complex stylistic transition in music history.
3. METHOD
Theme List, Exclusion Criteria, and Data Analysis: Data collection and analysis were performed according
to the methods and exclusion criteria enumerated extensively in [1], [7]. As in the Daniele &
Patel studies, the musical materials for the current work were drawn from A Dictionary of Musical Themes,
Revised Edition [12]. The midpoint year (the mathematical average of the birth and death year, representing
when the composer was active) was considered in grouping these individuals into periods from the
“Baroque/Classical Era” (1600-1750/1750-1810) which took place before and during the 1760-1800
transition to more stylistically “German” music and the “Romantic Era” (1825-1900) which followed this
transition ([13], Vol. 9 pp. 708-744). Beethoven was considered one of the Romantic composers despite
writing before the “Romantic period” [14], [15]. Modeling of nPVI distributions over time was completed
using the “Normal Distribution Probability Calculator” in Sigma XL. More explicitly, distributions were
calculated by plugging in composer mean nPVI, standard deviation, and nPVI range into the
aforementioned program and then plotting the resulting proportions for each range.
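The distribution modeling described above used Sigma XL's Normal Distribution Probability Calculator; the same proportions can be obtained from the normal CDF. A minimal stdlib-only sketch (the function names and the illustrative mean/SD are mine; the normality assumption follows the Method):

```python
from math import erf, sqrt

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def bin_proportions(mean: float, sd: float, cuts=(32.2, 43.0)):
    """Expected share of themes in the low (< 32.2), mid (32.2-43.0),
    and high (> 43.0) nPVI bins for a composer whose theme nPVIs are
    modeled as normal with the given mean and standard deviation."""
    low = normal_cdf(cuts[0], mean, sd)
    mid = normal_cdf(cuts[1], mean, sd) - low
    return low, mid, 1.0 - low - mid

# Hypothetical composer with mean nPVI 40 and SD 20:
low, mid, high = bin_proportions(40.0, 20.0)
```

The three proportions sum to one by construction, so plotting them against midpoint year directly reproduces the binning procedure described in the text.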
4. RESULTS
With the intent to chronicle the rhythmic underpinnings of German musical evolution, the median nPVI of
German/Austrian composers from different eras of musical history (“Baroque/Classical” 1600-1825, and
“Romantic” ~1825-1900) ([13]; and see Method) was calculated and used to categorize composers. Since
these grouped data sets (“Baroque/Classical” composers (Bach, Haydn, Mozart)) and “Romantic”
composers (Beethoven through R. Strauss, see Table 1) were deemed “non-normal” using the
Anderson-Darling Test for Normality, the medians (and not means) were used (nPVI = 32.2 for the
“Baroque/Classical” period and nPVI = 43.0 for “Romantic”). These values were used to demarcate bins
for each composer’s theme corpus (Table 1). It is worth noting that the nPVI at the intercept of the
German/Austrian and Italian slopes [1] was 41.3 (for a 15 theme cutoff for Italians). Thus, the use of an
nPVI of 43.0 as a demarcation point from the “Italian” style to the “German” style is reasonable.
Table 1
When the frequencies in Table 1 were plotted against each composer’s midpoint year, and regression lines
fitted, a striking trend was observed (Figure 1). The proportion of ‘low nPVIs’ (<32.2) significantly
decreased with time (Frequency (%) = -0.25*(Midpoint Year) +492.26, R² = 0.68, p < 0.01) while the
proportion of themes with higher nPVIs (>43) significantly increased with time (Frequency (%) =
0.29*(Midpoint Year) - 471.74, R² = 0.71, p < 0.01). No significant change in the proportion of themes
with nPVI 32.1 to 43.0 was seen (p=0.14) suggesting the majority of composers wrote the same percent of
themes with nPVIs in this range. These trends are almost identical when the theme cutoff is raised to 100
themes, which excludes Wagner and Strauss, Jr. (data not shown). Thus, very low nPVI themes are
seemingly being replaced with very high nPVI themes while the middle range stays constant. One does not
see these same trends when this analysis is performed on individual Italian composers (a culture that has
been shown to maintain a constant nPVI over time). For Italians, all slopes in these ranges are below (or
equal to) 0.03 and none of the regressions are significant (data not shown).
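The trend lines above are ordinary least-squares regressions of bin frequency on midpoint year; the fit can be sketched as below (the data points here are illustrative, made-up values, not the paper's):

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = a*x + b; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Illustrative data: percent of 'low nPVI' themes vs. composer midpoint year.
years = [1700, 1750, 1800, 1850, 1900]
pct_low = [62, 50, 41, 28, 15]
slope, intercept, r2 = linear_fit(years, pct_low)  # slope < 0, R^2 near 1
```

A significance test on the slope (as reported in the text) additionally requires the t distribution and is omitted here for brevity.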
Figure 1. Percent of themes within a particular range for each composer plotted against midpoint
year (see Table 1). Three vertical dots (black, grey, and white), representing the percent of themes
within a particular nPVI range for each composer are plotted against midpoint year. Linear
regressions were computed from the proportion of themes, for each composer, within each nPVI
range. Lines represent significant linear regressions (p<0.01). Minimum number of themes per
composer = 75.
While the Italian data gave important historical evidence for the potential uniqueness of this rhythmic shift
in German/Austrian music, it was important to test what these nPVI proportions might look like if all
composer nPVIs increased incrementally with time. A computer simulation was performed to test this
hypothesis and the results are shown in Figure 2. Similar to the calculated composer data (Figure 1) the
proportion of ‘low nPVI’ themes (<32.2) decreases in a significant manner (Frequency (%) =
-0.22*(Midpoint Yr) + 432.55, R2=0.71, p<0.01), though the rate is noticeably slower than in the observed
data (compare to slope -0.25, black line from Figure 1). The proportion of “high nPVI” themes (>43) also
increases steadily and significantly (Frequency (%) = 0.26*(Midpoint Yr) – 423.53, R2=0.70, p<0.01) yet
this is also at a slower rate than the observed data (compare to slope 0.29, grey line from Figure 1). Finally,
in stark contrast to the observed composer data (Figure 1) the proportion of “mid-range nPVI” themes
(32.2 < nPVI < 43) in the model decreases significantly with time (Frequency (%) = -0.04*(Midpoint Yr) +
90.98, R2=0.49, p=.017) while the observed composer data showed no significant change. Almost identical
results were found when Wagner and Strauss, Jr. were excluded.
Figure 2. Predicted rhythmic changes over time using simulation software. Predicted percent of
themes within a particular range for each composer (based on each composer’s calculated mean and
standard deviation) plotted against midpoint year. Three vertical dots (black, grey, and white),
represent the percent of themes within a particular nPVI range for each composer and are plotted
against midpoint year. Linear regressions were computed from the proportion of themes, for each
composer, within an nPVI range. In contrast to Figure 1, all regression lines are significant; dotted
line p=0.017
5. CONCLUSIONS
The present study illustrates how nPVI analysis can reveal the precise rhythmic underpinnings of important
musical shifts. Thus, to understand which rhythms are changing (and when) the distribution of nPVI within
each composer’s corpus of themes was explored.
The suggestive increase in the mean nPVI of German/Austrian composers during the aforementioned 1760-
1800 transition away from the Italian influence [1] prompted the development of a different method to
study the “metric” of a composer’s style, namely the distribution and spread of nPVI for all his/her themes
(Table 1 and Figure 1). Testing the robustness and composition of this trend led to the observation that the
frequency of themes with nPVI < 32.2 drops dramatically with time. These themes were replaced with a
concomitant increase in themes with nPVI > 43. Themes in the middle range between 32.2 and 43 did not
show any significant increase over time even when the theme cutoff was raised (Figure 1). A similar
analysis of Italian composers (where one would expect no changes with time) revealed no significant linear
regressions while a simulation of incremental nPVI increases with time also differed significantly from the
observed results (Figure 2). These data offer suggestive evidence for the idea that ‘low nPVI’ themes in
German music are being replaced with ‘high nPVI’ themes while the middle nPVI range stays constant
(“the Replacement Hypothesis”). This is in contrast to an alternative hypothesis that the majority of nPVIs
are modestly increasing to eventually resemble “more German” nPVIs by the mid- to late 1800s (e.g.
Figure 2). More broadly, these data support what musicologists have reported (e.g. [10]): that
composers in the late 1700s made a conscious choice to write “German” themes in place of the “Italianate”
themes they once wrote.
While most of the German/Austrian composers had a proportion of ‘high nPVI’ themes that ranged from
40-50% of themes, two composers, Richard Wagner and Richard Strauss, had nearly 80% of their themes
in this category. With such strikingly different distributions, it is interesting to speculate that these
composers’ themes might be reflecting rhythms that are characteristic of 20th century composers, rather
than the Romantic period they lived in. In fact, their style was most likely characteristic of the “New
German School”, composers who pitted Beethoven’s “absolute music” against their invention, the
“symphonic poem”. Importantly, these composers were not constrained by conventional musical forms e.g.
sonata, concerto, etc. ([16], pp. 338-367). This break from the traditional compositional style can be seen as
early as Wagner’s Tristan und Isolde, which is thought to have inspired future musical conventions like
atonality, often used in “20th century” music ([17], p. 114). Future studies including “New German School”
composers like Franz Liszt and 20th century German composers like Arnold Schoenberg could test if this
predilection for very high nPVI themes is prevalent in these artists.
In conclusion, the current paper demonstrates the significance of using nPVI analysis as a tool to
probe and quantify the precise rhythmic components of stylistic trends in music history. Thus, it provides a
quantitative mechanism for comparison and support to the historical record and theory. Additional research
using nPVI analysis could include the exploration of not only how complex stylistic trends in music history
progress and develop, but also what the specific, underlying rhythmic structure of these trends might be.
REFERENCES
[1] J. R. Daniele and A. D. Patel, “An Empirical Study of Historical Patterns in Musical Rhythm,” Music
Percept. Interdiscip. J., vol. 31, no. 1, pp. 10–18, Sep. 2013.
[2] A. D. Patel and J. R. Daniele, “An empirical comparison of rhythm in language and music,”
Cognition, vol. 87, no. 1, pp. B35–B45, Feb. 2003.
[3] E. Grabe and E. L. Low, “Durational variability in speech and the rhythm class hypothesis,” Pap. Lab.
Phonol., vol. 7, pp. 515–546, 2002.
[4] L. E. Ling, E. Grabe, and F. Nolan, “Quantitative characterizations of speech rhythm: syllable-timing
in Singapore English,” Lang. Speech, vol. 43, no. Pt 4, pp. 377–401, Dec. 2000.
[5] F. Ramus, “Acoustic correlates of linguistic rhythm: Perspectives,” in Proc. Speech Prosody 2002,
Aix-en-Provence, 2002.
[6] A. Arvaniti, “The usefulness of metrics in the quantification of speech rhythm,” J. Phon., vol. 40, no.
3, pp. 351–373, May 2012.
[7] J. R. Daniele and A. D. Patel, “Stability and Change in Rhythmic Patterning Across a Composer’s
Lifetime,” Music Percept. Interdiscip. J., vol. 33, no. 2, pp. 255–265, Dec. 2015.
[8] J. R. Daniele and A. D. Patel, The Interplay of Linguistic and Historical Influences on Musical
Rhythm in Different Cultures. 2004.
[9] A. D. Patel and J. R. Daniele, “Stress-Timed vs. Syllable-Timed Music? A Comment on Huron and
Ollen (2003),” Music Percept. Interdiscip. J., vol. 21, no. 2, pp. 273–276, Dec. 2003.
[10] M. S. Morrow, German music criticism in the late eighteenth century: aesthetic issues in
instrumental music. Cambridge, U.K.; New York: Cambridge University Press, 1997.
[11] N. C. Hansen, M. Sadakata, and M. Pearce, “Non-linear changes in the rhythm of European art
music: Quantitative support for historical musicology.,” Music Percept. Interdiscip. J., 2016.
[12] H. Barlow and S. Morgenstern, A Dictionary of musical themes. London: Faber, 1983.
[13] S. Sadie and J. Tyrrell, The new Grove dictionary of music and musicians. London; New York:
Grove’s Dictionaries, 2001.
[14] M. Solomon, Beethoven. New York: Schirmer Books, 1998.
[15] V. d’Indy, Beethoven; a critical biography. New York: Da Capo Press, 1970.
[16] A. Walker, Franz Liszt, vol. 2. Ithaca, NY: Cornell Univ. Pr., 1993.
[17] J. Deathridge, Wagner beyond good and evil. Berkeley: University of California Press, 2008.
Language and music both require the listener to extract temporal information from
utterances as they unfold in time. Although individuals display different patterns of
coordination when synchronizing with music relative to speech, it is not clear whether
these differences are driven by the way listeners entrain to music compared to language
or by the acoustic features unique to each domain. In the current study, we used several
ambiguous speech utterances that, when repeated, are perceived by both musicians and
non-musicians as transforming from speech to song. These excerpts provide the
opportunity to further understand how a listener’s perception influences physical motor
entrainment to temporal characteristics of excerpts perceived as speech or song. Listeners
tapped to 10 repetitions of ambiguous, speech-to-song and unambiguous speech excerpts
(synchronization) and continued tapping in silence after the repetitions (continuation).
Following this task, participants were also asked to simply listen to three repetitions of
each excerpt, and rate how closely they resembled speech or song. Preliminary results
indicate that listeners have a higher degree of local entrainment to properties of
ambiguous stimuli compared to unambiguous speech stimuli. Further analyses will
examine the differences in multiscale sensorimotor coupling and whether these
differences are driven by acoustic features or by top-down factors. We discuss our results
in terms of the differences and similarities between language and music under the
framework of multiscale sensorimotor coupling.
Please Don’t Stop the Music: Song Completion in Patients with Aphasia
Anna Kasdan1 and Swathi Kiran2
1
Undergraduate Program in Neuroscience, Boston University, Boston, MA, USA
2
Sargent College of Health and Rehabilitation Sciences, Boston University, Boston, MA, USA
Aphasia, an acquired language disorder resulting from brain damage, affects over one
million individuals in the United States alone. Many persons with aphasia (PWA) have
been observed to be able to sing the lyrics of songs more easily than they can speak the
same words. The observation that not only singing, but even humming a melody, can
facilitate speech output in PWA provided the foundation for Melodic Intonation Therapy.
The current study examined the ability of PWA to complete phrases from songs
in an experimental stem-completion format. Twenty PWA and 20 age-matched healthy
controls participated. The task consisted of three conditions (sung, spoken, and melodic)
each consisting of 20 well-known songs. Participants heard the first half of a phrase that
was either sung, spoken, or intoned on the syllable “bum,” and were asked to complete
the phrase according to the format in which the stimulus was presented. Participants were
scored on their ability to complete the melody and lyrics together in the sung condition,
only the lyrics in the spoken condition, and only the melody in the melodic condition.
PWA scored highest in the sung condition, followed by the spoken and melodic
conditions, while controls scored comparably in the sung and spoken condition and much
lower in the melodic condition. Both groups were better able to access the melody of
songs in the sung condition than in the melodic only condition, while there was no
difference in accuracy for lyric production between the sung and spoken conditions.
These results are consistent with the integration hypothesis, which postulates that the text
and tune of a song are integrated in memory. Moreover, the text of a song appears to be
more salient in memory than its tune; this held for both PWA and controls.
Interestingly, the more severe PWA scored higher in the melodic condition than in the
spoken condition, while the opposite trend was found for less severe PWA and controls.
This indicates that access to melody is preserved in PWA, particularly those who are
more severe, even while they exhibit language impairments. Findings may have
implications for using music as a more widely implemented tool in speech therapy for
PWA.
Mounting evidence suggests that musical abilities are closely associated with reading
ability, particularly word decoding (“sounding out” or pronouncing individual words).
For example, musically trained children typically show superior reading abilities
compared to untrained children, and music perception skills are associated with reading skills.
Remaining questions include 1) whether associations remain after controlling for
potential confounding variables (e.g., demographic, cognitive, and personality variables),
2) whether the associations are specific to word decoding or extend to other academic
skills (e.g., reading comprehension, spelling skills, and mathematical ability), 3) which
aspects of music perception best predict reading skills, and 4) whether the associations
are also evident in adult samples. In the current study, we examined the specificity of the
association between musical abilities and reading skills in undergraduate students.
Participants completed measures of academic skills (word decoding, reading
comprehension, spelling, and math; Wide Range Achievement Test), personality (Big
Five Inventory), IQ (Wechsler Abbreviated Scale of Intelligence – Short Form), and
music perception (Mini Profile of Music Perception Skills; Beat Alignment Test), and
they provided musical background and demographic information (e.g., age, gender,
parents’ education, family income). Results (n = 105) suggest that music perception skills
are associated with spelling and word decoding but not reading comprehension or
mathematical abilities. Associations between music perception skills and word decoding
were especially strong and remained significant even when potential confounds (e.g., IQ)
were statistically controlled. Our findings provide further support that musical abilities
are specifically associated with reading and reading-related skills (e.g., spelling) that rely
most heavily on sound processing, even into adulthood.
Research on musical schemas has revealed stock musical patterns that serve particular
functions within a musical piece. For example, Gjerdingen (2007) has shown that this
system of musical schemas was a common means of musical production in the Western
classical music tradition. In a similar manner, within linguistics, construction grammar
researchers argue that language utilizes multiple levels of form-function pairings: like the
semiotic pairing of a form, sound, and meaning to form a word, even sentences carry an
arbitrary, but conventionalized, pairing of a certain structure or form and a meaning (e.g.
X causes Y to Z, the Xer the Yer). There are several parallels between musical schema and
linguistic constructions (cf. Gjerdingen & Bourne 2015). For instance, both entail a
system of categories, from which certain elements can be selected and interchanged in the
various “slots” in the forms. Nevertheless, to draw conclusions about common cognitive
mechanisms for language and music, further, cross-cultural evidence of musical schemas
is needed.
I will present exploratory findings of research on the songs of the ethnic minority Dong
people in Guizhou, China. Historically, the Dong people have no orthography for their
language (Kam), yet they have a rich tradition of singing (cf. Long & Zheng 1998;
Ingram 2014). This research will investigate if and how they may use a similar form-
function pairing of musical chunks. For example, their use of a sor or “musical habitus”
(Ingram 2012) may be indications of categorization of musical entities and prefabricated
chunking similar to constructions in language. Another indicator is their rising, second
interval musical “idiom” that is often used to signal the end of musical sections or songs.
The Dong singing tradition offers an opportunity to investigate a human cultural artefact
to help understand the cognitive relationship between music and language. Moreover,
without an orthographic system, the Dong people may use music as a cultural
replacement of some aspects of language.
Repetition of speech has long been known to cause perceptual transformations. In the
semantic satiation and verbal transformation effects, for example, repetition leads to loss
and shifts of meaning. Similarly, in the speech-to-song transformation (STS) a spoken
phrase starts to sound like a song when repeated several times, even though the phrase
itself is not changed (Deutsch et al., 2011). This makes it a powerful tool for investigating
the differences between speech and song perception without introducing confounding
acoustical differences. While speech repetition triggers various transformations, it
remains an open question why certain phrases transform to song and others do not.
Several studies have touched upon this question and we aim to combine these to provide
a general answer by construing the STS as a categorization problem. Previous studies
suggest that pitch and rhythm related features strongly contribute to the transformation
(e.g. Deutsch et al., 2011; Falk et al., 2014; Tierney et al., 2012). The most detailed study
of such features (Falk et al. 2014) investigated only a small set of stimuli, whereas
Tierney et al. (2012) analyzed many phrases, but used simple pitch and timing measures.
The current study combines both approaches using a web-based experiment with nearly
300 spoken fragments. We analyze the ratings using different measures, including paired
variability index and intensity-weighted standard deviations. The results provide
additional support for the previous findings and suggest that the extent to which a phrase
can be heard as having a clear and musical pitch structure predicts the STS. In summary,
these findings strengthen the idea that speech phrases that transform to song have clear
song-like features to start with. This highlights the problem of inferring the transformed
melody from the sound, which suggests a computational model of the STS. Moreover, it
has implications for perceptual shifts in repeated speech more generally.
Adults speak and sing differently when directing their vocalizations to infants and young
children compared with other adults. Prior studies have shown that infant-directed
vocalizations tend to have higher fundamental frequencies, slower tempos, and greater
variations in tonal contours. Changes in the acoustic qualities of infant-directed
vocalizations appear to affect infants—attracting their attention, raising arousal levels,
and even facilitating speech perception—but it is not clear what properties of infant-
directed vocalizations are most relevant to infant speech and song perception. In the
present study, we investigate the temporal dynamics of infant vs. adult-directed speech
and song using a new methodology designed to analyze temporal clustering in acoustic
events. We hypothesize that temporal clustering is relevant to infant speech and song
perception because it reflects the hierarchical structure of spoken language, and that
infant-directed vocalizations therefore show greater temporal clustering than adult-directed ones.
Acoustic events are peaks in the acoustic amplitudes of an auditory filter bank that spans
a range of frequencies, and temporal clustering is measured with Allan Factor analyses
used previously to measure temporal clustering in neural spike trains. Fifteen German
mothers read a story and sang a nursery rhyme for their 6-month-old infants, and also for
an adult. Analyses showed greater temporal clustering in infant-directed compared to
adult-directed versions of story-reading and singing, particularly at longer timescales on
the order of seconds. These results open new ways of analyzing the temporal dynamics of
speech and song, and relating them to infant language acquisition and musical
development.
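The Allan Factor analysis described can be sketched as follows; the contiguous-window scheme is the standard definition, but the function signature and defaults here are assumptions, not the authors' implementation:

```python
def allan_factor(event_times, window, total_duration):
    """Allan Factor A(T) for a point process: the variance of successive
    window-count differences divided by twice the mean count per window.
    A(T) is ~1 for a Poisson process; larger values indicate temporal
    clustering at timescale T (the window duration)."""
    n_windows = int(total_duration // window)
    counts = [0] * n_windows
    for t in event_times:
        k = int(t // window)
        if k < n_windows:
            counts[k] += 1
    mean_count = sum(counts) / n_windows
    sq_diffs = [(counts[k + 1] - counts[k]) ** 2
                for k in range(n_windows - 1)]
    return (sum(sq_diffs) / len(sq_diffs)) / (2.0 * mean_count)

# Perfectly regular events show no clustering at the 1 s timescale:
af = allan_factor([i * 0.1 for i in range(100)], window=1.0, total_duration=10.0)
```

Sweeping the window duration over multiple timescales (here, on the order of seconds) yields the multiscale clustering profile the abstract refers to.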
Research has shown that aspects of musical training and ability transfer to speech
perception in an unfamiliar language (e.g., Mandarin tones, Gottfried, 2007) and in one’s
native language (e.g., enhanced speech encoding in noise, Zendel et al., 2015). These
findings led us to test whether a musician advantage might also be observed in
comprehension of acoustically degraded speech. Normal-hearing listeners performed a
pretest in which they transcribed 8-channel TigerCIS vocoded speech (which presents
information similar to that provided by cochlear implants; see Loebach et al., 2008).
These listeners were then trained (with feedback) on 60 vocoded Harvard sentences and
40 MRT (Modified Rhyme Test) words or the same sentences and words in natural
speech. Musicians (conservatory-trained) and non-musicians were compared on their
initial accuracy in transcribing the vocoded speech, and their transcription accuracy after
the limited amount of training on natural or vocoded speech. Accuracy was significantly
greater on sentences than words. However, musicians and non-musicians were not
significantly different in their transcription accuracy, neither on the pretest nor on tests
after training (new 8-channel and 4-channel stimuli). Accuracy for all listeners on 8-
channel posttests was greater than on pretests, but there was no significant difference in
posttest performance between those trained on natural vs. vocoded speech. Further tests
using a more extensive training regimen and perceptual tests to assess listeners’ musical
pitch and rhythm abilities are currently being conducted to determine whether specific
musical skills are more relevant to improved speech perception.
High Talkers, Low Talkers, Yada, Yada, Yada: What Seinfeld tells us
about emotional expression in speech and music
Tiffany Bamdad1, Brooke M. Okada1, and L. Robert Slevc1
1
Department of Psychology, University of Maryland, College Park, MD, USA
While both music and speech can express emotion, it is not yet clear if they do so via a
“common code.” On one hand, speech prosody and music convey emotion in similar
ways—through manipulation of rate, intensity, and pitch. On the other hand, relationships
between discrete pitches are critical to the emotional character of musical pieces, whereas
speech prosody does not seem to rely on discrete pitch. Supporting a common code for
emotion in music and speech, Curtis and Bharucha (2010) found that pitches in sad
speech approximate a minor third (an interval associated with sad music). This may even
occur between speakers: Okada et al. (2012) found that individuals tend to speak in keys
separated by dissonant intervals when disagreeing, and by consonant intervals when
agreeing. However, this conclusion required extracting a likely key from spoken phrases
and thus relied on approximations of both pitch and key. The current study more directly tests
whether intervals between interlocutors’ spoken pitches relate to the quality of their
conversation. We selected agreeable and disagreeable conversations from Seinfeld
episodes and compared the pitch of the last word of one character’s utterance with the
pitch of the first word of their conversation partner’s response. Distance between the
pitches was approximated to the closest musical interval and categorized as a perfect
consonant, imperfect consonant, or dissonant interval. In contrast to previous findings,
agreeable conversations did not contain more perfect or imperfect consonant intervals
compared to disagreeable conversations, and disagreeable conversations did not contain
more dissonant or imperfect consonant intervals compared to agreeable conversations.
These results suggest that expression of emotion in speech does not manifest in musically
relevant intervals between interlocutors, supporting the view that emotion in speech and
music does not rely on a common code, at least between speakers in conversation.
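The interval categorization used here can be sketched as below; mapping semitone classes onto perfect/imperfect consonance follows common-practice theory and is my assumption, not necessarily the authors' exact scheme:

```python
from math import log2

# Interval classes in semitones (mod 12), per common-practice theory:
PERFECT = {0, 5, 7}       # unison/octave, perfect fourth, perfect fifth
IMPERFECT = {3, 4, 8, 9}  # minor/major thirds and sixths
# Remaining classes (seconds, sevenths, tritone) are treated as dissonant.

def classify_interval(f1_hz: float, f2_hz: float) -> str:
    """Round the distance between two pitches (in Hz) to the nearest
    semitone and label the resulting interval class."""
    semitones = round(12 * log2(f2_hz / f1_hz)) % 12
    if semitones in PERFECT:
        return "perfect consonant"
    if semitones in IMPERFECT:
        return "imperfect consonant"
    return "dissonant"

print(classify_interval(220.0, 330.0))  # perfect fifth → "perfect consonant"
```

Taking the ratio in log2 space converts frequency to pitch distance, so the same code handles ascending and descending responses between interlocutors.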
New, freely-available technology for studying the impact of simple musical training on
speech perception in cochlear implant users
Thomas R. Colgrove,* Aniruddh D. Patel#
* Departments of Computer Science and Music (BS 2015), Tufts University, USA
# Department of Psychology, Tufts University, USA
trcolgrove@gmail.com, a.patel@tufts.edu
II. DETAILS OF TRAINING

A. Notation
An important feature of Contours is the simplicity and readability of the musical notation. While traditional music notation has a steep learning curve, the notation in Contours uses color as a visual aid to assist users in playing the patterns presented (Figure 4), and does not require learning note names (e.g., C, D, E, etc.). The notation uses unstemmed circles to represent pitches. Each circle has a specific color, which corresponds to a color on the MIDI keyboard (Figure 5). The notation and keyboard span a 3-octave range (from C2 to C5, i.e., MIDI notes 48 to 84), with the pattern of colors repeating each octave (e.g., all Cs on the keyboard are purple, across different octaves).

As with Western music notation, the staff lines and spaces map in a one-to-one fashion to the large keys of the keyboard (i.e., the white keys, whose tips are covered with colors, cf. Figure 5). Thus the contour shown in Figure 4 involves playing an ascending contour which starts on a large blue key (near the middle of the keyboard) and skips one large key between each of its pitches. An important difference between the music notation used in Contours and Western music notation is that the notation in Contours does not include any accidentals. (Correspondingly, none of the black keys are used during training.) This is to maintain the simplicity of the notation and make it easy to learn for individuals with no prior musical training.
B. Training Program Flow
i. Each time a user plays Contours, they select a contour difficulty (easy, medium, or hard, cf. Figs. 1-3), a pitch gap size between successive notes in the contours (narrow, medium, or wide), and a sound for the application to play (sine wave, triangle wave, square wave, sawtooth wave, or sampled piano sounds). For pitch gap size, narrow, medium, and wide correspond to the number of colored keys skipped when playing consecutive notes in a contour: i.e., 0, 1, or 2, respectively. (Thus the ascending contour shown in Figure 4 has a medium gap size.)

There is no enforced order in which the user progresses through the different conditions. This allows the researcher to suggest a particular order, e.g., the easy level first, then medium, then hard, with wide pitch gaps at all levels, followed by repeating these three levels with medium pitch gaps, and then again with narrow pitch gaps, all using sine-wave tones. This example would result in 9 conditions (3 contour difficulty levels x 3 pitch gap sizes), all using the sine-wave sound. Repeating this regimen with another sound (such as the square wave) would add 9 more conditions.

Figure 4. Screenshot of Contours, presenting an easy contour to the user. The next note to be played is indicated by an inverted wedge over the circle in the music notation, and over the key that should be pressed on the keyboard to produce that note. When the corresponding key is pressed, the wedge moves to the next pitch in the music notation and the corresponding key on the image of the keyboard. The image of the keyboard at the bottom of the Contours screen is for visual reference only: the user plays a physical keyboard attached to the tablet, which has the same color scheme (see Figure 5).
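The suggested progression through the condition grid can be sketched as follows. This is illustrative only; the variable names and traversal are assumptions, not part of the app's code.

```python
# Sketch of the 3 x 3 condition grid described above, traversed in the
# suggested order: all three difficulty levels at wide gaps, then at
# medium gaps, then at narrow gaps, all with one sound.
DIFFICULTIES = ("easy", "medium", "hard")
GAP_SIZES = ("wide", "medium", "narrow")
SOUNDS = ("sine",)  # repeating the regimen with another sound adds 9 more

schedule = [
    (difficulty, gap, sound)
    for sound in SOUNDS
    for gap in GAP_SIZES
    for difficulty in DIFFICULTIES
]
```

Adding a second entry to SOUNDS (e.g., a square wave) would extend the schedule to 18 conditions, matching the paper's count.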
ii. Once the user chooses a difficulty level, a pitch gap size, and a sound, Contours presents the user with a block of contours consisting of each contour shape from that difficulty level at 5 different transpositions. (Hence a block has 45 contours at the easy level, and 50 at the medium and hard levels, cf. Figs. 1-3.) For a given contour shape, transpositions are chosen randomly (without replacement) from all possible transpositions that fit within the 3-octave range. Thus, 5 different transpositions are always presented, and repeating a block will result in a new set of randomly chosen transpositions.

iii. When playing a given contour, if the user makes a mistake (e.g., plays a wrong note), a visual message appears in the center of the screen (e.g., “Whoops!”) and the user must begin the contour again from the first note. Upon playing a contour correctly, the user gets a visual congratulation message (e.g., “Good job!”) and earns points, which appear in the top bar of the screen. (See the section on Data Metrics for a description of how points are calculated.)

Figure 5. Contours running on an Android tablet, connected to the color-coded 3-octave MIDI keyboard. Colors are placed on the white keys using adhesive paper (no black keys are used in training). Each note within an octave has a unique color, and the color pattern repeats across octaves.
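The transposition scheme described in item ii above can be sketched as follows. This is a hedged reconstruction: the app's internal representation is not specified here, so transpositions are modeled as start indices along the white-key (colored-key) range.

```python
import random

# White-key MIDI numbers in the app's 3-octave range (C2-C5, MIDI 48-84);
# only these keys carry colors, so transpositions move along this list.
WHITE_KEYS = [n for n in range(48, 85) if n % 12 in (0, 2, 4, 5, 7, 9, 11)]

def block_transpositions(shape_span: int, k: int = 5) -> list:
    """Pick k start positions without replacement, keeping a contour that
    spans `shape_span` white-key steps inside the 3-octave range."""
    candidates = range(len(WHITE_KEYS) - shape_span)
    return random.sample(candidates, k)
```

Because sampling is without replacement, each block presents 5 distinct transpositions, and re-running a block draws a fresh random set, as the paper describes.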
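The point scheme laid out in the Data Metrics section below can be sketched as follows. The paper does not say whether the speed bonus is clamped at zero for very slow completions, so this sketch leaves it unclamped.

```python
def contour_score(completion_time_ms: float, streak: int) -> float:
    """Sketch of the Contours point rule: 100 points for a correct
    contour, plus a bonus of 100 minus the completion time in units
    of 250 ms, all multiplied by a streak multiplier capped at 8."""
    bonus = 100 - completion_time_ms / 250
    multiplier = min(streak, 8)  # consecutive error-free contours
    return (100 + bonus) * multiplier
```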
iv. Once a block of contours is completed, the user is asked a few short questions about the contours played, and responds to each question using a number from 1 to 5. The questions are:
i. How difficult was the task?
ii. How different did the contours sound from each other?
iii. Do you think you are improving at the task?
v. Once a block is completed, quantitative data about the user's performance on each of the 45 (or 50) contours is logged, such as the number of errors, speed, and accuracy. These data are stored locally on the Android device, and sent to a remote server in JSON format (see the section on Data Metrics for details). These data can be downloaded from the server as spreadsheets in .csv format for analysis. This allows researchers to quantify the progress of users over multiple sessions. (Responses to the questions described above are also stored.)

C. Data Metrics
Within a block, the user earns points for each correctly played contour. Points reflect both the speed and accuracy of performance. The score for a contour = 100 points (for completing the contour correctly), plus a bonus which equals 100 minus the total time it took to complete the contour (in milliseconds/250). This number is then multiplied by an integer which corresponds to how many contours have been completed in a row without any errors (the maximum multiplier is 8). Finally, this number is added to the existing score to create a running point total for the block. After completing a contour, the top bar briefly displays the points earned for that contour and the running total for the block, with the latter score remaining on screen throughout the block.

When a block is completed, performance data about that block are stored locally on the tablet, and then uploaded to a server when an internet connection is available. These data can then be downloaded from the server as a .csv spreadsheet by the researcher for analysis, e.g., to quantify improvement over time. Each spreadsheet has 11 columns, as follows:

1. Contour ID: an integer between 1 and 9 (for the easy condition) or 1 and 10 (for the medium and hard conditions), specifying contour shape. For the easy condition, the numbers correspond to the layout of shapes in Figure 1, in the following order:
1 2 3
4 5 6
7 8 9
For the medium and hard conditions, the numbers correspond to the layout of shapes in Figures 2 and 3, respectively, in the following order:
1 2
3 4
5 6
7 8
9 10
2. Difficulty: easy, medium, or hard (cf. Figs. 1-3).
3. Note gap: an integer between 0 and 2, indicating how many colored keys are skipped between consecutive notes of the contour.
4. Sound: specifies which of the 5 sounds was used: sine wave, square wave, triangle wave, sawtooth wave, or sampled piano.
5. Start note: the MIDI note number of the first note of the contour (indicates the transposition level).
6. Total completion time: the time from when the contour is first posted on the screen to when it is completed correctly. If mistakes were made and the user had to start the contour again, this time is included. (However, time taken by the posting of messages triggered by errors, e.g., “Whoops!”, is not included.)
7. Num errors: the number of wrong notes played before completing the contour correctly.
8. Percent error: Num errors divided by the number of notes played while completing a contour.
9. Completion time (correct run): the time between the onset of the first note and the onset of the last note when the contour is played correctly.
10. Note interonset interval sd: the standard deviation of the durations between note onsets when the contour is played correctly.
11. Date and time: the date and time the contour was completed.

Details on how to access data spreadsheets from a server are provided on the GitHub repository for the Contours project.

III. TECHNICAL SPECIFICS AND EXTENSIBILITY
The Contours training application for Android can be downloaded from GitHub and installed on any Android tablet device, although tablets with at least 9” of screen space, running Android 5.0 or higher, are recommended. The tablet should be connected to a 3-octave MIDI keyboard to which colored adhesive paper has been added to match the color scheme in the software (see Figure 5). We used a CME Xkey 37.

The application is open source under the MIT license (https://opensource.org/licenses/MIT), and can be freely used, shared, and modified to suit different testing needs. While there are many possible extensions of the program, a few can be made with relatively minimal coding (see below). We chose Android as the operating system for its portability and affordability, the open nature of the platform, and the ability to use the application on many different devices.

A. Modification of Contour Shapes in XML
The preloaded musical contour shapes are defined in a project xml file, contours.xml, which can be easily modified to provide other contours of different lengths and shapes. This allows researchers to easily expand on or replace the preloaded set of contours. For example, some users may want to progress beyond the preloaded contours to more challenging and complex shapes. Each contour shape only needs to be specified once, as the software will automatically transpose the contour to multiple different pitch levels. Example xml codings of three contours are shown below. Note that contours
are specified using standard note names from Western music
(e.g., C2, E2), which combine a pitch class and an octave number. (Further documentation on code specifics can be found on the project GitHub.)

<!-- Rising one skip -->
<item>C2,E2,G2,B2,D3</item>

<!-- Rising Flat one skip -->
<item>D2,F2,A2,A2,A2</item>

<!-- Rising Falling one skip -->
<item>C2,E2,G2,E2,C2</item>

Three xml-formatted contours, 5 notes each.

B. Modify Application Timbre using Pure Data
One powerful component of Contours is the ability to test the effect of various timbral qualities on pitch perception and pitch contour learning among CI users. The synthesizer component of the application uses the Pure Data visual language to enable the generation of a wide array of different musical sounds. The integration of Pure Data with Contours is modular and allows modification and swapping of Pure Data patches. It is straightforward to modify or swap out the currently included Pure Data patches to use an expanded set of sounds. The two currently included modules are a fully functional subtractive synth, based on a related implementation written by Christopher Penny, and an instrument sampler. Possible modifications include modifying the existing subtractive synth, changing the sample sets available on the sampler, or using a different patch altogether. More detailed instructions for extending the Contours synthesizer can be found in the project documentation.

C. Example Server
The Contours application comes preloaded with the capability to send detailed test results to a server application in JSON format. A fully featured example server that can be used with the Contours application is provided on the application GitHub. The server uses a Flask/Python/MongoDB back end. The example server also includes a simple front-end webpage which displays summary test results in tabular form, and enables the spreadsheet download of results for each completed block of contours in CSV format. Administrators of the Contours training program may use the example server as provided, extend it, or use their own server.

D. Further Extensions
More extensive changes to the Contours application require some knowledge of Java and the Android SDK. Brief descriptions of possible future extensions are given in the following section.

IV. PROPOSALS FOR FUTURE CAPABILITIES
The Contours application currently tests the effects of simple musical keyboard training on CI users' ability to perceive pitch contours. However, there are multiple ways in which this application could be significantly extended. For instance, the rhythm and cadence of speech is arguably as important as pitch variation. Furthermore, rhythm is often a fundamental component of people's ability to identify different melodies. It is possible that introducing a rhythmic component to the training program could be of greater benefit to CI users in understanding speech.

Furthermore, the keyboard training at present only presents the user with monophonic melodic patterns. It would be fairly straightforward to extend the training to include contours featuring chords, or more than one note played at the same time.

V. CONCLUSION
Contours provides a flexible, sophisticated platform for training CI users to play simple melodic contours, with the goal of improving their pitch contour perception, with consequent benefits for both music and speech perception. The software assumes no prior musical knowledge or experience. Existing perceptual tests, such as the melodic contour identification (MCI) test (Galvin et al., 2007), can be given before and after training to determine if learning to play melodic contours enhances the ability to perceive and discriminate pitch contours. Ultimately, one could compare such training to purely auditory training (matched in duration) to test the hypothesis that actively playing contours is more effective for enhancing pitch perception, due to the auditory-motor nature of such training.

The software is available for download at https://github.com/contoursapp/Contours

ACKNOWLEDGMENT
This research was supported by a grant from the Paul S. Veneklasen Research Foundation. We thank Allison Reid for her help in developing the medium and hard difficulty contours and testing the Contours software, and Paul Lehrman and Chris Penny for their technical insights.

REFERENCES
Galvin, J. J., Fu, Q. J., & Nogaki, G. (2007). Melodic contour identification by cochlear implant listeners. Ear and Hearing, 28, 302-319.
Galvin, J. J., Fu, Q. J., & Shannon, R. V. (2009). Melodic contour identification and music perception by cochlear implant users. Annals of the New York Academy of Sciences, 1169, 518-533.
Lahav, A., Saltzman, E., & Schlaug, G. (2007). Action representation of sound: Audiomotor recognition network while listening to newly acquired actions. The Journal of Neuroscience, 27, 308-314.
Lappé, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. The Journal of Neuroscience, 28, 9632-9639.
Limb, C. J., & Roy, A. T. (2014). Technological, biological, and acoustical constraints to music perception in cochlear implant users. Hearing Research, 308, 13-26.
Mathias, B., Palmer, C., Perrin, F., & Tillmann, B. (2015). Sensorimotor learning enhances expectations during auditory perception. Cerebral Cortex, 25, 2238-2254.
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hearing Research, 308, 98-108.
Research suggests that music and language draw on similar syntactic processing
resources, but questions remain about the nature of these resources, and the degree to
which they are shared. Evidence suggests that performance is impaired when both
domains simultaneously process syntax. However, most of this evidence comes from
experiments that introduce syntactic errors in music (e.g. out-of-key notes) or language
(e.g. grammatically incorrect sentences). Such experiments cannot rule out the possibility
that effects are due to shared error detection mechanisms independent of syntactic
processing. In a series of experiments, we examined whether evidence for syntactic
resources shared by music and language can be elicited without introducing syntactic
errors, and whether such interference effects are modulated by attention. In Experiment 1,
participants listened to audio stimuli while reading language. We discovered that memory
for complex sentences (with syntax), but not word-lists (no syntax), was (1) lower when
paired with a single-timbre melody than when paired with environmental sounds, and (2)
higher when paired with a melody with alternating timbres than when paired with a
single-timbre melody. In Experiment 2, participants did a same/different task, which
found that the latter effect occurred because alternating timbres drew attention away from
syntax, thereby releasing resources for processing syntax. In Experiment 3, participants
selectively attended to one stream of audio stimuli, while reading language. Results
showed that memory for complex sentences was worse when participants attended to
melodies while ignoring environmental sounds, and better when participants attended to
environmental sounds while ignoring melodies. This suggests that attention modulates
syntactic processing. Overall, the results indicate that syntactic interaction effects cannot
be explained by overlapping error detection mechanisms, and that attention modulates
syntactic processing.
Every day we use our experiences and knowledge to make sense of the sights and sounds
around us. Although there is a wealth of literature examining speech and music
processing independently, little is known about how we flexibly recruit musical and
linguistic knowledge in different contexts while also controlling for differences in
acoustic features. As such, it remains unclear whether children are able to recruit domain-
specific knowledge even when acoustic features are held constant, or whether domain-
specific knowledge is primarily driven by acoustic features that are characteristics of each
domain. We examined how adults and children detect pitch interval changes in both a
spoken and sung context for overtly spoken and sung utterances and for ambiguous
utterances that straddle the boundary between speech and song. Using a pitch contour
expansion, we expanded the pitch relationships of spoken, sung, and ambiguous
utterances, such that the pitch contour was preserved, but pitch interval relationships were
broken. Pitch contour is a salient aspect of both speech and song, but pitch interval
relationships are uniquely important in music. Thus, if listeners recruited domain-specific
knowledge when listening to overtly spoken and sung utterances, listeners should be
more sensitive to these pitch interval changes for song than speech. We also examined
pitch interval change sensitivity for ambiguous utterances when they were presented in a
language vs. a music context. Finally, as domain-dependent recruitment of knowledge
can help listeners extract relevant features in each domain, we examined whether
domain-specific recruitment of knowledge was correlated with phonological processing.
Adults showed greater sensitivity for sung compared to spoken utterances, but their
performance was not related to phonological processing. Results for children (4-8 years
of age) will be discussed.
Much of behavioral music cognition research relies on the attentional control and self-
discipline of the listener, who is asked to make ratings or judgments about a sound, or to
classify a sound into one of multiple categories. Although such tasks are easily carried
out by adults, they may pose an unnecessary challenge to populations such as young
children. While a venerable tradition of “rocket-ship psychophysics” employs cleverly
designed games that measure discrimination and detection thresholds among children
(Abramov et al., 1984), there is an ongoing need for ecologically valid, engaging tasks to
be used in auditory cognition research with children. We adapted a video game paradigm
(Lim & Holt, 2011) to examine auditory pattern-learning in adults and children. The
fictional premise is that the player (child) must save Earth by capturing or destroying
noisy aliens, whose sounds predict their status as friends or enemies before they appear
on screen. During a training phase listeners are trained with feedback to classify aliens
according to two sound patterns (the “friend” or “enemy” pattern), and during a test
phase they must classify aliens in the dark (due to a “power outage”). We have adapted
this paradigm to replicate a prior adult pattern-learning experiment in our lab showing
that listeners can classify (using similarity ratings) novel syllable or melodic patterns
based on prior exposure to other exemplars (Hannon et al., in preparation). Assuming full
replication of adult findings is successful, we will extend the task to children. The game
is general enough that it could be used to ask many different types of questions involving
auditory classification, but by virtue of its ecological validity it promises to yield more
robust, reliable data from young children.
Song is a combined form of words and music, and an efficient medium for the
expression of emotions and thoughts. Songwriting is one of the major therapy techniques
utilized to facilitate self-expression, including diverse emotions. Therefore, selecting and
matching vocabulary to the melodic line, or vice versa, is an important concept in
songwriting. A few studies on music and language suggest that there is a similarity
between the melody of a song and the prosodic characteristics of language. However,
few studies examine relationships between the prosodic contours of emotion
vocabularies from musical or tonal aspects. Such studies can provide crucial information
needed for songwriting.
The purpose of this study was to investigate the characteristics of tonal elements in emotional
vocabularies in adults by analyzing the prosodic features of words used in a
designated context when speaking: pitch pattern, loudness (dB), and speech rate (syl/sec).
For the analyses, words for 3 different emotions from valence theory (happy, sad,
fear/anxiety) and one neutral word (be careful) were processed to extract acoustic values
using Praat. The data were transformed into pitch (hertz), as Praat records them in
semitone figures, in order to examine the melodic contour.
The results showed that ‘ʙul-ƞan-hɛ-sʌ’ (fear/anxiety) and ‘ʤl-ɡʌ-ƞwa-ƞjo’ (fun) were
common pitch patterns among males: ‘ʙul-ƞan-hɛ-sʌ’ was A2-A2-G2-B2 and
‘ʤl-ɡʌ-ƞwa-ƞjo’ was G2-G2-G2-E2. In contrast, ‘sɯl-pʌ-ƞjo’ (sad), ‘hɛƞ-bok-hɛ-sʌ’
(happy), and ‘ʤo-sim-hɛ-ƞjo’ (be careful) were common pitch patterns among females:
‘sɯl-pʌ-ƞjo’ was A3-B3-A3, ‘hɛƞ-bok-hɛ-sʌ’ was D4-E4-B3-B3, and ‘ʤo-sim-hɛ-ƞjo’
was G3-G3-D3-A2. Other results were that the mean loudness of ‘sad’ was 40 dB,
‘happy’ 44.47 dB, and ‘be careful’ 41.71 dB. Overall, results suggest that the word
‘happy’ had the highest pitch pattern and loudness, whereas the word ‘sad’ had a lower
pitch pattern and loudness. There were differences in speech rate among the words,
although not at a significant level. The results provide basic data on the relationship between
the emotionality of words and tonal contour. Further, these findings can be incorporated into
songwriting.
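The semitone-to-hertz transformation mentioned above follows the standard exponential relation. A minimal sketch, assuming Praat's common "semitones re 100 Hz" scale (the reference frequency is configurable in Praat):

```python
def semitones_to_hz(semitones: float, ref_hz: float = 100.0) -> float:
    """Convert a pitch value in semitones (relative to ref_hz) to hertz:
    every 12 semitones doubles the frequency."""
    return ref_hz * 2.0 ** (semitones / 12.0)
```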
In this experiment we investigated whether musical skills are associated with grammar
processing in musically trained and untrained 6- to 9-year-old children. The children
completed a music aptitude test and a standardized test of receptive grammar. After
controlling for demographic factors, working memory and IQ, rhythm-perception skill,
but not melody perception, was found to predict receptive grammar. This result suggests
an overlap in the temporal domain for language and music. Musically trained children
outperformed untrained children on the test of melody perception. They did not, however, show a rhythm
perception or receptive grammar advantage. This result suggests that music training may
not lead to language advantages if it does not promote the development of rhythm-
perception skill.
The origin of music and language is a fundamental question in science. Darwin’s musical
protolanguage hypothesis holds that music and language evolved from a common
precursor that gradually diverged into two distinct systems with distinct communicative
functions. In this investigation, a serial reproduction task was used to simulate this
process of divergence. Fifteen participants were presented with eight vocalized nonsense
sounds and were asked to reproduce them with specific communicative intentions. Four
vocalizations functioned to communicate an emotional state (happy, sad, surprised,
tender) and four vocalizations were referential, communicating a physical entity (tree,
hill, stone, grass). Participants were told to reproduce each sound with the defined
communicative intention strongly held in mind (the assignment of the eight vocalizations
to emotional or referential conditions was counterbalanced across participants). The
resultant vocalizations were then recorded and played to a new set of 15 participants, who
were given the same instructions as the first set of participants. This procedure was
continued across five sets of participants (“generations”) to create 15 “chains” of
reproductions. Recordings of the final 120 vocalizations were presented to two new
groups of participants in Australia (N=40) and China (N=40), who rated each in terms of
its perceived music- and speech-like qualities. Emotional vocalizations were rated as
significantly more music-like than referential vocalizations, and referential vocalizations
were rated as significantly more speech-like than emotional vocalizations. Acoustic
analysis of vocalizations revealed clear differences in rate, intensity, and pitch variation
for emotional and referential vocalizations. This study is the first to experimentally
demonstrate how a musical protolanguage may have gradually diverged into speech and
music depending on the communicative intention of vocalizations.
In order to convey not only the semantic information of a score but also the emotional
connotation of the work, every interpretation of a classical composition implies an
attempt to identify the artistic and even ethical-religious intentions of the composer. The
findings are to be expressed in the performance by means of only a few musical
categories - tempo, dynamics, phrasing, articulation, timbre - which on a technical level,
however, encompass a host of optional parameters, among which musicians and
conductors must permanently navigate in order to target the emotional impact of their
interpretations.
This presentation aims to explain and exemplify these options based on recent findings
within the fields of musical perception and semiotics.
Why do listeners move to music? To find out, we measured the postural sway of two
trombonists as they each recorded multiple performances of two solo pieces in each of
three different expressive styles (normal, expressive, non-expressive). We then measured
the postural sway of 29 non-trombonist listeners as they “air conducted” the
performances (Experiment 1), and of the two trombonists as they played along with the
same recorded performances (Experiment 2). In both experiments, the velocity of
listeners’ sway was more similar to that of the performer than expected by chance. In
other words, listeners spontaneously entrained to the performer’s movements,
changing their direction of sway at the same time as the performer simply
from hearing the recorded sound of the music. Listeners appeared to be responding to
information about the performer’s movements conveyed by the musical sound. First,
entrainment was only partly due to performer and listeners both swaying to the musical
pulse in the same way; listeners also entrained to performers’ unique, non-metrical,
movements. Second, listeners entrained more strongly to back-and-forth sway (more
closely linked to the sound-producing movements of the trombone slide) than to side-to-
side sway (more ancillary). Both effects suggest that listeners heard the performer’s
movements in the musical sound and spontaneously entrained to them. People
spontaneously entrain to the movements of other people in many situations. In the present
study, entrainment appeared to be an integral part of listeners’ affective response to the
music. Listeners rated performances as more expressive when they entrained more. They
also rated expressive performances higher than non-expressive ones, but they did not entrain
more strongly to more expressive performances. Thus, entrainment was determined by
the listener, not by the expressive style of the performer. Entrainment appears to enhance
listeners’ experience of musical feelings expressed by the musician.
Human interaction often involves the rhythmic coordination of actions across multiple
individuals, such as in ensemble musical performance. Traditional accounts of
sensorimotor synchronization (SMS) posit that precise synchronization is the product of
adaptive mechanisms that correct for asynchronies between performers in a reactive
manner. However, such models cannot account for how synchrony is maintained during
dynamic changes in tempo. The adaptation and anticipation model (ADAM) proposes
that performers also make use of predictive mechanisms to estimate the timing of
upcoming actions via internal simulations of a partner’s behavior. According to this
account, predictive mechanisms rely on attention whereas basic adaptive mechanisms are
automatic. To investigate the effect of attention on adaptive and anticipatory
mechanisms, we fitted a computational implementation of ADAM to data collected from
two SMS tasks. Estimates of adaptation were derived from an SMS task involving an
adaptive virtual partner, wherein participants drummed in synchrony with an adaptive
metronome that implemented varying levels of phase correction. Estimates of
anticipation were derived from a task in which participants drummed in synchrony with
sequences with constantly varying tempo. To assess the role of attention, participants also
did these tasks while engaged in a concurrent visual one-back task. As expected,
synchronization accuracy and precision were not modulated by the secondary task for the
virtual partner task, but were significantly impaired under dual task conditions for the
tempo-changing task. Moreover, model estimates of adaptive mechanisms were not
altered by the secondary task, whereas model estimates of predictive mechanisms
indicated participants utilized reduced predictive control while engaged in the secondary
task. These results suggest that rhythmic interpersonal synchronization relies on both
automatic adaptive and controlled predictive mechanisms.
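The adaptive component described above is commonly formalized as linear phase correction, in which a tapper corrects a fixed fraction of each asynchrony on the next tap. The sketch below is a minimal illustration of that general idea, not the authors' ADAM implementation; all parameter values are invented for demonstration.

```python
import random

def simulate_phase_correction(alpha, n_taps=500, period=500.0,
                              noise_sd=10.0, seed=1):
    """Simulate a tapper who corrects a fraction `alpha` of the current
    asynchrony on each tap (first-order linear phase correction).
    Returns the series of signed asynchronies (ms) to a metronome."""
    rng = random.Random(seed)
    asyn = 0.0
    asynchronies = []
    for _ in range(n_taps):
        # Next inter-tap interval: base period minus a correction
        # proportional to the current asynchrony, plus timing noise.
        iti = period - alpha * asyn + rng.gauss(0.0, noise_sd)
        asyn = asyn + iti - period  # drift relative to the metronome
        asynchronies.append(asyn)
    return asynchronies

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Moderate correction keeps asynchronies bounded; with no correction
# they random-walk away, which is why adaptation is needed at all.
print(sd(simulate_phase_correction(alpha=0.5)))
print(sd(simulate_phase_correction(alpha=0.0)))
```

With `alpha = 0.5` the asynchrony series is a stationary AR(1) process with small spread; with `alpha = 0.0` it is an unbounded random walk.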
Introduction
Bilateral gesture in North Indian classical vocal music (khyal) has a complex relationship
with music structure and aesthetic intentions, and appears to augment emotional
expression and serve important pedagogical functions. Gesture has an important role in
Indian vocal training, especially in relation to processes of attention, cognition, and
memory. In this paper, I examine the multimodal interactions between voice production,
emotional expression, song text, and gesture through an audio/visual analysis of
performers in three different vocal traditions. I will examine distinctions between
different vocal gharanas (schools) through both gesture and related stylistic/aesthetic
elements. The use of spatial visualization of sound based on particular vowels in the
dhrupad vocal genre will also be discussed as a possible influence on movement.
Method
The methods employed consist of collecting ethnographic data through interviews and
audio/video recordings, participation in learning situations, and examining Indian
theoretical treatises on vocal music. The data is correlated with performance practices of
the gharanas to determine the relationships between aesthetics, vocal culture, and
musical structure.
Results
While the use of gesture is not formally taught and can be idiosyncratic, performers from
specific vocal gharanas exhibit some common practices. However, between gharanas
gestures tend to emphasize the dominant aesthetic features particular to that gharana, and
correlate with distinct musical elements.
Discussion
For North Indian classical vocal music, the system of oral transmission and
individualized training in a particular gharana is evident in gesture, which developed
through the process of enculturation. My evidence shows that gesture is a component of
multi-modal processes of learning and expression, which supports the hypothesis of the
embodied nature of emotional-cognitive processes.
Traditionally, participatory musical behavior has been explored through what seems to be
an individualist perspective, which considers the single agent as the main explanatory
unit of the interaction. This view, however, may downplay the role of real-time
interactions, reducing joint action to a stimulus-response schema. Our aim, then, is to go
beyond such a spectatorial stance, investigating how musicians maintain structural unity
through a mutually influencing network of dynamical processes. For example, is visual
information needed in order to achieve coordination among musicians? What kind of
coupling is necessary to keep playing together when the auditory signal is perturbed
externally? In order to answer such questions, we will recruit 6 pairs of string players
divided into 2 groups (by level of expertise) of 3 pairs each. The participants, wearing
noise-cancelling headphones, will first be asked to play 2 pieces individually (taken
from Bartok’s 44 Duos). After this, they will perform together in 3 main conditions,
designed to test how auditory perturbations may shape coordination when partners are
differently ‘present’ to each other. Conditions involve subjects being (A) naturally placed
one in front of the other, (B) able to see each other only partially, and (C) unable to see
each other. For all conditions 2 different playing modalities will be investigated:
(i) without disturbance, and (ii) with brief, unpredictable, audio disturbance (white noise)
in both performers. To investigate interactive behavior, audio and motion-capture data
will be extracted, measured, and compared within and between groups during individual
and joint performance. Features to be examined include zero-crossing rate, amplitude and
duration variables for each movement, and onset synchronicity in the audio signal. We
predict significant differences in interacting behavior related to musical expertise,
condition, and manipulation.
While most rhythm synchronization studies have focused on finger tapping and stimulus
tempo, full-body movements to music can reveal more complex relationships between
rhythmic structure and sensorimotor behavior. This study expands on the role of
low-frequency spectral flux in synchronization in movement responses to music. The amount
of spectral flux in low frequency bands, manifesting in beat-related event density, has
previously been found to influence music-induced movement; here, its influence on full-
body movement synchronization is examined. Thirty participants were recorded using
optical motion capture while moving to excerpts from six Motown songs (105, 115, 130
bpm), three containing low and three containing high spectral flux. Each excerpt was
presented twice with a tempo difference of ±5% bpm. Participants were asked to dance
freely to the stimuli. Synchronization (phase locking) indexes were determined for the
hip, head, hands, and feet, concentrating on vertical acceleration relative to beat-level
periods of the music, and sideways acceleration relative to bar-level periods. On the beat
level, significant differences were found between high and low flux excerpts, with tighter
synchronization of hip and feet to the high flux stimuli. Synchronization was better with
the slowed-down stimuli, and decreased in particular for the low flux sped-up excerpts.
On the bar level, significant differences between high and low flux excerpts were also
found, but with tighter synchronization of head and hands to the low flux stimuli.
Moreover, synchronization ability decreased for the sped-up excerpts. These results
suggest a connection between event density/spectral flux and phase-locking abilities to
music, with an interaction between metrical levels present in the music, spectral flux, and
the part(s) of the body being synchronized.
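Phase locking of a body part to the beat, as indexed above, is typically quantified as the mean resultant length of the circular phase difference between the movement signal and the beat. The sketch below illustrates that standard measure on synthetic signals; it is not the authors' analysis pipeline.

```python
import math

def phase_locking_index(movement_phase, beat_phase):
    """Mean resultant length of the phase difference between a movement
    signal and the musical beat, both in radians.  1.0 means perfect
    phase locking; 0.0 means no consistent phase relation."""
    diffs = [m - b for m, b in zip(movement_phase, beat_phase)]
    c = sum(math.cos(d) for d in diffs) / len(diffs)
    s = sum(math.sin(d) for d in diffs) / len(diffs)
    return math.hypot(c, s)

n = 1000
fs = 100.0                                                 # 100 Hz sampling
beat = [2 * math.pi * 2.0 * (t / fs) for t in range(n)]    # 2 Hz beat phase
locked = [p + 0.4 for p in beat]                           # constant lag
drifting = [2 * math.pi * 2.3 * (t / fs) for t in range(n)]  # wrong tempo

print(round(phase_locking_index(locked, beat), 3))    # → 1.0
print(round(phase_locking_index(drifting, beat), 3))  # → 0.0
```

A movement locked to the beat at any constant lag scores 1; a movement whose phase drifts relative to the beat averages out to roughly 0.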
We report a study using a novel priming method to investigate the impact of motor
‘shaping’ on musicians’ improvised continuations from an ambiguous starting point.
Trained musicians (N=34) played three games with a Disklavier piano. In Game 1
participants were presented with a short extract of music, automatically played by the
piano, which could be interpreted as either a sequence of 2nds or 7ths, and were
instructed to improvise a continuation, based on these starting events. In Game 2 they
were told to immediately repeat whatever the piano played, as quickly as possible. There
were two versions of Game 2 to which participants were randomly assigned in equal
numbers: dyads over the whole range of the keyboard that were all 2nds; and dyads that
were all 7ths. On completion of Game 2, participants played Game 3, which was identical
to Game 1. The dependent variable was the value of the change in the size of the intervals
in participants’ spontaneous improvised continuations in Game 3 by comparison with
Game 1.
The results show an effect of the two different versions of Game 2, with participants
exposed to the 7ths version of Game 2 showing an increase in the mean size of the
intervals in Game 3 by comparison to Game 1; and participants who were exposed to the
2nds version of Game 2 showing a decrease in the size of the intervals in Game 3 by
comparison to Game 1. These results indicate the significant effect of motor priming on
improvised musical performance, on the basis of which we argue for the impact of
embodied factors on the structure and shaping of musical improvisation.
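The dependent variable described above — the change in mean interval size between Games 1 and 3 — reduces to simple arithmetic over pitch sequences. The sketch below uses an invented helper and invented toy continuations; neither is from the study's materials.

```python
def mean_interval_size(pitches):
    """Mean absolute melodic interval, in semitones, of a sequence of
    MIDI pitch numbers."""
    if len(pitches) < 2:
        return 0.0
    jumps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(jumps) / len(jumps)

# Hypothetical continuations: a Game 1 improvisation dominated by 2nds,
# and a Game 3 improvisation using wider leaps after priming with 7ths.
game1 = [60, 62, 61, 63, 64, 62]
game3 = [60, 70, 59, 69, 58, 68]
change = mean_interval_size(game3) - mean_interval_size(game1)
print(change)  # positive → intervals widened after the 7ths priming
```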
Electronic Dance Music (EDM) is a predominant cultural phenomenon in the everyday lives
of many. It is listened to in private settings, it contributed to the establishment of club
dance culture, and it is also performed live on big stages that attract large audiences.
However, up until now, it has rarely been the focus of research within the field of music
psychology. Given that it is strongly rooted in an embodied engagement with music, its
research can be linked naturally to recent theories of embodied music cognition. This
symposium aims to contribute to this by providing several presentations that focus on
different stages of the musical communication process within electronic dance music. The
first two talks will focus on the production of electronic dance music: one will present the
results of a qualitative interview study on live performance of electronic music; another
will present a statistical analysis of historical trends in the production of electronic dance
music. Subsequently, the following presentation will focus on audiences moving to this
music. These studies will employ an embodied paradigm to research the shape of dance
movements in club cultural settings. We believe that this symposium will contribute to
establishing a research focus on this often ignored part of everyday musical culture.
Electronic dance music (EDM) is among the most popular musical genres of today.
However, in stark contrast to practices in other musical styles, electronic music is hardly
ever played fully “live” in a conventional sense. The performers interact with a variety of
electronic devices (e.g. turntables, synthesizers, gestural controllers), often with the use
of pre-recorded material, to transform their respective musical intentions into sound. In
order to investigate the many possible approaches to electronic “live” performance and
the underlying concepts of musical embodiment, 13 qualitative interviews with electronic
live performers – ranging from experimental to relatively popular – were conducted. The
questions were conceived in the framework of “embodied music cognition”, with a
particular focus on the use of the performer's body during the interaction with the
equipment, the visibility and readability of the performer's movements from the
audience's perspective, and the role of the musical equipment as a mediator between
performer and audience. Results initially show a great heterogeneity in the various
approaches in the sample group. On closer inspection, a continuum becomes apparent –
ranging from a very controlled (i.e. focus on reproduction quality), less embodied and
less visible performance style to a very physical, extroverted way of performing with
room for unpredictable events and error. Surprisingly, the former “controlled” style is
found more often in performers with a connection to club culture, whereas the latter
“extroverted” style is found more often in avant-garde contexts with a seated audience.
This seemingly contradicts the idea of a shared bio-mechanic energy level between the
performer and the audience, as can be found in other popular musical genres. These
findings, as well as the general obstacles of conducting research on electronic music, are
discussed.
Several studies have shown a preference for tempi around 120 bpm. If this corresponds to
our natural way of moving, it should also be reflected in the tempi found in dance music.
To study this, we analyzed the tempi of pieces in the Scandinavian Dance Charts, which
give a top 40 of the most popular pieces for each week, including tempo. Analyzing data
over a period of 17 years (33,141 records in total) gives interesting insights into how
tempo evolves and is influenced by trends. It shows periods with a prevalence of slower
or faster tempi, but most remarkable is the predominance of one single tempo. Over the
whole period about 28% of the records have a tempo of 128 bpm, with 62% of the tempi
between 125 and 130. This narrow tempo range is always present, independent of trends
in musical style. It is therefore a strong indication that this range corresponds
to our natural dance movement and can be related to optimal walking tempo. These
results show again a preferred tempo around 120 bpm, but not just ‘around’. It seems that
128 bpm has a special function and that a narrow band around it accounts for a majority
of our common dance moves. Future research can now investigate whether 128 bpm also
has a special role in other rhythmic phenomena and whether this might be a more exact
representation of human natural resonance frequency.
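The two headline figures above — the modal tempo's share of all records and the share of the 125–130 bpm band — are straightforward to compute from a list of chart tempi. A sketch with toy data chosen to mirror the reported proportions (the real 33,141-record dataset is not reproduced here):

```python
from collections import Counter

def tempo_shares(tempi):
    """Return the most common tempo, its share of all records, and the
    share of records in the 125-130 bpm band."""
    counts = Counter(tempi)
    mode_bpm, mode_count = counts.most_common(1)[0]
    band = sum(1 for t in tempi if 125 <= t <= 130)
    n = len(tempi)
    return mode_bpm, mode_count / n, band / n

# Hypothetical chart tempi shaped to match the reported ~28% at 128 bpm
# and 62% between 125 and 130 bpm.
tempi = [128] * 28 + [126] * 20 + [130] * 14 + [100] * 18 + [140] * 20
mode_bpm, mode_share, band_share = tempo_shares(tempi)
print(mode_bpm, mode_share, band_share)  # → 128 0.28 0.62
```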
Electronic dance music (EDM) could be classified as music created to move to. While
research has revealed relationships between movement features and, e.g., musical,
emotional, or personality characteristics, systematic investigations of genre differences
and specifically of EDM are rather rare. This contribution aims at offering insights into
the embodiment of EDM from three different angles: firstly from a genre-comparison
perspective, secondly by comparing different EDM excerpts with each other, and thirdly
by investigating embodiments in one specific EDM excerpt. Sixty participants moved
freely to 16 excerpts of four different genres (EDM, Latin, Funk, Jazz – four
excerpts/genre) while being recorded with optical motion capture. Subsequently, a set of
movement features was extracted from the mocap data, including acceleration of different
body parts, rotation, complexity, or smoothness. In order to compare embodiments of
EDM to the other genres, repeated measures ANOVA analyses were conducted revealing
that participants moved with significantly higher acceleration of the center of the body,
head, and hands, overall higher smoothness, and a more upright position to the EDM
excerpts than to the other excerpts. Between the EDM excerpts, several spectro-timbre
related musical features (brightness, noisiness, spectral centroid, spectral energy (mfcc1),
attack length) showed high correlations with vertical foot speed, acceleration of head and
hands, and upper body complexity. Detailed analysis was conducted on a section
containing four bars of chorus, four bars of break, and four bars of chorus from one of the
EDM excerpts. Participants’ movements differed in several features distinguishing the break
from the surrounding parts, in particular showing lower amounts of foot and hand
acceleration, phase coherence, rotation and hand displacement in the break. These
analyses proposed different ways of studying EDM and could indicate distinctive
characteristics of EDM embodiment.
by the musicians and preferred by the participants. Because of the difficulty of a fast,
technical passage, it is possible that a musician would add less expressive motion to their
playing for fear of distracting themselves from the difficult notes. On the other hand, for a
slower, more lyrical passage, a performer might feel freer to add expressive motions. As
outlined in the above scenario, we predicted that performers would move less during a
faster, more technically challenging passage than while playing a slower, more lyrical
passage. Therefore, we asked the volunteers to bring both a slow lyrical example and a
faster, more technically challenging example, with the following hypothesis: (H2): While
adjusting towards more motion in general, participants will prefer a greater amount of
performer motion for lyrical pieces than for technical pieces.

B. Recording Procedure

Each of the four musicians wore a motion capture suit to which small, spherical markers
were applied in conventional anatomical locations (Fig.1). Some experimentation was
conducted in order to optimize placement of finger markers.

Figure 1. Experimenter in the motion capture suit and markers

Due to their close proximity, fingers are particularly challenging parts of the body to
capture using a motion capture system (in this case VICON Blade). The system often
confuses the markers, mislabelling fingers and joints. There was also the issue of whether
or not wearing the typical motion capture fingerless gloves would interfere with the
musicians’ ability to play well. To avoid using the gloves, and in an attempt to yield a
more refined capture of finger motion, we tried to use make-up tape to attach miniature
markers directly onto their hands and fingers (Fig.2). However, for most of the musicians,
these failed to stay attached during playing. Therefore, after testing to be sure that the
gloves minimally interfered with playing level, the gloves were used for the capture,
including two extra markers taped to the index finger and the minimus (pinky) finger
(Fig.2). Markers were also attached to various points on each instrument to allow for the
possibility of adding instruments to the animations.

C. Method of Animation

The data for each of these captures were cleaned and imported into MotionBuilder
(Catalogue, n.d.) for animation (Fig.3). From the originals, a diminished motion version
and an augmented motion version were created. This was achieved by hand-editing what
was deemed expressive motion.

Figure 2. Examples of marking the musicians’ hands

Expressive motion was operationalized as any additional motion that was not a direct
result of technical movements. For example, the cellist tended to sway their upper body
from side to side and backwards and forwards. These motions were not direct results of
the bowing or fingering motions required to produce notes. Henceforth, this motion is
referred to as an expressive motion. For the augmented condition, these same motions
were amplified. For the diminished condition, an attempt was made to keep the spine as
stationary as possible. A similar process was applied in altering the expressive motions of
each of the performers. The most commonly altered motions included head motions, knee
bends, and full-body swaying.

Figure 3. Motion capture data realized in MotionBuilder

After creating these animated versions, each animated figure was fitted with white
cylinders in a skeletal fashion and placed on a wooden floor with a dark, neutral
background (Fig.4). A skeleton model was chosen after attempts to fit more realistic
models were found to reduce the apparent motions overall, since some body parts were
visually obscured. Additionally, part of what made motion capture an ideal tool for this
study was that it eliminated all other physical influences, such as gender and
attractiveness. A skeletal stick figure animation is devoid of all these other visual aspects.
Additionally, there were no chairs or instruments animated into the scenes. This decision
was made based on the sheer difficulty of attempting to align the instruments with the
live capture data without having the
instruments collide with a limb or the main body. It was especially difficult in the
augmented and diminished animations because the instrument motion data did not alter
naturally when the body motions were altered. The same issues presented themselves
when we attempted to add chairs. Therefore, it was deemed too difficult a task, and the
instruments and chairs were consequently omitted. Participants reported no aversion to
the stick figures or to the absence of instruments or chairs when asked during
post-experiment interviews.

Figure 4. Animated skeletal representation of motion data

D. Interface

The interface was created using Unity (Goldstone, 2009). The three versions of each
capture (augmented, original, and diminished) were combined into one dynamic video
that was adjustable by a slider. From left, to center, to right, the slider continuously
altered the live animation from the original motion at the center to mostly the diminished
version if moved to the far left, and to mostly the augmented version if moved to the far
right. This gave the illusion that participants were able to adjust the amount of expressive
motion in each video clip in real time without manipulating the audio of the performance.

III. METHODOLOGY

A. Participants

There were 16 participants from the School of Music at Ohio State University.
Participants ranged in age between 18-22.

B. Procedure

In order to later test whether musical experience might have an effect on the results, the
first part of the interface directed participants to complete the Ollen Musical
Sophistication Index (OMSI) and record their results (Ollen, 2006). No a priori
hypothesis was made regarding possible effects of musical sophistication. Recall that the
aim of this study is to test whether musicians with more motion are perceived as better
musicians. Therefore, to test this concept with the method of adjustment, we presented
participants with the following scenario via these instructions: “You are going to see
eight different performances. For each performance, we want you to consider the
following scenario. Suppose there is a performer who is preparing an animated video to
submit for a competition. While viewing their video, the performer is thinking that
perhaps there is room to ‘cheat’ a bit by enhancing their performance by manipulating the
movement. With the slider at the bottom of each video clip, you can manipulate the
overall amount of movement in the performance (left diminishes the motion while right
augments the motion). For each animated clip, we want you to tune the slider to give
what you think is the most musically superior performance.”

IV. RESULTS

Recall that our first hypothesis predicted that participants would adjust the overall
amount of performer motion towards the augmented version. Given that the slider moved
from diminished (0) to original (0.5) to augmented (1), in order to reject the null
hypothesis, the mean must be significantly higher than 0.5. Consistent with our
hypothesis, a single-sample t-test showed that participants chose, on average, to move the
slider towards the augmented version of each recording (M = 0.634, SD = 0.218,
t(127) = 6.944, p < .0001).

Our second hypothesis predicted that participants would prefer more motion for lyrical
passages than for technical passages. In fact, using an independent means t-test, we found
no difference between the two conditions (t(126) = -0.615, p = 0.54).

Additionally, we were interested in whether the degree of musical sophistication of the
participants affected the results. Using the OMSI index, participants were categorized
into two groups: more musically sophisticated (>500) and less musically sophisticated.
An independent means t-test demonstrated that there was no statistically significant
difference between the two groups of participants (t(126) = -1.603, p = 0.112).

Finally, a one-way ANOVA determined that there was a significant difference between
the average amounts of motion preferred for each of the instruments (F(3, 124) = 5.04,
p = .002). A Tukey post-hoc test revealed that the average selected amount of motion for
cello (M = 0.728, SD = 0.109) was significantly higher than the average selected for
clarinet (M = 0.556, SD = 0.18, p = .007) or for violin (M = 0.573, SD = 0.246, p = .019).
However, it was not significantly higher than flute (M = 0.679, SD = 0.263, p = .782).

V. DISCUSSION

The results were consistent with our first hypothesis in that when participants were
instructed to adjust the overall amount of performer motion to enhance the animated
musical performances, they tended to augment the overall amount of motion. These
results corroborate findings by Morrison et al. (2009) and Juchniewicz (2008). Our
second hypothesis predicted that participants would select a greater overall amount of
performer motion for lyrical passages compared with technical passages. However, this
prediction was not borne out in the data.
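The first analysis reported above is a one-sample t-test of mean slider position against the midpoint 0.5. The sketch below shows the shape of that test on invented slider values (not the study's data; the study reports M = 0.634, SD = 0.218, t(127) = 6.944 over 128 ratings):

```python
import math

def one_sample_t(values, mu0):
    """One-sample t statistic testing whether the mean of `values`
    differs from mu0 (here, the original-motion slider midpoint 0.5)."""
    n = len(values)
    mean = sum(values) / n
    # sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    t = (mean - mu0) / (s / math.sqrt(n))
    return mean, s, t

# Hypothetical slider settings from a handful of trials.
sliders = [0.7, 0.55, 0.8, 0.6, 0.45, 0.75, 0.65, 0.5]
mean, s, t = one_sample_t(sliders, mu0=0.5)
print(round(mean, 3), round(t, 3))  # t > 0 → slider pushed toward augmented
```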
In comparing the mean adjustments for each of the instruments, it was notable that the
motion of the cellist was augmented significantly more than that of the clarinet and the
violin. There was no statistical difference in comparison to the adjustments for flute. It is
important to acknowledge that the expressive motions of each of the instruments differ
from each other greatly. Therefore, it is hard to draw any specific conclusions from these
resulting differences in adjustments.

Finally, the wording of the instructions may have been a possible confound in this
experiment. Our participants were instructed to “enhance” the performance.
Unfortunately, the word “enhance” could imply meanings such as “amplify” or
“increase”. These meanings could have led subjects to believe that the task was to
increase the motion of the performance rather than make adjustments to improve the
performance, thereby biasing the data.

VI. FUTURE RESEARCH

It would be appropriate to repeat this experiment with a change of instructions,
particularly avoiding the word “enhancing”. Additionally, it would be of interest to
explore further the physical visual elements of performance. Motion capture technology
in particular provides a useful way to study performer motion without confounding
factors such as attractiveness, gender, race, etc. Additionally, animating the motion data
allows for manipulation of visual elements without altering the original audio recording
of a performance.

Upon closer inspection of how the sliders manipulated the performer motions, there were
sometimes marked changes in posture and head placement. For example, the cellist, when
augmented, also adopted a more expansive, open posture and a raised head position.
Research on posture and eye contact indicates that open postures and good eye contact
communicate friendliness, trustworthiness, and greater emotionality (Burgoon, Birk, &
Pfau, 1990; Sundaram & Webster, 2000). It would be of interest to further examine how
augmented and diminished motions altered the overall physical vocabulary of each
performer to understand what individual components of performer motion might be
contributing to these results.

ACKNOWLEDGMENT

Our thanks to the Center for Cognitive and Brain Sciences at Ohio State University for
funding this research through a summer research grant. Additional thanks to our
performers (Evan Lynch, Mark Rudoff, Snow Shen, Ann Stimson), our motion capture
operators (Vita Berezina-Blackburn and Lakshika Udakandage), our Unity app developer
(John Luna), and the head of the Advanced Computing Center for the Arts and Design,
Maria Palazzi. Finally, we would like to thank members of the Cognitive and Systematic
Musicology Laboratory at Ohio State University for critical feedback and support.

REFERENCES

Berry, M. (2009). The Importance of Bodily Gesture in Sofia Gubaidulina’s Music for
Low Strings. Music Theory Online, 15.
Burgoon, J. K., Birk, T., & Pfau, M. (1990). Nonverbal behaviors, persuasion, and
credibility. Human Communication Research, 17(1), 140-169.
Catalogue, S. (n.d.). MotionBuilder: Making Motion Control Programming Easy.
Fisher, G., & Lochhead, J. (2002). Analyzing from the Body. Theory and Practice,
37-67.
Goldstone, W. (2009). Unity game development essentials. Packt Publishing Ltd.
Gritten, A., & King, E. (Eds.). (2006). Music and gesture. Ashgate Publishing, Ltd.
Hatten, R. S. (2004). Interpreting musical gestures, topics, and tropes: Mozart,
Beethoven, Schubert. Indiana University Press.
Juchniewicz, J. (2008). The influence of physical movement on the perception of musical
performance. Psychology of Music.
Larson, S. (2012). Musical Forces: Motion, Metaphor, and Meaning in Music. Indiana
University Press.
McCreless, P. (2006). Anatomy of a gesture: From Davidovsky to Chopin and back.
Approaches to Meaning in Music, 11-40.
Montague, E. (2012). Instrumental Gesture in Chopin's Étude in A-flat Major, Op. 25,
No. 1. Music Theory Online, 18(4).
Morrison, S. J., Price, H. E., Geiger, C. G., & Cornacchio, R. A. (2009). The effect of
conductor expressivity on ensemble performance evaluation. Journal of Research in
Music Education, 57(1), 37-49.
Ollen, J. E. (2006). A criterion-related validity test of selected indicators of musical
sophistication using expert ratings (Doctoral dissertation, The Ohio State University).
Schutz, M., & Lipscomb, S. (2007). Hearing gestures, seeing music: Vision influences
perceived tone duration. Perception, 36(6), 888.
Sundaram, D. S., & Webster, C. (2000). The role of nonverbal communication in service
encounters. Journal of Services Marketing, 14(5), 378-391.
Wapnick, J., Darrow, A. A., Kovacs, J., & Dalrymple, L. (1997). Effects of physical
attractiveness on evaluation of vocal performance. Journal of Research in Music
Education, 45(3), 470-479.
Wapnick, J., Mazza, J. K., & Darrow, A. A. (1998). Effects of performer attractiveness,
stage behavior, and dress on violin performance evaluation. Journal of Research in
Music Education, 46(4), 510-521.
ABSTRACT
This work seeks to identify the relationship between pianists’ use of motion cues (quantity, velocity, regularity) that convey expressive intentions and auditors’ evaluation of these cues, as well as between structural parameters and motion cues. Five pianists played an excerpt from a well-learned piece of their Romantic repertoire in four expressive conditions (normal, exaggerated, deadpan, immobile). A motion capture system tracked pianists’ movements and a force plate measured the weight distribution. Twenty auditors rated the performances according to motion cues and to the movements of body parts (head, arms, hands, torso, hips). Quantity of movement of the head and hips, mean velocity of the head, and force applied on the piano stool were overall higher in the exaggerated performances than in other conditions. Head quantity of motion yields significant differences between conditions for every pianist. When comparing expressive conditions, quantity of motion was rated as the most important cue for auditors. Arm and head movements received the greatest attention, while hip movements obtained the lowest attention. Velocity and quantity of motion were associated to phrase boundaries, modulations, sound dynamics, and articulation, while regularity of motion was related to similarity in rhythmic patterns. This work led to the identification of pianists’ motion parameters that could be associated to expressive intentions and structural features, and that were perceived by auditors.

I. INTRODUCTION
The development of robust methods to analyze pianists’ gestural movements requires one to understand the correlation between pianists’ use of motion cues that convey expression and the structural parameters of music. Evaluating how auditors perceive these motion cues in musical performance could also better orient the analysis toward specific kinematic features. Acoustic parameters, such as timing, phrasing, and dynamics, how they influence structural communication in performance, and the way auditors relate these parameters to expressive intentions have been extensively studied (e.g., Clarke, 1989; Gabrielsson & Juslin, 1996; Gagnon & Peretz, 2003; Sloboda, 1985). Gagnon and Peretz (2003) found that fast tempi were related to expressions of excitement, surprise, and potency, while slow tempi were associated with calmness, boredom, and sadness. Moreover, pianists appear to modify articulation and note length in relation to the different metrical positions of the note (Sloboda, 1985).
The musical structure has also been discussed in relation to movements. According to Delalande (1988), Gould’s left hand movements in Bach performances suggest a precise analysis of the score. He would emphasize syncopes and anticipate phrases by a preparation with a vibrato of the hand, a support on the up-beat (lower point of the hand’s trajectory), and finally a release on the strong beat (rise with the hand).
Studies were also conducted under different performance conditions (i.e., immobile, deadpan, standard, exaggerated) to understand the correlation between levels of expression, body movements, and musical structure (e.g., Davidson, 1994; Thompson & Luck, 2012; Wanderley, Vines, Middleton, McKay, & Hatch, 2005). Looking at piano performances, Thompson and Luck (2012) demonstrated that quantity of motion was linked to articulation, dynamic markings, and the piece’s climax. Velocity of head movement was positively correlated with key velocity (Camurri, Mazzarino, Ricchetti, Timmers, & Volpe, 2004). Short phrases in Chopin’s preludes presenting the same rhythmical pattern at different moments in the score entailed similar movements by pianists (MacRitchie, Buck & Bailey, 2009). In a study conducted on clarinetists’ gestures, Wanderley and colleagues (2005) revealed a discrepancy in timing between the conditions: the immobile condition was performed faster than standard and exaggerated performances, suggesting that motion is associated to the rhythmic structure of phrases. They also showed that performers use less quantity of motion during very active technical passages, and exaggerate their motions during easier ones. Bell motion in clarinet playing also reinforces idiomatic acoustic events at phrase boundaries and at places with harmonic tension (Teixeira, Loureiro, Wanderley, & Yehia, 2014).
Nusseck and Wanderley (2009) have indicated that certain movements are not essential to appreciate a musical performance. Modifying specific kinematic parameters of clarinet performance (e.g., removing movements from torso or arms and hands) does not necessarily affect its expressiveness and understanding, and results in nearly no effect on observers’ judgments of tension, intensity, and fluency of movements. Moreover, every player performed the piece with personal movement velocities, and the total amount of motion differed for the individual body parts (e.g., one may use larger arm motions and another more body sway).
The performance conditions were also used to assess auditors’ visual perception of different levels of expression. Auditors were better at identifying performers’ expressive intentions with visual information than with sound only (Davidson, 1994). The quantity and velocity of head and torso movement sufficed to discriminate between performance conditions (Davidson, 1994), and between emotions (Dahl & Friberg, 2007). The hip swaying motion towards and away from the keyboard was always present in the different performance conditions but the relationship with the musical structure was not clear (Davidson, 2007).
In the present study we propose a new approach to extract kinematic features of piano performances of pieces from the
ISBN 1-876346-65-5 © ICMPC14 424 ICMPC14, July 5–9, 2016, San Francisco, USA
deadpan conditions for pianist 2, which indicates that this pianist used a wider range of speeds and larger directional changes between the conditions. Figure 4 presents the head velocity pattern for pianist 2 during normal and exaggerated conditions. The velocity patterns seem to follow the section boundaries identified in the score. Indeed, changes occur at the beginning and ending of new sections, and change rapidly at places with modulations. Toward the end of the excerpt, a French sixth chord, successive accentuated chords in a high dynamic level, and a final rallentando are heard.
of the head, mainly in the exaggerated and normal conditions. The more the level of expression increases, the further away from the keyboard the head goes. In addition, changes in velocity are related to high dynamic level, large chords in the low register, and a staccato articulation (e.g., between 15 and 17 s).
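As an illustration of how such a head-velocity profile can be derived from raw motion-capture positions, here is a minimal numpy sketch. It is not the authors' pipeline; the 100 Hz sampling rate, the (N, 3) marker format, and the moving-average smoothing window are all illustrative assumptions.

```python
import numpy as np

def marker_speed(positions, fs=100.0, smooth_win=25):
    """Speed profile (mm/s) of one marker from an (N, 3) position array.

    positions : (N, 3) array of x, y, z coordinates in mm (assumed format)
    fs        : capture sampling rate in Hz (assumed; real systems vary)
    smooth_win: moving-average window in samples to suppress marker jitter
    """
    # Finite-difference velocity per axis, then Euclidean norm per frame
    vel = np.gradient(positions, 1.0 / fs, axis=0)
    speed = np.linalg.norm(vel, axis=1)
    # Simple moving-average smoothing
    kernel = np.ones(smooth_win) / smooth_win
    return np.convolve(speed, kernel, mode="same")

# Example: a marker moving at a constant 10 mm per frame along x at 100 Hz
pos = np.zeros((200, 3))
pos[:, 0] = np.arange(200) * 10.0
speed = marker_speed(pos)
```

Section boundaries would then show up as peaks or rapid changes in this profile, which can be compared against the score annotations.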
showed a significant difference in the quantity of movement between the performing conditions in the movements of the wrists and hands.
Studies have pointed out a few relationships between the phrasing structure, melodic line, and performers’ gestures (Camurri et al., 2004; MacRitchie et al., 2009; Teixeira et al., 2014; Thompson & Luck, 2012; Wanderley et al., 2005). This exploratory work considered a large-scale analysis (harmony, phrase structure, melody (motive), sound dynamic, and articulation), and investigated various piano pieces with different levels of complexity. Similarly to Thompson and Luck’s results, quantity of motion was mainly associated to phrase boundaries, motives, modulations, and repetitions. Velocity of motion was related to staccato articulation, sound dynamics, and chords. Periodicity of the head motion was found in similar rhythmic patterns. Force applied on the piano stool was associated to section boundaries, and before accentuated chords. The recurrence of these motion cues in different musical contexts reinforces the theory that certain gestures effectively support musical structural parameters.
The diversity of repertoire helped us draw various relationships between the compositional structure and gestures. Future work is needed to characterize the influence of these motion cues on auditors regarding the musical structure. A better understanding of the impact of individual motion cues on the appreciation and comprehension of the musical structure could further expand the scope of use and validity of this approach. The method would need to be reiterated by having each pianist play the same Romantic excerpts in order to evaluate the reproducibility of gestures in different performances and different piano schools.
This work may also contribute to the development and design of new pedagogical technologies to help students acquire movement awareness and increase their musical communicative abilities.

ACKNOWLEDGMENT
We gratefully acknowledge financial support from the Social Sciences and Humanities Research Council of Canada. We would also like to thank all the performers and auditors for their participation and collaboration.

REFERENCES
Camurri, A., Mazzarino, B., Ricchetti, M., Timmers, R., & Volpe, G. (2003). Multimodal analysis of expressive gesture in music and dance performances. In A. Camurri & G. Volpe (Eds.), Gesture-based Communication in Human-Computer Interaction (pp. 20-39). Berlin: Springer Verlag.
Clarke, E. F. (1987). Levels of structure in the organization of musical time. Contemporary Music Review, 2(1), 211-238.
Dahl, S., & Friberg, A. (2007). Visual perception of expressiveness in musicians' body movements. Music Perception: An Interdisciplinary Journal, 24(5), 433-454.
Davidson, J. W. (1994). What type of information is conveyed in the body movements of solo musician performers. Journal of Human Movement Studies, 6, 279-301.
Davidson, J. W. (2007). Qualitative insights into the use of expressive body movement in solo piano performance: A case study approach. Psychology of Music, 35(3), 381-401.
Delalande, F. (1988). La gestique de Gould: éléments pour une sémiologie du geste musical. In G. Guertin (Ed.), Glenn Gould, Pluriel (pp. 83-111). Montréal: Louise Courteau éditrice.
Gabrielsson, A., & Juslin, P. N. (1996). Emotional expression in music performance: Between the performer's intention and the listener's experience. Psychology of Music, 24(1), 68-91.
Gagnon, L., & Peretz, I. (2003). Mode and tempo relative contributions to “happy-sad” judgements in equitone melodies. Cognition & Emotion, 17(1), 25-40.
MacRitchie, J., Buck, B., & Bailey, N. J. (2009). Visualising Musical Structure Through Performance Gesture. In Proceedings of the 10th International Society for Music Information Retrieval Conference (pp. 237-242). Kobe, Japan.
Nusseck, M., & Wanderley, M. M. (2009). Music and Motion—How Music-Related Ancillary Body Movements Contribute to the Experience of Music. Music Perception: An Interdisciplinary Journal, 26(4), 335-353.
Teixeira, E. C., Loureiro, M. A., Wanderley, M. M., & Yehia, H. C. (2015). Motion analysis of clarinet performers. Journal of New Music Research, 44(2), 97-111.
Thompson, M. R., & Luck, G. (2012). Exploring relationships between pianists’ body movements, their expressive intentions, and structural elements of the music. Musicae Scientiae, 16(1), 19-40.
Verron, C. (2005). Traitement et visualisation de données gestuelles captées par Optotrak [Processing and visualizing gesture data captured by Optotrak]. Unpublished manuscript, Input Devices and Music Interaction Laboratory, McGill University, Montréal, Canada.
Wanderley, M. M., Vines, B. W., Middleton, N., McKay, C., & Hatch, W. (2005). The musical significance of clarinetists' ancillary gestures: An exploration of the field. Journal of New Music Research, 34(1), 97-113.
The primary objective of this study is to investigate the role of action/sound congruence
in observer evaluations of conductor efficacy when domain-specific temporal distance
has been experimentally manipulated.
We collected video from five conductors and extracted two excerpts per conductor (fast,
slow). These were manipulated using video editing software to create stimuli
encompassing intact, audio-lead, and video-lead conditions of ±15% and ±30% of
performed tempo. Stimuli were ordered so that no two conductors or conditions (a/v
offset or fast/slow excerpt) were seen consecutively. For each excerpt participants
(n=110) evaluated the conductor, ensemble, and overall performance.
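The abstract does not spell out how "±15% and ±30% of performed tempo" maps onto a concrete audio/video offset; one plausible reading, a signed fraction of the beat duration at the performed tempo, can be sketched as follows (the sign convention for audio-lead vs. video-lead is an assumption):

```python
def av_offset_ms(tempo_bpm, fraction):
    """Offset in ms corresponding to a fraction of one beat at a given tempo.

    tempo_bpm: performed tempo in beats per minute
    fraction : signed fraction of a beat, e.g. +0.15 for a 15% audio-lead
               or -0.30 for a 30% video-lead (sign convention is hypothetical)
    """
    beat_ms = 60_000.0 / tempo_bpm  # duration of one beat in milliseconds
    return fraction * beat_ms

# At 120 BPM one beat lasts 500 ms, so a 15% offset is 75 ms
offset = av_offset_ms(120, 0.15)  # 75.0
```

Tying the offset to the beat period rather than to a fixed millisecond value keeps the manipulation comparable across the fast and slow excerpts.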
These results suggest that gesture’s temporal distance from its sonic correlate plays a
detrimental role in observer evaluations of conductor efficacy. While lower ratings of
audio-lead offset conditions lend support to a predictive or delineative function of
movement in music performance, the presence of similar, negative ratings in video-lead
conditions tempers any universality that might be attributed to the aforementioned
function. These results support existing research on the multimodal influence of gesture
on perception of sound, but highlight an underexamined aspect of action/sound
congruence and its influence on aesthetic evaluations.
Introduction
What can we learn about someone’s experience of music by tracking their breathing?
Potentially a great deal. At any moment, how we are breathing is determined by what we
are doing, what we are thinking, how we are feeling, and who we are. Beyond changes in
respiration rate a few studies have found instances of listeners' breathing aligning to
particular moments of musical stimuli. Measuring when and how participants show
significant respiratory adaptation may open new avenues to capturing important elements
of their responses to music as it plays.
Aims
Respiratory alignment to music can but does not always occur. This research presents a
method of evaluating disruptions in a passive listener’s breathing which may support
adaptation to the music heard, and of quantifying the likelihood of stimulus influence in
those cases where moments of alignment are identified from other listenings.
Method
Alignment between multiple respiratory sequences to a given musical stimulus, or the
absence thereof, is an indicator of whether this music encourages alignment. From
continuous chest expansion measures taken from individual participants across many
listenings of a variety of pieces, the respiratory sequences are classified as showing none, some, or
strong indications of adaptation. Disruptions and changes in breathing patterns are
quantified and statistics on their qualities and occurrences are compared between these
classes.
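One simple way to quantify the alignment described in this method is a mean pairwise correlation across listenings of the same stimulus. The numpy sketch below is a hypothetical illustration, not the authors' procedure; the resampling of traces to a common grid and any thresholds separating "none/some/strong" are analysis choices not given in the abstract.

```python
import numpy as np

def alignment_score(breath_traces):
    """Mean pairwise Pearson correlation across chest-expansion traces.

    breath_traces: (k, N) array holding k listenings of the same stimulus,
    already resampled to a common length N (preprocessing is assumed).
    Returns a scalar in [-1, 1]; higher values suggest stimulus-driven
    respiratory alignment.
    """
    r = np.corrcoef(breath_traces)      # k x k correlation matrix
    iu = np.triu_indices_from(r, k=1)   # upper triangle, excluding diagonal
    return float(r[iu].mean())

# Example: two identical sinusoidal traces plus one unrelated noisy trace
t = np.linspace(0, 60, 600)
traces = np.vstack([np.sin(t),
                    np.sin(t),
                    np.random.default_rng(0).normal(size=600)])
score = alignment_score(traces)
```

Comparing such scores against scores from shuffled or surrogate traces would give the "likelihood of stimulus influence" the aims describe.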
Results
In the respiratory sequences measured during a set of repeated response case studies,
twelve listening sessions to eleven stimuli from four participants with distinct musical
histories, preliminary results show evidence of adaptation. These data show promising
variation in each participant’s sensitivity to specific stimuli, ranging from a Beethoven
string quartet to Maria Carey, and even between single listenings. Possible correlations
with ratings of liking, focus, and familiarity will be discussed in more detail.
Background
Upright postural control is pivotal to the integrity of dance movement. Most techniques to
assess postural stability require expensive instrumentation, not readily accessible to dance
schools, especially in countries such as India. This is a particularly striking gap, given the
rich heritage of Indian classical dance.
Aims
We sought to evaluate the simple Star Excursion Balance Test (SEBT) as a tool for
assessment of postural stability in the context of the Araimandi (half-sitting posture), a
primary and fundamental position in the Indian classical dance form, Bharatanatyam.
Methods
Twenty-two female volunteers (one highly trained and experienced dancer as index
subject and 21 dancers-in-training) underwent the SEBT procedure and a digital still
motion capture recording during the Araimandi stance. SEBT scores and joint angles
were computed using MATLAB™ software. In the index subject alone, postural stability in a
standardized optimal Araimandi position was also assessed using a laboratory force
platform. Joint angle data from each volunteer were compared against those from the
index subject force plate analysis, to define degree of alignment or deviation from the
index subject. We evaluated the relationship between joint angle data and SEBT scores.
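Joint angles of the kind computed here are conventionally derived from three marker positions around the joint. The following generic sketch shows that computation; it is not the authors' MATLAB code, and the marker names in the example are hypothetical.

```python
import numpy as np

def joint_angle_deg(a, b, c):
    """Angle at joint b (degrees) formed by markers a-b-c.

    a, b, c: 3-element coordinate arrays, e.g. hip, knee, and ankle markers
    for a knee-flexion angle. Standard three-marker angle computation.
    """
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against tiny floating-point overshoot outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

# A right angle: the two segments lie along perpendicular axes
angle = joint_angle_deg([1, 0, 0], [0, 0, 0], [0, 1, 0])  # 90.0
```

Deviation of each trainee's angles from the index subject's reference posture can then be expressed as simple angle differences.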
Results
Subject characteristics (mean, range): age: 20 yrs. (7-42), BMI: 20 kg/m2 (14-29); 4
subjects had no additional physical activity. Preliminary data analysis reveals strong
differences between the expert and novice performers.
Conclusion
To our knowledge, this is the first experiment to attempt to develop a simple method to
quantitatively characterize postural stability in the context of the Indian classical dance
form, Bharatanatyam. If the SEBT score is indeed a viable tool for assessment of
postural stability in such a setting, it could be readily applied in dance schools, and
perhaps other settings such as rehabilitation centers in developing nations with limited
resources.
Educational children’s media relies heavily upon musical sequences to convey prosocial
messages about desirable behaviors, but little is known about the conditions under which
children’s social behavior is most likely to be influenced by engaging with music through
movement or singing. Recent research has shown that engaging in joint music making
can result in increased cooperative behavior among preschoolers (Kirschner &
Tomasello, 2010) and that moving to a musical beat can foster increased prosocial
behavior in infants as young as 14 months (Cirelli, Einarson, & Trainor, 2014). No
studies to date have examined the impact of prosocial lyrics on children’s behavior. The
current study represents a starting point for examining the unique contribution of lyrics
and interpersonal synchrony to children’s behavior in the context of music making. We
investigated 4 and 5-year-olds’ sharing behaviors following one of four experiences:
engaging in joint singing and movement to a novel song with either prosocial or neutral
lyrics, and engaging in joint non-musical play involving a novel rhyme with either
prosocial or neutral content. The four conditions utilize identical props and matched
interaction with the experimenter and a research assistant. Children’s sharing behavior
was coded, and a t test on the first two conditions collected (music/prosocial lyrics and
play/neutral lyrics) shows a trend toward significance such that children in the
music/prosocial condition share more readily than those in the play/neutral condition
(t(18)=-1.907, p=.07). Changes in children’s affective state (indicated by each participant
on a pictorial Likert scale) do not account for the difference between the two groups,
suggesting that the difference is not mediated by affect. Data collection and analysis are
underway for the other two conditions, which will enable us to begin to examine the
unique contribution of prosocial lyrics and interpersonal synchrony in facilitating
prosocial behavior.
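For illustration, the reported statistic has the standard pooled-variance form of an independent-samples t test (df = n1 + n2 − 2, hence t(18) for two groups of ten). The sketch below shows that form on placeholder numbers, not the study's sharing scores.

```python
import numpy as np

def independent_t(x, y):
    """Student's t statistic for two independent samples (pooled variance).

    Degrees of freedom are n1 + n2 - 2, matching the t(18) reported for
    two groups. The inputs here are illustrative, not the study's data.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    # Pooled variance across the two groups
    sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    return float((x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2)))

# Example with fabricated sharing counts for two small groups
t_stat = independent_t([2, 3, 3, 4, 5], [1, 2, 2, 3, 3])
```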
procedure of the experiment and informed consent was obtained.
Our experiment took place in the Bartos Theatre, a lecture hall with a seating capacity of 190. We chose this particular room to create a performance-like atmosphere without external sources of noise, encouraging the participant to perform the test with a high level of concentration. The experimental setup consisted of a screen showing videos of conducting gestures, headphones for audio feedback, and a touch sensor (Force Sensing Resistor, FSR) detecting physical pressure. The touch sensor was custom-built, using a single-zone FSR with an accuracy (force repeatability) of ±2% and a force sensitivity range of ∼0.2 N–20 N, measuring touch events in millivolts. The collected data was sent through an Arduino UNO via serial communication to Max/MSP, where it was time stamped and saved into a csv file.
In order to measure current stress levels, the participants were asked to wear two electrodes to record electrodermal activity (EDA) and an optical sensor for blood volume pulse (BVP) measurement on the fingertips of the non-dominant hand. Additionally, a chest belt was used to record respiration frequency and depth. All sensors were connected to a FlexComp Infiniti Decoder, which sampled the received signals at 2048 samples/second, digitized, encoded, and transmitted the sampled data via the computer's USB port to the BioGraph Infiniti Developer Tools. The software was used to merge and timestamp the received data, as well as to create one single file per task to be saved as a csv file. Max/MSP was used to start the conducting videos synchronized with the data collection from the touch sensor and to time stamp and ultimately save the collected data.
The within-participants experiment with two randomized sets included 19 healthy participants of both genders (12 male / 7 female) aged between 19 and 32 (x̅ = 26.5). 14 participants were musicians/singers who had already or currently played or sung with a conductor; 5 were non-musicians who had never worked with a conductor.
After mounting the sensors as described above, the individual baseline of the participant was measured during a 3-minute video containing relaxing music and nature photographs. Subsequently the participant was instructed to follow videos of varying conducting gestures by playing a sound elicited by pressing the touch sensor. Every tapping event induced audio feedback, the length and force of the participant's pressure mirroring the length and volume of the evoked tone respectively. The on- and offset time of each tapping event and the amount of pressure on the sensor were recorded. The collected sensor data was synchronized with the recorded data of the shown conducting gestures to allow further investigations.
Based on findings on intuitive reactions to conducting gestures (Platte et al. 2016), we decided to use concave and convex trajectories and different sizes of the convex conducting pattern to design a procedure of both coherent and incoherent tasks, subdivided into two categories:

Category 1 – Coherent task gesture combination
The first category of tasks included coherent tasks and gestures (video shows / verbal instruction):
1a) Concave gesture / ‘Play short notes’
1b) Convex gesture / ‘Play long notes’
1c) Big gesture / ‘Play forte’
1d) Small gesture / ‘Play piano’.

Category 2 – Incoherent task gesture combination
For the second category of tasks we used contrasting instructions and gestures:
2a) Convex gesture / ‘Play short notes’
2b) Concave gesture / ‘Play long notes’
2c) Small gesture / ‘Play forte’
2d) Big gesture / ‘Play piano’.

We executed every task twice in a randomized sequence to prevent order effects, and we closed the experiment with a short questionnaire, asking for background information on age, gender, profession, and musical education/experience.

Figure 1. Concave (left), convex (right) conducting patterns with red beat points.

III. RESULTS
Measurements of EDA and breathing did not show statistically significant differences in tasks with predicted higher stress levels. The lack of response in EDA is probably due to the relatively short intervals of the different tasks; longer single tasks with each gesture-task combination could bring more consistent results. Concerning breathing amplitude, only non-significant trends could be found as discrimination from low-stress to high-stress tasks. However, breathing of other musicians (wind players), and especially singers, could be influenced by stress, as it plays a more “active” part in tone production for them than in our one-finger study.
While the heart rate (HR) showed only little difference, changes of the heart rate variability (HRV) revealed a strong correlation with higher stress levels. Research has shown a highly significant correlation between short-term HRV analysis and acute mental stress (Tremper, 1989; Sinex, 1999; Castaldo et al., 2015). Using the collected blood volume pulse (BVP) data, we calculated low-frequency (LF | 0.04–0.15 Hz) and high-frequency (HF | 0.15–0.4 Hz) powers and used the power ratio LF/HF as an indicator for mental stress. As HF spectrum changes are associated with changes in mainly vagal-cardiac modulation, and changes in LF power with both vagal- and sympathetic-cardiac modulation, a higher LF/HF power ratio indicates a higher level of mental stress. However, only 16 participants showed usably noise-free BVP measurements – data from 3 participants were insufficient and could not be used for calculations of HRV and LF/HF power ratio.
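The LF/HF computation used as a stress indicator can be sketched in a few lines of numpy. This is an illustrative simplification: peak detection on the raw BVP is assumed to have already produced inter-beat intervals, and the 4 Hz tachogram resampling and plain periodogram are common choices the paper does not specify.

```python
import numpy as np

def lf_hf_ratio(ibi_s, fs_interp=4.0):
    """LF/HF band-power ratio from a series of inter-beat intervals (s).

    LF = 0.04-0.15 Hz and HF = 0.15-0.4 Hz, as in the text. ibi_s is a
    1-D array of inter-beat intervals derived from BVP peaks (assumed
    preprocessing); fs_interp is the resampling rate for the evenly
    spaced tachogram (4 Hz is a common but here assumed choice).
    """
    t = np.cumsum(ibi_s)                          # beat times in seconds
    grid = np.arange(t[0], t[-1], 1.0 / fs_interp)
    tach = np.interp(grid, t, ibi_s)              # evenly sampled tachogram
    tach = tach - tach.mean()                     # remove DC before the FFT
    pxx = np.abs(np.fft.rfft(tach)) ** 2          # unnormalized periodogram
    f = np.fft.rfftfreq(len(tach), d=1.0 / fs_interp)
    lf = pxx[(f >= 0.04) & (f < 0.15)].sum()
    hf = pxx[(f >= 0.15) & (f < 0.40)].sum()
    return float(lf / hf)

# Example: IBIs modulated at 0.1 Hz (an LF rhythm) give a ratio above 1
beats = np.cumsum(np.full(600, 0.8))
ratio = lf_hf_ratio(0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * beats))
```

Since the ratio is unitless, it can be compared directly between the low-stress and high-stress task conditions, as in Figures 2 and 3.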
A. Dynamics
Within the musical parameter of dynamics, the instruction “play forte” revealed the clearest differences in HRV LF/HF power ratio (p = 0.0003). Almost all participants (15 out of 16) showed higher values of LF/HF power in the cases of discrepant task and shown gesture. However, the amount of evoked stress differs significantly between the groups of musicians and non-musicians (see Figure 2). This is reflected in a higher significance (p = 0.003) for musicians, with a mean low-stress value of 1.2 and a ratio of 3.2 for the high-stress condition. Non-musicians showed a similar power ratio (1.1) in cases of low stress, with a less increased high-stress value (2.0) compared to the group of musicians, leading to a higher but still significant p-value of 0.01. Interestingly, in the case of high stress, the standard deviation of the group of musicians (2.9) is much higher than it is for the group of non-musicians (1.8), indicating that some musicians seem to be more accustomed to this type of stress than others.

B. Articulation
The combination of the instruction “play staccato” while following a convex or a concave conducting gesture showed the most consistent stress reactions (p = 0.0003). All 16 participants show higher LF/HF power ratios under the stress-inducing condition. As for all other tasks, musicians again show a higher significance in the consistency of their stress reaction. As opposed to the tasks of contrasting dynamics, the highest stress reaction in the “play staccato” task is found in the group of non-musicians with a value of 3.2, compared to 2.5 in musicians (see Figure 3). The standard deviation in the case of the low-stress task is very low at 0.6 for both groups, while it is much higher for non-musicians (1.9) compared to 1.2 in musicians for the high-stress condition.
between instruction and gesture (see Figure 4). Although both tasks should theoretically elicit the same length of pressure, the participants seemed not to be able (despite instruction) to hold the note as long with the concave gesture (654.3 ms) as with the convex gesture (706.7 ms). As seen in Figure 5, a similar reaction can be found in the “play piano” task, evoking a significantly higher pressure (p = 0.03) when accompanied by a big gesture. Neither the “play staccato” nor the “play forte” instruction elicits significant alterations in the length or amount of pressure, respectively, when combined with an incoherent gesture.

V. CONCLUSION
With this study we demonstrated that a mismatch between the shown conducting gesture and the concurrent verbal demand for a specific sound envelope shape has a negative impact on both the musician's stress level and the resulting sound itself. Thus, the key to a healthy conducting and rehearsing environment seems to involve supporting consistency of the conductor's imaginative musical goal, his or her shown gesture, and the verbal demand.
Recent studies show that one third of absences at work are due to mental stress and anxiety (Hope, 2013), which in a chronic state is known to lead to serious somatic diseases, such as hypertension, coronary artery disease, heart attack, etc. (Boonnithi & Phongsuphap, 2011). Hence, proving a direct influence of incoherent gesture-task combinations on the musicians' stress level – and taking it into practical account – could have a major impact on the diagnosis, the treatment and – not least – the prevention of some occupational illnesses among professional musicians. Likewise, a change of rehearsal methods towards more coherence of verbally predetermined musical tasks, conducting gestures, and aimed interpretation could improve not only the daily working environment of the 193,300 professional musicians in the U.S. (U.S. Bureau of Labor Statistics, 2015), but also lead to a more joyful and healthy experience for innumerable amateur singers and musicians.
Motor engagement with music occurs frequently in humans at both a conscious and an
automatic level. Groove music is defined as music that elicits a pleasurable response of
wanting to move one’s body to the music (Janata et al., 2011). The primary aim of this
study is to provide further understanding of the impact of body movement on musically-
induced emotional responses. Focusing on positively-valenced emotional responses to
music, the impact of three types of movement on musically-induced emotion and liking
of musical pieces was investigated. Twenty-seven subjects listened to three different
musical excerpts, rated to be high in “groove”, under varying conditions of movement
restriction. These conditions were full restraint (no movement), partial restraint (finger
tapping), and no restraint (free movement). Subjects were videotaped throughout the
study to ensure compliance with movement restrictions. One-way ANOVA, paired
samples t-tests, and regression analyses were performed. The central finding was that the
free movement and finger tapping conditions resulted in greater positively-valenced
emotional responses (scored on a 9-figure Self-Assessment Manikin) upon music
listening than did the still condition. It was also found that the free movement and finger
tapping conditions elicited higher ratings of liking (scored on a 5-point Likert scale) of
the musical excerpts than the still condition. Findings did not support a relationship
between movement restriction and physiological arousal thought to occur with emotional
response. It is concluded that movement to a piece of music affects emotional responses
upon music listening and that people enjoy pieces of music more when they are able to
engage motorically with the music. These findings are relevant to future neuroimaging
research into music and emotion as the inability to move freely during neuroimaging
procedures may have an impact on emotional responses to pieces of music as well as
neural activity involved in these responses. In addition, the role of shifts in attention
when one is required to restrict one’s movement to music is explored as a possible
explanation for decreases in both liking of and positive emotional response to pieces of
music.
Cochlear implants (CIs) constrain the perception of pitch in users, yet many continue to
participate in musical activities after receiving their implants. To date, there has been no
research examining the basis upon which adult CI users integrate electrical hearing input
with motor skill to perform a musical instrument. To understand the mechanisms by
which CI users learn to produce music, and to what extent auditory and motor systems
contribute to this process, we trained adult CI users (n = 5) and hearing controls (n = 15)
to learn short sequences on a piano keyboard. We examined how their performance is
affected by changes in the melodic and motor sequences in the learned patterns. We
adapted a training paradigm used previously to examine the motor and melodic
contributions in music production of trained pianists. Using an animated learning
procedure that removes the constraints of reading standard music notation, we trained
participants to perform short melodic sequences on the piano. We then examined the
transfer of learning by measuring the participants’ time to perform novel sequences
involving a change in the following: 1) melodic pattern only, 2) finger pattern only, 3)
melodic and finger patterns, 4) no change. Hearing controls demonstrated the same
pattern of difficulty in the transfer of learning across conditions as trained pianists in
previous reports. That is, retaining the same melodic and finger patterns in novel
sequences enabled the greatest transfer of learning, while those that altered both of these
components yielded the poorest transfer of learning. Sequences involving a change in
either the melodic or finger pattern yielded intermediate transfer performance. In contrast
to hearing controls, there was no systematic pattern in the latency performance of CI
participants across transfer conditions. While CI users were able to learn piano melodies
successfully under the current training paradigm, our preliminary findings indicate
considerable individual differences in the transfer of learning to novel sequences. These
differences likely reflect the influence of CI users’ unique hearing histories and adaptive
strategies in the integration of electrical hearing input and motor skill for music
production.
Despite the belief that musicians’ bobs, weaves, and gyrations are somehow related to the
musical structure, the nature of the relationship has been difficult to assess. Progress has
been impeded by four problems: (1) the assumption that musical structure is constant
across performances; (2) the complexity of the movements; (3) the inability of
traditional statistical tests to accurately model the multilevel temporal hierarchies
involved; and (4) the difficulty of assessing significance. We addressed these problems in a study of the
postural sway of two trombonists as each recorded two performances of each of two solo
pieces in each of three different expressive styles (normal, expressive, non-expressive).
First, we assessed musical structure immediately after each performance by asking the
musicians to report their phrasing, which changed across expressive styles and musicians.
Second, we measured the complex patterns of movement using recurrence quantification
analysis to assess their recurrence (self-similarity), stability and predictability. Third, we
used mixed linear models to statistically assess the reliability of recurrence, stability and
predictability while taking into account the nesting of phrases within pieces within
performances across expressive styles and musicians. Fourth, we also assessed
significance of similarity between performances using surrogate techniques to bootstrap
confidence intervals by generating surrogate time-series that preserved the autoregressive
structure and frequency distribution of the original data. Postural sway was shaped by
each musician’s phrasing of the music, producing serial position curves that were less
pronounced in non-expressive performances and in shorter phrases. Recurrence of sway
rose steeply at the beginning of a phrase, peaking before the middle and tailing off
gradually. Stability of sway followed the opposite course, falling to a minimum near the
middle, before rising and falling again to the end of the phrase.
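The surrogate step described above can be illustrated in code. The following is a minimal sketch under our own assumptions, not the authors' actual pipeline: a simplified AAFT-style surrogate that randomizes Fourier phases (approximately preserving the power spectrum, and hence the series' linear autoregressive structure) and then rank-remaps onto the original values (exactly preserving their distribution), plus a helper that bootstraps a confidence interval for any statistic under that surrogate null.

```python
import numpy as np

def aaft_surrogate(x, rng=None):
    """Surrogate of x: randomized Fourier phases (approximately preserves
    the spectrum/autocorrelation) followed by a rank remap onto the
    original values (exactly preserves the value distribution)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    spec = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(spec))
    phases[0] = 0.0          # keep the mean (DC bin)
    if n % 2 == 0:
        phases[-1] = 0.0     # keep the Nyquist bin real
    shuffled = np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=n)
    # Rank remap: smallest surrogate sample gets smallest original value, etc.
    surrogate = np.empty(n)
    surrogate[np.argsort(shuffled)] = np.sort(x)
    return surrogate

def surrogate_ci(stat_fn, x, n_surrogates=999, alpha=0.05, seed=0):
    """Bootstrap a confidence interval for a statistic under the surrogate null."""
    rng = np.random.default_rng(seed)
    stats = [stat_fn(aaft_surrogate(x, rng)) for _ in range(n_surrogates)]
    return np.quantile(stats, [alpha / 2.0, 1.0 - alpha / 2.0])
```

An observed similarity statistic falling outside the surrogate interval would then count as significant at the chosen level.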
Background
Frequency change (FC), a prime element in music, is ecologically crucial and may affect
human motor behavior within a short latency. In animals, a ‘pitch code’, used within and
across species by senders and receivers, optimizes action, since perceiving the direction,
magnitude, and rate of frequency change enables recognition of identity, intention, and
urgency. Nevertheless, human motion elicited by FC has rarely been researched.
We hypothesized that FC information affects behavior; that ‘up’ or ‘down’ changes may yield
differential responses; and that those may be mediated by an ‘express’ ear-to-muscle
route.
Results
Different negative mean asynchrony (NMA) behavior in the intensity versus frequency conditions implied that the responses
are not to auditory change per se. NMA grew (taps occurred earlier) in response to rising
frequency already on the first tap following the change, i.e., within less than 250 ms. Further,
finger acceleration data revealed that differential behavior began even earlier, ca. 170 ms
post change. Aligning the data to sound or to tap showed, on Tap 1, early triggering of action
but an unchanged dynamic profile. The response latency, though short, does not
unequivocally show a subcortical ‘express’ route.
Conclusion
Our results suggest that frequency change elicits an ongoing covert human motor response.
Such activity may constitute a part of our music perception process, indeed part
of music’s meaning.
Coordination between co-performers who produce discrete onsets (e.g., tapping, drumming,
piano) has often been measured by performing cross-correlations of the inter-onset intervals (IOI)
or through conversion to discrete relative phase and the use of circular statistics (e.g., the resultant
vector). The cross-correlation of IOIs can produce coefficients that are relatively low, even when
experts perform music together with the goal of synchronizing. Both methods also require several
assumptions to be made about the period and/or phase relationships and can be perturbed by
missing data. Such assumptions become untenable when people are not intentionally trying to
coordinate but spontaneously align their actions, resulting in intermittent coordination. To remedy
these issues we have developed the Wave Transformation (WT) Toolbox, a MATLAB toolbox
for measuring coordination in discrete onset data. The WT converts the discrete signal into a
continuous representation similar in appearance to a sine wave, in which the peaks of the waves
represent the discrete onset times. The WT signal can be cross-correlated or undergo other, more
complex signal processing, and it accommodates missing data points and unequal numbers of data points.
We present simulations where “participants” synchronize with a metronome or with each other to
varying degrees. We also provide a method to estimate chance levels of synchrony between a
person and a metronome as well as between co-performers. Through direct comparisons of
discrete data and the transformed data analyses, we highlight the hazards that can occur when
making assumptions for the discrete data. Finally, we demonstrate several applications of the WT
for human datasets, including analyses of intentional desynchronization that occurs in Steve
Reich’s Drumming. Our analyses highlight differences in the outcomes of the methods; we
discuss which methods are more appropriate under certain conditions.
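The core idea of the transform — turning discrete onsets into a continuous, sine-wave-like signal whose peaks coincide with the onsets — can be sketched as follows. This is an illustrative Python reimplementation under our own assumptions (function names, sampling grid, zero-lag correlation), not the authors' MATLAB WT Toolbox: phase is interpolated linearly from one onset to the next, so the cosine of the cumulative phase peaks at each onset, and two such signals can be correlated even when the performers produced different numbers of onsets.

```python
import numpy as np

def onset_wave(onsets, fs=100.0, t_end=None):
    """Convert discrete onset times (seconds) into a continuous signal
    whose peaks fall on the onsets: the k-th onset carries cumulative
    phase 2*pi*k, and phase is linearly interpolated between onsets."""
    onsets = np.asarray(onsets, dtype=float)
    if t_end is None:
        t_end = onsets[-1]
    t = np.arange(0.0, t_end + 1.0 / fs, 1.0 / fs)
    onset_phase = 2.0 * np.pi * np.arange(len(onsets))
    phase = np.interp(t, onsets, onset_phase)  # clamps outside the onset range
    return t, np.cos(phase)

def wave_sync(onsets_a, onsets_b, fs=100.0):
    """Zero-lag correlation between two onset waves on a shared time grid."""
    onsets_a = np.asarray(onsets_a, dtype=float)
    onsets_b = np.asarray(onsets_b, dtype=float)
    t_end = max(onsets_a[-1], onsets_b[-1])
    _, wa = onset_wave(onsets_a, fs, t_end)
    _, wb = onset_wave(onsets_b, fs, t_end)
    return np.corrcoef(wa, wb)[0, 1]
```

Because the continuous signals live on a common grid, the two performers may contribute different numbers of onsets without breaking the correlation, which is the property the abstract highlights.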
structural parameters. For instance, a performance with smoother movements is often
associated with a sad expression, while jerkiness is related to anger and fear. Smoothness
can also be related to the phrasing of music and provide observers with time points of
segmentation, as well as a sense of phrasing contour over time (Wanderley, Vines,
Middleton, McKay, & Hatch, 2005). While studying the effect of the interactions between
visual and auditory information on the judgment of tension and phrasing in clarinet
performances, Vines and colleagues (2006) found that high tension was associated with
fast, intense, and abrupt movements, whereas low tension was related to smooth and
controlled motion patterns. Moreover, certain gestures may help anticipate the beginning
of a new section, for instance when the performer initiates a breath, while other
movements may leave the impression that the phrase extends beyond the end of the note.

Although auditors’ perception of expressive parameters conveyed through reduced body
representations is impressive, it is unclear which kinematic features allow listeners to
perceive this and what specific body movements may influence their perception. In the
present study, we seek to understand what influence a piano performance with restrained
gestures but usual expressive acoustic characteristics may have on auditors’ perception
of that performance. The objectives of this work were to evaluate: 1) whether auditors
are better at recognizing different performance conditions in complex Romantic piano
pieces when provided with one perceptual mode at a time (visual or auditory) or both;
2) how efficient motion cues (i.e., quantity, regularity, velocity) or auditory cues
(i.e., timing, articulation, dynamic, phrasing) are in conveying different expressions;
and 3) which body parts contribute the most to the discrimination of the different
performance conditions.

II. EXPERIMENTAL METHOD

A. Measuring pianists’ motion data

Eleven graduate pianist-students (4 females) performed a thirty-second excerpt from a
well-learned piece of their Romantic repertoire in four different conditions: normal,
deadpan, exaggerated, and immobile. Normal performances were played as naturally as
possible, and exaggerated performances with an exaggerated level of expression. Deadpan
referred to a performance with a reduced level of expression, whereas immobile referred
to playing with only the essential movements required to play in a usual manner.
Participants could choose the tempo they thought was appropriate to convey the
expressive conditions.

Each pianist played one of the following excerpts: Chopin 4th Ballade; Brahms 3rd Piano
Sonata; Chopin Scherzo no. 2; Schumann Fantasy; Liszt 2nd Ballade; Medtner Sonata
Reminiscenza op. 38; Liszt Spanish Rhapsody; Chopin Waltz op. 64 no. 2; Schubert
Impromptu no. 1; Brahms Capriccio op. 76 no. 1; and Chopin Impromptu. Each performance
condition was repeated three times (total of 12 performances per pianist). Performances
were video recorded with a Sony wide-angle video camera and audio recorded with a
Sennheiser MKH microphone.

B. Auditory and motion cues

Thirty auditors (20 females), with at least 5 years of musical training, were randomly
assigned to a modality (vision, audio, or both). They could listen to the excerpts as
many times as they wanted in order to associate the musical excerpts with the expressive
conditions. After each trial, auditors selected a confidence level on a five-point Likert
scale for their answer. They also rated on a five-point Likert scale the importance of
motion cues (quantity, regularity, velocity) and auditory cues (timing, articulation,
dynamic, phrasing) in discriminating the conditions. Quantity of motion was described as
the amplitude of movement, regularity was related to the periodicity of movement and the
rhythmic structure, and velocity was characterized as the speed and irregularity of the
motion (any directional changes). Participants indicated which body parts (i.e., head,
torso, shoulders, arms, hands, hips) contributed the most to help them discriminate the
expression. Auditors filled out a demographic questionnaire at the end of the experiment.

III. RESULTS

A. Discrimination of expressive conditions

In order to examine the effects of the different expressive conditions, the modalities,
and individual pianists on the auditors’ ability to discriminate the excerpts, an
analysis of variance (ANOVA) with repeated measures was conducted for each of the
dependent variables, with gender and level of instruction treated as random variables,
and pianist, modality, and condition treated as fixed effects.

There is a significant main effect for the conditions (p < .01) on auditors’ answers,
but not for the modalities. Additionally, there is a significant interaction effect
between modalities and conditions (p < .01); between the modalities and the pianists
(p < .01); and between the conditions and the pianists (p < .01). Figure 1 compares the
results obtained for the modalities in each condition. Overall, the expressive
conditions are better identified in the audio-visual modality (average: 76.1% of good
answers), followed by the visual-only modality (72.5%), and finally the audio-only
modality (65.5%). The exaggerated performances are fairly well recognized in every
modality (median: 90%). The identification of the normal condition in the audio-visual
mode varies considerably, from 20% to 100% of good answers, and the condition is better
recognized in the visual-only mode (median: 80%). However, answers vary between auditors
in the audio-visual mode (standard deviation: 11.7) and the audio-only mode (standard
deviation: 11.4), but less in the visual-only mode (standard deviation: 5.8). Auditors
are often confused between the deadpan and immobile conditions in the visual-only mode,
and the same occurred for the normal and immobile conditions in the audio-only mode. The
auditors’ level of confidence in identifying the conditions is higher in the audio-visual
modality (weighted average: 78.9%) than in the visual-only mode (weighted average: 73.5%)
and in the audio-only mode (weighted average: 66.8%).
Figure 1. Distribution of the median values per modality illustrating auditors’ good answers (in %) with respect to the conditions (N:
normal; D: deadpan; E: exaggerated; I: immobile)
Figure 2 shows that the auditors’ identification of the expressive conditions varies
greatly with the modality when different pianists play. For instance, the conditions are
better discriminated in the audio modality than in the audio-visual mode for pianists 4,
8, and 9. However, they are better recognized in the audio-visual mode, followed by the
visual-only and the audio-only modes, for pianists 2, 3, 5, and 11. Overall, auditors
were better at discriminating the condition when pianists 3 and 8 were playing, with
averages of 75% and 80% of total good answers respectively, and the lowest average score
occurred while pianist 9 was playing (63.3% of good answers). Interestingly, auditors
deciphered the conditions well in the audio-only modality for pianist 8 (90.0%
recognition).

Figure 2. Percentage of auditors’ good responses per pianist for each modality

B. Importance of auditory and motion cues

Figure 3 represents the frequency distribution of the auditors’ answers (five-point
Likert scale) on the importance of audio and motion cues for the identification of the
expressive conditions. The vertical line divides the “moderately important” responses in
half. Sound dynamic (52.27% very important) and quantity of motion (40.91% very
important) are respectively the most important auditory and motion cues. Regularity is
perceived as less essential in discriminating the expressive conditions, although 19.55%
of the responses were rated as “very important”. Articulation and timing obtain a very
similar response distribution, with respectively 28.18% and 24.55% of the answers rated
as “very important”.

C. Body parts

In order to identify which body parts are more frequently associated with the motion
cues, we measured the frequency distribution (in %) of the auditors’ choice of body
parts for all pianists together (Figure 4). Arms (71.36% of the time) are the body part
most often chosen by auditors to help them discriminate between the conditions, followed
by the head (60% of the time), the torso (56.36% of the time), the hands (43.18% of the
time), the shoulders (36.36% of the time), and finally by the hips, which are the least
often chosen body part (8.64% of the time).
This paper presents the first empirical evaluation, from the audience’s perspective, of a
digital music instrument (DMI) designed for augmenting electroacoustic vocal
performance. We strive to understand the audience’s preference for the DMI, a Tibetan
Singing Prayer Wheel, which simultaneously processes vocals based on the performer’s
hand gestures, and for its mapping between vocal processing parameters and the sensed
horizontal spinning gesture, in particular the time-varying rotational speed. We
hypothesize that the audience’s level of perceived expression and their engagement with
a performance increase when: 1. synchronized gesture mapping is used to control the
vocal processing; 2. what we consider intuitive gesture mapping is used to control the
vocal processing. In both experiments, we made two versions of each song’s audio and
inserted them as alternative soundtracks for a video recording of a single performance.
This methodology eliminates possible confounds associated with comparing different
performances; in our case every comparison is between identical videos of the same
performance but with alternative gesture to effects-control mappings. Experiment 1
compared the original mapping, our “synchronicity” condition, against a desynchronized
alternative, with control functions taken from an unrelated performance. Experiment 2
compared the original mapping (slow rotation producing a more natural sound and faster
rotation causing a progressively more intense granular stuttering effect); versus the
opposite. We then presented all six songs to two groups of participants, and evaluated
their responses via questionnaire. In both experiments, viewers reported overall higher
engagement, preference, and perceived expressiveness for the original versions,
preferring our original design. Through evaluating these control and mapping mechanisms, we aim
to push the discussion towards developing gesture theory in electroacoustic vocal
performance.
Background
Music-based rhythmic auditory stimulation (RAS) is a cueing technique used in gait
rehabilitation, typically involving synchronizing footsteps with the beat. However,
instructions to synchronize may be detrimental to individuals who have difficulty
perceiving the beat in music, such as people with Parkinson’s disease, by causing an
increase in cognitive load.
Aims
Here we evaluate how beat perception ability influences gait responses to RAS when
instructed to synchronize steps with the beat versus when permitted to walk freely. We
hypothesized that walking freely to music without instructions to synchronize may
benefit poor beat perceivers by reducing dual-task load.
Methods
Eighty-four healthy adult participants walked on a pressure-sensor walkway to music that they
rated to be high in groove (the 'desire to move to the music'). Beat perception ability was
evaluated using the Beat Alignment Test from the Goldsmiths Music Sophistication
Index v1.0.
Results
As predicted, instructions to synchronize facilitated gait in good beat perceivers, whereas
walking freely facilitated gait in poor beat perceivers. This instruction x beat perception
interaction was statistically significant for stride width, an indicator of walking
confidence and balance, and approached significance for stride length, an area of
particular concern in Parkinson’s. Post-hoc tests on song rating data indicate that gait
improvements in good beat perceivers were linked to perceptions of high beat salience
and groove in the music, whereas improvements for poor beat perceivers were linked
more closely to enjoyment.
Conclusions
Results support the premise that RAS should be tailored to beat perception ability: good
beat perceivers benefitted more from synchronizing, whereas poor beat perceivers
benefitted more from freely walking to the music.
The body movements involved in dyadic music performance usually fall into two broad
categories: sound-producing movements, which are planned and highly coordinated, and
ancillary gestures, which are improvised yet essential for communicating musical
intentions. Previous work has shown that these types of movements are correlated to
some extent, as they both reflect a performance’s attributes such as musical structure and
tempo. However, the categories differ enough that they may require different interaction
analysis methods when assessing interpersonal coordination. This study investigates
which interaction analysis method is best suited for each movement category, and also
considers a range of movements from simple coordinated actions to musical
performances. In a series of trials, two violinists were motion-captured while performing
short coordinated gestures (e.g. mirrored and unmirrored limb extensions), uncoordinated
improvised actions (e.g. free gestures and dancing), and a short work for two violins.
Interpersonal coordination is evaluated using three different methods that uncover
generalized relationships between data sets: Canonical correlation analysis (CCA:
identifies linear relationships by calculating the shared variance between data sets),
mutual information (MI: identifies mutual dependence of two data sets) and joint
multivariate synchrony (JMS: identifies the joint entropy of two data sets). Though some
work has begun on investigating coordination using CCA, progress on this project is
ongoing. Generally, we expect to find that the effectiveness of a given interaction
analysis method will be based on the nature of the trial. For instance, because CCA
assumes linear relationships between data sets, this technique would be best suited for
evaluating the degree of synchrony between coordinated instrumental
movements, whereas MI makes no such assumptions, and could therefore be used for
assessing interaction in improvised or ancillary gestures.
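As a concrete illustration of the first method, plain linear CCA can be computed with a few lines of linear algebra: whiten each data set and take the SVD of the whitened cross-covariance. The sketch below is our own minimal version for two motion time series (frames × markers) and is not the study's analysis code; the ridge term `reg` is our assumption, added to keep the whitening stable when marker coordinates are nearly collinear.

```python
import numpy as np

def cca_correlations(X, Y, reg=1e-9):
    """Canonical correlations between data sets X (n x p) and Y (n x q),
    via the SVD of the whitened cross-covariance matrix."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / n

    def inv_sqrt(S):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(M, compute_uv=False)  # canonical correlations, descending
```

Because the method maximizes a linear combination of the two sets, a first canonical correlation near 1 indicates a strong linear coupling between the performers' movements, which is exactly the assumption that makes CCA better suited to coordinated than to improvised gestures.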
A large body of research has been devoted to the way in which music shapes the
perception of visual stimuli. Recent research has demonstrated that the evaluation of
outdoor scenes can be affected by characteristics of the music being heard. The present
study used eye tracking to investigate differences in the perception of urban and
natural scenes while listening to two distinctive types of music. We investigated the
effects on the perception of various urban and natural scenes of (1) motivational music,
which has a fast tempo and strong rhythm, and (2) non-motivational music, which is
slower and carries no strong implication of movement. We were interested in whether
particular types of music can influence perception of the scenes in terms of the number
of eye fixations, the durations of fixations, and the extent of perceived visual
information. Sixty-six
undergraduates aged from 19 to 25 years participated in the study. They viewed 25
photographs successively. Simultaneously, they listened to motivational or
non-motivational music. The no-music condition served as a control. Participants’ gaze
was recorded using a Tobii X2-60 eye tracker. The durations and number of fixations on
the photographs were analyzed. We did not find any significant differences in the extent
of perceived visual information. However, the number of fixations was significantly
higher in the no-music condition compared to the music conditions. The durations of
fixations were shortest in the no-music condition. Moreover, the durations of fixations
of participants listening to motivational music were significantly longer than those of
participants listening to non-motivational music. The data suggest that listening to
music while viewing the photographs imposed a larger cognitive load, which resulted in
longer durations of eye fixations. The research could enhance our understanding of phenomena
associated with music listening while walking or running in an outdoor environment.
The pulse level in music is often described as a mental reference structure against which
we perceive rhythm. The pulse is often defined as a series of isochronous beats;
however, this conception is challenged by music styles that seem to feature an
underlying pulse that consists of beats of uneven duration, such as certain traditional
Scandinavian dance music genres. A considerable part of the traditional dance music of
Sweden and Norway is in so-called asymmetrical meter—that is, the beats in a measure are
of uneven duration.
In this study we focus on one specific music style, namely Norwegian telespringar. The
intimate relationship between music and motion is often highlighted in rhythm studies of
these music styles, so we wanted to investigate whether performers’ periodic body
motions in telespringar performance could be understood as an expression of the
underlying pulse.
We report from a motion capture study where three professional performers, one fiddler
and two dancers, participated. The participants’ body motions were recorded using an
advanced optical infrared motion capture system.
Sound analysis showed clusters of peaks at beat level, which made it difficult to determine
a duration pattern based on the sound signal. Motion analysis of the fiddler’s integrated
foot stamping, on the other hand, showed clear acceleration peaks revealing a very stable
long–medium–short duration pattern that seemed to be synchronized with the beat-related
sonic events in the music. Analysis of the dancers’ vertical hip motions also
revealed a stable motion pattern that seemed to be synchronized with both measure level
and beat level indicated by the fiddler’s foot stamping.
The results show that the underlying asymmetrical structure is indicated by the fiddler’s foot
stamping and corresponds to the dancers’ vertical hip motions. This confirms the view
that the underlying pulse in telespringar consists of beats of uneven duration.
In dance, people synchronize their movements with the musical beat. Since music is
primarily experienced through the auditory sense, this form of sensorimotor
synchronization has exclusively been studied using auditory stimuli. However, music can
also be experienced through the tactile sense, when objects that generate sounds also
generate vibrations. One typical example is the vibration of the floor induced by the
subwoofers in dance clubs. This is of particular relevance for deaf people, who rely on
non-auditory sensory information for beat synchronization. In the present study, we
aimed to investigate beat synchronization to vibrotactile electronic dance music in
hearing and deaf people. We tested 8 deaf and 15 hearing individuals on their ability to
bounce in time with the tempo of vibrotactile stimuli (no sound) delivered through a
vibrating platform. The corresponding auditory stimuli (no vibration) were used in an
additional condition with the hearing group. We collected movement data using a
camera-based Motion Capture system, and subjected it to a phase-locking analysis to
assess synchronization quality. The vast majority of participants were able to precisely
time their bounces to the vibrations, with no difference in performance between the two
groups. In addition, we found higher performance for the auditory compared to the
vibrotactile condition in the hearing group. Our results thus show that accurate tactile-
motor synchronization in a dance-like context does occur regardless of auditory
experience, though auditory-motor synchronization was of superior quality.
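The phase-locking analysis mentioned above can be illustrated with standard circular statistics: project each bounce event onto the phase of its surrounding beat cycle and take the length of the mean resultant vector. This is a generic sketch of the approach with our own variable names and conventions, not the authors' code.

```python
import numpy as np

def relative_phases(event_times, beat_times):
    """Phase of each event within its surrounding beat interval, in radians
    (0 = on the beat, pi = halfway through the cycle)."""
    event_times = np.asarray(event_times, dtype=float)
    beat_times = np.asarray(beat_times, dtype=float)
    idx = np.searchsorted(beat_times, event_times, side='right') - 1
    valid = (idx >= 0) & (idx < len(beat_times) - 1)
    idx = idx[valid]
    frac = ((event_times[valid] - beat_times[idx]) /
            (beat_times[idx + 1] - beat_times[idx]))
    return 2.0 * np.pi * frac

def resultant_vector_length(phases):
    """Circular measure of synchronization quality: 1 = perfect phase
    locking, values near 0 = no consistent phase relation."""
    return np.abs(np.mean(np.exp(1j * phases)))
```

Comparing resultant vector lengths between conditions (vibrotactile vs. auditory) or groups (deaf vs. hearing) then gives a single, bounded index of synchronization quality per participant.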
Expert or fraud – opinions differ widely when it comes to the profession of the
conductor. The powerful person in front of an orchestra or choir attracts both hate and
admiration. But what actual influence does a conductor have on the musicians’ bodies and
the sounding result?

A large proportion of the myth around the conductor is inextricably linked to his
individual influence on the sound of an orchestra. In this context, many reports of the
conductor’s unique influence on the orchestra exist. For example, Arthur Nikisch is said
to have “had the uncanny ability to get a ravishingly beautiful sound out of an
orchestra without saying a word” (Seaman, 2013). A commonly used explanation of the
specific sound of an orchestra is the aura of a charismatic conductor. The most famous
example is probably Wilhelm Furtwängler, who was known to change the orchestra’s sound
just by entering the room, even when someone else was conducting (Lawson, 2003). Recent
books on the role of the conductor refer to more scientific concepts, such as mirror
neurons or emotional contagion, as an explanation for the

There seem to be many abstract multimodal analogies between sound and color, form, size
or emotion. Two examples of research on this phenomenon are Manfred Clynes’s “essentic
forms” (1978) and Wolfgang Köhler’s “Maluma & Takete” (1929 & 1947). Clynes developed a
machine called the sentograph, which measures people’s gestural responses to various
pieces of music. Based on these findings he claims the existence of distinct expressive
forms for a number of emotions. In 1929, Köhler explored a non-arbitrary mapping between
speech sounds and the visual shape of objects. By showing participants two forms similar
to those in Figure 1 and asking which shape was called “takete” and which “maluma”,
respectively, he came to the result that there is a strong preference to link the
rounded shape to “maluma” and the jagged shape to “takete”.

Köhler’s experiment has been repeated in various settings and variations (Jankovic,
2006; Nielsen & Rendall, 2013; Oakley & Brewster, 2013). An equivalent consistency in
the combination of gestural expressions and sounding reaction in
B. Methodology

Based on the literature review, we decided to use three different archetypes of
conducting patterns for our experiments: concave, convex, and mixed (see Figure 2).
Concave and convex gestures are simplified versions of those used by Serrano (1993),
reduced to the pure shape as being convex or concave as defined by Schuldt-Jensen
(2016). The mixed gesture is an often-taught combination of a concave and a convex
trajectory, seen for example in Phillips (1997), Lumley & Springthorpe (1989), and Göstl
(2006).

In addition to the three basic patterns, we also used a big and a small version of the
convex pattern to investigate possible differences in the volume of the evoked reaction.
In relation to onset synchrony, the question of predictability becomes especially
interesting when it comes to sudden tempo changes. Therefore, as a supplement to
measuring the reaction to the three different conducting patterns in constant tempo, we
included versions with changing tempo for all three gestural archetypes to see whether
response time differs.

III. DATA COLLECTION

A. Touch-Sensor Study

The experimental setup of the first study consisted of a screen showing videos of
prerecorded conducting gestures, headphones for audio feedback, and a touch sensor (a
force-sensitive resistor, FSR), which detected physical pressure.

The within-subjects experiment with two randomized sets included 19 healthy subjects of
both genders (12 male / 7 female) aged between 19 and 32 (mean 26.5). Fourteen
participants were musicians/singers who had already or currently played or sung with a
conductor; 5 were non-musicians who had never worked with a conductor. During the
experiment the subject was instructed to follow videos of varying conducting gestures by
playing a sound elicited by pressing the touch sensor. Every tapping event
induced an audio feedback, the length and force of the subject's pressure mirroring the length and volume of the evoked tone respectively. The on- and offset-time of each tapping event and the amount of pressure on the sensor were recorded. The collected sensor data were synchronized with the recorded data of the shown conducting gestures to allow further investigations.

The participants were asked to follow the tempo of the conductor and to intuitively play – in terms of length and volume – as shown by the conductor. The proceeding tasks can be subdivided into three categories:

1) Concave, Mixed, Convex. Videos of the three basic movement types were shown, conducted in constant tempo. Measurements of length and amount of pressure on the FSR sensor were collected, and mean and standard deviation (STD) were calculated.

2) Dynamics & Articulation. For dynamics, two conducting gestures with identical form, but differing in size, were shown to measure differences in the applied pressure on the touch sensor. As for the effect on the articulation, the two contrasting conducting patterns (concave and convex from category 1) were used, measuring potential differences in the length of pressure on the sensor.

3) Predictability. Three videos of concave, convex and mixed conducting gestures with changing tempo were used to investigate the predictability of the different types of trajectories, indicated by the standard deviation (STD) from the calculated beat point.

B. Violin Study

Violinists represent the largest instrumental group of orchestral musicians. As target group for our second experiment they are especially interesting, because all main body parts involved in the process of playing the violin are visible and accessible for sensors. Moreover, the violin offers a comparatively open sound configuration; timbre varies depending on the bow's speed, pressure, angle, and position of contact, as well as on the left hand's finger pressure on the strings.

The experimental setup for this study consisted of a screen, showing videos of the prerecorded conducting gestures, and a Zoom H2 device to record audio in 44100 Hz stereo, 32-bit wave format. As orchestra musicians usually play seated, we also asked our participants to sit during the experiment. While following a protocol of single tasks, the participants wore three MYO Armbands (Thalmic Labs, 2016), one on the right and left forearm respectively and the third on the right upper arm.

The within-subjects experiment with two randomized sets included 8 healthy subjects of both genders (1 male / 8 female) with an age between 19 and 63 (x̅ 31). All 8 participants are active violinists with at least 8 years of experience of playing under a conductor; 4 participants are professional violinists and members of professional symphony orchestras, 4 play on a semi-professional level.

After mounting the sensor armbands, the subject was instructed to follow videos of varying conducting gestures by playing a d'' with the third finger on the a-string in half notes. The recorded audio and collected sensor data were synchronized with the recorded data of the conductor to allow further investigations.

The proceeding tasks can be subdivided into three categories:

1) Concave, Convex and Mixed. The first category investigates the intuitive reaction to the different forms of conducting gestures in terms of sound quality and muscle tension.

2) Pronation, Supination and Neutral Hand Posture. With different forearm rotations we aim to explore potential influences of hand postures on the sound quality and muscle tension of conductor and musician.

3) Fermata. This category is designed to show possible differences in sound quality in cases of a fermata with respectively without movement.

In both studies, we executed every task twice in a randomized sequence to prevent order effects, and we closed the experiments with a short questionnaire, asking for background information on age, gender, profession, and musical education/experience.

IV. RESULTS

A. Touch-Sensor Study

Measuring intuitive reactions to different gestural archetypes, significant differences occurred in all measured categories.

1) Concave, Mixed, Convex. The intuitively evoked pressure on the touch sensor, which represents the produced volume of the note, differed significantly when comparing concave/convex (p=0.00007) and concave/mixed (p=0.009) gestures. The difference between convex and mixed is not significantly distinctive (p=0.4). The mean pressure is the highest in convex (374.8), followed by mixed with 359.5 and the lowest in concave (337.9). However, the envelope of a pressure event indicates that the difference in mean pressure should rather be described as the overall energy during one specific touch event than just mean pressure itself, as there is a longer plateau on a high pressure level in convex and mixed gestures as compared to the sharp rise and immediate decline in the concave gesture (see Figure 3).

When it comes to the length of the evoked note, differences are considerable. There were statistically highly significant differences between the mean values of concave, mixed and convex as determined by a one-way ANOVA test (p = 0.0000000014). As is evident from the envelopes in Figure 3, the length of the resulting note of a concave gesture (295.5 ms) is less than half of those of mixed (620.4 ms) and convex (731.8 ms).

2) Dynamics & Articulation. Different forms and sizes of conducting gestures lead to highly significant differences in pressure length and force on the touch sensor.
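The pairwise comparisons and one-way ANOVA reported above can be sketched in a few lines of Python with SciPy. The three arrays below are invented placeholders standing in for evoked note lengths (ms), not the study's measurements:

```python
# Sketch of the statistical comparisons described above, on hypothetical data.
# The three samples are invented placeholders, NOT the study's recordings.
from scipy import stats

concave = [280.0, 301.5, 295.0, 310.2, 290.8]
mixed = [615.0, 630.2, 598.7, 640.1, 618.0]
convex = [725.5, 740.3, 718.9, 735.0, 728.4]

# Omnibus one-way ANOVA across the three gesture types
f_stat, p_anova = stats.f_oneway(concave, mixed, convex)

# A pairwise comparison, as reported for concave vs. convex
t_cc, p_cc = stats.ttest_ind(concave, convex)
```

With clearly separated groups like these, the omnibus p-value is vanishingly small, mirroring the pattern the paper reports.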
B. Violin Study

1) Concave, Convex and Mixed. Measuring intuitive reactions to different gestural archetypes, significant differences occurred in the measured intensity of the evoked tone.

By means of the Sonic Visualiser (2016), we calculated the intensity of the signal, i.e. the sum of the magnitude of the FFT bins of 7 sub-bands. As shown in Figure 6, the highest average intensity was induced by a convex gesture (66.4), followed by mixed (56.7) and concave (34.1). The observed differences in intensity between the three gestural archetypes are confirmed as significant by a one-way ANOVA (p = 0.0001). These results confirm the findings of intuitively evoked volume in the form of pressure in the touch-sensor study. Interestingly, an analysis of the conductor's muscle tension in the right forearm shows contrary results: the conductor uses the highest average muscle tension with the concave gesture (90.2), followed by mixed (41.5) and convex (33.7). Although the resulting intensity is consistent through all participants, measurements of muscle tension of the musicians do not give significant results.

Figure 3. Envelopes of touch events.
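The intensity measure described above (summed FFT-bin magnitudes across 7 sub-bands) can be sketched as follows. The frame length, windowing, and test tone are assumptions, since the exact Sonic Visualiser settings are not given in the text:

```python
# Sketch of a 7-sub-band intensity measure: sum of FFT bin magnitudes,
# accumulated per sub-band. Window and frame length are assumptions.
import numpy as np

def subband_intensity(frame: np.ndarray, n_bands: int = 7) -> float:
    """Total FFT magnitude, accumulated over n_bands contiguous sub-bands."""
    mags = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bands = np.array_split(mags, n_bands)          # 7 contiguous sub-bands
    return float(sum(band.sum() for band in bands))

# Hypothetical test tone: 440 Hz at a 44.1 kHz sampling rate
sr = 44100
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 440 * t)
intensity = subband_intensity(frame)
```

Splitting into sub-bands before summing leaves the total unchanged; it matters only when the bands are weighted or reported separately.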
Figure 7. Muscle tension in the conductor's right forearm.

3) Fermata. Measurements of intensity in the two different versions of the fermata show significantly (p=0.01) higher values (mean intensity: 20.8) for the fermata with movement compared to the static version (mean intensity: 17.7). Also, mean muscle tension of the conductor differs significantly, indicating a slightly higher value for the fermata with movement.

The movement of the conductor's arm during a fermata seems to not only have an influence on the mean intensity; the envelopes of intensity over time also differ considerably: the version with movement starts with a high intensity that remains relatively stable until the middle of the fermata, before decreasing until the end of the note (see Figure 8). Figure 9 shows a typical fermata without movement, starting already with a lower intensity than the other version; a drop of more than 30% occurs between second 1 and 2, followed by a rise until the end of the note. As a result, the version with movement appears to be much smoother than the fermata without movement with its heavy fluctuations.

Figure 8. Intensity – Fermata with movement.

V. DISCUSSION

The main purpose of the presented studies was to explore the impact of different gestural archetypes on the musician's body and the sounding result. Firstly, we investigated the reaction of participants to three basic conducting gestures with different shapes of trajectories. The results correspond with Köhler's (1929 & 1947) findings of non-arbitrary mappings between speech sounds and the visual shape of objects and confirm the claim that such a correlation also exists between conducting gestures and the elicited tone (Schuldt-Jensen, 2016). Our results for the first time scientifically document the spontaneous and consistent relationship between certain shapes of conducting gestures and the resulting articulation, as well as a significant correlation between size of gesture and evoked volume. Interestingly, different shapes of gestures do not lead to significant differences in terms of loudness of the resulting tone, but the envelope of the evoked tones varies significantly, in that there is a big difference in the intuitively evoked length of sound. The loudness of the note seems to be primarily influenced by the size of the gesture. These findings could be of direct practical relevance, especially for the field of conducting tuition; for example, in view of the fact that there are significant differences in the intuitive reaction to concave versus convex conducting patterns, it would be especially interesting to further analyze teaching books on conducting that present conducting patterns with a combination of both types.

Another important quality of successful conducting technique is the predictability of the shown gesture. Especially when it comes to delicate musical elements, such as a pizzicato, a subtle unison cue for instruments with differing functionality, and to tempo changes (crucial to musical phrasing), an unpredictably shaped gesture immediately deteriorates the interpretation, because the conductor thus loses control over the timing and the onset-offset envelope. In order to measure differences in the predictability of conducting gestures, we created tasks with a changing tempo and calculated the onset delays of the participants' touch events. While refuting Serrano's (1993) findings, the results confirm Schuldt-Jensen's (2016) analyses, to the effect that convex conducting gestures show the highest
predictability, whereas mixed and concave gestures show a higher standard deviation, especially in sections of deceleration.

The second study aimed at exploring the impact of different conducting gestures on the muscle tension of the musicians as well as the resulting sound. The results show significant findings in all investigated categories for both the conductor's muscle tension and the intensity of the evoked sound. However, the outcome has a number of limitations: firstly, due to the small number of participants, no significance could be reached when measuring the violinists' muscle tension. Secondly, in addition to the individual and slightly varying playing techniques of the participants, every one of them used their own instrument. Both of these factors presumably led to inconsistencies in the measured muscle tension as well as the sound quality. In future studies, the same instrument should be used for all participants, and the individual differences in instrumental technique – if unavoidable – should be documented and taken into consideration during data analysis.

The reactions to concave, convex and mixed conducting gestures show significant differences in intensity of the evoked tone; interestingly, the amount of tension involved in the production of the conductor's movement seems to be in inverse proportion to the intensity of the resulting tone. Furthermore, trajectories with a natural gravitational pendulum movement, executed through lifting the arm and then releasing the potential energy into the trajectory, such as convex and (partly) mixed, apparently evoke a higher level of intensity than the concave gesture with its bouncing movement against gravity.

The results of the second category (pronation, supination, neutral) show that not only the form of the conducting gesture leads to differences in intensity. Different forearm rotations used with the same (convex) form of trajectory also influence the intensity of the induced tone significantly. Both of the standard rotations, pronation and supination (prescribed as standard positions in many books on conducting), evoke a lower intensity compared to a neutral hand posture.

As for the fermata, though the two types seem to be rather alike, they nevertheless led to surprisingly distinct differences. Although the movement of the hand during the duration of the fermata needs only slightly more muscle tension on the conductor's side, this action resulted in a much higher overall intensity of the evoked tone and a lower fluctuation of intensity, which combined improves the consistency of the sound quality of fermatas substantially.

An important implication of our findings is that both the form of the trajectory and the hand posture elicit significant differences in sound quality, especially when the multiplication of the individual results (due to the high number of violinists and other string players in an orchestra) is taken into account. Concerning health-related implications such as overstraining, future studies should measure the impact of different conducting gestures over a longer period of time. As for the conductor's risk of overstraining, by using convex conducting gestures with a neutral hand posture (instead of the other gestural types mentioned above) he can seemingly reduce the required muscle tension on a permanent basis and at the same time evoke a higher intensity of the sound of his ensemble.

VI. CONCLUSION

We presented two studies exploring the physiological impact and the sounding result of different conducting gestures. Our findings show that there are consistent reactions to different types of trajectories and hand postures, indicating that it does indeed matter musically how conducting gestures are formed and executed. However, our findings do not aim to define any of the investigated types of gestures as being right or wrong. But it turns out that since every gesture has a unique sounding consequence, certain gesture types are more capable – more economical and effective – of reaching predefined musical/interpretational goals than others.

REFERENCES

Bowen, J. A. (2005). The Cambridge companion to conducting (Reprint). Cambridge: Cambridge University Press.
Clynes, M. (1978). Sentics: The touch of the emotions. Garden City, N.Y.: Anchor Press/Doubleday.
Colson, J. F. (2012). Conducting and rehearsing the instrumental music ensemble: Scenarios, priorities, strategies, essentials, and repertoire. Scarecrow Press.
Farberman, H. (1997). The art of conducting technique: A new perspective. Miami: Belwin-Mills.
Galkin, E. W. (1988). A history of orchestral conducting in theory and practice. New York: Pendragon Press.
Göstl, R. (2006). Chorleitfaden: Motivierende Antworten auf Fragen der Chorleitung. Regensburg: ConBrio.
Green, E. A. H. (1992). The modern conductor: A college text on conducting based on the technical principles of Nicolai Malko as set forth in his The conductor and his baton. Englewood Cliffs, N.J.: Prentice-Hall.
Hattinger, W. (2013). Der Dirigent: Mythos – Macht – Merkwürdigkeiten. Kassel: Bärenreiter.
Haverkamp, M. (2001). Application of synesthetic design as multi-sensory approach on sound quality. Retrieved from http://www.michaelhaverkamp.de/Syn_Wahrn_Ger_update_05_HAV.pdf
Hinton, E. L. (2008). Conducting the wind orchestra.
Jankovic, D. (2006). Development of the takete-maluma phenomenon: Children's performance on choice and visualisation task. Perception, 35, 90–91.
Köhler, W. (1929). Gestalt psychology. New York: H. Liveright.
Köhler, W. (1947). Gestalt psychology (2nd ed.). New York: H. Liveright.
Lawson, C. (2003). The Cambridge companion to the orchestra. Cambridge, U.K.; New York: Cambridge University Press.
Lumley, J., & Springthorpe, N. (1989). The art of conducting: A guide to essential skills. London: Rhinegold.
Maiello, A. J., Bullock, J., & Clark, L. (1996). Conducting: A hands-on approach. Alfred Music Publishing.
McElheran, B. (2004). Conducting technique: For beginners and professionals. OUP USA.
Nielsen, A. K. S., & Rendall, D. (2013). Parsing the role of consonants versus vowels in the classic Takete-Maluma
Most synchronisation studies investigate rhythmic movements and note onset synchrony,
where metrical structure generates strong expectations of timing of future events. We
studied kinematics and coordination in a mirror game using fluent, improvised
movements that either give rise to a moderate sense of pulse (circle drawing) or are
pulse-free (free movement). 32 participants (18 f, 14 m; mean age 25.2 years) took part in
dyads, standing face-to-face, pointing their right index fingers at each other, with
fingertips 10–15 cm apart. In turn, one of the participants was appointed the leader, or the dyad
was instructed to share leadership. Hand movements were recorded with optical motion
capture.
In leader-follower tasks, the lag between participants was approximately 0.3 seconds. The joint leadership trials resulted in mutual adaptation, with both participants “following” each other at similar lags, with windowed analysis revealing that the direction of the lag varied at sub-second intervals. Importantly, there were no velocity differences between the leadership
conditions.
Our free, fluent movement tasks extend the range of current entrainment studies from the
strictly rhythmic and metric domain towards fluent movements, which are important for
ensemble coordination, and add to the growing literature on movement mirroring.
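A lag figure like the 0.3 seconds above is commonly obtained by cross-correlating the two movement traces; a minimal sketch under that assumption, where the signals, the 100 Hz sampling rate, and the imposed lag are all invented:

```python
# Hedged sketch: estimating leader-follower lag by cross-correlation of two
# fingertip-position time series. Signals and rates are placeholders, not data
# from the study.
import numpy as np

def estimate_lag(leader: np.ndarray, follower: np.ndarray, fs: float) -> float:
    """Lag (in seconds) by which the follower trails the leader."""
    a = leader - leader.mean()
    b = follower - follower.mean()
    xcorr = np.correlate(b, a, mode="full")
    shift = int(np.argmax(xcorr)) - (len(a) - 1)  # peak offset in samples
    return shift / fs

fs = 100.0                                    # hypothetical 100 Hz mocap rate
rng = np.random.default_rng(0)
smooth = np.convolve(rng.standard_normal(1030), np.hanning(25), mode="same")
leader = smooth[30:1030]                      # 10 s of fluent movement
follower = smooth[0:1000]                     # same movement, 0.3 s behind
lag = estimate_lag(leader, follower, fs)      # close to 0.3 s
```

A windowed version simply applies the same estimate to short overlapping segments, which is how a sub-second alternation in lag direction can be detected.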
ISBN 1-876346-65-5 © ICMPC14 461 ICMPC14, July 5–9, 2016, San Francisco, USA
Most of us can perceive and produce a regular ‘beat’ in response to musical rhythms, but
individuals vary in how well they can do so. Beat perception and production may be
influenced by music (Repp & Doggett, 2007) or dance training (Miura et al., 2011). It
may also be influenced by the particular musical instrument or style of dance with which
an individual has experience. For example, training in percussive instruments (e.g.,
drums or keyboards) and percussive styles of dance (e.g., breakdancing or tap) may lead
to better beat processing abilities than training in nonpercussive instruments (e.g., brass
or strings) or nonpercussive styles of dance (e.g., ballet or contemporary) (Cameron &
Grahn, 2014; Cicchini et al., 2012; Miura et al., 2011). In our previous work, we
examined how percussive and nonpercussive music or dance experience influenced beat
perception and production, using subtests of the Beat Alignment Test (Gold-MSI, 2014).
We found that musicians did better than dancers, but percussive training did not have a
significant effect. However, the BAT uses finger tapping to measure beat production; it is
possible that the musicians’ advantage does not generalize to types of movement more
familiar to dancers, such as whole-body movement.
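The BAT production subtest mentioned above scores tapping against beat onsets; a minimal sketch of such a score, with hypothetical tap times rather than anything from the study:

```python
# Minimal sketch of a beat-production score: signed asynchronies between tap
# times and metronome beats, summarized by their mean and variability.
# All times below are invented placeholders.
import numpy as np

beat_times = np.arange(0.0, 8.0, 0.5)            # 120 BPM metronome, seconds
tap_times = beat_times + np.array(
    [0.02, -0.01, 0.03, 0.00, -0.02, 0.01, 0.04, -0.03,
     0.02, 0.00, -0.01, 0.03, 0.01, -0.02, 0.00, 0.02])

asynchronies = tap_times - beat_times            # signed timing errors (s)
mean_async = asynchronies.mean()                 # tendency to lead/lag the beat
sd_async = asynchronies.std(ddof=1)              # consistency of tapping
```

Lower variability (`sd_async`) is the usual marker of better beat production; whole-body movement would need a different motion-derived event time in place of `tap_times`.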
Hearing loss, which most adults will experience to some degree as they age, is associated
with increased risk of cognitive impairment and decreased quality of life. Musicianship
has been shown to improve aspects of auditory and cognitive processing, but has not been
studied as a short-term intervention for improving these abilities in older adults. The
current study investigates whether short-term musical training can improve three aspects
of auditory processing: perception of speech in noise, pitch discrimination, and the neural
response to brief auditory stimuli (frequency following response; FFR). This study also
examines several measures of cognition – working memory and inhibitory control of
attention – in order to explore the extent to which gains in auditory perception might be
related to improvements in cognitive abilities. Thirty-two older adults (aged 50+)
participated in a choir for 10 weeks, during which they took part in group singing (2
hours/week) supported by individual online musical training (1 hour/week). Choir
participants (n=32) and an age-matched control group (n=20) underwent pre- and post-
training assessments, conducted during the first week of the choir and again after the last
week. Cognitive and auditory assessments were administered electronically, and the FFR
was obtained using electroencephalography (EEG). Preliminary statistical analyses
showed that choir participants improved across all auditory measures, while the control
group showed no differences. In addition, choir participation resulted in higher scores on a
non-auditory attention task, suggesting that musical training can lead to cognitive
changes outside the auditory domain. A correlational analysis demonstrated that older
adults with increased hearing loss achieved greater gains as a result of this intervention.
These findings support our hypothesis that short-term choir participation is an effective
intervention for perceptual and cognitive aspects of age-related hearing loss.
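The pre/post design described above lends itself to a paired comparison within the choir group; a sketch with SciPy on invented scores (the effect size, score scale, and group size are placeholders, not the study's data):

```python
# Hedged sketch of a pre/post comparison for the choir group. Scores are
# simulated placeholders, NOT the study's measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre = rng.normal(50, 5, size=32)        # hypothetical pre-training scores
post = pre + rng.normal(3, 2, size=32)  # hypothetical gain after 10 weeks

t_paired, p_paired = stats.ttest_rel(post, pre)  # within-subject comparison
improvement = (post - pre).mean()
```

A full analysis would additionally compare the choir group's change against the age-matched controls (e.g., a group-by-time interaction), which this sketch omits.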
Brain processes underlying sound perception are different when passively listening or
being actively involved in sound making such as in music. During the action-perception
cycle of sound making, auditory and sensorimotor areas interact closely, thus integrating
multimodal sensory feedback into subsequent action planning and execution. Auditory
evoked responses during sound making such as vocalization are attenuated compared to
passively listening to the same recorded sound. Similar movement-related response
suppression has been found for sound-making actions with instruments.
In addition to showing suppressed auditory brain responses while making sound, our
hypothesis was that newly engaged action-perception connections would affect how we
interpret the sound during subsequent listening.
Eighteen young healthy adults participated in three magnetoencephalography (MEG)
sessions while (1) passively listening to the recorded sound of a chime, (2) actively
making sounds on a chime, and (3) listening again to the recorded sounds.
In MEG data analysis, we localized sources of auditory brain responses in the bilateral
auditory cortices, and projected the MEG signals into time series of left and right
auditory cortex activity.
Averaged auditory evoked responses recorded during passive listening showed typical P1,
N1, and P2 peaks at 50 ms, 100 ms and 200 ms after sound onset and a sustained wave
according to the duration of the lingering chime sound. During sound making, we
observed similar P2 and sustained responses, while the N1 response was substantially
attenuated. Comparison of brain activity during naive and experienced listening showed
an increase in the P2 response and emerging beta activity after sound making.
Previous studies found that the P2 response and beta oscillations are specific to
perception rather than sensation of sound. Therefore, we discuss the experimental results as
an objective indicator of short-term, experience-induced neuroplastic changes of perception.
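The P1/N1/P2 peaks described above come from averaging many time-locked epochs; a synthetic sketch of that averaging (peak latencies, amplitudes, and noise level are invented, loosely matching the 50/100/200 ms pattern):

```python
# Sketch of evoked-response averaging: average noisy time-locked epochs, then
# read off the N1 as the most negative deflection near 100 ms. All data are
# synthetic, not MEG recordings.
import numpy as np

fs = 1000                                    # Hz; one sample per millisecond
t = np.arange(0, 300) / fs                   # 0-300 ms after sound onset
template = (0.5 * np.exp(-((t - 0.05) ** 2) / (2 * 0.010 ** 2))    # P1, 50 ms
            - 1.0 * np.exp(-((t - 0.10) ** 2) / (2 * 0.012 ** 2))  # N1, 100 ms
            + 0.7 * np.exp(-((t - 0.20) ** 2) / (2 * 0.020 ** 2))) # P2, 200 ms

rng = np.random.default_rng(2)
epochs = template + rng.normal(0, 0.5, size=(100, t.size))  # 100 noisy trials
evoked = epochs.mean(axis=0)                 # averaging suppresses the noise

# N1 latency: most negative point in an 80-120 ms search window
n1_latency_ms = 80 + int(np.argmin(evoked[80:120]))
```

Comparing such averages across conditions (passive listening vs. sound making) is what reveals the N1 attenuation the abstract reports.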
The Pitch Imagery Arrow Task (PIAT) was designed to reliably induce pitch imagery in
individuals with a range of musical training. The present study used the PIAT to
investigate the effects of pitch imagery on intrinsic brain rhythms measured with
magnetoencephalography (MEG). Participants were 19 healthy adults whose musical
expertise ranged from novice to professional musicians. Participants performed the pitch
imagery task with a mean accuracy of 80.2% (SD = 14.9%). Brain sources were modeled
with bilateral auditory cortical dipoles. Time-frequency analyses revealed prominent
experimental effects on auditory cortical beta-band (15-30 Hz) rhythms, which were
more strongly desynchronized by musical imagery than by musical perception. Such
effects in the beta band may reflect a mechanism for coordinating auditory and motor
processing of temporal structure in music. In addition, cortical oscillations showed a
marked relationship to a psychometric measure of musical imagery. Specifically, the
magnitude of cortical oscillations in the high gamma band (60-80 Hz) was significantly
and inversely related to auditory manipulation scores measured by the Bucknell Auditory
Imagery Scale Control (BAIS_C) which evaluates the degree of mental control that
individuals perceive they have over their imagery. Gamma band activity reflects the
recruitment of neuronal resources in auditory cortex, and the observed negative
correlation is consistent with the interpretation that pitch imagery is a less effortful
process for those who self-report higher mental control. These results add to a growing
body of evidence that intrinsic brain oscillations play crucial roles in supporting our
ability to mentally replay or create music through musical imagery.
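Beta-band desynchronization of the kind reported above is often quantified as percent power change from a baseline interval; a sketch under that assumption, with synthetic signals standing in for auditory-cortex source activity:

```python
# Sketch of a beta-band (15-30 Hz) desynchronization measure: FFT band power,
# expressed as percent change from baseline. Signals and the segmentation into
# baseline/imagery intervals are invented placeholders.
import numpy as np

def band_power(x: np.ndarray, fs: float, lo: float = 15.0, hi: float = 30.0) -> float:
    """Mean squared FFT magnitude within [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    mags = np.abs(np.fft.rfft(x))
    band = (freqs >= lo) & (freqs <= hi)
    return float((mags[band] ** 2).mean())

fs = 250.0
t = np.arange(0, 2, 1 / fs)
baseline = np.sin(2 * np.pi * 20 * t)        # strong 20 Hz beta rhythm
imagery = 0.5 * np.sin(2 * np.pi * 20 * t)   # attenuated during imagery

erd_percent = 100 * (band_power(imagery, fs) - band_power(baseline, fs)) \
              / band_power(baseline, fs)      # negative = desynchronization
```

Here halving the amplitude quarters the band power, so the change is -75%; in practice the same ratio is computed per trial and time-frequency bin.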
Rhythmic drumming has been used for centuries to alter consciousness and induce states
of trance. Rhythm-induced trance is commonly practiced in shamanism, humanity’s most
ancient spiritual and healing tradition. Most forms of shamanism use similar repetitive
rhythms, which suggests a common biological basis. Despite similar techniques across
cultures and powerful phenomenological content, little is known about the neural and
physiological mechanisms underlying this form of trance. We present a series of studies
that examine the neural and physiological correlates of rhythm-induced trance in
experienced shamanic practitioners. In the first study, we used fMRI to examine the
neural patterns associated with trance. Experienced shamanic practitioners (n=15)
underwent 8-minute brain scans while they listened to rhythmic drumming and entered a
trance state (or remained in non-trance in a control condition). During trance, brain
networks displayed notable reconfigurations, including increased connectivity in regions
associated with internal thought (the default mode’s posterior cingulate cortex) and
cognitive control (dorsal anterior cingulate cortex and insula), as well as decreased
connectivity within the auditory pathway. Together this network configuration suggests
perceptual decoupling and that the repetitive drumming was gated out to maintain an
internally oriented stream of consciousness. We followed up this work in an EEG/ERP
study that used a similar design to examine auditory gating and network activity while
shamanic practitioners (n=16) experienced rhythm-induced trance and a control mind-
wandering state. We will also present work that examines physiological correlates and
responses to trance including a study on heart-rate variability (a measure of sympathetic
vs. parasympathetic dominant states) (n=14) and the stress response measured in terms of
cortisol levels after a ritual session (n=20). Together this work helps explicate the
common use of rhythmic drumming to alter consciousness and why trance is a valuable
tool to facilitate insight in many cultures.
Deficits in the ability to perceive and produce a beat correlate with the impairment of
coordinated movement in disorders such as Parkinson’s disease. Neuroimaging has shown
that rhythmic movements and maintaining a steady beat involve the interaction of
auditory and motor areas, within a wider network of prefrontal, parietal, cerebellar, and
striatal regions. How these dynamic interactions establish an internal sense of beat and
enable us to produce rhythms is only beginning to be understood, yet may ultimately help
develop more effective interventions for movement disorders. This study therefore
examined brain activity across a large network using functional (fMRI), functional
connectivity (fcMRI), and effective connectivity MRI (ecMRI) techniques during a beat
production and maintenance task of contrasting strongly and weakly beat-inducing
stimuli. fMRI BOLD activation results from 14 participants (4F, 24 ± 4.5 yrs) are
consistent with the previous literature. Tapping to a strong beat compared to self-paced
tapping showed the greatest relative activation in the bilateral auditory cortices,
supplementary motor areas (SMA), premotor cortices (PMC), and inferior frontal gyri
(IFG). Relative to a weak beat, significant activation was seen in the putamen and
thalamus. Functional connectivity was examined across a more comprehensive group of
brain regions of interest (ROIs) than has been previously reported (18 ROIs), including
regions of the cerebellum, thalamus, inferior parietal lobe, and caudate nucleus. These
analyses demonstrated that connectivity is remarkably stable regardless of stimulus beat
strength or self-paced tapping. Examination of the strongest signal correlations showed
the left IFG, thalami, and particularly the SMA may act as important connectivity “hubs.”
A preliminary ecMRI analysis using Group Iterative Multi Model Estimation (GIMME)
confirmed the fcMRI analyses' outline of the network’s structure as well as its
identification of important network hubs.
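Identifying connectivity "hubs" as in the fcMRI analysis above can be sketched as correlating all ROI pairs and ranking ROIs by summed connection strength; the 18 synthetic time series below are placeholders, not fMRI data:

```python
# Sketch of hub identification from ROI time series: build a correlation
# matrix, then rank nodes by summed absolute connection strength.
# All time series are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)
n_rois, n_timepoints = 18, 200
ts = rng.standard_normal((n_rois, n_timepoints))
# Make ROI 0 track the rest of the network, so it behaves like a hub
ts[0] = ts[1:].mean(axis=0) + 0.3 * rng.standard_normal(n_timepoints)

corr = np.corrcoef(ts)                 # 18 x 18 functional connectivity matrix
np.fill_diagonal(corr, 0.0)            # ignore self-connections
strength = np.abs(corr).sum(axis=1)    # node strength per ROI
hub = int(np.argmax(strength))         # index of the strongest hub
```

Real pipelines add preprocessing (detrending, nuisance regression) and threshold or normalize the matrix before graph measures, which this sketch omits.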
Clinically, ECoG is the gold standard for locating epileptic sources in therapy-refractory
epilepsy patients. Within cognitive neuroscience it provides the combination of high
temporal (in ms range) and high spatial (in mm range) resolution. Those features offer a
high potential for (a) exploring the neural encoding of music within the secondary
auditory areas via cortical surface electrodes and (b) exploring activations of limbic
structures due to music induced emotions via depth electrodes in the hippocampus and
amygdala.
Prof. Knight will focus on the use of auditory stimuli to study language processes using
ECoG. He will address the cortical mechanisms of phoneme and word representation,
categorical phoneme representation and speech production.
Dr. Mikutta will present ECoG data recorded from 28 patients while they listened to
music. Using the high gamma filtered data from the informative electrodes (144 out of
1760 over all patients) a spectrotemporal decoding model was fitted. This model was able
to reconstruct the spectrogram properties of a given piece of music directly from cortical
recordings of the brain. The perceptual quality of the sound reconstructions increased
with the spatial density of the electrodes, and the extent of frequency tuning sampling.
These results reveal that the spectro-temporal sound representation of music can be
decoded using neural signals acquired directly from the human cortex.
Finally, Dr. Omigie will demonstrate the different roles of the hippocampus and amygdala
in the feeling of expectedness and unexpectedness in music. She presents data from
patients with depth electrodes in the hippocampus, amygdala and lateral temporal areas.
Using Granger causality analysis she demonstrates differences in information flow during
expected and unexpected notes in a melody. While the hippocampus processes subjective
feelings of familiarity in response to expected notes, the amygdala may facilitate feelings
of tension or surprise in response to unexpected, high-information-content (IC) notes.
Language, attention, perception, memory, motor control and executive control generate
focal high frequency oscillatory activity in the range of 70-250 Hz (high gamma, HG).
The HG response in the human electrocorticogram (ECoG) precisely tracks auditory
processing in the neocortex and can be used to study the spatio-temporal dynamics of
auditory and linguistic processing. ECoG data will be reviewed addressing the neural
mechanisms of speech suppression, categorical representation and the timing of speech
perception and production in peri-sylvian language regions.
Music is a universal aspect of human behavior spanning all cultures. We addressed how
music is represented in the human brain by recording electrocorticographic signals
(ECoG) from 28 patients with intracranial electrodes covering left and right auditory
cortices. A decoding model based on sound spectral features and high gamma cortical
surface potentials was able to reconstruct music with high sound quality as judged by
human listeners. The perceptual quality of the sound reconstructions increased with the
spatial density of the electrodes, and the extent of frequency tuning sampling. These
results reveal that the spectro-temporal sound representation of music can be decoded
using neural signals acquired directly from the human cortex.
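The decoding step described here can be illustrated in miniature: fit a regularized linear map from high-gamma features to spectrogram bins, then reconstruct held-out spectrogram frames. The sketch below uses synthetic data; the dimensions, noise level, and ridge penalty are illustrative assumptions, not the study's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 144 informative electrodes, 32 spectrogram
# frequency bins, 2000 time samples (illustrative, not the study's data).
n_elec, n_freq, n_time = 144, 32, 2000

# Synthetic ground truth: high-gamma activity linearly related to the
# stimulus spectrogram, plus noise.
W_true = rng.normal(size=(n_elec, n_freq))
spectrogram = rng.normal(size=(n_time, n_freq))
high_gamma = spectrogram @ W_true.T + 0.1 * rng.normal(size=(n_time, n_elec))

# Split into train/test halves and fit a ridge-regularized linear decoder
# mapping neural features -> spectrogram bins (closed-form solution).
split = n_time // 2
Xtr, Xte = high_gamma[:split], high_gamma[split:]
Ytr, Yte = spectrogram[:split], spectrogram[split:]
lam = 1.0
W = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(n_elec), Xtr.T @ Ytr)

# Reconstruct the held-out spectrogram and score each frequency bin
# by correlation with the true spectrogram.
Yhat = Xte @ W
r = [np.corrcoef(Yte[:, k], Yhat[:, k])[0, 1] for k in range(n_freq)]
print(f"mean reconstruction correlation: {np.mean(r):.2f}")
```

The reported dependence of reconstruction quality on electrode density and frequency-tuning coverage corresponds, in this toy setup, to how many informative columns of `high_gamma` the decoder gets to see.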
Several studies comparing adult musicians and non-musicians have shown that music training
is associated with functional and anatomical brain differences. It is unknown, however,
whether those differences result from lengthy musical training or from pre-existing biological
traits or social factors favoring musicality. As part of an ongoing 5-year longitudinal study,
we investigated, using behavioral, neuroimaging and electrophysiological measures, the
effects of a music training program on the auditory and language development of children,
over the course of three years, beginning at ages 6-7. The training was group-based and inspired by
El Sistema. We compared the children in the music group with two groups of “control”
children of the same socio-economic background, one involved in sports training, another not
involved in any systematic training. Prior to participating, children who began training in
music were not different from those in the control groups relative to cognitive, motor,
musical, or brain measures. After three years, we now observe that children in the music
group, but not in the two control groups, show an enhanced ability in pitch and rhythm
perception and phonological processing and an accelerated maturity of auditory processing as
measured by cortical auditory evoked potentials. Our results support the notion that music
training results in significant brain changes in school-aged children.
Music engagement for the purposes of cognitive and emotional regulation has been
shown to be consistently associated with, and predictive of, subjective indices of
well-being and adaptive emotion regulation strategy. The present study investigated
the extent to which music engagement contributed to neurophysiological substrates
of well-being, after statistically controlling first for age and gender, and second for
music listening and training, in a sample of 38 participants between the ages of 18 and 42
(M = 27.11, SD = 6.66). Frontal asymmetry and heart rate variability were acquired using
electroencephalography (EEG) and electrocardiography (ECG). Intentional use of music
for cognitive and emotional regulation predicted both well-being and emotion regulation
capacity, as measured respectively by resting frontal asymmetry and high frequency heart
rate variability. Furthermore, this predictive value was maintained after statistically
controlling for age, gender, music listening and music training. These findings suggest
that purposeful engagement with music for cognitive and emotional regulation could be
strategically targeted for future well-being interventions.
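Resting frontal asymmetry of the kind measured here is conventionally computed as the difference in log alpha-band (8-13 Hz) power between homologous right and left frontal electrodes. The sketch below is a minimal version assuming a simple periodogram estimate and synthetic signals, not the study's actual EEG pipeline:

```python
import numpy as np

def bandpower(sig, fs, f_lo, f_hi):
    """Power in [f_lo, f_hi) Hz from a simple rfft periodogram."""
    spec = np.abs(np.fft.rfft(sig)) ** 2 / len(sig)
    freqs = np.fft.rfftfreq(len(sig), d=1 / fs)
    band = (freqs >= f_lo) & (freqs < f_hi)
    return spec[band].sum()

def frontal_asymmetry(left, right, fs):
    """ln(right alpha power) - ln(left alpha power). Because alpha is
    inversely related to activation, positive values indicate relatively
    greater left-frontal activity."""
    return np.log(bandpower(right, fs, 8, 13)) - np.log(bandpower(left, fs, 8, 13))

# Toy signals: stronger 10 Hz alpha on the right electrode.
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
left = 1.0 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)
right = 2.0 * np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=t.size)
print(f"asymmetry score: {frontal_asymmetry(left, right, fs):.2f}")
```

In practice such scores are computed from artifact-cleaned resting EEG at homologous electrode pairs (commonly F4/F3) and averaged over epochs, but the log-power difference above is the core of the index.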
was not significant, F(1, 9) = 0.34, p = 0.58.
Noisy versions were correlated at a significant level (p < .05) with perceived bodily exertion (r = .55), negative valence (.55), and anger (.87).

Table 1. Neural Correlates of Empathic Concern in Timbre Perception
D. Acoustical Attributes of Stimuli
Noisy signals were characterized acoustically by elevated
inharmonicity, spectral centroid, spectral flatness, zero-cross
rate, and auditory roughness, as measured using MIRtoolbox
1.4 (Lartillot & Toiviainen, 2007) in MATLAB.
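Three of these descriptors can be computed from a magnitude spectrum in a few lines. The following is a simplified Python sketch on synthetic signals, not a reimplementation of the MATLAB MIRtoolbox functions used in the study:

```python
import numpy as np

def spectral_descriptors(sig, fs):
    """Spectral centroid (Hz), spectral flatness, and zero-cross rate,
    computed from a Hann-windowed magnitude spectrum (simplified)."""
    mag = np.abs(np.fft.rfft(sig * np.hanning(len(sig))))
    freqs = np.fft.rfftfreq(len(sig), d=1 / fs)
    centroid = (freqs * mag).sum() / mag.sum()
    power = mag ** 2 + 1e-12
    # Flatness: geometric mean over arithmetic mean of the power spectrum
    # (near 1 for noise, near 0 for a pure tone).
    flatness = np.exp(np.mean(np.log(power))) / power.mean()
    # Zero-cross rate: fraction of samples at which the sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(sig))) > 0)
    return centroid, flatness, zcr

fs = 44100
t = np.arange(0, 0.5, 1 / fs)
tone = np.sin(2 * np.pi * 440 * t)                    # harmonic, "regular"
noise = np.random.default_rng(0).normal(size=t.size)  # aperiodic, "noisy"
tone_d = spectral_descriptors(tone, fs)
noise_d = spectral_descriptors(noise, fs)
print("tone:  centroid=%.0f Hz, flatness=%.3f, zcr=%.3f" % tone_d)
print("noise: centroid=%.0f Hz, flatness=%.3f, zcr=%.3f" % noise_d)
```

As in the stimuli described above, the noise-like signal scores higher on all three measures than the harmonic tone; inharmonicity and auditory roughness require more elaborate, partial-tracking models and are omitted here.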
E. Interpersonal Reactivity Index
The Interpersonal Reactivity Index (IRI), a 28-item survey
designed to measure individual differences in empathy, was
given to all subjects prior to the scan (Davis, 1983). The index
uses 5-point Likert questions to capture four subscales of
empathy: perspective taking, fantasy, empathic concern, and
personal distress, of which “empathic concern” (EC) is the
most relevant to the present study. The EC subscale attempts
to capture the emotional component of empathy, or “feeling
with someone else” (Preston & de Waal, 2002). The highest
EC score possible is 35 (7 questions x 5 points each); mean
score of the 14 participants (one subject did not complete the
IRI) was 29.07, SD = 3.08.
F. MRI Procedure
Subjects listened to the randomized stimuli while being
scanned. Each stimulus was repeated 6 times with 10 ms of
silence between each onset (approximately 12 s total per
timbre). After the presentation of all 12 stimuli, subjects were
given a 16 s baseline period of silence. The full block was
repeated 3 times.
G. Data Acquisition, Preprocessing, and Statistics
Images were acquired on a Siemens 3T Trio MRI scanner,
and audio stimuli were timed and presented with Presentation
software through noise-cancelling, magnet-compatible
SereneSound headphones. Image preprocessing and data
analysis were performed with FSL version 5.0.4. Images were
realigned to the middle volume to compensate for any head
motion using MCFLIRT (Jenkinson et al., 2002). Statistical
analyses were performed at the single-subject level using a
general linear model with fMRI Expert Analysis Tool (FEAT,
version 6.00), and group-level analysis was carried out using
FSL FLAME stage 1 and 2 (Beckmann, Jenkinson, & Smith,
2003). All images were thresholded at Z > 2.3 (p < .01),
corrected for multiple comparisons using cluster-based
Gaussian random field theory controlling family-wise error
across the whole brain at p < .05. A region of interest (ROI)
was determined from the task > baseline contrast, and
subsequent contrasts were masked for this ROI. In our contrasts, the "noisy" version of
the stimuli comprised an equal contribution of noisy conditions #1 and 2.

Figure 1. Empathy modulates SMA: "empathic concern" (EC) covariate in the contrast noisy > regular timbre
in Figure 1 (p < .01). To make sure that the correlation was not driven by outliers, we extracted parameter estimates from the peak SMA voxel, and plotted the correlations with EC (r = .76) as shown in Figure 2.

The electric guitar and the voice exhibited a similar motor pattern. Activity in SMA and superior parietal lobule correlated with empathic concern when subjects heard distorted guitar timbre compared to a regular, "clean" electric guitar signal. Noisy vocal timbres also modulated SMA; additionally, the contrast noisy > regular voice showed correlations with EC in left parietal operculum, Rolandic operculum, inferior parietal lobule, and Broca's area.

Figure 2. Parameter estimates of peak SMA voxel (x) correlated with "empathic concern" (EC) scores (y) when subjects hear noisy timbres

IV. DISCUSSION

This study examined the neural correlates of dispositional empathy in the processing of isolated musical timbres, specifically those heard as "noisy." We present two novel findings. First, our study reveals a possible motor component in timbre processing, especially among listeners with higher levels of dispositional empathy. Previous neurophysiological research has demonstrated the involvement of large areas of the temporal lobe (e.g., superior temporal gyrus, superior temporal sulcus, and Heschl's gyrus) in the perception of timbre (Platel et al., 1997; Pantev et al., 2001; Menon et al., 2002; Alluri et al., 2012); but only one study, to our knowledge, has implicated motor regions (Halpern et al., 2004). Our basic task > baseline contrast (without empathy covariate), which was used as an ROI for all subsequent analyses, found the contribution of a range of motor areas in timbre perception (Wallmark, Deblieck, & Iacoboni, in prep), suggesting multi-modal processing of musical timbre. Cross-fertilizing this result with behavioral data from the IRI indicates that this motor resonance is even more acute for empathic people.

Second, the present study suggests that the supplementary motor area (SMA) is preferentially engaged among high-empathy individuals in the perception of subjectively and acoustically "noisy" musical timbres (over "regular" timbral qualities performed on the same sound producers). SMA has been implicated in internally generated movement, coordination of action sequences, and postural stability (Roland et al., 1980; Nguyen, Breakspear, & Cunnington, 2014); it has also been found to contribute to the vividness of auditory imagery, including timbral imagery (Halpern et al., 2004; Lima et al., 2015). Halpern et al. (2004) attributed SMA activity in part to subvocalization of timbral attributes, and the present study would seem to partially corroborate this explanation. Indeed, we interpret this result as a possible instance of sensorimotor integration: SMA activity could reflect a neurophysiological propensity to link sounds with their associated actions.

In a single-neuron study of action execution and observation, the SMA was shown to possess "mirroring" properties; that is, it was active both when subjects performed and observed certain hand grasping actions and emotional facial expressions (Mukamel et al., 2010). Even in passive listening, then, timbres may suggest the motions (and emotions) that contribute to them: perhaps the prevalence of SMA here is evidence for the phenomenon of motor resonance demonstrated previously in other auditory domains (Aglioti & Pazzaglia, 2010), including music perception (Zatorre, Chen, & Penhune, 2007). Following this embodied interpretation, people do not just passively listen to different timbres—they enact some of the underlying physical determinants of sound production, whether through subvocalization, biography-specific act/sound associations (Margulis et al., 2009), or other mechanisms of audio-motor coupling. The voice appears to be categorically distinct in this regard: in addition to SMA, we found activation in Broca's area and Rolandic operculum, important areas for speech production and larynx control (Brown, Ngan, & Liotti, 2008), when subjects heard noisy vocal timbres.

What accounts for the difference in neural processes between noisy and regular qualities of sound? Noisy timbre—i.e., spectral characteristics that are generally disliked, such as excessive inharmonicity, spectral centroid, and auditory roughness—appears to modulate motor resonance to a greater degree than non-noisy timbres, and we make sense of this finding through an ecological and embodied lens. Distorted qualities of timbre in vocal and (non-synthetic) instrumental sound production typically index physical exertion, and are associated with high-potency affect and low valence (Scherer, 1995). Via the ecological contingencies of sound production and perception, the listener parses these acoustic cues into their relevant sensorimotor and affective components (Juslin & Västfjäll, 2008; Wallmark, 2014). The ethological association between noisy, aperiodic qualities of timbre and arousal appears to be phylogenetically ancient, and is not exclusive to humans (Tsai et al., 2010; Blumstein, Bryant, & Kaye, 2012). In sum, listeners may respond with greater intensity to more "intense" timbres owing to the ecological urgency typically signaled by such sound qualities. As our results indicate, however, motor resonance is not necessarily dictated by timbral quality alone: there is a probable social-cognitive variable as well (Overy & Molnar-Szakacs, 2009; Launay, 2015).

According to Theodor Lipps's early formulation (Einfühlung), empathy enables one individual to intersubjectively "feel into" another through inner imitation. To experience another's point of view, in this account, is not an abstract mental representation of what it feels like "in the other's shoes," but rather constitutes the dynamic co-experience of affective and cognitive states (Gallese, 2007). In musical terms, empathy has been shown to moderate between recognized and induced emotions in music: the greater the dispositional empathic concern on a social level, the more likely an individual is to exhibit a strong affective response to music (Egermann & McAdams, 2013). Though
we did not explicitly address musical affect in this study, our results indicate that empathic concern modulates motor resonance, which has been theorized to play an important role in emotional contagion (Molnar-Szakacs & Overy, 2006). Individuals with higher dispositional empathy may be more likely to take up the motor invitation of a timbre, in a sense co-experiencing the actions involved in a sound's production. The singing voice is particularly direct as a locus of motor resonance, as previously mentioned, though noisy signals in aggregate (including instruments) drive SMA activity among empathic people as well. This would seem to support neurophysiological studies demonstrating a robust correlation between strength of activity in pre- and primary motor cortices and empathic concern (measured by the IRI) when observing emotional behaviors of others (Carr et al., 2003; Pfeifer et al., 2008).

Egermann and McAdams (2013) found an effect of self-reported empathy on affective valence, and positively valenced non-verbal vocalizations were reported in another study to elicit a stronger motor resonance response (Warren et al., 2006). Differing from previous studies, we found instead that empathic concern covaries with negatively valenced ("noisy") sounds. This provides tentative evidence for our hypothesis that dispositionally empathic people would tend to exhibit greater motor susceptibility to timbral qualities that betoken heightened arousal, bodily exertion, and ecological urgency. We speculate that social and musical empathy are closely related; empathic concern for the well-being of others is thus correlated with greater motor attunement toward others' expressions of anger, pain, danger, and other high-potency affective contexts.

V. CONCLUSION

This study used fMRI to explore the neural correlates of empathic concern in the processing of certain features of isolated timbres. We present preliminary neurophysiological evidence for the role of motor regions, especially the supplementary motor area (SMA), in the perception of "noisy" qualities of timbre by empathic individuals. We interpret a number of novel findings through the framework of embodied cognition to posit a functional link between motor systems, timbre perception, and the dispositional trait of empathic concern (as evaluated using the IRI). This is the first study to suggest that empathy modulates not only music processing, as has been explored elsewhere (Egermann & McAdams, 2013; Vuoskoski & Eerola, 2012), but also the perception of individual timbres removed from any musical context.

There are a number of limitations to the present study. Future researchers might find it fruitful to investigate the somatotopy of motor resonance in empathy-modulated timbre perception. Localization tasks for vocal production and instrument-specific actions might help disambiguate the relationship between gross motor resonance and muscle-specific co-activation in sound production and perception. Further, it remains unclear how timbre-specific effects of empathy relate (if at all) to music listening more generally.

ACKNOWLEDGMENT

We would like to thank Roger Kendall, Katy Cross, and Marita Meyer for their assistance at various stages of this study. Our research was sponsored by a UCLA Transdisciplinary Seed Grant.

REFERENCES

Aglioti, S., & Pazzaglia, M. (2010). Representing actions through their sound. Experimental Brain Research, 206, 141–151.
Alluri, V., et al. (2012). Large-scale brain networks emerge from dynamic processing of musical timbre, key, and rhythm. Neuroimage, 59, 3677–3689.
Beckmann, C. F., Jenkinson, M., & Smith, S. M. (2003). General multi-level linear modelling for group analysis in fMRI. Neuroimage, 20, 1052–1063.
Blumstein, D., Bryant, G. A., & Kaye, P. (2012). The sound of arousal in music is context-dependent. Biology Letters, 8, 744–747.
Brown, S., Ngan, E., & Liotti, M. (2008). A larynx area in the human motor cortex. Cerebral Cortex, 18, 837–845.
Carr, L., Iacoboni, M., Dubeau, M.-C., Mazziotta, J. C., & Lenzi, G. L. (2003). Neural mechanisms of empathy in humans: A relay from neural systems for imitation to limbic areas. Proceedings of the National Academy of Sciences, 100, 5497–5502.
Christov-Moore, L., & Iacoboni, M. (2016). Self-other resonance, its control and prosocial inclinations: Brain-behavior relationships. Human Brain Mapping, 37, 1544–1558.
Davis, M. H. (1983). Measuring individual differences in empathy: Evidence for a multidimensional approach. Journal of Personality and Social Psychology, 44, 113–126.
Egermann, H., & McAdams, S. (2013). Empathy and emotional contagion as a link between recognized and felt emotions in music listening. Music Perception, 31, 139–156.
Gallese, V. (2007). Before and below 'theory of mind': Embodied simulation and the neural correlates of social cognition. Philosophical Transactions: Biological Sciences, 362, 659–669.
Gazzola, V., Aziz-Zadeh, L., & Keysers, C. (2006). Empathy and the somatotopic auditory mirror system in humans. Current Biology, 16, 1824–1829.
Halpern, A., Zatorre, R. J., Bouffard, M., & Johnson, J. A. (2004). Behavioral and neural correlates of perceived and imagined musical timbre. Neuropsychologia, 42, 1281–1292.
Hatfield, E., Cacioppo, J. T., & Rapson, R. L. (1994). Emotional contagion. Cambridge: Cambridge University Press.
Jenkinson, M., Bannister, P., Brady, J. M., & Smith, S. M. (2002). Improved optimization for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17, 825–841.
Juslin, P. N., & Västfjäll, D. (2008). Emotional responses to music: The need to consider underlying mechanisms. Behavioral and Brain Sciences, 31, 559–621.
Kaplan, J., & Iacoboni, M. (2006). Getting a grip on other minds: Mirror neurons, intention understanding, and cognitive empathy. Social Neuroscience, 1, 175–183.
Kendall, R. A., & Carterette, E. C. (1993). Verbal attributes of simultaneous wind instrument timbres: I. von Bismarck adjectives. Music Perception, 10, 445–467.
Lartillot, O., & Toiviainen, P. (2007). A Matlab toolbox for musical feature extraction from audio. Proceedings of the 10th International Conference on Digital Audio Effects, Bordeaux, 237–244.
Launay, J. (2015). Musical sounds, motor resonance, and detectable agency. Empirical Musicology Review, 10, 30–40.
Leman, M. (2007). Embodied music cognition and mediation technology. Cambridge, MA: MIT Press.
Research evidence suggests that sustained and intense musical training can be a powerful
tool in changing the structure of the human brain. When comparing the brain structures of
trained musicians to non-musicians, research evidence reveals differences can be found in
multiple regions, including the corpus callosum, premotor areas, and superior temporal
gyrus. However, the characteristics that define a formally trained musician are not fully
consistent from study to study. While not an exhaustive list, these variations include:
different total years of intensive musical training, different definition(s) of intense
musical training, and different primary instrument(s) of training. This study quantified
these characteristics to help further define formally trained musicians through the use of
an in-depth musical training survey. The purpose was to explore whether there are
training threshold characteristics at which a significant amount of structural and
functional network changes in the brain occur using fMRI (functional Magnetic
Resonance Imaging). Participants were grouped according to their level of musical
training: (1) professional musicians (post-secondary training), (2) school-trained
musicians (at least seven years of intensive ‘school-age’ musical training), and (3) non-
musicians (at maximum receiving only the mandatory grade school general music
classes). Graph theory methods were applied to analyze network connectivity properties.
Results reveal unique structural and functional changes of the human brain based on use-
dependent intensive training. Through the results, evidence suggests that although
receiving post-secondary education in music aids in the structural and functional change
of the human brain, brain network differences can also be found between those who
received seven years of intensive school-aged instrumental musical training and non-
musicians. These results suggest that future network neuroimaging investigations are
warranted to further our understanding of how use-dependent changes occur in the
brain’s structural and functional network as a result of intensive musical training.
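Graph-theoretic connectivity analysis of this kind typically begins by thresholding a connectivity matrix into a binary adjacency matrix and computing node- and network-level metrics. The sketch below shows two basic metrics on a toy matrix; the values are illustrative, and a real pipeline would use dedicated tools rather than this minimal version:

```python
import numpy as np

def degree_and_density(conn, threshold):
    """Binarize a symmetric connectivity matrix at a threshold and
    return node degrees plus overall network density."""
    adj = (np.abs(conn) > threshold).astype(int)
    np.fill_diagonal(adj, 0)                 # no self-connections
    degree = adj.sum(axis=1)                 # edges incident to each node
    n = adj.shape[0]
    density = adj.sum() / (n * (n - 1))      # fraction of possible edges
    return degree, density

# Toy 4-node connectivity matrix (illustrative correlation values).
conn = np.array([[1.0, 0.8, 0.1, 0.7],
                 [0.8, 1.0, 0.2, 0.6],
                 [0.1, 0.2, 1.0, 0.1],
                 [0.7, 0.6, 0.1, 1.0]])
degree, density = degree_and_density(conn, threshold=0.5)
print(degree.tolist(), round(density, 2))   # node 2 is weakly connected
```

Group comparisons like those in the study would compute such metrics per participant and then contrast them across the professional, school-trained, and non-musician groups.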
Charles S. Wasserman¹, Jung Nyo Kim¹, Edward W. Large¹ and Erika Skoe²
¹ Department of Psychological Sciences, University of Connecticut, Storrs, CT USA
² Department of Speech, Language & Hearing Sciences, University of Connecticut, Storrs, CT USA
This study aimed to investigate whether the sensorimotor system, as measured by 32-
channel cortical EEG, would entrain to a complex rhythm at the pulse frequency even
when the complex rhythm contained no spectral power at that frequency. The experiment
utilized four different rhythms of varying complexity (1 simple, 2 complex, and 1
random). Fast Fourier Transform (FFT) of the Hilbert envelope showed energy at the
repetition frequency (2Hz) for the simple rhythm, but no spectral energy at the missing
pulse frequency (2Hz) for the complex rhythms. EEG responses to these stimuli were
recorded to look for the neural oscillations at the missing pulse frequency predicted by
dynamical analysis. We report evidence of a 2Hz response in the EEG to missing pulse
rhythms. These data support the theory that rhythmic synchrony occurs as the result of
an emergent population oscillation that entrains at this particular frequency. We also
discuss generators of the 2Hz response component.
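The envelope-spectrum computation described here can be sketched for the simple-rhythm case: build a 2 Hz burst train, take the Hilbert envelope, and locate the peak of its spectrum. The stimulus parameters below are illustrative, not the study's:

```python
import numpy as np

def envelope(sig):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert)."""
    n = len(sig)
    spec = np.fft.fft(sig)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:n // 2] = 2.0
    h[n // 2] = 1.0          # assumes even n
    return np.abs(np.fft.ifft(spec * h))

fs, dur = 500, 8.0
t = np.arange(int(fs * dur)) / fs

# Simple rhythm: 50 ms tone bursts (100 Hz carrier) repeating at 2 Hz.
sig = np.zeros(t.size)
for onset in np.arange(0, dur, 0.5):
    idx = (t >= onset) & (t < onset + 0.05)
    sig[idx] = np.sin(2 * np.pi * 100 * t[idx])

# FFT of the (mean-removed) envelope: for the simple rhythm, the
# strongest component sits at the 2 Hz repetition frequency.
env = envelope(sig)
mags = np.abs(np.fft.rfft(env - env.mean()))
freqs = np.fft.rfftfreq(env.size, d=1 / fs)
print(f"peak envelope frequency: {freqs[mags.argmax()]:.2f} Hz")
```

For the "missing pulse" rhythms in the study, the same analysis shows no stimulus energy at 2 Hz, which is why a 2 Hz component in the EEG must be generated neurally rather than inherited from the sound.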
Music production and music perception both involve efficient integration of visual,
auditory, and motor networks. Research from multiple neuroscientific methods has
uncovered a large cortical and subcortical network of brain areas responsible for musical
tasks, including premotor cortex, supplementary motor area and dorsolateral prefrontal
regions. To date, research has focused on how visual and auditory information is
integrated, as well as how the motor system is recruited during such tasks. We examine
motor recruitment in music perception when auditory and visual information are at odds
with one another. Using Transcranial Magnetic Stimulation and electromyography, we
measure corticospinal excitability of the cortical hand area while participants experience
congruent and incongruent audio-visual musical stimuli. Stimuli were either congruent
audiovisual guitar playing, incongruent audio guitar with a video of a mouth singing, or
incongruent singing audio with a video of guitar playing. Results, including the
interpretation of corticospinal excitability measured at various time points after stimulus
onset, are discussed in the context of motor simulation and theories of sensory-motor
integration.
ABSTRACT

It has been shown that learning experience is consolidated after sleep and that learning some aspects of sensory stimuli, such as phonetic categorization, is reflected in the evoked response obtained via event-related potentials (ERPs). Active engagement in goal-oriented tasks is known to facilitate sensory learning more so than passive exposure. Musicians who participate in active sensory learning over many years experience changes in their auditory evoked responses (AERs). However, it remains unclear how actively learning to play an unfamiliar instrument changes auditory processing, and how such an effect changes after sleep. Here, we trained musicians to play a novel virtual instrument with a synthesized sound, connected to a hand-tracking game controller, for about half an hour. The EEG was recorded while participants listened to this instrument's tones as well as piano tones, immediately before and after training, and a day after the learning (termed Pre, Post1, and Post2). To make the sound of the virtual instrument novel, we used a sharp attack of a struck modal bar combined with a sustained plucked string in the instrument model. The height of each hand determined the pitch (the dominant hand) and duration separately. The range of the motion was calibrated for each participant. The auditory evoked responses showed a clear P1-N1-P2 complex for both stimuli, followed by a small N2-like peak around 300 ms, further followed by a slow positivity. Around 450-500 ms, the sleep-related enhancement of a positivity was observed to be significantly different between Post1 and Post2, only in the response to the virtual instrument tones, but not to the piano tones. Moreover, this enhancement was focused in the midline and right fronto-central electrodes. Our data demonstrate the effects of sleep on auditory processing, possibly suggesting the involvement of the right auditory cortex, related to spectral timbre information processing, and of medial frontal motor-related functions even during passive listening.

Figure 1. Schedule of training and recording sessions for each participant.

I. INTRODUCTION

Studies have shown that sleep plays a key role in the consolidation of memory for motor learning, spatial learning and perceptual learning (Stickgold, 2005). When focusing on perceptual learning, different sleep periods such as rapid eye movement (REM) sleep, paradoxical sleep, and slow-wave sleep are important for promoting brain plasticity and, in the adult brain, for learning and memory (Karni et al., 1994; Maquet, 2001; Stickgold et al., 2000). Importantly, post-sleep improvements in perceptual tasks can be seen after the first night of sleep, after the subsequent nights of sleep, and after between-session sleeps (Censor, Karni, & Sagi, 2006). In general, an adaptive procedure is more effective in improving perceptual discrimination compared to rote practice, and increasing the number of training sessions has a limit in its benefit.

The benefits of sleep for auditory perceptual learning have also been shown in phonological recognition performance following sleep (Fenn, Nusbaum, & Margoliash, 2003) and pattern discrimination of tone sequences (Atienza et al., 2004; Atienza, Cantero, & Quiroga, 2005). The latter studies, in particular, used event-related potentials (ERPs) to demonstrate an increased mismatch negativity (MMN) component, which reflects the neural process of pre-attentive perceptual discrimination, after post-training sleep. Furthermore, perceptual learning also results in an increased N1-P2 complex of the obligatory auditory ERP, or auditory evoked response (AER) (Tremblay et al., 2001; Tremblay et al., 2009), especially after post-training sleep (Alain et al., 2015). These AERs are known to show specific enhancement with musical training (Kuriki, Kanda, & Hirata, 2006; Pantev et al., 1998; Shahin et al., 2003). Noteworthy is that the N1-P2 complex is also sensitive to music timbre (Shahin et al., 2005), in particular related to individual musical training (Pantev et al., 2001). Since training with a music-making activity, compared to discrimination training, was more effective in improving behavioural discrimination of tonal patterns and enhancing the accompanying MMN (Lappe et al., 2008), we hypothesized that actively learning to play a novel instrument, which produces an unfamiliar timbre and involves no physical contact with acoustic objects, would promote plastic changes of auditory cortical processes for processing the timbre of this instrument, and that these changes would be sensitive to post-training sleep. We specifically designed the instrument (henceforth, 'virtual instrument') such that past experience in playing traditional acoustic instruments does not interact with the learning. That is, the virtual instrument has to be played with arm and hand gestures, but without causing any physical excitation such as string-plucking, string-bowing, or surface-hitting. The timbre of this instrument was synthesized by combining a modal bar physical model with a plucked string physical model to, again, avoid effects of past experience in associating specific gestures with a particular instrument physical system and its sounds.

Figure 2. GameTrak controller used to play the virtual instrument
II. METHODS
A. Participants
Twelve young adult musicians (4 females) participated in our study. They had at least 8 years of formal musical training, with training ranging from 10 to 30 years. Each participant's dominant hand was used for pitch control, to keep playing of the virtual instrument uniform across participants. They gave signed informed consent for participation, and the procedure was approved by the Stanford IRB.
B. Sessions
This study took place over two days. First, the participants were fitted with a 64-channel Neuroscan Quik-Cap and underwent the Pre-training EEG recording in a sound-shielded room, performed with a Neuroscan SynAmps RT amplifier at a 200-Hz sampling rate. The recording consisted of four eight-minute sequences (two blocks of virtual instrument tones and two of piano tones, alternating, with the order counter-balanced between participants). To ensure that subjects stayed alert, we asked them to press a button whenever they heard a white-noise burst, randomly interspersed in the sequence in place of a tone. Thereafter, the participants trained on the virtual instrument for half an hour. The EEG was recorded again immediately after the training (Post1) and approximately 24 hours later, after overnight sleep (Post2).

Figure 3. Electrode groupings of the 64 channel EEG cap.

C. Training
Participants were asked to play three fairly simple musical pieces (a simple one-octave arpeggio of a major chord and a minor chord, Twinkle Twinkle Little Star, and Happy Birthday), shown to them as scores. Thirty minutes were given to practice these songs. Thereafter, the participants were asked to perform the pieces as accurately and consistently as possible. Once the subject was able to play each piece twice with no or few mistakes, the training was finished.

D. Virtual Instrument
The virtual instrument was designed to be played with both hands wearing a GameTrak hand-tracking controller (Mad Catz), which sends the hand-movement information to a computer program written in ChucK (Wang, 2008) that produces musical tones with a specifically designed unfamiliar timbre (see section E below). The instrument mapped the pitch of the instrument within a one-octave range (G3-G4) to the height of the player's dominant hand, and the duration of the note (half, quarter, and eighth notes at a tempo of 150 bpm) to the height of the non-dominant hand. The player would then move their hand horizontally away from their body in a striking motion; the instrument would sound once the hand passed a threshold of distance from the player. All distances and notes were scaled to the player's comfortable reach in a seated position, measured at the beginning of the training.

E. Stimuli
Participants were presented with two types of tones during the EEG recording. The first was the tones of the virtual instrument, generated using a combination of a modal bar physical model with a plucked-string physical model. The striking of the modal bar gave a well-defined note attack, while the string model made the tone novel and unfamiliar. Participants also listened to piano-timbre tones as a control. Tones with both timbres
Figure 4. Auditory evoked responses to the virtual instrument and piano tones for each session across three groups of electrodes.
were edited, with 7 fundamental frequencies of the C-major diatonic scale (G3 to G4), and normalized in loudness.

F. EEG Analysis
After re-referencing the EEG electrodes to the common average, eye artifacts were removed using eye blinks and movements detected by the horizontal and vertical electrooculogram (EOG) through source-space projection in the Brainstorm toolbox (Tadel et al., 2011). Afterwards, the data were segmented into epochs of -0.2 to 0.8 sec relative to stimulus onset and subsequently averaged to obtain the auditory evoked response for the virtual instrument and piano conditions separately. Baseline correction was done using a window of -0.1 to 0 sec. Three electrode groups, namely FrontCentral_Left (fcl), FrontCentral_Medial (fcm), and FrontCentral_Right (fcr; Figure 3), were used to inspect the waveforms and to run preliminary time-point-by-point pair-wise t-test comparisons between any two sessions. Based on the latency ranges exhibiting session differences, repeated-measures analysis of variance (ANOVA) was used to examine the amplitude at the fcm electrode site, using the factors Timbre (virtual instrument vs. piano) and Session (Pre, Post1, Post2).

III. RESULTS
A clear auditory evoked response was obtained for all sessions and tone types, as presented in Figure 4. The significant differences between the Pre, Post1, and Post2 sessions within each stimulus type, as marked in Figure 4, demonstrated that the primary effects of session involved different time ranges for the virtual instrument and piano tones. In the virtual instrument condition, the effects of sleep at the fronto-central midline electrode site were evident in two time ranges: the first was around 270ms in both the Pre vs. Post2 and Post1 vs. Post2 comparisons, whereas the second was found at 470ms in the Post1 vs. Post2 comparison (Figure 4, top center panel). In contrast, for the piano condition, the sleep-related effects were obtained mainly in the 400-500ms range for the Post1 vs. Post2 comparison at the left fronto-central electrode group, and in the 300-400ms range for Pre and Post2 in the midline and right electrode groups (Figure 4, bottom center and left panels). Figure 5 shows the evoked responses to the two stimulus types, their difference at each testing session, as well as the differences between testing sessions. In the fronto-central midline electrode group, the effects of sleep were evident in three time ranges, namely around 150ms, 270ms, and 470ms. The amplitude was calculated in a 20-ms time window centered at each of these latencies and submitted to three separate repeated-measures ANOVAs to determine the contributions of session and stimulus type to the change in the signal.

Figure 5. The evoked responses to the virtual instrument and piano stimuli, their difference at the Pre, Post1 and Post2 testing sessions, and the differences between two testing sessions. Graphs have circles underneath the curve to show time points of statistically significant difference between the stimuli (top and middle) and between sessions (bottom).

A. 150ms
The ANOVA showed a significant main effect of timbre around 150ms, riding on the onset slope of the P2 peak (F(1,10) = 21.510, p = 0.0009), reflecting a more negative response to the virtual instrument than to the piano tones. No other main effects or interactions were significant.

B. 270ms
The ANOVA showed that a negativity around 270ms, following the P2 peak around 200ms, to the virtual instrument stimulus was significantly larger than the response to the
piano stimulus (F(1,10) = 7.757, p = 0.0193). No other main effects or interactions were significant.

C. 470ms
Here, the ANOVA found no main effect of Timbre or Session. Instead, there was a significant interaction between Timbre and Session (F(2,20) = 3.768, p = 0.0409). The post-hoc test showed that this interaction was caused by a significant reduction of the negativity in Post2 compared to Post1 only for the virtual instrument (p < 0.05), as the negativity to the piano stimulus did not change across the three sessions.

IV. DISCUSSION
Past experiments focusing on the specific effects of sleep found an increase in the N1-P2 complex after the first night of sleep following acoustic discrimination training. This experiment did not find any statistical increase in the P2 peak after sleep.
Instead, we found differences in the N1 and N2 peaks. The former was more related to the acoustic difference between the stimuli themselves, as consistently indicated in the difference waveform between the stimuli. The latter took place at the offset slope of P2, across all recording sessions of the virtual instrument. N2 is known as an attention-related AER component. Because participants were trained on a music-making task instead of an acoustic discrimination task, this difference may have been caused by brain activity related to motor and/or attentional processing, even though the EEG was recorded during passive listening.
The negativity around 470ms was also found to be sensitive to session. Interestingly, this negativity was enhanced at the Post1 recording of the virtual instrument stimulus compared to the Pre recording, while the increment went away at the Post2 session. The nature of this peak is not entirely clear. However, it might be related to the N5 peak, which is usually recorded around 500ms and found to be "related to the amount of harmonic integration required by a musical event" (Koelsch, 2005). The integration of the novel timbre into perception and learning might be the cause of the presence of the peak after training, and consolidation during sleep might be the reason we do not see the peak during the Post2 recording of the virtual instrument stimulus. We cannot associate the peak with the N400, which is linked to semantic integration, because the literature shows a different topography for the priming effect, perhaps different from what we observed. It is believed that there is a connection between the N400 and N5 peaks, and that the N5 is a measure of musical meaning. Another noteworthy aspect of the results was that these effects of training and sleep mainly involved the midline and right fronto-central electrode sites, as shown in Figures 4 and 5, indicating stronger involvement of the right auditory cortex, in line with its favored timbre (i.e., spectral) processing (Zatorre & Belin, 2001).

V. CONCLUSION
Sleep plays a role in the process of learning a new instrument for musicians. We believe that training participants on a new instrument for 30 minutes led to an enhanced negativity around 270ms, possibly related to attention and motor processing. We also saw a significant increase in the right-lateralized negativity around 500ms after training, which went away after sleep. Our findings indicate evidence of neural plasticity and the consolidation of memory during sleep for attention and timbre processing.

ACKNOWLEDGMENT
We would like to thank Professor Ge Wang (Stanford University) for allowing us to use his GameTrak controller.

REFERENCES
Alain, C., Da Zhu, K., He, Y., & Ross, B. (2015). Sleep-dependent neuroplastic changes during auditory perceptual learning. Neurobiology of Learning and Memory, 118, 133-142.
Atienza, M., Cantero, J. L., & Quiroga, R. Q. (2005). Precise timing accounts for posttraining sleep-dependent enhancements of the auditory mismatch negativity. NeuroImage, 26(2), 628-634.
Atienza, M., Cantero, J. L., & Stickgold, R. (2004). Posttraining sleep enhances automaticity in perceptual discrimination. Journal of Cognitive Neuroscience, 16(1), 53-64.
Censor, N., Karni, A., & Sagi, D. (2006). A link between perceptual learning, adaptation and sleep. Vision Research, 46(23), 4071-4074.
Fenn, K. M., Nusbaum, H. C., & Margoliash, D. (2003). Consolidation during sleep of perceptual learning of spoken language. Nature, 425(6958), 614-616.
Karni, A., Tanne, D., Rubenstein, B. S., Askenasy, J. J., & Sagi, D. (1994). Dependence on REM sleep of overnight improvement of a perceptual skill. Science, 265(5172), 679-682.
Koelsch, S. (2005). Neural substrates of processing syntax and semantics in music. Current Opinion in Neurobiology, 15(2), 207-212.
Kuriki, S., Kanda, S., & Hirata, Y. (2006). Effects of musical experience on different components of MEG responses elicited by sequential piano-tones and chords. The Journal of Neuroscience, 26(15), 4046-4053.
Lappe, C., Herholz, S. C., Trainor, L. J., & Pantev, C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. The Journal of Neuroscience, 28(39), 9632-9639.
Maquet, P. (2001). The role of sleep in learning and memory. Science, 294(5544), 1048-1052.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392(6678), 811-814.
Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. Neuroreport, 12(1), 169-174.
Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003). Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. The Journal of Neuroscience, 23(13), 5545-5552.
Shahin, A., Roberts, L. E., Pantev, C., Trainor, L. J., & Ross, B. (2005). Modulation of P2 auditory-evoked responses by the spectral complexity of musical sounds. Neuroreport, 16(16), 1781-1785.
Stickgold, R. (2005). Sleep-dependent memory consolidation. Nature, 437(7063), 1272-1278.
Stickgold, R., Whidbee, D., Schirmer, B., Patel, V., & Hobson, J. A. (2000). Visual discrimination task improvement: A multi-step process occurring during sleep. Journal of Cognitive Neuroscience, 12, 246-254.
Tadel, F., Baillet, S., Mosher, J. C., Pantazis, D., & Leahy, R. M. (2011). Brainstorm: A user-friendly application for MEG/EEG analysis. Computational Intelligence and Neuroscience, 2011, 8.
Tremblay, K., Kraus, N., McGee, T., Ponton, C., & Otis, B. (2001). Central auditory plasticity: Changes in the N1-P2 complex after speech-sound training. Ear and Hearing, 22(2), 79-90.
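For reference, the epoching and baseline-correction steps described in section F (200-Hz sampling rate, epochs of -0.2 to 0.8 sec around stimulus onset, baseline window of -0.1 to 0 sec) reduce to a few lines. The sketch below is a generic NumPy illustration under those parameters, not the authors' actual Brainstorm pipeline:

```python
import numpy as np

FS = 200  # Hz, the sampling rate used for the recordings

def epoch_and_average(data, onsets, pre=0.2, post=0.8, base=(-0.1, 0.0)):
    """Cut epochs around stimulus onsets, baseline-correct each epoch,
    and average them into an evoked response.

    data:   (n_channels, n_samples) continuous EEG
    onsets: stimulus-onset sample indices
    Returns the evoked response, shape (n_channels, n_pre + n_post).
    """
    n_pre, n_post = int(pre * FS), int(post * FS)
    epochs = []
    for s in onsets:
        if s - n_pre < 0 or s + n_post > data.shape[1]:
            continue  # drop epochs that run off the recording
        ep = data[:, s - n_pre:s + n_post].astype(float)
        # subtract the mean over the baseline window (-0.1 to 0 s)
        b0 = n_pre + int(base[0] * FS)
        b1 = n_pre + int(base[1] * FS)
        ep -= ep[:, b0:b1].mean(axis=1, keepdims=True)
        epochs.append(ep)
    return np.mean(epochs, axis=0)
```

Per-condition evoked responses follow by passing only the onsets belonging to each stimulus type.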
Auditory scene analysis is important both for understanding how the auditory system
works and for developing computational systems capable of analyzing acoustic
environments for relevant events. The auditory system parses concurrent frequencies in
the acoustic signal into coherent sound events such as musical tones. Recently, nonlinear
oscillatory processes were found at various stages of auditory processing, and oscillatory
network models have been used to explain both neurophysiological and perceptual data.
Here we show that many aspects of auditory scene analysis can be understood as signal
processing and pattern formation in a dynamical system consisting of nonlinear
oscillatory components tuned to distinct frequencies. We first analyze a canonical model
to identify common dynamical properties of oscillatory networks that are relevant to
auditory scene analysis. We focus on mode-locked synchronization and Hebbian
plasticity between oscillators with harmonic frequency relationships. Next, we use the
canonical model to replicate some of the key empirical findings in the field, and we
compare the model to existing models and mechanisms. Simulations show that the
pattern of plastic connections formed between tonotopically tuned oscillators functions
similarly to a harmonic template. It provides reliable predictions for grouping and
segregation of frequency components and replicates phenomena such as the formation of
multiple concurrent harmonic complexes with distinct F0s and the “pop-out” of a
mistuned harmonic. The canonical dynamical systems model provides a general
theoretical framework which can account for many neurophysiological processes and
perceptual phenomena related to auditory scene analysis.
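The central object here, a bank of nonlinear oscillators tuned to distinct frequencies, can be illustrated with a minimal sketch of canonically forced Hopf-type oscillators; the parameter values and stimulus below are arbitrary illustrations, not the model analyzed in the paper:

```python
import numpy as np

def simulate(freqs, stim, fs=1000.0, alpha=-1.0, beta=-1.0):
    """Euler-integrate a bank of Hopf-type oscillators,
        dz/dt = z * (alpha + i*2*pi*f + beta*|z|^2) + x(t),
    each tuned to a natural frequency in `freqs` (Hz) and driven by a
    shared real-valued stimulus `stim` sampled at `fs`."""
    dt = 1.0 / fs
    w = 2.0 * np.pi * np.asarray(freqs)
    z = np.full(len(w), 0.01 + 0.0j)
    out = np.empty((len(w), len(stim)), dtype=complex)
    for n, x in enumerate(stim):
        dz = z * (alpha + 1j * w + beta * np.abs(z) ** 2) + x
        z = z + dt * dz
        out[:, n] = z
    return out

# An oscillator tuned to the stimulus frequency resonates, while a
# mistuned one responds weakly -- the frequency selectivity that the
# full model builds on for grouping and segregation.
fs = 1000.0
t = np.arange(0, 2.0, 1.0 / fs)
stim = 0.25 * np.sin(2.0 * np.pi * 4.0 * t)  # 4 Hz drive
states = simulate([4.0, 6.5], stim, fs)
```

The plastic harmonic connections discussed in the abstract would be additional coupling terms between oscillators, omitted here for brevity.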
Several recent studies have found that adult musicians outperform non-trained peers in
tasks requiring executive functions. However, this putative “musician advantage” in
executive functions is still largely unexplored in children.
The musically trained 11-13-year-olds showed smaller P3a responses to distractor sounds
compared to their non-trained peers. In the 15-17-year-olds, by contrast, there were no
significant group differences in response amplitudes. However, irrespective of age, the
musically trained children showed stronger habituation of the P3a response across the
experiment as well as higher scores in the tests for inhibition and set-shifting.
Thus, the ERP results suggest that the musically trained children had relatively mature
control over auditory attention already at age 11–13 years. By age 15–17, these skills had
improved in the control children, as indexed by lack of group differences in P3a
amplitude. However, they still remained below those of the music group, as indexed by
the test performance and the habituation of the P3a. The results suggest that the
“musician advantage” in executive functions may be most pronounced at school age and
diminishes somewhat as these functions improve with age also in nontrained children.
Musical textures can contain one or multiple motifs that repeat in different melodic lines
(voices). Previous event-related potential (ERP) studies have shown that the mismatch
negativity (MMN) responses reflect encoding of melodic motifs. Also, when two voices
sound simultaneously, the encoding of the upper voice is advantaged in MMN.
Interestingly, our recent results showed that the upper voice advantage may extend to
non-simultaneous (i.e. alternating) voices, even if voices contain the same motif. The
present study is aimed at examining whether differentiation between upper and lower
voices is enhanced when each voice has its own motif, which varied by pitch contour. We
recorded electroencephalogram (EEG) from musicians, and compared MMN responses
across voices and number of motifs present. For 20% of trials in each voice, the 5th note
contained a contour-changing deviant, relative to the standard direction of that motif.
Within one voice, motif entry pitch varied within one-half-octave range; the pitch
separation between voices was one octave on average. Results showed that upper voice
MMN was larger when each voice had its own motif, compared to when both voices had
the same motif, or the upper voice was presented alone. This suggests that the presence of
multiple melodic identities across voices modulates preattentive encoding of melodic
lines, perhaps because melodic identity serves to differentiate streams similarly to other
acoustic features.
Classic accounts of musical enjoyment hold that it arises primarily from characteristics of
the sound itself, but more recent studies have demonstrated that enjoyment also depends
on the expectations and preconceptions that people bring to the listening experience.
When wine is labeled as more expensive, it is not only perceived as tasting better, but
also elicits underlying neural activity that is associated with experienced pleasantness
(Plassmann, O’Doherty, Shiv, and Rangel, 2008). A more recent study has built upon this
concept by examining music, finding that the experienced pleasure of music listening is
also susceptible to manipulation (Margulis & Kroger, submitted). In this study, people
rated their preferences for piano performances they had been told were performed by a
“world-renowned concert pianist” or a “conservatory student of piano.” These labels
significantly affected behavioral ratings of perceived pleasantness.
In the current study, we aim to identify the neural correlates for this shift in reported
enjoyment. Nonmusicians were scanned using fMRI. They heard 8 pairs of excerpts
where one was performed by a student and one by a professional, and were correctly (in
half the trials) or falsely (in the other half) primed about the identity of the performer.
Participants used Likert-like scales to rate each performance and choose which one they
preferred after each matched pair. As in the behavioral study, results suggest that
enjoyment of fine music, like fine wine, can be susceptible to extrinsic information and
likewise implicate specific neural correlates. Implications for the nature of aesthetic
responses to music and the cognitive basis of musical pleasure are discussed.
Musical syntax includes tonal-harmonic structures, but also rhythm. Although several
theoretical investigations stressed the importance of the interplay between tonality and
rhythm (and specifically meter) for musical syntax, neuroscientific evidence is lacking.
Thus, the aim of the current experiment is to investigate the influence of meter on
harmonic syntactic processing in music using event-related potentials, in particular the
early right anterior negativity (ERAN). Participants listened to musical sequences
consisting of five chords of equal loudness, the final chord function being either tonally
regular or irregular. The metrical importance of chords was manipulated by presenting
the sequences in two blocks: each chord sequence was preceded by a one-bar percussion
sequence either in a 4/4 or in a 3/4 meter. Thus, the final chord occurred either on a
metrically strong (first) or weak (second) beat. To further induce a specific meter,
participants had to detect rarely occurring chords with deviant timbre and judge as fast as
possible whether they were on a strong or weak beat (e.g. by pressing the left button if
the deviant was ‘on 1’ and the right button if it was ‘not on 1’). To accomplish the task,
participants thus had to maintain the induced metrical structure over a whole chord sequence.
We hypothesize that if the metrical structure influences tonal syntactic processing, the
ERAN shows larger amplitudes when chord sequences are presented in a 4/4 meter than
in a 3/4 meter: in the former case, an irregular final chord is on the metrically strong beat
and is considered to cause a much stronger syntactic expectancy violation, compared to
when an irregular final chord is on a metrically weak beat. We discuss the results of the
currently ongoing analysis of the EEG and behavioral data. Results of this experiment
could provide important implications for research on musical syntax, structural and
temporal expectancy building, and their neurocognitive bases.
Although music listening is known to couple auditory and motor regions and engage the
mesolimbic reward system of the brain, the functional connectivity of this network has
not been studied in detail as a function of musical training. By performing independent
component analysis (ICA), we aimed at studying functional integration of auditory-motor
networks in musicians and controls during naturalistic music listening. We performed
spatial group ICA on participants' fMRI brain responses (acquired during free-listening)
separately for each group in a targeted, hypothesis-driven region of interest (ROI) related
to auditory and motor function. A ROI-based ICA approach improves the separation and
anatomical precision of the identified spatial components. After examination of map
distribution, we subsequently estimated the degree of connectivity between a priori,
hypothesis-driven selected pairs of subnetworks by computing their proportion of overlap
in the most stable independent components. Preliminary findings revealed (a) a trend
towards high entropy and low kurtosis in the distribution of the identified components in
musicians, reflecting more widespread spatial maps compared to controls; (b)
strengthened connectivity between auditory and motor regions in musicians; (c) between
nucleus accumbens (NAcc) and caudate (correlates of dopamine release during and prior
to chills while listening to music); and (d) between the NAcc and the auditory cortex (a
network known to predict music reward value). Results may be construed as (a) a more
heterogeneous listening mode, (b) a strengthened action-perception integration during
free listening, and (c,d) an increased connection between music processing and the
emotion system in musicians compared to controls, likely due to musical training. These
inferences speak to musicians’ increased functional integration in sensorimotor and
reward networks previously found to tightly interact in a musical context.
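The abstract's connectivity estimate, a proportion of overlap between pairs of component spatial maps, is not given an exact definition; one plausible reading (thresholding each map and normalizing by the smaller map's extent, both of which are assumptions here) can be sketched as:

```python
import numpy as np

def overlap_proportion(map_a, map_b, z_thresh=2.0):
    """Proportion of overlap between two spatial maps: voxels whose
    |z|-scored loading exceeds the threshold in both maps, normalized
    by the smaller map's active-voxel count."""
    a = np.abs(map_a) > z_thresh
    b = np.abs(map_b) > z_thresh
    denom = min(a.sum(), b.sum())
    return float((a & b).sum()) / denom if denom else 0.0

# Two toy component maps over six voxels (hypothetical values)
auditory = np.array([3.1, 2.8, 0.4, 0.1, 2.5, 0.2])
motor = np.array([0.3, 2.9, 2.6, 0.1, 2.7, 0.0])
overlap = overlap_proportion(auditory, motor)
```

Other normalizations (e.g. a Dice coefficient over the union) are equally plausible; the choice affects the scale but not the ranking of subnetwork pairs in typical cases.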
Results
The low-frequency content of the auditory stimuli used here contains many spectral peaks in the 0 to 12 Hz region. Many of these peaks are directly related to the beat frequency, while others are likely multiples of the beat frequency peaks or the result of other acoustic features present in the data.
For the responses to the intact stimuli, both spatial filtering techniques produce roughly fronto-central component topographies, as well as strong spectral peaks at the beat frequency, its harmonics, and its subharmonics. Interestingly, activity at other peak frequencies in the stimuli that were not directly related to the beat is reduced or absent. The EEG magnitude spectra of responses to phase-scrambled control stimuli do not show the characteristic tempo-related peaks elicited by the intact stimuli.

Acknowledgements
This research was partially supported by a grant from the Wallenberg Network Initiative: Culture, Brain, Learning. The authors thank Anthony Norcia and Jacek Dmochowski for providing helpful advice on the analyses.

REFERENCES
Dmochowski, J. P., Sajda, P., Dias, J., & Parra, L. C. (2012). Correlated components of ongoing EEG point to emotionally laden attention – a possible marker of engagement? Frontiers in Human Neuroscience, 6, 112.
Dmochowski, J. P., Bezdek, M. A., Abelson, B. P., Johnson, J. S., Schumacher, E. H., & Parra, L. C. (2014). Audience preferences are predicted by temporal reliability of neural processing. Nature Communications, 5.
Dmochowski, J. P., Greaves, A. S., & Norcia, A. M. (2015). Maximally reliable spatial filtering of steady state visual evoked potentials. NeuroImage, 109, 63-72.
Ellis, D. P. W. (2007). Beat tracking by dynamic programming. Journal of New Music Research, 36(1), 51-60.
Grahn, J. A., & Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893-906.
Kaneshiro, B., Berger, J., Perreau Guimaraes, M., & Suppes, P. (2012). An exploration of tonal expectation using single-trial EEG classification. In ICMPC12-ESCOM8, 509-515.
Kaneshiro, B., Dmochowski, J. P., Norcia, A. M., & Berger, J. (2014). Toward an objective measure of listener engagement with natural music using inter-subject EEG correlation. In ICMPC13-APSCOM5.
Kaneshiro, B., Nguyen, D. T., Dmochowski, J. P., Norcia, A. M., & Berger, J. (2016). Naturalistic Music EEG Dataset – Hindi (NMED-H). Stanford Digital Repository. Retrieved from http://purl.stanford.edu/sd922db3535
Krumhansl, C. L. (2000). Rhythm and pitch in music cognition. Psychological Bulletin, 126(1), 159-179.
Norcia, A. M., Appelbaum, L. G., Ales, J. M., Cottereau, B. R., & Rossion, B. (2015). The steady-state visual evoked potential in vision research: A review. Journal of Vision, 15(6), 4.
Nozaradan, S., Peretz, I., Missal, M., & Mouraux, A. (2011). Tagging the neuronal entrainment to beat and meter. The Journal of Neuroscience, 31(28), 10234-10240.
Nozaradan, S., Peretz, I., & Mouraux, A. (2012). Selective neuronal entrainment to the beat and meter embedded in a musical rhythm. The Journal of Neuroscience, 32(49).
Parra, L. C., Spence, C. D., Gerson, A. D., & Sajda, P. (2005). Recipes for the linear analysis of EEG. NeuroImage, 28(2), 326-341.
Schaefer, R. S., Desain, P., & Farquhar, J. (2013). Shared processing of perception and imagery of music in decomposed EEG. NeuroImage, 70, 317-326.
Vialatte, F. B., Maurice, M., Dauwels, J., & Cichocki, A. (2010). Steady-state visually evoked potentials: Focus on essential paradigms and future perspectives. Progress in Neurobiology, 90(4), 418-438.
Zanto, T. P., Snyder, J. S., & Large, E. W. (2006). Neural correlates of rhythmic expectancy. Advances in Cognitive Psychology, 2(2-3), 221-231.
Musical imagery – the ability to "hear" music in our mind's ear in the absence of external
stimuli – has long been posited as a quasi-perceptual phenomenon. This conjecture has,
in recent decades, found further support in neuroimaging findings of a significant overlap
in the neural resources underlying imagery and perception for the auditory – as well as
other – modalities. For instance, auditory cortex activation is reliably observed during
silence, if imagery is triggered by presenting music with silent gaps or silent videos
strongly implying sound. Musical imagery is driven by expectations based largely on
implicit musical knowledge, but while the role of expectation in music has been well
studied in perception, it has received less attention in musical imagery. Here we used multivariate
pattern analysis (MVPA), a technique that describes the extended brain spatial
representations that distinguish stimuli in various conditions. We used an fMRI passive-
listening paradigm that established a harmonic context via the presentation of in-key
chords (cadences). Cadences in each of several categories created a strong expectation for
certain end-chords, which were presented in one condition ("perception") but omitted in
the other ("imagery"). An MVPA classifier was trained to distinguish between these
former categories, i.e. between phrases leading to one heard end-chord or another. Pilot
data so far support our hypothesis that the classifier trained in the perception condition
generalises to the imagery condition, such that an end-chord's identity can be inferred
from brain activity (reflecting imagery) acquired during the silence replacing that end-
chord in its harmonic context. This result would suggest the existence of an extended
network underlying both perception and imagery, one which encodes stimulus identity
regardless of modality. This finding has theoretical implications about the generative
model that is argued to give rise to both musical perception and musical imagery.
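The train-on-perception, test-on-imagery logic can be sketched with synthetic voxel patterns and a simple nearest-class-mean decoder; the data here are simulated and the decoder is a stand-in for whatever MVPA classifier the study actually used:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical voxel patterns for two end-chord categories, assuming a
# shared (but noisier) representation in the imagery condition.
n_vox = 50
template = rng.normal(size=n_vox)

def make(n, sign, noise):
    return sign * template + noise * rng.normal(size=(n, n_vox))

X_train = np.vstack([make(40, +1, 1.0), make(40, -1, 1.0)])  # perception
y_train = np.array([0] * 40 + [1] * 40)
X_test = np.vstack([make(20, +1, 2.0), make(20, -1, 2.0)])   # imagery
y_test = np.array([0] * 20 + [1] * 20)

# Fit class means on perception trials...
means = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
# ...then decode imagery trials by nearest class mean: does the
# perception-trained decoder generalize across conditions?
pred = np.argmin(((X_test[:, None, :] - means[None]) ** 2).sum(axis=2),
                 axis=1)
accuracy = (pred == y_test).mean()
```

Above-chance cross-condition accuracy is the signature of a representation shared between perception and imagery.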
A wide variety of musical repertoires, including 1960’s American minimalist music, are
built of highly repetitive musical features. Some minimalist music used short motives
each repeated for various lengths of time, similar to typical auditory Event-Related
Potential (ERP) paradigms. In fact, melodic pattern difference is known to be reflected in
the Mismatch Negativity (MMN) component. Also, the subsequent P3a component is
associated with involuntary attentional orienting towards the deviation and its novelty.
We were interested in how MMN and P3a are related to the regularity of melodic block
transition. We hypothesized that, compared to the previously repeated melody (i.e. the
Standard stimulus), MMN and P3a would be elicited when a new melodic block starts
(i.e. the Deviant stimulus). We also expected that P3a would attenuate more over the
course of irregularly repeating stimuli than in regularly repeating stimuli. We created two
stimulus sequences from five different four-note melodies, each starting with a unique
note. In the Constant sequence, each melodic block consisted of five repetitions of a
melody in a randomized order. In the Variant sequence, the number of repetitions varied
from three to seven. Ten participants heard both sequences twice while EEG was
recorded. Our analysis shows that the ERPs differed between the conditions in the first
and final thirds of the sequences: Constant conditions show greater MMN than Variant in
the beginning, but the reverse was true in the final third of the block. Interestingly, the
P3a stayed larger in the Constant condition. The Variant condition also shows strong
MMN throughout the sequence suggesting that pre-attentive processing remains steady
through the Variant condition. Decreased MMN for the Constant condition when
comparing the beginning with the end may result from the predictability of the melodic
unit repetition. The attenuated P3a found in the Variant End sequence suggests lower
attentional engagement.
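The two sequence types described here can be generated programmatically; the pitch values below are placeholders, since the abstract specifies only that each of the five four-note melodies starts on a unique note:

```python
import random

# Five four-note melodies, each starting on a unique note
# (hypothetical MIDI pitches; the actual melodies are not specified).
MELODIES = [
    [60, 62, 64, 62], [62, 64, 66, 64], [64, 66, 68, 66],
    [65, 67, 69, 67], [67, 69, 71, 69],
]

def make_sequence(variant, rng):
    """Concatenate melodic blocks in randomized order. Constant blocks
    repeat each melody 5 times; Variant blocks repeat it 3-7 times."""
    order = rng.sample(range(len(MELODIES)), len(MELODIES))
    seq = []
    for i in order:
        reps = rng.randint(3, 7) if variant else 5
        seq.extend(MELODIES[i] * reps)
    return seq

rng = random.Random(1)
constant = make_sequence(False, rng)
variant = make_sequence(True, rng)
```

In the Constant sequence every block boundary is predictable after five repetitions, whereas in the Variant sequence the block length (and hence the Deviant onset) is not.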
Alarms in the ICU sound frequently and 85-99% of cases do not require clinical
intervention. As alarm frequency increases, clinicians develop ‘alarm fatigue’ resulting in
desensitization, missed alarms, and delayed responses. This is dangerous for the patient
when an alarm-provoking event requires clinical intervention but is inadvertently missed.
Alarm fatigue can also cause clinicians to: set alarm parameters outside effective ranges
to decrease alarm occurrence; decrease alarm volumes to an inaudible level; silence
frequently insignificant alarms; and be unable to distinguish alarm urgency. Since false
alarm and clinically insignificant alarm rates reach 80-99%, practitioners distrust alarms,
lose confidence in their significance, and manifest alarm fatigue. Yet, failure to respond
to the infrequent clinically significant alarm may lead to poor patient outcomes. Fatigue
driven by alarm loudness, and the nonspecificity of uniform, uninformative alarm
sounds, are post-monitor problems that can be addressed by understanding the
psychoacoustic properties of alarms and the aural perception of clinicians.
Results show that the near-threshold auditory perception of alarms is around -27 decibels (dB)
relative to background noise at 60 dB. Additionally, with the patient monitor out of view,
performance on a clinical task, measured by response time and accuracy, is preserved
at negative SNRs compared to positive SNRs, worsening as the SNR approaches the
near-threshold of hearing. Thus, clinician performance is maintained with alarms that are
softer than the background noise.
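The level relations described above can be made concrete with a small sketch. The 60 dB background and the -27 dB near-threshold are the values reported in the abstract; the function names and the example alarm levels are illustrative only.

```python
def snr_db(alarm_level_db: float, background_level_db: float) -> float:
    """Signal-to-noise ratio in dB: alarm level relative to the background."""
    return alarm_level_db - background_level_db


def audible(alarm_level_db: float, background_level_db: float,
            threshold_snr_db: float = -27.0) -> bool:
    """An alarm remains detectable while its SNR stays above the near-threshold SNR."""
    return snr_db(alarm_level_db, background_level_db) > threshold_snr_db


# Background noise at 60 dB: an alarm at 45 dB has an SNR of -15 dB --
# softer than the background, yet still well above the -27 dB threshold.
print(snr_db(45, 60))   # -15
print(audible(45, 60))  # True
print(audible(30, 60))  # False: an SNR of -30 dB falls below the threshold
```

A negative SNR simply means the alarm is quieter than the ambient noise; the finding is that detection survives well into that regime.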
Errors are rare but important events that have consequences and require adaptation of
action. When acting together, these consequences are often shared amongst partners.
During joint action, people need to monitor their own activity, the activity of partners,
and progress toward shared action goals. However, when performing the same action at
the same time and with the same expected outcome, it can be difficult to tell who is
responsible for the outcome; the agency of the action becomes ambiguous. Previous
research has shown that people slow down their actions after making an error, and also
when a partner makes an error. Electroencephalography studies have shown consistent
patterns of electrical activity in the brain when an error is made, both for one’s own errors
and errors made by a partner. However, previous studies have not investigated how
agency affects these patterns. We propose a method for studying how ambiguity of
agency affects these error-related event-related potentials (ERPs) for one’s own errors
and errors of a partner. We expect that agency ambiguity will affect the later, positive
ERPs for both own and other’s errors. Music is an ideal way to investigate this as it is a
social activity and musicians are familiar with playing with other musicians. Paired
pianists, who can hear but not see one another, played technical exercises in unison and in
octave parts. In the unison condition, agency is relatively ambiguous, enabling the
comparison of behavioural and neural measures between errors made when agency is
ambiguous and non-ambiguous. Preliminary behavioural results suggest that pianists
adapt to errors differently when agency is ambiguous. In this study, errors are made
spontaneously and naturally. Therefore, results will indicate how pianists adjust their
behaviour to errors they have committed and to those made by a partner, and how agency
ambiguity affects error processing in the brain.
Groove is notoriously challenging to define, but is generally understood in terms of the
propensity of a piece of music to induce movement (Madison, 2006). Consistent with
this view, Janata, Tomic, and Haberman (2012) showed that spontaneous movement is
more likely in high- vs. low-groove music. This is perhaps not surprising considering that
brain networks underlying beat processing encompass several motor areas such as
premotor cortex, cerebellum, and supplementary motor area (Grahn & Brett, 2007; Grahn
& Rowe, 2009). From the lens of M/EEG, neural entrainment to the beat has been found
to vary as a function of stimulus parameters as well as listener history (e.g., Nozaradan,
Peretz, & Mouraux, 2012; Doelling & Poeppel, 2015). On the basis of these findings, we
reasoned that the extent of entrainment to the beat in motor areas might serve as a useful
neural marker for “feeling the groove.” Participants were asked to listen to 30-second
excerpts of popular songs that had been categorized as low-, mid-, or high-groove in a
previous study (Janata et al., 2012). Participants rated each excerpt on the extent to which
they thought the music grooved. Sixty-four channel electroencephalography (EEG) was
recorded and independent components analysis (ICA) was used to identify sources of
neuro-electric activity. Fourier analyses of motor sources revealed that the amount of
entrainment to the beat was greater for high- and mid- versus low-groove songs.
Entrainment was also found in visual areas, but was not found to be modulated by
groove. These findings support the view that a feeling of groove is related to a propensity
to move.
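The entrainment measure described above follows the frequency-tagging logic of Nozaradan and colleagues: the strength of the spectral peak at the beat frequency indexes neural entrainment. A minimal sketch of that logic, assuming a single source time series and a known beat frequency (the function, parameter values, and noise-floor correction here are illustrative, not the authors' pipeline):

```python
import numpy as np

def beat_entrainment(signal: np.ndarray, fs: float, beat_hz: float) -> float:
    """Spectral amplitude at the beat frequency, corrected by the mean amplitude
    of neighbouring frequency bins (a common noise-floor correction)."""
    n = len(signal)
    amps = np.abs(np.fft.rfft(signal)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    beat_bin = int(np.argmin(np.abs(freqs - beat_hz)))
    # bins near (but not immediately adjacent to) the target estimate the noise floor
    neighbours = np.r_[amps[beat_bin - 5:beat_bin - 1], amps[beat_bin + 2:beat_bin + 6]]
    return amps[beat_bin] - neighbours.mean()

# toy check: a 2 Hz "beat" component embedded in noise stands out,
# whereas pure noise shows no peak at the beat frequency
fs, dur, beat = 250.0, 30.0, 2.0
t = np.arange(0, dur, 1.0 / fs)
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * beat * t) + 0.5 * rng.standard_normal(t.size)
print(beat_entrainment(x, fs, beat) > beat_entrainment(rng.standard_normal(t.size), fs, beat))
```

A larger noise-corrected amplitude at the beat frequency over motor sources would correspond to stronger entrainment in the sense used by the abstract.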
random selection without any kind of inclusion/exclusion criteria guaranteed a heterogeneous and representative sampling of the music perception spectrum.

D. The conception of the music perception battery

There are 80 pairs of stimuli (called items and represented by the letter E in Table 1 and the supplementary material) covering seven domains: contour (13 items), timbre (12), meter (10), pitch (5), scale (15), duration (20), and loudness (5). The 80 pairs of stimuli are provided in the attached supplementary material.

E. The battery application

Before presenting the 80 items, we played six opening test stimuli to evaluate whether the child fully understood the instructions: “You, [child’s name], will listen to two sound sequences which are separated by a short silent moment. You should decide if these sound sequences are equal or different. Then, push the button different if the two sequences have any difference, no matter how small. Push the button equal if you believe that the two sequences of sounds are perfectly equal.” For the six stimuli, the teacher helped the children explain why they were equal or different, repeating them until they felt the children had really grasped the instruction. Then, the 80 items were played and children gave correct or wrong answers, marked as 1 or 0 respectively. There was no time limit to push the button. The order of presentation was random (as shown in the supplementary material), to avoid local dependence.

F. Data collection and the computer system Armonikos

A computer system called Armonikos has been developed. It does not depend on the Internet and is installed on personal computers running the Java Virtual Machine (JVM). Java was chosen because it allows the software to be operating-system independent and offers great possibilities for extending the design to accomplish future tasks.

G. Evaluator training

Even an automated battery may show some differences in the way it is explained and applied. Hence, all 14 evaluators were trained by the last author to ensure that they gave the same instructions. The author instructed them to avoid demonstrating any facial expressions, gestures, or words that gave positive or negative signs of children’s achievement. If children became demotivated, the author instructed teachers to use motivational phrases such as “you are almost finishing, do not give up” and “it is almost complete,” but to avoid “you are doing well,” which could signal positive or negative attitudes toward children’s correct or wrong answers.

III. STATISTICAL ANALYSIS

A. Fitting the baseline model (80 items)

Confirmatory factor analysis evaluated the fit of the two proposed multidimensional models underlying the 80 items: a correlated-factors model and a bifactor model. For further details about the bifactor model, its features, and related indices, see Reise (2012a).

To evaluate the goodness of fit of both models, we used chi-square (χ²), the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the root mean square error of approximation (RMSEA). The cutoff criteria used to determine goodness of fit were as follows: chi-square with no statistical significance (p > 0.05), RMSEA near or less than 0.06, and CFI and TLI near or greater than 0.9 (Hu & Bentler, 1999). CFI and TLI are penalized under complex, multidimensional models with many items per factor like ours: they tend to worsen as the number of variables in the model increases (Kenny & McCoach, 2003). We used the weighted least squares estimator with a diagonal weight matrix and mean- and variance-adjusted standard errors (WLSMV); due to the cluster structure (children nested in schools), the standard errors and the chi-square test of model fit took this non-independence into account using the implementation proposed by Asparouhov (2005, 2006). The adopted statistical significance level was 0.05.

B. Reducing the items

All fourteen teachers reported that the battery was too long. The children tired, on average, around the 50th presented item, with the younger losing interest sooner. Hence, one of the main aims was to reduce the 80 items, selecting only the best indicators in terms of the items’ factor loadings. Under the bifactor model, we excluded some discordant items, the majority of the excluded items exhibiting non-statistically-significant factor loadings. We decided to preserve the concordant items regardless of their factor loadings, producing a balanced battery in terms of concordant and discordant items. For loudness, we did not exclude any item, because it already had few items.

C. Viability of Subscales

The following indices were used to better understand the viability of the subscales which constitute children’s musical perception: coefficient omega hierarchical (ωh) (McDonald, 1999; Zinbarg, Revelle, Yovel, & Li, 2005), the omega subscale (ωs) reliability estimate (Reise, 2012b), and the explained common variance (ECV).

IV. RESULTS

Among the 1,006 children, 69.9% were in public schools, 45% were male, and each grade (from first to fifth) had approximately 200 children. Five out of the 14 schools participating in this research were private.

A. Seven correlated factors underlying the 80 items

Statistical problems appeared in the first measurement model, where the 80 items were distributed across the seven preconceived correlated factors, represented by ξ. We saw a linear dependency between two factors (ξscale and ξpitch) and the other five factors within the model: ξ(scale) with ξ(contour), ρ = 0.999, and ξ(pitch) with ξ(duration), ρ = 0.971. Such linear dependency is inadmissible, as stated by Kline (2011), and we solved these statistical problems (formally described as a non-positive-definite psi (ψ) matrix) by excluding the scale and pitch factors. Consequently, there were 60 remaining items distributed across five latent factors.

B. Five factors with 60 items

After the exclusion of the scale and pitch factors, there were 60 items distributed across five factors. The standardized factor loading is the degree of association between each item and its underlying factor, where the closer to one, the stronger (perfect) the correlation between the item and the factor. The five correlated-factor solution returned a satisfactorily adjusted model, although the TLI was lower than the adopted cutoff: χ²(1700) = 2101.1211, p < 0.001; RMSEA = 0.015 (90% confidence interval [CI] = 0.013 to 0.017); CFI = 0.902 and TLI = 0.898. The bifactor model returned a good fit: χ²(1650) = 1938.983, p < 0.001; RMSEA = 0.013 (90% CI = 0.011 to 0.016); CFI = 0.930 and TLI = 0.925.

C. Excluding items with non-significant factor loadings
To optimize the battery, a new exclusion process kept only the statistically significant discordant items. This reduced the number of items to close to 50, as the teachers suggested. Even for the remaining 51 items, the bifactor model is still good: χ²(1173) = 1441.578, p < 0.001; RMSEA = 0.015 (90% confidence interval [CI] = 0.012 to 0.018); CFI = 0.922 and TLI = 0.915.

D. Viability of the five subscales

If a composite were formed by summing the 51 items, with coefficient omega hierarchical (ωH) = 0.89 and ECV = 0.69, we conclude that 89% of the variance of the resulting composite could be attributed to variance in general music perception. The reliabilities of the five specific factors (calculated via ω(s)), controlling for the part of the reliability due to the general perception factor, were very low: ω(s)Contour = 0.113, ω(s)Duration = 0.227, ω(s)Meter = 0.124, ω(s)Timbre = 0.392, and ω(s)Loudness = 0.263.

V. DISCUSSION

Our multidimensional structure underlying the original 80 or reduced 51 items led to a bifactor model fitting better than any multidimensional correlated solution such as the five correlated-factor model.

Most descriptions of the validity of tools for assessing music perception skills use item parceling, grouping the items or questions posed to the participants into parcels. One then treats these parcels as directly observed variables rather than treating each individual item or question as a directly observed variable. Parceling might compromise the quality of the research by hiding problems. Under seven correlated factors, pitch and scale within the same model presented an inadmissible solution. Both domains were linearly dependent on other latent traits (for example, contour). Hence, the difficulty of distinguishing scale from contour among children with no extensive music training resulted in a lack of divergent validity between these factors.

Due to the sample size, modeling at the item level was viable, and we observed different patterns of positive and negative factor loadings, corresponding respectively to discordant and concordant stimuli, regardless of the bifactor or correlated-factor solution. Such a pattern has never been described, indicating that discordant and concordant items behave differently and that the correlations related to each hypothesized factor are of inverse nature. Such detail regarding how the patterns of factor loadings behave would be muddied under parceling procedures, due to the combination of multiple sources of variance into one construct (Little, Rhemtulla, Gibson, & Schoemann, 2013). Item parceling occurs when two or more items are combined (summed or averaged) prior to analysis; parcels (instead of the original items) are then used as the manifest indicators of latent constructs (R. Cattell, 2013; R. B. Cattell & Burdsal Jr, 1975).

Although the idea of a general music factor had been described in Law and Zentner (2012), alluding to Charles Spearman’s g-factor for intelligence, formal procedures (i.e., bifactor modeling) to evaluate its existence had not previously been conducted. Through the bifactor model, we observed that the viability and reliability of the MP subscales were poor. Moreover, critical evidence was observed when ωH (0.89) was compared with omega (0.95): almost all of the reliable variance in total scores (0.89/0.95 = 0.93) can be attributed to the general factor, which is assumed to reflect individual differences in music perception taken as a whole. Only 6% (0.95 – 0.89) of the reliable variance in total scores can be attributed to the multidimensionality caused by the specific domains. Only 5% is estimated to be due to random error. In other words, the m-factor is robustly reliable even though it is a multidimensional construct, and the specific subdomains displayed weak viability beyond the general MP factor. Such results are not surprising, especially because in other areas of child evaluation, different constructs and their related subscales have exhibited the same poor results (Jovanović, 2015; Wagner et al., 2015).

Lastly, the total information curves for the general trait showed that the 51 items are most precise when measuring children with average music perception skills. Therefore, the battery is less accurate when assessing gifted or impaired children in this ability.

CONCLUSION

The music perception battery showed good fit indices, especially under the bifactor model, was easy to apply in the school context, and captured the main general factor of music perception with good reliability.

ACKNOWLEDGMENT

We thank FAPESP for grant 2014/06662-8.

REFERENCES

Asparouhov, T. (2005). Sampling weights in latent variable modeling. Structural Equation Modeling, 12(3), 411-434.

Asparouhov, T. (2006). General multi-level modeling with sampling weights. Communications in Statistics—Theory and Methods, 35(3), 439-460.

Bartholomew, D. J. (2004). Measuring intelligence: Facts and fallacies. Cambridge University Press.

Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71(3), 425-440. doi:10.1007/s11336-006-1447-6

Cattell, R. (2013). Validation and intensification of the Sixteen Personality Factor Questionnaire. Readings in Clinical Psychology, 241.

Cattell, R. B., & Burdsal Jr, C. A. (1975). The radial parcel double factoring design: A solution to the item-vs-parcel controversy. Multivariate Behavioral Research, 10(2), 165-179.

Embretson, S. E. (2004). Focus article: The second century of ability testing: Some predictions and speculations. Measurement: Interdisciplinary Research and Perspectives, 2(1), 1-32.

Hausen, M., Torppa, R., Salmela, V. R., Vainio, M., & Sarkamo, T. (2013). Music and speech prosody: A common rhythm. Frontiers in Psychology, 4, 566. doi:10.3389/fpsyg.2013.00566

Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.

Jovanović, V. (2015). Structural validity of the Mental Health Continuum-Short Form: The bifactor model of emotional, social and psychological well-being. Personality and Individual Differences, 75, 154-159.

Kang, R., Nimmons, G. L., Drennan, W., Longnion, J., Ruffin, C., Nie, K., . . . Rubinstein, J. (2009). Development and validation of the University of Washington Clinical Assessment of Music Perception test. Ear and Hearing, 30(4), 411-418. doi:10.1097/AUD.0b013e3181a61bc0

Kantrowitz, J. T., Scaramello, N., Jakubovitz, A., Lehrfeld, J. M., Laukka, P., Elfenbein, H. A., . . . Javitt, D. C. (2014). Amusia and protolanguage impairments in schizophrenia. Psychological Medicine, 44(13), 2739-2748. doi:10.1017/s0033291714000373

Kenny, D. A., & McCoach, D. B. (2003). Effect of the number of variables on measures of fit in structural equation modeling. Structural Equation Modeling, 10(3), 333-351.

Kline, R. B. (2011). Principles and practice of structural equation modeling (3rd ed.). Guilford Press.

Law, L. N. C., & Zentner, M. (2012). Assessing musical abilities objectively: Construction and validation of the Profile of Music Perception Skills. PLoS One, 7(12), e52508. doi:10.1371/journal.pone.0052508

Wagner, F., Martel, M. M., Cogo-Moreira, H., Maia, C. R. M., Pan, P. M., Rohde, L. A., & Salum, G. A. (2015). Attention-deficit/hyperactivity disorder dimensionality: The reliable ‘g’ and the elusive ‘s’ dimensions. European Child & Adolescent Psychiatry, 1-8.

Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, P. (2010). The Musical Ear Test, a new reliable test for measuring musical competence. Learning and Individual Differences, 20(3), 188-196.

Wing, H. (1948). Tests of musical ability and appreciation: An investigation into the measurement, distribution, and development of musical capacity. London, England: Cambridge University Press.

Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123-133.
It has been shown in healthy adults that style of music can differentially affect movement
performance. The aim of this study was to examine movement performance and
associated motor cortical activity while moving to different styles of music at two
different rates in young healthy adults. Thirty-two participants were asked to perform an
unconstrained finger flexion-extension movement in time with a tone only, and in time
with two music conditions, relaxing music and activating music. Two rates were
presented for each condition. A metronome click in the music conditions ensured
participants were tapping at the correct rate. Finger movement was measured using a 2
mm sensor placed on the index finger, and bipolar surface electromyography (EMG) was
recorded from the first dorsal interosseous and the extensor digitorum communis.
Electroencephalography (EEG) signals were recorded from a montage of 64 scalp-surface
electrodes during movement conditions and rest. Kinematic and kinetic data were
obtained from the sensor and EMG data. Movement onsets were manually obtained. EEG
signals were epoched relative to movement onset. Epochs were then collated for all trials
across each tone rate and condition. A fast Fourier transform was applied. The power
spectrum was normalized so total power in the spectrum was equal to 1 and then summed
for each participant. To obtain the mean spectrum across participants, all epochs were
averaged, resulting in a chi-square distribution. To compare spectra between conditions,
the mean spectrum of one condition was divided by the mean spectrum of the second
condition, yielding an F distribution. This allowed statistical comparison of the spectra
between groups by obtaining 95th-percentile confidence limits from an F table, using the
total number of epochs in each group as the degrees of freedom. Any value below or
above these limits indicated a significant difference between spectra. Results
revealed that movement amplitude of repetitive finger movement did not differ between
conditions. Statistically significant results were found for slow rate movement in the
upper beta and gamma band for comparisons between both the music and move only
conditions. The comparison between activating and relaxing music revealed significant
differences in the upper beta and gamma band for only fast rate movement. These results
suggest that music may modulate brain activity over sensorimotor areas, but not to the
level to observe differences in movement performance. Perhaps differences will be
revealed among populations that demonstrate impaired movement performance, such as
those with Parkinson’s disease.
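The spectral comparison procedure described in this abstract can be sketched as follows. This is a minimal illustration under the stated assumptions (power normalized to unit total per epoch, group epoch counts used as the F degrees of freedom), not the authors' exact pipeline; function names are illustrative.

```python
import numpy as np
from scipy.stats import f as f_dist

def mean_normalized_spectrum(epochs: np.ndarray) -> np.ndarray:
    """epochs: (n_epochs, n_samples). FFT power per epoch, each spectrum
    normalized so its total power equals 1, then averaged across epochs."""
    power = np.abs(np.fft.rfft(epochs, axis=1)) ** 2
    power /= power.sum(axis=1, keepdims=True)
    return power.mean(axis=0)

def compare_spectra(epochs_a: np.ndarray, epochs_b: np.ndarray, alpha: float = 0.05):
    """Ratio of mean spectra with two-sided F confidence limits; the degrees of
    freedom are taken as the number of epochs per group, as in the abstract."""
    ratio = mean_normalized_spectrum(epochs_a) / mean_normalized_spectrum(epochs_b)
    lo = f_dist.ppf(alpha / 2, len(epochs_a), len(epochs_b))
    hi = f_dist.ppf(1 - alpha / 2, len(epochs_a), len(epochs_b))
    return ratio, (ratio < lo) | (ratio > hi)

# sanity check with identical groups: the ratio is 1 at every frequency,
# so no frequency bin is flagged as significantly different
rng = np.random.default_rng(1)
epochs = rng.standard_normal((40, 512))
ratio, sig = compare_spectra(epochs, epochs)
print(np.allclose(ratio, 1.0), sig.any())  # True False
```

Frequency bins whose ratio falls outside the F limits would be reported as condition differences, which is how the beta- and gamma-band effects above were identified.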
Metaphors are widely used in expressive music performance, yet the process of how
semantic concepts can adjust motor performance is largely unknown. The music
pedagogy literature describes varying types of metaphor as being useful (i.e. other
actions, agents, moods or emotions, or complex scenes or situations), but also reports
large individual differences in how these metaphors affect performance. Given how
general the use of metaphors is in music performance, more knowledge of the mechanism
of metaphor-driven movement can have implications for pedagogy as well as clinical use
(i.e. movement rehabilitation). To investigate the neural activations related to metaphor
use in piano performance, two pilot studies were performed with professional pianists
using functional magnetic resonance imaging (fMRI) while they performed series of button
presses as melodies, applying different metaphors. Pilot 1 (n=5) used 2 melodies,
played in either a ‘flowing’ or ‘playful’ way, with behavioral data suggesting that in
some pianists performance timing is mostly driven by the melody, whereas in others it is
driven more by the metaphor. Substantial individual differences in the neural activations
were seen related to metaphor use, ranging from differences in primary motor areas to
visual and attention-related areas, suggesting individualized strategies. In pilot 2 (n=5),
two melodies were played with eight different metaphors, using action and emotion
metaphors varying in arousal and valence content. Although the results are preliminary,
this more complex design reveals similar individual differences in neural activation
patterns and behavioral results. Further single subject analyses will allow for a more
structured interpretation of the different aspects of metaphor-based music performance.
This exploratory work will provide a first look at the way expert musicians implement
expressive performance.
Timing processing for musical rhythm is important for both performers and listeners.
Previous research has shown that listening to musical rhythm involves both auditory and
sensorimotor systems. We have further demonstrated that such auditory-motor
coordination is also reflected by modulation pattern of neural beta-band oscillations (13–
30 Hz), traditionally associated with motor functions, when listening to auditory
isochronous beats and subjectively imagining metric structures. Furthermore, it is known
that the beta-band oscillations have significance in ageing-related movement disorders
such as Parkinson's disease and stroke-induced disabilities. Here we hypothesized that
short-term keyboard training may promote plastic changes in auditory–motor functions in
healthy older adults aged 60 years or more. They had little musical experience
and received 15 one-on-one one-hour lessons over approximately one month, learning to read
music and play the keyboard with both hands. At the pre- and post-training
time points, magnetoencephalography (MEG) was recorded while they were listening to
the beat stimuli with three different tempi as well as finger-tapping with and without the
beat stimuli. Neuromagnetic beta-band activity for listening and tapping reproduced the
periodic modulation pattern observed in the previous studies. The brain areas involved
were also in line with previous findings on the effect of tempo. Comparisons with an
age-matched control group, similarly naïve in music but receiving no training
between two measurements a month apart, are being conducted. The present study has
implications for applying rhythmic auditory stimulation and music-making rehabilitation
in the aging population.
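The beta band referred to above (13–30 Hz) can be isolated with a standard zero-phase band-pass filter. A minimal sketch; the filter order and sampling rate are illustrative choices, not parameters taken from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def beta_band(signal: np.ndarray, fs: float) -> np.ndarray:
    """Zero-phase Butterworth band-pass for the 13-30 Hz beta band."""
    b, a = butter(4, [13.0, 30.0], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)

# toy check: a 20 Hz (beta) component survives the filter,
# while a 5 Hz (theta) component is removed
fs = 500.0
t = np.arange(0, 10, 1.0 / fs)
theta = np.sin(2 * np.pi * 5 * t)
beta = np.sin(2 * np.pi * 20 * t)
filtered = beta_band(theta + beta, fs)
corr = np.corrcoef(filtered[500:-500], beta[500:-500])[0, 1]  # trim filter edges
print(corr > 0.95)  # True
```

Amplitude-envelope modulations of the filtered signal around stimulus events are what the study's "periodic modulation pattern" of beta activity refers to.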
Behavioral and neuroimaging studies indicate that happy mood reduces the selectivity of
visual attention. For instance, individuals in an induced happy mood show heightened
distraction from task-irrelevant visual stimuli and augmented neural responses to
peripheral visual information presented outside the focus of attention.
In the current study, we investigated with event-related potentials (ERP) whether music-
induced happy mood has comparable effects on selective attention in the auditory
domain. Subjects listened to experimenter-selected sad, neutral or happy instrumental
music and afterwards participated in a dichotic listening task while
electroencephalography (EEG) was recorded. The task of the subjects was to detect target
sounds presented to the attended ear and ignore the sounds presented to the unattended
ear.
The ERP and behavioral results suggest that the subjects in a happy mood allocated
their attentional resources more equally across the task-relevant targets and the to-be-
ignored sounds compared to subjects in a sad or neutral mood. Therefore, the current
study extends previous research on the effects of mood on attention that has mostly been
conducted in the visual domain and indicates that even unfamiliar instrumental music can
broaden the scope of selective auditory attention via its effects on mood.
ABSTRACT

While most normal-hearing individuals can readily use prosodic information in spoken language to interpret the moods and feelings of conversational partners, people with congenital amusia, a musical disorder, report that they often rely more on facial expressions and gestures, a strategy that may compensate for proposed neurophysiological deficits in auditory processing. To test the extent to which amusic individuals rely on visual information in auditory processing, we measured event-related potentials (ERPs) elicited by a change in pitch (up or down) between two sequential tones paired with a change in position (up or down) between two visually presented dots. The change in dot position was either congruent or incongruent with the change in pitch. Participants were asked to judge the direction of pitch change while ignoring the visual information. Our results revealed that amusic participants performed significantly worse in the incongruent condition than control participants. ERPs showed an enhanced N2-P3 response to incongruent AV pairings for control participants, but not for amusic participants. These findings indicate that amusics are biased to rely on visual information when it is available, because past experience has shown them that information obtained from the auditory modality may not be reliable. We conclude that amusic individuals tend to discount auditory information in the context of audiovisual stimulus pairings.

I. INTRODUCTION

Congenital amusia is a disorder of musical abilities (Ayotte, Peretz, & Hyde, 2002; Foxton, Dean, Gee, Peretz, & Griffiths, 2004; Hyde & Peretz, 2004; Peretz, 2013; Peretz et al., 2002; Peretz & Hyde, 2003). Experimental findings indicate that congenital amusia can also affect aspects of speech perception, such as the recognition of linguistic prosody. However, this deficit occurs mainly when speech is stripped of semantic information: affected individuals rarely show difficulties with natural speech and other complex sounds (e.g., environmental sounds) in everyday life (Ayotte et al., 2002; Liu, Patel, Fourcin, & Stewart, 2010; Patel, Foxton, & Griffiths, 2005). One possible explanation for this discrepancy is that amusics cope in everyday life because they can fall back on cues delivered through other sensory modalities.

Indirect evidence for this explanation comes from a recent study by Albouy and colleagues (2015) showing that, like that of normal listeners, auditory perception in amusics benefits from the presence of informative visual information. In this behavioural study, amusic and control participants were asked to detect a deviant pitch in a 5-tone sequence, which was presented with or without simultaneous visual stimuli: 5 dots appearing consecutively from left to right along a horizontal line, all at the same vertical position. Even though the visual stimuli provided no task-relevant (pitch) information about the deviant tone, their presence facilitated amusics’ ability to detect the deviant. It is unclear why task-irrelevant visual information might enhance auditory processing in amusics. One possibility is that the visual cues in Albouy et al. (2015) provided an additional temporal signal that prepared the processing of the concurrent auditory stimuli, thereby enhancing detection of deviant pitches (Jones, 1976; Nickerson, 1973). Alternatively, it is possible that amusics considered the visual stimuli a useful cue when detecting the deviant pitch because, on the basis of their daily experience, they tend to use visual information to compensate for their auditory difficulties. Because amusics believe visual information is valuable, its presence may increase their confidence in decision making, which may benefit performance.

To clarify this issue, we required amusics and normal-hearing controls to judge the direction (up or down) of a pitch change in two consecutive tones. The acoustic pitch-change stimuli were presented concurrently with visual cues: a sequence of two dots, the second of which appeared above or below the first one. The pitch change of the two tones was either congruent (second tone higher in pitch and second dot higher in location, or second tone lower in pitch and second dot lower in location) or incongruent (second tone higher in pitch and second dot lower in location, or vice versa) with the spatial change of the two dots. If the visual enhancement effect is entirely task independent, we predicted that amusics should
make use of other available cues, such as semantic or visual show the same increment in performance for both AV
information, to compensate for their auditory deficits. For conditions, because the temporal cues and visual information
example, Thompson et al. (2012) found that individuals with in the AV-congruent and AV-incongruent displays are
amusia show reduced sensitivity to prosody of spoken identical. However if there is an element of task-dependence
sentences conveying a happy, sad, tender or irritated in the enhancement effect, then amusics should show a larger
emotional state. When queried, the same individuals reported increment in performance in the AV-congruent task relative to
that they often rely on facial expressions and gestures to the AV-incongruent task.
interpret the moods and feelings of people with whom they The use of congruent and incongruent AV pairings in the
interact. Thus, the reduced sensitivity to emotional prosody present study provided an interesting opportunity to obtain
does not seem to pose a problem to amusic individuals in real additional information about the neural mechanisms
underlying visual enhancement of auditory information in
ISBN 1-876346-65-5 © ICMPC14 517 ICMPC14, July 5–9, 2016, San Francisco, USA
amusics. A number of EEG studies have reported that multisensory stimuli that are incongruent in spatial location or temporal synchrony elicit an N2 component, a negative-polarity event-related potential (ERP) component with a latency of about 200 msec after multisensory stimulus onset (Forster & Pavone, 2008; Lindstrom, Paavilainen, Kujala, & Tervaniemi, 2012). This component is strongly associated with conflict detection at both a response level and a stimulus-representation level (Yeung, Botvinick, & Cohen, 2004), and its amplitude is typically larger following incongruent than congruent stimuli (Nieuwenhuis, Yeung, van den Wildenberg, & Ridderinkhof, 2003). We reasoned that if amusic individuals tend to ignore audiovisual conflicts because of an over-reliance on visual information, then the N2 response to incongruent stimuli may be attenuated for amusic individuals but not for control participants.

II. MATERIALS AND METHOD

A. Participants
Sixteen individuals with congenital amusia and 16 control participants took part in the present study. Participants were diagnosed as congenital amusics when their composite scores on the melodic subtests of the Montreal Battery of Evaluation of Amusia (MBEA; Peretz, Champod, & Hyde, 2003) were equal to or less than 65 out of 90 points, that is, 72% correct (Liu et al., 2010). All participants were right-handed and had normal hearing and normal or corrected-to-normal vision. None reported any auditory, neurological, or psychiatric disorder. The amusic and control groups were matched in terms of age, gender, and education (all p > 0.05). The study was approved by the Macquarie University Ethics Committee, and written informed consent for participation was obtained from all participants prior to testing.

B. Stimuli
The auditory stimuli comprised 800 msec tones, which had a flute timbre and were generated with the computer software GarageBand (Version 6.0.4, Apple Inc., USA). The visual stimuli were white dots (60 × 60 pixels; screen resolution: 1980 × 1024 pixels) presented for 800 msec at the center of a black computer background. Each trial contained two stimuli. We refer to the first and second stimulus in a pair as the standard and target, respectively. The auditory standard could be one of the following tones of the C major scale: C (261.63 Hz), D (293.66 Hz), E (329.63 Hz), F (349.23 Hz), and G (392 Hz). The pitch of the auditory target was shifted upward or downward by 3 or 4 semitones (small interval) or 8 or 9 semitones (large interval) with respect to the standard. The visual standard always appeared at the center of the computer screen, while the visual target could be shifted 300 pixels upward or downward along the vertical line in relation to the standard. Auditory and visual stimuli were presented simultaneously, and the auditory and visual targets could be congruent or incongruent in terms of the direction of their shift (e.g., congruent: both upward; incongruent: auditory upward, but visual downward).

C. Procedure
Each trial began with a fixation cross that appeared for 500-800 msec in the centre of the screen. Subsequently, the standard and target stimuli (each of 800 msec duration) were presented consecutively with a jittered inter-stimulus interval (ISI) of 300-500 msec. Participants were asked to judge whether the auditory target was shifted upward or downward with respect to the standard while ignoring the visual information. The inter-trial interval (ITI) was also jittered between 300-500 msec. Participants underwent 240 trials in total, that is, 60 trials per interval (small and large) and congruency (congruent and incongruent) condition. All trials were presented in randomized order. Participants were given a break after every 60 trials. They completed ten practice trials prior to each block. Feedback was provided in the practice trials but not in the actual experiment. Participants were seated approximately 50 cm from the computer screen in an electrically shielded and sound-attenuated room with dimmed light. All sounds were presented at a comfortable level via headphones.

D. EEG Recording
The EEG was recorded with a sampling rate of 1000 Hz from 32 Ag-AgCl electrodes placed according to the extended International 10-20 electrode system and mounted in a head-cap (EASYCAP GmbH, Germany). The left mastoid electrode served as the reference electrode during the recording. Four additional electrodes were employed to monitor horizontal and vertical eye movements. Electrode impedances were below 5 kΩ. Using a SynAmps 2.0 RT amplifier (Compumedics Neuroscan, USA), the EEG was filtered online with an analogue band-pass filter (0.05-200 Hz).

E. EEG Processing
Offline processing of the EEG was performed in MATLAB (R2013b; MathWorks, USA) using the EEGLAB toolbox (Delorme & Makeig, 2004). The raw data were first downsampled to 500 Hz and re-referenced to the average of the left and right mastoids. They were then segmented into one-second epochs extending from 200 msec before to 800 msec after the onset of the target. Trials with incorrect responses or in which the potential exceeded ±150 μV were excluded from further data processing. The mean of each epoch was removed (Groppe, Makeig, & Kutas, 2009) before an independent component analysis (ICA) was performed using the runica algorithm. We used an ICA-based method to identify and reject trials with unusual activity (Delorme, Makeig, & Sejnowski, 2001). In addition, we employed the EEGLAB plugin ADJUST to automatically reject ocular ICs (Mognon, Jovicich, Bruzzone, & Buiatti, 2011). After IC rejection, the EEG data were low- and high-pass filtered with a windowed-sinc FIR filter (Widmann & Schröger, 2012) with cut-off frequencies of 30 Hz (Blackman window; filter order: 275) and 0.5 Hz (Blackman window; filter order: 2750), respectively. Epochs were subsequently averaged per participant, condition and task, and baseline corrected by subtracting the mean of a 200 msec pre-stimulus interval.

F. ERP Analysis
The main analysis focused on the comparison of the N2 components between the amusic and control groups. Based on visual inspection of the grand averages and previous studies, the ERP deflection within the time window of 260-380 msec after stimulus onset was selected, encompassing the N2-P3
complex. For each participant and condition, we computed the mean amplitudes within the pre-defined time window. Furthermore, the scalp electrodes were grouped into two regions of interest (ROIs): anterior (FP1, FP2, F3, FC3, F7, FT7, F4, FC4, F8, and FT8) and posterior (O1, O2, P3, CP3, P7, TP7, P4, CP4, P8, and TP8). The mid-coronal electrodes (T7, C3, CZ, C4, and T8) were analyzed separately.

III. RESULTS

A repeated-measures ANOVA was conducted on the mean percentage correct (PC), with the between-subject factor Group (amusic and control) and the within-subject factors Interval (large and small) and Congruency (congruent and incongruent). Similar analyses were conducted on the mean ERP amplitudes with the additional factor ROI (anterior and posterior). This factor was excluded from the analysis of the mid-coronal electrodes.

A. Behavioural Results
We found a significant interaction between the factors Congruency and Group, F(1, 30) = 10.24, p < 0.01, ηp² = 0.25. Examination of this interaction revealed that when the visual change in direction was incongruent with the pitch change, the performance of amusics was significantly worse than that of control participants in both the small-interval (amusics: M = 0.72, SE = 0.02; controls: M = 0.86, SE = 0.03) and large-interval conditions (amusics: M = 0.92, SE = 0.01; controls: M = 0.96, SE = 0.02), F(1, 30) = 12.63, p = 0.001, ηp² = 0.30. However, when the visual and auditory information was congruent, amusics (small: M = 0.88, SE = 0.02; large: M = 0.97, SE = 0.01) performed just as well as controls (small: M = 0.93, SE = 0.02; large: M = 0.98, SE = 0.01), F(1, 30) = 3.64, p = 0.07, ηp² = 0.11.

B. ERP Results
We focus on the results from the ROI analysis, as the analysis of the mid-coronal electrodes yielded largely similar results. For the selected time window, that is, the N2-P3 complex, the statistical analysis yielded a marginally significant interaction between Congruency and Group, F(1, 30) = 3.89, p = 0.06, ηp² = 0.12. It should be noted that this interaction was significant at the mid-coronal electrodes, F(1, 30) = 6.25, p < 0.05, ηp² = 0.17, as shown in Figure 1. For the control group, the negativity of the N2-P3 complex was larger for incongruent trials (M = 2.03 μV, SE = 0.51 μV) than for congruent trials (M = 3.01 μV, SE = 0.40 μV), F(1, 30) = 11.68, p < 0.01, ηp² = 0.28. For the amusic group, this enhanced negativity of the N2-P3 complex for incongruent trials was not found (incongruent: M = 1.81 μV, SE = 0.51 μV; congruent: M = 1.99 μV, SE = 0.40 μV), F(1, 30) = 0.40, p = 0.53, ηp² = 0.01.

IV. DISCUSSION

Congenital amusia is characterized by a variety of deficits in musical perception. The present study tested the hypothesis that these auditory-modality processing deficits result in a tendency for amusic individuals to rely on visual information in contexts like interpersonal conversations, when both auditory and visual cues to emotional prosody are simultaneously available.

Indeed, our data revealed that amusics' ability to identify the pitch contour was comparable to controls' when congruent visual information was supplied. However, amusics' performance deteriorated significantly when the visual information was incongruent with the auditory information. This suggests that amusic participants tend to be misled by the visual stimuli even when the visual information is task-irrelevant. In other words, amusics make use of available visual cues in order to compensate for their difficulties in pitch perception. This tendency of amusics to use visual cues is compatible with the "optimal-integration hypothesis," which suggests that perceivers are more likely to rely on one modality over the other depending on how reliable the information is (Ernst & Bulthoff, 2004; Ernst & Banks, 2002). For amusic individuals, their pitch impairment means that auditory information is not
reliable. Therefore, amusic individuals tend to use contextual or facial cues to boost their auditory perception (Albouy et al., 2015; Thompson, Marin, & Stewart, 2012). Regarding the ERP results, control participants exhibited an increase in the amplitude of the N2-P3 complex for incongruent relative to congruent AV pairings, suggesting conflict-detection processing even when the visual information was task-irrelevant (Forster & Pavone, 2008; Lindstrom et al., 2012; Nieuwenhuis et al., 2003; Yeung et al., 2004). By contrast, amusics failed to show a conflict response in this task, as reflected by the absence of an N2 effect. These findings, in line with our behavioural data, therefore support our idea that amusics focus on visual information to complete the auditory task, thereby ignoring AV conflicts.
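The dependent measure behind these ERP comparisons (the mean amplitude within the 260-380 msec post-stimulus window, computed per condition) can be sketched in Python with numpy; the array names and demo data below are illustrative assumptions, not the study's recordings:

```python
import numpy as np

def mean_amplitude(epochs, times, t_min=0.26, t_max=0.38):
    """Mean amplitude (in microvolts) over a fixed post-stimulus window, per epoch.

    epochs: (n_epochs, n_samples) baseline-corrected data from one channel.
    times:  (n_samples,) time axis in seconds relative to target onset.
    """
    window = (times >= t_min) & (times <= t_max)
    return epochs[:, window].mean(axis=1)

# Illustrative epochs: 500 Hz sampling, -0.2 to 0.8 s, matching the processing above.
times = np.arange(-0.2, 0.8, 1 / 500.0)
rng = np.random.default_rng(0)
congruent = rng.normal(3.0, 1.0, (60, times.size))     # simulated trials, one subject
incongruent = rng.normal(2.0, 1.0, (60, times.size))

# Per-condition means of the kind that would enter the Congruency x Group ANOVA.
m_congruent = mean_amplitude(congruent, times).mean()
m_incongruent = mean_amplitude(incongruent, times).mean()
```

In a real analysis these per-condition means would be computed from the averaged, baseline-corrected epochs per participant before the repeated-measures ANOVA.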
Figure 1. Grand-averaged ERPs of the amusic and control groups at electrode CZ in response to auditory stimuli with congruent (solid line) and incongruent (dashed line) visual stimuli. The significant congruency effect is highlighted with a red bar, and the non-significant congruency effect with a yellow bar.

V. CONCLUSION

In summary, the present study is the first ERP study showing that individuals with congenital amusia rely heavily on unattended visual information when performing an auditory task, due to their deficits in auditory processing, providing a theoretical basis for using visual information to improve amusics' auditory perception.
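As a numerical footnote to the Stimuli section (II.B): in equal temperament, a target shifted by n semitones from a standard of frequency f has frequency f * 2^(n/12). A quick check against the frequencies listed in the paper:

```python
# Equal-tempered pitch shift: n semitones above (+) or below (-) a standard tone.
def shift(f_standard, n_semitones):
    return f_standard * 2 ** (n_semitones / 12)

# Standard tones of the C major scale used in the study (Hz).
standards = {"C4": 261.63, "D4": 293.66, "E4": 329.63, "F4": 349.23, "G4": 392.00}

# e.g. a large-interval target 8 semitones above C4 lies near 415.3 Hz (G#4),
# and shifting D4 down 2 semitones recovers C4.
```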
Engagement with music is complex, influenced by music training and capacity, affective
style, music preferences, and motivations. Methods to assess music engagement have
included auditory tests, psychometric questionnaires and experience sampling methods.
Limitations of previous research, however, include: an absence of a comprehensive
measure that assesses multiple aspects of music engagement; limited evidence of
psychometric properties; and a bias towards equating ‘engagement’ with formal music
training, overlooking other forms of strong music engagement. In this paper, we describe
a newly developed and psychometrically tested modular instrument which assesses a
diverse set of music engagement constructs. The MUSEBAQ can be administered in full,
or by module as relevant for specific purposes. In 3 separate studies, evidence was
obtained from over 3000 adults (aged 18-87; 40% males) for its structure, validity and
reliability. Module 1 (Musicianship) provides a brief assessment of formal and informal
music knowledge and practice. Module 2 (Musical Capacity) measures emotional music
sensitivity (α=.90), listening sophistication (α=.76), indifference to music (α=.59), music
memory and imagery (α=.81), and personal commitment to music (α=.80). Module 3
(Music Preferences) classifies preferences into six broad genres: rock or metal, classical,
pop or easy listening, jazz, blues, country or folk, rap or hip-hop, and dance or electronica.
Online administration uses adaptive reasoning to selectively expand sub-genres, while
minimizing time demands. Module 4 (Reasons for Music Use) assesses seven
motivations for using music: musical transcendence (α=.90), emotion regulation (α=.94),
social, music identity and expression (α=.90), background (α=.80), attention regulation
(α=.69), and physical (α=.71). The comprehensiveness, yet flexibility, of the MUSEBAQ
makes it an ideal questionnaire to use in research requiring a robust measure of music
engagement.
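The per-subscale reliability values quoted above are Cronbach's α; a minimal numpy implementation, assuming a respondents-by-items score matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items in the subscale
    item_vars = items.var(axis=0, ddof=1).sum() # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)
```

Perfectly inter-correlated items yield α = 1; uncorrelated items drive α toward 0.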
Music preference has been related to individual differences like social identity, cognitive
style, and personality, but preference can be difficult to quantify. Self-report measures
may be too presumptive of shared genre definitions between listeners, while listener-ratings of expert-selected music may fail to reflect typical listeners' genre-boundaries.
The current study aims to address this by using crowd-tagging to select music for
studying preference. For the current study, 2407 tracks were collected and subsampled
from the Last.fm crowd-tagging service and the EchoNest platform based on attributes
such as genre, tempo, and danceability. The set was further subsampled according to
tempo estimates and metadata from EchoNest, resulting in 48 excerpts from 12 genres.
Participants (n=210) heard and rated the excerpts, rated each genre using the Short Test
of Music Preferences (STOMP), and completed the Ten-Item Personality Index (TIPI).
Mean ratings correlated significantly with STOMP scores (r = .37-.83, p < .001),
suggesting that crowd-sourced genre ratings can provide a fairly reliable link between
perception and genre-labels. PCA of the ratings revealed four musical components:
'Danceable,' 'Jazzy,' 'Hard,' and 'Rebellious.' Component scores correlated modestly
but significantly with TIPI scores (r = -.14 to .20, p < .05). Openness related positively to
Jazzy scores but negatively to Hard scores, linking Openness to liking of complexity.
Conscientiousness related negatively to Jazzy scores, suggesting easy-going listeners
more readily enjoy improvisational styles. Extraversion related negatively to Hard scores,
suggesting extroverts may prefer more positive valences. Agreeableness related
negatively to Rebellious scores, in line with agreeable peoples’ tendency towards
cooperation. These results support and expand previous findings linking personality and
music preference, and provide support for a novel method of using crowd-tagging in the
study of music preference.
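The pipeline implied above (principal components of the excerpt ratings, then Pearson correlations between component scores and trait scores) can be sketched with numpy; the function names and matrix shapes are illustrative assumptions, not the study's code:

```python
import numpy as np

def pca_scores(ratings, n_components=4):
    """Principal-component scores of a (listeners x excerpts) rating matrix."""
    X = ratings - ratings.mean(axis=0)            # column-center the ratings
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T                # project onto top components

def pearson_r(x, y):
    """Pearson correlation between two 1-D arrays."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))
```

Each listener's component scores could then be correlated with, say, a TIPI trait vector via `pearson_r`.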
What kinds of people become musical? Research has begun to examine which individual
difference variables predict aspects of musical expertise, such as duration of music
training. However, playing a musical instrument is not the only way to exhibit musicality.
The recently developed Goldsmiths Musical Sophistication Index (Gold-MSI) suggests
that musicality can be conceptualized more broadly and comprises 1) active engagement
(e.g., attending concerts), 2) perceptual abilities, 3) formal music training, 4) singing
ability, and 5) emotional responses to music. In the current study, we compared
predictors of duration of playing a musical instrument regularly with predictors of
musical sophistication. We also tested previously unexamined predictor variables such as
creativity and personality facets (six facets for each of the Big Five domains). We
administered tests of personality (IPIP-NEO Short Form; Grit-Short Form), nonverbal IQ
(modified Raven’s Matrices), creativity (Guilford’s Alternative Uses), and musical
sophistication (Gold-MSI) to 209 undergraduate students. Musical background and
demographic information (e.g., age, gender, parents' education, involvement in non-musical extracurricular activities) were also collected. Predictor variables that were
significantly associated with each outcome variable were entered into multiple regression
analyses. Results revealed that the predictor variables explained 19.1% of the variance in
duration of playing music regularly and 32.9% of the variance in musical sophistication.
Notable findings were that creativity was the strongest predictor of duration of playing
music regularly, and that neither cognitive measure (non-verbal IQ, creativity) predicted
musical sophistication. Our findings to date suggest that there are both overlapping as
well as unique predictors of duration of playing music regularly compared to musical
sophistication.
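The variance-explained figures (19.1% and 32.9%) come from multiple regression; a minimal sketch of computing R² for a set of predictors, assuming numpy and an added intercept term:

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y explained by a least-squares fit on predictors X.

    X: (n, p) predictor matrix; y: (n,) outcome (e.g., Gold-MSI score).
    """
    X1 = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()
```

With the study's predictors (personality facets, grit, nonverbal IQ, creativity, demographics) in X, this quantity corresponds to the reported percentages divided by 100.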
Background
How can inter-individual differences in the preference for musical styles/genres be
predicted? Age, gender, personality traits, social and socio-cultural variables have been
discussed as the most promising variables for making good predictions. Above all,
scholars have usually ascribed the largest part of the variance to inter-individual
differences in personality traits. Consequently, there has been a large series of empirical
studies on the relationship between personality traits and musical style/genre preferences.
Aims
Despite the large number of studies, the impression remains that empirical findings
are not quite consistent and that effect sizes are rather small. To examine the
issue more thoroughly, the results of previous studies were collected and meta-analyzed
in order to obtain reliable estimates of the effects.
Methodology
The Big 5 personality traits were most often used in previous studies and were therefore
chosen as the most appropriate categorization of personality traits in the meta-analysis.
The Short Test of Music Preferences (STOMP) with its five dimensions was most often
used as a categorization of musical style/genre preferences and therefore used in the
meta-analysis. Hence, studies were included in the analysis when they had investigated
the relationship between at least one of the Big 5 personality traits and at least one of the
five dimensions of the STOMP. Thus, in total, there were 25 sub-analyses.
Results
All weighted averaged correlation coefficients were very small; most of them near zero.
Only four of the 25 coefficients were at least r = .10. The largest effects were observed
for the openness-to-experience personality trait that exhibited small correlations (between
.12 and .22) with three musical styles.
Conclusion
Thus, personality traits cannot substantially account for the inter-individual variance in
musical preferences. Musical functions are discussed as an alternative explanation for
that variance.
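The abstract does not state how the correlations were averaged across studies; a standard meta-analytic choice, sketched here purely as an assumption, is to average Fisher-z-transformed correlations weighted by n - 3 and back-transform the result:

```python
import numpy as np

def weighted_mean_r(rs, ns):
    """Sample-size-weighted average correlation via Fisher's z transform.

    rs: per-study correlation coefficients; ns: per-study sample sizes.
    """
    rs, ns = np.asarray(rs, float), np.asarray(ns, float)
    z = np.arctanh(rs)                  # Fisher r-to-z
    w = ns - 3                          # inverse variance of z under normality
    z_bar = (w * z).sum() / w.sum()
    return np.tanh(z_bar)               # back-transform to the r scale
```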
experiences. Therefore, it was decided to build the chart as a synoptic diagram (see Figure 2) in order to stimulate the visual sense. It aims to promote a representative analysis through geometrical shapes. In this case, triangles mark the beginning and the end of a particular note (pointing up and down, respectively), and circles mark the pitch precision. The colors of these shapes indicate the accuracy of each note parameter. These colors are complementary in the RGB model and antagonistic to each other: green as a "cold" color, and red as a "warm" one. This contraposition is in agreement with the contrast key suggested by Travis (2011). Since red has the longest wavelength and lowest frequency in the visible chromatic spectrum, it is the color that first impacts the human eye and thus the one that first reaches the visual sense (Farina et al., 2006, p. 92). Eva Heller (2012) considers that "warm" colors cause anxiety, heat and excitation. In this way, the color red has the potential to draw attention and signal something wrong, implying an understanding of "mistake". The color green, which is complementary and opposite to red in the RGB model, counteracts the meanings attributed to red, implying ideas of quietness and well-being in the viewer, conveying an understanding of "success". When the proposed system is unable to find a confident response, it draws the respective symbol in gray, a neutral tone (Farina et al., 2006, p. 98) that indicates resignation or the absence of something; in this case, it signals the system's indecision. Figure 2 shows a screenshot of the proposed visual feedback.

Figure 2: Visual feedback. Triangles mark the beginning and the end of a particular note (pointing up and down, respectively), and circles mark the pitch precision. The colors of these shapes indicate the accuracy of each note parameter.

III. RESULTS

A. Dataset
The data used in this work were captured between 2013 and 2015, when sight-singing exercises were recorded by students and professors of the undergraduate music course of the Federal University of Rio Grande do Sul (Brazil), providing useful input data for our experiments and evaluation. During this period, a total of 93 audio examples were recorded (based on several solfège exercises). The recorded examples contain solfège exercises based on diatonic scales and all melodic intervals of the chromatic scale, associated with durations of half, quarter and eighth notes. From this initial dataset, we kept the recordings belonging to a subset of the original singers, covering the largest intersection of examples and, at the same time, the most frequent types of exercises. At the end of this screening process, our dataset contained a total of 66 recordings with examples from six distinct music scores and fourteen distinct singers (age range from 19 to 58 years). Later, a committee of five expert listeners, each with more than ten years of experience in solfège assessment, annotated each sung note in the final dataset. As a result of this process, each sung note received labels ("correct" or "incorrect") from each evaluator, independently for the parameters pitch, onset and offset. The final dataset contains a total of 2116 evaluated music notes (an average of 423.2 notes per expert). The ground truth for each of these sung notes is obtained by labeling each respective note parameter as correct or incorrect by majority vote.

It is worth noting that the evaluation was performed in two distinct contexts: a) 14% of our samples were recorded with tempo variation by meter-mimicking; their assessment therefore concerns the coherence between the hand position and the note onset/offset timestamps. b) 86% of the samples were recorded in synchronization with a metronome at constant tempo; in this case, the assessment is performed with respect to the auditory stimulus only. The disagreement among the experts was kept to model the Bayesian classifier.

B. Data Analysis
Aiming to compare the two assessment contexts, a statistical analysis was performed on the voting process of the experts' committee. We found that, among the music notes with a majority of votes for the classes "correct" and "incorrect", 10-15% received only 3 of 5 votes in agreement, independently for both approaches (with and without meter-mimicking). This indicates an arguable degree of doubt among the expert listeners but also quite consistent behavior (similar percentages), independently of the audio or audiovisual stimulus.

The final automated note-by-note assessment is performed by a Bayesian classifier that was trained using the annotated dataset, including the votes in disagreement among the experts. Figures 3 and 4 show the histograms of the deviations (distance between the measured sung note and the expected note defined in the music score) for the pitch, onset and offset parameters, estimated from the annotated dataset (for the correct and incorrect labels) using constant tempo and varying tempo through meter-mimicking, respectively. We modeled the decision boundaries (Bayes decision rule) of the probabilistic classifiers by fitting these data to Gamma density distributions. The accuracy of the solfège assessment achieved by our system was evaluated using 10-fold cross-validation with the Bayes rejection rule (Webb, 2011), discarding 15% of the samples. The final average accuracy over the three parameters is approximately 88.5%.

We also investigated human perception tolerance by mapping the expert listeners' assessments onto the Bayesian decision points for each note parameter. Figure 5 illustrates these points as well as the final trained Bayesian classifiers for the pitch, onset and offset parameters. As can be seen, the decision points (upright dashed lines) estimated
using the auditory stimulus only (top) differ from the decision points estimated using the audiovisual stimulus (bottom).
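The classifier just described (Gamma densities fitted to the deviation histograms of the "correct" and "incorrect" classes, with the class boundary given by the Bayes decision rule) can be sketched in Python; the method-of-moments fit and all parameter values below are illustrative assumptions, not the authors' implementation:

```python
import math
import numpy as np

def fit_gamma_mom(x):
    """Method-of-moments Gamma fit: shape k = mean^2/var, scale theta = var/mean."""
    m, v = np.mean(x), np.var(x)
    return m * m / v, v / m

def gamma_pdf(x, k, theta):
    """Gamma probability density with shape k and scale theta."""
    return x ** (k - 1) * np.exp(-x / theta) / (math.gamma(k) * theta ** k)

def decision_point(params_correct, params_incorrect, prior_correct=0.5):
    """Deviation at which the 'incorrect' class first becomes the more likely one."""
    grid = np.linspace(1e-6, 2.0, 2000)
    p_c = gamma_pdf(grid, *params_correct) * prior_correct
    p_i = gamma_pdf(grid, *params_incorrect) * (1 - prior_correct)
    return grid[np.argmax(p_i > p_c)]      # first grid value where incorrect wins
```

A rejection rule of the kind the authors use would additionally withhold a decision when the two scaled densities are too close to call.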
control the music tempo with meter-mimicking gestures. The assessment is done by a set of Bayesian classifiers that create probabilistic regions of acceptance (correct class) and rejection (incorrect class), emulating human perception of musical parameters (pitch, onset, and offset). Specifically, the Bayesian classifiers were built by fitting Gamma density distributions to the histograms obtained from measures of deviation between the detected sung-note parameters and the respective note parameters specified in the music score of the sight-singing exercise. The proposed system can perform the sight-singing assessment, note by note, in 85% of the cases, with an average accuracy of approximately 88.5%. This work provides a reliable and automated alternative for sight-singing assessment, which can be useful in the context of self-education and distance learning. Visual feedback based on user experience and usability theories was proposed to aid solfège study; further research is still needed to measure and evaluate the impact of the user interface design on the student learning process. Our experiments have also shown that the human evaluators (expert listeners) change their judgment criteria depending on the type of stimulus. Although formal statistical tests confirmed this behaviour, a deeper analysis is still pending to determine whether its cause is the difficulty of visual analysis (audio only versus audiovisual) or the relaxation of musical tempo expectations generated by the meter-mimicking conducting. In future work, we plan to address these questions by implementing more extensive tests based mainly on human perceptual experiments.

REFERENCES

Allanwood, G., & Beare, P. (2014). User experience design: Creating designs users really love. London: Bloomsbury Publishing.
Farina, M., Perez, C., & Bastos, D. (2006). Psicodinâmica das cores em comunicação. São Paulo: Edgar Blücher.
Heller, E. (2012). A Psicologia das Cores: Como as cores afetam a emoção e a razão (1st ed.). São Paulo: GG Editor.
ISO. (1998). Ergonomic requirements for office work with visual display terminals (VDTs), Part 11: Guidance on usability (ISO 9241-11:1998E). Geneva: International Standards Organization.
Maes, P., Amelynck, D., Lesaffre, M., Leman, M., & Arvind, D. K. (2013). The “Conducting Master”: An interactive, real-time gesture monitoring system based on spatiotemporal motion templates. International Journal of Human-Computer Interaction, 29(7), 471–487.
Mauch, M., & Dixon, S. (2014). pYIN: A fundamental frequency estimator using probabilistic threshold distributions. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (pp. 659–663). Florence: IEEE.
Molina, E., Barbancho, I., Gómez, E., Barbancho, A. M., & Tardón, L. J. (2013). Fundamental frequency alignment vs. note-based melodic similarity for singing voice assessment. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 744–748).
Ryynänen, M. P., & Klapuri, A. P. (2008). Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), 72–86.
Santaella, L. (2000). A Teoria Geral dos Signos: Como as linguagens significam as coisas. São Paulo: Pioneira Editor.
Schramm, R., Jung, C. R., & Miranda, E. R. (2015a). Dynamic time warping for music conducting gestures evaluation. IEEE Transactions on Multimedia, 17(2), 243–255.
Schramm, R., de Nunes, H. S., & Jung, C. R. (2015b). Automatic solfège assessment. In International Society for Music Information Retrieval Conference, Málaga, 27 October 2015.
Travis, D. (2011). A CRAP way to improve usability. In Bright ideas for user experience designers. Userfocus. Retrieved May 2016 from <http://www.userfocus.co.uk/pdf/Bright_Ideas_for_UX_Designers.pdf>
Webb, A. R. (2011). Statistical pattern recognition (3rd ed.). Chichester, UK: Wiley.
Zahorian, S. A., & Hu, H. (2008). A spectral/temporal method for robust fundamental frequency tracking. The Journal of the Acoustical Society of America, 123(6), 4559–4571.
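As an illustration of the assessment idea described in this paper (fitting Gamma densities to histograms of note-deviation measures and classifying a sung note by its likelihood under the "accept" versus "reject" density), here is a minimal sketch. The training deviations, the method-of-moments fit, and the two-class likelihood rule are assumptions for illustration, not the authors' actual implementation:

```python
import math

def fit_gamma(samples):
    # Method-of-moments fit of a Gamma(shape k, scale theta) density.
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean * mean / var, var / mean  # (k, theta)

def gamma_pdf(x, k, theta):
    return (x ** (k - 1)) * math.exp(-x / theta) / (math.gamma(k) * theta ** k)

# Hypothetical training data: pitch deviations (in semitones) for sung notes
# that expert listeners accepted vs. rejected.
accepted_devs = [0.05, 0.08, 0.10, 0.12, 0.15, 0.20, 0.25, 0.30]
rejected_devs = [0.60, 0.75, 0.90, 1.00, 1.10, 1.30, 1.50, 1.80]

k_a, t_a = fit_gamma(accepted_devs)
k_r, t_r = fit_gamma(rejected_devs)

def assess(deviation):
    # Accept the sung note if it is more likely under the "accepted" density.
    if gamma_pdf(deviation, k_a, t_a) >= gamma_pdf(deviation, k_r, t_r):
        return "correct"
    return "incorrect"

print(assess(0.1))  # -> correct (small deviation falls in the acceptance region)
print(assess(1.2))  # -> incorrect (large deviation falls in the rejection region)
```

In a full system the same scheme would be applied per parameter (pitch, onset, offset) with densities fitted separately for each.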
The risk of noise-induced hearing loss, which results from repeated exposure to loud
sounds, is a concern for those in the field of music, such as professional musicians and
music educators. Many musicians may be reluctant to wear earplugs to protect their
hearing, due to lack of comfort and inability to hear instrument sound properly. The
purpose of this study was to determine the effect of musicians’ earplugs on instrumental
performance and the perception of tone quality, intonation, and dynamic contrast.
Participants (N = 96) were studio faculty teachers (n = 8) and undergraduate music
education students (n = 88), from a large state university school of music. Faculty
teachers were recorded performing musical passages first without earplugs and then with
musicians’ earplugs. Objective data were collected to compare recorded frequencies with
standard frequencies of single pitches recorded with and without earplugs. A perception
test was created using individual audio tracks with pairs of the same performer and
performance material, with and without musicians’ earplugs. Participants listened to each
pair of phrases to determine whether the second phrase differed from the first with regard to tone quality, intonation, and dynamic contrast. Acoustical analyses of recordings made
by faculty with and without earplugs indicate that pitch accuracy did not consistently
favor either condition. Preliminary results from the perception test indicate that although
both faculty and student listeners perceived some differences, the most frequent
perception was that the audio pair was equal, and there was no clear advantage between
performing with and without earplugs in terms of tone quality, intonation, and dynamic
contrast. The implications of these findings suggest that musicians could feel confident
that wearing musicians’ earplugs while performing may not adversely affect pitch
accuracy or listeners’ perceptions of their timbre and dynamic control.
ISBN 1-876346-65-5 © ICMPC14 534 ICMPC14, July 5–9, 2016, San Francisco, USA
Shinichi Suzuki emphasized fostering kindness and sensitivity to others as part of music
instruction, and group music making impacts children’s empathy and social skills. We
investigated whether six months of Suzuki group music lessons increased children’s
empathy, as assessed by the Griffith Empathy Measure (GEM). Parents of students in a
Suzuki instrumental music program (N = 48) completed the GEM at the start of their
lesson year, and again six months later, and were also surveyed about their values related
to music and family sociodemographic data at Time 1. A 2 × 3 (time × group persistence) ANOVA revealed no significant change in empathy from Time 1 to Time 2.
However, we noted differing trends, consistent with our predictions, among those who
did or did not continue group lessons: Mean empathy scores for children who stayed in
lessons increased from M = 2.44 to M = 2.67, while scores for those who did not persist
(or did not participate at all) decreased from M = 1.71 to M = 1.58 over the same period.
The analysis revealed a significant between-subjects effect for group participation (p =
.004), and post-hoc analyses showed a significant difference in Time 1 empathy scores
between children who subsequently persisted in group class and those who never enrolled
(p = .003). These pre-existing differences in our sample are consistent with Corrigall and
Schellenberg’s (2015) observation that child personality predicts music training duration.
Children’s empathy score at Time 1 was also significantly associated with parents’ rating
that music was important (ρ = .54, p = .002), and that Suzuki instruction in particular was
important (ρ = .54, p = .003), consistent with Corrigall and Schellenberg’s finding that
music training is predicted by parent personality (2015). We conclude that individual
differences play an important role in music education, and continue to explore group
music and empathy in an ongoing study of Suzuki programs across North America.
Background
Music training has been associated with enhanced cognitive control, which refers to the
cognitive processes enabling self-regulation of behavior. However, the precise
association between music training and cognitive control remains unclear. Possibly, this
stems from studies using different types of tasks that measure different aspects of
cognitive capacities (linguistic, visuospatial, etc.) or that rely on different response
modes.
Aims
We investigated whether the enhanced performance by musicians on cognitive control
tasks depends on the type of task.
Method
Eighty-three participants (9–11 years old) were tested: 39 monolingual musically trained children and 44 monolingual non-musically trained children. Information on confounders, such as socio-economic status, verbal and non-verbal IQ, and personality, was measured to ensure group equality. To assess cognitive control, both groups
underwent the Simon and the Stroop task, measuring reaction times (RTs) and error rates
on congruent and incongruent trials.
Results
There were no differences between groups on the background variables. Concerning the
RTs on the Simon task, a significant interaction of group by congruency emerged (one-
tailed), such that musicians had a smaller interference effect than controls. For the error
rates, there were no differences between groups. Concerning the Stroop task, no
differences between the musicians and the controls were found on the RTs and the error
rates.
Conclusion
Our results suggest that the cognitive control benefit for musicians could depend on the
cognitive control task. A potential explanation for these findings is that musicians are
better at processing spatial information, which is an important factor in the Simon task,
but not in the Stroop task.
Specific Mathematical and Spatial Abilities Correlate with Music Theory Abilities
Nancy Rogers,*1 Jane Piper Clendinning,*2 Sara Hart,#3 and Colleen Ganley#4
*College of Music, Florida State University, USA
#Department of Psychology, Florida State University, USA
1nancy.rogers@fsu.edu, 2jclendinning@fsu.edu, 3shart@fcrr.org, 4ganley@psy.fsu.edu
II. METHOD

Rather than relying solely on broad measures such as standardized exams to represent math ability, we administered a battery of tests addressing basic numeracy, spatial skills, and pattern recognition, as well as surveys addressing anxiety and confidence. Just as importantly, we did not assume that musical ability is well represented by years of private study, hours of practice time, or public concerts. We focused instead on performance in music theory classes and on music theory exams, because — as an academic study of music — music theory grades are particularly likely to reflect the underlying cognitive processing and affective factors that may link achievement in mathematics and music.

A. Participants

This study included 99 participants (60 female, 38 male; one declined to answer), all of whom were recruited from beginning music theory courses at Florida State University. Most (86) were enrolled in Music Theory 1, the first course in the four-semester music theory sequence required for music majors; the remaining 13 were enrolled in Fundamentals for Music Majors, a remedial course that precedes Music Theory 1. Their mean age was 18.4 (SE = .12). Participants granted permission to obtain their music theory grades, entrance exam scores, and standardized test scores. (Data from several additional participants who withheld their permission have not been included in this report.) They were awarded extra credit for their participation.

B. Procedure and Apparatus

Figure 1. A sample item from the Mental Rotation Test.

…violated it. Participants were asked to identify the image that did not match the other five. Samples are provided in Figure 2.

3) Approximate Number System Test. Participants completed the 120-item Panamath Test (www.panamath.org), a freely available web-based test of the Dots Task, a measure of the approximate number system (Halberda, Mazzocco, & Feigenson, 2008). The Dots Task measures an individual’s ability to understand and manipulate numerical quantities non-symbolically. The process underlying this non-symbolic number sense is called the Approximate Number System, an intuitive recognition of number. Participants were shown collections of intermixed blue and yellow dots, with five to twenty dots per color, for 600 ms and determined whether there was a larger number of yellow dots or a larger number of blue dots. They were instructed to ignore the size of the dots, focusing solely on the quantity. A representative image is provided in Figure 3. The accuracy and response time for each trial were recorded and converted into a participant’s Weber fraction (w-score), which represents the smallest ratio that a given individual can accurately discriminate.
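The conversion from Dots Task trials to a Weber fraction can be sketched as follows. This is a hypothetical illustration, not Panamath's actual implementation: the trial data, the grid-search range, and the standard linear ANS model (each numerosity n represented as a Gaussian with SD = w·n) are all assumptions:

```python
import math

def p_correct(n1, n2, w):
    # Predicted accuracy under a linear ANS model: the difference n1 - n2 is
    # Gaussian with SD w * sqrt(n1^2 + n2^2), so accuracy is the normal CDF
    # of the resulting discriminability z.
    z = abs(n1 - n2) / (w * math.sqrt(n1 * n1 + n2 * n2))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def estimate_w(trials):
    # Grid-search maximum-likelihood estimate of the Weber fraction w.
    best_w, best_ll = None, -math.inf
    for i in range(1, 200):
        w = i / 100.0  # candidate w from 0.01 to 1.99
        ll = 0.0
        for n1, n2, correct in trials:
            p = min(max(p_correct(n1, n2, w), 1e-9), 1.0 - 1e-9)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_w, best_ll = w, ll
    return best_w

# Hypothetical trials: (yellow dots, blue dots, answered correctly?)
trials = [(10, 12, True), (14, 16, True), (9, 10, False), (15, 16, False),
          (8, 12, True), (10, 15, True), (12, 13, False), (18, 20, True)]
print(estimate_w(trials))
```

A smaller w means finer discrimination: easy ratios (e.g., 8 vs. 12) are answered correctly at almost any plausible w, while near-equal pairs (e.g., 15 vs. 16) drive the estimate.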
…saw images of common musical symbols for 500 ms. Some images were correct, and others had been distorted (e.g., written backwards). Participants were asked to determine whether or not the image represented correct standard musical notation. Samples are provided in Figure 4.

Figure 4. Four sample items from the individual symbol portion of the Standard Music Notation Recognition Test.

For key signatures, participants saw images for 500 ms. Some images were correct, and others had been distorted (e.g., accidentals were written in the wrong order or in the wrong register). Participants were asked to determine whether or not the image represented correct standard musical notation. Samples are provided in Figure 5.

For rhythmic notation, participants saw images of a one-measure rhythm in either 3/4 or 6/8 for 500 ms. Some images beamed the eighth-notes correctly, while others beamed inappropriate combinations of eighth-notes. Participants were asked to determine whether or not the image represented correct standard musical notation. Samples are provided in Figure 6.

Figure 6. Two sample items from the rhythmic notation portion of the Standard Music Notation Recognition Test.

6) Triad Recognition Test. This researcher-developed test consisted of 10 items. Because some participants had no prior instruction in music theory, they were first provided with an explanation of a triad:

A triad contains three notes that can be arranged on a staff to fall on consecutive lines or consecutive spaces. This might involve changing octaves.

Clarifying illustrations showed a triad built on consecutive spaces, a triad built on consecutive lines, and two demonstrations in which notes were moved an octave in order to reveal the triadic structure. Participants then saw images of three pitches (without accidentals) on a single clef for 500 ms. Some of the images formed triads and others did not. Participants were asked to determine whether or not the images contained triads. Samples are provided in Figure 7.

Figure 7. Three sample items from the Triad Recognition Test.

7) Musical Pattern Continuation Test. This researcher-developed test consisted of six items. In this untimed test, participants were shown a melodic pitch pattern containing 7-10 notes. They were asked to determine which one of four possible continuations maintained the established pattern. A sample is provided in Figure 8.

Figure 8. A sample item from the Musical Pattern Continuation Test.
8) Math Confidence Assessment. This assessment (adapted from the Confidence Subscale of the Fennema-Sherman Math Attitudes Scale; Fennema & Sherman, 1976) consisted of seven statements. Participants were asked to indicate how much they agreed or disagreed with each statement on a seven-point scale (1 = “strongly disagree” and 7 = “strongly agree”). Two sample statements appear below.

I am sure I could do advanced work in math.
I am not good at math.

9) Math Anxiety Assessment. This assessment (Math Anxiety Rating Scale-Revised; Hopko, 2003) consisted of twelve situation descriptions. Participants were asked to indicate how much anxiety they would feel in each situation on a five-point scale (1 = “low anxiety” and 5 = “high anxiety”). Two samples appear below.

Thinking about an upcoming math test one day before
Watching a teacher work an algebraic equation on the blackboard

10) Music Theory Confidence Assessment. This assessment (adapted from the Confidence Subscale of the Fennema-Sherman Math Attitudes Scale; Fennema & Sherman, 1976) consisted of seven statements. Participants were asked to indicate how much they agreed or disagreed with each statement on a seven-point scale (1 = “strongly disagree” and 7 = “strongly agree”). Two sample statements appear below.

I am sure I could do advanced work in music theory.
I am not good at music theory.

11) Music Theory Anxiety Assessment. This assessment (adapted from the Math Anxiety Rating Scale-Revised; Hopko, 2003) consisted of twelve situation descriptions. Participants were asked to indicate how much anxiety they would feel in each situation on a five-point scale (1 = “low anxiety” and 5 = “high anxiety”). Two samples appear below.

Thinking about an upcoming music theory test one day before
Watching a teacher write counterpoint on the blackboard

III. RESULTS AND DISCUSSION

Very few measures correlated significantly with overall performance in the Music Fundamentals class, a remedial course for music majors who enter Florida State University with very weak backgrounds. In contrast, nearly all of our measures (with the sole exception of the Approximate Number System Test) exhibited the expected significant correlation with overall performance in Music Theory 1. Although this might suggest that our measures are better predictors for more experienced students, a more likely explanation is that an insufficient number of Fundamentals students (13) volunteered to participate in the study. Most of these Fundamentals students are, in fact, part of the Music Theory 1 data pool because they advanced to Music Theory 1 in the second semester (while most of the other 86 participants advanced to Music Theory 2).

Table 1 presents a summary of the observed correlations between individual measures and participant performance both in specific courses and more generally in all core music theory courses so far (i.e., Music Theory 1 and 2, because none of these participants have yet taken Music Theory 3 and 4). For purposes of comparison, Table 1 also includes results from the Fundamentals Placement Exam, a traditional achievement test that assesses knowledge of scales, key signatures, intervals, and triads. (The Fundamentals Placement Exam determines whether students begin with Fundamentals for Music Majors or with Music Theory 1.)

A. Individual Measures

1) Mental Rotation Test. Participant scores ranged from 1-21, out of a maximum possible score of 24, with a mean of 10.5 (SE = .6). The Mental Rotation Test correlated significantly with Music Theory 1, with overall performance in music theory, and with the traditional Fundamentals Placement Exam.

2) Geometric Invariants Test. Participant scores ranged from 3-14, the maximum possible score, with a mean of 11.6 (SE = .3). The Geometric Invariants Test correlated significantly with Music Theory 1, Music Theory 2, overall performance in music theory, and the traditional Fundamentals Placement Exam.

3) Approximate Number System Test. Two scores were eliminated due to technical errors. The remaining scores ranged from 0.12-0.67, with a mean of 0.242 (SE = .01). The Approximate Number System Test did not correlate significantly with any measure of achievement in music theory. Although sense of number is reported to be an important factor in mathematical ability, it does not appear critical to music theory ability.

4) Berlin Numeracy Test. Participant scores ranged from 1-4, the maximum possible score, with a mean of 2.0 (SE = .1). The Berlin Numeracy Test correlated significantly with Music Theory 1, Music Theory 2, overall performance in music theory, and the traditional Fundamentals Placement Exam. Although these correlations were less strong than those observed for the Geometric Invariants Test, the Mental Rotation Test, the Triad Recognition Test, and the Musical Pattern Continuation Test, it is worth noting that this measure included only four items.

5) Standard Music Notation Recognition Test. Participant scores ranged from 14-26, the maximum possible score, with a mean of 21.5 (SE = .3). Scores on individual symbols ranged from 1-4, the maximum possible score, with a mean of 3.6 (SE = .1). Scores on key signatures ranged from 4-10, the maximum possible score, with a mean of 8.6 (SE = .1). Scores on rhythmic notation ranged from 4-12, the maximum possible score, with a mean of 9.3 (SE = .2). The Standard Music Notation Recognition Test correlated strongly with Music Theory 1, Music Theory 2, overall performance in music theory, and the traditional Fundamentals Placement Exam. In fact, this measure alone predicted student success in all three music theory courses better than the traditional Fundamentals Placement Exam (results for the Music Fundamentals course fell just short of significance, p = .08).

6) Triad Recognition Test. Participant scores ranged from 1-9, out of a maximum possible score of 10, with a mean of 5.8 (SE = .2). The Triad Recognition Test correlated with performance in all music theory courses, including a fairly strong significant correlation with Fundamentals. It also correlated with the traditional Fundamentals Placement Exam
(which itself did not correlate significantly with performance in the Fundamentals course).

7) Musical Pattern Continuation Test. Participant scores ranged from 0, the minimum possible score, to 6, the maximum possible score, with a mean of 3.8 (SE = .2). The Musical Pattern Continuation Test was our single most successful measure, correlating with performance in all music theory courses, including a fairly strong significant correlation with Fundamentals. It also correlated with the traditional Fundamentals Placement Exam (which, as mentioned above, did not correlate significantly with performance in the Fundamentals course). The results associated with this measure are especially remarkable because it included only six items.

8) Math Confidence Assessment. Participant scores ranged from 1, the minimum possible score, to 6.7, out of a maximum possible score of 7, with a mean of 4.4 (SE = .2). The Math Confidence Assessment correlated positively with Music Theory 1, Music Theory 2, overall performance in music theory, and the traditional Fundamentals Placement Exam.

9) Math Anxiety Assessment. Participant scores ranged from 1, the minimum possible score, to 5, the maximum possible score, with a mean of 2.4 (SE = .1). The Math Anxiety Assessment correlated negatively (that is, lower anxiety corresponded with better outcomes) with Music Theory 1, Music Theory 2, and overall performance in music theory. These negative correlations were considerably less strong than the positive correlations observed for the Math Confidence Assessment.

10) Music Theory Confidence Assessment. Participant scores ranged from 1, the minimum possible score, to 7, the maximum possible score, with a mean of 4.4 (SE = .1). The Music Theory Confidence Assessment correlated positively with Music Theory 1, Music Theory 2, overall performance in music theory, and the traditional Fundamentals Placement Exam. These correlations were considerably stronger than those observed for the corresponding Math Confidence Assessment. One possible explanation is that most incoming music majors at Florida State University have previously studied music theory and therefore have realistic expectations for their upcoming coursework in music theory.

Table 1. Correlations between individual tests/assessments and participant averages in music theory courses. The Overall Core Theory Average combines Music Theory 1 and 2. For purposes of comparison, the traditional Fundamentals Placement Exam and a combined measure that includes all notation-based measures (the Music Notation Recognition Test, Triad Recognition Test, and Musical Pattern Continuation Test) are also included.
Several studies have shown the benefits of music training on cognitive, linguistic and
neural development. It has also been found that incorporating movement instruction into
music education can improve musical competence. However, it is not known whether
movement instruction in music education can also relate to other, non-musical skills.
Here we address this question and report the results of the first year of a longitudinal
study, in which we examine the effect of movement instruction in childhood music
education on musical, cognitive, linguistic and social development. We compared 6-7
year old children receiving the traditional Kodály music training, which includes
solfeggio and singing, with two groups of children receiving movement based music
training: one involves rule based sensorimotor synchronization, and the other involves
free improvisation (Kokas method). In order to measure the effect of these methods, we
created a comprehensive test battery for sensorimotor synchronization, musical,
cognitive, linguistic and social skills. In the beginning of the study children were
compared on all tests to examine any possible pre-existing differences between the
groups that can affect the results of future comparisons. Our results revealed that the
groups did not differ initially in any musical and non-musical skills. After 8 months,
children underwent the same tests, and we found that movement based methods improved
sensorimotor synchronization, rhythmic ability, linguistic skills and executive functions.
Furthermore, free improvisation increased pitch discrimination. Finally, we found
significant correlations between sensorimotor synchronization and both executive
functions and linguistic skills. These initial results inspire us to continue investigating
whether the effect of movement instruction can manifest not only on a behavioral, but on
a neural level, specifically how it affects neural entrainment to a musical rhythm and
neural encoding of speech.
Singing for the Self: exploring the self-directed singing of three and four
year-old children at home
Bronya Dean1
1Graduate School of Education, University of Exeter, Exeter, United Kingdom
The everyday musical lives of young children are a subject of growing interest among
music education researchers and practitioners, as everyday musical experiences are
increasingly recognized as relevant to music education. Typically, young children’s
spontaneous singing has been studied in educational contexts where social, or
interpersonal, forms of singing dominate musical play. This means self-directed or intra-
personal forms of singing have largely been ignored. In this paper I discuss the self-
directed singing of 15 three and four year-old children at home, focusing on how they use
singing as a tool for self-management. This is one aspect of a qualitative study examining
how three and four year-old children use singing in their everyday home lives. Audio
data was collected using the LENA™ all-day recording technology. A total of 187 hours
of suitable audio recordings were collected from 15 children (7 boys, 8 girls) aged from
3:0 to 4:10 years (average age 3:8). Analysis of the data revealed that young children sing
extensively at home and that their self-directed singing is less likely to adhere to
culturally meaningful models than their social singing. The children used singing as a
tool to manage the self, including to entertain themselves, enhance their experiences,
regulate bodily coordination, express emotion, explore self-identity, protect their self-
esteem and to support themselves in being alone. Self-directed singing allows young
children to explore improvisation in different ways and may assist in the development of
independence and self-reliance. These findings suggest that music educators should
provide opportunities for solitary music-making in parallel with more socially-oriented
music activities and find ways to support the use of singing in self-management.
Musicality has been tested for 100 years. So far, most tests have focused on musical talent and applied the paradigm of detecting small discriminations and differences in musical stimuli.
Less research has been conducted on the assessment of musical skills primarily
influenced by an individual’s practice. Up to the present day, there has been no
comprehensive set of tests to objectively measure musical skills, which has rendered it
impossible to measure the long-term development of musicians or to conduct studies on
the characteristics and constituents of musical skill.
The following study has, for the first time, developed an assessment instrument for
analytical hearing following a strict test theoretical validation, resulting in the Musical
Ear Training Assessment (META). By means of three pre-studies, a developmental study
(n=33) and a validation study (n=393), we verified a one-dimensional test model using
sophisticated methods of Item Response Theory identifying the best 53 items to measure
a person’s ear training skills. For better application, two test versions with 10 and 25
items have been compiled (META-10 and META-25).
Future studies will investigate the mentioned effects, especially the hard-to-explain
gender effect, and focus on a comprehensive view of the musician’s ear, including
notation-evoked sound imagery to explore a general model of musical hearing
mechanisms, their influences and educational implications.
This paper reports on several research projects that focused on developing innovative
approaches to enhancing sight-reading skills in higher education pianists. Zhukov (2006)
identified three promising areas of pedagogical applications of sight-reading research:
accompanying (Lehmann & Ericsson 1993, 1996; Wristen, 2005), rhythm training
(Fourie, 2004; Henry, 2011; Kostka, 2000; Smith, 2009) and understanding of musical
styles to improve pattern recognition and prediction skills (Sloboda et al. 1998; Thomson
& Lehmann, 2004; Waters, Townsend & Underwood, 1998). The first project evaluated
effectiveness of new distinct training strategies (accompanying, rhythm, style) against a
control. One hundred participants were tested twice on sight-reading, 10-weeks apart.
MIDI files of the performances were analysed by tailor-made software, giving two pitch and two rhythm scores. Mixed-design ANCOVAs administered to the test data showed that each treatment group improved in one pitch and one rhythm category, and the control group became more accurate in pitch. The findings demonstrated that the new approaches had an impact on improving sight-reading skills and provided some numerical evidence for the old belief that simply doing more sight-reading could enhance sight-reading (control). In the second project the three training strategies were amalgamated into a
hybrid approach, with each week of the 10-week program containing rhythm training,
four solo pieces of different styles and collaborative playing items. The course materials
were developed in collaboration with piano staff from four institutions who trialled the
prototype curriculum with two different student cohorts. Mixed-model ANCOVAs
showed a significant improvement in all rhythm and pitch categories in the hybrid-
approach group, thus exceeding previously achieved outcomes. The proven effectiveness
of the hybrid-training approach led to publication of a higher education textbook on sight-
reading.
Results showed no significant interaction between preferred criteria for recognition
(words, melody, or both) and vocal performances on both songs (F = .575, p = .567).

Conclusions

This study demonstrated different ways of perceiving the melody and words of a song.
Although the results did not reveal a relationship between recognition of songs with and
without words and their vocal performance, the findings indicate that individual
differences among kindergarten children should be accounted for. It is therefore
important to include songs without words in classroom activities: this benefits not only
vocal performance, but also helps students concentrate on the melody rather than
become distracted by the words. Further studies should replicate these procedures with
older children and different songs, and the effects of teaching strategies on song
perception and production should be examined in longitudinal studies.

ACKNOWLEDGMENT

This research was developed in the framework of CESEM (Sociology and Musical
Aesthetics Research Center – UID/EAT-00693/2013), and financial support was provided
by FCT (Fundação para a Ciência e Tecnologia) through a PhD grant
(PD/BD/114489/2016), co-financed under the European Social Fund (ESF) and the
Ministry of Science, Technology and Higher Education (MCTES). The authors would
like to thank Daniel Albert for his helpful comments and review of this abstract.
The successful transformation of music from the written score into accurately executed
movements and, ultimately, into heard sound lies at the heart of skilled musicianship. To
enable future research on the development of musicians, the processes of this
transformation have to be investigated more thoroughly. The aim of this study is to
further develop a rapid assessment instrument to observe the transformation from the
written score to the imagery of sounding music: namely, the concept of notation-evoked
sound imagery (NESI), which follows the notion of notational audiation.
Referring to Brodsky et al. (2008), we have developed and validated n=26 sets of stimuli
consisting of a theme melody, a correct variation and an incorrect variation (“lure”) of
this particular theme. In this procedure, participants (music students and musicians,
N=43) read the theme, try to imagine the sound, hear a correct or incorrect variation and
decide whether the heard variation is based on the seen theme. These stimuli have been
examined in Pre-study 1 and, by means of signal detection theory, analysed with
regard to their item characteristics (sensitivity/easiness d′: M = 1.30, SD = 0.84; bias c:
M = 0.07, SD = 0.45). In the upcoming Pre-study 2, both easier and more difficult items are
being generated and will be tested in spring 2016. Several control variables will be
included based on the preliminary results from the first study.
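The item statistics reported above (sensitivity d′ and bias c) are the standard signal-detection indices for a yes/no recognition task. As an illustrative sketch — with invented trial counts, not the pre-study's data — they can be computed from hit and false-alarm rates as follows:

```python
# Signal detection indices d' and criterion c from a yes/no recognition task.
# The trial counts below are invented for illustration; they are not the
# pre-study's data.
from statistics import NormalDist

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) for a 2x2 outcome table."""
    z = NormalDist().inv_cdf
    # Log-linear correction keeps rates away from 0 and 1 (avoids infinite z).
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion_c = -(z(hit_rate) + z(fa_rate)) / 2  # response bias
    return d_prime, criterion_c

d, c = sdt_indices(hits=20, misses=6, false_alarms=8, correct_rejections=18)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

In this paradigm a "hit" would be correctly accepting a variation based on the seen theme, and a "false alarm" accepting a lure.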
On the basis of these pretested items, a test instrument will be developed to measure the
extent to which musicians are able to imagine the sound represented by a written score
(inner hearing). As many musical skills (e.g., sight-reading, playing by ear…) rely on the
fast and accurate switching between sounding music and the written score, assessments
will both facilitate research in analytical hearing and mental representations of music and
uncover the role of the musical ear for musicians of all skill levels.
Recent years have seen increasing interest in empirical cross-species research on music
cognition, as a way of addressing evolutionary questions about music. This symposium
brings together researchers who are doing comparative studies of music cognition in
chimpanzees, sea lions, and songbirds to address questions about the origins of human
rhythmic and melodic processing capacities. The symposium will include research based
on a variety of empirical methods (behavioral and neural) and will present new empirical
research relevant to a range of theoretical issues, including the origins of human beat
perception and synchronization and cross-species similarities and differences in the
perceptual cues used to recognize melodic patterns. The four speakers and the discussant
come from labs in the US, Japan, and Europe, reflecting international research on music
cognition. All have published research in the area of cross-species music cognition, and
all will include new data in their presentations.
One of the intriguing effects of music is that it induces movement in us even when we are
simply listening. Although recent studies report that non-human apes (and several other
animals) also show spontaneous entrainment to auditory rhythms, it is still not fully
understood how such embodied music cognition was acquired in the course of human
evolution. Here we report spontaneous rhythmic engagement with complex auditory beats
in chimpanzees (Pan troglodytes). In the experimental booth, we observed that some
chimpanzees spontaneously began to coordinate their body movements with auditory
beats, so we systematically investigated their responses to different auditory beats
through video analysis. The rhythmic movements observed (e.g., rocking, head bobbing)
were similar to the displays the chimpanzees show in daily life, suggesting that the
beat-induced movements were within their natural behavioral repertoire. We also found
more rhythmic engagement with complex beats than with simple beats, and stronger
effects in male than in female chimpanzees. The
finding suggests that a predisposition for rhythmic movement in response to music
already existed in the common ancestor of chimpanzees and humans, approximately 6
million years ago.
Recent comparative findings in birds, primates, and a sea lion have overturned the long-
held belief in psychology that beat keeping is unique to humans. It remains to establish
the limits of beat-keeping behavior in non-human animals and to uncover the neural
underpinnings of flexible rhythmic behavior. We exposed an experienced sea lion
beat keeper to multiple tempo and phase perturbations in rhythmic stimuli and modeled
her response recovery. Consistent with prior findings in humans, she showed a rapid
return to stable phase following phase perturbation and more gradual return following
tempo perturbation. These findings suggest the sea lion relies on similar neural
mechanisms as do humans, which can be accurately modeled as processes of neural co-
oscillation. In parallel work, we used diffusion tensor imaging to map auditory, motor,
and vocal production pathways in sea lion brains obtained opportunistically from animals
euthanized in a veterinary rehabilitation facility. Tracts were compared to other pinniped
species and primates. Based on these and other recent findings, we suggest that beat
keeping capability is rooted in basic and widely shared neural processes underlying
sensorimotor synchronization. We further suggest that the capability may be enhanced or
restricted by a wide range of biological and environmental factors, including faculty for
precise motor control, attention, motivation, and social interaction.
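The contrast between rapid phase recovery and gradual tempo recovery can be illustrated with a generic two-process error-correction model of sensorimotor synchronization (phase correction α, period correction β). This is a textbook-style sketch with invented parameters and stimuli, not the neural co-oscillation model the authors fitted:

```python
# Generic two-process error-correction model of tapping to a beat: phase
# correction (alpha) adjusts the next tap time, period correction (beta)
# adjusts the internal tempo estimate. Parameters and stimuli are
# illustrative, not fitted values from the sea lion study.

def simulate_taps(onsets, alpha=0.6, beta=0.2, period=500.0):
    """Return (asynchronies in ms, final internal period) for stimulus onsets."""
    t, p = onsets[0], period            # first tap aligned with first onset
    asyncs = [0.0]
    for s in onsets[1:]:
        t = t + p - alpha * asyncs[-1]  # phase-corrected next tap
        a = t - s                       # signed asynchrony to this onset
        p = p - beta * a                # period (tempo) correction
        asyncs.append(a)
    return asyncs, p

# Phase perturbation: the sequence shifts 50 ms later from beat 10 onward.
phase_seq = [i * 500 + (50 if i >= 10 else 0) for i in range(30)]
# Tempo perturbation: the inter-onset interval lengthens from 500 to 550 ms.
tempo_seq = [i * 500 for i in range(10)] + [4500 + k * 550 for k in range(1, 21)]

a_phase, p_phase = simulate_taps(phase_seq)
a_tempo, p_tempo = simulate_taps(tempo_seq)
print(round(a_phase[-1], 3), round(p_tempo, 1))
```

After the phase shift only the asynchrony is disturbed and the internal period returns to 500 ms; after the tempo change the internal period itself must converge to 550 ms, which takes longer — mirroring the recovery pattern described above.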
Music is found in all human cultures, which suggests that music is not a purely cultural
phenomenon, but has roots in our biology. Not only that, but some aspects of music tend
to be found cross-culturally, suggesting that they are fundamental to the phenomenon of
human music. Are these fundamental features of human musical systems found in other
species? If they are, we can begin to understand the evolutionary origins of these features
by studying what other animals with these musical abilities have in common with each
other and with us. Here we show two examples of methods we are using to understand
whether other animals perceive music-relevant sounds the way we do. In both cases, we
conducted experiments with humans first without any explicit verbal instructions and
then conducted the same experiments with other species to see if they would respond the
same way. In the first task, we demonstrated octave equivalence in humans using a
paradigm where humans had to respond to some sounds and not to others based on trial-
and-error feedback. In the second task, we demonstrated that humans prefer to listen to
rhythmic over arrhythmic sounds by placing them in an arena with two rooms connected
by an opening, where each room was playing a different soundtrack, and observing where
they spent the majority of their time. We have begun to use these simple tasks to test
other species. We found that a songbird species that is known for excellent pitch abilities
failed to show octave equivalence in the same task that humans completed. We are
currently studying how budgerigars, a parrot species with excellent pitch, rhythm, and
mimicry abilities, fare at both of these tasks. We also plan on studying mammalian species: while
some avian species share rare potentially music-relevant abilities with humans, such as
vocal mimicry and rhythmic entrainment (synchronizing to a beat), other mammals may
share music-relevant genes with humans because they are more closely related to us than
birds.
ISBN 1-876346-65-5 © ICMPC14 555 ICMPC14, July 5–9, 2016, San Francisco, USA
byproduct of increased stress, and may be under flexible control of the signaler (cf.
Fitch, 2000).

If individual wolves regulate the F0 of their howls in relation to the howls made by
others, this would show that the capacity for regulating one's own pitch in relation to
other, simultaneously heard pitches is not uniquely human. An acoustic study by
Filibeck et al. (1982) examined chorus howls made by four canids howling
simultaneously (two captive wolves, a dog-wolf hybrid, and a dog) and found that the
F0s of the animals differed from each other by at least 15 Hz. This finding accords with
the speculation that chorus howling involves purposeful frequency separation between
the howls of individual wolves, which may enhance the apparent size of the pack to
other canid groups listening to such howling (Lopez, 1978). While this idea is
speculative, it raises the interesting possibility that when a wolf (or dog) howls along
with others, it purposefully 'detunes' its F0 from the other pitches that it hears.

We propose that a suitable model system for addressing the voluntary vocal pitch
flexibility of canids is provided by the phenomenon of domestic dogs (Canis lupus
familiaris) howling to music. This is a frequently observed behavior, as evidenced by
the many videos of it posted on public video-hosting websites (such as YouTube).
Howling to music is easy to distinguish from other vocalizations dogs might make
while music is playing (such as barking, growling, or whining) because of the
distinctive acoustic structure of the dog howl (Taylor et al., 2014) and the characteristic
posture associated with it (muzzle pointing up). Dog howling to music is likely elicited
by pitch patterns in the music that resemble the acoustics of howls, i.e., sustained
pitches with salient periods of frequency modulation (hence the canine tendency to
howl to non-musical sounds with these qualities, e.g., ambulance sirens or fire trucks).

Anecdotal reports (e.g., from dog owners who post videos online) suggest that some
dogs howl fairly reliably when they hear certain pieces of music. If this is the case, it
provides an opportunity to determine whether dogs change the pitch of their howls in
relation to simultaneously heard pitch patterns. Specifically, one could analyze the
pitches of a dog's howl to a particular musical piece and to versions of the piece
transposed up or down by a few semitones in pitch. If the dog changes the pitch of its
howl in response to such transpositions, this would provide evidence that it is
attempting to coordinate its own pitch pattern with the pitch patterns it hears (e.g., to
purposefully detune its howl from those patterns). Importantly, research on dog
auditory perception suggests that dogs have the hearing range and frequency
discrimination abilities needed to discriminate between versions of a musical piece
transposed up or down by a few semitones. The dog hearing range spans from about
67 Hz to 45 kHz at 60 dB SPL (Heffner, 1983), and dogs can discriminate pitch
differences of 1 semitone or less within musically relevant pitch ranges (Anrep, 1920).

As a first step toward the proposed transposition experiment, we have conducted an
empirical study of dogs howling to music. Our question was how often dogs hit stable
pitches when howling to music. By 'stable pitches' we do not mean pitches
corresponding to Western musical scales, but sections of pitch contour with relatively
little frequency variation. The presence of stable pitches allows one to study the
relation of howl pitches to the pitches in the musical structure. Stable pitches are also
ideal from the standpoint of the proposed transposition study, because one can measure
the mean F0 of stable pitches taken from dogs howling to a piece of music in its
original and transposed versions. A significant difference in the mean F0 of such
pitches before and after transposition would suggest voluntary regulation of vocal pitch
in relation to heard pitch patterns.

The current study did not perform the transposition experiment, which we hope to
conduct in the future. Here we focus on determining how often stable pitches occur
when dogs howl to music, and how much variability there is between the mean
frequencies of such stable pitches within a given dog's howling to music.

II. METHODS

A. Sample

Twenty-nine videos of dogs howling to a range of musical stimuli were gathered using
YouTube and Google search engines. The most common music within these videos was
the 2010 theme to the television show Law & Order (n=10), a piece of about 40 seconds
duration (tempo = 118 BPM) performed by electric guitar, keyboard, timpani, and
clarinet. Videos of dogs reacting to this theme were selected for further analysis (these
dogs spanned a range of different breeds and sizes). The audio content of the videos
and the 2010 Law & Order theme were extracted (WAV format, mono, 16-bit, 48 kHz)
using an online audio extraction tool (http://www.onlinevideoconverter.com).
Figure 1. A spectrogram of a dog howling in response to the musical stimulus (y-axis:
frequency, 0–3000 Hz; x-axis: time, 0–40 s).
Figure 2. An enlarged view of the first 10 seconds of Figure 1 (y-axis: frequency, 0–3000 Hz;
x-axis: time, 0–10 s). Red boxes outline the first three stable howl segments.
B. Howl Quantification

Howls were acoustically analysed using spectrograms made with Praat (Boersma,
2002). Howls were identified using sonic and visual characteristics noted in
Tembrock's (1967) classification of canid vocalizations. Pitch (Hz) and time (s) data
were measured for each howl. Praat default pitch and spectrogram settings were used,
with two deviations (Pitch Range = 50.0–2500.0 Hz and Very Accurate = On). Stable
howl segments were identified from pitch contour data imported into Excel 2016, using
the following criteria:

1) Pitch. The maximum and minimum fundamental frequency of the segment was
within ±5% of that segment's average F0.

2) Duration. The segment was at least 0.5 s in duration.

One video had no segments that met the criteria and could not be analyzed further,
leaving 9 audio tracks for further analysis. An example spectrogram from our study is
shown in Figure 1. The F0 and first few harmonics of the howls are clearly visible in
this figure, and are much more prominent than the background music, which is barely
visible on the spectrogram. Figure 2 enlarges the first 10 seconds of the spectrogram
shown in Figure 1, and red boxes outline stable howl segments identified using the
above criteria.

C. Alignment

The 2010 Law & Order theme was transcribed into music notation, and a 4-part MIDI
version of the theme was created in Ableton Live 9. Extracted audio from dog howling
videos was then aligned to this reference, using the music in each video for alignment
with the original theme song. In two videos, the tempo of the music was slower than
the tempo of the original music, suggesting that the uploaded video had undergone
temporal dilation (in one video, the tempo of the music was 112 BPM and in another,
87 BPM). Time warping in Ableton Live 9 was used to bring these audio files up to the
original tempo of 118 BPM (without changing pitch) and align them to the original
theme song.

Figure 3. A piano roll visualization of stable segments of a dog's howls relative to pitches in the
musical stimulus (analysis based on the howls in Figure 1). Darker colors (blue, red, green, and
brown) represent instruments of the Law & Order theme (guitar, keyboard, timpani, and clarinet,
respectively). Yellow indicates the average pitch of stable howl segments. These segments have
not been rounded to Western chromatic pitches.
collective practice to build, maintain, and cultivate affective bonds between human
beings and to satisfy our need to feel socially included. Dissanayake (e.g., 2000, 2008,
2011) analysed in detail early mother-infant interactions, and she theorized about
caretakers' proto-aesthetic means to regulate affect, summarizing them as repeating,
simplifying (or formalizing), exaggerating, elaborating (or varying), and manipulating
expectancies (or surprising). The use of these affect regulatory means is a precursor of
artistic experiences, since they allow an event to turn into something special – an act
Dissanayake calls artifying.

Merker (2009) proposes to conceive of human culture as governed by rituals that focus
on performing procedures, whereas animal culture is restricted to instrumental usage of
tools and communication. He assumes that language and rituals are specific to humans.
Moreover, he introduced the idea that music – like language – is a generative system,
i.e., that music generates infinite pattern diversity by finite means (Merker, 2002).
Hence, vocal products – speech and songs – are governed by generative rules, and in
song singing linguistic and musical rules are combined.

The engagement in vocal learning is driven by the infant's basic affective needs such as
care, cognitive stimulation, and the feeling of belonging; and vocal teaching is driven
by caretakers' intuition and intention to transmit cultural achievements. Collectively,
every culture provides traditional practices to regulate affects by ritual behavior – for
instance song singing – that help to make the future somehow less uncertain (Valsiner,
2003; Cross, 2003; Stadler Elmer, 2015).

In short, songs are cultural rituals and devices that are used to share and to modulate
affective states. Song singing also stimulates the learning of complex rules from early
on, and hence it seems to be the very first and most important cultural act for
recognizing structures and rules, and for making abstractions.

IV. How can we describe and analyze vocal structures?

Song singing formally encompasses lyrics and melody, or language and music. The
two generative systems share the same organizational principle and vocalization as the
expressive means. In order to describe and analyze song singing, it is important to
summarize basic principles of the structures of children's songs. On the basis of
Merker's work (2002), Stadler Elmer (2015) proposed a grammar of children's songs
(CS) that makes explicit the underlying principles and rules. So far, the rules are
restricted to the German language, since each language has specific prosodic rules that
need to be taken into account (Hall, 2000). Up to now, seven general principles have
been formulated:

1) CS consist of lyrics (a verse or a poem) and a melody.

2) Two generative systems – music and language – are simultaneously involved,
allowing the creation of infinitely many possible songs.

3) The building block of both systems for making songs is the syllable, since it also
contains musically relevant features such as pitch, time, and accents.

4) Songs are hierarchically organized.

5) Lyrics and melody are relatively autonomous (see 2), forming parallel hierarchies
(cf. Baroni, Dalmonte, & Jacoboni, 1995). Ideally, lyrics and melody fit together in an
equal manner. If this is not possible, one is super- and one is subordinated.

6) The poetic meter of the lyrics and the musical meter of the melody simultaneously
rule the timing of the syllables. As aforementioned, this may create tension.

7) Symmetries of temporal and sonorous forms aim at creating well-formed wholes, or
a Gestalt.

In order to understand the generative nature of the two systems, Table 1 gives an
overview of vocal features that are differently organized in singing, reciting, and
speaking.

Table 1: Singing, reciting, and speaking share the same, yet differently organized, vocal
features. (© Stadler Elmer, 2015)

Note that poetic language – reciting or chanting – shares with singing the periodically
accentuated pulse, a meter. Most notable is the fact that syllables are the commonly
shared building blocks of the entire system, as formulated in principle 3. The
traditional German song represented in Figure 2 serves as an example to verify the
given selection of rules. To date, 21 rules have been formulated with respect to the
tonal organization, the timing, and the lyrics (Stadler Elmer, 2015).

Figure 2: An example of a traditional German children's song. The top line shows the two
phrases, the symbols I, IV, and V indicate the underlying harmonic structure of the melody, and
the last line points to the rhymes in the lyrics that mark the boundaries of the phrases. (© Stadler
Elmer, 2015)

Timing rules – examples

1) Once the meter (measure) is set, it is valid for the entire song. CS don't change the
measure or meter.

2) The number of measures in CS is divisible by two. Possible quantities of measures
are 4, 8, 10, 12, 14, and 16.

Tonal rules – examples

1) CS are in major keys, rarely in pentatonic scales or minor keys.
2) The major key, once chosen, is maintained through the entire song. There is no
change of key.

3) CS begin with one of the three tones of the tonic chord (do, mi, so).

4) The first measure of a CS marks the key.

Rules concerning lyrics – examples

1) The lyrics are formed in poetic language. That means they have a meter, and verse
lines that end with rhymes (pair and cross rhymes).

2) The meter defines the periodic accents, and the verse line is defined by the quantity
of syllables.

3) The meter (trochee, iambus, dactyl, anapest) is contingent with the musical meter
(measure) of the melody to yield a well-formed entity, or Gestalt.

V. How can vocal learning be described and analyzed in its dynamic changes?

In order to investigate children's strategies in generating new songs – be they invented
or reproduced – we carried out a quasi-experimental study. We composed new songs
that deliberately violated some of the CS rules, and we asked children to invent songs.
Pictures stimulated all singing activities. In order to exemplify the application of the
CS rules, I use an excerpt of a previously published case study (Stadler Elmer, 2000,
2002) with Tom, aged 4;7, and discuss it in the light of the CS grammar. Figure 3
shows Tom's first reproduction of a new song, which he had previously listened to
twice and joined in singing once. This graphic representation is based on a
micro-genetic methodology including acoustic analysis by the Pitch Analyzer and with
the aid of the Notation Viewer (Stadler Elmer & Elmer, 2000).

Figure 3: Tom's first version of a new song ("Tom, 1/1, event 4, 8.1 secs"). The thin line
represents the song model, and the dots represent pitches sung by the child. The y-axis shows
pitch as a continuum (B3–G4), and the x-axis shows continuous timing in seconds (0–7),
measured by the onset and duration of each syllable. The horizontal lines indicating do, mi, and
so visualize the tonic triad as a conventional tonal framework (Stadler Elmer, 2000, 2002).

Tom's first solo version (Figure 3) shows first of all an exact match between his
temporal organization and the model: without counting, he produces 15 syllables that
correspond to 15 notes or pitch categories, altogether forming a coherent and new
song. He seems to have a sense for this formal aspect – in other words, for the Gestalt
of a song in terms of its temporal framework. He subordinates the lyrics by adopting
only the first two words and by subsequently reducing the lyrics to the repetitive
syllable /ne/. The melody is remarkable: he adopts the first phrase, but he entirely
replaces the second, unconventional one with a newly invented and well-formed tune.
Thereby he generalizes the rule that a CS does not change the key, and therefore it ends
with the tonic of the key introduced at the beginning. He continued acquiring this song
by applying various strategies (adopting the lyrics, asking for help, focusing on
absolute pitch, etc.).

VI. Conclusions

The example given shows that song singing is more than a sequence of producing
words, adding rhythm, then a melody contour, and finally intervals – one of the
traditional conceptions of singing development (e.g., Welch, Sergeant, & White, 1998;
Hargreaves, 1986; Moog, 1976). Microanalyses repeatedly reveal that children have an
overall understanding of song singing as a coherently and densely organized social
event with predictive features. It can't be acquired by adding elements or by focusing
on accurate pitch, but rather by grasping the entire form on the basis of related and
repeated sensorimotor and simultaneously socially shared experiences.

References

Baroni, M., Dalmonte, R., & Jacoboni, C. (1995). The concept of hierarchy: A
theoretical approach. In E. Tarasti (Ed.), Musical signification: Essays in the semiotic
theory and analysis of music (pp. 325–334). Berlin: Mouton de Gruyter.

Baumeister, R. F., & Leary, M. R. (1995). The need to belong: Desire for interpersonal
attachments as a fundamental human motivation. Psychological Bulletin, 117(3),
497–529.

Condon, W. S. (1977). A primary phase in the organization of infant responding. In
H. R. Schaffer (Ed.), Studies in mother-infant interaction (pp. 153–176). London:
Academic Press.

Cross, I. (2003). Music and biocultural evolution. In M. Clayton, T. Herbert, & R.
Middleton (Eds.), The cultural study of music: A critical introduction (pp. 19–30).
London: Routledge.

Dissanayake, E. (2000). Antecedents of the temporal arts in early mother-infant
interaction. In N. L. Wallin, B. Merker, & S. Brown (Eds.), The origins of music
(pp. 389–410). Cambridge, MA: MIT Press.
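As an aside, the CS grammar quoted above is explicit enough to check mechanically. The sketch below validates a toy song description against three of the stated rules (measure count divisible by two, a single maintained key, first tone in the tonic triad); the Song structure is an invented illustration, not Stadler Elmer's notation:

```python
# Hypothetical checker for a few of the children's-song (CS) rules quoted
# above. The Song structure is an invented illustration, not the notation
# used by Stadler Elmer (2015).
from dataclasses import dataclass

@dataclass
class Song:
    n_measures: int
    keys: list            # key of each measure, e.g. ["C", "C", ...]
    first_degree: str     # scale degree of the first melody tone

def check_cs_rules(song):
    """Return a list of violated rules (empty list = all checks pass)."""
    violations = []
    # Timing rule 2: number of measures is divisible by two.
    if song.n_measures % 2 != 0:
        violations.append("measure count not divisible by two")
    # Tonal rule 2: the key, once chosen, is maintained for the entire song.
    if len(set(song.keys)) != 1:
        violations.append("key changes during the song")
    # Tonal rule 3: the song begins on do, mi, or so (tonic triad).
    if song.first_degree not in ("do", "mi", "so"):
        violations.append("first tone not in the tonic triad")
    return violations

well_formed = Song(n_measures=8, keys=["C"] * 8, first_degree="so")
ill_formed = Song(n_measures=7, keys=["C"] * 6 + ["G"], first_degree="re")
print(check_cs_rules(well_formed), check_cs_rules(ill_formed))
```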
of music), each piece was composed in a quantitative way (e.g., the total number of notes was matched within each complexity level; the number of accidentals and the longest duration of notes were more than doubling) (see Table 1).

Table 1. A quantitative composition of musical pieces: mm. means measure numbers and indicates the number of notes in each measure; total notes, accidentals, and half notes mean the number of total notes, accidentals, and notes with half notes or over half notes, respectively.

               Complexity level
mm.      Low            Medium          High
1–4      10  8  8 10     8  7 11  8      8 10 12 10
5–8       8  9  9  8     9 10 12  8     13 12 10 12
9–12      6 10  7  7    11 10 11 11     14 13 18 13
13–16     8  6  9  7    12 11 11 10     11 10 12 12
Total
notes       130             160             190

We calculated the overall distances between each of the performances and original pieces using dynamic time warping (DTW). DTW is a method to find an optimal cost path by tracking minimum costs within a similarity matrix of two sequences that vary in time and speed (Soulez, Rodet & Schwarz, 2003; Müller, 2007). It has been used effectively in the fields of speech or text recognition, as well as in score following. We applied this method to provide a single value reflecting the distance between the two different streams of music that should have been similar.

We divide the DTW process into two steps in this paper: computing the similarity matrix and calculating the DTW distance. First, we applied the Jaccard distance to obtain a similarity matrix between each participant's performance and the stimuli. The Jaccard distance is widely used for managing asymmetric binary variables (Willett & Barnard, 1998; Lipkus, 1999). The Jaccard distance between binary strings A and B can be calculated as follows:

d(A, B) = 1 − J(A, B) = (|A ∪ B| − |A ∩ B|) / |A ∪ B|    (1)
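In code, the two steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each performance is encoded as a sequence of binary frames (e.g., pitch-present/absent vectors), an encoding the paper does not specify.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary sequences (Eq. 1):
    d(A, B) = (|A union B| - |A intersect B|) / |A union B|."""
    union = sum(1 for x, y in zip(a, b) if x or y)
    if union == 0:
        return 0.0  # two all-zero frames are treated as identical
    inter = sum(1 for x, y in zip(a, b) if x and y)
    return (union - inter) / union


def dtw_distance(seq_a, seq_b):
    """DTW distance between two sequences of binary frames,
    using the Jaccard distance of Eq. (1) as the local cost."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i
    # frames of seq_a with the first j frames of seq_b
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = jaccard_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Identical performance and stimulus sequences yield a DTW distance of zero; the distance grows as notes are added, dropped, or displaced in time.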
was greater than that between the low and medium musical
complexity levels.
In future research, we will explore the boundaries of auditory memory and aim to quantify the extent to which auditory memory aids sight-reading proficiency. Further investigation of the influence of preliminary listening will also be conducted to examine its effect on different groups, such as novice sight-readers, under the same complexity conditions.
Examination of the effect of preliminary listening by
measuring eye movements would further extend the body of
knowledge in the field.
REFERENCES
Burman, D. D., & Booth, J. R. (2009). Music rehearsal increases the
perceptual span for notation. Music Perception: An
Interdisciplinary Journal, 26(4), 303-320.
Drai-Zerbib, V., Baccino, T., & Bigand, E. (2012). Sight-reading
expertise: Cross-modality integration investigated using eye
tracking. Psychology of Music, 40(2), 216-235.
Kopiez, R., & Lee, J. I. (2006). Towards a dynamic model of skills
involved in sight-reading music. Music Education Research, 8(1),
97-120.
Kopiez, R., & Lee, J. I. (2008). Towards a general model of skills
involved in sight-reading music. Music Education Research, 10(1),
41-62.
Lahav, A., Boulanger, A., Schlaug, G., & Saltzman, E. (2005). The
Power of Listening: Auditory-Motor Interactions in Musical
Training. Annals of the New York Academy of Sciences, 1060(1),
189-194.
Lehmann, A. C., & Kopiez, R. (2009). Sight-reading. In S. Hallam,
I. Cross & M. Thaut (Eds.), The Oxford handbook of music
psychology (pp. 344-351). Oxford: Oxford University Press.
Lehmann, A. C., & Ericsson, K. A. (1996). Performance without
preparation: structure and acquisition of expert sight-reading and
accompanying performance. Psychomusicology, 15(1/2), 1-29.
Lipkus, A. H. (1999). A proof of the triangle inequality for the
Tanimoto distance. Journal of Mathematical Chemistry, 26,
263-265.
McPherson, G. E. (1995). Five aspects of musical performance and
their correlates. Bulletin of the Council for Research in Music
Education, 127, 115-121.
Müller, M. (2007). Dynamic time warping. Information retrieval for
music and motion, 69-84.
Rayner, K., & Pollatsek, A. (1997). Eye movements, the eye-hand
span, and the perceptual span during sight-reading of music.
Current Directions in Psychological Science, 6(2), 49–53.
Sloboda, J. A. (1984). Experimental studies of music reading; A
review. Music Perception, 2(2), 222-236.
Soulez, F., Rodet, X., & Schwarz, D. (2003). Improving polyphonic
and poly-instrumental music to score alignment. Proceedings of the
International Society for Music Information Retrieval, 6-11.
Waters, A. J., Townsend, E., & Underwood, G. (1998). Expertise in
musical sight reading: A study of pianists. British Journal of
Psychology, 89, 123-149.
Willett, P., Barnard, J. M., & Downs, G. M. (1998). Chemical
similarity searching. Journal of Chemical Information and
Computer Sciences, 38, 983-996.
Wurtz, P., Mueri, R. M., & Wiesendanger, M. (2009). Sight-reading
of violinists: Eye movements anticipate the musical flow.
Experimental Brain Research, 194(3), 445–450.
Table 1. List of Analyzed Records of Chopin’s Nocturne B-flat Minor, Op. 9 No. 1.
No. | Pianist Name (in alphabetical order) | TT | mm. 1–4 | Record Date | Live / Studio | Place / Label / Release, Re-mastering Date (if different from record date)
1. Arrau Claudio (1903–1991) 5’41.499 18.662 1977-78 S Amsterdam Concertgebouw / Philips Classics
2. Ashkenazy Vladimir (b. 1937) 5’41.445 17.676 1997 S Decca
3. Barenboim Daniel (b. 1942) 5’38.644 15.544 1981 S Studio Lankwitz Berlin / Polydor Int. GmbH / 1982
4. Biret Idil (b. 1941) 5’29.859 15.883 1990-92 S Naxos (Grand Prix du Disque Chopin, 1995)
5. Chaimovich Vadim (b. 1978) (1) 6’40.067 17.974 2006 L No data
6. -”- (2) 6’34.845 17.573 2007 L No data
7. -”- (3) 7’34.894 20.865 2009 S Sheva / 2010
8. Cziffra Georges (1921–1994) 5’56.818 17.090 1980-86 S EMI
9. Dang Thai Son (b. 1958) 5’09.676 15.244 1987 S Victor JVC
10. Engerer Brigitte (b. 1952) (1) 5’37.917 17.741 1993 S Harmonia mundi / 2014
11. -”- (2) 5’43.339 17.413 2010 Jun 5 L 5th Piano Festival, Saint-Germain-au-Mont-d'Or
12. -”- (3) 5’57.277 19.214 2010 Oct 9 L Nuit de l'âme, Théâtre du Beauvaisis, Beauvais
13. Falvay Sandor (b. 1949) 4’56.153 16.090 1989 Jun S Budapest / Naxos
14. Freire Nelson (b. 1944) 5’24.732 15.698 2009 Dec S Decca / 2010
15. Gavrilov Andrei (b. 1955) (1) 5’22.581 20.810 2008 Aug 17 L Valldemossa Chopin Festival, Majorca
16. -”- (2) No data 22.789 2013 May 17 S Fazioli Hall, Rome
17. -”- (3) 5’57.885 23.554 2013 Sep 2 L Green Cross Int. 20th Anniv., VictoriaHall, Geneva
18. Godowsky Leopold (1870–1938) 4’03.993 13.545 1928 Jun 23 S English Columbia Recordings
19. Harasiewicz Adam (b. 1932) 5’22.047 16.646 1961 Jun S The Netherlands / Philips Classics / 1994
20. Joao Pires Maria (b. 1944) 5’27.787 16.765 1995-96 S Deutsche Grammophon
21. Kapell William (1922–1953) 5’23.701 15.164 1945 Feb 28 L Carnegie Hall, New York / RCA
22. Li Yundi (b. 1982) (1) 5’18.124 14.620 2010 Mar 1 L Chopin's 200th Anniv., Warsaw
23. -”- (2) 5’28.234 15.165 2010 S EMI
24. -”- (3) 4’59.275 14.237 2011 Apr 23 L Beijing National Centre
25. -”- (4) 4’51.013 14.634 2015 Mar 1 L Chengdu, China
26. Magaloff Nikita (1912–1992) 6’41.030 20.204 1974 S Amsterdam Concertgebouw / Philips Classics
27. Moravec Ivan (1930–2015) 5’34.301 16.476 1965 Apr S St. Paul's Chapel at Columbia University, New York (Steinway) / Elektra Nonesuch / 1991
28. Novaes Guiomar (1895–1979) 5’32.581 14.151 1957 S Vox / 1995
29. Ohlsson Garrick (b. 1948) 5’14.605 13.442 1979 S EMI / 2005
30. Pacini Sophie (b. 1991) 5’10.267 19.012 2014 S CAvi
31. Pollini Maurizio (b. 1942) 4’42.720 14.335 2005 S Deutsche Grammophon
32. Richter Sviatoslav (1915–1997) (1) 6’25.317 20.623 1950 Jun 19 L Tchaikovsky Hall, Moscow
33. -”- (2) 6’13.219 18.109 1972 Feb 16 L Szeged, Hungary
34. Rubinstein Arthur (1887–1982) 5’22.829 14.887 1965 S RCA
35. Sheng Cai (b. 1987) 5’29.595 15.254 2014 May 8 L Capitol Theatre, New Brunswick, Canada
36. Sombart Elizabeth (No data) 6’28.042 22.956 2012 S Forlane / 2013
37. Sztompka Fryderyk (1901–1964) 5’52.224 17.322 1959 S Muza label (Polskie Nagrania)
38. Thibaudet Jean-Yves (b. 1961) 5’16.165 16.975 2000 S Decca
39. Tiempo Sergio (b. 1972) 5’20.051 14.923 2006 S EMI
40. Tipo Maria (b. 1931) 6’50.769 21.560 1993-94 S EMI
41. Ts’ong Fou (b. 1934) 4’55.421 13.255 1978-80 S Sony
42. Varsi Dinorah (1939–2013) 4’35.471 13.661 1986 S EMI
performance of mm. 1–4 is 12.407” (the total duration of the Nocturne performed by computer is 4’25.709”).

Figure 1. Visual representation of computer-generated sound-pattern, MM=116, mm. 1–4 duration 12.407” (fragment of the score, scheme of quarter- and eighth-note sequences, and melodic range spectrogram)

On the first step of analysis, the first phrase of Chopin’s Nocturne (mm. 1–4, a traditional tonally closed period of two two-bar melodically and harmonically parallel phrases) was extracted from the total sound-record. Already here we encounter different approaches and styles of piano playing. Afterwards the extracts were grouped, moving gradually from rather strict playing to great freedom of tempo and rhythm, which affected the manifestation of tempo from lively to extremely slow motion. Therefore 4 types were distinguished, where No. 1 and No. 4 are absolute opposites, with a high contrast of playing speed, showing a gradual change towards a slower tempo and more and more mismatches in the metric vertical (see Table 2).

Table 2. Duration Data of Sound-Records of Chopin’s Nocturne B-flat Minor, Op. 9 No. 1.

Length of mm. 1–4:
MM=116 12.407”; 1. Ts’ong 13.255”; 2. Ohlsson 13.442”; 3. Godowsky 13.545”; 4. Varsi 13.661”; 5. Novaes 14.151”; 6. Li Yundi - 3 14.237”; 7. Pollini 14.335”; 8. Li Yundi - 1 14.620”; 9. Li Yundi - 4 14.634”; 10. Rubinstein 14.887”; 11. Tiempo 14.923”; 12. Kapell 15.164”; 13. Li Yundi - 2 15.165”; 14. Dang Thai Son 15.244”; 15. Sheng Cai 15.254”; 16. Barenboim 15.544”; 17. Freire 15.698”; 18. Biret 15.883”; 19. Falvay 16.090”; 20. Moravec 16.476”; 21. Harasiewicz 16.646”; 22. Joao Pires 16.765”; 23. Thibaudet 16.975”; 24. Cziffra 17.090”; 25. Sztompka 17.322”; 26. Engerer - 2 17.413”; 27. Chaimovich - 2 17.573”; 28. Ashkenazy 17.676”; 29. Engerer - 1 17.741”; 30. Chaimovich - 1 17.974”; 31. Richter - 2 18.109”; 32. Arrau 18.662”; 33. Pacini 19.012”; 34. Engerer - 3 19.214”; 35. Magaloff 20.204”; 36. Richter - 1 20.623”; 37. Gavrilov - 1 20.810”; 38. Chaimovich - 3 20.865”; 39. Tipo 21.560”; 40. Gavrilov - 2 22.789”; 41. Sombart 22.956”; 42. Gavrilov - 3 23.554”

Length of Total Piano Piece:
1. Godowsky 4’03.993”; MM=116 4’25.709”; 2. Varsi 4’35.471”; 3. Pollini 4’42.720”; 4. Li Yundi - 4 4’51.013”; 5. Ts’ong 4’55.421”; 6. Falvay 4’56.153”; 7. Li Yundi - 3 4’59.275”; 8. Dang Thai Son 5’09.676”; 9. Pacini 5’10.267”; 10. Ohlsson 5’14.605”; 11. Thibaudet 5’16.165”; 12. Li Yundi - 1 5’18.124”; 13. Tiempo 5’20.051”; 14. Harasiewicz 5’22.047”; 15. Gavrilov - 1 5’22.581”; 16. Rubinstein 5’22.829”; 17. Kapell 5’23.701”; 18. Freire 5’24.732”; 19. Joao Pires 5’27.787”; 20. Li Yundi - 2 5’28.234”; 21. Sheng Cai 5’29.595”; 22. Biret 5’29.859”; 23. Novaes 5’32.581”; 24. Moravec 5’34.301”; 25. Engerer - 1 5’37.917”; 26. Barenboim 5’38.644”; 27. Ashkenazy 5’41.445”; 28. Arrau 5’41.499”; 29. Engerer - 2 5’43.339”; 30. Sztompka 5’52.224”; 31. Cziffra 5’56.818”; 32. Engerer - 3 5’57.277”; 33. Gavrilov - 3 5’57.885”; 34. Richter - 2 6’13.219”; 35. Richter - 1 6’25.317”; 36. Sombart 6’28.042”; 37. Chaimovich - 2 6’34.845”; 38. Chaimovich - 1 6’40.067”; 39. Magaloff 6’41.030”; 40. Tipo 6’50.769”; 41. Chaimovich - 3 7’34.894”

The 1st type of performances is characterized by rather orderly playing, stable tempo and equal motion of the accompaniment (eighth-notes in the left hand). The vertical of accompaniment and melody shows rather strict metrical coincidences. In total 3 pianists/records were placed in this group: Falvay, Godowsky and Novaes. The analyzed examples are united by a lively tempo; the length of performance of mm. 1–4 ranges from 13.545” to 16.090”.

The 2nd type: slight manifestation of rubato features. A fairly lively tempo and rather strict pulse of the accompaniment; manipulation of minor acceleration and slowing down, especially inside the melodic ornaments of 11 and 22 notes; some records keep the strict pulse of the melodic phrase of 6 eighth-notes in the upbeat, whereas others tend to expose the upbeat (performed slightly slower and with more freedom, like a run-up); slight deviations in the match of the vertical; the second phrase (mm. 3–4) is performed more freely. 15 pianists/16 records: Rubinstein, Barenboim, Biret, Dang Thai Son, Freire, Kapell, Li Yundi (1, 2), Moravec, Ohlsson, Pollini, Sheng Cai, Sztompka, Ts’ong and Varsi. The length of mm. 1–4 is dominated by 14–15”. However, an equal motion, strict vertical and rather stable rhythm is specific to the extremely lengthy interpretation by Nikita Magaloff, lasting 20.204”.

The 3rd type: expression of “dreamy” rubato. A prominence of the accelerated or spaced-out upbeat; a tendency to listen attentively to the melodic relief; rather slow tempo; acceleration and slowing down inside the pulse of eighths. 11 pianists/15 records: Arrau, Ashkenazy, Cziffra, Chaimovich (1, 2), Engerer (all 3 examples), Harasiewicz, Joao Pires, Li Yundi (3, 4), Richter (2), Thibaudet and Tiempo. The length of mm. 1–4 ranges from 14.237” to 18.662”.
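As a rough check, a metronome mark converts to a nominal duration by simple arithmetic. A sketch, assuming the Nocturne's 6/4 meter counted in quarter-notes at MM = 116 (how the upbeat is counted in the 12.407" reference is not specified here):

```python
def nominal_duration(n_measures, beats_per_measure, bpm):
    """Seconds taken by n_measures at a constant metronome mark
    of bpm beats per minute."""
    return n_measures * beats_per_measure * 60.0 / bpm

# mm. 1-4 in 6/4 at quarter = 116: 24 quarters, about 12.41 s,
# close to the 12.407" computer-generated reference
print(round(nominal_duration(4, 6, 116), 3))
```

The closeness of this figure to the computer-generated reference suggests the reference was rendered at an essentially constant MM = 116.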
The 4th type: a turn towards the narrative nature of sound; maximally attentive listening to and performance of each note, which determines a very slow tempo; most often the vertical coincides at the beginning of bars only (the inside of bars is emphasized with rhythmical irregularity); every performance is individual. 7 pianists/10 records: Chaimovich (3), Gavrilov (all 3 examples), Pacini, Richter (1), Sombart, and Tipo. The length of mm. 1–4 ranges from 17.573” to 23.554”, with the dominating length over 20”.

Figure 2. Some visual representations of Chopin’s Nocturne in B-flat minor, Op. 9 No. 1, section mm. 1–4, by Sonic Visualiser 2.5:
(a) – computer generated sound-sample 12.407”
(b) – Ts’ong 13.255”
(c) – Moravec 16.476”
(d) – Magaloff 20.204”
(e) – Sombart 25.956”

CONCLUSION
The main observation after examining the records highlights the manifestation of rubato closely related to the moderation of movement: the tendency towards expressiveness of sound, an attempt to highlight (likely sing) every eighth-note, slows down the tempo. On the other hand, tempo deceleration is directly connected to the disagreement in the vertical between melody and accompaniment (see Fig. 2).

As expected, the interpretations placed in groups 2 and 3 constitute the major part of the records. The length of interpretation and the choice of a narrative approach (specific to group 4) do not depend on the age of the performer or the date of the record. However, the most ordinary performances, in the 1st group, are represented especially by pianists of the old generation (Leopold Godowsky and Guiomar Novaes). In most cases, the total duration of the Nocturne can be predicted from the way of interpretation in mm. 1–4 (Pacini’s performance displays an unpredictable proportion: the lengthy period and narrative playing in mm. 1–4 lasts over 19”, yet the total duration of the piano piece is of an ordinary length, 5’20”).

Among the 42 interpretations only a few pianists are close to the original tempo and duration, e.g., Ts’ong, Ohlsson, Godowsky, Varsi (see Table 2). After comparing Varsi’s interpretation to Alfred Cortot’s notes in his 1945 edition of Nocturne Op. 9 No. 1, indicating the tempo mark 116 and a total duration of the Nocturne of 4’30”, the Uruguayan pianist Dinorah Varsi may be regarded as the closest to the original conception of the Nocturne’s tempo.1

ACKNOWLEDGMENT
The preparation of the article and participation at ICMPC14 were partly supported by the Lithuanian Composers’ Union and Kaunas University of Technology.

REFERENCES
Caccini, G. (2009). Le nuove musiche. H. Wiley Hitchcock (Ed.), 2nd ed. Middleton, Wisconsin: A-R Editions.
Cannam, C., Landone, C., & Sandler, M. (2010). Sonic Visualiser: An open source application for viewing, analysing, and annotating music audio files. In Proceedings of the 18th ACM International Conference on Multimedia, 1467–1468. Retrieved April 18, 2016, from http://sonicvisualiser.org/sv2010.pdf
Finck, H. T. (1913). Success in music and how it is won. New York: C. Scribner’s Sons, chapter “Paderewski on Tempo Rubato”.
Hudson, R. (1997). Stolen time: The history of tempo rubato. Oxford: Oxford University Press.
Rosenblum, S. P. (1994). The uses of rubato in music, eighteenth to twentieth centuries. Performance Practice Review, 7(1), article 3, 33–53.
Tomaszewski, M. (2016, April 18). Series of programs “Fryderyka Chopina Dzieła Wszystkie”, Polish Radio, Program II. Retrieved April 18, 2016, from http://en.chopin.nifc.pl/chopin/composition/detail/id/153
Tosi, P. F. (1743). Observations on the florid song, or sentiments on the ancient and modern singers. London: J. Wilcox.

1 It is regarded that Alfred Cortot (1877–1962) was highly influential for his poetic interpretations of piano pieces by Chopin and Schumann. As is known, Cortot studied at the Paris Conservatoire with Chopin’s student Emile Decombes. Respectively, the Uruguayan pianist Dinorah Varsi (1939–2013) studied under Sarah Bourdillon, a pupil at Alfred Cortot’s Ecole Normale de Musique. So, Varsi’s interpretation of Chopin’s Nocturne is a possible legacy of Cortot’s and, respectively, Chopin’s conception.
The Seattle Singing Accuracy Protocol (SSAP) is a set of standardized tasks designed to
measure singing accuracy and component skills including matching novel pitches and
patterns in different timbres, singing a familiar song from memory on text and neutral
syllable, and testing perceptual acuity. The SSAP was developed by a group of
researchers from cognitive psychology, neuroscience, and education in an attempt to
standardize the way certain basic tasks were being measured. The goal was to provide a
measure of basic singing tasks that was short enough to be included in any study of
singing accuracy, thus allowing researchers to compare findings across studies and
develop a database of singing performance across populations of different ages and
skill levels. Our goal is to make the measure available to any scholars interested in
studying singing accuracy and its development.
This presentation will provide a brief demonstration of the measure and focus on the
methodological challenges inherent in creating an accessible version of the protocol for
use online and the quality of the initial data we are receiving. Methodological challenges
include ensuring clarity of the tasks, quality of input, accurate real-time acoustic signal
analysis, and a user-friendly interface. We will share an analytical comparison of how
well the automated acoustic analysis mimics human judgments of accuracy for both
pattern matching and a cappella singing. We will also offer a descriptive analysis of
preliminary data collected from the general public since the beta version went live in
November of 2015 to see who is choosing to participate and what kind of picture we are
getting of singing accuracy in the general population.
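The pitch-accuracy core of such automated acoustic scoring is commonly a cents-deviation computation against the target pitch. The following is a generic sketch, not the SSAP's actual algorithm, and the ±50-cent criterion is an assumption for illustration only:

```python
import math

def cents_off(f_sung_hz, f_target_hz):
    """Signed deviation of a sung pitch from its target, in cents
    (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(f_sung_hz / f_target_hz)

def is_accurate(f_sung_hz, f_target_hz, tolerance_cents=50.0):
    # hypothetical criterion: within half a semitone of the target
    return abs(cents_off(f_sung_hz, f_target_hz)) <= tolerance_cents

print(is_accurate(446.0, 440.0))  # about 23 cents sharp -> True
```

A human rater's category judgments can then be compared against this automated criterion trial by trial, as in the analytical comparison described above.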
To what extent do a pianist and cellist understand what just happened in their live duo
performance in the same way as each other, and in the same way as their audience
members do? In previous case studies on jazz standard improvisation and free jazz
improvisation, performers did not fully agree with their partner's characterizations of
what occurred in the improvisations, and they could agree with an expert outsider's
characterizations more than they agreed with their partner. Can this pattern also be
observed in classical chamber performance? In the current study, a cello-piano duo
performed Schumann’s Fantasiestücke Op. 73 no. 1 once for the other 12 members of
their conservatory studio class, which included another duo who were prepared to play
the same piece that day. Immediately afterwards, the listeners and players individually
wrote the top three things that had struck them about the performance. Then, listening to
the recording on personal devices, they marked on scores and wrote about three moments
that struck them as most worthy of comment. After all had finished, the response sheets
were redistributed for each participant to rate their agreement (on a 5-point scale) with,
and comment on, another participant's characterizations. Following this, there was a
second redistribution for agreement ratings with one other participant’s characterizations.
Results showed that the performers did not select any of the same moments in the piece
as worthy of comment and that they agreed with less than half of their performing
partners’ statements; this was markedly less than they agreed with statements by the
listeners who were prepared to play and markedly less than the other listeners’ agreement
with each other’s statements. Some agreements even included comments demonstrating
that the agreement was qualified. As we have seen in other genres, performers’
understanding of what happened is not necessarily privileged relative to outsiders’
understanding.
cello and used each of the four strings. Each pitch had a low and high playing position note. Twelve pitches total were selected for part one of this study. Six melodies were selected from Suzuki Cello School: Cello Part, Volumes 3 and 4 (1999, 2003). It was deemed practical to equally represent both major and minor mode because of their opposite emotional connotations. Therefore, three of the pieces picked were in a major mode and three were in a minor mode. The minor mode is often associated with sadness (Post & Huron, 2009). It was observed that cellists might move into the upper register more often for a sad expressive musical passage than for a happier one. Therefore, the prediction was made that participants would select the high playing position melodies as more expressive more often for the minor mode selections than major mode selections: (H3) Participants will select the melodies played in a high playing position as more expressive for minor mode melodies more often than for major mode melodies.

C. Interface
JavaScript and jsPsych were used to create the interface for this study (Flanagan, 2006; De Leeuw, 2015). When participants clicked on the first link for part one, a main window and a pop-up window appeared. The pop-up window contained the audio examples of notes played in a high playing position and notes played in a low playing position. The main window thanked them for participating and then proceeded to the demographic questions. After these, participants were briefly informed of what playing positions were on the cello and asked to familiarize themselves with the examples in the pop-up window as much as they liked throughout that part of the study. They were then presented with a pair of notes and asked to identify which they believed was played in a high playing position by pressing 1 or 2 on their computer keyboard. For part two, a similar procedure occurred. Participants were presented with pairs of melodies and asked to select which performance they deemed more expressive.

III. RESULTS
Recall that the first hypothesis predicted that participants would be able to hear a difference between the same notes played in a low playing position versus a high playing position. This ability was investigated by asking participants to listen to a pair of notes and choose which one was played in a high playing position. Each participant heard 12 pairs of notes. Therefore, if the results were consistent with the hypothesis, the resulting mean of scores should be higher than chance (6). A simple t-test demonstrates that the results are consistent with our hypothesis in that participants on average scored significantly higher than 6 (M = 7.96, SD = 2.86, t(46) = 4.69, p < .0001).

Our second hypothesis predicted that melodies played in mostly upper playing positions would be perceived as more expressive than those played in lower playing positions. To test this, participants heard 12 pairs of melodies, one version played with higher playing positions and one with lower playing positions. They were asked to select which of the two was more expressive. The results were coded so that selecting the melody played in an upper playing position was a “correct” answer. Therefore, should the results support the hypothesis, the mean score should be greater than 6. Initially it could appear that our results are inconsistent with our hypothesis (M = 5.96, SD = 2.7). However, upon closer examination, these results still provide an interesting picture.

Figure 1 shows a distribution of the frequencies of the results for part two of this study. As can be seen, there are two swells in the data. Participants either scored mostly below 6 (showing a preference for melodies played in a low playing position) or mostly above 6 (showing a preference for melodies played in a high playing position). Few participants actually scored around chance (6).

Figure 1. The Distribution of Scores for Part Two

To test the significance of this phenomenon, a post-hoc test was run. To verify that the means of each half of the data were significantly different from chance, six was subtracted from each of the scores and the absolute value was taken. A t-test was used again, this time with a test value of zero since zero represented participants lacking a strong preference for either low or high playing positions. The results are consistent with the idea that participants had an opinion that either low playing position melodies or high playing position melodies were more expressive rather than playing position having no effect (M = 2.28, SD = 1.44, t(46) = 10.84, p < .0001).

Recall that for this part of the experiment, two cellists were used. Half of the participants heard one cellist (Cellist A) play all of the melodies and half of them heard the other (Cellist B). Comparing resulting scores with an independent means t-test shows that there is a significant difference between responses to melodies played by Cellist A (M = 7.67, SD = 1.99) and responses to melodies played by Cellist B (M = 4.17, SD = 2.15, t(45) = 5.78, p < .0001). This difference is also evident in the distributions presented below, further supporting the results from the post-hoc test (Figures 2 & 3). Participants who heard Cellist A showed a significant preference for melodies played in the upper register (t(23) = 4.1, p < .0001) and participants who heard Cellist B showed a significant preference for melodies played in the lower register (t(22) = -4.1, p < .0001). Theories behind this difference in results are examined in the discussion section.

The third hypothesis predicted that participants would
select melodies played in a high playing position as the more expressive melody if the melody was in the minor mode rather than in the major mode. An independent means t-test was conducted. The results were consistent with the hypothesis in that, for both cellists, melodies in the upper playing positions were rated as more expressive more often when they were in the minor mode (M = 3.4, SD = 1.66) than in the major mode (M = 2.53, SD = 1.6, t(92) = -2.59, p = .011).

Figure 2. The Distribution of Scores for Cellist A

Figure 3. The Distribution of Scores for Cellist B

IV. DISCUSSION
The results of the first part of the study were consistent with the prediction that listeners would be reasonably able to categorize cello pitches into high and low playing positions. In additional analyses of the data for part one, no effect was found for what instrument the listener played or for whether or not the tones played were sympathetic or non-sympathetic pitches. The results of the second part of the study were partially consistent with the second main hypothesis. Participants who heard recordings by Cellist A, on average, rated the melodies played in high playing positions as more expressive. However, participants who heard Cellist B’s recordings, on average, rated the melodies played in low playing positions as more expressive. Given that the ability to play musically successfully in the upper register of the cello requires a high level of skill, it is worth noting that Cellist A was at a more advanced playing level than Cellist B. Cellist A had just completed a bachelor’s degree in cello performance and was applying to graduate programs. Cellist B had just finished their sophomore year of undergraduate studies in cello performance. This could perhaps explain the difference in results. Looking at the written responses, the highest scores for both cellists indicate that dynamic variation, timbre, rubato, and vibrato were all factors in their responses. This might indicate that Cellist A was able to execute these variables well in a high playing position while Cellist B was less able to. For Cellist B, responses also mentioned intonation and emotion, possibly indicating that in the upper register Cellist B played less in tune and more robotically. Additionally, perhaps results would have been different had the cellists been able to practice the melodies before the recording session.

Considering the positive results for Cellist A, it is worthwhile to discuss other reasons why cellists might prefer the upper register of the instrument when aiming to play more expressively. One reason is that vibrato changes as one moves up the fingerboard. In some empirical research, vibrato has been found to increase in both width and rate when moving from a low playing position to a high one. Allen, Geringer, & MacLeod (2009) found that the rate of vibrato of a professional violinist increased from a mean of 5.7 Hz in first position to a mean of 6.3 Hz in fifth position, and that their vibrato width increased from 40 cents in first position to 108 cents in fifth position. This finding extends beyond that single-case example. MacLeod (2008) found in another study with high school and university violin and viola players that vibrato width was wider in the upper register, with a mean of 58 cents as compared to 34 cents in the lower register, and that the rate was also faster in the upper register. Similarly, David A. Pope (2012) measured vibrato characteristics of high school and college level cellists and found that vibrato rates increased from a mean of 5.07 Hz in first position to a mean of 5.33 Hz in fourth and thumb positions, and that the width increased from 23 cents in first position, to 34 cents in fourth position, to 43 cents in thumb position.

The explanation for why vibrato widths and rates increase as one’s hand moves up the fingerboard is partially related to physics while also being affected by cello technique and characteristics of the instrument. Generally, vibrato width increases as one moves up the string because the string length shrinks, causing the physical distance between the notes to diminish as well (MacLeod, 2008). Because of this, pedagogues generally encourage students to use a smaller vibrato width as they move up the instrument. Executing a thinner vibrato width results in a reduction of the width of the oscillating motion of the hand. Consequently, the time it takes to complete a full oscillation is also reduced, resulting in a faster vibrato rate. This phenomenon is further complicated by changes that hand position and vibrato technique go through as one moves to higher playing positions. For example, when moving into fourth position, the neck of the instrument ends and the body of the instrument partially obstructs the hand as it vibrates, thereby shrinking the vibrato width and rate.
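For reference, a vibrato width reported in cents maps to a physical frequency excursion through the equal-temperament relation f' = f · 2^(cents/1200). A sketch, where the A4 = 440 Hz centre pitch is an arbitrary illustration rather than a value from the studies cited:

```python
def vibrato_extent_hz(center_hz, width_cents):
    """Peak-to-peak frequency excursion of a vibrato of the given
    width in cents, centred (geometrically) on center_hz."""
    half = width_cents / 2.0
    upper = center_hz * 2 ** (half / 1200.0)
    lower = center_hz * 2 ** (-half / 1200.0)
    return upper - lower

# a 108-cent-wide vibrato (Allen et al.'s fifth-position figure)
# around 440 Hz swings through roughly 27 Hz
print(round(vibrato_extent_hz(440.0, 108.0), 1))
```

Because the cents scale is logarithmic, the same cent width produces a larger excursion in hertz the higher the centre pitch, which compounds the audibility of upper-register vibrato.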
consequential vibrato alterations of shifting hand positions may factor into why a string player would opt for one hand position over another for expressive reasons.

In order to predict why a string player might choose a wider, faster vibrato, it is necessary to consider the emotional impressions of various vibrato widths and rates. Research on vocal vibrato, which is comparable in range and width to string vibrato (Seashore, 1931), gives insight into the affective information of vibrato widths and rates. Dromey, Holmes, Hopkin, & Tanner (2015) analyzed recordings of graduate vocalists and found increased vibrato rates for selections that were rated as more emotionally intense. Studies by Sundberg (1994) and Sundberg et al. (1995) found the same correlation between vibrato rate and emotional intensity, as well as some suggestion that vibrato width is positively correlated too. Assuming that vibrato could carry the same emotional meaning for instruments other than the voice, perhaps string players move to higher playing positions for a wider, faster vibrato to give a passage a higher emotional intensity. Given this research, it is hard to say which would be the dominant reason for moving into the higher register: to use a high pitch register to portray high emotionality, or to achieve a wider, faster vibrato to portray high emotionality. It may also be that the two are intertwined. Perhaps a wider, faster vibrato is part of the cue for high register, which communicates high emotionality. On the other hand, perhaps high register portrays a greater emotionality because of the wider, faster vibrato. This study cannot provide a conclusive answer to this question. Perhaps future work will be able to differentiate between these possible factors and determine any causality.

Another potentially expressive device, timbre, may be affected by fingering decisions as well. Open (un-fingered) strings have been found to have a brighter timbre than stopped (fingered) strings because the finger on the string dampens the higher harmonics (Schelleng, 1973). If string players want to avoid the brightness of the open strings, they may be tempted to move to a higher playing position to facilitate playing the closed versions of those notes. There may also be a timbral difference between notes played on a high string in a low playing position and those played on a low string in a high playing position. No measurements of timbre-related acoustical features were taken for this paper. However, in the future it would be helpful to examine the spectral differences between registers, given that our results indicate an audible difference.

Results were also consistent with our third hypothesis, that participants would more often prefer upper-playing-position melodies in the minor mode than in the major mode. It could be that playing in the upper register produces a darker timbre than playing in the low register. A darker timbre has been found to be an element of sad speech (Scherer, Johnstone, & Klasmeyer, 2003), and the minor mode is also associated with a sad affect (Juslin & Sloboda, 2011). Therefore, if the upper register indeed has a darker timbre, participants may have found the sadder minor-mode melodies better suited to that upper playing position.

As mentioned earlier, it is impossible to declare exactly which musical element of the upper range of the cello might have contributed to the perceived higher level of emotional intensity for Cellist A. While vibrato and timbre are certainly a part of the equation, both parts of this study provide strong evidence for the conjecture that using high playing positions on the cello might communicate a higher emotionality by mimicking the similar vocal affective cue.

ACKNOWLEDGMENT
Our thanks to the members of the Cognitive and Systematic Musicology Laboratory at Ohio State University for their critical feedback and support.

REFERENCES
Allen, M. L., Geringer, J. M., & MacLeod, R. B. (2009). Performance practice of violin vibrato: An artist-level case study. Journal of String Research, 4, 27-38.
Banse, R., & Scherer, K. R. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614.
Bharucha, J. J. (1991). Pitch, harmony, and neural nets: A psychological perspective. In P. Todd & G. Loy (Eds.), Music and Connectionism (pp. 84-99). Cambridge, MA: MIT Press.
De Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a Web browser. Behavior Research Methods, 47(1), 1-12.
Dromey, C., Holmes, S. O., Hopkin, J. A., & Tanner, K. (2015). The effects of emotional expression on vibrato. Journal of Voice, 29(2), 170-181.
Flanagan, D. (2006). JavaScript: The definitive guide. O'Reilly Media, Inc.
Honorof, D. N., & Whalen, D. H. (2005). Perception of pitch location within a speaker's F0 range. The Journal of the Acoustical Society of America, 117(4), 2193-2200.
Juslin, P. N., & Sloboda, J. (Eds.). (2011). Handbook of music and emotion: Theory, research, applications. Oxford: Oxford University Press.
MacLeod, R. B. (2008). Influences of dynamic level and pitch register on the vibrato rates and widths of violin and viola players. Journal of Research in Music Education, 56(1), 43-54.
Pope, D. A. (2012). An analysis of high school and university cellists' vibrato rates, widths, and pitches. String Research Journal, 3, 51.
Post, O., & Huron, D. (2009). Music in minor modes is slower (except in the Romantic period). Empirical Musicology Review, 4(1), 1-9.
Seashore, C. E. (1931). The natural history of the vibrato. Proceedings of the National Academy of Sciences, 17(12), 623-626.
Schelleng, J. C. (1973). The bowed string and the player. The Journal of the Acoustical Society of America, 53(1), 26-41.
Scherer, K. R., Johnstone, T., & Klasmeyer, G. (2003). Vocal expression of emotion. In R. Davidson, K. Scherer, & H. Goldsmith (Eds.), Handbook of Affective Sciences (pp. 433-456). Oxford: Oxford University Press.
Sundberg, J. (1994). Acoustic and psychoacoustic aspects of vocal vibrato. STL-QPSR, 35(2-3), 45-68.
Sundberg, J., Iwarsson, J., & Hagegård, H. (1995). A singer's expression of emotions in sung performance. In Vocal fold physiology. Singular Publishing Group.
Suzuki, S. (1999). Suzuki Cello School: Cello Part (Vol. 3). Alfred Music Publishing.
Suzuki, S. (2003). Suzuki Cello School: Cello Part (Vol. 4). Alfred Music Publishing.
581
Table of Contents
for this manuscript
Expert musicians tend to exhibit precise expressive control, thereby allowing them to
consistently execute their interpretation across multiple performances of the same piece.
However, experts are also distinguished by their creativity, with both deliberate and
spontaneous variations occurring in response to the expressive intentions and
circumstances of each live performance.
Although research often seeks to identify shared expressive traits across performers
relative to musical structure, the study of creativity necessitates the investigation of
individual performance characteristics. This project therefore uses a case study approach
to explore the relationship between expressive consistency and creativity in performances
by the renowned Baroque violinist Rachel Podger.
Podger recorded Bach’s solo violin works in 1999 and gave multiple live performances
of the same pieces in January 2016. The expressive characteristics of these performances
were measured using Sonic Visualiser software, with focus placed on her use of rubato,
dynamics, articulation, and ornamentation. Podger also participated in interviews to
provide information about the reasons for her deliberate expressive choices, her
purposeful changes across different performances, and any spontaneously creative
variations that occurred.
Results confirm that Podger has created and consistently executes a stylistically appropriate
interpretation while also remaining expressively flexible in each performance. The
consistency occurs in the location of expressive nuances relative to aspects of musical
structure and style (including meter, cadences, contour, voice leading, motivic repetition,
and dissonance). In contrast, expressive creativity appears in the use of improvised
ornamentation (both melodic embellishments and vibrato) and in the magnitude of timing
and dynamic variability both within and across different performances.
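Timing analyses of the kind described above can be sketched computationally. The following is a minimal illustration, not the study's actual procedure, and the beat onset times are invented: onsets exported from an annotation tool such as Sonic Visualiser are converted to local tempo curves, whose spread indicates within-performance rubato and whose correlation indicates consistency across performances.

```python
import numpy as np

def local_tempo(onsets_sec):
    """Local tempo (BPM) for each inter-beat interval of a performance."""
    ioi = np.diff(onsets_sec)  # inter-onset intervals in seconds
    return 60.0 / ioi

# Hypothetical beat onsets (seconds) for the same phrase in two performances.
perf_1999 = np.array([0.00, 0.52, 1.01, 1.55, 2.12, 2.60])
perf_2016 = np.array([0.00, 0.50, 1.03, 1.59, 2.18, 2.65])

tempo_1999 = local_tempo(perf_1999)
tempo_2016 = local_tempo(perf_2016)

# Within-performance rubato: spread of the local tempo curve.
rubato_1999 = np.std(tempo_1999)

# Cross-performance consistency: correlation of the two tempo curves.
consistency = np.corrcoef(tempo_1999, tempo_2016)[0, 1]
```

A high correlation combined with differing spreads would match the pattern reported here: expressive nuances placed in the same structural locations, but with varying magnitudes of timing variability.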
ISBN 1-876346-65-5 © ICMPC14 582 ICMPC14, July 5–9, 2016, San Francisco, USA
The Role Repertoire Choice Has in Shaping the Identity and Functionality of a
Chamber Music Ensemble
Alana Blackburn
School of Education, The University of New England, Australia
ablackb6@une.edu.au
make sense of the data and ensure that the focus of the analysis is on the participant while conducting a free textual analysis, noting and commenting on the participants' descriptions of things that matter to them and the meaning of those things for the participant (Smith et al., 2009). Emerging themes were then analyzed and any connections across these themes were identified. The IPA method requires that once an in-depth analysis is conducted on one case study, only then can the researcher move on to the next (Smith et al., 2009, p. 82). Analysis was completed using NVivo software, where themes were identified for each individual participant and then common themes were identified across all cases. For these three specific participants, repertoire was discussed a number of times, emphasizing its importance to the ensemble. Pseudonyms have been used throughout this paper and the ensembles that these participants are members of have not been named. Table 1 provides an overview of the participant case studies used for this paper. It shows the pseudonym used in the study and the professional profile describing the ensemble they are a member of.

Table 1. Participants' background information

Name | Participant Profile
Stella | Repertoire: specializes in 16th and 17th century polyphony. Homogenous ensemble of 13 members.
Brett | Repertoire: not available for this ensemble; members arrange and transcribe their own repertoire. Homogenous ensemble of 4 members.
John | Repertoire: composer/performer ensemble performing contemporary repertoire. Mixed ensemble of 6 members.

III. FINDINGS

A. Case study 1 – Early music ensemble
There is a whole area of musicology devoted to historically informed performance (HIP), which is known for its thorough practice-based research. The research and investigation into earlier periods of repertoire is often the basis on which an ensemble may develop. For Stella and her ensemble, how this research into historic performance practice shapes an ensemble goes much deeper than specializing in a particular repertoire. Stella is a core member of an ensemble focused on complex Renaissance polyphonic consort music. The repertoire is arranged and transcribed to suit the instrumentation by members of the ensemble. Stella takes pride in researching 16th and 17th century practices and includes these in her ensemble. These practices include performing without a score, performing without a conductor, allowing the musical material to determine who is leading at what point within the piece, and how themes are musically interpreted and played.

On the edge of what normally constitutes a chamber ensemble, Stella's ensemble comprises 13 members. It is a homogenous ensemble (or whole consort, in early music terminology) in that members all play the same instrument from the same family. After beginning the ensemble with a conductor, in recent years the ensemble has become leaderless. During this change, the ensemble has embraced its new structure as part of its identity, and Stella believes it reflects a truer display of the repertoire:

…this has become in a way a part of our ensemble identity which is very much related to the concept of consort itself, like we are a group but we don't have one leader, it's all shared as it is in the music we play, the polyphonic music where you don't have one part in the music which is more soloistic but everybody has the same themes, everybody shares the same material and everyone is equally important. So partly this has a very strong influence in the identity of the group. (Stella)

From Stella's description of the group's identity it is evident that the structure of the group, the collaboration and equality of parts, is reflected in the repertoire they perform. In terms of leadership, this adds another element to the preparation of the repertoire, as the ensemble performs all programs from memory. As Stella explains, this requires not only memorizing the player's individual part, but also remembering who is leading at what point in the piece. Although Stella identifies this as a risk in performance, staying true to consort music material means leadership is shared, and the normal hierarchical roles in relation to the order on the score (highest to lowest voice) do not determine leadership.

…it is more of a risk, on the other hand I think it's much more loyal to the music itself because in that kind of music we play, it's not, the top line, ja, it's the highest but not always the most important. The most important parts are actually the middle voices like if there's a cantus firmus it's most likely going to be on the tenor part or even the bass, and if there's imitation it's going to go through all the lines and very often it doesn't start on the top line, so I think musically it makes much more sense to do this divide the leadership and to feel, then also that all these different people in different roles all have a moment of shine or coming, being a little bit in the spotlight in every composition that makes everyone more alert even through it's more challenging. (Stella)

The ensemble does not have a conductor and it is therefore important that leadership is shared within the group. Rehearsal processes require at least one person 'in charge'; while performing, however, musical decisions are made and led between the players themselves, reflecting the shape and intention of the chosen repertoire.

This way of performing, directing the musical ideas from the members within the ensemble and not in a typical hierarchical way, also has an effect on the audience's experience:

…we found it quite interesting to notice that since we don't have a conductor, almost after every concert, if not after each single concert we have given since then, people erm, audience members come to us and say "wow this was so er interesting to watch because I could see how the music was going woo woo [laughs] and how you were looking at each other and some
people were here and some people…" – so even people who don't really know why that happens, or maybe they are not that acquainted with the music, they see this interaction and it captures them visually as well. (Stella)

The absence of a conductor has allowed the ensemble to focus on the repertoire in a different way. The analysis undertaken by the group to decide which musical lines introduce new material, the shape the music takes, and performing from memory have had an effect on the way this ensemble performs in public; the highest level of non-verbal communication is employed.

B. Case study 2 – Homogenous ensemble
The most widely studied example of a homogenous self-managing ensemble is the string quartet, as evinced by the research referenced earlier. This genre came to its peak in the 19th century, instilling a working group whereby each instrument and its position within the ensemble dictates its musical role. In a genre developed over 250 years, repertoire is abundant, requiring a certain level of fundamental understanding that can be put into practice each time a piece is chosen to be rehearsed and performed. For ensembles that do not have this wealth of repertoire, the identity and shape the ensemble takes becomes centered on the repertoire, needing more commitment and effort on the part of the players to source, commission, or transcribe/arrange existing works and adapt them to the ensemble.

Brett is in a homogenous quartet for which there is limited repertoire available. This forces the members of the quartet to arrange music written for other ensembles and instruments, often drawing on existing repertoire from string quartets and percussion ensembles. The ensemble needs to arrange everything, which is frustrating for Brett as it can hinder the productivity of the ensemble:

… so every new CD we do we have to arrange all the music. You know it's arranging existing works, but, you know, like, that's a hell of a lot of work to, whenever we even think about doing new pieces, you know, there's a whole stack of work that's got to go into even getting the notes in front of you [laughs]. It's pretty high maintenance [laughs]. (Brett)

Developing and sustaining a profile as a chamber ensemble requires a large commitment from all the members. This is often time consuming in terms of rehearsals, preparation, discussion, communication, and travel. For Brett, his definition of a 'high profile' ensemble is one of frequency of performance and recording output. Creating their own repertoire takes a large amount of time and therefore restricts the profile of the group:

… it's probably why we don't have a higher profile because, well not only that, but because you can't, you know, like a string quartet or a brass quintet, or whatever recorder quartet, you can't just go to a library and just pick out a rarely performed piece and put that on your program, there's just nothing there. So there's this whole you know which has to happen. So, not a lot of things can happen spontaneously. It all has to be really planned, long-term vision and that's tricky to maintain. (Brett)

As a unique ensemble, they are developing a new genre and identity. The process of creating their own repertoire not only gives the ensemble a sense of fulfillment in performance, but the group is making a contribution to the genre in the hope that it will grow and encourage other instrumentalists to form similar groups.

… that's very problematic for getting more momentum with the genre. I mean we're establishing the repertoire side of it… (Brett)

Brett understands that this new genre of ensemble means that there is no existing repertoire available. The goal of the ensemble is not only to perform works arranged for the ensemble to an audience, but also to build the genre by making repertoire more readily available. The lack of music requires more forward planning, not allowing for spontaneous or impromptu performances.

C. Case study 3 – Composer/performer ensemble
Composer/performer ensembles are groups founded by a composer who also performs within the group, playing their own repertoire. In the case of John, he acts as artistic director, composer, and performer, providing the ensemble with new repertoire as well as searching for unique collaborations that seek to provide context through the works they are performing. The music performed by this ensemble is new; however, John does not label the ensemble a 'new music ensemble' or 'contemporary music ensemble', as the goals set out by the group do not lend its identity to that of a new music group.

I guess the goal of that initially was to basically play 20th century music generally, mainly, you know, play new works and old works and so forth. But when I say old, still, relatively recent. So it wasn't like a new music ensemble per se, it was never really designed to be like a contemporary music ensemble either really. It's interesting like how we became identified later on. (John)

Collaboration and programming are the focus, aims, and goals of this ensemble, not specific repertoire from a particular style or genre. Their programs are often mixed with folk, popular, or musical theatre. The group identifies with contextualized, interdisciplinary performance.

…we can act-you know and that really changed, and since then like it hasn't really been about the genre or the repertoire, or necessarily the pieces, it's about how the whole thing works as a whole and how the threads kind of interlie and you know, personally I really like other groups that do that which of course there are some, but others you know focus on the pieces, which is great too. (John)

John further explains that by contextualizing and collaborating in programs, the risk of introducing new music to an audience is minimized:

When we perform new music if you use that as an example, of course there's going to be a variety of responses that audiences have to that, but generally I think because we
contextualize it. It's never really a big issue because it's never music for its own sake, that risk is not so much there. (John)

The composer/performer's role is often to produce repertoire; in this case, however, it is shared amongst the other members of the ensemble depending on their skills and strengths – a true collaborative effort. Because each member is involved in the production of repertoire, and the ensemble is flexible in its formation, this changes the dynamic, leadership, and direction of each piece within a program or performance, creating a more complex dynamic.

We do a lot of small chamber, so like duos and trios and things like that, and that also means it's different, everyone has a different function. In the past also, less, ah… in different ways now but, in the past there was in terms of like arrangements that were done on sh-, like notes on page, where now a lot of the arrangements have been aural and stuff but notes on pages there were maybe three people that probably contributed equally when it came to doing those arrangements so that also mean that generally those people would lead stuff, because they knew how it would be. That still kind of happens now but in a slightly, slightly different way. (John)

Refusing to be labeled a 'new music ensemble' despite the currency of the repertoire performed, John prefers to focus on the package or program performed as a whole. The story told through the music and interdisciplinary collaboration is of more importance than the individual pieces, making the performances more 'audience friendly'.

IV. DISCUSSION
In today's cultural environment musicians act as entrepreneurs, deciding on and searching for their own brand or identity. This includes sourcing their own repertoire and performance opportunities. For the participants presented in this paper, finding repertoire means not playing 'standard' works but sourcing, arranging, and transcribing existing pieces, or commissioning and composing new scores. This process therefore takes time and can affect the momentum of an ensemble, as is the case with Brett's ensemble. By not being able to access pieces 'off the shelf', timelines are structured around composer availability and members' willingness to assist in searching for repertoire. The chosen repertoire also reflects the goals and desired outcomes of the ensemble. For Stella and John, the repertoire performed is embodied within the artistic vision of the group, representing the values and aims the ensemble identifies with. Brett explains that the artistic output of the group is not only performing or recording repertoire they have arranged, but also building a library for the genre.

The repertoire represented by the ensemble brings with it a number of socio-cultural factors, both within the ensemble and the audience. The knowledge each performer possesses contributes to the musical outcome of the score on both an aesthetic and a technical level. Different repertoire is also found to affect the co-ordination of content, allowing the musicians to focus on different aspects in preparation for and during a performance (Davidson & Good, 1997). In the ensembles represented in this paper – early music ensemble, homogenous (same instrument family) ensemble, and contemporary eclectic ensemble – the roles of each of the instruments or players are flexible and therefore change depending on the style of repertoire or each individual piece. This requires a different approach to rehearsal, performance, and discussion about interpretation and leadership.

There have been a number of studies exploring the effect known and unknown repertoire has on an audience, and part of developing an ensemble today is audience engagement (Ford & Davidson, 2003; Pompe, Tamburri, & Munn, 2013). Knowing who you want to play for has an effect on what you are going to play for them. John describes his audience as 'a thinking audience', not 'genre specific', saying that the audience wants 'to be informed of new things and ideas'. To connect with this audience, John focuses on incorporating elements of history, anthropology, and science into his programs, contextualizing the new repertoire the group has created. The musical flow of leadership within Stella's ensemble gives the audience a visual as well as an aural map of the music; the polyphonic writing of the repertoire is shared throughout the ensemble, giving new insight into the composer's musical intentions. This results in audiences enjoying music they may be unfamiliar with. The memorization of the repertoire also allows members of the group to engage with and 'focus on the audience' rather than on a conductor or music stand.

Ensemble identity is formed through a number of factors: a title for the group, dress, artistic and organizational roles, and shared goals (Blank & Davidson, 2007, p. 245). The choices behind certain repertoire shape the values and outcomes of an ensemble but also develop the identity of the group and its raison d'être. The aims of an ensemble and why they focus on certain repertoire differ: (i) they do it for purely artistic/musical reasons, (ii) the group is composer driven, (iii) the programs are educational or contextual, or (iv) the group is experimental and interdisciplinary. The repertoire reflects these aims, and the three cases presented provide clear examples of well-thought-out alignment between the group's aims and identity and its repertoire choice.

V. CONCLUSION
This paper explored the role of repertoire choice in shaping the social processes and functionality of chamber music ensembles. It highlights the implications of such repertoire choice, which is in itself a carefully thought out process that affects the ensemble in terms of sourcing music, leadership, audience engagement, time management, and identity. The participants in the three cases presented here identified repertoire as an extremely important factor in how the ensemble is shaped, indicating that in order to maintain a successful ensemble, more thought needs to go into aligning the aims, goals, and objectives of an ensemble. Highlighting such an important step in shaping an ensemble means that this process cannot be overlooked when aligning the aims and objectives of a professional ensemble. The information provided by these participants invites further investigation into audience development and whether this alignment, and the impact good repertoire choice has on the audience, is worthwhile for an ensemble. This information also informs practitioners and teachers of the significance this has for the success of a chamber music ensemble.
REFERENCES
IV. RESULTS

F. Data Analysis
After we removed n = 94 incomplete data sets, n = 377 cases remained for data analysis. The 13 items were dichotomized and analyzed by Item Response Theory (1PL model) for Rasch validity. The marks for the 3 items remaining from the IRT analysis (authentic, sonorous, professional) were summed up and divided by 3, so that the resulting Performance Evaluation Scale (PES) was within a score range from 1 to 4. To increase the contrast between groups of high vs. low levels of musical sophistication, we excluded n = 47 participants who were neither musical experts (≥ 8 yrs of music lessons) nor amateurs (1-4 yrs of music lessons). A final sample of n = 330 cases remained for statistical analysis.

G. Differences between main conditions
Figure 1 shows the between-groups main effect for the conditions (with vs. without music stand). Evaluations for playing without a music stand were slightly more positive (M = 3.20, SD = 0.51) when compared to a performance with a visible music stand (M = 3.05, SD = 0.62). A t-test for this main effect resulted in a significant finding, which was, however, characterized by a small effect size (t(318.47) = -2.47, p = .014, d = 0.27). The absolute difference between both conditions was only 0.15 scale steps (out of a maximal range of 3).

[Figure 2: Performance Evaluation Scale means with 95% CIs (range 1-4) for Experts vs. Amateurs in four conditions: Prelude with/without music stand, Gigue with/without music stand.]
Figure 2. Mean values of the Performance Evaluation Scale for the interaction of with vs. without music stand and high vs. low degree of musical sophistication (1 = poor, 4 = very good). In all conditions, the performer knew the piece by heart, but in the condition with music stand, he or she looked at the score.

V. CONCLUSION
We conclude that the audience's appreciation of a particular performance by heart might be based on factors other than the objective performance quality. Additionally, although performance from the score is often confounded with a player's supposed lack of technical proficiency – which was also the case in Williamon's (1999) study – the mere use of a music stand does not in itself result in a negative performance evaluation. Against the background of previous
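The scale construction and group comparison reported in Section IV can be sketched as follows. This is a minimal re-implementation using simulated ratings — the group sizes and rating distributions are invented — so the resulting statistics will not reproduce the reported t(318.47) = -2.47, d = 0.27.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated per-participant marks (1-4) on the three retained items:
# authentic, sonorous, professional. Group sizes are invented for illustration.
with_stand = rng.normal(3.05, 0.6, size=(165, 3)).clip(1, 4)
without_stand = rng.normal(3.20, 0.5, size=(165, 3)).clip(1, 4)

# Performance Evaluation Scale (PES): the three marks summed and divided
# by 3, keeping each participant's score within the 1-4 range.
pes_with = with_stand.mean(axis=1)
pes_without = without_stand.mean(axis=1)

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom (unequal variances),
    the kind of test that yields fractional df such as 318.47."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

t, df = welch_t(pes_with, pes_without)

# Cohen's d from the two groups' pooled standard deviation.
pooled_sd = np.sqrt((pes_with.var(ddof=1) + pes_without.var(ddof=1)) / 2)
d = abs(pes_with.mean() - pes_without.mean()) / pooled_sd
```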
ACKNOWLEDGMENT
We thank two anonymous reviewers for their helpful
comments on an earlier version of this manuscript as well as
Cosimo Caloeiro for his patience and cooperation during the
recording of the videos and Theresa Tamoszus for her support
during the implementation of the study.
REFERENCES
Behne, K.-E. (1994). Gehört, gedacht, gesehen: zehn Aufsätze zum
visuellen, kreativen und theoretischen Umgang mit Musik [Heard,
thought, seen: Ten essays on visual, creative, and theoretical
dealings with music]. Regensburg: Bosse.
Berlo, D. K., Lemert, J. B., & Mertz, R. J. (1969). Dimensions for
evaluating the acceptability of message sources. Public Opinion
Quarterly, 33(4), 563-576.
Chaffin, R., Demos, A. P., & Logan, T. (2016). Performing from
memory. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford
handbook of music psychology (2nd ed., pp. 559-571). Oxford:
Oxford University Press.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power
3: A flexible statistical power analysis program for the social,
behavioral, and biomedical sciences. Behavior Research Methods,
39(2), 175-191.
Lehmann, M., & Kopiez, R. (2013). The influence of on-stage
behavior on the subjective evaluation of rock guitar
performances. Musicae Scientiae, 17(4), 472-494.
Platz, F., & Kopiez, R. (2012). When the eye listens: A
meta-analysis of how audio-visual presentation enhances the
appreciation of music performance. Music Perception, 30(1),
71-83.
Platz, F., & Kopiez, R. (2013). When the first impression counts:
Music performers, audience, and the evaluation of stage entrance
behavior. Musicae Scientiae, 17(2), 167-197.
Reips, U.-D. (2002). Standards for internet-based experimenting.
Experimental Psychology, 49(4), 243-256.
Thompson, S., & Williamon, A. (2003). Evaluating evaluation:
Musical performance assessment as a research tool. Music
Perception, 21(1), 21-41.
Williamon, A. (1999). The value of performing from memory.
Psychology of Music, 27(1), 84-95.
Are visual and auditory cues reliable predictors for determining the
finalists of a music competition?
Friedrich Platz¹, Reinhard Kopiez², Anna Wolf², Felix Thiesen²
¹ State University of Music and Performing Arts, Stuttgart, Germany
² Hanover University of Music, Drama and Media, Hanover, Germany
Abstract—Our understanding of—and preference for—music is dependent upon the perception of human agency. Listeners often speak of how computer-based performances lack the “soul” of a human performer. At the heart of perceived animacy is causality, which in music might be thought of as rubato, and other variations in timing. This study focuses on the role of variations in microtiming on the perceived animacy of a musical performance. Recent work has shown that the perception of visual animacy is likely categorical, rather than gradual. Although a number of studies have examined auditory animacy, there has been very little research done on whether it might be thought of as a dichotomy of alive/not alive, rather than a continuum. The current study examines the specific intricacies of musical animacy, specifically, how microtiming variations of inter-onset intervals contribute to the perception that a piece was human-performed. Additionally, this study aims to examine the possible nature of categorical/continuous perception of musical animacy. In Experiment 1, “Rohum”, computer-sequenced MIDI renditions were manipulated to contain set random fluctuations of inter-onset intervals. In Experiment 2, “Humbot”, participants were presented with human performances digitally recorded using MIDI keyboards, and were asked how “alive” each performance sounded using a 7-point Likert scale. Human performances were divided into ten degrees of quantization strength, increasing from raw performance to 100% quantization. Results suggest an optimal level of quantization strength that is correlated with higher perceived animacy, and fixed random fluctuations of IOI are not a good indicator of human performance. This paper discusses the role of external stylistic assumptions on perceived performances, and also takes into account musical sophistication indices and experience.

Keywords—Animacy, performance, microtiming variations, rubato.

I. INTRODUCTION

objects are perceived as animate or inanimate. Some of the earliest work on the perception of animacy and causality was conducted by Heider and Simmel, who demonstrated the role of motion in the perception of animacy in shapes [1]. Stewart showed that the perception of animacy needs only a few very simple perceptual cues, and experiments by Premack and his colleagues later showed that certain movements allow viewers to ascribe intentions and motives to abstract geometric figures such as triangles and squares [2]–[4]. Tremoulet and Feldman demonstrated that the perception of causality in visual stimuli might be linked to change in direction [5]. Scholl and Tremoulet refer to this as the “energy violation” hypothesis [6]. Broze codified much of this research into six indicators of an animate object [7]:
1. Animate agents are self-propelling or locomotive, and can move under their own power.
2. Animate agents possess intentionality, and exhibit goal-directed behavior.
3. Animate agents are communicative, and employ signalling systems.
4. Animate agents are sentient, and can experience subjective feelings, percepts, and emotions.
5. Animate agents are intelligent, capable of rational thought.
6. Animate agents can be self-conscious, having metacognitive states about their own consciousness and that of others.
It could be argued, however, that these qualities themselves are less important than the perception that an agent has the capacity to convey them. Recent work by Looser and
above, rather than demonstrating the explicit ability to do so. Just as visual cues can be used to infer the animacy of an object, it would follow that auditory cues can contribute to the perception of an animate agent creating a sound. Nielsen et al. conducted an auditory analogue of [5], using synthesized mosquito sounds [11]. Using binaural spatialization software, the authors varied the direction of motion and the velocity of the mosquito sound and asked participants to rate how likely the sound was to have been produced by an animate source. Interestingly, velocity changes were rated as significantly more animate when compared to the other manipulations, including directional change of motion or no change at all. This gives rise to the question of what aspects of sound, and its organization in time, can be manipulated to change the perception of animacy in music.

Perhaps the closest analogue to changes of direction, motion, and intentionality in music would be the use of expressive timing. In music, expressivity is regarded by musicians as the most important aspect of performance [12], and in performance, emotions are expressed through subtle implicit deviations in timing, dynamics, and intonation. In one study, Gabrielsson and Juslin asked participants to perform a melody so as to express different emotions such as sadness, happiness, and anger, as well as to perform with “no expression” [13]. The fluctuations in performance timing differed strongly across the expressed emotions. Moreover, timing variations were the most salient cues for an expressive musical performance, and performances by participants asked to play with “no expression” contained the smallest deviations in timing. Juslin defines expressivity as “random variations that reflect human limitations with regard to internal time-keeper variance and motor delays” [14]. Geringer et al. expand upon this, discussing consistent expressivity as necessary for the appreciation of music as human-produced [15].

Manipulations in timing appear to elicit the greatest amount of change in the perception of human character in music. Johnson asked participants to rate how “musical” a piece of music was (specifically a performance of Bach’s Third Cello Suite), and found that when the music was manipulated to contain an exaggerated amount of rubato, “musical” ratings trended downwards [16]. However, when rubato was manipulated to contain less than the original performance, participants also perceived the music as significantly less “musical”. Bruno Repp analyzed the relationship between microtiming variations and the larger-scale timing variations found in rubato [17]. When participants were asked to perform Chopin’s Preludes No. 15 and No. 6 with “normal” expression, metronomically (without the aid of a metronome), and in synchrony with a metronome, Repp discovered that articulated rubato performances with normal expression were simply exaggerated gestures of the random timing variations that naturally occur due to human limitations. This suggests that the expressive nature of rubato mimics the micro-timing variations that already exist, and it provides a stable foundation for the present study.

This study aims to elucidate the perceptual mechanisms involved in the perception of an animate human performance versus an inanimate computer MIDI sequence using micro-timing variations. If microtiming variations are a salient cue for the perception of animacy in music, where is the tipping point between the perception of a “deadpan” computer-generated MIDI sequence and an emotionally stimulating human performance? Furthermore, what is the optimal magnitude of variation that induces a convincingly human performance? Using two experimental paradigms, this study explored how the perception of animacy is affected as variance is applied to a computer MIDI sequence and, from the other direction, how the perception of animacy in a raw human performance changes as microtiming variations are decreased. We hypothesized that animacy in music is most likely not linear, meaning that participants do not hear “aliveness” on a sliding scale, and that there is a likely “sweet spot” in rubato: performances that are too straight will be thought of as robotic, as will pieces that have too much variance.

II. EXPERIMENT 1: APPLYING VARIANCE TO COMPUTERIZED PERFORMANCES

Our first study used sequenced performances, adding varying levels of variance to each performance. As this study converted “robotic” performances into more “human” performances, we advertised it to participants as “Rohum” (to keep it separate from “Humbot”, which altered human performances).

A. Methods
Participants. Students of the Louisiana State University (LSU) School of Music (N = 38) were recruited for participation (18 females, 20 males, mean age 20.4). Experimental trials were conducted in the Music Cognition and Computation Lab.
Stimuli. Bach’s Toccata and Fugue in D minor (BWV 565) and Bach’s Concerto for Oboe and Violin (BWV 1060a) were computer-generated using Finale music notation software (MakeMusic) to produce precisely metrically organized note onsets. Excerpts were split evenly by time (5 seconds each). Using Logic Pro X (Apple Inc.), tempo, timbre, and velocity fluctuations were minimized in order to eliminate any effect of expressiveness outside that of timing. Using Max/MSP software (Version 7; Cycling ’74), we applied variance to each recording by randomly adding or subtracting an interval of time determined by dividing a maximum note displacement time of 500 ms into 100 equal windows. For example, onsets occurring at 500 ms and 1000 ms with 5 ms variance would be manipulated by randomly adding or subtracting 5 ms to create a new manipulated onset of either 495/505 ms or 995/1005 ms, respectively. If the recording were set to include a variance of 10 ms, the same excerpt’s onsets would be altered so that they fell at either 490/510 ms and 990/1010 ms after the beginning of the recording.
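The fixed-displacement rule described in the Stimuli section can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' Max/MSP patch; the function name and the list-of-onsets layout are our own.

```python
import random

def apply_fixed_variance(onsets_ms, variance_ms):
    """Shift each onset by exactly +/-variance_ms, choosing the sign at random.

    In the study, variance_ms was one of 100 levels obtained by dividing the
    maximum 500 ms note displacement into 100 equal 5 ms windows."""
    return [t + random.choice((-variance_ms, variance_ms)) for t in onsets_ms]

# Onsets at 500 ms and 1000 ms with 5 ms variance become
# 495/505 ms and 995/1005 ms, respectively.
shifted = apply_fixed_variance([500, 1000], 5)
```

Note that the displacement magnitude is constant for a given stimulus; only its sign is random, which is exactly the rigidity the authors later identify as a limitation.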
III. EXPERIMENT 2: QUANTIZING A HUMAN PERFORMANCE

The purpose of the second experiment (“Humbot”) was to approach the question of how animacy ratings change with manipulation of microtiming variations, but from the opposite direction of the previous experiment (“Rohum”). This study comprises both a lab study and a web study; the latter was originally meant as a replication of the former. Ideally, we would see an arc of animacy ratings across the human performances quantized to varying degrees.

A. Methods
Stimuli. Humbot. A pianist performed Bach’s Concerto for Organ in G major (BWV 592) and Chopin’s Mazurka 49 in F minor (Op. 68, No. 4) on a MIDI keyboard. The pianist was instructed to perform the pieces naturally. MIDI recordings of both performances were divided into 10 degrees of quantization strength. Using Logic Pro X software (Apple Inc.), tempo, timbre, and velocity fluctuations were minimized, and quantization strength was systematically applied on a continuum from 0% to 100% quantization in increments of 10%. Both recordings were quantized to the nearest 16th note. Thus, the natural variation found in human performance decreased as quantization strength was increased.

point Likert scale (from 1, definitely not alive, to 7, definitely alive).

C. Experiment 2b: Web Study
Participants. Twenty-nine volunteer participants (14 males, 15 females) were recruited through various social media platforms, such as Facebook, Twitter, and Reddit. Participants were not required to be part of a university setting, and we did not require previous musical training.
Design. All participants were allowed to access the study from any location. Participants were given a link and voluntarily completed the study using various web browsers. Participants were prompted on screen to listen to 5-10 s recording excerpts at randomly selected degrees of quantization strength. After each stimulus, participants were asked to use the number keys 1-7 to rate each recording (from 1, definitely not alive, to 7, definitely alive) before moving on to the next recording. The participant had the choice of how quickly to move through the experiment.

D. Results
Animacy ratings. In order to examine the analogue between visual and auditory animacy, the results were analyzed in a method similar to Looser and Wheatley [8]. As in their study, we analyzed animacy ratings by first transforming them linearly to a scale from 0 to 1 (0 = less likely to be performed by a human, 1 = more likely human). In order to find the point at which a recording was equally likely to be perceived as animate or inanimate—this point is known as the point of subjective equality (PSE; [8])—the animacy data from each participant were fit with a cumulative normal function across all recordings. A t-test was then performed between the PSE of each recording and the midpoint of the transform. The results of this t-test were significant (p < .001), indicating that the transition from one end of the “aliveness” spectrum to the other is not linear.
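The strength-based quantization used for the Humbot stimuli (onsets pulled toward the nearest 16th-note grid line by 0% to 100%) can be sketched as a linear interpolation between the raw onset and its grid position. This is a generic reading of "quantization strength", not Logic Pro's internal algorithm; all names are ours.

```python
def quantize(onset_ms, grid_ms, strength):
    """Pull an onset toward the nearest grid line (e.g. the nearest 16th note)
    by a fraction `strength` in [0, 1]: 0.0 leaves the raw performance
    untouched, 1.0 snaps fully to the grid (100% quantization)."""
    nearest = round(onset_ms / grid_ms) * grid_ms
    return onset_ms + strength * (nearest - onset_ms)

# Ten 10% increments from raw to fully quantized, as in the Humbot stimuli:
onset = 512.0  # a slightly late onset on a 125 ms 16th-note grid
levels = [quantize(onset, 125.0, s / 10) for s in range(11)]
```

At strength 1.0 the example onset lands exactly on the 500 ms grid line, removing all microtiming variation at that note.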
E. Discussion
Ideally, we would be able to demonstrate that an arc of “aliveness” exists depending on the level of quantization. We were unable to find such a result with the lab study, which we think is largely due to its relatively small sample size. As such, we decided to use the web study (discussed above) less as a replication and more as a way of increasing statistical power.
The web study results did, in fact, indicate a trend toward an optimal level of quantization strength, with which the highest animacy ratings were weakly correlated. As can be seen in Fig. 6 (below), we find that the original performances were correctly rated as “alive”, but that these ratings actually increased as a minimal level of quantization was applied. Interestingly, however, these ratings sharply declined as quantization went past a threshold (around 50%). This effect appears most strongly in the Chopin pieces, and less so in Bach, indicating that a top-down style-processing component is at play.

Fig. 6 Chopin’s Mazurka 49 in F minor (Op. 68, No. 4) from the web study, indicating that there is likely a point at which there is an optimal level of quantization in rubato. As a corollary to Experiment 1, where too much rubato hindered perceived animacy, it seems that too much quantization has a similar effect.

IV. CONCLUSION

In Experiment 1 (Rohum), we hypothesized that increasing variance would result in higher animacy ratings. However, the contrary was found. As variance was introduced, animacy ratings promptly decreased, suggesting that the method of applying variance may have been too rigid to convincingly emulate human performance. Variance was applied by randomly selecting between adding or subtracting an interval of time that was a multiple of 5 in the range of 0 to 500 ms. That is, no random jitter was applied within each interval of time; the original note was simply shifted by a fixed amount of time to fall randomly either before or after the metric beat. Evidence from Experiment 1 suggests that this model for synthesizing the natural variation due to human limitations does not communicate a convincing perception of human performance. The unpredictability of random variations makes every song unique and does contribute to the ‘living’ character in music [14], but the present study suggests that the perceptual mechanisms involved in discerning between animate and inanimate music are highly sensitive to the type of microtiming variations embedded within the music.
Future work will also examine the types of variance and fluctuations that might be applied. Noise and random fluctuations come in a variety of forms, ranging from “white” noise (uncorrelated variance) to “brown” noise (1/f² noise) [19]. Human performance has been shown to be a mixture of “pink” (1/f noise) and “white” noise [20], [19], [14], [21], [22]. Slower tempos, or larger intervals between beats, produce a variance that corresponds more closely to 1/f noise. A task testing pure reaction time, such as asking participants to press the spacebar as quickly as possible, will produce a variance of reaction-time intervals that is more associated with white noise [21]. A preference for 1/f noise variance has been found in computer-sequenced music [23]; however, the question of how much variance is necessary to produce a convincingly human performance, and whether that perception is categorical, still remains. Experiment 1 of the present study aimed to produce convincing human performances by taking computer-generated music as a starting point and applying “fixed” variance. Future research would test whether pure white and pure 1/f noise produce significantly higher animacy ratings with increased application of variance, compared to the “fixed” variance method. Furthermore, if a PSE is found, employing a same/different task across the PSE would shed light on whether the perception of animacy in music is categorical, and thus consistent with the categorical perception of animacy in faces [8].
The results indicate that the manipulation of rubato using variations in microtiming has some effect on perceived animacy. The size of the effect, however, seems to be composer- and genre-specific. This is consistent with previous literature suggesting that we are more likely to perceive objects as animate than inanimate because, evolutionarily, this is best for survival. Interestingly, this perceptual mechanism also extends into the auditory domain as it relates specifically to music perception [8]. The web study results revealed a trend towards an optimal level of quantization strength in the Chopin stimuli, where animacy ratings initially increased in an inverse-U shape, reaching an apex, but then promptly began to drop as quantization strength continued to increase. We would argue that these minor changes in microtiming serve as a way to display intentionality and are communicative of human performance. Together, these studies illustrate how the perception of animacy in music can be manipulated by systematically adjusting microtiming variations and, perhaps more importantly, how the method of manipulation (Rohum vs. Humbot) affects the perception of animacy. Future work will continue along this line, engaging in more analyses of stylistic differences and the perception of animacy, as well as the specific points at which rubato is applied and how shifting those points affects the perception of aliveness.
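The white/pink/brown noise taxonomy invoked above can be made concrete. The sketch below synthesizes 1/f^β noise by spectral shaping, a standard synthesis method; it is not the procedure used in [23], and the function name is ours.

```python
import numpy as np

def noise_1_over_f(n, exponent=1.0, seed=0):
    """Synthesize n samples of noise whose power spectrum falls as 1/f**exponent:
    exponent 0 -> white, 1 -> pink, 2 -> brown(ian)."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid dividing by zero at the DC bin
    # Random complex spectrum, shaped so that power ~ f**(-exponent)
    spectrum = rng.standard_normal(freqs.size) + 1j * rng.standard_normal(freqs.size)
    spectrum *= freqs ** (-exponent / 2.0)
    return np.fft.irfft(spectrum, n)

pink_jitter = noise_1_over_f(1024, exponent=1.0)   # candidate timing deviations
white_jitter = noise_1_over_f(1024, exponent=0.0)
```

Scaled to a few milliseconds and added to onset times, such sequences would provide the correlated ("pink") versus uncorrelated ("white") jitter conditions the authors propose for future work.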
REFERENCES
[1] F. Heider, M. Simmel, “An Experimental Study of Apparent Behavior,”
The American Journal of Psychology, vol. 57, pp. 243–259, 1944.
[2] J. A. Stewart, “Perception of animacy,” unpublished doctoral dissertation, University of Pennsylvania, 1982.
[3] D. Premack, “‘Intentionality’: How to tell Mae West from a crocodile (a review of The Intentional Stance),” Behavioral and Brain Sciences, vol. 11,
1988.
[4] D. Premack, “Cause/induced motion: Intention/spontaneous motion,”
Origins of the Human Brain, vol. 321, 1995.
[5] P.D. Tremoulet, J. Feldman, “Perception of animacy from the motion of
a single object,” Perception, vol.29, 2000.
[6] B.J. Scholl, P.D. Tremoulet, “Perceptual causality and animacy,” Trends
in Cognitive Sciences, vol. 4, issue 8, 2000.
[7] G. J. Y. Broze III, “Animacy, anthropomimesis, and musical line,” doctoral dissertation, The Ohio State
University, 2013.
[8] C.E. Looser, T. Wheatley, “The Tipping Point of Animacy: How, When,
and Where We Perceive Life in a Face,” Psychological Science, vol. 21,
issue 12, pp. 1854–1862, 2010.
[9] T. Wheatley, S.C. Milleville, A. Martin, “Understanding Animate
Agents: Distinct Roles for the Social Network and Mirror System,”
Psychological Science, vol. 18 issue 6, pp. 469–474, 2007.
[10] E. Halgren, P. Baudena, J.M. Clarke, G. Heit, K. Marinkovic, B.
Devaux, et al., “Intracerebral potentials to rare target and distractor
auditory and visual stimuli. II. medial, lateral and posterior temporal
lobe,” Electroencephalography and Clinical Neurophysiology, vol. 94,
issue 4, 1995.
[11] R.H. Nielsen, P. Vuust, M. Wallentin, “Perception of animacy from the
motion of a single sound object,” Perception, vol. 44, issue 2, 2015.
[12] E. Lindström, P.N. Juslin, R. Bresin, A. Williamon, “‘Expressivity
comes from within your soul’: A questionnaire study of music students'
perspectives on expressivity.” Research Studies in Music Education, vol.
20, issue 1, 2003.
[13] A. Gabrielsson, P.N. Juslin, “Emotional expression in music
performance: Between the performer's intention and the listener's
experience,” Psychology of Music, vol. 24, issue 1, pp. 68-91, 1996.
[14] P. Juslin, “Five Facets of Musical Expression: A Psychologist's
Perspective on Music Performance,” Psychology of Music, vol. 31, issue
3, pp. 273-302. 2003.
[15] J. M. Geringer, J.K. Sasanfar, “Listener perception of expressivity in
collaborative performances containing expressive and unexpressive
playing by the pianist,” Journal of Research in Music Education, vol.
61, issue 2, 2013.
[16] C.M. Johnson, “Effect of rubato magnitude on the perception of
musicianship in musical performance,” Journal of Research in Music
Education, vol. 51, issue 2, 2003.
[17] B. Repp, “Relationships Between Performance Timing, Perception of
Timing Perturbations, and Perceptual-Motor Synchronisation in Two
Chopin Preludes,” Australian Journal of Psychology, vol. 51, issue 3,
pp. 188-203, 1999.
[18] D. Müllensiefen, B. Gingras, J. Musil, L. Stewart, “The Musicality of
Non-Musicians: An Index for Assessing Musical Sophistication in the
General Population,” PLoS ONE, vol. 9, issue 2, 2014.
[19] P.N. Juslin, A. Friberg, R. Bresin, “Toward a computational model of
expression in music performance: The GERM model,” Musicae
Scientiae, vol. 5, 2002.
[20] R. F. Voss, J. Clarke, “1/f noise in music and speech,” Nature, vol. 258,
1975.
[21] D.L. Gilden, T. Thornton, M.W. Mallon, “1/f noise in human cognition,”
Science, vol. 267, 1995.
[22] D.L. Gilden, “Cognitive emissions of 1/f noise,” Psychological Review,
vol. 108, issue 1, 2001.
[23] H. Hennig, R. Fleischmann, A. Fredebohm, Y. Hagmayer, J. Nagler, A.
Witt, et al., “The nature and perception of fluctuations in human musical
rhythms,” PLoS ONE, vol. 6, issue 10, 2011.
sones) for each score beat indexed $n \in \mathbb{N}$ in one piece, then the normalised sequence is defined as

$$\tilde{x}_n = \frac{x_n - x_{\min}}{x_{\max} - x_{\min}}, \qquad (1)$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum loudness values, respectively, in the recording. The loudness value associated with a marking at beat $n$ is

$$\ell_n = \frac{1}{3}\left(\tilde{x}_{n-1} + \tilde{x}_n + \tilde{x}_{n+1}\right). \qquad (2)$$

the pairs (mf, ff), (mf, pp), and (ff, mf), as these marking pairs do not appear in the Mazurkas we analyze.
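A minimal numeric sketch of Eqs. (1) and (2). The min-max normalisation is unambiguous, but the subscripts in the printed Eq. (2) did not survive extraction, so the three-beat window centred on the marked beat is our assumption; all names are ours.

```python
def normalise(loudness):
    """Min-max normalise a per-beat loudness sequence (sones) to [0, 1] -- Eq. (1)."""
    lo, hi = min(loudness), max(loudness)
    return [(x - lo) / (hi - lo) for x in loudness]

def marking_loudness(norm, n):
    """Loudness associated with a dynamic marking at beat n: the mean of the
    normalised values over a three-beat window centred on n -- our reading
    of the garbled Eq. (2)."""
    return (norm[n - 1] + norm[n] + norm[n + 1]) / 3.0

norm = normalise([10.0, 20.0, 40.0, 30.0])
ell = marking_loudness(norm, 2)   # averages beats 1, 2, and 3
```

Averaging a small window rather than taking a single beat value makes the measure robust to beat-level loudness jitter in the recording.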
B. Experiment II
One question that follows naturally from Experiment I is just how varied the loudness behaviors in the recordings are. To investigate this question, we map each recording to a time series of discretized values of the function defined in Section II. Each recording then becomes a curve with discretized points in score time. The purpose of this procedure is to create meaningful clusters of curves so as to further analyze the resulting shapes and distinguish unusual behaviors.
To cluster the curves obtained, we implemented the k-means machine learning method. The number of clusters, k, is defined by the “gap statistic” (Tibshirani, 2001); we further limit k to the range [3, 8] to ensure that a reasonable number of recordings appears in each cluster, while also providing the flexibility to detect clusters of curves that follow an unusual behavior.
The result of the process described above is a number of loudness behaviour clusters for each Mazurka. Clusters with the fewest recordings are labeled as outliers; we then create a list of pianists who appear in these outlier clusters. Table 4 presents the top three outlier-cluster pianists who have recorded more than thirty Mazurkas.

Table 4. Pianists having the highest proportion of recordings in outlier clusters.

Pianist (year of recording)   # Outlier clusters   # Recordings   Ratio
Cortot (1951)                 12                   42             0.2727
Poblocka (1999)               9                    33             0.2727
Sztompka (1959)               8                    38             0.2105

Magin is the only pianist whose recording was the sole element of a cluster, in the cases of Mazurka Op. 7 No. 1 and Mazurka Op. 33 No. 3. Fig. 3 highlights the loudness behavior of Magin’s recordings of both Mazurkas, and how they differ from the other recordings. Fig. 3 shows the centroid for each cluster; thus the discretized dynamic value may not be equal to an outcome of the function.

Figure 3. Loudness transition clusters found for Mazurka Op. 7 No. 1 (top) and Op. 33 No. 3 (bottom); in each case, the cluster comprising only Magin's recording is shown in solid bold lines, and the cluster with the highest number of recordings is shown in dotted bold lines.

Next, we focus on commonalities among unusual interpretations. For this, we consider pianists whose recordings are in outlier clusters for every Mazurka, and we compute the number of times two pianists’ recordings are classified into the same outlier cluster across all Mazurkas. The results are shown in Table 5, where we give the number of outlier clusters containing the specific pianist pairs. We focus on pairs that co-occur in outlier clusters for more than twenty Mazurkas.

Table 5. Pianists whose recordings co-occur in more than twenty outlier clusters.

Pianist            Fliere   Milkina   Ezaki   Barbosa
Chiu               0        21        0       21
Smith              27       0         0       0
Rubinstein         0        26        0       0
Kushner            0        0         21      0
Czerny-Stefanska   0        0         0       21

Table 5 shows higher degrees of similarity in performed loudness between the pianist pairs Chiu and Milkina, Chiu and Barbosa, Smith and Fliere, Rubinstein and Milkina, Kushner and Ezaki, and Czerny-Stefanska and Barbosa. Thus, although certain pianists are more often in outlier clusters, the ways in which they differ in their loudness interpretations often follow shared patterns.

IV. CONCLUSIONS-DISCUSSION

In this study we have analyzed outliers in loudness interpretations in piano recordings. The results show that we
ACKNOWLEDGMENT
This research has been funded in part by a Queen Mary
University of London Principal's studentship.
REFERENCES
Burkhart, C. (1983). Schenker’s theory of levels and musical
performance. In David Beach, ed., Aspects of Schenkerian
Theory. New Haven: Yale University Press, 95-112.
Cook, N. (2013). Beyond the Score: Music as Performance. Oxford
University Press.
Ewert, S., Müller M., Grosche, P. (2009). High resolution audio
synchronization using chroma onset features. Proceedings of
IEEE International Conference of Acoustics, Speech, and Signal
Processing (ICASSP) (pp.1869-1872). Taipei, Taiwan.
Fabian, D., Timmers, R., Schubert, E. (2014). Expressiveness in
Music Performance: Empirical Approaches across Styles and
Cultures. pp. 60-61. Oxford University Press.
Kosta, K., Bandtlow, O. F., Chew, E. (2014). Practical implications of
dynamic markings in the score: Is piano always piano?
Proceedings of the 53rd Conference on Semantic Audio. London,
U.K.
Kosta, K., Bandtlow, O. F., Chew, E. (2016). Mapping between
dynamic markings and performed loudness: a machine learning
approach. To appear in Journal of Mathematics and Music,
special issue on Machine learning and Music, July.
Langner, J., & Goebl, W. (2003). Visualizing expressive performance
in tempo-loudness space. Computer Music Journal, 27(4), 69-83.
Leech-Wilkinson, D. (2015). Cortot’s Berceuse. Music Analysis,
34(3), 335-363.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the
number of clusters in a dataset via the gap statistic. Journal of the
Royal Statistical Society B, 63:2, 411-423.
V. APPENDIX A
Table A. Chopin Mazurkas used in this study, the number of recordings for each one, and the number of dynamic markings that
appear in each one. Mazurkas are indexed as “M<opus>-<number>.”
Mazurka index M06-1 M06-2 M06-3 M07-1 M07-2 M07-3 M17-1 M17-2 M17-3 M17-4 M24-1
# recordings 34 42 42 41 35 58 45 50 36 67 46
# markings 18 13 22 13 13 18 7 6 9 7 4
Mazurka index M24-2 M24-3 M24-4 M30-1 M30-2 M30-3 M30-4 M33-1 M33-2 M33-3 M33-4
# recordings 56 39 54 45 50 54 55 48 50 23 63
# markings 12 7 33 8 14 25 18 5 16 4 12
Mazurka index M41-1 M41-2 M41-3 M41-4 M50-1 M50-2 M50-3 M56-1 M56-2 M56-3 M59-1
# recordings 35 42 39 33 45 40 67 34 48 51 41
# markings 12 5 6 7 15 14 17 14 7 16 8
Mazurka index M59-2 M59-3 M63-1 M63-3 M67-1 M67-2 M67-3 M67-4 M68-1 M68-2 M68-3
# recordings 56 56 42 62 35 31 40 42 38 48 42
# markings 8 11 9 4 18 10 13 11 12 21 8
VI. APPENDIX B
Figure A. Score excerpt from Mazurka Op. 59 No. 3 where the markings of the first pair (f, p) appear. This transition has the
highest proportion of outliers.
Laura A. Stambaugh¹
¹ Department of Music, Georgia Southern University, Statesboro, GA, USA
A musician has many choices of where to direct her attention while playing. The field of
motor control has identified these possible areas as “focus of attention” (FOA). FOA can
be directed internally (embouchure or fingers) or externally (keys or the sound of one’s
music). Previous research in motor learning and one limited music study found an
external FOA generally leads to more efficient and effective movement than an internal
FOA. However, a recent study indicated university novice woodwind players were not
differentially affected by internal and external FOAs, while the advanced players
performed most evenly and accurately using an internal FOA.
The purpose of the current study was to examine the effect of FOA on second year band
students’ performance. It built directly on the previous study, using a repeated measures
design with a control condition (no directed FOA), an internal focus condition (fingers),
and an external focus condition (sound).
Participants were forty band students in their second year of study, aged 13-14 years (n =
16, woodwinds: flute, clarinet, and saxophone; n = 16, valved brass: trumpet and French
horn; n = 8, trombones). The study stimuli were isochronous patterns alternating between two
pitches (e.g., eighth notes C-A-C-A-C-A-C). Students heard a model recording of each
stimulus at a specified tempo and were then directed to play the measure as evenly and
accurately as possible (control condition), while “thinking about your fingers” (internal
focus condition), and while “thinking about your sound” (external focus condition).
Participants played eight trials of each stimulus. The design was fully counterbalanced,
with the exception that the control condition was always performed first. Approximately
24 hours after the first study session, participants returned for a retention test, playing
each stimulus three times with no directed FOAs. Results will be analyzed as within and
between group ANOVAs for pitch accuracy and pitch evenness.
When playing music, musicians’ body movements relate not only to playing their
instrument directly, but also to their expressive intentions and interpretation of musical
structure. These expressive gestures have been shown to correspond with the level of the
phrase, especially in performers’ head and torso movements. A phrase, though, can consist
of multiple melodic groups. As such, a performer can choose either to express the local
groups or to integrate them more strongly into a whole phrase. These choices have been
less explored in relation to performer motion. This study aimed to characterize the nature
of the motions associated with either choice. We filmed 13 cellists playing a musical
excerpt, the opening 8 bars of the 3rd Ricercar by Domenico Gabrielli, in two conditions
prescribed by the experimenters: thinking either about local melodic groupings or about
the phrase as a whole. Cellists were further required to use prescribed fingerings and
bowings, and memorized both interpretations. Cellists were filmed by two video cameras
(front and right-side perspectives), and 2 successful takes of each version were collected.
1-inch squares of retroreflective tape were placed on the forehead, chin, right cheek,
hands, shoulders, and torso. Marker positions (in 2D pixel space) for each camera were
subsequently extracted using computer vision techniques. Our results show that
participants’ heads move more in the local task than in the whole-phrase task, as
measured by the distance traveled by the forehead and cheek markers. This result is
consistent with previous studies relating head motion to phrasing, and extends that
relation by showing that melodic interpretation is also expressed in body movement. The
indication of expressive melodic intentions by musicians’ bodies has implications for the
extent of the embodiment of musical structure in performers.
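The head-motion measure reported here, distance traveled by a marker in 2D pixel space, reduces to summing frame-to-frame Euclidean displacements. A minimal sketch with a synthetic trajectory (the data and function name are illustrative, not the study's pipeline):

```python
import numpy as np

def path_length(xy: np.ndarray) -> float:
    """Total distance traveled by a marker, given an (n_frames, 2) array of
    pixel coordinates: the sum of frame-to-frame Euclidean step lengths."""
    steps = np.diff(xy, axis=0)                    # (n-1, 2) displacement vectors
    return float(np.linalg.norm(steps, axis=1).sum())

# Synthetic forehead-marker trajectory: three (3, 4)-pixel steps
traj = np.array([[0, 0], [3, 4], [6, 8], [9, 12]], dtype=float)
print(path_length(traj))  # 3 steps of length 5 -> 15.0
```

Comparing this quantity between conditions (local vs. whole-phrase) for each marker is then a straightforward per-participant contrast.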
Effective audience engagement with the music performing arts involves cognitive and
affective elements. Musical and motor expertise, as well as modality of presentation, may
shape cognitive, but not affective, responses to music performance. Within an empathy
framework, we investigate cognitive and affective responses to contemporary solo piano
performance presented in audio and audio-visual formats. Affective empathy involves
feeling and sharing emotions (basic attributes of normal-functioning humans). Cognitive
empathy involves taking the perspective of another, and may be moderated by expertise.
Understanding arguably results from the cognitive empathic ability to ‘put oneself in
another’s shoes’. Participants were non-musicians (Nms), musician non-pianists (Ms),
and musician pianists (Mps). Cognitive (understanding: ‘put yourself in the performer’s
shoes’ and report your understanding of what the performer is doing to physically
generate the expressive performance) and affective (arousal and valence) empathic
responses were made continuously. Liking and familiarity ratings were made on
seven-point Likert scales after each excerpt. As hypothesized: understanding ratings were
higher for Ms and Mps than for Nms; understanding and liking ratings for audio-visual
presentations were higher than for audio-only; liking ratings were higher for Ms and Mps
than for Nms; no significant arousal, valence, or familiarity effects were observed.
Contrary to expectation, understanding ratings did not significantly differ between the
two musician groups. Affective empathic responses appear similar across expertise
groups and presentation modalities. Cognitive empathic responses vary according to
musical, but not piano-playing motor, expertise, and to presentation modality. Potential
sample-size and cognitive-empathy measurement issues also warrant consideration.
Non-musician audiences may be developed for contemporary music performance through
multimodal educational strategies that increase understanding of performance.
However, a comparison between the pitch errors made in the two performances of each
target piece revealed a significant difference between the silent and audio conditions. In
the silent condition, mental practice diminished the amount of pitch errors made by 9.9
percentage points, which was significantly better than in the aural model condition.

Table 1. Learning outcomes in the silent and aural model conditions.

  Improvement in                    silent   aural   difference
  technical fluency                   .75     .70    t(22) = .280, p = .782
  expressive interpretation           .96     .93    t(22) = .124, p = .902
  structuring of the phrases         1.05     .89    t(22) = .753, p = .459
  structuring of the overall form     .92     .90    t(22) = .131, p = .897
  pitch errors                       –9.9    –3.4    t(22) = –2.17, p = .041
  rhythmic errors                    –3.4    –3.1    t(22) = –.105, p = .917

The improvement scores, as given by the expert judges, were found to be significantly
correlated with scores from the VARK questionnaire for modality preferences, especially
for the silent condition. In the silent condition (see Table 2), the scores for the kinesthetic
and auditive dimensions were significantly correlated with the learning outcomes assessed
by the expert panel. In particular, kinesthetic VARK scores showed significant correlations
with the scores for improvement in expressive interpretation, improvement in the
structuring of the phrases, and improvement in the structuring of the overall form. In
addition, the auditive VARK scores were correlated with the improvement in expressive
interpretation.

Table 2. Silent condition: Pearson correlations between learning outcomes and modality
preferences.

  Improvement in                    VARK kinesthetic   VARK auditive
  technical fluency                      .371              –.241
  expressive interpretation              .596**             .540**
  structuring of the phrases             .567**             .163
  structuring of the overall form        .506*              .248
  pitch errors                          –.362               .260
  rhythmic errors                       –.317               .124

  ** p < .01, * p < .05

In the aural model condition, the only significant correlation was found between the
visual VARK dimension and the improvement in structuring of the overall form (r = .464,
p < .05).

F. Musicians’ experience of the aural model condition

Above, we saw that the silent reading and the aural model conditions were equally
efficient in facilitating the musicians’ learning processes on the group level. Of course,
this does not rule out the possibility that individual musicians might find one of these
conditions more suitable, useful, or motivating for themselves. In this and the following
section, we will give an overview of the most prevalent kinds of argument that the
participants used in the interviews after the experiment to account for their experience of
the advantages and disadvantages of the two conditions.

One of the most important benefits of the aural model, as experienced by the participants,
was that such a model could offer them a “better general impression of the piece” (Eva;
similarly Ella, Anna-Liisa, Anja, Sissi, Tomi, Hilma, Tia). Hearing the music would help
the musician “discern everything more clearly” (Tomi), or “perceive and internalize the
music better” (Ella; similarly Hilma). Heard sound might, indeed, be felt as a kind of glue
between the visual and the kinesthetic:

    The aural model kind of helped to connect the visual score and the melody to doing it
    with your hands. There were so many aspects that were linked together. (Ella.)

The participants who stated their preference for the aural model typically explained this
by their inability to clearly hear music directly in their minds on the first encounter (Tomi,
Eva, Ella, Sissi, Ville, Anja, Anna-Liisa, Sanna, Anu). Indeed, Tomi suggests that such
inability might be due to the very use of audio recordings in practicing:

    I grasp the idea of the music more effectively when I listen to a recording. My inner ear
    is not that developed—probably because I have not practiced it. Maybe I could develop
    it if I left out the recordings. (Tomi.)

The aural model, then, was found helpful for “confirm[ing] the right pitches” (Esko;
similarly Tomi, Anu, Hilma), and for detecting possible mistakes in the previous prima
vista performance (Milla). Some of the participants reported “being more self-confident”
after hearing the music (Tia; similarly Tomi, Mari).

It was especially in connection with the aural model that the participants also mentioned
reflecting on expressive aspects of performance. Hearing the melody was found to invite
such reflections more than the silent condition did (Sissi, Ella, Anna-Liisa, Eva, Esko).
Heard sound might be experienced to free some additional resources for thinking about
the dynamics (Ella, Anna-Liisa), or to “remind of the articulation” (Eva). Important
elements to be emphasized, such as slurs, would “come forth better” while listening,
“reminding [the player] of performing them” (Eva). Occasionally, the details perceived
with the support of the heard music might be devoted more prolonged visual attention,
even while the music was progressing onward:

    [With the aural model] I was thinking about interpretational things, how I would play it.
    […] For example, when I heard some passage, I thought: hey, I could play like this and
    this. […] When the theme came for the second time, I stopped and thought to myself
    how it might be played differently than in the beginning. (Sissi.)

The heard audio version may also have been used as such as a guiding model for
expressive performance (Eva, Anna):

    I noticed that in the latter piece I took quite a lot from the heard aural image, so that I
    found a more vertical treatment to the piece. (Eva.)
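The analyses behind Tables 1 and 2 (paired t-tests with 22 degrees of freedom, i.e. 23 pianists, and Pearson correlations against VARK modality scores) can be reproduced from raw data with standard tools. A sketch with simulated scores (the data and tooling are hypothetical, not the study's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 23  # 23 participants -> t(22), matching the degrees of freedom reported

# Hypothetical improvement scores per pianist in each condition
silent = rng.normal(1.0, 0.5, n)
aural = rng.normal(0.9, 0.5, n)

# Table 1: paired (within-subject) comparison of the two conditions
t_stat, p_val = stats.ttest_rel(silent, aural)

# Table 2: Pearson correlation between an outcome and a VARK modality score
vark_kinesthetic = rng.normal(5.0, 2.0, n)
r, p_r = stats.pearsonr(silent, vark_kinesthetic)

print(f"t(22) = {t_stat:.3f}, p = {p_val:.3f}; r = {r:.3f}, p = {p_r:.3f}")
```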
More often, however, the deadpan computer performance seems to have been treated
critically, “encouraging a more musical performance” in exchange (Tomi). Interestingly,
some of these participants reported “fighting in my mind with the computer” (Kaisa;
similarly Hilma), or comparing the heard sound with a better imaginary version (Esko,
Ella). Esko, who confessed “not thinking much about the interpretation” in the silent
condition, did so with the audio condition:

    I was maybe comparing my interpretation to what I was hearing. […] For example,
    some notes are supposed to be a little bit less in volume than others. In such instances,
    I noticed that the computer is just playing it [in] the same volume, and I was comparing
    it to some ideal interpretation.

Even though the aural condition often complemented participants’ weaker audiation
skills, some participants with strong audiation skills might also prefer the aural condition
as a tool for learning music by ear. Marika and Mari both reported having perfect or
near-perfect pitch, but confessed to be slow music readers, and hence they valued the
auditory model. Mari calls herself an “auditive learner,” and recounts:

    With a new piece, hearing it helps me a lot […] Maybe it’s because as a musician one
    of my weaknesses is handling the rhythms. (Mari.)

Learning by ear was a characteristic approach to new repertoire for Mari, who actually
scored highest of all participants on the VARK auditive modality scale. She mentioned
“reading the score being challenging since childhood.” This is how she described her
exceptional memory for music as a combination of absolute pitch and visual imagery:

    In general, being with the score has always been such that I play a lot without it—like
    from the memory and by ear […] because I can play a lot based just on an auditory
    image, so that I don’t need a score […] [The memorizing] is very easy for me. Often
    just hearing it once will allow me to play it [...] Of course, I immediately hear the key
    and how it begins, and I can locate where I’m at [in the music]. […] It happens
    somehow automatically. In my opinion I don’t consciously think much of anything—
    rather, I just listen, and then I’m like: “a-ha: this is how it goes,” and then I go and play
    it […] I can’t explain it very well. […] I kind of hear, and I see—I can very well
    imagine fingers on the keyboard: “a-ha, that’s the way it goes here, and now it does
    this.” (Mari.)

The participants also pointed out some disadvantages with the aural model. Chief among
them was a tendency to inhibit slower and more detailed processing of passages that
would have required extra time for analysis or simulated practice (Mai, Tuomas, Anna,
Matti, Susanna). Tuomas found the aural model “superficial,” and thus experienced it as
challenging for himself to “balance between” listening to the model and focusing on
musical detail (Tuomas). Mai, who wanted to “look at the notes at their places at my own
leisure,” simply found the tempo too fast and was “frustrated by it”: as she said, “my
brain just could not keep up that speed” (Mai). Some participants found the mere
presence of heard sound distracting to mental practice (Tuomas, Sami, Kaisa). Tuomas
seemed to conceive of the “practicing” element in mental practice as something separate
from either reading the score or listening to the music, and found it “too much” to
combine all of these aspects:

    I hardly did much practicing with the music playing in the background—I could not
    concentrate. I could have done that too, but it would have equaled doing like three
    things at the same time: looking at the score, physically practicing and listening to the
    music. (Tuomas.)

Interestingly, some of the participants even claimed that hearing the aural model actually
prevented them from mentally listening to the music (Tuomas, Anna, Mai, Susanna).
Susanna actually tried to ignore the aural model, concentrating instead on visual and
kinesthetic strategies:

    I had to go like: “Okay, close your ears and concentrate on the score.” I was kind of
    able to exclude [the aural model] reasonably well, but […] because I also wanted to
    hear it in my mind—to get an aural picture of it, but then I was forced just to look at the
    score, and think about how it would feel on the keyboard. It was disturbing not to be
    able to listen to it in my head. (Susanna.)

G. Musicians’ experience of the silent condition

The most often reported benefit of the silent condition was the possibility it offered for
processing the score at one’s leisure (Anu, Anna, Matti, Susanna, Tuomas, Mai, Mari,
Tia). In a word, this condition gave the musician “more time to practice in one’s mind”
(Anu). Compared to the aural model condition, more varied reading processes were
reported here. These ranged from simply reading through the music (Anu, Esko, Ella,
Sanna, Anna, Mai), through more segmented (Tuomas, Susanna) and comparative reading
processes (Susanna, Esko, Matti), to “unsystematic” (Anja) or “squinting” (Tomi)
approaches.

The silent condition was found helpful in aiding detailed analytical study of important or
difficult passages (Susanna, Tuomas, Anna, Milla). Susanna, who in this condition
received the highest expert judgments across the board, enjoyed the condition, as it
enabled her to quickly scan the score for difficulties and to concentrate on them:

    With the silent [condition], I was able to focus my concentration—like “Okay, this is
    easy, easy, that is a difficult one, difficult one, that’s a difficult one.” And then, to
    concentrate on those. To go through them many times. (Susanna.)

Another potential benefit of silent mental practice is that, for some musicians, it might
actually facilitate audiating the music. One indicator for playing the music in one’s mind,
as opposed to analytical scrutiny, might be tapping to the silently imagined music (as
described by Tia and Hilma). Without support from heard music, enacting the music
kinesthetically might be experienced as a helpful approach:

    I was moving my fingers quite a lot. I moved them actually more than with the music
    playing [in the aural model]. It was somehow even more important to get the feeling
    image with it. (Tia.)
Many of the participants who expressed preference for the silent condition reported
having easily heard the musical score in their minds (Anna, Esko, Susanna, Kaisa, Sami,
Milla)—a skill commonly referred to as audiation (Gordon 1984). Anna, who described
getting the music “more carefully in my head” in the silent condition, noted that playing
the music in her mind helped her “pretty much with everything,” also helping her to relax.
Descriptions of audiation in the silent condition involved such phrases as “beautiful
melody” (Esko) or “getting the atmosphere” (Anna), suggesting that at least some
participants were able to grasp the silently audiated materials as aesthetically rich music.

Whereas the aural condition was noted above to evoke thoughts concerning music’s
expressive aspects, the silent condition sometimes seemed to render “thinking about the
interpretation more difficult” (Ella, similarly Sissi). At least these two participants were
among the ones who also admitted not having been able to audiate the music clearly.

In cases where audiating the pitch content of the piece was found to be challenging, it
could also be replaced with audiating the rhythms (Sanna, Matti). This could be
understood as a kind of silent singing:

    I could not somehow sight read it by singing in my head. I didn’t want to spend time in
    searching for the pitches or the pitch intervals. Maybe I was thinking more about the
    rhythm. […] I was singing it in my head even though it was just the rhythms at this
    point. In this piece, I did not [think about] the melody. [Sanna demonstrates by playing
    on the lap and singing in a whispery tone:] Dii-dii-dii. Somehow I was in that pulse.
    (Sanna.)

Some participants found the silent condition “useless” (Tomi) and were even pondering
“when it is going to end” (Anu). The disadvantages of the silent condition were very
often related to the participant’s inability to clearly listen to the music in one’s mind
(Ella, Eva, Sanna, Sissi). Ella mentioned feeling a certain “insecurity” when she “did not
know if it’s precisely correct.”

IV. CONCLUSION

The present study addressed the role of individual differences in musicians’ mental
practice of new musical material, meaning score-based practice away from the musical
instrument. Two different mental practice conditions—silent mental practice and mental
practice while listening to the music—were compared through expert ratings of achieved
improvement as well as through interviews of the participants.

The two conditions did not differ in their overall benefits, as judged by the expert
panelists. However, the learning outcomes in the silent condition were associated with
kinesthetic and auditive tendencies, as assessed by a test of modality preferences. In the
silent condition, the expert panel scores for improvement in expressive interpretation,
structuring of the phrases, and structuring of the overall form were significantly
correlated with a kinesthetic tendency; this was not the case for improvement in technical
fluency. These results resonate with previous findings suggesting that mental practice
might be more efficient for cognitive than for motor tasks (see Driskell & al. 1994; van
Meer & Theunissen 2009).

Now, as found by Brodsky, Kessler, Rubinstein, Ginsborg, and Henik (2008), a cued
manual motor imagery is often present during music reading. We suggest that when
processing the score in silence, individuals with stronger kinesthetic tendencies might
more fluently have been able to translate cognized musical elements into embodied
action.

In the silent condition, the connection between auditive tendencies and improvement in
expressive interpretation probably cannot be accounted for by conscious planning of
expressivity: in this condition, only very few of the pianists reported such planning during
the silent practice. One potential explanation for the connection between auditive
tendencies and expressive improvement could simply be that an “auditive musician” is
better able to respond expressively to her/his own playing in the course of performance.
An alternative explanation would be the use of audiation strategies during mental
practice. Namely, in the silent condition, the four participants who improved their
expressive performance the most reported singing or listening to the music in their minds
during silent practice; none of the four participants with the least improvement in
expressivity reported clearly hearing the music. Having internalized the piece through
inner audiation, the musician not only memorizes the piece better (Highben & Palmer
2004; Bernardi & al. 2013), but also gains more freedom to even improvise a cogent
expressive interpretation in the actual performance.

In the aural model condition, the expert panel scores for improvement in structuring of
the overall form were associated with a visual tendency. Many musicians experienced
that the aural model offered them a better general impression of the piece. We suggest
that the aural model may have helped especially musicians with visual tendencies to
connect the visual score and the heard music to project a more structured understanding
of the overall form.

As revealed by the interviews, musicians’ ability to hear newly encountered music in
their minds affects their attitudes towards types of mental practice. Silent mental practice
may often be found useful by musicians with developed audiation skills. When audiation
skills are more modest, listening to a recording may be found necessary for grasping a
more holistic impression of the piece. Listening to an aural model may also be useful for
musicians with a preference to “learn by ear,” irrespective of their level of audiation
skills. However, the habit of using recordings to speed up the learning process in the
beginning stages of learning new music may turn out to be an impediment to the
development of stronger skills in score reading and audiation.

Of the two mental practice conditions considered, it was especially the aural model
condition that evoked thoughts about expressive interpretation—paradoxically, despite
the expressively flat computer rendering used here for the aural models. While some of
the musicians accepted the performance on the recording as a guiding model, others
would, conversely, be inspired to their own expressive performances in opposition to the
deadpan version heard. Aural models may surely be used as sources of inspiration for
expressive interpretation. Nevertheless, they may sometimes guide performers too much,
preventing them from creating an expressive interpretation of their own—which was
noted by some pianists as a negative side of using recordings in early stages of learning.
The aural model, despite giving support, may also disturb the cognitive processing by
preventing the musician from processing the piece in a slower and more detailed manner.
Some of our participants also reported that the aural model prevented them from
audiating the music by themselves. Silent score reading gives more freedom to process at
one’s own leisure, to concentrate, to focus on critical passages, and to engage in
structural analysis. It can be seen as a more ideal type of mental practice, particularly in
situations that require more in-depth, detailed processing of the music. It is, perhaps, a
more ideal activity for developing skills like audiation, reflective score analysis, or
imagined physical action.

In silent mental practice situations, musicians are not exclusively dependent on their
ability to audiate. If one’s ability to silently “listen” to the music from the score is
weaker, the skills lacking in aural imagery may be compensated for by audiating only the
rhythms or by leaning on kinesthetic strategies such as tapping in rhythm. Despite
individual weaknesses, there is always something a musician can do. The challenge is to
find the most appropriate strategies for different individuals and situations.

In our experience, instrumental educators do often advise their students in mental
practice, but judging from the characteristically individual attitudes of the musicians in
our study, it is a real possibility that not all of the methods suit everybody. Our findings
converge with previous research in emphasizing the importance of audiation for mental
practice, but perhaps meaningful programs of mental practice could also be developed
along other lines—for instance, by building on an individual’s kinesthetic tendencies.
Our study tentatively suggests that individuals’ kinesthetic tendencies may play a bigger
role in the silent processing of music than heretofore acknowledged.

ACKNOWLEDGMENTS

This research was supported by the project Reading Music: Eye Movements and the
Development of Expertise, funded by the Academy of Finland (project nr. 275929).

REFERENCES

Bernardi, N. F., De Buglio, M., Trimarchi, P. D., Chielli, A., & Bricolo, E. (2013).
Mental practice promotes motor anticipation: Evidence from skilled music performance.
Frontiers in Human Neuroscience, 7(451), 1–14.

Bernardi, N. F., Schories, A., Jabusch, H.-C., Colombo, B., & Altenmüller, E. (2013).
Mental practice in music memorization: An ecological-empirical study. Music
Perception, 30(3), 275–290.

Brodsky, W., Kessler, Y., Rubinstein, B.-S., Ginsborg, J., & Henik, A. (2008). The
mental representation of music notation: Notational audiation. Journal of Experimental
Psychology: Human Perception and Performance, 34(2), 427–445.

Cahn, D. (2008). The effects of varying ratios of physical and mental practice, and task
difficulty on performance of a tonal pattern. Psychology of Music, 36(2), 179–191.

Coffman, D. (1990). Effects of mental practice, physical practice, and knowledge of
results on piano performance. Journal of Research in Music Education, 38(3), 187–196.

Driskell, J. E., Copper, C., & Moran, A. (1994). Does mental practice enhance
performance? Journal of Applied Psychology, 79(4), 481–492.

Fortney, P. M. (1992). The effect of modeling and silent analysis on the performance
effectiveness of advanced elementary instrumentalists. Research Perspectives in Music
Education, 3, 18–21.

Gordon, E. E. (1984). Learning sequences in music: Skill, content, and patterns. Chicago,
IL: G. I. A. Publications, Inc.

Highben, Z., & Palmer, C. (2004). Effects of auditory and motor mental practice in
memorized piano performance. Bulletin of the Council for Research in Music Education,
159, 58–65.

Miksza, P. (2005). The effect of mental practice on the performance achievement of high
school trombonists. Contributions to Music Education, 32(1), 75–93.

Palmer, C., & Meyer, R. K. (2000). Conceptual and motor learning in music
performance. Psychological Science, 11(1), 63–68.

Ross, S. L. (1985). The effectiveness of mental practice in improving the performance of
college trombonists. Journal of Research in Music Education, 33(4), 221–230.

Rubin-Rabson, G. (1941). Studies in the psychology of memorizing piano music. VI: A
comparison of two forms of mental rehearsal and keyboard overlearning. Journal of
Educational Psychology, 32(8), 593–602.

Theiler, A. M., & Lippman, L. G. (1995). Effects of mental practice and modeling on
guitar and vocal performance. The Journal of General Psychology, 122(4), 329–343.

Van Meer, J. P., & Theunissen, N. C. M. (2009). Prospective educational applications of
mental simulation: A meta-review. Educational Psychology Review, 21(2), 93–112.
Acquiring sensorimotor skill, such as playing the piano, requires learning patterns of
actions and sounds. The role of actions and sounds in sensorimotor skill acquisition may
depend on expertise. Skilled pianists have developed strong action-sound associations, in
which highly practiced action patterns can be generalized to produce multiple possible
sound patterns. Pianists also engage motor brain regions when simply perceiving related
sounds, suggesting auditory-to-motor neural connections. Therefore, pianists may learn to
play new melodies most efficiently by listening, while novice performers may learn most
efficiently through motor practice. We examined whether skilled pianists learn to perform
new melodies more efficiently, and show greater motor brain response, when listening
compared to performing, in contrast to novices. Pianists and non-musicians underwent
functional MRI scanning while learning novel piano melodies in four conditions: by
following visuospatial cues (cue-only learning), or by additionally listening (auditory
learning), performing (motor learning), or performing with auditory feedback (auditory-
motor learning). Participants performed melodies from memory (recall) periodically
during learning. Pianists’ pitch accuracy was highest in the auditory learning condition;
non-musicians’ pitch accuracy did not differ across learning conditions. Pianists and non-
musicians engaged the superior temporal gyrus during auditory and motor (both minus
cue-only) learning, and primary motor cortex (M1) during motor learning. Pianists
showed greater M1 response during recall in the auditory learning condition compared to
recall in the motor learning condition (both compared to recall during auditory-motor
learning). Results suggest an expertise-dependent advantage for auditory over motor
learning in auditory-motor skill acquisition. Skilled performers’ sensory-to-motor
connections may enable motor learning based solely on perception.
The acquisition and refinement of the physical and perceptual skills involved in music
performance demand both the automatization of fundamental motor behaviors and
selective, focused attention to key aspects of proprioceptive and auditory feedback. The
apportioning of conscious attention is central to effective practice in any domain, but
seems especially challenging as musicians combine these components to instantiate
expressive intention. In fact, the ways in which thinking and attention accommodate the
various afferent and efferent signaling processes engaged during music making are not
well understood. Even musicians who have achieved high levels of skill often approach
individual components of music making independently during practice, without sufficient
attention to the musical effects the physical skills of music making are intended to bring
about, rendering their work ineffective in improving performance and overall artistry. In
a series of experiments, we analyzed the practice of artist-level musicians and their
students. Our results reveal that artist-level musicians focus attention on the expressive
intentions of music making during practice in ways that serve to unify the physical
components of music performance. We show in data from behavioral observations and
interviews that the processes that underlie music learning are enhanced when musicians
retain the integration of multiple aspects of playing and focus primarily on the
accomplishment of expressive goals. Thus, the expressive aspects of music performance
are not something “added later” once technical demands have been met. Instead,
expression serves as the unifying element around which all of the physical and perceptual
tasks cohere. In this session, we present recorded examples of effective music practice by
artist-level musicians and highly accomplished students, and we discuss the approaches
to music learning that facilitate the successful accomplishment of learning goals.
Background
A musician’s ability to present and perceive their own sound is greatly influenced by the
acoustical properties of a given performance space. As various musical genres are
performed in multi-purpose venues, today’s advanced electro-acoustic technology has
encouraged the development of virtual acoustic systems. Virtual acoustics offers
flexibility and efficiency in adjusting the room acoustics of a concert venue to best
accommodate the unique needs of musicians in each performance.
Aims
This paper investigates listeners’ preferences for musical performances recorded in three
different acoustic conditions, including two conditions with virtual enhancements. The
goal of the study is to learn whether listeners, unaware of the different acoustic
conditions, prefer recordings made in the enhanced acoustic conditions, which performers
reported favoring in our earlier study.
Method
Five professional string quartets were invited to record short excerpts in three different
acoustic conditions. These excerpts were used in a listening test employing a
forced-choice paired-comparison paradigm. A group of amateur and professional
musicians participated. Stimuli were presented over headphones in a quiet laboratory,
and listeners indicated which of the two given excerpts they preferred, by how much, and
why. Fifteen excerpts were tested twice to verify participant reliability.
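In a forced-choice paired-comparison paradigm, each trial yields a winner; preferences are typically tallied into per-item win counts, and the repeated pairs allow a within-listener consistency check. A minimal sketch (the condition labels and trial data are hypothetical, not the study's):

```python
from collections import defaultdict

# Each trial: (excerpt_A, excerpt_B, chosen). Labels are hypothetical.
trials = [
    ("enhanced1", "natural", "enhanced1"),
    ("enhanced2", "natural", "enhanced2"),
    ("enhanced1", "enhanced2", "enhanced1"),
    ("enhanced1", "natural", "enhanced1"),  # repeat of the first pair
]

# Tally wins per excerpt across all trials
wins = defaultdict(int)
for a, b, chosen in trials:
    wins[chosen] += 1

# Reliability: for each pair presented more than once, did the listener
# make the same choice every time?
pair_choices = defaultdict(list)
for a, b, chosen in trials:
    pair_choices[frozenset((a, b))].append(chosen)
consistency = [len(set(c)) == 1 for c in pair_choices.values() if len(c) > 1]

print(dict(wins), consistency)
```

Win counts can then feed a preference scaling model (e.g. Bradley–Terry), and the consistency flags screen out unreliable participants.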
III. METHOD
A. Model of Expressive Shape
Assuming that a performer renders a musical piece with a
Taehun Kim is with the Audio Communication Group, Technische Universität Berlin, Berlin, Germany (corresponding author; e-mail: t.kim@mailbox.tu-berlin.de).
Stefan Weinzierl is with the Audio Communication Group, Technische Universität Berlin, Berlin, Germany (e-mail: stefan.weinzierl@tu-berlin.de).
[Figure panels a)-c): tempo (BPM) and first-derivative curves plotted against beat; the individual data points are not recoverable from this extraction.]
Fig. 1 Tempo shape analysis of F. Chopin Prelude Op. 28 No. 7.
IV. RESULT
respectively). This indicates the subjects perceived the shape-only rendition to be as expressive as its original performance. Although the score difference was not significant, some subjects found that the shape-only rendition conveyed the performer's artistic intention better than the original performance of the Chopin Mazurka. This result addresses a
Fig. 3 Listening test results of a) cho-mzk019-czern-y, b) moz-snt279-2-pires-y and c) rav-sonatn1-rogep-y. [Bar charts not recoverable from this extraction.]
2 cho-pld007-ashke-b from CrestMuse PEDB. More information about the database can be found at http://www.crestmuse.jp/pedb/
3 5 professionals, 26 hobbyists and 11 others.
4 Deviation is not discussed here, because it is out of scope of this paper.
[Figure: score excerpt with harmonic (G/C) and metric (H/M) labels, note-duration and tempo curves for performers arger, ashke, dangt and pires, and first-derivative plots; the plotted data are not recoverable from this extraction.]
Fig. 4 a) Score of the first 4 bars in Chopin Prelude Op. 28 No. 4. b) Note durations in the performance of Martha Argerich, which were measured by Senn et al. (dotted line), and the shape analysis result computed by our method (solid line). c) The first derivative of the expressive shape.
hypothesis that expressive shape would be one of the most important components in performer-listener communication.
V. SUMMARY
We presented a computational method for expressive shape analysis in music performance, and showed several results indicating the plausibility of the computational analysis. As a next step, we will perform a large-scale computational analysis on published performance databases and explore its potential for performance-related computer systems. Furthermore, we will investigate integrating the computational analysis method into artificial music performance systems for learning and generating expressive shapes.
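While the paper's own shape model is based on generalized additive models with a GCV-chosen smoothing parameter [7]-[9], the core idea — smooth a noisy beat-level tempo curve and read the expressive shape off the smooth and its first derivative (cf. Fig. 4) — can be sketched with a smoothing spline. Everything below (data, smoothing factor) is illustrative, not the authors' implementation:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Illustrative beat-level tempo data (BPM): a slow arch plus noise.
rng = np.random.default_rng(0)
beats = np.arange(40.0)
tempo = 60.0 + 10.0 * np.sin(beats / 6.0) + rng.normal(0.0, 2.0, beats.size)

# Fit a cubic smoothing spline; the smoothing factor s bounds the residual
# sum of squares and so controls how much note-level deviation is absorbed
# (playing the role the GCV-chosen parameter plays in the paper's model).
spline = UnivariateSpline(beats, tempo, k=3, s=beats.size * 4.0)

shape = spline(beats)                # the smoothed "expressive shape"
slope = spline.derivative(1)(beats)  # its first derivative (BPM per beat)
```

The derivative curve makes phrase-final ritardandi visible as sustained negative slopes, which is what panel c) of Fig. 4 displays.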
ACKNOWLEDGMENT
This research is partially supported by Samsung Scholarship
Foundation.
REFERENCES
[1] B. Repp, “Diversity and commonality in music performance: An analysis
of timing microstructure in Schumann’s Träumerei,” in Journal of the
Acoustical Society of America, vol. 92(5), 1992.
[2] P. Juslin and R. Timmers, “Expression and communication of emotion in
music performance,” in Handbook of music and emotion: theory,
research, applications. London: Oxford University Press, 2010.
[3] N.P. Todd, “The kinematics of musical expression,” in Journal of the
Acoustical Society of America, vol. 97(3), 1995.
[4] A. Tobudic and G. Widmer, “Playing Mozart phrase by phrase,” in
Proc. International Conference on Case-Based Reasoning, 2003.
[5] S. Morita, N. Emura, M. Miura, S. Akinaga and M. Yanagida,
“Evaluation of a scale performance on the piano using spline and
regression models,” in Proc. International Symposium on Performance
Science, 2009.
[6] P. Mavromatis, “A Multi-tiered approach for analyzing expressive timing
in music performance,” in Mathematics and Computation in Music,
Springer Verlag, 2009.
[7] G. Golub, M. Heath and G. Wahba, “Generalized Cross-Validation as a
method for choosing a good ridge parameter,” in Technometrics, vol.
21(2), 1979.
[8] T. Hastie and R. Tibshirani, “Generalized Additive Models,” in
Statistical Science, vol. 1(3), 1986.
[9] C. Gu, “Cross-validating non-Gaussian data,” in Journal of
Computational and Graphical Statistics, vol. 1(2), 1992.
[10] O. Senn, L. Kilchenmann and M.-A. Camp “Expressive timing: Martha
Argerich plays Chopin's Prelude op. 28/4 in E minor,” in Proc.
International Symposium on Performance Science, 2009.
[11] J. MacRitchie, “Elucidating musical structure through empirical
measurement of performance parameters,” Ph.D. dissertation, University
of Glasgow, 2011.
Previous research on vocal pitch accuracy revealed insights into the fundamentals of
singing. However, most of the research on singing focused on the analysis of single
voices, whereas few attempts have been made to tackle the challenge of analyzing
multitrack recordings of singing ensembles. In addition, singers have to adjust their way
of singing with respect to a given venue’s acoustical environment (e.g., small room vs. a
comparatively large space like a church). Although it is well established that musical performances are greatly influenced by room acoustics, studies on the effects of room acoustical features during ensemble singing are rare. In order to investigate singing performances across various conditions, we manipulated the singing condition (unison, canon, solo) as well as the acoustical feedback by presenting different virtual rooms. Three duets of female
singers (N = 6) were asked to sing three different melodies using headset microphones to
record each singer separately. Recordings took place in the communication acoustic
simulator (CAS) at the House of Hearing (Oldenburg, Germany) to be able to provide
different simulated acoustical spaces (i.e., cathedral, classroom, and dry condition) to the
singers. Objective measures were performed on each recording and confirmed that the
singers sang the melodies with high precision (small pitch interval deviations), hardly affected by singing condition or by the type of acoustical feedback. However, the
singers tended to drift (larger deviations of the tonal center) when singing in canon
compared to solo and unison singing. Overall, the analysis of the pitch accuracy showed a
general effect of condition (i.e., unison, canon, solo), but no general effect of acoustical
feedback and no interaction between the two variables under study.
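The two accuracy measures reported here — per-note pitch interval deviation and drift of the tonal center — are conventionally computed on a cent scale. A minimal sketch follows; the helper names are hypothetical, and the study's actual pipeline would first extract f0 tracks from the separate headset recordings:

```python
import math

def cents(freq_hz, ref_hz):
    """Interval from ref_hz to freq_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)

def interval_deviations(sung_f0, target_f0):
    """Per-note deviation, in cents, of sung pitches from target pitches."""
    return [cents(s, t) for s, t in zip(sung_f0, target_f0)]

def tonal_center_drift(deviations):
    """Mean deviation over a passage: a simple proxy for drift of the
    tonal center (a sustained offset, as opposed to note-to-note scatter)."""
    return sum(deviations) / len(deviations)
```

On this scale, small interval deviations with a nonzero mean correspond exactly to the pattern reported: precise intervals, but a drifting tonal center during canon singing.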
Background
Music is frequently used as a means to relax, while its arousing effects are often employed in
sports and exercise contexts. Previous work shows that music tempo is one of the most
significant determinants of music-related arousal and relaxation effects. However, in the
literature on human heart rate, music tempo itself has not yet been studied rigorously on its
own.
Aims
The aim was to investigate the link between music tempo and heart rate during passive music
listening, adopting an experimental design with tight control over the variables.
Method
Heart rate was measured in silence after which music was provided in a tempo corresponding
to the assessed heart rate. Finally, the same stimulus was presented again but music tempo
was decreased, increased, or kept stable.
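The tempo manipulation just described amounts to two small calculations: the time-stretch ratio needed to present a stimulus at the participant's measured heart rate, and the shifted target tempo in the final phase. A sketch under assumed parameter names (not the study's actual software):

```python
def playback_rate(base_tempo_bpm, target_tempo_bpm):
    """Time-stretch ratio that makes a recording whose original tempo is
    base_tempo_bpm sound at target_tempo_bpm."""
    return target_tempo_bpm / base_tempo_bpm

def shifted_tempo(heart_rate_bpm, percent_change):
    """Target tempo after decreasing (negative) or increasing (positive)
    the tempo by percent_change relative to the measured heart rate."""
    return heart_rate_bpm * (1.0 + percent_change / 100.0)
```

For example, a participant measured at 70 BPM hearing a 100-BPM recording needs a 0.7 playback rate, and a 10% decrease in the final phase would target 63 BPM.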
Results
The experiment with heart rate measurements of 32 participants revealed that substantial
decreases in music tempo significantly reduced participants' heart rates, while heart rate did not respond to smaller drops or to increases. It was also shown that heart rate significantly increased in the music condition compared to the silent one. Neither gender nor music preference appeared to have a significant effect.
Conclusions
Generally, it is believed that music can induce measurable and reproducible effects on human
heart rate, leading to a condition of arousal proportional to the tempo of the music. However,
our findings revealed that only substantial decreases in music tempo could account for heart
rate reductions, while no link between tempo increase and heart rate was uncovered. As music
listening was shown to increase heart rate compared to silence, it is suggested that possible
effects of tempo increases are regulated by the arousal effect of music itself. These results are
a major contribution to the way in which music is used in everyday activities and are valuable
in therapeutic and exercise contexts.
Background
Music has been shown to positively affect a patient’s mental and physiological state in
perioperative settings. This effect is often explained by entrainment – the synchronization
of the patient’s bodily rhythms to rhythms presented in the music. While there is
evidence to support the effectiveness of musical entrainment, there exists a need for a
more specialized investigation into this field, which moves beyond the basic discussions
of tempo and looks at musical selections at both an individual level, and as a whole.
Aims
This research examines the literature surrounding the use of entrainment in music-based
surgical environments, creating a method for the selection and composition of music for
this purpose.
Main Contribution
It is argued that music can not only entrain the rhythms of the body to states more
conducive to healing, but that the mental state of the listener can also be entrained by the
moods presented in the music. As such, it is advocated that these two types of
entrainment should be referred to using a more specialized terminology – ‘physiological
entrainment’ and ‘emotional entrainment’ respectively. It is also argued that the iso
principle – an established music-based technique for the targeted shifting of emotional
states – can be applied not only to emotional entrainment interventions, but can also be
translated to physiological entrainment interventions.
Implications
This research will aid future musical entrainment interventions by providing a clear
method for the selection or composition of the most effective and appropriate music for
this purpose. Standardizing music selection in this way will contribute to the
improvement of the reliability of studies, and thus will aid in enhancing future research in
this field.
Stroke is a leading cause of adult disability, often resulting in the loss of motor skills. A recently
developed therapy, music-supported rehabilitation (MSR), utilizes goals of music-making
activities with percussion and keyboard instruments to improve motor abilities in stroke
patients, and the respective results have been promising. Multiple studies have shown that
musical training leads to neuroplastic changes in auditory and motor domains. The effect
of MSR on the auditory system is not clear yet. The present study investigated the effect
of MSR on auditory cortical processing in chronic stroke patients by means of
magnetoencephalography (MEG) recordings by examining mismatch negativity (MMN)
of auditory evoked fields (AEF). Lesion size and location varies largely across stroke
patients, thus we evaluated the outcome of MSR and a conventional therapy (CT) for
single cases. 29 patients received 15 or 30 one-on-one 1-hour sessions of MSR or CT with a therapist. We administered our test battery at three or four time points:
before training, after 15h or 30h of training, and 3 months after the last training session.
The test battery consisted of MEG recordings and behavioural tests. MEG recordings
entailed: (a) Presentation of single complex tones to obtain standard AEF responses; and
(b) passive and active oddball tasks to extract MMN responses. Previous studies found
improved MMN responses after musical training. We hypothesized that MMN will
change in the patients receiving MSR, but not in those receiving CT. Behavioural tests
assessed auditory, motor and cognitive performance. The different lesion locations revealed a large variability in AEF responses to a single complex tone. Thus, we discuss
the results of MSR-dependent changes in AEFs and motor abilities based on lesion side.
Motor rehabilitation will be discussed for single stroke cases by comparing the outcomes
for MSR and CT. Limitations in evaluating AEFs in stroke patients will also be
discussed.
Background
Therapeutic potentials of psychedelic drugs such as LSD and psilocybin are being re-
examined today with promising results. During psychedelic therapy patients typically
listen to music, which is considered an important element to facilitate therapeutic
experiences, often characterized by profound emotion. Underlying mechanisms have
however never been studied.
Aims
Studies aimed 1) to understand how music and psychedelics work together in the brain,
and 2) how their combination can be therapeutic.
Method
1) The interaction between LSD and music was studied via a placebo-controlled
functional magnetic resonance imaging (fMRI) study. 20 healthy volunteers received
LSD (75mcg) and, on a separate occasion, placebo, before scanning under resting
conditions with and without music. Changes in functional and effective connectivity
were assessed, alongside measures of subjective experience. In addition, we examined how LSD alters the brain's processing of acoustic features (timbre, tonality and rhythm).
2) In a clinical trial, psilocybin was used to treat 20 patients with major depression. Semi-
structured interviews assessed the therapeutic role of music, and fMRI scanning before
and after treatment assessed enduring changes in music-evoked emotion.
Results
1) Compared to placebo, changes in functional and effective connectivity were observed
during LSD in response to music, including altered coupling of the medial prefrontal
cortex (mPFC) and parahippocampus. These changes correlated with subjective effects,
including increased mental imagery. 2) Music influenced the therapeutic experience in
several ways, including enhancing emotion. Altered functional connectivity in response
to music after treatment was associated with reductions in depressive symptoms.
Conclusions
These findings suggest plausible mechanisms by which psychedelics and music interact
to enhance subjective experiences that may be useful in a therapeutic context.
1 merrill.tanner@ualberta.ca, 2 kalluri@ualberta.ca, 3 richmane@ualberta.ca
ABSTRACT
Interstitial lung diseases (ILD) are characterized by fibrosis and impaired gas exchange in the lungs. Shortness of
breath and cough are the main symptoms. These respiratory diseases are complex and can be difficult to treat. A
singing program led by a Speech-Language Pathologist/Singer was provided to 4 people with ILD. 8 sessions of voice
therapy exercises and song singing with piano accompaniment took place over a 4-week period. The goal of the study
was to find out whether singing and voice therapy would be beneficial to people with ILD. Pre and post measures were
taken. Respiratory measures included assessment of forced vital capacity, and a rating on the Dyspnea Medical
Research Council breathlessness 1-5 scale and cough rating scale (mild, moderate, severe). Acoustic measures included
average pitch in reading, maximum phonation time (MPT), loudness range and pitch range. Unsolicited comments
given during post testing were also recorded. Lung function did not change in two patients and improved in one patient.
The Dyspnea MRC breathlessness scale improved in 2 patients and remained the same in another 2 cases. In one of the
2 patients who had a cough, a mild cough disappeared. Pre-post acoustic measures showed that average pitch in reading rose and that MPT improved for all participants. Loudness range, pitch range and perceptual ratings improved in three of four
participants. Comments of participants during post testing revealed that all of the participants enjoyed the group and
felt that their voices had improved. 3 of the 4 participants felt that their breathing had improved, and 2 reported less
coughing. 3 mentioned that they would like the group to continue as a part of the pulmonary rehabilitation program.
One participant returned to singing in a choir after the end of the session. Respiratory, vocal and mood improvements
were evident in this small group of four participants with ILD. Voice therapy with singing appears to improve the lives
of people with ILD and should be studied further.
I. BACKGROUND
Idiopathic pulmonary fibrosis (IPF) is a chronic, progressive, irreversible, and fatal fibrotic interstitial lung
disease (ILD). It is a devastating condition that afflicts people with relentlessly progressive shortness of
breath [1]. Despite 30 years of intensive research, there is no cure and mortality rate remains high [2].
Review of the literature on IPF-related quality of life (QOL) reveals several unique as well as overlapping problems shared with other lung diseases. The hallmark symptom of IPF is exertional dyspnea, which is
quite disabling in patients with IPF. It measurably worsens over time and appears to be one driver of
reduced QOL.
In addition to dyspnea, IPF patients also suffer from sadness, worry, anxiety, fear, isolation and a sense of
lack of control over their condition [3]. Patients commonly express hopelessness and despondency when
they learn that there are no effective therapies for disabling symptoms. Physical limitations and oxygen
dependency impose an overpowering sense of social and emotional isolation that seems to have not been
captured by any QOL surveys done to date. Several QOL related studies have noted that the domains of
mental health also show impairment [4]. It is known that depression and anxiety can have a negative impact
on lung function, can worsen quality of life and also lead to higher healthcare costs [5].
The current therapies may slow down progression of disease but have no impact on symptoms or QOL.
Therefore, the goal of care should be to help IPF patients experience less dyspnea, less anxiety, less depression,
and to have a better quality of life. Pulmonary rehabilitation (PR) programs have been reported to improve
quality of life in IPF [6]. They have also been associated with improvements in mood, depression and QOL in
healthy volunteers and patients with COPD [8-9]. However, delivering effective psychological, emotional and
social support still remains a challenge in IPF. The use of music therapy in the form of singing teaching can be
an effective tool to achieve these ends. Voice training and singing has been shown to improve QOL and lung
function.
Trials with COPD and asthma patients have shown that involvement in a choir group not only encourages
social bonding and alleviates depression, anxiety, fear and isolation, but also gives a sense of purpose in
life. A study of 43 COPD patients showed improvement in QOL, lung function and dyspnea [11]. In another study, 7 subjects with emphysema who participated in twice-weekly singing classes for six weeks showed improvement in chest wall mechanics [12]. A pilot study evaluated quality of life (QOL) and
lung function before and after three months of choral singing in cancer survivors and their care givers and
found improvements in QOL and depression. Structured interviews revealed that the choir provided focus;
the participants felt uplifted and had greater confidence and self-esteem [13]. Another study of a group of
Alzheimer’s disease patients and their caregivers showed the efficacy of receptive music therapy in
alleviating anxiety and depression [14].
II. RATIONALE
IPF is a chronic, progressive, fibrotic interstitial lung disease with mortality rates approaching those found
in lung malignancy. IPF patients suffer from depression, anxiety, fear and social isolation, in addition to the
progressive dyspnea. Despite intense research there is no curative therapy. There are several studies related
to asthma, COPD and lung cancer that showed that participation in choir groups has effects beyond lung
function. It promotes social and spiritual wellbeing and thereby improves quality of life. This has never
been studied in relation to interstitial lung diseases like idiopathic pulmonary fibrosis (IPF).
III. METHODS
A small study with 4 people with ILD was conducted in August of 2015. Subjects were recruited from the
Multidisciplinary ILD program by Dr. Kalluri. Dr. Richman-Eisenstat assessed them and determined their
exercise prescription. Patients were offered participation in a singing group. Three of the four patients involved
attended the pulmonary rehabilitation program as well. The program consisted of four weeks of twice weekly
singing sessions led by a Speech-Language Pathologist/Singer and accompanied by piano. The sessions were
initially 1/2 hour long but were expanded to one hour due to patient requests for longer sessions. Sessions
involved voice exercises, including semi-occluded vocal tract exercises, open throat exercises, vocal hygiene,
breathing techniques and song singing. Home practice was encouraged. Songs included common folk and older
popular repertoire with positive themes. Attendance was variable as these patients were living in the community
and trying to deal with work and the limitations of their disease.
A. Evaluation
Pulmonary measures were taken in clinic by the respiratory therapist before and after the treatment period.
Voice measures were taken by the Speech Language Pathologist leading the group before and after the
treatment period.
B. Results
1) Pulmonary Results
The MRC dyspnea scale, CGI-I (clinical global impression of improvement) and objective measures of lung function such as the pulmonary function test (FVC, DL) and 6-minute walk distance were measured at the two time points. Two participants had improvements in forced vital capacity and reduced breathlessness. One participant had improved cough and another had an improved clinical global impression of improvement.
2) Voice Results
Average pitch during reading and maximum phonation time improved in all four cases. Loudness range,
pitch range, and perceptual ratings by the therapist improved in three out of four cases. The initial pitch
range reading of one patient may have been inaccurate due to the perturbation and shimmer in his voice.
The equipment may have picked up high frequencies that were not a part of the perceived vocal range. Post
testing for one patient occurred in her work environment where she was not able to be as loud as possible.
In the perceptual measures, the one patient who did not improve remained the same as in the initial assessment. Of the three who completed the forms, one improved on both vocal QOL measures. The other two appeared to increase their scores (indicating poorer vocal QOL). This may have occurred because their self-awareness of problems increased as a result of the vocal hygiene education.
3) Interview results
All participants mentioned that they enjoyed the group and would like to continue. One person wanted it to
increase to three times per week. Two people felt that their coughing had decreased. Another person felt
that her throat was more relaxed. One participant said he would highly recommend the group to anybody
and that his improved voice now allows him to sing along with music in the car. Two people felt that the
quality of their voices was better in speaking and singing. Two others reported that their higher range had
improved. One woman said she felt more confident, while another said that the group had improved her
mood. She thought that meeting with others who had the same problem and having fun both contributed to
her better mood. As a result of being in the group she returned to the choir that she attended before falling
ill with ILD and found that she could participate if she took more frequent breaths that previously. The
same woman felt that she now had improved breathing and was able to use the vocal techniques learned to
recover more quickly when walking too fast. Another participant reported that she had learned methods to
control voice and breathing.
C. Discussion
Despite the small size of group (n=4) and the bias in the voice results (the treating therapist also performed
voice testing), the study appears to show that all participants experienced some benefit.
D. References
1. American Thoracic Society. IPF: diagnosis and treatment. International consensus statement. Am J Resp
Crit Care Med 2000;161:646.
2. Flaherty. AJRCCM 1998;157:199-203.
3. Pulmonary rehabilitation in idiopathic pulmonary fibrosis: a call for continued investigation. Resp
medicine 2008;102;12:1675.
4. Evaluation of the short-form 36-item questionnaire to measure health-related quality of life in patients
with idiopathic pulmonary fibrosis. Chest 2000;117(6):1627.
5. The prevalence and pulmonary consequences of anxiety and depressive disorders in patients with asthma.
Coll Antropol. 2012 Jun; 36(2):473-81.
6. Effectiveness of pulmonary rehabilitation in restrictive lung disease. J Cardiopulm rehab 2006;
26(4):237-243.
7. 'Didgeridoo playing and singing to support asthma management in Aboriginal Australians', Journal of
Rural Health, vol. 26, no. 1, pp. 100-4.
8. The perceived benefits of singing: findings from preliminary surveys of a university college choral
society. J R SOC PROmotion Health 2001, 121, 4, 248-256.
9. Effects of singing classes on pulmonary function and quality of life of COPD patients. Intl j of copd
2009:4;1-8
10. Karaoke for quads: A new application of an old recreation with potential therapeutic benefits for people
with disabilities. Disability and Rehabilitation, 25, 297-301.
11. Singing teaching as a therapy for chronic respiratory disease - a randomised controlled trial and
qualitative evaluation. BMC pulm med 2010; 10:41.
12. The singer’s breath: implications for treatment of persons with emphysema. J Music Ther. 2005 Spring;
42(1):20-48.
13. A pilot investigation of quality of life and lung function following choral singing in cancer survivors
and their carers. Ecancer 2012;6:261.
14. Impact of music therapy on anxiety and depression for patients with Alzheimer's disease and on the
burden felt by the main caregiver (feasibility study). Encephale 2009, 35, 1, 57-65, France.
V. CONCLUSION
In the absence of pharmacological therapy and ILD tailored pulmonary rehabilitation, patients with IPF
continue to suffer from breathlessness. Based on the beneficial initial findings of this pilot study there is
reason to continue research on singing programs for Interstitial Lung Disease.
VI. ACKNOWLEDGMENT
Funding for this study was obtained from the Canadian Pulmonary Fibrosis Foundation (CPFF).
Eun Young HAN1,2, Ji Young YUN2, Kyoung-Gyu CHOI2 and Hyun Ju CHONG1
1 Department of Music Therapy, Ewha Womans University Graduate School, Seoul, Korea
2 Department of Neurology, Ewha Womans University School of Medicine, Ewha Womans University Mokdong Hospital, Seoul, Korea
The symptoms of Parkinson's disease (PD) include not only movement disorders but also voice and affective disorders, so patients frequently suffer emotional depression as well as communication problems. This depression reduces quality of life and delays therapeutic responses. Music acts as a stimulus to elicit motor and emotional
responses and has been applied to various neurologic disorders. Therapeutic singing
activates neural mechanisms for speech using the respiratory muscles and the articulation.
Singing and speaking are similar in that they involve cortical regions and engage bilateral
activities of the brain. Moreover, the melody and lyrics of singing frequently induce
personal memories related to both physical and emotional states. The purpose of this
study is to examine the Therapeutic Singing Program (TSP) to enhance vocal quality and
alleviate depression of PD. We studied 6 female and 3 male PD patients aged 53 to 78.
They were selected by the neurologist with clinical criteria. Each patient received an
intensive music therapy session for 50 minutes in a total of 6 sessions for 2 weeks. The
therapeutic singing program consisted of patterned vocalizing, vocal improvisations and
song compositions for vocal exercise. In the pre/post tests on the 1st and 8th days, we
measured the Maximum Phonation Time (MPT) via the Praat test, the Voice Handicap
Index (VHI), the Voice-Related Quality of Life (V-RQOL), and the Geriatric Depression
Scale (GDS). We conducted the follow-up tests six months later. The vocal quality of the
patients, including acoustic and subjective vocal evaluations, was improved and the
depression symptoms of the patients were alleviated. We also found that both the MPT and the V-RQOL scores increased while the VHI decreased. The GDS score was markedly lower. The follow-up tests indicated positive effects of the
Therapeutic Singing Program on keeping the vocal quality and alleviating the depression.
This holistic music therapy study for Parkinson’s disease has confirmed the effectiveness
of the Therapeutic Singing Program (TSP) to enhance vocal quality and alleviate
depression of PD using vocalizing, vocal improvisation and song compositions based on
phonetics, neuropsychology and music psychotherapy.
Music therapy has been found to help improve communicative behaviours and joint
attention in children with communication difficulties, but it is unclear what processes in
the music therapy sessions contribute to those changes. Previous research has focused on
the use of outcome measures or on the analysis of segments of sessions. One way to
explore what contributes to change is to analyse what happens in the sessions. In
developing methods for such analysis, it is informative to understand how the therapists
in the sessions view the results of the analysis. In this two-stage project, we created
representations of aspects of interactive behaviours in improvisational music therapy
sessions. We then asked the therapists who had been in the videos for their views about
the representations. To create the representations, an annotation protocol was developed
which focused on simple, unambiguous individual and shared behaviours of clients and
therapists, including facing, rhythmic activity, and musical structures. We analysed
sessions with children who had been referred for communication difficulties. The
representations were of five pairs of whole early and late sessions and showed either
moment-by-moment views of behaviours of each player or proportions of the annotated
behaviours. Three of the therapists who had been in the videos responded to a
questionnaire about the representations. They reported that all the characteristics
presented were relevant to tracing change. They found that the visualisations would be
useful for their own assessments of client progress as well as for communication about
their work with others. The role of these representations was discussed in the context of a
balance between consideration of whole sessions and the analysis of salient moments
within sessions. The therapists’ responses indicate that though the representations
illustrate simple behaviours, they may contribute to the analysis of the multi-faceted work
of the music therapists.
ISBN 1-876346-65-5 © ICMPC14 635 ICMPC14, July 5–9, 2016, San Francisco, USA
Table of Contents
for this manuscript
Many researchers have explored relationships between the ability to discriminate pitch
and the ability to match pitch, but results conflict, particularly for young children.
Although tonal patterns may be more accessible for children to match than songs, most
researchers have focused on cross-sectional investigations of song reproductions. A
longitudinal study of discrimination and pitch matching of tonal patterns may provide a
more equitable comparison and greater insight into children’s initial musical experiences.
The purpose of this study was to examine longitudinal differences and relationships
between children’s pitch discrimination and pitch matching skills.
Differences between kindergarten and second-grade scores in pitch discrimination and
pitch matching were examined with paired t-tests. Results indicate improvement of 14.5%
in discrimination and 13% in pitch matching between kindergarten and second grade.
Pearson correlations between the discrimination measure and the pitch matching measure
were computed for both grades. The relationship between discrimination and pitch
matching remains moderate from kindergarten (.40) through second grade (.45). The
strong correlation between kindergarten and second-grade pitch matching scores (.74)
contrasts with the moderate correlation between kindergarten and second-grade
discrimination scores (.53).
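The Pearson correlations reported above can be computed directly from paired score lists. A minimal stdlib sketch — the score lists are hypothetical illustrations, not the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # covariance and standard deviations share the same (unscaled) sums,
    # so the n or n-1 normalizers cancel in the ratio
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# hypothetical discrimination and pitch-matching scores for the same children
discrimination = [12, 15, 9, 14, 11, 16, 10]
pitch_matching = [20, 24, 15, 22, 19, 27, 17]
r = pearson_r(discrimination, pitch_matching)
```

A correlation near .40–.45, as the abstract reports, would indicate that discrimination explains only a modest share of the variance in pitch matching.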
associations of those instruments with children aged seven to eight (N = 357). Their
study aimed to examine whether gender-typed preferences would be established, confirmed,
or changed after exposure to stereotypic or counter-stereotypic role models. During a
priming phase, six instruments (flute, violin, piano, trumpet, guitar, and drums) were
introduced at schools in Great Britain. Short interviews with pictures of the instruments
were held, and preferences as well as gender-stereotyped associations with those
instruments were captured. An intervention concert followed this phase. Subjects were
divided into three groups: group 1 and group 2 received concerts played by either
gender-consistent or gender-inconsistent role models, and group 3 served as a control
group. It is important to note that Harrison and O'Neill mention no alternative treatment
for the control group.

Results confirmed earlier findings regarding role models and preferences. Girls showed a
significant change in preferences for flute, piano, and drums. Girls who received a
gender-inconsistent concert rejected the piano more after the concert. A similar but not
significant trend was found for flute and violin. Boys in the two main groups
(gender-consistent and -inconsistent) preferred the drums more after the concert. In
addition, only the results for the violin showed a tendency toward rejection after seeing
a female play the instrument (cf. Harrison & O'Neill, 2000).

We chose to replicate this study especially because of its innovative research design
with live concerts. Over the last decades, research on preferences for particular musical
instruments and on gender stereotypes among children and adolescents has been more or
less limited to the United States and Great Britain. In Germany, only very few studies
have explored the relationship between musical instruments and gender (Scheuer, 1988;
Vogl, 1993; Müller & Burr, 2004). Since stereotypes and preferences are established much
earlier than at the age of seven, the present study investigates effects among younger
children.

… the present cases. 44.4 % of the participants were already playing this instrument at
the time of the survey. In 26 cases, the child was the only one to play an instrument in
their family. 69.7 % of the young children had already experienced live music in a
concert or a similar situation. For the remaining 30.3 %, the concert experienced during
the examination seems to have been their first live music performance.

B. Data collection instrument
We used a specifically developed questionnaire as the data collection instrument for the
present examination. It was limited to a few items to ensure that our young subjects were
able to complete it in an appropriate amount of time with the assistance of the
interviewer. Content and length were discussed in a colloquium with teachers and
educators. Upon consultation, the processing time was set at 15-20 minutes.

The first part of the questionnaire collected socio-demographic data to be used as
general exploratory information. The second part compiled preferences for particular
musical instruments as well as gender stereotypes. The included instruments were flute,
violin, piano, trumpet, guitar, and drums. Because the children were too young to read,
continuous scales and drawings were used to rate the instruments for preferences as well
as for stereotypes. First, the children placed small pictures of the six musical
instruments on a rectangular field in rank order according to their preference for the
instruments (the closer to the upper rim, the better liked; placed side by side means
same preference). Second, they made a cross on lines with arrows at the ends pointing
towards a drawing of a boy or a girl for every instrument (the nearer the cross to the
respective drawing, the more likely the instrument should be played by the respective
gender; see figure 1). To code the data, we measured the position of the pictures or
crosses and applied a metrical splitting of the continuums.
C. Procedure
The time sequence was divided into two survey phases, each with two sub-phases. The
first stage of the survey consisted of a priming phase followed by the first questioning
(t1). During the priming phase, the researchers wanted to ensure that all of the children
had at least a first impression of the presented instruments. To avoid gender stereotypes
and traditional role models, the subjects saw pictures of each instrument on a white
background. Additionally, each playing technique was explained, and sound recordings
(interpretations of the melody Twinkle, Twinkle Little Star and a C major scale; for the
drums, a short drum pattern) were played. The priming phase took approximately 20
minutes and was carried out in smaller groups of five children. The second stage of the
survey took place one week after the first. It consisted of the respective intervention
concert, immediately followed by a second round of questioning (t2).

The two experimental groups underwent slightly different treatments. Male and female
musicians each presented one of six different instruments live to all of these children.
In group 1, the sex of the respective musician complied with the stereotypic role models
confirmed by earlier studies (Abeles & Porter, 1978; Bussey & Perry, 1982; Delzell &
Leppla, 1992; Bruce & Kemp, 1993). We defined this group as the gender-consistent group.
The opposite cast (e.g. a male playing the flute and a female playing the drums) was
presented to group 2, the gender-inconsistent group. Group 3 served as the control group;
the children of this group watched an episode of a TV series for kids.

The intervention concerts took place seven days after the first ascertainment. We
presented the musical instruments in random order. Each musician played a short solo
piece. Timing and musical style were similar in both groups.

D. Statistical Methods
All data were analyzed using the Statistical Package for the Social Sciences (SPSS,
Version 22). Two-way repeated-measures analyses of variance (ANOVA) between the three
groups, computed for each gender, were used to answer the research questions, along with
t-tests for independent samples. The level of significance was set at α = .05.

IV. RESULTS
Right after the priming phase (t1), boys indicated the highest preference for the drums
(M = 1.36) and the lowest for the flute (M = 4.38), whereas girls most preferred the
piano (M = 2.11) and least preferred the trumpet (M = 4.94) (cf. table 1). The difference
between the preference ratings of boys and girls is significant for the trumpet
(F = 4.411, p = .015) and nearly significant for the drums (F = 2.874, p = .062). These
findings are in line with gender-typical preferences measured in previous studies (e.g.
Abeles & Porter, 1978; Griswold & Chroback, 1981; Delzell & Leppla, 1992; O'Neill &
Boulton, 1996; Harrison & O'Neill, 2000; Abeles, 2009; Killian & Satrom, 2011; Vickers,
2015). It is remarkable that the children rejected no musical instrument at the first
time of measurement.

Table 1. Boys' and girls' preferences for musical instruments right after the priming
phase (t1).

              flute    violin   piano    trumpet   guitar   drums
boys   M      4.375    3.746    3.158    4.223*    2.948    1.369(*)
       SD     3.301    3.739    3.150    3.311     3.084    2.154
girls  M      3.067    2.926    2.110    4.940*    3.812    3.352(*)
       SD     3.338    3.076    2.322    3.370     3.253    3.442

M: mean value (0 = absolutely preferred, 9 = absolutely rejected); SD: standard
deviation; one-way ANOVA: (*) p ≤ .10; * p ≤ .05; ** p ≤ .01; *** p ≤ .001

In contrast to our expectations, children assigned all instruments rather according to
their own gender (cf. table 2): whereas girls stated that all musical instruments except
the trumpet and the drums should rather be played by a girl, with the piano as most
feminine (M = 8.42) and the trumpet as most masculine (M = 4.96), boys categorized all
musical instruments more or less as boy instruments, with the drums as most masculine
(M = 2.06) and the piano as least masculine (M = 5.55). There is only one significant
difference between boys and girls, for the trumpet (F = 3.519, p = .034).

Table 2. Gender stereotypes for musical instruments right after the priming phase (t1).

              flute    violin   piano    trumpet   guitar   drums
boys   M      4.375    4.635    5.552    3.479*    2.583    2.063
       SD     4.262    4.607    4.367    3.771     3.428    3.212
girls  M      8.369    7.738    8.417    4.964*    6.583    5.583
       SD     3.957    4.427    3.641    4.426     4.614    4.628

M: mean value (0 = absolutely masculine, 12 = absolutely feminine); SD: standard
deviation; one-way ANOVA: (*) p ≤ .10; * p ≤ .05; ** p ≤ .01; *** p ≤ .001

Concerning both boys and girls, results reveal no significant differences between the
two times of measurement in terms of preferences for musical instruments, except for the
guitar (see table 3).

Table 3. Preference for the guitar: two-way analysis of variance (ANOVA) with repeated
measures between the three groups, computed for each gender.

                      F        df       p       d
boys   time           5.292    1 / 45   .026    0.456
       time*group     .198     2 / 45   .821
girls  time           11.788   1 / 39   .001    0.598
       time*group     1.372    2 / 39   .266

F, F-value; df, degrees of freedom; p, empirical significance; d, Cohen's d (effect
size), with 0.00–0.10 (no effect), 0.20–0.40 (small), 0.50–0.70 (medium), > 0.80 (large).

All subjects tended to prefer the guitar more at the second time of measurement (see
figures 2 and 3).
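The boys-versus-girls comparisons reported above rest on one-way ANOVAs, whose F statistic is the ratio of between-group to within-group variance. A minimal stdlib sketch of that computation — the ratings below are hypothetical, on the study's 0–9 preference scale, not its actual data:

```python
def one_way_f(*groups):
    """F statistic for a one-way ANOVA across the given groups of scores."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # between-group sum of squares (df = k - 1)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares (df = n_total - k)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# hypothetical preference ratings (0 = absolutely preferred, 9 = absolutely rejected)
boys = [1, 2, 0, 3, 1, 2]
girls = [4, 3, 5, 2, 4, 3]
f_value = one_way_f(boys, girls)
```

With only two groups, as in the boys/girls comparisons, this F equals the square of the corresponding independent-samples t statistic.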
(Figure panels not reproduced; y-axes: preferred – rejected and masculine – feminine.)
Figure 3. Girls' preference for the guitar: comparison of the two times of measurement
between the three groups.
Figure 4. Gender stereotypes for the flute between two times of measurement.
general, but above this we would like to emphasize that the participating children did
not particularly reject any musical instrument, and the standard deviations were rather
high. The results of study 3 by Abeles & Porter (1978) indicated something similar:
sex-stereotyping behaviour in musical instrument preference was not very strong in
kindergarten children but more pronounced in children beyond school grade 3. However,
our findings might also be due to the newly developed measuring method with continuums
instead of forced-choice categories. In addition, we did not ask the children to select
the instrument that they would like to play, but to rank order all instruments in terms
of preference. Because they made choices without real consequences (e.g. the opportunity
to play the instrument they liked best), they might not have been afraid of negative
outcomes such as playing a gender-inappropriate instrument in the presence of peers.

Yet our results clearly differ from the previous ones in terms of changes between the
two times of measurement, which might also be caused by confounding variables and
methodological problems such as children's conversations between the priming and
intervention phases. Contact with the guitar beyond the interventions could explain the
only observed changes in preferences, namely for this instrument, even though we do not
know of any special contacts.

Children establish a gender identity at the latest in the third year of life, and
preschoolers typically show the strongest relations among gender schemata. However, at
the end of preschool and with the beginning of elementary school, an increase in
stereotype flexibility is frequently observed, which might signal that the interrelations
among gender schemas are weakened (Signorella, 1999; Zosuls, Miller, Ruble, Martin, &
Fabes, 2011). In addition, gender-typed activities vary in strength depending upon their
social contexts (Goble, Martin, Hanish, & Faves, 2012), and the children in our study
were questioned in smaller groups. Moreover, the presence of instrumental stereotypes is
equivocal in preschool children (Abeles & Porter, 1978; Marshall & Shibazaki, 2012). All
this can explain why in our study each musical instrument was perceived as appropriate
for one's own sex. Yet in a study on self-perceived gender typicality and gender
stereotypes in general, Meagan Patterson (2012) could show "that even young
elementary-school-aged children used their knowledge of cultural gender roles to make
subjective judgments regarding the self, and, conversely, that views of the self may
influence personal endorsement of cultural gender stereotypes" (p. 422). In accord with
Wrape et al. (2016), it could be confirmed that our young and musically inexperienced
children are more open to counter-stereotypical views than older children. This offers
the best chances for early instrumental learning.

ACKNOWLEDGMENT
The authors would like to thank Lydia Scott for additional proofreading.

REFERENCES
Abeles, H. F., & Porter, S. Y. (1978). The Sex Stereotyping of Musical Instruments.
  Journal of Research in Music Education, 26 (2), 65-75.
Abeles, H. (2009). Are Musical Instrument Gender Associations Changing? Journal of
  Research in Music Education, 57 (2), 127-139.
Bruce, R., & Kemp, A. (1993). Sex-stereotyping in Children's Preferences for Musical
  Instruments. British Journal of Music Education, 10 (3), 213-217.
Bussey, K., & Perry, D. G. (1982). Same-Sex Imitation: The Avoidance of Cross-Sex Models
  or the Acceptance of Same-Sex Models? Sex Roles, 8 (7), 773-784.
Conway, C. (2000). Gender and musical instrument choice: A phenomenological
  investigation. Bulletin of the Council for Research in Music Education, 146, 1-15.
Cramer, K. M., Million, E., & Perreault, L. A. (2002). Perceptions of musicians: Gender
  stereotypes and social role theory. Psychology of Music, 30, 164-174.
Delzell, J. K., & Leppla, D. A. (1992). Gender Association of Musical Instruments and
  Preferences of Fourth-Grade Students for Selected Instruments. Journal of Research in
  Music Education, 40, 93-103.
Elliot, C. A., & Yoder-White, M. (1997). Masculine/feminine associations for instrumental
  timbres among children seven, eight, and nine years of age. Contributions to Music
  Education, 24 (2), 30-39.
Eros, J. (2008). Instrument Selection and Gender Stereotypes. A Review of Recent
  Literature. Update: Applications of Research in Music Education, 27 (1), 57-64.
Fortney, P. M., Boyle, J. D., & DeCarbo, N. J. (1993). A study of middle school band
  students' instrument choices. Journal of Research in Music Education, 41, 28-39.
Goble, P., Martin, C. L., Hanish, L. D., & Faves, R. A. (2012). Children's Gender-Typed
  Activity Choices across Preschool Social Contexts. Sex Roles, 67, 435-451.
Graham, B. J. (2005). Relationships among instrument choice, instrument transfer, subject
  sex, and gender-stereotypes in instrumental music. Dissertation. Indiana University:
  ProQuest (UMI No. 3298144).
Griswold, P. A., & Chroback, D. A. (1981). Sex-role associations of music instruments and
  occupations by gender and major. Journal of Research in Music Education, 29, 57-62.
Hallam, S., Rogers, L., & Creech, A. (2008). Gender differences in musical instrument
  choice. International Journal of Music Education, 26 (1), 7-19.
Harrison, A. C., & O'Neill, S. A. (2000). Children's Gender-Typed Preferences for Musical
  Instruments. An Intervention Study. Psychology of Music, 28, 81-97.
Harrison, A. C., & O'Neill, S. A. (2003). Preferences and Children's Use of
  Gender-Stereotyped Knowledge about Musical Instruments: Making Judgments about Other
  Children's Preferences. Sex Roles, 49 (7/8), 389-400.
Killian, J. N., & Satrom, S. L. (2011). The effect of demonstrator gender on wind
  instrument preferences of kindergarten, third-grade, and fifth-grade students. Update:
  Applications of Research in Music Education, 29 (2), 13-19.
MacLeod, R. B. (2009). A comparison of aural and visual instrument preferences of third
  and fifth-grade students. Bulletin of the Council for Research in Music Education, 179,
  33-43.
Marshall, N. A., & Shibazaki, K. (2012). Instrument, Gender and Musical Style
  Associations in Young Children. Psychology of Music, 40 (4), 494-507.
Müller, R., & Burr, M. (2004). Präsentative Forschungsmethoden zur Untersuchung von
  Musikinstrumentenpräferenzen in Schulen. In B. Hofmann (Ed.), Was heißt methodisches
  Arbeiten in der Musikpädagogik? (pp. 149-168). Essen: Die Blaue Eule.
O'Neill, S. A., & Boulton, M. J. (1996). Boys' and Girls' Preferences for Musical
  Instruments: A Function of Gender? Psychology of Music, 24, 171-183.
Patterson, M. M. (2012). Self-Perceived Gender Typicality, Gender-Typed Attributes, and
  Gender Stereotype Endorsement in Elementary-School-Aged Children. Sex Roles, 67,
  422-434.
Payne, P. D. (2009). An Investigation of Relationships between Timbre Preference,
  Personality Traits, Gender, and Music Instrument Selection of Public School Band
  Students. Dissertation. The University of Oklahoma: ProQuest (UMI No. 3366051).
Pickering, S., & Repacholi, B. (2001). Modifying Children's Gender-Typed Musical
  Instrument Preferences: The Effects of Gender and Age. Sex Roles, 45 (9/10), 623-643.
Scheuer, W. (1988). Zwischen Tradition und Trend. Die Einstellung Jugendlicher zum
  Instrumentalspiel. Eine empirische Untersuchung. Mainz: Schott.
Signorella, M. L. (1999). Multidimensionality of gender schemas: Implications for the
  development of gender-related characteristics. In W. B. Swann Jr., J. H. Langlois &
  L. A. Gilbert (Eds.), Sexism and stereotypes in modern society: The gender science of
  Janet Taylor Spence (pp. 107-126). Washington: American Psychological Association.
Sinsel, T. J., Dixon, W. E., & Blades-Zeller, E. (1997). Psychological Sex Type and
  Preferences for Musical Instruments in Fourth and Fifth Graders. Journal of Research
  in Music Education, 45 (3), 390-401.
Tarnowski, S. M. (1993). Gender-bias and musical instrument preference. Update:
  Applications of Research in Music Education, 12 (1), 14-21.
Vickers, M. E. (2015). The effect of model gender on instrument choice preference of
  beginning band students. Dissertation. University of Hartford, The Hartt School:
  ProQuest (UMI No. 3700787).
Vogl, M. (1993). Instrumentenpräferenz und Persönlichkeitsentwicklung. Eine musik- und
  entwicklungspsychologische Forschungsarbeit zum Phänomen der Instrumentenpräferenz bei
  Musikern und Musikerinnen. Frankfurt/M.: Peter Lang.
Wrape, E. R., Dittloff, A. L., & Callahan, J. L. (2016). Gender and Musical Instrument
  Stereotypes in Middle School Children: Have Trends Changed? Update: Applications of
  Research in Music Education, 34 (3), 40-47.
Wych, G. M. F. (2012). Gender and Instrument Associations, Stereotypes, and
  Stratification: A Literature Review. Update: Applications of Research in Music
  Education, 30 (2), 22-31.
Zervoudakes, J., & Tanur, J. M. (1994). Gender and musical instruments: Winds of change?
  Journal of Research in Music Education, 42, 58-67.
Zosuls, K. M., Miller, C. F., Ruble, D. N., Martin, C. L., & Fabes, R. A. (2011). Gender
  Development Research in Sex Roles: Historical Trends and Future Directions. Sex Roles,
  64, 826-842.
Interpersonal synchrony encourages prosocial behavior among adults, children, and even
infants. In adults, this effect has been demonstrated in both musical and non-musical
contexts. Music is therefore considered to be a natural, but not necessary, context within
which interpersonal synchrony is achieved. However, how the presence of music shapes
the social effects of interpersonal synchrony has yet to be directly addressed. Previous
work in our lab has shown that 14-month-old infants are more helpful towards an
experimenter after being bounced synchronously versus asynchronously with that person.
In these experiments, music was always present in the background. This experiment
investigated the effects of interpersonal synchrony and asynchrony on infant helping in a
non-musical context. As in our previous experiments, infants were held by an assistant
and bounced at a steady rate. At the same time, the experimenter facing the infant
bounced either in- or out-of-synchrony with how the infant was bounced. In contrast to
our previous studies in which infants listened to music while being bounced, here infants
listened to nature sounds (e.g., rushing water, rustling leaves). Afterward, the infant was
placed in a situation where the experimenter attempted to complete simple tasks, but
dropped the objects she needed to complete the tasks. The number of dropped objects
handed back by each infant was recorded, and average helping rates were compared
across conditions. Replicating our previous findings, infants helped significantly more in
the synchronous versus asynchronous conditions. However, infants displayed more
distress in this experiment (40% did not complete the experiment compared to our typical
rates of 15-25%) and infants took longer to help than in our previous experiments. These
results suggest that while music may not be necessary for the social effects of synchrony
to emerge, the presence of music does contribute positively to the experience.
to name them. Melodies will be referred to hereafter by only the first word of the title,
for brevity. Naming accuracy was fairly low (21% overall) and did not differ between
2-measure and 4-measure melodies. The highest naming rates were for Happy (36%), Mary
(30%), and Twinkle (39%).

Table 1. Pretest stimuli.

Melody                         Picture
Mary Had a Little Lamb         lamb
Twinkle Twinkle Little Star    star
Happy Birthday                 cake with candles
Deck the Halls                 Christmas tree
Yankee Doodle                  USA Revolutionary soldier
London Bridge                  (not used in pretest 2)
Star-Spangled Banner           flag

A second pretest with 23 children used a 2-alternative forced-choice picture-matching
task with the same melodies, except that London Bridge (0% naming recognition) was
replaced with the Star-Spangled Banner. Pictures were … Here, overall accuracy was 59%.
Length (2 vs. 4 measures) did not impact recognition. The pairs with highest accuracy
were Mary-Twinkle and Birthday-Twinkle (both 69.6% for the 2-measure length). The two
chosen melodies were Mary and Twinkle, because they have similar rhythmic patterns. After
selecting these two melodies, scrambled versions of each were created, attempting to use
simple contours for both (Figure 1) so that familiar and unfamiliar melodies were matched
for overall naturalness.

(Figure 1 panels, notation not reproduced: Twinkle, Scrambled Twinkle, Mary, Scrambled
Mary.)

III. Experiment 1: Association learning

A. Method
1) Participants. Sixty-four preschool-aged children (ages 3-5) took part.
2) Stimuli. The four melodies were Mary, Twinkle, Mary-scrambled, and Twinkle-scrambled.
Two different picture sets were used (Figure 2): either a lamb and a star
(familiar-picture condition), or two cartoon characters (unfamiliar-picture condition).
3) Procedure. Children received reinforced learning trials in blocks of 8 trials each
(4 with one melody-picture combination, 4 with the other). On each trial, two pictures
appeared on the left and the right side of the screen, and then a melody played. The
child was asked to select the picture that "goes with" the melody. Once they scored at
least 7/8 in a block of learning trials, they continued to unreinforced test trials.
Sixteen children each learned to associate familiar melodies with familiar pictures;
unfamiliar melodies with familiar pictures; familiar melodies with unfamiliar pictures;
or unfamiliar melodies with unfamiliar pictures.

B. Results and Discussion
Results from the testing trials suggested that performance degraded when feedback
disappeared, so they are not analyzed further. Results from the first block of training
trials (Figure 3)—the only block in which all participants took part—were analyzed in a
2 (Melody Familiarity) x 2 (Picture Familiarity) between-subjects ANOVA. Effects of
Melody Familiarity (F(1,60) = 4.13, p = .046) and Picture Familiarity (F(1,60) = 4.05,
p = .049) were qualified by an interaction (F(1,60) = 12.95, p = .0006). T-tests
suggested that for familiar pictures, there was a strong effect of melody familiarity
(t(30) = 4.04, p = .0003), while for novel pictures, there was no effect of melody
familiarity (t(30) = 1.09, p = .28).

Figure 3. Experiment 1, accuracy in first block of training trials (± standard errors).
whether the familiarity advantage is at the level of perceptual representation, at the
level of learned associations (with verbal materials or visual associates), or both.
Therefore, the next experiment tested whether children perform better on familiar
melodies in a task that more directly assesses perceptual processing: a same-different
discrimination task. If children have more robust perceptual representations for familiar
than for unfamiliar (scrambled) melodies, then discrimination performance should be
better for changes from one familiar melody to another than for changes from one
scrambled melody to another.

Because previous studies have suggested that children are more sensitive to timbre than
to melodic contour for brief tone sequences (Creel, 2014, 2016), the next experiment also
included timbre discrimination trials at two levels of timbre similarity.

… between melody-change and timbre-change trials (t(31) = 0.11, p = .92), while for the
large timbre difference, the timbre-change trials were more accurate than the
melody-change trials (t(31) = 6.31, p < .0001).
semantic associations, with unfamiliar and seemingly unrelated objects makes the learning
task especially difficult. One might wonder whether pre-familiarizing children with
melodies would get around this problem, but at least one previous attempt suggests that
pre-familiarization in a lab setting is insufficient to generate enough familiarity to
facilitate mapping.

Another limitation concerns the role of familiarity in melody discrimination. Familiarity
effects were present, but not large. Why would effects be smaller for a perceptual
discrimination task than for an association task? There are many potential explanations,
but one possibility is that effects are smallest when one can retain a melody in working
memory. Once the listener is required to maintain a representation in a longer-term
store—as would be required for association learning—weaker representations of unfamiliar
melodies would be more apparent.

Nonetheless, the results largely support the specific learning hypothesis. Models of
musical development and musical representation may need to account for greater
specificity in musical memory than is currently acknowledged.

ACKNOWLEDGMENT
Thanks to Emilie Seubert, Nicolle Paullada, and Elizabeth Groves for running child
participants, and to the children, parents, and preschools who made this research
possible.

REFERENCES
Corrigall, K. A., & Trainor, L. J. (2010). Musical enculturation in preschool children:
  acquisition of key and harmonic knowledge. Music Perception, 28(2), 195–200.
Creel, S. C. (2014). Tipping the scales: Auditory cue weighting changes over development.
  Journal of Experimental Psychology: Human Perception and Performance, 40(3),
  1146–1160. doi:10.1037/a0036057
Creel, S. C. (2016). Ups and downs in auditory development: Preschoolers' sensitivity to
  pitch contour and timbre. Cognitive Science, 40, 373–403. doi:10.1111/cogs.12237
Dowling, W. J. (1978). Scale and contour: two components of a theory of memory for
  melodies. Psychological Review, 85(4), 341–354.
Feldman, N. H., Myers, E. B., White, K. S., Griffiths, T. L., & Morgan, J. L. (2013).
  Word-level information influences phonetic learning in adults and infants. Cognition,
  127(3), 427–438. doi:10.1016/j.cognition.2013.02.007
Fennell, C. T., & Werker, J. F. (2003). Early word learners' ability to access phonetic
  detail in well-known words. Language and Speech, 46(2-3), 245–264.
  doi:10.1177/00238309030460020901
Goldinger, S. D. (1998). Echoes of echoes?: An episodic theory of lexical access.
  Psychological Review, 105(2), 251–279.
Hannon, E. E., & Trehub, S. E. (2005a). Tuning in to musical rhythms: Infants learn more
  readily than adults. Proceedings of the National Academy of Sciences, 102(35),
  12639–12643.
Hannon, E. E., & Trehub, S. E. (2005b). Metrical categories in infancy and adulthood.
  Psychological Science, 16(1), 48–55.
Iverson, P., & Krumhansl, C. L. (1993). Isolating the dynamic attributes of musical
  timbre. Journal of the Acoustical Society of America, 94(5), 2595–2603.
Pearce, M. T., & Wiggins, G. A. (2012). Auditory expectation: the information dynamics
  of music perception and cognition. Topics in Cognitive Science, 4, 625–652.
  doi:10.1111/j.1756-8765.2012.01214.x
Stager, C. L., & Werker, J. F. (1997). Infants listen for more phonetic detail in speech
  perception than in word-learning tasks. Nature, 388, 381–382.
Swingley, D., & Aslin, R. N. (2002). Lexical neighborhoods and the word-form
  representations of 14-month-olds. Psychological Science, 13(5), 480–484.
  doi:10.1111/1467-9280.00485
Trainor, L. J., & Trehub, S. E. (1994). Key membership and implied harmony in Western
  tonal music: developmental perspectives. Perception & Psychophysics, 56(2), 125–132.
Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants' perception of melodies: The
  role of melodic contour. Child Development, 55(3), 821–830.
Werker, J. F., Fennell, C. T., Corcoran, K. M., & Stager, C. L. (2002). Infants' ability
  to learn phonetically similar words: Effects of age and vocabulary size. Infancy, 3(1),
  1–30. doi:10.1207/15250000252828226
Prior research has indicated perceptual interactions among pitch, timbre, and loudness.
For example, listeners often judge equal-frequency tones as different in pitch when their
timbre varies, and listeners’ loudness judgments are less accurate when comparison tones
have different frequencies. The present study was designed to investigate how these
perceptual interactions might be made manifest in a performance task. Specifically, we
investigated the potential effects of timbral instructions (e.g., “Play with a dark sound”)
on the cent deviation, intensity, and tone quality of sustained tones. Collegiate trumpeters
performed repetitions of two sustained tones (written C4 and C5) in three conditions:
neutral instruction (“Perform this note”), dark-timbre instruction (“Perform this note with
a dark sound”), and bright-timbre instruction (“Perform this note with a bright sound”).
Four contrasting presentation orders were used. Dependent variables included two
acoustic measures (cent deviation from equal-tempered standard frequency [expressed in
absolute value] and intensity [expressed as dB]) and one perceptual measure (expert tone
quality rating). Split-plot ANOVAs were conducted to examine the effects of instruction
(neutral, dark, or bright), tone (C4 or C5), and order on each of these dependent variables.
Preliminary results from a pilot study indicate a significant main effect of timbral
instruction on performed cent deviation, F(2, 4) = 7.62, p = .04, ηp2 = .79, with
post-hoc comparisons indicating significantly higher deviation in response to the dark-
timbre instruction than to the neutral instruction (p = .04). No other main effects or
interactions were observed in the pilot study. Data collection for the primary study
(expected N = 30) is ongoing and will be complete by May 1. Results will provide useful
implications for music pedagogy and further conclusions regarding perceptual
interactions among pitch, timbre, and loudness.
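Cent deviation from an equal-tempered standard follows the standard formula 1200 · log2(f / f_ref). A minimal sketch of the measure, with illustrative frequencies rather than data from the study:

```python
import math

def cents_deviation(f_measured, f_reference):
    """Deviation of a measured frequency from a reference, in cents.

    Positive values are sharp, negative flat; the study reports the
    absolute value of this quantity.
    """
    return 1200.0 * math.log2(f_measured / f_reference)

# Example: a tone measured at 262.5 Hz against an equal-tempered C4
# reference of ~261.63 Hz (A4 = 440 Hz) is about +5.7 cents sharp.
deviation = cents_deviation(262.5, 261.63)
```

An octave error corresponds to exactly 1200 cents, which makes the measure convenient across registers.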
ISBN 1-876346-65-5 © ICMPC14 648 ICMPC14, July 5–9, 2016, San Francisco, USA
Timbre allows us to quickly identify threatening animal growls, pick out a friend’s nasal
voice among a crowd, and enjoy a rich variety of musical textures, from the bright buzz
of a muted trumpet to the crisp attack of a snare drum. Often defined as tone color, or the
unique quality of a sound, timbre cannot be attributed simply to pitch, intensity, duration,
or location. However, despite its critical contributions to our everyday auditory
experiences, it is unclear whether timbre truly represents a meaningful auditory
configuration rather than an arbitrary collection of low-level features: if so, listeners
should adapt to timbre, and this adaptation should generalize across changes in lower-
level features (e.g., pitch). Although adaptation to simple auditory features (e.g.,
frequency, level) is well studied, it is unknown whether or how adaptation to the more
complex, holistic structure of natural sounds might influence perception. Here, we used a
range of complex sounds equated in basic auditory features to investigate timbre
adaptation. In each trial, participants adapted to one of two sounds (e.g., clarinet and
oboe) that formed the endpoints of a morphed continuum. After repeated exposure to one
of the two adapter sounds, participants judged the identity of the various morphed
sounds. We found consistent negative aftereffects resulting from timbre adaptation, such
that adapting to sound A significantly altered perception of a neutral morph between A
and B, making it sound more like B. Importantly, these aftereffects were robust and
invariant to changes in pitch, suggesting that they stem from adaptation to the global,
spectro-temporal configuration of complex sounds and can survive the lower-level
feature shifts that naturally vary (largely independently of timbre) in the environment.
This adaptation, which is likely exploited by composers, could serve to enhance our
sensitivity to novel or rare auditory objects, such as a new soloist in an orchestra.
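A morphed continuum can be pictured as interpolation between the two endpoint sounds. The study's actual morphing procedure is not described in this abstract; the toy sketch below simply interpolates made-up magnitude spectra:

```python
def morph_spectrum(spec_a, spec_b, position):
    """Linear interpolation between two magnitude spectra.

    position = 0.0 -> pure A, 1.0 -> pure B, 0.5 -> the 'neutral' morph.
    (A stand-in for the study's morphing method, which is not specified.)
    """
    return [(1.0 - position) * a + position * b for a, b in zip(spec_a, spec_b)]

clarinet = [1.0, 0.1, 0.8, 0.05]   # toy magnitude spectra, not real data
oboe     = [1.0, 0.9, 0.3, 0.60]
neutral  = morph_spectrum(clarinet, oboe, 0.5)
# neutral ≈ [1.0, 0.5, 0.55, 0.325]
```

Real timbre morphs are typically built with more sophisticated spectral-envelope interpolation, but the continuum logic is the same: adaptation to an endpoint shifts where along `position` the category boundary is heard.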
All about that bass: Timbre and groove perception in synthesized bass
lines
Ivan Tan, Ethan Lustig
Department of Music Theory, Eastman School of Music, University of Rochester, Rochester, NY, USA
Groove is defined as the urge to move one's body to music. Most groove research has
focused on temporal features like microtiming and syncopation, though previous studies
have shown correlations between activity in certain frequency bands and increased
movement. This study further examines the relationship between timbre and groove. Four
brief songs consisting of drum samples and a bass synthesizer were composed in an
electronic dance music style and presented to 102 participants on headphones. Each song
had four audio filter conditions (no filter, low-pass, high-pass, band-pass) applied only to
the synthesizer. Participants heard each stimulus three times, and rated both groove and
liking on a five-point scale.
For both groove and liking, a main effect was found for filter—participants gave higher
ratings to the filter conditions that preserved low-frequency energy (low-pass and no
filter), and lower ratings to the filter conditions that removed low-frequency energy
(high-pass and band-pass). No song-filter interaction was found, and no repetition effect
was found.
The results suggest—not surprisingly—that people groove more to, and like more, the
filter conditions that preserve low-frequency energy. The lack of song-filter interaction
demonstrates that within the same style, timbre can determine groove and liking, even
across different melodic and rhythmic contexts. Our results are consistent with other
recent empirical groove research on timbre, and we consider recent physiological
explanations for these findings. We also analyze our data for gender differences in filter
ratings, and consider cultural and evolutionary explanations, as some research suggests
that males prefer exaggerated bass. Finally, we consider the effect of dance experience on
filter ratings, based on questionnaire data.
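The filter manipulation can be illustrated with a crude one-pole filter pair. The study's actual filters, cutoffs, and signals are not specified, so everything below is an assumption for illustration only:

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Crude one-pole low-pass: preserves low-frequency (bass) energy."""
    dt = 1.0 / sample_rate
    alpha = dt / (1.0 / (2.0 * math.pi * cutoff_hz) + dt)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)      # smooth toward the input
        out.append(y)
    return out

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """High-pass as the low-pass residual: removes bass energy."""
    low = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    return [x - y for x, y in zip(samples, low)]

# A 50 Hz "bass" tone passes a 500 Hz low-pass almost untouched but is
# strongly attenuated by the complementary high-pass (8 kHz sample rate).
sr = 8000
bass = [math.sin(2.0 * math.pi * 50.0 * n / sr) for n in range(sr)]
lp = one_pole_lowpass(bass, 500.0, sr)
hp = one_pole_highpass(bass, 500.0, sr)
```

Comparing the energy of `lp` and `hp` for a bass-heavy signal makes the low-pass/high-pass contrast in the stimuli concrete.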
There is a tight link between the rhythm of external information (e.g., auditory) and
movement. It can be seen when we spontaneously or deliberately move on the beat of
music (e.g., in dance), or when we particularly enjoy performing sport activities (e.g.,
running or cycling) together with music. The propensity to match movements to rhythm
is natural, develops precociously and is likely hard-wired in humans. The BeatHealth
project exploits this compelling link between rhythm and movement for boosting motor
performance and enhancing health and wellness using a personalized approach. This goal
is achieved by creating an intelligent technological architecture for delivering rhythmic
auditory stimuli using state-of-the-art mobile and sensor technology. Optimal stimulation
is based on experiments performed in the lab and in more ecological conditions. The
beneficial effects of BeatHealth are evaluated in two populations: patients with
Parkinson’s disease (walking) and healthy citizens of various ages with moderate
physical activity (running). The main findings and technological achievements of this
project, still ongoing, will be presented in this Symposium.
Gait dysfunctions in Parkinson's disease (PD) can be partly relieved by rhythmic auditory
cueing (i.e., asking patients to walk with a rhythmic auditory stimulus such as a simple
metronome or music). The beneficial effect on gait is visible immediately in terms of
increased speed and stride length. In standard rhythmic auditory cueing, the interval
between sounds or musical beats is fixed. However, this may not be an optimal
stimulation condition, as it fails to take into account that motor activity is intrinsically
variable and tends to be correlated in time. In addition, patients show typical variability
(accelerations and decelerations) during walking which may lead them to progressively
desynchronize with the beat. Stimulation that embeds biological variability and can
adapt in real time to patients’ variable step times could be a valuable alternative for
optimizing rhythmic auditory cueing. These possibilities have been examined
in the BeatHealth project, by asking a group of PD patients (n = 40) and matched controls
(n = 40) to walk with various rhythmic auditory stimuli (a metronome, music, and
amplitude-modulated noise) under different conditions. In some conditions, inter-beat
variability was manipulated by embedding biological vs. random variability in the
stimulation. Another manipulation adapted the stimulus beats in real time to
participants’ step times using dedicated algorithms. Performance was quantified by way
of the typical spatio-temporal gait parameters and additional measures akin to complex
systems (e.g., long-range correlations). The results indicated that biologically-variable
stimulation which is adapted to patients’ steps is the most efficient for improving gait
performance. These findings provide useful guidelines for optimizing music-driven
rehabilitation of gait in PD.
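The real-time adaptation could, in principle, be sketched as a simple phase correction that nudges each cue toward the walker's observed steps. BeatHealth's dedicated algorithms are not described in this abstract; the function and gain below are illustrative assumptions:

```python
def next_beat_time(last_beat, beat_period, last_step, alpha=0.3):
    """Schedule the next auditory cue so the beat drifts toward the steps.

    last_beat   : time of the previous cue (s)
    beat_period : nominal inter-beat interval (s)
    last_step   : time of the most recent detected footfall (s)
    alpha       : correction gain (0 = fixed metronome, 1 = full correction)
    """
    asynchrony = last_step - last_beat          # how far the step led or lagged the cue
    return last_beat + beat_period + alpha * asynchrony

# A walker stepping slightly late pulls the next cue later as well:
t = next_beat_time(last_beat=10.0, beat_period=1.0, last_step=10.1)
# t ≈ 11.03 with alpha = 0.3, vs. 11.0 for a fixed metronome
```

With `alpha = 0` this reduces to standard fixed-interval cueing; a partial gain lets the stimulus follow the patient's accelerations and decelerations without mirroring them entirely.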
Although the inclination to run with music seems to transcend places, generations, and
cultures, it is unclear where this ergogenic power of music comes from. During running,
it may be due merely to music’s tendency to elicit pleasure, increase motivation, or draw
attention away from athletes’ feelings of fatigue. Alternatively, music may foster efficient
coupling of biological rhythms that are key contributors to performance, such as
locomotion and respiration. In this talk, we present the main concepts underlying
rhythmical entrainment in biological systems, and their potential consequences in terms
of movement stability, overall performance and wellness. We present the results of
several experiments performed in young adults running on a treadmill or outside the lab.
These experiments exploit our intelligent running architecture — BeatRun — able to
deliver embodied, flexible, and efficient rhythmical stimulation. Our results show the
often beneficial but sometimes detrimental effects of running with music, the entrainment
asymmetry between the locomotor and respiratory systems, and large individual differences.
These results are discussed in terms of the dynamics of flexible coupling between
running and breathing and rhythmical entrainment. Consequences for sport performance,
wellness, efficiency in daily work, and the development of new music technologies are
envisaged.
The BeatHealth project delivers a music-driven application designed to improve the
wellness of healthy runners and people with Parkinson’s disease by suitably modulating
the coupling of music to their movement. Specifically, musical stimulation is adapted in
real time to synchronize with human gait so as to motivate the user and reinforce,
stabilize, or modify their gait characteristics. This research focused on the development
of a mobile platform, comprising a mobile phone application and low-cost wearable
wireless sensors. This mobile platform conveniently translates the BeatHealth laboratory
findings and algorithms to an embodiment suitable for real world use. The development
process demanded an interdisciplinary approach which integrated the fields of movement,
music, and technology. Sensor technology was developed to accurately track the
kinematic activity which formed the input to the musical stimulation process running on
the mobile phone. This process selects suitable audio tracks from a playlist and
dynamically alters their playback rate in real time so that the rhythm of the movement
and the music are synchronized. Multiple sources of delay and variability in the
technology were measured and compensated for to ensure that the desired
synchronization could be achieved. The system also required tuning so that it responded
appropriately to natural movement variability and intentional movement changes. The
mobile system has been validated in a variety of settings using laboratory equipment.
Ultimately, the BeatHealth mobile platform provides a synchronization capability that
has not been available to date for music and movement applications. Moreover, the
technical architecture of the platform can be adapted to synchronize with essentially any
repetitive movement or rhythmic pulse which can be described as an oscillator. This
gives the platform potential applicability in a wide range of studies examining the
complex relationship between music and movement.
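The rate-alteration step can be sketched as a ratio of measured cadence to track tempo, clamped so time-stretching stays perceptually acceptable. The function and limits below are illustrative assumptions, not the platform's actual implementation:

```python
def playback_rate(step_cadence_spm, track_tempo_bpm, max_stretch=0.2):
    """Playback-rate factor aligning a track's beat to the user's cadence.

    step_cadence_spm : measured steps per minute from the wearable sensor
    track_tempo_bpm  : the audio track's nominal tempo
    max_stretch      : clamp so the time-stretch stays within +/-20%
    """
    rate = step_cadence_spm / track_tempo_bpm
    low, high = 1.0 - max_stretch, 1.0 + max_stretch
    return min(max(rate, low), high)

# Running at 168 steps/min to a 160-BPM track -> play the track ~5% faster:
rate = playback_rate(168.0, 160.0)   # ≈ 1.05
```

The clamp reflects the design concern mentioned above: a playlist selector would first pick tracks whose tempo is already close to the cadence, so the residual stretch stays small.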
In recent years, mobile, online, and social-media services have yielded an unprecedented
volume of user data reflecting various forms of musical engagement. While these data
typically lack the underlying experimental rigor of controlled laboratory studies, they
surpass traditional experimental data in size, scope, and ecological validity. Analytical
insights drawn from such data thus promise to provide significant contributions to our
understanding of numerous musical behaviors. Embodying the Bay Area’s unique
integration of academia and industry, this symposium aims to introduce the ICMPC
community to music cognition research conducted in an industrial setting. We convene
researchers from four well-known music technology companies, who will each explore a
different aspect of musical engagement: Discovery (Shazam), performance (Smule),
education (Yousician), and recommendation (Pandora). The symposium will comprise
back-to-back presentations followed by a short panel discussion and audience Q&A
facilitated by the discussant (Google). All symposium participants have research
experience in both academic and industrial settings, and are excited to engage in a
discussion of research goals, practices, and practicalities bridging academia and industry.
It is hoped that this symposium will open a dialogue between academia and industry
regarding potential benefits and challenges of incorporating industrial data into music
perception and cognition research. The symposium will be relevant to conference
delegates performing research in any of the above topics, as well as anyone interested in
using industrial data to broaden the scope of research in musical practices and behaviors.
Attendees will come away with a practical understanding of requisite tools and technical
skills needed to perform music cognition research at scale, as well as perspective on the
impact of data size on hypothesis formulation, experimentation, analysis, and
interpretation of results.
When a listener deliberately seeks out information about a piece of music, a number of
cognitive, contextual, and musical factors are implicated – including the listener’s
musical tastes, the listening setting, and features of the musical performance. Such
drivers of music discovery can now be studied on a massive scale by means of user data
from Shazam, a widely used audio identification service that returns the identity of an
audio excerpt – usually a song – in response to a user-initiated query. Shazam’s user data,
comprising over 600 million queries per month from over 100 million monthly users
across a catalog of over 30 million tracks, have been shown to predict hits, reflect
geographical and temporal listening contexts, and identify communities of listeners. We
begin this talk with an overview of the Shazam use case and its user data attributes. Next,
we present findings from an exploratory study utilizing Shazam query ‘offsets’, the
millisecond-precision timestamps specifying when in a song a query was initiated. Here
we wish to see whether there exists a relationship between the distribution of offsets over
the course of a song and specific musical events. We aggregated over 70 million offsets
corresponding to ten top hit songs of 2015. Taking into account the estimated delay
between intent-to-query and actual query times, results thus far suggest that offsets are
not uniformly distributed over the course of a song, but in fact cluster around specific
musical events such as the initial onset of vocals and transitions between song parts. We
consider the implications and potential confounds surrounding these findings in terms of
music-listening contexts, distinguishing between music discovery and musical tastes, and
practical considerations surrounding Shazam’s functionality. We conclude with a
discussion of potential uses of Shazam data to address central questions in the music
perception and cognition field.
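Aggregating offsets into such a distribution can be sketched as coarse histogram binning. The bin size, data, and helper names below are illustrative, not Shazam's pipeline; a fixed intent-to-query delay estimate could be subtracted from each offset before binning:

```python
from collections import Counter

def offset_histogram(offsets_ms, bin_s=1.0):
    """Bin query offsets (milliseconds into the song) into coarse time bins."""
    return Counter(int(ms / 1000.0 / bin_s) for ms in offsets_ms)

def hotspots(counts, top=3):
    """Bins with the most queries: candidate 'interesting' song moments."""
    return [b for b, _ in counts.most_common(top)]

# Toy offsets clustering just after a hypothetical vocal entry near 15 s:
queries = [14950, 15200, 15400, 15900, 42000, 15100]
h = offset_histogram(queries)
# hotspots(h, top=1) -> [15]
```

At Shazam's scale the same aggregation would run over tens of millions of offsets per song, and peaks in the histogram could then be aligned against annotated musical events.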
We studied the learning process of students who practice to play a musical instrument in
the self-study setting. The goal was to discover individual differences and commonalities
in the students’ practice habits and learning efficiency. Data collection took place via
Yousician, a music education platform with over five million users. In the application,
users are prompted to perform an exercise and the acoustic signal from the instrument is
then used to evaluate whether the correct notes were played at the right times, and to
deliver appropriate real-time feedback. Students tend to develop varying practice habits,
even in the restricted setting where the syllabus is pre-determined: Some prefer to perfect
each exercise before moving on to the next one, whereas others push forward faster.
Some revisit previously learned material from time to time, whereas others do not – just
to mention a few examples. We carried out statistical analysis on a random set of 25,000
subscribed users, in order to study the correlation between certain practice habits and the
resulting learning efficiency. Here learning efficiency was defined as the cumulative
practice time required to pass skill tests at varying levels of difficulty. We also studied
how pitch and timing inaccuracies in users’ performances correlate with the music
content, in order to identify factors that make a song difficult to perform. The use of such
data for optimizing computer-assisted learning and the syllabus contents will be
discussed.
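The stated definition of learning efficiency, cumulative practice time required to pass skill tests, can be sketched as follows; the session and test structures are invented for illustration and are not Yousician's data model:

```python
def cumulative_time_to_pass(sessions, skill_tests_passed):
    """Cumulative practice time (s) accrued by the moment each test was passed.

    sessions           : list of (start_time, duration_s) practice sessions,
                         sorted by start_time
    skill_tests_passed : list of (level, pass_time), sorted by pass_time

    Approximation: a session counts in full once it has started before the
    pass time, even if it straddles it.
    """
    result, total, i = {}, 0.0, 0
    for level, pass_time in skill_tests_passed:
        while i < len(sessions) and sessions[i][0] <= pass_time:
            total += sessions[i][1]
            i += 1
        result[level] = total
    return result

sessions = [(0, 600), (1000, 900), (3000, 300)]
passed = [(1, 1500), (2, 3500)]
eff = cumulative_time_to_pass(sessions, passed)
# eff == {1: 1500.0, 2: 1800.0}
```

Comparing these per-level totals across users with different practice habits (perfectionists vs. fast movers, revisiters vs. non-revisiters) is then a straightforward grouping operation.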
Pandora began with The Music Genome Project, the most sophisticated taxonomy of
musicological data ever collected and an extremely effective content-based approach to
music recommendation. Its foundation is based on human music cognition, and how an
expert describes and perceives the complex world of a music piece. But what happens
when you have a decade of additional data points, given off by over 250 million
registered users who have created 8+ billion personalized radio stations and given 60+
billion thumbs? Unlike traditional recommender systems such as Netflix or Amazon,
which recommend a single item or static set, Pandora provides an evolving set of
sequential items, and needs to react in just milliseconds when a user is unhappy with a
proposed song. Furthermore, several factors (musicological, social, geographical,
generational) play a critical role in deciding what music to play to a user, and these
factors vary dramatically for each listener. This session will look at how Pandora’s
interdisciplinary team goes about making sense of these massive data sets to successfully
make large-scale music recommendations. Following this session the audience will have
an in-depth understanding of how Pandora uses Big Data science to determine the perfect
balance of familiarity, discovery, repetition, and relevance for each individual listener,
how it measures and evaluates user satisfaction, and how our online and offline
architecture stack plays a critical role in our success. I also present a dynamic ensemble learning
system that combines curational data and machine learning models to provide a truly
personalized experience. This approach allows us to switch from a lean-back experience
(exploitation) to a more exploration mode to discover new music tailored specifically to
users’ individual tastes. I will also discuss how Pandora, a data-driven company, makes
informed decisions about features that are added to the core product based on results of
extensive online A/B testing.
information from long term memory (LTM); b) transformation of information in WM; and c) substitution of information in WM, which inhibits and replaces information that is no longer relevant. Both retrieval and transformation sub-processes could represent a common source of variance in WM capacity and updating, whereas substitution could be specific to the updating function and could allow the differentiation between updating and WM capacity (see also Ecker, Lewandowsky & Oberauer, 2014). Taking into account the cognitive requirements of SR music, and the involvement of the updating executive function in high-level cognition, it could be supposed that updating is in part responsible for differences in this complex task. Moreover, we also consider that the underlying processes of retrieval/transformation and substitution described above could contribute differentially to SR performance at both the efficiency and developmental levels: the processes of retrieval and transformation could be necessary to maintain in WM the information printed in the score and to select the adequate motor responses from LTM; on the other hand, the process of substitution could be necessary to suppress no longer relevant information from WM, replace it with the new incoming information from the score, and inhibit inadequate motor responses from LTM.

To our knowledge, the role of the sub-processes underlying the updating executive function in SR performance has not yet been explored. Moreover, the vast majority of studies considering cognitive processes in SR were carried out with adult, expert pianist participants. For this reason, we established two main goals: first, to explore the relationship between updating executive function sub-processes and SR rhythmic accuracy across development and expertise levels; and second, to analyze the existence of differences between good and poor sight readers, in terms of rhythmic accuracy, that could be associated with updating executive function sub-processes.

We first hypothesized that, regardless of the effect of age and expertise level, the processes of retrieval/transformation and substitution underlying the updating executive function would be significantly related to SR rhythmic accuracy. Second, we also predicted that good sight readers would outperform poor sight readers in the updating task, which would be reflected in both the retrieval/transformation and substitution processes.

II. METHOD

A. Participants

The sample for this study consisted of 131 students of different ages and levels of expertise: 37 children aged 10-11 (M = 10.9, SD = .44; 26 girls, 11 boys), who had been receiving music lessons for at least 2 years (M = 2.9, SD = .32) on melodic instruments (10 violin, 3 viola, 3 cello, 3 double bass, 9 flute, 1 oboe, 2 clarinet, 3 saxophone, 1 trumpet, 2 trombone), formed the novices-low expertise group; 31 pre-adolescents aged 12-13 (M = 13.2, SD = .48; 25 girls, 6 boys), who had been receiving music lessons for at least 4 years (M = 4.6, SD = .44) on melodic instruments (11 violin, 4 viola, 3 cello, 1 double bass, 4 flute, 1 oboe, 4 clarinet, 2 saxophone, 1 trumpet), formed the novices-high expertise group; 32 adolescents aged 15-16 (M = 15.2, SD = .49; 21 girls, 11 boys), who had been receiving music lessons for at least 7 years (M = 6.8, SD = .28) on melodic instruments (12 violin, 1 viola, 4 cello, 2 flute, 4 oboe, 4 clarinet, 2 saxophone, 1 horn, 2 trumpet), formed the intermediate expertise group; and 31 young adults (M = 22.3, SD = 1.34; 14 women, 17 men), who had been receiving music lessons for at least 12 years (M = 12.9, SD = .46) on melodic instruments (2 violin, 1 viola, 2 cello, 8 flute, 4 oboe, 2 clarinet, 2 bassoon, 2 saxophone, 4 horn, 1 trumpet, 2 trombone, 1 tuba), formed the high expertise group.

B. Materials

Updating Task. We used a Spanish adaptation (Carriedo, Corral, Montoro, Herrero & Rucián, 2016) of the updating task developed by De Beni and Palladino (2004). This updating task allows differentiating the retrieval/transformation and substitution components of the updating process, posing variable demands on memory and suppression, which permits the study of the interaction of the two processes. Importantly, this task also provides different indexes for the process of inhibition/interference: suppression of information in WM, that is, the ability to inhibit in WM information that is irrelevant to the goals of the task, and proactive interference, understood as the capacity to inhibit information stored in LTM that is no longer relevant for the ongoing task. This task was selected because it has been shown to be a reliable measure in the study of individual differences in various areas, such as arithmetic problem-solving (Iglesias-Sarmiento et al., 2014; Passolunghi & Pazzaglia, 2005), reading comprehension (Carretti, Borella, Cornoldi, & De Beni, 2009; Iglesias-Sarmiento et al., 2014), and the description of cognitive changes associated with age (Carriedo et al., 2016; De Beni & Palladino, 2004; Lechuga, Moreno, Pelegrina, Gómez-Ariza, & Bajo, 2006).

The task had a total of 24 lists (20 experimental lists and 4 practice lists), each containing 12 words. The words were names of objects, animals, or body parts of different sizes, and abstract common nouns. Each list included words to be recalled (relevant words), words to be discarded (irrelevant words), and filler words. The number of each kind of word in each list varied depending on the experimental condition. Thus, the number of relevant words in each list varied between 3 (low memory load) and 5 (high memory load). The number of irrelevant words varied between 2 (low suppression) and 5 (high suppression). Finally, the number of abstract filler words varied from 2 to 7. The composition of the lists as a function of memory load and suppression can be seen in Table 1. The final 24 lists were distributed across 4 experimental conditions of 6 lists each. One list of each experimental condition was treated as a practice list. The total number of words to be recalled across all condition lists was 80 (practice lists excluded): 25 in each high load condition and 15 in each low load condition. A list example is: árbol (tree), autobús (bus), piscina (pool), sofá (couch), cesta (basket), tema (matter), acto (act), flor (flower), dedo (finger), lápiz (pencil), oreja (ear), patata (potato).

The dependent variables were the percentage of correct recall, as an index of retrieval/transformation of information in WM; same-list intrusions, as an index of suppression of information in WM; and previous-list intrusions, as an index of proactive interference.

Sight Reading Task. Two different pieces from the “Sound at Sight Series” (Trinity College London) were selected by instrument specialists for each instrument and level. The selection criterion was the time bar, choosing one binary (A) and one ternary (B)
time bar for all the participants. Taking into account the musical of 12 items, a different beep and a big question mark on the
knowledge by level of expertise, we decided to select a 2/4 or screen asked the participants to recall the 3 or 5 smallest items
4/4 piece as a binary time bar for all the levels and instruments. of the current list by verbal response. To continue with the next
Following the same criteria, we selected the ternary pieces list, participants had to press the space bar.
according to both time bar and subdivision. Thus, we selected In the SR task, all participants performed two consecutive
a 3/4 piece (ternary time bar and binary subdivision) for the scores. They were instructed to carefully look at the first score
novices-low expertise group, a 6/8 or 12/8 piece (binary time (tempo, bar, key signature), and 30 seconds later they were
bar and ternary subdivision) for the novices-high expertise instructed to play trying not to make breakdowns or repetitions.
group, and a 6/8, 12/8, 3/8 or 9/8 (binary or ternary time bar The same procedure was carried out for the second score. All
and ternary subdivision) for the intermediate and expertise the performances were audio-registered (SONY£ IC Recorder
levels. We selected rhythmic accuracy as an index of SR ICD-UX71F) and scored by two independent expertise
performance taking into account that previous research has musician judges.
been determined that it is a reliable measure of SR execution
(e.g. Drake & Palmer, 2000; Elliot, 1982; Gromko, 2004; III. RESULTS
Hayward & Gromko, 2009; Kopiez & Lee, 2008; McPherson, For the analysis of the results, we previously tested whether
1994; Mishra, 2014; Waters et al.,1997; Wurtz, Mueri, & there were differences by age and level of expertise in binary
Wiesendanger, 2009). Rhythmic accuracy was measured by and ternary SR rhythmic accuracy measures. Thus, two
the correct proportion between notes and its corresponding independent univariate ANOVAs were carried out. The results
rests. showed that there were no significant differences in both,
Table 1. Composition of the lists as a function of the experimental binary (F (3, 127) = .533, p = .66, ηp2 = .01) and ternary (F (3,
conditions. 127) = .554, p = .65, ηp2 = .01) SR tests by age and level of
expertise. Thus, we grouped all the participants for subsequent
Low load/low Low load/high High load/low High
analysis.
suppression suppression suppression load/high
Next, we tested whether binary and ternary SR tests were
significantly different. Although there was a highly significant
(5 Lists) (5 Lists) (5 Lists) suppression correlation between them (r = .917**, p < .001), Student t, test
showed that ternary test means were significantly lower than
(5 Lists) binary ones (t = 3.574, p < .001) for all the participants. Thus,
we decided to analyze them separately.
In order to test our first hypothesis, that is, that the processes
3 Relevant 3 Relevant 5 Relevant 5 Relevant
of retrieval/transformation and substitution underlying
items items items items updating executive function would be significantly related to
SR rhythmic accuracy, a correlational analysis was carried out.
2 Irrelevant 5 Irrelevant 2 Irrelevant 5 Irrelevant Pearson correlations are shown in Table 2.
items items items items Table 2. Results of Pearson´s correlations (*p<.05, **p<.01).
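The preliminary comparisons above (a Pearson correlation between the two tests, and a paired Student t test of their means) can be sketched as follows. This is a minimal illustration with synthetic stand-in data, not the study's measurements; only the sample size (N = 131, from the reported df = 3, 127) is taken from the text, and SciPy is assumed available:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the 131 participants' binary- and ternary-test
# rhythmic-accuracy proportions; NOT the study's data. Ternary scores are
# drawn slightly lower, mirroring the reported direction of the difference.
rng = np.random.default_rng(0)
binary = rng.uniform(0.6, 1.0, size=131)
ternary = np.clip(binary - rng.normal(0.05, 0.03, size=131), 0.0, 1.0)

# Pearson correlation between the two tests (the paper reports r = .917).
r, p_r = stats.pearsonr(binary, ternary)

# Paired Student t test of binary vs. ternary means (the paper reports
# t = 3.574, p < .001, with ternary means significantly lower).
t, p_t = stats.ttest_rel(binary, ternary)

print(f"r = {r:.3f}, t = {t:.3f}, p = {p_t:.3g}")
```

With any data of this shape, a high correlation plus a significant paired t test reproduces the paper's rationale for analyzing the two tests separately: the tests rank participants similarly, yet differ systematically in level.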
[Partial table: distribution of participants across expertise levels in the good and poor sight-reader groups; top rows lost in extraction — Novices high: 10/10; Intermediate: 9/6; Expert: 8/12; Total N: 38/38.]

For all the dependent variables we conducted a mixed 2 × (2 × 2) ANOVA, with SR group (good/poor) as a between-subject factor, and memory load (high/low) and level of suppression (high/low) as within-subject factors. One-tailed test results are shown unless otherwise stated. Figure 1 shows the mean proportion of correct answers and errors by group (good and poor sight readers).

1) Proportion of recall of critical items.

The results showed a significant effect of memory load (F(1, 74) = 200.605, p < .001, ηp2 = .73): the proportion of correct responses was lower in the high load condition (75%) than in the low load condition (92%) for all the participants. Importantly, the results also showed a significant effect of group (F(1, 74) = 5.043, p < .05, ηp2 = .06): the proportion of correct responses was significantly higher in good sight readers (86%) than in poor sight readers (81%).

[Figure 2. Proportion of same-list intrusions by group and memory load conditions. Bar chart (y-axis .00–.18) comparing low-load and high-load conditions for good and poor sight readers; not reproduced here.]

3) Proportion of previous-lists intrusions.

The ANOVA showed a significant main effect of memory load (F(1, 74) = 17.822, p < .001, ηp2 = .19): the proportion of previous-lists intrusions was higher in the high load condition (4%) than in the low load condition (1%) for all the participants.
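Partial eta squared values like those reported above can be cross-checked from the F statistic and its degrees of freedom alone, since ηp2 = (F × df1) / (F × df1 + df2). A quick sketch in plain Python, using only the statistics quoted in this section:

```python
def partial_eta_squared(f: float, df1: int, df2: int) -> float:
    """Recover partial eta squared from an F statistic and its
    degrees of freedom: etap2 = F*df1 / (F*df1 + df2)."""
    return (f * df1) / (f * df1 + df2)

# The three effects reported above, each tested with df = (1, 74).
for f, reported in [(200.605, 0.73), (5.043, 0.06), (17.822, 0.19)]:
    recovered = partial_eta_squared(f, 1, 74)
    print(f"F = {f}: etap2 = {recovered:.2f} (reported {reported})")
```

All three recovered values round to the effect sizes quoted in the text, which is a useful internal-consistency check when reading reported ANOVA results.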
IV. DISCUSSION

Music performance at sight is a complex dual task that requires the transcription of music printed in a score into sounds, in real time and without the benefit of practice. Taking into account the temporary nature of SR performance, we suggested that it could require the involvement of the updating executive function sub-processes, that is, retrieval/transformation and substitution. Our first aim was to explore the relationship between the updating executive function sub-processes (retrieval/transformation and substitution) and SR rhythmic accuracy, a reliable measure of SR (Drake & Palmer, 2000; Elliott, 1982; Gromko, 2004; Hayward & Gromko, 2009; Kopiez & Lee, 2008; McPherson, 1994; Mishra, 2014; Waters et al., 1997; Wurtz et al., 2009), across development and expertise level. We hypothesized that, regardless of the effect of age and expertise, the processes of retrieval/transformation and substitution underlying the updating executive function would be significantly related to SR rhythmic accuracy. Moreover, we also hypothesized that the differences between good and poor sight readers in terms of rhythmic performance could be due to updating sub-processes.

In relation to our first hypothesis, the results showed that the process of retrieval/transformation was significantly related to SR rhythmic accuracy in the high load/low suppression condition, regardless of age and level of expertise. Thus, it seems that the ability to efficiently retrieve and transform information in WM is related to better performance at sight, probably helping performers to select the adequate rhythmic proportion between notes from the first stages of music training.

Moreover, the results also showed that rhythmic accuracy was significantly related to inhibition in WM in the high load/low suppression condition, but only in the binary SR test. This pattern of correlations seems to indicate that the processes of SR and of updating information in WM could be associated only under conditions of high memory and low suppression demands, that is, when WM is highly loaded but not overwhelmed.

With regard to our second hypothesis, globally, the results showed that good sight readers outperformed poor sight readers both in maintenance and in inhibition in WM.

Specifically, regarding rhythmic accuracy in SR performance, our results showed that good sight readers had a greater capacity for maintaining information in WM than poor sight readers, regardless of their level of expertise and age, or of the demands on memory load and suppression. These results could also support the implication of WM capacity in SR performance, as Lehmann and Kopiez (2009) also suggested, at least in terms of rhythmic accuracy.

In relation to the substitution process, the updating task used in our study allowed us to differentiate between two processes: suppression of information in WM, that is, the ability to inhibit in WM information that is irrelevant to the goals of the task; and proactive interference, understood as the capacity to inhibit information stored in LTM that is no longer relevant for the ongoing task. The results showed that good sight readers were less affected by the increase in memory load when inhibiting irrelevant information from WM, whereas there were no significant differences between good and poor sight readers in inhibiting information from LTM.

These results could be interpreted in terms of the limitations of the attentional system to simultaneously maintain and inhibit information in WM (Ecker et al., 2014). Thus, it is possible that good sight readers have a greater ability to inhibit irrelevant information in WM when memory load increases, allowing them to more efficiently maintain the relevant information.

In summary, we found that, in terms of rhythmic accuracy, good sight readers were significantly more efficient than poor sight readers in both the retrieval/transformation and substitution processes underlying the updating executive function, regardless of the effect of age and expertise level.

A limitation of this work is that SR performance should include not only rhythmic accuracy but also other measures such as pitch accuracy, continuity, articulation accuracy and expressiveness. It is possible that the relationship between updating sub-processes and these measures could reflect differences by age and expertise level.

ACKNOWLEDGMENT

We would like to thank Mª Teresa Lechuga for providing us with the word lists used in her experiment. We also thank Celia Viciana for her help in data collection, and the students who have collaborated with this research. This work was financially supported by MINECO project EDU2011-22699.

REFERENCES

Barrouillet, P. (1996). Transitive inferences from set-inclusion relations and working memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 1408–1422. http://dx.doi.org/10.1037/0278-7393.22.6.1408
Barrouillet, P., & Lecas, J. (1999). Mental models in conditional reasoning and working memory. Thinking & Reasoning, 5, 289–302. http://dx.doi.org/10.1080/135467899393940
Belacchi, C., Carretti, B., & Cornoldi, C. (2010). The role of working memory and updating in Coloured Raven Matrices performance in typically developing children. European Journal of Cognitive Psychology, 22(7), 1010–1020. http://dx.doi.org/10.1080/09541440903184617
Bull, R., & Lee, K. (2014). Executive functioning and mathematics achievement. Child Development Perspectives, 8(1), 36–41. http://dx.doi.org/10.1111/cdep.12059
Carretti, B., Borella, E., Cornoldi, C., & De Beni, R. (2009). Role of working memory in explaining the performance of individuals with specific reading comprehension difficulties: A meta-analysis. Learning and Individual Differences, 19, 246–251.
Carriedo, N., Corral, A., Montoro, P., Herrero, L., & Rucián, M. (2016). Development of the updating executive function: From 7-year-olds to young adults. Developmental Psychology. Advance online publication. http://dx.doi.org/10.1037/dev0000091
Chen, T., & Li, D. (2007). The roles of working memory updating and processing speed in mediating age-related differences in fluid intelligence. Aging, Neuropsychology, and Cognition, 14(6), 631–646.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4), 450–466.
De Beni, R., & Palladino, P. (2004). Decline in working memory updating through ageing: Intrusion error analyses. Memory, 12(1), 75–89.
Drake, C., & Palmer, C. (2000). Skill acquisition in music performance: Relations between planning and temporal control. Cognition, 74(1), 1–32.
Ecker, U. K., Lewandowsky, S., & Oberauer, K. (2014). Removal of information from working memory: A specific updating process. Journal of Memory and Language, 74, 77–90. http://dx.doi.org/10.1016/j.jml.2013.09.003
Ecker, U. K., Lewandowsky, S., Oberauer, K., & Chee, A. E. (2010). The components of working memory updating: An experimental decomposition and individual differences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 36(1), 170–189.
Elliott, C. A. (1982). The relationships among instrumental sight-reading ability and seven selected predictor variables. Journal of Research in Music Education, 30(1), 5–14.
Gromko, J. E. (2004). Predictors of music sight-reading ability in high school wind players. Journal of Research in Music Education, 52(1), 6–15.
Hayward, C. M., & Gromko, J. E. (2009). Relationships among music sight-reading and technical proficiency, spatial visualization, and aural discrimination. Journal of Research in Music Education, 57(1), 26–36.
Huizinga, M., Dolan, C. V., & van der Molen, M. W. (2006). Age-related change in executive function: Developmental trends and a latent variable analysis. Neuropsychologia, 44, 2017–2036. http://dx.doi.org/10.1016/j.neuropsychologia.2006.01.010
Iglesias-Sarmiento, V., Carriedo-López, N., & Rodríguez-Rodríguez, J. L. (2014). Updating executive function and performance in reading comprehension and problem solving. Anales de Psicología/Annals of Psychology, 31(1), 298–309.
Kopiez, R., & Lee, J. I. (2006). Towards a dynamic model of skills involved in sight reading music. Music Education Research, 8(1), 97–120.
Kopiez, R., & Lee, J. I. (2008). Towards a general model of skills involved in sight reading music. Music Education Research, 10(1), 41–62.
Kopiez, R., Weihs, C., Ligges, U., & Lee, J. I. (2006). Classification of high and low achievers in a music sight reading task. Psychology of Music, 34(1), 5–26.
Kyllonen, P. C., & Christal, R. E. (1990). Reasoning ability is (little more than) working-memory capacity?! Intelligence, 14(4), 389–433.
Lechuga, M. T., Moreno, V., Pelegrina, S., Gómez-Ariza, C. J., & Bajo, M. T. (2006). Age differences in memory control: Evidence from updating and retrieval-practice tasks. Acta Psychologica, 123(3), 279–298.
Lehmann, A. C., & Ericsson, K. A. (1996). Performance without preparation: Structure and acquisition of expert sight-reading and accompanying performance. Psychomusicology: A Journal of Research in Music Cognition, 15(1-2), 1. http://dx.doi.org/10.1037/h0094082
Lehmann, A. C., & Kopiez, R. (2009). Sight-reading. In S. Hallam, I. Cross, & M. Thaut (Eds.), The Oxford handbook of music psychology (pp. 344–351). Oxford: Oxford University Press.
Lehmann, A. C., & McArthur, V. H. (2002). Sight-reading. In R. Parncutt & G. McPherson (Eds.), The science and psychology of music performance (pp. 135–150). Oxford University Press.
Lehto, J. E., Juujärvi, P., Kooistra, L., & Pulkkinen, L. (2003). Dimensions of executive functioning: Evidence from children. British Journal of Developmental Psychology, 21, 59–80. http://dx.doi.org/10.1348/026151003321164627
McPherson, G. E. (1994). Factors and abilities influencing sightreading skill in music. Journal of Research in Music Education, 42(3), 217–231.
McPherson, G. E., & Thompson, W. F. (1998). Assessing music performance: Issues and influences. Research Studies in Music Education, 10(1), 12–24.
Meinz, E. J., & Hambrick, D. Z. (2010). Deliberate practice is necessary but not sufficient to explain individual differences in piano sight-reading skill: The role of working memory capacity. Psychological Science, 21, 914–919. http://dx.doi.org/10.1177/0956797610373933
Mishra, J. (2014). Factors related to sight-reading accuracy: A meta-analysis. Journal of Research in Music Education, 61(4), 452–465.
Miyake, A., Friedman, N. P., Emerson, M. J., Witzki, A. H., Howerter, A., & Wager, T. D. (2000). The unity and diversity of executive functions and their contributions to complex "frontal lobe" tasks: A latent variable analysis. Cognitive Psychology, 41(1), 49–100.
Morris, N., & Jones, D. M. (1990). Memory updating in working memory: The role of the central executive. British Journal of Psychology, 81(2), 111–121.
Palladino, P., Cornoldi, C., De Beni, R., & Pazzaglia, F. (2001). Working memory and updating processes in reading comprehension. Memory & Cognition, 29(2), 344–354.
Passolunghi, M. C., & Pazzaglia, F. (2005). A comparison of updating processes in children good or poor in arithmetic word problem-solving. Learning and Individual Differences, 15(4), 257–269.
Sloboda, J. A. (1982). Music performance. In D. Deutsch (Ed.), The psychology of music (pp. 479–496). Academic Press.
St Clair-Thompson, H. L., & Gathercole, S. E. (2006). Executive functions and achievements in school: Shifting, updating, inhibition, and working memory. The Quarterly Journal of Experimental Psychology, 59, 745–759. http://dx.doi.org/10.1080/17470210500162854
Thompson, W. B. (1987). Music sight-reading skill in flute players. The Journal of General Psychology, 114(4), 345–352.
Trinity College London (2003). Sight reading pieces for Trumpet (Grades 1–8). Sound and Sight.
Trinity College London (2003). Sight reading pieces for Violin (Book 1: Initial–Grade 3; Book 2: Grades 4–8). Sound and Sight.
Trinity College London (2004). Sight reading pieces for Cello (Initial–Grade 8). Sound and Sight.
Trinity College London (2007). Sight reading pieces for Bass Clef Brass (Grades 1–8). Sound and Sight.
Trinity College London (2007). Sight reading pieces for Clarinet (Book 1: Grades 1–4; Book 2: Grades 5–8). Sound and Sight.
Trinity College London (2007). Sight reading pieces for Flute (Book 1: Grades 1–4; Book 2: Grades 5–8). Sound and Sight.
Trinity College London (2007). Sight reading pieces for Saxophone (Book 1: Grades 1–4; Book 2: Grades 5–8). Sound and Sight.
Trinity College London (2007). Sight reading pieces for Viola (Initial–Grade 8). Sound and Sight.
Trinity College London (2008). Sight reading pieces for Oboe (Grades 1–8). Sound and Sight.
Trinity College London (2009). Sight reading pieces for Double Bass (Initial–Grade 8). Sound and Sight.
Trinity College London (2009). Sight reading pieces for French Horn (Grades 1–8). Sound and Sight.
Waters, A. J., Townsend, E., & Underwood, G. (1998). Expertise in musical sight-reading: A study of pianists. British Journal of Psychology, 89, 123–149.
Waters, A. J., Underwood, G., & Findlay, J. M. (1997). Studying expertise in music reading: Use of a pattern-matching paradigm. Perception & Psychophysics, 59(4), 477–488.
Wolf, T. (1976). A cognitive model of musical sight-reading. Journal of Psycholinguistic Research, 5(2), 143–171.
Wurtz, P., Mueri, R. M., & Wiesendanger, M. (2009). Sight-reading of violinists: Eye movements anticipate the musical flow. Experimental Brain Research, 194(3), 445–450.
Xu, F., Han, Y., Sabbagh, M. A., Wang, T., Ren, X., & Li, C. (2013). Developmental differences in the structure of executive function in middle childhood and adolescence. PLoS ONE, 8(10), e77770. http://dx.doi.org/10.1371/journal.pone.0077770
Children experience music in their daily lives, and can tap along with the beat of the
music, becoming more accurate with age. In Western cultures, music is often structured
metrically, with beats theoretically heard as recurring patterns of stronger (downbeat) and
weaker (upbeat) events. Do children perceive these so-called metrical hierarchies in
music and match them with auditory or visual realizations of meter? Children aged 5-10
years listened to excerpts of ballroom dance music paired with either auditory
(beeping) or visual (ticking clock) metronomes. They rated how well the metronome
matched the music they heard. For auditory metronomes, children “judged” a student
drummer, and for visual metronomes, children “tested” musical clocks. Each child only
experienced one metronome modality. There were four metronome conditions:
synchronous with the music at the beat and measure level, synchronous at the level of the
beat but not the measure, synchronous at the measure level but not the beat, or not at all
synchronous with the music. For auditory metronomes, children at all ages successfully
matched the beat level of the metronome with the music, and rated beat-synchronous
metronomes as fitting the music better than beat-asynchronous metronomes. However,
children did not use measure-level information in their judgments of fit, and they rated
beat-synchronous metronomes as fitting equally well whether they also matched the
measure level or not. In the visual modality, 5- to 6-year-old children showed no evidence of
beat or meter perception, and they did not differ in their ratings of any of the
metronomes. Children aged 7-8 and 9-10 successfully used beat-level information in their
ratings of fit, rating beat-synchronous metronomes as fitting better than beat-
asynchronous visual metronomes. Beat perception appears to develop in the auditory
modality earlier than the visual modality, and perception of multiple levels of metrical
hierarchy may develop after age 10.
ISBN 1-876346-65-5 © ICMPC14 668 ICMPC14, July 5–9, 2016, San Francisco, USA
Infants learn complex aural communication systems rapidly and seemingly without effort
simply through exposure to sounds. However, the way in which they learn to organize
auditory information is dependent upon the characteristics of their soundscapes. For
example, the type and quantity of infants' exposure to speech affect their language
development, and the musical experiences in which they are immersed affect their
preferential attention to and recognition of music. We studied infants' (n=8) language and
music home environments by gathering precise and objective information about the
characteristics, frequency, duration, and quality of infants’ musical experiences during a
weekend day through the use of an innovative technology that facilitates the collection
and analysis of the sound environment. Results showed that live singing and purposeful
music listening experiences did not occur frequently in the homes of middle class
American families, that parents overestimated the number and length of such experiences
in their reports, and that grandparents and infants’ siblings often engaged in singing more
regularly than did the infants’ parents. Although infant-directed singing episodes
occurred rarely, they were highly communicative, interactive, and intimate in nature.
Overall, we found great diversity in the quantity and quality of the language and music
experiences afforded to the infants. Some were exposed to recorded music played by the
TV and digital toys for more than half of their waking hours, heard live singing for less
than four minutes in an entire day, and were exposed to over 8,000 words spoken by
adults for a total of almost two hours. Other infants were immersed in more socially
interactive language and musical experiences. Our finding that the quantity of infants'
vocalizations was related to the characteristics of their soundscapes highlights the
importance of the home sound environment for infants' communicative skills.
This study compared fetal response to musical stimuli applied intravaginally (intravaginal
music [IVM]) with application via emitters placed on the mother’s abdomen (abdominal
music [ABM]). Responses were quantified by recording facial movements identified on
3D/4D ultrasound. One hundred and six normal pregnancies between 14 and 39 weeks of
gestation were randomized to 3D/4D ultrasound with: (a) ABM with standard
headphones (flute monody at 98.6 dB); (b) IVM with a specially designed device
emitting the same monody at 53.7 dB; or (c) intravaginal vibration (IVV; 125 Hz) at 68
dB with the same device. Facial movements were quantified at baseline, during
stimulation, and for 5 minutes after stimulation was discontinued. In fetuses at a
gestational age of >16 weeks, IVM elicited mouthing (MT) and tongue expulsion (TE) in
86.7% and 46.6% of fetuses, respectively, with significant differences when compared
with ABM and IVV. There were no changes from baseline with ABM or IVV. The
frequency of TE with IVM increased significantly with gestational age. Fetuses at 16–39
weeks of gestation respond to intravaginally emitted music with repetitive MT and TE
movements not observed with ABM or IVV. Our findings suggest that neural pathways
participating in the auditory–motor system are developed as early as gestational week 16.
These findings might contribute to diagnostic methods for prenatal hearing screening, and
research into fetal neurological stimulation. Presented by Professor Jordi A. Jauset-Berrocal,
PhD (University Ramon Llull, Barcelona, Spain), consultant in this research.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4616906/pdf/10.1177_1742271X15609367.pdf
Musical abilities and active engagement with music have been shown to be positively
associated with many cognitive abilities as well as social skills and academic
performance in secondary school students. While there is evidence from intervention
studies that musical training can be a cause of these positive relationships, recent findings
in the literature have suggested that other factors (e.g. genetics, family background,
personality traits) might also contribute. In addition, there is mounting evidence that self-
concepts and beliefs can affect academic performance independently of intellectual
ability. However, it is currently not known whether students' beliefs about the nature of
musical talent also influence the development of musical abilities in a similar fashion.
Therefore, this study introduces a short self-report measure termed “Musical Self-
Theories and Goals,” closely modeled on validated measures for self-theories in
academic scenarios. Using this measure the study investigates whether musical self-
theories are related to students' musical development as indexed by their concurrent
musical activities and their performance on a battery of listening tests. We use data from
a cross-sectional sample of 313 secondary school students to construct a network model
describing the relationships between self-theories and academic as well as musical
outcome measures, while also assessing potential effects of intelligence and the Big Five
personality dimensions. Results from the network model indicate that self-theories of
intelligence and musicality are closely related. In addition, both kinds of self-theories are
connected to the students' academic achievement through the personality dimension
conscientiousness and academic effort. Finally, applying Pearl’s do-calculus method to
the network model, we estimate that the sizes of the assumed causal effects between
musical self-theories and academic achievement lie between 0.07 and 0.15 standard
deviations.
(Aisenbrey & Fasang, 2010). As a result, the sequence pattern analysis identified three and four different trajectories, respectively, for the sub-facets of the psychometric construct musical self-concept and for the general factor of musical sophistication. In order to analyze relationships, the correlations between the three and four identified trajectory groups, respectively, and the assessed socio-demographic background variables were examined. As a result, tables 1 and 2¹ show the correlations between the identified developmental trajectories of the sub-facets of musical self-concept and of the general factor of musical sophistication with students' sex, musical status, and type of school, with significant phi-correlation coefficients between ϕ = .247 and ϕ = .459 for the four identified developmental trajectories of the general factor of musical sophistication, and significant phi-correlation coefficients between ϕ = .297 and ϕ = .358 for the three developmental trajectories of the sub-facets of musical self-concept. One exception can be found: there is no significant relationship between the developmental trajectories of the sub-facets of musical self-concept and type of school. Moreover, the analysis shows that two typical developmental trajectory groups can be distinguished. In addition, significant correlations were found between the two typical identified developmental trajectory groups and the variables sex (ϕ = .319), musical status (ϕ = .416), and type of school (ϕc = .222). These significant correlations show that female students, musically active² students, and especially students at Grammar Schools are associated with the higher developmental trajectories.

In addition, to identify how musical self-concept and musical sophistication contribute to the development of the target variable of this study, interest in music as a school subject, the two typical identified developmental trajectory groups were used to conduct a multilevel analysis. The multilevel analysis shows that the overall interest in music as a school subject (the target variable) decreases over time (p ≤ .001). In addition, the analysis shows that there is also an interaction effect: for the students in the higher trajectory group, interest in music as a school subject increases over time (p ≤ .001), compared to their peers in the lower trajectory group.

VI. DISCUSSION

The first aim of this repeated-measurement study was to identify different developmental trajectories within the sample of students by using the Goldsmiths Musical Sophistication Index (Gold-MSI; Fiedler & Müllensiefen, 2015; Schaal et al., 2014; Müllensiefen et al., 2014) as well as the Musical Self-Concept Inquiry (MUSCI; Spychiger, 2010; 2012; Spychiger & Hechler, 2014). As a result, three and four different developmental trajectories, respectively, were identified for the sub-facets of musical self-concept (MUSCI) and for the general factor of musical sophistication (Gold-MSI). Moreover, correlations between the identified developmental trajectory groups and the assessed socio-demographic background variables, such as students' sex, musical status, and type of school, were analyzed (see tables 1 and 2). In addition, two typical developmental trajectory groups could be distinguished.

As a result of the correlational analyses, the significant phi-correlation coefficients show that sex (female students) and musical status (musically active students), as well as type of school (Grammar Schools), are associated with the higher developmental trajectories of the psychometric constructs as well as with the higher developmental trajectory group. This means that especially musically active female students at Grammar Schools are in the higher developmental trajectory groups and show a higher development of musical self-concept as well as musical sophistication, compared to their male peers in the lower musical developmental trajectory groups. Thus, the identified relationships suggest that school music lessons can have a non-inclusive character, particularly for male students and students who don't play a musical instrument (Heß, 2011a; 2013), which in particular can influence the development of the target variable, interest in music as a school subject. Therefore, it is not surprising that the multilevel analysis shows that the overall interest in music as a school subject decreases over the three time points in the school year 2014/2015. This result is in line with Daniels' (2008, p. 348) outcomes. Daniels (2008) describes that interest in a school subject decreases in class seven (p. 348). The reasons for this include, among other things, a gap between the (music) lessons in school and the basic psychological human needs for autonomy, social integration, and competency (Daniels, 2008, p. 348; Deci & Ryan, 1985; 2002). In summary, musical self-concept as well as musical sophistication and the identified developmental trajectory groups of the assessed psychometric constructs contribute to the development of the target variable, interest in music as a school subject, over time: for the higher developmental trajectory group, interest in music as a school subject increases over time, compared to their peers in the lower developmental trajectory group. Hence, this study makes an important contribution to the understanding of the mechanisms of musical development during adolescence. Moreover, the results of this study can serve as a basis for teaching segmented by ability and interest in school music teaching.

¹ Note that tables 1 and 2 can be found at the end of this paper.
² Musically active students are persons who currently play a musical instrument.

REFERENCES

Aisenbrey, S., & Fasang, A. E. (2010). New Life for Old Ideas: The "Second Wave" of Sequence Analysis. Bringing the "Course" Back Into the Life Course. Sociological Methods & Research, 38(3), 420–462.
Bähr, J. (2001). Zur Entwicklung musikalischer Fähigkeiten von Zehn- bis Zwölfjährigen: Evaluation eines Modellversuchs zur Kooperation von Schule und Musikschule. Göttingen: Cuvillier.
Bastian, H. G. (2000). Musik(erziehung) und ihre Wirkung: Eine Langzeitstudie an Berliner Grundschulen. Mainz: Schott.
Bentley, A. (1968). Musikalische Begabung bei Kindern und ihre Meßbarkeit. Frankfurt: Diesterweg.
Bernecker, C., Haag, L., & Pfeiffer, W. (2006). Musikalisches Selbstkonzept. Eine empirische Untersuchung. Diskussion Musikpädagogik, 29, 53–56.
Colwell, R. (1969). Music achievement tests 1 and 2. Chicago: Follett Educational Corp.
Colwell, R. (1970). Music achievement tests 3 and 4. Chicago: Follett Educational Corp.
Colwell, R. (1979). Silver Burdett Music Competency Tests. Morristown: Silver Burdett.
Table 1: Overview of the musical developmental trajectory groups of the sub-facets of musical self-concept, as well as the phi correlation coefficients describing the relationships between the musical developmental trajectory groups and the socio-demographic background variables sex and musical status
Table 2: Overview of the musical developmental trajectory groups of the general factor of musical sophistication, as well as the phi correlation coefficients describing the relationships between the musical developmental trajectory groups and the socio-demographic background variables sex, musical status, and type of school
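Both table captions report phi correlation coefficients between trajectory-group membership and binary background variables such as sex and musical status. As an illustrative sketch only (this is not the authors' code, and the 2×2 counts below are invented), phi can be computed directly from a contingency table:

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi correlation for a 2x2 contingency table laid out as:
                var2 = 0   var2 = 1
    var1 = 0       a          b
    var1 = 1       c          d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0  # undefined when a margin is empty; report 0 here
    return (a * d - b * c) / denom

# Invented counts: trajectory group (low/high) vs. musically active (no/yes)
print(round(phi_coefficient(30, 10, 12, 28), 3))  # → 0.451
```

Like Pearson's r, phi ranges from -1 to 1, so it compactly summarizes how strongly trajectory-group membership is associated with a dichotomous background variable.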
Singing begins spontaneously early in life, as is the case for speech. Although there has
been extensive research on children’s acquisition of speech, relatively little attention has
been devoted to the acquisition of singing, especially in the early years. Even within
music psychology, the primary focus in skill acquisition research has been on learning to play a
musical instrument rather than learning to perform on one’s inborn vocal instrument.
Singing entails multiple skills that include vocal play and gesture as well as learning and
reproduction of melodies (pitch and rhythmic patterns) and lyrics. The present
symposium, under the auspices of the AIRS (Advancing Interdisciplinary Research in
Singing) Project, includes presentations by Anna Rita Addessi (Italy) on infants, Sandra
Trehub (Canada) on toddlers, and Beatriz Ilari (U.S.A) on children of primary school age,
with Mayumi Adachi (Japan) and Stefanie Stadler Elmer (Switzerland) as discussants.
Anna Rita Addessi examines vocal interactions between caregivers and infants that occur
during routine diaper-changing. Sandra Trehub’s analysis of home recordings of toddlers’
singing (from YouTube and parents) reveals that toddlers are capable of reproducing the
contours, rhythms, and pitch range of conventional songs. Beatriz Ilari examines the
long-term impact of musical (e.g., El Sistema) and nonmusical (athletic) interventions on
various singing behaviors. Although children’s improvisation abilities increased
substantially over the three years of the study, the musical interventions proved to be no
more effective than the nonmusical interventions. The presentations and discussion
provide novel findings across a wide age range, novel methods, and novel perspectives from different disciplines (music education, psychology), settings (home, school, community), and countries.
ISBN 1-876346-65-5 © ICMPC14 679 ICMPC14, July 5–9, 2016, San Francisco, USA
This study takes a constructionist perspective concerning the interaction of an infant with
various adults. It focuses in particular on an echo mechanism of repetition and variation,
which structures the child-adult interaction according to different time scales (micro-
events to daily rhythms), languages, modes of expression, and communication. In a previous case study, one infant 8 months of age was videorecorded at home during the daily diaper change on successive days, with mother and father. The observed relation
between the infant and adult behavior highlighted the importance of imitation and
variation in these dyadic interchanges and several differences between mother and father.
The present paper reports on a larger study using this same paradigm. Eleven infants 8
months of age were videorecorded during the daily diaper change on 7 successive days,
at home and in the day-care nursery, providing 7 recordings each of the infant with the
mother, father, grandmother, and educator. Audiovisual data were scored via a grid with
respect to duration and frequency of vocal productions, imitation, variation, and turn
taking. The results showed that the child’s vocal activity is inversely proportional to the
adult’s vocal activity (duration and frequency) and increases when the adult imitates the
child and respects the turn-taking. Several differences between mother, father,
grandmother and educator were observed concerning the intentionality, the timing of the
interaction, and the musical quality of the vocalisations. Data analysis now in progress
will be presented and discussed. The results observed to date highlight the importance of
the reflexive interaction between child and adult and can have important implications in
the field of music education. Because the diaper-changing routine takes place daily for every infant, these results have broad generality and highlight the influences on infant vocal learning in everyday activities.
Toddlers’ Singing
Sandra E. Trehub
Department of Psychology, University of Toronto, Toronto, Ontario, Canada
According to most descriptions, toddlers focus on song lyrics rather than melodies when
they sing, and the resulting songs have a relatively narrow pitch range. However, case
studies have revealed more proficient singing in toddlers, raising the possibility that those
toddlers were precocious rather than representative. Another possibility is that toddlers’
spontaneous singing at home may reveal more about their capabilities than their singing
of experimenter-selected songs in less familiar environments. In one study involving
home recordings from toddlers 18 to 36 months of age, we found a wide range of singing
proficiency that was unrelated to speech production skills. At 20 months, a number of
toddlers were merely producing the words of songs in a monotone voice. By 26 months,
however, most toddlers reproduced the pitch contours, rhythms, and words of songs, with
large individual differences. In another study (Helga Gudmundsdottir, collaborator), we
sought to ascertain whether toddlers’ sung melodies were recognizable when sung with
foreign words. We gathered home recordings of toddlers 16 months to 3 years (from
YouTube and parents). We selected several samples of two familiar songs (Twinkle,
Twinkle; Happy Birthday) and two unfamiliar children’s songs, all sung in foreign
languages. In online testing, English-speaking adults were required to identify the tunes.
They readily identified the familiar tunes (over 90% correct) but not the unfamiliar tunes.
These findings disconfirm the claim that toddlers’ songs are recognizable only from the lyrics.
Pitch analyses revealed that toddlers’ pitch range approximated that of the target songs,
disconfirming the claims of reduced pitch range. Finally, in an ongoing study, we are
attempting to ascertain whether toddlers’ proficiency in imitating visual gestures is
related to their proficiency in singing, which relies, in part, on vocal imitation.
Improvisation is one important means for developing creative thinking in music. In spite
of much discourse in support of improvisation in classrooms, little is currently known
about the development of children’s improvisatory skills. The aim of this mixed methods
study was to examine the development of children’s inventions of vocal song endings
over the course of three years. Children (N = 52) were followed for three years beginning
in first-grade. They were divided into three groups (i.e., music – El Sistema-inspired
program, sports, control) based on their extracurricular activities. For three consecutive
years, children were asked to invent song endings to given song beginnings based on a
task from the ATBSS. They also sang “Happy Birthday”, engaged in pitch matching
tasks, and talked about music. Invented song endings were analyzed in terms of number
of trials and creativity scores, and in relationship to singing skills, which were examined
for accuracy and range. Musical and extra-musical qualities of all vocal creations were
also analyzed, giving rise to in-depth case studies of song creators and their unique
products. Quantitative analyses revealed no interactions between groups and creativity
scores or number of trials in any given year. Significant differences in creativity scores
were found for year 3, for all groups. Qualitative analyses suggested that child creators used six different strategies when inventing song endings, with strategy choice influenced by children’s personality, dispositions, and immediate social
environments. The act of inventing song endings may come naturally to some children,
irrespective of music training. These results suggest a multitude of contributing factors to
executing and being comfortable with improvisation, including the importance of
offering ample opportunities for children to improvise and compose in formal music
education settings.
The development of singing accuracy in children, and the relative role of training versus
maturation, is a central issue for both music educators and those within music cognition.
Singing ability is fundamental to children’s developing musicianship and one of the
primary goals of early music education. Despite this focus, a number of people have
difficulty singing accurately even into adulthood and many report feeling inadequate as
singers despite having normal skills in other domains. Perceptions of inadequacy can
deter both children and adults from participating in music education and music-related
activities. Is inaccurate singing a cognitive deficit or the result of a lack of proper music
education? This symposium presents three controlled studies of factors that may
influence young children’s singing development. The first experiment explored the
impact of early childhood music instruction on the singing development of children
before they enter school. The second experiment examined the impact of daily singing
instruction on 5-6 year olds’ singing development in their first year of schooling. The
third comparative study explored the role of instrumental training in singing accuracy of
children aged 5-12 years. Taken together, this research can help to provide a picture of
how singing development may be facilitated by certain kinds of musical experience and
how those experiences interact with age and gender. Understanding how children’s
singing can be improved may shed light on some of the root causes of inaccurate singing
and allow future generations to avoid some of the well-documented difficulties of adults
who struggle with singing.
Sean Hutchins
The Royal Conservatory of Music, Toronto, ON, Canada
McMaster University, Hamilton, ON, Canada
Recent data show singing abilities can vary greatly in the adult population, and that
several different factors can be involved in failure to sing in tune. However, the origins of
singing ability (or difficulties) are poorly understood. Music educators often cite music
education, especially in early childhood, as a key factor in developing this ability, but
there is little in the way of acoustical data to support this assertion. In the present study,
we aimed to test whether early childhood music education can improve singing ability, as
assessed through acoustic measurements of intervallic accuracy. We tested two sets of
children, aged 2.5 to 6 years. The first set (48 children) had taken weekly early childhood
music education classes at The Royal Conservatory of Music for the past 24 weeks (as
part of a larger, controlled study); the other set (61 children) had applied to take the same
classes, but had been assigned to a control group (to receive the classes at a later time). In
our testing session, children sang the alphabet song, in addition to several other musical
and cognitive tasks. Acoustic analysis of their intervallic accuracy showed a significant
interaction between Test Group (whether they had taken the lessons or not) and Gender,
such that females in the lessons group made fewer interval errors than those in the no-lessons group. We also found improvement in accuracy across increasing age groups, which also interacted with gender. Singing ability correlated with other factors as
well. These results show some initial evidence that early childhood music education can
have an impact on some singing abilities, which may be a precursor to better developed
general musical skills throughout life.
We explored the effect of daily group singing instruction versus no formal instruction on
the singing accuracy of kindergarten-aged children and whether singing accuracy
performance differed across tasks. The treatment group (n = 41) received 20 minutes a
day of group instruction in a Kodály-based music classroom that emphasized the
development of the singing voice in terms of tone, register, and accuracy for six months.
The control group (n = 38) received no formal music instruction during the school day.
Singing accuracy was measured both pre- and post-instruction by having participants echo
single pitches, intervals and patterns sung by a recorded female non-vibrato vocal model
and then sing Twinkle, Twinkle Little Star on their own.
The paper presents findings from In Harmony instrumental programmes that provide
sustained instrumental tuition for children (aged 5-12 years) in two Primary schools in
the UK. As a government-funded programme, the In Harmony initiative in England is based on the El Sistema orchestral social and music programme in Venezuela and, as such, the programmes are located in urban schools characterised by challenging circumstances. The aim of
the research evaluation was to assess particular features of musical behaviours and
development, children’s attitudes to musical learning, and possible wider impact.
Longitudinal data were collected via (a) specially designed questionnaire instruments for
children that drew on different aspects of their experiences in music and of school life in
general, (b) instrumental tutor reports on individual children’s instrumental learning and
(c) individual singing assessments. During visits made to each school, a researcher
listened to each child’s singing of two target songs and made a judgement as to the level
of competency displayed against developmental criteria of two rating scales. The
researcher’s ratings were combined and converted into a ‘normalised singing score’
(NSS). Subsequent analyses revealed that the NSS values ranged from 22.5 (chant-like
singing) to a maximum of 100 (no obvious errors in the musical features of the songs).
Children’s NSS values were compared over time. Results indicate that
whilst the main focus of the intervention was to support the development of instrumental
skills, pupils entering the programme with significantly less developed singing
competency (compared to children of a similar age and background in a national dataset
of 11,000+ children) had made highly significant progress in their singing competency
and were more in-line with children of the same age in the national dataset. This
improvement was evidenced for both girls and boys, although sex differences were
evidenced (in line with national and historical data) with girls being more developed than
boys. The described measure of children’s singing represents one indicator of their
musical behaviour and development. We hypothesize that the finding of significant
improvement in children’s singing is likely to be, in part, related to the additional
singing opportunities provided within the school week, but also to be an outcome of the
auditory imagery or ‘inner hearing’ that occurs as a form of silent singing whilst children
play their instruments, including when they play from musical notation.
Orchestration theory has not attained the depth or precision of other musical fields and would benefit from perceptual grounding. Analyses of scores and treatises
reveal that many orchestration aims are related to auditory grouping principles, which
determine auditory fusion of musical events, the integration of coherent streams of
events, and chunking into perceptual units. These grouping processes are implicated in
orchestral devices related to timbral blend or heterogeneity, segregation or stratification
of layers, and orchestral contrasts, such as antiphonal exchanges. We developed an
empirical analysis method to study orchestral devices in a corpus of orchestral pieces.
Pairs of researchers analyzed each movement: after analyzing the movement individually,
they compared analyses and developed a joint analysis, which consisted of an annotated
score and a spreadsheet that catalogued various details about each instance of the
orchestral devices. These data include the span of measures in which the device is found,
the timing of the recording(s), the instrumentation, and the perceived strength of the
effect. A relational database was created, recording over 4200 orchestral devices from
over 75 orchestral movements drawn from the classical period to the 20th century. Text
data mining techniques of association analysis were used to explore the catalogued data
in order to reveal patterns among instruments and their combinations within various
orchestral devices. This approach will quantify the extent to which instrument
combinations described in textbooks occur and also potentially uncover unknown
combinations or unique instances by certain composers. Further analyses will investigate
whether instrument combinations vary over time, as well as the contexts in which the
devices occur. The aims are to contribute to the development of a theory of orchestration
based on empirical analysis and to create a catalogue of excerpts for perceptual
experimentation.
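The association-analysis step described above can be pictured as pair co-occurrence (support) counting over the catalogued device instances, in the style of basic market-basket analysis. The sketch below is illustrative only — the instrument annotations are invented, not data from the actual database:

```python
from itertools import combinations
from collections import Counter

# Hypothetical device annotations: each set lists the instruments
# participating in one instance of an orchestral device (invented data).
devices = [
    {"flute", "oboe", "clarinet"},
    {"flute", "oboe"},
    {"horn", "bassoon"},
    {"flute", "clarinet"},
    {"horn", "bassoon", "clarinet"},
]

def pair_support(instances):
    """Count how often each unordered instrument pair co-occurs
    across device instances (the 'support' of the pair)."""
    counts = Counter()
    for inst in instances:
        for pair in combinations(sorted(inst), 2):
            counts[pair] += 1
    return counts

support = pair_support(devices)
print(support[("flute", "oboe")])  # → 2
```

Frequent pairs point to conventional combinations of the kind described in orchestration textbooks, while rare pairs flag unusual or composer-specific choices.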
Most research on the blending of musical timbres has studied isolated sounds, rather than
musical excerpts. We sought to understand the relative roles of the number of
instruments, the diversity of their combinations within and across instrument families, the
intervals employed, pitch register, and the degree of parallel or similar motion on blend
perception in orchestral music. A sample of 64 excerpts representing different combinations of the above factors was selected from the analysis of over 70 movements
of the orchestral repertory from the early classical period to the early 20th century. The
excerpts were also chosen to cover a range of strengths from weak to strong blend. High-
quality interpretations of these excerpts were created to mimic specific commercial
recordings using the OrchSim system developed at McGill. From these interpretations,
only the groups of instruments involved in a given blend were extracted and rendered to
stereo sound files, thus initially testing the blending instruments outside of their
surrounding musical context, but balanced within that context. Listeners rated the degree
of blend of each excerpt on a scale from unity to multiplicity, with the scale ends
representing the strongest blend and no blend, respectively. A regression analysis
indicates that the primary factor affecting blend is the degree of similar motion. The
effect of the number of instruments is more variable for smaller sets of instruments,
where the timbral characteristics of the combined instruments may play a stronger role in
determining blend. The long-term aim is to develop a model of blend perception that can
be used for music information research and for computer-aided orchestration systems, as
well as for pedagogical applications in music theory, composition, and performance
technique.
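The regression reported above can be illustrated with a one-predictor ordinary-least-squares sketch. The motion proportions and blend ratings below are invented for illustration, and the study's actual model specification may well differ:

```python
def simple_ols(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Invented example: proportion of similar motion in each excerpt (0-1)
# versus mean blend rating (1 = multiplicity ... 7 = unity).
motion = [0.1, 0.3, 0.5, 0.7, 0.9]
rating = [2.0, 3.1, 4.0, 5.2, 6.1]
a, b = simple_ols(motion, rating)
```

A positive slope b would correspond to the reported finding that a greater degree of similar motion predicts a stronger sense of blend.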
Prussian organ building school of German legacy by the master Adam Gottlob Casparini (1715–1788), who settled in Königsberg, and possibly Gerhardt Arendt Zelle (?–1761), the founder of the Vilnius Late Baroque Organ Building School. Thereafter, from the 2nd half of the 18th c. to the middle of the 19th c., Vox humana remained one of the main exclusive reed stops in almost all large (two-manual) Lithuanian baroque organs as well as portatives (small organs).

Italian baroque organs featured the same title Voce Umana as well. However, in the organ heritage of this country we encounter a different type of the stop and its pipes. The Italian Voce Umana is characterized by labial pipes instead of reeds, and every tone is produced by two unevenly tempered pipes that produce a wavy sound (e.g., organs by the Antegnati family masters in Bergamo St. Nicola Church, 1588, and Brescia St. Carlo Church, 1630, Italy).

Nowadays, some examples of Vox humana pipe construction are distinguished as the most representative and typical among others. Seeking to produce the delicate and rapid pulsation of sound like the vibrato effect, the Vox humana was most commonly performed in combination with other 8-foot registers with stopped pipes or the Tremulant (in some cases the latter equipment was installed inside the Vox humana as a double-stop construction) (Friedrich, 1989, p. 45).

The Vox humana pipe consists of two main parts: a metal or wooden boot (bottom part) and a metal (very rarely wooden) resonator (top part). The quality of timbre and sound depends especially on the construction and shape of the resonator, which usually was produced using two main geometrical forms, cylinder and cone (plus a cone with flat top), and various combinations (see Fig. 1).
Figure 3. Pipe examples of Type 2: a) the copy of the Vox humana pipes from Arp Schnitger’s organ in Hamburg Hauptkirche St. Jacobi (1688–93; a copy of Schnitger’s organ installed in the new organ in Gothenburg Örgryte New Church and implemented by GOArt/Göteborg Organ Art Centre, Sweden, in 2000); b) Vox humana by Albertus Antonius Hinsz (1741–3, Kampen St. Nicholas Church, The Netherlands); c) an example of analogous construction – the copy of the Zink stop from Arp Schnitger’s organ in Hamburg Hauptkirche St. Jacobi.
Figure 6. Pipe example of Type 5: a) a drawing of an Apfel-Regal pipe by James Wedgwood (Wedgwood, 1905, p. 134); b) a drawing of an Apfel-Regal pipe by George Audsley (Audsley, 1921, p. 224).
Figure 8. Pipe examples of Type 9: a) by Andreas Hildebrandt (1719, Paslęk St. Bartholomew Church, Poland); b) by Adam Gottlob Casparini (1776, Vilnius Church of the Holy Spirit, Lithuania); c) by Mateusz Raczkowski (1792–3, Kurtuvėnai Church, Lithuania); d) by unknown master (1839, Žemalė Church, Lithuania).

Figure 9. Example of triple-conical resonator: a) a pipe from the organ by organ master Johann Heinrich Hartmann Bätz (1762), rebuilt by Jonathan Bätz (1837, Evangelical Church, The Hague, The Netherlands); b) drawing of the Bärpfeife of triple-conical construction (Adlung, 1768, p. 73); c) drawing of the Bärpfeife of triple-conical construction (Audsley, 1921, p. 224).
2. One-section tapered resonator, with closed cover. Example: organ by the Jemlich workshop, 1899, Lößnitz Johanniskirche (Germany); the stop is placed in a distant box over the organ case. This type of newly manufactured pipe was introduced in the 1932 and 1956 catalogues of the 20th-c. organ building workshop of August Laukhuff.

3. Free reeds (Germ. Durchschlagende Zungen). The first known examples are dated to the end of the 18th c.; in the 19th c. it was the most popular type of resonator among organ building workshops, e.g., organ by the Walker workshop, 1879, Düsseldorf Johanneskirche (Germany).

4. A combination of conical bottom section, cylindrical middle section and half-spherical resonance hood. In 1913, an interesting and rare example of Vox humana pipe construction was printed in Miller’s study of organ building, while discussing the organs of the 19th c. and referring to Norman and Beard’s organ in Norwich Cathedral (Miller, 1913, p. 92; see Fig. 10).

Figure 10. Example of conical-cylindrical-spherical combination: a drawing of the Vox humana pipe in Norman and Beard’s organ in Norwich Cathedral (1899, England) (Miller, 1913, p. 92, fig. 20).

Labial Type of Vox Humana

This kind of organ stop features a vibrant, wavy (Germ. Schwebende) and slightly untuned sound (uneven tuning), and is attributed to the group of Principal stops. The stop has been used in Italian organs from c. 1550 to this day, with the title Voce umana or Fiffaro, Piffara, Bifara. Furthermore, this type of stop was brought into the surroundings of Germany and later spread more widely through the organ builder Eugenio Casparini (Johann Caspar, 1623–1706), who, after returning from Italy, manufactured this stop for the 1697–1703 organ in Görlitz St. Peter and Paul Church. This type is quite analogous to another, more common labial stop, Unda maris.

ACKNOWLEDGMENT

The preparation of the article and participation at ICMPC14 were partly supported by the Lithuanian Composers’ Union and Kaunas University of Technology.

REFERENCES

Adlung, J. (1768). Musica Mechanica Organoedi. Teil 1. Berlin: Friedrich Wilhelm Birnstiel.
Audsley, G. A. (1905). The Art of Organ Building. Dodd, Mead & Company.
Audsley, G. A. (1921). Organ-Stops and their Artistic Registration. Names, Forms, Construction, Tonalities, and Offices in Scientific Combination. New York.
Bédos de Celles, D. F. (1766). L’Art du Facteur d’Orgues. Paris; facsimile ed. (1994): Geneve.
Burney, C. (1775). The Present State of Music in Germany, the Netherlands, and United Provinces: Or, the Journal of a Tour Through Those Countries, Undertaken to Collect Materials for a General History of Music, 2nd ed., vol. 2. London.
Eberlein, R. (2009). Orgelregister, ihre Namen und ihre Geschichte. Siebenquart: Verlag Dr. Roland Eberlein.
Friedrich, F. (1989). Der Orgelbauer Heinrich Gottfried Trost: Leben-Werk-Leistung. Leipzig: Deutscher Verlag für Musik.
Miller, G. L. (1913). The Recent Revolution in Organ Building. 2nd ed. New York: The Charles Francis Press.
Praetorius, M. (1619). Syntagma Musicum. 2nd book ‘De Organographia’. Wolfenbüttel.
Praetorius, M. & Faulkner, Q. (trans. & ed.) (2014). Syntagma Musicum II: De Organographia, Parts III–V with Index. Lincoln: Zea E-Books. Book 24.
Schneider, T. (1958). Die Namen der Orgelregister. Kompendium von Registerbezeichnungen aus alter und neuer Zeit mit Hinweisen auf die Entstehung der Namen und deren Bedeutung. Kassel, London: Bärenreiter.
Seidel, J. J. (1844). Die Orgel und ihr Bau. Breslau.
Smets, P. (1948). Die Orgelregister, ihr Klang und Gebrauch: ein Handbuch für Organisten, Orgelbauer und Orgelfreunde. Mainz: Rheingold-Verlag.
Stauff, E. L. (2016, April 28). Vox humana. Encyclopedia of Organ Stops. Retrieved April 28, 2016, from http://www.organstops.org/v/VoxHumana.html.
Wedgwood, J. I. (1905). A Comprehensive Dictionary of Organ Stops (English and Foreign, Ancient and Modern): Practical, Theoretical, Historical, Esthetic, Etymological, Phonetic, 3rd ed. London: The Vincent Music Company.
Changes in timbre (namely spectral centroid) can affect subjective judgments of the size
of pitch intervals. We investigated the effect of timbre change on musicians’ ability to
identify melodic intervals in order to observe whether poorly identified pitch intervals are
more susceptible to interference from timbre. Musicians (n=39) identified all melodic
intervals within the octave in ascending and descending directions using traditional labels
(e.g., “perfect fifth”) in two timbre conditions: (1) timbre-neutral (piano only) and (2)
timbre-change (3 timbre pairs). The pairs were chosen from a timbre dissimilarity
experiment (n=19) using multidimensional scaling in order to determine the perceptual
dimensions along which timbre change occurred. The pairs chosen were marimba—
violin (spectral centroid), French horn—piano (temporal envelope), and bass clarinet—
muted trumpet (amplitude modulation). In a speeded interval-identification task,
musicians’ vocal identifications of melodic intervals were analyzed in terms of accuracy
and response time. A repeated-measures ANOVA showed a main effect of timbre on
accuracy scores, indicating that the marimba—violin pair had the lowest scores overall.
Timbre pair also affected the type of errors in incorrect trials. Root-mean-squared errors
calculated from confusion matrices revealed an increase in the spread of errors for
timbre-change conditions (the largest being the marimba—violin pair) compared to the
timbre-neutral condition. The response-time analysis showed a significant main effect of
timbre demonstrating that both the fastest (French horn—piano) and slowest (marimba—
violin) response times were within the timbre-change condition. The results indicate that
timbre-change along differing dimensions can affect the categorical identification of pitch
intervals. The lack of systematic interaction between timbre pair, interval and direction
suggests that this effect may not be limited to poorly identified melodic intervals.
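The spread-of-errors measure described above can be sketched as a root-mean-squared error over a confusion matrix. This is a hedged illustration only, not the study's analysis code: the toy confusion matrix and the semitone-distance definition of error are assumptions introduced here:

```python
import math

# Rows: presented interval (in semitones), columns: responded interval.
# Toy 4x4 confusion matrix over intervals of 1-4 semitones (counts invented).
confusion = [
    [18, 2, 0, 0],
    [3, 14, 3, 0],
    [0, 4, 12, 4],
    [0, 1, 5, 14],
]

def rms_error(matrix):
    """Root-mean-squared semitone error over all trials:
    each cell (i, j) contributes count * (j - i)^2."""
    total, sq_sum = 0, 0.0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            total += count
            sq_sum += count * (j - i) ** 2
    return math.sqrt(sq_sum / total)

spread = rms_error(confusion)  # larger = errors scattered farther off-diagonal
```

A larger RMS value corresponds to confusions lying farther from the diagonal, i.e., the wider spread of errors reported for the timbre-change conditions.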
III. Method

Three groups of participants (Mandarin, Dutch, and musicians) were exposed to identical pitch contours in speech and melody. Critical items (CIs) for the speech condition consisted of disyllabic words in Mandarin with three possible contours: rising, falling, or flat. A pitch continuum was created for the CIs from rising to flat, and flat to falling, by dividing a finite pitch space into 11 equal-sized tonal gradations [see 8]. CIs for the melodic condition were exact analogues of the manipulated speech contours, created by extracting the pitch contour of the critical speech items and resynthesizing these as gliding sinusoidal tones. Depending on condition, CIs were placed in linguistic and musical carrier phrases, respectively. Participants indicated the direction of the last tone in a 3-alternative forced-choice task. Accuracy rates were collected.

IV. Results

Results indicate domain-specific perceptual mechanisms operant in both speech and music. Identical pitch contours

4. Asaridou SS, McQueen JM. Speech and music shape the listening brain: evidence for shared domain-general mechanisms. Front Psychol. 2013;4: 321. doi:10.3389/fpsyg.2013.00321
5. Patel AD. Music, language, and the brain. Oxford: Oxford University Press; 2008.
6. Patel AD. Language, music, and the brain: A resource-sharing framework. In: Rebuschat P, Rohrmeier M, Hawkins J, Cross I, editors. Language and music as cognitive systems. Oxford: Oxford University Press; 2012. pp. 204–223.
7. Peretz I, Vuvan D, Armony JL. Neural overlap in processing music and speech. Phil Trans R Soc B. 2015;370: 20140090. doi:10.1098/rstb.2014.0090
8. Xu Y, Gandour JT, Francis AL. Effects of language experience and stimulus complexity on the categorical perception of pitch direction. J Acoust Soc Am. 2006;120: 1063–1074. doi:10.1121/1.2213572
Two types of absolute pitch abilities have been identified from previous research: overt
AP (e.g. pitch labelling; oAP) is purported to be a rare binary ability possessed by a small
proportion of people with a musical background, while latent AP (recognising or
producing a well-known song at the correct pitch; lAP) is thought to exist in the general
population and can be measured on a continuous scale. However, the measurement
structure of these abilities (binary versus continuous) and the degree to which the two are
related still needs to be confirmed. Furthermore, it may be that lAP is merely a side-effect
of singing ability, musical engagement, or formal musical training. The relationship
between lAP and musical sophistication thus requires clarification. We therefore
developed a comprehensive test battery for measuring oAP and lAP in musicians and
non-musicians to address the aforementioned questions. 104 musician and non-musician
participants were tested on five oAP and three lAP pitch production and perception tests,
as well as three subscales of the Goldsmiths Musical Sophistication Index self-report
inventory. In a preliminary analysis, Gaussian mixture modelling showed oAP scores to
be bimodal and lAP to be unimodally distributed. Variable selection for cluster
discrimination and exploratory factor analysis suggested different pitch production tests
as the most efficient measures of latent oAP and lAP abilities. A point-biserial correlation
indicated a relationship between overall oAP and lAP scores, but this relationship was not
found when participants with and without oAP were analysed as separate groups. There
was no significant correlation between lAP scores and active engagement, musical
training or singing ability. These results support previous findings that oAP is a binary
ability and indicate that lAP is a continuously expressed ability which is distinct from
oAP. Results further show that lAP is not a mere side-effect of musical sophistication.
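The Gaussian mixture check for bimodality described above can be sketched with a small EM fit and a BIC comparison. This is a minimal illustration on synthetic scores; the score distributions, the deterministic initialisation, and the BIC criterion are assumptions for the sketch, not the authors' analysis code.

```python
import numpy as np

def fit_gmm_1d(x, k, n_iter=200):
    """Fit a k-component 1-D Gaussian mixture by EM and return its BIC."""
    n = len(x)
    # Deterministic initialisation: spread the means across sample quantiles
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)

    def densities():
        return w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

    for _ in range(n_iter):
        dens = densities()
        resp = dens / dens.sum(axis=1, keepdims=True)   # E-step: responsibilities
        nk = resp.sum(axis=0)                           # M-step: re-estimate params
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)

    loglik = np.log(densities().sum(axis=1)).sum()
    n_params = 3 * k - 1            # k means, k variances, k - 1 free weights
    return n_params * np.log(n) - 2 * loglik

# Synthetic oAP-like scores: a large mode near chance, a smaller mode near ceiling
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(10, 5, 80), rng.normal(90, 5, 24)])
print(fit_gmm_1d(scores, 2) < fit_gmm_1d(scores, 1))  # two components should win here
```

On a clearly bimodal sample such as this, the two-component model attains a lower (better) BIC than a single Gaussian; a unimodal lAP-like distribution would not show that advantage.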
J.S. Bach, widely acknowledged as the greatest master of tonal modulation, wrote works that elude
the traditional triad-based analyses more applicable to later eras. Considerable disparities
exist among theorists in the analyses of the same piece, resulting in areas of tonal
ambiguity. But what if there were a way to remove such ambiguities and clearly and
objectively identify both by ear and eye the tonic and mode of each consecutive key
throughout a piece by Bach? This lecture demonstration explores a cognitively based
approach to modulation called heptachord shift in which the perception of the tonic and
mode of each key is based on the mind’s unconscious ability to bind together through
time the most recently heard seven diatonic pitches to form an unbroken chain of
objectively perceptible tonal heptachords from the beginning to the end of a piece of
Bach, allowing us to know the specific degree ‘meaning’ of each note, and echoing how
we comprehend language in real time— one word at a time. The Prelude in D major,
BWV 850 acts as a model.
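The core computation behind heptachord shift — binding the seven most recently heard distinct pitch classes and reading off the diatonic collection they form — can be sketched as follows. The function names, the recency-window rule, and the major-scale labelling are illustrative assumptions, not the presenter's method.

```python
# Sketch of the "heptachord shift" idea: track the seven most recently heard
# distinct pitch classes and, when they form a diatonic heptachord, report
# which major-scale collection they spell.
MAJOR = frozenset({0, 2, 4, 5, 7, 9, 11})  # C major pitch classes
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def diatonic_root(pcs):
    """Return the major-scale tonic of a 7-note pitch-class set, or None."""
    pcs = frozenset(p % 12 for p in pcs)
    if len(pcs) != 7:
        return None
    for t in range(12):
        if pcs == frozenset((p + t) % 12 for p in MAJOR):
            return NAMES[t]
    return None

def track_keys(melody):
    """Slide over a melody (as pitch classes), reporting the key implied by the
    window of the seven most recently heard distinct pitch classes."""
    recent, keys = [], []
    for pc in melody:
        if pc in recent:
            recent.remove(pc)       # refresh recency
        recent.append(pc)
        keys.append(diatonic_root(recent[-7:]))
    return keys

# D major scale: once all seven notes are heard, the tracker locks onto D
d_major = [2, 4, 6, 7, 9, 11, 1, 2]
print(track_keys(d_major)[-1])  # -> "D"
```

A modulation would appear in this sketch as the window's seven pitch classes shifting to a different diatonic transposition, at which point the reported root changes.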
Twenty-five participants identified melodies as “in tune” or “out of tune” and rated their
confidence for each answer. The pitch manipulation consisted of the enlargement of an
interval in 5 cent steps (from 0 to 50 cent deviation). The interval deviated was either a
Major 2nd or a Perfect 4th and occurred in the middle or end of a 6-tone melody. The task
was run twice, before and after an explicit definition of the two labels. Repeated-measures
ANOVAs were conducted to examine the effect of the deviation on the proportion of in-
tune answers and on the confidence levels.
For the participants who were able to learn the labels (n = 20), the proportion of in-tune
answers varied greatly according to the amplitude of the deviation and depended on the
size of the interval manipulated. Associated with the confidence level measurement, the
identification data support a categorical perception process. Interestingly, explicitly
learning the labels increased the overall confidence but did not modify drastically the
profile of the categories and the process behind the categorization.
This study suggests that explicit learning is not necessary to develop higher order
categories relative to “correctness”. Nevertheless, such a process seems limited to certain
intervals. Further investigation of other intervals and individual differences seems
promising to better understand the mechanisms underlying music perception.
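The 5-cent enlargement steps map onto frequency ratios via 2^(cents/1200). A small sketch of the continuum (the 440 Hz reference note is illustrative, not taken from the stimuli):

```python
# Enlarging an interval by a deviation in cents: a deviation of r cents
# corresponds to a frequency ratio of 2 ** (r / 1200).
def detune(freq_hz, cents):
    """Raise a frequency by the given deviation in cents."""
    return freq_hz * 2 ** (cents / 1200)

base = 440.0                                        # illustrative reference note
steps = [detune(base, c) for c in range(0, 55, 5)]  # 0 to 50 cents in 5-cent steps
print(round(steps[-1], 2))                          # about 452.89 Hz at +50 cents
```

The 11 values of `steps` mirror the study's deviation range; a 50-cent enlargement is half a semitone, sitting at the boundary between adjacent pitch categories.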
Inspired by the isomorphism between the diatonic scale and the standard pattern, the current experiments took an alternative approach to the question of how pitch and time are processed. The experiments asked whether judgments of how well a rhythm fit a scale could be accounted for by how much the tonally stable tones and the rhythmically stable tones coincided. That is, were the fit judgments higher when the two hierarchies of stability correlated with one another?

II. EXPERIMENT 1
The purpose of this experiment was to measure tonal and rhythmic stability in the seven scales and seven rhythms that are formed by shifting the starting pitch interval or tone duration in Figure 1b. In order to measure the perceived tonal stability of the tones in the scales, Experiment 1a was a probe tone experiment in which each of the probe tones following the scales was judged as to how well it fit into the scale context. In order to measure the perceived rhythmic stability of each of the tones in the rhythms, in Experiment 1b each position of the sequences was accented dynamically and listeners judged how well the accent fit into the rhythm. These judgments were then used in the analysis of the results of Experiment 2, which used all 49 possible combinations of the 7 scales and 7 rhythms in Experiment 1.

A. Method
1) Participants. Forty-five Cornell University students participated in each experiment for course credit or a $5 cash reward. Thirty-one participated in both. In both studies, all but 1 were musically trained. Participants in Experiment 1a had an average of 14 years of musical training; 3 had absolute pitch. Participants in Experiment 1b had an average of 12.8 years of musical training.

2) Stimulus materials. In both experiments, all sequences consisted of 8 tones, forming 7 intervals. They were created using GarageBand and were sounded in piano timbre. In Experiment 1a, the seven scales were constructed by shifting the starting interval on the diatonic pattern in Figure 1b. For example, Scale1 had the intervals (in semitones) 2212221, Scale2 had the intervals 2122212, and so on. Each scale was constructed in both ascending and descending forms, beginning and ending on C. The range of the ascending sequence was C2 to C3, and the range of the descending sequence was C4 to C5 (for details see Krumhansl & Shepard, 1979). The seven scales were followed by a probe tone that was one of the seven tones in the scale, which was played in the range of C3 to C4, between the ranges of the two scale contexts. A baseline trial was also composed for each scale with isochronous rhythm. For Experiment 1b, we constructed seven rhythms by shifting the starting duration on the standard pattern in Figure 1b. For example, Rhythm1 had the durations (in eighth notes) 2212221, Rhythm2 had the durations 2122212, and so on. They were played monotonically on C3. On successive trials, each of the seven temporal positions was dynamically accented; this is called the probe accent. A baseline trial was also composed for each rhythm with monotonic pitch. All rhythms were played twice with a short pause in between.

3) Procedure. Participants were asked to listen to one stimulus at a time and then make their rating. In Experiment 1a, they rated how well the probe tone fits into the scale by moving a slider on a continuous scale from extremely bad fit to extremely good fit. In Experiment 1b, they rated how well the probe accent fits into the rhythm by moving the same slider. In both studies, they first completed four practice trials. The trials were blocked by scale or rhythm. Each block began with the relevant scale or rhythm played in a neutral form, that is, without a probe tone or accent. The neutral form for Experiment 1a was an ascending scale followed by the same scale in descending order, with a short pause in between; the neutral trial for Experiment 1b was a rhythm played twice with a short pause in between. After the neutral trial, they listened to and rated the probes for that particular scale or rhythm, and then moved on to a different scale or rhythm; these were presented in a randomized order. Once they rated all probe trials for the seven scales or rhythms, they listened to and rated each of the 7 baseline trials in a randomized order on two scales: how familiar the scale or rhythm is, and how well-formed the scale or rhythm seems (in other words, whether the scale or rhythm forms a good pattern). Both items were also rated by moving a slider on a continuous scale, from extremely unfamiliar or ill-formed to extremely familiar or well-formed. At the end of the study, they filled out a demographics questionnaire. Each study lasted approximately 30 minutes.

III. EXPERIMENT 2
B. Method
4) Participants. Fifty Cornell University students participated in the experiment for course credit. All but 4 were musically trained. Participants had an average of 11.3 years of musical training; 3 had absolute pitch.

5) Stimulus materials. Forty-nine scale/rhythm pairs were constructed by combining the seven scales and the seven rhythms used in Experiment 1. As Table 1 shows, out of all 49 stimuli, the seven on the diagonal in this table are matched pairs, because both the scale and the rhythm start from the same point in the diatonic pattern in Figure 1b. This means that the scale and the rhythm share the same structure. The rest are mismatched pairs, because the scale and the rhythm do not share the same structure. In addition, the same fourteen baseline trials from Experiment 1 were used again as baseline trials.

6) Procedure. Participants were asked to listen to one stimulus at a time. After each listening, they rated how well the rhythm fits the scale by moving a slider on a continuous scale from extremely bad fit to extremely good fit. First, they completed four practice trials, and then rated the 49 pairs in a randomized order. After filling out a demographics questionnaire, they listened to and rated the 49 pairs for a second time, in a different randomized order. After filling out another demographics questionnaire, they listened to and rated familiarity and well-formedness for each of the 14 baseline trials, presented in a randomized order. The study lasted approximately 30 minutes.
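The seven scales and seven rhythms above are cyclic rotations of the same 2212221 pattern (semitones for scales, eighth notes for rhythms), so the stimulus sets can be sketched as follows; the function names and the MIDI convention (C4 = 60, so C2 = 36) are illustrative.

```python
# The seven scales and seven rhythms are rotations of the diatonic pattern
# 2-2-1-2-2-2-1: semitones for the scales, eighth notes for the rhythms.
DIATONIC = [2, 2, 1, 2, 2, 2, 1]

def rotations(pattern):
    """All cyclic rotations of an interval/duration pattern."""
    return [pattern[i:] + pattern[:i] for i in range(len(pattern))]

def scale_pitches(intervals, start=36):   # start=36: MIDI C2, as in Experiment 1a
    """Turn an interval pattern into the 8 ascending MIDI pitches."""
    pitches = [start]
    for step in intervals:
        pitches.append(pitches[-1] + step)
    return pitches

scales = rotations(DIATONIC)
print(scales[1])                 # Scale2: [2, 1, 2, 2, 2, 1, 2]
print(scale_pitches(scales[0]))  # Scale1 ascending: [36, 38, 40, 41, 43, 45, 47, 48]
```

Every rotation sums to 12 semitones (or 12 eighth notes), so each of the seven scales spans exactly one octave and each of the seven rhythms fills the same cycle length, which is what makes the 49 scale/rhythm pairings directly comparable.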
IV. RESULTS
Data were processed in the following way. All continuous rating scales were coded from -100 to 100, with -100 being extremely bad fit, unfamiliar, or ill-formed, and 100 being extremely good fit, familiar, or well-formed. For Experiment 1a, probe tone ratings from ascending and descending trials were averaged because they correlated highly with each other. Next, individual ratings were averaged across participants because no large effect of musical training background was found in either Experiment 1a or 1b. This way, one judgment rating was obtained for each probe tone in each scale, and one judgment rating was obtained for each probe accent in each rhythm. Probe tone ratings were then correlated with probe accent ratings, which gives a predicted goodness of fit measure for how participants would judge the combined pitch and time pattern in Experiment 2. Table 2 shows this goodness of fit measure from Experiment 1.

Similarly, in Experiment 2, the two judgment ratings for each scale/rhythm pair were averaged because they correlated highly with each other. Individual ratings were also averaged across participants because no large effect of musical training background was found. Table 3 shows the ratings of how well the rhythm fit the scale for each of the 49 pairs from Experiment 2. To determine how information about pitch and time combine perceptually, we correlated the goodness of fit measure from Experiment 1 with the judgment ratings from Experiment 2.

If pitch and time are separable dimensions, then the exact scale/rhythm combination should not matter for the cross-dimension judgment. In other words, the correlation between probe tone ratings and probe accent ratings should not predict the cross-dimension judgment ratings. Thus, the expected correlation between the goodness of fit measure and the judgment ratings would be zero if the two dimensions are processed separately. On the other hand, if the correlation is not zero and is significant, then it means that pitch and time are not separable dimensions and that they interact in perception.

Results show a positive and significant correlation between the goodness of fit measure and judgment ratings, r(49) = .65, p < .0001. This suggests that pitch and time interact in the perception of music. They interact in such a way that listeners prefer the higher-rated tones to be played on higher-rated temporal positions.

However, listeners in Experiment 2 reported being much more familiar with the major diatonic scale than the others. Consequently, we computed the residuals of listeners' judgments after taking out the effect of familiarity. A correlation analysis was conducted to assess the judgments against the goodness of fit measure. Results remained positive and significant, r(49) = .51, p < .001. This suggests that after taking the familiarity bias into account, listeners still preferred the higher-rated tones to be played on higher-rated temporal positions.

In addition, the judgment ratings were examined against the surface-level structural match between pitch interval and tone duration. As Table 1 shows, all matched pairs were coded as 1 and the rest as 0. Correlation between judgment and surface-level match was not significant, r(49) = .05, p = .75. The surface-level match was also coded in two other ways. One way was to count the number of times the pitch interval and the time interval matched. The other was how many positions matched before the mismatch. Neither of the codings of surface-level match correlated significantly with the judgments of how well the rhythm matched the scale; for the first coding, r(49) = .03, p = .86; for the second coding, r(49) = .17, p = .23. This suggests that the surface-level structural match does not predict judgments of cross-dimension fit. Instead, it is the match between the underlying tonal stability and rhythmic stability of the tones that predicts judgments of fit.

V. CONCLUSION
The current experiments explored the relationship between musical pitch and time by focusing on the isomorphism between pitch interval and tone duration. Specifically, scales and rhythms in the diatonic pattern were used as stimuli. Findings suggest that the surface-level structural match did not predict judgments of the cross-dimension fit. Instead, the correlation between the two probe ratings, measuring tonal stability and rhythmic stability, predicted judgments of how well the rhythm fit the scale. The ratings were higher when the higher-rated tones were played on the higher-rated temporal positions in the probe experiments. This suggests that listeners' cross-dimension judgments were governed by their preference for the best-fitting tones in the diatonic scales to be played on the best-fitting temporal locations in the standard pattern. This finding shows that pitch and time are not two separable dimensions. Instead, they interact when joined together.

Figure 1. a) Isomorphic structure shared by the pitch intervals of the diatonic scale and the tone durations of the standard pattern. b) Illustration of the asymmetrical and cyclic Diatonic Pattern.
Table 1. Surface-level match coding for the 49 scale/rhythm pairs (matched pairs on the diagonal coded as 1, mismatched pairs as 0)

          Scale1  Scale2  Scale3  Scale4  Scale5  Scale6  Scale7
Rhythm1     1       0       0       0       0       0       0
Rhythm2     0       1       0       0       0       0       0
Rhythm3     0       0       1       0       0       0       0
Rhythm4     0       0       0       1       0       0       0
Rhythm5     0       0       0       0       1       0       0
Rhythm6     0       0       0       0       0       1       0
Rhythm7     0       0       0       0       0       0       1
Table 2. Goodness of fit measure constructed by correlating probe tone ratings and probe accent ratings
Note. The lowest possible value is -100, meaning extremely bad fit; the highest possible value is 100, meaning extremely good fit.
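The goodness-of-fit measure in Table 2 is simply the correlation between the seven probe tone ratings for a scale and the seven probe accent ratings for a rhythm. A minimal sketch with invented rating vectors (these numbers are not the experiment's data):

```python
import numpy as np

# Predicted goodness of fit for one scale/rhythm pair: the correlation between
# the probe tone ratings (tonal stability) and the probe accent ratings
# (rhythmic stability). The rating vectors below are invented for illustration.
def goodness_of_fit(tone_ratings, accent_ratings):
    return np.corrcoef(tone_ratings, accent_ratings)[0, 1]

tone = np.array([80, 20, 40, 30, 70, 25, 35])    # hypothetical probe tone ratings
accent = np.array([75, 15, 45, 20, 65, 30, 25])  # hypothetical probe accent ratings
print(goodness_of_fit(tone, accent))  # near 1 when the two hierarchies coincide
```

Repeating this for all 49 scale/rhythm pairs yields the predictor that is then correlated with the Experiment 2 judgments, as in the reported r(49) = .65 result.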
ACKNOWLEDGMENT
We would like to thank Cognitive Science at Cornell
University for the Travel Grant they provided for Olivia Wen
in support for her attendance of this conference. We would
also like to thank Matthew Shortell for making stimuli and
collecting data for Experiment 1.
REFERENCES
Agawu, K. (2006). Structural analysis or cultural analysis?
Competing perspectives on the “Standard Pattern” of West
African Rhythm. Journal of the American Musicological Society,
59, 1–46.
Browne, R. (1981). Tonal implications of the diatonic set. In Theory
Only, 5, 3–21.
Clough, J., & Douthett, J. (1991). Maximally Even Sets. Journal of
Music Theory, 35, 93–173.
Ellis, R. J., & Jones, M. R. (2009). The role of accent salience and
joint accent structure in meter perception. Journal of
Experimental Psychology: Human Perception and Performance,
35, 264–280.
Iyer, V., Bilmes, J., Wright, M., & Wessel, D. (1997). A novel
representation for rhythmic structure. In Proceedings of the 23rd
International Computer Music Conference, 97–100.
Jones, M. R., Moynihan, H., Mackenzie, N., & Puente, J. (2002).
Temporal aspects of stimulus-driven attending in dynamic arrays.
Psychological Science, 13, 313-319.
Krumhansl, C. L. (2000). Rhythm and pitch in music cognition.
Psychological Bulletin, 126, 159–179.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the
hierarchy of tonal functions within a diatonic context. Journal of
Experimental Psychology: Human Perception and Performance,
5, 579–594.
Monahan, C. B., & Carterette, E. C. (1985). Pitch and duration as
determinants of musical space. Music Perception, 3, 1–32.
Palmer, C., & Krumhansl, C. L. (1987a). Independent temporal and
pitch structures in determination of musical phrases. Journal of
Experimental Psychology: Human Perception and Performance,
13, 116–126.
Palmer, C., & Krumhansl, C. L. (1987b). Pitch and temporal
contributions to musical phrase perception: Effects of harmony,
performance timing, and familiarity. Perception &
Psychophysics, 41, 505–518.
Peretz, I. & Kolinsky, R. (1993). Boundaries of separability between
melody and rhythm in music discrimination: A
neuropsychological perspective. The Quarterly Journal of
Experimental Psychology, 46A, 301–325.
Pressing, J. (1983). Cognitive isomorphisms between pitch and
rhythm in world musics: West Africa, the Balkans and Western
Tonality. Studies in Music, 17, 38–61.
Prince, J. B., & Schmuckler, M. A. (2014). The tonal-metric
hierarchy: A corpus analysis. Music Perception, 31, 254–270.
Prince, J. B., Thompson, W. F., & Schmuckler, M. A. (2009). Pitch
and time, tonality and meter: How do musical dimensions
combine? Journal of Experimental Psychology: Human
Perception and Performance, 35, 1598–1617.
Rahn, J. (1987). Asymmetrical ostinatos in sub-Saharan music: Time,
pitch, and cycles reconsidered. In Theory Only, 9, no.7: 23–36.
Temperley, D. (2000). Meter and grouping in African music: A
review from music theory. Ethnomusicology, 44, 65–96.
The study reported here capitalized on the high prevalence of absolute pitch (AP) among
students in the Shanghai Conservatory of Music (see Deutsch, Li, and Shen, JASA 2013,
134, 3853-3859) to study the effects of timbre on note naming accuracy. The subjects
were 71 students at the Shanghai Conservatory of Music; roughly half the subjects were
piano majors, the other half being string majors.
The subjects were administered a test which employed the same notes as were employed
in earlier studies by Deutsch et al. (Deutsch et al., 2006; Deutsch et al., 2009). They were
presented successively with the 36 notes that spanned the three-octave range from C3
(131 Hz) to B5 (988 Hz) and wrote down the name of each note (C, C#, D, etc.) when
they heard it. A difference between this and the earlier studies is that we used the 36
notes twice, in two timbres: one a piano timbre and the other a string timbre. (More
specifically, the string timbre consisted of two timbres: the pitches below G3 were in a
viola timbre, and those above G3 were in a violin timbre.) The piano and string timbres
were tested separately and contained the same pitches but in different
orderings. Table 1 displays the number of subjects in each condition, their majors, and the
different orders (piano timbre vs. string timbre) in which the tones were presented.
Table 1. Number of subjects who were piano and string majors and the presentation
orders (piano vs. string timbre)
                 Group 1 (P-S)   Group 2 (S-P)   Total
Piano majors          13               9           22
String majors         27              22           49
Total                 40              31           71

Note. Group 1 heard the piano timbre, completed the questionnaire, then heard the string timbre; Group 2 heard the string timbre, completed the questionnaire, then the piano timbre.
The subjects were piano majors and string majors, and those in each instrument group
were divided randomly into two subgroups. One subgroup listened to notes in the piano
timbre first, and to notes in the string timbre second; the other subgroup heard the piano
and string timbres in reverse order. The subjects also completed a questionnaire
concerning their personal information. As in the earlier study (Deutsch, Li, and Shen,
2013), given the low performance level for the age-of-onset group ≥ 10, the analyses in this
study were carried out using only those subjects (n = 71) who had begun musical
training at age ≤ 9.
Overall performance levels were very high; the subjects scored 73% correct, not allowing
for semitone errors, and 80% correct allowing for semitone errors.
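"Allowing for semitone errors" can be read as counting any response within one semitone (mod 12) of the target as correct. The scoring function below is an assumed reading of that criterion, not the study's actual code; the example note names are illustrative.

```python
# Score a note-naming test both strictly and "allowing for semitone errors",
# i.e. counting responses within one semitone (mod 12) of the target as correct.
NOTE_TO_PC = {n: i for i, n in enumerate(
    ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"])}

def score(targets, responses, allow_semitone=False):
    """Percent correct over (target, response) note-name pairs."""
    hits = 0
    for t, r in zip(targets, responses):
        dist = abs(NOTE_TO_PC[t] - NOTE_TO_PC[r]) % 12
        dist = min(dist, 12 - dist)          # circular pitch-class distance
        if dist == 0 or (allow_semitone and dist == 1):
            hits += 1
    return 100 * hits / len(targets)

targets = ["C", "E", "G", "A#", "D"]
responses = ["C", "F", "G", "B", "D"]          # two one-semitone errors
print(score(targets, responses))               # -> 60.0
print(score(targets, responses, allow_semitone=True))  # -> 100.0
```

The circular distance matters at the octave boundary: B and C are one semitone apart even though their pitch-class indices differ by 11.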
First, we evaluated the basic difference in the average percentage correct between the
piano and string timbres. To examine this difference, the subject population was divided
into two subgroups: Group 1 heard the piano timbre first, and Group 2 heard the string timbre
first. We compared the percentage correct for piano timbre in Group 1 and string timbre
in Group 2. It was found that the percentage correct for piano timbre (on average, 76.8%
correct not allowing for semitone errors, and 84.15% correct allowing for semitone
errors) was higher than for string timbre (69.25% correct not allowing for semitone
errors, and 77.23% correct allowing for semitone errors), and the difference between
piano and string timbre was highly significant (semitone errors not allowed: p < .02;
semitone errors allowed: p < .01).
Second, we compared the average percent correct for each timbre separately, depending
on whether it was heard first or second. The percentage correct for the piano timbre when
it was heard first (78.82% correct not allowing for semitone errors, and 86.46% correct
allowing for semitone errors) was a little higher than when it was heard following the
string timbre (74.19% correct not allowing for semitone errors, and 81.18% correct
allowing for semitone errors). However, the percentage correct for the string timbre was
higher when it was heard following the piano timbre (77.43% correct not allowing for
semitone errors, and 85.76% correct allowing for semitone errors) than when it was
presented first (58.69% correct not allowing for semitone errors, and 66.22% correct
allowing for semitone errors).
The difference in percentage correct between subjects from the different majors was also
examined for each timbre. In most cases, the performance of the piano majors (75.69%
correct not allowing for semitone errors, and 82.51% correct allowing for semitone
errors) was slightly higher than that of the string majors (71.83% correct not allowing for
semitone errors, and 79.88% correct allowing for semitone errors); however, this
difference was nonsignificant (p > 0.1).
Generally, the performance of all the subgroups was higher when they were judging
pitches in the piano timbre than in the string timbre. Further, after listening to the piano
timbre, naming the pitches in the string timbre was significantly more accurate. Yet after
listening to the string timbre, naming the pitches in the piano timbre was less accurate,
though not significantly so. Also, the performance of the piano majors was higher than
that of the string majors, though not significantly so. Given this pattern of results, we
conclude that the difference in ability to identify pitches in these different timbres found
here was not due, in general, to a difference in long term experience with these two types
of timbre, but rather to the characteristics of these timbres, and to the context in which
they were presented.
Within a harmonic context, pitch relationships between notes are governed by syntactic
rules that instantiate the perception of tonal structure. Krumhansl and colleagues showed
that behavioral ratings can be used to derive geometric models of mental representations
of musical pitch. Individual pitches are mapped onto a continuum of stability depending
on their relationship to a tonal context, and this context determines the perceived
similarity between pitches. The current study examined the mental representation of pitch
at a neural level using Multivariate Pattern Analysis applied to Magnetoencephalography
data, recorded from musically trained subjects. On each trial, one of four harmonic
contexts was followed by one of four single probe notes. Two probe notes were diatonic
(tonic and dominant) and two were nondiatonic. A machine learning classifier was
trained and tested on its ability to predict the stimulus from neural activity at each
time-point. We found that pitch-class was decodable from 150 ms post-onset. An analysis of
the decodability of individual pairs of pitches revealed that even two physically identical
pitches preceded by two distantly related tonal contexts could be decoded by the
classifier, suggesting that effects were driven by tonal schema rather than acoustical
differences. Characterizing the dissimilarity in activity patterns for each pair of pitches,
we uncovered an underlying neural representational structure in which: a) diatonic tones
are dissimilar from nondiatonic tones, b) diatonic tones are moderately dissimilar to other
diatonic tones, and c) nondiatonic tones are closely related to other nondiatonic tones.
Dissimilarities in neural activity were significantly correlated with differences in
goodness-of-fit ratings, indicating that psychological measures of pitch perception honor
the underlying neural population code. We discuss the implications of our work for
understanding the neuroscientific foundations of musical pitch.
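The time-resolved decoding analysis described above can be sketched as training and testing a classifier independently at each time-point. Here a leave-one-out nearest-centroid decoder on synthetic data stands in for the study's (unspecified) classifier; the data shapes and effect timing are illustrative assumptions.

```python
import numpy as np

# Time-resolved decoding sketch: at each time-point, train a classifier to
# predict the stimulus from the sensor pattern and score it with
# leave-one-out cross-validation. Nearest-centroid stands in for the
# study's classifier; the data below are synthetic.
def decode_timecourse(X, y):
    """X: (trials, sensors, timepoints); y: stimulus labels.
    Returns leave-one-out decoding accuracy at each time-point."""
    n_trials, _, n_times = X.shape
    labels = np.unique(y)
    acc = np.zeros(n_times)
    for t in range(n_times):
        correct = 0
        for i in range(n_trials):            # leave-one-out cross-validation
            train = np.arange(n_trials) != i
            centroids = [X[train & (y == c), :, t].mean(axis=0) for c in labels]
            d = [np.linalg.norm(X[i, :, t] - c) for c in centroids]
            correct += labels[int(np.argmin(d))] == y[i]
        acc[t] = correct / n_trials
    return acc

# Synthetic example: class separation appears only in the second half of the epoch
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 30, 10))
X[y == 1, :, 5:] += 1.5                      # "signal" from time-point 5 onward
acc = decode_timecourse(X, y)
print(acc[:5].mean(), acc[5:].mean())        # near chance early, high late
```

Pairwise decodability, as in the study's analysis of physically identical probes in different tonal contexts, would simply restrict `y` to the two conditions of interest.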
The topics of expectation and prediction have received considerable attention in music
cognition (e.g., Huron, 2006; Pearce, 2005; Temperley, 2007) and visual perception (e.g.,
Bar, 2007; Summerfield & Egner, 2009; Fiser & Aslin, 2002), but few studies have
examined the relative surprise that can result from veridical expectations, resulting from
exact repetitions of material in the local context, versus schematic expectations, acquired
through prior exposure. It is also unclear whether the underlying expectation mechanisms
are modality-specific or more general. To address these questions, we aimed to test
different types of expectation (veridical and schematic) by systematically varying the
predictability of artificially constructed stimuli, and compared expectation processing
across domains. Every trial consisted of a sequence of tones or visual elements, and
participants rated the expectedness of a target (the final event in the sequence). The target
either fulfilled or violated expectations set up by the preceding context, which varied in
predictability. In both modalities, contexts yielding specific veridical expectations led to
more extreme un/expectedness ratings than contexts producing general schematic
expectations, and non-repetitive/less predictable stimuli produced the least surprise from
an improbable target. In a direct extension of these findings, work in progress presents
visual and musical information concurrently to address the extent to which veridical
and schematic expectations from one modality impact expectations within the other
modality. Although different information contributes to the degree of predictability of
stimuli across modalities, we conclude that expectations in both music and vision are
driven by domain general processing mechanisms that are based on the degree of
specificity and predictability of the prior context.
The study of music perception has received considerable attention from the scholarly
community. Models of music perception have been devised, and several test batteries
created and used, time and again, to investigate how humans make sense of the musical
sounds found in their surroundings. In spite of the popularity of many studies and tests of
music perception, discussions concerning their underlying epistemological assumptions,
measurement properties and standardization are still rare. The aim of this symposium is
to revisit music perception research from the perspective of psychometrics. As a
contribution to the construction of critical thinking in this area, the participants of this
symposium will share findings from three research projects concerned with the
measurement of music perception.
Following a brief introduction of the symposium theme and presenters, the first
presentation focuses on the construction of a new model for testing absolute pitch. The
second presentation describes an experimental study on active and receptive melodic
perception in musicians that is being carried out in Brazil. Finally, the third presentation
brings forward the issue of construct validity in music perception research, following an
analysis of 163 studies. The discussant will make some comments and then there will be
time for discussion with audience members.
as needed to identify a note, or whether they are allowed semitone errors) can make it very difficult, or even impossible, to distinguish AP possessors from non-possessors, since so many answers are considered indicative of the cognitive ability.

Pondering the definition of AP is crucial to a scientific approach to the phenomenon. Without a proper approach, it is not even possible to claim that the dichotomous theoretical model commonly adopted to represent AP is in fact adequate. Observing current research, it is possible to enumerate criteria used to address the AP ability which do not fit a dichotomous model, although this model is usually presupposed in the given definition of the cognitive trait. Instead, these authors should have evaluated alternative models for representing AP (e.g., whether the phenomenon is better represented by a continuous model or even a hybrid model).

The previous paragraph points to the need for deeper thinking regarding the definition of AP. In fact, many papers define AP generically, while others only quote definitions presented by reference authors, usually without any consideration of how these definitions correspond to the AP perspective discussed in those papers.

A better comprehension of the AP phenomenon can begin to take form by observing the consensual criteria identified in the AP definitions of different authors. First, AP possessors are capable of identifying pitches immediately, or almost immediately. Second, AP possessors are capable of associating pitches with verbal labels, an association stored in the subject's long-term memory; that is, a specific label (usually the name of one of the musical tones) is learned by the subject in association with a specific pitch class. Although these associations are considered standardized by many, it is well known that some subjects (e.g., musicians who play transposing instruments or who studied on an out-of-tune piano) usually show different patterns of association than most musicians.

Nonetheless, a vast number of studies highlight that AP possessors are rarely infallible; that is, most of them show a few (or many) limitations, such as:
1) A margin of error of a semitone in pitch evaluation (i.e., half-tone errors are recurrent);
2) Difficulty in (or even incapacity of) singing a note without an external reference;
3) Limitations to certain registers;
4) Limitations to certain timbres.

If many authors discuss these criteria, the question remains as to how the scientific community should cope with these variables. For example, what percentage of correct answers must a subject achieve to be considered an AP possessor? On which timbres should an AP possessor be capable of identifying pitches? If a subject only recognizes pitches in the medium register, should he be considered an AP possessor? If the subject cannot sing a demanded pitch without an external reference, can he be considered an AP possessor?

Considering the above, one might regard the cognitive trait named AP as heterogeneous according to the subject's abilities. This leads to the hypothesis (already raised by previous researchers, such as Bachem in 1937) that there could be different types of AP possessors.

III. MODEL EVALUATION

To study any latent psychological phenomenon (i.e., one that cannot be measured directly, such as AP), the identification of a set of structured, consistent, observable criteria based on evidence is essential. From a scientific perspective, it is of primary importance that these elaborated criteria can be tested in search of evidence supporting the theoretical model elected. It is through this experimental testing that the model can be rejected or not, allowing its enhancement.

Most AP studies adopt experimental testing as their core methodology, in order to measure different patterns of subject response in relation to parameter variation (such as register, timbre, the time necessary for pitch identification, or the proportion of correct/wrong answers). However, as highlighted in the previous section, the lack of consideration of the correspondence between theoretical model and empirical data makes the process of knowledge acquisition extremely difficult.

If it is unknown how well the proposed set of criteria defining the AP phenomenon fits the reality of AP possessors and their abilities, then solving this matter should be the first phase of experimental research. In fact, the borderline that separates AP possessors from non-possessors has not been consensually defined, taking into account that both exhibit limitations in pitch identification. Considering that many studies adopt, as the starting point of their methodology, the segregation of AP possessors from non-possessors, this problem becomes of great importance.

Consequently, the scientific community should find a way to cope with the basic indicators used to classify AP, defining them properly and establishing cut-off points with accuracy rates, in order to separate those who have AP from those who do not. This demands experiments specifically designed to test basic hypotheses about the AP ability, based both on evidence from previous empirical observations and on the logical coherence provided by theoretical foundations. Only when this basic step is achieved will research dedicated to more specific aspects of the AP ability have solid ground to elaborate on. As pointed out before, without the standardization of basic criteria within the community dedicated to AP studies, it is not possible to know whether different studies are in fact evaluating the same cognitive ability, much less to compare the information resulting from these experiments.

A possible solution to this problem is the creation of a reproducible model for AP categorization using standardized criteria and construct validity evaluation as conducted in the medical area, through guidelines stated in sources such as the Diagnostic and Statistical Manual of Mental Disorders or the Composite International Diagnostic Interview. The resulting theoretical model could then be tested using procedures from a field of statistics known as structural equation modelling, focused specifically on testing theoretical models.

IV. ABSOLUTE PITCH INDICATORS

What are the criteria necessary to define AP? Considering a designed set of criteria, are they capable of adequately explaining the latent psychological trait?
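To make the stakes of these criteria concrete, the sketch below (our illustration, not part of the original study) shows how the choice of scoring rule and cut-off discussed above changes who counts as an AP possessor; the trial data and the 90% cut-off are hypothetical.

```python
def score_trial(response, target, tolerate_semitone=False):
    """Score one pitch-naming trial (pitch classes coded 0-11).
    With tolerance, an answer one semitone off still counts as correct."""
    dist = min((response - target) % 12, (target - response) % 12)
    return dist <= 1 if tolerate_semitone else dist == 0

def classify(responses, targets, cutoff=0.9, tolerate_semitone=False):
    """Label a subject an 'AP possessor' if accuracy reaches the cut-off."""
    hits = sum(score_trial(r, t, tolerate_semitone)
               for r, t in zip(responses, targets))
    return hits / len(targets) >= cutoff

# A subject who consistently answers one semitone flat, e.g., someone
# trained on a piano tuned a half tone below A=440 Hz:
targets = [0, 4, 7, 2, 9, 11, 5, 3, 8, 10]
responses = [(t - 1) % 12 for t in targets]

print(classify(responses, targets))                          # False
print(classify(responses, targets, tolerate_semitone=True))  # True
```

The same response pattern is classified as non-possession under strict scoring and as possession under semitone-tolerant scoring, which is exactly the ambiguity the text describes.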
From a scientific perspective, it is important that the designed criteria can be tested in order to provide evidence supporting or falsifying the core theoretical model adopted, allowing its enhancement.

Based on the bibliography surveyed, more specifically the works discussed in Germano (2015), we propose a preliminary hypothetical model (construct) for AP:

Figure 1. Preliminary hypothetical model for AP.

This model is a starting point for future research. Therefore, it is open to modifications and corrections in light of the results of the preliminary test that will evaluate this hypothetical proposition.

From a theoretical perspective, the proposed model considers AP a dichotomous phenomenon (to possess or not to possess) and embodies the following indicators:

1) Pitch labeling: the ability to keep long-term stable representations of specific pitches in memory and to access them when needed, associating them with learned verbal labels (Levitin, 1994).

2) Time of identification: according to the AP literature, the identification of stimuli by AP possessors is immediate. Our question is: what is immediate? How many seconds does the subject need to identify a tone?

3 and 4) Number of timbres and registers: these items measure the subject's capacity to identify pitches along two parameters, describing how many timbres and registers a subject is capable of recognizing (from identification in just one timbre and one register to all timbres and their respective registers).

5) Half-tone errors: how many half-tone errors does an AP possessor make? The lowest degree of accuracy would be 100% half-tone errors and the highest would be 0%. However, it should be asked whether a large number of half-tone errors is actually caused by the subject's mistakes, or whether it is due to a different memorization reference. As pointed out before, many studies highlight that AP possessors acquire pitch memory according to the reference presented to them in their first years of study. For example, if a child possessing AP grows up studying on an out-of-tune piano (tuned a half tone below the A=440 Hz standard), this child will certainly make consistent half-tone errors in tests, although the variation in errors would possibly be near zero (Levitin & Rogers, 2005). Due to this fact, Ward considers that the best AP possessor is not the one who makes fewer mistakes, but the one who exhibits less variation in answers (Ward, 1999).

Although we have adopted a dichotomous perspective, it is important to emphasize that the proposed indicators can collectively provide a mapping of the cognitive abilities of AP possessors in all their diverse capacities and limitations, thereby providing a model believed to better fit the reality of AP possessors' abilities. Consequently, this hypothetical model can provide a solid perspective on the AP phenomenon. Once tested, it can supply solid evidence indicating whether AP is better described as a continuous ability (along a continuous line from minimal to maximal pitch identification) or as a hybrid ability (with the possible verification of different AP types, as already proposed by Bachem).

V. CONCLUSION

The primary purpose of this paper was to propose a set of indicators to assess what AP is, so that the resulting theoretical model can be tested in a future research phase. The elaboration of this model started by observing the consensual criteria present in the AP definitions of different authors.

Based on the information provided, the scientific question of what the AP ability in fact is, and how it has been measured, remains open in research to date. We highlighted the main non-consensual AP characteristics, which show that AP possessors commonly exhibit some kind of limitation (be it regarding timbre, register, or half-tone errors). We also questioned the common (undiscussed) view that AP is a dichotomous cognitive trait.

Based on the bibliography surveyed, more specifically the works discussed in Germano (2015), we proposed a preliminary hypothetical model (construct) for AP. First, it will be evaluated, leading to modifications and corrections according to the results of the preliminary test aimed at testing this hypothetical proposition. As mentioned before, the elaboration of a well-formulated (testable) model for AP categorization, formed by a coherent set of criteria, is an essential step for AP researchers. It is only by testing it that the scientific community will be able to evaluate the capacity of the model to fit reality, i.e., to map the cognitive abilities of AP possessors in all their diverse capacities and limitations.

REFERENCES

Athos, E. A., Levinson, B., Kistler, A., Zemansky, J., Bostrom, A., Freimer, N., & Gitschier, J. (2007). Dichotomy and perceptual distortions in absolute pitch ability. Proceedings of the National Academy of Sciences, 104(37), 14795-14800.
Abraham, O. (1901). Das absolute Tonbewusstsein. Psychologisch-musikalische Studie. Sammelbände der internationalen Musikgesellschaft, 3, 1-86.
Bachem, A. (1937). Various types of absolute pitch. Journal of the Acoustical Society of America, 9, 146-151.
Baggaley, J. (1974). Measurement of absolute pitch. Psychology of Music, 2(2), 11-17.
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998). Absolute pitch: An approach for identification of genetic and nongenetic components. The American Journal of Human Genetics, 62(2), 224-231.
as to receptive melodic performance (transcription of dictated tonal melodies).

II. EVALUATION AND ASSESSMENT: CREATING A BATTERY AND ITS UNDERLYING MODEL

The second part of the present research consists in designing a model (construct) for the assessment of active tonal melodic perception (solfege), and in giving criteria for the evaluation of such ability. The questions asked are: a) How do we grade music students on tonal sight-singing? b) Which items/indicators (criteria) should compose a test?

Item Response Theory (IRT), especially as described by Lord (1952) and Rasch (1960), is used here. "Performance is the effect, and latent models are the cause." … "Subjects with greater ability will have the highest probability of responding correctly to the item, and vice-versa" (Pasquali, 2003, 82-83).

A theoretical model has been conceived over several years of experience in ear-training classes and the evaluation of students at college level, as well as from the existing literature. The hypothetical model created here (Figure 1) refers to the melodic perception of humans in general. The model of Figure 3 refers to the items conceived here to address the Sight-Reading Trait underlying the melodic perception of classical music students.

Figure 3. Items that compose the test for evaluating the Sight-Reading Trait of classical music students. (* Pitch accuracy includes the ability of labelling, be it with syllables, numbers, or otherwise.)

Hence, the latent model for this trait can be viewed in the following table, where the rows describe the items used for the test and the columns refer to the amount of ability on each item, according to a five-point Likert scale:
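The IRT claim quoted above (greater ability, greater probability of a correct response) can be sketched with the one-parameter Rasch model; the ability and difficulty values below are illustrative only, not estimates from this study.

```python
import math

def rasch_p(theta, b):
    """Rasch (1960) one-parameter model: probability that a subject of
    ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Two subjects facing the same item (difficulty b = 0):
p_low  = rasch_p(theta=-1.0, b=0.0)  # lower-ability subject
p_high = rasch_p(theta=+1.0, b=0.0)  # higher-ability subject
print(round(p_low, 3), round(p_high, 3))  # 0.269 0.731
```

The monotone relation between ability and success probability is exactly the "greater ability, greater probability" assumption quoted from Pasquali (2003).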
For the active melodic perception trait, the same 150 students were evaluated by one of three judges, according to 4 items under a five-point Likert structure, considering 10 passages. The judges were blinded to the students' performance, receiving only a recording with a random ID number.

Students are recorded while sight-reading 19 intervals and 10 tonal melodies (different from the receptive test). The intervals are considered either correct, incorrect, or out of tune by the judges.

Out of the 150, 50 students' recordings, chosen at random, were given to all the judges in order to evaluate the agreement among the specialists' ratings in the active evaluation domain. These ratings will be assessed via the weighted kappa coefficient. To evaluate construct (factorial) validity, confirmatory factor analysis (CFA) will be used. The models will be evaluated regarding their fit using the following fit indices and their respective cut-offs: chi-square (χ²), the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), and the root mean square error of approximation (RMSEA).

For the evaluation of melodies, in order to minimize subjectivity, a 'translation' of the Likert scale into musical terms for each criterion was provided to the judges, as follows:

• Intonation and pitch accuracy
One should not expect the kind of precision demanded if singers were being judged by a specialist jury. Instead, it should be considered that students from diverse backgrounds in music, such as instrumentalists, conductors, composers, or music educators, are being evaluated. In other words, the degree of vocal technique expected is not that of a singer, but one should be able to recognize the pitches of the melody sung. For the 5-point Likert scale, one should consider:
A. It is not possible to recognize the melody sung at all.
B. It is possible to recognize the melody, but the intonation drops (or rises) at least a half step.
C. It is possible to recognize the melody, but the intonation drops (or rises) up to a half step.
D. It is possible to recognize the melody, but the intonation drops (or rises) less than a half step.
E. It is possible to recognize the melody fully.

• Tonal sense and memory
Although related to item 1 ("intonation and pitch accuracy"), this item was conceived to approach a different kind of memory: for instance, when an individual gets lost in terms of pitch during the solfege of a tonal or modal melody, or makes errors in decoding pitches, but is still capable of recovering through tonal memory or the pitches just sung. For the 5-point Likert scale, one should consider:
A. It is not possible to recognize the tonality(ies) sung at all.
B. It is possible to recognize only the tonic (and eventually the dominant) of the tonality(ies).
C. It is possible to recognize the tonality(ies), but occasional errors of tones occur.
D. It is possible to recognize the tonality(ies) and diatonic tones, but chromaticisms (chromatic inflexions of diatonic tones) are not realized.
E. It is possible to recognize the tonality(ies) fully, even where subtle instabilities of intonation occur during the solfege.

• Rhythmic precision, regularity of pulse and subdivisions
Understood here as the participants' capacity to decode rhythms and meters, and to maintain the regularity of pulse and subdivisions throughout the exercise. It is not expected, naturally, that the metronomic speed be maintained during the whole solfege; even though changes of speed at the pulse level may occur, they are supported by changes at the subdivision levels. Actually, such a thing is quite desirable, musically, since it denotes control and experience by the participant (in doing ritardandos at the end of a phrase, for instance). For the 5-point Likert scale, one should consider:
A. It is absolutely not possible to discern either the meter(s) or the regularity of pulses and subdivisions.
B. It is possible to discern the meter(s), but subdivisions are inconsistent (for example, a participant executes a ternary subdivision instead of a given binary one, and vice-versa).
C. It is possible to discern the meter(s) and their hierarchical subdivisions, but many inconsistencies (errors) occur in the course of the solfege.
D. It is possible to discern the meter(s) and their hierarchical subdivisions, but occasional inconsistencies occur in the course of the solfege.
E. It is perfectly possible to discern the meter(s) and their hierarchical subdivisions, and, when there are fluctuations of speed, they are supported proportionally by the hierarchical levels of division and subdivision of the beat (rubato, rit.).

• Fluency and musical direction
This refers not only to the participant's capacity to move forward regardless of mistakes, but also to doing so (with or without mistakes) in a way that the musical discourse is clearly expressed. The expression musical discourse should be understood here as the capacity of the student to prevent inadequate accentuations, be they related to meter, rhythm, or harmony, that hinder the full expression of musical prosody (taken metaphorically here in relation to musical accents, since it does not refer to words, but exclusively to music). For the 5-point Likert scale, one should consider:
A. It is not possible to follow the musical discourse.
B. It is possible to follow the musical discourse, but many interruptions occur.
C. It is possible to follow the musical discourse, but tiny non-intentional delays occur (a brief undesired fermata, for example, while the student 'takes time' in order to decode pitches and rhythms).
D. It is possible to follow the musical discourse, but extra accents occur that prevent its total fluency.
E. It is possible to follow the musical discourse fully.

IV. CONCLUSION

Research is in progress. Data collection is finished, as is the evaluation made by the judges. Data are currently being entered into the database system, and will be analyzed in the next few months. Music students have been graded in ear-training tests with no tool (test/scale/battery) showing evidence of construct validity that allows us to measure how good the latent trait of
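As a minimal sketch of the planned agreement analysis, the function below computes a quadratically weighted kappa for two judges' five-point ratings (A to E coded 0 to 4); the rating vectors are invented for illustration, and a published analysis would normally rely on a vetted statistics package.

```python
def weighted_kappa(rater_a, rater_b, n_categories=5):
    """Quadratically weighted Cohen's kappa for two raters' ordinal
    scores coded 0..n_categories-1 (A-E on a 5-point Likert scale)."""
    n = len(rater_a)
    # Observed joint proportions.
    obs = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(rater_a, rater_b):
        obs[a][b] += 1.0 / n
    # Marginal proportions (chance-expected joint is pa[i] * pb[j]).
    pa = [rater_a.count(k) / n for k in range(n_categories)]
    pb = [rater_b.count(k) / n for k in range(n_categories)]
    num = den = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            w = (i - j) ** 2 / (n_categories - 1) ** 2  # quadratic penalty
            num += w * obs[i][j]
            den += w * pa[i] * pb[j]
    return 1.0 - num / den

# Two judges rating ten solfege passages (A=0 ... E=4):
judge1 = [4, 3, 3, 2, 4, 1, 0, 3, 2, 4]
judge2 = [4, 3, 2, 2, 4, 1, 1, 3, 2, 3]
print(round(weighted_kappa(judge1, judge2), 2))  # 0.89
```

The quadratic weighting penalizes an A-versus-E disagreement far more than an adjacent-category one, which suits ordinal Likert ratings like these.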
ACKNOWLEDGMENT
We thank Fapesp Funding Agency at Sao Paulo, Brazil, for
supporting this research.
REFERENCES
Bharucha, J. J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16(4), 485-518.
Bigand, E. & Poulin-Charronnat, B. (2009). Tonal cognition. In S. Hallam, I. Cross & M. Thaut (Eds.), The Oxford Handbook of Music Psychology (pp. 59-71). Oxford: Oxford University Press.
Covington, K. & Lord, C. H. (1994). Epistemology and procedure in aural training: In search of a unification of music cognitive theory. Music Theory Spectrum, 16(2), 159-170.
Edlund, L. (1963). Modus novus. Stockholm: Nordiska musikförlaget.
Edlund, L. (1974). Modus vetus: Sight singing and ear-training in major/minor tonality. Stockholm: Nordiska musikförlaget.
Karpinski, G. S. (2000). Aural skills acquisition. Oxford: Oxford University Press.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. Oxford: Oxford University Press.
Krumhansl, C. L. (2006). Ritmo e altura na cognição musical. In B. S. Ilari (Ed.), Em Busca da Mente Musical: Ensaios sobre os processos cognitivos em música, da percepção à produção (pp. 45-109). Curitiba, PR: Editora UFPR.
Krumhansl, C. L. & Toiviainen, P. (2009). Tonal cognition. In I. Peretz & R. J. Zatorre (Eds.), The Cognitive Neuroscience of Music (pp. 95-108). Oxford: Oxford University Press.
Lerdahl, F. (1988). Tonal pitch space. Music Perception, 5(3), 315-349.
Lerdahl, F. & Jackendoff, R. (1983). A generative theory of tonal music. Cambridge, MA: MIT Press.
Levitin, D. (1994). Absolute memory for musical pitch: Evidence from the production of learned melodies. Perception & Psychophysics, 56(4), 414-423.
Lord, F. M. (1952). A theory of test scores. Psychometric Monograph No. 7. Iowa City, IA: Psychometric Society.
Pasquali, L. (2003). Psicometria: Teoria dos testes na psicologia e educação. São Paulo, SP: Editora Vozes.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research; St. Paul, MN: Assessment Systems Corporation.
Rodgers, M. (1996). The Jersild approach: A sightsinging method from Denmark. College Music Symposium, 36, 149-161. Retrieved April 11, 2016, from http://www.jstor.org/stable/40374290
Tillmann, B., Bharucha, J. J., & Bigand, E. (2000). Implicit learning of tonality: A self-organizing approach. Psychological Review, 107(4), 885-913.
Urbina, S. (2007). Fundamentos da testagem psicológica. São Paulo, SP: Artmed.
Background
Different tasks, batteries, tests, and paradigms are widely used in the experimental branch of research on music perception and music cognition, assessing musical skills (or inabilities, such as amusia). They aim to provide observed indicators of a given underlying ability/inability.
Aims
To conduct a narrative review searching for evidence of the construct validity of the tasks, batteries, and tests commonly used as correlational indicators for biological measures, such as a brain region's thickness, volume, or function, or for other phenotypical behaviors/skills (e.g., attention).
Methods
In the PubMed database, without period restriction, the following terms were searched: "music perception" and "assessment".
Results
Out of 163 studies, fewer than 1% described statistical analysis procedures suited to evaluating the batteries'/tests' construct validity. Even commonly used tools such as Gordon's Musical Aptitude Profile, the Seashore Measures of Musical Talents, and the Montreal Battery of Evaluation of Amusia (MBEA) do not present evidence regarding construct validity, which is assessed by factor analysis techniques (e.g., confirmatory factor analysis; with dichotomous items, Item Response Theory).
Discussion
The impact of robust statistical methods for establishing construct validity is still limited, with the majority of psychological tests still based on classical test theory. This is clearly an important limitation. Normally, the proposed scales, tests, and batteries use Cronbach's alpha as a reliability index; however, this index is only a measure of inter-item covariance per unit of composite variance, and a reasonably good index of reliability only under certain restrictive conditions, which, in a brief review of the tools cited above, have never been assessed to justify the use of Cronbach's alpha. Robust statistical models must be applied to music perception research, providing a better understanding of how to generate reliable, viable, and precise phenotypical measures.
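The point about Cronbach's alpha can be made concrete: it is simply a function of inter-item covariance, computable in a few lines (the score matrix below is fabricated for illustration, not drawn from any of the reviewed studies).

```python
def cronbach_alpha(items):
    """Cronbach's alpha from a subjects x items score matrix:
    alpha = (k/(k-1)) * (1 - sum(item variances) / variance(totals))."""
    k = len(items[0])

    def var(xs):
        # Sample variance (n - 1 denominator).
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in items]) for j in range(k)]
    totals = [sum(row) for row in items]
    return (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

# Four subjects x three items:
scores = [[3, 4, 3],
          [2, 2, 3],
          [5, 5, 4],
          [1, 2, 1]]
print(round(cronbach_alpha(scores), 2))  # 0.95
```

A high alpha like this only indicates internal consistency; as the text argues, it says nothing by itself about whether the items measure the intended construct.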
It has long been noted that expert musicians lengthen notes at phrase boundaries in
expressive performance. Recently, we have extended research on this phenomenon by
showing that undergraduates with no formal musical training and children as young as 3
years lengthen phrase boundaries during self-paced listening to chord sequences in a lab
setting (the musical dwell-time effect). However, the origin of the musical dwell-time
effect is still unknown. Recent work has demonstrated that musicians and non-musicians
are sensitive to entropy in musical sequences, experiencing high-entropy contexts as
more uncertain than low-entropy contexts. Because phrase boundaries tend to afford
high-entropy continuations, thus generating uncertain expectations in the listener, one
possibility is that boundary perception is directly related to entropy. In other words, it
may be hypothesized that entropy underlies the musical dwell-time effect rather than
boundary status per se. The current experiment thus investigates the contributions of
boundary status and predictive uncertainty to the musical dwell time effect by controlling
these usually highly-correlated factors independently. In this procedure, participants self-
pace through short melodies (derived from a corpus of Bach chorales) using a computer
key to control the onset of each successive note with the explicit goal of optimizing their
memorization of the music. Each melody contains a target note that (1) is phrase ending
or beginning and (2) has high or low entropy (as estimated by the Information Dynamics
of Music Model, IDyOM, trained on a large corpus of hymns and folksongs). Data
collection is ongoing. The main analysis will examine whether longer dwelling is
associated with boundary status or entropy. Results from this study will extend recent
work on predictive uncertainty to the timing domain, as well as potentially answer key
questions relating to boundary perception in musical listening.
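The entropy notion invoked above can be illustrated directly: a boundary context affording many plausible continuations has higher Shannon entropy than a mid-phrase context dominated by one expected note. The probability values below are invented for illustration, not IDyOM estimates.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a next-note probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Mid-phrase: one continuation is strongly expected (low entropy).
mid_phrase = [0.85, 0.05, 0.05, 0.05]
# Phrase boundary: several continuations are equally plausible (high entropy).
boundary = [0.25, 0.25, 0.25, 0.25]

print(round(shannon_entropy(mid_phrase), 2))  # 0.85
print(round(shannon_entropy(boundary), 2))    # 2.0
```

Under the hypothesis described above, dwell times should track this entropy difference rather than boundary status itself.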
We are fast learners: the implicit learning of a novel music system from
a short exposure
Yvonne Leung1, Roger T. Dean1
1MARCS Institute, Western Sydney University, Sydney, NSW, Australia
Melodies in the task were blocked by scale: a diatonic scale (Control) or a microtonal
scale. To provide exposure, participants first performed a 10-minute discrimination task
where no incongruent intervals were embedded in the melodies. They were then tested
using the new paradigm immediately and again a week later. Results from 21 non-
musicians showed that immediately after the exposure, response time was significantly
shorter when targets were preceded by incongruent intervals than when they were not, for
both the Control and Microtonal melodies. However, this difference was only significant
for the Control melodies a week later, suggesting that one short exposure session is not
sufficient for long-term memory encoding. Nevertheless, this consistent pattern found in
the Control stimuli suggests that this new paradigm is effective in measuring implicit
knowledge of music, in this case, listeners’ consistent ability to differentiate incongruent
pitch intervals from the intervals in a diatonic scale.
logic and information theory could be fruitful also in music cognition, and especially in tonal cognition.

IV. THE STABILITY THEORY

It could be useful to spend some words on the problem of the so-called stability theory. According to Lerdahl and Jackendoff, a stability theory provides a (formal) criterion for assigning stability values to pitch-events, distinguishing the more stable events from the less stable ones. In Lerdahl [15], a tonal space is introduced, with a relative metric to determine the distance between chords, conceived mainly to account for the tonal-context dependence of the perceived tension, or distance, between chords. Stability values of chords are determined on the basis of an economy principle: given a string of chords, the "tonal reading" of the entire string that minimizes distances (or, better, that sets the tonal tension to the smallest values) is preferred.

Now, I adopt a similar view, but under the assumption that listeners attempt to recover the tonal sense, rather than the minimal path in the tonal space, in the format of what Lerdahl and Jackendoff call the "basic normative structure", shown in fig. 4. In a major key, the memory of the initially unsatisfied right branch can persist, at least for a short time. This is a way to account for the sense of "unusualness" of a piece in C major starting with the dominant chord.

Fig. 5 Assuming the basic normative structure as the initial predictive state of the parsing process, once the progression G-C is perceived, the listener hypothesizes two possible interpretations, represented by the two staves in the column. But only the lower one, implying the prediction of a structural cadence, can be satisfied, while the upper one is predestined to failure.
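The economy principle described above can be given a toy sketch: among candidate key readings of a chord string, prefer the one with the smallest summed distance. The distance table below is invented for illustration and is not Lerdahl's actual tonal-space metric.

```python
# Illustrative chord-to-key distances (NOT Lerdahl's actual values).
DIST = {
    ("C", "C"): 0, ("C", "F"): 1, ("C", "G"): 1,
    ("G", "C"): 1, ("G", "F"): 2, ("G", "G"): 0,
}

def total_tension(key, chords):
    """Summed distance of each chord from the candidate key centre."""
    return sum(DIST[(key, chord)] for chord in chords)

def preferred_reading(chords, keys=("C", "G")):
    """Economy principle: choose the key reading minimizing total tension."""
    return min(keys, key=lambda k: total_tension(k, chords))

print(preferred_reading(["C", "F", "G", "C"]))  # "C" (tension 2 vs 4)
```

The contrast with the view adopted in the text is that here the winner is the globally cheapest path, whereas the parsing account prefers the reading that satisfies the predicted normative structure.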
Note that the tree in 2) is to be intended as a partial representation. As with dynamic grammars for natural language, musical parsing should be able to deal with partial information, if we want it to manage the dynamical, and contextual, aspect of (musical) mental information processing.

Fig. 8 From left to right: three states of the chord-progression parsing, represented by the trees above the staves.

In this simple case, the system performs a top-down prediction step as the initialization of the parsing process; in the course of the process only bottom-up steps were necessary: input scanning followed by bottom-up (depth-first) parsing steps, to integrate the input into the expected structure. But, of course, the system should be endowed with the capacity of manipulating the tree under construction, triggered by the acquired information, but without performing a scanning step, i.e., the capacity of performing top-down prediction steps during the parsing process, eventually projecting sub-goals of the main goal to fulfill the question mark in the basic tree root. A number of such cases should be provided, with the additional difficulty of obtaining a computable system aiming at completeness. Here, I will illustrate three cases, just to give an idea of this device.

First case. Consider the first measure of the Bach chorale in A minor shown in fig. 9. It begins with the dominant chord resolving to the tonic, followed by another dominant-tonic pattern (vii°6–i6). What is the parsing strategy of the hearer here? Probably, she postulates a hidden initial tonic when she grasps the sense of A minor; but this postulate cannot be stored for a long time. To account for this fact, we can provide the system with a rule transforming a tree with a rightmost unsaturated branch into a tree with a backward prolongation of the main right branch. This rule involves, evidently, a pure prediction step, forced by a situation where a highly unstable representation relaxes into a more stable one.

Second case. A listener, after having saturated the basic tree with the final tonic chord replaced by, say, the vi chord, as in the deceptive cadence, feels that the requirement of the closing cadence (i.e., the question mark decorating the higher node in the right branch) is unsatisfied. Thus, the question mark in the root of the basic tree also remains undeleted, although the branches of the tree are solid (the tree is saturated). This situation can be expressed by tree 1) in fig. 10. As you can see, the tree is saturated but unsatisfied, as the question marks along the branch pointing to the closing tonic are still waiting for satisfaction. This unstable representation induces the projection of a new, unsaturated right branch, and this is, similarly to the previous case, a pure prediction step. The listener is now expecting a new cadence, as represented by tree 2), since the cadence just processed is not able to satisfy the expectation of closure generated by the tonic prolongation followed by the subdominant chords previously perceived.

Fig. 10 The processing of the deceptive cadence. A pure prediction move induces the projection of a novel right, cadential, branch, as the deceptive cadence produces a sense of failure in the accomplishment of the basic structure.

Third case. The well-known structure of the 8-measure theme in the classical style, in the form of a period, is characterized by the so-called interruption, at the end of the first 4 bars (the antecedent). The interruption implies a half-cadence, with the V positioned immediately before the border of the antecedent phrase. Again, how does the listener build up a coherent representation of the interruption, and how does she merge this representation with that of the consequent (the second 4 measures of the period, usually concluded by a PAC)? My hypothesis is that the interruption implies a sense of incompleteness, but one such that, in order to receive accomplishment, it generates the expectation of a re-launch of the entire normative structure. The sense of interruption is also emphasized by means of other parameters marking the phrase border, independently of the harmony (usually there is a kind of textural caesura after the dominant chord, at the end of the antecedent). This situation is represented in fig. 11. In 1), the listener temporarily remains on hold for completing the cadence. Then, triggered by the perception of the phrase border, she projects a new entire unsaturated tree as the expectation of the accomplishment of the interruption, as
shown in 2).
4
The term is imported from the Schenkerian tradition in music analysis. It
refers to the stop at the supertonic (the second scale-degree) in the descent of
the fundamental linear progression of the fundamental structure (the Ursatz),
Fig. 9 The first four chords of the Bach’s choral Ach Gott, vom with the bass rising, and stopping, at the V. This produces a half-cadence, that
Himmel sieh’ darein, Cantata 153. is considered an interruption in the harmonic flow. After the interruption, the
Ursatz is entirely replicated (see [20] for more details).
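The three cases rely on a small repertoire of tree operations: checking whether a node is saturated (all projected branches attached) as opposed to satisfied (its question mark deleted), and performing a pure prediction step that projects a new unsaturated right branch without scanning any input. The paper gives no implementation, so the following is only a minimal sketch with invented class and method names, illustrating the deceptive-cadence situation of fig. 10:

```python
# Minimal sketch (hypothetical names) of the saturated/satisfied
# distinction and of a "pure prediction step".

class Node:
    def __init__(self, label, expected=True):
        self.label = label
        self.children = []        # attached (scanned or predicted) branches
        self.expected = expected  # True = the node still carries a question mark
        self.open_slots = 0       # projected branches still awaiting input

    def saturated(self):
        # All projected branches have been attached.
        return self.open_slots == 0 and all(c.saturated() for c in self.children)

    def satisfied(self):
        # The question mark is deleted only when the node and its
        # whole subtree have received satisfying input.
        return not self.expected and all(c.satisfied() for c in self.children)

def predict_right_branch(root, label):
    """Pure prediction step: no input is scanned; an unsaturated,
    expected right branch (e.g. a new cadence) is projected."""
    branch = Node(label, expected=True)
    branch.open_slots = 1  # awaits future input
    root.children.append(branch)
    return branch

# Deceptive cadence, schematically: the tree becomes saturated but the
# closing-tonic expectation stays unsatisfied, so a new cadence is predicted.
root = Node("basic structure", expected=True)
cadence = predict_right_branch(root, "cadence")
cadence.open_slots = 0   # the deceptive cadence has been scanned...
cadence.expected = True  # ...but it does not satisfy the expectation
assert root.saturated() and not root.satisfied()
new_cadence = predict_right_branch(root, "new cadence")  # re-launch
```

After the second prediction step the tree is again unsaturated, which is exactly the state that keeps the parse alive while the listener awaits the expected cadence.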
Fig. 11 A hypothesis of the real-time processing of the interruption, after the first 4 bars, in the eight-measure theme known as period.

VI. CONCLUSION

The model sketched here is informal and needs to be formally worked out. It is a working hypothesis to be developed and tested. It has to be formalized in order to become an experimentally testable model of the listener's tonal competence, and it is surely desirable that it achieve some kind of completeness. But completeness in music theory is not the same as in logic. The dynamic grammar of tonal music should be a flexible system, incorporating the principles of the dynamic processing of information already adopted in many non-classical logics, as well as in linguistics. More than theorems of completeness, in the theoretical psychology of music we need formal hypotheses that are empirically testable and that, at the same time, can account for the cognitive complexity of the musical experience.

I argued for the necessity of introducing a way of representing "partiality" in music processing: the complex interaction between perception and prediction, in a non-monotonic growth of representations, is a crucial aspect of music understanding, and in order to deal with this aspect we need models that manage incomplete structures. Indeed, formal systems and logics dealing with partiality were introduced in theoretical linguistics long ago.

To sum up, my principal purpose here has been to argue for the fruitful adoption in the cognitive theory of music of the dynamic, symbolic perspective, a paradigm developed in logic and semantics but with relevant implications for mental information processing. Tonal cognition, as a special case of music cognition, shows some features that seem particularly suitable to treatment in this perspective, such as a strong context dependence in information processing and a strong tendency of the mind to actively interact with perception. The current information in perception is interpreted against a structural context produced by the data already processed. Moreover, tonal hearing amplifies a pervasive character of listening to music: that of projecting the past into the future. What is in memory and what is currently perceived prompt the hearer to predict what is to come. Exactly these aspects of the musical experience are efficiently captured by the dynamic perspective, to the extent that dynamic principles were introduced precisely to address two main features characterizing mental understanding processes in human cognition: general context dependence (involving, among others, temporal constraints) and the tendency to anticipate the future.

REFERENCES

[1] C. L. Krumhansl, Cognitive Foundations of Musical Pitch, New York: Oxford University Press, 1990.
[2] D. Temperley, The Cognition of Basic Musical Structures, Cambridge, MA: The MIT Press, 2001.
[3] F. Lerdahl and R. Jackendoff, A Generative Theory of Tonal Music, Cambridge, MA: The MIT Press, 1983.
[4] R. Jackendoff, "Musical Parsing and Musical Affect," Music Perception, vol. 9, no. 2, pp. 199-230, 1991.
[5] R. Kempson, W. Meyer-Viol, and D. Gabbay, Dynamic Syntax: The Flow of Language Understanding, Oxford, UK: Blackwell, 2001.
[6] B. Tillmann, J. J. Bharucha, and E. Bigand, "Implicit learning of tonality: A self-organizing approach," Psychological Review, vol. 107, no. 4, pp. 885-913, 2000.
[7] P. Toiviainen and C. L. Krumhansl, "Measuring and modeling real-time responses to music: The dynamics of tonality induction," Perception, vol. 32, pp. 741-766, 2003.
[8] E. Narmour, The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model, Chicago: The University of Chicago Press, 1990.
[9] D. Temperley, Music and Probability, Cambridge, MA: The MIT Press, 2007.
[10] L. B. Meyer, Style and Music: Theory, History, and Ideology, Chicago: The University of Chicago Press, 1989.
[11] J. van Benthem, Language in Action, Cambridge, MA: The MIT Press, 1995.
[12] W. Chafe, Discourse, Consciousness, and Time, Chicago: The University of Chicago Press, 1994.
[13] H. Kamp and U. Reyle, From Discourse to Logic, Dordrecht: Kluwer, 1993.
[14] J. Peregrin (ed.), Meaning: The Dynamic Turn, Oxford: Elsevier Science, 2003.
[15] F. Lerdahl, Tonal Pitch Space, Oxford: Oxford University Press, 2004.
[16] D. Jurafsky and J. H. Martin, Speech and Language Processing, New Jersey: Prentice-Hall, 2000.
[17] W. E. Caplin, Classical Form, Oxford: Oxford University Press, 1998.
[18] N. Cook, "The Perception of Large-Scale Tonal Closure," Music Perception, vol. 5, pp. 197-205, 1987.
[19] F. Lerdahl, "Musical Syntax and its Relation to Linguistic Syntax," in M. A. Arbib (ed.), Language, Music, and the Brain, Cambridge, MA: The MIT Press, 2013.
[20] A. Cadwallader and D. Gagné, Analysis of Tonal Music, Oxford: Oxford University Press, 2007.
ISBN 1-876346-65-5 © ICMPC14 727 ICMPC14, July 5–9, 2016, San Francisco, USA
that there is nothing inherent to small intervals that would lead to this yearning quality. It may simply be that scale degrees 7̂ and 1̂ have a strong attachment, and that scale degrees 4̂ and 3̂ have similarly strong attachments independent of the semitone relationship. The implicit learning view posits no special function for the semitone: any pitch might accrue yearning qualia simply by its statistical tendency to be followed by some other pitch. For example, the dominant might evoke feelings of anticipation for the tonic, simply because in Western music, many dominants are followed by the tonic.

These speculations aside, the purpose of the present study is not to resolve the issue of origins, but to empirically test the existence of the phenomenon. That is, our aim is to determine whether musical organization is indeed consistent with the special role of the semitone in such a yearning or tending relationship. Accordingly, we might propose the following conjecture:

Scale degrees separated by a semitone are more likely to cleave together than scale degrees separated by a larger distance (e.g., a whole tone).

In testing this conjecture we face at least two potential confounds. The first difficulty relates to compositional intent. Without resorting to a perceptual experiment, how might we operationalize the notion of "cleaving" or "yearning"? If one tone tends to cleave to another tone, evidence consistent with this relationship might minimally involve a statistical tendency for the one tone to be followed by the other tone. Of course, in real music, composers might aim to increase tension or engage in deception by interposing a third tone between a purported "yearning" pitch and a purported "yearned-for" pitch. So a simple tally of the number of X followed by Y may be necessary but not sufficient evidence consistent with a purported "cleaving" or "yearning" quale. Nevertheless, we may reasonably suppose that evidence of a purported "cleaving" quale would minimally involve an elevated likelihood of one of the tones being followed by the other tone.

A second potential confound relates to the relative stability of different scale tones. Musical melodies are not simply successions of intervals. Melodies are also salient successions of scale degrees, and some scale tones are more important than others. For example, scale degrees 1̂, 3̂ and 5̂ are known to be more stable than other scale tones. These differences are empirically evident, for example, in the key profiles assembled by Krumhansl and Kessler (1982). Unstable tones are simply attracted to more stable tones. This suggests that the tendency for 7̂ to move to 1̂, and for 4̂ to move to 3̂, might simply be a manifestation of unstable-to-stable movement, and that the semitone relationship between the pitch pairs is merely coincidental.

Fortunately, the dual-scale system of major and minor modes offers an opportunity to control for this confound. Although the scale-tone hierarchies are regarded as similar between the major and minor scales, the positions of the semitones differ between the two scales. Specifically, in the major scale, semitone relationships exist between 3̂ and 4̂ and between 7̂ and 8̂. In the harmonic minor scale, semitone relationships exist between 2̂ and 3̂, between 5̂ and 6̂, and between 7̂ and 8̂. The contrasting placement of semitones in these two modes allows us, at least to some extent, to examine semitone pitch movements relatively independently of the hierarchical importance of the different scale tones. That is, the contrast between the major and minor scales affords the opportunity to test our conjecture independent of the effect of scale degree.

II. HYPOTHESIS

In light of this background, we might propose the following specific hypothesis:

H. There is an association favoring semitone movement such that movement between 3̂ and 4̂ is favored in the major mode over the minor mode, while movement between 2̂ and 3̂ is favored in the minor mode over the major mode.

III. METHOD

In general, our method involves calculating the frequency of successions of various scale tones in a sample of major- and minor-mode works.

A. Sample

Since our hypotheses relate to tone successions, an important sampling criterion is to focus on musical materials for which the linear succession of tones is not contentious. That is, we need to ensure that there is no ambiguity or dispute that tone X is followed by tone Y. Of the various musical textures, the least contentious would be musical melodies. Accordingly, in selecting our musical materials, we aimed to sample unambiguous musical melodies or thematic material.

For the purposes of this study, we employed two convenience samples. Specifically, we made use of two existing monophonic musical databases:

1. A random sample of 7,704 major- and 768 minor-mode songs from the Essen Folksong Collection (Schaffrath & Huron, 1995).

2. A random sample of 7,171 major- and 2,618 minor-mode themes from the Barlow and Morgenstern Dictionary of Musical Themes (1948).

In both of these databases, the determination of the mode for each musical passage was made by the database authors. We have no information about the provenance or method by which these determinations were made. For the purposes of this study, we simply accepted the major and minor designations as encoded by the database authors.

Although a musical work might be nominally "in the major mode" or "in the minor mode," it is common for works to exhibit various deviant passages. In the minor mode, for example, it is common to encounter so-called "modal mixture," in which the major and minor modes co-mingle. In addition, chromatic alterations are common in both major- and minor-mode passages. These modifications might introduce unanticipated confounds that could skew the results in various ways. It would be appropriate, therefore, to establish criteria by which certain musical works might be excluded from the sample.
Of particular concern would be those alterations that render a nominally major-mode work more closely resembling the minor mode, or a nominally minor-mode work more closely resembling the major mode. For example, any nominally major-mode melody that contains ♭3̂, or any nominally minor-mode melody that contains ♮6̂, would be suspect.

The main differences between the major and (harmonic) minor modes are found in scale degrees 3̂ and 6̂. Scale tone 7̂ is more problematic. In the minor mode, both ♮7̂ and ♭7̂ regularly appear, and so it may be inappropriate to exclude any nominally minor-mode melody or theme either because it employs ♮7̂ or because it employs ♭7̂. As a result, we resolved to exclude any nominally major-mode melody or theme that exhibits ♭3̂, ♭6̂ or ♭7̂, and to exclude any nominally minor-mode melody or theme that exhibits ♮3̂ or ♮6̂. Employing this criterion, 389 of the original 7,704 major-mode melodies and 205 of the original 768 minor-mode melodies were excluded from the Essen Folksong collection. Similarly, 1,547 of the original 7,171 major-mode themes and 704 of the original 2,618 minor-mode themes were excluded from the Barlow and Morgenstern collection. Hence, our final sample included 7,315 major- and 563 minor-mode songs from the Essen Folksong Collection, and 5,624 major- and 1,914 minor-mode themes from the Barlow and Morgenstern Dictionary of Musical Themes.

B. Procedure

All of the sampled materials are available in the Humdrum "kern" format. The data were processed using the Humdrum Toolkit (Huron, 1994). Specifically, each melody was translated to a scale-degree representation, and then all of the scale-degree transitions were tallied. Since rests often indicate grouping boundaries, the relationship between pitches separated by a rest appears to be perceptually less salient. Accordingly, scale-degree transitions spanning a rest were omitted. In the Essen Folksong collection, phrases are explicitly notated. For the same reason, we omitted transitions occurring at phrase boundaries for this sample. That is, we did not consider the last note of one phrase to be "connected" to the first note of the ensuing phrase.

IV. RESULTS

Recall that our hypothesis predicts an association favoring semitone movement such that movement between 3̂ and 4̂ is favored in the major mode over the minor mode, while movement between 2̂ and 3̂ is favored in the minor mode over the major mode. Tables 1a and 1b present the pertinent tallies. Both tables show the total number of instances of movement between 2̂ and 3̂ and between 3̂ and 4̂. Table 1a pertains to the Essen Folksong collection; Table 1b pertains to the Barlow and Morgenstern themes.

Table 1a. Comparison of frequency of movement between 2̂ and 3̂ and between 3̂ and 4̂ in the Essen Folksong collection.

          Major    Minor
2̂-3̂    39,637    2,815
3̂-4̂    19,685    2,515

Table 1b. Comparison of frequency of movement between 2̂ and 3̂ and between 3̂ and 4̂ in the Barlow and Morgenstern themes.

          Major    Minor
2̂-3̂     8,014    2,544
3̂-4̂     9,345    3,643

The hypothesized association would predict that the major/3̂-4̂ and minor/2̂-3̂ cells would exhibit higher tallies than the major/2̂-3̂ and minor/3̂-4̂ cells. An appropriate statistical test for this association is the chi-square test for contingency tables. In the case of the Essen Folksong Collection the results are not consistent with the hypothesis. In fact, there is a significant reverse relationship, χ²(1) = 424.67, p < .01, φ = .08, Yates' continuity correction applied. Similar reverse results are evident in the Barlow and Morgenstern themes, χ²(1) = 46.79, p < .01, φ = .05. In both cases, however, the effect size is very small.

V. DISCUSSION

We predicted an association favoring semitone movement, so that movement between 3̂ and 4̂ would be more common in the major mode than in the minor mode, while movement between 2̂ and 3̂ would be favored in the minor mode over the major mode. However, our results showed a significant (though very small) reverse association: the relative activity of the two scale-degree pairs consistently patterned opposite to the prediction. In light of these results, the yearning theory appears to be weak.

In the course of this study, our analyses produced an unexpected result regarding the use of semitone intervals in melodies generally. In both the Barlow & Morgenstern and the Essen Folksong Collection, the most commonly sought-out intervals are the conjunct intervals of major and minor seconds. Descending seconds appear to be more sought-out than ascending seconds, and minor seconds are sought-out more than major seconds. Unisons also occur more frequently than chance level, although to a lesser extent. A detailed comparison of the use of unisons, minor seconds, and major seconds in actual and scrambled melodies is presented in Tables 3a-d.

Table 3a. Comparison of semitone and whole-tone frequencies in scrambled and unscrambled melodies: Essen Folksong Collection, major mode.

Interval                  Actual (%)  Scrambled (%)  Difference (%)  Increase (%)
Unison                        21.8        16.8           +5.0            29.8
Ascending minor second         6.2         3.3           +2.9            87.9
Descending minor second        8.3         3.3           +5.0           151.5
Ascending major second        13.3        10.1           +3.2            31.7
Descending major second       21.5        10.1          +11.4           112.9
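The tallying procedure, the Yates-corrected chi-square, and the scrambled-melody baseline used in Tables 3a-3d can all be sketched in a few lines. This is not the authors' code (they used the Humdrum Toolkit); the function names are illustrative. The chi-square function does, however, reproduce the reported statistics from the Table 1a and 1b counts.

```python
import random

def tally_transitions(melody):
    """Tally scale-degree transitions in a melody (a list of scale
    degrees), omitting any transition that spans a rest or phrase
    boundary, here marked by None."""
    counts = {}
    for a, b in zip(melody, melody[1:]):
        if a is not None and b is not None:
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def chi2_yates(a, b, c, d):
    """Chi-square for the 2x2 contingency table [[a, b], [c, d]]
    with Yates' continuity correction."""
    n = a + b + c + d
    chi2 = 0.0
    for obs, row, col in ((a, a + b, a + c), (b, a + b, b + d),
                          (c, c + d, a + c), (d, c + d, b + d)):
        exp = row * col / n
        chi2 += (abs(obs - exp) - 0.5) ** 2 / exp
    return chi2

def scrambled_baseline(pitches, stat, trials=1000, seed=0):
    """Chance-level value of an interval statistic `stat`, obtained by
    recomputing it over random reorderings of the same tones (the idea
    behind the 'Scrambled' columns of Tables 3a-3d)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        shuffled = pitches[:]
        rng.shuffle(shuffled)
        total += stat(shuffled)
    return total / trials

# The chi-square values match those reported in the text:
essen = chi2_yates(39637, 2815, 19685, 2515)   # reported chi2(1) = 424.67
barlow = chi2_yates(9345, 3643, 8014, 2544)    # reported chi2(1) = 46.79
```

Running the two chi-square calls on the published counts recovers the reported test statistics to within rounding, which is a useful consistency check on the tables.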
Table 3b. Comparison of semitone and whole-tone frequencies in scrambled and unscrambled melodies: Essen Folksong Collection, minor mode.

Interval                  Actual (%)  Scrambled (%)  Difference (%)  Increase (%)
Unison                        18.5        16.4           +2.1            12.8
Ascending minor second         9.1         4.3           +4.8           111.6
Descending minor second       12.4         4.3           +8.1           188.4
Ascending major second        16.4         9.9           +6.5            65.7
Descending major second       21.9        10.0          +11.9           119.0

Table 3c. Comparison of semitone and whole-tone frequencies in scrambled and unscrambled themes: Barlow and Morgenstern, major mode.

Interval                  Actual (%)  Scrambled (%)  Difference (%)  Increase (%)
Unison                        14.7        13.9           +0.8             5.8
Ascending minor second         9.0         3.7           +5.3           143.2
Descending minor second        9.9         3.9           +6.0           153.8
Ascending major second        15.0         8.4           +6.6            78.6
Descending major second       18.3         8.5           +9.8           115.3

Table 3d. Comparison of semitone and whole-tone frequencies in scrambled and unscrambled themes: Barlow and Morgenstern, minor mode.

Interval                  Actual (%)  Scrambled (%)  Difference (%)  Increase (%)
Unison                        14.7        13.6           +1.1             8.1
Ascending minor second        11.2         5.0           +6.2           124.0
Descending minor second       13.4         5.0           +8.4           168.0
Ascending major second        13.0         7.3           +5.7            78.1
Descending major second       16.1         7.2           +8.9           123.6

In general, there are more whole-tone steps than semitone steps in melodies. However, compared with scrambled reorderings of the same tones, actual melodies favor semitone intervals more strongly than whole-tone intervals. In all four samples (Tables 3a-3d) the percentage increase for ascending and descending semitones is greater than the corresponding increase for whole-tone intervals. This implies that close pitch proximity is favored.

VI. CONCLUSION

We investigated some 15,000 melodies and themes from the Essen Folksong Collection and the Barlow and Morgenstern Dictionary of Musical Themes in order to test whether there is an association favoring movements between scale degrees 3̂ and 4̂ (a semitone) over 2̂ to 3̂ (a whole tone) in the major mode, compared with movements between scale degrees 3̂ and 4̂ (a whole tone) over 2̂ to 3̂ (a semitone) in the minor mode. The results were not consistent with the "yearning semitone" theory and failed to show the predicted relationships.

However, post-hoc observations show that while whole-tone intervals outnumber semitone intervals, composers nevertheless exhibit an even stronger affinity for semitone intervals relative to chance. By way of summary, our results are not consistent with the "yearning semitone" theory, but our study does offer post-hoc evidence consistent with an alternative theory, what might be called the "baby steps" theory: the smallest pitch movements appear to be favored whether or not these movements are linked to tonally more stable pitches. These results reinforce longstanding observations regarding pitch proximity, but add to them by identifying a disposition towards using the smallest available intervals in the construction of melodies.

Of course these observations may not generalize beyond the specific repertoires studied. Further study is warranted to establish whether "baby steps" are preferred in other styles of Western melody, and whether the theory might apply to melodies from non-Western cultures.

REFERENCES

Arthur, A. (2015, October). A comprehensive investigation of scale-degree qualia: A theoretical, cognitive and philosophical approach. Paper presented at the 38th Annual Meeting of the Society for Music Theory, St. Louis, MO.
Barlow, H., & Morgenstern, S. (1983). A dictionary of musical themes (Rev. ed.). London: Faber & Faber. (Original work published 1948)
Bharucha, J. J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16(4), 485-518.
Bharucha, J. J. (1996). Melodic anchoring. Music Perception: An Interdisciplinary Journal, 13(3), 383-400.
Cooke, D. (1959). The language of music. London: Oxford University Press.
Deutsch, D. (1978). Delayed pitch comparisons and the principle of proximity. Perception & Psychophysics, 23(3), 227-230.
Fétis, F. J. (1853). Traité complet de la théorie et de la pratique de l'harmonie contenant la doctrine de la science et de l'art. Brandus et Cie.
Francès, R. (1988). The perception of music (J. Dowling, Trans.). Psychology Press. (Original work published 1958)
Fuller, S. (2011). Concerning gendered discourse in medieval music theory: Was the semitone "gendered feminine"? Music Theory Spectrum, 33(1), 65-89.
Huron, D. (1994). The Humdrum Toolkit: Reference manual. Menlo Park, CA: Center for Computer Assisted Research in the Humanities.
Huron, D. (2001). Tone and voice: A derivation of the rules of voice-leading from perceptual principles. Music Perception: An Interdisciplinary Journal, 19(1), 1-64.
Huron, D. (2006). Sweet anticipation: Music and the psychology of expectation. Cambridge, MA: MIT Press.
Krumhansl, C. L. (1990). Tonal hierarchies and rare intervals in music cognition. Music Perception: An Interdisciplinary Journal, 7(3), 309-324.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial map of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5(4), 579-594.
Method
We assessed statistical learning before and after exposure using the probe tone method.
Participants indicated the goodness of fit (GoF) of each of 12 probe tones to novel
melodies (contexts). Melodies heard during exposure and as contexts were both based on
a whole-tone scale, but differed in 12% of the presented tones. This allowed us to
investigate GoF responses to probe tones that occurred both during exposure and in
contexts (EyPy), probe tones that occurred in the contexts, but not during exposure
(EnPy), probe tones that occurred during exposure, but not in contexts (EyPn), and probe
tones that occurred neither during exposure nor in contexts (EnPn). Then, participants
completed a discrimination task, in which they indicated which of two melodies
resembled the heard melodies.
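The four probe categories (EyPy, EnPy, EyPn, EnPn) amount to simple set-membership tests on the exposure and context pitch sets. The sketch below is purely illustrative; the pitch sets are invented stand-ins, not the actual whole-tone materials used in the study:

```python
# Classify a probe tone by whether it occurred during Exposure (Ey/En)
# and whether it occurs in the Probe contexts (Py/Pn).

def categorize_probe(probe, exposure_tones, context_tones):
    e = "Ey" if probe in exposure_tones else "En"
    p = "Py" if probe in context_tones else "Pn"
    return e + p

exposure = {0, 2, 4, 6, 8, 10}   # hypothetical whole-tone exposure set
context = {0, 2, 4, 6, 8, 11}    # context melodies differ in some tones

assert categorize_probe(0, exposure, context) == "EyPy"
assert categorize_probe(11, exposure, context) == "EnPy"
assert categorize_probe(10, exposure, context) == "EyPn"
assert categorize_probe(1, exposure, context) == "EnPn"
```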
ISBN 1-876346-65-5 © ICMPC14 732 ICMPC14, July 5–9, 2016, San Francisco, USA
We adapted a rating task developed by Krumhansl (1990) in her pioneering studies of the
common-practice harmonic hierarchy. Participants were recruited via Amazon’s
Mechanical Turk (MTurk) and directed to an online experiment hosted by Qualtrics. In
Experiment 1 (N = 199), participants provided liking ratings (1 = dislike extremely; 10 =
like extremely) for major chords that followed a short key-establishing passage (a major
scale + tonic major triad in root position). Twenty target chords were presented in root
position and included every chromatic root from IV (descending) to octave (ascending).
The effect of chord on liking ratings was significant, F(14.32, 2835.52) = 45.13, p < .001,
partial η² = 0.186. Pairwise comparisons revealed a pattern of mean liking ratings that
closely replicates our earlier findings.
Experiment 2 (N = 188) was identical, except that participants rated how well each target
“fit” with the preceding musical context. The effect of chord on fitness ratings was
significant, F(15.39, 2878.53) = 73.47, p < .001, partial η² = 0.282. Pairwise comparisons
revealed a similar pattern, except that fitness ratings for the tonic were particularly high.
These findings provide further support for the claim that listeners experience rock chords
as musically appropriate.
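As a consistency check, the reported partial η² values follow from the F statistics and their (sphericity-corrected) degrees of freedom via the standard identity partial η² = F·df1 / (F·df1 + df2); a small sketch:

```python
# Recover partial eta-squared from an F ratio and its degrees of freedom.

def partial_eta_squared(f, df1, df2):
    return f * df1 / (f * df1 + df2)

exp1 = partial_eta_squared(45.13, 14.32, 2835.52)  # liking ratings, ~0.186
exp2 = partial_eta_squared(73.47, 15.39, 2878.53)  # fitness ratings, ~0.282
```

Both values round to the effect sizes reported in the abstract, so the statistics are internally consistent.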
ISBN 1-876346-65-5 © ICMPC14 734 ICMPC14, July 5–9, 2016, San Francisco, USA
I. INTRODUCTION

Variable acoustic cues are available to listeners as music unfolds over time. Among these cues, listeners greatly rely on relative pitch intervals, presented either simultaneously or sequentially, which form the foundations of our perception of melody and harmony. While the importance of such relative pitch processing in everyday music listening has repeatedly been emphasized, and listeners are suggested to be more sensitive to relative than to, e.g., absolute pitch information (Honing, 2011), it has been demonstrated that listeners find it rather hard to compare the interval sizes of two successive tone pairs (Sadakata & Ohgushi, 2000). The current study elaborates on this finding by testing how tonal contexts and listeners' musical experience interact with task performance.

II. Methods

Twenty-seven participants took part in 3 tests: the comparative judgement of pitch intervals (comparison experiment), part of the Goldsmiths-MSI (excluding the listening part; Müllensiefen et al., 2014), and a pitch discrimination threshold test.

The stimuli of the comparison experiment consisted of two pure-tone pairs (t1-t2 and t3-t4). Participants judged whether the pitch interval of t3-t4 was smaller than, the same as, or larger than that of t1-t2.

The stimuli were designed to compare the following 4 factors: 1. within-pair pitch contour (up or down within t1-t2 and t3-t4), 2. between-pair pitch contour (600 cents up or down from t1 to t3), 3. size of the pitch intervals (small/medium/large, each for t1-t2 and t3-t4), and 4. interval familiarity (common/uncommon). Common interval sizes were 300, 700 and 1000 cents, and uncommon intervals were 250, 650 and 950 cents, respectively. The pitches used ranged from 184 Hz to 987 Hz.

The 75% pitch discrimination threshold from C4 was determined using a forced-choice discrimination task with a staircase procedure.

III. Results

The overall correct response rate in the comparison experiment was about 40%, confirming the previous finding that the task is challenging. Further analyses revealed that, when pitch contours within and between pairs were in the same direction (e.g., the pitch contours of t1-t2, t3-t4 and t1-t3 were all upward), listeners tended to overestimate the size of t3-t4 relative to t1-t2. Conversely, when the directions of the between- and within-pair contours disagreed, listeners generally tended to underestimate the size of t3-t4 relative to t1-t2. This trend tended to be exaggerated for larger intervals. Interval familiarity did not yield any significant effects. Interestingly, individuals with higher Goldsmiths-MSI scores and better pitch discrimination performance tended to perform better in this task, but only when the intervals were small and the t1-t3 pitch contour was ascending.

IV. Discussion

Although the processing of relative pitch intervals is one of the most important skills in our daily musical experience, the current study demonstrated that comparing the sizes of two pitch intervals is challenging. This may be because global pitch contours interfere with judgements of local pitch interval sizes. Interestingly, the same response patterns were observed for pitch intervals both common and uncommon to the listeners, suggesting that this interference effect is not specific to the Western tonal music context.

REFERENCES

Honing, H. (2011). Musical cognition: A science of listening. New Brunswick, NJ: Transaction Publishers.
Müllensiefen, D., Gingras, B., Stewart, L., & Musil, J. (2014). The musicality of non-musicians: An index for measuring musical sophistication in the general population. PLoS ONE, 9(2), e89642.
Sadakata, M., & Ohgushi, K. (2000). Comparative judgements of pitch intervals and an accompanying illusion [Ontei no hikakuhandan to sore ni tomonau sakkaku genshou]. Proceedings of the Annual Meeting of the Japanese Society for Music Perception and Cognition, pp. 35-42, Sendai.
ISBN 1-876346-65-5 © ICMPC14 735 ICMPC14, July 5–9, 2016, San Francisco, USA
There is a need for more nondiatonic music perception research (cf. Krumhansl &
Cuddy, 2010). In an experiment (N=177) and a replication (N=105), listeners were
randomly assigned to hear a series of pairs of music. Following each pair, they rated the
similarity of the two pieces. The independent variables of the study consisted of three
musical manipulations. The first was compositional context: the original composition used
to create the stimulus variations. One set of variations was created by systematically
altering the nondiatonic piece Alta (Dembski, 2001); the other set was created by altering
the Well-Tempered Clavier Prelude in C major (WTC). The second manipulation was
diatonicity (diatonic vs. nondiatonic pitch set). The third manipulation was tone order
(Alta tone order vs. WTC tone order).
The results indicate that for all pieces of music, diatonic and nondiatonic, listeners
rated the pieces in an identical pair (the same piece of music presented twice) as being
much more similar than pieces in a pair that differed in diatonicity and/or tone order. This
finding adds to the existing empirical evidence of what some scholars assume is self-
evident, yet other people still question – that listeners can perceive structure in
nondiatonic music. The results also indicate that changes in diatonicity resulted in
significantly lower similarity ratings than changes in tone order, if the music was in the
Alta composition context. Conversely, in the WTC composition context, changes in tone
order caused the greater decrease in perceived similarity. This pattern of results was
strikingly consistent in the replication experiment. The results will be discussed in terms
of tonal cognitive schemas, probabilistic learning, and the possibility that the rules
established by a composer may, in certain contexts, have more bearing on perception than
the mental models developed through general music exposure.
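The three manipulations described above amount to a 2 × 2 × 2 factorial crossing of the stimulus variations, which can be sketched as follows; the condition labels are hypothetical stand-ins, not the authors' terminology.

```python
from itertools import product

# Hypothetical labels for the three manipulations described above:
contexts    = ("Alta context", "WTC context")        # compositional context
pitch_sets  = ("diatonic", "nondiatonic")            # diatonicity
tone_orders = ("Alta tone order", "WTC tone order")  # tone order

# Full factorial crossing of the stimulus variations
conditions = list(product(contexts, pitch_sets, tone_orders))
for c in conditions:
    print(" / ".join(c))
print(len(conditions))  # 2 x 2 x 2 = 8 stimulus cells
```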
Abstract—Since first reading Sound and Symbol by Viktor Zuckerkandl in 1963, I have built a life of performance and teaching upon his observations about the logic of intuitive hearing. My goal is to explore the extent to which acknowledged variable responses to tone and tone relatedness lead to confident, engaged performance. I outline the chronology and the methodology of my work, starting with the process of freeing myself from the constraints imposed by standard music notation upon my trained ear. Learning to hear as amateurs and children hear has enabled me to lead them to value the ways in which their intuitive responses--"wrong" notes and "faulty" intonation, e.g.--may reveal a composer's process. Three distinct areas of pitch perception are involved: a cappella singing, chamber music, and solo piano--by far the most difficult to observe. I briefly describe Tonal Refraction, the visualization technique to which the work gave rise in 1993, detailing how visualizing Mozart's Piano Sonata in G, K. 283, was pivotal in connecting a vivid memory of child hearing to my mature artistic ear. The Alberti figure, which had previously seemed uninspired, proved seminal in comprehending the difference between habitual responses based on learned notions of tonicity and full connection with the piano's infinitely variable complex sound. One might study how alternative visualizations assist in accounting for the long- as well as short-term processing of tone. Another topic is the study of subjective reactions to specific pitches in specific compositions. Also of interest is the way in which color selection reveals an individual's sense of tone relatedness.

Keywords—Alberti figure, Mozart, Visualization, Zuckerkandl

INTRODUCTION
The experiment that culminated in this finding began some 50 years ago as a result of reading Viktor Zuckerkandl's Sound and Symbol, in which he demonstrates that the ear, whether trained or not, makes sense of tones. "The acoustically wrong tone can be musically right if the deviation is right in the sense of the movement. It cannot, then, be simply tones as such, tones of a predetermined pitch, which we sing or hear in melodies: it is motion represented in tones" [1].
I had been aware since first encountering remarkable sounds in Mozart sonatas as a young child that there were qualities intrinsic to my hearing which remained unaddressed throughout my musical training, qualities made memorable by what I now know to be pre-conscious responses. Sensing that Zuckerkandl's work related directly to those qualities, I resolved through teaching to prove its validity. Pursuing that deeply personal fascination with listening entailed identifying responses within three distinct musical contexts with which I had been involved as a child: the sound of the piano, choral singing, and chamber music. Activating my awareness of these enabled me to objectify aspects of my own pre-conscious hearing in the form of a visualization method, Tonal Refraction.
The Alberti figure proved a powerful clarifying tool already at the method's earliest manifestation, as it directed attention to the ways in which the figure's several voices defined and energized distinct registers of the piano. Many years of exploring the method revealed the extent to which habitual hearing of theoretically identifiable harmonies was inhibiting my responses to the overtone activity generated by the Alberti figure. Now, no longer a tedious, predictable cliché, it becomes central to a lively, endlessly variable approach to the Mozart piano sonatas, one of which provoked this curiosity many years ago.

CHRONOLOGY
That the sound of the piano was special to me was clear from my first experience of the instrument as a toddler. I had never seen or heard a piano other than the spinet in a neighbor's apartment, to which I was given free access. At twelve my deep love of the sound met its first match via the printed page in certain sounds of the Mozart Sonata in G, K. 283. But my involvement with piano sound was interrupted twice: first, at fifteen, when I quit piano for reasons to this day unclear and organ became my default instrumental focus through college and a Fulbright year; and again at twenty-three when, despite three years of intensive piano study, a congenital spinal deformity seemed to preclude professional solo performance. It was at this point that Carl Schachter suggested I read Sound and Symbol by Viktor Zuckerkandl, and the ongoing experiment began.

METHODOLOGY
The goal throughout is to hear as students hear, not as I think they should hear. I did this in relation to the three contexts of listening: a) a cappella singing; b) chamber music for diverse instruments; and c) solo piano. The students whose listening I observed were well suited to the experiment, as they were either private students, young people at the Mannes College of Music Preparatory Division, or adult amateurs, primarily participants in the Chamber Music program I coordinated at the Mannes Extension Division.
These populations, being less practiced and less pressured, gave uninhibited expression to their spontaneous aural responses. I wanted to take their deviations seriously enough to guide them into the composer's logic while building confidence in their natural auditory inclinations.
a) Starting with the sound in which the listening ear is most clearly manifest, in 1975 I undertook an experiment to teach adult amateurs to sight-sing in tune. A warm-up exercise focusing on the difference between random sounds and a tunable pitch assured the group's visceral awareness of vocal intonation. Based on Zuckerkandl's observations about the major scale in relation to the overtones of the tonic [2], I used the staff to locate the 1st, 3rd, and 5th degrees of the scale of each piece, noting that only these pitches need be in tune, leaving the others indeterminate. Thus "tuning" the staff connected the act of reading directly to awareness of linear vs. harmonic tuning. The experiment continued with many of the original participants for over twenty-five years; the foreshortened staff became the basis of the vertical axis of my visualization method.
b) Objective listening to ensemble playing proved remarkably difficult, as it required that I listen without recourse to a printed score in order to perceive lapses in intonation and wrong notes not as errors, but as deviations from the composer's version of a sensible line. The process had an unforeseen result: not only was it evident that certain notes or passages provoked deviations, but that other specific notes or passages were remarkably accurate and in tune. It became apparent that much of a work's inherent drama stemmed from the composer's grasp of the acoustical specificity of the instruments.
The possibility that instrumental acoustics played a structural role in Classical composition seemed at once self-evident and arcane, as the identifying quality of those passages became evident not from analysis of the printed score or by conscious intention of the players, but from listening to uninhibited playing. (This was subsequently borne out in contrasting Mozart's writing for piano in the sonatas for piano and violin, K. 301-306, with those for piano solo, K. 279-285.)
Nancy Garniez, Mannes School of Music, New School University, New York, NY 10014; phone: 212-749-4035; e-mail: garniezn@newschool.edu
c) Of the three listening "modes" the solo piano proved the most difficult to observe. Visualizing Mozart sonatas over a period of many years leads me to conclude that this is because of the clash between the complexity of actual piano sound and the readily analyzable harmonies of standard music notation.
In 1993, when an innovation in physical therapy enabled me to resume solo performance, decades of immersion in ensemble playing had heightened my awareness of acoustical properties distinct to the solo piano. This awareness manifested itself in two areas:
1) The complexities of equal temperament so consumed my ear that I could no longer find the relatively pure overtones of mean-tone intonation with which I had been experimenting on the harpsichord.
2) Concentration on touch and fingering revealed with clarity the difference between the resonance of white and black keys, the result of the piano's unique levered action, whereby the amount of audible overtones in a tone is directly affected by the position of the fingers on black keys relative to white.
Celebrating my return to the solo piano, I programmed a recital on which I played the Mozart Sonata in G, K. 283, the work in which specific sounds had caught my attention as a young child. I found myself awake in the middle of the night convinced that now I could show what it was about this work that had so fascinated me all these years. Using the crude colored pencils I happened to have on hand and graph paper, I visualized the way I heard the piece [Fig. 1]. Though here the vertical axis of the grid is quite primitive, two features stand out: 1) the Alberti figure is clearly identifiable; 2) the colors for B and D are much more vivid than that for G. This entirely unconscious selection of colors surprises me after many years, as it represents a sense of the relative values of the three tones that can have been informed only by my memory of what I had heard as a child. Notable also is the solidity of the first tone, D (blue), and the brightness of B (red), in contrast to the almost invisible G (light gray). Coming into a composition via its opening tones rather than in the context of as-yet-unheard tonicity has proved one of the insightful strengths of Tonal Refraction.

Fig. 2. The passage as heard harmonically with distorted bass values.

Influenced by both the visual clarity and the auditory sense of the Alberti figure, I began basing the vertical axis of each grid on the "tuned" staff of the sight-singing class, i.e., the 1st, 3rd, and 5th degrees of the tonic, with intervening pitches placed subjectively as they felt closer to one or the other of these orientation pitches, or "stations" of the octave. The pitches of the composition were represented with colors, chosen according to whatever plan seemed to fit my hearing of the piece that day. The horizontal axis represented metric values, the rigid lines of the grid heightening awareness of articulations.
The Alberti figure provided the single most informative visual element that revealed the extent to which my hearing of Mozart sonatas had become pre-conditioned by learned notions of tonicity. Of the nineteen Mozart solo piano sonatas, eleven use the Alberti figure within the first eight bars; in only two of these does the tonic occur prominently as the repeated tone, probably to demonstrate the metric vitality of the off-beat repeated tone of the figure.

Fig. 3. K. 330 with reverse Alberti figure, emphasizing tonic.

Simply by noticing the visual effect of the constantly repeated off-beat tones of the Alberti figure I became aware of the rhythmic variation from one measure to the next caused by the reference of overtones in the Alberti figure to the usually longer melody tones, the most obvious of which are indicated by diagonal lines in Fig. 4. Note that these melody tones are longer than the bass notes.

CONCLUSION
From the earliest stages of music learning, visual imagery and expression have a profound impact upon how we hear and on our understanding of what we hear [4]. Musical tone, by nature complex, cannot be depicted with discrete symbols on a staff. Even young children seem reassured when they learn of the 32,000 vibrating hair cells in the inner ear [5]. A visualization system with room for
ACKNOWLEDGMENT
Thanks to the many individual donors and colleagues who have supported this research, its publication, and its presentation at conferences; and to past and present students, including Dr. Pamela Stanley (The Horace W. Goldsmith Chair, Albert Einstein College of Medicine), Dr. Marlon Feld (Music Theory and Composition, Columbia University), and Dr. Damon Horowitz (Philosophy, Columbia University). Thanks to Dr. Joan Farber, clinical psychologist and painter, for reading the paper, and to Jacob Garniez for editorial and technical assistance.
Thanks to Mannes College of Music for a Faculty Development Grant; to Dr. George Fisher, former Dean of the Mannes College Division, for support and encouragement; and to Carl Schachter, without whom I would not have known of Viktor Zuckerkandl.
REFERENCES
1. Viktor Zuckerkandl, Sound and Symbol, Vol. I: Music and the External World (tr. Willard Trask), New York, NY: Pantheon Books, 1956, p. 81.
2. Zuckerkandl, The Sense of Music (corrected edition), Princeton, NJ: Princeton University Press, 1971, pp. 25-27.
3. Garniez, Nancy, "Can Individuals, Both Trained and Untrained, Be Made Aware of Their Subconscious Responses to Tone and, If So, Would It Make Any Difference?" Book of Abstracts ICMPC 13-APSCOM 5, 2014, p. 111.
4. Jeanne Bamberger, The Mind Behind the Musical Ear, Cambridge, MA: Harvard University Press, 1991, pp. 46 ff.
5. James Hudspeth, "How the Ear's Works Work," Nature, Vol. 341, 5 October 1989.
6. Garniez, "Can One Teach Rhythmic Flexibility to Young Learners?" Clavier Companion, Nov.-Dec. 2011, pp. 42-45.
The harmonic hierarchy has been studied vigorously over the past few decades. However, few studies have scrutinized the effects of tonal modulations, or key changes, on harmonic processing, despite their relevance to everyday listening. The present study used an adapted version of the classic harmonic priming paradigm, in which the length of the priming context was extended to include two keys connected by a single pivot chord. Fifteen musicians and seventeen non-musicians were asked to respond to a single target chord that was related to the first key, the second key, or neither key in the priming context. Two dependent variables were measured: (1) participants' reaction time in judging whether or not the target chord fit with the priming context, and (2) their subjective rating of the degree to which the target chord fit with the priming context. The reaction time data showed that non-musicians were not able to distinguish between the varying degrees of relatedness shared by the prime and target. In contrast, musicians responded more quickly to targets related to the second key than to targets related to the first key. Furthermore, musicians did not respond significantly faster to targets related to the first key than to those related to neither key. The rating data showed that the highest ratings were given to targets related to the second key. Interestingly, however, targets related to the first key were still rated significantly higher than those related to neither key. Overall, the present study demonstrates the online flexibility of harmonic representations. The findings suggest that key changes operate as a function of recency rather than primacy, as participants rapidly disregarded the preceding key during modulation. Future research will thus utilize electroencephalography to untangle the sensory and cognitive factors that might underlie these findings.
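As a minimal sketch of how the two dependent variables might be aggregated per group and relatedness condition, the following uses entirely hypothetical trial records; the numbers are illustrative inventions, not the study's data.

```python
from collections import defaultdict
from statistics import mean

# Entirely hypothetical trial records:
# (group, target relatedness, reaction time in ms, goodness-of-fit rating)
trials = [
    ("musician",     "second key",  512, 6),
    ("musician",     "first key",   587, 4),
    ("musician",     "neither key", 601, 2),
    ("non-musician", "second key",  640, 5),
    ("non-musician", "first key",   652, 4),
    ("non-musician", "neither key", 648, 3),
]

# Collect RTs and ratings per (group, relatedness) cell
rt_cells = defaultdict(list)
rating_cells = defaultdict(list)
for group, relatedness, rt, rating in trials:
    rt_cells[(group, relatedness)].append(rt)
    rating_cells[(group, relatedness)].append(rating)

for cell in sorted(rt_cells):
    print(cell, round(mean(rt_cells[cell]), 1),
          round(mean(rating_cells[cell]), 1))
```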
diaphony). They are based on polyphonic and polyrhythmic patterns resulting from intertwining vocal parts. Typically, two voices perform simultaneously, creating sequences of chords that, featuring seconds, are mostly dissonant (from the viewpoint of Western music theory).
Aukštaitija and Žemaitija, and in northwestern Žemaitija and Suvalkija, are larger and contain more strings, usually nine or twelve. They were presumably tuned diatonically, in a major-like scale, sometimes with additional basses. They were used not only for individual playing, but also for entertainment and for song and dance accompaniments (Figure 6). Suvalkian kanklės are considered the most modern; they are sometimes richly decorated, with characteristic round and spiral endings (Figure 7).
Ten recordings of Suvalkian multipart homophonic singing comprising the third sample were taken from a collection by Četkauskaitė (2002). And finally, the fourth sample consists of 11 Suvalkian kanklės recordings, also from the Folklore Archives of the Institute of Lithuanian Literature and Folklore; some of them are published in Nakienė & Žarskienė (2003).

B. Method
The software Praat was applied for the acoustic measurements. Frequencies of prominent fundamentals or other harmonics were measured from LTAS spectra of the recordings. The average frequencies were evaluated as Praat "gravity centers" of the corresponding filtered peaks of partials. Then the pitches were calculated employing the logarithmic frequency-pitch relation.

III. RESULTS: NORTHEASTERN SAMPLES

A. Singing
The measured scales of northeastern (Aukštaitian) polyphonic vocal performances (sutartinės) are presented in Figure 8. The vertical dotted lines separate four groups of singers; localizations (settlements) of the groups are specified below.
I should mention my earlier findings (e.g. Ambrazevičius, 2008) that the tonal structures of vocal sutartinės can be roughly considered as anchored on the tonal nuclei and dissipating, in terms of salience, towards the scale margins. The tonal nucleus can be approximated as a "double tonic", i.e. containing two central pitches separated by the interval of a second. For instance, the most prominent and most frequent dyad in the sutartinė "Mina, mina, minagaučio lylio" (Figure 3) is the one containing pitches A4 and Bb4; they form the tonal nucleus.
Therefore, to visualize the tonal structures, the scales in Figure 8 are normalized to the lower pitch of the tonal nucleus.
At first glance, the scales in Figure 8 show very different interval relations. However, one should notice that the intervals between the pitches of the tonal nuclei (on the zero line and the one above) are very similar, even among different groups of singers. The average nuclear interval of the 27 examples is 181 cents; the standard deviation equals 11 cents. Other pitches are more freely intoned, which corresponds to their lower weights in the tonal structures. Additionally, in some of the examples these pitches occur too episodically to be reliable for evaluation.

B. Kanklės
Figures 9 and 10 show the measured tunings of five-string northeastern kanklės, both original and normalized to the lowest pitch. It would be problematic to normalize the tunings to the lower pitch of the tonal nucleus since, unlike the case of vocal sutartinės, the tonal nucleus is barely identifiable. Granted, the general structure of the pieces presumes that the nucleus should lie on the upper strings. The upper strings are damped when plucking the lower strings, but the lower strings are usually left to ring when the upper ones are being plucked. Therefore, the resounding lowest strings often become the most important acoustically, thus masking the category of a tonal nucleus.
Of course, only the pitches used in the pieces can be measured; therefore some of the depicted tunings are incomplete. Since kanklės performer Plepas did not use the lowest pitch in pieces 13, 19, and 20, the relative pitches for these pieces are roughly defined with reference to the adjacent pieces. Performer Lapienė tuned the marginal strings in a slightly stretched fifth (by 0-33 cents). The intervals between the adjacent pitches seem not to be very strictly defined, in terms of a whole tone/semitone, though some pattern of asymmetry is observed. Plepas tuned the marginal strings of his kanklės to a considerably larger interval, a stretched augmented fifth or even a minor sixth.
The pieces in Figures 9 and 10 are presented in chronological order, i.e. in the order they were recorded. Thus Lapienė's tuning rose slightly over time. Most probably this was not an intentional retuning but was rather caused by increasing string tension due to decreasing temperature. However, starting from piece 8, at least the fourth string (counting from the lowest one) was either intentionally lowered or the musician did not notice the changed pitch of the loosened string.
Plepas' tunings show more distinct changes. It is important to note that the musician presents two different tuning procedures, recorded as pieces 24 and 25. However, he considers the two tunings equally acceptable for playing the same music pieces. We are lucky that there are several recordings of the same pieces. Specifically, pieces 13 and 20; 14 and 21; 15, 17, and 26; 16, 18, 22, and 27; and 19, 23, and 28 are actually multiple recordings of the same pieces. In addition, the first tuning (i.e. presented by pieces 13-24) shows significant variations from piece to piece.
The steadiest intervals are the ones between the two lowest pitches and the one above. However, even their standard deviations equal, respectively, 22 and 27 cents, i.e. they are considerably larger than in the case of the tonal nucleus in vocal sutartinės.
Briefly, a precise tuning seemingly was not of importance for the northeastern kanklės players. This is reflected in the freedom of Plepas' tunings and in the significant divergences between Lapienė's and Plepas' tunings used to perform music of the same type.

IV. RESULTS: SOUTHWESTERN SAMPLES

A. Singing
As with the NE sample, I measured the scales of SW (Suvalkian) homophonic multipart vocal performances. This music has a more ordinary tonal structure than sutartinės; it is major-like. Therefore, this time I normalized the scales to the tonic (Figure 11). Again, the vertical dotted lines separate the groups of singers.
Even though the sample is very small, it shows significant differences between the examples. Since this type of performance is based on sonorities in thirds (and not in seconds, as in the NE case), the intonations of the corresponding dyads should be considered. So, the dyad I-III (tonic to third degree; here and further I use Roman numerals for the degrees) equals 422 cents, on the average, and the standard deviation is 20 cents. The corresponding numbers for the dyads II-IV, III-V, and IV-VI are 281 and 40 cents, 302 and 24 cents, and 395 and 34 cents, respectively. The large standard deviations mean that the scales are characterized by relatively loose interval composition.
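The logarithmic frequency-to-pitch conversion used in the Method section, and summary statistics like the mean nuclear interval, can be sketched as follows. The conversion formula (cents = 1200 · log2(f/f_ref)) is standard; the sample interval values below are made up purely for illustration and are not the paper's measurements.

```python
import math
from statistics import mean, stdev

def cents(f_ref, f):
    """Interval of frequency f above f_ref on the logarithmic pitch scale."""
    return 1200.0 * math.log2(f / f_ref)

print(cents(440.0, 880.0))   # an octave: 1200.0 cents

# Made-up nuclear-interval measurements (in cents), for illustration only
nuclear_intervals = [175, 181, 190, 168, 184, 188, 179]
print(round(mean(nuclear_intervals), 1),   # sample mean
      round(stdev(nuclear_intervals), 1))  # sample standard deviation
```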
Figure 8. Scales of NE vocal performances (polyphony) normalized to the lower pitch of the tonal nucleus
B. Kanklės
As was already mentioned, SW kanklės contain more strings, and the music is completely different compared to NE kanklės. The tunings are major-like, as is most of the vocal music in this region. The measured kanklės tunings are presented in Figures 12 and 13.
We can state that the musicians were quite careful about tuning. This especially holds for Puskunigis: the tonic is somewhat raised or lowered in different pieces (Figure 12), while the rest of the pitches are raised or lowered, in fact, by precisely the same increment. This is clear from the normalized representation of the tunings (Figure 13).

Figure 13. SW kanklės tunings normalized to the tonic

As with the SW vocal performances, we now consider the intervals of thirds between the dyad pitches. At first glance, there is a noticeable difference between Popiera's and Puskunigis' tunings. However, a closer inspection reveals that, in the Popiera case, actually only the triad II-IV-VI is raised, with no significant change of the intervals within the triad. The intervals between the triads I-III-V and II-IV-VI are of little importance, as these triads or their separate pitches never sound simultaneously.
The pooled tunings of the two musicians show these averages and standard deviations: 408 and 7 cents (I-III), 300 and 7 cents (II-IV), 303 and 15 cents (III-V), and 398 and 11 cents (IV-VI). First, the numbers show relative precision in tuning (as opposed to the case of SW vocal performances), and second, a slight tendency to stretch the entire scale (similar to the case of SW vocal performances). This tendency stands out more clearly if the pitches an octave apart are compared: the interval between the fifth degrees above and below the tonic is 1228 cents, on the average.

V. DISCUSSION
I have measured and analyzed musical scales in four samples of Lithuanian traditional vocal and instrumental music. The intent was to compare the "looseness" of the scales in stylistically similar instrumental (kanklės) and vocal performances. I considered newer diatonic major-like melodies from the southwestern part of Lithuania (Suvalkija) performed by singers and musicians. I found that the kanklės tunings used for the melodies show relatively steady and similar patterns. The same or related melodies sung even by the same performers show more "loose" scales, which seems natural because of the differences in vocal / fixed instrumental intonation.
However, a reverse tendency was found for the music from northeastern Lithuania (Aukštaitija). I considered only the more archaic music layer from this region: vocal sutartinės and sutartinės-type kanklės music. This style represents polyphony based on beating sonorities (Schwebungsdiaphonie, beat diaphony), i.e. harmonic intervals of seconds are characteristic. The basic dyads in vocal sutartinės tend to be quite steadily intoned. Supposedly, this is because the singers strive for maximum psychoacoustic roughness as the ideal "clash" of voices. In earlier studies (e.g. Ambrazevičius, 2008), the maximum roughness for the frequency range of characteristic intense partials in the vocal spectra has been evaluated to occur at roughly 170-200 cents. Thus, not surprisingly, similar values for the basic dyads of sutartinės were registered in the present study as well. The other intervals in vocal sutartinės are more or less scattered, which is in agreement with the less severe requirements of the "clash" for the marginal scale intervals.
Striving for maximum roughness seems to be less relevant in the kanklės case; the tunings are relatively "loose." I propose that this is because of the scale type and the playing technique. As mentioned, sutartinės-type music, both vocal and instrumental, belongs to a relatively archaic music layer that may be considered an instance of Alexeyev's "γ-intonation" (Alexeyev, 1986, p. 53). This type of scale is characterized by non-fixed or "wandering" pitches, wide pitch categories, and coordination of pitches, not subordination. Phylogenetically, the γ-intonation is a precedent to the modern type of scale with relatively fixed pitches, narrow pitch categories, and the subordination of pitches (Alexeyev's "t-intonation").
In the case of vocal sutartinės, the "wandering" pitches are noticeably stabilized by the requirement to produce psychoacoustic roughness, whereas the sustained rough dyads or clusters producing roughness cannot be realized by the relatively fast decaying and usually shortened (stopped) kanklės sounds.
I should stress that though the samples are taken from different ethnographical regions, the primary focus is not on regions, but on styles. That is, I compare cases of homophony and a peculiar type of polyphony. Some pilot analysis shows that the newer homophonic multipart singing in northeastern Aukštaitija is similar to the examined homophonic multipart singing in Suvalkija in terms of scale "looseness." Hence, the discussed traits of scales should be attributed first to style, not to musical dialect.
In addition, solo singing and homophonic multipart singing from different musical dialects are often characterized by gradual tonality change, mostly a rise of the entire scale from the beginning to the end of a performance. The gradual change of scale position is usually supplemented with modification of the intervals between scale degrees (Ambrazevičius, 2014). However, the change of intervals does not affect the percept of scale: emically, from the performers' viewpoint, the scale is considered to remain constant. These gradual changes point at certain aspects of the discussed "looseness." Importantly, the gradual scale changes are not observed in the case of vocal sutartinės; the scales are acoustically stable in the course of performance.
In the present study, I employed music examples from two "instruments" in a broad sense, kanklės and voice, to demonstrate the examined phenomena. It is possible to extend the study to include other instruments. Some aspects of scale "looseness" in the cases of other Lithuanian traditional string and wind instruments are discussed in Ambrazevičius, Budrys, & Višnevska (2015).

VI. CONCLUSION
The "looseness" of scales results from a number of factors. Specifically, for the examined examples of traditional music, it was demonstrated how the phylogenetically determined type of scale, requirements of psychoacoustic roughness, and constraints of performance technique can combine and produce various instances of the "looseness."

REFERENCES
Alexeyev, E. (1976). Проблемы формирования лада [Problems of modality development]. Москва: Музыка.
Alexeyev, E. (1986). Раннефольклорное интонирование. Звуковысотный аспект [Early folklore intonation: The pitch aspect]. Москва: Советский композитор.
Ambrazevičius, R. (2008). Psychoacoustical and cognitive basis of Sutartinės. In K. Miyazaki, Y. Hiraga, M. Adachi, Y. Nakajima, & M. Tsuzaki (Eds.), ICMPC10: Proceedings of the 10th International Conference on Music Perception and Cognition, 25-29 August 2008, Sapporo, Japan (pp. 700-704). Adelaide: Causal Productions.
Ambrazevičius, R. (2014). Performance of musical scale in traditional vocal homophony: Lithuanian examples. Музикологија / Musicology, 17, 45-68.
Ambrazevičius, R., Budrys, R., & Višnevska, I. (2015). Scales in Lithuanian traditional music: Acoustics, cognition, and contexts. Kaunas: Kaunas University of Technology, ARX Reklama.
Ambrazevičius, R., & Wiśniewska, I. (2008). Chromaticisms or performance rules? Evidence from traditional singing. Journal of Interdisciplinary Music Studies, 2(1&2), 19-31.
Četkauskaitė, G. (Ed.) (2002). Lietuvių liaudies muzika III d. Suvalkiečių dainos. Pietvakarių Lietuva [Lithuanian folk music, Vol. 3: Suvalkian songs. Southwestern Lithuania]. Vilnius: Lietuvos muzikos akademijos Muzikologijos instituto Etnomuzikologijos skyrius.
Fastl, H., & Zwicker, E. (2007). Psychoacoustics: Facts and models. Berlin, Heidelberg, New York: Springer.
Friberg, A., Bresin, R., & Sundberg, J. (2006). Overview of the KTH Rule System for musical performance. Advances in Cognitive Psychology, 2(2&3), 145-161.
Fyk, J. (1994). Static and dynamic model of musical intonation. In A. Friberg, J. Iwarsson, E. Jansson, & J. Sundberg (Eds.), Proceedings of the Stockholm Music Acoustics Conference, July 28-August 1, 1993 (pp. 89-95). Stockholm: Royal Swedish Academy of Music.
Hess, W. (1983). Pitch determination of speech signals: Algorithms and devices. Berlin: Springer Verlag.
Kopiez, R. (2003). Intonation of harmonic intervals: Adaptability of expert musicians to Equal Temperament and Just Intonation. Music Perception, 20(4), 383-410.
Nakienė, D., & Žarskienė, R. (Eds.) (2003). Suvalkijos dainos ir muzika. 1935-1939 metų fonografo įrašai / Songs and music from Suvalkija: Phonograph records of 1935-1939. Vilnius: Lietuvių literatūros ir tautosakos institutas.
Nakienė, D., & Žarskienė, R. (Eds.) (2004). Aukštaitijos dainos, sutartinės ir instrumentinė muzika. 1935-1941 metų fonografo įrašai / Songs, sutartinės and music from Aukštaitija: Phonograph records of 1935-1941. Vilnius: Lietuvių literatūros ir tautosakos institutas.
Nakienė, D., & Žarskienė, R. (Eds.) (2005). Žemaitijos dainos ir muzika. 1935-1941 metų fonografo įrašai / Songs and music from
Persons who do not possess absolute pitch (AP) often have good pitch memory—for
example, the ability to recognize the pitch level of a familiar song or a tone they have
heard many times (such as an orchestral tuning pitch). The goal of this study is to
investigate whether playing a musical instrument provides a context not just for pitch
memory, but also for labeling pitches. Because persons who play instruments receive
physical, visual, and aural cues that connect with specific pitches, it is expected that the
reference to a musical instrument will increase the accuracy in labeling pitches. In a first
experiment, 12 participants—persons who played non-transposing instruments and did
not possess AP—were asked (without any context) to move a slide on a tone generator
until the tone matched the pitch C (in any octave). After a brief discussion, during which
participants’ information was gathered, the participants were asked to envision
themselves playing their instrument and stopping on the pitch C. Then, the participants
moved the tone generator until it matched the imagined instrumental pitch C. The data
were recorded as distances from the nearest pitch C, from 0 to 600 cents (0 to 6
semitones). A paired-sample t test found that the imagined instrumental pitches were
nearer to C (M = 120¢) than the pitches without context (M = 225¢), p = .02. In a second
experiment, 50 participants—persons who play non-transposing or transposing
instruments—will complete the same tasks. It is expected that those who play non-transposing
instruments will replicate the results of the first experiment, and those who play
transposing instruments will match the pitch C written for their instrument, rather than
the pitch C written for non-transposing instruments. Data collection is underway for the
second experiment. The results of these experiments will be discussed within the context
of AP, as well as pedagogical issues with teaching aural skills to instrumentalists.
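The study's dependent measure, distance in cents from the nearest pitch C, is easy to compute from a measured frequency. The helper below is a hypothetical sketch (the authors used a hardware tone generator; the names and the A4 = 440 Hz reference are assumptions, not from the paper):

```python
import math

A4_HZ = 440.0                    # assumed reference tuning
C4_HZ = A4_HZ * 2 ** (-9 / 12)   # the C below A440, ~261.63 Hz

def cents_from_nearest_c(freq_hz):
    """Distance in cents (0-600) between a frequency and the nearest pitch C."""
    cents_above_c4 = 1200.0 * math.log2(freq_hz / C4_HZ)
    # Fold into a single octave above some C (0-1200 cents)...
    folded = cents_above_c4 % 1200.0
    # ...then take the distance to the nearer C, below or above.
    return min(folded, 1200.0 - folded)
```

An A440 tone, for instance, sits 300 cents (3 semitones) from the nearest C, which matches the paper's 0-600 cent scoring range.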
ISBN 1-876346-65-5 © ICMPC14 749 ICMPC14, July 5–9, 2016, San Francisco, USA
Are Jazz musicians more creative? Here we use domain-specific and domain-general
measures of creativity and imagination to investigate musical creativity in Jazz
musicians, Non-jazz (mostly classical) musicians, and Non-musicians. Our test battery
includes measures of auditory imagery, expectation, and divergent thinking skills. The
scale imagery task (modified from Navarro-Cebrian & Janata, 2010) had participants
listen to scales and judge whether the last note was modified in pitch. Trials included a
perceptual condition, where all notes play, and an imagery condition, where some of the
notes are left out. Results showed that while all subjects performed above chance, Jazz
musicians were more accurate than Non-musicians in musical scale perception as well as
imagery. Psychometric functions for Jazz and Non-jazz musicians showed steeper slopes
compared to Non-musicians. In the harmonic expectation task, participants rated their
preference of chord progressions of different expectancies while EEG was recorded.
Behavioral data showed that Jazz musicians preferred slightly unexpected chords over
highly expected chords, and were also more tolerant of very unexpected chords. ERP data
showed a more prominent P300 and early right anterior negativity in the highly
unexpected chords for the Jazz musicians compared to the Non-jazz musicians and Non-
musicians. This suggests that Jazz musicians are more sensitive to unexpected musical
events. The Divergent Thinking task (Torrance, 1968), which required participants to
respond to prompts in a creative manner, was used as a domain-general measure of
creativity. There was a main effect of group for four out of the six questions in fluency
and originality measures, with the Jazz musicians scoring highest. Together, results
suggest that musical experience, especially in Jazz musicians, confers an advantage in
auditory imagery, musical expectancy, and domain-general creativity, which may be
useful in creative improvisation.
ISBN 1-876346-65-5 © ICMPC14 750 ICMPC14, July 5–9, 2016, San Francisco, USA
The melodic cloze probability method (Fogel et al., 2015, Front. in Psychol.) uses a
production task to quantitatively measure melodic expectation: listeners hear the opening
of a novel, monophonic melody (i.e., a melody without any accompaniment) and sing the
note they think comes next. By manipulating the structure of the melodic openings, one
can study how different structural factors interact in shaping melodic expectation. The
current study examines how subtle changes to melodic structure influence expectation for
the tonic note. 48 melodic openings (each 5-9 notes in length) were composed with the
aim of creating expectation for the tonic using an implied authentic cadence. Participants
were asked to sing a single note continuation of each melody. In 26 of these melodies,
the large majority of participants did in fact sing the tonic, but in 22 of the melodies, less
than 70% of participants sang the tonic (mean cloze probability of tonic = 55%). These
22 melodies were subtly modified to increase expectancy for the tonic, and were tested
on a new set of participants. In the revised melodies, on average the tonic was sung more
frequently (mean melodic cloze probability of the tonic increased from 55% to 78%).
The structural factors that most shaped the increase in melodic expectation include not
only local pre-target changes (such as reducing intervallic distance, changing the melodic
patterning, or adding a neighbor gesture around target), but also upstream changes (such
as rhythm simplification or elongation, or changes to implied harmony). However, not all
revised melodies resulted in our goal of attaining a significant increase in melodic
expectation for the tonic. Analysis of the successful and failed attempts at manipulating melodic constraint within cadential sequences reveals an interplay of local and global factors in shaping melodic expectation.
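Scoring the sung continuations described above reduces to a pitch-class comparison. A minimal sketch (hypothetical helper; the paper does not describe its scoring code):

```python
def cloze_probability(responses, target_pc):
    """Proportion of sung continuations matching a target pitch class.

    `responses` is a list of sung notes as MIDI note numbers; octave is
    ignored, so any C counts as a hit when the target is the tonic C (pc 0).
    """
    hits = sum(1 for note in responses if note % 12 == target_pc)
    return hits / len(responses)
```

For a melody implying C major, `cloze_probability(sung_notes, 0)` gives the melodic cloze probability of the tonic reported in the study.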
ISBN 1-876346-65-5 © ICMPC14 751 ICMPC14, July 5–9, 2016, San Francisco, USA
I. INTRODUCTION
MIDI data can be notoriously messy. In an unpublished validation study, Shanahan and Albrecht (MS) found that ClassicalArchives.com, consisting of a large MIDI database of over 12,000 user-generated works, was exceptionally prone to errors. They measured fifteen different types of error, including pitch, rhythm, meter, and key signature errors. Depending on the type of error, error rates ranged from a low of 5.18% (± ~3%) for misaligned note onsets to a high of 53.88% (± ~7%) for an incorrect number of measures due to blank measures opening files.

Figure 1. An illustration of some of the problems of MIDI data. The top image is from a 1915 published score of Chopin Op. 15, No. 3, and the bottom is a visual representation of the corresponding MIDI data. As can be seen, there are many discrepancies.

Despite the untrustworthy nature of the data, the dataset is nevertheless an attractive potential source for future research. There are already studies beginning to be published that draw on this particular dataset (e.g., White 2013, 2014). One reason this large database is attractive is that most of the currently available digitally encoded datasets have been used extensively in many studies for years. The overuse of particular datasets in empirical studies could lead to methodological errors such as double-use data or multiple tests, and this new repository offers a way to avoid using the same data too many times. A second reason for using this dataset is that it is more representative of composers and ensemble sizes than most currently available datasets, which tend to focus on Germanic composers and small ensembles. Given the advantages offered by adding such a database to the available resources, it is reasonable to further investigate

Another obstacle for the practical use of MIDI data is a tendency for no key interpretation to be provided. That is, pitch-classes are present with a key signature in more than half of the cases, but most files do not explicitly identify the key. Compounding this issue is a tendency for provided key signatures to be incorrect (~40% of the time). While certain questions can be answered without key identifications, any attempts to analyze functional progression, scale-degree tendencies, or other musical elements related to key are greatly hindered without this information.

Even considering the gross inaccuracies that tend to plague this data, most of the pitch information is relatively intact. Shanahan and Albrecht (MS) estimate that only about 8.5% (± ~2%) of pitch information in this database is incorrectly encoded. It may be the case, then, that this dataset is equipped to answer certain types of questions involving pitch. One such question is the accuracy of key finding algorithms, which is a particularly germane issue for three
ISBN 1-876346-65-5 © ICMPC14 752 ICMPC14, July 5–9, 2016, San Francisco, USA
primary reasons. First, key finding algorithms only consider distributions of pitch-classes. Although there are some pitch-class encoding issues in the dataset, accuracy rates are relatively high. Moreover, music theorists have historically focused on pitch information more than on other types of musical structures for analysis. Therefore, resolving whether or not the database can reliably answer pitch-related questions promises to offer the most diverse analytical benefits.

Second, as mentioned above, one way to greatly increase the database's usefulness for examining functional or scale-degree information would be to assign keys to each file. By testing to what extent key finding algorithms perform accurately on this dataset, we can move towards using the database more constructively.

Finally, key finding algorithms have undergone some significant changes in recent years. Specifically, refinements and improvements have been made to key finding strategies, resulting in higher accuracy rates. One such algorithm uses Euclidean distance to estimate keys, rather than the more common use of correlation (Albrecht & Shanahan, 2013). In their study, Albrecht and Shanahan obtained an accuracy rate of 93.1% on a database of 982 works encoded in Humdrum kern notation. By combining their algorithm with the Aarden-Essen algorithm (Aarden, 2003), they were able to create a meta-algorithm that performed at 95.1% accuracy. However, because the algorithm was determined in a post hoc manner, further research on an independent dataset is needed to fully assess the actual accuracy rate. The ClassicalArchives.com dataset provides such an opportunity to test this algorithm's veracity.

II. AIMS
In light of the above considerations, this study has two complementary aims: 1) we will examine how appropriate the ClassicalArchives.com MIDI dataset is for examining pitch-related inquiries like key profiling and key finding, and 2) we will test the Albrecht & Shanahan (2013) meta-algorithm on this independent dataset to determine how accurate the algorithm is. If the results are consistent with both the idea that the MIDI dataset can be used for certain pitch-related tasks and the idea that the meta-algorithm is as accurate as Albrecht & Shanahan (2013) suggest, then using this new dataset with algorithmically assigned keys can open up new possibilities for computational research.

III. METHODOLOGY
A. Sampling
The ClassicalArchives.com dataset is relatively large by the standards of classical music corpora. At over 12,000 files, the database promises to significantly augment current digitized offerings. However, validation studies tend to be very labor intensive, requiring visual inspection of individual files. As a result, some means of sampling a small subset of these files is necessary.

Recall that there are two primary goals for this study. The first goal is to assess the viability of using this relatively messy data for key finding. The second goal is to test the key-finding algorithm. Different subsets are required for each goal.

1. Database validation sampling
The first step in sampling for these two goals was simply to excerpt roughly 10% of the dataset. We first sampled 1,200 files from the dataset. After the conversion process, there were issues with several of the files (see III: B below). These problematic files were discarded, resulting in 1,125 files.

For the purpose of assessing how viable these data are for a key finding task, some ground truth was required. Therefore, the methodological decision was made to pair files from the 1,125-file subset with the corresponding musical works used in Albrecht and Shanahan (2013). In that study, 982 works had previously had keys assigned by independent analysts. These works had been encoded into the Humdrum kern representation, and the assigned keys were used as a ground truth for the key finding algorithm. By matching files from the current study's subset to the works used in that study, a ground truth would enable the testing of this dataset.

Importantly, note that the goal in the first phase of sampling is not to determine how accurate the key finding algorithm is. Instead, we wish to assess whether the algorithm performs as well on these messy MIDI files as it did on that study's cleanly encoded files. Because we already know how well the algorithm performs on clean data, we can determine whether the algorithm performs in a similar or different way on MIDI data. In short, we will not be comparing the algorithm's results on the MIDI files to the ground truth of analyzed keys, but rather to the ground truth of the algorithm's results on kern data. Therefore, it is important that we only use the exact same movements or works used in the first study.

After matching works used in Albrecht and Shanahan (2013) to the subset of 1,125 files selected for this study, only 32 files were found in common between the two datasets. These 32 files were used as a comparison to determine the effect of encoding on algorithm performance. Members of this subset included Bach's Well-Tempered Clavier and Brandenburg Concerto movements; Beethoven piano sonata movements; Beethoven, Mozart, and Haydn string quartet movements; Scarlatti piano sonatas; Chopin Op. 28 preludes; Hummel Op. 67 préludes; and movements from Vivaldi's The Four Seasons.

2. Key finding algorithm validation sampling
Depending on the results from the first element of this study, it may be appropriate to validate the key finding algorithm on the MIDI dataset. Of course, this depends on the initial results. If the way that the algorithm performs on the MIDI set of 32 files is significantly different from the way it performs on the Humdrum files, then that reveals that this dataset is likely inaccurate enough to be ill-equipped for key finding tasks.

Each file used in the key finding validation needs to have a key manually encoded by an analyst. This is a time-consuming process, requiring visual inspection not of the MIDI file, but of an original publication of the composition or movement (in case of encoded errors in the MIDI file). Of course, the bigger the sample, the more confidence can be placed in the results of the study. As a middle ground between these two considerations, 204 of the 1,125 files originally subsetted that were not included in the first sample were chosen for validation of the algorithm.
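The Euclidean-distance approach to key estimation mentioned in the introduction can be sketched as follows. The profile values below are illustrative placeholders, not the empirically derived profiles of Albrecht & Shanahan (2013); substitute the published profiles to reproduce the method:

```python
import math

# Placeholder major/minor pitch-class profiles, indexed 0-11 from the tonic
# (each sums to 1.0). These are assumptions for illustration only.
MAJOR_PROFILE = [0.16, 0.01, 0.09, 0.01, 0.13, 0.11,
                 0.02, 0.21, 0.01, 0.08, 0.01, 0.16]
MINOR_PROFILE = [0.16, 0.01, 0.11, 0.13, 0.01, 0.11,
                 0.02, 0.21, 0.08, 0.01, 0.10, 0.05]

def rotate(profile, n):
    """Shift a C-based profile so it describes the key n semitones above C."""
    return profile[-n:] + profile[:-n] if n else list(profile)

def estimate_key(pc_counts):
    """Pick the (tonic, mode) whose profile lies nearest, in Euclidean
    distance, to the piece's normalized pitch-class distribution."""
    total = sum(pc_counts)
    dist = [c / total for c in pc_counts]
    best = None
    for tonic in range(12):
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            ref = rotate(profile, tonic)
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(dist, ref)))
            if best is None or d < best[0]:
                best = (d, tonic, mode)
    return best[1], best[2]
```

Correlation-based key finders (e.g., Krumhansl-Schmuckler) maximize correlation between the same two vectors; the Euclidean variant instead minimizes their geometric distance, which is also sensitive to overall differences in proportion.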
B. File conversion
The key-finding algorithm does not work directly on MIDI data, so in order for the analysis to be performed, the data first needed to be converted into Humdrum's kern representation. However, tools that directly convert MIDI data to kern representations tend to perform poorly.

An alternative strategy is to use a middle step. Most computer-based music notation programs directly engage with MIDI data. For this study, we used MuseScore 2.0.2 to read the MIDI data.

When reading data from MIDI into MuseScore, a shortest note length must be specified. To capture the most accurate representation of the data, we used a shortest note value of a 64th note. Once the data was read by MuseScore, it could easily be exported to MusicXML. From MusicXML, the conversion into kern format is relatively straightforward using the Humdrum Extras tool xml2hum, designed by Craig Sapp (2010).

Of the 1,200 works originally sampled for the subset, there were errors in at least one step of the file conversion process for 75 files. These 75 files were discarded from further consideration.

C. Key analysis
For the second stage of this study, files were required to be analyzed with key identifications as a ground truth against which to compare the performance of the key finding algorithm. Because some of the MIDI files contained a large number of errors, the decision was made to assign keys based on published editions of each work rather than on the MIDI files.

One difficulty in identifying the compositions linked to the files in the ClassicalArchives.com database is that the files do not always contain metadata. However, the website contains a directory of its musical works in HTML documents. The first step in identifying keys, then, was to identify the works. The HTML files were searched for the file names in the subset of 1,125 works.

Once the composer and work were identified, scores were downloaded from the International Music Score Library Project (imslp.org). Editions were randomly sampled, with the exception that recent Creative Commons publications were excluded. This exclusion criterion was established because many of these files were generated from online encodings of works. Because our goal was to validate these online encodings against publications, scores generated from the online encodings were problematic.

The 204 works selected for validating the key finding algorithm were randomly assigned to one of the two authors of this study. The PDF documents were then analyzed for keys by one of the two authors. If a work was post-tonal, it was discarded as not appropriate for use with a key finding algorithm. Pre-tonal works were included if the tonal center and mode ('majorish' or 'minorish') were clear; otherwise they were discarded. Some of the MIDI files were also clearly transposed from their originally published versions; for the purpose of clarity, these were also discarded. Finally, works that began and ended in different keys were discarded from further consideration. After discarding these files, 192 works remained.

IV. RESULTS
A. Database validation
Recall that the initial goal of this study was to assess whether or not the MIDI data were viable for pitch-based tasks like key finding. To determine whether this was the case, we compared the key finding meta-algorithm's results on the 32 kern files used for ground truth to the 32 MIDI works sampled for this purpose, described in III: A, 1.

For example, assume that the algorithm estimates the keys of four works encoded in Humdrum format; for the sake of argument, call them compositions A, B, C, and D. If the algorithm correctly estimates the keys of A and B and incorrectly estimates the keys of C and D, the algorithm will have an accuracy rate of 50%. Once we locate MIDI versions of compositions A, B, C, and D, we can compare the algorithm's performance on the MIDI and Humdrum formats. If the same algorithm correctly estimates the keys of the MIDI versions of C and D and incorrectly estimates the keys of A and B, the accuracy will again be 50% compared to the correct keys. However, the agreement of the algorithm's results on the MIDI data with its results on the Humdrum data will be 0%. In short, because they are the same compositions, the algorithm should perform in exactly the same way on either version of the data if the two representations are equally useful for key estimation. Because the algorithm performs differently on each work in this hypothetical example, we could safely conclude that the representation of the data has a large impact on key estimation.

Of the 32 kern files used from Albrecht and Shanahan (2013), 31 works were correctly assigned keys by their meta-algorithm (96.9%). Of the MIDI files of the matched works, only 29 works were correctly assigned keys (90.6%). Moreover, the one kern file incorrectly assigned a key, the first movement from Beethoven's String Quartet No. 14, was correctly assigned for the corresponding MIDI file. Of the 32 works, then, only 28 were assigned matching keys, or 87.5%.

These results are consistent with the hypothesis that the errors in the MIDI data have an effect on certain pitch-based tasks, like key estimation. Nevertheless, 87.5% is not too different from the accuracy rates of many key finding algorithms. Although in this case the same algorithm is being applied to different formats of (allegedly) the same work, and so should be held to a higher standard, the results are nevertheless not out of the realm of normal performance. Therefore, we felt that it was appropriate to continue the validation of the algorithm. It is important to keep in mind, however, that at least some of the error in the following section may be attributable to error in the dataset rather than error in the key finding meta-algorithm.

B. Meta-algorithm validation
After it was determined that the MIDI database provided a reasonably satisfactory source for pitch-class information, at least for the purposes of estimating keys using pitch-class distributions, the accuracy of the Albrecht-Shanahan (2013) meta-algorithm was tested against a sample of 204 files. As mentioned above, we established ground truth for keys by visual inspection of the corresponding scores downloaded from the International Music Score Library Project (imslp.org). Works in which the key was not clear or where the MIDI file was clearly transposed from the original source were discarded.

Of the 192 files remaining, the key finding algorithm correctly identified the keys of 164 files, or 85.4%. Four of
these files had the correct pitch-class assigned as a tonal center, but it was enharmonically respelled, such as a work in Db minor being identified as C# minor. The reason for these misattributions is likely that the key finding algorithm picks an enharmonic key based on which accidentals are more prevalent in the piece. Because MIDI data only include pitch-class information, MIDI files with no key signature or an incorrect key signature (again, about 40% of the time; Shanahan & Albrecht, MS) are interpreted by MuseScore as the pitch-class that appears more often in the repertoire. For example, Bb is chosen over A# because it is the more common notated pitch. Because this lack of information is not an error of the algorithm, enharmonically respelled tonal centers were counted as matches.

Of the errors in key estimation, the most common was assignment of the parallel key, accounting for 46.4% of errors. This error was also prevalent in the initial study (Albrecht & Shanahan, 2013), though in this database it comprises a bigger percentage of the total. Many of these errors involved Baroque minor-mode works. This is not surprising given the tendency of these works to employ a Picardy ending. This particular error, then, appears to be the result of the algorithm's focus on the first and last 8 measures of each work (for further details on how the algorithm works, see Albrecht & Shanahan, 2013). The second and third most common errors involved the relative key and the fifth-related key, each accounting for 25% of errors. The different types of errors and their frequencies are shown in Table 1.

Table 1. Types of errors in key-assignment. The most common type of error is assignment of the parallel key.
Error type    # of occurrences (%)

Nevertheless, if a more generous calculation of accuracy is used, in which only the correct assignment of tonal center is checked, then this algorithm performs at a fairly impressive 92.2% accuracy rate, not significantly different from the accuracy rate found in Albrecht & Shanahan (2013).

Given the fluid nature of modality, especially in early and late common-practice era music, the identification of tonal center alone may be all that is possible in these styles. Because the ClassicalArchives.com dataset is broader than the sample used in the original study, it is possible that these early and late works are skewing the results of the study. In other words, it is possible that the accuracy rates differ because the samples represent slightly different populations.

Nevertheless, correct assessment of tonal center is still enough information to assign scale degrees and the harmonic function of chords. Although completely accurate key assignments of both tonal center and mode would be more useful, tonal center alone is enough to begin functional analysis.

These results are consistent with a) the hypothesis that the key-finding meta-algorithm is a reliable indicator of key, and b) the hypothesis that the MIDI database can somewhat reliably be used for the right pitch-related research questions. With the error rate so high in the comparison between the 32 files used in the original study and this study, it is difficult to determine the contribution of encoding error to key estimation error.

Future work will attempt to further disentangle these effects by further validating both the database and the meta-algorithm. Moreover, future research questions examined with this large and exciting new dataset will likely provide further insights into the nature of the data and its reliability.
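The accuracy-versus-agreement distinction drawn in the Results (the four-work example under Database validation) can be made concrete with a toy computation; the keys below are hypothetical, chosen only to reproduce that example:

```python
def accuracy(estimates, truth):
    """Share of works whose estimated key matches the analyst-assigned key."""
    return sum(e == t for e, t in zip(estimates, truth)) / len(truth)

def agreement(est_a, est_b):
    """Share of works where the algorithm answers the same on both encodings."""
    return sum(a == b for a, b in zip(est_a, est_b)) / len(est_a)

truth = ["C", "G", "d", "a"]   # analyst keys for works A-D (hypothetical)
kern  = ["C", "G", "e", "E"]   # algorithm on kern: A and B correct
midi  = ["a", "F", "d", "a"]   # algorithm on MIDI: C and D correct

assert accuracy(kern, truth) == 0.5   # 50% accurate on kern
assert accuracy(midi, truth) == 0.5   # 50% accurate on MIDI...
assert agreement(kern, midi) == 0.0   # ...yet zero cross-encoding agreement
```

Identical accuracy rates can therefore mask complete disagreement between encodings, which is why the study compares the algorithm's MIDI results to its kern results rather than to the analyst keys alone.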
of the melody (Dowling, 1973; Dowling, Lung, & Herrbold, 1987), or by scrambling the notes of the melody into several octaves while preserving the pitch classes (Dowling, 1984; Idson & Massaro, 1978).
(7) Dowling (1986) showed that listeners with moderate levels of musical training automatically encode novel melodies they hear in terms of scale steps, not intervals. In contrast, nonmusicians tend toward interval encoding, and musical professionals can use either strategy depending on task demands.

One of the consequences of the theory that melodies are represented as combinations of contours and scales is that wrong notes in melodies that violate the tonal scale pattern should be much more obvious and easy to identify than wrong notes that violate expected interval sizes. It has already been observed that out-of-key notes "pop out" of an otherwise uniform stream of pitches (Janata, Birk, Tillmann, & Bharucha, 2003). The experiment reported here is the latest in a series of studies in which we directly compare the rapidity with which listeners detect out-of-key wrong notes versus interval-distorting wrong notes (see Dowling, 2008; 2009; Raman & Dowling, 2015). The main difference between this study and our previous studies is that here we include a condition involving unfamiliar melodies, as well as familiar melodies. We expected that the tendency to rely on key membership as an index of whether a note is a wrong note would be even stronger with unfamiliar melodies, since there are no hard and fast rules governing interval sizes in melodies.

METHOD
Participants. Fifty-six students at the University of Texas at Dallas served in the experiment for partial course credit. Twenty-seven participants had less than 3 years of musical training, and 29 had 3 or more years and are characterized as moderately trained. Participants were assigned blindly to either the Familiar (N = 21) or the Unfamiliar (N = 35) condition.

Stimuli. The stimuli consisted of the first 16 to 24 notes of 32 familiar melodies that received the highest familiarity ratings in our previous studies (nursery tunes, folk tunes, patriotic songs, holiday songs, etc.), plus a stylistically similar group of unfamiliar folk songs drawn from Bronson (1976). These were all presented with their natural rhythms and at tempi that we judged to be comfortable for each melody. Each melody appeared twice in its respective session, and contained wrong notes in two different conditions. The wrong notes were introduced in a way that did not alter the contour of the melody. In the rare cases where the wrong note was part of a repeated pair of notes, the second note in the pair was also altered to match. There were eight kinds of wrong notes: the wrong notes were introduced by altering a target note either up or down from its original pitch, moving it 1 or 2 semitones, and landing on an in-key pitch or an out-of-key pitch. There were 64 trials in each session, in which these eight types of wrong note each occurred eight times. The wrong note could be introduced anywhere between the sixth note of the melody and the end. Previous research had shown that the up-down variable had negligible effects, and so we collapsed the data across that variable in the analyses reported here. The melodies were played by a MATLAB R2009b program as sequences of sine waves with linear on- and off-ramps to avoid clicks, and with 20-ms gaps between notes, and presented to listeners via high-quality headphones at comfortable levels. The program presented the stimuli in a different random order to each participant, and recorded their response times (RTs) to the wrong notes.

Procedure. Listeners were told that on each trial they would hear a melody that would often contain a wrong note, and that their task was to respond to the wrong note as quickly as possible. We told them to hold their fingers on the space bar and to press it as soon as they heard a wrong note. If they got to the end of the melody without hearing a wrong note, they were then to press the space bar to go on to the next trial. Each session lasted about 30 min.

RESULTS
We analyzed the data separately in the unfamiliar and familiar conditions, in each case scoring the responses for hits (correct detections of a wrong note in a window of 300 to 3000 ms following the onset of an actual wrong note), and for median RTs for correct detections (hits) for each condition (collapsed across direction, up vs. down) for each participant.

Unfamiliar Melodies. Table 1 shows the number of hits (out of 16) for each condition for listeners at the two levels of musical training. The hit rates are quite low, as might be expected for unfamiliar tunes. We ran an Analysis of Variance (ANOVA) with musical training as a between-groups variable, and key membership (in vs. out) and distance from the original note (1 vs. 2 semitones) as within-group variables. To our surprise, key membership was not an important factor in detection of the wrong notes. Only distance from the original pitch was significant, F(1,33) = 5.87, p = .02, with alterations of 1 semitone (M = 2.26) detected more often than alterations of 2 semitones (M = 1.66).

Table 1. Number of hits with unfamiliar melodies (out of 16) by musically untrained participants and those with moderate amounts of training, for wrong notes that were in- or out-of-key, and 1 or 2 semitones removed from the original pitch. N = 35

Scale   Distance (semitones)   Untrained (N = 18)   Moderate (N = 17)   Means
IN      1                      2.39                 2.24                2.31
IN      2                      1.39                 1.65                1.51
OUT     1                      1.83                 2.59                2.20
OUT     2                      2.06                 1.53                1.80
Means                          1.92                 2.00

Table 2 shows the means of the median RTs for each condition. Because the hit rates in Table 1 were so low, only 16 of the original 35 participants produced hits in all four conditions. Those 16 split 8 and 8 on musical training. In the ANOVA on their data, only key membership was significant, F(1,14) = 14.22, p = .002, with RTs to out-of-key wrong notes
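The eight wrong-note types described in the Method (a target note altered up or down, by 1 or 2 semitones, landing on an in-key or out-of-key pitch) can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB program; the scale set and function names are invented for this sketch:

```python
# Sketch of the eight wrong-note types: a target note is altered up or
# down, by 1 or 2 semitones, landing on an in-key or out-of-key pitch.
# Illustrative only -- the authors used a MATLAB program.

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C-major scale

def in_key(midi_note, scale=C_MAJOR):
    """True if the note's pitch class belongs to the scale."""
    return midi_note % 12 in scale

def wrong_note_types(target, scale=C_MAJOR):
    """Enumerate the four alterations (direction x distance) of one target
    note (a MIDI number) and classify each as in- or out-of-key. In the
    experiment, targets were chosen so that, over a session, every
    combination of direction, distance and key membership occurred."""
    alterations = []
    for direction in (+1, -1):      # up or down
        for distance in (1, 2):     # 1 or 2 semitones
            altered = target + direction * distance
            alterations.append({
                "direction": "up" if direction > 0 else "down",
                "distance": distance,
                "key_membership": "in" if in_key(altered, scale) else "out",
                "altered_note": altered,
            })
    return alterations
```

For a target of E4 (MIDI 64) in C major, for instance, moving up 1 semitone lands in key (F) while moving up 2 semitones lands out of key (F-sharp).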
(M = 1039 ms) almost twice as fast as RTs to in-key wrong notes (M = 1960 ms).

When we analyzed the hit rates of those listeners with complete RT records, we found no significant effects of any of the variables. That is, they did not show the effect of interval size alteration found with the larger group of listeners.

Table 2. RTs (in ms) with unfamiliar melodies by musically untrained participants and those with moderate amounts of training, who achieved at least one hit in each of the four conditions, for wrong notes that were in- or out-of-key, and 1 or 2 semitones removed from the original pitch. N = 16

Scale  Distance (semitones)  Untrained (N = 8)  Moderate (N = 8)  Means
IN     1                     2237               1764              2000
IN     2                     1993               1847              1920

Table 4 shows the RTs for each condition for both groups of listeners. Moderately trained listeners responded more quickly (M = 592 ms) than untrained listeners (M = 812 ms), F(1,19) = 11.74, p = .003. RTs were shorter to out-of-key wrong notes (657 ms vs. 716 ms), F(1,19) = 5.92, p = .03, and to 2-semitone alterations (M = 643 ms) than to 1-semitone alterations (M = 730 ms), F(1,19) = 8.65, p = .008. And this distance effect appeared mainly in the untrained listeners (901 vs. 724 ms) rather than in the moderately trained (603 vs. 582 ms), F(1,19) = 5.42, p = .03.

The key membership × distance interaction approached significance, F(1,19) = 3.02, p = .10, in which the effect of key membership was stronger with a 2-semitone alteration (M = 128 ms) than with a 1-semitone alteration (M = 46 ms).

Table 4. RTs (in ms) with familiar melodies by musically untrained participants and those with moderate amounts of training, for wrong notes that were in- or out-of-key, and 1 or 2 semitones removed from the original pitch. N = 21
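Table 2 above reports means of per-participant median RTs. That two-step summary (median within a participant, then mean across participants) can be sketched as below; the data layout and values are invented for illustration:

```python
# Table 2's entries are two-step summaries: the median RT within each
# participant, then the mean of those medians across participants.
# Minimal sketch; the data layout, names and values are invented.

from statistics import mean, median

# rts[participant][condition] -> list of RTs (ms) for that participant's hits
rts = {
    "p1": {("IN", 1): [500, 700, 650], ("IN", 2): [900]},
    "p2": {("IN", 1): [800, 1000], ("IN", 2): [700, 900]},
}

def condition_summary(rts, condition):
    """Mean over participants of the per-participant median RT,
    skipping participants with no hits in the condition."""
    medians = [median(per_cond[condition])
               for per_cond in rts.values()
               if condition in per_cond]
    return mean(medians)
```

With these toy data, `condition_summary(rts, ("IN", 1))` yields 775.0: the per-participant medians 650 and 900.0, averaged.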
use it in detecting wrong notes. Furthermore, it is clear from the present results that nonmusicians are not totally devoid of sensitivity to the scale steps (as they were in Dowling, 1986), since they show a strong effect of key membership on their RTs to the wrong notes in unfamiliar melodies.

Key membership is important in the detection of wrong notes in familiar melodies, leading to the detection of about 2.3 more wrong notes (out of 16) in the out-of-key conditions versus the in-key conditions. Distance was used as a cue, but with smaller effects, leading to an increase in detections of about 1.3 (out of 16). Note that the effect of distance was the opposite of that found with the unfamiliar melodies; here it was the wrong notes that were 2 semitones away from the original that were easier to detect. And distance was a more important cue for musically untrained listeners than for the moderately trained.

As with the unfamiliar melodies, RTs were strongly affected by key membership, with responses to out-of-key wrong notes about 59 ms faster. And there was a hint of an interaction between key membership and distance, such that both untrained and moderately trained listeners were especially quick to respond when an out-of-key wrong note was 2 semitones away from the original pitch. Distance affected RTs in general, but the effect was much more pronounced for the untrained listeners (about 177 ms faster for 2 semitones than for 1) than for the moderately trained (about 21 ms faster).

CONCLUSION

The present results replicate earlier studies in finding strong effects of key membership on the detection of and RTs to wrong notes in familiar melodies. We also found strong effects of the distance in semitones between the wrong note and the correct pitch. In unfamiliar melodies, the picture was more complicated, in that lack of key membership facilitated fast RTs, but not detection. Distance between the wrong note and the correct pitch in unfamiliar melodies seemed to function in a more global way than in familiar melodies, perhaps by producing awkward-sounding phrases that suggested the occurrence of wrong notes.

REFERENCES

Attneave, F., & Olson, R. K. (1971). Pitch as medium: A new approach to psychophysical scaling. American Journal of Psychology, 84, 147-166.
Balzano, G. J., & Liesch, B. W. (1982). The role of chroma and scalestep in the recognition of musical intervals out of context. Psychomusicology, 2, 331.
Bronson, B. H. (Ed.) (1976). The Singing Tradition of Child's Popular Ballads. Princeton, NJ: Princeton University Press.
Cuddy, L. L., & Cohen, A. J. (1976). Recognition of transposed melodic sequences. Quarterly Journal of Experimental Psychology, 28, 255-270.
Dowling, W. J. (1973). The perception of interleaved melodies. Cognitive Psychology, 5, 322-337.
Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 85, 341-354.
Dowling, W. J. (1984). Musical experience and tonal scales in the recognition of octave-scrambled melodies. Psychomusicology, 4, 13-32.
Dowling, W. J. (1986). Context effects on melody recognition: Scale-step versus interval representations. Music Perception, 3, 281-296.
Dowling, W. J. (1991). Pitch structure. In P. Howell, R. West, & I. Cross (Eds.), Representing Musical Structure (pp. 33-57). London: Academic Press.
Dowling, W. J. (2008, November). Are melodies remembered as contour-plus-intervals or contour-plus-pitches? Auditory Perception, Cognition, and Action Meetings (APCAM), Chicago, IL.
Dowling, W. J. (2009, November). Directing auditory attention in terms of pitch, time, and tonal scale membership. Poster session presented at the meeting of the Psychonomic Society, Boston, MA.
Dowling, W. J., Kwak, S.-Y., & Andrews, M. W. (1995). The time course of recognition of novel melodies. Perception & Psychophysics, 57, 197-210.
Dowling, W. J., Lung, K. M.-T., & Herrbold, S. (1987). Aiming attention in pitch and time in the perception of interleaved melodies. Perception & Psychophysics, 41, 642-656.
Idson, W. L., & Massaro, D. W. (1978). A bidimensional model of pitch in the recognition of melodies. Perception & Psychophysics, 24, 551-565.
Janata, P., Birk, J. L., Tillmann, B., & Bharucha, J. (2003). Online detection of tonal pop-out in modulating contexts. Music Perception, 20, 283-305.
Krumhansl, C. L. (1990). Cognitive Foundations of Musical Pitch. New York: Oxford University Press.
Lee, Y.-S., Janata, P., Frost, C., Martinez, Z., & Granger, R. (2015). Melody recognition revisited: Influences of melodic Gestalt on the encoding of relational pitch information. Psychonomic Bulletin & Review, 22, 163-169.
Raman, R., & Dowling, W. J. (2015, August). Effects of key membership and interval size in perceiving wrong notes: A cross-cultural study. Paper session presented at the meeting of the Society for Music Perception and Cognition (SMPC), Nashville, TN.
Trainor, L. J., McDonald, K. L., & Alain, C. (2002). Automatic and controlled processing of melodic contour and interval information measured by electrical brain activity. Journal of Cognitive Neuroscience, 14, 430-442.
Trehub, S. E., & Hannon, E. E. (2006). Infant music perception: Domain-general or domain-specific mechanisms? Cognition, 100, 73-99.
ISBN 1-876346-65-5 © ICMPC14 760 ICMPC14, July 5–9, 2016, San Francisco, USA
ABSTRACT

Because Western people usually hear melodies in well-tempered tunings, it is doubtful whether laymen would use a pure tuning as a reference when asked to judge whether a certain melody is sung in tune or not. In an earlier classroom experiment among approximately 274 pupils (average age 15.6; SD 0.9), the isolated voice part of several songs was rated less in tune than the same voice part presented with the original accompaniment, although every mistuning was digitally corrected. It was hypothesized that these differences were due to the fact that music processing is more difficult without accompaniment defining rhythm and harmony. The aim of the current experiment was to test the competing hypothesis that purity ratings in conditions without accompaniment are affected by the fact that in a cappella conditions a pure tuning is expected. To test this, 32 of the pupils that also participated in the earlier experiment (mean age 15.5; SD = 0.5) and 36 adults (mean age 56; SD = 15) rated the purity of the singing in six excerpts of the same songs in two conditions: pure tuning and well-tempered tuning. The results indicate that in general there is no significant difference between purity ratings for excerpts in pure tuning and excerpts in well-tempered tuning. Among adults and pupils separately there are no differences. Among the total population, the pure tuning of one excerpt is rated significantly higher. Musical experience seems to have a small but interesting effect on purity ratings. Experienced musicians give relatively high purity ratings in both conditions, but tend to rate pure tunings slightly higher than well-tempered ones, whereas listeners with moderate musical experience tend to rate well-tempered tunings slightly higher.

I. INTRODUCTION

In a classroom experiment among twelve fixed groups of pupils from the fourth grade of the highest levels of secondary education in the Netherlands (Schotanus, 2016 (ICMPC14)), the isolated voice part of two songs was rated significantly less in tune than the same voice part presented with the original accompaniment, although every mistuning was digitally corrected. Purity ratings for the isolated song parts of two other songs did not show significant differences with those for the accompanied versions, although, remarkably, they were even higher. Moreover, as Table 1 shows, the isolated voice parts of a version of the songs in which each syllable of the lyrics was replaced by 'la' revealed a similar pattern of purity ratings. Furthermore, a series of Kruskal-Wallis tests showed that in most cases these results were the same across different groups.

The fact that purity ratings for isolated voice parts vary significantly across songs while they do not vary significantly for isolated voice parts of accompanied songs might be interpreted as support for the model for lyric processing I presented along with the Musical Foregrounding Hypothesis (MFH) (Schotanus, 2015). In this model, I hypothesized that an accompaniment that explicates the rhythm and the implied harmony of a melody might ease the processing of the music in songs, at least in Western tonal music.

To assess this, the songs I chose for the earlier experiment had to be Western tonal songs with harmonic and rhythmic difficulties, such as out-of-key notes, modulations, and so-called 'loud rests' (silences on the down-beat), components that are thought to complicate rhythm and harmony (Schotanus, 2015). Uncertainties about rhythm and tonality, caused by these elements, were thought to affect purity ratings in a negative way. In general, this turned out to be the case. The fact that, in a few cases, the purity ratings for the isolated voice parts were not negatively affected might (in line with the MFH) be explained by involvement. If either the lyric or the catchiness of the melody causes the listener to get involved in the music, the listener might overcome the difficulties in music processing and understand tonality, especially if the music is repeated several times. If purity can be understood as a specific kind of hedonic valence, this is in line with Berlyne (1971), among others, who states that there is an inverted-U-shaped relationship between complexity and hedonic valence. This theory implies that an increase of complexity causes an increase in valence ratings if, and only if, the complexity has been overcome. If not, an increase of complexity will cause a decrease in valence ratings.
Table 1: Purity ratings for isolated and accompanied voice parts by twelve groups of pupils (mean age 15)

Song       Isolated voice part 'la'     Accompanied song            Isolated voice part lyric
           N(a)    Mean (Median; SD)    N(a)    Mean (Median; SD)   N(a)    Mean (Median; SD)
1          72 (4)  2.52b (2; 1.26)      42 (2)  3.12 (3; .86)       64 (3)  3.40 (3; .93)
2          72 (3)  3.51 (4; .98)        49 (2)  3.24 (3; 1.03)      60 (3)  2.63b (3; 1.31)
3          46 (2)  2.69b (3; 1.15)      59 (3)  3.22c (3; 1.01)     58 (3)  3.31c (3; .94)
4          45 (2)  3.00b (3; .78)       62 (3)  3.56 (4; 1.01)      45 (2)  2.53b,c (3; .87)
All songs  235     2.90d (3; 1.15)      213     3.3 (3; 1.0)        227     3.00d (3; 1.1)

a Between brackets: number of groups
b A Mann-Whitney U test reveals: purity ratings are different from those for the accompanied version of the song
c A Kruskal-Wallis test reveals: purity ratings are different across groups; however, the deviant ratings do not really change the pattern
d A Kruskal-Wallis test reveals: purity ratings are different across songs
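Footnote b of Table 1 rests on Mann-Whitney U tests comparing the ratings for an isolated voice part with those for the accompanied version. The rank-sum computation behind that test can be sketched in a few lines of Python; this is a bare U statistic only, without the tie-corrected p-value that SPSS reports, and all names are invented:

```python
# Bare Mann-Whitney U statistic: rank the pooled observations (average
# ranks for ties), then U1 = R1 - n1(n1+1)/2 for the first sample.

def ranks(values):
    """Average 1-based ranks, with tied values sharing the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y; a two-sided test would
    then compare min(U1, U2) with a critical value or normal approximation."""
    r = ranks(list(x) + list(y))
    r1 = sum(r[: len(x)])
    return r1 - len(x) * (len(x) + 1) / 2
```

U ranges from 0 (every value in x below every value in y) to n1·n2 (the reverse); in practice one would use a statistics package for the significance test itself.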
However, there might be an alternative explanation for the relatively low purity ratings for most isolated voice parts. Because these voice parts were recorded with piano accompaniment, they are sung in well-tempered tuning, while listeners might expect a pure tuning in a cappella singing. I do not think this is a likely explanation, because most of my pupils, living in a Western culture, are used to hearing melodies in a well-tempered tuning. Moreover, it is known that non-musicians without absolute pitch ability can hardly tell whether a note is in or out of tune (Krumhansl, 1991), and listeners usually are more tolerant towards impurities of voices than of instruments (Hutchins, Roquet & Peretz, 2012). Furthermore, a single tone cannot be out of pitch, so it is likely that all purity ratings are based on the intervals in the music, and even a person with absolute pitch ability is used to the intervals of well-tempered tuning.

Nevertheless, the current experiment was created to examine whether listeners do use absolute pitches as a reference for purity ratings, instead of the pitches and intervals that belong to a well-tempered tuning.

II. Method

Thirty-two pupils (mean age 15.3; SD 0.5) and thirty-six adults (mean age 54; SD 15) heard two textually and musically different sections of three different songs in two conditions (well-tempered tuning and pure tuning) in a quasi-random order, each time divided by an atonal piano riddle.

A. Participants

Two groups of pupils that had participated in the earlier experiment were selected for this experiment too. Both are Dutch language and literature classes from the fourth grade of the highest level of secondary education in the Netherlands: one from SG Het Rhedens in Rozendaal, the other from Stedelijke Scholengemeenschap Nijmegen, in Nijmegen. The fact that these two groups were chosen and not others is just a matter of formal consent; the choice is not based on their earlier performance.

The experiment was conducted after being reviewed and approved by the Ethics Review Board of the Utrecht Institute of Linguistics OTS (ETCL). Both school leaders have given signed permission, the parents of the children have been informed and have been given the chance to forbid their sons and daughters to participate, and the pupils themselves were free to complete the questionnaires or not.

The adults are friends, acquaintances, colleagues and family members, who reacted to an e-mail containing the sound file and the questionnaire.

Every participant was told that the experiment was created to be sure of 'something' I might have found in the earlier experiment, but although they must have realized it had something to do with purity, I gave them absolutely no clue what the difference between the different versions might be.

B. Stimuli

The songs used in this experiment were the same as songs 1, 2 and 4 of the previous experiment, two of which were rated significantly less pure in the condition without accompaniment but with lyrics. Of each song two fragments were chosen: the first verse or the first couple of verses, and the first occurrence of a second musical theme, either a different kind of verse, a bridge or a chorus. This might affect purity ratings, because there is less repetition to help listeners get used to the melodies, and processing second themes might be more difficult when they are not preceded by the first theme.

The original recordings are sung by the author, a male baritone, and recorded by Christan Grotenbreg, a professional musician, in his studio. During the recording the singer was accompanied by Christan Grotenbreg, who improvised the accompaniment based on the chord scheme supplied by the composer. He played a keyboard, connected to ProTools 10 (desktop recording). However, in this experiment only the isolated voice parts are used. The voice was recorded with a Neumann TLM 103 microphone and an Avalon VT 737 SM amplifier. To avoid confounds concerning purity, voice-treatment software was used even for the original well-tempered tuning: Waves Tune, Renaissance Vox compression and Oxford EQ. The (C-rooted) pure (or 'absolute') tuning was created digitally in Waves Tune, by Christan Grotenbreg, using the original (well-tempered) recording.

The fragments were brought together in one 11 MB sound file in MP3 format by the author and a friend, Gerard Vuuuregge, using the computer software Audition. Within this file the excerpts were divided by silences and atonal piano riddles, in order to prevent the tonality of a certain fragment from affecting purity perception in the fragment following it. The song parts are presented in the following order, preventing two fragments of the same song from succeeding each other and varying both song order and condition order:

- 4a-pure,
- 1a-well-tempered,
- 2a-pure,
- 1b-pure,
- 2b-well-tempered,
- 4b-well-tempered,
- 2b-pure,
- 4a-well-tempered,
- 1a-pure,
- 2a-well-tempered,
- 1b-well-tempered,
- 4b-pure.

C. Questions

For all fragments, listeners rated both the purity of the singing and sureness about their own judgement on a five-point Likert scale. Furthermore, they answered questions on gender, age, musical experience, musical background (Western or not), and hearing disabilities. Their names were not asked for; the questionnaires are strictly anonymous.

Concerning musical experience, I created a seven-point scale. Points are awarded, with a maximum of seven, according to the following principles:

- plays an instrument (including voice) (1-3 years: 1 point; 4-6 years: 2 points; 7-9 years: 3 points; 10 years or more: 4 points);
- has received formal lessons in playing the instrument (1-3 years: 1 point; 4-6 years: 2 points; more than 6 years: 3 points);
- has received formal lessons in music theory in school (0-2 years: 0 points (this is because most pupils in the fourth grade of Het Rhedens have received at least three years of formal music education at school); 3 years: 1 point; 4 years or more (or: has chosen music as part of the curriculum): 2 points);
- has received formal lessons in music theory apart from school (1-3 years: 1 point; 4-6 years: 2 points; more than 6 years: 3 points).

D. Statistical analyses

The significance of the differences between purity ratings for pure and well-tempered versions of the same excerpts, and between sureness ratings for the same pairs of recordings, was tested using Wilcoxon signed rank tests in SPSS. To see whether there are individuals or groups with a clear bias for either pure or well-tempered versions, the sum of differences per individual was computed:

Σk (xk-pure − xk-well-tempered).

Groups of individuals (adults versus pupils, or musicians versus non-musicians) were compared using Kruskal-Wallis tests, Mann-Whitney U tests, or Ordinal Logistic Regressions using Generalized Logistic Modeling in SPSS. Furthermore, Repeated Measures Logistic Regressions were run on both purity and sureness ratings, using the Generalized Estimating Equation (GEE) in SPSS, in order to control for within-subject effects, and to see whether there is a relationship between purity ratings and sureness ratings.

III. Results

In most cases there are no significant differences in purity ratings for pure or well-tempered versions of the same isolated voice parts. In general, for songs 4a, 2a and 2b the well-tempered tunings are rated more in tune, and for 1a and 1b the pure tunings. For 4b both tunings were rated almost exactly the same. However, as both a Repeated Measures Logistic Regression (GEE) and a series of Wilcoxon signed rank tests reveal, these differences are not significant. Only one of these differences is: song 1a-pure is rated significantly more in tune than 1a-well-tempered (p(Wilcoxon) = .025), but only when the purity ratings of the whole population are taken into account. Among adults or pupils only, there are no significant differences. However, the difference between 2b-well-tempered and 2b-pure is significant within one group of pupils. These outcomes suggest that there are differences in purity ratings across groups, and indeed a Kruskal-Wallis test shows that there are significant differences between purity ratings across groups for 2a-well-tempered (p = .028), 2b-pure (p = .042) and 2b-well-tempered (p = .048).

Table 2. Purity ratings, mean and SD, per group, song and condition

Condition  Pupils       Adults       All
4a-pure    3.67 (1.02)  3.66 (1.03)  3.68 (1.02)
4a-wt      3.64 (1.14)  4.01 ( .87)  3.84 (1.02)
1a-pure    3.09 ( .98)  3.22 (1.01)  3.16 (1.02)
1a-wt      2.85 (1.05)  2.91 (1.01)  2.91 (1.08)
2a-pure    2.82 (1.18)  3.4 (1.31)   3.14 (1.39)
2a-wt      2.82 (1.42)  3.52 (1.11)  3.16 (1.19)
1b-pure    3.24 (1.25)  3.23 (1.01)  3.25 (1.16)
1b-wt      2.94 (1.11)  3.11 (1.16)  3.15 (1.14)
2b-pure    2.58 (1.22)  3.01 (1.15)  2.85 (1.20)
2b-wt      2.76 ( .87)  3.34 (1.16)  3.08 (1.06)
4b-pure    3.64 (1.08)  3.46 (1.07)  3.57 (1.08)
4b-wt      3.64 ( .93)  3.46 (1.20)  3.55 (1.06)

In line with these results, a Repeated Measures Logistic Regression (GEE) shows that sureness ratings are significantly lower for both 1a (B = -.480, Exp(B) = .619; p = .005) and 2a (B = -.412, Exp(B) = .662; p = .024), and that the SDs for purity ratings are highest for 2a. Moreover, a Wilcoxon signed rank test reveals that the difference between sureness ratings for 2a-pure and 2a-well-tempered is close to significance: p = .055. Incidentally, sureness ratings in general are higher than purity ratings.

Neither for purity ratings nor for sureness ratings are there differences across age, music-cultural background or hearing disabilities. However, there are differences across gender and musical experience. As the GEE on purity ratings shows, women tend to give higher purity ratings than men, regardless of condition (B = .770, Exp(B) = 2.164; p = .001), and experienced musicians give higher purity ratings than non-musicians (B(music=4) = .748, Exp(B) = 2.113; p = .019), as do highly experienced musicians (B(music=7) = 1.798, Exp(B) = 6.037; p = .001) (the groups with 5 and 6 points also give relatively high purity ratings, but probably these groups are too small to reach significance). On the contrary, purity ratings from listeners with moderate musical experience (music = 3) are lower than those from non-musicians, though not significantly lower. The same group (music = 3) also seems to be less sure about their purity ratings. As the GEE reveals, their sureness ratings are almost significantly lower than those from highly experienced musicians (p = .060).

A Kruskal-Wallis test on a restructured version of the variable concerning musicianship shows a similar pattern. Because some groups are rather small, as mentioned above, I restructured the variable on musical experience into four categories: non-musicians (music 0 or 1), moderately experienced musicians (2 or 3), experienced musicians (4 and 5) and highly experienced musicians (6 and 7). As Figure 1 shows, listeners with moderate musical experience tend to rate well-tempered tunings more in tune than pure tunings, and do so significantly more often than highly experienced musicians do (p = .034).

Figure 1. Mean sum of differences between purity ratings (xk-pure − xk-well-tempered) from groups of listeners with a similar musical experience.
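The per-listener bias statistic defined in the Statistical analyses section, the sum over excerpts k of (xk-pure − xk-well-tempered), which Figure 1 averages per experience group, can be sketched as follows; the ratings and names are invented for illustration:

```python
# Per-listener tuning bias: sum over excerpts of (pure - well-tempered)
# purity ratings, then averaged within a group as plotted in Figure 1.

def bias_sum(pure, well_tempered):
    """Sum of per-excerpt rating differences for one listener.
    Positive: preference for pure tuning; negative: for well-tempered."""
    if len(pure) != len(well_tempered):
        raise ValueError("need one rating pair per excerpt")
    return sum(p - w for p, w in zip(pure, well_tempered))

def group_mean_bias(listeners):
    """Mean bias sum over a group of (pure, well_tempered) rating lists."""
    return sum(bias_sum(p, w) for p, w in listeners) / len(listeners)

# Two hypothetical listeners, six excerpts each (1a, 1b, 2a, 2b, 4a, 4b):
group = [
    ([4, 3, 3, 2, 4, 4], [3, 3, 4, 3, 4, 4]),  # bias sum -1
    ([5, 4, 3, 3, 4, 5], [4, 4, 3, 2, 4, 4]),  # bias sum +3
]
print(group_mean_bias(group))  # 1.0
```

A group mean near zero indicates no systematic preference for either tuning; the sign conventions here mirror the difference plotted per experience group in Figure 1.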
IV. Discussion

Listeners rated purity for six song parts in two conditions: well-tempered tuning and pure tuning. Although in one case there is a significant difference between the purity ratings for the two versions of the same song part, it is clear that the relatively low purity ratings of isolated voice parts of songs recorded with piano accompaniment in the earlier experiment cannot be due to their well-tempered tuning. The only excerpt for which the pure tuning is rated significantly higher than the well-tempered one, excerpt 1a, is part of the song with the highest purity ratings for an isolated voice part in the initial experiment. Those were even higher than the ratings for 1a-pure in the current experiment. Moreover, 1a-pure is not rated especially high, but 1a-well-tempered is rated relatively low. However, 1a-well-tempered is the only excerpt with a significant difference between purity ratings from non-musicians and experienced musicians. This excerpt seems to be very demanding for listeners with no or moderate musical experience. Furthermore, the difference between purity ratings for 1a-pure and 1a-well-tempered is much smaller than the differences between the ratings for songs 2 and 4 in the accompanied and isolated-voice-part conditions of the original experiment.

Strangely enough, the purity ratings for the well-tempered versions of the excerpts of song 4 are much higher than those for the isolated voice part of the whole song in the first experiment. This might be due to the fact that an important song section with a strongly deviant melody (the bridge) was not used for the current experiment.

Nevertheless, it is more likely to assume that the lower purity ratings for isolated voice parts in the initial experiment have something to do with the difficulty of processing unaccompanied voice parts. Possibly, this difficulty hampers purity perception more and more during the song, unless the melody or the lyrics can keep the listener involved, and repetition can habituate him to the difficulties of the melody.

An indication that purity ratings indeed are related to difficulties with the implied harmony caused by out-of-key notes and unusual intervals is the fact that one adult listener made a remark about the unpredictability of the melodies, and another wrote that without a score it was difficult to get used to the semitones, not knowing where the flats and the sharps should be. Others remarked, more in general, that it is hard to tell what is pure or not when there is no harmony to refer to. Therefore, one of them, a professional rock musician, wrote that he could not complete the questionnaire.

Several participants, both pupils and adults, reported that they heard something 'Cher'-like unnatural in, or around, my voice, a vocoder effect, indicating that it was digitally fashioned. All of them said they preferred a more natural sound, be it less pure. As there are indeed digital modifications of my voice in both versions, but probably more in the pure ones, I asked two persons in which song parts the Cher-sound occurred. One reacted. According to him, the Cher-sound occurred in all songs, but it was salient in 4a-pure and 4b-pure, and in both versions of 2a and 2b. If he is right, I must admit there is a correlation between purity ratings and the Cher-sound effect, since purity ratings for the well-tempered versions of 4a, 2a and 2b were higher than those for the pure versions of these song parts. However, it is strange that the Cher-sound effect would only affect purity ratings for the pure versions of 2a and 2b, not for the well-tempered ones. Moreover, there is something else that songs 2 and 4 have in common, and that is tonality. Both songs are in B flat, moving to D flat, B-flat minor, F and F minor, while song 1 is in E, moving toward A, B and F-sharp. Possibly, the scale tones in the B-flat-related scales are more different across tunings than those in the E-related scales.

However, the fact that there are significant differences between purity ratings for 2a and 2b across groups indicates that, perhaps, there might be other factors that play a part. Song 2 is the only song that is not a love song, but a song with a complex, somewhat philosophical message, and on top of that it is slower, contains more silences and implies many minor harmonies. Therefore, the music might be relatively demanding, while the lyrics might be less attractive for pupils than for adults, resulting in a loss of concentration among pupils.

Concerning the differences between levels of musical experience: non-musicians, apparently, do not have a strong preference for either well-tempered or pure tunings. However, musical training seems to make listeners less certain about their purity perception, resulting in lower purity ratings in general, with a preference for well-tempered tuning. When musical experience increases, listeners tend to become more self-assured, resulting in higher purity ratings regardless of condition (in fact, somehow, all the excerpts are in tune), but with a small but increasing preference for pure tunings. This is remarkable because several musicians in this experiment (for example piano players and rock musicians) never use pure tunings.

V. CONCLUSION

The aim of this study was to investigate whether the relatively low purity ratings for isolated voice parts in an earlier classroom experiment might be due to their well-tempered tuning. The answer is: no. Neither the two groups of pupils who also participated in the earlier experiment, nor the 36 adults who were new to the materials, rated pure tunings of excerpts of the songs used in the earlier experiment significantly different from the original well-tempered tunings of these fragments. The pure tuning of just one fragment was rated higher than the well-tempered version of it, and only for the total population. Moreover, both pure and well-tempered tunings received relatively low and inconsistent purity ratings, considering the fact that the purity of the recordings was digitally corrected. Therefore, the results of both experiments together seem to support the MFH-based idea that musical processing of isolated voice parts is difficult, resulting in lower purity ratings, expressing either a kind of tonal uncertainty or a general dislike for the song. Further research might reveal whether these lower purity ratings really are a matter of tonal uncertainty and therefore of purity perception, or whether they are in fact an expression of hedonic value.

Of course the implications of this experiment are limited. Perhaps, with another singer, or with songs in other tonalities, there would be more significant differences in purity ratings. However, that would not change the conclusions concerning the purity ratings for the isolated voice parts in the earlier experiment.

POSTSCRIPT

In the first weeks after this paper was submitted, three new completed questionnaires arrived. After working up these subsequent reactions (new N-adults = 39, average age 56.6; SD 15.2) the difference between purity ratings for pure and well
ACKNOWLEDGMENT
In the first place I would like to thank my participants for
their collaboration; the parents of the ones under age and the
school leaders of both RSG Het Rhedens and SSGN for their
consent; and the Ethics Review Board of the Utrecht Institute
of Linguistics OTS (ETCL) for reviewing my experimental
design. Furthermore, I would like to thank my supervisors Prof.
dr. E. Wennekes (Utrecht University), dr. F. Hakemulder
(Utrecht University) and dr. R. Willems (Donders Institute,
Nijmegen University), and statistics specialist Kirsten Schutter
(UIL OTS, Utrecht University) for their support and comments.
Finally, I thank the Dutch Government, NWO, and my
employer, RSG Het Rhedens, for giving me the opportunity to
avail myself of a PhD scholarship for teachers.
REFERENCES
Berlyne, D. E. (1971). Aesthetics and Psychobiology. New York: McGraw-Hill.
Hutchins, S., Roquet, C., & Peretz, I. (2012). The Vocal Generosity Effect: How Bad Can Your Singing Be? Music Perception, 30(2), 147–159. DOI: 10.1525/mp.2012.30.2.147
Krumhansl, C. L. (1991). Music psychology: Tonal structures in perception and memory. Annual Review of Psychology, 42(1), 277–303.
Schotanus, Y. P. (2015). The Musical Foregrounding Hypothesis: How Music Influences the Perception of Sung Language. In Ginsborg, J., Lamont, A., Philips, M., & Bramley, S. (Eds.), Proceedings of the Ninth Triennial Conference of the European Society for the Cognitive Sciences of Music, 17–22 August 2015, Manchester, UK.
Schotanus, Y. P. (2016). Music Supports the Processing of Song Lyrics and Changes Their Contents: Effects of Melody, Silences and Accompaniment. ICMPC14, San Francisco, California.
Background
Musical rhythms are linked to human movements externally and internally. While this
sensorimotor coupling underlies rhythm perception, it is unknown whether similar
processes occur visually when observing music-related movements: dance. Given the
action-perception coupling, watching rhythmic dance movements may elicit motor
representations of visual rhythm.
Aims
I investigated visual rhythm perception of realistic dance movements with the following
questions: Can observers extract different metrical levels from patterns of different body
parts? Is visual tuning related to preferred tempo? Can one level serve as beat while
another as subdivision? Are perceptual results supported by sensorimotor synchronization
(SMS)?
Method
Participants observed a point-light figure performing Charleston and Balboa dance
cyclically. In one task, they freely tapped to a tempo in the movement. In another, they
watched the dance presented either naturally, or without the trunk bouncing, and
reproduced the leg tempo. In two further SMS tasks, participants tapped to the bouncing
trunk with or without the legs moving in the stimuli, or tapped to the leg movement with
or without the trunk bouncing.
Results
Observers tuned to one level (legs) or the other (trunk) depending on the movement
tempo, which was also associated with individuals’ preferred tempo. The same leg tempo
was perceived as slower when the trunk bounced in parallel at twice the tempo than when
it did not. Tapping to the leg movement was stabilized by the trunk bouncing.
Conclusions
Observers extract multiple metrical periodicities in parallel from structured dance, in a
similar way as for music. Lateral leg movements serve as beat and vertical trunk
movements as subdivisions, as confirmed by tempo perception and the subdivision
benefit in SMS. These results mirror previous auditory findings, and visual rhythm
perception of dance may engage the same action-based mechanisms as musical rhythm
perception.
ISBN 1-876346-65-5 © ICMPC14 767 ICMPC14, July 5–9, 2016, San Francisco, USA
Listeners are exposed to rhythmic stimuli on a daily basis, whether from observing others
moving, listening to music, or listening to speech. Humans easily perceive a beat (quasi-
isochronous pattern of prominent time points) while listening to musical rhythms, as
evidenced by experiments measuring synchronized tapping or perceptual judgments. It is
assumed that listeners infer the beat from regularly occurring events in the musical
surface, but they sustain an internally driven metrical percept once the beat is inferred.
Nevertheless, relatively few studies have attempted to disentangle the surface information
from the internal metrical percept. We therefore attempted to measure the robustness of
internally driven metrical percepts using a musically rich induction stimulus followed by a
beat matching task with metrically ambiguous stimuli. During induction listeners heard
an excerpt of unambiguous duple- or triple- meter piano music. They then heard a beat-
ambiguous rhythm, which could be perceived as either duple or triple. In the probe phase,
listeners indicated whether a drum accompaniment did or did not match the stimulus.
Listeners readily matched the drum to the prior musical induction meter after the beat-
ambiguous phase. Although musicians outperformed non-musicians, non-musicians were
above chance. Experiment 2 examined the time course of the internal metrical percept by
using the same task but varying the duration of the ambiguous phase. This revealed that
listeners performed accurately and comparably for 0, 2, 4, or 8 measures of the
ambiguous stimulus. Overall these results provide additional evidence for perception and
long-lasting memory for musical beat.
The production of fluent speech is related to the capacity to precisely integrate auditory
and motor information in time. Individuals who stutter have shown temporal deficits in
speech production which are possibly linked to altered connectivity and functioning of
the neural circuits involved in speech timing and sensorimotor integration. The aim of
this study is to examine if individuals who stutter show deficits which extend to non-
verbal sensorimotor timing as well. In a first experiment, 20 German-speaking
children and adolescents who stutter and 43 age-matched controls were tested on their
sensorimotor synchronization skills. They were asked to tap with their finger to rhythmic
auditory stimuli such as a metronome or music. Differences in sensorimotor timing
between participants who stutter and those who do not stutter were found. Participants
who stutter showed poorer synchronization skills (i.e., lower accuracy or consistency)
than age-matched peers. Adolescents who stutter were more impaired than children,
particularly in consistency. Low accuracy resulted from the fact that participants who
stutter tapped earlier in relation to the pacing stimulus as compared to controls. Low
synchronization consistency (i.e., higher variability) was observed in particular in
participants with severe stuttering. In a second experiment, we tested 18 French-speaking
adults who stutter and 18 matched controls using the same tasks. In addition, to assess
whether stuttering was accompanied by a beat perception deficit, the participants were
tested on a beat alignment task. Adults’ synchronization skills were similar to those of
children and adolescents (i.e., more variable and less accurate than controls). In contrast,
the two groups performed similarly well in the perceptual beat alignment task. In sum,
timing deficits in stuttering are not restricted to the verbal domain and are likely to
particularly affect processes involved in the integration of perception and action.
In our digitally enriched environment, there is now the potential to use consumer grade
mobile technologies to collect research data, which could enable remote data collection
from participants at home. As a first step to assess the utility of a tablet-based mobile
platform in characterizing rhythm abilities, a custom rhythm game was developed in
Unity and played by participants in-lab on a Windows Surface 3 tablet. The rhythm game
has 27 levels that consist of combinations of 3 tasks (tap with the metronome, tap
between metronome, call-and-response to metronome), 3 tempos (350 ms, 525 ms, 750
ms inter-onset-interval), and 3 stimulus types (audio, visual, and audio-visual). Four
groups of participants were recruited based on age (younger (18-35 years), older (60-80
years) adults) and musical experience (non-musician (< 5 years musical training),
musician (> 10 years musical training)). Rhythm abilities were assessed based on
accuracy, precision and variance. Results replicate previously observed findings in that
musicians outperformed non-musicians, younger adults outperformed older adults, fast
tempos were most difficult, and audio-visual stimuli as well as on-beat synchronization
were easiest for all participants. A week later all participants played the rhythm game a
second time and test-retest reliability was good to excellent (intra-class correlation
coefficients ranged from 0.72 to 0.83). Results were compared to other assessments of
perceptual, motor and cognitive functions known to decline in aging. Across all
participants, better rhythm abilities significantly correlated with faster response times,
higher working memory capacity, better visuospatial selective attention, and lowered
multitasking costs. Together, these results indicate tablet-based mobile platforms may
serve as a useful tool for remote data collection and are sensitive enough to discriminate
rhythm abilities based on age and musical experience.
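The reported test-retest reliability (intra-class correlation coefficients of 0.72 to 0.83) can be computed from two sessions of scores. A minimal sketch, assuming the two-way random-effects, absolute-agreement ICC(2,1) variant (the abstract does not state which ICC form was used):

```python
import numpy as np

def icc_2_1(sessions: np.ndarray) -> float:
    """Two-way random-effects, absolute-agreement ICC(2,1).
    `sessions` has shape (n_subjects, k_sessions)."""
    n, k = sessions.shape
    grand = sessions.mean()
    row_means = sessions.mean(axis=1)   # per-subject means
    col_means = sessions.mean(axis=0)   # per-session means
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((sessions - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Illustrative data: three subjects, two sessions, small session drift.
scores = np.array([[1.0, 1.1], [2.0, 2.1], [3.0, 3.1]])
icc = icc_2_1(scores)
```

With perfectly reproduced scores the statistic is exactly 1.0; session-to-session drift or subject-level noise pulls it toward 0.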
Adult listeners lose some ability to recognize and categorize rhythmic patterns present in
infancy through enculturation to a musical tradition. Some constraints on rhythm
perception, however, are likely to be innate. The present research investigated the mental
representation of musical rhythm using the method of serial reproduction. Participants
were presented with brief sequences, and then reproduced what they heard or saw by
tapping. The reproduced rhythm served as the target rhythm for the next participant. Data
were collected in a public setting using a custom-designed app at ArtPrize, a 19-day art
competition in Michigan. Over 400 individuals varying in age and musical experience
participated. Modality, rhythmic structure, and tempo of initial sequences were
manipulated. A Bayesian analysis is being conducted to examine how each link in the
chain of reproductions systematically distorts the information passed from person to
person. We hypothesize that (1) individual priors for tempo center on previously
observed preferred tempo within a range of inter-onset-intervals (IOIs) between 400–600
ms; and (2) priors for meter may depend on both cultural expectations and inherent
constraints. Consistent with the prediction that regardless of the rhythm initiating a chain,
iterations of reproductions will converge to culturally- or cognitively-predicted preferred
tempo or meter, preliminary analysis of the tempo chains reveals systematic drift in the
reproduced IOIs to the predicted preferred tempo range for both auditory and visual
sequences. Analysis of chains involving the reproduction of simple and complex meters
is in progress. To our knowledge, this approach to studying the influence of biases on
rhythm perception is unique: the method of data collection enables us to engage a broader
population of non-experts in music cognition than is typically seen in the lab using a
method that has not yet been applied to examine the mental representation of rhythm.
Background
Listeners have good memories for the absolute tempo of well-known songs and are
sensitive to relative tempo differences in simple rhythms, but in a recent study (London
et al., Acta Psychologica, 2016) participants made systematic errors when presented with
time-stretched vs. original versions of popular songs: sped-up slow songs were rated
faster than slowed down fast songs, even though their absolute tempos (measured in beats
per minute) were the same.
Aims
To study these errors and their possible sources, two follow-up experiments were
conducted. The first varied BPM rate, low frequency spectral flux, and dynamics, and the
second involved synchronized self-motion (tapping).
Method
Stimuli from the previous study were re-used in both experiments: 6 classic R&B songs
(matched high/low flux pairs at 105, 115, and 130 BPM), plus versions time-stretched
±5%; participants rated the speed (not beat rate) on a 7-point scale. In Exp1 stimuli were
presented at two loudness levels (±7 dB); in Exp2 stimuli were presented at a single
dynamic level and participants tapped along to the beat.
Results
In Exp1 significant main effects were found for BPM, Flux, and Loudness, and a significant
BPM x Flux interaction. High flux songs were rated faster than low flux, louder faster
than softer, and flux and loudness had greater effects at slower tempos. Overall, the
confusion between absolute vs. relative tempo found in the 2016 experiment again
occurred. Pilot results from Exp. 2 found that while tapping ameliorated the relative vs.
absolute tempo errors somewhat, it also led to octave errors at faster BPM levels, and
correspondingly slower tempo judgments.
Conclusions
Collectively, these results support an account of tempo in which "faster" is equated with
either relative or absolute increases in observed and/or expended energy.
Previous studies have shown that there are systematic discrepancies between played
rhythms that are physically correct and those that are subjectively correct. The purpose of the
present study was to examine how a beat framework would affect such a discrepancy
observed in a dotted rhythm. A dotted rhythm consisting of a dotted quaver and a
semiquaver embedded in two-four time was presented to 13 participants. Very short pure-
tone bursts (1000 Hz) were used to mark the onsets of the notes; the whole pattern
sounded as if played on a percussion instrument. The location of the dotted rhythm was
varied between the first beat of the second bar and the second beat of the third bar. The
tempo was 80, 100, or 125 bpm. The corresponding music notation was also presented for reference.
The participants were instructed to adjust the physical ratio between adjacent durations so
as to make them subjectively correct. The results, by and large, indicated a tendency that
the adjusted physical proportion of the dotted quaver was larger than the notated
proportion, either in the order of the dotted quaver and the semiquaver (notating 3:1) or in
the opposite order (1:3). This tendency was larger when the semiquaver preceded the
dotted quaver, except when the dotted rhythm was located on the first beat of the second
bar. No effect of the location within the bar (whether the dotted rhythm was on the first or
on the second beat) was observed. The physical proportion of the dotted quaver (that was
subjectively correct) tended to be larger than the notated proportion, which was in line
with previous studies. The results also suggested that the location of a dotted rhythm
might influence the perception of the duration ratio.
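The adjustment task can be made concrete with a small helper that converts a long:short duration ratio into onset-to-onset durations at a given tempo (a sketch; the 3.3 example ratio below is illustrative only, not a value reported in the study):

```python
def dotted_pair_durations(bpm: float, ratio: float = 3.0):
    """Durations (ms) of a dotted-quaver + semiquaver pair spanning
    one crotchet beat, for a given long:short duration ratio.
    The notated ratio is 3:1 (ratio = 3.0)."""
    beat_ms = 60000.0 / bpm          # one crotchet (quarter note)
    short = beat_ms / (ratio + 1.0)  # semiquaver's share of the beat
    return beat_ms - short, short    # (dotted quaver, semiquaver)

notated = dotted_pair_durations(100)        # 3:1, as written
adjusted = dotted_pair_durations(100, 3.3)  # hypothetical subjective ratio
```

At 100 bpm the notated pattern is 450 ms + 150 ms; a larger subjectively correct ratio lengthens the dotted quaver at the semiquaver's expense, matching the reported tendency.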
Why do some compositions compel performance at certain tempi whereas others “sound
right” under a variety of interpretations? Although J.S. Bach’s Well Tempered Clavier
has been reprinted numerous times and recorded extensively, his “intended tempi” remain
elusive. In his liner notes András Schiff (2011) openly confesses to wondering “what is
the right tempo…and how do we find it?”, and Wanda Landowska (1954) acknowledges
that some of her tempos “will probably be surprising.” Renowned artists have disagreed
strikingly on interpretation—Glenn Gould recorded the E minor Fugue at twice the tempo
of Rosalyn Tureck, and Anthony Newman the B minor Prelude at three times that of
Friedrich Gulda! However, this discrepancy is not universal: performances of the Eb minor
Prelude are typically similar. Willard Palmer’s survey (used previously by Poon & Schutz,
2015, to quantify timing) affords a novel opportunity to explore this intriguing issue.
There is evidence that healthy individuals entrain to the beat via neural oscillations that
synchronize with, and adapt to, regularities within an auditory sequence. Here we
examined whether beat deaf participants, who have experienced lifelong difficulties
synchronizing movements to the musical beat, would display abnormal neural entrainment
when listening and tapping to two complex rhythms that were either strongly or weakly
beat inducing, at two different tempi (5Hz and 2.67Hz base frequencies). First,
participants were instructed to detect tempo accelerations while listening to the repeating
rhythm. In a second phase, participants were instructed to tap the beat of the rhythm.
Time-frequency analyses identified the frequencies at which neural oscillations occurred
and the frequencies at which participants tapped. We hypothesized that behavioral beat
entrainment (i.e., tapping) is impaired for beat deaf persons compared to controls while
neural entrainment (i.e., steady-state evoked potentials) is comparable for beat deaf
persons and controls. Results indicate that, for both rhythms and tempi, beat deaf
participants did not demonstrate a reduction of entrainment compared to controls at the
relevant beat frequencies during listening; Bayes factor t-tests revealed considerable
evidence in favor of the null hypothesis, indicating that entrainment in beat deaf
participants showed no deficit compared with controls. Conversely, beat deaf participants
demonstrated a reduced ability to synchronize with the beat at the frequencies of interest
compared with controls when tapping. Findings suggest that beat deafness is primarily a
motor/sensorimotor deficit and not a perceptual disorder as implied by the term “beat
deafness.”
Western musical notation is built on temporal intervals related by simple integer ratios.
However, the reason for the importance of these ratios is unclear, as some traditions
feature non-integer rhythmic patterns (Polak & London 2014). The issue remains
unresolved in part because prior studies have either relied on explicit judgments of
experts (Desain & Honing 2003; Clarke 1984), or have predominantly tested Western
musicians (Povel 1982; Repp et al. 2011).
We introduce a novel procedure adapted from computational concept modeling (Xu et al.,
2010) in which a listener’s reproduction of a 2- or 3- interval rhythm is used to infer the
internal “prior” probability distribution constraining human perception. Listeners are
presented with a random seed (interval ratios drawn from a uniform distribution), and
reproduce it by tapping or vocalizing. The reproduction is then substituted for the seed,
and the process is iterated. After a few iterations the reproductions become dominated by
the listener’s internal biases, and converge to their prior.
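The iterated-reproduction logic can be sketched with a toy simulation (an illustration only, not the authors' model: the attractor set, pull strength, and noise level are all assumptions) in which each reproduction drifts toward the nearest integer-ratio attractor:

```python
import random

# A 2-interval rhythm is coded as short/total ratio; integer-ratio
# attractors stand in for the listener's internal prior.
ATTRACTORS = (1 / 3, 1 / 2, 2 / 3)  # 1:2, 1:1, and 2:1 patterns

def reproduce(ratio, pull=0.5, noise_sd=0.01):
    """One reproduction: drift toward the nearest attractor plus
    Gaussian motor noise, clipped to keep both intervals audible."""
    nearest = min(ATTRACTORS, key=lambda a: abs(a - ratio))
    r = ratio + pull * (nearest - ratio) + random.gauss(0.0, noise_sd)
    return min(max(r, 0.05), 0.95)

def iterate_chain(seed_ratio, n_iterations=8):
    """Feed each reproduction back in as the next stimulus."""
    r = seed_ratio
    for _ in range(n_iterations):
        r = reproduce(r)
    return r

random.seed(0)
final = iterate_chain(0.41)   # seed drawn away from any attractor
```

After a few iterations the chain sits near an integer-ratio attractor, which is the convergence behavior the method exploits to read out the prior.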
When tested in this way, Western musicians consistently converged to attractors near
integer-ratios. Integer-ratio biases were evident even in non-musicians. To explore
whether these biases are due to passive exposure to Western music, we tested members of
the Tsimane’, an Amazonian hunter/gatherer society. Despite profoundly different
cultural exposure, the Tsimane’ also exhibited a prior favoring integer ratios, though the
prominence of certain ratios differed from Westerners.
Our results provide evidence for the universality of integer ratios as perceptual priors.
However, differences exist between groups in the relative strength of rhythmic perceptual
categories, suggesting influences of musical experience. Both findings were enabled by
our method, which provides high-resolution measurement of rhythmic priors regardless
of the listener’s musical expertise.
Background
Recent studies of ensemble synchronization provide valuable insight into the question of
musical leadership, but thus far these methods have only been applied to Western music
(Keller 2013; Wing et al. 2014). Performers in a Jembe ensemble from Mali (typically
2-4 musicians) inhabit distinct musical roles: lead drum, repertoire-specific timeline, and
one or two ostinato accompaniment parts, making this repertoire ideally suited to
investigate ensemble entrainment.
Aims
We evaluate two different computational models of ensemble synchronization to study
implicit leadership roles as expressed in timing relationships between performers, and
expand the application of this methodology in a cross-cultural comparative perspective.
Method
We analyzed 15 performances of three pieces of Jembe percussion music recorded in
Bamako, Mali in 2006/07. A semiautomatic onset detection scheme yielded a data set of
~40,000 onsets with a temporal resolution of ±1ms. Two different statistical approaches,
linear phase correction modeling (Vorberg and Schulze 2002) and Granger causality
(Barnett and Seth 2014), were used to analyze the timing influences between the parts.
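The logic of linear phase correction modeling can be illustrated with a highly simplified single-part sketch (the reduction to one performer against a fixed referent, the noise model, and the parameter values are simplifying assumptions, not the authors' pipeline):

```python
import random

def simulate_asynchronies(alpha, n=500, noise_sd=5.0):
    """First-order linear phase correction: each asynchrony (ms) is a
    fraction (1 - alpha) of the previous one plus timekeeper/motor
    noise. alpha is the phase-correction gain."""
    a = [0.0]
    for _ in range(n - 1):
        a.append((1 - alpha) * a[-1] + random.gauss(0.0, noise_sd))
    return a

def estimate_alpha(asyn):
    """Recover alpha by least-squares regression of a[n+1] on a[n]."""
    x, y = asyn[:-1], asyn[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return 1 - slope

random.seed(1)
alpha_hat = estimate_alpha(simulate_asynchronies(alpha=0.6))
```

In the ensemble case, fitting such correction gains pairwise between parts reveals who adapts to whom: a part with large gains toward the others (here, the lead drum) is a timing follower.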
Results
We found that whereas no single part is a clear leader, there is one consistent follower:
the lead drum, which adapts to the timeline and accompaniment parts to a far greater
extent than vice versa. This is significantly different from the timing behaviors found for
Western string quartets (Timmers et al 2014; Wing et al. 2014).
Conclusion
Our finding that the lead drum has the most adaptive role in terms of ensemble timing
suggests a reconsideration of the concept of leadership in Jembe ensembles, and perhaps
more generally. These findings also complicate the Africanist concept of timeline as the
central timing reference. Rather, the accompaniment plus timeline parts together play a
time-keeping role in the jembe ensemble.
Background
Groove—that aspect of a musical stimulus that elicits a tendency to entrain or move
along with it—is of empirical and theoretical interest (Danielsen 2006, Janata, Tomic, &
Haberman 2012, Madison et al 2011, Witek et al 2014). Prior work has focused on
groove and genre, microtiming, or tempo, with less attention to specifics of musical
structure.
Aims
This project investigates groove as a function of multi-voice rhythmic structure. Our
emphasis is on the role of interactions of accent structures in different musical lines.
Methods
Stimuli are taken from two different sources: rhythm section recordings (drums, bass,
and other instruments, e.g. guitar and keyboards) and full recordings (excerpts with
vocals and instrumentation). Stimuli vary by number and combinations of lines (one,
some, or all rhythm instruments, or song segments including only rhythm section
instruments or others, such as horns), and also by selected transformations of order (such
as retrograde of original rhythms). Participants give ratings for groove and also move to
the music as desired, not limited to manual tapping.
Results
Data are still being collected so results are preliminary. Rhythmic structure has a
significant influence on groove, more with regard to entrainment than to global ratings of
groove. Listeners seek 'stable' patterns of entrainment to rhythmic schemata in popular
music and entrain to these, which are not necessarily predicted by notation or
instrumental conventions (e.g. bass drum = downbeats), although rhythm section
instruments are more influential than others. High-groove rhythms tend toward 'front
loading,' with stronger and more predictable periodic structure at the beginning of metric
units and more variability as the metric unit progresses.
Conclusions
A sense of groove emerges from the interactions of musical lines within metric
frameworks, with listeners responding to both stability (predictability) and diversity
(novelty) of rhythmic structure.
Swing is a popular rhythmic technique that systematically delays the onset of each even-
numbered division of a bar (at a particular metric level, e.g., 16th-note swing delays each
even-numbered 16th-note subdivision). This creates a “swinging” pulse of alternating
long and short divisions, as opposed to a “straight” pulse of isochronous divisions. The
present study used two experiments to explore college students’ thresholds for perceiving
a difference between straight rhythm and swinging rhythm. The first experiment used a
signal detection paradigm: Participants listened to pairs of looped drum patterns and
judged whether the two patterns sounded “identical” or “different” in timing; in each pair,
the patterns were either exactly alike or differed only in that one pattern was
straight and the other was swinging. Two independent variables were manipulated from
trial to trial: magnitude of swing (when present), and rhythmic density. The second
experiment used the method of adjustment: Participants listened to continually-looped
drum patterns that were straight when a button was depressed and swinging when the
button was released, and freely adjusted the swing’s magnitude (the “swing ratio”) until
the point of just-noticeable-difference was determined. Two independent variables were
manipulated from trial to trial: tempo, and the loudness discrepancy between odd-
numbered and even-numbered 16th-note divisions. Results of the two experiments are
discussed in terms of how the manipulated variables affected the threshold for detecting a
difference between straight and swinging versions of a rhythm.
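The swing manipulation itself can be sketched as a function that turns a swing ratio into onset times (a minimal sketch; parameter names are illustrative):

```python
def swing_onsets(n_divisions, division_ms, swing_ratio=1.0):
    """Onset times (ms) for a run of 16th-note divisions.
    swing_ratio is long:short within each 8th-note pair: 1.0 gives a
    straight (isochronous) pulse, 2.0 the classic triplet swing.
    Each even-numbered division is delayed by the lengthened
    odd-numbered division preceding it."""
    pair = 2 * division_ms               # one 8th-note pair
    short = pair / (swing_ratio + 1.0)
    long = pair - short
    onsets, t = [], 0.0
    for i in range(n_divisions):
        onsets.append(t)
        t += long if i % 2 == 0 else short
    return onsets
```

For example, four straight 16ths at 125 ms fall at 0, 125, 250, 375 ms; at a 2:1 swing ratio the even-numbered divisions shift late (to roughly 167 and 417 ms) while the odd-numbered ones stay on the grid, which is exactly the timing difference participants had to detect.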
It is generally agreed that rap songs vary in their rhythmic complexity; it has also been
claimed that rap has gotten more rhythmically complex over time. As yet, however, few
attempts have been made to quantify the rhythmic complexity of rap or to measure its
historical development.
For the experiment, synthesized versions of actual rap songs were created consisting only
of vocal syllables of two types, stressed (e.g. “TAH”) and unstressed (e.g. “tuh”), plus a
drum track. Subjects had to judge how well the synthesized rap “fit” the underlying meter
conveyed by the drum track.
A generative probabilistic model was devised to predict listeners’ judgments. The model
assigns each rap a probability, based on the type of syllable event (stressed, unstressed, or
none) occurring at each beat, distinguishing between beats depending on their position in
the measure. (A corpus of 20 rap songs was used to set the model’s parameters.) The
model shows a significant correlation with human judgments of “fit” between the rap and
the drumbeat. The model is also tested on the probability it assigns to the corpus data
itself. The role of metrical position and syllabic stress in the model’s performance will be
examined.
Several further issues will be considered: The change over historical time in the
complexity of rap, the role of “intra-opus” probabilities (i.e. repeated rhythmic patterns
within a piece) in rhythmic complexity, and the role of anticipatory syncopation (the
tendency for accented events to fall just before a strong beat).
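The kind of generative model described can be sketched as follows (a hypothetical illustration: the 16-slot position grid, the event alphabet, and add-one smoothing are assumptions, not the authors' specification):

```python
import math
from collections import defaultdict

def fit_position_model(corpus, positions_per_bar=16, smoothing=1.0):
    """Estimate P(event | metrical position) from a corpus, with
    add-one smoothing. `corpus` is a list of sequences, one event
    ('stressed', 'unstressed', or 'none') per metrical slot."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in corpus:
        for i, ev in enumerate(seq):
            counts[i % positions_per_bar][ev] += 1
    events = ("stressed", "unstressed", "none")
    model = {}
    for pos, c in counts.items():
        total = sum(c.values()) + smoothing * len(events)
        model[pos] = {e: (c[e] + smoothing) / total for e in events}
    return model

def log_prob(seq, model, positions_per_bar=16):
    """Log-probability of a rap: product over slots of the
    position-conditional event probabilities."""
    return sum(math.log(model[i % positions_per_bar][ev])
               for i, ev in enumerate(seq))

# Tiny demo corpus: a bar of 4 slots, stress always on the downbeat.
corpus = [["stressed", "none", "unstressed", "none"] * 4] * 5
model = fit_position_model(corpus, positions_per_bar=4)
on_grid = log_prob(["stressed", "none", "unstressed", "none"], model, 4)
off_grid = log_prob(["none", "stressed", "none", "unstressed"], model, 4)
```

A rap whose stresses land where the corpus puts them scores higher than the same syllables shifted off the grid, which is the sense in which such a model can predict judged "fit" with the drum track.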
Musicians are more accurate than nonmusicians at rhythmic synchronization tasks. Many
studies use finger tapping as a means to investigate timing abilities; however, it is unclear
to what extent motor training on musicians’ respective instruments extends to
finger-tapping abilities. Here we examined how musicians’ training-specific movements
compare in both consistency and accuracy to finger tapping in a synchronization task.
Participants of different musician groups listened to an isochronous sequence of beats and
identified the final beat as being either consistent or inconsistent with the preceding
sequence. Participants either synchronized their movements to the sequence (movement
conditions) by using a drumstick, tapping with a finger, or playing their instrument, or
they listened without moving (no-movement condition). Previously we found that
musicians were significantly better at identifying the timing of the final beat when
allowed to move than when not moving. In the present study we examined the
synchronization data for each motor effector in the musical groups by assessing the
degree of variability between movements and the asynchronies between the movements
and the beat sequence. We observed that stick tapping is less variable and more
synchronous for all musicians and nonmusicians compared to finger tapping.
Additionally, musicians’ movements consistent with the instrument of training were less
variable and more synchronous than finger tapping. This suggests that movement abilities
may be relatively specialized to the motor effector with which musicians are trained.
These findings raise questions about factors that might influence a central timing
mechanism and the degree to which motor learning may be domain-specific.
Humans can perceive an isochronous beat in rhythmic auditory patterns such as music,
and use this information to make perceptual judgments, tap to the beat, and dance. Music
theory suggests there is a hierarchy of beats, with some stronger (measure downbeats)
and others weaker (upbeats). Do listeners regularly perceive these so-called metrical
hierarchies in music or in simple auditory and visual patterns? Adult musicians and non-
musicians listened to excerpts of ballroom dance music paired with either auditory
(beeps) or visual (clock-face) metronomes. Participants rated how well the metronome fit
the music they heard. The metronomes could be synchronous with the music at the level
of the beat but not the measure, the measure (the strongest beat in the repeating groups)
but not the beat, both the beat and measure, or neither, yielding four metronome
conditions. Sessions were blocked by metronome modality. Overall, participants rated
beat synchronous auditory and visual metronomes as fitting the music better than beat-
asynchronous metronomes. Musicians showed evidence of hierarchical perception, rating
metronomes synchronous with both the measure and the beat as fitting the music better
than beat-only synchronous metronomes in both modalities. Non-musicians showed the same pattern of
ratings as musicians, but their ratings did not differ as strongly between fully
synchronous and beat (but not measure) synchronous metronomes in either modality.
Thus all listeners successfully extracted beat-level information from musical excerpts and
matched it to the auditory and visual metronomes. For both groups beat-level synchrony
outweighed measure-level synchrony, but musicians were more strongly influenced by
measure-level synchrony than were non-musicians. Formal training in music may
enhance sensitivity to these additional levels of organization. Without a need to use these
hierarchical beat levels in their daily lives, non-musicians may not be as sensitive to
higher metrical levels.
Background
Prince (2014) showed that tonality enhances the salience of pitch leap accents at the
expense of duration accents in a metrical grouping task, even though tonality was
irrelevant to the task and provided no grouping cues. Thus the mere presence of
organisational structure in pitch (i.e., tonality) enhanced its dimensional salience (Prince,
Thompson, & Schmuckler, 2009). However, it is not known whether temporal
organisational structure can similarly enhance the salience of unrelated temporal
information.
Aims
The aim was to investigate if metricality (whether a sequence conforms to a regular
metre) affects the salience of duration accents.
Method
Baseline study (N=29): participants rated on a 5-point scale how strongly each atonal 24-
note sequence evoked a duple (accent every two notes) or triple (accent every three notes)
grouping. Pitch and duration accents were tested separately in order to determine values
that gave equivalent levels of accent strength. Both metric and ametric (random IOI)
sequences were tested. Experimental study (N=32): same procedure, but sequences
contained both pitch and duration accents.
Results
Overall, pitch accents were less influential than duration accents. Metricality did not
affect the strength of pitch accents but it did increase the strength of duration accents.
Conclusions
The overall advantage of duration accents over pitch accents conflicts with earlier
findings using this methodology (Ellis & Jones, 2009; Prince, 2014), perhaps caused by
including a metric manipulation. Metricality did not decrement the salience of pitch
accents, which does not correspond with previous research showing the converse relation
– that tonality decremented the salience of duration accents (Prince, 2014). However,
metricality enhanced the salience of duration accents, even though the two were entirely
independent manipulations, supporting the dimensional salience theory.
Past research on adults’ inability to synchronize to visual rhythms has been used as
evidence for the assumption that audition is for temporal processing and vision is for
spatial processing (Repp & Penel, 2002, 2004; Patel et al., 2005). But recent research has
shown adults can entrain to environmental visual rhythms (Richardson, et al. 2007).
Likewise, adults show improved tapping synchronization to visual rhythms that include
motion (Hove, Spivey, & Krumhansl, 2010; Iversen et al., 2009). This recent research
suggests that when visual rhythms are ecological, the visual system does accurately
respond to the temporal information. The aim of this series of studies was to examine
whether adults can make use of visual temporal information to synchronize eye-
movements to visual rhythms. In one study adults’ eyes were tracked watching Sesame
Street characters appear in a constant spatial pattern with either rhythmic or jittered
timing (between-subjects). An image of a cloud was present in the center of the screen
but unobtrusive to the visual rhythm, and in the second half the cloud moved to occlude a
location in the rhythmic pattern. First looks to each location in the rhythm were classified
as anticipatory or reactive and the response times recorded. The adults used the rhythm to
plan anticipatory eye-movements that were more accurate in time compared to the jittered
condition, F(1,30)=7.55, p=0.01. The placement of the cloud did not affect visual
synchronization. The anticipatory looks were closer to the onset of the characters in the
rhythmic condition compared to the jittered condition. These findings demonstrate that
adults do track visual temporal information and use it to plan eye-movements. In a
follow-up study, adults’ eyes were tracked watching audio-visual rhythms (a tweeting
bird) to examine the impact of the additional auditory information on synchronization.
Preliminary results show similar synchronized eye-movements for the audio-visual
rhythm.
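The classification of first looks as anticipatory versus reactive can be sketched as follows. This is an illustrative sketch, not the study's actual coding scheme: the function name, the 200 ms saccade-planning window, and the field layout are all assumptions.

```python
# Hypothetical sketch: classify a first look to a target location as
# anticipatory or reactive, relative to the target's onset time.

def classify_first_look(look_time_ms, target_onset_ms, window_ms=200):
    """Classify one first look and return (label, response_time_ms).

    A look landing before the target appears (or within a short window
    afterward, too soon to be stimulus-driven) counts as anticipatory;
    later looks count as reactive. The window value is an assumption.
    """
    rt = look_time_ms - target_onset_ms
    label = "anticipatory" if rt < window_ms else "reactive"
    return label, rt

# Example: target appears at t = 3000 ms
print(classify_first_look(2850, 3000))  # look before onset -> ('anticipatory', -150)
print(classify_first_look(3400, 3000))  # look 400 ms later -> ('reactive', 400)
```

Response times from such a classification are what the rhythmic-versus-jittered comparisons above operate on.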
Background
Humans synchronize movements with the perceived, regular emphasis (the beat) in
musical rhythms. Neural activity during beat perception is dynamic, time-locked, and
heavily based in the motor system. Neural oscillations synchronize to regularities, such as
the beat, in auditory rhythms (Fujioka et al., 2012; Nozaradan et al., 2011). Excitability in
the motor system fluctuates during beat perception, as indexed by the amplitude of
motor-evoked potentials (MEPs) evoked by transcranial magnetic stimulation (TMS). In
a previous study, MEP amplitude was greater when listeners heard rhythms with a strong
beat vs. a weak beat, but only for MEPs elicited 100 ms before the beat, not for MEPs
generated at random time points relative to the beat (Cameron, et al., 2012). Thus, motor
system excitability may not be uniform across the entire rhythm, but instead may
fluctuate, rising at particular times on or before the beat.
Aims
We sought to characterize the dynamics of motor system excitability during beat
perception.
Methods
Participants (n=23) listened to strong beat, weak beat, and nonbeat auditory rhythms (35s
each). TMS was applied to left primary motor cortex at 100 time points in the beat
interval of each rhythm type. MEP amplitudes were recorded with electromyography
from the right hand (first dorsal interosseous muscle).
Results
Motor system excitability dynamics differed for strong beat vs. weak beat and nonbeat
rhythms: excitability increased linearly over the beat interval (in anticipation of upcoming
beat positions) in strong beat rhythms, but not weak or nonbeat rhythms. Additionally,
excitability fluctuated at the rate of individual events (the beat rate subdivided by four) to
a greater extent in strong beat rhythms than in the other rhythms.
Conclusions
These results suggest that during beat perception, motor system excitability 1) anticipates
upcoming beats, and 2) tracks subdivisions of the beat.
Neural oscillatory activities are known to relate to various sensorimotor and cognitive
brain functions. Beta-band (13-30Hz) activities are specifically associated with the motor
system; they are characterized by a decrease in power prior to and during movement
(event-related desynchronization, ERD), and an increase in power after movement
(event-related synchronization, ERS). Recent research has shown that listening to
isochronous auditory beats without motor engagement also induces beta-band power
fluctuations in synchrony with the beat; the rate of ERS is proportional to the beat rate,
suggesting that ERS may be related to temporal prediction. Top-down processes such as
imagining march or waltz meters also induce modulations related to the temporal
structure of the particular meter. Here we hypothesize that beta-band modulations may
reflect top-down anticipation in other timing-related tasks that commonly occur in music
performance. Specifically, we examine whether actively anticipating a tempo change can
be differentiated into anticipating an increasing, a decreasing, or a steady tempo.
Electroencephalographic (EEG) signals were recorded from musicians who were asked to
actively anticipate tempo changes in auditory beat stimuli. Prior to each trial, an
informative visual cue appeared on a screen to indicate that the beat rate would increase,
decrease, or remain steady. The initial beat onset interval was 600ms for at least 6 beats
before a gradual tempo change began. The onset of the tempo change was randomized
across trials. Using time-frequency analysis, beta-band power modulations were
examined in the EEG data segment just before the tempo changes occurred. Preliminary
results replicate earlier findings of beta-band ERD and ERS in synchrony with the beat
rate. Further analysis will investigate the mean power, modulation depth, and various
electrode sites to differentiate the three conditions.
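One common way to extract band-limited power fluctuations of the kind described above is a band-pass filter followed by the Hilbert envelope. The sketch below is illustrative only: the study used time-frequency analysis, and the sampling rate, filter order, and demo signal here are assumptions.

```python
# Sketch: instantaneous beta-band (13-30 Hz) power from a single EEG channel.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def beta_power(eeg, fs=500.0, band=(13.0, 30.0)):
    """Return instantaneous beta-band power of a 1-D signal."""
    nyq = fs / 2
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, eeg)        # zero-phase band-pass filter
    envelope = np.abs(hilbert(filtered))  # instantaneous amplitude (Hilbert)
    return envelope ** 2                  # power

# Demo: a 20 Hz burst embedded in noise should show elevated beta power
rng = np.random.default_rng(0)
fs = 500.0
t = np.arange(0, 2.0, 1 / fs)
sig = 0.1 * rng.standard_normal(t.size)
sig[500:750] += np.sin(2 * np.pi * 20.0 * t[500:750])  # burst at 1.0-1.5 s
p = beta_power(sig, fs)
print(p[550:700].mean() > p[100:400].mean())  # True: burst region dominates
```

Averaging such power traces time-locked to beat onsets is one way ERD/ERS modulation depth could be quantified across conditions.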
Listening to musical rhythms activates our motor system, including premotor cortex,
supplementary motor area, cerebellum and increases corticospinal excitability as
measured by Transcranial Magnetic Stimulation (TMS) and electromyography.
According to the Action Simulation for Auditory Prediction (ASAP) hypothesis this
motor activation while listening to rhythmic sounds could be internal motor simulation,
which in turn helps the auditory system predict upcoming beats in rhythmic sounds. In
the present experiments we recorded EEG from participants who were instructed to listen
to rhythmic music while not moving their body. Preliminary data show evidence for
neural entrainment in beta band EEG and in mu wave suppression. We are presently
using Independent Components Analysis (ICA) and dipole source localization to
determine the motor sources of rhythm-related modulation of EEG. The results are
discussed in the context of the role of motor simulation even in the absence of explicit
movement.
A growing body of work suggests connections between rhythmic abilities and more
general cognitive performance. An outstanding issue is whether training in complex beat
perception (involving the perception and maintenance of a periodic pulse in the face of
highly syncopated or ambiguous rhythms) is in any way more effective than mere
synchronization with a metronome. We have designed an adaptive test and training
method for enhancing beat perception and production, with two ultimate goals: 1)
examining whether this kind of training yields measurable benefits to attention and other
cognitive behaviors, and 2) examining whether real-time brain-computer interface (BCI) feedback
might augment training. We will describe the Beat Expertise Adaptive Test (BEAT),
which assesses the limits of an individual's ability to maintain a beat to complex rhythms.
In the BEAT participants synchronize to the beat of a continuously evolving rhythm
starting with a strong, clear beat (simple march-like rhythms easy to synchronize to) and
evolving into patterns with a weakly supported beat (syncopated patterns with no
dominant beat). Successful synchronization (defined by absolute asynchrony over the
preceding 3 beats < 100 ms) causes the rhythms to become progressively harder, while
poor synchronization causes the current rhythm to become easier, enabling the adaptive
determination of an individual complex beat synchronization threshold. We will present
initial BEAT results for nine participants pre- and post- a 30 min interactive beat
perception training program that expands the principles of the BEAT into a multi-level
game with visual feedback. Subjects varied in performance both before training and in
their relative change after training, with the majority showing significant gains in
maximal difficulty achieved after training. We will present preliminary analyses of
simultaneously recorded EEG, testing if it is possible to develop a signature for loss of
synchronization.
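The adaptive logic described above can be sketched as a simple staircase. The 100 ms criterion over the preceding 3 beats follows the text (interpreted here as each of the last three asynchronies under 100 ms); the class name, level range, and step rules are illustrative assumptions, not the actual BEAT implementation.

```python
# Sketch of an adaptive staircase for complex beat synchronization.
class AdaptiveBeatTest:
    """Toy staircase: difficulty rises with sustained good synchronization."""

    def __init__(self, n_levels=10):
        self.level = 0          # 0 = strong, clear beat; higher = weaker beat
        self.max_level = n_levels - 1
        self.asynchronies = []  # absolute tap-beat asynchronies, in ms

    def record_tap(self, asynchrony_ms):
        """Log one tap, then adapt difficulty from the preceding 3 beats."""
        self.asynchronies.append(abs(asynchrony_ms))
        if len(self.asynchronies) < 3:
            return
        recent = self.asynchronies[-3:]
        if all(a < 100 for a in recent):       # sustained synchronization
            self.level = min(self.level + 1, self.max_level)
        elif all(a >= 100 for a in recent):    # sustained failure
            self.level = max(self.level - 1, 0)

staircase = AdaptiveBeatTest()
for asyn in [40, 55, 30, 60, 20, 150, 180, 160]:
    staircase.record_tap(asyn)
print(staircase.level)  # 2: climbed to 3, then stepped back after three poor taps
```

The highest level sustained under such a rule is one way an individual's complex beat synchronization threshold could be defined.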
¹masahiro.okano.sr@gmail.com, ²shinya.masahiro@gmail.com, ³kudo@idaten.c.u-tokyo.ac.jp
was pasted on top for easy tapping) in front of them, and the corresponding voltage was sampled at 1,000 Hz using a 16-bit analog-to-digital converter (USB-6218 BNC, National Instruments, Austin, TX, USA). Participants could hear the metronome beats through a speaker in front of them, and their drum sound and that of their partner were played through the speakers. The metronome beats were presented using an application (Metronome Beats 2.3, www.stonekick.com) on a smartphone (ZenPhone5, ASUS) and sampled at 1,000 Hz using the same analog-to-digital converter. Partitions were placed between the participants in each pair and the experimenter, to avoid visual contact and possible distraction resulting from the experimenter's gaze.

C. Paired Tempo Keeping Task

In the paired tempo keeping task, participants were required to tap in synchrony with each other as well as to maintain the reference tempo as accurately as possible, as if they were playing as members of an ensemble. In each trial, participants first tapped in synchrony with metronome beats. Ten seconds after the participants started tapping, the metronome was turned off, and they continued tapping, keeping as close as possible to the reference tempo. The reference tempos were set at 75, 120, and 200 bpm (800, 500, and 300 ms in ITI, respectively). These settings were intended to cover slow, medium, and fast musical tempos. The order of reference tempos was counterbalanced. All participants performed the task twice.

D. Analyses

1) Tap Onset Detection. We preprocessed the data to optimize tap onset detection, using the following procedure: we obtained the zero-mean voltage time series by subtracting the mean value from the original time series, and then full-wave rectified it. Data were passed through a 100–300 Hz second-order Butterworth band-pass filter. Filtered data were Hilbert transformed, and the instantaneous amplitude was calculated to obtain an envelope of the waveform. Tap onset times in the preprocessed data were defined as the times at which the value exceeded the threshold while the value in the adjacent 50-ms window remained below it.

2) Detrended Cross-correlation Analysis (DCCA) and Detrended Fluctuation Analysis (DFA). DCCA works on two time series x(t) and y(t) of equal length N. First, we obtain X(t) and Y(t) by calculating the cumulative sum of the deviation time series of x(t) and y(t), respectively, for each time point t:

$$X(t) = \sum_{i=1}^{t} [x(i) - \bar{x}] \quad \text{and} \quad Y(t) = \sum_{i=1}^{t} [y(i) - \bar{y}] \qquad (1)$$

These integrated series are divided into k non-overlapping segments of length n, the series X(t) and Y(t) are locally detrended, and the covariance of the detrended X(t) and Y(t) is calculated as

$$F^2_{DCCA}(n) = \frac{1}{N} \sum_{t=1}^{N} [X(t) - X_n(t)]\,[Y(t) - Y_n(t)] \qquad (2)$$

where X_n(t) and Y_n(t) denote the theoretical values obtained from local regression of X(t) and Y(t), respectively. This calculation is repeated over all possible interval lengths n. Typically, F_DCCA(n) increases with the interval length n, and a power law is expected:

$$F_{DCCA}(n) \propto n^{\lambda} \qquad (3)$$

where λ is the scaling exponent, expressed as the slope of the log-log plot of F_DCCA(n) as a function of n. When we input x(t) to equation (1) instead of y(t), DCCA reduces to DFA:

$$F^2_{DFA}(n) = \frac{1}{N} \sum_{t=1}^{N} [X(t) - X_n(t)]^2 \qquad (4)$$

$$F_{DFA}(n) \propto n^{\alpha} \qquad (5)$$

Zebende (2011) noted that λ itself does not reflect the strength of cross-correlation, and introduced the DCCA cross-correlation coefficient ρ_DCCA(n), defined as

$$\rho_{DCCA}(n) = \frac{F^2_{DCCA}(n)}{F_{DFA,x}(n)\, F_{DFA,y}(n)} \qquad (6)$$

where F_DFA,x(n) and F_DFA,y(n) represent F_DFA(n) for x(t) and y(t), respectively. ρ_DCCA(n) is a set of coefficients varying between -1 (perfect anti-cross-correlation) and 1 (perfect cross-correlation) and represents the evolution of cross-correlation with n; ρ_DCCA(n) = 0 means no cross-correlation at the time scale of interval length n.

In DFA, the value α = 0.5 indicates the absence of autocorrelations; that is, the input signal is white noise. α > 0.5 indicates persistent long-range correlations; that is, large (small) values are more likely to be followed by large (small) values. α < 0.5 indicates anti-persistent autocorrelations, meaning that large values are more apt to be followed by small values and vice versa. The values α = 1 and α = 1.5 mean the input signal is 1/f noise and Brownian noise, respectively (Peng, Havlin, Stanley, & Goldberger, 1995). The relationship between the DFA α and the slope derived from the PSD (-β) is α = (β + 1) / 2 (Delignières, Ramdani, Lemoine, Torre, Fortes, & Ninot, 2006). It should be noted that α itself does not reflect the entire long-range autocorrelation property either: some tasks yield F_DFA(n) plots with different slopes at different time scales (Peng et al., 1995; Riley, Bonnette, Kuznetsov, Wallot, & Gao, 2012).

We applied DFA and DCCA to the ITI time series obtained from the paired tempo-keeping synchronization task. We also calculated the PSD, because fractal analyses are recommended to be performed with a combination of multiple methods (Delignières et al., 2006). All three methods were performed after removing the linear trend, because drift of the time series unreasonably increases low-frequency power in the PSD and correlations at longer time scales in DFA and DCCA.

III. RESULTS AND DISCUSSION

One participant was unable to synchronize with their partner; therefore, the data from the pair including this participant were excluded from subsequent analyses. The following results are based on the data from the remaining 12 pairs (24 participants). Figure 1 shows typical examples of F_DFA(n) and PSD plots. As shown in Figure 1, the slopes at long time scales (> 20 s) were about the same, whereas those at short time scales (< 20 s) were different. Figure 2 shows correlations of the PSD slopes between participants in each pair. At short time scales, the slopes of the participants in a pair were not correlated (r(24) = -.11, .36, and -.23 for 75, 120, and 200 bpm, respectively, ps > .05), whereas they were highly correlated at long time scales (r(24) = .93, .96, and .75 for 75, 120, and 200 bpm, respectively, ps < .001), suggesting complexity matching at long time scales. Moreover, the differences between the correlation coefficients for long and short time scales were significant (z = 5.84, 4.91, and 3.94 for 75, 120, and 200 bpm, respectively, ps < .001).

Figure 1. (A) A typical example of F_DFA(n) of an ITI time series in a trial. (B) Power spectrum of the ITI time series in the same trial. In both plots, the slopes at longer time scales (> 20 s) were about the same, whereas those at shorter time scales (< 20 s) were different.

Complexity matching has been discussed as reflecting participants' strong anticipation (Stephen et al., 2008; Stepp & Turvey, 2010). Strong anticipation is a form of anticipation that is not based on internal models (Stepp & Turvey, 2010). For example, a system made up of two coupled external semiconductor lasers, in which the transmitter laser takes delayed feedback input from itself whereas the receiver takes input from the transmitter at negligible delay, demonstrates anticipating synchronization: the measured beam intensity of the receiver over time shows a waveform similar to that of the transmitter, and the receiver wave shows a phase lead (Stepp & Turvey, 2010). In an experiment by Washburn et al. (2015), participants (coordinators) could synchronize their arm movements with the chaotic arm movements of their partners (producers) even with delayed feedback. Coordinators with delayed feedback had to anticipate the producers' movements to synchronize with them, whereas the producers' movements were confirmed to be chaotic by the Lyapunov exponent. This type of anticipation cannot be explained by an internal model (anticipation based on internal models is referred to as weak anticipation, in contrast to strong anticipation). Complexity matching at long time scales would suggest the participants' ability to sense and adapt to the global structure of their partner's tapping.

Figure 3 shows a typical example of ρ_DCCA(n), indicating that greater correlations were obtained at longer time scales. While Hennig (2014) demonstrated that global coordination emerges from local coordination, our result suggests that global coordination does not necessarily require local coordination. As far as we know, no study has reported this type of dynamics; further investigation is needed to understand the process that produces global coordination without local coordination.

In conclusion, PSD, DFA, and DCCA enable us to investigate temporal coordination properties at various time scales. These methods would be useful tools for understanding the dynamics of musical synchronization.

Figure 2. Correlations of PSD slopes between participants in each pair (A) for short time scales (< 20 s) and (B) for long time scales (> 20 s). At short time scales, the slopes of participants in a pair were not correlated, whereas they were highly correlated at long time scales.

ACKNOWLEDGMENT

This study was partially supported by a research grant from the Tateishi Science and Technology Foundation awarded to MO.
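The DFA/DCCA computations in equations (1)–(6) can be sketched compactly in Python. This is an illustrative implementation under stated assumptions (non-overlapping windows, first-order polynomial detrending), not the authors' analysis code.

```python
# Sketch of DFA, DCCA, and Zebende's rho_DCCA coefficient.
import numpy as np

def _profile(x):
    """Eq. (1): cumulative sum of deviations from the mean."""
    return np.cumsum(np.asarray(x, dtype=float) - np.mean(x))

def _detrended(X, n):
    """Residuals of a local linear fit in non-overlapping windows of length n."""
    k = len(X) // n
    t = np.arange(n)
    res = np.empty(k * n)
    for i in range(k):
        seg = X[i * n:(i + 1) * n]
        res[i * n:(i + 1) * n] = seg - np.polyval(np.polyfit(t, seg, 1), t)
    return res

def f2_dcca(x, y, n):
    """Eq. (2): detrended covariance; with x == y this is DFA's F^2 (eq. 4)."""
    return np.mean(_detrended(_profile(x), n) * _detrended(_profile(y), n))

def rho_dcca(x, y, n):
    """Eq. (6): DCCA cross-correlation coefficient (Zebende, 2011)."""
    return f2_dcca(x, y, n) / np.sqrt(f2_dcca(x, x, n) * f2_dcca(y, y, n))

# Two series sharing a slow component should be strongly cross-correlated
rng = np.random.default_rng(1)
shared = np.cumsum(rng.standard_normal(1024))  # common long-range structure
x = shared + rng.standard_normal(1024)
y = shared + rng.standard_normal(1024)
print(rho_dcca(x, y, 32) > 0.9)  # True: shared structure dominates at n = 32
```

Repeating rho_dcca over a range of window lengths n yields the kind of ρ_DCCA(n) curve shown in Figure 3; the slope of log F_DFA(n) against log n estimates the α reported from DFA.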
REFERENCES
Coey, C. A., Washburn, A., Hassebrock, J., & Richardson, M. J. (2016). Complexity
matching effects in bimanual and interpersonal syncopated finger tapping.
Neuroscience Letters, 616, 204–210.
Delignières, D., & Marmelat, V. (2014). Strong anticipation and long-range
cross-correlation: Application of detrended cross-correlation analysis to human
behavioral data. Physica A, 394, 47–60.
Delignières, D., Ramdani, S., Lemoine, L., Torre, K., Fortes, M., & Ninot, G. (2006).
Fractal analyses for 'short' time series: A re-assessment of classical methods.
Journal of Mathematical Psychology, 50(6), 525–544.
Fine, J. M., Likens, A. D., Amazeen, E. L., & Amazeen, P. G. (2015). Emergent
complexity matching in interpersonal coordination: Local dynamics and global
variability. Journal of Experimental Psychology: Human Perception and
Performance, 41(2), 1–15.
Hennig, H. (2014). Synchronization in human musical rhythms and mutually
interacting complex systems. Proceedings of the National Academy of Sciences of
the United States of America, 111(36), 12974–12979.
Marmelat, V., & Delignières, D. (2012). Strong anticipation: Complexity matching in
interpersonal coordination. Experimental Brain Research, 222(1–2), 137–148.
Peng, C. K., Havlin, S., Stanley, H. E., & Goldberger, A. L. (1995).
Quantification of scaling exponents and crossover phenomena in
nonstationary heartbeat time series. Chaos (Woodbury, N.Y.), 5(1),
82–87.
Podobnik, B., & Stanley, H. E. (2008). Detrended Cross-Correlation
Analysis: A New Method for Analyzing Two Nonstationary Time
Series. Physical Review Letters, 100(8), 084102.
Riley, M. A., Bonnette, S., Kuznetsov, N., Wallot, S., & Gao, J.
(2012). A tutorial introduction to adaptive fractal analysis.
Frontiers in Psychology, 3(371), 1–10.
Stephen, D. G., Stepp, N., Dixon, J. A., & Turvey, M. T. (2008).
Strong anticipation: Sensitivity to long-range correlations in
synchronization behavior. Physica A: Statistical Mechanics and
Its Applications, 387(21), 5271–5278.
http://doi.org/10.1016/j.physa.2008.05.015
Stepp, N., & Turvey, M. T. (2010). On Strong Anticipation. Cognitive
Systems Research, 11(2), 148–164.
Washburn, A., Kallen, R. W., Coey, C. A., Shockley, K., &
Richardson, M. J. (2015). Harmony from chaos?
Perceptual-motor delays enhance behavioral anticipation in social
interaction. Journal of Experimental Psychology: Human
Perception and Performance, 41(4), 1166–1177.
Zebende, G. F. (2011). DCCA cross-correlation coefficient:
Quantifying level of cross-correlation. Physica A: Statistical
Mechanics and Its Applications, 390(4), 614–618.
The assumption that audition is for temporal processing and vision is for spatial
processing has fueled research on adults’ inability to synchronize to visual rhythms (Repp
& Penel, 2002, 2004; Patel et al., 2005). But recent research shows adults can entrain to
environmental visual rhythms (Richardson et al., 2007) and improve tapping
synchronization to moving visual rhythms (Hove, Spivey, & Krumhansl, 2010; Iversen et
al., 2009). This recent research suggests the visual system does process temporal
information. This series of studies used eye-tracking to explore eye responses to visual
rhythms, looking for synchronization. Infants and adults were compared to gain a
developmental perspective. Eight-month-olds and adults watched Sesame Street
characters move in a constant spatial pattern with either rhythmic or jittered timing
(between-subjects). First looks to target locations were classified as anticipatory or
reactive and the response times recorded. The infants reacted to the characters rather than
anticipated their locations. However, in the first half of the experiment infants’ looks
were significantly faster in the rhythmic than the jittered conditions, F(1,30)=7.30,
p=0.01. The adults showed no difference between the rhythmic and jittered conditions in
this simple experiment (ps > 0.05), but in a second experiment with a distracter object, the
adults’ looks were significantly faster in the rhythmic than the jittered conditions,
F(1,30)=7.55, p=0.01. The adults’ anticipatory looks were closer to the onset of the
characters in the rhythmic condition. These results indicate infants and adults do track
visual rhythms and use them to plan their corresponding eye-movements. Follow-up
studies using audio-visual rhythms with adults will be discussed for the impact of the
addition of auditory information. Finally, future directions for a contingent looking
paradigm based on visual synchronization accuracy will be proposed as an applied
methodology.
Deconstructing nPVI
Nathaniel Condit-Schultz¹
¹School of Music, The Ohio State University, USA
58 Haydn string quartets¹; (4) the author's corpus of popular rap transcriptions, the Musical Corpus of Flow (MCFlow).² Figure 1 shows the distribution of nPVI scores across various subsections of the four corpora. The variation in nPVI evident …

[Figure 1: scatter plots of nPVI scores across corpus subsections; recoverable panel labels include "Europe, 21 Regions," "6044 Songs," and "1520 Songs." An accompanying table fragment, "Equivalent Average Ratio (unordered)," could not be recovered from the extraction.]
● ● ● ●● ●●● ● ● ●●
● ● ●
● ● ● ● ● ● ●● ● ● ●● ● ● ●● ●● ●●●● ●
●●
● ●
● ● ● ●● ● ● ●●
●● ● ● ●● ● ●● ● ● ●
● ●●
● ●
210 Movements
● ● ● ● ●
● ● ● ● ● ● ● ● ●
III. DISCUSSION
A. Calculating nPVI
[Figure residue removed: plotted data points for the Rap corpus; surviving labels: "85 Emcees" and "389 Verses".]

To calculate an nPVI, the following equation is applied to each adjacent pair in an ordered series of durations d_1, d_2, ..., d_m:

    nPVI = 100 / (m - 1) * sum_{k=1}^{m-1} |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2)
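The standard nPVI computation (Grabe & Low, 2002; Patel & Daniele, 2003) is straightforward to implement. A minimal Python sketch, illustrative only and not the authors' own code:

```python
def npvi(durations):
    """normalized Pairwise Variability Index of an ordered duration series.

    For each adjacent pair, take the absolute difference divided by the
    pair's mean, then average over the m - 1 pairs and scale by 100."""
    pairs = list(zip(durations, durations[1:]))
    return 100.0 / len(pairs) * sum(
        abs(a - b) / ((a + b) / 2.0) for a, b in pairs)

print(npvi([1, 1, 1, 1]))  # perfectly even rhythm -> 0.0
print(npvi([3, 1, 3, 1]))  # alternating dotted (3:1) rhythm -> 100.0
```

A perfectly isochronous series scores 0, while maximal alternation drives the index toward 200.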
[Figure residue removed: histograms of the proportion of IOI pairs by nP value for the Europe, China, and Haydn corpora (y-axis: Proportion of Pairs; x-axis: nP Value). Surviving caption fragment: "... 70%, and 90% quantiles of the European corpus. In other words, the first song's nPVI is greater than ..."]
[Figure residue removed: histograms of the proportion of IOI pairs by unordered ratio (nP value) for pair sets with nPVI < 10%, nPVI < 30%, and nPVI < 90%.]

...raised to Europe = .98, China = .96, Haydn = .98, and Rap = .98.

It is often convenient to reduce complex information like the distributions shown in Figures 3 and 4 to a single number, as nPVI does. However, it may ultimately be more fruitful to retain more information. For instance, by comparing the nPV distributions directly, we can more thoroughly explore the differences in pairwise durational relationships between French and English. The proportion of 1:1 pairs in French and English songs is 41.5% and 38.3% respectively, a fairly minor difference. However, the biggest difference between French and English IOI pairs is that French songs in the corpus contain approximately 63% more 3:1 ratios than English songs. Interestingly, this ratio forms rhythms such as the dotted eighth-sixteenth figure, which are associated [...]ing the relative proportion of 1:1, 2:1, and 3:1 IOI pairs.
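The pairwise-ratio comparison discussed in this section (proportions of 1:1, 2:1, and 3:1 pairs) amounts to binning each adjacent IOI pair by its unordered ratio. A hypothetical toy example in Python; the duration series and bin granularity are assumptions, not the paper's data:

```python
from collections import Counter
from fractions import Fraction

def unordered_ratio(a, b, max_den=8):
    """Unordered duration ratio of an adjacent IOI pair, as a Fraction >= 1."""
    return Fraction(max(a, b) / min(a, b)).limit_denominator(max_den)

iois = [1, 1, 2, 1, 3, 1, 1]  # toy inter-onset-interval series
bins = Counter(unordered_ratio(a, b) for a, b in zip(iois, iois[1:]))
total = sum(bins.values())
for ratio, count in sorted(bins.items()):
    print(f"{ratio.numerator}:{ratio.denominator}  {100 * count / total:.1f}%")
```

`limit_denominator` snaps near-rational float ratios to simple fractions, which is one plausible way to assign real performance durations to categories like 3:2 or 4:3.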
[Figure 5 residue removed: scatter plots of the 21 Essen corpus regions, with each region's nPVI value printed beside its name (e.g., Italy = 54, France = 53, England = 46, Netherlands = 45, Hungary = 36).]

Figure 5: Top figure: the 21 European regions in the Essen corpus plotted by their proportion of 1:1 and 2:1 IOI pairs. Bottom figure: the 21 European regions in the Essen corpus plotted by their proportion of 3:1 and 2:1 IOI pairs. In both figures, regions' nPVI values are printed next to their names.

REFERENCES

Daniele, J. R. and Patel, A. D. (2015). Stability and change in rhythmic patterning across a composer's lifetime: A study of four famous composers using the nPVI equation. Music Perception: An Interdisciplinary Journal, 33(2):255-265.

Grabe, E. and Low, E. L. (2002). Durational variability in speech and the rhythm class hypothesis, pages 515-546. Berlin: Mouton de Gruyter.

Huron, D. and Ollen, J. E. (2003). Agogic contrast in French and English themes: Further support for Patel and Daniele (2003). Music Perception, 21(2):267-271.

Jian, H.-L. (2004). An improved pair-wise variability index for comparing the timing characteristics of speech. Proceedings of the 8th International Conference on Spoken Language Processing, pages 4-8.

London, J. and Jones, K. (2011). Rhythmic refinements to the nPVI measure: A reanalysis of Patel & Daniele (2003a). Music Perception, 29(1):115-120.

McGowan, R. W. and Levitt, A. G. (2011). A comparison of rhythm in English dialects and music. Music Perception, 28(3):307-314.

Patel, A. D. and Daniele, J. (2003). An empirical comparison of rhythm in language and music. Cognition, 87:B35-B45.

Patel, A. D. and Daniele, J. R. (2013). An empirical study of historical patterns in musical rhythm: Analysis of German & Italian classical music using the nPVI equation. Music Perception, 31(1):10-18.

Patel, A. D., Iversen, J. R., and Rosenberg, J. C. (2006). Comparing the rhythm and melody of speech and music: The case of British English and French. Journal of the Acoustical Society of America, 119(5):3034-3037.

Toussaint, G. T. (2012). The pairwise variability index as a tool in musical rhythm analysis. In Proceedings of the 12th International Conference on Music Perception and Cognition, pages 1001-1008.
The musical beat can serve as a catalyst to elicit spontaneous rhythmic movements, such
as the bobbing of our heads. One-to-one phase locking analysis is commonly used to quantify synchronization of movement to the beat (e.g., Patel et al., 2009; Phillips-Silver et al., 2011; Fujii and Schlaug, 2013), yet more complex phase locking patterns can
occur in music. The aim of this study was to quantify complex movement
synchronization in musical duo improvisation by applying n:m phase locking analysis
(Tass et al., 1998). Sixteen professional drummers participated in the study and each of
them improvised with a professional bassist. The head movements of the drummer and
the bassist were recorded by a motion capture system with a sampling frequency of 200
Hz. The audio tracks were also recorded at 48000 Hz. Sound features (root mean square,
spectral centroid, spectral entropy, spectral flux, brightness, pulse clarity, and event
density measures) were extracted from the audio tracks with the MIR toolbox (Lartillot
and Toiviainen, 2007). Beat times and the inter-beat intervals were determined with a
beat tracking algorithm (Ellis, 2007). The motion capture data was segmented with the
beat times and bandpass filtered to extract the movement frequencies corresponding to
the half, fourth, and eighth note beat frequencies. We then calculated the degree of n:m
phase locking of head motion between pairs of players. We found that there were
multiple patterns of transition in phase locking mode during the improvisation. For
example, one drummer-bassist pair showed 1:1 phase locking mode at the beginning,
then changed the pattern into 1:2 phase locking mode in the middle, and finally showed
1:1 phase locking mode again at the end. The pattern of transition in the phase locking
mode was matched with the pattern of changes in the sound features. The results suggest
that the n:m phase locking analysis is useful as it can reveal the dynamics of complex
musical synchronization.
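The abstract does not spell out the index itself. A minimal sketch of an n:m phase-locking value in the spirit of Tass et al. (1998), assuming instantaneous phases (in radians) have already been extracted from the band-passed motion data; the variable names and toy signals below are hypothetical:

```python
import cmath
import math

def plv_nm(phi1, phi2, n, m):
    """n:m phase-locking value: |mean(exp(i*(n*phi1 - m*phi2)))|, in [0, 1].

    Near 1 when the generalized phase difference n*phi1 - m*phi2 stays
    nearly constant; near 0 when the two phase series are unrelated."""
    s = sum(cmath.exp(1j * (n * p1 - m * p2)) for p1, p2 in zip(phi1, phi2))
    return abs(s) / min(len(phi1), len(phi2))

# Hypothetical example: one head bobbing at 1 Hz, the other at 2 Hz
dt, n_samp = 0.01, 5000
t = [k * dt for k in range(n_samp)]
slow = [2 * math.pi * 1.0 * tk for tk in t]
fast = [2 * math.pi * 2.0 * tk for tk in t]
print(plv_nm(slow, fast, 2, 1))   # perfect 2:1 locking -> 1.0
drift = [2 * math.pi * math.sqrt(2) * tk for tk in t]
print(plv_nm(slow, drift, 1, 1))  # incommensurate frequencies -> near 0
```

Scanning a grid of small (n, m) values over sliding windows is one way to detect the kind of 1:1 to 1:2 mode transitions the study reports.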
ISBN 1-876346-65-5 © ICMPC14 805 ICMPC14, July 5–9, 2016, San Francisco, USA
Moving to the beat of music is a widespread reaction in humans. Even though the
majority can synchronize to the beat, some individuals encounter major difficulties in
performing this task (i.e., poor synchronizers). Poor synchronization was initially
ascribed to poor beat perception (i.e., beat deafness). However, it has been shown that
poor synchronization to a beat can co-exist with unimpaired beat perception. In the
current study, we present two cases of beat-deaf individuals (L.A. and L.C.), who exhibit
an opposite dissociation: they can tap to the beat of music while their perception is
impaired. In a first experiment, participants’ explicit timing skills were assessed with the
Battery for The Assessment of Auditory Sensorimotor and Timing Abilities (BAASTA),
which includes both perceptual and sensorimotor tasks. Their performance was compared
to that of a control group. L.A. and L.C. performed poorly on rhythm perception tasks,
such as detecting durational shifts in a regular sequence. Yet, they could tap to the beat of
the same stimuli (i.e., their performance was both accurate and consistent). In a second
experiment, we tested whether the aforementioned perceptual deficit in L.A. and L.C. extends to an implicit timing task. In this task, the regular temporal pattern of isochronous auditory stimuli aids in performing a non-temporal task (i.e., responding as fast as possible to a different target pitch after a sequence of standard tones). L.A. and L.C.'s
performance in the implicit timing task was comparable to that of controls. Altogether,
the fact that synchronization to a beat can occur even in the presence of poor perception
suggests that timing perception and action may dissociate at an explicit level. This
finding is reminiscent of dissociations observed in pitch processing (i.e., in poor-pitch
singers). Spared implicit timing mechanisms are likely to afford unimpaired
synchronization to the beat in some beat-deaf participants.
example the rhythmic perfect fifth had one click train at 120 BPM and the other at 80 BPM, forming a 3:2 ratio in tempo, while the rhythmic major chord had one click train at 120 BPM, one at 100 BPM, and one at 80 BPM, forming a 6:5:4 ratio.

Cycle duration for a given polyrhythm can be computed by multiplying the largest number in the tempo ratio by the period of the fastest click train (always 500 ms), or, alternatively, by multiplying the smallest number in the tempo ratio by the period of the slowest click train. Thus, for example, the cycle duration of the rhythmic major sixth was 5 clicks * 0.5 s = 2.5 s. Cycle durations for all rhythms are shown in Table 1.
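The cycle-duration rule just described can be written down directly; a small sketch following the paper's 500 ms fastest-train period (the ratio-to-interval pairings are the standard Just Intonation ones the paper uses):

```python
def cycle_duration(tempo_ratio, fastest_period=0.5):
    """Polyrhythm cycle duration in seconds.

    With the fastest click train fixed at a 500 ms period, one full cycle
    lasts max(ratio) * fastest_period (equivalently, min(ratio) times the
    slowest train's period)."""
    return max(tempo_ratio) * fastest_period

print(cycle_duration((2, 1)))      # rhythmic octave: 1.0 s
print(cycle_duration((3, 2)))      # rhythmic perfect fifth: 1.5 s
print(cycle_duration((5, 3)))      # rhythmic major sixth: 2.5 s
print(cycle_duration((6, 5, 4)))   # rhythmic major triad: 3.0 s
```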
For dyadic polyrhythms, the faster and slower click trains
were made using square wave pulses of 4 and 2 ms duration,
respectively (thus the 3:2 polyrhythm had a 120 BPM train
comprised of 4 ms duration square waves with onsets every
500 ms, and an 80 BPM click train comprised of 2 ms duration
square waves with onsets every 750 ms). For triadic
polyrhythms (e.g., 6:5:4), the fastest click train had 7 ms
pulses, the middle-tempo click train had 5 ms pulses, and the
slowest click train had 2 ms pulses. A schematic of four of our
stimuli is shown in Figure 1.
To create the polyrhythms, click trains were added together,
aligned on the first pulse. These polyrhythms were presented
diotically over headphones (Behringer HPM1000) in a quiet
room, and participants were allowed to adjust the volume to a
comfortable level.
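The stimulus construction (click trains aligned on the first pulse, then summed) can be sketched in terms of onset times; a simplified illustration that ignores pulse widths and audio rendering:

```python
def click_onsets(bpm, cycle_s):
    """Onset times (in seconds) of a metronomic click train over one cycle."""
    period = 60.0 / bpm
    n = round(cycle_s / period)
    return [k * period for k in range(n)]

# 3:2 polyrhythm: 120 BPM and 80 BPM trains, aligned on the first pulse
fast = click_onsets(120, 1.5)  # [0.0, 0.5, 1.0]
slow = click_onsets(80, 1.5)   # [0.0, 0.75]
polyrhythm = sorted(set(fast) | set(slow))
print(polyrhythm)  # [0.0, 0.5, 0.75, 1.0]
```

Merging the onset sets reproduces the composite rhythm participants heard; rendering would place a short square-wave pulse at each onset.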
C. Task
Since we had 12 dyadic and 4 triadic polyrhythms (cf. Table
1), each participant heard a total of 16 polyrhythms, presented
in random order. Rhythms always started on the first beat of
the cycle. On each trial participants had to listen for at least 15
seconds, after which they could stop the stimulus at any time to
make a rating.
In separate blocks, participants rated the stimuli for
pleasantness, complexity, groove, and beat induction (always
in that order). Pleasantness was rated on a scale of -3 to 3
(corresponding to the scale used in McDermott et al.’s 2010
study of pitch intervals, with negative indicating unpleasant
and positive indicating pleasant). The other ratings were made
on a scale of 1 to 7, where 7 was the highest score.
Groove was defined as ‘the extent to which the pattern
makes you want to move (e.g., tap your foot, bob your head).’
Beat induction was defined as the extent to which the pattern
projected a clear sense of an underlying beat. Participants took
a brief break between each block, and took less than 30 minutes
to complete all of the ratings.
After the ratings, participants did tests of auditory working
memory (forward, backward, and sequencing versions of digit
span) using the standardized test from the WAIS IV (Wechsler
Adult Intelligence Scale, 4th Edition).
III. RESULTS
Participants found the ratings tasks easy to do. In general, they stopped the stimulus soon after the minimum listening duration of 15 seconds (mean listening time = 18.7 s, sd = 2.3 s). That is, if a pattern took longer than 15 s to complete a cycle (cf. Table 1), they did not wait for a cycle to complete before making their rating. This is not surprising since they were not informed that different stimuli had different cycle durations, and for rhythms with long cycles, they may have been unaware that any cyclical structure existed.

Figure 1. Schematics of four stimuli used in this study. In each panel, the individual metronomic stimuli are shown in the top rows, and the resulting polyrhythm, which is created by adding the streams together, is shown in the bottom row (this was the rhythm heard by participants). a) Rhythmic perfect fifth (3:2 ratio); b) major second (9:8 ratio); c) major triad (6:5:4 ratio); d) minor triad (15:12:10 ratio). In a) and c) two rhythmic cycles are shown; in b) and d) one cycle is shown.

Mean pleasantness ratings for our rhythmic stimuli are shown in Figure 2a (see Table 1 for x-axis labels). For comparison, pleasantness ratings of the corresponding pitch
intervals are superimposed in Figure 2b (data from McDermott et al., 2010).

As can be seen in Figure 2b, the pattern of pleasantness ratings for our rhythms is similar to the ratings of the pitch intervals and chords on which they were modeled (see Appendices A & B for numerical data). The correlation between the ratings in our study and those in McDermott et al.'s study is high (r = 0.73, p = 0.003). Note that this correlation does not include our pleasantness ratings for the octave or diminished triad, as McDermott et al. did not test these stimuli in their study.

... shown in Table 2. (Note that these correlations used complexity, not inverted complexity, for calculation.)

We also sought to determine if the mean pleasantness ratings of each condition correlated with objective aspects of stimulus structure. Specifically, we examined four aspects of each rhythmic pattern: the average and standard deviation of inter-onset-interval durations measured over 1 complete rhythmic cycle; average number of events per second in each cycle; and cycle duration. These correlations were computed across all 16 stimuli, and for the 12 dyadic stimuli only. The results are shown in Table 3.
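The r-values reported here are plain Pearson correlations between mean rating vectors. The study itself used Matlab; this Python version is only illustrative, and the rating values below are made up:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy check with fabricated ratings (not the study's data)
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]))  # close to 1
```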
Table 2. Correlations between the mean perceptual ratings of pleasantness (P), complexity (C), groove (G), and beat induction (B) for the rhythmic intervals and chords. The table shows r-values for each correlation. For all r-values, p < 0.001.

        P       C       G       B
P      --    -0.78    0.91    0.79
C      --      --    -0.75   -0.92
G      --      --      --     0.85

All of the patterns we used involved rhythmic cycles of extended duration, ranging from 1 s for the rhythmic octave to 32 s for the diminished triad (see Table 1). Thus the ability to detect patterns may relate to how much of each pattern could be held in working memory. Being able to hold more of a pattern in mind may help detect structure, and thus lower perceived complexity. To determine if complexity ratings were related to individual differences in working memory capacity, we examined correlations between the average complexity ratings of each participant (across all stimuli) and their measures of working memory capacity (forward, backward, and sequencing digit span). However, none of the results were significant (all p > 0.1).

Table 3. Correlations between the mean perceptual ratings of pleasantness and objective measures of stimulus structure. (The octave is excluded from the standard deviation correlations in row 2, as it has a standard deviation of 0 s, an outlier value.)

Correlation of mean pleasantness rating with a rhythm's:   r (all)    p (all)    r (dyads)   p (dyads)
Average inter-onset interval                                0.526      0.003      0.516       0.086
Standard deviation of inter-onset intervals                -0.175      0.364     -0.463       0.152
Cycle duration                                             -0.611    < 0.001     -0.406       0.190
Average number of events per second                        -0.592    < 0.001     -0.642       0.024

IV. DISCUSSION

Our main finding is that when 'rhythmic intervals and chords' are made by translating frequency ratios of common pitch intervals and chords into metronomic pulse trains with corresponding tempo ratios, the pleasantness ratings of the resulting polyrhythms are remarkably similar to ratings previously reported for the corresponding pitch combinations (Figure 2).

Our ratings of pleasantness are highly correlated with ratings of beat induction and groove, and inversely correlated with ratings of complexity. Thus any of the latter three factors (or some combination of them) could be driving our pleasantness ratings (Table 2). In addition, pleasantness ratings were correlated (or inversely correlated) with some objective measures of rhythmic patterns, such as average inter-onset interval, cycle duration, or average number of events per second (Table 3).

After observing the significant similarity between our pleasantness ratings for polyrhythms and ratings of their pitch counterparts, it is not unreasonable to wonder if a common mechanism may be responsible for driving both of these preference regimes, spanning the temporal spectrum from rhythm to pitch perception. A possible bridge between the timescales of rhythm and pitch may be provided by a framework based on nonlinear oscillatory models of neural activity. Lots and Stone (2008), for example, present a simple model of two neurons coupled in a nonlinear fashion being driven by an external pitch interval stimulus (such as an octave, comprised of two frequencies in the ratio 2:1). They find that the stability of the oscillatory states induced by each of the 12 common pitch intervals in Western music serves as an accurate predictor of its preference ranking.

By stability the authors mean the extent to which, should the intrinsic frequency of one oscillator be perturbed by a small amount, the synchronized relative oscillations of the two coupled oscillators will remain constant. For example, imagine a pair of nonlinearly coupled neural oscillators being driven by an external pitch interval such as the octave (frequency ratio 2:1) or the tritone (frequency ratio 45:32). When the driving interval is the octave, if this driving stimulus is perturbed slightly (say, to a stimulus comprised of two pitch streams in the frequency ratio 2.1:1), then the coupled neural oscillators will continue to oscillate in the frequency ratio 2:1. On the other hand, if the oscillators are driven by the tritone, changing the external driving frequency slightly (say, to 46:32) may result in a change of the two oscillators' coupled vibrations from their original frequency ratio of 45:32. Thus, we may speak of the 2:1 oscillatory state as being more stable than the 45:32 state. Intriguingly, Large et al. (2016) show that if one employs such a nonlinear model and searches for the most stable frequency ratios between 1:1 and 2:1, the 12 common Western intervals found in Just Intonation are effectively reproduced, perhaps providing a neurodynamic explanation for why certain pitch intervals appear repeatedly across many cultures and time periods.

Turning to rhythm, Large & Snyder (2009) and Large et al. (2015) posit that human entrainment to a beat and the tendency to subdivide and accent rhythms may be consequences of nonlinear neurodynamic oscillatory effects. Using a canonical model of nonlinear coupled oscillation, they predict that the most common patterns of metrical accent correspond to stable resonant states of neural oscillation, while the internal sense of a beat that listeners feel could be a direct consequence of spontaneous oscillatory effects inherent in such nonlinear models. Large & Snyder (2009) note that the timescales of rhythm perception and the oscillatory timescales of individual neurons differ significantly from each other. However, they suggest that their canonical model could be applied not just to pairs of individual neurons, but to entire brain regions
(specifically, regions associated with both auditory and motor feedback and processing), where activity levels would be on the same timescale as rhythmic perception.

Thus, even though the timescales of pitch and rhythm are quite different, it is possible that the same canonical nonlinear model could be used to explain the perceived pleasantness of related stimuli in both temporal regimes, although the specific regions or neural elements oscillating in each case could be distinct. To help further test such oscillator models, future studies should explore the grey area between pitch and rhythm perception by creating patterns at a range of different tempi. For example, the perfect fifth (ratio 3:2) could be presented using metronomic click trains at 120:80 BPM, 240:160 BPM, 480:320 BPM, etc., up to the realm of pitch at, say, 7680:5120 BPM (= 128 Hz : 85.3 Hz). One could then study whether pleasantness ratings are consistent between these timescales. If so, this may lend support to such nonlinear oscillator models of rhythm and pitch perception.

Future work should also examine pleasantness ratings in participants with pitch preferences different from those of Western listeners. For example, some Bulgarian singing traditions prize, and hold beautiful, certain intervals considered to be highly dissonant by Western listeners, such as the tritone or minor second (Rice, 1980). Asking such subjects to rate the pleasantness of rhythmic and pitch intervals and chords would be a fascinating exercise. If the ratings of pleasantness for pitch and rhythmic intervals and chords were found to be correlated within such a culture, but differed from the ratings given by Western listeners, this would raise questions about how neural oscillator models such as those of Large (2010) can be tuned by cultural experience.

ACKNOWLEDGMENTS

We would like to thank Kameron Clayton for his assistance with presentation of the stimuli and data collection in Matlab, Jennifer Zuk for her help with the training involved in digit span administration, Nori Jacoby for suggestions regarding analysis of objective features of rhythmic patterns, and Josh McDermott for providing the data used in Figure 2b and reproduced in Appendix B. We also thank Michael Hove for his assistance with the creation of the stimuli in MaxMSP.

REFERENCES

Large, E. W. (2010). Neurodynamics of music. In: M. R. Jones et al. (Eds.), Music Perception (pp. 201-231). New York: Springer.

Large, E. W., Kim, J. C., Flaig, N. K., Bharucha, J. J., & Krumhansl, C. L. (2016). A neurodynamic account of musical tonality. Music Perception, 33, 319-331.

Large, E. W. & Snyder, J. S. (2009). Pulse and meter as neural resonance. Annals of the New York Academy of Sciences, 1169, 46-57.

Large, E. W., Herrera, J. A., & Velasco, M. J. (2015). Neural networks for beat perception in musical rhythm. Frontiers in Neuroscience, 9:159. doi: 10.3389/fnsys.2015.00159

Lots, I. S., & Stone, L. (2008). Perception of musical consonance and dissonance: an outcome of neural synchronization. Journal of the Royal Society Interface, 5, 1429-1434.

McAuley, J. D., Jones, M. R., Holub, S., Johnston, H. M., & Miller, N. S. (2006). The time of our lives: life span development of timing and event tracking. Journal of Experimental Psychology: General, 135, 348-367.

McDermott, J. H., Lehr, A. J., & Oxenham, A. J. (2010). Individual differences reveal the basis of consonance. Current Biology, 20, 1035-1041.

Pressing, J. (1983). Cognitive isomorphisms between pitch and rhythm in world musics: West Africa, the Balkans, and Western tonality. Studies in World Music, 17, 38-61.

Rice, T. (1980). Aspects of Bulgarian musical thought. Yearbook of the International Folk Music Council, 12, 43-66.
APPENDIX A
Rating data (mean and standard errors) for pleasantness (P), complexity (C), groove (G), and beat induction (B) for the different
rhythmic stimuli in this study. See Table 1 for definition of symbols in column 1. Data are from 29 participants.
Symbol P P SE C C SE G G SE B B SE
O 1.034 0.274 1.034 0.034 4.690 0.344 6.655 0.188
m2 -1.069 0.302 4.828 0.290 2.138 0.271 2.690 0.290
M2 -0.931 0.248 5.207 0.182 2.759 0.288 3.241 0.251
m3 0.310 0.314 4.690 0.180 3.345 0.239 2.655 0.188
M3 1.069 0.267 3.276 0.156 5.207 0.282 5.793 0.240
P4 1.793 0.207 2.310 0.165 5.414 0.283 6.138 0.190
TT 0.207 0.282 4.793 0.240 3.655 0.259 3.862 0.242
P5 1.621 0.235 1.586 0.127 5.310 0.310 6.034 0.219
m6 1.000 0.222 3.724 0.221 3.862 0.226 3.793 0.299
M6 1.966 0.161 2.862 0.147 4.414 0.279 5.103 0.229
m7 0.724 0.289 4.172 0.238 3.310 0.268 3.966 0.260
M7 -0.345 0.239 4.138 0.226 2.759 0.177 4.414 0.241
Maj 1.586 0.176 4.138 0.247 5.759 0.231 4.966 0.278
Min -0.172 0.294 4.828 0.248 3.759 0.251 3.931 0.297
Aug 0.069 0.238 4.897 0.213 3.483 0.283 3.069 0.293
Dim -1.345 0.234 6.207 0.240 1.586 0.161 2.034 0.240
APPENDIX B
Rating data (mean and standard error) for pleasantness of pitch dyads and triads from McDermott et al. (2010). Data are from the
synthetic complex tone condition. Note that McDermott et al. did not collect pleasantness ratings for the octave or diminished triad.
Data are from 143 participants.
(NB: These tones were made from a fundamental frequency and 9 upper harmonics in sine phase, with harmonic amplitudes attenuated
at 14 dB/octave to mimic naturally occurring musical note spectra. Tone complexes were given temporal envelopes that were the
product of a half-Hanning window (10 ms) and a decaying exponential [decay constant of 2.5/sec] that was truncated at 2 s.)
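As a rough illustration of the stimulus recipe in the note above (10 sine-phase harmonics, 14 dB/octave rolloff, a 10 ms half-Hanning onset times a decaying exponential), here is a hedged sketch; the sample rate, amplitude scaling, and function name are assumptions, not details from McDermott et al.:

```python
import math

def synth_tone(f0, dur=2.0, sr=8000, n_harm=10, rolloff_db=14.0):
    """Harmonic complex tone with the envelope described in the note above.

    Harmonic h is attenuated by rolloff_db * log2(h) dB; the envelope is a
    10 ms half-Hanning onset ramp multiplied by exp(-2.5 * t)."""
    attack = int(0.010 * sr)
    out = []
    for i in range(int(dur * sr)):
        t = i / sr
        s = sum(10 ** (-rolloff_db * math.log2(h) / 20.0)
                * math.sin(2 * math.pi * h * f0 * t)
                for h in range(1, n_harm + 1))
        env = math.exp(-2.5 * t)
        if i < attack:  # raised-cosine (half-Hanning) onset
            env *= 0.5 * (1.0 - math.cos(math.pi * i / attack))
        out.append(s * env)
    return out

tone = synth_tone(220.0)  # 2 s complex tone at 220 Hz
```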
IV. RESULTS
Pre-training results indicated that our participants performed
on par with those reported in a previous study of complex
meter tap distortion by Western listeners in both
synchronization and continuation conditions. Comparing pre-
and post-training performance, we see decreased variability as
measured by standard error of tap times in both conditions.
There is a significant reduction of distortion with training in the
synchronization condition. However, no reduction is seen in
the continuation condition and there is no effect of the specific
drum rhythm trained on. The musical experience questionnaire
was used to classify each participant as a “musician” or
Musicians have long assumed that harmonic change influences how listeners hear
downbeats and meter: our theories and pedagogy have taught that chord changes tend to
align with downbeats and metric emphasis (e.g., Lerdahl & Jackendoff 1983, Temperley
& Sleator 1999). However, we entertain an alternate explanation. Since humans also
perceive strong beats at moments where more note onsets (or, pitch initiations) occur
(e.g. Boone 2000, Ellis & Jones 2009), it is possible that chord changes are incidental to
our perceptions of downbeats: in order to change a chord, more notes must re-strike than
when a harmony remains unchanged. It is therefore possible that correlations between
harmonic changes and downbeats are due only to patterns of note onsets rather than to the
changes in harmonies themselves.
Our experiments tease apart the parameters of harmonic change and note onsets. We
present participants with patterns that repeat the same chord twice before changing to a
new harmony, but with the first chord of the pair having fewer note onsets than the
second. Participants are then asked to tap the stronger pulse. Since the pattern alternates
harmonic change with greater numbers of note onsets, if participants tap on the pair’s
first chord, they are more influenced by harmonic change; if on the second, they are more
influenced by note onsets. Our results show that subjects tap on harmonic changes even
when the second chord contains 2 or 3 times more note onsets than the first chord;
however, subjects tap on the repeated chord when its note onsets are 4-times that of the
first chord. Further experiments show related effects of volume, key, and consonance. We
end by speculating that our results could arise from an exposure effect: using a corpus of
tonal music, we find that offbeats have on average 4-times fewer note onsets than do
downbeats, suggesting that this parameter affects listeners only when it conforms to the
properties of actual music.
[Figure residue removed: histogram of n trials by beats tapped (0-8), series F7 and S7.]
Figure 3. Analysis II: Task performance (% correct), pulse (P) vs. metrical perception (M). C = complex meter; S = simple meter; F = fast tempo; P = mean preferred tempo. Means: MFC = .476; MFS = .819; MPC = .422; MPS = .750; PFC = .845; PFS = .986; PPC = .833; PPS = .986.

To look more closely at the relationship among the subjects' responses, task, and tempo, the third analysis examines how the overall tapping responses align with the stimuli for the pulse and meter tasks under the two temporal conditions. In the pulse task, the frequencies of the tapping levels (i.e., the beat intervals between taps) are compared. The results show a clear shift in reference level between the two temporal conditions, from a predominantly one-beat distance (tapping on every beat) at P (slower, mean preferred tempo) to a two-beat distance (tapping on every other beat) at F (faster, doubled tempo; see Fig. 4). In the meter task, however, no such level changes occur: despite their larger variety, the overall tapping patterns and the agreement across subjects remain essentially the same in the two temporal conditions. This suggests another dissociation between the responses to the pulse and meter tasks: the tempo change leads to a shift in the reference level for pulse tapping, but not for meter tapping.

[Figure 4. Analysis III: Tapping pattern to pulse under the two tempo conditions. Histogram of n trials vs. tap interval (number of beats, 0–8), fast vs. slow.]

Also, there are no unambiguous "head nodes," as suggested in the dominant Western meter theories (Fitch, 2013). Examining the average tap frequencies in the meter task, we find that in the slow (mean preferred temporal zone) 4-beat meter, beats 1 and 3 receive similar numbers of taps; in the 7-beat meter, beats 1 and 4 receive similar numbers of taps in both temporal conditions (Fig. 5); beats 1 and 6 in the 10-beat patterns also receive similar tapping frequencies in both tempi. This analysis does not suggest any hierarchical processing in the meter task.

IV. CONCLUSION

This experiment produced no evidence for hierarchical processing of musical meter. It found, instead, that participants showed different behaviours in response to the pulse and meter tasks. The three preliminary analyses reported here were able to distinguish between responses to the pulse, periodicity, and metrical aspects of the stimulus patterns. On the one hand, the differences between responses to the pulse and meter tasks (e.g., the changing reference levels in the former) suggest different underlying mechanisms for pulse and meter perception. On the other hand, the differences in response to periodicity and metrical structure seem to be due to the complexity of the response pattern: there were no differences between simple and complex meters when tapping the periodicity, only when tapping the metrical structure. Further analyses would be necessary to test whether these differences are gradual or categorical. However, both response patterns can be explained on the basis of sequential processing of the sound features of the stimulus patterns.

These results challenge Western hierarchical meter theories, and this glimpse of cultural influence on rhythmic perception raises awareness that the rhythmic understanding and practices of many cultures may not align with today's dominant theories.

ACKNOWLEDGMENT

The authors would like to thank the School of Music at the Ohio State University for all its support. G. B. G.

REFERENCES

Arom, S. (1991). African polyphony and polyrhythm (1st ed.). Cambridge, UK: Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511518317
Clayton, M. (2000). Time in Indian music: Rhythm, metre, and form in North Indian rāg performance. New York: Oxford University Press.
Cross, I. (2003). Music as a biocultural phenomenon. Annals of the New York Academy of Sciences, 999, 106-111.
Fitch, W. T. (2013). Rhythmic cognition in humans and animals: Distinguishing meter and pulse perception. Frontiers in Systems Neuroscience, 7, 1-16.
Geiser, E., Zaehle, T., Jäncke, L., & Meyer, M. (2009). The neural correlate of speech rhythm as evidenced by metrical speech processing. Journal of Cognitive Neuroscience, 20(3), 541-552.
Kuck, H., Grossbach, M., Bangert, M., & Altenmüller, E. (2003). Brain processing of meter and rhythm in music. Annals of the New York Academy of Sciences, 999, 244-253.