PERFORMANCE
*Marcia Kazue Kodama Higuchi, #José Eduardo Fornari Novo Júnior, *João Pereira Leite
*Department of Neuroscience and Behavioral Sciences, Faculty of Medicine of Ribeirão Preto, University of São Paulo
#Interdisciplinary Nucleus for Sound Communication (NICS), University of Campinas (UNICAMP)
This study compared two musical performances from nine pianists, ranging from 20 to 36 years of age. The pianists are graduate or undergraduate students of music from the Art Institute of São Paulo State University (UNESP) and the Alcântara Machado Art Faculty (FAAM). The piece of music selected for this experiment was an adaptation of the 32 initial bars of Trauer, in F Major, from the 12 piano pieces for four hands, for big and little children, opus 85, composed by Robert Schumann. The volunteers played the primo part and were accompanied by Ms. Higuchi, the principal researcher of the present study, who played the secondo part.

2.2 Procedure

These nine volunteers went through four or five one-hour training sessions before the recording, in which all the processes of music memorization were explained and practiced until they were able to perform all the tasks as requested.

In the first session, the whole memorization process was addressed. The volunteers were asked to know the entire succession of notes of the repertoire by heart, both explicitly (i.e., being able to name all the notes) and implicitly (being able to play them automatically).

The second session was dedicated to the affective aspects. An emotional stimulus was prepared to induce their affect: sad scenes were watched by the pianists so that they could associate the piece with the emotional state of sadness.

In the third, fourth and fifth sessions, the volunteers were instructed to play the music in two different manners. In the first task, they were instructed to play while thinking of each note they were playing (when they played with both hands, they were instructed to think of the right-hand notes only). In the second task, they were instructed to play while thinking of the emotional state of sadness associated with the piece.

The recording was performed on a Steinway & Sons series D piano. The audio was acquired with three Neumann KM 184 microphones, two DPA 4006 microphones, Canare cables, and a Mackie 32/8 recording console.

2.4 Analyses

In order to analyze each pianistic performance, we used computational models designed to retrieve specific musical features from audio files, somewhat resembling the cognitive processes of human audition when identifying and focusing on specific musical features. The Music Information Retrieval literature describes two classes of acoustic descriptors: low-level and higher-level. Low-level descriptors are independent of context, such as the psychoacoustic features (e.g., loudness, pitch, timbre), while higher-level descriptors depend on musical context, such as tonality, rhythmic pulsation, or melodic line. In (Fornari, J., Eerola, T., 2009), eight computational models designed to predict contextual musical aspects were presented. They were created with the objective of predicting specific higher-level musical cognition features associated with emotional arousal by music stimuli, as this was one of the major goals of "Braintuning" (www.braintuning.fi), the project in which this development was done. The result of each model is a time series with the predictions of one specific musical feature. Initially, we tested the audio files from the pianistic performances with the eight descriptors as previously designed. They are named: Pulse Clarity (PC), Key Clarity (KC), Harmonic Complexity (HC), Articulation (AR), Repetition (RP), Mode (MD), Event Density (ED) and Brightness (BR).
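The implementations of these models are given in the references cited later in this section and are not reproduced here. As a purely illustrative sketch of the general pipeline, the fragment below computes one frame-wise feature time series from a performance recording in Python with the librosa library; the library, the file name and the use of RMS loudness are assumptions of this illustration, not the models used in the study.

# Minimal illustrative sketch (not the models used in the study): computing a
# frame-wise feature time series from a performance recording with librosa.
import numpy as np
import librosa

def descriptor_time_series(path, frame_length=2048, hop_length=512):
    """Return (times, values) for one frame-wise feature; RMS loudness
    stands in here for any low-level descriptor."""
    y, sr = librosa.load(path, sr=None, mono=True)
    values = librosa.feature.rms(y=y, frame_length=frame_length,
                                 hop_length=hop_length)[0]
    times = librosa.frames_to_time(np.arange(len(values)),
                                   sr=sr, hop_length=hop_length)
    # Normalize to the 0..1 range used by the higher-level descriptors.
    values = (values - values.min()) / (values.max() - values.min() + 1e-12)
    return times, values

# e.g. times, loudness = descriptor_time_series("pianist1_cognitive.wav")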
Pulse Clarity (PC) is a descriptor that measures the sensation of pulse in music. Pulse is here seen as related to agogics: a fluctuation of musical periodicity that is perceptible as "beatings" at a sub-tonal frequency (below 20 Hz), and is therefore perceived not as tone (frequency domain) but as pulse (time domain). This is due to the fact that a pulse whose frequency is faster than 20 Hz tends to be perceived by the human auditory system not as a rhythmic stimulus but as a tonal one. In the same way, if the pulse is too slow (say, one pulse per minute), it is very likely that listeners will not identify the pulse train, but will perceive each pulse as an independent stimulus. The pulse predicted by PC can be of any musical nature (melodic, harmonic or rhythmic), as long as it is perceived by listeners as an ordered stimulus in the time domain. The measuring scale of this descriptor is continuous, going from zero (no sensation of musical pulse) to one (clear sensation of musical pulse), independent of its frequency (there is no distinction between "slower" or "faster" pulses).
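The published PC model is described in the references cited later in this section. As a rough, hedged stand-in, pulse clarity can be approximated by the strength of the dominant periodicity of the onset-strength envelope, as in the sketch below; librosa and the 40 to 200 BPM lag range are assumptions of this illustration.

# Rough stand-in for Pulse Clarity (PC), not the published model: the sharper
# the dominant periodicity of the onset-strength envelope, the closer to 1.
import numpy as np
import librosa

def pulse_clarity(path, hop_length=512):
    y, sr = librosa.load(path, sr=None, mono=True)
    env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    env = env - env.mean()
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    if ac[0] <= 0:
        return 0.0
    ac = ac / ac[0]                        # normalize by the zero-lag energy
    fps = sr / hop_length                  # envelope frames per second
    lo, hi = int(fps * 60 / 200), int(fps * 60 / 40)   # lags for 200..40 BPM
    hi = min(hi, len(ac) - 1)
    if hi <= lo:
        return 0.0
    return float(np.clip(ac[lo:hi].max(), 0.0, 1.0))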
Key Clarity (KC) is a descriptor that measures the sensation of tonality, or of a musical tonal center. It is related to how strongly an excerpt of music (a sequence of notes) is perceived by listeners as tonal, disregarding its specific tonality and focusing only on how clear that perception is. For instance, if the sequence of notes C, D, E, F, G, A, B, C is played in ascending order, most listeners will clearly identify a tonal center: the C major scale. In contrast, another sequence, such as C, F#, F, B, A#, A, would not be easily related to any tonal center. The KC prediction ranges from zero (atonal) to one (tonal). Intermediate regions, neighboring the middle of the scale, tend to correspond to musical excerpts with sudden tonal changes or ambiguous tonalities.
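One common way to approximate such a measure, though not necessarily the one used by the published model, is to correlate the averaged chroma vector with Krumhansl-Kessler key profiles and take the best correlation over all 24 keys as a clarity proxy; the sketch below assumes librosa and those standard profile values.

# Rough stand-in for Key Clarity (KC): the best correlation between the
# averaged chroma vector and the 24 transposed Krumhansl-Kessler profiles.
import numpy as np
import librosa

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def key_clarity(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    scores = [np.corrcoef(chroma, np.roll(profile, shift))[0, 1]
              for profile in (MAJOR, MINOR) for shift in range(12)]
    return float(np.clip(max(scores), 0.0, 1.0))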
Harmonic Complexity (HC) is a descriptor that measures the sensation of complexity conveyed by musical harmony. In communication theory, complexity is related to entropy, which can be seen as the amount of disorder of a system, or how stochastic a signal is. Here, however, we are interested in measuring the "auditory perception" of entropy rather than the acoustic entropy of a musical sound. For example, in acoustical terms, white noise is a very complex sound, yet its auditory perception is that of a rather unchanging stimulus. The challenge here is capturing the cognitive identification of harmonic complexity, leaving aside (for now) melodic and rhythmic complexity. The measuring scale of this descriptor is continuous and goes from zero (imperceptible harmonic complexity) to one (clear identification of harmonic complexity).
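As an illustration only, and not the published HC model, the entropy of the per-frame chroma distribution gives a simple proxy for how widely the harmony is spread over the twelve pitch classes; librosa is an assumption here.

# Rough stand-in for Harmonic Complexity (HC): average entropy of each
# frame's chroma distribution, normalized so that the output lies in 0..1.
import numpy as np
import librosa

def harmonic_complexity(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)           # shape (12, T)
    p = chroma / (chroma.sum(axis=0, keepdims=True) + 1e-12)   # per-frame PMF
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=0)            # bits per frame
    return float(entropy.mean() / np.log2(12))                 # 0..1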
Articulation (AR), as described in music theory, usually refers to the manner in which a melodic line is played. There are two opposite ways of playing a melody: staccato and legato. Staccato means "detached" and refers to playing a melody with a small pause inserted between the notes, so that the overall melody sounds detached. In contrast, legato means "tied together" and refers to playing a melody while avoiding noticeable pauses between the notes, so that the overall melody sounds linked, or connected. This descriptor attempts to retrieve the articulation information from musical audio files by detecting frequent pauses or sudden drops of intensity in the audio, and attributing to the file an overall rating that continuously ranges from zero (legato) to one (staccato).
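A very simple stand-in for this idea, not the published AR model, is the fraction of low-energy frames, i.e. frames whose RMS falls below a small threshold, which grows as the playing becomes more detached; the librosa-based sketch and the 10% relative threshold below are assumptions.

# Rough stand-in for Articulation (AR): the fraction of low-energy frames
# (pauses or sudden intensity drops); near 0 for legato, larger for staccato.
import numpy as np
import librosa

def articulation(path, hop_length=512, rel_threshold=0.1):
    y, sr = librosa.load(path, sr=None, mono=True)
    rms = librosa.feature.rms(y=y, hop_length=hop_length)[0]
    threshold = rel_threshold * rms.max()   # frames below this count as gaps
    return float(np.mean(rms < threshold))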
Repetition (RP) is a descriptor that accounts for the presence of repeating patterns in a musical excerpt. These patterns can be melodic, harmonic or rhythmic. The prediction is computed by measuring the similarity of hopped time frames along the audio file, tracking repeating similarities that occur within a perceptible time delay (around 1 Hz to 10 Hz). Its scale ranges continuously from zero (no noticeable repetition within the musical excerpt) to one (clear presence of repeating musical patterns).
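The sketch below illustrates the frame-similarity idea with a chroma recurrence matrix, averaging the affinities whose lags fall in the 0.1 to 1 s range mentioned above; librosa, the hop size and the lag window are assumptions, and this is not the published RP model.

# Rough stand-in for Repetition (RP): density of chroma self-similarity at
# lags between 0.1 s and 1 s (the 1-10 Hz range mentioned in the text).
import numpy as np
import librosa

def repetition(path, hop_length=2048):
    y, sr = librosa.load(path, sr=None, mono=True)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop_length)
    rec = librosa.segment.recurrence_matrix(chroma, mode="affinity", sym=True)
    fps = sr / hop_length                        # chroma frames per second
    frames = np.arange(rec.shape[0])
    lags = np.abs(np.subtract.outer(frames, frames))
    mask = (lags >= int(0.1 * fps)) & (lags <= int(1.0 * fps))
    return float(np.clip(rec[mask].mean(), 0.0, 1.0))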
Mode (MD) is a descriptor that refers to the major, or Ionian, scale, one of the seven modes of the diatonic musical scale. The most identifiable modes are the major (Ionian) and minor (such as the Aeolian) scales. They are distinguished by the presence of a tonal center associated with intervals of major or minor thirds in the harmonic and melodic structure. In the case of our descriptor, MD is a computational model that retrieves from a musical audio file an overall output that continuously ranges from zero (minor mode) to one (major mode).
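Continuing the key-profile illustration above, and again without reproducing the published MD model, the difference between the best major-profile match and the best minor-profile match can be mapped onto the same 0..1 scale:

# Rough stand-in for Mode (MD): compare the best major-profile correlation
# with the best minor-profile correlation; values lean toward 1 for major
# excerpts and toward 0 for minor ones.
import numpy as np
import librosa

MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def mode(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)
    best = lambda profile: max(np.corrcoef(chroma, np.roll(profile, s))[0, 1]
                               for s in range(12))
    return float(np.clip(0.5 + 0.5 * (best(MAJOR) - best(MINOR)), 0.0, 1.0))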
Event Density (ED) is a descriptor that refers to the overall amount of identifiable yet simultaneous (melodic, harmonic or rhythmic) events in a musical excerpt. Its scale ranges continuously from zero (only one identifiable musical event) to one (the maximum amount of simultaneous events that an average listener can distinguish).
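A crude stand-in, not the published ED model, is the onset rate normalized by an assumed perceptual ceiling; both librosa and the ceiling of ten events per second are assumptions of this sketch.

# Rough stand-in for Event Density (ED): detected onsets per second,
# normalized by an assumed ceiling of ~10 distinguishable events per second.
import librosa

def event_density(path, ceiling=10.0):
    y, sr = librosa.load(path, sr=None, mono=True)
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    duration = librosa.get_duration(y=y, sr=sr)
    return min(len(onsets) / (duration * ceiling), 1.0) if duration else 0.0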
Brightness (BR) is a descriptor that retrieves the synesthetic sensation of musical brightness. It is somewhat intuitive that this aspect is related to the audio spectral centroid, since the presence of higher frequencies accounts for the sensation of a brighter sound. However, other aspects can also influence its perception, such as attack, articulation, or the unbalancing or lack of partials in the frequency spectrum. Its measurement goes continuously from zero (opaque or "muffled") to one (bright).
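Following the spectral-centroid link stated above, a minimal stand-in (not the published BR model) normalizes the average centroid by the Nyquist frequency; librosa is again an assumption.

# Rough stand-in for Brightness (BR): average spectral centroid normalized
# by the Nyquist frequency, so that the output lies in 0..1.
import numpy as np
import librosa

def brightness(path):
    y, sr = librosa.load(path, sr=None, mono=True)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]
    return float(np.clip(centroid.mean() / (sr / 2), 0.0, 1.0))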
After analyzing the time series retrieved from the pianists' recordings with these eight descriptors, we concluded that the descriptors of Articulation (AR) and Pulse Clarity (PC) were the most sensitive to the musical distinctions that set apart affective and cognitive pianistic performances. This seems intuitively expected, as AR measures the variations in the melodic line and PC measures the presence of musical pulse; two features known to be related to expressiveness in music. AR predictions range from legato (near zero) to staccato (positive values). PC predictions range from "ad libitum" (near zero) to "pulsating" (positive values). A thorough explanation of each of these descriptors, as well as of the computational models behind the retrieval of these musical aspects, can be found in (Fornari, J., Eerola, T., 2008) and (Lartillot, O., et al. 2008).

The following figures show an example of the time series for the predictions of PC and AR for the performances of the first pianist (1), in the cognitive (c) and affective (e) conditions. The title of each graph describes which measurement is shown. For instance, 1cAR refers to the audio file of Pianist 1, in the cognitive-attention recording, for the prediction (time series) given by the computational model of ARticulation.
Figure 1. Example of the predictions of the AR and PC descriptors for the cognitive (1c) and affective (1e) performances of one pianist.
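The title convention can be made concrete with a small, purely hypothetical helper that decomposes a graph title such as "1cAR" into pianist number, attention condition and descriptor name; neither the helper nor its regular expression is part of the study's materials.

# Hypothetical illustration of the "1cAR" title convention.
import re

def parse_title(title):
    pianist, cond, desc = re.fullmatch(r"(\d+)([ce])([A-Z]{2})", title).groups()
    return {"pianist": int(pianist),
            "condition": "cognitive" if cond == "c" else "affective",
            "descriptor": desc}

print(parse_title("1cAR"))
# -> {'pianist': 1, 'condition': 'cognitive', 'descriptor': 'AR'}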