
Estimating the Perception of Complexity in Musical Harmony

Jose Fornari & Tuomas Eerola


Finnish Centre of Excellence in Interdisciplinary Music Research

Department of Music

University of Jyväskylä, Finland

fornari@campus.jyu.fi, tuomas.eerola@campus.jyu.fi

ABSTRACT

The perception of complexity in musical harmony is here seen as being directly related to the psychoacoustic entropy carried by the musical harmony. As such, it depends on a variety of factors, mainly related to the structure and progression of musical chords. The literature shows few examples of the computational estimation of musical complexity directly from audio; so far, the perception of this feature is normally rated by means of expert human listening. An efficient computational model able to automatically estimate the perception of complexity in musical harmony can be useful in a broad range of applications, in fields such as psychology, music therapy and music retrieval (e.g. the large-scale search of music databases according to this feature). In this work we present an approach for the computational prediction of harmonic complexity in musical audio files and compare it with its human rating, based on a behavioral study in which thirty-three listeners rated the harmonic complexity of one hundred music excerpts. The correlation between the results of the computational model and the listeners' mean ratings is presented and discussed.

I. INTRODUCTION

As pointed out by [1], in music, "subjective complexity reflects information content". Information Theory relates complexity to the amount of entropy (also known as randomness) present in the output of an information source. Therefore, in music, we here infer that the amount of perceptual (or psychoacoustic) entropy is related to the sensation of complexity.

In the study of music complexity it is common to retrieve this feature separately for each musical component, such as melody, harmony and rhythm. Harmonic Complexity (HC), as seen here, is defined as the contextual musical feature that almost any listener can perceive, intuitively grading how complex the harmonic structure of a particular musical excerpt is. Temperley, in his studies on tension in music, although not specifically referring to the term "complexity", suggested four components that seem to be related to HC perception: 1) harmonic changing rate; 2) harmonic changing rate on weak beats; 3) rate of dissonant (ornamental) notes; 4) harmonic distance between consecutive chords [2]. Scheirer, in a study with short-duration (five-second) musical excerpts, pointed out what he considered the most prominent acoustic descriptors of music complexity. From the most to the least important, they are: 1) coherence of spectral assignment to auditory streams; 2) variance of the number of auditory streams; 3) loudness of the strongest moment; 4) most-likely tempo ("pulse clarity"); 5) variance of time between beats [3]. The first and most important descriptor is related to spectral spreading and is therefore proportional to the amount of noise (or entropy) present in the musical excerpt.

Considering the great number of components related to its perception, one can infer that measuring HC is not an easy task. As our experiments building ground-truth data suggested, it is also difficult for listeners to reach a common agreement on their perception of HC. In our experiments, the listeners considered it to be related to a variety of chord features, such as: 1) note composition (from simple triads, whether major, minor, augmented or diminished, to complex clusters); 2) chord function (e.g. fundamental, dominant, subdominant); 3) chord tensions (related to the presence of tension notes such as 7ths, 9ths, 13ths, etc.); and 4) chord progression (diatonic, modal, chromatic, atonal).

As a high-level musical feature, HC is a scalar measurement that portrays the overall harmonic complexity of a musical excerpt. This perception can only be properly conveyed by musical excerpts that are longer than the cognitive "now time" of music, considered to last approximately three seconds [5].

The objective of the study presented here is to investigate a computational model able to describe the human judgment of harmonic complexity, based on the calculation of acoustic principles related to this perception. To that end, we carried out both behavioral and computational studies using excerpts of real musical recordings, as described below.

II. BEHAVIORAL ANALYSIS

The first step was to establish a ground-truth for the HC model, given by the human rating of HC in music audio data. For that, a group of thirty-three music students was invited to listen to one hundred music excerpts and rate their HC. Each person, in isolation, listened to a randomly ordered queue of these one hundred excerpts and rated their HC from zero (no harmonic complexity) to one (very complex harmony). All excerpts were instrumental (without singing voice), lasted five seconds and were extracted from movie soundtracks.

The behavioral data was collected with a program that we developed in the Pd (Pure Data) language. Afterwards, the data was analyzed and pruned, and the mean rating over all listeners was established as the HC ground-truth. After the rating session, each listener was asked to post comments about the experiment.
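The pruning-and-averaging step that produces the ground-truth can be sketched as follows. This is an illustrative Python reconstruction, not the authors' code (data collection used Pd and the model itself was written in Matlab), and the rejection criterion shown, a threshold on each listener's mean correlation with the others, is an assumption standing in for the paper's significance test.

```python
import numpy as np

def prune_and_average(ratings, min_mean_r=0.3):
    """Build an HC ground-truth from listener ratings.

    ratings: (n_listeners, n_excerpts) array of HC ratings in [0, 1].
    min_mean_r: threshold on a listener's mean correlation with the
        other listeners (illustrative stand-in for a significance test).
    Returns the mean rating of the retained listeners and a keep-mask.
    """
    n = ratings.shape[0]
    corr = np.corrcoef(ratings)                  # inter-subject correlation matrix
    mean_r = (corr.sum(axis=1) - 1.0) / (n - 1)  # exclude self-correlation
    keep = mean_r >= min_mean_r
    ground_truth = ratings[keep].mean(axis=0)
    return ground_truth, keep

# Toy data: three consistent listeners and one who rates in reverse.
base = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
ratings = np.vstack([base, base + 0.02, base - 0.02, base[::-1]])
gt, keep = prune_and_average(ratings)
# keep -> [True, True, True, False]; gt equals base exactly,
# since the kept listeners' constant offsets cancel in the mean.
```

Dropping listeners by their row in the inter-subject correlation matrix mirrors the "dark-blue stripes" visible in Figure 1.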
They were asked to describe how difficult it was to rate HC and on which musical features their rating was mostly based. Summarizing their opinions, they paid attention to: 1) the velocity and amount of harmonic change (seven comments); 2) traditional (triadic) chords versus altered chords (seven comments); 3) the predictability and sophistication of the chord progression (five comments); and 4) the clarity of a tonal centre, the number of instruments playing together, and dissonances. Overall, most of the listeners found it difficult to rate HC, especially in excerpts with high amounts of musical activity, excerpts without a clear chord structure, and atonal and/or electro-acoustic excerpts.

These comments showed that the listeners clearly paid attention to different musical aspects in order to rate HC. This was an important piece of information for the design and improvement of our initial computational model. With this in mind, we focused on which features are the most important ones to retrieve in order to properly predict HC with our computational model. To see which concepts were understood by the listeners and well defined by us, we calculated the mean inter-subject correlation and looked at the use of the scales. There were no apparent problems in the scales, although a few deviant listeners could be identified from the correlation matrix, as Figure 1 shows.

Figure 1. Inter-subject correlation for HC listeners' rating.

Assessing HC seemed to be a rather difficult task for the listeners, or several different strategies seemed to be in use. This was already apparent in the listeners' comments, which seemed to cover a rather large range of concepts. When calculating the means for our musical feature evaluations, we eliminated those participants whose ratings did not correlate significantly with those of the rest of the participants. Nevertheless, only three ratings were discarded in this fashion (some of the dark-blue stripes shown in Figure 1).

III. COMPUTATIONAL MODEL

To create a computational model able to predict the complexity of musical harmony, we started by investigating principles related to the amount of complexity found in chord structures and their progressions. Chords arise in the musical scale region where note events happen close enough in time to be perceived as simultaneous and are therefore interpreted by human audition as chords. In audio signal processing, this corresponds to fundamental partials that coexist within a narrow time frame of approximately fifty milliseconds and are gathered in the frequency region where musical chords mostly occur. In musical terms, this region is normally located below the melody and above the bass line. However, as the bass line also influences the interpretation of harmony, it too had to be taken into consideration in our computational model.

The model starts by calculating the chromagrams of these two regions (bass and harmony). A chromagram is a form of spectrogram whose partials are folded into one musical octave, with twelve bins corresponding to the notes of the chromatic scale [7]. We separate each region using band-pass digital filters in order to attenuate the effects of partials not related to the musical harmonic structure.

These two chromagrams are calculated for each time frame of audio corresponding to the window size of the region's lowest frequency. Each chromagram is therefore presented as a matrix with twelve lines, each corresponding to one note of the chromatic scale, and as many columns as there are time frames. We then tested three principles that initially seemed to be related to HC perception, derived from the studies and evidence mentioned in the introduction of this work.

The first principle is what we here name auto-similarity. It measures how similar each chromagram column is to the others. This calculation returns an array whose size equals the number of columns in the chromagram; the variation of this array is inversely proportional to the auto-similarity. The rationale behind this principle is that auto-similarity may track the randomness of the chromagram, which in turn seems to be related to the chord progression, one of the studied aspects of HC perception.

Next, these two chromagram matrices are collapsed in time: their columns are summed and the resulting twelve-point arrays are normalized. As the music excerpts rated in this experiment all lasted five seconds, collapsing the chromagram seemed reasonable, since this duration is near the cognitive "now time" of music; for longer audio files, a more sophisticated approach should be developed. In this work, the collapsed chromagram array proved to be efficient in representing the overall harmonic information for this particular short duration. We then calculate two principles from these two arrays, named here energy and density. Energy is the sum of the squares of the array elements; the idea was to investigate whether the array energy was somehow related to the chord structure. The second principle, density, accounts for the array's sparsity: for each array, the fewer the zeros between nonzero elements, the greater its density. The intention here was to investigate how much the collapsed chromagram density is related to the chord structure. This came from the notion that, in terms of musical harmony, the closer the notes in a chord are, the more complex the harmony tends to be. Figure 2 shows a diagram of the computational model, depicting the calculation of these six principles of harmonic complexity.
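The three principles (auto-similarity, energy and density) can be sketched numerically as below. The paper describes them only verbally and the original model was written in Matlab, so the concrete formulas here, cosine similarity between chromagram columns for auto-similarity and nonzero-bin spacing for density, are assumptions chosen to match the text, not the authors' exact definitions.

```python
import numpy as np

def auto_similarity(chroma):
    """Mean cosine similarity of each column of a 12 x n_frames
    chromagram to every other column (cosine similarity is an
    assumed stand-in for the paper's verbal definition)."""
    unit = chroma / np.maximum(np.linalg.norm(chroma, axis=0), 1e-12)
    sim = unit.T @ unit                        # frame-by-frame similarity
    n = sim.shape[0]
    return (sim.sum(axis=1) - 1.0) / (n - 1)   # drop each frame's self-term

def collapse(chroma):
    """Collapse a chromagram in time: sum the columns, normalize."""
    v = chroma.sum(axis=1)
    return v / max(v.sum(), 1e-12)

def energy(v):
    """Energy principle: sum of the squared elements of the
    collapsed twelve-point array."""
    return float(np.sum(v ** 2))

def density(v, eps=1e-9):
    """Density principle: fraction of active bins within the span
    from the first to the last nonzero bin (fewer zeros between
    nonzero elements -> higher density)."""
    idx = np.flatnonzero(v > eps)
    if idx.size == 0:
        return 0.0
    return idx.size / (idx[-1] - idx[0] + 1)

# A C major triad (bins C=0, E=4, G=7) held over four frames.
chroma = np.zeros((12, 4))
chroma[[0, 4, 7], :] = 1.0
v = collapse(chroma)
# energy(v) -> 1/3 and density(v) -> 3/8; auto_similarity is 1.0 for
# every frame, since all frames are identical (a static progression).
```

A denser cluster chord (e.g. bins 0, 1, 2) would yield a density of 1.0 under this definition, matching the notion that closer notes imply a more complex harmony.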
Figure 2. The six principles for harmonic complexity prediction.

This computational model was written and simulated in Matlab, where it calculated the three principles described above for the two chromagrams representing different frequency regions, here called bass and harmony. The results of our measurements are described in the following section.

IV. MODEL PREDICTION

Using our model, we calculated the energy, density and auto-similarity of the bass and harmony chromagrams, which resulted in six predictions per music excerpt. We calculated these six principles for the same one hundred music excerpts that were rated by the listeners (as described in Section II). The correlation of each principle with the listeners' mean rating is shown in Table 1.

Table 1. Correlation of the principles with the ground-truth.

                   Harmony chromagram   Bass chromagram
  Energy                 0.2733              0.3459
  Density                0.5568              0.4635
  Auto-similarity        0.4626              0.5138

The principle that presented the highest correlation was density in the harmony chromagram, followed by auto-similarity in the bass chromagram. The principles with the least correlation were the energies of both chromagrams. These results support our initial supposition that HC is mostly related to chord structure (density) and perceptual randomness (auto-similarity). Interestingly, the latter is best verified in the bass chromagram rather than in the harmony chromagram, as we had initially suspected. Although the music excerpts were selected from a broad range of music styles, all are from soundtracks and without singing voice, which narrowed their diversity, placing them in one specific genre of music. This may have contributed to this unexpected result. Further studies with a broader range of music genres may unveil different prospects.

Finally, we calculated a multiple regression model with these six principles. It presented a correlation with the ground-truth of r = 0.61. Although this is a fairly high correlation, further studies are needed to make sure that the multiple regression model is not over-fitted as a result of its large number of components (six). Nevertheless, the individual correlations of density and auto-similarity, for both the harmony and the bass chromagrams, are by themselves high enough to be sound results. Figure 3 depicts the behavioral mean rating and the model prediction for the one hundred music excerpts.

Figure 3. Multiple Regression Prediction (dots) of Behavioral Data (bars). Correlation: r = 0.61.

This linear model, built from the multiple regression of the six principles, reached a coefficient of determination of R2 = 0.37, thus explaining about 37% of the data. Its prediction scatter diagram, shown in Figure 4, illustrates the model evaluation.

Figure 4. Computational Model Evaluation.
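The regression step can be sketched as follows: a Python stand-in for the paper's Matlab model, fitting ordinary least squares of the ground-truth ratings on the six principle values and reporting both Pearson's r and the coefficient of determination. Note that for an OLS fit with an intercept, R2 equals the square of r between predictions and targets, which is consistent with the reported values (0.61 squared is about 0.37). The data below is synthetic, for illustration only.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """OLS regression of y (listener mean ratings) on the columns of X
    (the six principle values), with an intercept term.
    Returns coefficients, predictions, Pearson r and R^2."""
    A = np.column_stack([np.ones(len(y)), X])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    r = np.corrcoef(pred, y)[0, 1]
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, pred, r, r2

# Synthetic check: 100 excerpts x 6 principles, with y made exactly
# linear in X, so the fit should recover the weights and give r = R^2 = 1.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(100, 6))
true_w = np.arange(1.0, 7.0)
y = 0.2 + X @ true_w
coef, pred, r, r2 = fit_multiple_regression(X, y)
```

With real ratings, the over-fitting concern raised above could be checked by cross-validation: fitting on a subset of excerpts and measuring r on the held-out remainder.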
V. DISCUSSION AND CONCLUSION

This study introduced a novel approach for the estimation of complexity in musical harmony. The majority of the musical material used in the experiment can be categorized as belonging to one musical genre: orchestral movie soundtracks. All excerpts were instrumental and five seconds long. We suspect that this may have influenced the experimental results in at least two ways. First, the short duration of five seconds, near the cognitive "now time", although sufficient to convey emotional content, is not enough to analyze two important aspects: prosody and forgetfulness. Regarding prosody, these excerpts may be compared to a still frame of musical emotional content, which in most cases is a dynamic phenomenon; the relation between the emotional prosody of music and its effects on the overall rating of harmonic complexity is still to be studied. Regarding forgetfulness, the analysis of how the natural forgetting curve is affected by musical novelty, by repeating patterns, or even by the repetition of similar but varied patterns, also remains to be studied, and will lead to a more complex computational model than the one introduced here. Second, regarding the musical genre, we believe that a further study should broaden the material, taking into consideration other genres and the listeners' predilections. We also did not consider how many times each listener repeated an excerpt before rating it (which may have influenced the results).

As reported in Section II, the listeners on average found it difficult to rate harmonic complexity. This may be the reason for the small correlation between listeners (r = 0.39) shown in Figure 1, which is significantly smaller than the correlation of the computational model with the listeners' mean rating (r = 0.61). This is due to the fact that several musical excerpts are complex both in their chord structure and in their chord progression. A further study separating these two forms of harmonic complexity might therefore be convenient. It would involve two sets of music excerpts: one with only static chords of different complexity, and another with chords of the same structural complexity but different degrees of progression-related complexity.

Nevertheless, the results shown here are promising, and we hope they can inspire further research leading to a better understanding of the perception of complexity in musical harmony.

ACKNOWLEDGMENT

We would like to thank the BrainTuning project (www.braintuning.fi), FP6-2004-NEST-PATH-028570, and the Music Cognition Group of the University of Jyväskylä.

REFERENCES

[1] Berlyne, D. E. (1971). Aesthetics and Psychobiology. Appleton-Century-Crofts, New York.
[2] Temperley, D. (2001). The Cognition of Basic Musical Structures. MIT Press, Cambridge, MA.
[3] Scheirer, E. D., Watson, R. B., and Vercoe, B. L. (2000). On the perceived complexity of short musical segments. In Proceedings of the 2000 International Conference on Music Perception and Cognition, Keele, UK.
[4] Amatriain, X., and Herrera, P. (2000). Transmitting Audio Content as Sound Objects. In Proceedings of the AES 22nd Conference on Virtual, Synthetic, and Entertainment Audio, Helsinki.
[5] Leman, M., et al. (2005). Communicating Expressiveness and Affect in Multimodal Interactive Systems. IEEE MultiMedia, vol. 12, pp. 43-53. ISSN 1070-986X.
[6] Likert, R. (1932). A Technique for the Measurement of Attitudes. Archives of Psychology, 140, pp. 1-55.
[7] Chai, W., and Vercoe, B. (2005). Detection of Key Change in Classical Piano Music. In Proceedings of the 6th International Conference on Music Information Retrieval, London.
