
2.2 Prechtl (2016): a case study

A recent piece of research by Prechtl (2016) combined the atmospheric function of game music, in which music aims at emotionally engaging the player (Jorgensen, 2006), with interactive procedural audio, i.e. "composition that evolves in real time according to a specific set of rules or control logics" (Collins, 2009). He developed a first-person game in which the player has to escape a maze without being caught by moving enemies. The music in the game is designed to express the game's emotional narrative. To do so, degrees of musical intensity are defined according to the distance between the player and the enemies, stored as a level of danger. The correlation between the level of danger and emotion is primarily based on Russell (1980), where emotions are displayed in a 2D valence-arousal space: valence refers to how positive or negative an emotion is, and arousal refers to how exciting or calm it is. Thus, in Prechtl's music generation system, high danger correlates with negative valence and high arousal, and low danger with positive valence and low arousal, thereby mapping the analysis of the game's narrative onto the emotion space. The system's musical output consists of synthetic chord sequences whose mode, rhythm, articulation and loudness are defined by a set of control parameters, which allows the music to match the level of danger and, thereby, convey a certain emotion.
Prechtl's approach shows an effective and viable method of supporting games' narratives with automatically generated music. It is also a step forward in the search for musical meaning in generative music models, and it can be considered a starting point and a trigger for new research ideas. For these reasons, following the methodology and results of Prechtl (2016), we have designed a system that automatically generates tonal accompanied melodies according to a specific level of tension.
The music generator presented in Prechtl (2016) produces a sequence of chords to support a dynamically changing narrative. The system is controlled by input parameters, which are set to represent low- and high-tension scenarios, conceiving tension as in the valence-arousal model: an increase in tension means a decrease in valence and an increase in arousal. The output chords are stochastically selected using a first-order Markov model: a transition matrix sets the probabilities of forty-eight chords following the previous chord in the sequence. The initial probabilities roughly reflect the usual root progressions of the Western musical tradition (Piston, 1948); they are then modified by the control parameters according to the game's narrative. The Markov model selects the chord that continues the sequence by generating a weighted-random number based on the transition probabilities of the previous chord.
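A minimal sketch of this selection step follows, using a hypothetical five-chord excerpt of such a matrix; the chord names and probabilities are our own illustration, not Prechtl's (his matrix covers forty-eight chords):

```python
import random

# Hypothetical excerpt of a transition matrix: each row maps the previous
# chord to the probabilities of candidate next chords.
TRANSITIONS = {
    "C":  {"F": 0.35, "G": 0.35, "Am": 0.20, "Dm": 0.10},
    "Dm": {"G": 0.70, "C": 0.30},
    "F":  {"C": 0.50, "G": 0.30, "Dm": 0.20},
    "G":  {"C": 0.60, "Am": 0.25, "F": 0.15},
    "Am": {"Dm": 0.40, "F": 0.35, "G": 0.25},
}

def next_chord(previous: str) -> str:
    """Select the chord that continues the sequence by a weighted-random
    draw over the transition probabilities of the previous chord."""
    row = TRANSITIONS[previous]
    return random.choices(list(row), weights=list(row.values()), k=1)[0]

# Generate an eight-chord sequence starting from C.
sequence = ["C"]
for _ in range(7):
    sequence.append(next_chord(sequence[-1]))
print(sequence)
```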
The most relevant control parameters used in Prechtl (2016) for the music generator are:

• Tempo: the speed at which the output chords are played. It is given in beats per minute (BPM) and does not strictly correlate with musical rhythm, but with the number of pulses per time span.
• Velocity: the strength at which the attacks of the output notes are played. It corresponds to the 'MIDI velocity' descriptor, which affects the loudness and timbre properties of the output notes to make them sound more human.
• Volume: the overall loudness of the output sequence.
• Timbral intensity: a synthesizer-dependent descriptor that controls the notes’ timbre
and loudness after their onset.
• More major chords: controls the weighting of the output chords towards those of the C major scale. The chords within the C major scale are given specific transition probabilities according to the usual root progressions of the Western music tradition.
• More minor chords: controls the weighting of the output chords towards those of the A minor scale, analogously to the previous parameter.
• More diminished chords: filters the chord transition probabilities by favouring diminished chords.
• More dominant chords: filters the chord transition probabilities by favouring dominant seventh chords.
• More tonal: filters the chord transition probabilities by reducing the weightings of
chords with unclear functions in the Western music tradition.
• More diatonic: filters the chord transition probabilities by reducing the weightings of
non-diatonic chords.

Those parameters and their values define the tension scenarios within the game. The Low Tension Case, intended for when the player is at a safe distance from enemies, sets the control parameters to generate a consonant musical output, with low values for tempo, volume, velocity and timbral intensity; diatonic chords, tonal transitions and the minor mode are favoured. The High Tension Case, intended for when the player is in danger, that is, close to an enemy, sets the control parameters to generate more dissonant and unpleasant music: non-diatonic chords and chords from different keys are favoured, the tempo is twice that of the Low Tension Case, and the volume, velocity and timbral intensity values are high.
The final musical output corresponds to an interpolation between the Low and High Tension Cases. The level of danger, defined by the distance to the enemies, is expressed in the range [0, 1], which allows the system to interpolate between the two sets of tension parameters and generate a musical sequence whose characteristics support the game's emotional narrative.
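A minimal sketch of this interpolation, assuming hypothetical low- and high-tension parameter values (Prechtl's exact presets are not reproduced here):

```python
from dataclasses import dataclass, fields

@dataclass
class TensionParams:
    # Illustrative values only; the presets in Prechtl (2016) differ.
    tempo_bpm: float          # beats per minute
    velocity: float           # MIDI velocity, 0-127
    volume: float             # overall loudness, 0.0-1.0
    timbral_intensity: float  # synthesizer-dependent, 0.0-1.0

LOW_TENSION = TensionParams(tempo_bpm=60.0, velocity=50.0,
                            volume=0.3, timbral_intensity=0.2)
HIGH_TENSION = TensionParams(tempo_bpm=120.0, velocity=110.0,
                             volume=0.9, timbral_intensity=0.9)

def interpolate(danger: float) -> TensionParams:
    """Linearly interpolate every control parameter between the Low and
    High Tension Cases; danger is the level of danger in [0, 1]."""
    return TensionParams(**{
        f.name: (1.0 - danger) * getattr(LOW_TENSION, f.name)
                + danger * getattr(HIGH_TENSION, f.name)
        for f in fields(TensionParams)
    })

print(interpolate(0.15).tempo_bpm)  # 69.0 BPM at a 15% tension level
```

The chord-weighting parameters (more diatonic, more diminished, etc.) could be interpolated in the same way before they modify the transition matrix.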

Prechtl (2016) carried out some empirical studies to evaluate the music generation system and its impact on the computer game he designed. A first study focused on the ability of a preliminary version of the system to convey specific emotions. In a first stage, participants were asked to label musical sequences as jovial, serene, sad or hostile, representing the four quadrants of Russell's (1980) valence-arousal model; the results show a 63% overall accuracy between the intended emotions and the participants' responses. In a second stage, participants were asked to point to a specific location in the valence-arousal model, showing a 77% overall accuracy. Finally, in a third stage, participants listened to musical sequences that included transitions between emotions, that is, between areas of the valence-arousal model, and were asked to label them according to the perceived emotions; the results again show a 63% overall accuracy. In all three stages the arousal match was far more accurate than the valence match.
A second study evaluated the impact of the generated music in the game. More specifically, dynamic music was compared to static and no-music conditions. Dynamic music refers to the music generation system described above, whereas static music refers to a chord sequence with fixed values of tempo, volume and timbre, whose chords were selected from the interpolated transition probabilities of a 15% tension level. As a result, dynamic music was perceived as the most emotionally arousing of the three conditions.

Prechtl (2016) showed a viable approach to generating music algorithmically according to emotional states, supporting a game narrative and evidencing a positive impact on the player's experience. We now present some limitations of his model and the challenges they motivate:
1. The relationship between music and emotion is a topical research area that has been regularly examined over the last decades. Analytical approaches focused on musical descriptors, such as those within Music Information Retrieval research, have achieved a matching accuracy similar to that obtained by Prechtl (2016) (between 60% and 70%). However, in Prechtl (2016) the emotional analysis was narrowed to the idea of tension: his preliminary analysis studied the ability of the system to generate music according to a specific emotion, but the final system evaluated the suitability of the music based on tension cases. We agree that musical tension is a strong feature to focus on: it was proposed as a future prospect for improving musical meaning in generative music systems (Papadopoulos & Wiggins, 1999); it has been empirically analyzed as an influence on the experience of musical climax (Patty, 2009); it has been widely studied from an empirical point of view since the twentieth century (see Milne (2013), chapter 5, for a review of 9 probe-tone models published between 1982 and 2012); and it has been treated at length in harmony textbooks (Piston, 1948; Schoenberg, 1974). Likewise, there is some evidence of its correlation with games' narratives and the enhancement of playability (Yoo & Lee, 2006; León & Gervás, 2012; Robertson et al., 1998). So, why not focus on music generation and its analysis based on the level of musical tension (without a correlation with emotions)?

2. The first preliminary model presented in Prechtl (2016) was tonal: the chords output by that model were in the keys of C major and A minor, and their transition probabilities were based on their functions within tonality. C major was associated with high valence (happy/calm) and A minor with low valence (angry/sad); the differences in arousal were set by adjusting the tempo and velocity. The transition between emotions was again achieved by interpolating the C major and A minor probabilities. This means that the chords output at maximum valence were in C major, whereas those output at minimum valence were in A minor. During the transition, the interpolated probabilities meant that tonal functions from both C major and A minor were used to decide which chord to output next (the first sketch after this list illustrates this blending). This could be interpreted as a continuous modulation stage, where emotions closer to one of the maximum/minimum valence areas have a greater probability of outputting a chord corresponding to that area. The final model in Prechtl (2016) uses transition probabilities from the tonal context in the Low Tension Case, while the High Tension Case mixes chords from different tonalities all together. To what extent do the perceived emotions of the sequences generated with this model depend merely on the consonant/dissonant distinction? The use of chords from different tonalities in the High Tension Case could be considered stochastic. We think that the same idea of using the functions of a specific context (e.g. some atonal avant-garde styles) or rules of progression within polytonality might have driven the research project in a different direction, where high tension was not based solely on dissonance and unexpectedness. Most Western-tradition music of the seventeenth to nineteenth centuries was composed within tonality, and indeed different degrees of tension were achieved. After the apogee of the avant-garde and atonal music in the twentieth century, many film soundtracks, game music and much current music are tonal again. Having said that, in order to compose music that automatically supports a narrative, is it possible to generate completely rule-based tonal musical sequences according to specific degrees of tension?

3. Prechtl (2016) uses the tempo descriptor as a strong predictor of the arousal dimension. It is also a key difference between the two music conditions of his final experiment, dynamic and static. However, the perception of tempo might be influenced by rhythm. According to Prechtl's model, a situation where the player stays at the same distance from the enemies would output chords at a constant rate. The interpolation system described above makes the differences in tempo more varied; however, a mathematical interpolation of the probabilities associated with the tempo descriptor prevents the calculation of a rhythmic structure. So, as a first stage, why not design a music generator with a specific rhythmic layer that can be controlled according to the tension level (see the second sketch after this list)? Unstable tempo situations could be added afterwards to amplify the effect of an increase in tension.

4. The musical output in Prechtl (2016) is a chord sequence. Would it be possible to generate a melodic output as well, matching the tension profile? What would be the impact of the level of tension on the melody?
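To make the probability blending discussed in point 2 concrete, the following is a minimal sketch that mixes two hypothetical transition rows for a single chord, one reflecting its function in C major and one its function in A minor, according to a valence level in [0, 1]. The chord names and weights are our own illustration; Prechtl's actual matrices are not reproduced here.

```python
def blend_rows(major_row: dict, minor_row: dict, valence: float) -> dict:
    """Mix the C major and A minor transition probabilities of one chord:
    valence = 1.0 gives pure C major weights, valence = 0.0 pure A minor."""
    chords = set(major_row) | set(minor_row)
    mixed = {c: valence * major_row.get(c, 0.0)
                + (1.0 - valence) * minor_row.get(c, 0.0)
             for c in chords}
    total = sum(mixed.values())
    return {c: p / total for c, p in mixed.items()}  # renormalise

# Hypothetical transitions out of an A minor chord under both tonal contexts.
major_row = {"F": 0.4, "Dm": 0.3, "G": 0.3}   # Am heard as vi of C major
minor_row = {"Dm": 0.4, "E": 0.4, "F": 0.2}   # Am heard as i of A minor
print(blend_rows(major_row, minor_row, 0.75))  # closer to the C major side
```

Since both rows already sum to one, the renormalisation is only a safety net; the point is that chords idiomatic to either key remain reachable during the transition, which is what produces the continuous-modulation effect.

As for the rhythmic layer proposed in point 3, the following sketch is entirely our own assumption, not part of Prechtl's system: a tension level in [0, 1] selects, with some weighted randomness, among one-bar patterns of increasing rhythmic density.

```python
import random

# Candidate one-bar patterns in 4/4, written as note durations in beats.
# Higher tension should favour denser, more agitated subdivisions.
PATTERNS = [
    [2.0, 2.0],                      # calm: half notes
    [1.0, 1.0, 1.0, 1.0],            # moderate: quarter notes
    [0.5, 1.0, 0.5, 1.0, 1.0],       # tense: syncopated mix
    [0.5] * 8,                       # high tension: running eighths
]

def rhythm_for_tension(tension: float) -> list:
    """Map a tension level in [0, 1] to a one-bar rhythmic pattern,
    weighting each pattern by its closeness to the target density."""
    target = tension * (len(PATTERNS) - 1)
    weights = [1.0 / (1.0 + abs(i - target)) for i in range(len(PATTERNS))]
    return random.choices(PATTERNS, weights=weights, k=1)[0]

print(rhythm_for_tension(0.9))  # most likely one of the denser patterns
```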

2.3 Musical tension

Both in the theoretical and in the psychological literature on music, tension is one of the most fundamental concepts (Granot & Eitan, 2011). However, there is as yet no unique definition and interpretation of tension in music. The two major approaches to the study of musical tension are described in Bigand et al. (1996), Granot & Eitan (2011) and Farbood (2012) as (1) hierarchical expectation models and (2) psychoacoustic studies. The former concerns harmonic, melodic and rhythmic aspects of tonal music, analysed from the perspective of tonality as a hierarchy with specific organizations and functions (in the Western tradition). In this approach, tension is directly correlated with expectation, meaning that a rise or decrease in expectation entails an increase or resolution of musical tension, respectively. The hierarchical approach includes analytical models such as the Generative Theory of Tonal Music (GTTM) (Lerdahl & Jackendoff, 1985) and its derivative, Tonal Pitch Space theory (Lerdahl, 1988); the harmonic model of tonal tension and attraction (Lerdahl & Krumhansl, 2007); as well as models concerning melodic expectation, such as Narmour (1992), Huron (2006) and Margulis (2005). The latter approach is based on models where psychoacoustic parameters are the source of musical tension. The main parameters include dynamics (Krumhansl, 1996; Farbood, 2012), perceived dissonance (Pressnitzer et al., 2000), pitch height (Krumhansl, 1996; Granot & Eitan, 2011), loudness (Granot & Eitan, 2011; Huron, 2006; Ilie & Thompson, 2006), rhythmic patterns (Dawe et al., 1993; Swain, 1998; Fernández-Sotos et al., 2016) and tempo (Ilie & Thompson, 2006; Farbood, 2012; Granot & Eitan, 2011; Fernández-Sotos et al., 2016), among others. It is interesting, however, to note that the results and conclusions of those studies often disagree with each other (Granot & Eitan, 2011).
