Reproducing Musicality: Detecting Musical Objects and Emulating Musicality Through Partial Evolution


Aran V. Samson
Ateneo de Manila University, DISCS
Quezon City, Philippines
avsamson@ateneo.edu

Andrei D. Coronel
Ateneo de Manila University, DISCS
Quezon City, Philippines
acoronel@ateneo.edu

Abstract—Musicology is a growing focus in computer science. Past research has had success in automatically generating music through learning-based agents that make use of neural networks [1] and through model- and rule-based approaches [2]. These methods require a significant amount of information, either in the form of a large dataset for learning or a comprehensive set of rules based on musical concepts. This paper explores a model in which a minimal amount of musical information is needed to compose a desired style of music. It makes use of objectness, a concept drawn directly from imagery and pattern recognition, to extract specific musical objects from a single musical piece. These objects are then used as the foundation for a newly generated musical piece that is similar in style to the original. The overall piece is generated through partial evolution. This method eliminates the need for a large amount of pre-provided data and composes music directly from a singular source piece.

Index Terms—computer music, evolutionary algorithm, objectness
I. INTRODUCTION

Algorithmic music generation has made great strides in recent years in terms of musicality and overall pleasantness from the perspective of a human listener. So much progress has been made, in fact, that there have already been discussions of whether music can serve as a valid measure for satisfying the Turing Test [4]. Studies and applications have been designed in an attempt to model musical creativity for use in computer systems [5], [6].

Musical models based on learning musical styles, or on music-theory rules, require a significant amount of data. A dataset of multiple musical pieces needs to be collected and used as input for learning-based composition systems [1], while a significant number of rules must be defined in rule-based systems [2]. Other systems convert textual data into specific emotion profiles and then further convert these into musical elements [7]. All of these systems and studies rely on a vast amount of data to produce a single piece of music.

This study designs a model in which large amounts of data are not required for composition; instead, only a singular musical piece is needed as the basis for generating a similarly styled new composition. While it is common for a composer or musician to study a specific style or a specific composer's body of work and produce a musical piece based on exposure to that large amount of material, it is also possible for musicians and composers to listen to a single piece of music, take elements from it, and create a new piece that is similar to the original with specific, noticeable takeaways from it. An example of this can be seen in pop artist Madonna's song Frozen, which directly takes elements from composer Salvatore Acquaviva's Ma vie fout le camp. While this example resulted in a plagiarism case against Madonna [8], it illustrates a human's capability both to identify the defining musical objects in a musical piece and to use them to compose an original composition.

This study proposes a model in which this seemingly human-only skill is emulated in algorithmic composition. It investigates how the proposed model can generate similarly styled musical pieces based on a specific, singular source piece, focusing primarily on popular and modern music compositions.

This study borrows technical concepts from imagery and pattern recognition, specifically the concept of objectness, wherein objects can be detected in a provided image even without prior training of an image recognition system [9], [10]. As this concept does not require training, and therefore does not require large datasets, its application becomes an essential component of this study's objective of music composition based on musical object detection.

This study takes musical objects and uses them as the foundation for algorithmic music composition based on evolutionary algorithms. Extracted musical objects are treated as permanent in the evolutionary algorithm used; the algorithm is therefore considered a selective or partial evolution, wherein certain parts of a musical composition are already at their best state while the rest of the composition is evolved to match the original composition's style. To assess similarity of style, musical features are extracted from both the original piece and the evolving piece, and the distance between feature values is compared. This method has been used in prior research in musical composition [11]. Features are extracted using concepts from the feature extraction application jAudio, produced by Cory McKay [12].

This study aims to build an algorithmic music-generation model from which further improvements can stem in the realm of automatic music composition. The model is designed to be a first step toward simple music generation based on singular source compositions.



II. DISCUSSION OF RELATED LITERATURE

This research draws primarily on concepts from imagery and pattern recognition. Pattern recognition is often synonymous with machine learning in that, typically, a large dataset of learning material is first fed into a system so that it learns what an object is; the system is then used to evaluate new data to determine whether the learned object exists in an image, or to count instances of that specific object in a presented image [18]. These objects are also often referred to as classes.

This study takes up a different concept from imagery and pattern recognition: objectness. While earlier approaches often specialize in detecting specific objects or classes such as cars or birds, the concept of objectness identifies generic objects in a given image. Objects are standalone things with a well-defined boundary and center, such as cars and cows, as opposed to amorphous background stuff such as sky and grass [10]. Objectness and object detection rely on a visual concept of saliency [9]. Object saliency is described as an object having a well-defined closed boundary in space and a different appearance from its surroundings [10], [19], [20].

While there have been studies involving pattern recognition in music, such as musical style identification using statistical descriptors [16] and music summarization and transcription [17], there has been little prior research applying the specific concept of objectness to music. This may be because objectness is a more recently conceived concept, and because of the difficulty of translating image object saliency to music. In this study, an attempt was made to create a saliency metric for musical objects.

Evolution-based algorithmic composition is based on optimizing outputs over a set of possible solutions, or evolutions, from an initial species. The solution from that set of outputs determined to be closest to a pre-determined end goal is considered the next candidate for further evolution, and this continues until the end goal is achieved [14], [15]. The distance from this pre-determined end goal in evolutionary and genetic algorithms is called fitness, a metric that measures how close the evolution has come to the most optimal state. In the case of musical composition, fitness can be calculated from musical features and how close their values are to pre-determined optimal values [11], [15]. The distance is typically calculated with mathematical distance measures such as the Euclidean distance [11], [17], but the specific features to be taken into consideration must still be determined; this can be based on musical literature as well as statistical analysis [11], [13]. This study uses musical literature to determine which features are used in the fitness calculation.
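As an illustration of this kind of fitness measure, the following is a minimal sketch (not code from the cited works) that scores a candidate composition by the Euclidean distance between its feature values and target values taken from a source piece. The feature names mirror those reported later in Table I, while the numbers and the dictionary representation are assumptions for illustration only.

    import math

    def fitness(candidate_features, target_features):
        # Euclidean distance between two feature profiles; lower means the
        # candidate is closer to the target, and a value near zero is treated
        # as a sufficiently similar composition.
        return math.sqrt(sum((candidate_features[k] - target_features[k]) ** 2
                             for k in target_features))

    # Hypothetical feature profiles (values are illustrative only).
    target = {"mean_pitch": 77.0, "note_density": 2.41, "mean_note_duration": 0.3389}
    candidate = {"mean_pitch": 75.5, "note_density": 2.10, "mean_note_duration": 0.3100}
    print(fitness(candidate, target))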
The algorithmic composition method used in [11] is applied directly in this study.

This study also makes use of a distance metric devised with extractable musical features as its basis [23]. This distance metric is used to compare the rhythmic and pitch values of musical segments and adjacent notes in order to determine their saliency.
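As a rough sketch of this kind of feature-based distance (the exact feature set of [23] is not reproduced here), two short phrases can each be reduced to a small profile and compared with a taxicab (L1) distance. The profile features and the (pitch, onset, duration) note representation are assumptions for illustration.

    def phrase_profile(notes):
        # notes: list of (midi_pitch, onset_seconds, duration_seconds) tuples.
        pitches = [p for p, _, _ in notes]
        durations = [d for _, _, d in notes]
        span = (notes[-1][1] + notes[-1][2]) - notes[0][1]
        return {
            "mean_pitch": sum(pitches) / len(pitches),
            "mean_duration": sum(durations) / len(durations),
            "note_density": len(notes) / span if span > 0 else 0.0,
        }

    def taxicab_distance(a, b):
        # L1 (taxicab) distance between two phrase profiles.
        return sum(abs(a[k] - b[k]) for k in a)

    # Two hypothetical four-note phrases for comparison.
    a = [(72, 0.0, 0.5), (74, 0.5, 0.5), (76, 1.0, 0.5), (77, 1.5, 0.5)]
    b = [(60, 0.0, 1.0), (62, 1.0, 1.0), (64, 2.0, 1.0), (65, 3.0, 1.0)]
    print(taxicab_distance(phrase_profile(a), phrase_profile(b)))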
This study focuses on popular and modern music because the specific objects it attempts to find are so-called musical hooks. Musical hooks involve the repetition of a series of notes and can be loosely described as a musical catch phrase [21], [22]. In this study we focus on deriving hooks from popular music.

III. METHODOLOGY

The goal of this study is to develop a model for algorithmically generating a new piece of music directly based on a specific source composition. To achieve this overall goal, the following major steps are taken:

1) Define objectness and saliency in the context of computational music.
2) Using this objectness definition, extract musical hook objects from a specific song.
3) Through an evolutionary algorithm, generate and evolve a musical piece that is similar to the original source material based on selected musical features.

A. Musical Objectness and Saliency

Since imagery defines objectness and saliency in terms of objects that have a distinct boundary and differ from the background, we can draw comparisons between these concepts and music. The background in music can be defined as the backing track and the rhythm section of a selected piece. In [10], backgrounds score lowest in saliency and are often disregarded; this study takes the same approach, in which the backing and rhythm sections of musical pieces can be disregarded. Imagery requires boundaries, which can be defined either by color or by texture [9], [10]. In this study, based on definitions of musical hooks [21], we define musical boundaries as either rhythm based or pitch based. Pitch-based boundaries take into consideration the overall pitches used in a musical piece. An example of a pitch-based high-saliency object is the chorus of Lovin' You by Minnie Riperton, whose passages are pitched noticeably higher than the verses. A rhythmic boundary can be defined as a noticeable change in note density per unit time, that is, a noticeable increase or decrease in the number of notes played per second followed by a return to a normalized state. An example of noticeably less dense playing is the chorus of Sweet Child o' Mine by Guns N' Roses, where the vocal track carries noticeably fewer notes per unit time than in the verses before normalizing again in the following verse. In the same song, the introductory guitar part functions similarly: the song opens in silence (no notes played), moves into the intro itself, and the rhythm then normalizes back for the verse.

To summarize, this study defines musical saliency as how noticeably distinct a musical object is from the backing and rhythm tracks of a selected piece, and how noticeable the difference in pitch and rhythm is for that musical object relative to the normalized state and the averages of the piece.
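As a rough illustration of this definition (not the study's implementation), the sketch below flags a segment as salient when its mean pitch or note density deviates from the whole piece's baseline by more than a chosen fraction. The thresholds and the (pitch, onset, duration) note representation are assumptions.

    def segment_features(notes):
        # Mean pitch and note density for a non-empty list of
        # (pitch, onset_seconds, duration_seconds) notes.
        pitches = [p for p, _, _ in notes]
        span = (notes[-1][1] + notes[-1][2]) - notes[0][1]
        density = len(notes) / span if span > 0 else 0.0
        return sum(pitches) / len(pitches), density

    def is_salient(segment, whole_piece, pitch_tol=0.05, density_tol=0.25):
        # A segment stands out when its pitch or rhythm profile deviates from
        # the baseline of the whole piece by more than the given fractions
        # (illustrative thresholds, not values reported by the study).
        seg_pitch, seg_density = segment_features(segment)
        base_pitch, base_density = segment_features(whole_piece)
        pitch_dev = abs(seg_pitch - base_pitch) / base_pitch
        density_dev = abs(seg_density - base_density) / base_density
        return pitch_dev > pitch_tol or density_dev > density_tol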
B. Musical Object Extraction

1) Algorithm and Description: Algorithm 1 describes the procedure for musical object extraction from MIDI files. Because MIDI files carry explicit notes and note descriptors, it is not difficult to extract note data, and thus musical objects, from compositions.

Algorithm 1: Extracting Musical Objects

function MusicObjectExtraction(i)
    Input : source MIDI file i
    Output: musical note segment
    g  ← global pitch and rhythm averages of i
    t  ← first four notes of i
    tx ← rhythm and pitch profile of t
    repeat
        p  ← next four notes of i
        px ← rhythm and pitch profile of p
        determine whether px is closer to g or to tx
        if px is closer to tx then
            if px is within the similarity threshold of tx then
                append p to t
            end
        else
            t ← p
        end
    until end of file
Algorithm 1 extracts a single musical object from an input MIDI file i. Variable g is an object containing two values: the numeric average of the quantified pitch profile of i and the numeric average of its quantified rhythm profile. Variable t is the first section of the input MIDI file to be tested for musical saliency, with tx an object containing the quantified pitch and rhythm profile of that section. Variable p is the next set of notes to be tested for saliency, and px is its rhythm and pitch profile. Variable px is compared against both tx and g to determine whether it is salient relative to t. If it is, the distance between px and tx is compared: if the distance falls within an arbitrarily set threshold, p is appended to t, making t longer; if it falls outside the threshold, p becomes the new t. This distance calculation is based on [23]. The output of the algorithm is the section of the input MIDI file with the most perceived saliency.
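A minimal Python sketch of how Algorithm 1 might be realized over a flat note list is given below. The four-note window follows the algorithm above, while the profile function, the similarity threshold, and the (pitch, onset, duration) note representation are simplifying assumptions rather than the authors' implementation.

    def profile(notes):
        # Quantified pitch/rhythm profile: (mean pitch, mean inter-onset gap).
        pitches = [p for p, _, _ in notes]
        onsets = [o for _, o, _ in notes]
        gaps = [b - a for a, b in zip(onsets, onsets[1:])] or [0.0]
        return (sum(pitches) / len(pitches), sum(gaps) / len(gaps))

    def profile_distance(a, b):
        # Feature-based taxicab distance, in the spirit of [23].
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def extract_object(notes, window=4, threshold=2.0):
        # notes: non-empty list of (pitch, onset_seconds, duration_seconds) tuples.
        g = profile(notes)              # global pitch and rhythm averages
        t = list(notes[:window])        # candidate salient segment
        tx = profile(t)
        for i in range(window, len(notes), window):
            p = list(notes[i:i + window])   # next group of (up to) four notes
            px = profile(p)
            if profile_distance(px, tx) < profile_distance(px, g):
                # p resembles the current segment more than the background...
                if profile_distance(px, tx) <= threshold:
                    t.extend(p)         # ...so extend the segment if close enough
                    tx = profile(t)
            else:
                t, tx = list(p), profile(p)   # otherwise restart from p
        return t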
This algorithm is applied multiple times to a single MIDI file in order to determine multiple salient sections. To do this, after the algorithm extracts a salient section, the same MIDI file with that section excluded is used as the input for the next run. This can be repeated to determine the most salient sections of a MIDI input.

C. Automatic Music Generation

After an arbitrarily determined number of salient sections have been extracted, this set of salient sections is used as the foundation for a new composition. Algorithm 2 shows how the composition algorithm generates music; it is a direct adaptation of [11].

Algorithm 2: Evolutionary Algorithm Using Feature Values as Fitness Criteria

function CompositionalEvolve(s, v)
    Input : set of salient sections s, original MIDI file v
    Output: emulated music file
    i ← base empty MIDI file with randomly injected salient sections
    repeat
        fi ← calculated feature values of i
        fv ← calculated feature values of v
        di ← distance of fi from fv
        m  ← partially mutated i
        fm ← calculated feature values of m
        dm ← distance of fm from fv
        if dm < di then
            i ← m
        end
    until dm ≈ 0

The original algorithm used as the basis is elaborated in [11]. It is described as taking a blank base MIDI file, evolving one note per iteration, and eventually reaching a target feature-value profile. Mutations in this study are the same as in the original algorithm, except that the salient sections extracted from the original composition are injected and considered uneditable: these sections can be moved in time, but their structure, i.e. their pitch and rhythm, cannot be changed. The cycle of mutation continues indefinitely until the composition reaches a fitness threshold, defined as its feature values being close to those of the original source file [11].
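The following minimal sketch illustrates this partial-evolution loop: the injected salient notes are never mutated, the remaining notes are mutated at random, and a mutation is kept only when it moves the feature profile closer to the source piece. The toy feature set, mutation operator, and stopping tolerance are assumptions for illustration, not the configuration of [11].

    import random

    def features(notes):
        # Toy feature profile: (mean pitch, mean note duration).
        return (sum(p for p, _, _ in notes) / len(notes),
                sum(d for _, _, d in notes) / len(notes))

    def feature_distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    def compositional_evolve(salient, source, length=64, tol=0.05, max_iter=20000):
        # salient/source: lists of (pitch, onset_seconds, duration_seconds) notes.
        target = features(source)
        filler = [(random.randint(55, 90), n * 0.25, 0.25) for n in range(length)]
        piece = list(salient) + filler        # indices below len(salient) are protected
        protected = len(salient)
        for _ in range(max_iter):
            current = feature_distance(features(piece), target)
            if current <= tol:
                break                         # close enough to the source profile
            idx = random.randrange(protected, len(piece))   # never touch salient notes
            pitch, onset, dur = piece[idx]
            mutant = list(piece)
            mutant[idx] = (pitch + random.choice([-2, -1, 1, 2]),
                           onset,
                           max(0.05, dur + random.choice([-0.05, 0.05])))
            if feature_distance(features(mutant), target) < current:
                piece = mutant                # keep only improving mutations
        return piece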

IV. RESULTS AND ANALYSIS

The methodology was executed by applying the algorithms for salient section extraction and automatic music generation to a single song; since this study focuses on popular music, the primary test case is Justin Bieber's Baby. A visual representation of this song's melody track is shown in Figure 1.

A. Extracting Salient Sections

Fig. 1. Visual representation of saliency: high saliency in green, noticeable saliency in blue.

Figure 1 shows three types of sections: high saliency in green, noticeable saliency in blue, and no saliency uncolored. These groups were set based on the salient sections extracted when this MIDI file was used as input for Algorithm 1, and by reviewing each section's feature values for pitch and rhythm. The pitch and rhythm feature values for each section, extracted through jAudio, are shown in Table I.

TABLE I
SALIENCY SECTIONS

Section     Mean Pitch   Note Density   Mean Note Duration   Saliency
Baseline    77           2.41           0.3389               -
Intro       75           1.688          0.2593               Noticeable
Verse 1a    78           2.235          0.3097               -
Verse 1b    76           2.278          0.3152               -
Chorus a    78           2.3            0.4264               High
Chorus b    77           4.5            0.3333               High
Chorus c    78           2.3            0.4264               High
Chorus d    78           3.333          0.3462               Noticeable
Verse 2a    77           3.5            0.2747               -
Verse 2b    76           2.3            0.291                -
Verse 2c    76           2.389          0.3059               -

The sections highlighted in green and blue are considered salient because they show a noticeable difference in pitch and rhythm from the baseline feature values of the original song. The chorus is divided into four parts, a through d; the first three have high saliency while the last is only noticeable. The intro section has noticeable saliency because its rhythm is noticeably slower and its average register lower than the rest of the song. Parts of the chorus have high saliency because of longer note durations, higher note density, or a combination of both.
B. Algorithmically Composing Music

1) Initial Compositions: Five salient sections were extracted from the source MIDI file: the intro section and four sections of the chorus. Using some of these as the foundation, a short algorithmically generated musical piece was composed using Algorithm 2, with the rhythm and pitch profile of the original MIDI file as the primary fitness measure. An example of a composed musical piece is visually represented in Figure 2.

Fig. 2. Initial composed musical piece.
Figure 2 shows the use of a high-saliency section in green and a noticeably salient section in blue, with algorithmically generated sections added in yellow. While this algorithm was able to produce the desired result of composing a musical piece that emulates the feel of the original, it did so to such a high degree that the composed piece may sound like part of, or an excerpt from, the original. A revision of the original algorithm and its limitations was therefore made.

2) Revised Algorithm and Compositions: The original procedure described in Algorithm 2 was adjusted so that even the restricted sections can be modified: under the revised algorithm, parts of the salient sections can now be overwritten. An example output of this new algorithm is visually represented in Figure 3.
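In terms of the earlier partial-evolution sketch, the revision amounts to one change in how the mutation index is chosen: with some small probability, a note inside a salient section may also be selected for overwriting. The probability value below is an assumption for illustration, not one reported in the study.

    import random

    def pick_mutation_index(piece_length, protected, overwrite_prob=0.1):
        # Under the original algorithm only indices >= protected (generated
        # notes) may be mutated; the revised algorithm occasionally allows a
        # note inside a salient section to be overwritten as well.
        allow_overwrite = protected > 0 and random.random() < overwrite_prob
        if allow_overwrite or protected >= piece_length:
            return random.randrange(0, protected)
        return random.randrange(protected, piece_length)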

Fig. 3. Musical piece composed by the revised algorithm.

Figure 3 shows the use of the same salient sections from the original MIDI file, shown in blue, and generated sections, shown in yellow. A key difference is that the blue sections are no longer identical to those of the original: parts of the salient sections are intact, but some have been overwritten by the algorithm. On aural inspection, this gives a sense of similarity to the original work without appearing to be taken directly from it, unlike the output of the unmodified algorithm.

V. CONCLUSION

Overall, the results of the modified algorithm show that it is indeed possible to emulate the musicality of a specific piece of music through partial evolution. It was discovered that, while initially a valid hypothesis, retaining salient sections unchanged is not an effective way to emulate a specific song's musicality; rather, it leads only to the composition of a possible new section for the original piece.

This study shows, through algorithmic composition, that computers may be able to create music inspired by single, specific musical creations through the use of objectness and saliency in conjunction with evolutionary composition methods.

The results show that musicality can be emulated in a manner similar to how a human listener can emulate specific styles after hearing one piece of music: the most obvious parts of a composition are taken and then left open to interpretation. The musical objects that are retrieved may be used differently and still lead to compositions in the same style as the original.

Recommended succeeding studies may include a section-per-section analysis of musical features. It may also be of significance to break salient sections of musical pieces down to a more granular level; this may result in smaller musical objects and may lead to more creative compositions inspired by an original composition.

These results may prove useful, as the method can be considered for future music analysis studies and can be taken as a step toward further improving the state of computer musicology. The method of this study may serve as a basis not only for algorithmic musical composition but also for other art forms, as well as for building artificial intelligence models for creative learning.

REFERENCES
[1] Horner, A., & Goldberg, D. E. (1991). Genetic algorithms and computer-
assisted music composition. Urbana, 51(61801), 437-441.
[2] Boenn, G., Brain, M., De Vos, M., & Ffitch, J. (2011). Automatic music
composition using answer set programming. Theory and practice of logic
programming, 11(2-3), 397-427.
[3] Cope, David, et al. Virtual Music: Computer Synthesis of Musical Style.
MIT Press, 2004.
[4] Belgum, E., Roads, C., Chadabe, J., Tobenfeld, T. E., & Spiegel, L. (1988). A Turing test for "musical intelligence"? Computer Music Journal, 12(4), 7-9.
[5] Wiggins, G. A. (2007). Computer models of musical creativity: A review
of computer models of musical creativity by David Cope. Literary and
Linguistic Computing, 23(1), 109-116.
[6] Cope, D. (2005). Computer Models of Musical Creativity. Cambridge, MA: MIT Press.
[7] Davis, H., & Mohammad, S. M. (2014). Generating music from literature.
arXiv preprint arXiv:1403.2124.
[8] ”Entertainment — Madonna in Plagiarism Case Defeat.” BBC News,
BBC, 18 Nov. 2005, news.bbc.co.uk/2/hi/entertainment/4449580.stm.
[9] Shah, S. A. A., Bennamoun, M., Boussaid, F., & El-Sallam, A. A.
(2013, February). Automatic object detection using objectness measure.
In Communications, Signal Processing, and their Applications (ICCSPA),
2013 1st International Conference on (pp. 1-6). IEEE.
[10] Alexe, B., Deselaers, T., & Ferrari, V. (2012). Measuring the objectness of image windows. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Samson, A. V., & Coronel, A. D. (2016, November). Evolutionary
algorithm-based composition of hybrid-genre melodies using selected
feature sets. In Computational Intelligence and Applications (IWCIA),
2016 IEEE 9th International Workshop on (pp. 51-56). IEEE.
[12] McKay, C., Fujinaga, I., & Depalle, P. (2005, September). jAudio: A
feature extraction library. In Proceedings of the International Conference
on Music Information Retrieval (pp. 600-3).
[13] A. Coronel (2013) Building an Initial Fitness Function Based On An
Identified Melodic Feature Set for Classical and Non- Classical Melody
Classification, International Conference on Information Science and Ap-
plications
[14] D. Matic (2010). A Genetic Algorithm for Composing Music, Yugoslav
Journal of Operations Research
[15] M. Towsey et al (2001). Towards Melodic Extension Using Genetic
Algorithms, Educational Technology and Society 4
[16] Ponce de León, P. J., & Iñesta, J. M. (2007). Pattern recognition approach for music style identification using shallow statistical descriptors. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(2), 248-257.
[17] Eikvil, L., & Huseby, R. B. (2002). Pattern Recognition in Music. Norsk
Regnesentral/Norwegian Computing Center, Copyright Norsk Regnesen-
tral, No. SAMBA/07/02, 37-77.
[18] Bishop, C. M. (2016). Pattern Recognition and Machine Learning. Springer-Verlag New York.
[19] D. Gao and N. Vasconcelos. Bottom-up saliency is a discriminant
process. In ICCV, 2007.
[20] X. Hou and L. Zhang. Saliency detection: A spectral residual approach.
In CVPR, 2007.
[21] Burns, G. (1987). A typology of 'hooks' in popular records. Popular Music, 6(1), 1-20.
[22] Kronengold, C. (2005). Accidents, hooks and theory. Popular Music, 24(3), 381.
[23] Samson, A. V., & Coronel, A. D. (2018, February). Estimating note phrase aesthetic similarity using feature-based taxicab geometry. In Digital Arts, Media and Technology (ICDAMT), International Conference on. IEEE.

