
UNIVERSITY OF CALIFORNIA

SANTA CRUZ

LEARNING MUSIC STYLES WITH ACOUSTIC SIMILARITY MEASURES
A project submitted in partial satisfaction of the
requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER SCIENCE

by

Mark C. Deckert

January 2007

The Project of Mark C. Deckert


is approved:

Professor David Helmbold, Chair

Professor Roberto Manduchi


Copyright © by

Mark C. Deckert

2007
Table of Contents

List of Figures

List of Tables

Abstract

Acknowledgments

1 Introduction

2 Related Work

3 Creating a Model

4 Testing the Classification System

5 Modeling Attack

6 A Practical Use

7 Conclusion

Bibliography

A Styles with Four to Ten Artists
List of Figures

3.1 The creation of a style model from an audio corpus containing the songs which represent that style.

3.2 The evaluation of the likelihood that a particular song was produced by the GMM for a style, where “Average Likelihood” is the average of the log likelihood for each frame in the piece of music. To make a classification, this evaluation occurs for multiple style GMMs and the GMM which produces the greatest likelihood is chosen as the style of that song.

4.1–4.8 Confusion matrices where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

5.1–5.3 Confusion matrices where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

List of Tables

4.1 Narrowed from a larger set, these styles were found to be independent and distinct.

4.2 These styles were found to have overlapping artists.

Abstract

Learning Music Styles with Acoustic Similarity Measures

by

Mark C. Deckert

Music is classified into styles based on features derived from audio signals called Mel-Frequency Cepstral Coefficients (MFCC). Styles are represented by Gaussian mixture models that represent the distribution of MFCC vectors obtained from music of that style. A novel feature based on properties of instrument attack is added to the existing MFCCs to improve classification performance.


Acknowledgments

I would like to thank Roberto Manduchi, David Helmbold, Charlie McDowell, Dan Ellis, David Cope, and my parents for their support and guidance.

Chapter 1

Introduction

Music similarity is an elusive concept—wholly subjective, multifaceted, and a moving target—but one that must be pursued in support of applications to provide automatic organization of large music collections [2].

Having experienced data loss that included a large, well-sorted MP3 collection,

my interest in music similarity is more than academic. As will be further discussed, it is

the contention of this paper that music similarity and, more specifically, music sorting

involves a significant element of personal taste. A major goal of this research is to

create a usable, flexible music sorting application that allows the user to choose music

categories that suit their own preferences. To this end, a sorting application has been

produced which achieves a reasonable degree of flexibility.

A substantial part of the music information retrieval literature and research effort has sought to find a “ground truth” in music similarity measures. I contend

that “ground truth” is an ill-conceived concept. Just as the opening quote purports,

music similarity is a “wholly subjective” concept. While there are certainly quantitative measures which contribute to music similarity, the number of potential individual

measures is staggering. Furthermore, the particular way in which these measures man-

ifest in the mind of a given listener is a function of both the physical features of their

hearing system and their particular mental state. For example, the focus of a musical

layperson and of a musician when listening to a particular piece of music will likely

be quite different. That said, a data set and some necessarily subjective classifications

are certainly required to create style classification systems. To mitigate this problem,

Ellis et al. define a concept called “consensus truth”, skirting the issues surrounding

individualized style perception by attempting to find consensus amongst the varying

opinions of listeners [2].

Having surveyed the multitude of techniques in this arena, I have attempted to select those techniques which most closely match the physiological aspects of the listener, emphasizing techniques that are simple, elegant, and effective. A three-part system has been identified which meets these criteria.

The first part of such a system is the choice of styles and style examples. While

this may seem trivial, the choice of styles and examples can make or break a classification

system. If an artist belongs to multiple styles in a classification set or examples don’t

represent the style especially well, classification systems become less effective. Given

the large number of commonly used style classifications and inconsistency as to which

artists belong to them, finding groups of styles which represent typical end user classi-

fications becomes an important step. Once a selection of styles and classified music is

decided upon, the next component is to extract features from the audio. A well-refined technique from the speech recognition community called Mel-Frequency Cepstral Coefficients (MFCC) is used.

With each frame containing 20 coefficients that comprise a vector, the MFCCs derived

from a corpus of audio representing a style are treated as a bag of frames and mod-

eled with two clustering algorithms, K-means and expectation maximization, creating

a Gaussian mixture model representing each style [10].

The remainder of the paper is organized as follows. Chapter 2 describes re-

lated work. In Chapter 3 the modeling methodology is described in detail. Chapter

4 describes the testing methods used. Chapter 5 covers an additional feature which

models the attack of musical instruments. Chapter 6 describes a practical use of these

style classification techniques and Chapter 7 concludes.

Chapter 2

Related Work

The Music Information Retrieval (MIR) community has seen much progress in

recent years, especially in the area of acoustic similarity measures. Advances in machine

learning techniques and the cost of computational resources have allowed large amounts

of audio to be analyzed over many feature spaces, often drawing from prior work in the

speech recognition community. Three recent papers act to distill the state of the art in

music similarity measures.

Berenzweig et al. [2] provide a strong, cross-site overview of music similarity

measures, exploring the concepts of consensus truth, anchor space and acoustic similar-

ity, and discussing the use of Mel-Frequency Cepstral Coefficients (MFCC), Gaussian

Mixture Models, K-means, and EM in creating models. In addition, they explore vari-

ous metrics for comparing similarity of models such as KL-divergence and Earth Movers

Distance. A distinct difference between their work and my own is that they explore sim-

ilarity between models whereas my own work, which is more specific to music sorting,

takes the approach of averaging log-likelihoods that each frame from a song would be

produced by the model for a given style.

Aucouturier and Pachet [1] focus specifically on global timbre similarity, ex-

ploring the parameter space in search of an optimal parameter set. They find that

adjusting parameters such as the number of Gaussians in the GMM, number of MFCCs,

distance metrics and size of the MFCC window can improve the accuracy significantly.

Building on these results they add various front-end variations such as liftering, cep-

stral mean compensation, delta coefficients, and acceleration coefficients, concluding

that there seems to be a “glass ceiling” limiting the potential for increasing accuracy

above a certain threshold regardless of the combination of front-end variations and tim-

bre similarity parameters used. My own novel feature builds upon the delta coefficient

variation they tested by adding a transformation which I gleaned from my own knowl-

edge as a musician. Though my confidence is not great, I hope future work with this

feature may serve to break through the glass ceiling.

Bergstra et al. [3] provide the most complete exploration of the acoustic simi-

larity feature space to date and significantly improve classification by computing many

features over medium-sized song segments. They combine these meta-features (features

computed from song segments) as weak classifiers using the Adaboost algorithm, chal-

lenging Mandel & Ellis for supremacy at MIREX 2005, a Music Information Retrieval

competition. The use of song segments and Adaboost is an intriguing technique, but is

beyond the scope of my research (their team consists of no fewer than 5 MIR experts).

Chapter 3

Creating a Model

Surveying the existing techniques and prior work in the field of music infor-

mation retrieval, a complex and disparate array of techniques emerges. Most of the

techniques exhibit a fair amount of success but seem a bit complex and unrefined. A

few authors pull more refined techniques from the extensive body of research produced

by the speech recognition community to produce strong results. To quote Dan Ellis, one of the better-performing Music Information Retrieval (MIR) researchers in years past:
Decades of research in the speech community has led to usable systems and
convergence of the features and models used for speech analysis [7].

As a result, this author has paid particular attention to the techniques and results

coming out of LabROSA at Columbia University from Ellis et al. [2, 8]. I have accepted

their and Logan’s opinion [7] that Mel-Frequency Cepstral Coefficients (MFCC) are an

ideal feature for style classification and have obtained a large collection of MFCC data

and music labeling called uspop2002 [2]. Ellis et al. have been working to distribute this

data set as a means of providing a consistent testbed for the evaluation and comparison

of MIR techniques. Though MFCC are not the only possible audio feature, a strong

case has been made that a consistent data set is needed. Unfortunately, due to copyright

restrictions, that data set cannot consist of raw audio and so features must be distributed

instead [6]. The choice of MFCC seems appropriate because, aside from their known

success in the field of speech recognition, they seem to be the features most closely

correlated to human perception [7].

I was able to host the aforementioned data set containing a large amount of

classified MFCC data from 400 artists on UCSC School of Engineering servers and would

like to thank the School of Engineering, David Helmbold and the Machine Learning

group for this privilege. Aside from my own work, this data set will serve as a resource

for future Machine Learning students.

A formal definition of MFCC is borrowed from Bergstra et al. [3]. F_s is the discretized Fourier transform of the audio signal: a vector where each element of F_s represents energy in a small frequency range during a near instant of time (a 32 ms frame). First, |F_s| is projected according to the Mel-scale. The Mel-scale is a psycho-acoustic frequency scale on which a change of 1 unit carries the same perceptual significance, regardless of the position on the scale. The Mel-scale increases identically with Hertz from 0 to 1000 Hz, at which point it continues to rise logarithmically. We refer to the Mel-scale projected version of F_s as F_mel. Following the projection, MFCC are computed by taking the logarithm, which maps power to loudness, and then the DCT (discrete cosine transform), which produces coefficients that estimate the strength of different harmonic series in the signal. The following equations define the MFCC, where each frame is referred to by the superscript k and γ is set to a small value to avoid taking the logarithm of 0:
L_mfcc^(k) = log(F_mel^(k) + γ)

r_mfcc^(k) = dct(L_mfcc^(k))

To clarify, what I refer to as an MFCC is a 20-dimensional vector which represents a near instant of sound. Each coefficient, or element in the vector, is a number which represents the loudness of a harmonic series in the signal. We may simplify further and say that each coefficient represents loudness in a certain frequency range, since lower frequency harmonics tend to mask higher frequency harmonics. In the remainder of the paper, we refer to a single vector as an MFCC and a collection of those vectors which represent an audible amount of audio (generally one or more entire songs) as MFCCs.
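To make the per-frame computation concrete, a minimal Matlab-style sketch follows. It assumes a Mel-projected magnitude spectrum Fmel is already available for one frame; this is illustrative only and is not the feacalc pipeline actually used.

% Sketch of one frame's MFCC, assuming Fmel is the Mel-projected magnitude
% spectrum of a single 32 ms frame (not the feacalc code actually used).
gamma = 1e-8;                 % small constant to avoid log(0)
Lmfcc = log(Fmel + gamma);    % logarithm maps power to loudness
r = dct(Lmfcc);               % DCT estimates strengths of harmonic series
mfcc = r(1:20);               % keep the first 20 coefficients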

The data set used contains 400 artists, 8764 tracks, and 251 unique style tokens.

The MP3s were downsampled to 22050 Hz, and mixed to mono using the mpg123 and

feacalc utilities. Each 32 ms frame is distilled into 20 coefficients [5].

Given the large number of MFCCs, we must now find a way to model them – there are too many for them to be considered individually. The methodology chosen here is selected both for its simplicity and its effectiveness. Again drawing from the speech

recognition community, the MFCCs are modeled as a bag of frames represented by a

Gaussian Mixture Model. The GMM is trained using the well established technique of

initializing with the K-means clustering algorithm and then honing in with Expectation

Maximization. For further explanation, the reader is referred to the excellent survey

of clustering techniques by Rui Xu and Donald Wunsch II [10].

[Figure 3.1 pipeline: audio corpus → short audio frames → MFCC vectors → bag of MFCC frames → K-means + EM → style GMM]

Figure 3.1: The figure shows the creation of a style model from an audio corpus containing the songs which represent that style.

Of note is that a well-known problem with EM is that sometimes an outlying point gets isolated to a single Gaussian in the mixture model, sending its covariance to zero and resulting in division-by-zero errors. Though this problem did surface at one point, the Netlab toolkit I used provided an option for handling this issue [9].
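As a concrete sketch of this training step, the Netlab calls look roughly as follows. The number of Gaussians and the diagonal covariance type are illustrative assumptions; the exact settings used are not recorded in this text.

% Fit a style GMM with Netlab, assuming mfccs is an (nframes x 20) matrix of
% MFCC vectors pooled from the style's training songs. ncentres and the
% covariance type below are assumptions, not the project's recorded settings.
ncentres = 16;
mix = gmm(20, ncentres, 'diag');    % 20-dimensional mixture of diagonal Gaussians
opts = zeros(1, 18);                % Netlab-style options vector
opts(14) = 10;                      % iterations for the K-means initialization
mix = gmminit(mix, mfccs, opts);    % initialize centres with K-means
opts(5) = 1;                        % reset collapsing covariances (the option noted above)
opts(14) = 50;                      % EM iterations
mix = gmmem(mix, mfccs, opts);      % refine the mixture with EM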

[Figure 3.2 pipeline: song audio → short audio frames → MFCC vectors → per-frame log likelihoods under a style GMM → average likelihood]

Figure 3.2: The figure shows the evaluation of the likelihood that a particular song was produced by the GMM for a style, where “Average Likelihood” is the average of the log likelihood for each frame in the piece of music. To make a classification, this evaluation occurs for multiple style GMMs and the GMM which produces the greatest likelihood is chosen as the style of that song.
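A minimal sketch of this evaluation, with hypothetical variable names:

% Classify one song, assuming models is a cell array of trained style GMMs
% and song_mfccs is an (nframes x 20) matrix of the song's MFCC vectors.
avg_ll = zeros(1, length(models));
for i = 1:length(models)
    p = gmmprob(models{i}, song_mfccs);    % per-frame likelihoods under style i
    avg_ll(i) = mean(log(p + realmin));    % average log likelihood over frames
end
[best_ll, best] = max(avg_ll);             % best indexes the chosen style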

Chapter 4

Testing the Classification System

After securing the disk space and transferring the data set, I analyzed the

available style label sets and determined that some restructuring of both the provided

labels and disk structure of the data would be required. The issue was that there

were over 200 styles specified with each artist belonging to several styles. As such,

any attempt to classify the entire database wholesale would be doomed to failure. The

solution to this came in the form of using Perl to massage the provided labeling into a

useful format and to generate shell scripts which created a directory for each style and

softlinked in all the data for each artist listed as belonging to that style. Tests would

then be conducted among groups of styles representative of what the typical end user

might use.

With an eye on the School of Engineering’s recently purchased compute servers

and the desire to create a good resource base, I decided to try to model every style in

the data set – a rather computation- and memory-intensive task. These models were

later used to add flexibility to the music sorting application created as a result of the

research. The modeling was accomplished by randomly splitting the data set into half

training and half testing examples. Perl was again employed to create shell scripts, this

time creating three separate shell scripts aimed at the three appropriately configured

compute servers. Each script fired off Matlab processes one at a time, using Netlab [9] and Matlab to create a Gaussian mixture model of the songs in each style’s training set.
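The per-style split amounts to the following sketch, with hypothetical names; the actual split was scripted in Perl over file lists.

% Randomly split one style's songs in half, assuming tracks is a cell array
% of per-song MFCC matrices.
n = length(tracks);
perm = randperm(n);                        % random permutation of song indices
train = tracks(perm(1:floor(n/2)));        % half the songs train the style GMM
test  = tracks(perm(floor(n/2)+1:end));    % the rest are held out for testing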

Having created and saved models for each style and lists of songs for their

training and testing sets, I proceeded to design some groups of styles. Again using

Perl, a list of styles containing content from more than 4 artists and fewer than 10 artists was created, with the reasoning that well-represented, but not overly broad, styles were desired. This set included 24 styles, which are listed in the Appendix. Tests were then conducted on all size-three subgroups of this group of styles. Due to limited computational resources I further reduced this set to 10 styles, shown in the tables which follow. Using a distributed approach I spread the tests across multiple shell scripts. Initially I was disappointed to find that some of the tests showed fairly effective classification while others were dismal. The cause of this ended up being overlap of artists belonging to certain rock-based styles. After removing the overlapping styles,

results improved and became more consistent. Sets of four were then tested and showed

only a slight decrease in classification effectiveness. Future work will involve identifying

all style overlaps and testing a larger, non-overlapping set of styles.

Having multiple labels for some artists does produce somewhat of an issue in that traditional learning specifies that there be one label per example when training a classifier.

Styles chosen for main experiment
Country-Pop
Electronica
Grunge
House
Prog-Rock+Art Rock
Punk-Pop
Soul

Table 4.1: Narrowed from a larger set, these styles were found to be independent and distinct.

As mentioned, the solution to this comes in choosing sets of styles where

each artist is labeled by only one style in the set, producing the one-to-one mapping

required to train a classifier. While this may seem a bit awkward, it is consistent with

the intended usage of the classification system. When a user uses the system to sort

their music, they choose a small subset of styles from the larger list. Assuming the user

is rational and has a good idea of what each style sounds like, the chosen set will include

styles which are not closely related and each artist from the training corpus will map to

a unique style in the set. Creating a sorting application with a large number of styles

where artists may belong to multiple styles is something which has already been done

and is not a goal of this research. The goal is to allow a user to choose their own small

set of styles and to map each artist to a unique style, as I have found this to be the

most effective sorting mechanism for quickly and effectively finding music from one’s

own medium-sized collection.

Table 4.1 shows the styles used and Table 4.2 shows those that were excluded

due to overlap.

With results now correctly tabulated for all groups of three and four styles, Perl was again employed to find the average classification accuracy for each of these groupings.

Styles eliminated due to overlap
Rock + Roll
Blues-Rock
Funk

Table 4.2: These styles were found to have overlapping artists.

Summarizing the current method, over 200 styles were identified and artists

for each style linked into representative directories. For each style, half the songs were

designated to belong to the training set and half to the testing set. A Gaussian mixture

model was then created for each style based on the training set. A subset of styles was

then chosen (Table 4.1) and tests run on each possible group of 3 and 4 styles from this

subset, producing a number representing the likelihood that each testing example was

produced by each style in the group. When this number was highest for the style the

song belonged to, a success was tallied. When the number was highest for a different

style, a failure was tallied.
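The tallying just described can be sketched as follows, where pred and actual are hypothetical vectors of predicted and true style indices:

% Build a row-normalized confusion matrix like those shown below, assuming
% pred and actual hold style indices in 1..k for each test song.
k = max(actual);
C = zeros(k, k);
for i = 1:length(actual)
    C(actual(i), pred(i)) = C(actual(i), pred(i)) + 1;
end
C = 100 * C ./ repmat(sum(C, 2), 1, k);    % each row now sums to 100 percent
accuracy = 100 * mean(pred == actual);     % overall percent correct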

For groups of 3 styles, classification was 71.9% successful. Figure 4.1 shows a confusion matrix for the most poorly classified set of 3 styles. Figure 4.2 shows a confusion matrix for an average set of 3 styles. Figure 4.3 shows a confusion matrix for the best-classified set of 3 styles.

For groups of 4 styles, classification was 69.9% successful. Figure 4.4 shows a confusion matrix for the most poorly classified set of 4 styles. Figure 4.5 shows a confusion matrix for an average set of 4 styles. Figure 4.8 shows a confusion matrix for the best-classified set of 4 styles.

Electronica House Soul
Electronica 37.70 27.87 34.43
House 20.91 28.18 50.91
Soul 0.80 9.60 89.60

Figure 4.1: A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

Electronica House Punk-Pop
Electronica 38.52 37.70 23.77
House 25.45 47.27 27.27
Punk-Pop 2.91 9.88 87.21

Figure 4.2: A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

Though not an identical test, these values do fall within the range of genre classification accuracies achieved by participants in the MIREX 2005 contest, where accuracies ranged from 60.72% to 82.34% [3].

Country-Pop Grunge House


Country-Pop 88.35 4.85 6.80
Grunge 9.76 78.86 11.38
House 4.55 7.27 88.18

Figure 4.3: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.

Electronica Grunge House Soul
Electronica 31.15 9.84 27.87 31.15
Grunge 10.57 63.41 2.44 23.58
House 18.18 5.45 26.36 50.00
Soul 0.80 0.80 9.60 88.80

Figure 4.4: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.

Electronica Grunge House Punk-Pop


Electronica 35.25 6.56 36.07 22.13
Grunge 11.38 34.96 3.25 50.41
House 24.55 1.82 47.27 26.36
Punk-Pop 2.91 3.49 9.88 83.72

Figure 4.5: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.

Electronica Grunge Prog-Rock+Art Rock Soul


Electronica 52.46 8.20 23.77 15.57
Grunge 9.76 61.79 22.76 5.69
Prog-Rock+Art Rock 1.40 3.27 73.36 21.96
Soul 2.40 0.80 19.20 77.60

Figure 4.6: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.

Country-Pop Electronica Prog-Rock+Art Rock Soul


Country-Pop 50.49 0.00 25.24 24.27
Electronica 1.64 57.38 24.59 16.39
Prog-Rock+Art Rock 1.40 1.40 75.70 21.50
Soul 4.80 1.60 18.40 75.20

Figure 4.7: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.

Country-Pop Electronica Prog-Rock+Art Rock Punk-Pop
Country-Pop 63.11 0.00 33.98 2.91
Electronica 3.28 50.82 31.15 14.75
Prog-Rock+Art Rock 1.87 2.34 92.99 2.80
Punk-Pop 1.74 1.74 13.37 83.14

Figure 4.8: A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

Chapter 5

Modeling Attack

The motivation for the new feature is to improve results by incorporating a

temporal feature into the existing classification system. To this end, I have conceived

of a method that fits seamlessly into the existing modeling framework. By finding the

delta between consecutive MFCC frames and exponentiating this quantity, the large

positive deltas, which represent the attack of musical instruments (including voice), are

accentuated while still maintaining a set of coefficients which are in the general range

of the original MFCCs. For readers not familiar with the term attack, it is the short

time segment when an instrument goes from silence to producing a sound, marked by

a steep increase in energy in certain frequencies of the signal. By taking this new set of

coefficients and doubling the feature space, a new set of 40-dimensional GMMs is produced.
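One plausible reading of this construction is sketched below; the exact delta and exponentiation details are my assumption here, since only the idea is described above.

% Sketch of the attack feature under one plausible reading: exponentiate
% per-frame MFCC deltas so that large positive deltas (attacks) are
% accentuated, then stack them with the original MFCCs for 40-d features.
d = diff(mfccs);                      % deltas between consecutive frames
attack = exp(d);                      % accentuates large positive deltas
feat = [mfccs(2:end, :), attack];     % (nframes-1) x 40 combined feature matrix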

Initial tests show up to a 4% increase in classification accuracy, though some tests show poorer performance. For the four-style experiment, overall classification improved to 72.4%. The attack model seems to perform well on music containing steady beats (e.g. Punk-Pop) and does poorly when the beats are irregular (e.g. Electronica) or not prominent (e.g. Country-Pop). Further work will involve attempting to limit attack modeling to frequencies likely to contain attack-oriented beat information.

Electronica Grunge Prog-Rock+Art Rock Soul
Electronica 57.50 37.50 2.50 2.50
Grunge 0.76 58.33 14.39 26.52
Prog-Rock+Art Rock 0.76 40.91 55.30 3.03
Soul 4.62 30.77 13.85 50.77

Figure 5.1: A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

Country-Pop Electronica Prog-Rock+Art Rock Soul
Country-Pop 77.50 17.50 2.50 2.50
Electronica 3.62 54.35 37.68 4.35
Prog-Rock+Art Rock 2.86 11.43 82.86 2.86
Soul 0.00 14.74 9.47 75.79

Figure 5.2: A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

Figure 5.1 shows a confusion matrix of a classification that did poorly in com-

parison to its non-attack counterpart, Figure 4.6. Figure 5.2 shows a confusion matrix

for an average attack-based modeling classification compared to its non-attack counterpart, Figure 4.7. Figure 5.3 shows a confusion matrix where attack modeling significantly outperformed its non-attack counterpart, Figure 4.8.

I wish to thank David Cope for spawning this idea by describing how it is

necessary to base music similarity on pitch intervals and not absolute pitch in order to

relate music in different keys [4].

19
Country-Pop Electronica Prog-Rock+Art Rock Punk-Pop
Country-Pop 65.00 12.50 15.00 7.50
Electronica 5.43 60.29 34.29 0.00
Prog-Rock+Art Rock 0.00 4.29 95.71 0.00
Punk-Pop 3.19 5.32 4.51 86.98

Figure 5.3: A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column.

Chapter 6

A Practical Use

Using the existing models and generating MFCCs from new user-provided audio data (the user points to a directory containing MP3s), I have created a system which makes recommendations for sorting and allows the user to specify the actual style to which the new song belongs. Choices are sorted according to the average log likelihood that each song was produced by each style in the set chosen by the user, and the program moves the actual audio data to directories based on the user's choice, creating a usable music sorting system. The following example run illustrates the program being used. Notice that the style choices for each song are sorted by their likelihood. The classifier was correct in most of the cases, and where it failed, the desired style was always the second choice.

Please provide a substring (or RegEx) of desired style [Enter for Done]: Rock

0. Search Again
1. Acid_Rock
2. Adult_Alternative_Pop+Rock
3. Album_Rock
4. Alternative_Country-Rock
5. Alternative_Pop+Rock
6. American_Trad_Rock
7. Arena_Rock
8. Aussie_Rock
9. Blues-Rock
10. Boogie_Rock
11. British_Folk-Rock
12. British_Trad_Rock
13. Celtic_Rock
14. College_Rock
15. Comedy_Rock
16. Country-Rock
17. Experimental_Rock
18. Folk-Rock
19. Glam_Rock
20. Goth_Rock
21. Hard_Rock
22. Heartland_Rock
23. Indie_Rock
24. Jazz-Rock
25. Latin_Rock
26. New_Zealand_Rock
27. Noise-Rock
28. Pop+Rock
29. Prog-Rock+Art_Rock
30. Pub_Rock
31. Rap-Rock
32. Rock_+_Roll
33. Rock_en_Español
34. Rockabilly
35. Rocksteady
36. Roots_Rock
37. Soft_Rock
38. Southern_Rock
39. Space_Rock
40. Swedish_Pop+Rock
Choose a modeled style number: 3
Please provide a substring (or RegEx) of desired style [Enter for Done]: Jazz

0. Search Again
1. Acid_Jazz
2. Crossover_Jazz
3. Jazz-Funk
4. Jazz-Pop
5. Jazz-Rock
6. Smooth_Jazz
7. Vocal_Jazz
Choose a modeled style number: 5
Please provide a substring (or RegEx) of desired style [Enter for Done]: Electronic

0. Search Again
1. Electronica
2. Indie_Electronic
3. Progressive_Electronic
Choose a modeled style number: 2
Please provide a substring (or RegEx) of desired style [Enter for Done]: Reggae

0. Search Again
1. Contemporary_Reggae
2. Political_Reggae
3. Reggae
4. Reggae-Pop
5. Roots_Reggae
Choose a modeled style number: 5
Please provide a substring (or RegEx) of desired style [Enter for Done]:

Styles for Sorting


------------------
Album_Rock
Jazz-Rock
Indie_Electronic
Roots_Reggae

Music Location [/cse/grads/mdeckert/music/]:

Creating MFCCs from Air-La_Femme_dArgent.mp3


Creating MFCCs from AphexTwin-4.mp3
Creating MFCCs from Boards_of_Canada-Music_Is_Math.mp3
Creating MFCCs from Clapton-I_Shot_The_Sheriff.mp3
Creating MFCCs from Faith_No_more-The_Real_Thing.mp3
Creating MFCCs from Jaco_Pastorius-Come_On_Come_Over.mp3
Creating MFCCs from Marley-Get_Up_Stand_Up.mp3
Creating MFCCs from MMW-Last_Chance_To_Dance_Trance.mp3
Creating MFCCs from Quasimodal-Creature_of_the_Night.mp3
Creating MFCCs from The_New_Mastersounds-Aint_No_Telling.mp3
Creating MFCCs from Zappa-CatholicGirls.mp3

Comparing to models.

Ready to Sort Songs


-------------------

1. Indie_Electronic
2. Jazz-Rock
3. Album_Rock
4. Roots_Reggae
Choose directory for Air-La_Femme_dArgent.mp3 [1]:

1. Indie_Electronic
2. Jazz-Rock
3. Album_Rock
4. Roots_Reggae
Choose directory for AphexTwin-4.mp3 [1]:

1. Jazz-Rock
2. Indie_Electronic
3. Album_Rock
4. Roots_Reggae
Choose directory for Boards_of_Canada-Music_Is_Math.mp3 [1]: 2

1. Jazz-Rock
2. Album_Rock
3. Roots_Reggae
4. Indie_Electronic
Choose directory for Clapton-I_Shot_The_Sheriff.mp3 [1]: 2

1. Album_Rock
2. Jazz-Rock
3. Roots_Reggae
4. Indie_Electronic
Choose directory for Faith_No_more-The_Real_Thing.mp3 [1]:

1. Jazz-Rock
2. Indie_Electronic
3. Album_Rock
4. Roots_Reggae
Choose directory for Jaco_Pastorius-Come_On_Come_Over.mp3 [1]:

1. Roots_Reggae
2. Jazz-Rock
3. Album_Rock
4. Indie_Electronic
Choose directory for Marley-Get_Up_Stand_Up.mp3 [1]:

1. Jazz-Rock
2. Indie_Electronic
3. Roots_Reggae
4. Album_Rock
Choose directory for MMW-Last_Chance_To_Dance_Trance.mp3 [1]:

1. Roots_Reggae
2. Jazz-Rock
3. Album_Rock
4. Indie_Electronic
Choose directory for Quasimodal-Creature_of_the_Night.mp3 [1]: 2

1. Jazz-Rock
2. Indie_Electronic
3. Roots_Reggae
4. Album_Rock
Choose directory for The_New_Mastersounds-Aint_No_Telling.mp3 [1]:

1. Album_Rock
2. Jazz-Rock
3. Roots_Reggae
4. Indie_Electronic
Choose directory for Zappa-CatholicGirls.mp3 [1]:

Clean up? [yes]: yes

> ls -R music
music:
Album_Rock Indie_Electronic Jazz-Rock Roots_Reggae

music/Album_Rock:
Clapton-I_Shot_The_Sheriff.mp3 Zappa-CatholicGirls.mp3
Faith_No_more-The_Real_Thing.mp3

music/Indie_Electronic:
Air-La_Femme_dArgent.mp3 AphexTwin-4.mp3 Boards_of_Canada-Music_Is_Math.mp3

music/Jazz-Rock:
Jaco_Pastorius-Come_On_Come_Over.mp3 Quasimodal-Creature_of_the_Night.mp3
MMW-Last_Chance_To_Dance_Trance.mp3 The_New_Mastersounds-Aint_No_Telling.mp3

music/Roots_Reggae:
Marley-Get_Up_Stand_Up.mp3
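
Behind this session, the ranking step amounts to the following sketch, with hypothetical names; the real program also extracts the MFCCs and prompts the user.

% Rank style choices per song, assuming avg_ll(i,j) holds the average log
% likelihood of song i under style j, and songs/styles are cell arrays of names.
for i = 1:length(songs)
    [ll, order] = sort(avg_ll(i, :), 'descend');    % rank the style choices
    fprintf('Suggested style for %s: %s\n', songs{i}, styles{order(1)});
    % after the user confirms a choice k, the file is moved into place:
    % movefile(songs{i}, fullfile(musicdir, styles{k}));
end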

Chapter 7

Conclusion

While style classification may be elusive due to its subjectivity, I hope I have shown, via the application of existing techniques, the creation of a new feature, and the use of these techniques in a practical music classification system, that it is indeed possible to identify styles accurately enough to create valuable applications. New features and techniques will certainly serve to increase the value of this field in the future.

Bibliography

[1] Jean-Julien Aucouturier and Francois Pachet. Improving timbre similarity: How

high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1),

2004.

[2] A. Berenzweig, B. Logan, D. Ellis, and B. Whitman. A large-scale evaluation of

acoustic and subjective music similarity measures. The Computer Music Journal,

28(2):63–76, July 2004.

[3] James Bergstra, Norman Casagrande, Dumitru Erhan, Douglas Eck, and Balazs

Kegl. Meta-features and adaboost for music classification. Submitted to Machine

Learning, December 2005.

[4] David Cope. Experiments in Musical Intelligence. A-R Editions, Madison, Wis-

consin, 1996.

[5] D. Ellis, A. Berenzweig, and B. Whitman. The uspop2002 pop music data set.

Technical report, 2003. http://labrosa.ee.columbia.edu/projects/musicsim/

uspop2002.html.

[6] B. Logan, D. Ellis, and A. Berenzweig. Toward evaluation techniques for music

similarity. In Proceedings of the 4th International Symposium on Music Information

Retrieval (ISMIR’03), Washington, D.C., USA, 2003.

[7] Beth Logan. Mel frequency cepstral coefficients for music modeling. In Proceedings

of the First International Symposium on Music Information Retrieval (ISMIR),

Plymouth, Massachusetts, October 2000.

[8] M. I. Mandel and D. P. Ellis. Song-level features and support vector machines for

music classification. MIREX genre classification contest, 2005.

[9] Ian Nabney. Netlab. Technical report, 2003. http://www.ncrg.aston.ac.uk/

netlab/.

[10] Rui Xu and Donald Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3):645–678, May 2005.

Appendix A

Styles with Four to Ten Artists

Big_Beat
Blue-Eyed_Soul
British_Blues
British_Invasion
British_Metal
British_Psychedelia
Country-Pop
Dirty_South
Disco
East_Coast_Rap
Euro-Pop
Funk
Funk_Metal
Gangsta_Rap
Hair_Metal
Industrial_Metal
Jazz-Rock
Latin_Pop
Neo-Traditionalist_Country
New_Jack_Swing
Quiet_Storm
Rap-Rock
Ska-Punk
Third_Wave_Ska_Revival
West_Coast_Rap
