UNIVERSITY OF CALIFORNIA
SANTA CRUZ
MASTER OF SCIENCE
in
COMPUTER SCIENCE
by
Mark C. Deckert
January 2007
Copyright © by Mark C. Deckert 2007
Table of Contents
List of Figures iv
List of Tables vi
Abstract vii
Acknowledgments viii
1 Introduction 1
2 Related Work 4
3 Creating a Model 6
5 Modeling Attack 18
6 A Practical Use 21
7 Conclusion 27
Bibliography 28
List of Figures
3.1 The figure shows the creation of a style model from an audio corpus containing the songs which represent that style. 9
3.2 The figure shows the evaluation of the likelihood that a particular song was produced by the GMM for a style, where “Average Likelihood” is the average of the log likelihood for each frame in the piece of music. To make a classification, this evaluation occurs for multiple style GMMs and the GMM which produces the greatest likelihood is chosen as the style of that song. 10
4.1 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 15
4.2 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 15
4.3 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 15
4.4 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 16
4.5 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 16
4.6 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 16
4.7 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 16
4.8 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 17
5.1 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 19
5.2 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 19
5.3 A confusion matrix where each row shows the percentage of examples with the actual label of that row that were labeled by the classifier with the style indicated at the top of that column. 20
List of Tables
4.1 Narrowed from a larger set, these styles were found to be independent and distinct. 13
4.2 These styles were found to have overlapping artists. 14
Abstract
by
Mark C. Deckert
Music is classified into styles based on features derived from audio signals called Mel-Frequency Cepstral Coefficients (MFCC). Each style is modeled with Gaussian mixture models that represent the distribution of MFCC vectors obtained from music of that style. A novel feature based on properties of instrument attack is added to the existing feature set.
Acknowledgments
I would like to thank Roberto Manduchi, David Helmbold, Charlie McDowell, and Dan Ellis.
Chapter 1
Introduction
Having experienced data loss that included a large, well-sorted MP3 collection, I have a personal interest in automatic music sorting. It is the contention of this paper that music similarity and, more specifically, music sorting techniques have matured enough to create a usable, flexible music sorting application that allows the user to choose music categories that suit their own preferences. To this end, a sorting application has been developed. Many prior
efforts have sought to find a “ground truth” in music similarity measures. I contend
that “ground truth” is an ill-conceived concept. Just as the opening quote purports,
music similarity is a “wholly subjective” concept. While there are certainly quantitative measures which contribute to music similarity, the number of potential individual
measures is staggering. Furthermore, the particular way in which these measures manifest in the mind of a given listener is a function of both the physical features of their
hearing system and their particular mental state. For example, the focus of a musical
layperson and of a musician when listening to a particular piece of music will likely
be quite different. That said, a data set and some necessarily subjective classifications
are certainly required to create style classification systems. To mitigate this problem,
Ellis et al. define a concept called “consensus truth”, skirting the issues surrounding subjective labels. From the multitude of published techniques, I have sought to pull those which most closely match the physiological
aspects of the listener, emphasizing techniques that are simple, elegant and effective. A
three-part system has been identified which meets these criteria.
The first part of such a system is the choice of styles and style examples. While
this may seem trivial, the choice of styles and examples can make or break a classification
system. When the chosen examples do not represent the style especially well, classification systems become less effective. Given
the large number of commonly used style classifications and inconsistency as to which
artists belong to them, finding groups of styles which represent typical end user classi-
fications becomes an important step. Once a selection of styles and classified music is
decided upon, the next component is to extract features from the audio. A well refined
technique from the speech recognition community called Mel-Frequency Cepstral Coef-
ficients (MFCC) is used. Finally, the features must be modeled in a compact manner.
With each frame containing 20 coefficients that comprise a vector, the MFCCs derived
from a corpus of audio representing a style are treated as a bag of frames and modeled with two clustering algorithms, K-means and expectation maximization, creating
a Gaussian mixture model for each style. Chapter 2 surveys related work. Chapter 3 details the creation of style models. Chapter 4 describes the testing methods used. Chapter 5 covers an additional feature which
models the attack of musical instruments. Chapter 6 describes a practical use of these
techniques in a music sorting application. Chapter 7 concludes.
Chapter 2
Related Work
The Music Information Retrieval (MIR) community has seen much progress in
recent years, especially in the area of acoustic similarity measures. Advances in machine
learning techniques and the cost of computational resources have allowed large amounts
of audio to be analyzed over many feature spaces, often drawing from prior work in the
speech recognition community. Three recent papers act to distill the state of the art in
music similarity measures, exploring the concepts of consensus truth, anchor space and acoustic similarity, and discussing the use of Mel-Frequency Cepstral Coefficients (MFCC), Gaussian
Mixture Models, K-means, and EM in creating models. In addition, they explore vari-
ous metrics for comparing similarity of models such as KL-divergence and Earth Movers
Distance. A distinct difference between their work and my own is that they explore sim-
ilarity between models whereas my own work, which is more specific to music sorting,
takes the approach of averaging the log-likelihoods that each frame from a song would be produced by a given style model.
Aucouturier and Pachet [1] focus specifically on global timbre similarity, exploring the parameter space in search of an optimal parameter set. They find that
the choice of distance metric and the size of the MFCC window can improve the accuracy significantly.
Building on these results they add various front-end variations such as liftering and cepstral smoothing, finding that there seems to be a “glass ceiling” limiting the potential for increasing accuracy
above a certain threshold regardless of the combination of front-end variations and timbre similarity parameters used. My own novel feature builds upon the delta coefficient
variation they tested by adding a transformation which I gleaned from my own knowledge as a musician. Though my confidence is not great, I hope future work with this feature will prove fruitful.
Bergstra et al. [3] provide the most complete exploration of the acoustic simi-
larity feature space to date and significantly improve classification by computing many
features over medium sized song segments. They combine these meta-features (features
computed from song segments) as weak classifiers using the Adaboost algorithm, chal-
lenging Mandel & Ellis for supremacy at MIREX 2005, a Music Information Retrieval
competition. The use of song segments and Adaboost is an intriguing technique, but is
beyond the scope of my research (their team consists of no fewer than five MIR experts).
Chapter 3
Creating a Model
Surveying the existing techniques and prior work in the field of music infor-
mation retrieval, a complex and disparate array of techniques emerges. Most of the
techniques exhibit a fair amount of success but seem a bit complex and unrefined. A
few authors pull more refined techniques from the extensive body of research produced
by the speech recognition community to produce strong results. To quote Dan Ellis,
one of the better performing Music Information Retrieval (MIR) researchers in years
past:
Decades of research in the speech community has led to usable systems and
convergence of the features and models used for speech analysis [7].
As a result, this author has paid particular attention to the techniques and results
coming out of LabROSA at Columbia University from Ellis et al. [2, 8]. I have accepted
their and Logan’s opinion [7] that Mel-Frequency Cepstral Coefficients (MFCC) are an
ideal feature for style classification and have obtained a large collection of MFCC data
and music labeling called uspop2002 [2]. Ellis et al. have been working to distribute this
data set as a means of providing a consistent testbed for the evaluation and comparison
of MIR techniques. Though MFCC are not the only possible audio feature, a strong
case has been made that a consistent data set is needed. Unfortunately due to copyright
restrictions, that data set cannot consist of raw audio and so features must be distributed
instead [6]. The choice of MFCC seems appropriate because, aside from their known
success in the field of speech recognition, they seem to be the features most closely matched to human perception of timbre.
I was able to host the aforementioned data set containing a large amount of
classified MFCC data from 400 artists on UCSC School of Engineering servers and would
like to thank the School of Engineering, David Helmbold and the Machine Learning
group for this privilege. Aside from my own work, this data set will serve as a resource for other researchers.
MFCC are computed from the discretized Fourier transform of the audio signal, F_s, a vector in which each element gives the power of the signal in one frequency band. F_s is projected onto the Mel scale, a psycho-acoustic frequency scale on which a change of 1 unit carries the same perceptual weight anywhere on the scale. The Mel scale rises linearly with Hertz from 0 to 1000 Hz, at which point it continues to rise logarithmically. We refer to the Mel-scale projected version of F_s as F_mel. Following the projection, MFCC are computed by taking the logarithm, which maps power to loudness, and then the DCT, which produces coefficients that estimate the strength of different harmonic series
in the signal. The following equations define the MFCC, where each frame is referred
to by the superscript k and γ is set to a small value to avoid taking the logarithm of 0:
L^{(k)}_{mfcc} = \log(F^{(k)}_{mel} + \gamma)
r^{(k)}_{mfcc} = \mathrm{dct}(L^{(k)}_{mfcc})
Each frame of coefficients represents a near instant of sound. Each coefficient, or element in the vector, is a number which represents the loudness of a harmonic series in the signal. We may simplify further and say that each coefficient represents loudness in a certain frequency range, since lower frequency harmonics tend to mask higher frequency harmonics. In the remainder of the paper, we refer to a single vector as an MFCC and a collection of those vectors which represent an audible amount of audio (generally one or more entire songs) as MFCCs.
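The pipeline just described — project the power spectrum onto the Mel scale, take the logarithm, then the DCT — can be sketched for a single frame as follows. This is a minimal illustration, not the extraction code used for uspop2002; the filterbank size and the particular Mel formula are my own assumptions:

```python
import numpy as np

def hz_to_mel(f):
    # A common Mel formula; the thesis does not say which variant
    # the data set used, so this is an illustrative choice.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(power_spectrum, sample_rate=22050, n_filters=40,
               n_coeffs=20, gamma=1e-10):
    """Toy MFCC for one frame: Mel filterbank -> log -> DCT."""
    freqs = np.linspace(0.0, sample_rate / 2.0, len(power_spectrum))
    # Triangular filters spaced evenly on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                          n_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    fbank = np.zeros((n_filters, len(power_spectrum)))
    for i in range(n_filters):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo + 1e-12)
        falling = (hi - freqs) / (hi - mid + 1e-12)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    F_mel = fbank @ power_spectrum        # project power onto the Mel scale
    L = np.log(F_mel + gamma)             # power -> loudness
    # DCT-II of the log filterbank, keeping the first n_coeffs coefficients.
    j = np.arange(n_coeffs)[:, None]
    i = np.arange(n_filters)[None, :]
    basis = np.cos(np.pi / n_filters * (i + 0.5) * j)
    return basis @ L                      # one 20-dimensional MFCC vector
```

Applied to every short frame of a song, this yields the bag of 20-coefficient vectors modeled in the next section.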
The data set used contains 400 artists, 8764 tracks, and 251 unique style tokens.
The MP3s were downsampled to 22050 Hz and mixed to mono using the mpg123 decoder.
Given the large number of MFCCs, we must now find a way to model them
– there are too many for them to be considered individually. The methodology chosen here
is selected both for its simplicity and effectiveness. Again drawing from the speech
recognition community, the chosen model is the Gaussian Mixture Model (GMM). The GMM is trained using the well established technique of
initializing with the K-means clustering algorithm and then honing in with Expectation
Maximization. For further explanation, the reader is referred to the excellent survey
of clustering techniques by Rui Xu and Donald Wunsch II [10]. Of note is that a well
[Figure 3.1 diagram: audio corpus → short audio frames → MFCC frame vectors → bag of MFCC frames → K-means+EM → style GMM]
Figure 3.1: The figure shows the creation of a style model from an audio corpus containing the songs which represent that style.
known problem with EM is that sometimes an outlying point gets isolated to a single
gaussian in the mixture model, sending its covariance to zero and resulting in division
by zero errors. Though this problem did surface at one point, the Netlab toolkit I used includes a check that resets collapsing covariances to a workable minimum.
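The modeling step of Figure 3.1 can be sketched as follows: a diagonal-covariance GMM initialized with K-means and refined with EM. This is a minimal sketch, not Netlab's gmmem; the variance floor stands in for Netlab's covariance safeguards, and all parameter names are my own:

```python
import numpy as np

def train_gmm(X, k=4, n_iter=25, var_floor=1e-3, seed=0):
    """Train a diagonal-covariance GMM on frames X (n_frames, n_dims):
    K-means initialization, then EM refinement."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    # --- K-means: farthest-point init, then a few Lloyd iterations ---
    mu = [X[rng.randint(n)]]
    for _ in range(k - 1):
        dist = ((X[:, None, :] - np.array(mu)[None]) ** 2).sum(-1).min(1)
        mu.append(X[dist.argmax()])       # next center: farthest point
    mu = np.array(mu, dtype=float)
    for _ in range(10):
        lab = ((X[:, None, :] - mu[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(lab == j):
                mu[j] = X[lab == j].mean(0)
    var = np.tile(X.var(0) + var_floor, (k, 1))
    w = np.full(k, 1.0 / k)
    # --- EM refinement ---
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame
        logp = (-0.5 * (((X[:, None] - mu) ** 2) / var).sum(-1)
                - 0.5 * np.log(2 * np.pi * var).sum(-1) + np.log(w))
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(0) + 1e-12
        w = nk / n
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ X ** 2) / nk[:, None] - mu ** 2
        var = np.maximum(var, var_floor)  # avoid covariance collapse
    return w, mu, var
```

The variance floor in the M-step is one simple way to prevent the covariance-collapse failure mode described above.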
[Figure 3.2 diagram: song audio → short audio frames → MFCC frame vectors → style GMM → per-frame log likelihoods → average likelihood]
Figure 3.2: The figure shows the evaluation of the likelihood that a particular song was produced by the GMM for a style, where “Average Likelihood” is the average of the log likelihood for each frame in the piece of music. To make a classification, this evaluation occurs for multiple style GMMs and the GMM which produces the greatest likelihood is chosen as the style of that song.
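The evaluation in Figure 3.2 can be sketched as follows: average the per-frame log-likelihoods under each style's GMM and choose the style with the highest average. Representing a GMM as a (weights, means, variances) tuple is a convention of this sketch, not of the thesis code:

```python
import numpy as np

def log_likelihoods(frames, w, mu, var):
    """Per-frame log-likelihood under a diagonal-covariance GMM
    with weights w, means mu, and variances var."""
    logp = (-0.5 * (((frames[:, None] - mu) ** 2) / var).sum(-1)
            - 0.5 * np.log(2 * np.pi * var).sum(-1) + np.log(w))
    m = logp.max(1, keepdims=True)
    # log-sum-exp over the mixture components
    return m[:, 0] + np.log(np.exp(logp - m).sum(1))

def classify(frames, style_models):
    """Pick the style whose GMM gives the highest average frame log-likelihood."""
    scores = {name: log_likelihoods(frames, *gmm).mean()
              for name, gmm in style_models.items()}
    return max(scores, key=scores.get)
```

Averaging over frames makes the score comparable between songs of different lengths, which matters when ranking sorting recommendations later.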
Chapter 4
After securing the disk space and transferring the data set, I analyzed the
available style label sets and determined that some restructuring of both the provided
labels and disk structure of the data would be required. The issue was that there
were over 200 styles specified with each artist belonging to several styles. As such,
any attempt to classify the entire database wholesale would be doomed to failure. The
solution to this came in the form of using Perl to massage the provided labeling into a
useful format and to generate shell scripts which created a directory for each style and
softlinked in all the data for each artist listed as belonging to that style. Tests would
then be conducted among groups of styles representative of what the typical end user
might use.
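The directory massage described above was done with Perl-generated shell scripts; a rough Python equivalent of the symlinking step would look like the following. The directory layout and names here are assumptions for illustration:

```python
import os

def link_styles(style_map, data_root, out_root):
    """Make a directory per style and symlink in each artist's data.
    style_map: {style: [artist, ...]}; artist data lives at data_root/artist."""
    for style, artists in style_map.items():
        style_dir = os.path.join(out_root, style)
        os.makedirs(style_dir, exist_ok=True)
        for artist in artists:
            src = os.path.join(data_root, artist)
            dst = os.path.join(style_dir, artist)
            if not os.path.islink(dst):
                os.symlink(src, dst)  # artists in several styles are linked, not copied
```

Symlinks keep a single copy of each artist's data even when the artist belongs to several styles.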
Out of thoroughness and the desire to create a good resource base, I decided to try to model every style in the data set – a rather computationally and memory intensive task. These models were
later used to add flexibility to the music sorting application created as a result of the
research. The modeling was accomplished by randomly splitting the data set into half
training and half testing examples. Perl was again employed to create shell scripts, this
time creating three separate shell scripts aimed at the three appropriately configured
compute servers. Each script fired off Matlab processes one at a time, using Netlab [9]
and Matlab to create a Gaussian mixture model of the songs in each style’s training set.
Having created and saved models for each style and lists of songs for their
training and testing sets, I proceeded to design some groups of styles. Again using
Perl, a list of styles containing content from more than 4 artists and less than 10 artists
was created with the reasoning that well represented, but not overly broad styles were
desired. This set included 24 styles which are listed in the Appendix. Tests were
then conducted on all size three subgroups of this group of styles. Due to limited
computational resources I further reduced this set to 10 styles, shown in the tables
which follow. Using a distributed approach I spread the tests across multiple shell
scripts. Initially I was disappointed to find that some of the tests showed fairly effective
classification while others were dismal. The cause of this ended up being overlap of
artists belonging to certain rock based styles. After removing the overlapping styles,
results improved and became more consistent. Sets of four were then tested and showed
only a slight decrease in classification effectiveness. Future work will involve identifying larger style sets that remain distinct.
Having multiple labels for some artists does present an issue in that traditional learning specifies that there be one label per example when training
Styles chosen for main experiment
Country-Pop
Electronica
Grunge
House
Prog-Rock+Art Rock
Punk-Pop
Soul
Table 4.1: Narrowed from a larger set, these styles were found to be independent and
distinct.
a classifier. As mentioned, the solution to this comes in choosing sets of styles where
each artist is labeled by only one style in the set, producing the one-to-one mapping
required to train a classifier. While this may seem a bit awkward, it is consistent with
the intended usage of the classification system. When a user uses the system to sort
their music, they choose a small subset of styles from the larger list. Assuming the user
is rational and has a good idea of what each style sounds like, the chosen set will include
styles which are not closely related and each artist from the training corpus will map to
a unique style in the set. Creating a sorting application with a large number of styles
where artists may belong to multiple styles is something which has already been done
and is not a goal of this research. The goal is to allow a user to choose their own small
set of styles and to map each artist to a unique style, as I have found this to be the
most effective mechanism for quickly finding music from one’s own collection.
Table 4.1 shows the styles used and Table 4.2 shows those that were excluded
due to overlap.
With results now correctly tabulated for all groups of three and four styles,
Styles eliminated due to overlap
Rock + Roll
Blues-Rock
Funk
Table 4.2: These styles were found to have overlapping artists.
Perl was again employed to find the average classification accuracy for each of these
groupings.
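The tabulation over style groupings can be sketched as follows, where the accuracy function is a stand-in for running the full train/test evaluation on one group (the original used Perl):

```python
from itertools import combinations

def mean_group_accuracy(accuracy, styles, group_size=3):
    """Average a per-group accuracy function over every subset
    of the given size drawn from the style list."""
    groups = list(combinations(sorted(styles), group_size))
    return sum(accuracy(g) for g in groups) / len(groups)
```

For 10 styles there are 120 groups of three and 210 groups of four, which is why the tests were spread across multiple shell scripts.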
Summarizing the current method, over 200 styles were identified and artists
for each style linked into representative directories. For each style, half the songs were
designated to belong to the training set and half to the testing set. A Gaussian mixture
model was then created for each style based on the training set. A subset of styles was
then chosen (Table 4.1) and tests run on each possible group of 3 and 4 styles from this
subset, producing a number representing the likelihood that each testing example was
produced by each style in the group. When this number was highest for the style the
song belonged to, a success was tallied; when it was highest for a different style, a failure was tallied.
For groups of 3 styles, classification was 71.9% successful. Figure 4.1 shows
a confusion matrix for the most poorly classified set of 3 styles. Figure 4.2 shows a
confusion matrix for an average set of 3 styles. Figure 4.3 shows a confusion matrix for the best classified set of 3 styles.
For groups of 4 styles, classification was 69.9% successful. Figure 4.4 shows
a confusion matrix for the most poorly classified set of 4 styles. Figure 4.5 shows a
Electronica House Soul
Electronica 37.70 27.87 34.43
House 20.91 28.18 50.91
Soul 0.80 9.60 89.60
Figure 4.1: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
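A row-normalized confusion matrix like the ones in these figures can be computed as follows (a minimal sketch; the function and variable names are my own):

```python
def confusion_matrix(true_labels, predicted, styles):
    """Each row gives the percentage of examples with that actual
    style that the classifier assigned to each predicted style."""
    counts = {s: {t: 0 for t in styles} for s in styles}
    for actual, pred in zip(true_labels, predicted):
        counts[actual][pred] += 1
    rows = {}
    for s in styles:
        total = sum(counts[s].values()) or 1   # guard empty rows
        rows[s] = {t: 100.0 * counts[s][t] / total for t in styles}
    return rows

cm = confusion_matrix(["Soul", "Soul", "House", "House"],
                      ["Soul", "House", "House", "House"],
                      ["Soul", "House"])
```

Row normalization makes styles with different numbers of test songs directly comparable.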
Figure 4.2: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
confusion matrix for an average set of 4 styles. Figure 4.8 shows a confusion matrix for the best classified set of 4 styles.
Though not an identical test, these values do fall within the range of genre classification accuracies achieved by entrants in the MIREX 2005 contest.
Figure 4.3: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
Electronica Grunge House Soul
Electronica 31.15 9.84 27.87 31.15
Grunge 10.57 63.41 2.44 23.58
House 18.18 5.45 26.36 50.00
Soul 0.80 0.80 9.60 88.80
Figure 4.4: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
Figure 4.5: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
Figure 4.6: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
Figure 4.7: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
Country-Pop Electronica Prog-Rock+Art Rock Punk-Pop
Country-Pop 63.11 0.00 33.98 2.91
Electronica 3.28 50.82 31.15 14.75
Prog-Rock+Art Rock 1.87 2.34 92.99 2.80
Punk-Pop 1.74 1.74 13.37 83.14
Figure 4.8: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style
indicated at the top of that column.
Chapter 5
Modeling Attack
This chapter introduces a novel temporal feature into the existing classification system. To this end, I have conceived
of a method that fits seamlessly into the existing modeling framework. By finding the
delta between consecutive MFCC frames and exponentiating this quantity, the large
positive deltas, which represent the attack of musical instruments (including voice), are
accentuated while still maintaining a set of coefficients which are in the general range
of the original MFCCs. For readers not familiar with the term attack, it is the short
time segment when an instrument goes from silence to producing a sound, marked by
a steep increase in energy in certain frequencies of the signal. By taking this new set of
coefficients and doubling the feature space, a new set of 40-dimensional GMMs is produced.
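The transformation just described can be sketched as follows. The exact exponentiation used is not spelled out in the text, so exp() here is an illustrative stand-in:

```python
import numpy as np

def add_attack_feature(mfccs):
    """Frame-to-frame deltas, exponentiated to accentuate large positive
    deltas (instrument onsets), stacked with the original coefficients to
    double the feature space (20 -> 40 dimensions per frame)."""
    # delta of consecutive frames; repeat the first frame so shapes match
    delta = np.diff(mfccs, axis=0, prepend=mfccs[:1])
    attack = np.exp(delta)   # large positive deltas grow, negative ones shrink
    return np.hstack([mfccs, attack])
```

The doubled vectors feed the same K-means+EM pipeline as before, only with 40-dimensional Gaussians.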
Initial tests show up to a 4% increase in classification accuracy though some tests show
poorer performance. For the four style experiment, overall classification improved to
72.4%. The attack model seems to perform well on music containing steady beats
Electronica Grunge Prog-Rock+Art Rock Soul
Electronica 57.50 37.50 2.50 2.50
Grunge 0.76 58.33 14.39 26.52
Prog-Rock+Art Rock 0.76 40.91 55.30 3.03
Soul 4.62 30.77 13.85 50.77
Figure 5.1: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
Figure 5.2: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style indicated
at the top of that column.
(e.g. Punk-Pop) and does poorly when the beats are irregular (e.g. Electronic) or not
prominent (e.g. Country-Pop). Further work will involve attempting to limit the attack feature’s influence in such cases.
Figure 5.1 shows a confusion matrix of a classification that did poorly in com-
parison to its non-attack counterpart, Figure 4.6. Figure 5.2 shows a confusion matrix
for an average attack-based classification compared to its non-attack counterpart, Figure 4.7. Figure 5.3 shows a confusion matrix where attack modeling significantly improved classification over its non-attack counterpart, Figure 4.8.
I wish to thank David Cope for spawning this idea by describing how it is
necessary to base music similarity on pitch intervals and not absolute pitch in order to match human perception.
Country-Pop Electronica Prog-Rock+Art Rock Punk-Pop
Country-Pop 65.00 12.50 15.00 7.50
Electronica 5.43 60.29 34.29 0.00
Prog-Rock+Art Rock 0.00 4.29 95.71 0.00
Punk-Pop 3.19 5.32 4.51 86.98
Figure 5.3: A confusion matrix where each row shows the percentage of examples with
the actual label of that row that were labeled by the classifier with the style
indicated at the top of that column.
Chapter 6
A Practical Use
Using the existing models and generating MFCCs from new user provided
audio data (the user points to a directory containing MP3s), I have created a system
which makes recommendations for sorting and allows the user to specify the actual
style to which the new song belongs. Choices are sorted according to the average log
likelihood that each song belongs to the style set chosen by the user and the program
moves the actual audio data to directories based on the user’s choice, creating a usable
music sorting system. The following example run illustrates the program being used.
Notice that the songs are sorted by their likelihood. The classifier was correct in most
of the cases and where it failed, the desired style was always the second choice.
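The core of the interaction shown below — regex style search, likelihood-ranked choices, and moving the chosen file — might be sketched like this; the function and path names are my own, not the thesis program's:

```python
import os
import re
import shutil

def pick_styles(all_styles, pattern):
    """Match the modeled style list against a user substring/regex."""
    rx = re.compile(pattern)
    return sorted(s for s in all_styles if rx.search(s))

def sort_song(song_path, scores, music_root):
    """Move a song into the directory of its most likely style.
    scores: {style: average log-likelihood of this song under that style}."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    dest_dir = os.path.join(music_root, ranked[0])
    os.makedirs(dest_dir, exist_ok=True)
    shutil.move(song_path, os.path.join(dest_dir, os.path.basename(song_path)))
    return ranked  # full ranking, so the user can override the top choice
```

Returning the full ranking mirrors the session below, where the user may accept the default first choice or pick another.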
Please provide a substring (or RegEx) of desired style [Enter for Done]: Rock
0. Search Again
1. Acid_Rock
2. Adult_Alternative_Pop+Rock
3. Album_Rock
4. Alternative_Country-Rock
5. Alternative_Pop+Rock
6. American_Trad_Rock
7. Arena_Rock
8. Aussie_Rock
9. Blues-Rock
10. Boogie_Rock
11. British_Folk-Rock
12. British_Trad_Rock
13. Celtic_Rock
14. College_Rock
15. Comedy_Rock
16. Country-Rock
17. Experimental_Rock
18. Folk-Rock
19. Glam_Rock
20. Goth_Rock
21. Hard_Rock
22. Heartland_Rock
23. Indie_Rock
24. Jazz-Rock
25. Latin_Rock
26. New_Zealand_Rock
27. Noise-Rock
28. Pop+Rock
29. Prog-Rock+Art_Rock
30. Pub_Rock
31. Rap-Rock
32. Rock_+_Roll
33. Rock_en_Español
34. Rockabilly
35. Rocksteady
36. Roots_Rock
37. Soft_Rock
38. Southern_Rock
39. Space_Rock
40. Swedish_Pop+Rock
Choose a modeled style number: 3
Please provide a substring (or RegEx) of desired style [Enter for Done]: Jazz
0. Search Again
1. Acid_Jazz
2. Crossover_Jazz
3. Jazz-Funk
4. Jazz-Pop
5. Jazz-Rock
6. Smooth_Jazz
7. Vocal_Jazz
Choose a modeled style number: 5
Please provide a substring (or RegEx) of desired style [Enter for Done]: Electronic
0. Search Again
1. Electronica
2. Indie_Electronic
3. Progressive_Electronic
Choose a modeled style number: 2
Please provide a substring (or RegEx) of desired style [Enter for Done]: Reggae
0. Search Again
1. Contemporary_Reggae
2. Political_Reggae
3. Reggae
4. Reggae-Pop
5. Roots_Reggae
Choose a modeled style number: 5
Please provide a substring (or RegEx) of desired style [Enter for Done]:
Comparing to models.
1. Indie_Electronic
2. Jazz-Rock
3. Album_Rock
4. Roots_Reggae
Choose directory for Air-La_Femme_dArgent.mp3 [1]:
1. Indie_Electronic
2. Jazz-Rock
3. Album_Rock
4. Roots_Reggae
Choose directory for AphexTwin-4.mp3 [1]:
1. Jazz-Rock
2. Indie_Electronic
3. Album_Rock
4. Roots_Reggae
Choose directory for Boards_of_Canada-Music_Is_Math.mp3 [1]: 2
1. Jazz-Rock
2. Album_Rock
3. Roots_Reggae
4. Indie_Electronic
Choose directory for Clapton-I_Shot_The_Sheriff.mp3 [1]: 2
1. Album_Rock
2. Jazz-Rock
3. Roots_Reggae
4. Indie_Electronic
Choose directory for Faith_No_more-The_Real_Thing.mp3 [1]:
1. Jazz-Rock
2. Indie_Electronic
3. Album_Rock
4. Roots_Reggae
Choose directory for Jaco_Pastorius-Come_On_Come_Over.mp3 [1]:
1. Roots_Reggae
2. Jazz-Rock
3. Album_Rock
4. Indie_Electronic
Choose directory for Marley-Get_Up_Stand_Up.mp3 [1]:
1. Jazz-Rock
2. Indie_Electronic
3. Roots_Reggae
4. Album_Rock
Choose directory for MMW-Last_Chance_To_Dance_Trance.mp3 [1]:
1. Roots_Reggae
2. Jazz-Rock
3. Album_Rock
4. Indie_Electronic
Choose directory for Quasimodal-Creature_of_the_Night.mp3 [1]: 2
1. Jazz-Rock
2. Indie_Electronic
3. Roots_Reggae
4. Album_Rock
Choose directory for The_New_Mastersounds-Aint_No_Telling.mp3 [1]:
1. Album_Rock
2. Jazz-Rock
3. Roots_Reggae
4. Indie_Electronic
Choose directory for Zappa-CatholicGirls.mp3 [1]:
> ls -R music
music:
Album_Rock Indie_Electronic Jazz-Rock Roots_Reggae
music/Album_Rock:
Clapton-I_Shot_The_Sheriff.mp3 Zappa-CatholicGirls.mp3
Faith_No_more-The_Real_Thing.mp3
music/Indie_Electronic:
Air-La_Femme_dArgent.mp3 AphexTwin-4.mp3 Boards_of_Canada-Music_Is_Math.mp3
music/Jazz-Rock:
Jaco_Pastorius-Come_On_Come_Over.mp3 Quasimodal-Creature_of_the_Night.mp3
MMW-Last_Chance_To_Dance_Trance.mp3 The_New_Mastersounds-Aint_No_Telling.mp3
music/Roots_Reggae:
Marley-Get_Up_Stand_Up.mp3
Chapter 7
Conclusion
While style classification may be elusive due to its subjectivity, I hope I have
shown that via the application of existing techniques, the creation of a new feature and
the use of these techniques in a practical music classification system, it is indeed possible
to identify styles accurately enough to create valuable applications. New features and
techniques will certainly serve to increase the value of this field in the future.
Bibliography
[1] Jean-Julien Aucouturier and Francois Pachet. Improving timbre similarity: How high is the sky? Journal of Negative Results in Speech and Audio Sciences, 1(1), 2004.
[2] A. Berenzweig, B. Logan, D. Ellis, and B. Whitman. A large-scale evaluation of acoustic and subjective music similarity measures. The Computer Music Journal, 28(2), 2004.
[3] James Bergstra, Norman Casagrande, Dumitru Erhan, Douglas Eck, and Balazs Kégl. Aggregate features and AdaBoost for music classification. Machine Learning, 65(2–3), 2006.
[4] David Cope. Experiments in Musical Intelligence. A-R Editions, Madison, Wisconsin, 1996.
[5] D. Ellis, A. Berenzweig, and B. Whitman. The uspop2002 pop music data set. http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html.
[6] B. Logan, D. Ellis, and A. Berenzweig. Toward evaluation techniques for music similarity, 2003.
[7] Beth Logan. Mel frequency cepstral coefficients for music modeling. In Proceedings of the International Symposium on Music Information Retrieval (ISMIR), 2000.
[8] M. I. Mandel and D. P. Ellis. Song-level features and support vector machines for music classification. In Proceedings of the International Conference on Music Information Retrieval (ISMIR), 2005.
[9] I. Nabney. Netlab: Algorithms for Pattern Recognition. http://www.ncrg.aston.ac.uk/netlab/.
[10] Rui Xu and Donald Wunsch II. Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 2005.
Appendix A
Styles with content from more than 4 and fewer than 10 artists
Big_Beat
Blue-Eyed_Soul
British_Blues
British_Invasion
British_Metal
British_Psychedelia
Country-Pop
Dirty_South
Disco
East_Coast_Rap
Euro-Pop
Funk
Funk_Metal
Gangsta_Rap
Hair_Metal
Industrial_Metal
Jazz-Rock
Latin_Pop
Neo-Traditionalist_Country
New_Jack_Swing
Quiet_Storm
Rap-Rock
Ska-Punk
Third_Wave_Ska_Revival
West_Coast_Rap