Professional Documents
Culture Documents
Folk Music Classification Using Hidden Markov Models
Folk Music Classification Using Hidden Markov Models
Folk Music Classification Using Hidden Markov Models
Wei Chai
Media Laboratory
Massachusetts Institute of Technology
Cambridge, MA, U.S.A.
Barry Vercoe
Media Laboratory
Massachusetts Institute of Technology
Cambridge, MA, U.S.A.
Abstract
Automatic music classification is essential for implementing efficient music information retrieval systems;
meanwhile, it may shed light on the process of humans music perception. This paper describes our work on the
classification of folk music from different countries based on their monophonic melodies using hidden Markov
models. Music corpora of Irish, German and Austrian folk music in various symbolic formats were used as the data
set. Different representations and HMM structures were tested and compared. The classification performances
achieved 75%, 77% and 66% for 2-way classifications and 63% for 3-way classification using 6-state left-right
HMM with the interval representation in the experiment. This shows that the melodies of folk music do carry some
statistical features to distinguish them. We expect that the result will improve if we use a more discriminable data
set and the approach should be applicable to other music classification tasks and acoustic musical signals.
Furthermore, the results suggest to us a new way to think about musical style similarity.
Keywords
Music classification, hidden Markov model, music perception
1. Introduction
There are both scientific and practical reasons for
building computer systems that can identify
music style. We are interested in finding out how
a piece of music is put together, what aspects of
music characterize a style, and what it is that
distinguishes the musical sound of one culture
from that of another. However, there are currently
no developed scientific theories about how
humans can make rapid judgment about the
musics style from very short examples [11], yet
there are many applications in which music style
identification by computer would be useful. For
example, we would like to build computer
systems that can annotate musical multimedia
data, which will benefit music information
retrieval; we would also like to build systems for
music theory study and teaching. By building
such systems, we can learn a great deal about
how the human system accomplishes this task.
This paper describes our work on the
classification of folk music from different
countries based on their monophonic melodies
using hidden Markov models. The goals of this
research are: (1) to explore whether there exists
significant statistical difference among folk music
from different countries based on their melodies;
2. Approach
2.1 Data Set
We chose folk music as our experiment corpus,
because (1) most folk music pieces have obvious
monophonic melody lines which can be easily
modeled by HMM. Monophonic here means
only one single tone is heard at a time, and there
is no accompaniment or multiple lines going
simultaneously. (2) melodies of folk music from
different countries may have some significant
statistical difference, which should be able to be
captured by an appropriate statistical model.
2.2 Representations
The most obvious way to convert each melody
into a sequence is using the pitch sequence.
However, there are two problems to consider: (1)
Should we use the absolute pitch sequence or the
interval sequence? (2) Should we incorporate
rhythmic information and how to do so if
necessary?
In the experiment, we represented melodies
in four different ways.
(A) Absolute pitch representation. A melody
is converted into a pitch sequence by normalizing
the pitches into one octave from C4 (Middle C) to
B4, that is, pitches in different octaves but of
same chroma are encoded with the same symbol
in the sequence and thus there are totally 12
symbols.
(B)
Absolute
pitch
with
duration
representation.
To
incorporate
rhythm
2.3 HMMs
HMM (Hidden Markov Model) is a very
powerful tool to statistically model a process that
varies in time. It can be seen as a doubly
embedded stochastic process with a process that
is not observable (hidden process) and can only
be observed through another stochastic process
(observable process) that produces the time set of
observations. A HMM can be fully specified by
(1) N, the number of states in the model; (2) M,
the number of distinct observation symbols per
state; (3) A = {aij } , the state transition
probability distribution; (4) B = {b j (k )} , the
observation symbol probability distribution; and
(5) = { i } , the initial state distribution. [10]
S1
S2
S3
a11 a12 0
0 a a
22
23
0 0 a33
0 0 0
0
0
(a)
S1
S2
a11
0
S3
a12 a13
a1n
a2n
a3n
1
a22 a23
0 a33
3. Results
absolute pitch
absolute pitch with duration
interval
contour
(b)
S2
S3
a1n
a2n
a3n
ann
(c)
S1
S2
a11
a
21
a31
an1
S3
a12 a13
a22 a23
a32 a33
a1n
a2n
a3n
ann
an2 an3
(d)
S1
0 0 a33
an1 0 0
70
65
60
55
50
8
HMMs
10
12
14
16
14
16
(i)
classification of Irish and Austrian folk music (test set)
80
70
65
60
55
8
HMMs
(ii)
10
12
66
64
62
60
58
56
54
52
50
48
8
HMMs
10
12
14
16
(iii)
Table 2: Classification performances of 6-state leftright HMM using different representations. The first
three rows correspond to 2-way classifications. The
last row corresponds to the 3-way classification. I:
Irish music; G: German music; A: Austrian music.
Classes
I-G
I-A
G-A
I-G-A
60
55
50
rep.
A
68%
75%
63%
56%
rep.
B
68%
74%
58%
54%
rep.
C
75%
77%
66%
63%
rep.
D
72%
70%
58%
59%
45
40
4. Discussions
35
30
25
8
HMMs
10
12
14
16
(iv)
Figure 2: Classification performances using different
representations and HMMs. (i), (ii) and (iii)
correspond to 2-way classifications. (iv) corresponds
to the 3-way classification.
Table 1: 16 HMMs with different structures and
number of hidden states used in the experiment (see
Figure 1 for the description of the structures).
HMM
N
STRUC
HMM
N
STRUC
1 2 3 4 5 6
7 8
2 2 2 2 3 3
3 3
a b
c
d
a
b
c
d
9 10 11 12 13 14 15 16
4 4 4 4 6 6
6 6
a b
c
d
a
b
c
d
5. Conclusions
It has been shown how hidden Markov models
could be used to build classifiers based on
melody information of folk music. Folk music
from different countries does have significant
statistical difference in their respective melodies.
The interval representation generally performs
better than the absolute pitch representation or the
contour based representation.
For further research, our approach should be
applicable to other music classification tasks, for
example, classifying music by different
composers, of different ages or genres, etc.
Furthermore, we will explore the possibility of
combining
our
method
in
symbolic
representations with signal processing techniques
to build music classification systems on acoustic
References
[1]
[11]
[12]
[13]
[14]