Chroma and Chords: 1. Features For Music Audio 2. Chroma Features 3. Chord Recognition
Lecture 11:
Chroma and Chords
1. Features for Music Audio
2. Chroma Features
3. Chord Recognition
Dan Ellis
http://www.ee.columbia.edu/~dpwe/e4896/
2013-04-08 - 1 /18
Euclidean metaphor: music tracks as points in space
sound (timbre, instruments) -> MFCC
melody, chords -> Chroma
rhythm, tempo -> Rhythmic bases
E4896 Music Signal Processing (Dan Ellis)
MFCCs
Logan 2000
Sound -> FFT X[k] -> Mel scale freq. warp -> log |X[k]| -> IFFT -> Truncate -> MFCCs
[Figure: intermediate signals at each stage: spectra (freq / Hz), audspec (freq / Mel), cepstra (quefrency), MFCCs]
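The chain on this slide (FFT, Mel warping, log, inverse transform, truncation) can be sketched in a few lines. This is a minimal illustration, not Logan's or the course's implementation: the Mel formula, the triangular filters, the DCT-II standing in for the IFFT step, and all parameter values (`n_mel`, `n_keep`, the test frame) are assumptions made for the example. It uses only the standard library, so the DFT is deliberately naive.

```python
import cmath
import math

def hz_to_mel(f):
    # One common Mel approximation (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N/2 via a direct DFT (slow but dependency-free)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def mel_filterbank(n_mel, n_fft, sr):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    top = hz_to_mel(sr / 2.0)
    edges = [mel_to_hz(top * i / (n_mel + 1)) for i in range(n_mel + 2)]
    bank = []
    for j in range(1, n_mel + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * sr / n_fft
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sr, n_mel=20, n_keep=13):
    spec = magnitude_spectrum(frame)                       # FFT -> spectra
    bank = mel_filterbank(n_mel, len(frame), sr)
    audspec = [sum(w * s for w, s in zip(row, spec))       # Mel freq. warp
               for row in bank]
    log_aud = [math.log(a + 1e-10) for a in audspec]       # log, floored
    cepstra = [sum(log_aud[m] * math.cos(math.pi * q * (m + 0.5) / n_mel)
                   for m in range(n_mel))                  # DCT-II
               for q in range(n_mel)]
    return cepstra[:n_keep]                                # truncate

# Hypothetical test frame: a Hann-windowed 440 Hz sine.
sr = 8000
frame = [math.sin(2 * math.pi * 440 * t / sr)
         * (0.5 - 0.5 * math.cos(2 * math.pi * t / 255))
         for t in range(256)]
coeffs = mfcc(frame, sr)
```

Truncating the cepstrum keeps the slowly varying spectral envelope (timbre) and discards fine harmonic detail, which is why MFCCs say little about pitch.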
MFCC Example
[Figure: panels with freq / Hz and MFCC coefficient axes against time / sec]
Ellis 2007
[Figure: artist confusion matrix ("true" axis) over 20 artists: u2, tori_amos, suzanne_vega, steely_dan, roxette, radiohead, queen, prince, metallica, madonna, led_zeppelin, green_day, garth_brooks, fleetwood_mac, depeche_mode, dave_matthews_b, cure, creedence_c_r, beatles, aerosmith]
single Gaussian model: 20 (mean) + 10 x 19 (covariance) parameters
55% correct (guessing ~5%)
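A single-Gaussian classifier of this kind can be sketched as follows: fit one full-covariance Gaussian to each artist's feature frames, then label a test track by the model with the highest total log-likelihood. This is only an illustration under stated assumptions; the 2-D synthetic data stand in for 20-dimensional MFCC frames, and the function names, the artist labels, and the small diagonal regularizer are all hypothetical.

```python
import math
import random

def fit_gaussian(frames):
    """Return (mean, Cholesky factor L of the covariance) for feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(d)]
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
            + (1e-6 if i == j else 0.0)          # assumed tiny regularizer
            for j in range(d)] for i in range(d)]
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return mean, chol

def log_likelihood(x, model):
    """Log density of x under a full-covariance Gaussian."""
    mean, chol = model
    d = len(mean)
    y = []
    for i in range(d):  # forward substitution: solve L y = x - mean
        y.append((x[i] - mean[i]
                  - sum(chol[i][k] * y[k] for k in range(i))) / chol[i][i])
    log_det = 2.0 * sum(math.log(chol[i][i]) for i in range(d))
    return -0.5 * (sum(v * v for v in y) + log_det + d * math.log(2 * math.pi))

def classify(frames, models):
    """Pick the label whose Gaussian gives the highest summed log-likelihood."""
    return max(models, key=lambda name: sum(log_likelihood(f, models[name])
                                            for f in frames))

# Synthetic stand-in data: two "artists" with different feature centres.
rng = random.Random(0)

def fake_track(center, n=200):
    return [[rng.gauss(center, 1.0), rng.gauss(-center, 1.0)] for _ in range(n)]

models = {"artist_a": fit_gaussian(fake_track(0.0)),
          "artist_b": fit_gaussian(fake_track(5.0))}
guess = classify(fake_track(5.0, n=50), models)
```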
2. Chroma Features
melody spotting
chord recognition
cover songs...
e.g. sinusoidal tracking: confused by harmonics
[Figure: Recognized vs. True notes, MIDI note number axis]
Chroma Features
Fujishima 1999
[Figure: spectrogram (freq / kHz against time / sec), mapping from fft bin to chroma, and resulting chroma against time]
$$ C(b) = \sum_{k=0}^{N_M} B(12 \log_2(k/k_0) - b)\, W(k)\, |X[k]| $$
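The projection of FFT magnitudes onto 12 pitch classes can be sketched as below. This is a simplified illustration, not Fujishima's system: each bin is assigned to its single nearest semitone, the weighting W(k) is taken to be 1 inside an assumed frequency range and 0 outside, and the reference frequency `f_ref` (middle C, so that bin 0 is C) is an assumption.

```python
import cmath
import math

def dft_magnitude(frame, k):
    """|X[k]| for one bin via a direct DFT (slow but dependency-free)."""
    n = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(frame)))

def chroma_from_frame(frame, sr, f_ref=261.63, f_min=50.0, f_max=2000.0):
    """Sum |X[k]| into 12 chroma bins; bin 0 = C, bin 9 = A."""
    n = len(frame)
    chroma = [0.0] * 12
    for k in range(1, n // 2 + 1):
        f = k * sr / n
        if not (f_min <= f <= f_max):      # W(k) = 0 outside the range
            continue
        semitones = 12.0 * math.log2(f / f_ref)
        b = int(round(semitones)) % 12     # fold all octaves together
        chroma[b] += dft_magnitude(frame, k)
    return chroma

# Hypothetical test frame: a 440 Hz sine (the note A) that lands on an
# exact FFT bin, so there is no spectral leakage in this example.
sr = 8000
frame = [math.sin(2 * math.pi * 440 * t / sr) for t in range(400)]
chroma = chroma_from_frame(frame, sr)
strongest = max(range(12), key=lambda b: chroma[b])
```

With real audio the harmonics of each note also land in chroma bins (the third harmonic falls a fifth above the fundamental's pitch class), so the result is not a clean note indicator.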
Better Chroma
Problems:
Solutions:
[Figure: spectrogram (freq / kHz, freq / Hz against time / sec) and chroma (against time / frame)]
Chroma Resynthesis
$$ y_b(t) = \sum_{o=1} W\!\left(o + \tfrac{b}{12}\right) \cos\!\left(2^{\,o + b/12}\, w_0 t\right) $$
[Figure: spectrum (level / dB against freq / Hz) and spectrogram (freq against time / sec)]
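The resynthesis formula on this slide builds, for each chroma bin b, a tone made of octave-spaced sinusoids shaped by a weighting W. A minimal sketch follows; the Gaussian form and parameters of `shepard_weight`, the base frequency `f0` (C0, so that b = 0 is C), the number of octaves, and the frame length are all assumptions made for the example.

```python
import math

def shepard_weight(position, center=4.0, width=1.5):
    """Assumed Gaussian weighting W over log-frequency (in octaves above f0)."""
    return math.exp(-0.5 * ((position - center) / width) ** 2)

def shepard_tone(b, n_samples=400, sr=8000, f0=16.35, n_octaves=6):
    """Octave-spaced cosines for chroma bin b, following the slide's sum over o."""
    out = [0.0] * n_samples
    for o in range(1, n_octaves + 1):
        position = o + b / 12.0
        freq = f0 * 2.0 ** position          # 2^(o + b/12) * f0
        weight = shepard_weight(position)
        for t in range(n_samples):
            out[t] += weight * math.cos(2.0 * math.pi * freq * t / sr)
    return out

def resynthesize(chroma_frames, n_samples=400, sr=8000):
    """Mix the 12 tones per frame, scaled by that frame's chroma values."""
    tones = [shepard_tone(b, n_samples, sr) for b in range(12)]
    audio = []
    for frame in chroma_frames:
        for t in range(n_samples):
            audio.append(sum(frame[b] * tones[b][t] for b in range(12)))
    return audio

# Two hypothetical frames: pure C, then pure G.
c_frame = [1.0] + [0.0] * 11
g_frame = [0.0] * 7 + [1.0] + [0.0] * 4
audio = resynthesize([c_frame, g_frame])
```

Each frame restarts its oscillators at zero phase, so this sketch would click at frame boundaries; a usable version would keep phase continuous or cross-fade.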
Chroma Example
[Figure: spectrogram panels (freq / Hz) and chroma features (chroma bin, C to B) against time / sec]
Beat-Synchronous Chroma
[Figure: spectrogram panels (freq / Hz) and beat-synchronous chroma (chroma bin, C to B) against time / sec]
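One simple way to obtain beat-synchronous chroma is to average the frame-level chroma vectors that fall between consecutive beats. The sketch below assumes that averaging is the pooling rule and that beat times are already available from a beat tracker; the function name and arguments are hypothetical.

```python
def beat_sync_chroma(chroma_frames, frame_times, beat_times):
    """Average the chroma frames that fall inside each inter-beat interval."""
    pooled = []
    for start, end in zip(beat_times[:-1], beat_times[1:]):
        segment = [c for c, t in zip(chroma_frames, frame_times)
                   if start <= t < end]
        if segment:
            pooled.append([sum(col) / len(segment) for col in zip(*segment)])
        else:
            pooled.append([0.0] * 12)   # no frames between these two beats
    return pooled

# Hypothetical input: four frames at 0.25 s spacing, beats every 0.5 s.
frames = [[float(i)] * 12 for i in range(4)]
times = [0.0, 0.25, 0.5, 0.75]
beats = [0.0, 0.5, 1.0]
synced = beat_sync_chroma(frames, times, beats)
```

The result has one vector per beat, so the feature sequence no longer depends on tempo.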
3. Chord Recognition
[Figure: chroma (chroma bin against time / sec) annotated with note sets C-E-G, B-D-G, A-C-E, A-C-D-F]
Two approaches:
manual templates (prior knowledge)
learned models (from training data)
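The manual-template approach can be sketched as follows: build a binary chroma template for each major and minor triad and pick the one most similar to the observed chroma vector. The binary templates, the cosine-similarity score, and the label format are assumptions made for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """24 binary templates: root, third (major or minor), and fifth."""
    templates = {}
    for root in range(12):
        for quality, third in (("maj", 4), ("min", 3)):
            t = [0.0] * 12
            for interval in (0, third, 7):
                t[(root + interval) % 12] = 1.0
            templates[f"{NOTE_NAMES[root]}:{quality}"] = t
    return templates

def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def best_chord(chroma, templates):
    """Label of the template closest to the chroma vector."""
    return max(templates, key=lambda name: cosine(chroma, templates[name]))

templates = chord_templates()

def chroma_for(notes):
    """Hypothetical idealized chroma with equal energy on the given notes."""
    return [1.0 if name in notes else 0.0 for name in NOTE_NAMES]

label_ceg = best_chord(chroma_for({"C", "E", "G"}), templates)
label_ace = best_chord(chroma_for({"A", "C", "E"}), templates)
```

Templates need no training data, but they ignore how real instruments spread energy across harmonics, which is one motivation for learned models.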
Audio -> 100-1600 Hz BPF -> Chroma; 25-400 Hz BPF -> Chroma; Beat track -> beat-synchronous chroma features
train: Labels -> Resample; Root normalize -> Gaussian -> Unnormalize -> 24 Gauss models; Count transitions -> 24x24 transition matrix
test: HMM Viterbi -> chord labels
[Figure: example models C maj and c min over chroma bins C to B; transition matrix with axes C to B (major) and c to b (minor)]
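The "Count transitions" step that produces the 24x24 transition matrix can be sketched as below. The chord index convention (0 to 11 major, 12 to 23 minor) and the add-one smoothing are assumptions, as is the function name.

```python
def transition_matrix(label_sequences, n_states=24, smoothing=1.0):
    """Row-normalized counts of chord-to-chord transitions between beats."""
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for seq in label_sequences:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev][nxt] += 1.0
    # Each row becomes p(next chord | current chord).
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical beat-level labels: C maj (0), C maj, G maj (7), C maj, C maj.
A = transition_matrix([[0, 0, 7, 0, 0]])
```

Smoothing keeps unseen transitions from getting probability zero, which would otherwise make some chord sequences impossible to decode.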
HMMs
underlying Markov generative model
each state has emission distribution p(x|q)
observations tell us something about state...
infer smoothed state sequence

Transition probabilities p(q_{n+1} | q_n) (rows: q_n, columns: q_{n+1}):
       S    A    B    C    E
  S    0    1    0    0    0
  A    0   .8   .1   .1    0
  B    0   .1   .8   .1    0
  C    0   .1   .1   .7   .1
  E    0    0    0    0    1

State sequence: SAAAAAAAABBBBBBBBBCCCCBBBBBBCE
[Figure: state diagram; emission distributions p(x|q) for q=A, q=B, q=C against observation x; observation sequence x_n against time step n]
State sequence shown with the observations: AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC
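Inferring a smoothed state sequence from noisy observations is typically done with the Viterbi algorithm. The sketch below uses the A, B, C transition probabilities from this slide; the Gaussian emission parameters (`MEANS`, `SIGMA`) and the observation values are assumptions, since the slide only shows the emission distributions as a plot. The start state S is handled by forcing the path to begin in A, and the exit to E is ignored.

```python
import math

STATES = ["A", "B", "C"]
# p(q_n+1 | q_n) among the emitting states, from the slide's table.
TRANS = {
    "A": {"A": 0.8, "B": 0.1, "C": 0.1},
    "B": {"A": 0.1, "B": 0.8, "C": 0.1},
    "C": {"A": 0.1, "B": 0.1, "C": 0.7},  # remaining 0.1 goes to E (ignored)
}
# ASSUMED emission model: one Gaussian per state.
MEANS = {"A": 0.0, "B": 5.0, "C": 10.0}
SIGMA = 1.0

def log_emission(x, q):
    return (-0.5 * ((x - MEANS[q]) / SIGMA) ** 2
            - math.log(SIGMA * math.sqrt(2.0 * math.pi)))

def viterbi(obs):
    """Most likely state string for the observations (log domain)."""
    # S moves to A with probability 1, so every path starts in A.
    score = {q: (log_emission(obs[0], q) if q == "A" else -math.inf)
             for q in STATES}
    back = []
    for x in obs[1:]:
        new_score, pointers = {}, {}
        for q in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][q]))
            new_score[q] = (score[prev] + math.log(TRANS[prev][q])
                            + log_emission(x, q))
            pointers[q] = prev
        score = new_score
        back.append(pointers)
    q = max(STATES, key=lambda s: score[s])
    path = [q]
    for pointers in reversed(back):   # trace back from the best final state
        q = pointers[q]
        path.append(q)
    return "".join(reversed(path))

def framewise(obs):
    """Per-frame maximum-likelihood state, with no transition model."""
    return "".join(max(STATES, key=lambda q: log_emission(x, q)) for x in obs)

obs = [0.1, -0.2, 2.6, 0.0, 5.1, 4.8, 5.2, 9.9, 10.3, 10.0]
smoothed = viterbi(obs)
unsmoothed = framewise(obs)
```

The third observation (2.6) is slightly closer to B's mean than to A's, so the frame-by-frame decision flips to B for one step, while the Viterbi path stays in A because two extra transitions cost more than the small emission gain.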
HMM Inference
HMM defines emission distribution p(x|q) and transition probabilities p(q_n | q_{n-1})
Likelihood of observed given state sequence:
$$ p(\{x_n\} \mid \{q_n\}) = \prod_n p(x_n \mid q_n) $$
By dynamic programming,

Model M1, transition probabilities (rows: from, columns: to):
        A     B     E
  S    0.9   0.1
  A    0.7   0.2   0.1
  B          0.8   0.2

[Figure: M1 state diagram (states S, A, B, E); observations x1, x2, x3 with observation likelihoods p(x|A), p(x|B); state paths against time n]

Paths, with p(Q | M) = \prod_n p(q_n | q_{n-1}):
  S A A A E: .9 x .7 x .7 x .1 = 0.0441
  S A A B E: .9 x .7 x .2 x .2 = 0.0252
  S A B B E: .9 x .2 x .8 x .2 = 0.0288
  S B B B E: .1 x .8 x .8 x .2 = 0.0128
  Sum over paths = 0.1109
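The path probabilities for model M1 can be checked by brute-force enumeration and compared with a dynamic-programming (forward) recursion over the same transitions. This sketch covers only the transition part p(Q | M); the function names are hypothetical, and emission likelihoods are left out because the slide gives no numeric values for them.

```python
from itertools import product

# Transition probabilities of model M1 as listed on the slide.
TRANS = {
    "S": {"A": 0.9, "B": 0.1},
    "A": {"A": 0.7, "B": 0.2, "E": 0.1},
    "B": {"B": 0.8, "E": 0.2},
}

def path_probability(path):
    """Product of p(q_n | q_n-1) along a state string such as 'SAABE'."""
    p = 1.0
    for prev, nxt in zip(path[:-1], path[1:]):
        p *= TRANS.get(prev, {}).get(nxt, 0.0)
    return p

def all_paths(n_obs=3):
    """Every S ... E path with n_obs emitting states and nonzero probability."""
    paths = {}
    for middle in product("AB", repeat=n_obs):
        path = "S" + "".join(middle) + "E"
        p = path_probability(path)
        if p > 0.0:
            paths[path] = p
    return paths

def forward_total(n_obs=3):
    """Same total via dynamic programming, without listing the paths."""
    alpha = {q: TRANS["S"].get(q, 0.0) for q in "AB"}
    for _ in range(n_obs - 1):
        alpha = {q: sum(alpha[p] * TRANS[p].get(q, 0.0) for p in "AB")
                 for q in "AB"}
    return sum(alpha[q] * TRANS[q]["E"] for q in "AB")

paths = all_paths()
total = sum(paths.values())
```

Enumeration grows exponentially with the number of observations, while the forward recursion does a fixed amount of work per time step; that is the point of using dynamic programming here.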
Key Normalization
Chord transition probabilities should be key-relative
estimate main key of piece
rotate all chroma features
learn models
[Figure: aligned chroma for Taxman, Love You To, Yellow Submarine, Eleanor Rigby]
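Key normalization amounts to a circular shift of every chroma vector so that the estimated key lands on bin 0. A minimal sketch follows; the key estimator shown (best match between the mean chroma and a binary major-scale mask) is only an assumed stand-in, since the slide does not say how the main key is estimated.

```python
# Binary mask of the major scale starting on bin 0 (assumed reference profile).
MAJOR_SCALE = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]

def estimate_key(frames):
    """Rotation (0-11) at which the mean chroma best matches the scale mask."""
    mean = [sum(col) / len(frames) for col in zip(*frames)]

    def score(k):
        return sum(mean[(k + i) % 12] * MAJOR_SCALE[i] for i in range(12))

    return max(range(12), key=score)

def rotate_chroma(frames, key_index):
    """Circularly shift each frame so that bin key_index moves to bin 0."""
    return [frame[key_index:] + frame[:key_index] for frame in frames]

# Hypothetical piece in G major (bin 7): notes G A B C D E F#.
g_major = [0.0] * 12
for note in (7, 9, 11, 0, 2, 4, 6):
    g_major[note] = 1.0
piece = [list(g_major) for _ in range(3)]

key = estimate_key(piece)
aligned = rotate_chroma(piece, key)
```

After rotation, pieces in different keys share one set of chord models and one transition matrix, expressed relative to the tonic.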
Chord Recognition
Often works:
[Figure: Let It Be/06-Let It Be. Panels: Audio (freq / Hz), Beat-synchronous chroma (chroma bins A to G), Recognized chords, Ground truth chord. Chord labels shown include A:min, F, G, A:min/b7, F:maj7, F:maj6]
Summary
Chroma Features
Chord Recognition
References
B. Logan, Mel frequency cepstral coefficients for music modeling, in Proc. Int.
Symp. Music Inf. Retrieval ISMIR, Plymouth, September 2000.
D. Ellis, Classifying Music Audio with Timbral and Chroma Features, in Proc.
Int. Symp. Music Inf. Retrieval ISMIR-07, pp. 339-340, Vienna, October 2007.
T. Fujishima, Realtime chord recognition of musical sound: A system using
Common Lisp Music, in Proc. Int. Comp. Music Conf., pp. 464-467, Beijing, 1999.
D. Ellis and G. Poliner, Identifying Cover Songs With Chroma Features and
Dynamic Programming Beat Tracking, Proc. ICASSP-07, pp. IV-1429-1432,
Hawai'i, April 2007.
M. A. Bartsch and G. H. Wakefield, To catch a chorus: Using chroma-based
representations for audio thumbnailing, in Proc. IEEE WASPAA, Mohonk,
October 2001.
A. Sheh and D. Ellis, Chord Segmentation and Recognition using EM-Trained
Hidden Markov Models, Int. Symp. Music Inf. Retrieval ISMIR-03, pp. 185-191,
Baltimore, October 2003.