ELEN E4896 MUSIC SIGNAL PROCESSING

Lecture 11: Chroma and Chords
1. Features for Music Audio
2. Chroma Features
3. Chord Recognition

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e4896/
2013-04-08

1. Features for Music Audio

Challenges of large music databases:
  how to find what we want...

Euclidean metaphor: music tracks as points in space.
What are the dimensions?
  sound - timbre, instruments  ->  MFCCs
  melody, chords               ->  chroma
  rhythm, tempo                ->  rhythmic bases

MFCCs  (Logan 2000)

The standard feature for speech recognition.

Processing chain (each stage shown as a plot on the slide):
  sound  ->  FFT |X[k]|                         (spectra, freq / Hz)
         ->  mel-scale freq. warp + log|X[k]|   (audspec, freq / Mel)
         ->  IFFT                               (cepstra, quefrency)
         ->  truncate                           (MFCCs)
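
As a rough illustration of the chain above (not the course implementation), here is a minimal numpy sketch; the Hann window, 40-band triangular mel filterbank, and 13 retained coefficients are assumed choices.

# Minimal sketch of the MFCC chain: FFT -> mel warp -> log -> IFFT -> truncate.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_fft, sr, n_mels=40, fmin=0.0, fmax=None):
    """Triangular filters mapping |FFT| bins onto a mel axis."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)    # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)    # falling slope
    return fb

def mfcc(frame, sr, n_mels=40, n_ceps=13):
    """One frame of audio -> truncated cepstrum (MFCCs)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))   # FFT |X[k]|
    audspec = mel_filterbank(len(frame), sr, n_mels) @ spectrum      # mel warp
    cepstrum = np.fft.irfft(np.log(audspec + 1e-10))                 # log + IFFT
    return cepstrum[:n_ceps]                                         # truncate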

MFCC Example

Resynthesize by imposing the MFCC spectral envelope on noise:
  MFCCs capture instruments, not notes.

[Figure: "Let It Be" log-freq spectrogram (LIB-1), the MFCC coefficients over
 time, and the noise-excited MFCC resynthesis (LIB-2); 0-25 s, 300-6000 Hz shown.]
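
A hedged sketch of that resynthesis idea: recover a smoothed spectral envelope from the truncated cepstrum and impose it on white noise. It reuses the hypothetical mel_filterbank() helper above and assumes ceps came from the mfcc() sketch; the exact resynthesis behind the audio examples may differ.

import numpy as np

def mfcc_to_noise_frame(ceps, n_fft, sr, n_mels=40):
    # Zero-pad the truncated cepstrum and undo the IFFT step: a forward FFT of
    # the cepstrum recovers a smoothed log mel spectrum.
    full = np.zeros(2 * (n_mels - 1))
    full[:len(ceps)] = ceps
    log_mel = np.real(np.fft.rfft(full))        # smoothed log mel envelope
    mel_env = np.exp(log_mel)                   # back to linear amplitude
    # Map the mel envelope back onto linear-frequency FFT bins
    # (transposing the filterbank is a crude stand-in for a pseudo-inverse).
    fb = mel_filterbank(n_fft, sr, n_mels)      # (n_mels, n_fft//2 + 1)
    lin_env = fb.T @ mel_env
    # Impose the envelope on a white-noise spectrum and return one time frame.
    noise_spec = np.fft.rfft(np.random.randn(n_fft))
    return np.fft.irfft(noise_spec * lin_env, n=n_fft)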

MFCC Artist Classification  (Ellis 2007)

20 artists x 6 albums each:
  train models on 5 albums, classify tracks from the last
  model as MFCC mean + covariance per artist

Single Gaussian model:
  20 (mean) + 10 x 19 (covariance) parameters
  55% correct (guessing ~5%)

[Figure: confusion matrix, true vs. predicted artist, "Confusion: MFCCs (acc 55.13%)".
 Artists: aerosmith, beatles, creedence_c_r, cure, dave_matthews_b, depeche_mode,
 fleetwood_mac, garth_brooks, green_day, led_zeppelin, madonna, metallica, prince,
 queen, radiohead, roxette, steely_dan, suzanne_vega, tori_amos, u2.]
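
A minimal sketch of the classifier described on this slide, assuming per-track matrices of MFCC frames are already available; the function names and the use of scipy's multivariate_normal are illustrative choices, not the original code.

import numpy as np
from scipy.stats import multivariate_normal

def train_artist_models(train_frames):
    """train_frames: dict artist -> (n_frames, n_mfcc) array of MFCC frames."""
    return {artist: (X.mean(axis=0), np.cov(X, rowvar=False))
            for artist, X in train_frames.items()}

def classify_track(track_frames, models):
    """Pick the artist whose Gaussian maximizes the track's total log-likelihood."""
    scores = {artist: multivariate_normal(mean, cov, allow_singular=True)
                        .logpdf(track_frames).sum()
              for artist, (mean, cov) in models.items()}
    return max(scores, key=scores.get)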

2. Chroma Features

What about modeling tonal content (notes)?
  melody spotting
  chord recognition
  cover songs...

MFCCs exclude tonal content,
but polyphonic transcription is too hard:
  e.g. sinusoidal tracking is confused by harmonics.

[Figure: recognized vs. true notes (MIDI note numbers 40-75) over 22-34 s.]

Chroma features as a solution...

Chroma Features  (Fujishima 1999)

Idea: project all energy onto 12 semitones, regardless of octave
  maintains the main musical distinctions
  invariant to octave equivalence
  no need to worry about harmonics?

[Figure: spectrogram (freq / kHz), the fft-bin-to-chroma mapping, and the
 resulting chromagram (chroma vs. time / frame).]

  C(b) = \sum_{k=0}^{N} B\big(12 \log_2(k/k_0) - b\big)\, W(k)\, |X[k]|

where W(k) is a weighting and B(\cdot) selects the bins belonging to chroma b
(arguments ~ 0 mod 12).
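
A minimal sketch of that projection: map each FFT bin's magnitude to the pitch class of its bin-center frequency. The band-limiting stand-in for W(k), the A440 reference, and the normalization are assumed details, not Fujishima's exact PCP.

import numpy as np

def chroma(frame, sr, fmin=55.0, fmax=2000.0):
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    C = np.zeros(12)
    for k in range(1, len(freqs)):                 # skip DC
        f = freqs[k]
        if f < fmin or f > fmax:                   # crude band limit as W(k)
            continue
        semis = 12.0 * np.log2(f / 440.0)          # semitones relative to A440
        b = int(np.round(semis)) % 12              # B(.): pitch class (0 = A here)
        C[b] += mag[k]
    return C / (C.max() + 1e-10)                   # normalize for display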

Better Chroma

Problems:
  blurring of bins close to semitone edges
  limited FFT bin resolution

Solutions:
  peak picking - only keep energy at the centers of spectral peaks
  instantaneous frequency - high-resolution frequency estimates
  adapt the tuning center based on a histogram of pitches

[Figure: spectrogram (freq / kHz vs. time / sec) and the peak-picked
 chromagram (chroma vs. time / frame).]
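
A sketch of the peak-picking idea, under the assumption that "peaks" means strict local maxima of the magnitude spectrum; the mask would be applied before the chroma projection sketched earlier.

import numpy as np

def spectral_peaks(mag):
    """Boolean mask of bins that are strict local maxima of |X[k]|."""
    peaks = np.zeros_like(mag, dtype=bool)
    peaks[1:-1] = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    return peaks

# Usage: keep only peak energy before projecting to chroma,
# e.g.  mag = mag * spectral_peaks(mag)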

Chroma Resynthesis  (Ellis & Poliner 2007)

Chroma describes the notes in an octave
  ... but not the octave
Can resynthesize by presenting all octaves
  ... under a smooth spectral envelope
Shepard tones - the octave is ambiguous:

  y_b(t) = \sum_{o=1}^{M} W\!\left(o + \tfrac{b}{12}\right) \cos\!\left(2\pi\, 2^{\,o + b/12}\, w_0 t\right)

  endless-sequence illusion

[Figure: the 12 Shepard tone spectra (level / dB vs. freq / Hz) and a Shepard
 tone resynthesis spectrogram (freq / kHz vs. time / sec).]
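
A sketch of Shepard-tone resynthesis following the formula above; the raised-cosine envelope W, the 7-octave range, and the C1 base frequency (about 32.7 Hz) are illustrative choices.

import numpy as np

def shepard_tone(b, dur=0.5, sr=22050, n_oct=7, f0=32.7):
    """One Shepard tone for chroma bin b (0 = C ... 11 = B)."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(n_oct):
        pos = (o + b / 12.0) / n_oct                    # 0..1 position in the range
        w = 0.5 - 0.5 * np.cos(2 * np.pi * pos)         # smooth envelope W(.)
        y += w * np.cos(2 * np.pi * f0 * 2 ** (o + b / 12.0) * t)
    return y / (np.max(np.abs(y)) + 1e-10)

def resynthesize(chromagram, frame_dur=0.1, sr=22050):
    """Sum Shepard tones weighted by each frame of a (12, n_frames) chromagram."""
    frames = [sum(c[b] * shepard_tone(b, frame_dur, sr) for b in range(12))
              for c in chromagram.T]
    return np.concatenate(frames)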

Chroma Example

Simple Shepard tone resynthesis;
  can also reimpose the broad spectrum from MFCCs.

[Figure: "Let It Be" log-freq spectrogram (LIB-1), the chroma features,
 Shepard tone resynthesis of the chroma (LIB-3), and MFCC-filtered Shepard
 tones (LIB-4); 0-25 s, 300-6000 Hz shown.]

Beat-Synchronous Chroma  (Bartsch & Wakefield 2001)

Drastically reduce data size
  by recording one chroma frame per beat.

[Figure: "Let It Be" log-freq spectrogram (LIB-1), onset envelope + beat times,
 the beat-synchronous chroma, and beat-synchronous chroma + Shepard
 resynthesis (LIB-6); 0-25 s.]
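
A sketch of the beat-averaging step, assuming beat times are already available as frame indices from a beat tracker; averaging (rather than, say, taking the median) within each beat is an assumed choice.

import numpy as np

def beat_sync_chroma(chromagram, beat_frames):
    """chromagram: (12, n_frames); beat_frames: increasing frame indices of beats."""
    bounds = np.concatenate(([0], beat_frames, [chromagram.shape[1]]))
    return np.stack([chromagram[:, s:e].mean(axis=1)          # one 12-vector per beat
                     for s, e in zip(bounds[:-1], bounds[1:]) if e > s], axis=1)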

3. Chord Recognition

Beat-synchronous chroma look like chords
  e.g. C-E-G, A-C-E, B-D-G, A-C-D-F, ...
  can we transcribe them?

[Figure: beat-synchronous chromagram (chroma bin vs. time / sec) with chord
 spellings marked.]

Two approaches:
  manual templates (prior knowledge)
  learned models (from training data)
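
A sketch of the manual-template approach: correlate each beat-synchronous chroma vector with binary major/minor triad templates (24 chords) and take the best match per beat, with no smoothing over time. The template construction and names are illustrative.

import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'Eb', 'E', 'F', 'F#', 'G', 'Ab', 'A', 'Bb', 'B']

def chord_templates():
    """(24, 12) array: rows 0-11 are major triads, 12-23 minor triads."""
    maj, minor = np.zeros((12, 12)), np.zeros((12, 12))
    for root in range(12):
        maj[root, [root, (root + 4) % 12, (root + 7) % 12]] = 1     # root, M3, P5
        minor[root, [root, (root + 3) % 12, (root + 7) % 12]] = 1   # root, m3, P5
    return np.vstack([maj, minor])

def label_beats(beat_chroma):
    """beat_chroma: (12, n_beats), chroma bin 0 = C.  Returns one chord name per beat."""
    T = chord_templates()
    scores = T @ (beat_chroma / (np.linalg.norm(beat_chroma, axis=0) + 1e-10))
    best = scores.argmax(axis=0)
    return [NOTE_NAMES[i % 12] + ('' if i < 12 else ':min') for i in best]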

Chord Recognition System  (Sheh & Ellis 2003)

Analogous to speech recognition:
  Gaussian models of the features for each chord
  Hidden Markov Model for the chord transitions

[System diagram:
  Audio -> band-pass filters (100-1600 Hz and 25-400 Hz) -> chroma
        -> beat track -> beat-synchronous chroma features
  Train: chord labels -> resample to beats -> root normalize -> Gaussian
         -> unnormalize -> 24 Gaussian chord models (e.g. C maj, c min);
         count label transitions -> 24 x 24 transition matrix (C..B, c..b)
  Test:  HMM Viterbi decode -> chord sequence]
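
A simplified sketch of the root-normalize / Gaussian / unnormalize step in the diagram, assuming beat-synchronous chroma with aligned (root, quality) chord labels; it is an outline of the idea, not the Sheh & Ellis code.

import numpy as np

def train_chord_gaussians(beat_chroma, labels):
    """beat_chroma: (12, n_beats); labels: list of (root 0-11, 'maj' or 'min')."""
    pooled = {'maj': [], 'min': []}
    for x, (root, quality) in zip(beat_chroma.T, labels):
        pooled[quality].append(np.roll(x, -root))           # rotate chord root to bin 0
    models = {}
    for quality, frames in pooled.items():
        if not frames:
            continue
        X = np.array(frames)
        mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)    # one Gaussian per quality
        for root in range(12):                               # unnormalize to 24 chords
            perm = np.roll(np.arange(12), root)              # rotate stats back to this root
            models[(root, quality)] = (mu[perm], cov[np.ix_(perm, perm)])
    return models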

HMMs

Hidden Markov Models are good for inferring hidden states:
  an underlying Markov generative model
  each state has an emission distribution p(x|q)

Transition probabilities p(q_{n+1} | q_n) for states S, A, B, C, E
(rows = q_n, columns = q_{n+1}):

        S     A     B     C     E
   S    0    1.0    0     0     0
   A    0     .8    .1    .1    0
   B    0     .1    .8    .1    0
   C    0     .1    .1    .7    .1
   E    0     0     0     0    1.0

A sampled state sequence:  SAAAAAAAABBBBBBBBBCCCCBBBBBBCE

[Figure: emission distributions p(x|q) for q = A, B, C; an observation
 sequence x_n over ~30 time steps; and the inferred smoothed state sequence
 AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC.]

Observations tell us something about the state...
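
A sketch of the generative view: sample a state path from the transition table above, then an observation from each state's emission distribution. The unit-variance Gaussian emissions with means 1, 2, 3 are stand-ins for the plotted p(x|q).

import numpy as np

STATES = ['S', 'A', 'B', 'C', 'E']
TRANS = np.array([[0, 1.0, 0,  0,  0 ],     # from S
                  [0, .8,  .1, .1, 0 ],     # from A
                  [0, .1,  .8, .1, 0 ],     # from B
                  [0, .1,  .1, .7, .1],     # from C
                  [0, 0,   0,  0,  1.0]])   # from E (absorbing)
EMIT_MEAN = {'A': 1.0, 'B': 2.0, 'C': 3.0}  # assumed Gaussian means, unit variance

def sample(rng=np.random.default_rng(0)):
    q, path, obs = 0, [], []                # start in S
    while STATES[q] != 'E':
        q = rng.choice(5, p=TRANS[q])       # draw next state
        if STATES[q] != 'E':
            path.append(STATES[q])
            obs.append(rng.normal(EMIT_MEAN[STATES[q]], 1.0))
    return ''.join(path), np.array(obs)     # e.g. 'AAAABBBBBCCCB...'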

HMM Inference

An HMM defines the emission distributions p(x|q)
and the transition probabilities p(q_n | q_{n-1}).

Likelihood of the observations together with a state sequence:

  p(\{x_n\}, \{q_n\}) = \prod_n p(x_n | q_n)\, p(q_n | q_{n-1})

Worked example, model M1 with states S, A, B, E:

  transitions:  S -> A .9, S -> B .1;  A -> A .7, A -> B .2, A -> E .1;
                B -> B .8, B -> E .2
  observation likelihoods p(x_n|q) for x_1, x_2, x_3:
                q = A:  2.5   0.2   0.1
                q = B:  0.1   2.2   2.3

All possible 3-emission paths Q_k from S to E:

  path        p(Q|M) = prod p(q_n|q_{n-1})   p(X|Q,M) = prod p(x_n|q_n)   p(X,Q|M)
  S A A A E   .9 x .7 x .7 x .1 = 0.0441     2.5 x 0.2 x 0.1 =  0.05      0.0022
  S A A B E   .9 x .7 x .2 x .2 = 0.0252     2.5 x 0.2 x 2.3 =  1.15      0.0290
  S A B B E   .9 x .2 x .8 x .2 = 0.0288     2.5 x 2.2 x 2.3 = 12.65      0.3643
  S B B B E   .1 x .8 x .8 x .2 = 0.0128     0.1 x 2.2 x 2.3 =  0.506     0.0065
  sum                              0.1109                    p(X|M)    =  0.4020

By dynamic programming, we can also identify the best state sequence given
just the observations.
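
The same computation by brute-force enumeration of the four valid paths, with the transition and emission numbers copied from the slide; a forward-recursion (dynamic-programming) version would sum the same terms without enumerating paths.

import itertools
import numpy as np

TRANS = {('S', 'A'): .9, ('S', 'B'): .1,
         ('A', 'A'): .7, ('A', 'B'): .2, ('A', 'E'): .1,
         ('B', 'B'): .8, ('B', 'E'): .2}
EMIT = {'A': [2.5, 0.2, 0.1], 'B': [0.1, 2.2, 2.3]}   # p(x_n|q) for x1, x2, x3

total = 0.0
for q123 in itertools.product('AB', repeat=3):
    path = ('S',) + q123 + ('E',)
    p_Q = np.prod([TRANS.get((a, b), 0.0) for a, b in zip(path[:-1], path[1:])])
    p_X_given_Q = np.prod([EMIT[q][n] for n, q in enumerate(q123)])
    if p_Q > 0:                                        # skip impossible paths (B -> A)
        print(path, round(p_Q, 4), round(p_X_given_Q, 4), round(p_Q * p_X_given_Q, 4))
    total += p_Q * p_X_given_Q
print('p(X|M) =', round(total, 4))                     # 0.402, matching the slide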

Key Normalization

Chord transitions depend on the key of the piece
  (dominant, relative minor, etc.)

Chord transition probabilities should be key-relative:
  estimate the main key of the piece
  rotate all chroma features accordingly
  learn models on the aligned chroma

[Figure: key-aligned chromagrams for "Taxman", "Eleanor Rigby", "I'm Only
 Sleeping", "Love You To", "She Said She Said", "Good Day Sunshine", "And Your
 Bird Can Sing", "Yellow Submarine", plus the aligned global model.]
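
A sketch of the rotation step, using the chroma bin with the most total energy as a crude stand-in for a real key estimator (which the actual system would use); names are illustrative.

import numpy as np

def align_to_key(chromagram):
    """chromagram: (12, n_frames).  Returns (rotated chromagram, estimated key bin)."""
    key = int(chromagram.sum(axis=1).argmax())          # crude main-key estimate
    return np.roll(chromagram, -key, axis=0), key       # key bin -> row 0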

Chord Recognition

Often works:

[Figure: audio "Let It Be/06-Let It Be" (log-freq spectrogram), its
 beat-synchronous chroma, the recognized chords (e.g. F, A:min, G), and the
 ground-truth chords (e.g. A:min, A:min/b7, F:maj7, F:maj6); 0-18 s.]

But only about 60% of the time.

Summary

Music Audio Features
  capture information useful for classification

Chroma Features
  12 bins to robustly summarize notes

Chord Recognition
  sometimes easy, sometimes subtle

References

B. Logan, "Mel frequency cepstral coefficients for music modeling," in Proc. Int.
Symp. Music Inf. Retrieval (ISMIR), Plymouth, September 2000.

D. Ellis, "Classifying Music Audio with Timbral and Chroma Features," in Proc. Int.
Symp. Music Inf. Retrieval (ISMIR-07), pp. 339-340, Vienna, October 2007.

T. Fujishima, "Realtime chord recognition of musical sound: A system using Common
Lisp Music," in Proc. Int. Comp. Music Conf., pp. 464-467, Beijing, 1999.

D. Ellis and G. Poliner, "Identifying Cover Songs With Chroma Features and Dynamic
Programming Beat Tracking," in Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007.

M. A. Bartsch and G. H. Wakefield, "To catch a chorus: Using chroma-based
representations for audio thumbnailing," in Proc. IEEE WASPAA, Mohonk, October 2001.

A. Sheh and D. Ellis, "Chord Segmentation and Recognition using EM-Trained Hidden
Markov Models," in Proc. Int. Symp. Music Inf. Retrieval (ISMIR-03), pp. 185-191,
Baltimore, October 2003.
