ELEN E4896 MUSIC SIGNAL PROCESSING

Lecture 11: Chroma and Chords
1. Features for Music Audio
2. Chroma Features
3. Chord Recognition

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu
http://www.ee.columbia.edu/~dpwe/e4896/
2013-04-08

1. Features for Music Audio

Challenges of large music databases:
  how to find what we want...

Euclidean metaphor: music tracks as points in space.
What are the dimensions?
  sound - timbre, instruments  ->  MFCCs
  melody, chords               ->  chroma
  rhythm, tempo                ->  rhythmic bases

MFCCs  (Logan 2000)

The standard feature for speech recognition.

Processing chain (each stage shown as a plot on the slide):
  sound  ->  FFT |X[k]|                         (spectra, freq / Hz)
         ->  mel-scale freq. warp + log|X[k]|   (audspec, freq / Mel)
         ->  IFFT                               (cepstra, quefrency)
         ->  truncate                           (MFCCs)
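
As a rough illustration of the chain above (not the course implementation), here is a minimal numpy sketch; the Hann window, 40-band triangular mel filterbank, and 13 retained coefficients are assumed choices.

# Minimal sketch of the MFCC chain: FFT -> mel warp -> log -> IFFT -> truncate.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_fft, sr, n_mels=40, fmin=0.0, fmax=None):
    """Triangular filters mapping |FFT| bins onto a mel axis."""
    fmax = fmax or sr / 2.0
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fb[m - 1, k] = (k - l) / max(c - l, 1)    # rising slope
        for k in range(c, r):
            fb[m - 1, k] = (r - k) / max(r - c, 1)    # falling slope
    return fb

def mfcc(frame, sr, n_mels=40, n_ceps=13):
    """One frame of audio -> truncated cepstrum (MFCCs)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))   # FFT |X[k]|
    audspec = mel_filterbank(len(frame), sr, n_mels) @ spectrum      # mel warp
    cepstrum = np.fft.irfft(np.log(audspec + 1e-10))                 # log + IFFT
    return cepstrum[:n_ceps]                                         # truncate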

MFCC Example

Resynthesize by imposing the MFCC spectral envelope on noise:
  MFCCs capture instruments, not notes.

[Figure: "Let It Be" log-freq spectrogram (LIB-1), the MFCC coefficients over
 time, and the noise-excited MFCC resynthesis (LIB-2); 0-25 s, 300-6000 Hz shown.]
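
A hedged sketch of that resynthesis idea: recover a smoothed spectral envelope from the truncated cepstrum and impose it on white noise. It reuses the hypothetical mel_filterbank() helper above and assumes ceps came from the mfcc() sketch; the exact resynthesis behind the audio examples may differ.

import numpy as np

def mfcc_to_noise_frame(ceps, n_fft, sr, n_mels=40):
    # Zero-pad the truncated cepstrum and undo the IFFT step: a forward FFT of
    # the cepstrum recovers a smoothed log mel spectrum.
    full = np.zeros(2 * (n_mels - 1))
    full[:len(ceps)] = ceps
    log_mel = np.real(np.fft.rfft(full))        # smoothed log mel envelope
    mel_env = np.exp(log_mel)                   # back to linear amplitude
    # Map the mel envelope back onto linear-frequency FFT bins
    # (transposing the filterbank is a crude stand-in for a pseudo-inverse).
    fb = mel_filterbank(n_fft, sr, n_mels)      # (n_mels, n_fft//2 + 1)
    lin_env = fb.T @ mel_env
    # Impose the envelope on a white-noise spectrum and return one time frame.
    noise_spec = np.fft.rfft(np.random.randn(n_fft))
    return np.fft.irfft(noise_spec * lin_env, n=n_fft)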

MFCC Artist Classification  (Ellis 2007)

20 artists x 6 albums each:
  train models on 5 albums, classify tracks from the last
  model as MFCC mean + covariance per artist

Single Gaussian model:
  20 (mean) + 10 x 19 (covariance) parameters
  55% correct (guessing ~5%)

[Figure: confusion matrix, true vs. predicted artist, "Confusion: MFCCs (acc 55.13%)".
 Artists: aerosmith, beatles, creedence_c_r, cure, dave_matthews_b, depeche_mode,
 fleetwood_mac, garth_brooks, green_day, led_zeppelin, madonna, metallica, prince,
 queen, radiohead, roxette, steely_dan, suzanne_vega, tori_amos, u2.]
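
A minimal sketch of the classifier described on this slide, assuming per-track matrices of MFCC frames are already available; the function names and the use of scipy's multivariate_normal are illustrative choices, not the original code.

import numpy as np
from scipy.stats import multivariate_normal

def train_artist_models(train_frames):
    """train_frames: dict artist -> (n_frames, n_mfcc) array of MFCC frames."""
    return {artist: (X.mean(axis=0), np.cov(X, rowvar=False))
            for artist, X in train_frames.items()}

def classify_track(track_frames, models):
    """Pick the artist whose Gaussian maximizes the track's total log-likelihood."""
    scores = {artist: multivariate_normal(mean, cov, allow_singular=True)
                        .logpdf(track_frames).sum()
              for artist, (mean, cov) in models.items()}
    return max(scores, key=scores.get)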

2. Chroma Features

What about modeling tonal content (notes)?
  melody spotting
  chord recognition
  cover songs...

MFCCs exclude tonal content,
but polyphonic transcription is too hard:
  e.g. sinusoidal tracking is confused by harmonics.

[Figure: recognized vs. true notes (MIDI note numbers 40-75) over 22-34 s.]

Chroma features as a solution...

Chroma Features  (Fujishima 1999)

Idea: project all energy onto 12 semitones, regardless of octave
  maintains the main musical distinctions
  invariant to octave equivalence
  no need to worry about harmonics?

[Figure: spectrogram (freq / kHz), the fft-bin-to-chroma mapping, and the
 resulting chromagram (chroma vs. time / frame).]

  C(b) = \sum_{k=0}^{N} B\big(12 \log_2(k/k_0) - b\big)\, W(k)\, |X[k]|

where W(k) is a weighting and B(\cdot) selects the bins belonging to chroma b
(arguments ~ 0 mod 12).
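
A minimal sketch of that projection: map each FFT bin's magnitude to the pitch class of its bin-center frequency. The band-limiting stand-in for W(k), the A440 reference, and the normalization are assumed details, not Fujishima's exact PCP.

import numpy as np

def chroma(frame, sr, fmin=55.0, fmax=2000.0):
    n = len(frame)
    mag = np.abs(np.fft.rfft(frame * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    C = np.zeros(12)
    for k in range(1, len(freqs)):                 # skip DC
        f = freqs[k]
        if f < fmin or f > fmax:                   # crude band limit as W(k)
            continue
        semis = 12.0 * np.log2(f / 440.0)          # semitones relative to A440
        b = int(np.round(semis)) % 12              # B(.): pitch class (0 = A here)
        C[b] += mag[k]
    return C / (C.max() + 1e-10)                   # normalize for display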

Better Chroma

Problems:
  blurring of bins close to semitone edges
  limited FFT bin resolution

Solutions:
  peak picking - only keep energy at the centers of spectral peaks
  instantaneous frequency - high-resolution frequency estimates
  adapt the tuning center based on a histogram of pitches

[Figure: spectrogram (freq / kHz vs. time / sec) and the peak-picked
 chromagram (chroma vs. time / frame).]
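
A sketch of the peak-picking idea, under the assumption that "peaks" means strict local maxima of the magnitude spectrum; the mask would be applied before the chroma projection sketched earlier.

import numpy as np

def spectral_peaks(mag):
    """Boolean mask of bins that are strict local maxima of |X[k]|."""
    peaks = np.zeros_like(mag, dtype=bool)
    peaks[1:-1] = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
    return peaks

# Usage: keep only peak energy before projecting to chroma,
# e.g.  mag = mag * spectral_peaks(mag)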

Chroma Resynthesis  (Ellis & Poliner 2007)

Chroma describes the notes in an octave
  ... but not the octave
Can resynthesize by presenting all octaves
  ... under a smooth spectral envelope
Shepard tones - the octave is ambiguous:

  y_b(t) = \sum_{o=1}^{M} W\!\left(o + \tfrac{b}{12}\right) \cos\!\left(2\pi\, 2^{\,o + b/12}\, w_0 t\right)

  endless-sequence illusion

[Figure: the 12 Shepard tone spectra (level / dB vs. freq / Hz) and a Shepard
 tone resynthesis spectrogram (freq / kHz vs. time / sec).]
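
A sketch of Shepard-tone resynthesis following the formula above; the raised-cosine envelope W, the 7-octave range, and the C1 base frequency (about 32.7 Hz) are illustrative choices.

import numpy as np

def shepard_tone(b, dur=0.5, sr=22050, n_oct=7, f0=32.7):
    """One Shepard tone for chroma bin b (0 = C ... 11 = B)."""
    t = np.arange(int(dur * sr)) / sr
    y = np.zeros_like(t)
    for o in range(n_oct):
        pos = (o + b / 12.0) / n_oct                    # 0..1 position in the range
        w = 0.5 - 0.5 * np.cos(2 * np.pi * pos)         # smooth envelope W(.)
        y += w * np.cos(2 * np.pi * f0 * 2 ** (o + b / 12.0) * t)
    return y / (np.max(np.abs(y)) + 1e-10)

def resynthesize(chromagram, frame_dur=0.1, sr=22050):
    """Sum Shepard tones weighted by each frame of a (12, n_frames) chromagram."""
    frames = [sum(c[b] * shepard_tone(b, frame_dur, sr) for b in range(12))
              for c in chromagram.T]
    return np.concatenate(frames)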

Chroma Example

Simple Shepard tone resynthesis;
  can also reimpose the broad spectrum from MFCCs.

[Figure: "Let It Be" log-freq spectrogram (LIB-1), the chroma features,
 Shepard tone resynthesis of the chroma (LIB-3), and MFCC-filtered Shepard
 tones (LIB-4); 0-25 s, 300-6000 Hz shown.]

Beat-Synchronous Chroma  (Bartsch & Wakefield 2001)

Drastically reduce data size
  by recording one chroma frame per beat.

[Figure: "Let It Be" log-freq spectrogram (LIB-1), onset envelope + beat times,
 the beat-synchronous chroma, and beat-synchronous chroma + Shepard
 resynthesis (LIB-6); 0-25 s.]
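
A sketch of the beat-averaging step, assuming beat times are already available as frame indices from a beat tracker; averaging (rather than, say, taking the median) within each beat is an assumed choice.

import numpy as np

def beat_sync_chroma(chromagram, beat_frames):
    """chromagram: (12, n_frames); beat_frames: increasing frame indices of beats."""
    bounds = np.concatenate(([0], beat_frames, [chromagram.shape[1]]))
    return np.stack([chromagram[:, s:e].mean(axis=1)          # one 12-vector per beat
                     for s, e in zip(bounds[:-1], bounds[1:]) if e > s], axis=1)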

3. Chord Recognition

Beat-synchronous chroma look like chords
  e.g. C-E-G, A-C-E, B-D-G, A-C-D-F, ...
  can we transcribe them?

[Figure: beat-synchronous chromagram (chroma bin vs. time / sec) with chord
 spellings marked.]

Two approaches:
  manual templates (prior knowledge)
  learned models (from training data)
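
A sketch of the manual-template approach: correlate each beat-synchronous chroma vector with binary major/minor triad templates (24 chords) and take the best match per beat, with no smoothing over time. The template construction and names are illustrative.

import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'Eb', 'E', 'F', 'F#', 'G', 'Ab', 'A', 'Bb', 'B']

def chord_templates():
    """(24, 12) array: rows 0-11 are major triads, 12-23 minor triads."""
    maj, minor = np.zeros((12, 12)), np.zeros((12, 12))
    for root in range(12):
        maj[root, [root, (root + 4) % 12, (root + 7) % 12]] = 1     # root, M3, P5
        minor[root, [root, (root + 3) % 12, (root + 7) % 12]] = 1   # root, m3, P5
    return np.vstack([maj, minor])

def label_beats(beat_chroma):
    """beat_chroma: (12, n_beats), chroma bin 0 = C.  Returns one chord name per beat."""
    T = chord_templates()
    scores = T @ (beat_chroma / (np.linalg.norm(beat_chroma, axis=0) + 1e-10))
    best = scores.argmax(axis=0)
    return [NOTE_NAMES[i % 12] + ('' if i < 12 else ':min') for i in best]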

Chord Recognition System  (Sheh & Ellis 2003)

Analogous to speech recognition:
  Gaussian models of the features for each chord
  Hidden Markov Model for the chord transitions

[System diagram:
  Audio -> band-pass filters (100-1600 Hz and 25-400 Hz) -> chroma
        -> beat track -> beat-synchronous chroma features
  Train: chord labels -> resample to beats -> root normalize -> Gaussian
         -> unnormalize -> 24 Gaussian chord models (e.g. C maj, c min);
         count label transitions -> 24 x 24 transition matrix (C..B, c..b)
  Test:  HMM Viterbi decode -> chord sequence]
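
A simplified sketch of the root-normalize / Gaussian / unnormalize step in the diagram, assuming beat-synchronous chroma with aligned (root, quality) chord labels; it is an outline of the idea, not the Sheh & Ellis code.

import numpy as np

def train_chord_gaussians(beat_chroma, labels):
    """beat_chroma: (12, n_beats); labels: list of (root 0-11, 'maj' or 'min')."""
    pooled = {'maj': [], 'min': []}
    for x, (root, quality) in zip(beat_chroma.T, labels):
        pooled[quality].append(np.roll(x, -root))           # rotate chord root to bin 0
    models = {}
    for quality, frames in pooled.items():
        if not frames:
            continue
        X = np.array(frames)
        mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)    # one Gaussian per quality
        for root in range(12):                               # unnormalize to 24 chords
            perm = np.roll(np.arange(12), root)              # rotate stats back to this root
            models[(root, quality)] = (mu[perm], cov[np.ix_(perm, perm)])
    return models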

HMMs

Hidden Markov Models are good for inferring hidden states:
  an underlying Markov generative model
  each state has an emission distribution p(x|q)

Transition probabilities p(q_{n+1} | q_n) for states S, A, B, C, E
(rows = q_n, columns = q_{n+1}):

        S     A     B     C     E
   S    0    1.0    0     0     0
   A    0     .8    .1    .1    0
   B    0     .1    .8    .1    0
   C    0     .1    .1    .7    .1
   E    0     0     0     0    1.0

A sampled state sequence:  SAAAAAAAABBBBBBBBBCCCCBBBBBBCE

[Figure: emission distributions p(x|q) for q = A, B, C; an observation
 sequence x_n over ~30 time steps; and the inferred smoothed state sequence
 AAAAAAAABBBBBBBBBBBCCCCBBBBBBBC.]

Observations tell us something about the state...
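
A sketch of the generative view: sample a state path from the transition table above, then an observation from each state's emission distribution. The unit-variance Gaussian emissions with means 1, 2, 3 are stand-ins for the plotted p(x|q).

import numpy as np

STATES = ['S', 'A', 'B', 'C', 'E']
TRANS = np.array([[0, 1.0, 0,  0,  0 ],     # from S
                  [0, .8,  .1, .1, 0 ],     # from A
                  [0, .1,  .8, .1, 0 ],     # from B
                  [0, .1,  .1, .7, .1],     # from C
                  [0, 0,   0,  0,  1.0]])   # from E (absorbing)
EMIT_MEAN = {'A': 1.0, 'B': 2.0, 'C': 3.0}  # assumed Gaussian means, unit variance

def sample(rng=np.random.default_rng(0)):
    q, path, obs = 0, [], []                # start in S
    while STATES[q] != 'E':
        q = rng.choice(5, p=TRANS[q])       # draw next state
        if STATES[q] != 'E':
            path.append(STATES[q])
            obs.append(rng.normal(EMIT_MEAN[STATES[q]], 1.0))
    return ''.join(path), np.array(obs)     # e.g. 'AAAABBBBBCCCB...'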

HMM Inference

An HMM defines the emission distributions p(x|q)
and the transition probabilities p(q_n | q_{n-1}).

Likelihood of the observations together with a state sequence:

  p(\{x_n\}, \{q_n\}) = \prod_n p(x_n | q_n)\, p(q_n | q_{n-1})

Worked example, model M1 with states S, A, B, E:

  transitions:  S -> A .9, S -> B .1;  A -> A .7, A -> B .2, A -> E .1;
                B -> B .8, B -> E .2
  observation likelihoods p(x_n|q) for x_1, x_2, x_3:
                q = A:  2.5   0.2   0.1
                q = B:  0.1   2.2   2.3

All possible 3-emission paths Q_k from S to E:

  path        p(Q|M) = prod p(q_n|q_{n-1})   p(X|Q,M) = prod p(x_n|q_n)   p(X,Q|M)
  S A A A E   .9 x .7 x .7 x .1 = 0.0441     2.5 x 0.2 x 0.1 =  0.05      0.0022
  S A A B E   .9 x .7 x .2 x .2 = 0.0252     2.5 x 0.2 x 2.3 =  1.15      0.0290
  S A B B E   .9 x .2 x .8 x .2 = 0.0288     2.5 x 2.2 x 2.3 = 12.65      0.3643
  S B B B E   .1 x .8 x .8 x .2 = 0.0128     0.1 x 2.2 x 2.3 =  0.506     0.0065
  sum                              0.1109                    p(X|M)    =  0.4020

By dynamic programming, we can also identify the best state sequence given
just the observations.
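
The same computation by brute-force enumeration of the four valid paths, with the transition and emission numbers copied from the slide; a forward-recursion (dynamic-programming) version would sum the same terms without enumerating paths.

import itertools
import numpy as np

TRANS = {('S', 'A'): .9, ('S', 'B'): .1,
         ('A', 'A'): .7, ('A', 'B'): .2, ('A', 'E'): .1,
         ('B', 'B'): .8, ('B', 'E'): .2}
EMIT = {'A': [2.5, 0.2, 0.1], 'B': [0.1, 2.2, 2.3]}   # p(x_n|q) for x1, x2, x3

total = 0.0
for q123 in itertools.product('AB', repeat=3):
    path = ('S',) + q123 + ('E',)
    p_Q = np.prod([TRANS.get((a, b), 0.0) for a, b in zip(path[:-1], path[1:])])
    p_X_given_Q = np.prod([EMIT[q][n] for n, q in enumerate(q123)])
    if p_Q > 0:                                        # skip impossible paths (B -> A)
        print(path, round(p_Q, 4), round(p_X_given_Q, 4), round(p_Q * p_X_given_Q, 4))
    total += p_Q * p_X_given_Q
print('p(X|M) =', round(total, 4))                     # 0.402, matching the slide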

Key Normalization

Chord transitions depend on the key of the piece
  (dominant, relative minor, etc.)

Chord transition probabilities should be key-relative:
  estimate the main key of the piece
  rotate all chroma features accordingly
  learn models on the aligned chroma

[Figure: key-aligned chromagrams for "Taxman", "Eleanor Rigby", "I'm Only
 Sleeping", "Love You To", "She Said She Said", "Good Day Sunshine", "And Your
 Bird Can Sing", "Yellow Submarine", plus the aligned global model.]
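
A sketch of the rotation step, using the chroma bin with the most total energy as a crude stand-in for a real key estimator (which the actual system would use); names are illustrative.

import numpy as np

def align_to_key(chromagram):
    """chromagram: (12, n_frames).  Returns (rotated chromagram, estimated key bin)."""
    key = int(chromagram.sum(axis=1).argmax())          # crude main-key estimate
    return np.roll(chromagram, -key, axis=0), key       # key bin -> row 0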

Chord Recognition

Often works:

[Figure: audio "Let It Be/06-Let It Be" (log-freq spectrogram), its
 beat-synchronous chroma, the recognized chords (e.g. F, A:min, G), and the
 ground-truth chords (e.g. A:min, A:min/b7, F:maj7, F:maj6); 0-18 s.]

But only about 60% of the time.

Summary

Music Audio Features
  capture information useful for classification

Chroma Features
  12 bins to robustly summarize notes

Chord Recognition
  sometimes easy, sometimes subtle

References

B. Logan, "Mel frequency cepstral coefficients for music modeling," in Proc. Int.
Symp. Music Inf. Retrieval (ISMIR), Plymouth, September 2000.

D. Ellis, "Classifying Music Audio with Timbral and Chroma Features," in Proc. Int.
Symp. Music Inf. Retrieval (ISMIR-07), pp. 339-340, Vienna, October 2007.

T. Fujishima, "Realtime chord recognition of musical sound: A system using Common
Lisp Music," in Proc. Int. Comp. Music Conf., pp. 464-467, Beijing, 1999.

D. Ellis and G. Poliner, "Identifying Cover Songs With Chroma Features and Dynamic
Programming Beat Tracking," in Proc. ICASSP-07, pp. IV-1429-1432, Hawai'i, April 2007.

M. A. Bartsch and G. H. Wakefield, "To catch a chorus: Using chroma-based
representations for audio thumbnailing," in Proc. IEEE WASPAA, Mohonk, October 2001.

A. Sheh and D. Ellis, "Chord Segmentation and Recognition using EM-Trained Hidden
Markov Models," in Proc. Int. Symp. Music Inf. Retrieval (ISMIR-03), pp. 185-191,
Baltimore, October 2003.
