Professional Documents
Culture Documents
Pitch Tracking
Pitch Tracking
Pitch Tracking
Lecture 8:
Pitch Tracking
1. Pitch Tracking
2. Spectral Approaches
3. Time Domain
4. Example algorithms
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 1 /17
1. Pitch Tracking
• Pitch is a big part
of hearing
orthogonal to
formants (vowels)
in speech
follows fundamental
frequency (f0) of sounds
because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
for coding / representation (telephony)
for extracting information
• .. but can be tricky
sounds without clear f0
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 2 /17
Challenges to Pitch Extraction
• Pitch extraction challenges include...
Talkin, 1995
1
Voice Activity
0
0.5 1 1.5 2 2.5 3 3.5
Detection
time / s
40 40
unclear f0
dB
20 20
0 0
-20 -20
0 1000 2000 3000 4000 0 1000 2000 3000 4000
freq / Hz
note segmentation
ampl.
0.2 0.2
0.1 0.1
0 0
-0.1 -0.1
-0.2 -0.2
0.75 0.76 0.77 0.78 0.79 3.3 3.31 3.32 3.33 3.34
time / s
3000
Frequency
2000
1000
0
0.5 1 1.5 2 2.5 3 3.5
Time
4000
3000
Frequency
2000
1000
0
0.5 1 1.5 2 2.5 3 3.5
Time
cross-terms...
identify-remove-repeat?
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 4 /17
2. Spectral Approaches
• Just pick the most intense spectral peak?
4
freq / kHz
0
4
freq / kHz
0
0.5 1 1.5 2 2.5 3 3.5 time / s
matched =
filter ... 0
0 20 40 60 80 100 120 140 160
8000
4000
2000
1000
500
250
8000
4000
2000
1000
500
250
• Spectrum 40
20
40
20
shows periodicity 0 0
at f0 -20
0 1000 2000 3000 4000
-20
0 1000 2000 3000 freq / Hz
0 0
0 200 400 600 800 1000 0 200 400 600 quefrency
4
freq / kHz
0
100
quefrency
80
60
40
20
0
4
freq / kHz
0
0.5 1 1.5 2 2.5 3 3.5 time / s
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 9 /17
3. Time Domain Approaches
Gold & Rabiner 1969
ampl.
0.75 0.76 0.77 0.78 0.79 3.3 3.31 3.32 3.33 3.34
time / s
• Directly measures
0.2
0.1
0
-0.1
(autoco is inverse FT of
20
10
magnitude spectrum) 0
-10
0 0.01 0.02 0.03 0 0.01 0.02 0.03 lag / s
4
freq / kHz
0
lag / ms
0
lag / ms
0
0.5 1 1.5 2 2.5 3 3.5 time / s
• Median filtering
only knows best point
• Dynamic programming (Viterbi path)
can look at all underlying values
raw
median31
lag / ms
6
viterbi
4
0
0 0.5 1 1.5 2 2.5 3 3.5 time / s
-1
-2
-3
-4
sqrt(apty)
0.5
0
sqrt(power)
0
0.5 1 1.5 2 2.5 3 3.5 time / s
• Also performs
note segmentation
and polyphony
• Spectrum
Combine multiple harmonics to f0
• Time
Autocorrelation directly measures
periodicity