Pitch Tracking

ELEN E4896 MUSIC SIGNAL PROCESSING
Lecture 8: Pitch Tracking
1. Pitch Tracking
2. Spectral Approaches
3. Time Domain
4. Example algorithms
Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 1 /17
1. Pitch Tracking
• Pitch is a big part of hearing
orthogonal to formants (vowels) in speech
follows fundamental frequency (f0) of sounds
because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
for coding / representation (telephony)
for extracting information
• ... but can be tricky
sounds without clear f0
Challenges to Pitch Extraction
• Pitch extraction challenges include...
Talkin, 1995
noise / multiple f0s
voice activity detection
unclear f0
note segmentation
[Figure: spectrogram of “Oh, I’m just about to lose my mind...” (freq / kHz vs. time / s), two spectral slices (dB vs. freq / Hz), and two waveform excerpts (ampl. vs. time / s)]

Multiple f0s
• Common in music (“polyphony”)
[Figure: two spectrograms, Frequency (0 to 4000) vs. Time (0.5 to 3.5)]
cross-terms...
identify-remove-repeat?
2. Spectral Approaches
• Just pick the most intense spectral peak?
[Figure: two spectrogram panels, freq / kHz (0 to 4) vs. time / s]
fundamental may not be strongest...
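A minimal sketch of why the most intense peak can mislead, assuming a synthetic tone whose third harmonic is stronger than its fundamental. The sample rate, f0, and harmonic amplitudes are illustrative choices, not values from the lecture example.

```python
import numpy as np

fs = 8000                       # sample rate in Hz (assumed)
t = np.arange(2048) / fs        # about a quarter second of signal
f0 = 200.0                      # true fundamental in Hz (assumed)

# Hypothetical tone: weak fundamental, strong 3rd harmonic.
x = (0.2 * np.sin(2 * np.pi * f0 * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
     + 1.0 * np.sin(2 * np.pi * 3 * f0 * t))

# Magnitude spectrum of the Hann-windowed frame.
spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

# "Pitch" taken as the frequency of the most intense bin.
peak_hz = freqs[np.argmax(spectrum)]
print(f"strongest peak near {peak_hz:.0f} Hz, true f0 = {f0:.0f} Hz")
```

Here the peak picker lands near 3·f0 rather than f0, which is the failure the slide points at.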

Spectral Templates
• “Filter” log-freq spectrum with template
log-freq spectrum * template = matched filter ...
[Figure: log-frequency spectrum, harmonic template, and matched-filter output (bin index axes); two log-frequency spectrograms (250 to 8000 Hz) vs. time / s]
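The template idea can be sketched as follows, assuming a magnitude spectrum resampled onto a log-frequency axis and a template of impulses at the fixed offsets log2(k) octaves. The function name `template_pitch`, the 1/k template weights, and all parameter values are hypothetical choices for illustration.

```python
import numpy as np

def template_pitch(x, fs, fmin=50.0, bins_per_octave=48, n_octaves=6, n_harm=5):
    """Estimate f0 by matching a harmonic template on a log-frequency axis."""
    n_fft = 8192
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    lin_freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    # Resample the magnitude spectrum onto log-spaced frequencies.
    n_bins = bins_per_octave * n_octaves
    log_freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    log_spec = np.interp(log_freqs, lin_freqs, mag)
    # On a log axis, harmonic k sits a fixed log2(k) octaves above f0,
    # so one template fits every candidate pitch.
    k = np.arange(1, n_harm + 1)
    offsets = np.round(bins_per_octave * np.log2(k)).astype(int)
    template = np.zeros(offsets[-1] + 1)
    template[offsets] = 1.0 / k          # decaying weights (assumed)
    # Matched filter: cross-correlate the spectrum with the template.
    score = np.correlate(log_spec, template, mode="valid")
    return log_freqs[np.argmax(score)]

fs = 8000
t = np.arange(2048) / fs
f0_true = 200.0
amps = [1.0, 0.8, 0.6, 0.5, 0.4]         # illustrative harmonic amplitudes
x = sum(a * np.sin(2 * np.pi * (i + 1) * f0_true * t) for i, a in enumerate(amps))
f0_est = template_pitch(x, fs)
print(f"estimated f0: {f0_est:.1f} Hz")
```

Because the template is the same shape at every position, a single correlation scores all candidate pitches at once.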

Cepstrum Noll, 1967
• Spectrum shows periodicity at f0
reveal via Fourier transform of log|Y| = cepstrum
[Figure: two log-magnitude spectra (dB vs. freq / Hz) with their cepstra (vs. quefrency); spectrogram (freq / kHz) and cepstrogram (quefrency) vs. time / s]
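A minimal cepstrum pitch sketch, assuming a single frame and a search restricted to a plausible quefrency range. The function name `cepstrum_pitch`, the FFT size, and the test signal are assumptions for illustration.

```python
import numpy as np

def cepstrum_pitch(x, fs, fmin=80.0, fmax=400.0):
    """Estimate f0 from the largest cepstral peak in a quefrency range."""
    n_fft = 4096
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    log_mag = np.log(mag + 1e-9)            # small floor avoids log(0)
    # Inverse FFT of the log spectrum; for a real, symmetric spectrum this
    # matches a forward transform up to scaling.
    cep = np.fft.irfft(log_mag)
    q_min = int(fs / fmax)                  # shortest period considered (samples)
    q_max = int(fs / fmin)                  # longest period considered (samples)
    q_peak = q_min + np.argmax(cep[q_min:q_max + 1])
    return fs / q_peak

fs = 8000
t = np.arange(2048) / fs
f0_true = 200.0
rng = np.random.default_rng(0)
# Band-limited pulse train (equal-amplitude harmonics) plus a little noise.
x = sum(np.cos(2 * np.pi * k * f0_true * t) for k in range(1, 20))
x = x + 0.01 * rng.standard_normal(len(t))
f0_est = cepstrum_pitch(x, fs)
print(f"estimated f0: {f0_est:.1f} Hz")
```

The regular harmonic spacing in log|Y| becomes a single peak at a quefrency equal to the pitch period.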

“Specmurt” Sagayama et al., 2004
• Log-frequency spectrum is like a convolution of pitch peaks and a harmonics profile
esp. for multiple notes by one instrument (piano)
• Separate out by deconvolution
inverse Fourier transform of log-f spectrum V(y)
divide out harmonic template H(y)
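The deconvolution step can be sketched on a toy example, assuming an idealized log-frequency spectrum built from two notes that share one harmonic profile. The bin layout, note positions, and harmonic amplitudes are hypothetical. The sketch uses a forward FFT followed by an inverse FFT; the direction of the transform pair is only a convention for this purpose.

```python
import numpy as np

bins_per_octave = 12                     # semitone resolution (assumed)
n_bins = 96                              # eight octaves of log-frequency bins

# Hypothetical "pitch peaks" u(x): two notes a fifth apart.
u_true = np.zeros(n_bins)
u_true[[24, 31]] = [1.0, 0.7]

# Common harmonic profile h(x): harmonic k lies log2(k) octaves above f0.
k = np.arange(1, 5)
offsets = np.round(bins_per_octave * np.log2(k)).astype(int)   # bins 0, 12, 19, 24
h = np.zeros(n_bins)
h[offsets] = [1.0, 0.4, 0.25, 0.15]      # assumed amplitudes; keeps H away from zero

# Observed log-frequency spectrum v(x) = u(x) convolved with h(x).
v = np.real(np.fft.ifft(np.fft.fft(u_true) * np.fft.fft(h)))

# Deconvolve: transform, divide out the template, transform back.
V = np.fft.fft(v)
H = np.fft.fft(h)
u_est = np.real(np.fft.ifft(V / H))

print("peaks in v:    ", np.flatnonzero(v > 1e-6))
print("peaks in u_est:", np.flatnonzero(u_est > 1e-6))
```

The division is only well behaved when the transformed template has no near-zero values; a real harmonic profile would likely need regularization, which this sketch leaves out.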

Sinusoid Tracks
Maher & Beauchamp 1994
• Do sinusoid tracking before pitch tracking

apply templates / sieve on tracks
consider each strong sinusoid as a candidate
limits to clearly periodic energy (normalized?)
[Figure: two spectrogram panels, freq / kHz (0 to 4) vs. time / s]
3. Time Domain Approaches
Gold & Rabiner 1969
• Periodicity is primarily time-domain
[Figure: two waveform excerpts, ampl. vs. time / s]
time-domain processing avoids FFT
• Early approaches: Look for common gap between wave features

Autocorrelation Sondhi 1968
• Directly measures period of self-similarity
(autocorrelation is the inverse FT of the power spectrum, i.e. the squared magnitude spectrum)
[Figure: two waveform excerpts (vs. time / s) with their autocorrelations (vs. lag / s); spectrogram (freq / kHz) and autocorrelation (lag / ms) vs. time / s]
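A minimal autocorrelation pitch sketch, assuming a single frame and computing the autocorrelation as the inverse FFT of the squared magnitude spectrum. The function name `autocorr_pitch` and the search limits are assumptions for illustration.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=60.0, fmax=500.0):
    """Estimate f0 from the largest autocorrelation peak in a lag range."""
    x = x - np.mean(x)
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)                 # zero-pad to avoid circular wrap-around
    acf = np.fft.irfft(np.abs(spec) ** 2)[:n]    # inverse FT of the power spectrum
    lag_min = int(fs / fmax)                     # shortest period considered (samples)
    lag_max = int(fs / fmin)                     # longest period considered (samples)
    lag = lag_min + np.argmax(acf[lag_min:lag_max + 1])
    return fs / lag

fs = 8000
t = np.arange(1024) / fs
f0_true = 200.0
x = (np.sin(2 * np.pi * f0_true * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0_true * t)
     + 0.3 * np.sin(2 * np.pi * 3 * f0_true * t))
f0_est = autocorr_pitch(x, fs)
print(f"estimated f0: {f0_est:.1f} Hz")
```

Excluding lags shorter than `lag_min` is what keeps the trivial maximum at lag 0 from winning.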

Post-processing
• Can assume & exploit pitch smoothness
Talkin, 1995
• Median filtering
only knows best point
• Dynamic programming (Viterbi path)
can look at all underlying values
[Figure: pitch tracks (lag / ms vs. time / s): raw, median31, viterbi]
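The difference between the two smoothers can be sketched on a toy score matrix, assuming one row of candidate scores per frame. The names `median_filter`, `viterbi_track`, `scores`, and `jump_penalty` are hypothetical, and the linear jump penalty is just one simple choice of transition cost.

```python
import numpy as np

def median_filter(values, width=3):
    """Running median over per-frame best candidates (edges are repeated)."""
    half = width // 2
    padded = np.pad(values, half, mode="edge")
    return np.array([np.median(padded[i:i + width]) for i in range(len(values))])

def viterbi_track(scores, jump_penalty=0.5):
    """Best path through scores[frame, candidate] with a penalty on jumps."""
    n_frames, n_cands = scores.shape
    idx = np.arange(n_cands)
    trans = -jump_penalty * np.abs(idx[:, None] - idx[None, :])   # trans[prev, cur]
    total = scores[0].copy()
    back = np.zeros((n_frames, n_cands), dtype=int)
    for t in range(1, n_frames):
        cand = total[:, None] + trans            # cand[prev, cur]
        back[t] = np.argmax(cand, axis=0)        # best predecessor for each cur
        total = cand[back[t], idx] + scores[t]
    path = np.zeros(n_frames, dtype=int)
    path[-1] = np.argmax(total)
    for t in range(n_frames - 1, 0, -1):         # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path

# Toy scores: candidate 10 is always good, but for three frames a
# spurious candidate 20 scores slightly higher.
scores = np.zeros((9, 30))
scores[:, 10] = 1.0
scores[3:6, 20] = 1.2

raw = np.argmax(scores, axis=1)          # per-frame best point
smoothed = median_filter(raw, width=3)   # sees only the best points
path = viterbi_track(scores)             # sees all underlying scores
print(raw, smoothed, path, sep="\n")
```

The short median filter cannot undo the three-frame error because it only sees the winning candidates, while the dynamic-programming path stays on candidate 10 since its slightly lower score is still available to it.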

4. Examples: YIN
de Cheveigné & Kawahara, 2002
• Like autocorrelation, but
difference function - robust to changes in amplitude
progressive normalization by cumulative average - eliminates lag = 0 advantage
outperforms others without post processing
[Figure: YIN outputs vs. time / s: pitch (Oct. re: 440 Hz), sqrt(apty), sqrt(power)]
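A simplified sketch of the two ingredients named above, the difference function and the cumulative-mean normalization, assuming a single frame. The function name `yin_pitch`, the threshold of 0.1, and the search limits are illustrative assumptions; refinements of the published algorithm such as parabolic interpolation are omitted.

```python
import numpy as np

def yin_pitch(x, fs, fmin=60.0, fmax=500.0, threshold=0.1):
    """Simplified YIN-style f0 estimate for one frame."""
    tau_max = int(fs / fmin)
    tau_min = int(fs / fmax)
    w = len(x) - tau_max                      # integration window length
    # Difference function d(tau) = sum_j (x[j] - x[j + tau])^2.
    d = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        diff = x[:w] - x[tau:tau + w]
        d[tau] = np.dot(diff, diff)
    # Cumulative mean normalized difference: d'(0) = 1,
    # d'(tau) = d(tau) / mean(d(1..tau)), so lag 0 has no advantage.
    cmnd = np.ones(tau_max + 1)
    taus = np.arange(1, tau_max + 1)
    cmnd[1:] = d[1:] * taus / np.maximum(np.cumsum(d[1:]), 1e-12)
    # Take the first dip below the threshold, else the global minimum.
    below = np.flatnonzero(cmnd[tau_min:] < threshold)
    if below.size:
        tau = tau_min + below[0]
        while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
            tau += 1                          # walk down to the local minimum
    else:
        tau = tau_min + np.argmin(cmnd[tau_min:])
    return fs / tau

fs = 8000
t = np.arange(1024) / fs
f0_true = 200.0
x = (np.sin(2 * np.pi * f0_true * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0_true * t)
     + 0.3 * np.sin(2 * np.pi * 3 * f0_true * t))
f0_est = yin_pitch(x, fs)
print(f"estimated f0: {f0_est:.1f} Hz")
```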

Examples: sigmund~
Puckette et al. 1998
• Miller Puckette’s Pd pitch tracker
• Sinusoids + sieve: $L(f) = \sum_k \hat{Y}(k \cdot f)$
• Also performs note segmentation and polyphony
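A rough sketch of a sieve score, reading L(f) as an unweighted sum of spectral magnitude at the harmonics k·f of each candidate f. This is not the sigmund~ implementation: the function name `harmonic_sum_pitch`, the candidate grid, and the number of harmonics are assumptions, and the score is taken from the raw spectrum rather than from tracked sinusoids.

```python
import numpy as np

def harmonic_sum_pitch(x, fs, candidates, n_harm=5):
    """Score each candidate f by summing |Y| at its first n_harm harmonics."""
    n_fft = 8192
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
    k = np.arange(1, n_harm + 1)
    # L(f) = sum over k of the interpolated magnitude at k * f.
    scores = np.array([np.sum(np.interp(k * f, freqs, mag)) for f in candidates])
    return candidates[np.argmax(scores)], scores

fs = 8000
t = np.arange(2048) / fs
f0_true = 200.0
amps = [1.0, 0.8, 0.6, 0.5, 0.4]         # illustrative harmonic amplitudes
x = sum(a * np.sin(2 * np.pi * (i + 1) * f0_true * t) for i, a in enumerate(amps))

candidates = np.arange(80.0, 500.0, 1.0)  # candidate f0 grid in Hz (assumed)
f0_est, scores = harmonic_sum_pitch(x, fs, candidates)
print(f"estimated f0: {f0_est:.1f} Hz")
```

With an unweighted sum, a candidate at f0/2 collects every other harmonic, so the number of harmonics and any weighting strongly affect octave errors.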

Examples: Weighted Sieve Polyphonic
Klapuri 2006
• Based on more complex multi-f0 system
• Chooses f0 via weighted sieve
spectrum is whitened before measuring peaks
weights are optimized for training data
• Multiple f0s are found via cancel+repeat, or joint optimization
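The cancel+repeat idea can be sketched as below, assuming a two-note mixture with non-overlapping harmonics. This is a simplified illustration, not Klapuri's system: the function name `cancel_and_repeat`, the 1/k weights, and the cancellation width are hypothetical, the spectrum is not whitened, and no weights are trained.

```python
import numpy as np

def cancel_and_repeat(mag, freqs, candidates, n_f0s=2, n_harm=8, width_hz=15.0):
    """Pick the most salient f0, zero its harmonics, and repeat."""
    residual = mag.copy()
    k = np.arange(1, n_harm + 1)
    weights = 1.0 / k                          # illustrative weights (assumed)
    found = []
    for _ in range(n_f0s):
        # Weighted sieve: salience of each candidate on the residual spectrum.
        salience = np.array([np.sum(weights * np.interp(k * f, freqs, residual))
                             for f in candidates])
        f0 = candidates[np.argmax(salience)]
        found.append(float(f0))
        # Cancel: remove spectral energy near each harmonic of the chosen f0.
        for harmonic in k * f0:
            residual[np.abs(freqs - harmonic) < width_hz] = 0.0
    return found

fs = 8000
t = np.arange(2048) / fs

def note(f0, gain):
    """Hypothetical note: six harmonics with 1/k amplitudes."""
    return gain * sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, 7))

x = note(200.0, 1.0) + note(310.0, 0.8)
n_fft = 8192
mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
candidates = np.arange(100.0, 500.0, 1.0)
f0s = cancel_and_repeat(mag, freqs, candidates)
print(f0s)
```

Zeroing whole bins is crude; when notes share harmonics it removes energy that belongs to the remaining notes, which is one motivation for the joint optimization the slide mentions.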
Summary
• Pitch
Important perceptually, for speech & music
• Spectrum
Combine multiple harmonics to f0
• Time
Autocorrelation directly measures periodicity

References
• A. de Cheveigné & H. Kawahara, “YIN, a Fundamental Frequency Estimator for Speech and Music,” J. Acoust. Soc. Am. 111, 1917–1930, 2002.
• B. Gold & L. Rabiner, “Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain,” J. Acoust. Soc. Am. 46(2), 442–448, 1969.
• A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes,” Proc. Int. Symp. Music IR, 216–221, 2006.
• R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals using a two-way mismatch procedure,” J. Acoust. Soc. Am. 95(4), 2254–2263, 1994.
• A. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am. 41, 293–309, 1967.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., 109–112, 1998.
• S. Sagayama, K. Takahashi, H. Kameoka and T. Nishimoto, “Specmurt Analysis: A Piano-Roll-Visualization of Polyphonic Music Signal by Deconvolution of Log-Frequency Spectrum,” Proc. SAPA-2004, 2004.
• M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio and Electroacoustics AU-16(2), 262–266, 1968.
• D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), 495–518, Elsevier, 1995.
