Pitch Tracking

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

ELEN E4896 MUSIC SIGNAL PROCESSING

Lecture 8:
Pitch Tracking

1. Pitch Tracking
2. Spectral Approaches
3. Time Domain
4. Example algorithms

Dan Ellis
Dept. Electrical Engineering, Columbia University
dpwe@ee.columbia.edu http://www.ee.columbia.edu/~dpwe/e4896/
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 1 /17
1. Pitch Tracking
• Pitch is a big part
of hearing
orthogonal to
formants (vowels)
in speech
follows fundamental
frequency (f0) of sounds
because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
for coding / representation (telephony)
for extracting information
• .. but can be tricky
sounds without clear f0
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 2 /17
Challenges to Pitch Extraction
• Pitch extraction challenges include...
Talkin, 1995

“Oh, I’m just about to lose my mind...”


4
noise / multiple f0s
freq / kHz

1
Voice Activity
0
0.5 1 1.5 2 2.5 3 3.5
Detection
time / s

40 40
unclear f0
dB

20 20

0 0

-20 -20
0 1000 2000 3000 4000 0 1000 2000 3000 4000
freq / Hz
note segmentation
ampl.

0.2 0.2
0.1 0.1
0 0
-0.1 -0.1
-0.2 -0.2

0.75 0.76 0.77 0.78 0.79 3.3 3.31 3.32 3.33 3.34
time / s

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 3 /17


Multiple f0s
• Common in music (“polyphony”)
4000

3000
Frequency

2000

1000

0
0.5 1 1.5 2 2.5 3 3.5
Time
4000

3000
Frequency

2000

1000

0
0.5 1 1.5 2 2.5 3 3.5
Time

cross-terms...
identify-remove-repeat?
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 4 /17
2. Spectral Approaches
• Just pick the most intense spectral peak?
4
freq / kHz

0
4
freq / kHz

0
0.5 1 1.5 2 2.5 3 3.5 time / s

fundamental may not be strongest...

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 5 /17


Spectral Templates
• “Filter” log-freq *
spectrum
with template
0 0
0 20 40 60 80 100 120 140 160 0 20 40 60 80

matched =
filter ... 0
0 20 40 60 80 100 120 140 160

8000
4000
2000
1000
500
250

8000
4000
2000
1000
500
250

0.5 1 1.5 2 2.5 3 3.5 time / s


E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 6 /17
Cepstrum Noll, 1967

• Spectrum 40

20
40

20

shows periodicity 0 0

at f0 -20
0 1000 2000 3000 4000
-20
0 1000 2000 3000 freq / Hz

reveal via Fourier 3000 3000

transform of log|Y| 2000 2000

= cepstrum 1000 1000

0 0
0 200 400 600 800 1000 0 200 400 600 quefrency
4
freq / kHz

0
100
quefrency

80
60
40
20

0.5 1 1.5 2 2.5 3 3.5 time / s


E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 7 /17
“Specmurt” Sagayama et al., 2004

• Log-spectrum is like convolution of


pitch peaks and harmonics profile
esp. for multiple notes by one instrument (piano)
• Separate out by deconvolution
inverse Fourier transform of log-f spectrum V(y)
divide out harmonic template H(y)

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 8 /17


Sinusoid Tracks
Maher & Beauchamp 1994

• Do sinusoid tracking before pitch tracking


apply templates / sieve on tracks
consider each strong sinusoid as a candidate
limits to clearly periodic energy (normalized?)
4
freq / kHz

0
4
freq / kHz

0
0.5 1 1.5 2 2.5 3 3.5 time / s
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 9 /17
3. Time Domain Approaches
Gold & Rabiner 1969
ampl.

• Periodicity is primarily time-domain


0.2 0.2
0.1 0.1
0 0
-0.1 -0.1
-0.2 -0.2

0.75 0.76 0.77 0.78 0.79 3.3 3.31 3.32 3.33 3.34
time / s

time-domain processing avoids FFT


• Early approaches: Look for common gap
between wave features

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 10/17


Autocorrelation Sondhi 1967

• Directly measures
0.2
0.1
0
-0.1

period of self-similarity -0.2

0.75 0.76 0.77 0.78 3.3 3.31 3.32 3.33 time / s

(autoco is inverse FT of
20

10

magnitude spectrum) 0

-10
0 0.01 0.02 0.03 0 0.01 0.02 0.03 lag / s
4
freq / kHz

0
lag / ms

0
lag / ms

0
0.5 1 1.5 2 2.5 3 3.5 time / s

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 11/17


Post-processing
• Can assume & exploit pitch smoothness
Talkin, 1995

• Median filtering
only knows best point
• Dynamic programming (Viterbi path)
can look at all underlying values
raw
median31
lag / ms

6
viterbi
4

0
0 0.5 1 1.5 2 2.5 3 3.5 time / s

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 12/17


4. Examples:YIN
de Cheveigné & Kawahara, 2002
• Like autocorrelation, but
difference function - robust to changes in amplitude
progressive normalization by cumulative average
- eliminates lag = 0 advantage
outperforms others without post processing
0
Oct. re: 440 Hz

-1

-2

-3

-4
sqrt(apty)

0.5

0
sqrt(power)

0
0.5 1 1.5 2 2.5 3 3.5 time / s

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 13/17


Examples: sigmund~
Puckette et al. 1998
• Miller Puckette’s Pd pitch tracker
• Sinusoids + sieve L(f ) = +k
Ŷ (k · f )
k

• Also performs
note segmentation
and polyphony

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 14/17


Examples: Weighted Sieve Polyphonic
• Based on more complex multi-f 0 system
Klapuri 2006

• Chooses f via weighted sieve


0
spectrum is whitened before measuring peaks
weights are optimized for training data

• Multiple f s are found via cancel+repeat, or


0
joint optimization
E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 15/17
Summary
• Pitch
Important perceptually, for speech & music

• Spectrum
Combine multiple harmonics to f0

• Time
Autocorrelation directly measures
periodicity

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 16/17


References
• A. de Cheveigné & H. Kawahara, “YIN, a Fundamental Frequency Estimator for
Speech and Music,” J. Acoust. Soc. Am. 111, 1917–1930, 2002.
• B. Gold & L. Rabiner, “Parallel Processing Techniques for Estimating Pitch
Periods of Speech in the Time Domain,” J. Acoust. Soc. Am. 46(2), 442-448, 1969.
• A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic
Amplitudes”, Proc. Int. Symp. Music IR, 216-221, 2006.
• R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals
using a two-way mismatch procedure,” J. Acoust. Soc. Am. 95(4), 2254-2263,1994.
• A. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am. 41, 293-309, 1967.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,”
Proc. Int. Comp. Music Conf., 109-112, 1998.
• S. Sagayama, K. Takahashi, H. Kameoka and T. Nishimoto, “Specmurt Anasylis: A
Piano-Roll-Visualization of Polyphonic Music Signal by Deconvolution of Log-
Frequency Spectrum,” Proc. SAPA-2004, 2004.
• M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio and
Electroacoustics AU-16(2), 262-266, 1968.
• D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and
Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), 495-518, Elsevier, 1995.

E4896 Music Signal Processing (Dan Ellis) 2012-03-05 - 17/17

You might also like