
ELEN E4896 MUSIC SIGNAL PROCESSING

Lecture 5:
Sinusoidal Modeling
1. Sinusoidal Modeling
2. Sinusoidal Analysis
3. Sinusoidal Synthesis & Modification
4. Noise Residual
Dan Ellis

Dept. Electrical Engineering, Columbia University


dpwe@ee.columbia.edu
E4896 Music Signal Processing (Dan Ellis)

http://www.ee.columbia.edu/~dpwe/e4896/
2013-02-18 - 1 /16

1. Sinusoidal Modeling

Periodic sounds: ridges in spectrogram
  each ridge is a sinusoidal harmonic
  .. with smoothly-varying parameters
  .. an efficient & flexible description?

[Figure: spectrogram of Violin.arco.ff.A4]


Sinusoid Modeling

Analogous to Fourier series
  model harmonics explicitly? e.g.
  $$x[n] = \sum_k a_k[n] \cos(\phi_k[n])$$
  ... for pitched signal with fundamental $\omega_0[n]$:
  $$\phi_k[n] = k \, \omega_0[n] \, n$$

Additional constraints
  harmonicity
  smoothness of $a_k[n]$

Arbitrarily accurate given enough sinusoids



Examples

Using Michael Klingbeil's SPEAR


http://www.klingbeil.com/spear/


Envelope Limitations

Extracted envelope reflects analysis window


[Figure: two examples, each plotting frequency and amplitude against time]

Sharp window violates assumptions


2. Sinusoidal Analysis

Sinusoids = peaks in spectrogram slices = DFT frames
  $$X[k, m] = \sum_{n=0}^{N-1} x[n + mL] \, w[n] \, e^{-j 2\pi k n / N}$$

DFT length N
  window determines frequency resolution: $X(e^{j\omega}) \ast W(e^{j\omega})$
  long enough to see harmonics
  e.g. 2-3x longest pitch cycle, typically 50-100 ms
  but: too long blurs amplitude envelope $a_k[n]$

Hop advance L
  choose N/2 or N/4
  .. denser for simpler interpolation along time
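
The framing above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's own code: the function name `stft_frames`, the Hann window, the 8 kHz sample rate, and the 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft_frames(x, N=1024, L=256):
    """Return X[k, m]: the DFT of windowed frames x[n + mL] w[n]."""
    w = np.hanning(N)                       # assumed window; any tapered w[n] works
    n_frames = 1 + (len(x) - N) // L        # only frames that fit entirely in x
    X = np.empty((N // 2 + 1, n_frames), dtype=complex)
    for m in range(n_frames):
        X[:, m] = np.fft.rfft(x[m * L : m * L + N] * w)
    return X

# Hypothetical test signal: one second of a 1 kHz cosine at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
x = np.cos(2 * np.pi * 1000 * t)

X = stft_frames(x, N=1024, L=256)           # hop L = N/4
peak_bin = int(np.argmax(np.abs(X[:, 0])))  # strongest bin in the first frame
peak_hz = peak_bin * sr / 1024
```

With N = 1024 at 8 kHz the window spans 128 ms and the bin spacing is about 7.8 Hz, so the window length sets the trade-off between frequency resolution and blurring of the amplitude envelope.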


Sinusoidal Peak Picking

Local maxima in DFT frames


[Figure: spectrogram (freq / Hz vs. time / s) and one DFT frame (level / dB vs. freq / Hz) with its local maxima]

Quadratic fit for sub-bin resolution
  fit $y = a x (x - b)$ to the points around each maximum
  peak at $x = b/2$, height $-a b^2 / 4$

[Figure: parabola fitted around a peak, level / dB and phase / rad vs. freq / Hz (400-800 Hz)]
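
As a rough sketch of peak picking with a quadratic fit, the code below finds local maxima in one dB-magnitude frame and refines each with a parabola through the three bins around it. The function name `pick_peaks`, the -60 dB floor, and the 1010 Hz test tone are assumptions for illustration.

```python
import numpy as np

def pick_peaks(mag_db, floor_db=-60.0):
    """Local maxima of one dB-magnitude frame, refined by a parabola fit.

    Returns a list of (fractional_bin, level_db) pairs.
    """
    peaks = []
    for k in range(1, len(mag_db) - 1):
        lo, mid, hi = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if mid > lo and mid >= hi and mid > floor_db:
            denom = lo - 2.0 * mid + hi            # curvature, negative at a maximum
            offset = 0.5 * (lo - hi) / denom       # vertex position, within +/- 0.5 bin
            level = mid - 0.25 * (lo - hi) * offset
            peaks.append((k + offset, level))
    return peaks

# Hypothetical test: a tone that falls between two DFT bins.
sr, N = 8000, 1024
n = np.arange(N)
frame = np.cos(2 * np.pi * 1010.0 * n / sr) * np.hanning(N)
spec = np.abs(np.fft.rfft(frame))
mag_db = 20 * np.log10(spec / spec.max() + 1e-12)  # 0 dB at the largest bin

best_bin, best_db = max(pick_peaks(mag_db), key=lambda p: p[1])
est_hz = best_bin * sr / N
```

The nearest bin centre is about 2 Hz away from the true frequency here, while the interpolated estimate should land well within 1 Hz of it.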

Peak Selection

Don't want every peak
  just true sinusoids
  threshold?
  local shape - fits $W(e^{j(\omega - \omega_0)})$

[Figure: DFT frame, level / dB vs. freq / Hz]

Look for stability
  of frequency & amplitude in successive time frames
  phase derivative in time/freq


Track Formation

Connect peaks in adjacent frames to form sinusoids
  can be ambiguous if large frequency changes

[Figure: freq vs. time diagram of existing tracks meeting new peaks, with track birth and death marked]

Unclaimed peak: create new track
No continuation of track: termination
  hysteresis
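
One simple way to realize these rules is a greedy nearest-frequency match, sketched below. The function name `form_tracks`, the 50 Hz jump limit, and the one-frame `patience` used for hysteresis are assumptions of this sketch; practical trackers resolve competing matches more carefully.

```python
def form_tracks(frames, max_jump_hz=50.0, patience=1):
    """Greedy peak-to-track matching.

    frames: list of per-frame lists of (freq_hz, amp) peaks.
    Returns a list of tracks, each a list of (frame_index, freq_hz, amp).
    """
    finished, active = [], []            # active entries: [track, missed_frames]
    for m, peaks in enumerate(frames):
        unclaimed = list(peaks)
        survivors = []
        for track, missed in active:
            last_freq = track[-1][1]
            best = min(unclaimed, key=lambda p: abs(p[0] - last_freq), default=None)
            if best is not None and abs(best[0] - last_freq) <= max_jump_hz:
                unclaimed.remove(best)                   # continuation
                track.append((m, best[0], best[1]))
                survivors.append([track, 0])
            elif missed < patience:
                survivors.append([track, missed + 1])    # hysteresis: tolerate a gap
            else:
                finished.append(track)                   # death
        for freq, amp in unclaimed:
            survivors.append([[(m, freq, amp)], 0])      # birth
        active = survivors
    return finished + [track for track, _ in active]

# Hypothetical peaks: the upper partial is missing for one frame.
frames = [
    [(440.0, 1.0), (880.0, 0.5)],
    [(442.0, 1.0), (878.0, 0.5)],
    [(445.0, 0.9)],
    [(447.0, 0.9), (881.0, 0.4), (2000.0, 0.1)],
]
tracks = form_tracks(frames)
```

In this example the upper partial survives its one-frame gap because of the hysteresis, and the peak at 2000 Hz starts a new track.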


Pitch Tracking

Extracted sinusoids could be anywhere
  but often expect them to be in harmonic series

[Figure: sinusoid tracks, freq / Hz (0-6000) vs. time / s, and a detail from 550 to 700 Hz]

Find pitch by searching for common factor
  $\omega_k[n] = k \, \omega_0[n]$
  can then regularize pitch
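
A common-factor search can be sketched as a grid search over candidate fundamentals, scoring each by how far the peaks fall from its integer multiples. The function name `estimate_f0`, the 50-500 Hz search range, the 0.25 Hz step, and the cost function are assumptions made for this example, not the method used in the lecture.

```python
import numpy as np

def estimate_f0(peak_freqs, f_min=50.0, f_max=500.0, step=0.25):
    """Grid search for the fundamental that best explains the peaks as harmonics."""
    peaks = np.asarray(peak_freqs, dtype=float)
    candidates = np.arange(f_min, f_max, step)
    ratios = peaks[None, :] / candidates[:, None]       # peak / candidate f0
    harmonic = np.maximum(1.0, np.round(ratios))        # nearest harmonic number k >= 1
    cost = np.mean(np.abs(ratios - harmonic), axis=1)   # mean mistuning, in harmonics
    best = int(np.argmin(cost))
    return candidates[best], harmonic[best].astype(int)

# Hypothetical peak frequencies from one frame, slightly mistuned.
f0, k = estimate_f0([220.0, 441.0, 659.0, 882.0])
regularized = k * f0        # peaks snapped onto the harmonic series k * f0
```

Measuring mistuning in units of harmonic number penalizes subharmonics such as f0/2, whose error in Hz is the same but is divided by a smaller fundamental.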


3. Sinusoidal Synthesis

Each sinusoid track $\{a_k[n], \omega_k[n]\}$ drives an oscillator
  $a_k[n] \cos(\omega_k[n] \, t)$
  can interpolate amplitude, frequency samples

[Figure: amplitude track $a_k[n]$ (level) and frequency track $\omega_k[n]$ (freq / Hz) against time / s, with the resulting oscillator output]

Faster method synthesizes DFT frames
  then overlap-add
  trickier to achieve frequency modulation
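
A minimal oscillator-bank sketch, assuming each track is stored as frame times, frequencies in Hz, and amplitudes (the names `synth_track` and `synth_tracks` are hypothetical). Parameters are linearly interpolated to the sample rate, and the phase is accumulated from the instantaneous frequency so the waveform stays continuous when the frequency changes.

```python
import numpy as np

def synth_track(frame_times, freqs_hz, amps, sr, duration):
    """Render one sinusoid track by interpolating its frame-rate parameters."""
    t = np.arange(int(round(duration * sr))) / sr
    f = np.interp(t, frame_times, freqs_hz)     # per-sample frequency in Hz
    a = np.interp(t, frame_times, amps)         # per-sample amplitude
    phase = 2 * np.pi * np.cumsum(f) / sr       # running sum approximates the integral
    return a * np.cos(phase)

def synth_tracks(tracks, sr, duration):
    """Sum the oscillators for all tracks, each a (times, freqs, amps) tuple."""
    return sum(synth_track(tt, ff, aa, sr, duration) for tt, ff, aa in tracks)

# Hypothetical single track: steady 440 Hz at amplitude 0.5 for one second.
sr = 8000
track = ([0.0, 0.5, 1.0], [440.0, 440.0, 440.0], [0.5, 0.5, 0.5])
y = synth_tracks([track], sr, duration=1.0)
```

Evaluating every oscillator per sample is simple but costs one cosine per partial per sample, which is the motivation for the faster DFT-frame method.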


Sinusoidal Modification

Sinusoidal description very easy to modify
  e.g. changing time base of sample points

[Figure: sinusoid tracks, freq / Hz (0-5000) vs. time / s (0-0.5)]

Frequency stretch
  preserve formant envelope?

[Figure: two spectra, level / dB (0-40) vs. freq / Hz (0-4000)]
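
Both modifications reduce to simple arithmetic on the track parameters, as the sketch below suggests. The helper names `stretch_time` and `shift_frequencies` are hypothetical, and the spectral envelope is taken to be a straight-line interpolation through the original partials, which is a crude assumption made only for illustration.

```python
import numpy as np

def stretch_time(frame_times, factor):
    """Slow down (factor > 1) or speed up a track by rescaling its time base."""
    return np.asarray(frame_times, dtype=float) * factor

def shift_frequencies(freqs_hz, amps, factor, keep_envelope=True):
    """Scale the partial frequencies of one frame (freqs_hz sorted ascending).

    With keep_envelope=True the new amplitudes are read off the original
    spectral envelope, so formant peaks stay where they were.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amps, dtype=float)
    new_freqs = freqs_hz * factor
    if keep_envelope:
        new_amps = np.interp(new_freqs, freqs_hz, amps)  # clamps outside the range
    else:
        new_amps = amps.copy()                           # envelope moves with the partials
    return new_freqs, new_amps

# Hypothetical frame with four partials, shifted up by a factor of 1.5.
new_times = stretch_time([0.0, 0.01, 0.02], 2.0)
new_f, new_a = shift_frequencies([200.0, 400.0, 600.0, 800.0],
                                 [1.0, 0.5, 0.8, 0.2], 1.5)
```

Without envelope preservation the amplitudes travel with the partials, which shifts the formants along with the pitch.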

4. Noise Residual

Some energy is not well fit with sinusoids
  e.g. noisy energy
Can just keep it as residual
  or model it some other way
Leads to "sinusoidal + noise" model
  $$x[n] = \sum_k a_k[n] \cos(\omega_k[n] \, n) + e[n]$$

[Figure: spectra of original, sinusoids, residual, and LPC fit; mag / dB vs. freq / Hz]
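
As one possible way to model the residual, the sketch below subtracts the sinusoidal part and fits a low-order LPC model to what is left, using the autocorrelation method. The function name `lpc`, the synthetic AR(2) noise, and the single stand-in sinusoid are assumptions chosen so the result can be checked against known coefficients.

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method LPC: returns a[1..p] with e[n] ~ sum_i a[i] e[n-i]."""
    x = np.asarray(signal, dtype=float)
    r = np.array([np.dot(x[: len(x) - lag], x[lag:]) for lag in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1 : order + 1])     # normal equations R a = r

# Synthetic example: one sinusoid plus coloured noise with known AR(2) coefficients.
rng = np.random.default_rng(0)
n = np.arange(20000)
sines = 0.8 * np.cos(2 * np.pi * 0.05 * n)          # stand-in for the resynthesized sinusoids
drive = rng.standard_normal(len(n))
noise = np.zeros(len(n))
for i in range(2, len(n)):
    noise[i] = 1.3 * noise[i - 1] - 0.6 * noise[i - 2] + drive[i]

x = sines + noise
residual = x - sines                                # e[n] = x[n] - sinusoidal part
coeffs = lpc(residual, order=2)                     # should be close to [1.3, -0.6]
```

In practice the subtraction only works this cleanly when the resynthesized sinusoids match the original in phase as well as in amplitude and frequency.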

Sinusoids + Noise Decomposition

Removing sines reveals noise & transients


[Figure: three spectrograms, Frequency (0-4000) vs. Time: Guitar - original; Guitar - sinusoid reconstruction; Guitar - residual (original - sines)]

Different representation approaches...



5. Limitations

The spectrogram (mag STFT) is not linear


  superpositions suffer from phase effects

[Figure: abs(stft(s1)) + abs(stft(s2)) compared with abs(stft(s1+s2)); freq / Hz (800-1400) vs. time / sec]

Separating sources is generally hard...
parameters
tracking
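
The phase effect in superposition can be seen in an extreme case: two sinusoids of the same frequency and opposite phase. This is a minimal sketch on a single Hann-windowed frame; the signal names `s1` and `s2` follow the figure labels, and the frame length and frequency are arbitrary choices.

```python
import numpy as np

N = 1024
n = np.arange(N)
w = np.hanning(N)

s1 = np.cos(2 * np.pi * 100 * n / N)
s2 = np.cos(2 * np.pi * 100 * n / N + np.pi)    # same frequency, opposite phase

def mag(frame):
    """Magnitude spectrum of one windowed frame."""
    return np.abs(np.fft.rfft(frame * w))

sum_of_mags = mag(s1) + mag(s2)     # what a linear spectrogram would predict
mag_of_sum = mag(s1 + s2)           # what the mixture actually shows
```

The two components cancel in the mixture, so `mag_of_sum` is essentially zero while `sum_of_mags` has a strong peak at bin 100. In general the mixture magnitude can fall anywhere between zero and the sum, depending on relative phase.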

Summary

Spectrogram shows sinusoid harmonics in many sounds
Peak picking in spectrogram can effectively extract them
Sinusoidal domain extremely flexible for modification
Noise residual can add even more realism

