Ismir 2014 Tutorial (Mining Suite)
2014
https://code.google.com/p/miningsuite/
MiningSuite
A comprehensive framework for music analysis, articulating audio (MIRtoolbox 2.0) and symbolic approaches
Olivier Lartillot
Aalborg University, Denmark
Lrn2Cre8
Tutorial overview
1. General description: aims, architecture, syntax
2. Audio-based approaches (MIRtoolbox 2.0)
3. Symbolic approaches: MIDItoolbox 2.0 and new musicological models
4. How it works
5. How you can contribute: open-source community
1. General description
MiningSuite
Matlab framework
MIRtoolbox
MIRtoolbox history
European project Tuning the Brain for Music (NEST, FP6, 2006-08)
MIRtoolbox advantages
MIRtoolbox syntax
miraudio('ragtime.wav')
outputs a figure:
mirtempo('ragtime.wav')
a = miraudio(a, 'Center');
t = mirtempo(a)
MiningSuite syntax
sig.input('ragtime.wav')
outputs a figure
mus.tempo('ragtime.wav')
a = sig.input(a, 'Center');
t = mus.tempo(a)
MiningSuite history
MiningSuite
SIGMINR: signal processing
AUDMINR: audio, auditory modelling
MUSMINR: music analysis
PATMINR: pattern mining
SEQMINR: sequence processing
VOCMINR: voice analysis?
Packages
Signal domain
Sets of operators related to signal processing operations, audio and musical features, with versions specific to particular domains:
SIGMINR: sig.input, sig.spectrum, ...
AUDMINR: aud.spectrum, aud.mfcc, aud.brightness, ...
MUSMINR: mus.spectrum, mus.tempo, mus.key, ...
[Diagram: mus.tempo specializes sig.peaks, adding music-specific methods such as a BPM range and a BPM preference curve.]
Symbolic domain (scores, MIDI, transcription)
MUSMINR: mus.score
Software dependencies
How to start
help miningsuite
MiningSuite development
Started in 2010
Sneak Peek version 0.6 released in January 2014 for AES Semantic Audio: proof of concept, basic architecture.
Alpha version 0.8 released now for ISMIR 2014: focus on the architecture, not on the exact feature implementation. Not everything implemented. Results cannot be trusted!
2. Audio-based approaches
SIGMINR
signal processing
sig.input
sig.rms
sig.zerocross
sig.filterbank
sig.frame
sig.flux
sig.autocor
sig.spectrum
sig.peaks
sig.segment
sig.rolloff
sig.stat
sig.cepstrum
sig.simatrix
sig.envelope
sig.cluster
AUDMINR
aud.filterbank
aud.brightness
aud.attacktime
aud.roughness
aud.mfcc
aud.fluctuation
aud.envelope
aud.pitch
aud.novelty
aud.segment
aud.attackslope
aud.score
aud.eventdensity
MUSMINR
music theory
mus.spectrum
mus.chromagram
mus.keystrength
mus.pitch
mus.tempo
mus.pulseclarity
mus.metre
mus.key
mus.keysom
mus.mode
mus.score
sig.input
sig.input(myfile)
sig.input: transformations, extraction
.play, .save: sonification
a = sig.input(..., 'Trim')
a.play
a.save('newfile.mp3')
sig.frame
frame decomposition
f = sig.frame(..., 'FrameSize', l, 's')
f.play
f.save
[Figure: audio waveform, amplitude vs. time (s)]
sig.frame
frame decomposition
a = sig.input(...)
f = sig.frame(a)
s = sig.spectrum(f)
Or, with the 'Frame' option: s = sig.spectrum(..., 'Frame')
sig.spectrum
frequency spectrum
Discrete Fourier Transform of audio signal x:
X(k) = sum_{n=0}^{N-1} x(n) e^{-2 pi i k n / N}
Amplitude spectrum: s = sig.spectrum(...)
[Figure: spectrum, magnitude vs. frequency (Hz)]
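The amplitude spectrum can be sketched outside MiningSuite; a minimal NumPy illustration of the underlying DFT (the function name amplitude_spectrum is ours, not part of the toolbox):

```python
import numpy as np

# Minimal NumPy sketch (not MiningSuite code): the amplitude spectrum is the
# magnitude of the DFT, restricted here to the positive half-spectrum.
def amplitude_spectrum(x, sr):
    mags = np.abs(np.fft.rfft(x))              # |X(k)| for k = 0 .. N/2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return freqs, mags

# A 440 Hz sinusoid peaks at the bin nearest 440 Hz.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
freqs, mags = amplitude_spectrum(x, sr)
print(freqs[np.argmax(mags)])
```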
sig.spectrum
parameter specification
sig.spectrum(..., 'Min', 0): in Hz
sig.spectrum
post-processing
sig.spectrum(..., 'dB'): magnitude in dB scale
sig.spectrum(..., 'Smooth', o)
sig.spectrum(..., 'Gauss', o)
[Figures: spectrum in dB, magnitude vs. frequency (Hz); smoothed spectrum]
aud.spectrum
auditory modeling
aud.spectrum(..., 'Mel'): Mel-band decomposition
aud.spectrum(..., 'Bark'): Bark-band decomposition
aud.spectrum(..., 'Terhardt'): outer-ear modeling
aud.spectrum(..., 'Mask'): masking
[Figures: MelSpectrum, magnitude vs. Mel bands; BarkSpectrum, magnitude vs. Bark bands]
mus.spectrum
pitch-based distribution
mus.spectrum(..., 'Cents')
[Figures: spectrum, magnitude vs. frequency (Hz); spectrum remapped to pitch (in midicents); spectrum folded over cents (0-1200)]
sig.cepstrum
cepstral analysis
sig.cepstrum(..., 'Complex'): complex cepstrum
audio -> Fourier transform (sig.spectrum) -> Log -> Phase unwrap -> (Inverse) Fourier transform
[Figures: spectrum, magnitude vs. frequency (Hz); cepstrum, magnitude vs. quefrency (s)]
Bogert, Healy, Tukey. The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking. Symposium on Time Series Analysis, Chapter 15, 209-243, Wiley, 1963.
sig.cepstrum
cepstral analysis
sig.cepstrum(..., 'Real'): real cepstrum
audio -> Fourier transform (sig.spectrum) -> Abs -> Log -> (Inverse) Fourier transform
[Figures: spectrum, magnitude vs. frequency (Hz); cepstrum, magnitude vs. quefrency (s)]
sig.cepstrum
cepstral analysis
sig.cepstrum(..., 'Freq')
sig.cepstrum(..., 'Min', 0, 's')
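The real-cepstrum chain above (spectrum, log, inverse transform) can be sketched in a few lines of NumPy; a hedged illustration, not MiningSuite code:

```python
import numpy as np

# Sketch of the real-cepstrum chain (NumPy, not MiningSuite):
# audio -> |DFT| -> log -> inverse DFT.
def real_cepstrum(x):
    spectrum = np.abs(np.fft.fft(x))
    return np.real(np.fft.ifft(np.log(spectrum + 1e-12)))  # offset avoids log(0)

# For a periodic signal, the cepstrum peaks at the period (in samples):
# an idealized 100 Hz excitation at sr = 8000 has a period of 80 samples.
sr = 8000
x = np.zeros(sr)
x[::80] = 1.0
c = real_cepstrum(x)
quefrency = 20 + np.argmax(c[20:120])   # search away from the low-quefrency region
print(quefrency)
```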
sig.autocor
autocorrelation function
[Figure: waveform autocorrelation, coefficients Rxx vs. lag (s)]
sig.autocor
autocorrelation function
sig.autocor(..., 'Freq'): lags in Hz
sig.autocor
generalized autocorrelation
Standard autocorrelation: y = IDFT(|DFT(x)|^2)
Generalized autocorrelation: y = IDFT(|DFT(x)|^k)
[Figures: autocorrelation curves for different compression exponents k]
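A minimal NumPy sketch of the generalized autocorrelation formula above (not MiningSuite code; the exponent k = .67 echoes the compressed setting used later with the 'Generalized' option):

```python
import numpy as np

# Generalized autocorrelation y = IDFT(|DFT(x)|^k), sketched in NumPy.
# k = 2 gives the ordinary autocorrelation; k < 2 compresses the
# magnitude spectrum, sharpening the periodicity peaks.
def generalized_autocor(x, k=2.0):
    mag = np.abs(np.fft.fft(x))
    return np.real(np.fft.ifft(mag ** k))

sr = 4000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)        # 200 Hz tone, period = 20 samples
ac = generalized_autocor(x, k=0.67)
lag = 5 + np.argmax(ac[5:40])          # first peak away from lag 0
print(lag)
```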
sig.flux
s = sig.spectrum(a, 'Frame')
sig.flux(s), equivalent to the shortcut sig.flux(a)
[Figure: spectral flux vs. frames]
c = sig.cepstrum(a, 'Frame')
sig.flux(c)
[Figure: cepstrum flux vs. frames]
sig.flux(..., 'Median', l, C)
[Diagram: audio level (sound, dynamics)]
sig.rms
root-mean-square energy
sig.rms(ragtime)
The RMS energy related to file ragtime is 0.017932
sig.rms(ragtime, 'Frame')
Default frame size: .05 s, frame hop: .5
[Figure: RMS energy vs. frames]
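The global and frame-wise RMS computations can be sketched in NumPy with the defaults quoted above (50 ms frames, 50% hop); an illustration, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of sig.rms (not MiningSuite code): global RMS, then
# frame-wise RMS with 50 ms frames and a hop of half a frame.
def rms(x):
    return np.sqrt(np.mean(x ** 2))

def frame_rms(x, sr, frame_s=0.05, hop=0.5):
    size = int(frame_s * sr)
    step = int(size * hop)
    starts = range(0, len(x) - size + 1, step)
    return np.array([rms(x[s:s + size]) for s in starts])

sr = 8000
x = np.concatenate([np.full(sr // 2, 0.5), np.zeros(sr // 2)])  # loud half, silent half
print(round(rms(x), 4))
print(frame_rms(x, sr)[0])   # first frame lies entirely in the loud half
```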
sig.envelope
envelope extraction
a: [Figure: audio waveform, amplitude vs. time (s)]
e = sig.envelope(a): [Figure: envelope, amplitude vs. time (s)]
e.play
e.save
sig.rms: [Figure: RMS energy vs. frames]
sig.envelope: [Figure: envelope, amplitude vs. time (s)]
sig.envelope(..., 'Filter'): based on low-pass filtering
full-wave rectification (abs) -> low-pass filter ('LowPass', 'HalfHann')
sig.envelope(..., 'Hilbert'): Hilbert transform (analytic signal) -> abs (magnitude)
sig.envelope(..., 'Spectro'): based on power spectrogram, with band decomposition
sig.envelope(..., 'UpSample', 2)
sig.envelope(..., 'Complex')
sig.envelope
post-processing
sig.envelope(..., 'Center')
sig.envelope(..., 'HalfWaveCenter'): half-wave rectified
sig.envelope(..., 'Diff'): differentiated
sig.envelope(..., 'HalfWaveDiff')
[Figures: envelopes after each post-processing option, amplitude vs. time (s)]
sig.envelope(..., 'Power')
sig.envelope(..., 'Normal')
aud.envelope
auditory modeling
aud.envelope(...) corresponds to sig.envelope(..., 'Spectro', 'UpSample', 'Log', 'HalfwaveDiff', 'Lambda', .8), following the envelope extraction of Klapuri et al.'s meter analysis system: each narrow band is processed separately, with dynamic compression and weighted differentiation of the interpolated envelopes. Lambda values between 0.6 and 1.0 perform almost equally well; lambda = 0.8 makes a slight but consistent improvement in accuracy.
sig.sum(e, 'Adjacent', ...): adjacent bands are summed.
[Figure: illustration of the dynamic compression and weighted differentiation steps for an artificial signal]
Klapuri, A., A. Eronen and J. Astola (2006). Analysis of the meter of acoustic musical signals. IEEE Transactions on Audio, Speech and Language Processing, 14-1, 342-355.
sig.filterbank
filterbank decomposition
audio -> filter1, filter2, ..., filterN
f.play
f.save
sig.filterbank(..., 'CutOff', [44, 88, 176, 352, 443])
[Figure: magnitude response (dB) of each band]
aud.filterbank
auditory modeling
aud.filterbank(...): Equivalent Rectangular Bandwidth (ERB) gammatone filterbank; a simple cochlear model can be formed by filtering an utterance with these filters.
Example (from Slaney's Auditory Toolbox documentation): the ten ERB filters between 100 and 8000 Hz are computed using
fcoefs = MakeERBFilters(16000,10,100);
The resulting frequency response is given by
y = ERBFilterBank([1 zeros(1,511)], fcoefs);
resp = 20*log10(abs(fft(y')));
freqScale = (0:511)/512*16000;
semilogx(freqScale(1:255),resp(1:255,:));
axis([100 16000 -60 0])
xlabel('Frequency (Hz)');
ylabel('Filter Response (dB)');
[Figure: gammatone filter frequency responses]
aud.filterbank
auditory modeling
aud.filterbank(..., 'Mel')
aud.filterbank(..., 'Bark')
aud.filterbank(..., 'Scheirer')
aud.filterbank(..., 'Klapuri')
[Figures: magnitude responses (dB) of each filterbank]
.sum
across-channel summation
f = sig.filterbank(...)
e = sig.envelope(f)
e.sum
[Figures: centered audio waveform; per-channel envelopes; summed envelope, amplitude vs. time (s)]
.sum
stereo summation
a = sig.input(..., 'Mix', 'No')
e = sig.envelope(a)
e.sum('channel')
[Figures: per-channel envelopes; summed envelope, amplitude vs. time (s)]
.sum
across-channel summary
f = sig.filterbank(...)
e = sig.envelope(f)
a = sig.autocor(e)
a.sum
[Figures: per-channel envelope autocorrelations; summed envelope autocorrelation, coefficients vs. lag (s)]
sig.peaks
peak picking
Border effects: o = 'Abscissa'
sig.peaks(..., 'Valleys')
sig.peaks(..., 'Threshold', 0)
[Figure: candidate peaks marked 'OK' or 'not OK' depending on the contrast between peaks and surrounding valleys]
sig.peaks
peak picking
sig.peaks(...): [Figure: waveform autocorrelation with peaks marked, coefficients vs. lag (s)]
sig.peaks(..., 'Extract'): [Figure: curve segments extracted around each peak]
sig.peaks(..., 'Only'): [Figure: peaks only, other values discarded]
sig.peaks(..., 'Track')
peak tracking
sig.segment
segmentation
s = sig.segment(myfile, [1 2 3])
s.play(1:2)
s.save
sig.spectrum(s)
[Figures: audio waveform with segmentation points, amplitude vs. time (s); MelSpectrum per segment]
sig.segment
segmentation
e = sig.envelope(myfile)
p = sig.peaks(e)
s = sig.segment(myfile, p)
[Figure: envelope with peaks selected as segmentation points, vs. time]
[Diagram: audio level (sound, dynamics); symbolic level (notes)]
aud.score
onset detection
e = sig.envelope(myfile)
p = sig.peaks(e, 'Contrast', .1)
s = aud.score(p)
In short: [s p] = aud.score(myfile, 'Contrast', .1)
aud.sequence, seq.event, seq.syntagm
[Figures: envelope with detected peaks; resulting note onsets, vs. time]
mironsets
onset detection (poster @ ISMIR 2008)
[Flowchart: mirrms, mirautocor, mirsimatrix, mirnovelty, mirframe, mirspectrum, mirflux ('Complex'), mirfilterbank ('Mel', Gammatone, Scheirer, Klapuri), mirenvelope (power, diff, simple, order 8, auto-regressive, half-Hanning, halfwave, log)]
aud.score(..., 'Scheirer')
aud.score(..., 'Klapuri')
aud.eventdensity
temporal density of events
a = aud.score(audio.wav)
aud.eventdensity(a)
aud.eventdensity(audio.wav, 'Frame')
aud.eventdensity(score.mid)
[Figures: detected events; event density vs. time]
[Diagram: audio level (sound, dynamics, timbre); symbolic level (notes)]
aud.score(..., 'Attack', 'Release')
aud.attacktime
duration of note attacks
[Figure: centered envelope with attack phases marked, amplitude vs. time (s)]
Krimphoff, J., McAdams, S. & Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II: Analyses acoustiques et quantification psychophysique. Journal de Physique, 4(C5), 625-628.
aud.attackslope
average slope of note attacks
aud.attackslope(o)
[Figure: centered envelope with attack slopes marked, amplitude vs. time (s)]
Peeters, G. (2004). A large set of audio features for sound description (similarity and classification) in the CUIDADO project, version 1.0.
sig.zerocross
waveform sign-change rate
Indicator of noisiness
[Figure: zoomed audio waveform crossing zero, amplitude vs. time (s)]
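The sign-change rate is simple enough to sketch directly; a NumPy illustration (not MiningSuite code):

```python
import numpy as np

# NumPy sketch of sig.zerocross (not MiningSuite code): the rate of
# sign changes per second, a simple indicator of noisiness.
def zerocross_rate(x, sr):
    signs = np.sign(x)
    crossings = np.sum(signs[1:] != signs[:-1])
    return crossings * sr / len(x)

# A 100 Hz sine crosses zero 200 times per second.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t + 0.1)   # phase offset avoids exact-zero samples
print(zerocross_rate(x, sr))
```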
sig.rolloff
high-frequency energy (I)
[Figure: spectrum with the rolloff frequency marked at 5640.53 Hz, magnitude vs. frequency (Hz)]
aud.brightness
high-frequency energy (II)
[Figure: spectrum with the 1500 Hz cutoff marked, magnitude vs. frequency (Hz)]
aud.brightness(..., 'Unit', u), u = '/1' or '%'
aud.brightness
high-frequency energy (II)
aud.brightness(..., 'Frame')
[Figures: frame-by-frame brightness curves for two sounds, coefficient value vs. temporal location of events (in s.)]
aud.mfcc
mel-frequency cepstral coefficients
sig.cepstrum: audio -> sig.spectrum -> Abs -> Log -> (Inverse) Fourier transform
aud.mfcc: audio -> sig.spectrum -> Abs -> (Mel) -> Log -> Discrete Cosine Transform
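The aud.mfcc chain can be sketched in NumPy; a simplified illustration under assumed filterbank details (26 triangular mel bands, 13 coefficients), not the MiningSuite implementation:

```python
import numpy as np

# Simplified NumPy sketch of the MFCC chain (not MiningSuite code):
# magnitude spectrum -> mel-spaced triangular filterbank -> log -> DCT-II.
# Band count and mel formula are illustrative assumptions.
def mel(f):
    return 2595 * np.log10(1 + f / 700)

def mfcc(x, sr, n_bands=26, n_coefs=13):
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    # Band edges equally spaced on the mel scale, mapped back to Hz.
    edges = np.interp(np.linspace(0, mel(sr / 2), n_bands + 2), mel(freqs), freqs)
    energies = []
    for lo, mid, hi in zip(edges, edges[1:], edges[2:]):
        w = np.clip(np.minimum((freqs - lo) / (mid - lo + 1e-9),
                               (hi - freqs) / (hi - mid + 1e-9)), 0, None)
        energies.append(np.log(w @ mags + 1e-12))
    energies = np.array(energies)
    n = np.arange(n_bands)
    dct = np.cos(np.pi * np.outer(np.arange(n_coefs), n + 0.5) / n_bands)
    return dct @ energies   # the DCT decorrelates the log-mel bands

sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
c = mfcc(x, sr)
print(c.shape)
```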
aud.roughness
sensory dissonance
aud.roughness(..., 'Sethares')
Dissonance produced by two sinusoids depending on their frequency ratio
sig.spectrum -> sig.peaks -> dissonance estimated for each pair of peaks -> sum
Plomp & Levelt. "Tonal Consonance and Critical Bandwidth". Journal of the Acoustical Society of America, 1965.
Sethares, W. A. (1998). Tuning, Timbre, Spectrum, Scale. Springer-Verlag.
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes)]
aud.pitch
f0 estimation
p = aud.pitch(..., 'Autocor'): aud.filterbank -> sig.frame -> sig.autocor -> .sum -> sig.peaks
aud.pitch(..., 'AutocorSpectrum'): sig.spectrum -> sig.autocor
aud.pitch(..., 'Cepstrum'): sig.cepstrum
aud.pitch(..., 'Frame', ...)
aud.pitch
f0 estimation
aud.pitch(..., 'Tolonen'), equivalent to aud.pitch(..., '2Channels', 'Enhanced', 2:10, 'Generalized', .67)
aud.pitch
f0 estimation
ac = sig.autocor(ragtime, 'Frame')
ac = sig.autocor(ac, 'Enhanced', 2:10)
aud.pitch(ac)
p.play
p.save
[Figures: frame-by-frame waveform autocorrelation; enhanced autocorrelation; resulting pitch curve, frequency (Hz) vs. temporal location of frames (in s.)]
aud.pitch
f0 estimation
aud.pitch(..., 'Frame')
[Figures: frame-by-frame pitch curves for s103.wav, frequency (Hz) vs. temporal location of events (in s.)]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes)]
mus.pitch
pitch segmentation and discretisation
[Figure: pitch curve of an istikhbar segmented into discrete notes, vs. temporal location of events (in s.)]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter); structural levels]
aud.fluctuation
rhythmic periodicity along auditory channels
aud.spectrum('Frame', .023, .5, 'Terhardt', 'Bark', 'Mask', 'dB')
aud.spectrum(..., 'AlongBands', 'Max', 10, 'Window', 'No', 'Resonance', 'Fluctuation')
sig.sum
[Figures: frame-by-frame Bark spectrogram; fluctuation spectrum per band; summed fluctuation, magnitude vs. frequency (Hz)]
mus.tempo
tempo estimation
e = sig.envelope(mysong, 'Diff')
ac = sig.autocor(e)
pa = sig.peaks(ac, 'Total', 1)
mus.tempo(pa)
In short: t = mus.tempo(mysong)
t = t.eval
t = {sig.signal, sig.Autocor}
[Figures: onset curve (differentiated envelope), amplitude vs. time (s); envelope autocorrelation with the selected peak, coefficients vs. lag (s)]
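The chain above (onset curve, autocorrelation, peak, BPM) can be sketched in NumPy; an illustration of the principle, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of the tempo chain (not MiningSuite code): the onset curve's
# autocorrelation peaks at the beat period; tempo in BPM is 60 / period.
sr = 100                                   # onset curve sampled at 100 Hz
t = np.arange(10 * sr) / sr
onset = np.clip(np.sin(2 * np.pi * 2 * t), 0, None)   # pulses at 2 Hz = 120 BPM
ac = np.correlate(onset, onset, mode='full')[len(onset) - 1:]
lags = np.arange(len(ac)) / sr
valid = (lags >= 0.2) & (lags <= 2.0)      # 30-300 BPM search range
period = lags[valid][np.argmax(ac[valid])]
print(60 / period)
```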
mus.tempo
tempo estimation
e = sig.envelope(mysong, 'Diff')
ac = aud.autocor(e, 'Resonance')
The resonance curve (Toiviainen & Snyder, 2003) emphasises periodicities that are more easily perceived.
[Figure: resonance-weighted autocorrelation, vs. lags (s)]
mus.tempo
tempo estimation
e = sig.envelope(mysong, 'Diff')
ac = aud.autocor(e, 'Frame', 'Resonance')
pa = sig.peaks(ac, 'Total', 1)
mus.tempo(pa)
In short: t = mus.tempo(mysong, 'Frame')
t = t.eval
t = {sig.signal, sig.Autocor}
[Figures: onset curve (differentiated envelope), amplitude vs. time (s); frame-by-frame tempo curve, coefficient value (in bpm) vs. temporal location of events (in s.)]
mus.pulseclarity
rhythmic clarity, beat strength
mus.pulseclarity(..., 'Enhanced', 'No')
mus.pulseclarity(..., 'Enhanced', 'Yes')
Statistics on the envelope autocorrelation curve: 'MaxAutocor', 'MinAutocor', 'KurtosisAutocor', 'EntropyAutocor', 'InterfAutocor', 'TempoAutocor'
[Figures: envelope autocorrelation curves, amplitude vs. lag (s), with peaks at e.g. .5 s (120 BPM) and 1 s]
mus.metre
Metrical levels: 8
Dominant levels
7
6 .
5
4
3 .
2
1
0.5
2.5
mus.metre
metrical hierarchy tracking
mus.metroid
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter, tonality/mode); structural levels]
mus.chromagram
energy distribution along pitches
s = sig.spectrum(a, 'dB', 20, 'Min', 100, 'Max', 6400)
c = mus.chromagram(s, 'Wrap', 'no')
c = mus.chromagram(c, 'Wrap', 'yes')
[Figures: spectrum (dB), magnitude vs. frequency (Hz); unwrapped chromagram (G2 to C8); wrapped chromagram over the 12 chroma classes]
mus.chromagram
chroma resolution
[Figures: frame-by-frame chromagrams at different chroma resolutions, chroma class vs. time (s)]
mus.keystrength
probability of key candidates
Cross-correlation of the mus.chromagram with profiles for each key candidate: C major, C minor, C# major, C# minor, D major, D minor, etc.
mus.key
key estimation
sig.peaks(mus.keystrength(...))
mus.key(..., 'Total', 1)
mus.key
key estimation
k = mus.key(mysong, 'Frame')
k = k.eval
k = {sig.signal, sig.signal, mus.Keystrength}
[Figures: frame-by-frame key estimates (maj/min); key strength for all 24 candidates (CM to Bm) vs. time (s); key clarity (tonal center), coefficient value vs. temporal location of events (in s.)]
mus.key
key estimation
[Figures: frame-by-frame key estimates (maj/min) for several recordings (Umile01, son32.wav, son27.wav, son5.wav), key vs. temporal location of events (in s.)]
[k c] = mus.key
key clarity
[Figures: key clarity curves, from clear to unclear tonality, coefficient value vs. temporal location of events (in s.)]
mus.mode
mode estimation
[Figures: major/minor mode curves for several recordings (son28.wav, son22.wav), coefficient value vs. temporal location of events (in s.)]
mus.keysom
self-organizing map
[Figure: self-organizing map projection of the chromagram, with major keys (upper case) and minor keys (lower case) arranged on a toroidal map]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter, tonality/mode); structural levels]
onset-based segmentation
o = aud.onsets(audiofile, 'Attack', 'Release')
sig.segment(audiofile, o)
[Figure: audio waveform with onset-based segmentation points, amplitude vs. time (s)]
onset-based segmentation
s = sig.segment(audiofile, o)
mus.pitch(s)
[Figures: segmented audio waveform, amplitude vs. time (s); pitch per segment, ragtime]
onset-based segmentation
s = sig.segment(audiofile, o)
mus.key(s)
[Figure: segmented audio waveform, amplitude vs. time (s)]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter, tonality/mode); structural levels (groups)]
sig.simatrix
similarity matrix
The sole requirement is that similar source audio samples produce similar features. The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of digital audio as {v_i in IR^B : i = 1, ..., N}. Using a generic similarity measure d : IR^B x IR^B -> IR, embed the data in a matrix S, as illustrated in the right panel of Figure 1:
S(i, j) = d(v_i, v_j), i, j = 1, ..., N
The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along the main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance, with v_i and v_j representing the spectrograms for sample times i and j, respectively:
d_cos(v_i, v_j) = <v_i, v_j> / (|v_i| |v_j|)
There are many possible choices for the similarity measure, including measures based on the norm or non-negative exponential measures, such as:
d_exp(v_i, v_j) = exp(d_cos(v_i, v_j) - 1)
The left panel of Figure 1 shows a similarity matrix computed from the song Wild Honey.
sig.spectrum(..., 'Frame')
sig.simatrix(..., 'Distance', 'cosine')
[Figures: similarity and dissimilarity matrices, temporal location of frame centers (in s.)]
Foote, J., and M. Cooper. Media Segmentation using Self-Similarity Decomposition. SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.
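The cosine-similarity matrix is easy to sketch directly; a NumPy illustration (not MiningSuite code):

```python
import numpy as np

# NumPy sketch of a cosine self-similarity matrix (not MiningSuite code):
# S(i, j) = <v_i, v_j> / (|v_i| |v_j|) for every pair of frame vectors.
def simatrix(frames):
    """frames: array of shape (n_frames, n_features)."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.maximum(norms, 1e-12)   # normalize each frame vector
    return unit @ unit.T                        # all pairwise cosines at once

frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
S = simatrix(frames)
print(S)
```

Frames 0 and 1 are identical, so S[0, 1] is 1; frame 2 is orthogonal to them, so S[0, 2] is 0.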
aud.novelty
novelty curve
sig.simatrix(a)
aud.novelty(a, 'KernelSize', s)
[Figure: similarity matrix, s = 128 samples, temporal location of frame centers (in s.)]
Small kernel: detailed, short-term novelty. Large kernel: broader structural transitions.
[Figures: novelty curves for increasing kernel sizes, coefficient value vs. temporal location of frames (in s.)]
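Foote's novelty computation behind aud.novelty can be sketched in NumPy; a hedged illustration in which the checkerboard kernel size and shape are simplified assumptions:

```python
import numpy as np

# NumPy sketch of a novelty curve (not MiningSuite code): slide a
# checkerboard kernel along the diagonal of the similarity matrix;
# peaks in the resulting curve mark transitions between sections.
def novelty(S, size=4):
    half = size // 2
    kernel = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((half, half)))
    n = len(S)
    curve = np.zeros(n)
    for i in range(half, n - half):
        curve[i] = np.sum(kernel * S[i - half:i + half, i - half:i + half])
    return curve

# Two homogeneous sections give one novelty peak at the boundary (index 8).
S = np.ones((16, 16))
S[:8, 8:] = 0
S[8:, :8] = 0
nv = novelty(S)
print(int(np.argmax(nv)))
```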
aud.segment
novelty-based segmentation
sg = sig.segment(mysong)
s = aud.mfcc(sg)
sm = sig.simatrix(s)
nv = aud.novelty(sm)
sg = sig.segment(mysong, nv)
sg.play
[Figures: novelty curve; segmented audio waveform, amplitude vs. time (s); MFCC, coefficient ranks vs. time axis (in s.)]
sig.cluster
clustering of audio segments
sg = aud.segment(a)
cc = aud.mfcc(sg)
sig.cluster(sg, cc)
[Figures: segmented audio waveform; MFCCs per segment; clustered segments, vs. time axis (in s.)]
sig.cluster
clustering of frame-decomposed features
cc = aud.mfcc(..., 'Frame')
sig.cluster(cc, 4)
[Figures: frame-by-frame MFCCs before and after clustering into 4 clusters, coefficient ranks vs. temporal location of beginning of frame (in s.)]
sig.stat
basic statistics
mean
standard deviation
temporal slope
main periodicity:
frequency
amplitude
periodicity entropy
sig.hist
histogram
sig.hist(..., 'Number', 10)
[Figure: histogram, number of occurrences vs. values]
sig.zerocross
sign-change rate, applicable to any curve
[Figure: zoomed audio waveform crossing zero, amplitude vs. time (s)]
sig.centroid
geometric center
first moment: mu1 = integral of x f(x) dx
sig.spread
variance, dispersion
second moment: sigma^2 = mu2 = integral of (x - mu1)^2 f(x) dx
standard deviation: sigma = sqrt(mu2)
sig.skewness
non-symmetry
third moment: mu3 = integral of (x - mu1)^3 f(x) dx
third standardized moment: mu3 / sigma^3 (< 0 or > 0 depending on the direction of the asymmetry)
sig.kurtosis
peakedness
fourth moment: mu4 = integral of (x - mu1)^4 f(x) dx
fourth standardized moment: mu4 / sigma^4
excess kurtosis: mu4 / sigma^4 - 3
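The four moment operators can be sketched together in NumPy, treating a curve as a distribution f(x); an illustration, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of the moment operators (not MiningSuite code):
# centroid, spread, standardized skewness, excess kurtosis of a curve f(x).
def moments(x, f):
    f = f / np.sum(f)                      # normalize to a distribution
    mu1 = np.sum(x * f)                    # sig.centroid
    mu2 = np.sum((x - mu1) ** 2 * f)       # sig.spread (variance)
    sigma = np.sqrt(mu2)
    skew = np.sum((x - mu1) ** 3 * f) / sigma ** 3       # sig.skewness
    kurt = np.sum((x - mu1) ** 4 * f) / sigma ** 4 - 3   # sig.kurtosis (excess)
    return mu1, mu2, skew, kurt

# A symmetric triangular distribution centered at 5: skewness should be 0.
x = np.arange(11.0)
f = np.maximum(5.0 - np.abs(x - 5), 0)
mu1, mu2, skew, kurt = moments(x, f)
print(round(mu1, 6), round(mu2, 6))
```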
sig.flatness
smooth vs. spiky
geometric mean divided by arithmetic mean:
( prod_{n=0}^{N-1} x(n) )^{1/N} / ( (1/N) sum_{n=0}^{N-1} x(n) )
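The flatness ratio is one line of NumPy (computed in the log domain for numerical safety); an illustration, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of sig.flatness (not MiningSuite code): geometric mean
# over arithmetic mean; 1 for a flat curve, near 0 for a spiky one.
def flatness(x):
    x = np.asarray(x, dtype=float)
    geo = np.exp(np.mean(np.log(x + 1e-12)))   # geometric mean, log domain
    return geo / np.mean(x)

print(round(flatness(np.ones(8)), 6))          # flat curve
spiky = np.array([0.0] * 7 + [8.0])
print(flatness(spiky))                          # spiky curve, near 0
```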
sig.entropy
relative Shannon entropy
p = 1: normalization
Shannon, C. E. 1948. A Mathematical Theory of Communication. Bell Systems Technical Journal 27:379-423.
sig.export
exportation of statistical data to files
m = aud.mfcc(...)
aud.play
feature-based playlist
zc = sig.zerocross('Folder');
sig.dist
distance between features
[Figures: Mel spectra of the two compared files, magnitude vs. Mel bands]
sig.dist
distance between clusters
c1 = aud.mfcc(a, 'Frame')
c2 = aud.mfcc(b, 'Frame')
c1 = sig.cluster(c1, 16)
c2 = sig.cluster(c2, 16)
sig.dist(c1, c2)
[Figures: frame-by-frame MFCCs of the two files, coefficient ranks vs. temporal location of beginning of frame (in s.)]
sig.dist
distance between features
s1 = aud.spectrum(a, 'Mel')
s2 = aud.spectrum('Folder', 'Mel')
sig.dist(s1, s2)
[Figures: Mel spectra of the compared files, magnitude vs. Mel bands]
.getdata
returns data in Matlab format
Results encapsulate the numerical data together with the related sampling rates, file name, etc.
s = sig.spectrum(file); s.getdata -> vector
s = sig.spectrum(file, 'Frame'); s.getdata -> matrix
f = sig.filterbank(file, 'Frame'); f.getdata -> matrix
sg = sig.segment(file); f = sig.filterbank(sg, 'Frame'); f.getdata -> cell array of matrices
.getpeakpos, .getpeakval
p = sig.peaks(...); p.getpeakpos -> vector; p.getpeakval -> vector
a = miraudio(bigfile)
f = mirframe(a)
s = mirspectrum(f)
mircentroid(s)
mircentroid(bigfile, 'Frame')
a = miraudio('Design', ...)
s = mirspectrum(f, 'Frame', ...)
c = mircentroid(s)
mireval(c, bigfile)
[Flowchart: mirframe -> mirspectrum -> mircentroid]
a = sig.input(bigfile, );
sig.input
s = sig.spectrum(f, Frame, );
c = sig.centroid(s)
Frame
; No operation is performed.
sig.spectrum
sig.centroid
sig.design
data flow graph design
a = sig.input('bigfile', …);
s = sig.spectrum(a);
c = sig.centroid(s)
d = sig.ans
d = c.eval
d = c.getdata
sig.design.eval
data flow graph evaluation
a = sig.input('bigfile', …);
s = sig.spectrum(a);
c = sig.centroid(s)
c.eval
c.eval('anotherfile')
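The design/eval mechanism can be sketched in Python (an illustration only, not MiningSuite code; node operations are invented stand-ins): building the graph performs no computation, and evaluation is triggered explicitly, reusable on other inputs.

```python
class Design:
    """Toy data-flow node: constructing it performs no computation;
    eval() runs the stored chain of operations on a given input."""

    def __init__(self, func, parent=None):
        self.func = func      # the operation of this node
        self.parent = parent  # upstream node, if any

    def eval(self, value):
        if self.parent is not None:
            value = self.parent.eval(value)  # evaluate upstream first
        return self.func(value)

# Designing the graph performs no operation yet:
a = Design(lambda x: x)                      # stand-in for an input node
s = Design(lambda x: [v * v for v in x], a)  # stand-in "spectrum"
c = Design(lambda x: sum(x) / len(x), s)     # stand-in "centroid"

# Evaluation is triggered explicitly and can be reused on other inputs:
print(c.eval([1, 2, 3]))  # mean of squares: 14 / 3
```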
sig.design.show
data flow graph display
a = sig.input();
s = sig.spectrum(a);
c = sig.centroid(s)
c.show
sig.signal.design
design stored in the results
a = sig.input(…);
s = sig.spectrum(a);
c = sig.centroid(s);
d = c.eval
save result.mat d

1 year later:
load result.mat
d.design
d.design.show
3.
Symbolic approaches
MIDI Toolbox
nmat = readmidi('laksin.mid')
nmat =
pianoroll(nmat)
[Figure: piano-roll of the melody, pitch C4#-C5# against time in beats.]
MIDI data (columns: onset in beats, duration in beats, MIDI channel, pitch, velocity, onset in s, duration in s):

      0    0.9000    0   64.0000   82.0000        0   0.5510
 1.0000    0.9000    0   71.0000   89.0000   0.6122   0.5510
 2.0000    0.4500    0   71.0000   82.0000   1.2245   0.2755
 2.5000    0.4500    0   69.0000   70.0000   1.5306   0.2755
 3.0000    0.4528    0   67.0000   72.0000   1.8367   0.2772
 3.5000    0.4528    0   66.0000   72.0000   2.1429   0.2772
 4.0000    0.9000    0   64.0000   70.0000   2.4490   0.5510
 5.0000    0.9000    0   66.0000   79.0000   3.0612   0.5510
 6.0000    0.9000    0   67.0000   85.0000   3.6735   0.5510
 7.0000    1.7500    0   66.0000   72.0000   4.2857   1.0714
 9.0000    0.4528    0   64.0000   74.0000   5.5102   0.2772
 9.5000    0.4528    0   67.0000   81.0000   5.8163   0.2772
10.0000    0.9000    0   71.0000   83.0000   6.1224   0.5510
11.0000    0.4528    0   71.0000   78.0000   6.7347   0.2772
11.5000    0.4528    0   69.0000   73.0000   7.0408   0.2772
12.0000    0.4528    0   67.0000   71.0000   7.3469   0.2772
12.5000    0.4528    0   66.0000   69.0000   7.6531   0.2772
mus.score
score excerpt selection
mus.score(…, 'Channel', 1)
mus.score(…, 'Trim')
mus.score
score information
pianoroll(laksin, 'name', 'sec', 'vel');
[Figure: piano-roll (C4#-C5#) with note names, time in seconds, and velocities.]
mus.score
score information
mus.score('laksin.mid')
mus.save
mus.play
[Figure 1: piano-roll notation of the two first phrases of "Läksin minä kesäyön", pitch against time in seconds; the lower panel shows the velocity information.]
mus.pitch
pitch contour
m = mus.score(myfile)        (m is of class mus.sequence)
s = mus.pitch(m)             (s is of class sig.signal)
h = mus.hist(s, 'Class')
[Figure: pitch contour (MIDI pitch 63-71) and the resulting pitch-class histogram.]
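The pitch-class histogram idea can be sketched in Python (an illustration, not MiningSuite code), using the MIDI pitches of the example melody listed in the note matrix earlier:

```python
NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F',
         'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_class_histogram(midi_pitches):
    """Count the occurrences of each pitch class."""
    hist = {name: 0 for name in NAMES}
    for p in midi_pitches:
        hist[NAMES[p % 12]] += 1
    return hist

# MIDI pitches of the example melody (from the note matrix)
melody = [64, 71, 71, 69, 67, 66, 64, 66, 67,
          66, 64, 67, 71, 71, 69, 67, 66]
h = pitch_class_histogram(melody)
print(h['F#'])  # MIDI pitch 66 (F#) occurs 4 times
```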
mus.pitch('Inter')
pitch interval contour
m = mus.score(myfile)
i = mus.pitch(m, 'Inter')
h = mus.histo(i)
mus.histo(i, 'Sign', 0)
[Figure: pitch interval contour (from -2 to +7 semitones) and the corresponding interval histograms.]
mus.pitch
pitch contour
m = mus.score(myfile)
c = mus.pitch(m, 'Sampling', .25)
mus.autocor(c)
[Figure: sampled pitch contour and its autocorrelation curve, lags from 0.5 to 4.5 s.]
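What the autocorrelation of a sampled contour reveals can be sketched in Python (an illustration, not MiningSuite code): a periodic contour produces a strong autocorrelation peak at its period.

```python
def autocor(x):
    """Normalized autocorrelation of a sampled contour (lags 0..n-1)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / denom
            for lag in range(n)]

contour = [0, 1, 0, -1] * 8   # toy contour with a period of 4 samples
r = autocor(contour)
print(r[0])        # 1.0 at lag 0
print(r[4] > 0.8)  # strong peak at the period: True
```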
mus.pitch
pitch contour
m1 = mus.score(myfile, 'EndTime', 5)
m2 = mus.score(myfile, 'StartTime', 5)
p1 = mus.pitch(m1, 'Sampling', .25)
p2 = mus.pitch(m2, 'Sampling', .25)
[Figure: pitch contours of the first and second halves of the melody.]
[Diagram, after Lerdahl & Jackendoff's GTTM: phenomenal accents at the musical surface feed the metrical structure and the grouping structure (preference rules GPR 2+3, GPR 6, GPR 7, MPR 1, parallelism; partly intrinsic, partly cultural/extrinsic), up to the time-span and prolongational reductions.]
[Roadmap diagram, structural levels: at the audio level, sound, dynamics and timbre; at the symbolic level, notes, pitch, meter, tonality, mode, and groups.]
mus.score(…, 'Segment')
local segmentation
[Score example with segmentation boundaries.]
(Bod, 2002)
mus.score(…, 'Segment')
local segmentation
segmentgestalt(laksin, 'fig');
[Figure: piano-roll (time in beats) with Gestalt-based segmentation boundaries.]
Another segmentation technique uses probabilities derived from corpus analysis (Bod, 2002), although the correct segmentation is more in line with Tenney & Polansky's model. Finally, the Local Boundary Detection Model by Cambouropoulos (1997) is a more recent variant of the rule-based models that offers effective segmentation of monophonic input.
mus.score(…, 'Segment')
local segmentation
boundary(laksin, 'fig');
[Figure: piano-roll (time in beats) with the corresponding boundary-strength curve.]
(Cambouropoulos, 1997)
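The flavour of such boundary-strength curves can be sketched in Python (a simplified sketch in the spirit of local boundary detection, not Cambouropoulos's exact formulation; weights and normalization are invented for the example): a boundary is stronger where the inter-onset interval and the pitch interval change the most.

```python
def boundary_strengths(onsets, pitches):
    """Boundary strength between consecutive notes, from the normalized
    change in inter-onset interval (IOI) and in pitch interval."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    ints = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    strengths = []
    for i in range(1, len(iois)):
        d_ioi = abs(iois[i] - iois[i - 1]) / max(iois[i] + iois[i - 1], 1e-9)
        d_int = abs(ints[i] - ints[i - 1]) / max(ints[i] + ints[i - 1], 1e-9)
        strengths.append(d_ioi + d_int)
    return strengths

onsets = [0, 0.5, 1.0, 1.5, 3.0, 3.5, 4.0]   # long gap before the 5th note
pitches = [64, 66, 67, 66, 71, 69, 67]
s = boundary_strengths(onsets, pitches)
print(s.index(max(s)))  # the boundary at the long IOI scores highest
```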
mus.score(…, 'Segment')
local segmentation
[Figure: piano-roll (time in beats) with probabilistic boundary strengths, and the corresponding score excerpt.]
(Bod, 2002)
mus.score(…, 'Group')
hierarchical grouping
[Score example annotated with hierarchical grouping.]
mus.score(…, 'Group')
hierarchical grouping
[Figure: piano-roll (Eb4-B4) with hierarchical grouping.]
[Roadmap diagram (structural levels), now including ornamentation reduction.]
mus.score(…, 'Reduce')
ornamentation reduction
(Lerdahl & Jackendoff)
[Score example: reduction tree selecting the head of each group.]
mus.score(…, 'Reduce')
ornamentation reduction
mus.minr('laksin.mid', 'Group', 'Reduce')
Construction of a syntagmatic network.
[Figure: reduced piano-roll (Eb4-B4) with syntagmatic connections between notes.]
[Roadmap diagram (structural levels), now including motifs and pattern analysis.]
motivic pattern analysis
[Score example annotated with motives A and B and their variants (A@, B@, B@@), together with their computational descriptions: melodic content such as (D,E,F,G,A,Bb,C#,Bb,A,G,F,E,F) or interval sequences such as (-2,+1,+1,+1,+1,-6), with rhythmic values in 8th or 16th notes; some occurrences are not detected.]
motivic pattern analysis
Computer analysis
[Score annotated with motives A, B, C and their variants (A@, B@, B@@).]
motivic pattern analysis
Bach Invention in D minor

Name    Kresky's interpretation        Melodic                                Rhythmic
A       Opening motive                 (D,E,F,G,A,Bb,C#,Bb,A,G,F,E,F)         16th notes
A'      First sequence unit            (-2,+1,+1,+1,+1,-6,+6,-1,-1,-1,-1,+1)  16th notes
B       Accompanying figure            (F,A,D,+,-,+,+,-)                      8th notes
a       Motive shape                   (prefix of A)
a1      Reversed motive shape          (not detected)
a2=c    Triadic structure              (+,+,-)                                8th notes
a3=a3'  Retarded scalar climb          (-2,+1,+1,+1,+1,-6)                    16th notes
a4=C    Bass line                      (+7,-5,+1,+1,+1,-6)                    8th notes
a5      Second sequence unit           (+1,+1,-2,+1,+1,-6)                    16th notes
a6      (not considered)               (-,+1,+1,+1)                           16th notes
b       Three-note scale               (+1,+1)                                16th notes
b'      Identifying a5 with a3         (D5,E5,F5)                             16th notes
b''     Same, transposed               (C5,D5,E5)                             16th notes
b'''    Three-note shape in bass line  (not detected)
D       (not considered)               (-1,+1,+1,+1)                          16th, except 1st
E       (not considered)               (E5,F5,D5,E5,F5)                       16th, starting offbeat
motivic pattern analysis
[Score example annotated note by note with pitches (G, Eb, Ab, D, F, …), intervals (+3P, -2m, -2M, 0, +, -) and rhythmic values (.5).]
motivic pattern analysis
[Score of a polyphonic excerpt, with voices labelled U1-U3, M1-M2 and L1-L3.]
mus.score(…, 'Motif')
motivic pattern analysis
mus.minr(beethoven, 'Motif')
[Figure: detected motivic occurrences plotted against pitch and time (0-500).]
structure complexity
Pattern extraction, then pattern selection (longest, most frequent, ...): cannot be computed exhaustively.
Imperfections of the selection phase: expected patterns accidentally deleted, or insufficiently selective.
Improvement of the extraction process: closed patterns.
closed patterns
ABCDECDEABCDECDE
[Diagram: each repetition of a pattern receives a compact description.]
Complete and compact incremental approach:
C D A B C D E A B C D E
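The closed-pattern idea can be sketched in Python on symbolic strings (an illustration of the concept only, not the MiningSuite/PatMinr algorithm, which works incrementally): among all repeated substrings, keep only those that cannot be extended without losing occurrences.

```python
from collections import defaultdict

def repeated_patterns(s, minlen=2):
    """All substrings of length >= minlen occurring at least twice."""
    occ = defaultdict(list)
    for length in range(minlen, len(s)):
        for i in range(len(s) - length + 1):
            occ[s[i:i + length]].append(i)
    return {p: pos for p, pos in occ.items() if len(pos) >= 2}

def closed_patterns(s, minlen=2):
    """Keep only 'closed' patterns: repeated substrings that cannot be
    extended without losing occurrences."""
    rep = repeated_patterns(s, minlen)
    return {p: pos for p, pos in rep.items()
            if not any(p in q and len(q) > len(p) and len(rep[q]) == len(pos)
                       for q in rep)}

print(sorted(closed_patterns("ABCDECDEABCDECDE")))  # ['ABCDECDE', 'CDE']
```

On the slide's example, only the full period ABCDECDE (2 occurrences) and CDE (4 occurrences) survive; all their substrings are pruned because a longer pattern has the same occurrence count.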
pattern cyclicity
ABCABCABCA
(Cambouropoulos, 1998)
heterogeneous pattern cycles:
[Score example in 9/8: the note sequence C Eb G C Eb Ab C Eb … is described by cycles 1-5 of intervals (+3, +2m, -5m, -4) over successive 8th notes.]
[Roadmap diagram (structural levels): tonality and mode.]
c = mus.chromagram('score.mid')
ks = mus.keystrength(c)
mus.key(ks)
mus.mode(ks)
mus.keysom(ks)
etc.
[Figure: key strength curves (C major, C# major, D major, ...) and the resulting harmonic analysis: CM, Dm (II), GM (V), CM (I), Am (VI), DM (I), GM (V), CM (I).]
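The key-strength computation can be sketched in Python (an illustration, not the MiningSuite implementation): correlate a chroma (pitch-class) distribution with a key profile rotated to each of the 12 keys, as in the classic Krumhansl-Schmuckler approach. The profile values below are the Krumhansl-Kessler major profile as commonly cited in the literature.

```python
import math

# Krumhansl-Kessler major-key profile (standard values from the literature)
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
         2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KEYS = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def corr(u, v):
    """Pearson correlation between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) *
                    sum((b - mv) ** 2 for b in v))
    return num / den

def key_strength(chroma):
    """Correlate a chroma vector with the profile rotated to all 12 keys."""
    return [corr(chroma, MAJOR[-k:] + MAJOR[:-k]) for k in range(12)]

# Chroma of a plain C major scale (one occurrence per scale degree)
chroma = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
ks = key_strength(chroma)
print(KEYS[ks.index(max(ks))])  # C
```

The 12 correlation values play the role of the key strength curves above; the key with the highest strength is the estimated key.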
Audio / symbolic
tonal analysis
audio → mus.chromagram → mus.keystrength → sig.peaks → mus.key
MIDI, score → mus.score → mus.chromagram → …
(transcription bridges the audio and symbolic domains)
[Roadmap diagram (structural levels).]
o = aud.envelope('score.mid')
a = mus.autocor(o)
t = mus.tempo(a)
mus.pulseclarity(a)
mus.metre(a)
[Figure: amplitude envelope and tempo curve (130-180 BPM) against the temporal location of events (in s).]
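The envelope-autocorrelation route to tempo can be sketched in Python (a deliberately simplified illustration, not the MiningSuite algorithm; sampling rate and lag range are invented for the example): sample an onset curve, autocorrelate it, and read the tempo off the strongest lag.

```python
def tempo_from_onsets(onsets, sr=100):
    """Estimate tempo in BPM: sample an onset curve, autocorrelate it,
    and pick the strongest lag at or above 0.3 s (i.e. <= 200 BPM)."""
    n = int(onsets[-1] * sr) + 1
    env = [0.0] * n
    for t in onsets:
        env[int(t * sr)] = 1.0            # unit spike at each onset
    best_lag, best = None, -1.0
    for lag in range(int(0.3 * sr), n):
        r = sum(env[i] * env[i + lag] for i in range(n - lag))
        if r > best:
            best, best_lag = r, lag
    return 60.0 * sr / best_lag           # lag in samples -> BPM

# Onsets every 0.5 s -> 120 BPM
print(tempo_from_onsets([i * 0.5 for i in range(16)]))  # 120.0
```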
[Figure: metrical analysis of a notated theme, showing a hierarchy of pulsation levels (labelled A-I and a-p) aligned under the score across measures 1-16.]
Audio / symbolic
metrical analysis
audio → aud.envelope → mus.autocor → sig.peaks → mus.tempo
MIDI, score → mus.score → …
(transcription bridges the audio and symbolic domains)
[Roadmap diagram (structural levels).]
4.
How it works
Matrix convention in MIRtoolbox
In standard Matlab code, the handling of matrix dimensions makes the code somewhat obscure.
[Diagram: a fixed data cube of bins × frames × channels.]
Matrix convention in MIRtoolbox
One single convention, which needs to allow complex structures such as the data of a batch of audio files that are each segmented:
{{bins × frames × channels, …}, …}   (always structured like this)
Matrix convention in MiningSuite
{bin, sample, track}: bins × samples × tracks
{bin, freqband, track}: bins × freq. bands × tracks
Matrix convention in MiningSuite
More complex configurations can be freely designed, outside of any pre-wired configuration:
{{bins × freq. bands × tracks, …}, …}   (variable structures possible)
sig.data
For instance {bin, sample, channel}, indicating whether the data is a cell array corresponding to a batch of audio files, a sequence of segmented data, etc.

e = d.apply(@algo, {}, {'bin'}, 2);
out = {e};

function ceps = algo(m, …)
    ceps = mfccDCTMatrix * m;
end

d: bins × samples × tracks; m has 2 dimensions: bins × samples.
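The role of apply over such variable structures can be sketched in Python (an illustration of the idea only, not the sig.data API; the 'cell' tagging is invented for the example): the same per-matrix routine is looped over every element of a cell array of segments or files.

```python
def apply_to_cells(data, func):
    """Apply `func` to each matrix; a dict tagged 'cell' marks a cell
    array (batch of files, segments, ...) whose elements are visited."""
    if isinstance(data, dict):
        return {'cell': [apply_to_cells(d, func) for d in data['cell']]}
    return func(data)

# One matrix per segment, gathered in a "cell array":
segmented = {'cell': [[[1, 2], [3, 4]],
                      [[5, 6], [7, 8]]]}
doubled = apply_to_cells(segmented,
                         lambda m: [[2 * x for x in row] for row in m])
print(doubled['cell'][1])  # [[10, 12], [14, 16]]
```

The per-matrix routine never needs to know whether it is processing one file, a batch, or a sequence of segments.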
sig.signal.display
a = sig.input(myfile, 'Center')
The resulting sig.signal object stores:
X axis (bin pos): no
Y axis: yname = audio, yunit, Ydata
Sample axis: Sstart = 0, Srate = 44100, Ssize = 184269
Frame axis: no; Channel axis: no; Peaks: no
date: 15.10.2014; ver: R2014b, MiningSuite 0.8
sig.data: Matlab matrix, dimension specification, cell layers, indices, precise pos, precise val
sig.design: design
a = sig.input(myfile, 'Center')
b = sig.brightness(a, 'Frame')
sig.signal.display
[Figure "Brightness, son5": coefficient value against the temporal location of events (in s).]
The sig.signal object stores yname = brightness with its sample and frame axes (.05 s frames), the sig.data, and the sig.design; information such as time positions is regenerated on the fly.
b = sig.spectrum(a, 'Frame')
The resulting sig.Spectrum object stores, for the X axis, a sig.axis (xname = frequency, xstart = 0, rate = 10.76, generator and finder functions) and a sig.unit (xunit = Hz, origin, power, log, phase), together with the Y data (yname = spectrum), the frame axis, the sig.data and the design.
a = sig.input('bigfile', …);
s = sig.spectrum(a, 'Frame', …);
c = sig.centroid(s)
No operation is performed; each call returns a sig.design node.
c = aud.mfcc(b)

            a                        b                      c
package     sig                      aud                    aud
name        input                    spectrum               mfcc
input       'ragtime.wav'            a                      b
type        sig.signal               sig.Spectrum           sig.Mfcc
argin       'Extract', 0, 1          'Mel', 'log', 'Frame'  …
options     …                        …                      …
frame       no                       .05 s, .5              .05 s, .5
date        15.10.2014               15.10.2014             15.10.2014
ver         R2014b, MiningSuite 0.8  (same)                 (same)
sig.design
c = aud.mfcc('ragtime.wav', 'Frame')
The same three design objects are built implicitly: sig/input on 'ragtime.wav' ('Extract', 0, 1), aud/spectrum ('Mel', 'log', 'Frame'; frames of .05 s, .5) and aud/mfcc (type sig.Mfcc; frames of .05 s, .5).
sig.design.eval
sig.input(long audio file) → aud.spectrum → aud.mfcc
[Diagram: the MIRtoolbox operator dependency graph (miraudio, mirfilterbank, mirenvelope, mironsets, mirspectrum, mirautocor, mircepstrum, mirpeaks, mirpitch, mirtempo, mirpulseclarity, mireventdensity, mirattacktime, mirattackslope, mirchromagram, mirkeystrength, mirkey, mirmode, mirkeysom, mirtonalcentroid, mirmfcc, mirbrightness, mirrolloff, mirroughness, mirregularity, mirinharmonicity, mirzerocross, mirsegment, mirsum).]
Intensive operators: chunk decomposition needs to be performed here.
Extensive operators: mirenvelope needs to be computed completely before going further.
mireval
non-frame-based evaluation
a = miraudio('Design');
c = mirmfcc(a);
mireval(c, 'Folder')
eval_each(c, 'song1.wav')
c: @mirdesign, type = @mirmfcc, options: Rank = 1:13, Delta = 0, ...
   @mirdesign, type = @mirspectrum, options: Win = hamming, ...
a: @mirdesign, type = @miraudio, options: Sampling = 11025, ...
Chunk loading, decomposition and summation afterwards: a very complex paradigm for a situation that is not common in MIR practice. MiningSuite is much simpler.
sig.design.eval('Folder')
a = aud.mfcc('Folder', …)
d = a.eval

sig.design.display, sig.design.eval, sig.signal.display
a = aud.mfcc('bigfile', …) returns a sig.design object, storing only the data flow graph.
d = sig.ans
d = a.eval;
sig.design.display = sig.design.eval, which outputs a sig.signal and calls sig.signal.display.
sig.operate
operators main routine
+aud/brightness.m:
function varargout = brightness(varargin)
    varargout = sig.operate('aud', 'brightness', options, @init, @main, varargin);
end
aud.brightness('ragtime', 'Frame')
sig.operate first parses the arguments in the call.
my_operator(filename, 'Threshold', 100, 'Normalize')

thres.key = 'Threshold';
thres.type = 'Numeric';
thres.default = 50;
options.thres = thres;

norm.key = 'Normalize';
norm.type = 'Boolean';
norm.default = 0;
options.norm = norm;
options specification
my_operator(filename, 'Domain', 'Symbolic')

domain.key = 'Domain';
domain.type = 'String';
domain.choice = {'Signal', 'Symbolic'};
domain.default = 'Signal';
options.domain = domain;
@init
implicit data flow deployment
audio file → sig.input → sig.signal → sig.spectrum → sig.Spectrum → aud.brightness → sig.signal

        x = sig.spectrum(x);
    end
    type = 'sig.signal';
end
res = sig.compute(@routine, x.Ydata, x.xdata, option.cutoff);
sig.compute applies the routine to the successive segments, successive audio files, etc.
[Diagram: x.Ydata as nested cell arrays {{bins × freq. bands × tracks, …}, …}.]

e = d.apply(@algo, {f, f0}, {'element'}, 3);
out = {e};
end
res = sig.compute(@routine, x.Ydata, x.xdata, option.cutoff);

function y = algo(m, f, f0)
    y = sum(m(f > f0, :, :)) ./ sum(m);
end

sig.data.apply applies the sub-routine algo to the properly formatted data, with loops if needed.
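The brightness sub-routine above can be sketched in Python for a single frame (an illustration only, not MiningSuite code; the 1500 Hz cutoff is an assumed default for the example): the ratio of spectral energy above the cutoff to the total energy.

```python
def brightness(magnitudes, freqs, cutoff=1500.0):
    """Share of spectral energy above the cutoff frequency, mirroring
    the sum(m(f > f0,:,:)) ./ sum(m) routine above (one frame only)."""
    above = sum(m for m, f in zip(magnitudes, freqs) if f > cutoff)
    return above / sum(magnitudes)

freqs = [250, 750, 1250, 1750, 2250]   # bin centre frequencies (Hz)
mags = [4.0, 3.0, 1.0, 1.0, 1.0]       # spectrum magnitudes
print(brightness(mags, freqs))         # 2.0 / 10.0 = 0.2
```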
+aud/mfcc.m:
e = d.apply(@algo, {}, {'bin'}, 2);
out = {e};

function ceps = algo(m, …)
    ceps = mfccDCTMatrix * m;
end

d: bins × samples × tracks; m has 2 dimensions: bins × samples.
seq.event
note characterization
Each note is a seq.event:
characterization of the note: musical parameters (instance of class seq.paramstruct)
previous note
next note
mus.score(…, 'Group')
hierarchical grouping
mus.score('laksin.mid', 'Group')
[Figure: piano-roll (Eb4-B4) with hierarchical grouping.]
mus.score(…, 'Reduce')
ornamentation reduction
mus.score('laksin.mid', 'Group', 'Reduce')
Each syntagmatic relation between two notes is an instance of seq.syntagm.
[Figure: reduced piano-roll (Eb4-B4) with syntagmatic connections.]
mus.paramstruct
musical dimensions
Each note:
chromatic pitch (more general: chromatic pitch class, gross contour)
metrical position
channel
rhythmic value
harmony, etc.
seq.param
parameter management
seq.syntagm
s = seq.syntagm(n1, n2) connects note n1 to note n2.
mus.score
incremental approach
[Diagram: groups, meter, tonality and mode, ornamentation reduction and motif/pattern analysis are built incrementally on top of the notes.]
mus.score
incremental approach
[Diagram: a pat.pattern and its pat.occurrence instances, built on seq.syntagm relations between successive notes (with parameter values such as pitches G, F, Eb and intervals 0, +).]
5.
How you can contribute: open-source community
https://code.google.com/p/miningsuite/
Open-source project
Contribution acknowledgement
[Roadmap diagram (structural levels).]
Future directions
(on my side)

Future directions
(We need you!)

Acknowledgments
Learning to Create (Lrn2Cre8)