Ismir 2014 Tutorial (Mining Suite)
2014
https://code.google.com/p/miningsuite/
MiningSuite
A comprehensive framework for music analysis, articulating audio (MIRtoolbox 2.0) and symbolic approaches
Olivier Lartillot
Aalborg University, Denmark
Lrn2Cre8
Tutorial overview
1. General description: aims, architecture, syntax
2. Audio-based approaches (MIRtoolbox 2.0)
3. Symbolic approaches: MIDItoolbox 2.0 and new musicological models
4. How it works
5. How you can contribute: open-source community
1. General description
MiningSuite
Matlab framework
MIRtoolbox
MIRtoolbox history
European project Tuning the Brain for Music (NEST, FP6, 2006-08)
MIRtoolbox advantages
MIRtoolbox syntax
miraudio('ragtime.wav')
outputs a figure:
mirtempo('ragtime.wav')
a = miraudio(a, 'Center');
t = mirtempo(a)
MiningSuite syntax
sig.input('ragtime.wav')
outputs a figure
mus.tempo('ragtime.wav')
a = sig.input(a, 'Center');
t = mus.tempo(a)
MiningSuite history
MiningSuite
SIGMINR: signal processing
AUDMINR: audio, auditory modelling
MUSMINR: music analysis
PATMINR: pattern mining
SEQMINR: sequence processing
VOCMINR: voice analysis?
Packages
Signal domain
Sets of operators related to signal processing operations, audio and musical features, with versions specific to particular domains:
SIGMINR: sig.input, sig.spectrum, ...
AUDMINR: aud.spectrum, aud.mfcc, aud.brightness, ...
MUSMINR: mus.spectrum, mus.tempo, mus.key, ...
[Diagram: mus.tempo specializes sig.peaks, adding music-specific methods such as a BPM range and a BPM preference curve.]
Symbolic domain (scores, MIDI, transcription)
MUSMINR: mus.score
Software dependencies
How to start
help miningsuite
MiningSuite development
Started in 2010
Sneak Peek version 0.6 released in January 2014 for AES Semantic Audio: proof of concept, basic architecture.
Alpha version 0.8 released now for ISMIR 2014: focus on the architecture, not on the exact feature implementation. Not everything implemented. Results cannot be trusted!
2. Audio-based approaches
SIGMINR
signal processing
sig.input
sig.rms
sig.zerocross
sig.filterbank
sig.frame
sig.flux
sig.autocor
sig.spectrum
sig.peaks
sig.segment
sig.rolloff
sig.stat
sig.cepstrum
sig.simatrix
sig.envelope
sig.cluster
AUDMINR
aud.filterbank
aud.brightness
aud.attacktime
aud.roughness
aud.mfcc
aud.fluctuation
aud.envelope
aud.pitch
aud.novelty
aud.segment
aud.attackslope
aud.score
aud.eventdensity
MUSMINR
music theory
mus.spectrum
mus.chromagram
mus.keystrength
mus.pitch
mus.tempo
mus.pulseclarity
mus.metre
mus.key
mus.keysom
mus.mode
mus.score
sig.input
sig.input(myfile)
sig.input: transformations, extraction
.play, .save: sonification
a = sig.input(..., 'Trim')
a.play
a.save('newfile.mp3')
sig.frame
frame decomposition
f = sig.frame(..., 'FrameSize', l, 's')
f.play
f.save
[Figure: audio waveform, amplitude vs. time (s)]
sig.frame
frame decomposition
a = sig.input(...)
f = sig.frame(a)
s = sig.spectrum(f)
Or, with the 'Frame' option: s = sig.spectrum(..., 'Frame')
sig.spectrum
frequency spectrum
Discrete Fourier Transform of audio signal x:
X(k) = sum_{n=0}^{N-1} x(n) e^{-2 pi i k n / N}
Amplitude spectrum: s = sig.spectrum(...)
[Figure: spectrum, magnitude vs. frequency (Hz)]
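The amplitude spectrum can be sketched outside MiningSuite; a minimal NumPy illustration of the underlying DFT (the function name amplitude_spectrum is ours, not part of the toolbox):

```python
import numpy as np

# Minimal NumPy sketch (not MiningSuite code): the amplitude spectrum is the
# magnitude of the DFT, restricted here to the positive half-spectrum.
def amplitude_spectrum(x, sr):
    mags = np.abs(np.fft.rfft(x))              # |X(k)| for k = 0 .. N/2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return freqs, mags

# A 440 Hz sinusoid peaks at the bin nearest 440 Hz.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
freqs, mags = amplitude_spectrum(x, sr)
print(freqs[np.argmax(mags)])
```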
sig.spectrum
parameter specification
sig.spectrum(..., 'Min', 0): in Hz
sig.spectrum
post-processing
sig.spectrum(..., 'dB'): magnitude in dB scale
sig.spectrum(..., 'Smooth', o)
sig.spectrum(..., 'Gauss', o)
[Figures: spectrum in dB, magnitude vs. frequency (Hz); smoothed spectrum]
aud.spectrum
auditory modeling
aud.spectrum(..., 'Mel'): Mel-band decomposition
aud.spectrum(..., 'Bark'): Bark-band decomposition
aud.spectrum(..., 'Terhardt'): outer-ear modeling
aud.spectrum(..., 'Mask'): masking
[Figures: MelSpectrum, magnitude vs. Mel bands; BarkSpectrum, magnitude vs. Bark bands]
mus.spectrum
pitch-based distribution
mus.spectrum(..., 'Cents')
[Figures: spectrum, magnitude vs. frequency (Hz); spectrum remapped to pitch (in midicents); spectrum folded over cents (0-1200)]
sig.cepstrum
cepstral analysis
sig.cepstrum(..., 'Complex'): complex cepstrum
audio -> Fourier transform (sig.spectrum) -> Log -> Phase unwrap -> (Inverse) Fourier transform
[Figures: spectrum, magnitude vs. frequency (Hz); cepstrum, magnitude vs. quefrency (s)]
Bogert, Healy, Tukey. The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking. Symposium on Time Series Analysis, Chapter 15, 209-243, Wiley, 1963.
sig.cepstrum
cepstral analysis
sig.cepstrum(..., 'Real'): real cepstrum
audio -> Fourier transform (sig.spectrum) -> Abs -> Log -> (Inverse) Fourier transform
[Figures: spectrum, magnitude vs. frequency (Hz); cepstrum, magnitude vs. quefrency (s)]
sig.cepstrum
cepstral analysis
sig.cepstrum(..., 'Freq')
sig.cepstrum(..., 'Min', 0, 's')
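The real-cepstrum chain above (spectrum, log, inverse transform) can be sketched in a few lines of NumPy; a hedged illustration, not MiningSuite code:

```python
import numpy as np

# Sketch of the real-cepstrum chain (NumPy, not MiningSuite):
# audio -> |DFT| -> log -> inverse DFT.
def real_cepstrum(x):
    spectrum = np.abs(np.fft.fft(x))
    return np.real(np.fft.ifft(np.log(spectrum + 1e-12)))  # offset avoids log(0)

# For a periodic signal, the cepstrum peaks at the period (in samples):
# an idealized 100 Hz excitation at sr = 8000 has a period of 80 samples.
sr = 8000
x = np.zeros(sr)
x[::80] = 1.0
c = real_cepstrum(x)
quefrency = 20 + np.argmax(c[20:120])   # search away from the low-quefrency region
print(quefrency)
```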
sig.autocor
autocorrelation function
[Figure: waveform autocorrelation, coefficients Rxx vs. lag (s)]
sig.autocor
autocorrelation function
sig.autocor(..., 'Freq'): lags in Hz
sig.autocor
generalized autocorrelation
Standard autocorrelation: y = IDFT(|DFT(x)|^2)
Generalized autocorrelation: y = IDFT(|DFT(x)|^k)
[Figures: autocorrelation curves for different compression exponents k]
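A minimal NumPy sketch of the generalized autocorrelation formula above (not MiningSuite code; the exponent k = .67 echoes the compressed setting used later with the 'Generalized' option):

```python
import numpy as np

# Generalized autocorrelation y = IDFT(|DFT(x)|^k), sketched in NumPy.
# k = 2 gives the ordinary autocorrelation; k < 2 compresses the
# magnitude spectrum, sharpening the periodicity peaks.
def generalized_autocor(x, k=2.0):
    mag = np.abs(np.fft.fft(x))
    return np.real(np.fft.ifft(mag ** k))

sr = 4000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 200 * t)        # 200 Hz tone, period = 20 samples
ac = generalized_autocor(x, k=0.67)
lag = 5 + np.argmax(ac[5:40])          # first peak away from lag 0
print(lag)
```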
sig.flux
s = sig.spectrum(a, 'Frame')
sig.flux(s), equivalent to the shortcut sig.flux(a)
[Figure: spectral flux vs. frames]
c = sig.cepstrum(a, 'Frame')
sig.flux(c)
[Figure: cepstrum flux vs. frames]
sig.flux(..., 'Median', l, C)
[Diagram: audio level (sound, dynamics)]
sig.rms
root-mean-square energy
sig.rms(ragtime)
The RMS energy related to file ragtime is 0.017932
sig.rms(ragtime, 'Frame')
Default frame size: .05 s, frame hop: .5
[Figure: RMS energy vs. frames]
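The global and frame-wise RMS computations can be sketched in NumPy with the defaults quoted above (50 ms frames, 50% hop); an illustration, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of sig.rms (not MiningSuite code): global RMS, then
# frame-wise RMS with 50 ms frames and a hop of half a frame.
def rms(x):
    return np.sqrt(np.mean(x ** 2))

def frame_rms(x, sr, frame_s=0.05, hop=0.5):
    size = int(frame_s * sr)
    step = int(size * hop)
    starts = range(0, len(x) - size + 1, step)
    return np.array([rms(x[s:s + size]) for s in starts])

sr = 8000
x = np.concatenate([np.full(sr // 2, 0.5), np.zeros(sr // 2)])  # loud half, silent half
print(round(rms(x), 4))
print(frame_rms(x, sr)[0])   # first frame lies entirely in the loud half
```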
sig.envelope
envelope extraction
a: [Figure: audio waveform, amplitude vs. time (s)]
e = sig.envelope(a): [Figure: envelope, amplitude vs. time (s)]
e.play
e.save
sig.rms: [Figure: RMS energy vs. frames]
sig.envelope: [Figure: envelope, amplitude vs. time (s)]
sig.envelope(..., 'Filter'): based on low-pass filtering
full-wave rectification (abs) -> low-pass filter ('LowPass', 'HalfHann')
sig.envelope(..., 'Hilbert'): Hilbert transform (analytic signal) -> abs (magnitude)
sig.envelope(..., 'Spectro'): based on power spectrogram, with band decomposition
sig.envelope(..., 'UpSample', 2)
sig.envelope(..., 'Complex')
sig.envelope
post-processing
sig.envelope(..., 'Center')
sig.envelope(..., 'HalfWaveCenter'): half-wave rectified
sig.envelope(..., 'Diff'): differentiated
sig.envelope(..., 'HalfWaveDiff')
[Figures: envelopes after each post-processing option, amplitude vs. time (s)]
sig.envelope(..., 'Power')
sig.envelope(..., 'Normal')
aud.envelope
auditory modeling
aud.envelope(...) corresponds to sig.envelope(..., 'Spectro', 'UpSample', 'Log', 'HalfwaveDiff', 'Lambda', .8), following the envelope extraction of Klapuri et al.'s meter analysis system: each narrow band is processed separately, with dynamic compression and weighted differentiation of the interpolated envelopes. Lambda values between 0.6 and 1.0 perform almost equally well; lambda = 0.8 makes a slight but consistent improvement in accuracy.
sig.sum(e, 'Adjacent', ...): adjacent bands are summed.
[Figure: illustration of the dynamic compression and weighted differentiation steps for an artificial signal]
Klapuri, A., A. Eronen and J. Astola (2006). Analysis of the meter of acoustic musical signals. IEEE Transactions on Audio, Speech and Language Processing, 14-1, 342-355.
sig.filterbank
filterbank decomposition
audio -> filter1, filter2, ..., filterN
f.play
f.save
sig.filterbank(..., 'CutOff', [44, 88, 176, 352, 443])
[Figure: magnitude response (dB) of each band]
aud.filterbank
auditory modeling
aud.filterbank(...): Equivalent Rectangular Bandwidth (ERB) gammatone filterbank; a simple cochlear model can be formed by filtering an utterance with these filters.
Example (from Slaney's Auditory Toolbox documentation): the ten ERB filters between 100 and 8000 Hz are computed using
fcoefs = MakeERBFilters(16000,10,100);
The resulting frequency response is given by
y = ERBFilterBank([1 zeros(1,511)], fcoefs);
resp = 20*log10(abs(fft(y')));
freqScale = (0:511)/512*16000;
semilogx(freqScale(1:255),resp(1:255,:));
axis([100 16000 -60 0])
xlabel('Frequency (Hz)');
ylabel('Filter Response (dB)');
[Figure: gammatone filter frequency responses]
aud.filterbank
auditory modeling
aud.filterbank(..., 'Mel')
aud.filterbank(..., 'Bark')
aud.filterbank(..., 'Scheirer')
aud.filterbank(..., 'Klapuri')
[Figures: magnitude responses (dB) of each filterbank]
.sum
across-channel summation
f = sig.filterbank(...)
e = sig.envelope(f)
e.sum
[Figures: centered audio waveform; per-channel envelopes; summed envelope, amplitude vs. time (s)]
.sum
stereo summation
a = sig.input(..., 'Mix', 'No')
e = sig.envelope(a)
e.sum('channel')
[Figures: per-channel envelopes; summed envelope, amplitude vs. time (s)]
.sum
across-channel summary
f = sig.filterbank(...)
e = sig.envelope(f)
a = sig.autocor(e)
a.sum
[Figures: per-channel envelope autocorrelations; summed envelope autocorrelation, coefficients vs. lag (s)]
sig.peaks
peak picking
Border effects: o = 'Abscissa'
sig.peaks(..., 'Valleys')
sig.peaks(..., 'Threshold', 0)
[Figure: candidate peaks marked 'OK' or 'not OK' depending on the contrast between peaks and surrounding valleys]
sig.peaks
peak picking
sig.peaks(...): [Figure: waveform autocorrelation with peaks marked, coefficients vs. lag (s)]
sig.peaks(..., 'Extract'): [Figure: curve segments extracted around each peak]
sig.peaks(..., 'Only'): [Figure: peaks only, other values discarded]
sig.peaks(..., 'Track')
peak tracking
sig.segment
segmentation
s = sig.segment(myfile, [1 2 3])
s.play(1:2)
s.save
sig.spectrum(s)
[Figures: audio waveform with segmentation points, amplitude vs. time (s); MelSpectrum per segment]
sig.segment
segmentation
e = sig.envelope(myfile)
p = sig.peaks(e)
s = sig.segment(myfile, p)
[Figure: envelope with peaks selected as segmentation points, vs. time]
[Diagram: audio level (sound, dynamics); symbolic level (notes)]
aud.score
onset detection
e = sig.envelope(myfile)
p = sig.peaks(e, 'Contrast', .1)
s = aud.score(p)
In short: [s p] = aud.score(myfile, 'Contrast', .1)
aud.sequence, seq.event, seq.syntagm
[Figures: envelope with detected peaks; resulting note onsets, vs. time]
mironsets
onset detection (poster @ ISMIR 2008)
[Flowchart: mirrms, mirautocor, mirsimatrix, mirnovelty, mirframe, mirspectrum, mirflux ('Complex'), mirfilterbank ('Mel', Gammatone, Scheirer, Klapuri), mirenvelope (power, diff, simple, order 8, auto-regressive, half-Hanning, halfwave, log)]
aud.score(..., 'Scheirer')
aud.score(..., 'Klapuri')
aud.eventdensity
temporal density of events
a = aud.score(audio.wav)
aud.eventdensity(a)
aud.eventdensity(audio.wav, 'Frame')
aud.eventdensity(score.mid)
[Figures: detected events; event density vs. time]
[Diagram: audio level (sound, dynamics, timbre); symbolic level (notes)]
aud.score(..., 'Attack', 'Release')
aud.attacktime
duration of note attacks
[Figure: centered envelope with attack phases marked, amplitude vs. time (s)]
Krimphoff, J., McAdams, S. & Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II: Analyses acoustiques et quantification psychophysique. Journal de Physique, 4(C5), 625-628.
aud.attackslope
average slope of note attacks
aud.attackslope(o)
[Figure: centered envelope with attack slopes marked, amplitude vs. time (s)]
Peeters, G. (2004). A large set of audio features for sound description (similarity and classification) in the CUIDADO project, version 1.0.
sig.zerocross
waveform sign-change rate
Indicator of noisiness
[Figure: zoomed audio waveform crossing zero, amplitude vs. time (s)]
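The sign-change rate is simple enough to sketch directly; a NumPy illustration (not MiningSuite code):

```python
import numpy as np

# NumPy sketch of sig.zerocross (not MiningSuite code): the rate of
# sign changes per second, a simple indicator of noisiness.
def zerocross_rate(x, sr):
    signs = np.sign(x)
    crossings = np.sum(signs[1:] != signs[:-1])
    return crossings * sr / len(x)

# A 100 Hz sine crosses zero 200 times per second.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 100 * t + 0.1)   # phase offset avoids exact-zero samples
print(zerocross_rate(x, sr))
```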
sig.rolloff
high-frequency energy (I)
[Figure: spectrum with the rolloff frequency marked at 5640.53 Hz, magnitude vs. frequency (Hz)]
aud.brightness
high-frequency energy (II)
[Figure: spectrum with the 1500 Hz cutoff marked, magnitude vs. frequency (Hz)]
aud.brightness(..., 'Unit', u), u = '/1' or '%'
aud.brightness
high-frequency energy (II)
aud.brightness(..., 'Frame')
[Figures: frame-by-frame brightness curves for two sounds, coefficient value vs. temporal location of events (in s.)]
aud.mfcc
mel-frequency cepstral coefficients
sig.cepstrum: audio -> sig.spectrum -> Abs -> Log -> (Inverse) Fourier transform
aud.mfcc: audio -> sig.spectrum -> Abs -> (Mel) -> Log -> Discrete Cosine Transform
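The aud.mfcc chain can be sketched in NumPy; a simplified illustration under assumed filterbank details (26 triangular mel bands, 13 coefficients), not the MiningSuite implementation:

```python
import numpy as np

# Simplified NumPy sketch of the MFCC chain (not MiningSuite code):
# magnitude spectrum -> mel-spaced triangular filterbank -> log -> DCT-II.
# Band count and mel formula are illustrative assumptions.
def mel(f):
    return 2595 * np.log10(1 + f / 700)

def mfcc(x, sr, n_bands=26, n_coefs=13):
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1 / sr)
    # Band edges equally spaced on the mel scale, mapped back to Hz.
    edges = np.interp(np.linspace(0, mel(sr / 2), n_bands + 2), mel(freqs), freqs)
    energies = []
    for lo, mid, hi in zip(edges, edges[1:], edges[2:]):
        w = np.clip(np.minimum((freqs - lo) / (mid - lo + 1e-9),
                               (hi - freqs) / (hi - mid + 1e-9)), 0, None)
        energies.append(np.log(w @ mags + 1e-12))
    energies = np.array(energies)
    n = np.arange(n_bands)
    dct = np.cos(np.pi * np.outer(np.arange(n_coefs), n + 0.5) / n_bands)
    return dct @ energies   # the DCT decorrelates the log-mel bands

sr = 8000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
c = mfcc(x, sr)
print(c.shape)
```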
aud.roughness
sensory dissonance
aud.roughness(..., 'Sethares')
Dissonance produced by two sinusoids depending on their frequency ratio
sig.spectrum -> sig.peaks -> dissonance estimated for each pair of peaks -> sum
Plomp & Levelt. "Tonal Consonance and Critical Bandwidth". Journal of the Acoustical Society of America, 1965.
Sethares, W. A. (1998). Tuning, Timbre, Spectrum, Scale. Springer-Verlag.
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes)]
aud.pitch
f0 estimation
p = aud.pitch(..., 'Autocor'): aud.filterbank -> sig.frame -> sig.autocor -> .sum -> sig.peaks
aud.pitch(..., 'AutocorSpectrum'): sig.spectrum -> sig.autocor
aud.pitch(..., 'Cepstrum'): sig.cepstrum
aud.pitch(..., 'Frame', ...)
aud.pitch
f0 estimation
aud.pitch(..., 'Tolonen'), equivalent to aud.pitch(..., '2Channels', 'Enhanced', 2:10, 'Generalized', .67)
aud.pitch
f0 estimation
ac = sig.autocor(ragtime, 'Frame')
ac = sig.autocor(ac, 'Enhanced', 2:10)
aud.pitch(ac)
p.play
p.save
[Figures: frame-by-frame waveform autocorrelation; enhanced autocorrelation; resulting pitch curve, frequency (Hz) vs. temporal location of frames (in s.)]
aud.pitch
f0 estimation
aud.pitch(..., 'Frame')
[Figures: frame-by-frame pitch curves for s103.wav, frequency (Hz) vs. temporal location of events (in s.)]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes)]
mus.pitch
pitch segmentation and discretisation
[Figure: pitch curve of an istikhbar segmented into discrete notes, vs. temporal location of events (in s.)]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter); structural levels]
aud.fluctuation
rhythmic periodicity along auditory channels
aud.spectrum('Frame', .023, .5, 'Terhardt', 'Bark', 'Mask', 'dB')
aud.spectrum(..., 'AlongBands', 'Max', 10, 'Window', 'No', 'Resonance', 'Fluctuation')
sig.sum
[Figures: frame-by-frame Bark spectrogram; fluctuation spectrum per band; summed fluctuation, magnitude vs. frequency (Hz)]
mus.tempo
tempo estimation
e = sig.envelope(mysong, 'Diff')
ac = sig.autocor(e)
pa = sig.peaks(ac, 'Total', 1)
mus.tempo(pa)
In short: t = mus.tempo(mysong)
t = t.eval
t = {sig.signal, sig.Autocor}
[Figures: onset curve (differentiated envelope), amplitude vs. time (s); envelope autocorrelation with the selected peak, coefficients vs. lag (s)]
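The chain above (onset curve, autocorrelation, peak, BPM) can be sketched in NumPy; an illustration of the principle, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of the tempo chain (not MiningSuite code): the onset curve's
# autocorrelation peaks at the beat period; tempo in BPM is 60 / period.
sr = 100                                   # onset curve sampled at 100 Hz
t = np.arange(10 * sr) / sr
onset = np.clip(np.sin(2 * np.pi * 2 * t), 0, None)   # pulses at 2 Hz = 120 BPM
ac = np.correlate(onset, onset, mode='full')[len(onset) - 1:]
lags = np.arange(len(ac)) / sr
valid = (lags >= 0.2) & (lags <= 2.0)      # 30-300 BPM search range
period = lags[valid][np.argmax(ac[valid])]
print(60 / period)
```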
mus.tempo
tempo estimation
e = sig.envelope(mysong, 'Diff')
ac = aud.autocor(e, 'Resonance')
The resonance curve (Toiviainen & Snyder, 2003) emphasises periodicities that are more easily perceived.
[Figure: resonance-weighted autocorrelation, vs. lags (s)]
mus.tempo
tempo estimation
e = sig.envelope(mysong, 'Diff')
ac = aud.autocor(e, 'Frame', 'Resonance')
pa = sig.peaks(ac, 'Total', 1)
mus.tempo(pa)
In short: t = mus.tempo(mysong, 'Frame')
t = t.eval
t = {sig.signal, sig.Autocor}
[Figures: onset curve (differentiated envelope), amplitude vs. time (s); frame-by-frame tempo curve, coefficient value (in bpm) vs. temporal location of events (in s.)]
mus.pulseclarity
rhythmic clarity, beat strength
mus.pulseclarity(..., 'Enhanced', 'No')
mus.pulseclarity(..., 'Enhanced', 'Yes')
Statistics on the envelope autocorrelation curve: 'MaxAutocor', 'MinAutocor', 'KurtosisAutocor', 'EntropyAutocor', 'InterfAutocor', 'TempoAutocor'
[Figures: envelope autocorrelation curves, amplitude vs. lag (s), with peaks at e.g. .5 s (120 BPM) and 1 s]
mus.metre
Metrical levels: 8
Dominant levels
7
6 .
5
4
3 .
2
1
0.5
2.5
mus.metre
metrical hierarchy tracking
mus.metroid
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter, tonality/mode); structural levels]
mus.chromagram
energy distribution along pitches
s = sig.spectrum(a, 'dB', 20, 'Min', 100, 'Max', 6400)
c = mus.chromagram(s, 'Wrap', 'no')
c = mus.chromagram(c, 'Wrap', 'yes')
[Figures: spectrum (dB), magnitude vs. frequency (Hz); unwrapped chromagram (G2 to C8); wrapped chromagram over the 12 chroma classes]
mus.chromagram
chroma resolution
[Figures: frame-by-frame chromagrams at different chroma resolutions, chroma class vs. time (s)]
mus.keystrength
probability of key candidates
Cross-correlation of the mus.chromagram with profiles for each key candidate: C major, C minor, C# major, C# minor, D major, D minor, etc.
mus.key
key estimation
sig.peaks(mus.keystrength(...))
mus.key(..., 'Total', 1)
mus.key
key estimation
k = mus.key(mysong, 'Frame')
k = k.eval
k = {sig.signal, sig.signal, mus.Keystrength}
[Figures: frame-by-frame key estimates (maj/min); key strength for all 24 candidates (CM to Bm) vs. time (s); key clarity (tonal center), coefficient value vs. temporal location of events (in s.)]
mus.key
key estimation
[Figures: frame-by-frame key estimates (maj/min) for several recordings (Umile01, son32.wav, son27.wav, son5.wav), key vs. temporal location of events (in s.)]
[k c] = mus.key
key clarity
[Figures: key clarity curves, from clear to unclear tonality, coefficient value vs. temporal location of events (in s.)]
mus.mode
mode estimation
[Figures: major/minor mode curves for several recordings (son28.wav, son22.wav), coefficient value vs. temporal location of events (in s.)]
mus.keysom
self-organizing map
[Figure: self-organizing map projection of the chromagram, with major keys (upper case) and minor keys (lower case) arranged on a toroidal map]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter, tonality/mode); structural levels]
onset-based segmentation
o = aud.onsets(audiofile, 'Attack', 'Release')
sig.segment(audiofile, o)
[Figure: audio waveform with onset-based segmentation points, amplitude vs. time (s)]
onset-based segmentation
s = sig.segment(audiofile, o)
mus.pitch(s)
[Figures: segmented audio waveform, amplitude vs. time (s); pitch per segment, ragtime]
onset-based segmentation
s = sig.segment(audiofile, o)
mus.key(s)
[Figure: segmented audio waveform, amplitude vs. time (s)]
[Diagram: audio level (sound, dynamics, timbre, pitch); symbolic level (notes, meter, tonality/mode); structural levels (groups)]
sig.simatrix
similarity matrix
The sole requirement is that similar source audio samples produce similar features. The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of digital audio as {v_i in IR^B : i = 1, ..., N}. Using a generic similarity measure d : IR^B x IR^B -> IR, embed the data in a matrix S, as illustrated in the right panel of Figure 1:
S(i, j) = d(v_i, v_j), i, j = 1, ..., N
The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along the main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance, with v_i and v_j representing the spectrograms for sample times i and j, respectively:
d_cos(v_i, v_j) = <v_i, v_j> / (|v_i| |v_j|)
There are many possible choices for the similarity measure, including measures based on the norm or non-negative exponential measures, such as:
d_exp(v_i, v_j) = exp(d_cos(v_i, v_j) - 1)
The left panel of Figure 1 shows a similarity matrix computed from the song Wild Honey.
sig.spectrum(..., 'Frame')
sig.simatrix(..., 'Distance', 'cosine')
[Figures: similarity and dissimilarity matrices, temporal location of frame centers (in s.)]
Foote, J., and M. Cooper. Media Segmentation using Self-Similarity Decomposition. SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.
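The cosine-similarity matrix is easy to sketch directly; a NumPy illustration (not MiningSuite code):

```python
import numpy as np

# NumPy sketch of a cosine self-similarity matrix (not MiningSuite code):
# S(i, j) = <v_i, v_j> / (|v_i| |v_j|) for every pair of frame vectors.
def simatrix(frames):
    """frames: array of shape (n_frames, n_features)."""
    norms = np.linalg.norm(frames, axis=1, keepdims=True)
    unit = frames / np.maximum(norms, 1e-12)   # normalize each frame vector
    return unit @ unit.T                        # all pairwise cosines at once

frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
S = simatrix(frames)
print(S)
```

Frames 0 and 1 are identical, so S[0, 1] is 1; frame 2 is orthogonal to them, so S[0, 2] is 0.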
aud.novelty
novelty curve
sig.simatrix(a)
aud.novelty(a, 'KernelSize', s)
[Figure: similarity matrix, s = 128 samples, temporal location of frame centers (in s.)]
Small kernel: detailed, short-term novelty. Large kernel: broader structural transitions.
[Figures: novelty curves for increasing kernel sizes, coefficient value vs. temporal location of frames (in s.)]
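Foote's novelty computation behind aud.novelty can be sketched in NumPy; a hedged illustration in which the checkerboard kernel size and shape are simplified assumptions:

```python
import numpy as np

# NumPy sketch of a novelty curve (not MiningSuite code): slide a
# checkerboard kernel along the diagonal of the similarity matrix;
# peaks in the resulting curve mark transitions between sections.
def novelty(S, size=4):
    half = size // 2
    kernel = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((half, half)))
    n = len(S)
    curve = np.zeros(n)
    for i in range(half, n - half):
        curve[i] = np.sum(kernel * S[i - half:i + half, i - half:i + half])
    return curve

# Two homogeneous sections give one novelty peak at the boundary (index 8).
S = np.ones((16, 16))
S[:8, 8:] = 0
S[8:, :8] = 0
nv = novelty(S)
print(int(np.argmax(nv)))
```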
aud.segment
novelty-based segmentation
sg = sig.segment(mysong)
s = aud.mfcc(sg)
sm = sig.simatrix(s)
nv = aud.novelty(sm)
sg = sig.segment(mysong, nv)
sg.play
[Figures: novelty curve; segmented audio waveform, amplitude vs. time (s); MFCC, coefficient ranks vs. time axis (in s.)]
sig.cluster
clustering of audio segments
sg = aud.segment(a)
cc = aud.mfcc(sg)
sig.cluster(sg, cc)
[Figures: segmented audio waveform; MFCCs per segment; clustered segments, vs. time axis (in s.)]
sig.cluster
clustering of frame-decomposed features
cc = aud.mfcc(..., 'Frame')
sig.cluster(cc, 4)
[Figures: frame-by-frame MFCCs before and after clustering into 4 clusters, coefficient ranks vs. temporal location of beginning of frame (in s.)]
sig.stat
basic statistics
mean
standard deviation
temporal slope
main periodicity:
frequency
amplitude
periodicity entropy
sig.hist
histogram
sig.hist(..., 'Number', 10)
[Figure: histogram, number of occurrences vs. values]
sig.zerocross
sign-change rate, applicable to any curve
[Figure: zoomed audio waveform crossing zero, amplitude vs. time (s)]
sig.centroid
geometric center
first moment: mu1 = integral of x f(x) dx
sig.spread
variance, dispersion
second moment: sigma^2 = mu2 = integral of (x - mu1)^2 f(x) dx
standard deviation: sigma = sqrt(mu2)
sig.skewness
non-symmetry
third moment: mu3 = integral of (x - mu1)^3 f(x) dx
third standardized moment: mu3 / sigma^3 (< 0 or > 0 depending on the direction of the asymmetry)
sig.kurtosis
peakedness
fourth moment: mu4 = integral of (x - mu1)^4 f(x) dx
fourth standardized moment: mu4 / sigma^4
excess kurtosis: mu4 / sigma^4 - 3
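The four moment operators can be sketched together in NumPy, treating a curve as a distribution f(x); an illustration, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of the moment operators (not MiningSuite code):
# centroid, spread, standardized skewness, excess kurtosis of a curve f(x).
def moments(x, f):
    f = f / np.sum(f)                      # normalize to a distribution
    mu1 = np.sum(x * f)                    # sig.centroid
    mu2 = np.sum((x - mu1) ** 2 * f)       # sig.spread (variance)
    sigma = np.sqrt(mu2)
    skew = np.sum((x - mu1) ** 3 * f) / sigma ** 3       # sig.skewness
    kurt = np.sum((x - mu1) ** 4 * f) / sigma ** 4 - 3   # sig.kurtosis (excess)
    return mu1, mu2, skew, kurt

# A symmetric triangular distribution centered at 5: skewness should be 0.
x = np.arange(11.0)
f = np.maximum(5.0 - np.abs(x - 5), 0)
mu1, mu2, skew, kurt = moments(x, f)
print(round(mu1, 6), round(mu2, 6))
```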
sig.flatness
smooth vs. spiky
geometric mean divided by arithmetic mean:
( prod_{n=0}^{N-1} x(n) )^{1/N} / ( (1/N) sum_{n=0}^{N-1} x(n) )
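The flatness ratio is one line of NumPy (computed in the log domain for numerical safety); an illustration, not MiningSuite code:

```python
import numpy as np

# NumPy sketch of sig.flatness (not MiningSuite code): geometric mean
# over arithmetic mean; 1 for a flat curve, near 0 for a spiky one.
def flatness(x):
    x = np.asarray(x, dtype=float)
    geo = np.exp(np.mean(np.log(x + 1e-12)))   # geometric mean, log domain
    return geo / np.mean(x)

print(round(flatness(np.ones(8)), 6))          # flat curve
spiky = np.array([0.0] * 7 + [8.0])
print(flatness(spiky))                          # spiky curve, near 0
```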
sig.entropy
relative Shannon entropy
p = 1: normalization
Shannon, C. E. 1948. A Mathematical Theory of Communication. Bell Systems Technical Journal 27:379-423.
sig.export
exportation of statistical data to files
m = aud.mfcc(...)
aud.play
feature-based playlist
zc = sig.zerocross('Folder');
sig.dist
distance between features
[Figures: Mel spectra of the two compared files, magnitude vs. Mel bands]
sig.dist
distance between clusters
c1 = aud.mfcc(a, 'Frame')
c2 = aud.mfcc(b, 'Frame')
c1 = sig.cluster(c1, 16)
c2 = sig.cluster(c2, 16)
sig.dist(c1, c2)
[Figures: frame-by-frame MFCCs of the two files, coefficient ranks vs. temporal location of beginning of frame (in s.)]
sig.dist
distance between features
s1 = aud.spectrum(a, 'Mel')
s2 = aud.spectrum('Folder', 'Mel')
sig.dist(s1, s2)
[Figures: Mel spectra of the compared files, magnitude vs. Mel bands]
.getdata
returns data in Matlab format
Results encapsulate the numerical data together with the related sampling rates, file name, etc.
s = sig.spectrum(file); s.getdata -> vector
s = sig.spectrum(file, 'Frame'); s.getdata -> matrix
f = sig.filterbank(file, 'Frame'); f.getdata -> matrix
sg = sig.segment(file); f = sig.filterbank(sg, 'Frame'); f.getdata -> cell array of matrices
.getpeakpos, .getpeakval
p = sig.peaks(...); p.getpeakpos -> vector; p.getpeakval -> vector
a = miraudio(bigfile)
f = mirframe(a)
s = mirspectrum(f)
mircentroid(s)
mircentroid(bigfile, 'Frame')
a = miraudio('Design', ...)
s = mirspectrum(f, 'Frame', ...)
c = mircentroid(s)
mireval(c, bigfile)
[Flowchart: mirframe -> mirspectrum -> mircentroid]
a = sig.input(bigfile, );
sig.input
s = sig.spectrum(f, Frame, );
c = sig.centroid(s)
Frame
; No operation is performed.
sig.spectrum
sig.centroid
sig.design
data flow graph design
a = sig.input('bigfile', …);
s = sig.spectrum(a);
c = sig.centroid(s)
d = sig.ans
d = c.eval
d = c.getdata
sig.design.eval
data flow graph evaluation
a = sig.input('bigfile', …);
s = sig.spectrum(a);
c = sig.centroid(s)
c.eval
c.eval('anotherfile')
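The design/eval mechanism can be sketched in Python (an illustration only, not MiningSuite code; node operations are invented stand-ins): building the graph performs no computation, and evaluation is triggered explicitly, reusable on other inputs.

```python
class Design:
    """Toy data-flow node: constructing it performs no computation;
    eval() runs the stored chain of operations on a given input."""

    def __init__(self, func, parent=None):
        self.func = func      # the operation of this node
        self.parent = parent  # upstream node, if any

    def eval(self, value):
        if self.parent is not None:
            value = self.parent.eval(value)  # evaluate upstream first
        return self.func(value)

# Designing the graph performs no operation yet:
a = Design(lambda x: x)                      # stand-in for an input node
s = Design(lambda x: [v * v for v in x], a)  # stand-in "spectrum"
c = Design(lambda x: sum(x) / len(x), s)     # stand-in "centroid"

# Evaluation is triggered explicitly and can be reused on other inputs:
print(c.eval([1, 2, 3]))  # mean of squares: 14 / 3
```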
sig.design.show
data flow graph display
a = sig.input();
s = sig.spectrum(a);
c = sig.centroid(s)
c.show
sig.signal.design
design stored in the results
a = sig.input(…);
s = sig.spectrum(a);
c = sig.centroid(s);
d = c.eval
save result.mat d

1 year later:
load result.mat
d.design
d.design.show
3.
Symbolic approaches
MIDI Toolbox
nmat = readmidi('laksin.mid')
nmat =
pianoroll(nmat)
[Figure: piano-roll of the melody, pitch C4#-C5# against time in beats.]
MIDI data (columns: onset in beats, duration in beats, MIDI channel, pitch, velocity, onset in s, duration in s):

      0    0.9000    0   64.0000   82.0000        0   0.5510
 1.0000    0.9000    0   71.0000   89.0000   0.6122   0.5510
 2.0000    0.4500    0   71.0000   82.0000   1.2245   0.2755
 2.5000    0.4500    0   69.0000   70.0000   1.5306   0.2755
 3.0000    0.4528    0   67.0000   72.0000   1.8367   0.2772
 3.5000    0.4528    0   66.0000   72.0000   2.1429   0.2772
 4.0000    0.9000    0   64.0000   70.0000   2.4490   0.5510
 5.0000    0.9000    0   66.0000   79.0000   3.0612   0.5510
 6.0000    0.9000    0   67.0000   85.0000   3.6735   0.5510
 7.0000    1.7500    0   66.0000   72.0000   4.2857   1.0714
 9.0000    0.4528    0   64.0000   74.0000   5.5102   0.2772
 9.5000    0.4528    0   67.0000   81.0000   5.8163   0.2772
10.0000    0.9000    0   71.0000   83.0000   6.1224   0.5510
11.0000    0.4528    0   71.0000   78.0000   6.7347   0.2772
11.5000    0.4528    0   69.0000   73.0000   7.0408   0.2772
12.0000    0.4528    0   67.0000   71.0000   7.3469   0.2772
12.5000    0.4528    0   66.0000   69.0000   7.6531   0.2772
mus.score
score excerpt selection
mus.score(…, 'Channel', 1)
mus.score(…, 'Trim')
mus.score
score information
pianoroll(laksin, 'name', 'sec', 'vel');
[Figure: piano-roll (C4#-C5#) with note names, time in seconds, and velocities.]
mus.score
score information
mus.score('laksin.mid')
mus.save
mus.play
[Figure 1: piano-roll notation of the two first phrases of "Läksin minä kesäyön", pitch against time in seconds; the lower panel shows the velocity information.]
mus.pitch
pitch contour
m = mus.score(myfile)        (m is of class mus.sequence)
s = mus.pitch(m)             (s is of class sig.signal)
h = mus.hist(s, 'Class')
[Figure: pitch contour (MIDI pitch 63-71) and the resulting pitch-class histogram.]
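The pitch-class histogram idea can be sketched in Python (an illustration, not MiningSuite code), using the MIDI pitches of the example melody listed in the note matrix earlier:

```python
NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F',
         'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_class_histogram(midi_pitches):
    """Count the occurrences of each pitch class."""
    hist = {name: 0 for name in NAMES}
    for p in midi_pitches:
        hist[NAMES[p % 12]] += 1
    return hist

# MIDI pitches of the example melody (from the note matrix)
melody = [64, 71, 71, 69, 67, 66, 64, 66, 67,
          66, 64, 67, 71, 71, 69, 67, 66]
h = pitch_class_histogram(melody)
print(h['F#'])  # MIDI pitch 66 (F#) occurs 4 times
```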
mus.pitch('Inter')
pitch interval contour
m = mus.score(myfile)
i = mus.pitch(m, 'Inter')
h = mus.histo(i)
mus.histo(i, 'Sign', 0)
[Figure: pitch interval contour (from -2 to +7 semitones) and the corresponding interval histograms.]
mus.pitch
pitch contour
m = mus.score(myfile)
c = mus.pitch(m, 'Sampling', .25)
mus.autocor(c)
[Figure: sampled pitch contour and its autocorrelation curve, lags from 0.5 to 4.5 s.]
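What the autocorrelation of a sampled contour reveals can be sketched in Python (an illustration, not MiningSuite code): a periodic contour produces a strong autocorrelation peak at its period.

```python
def autocor(x):
    """Normalized autocorrelation of a sampled contour (lags 0..n-1)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[i] * d[i + lag] for i in range(n - lag)) / denom
            for lag in range(n)]

contour = [0, 1, 0, -1] * 8   # toy contour with a period of 4 samples
r = autocor(contour)
print(r[0])        # 1.0 at lag 0
print(r[4] > 0.8)  # strong peak at the period: True
```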
mus.pitch
pitch contour
m1 = mus.score(myfile, 'EndTime', 5)
m2 = mus.score(myfile, 'StartTime', 5)
p1 = mus.pitch(m1, 'Sampling', .25)
p2 = mus.pitch(m2, 'Sampling', .25)
[Figure: pitch contours of the first and second halves of the melody.]
[Diagram, after Lerdahl & Jackendoff's GTTM: phenomenal accents at the musical surface feed the metrical structure and the grouping structure (preference rules GPR 2+3, GPR 6, GPR 7, MPR 1, parallelism; partly intrinsic, partly cultural/extrinsic), up to the time-span and prolongational reductions.]
[Roadmap diagram, structural levels: at the audio level, sound, dynamics and timbre; at the symbolic level, notes, pitch, meter, tonality, mode, and groups.]
mus.score(…, 'Segment')
local segmentation
[Score example with segmentation boundaries.]
(Bod, 2002)
mus.score(…, 'Segment')
local segmentation
segmentgestalt(laksin, 'fig');
[Figure: piano-roll (time in beats) with Gestalt-based segmentation boundaries.]
Another segmentation technique uses probabilities derived from corpus analysis (Bod, 2002), although the correct segmentation is more in line with Tenney & Polansky's model. Finally, the Local Boundary Detection Model by Cambouropoulos (1997) is a more recent variant of the rule-based models that offers effective segmentation of monophonic input.
mus.score(…, 'Segment')
local segmentation
boundary(laksin, 'fig');
[Figure: piano-roll (time in beats) with the corresponding boundary-strength curve.]
(Cambouropoulos, 1997)
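The flavour of such boundary-strength curves can be sketched in Python (a simplified sketch in the spirit of local boundary detection, not Cambouropoulos's exact formulation; weights and normalization are invented for the example): a boundary is stronger where the inter-onset interval and the pitch interval change the most.

```python
def boundary_strengths(onsets, pitches):
    """Boundary strength between consecutive notes, from the normalized
    change in inter-onset interval (IOI) and in pitch interval."""
    iois = [b - a for a, b in zip(onsets, onsets[1:])]
    ints = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    strengths = []
    for i in range(1, len(iois)):
        d_ioi = abs(iois[i] - iois[i - 1]) / max(iois[i] + iois[i - 1], 1e-9)
        d_int = abs(ints[i] - ints[i - 1]) / max(ints[i] + ints[i - 1], 1e-9)
        strengths.append(d_ioi + d_int)
    return strengths

onsets = [0, 0.5, 1.0, 1.5, 3.0, 3.5, 4.0]   # long gap before the 5th note
pitches = [64, 66, 67, 66, 71, 69, 67]
s = boundary_strengths(onsets, pitches)
print(s.index(max(s)))  # the boundary at the long IOI scores highest
```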
mus.score(…, 'Segment')
local segmentation
[Figure: piano-roll (time in beats) with probabilistic boundary strengths, and the corresponding score excerpt.]
(Bod, 2002)
mus.score(…, 'Group')
hierarchical grouping
[Score example annotated with hierarchical grouping.]
mus.score(…, 'Group')
hierarchical grouping
[Figure: piano-roll (Eb4-B4) with hierarchical grouping.]
[Roadmap diagram (structural levels), now including ornamentation reduction.]
mus.score(…, 'Reduce')
ornamentation reduction
(Lerdahl & Jackendoff)
[Score example: reduction tree selecting the head of each group.]
mus.score(…, 'Reduce')
ornamentation reduction
mus.minr('laksin.mid', 'Group', 'Reduce')
Construction of a syntagmatic network.
[Figure: reduced piano-roll (Eb4-B4) with syntagmatic connections between notes.]
[Roadmap diagram (structural levels), now including motifs and pattern analysis.]
motivic pattern analysis
[Score example annotated with motives A and B and their variants (A@, B@, B@@), together with their computational descriptions: melodic content such as (D,E,F,G,A,Bb,C#,Bb,A,G,F,E,F) or interval sequences such as (-2,+1,+1,+1,+1,-6), with rhythmic values in 8th or 16th notes; some occurrences are not detected.]
motivic pattern analysis
Computer analysis
[Score annotated with motives A, B, C and their variants (A@, B@, B@@).]
motivic pattern analysis
Bach Invention in D minor

Name    Kresky's interpretation        Melodic                                Rhythmic
A       Opening motive                 (D,E,F,G,A,Bb,C#,Bb,A,G,F,E,F)         16th notes
A'      First sequence unit            (-2,+1,+1,+1,+1,-6,+6,-1,-1,-1,-1,+1)  16th notes
B       Accompanying figure            (F,A,D,+,-,+,+,-)                      8th notes
a       Motive shape                   (prefix of A)
a1      Reversed motive shape          (not detected)
a2=c    Triadic structure              (+,+,-)                                8th notes
a3=a3'  Retarded scalar climb          (-2,+1,+1,+1,+1,-6)                    16th notes
a4=C    Bass line                      (+7,-5,+1,+1,+1,-6)                    8th notes
a5      Second sequence unit           (+1,+1,-2,+1,+1,-6)                    16th notes
a6      (not considered)               (-,+1,+1,+1)                           16th notes
b       Three-note scale               (+1,+1)                                16th notes
b'      Identifying a5 with a3         (D5,E5,F5)                             16th notes
b''     Same, transposed               (C5,D5,E5)                             16th notes
b'''    Three-note shape in bass line  (not detected)
D       (not considered)               (-1,+1,+1,+1)                          16th, except 1st
E       (not considered)               (E5,F5,D5,E5,F5)                       16th, starting offbeat
motivic pattern analysis
[Score example annotated note by note with pitches (G, Eb, Ab, D, F, …), intervals (+3P, -2m, -2M, 0, +, -) and rhythmic values (.5).]
motivic pattern analysis
[Score of a polyphonic excerpt, with voices labelled U1-U3, M1-M2 and L1-L3.]
mus.score(…, 'Motif')
motivic pattern analysis
mus.minr(beethoven, 'Motif')
[Figure: detected motivic occurrences plotted against pitch and time (0-500).]
structure complexity
Pattern extraction, then pattern selection (longest, most frequent, ...): cannot be computed exhaustively.
Imperfections of the selection phase: expected patterns accidentally deleted, or insufficiently selective.
Improvement of the extraction process: closed patterns.
closed patterns
ABCDECDEABCDECDE
[Diagram: each repetition of a pattern receives a compact description.]
Complete and compact incremental approach:
C D A B C D E A B C D E
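The closed-pattern idea can be sketched in Python on symbolic strings (an illustration of the concept only, not the MiningSuite/PatMinr algorithm, which works incrementally): among all repeated substrings, keep only those that cannot be extended without losing occurrences.

```python
from collections import defaultdict

def repeated_patterns(s, minlen=2):
    """All substrings of length >= minlen occurring at least twice."""
    occ = defaultdict(list)
    for length in range(minlen, len(s)):
        for i in range(len(s) - length + 1):
            occ[s[i:i + length]].append(i)
    return {p: pos for p, pos in occ.items() if len(pos) >= 2}

def closed_patterns(s, minlen=2):
    """Keep only 'closed' patterns: repeated substrings that cannot be
    extended without losing occurrences."""
    rep = repeated_patterns(s, minlen)
    return {p: pos for p, pos in rep.items()
            if not any(p in q and len(q) > len(p) and len(rep[q]) == len(pos)
                       for q in rep)}

print(sorted(closed_patterns("ABCDECDEABCDECDE")))  # ['ABCDECDE', 'CDE']
```

On the slide's example, only the full period ABCDECDE (2 occurrences) and CDE (4 occurrences) survive; all their substrings are pruned because a longer pattern has the same occurrence count.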
pattern cyclicity
ABCABCABCA
(Cambouropoulos, 1998)
heterogeneous pattern cycles:
[Score example in 9/8: the note sequence C Eb G C Eb Ab C Eb … is described by cycles 1-5 of intervals (+3, +2m, -5m, -4) over successive 8th notes.]
[Roadmap diagram (structural levels): tonality and mode.]
c = mus.chromagram('score.mid')
ks = mus.keystrength(c)
mus.key(ks)
mus.mode(ks)
mus.keysom(ks)
etc.
[Figure: key strength curves (C major, C# major, D major, ...) and the resulting harmonic analysis: CM, Dm (II), GM (V), CM (I), Am (VI), DM (I), GM (V), CM (I).]
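The key-strength computation can be sketched in Python (an illustration, not the MiningSuite implementation): correlate a chroma (pitch-class) distribution with a key profile rotated to each of the 12 keys, as in the classic Krumhansl-Schmuckler approach. The profile values below are the Krumhansl-Kessler major profile as commonly cited in the literature.

```python
import math

# Krumhansl-Kessler major-key profile (standard values from the literature)
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
         2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KEYS = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def corr(u, v):
    """Pearson correlation between two equal-length vectors."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = math.sqrt(sum((a - mu) ** 2 for a in u) *
                    sum((b - mv) ** 2 for b in v))
    return num / den

def key_strength(chroma):
    """Correlate a chroma vector with the profile rotated to all 12 keys."""
    return [corr(chroma, MAJOR[-k:] + MAJOR[:-k]) for k in range(12)]

# Chroma of a plain C major scale (one occurrence per scale degree)
chroma = [1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
ks = key_strength(chroma)
print(KEYS[ks.index(max(ks))])  # C
```

The 12 correlation values play the role of the key strength curves above; the key with the highest strength is the estimated key.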
Audio / symbolic
tonal analysis
audio → mus.chromagram → mus.keystrength → sig.peaks → mus.key
MIDI, score → mus.score → mus.chromagram → …
(transcription bridges the audio and symbolic domains)
[Roadmap diagram (structural levels).]
o = aud.envelope('score.mid')
a = mus.autocor(o)
t = mus.tempo(a)
mus.pulseclarity(a)
mus.metre(a)
[Figure: amplitude envelope and tempo curve (130-180 BPM) against the temporal location of events (in s).]
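The envelope-autocorrelation route to tempo can be sketched in Python (a deliberately simplified illustration, not the MiningSuite algorithm; sampling rate and lag range are invented for the example): sample an onset curve, autocorrelate it, and read the tempo off the strongest lag.

```python
def tempo_from_onsets(onsets, sr=100):
    """Estimate tempo in BPM: sample an onset curve, autocorrelate it,
    and pick the strongest lag at or above 0.3 s (i.e. <= 200 BPM)."""
    n = int(onsets[-1] * sr) + 1
    env = [0.0] * n
    for t in onsets:
        env[int(t * sr)] = 1.0            # unit spike at each onset
    best_lag, best = None, -1.0
    for lag in range(int(0.3 * sr), n):
        r = sum(env[i] * env[i + lag] for i in range(n - lag))
        if r > best:
            best, best_lag = r, lag
    return 60.0 * sr / best_lag           # lag in samples -> BPM

# Onsets every 0.5 s -> 120 BPM
print(tempo_from_onsets([i * 0.5 for i in range(16)]))  # 120.0
```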
[Figure: metrical analysis of a notated theme, showing a hierarchy of pulsation levels (labelled A-I and a-p) aligned under the score across measures 1-16.]
Audio / symbolic
metrical analysis
audio → aud.envelope → mus.autocor → sig.peaks → mus.tempo
MIDI, score → mus.score → …
(transcription bridges the audio and symbolic domains)
[Roadmap diagram (structural levels).]
4.
How it works
Matrix convention in MIRtoolbox
In standard Matlab code, the handling of matrix dimensions makes the code somewhat obscure.
[Diagram: a fixed data cube of bins × frames × channels.]
Matrix convention in MIRtoolbox
One single convention, which needs to allow complex structures such as the data of a batch of audio files that are each segmented:
{{bins × frames × channels, …}, …}   (always structured like this)
Matrix convention in MiningSuite
{bin, sample, track}: bins × samples × tracks
{bin, freqband, track}: bins × freq. bands × tracks
Matrix convention in MiningSuite
More complex configurations can be freely designed, outside of any pre-wired configuration:
{{bins × freq. bands × tracks, …}, …}   (variable structures possible)
sig.data
For instance {bin, sample, channel}, indicating whether the data is a cell array corresponding to a batch of audio files, a sequence of segmented data, etc.

e = d.apply(@algo, {}, {'bin'}, 2);
out = {e};

function ceps = algo(m, …)
    ceps = mfccDCTMatrix * m;
end

d: bins × samples × tracks; m has 2 dimensions: bins × samples.
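The role of apply over such variable structures can be sketched in Python (an illustration of the idea only, not the sig.data API; the 'cell' tagging is invented for the example): the same per-matrix routine is looped over every element of a cell array of segments or files.

```python
def apply_to_cells(data, func):
    """Apply `func` to each matrix; a dict tagged 'cell' marks a cell
    array (batch of files, segments, ...) whose elements are visited."""
    if isinstance(data, dict):
        return {'cell': [apply_to_cells(d, func) for d in data['cell']]}
    return func(data)

# One matrix per segment, gathered in a "cell array":
segmented = {'cell': [[[1, 2], [3, 4]],
                      [[5, 6], [7, 8]]]}
doubled = apply_to_cells(segmented,
                         lambda m: [[2 * x for x in row] for row in m])
print(doubled['cell'][1])  # [[10, 12], [14, 16]]
```

The per-matrix routine never needs to know whether it is processing one file, a batch, or a sequence of segments.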
sig.signal.display
a = sig.input(myfile, 'Center')
The resulting sig.signal object stores:
X axis (bin pos): no
Y axis: yname = audio, yunit, Ydata
Sample axis: Sstart = 0, Srate = 44100, Ssize = 184269
Frame axis: no; Channel axis: no; Peaks: no
date: 15.10.2014; ver: R2014b, MiningSuite 0.8
sig.data: Matlab matrix, dimension specification, cell layers, indices, precise pos, precise val
sig.design: design
a = sig.input(myfile, 'Center')
b = sig.brightness(a, 'Frame')
sig.signal.display
[Figure "Brightness, son5": coefficient value against the temporal location of events (in s).]
The sig.signal object stores yname = brightness with its sample and frame axes (.05 s frames), the sig.data, and the sig.design; information such as time positions is regenerated on the fly.
b = sig.spectrum(a, 'Frame')
The resulting sig.Spectrum object stores, for the X axis, a sig.axis (xname = frequency, xstart = 0, rate = 10.76, generator and finder functions) and a sig.unit (xunit = Hz, origin, power, log, phase), together with the Y data (yname = spectrum), the frame axis, the sig.data and the design.
a = sig.input('bigfile', …);
s = sig.spectrum(a, 'Frame', …);
c = sig.centroid(s)
No operation is performed; each call returns a sig.design node.
c = aud.mfcc(b)

            a                        b                      c
package     sig                      aud                    aud
name        input                    spectrum               mfcc
input       'ragtime.wav'            a                      b
type        sig.signal               sig.Spectrum           sig.Mfcc
argin       'Extract', 0, 1          'Mel', 'log', 'Frame'  …
options     …                        …                      …
frame       no                       .05 s, .5              .05 s, .5
date        15.10.2014               15.10.2014             15.10.2014
ver         R2014b, MiningSuite 0.8  (same)                 (same)
sig.design
c = aud.mfcc('ragtime.wav', 'Frame')
The same three design objects are built implicitly: sig/input on 'ragtime.wav' ('Extract', 0, 1), aud/spectrum ('Mel', 'log', 'Frame'; frames of .05 s, .5) and aud/mfcc (type sig.Mfcc; frames of .05 s, .5).
sig.design.eval
sig.input(long audio file) → aud.spectrum → aud.mfcc
[Diagram: the MIRtoolbox operator dependency graph (miraudio, mirfilterbank, mirenvelope, mironsets, mirspectrum, mirautocor, mircepstrum, mirpeaks, mirpitch, mirtempo, mirpulseclarity, mireventdensity, mirattacktime, mirattackslope, mirchromagram, mirkeystrength, mirkey, mirmode, mirkeysom, mirtonalcentroid, mirmfcc, mirbrightness, mirrolloff, mirroughness, mirregularity, mirinharmonicity, mirzerocross, mirsegment, mirsum).]
Intensive operators: chunk decomposition needs to be performed here.
Extensive operators: mirenvelope needs to be computed completely before going further.
mireval
non-frame-based evaluation
a = miraudio('Design');
c = mirmfcc(a);
mireval(c, 'Folder')
eval_each(c, 'song1.wav')
c: @mirdesign, type = @mirmfcc, options: Rank = 1:13, Delta = 0, ...
   @mirdesign, type = @mirspectrum, options: Win = hamming, ...
a: @mirdesign, type = @miraudio, options: Sampling = 11025, ...
Chunk loading, decomposition and summation afterwards: a very complex paradigm for a situation that is not common in MIR practice. MiningSuite is much simpler.
sig.design.eval('Folder')
a = aud.mfcc('Folder', …)
d = a.eval

sig.design.display, sig.design.eval, sig.signal.display
a = aud.mfcc('bigfile', …) returns a sig.design object, storing only the data flow graph.
d = sig.ans
d = a.eval;
sig.design.display = sig.design.eval, which outputs a sig.signal and calls sig.signal.display.
sig.operate
operators main routine
+aud/brightness.m:
function varargout = brightness(varargin)
    varargout = sig.operate('aud', 'brightness', options, @init, @main, varargin);
end
aud.brightness('ragtime', 'Frame')
sig.operate first parses the arguments in the call.
my_operator(filename, 'Threshold', 100, 'Normalize')

thres.key = 'Threshold';
thres.type = 'Numeric';
thres.default = 50;
options.thres = thres;

norm.key = 'Normalize';
norm.type = 'Boolean';
norm.default = 0;
options.norm = norm;
options specification
my_operator(filename, 'Domain', 'Symbolic')

domain.key = 'Domain';
domain.type = 'String';
domain.choice = {'Signal', 'Symbolic'};
domain.default = 'Signal';
options.domain = domain;
@init
implicit data flow deployment
audio file → sig.input → sig.signal → sig.spectrum → sig.Spectrum → aud.brightness → sig.signal

        x = sig.spectrum(x);
    end
    type = 'sig.signal';
end
res = sig.compute(@routine, x.Ydata, x.xdata, option.cutoff);
sig.compute applies the routine to the successive segments, successive audio files, etc.
[Diagram: x.Ydata as nested cell arrays {{bins × freq. bands × tracks, …}, …}.]

e = d.apply(@algo, {f, f0}, {'element'}, 3);
out = {e};
end
res = sig.compute(@routine, x.Ydata, x.xdata, option.cutoff);

function y = algo(m, f, f0)
    y = sum(m(f > f0, :, :)) ./ sum(m);
end

sig.data.apply applies the sub-routine algo to the properly formatted data, with loops if needed.
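The brightness sub-routine above can be sketched in Python for a single frame (an illustration only, not MiningSuite code; the 1500 Hz cutoff is an assumed default for the example): the ratio of spectral energy above the cutoff to the total energy.

```python
def brightness(magnitudes, freqs, cutoff=1500.0):
    """Share of spectral energy above the cutoff frequency, mirroring
    the sum(m(f > f0,:,:)) ./ sum(m) routine above (one frame only)."""
    above = sum(m for m, f in zip(magnitudes, freqs) if f > cutoff)
    return above / sum(magnitudes)

freqs = [250, 750, 1250, 1750, 2250]   # bin centre frequencies (Hz)
mags = [4.0, 3.0, 1.0, 1.0, 1.0]       # spectrum magnitudes
print(brightness(mags, freqs))         # 2.0 / 10.0 = 0.2
```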
+aud/mfcc.m:
e = d.apply(@algo, {}, {'bin'}, 2);
out = {e};

function ceps = algo(m, …)
    ceps = mfccDCTMatrix * m;
end

d: bins × samples × tracks; m has 2 dimensions: bins × samples.
seq.event
note characterization
Each note is a seq.event:
characterization of the note: musical parameters (instance of class seq.paramstruct)
previous note
next note
mus.score(…, 'Group')
hierarchical grouping
mus.score('laksin.mid', 'Group')
[Figure: piano-roll (Eb4-B4) with hierarchical grouping.]
mus.score(…, 'Reduce')
ornamentation reduction
mus.score('laksin.mid', 'Group', 'Reduce')
Each syntagmatic relation between two notes is an instance of seq.syntagm.
[Figure: reduced piano-roll (Eb4-B4) with syntagmatic connections.]
mus.paramstruct
musical dimensions
Each note:
chromatic pitch (more general: chromatic pitch class, gross contour)
metrical position
channel
rhythmic value
harmony, etc.
seq.param
parameter management
seq.syntagm
s = seq.syntagm(n1, n2) connects note n1 to note n2.
mus.score
incremental approach
[Diagram: groups, meter, tonality and mode, ornamentation reduction and motif/pattern analysis are built incrementally on top of the notes.]
mus.score
incremental approach
[Diagram: a pat.pattern and its pat.occurrence instances, built on seq.syntagm relations between successive notes (with parameter values such as pitches G, F, Eb and intervals 0, +).]
5.
How you can contribute: open-source community
https://code.google.com/p/miningsuite/
Open-source project
Contribution acknowledgement
[Roadmap diagram (structural levels).]
Future directions
(on my side)

Future directions
(We need you!)

Acknowledgments
Learning to Create (Lrn2Cre8)