
ISMIR 2014 Tutorial, 27.10.2014
https://code.google.com/p/miningsuite/

MiningSuite
A comprehensive framework for music analysis, articulating audio
(MIRtoolbox 2.0) and symbolic approaches

Olivier Lartillot
Aalborg University, Denmark

Lrn2Cre8

Tutorial overview
1. General description: aims, architecture, syntax
2. Audio-based approaches (MIRtoolbox 2.0)
3. Symbolic approaches: MIDItoolbox 2.0 and new musicological models
4. How it works
5. How you can contribute: open-source community

1. General description

MiningSuite

Matlab framework
Large range of audio and music analysis tools
Both audio and symbolic representations
Highly adaptive syntactic layer on top of Matlab
Syntactic layer within the operators' Matlab code, simplifying and clarifying the code
Memory management mechanisms

MIRtoolbox

Matlab framework (but intricate, non-optimized)
Large range of audio and music analysis tools
Both audio and symbolic representations
Highly adaptive syntactic layer on top of Matlab
Syntactic layer within the operators' Matlab code, simplifying and clarifying the code
Memory management mechanisms

What's new in MiningSuite

Code & architecture entirely rebuilt
More efficient, more readable, better organised, more generalisable
Integration of audio and symbolic representations
Innovative and integrative set of symbolic-based musicological tools and pattern mining
Syntactic layer within the operators' Matlab code, simplifying and clarifying the code
Open-source collaborative environment

Aim of this tutorial

Overview of audio and symbolic approaches to computational music analysis
Take advantage of the capabilities of the environment, of the user-friendly syntax and architecture
Understand how the framework works
Understand the choices in the toolbox architecture, and maybe discuss them?
Add new code and new features to the project

MIRtoolbox history

Matlab toolboxes @ University of Jyväskylä:
MIDItoolbox (Eerola & Toiviainen, 2004)
Music Therapy Toolbox (Lartillot, Toiviainen, Erkkilä, 2005)
European project Tuning the Brain for Music (NEST, FP6, 2006-08)
Master program @ University of Jyväskylä, MIR course (2005)
Large range of audio/musical features extracted from large databases, linked to emotional ratings (Eerola et al., ISMIR 2009)
Toolbox easy to use, no Matlab expertise required
Version 1.0 in 2007, current version 1.5.

MIRtoolbox advantages

Highly modular framework:
building blocks can be reused, reordered
no need to recode processes, to reinvent the wheel
Adaptive syntax: users can focus on design, MIRtoolbox takes care of technical details
Free software, open source: capitalized expertise of the research community, for everybody
10,000s of downloads, 500+ citations, reference tool in MIR

MIRtoolbox syntax

miraudio('ragtime.wav') outputs a figure
miraudio('ragtime.wav'); blocks the figure display
mirtempo('ragtime.wav')
mirtempo('Folder') operates on all files in the current directory
a = miraudio('ragtime.wav', 'Extract', 0, 60, 's')
a = miraudio(a, 'Center');
t = mirtempo(a)
d = mirgetdata(t) returns the result as a Matlab array
get(t, 'Sampling') returns additional data stored in the MIRtoolbox object

MiningSuite syntax

sig.input('ragtime.wav') outputs a figure
sig.input('ragtime.wav'); postpones any computation
mus.tempo('ragtime.wav')
mus.tempo('Folder') operates on all files in the current directory
a = sig.input('ragtime.wav', 'Extract', 0, 60, 's')
a = sig.input(a, 'Center');
t = mus.tempo(a)
d = t.getdata returns the result as a Matlab array
t.show returns additional data stored in the MiningSuite object
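Putting the calls above together, a typical session might look like this (a sketch based on the syntax shown; 'ragtime.wav' stands for any audio file):

```matlab
a = sig.input('ragtime.wav', 'Extract', 0, 60, 's'); % first minute only
a = sig.input(a, 'Center');                          % center the signal
t = mus.tempo(a);                                    % tempo estimation
d = t.getdata;                                       % result as a Matlab array
```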

MiningSuite history

Academy of Finland research fellowship, 2009-14
MIRtoolbox: a Rube Goldberg machine
Obscure architecture, obscure code, highly inefficient in speed and memory → rewrite
MIRtoolbox's innovative framework draws interest outside MIR
Integrating audio and symbolic into a common framework, higher-level music analysis
Reorganize the framework into discipline-focused packages
MIDItoolbox has not evolved since 1.0 (2004)

MiningSuite
SIGMINR: signal processing
AUDMINR: audio, auditory modelling
MUSMINR: music analysis
PATMINR: pattern mining
SEQMINR: sequence processing
VOCMINR: voice analysis?


Packages

In MIRtoolbox, all operators start with the mir prefix (miraudio, mirspectrum, etc.) to avoid conflicts with other Matlab functions.
In MiningSuite, each module (SigMinr, AudMinr, etc.) is a package: its operators are called using a particular prefix (sig.spectrum, aud.spectrum, etc.)

Signal domain

SIGMINR: sets of operators related to signal processing operations, audio and musical features
sig.input, sig.spectrum, ...

Versions specific to particular domains:
AUDMINR: aud.spectrum, aud.mfcc, aud.brightness, ...
MUSMINR: mus.spectrum, mus.tempo, mus.key, ...

Each operator can be tuned with a set of options

Signal domain: modular data flow graphs

[diagram: mus.tempo as a data flow graph, aud.envelope → mus.autocor → sig.peaks, each step tunable via options such as method, BPM preference, BPM range, etc.]

Symbolic domain

MUSMINR: succession of operations applied to an input signal?
Several types of analysis applied together for each successive note of the symbolic sequence
One single operator: mus.score
Types of analysis selected as options of mus.score

Symbolic domain

AUDMINR: aud.score (auditory scene transcription)
MUSMINR: mus.score
SEQMINR: sequence management
PATMINR: sequential pattern mining

Software dependencies

MathWorks Matlab 7.6 (R2008a) or newer.
But Matlab 8.2 (R2013b) or newer is strongly recommended, because MiningSuite easily crashes on previous versions*.
MiningSuite is based on the new object-oriented paradigm introduced in Matlab 7.6.
MathWorks Signal Processing Toolbox.
Included in the distribution: code from the Auditory Toolbox (Slaney), Music Analysis Toolbox (Pampalk), MIDI Toolbox (Eerola et al.)

* versions 7.6 to 8.2: there exists a workaround based on debug mode. Cf. README.txt.

How to start

Download version 0.8 from the MiningSuite website:
https://code.google.com/p/miningsuite/
Read the README.txt file:
Add the folder "miningsuite" to your Matlab path.
help miningsuite
Check the examples in USAGE.m (unfinished)

MiningSuite development

Started in 2010
Sneak-peek version 0.6 released in January 2014 for AES Semantic Audio. Proof of concept, basic architecture.
Alpha version 0.8 released now for ISMIR 2014. Focus on the architecture, not on exact feature implementation. Not everything is implemented. Results cannot be trusted!
Beta version 0.9 with more complete implementation, user's guide (in wiki) and code documentation.
Version 1.0 with complete bench test. We need you…
Further versions: we definitely need you!

2. Audio-based approaches

SIGMINR
signal processing

sig.input
sig.rms
sig.zerocross
sig.filterbank
sig.frame
sig.flux
sig.autocor
sig.spectrum
sig.peaks
sig.segment
sig.rolloff
sig.stat
sig.cepstrum
sig.simatrix
sig.envelope
sig.cluster

AUDMINR
audio, auditory modeling

aud.spectrum
aud.filterbank
aud.brightness
aud.attacktime
aud.roughness
aud.mfcc
aud.fluctuation
aud.envelope
aud.pitch
aud.novelty
aud.segment
aud.attackslope
aud.score
aud.eventdensity

MUSMINR
music theory

mus.spectrum
mus.chromagram
mus.keystrength
mus.pitch
mus.tempo
mus.pulseclarity
mus.metre
mus.key
mus.keysom
mus.mode
mus.score

sig.input

In MIRtoolbox: miraudio. But SigMinr is not restricted to audio; it accepts any kind of signal.
sig.signal is the actual signal object (cf. part IV)
sig.input takes care of the transformations of the signal representation before further processing
sig.input(v, sr) converts a Matlab vector v into a signal of sampling rate sr

sig.input

sig.input('myfile')*
Formats: .wav, .flac, .ogg, .au, .mp3, .mp4, .m4a
Using Matlab's audioread

* Indicate the file extension as well (because audioread requires it).
(sig.input actually calls a routine from AudMinr for the processing of audio files.)

sig.input: transformations

By default, multiple tracks are summed into mono.
sig.input('mysong', 'Mix', 'no') keeps the multitrack decomposition.
sig.input(..., 'Center') centers the signal.
sig.input(..., 'Sampling', r) resamples at rate r (in Hz), using resample from the Signal Processing Toolbox.
sig.input(..., 'Normal') normalizes with respect to RMS energy.

sig.input: extraction

sig.input(..., 'Extract', 1, 2, 's', 'Start') extracts the signal from 1 s to 2 s after the start
sig.input(..., 'Extract', 44100, 88200, 'sp') from samples #44100 to #88200 after the start
sig.input(..., 'Extract', -1, +1, 'Middle') from 1 s before to 1 s after the middle of the signal
sig.input(..., 'Extract', -10000, 0, 'sp', 'End') the last 10000 samples of the signal

sig.input: trimming silence

sig.input(..., 'Trim') trims (pseudo-)silence at start and end
sig.input(..., 'Trim', 'JustStart') at start only
sig.input(..., 'Trim', 'JustEnd') at end only
sig.input(..., 'Trim', 'TrimThreshold', thr) specifies the silence threshold. Default thr = .06
Silent frames have RMS energy below thr times the median RMS energy of the whole audio file.

.play, .save
sonification

a = sig.input(..., 'Trim')
a.play
a.save('newfile.mp3')

sig.frame
frame decomposition

f = sig.frame(..., 'FrameSize', l, 's')
unit: 's' (seconds), 'sp' (samples)
f = sig.frame(..., 'FrameHop', h, '/1')
unit: '/1' (ratio from 0 to 1), 'Hz', 's', 'sp'
f.play
f.save

[figure: audio waveform with frame decomposition, amplitude vs time (s)]

sig.frame
frame decomposition

a = sig.input(...)
f = sig.frame(a)
s = sig.spectrum(f)

Or: s = sig.spectrum(..., 'Frame')

'Frame' option
frame decomposition

sig.input(..., 'FrameSize', ..., 'FrameHop', ...)
sig.spectrum(..., 'FrameSize', ..., 'FrameHop', ...)
sig.spectrum(..., 'Frame') (default frame configuration)
The 'Frame' option is available to most operators, each with its own default frame configuration.
Each operator can thus perform the frame decomposition where it is most suitable.
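What the frame decomposition computes can be sketched in plain Matlab (50 ms frames, 50% hop; a sketch independent of MiningSuite):

```matlab
sr = 44100; x = randn(1, sr);          % 1 s stand-in signal
l  = round(.05 * sr);                  % frame length: 50 ms in samples
h  = round(.5 * l);                    % hop: half a frame
starts = 1:h:length(x) - l + 1;        % frame start positions
frames = zeros(l, numel(starts));      % one column per frame
for i = 1:numel(starts)
    frames(:, i) = x(starts(i):starts(i) + l - 1);
end
```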

sig.spectrum
frequency spectrum

Discrete Fourier Transform of the audio signal x:
X(k) = Σ_{n=0}^{N-1} x(n) e^(-2πi·kn/N)
Amplitude spectrum: s = sig.spectrum(...)
Phase spectrum: s.phase
Fast Fourier Transform using Matlab's fft function.

[figure: spectrum, magnitude vs frequency (Hz)]

sig.spectrum
parameter specification

sig.spectrum(..., 'Min', 0) in Hz
sig.spectrum(..., 'Max', sampling rate/2) in Hz
sig.spectrum(..., 'Window', 'hamming')
Frequency resolution r, in Hz:
sig.spectrum(..., 'Res', r): exact resolution specification
sig.spectrum(..., 'MinRes', r): minimal resolution (less precise constraint, but more efficient)
sig.spectrum(..., 'Phase', 'No')

sig.spectrum
post-processing

sig.spectrum(..., 'Normal') normalizes w.r.t. energy.
sig.spectrum(..., 'Power') squares the energy.
sig.spectrum(..., 'dB') magnitude in dB scale
sig.spectrum(..., 'dB', th) keeps only the th highest dB
sig.spectrum(..., 'Smooth', o)
sig.spectrum(..., 'Gauss', o)

[figures: spectrum in dB scale, magnitude vs frequency (Hz)]

aud.spectrum
auditory modeling

aud.spectrum(..., 'Mel'): Mel-band decomposition
aud.spectrum(..., 'Bark'): Bark-band decomposition
aud.spectrum(..., 'Terhardt'): outer-ear modeling
aud.spectrum(..., 'Mask'): masking effects along bands

[figures: MelSpectrum, magnitude vs Mel bands; BarkSpectrum, magnitude vs Bark bands]

based on the Auditory Toolbox and the Music Analysis Toolbox

mus.spectrum
pitch-based distribution

mus.spectrum(..., 'Cents'): magnitudes along a pitch scale (in midicents)
mus.spectrum(..., 'Collapsed'): collapsed into one octave, divided into 1200 cents

[figures: spectrum, magnitude vs frequency (Hz); vs pitch (in midicents); vs cents]

sig.cepstrum
cepstral analysis

sig.cepstrum(..., 'Complex'): complex cepstrum
audio → Fourier transform (sig.spectrum) → Log → Phase unwrap → (Inverse) Fourier transform

[figures: spectrum, magnitude vs frequency (Hz); complex cepstrum, magnitude vs quefrency (s)]

Bogert, Healy, Tukey. The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking. Symposium on Time Series Analysis, Chapter 15, 209-243, Wiley, 1963.

sig.cepstrum
cepstral analysis

sig.cepstrum(..., 'Real'): real cepstrum
audio → Fourier transform (sig.spectrum) → Abs → Log → (Inverse) Fourier transform

[figures: spectrum, magnitude vs frequency (Hz); real cepstrum, magnitude vs quefrency (s)]

sig.cepstrum
cepstral analysis

sig.cepstrum(..., 'Freq') represented in the frequency domain
sig.cepstrum(..., 'Min', 0, 's')
sig.cepstrum(..., 'Max', .05, 's')

sig.autocor
autocorrelation function

[figure: waveform autocorrelation, coefficients Rxx vs lag (s)]

sig.autocor
autocorrelation function

sig.autocor(..., 'Min', t1, 's') default t1 = 0 s
sig.autocor(..., 'Max', t2, 's') default t2 = .05 s (audio) or t2 = 2 s (envelope)
sig.autocor(..., 'Freq') lags in Hz
sig.autocor(..., 'Window', w) specifies a windowing method; default w = 'hanning'
sig.autocor(..., 'NormalWindow', f) divides by the autocorrelation of the window*; default f = 'on' = 'yes' = 1
sig.autocor(..., 'Halfwave') half-wave rectification

* Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, IFA Proceedings 17: 97-110, 1993.

sig.autocor
generalized autocorrelation

Autocorrelation (by default): y = IDFT(|DFT(x)|^2)
Generalized autocorrelation: y = IDFT(|DFT(x)|^k)
sig.autocor(..., 'Compress', k) default k = .67

Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model, IEEE Transactions on Speech and Audio Processing, 8(6), 2000.
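The generalized autocorrelation above reduces to a few lines of plain Matlab (a sketch independent of MiningSuite; k = 2 gives the standard autocorrelation):

```matlab
x = randn(1, 1024);              % any signal
k = .67;                         % compression exponent ('Compress' value)
y = real(ifft(abs(fft(x)).^k));  % y = IDFT(|DFT(x)|.^k)
```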

sig.autocor
enhanced autocorrelation

sig.autocor('Amin3', 'Enhanced', 2:10)

[figure: autocorrelation curves before and after enhancement, vs lag (s)]

Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model, IEEE Transactions on Speech and Audio Processing, 8(6), 2000.

sig.flux: distance between successive frames

s = sig.spectrum(a, 'Frame')
sig.flux(s) = sig.flux(a)
[figure: spectral flux, coefficient value vs frames]

c = sig.cepstrum(a, 'Frame')
sig.flux(c)
[figure: cepstrum flux, coefficient value vs frames]

sig.flux: distance between successive frames

sig.flux(..., 'Dist', d) specifies the distance measure; d = 'Euclidian', 'City', 'Cosine'
sig.flux(..., 'Inc') positive differences only
sig.flux(..., 'Complex') complex flux
sig.flux(..., 'Median', l, C)
sig.flux(..., 'Median', l, C, 'Halfwave')
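As a plain-Matlab illustration of the Euclidean flux (a sketch independent of MiningSuite; S stands for any bins-by-frames magnitude spectrogram):

```matlab
S = abs(randn(513, 10));               % stand-in magnitude spectrogram
flux = sqrt(sum(diff(S, 1, 2).^2, 1)); % Euclidean distance between successive frames
```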

Dynamics
Audio level
Sound

sig.rms
root-mean square

sig.rms('ragtime')
The RMS energy related to file ragtime is 0.017932
sig.rms('ragtime', 'Frame')
Default frame size .05 s, frame hop = .5

[figure: frame-by-frame RMS energy, coefficient value vs frames]
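The RMS energy is simply sqrt(mean(x.^2)); in plain Matlab (a sketch independent of MiningSuite):

```matlab
sr = 44100;
x  = sin(2*pi*440*(0:sr-1)/sr);  % 1 s of a unit-amplitude 440 Hz sine
r  = sqrt(mean(x.^2));           % RMS energy, sqrt(0.5) ≈ 0.7071 for a unit sine
```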

sig.envelope
envelope extraction

a:
[figure: audio waveform, amplitude vs time (s)]

e = sig.envelope(a):
[figure: envelope, amplitude vs time (s)]

e.play
e.save

sig.rms:
[figure: frame-by-frame RMS energy, coefficient value vs frames]

sig.envelope:
[figure: envelope, amplitude vs time (s)]

sig.envelope(..., 'Filter'): based on low-pass filtering
full-wave rectification (abs) → low-pass filter
sig.envelope(..., 'Hilbert'): Hilbert transform (analytic signal) → magnitude (abs)
sig.envelope(..., 'Filtertype', 'IIR') or 'HalfHann'
[figure: magnitude responses of the low-pass filters]
sig.envelope(..., 'Tau', .02): time constant (in s.)
sig.envelope(..., 'PostDecim', N): down-sampling, default N = 16
sig.envelope(..., 'Sampling', f)

sig.envelope(..., 'Spectro'): based on the power spectrogram
Band decomposition:
sig.envelope(..., 'Freq'): none (default)
aud.envelope(..., 'Mel'): Mel-band
aud.envelope(..., 'Bark'): Bark-band
sig.envelope(..., 'UpSample', 2)
sig.envelope(..., 'Complex')

sig.envelope
post-processing

sig.envelope(..., 'Center')
sig.envelope(..., 'HalfWaveCenter'): centered, then half-wave rectified
sig.envelope(..., 'Diff'): differentiated
sig.envelope(..., 'HalfWaveDiff'): half-wave rectified differentiation
sig.envelope(..., 'Power')
sig.envelope(..., 'Normal')
sig.envelope(..., 'Smooth', o) moving average, order o = 30 sp.
sig.envelope(..., 'Gauss', o) Gaussian, std deviation o = 30 sp.

[figures: envelope after each post-processing step, amplitude vs time (s)]
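The post-processing options above are simple pointwise operations; in plain Matlab (e a stand-in envelope; a sketch independent of MiningSuite):

```matlab
e  = abs(randn(1, 1000));   % stand-in envelope
c  = e - mean(e);           % 'Center'
hc = max(c, 0);             % 'HalfWaveCenter': center, then half-wave rectify
d  = diff(e);               % 'Diff': differentiation
hd = max(d, 0);             % 'HalfWaveDiff': half-wave rectified differentiation
```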

aud.envelope
auditory modeling

aud.envelope(..., 'Mu', mu): mu-law compression, default mu = 100:
y_b(k) = ln(1 + mu·x_b(k)) / ln(1 + mu)
A logarithmic-like transformation that behaves linearly near zero; mu determines the degree of compression.

aud.envelope(..., 'Lambda', l): weighted average of each band envelope z_b(n) and its half-wave rectified differential
z'_b(n) = HWR(z_b(n) - z_b(n-1)), where HWR(x) = max(x, 0):
u_b(n) = (1 - lambda)·z_b(n) + lambda·(f_r/f_LP)·z'_b(n)
where 0 ≤ lambda ≤ 1 balances the envelope and its differential, and the factor f_r/f_LP compensates for the fact that the differential of a lowpass-filtered signal is small. Values of lambda between 0.6 and 1.0 perform almost equally well; lambda = 0.8 was taken into use.

aud.envelope(..., 'Klapuri06'):
= aud.envelope(..., 'Spectro', 'UpSample', 'Log', 'HalfwaveDiff', 'Lambda', .8);
sig.sum(e, 'Adjacent', 10)

[figure: illustration of the dynamic compression and weighted differentiation steps for an artificial signal, x_b(k) and u_b(n) vs time (s)]

Klapuri, A., A. Eronen and J. Astola (2006). Analysis of the meter of acoustic musical signals, IEEE Transactions on Audio, Speech and Language Processing, 14-1, 342-355.
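The mu-law compression above can be sketched in plain Matlab (no toolbox required):

```matlab
mu = 100;                          % default 'Mu' value
x  = linspace(0, 1, 5);            % normalized envelope samples
y  = log(1 + mu*x) / log(1 + mu);  % logarithmic-like, linear near zero
```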

sig.filterbank
filterbank decomposition

f = sig.filterbank(..., 'CutOff', [500, 1000])
f.play
f.save
[diagram: audio → filter1, filter2, ..., filterN]

sig.filterbank(..., 'CutOff', [44, 88, 176, 352, 443])
sig.filterbank(..., 'CutOff', [-Inf 200 400 800 1600 Inf])
[figures: magnitude responses (dB) of the resulting filterbanks]

aud.filterbank
auditory modeling

aud.filterbank(...): Gammatone filterbank with Equivalent Rectangular Bandwidth (ERB) channels
aud.filterbank(..., 'NbChannels', 10)
aud.filterbank(..., 'Channel', 1:10)
based on the Auditory Toolbox

[figure: gammatone filter frequency responses, filter response (dB) vs frequency (Hz)]

Example from the Auditory Toolbox documentation — the ten ERB filters between 100 and 8000 Hz:
fcoefs = MakeERBFilters(16000,10,100);
y = ERBFilterBank([1 zeros(1,511)], fcoefs);
resp = 20*log10(abs(fft(y')));
freqScale = (0:511)/512*16000;
semilogx(freqScale(1:255),resp(1:255,:));
axis([100 16000 -60 0])
xlabel('Frequency (Hz)'); ylabel('Filter Response (dB)');

R. D. Patterson et al. Complex sounds and auditory images, in Auditory Physiology and Perception, Y. Cazals et al., Oxford, 1992, pp. 429-446.

aud.filterbank
auditory modeling

aud.filterbank(..., 'Mel')
aud.filterbank(..., 'Bark')
aud.filterbank(..., 'Scheirer')
aud.filterbank(..., 'Klapuri')

[figures: magnitude responses (dB) of each filterbank]

.sum
across-channel summation

f = sig.filterbank(...)
e = sig.envelope(f)
e.sum

[figures: centered audio waveform; envelope of each channel, vs time (s); summed envelope, amplitude vs time (s)]

.sum
stereo summation

a = sig.input(..., 'Mix', 'No')
e = sig.envelope(a)
e.sum('channel')
(currently called sig.sum)

[figures: envelope of each stereo channel; summed envelope, amplitude vs time (s)]

.sum
across-channel summary

f = sig.filterbank(...)
e = sig.envelope(f)
a = sig.autocor(e)
a.sum

[figures: centered audio waveform; per-channel envelopes; per-channel envelope autocorrelations; summed envelope autocorrelation, coefficients vs lag (s)]

sig.peaks
peak picking

sig.peaks(..., 'Total', Inf) number of peaks
Border effects:
sig.peaks(..., 'NoBegin') first sample excluded
sig.peaks(..., 'NoEnd') last sample excluded
sig.peaks(..., 'Order', o) ordering of peaks:
o = 'Amplitude': from highest to lowest
o = 'Abscissa': along the abscissa axis
sig.peaks(..., 'Valleys') local minima

sig.peaks
peak picking

sig.peaks(..., 'Threshold', 0)
sig.peaks(..., 'Contrast', .1)
sig.peaks(..., 'SelectFirst', .05)
mus.peaks(..., 'Reso', 'SemiTone')

[figure: curve with local maxima marked as accepted (OK) or rejected (not OK) by the selection criteria]

sig.peaks
peak picking

sig.peaks(...)
sig.peaks(..., 'Extract')
sig.peaks(..., 'Only')

[figures: waveform autocorrelation, coefficients vs lag (s) — picked peaks; extracted peak regions; peaks only]

sig.peaks(..., 'Track')
peak tracking

McAulay, R.; Quatieri, T. (1986). Speech analysis/synthesis based on a sinusoidal representation, IEEE Transactions on Acoustics, Speech and Signal Processing, 34:4, 744-754.

sig.segment
segmentation

s = sig.segment('myfile', [1 2 3])
s.play(1:2)
s.save
sig.spectrum(s)

[figures: audio waveform with segment boundaries at 1, 2 and 3 s; MelSpectrum of each segment]

sig.segment
segmentation

e = sig.envelope('myfile')
p = sig.peaks(e)
s = sig.segment('myfile', p)

[figure: envelope with the detected peaks used as segmentation points, vs time]

[layer diagram: notes (symbolic level); dynamics (audio level); sound]

aud.score
onset detection

e = sig.envelope('myfile')
p = sig.peaks(e, 'Contrast', .1)
s = aud.score(p)

In short:
[s p] = aud.score('myfile', 'Contrast', .1)

aud.sequence → seq.event, seq.syntagm

[figures: envelope with selected peaks; resulting onset sequence, vs time]

mironsets
onset detection
Poster @ ISMIR 2008

[flow chart: mirenvelope (auto-regressive, half-Hanning; power; diff, simple, order 8; Halfwave; *.8; log; *.2), mirfilterbank (Gammatone, Scheirer, Klapuri), mirframe → mirspectrum (Mel) → mirflux (Complex), mirrms, mirautocor, mirsimatrix, mirnovelty]

aud.score(..., 'Scheirer')
aud.score(..., 'Klapuri')

aud.eventdensity
temporal density of events

a = aud.score('audio.wav')
aud.eventdensity(a)
aud.eventdensity('audio.wav', 'Frame')
aud.eventdensity('score.mid')

[figures: detected events; event density curve, vs time]

[layer diagram: notes (symbolic level); dynamics, timbre (audio level); sound]

aud.score(..., 'Attack', 'Release')

aud.attacktime
duration of note attacks

[figure: centered envelope with attack phases marked, amplitude vs time (s)]

aud.attacktime(..., 'Lin'): duration in seconds
aud.attacktime(..., 'Log'): duration in log scale

Krimphoff, J., McAdams, S. & Winsberg, S. (1994). Caractérisation du timbre des sons complexes. II : Analyses acoustiques et quantification psychophysique. Journal de Physique, 4(C5), 625-628.

aud.attackslope
average slope of note attacks

[figure: centered envelope with attack phases marked, amplitude vs time (s)]

aud.attackslope(..., 'Diff'): average slope
aud.attackslope(..., 'Gauss'): Gaussian average, highlighting the middle of the attack phase

o = aud.onsets('audiofile', 'Attack', 'Release')
aud.attackslope(o)

Peeters, G. (2004). A large set of audio features for sound description (similarity and classification) in the CUIDADO project, version 1.0.

sig.zerocross
waveform sign-change rate

[figure: audio waveform zoomed to a few milliseconds, amplitude vs time (s)]

Indicator of noisiness
sig.zerocross(..., 'Per', 'Second'): rate per second
sig.zerocross(..., 'Per', 'Sample'): rate per sample
sig.zerocross(..., 'Dir', 'One'): crossings in one direction only
sig.zerocross(..., 'Dir', 'Both'): crossings in both directions
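A sign-change count in plain Matlab (a sketch independent of MiningSuite; for a pure sine the per-second rate is roughly twice the frequency):

```matlab
sr = 44100;
x  = sin(2*pi*440*(0:sr-1)/sr);    % 1 s of a 440 Hz sine
zc = sum(abs(diff(sign(x))) > 0);  % number of sign changes, ≈ 2 × 440 here
```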

sig.rolloff
high-frequency energy (I)

[figure: spectrum, magnitude vs frequency (Hz) — 85% of the energy lies below the 5640.53 Hz roll-off frequency]

sig.rolloff(..., 'Threshold', .85)
th = .85 in Tzanetakis, Cook. Musical genre classification of audio signals. IEEE Tr. Speech and Audio Processing, 10(5), 293-302, 2002.
th = .95 in Pohle, Pampalk, Widmer. Evaluation of Frequently Used Audio Features for Classification of Music Into Perceptual Categories, ?
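The 85% roll-off can be sketched in plain Matlab (x any signal, sr its sampling rate; a sketch independent of MiningSuite):

```matlab
sr = 44100; x = randn(1, sr);            % stand-in signal
m  = abs(fft(x)); m = m(1:floor(end/2)); % magnitude spectrum, positive frequencies
cs = cumsum(m.^2);                       % cumulative energy
k  = find(cs >= .85*cs(end), 1);         % first bin reaching 85% of the energy
f_rolloff = (k-1) * sr / length(x);      % roll-off frequency in Hz
```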

aud.brightness
high-frequency energy (II)

[figure: spectrum, magnitude vs frequency (Hz) — 53.96% of the energy lies above the 1500 Hz cutoff]

aud.brightness(..., 'CutOff', 1500) (in Hz)
aud.brightness(..., 'Unit', u), u = '/1' or '%'
3000 Hz in Juslin 2001, p. 1802.
1500 Hz and 1000 Hz in Laukka, Juslin and Bresin 2005.

aud.brightness(..., 'Frame')
frame length = .05 s, frame hop = 50%

[figures: frame-by-frame brightness, coefficient value vs temporal location of events (in s.) — Beethoven, 9th Symphony, Scherzo; Beethoven, 7th Symphony, Allegretto]

aud.mfcc
mel-frequency cepstral coefficients

Description of spectral shape

mircepstrum: audio → sig.spectrum → Abs → Log → (Inverse) Fourier transform
mirmfcc: audio → sig.spectrum → Abs → (Mel) → Log → Discrete Cosine Transform

aud.roughness
sensory dissonance

aud.roughness(..., 'Sethares')
sig.spectrum → sig.peaks → dissonance estimated for each pair of peaks → sum
Dissonance produced by two sinusoids depends on their frequency ratio.

Plomp & Levelt. "Tonal Consonance and Critical Bandwidth", Journal of the Acoustical Society of America, 1965.
Sethares, W. A. (1998). Tuning, Timbre, Spectrum, Scale, Springer-Verlag.

[layer diagram: notes (symbolic level); dynamics, timbre, pitch (audio level); sound]

aud.pitch
f0 estimation

p = aud.pitch(..., 'Autocor'): aud.filterbank → sig.frame → sig.autocor → .sum → sig.peaks
aud.pitch(..., 'AutocorSpectrum'): sig.autocor applied to sig.spectrum
aud.pitch(..., 'Cepstrum'): sig.cepstrum
aud.pitch(..., 'Frame', ...)

Peeters. Music Pitch Representation by Periodicity Measures Based on Combined Temporal and Spectral Representations. ICASSP 2006.

aud.pitch
f0 estimation

aud.pitch(..., 'Tolonen') =
aud.pitch(..., '2Channels', 'Enhanced', 2:10, 'Generalized', .67)
[flow chart: two-channel aud.filterbank → sig.frame → sig.autocor → .sum]

Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model. IEEE Transactions on Speech and Audio Processing, 8(6), 2000.

aud.pitch
f0 estimation

ac = sig.autocor('ragtime', 'Frame')
ac = sig.autocor(ac, 'Enhanced', 2:10)
p = aud.pitch(ac)
p.play
p.save

[figures: frame-by-frame waveform autocorrelation before and after enhancement, frequency (Hz) vs temporal location of frames (s); resulting pitch curve, coefficient value (in Hz)]

Tolonen, Karjalainen. A Computationally Efficient Multipitch Analysis Model. IEEE Transactions on Speech and Audio Processing, 8(6), 2000.

aud.pitch
f0 estimation

aud.pitch(..., 'Frame')
aud.pitch(..., 'Frame', 'Total', 1, 'Max', 400)

[figures: pitch curves for s103.wav, coefficient value (in Hz) vs temporal location of events (s)]

Symbolic level

Notes
Timbre

Audio level

Pitch
Sound

Dynamics

mus.pitch
pitch segmentation and discretisation

[Figure: Pitch, istikhbar — continuous pitch curve segmented and discretised into notes, coefficient value (in Hz) vs. temporal location of events (in s.)]

[Diagram: analysis architecture — audio level, symbolic level (notes), and structural levels, with meter highlighted.]

aud.fluctuation
rhythmic periodicity along auditory channels

aud.spectrum('Frame', .023, .5, 'Terhardt', 'Bark', 'Mask', 'dB')
[Figure: Bark-band spectrogram, bands vs. temporal location of beginning of frame (in s.)]

aud.spectrum(..., 'AlongBands', 'Max', 10, 'Window', 'No', 'Resonance', 'Fluctuation')
[Figure: fluctuation spectrum computed along each channel]

sig.sum
[Figure: summed fluctuation spectrum, magnitude vs. frequency (Hz)]

E. Pampalk, A. Rauber, D. Merkl, "Content-based Organization and Visualization of Music Archives", ACM Multimedia 2002, pp. 570-579.

mus.tempo
tempo estimation

e = sig.envelope(mysong, 'Diff')
[Figure: onset curve (differentiated envelope), amplitude vs. time (s)]

ac = sig.autocor(e)
[Figure: envelope autocorrelation, coefficients vs. lag (s)]

pa = sig.peaks(ac, 'Total', 1)
mus.tempo(pa)

t = mus.tempo(mysong)
t = t.eval
t = {sig.signal, sig.Autocor}

tempo: 129.6333 bpm
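The step from the selected autocorrelation peak to a tempo value is a simple lag-to-BPM conversion, sketched here in Python (illustrative only; the function name is hypothetical):

```python
def lag_to_bpm(lag_seconds):
    """Convert the best autocorrelation lag (the pulse period) to a tempo."""
    return 60.0 / lag_seconds

# The slide's 129.63 bpm corresponds to a lag of about 0.463 s:
bpm = lag_to_bpm(0.4629)
```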

mus.tempo
tempo estimation

e = sig.envelope(mysong, 'Diff')
ac = aud.autocor(e, 'Resonance')

The resonance curve (Toiviainen & Snyder, 2003; van Noorden & Moelants, 2001) emphasises the periodicities that are more easily perceived.

[Figure: resonance-weighted envelope autocorrelation with the selected peak, coefficients vs. lags (s)]

mus.tempo
tempo estimation

e = sig.envelope(mysong, 'Diff')
ac = aud.autocor(e, 'Frame', 'Resonance')
pa = sig.peaks(ac, 'Total', 1)
mus.tempo(pa)

t = mus.tempo(mysong, 'Frame')
t = t.eval
t = {sig.signal, sig.Autocor}

[Figure: onset curve (differentiated envelope); frame-decomposed envelope autocorrelation; resulting tempo curve, coefficient value (in bpm) vs. temporal location of events (in s.)]

mus.pulseclarity
rhythmic clarity, beat strength

mus.pulseclarity(..., 'Enhanced', 'No')
[Figure: envelope autocorrelation annotated with the descriptors MaxAutocor, MinAutocor, KurtosisAutocor, TempoAutocor, EntropyAutocor, InterfAutocor; landmarks at 240 BPM (.5 s lag) and 1 s lag]

mus.pulseclarity(..., 'Enhanced', 'Yes')
[Figure: enhanced autocorrelation, with EntropyAutocor and InterfAutocor]

Lartillot et al., ISMIR 2008

mus.tempo(mysong, 'Frame')

Pulsation does not always focus on one single metrical level.

mus.metre tracks all metrical levels in parallel.

[Figure: metrical levels 1 to 8 over time, with the dominant levels highlighted — Beethoven, 9th Symphony, Scherzo]

mus.metre
metrical hierarchy tracking

mus.metroid

[Diagram: analysis architecture — structural levels now including meter and mode/tonality.]

mus.chromagram
energy distribution along pitches

s = sig.spectrum(a, 'dB', 20, 'Min', 100, 'Max', 6400)
[Figure: spectrum, magnitude vs. frequency (Hz)]

c = mus.chromagram(s, 'Wrap', 'no')
[Figure: unwrapped chromagram, magnitude vs. chroma from G2 to C8]

c = mus.chromagram(c, 'Wrap', 'yes')
[Figure: wrapped chromagram, magnitude vs. the 12 chroma classes C to B]
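The wrapping step maps every spectral frequency to one of 12 (or, with a finer 'Res', more) chroma classes, discarding octave information. A minimal Python sketch of that mapping (illustrative; `freq_to_chroma` is a hypothetical name, A4 = 440 Hz assumed):

```python
import numpy as np

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F',
              'F#', 'G', 'G#', 'A', 'A#', 'B']

def freq_to_chroma(f_hz, res=12):
    """Map a frequency to a wrapped chroma class index (0 = C)."""
    midi = 69 + 12 * np.log2(f_hz / 440.0)   # fractional MIDI pitch
    return int(round(midi * res / 12)) % res

a4 = NOTE_NAMES[freq_to_chroma(440.0)]     # 'A'
c4 = NOTE_NAMES[freq_to_chroma(261.63)]    # 'C' (middle C)
```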

mus.chromagram
chroma resolution

mus.chromagram(..., 'Res', 12)
[Figure: chromagram with 12 chroma classes (C to B) vs. temporal location of beginning of frame (in s.)]

mus.chromagram(..., 'Res', 36)
[Figure: chromagram with 36 chroma classes per octave]

mus.keystrength
probability of key candidates

mus.chromagram -> cross-correlations with the profiles of each key candidate (C major, C minor, C# major, C# minor, D major, D minor, ...)

Krumhansl, Cognitive Foundations of Musical Pitch. Oxford UP, 1990.
Gomez, "Tonal description of polyphonic audio for music content processing", INFORMS Journal on Computing, 18-3, pp. 294-304, 2006.
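The cross-correlation step can be sketched in Python with the Krumhansl-Kessler probe-tone profiles (a minimal illustration, not the MiningSuite implementation; the chromagram here is a toy C-major triad):

```python
import numpy as np

# Krumhansl-Kessler key profiles (C major / C minor probe-tone ratings)
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])

def key_strength(chroma):
    """Correlate a 12-bin chromagram with all 24 rotated key profiles."""
    scores = {}
    for k in range(12):
        rolled = np.roll(chroma, -k)   # transpose the chroma down to C
        scores[(k, 'maj')] = np.corrcoef(rolled, MAJOR)[0, 1]
        scores[(k, 'min')] = np.corrcoef(rolled, MINOR)[0, 1]
    return scores

chroma = np.zeros(12)
chroma[[0, 4, 7]] = 1.0          # energy on C, E, G
scores = key_strength(chroma)
best = max(scores, key=scores.get)   # (0, 'maj'): C major
```

mus.key then simply picks the peak(s) of this 24-value key-strength vector.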

mus.key
key estimation

sig.peaks(mus.keystrength(...))

mus.key(..., 'Total', 1)

mus.key
key estimation

k = mus.key(mysong, 'Frame')
k = k.eval
k = {sig.signal, sig.signal, mus.Keystrength}

[Figure: Key — estimated key (maj/min) per frame; Key strength — strength of the 24 key candidates (CM ... BM, Cm ... Bm) vs. temporal location of beginning of frame (in s.); Key clarity — clarity of the tonal center, coefficient value vs. temporal location of events (in s.)]

mus.key
key estimation

[Figure: frame-by-frame key estimates (maj/min) for four excerpts (Umile01, son32.wav, son27.wav, son5.wav):
Monteverdi, "Hor ch'el ciel e la terra", 1st part;
Tiersen, "Comptine d'un autre été : L'après-midi";
Beethoven, 9th Symphony, Scherzo;
Schönberg, "Verklärte Nacht", Sehr Ruhig]

[k c] = mus.key
key clarity

[Figure: Key clarity, son31.wav — Tchaikovsky, Swan Lake, Act One: curve ranging from unclear to clear tonality, coefficient value vs. temporal location of events (in s.)]

[Figure: Key clarity, son24.wav — Prokofiev, Violin Concerto No. 1 in D major, Scherzo: Vivacissimo]

mus.mode
mode estimation

[Figure: Mode, son28.wav — Schubert, Piano Trio No. 2 in E-flat major, Andante con moto: coefficient value oscillating between major (> 0) and minor (< 0) over time]

[Figure: Mode, son22.wav — Mozart, Piano Concerto No. 23 in A major, Adagio]

mus.keysom
self-organizing map

[Figure: self-organizing map projection of the chromagram onto a toroidal map of keys (major keys A, E, B, C, G, D, Db, Ab, Eb, Bb, Gb with their relative minors).]

Toiviainen & Krumhansl, "Measuring and modeling real-time responses to music: The dynamics of tonality induction", Perception 32-6, pp. 741-766, 2003.

[Diagram: analysis architecture — structural levels: meter, mode/tonality.]

onset-based segmentation

o = aud.onsets(audiofile, 'Attack', 'Release')
sig.segment(audiofile, o)

[Figure: audio waveform with detected onset-based segment boundaries, amplitude vs. time (s)]

onset-based segmentation

s = sig.segment(audiofile, o)
[Figure: segmented audio waveform, amplitude vs. time (s)]

mus.pitch(s)
[Figure: Pitch, ragtime — pitch per segment, coefficient value (in Hz)]

onset-based segmentation

s = sig.segment(audiofile, o)
[Figure: segmented audio waveform, amplitude vs. time (s)]

mus.key(s)

[Diagram: analysis architecture — structural levels: groups, meter, mode/tonality.]

Here, the parameterization is optimized for analysis rather than for compactness. The sole requirement is that similar source audio samples produce similar features.

sig.simatrix
dissimilarity matrix

sig.spectrum(..., 'Frame') -> sig.simatrix(a, 'Dissimilarity')

sig.simatrix(..., 'Distance', 'cosine')

The analysis proceeds by comparing all pairwise combinations of audio windows using a quantitative similarity measure. Represent the B-dimensional spectral data computed for N windows of digital audio as {v_i : i = 1, ..., N} in IR^B. Using a generic similarity measure d : IR^B x IR^B -> IR, we embed the similarity data in a matrix S, as illustrated in the right panel of Figure 1:

S(i, j) = d(v_i, v_j),  i, j = 1, ..., N.

The time axis runs on the horizontal (left to right) and vertical (top to bottom) axes of S and along the main diagonal, where self-similarity is maximal. A common similarity measure is the cosine distance, with v_i and v_j representing the spectrograms for sample times i and j, respectively:

d_cos(v_i, v_j) = <v_i, v_j> / (|v_i| |v_j|).

[Figure: dissimilarity matrix, temporal location of frame centers (in s.) on both axes]

Foote, Cooper. "Media Segmentation using Self-Similarity Decomposition", SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.

sig.simatrix
similarity matrix

S(i, j) = d(v_i, v_j),  i, j = 1, ..., N.

d_cos(v_i, v_j) = <v_i, v_j> / (|v_i| |v_j|).

There are many possible choices for the similarity measure, including norm-based or non-negative exponential measures, such as:

sig.simatrix(a, 'Similarity', 'exponential')

d_exp(v_i, v_j) = exp(d_cos(v_i, v_j) - 1).

[Figure: similarity matrix and dissimilarity matrix side by side, temporal location of frame centers (in s.) on both axes. The right panel of Figure 1 shows a similarity matrix computed from the song "Wild Honey".]

Foote, Cooper. "Media Segmentation using Self-Similarity Decomposition", SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.
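The cosine self-similarity matrix S(i, j) = d_cos(v_i, v_j) reduces to a normalised dot product over feature frames, as in this Python sketch (illustrative; the toy frames stand in for spectrogram columns):

```python
import numpy as np

def simatrix(frames):
    """Cosine self-similarity matrix of column-wise feature frames."""
    frames = np.asarray(frames, dtype=float)
    norms = np.linalg.norm(frames, axis=0, keepdims=True)
    unit = frames / np.maximum(norms, 1e-12)   # unit-length columns
    return unit.T @ unit                       # S[i, j] = cos(v_i, v_j)

# Two repeated spectral states -> visible 2x2 block structure
frames = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1]], dtype=float)
S = simatrix(frames)   # ones within each block, zeros across blocks
```

The block structure along the diagonal is exactly what the novelty kernel (next slides) is designed to detect.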

aud.novelty
novelty curve

sig.simatrix(a)
[Figure: similarity matrix, temporal location of frame centers (in s.) on both axes]

aud.novelty(a, 'KernelSize', s)

Convolution along the diagonal with a Gaussian checkerboard kernel (e.g. s = 128 samples).

[Figure: novelty curves for several kernel sizes — a small kernel reveals short-term detail, a large kernel reveals the main sectional boundaries; coefficient value vs. temporal location of frames (in s.)]

Foote, Cooper. "Media Segmentation using Self-Similarity Decomposition", SPIE Storage and Retrieval for Multimedia Databases, 5021, 167-75.
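Foote's checkerboard correlation can be sketched in Python (a minimal illustration under stated assumptions, not the MiningSuite implementation; names are hypothetical):

```python
import numpy as np

def checkerboard(n):
    """Gaussian-tapered checkerboard kernel of size 2n x 2n."""
    sign = np.kron(np.array([[1, -1], [-1, 1]]), np.ones((n, n)))
    g = np.exp(-np.linspace(-1, 1, 2 * n) ** 2 * 4)
    return sign * np.outer(g, g)

def novelty(S, n=2):
    """Slide the kernel along the main diagonal of similarity matrix S."""
    K = checkerboard(n)
    N = len(S)
    out = np.zeros(N)
    for i in range(n, N - n):
        out[i] = np.sum(K * S[i - n:i + n, i - n:i + n])
    return out

# Block-diagonal similarity matrix: one boundary in the middle
S = np.kron(np.eye(2), np.ones((4, 4)))
nv = novelty(S)   # peaks at the boundary between the two blocks (index 4)
```

High within-block similarity on the kernel's diagonal quadrants and low cross-block similarity on its off-diagonal quadrants yield a novelty peak at each section boundary.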

aud.segment
novelty-based segmentation

s = aud.mfcc(...)
sm = sig.simatrix(s)
nv = aud.novelty(sm)
sg = sig.segment(mysong, nv)

sg = sig.segment(mysong)
sg.play

[Figure: MFCC similarity matrix, novelty curve with selected peaks, and the resulting segmented audio waveform]

sig.cluster
clustering of audio segments

sg = aud.segment(a)
cc = aud.mfcc(sg)
sig.cluster(sg, cc)

[Figure: audio waveform, per-segment MFCCs, and the waveform with segments grouped into clusters]

sig.cluster
clustering of frame-decomposed feature

cc = aud.mfcc(..., 'Frame')
sig.cluster(cc, 4)

[Figure: frame-wise MFCCs before and after clustering into 4 clusters, coefficient ranks vs. temporal location of beginning of frame (in s.)]

sig.stat
basic statistics

mean
standard deviation
temporal slope
main periodicity: frequency, amplitude
periodicity entropy

sig.hist
histogram

sig.hist(..., 'Number', 10)

[Figure: histogram of the audio waveform, number of occurrences vs. values]

sig.zerocross
waveform sign-change rate

[Figure: audio waveform zoomed in around 1.70 s, amplitude vs. time (s)]

Applicable to any curve.

sig.zerocross(..., 'Per', 'Second'): rate per second
sig.zerocross(..., 'Per', 'Sample'): rate per sample
sig.zerocross(..., 'Dir', 'One'): crossings in one direction only
sig.zerocross(..., 'Dir', 'Both'): crossings in both directions
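The sign-change rate and its 'Per'/'Dir' options can be sketched in Python (illustrative only; option handling here merely mimics the interface described above):

```python
import numpy as np

def zerocross_rate(x, sr, per='Second', direction='Both'):
    """Sign-change rate of a curve, per second or per sample."""
    s = np.sign(np.asarray(x, dtype=float))
    s[s == 0] = 1                        # treat exact zeros as positive
    changes = np.diff(s)
    if direction == 'One':
        n = np.count_nonzero(changes > 0)    # only - to + crossings
    else:
        n = np.count_nonzero(changes != 0)   # both directions
    return n * sr / len(x) if per == 'Second' else n / len(x)

sr = 1000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 5 * t)   # 5 Hz sine: about 2*f sign changes per second
r_both = zerocross_rate(x, sr)
r_one = zerocross_rate(x, sr, direction='One')
```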

sig.centroid
geometric center

first moment:
μ1 = ∫ x f(x) dx

sig.spread
variance, dispersion

second moment:
σ² = μ2 = ∫ (x - μ1)² f(x) dx

standard deviation: σ

sig.skewness
non-symmetry

third moment:
μ3 = ∫ (x - μ1)³ f(x) dx

third standardized moment: μ3 / σ³

[Figure: skewed distributions — skewness < 0 (tail to the left) vs. > 0 (tail to the right)]

sig.kurtosis
peakedness

fourth moment:
μ4 = ∫ (x - μ1)⁴ f(x) dx

fourth standardized moment: μ4 / σ⁴

excess kurtosis: μ4 / σ⁴ - 3

[Figure: flat (excess kurtosis < 0) vs. peaked (> 0) distributions]
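The four moment-based descriptors above (centroid, spread, skewness, excess kurtosis) can be computed together from any curve treated as a distribution, as in this Python sketch (illustrative; names are hypothetical):

```python
import numpy as np

def moments(x, f):
    """Centroid, spread, skewness, excess kurtosis of distribution f over x."""
    f = np.asarray(f, dtype=float)
    p = f / f.sum()                       # normalise to a distribution
    mu1 = np.sum(x * p)                   # centroid (first moment)
    mu2 = np.sum((x - mu1) ** 2 * p)      # spread (variance)
    sigma = np.sqrt(mu2)
    skew = np.sum((x - mu1) ** 3 * p) / sigma ** 3
    kurt = np.sum((x - mu1) ** 4 * p) / sigma ** 4 - 3   # excess kurtosis
    return mu1, mu2, skew, kurt

x = np.arange(5.0)
f = np.array([1.0, 4.0, 6.0, 4.0, 1.0])   # symmetric, binomial-shaped curve
mu1, mu2, skew, kurt = moments(x, f)       # 2.0, 1.0, 0.0, -0.5
```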

sig.flatness
smooth vs. spiky

geometric mean over arithmetic mean:

( ∏_{n=0}^{N-1} x(n) )^{1/N}  /  ( (1/N) ∑_{n=0}^{N-1} x(n) )
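The ratio is 1 for a perfectly flat curve and tends to 0 for a spiky one, as this Python sketch shows (illustrative; a small epsilon is added for numerical safety, an assumption of this sketch):

```python
import numpy as np

def flatness(x):
    """Geometric mean / arithmetic mean: 1 for flat, near 0 for spiky."""
    x = np.asarray(x, dtype=float)
    geo = np.exp(np.mean(np.log(x + 1e-12)))   # geometric mean via logs
    return geo / np.mean(x)

flat = flatness(np.ones(8))               # flat curve -> 1
spiky = flatness([1e-6] * 7 + [1.0])      # single peak -> near 0
```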

sig.entropy
relative Shannon entropy

Data considered as a probability distribution:
p ≥ 0: half-wave rectification
∑p = 1: normalization

Shannon entropy: H(p) = -∑ p log(p)

Relative entropy, independent of the sequence length:
H(p) = -∑ p log(p) / log(length(p))

Entropy gives an indication of the shape of the curve:
High entropy -> uncertainty -> flat curve
Low entropy -> certainty -> peak(s)

Shannon, C. E. 1948. "A Mathematical Theory of Communication". Bell Systems Technical Journal 27:379-423.
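The relative entropy above is a two-liner once the curve is rectified and normalised, as in this Python sketch (illustrative; zero bins are skipped, following the convention 0 log 0 = 0):

```python
import numpy as np

def rel_entropy(x):
    """Relative Shannon entropy of a curve, in [0, 1]."""
    p = np.maximum(np.asarray(x, dtype=float), 0)   # half-wave rectification
    p = p / p.sum()                                 # normalization
    nz = p[p > 0]                                   # 0 log 0 := 0
    return float(-np.sum(nz * np.log(nz)) / np.log(len(p)))

flat = rel_entropy(np.ones(16))     # flat curve -> 1
peak = rel_entropy(np.eye(16)[0])   # single peak -> 0
```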

sig.export
exportation of statistical data to files

m = aud.mfcc(...)

sig.export(filename, m, ...): adding one or several data from MiningSuite operators.
sig.export('result.txt', ...): saved in a text file.
sig.export('result.arff', ...): exported to WEKA for data mining.
sig.export('Workspace', ...): saved in a Matlab variable.

aud.play
feature-based playlist

zc = sig.zerocross('Folder');
aud.play('Folder', 'Increasing', zc)
Plays the folder of audio files in increasing order of zero-crossing rate.

aud.play('Folder', 'Increasing', zc, 'Every', 5)
Plays one out of five audio files.

sig.dist
distance between features

s1 = aud.spectrum(a, 'Mel')
s2 = aud.spectrum(b, 'Mel')
[Figure: Mel-spectra of files a and b, magnitude vs. Mel bands]

sig.dist(s1, s2, 'Cosine')
The Mel-Spectrum Distance between files a and b is 0.1218

sig.dist
distance between clusters

c1 = aud.mfcc(a, 'Frame')
c2 = aud.mfcc(b, 'Frame')
c1 = sig.cluster(c1, 16)
c2 = sig.cluster(c2, 16)
[Figure: frame-wise MFCCs of the two files, coefficient ranks vs. temporal location of beginning of frame (in s.)]

Earth Mover's Distance between clusters:
sig.dist(c1, c2)
The MFCC Distance between files a and b is 19.3907

sig.dist
distance between features

s1 = aud.spectrum(a, 'Mel')
s2 = aud.spectrum('Folder', 'Mel')
[Figure: Mel-spectrum of file a and of each file in the folder]

sig.dist(s1, s2)
The Mel-Spectrum Distance between files a and b1 is 0.2386
The Mel-Spectrum Distance between files a and b2 is 0.45729
The Mel-Spectrum Distance between files a and b3 is 0.6338
The Mel-Spectrum Distance between files a and b4 is 0.20082

.getdata
returns data in Matlab format

s = sig.spectrum(file);
Encapsulated data: numerical data, related sampling rates, related file name, etc.
s.getdata -> vector

.getdata
returns data in Matlab format

s = sig.spectrum(file, 'Frame');
s.getdata -> matrix

.getdata
returns data in Matlab format

f = sig.filterbank(file, 'Frame');
f.getdata -> matrix

.getdata
returns data in Matlab format

sg = sig.segment(file)
f = sig.filterbank(sg, 'Frame');
f.getdata -> cell array of matrices (one matrix per segment, ...)

.getpeakpos, .getpeakval
returns data in Matlab format

p = sig.peaks(...)
p.getpeakpos -> vector
p.getpeakval -> vector

Limitations of data flow in MIRtoolbox

long audio file, batch of files:
miraudio -> mirframe -> mirspectrum -> mircentroid

a = miraudio(bigfile)
f = mirframe(a)
s = mirspectrum(f)
mircentroid(s)

mircentroid(bigfile, 'Frame')

Data flow graph
design & evaluation

long audio file, batch of files:
miraudio -> mirframe -> mirspectrum -> mircentroid

a = miraudio('Design', ...)
s = mirspectrum(a, 'Frame', ...)
c = mircentroid(s)
mireval(c, bigfile)

Data flow graph
in MiningSuite

long audio file, batch of files:
sig.input -> sig.spectrum -> sig.centroid

a = sig.input(bigfile, ...);
s = sig.spectrum(a, 'Frame', ...);
c = sig.centroid(s)

No operation is performed. (The data flow graph is constructed without actual computation.)

sig.design
data flow graph design

a = sig.input(bigfile, ...);
s = sig.spectrum(a);
c = sig.centroid(s)

c: a sig.design object, storing only the data flow graph. Displaying c evaluates the design in order to show the results; but the result is not stored in c, so displaying c again triggers another evaluation of the design.

d = sig.ans: the last evaluation is stored in sig.ans.

d = c.eval: evaluate and store in a variable.

d = c.getdata

sig.design.eval
data flow graph evaluation

a = sig.input(bigfile, ...);
s = sig.spectrum(a);
c = sig.centroid(s)

c.eval: evaluate the data flow graph for the particular audio file (or 'Folder') assigned during the graph design.

c.eval(anotherfile): apply the same data flow graph to another audio file (or 'Folder').

sig.design.show
data flow graph display

a = sig.input(...);
s = sig.spectrum(a);
c = sig.centroid(s)
c.show

> sig.input ( 'ragtime' )
    frameconfig: 0
    mix: 'Pre'
    sampling: 0
    center: 0
    extract: []
    trim: 0
    trimwhere: 'BothEnds'
    trimthreshold: 0.0600
    halfwave: 0

> sig.spectrum ( ... )
    win: 'hamming'
    min: 0
    max: Inf
    mr: 0
    res: NaN
    length: NaN
    zp: 0
    wr: 0
    octave: 0
    constq: 0
    alongbands: 0
    ni: 0
    collapsed: 0
    rapid: 0
    phase: 1
    nl: 0
    norm: 0
    mprod: []
    msum: []
    log: 0
    db: 0
    pow: 0
    aver: 0
    gauss: 0
    timesmooth: 0

> sig.centroid ( ... )

sig.signal.design
design stored in the results

a = sig.input(...);
s = sig.spectrum(a);
c = sig.centroid(s);
d = c.eval
d.design
d.design.show

save result.mat d

1 year later:

load result.mat
d.design
d.design.show

3.

Symbolic approaches

MIDI Toolbox

nmat = readmidi('laksin.mid')
pianoroll(nmat)

[Figure: piano-roll of the melody, pitch (C4# to C5#) vs. time in beats]

nmat = [MIDI data matrix: one row per note, with columns onset (in beats), duration (in beats), MIDI channel, MIDI pitch, velocity, onset (in seconds), duration (in seconds)]

mus.score
score excerpt selection

mus.score(..., 'Notes', 10:20)

mus.score(..., 'StartTime', 30, 'EndTime', 60)

mus.score(..., 'Channel', 1)

mus.score(..., 'Trim')

mus.score(..., 'TrimStart', 'TrimEnd')

mus.score
score information

metrical grid: hierarchical construction of pulsations over multiple metrical levels

modal and tonal spaces: mapping pitch values on scale patterns (on delimited temporal regions)

syntagmatic chains: successive notes forming voices, enabling the expression of relative distances between successive notes (rhythmic values)

pianoroll(laksin, 'name', 'sec', 'vel');

mus.score
score information

mus.score('laksin.mid')
mus.save
mus.play

[Figure 1: Pianoroll notation of the two first phrases of "Läksin minä kesäyönä", pitch vs. time in seconds. The lower panel shows the velocity information.]

mus.pitch
pitch contour

m = mus.score(myfile)
m is of class mus.sequence
[Figure: pitch contour, MIDI pitch 63 to 71 over time]

s = mus.pitch(m)
s is of class sig.signal

h = mus.hist(s, 'Class')
[Figure: histogram of pitch classes]

mus.pitch('Inter')
pitch interval contour

m = mus.score(myfile)

i = mus.pitch(m, 'Inter')
[Figure: interval contour, in semitones from -2 to +7]

h = mus.histo(i)
mus.histo(i, 'Sign', 0)
[Figure: histograms of the pitch intervals, with and without interval sign]

mus.pitch
pitch contour

m = mus.score(myfile)
c = mus.pitch(m, 'Sampling', .25)
[Figure: pitch contour sampled every .25 s]

mus.autocor(c)
[Figure: autocorrelation of the pitch contour, over time lag]

mus.pitch
pitch contour

m1 = mus.score(myfile, 'EndTime', 5)
m2 = mus.score(myfile, 'StartTime', 5)
[Figure: pitch contours of the two halves]

p1 = mus.pitch(m1, 'Sampling', .25)
p2 = mus.pitch(m2, 'Sampling', .25)

sig.dist(p1, p2, 'Cosine')

F. Lerdahl, R. Jackendoff, A Generative Theory of Tonal Music, MIT Press, 1983

[Diagram: GTTM architecture — phenomenal accents and the musical surface feed the metrical structure and the grouping structure (via parallelism: GPR 6, GPR 7, MPR 1; GPR 2+3), which lead to time-span reduction (TSR) and prolongational reduction (PR); extrinsic (cultural) vs. intrinsic (structural) factors are distinguished.]

O. Lartillot, "Reflexions towards a generative theory of musical parallelism", Musicae Scientiae, DF 5, 2010.

[Diagram: analysis architecture — structural levels: groups, meter, mode/tonality.]

mus.score(..., 'Segment')
local segmentation

[Music example]

Tenney & Polansky, 1980
LBDM (Cambouropoulos, 1997)
Bod, 2002

An example of the first (rule-based) category is the algorithm by Tenney and Polansky (1980), which finds the locations where changes in clangs occur. These clangs correspond to large pitch intervals and large inter-onset intervals (IOIs). This idea is partly based on Gestalt psychology. For example, this algorithm segments "Läksin" in the following way:

mus.score(..., 'Segment')
local segmentation

segmentgestalt(laksin, 'fig');
mus.score(..., 'Segment', 'TP')

[Figure 22: Segmented version of "Läksin minä kesäyönä". The dotted line indicates clang boundaries and the black line indicates the segment boundary, both the result of the Gestalt-based algorithm (Tenney & Polansky, 1980). From MIDI Toolbox.]

Another segmentation technique uses the probabilities derived from the analysis of a corpus, although the correct segmentation is more in line with Tenney & Polansky's model. Finally, the Local Boundary Detection Model by Cambouropoulos (1997) is a more recent variant of the rule-based model that offers effective segmentation of monophonic input.

mus.score(..., 'Segment')
local segmentation

boundary(laksin, 'fig');
mus.score(..., 'Segment', 'LBDM')

[Figure: piano-roll with boundary strengths computed by the Local Boundary Detection Model (Cambouropoulos, 1997), time in beats. From MIDI Toolbox.]

mus.score(..., 'Segment')
local segmentation

mus.score(..., 'Segment', 'Essen')

[Figure: piano-roll with segmentation probabilities derived from the Essen collection (Bod, 2002), time in beats. From MIDI Toolbox.]

mus.score(..., 'Segment') vs. mus.score(..., 'Group')

[Music example]

mus.score(..., 'Group')
hierarchical grouping

[Score: Mozart, Variation XI on "Ah, vous dirai-je maman", K.265/300e]

[Figure: hierarchical grouping of the melody (Eb4 to B4) over time]

[Diagram: analysis architecture — structural levels now including ornamentation reduction alongside groups, meter and mode/tonality.]

mus.score(..., 'Reduce')
ornamentation reduction
(Lerdahl & Jackendoff: head)

[Score: Mozart, Variation XI on "Ah, vous dirai-je maman", K.265/300e]

mus.minr('laksin.mid', 'Group', 'Reduce')

Construction of a syntagmatic network.

[Figure: reduced melody (Eb4 to B4) over time]

[Diagram: analysis architecture — structural levels now including motifs (pattern) alongside groups, meter, mode/tonality and ornamentation reduction.]

motivic pattern
analysis

Bach, Invention in D minor

[Score with motive labels A, A', B]

Kresky (1977) analysis, with the computational description of each motive (the full table is reproduced two slides below).

motivic pattern
analysis

Bach, Invention in D minor — computer analysis

[Score with detected pattern occurrences A, A', B, B', B'', C]

Lartillot, Toiviainen, "Motivic matching strategies for automated pattern extraction", Musicae Scientiae, DF4A, 281-314, 2007.
motivic pattern
analysis

Bach, Invention in D minor

Name   | Kresky's interpretation  | Computational description (melodic)  | (rhythmic)
A      | Opening motive           | (D,E,F,G,A,Bb,C#,Bb,A,G,F,E,F)       | 16th notes
A'     | First sequence unit      | (-2,+1,+1,+1,+1,-6,+6,-1,-1,-1,-1,+1)| 16th notes
B      | Accompanying figure      | (F,A,D+,-,+,+,-)                     | 8th notes
a      | Motive shape             | (prefix of A)                        |
a1     | Reversed motive shape    | (not detected)                       |
a2=c   | Triadic structure        | (+,+,-)                              | 8th notes
a3=a3' | Retarded scalar climb    | (-2,+1,+1,+1,+1,-6)                  | 16th notes
a4=C   | Bass line                | (+7,-5,+1,+1,+1,-6)                  | 8th notes
a5     | Second sequence unit     | (+1,+1,-2,+1,+1,-6)                  | 16th notes
a6     | (not considered)         | (-,+1,+1,+1)                         | 16th notes
b      | Three-note scale         | (+1,+1)                              | 16th notes
b'     | Identifying a5 with a3   | (D5,E5,F5)                           | 16th notes
b''    | Same, transposed         | (C5,D5,E5)                           | 16th notes
b'''   | Three shape in bass line | (not detected)                       |
D      | (not considered)         | (-1,+1,+1,+1)                        | 16th except 1st
E      | (not considered)         | (E5,F5,D5,E5,F5)                     | 16th start. offbeat

Lartillot, Toiviainen, "Motivic matching strategies for automated pattern extraction", Musicae Scientiae, DF4A, 281-314, 2007.

motivic pattern analysis

[Figure: detected motivic patterns in Beethoven, 5th Symphony, Allegro con brio — chains of interval/duration pairs (e.g. 0, -2M, -2m, +3P over .5-beat values) on pitches such as G, Eb, F, Ab, D]

motivic pattern
analysis

[Score: J.S. Bach, Well-Tempered Clavier, Book II, Fugue XX, with the detected subject entries labelled L1, M1, U1, L2, U2, M2, U3, L3]

Lartillot, ISMIR 2014
mus.score(..., 'Motif')
motivic pattern analysis

mus.minr(beethoven, 'Motif')

[Figure: detected motives plotted on a pitch (C to A) vs. time (0-500) grid]

Lartillot, Poster, ISMIR 2014

structure complexity

Pattern extraction: a large set of irrelevant structures appears during the extraction phase, and cannot be computed extensively.

Pattern selection (longest, most frequent, ...): imperfections of the selection phase — expected patterns accidentally deleted, or insufficiently selective filtering.

Hence: improvement of the extraction process itself.

Cambouropoulos (2006), Conklin & Anagnostopoulou (2001), Rolland (1999)

closed patterns
ABCDECDEABCDECDE

n
io
it
t
e
p
e
r
n
r
e
t
t
a
p
f
o
n
io
t
ip
r
c
s
e
d
t
c
a
p
Complete and com

incremental approach

C D A B C D E A B C DE

Progressively analysing music through one single


pass

Controls structural complexity in a way similar to


the way listeners perceive music.

pattern cyclicity

ABCABCABCA
(Cambouropoulos, 1998)

heterogeneous pattern cycles

[Score example: cycles 1 to 5 over a note chain C Eb G C Eb Ab C Eb ... with intervals +3, +2m, -5m, -4 and 8th-note durations]

Lartillot, ISMIR 2014

Structural levels

[Diagram: the two-level architecture — audio level: sound, dynamics, pitch, timbre, meter; symbolic level: notes, patterns (motifs, groups, ornamentation reduction), mode/tonality, meter.]

Statistical m/tonal analysis

c = mus.chromagram('score.mid')
ks = mus.keystrength(c)
mus.key(ks)
mus.mode(ks)
mus.keysom(ks)

[Figure: key strength curves (C major, C# major, D major, …) computed from ks.]

Krumhansl, Cognitive Foundations of Musical Pitch, Oxford University Press, 1990.
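What mus.keystrength computes can be sketched, under the assumption of the Krumhansl profile-correlation method cited above, as follows — a plain-Python illustration (major keys only, for brevity), not the MiningSuite implementation:

```python
# Hedged sketch of the key-strength idea: correlate a 12-bin pitch-class
# histogram with the Krumhansl-Kessler major profile rotated to each tonic.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

# Krumhansl-Kessler major-key profile (tonic first), from Krumhansl (1990).
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
         2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def key_strength(pc_hist):
    """Correlation of the histogram with each of the 12 major keys."""
    return [pearson(pc_hist, MAJOR[-k:] + MAJOR[:-k]) for k in range(12)]

# Toy pitch-class histogram of a C-major melody (assumed input).
hist = [5, 0, 3, 0, 4, 3, 0, 5, 0, 3, 0, 2]
best_key = max(range(12), key=lambda k: key_strength(hist)[k])  # 0 = C
```

sig.peaks applied to such a strength curve then yields the estimated key, as in the command chain above.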

Score-level m/tonal analysis

Statistical tonal analysis describes the pitch distribution in frames. But what if a key transition occurs within one single frame?

Tonality is more than such a statistical description:

succession of chords rooted on the scale degrees

standard chord sequences, cadential formulae

patterns and grouping help emphasise chord changes

etc.

Score-level m/tonal analysis

Example chord sequence with scale degrees:
CM – CM – Dm (II) – GM (V) – CM (I) – Am (VI) – DM (I) – GM (V) – CM (I)

Audio / symbolic

[Diagram: audio → mus.chromagram → mus.keystrength → sig.peaks → mus.key; MIDI / score → mus.score → tonal analysis. Transcription links the audio and symbolic chains.]


Statistical metrical analysis

Onset curve (differentiated envelope):

o = aud.envelope('score.mid')
a = mus.autocor(o)
t = mus.tempo(a)
mus.pulseclarity(a)
mus.metre(a)

[Figure: onset curve (amplitude vs. time in s.) and tempo curve (in bpm, 130–180, over the temporal location of events in s.).]

Score-level metrical analysis

[Figure: four pages of score analysed into hierarchical metrical/parallelism grids — upper-case letters (A–I) label higher-level units, lower-case letter pairs (a–p) label lower-level units, and numbered positions (1–16) track the metrical cycle across repetitions.]

O. Lartillot, "Reflexions towards a generative theory of musical parallelism", Musicae Scientiae, Discussion Forum 5, 2010.

Audio / symbolic

[Diagram: audio → aud.envelope → mus.autocor → sig.peaks → mus.tempo; MIDI / score → mus.score → metrical analysis. Transcription links the audio and symbolic chains.]

In future work: symbolic-based grouping combining different musical factors.


4.

How it works

Matrix convention in MIRtoolbox

In standard Matlab code, the handling of matrix dimensions makes the code somewhat obscure.

In MIRtoolbox, one single matrix convention: bins × frames × channels.

This convention has to be followed everywhere in the code, and is not explicitly explained to readers.

Matrix convention in MIRtoolbox

One single convention that needs to allow complex things, such as the data of a batch of audio files that are each segmented:

{{bins × frames × channels, …}, …} (always structured like this)

Matrix convention in MiningSuite

There is no more constraint on the choice of dimensions:

samples: {'bin', 'sample', 'track'} → bins × samples × tracks

frequency bands: {'bin', 'freqband', 'track'} → bins × frequency bands × tracks

This allows configurations that were not possible in MIRtoolbox.

Matrix convention in MiningSuite

More complex configurations can be freely designed, outside of any pre-wired configuration:

{{bins × freq. bands × tracks, …}, …} (variable structures possible)

sig.data

New syntactic layer on top of Matlab that makes the operators' code simpler. A sig.data object combines:

the Matlab matrix itself

a dimension specification, for instance {'bin', 'sample', 'channel'}

cell layers, indicating if the data is a cell array corresponding to a batch of audio files, a sequence of segmented data, etc.

sig.data

x.size('sample') gives the number of samples.

x.sum('channel') sums the matrix along the channel dimension.

x.times(y) multiplies two sig.data objects, respecting dimension-type congruency.

x.apply(@xcorr, {}, {'sample'}, 2) applies xcorr along the sample dimension. The last argument notifies that xcorr does not work for matrices with more than 2 dimensions; the extra dimensions are automatically covered via loops.
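As a rough analogy — a hypothetical plain-Python class, far simpler than sig.data — operations address dimensions by name, and the name is resolved to a matrix axis internally:

```python
# Hedged sketch of the sig.data idea: a matrix tagged with dimension names,
# so operations address dimensions by name rather than by index.
# (Hypothetical class, 2-D only, for illustration.)
class NamedData:
    def __init__(self, data, dims):
        self.data = data          # nested lists: data[i][j]
        self.dims = dims          # e.g. ('bin', 'sample')

    def size(self, dim):
        axis = self.dims.index(dim)
        return len(self.data) if axis == 0 else len(self.data[0])

    def sum(self, dim):
        axis = self.dims.index(dim)
        if axis == 0:             # collapse rows, keep one value per column
            return [sum(col) for col in zip(*self.data)]
        return [sum(row) for row in self.data]

x = NamedData([[1, 2, 3], [4, 5, 6]], ('bin', 'sample'))
```

With this naming, x.sum('bin') and x.sum('sample') stay meaningful even if the storage layout changes — which is the point of the MiningSuite convention.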

sig.data

function out = routine(d, …)
    e = d.apply(@algo, {}, {'bin'}, 2);
    out = {e};
end

function ceps = algo(m, …)
    ceps = mfccDCTMatrix * m;
end

d: bins × samples × tracks; algo receives m, a 2-dimensional bins × samples matrix.

sig.signal.display

a = sig.input('myfile', 'Center')

sig.signal object:

X axis (bin pos): no

Y axis: yname 'audio', yunit, Ydata — a sig.data object (Matlab matrix, dimension specification, cell layers, indices, precise pos, precise val)

Sample axis: Sstart 0, Srate 44100, Ssize 184269

Frame axis: no; Channel axis: no; Peaks

date: 15.10.2014; ver: R2014b, MiningSuite 0.8

design: sig.design

sig.design

a = sig.input('myfile', 'Center')
b = sig.brightness(a, 'Frame')

[Figure: brightness curve (coefficient value, 0.2–0.6, over the temporal location of events in s.).]

The resulting sig.signal object has the same structure as before (yname 'brightness', Ydata as sig.data, axis parameters, date, version), plus a design field storing the sig.design object of b = sig.brightness(a, 'Frame'). Information such as time positions is regenerated on the fly.

b = sig.spectrum(a, 'Frame')

The sig.Spectrum object adds axis descriptions (sig.axis, sig.unit objects):

X axis: xname 'frequency', xunit 'Hz', xstart 0, rate 10.76, origin, generator @.., finder @..

Y axis: yname 'spectrum', unit flags (power 1, log 0), phase, Ydata (sig.data)

Sample axis, Frame axis, Channel axis, Peaks, design; date 15.10.2014; ver R2014b, MiningSuite 0.8

Data flow graph in MiningSuite

long audio file, batch of files:

a = sig.input('bigfile', …);
s = sig.spectrum(a, 'Frame', …);
c = sig.centroid(s)

With the terminating ';', no operation is performed: the data flow graph (sig.input → Frame → sig.spectrum → sig.centroid) is constructed without actual computation.

sig.design

a = sig.input('ragtime.wav', 'Extract', 0, 1);
b = aud.spectrum(a, 'Mel', 'log', 'Frame');
c = aud.mfcc(b)

Each call returns a sig.design object:

a: package 'sig', name 'input', input 'ragtime.wav', type sig.signal, argin 'Extract', 0, 1, options, frame: no, date 15.10.2014, ver R2014b / MiningSuite 0.8

b: package 'aud', name 'spectrum', input a, type sig.Spectrum, argin 'Mel', 'log', 'Frame', options, frame: .05 s, .5, date 15.10.2014, ver R2014b / MiningSuite 0.8

c: package 'aud', name 'mfcc', input b, type sig.Mfcc, options, frame: .05 s, .5, date 15.10.2014, ver R2014b / MiningSuite 0.8

sig.design

c = aud.mfcc('ragtime.wav', 'Frame')

The same chain of sig.design objects (sig.input → aud.spectrum → aud.mfcc, with the same fields as above) is obtained by deploying the implicit data flow prior to the actual operator, during the 'Init' phase.

sig.design.eval

sig.input(long audio file) → aud.spectrum → aud.mfcc

Divide the audio file into successive chunks.

Compute the whole process on each chunk separately.

Recombine the final results by concatenation.

(when there is no frame decomposition)
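The chunking strategy can be sketched as follows — illustrative plain Python, where the hypothetical process stands in for the whole operator chain:

```python
# Hedged sketch of chunk decomposition: process fixed-size chunks
# independently and concatenate the per-chunk results. This is only
# valid when the operator chain has no cross-chunk dependency.
def eval_in_chunks(signal, process, chunk_size):
    out = []
    for start in range(0, len(signal), chunk_size):
        out.extend(process(signal[start:start + chunk_size]))
    return out

# Toy sample-wise operator: chunked and whole-file results must agree.
halve = lambda chunk: [s / 2 for s in chunk]
whole = halve(list(range(10)))
chunked = eval_in_chunks(list(range(10)), halve, 4)
```

Extensive operators (see the diagram below) break this independence assumption, which is why they must be computed completely before the chain continues.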


[Diagram: MIRtoolbox data flow graph connecting miraudio to the whole operator family (mirspectrum, mirenvelope, mironsets, mirautocor, mircepstrum, mirpeaks, mirtempo, mirpulseclarity, mirmfcc, mirbrightness, mirrolloff, mirroughness, mirchromagram, mirkeystrength, mirkey, mirmode, mirkeysom, mirtonalcentroid, etc.).]

Intensive operators: chunk decomposition needs to be performed here.

Extensive operators: mirenvelope needs to be computed completely before going further.

mireval — non-frame-based evaluation

a = miraudio('Design');
c = mirmfcc(a);
mireval(c, 'Folder')

eval_each(c, 'song1.wav') traverses the chain of mirdesign objects (c: type @mirmfcc, options Rank=1:13, Delta=0, ...; type @mirspectrum, options Win='hamming', ...; a: type @miraudio, options Sampling=11025, ...), loading the file chunk by chunk, evaluating after decomposition, and summing.

A very complex paradigm, for a situation not practically common in MIR. MiningSuite is much simpler.

sig.design.eval('Folder')

a = aud.mfcc('Folder', …)
d = a.eval

For each separate file in the folder:

evaluate the design on that file (using the chunk decomposition)

store the results

sig.design.display / sig.design.eval / sig.signal.display

a = aud.mfcc('bigfile', …)

returns a sig.design object, storing only the data flow graph.

d = a.eval;

sig.design.eval outputs a sig.signal.

Without the terminating ';', sig.design.display = sig.design.eval, which outputs a sig.signal and calls sig.signal.display on the outputted sig.signal. The result can then be retrieved with d = sig.ans.

sig.operate — operator's main routine

+aud/brightness.m:

function varargout = brightness(varargin)
    varargout = sig.operate('aud', 'brightness', options, @init, @main, varargin);
end

aud.brightness('ragtime', 'Frame')

sig.operate first parses the arguments in the call, then creates a data flow design that starts with the implicit data flow (@init) and ends with the @main routine.

options specification

my_operator(filename, 'Threshold', 100, 'Normalize')

thres.key = 'Threshold';
thres.type = 'Numeric';
thres.default = 50;
options.thres = thres;

norm.key = 'Normalize';
norm.type = 'Boolean';
norm.default = 0;
options.norm = norm;
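The parsing that sig.operate performs on such specifications can be sketched as follows — a hypothetical plain-Python reduction, not the actual MiningSuite option handling, which is richer:

```python
# Hedged sketch of key/value option parsing: each option declares a key,
# a type and a default; the argument list is matched against those specs.
SPECS = {
    'Threshold': {'type': float, 'default': 50.0},
    'Normalize': {'type': bool, 'default': False},  # boolean keys act as flags
}

def parse_options(args):
    opts = {key: spec['default'] for key, spec in SPECS.items()}
    i = 0
    while i < len(args):
        key = args[i]
        spec = SPECS[key]
        if spec['type'] is bool:       # flag: presence alone sets it
            opts[key] = True
            i += 1
        else:                          # other keys consume one value
            opts[key] = spec['type'](args[i + 1])
            i += 2
    return opts
```

With this scheme, parse_options(['Threshold', 100, 'Normalize']) overrides both defaults, while parse_options([]) returns them untouched.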

options specification

my_operator(filename, 'Domain', 'Symbolic')

domain.key = 'Domain';
domain.type = 'String';
domain.choice = {'Signal', 'Symbolic'};
domain.default = 'Signal';
options.domain = domain;

@init — implicit data flow deployment

function varargout = brightness(varargin)
    varargout = sig.operate('aud', 'brightness', options, @init, @main, varargin);
end

function [x type] = init(x, option, frame)
    if ~istype(x, 'sig.Spectrum')
        x = sig.spectrum(x);
    end
    type = 'sig.signal';
end

[Diagram: audio file → sig.input → sig.signal → sig.spectrum → sig.Spectrum → aud.brightness → sig.signal]

@main

function varargout = brightness(varargin)
    varargout = sig.operate('aud', 'brightness', options, @init, @main, varargin);
end

function out = main(in, option)
    x = in{1};
    if ~strcmpi(x.yname, 'Brightness')
        res = sig.compute(@routine, x.Ydata, x.xdata, option.cutoff);
        x = sig.signal(res, 'Name', 'Brightness', 'Srate', x.Srate, 'Ssize', x.Ssize);
    end
    out = {x};
end

sig.compute

function out = main(in, option)
    …
    res = sig.compute(@routine, x.Ydata, x.xdata, option.cutoff);
    …

x.Ydata: {{bins × freq. bands × tracks, …}, …}

function out = routine(d, f, f0)
    e = d.apply(@algo, {f, f0}, {'element'}, 3);
    out = {e};
end

sig.compute applies the routine to the successive segments, successive audio files, etc.; each call to routine receives one bins × freq. bands × tracks matrix d.

sig.data.apply

function out = routine(d, f, f0)
    e = d.apply(@algo, {f, f0}, {'element'}, 3);
    out = {e};
end

function y = algo(m, f, f0)
    y = sum(m(f > f0, :, :)) ./ sum(m);
end

sig.data.apply applies the sub-routine algo to the data, properly formatted, with loops if needed.
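For reference, the computation inside algo — the proportion of spectral energy above a cutoff frequency — reads in plain Python as (an illustrative reduction with toy data, not MiningSuite code):

```python
# Hedged sketch of the brightness measure: fraction of spectral
# magnitude lying above a cutoff frequency.
def brightness(magnitudes, freqs, cutoff):
    total = sum(magnitudes)
    above = sum(m for m, f in zip(magnitudes, freqs) if f > cutoff)
    return above / total

# Toy spectrum (assumed values, for illustration only).
freqs = [100, 500, 1000, 2000, 4000]
mags = [4.0, 3.0, 1.0, 1.0, 1.0]
```

Here, with a 1500 Hz cutoff, only the 2000 Hz and 4000 Hz bins count, giving a brightness of 0.2.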

+aud/mfcc.m:

res = sig.compute(@routine, …);

function out = routine(d, …)
    e = d.apply(@algo, {}, {'bin'}, 2);
    out = {e};
end

function ceps = algo(m, …)
    ceps = mfccDCTMatrix * m;
end

d: bins × samples × tracks; algo receives m, a 2-dimensional bins × samples matrix.
seq.event
note characterization
Each note is a seq.event:

characterization of the
note: musical
parameter (instance of
class seq.paramstruct)

previous note

next note
0

10

mus.score(…, 'Group') — hierarchical grouping

mus.score('laksin.mid', 'Group')

[Figure: piano-roll of laksin.mid (pitches Eb4–B4, ~10 s) with the hierarchical grouping brackets overlaid.]

Groups, like notes, are instances of seq.event.

mus.score(…, 'Reduce') — ornamentation reduction

mus.score('laksin.mid', 'Group', 'Reduce')

[Figure: the same piano-roll with ornamental notes reduced away.]

Each syntagmatic relation between 2 notes is an instance of seq.syntagm.

mus.paramstruct — musical dimensions

For each note:

chromatic pitch; chromatic pitch class

diatonic pitch (letter, accidental, octave); diatonic pitch class

onset, offset times (in s.); metrical position

channel; rhythmic value

harmony, etc.

For each interval between notes (from specific to more general):

chromatic pitch interval; chromatic pitch interval class

diatonic pitch interval (number, quality); diatonic pitch interval class

gross contour

inter-onset interval

seq.param — parameter management

common(p1, p2) returns the common parametric description.

p1.implies(p2) tests whether p2 is more general than p1.
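The intended semantics can be sketched by treating a parametric description as a set of (dimension, value) pairs — an illustrative plain-Python reduction, not the seq.param code, with hypothetical dimension names:

```python
# Hedged sketch of parametric generalisation: a description with more
# (dimension, value) pairs is more specific; one with fewer is more general.
def common(p1, p2):
    """Most specific description shared by p1 and p2."""
    return {k: v for k, v in p1.items() if p2.get(k) == v}

def implies(p1, p2):
    """True if p2 is more general than (or equal to) p1."""
    return all(p1.get(k) == v for k, v in p2.items())

# Two melodic intervals: same contour and rhythm, different interval size.
a = {'contour': '+', 'interval': '2M', 'rhythm': 0.5}
b = {'contour': '+', 'interval': '3m', 'rhythm': 0.5}
```

Under this reading, common(a, b) keeps only contour and rhythm, and any description implies each of its generalisations — which is exactly what pattern matching over heterogeneous motif occurrences needs.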

seq.syntagm

s = seq.syntagm(n1, n2), connecting notes n1 → n2.

s.param is automatically computed from n1.param and n2.param.

mus.score — incremental approach

[Diagram: notes feed in parallel into pattern/motif discovery, grouping, ornamentation reduction, metrical analysis and mode/tonality analysis.]

Each successive note is progressively integrated into all musical analyses, driving their interdependencies.

mus.score — incremental approach

[Diagram: a pat.pattern tree whose branches carry seq.syntagm descriptions (interval 0 or +, rhythmic value 8, pitches G, F, Eb), with pat.occurrence objects linking pattern states to the positions where they occur in the note sequence.]

5.

How you can contribute



https://code.google.com/p/miningsuite/

All releases; Subversion repository, integrated with a code review environment

User's Manual and documentation in the wiki environment

Mailing lists: news; discussion list related to ongoing development, commits, issues registered and modified; discussion list for users

Tickets to issue bug reports

Open-source project

MIRtoolbox and the initial version of MiningSuite are mainly the work of one person. The aim is a transition to a tool controlled by a community, following standard open-source protocols.

The whole code should be clearly readable, and be subject to correction, modification and enrichment by the open community, after open discussions.

Further development of the toolbox core (architecture, new perspectives) is also subject to open discussion and community-based collaboration.

Open-source project — contribution acknowledgement

Each contributor's participation is acknowledged, in particular in the copyright notice of the source files to which he or she contributed.

Each new model based on particular research is acknowledged in the documentation. In particular, users of the model are asked to cite the related research papers in their own papers.

High-level musicological analysis refines lower-level transcription.

Pattern mining is a central process in symbolic-based music analysis.


Future directions
(on my side)

Melodic transformations (ornamentation)

Modelling form and style

Metrical, tonal/modal analysis in the symbolic domain

Polyphonic analysis in the symbolic domain

Future directions
(We need you!)

Systematic tests to check the validity of the results, to be run before each public version release

Measuring and controlling the variations of the results between versions

Integrating other methods from the MIR literature

Any other ideas?

We can also discuss it during ISMIR.

Acknowledgments

Academy of Finland research fellowship, 2009-14

Finnish Centre of Excellence in Interdisciplinary Music Research, University of Jyväskylä

Learning to Create (Lrn2Cre8)

Aalborg University, MusIC group

Thanks to all MIRtoolbox contributors and users

Thanks to MiningSuite beta-testers
