Piano Music Transcription Systems

Presented by: Greg Eustace

Introduction to polyphonic transcription systems, including pioneering work, basic architecture, piano transcription systems and current problems. Discussion of Marolt's paper "A Connectionist Approach to Automatic Transcription of Piano Music", including adaptive oscillator networks, neural networks and the SONIC system for piano transcription.

Polyphonic transcription
Transcription: The extraction of symbolic (notational) information from music, including pitches, dynamic levels, onset times and durations of notes. For automatic transcription systems the input is an audio file and the output could be a MIDI or score file. Polyphonic transcription systems have been in use since the early 1970s, beginning with the work of Moorer, whose system assumed only two voices, separated in frequency range, having different timbres and restricted intervallic relationships.
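As a rough illustration of the symbolic output described above, a transcribed note could be held in a small record like the one below. The `NoteEvent` name, its fields and the `frequency_to_midi` helper are illustrative assumptions, not something specified in the presentation. The frequency-to-note mapping is the usual equal-temperament convention with A4 = 440 Hz = MIDI note 69.

```python
import math
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """One transcribed note (hypothetical structure, for illustration only)."""
    pitch: int        # MIDI note number, e.g. 69 for A4
    onset: float      # onset time in seconds
    duration: float   # note length in seconds
    velocity: int     # dynamic level on the MIDI 1-127 scale


def frequency_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Map a fundamental frequency to the nearest equal-tempered MIDI note."""
    return int(round(69 + 12 * math.log2(freq_hz / a4_hz)))


# A middle C (about 261.63 Hz) starting at 0.5 s and lasting 1 s.
note = NoteEvent(pitch=frequency_to_midi(261.63), onset=0.5, duration=1.0, velocity=80)
print(note)
```

A list of such records carries the same information a MIDI file would: which note, when it starts, how long it lasts and how loud it is.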

Basic architecture of transcription systems: Front-end processing

The audio signal is converted to a time-frequency representation such as that provided by the short-time Fourier transform (STFT). Partials present in a frame are identified and their frequency and amplitude information is extracted. Peak picking is commonly achieved by setting an amplitude threshold. 'Partial tracks' are formed by connecting partials across frames based on their amplitude and frequency relationships.
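The peak-picking step can be sketched for a single frame as follows. This is a minimal illustration using only the standard library, with a naive DFT in place of a real STFT implementation. The function names, the Hann window, the frame length and the threshold value are all assumptions chosen for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 of one Hann-windowed frame."""
    n = len(frame)
    # Hann window reduces leakage between neighbouring bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def pick_peaks(mags, sample_rate, frame_len, threshold):
    """Return (frequency_hz, amplitude) for local maxima above the threshold."""
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            peaks.append((k * sample_rate / frame_len, mags[k]))
    return peaks


SAMPLE_RATE, FRAME_LEN = 8000, 256  # assumed values for the demo

# Test frame: two partials at 500 Hz and 1000 Hz, the second at half amplitude.
frame = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
         + 0.5 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
         for i in range(FRAME_LEN)]

peaks = pick_peaks(magnitude_spectrum(frame), SAMPLE_RATE, FRAME_LEN, threshold=20.0)
print(peaks)
```

Repeating this for successive frames and linking peaks whose frequencies and amplitudes change little from one frame to the next would give the partial tracks.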

Basic architecture: Blackboard systems for note identification

So-called blackboard systems use various criteria to group partials together in order to identify notes. Each criterion is represented by a 'knowledge source'. These could include information from physics, psychoacoustics or music theory. Blackboard systems often unify 'top-down' and 'bottom-up' approaches.

Basic architecture: Machine learning for note identification

Other systems use machine learning for pattern recognition, such as Hidden Markov Models and neural networks. This necessarily involves a training stage in which input-output pairs from a data set are introduced to the network.

Assumptions of transcription systems

Many transcription systems assume a specific instrument as input. Transcription problems that do not arise for that instrument therefore need not be accounted for in the system. Piano transcription systems don't need to deal with notes that modulate in frequency. Piano notes also have pronounced attacks, making onset detection easier.

Current problems

Types of error: missed and spurious notes.
Octave error (due to ambiguity between partials)
High polyphony
High and low notes
Repeated notes
Short note durations
Masking of low amplitude notes

A Connectionist Approach to Automatic Transcription of Piano Music

Matija Marolt

Auditory model & time-frequency analysis

The input audio is passed through a series of logarithmically spaced gammatone filters, with center frequencies between 70 and 6000 Hz. The output from each filter is processed using Meddis' model of hair cell transduction, involving half-wave rectification, saturation and reduction. This results in a quasi-periodic impulsive signal that represents the firing patterns of hair cells. Dynamic compression is also inherent, meaning that low amplitude partials will be more detectable.
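Two pieces of this front end are easy to sketch: the logarithmic spacing of the center frequencies, and the rectify-then-compress idea. The code below is only an illustration. The filter count `N_FILTERS` is an assumed value (the slides do not state it), the gammatone filters themselves are not implemented, and the power-law `compress` function is a simple stand-in, not Meddis' hair cell model.

```python
N_FILTERS = 200  # assumed filter count, not given in the slides


def log_spaced_centres(f_low, f_high, n_filters):
    """Center frequencies spaced evenly on a logarithmic axis."""
    ratio = (f_high / f_low) ** (1.0 / (n_filters - 1))
    return [f_low * ratio ** i for i in range(n_filters)]


def half_wave_rectify(signal):
    """Zero out negative samples (half-wave rectification)."""
    return [max(0.0, x) for x in signal]


def compress(signal, exponent=0.3):
    """Power-law compression of a non-negative signal (stand-in only)."""
    return [x ** exponent for x in signal]


centres = log_spaced_centres(70.0, 6000.0, N_FILTERS)
print(centres[0], centres[1], centres[-1])

# Compression narrows the gap between a loud and a quiet partial.
loud, quiet = compress(half_wave_rectify([1.0, 0.01]))
print(quiet / loud)  # larger than the original amplitude ratio of 0.01
```

With logarithmic spacing, each center frequency is a fixed ratio above the previous one, so low notes receive as many filters per octave as high notes.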

Auditory model & time-frequency analysis

Figure 1: Analysis of three partials of piano note F3 with the auditory model (Marolt, 2004).

Partial tracking using adaptive oscillators

Partial tracking is achieved using Large-Kolen adaptive oscillators, which synchronize to the frequency and phase of the driving signal (i.e. the output from the auditory model). A partial is identified when synchronization occurs. Synchronization operates according to a modified gradient descent rule, "minimizing an error function that describes the difference between input events and beginnings of oscillation cycles". The initial frequency of the oscillator is set to the center frequency of the corresponding filter. The oscillator attempts synchronization at the beginning of every cycle, so lower frequencies are slower to synchronize. The author has shown that adaptive oscillators can successfully track frequency modulated partials.
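The idea of an oscillator that adjusts its period and phase once per cycle can be sketched with a very simplified tracker. This is not the Large-Kolen oscillator and does not use the update rule from the paper. It is a plain phase-locked loop, with assumed gains `phase_gain` and `period_gain`, that nudges its period and expected cycle start by a fraction of the timing error at each input event.

```python
def track_frequency(event_times, initial_freq_hz, phase_gain=0.5, period_gain=0.2):
    """Follow a train of input events; return the frequency estimate after each.

    Simplified sketch: assumes exactly one input event per oscillation cycle.
    """
    period = 1.0 / initial_freq_hz
    expected = event_times[0]          # start aligned with the first event
    estimates = []
    for t in event_times:
        error = t - expected           # event early (<0) or late (>0)
        period += period_gain * error  # slow correction of the period
        # Fast correction of the phase, then advance to the next cycle start.
        expected += phase_gain * error + period
        estimates.append(1.0 / period)
    return estimates


# Input impulses at 220 Hz; the oscillator starts at 200 Hz, standing in
# for the center frequency of its filter.
events = [n / 220.0 for n in range(200)]
estimates = track_frequency(events, initial_freq_hz=200.0)
print(estimates[0], estimates[-1])
```

Because there is one update per cycle, a fixed number of updates takes longer in seconds at low frequencies, which matches the remark that lower frequencies are slower to synchronize.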

Partial tracking using adaptive oscillators

Figure 2: Partial tracking with adaptive oscillators (Marolt, 2004).

Partial tracking: Adaptive oscillator networks

Partial groups are tracked using 88 networks of up to ten adaptive oscillators each. Increasingly smaller networks are used for higher frequency notes, in correspondence with an upper bound of 6000 Hz on partial frequencies. The frequency of each oscillator in the network is initially set to an integer multiple of the fundamental. An excitatory relationship between the oscillators in a network allows synchronized oscillators to change the frequency of non-synchronized oscillators (based on harmonic relationships), thus achieving faster synchronization rates. The output of a network is a weighted sum of the outputs of its oscillators. Oscillators are weighted according to their closeness to ideal frequencies. An oscillator that deviates strongly from the ideal contributes less to the output of the network.
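The sizing rule and the weighted output can be sketched as below. The 6000 Hz bound and the limit of ten oscillators come from the text. Everything else is an assumption made for illustration: the Gaussian weighting, its 30-cent width, the use of exact integer multiples as the ideal frequencies, and all function names.

```python
import math

MAX_PARTIAL_HZ = 6000.0  # upper bound on partial frequencies (from the text)
MAX_OSCILLATORS = 10     # largest network size (from the text)


def midi_to_hz(midi_note):
    """Equal-tempered frequency with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)


def network_size(f0_hz):
    """Number of harmonics of f0 that stay at or below the frequency bound."""
    return min(MAX_OSCILLATORS, int(MAX_PARTIAL_HZ // f0_hz))


def closeness_weight(freq_hz, ideal_hz, width_cents=30.0):
    """Gaussian weight on the deviation in cents (assumed weighting shape)."""
    cents = 1200.0 * math.log2(freq_hz / ideal_hz)
    return math.exp(-0.5 * (cents / width_cents) ** 2)


def network_output(f0_hz, oscillator_freqs, oscillator_outputs):
    """Weighted sum of oscillator outputs; detuned oscillators count for less."""
    total = 0.0
    pairs = zip(oscillator_freqs, oscillator_outputs)
    for k, (freq, out) in enumerate(pairs, start=1):
        total += closeness_weight(freq, k * f0_hz) * out
    return total


for midi_note in (21, 69, 93, 108):  # A0, A4, A6, C8
    print(midi_note, network_size(midi_to_hz(midi_note)))

in_tune = network_output(440.0, [440.0, 880.0, 1320.0], [1.0, 1.0, 1.0])
detuned = network_output(440.0, [440.0, 880.0, 1400.0], [1.0, 1.0, 1.0])
print(in_tune, detuned)
```

Under these assumptions the network for A4 has the full ten oscillators, while the network for C8 has a single one, since its second harmonic already exceeds 6000 Hz.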

Artificial Neural Networks

Artificial Neural Network (ANN): Models the neural structure of the brain. In simple terms, the ANN receives information from various sources through 'input neurons', combines or transforms the information in some way (handled by neurons in hidden layers) and outputs that information via the firing of 'output neurons'.
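The input, hidden and output stages can be made concrete with a tiny feed-forward pass. This is a generic sketch, not one of the network types evaluated in the paper. The layer sizes, the sigmoid activation and the hand-picked weights are all hypothetical.

```python
import math


def sigmoid(x):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input neurons -> hidden layer -> output neurons."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Hypothetical weights: 3 inputs -> 2 hidden neurons -> 1 output neuron.
HIDDEN_W = [[1.5, -0.5, 0.8], [-1.0, 2.0, 0.3]]
HIDDEN_B = [0.1, -0.2]
OUTPUT_W = [[2.0, -1.5]]
OUTPUT_B = [0.0]

activation = forward([0.9, 0.1, 0.4], HIDDEN_W, HIDDEN_B, OUTPUT_W, OUTPUT_B)
print(activation)
```

In a trained network the weights would be learned from input-output pairs instead of being set by hand.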

Note detection using Neural Networks

The system uses 76 neural networks. Each network is trained to recognize a particular note (A1 to C8). The input to each network is accepted from a partial tracking module. The output of the network is a single neuron. A neuron with a high value represents the presence of the target note. The data set for testing consisted of synthesized piano pieces and piano chords, thus allowing for input-output patterns (300,000 in total) to be presented to the network.
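The count of 76 follows from the note range: A1 is MIDI note 33 and C8 is MIDI note 108, and 108 - 33 + 1 = 76. The sketch below lays out one detector per note and reads off the active notes from a list of output values. The 0.5 threshold and the function names are assumptions, and the output values are made up for the example.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
LOWEST_MIDI = 33    # A1
HIGHEST_MIDI = 108  # C8

# One target note per network, lowest to highest.
TARGET_NOTES = list(range(LOWEST_MIDI, HIGHEST_MIDI + 1))


def midi_to_name(midi_note):
    """Scientific pitch name, with middle C = C4 = MIDI 60."""
    return f"{NOTE_NAMES[midi_note % 12]}{midi_note // 12 - 1}"


def active_notes(network_outputs, threshold=0.5):
    """Names of notes whose network output exceeds the (assumed) threshold."""
    return [midi_to_name(note)
            for note, out in zip(TARGET_NOTES, network_outputs)
            if out > threshold]


# Made-up outputs: strong responses for C4 and E4, a weak one for G4.
outputs = [0.0] * len(TARGET_NOTES)
outputs[60 - LOWEST_MIDI] = 0.9
outputs[64 - LOWEST_MIDI] = 0.8
outputs[67 - LOWEST_MIDI] = 0.2

print(len(TARGET_NOTES), midi_to_name(TARGET_NOTES[0]), midi_to_name(TARGET_NOTES[-1]))
print(active_notes(outputs))
```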

Note detection using Neural Networks

Several different types of neural networks were tested, with time-delay NNs providing the best results.

Table 1: Performance statistics of neural network models for note recognition (Marolt, 2004).

Note detection using neural networks

The author compared the results of using the partial tracking method as input for the time-delay NNs with those of using a time-frequency transform (similar to the constant Q transform). The partial tracking method performed better.

Table 2: Average performance of systems with and without partial tracking (Marolt, 2004).

SONIC: A system for transcription of piano music

The partial tracking and note detection systems were incorporated into the SONIC system for piano transcription. SONIC is capable of detecting note onsets, lengths and loudness, as well as a particular piano's tuning and the presence of repeated notes. SONIC is available for download at: http://lgm.fri.uni-lj.si/SONIC.html

SONIC: A system for transcription of piano music

Figure 4: Structure of SONIC (Marolt, 2004).

SONIC: Onset detection

The onset detector involves splitting the audio into 22 frequency bands. The outputs are filtered to give a positive value when the signal rises and a negative value otherwise. The filter outputs control the activation of neurons which send impulses that indicate onsets. Multilayer perceptrons (MLPs) are used to determine if the impulse represents an onset or some other type of amplitude disturbance.
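The rise-detecting filter can be illustrated for a single band with a first difference of the amplitude envelope. This is a deliberately crude sketch: it handles one band, not 22, and a fixed threshold stands in for both the impulse-generating neurons and the MLP that SONIC uses to reject other amplitude disturbances. The function names, envelope values and threshold are assumptions.

```python
def rise_filter(envelope):
    """First difference: positive where the band's amplitude is rising."""
    return [0.0] + [b - a for a, b in zip(envelope, envelope[1:])]


def onset_candidates(envelope, threshold):
    """Frame indices where a rise larger than the threshold peaks."""
    diff = rise_filter(envelope)
    frames = []
    for i in range(1, len(diff) - 1):
        if diff[i] > threshold and diff[i] >= diff[i - 1] and diff[i] > diff[i + 1]:
            frames.append(i)
    return frames


# Made-up envelope of one band: two attacks plus a small ripple at frame 7.
envelope = [0.0, 0.0, 0.1, 0.9, 0.8, 0.7, 0.6, 0.65, 0.6, 0.1, 0.8, 0.7]
print(onset_candidates(envelope, threshold=0.3))
```

The small ripple is rejected by the threshold here. In SONIC that decision is left to a trained classifier.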

SONIC: Repeated notes

Detecting repeated notes poses a problem if notes which share the same partials are present in a chord containing a repeated note. SONIC uses MLPs for tracking repeated notes.

SONIC: Tuning, note lengths and dynamics

The piano's tuning needs to be detected prior to transcription. Adaptive oscillators are used to detect partials. Tuning is then calculated as a weighted sum of the deviation of the partials from ideal frequencies. Note terminations (and therefore lengths) are indicated when the note activation networks fall below a threshold. Dynamics are calculated using the amplitude envelope of the note's first harmonic.
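The tuning estimate and the note-length rule can be sketched as follows. The slides do not say how the weights are chosen or in what unit deviation is measured, so this example assumes deviations in cents and normalizes by the total weight, giving a weighted mean. The function names, threshold and frame duration are illustrative as well.

```python
import math


def cents(freq_hz, ideal_hz):
    """Deviation of a measured frequency from its ideal, in cents."""
    return 1200.0 * math.log2(freq_hz / ideal_hz)


def estimate_tuning(partials):
    """Weighted mean deviation in cents.

    partials: list of (measured_hz, ideal_hz, weight) tuples.
    """
    total_weight = sum(w for _, _, w in partials)
    return sum(w * cents(f, ideal) for f, ideal, w in partials) / total_weight


def note_length(activations, onset_frame, threshold, frame_seconds):
    """Seconds from onset until the activation first drops below threshold."""
    for i in range(onset_frame, len(activations)):
        if activations[i] < threshold:
            return (i - onset_frame) * frame_seconds
    return (len(activations) - onset_frame) * frame_seconds


# A piano tuned 10 cents sharp: every partial sits 10 cents above its ideal.
sharp = 2.0 ** (10.0 / 1200.0)
partials = [(440.0 * sharp, 440.0, 1.0), (880.0 * sharp, 880.0, 0.5)]
print(estimate_tuning(partials))

# Activation of one note network, one value per 10 ms frame.
print(note_length([0.9, 0.8, 0.7, 0.2, 0.1], onset_frame=0,
                  threshold=0.5, frame_seconds=0.01))
```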

Performance Statistics

Table 3: Performance statistics of transcriptions of 3 synthesized and 3 real piano recordings (Marolt, 2004).

Transcription results available at:


Error Discussion
The majority of errors encountered were octave errors and repeated-note errors. Additional sources of error include fast passages (such as arpeggios or trills), masking of low amplitude notes, missed onsets, high polyphony and very low pitched notes.

