ULNN
Transcription
BY:
• MOHD ADNAN 2K22CSUN01149
• YASH POONIA 2K22CSUN01159
• ARYAN BHANOT 2K22CSUN01130
• KRISHNAV 2K21CSUN0
ABSTRACT
• This paper proposes an innovative approach to automating the transcription of guitar audio into tablature.
• It addresses the labor-intensive and error-prone nature of manual methods by utilizing convolutional autoencoder neural networks trained on a dataset of guitar audio recordings and corresponding tablature annotations.
• Through this training, the model learns to encode the unique features of
guitar music into a concise representation, enabling accurate tablature
generation.
• Leveraging machine learning and neural networks, this approach offers
increased speed, scalability, and accessibility.
• Actual guitar performances are used to establish a ground truth, enabling the model to learn the exact fingerings from which tablature can be created.
• Humphrey & Bello (2014) developed a novel approach for generating guitar tablature from music audio; the authors also introduced a set of metrics to measure the performance of tablature estimation systems.
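The network described below takes spectrogram "images" as input. A minimal NumPy-only sketch of that kind of preprocessing is shown here; the frame size, hop length, and normalization are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def audio_to_spectrogram(audio, n_fft=254, n_frames=128):
    """Turn a mono audio clip into a 128x128 magnitude-spectrogram 'image'."""
    # Hop length chosen so exactly n_frames windows fit inside the clip
    hop = max(1, (len(audio) - n_fft) // (n_frames - 1))
    frames = np.stack([audio[i * hop : i * hop + n_fft] for i in range(n_frames)])
    # rfft of a 254-sample window yields 128 frequency bins -> (128, 128)
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    spec /= spec.max() + 1e-9          # scale to [0, 1] for the network
    return spec[..., np.newaxis]       # add channel axis -> (128, 128, 1)

audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s of A4 at 16 kHz
print(audio_to_spectrogram(audio).shape)  # (128, 128, 1)
```

Each clip thus becomes a single-channel 128x128 array matching the input shape quoted in the architecture description.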
• The first layer is a 2D convolutional layer with 32 filters, a kernel size of (3, 3), and ReLU activation.
• It takes input images of size (128, 128, 1).
• This is followed by a max pooling layer with a pool size of (2, 2).
• Then comes another convolutional layer with 64 filters and ReLU activation, followed by another max pooling layer.
• The model is compiled using the Adam optimizer with mean squared error as the loss function, and the autoencoder is then trained using the spectrogram images as both input and target output, for 10 epochs with a batch size of 32, shuffling the data during training.
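The layers and training setup described above can be sketched in Keras as follows. This is a hedged reconstruction: `padding="same"` is assumed (it is what makes the 64x64x64 encoded size reported later work out), and the decoder is not detailed in the text, so a mirror of the encoder is assumed here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder():
    # Encoder as described: Conv2D(32, (3,3), ReLU) -> MaxPool(2,2)
    #                       -> Conv2D(64, ReLU) -> MaxPool(2,2)
    inp = keras.Input(shape=(128, 128, 1))
    x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Decoder (assumed: the text does not detail it; a mirror of the encoder)
    x = layers.Conv2DTranspose(64, (3, 3), strides=2, activation="relu", padding="same")(x)
    x = layers.Conv2DTranspose(32, (3, 3), strides=2, activation="relu", padding="same")(x)
    out = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)
    model = keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

autoencoder = build_autoencoder()
# Training as described: spectrograms serve as both input and target.
# autoencoder.fit(spectrograms, spectrograms, epochs=10, batch_size=32, shuffle=True)
```

Because the target equals the input, the mean-squared-error loss directly measures how faithfully the network reconstructs each spectrogram from its compressed representation.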
FEATURE EXTRACTION
• In our study, we simplified our complex 'autoencoder' model by creating a
new 'encoder' model using its first three layers.
• This 'encoder' focuses on extracting essential features from input images.
When given an image, it produces a condensed summary akin to
compressing a large painting into a smaller version, emphasizing key
elements.
• The encoded features comprise 8 samples (spectrogram images); each sample has a feature map of dimensions 64x64, with 64 channels (one per filter).
• Each channel captures different aspects or patterns present in the input
images.
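A minimal sketch of this feature-extraction step. For self-containment the encoder front (the autoencoder's first three layers: Conv 32 → MaxPool → Conv 64) is rebuilt here rather than sliced from a trained model; `padding="same"` and the dummy batch of 8 random spectrograms are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Encoder = the autoencoder's first three layers:
# Conv2D(32) -> MaxPool(2,2) -> Conv2D(64), giving 64x64 maps with 64 channels.
inp = keras.Input(shape=(128, 128, 1))
x = layers.Conv2D(32, (3, 3), activation="relu", padding="same")(inp)
x = layers.MaxPooling2D((2, 2))(x)
encoded = layers.Conv2D(64, (3, 3), activation="relu", padding="same")(x)
encoder = keras.Model(inp, encoded)

spectrograms = np.random.rand(8, 128, 128, 1).astype("float32")  # dummy batch of 8
features = encoder.predict(spectrograms)
print(features.shape)  # (8, 64, 64, 64)
```

With a trained autoencoder in hand, the equivalent model can be sliced out directly, e.g. `keras.Model(autoencoder.input, autoencoder.layers[3].output)` (in a functional model, `layers[0]` is the input layer, so `layers[1..3]` are the first three described layers).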
CONCLUSION
• Our research presents an innovative method for automating guitar audio
transcription through the application of convolutional autoencoder neural
networks.
• By leveraging the synergy between machine learning and signal processing,
we have showcased the capacity to seamlessly convert guitar audio
recordings into tablature with exceptional precision and efficiency.
• This breakthrough not only streamlines the transcription process but also broadens access to musical transcription for musicians across proficiency levels.
• As automated transcription systems progress, they stand to redefine music
education, composition, and analysis, paving the way for novel avenues of
creativity and expression in the musical domain.
FUTURE WORK
• Utilizing the annotations accompanying the dataset can enhance transcription
accuracy by leveraging additional information about the audio recordings.
• Expanding the current model's prediction capability beyond the six standard
tuning classes of the guitar could lead to more precise audio transcriptions
for individual notes.
• Converting the audio into MIDI files offers a potential solution to simplify the
transcription process for individual notes, further improving accuracy and
efficiency.
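Whether notes arrive as MIDI or as per-note predictions, producing tablature ultimately requires mapping each pitch onto a string/fret position. A small illustrative helper for that step, using the standard-tuning open-string pitches (E2, A2, D3, G3, B3, E4); the function name and 20-fret limit are assumptions for illustration:

```python
# Open-string MIDI pitches in standard tuning; string 1 is the high E string.
STANDARD_TUNING = {
    1: 64,  # E4
    2: 59,  # B3
    3: 55,  # G3
    4: 50,  # D3
    5: 45,  # A2
    6: 40,  # E2
}

def candidate_positions(midi_pitch, max_fret=20):
    """Return all (string, fret) pairs that can sound the given MIDI pitch."""
    positions = []
    for string, open_pitch in STANDARD_TUNING.items():
        fret = midi_pitch - open_pitch  # frets raise pitch by one semitone each
        if 0 <= fret <= max_fret:
            positions.append((string, fret))
    return positions

print(candidate_positions(64))  # [(1, 0), (2, 5), (3, 9), (4, 14), (5, 19)]
```

The same pitch is playable at several positions, which is exactly why learning fingerings from real performances (as the ground-truth bullet above describes) matters: a transcriber must pick one position per note, not just the pitch.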