MUSICAL INSTRUMENT SOUND CLASSIFICATION USING DEEP CONVOLUTIONAL NEURAL NETWORK
Prof. R. S. Kothe
2019-2020
CERTIFICATE
Submitted by
is a bonafide work carried out by them under the supervision of Prof. R. S. Kothe, and it is approved in partial fulfillment of the requirement of Savitribai Phule Pune University for the award of the Degree of Bachelor of Engineering (Electronics and Telecommunication Engineering).
This Project Phase-1 report has not been submitted earlier to any other institute or university for the award of any degree or diploma.
Place: Pune
Date
ACKNOWLEDGEMENT
It gives us great pleasure to present the preliminary project report on ‘Musical Instrument Sound Classification using Deep Convolutional Neural Network’.
We would like to take this opportunity to thank our internal guide, Prof. R. S. Kothe, Department of Electronics and Telecommunication Engineering, SMT. KASHIBAI NAVALE COLLEGE OF ENGINEERING, for her unconditional guidance. We are really grateful for her kind support. An erudite teacher, a magnificent person and a strict disciplinarian, we consider ourselves fortunate to have worked under her supervision.
CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
1. INTRODUCTION
1.1 BACKGROUND
1.2 PROJECT UNDERTAKEN
1.3 OBJECTIVES
1.4 ORGANIZATION OF PROJECT REPORT
2. LITERATURE SURVEY
2.1 LITERATURE SURVEY
3. DESIGN AND DRAWING
3.1 SYSTEM ARCHITECTURE
3.2 ALGORITHMS
4. SYSTEM REQUIREMENT
4.1 HARDWARE REQUIREMENT
4.2 SOFTWARE REQUIREMENT
5. EXPERIMENTATION AND RESULT
6. OTHER SPECIFICATIONS
6.1 ADVANTAGES
6.2 LIMITATIONS
7. CONCLUSION
8. REFERENCES
ABSTRACT
Acoustics is the branch of physics that deals with the study of mechanical waves such as vibration and sound. It is important to find a way to classify musical instruments in order to obtain high-level information about a musical signal. For this purpose, the MUMS musical instrument sound database is used to feed data to the machine. The sound recognition problem consists of three stages: pre-processing of the signals, extraction of specific features and their classification. Signal pre-processing divides the input signal into segments which are used for extracting the related features. Feature extraction reduces the size of the data and represents the complex data as feature vectors. The trimmed sound signals, along with the characteristic timbre of the sound signals, are fed to a Convolutional Neural Network (CNN). Deep learning methods are used to perform feature extraction and classification from these signals. The CNN detects the features and builds a network to identify the different instruments. All the work, including the processing of the database and the creation of the neural network, is done using suitable software.
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
For over a decade, Music Information Retrieval (MIR) has been making efforts to provide reliable solutions for several music-related tasks. Musical instrument sound classification is a sub-task of MIR and deals with the identification of individual musical instruments and their families. The musical instrument classification problem consists of three steps: pre-processing, feature extraction and classification. The majority of the research on musical instrument classification is focused on feature extraction and feature analysis, of which feature extraction is vital. Finding a compact, efficient and robust feature set is a major challenge in instrument classification. In addition, the selection of a suitable classifier plays an important role in improving classification accuracy.
Automated instrument recognition has been at the core of sound research for a long time. Solving this problem would enable gains in related areas such as automatic transcription, source separation and more. Although it appears to be a straightforward problem, computers struggle with the task even for basic sound recordings that contain only a single instrument.
Musical Instrument Classification (MIC) is one of the major tasks for acquiring high-level information about a musical signal. Musical notes are classified into monophonic and polyphonic: in monophonic music a single instrument is played, and in polyphonic music two or more instruments are played concurrently.
1.2 PROJECT UNDERTAKEN
1.2.1 Motivation
Audio classification of the dataset is what inspired me to try to apply a CNN to instrument recognition. In many ways, instrument classification is very similar to object recognition in images: both contain one or more objects/instruments, both can come in different qualities, and in both the object or instrument can appear in different positions of the image or song. As mentioned, deep neural networks, and CNNs in particular, have shown remarkable performance when faced with complex problems. Despite that, there has been little research seeking to apply these techniques to automatic musical instrument recognition. This project investigates whether a CNN can improve the current state of the art in this field.
1.3 Objectives
• Study and analysis of different features of musical instrument sounds for accurate identification of musical instruments.
1.4 ORGANIZATION OF PROJECT REPORT
• Chapter 2 deals with the project-related work, i.e. the literature survey.
• Chapter 3 gives an overall view of the techniques used in the system, i.e. design and drawing.
CHAPTER 2
LITERATURE SURVEY
In the proposed system, the model tries to solve the MIR problem using feature learning based on a deep convolutional network. The music signal is represented by spectrograms, which serve as the input to the deep CNN. The CNN model, consisting of three convolutional and three fully connected layers, extracts features from the spectrograms and predicts ten instrument classes. In this work, the CNN model was trained on the spectrograms produced for the IRMAS dataset. Good results were achieved with the model for most of the instruments.
4. Juan S. Gomez, Jakob Abeber and Estefania Cano, “Jazz Solo Instrument Classification with Convolutional Neural Networks, Source Separation, and Transfer Learning”, Proceedings of the 19th ISMIR Conference, Paris, France, September 23-27, 2018
The extraction of timbre for musical identification from the audio signal has been the subject of several studies, in which various spectro-temporal parameters have been proposed and compared. Classification strategies are generally based on two discrimination approaches: direct classification and hierarchical classification. In both strategies, the feature vector is static and is used during the whole treatment process. In this paper, the authors propose a hierarchical classification in which the feature vector is dynamic and changes depending on each level and each node of the hierarchical tree. The feature vector was optimized and determined with the sequential backward selection (SBS) algorithm. Using a large database (RWC), the results show a score gain in musical instrument recognition performance with the proposed approach.
In this paper, the role of various features with different classifiers for the automatic identification of musical instruments is discussed. Piano, flute, trumpet, guitar, xylophone and violin are identified using various features and classifiers. Spectral features such as spectral centroid, spectral slope, spectral spread, spectral kurtosis, spectral skewness and spectral roll-off are used along with zero-crossing rate, autocorrelation and Mel Frequency Cepstral Coefficients (MFCC) for this purpose. The dependence of the instrument identification accuracy on these features is studied for different classifiers, and the analysis guides the selection of the features and classifiers that give better results.
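As a rough illustration of the kind of combined feature set described here (not the exact configuration used in the cited work), such features could be computed in MATLAB as sketched below, assuming the Audio Toolbox spectral descriptor and mfcc functions are available; the file name and the simple averaging over frames are illustrative assumptions.

% Sketch of a combined feature set: spectral descriptors, zero-crossing rate
% and MFCCs (assumes Audio Toolbox; per-frame values are averaged over time).
[x, fs] = audioread('violin_note.wav');                  % hypothetical file name
centroid = mean(spectralCentroid(x, fs));
spread   = mean(spectralSpread(x, fs));
skew     = mean(spectralSkewness(x, fs));
kurt     = mean(spectralKurtosis(x, fs));
slope    = mean(spectralSlope(x, fs));
rolloff  = mean(spectralRolloffPoint(x, fs));
zcr      = sum(abs(diff(sign(x)))) / (2 * length(x));    % zero-crossing rate (base MATLAB)
mfccMean = mean(mfcc(x, fs), 1);                         % average MFCCs over frames
featureVector = [centroid spread skew kurt slope rolloff zcr mfccMean];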
In this paper, the authors present a new method for quantitative estimation of the musical instrument categories that compose a music piece. The method uses a wavelet-based music source (i.e., musical instrument) separation algorithm and consists of two steps. In the first step, a source separation technique based on wavelet packets is applied to separate the musical instruments which compose the music piece. In the second step, a classification algorithm based on support vector machines is applied to estimate the musical category of each of the instruments identified in the first step. The method is evaluated on the publicly available Iowa Musical Instrument Database and is found to perform quite successfully.
This paper presents a new acoustic model that uses decision trees (DTs) as replacements for Gaussian mixture models (GMMs) to compute the observation likelihoods for a given hidden Markov model state in a speech recognition system. DTs have a number of advantageous properties: they do not impose restrictions on the number or types of features, and they automatically perform feature selection. Context-dependent DT-based models are also highly compact compared to conventional GMM-based acoustic models.
The main contribution of this paper was to examine topics that are often ignored in earlier work. Several features inspired by psychoacoustic and perceptual knowledge were re-examined to explore the efficiency of the timbre-dimension contribution for each feature and each instrument category.
CHAPTER 3
DESIGN AND DRAWING
3.1 SYSTEM ARCHITECTURE
Various attempts have been made to construct automatic musical instrument classification systems. Researchers have used different approaches and scopes, achieving different performances. However, the literature survey indicates that finding a reduced and efficient feature scheme remains a major challenge for musical instrument classification. To address this, we propose automatic musical instrument classification using MFCC features with a CNN for isolated musical instrument sounds. This work can be extended to polyphonic musical instrument sound classification, in which several instruments are played simultaneously; the timbral and related features can be extracted and used in the same way to classify polyphonic musical sound signals. The input to the proposed system is a musical instrument sound in the form of a wave file, and the output is the name of the instrument and its family. The proposed automatic musical instrument classification system comprises three main stages, pre-processing, feature extraction and classification, which are described in the following section.
1. Pre-processing
In this stage, the silent part of the music sound signal is removed, which helps to improve the time complexity of the system. An energy threshold value is used to detect and remove the silent part of the signal, as in the sketch below.
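A minimal sketch of this energy-threshold silence removal is given below, assuming MATLAB with a mono WAV input; the file name, frame length and threshold factor are illustrative assumptions, not values taken from the report.

% Energy-threshold silence removal (illustrative sketch, base MATLAB only).
% Assumptions: mono WAV input, non-overlapping 20 ms frames, threshold at 5%
% of the peak frame energy.
[x, fs]   = audioread('instrument_note.wav');   % hypothetical file name
frameLen  = round(0.02 * fs);                   % 20 ms frames
numFrames = floor(length(x) / frameLen);

energy = zeros(numFrames, 1);
for k = 1:numFrames
    seg = x((k-1)*frameLen + (1:frameLen));     % k-th frame
    energy(k) = sum(seg.^2);                    % short-time energy
end

threshold = 0.05 * max(energy);                 % keep frames above 5% of peak energy
trimmed = [];
for k = 1:numFrames
    if energy(k) > threshold
        trimmed = [trimmed; x((k-1)*frameLen + (1:frameLen))]; %#ok<AGROW>
    end
end
audiowrite('instrument_note_trimmed.wav', trimmed, fs);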
2. Feature extraction
Feature extraction is a vital part of the system. In this stage, acoustic features are extracted to enable effective classification using the CNN. Acoustic features such as temporal, spectral and MFCC features are computed, and significant statistical values of these features are used as feature components. The details of the feature extraction are explained below.
MFCCs are a short-time power spectral representation of a signal and reflect psychoacoustic properties of the human auditory system. MFCCs have been used extensively in speech analysis over the past few decades and have recently received more attention in music analysis. Here, attempts have been made to identify and classify musical instruments using MFCC features. The computation of MFCC features for a segment of a music signal consists of pre-emphasis, framing, windowing, FFT, triangular mel-scale band-pass filtering and DCT, as described below.
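As a rough illustration of this pipeline, the sketch below applies pre-emphasis and then computes MFCCs with the Audio Toolbox mfcc() function (assumed to be available), which internally performs the framing, windowing, FFT, mel filtering and DCT steps listed above; the pre-emphasis coefficient and the mean/standard-deviation summary are illustrative choices, not settings from the report.

% MFCC feature extraction sketch (assumes MATLAB Audio Toolbox is available).
[x, fs] = audioread('instrument_note_trimmed.wav');  % hypothetical file name
x = filter([1 -0.97], 1, x);          % pre-emphasis to boost high frequencies

% mfcc() returns one row of cepstral coefficients per analysis frame.
coeffs = mfcc(x, fs);

% Summarize the time-varying coefficients with simple statistics so that each
% recording is represented by a fixed-length feature vector.
featureVector = [mean(coeffs, 1), std(coeffs, 0, 1)];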
3.2 ALGORITHMS
• Step 2: Pooling
In this part, we cover pooling and explain how it generally works. Our focus here is a specific type of pooling, max pooling, although other approaches such as mean (or sum) pooling are also mentioned. A small numerical illustration is given after this list.
• Step 3: Flattening
This is a brief description of the flattening process and of how we move from pooled feature maps to a flattened layer when working with Convolutional Neural Networks.
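To make the pooling step concrete, here is a toy example of 2x2 max pooling with stride 2 on a small feature map; it uses only base MATLAB and arbitrary values.

% Toy illustration of 2x2 max pooling with stride 2 (values are arbitrary).
featureMap = [1 3 2 0;
              5 6 1 2;
              7 2 9 4;
              1 0 3 8];
pooled = zeros(2, 2);
for i = 1:2
    for j = 1:2
        block = featureMap(2*i-1:2*i, 2*j-1:2*j);
        pooled(i, j) = max(block(:));   % max pooling; use mean(block(:)) for mean pooling
    end
end
disp(pooled)                            % expected result: [6 2; 7 9]

The flattening step and the overall network can be sketched with the Deep Learning Toolbox, assuming spectrogram- or MFCC-based input patches of size 64x64x1 and ten instrument classes; the layer sizes, input dimensions and training options below are illustrative assumptions, not the architecture actually used in this project.

% Illustrative CNN layer stack (assumes Deep Learning Toolbox is available).
layers = [
    imageInputLayer([64 64 1])                    % e.g. a 64x64 time-frequency patch
    convolution2dLayer(3, 16, 'Padding', 'same')  % 3x3 convolution, 16 filters
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)             % Step 2: max pooling
    convolution2dLayer(3, 32, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(64)                       % Step 3: pooled maps are flattened here
    reluLayer
    fullyConnectedLayer(10)                       % ten instrument classes
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', 'MaxEpochs', 20, 'MiniBatchSize', 32);
% net = trainNetwork(XTrain, YTrain, layers, options);  % XTrain: 64x64x1xN, YTrain: categorical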
CHAPTER 4
SYSTEM REQUIREMENT
4.1 HARDWARE REQUIREMENT
• Monitor: 15” LED
• RAM: 4 GB
4.2 SOFTWARE REQUIREMENT
• MATLAB (algorithm development)
MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or Fortran.
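As a small illustration of this array-oriented style (the signal length and frame size below are arbitrary), the per-frame energy of an audio vector can be computed without explicit dimensioning or element-by-element loops:

% Array-oriented MATLAB: per-frame signal energy in a few vectorized statements.
x = randn(48000, 1);                                 % one second of dummy audio at 48 kHz
frameLen = 960;                                      % 20 ms frames at 48 kHz
frames = reshape(x(1:frameLen*50), frameLen, []);    % 50 frames as matrix columns
frameEnergy = sum(frames.^2, 1);                     % energy of every frame at once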
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK and
EISPACK projects, which together represent the state-of-the-art in software for
matrix computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory and
advanced courses in mathematics, engineering, and science. In industry, MATLAB
is the tool of choice for high-productivity research, development, and analysis.
MATLAB features a family of application-specific solutions called toolboxes.
Very important to most users of MATLAB, toolboxes allow you to learn and apply
specialized technology. Toolboxes are comprehensive collections of MATLAB
functions (M-files) that extend the MATLAB environment to solve particular classes
of problems. Areas in which toolboxes are available include signal processing,
control systems, neural networks, fuzzy logic, wavelets, simulation, and many
others.
CHAPTER 5
EXPERIMENTATION AND RESULT
CHAPTER 6
OTHER SPECIFICATIONS
6.1 ADVANTAGES
6.2 LIMITATIONS
CHAPTER 7
CONCLUSION
A novel feature extraction scheme for automatic classification of musical instrument sounds using MFCC features has been proposed. The proposed features maximize the between-class variation, minimize the within-class variation, and increase the discriminating ability of the features at the same computational complexity as MFCC. The proposed features capture the dynamic, time-varying nature of the audio signal owing to their chirp-like kernel basis functions, and help to capture the acoustic characteristics of the music sound signal better than MFCC. A CNN has been used for prediction and classification.
CHAPTER 8
REFERENCES
4. Juan S. Gomez, Jakob Abeber and Estefania Cano, “Jazz Solo Instrument
Classification with Convolutional Neural Networks, Source Separation, and
Transfer Learning”, Proceedings of the 19th ISMIR Conference, Paris, France,
September 23-27, 2018
5. Taejin Park and Taejin Lee, “Musical Instrument Sound Classification with Deep Convolutional Neural Network using Feature Fusion Approach”, Electronics and Telecommunications Research Institute (ETRI), Republic of Korea, 2017