KEERTHAN K BHAT
1BI19ET019
BENGALURU –560004
2022 – 2023
BANGALORE INSTITUTE OF TECHNOLOGY
K. R. Road, V. V. Pura, Bengaluru-560004
Department of Electronics and Telecommunication
Engineering
CERTIFICATE
Certified that the Internship work entitled “VOICE CLASSIFICATION USING ML” was carried out
by KEERTHAN K BHAT (1BI19ET019), a bonafide student of Bangalore Institute of Technology,
in partial fulfillment for the award of Bachelor of Engineering Degree in Electronics and
Telecommunication Engineering of the Visvesvaraya Technological University, Belagavi during the
year 2022-2023. It is certified that all corrections/suggestions indicated for internal assessment have
been incorporated in the report deposited in the Departmental Library. The Internship Report has been
approved as it satisfies the academic requirements in respect of Internship work prescribed for the
said Degree.
Guide: Prof. Thyagaraj R, Assistant Professor
H.O.D: Dr. M. Rajeswari
External Viva:
Internal:
External:
Certificate of Internship
The person to whom this certificate is addressed has worked on a project titled
Voice Classification Using ML. As part of the project, they designed the
machine learning model, demonstrated and tested the working of the model, and
prepared a report highlighting its flaws by understanding the design briefs and
client specifications that were provided in the proposal.
During the course of the internship, they demonstrated good design skills with
a self-motivated attitude to learning new things. Their performance exceeded
expectations, and they were able to complete the project successfully on time.
Spoorthi C
Director
Varcons Technologies Pvt. Ltd
213, 2nd Floor, 18 M G Road, Ulsoor, Bangalore-560001
www.varconstech.com
contact@varconstech.com
ACKNOWLEDGEMENT
I would like to take this opportunity to thank all those who have been involved directly or indirectly
in the completion of my internship.
I would therefore take this opportunity to express my gratitude to our respected Principal, Dr. Aswath
M. U, for providing an excellent academic environment in the college.
I would like to express my gratitude to Dr. M. Rajeswari, Head of the Department, Department of
Electronics and Telecommunication Engineering, for her encouragement throughout the preparation of this
report.
I would like to thank Prof. Thyagaraj R, Assistant Professor, Department of Electronics and
Telecommunication Engineering and Prof. N. Shruthi, Assistant Professor, Department of
Electronics and Telecommunication Engineering, who have extended their support, guidance and
assistance for the successful completion of the internship.
I am grateful to all the teaching and non-teaching staff of the Department of Electronics and
Telecommunication Engineering, for their support and cooperation. I would also like to thank my
parents for their constant moral support and encouragement throughout the completion of the
Internship.
ABSTRACT
Audio classification means assigning sounds to predefined categories, as in environmental
sound classification and speech recognition. The task is analogous to image
classification (cat versus dog) or text classification (spam versus ham); the same idea is applied to
audio.
A speech percept can reveal information about the speaker including gender, age, language, and
emotion. By converting a raw waveform of the audio data into the form of spectrograms, we
can pass it through deep learning models to interpret and analyze the data. In audio
classification, we normally perform a binary classification in which we determine if the input
signal is our desired audio or not.
CONTENTS
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER-1 : ABOUT THE COMPANY
1.1 INTRODUCTION
1.2 VISION
1.3 MISSION
1.4 CORE VALUES
1.5 SERVICES OFFERED
LIST OF FIGURES
Fig. No Title
2.1 Python Logo
2.2 Jupyter Logo
3.5 Waveform
Voice classification using ML 2022-2023
COMPANY PROFILE
1.1 INTRODUCTION
• Enabling businesses to succeed by harnessing the power of technology.
• Offering professional graphic design, brochure design and logo design. The team are experts in
crafting visual content that conveys the right message to customers. Custom wraps
(package designs) are also offered for products.
• Managing SEO campaigns efficiently and effectively to help clients gain market
share by leveraging the company's expertise.
1.2 VISION
• Our main goal is to find smart ways of using technology that will help build a better
tomorrow for everyone, everywhere. SaaS offers a variety of advantages over traditional
software licensing models, and we at VCT aim to include the key features of SaaS
in everything we build.
1.3 MISSION
• To accelerate the development of products and reduce time to market through its proven
processes, methodologies, and tools.
Dept of ETE,BIT. 1
2. Branding and Design: Offering professional graphic design, brochure design and logo
design. The team are experts in crafting visual content that conveys the right message to customers.
Custom wraps (package designs) are also offered for products.
3. Search Engine Optimization: Managing SEO campaigns efficiently and effectively to
help clients gain market share by leveraging the company's expertise. With a holistic approach,
the team identifies anything that may be hurting traffic or rankings and demonstrates how to
outrank the competition.
CHAPTER 2
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses
on the use of data and algorithms to imitate the way that humans learn, gradually improving its
accuracy. Machine learning is an important component of the growing field of data science.
Through the use of statistical methods, algorithms are trained to make classifications or
predictions, uncovering key insights within data mining projects. These insights subsequently
drive decision making within applications and businesses, ideally impacting key growth metrics.
As big data continues to expand and grow, the market demand for data scientists will increase,
requiring them to assist in the identification of the most relevant business questions and
subsequently the data to answer them.
The way in which deep learning and machine learning differ is in how each algorithm learns.
Deep learning automates much of the feature extraction piece of the process, eliminating some of
the manual human intervention required and enabling the use of larger data sets. You can think of
deep learning as "scalable machine learning". Classical, or "non-deep", machine learning is more
dependent on human intervention to learn. Deep learning (also called deep machine learning) can
leverage labeled datasets, also known as supervised learning, to inform its algorithm, but it
doesn’t necessarily require a labeled dataset. It can ingest unstructured data in its raw form (e.g.
text, images), and it can automatically determine the set of features that distinguish different
categories of data from one another. Unlike classical machine learning, it does not require human
intervention to process data, allowing us to scale machine learning in more interesting ways.
2.2 PYTHON:
Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together. Python's simple, easy to learn
syntax emphasizes readability and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity and code reuse. Often,
programmers fall in love with Python because of the increased productivity it provides. Since
there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python
programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the
interpreter discovers an error, it raises an exception. When the program doesn't catch the
exception, the interpreter prints a stack trace. A source level debugger allows inspection of local
and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through
the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's
introspective power. On the other hand, often the quickest way to debug a program is to add a
few print statements to the source: the fast edit-test-debug cycle makes this simple approach very
effective.
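The exception behaviour described above can be illustrated with a minimal sketch. The function names and the default value are hypothetical and chosen only for this example; they are not part of the project code.

```python
import traceback


def parse_sample_rate(text):
    """Convert user-supplied text to a positive integer sample rate."""
    rate = int(text)  # raises ValueError for non-numeric input
    if rate <= 0:
        raise ValueError(f"sample rate must be positive, got {rate}")
    return rate


def safe_parse(text, default=22050):
    """Return the parsed rate, or a default value if the input is invalid."""
    try:
        return parse_sample_rate(text)
    except ValueError:
        # Print the kind of stack trace the interpreter shows for an
        # uncaught exception, then recover instead of terminating.
        traceback.print_exc()
        return default


print(safe_parse("16000"))
print(safe_parse("sixteen k"))
```

The second call receives bad input, yet the program keeps running: the error surfaces as an exception with a readable trace rather than as a crash.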
Why Python?
Building software is like building a house. In both cases, what you need the most is a strong
foundation. Use a weak foundation, and you will struggle with expansion, suffer costly repairs, or
possibly be forced to rebuild the whole thing from scratch down the line. But use a strong
foundation, and you will scale up smoothly, the upkeep will be a breeze, and your project will be
built to last. If the house is your software, then the foundation is your programming language.
Python is designed to be accessible. This makes writing Python code very easy and developing
software in Python very fast. What does that mean for your development team? Less time wasted
struggling with the language and more time spent building your product.
A huge advantage of Python is the wide selection of libraries and frameworks it offers. Your
time-to-market will improve if you leverage them, since you won’t be coding features manually.
Libraries are available for data visualization, machine learning, data science, natural language
processing and complex data analysis. Python is also easy to maintain: it is intuitive to read
because it resembles plain English, which makes the language easy to decipher. Additionally, Python
has a clear syntax and often requires fewer lines of code than Java or C to give comparable results.
The Jupyter Notebook is an open source web application that you can use to create and share
documents that contain live code, equations, visualizations, and text. Jupyter Notebook is
maintained by the people at Project Jupyter. Jupyter Notebooks are a spin-off project from the
IPython project, which used to have an IPython Notebook project itself. The name Jupyter
comes from the core programming languages that it supports: Julia, Python, and R.
Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but
there are currently over 100 other kernels that you can also use. The Jupyter Notebook
application allows you to create and edit documents that display the input and output of a Python
or R language script. Once saved, you can share these files with others.
CHAPTER 3
TASKS PERFORMED
Speech Emotion Recognition (SER) is a system that can identify the emotion of different audio
samples. From a machine learning perspective, speech emotion recognition is a classification
problem where an input sample (audio) needs to be classified into a few predefined emotions.
In this paper, we propose a model for sentiment analysis that utilizes features extracted from the
speech signal to detect the emotions of the speakers involved in the conversation. The process
involves four steps:
The objective of our project is to come up with an effective method for voice sentiment
classification, which can be used in a wide range of applications in different fi
• Personal computer
• NumPy
NumPy is one of the most commonly used packages for scientific computing in Python. It
provides a multidimensional array object, as well as derived objects such as masked arrays and
matrices, which can be used for various mathematical operations. It is used here mainly to process
the large amount of audio data.
• Librosa
Librosa is a Python package for music and audio analysis. It is typically used when
working with audio data, for example in music generation and automatic speech recognition. It
provides the building blocks necessary to create music information retrieval systems.
It is used here to analyse audio files and convert them to data.
• Keras
Keras is a high-level deep learning API developed by Google for implementing neural
networks. It is written in Python and is used to make the implementation of neural
networks easy. It also supports multiple backends for neural network computation.
• Matplotlib
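As an illustration of the array processing that NumPy makes convenient, here is a minimal sketch that splits a signal into frames and computes the energy of each frame without an explicit Python loop. The synthetic 440 Hz tone, the sample rate and the frame length are assumptions made for the example, not values taken from the project.

```python
import numpy as np

sample_rate = 8000                              # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate        # one second of time stamps
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz tone

# Split the signal into non-overlapping frames of 400 samples (50 ms each).
frame_length = 400
frames = signal.reshape(-1, frame_length)

# Root-mean-square energy of every frame, computed in one vectorised step.
rms = np.sqrt(np.mean(frames ** 2, axis=1))

print(frames.shape)
print(rms.round(3))
```

Working on whole arrays at once is what keeps this kind of processing fast enough for large collections of audio clips.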
3.3.1 Methodology
Raw Data
Raw data refers to the set of audio files used to train the model and to
test it. The dataset is large: the more data available, the more thorough the
training. The collection includes a varied set of audio samples. These
files are read and decoded with librosa.
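The project reads its audio with librosa, whose `librosa.load` function returns the waveform as floating-point samples together with the sample rate. The sketch below shows the same idea using only the Python standard library so that it runs without extra dependencies; it is a stand-in, not the project's loader. The helper names, the file name and the tone parameters are hypothetical.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, freq=440.0, seconds=0.1, sample_rate=16000):
    """Write a mono 16-bit PCM sine tone so the sketch has a file to read."""
    n = int(seconds * sample_rate)
    samples = [int(32767 * 0.5 * math.sin(2 * math.pi * freq * i / sample_rate))
               for i in range(n)]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)              # 2 bytes per sample = 16-bit audio
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{n}h", *samples))


def load_wav(path):
    """Return (samples scaled to [-1, 1], sample_rate) for a mono 16-bit WAV."""
    with wave.open(path, "rb") as wf:
        sample_rate = wf.getframerate()
        n = wf.getnframes()
        raw = wf.readframes(n)
    ints = struct.unpack(f"<{n}h", raw)
    return [s / 32768.0 for s in ints], sample_rate


tmp_dir = tempfile.mkdtemp()
wav_path = os.path.join(tmp_dir, "tone.wav")
write_tone(wav_path)
audio, sr = load_wav(wav_path)
print(len(audio), sr)
```

Whichever loader is used, the result is the same kind of object: a sequence of amplitude values plus the rate at which they were sampled.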
Feature Extraction
The next step is to extract the features we will need to train our model. To do this, we are
going to create a visual representation of each of the audio samples which will allow us to
identify features for classification, using the same techniques used to classify images with
high accuracy.
The main difference is that a spectrogram uses a linear spaced frequency scale (so each
frequency bin is spaced an equal number of Hertz apart), whereas an MFCC uses a quasi-
logarithmic spaced frequency scale, which is more similar to how the human auditory
system processes sounds.
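The difference between the two frequency scales can be made concrete with a small sketch. It uses one common formula for the mel scale, mel = 2595 · log10(1 + f / 700); other variants exist, and the choice of formula and of an 8000 Hz upper limit are assumptions made for this example.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels using one common formula."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Linear spacing: band edges every 1000 Hz from 0 to 8000 Hz.
linear_edges = [1000.0 * k for k in range(9)]

# Mel spacing: the same number of edges, equally spaced in mels.
top_mel = hz_to_mel(8000.0)
mel_edges = [mel_to_hz(top_mel * k / 8) for k in range(9)]

for lin, mel in zip(linear_edges, mel_edges):
    print(f"{lin:7.0f} Hz (linear)   {mel:7.0f} Hz (mel-spaced)")
```

The mel-spaced edges are crowded together at low frequencies and spread out at high frequencies, which is the quasi-logarithmic behaviour described above.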
The image below compares three different visual representations of a sound wave, the
first being the time domain representation, comparing amplitude over time. The next is a
spectrogram showing the energy in different frequency bands changing over time, then
finally an MFCC which we can see is very similar to a spectrogram but with more
distinguishable detail.
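A spectrogram of the kind described above can be sketched with NumPy alone: the signal is cut into overlapping frames, each frame is windowed, and a Fourier transform gives the energy per frequency bin. The frame length, hop length and 1 kHz test tone are assumptions for the example; an MFCC would additionally apply a mel filter bank, a logarithm and a discrete cosine transform to this output.

```python
import numpy as np


def spectrogram(signal, frame_length=256, hop_length=128):
    """Magnitude spectrogram: rows are frequency bins, columns are frames."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    frames = np.stack([
        signal[i * hop_length: i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # rfft keeps only non-negative frequencies: frame_length // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T


sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)           # 1 kHz test tone

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())      # strongest frequency bin
peak_hz = peak_bin * sample_rate / 256          # bin index converted to Hz
print(spec.shape, peak_hz)
```

Each bin here covers an equal width in Hz (sample_rate / frame_length), which is the linear spacing that distinguishes a plain spectrogram from an MFCC.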
The dataset held in memory is then used for training. Metadata that maps each audio file to
its file name is used to locate the data, and NumPy is used to carry out the training
iterations.
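A minimal sketch of how a metadata table can be turned into training arrays is shown below. The file names, the emotion labels and the `fake_features` function are all hypothetical: the function returns random numbers as a stand-in for real feature extraction, so only the shapes and the label encoding are meaningful.

```python
import numpy as np

# Hypothetical metadata: each row maps an audio file name to a class label.
metadata = [
    ("clip_001.wav", "happy"),
    ("clip_002.wav", "sad"),
    ("clip_003.wav", "angry"),
    ("clip_004.wav", "happy"),
]


def fake_features(filename, n_features=40):
    """Stand-in for feature extraction; returns a fixed-length vector."""
    seed = sum(ord(c) for c in filename)        # deterministic per file name
    return np.random.default_rng(seed).normal(size=n_features)


labels = sorted({label for _, label in metadata})
label_to_index = {label: i for i, label in enumerate(labels)}

# Feature matrix: one row per audio file.
X = np.stack([fake_features(name) for name, _ in metadata])

# One-hot label matrix: a single 1.0 in the column of each file's class.
y = np.zeros((len(metadata), len(labels)))
for row, (_, label) in enumerate(metadata):
    y[row, label_to_index[label]] = 1.0

print(X.shape, y.shape)
```

Arrays in this form, one feature row and one label row per file, are what a neural network library typically expects as training input.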
Implementation is the stage where the theoretical design is turned into a working system. It is the
most crucial stage in achieving a successful new system and in giving users confidence that the new
system will work efficiently and effectively.
The system can be implemented only after thorough testing is done and if it is found to work
according to the specification. It involves careful planning, investigation of the current
system and its constraints on implementation, design of methods to achieve the changeover,
and an evaluation of changeover methods apart from planning.
Two major tasks of preparing the implementation are education and training of the users and
testing of the system. The more complex the system being implemented, the more involved
will be the system analysis and design effort required just for implementation.
The implementation phase comprises several activities. The required hardware and
software acquisition is carried out. The system may require some software to be developed.
For this, programs are written and tested. The user then changes over to the new, fully tested
system and the old system is discontinued.
3.4.1 Testing
The testing phase is an important part of software development. Testing helps
automate the process of finding errors and missing operations, and provides a
complete verification to determine whether the objectives are met and the user requirements
are satisfied. Software testing is carried out in three steps:
1. The first step is unit testing, in which each module is tested to prove its correctness and
validity, to detect any missing operations, and to verify whether the objectives
have been met. Errors are noted down and corrected immediately.
2. Unit testing is an important and major part of the project. Errors are rectified easily
within a particular module, and program clarity is increased. In this project the entire system is
divided into several modules that are developed individually, so unit testing is conducted
on individual modules.
3. The second step is integration testing. Software whose
modules show perfect results when run individually will not necessarily show perfect
results when run as a whole.
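The distinction between unit testing and integration testing can be sketched with Python's built-in `unittest` module. The two functions below are hypothetical modules invented for this example, a simple feature extractor and a threshold classifier; they are not the project's actual code.

```python
import unittest


def extract_features(samples):
    """Hypothetical module 1: mean absolute amplitude and peak amplitude."""
    if not samples:
        raise ValueError("empty audio clip")
    mags = [abs(s) for s in samples]
    return sum(mags) / len(mags), max(mags)


def classify(features, threshold=0.1):
    """Hypothetical module 2: label a clip as 'voice' or 'silence'."""
    mean_amplitude, _ = features
    return "voice" if mean_amplitude >= threshold else "silence"


class UnitTests(unittest.TestCase):
    """Each module is checked on its own."""

    def test_extract_features(self):
        self.assertEqual(extract_features([0.5, -0.5]), (0.5, 0.5))

    def test_extract_rejects_empty(self):
        with self.assertRaises(ValueError):
            extract_features([])

    def test_classify(self):
        self.assertEqual(classify((0.05, 0.2)), "silence")


class IntegrationTest(unittest.TestCase):
    """The modules are chained to check the data they exchange."""

    def test_pipeline(self):
        self.assertEqual(classify(extract_features([0.4, -0.6])), "voice")


suite = unittest.TestSuite()
loader = unittest.defaultTestLoader
suite.addTests(loader.loadTestsFromTestCase(UnitTests))
suite.addTests(loader.loadTestsFromTestCase(IntegrationTest))
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The integration test would catch, for example, a change in the order of the values returned by `extract_features`, which each unit test on its own could miss.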
Figure 3.3
Figure 3.4
Figure 3.5
WEEKLY REPORT
WEEK-1: Learnt basics of programming library fundamentals.
CHAPTER 4
CONCLUSION
The package was designed in such a way that future modifications can be done easily. The
following conclusions can be deduced from the development of the project:
❖ It provides a friendly graphical user interface which proves to be better when compared
to the existing system.
❖ System security, data security and reliability are the sectors that could potentially use this.