
INTERNSHIP REPORT ON

VOICE CLASSIFICATION USING ML

Work carried out at

VARCONS TECHNOLOGIES PVT LTD

Submitted in partial fulfillment of the requirement for the award of


Bachelor of Engineering Degree
In
Electronics and Telecommunication Engineering
of
Visvesvaraya Technological University, Belagavi
By

KEERTHAN K BHAT
1BI19ET019

Under the guidance of

Internal guide: Prof. Thyagaraj R, Assistant Professor, Dept. of Electronics and
Telecommunication Engineering, B.I.T, Bangalore.

External guide: Ms. Spoorthi C, Engineer, Varcons Technologies Pvt Ltd, Ulsoor, Bangalore.

DEPARTMENT OF ELECTRONICS AND TELECOMMUNICATION ENGINEERING

BANGALORE INSTITUTE OF TECHNOLOGY,

BENGALURU –560004
2022 – 2023
BANGALORE INSTITUTE OF TECHNOLOGY
K. R. Road, V. V. Pura, Bengaluru-560004
Department of Electronics and Telecommunication
Engineering

CERTIFICATE

Certified that the Internship work entitled “VOICE CLASSIFICATION USING ML” was carried out
by KEERTHAN K BHAT (1BI19ET019), a bonafide student of Bangalore Institute of Technology,
in partial fulfillment for the award of Bachelor of Engineering Degree in Electronics and
Telecommunication Engineering of the Visvesvaraya Technological University, Belagavi during the
year 2022-2023. It is certified that all corrections/suggestions indicated for internal assessment have
been incorporated in the report deposited in the Departmental Library. The Internship Report has been
approved as it satisfies the academic requirements in respect of Internship work prescribed for the
said Degree.

Guide: Prof. Thyagaraj R, Assistant Professor
H.O.D: Dr. M. Rajeswari

External Viva:

Name of the Examiners Signature

Internal:

External:
Certificate of Internship

This is to certify that Keerthan K Bhat, whose USN is 1BI19ET019,
has completed their Machine Learning With Python (Research Based)
Internship organised and handled by Varcons Technologies Pvt. Ltd from
21st October, 2022 to 24th November, 2022.

The person to whom this certificate is addressed has worked on a project titled
Voice Classification Using ML. As part of the project, they designed the
Machine Learning model, demonstrated and tested the working of the model, and
prepared a report highlighting its flaws by understanding the design briefs and
client specifications that were provided in the proposal.

During the course of the internship, they demonstrated good design skills with
a self-motivated attitude to learning new things. Their performance exceeded
expectations, and they were able to complete the project successfully on time.


Spoorthi C
Director
Varcons Technologies Pvt. Ltd
213, 2nd Floor, 18 M G Road, Ulsoor, Bangalore-560001
www.varconstech.com
contact@varconstech.com

This certificate was generated using CERTIEFY.COM


ACKNOWLEDGEMENT

I would like to take this opportunity to thank all those who have been involved directly or indirectly
in the completion of my internship.

I would therefore take this opportunity to express my gratitude to our respected Principal, Dr. Aswath
M. U, for providing an excellent academic environment in the college.

I would like to express my gratitude to Dr. M. Rajeswari, Head of the Department, Department of
Electronics and Telecommunication Engineering, for her encouragement throughout the preparation
of this report.

I am grateful to Prof. N. Shruthi, Assistant Professor, Department of Electronics and
Telecommunication Engineering, for coordinating and extending her support and guidance for the
accomplishment of the internship in accordance with the guidelines.

I would like to thank Prof. Thyagaraj R, Assistant Professor, Department of Electronics and
Telecommunication Engineering and Prof. N. Shruthi, Assistant Professor, Department of
Electronics and Telecommunication Engineering, who have extended their support, guidance and
assistance for the successful completion of the internship.

I am grateful to all the teaching and non-teaching staff of the Department of Electronics and
Telecommunication Engineering, for their support and cooperation and I would like to thank my
parents for their constant moral support and encouragement throughout the completion of the
Internship.

ABSTRACT

Audio classification means assigning sounds to categories, as in environmental sound
classification and speech recognition. The task is analogous to image classification of cats
and dogs, or text classification of spam and ham; the same idea is applied to audio.
A speech percept can reveal information about the speaker, including gender, age, language, and
emotion. By converting the raw waveform of the audio data into spectrograms, we can pass it
through deep learning models to interpret and analyze the data. In audio classification, we
often perform a binary classification in which we determine whether the input signal is the
desired audio or not.

CONTENTS

CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES

CHAPTER-1 : ABOUT THE COMPANY
1.1 INTRODUCTION
1.2 VISION
1.3 MISSION
1.4 CORE VALUES
1.5 SERVICES OFFERED

CHAPTER-2 : VOICE CLASSIFICATION USING ML
2.1 MACHINE LEARNING
2.2 PYTHON
2.3 JUPYTER NOTEBOOK

CHAPTER-3 : TASKS PERFORMED
3.1 SYSTEM ANALYSIS
3.2 SYSTEM REQUIREMENTS
3.3 DESIGN ANALYSIS

CHAPTER-4 : OUTCOMES
CHAPTER-5 : WEEKLY REPORT
CHAPTER-6 : CONCLUSION
REFERENCES

LIST OF FIGURES

Fig. No    Title
2.1        Python Logo
2.2        Jupyter Logo
3.1        Flowchart Diagram
3.2        Block Diagram representing Training Data
3.3        Resulting Grid
3.4        Output screen
3.5        Waveform

Voice classification using ML 2022-2023

COMPANY PROFILE

1.1 INTRODUCTION
• Enabling success to businesses by harnessing the power of technology
• Offering professional graphic design, brochure design and logo design. The designers are
experts in crafting visual content that conveys the right message to customers. Custom
wraps are also offered for products (package designing).
• Managing SEO campaigns more efficiently and effectively, to help you gain market
share by leveraging expertise.

1.2 VISION
• Our main goal is to find smart ways of using technology that will help build a better
tomorrow for everyone, everywhere. SaaS offers a variety of advantages over traditional
software licensing models, and we here at VCT tend to include the key features of SaaS
in everything we build.

1.3 MISSION
• To accelerate the development of products and reduce time to market through its proven
processes, methodologies, and tools.

1.4 CORE VALUES


Smart solutions are at the core of all that is done at VCT. The aim is to find smart ways of
using technology that will help build a better tomorrow for everyone, everywhere.

Dept of ETE,BIT. 1

1.5 SERVICES OFFERED

1. Website as a service: Development of websites that behave and interact similarly to
sophisticated software.

2. Branding and Design: Offering professional graphic design, brochure design and logo
design. The designers are experts in crafting visual content that conveys the right message
to customers. Custom wraps are also offered for products (package designing).

3. Search Engine Optimization: Managing SEO campaigns more efficiently and effectively, to
help you gain market share by leveraging expertise. With a holistic approach, the team
identifies anything that may be hurting your traffic or rankings and demonstrates how to
outrank the competition.


CHAPTER 2

VOICE CLASSIFICATION USING ML


2.1 MACHINE LEARNING

➢ What is machine learning?

Machine learning is a branch of artificial intelligence (AI) and computer science which focuses
on the use of data and algorithms to imitate the way that humans learn, gradually improving its
accuracy. Machine learning is an important component of the growing field of data science.
Through the use of statistical methods, algorithms are trained to make classifications or
predictions, uncovering key insights within data mining projects. These insights subsequently
drive decision making within applications and businesses, ideally impacting key growth metrics.
As big data continues to expand and grow, the market demand for data scientists will increase,
requiring them to assist in the identification of the most relevant business questions and
subsequently the data to answer them.

➢ Machine Learning vs. Deep Learning

The way in which deep learning and machine learning differ is in how each algorithm learns.
Deep learning automates much of the feature extraction piece of the process, eliminating some of
the manual human intervention required and enabling the use of larger data sets. You can think of
deep learning as "scalable machine learning". Classical, or "non-deep", machine learning is more
dependent on human intervention to learn. Deep learning (also called deep machine learning) can
leverage labeled datasets, also known as supervised learning, to inform its algorithm, but it
doesn’t necessarily require a labeled dataset. It can ingest unstructured data in its raw form (e.g.
text, images), and it can automatically determine the set of features that distinguish different
categories of data from one another. Unlike classical machine learning, it doesn’t require human
intervention to process data, allowing us to scale machine learning in more interesting ways.


2.2 PYTHON:
Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together. Python's simple, easy to learn
syntax emphasizes readability and therefore reduces the cost of program maintenance. Python
supports modules and packages, which encourages program modularity and code reuse. Often,
programmers fall in love with Python because of the increased productivity it provides. Since
there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python
programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the
interpreter discovers an error, it raises an exception. When the program doesn't catch the
exception, the interpreter prints a stack trace. A source level debugger allows inspection of local
and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through
the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's
introspective power. On the other hand, often the quickest way to debug a program is to add a
few print statements to the source: the fast edit-test-debug cycle makes this simple approach very
effective.
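
The exception behaviour described above can be illustrated with a minimal sketch. The helper names `parse_sample_rate` and `safe_parse`, and the default value 22050, are hypothetical and chosen only for this illustration.

```python
import traceback


def parse_sample_rate(text):
    """Convert a text field to an integer sample rate (hypothetical helper)."""
    return int(text)  # bad input raises ValueError instead of crashing the interpreter


def safe_parse(text, default=22050):
    """Return the parsed sample rate, or `default` when the input is invalid."""
    try:
        return parse_sample_rate(text)
    except ValueError:
        # format_exc() returns the stack trace text for the exception being handled.
        trace = traceback.format_exc()
        print(trace.splitlines()[-1])  # show only the final line of the trace
        return default


print(safe_parse("16000"))
print(safe_parse("sixteen kHz"))
```

If the `try`/`except` were removed, the second call would stop the program and the interpreter would print the full stack trace.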

FIGURE 2.1 Python logo


Why Python?

Building software is like building a house. In both cases, what you need the most is a strong
foundation. Use a weak foundation, and you will struggle with expansion, suffer costly repairs, or
possibly be forced to rebuild the whole thing from scratch down the line. But use a strong
foundation, and you will scale up smoothly, the upkeep will be a breeze, and your project will be
built to last. If the house is your software, then the foundation is your programming language.

➢ Fast development speed

Python is designed to be accessible. This makes writing Python code very easy and developing
software in Python very fast. What does that mean for your development team? Less time wasted
struggling with the language and more time spent building your product.

➢ Numerous libraries and frameworks

A huge advantage of Python is the wide selection of libraries and frameworks it offers. Your
time-to-market will improve if you leverage them, since you won’t be coding features manually.

➢ There’s a Python library for everything

Data visualization, machine learning, data science, natural language processing, complex data
analysis.

➢ Easy maintenance

Python is intuitive to read because it resembles actual English. This makes the language
effortless to decipher and maintain. Additionally, Python has a clear syntax and doesn’t
require as many lines of code as Java or C to give you comparable results.

➢ What are the benefits of Python’s high readability?

Python’s simplicity is particularly helpful in reading code, yours or someone else’s. Because
Python code has fewer lines and mimics English, reviewing it takes a lot less time. This is a
major benefit. Reducing the time you need to spend on code review is invaluable, since the
productivity of your developers should be your top priority.


2.3 JUPYTER NOTEBOOK (Anaconda3)

The Jupyter Notebook is an open source web application that you can use to create and share
documents that contain live code, equations, visualizations, and text. Jupyter Notebook is
maintained by the people at Project Jupyter. Jupyter Notebooks are a spin-off project from the
IPython project, which used to have an IPython Notebook project itself. The name, Jupyter,
comes from the core programming languages that it supports: Julia, Python, and R.
Jupyter ships with the IPython kernel, which allows you to write your programs in Python, but
there are currently over 100 other kernels that you can also use. The Jupyter Notebook
application allows you to create and edit documents that display the input and output of a Python
or R language script. Once saved, you can share these files with others.

FIGURE 2.2 Jupyter logo


CHAPTER 3
TASKS PERFORMED

3.1 System analysis

3.1.1 Existing System

Speech Emotion Recognition (SER) is a system that can identify the emotion of different audio
samples. From a machine learning perspective, speech emotion recognition is a classification
problem where an input sample (audio) needs to be classified into a few predefined emotions.

3.1.2 Proposed System

In this report, we propose a model for sentiment analysis that utilizes features extracted from the
speech signal to detect the emotions of the speakers involved in the conversation. The process
involves four steps:

1) Pre-processing, which includes voice activity detection (VAD),

2) Speech Recognition System,

3) Speaker Recognition System,

4) Sentiment Analysis System.

3.1.3 Objective of the System

The objective of our project is to develop an effective method for voice sentiment
classification, which can be used in a wide range of applications in different fi


3.2 System Requirements

3.2.1 Hardware Requirement Specification

• Personal computer

3.2.2 Software Requirement Specification


• Jupyter notebook

• NumPy

NumPy is one of the most commonly used packages for scientific computing in Python. It
provides a multidimensional array object, as well as variations such as masked arrays and
matrices, which can be used for various math operations. It is used here mainly to process
the large amount of audio data.
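
As a minimal sketch of the array and masked-array objects mentioned above, the example below works on a tiny hand-made batch of audio clips. The sample values and the idea of masking silent clips are assumptions made for illustration, not part of the project code.

```python
import numpy as np

# A hypothetical batch of 3 audio clips, 4 samples each (values are made up).
clips = np.array([
    [0.1, -0.2, 0.3, 0.0],
    [0.5, 0.4, -0.5, 0.1],
    [0.0, 0.0, 0.0, 0.0],   # a silent clip
])

# Vectorised math: peak amplitude of every clip in one call.
peaks = np.abs(clips).max(axis=1)

# A masked array hides the silent clip so it does not distort statistics.
silent = peaks == 0
mask = np.repeat(silent[:, None], clips.shape[1], axis=1)
masked = np.ma.masked_array(clips, mask=mask)
mean_abs = np.abs(masked).mean()   # mean over the unmasked samples only

print(clips.shape)
print(peaks)
print(mean_abs)
```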

• Librosa

Librosa is a Python package for music and audio analysis. It is typically used when
working with audio data, for example in music generation and automatic speech recognition.
It provides the building blocks necessary to create music information retrieval systems.
It is used here to analyse audio files and convert them to numerical data.

• Keras

Keras is a high-level deep learning API developed by Google for implementing neural
networks. It is written in Python and is used to make the implementation of neural
networks easy. It also supports multiple backends for neural network computation.

• Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. Matplotlib makes easy things easy and hard things possible.


3.3 Design analysis

3.3.1 Methodology

FIGURE 3.1 Flowchart for audio classification

The methodology involves four principal stages:

• Importing data (audio) from the datasets,
• Loading sound data using the librosa library,
• Converting sound data into numerical vector spectrograms,
• Building a deep neural network and predicting the label of sound data.
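
The last stage, predicting a label from a feature vector, can be sketched with a small forward pass. This is only an illustration written in plain NumPy with random, untrained weights; the project itself lists Keras for building the network, and the layer sizes and class names below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

N_FEATURES, N_HIDDEN, N_CLASSES = 40, 16, 3   # assumed sizes
CLASS_NAMES = ["speech", "music", "noise"]    # hypothetical labels


def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0.0)


def softmax(x):
    """Turn each row of scores into probabilities that sum to 1."""
    shifted = x - x.max(axis=1, keepdims=True)  # subtract max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Randomly initialised (untrained) weights for a two-layer network.
w1 = rng.normal(scale=0.1, size=(N_FEATURES, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_CLASSES))
b2 = np.zeros(N_CLASSES)


def predict_proba(features):
    """Forward pass: dense + ReLU, then dense + softmax."""
    hidden = relu(features @ w1 + b1)
    return softmax(hidden @ w2 + b2)


batch = rng.normal(size=(5, N_FEATURES))       # five fake feature vectors
probs = predict_proba(batch)
predicted = [CLASS_NAMES[i] for i in probs.argmax(axis=1)]

print(probs.shape)
print(predicted)
```

Because the weights are untrained, the predicted labels are arbitrary; training would adjust `w1`, `b1`, `w2` and `b2` so that the highest probability falls on the correct class.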


Raw Data

Raw data refers to the set of audio files used to train the model and to test it. The
dataset is large: the larger and more varied the set, the better the training. These
audio files are loaded using librosa.
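
The loading step can be sketched as follows. The project uses librosa for this; the example below is a stand-in that relies only on NumPy and the standard-library `wave` module, and it first writes a synthetic tone so that there is a file to read. The file name, sample rate and helper names are assumptions.

```python
import os
import tempfile
import wave

import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate in Hz


def write_tone(path, freq=440.0, seconds=0.5):
    """Write a mono 16-bit sine tone so the example has a file to read."""
    t = np.arange(int(SAMPLE_RATE * seconds)) / SAMPLE_RATE
    pcm = (0.5 * np.sin(2 * np.pi * freq * t) * 32767).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm.tobytes())


def load_wav(path):
    """Read a mono 16-bit WAV file into floats in [-1, 1] plus its sample rate."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    return samples, rate


with tempfile.TemporaryDirectory() as folder:
    wav_path = os.path.join(folder, "tone.wav")
    write_tone(wav_path)
    signal, rate = load_wav(wav_path)

print(signal.shape, rate)
```

The result is a one-dimensional array of samples together with the sample rate, which is the form the later feature-extraction step works on.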

Feature Extraction

The next step is to extract the features we will need to train our model. To do this, we are
going to create a visual representation of each of the audio samples which will allow us to
identify features for classification, using the same techniques used to classify images with
high accuracy.

Spectrograms are a useful technique for visualising the spectrum of frequencies of a
sound and how they vary during a very short period of time. We will be using a similar
technique known as Mel-Frequency Cepstral Coefficients (MFCC).

The main difference is that a spectrogram uses a linear spaced frequency scale (so each
frequency bin is spaced an equal number of Hertz apart), whereas an MFCC uses a quasi-
logarithmic spaced frequency scale, which is more similar to how the human auditory
system processes sounds.
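
The difference between the two frequency scales can be made concrete with a short sketch. It uses one common Hz-to-mel formula, 2595 * log10(1 + f / 700); other variants exist, and the 8000 Hz upper limit and the number of band edges are assumptions chosen for illustration.

```python
import numpy as np


def hz_to_mel(freq_hz):
    """Convert frequency in Hz to mels (2595 * log10(1 + f / 700) form)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


# Ten band edges equally spaced in Hz versus equally spaced in mels.
linear_edges = np.linspace(0.0, 8000.0, 10)
mel_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(8000.0), 10))

print(np.round(linear_edges))
print(np.round(mel_edges))
```

The mel-spaced edges are packed closely at low frequencies and spread out at high frequencies, which is the quasi-logarithmic behaviour described above.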

The image below compares three different visual representations of a sound wave, the
first being the time domain representation, comparing amplitude over time. The next is a
spectrogram showing the energy in different frequency bands changing over time, then
finally an MFCC which we can see is very similar to a spectrogram but with more
distinguishable detail.
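
A spectrogram of the kind described above can be computed with a short-time Fourier transform. The sketch below uses only NumPy on a synthetic 1 kHz tone; the frame length, hop size and sample rate are assumptions, and the project itself relies on librosa for feature extraction.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T


rate = 8000                                  # assumed sample rate in Hz
t = np.arange(rate) / rate                   # one second of time stamps
tone = np.sin(2 * np.pi * 1000 * t)          # synthetic 1 kHz test tone

spec = spectrogram(tone)
bin_hz = rate / 256                          # width of one linear frequency bin
peak_hz = spec.mean(axis=1).argmax() * bin_hz

print(spec.shape)
print(peak_hz)
```

Each column of `spec` describes the energy per frequency band in one short time frame, so the strongest row for this test signal sits at the frequency of the tone.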

Train and Test Data

The dataset loaded into memory is now used for training, using the metadata in which each
audio file is mapped to its file name. NumPy is used to carry out the iterations for
training the model.
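
A minimal sketch of a train and test split driven by metadata is given below. The file names, class labels, feature size and 80/20 ratio are all assumptions made for illustration; random numbers stand in for the real extracted features.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical metadata: file name -> class label (names and labels are made up).
metadata = {f"clip_{i:03d}.wav": ("speech" if i % 2 == 0 else "noise")
            for i in range(10)}

file_names = np.array(list(metadata))
labels = np.array([metadata[name] for name in file_names])

# Stand-in features: 40 numbers per clip, in place of real MFCC vectors.
features = rng.normal(size=(len(file_names), 40))

# Shuffle indices once, then hold out the last 20% of clips for testing.
order = rng.permutation(len(file_names))
split = int(0.8 * len(order))
train_idx, test_idx = order[:split], order[split:]

x_train, y_train = features[train_idx], labels[train_idx]
x_test, y_test = features[test_idx], labels[test_idx]

print(x_train.shape, x_test.shape)
```

Keeping the test clips out of training is what allows the accuracy measured on them to reflect how the model behaves on unseen audio.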


FIGURE 3.2 Block diagram representing model training


3.4 System analysis

Implementation is the stage where the theoretical design is turned into a working system. It is
the most crucial stage in achieving a successful new system and in giving users confidence
that the new system will work efficiently and effectively.

The system can be implemented only after thorough testing is done and if it is found to work
according to the specification. It involves careful planning, investigation of the current
system and its constraints on implementation, design of methods to achieve the changeover,
and an evaluation of changeover methods, apart from planning.

Two major tasks of preparing the implementation are education and training of the users and
testing of the system. The more complex the system being implemented, the more involved
will be the system analysis and design effort required just for implementation.

The implementation phase comprises several activities. The required hardware and
software acquisition is carried out. The system may require some software to be developed.
For this, programs are written and tested. The user then changes over to his new fully tested
system and the old system is discontinued.

3.4.1 Testing
The testing phase is an important part of software development. An automated testing
process helps in finding errors and missing operations, and provides a complete
verification of whether the objectives are met and the user requirements are satisfied.
Software testing is carried out in three steps:

1. The first step is unit testing, in which each module is tested to establish its
correctness and validity, to determine any missing operations, and to verify whether the
objectives have been met. Errors are noted down and corrected immediately.

2. Unit testing is an important and major part of the project, because errors are rectified
easily within a particular module and program clarity is increased. In this project the
entire system is divided into several modules that are developed individually, so unit
testing is conducted on individual modules.

3. The second step is integration testing. Software whose modules show perfect results when
run individually will not necessarily show perfect results when run as a whole.
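
The difference between unit testing and integration testing can be sketched with the standard-library `unittest` module. The two small functions below, `normalise` and `frame_count`, are hypothetical modules invented for this illustration and are not taken from the project code.

```python
import unittest

import numpy as np


def normalise(signal):
    """Scale a signal so its largest absolute sample is 1 (hypothetical module)."""
    signal = np.asarray(signal, dtype=float)
    peak = np.abs(signal).max()
    return signal if peak == 0 else signal / peak


def frame_count(n_samples, frame_len, hop):
    """Number of full frames that fit in a signal (hypothetical module)."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop


class UnitTests(unittest.TestCase):
    """Each module is checked on its own."""

    def test_normalise_peak_is_one(self):
        self.assertAlmostEqual(np.abs(normalise([0.2, -0.5])).max(), 1.0)

    def test_normalise_keeps_silence(self):
        self.assertTrue(np.all(normalise([0.0, 0.0]) == 0.0))

    def test_frame_count(self):
        self.assertEqual(frame_count(8000, 256, 128), 61)


class IntegrationTest(unittest.TestCase):
    """The modules are checked working together."""

    def test_normalise_then_frame(self):
        signal = normalise(np.ones(1000) * 0.25)
        self.assertEqual(frame_count(len(signal), 250, 250), 4)


loader = unittest.defaultTestLoader
suite = loader.loadTestsFromTestCase(UnitTests)
suite.addTests(loader.loadTestsFromTestCase(IntegrationTest))
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```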


3.4.2 Snapshots of Result

FIGURE 3.3 Resulting grid

FIGURE 3.4 Output screen


FIGURE 3.5 Waveform

WEEKLY REPORT

WEEK-1: Learnt basics of programming library fundamentals.
WEEK-2: Studied various concepts related to audio processing and neural networks.
WEEK-3: Trying to build the prototype using various libraries and concepts.
WEEK-4: Training and testing the project model and debugging errors.


CHAPTER 4
CONCLUSION

The package was designed in such a way that future modifications can be done easily. The
following conclusions can be deduced from the development of the project:

❖ Automation of the entire system improves efficiency.

❖ It provides a friendly graphical user interface which proves to be better when compared
to the existing system.

❖ It gives appropriate access to the authorized users depending on their permissions.

❖ System security, data security and reliability are the sectors that could potentially use this system.

❖ The System has adequate scope for modification in future if it is necessary.


REFERENCES

I https://mikesmales.medium.com/sound-classification-using-deep-learning-8bc2aa1990b7

II https://www.topcoder.com/thrive/articles/voice-data-classification-using-deep-learning
