
Voice Based System Assistant using NLP and deep learning

A Project report submitted in partial fulfillment of the requirements for


the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE ENGINEERING
Submitted by
K. Subba Reddy (318126510028)
K. Sesha Shai Datta (3191265100L06)
A. Tarun (31812651004)
S.Ajay Varma (318126510048)

Under the guidance of


Mrs. K. Amaravathi
ASSISTANT PROFESSOR, DEPT.OF CSE, ANITS

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES
(UGC AUTONOMOUS)

(Permanently Affiliated to AU, Approved by AICTE and Accredited by NBA & NAAC with ‘A’ Grade)
Sangivalasa, Bheemili Mandal, Visakhapatnam Dist. (A.P.)
2021-2022

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
ANIL NEERUKONDA INSTITUTE OF TECHNOLOGY AND SCIENCES
(UGC AUTONOMOUS)
(Affiliated to AU, Approved by AICTE and Accredited by NBA & NAAC with ‘A’ Grade) Sangivalasa,
Visakhapatnam district (A.P)

BONAFIDE CERTIFICATE

This is to certify that the project report entitled “Voice Based System Assistant using NLP and
deep learning” submitted by K. Subba Reddy(318126510047), K. Sesha Shai
Datta(319126510L06), A. Tarun(318126510016), S. Ajay Varma(318126510046) in partial
fulfillment of the requirements for the award of the degree of Bachelor of Technology in
Computer Science Engineering of Anil Neerukonda Institute of Technology and Sciences (A),
Visakhapatnam is a record of bonafide work carried out under my guidance and supervision.

PROJECT GUIDE HEAD OF THE DEPARTMENT

Mrs. K. Amaravathi Dr. R. Sivaranjani

(Assistant Professor) (Professor)


Dept of CSE, ANITS Dept of CSE, ANITS

DECLARATION

We, K. Subba Reddy (318126510028), K. Sesha Shai Datta (3191265100L06), A. Tarun
(318126510016), S. Ajay Varma (318126510046), of final semester B.Tech. in the Department of
Computer Science and Engineering, ANITS, Visakhapatnam, hereby declare that the project work
entitled “Voice Based System Assistant using NLP and deep learning” is carried out by us and
submitted in partial fulfillment of the requirements for the award of Bachelor of Technology in Computer
Science Engineering under Anil Neerukonda Institute of Technology & Sciences (A) during the academic
year 2018-2022, and has not been submitted to any other university for the award of any kind of degree.

Submitted By Team 3A

K. Subba Reddy (318126510028)

K. Sesha Shai Datta (319126510L06)

A. Tarun (318126510004)

S. Ajay Varma (318126510048)

Mrs. K. Amaravathi
(Assistant Professor)
Dept. of CSE, ANITS

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of a task would be incomplete
without mentioning the people who made it possible, whose constant guidance and encouragement always
boosted our morale. We take great pleasure in presenting this project, which is the result of a studied blend
of both research and knowledge.

We first take the privilege of thanking our Head of Department, Dr. R. Sivaranjani, for permitting us to
lay the first stone of success and for providing the lab facilities. We would also like to thank the other staff
in our department and the lab assistants who directly or indirectly helped us in the successful completion of
the project.

We feel great pleasure in thanking our project guide, Mrs. K. Amaravathi, who shared her valuable
knowledge with us, made us understand the real essence of the topic, and created in us the interest to work
on the project.

PROJECT STUDENTS

K. Subba Reddy (318126510028)

K. Sesha Shai Datta (319126510L06)

A. Tarun (318126510004)

S. Ajay Varma (318126510048)

CONTENTS

TITLE                                                         Page No

Abstract                                                      7
1. INTRODUCTION
   1.1 Problem Statement                                      8
   1.2 Deep Learning                                          9-11
   1.3 Natural Language Processing                            11-12
   1.4 Sequential Model                                       13-15
   1.5 Natural Language Toolkit                               16-18
   1.6 Lemmatizer                                             18-21
   1.7 NumPy                                                  21-23
   1.8 TensorFlow                                             23-24
   1.9 Keras                                                  24-25
2. LITERATURE SURVEY                                          26-27
3. METHODOLOGY
   3.1 Proposed System                                        29-30
   3.2 Architecture                                           30
4. UML
   4.1 State Chart Diagram                                    31
   4.2 Communication Diagram                                  32
   4.3 Sequence Diagram                                       33
5. EXPERIMENTAL ANALYSIS AND RESULTS
   5.1 System Configuration
       5.1.1 Software Requirements                            33
       5.1.2 Hardware Requirements                            33
   5.2 Sample Code                                            31-53
   5.3 Screenshots/Results                                    54-57
6. PLATFORM USED: PYCHARM
   6.1 Intelligent Coding Assistance                          57
   6.2 Built-in Developer Tools                               58
   6.3 Customizable and Cross-platform IDE                    58-59
7. TESTING
8. CONCLUSION AND FUTURE SCOPE
   8.1 Conclusion                                             59
   8.2 Future Scope                                           59
9. REFERENCES                                                 60

Abstract

Desktop assistants have evolved over the years alongside humans. Nowadays people are
accustomed to them, and they are a part of day-to-day life.
We all want to make the use of these computers more comfortable, and one way to do so is
through an assistant. The typical way to provide a command to the computer is via the keyboard,
but there is a more convenient way: voice instruction. Such systems are used in many
human-computer interaction applications and play an important role in some people's lives, such
as the physically disabled. This includes the development of text-to-Braille systems, screen
magnifiers, and screen readers. Recently, attempts have been made to develop tools and
technologies to help blind people access internet technologies. Some people may find it hard to
use the system directly, and to overcome those kinds of issues we use virtual assistance, which
helps in their daily life. This paper builds a general-purpose assistant that enables conversations
between the user and the computer.

Giving input through voice is beneficial not only for sighted users but also for the visually
impaired, who are not able to give input using a keyboard. For this purpose, there is a need for a
voice assistant that can not only take commands through voice but also execute the desired
instructions and give output either in the form of voice or by other means.

This is done through a synchronous process involving recognition of speech patterns and then
responding via synthetic speech. Through these assistants a user can automate tasks ranging
from, but not limited to, mailing, task management, and media playback. As technology develops
day by day, people are becoming more dependent on it, and one of the most widely used
platforms is the computer.

1. Introduction
Machines had to learn how to hear, recognize, and analyze human speech long before the
voice assistants integrated into our smart speakers could understand our requests to play music,
switch off lights, and read the weather report. The technology we use today has been under
development for more than a century, and it has come a long way from the first listening and
recording equipment, from the phonograph to the modern virtual assistant. The assistant is
described as one of the most advanced and promising interactions between humans and
machines. An assistant is computer software that can carry on a conversation with a user in
natural language through web applications, mobile apps, or desktop applications. Different
companies such as Google and Apple use different APIs for this purpose. It is truly a feat that
today one can schedule meetings or send email merely through spoken commands. Whenever we
want to perform an action on the system, we have to communicate with it using our hands. If a
person has difficulty communicating with the system physically, proper communication is not
possible. To overcome these kinds of problems, we took the initiative to develop this application.
With the rapid development of deep learning techniques, it is now possible to solve these types
of complicated problems using neural networks, which overcome the constraints of traditional
machine learning methodology. We can extract high-level features from the provided data using
deep learning and neural networks. In this way, the limitations of machine learning are overcome
by using deep learning techniques.

1.1 Problem Statement

The aim is to improve customer service and increase the delivery of services through advances in
technology, to gain a competitive edge over the benefits of specific search engines, and to
overcome user-satisfaction issues associated with online services. This voice assistant will
provide personal and efficient communication between the user and their needs in order to fulfil
the user's desires. The voice assistant allows users to feel confident and comfortable while using
the service, regardless of their computer literacy, due to the natural language used in messages. It
also provides a very accessible and efficient service, as all interactions take place within one chat
conversation, negating the need for the user to navigate through a site. Generally, a voice
assistant is software that can communicate or interact with a human, but the main motivation of
our project is that anyone should be able to use the voice assistant comfortably and in the most
efficient way. Not only a sighted person but also a blind person can access the personal computer
just by giving commands by speech, and even an illiterate person can access the computer by
entering a command or speaking it. Computer programs that can hold real conversations are
known as voice assistants. Voice assistants can be used with almost all popular apps, and these
bots can be given distinct personalities as well. A voice assistant can understand written and
spoken text and interpret its meaning; it can then look up relevant information and deliver it to
the user. Most websites rely on voice assistants to provide customers with a quick response. The
motivating reasons for developing the project are that such a system is accessible anytime,
cost-effective, flexible, has a high handling capacity, and improves customer satisfaction.

1.2 Deep Learning

Deep learning is a subset of machine learning that is essentially a neural network with three
or more layers. These neural networks attempt to simulate the behavior of the human brain
(albeit far from matching its ability), allowing them to "learn" from large amounts of data. While
a neural network with a single layer can still make approximate predictions, additional hidden
layers help to optimize and refine the model for accuracy.
Deep learning drives many Artificial Intelligence (AI) applications and services that improve
automation, performing analytical and physical tasks without human intervention. Deep learning
technology lies behind everyday products and services (such as digital assistants, voice-enabled
TV remotes, and credit card fraud detection) as well as emerging technologies (such as
self-driving cars). Deep learning models are built using neural networks. A neural network takes
in inputs, which are then processed in hidden layers using weights that are adjusted during
training; the model then outputs a prediction. The weights are adjusted to find patterns in order to
make better predictions. The user does not need to specify what patterns to look for; the neural
network learns them on its own.
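To make the input, hidden layers, and prediction flow concrete, the following is a minimal illustrative sketch (not the project's actual model; all layer sizes and values are arbitrary) of a forward pass through one hidden layer using NumPy:

```python
import numpy as np

def relu(z):
    # Rectified linear unit: a common hidden-layer activation
    return np.maximum(0, z)

def sigmoid(z):
    # Squashes the output into (0, 1), usable as a probability
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))       # 4 input features
W1 = rng.normal(size=(3, 4))    # weights of a hidden layer with 3 units
b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3))    # weights of the output layer
b2 = np.zeros(1)

h = relu(W1 @ x + b1)           # hidden-layer activations
y = sigmoid(W2 @ h + b2)        # the model outputs a prediction

print(y.shape)
```

During training, the weights W1, b1, W2, b2 would be adjusted so that predictions match the data.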

Keras is a user-friendly neural network library written in Python. It can be used to build deep
learning models for both regression (for example, predicting an employee's wage per hour) and
classification (for example, predicting whether or not a patient has diabetes).

If deep learning is a subset of machine learning, how do they differ? Deep learning distinguishes
itself from classical machine learning by the type of data that it works with and the methods by
which it learns.

Machine learning algorithms leverage structured, labeled data to make predictions, meaning that
specific features are defined from the input data and organized into tables. This doesn't
necessarily mean that machine learning cannot use unstructured data; it just means that if it does,
the data generally goes through some pre-processing to organize it into a structured format.

Deep learning eliminates some of the data pre-processing that is typically involved with machine
learning. These algorithms can ingest and process unstructured data, like text and images, and
they automate feature extraction, removing some of the dependency on human experts. For
example, say we have a set of photos of different pets and we want to categorize them by "cat",
"dog", "hamster", et cetera. Deep learning algorithms can determine which features (e.g., ears)
are most important to distinguish each animal from another; in machine learning, this hierarchy
of features is established manually by a human expert.

Then, through the processes of gradient descent and backpropagation, the deep learning
algorithm adjusts and fits itself for accuracy, allowing it to make predictions about a new photo
of an animal with increased precision.
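As a toy illustration of gradient descent (my own example, not from the project), consider fitting a single weight w so that w*x matches a target y by repeatedly stepping against the gradient of the squared error:

```python
# Minimize the squared error (w*x - y)^2 for a single scalar weight w.
x, y = 2.0, 6.0   # one training example; the ideal weight is 3
w = 0.0           # initial guess
lr = 0.1          # learning rate

for _ in range(100):
    pred = w * x
    grad = 2 * (pred - y) * x   # derivative of (w*x - y)^2 with respect to w
    w -= lr * grad              # gradient descent update

print(round(w, 3))  # converges towards 3.0
```

Backpropagation generalizes this idea, computing the gradient of the loss with respect to every weight in every layer.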

Machine learning and deep learning models are capable of different types of learning as well,
which are usually categorized as supervised learning, unsupervised learning, and reinforcement
learning. Supervised learning utilizes labeled datasets to categorize or make predictions; this
requires some kind of human intervention to label input data correctly. In contrast, unsupervised

learning doesn’t require labeled datasets, and instead, it detects patterns in the data, clustering
them by any distinguishing characteristics. Reinforcement learning is a process in which a model
learns to become more accurate for performing an action in an environment based on feedback in
order to maximize the reward.

1.3 Natural language processing

Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers the
ability to understand text and spoken words in much the same way human beings can.
NLP combines computational linguistics—rule-based modeling of human language—with
statistical, machine learning, and deep learning models. Together, these technologies enable
computers to process human language in the form of text or voice data and to ‘understand’ its
full meaning, complete with the speaker or writer’s intent and sentiment.
NLP drives computer programs that translate text from one language to another, respond to
spoken commands, and summarize large volumes of text rapidly—even in real time. There’s a
good chance you’ve interacted with NLP in the form of voice-operated GPS systems, digital
assistants, speech-to-text dictation software, customer service chatbots, and other consumer
conveniences. But NLP also plays a growing role in enterprise solutions that help streamline
business operations, increase employee productivity, and simplify mission-critical business
processes.
Natural language processing, or NLP, is a branch of computer science that involves the analysis
of human language in speech and text. A specific subset of AI and machine learning (ML), NLP
is already widely used in many applications today. NLP is how voice assistants, such as Siri and
Alexa, can understand and respond to human speech and perform tasks based on voice
commands.

NLP is the driving technology that allows machines to understand and interact with human
speech, but it is not limited to voice interactions. Natural language processing is also the
technology behind apps such as customer service chatbots. In addition, NLP enables email and
SMS apps to automatically suggest replies or text to complete a message as it is typed. These
applications, just like voice assistants, rely on NLP because machines cannot intuitively
understand human (or "natural") language.


Everything we express (either verbally or in writing) carries huge amounts of information. The
topic we choose, our tone, our selection of words: everything adds some type of information that
can be interpreted and from which value can be extracted. In theory, we can understand and even
predict human behaviour using that information.

But there is a problem: one person may generate hundreds or thousands of words in a
declaration, each sentence with its corresponding complexity. If you want to scale and analyze
several hundreds, thousands or millions of people or declarations in a given geography, then the
situation is unmanageable.

Data generated from conversations, declarations, or even tweets are examples of unstructured
data. Unstructured data doesn't fit neatly into the traditional row-and-column structure of
relational databases and represents the vast majority of data available in the real world. It is
messy and hard to manipulate. Nevertheless, thanks to advances in disciplines like machine
learning, a big revolution is going on regarding this topic. Nowadays it is no longer about trying
to interpret a text or speech based on its keywords (the old-fashioned mechanical way), but about
understanding the meaning behind those words (the cognitive way). This way it is possible to
detect figures of speech like irony, or even perform sentiment analysis.

NLP is a discipline that focuses on the interaction between data science and human language,
and it is scaling to many industries. Today NLP is booming thanks to huge improvements in
access to data and increases in computational power, which allow practitioners to achieve
meaningful results in areas like healthcare, media, finance, and human resources, among others.

1.4 Sequential Model:
When we are using NLP to deal with textual data, one key point we must understand is that the
data is always in the form of sequences and the order of the data matters. For any given sentence,
if the order of words is changed, the meaning of the sentence doesn’t stay the same, hence we
can say that the sentence information is stored in both the words as well as the order of the words
in that particular sentence. In any type of data, if the sequential order matters, we call it
sequential data.

Traditional neural networks typically cannot handle sequential data. This is because when we
build a neural network for a particular task, we need to set a fixed input size at the beginning, but
in sequential data the size can vary. A sentence can contain 5 words or 20 words, so we cannot
configure a fixed-size neural network to deal effectively with this kind of data. Even in the ideal
scenario where every sentence has the same number of words, a network of fixed input size is
not designed to pay attention to the sequence of the words. The model will effectively learn from
the semantic information of the individual words in the sentence, but it will fail to learn from the
order of the words in the sentence.

To convert textual data into a numerical format so that we can input it into neural networks, we
must convert it into vectors. These can be either one-hot encoded vectors or word vectors. Our
textual data thus turns into a sequence of vectors, which is exactly the format we need.
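As a small illustration (the three-word vocabulary is my own toy example), a sentence can be turned into a sequence of one-hot vectors with NumPy:

```python
import numpy as np

sentence = "how are you".split()
vocab = {"how": 0, "are": 1, "you": 2}   # toy vocabulary: word -> index

# One row per word; each row has a single 1 at the word's vocabulary index.
one_hot = np.zeros((len(sentence), len(vocab)))
for i, word in enumerate(sentence):
    one_hot[i, vocab[word]] = 1.0

print(one_hot)
```

The order of the rows preserves the order of the words, which is exactly the sequential information discussed above.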
A Sequential model is appropriate for a plain stack of layers where each layer has exactly one
input tensor and one output tensor.

Schematically, a Sequential model built from a stack of layers is equivalent to a function that
calls those layers one after another on the input.
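This equivalence can be written out using the standard example from the Keras documentation (the layer sizes are the documentation's, chosen arbitrarily):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# A Sequential model: a plain stack of three Dense layers.
model = keras.Sequential([
    layers.Dense(2, activation="relu", name="layer1"),
    layers.Dense(3, activation="relu", name="layer2"),
    layers.Dense(4, name="layer3"),
])
x = tf.ones((3, 3))
y = model(x)

# ...which is equivalent to calling the layers one after another:
layer1 = layers.Dense(2, activation="relu")
layer2 = layers.Dense(3, activation="relu")
layer3 = layers.Dense(4)
y2 = layer3(layer2(layer1(x)))

print(y.shape, y2.shape)  # both are (3, 4)
```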

A Sequential model is not appropriate when:

Your model has multiple inputs or multiple outputs


Any of your layers has multiple inputs or multiple outputs
You need to do layer sharing
You want non-linear topology (e.g. a residual connection, a multi-branch model)

Creating a Sequential model


You can create a Sequential model by passing a list of layers to the Sequential constructor:
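As a sketch (the layer sizes here are arbitrary), a Sequential model can be created either from a list of layers or incrementally with add():

```python
from tensorflow import keras
from tensorflow.keras import layers

# Pass a list of layers to the Sequential constructor...
model = keras.Sequential([
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# ...or build the same stack incrementally with add().
model2 = keras.Sequential()
model2.add(layers.Dense(64, activation="relu"))
model2.add(layers.Dense(10, activation="softmax"))

print(len(model.layers), len(model2.layers))  # 2 2
```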

Specifying the input shape in advance:
Generally, all layers in Keras need to know the shape of their inputs in order to be able to create
their weights. So when you first create a layer without any input information, it has no weights:
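This can be seen directly in Keras (a minimal sketch; the layer size is arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

layer = layers.Dense(3)
print(layer.weights)       # [] -- no weights until the input shape is known

# Calling the layer on an input builds it, creating its weights.
x = tf.ones((1, 4))
_ = layer(x)
print(len(layer.weights))  # 2: a kernel of shape (4, 3) and a bias of shape (3,)
```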

1.5 Natural Language Toolkit:


NLTK is a leading platform for building Python programs to work with human language data. It
provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along
with a suite of text processing libraries for classification, tokenization, stemming, tagging,
parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active
discussion forum.

Thanks to a hands-on guide introducing programming fundamentals alongside topics in


computational linguistics, plus comprehensive API documentation, NLTK is suitable for
linguists, engineers, students, educators, researchers, and industry users alike. NLTK is available
for Windows, Mac OS X, and Linux. Best of all, NLTK is a free, open source, community-
driven project.

NLTK has been called “a wonderful tool for teaching, and working in, computational linguistics
using Python,” and “an amazing library to play with natural language.”

Natural Language Processing with Python provides a practical introduction to programming for
language processing. Written by the creators of NLTK, it guides the reader through the
fundamentals of writing Python programs, working with corpora, categorizing text, analyzing
linguistic structure, and more. The online version of the book has been updated for Python 3 and
NLTK 3.
NLTK is a toolkit built for working with NLP in Python. It provides various text-processing
libraries together with many test datasets. A variety of tasks can be performed using NLTK, such
as tokenizing, parse-tree visualization, and more. This section describes how to set up NLTK and
use it for the various NLP tasks performed during the text-processing step.

Running the NLTK downloader lets you select and install the required corpora. Once this step is
complete, let's dive into the different operations using NLTK.

Tokenization:
Tokenization is the breaking down of text into smaller units called tokens; each token is a small
part of the text. If we have a sentence, the idea is to separate each word and build a vocabulary
such that we can represent all words uniquely in a list. Numbers, words, punctuation marks, and
so on all count as tokens.
Tokenization can be performed at different granularities. Sentence tokenization splits a passage
into individual sentences, while word tokenization splits a sentence into words and punctuation
symbols. The resulting token list is what the later preprocessing steps, such as lower-case
conversion and lemmatization, operate on, and it is also the basis for building the vocabulary
used to vectorize the text.

Lower case conversion:
We do not want our model to get confused by seeing the same word with different cases, such as
one starting with a capital letter and one without, and to interpret them differently. So we convert
all words into lower case to avoid redundancy in the token list.

1.6 Lemmatizer
Lemmatization in NLTK is the algorithmic process of finding the lemma of a word
depending on its meaning and context. Lemmatization usually refers to the morphological
analysis of words, which aims to remove inflectional endings. It helps in returning the base or
dictionary form of a word known as the lemma.

The NLTK lemmatization method is based on WordNet's built-in morphy function. Text
preprocessing includes both stemming and lemmatization. Many people find the two terms
confusing, and some treat them as the same, but there is a difference between stemming and
lemmatization. Lemmatization is preferred over stemming for the reasons below.

A stemming algorithm works by cutting the suffix from the word; in a broader sense, it cuts
either the beginning or the end of the word.

On the contrary, lemmatization is a more powerful operation that takes into consideration the
morphological analysis of the words. It returns the lemma, which is the base form of all of a
word's inflectional forms. In-depth linguistic knowledge is required to create dictionaries and
look up the proper form of the word. Stemming is a general operation, while lemmatization is an
intelligent operation in which the proper form is looked up in a dictionary. Hence, lemmatization
helps in forming better machine learning features.

A lemmatizer minimizes text ambiguity. For example, the words bicycle and bicycles are
converted to the base word bicycle; essentially, all words that have the same meaning but
different representations are converted to their base form. This reduces the word density in the
given text and helps in preparing accurate features for training. The cleaner the data, the more
intelligent and accurate your machine learning model will be. NLTK lemmatization also saves
memory as well as computational cost.

In many languages, words appear in several inflected forms. For example, in English, the verb 'to
walk' may appear as 'walk', 'walked', 'walks' or 'walking'. The base form, 'walk', that one might
look up in a dictionary, is called the lemma for the word. The association of the base form with a
part of speech is often called a lexeme of the word.

Lemmatization is closely related to stemming. The difference is that a stemmer operates on a
single word without knowledge of the context, and therefore cannot discriminate between words
which have different meanings depending on part of speech. However, stemmers are typically
easier to implement and run faster. The reduced accuracy may not matter for some applications;
in fact, when used within information retrieval systems, stemming improves query recall (the
true positive rate) when compared to lemmatization. Nonetheless, stemming reduces precision
(the proportion of positively-labeled instances that are actually positive) for such systems.
For instance:

The word "better" has "good" as its lemma. This link is missed by stemming, as it requires a
dictionary look-up.

The word "walk" is the base form of the word "walking", and hence this is matched by both
stemming and lemmatization.

The word "meeting" can be either the base form of a noun or a form of a verb ("to meet")
depending on the context; e.g., "in our last meeting" or "We are meeting again tomorrow".
Unlike stemming, lemmatization attempts to select the correct lemma depending on the context.

NLTK lemmatization is the process of grouping the inflected forms of a word so that they can be
analyzed as a single item in linguistics. NLTK provides different lemmatization algorithms and
functions for different lemma determinations. Lemmatization is more useful than stemming for
seeing a word's context within a document: unlike stemming, lemmatization uses part-of-speech
tags and the meaning of the word in the sentence to capture the main context of the document.
Thus, NLTK lemmatization is important for understanding a text and using it for Natural
Language Processing and Natural Language Understanding practices. Lemmatization performs a
morphological analysis of the words. To perform text analysis, both stemming and lemmatization
can be used within NLTK. The main use cases of NLTK lemmatization are below.
1. Information Retrieval Systems
2. Indexing Documents within Word Lists
3. Text Understanding
4. Text Clustering
5. Word Tokenization and Visualization

A typical NLTK lemmatization example combines word tokenization with a lemmatization
function that returns each word's original form within the sentence together with its lemma in a
dictionary. Such an example can be explained through the following steps.

1. Import the WordNetLemmatizer from nltk.stem.
2. Import word_tokenize from nltk.tokenize.
3. Create a variable for the WordNetLemmatizer() instance.
4. Define a custom function for NLTK lemmatization with an argument that takes the text to
lemmatize.
5. Use a list and a for loop for tokenization and lemmatization.
6. Append the tokenized and lemmatized words into a dictionary to compare the lemma and
original forms to each other.
7. Call the function with an example text for lemmatization.

1.7 Numpy:

NumPy is a widely used array-processing library in Python. It is a general-purpose
array-processing package that provides a high-performance multidimensional array object and
tools for working with these arrays. It is the fundamental package for scientific computing with
Python and is open-source software. It contains various features, including these important ones:

• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data types can be defined using NumPy, which allows it to
seamlessly and speedily integrate with a wide variety of databases.

NumPy, which stands for Numerical Python, is a library consisting of multidimensional array
objects and a collection of routines for processing those arrays. Using NumPy, mathematical and
logical operations on arrays can be performed. The following discussion explains the basics of
NumPy, such as the various array functions and types of indexing, with the help of examples for
better understanding.

This material is intended for those who want to learn the basics and various functions of NumPy.
It is specifically useful for algorithm developers. After working through it, the reader will reach
a moderate level of expertise from which to progress further.
You should have a basic understanding of computer programming terminology. A basic
understanding of Python or any other programming language is a plus.

For example, you can create an array from a regular Python list or tuple using the array function.
The type of the resulting array is deduced from the type of the elements in the sequences.

Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize the
necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones, np.full,
np.empty, etc.
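For instance, the creation routines mentioned above can be exercised as follows (the shapes and fill values here are arbitrary choices for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])        # array from a Python list; dtype deduced as int
b = np.array((1.5, 2.5))       # array from a tuple; dtype deduced as float
z = np.zeros((2, 3))           # 2x3 array of zeros (placeholder content)
o = np.ones((3, 2))            # 3x2 array of ones
f = np.full((2, 2), 7)         # 2x2 array filled with the constant 7
e = np.empty((2, 2))           # 2x2 uninitialized array (contents arbitrary)

print(a.dtype, b.dtype)        # integer and float dtypes, deduced from the elements
print(z.shape, o.shape, f[0, 0])
```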

To create sequences of numbers, NumPy provides a function analogous to range that returns
arrays instead of lists.
arange: returns evenly spaced values within a given interval; the step size is specified.
linspace: returns evenly spaced values within a given interval; the number of elements (num) is specified.
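A brief sketch of the two functions (the interval bounds are arbitrary):

```python
import numpy as np

seq = np.arange(0, 10, 2)    # start 0, stop 10 (exclusive), step 2
lin = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1 inclusive

print(seq)   # [0 2 4 6 8]
print(lin)   # 0.0, 0.25, 0.5, 0.75, 1.0
```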
Reshaping array: We can use the reshape method to reshape an array. Consider an array with shape
(a1, a2, a3, …, aN). We can reshape it into another array with shape (b1, b2, b3, …, bM). The only
required condition is a1 × a2 × … × aN = b1 × b2 × … × bM (i.e., the original size of the array
remains unchanged).
Flatten array: We can use the flatten method to get a copy of the array collapsed into one dimension.
It accepts an order argument. The default value is 'C' (for row-major order); use 'F' for column-major
order.
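The reshape condition and the flatten order argument can be illustrated as follows:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)   # valid because 12 = 3 x 4
flat = m.flatten()                # default 'C' (row-major) order
col = m.flatten(order='F')        # column-major order

print(flat[:5])   # row-major walks each row in turn: 0 1 2 3 4
print(col[:3])    # column-major walks each column: 0 4 8
```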
Python is a great general-purpose programming language on its own, but with the help of a few
popular libraries (NumPy, SciPy, Matplotlib) it becomes a powerful environment for scientific
computing. This project relies on Python together with these libraries for its numerical work.

1.8 TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive,
flexible ecosystem of tools, libraries, and community resources that lets researchers push the
state-of-the-art in ML, and gives developers the ability to easily build and deploy ML-powered
applications.

TensorFlow provides a collection of workflows with intuitive, high-level APIs for both
beginners and experts to create machine learning models in numerous languages. Developers
have the option to deploy models on a number of platforms such as on servers, in the cloud, on
mobile and edge devices, in browsers, and on many other JavaScript platforms. This enables
developers to go from model building and training to deployment much more easily.

TensorFlow is an open source software library for numerical computation using data-flow
graphs. It was originally developed by the Google Brain Team within Google's Machine
Intelligence research organization for machine learning and deep neural networks research, but
the system is general enough to be applicable in a wide variety of other domains as well. It
reached version 1.0 in February 2017, and has continued rapid development, with 21,000+
commits thus far, many from outside contributors. This section introduces TensorFlow, its open
source community and ecosystem, and highlights some interesting open-sourced TensorFlow
models.

TensorFlow is cross-platform. It runs on nearly everything: GPUs and CPUs (including mobile
and embedded platforms) and even tensor processing units (TPUs), which are specialized
hardware for tensor math. TPUs aren't widely available yet, but Google has recently launched
an alpha program.
Keras is a compact, easy-to-learn, high-level Python library that runs on top of the TensorFlow
framework. It is designed with a focus on understanding deep learning techniques, such as creating
layers for neural networks while maintaining the concepts of shapes and mathematical details.
Models can be created in one of the following two ways −

i. Sequential API
ii. Functional API
Consider the following eight steps to create a deep learning model in Keras −

i. Load the data
ii. Preprocess the loaded data
iii. Define the model
iv. Compile the model
v. Fit the model
vi. Evaluate the model
vii. Make the required predictions
viii. Save the model
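The eight steps can be sketched as follows. This is a minimal illustration using synthetic data in place of a real dataset; the layer sizes, epoch count, and file name toy_model.keras are arbitrary choices (the .keras save format assumes a reasonably recent TensorFlow):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

# i.   Load the data (synthetic toy data stands in for a real dataset)
x = np.random.rand(100, 8)
y = (x.sum(axis=1) > 4.0).astype(int)

# ii.  Preprocess (here nothing is needed beyond correct shapes and dtypes)

# iii. Define the model
model = keras.Sequential([
    keras.Input(shape=(8,)),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),
])

# iv.  Compile the model
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# v.   Fit the model
model.fit(x, y, epochs=2, verbose=0)

# vi.  Evaluate it
loss, acc = model.evaluate(x, y, verbose=0)

# vii. Make the required predictions
preds = model.predict(x[:5], verbose=0)

# viii. Save the model
model.save("toy_model.keras")
```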

1.9 Keras
Keras is an open-source, high-level neural network library written in Python that is capable of
running on Theano, TensorFlow, or CNTK. It was developed by Francois Chollet, a Google
engineer. It is user-friendly, extensible, and modular, facilitating faster experimentation with
deep neural networks. It supports not only Convolutional Networks and Recurrent Networks
individually but also their combination.

Keras does not handle low-level computations itself; instead, it relies on a backend library. The
backend library acts as a high-level API wrapper over the low-level API, which lets Keras run on
TensorFlow, CNTK, or Theano.

At launch, Keras had over 4,800 contributors; its user base has since grown to more than 250,000
developers, roughly doubling every year. Big companies like Microsoft, Google, NVIDIA, and
Amazon have actively contributed to its development. It has amazing industry traction and is used
by popular firms such as Netflix, Uber, Google, and Expedia.
Keras user experience
1. Keras is an API designed for humans
Keras follows best practices to reduce cognitive load: it ensures that models are
consistent and that the corresponding APIs are simple.
2. Not designed for machines
Keras provides clear feedback upon the occurrence of any error, minimizing the
number of user actions required for the majority of common use cases.
3. Easy to learn and use.
4. Highly flexible
Keras provides high flexibility to all of its developers by integrating with low-level deep
learning languages such as TensorFlow or Theano, which ensures that anything written in the
base language can be implemented in Keras.

Keras can be used from R as well as Python, and the code can be run with TensorFlow, Theano,
CNTK, or MXNet as required. Keras can run on CPUs, NVIDIA GPUs, AMD GPUs, TPUs, etc.
Producing models with Keras is simple, as it supports TensorFlow Serving, GPU acceleration
(WebKeras, Keras.js), Android (TF, TF Lite), iOS (Native CoreML), and Raspberry Pi.

Keras is a high-level neural networks API developed with a focus on enabling fast
experimentation. Being able to go from idea to result with the least possible delay is key to doing
good research. Keras has the following key features:

• Allows the same code to run on CPU or on GPU, seamlessly.

• User-friendly API which makes it easy to quickly prototype deep learning models.

• Built-in support for convolutional networks (for computer vision), recurrent networks
(for sequence processing), and any combination of both.

• Supports arbitrary network architectures: multi-input or multi-output models, layer
sharing, model sharing, etc. This means that Keras is appropriate for building essentially
any deep learning model, from a memory network to a neural Turing machine.
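As a sketch of such an arbitrary architecture, a minimal multi-input model built with the Functional API might look like this (the input sizes, layer widths, and input names are assumptions for illustration):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two separate inputs, e.g. text features and numeric features.
text_in = keras.Input(shape=(16,), name="text_features")
num_in = keras.Input(shape=(4,), name="numeric_features")

# Process each branch separately, then merge them.
h1 = layers.Dense(8, activation="relu")(text_in)
h2 = layers.Dense(8, activation="relu")(num_in)
merged = layers.concatenate([h1, h2])
out = layers.Dense(1, activation="sigmoid")(merged)

model = keras.Model(inputs=[text_in, num_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# A forward pass with dummy data shows the multi-input call signature.
preds = model.predict([np.random.rand(3, 16), np.random.rand(3, 4)], verbose=0)
print(preds.shape)   # one sigmoid output per sample
```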

Keras empowers engineers and researchers to take full advantage of the scalability and
cross-platform capabilities of TensorFlow 2: you can run Keras on TPU or on large clusters of
GPUs, and you can export your Keras models to run in the browser or on a mobile device.
2. LITERATURE SURVEY
Desktop Voice Assistant for Visually Impaired
The usage of virtual assistants has been expanding rapidly since 2017, with more and more products
coming onto the market. Due to advancements in technology, many different features are being added to
mobile phones and desktops. To use them in a more convenient and fun way, we require a means of
input that is faster and reliable at the same time. In this project, voice commands are used to input data
into the system; for that, a microphone is used, which converts acoustic energy into electrical energy. After
taking the input, the audio signal must be understood, and for this the Google API is used.
Different companies like Google and Apple use different APIs for this purpose. It is truly a feat that today
one can schedule meetings or send email merely through spoken commands.

1. Speech recognition: The proposed system uses the Google API to convert input speech into
text. The speech is given as input to Google Cloud for processing; as output, the system
then receives the resulting text.

2. Backend work: At the backend, Python gets the output from speech recognition and then
identifies whether the command is a system command or a browser command. The result is sent
back to the Python backend to give the desired output to the user.

3. Text to speech: Text to speech, or TTS, is a technique for transforming text into audible
speech. It should not be confused with systems that instead generate speech by joining strings
gathered from an exhaustive database of pre-recorded text, which were developed for different
goals and form full-fledged sentences, clauses, or meaningful phrases through a dialect's
graphemes and phonemes. Such systems have their limits, as they can only produce speech
matching the pre-determined text in their databases. TTS systems, on the other hand, can
practically "read" arbitrary strings of characters and produce the resulting sentences, clauses,
and phrases.

ii) Proposed Architecture: The system design consists of
1. Taking the input as speech patterns through the microphone.
2. Audio data recognition and conversion into text.
3. Comparing the input with predefined commands.
4. Giving the desired output.
The initial phase includes the data being taken in as speech patterns from the microphone. In the
second phase, the collected data is worked over and transformed into textual data using NLP. In
the next step, this resulting stringified data is manipulated through a Python script to finalize the
required output process. In the last phase, the produced output is presented either in the form of
text or converted from text to speech using TTS.
An Intelligent Behaviour Shown by Chatbot System:
A chatterbot is a computer program which conducts a conversation via auditory or textual
methods. Such programs are often created to convincingly simulate how a human would behave
as a conversational partner, thereby passing the Turing test. Chatbots are mainly used in dialog
systems for various practical purposes including customer services or information acquisition.
There are two main types of chatbots: one whose functions are based on a set of rules, and a more
advanced version that uses artificial intelligence. The former tends to be limited, and its smartness
depends upon the complexity of the program: the more complex the program, the smarter the bot.
The latter understands language, not just commands, and continuously gets smarter as it learns from
conversations with people. A chatbot can also perform some basic functions like calculations,
setting up reminders, alarms, etc. A popular example is ALICE Bot (Artificial Linguistic
Internet Computer Entity), that uses AIML(Artificial Intelligence Mark-Up Language) pattern
matching techniques. The Turing test is one of the most popular measures of intelligence of such
systems. It was proposed by British mathematician Alan Turing in his 1950 paper titled
"Computing Machinery and Intelligence", published in Mind. According to this test, if a panel of
human beings conversing with an unknown entity believes that entity to be human while it is in
fact a computer, then the computer is said to have passed the Turing test. Natural language
processing (NLP) gives computers the capability to communicate user-to-computer (human-to-
machine) and computer-to-computer (machine-to-machine) using human natural language. (The
surveyed paper is authored by Vibhor Sharma, Monika Goyal, and Drishti Malik, Department of
Information Technology, Maharaja Agrasen Institute of Technology, GGSIPU, New Delhi, India.)
Intelligent Android Voice Assistant - A Future Requisite:
The smart phone market is one of the most competitive markets in the world today with various
competitors such as Samsung, Google, Sony, HTC etc. As the users increase day by day,
facilities are also increasing. In recent years, smart phones have placed a rising emphasis on
bringing speech technologies into mainstream usage. The purpose of voice assistant systems is
the exchange of information in a more interactive approach using speech to commune. It is
estimated that smart phones captured 44% of all mobile phone sales in the December 2012
quarter with Android smart phones taking 31% of all mobile phone shipments and iOS in second
place at 9%. Android smartphones grew 88% year over year, with iOS at 23%. So, it is
preferable that the application be made on the Android platform, as a greater number of people
can then be facilitated by personal assistants.[1] According to a survey, 54% of users agreed that
AI personal assistants make their lives easier. 31% said that AI assistants are part of their
everyday lives. 65% agreed that they have many different uses. Looking forward, the results of
the survey also revealed that 65% of users said that they regularly ask general questions to their AI
personal assistants. 40% use them to get directions while they drive. 25% use them to make calls.
23% dictate texts or emails through those assistants. 17% use them to receive updates. And 9%
use them in other ways, like for weather alerts and appointment reminders.[2] This paper
presents an Intelligent Android Voice Assistant system using speech recognition technologies of
mobile devices for every common user who is interested in AI personal assistant.

Voice assistant privacy concerns:

A user may have privacy concerns, as personal assistants require a huge amount of data and are
always listening for commands. This passive data is then retained and sifted through by humans
employed by almost all of the major companies (Amazon, Apple, etc.). In addition to the
discovery that the AI is able to record our audio interactions, there have been concerns over the
type of data such employees and contractors were hearing. So, in a cloud-based voice assistant, a
privacy policy must be in place to protect user information.
As voice assistants and voice-activated Internet of Things devices work their way into people's
homes, there are plenty of new opportunities for developers and businesses. Creating an app that
obeys voice commands and interacts with a third-party voice assistant presents the opportunity to
reach a rapidly-growing market of tech-savvy consumers.
But you must proceed with caution. There are significant privacy issues associated with this type
of technology. There's a real danger of scaring people away if you aren't totally transparent.
Voice assistants and privacy is a topic that's often in the news. While most of the attention is on
device manufacturers, anyone who offers tools and services that use the technology also needs to
think carefully about the privacy implications.
Whether it's a mobile app that uses voice recognition, or a dedicated service for a voice assistant
gadget, you need to comply with both the rules of the device manufacturer and a host of national
and international laws.
Everyone is familiar with voice assistants: in many households, Alexa is practically a member of
the family, and whenever kids nowadays want to hear their favorite tunes, all they need to do is
ask, “Alexa, play … “. Every iPhone user knows Siri, every Android phone user might know the
Google Assistant. And, of course, the list goes on: there is Samsung’s Bixby, Microsoft’s
Cortana, Huawei’s HiVoice, etc. However, to be fair, 97% of all voice assistant users choose one
of the big three: Alexa, Siri, or Google Assistant.

Voice assistants are integrated into smart devices, like Alexa in Amazon Echo and Amazon Echo
Display, Apple’s Siri in HomePod or Google Assistant in Google Home. They are also usually
compatible with other digital equipment, such as smart TVs, mobile phones etc. Moreover,
Alexa’s functionality, for instance, can be extended not only by Amazon itself, (functions called
“Alexa skills”), but also through skills provided by third party users (e.g., radio station x
provides the skill “Alexa, play [radio station x]” in the Amazon’s skill store).

And, last but certainly not least, there is the automotive industry. Nearly every car company
either provides their own voice assistant, an interface for Google Assistant and/or Siri, or
integrates Alexa directly. Every driver of a modern Audi, Mercedes or BMW knows their “Hey
[Audi | Mercedes | BMW]” activating commands.

3. METHODOLOGY

3.1 Proposed System

We developed a program that serves the needs of the user, but the user cannot always run the
code directly whenever it is needed. So we developed a GUI (graphical user interface) using Qt
widgets. This is a good tool for developers to design an efficient interface, where the data and the
features of the system are represented graphically, so the user can interact with the interface
directly and conveniently.

A JSON file is used as the dataset in our project. It includes various kinds of data, categorized by
a tag (or label) that tells which kind of data is stored under that tag's name. The model we
developed has to be trained: the more it is trained, the more accurate the results become, and the
loss function (the difference between the expected output and the actual output) decreases. We
used the Sequential model, which is part of Keras. It is a linear stack of layers, where each layer
has a particular number of neurons that process the data to produce the desired results. Keras is an
open-source platform which acts as an interface between Python and ANNs (artificial neural
networks). Using it, we trained and tested the data stored in the JSON dataset files.

After training and testing the model, the assistant is ready to answer user queries by performing
some internal operations. Speech-to-text conversion is done using Python's speech recognition
modules, and the converted text is then processed with a lemmatizer, which is part of NLP
(natural language processing), to identify and categorize words into different groups under labels
like greetings, Google search, applications, etc. After identifying the text, it is sent to the trained
model, which determines the intent (tag) of the command or instruction. The model then
communicates with the user through voice responses, asking for instructions to perform actions
in the system, such as opening an application or searching for information. It asks for
clarification about the instruction, and once it has clarity, it performs the task related to the intent
(application, YouTube search, Google search, etc.): if it is an application, it opens or closes the
application; if it is a YouTube search, it goes to YouTube and searches for the required text; if it
is a Google search, it opens the browser and searches for the required text in Chrome. Every time
the model is used, it gets trained further over time and produces more accurate results.
3.2 Architecture:

Fig 3.2: Architecture
4. UML
A UML diagram is a diagram based on the UML (Unified Modeling Language) with the purpose
of visually representing a system along with its main actors, roles, actions, artifacts or classes, in
order to better understand, alter, maintain, or document information about the system.
4.1 State Chart:
Statechart diagram is one of the five UML diagrams used to model the dynamic nature of a
system. They define different states of an object during its lifetime and these states are changed
by events. Statechart diagrams are useful for modeling reactive systems, which can be defined as
systems that respond to external or internal events.
Statechart diagram describes the flow of control from one state to another state. States are
defined as a condition in which an object exists and it changes when some event is triggered. The
most important purpose of Statechart diagram is to model lifetime of an object from creation to
termination.
4.2 Communication UML Diagram:
Since the core components are the messages that are exchanged between objects, we can build
communication diagrams the same way we would make a sequence diagram. The only difference
between the two is that objects in communication diagrams are shown with association
connections.
Visually, the two differ in that sequence diagrams are well-structured vertically and the message
flow follows a top-down chronological approach. Communication UML diagrams on the other
hand use number schemes and pointing arrows in order to depict the message flow.
4.3 Sequence UML Diagram
This section discusses sequence diagrams. The Unified Modelling Language (UML) is a modeling
language in the field of software engineering which aims to set standard ways to visualize the
design of a system. UML guides the creation of multiple types of diagrams, such as interaction,
structure, and behaviour diagrams. A sequence diagram is the most commonly
used interaction diagram. An interaction diagram is used to show the interactive behavior of a
system. Since visualizing the interactions in a system can be a cumbersome task, we use different
types of interaction diagrams to capture various features and aspects of interaction in a system. A
sequence diagram depicts interaction between objects in sequential order, i.e., the order in which
these interactions take place. We can also use the terms event diagram or event scenario to refer
to a sequence diagram.

4. EXPERIMENTAL ANALYSIS AND RESULTS
4.1 System Configuration
4.1.1 Software Requirements
Operating System : Windows XP and above.
Programming Language : Python.
Technology : Deep Learning.

4.1.2 Hardware Requirements


Processor : core i5 and above.
RAM : 4GB and above.
Graphic card : NVIDIA GTX 1060.
Hard Disk : 1 TB and above.

4.2 Sample Code

4.2.1 Main file

from googletrans import Translator

import config
import model
import utils
from intents import mail, note
from intents.application import Applications
from intents.google_search import GoogleSearch
from intents.youtube_search import YoutubeSearch
from model.model_training import TrainingModel
from PyQt5 import QtWidgets, QtGui, QtCore
from PyQt5.QtGui import QMovie
import sys
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.uic import loadUiType
import speech_recognition as sr
import os
import time
import datetime
import pyautogui

flags = QtCore.Qt.WindowFlags(QtCore.Qt.FramelessWindowHint)

def wish():
hour = int(datetime.datetime.now().hour)
if hour >= 0 and hour < 12:
utils.speak("Good morning")
elif hour >= 12 and hour < 18:
utils.speak("Good Afternoon")
else:
utils.speak("Good night")

class mainT(QThread):
def __init__(self):
super(mainT, self).__init__()

def run(self):
self.Assistant()

def STT(self):
R = sr.Recognizer()
with sr.Microphone() as source:
print("Listening...........")
audio = R.listen(source)
try:
print("Recognizing......")
text = R.recognize_google(audio, language='en-in')
print(">> ", text)
except Exception:
utils.speak("Sorry Speak Again")
return "None"
text = text.lower()
return text

def Assistant(self):
wish()
words = model.words
classes = model.classes
data_x = model.data_x
data_y = model.data_y
training_model = TrainingModel(words, classes, data_x, data_y)
trained_model = training_model.train()

while True:
command = self.STT()
if command and command != "None":  # proceed only when speech was actually recognized
intent = training_model.get_intent(trained_model, command)
response = TrainingModel.get_response(intent, config.DATA)
print(intent, ' : ', response)

if intent == 'greeting':
utils.speak(response=response)
elif intent == 'youtube_search':
utils.speak(response=response)
YoutubeSearch.launch(command)
elif intent == 'google_search':
utils.speak(response=response)
GoogleSearch.launch(command)
elif intent == 'applications':
Applications(response).launch(command)
elif intent == "note":
utils.speak("What would you like me to write down?")
note_text = self.STT()
note.take_note(note_text)
utils.speak("I've made a note of that")
elif intent == "close_note":
utils.speak("Okay sir, closing notepad")
os.system("taskkill /f /im notepad.exe")
elif intent == "screen_shot":

utils.speak("Alright sir, taking the screenshot")


img = pyautogui.screenshot()
img.save(r"C:\Users\seshu\Desktop\screenshot.png")

utils.speak(response=response)

FROM_MAIN, _ = loadUiType(os.path.join(os.path.dirname(__file__), "./scifi.ui"))

class Main(QMainWindow, FROM_MAIN):


def __init__(self, parent=None):
super(Main, self).__init__(parent)
self.setupUi(self)
self.setFixedSize(1920, 1080)
self.label_7 = QLabel
self.exitB.setStyleSheet("background-image:url(./lib/exit - Copy.png);\n"
"border:none;")
self.exitB.clicked.connect(self.close)
self.setWindowFlags(flags)
Dspeak = mainT()
self.label_7 = QMovie("./lib/gifloader.gif", QByteArray(), self)
self.label_7.setCacheMode(QMovie.CacheAll)
self.label_4.setMovie(self.label_7)
self.label_7.start()

self.ts = time.strftime("%A, %d %B")

Dspeak.start()
self.label.setPixmap(QPixmap("./lib/tuse.png"))
self.label_5.setText("<font size=8 color='white'>" + self.ts + "</font>")
self.label_5.setFont(QFont(QFont('Acens', 8)))

app = QtWidgets.QApplication(sys.argv)
main = Main()
main.show()
exit(app.exec_())

4.2.2 Model Training


import random
import string

import nltk
import numpy as np
import tensorflow as tf
from nltk.stem import WordNetLemmatizer
from keras import Sequential
from keras.layers import Dense, Dropout

class TrainingModel:
def __init__(self, words, classes, data_x, data_y):
self.words = words
self.classes = classes
self.data_x = data_x
self.data_y = data_y
self.lemmatizer = WordNetLemmatizer()

def train(self):

words = [self.lemmatizer.lemmatize(word.lower()) for word in self.words if word not in string.punctuation]

training = []
out_empty = [0] * len(self.classes)

for idx, doc in enumerate(self.data_x):


bow = []
text = self.lemmatizer.lemmatize(doc.lower())
for word in words:
bow.append(1) if word in text else bow.append(0)
output_row = list(out_empty)
output_row[self.classes.index(self.data_y[idx])] = 1
training.append([bow, output_row])
random.shuffle(training)
training = np.array(training, dtype=object)
train_x = np.array(list(training[:, 0]))
train_y = np.array(list(training[:, 1]))

input_shape = (len(train_x[0]),)
output_shape = len(train_y[0])

model = Sequential()
model.add(Dense(128, input_shape=input_shape, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(64, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(output_shape, activation="softmax"))
adam = tf.keras.optimizers.Adam(learning_rate=0.01, decay=1e-6)
model.compile(loss='categorical_crossentropy',
optimizer=adam,
metrics=["accuracy"])

model.fit(x=train_x, y=train_y, epochs=200, verbose=1)

return model

def get_intent(self, model, command):


tokens = nltk.word_tokenize(command)
tokens = [self.lemmatizer.lemmatize(word.lower()) for word in tokens]
bow = [0] * len(self.words)
for token in tokens:
for idx, word in enumerate(self.words):
if word == token:
bow[idx] = 1
bow = np.array(bow)
result = model.predict(np.array([bow]))[0]
thresh = 0.2

y_pred = [[idx, res] for idx, res in enumerate(result) if res > thresh]
y_pred.sort(key=lambda x: x[1], reverse=True)

intent = self.classes[y_pred[0][0]]
return intent

@staticmethod
def get_response(tag, data):
list_of_intents = data['intents']
for intent in list_of_intents:
if intent['tag'] == tag:
if len(intent['response']) > 0:
return random.choice(intent['response'])
else:
return None

4.2.3 Application.py
import os
import re
import subprocess

import config
import utils
import intents.windows as iw

class Applications:
INTENT_NAME = 'applications'
APP_INSTALLATION_DIRECTORIES = ['/System/Applications',
'/Applications', '/System/Applications/Utilities']

def __init__(self, response, logger=None):


self.logger = logger
self.response = response
self.os_name = config.OS_NAME

def get_name(self, command):


app = utils.get_search_value(command,
intent_name=Applications.INTENT_NAME)

is_space = bool(re.search(r"\s", app))


if is_space:
return app.replace(' ', "\\ ")
else:
return app

def launch(self, command):


app = self.get_name(command)
if self.os_name == 'Darwin':
path = utils.get_path_from_file(app)
if path is None:
patterns = [f'*{app}.app', f'{app}*.app', f'*{app}.app', f'*{app}*.app']
for directory in Applications.APP_INSTALLATION_DIRECTORIES:
if path:
break
for pattern in patterns:
path = os.popen(f"find {directory} -iname '{pattern}'").read() \
.split('\n')[0].replace(" ", "\\ ")
if path:
break

cmd = f'open {path}'


self.execute_command(cmd)
utils.add_to_json({app: {'path': path}})
else:
path = utils.get_path_from_file(app)
if path is None:
path = utils.get_path(app, iw.EXECUTABLE_EXT,
iw.APP_INSTALLATION_DIRECTORIES)
utils.add_to_json({app: {'path': path}})
if path:
cmd = f'explorer "{path}"'
print('Application : ', cmd)
self.execute_command(cmd)

def execute_command(self, cmd):


utils.speak(response=self.response)
output = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
stderr=subprocess.PIPE).communicate()

if str(output[1], 'utf-8') != '':
utils.speak('I am sorry sir, the app which you are looking for is not installed in my database.')

4.2.4 Google Search


import webbrowser
import utils

class GoogleSearch:
@staticmethod
def launch(query):
utils.speak("Ok sir, this is what I found for your search!")
# str.replace returns a new string, so the result must be reassigned
query = query.replace("assistant", "")
query = query.replace("google search", "")
query = query.replace("search", "")
query = query.replace("google", "")
web = "https://www.google.com/search?q=" + query
webbrowser.open(web)
utils.speak("Done sir")

4.2.5 Mail.py
import smtplib

def mail(sender_email, sender_password, receiver_email, msg):


try:
mail = smtplib.SMTP('smtp.gmail.com', 587)
mail.ehlo()
mail.starttls()
mail.login(sender_email, sender_password)
mail.sendmail(sender_email, receiver_email, msg)
mail.close()
return True
except Exception as e:
print(e)
return False
4.2.6 Note.py
import subprocess
import datetime

def take_note(text):
date = datetime.datetime.now()
file_name = str(date).replace(":", "-") + "-note.txt"
with open(file_name, "w") as f:
f.write(text)
notepad = "C:\\Windows\\System32\\notepad.exe"
subprocess.Popen([notepad, file_name])
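The timestamped filename works because `:` (illegal in Windows file names) is replaced with `-`. The same idea isolated as a testable helper (the name `note_filename` is ours):

```python
import datetime

def note_filename(now=None):
    # ':' is not allowed in Windows file names, so swap it for '-'
    now = now or datetime.datetime.now()
    return str(now).replace(":", "-") + "-note.txt"
```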

4.2.7 YouTube

import webbrowser
import utils
class YoutubeSearch:
@staticmethod
def launch(query):
utils.speak("Ok sir, this is what I found for your search!")
query = query.replace("assistant", "")
query = query.replace("youtube search", "")
query = query.replace("search", "")
query = query.replace("youtube", "")
web = "https://www.youtube.com/results?search_query=" + query
webbrowser.open(web)
utils.speak("Done sir")

4.2.8 Config.json

{
"intents": [
{
"tag": "greeting",
"utterances": [

"wake up assistant",
"hello assistant",
"hi",
"hello",
"assistant",
"hai"
],
"response": [
"Hello Sir, How may I help you?",
"Hello Sir, always at your service."
]
},
{
"tag": "google_search",
"utterances": [
"google search",
"search",
"who is ",
"who are ",
"what is ",
"when",
"why",
"google"
],
"response": [
"Got it Sir.",
"I got the things you are looking for.",
"Give me a sec sir.",
"Just a moment sir"
]
},
{
"tag": "youtube_search",
"utterances": [
"play video on youtube",
"search video on youtube",
"play",
"youtube"
],
"response": [

"Got it Sir.",
"Give me a second sir."
]
},
{
"tag": "applications",
"utterances": [
"open",
"launch",
"app",
"application"
],
"response": [
"Ok sir.",
"sure sir."
]
},
{
"tag": "send_email",
"utterances": [
"email",
"send",
"send mail",
"send email",
"mail"
],
"response": [
"sending mail sir.",
"sure sir"
]
},
{
"tag": "note",
"utterances": [
"note",
"make a note",
"write this down",
"take a note"
],
"response": [

"sure sir",
"writing in the notepad sir"
]
},

{
"tag": "close_note",
"utterances": [
"close note",
"close notepad",
"close it",
"close"
],
"response": [
"sure sir",
"closing the notepad sir"
]
}
]
}
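A sketch of how this intent file is typically consumed: once a tag is predicted, one of the matching intent's canned responses is chosen at random. The two-intent dictionary below is a trimmed stand-in for the full file:

```python
import random

DATA = {
    "intents": [
        {"tag": "greeting", "utterances": ["hi", "hello"],
         "response": ["Hello Sir, How may I help you?"]},
        {"tag": "note", "utterances": ["take a note"],
         "response": ["sure sir"]},
    ]
}

def get_response(tag):
    # Look the predicted tag up and pick one of its canned responses
    for intent in DATA["intents"]:
        if intent["tag"] == tag:
            return random.choice(intent["response"])
```

An unknown tag falls through and returns `None`, which the caller can treat as "intent not recognized".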

4.2.9 UTILS
import fnmatch
import json
import os
import random
import re
import webbrowser

import pyttsx3
import speech_recognition as sr

import config
from model.voice_ana import VoiceAnalyzer

def choose_random(response):

return random.choice(response)

def speak(response):
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)
engine.setProperty('rate', 180)
engine.say(response)
engine.runAndWait()

def open_url(url):
webbrowser.open(url)

def find_file(pattern, path):


paths = []
for root, dirs, files in os.walk(path):
for name in files:
if fnmatch.fnmatch(name, pattern):
paths.append(os.path.join(root, name))
if paths:
return paths

def get_search_value(command, intent_name, match_flag='word'):


intents = config.DATA['intents']

utterances = [intent['utterances'] for intent in intents if intent['tag'] == intent_name][0]

if match_flag == 'word':
words = ['\\b' + word + '\\b' for utterance in utterances for word in utterance.split(' ')]
words = '|'.join(words)
elif match_flag == 'sentence':
words = '|'.join(utterances)

return re.sub(words, '', command, flags=re.IGNORECASE).strip()

def get_path_from_file(app):
with open(config.APP_DETAILS_FILE) as file:
app_details = json.load(file)

app = app_details.get(app)
if app:
return app.get('path')

def get_path(app, ext, directories):


patterns = [f'{app}{ext}', f'{app}*.{ext}', f'*{app}.{ext}', f'*{app}*.{ext}']
for directory in directories:
for pattern in patterns:
result = find_file(pattern, directory)
if result:
if len(result) > 1:
return get_multiple_paths(result, ext)
else:
return result[0]

def get_multiple_paths(paths, ext):


speak('I got multiple applications. Which one would you like to open?')
for path in paths:
exe_name = os.path.basename(path).replace(ext, '')
speak(exe_name)
# NOTE: falls back to the last spoken match; reading the user's choice is left as an extension
if path:
return path

def add_to_json(app_details):
with open(config.APP_DETAILS_FILE, 'r+') as file:
data = json.load(file)
data.update(app_details)
file.seek(0)
json.dump(data, file)

def read_voice_cmd():
recognizer = sr.Recognizer()
voice_input = ''
try:
with sr.Microphone() as source:
print('Listening...')
audio = recognizer.listen(source=source, timeout=5, phrase_time_limit=5)
voice_input = recognizer.recognize_google(audio)
print('Input : {}'.format(voice_input))
except sr.UnknownValueError:
pass
except sr.RequestError:
print('Network error.')
except sr.WaitTimeoutError:
pass
except TimeoutError:
pass

return voice_input.lower()
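The keyword-stripping regex in `get_search_value` can be exercised standalone: each trigger word is wrapped in `\b` word boundaries, the alternatives are joined with `|`, and every match is deleted, leaving only the payload of the command:

```python
import re

def strip_trigger_words(command, utterances):
    # Build an alternation of word-bounded trigger words and delete every match
    words = ['\\b' + word + '\\b'
             for utterance in utterances
             for word in utterance.split(' ')]
    pattern = '|'.join(words)
    return re.sub(pattern, '', command, flags=re.IGNORECASE).strip()
```

For instance, stripping the `applications` triggers from "open google chrome" leaves just the application name, "google chrome".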

4.2.10 Config

import json
import os
import platform

OS_NAME = platform.uname().system
APP_DETAILS_FILE = 'C:\\Users\\seshu\\PycharmProjects\\VOICE_ASSISTANT\\config\\applications.json'

with open('C:\\Users\\seshu\\PycharmProjects\\VOICE_ASSISTANT\\config\\config.json') as file:
DATA = json.load(file)

if not os.path.exists(APP_DETAILS_FILE):
with open(APP_DETAILS_FILE, 'w') as file:
file.write('{}')

4.2.11 Voice ana


import speech_recognition as sr
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import utils

class VoiceAnalyzer:
def __init__(self):
self.sid = SentimentIntensityAnalyzer()

def get_polarity_scores(self):
try:

voice_input = utils.read_voice_cmd()
return self.sid.polarity_scores(voice_input)
except sr.UnknownValueError as e:
print(e)
except sr.RequestError:
print('Network error.')
except sr.WaitTimeoutError:
pass
except TimeoutError:
pass
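`polarity_scores` returns a dict of `neg`, `neu`, `pos` and a normalized `compound` score in [-1, 1]. The conventional VADER thresholds for turning that score into a label can be sketched as follows (the helper name is ours):

```python
def classify_compound(compound):
    # Conventional VADER cut-offs: >= 0.05 positive, <= -0.05 negative
    if compound >= 0.05:
        return 'positive'
    if compound <= -0.05:
        return 'negative'
    return 'neutral'
```

The assistant could use such a label to vary the tone of its spoken responses.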

4.2.12 Model init


import nltk
import ssl

# Downloading the nltk data (uncomment the block below on the first run)


# try:
# _create_unverified_https_context = ssl._create_unverified_context
# except AttributeError:
# pass
# else:
# ssl._create_default_https_context = _create_unverified_https_context
#
# nltk.download('punkt')
# nltk.download('wordnet')
# nltk.download('vader_lexicon')
import config

words = []
classes = []
data_x = []
data_y = []

for intent in config.DATA['intents']:

for utterance in intent['utterances']:
tokens = nltk.word_tokenize(utterance)
words.extend(tokens)
data_x.append(utterance)
data_y.append(intent['tag'])

if intent['tag'] not in classes:
classes.append(intent['tag'])

words = sorted(set(words))
classes = sorted(set(classes))
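Downstream, `words` and `classes` are typically turned into a bag-of-words input vector and a one-hot label for the model. A minimal sketch of the vectorization step, where plain `str.split` stands in for `nltk.word_tokenize` and the vocabulary is a toy stand-in:

```python
def bag_of_words(sentence, vocabulary):
    # Mark 1 for each vocabulary word present in the sentence, else 0
    tokens = sentence.lower().split()
    return [1 if word in tokens else 0 for word in vocabulary]

# Toy vocabulary in the same sorted order the training code produces
vocab = sorted({'hello', 'assistant', 'take', 'a', 'note'})
```

Each training utterance becomes one such fixed-length 0/1 vector, which is what the dense network actually consumes.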
4.2.13 run.py
from PyQt5 import QtWidgets, QtGui,QtCore
from PyQt5.QtGui import QMovie
import sys
from PyQt5.QtWidgets import *
from PyQt5.QtCore import *
from PyQt5.QtGui import *
from PyQt5.uic import loadUiType
import pyttsx3
import speech_recognition as sr
import os
import time
import webbrowser
import datetime

flags = QtCore.Qt.WindowFlags(QtCore.Qt.FramelessWindowHint)

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
engine.setProperty('voice',voices[0].id)
engine.setProperty('rate',180)

def speak(audio):
engine.say(audio)
engine.runAndWait()

def wish():
hour = int(datetime.datetime.now().hour)
if hour>=0 and hour <12:
speak("Good morning")
elif hour>=12 and hour<18:
speak("Good Afternoon")
else:
speak("Good Evening")

class mainT(QThread):
def __init__(self):
super(mainT,self).__init__()

def run(self):
self.JARVIS()

def STT(self):
R = sr.Recognizer()
with sr.Microphone() as source:
print("Listening...")
audio = R.listen(source)
try:
print("Recognizing...")
text = R.recognize_google(audio, language='en-in')
print(">> ", text)
except Exception:
speak("Sorry Speak Again")
return "None"
text = text.lower()
return text

def JARVIS(self):
wish()
while True:
self.query = self.STT()
if 'good bye' in self.query:
sys.exit()
elif 'open google' in self.query:
webbrowser.open('https://www.google.co.in')
speak("opening google")
elif 'open youtube' in self.query:
webbrowser.open("https://www.youtube.com")
elif 'play music' in self.query:
speak("playing music from pc")
self.music_dir ="./music"
self.musics = os.listdir(self.music_dir)
os.startfile(os.path.join(self.music_dir,self.musics[0]))
FROM_MAIN,_ = loadUiType(os.path.join(os.path.dirname(__file__),"./scifi.ui"))
class Main(QMainWindow,FROM_MAIN):
def __init__(self,parent=None):
super(Main,self).__init__(parent)
self.setupUi(self)
self.setFixedSize(1920,1080)
self.label_7 = QLabel
self.exitB.setStyleSheet("background-image:url(./lib/exit - Copy.png);\n"
"border:none;")
self.exitB.clicked.connect(self.close)
self.setWindowFlags(flags)
Dspeak = mainT()
self.label_7 = QMovie("./lib/gifloader.gif", QByteArray(), self)
self.label_7.setCacheMode(QMovie.CacheAll)
self.label_4.setMovie(self.label_7)
self.label_7.start()

self.ts = time.strftime("%A, %d %B")

Dspeak.start()
self.label.setPixmap(QPixmap("./lib/tuse.png"))
self.label_5.setText("<font size=8 color='white'>"+self.ts+"</font>")
self.label_5.setFont(QFont('Acens', 8))

app = QtWidgets.QApplication(sys.argv)
main = Main()
main.show()
exit(app.exec_())

4.3 Screenshots / Results

4.3.1 Interface

4.3.2 Input: What is the current time (Google search for the time)

4.3.3 Input: Who is the president of India

4.3.4 Input: Play songs

4.3.5 Input: Take a note. Note: my name is assistant

4.3.6 Input: Open Control Panel (opening the system application Control Panel)

5. Platform Used: PyCharm

As a programmer, you should be able to focus on the business logic and on creating useful applications for your users. PyCharm by JetBrains saves a lot of time by taking care of routine work and by making tasks such as debugging and visualization easy.

5.1 Intelligent Coding Assistance

PyCharm provides smart code completion, code inspections, on-the-fly error highlighting and
quick-fixes, along with automated code refactorings and rich navigation capabilities.
Intelligent Code Editor:

PyCharm’s smart code editor provides first-class support for Python, JavaScript, CoffeeScript,
TypeScript, CSS, popular template languages and more. Take advantage of language-aware code
completion, error detection, and on-the-fly code fixes!

Smart Code Navigation:

Use smart search to jump to any class, file or symbol, or even any IDE action or tool window. It
only takes one click to switch to the declaration, super method, test, usages, implementation, and
more.

Fast and Safe Refactorings:

Refactor your code the intelligent way, with safe Rename and Delete, Extract Method, Introduce
Variable, Inline Variable or Method, and other refactorings. Language and framework-specific
refactorings help you perform project-wide changes.

5.2 Built-in Developer Tools:

PyCharm’s huge collection of tools out of the box includes an integrated debugger and test
runner; Python profiler; a built-in terminal; integration with major VCS and built-in database
tools; remote development capabilities with remote interpreters; an integrated ssh terminal; and
integration with Docker and Vagrant.

Debugging, Testing and Profiling:


Use the powerful debugger with a graphical UI for Python and JavaScript. Create and run your
tests with coding assistance and a GUI-based test runner. Take full control of your code with
Python Profiler integration.

VCS, Deployment and Remote Development:


Save time with a unified UI for working with Git, SVN, Mercurial or other version control
systems. Run and debug your application on remote machines. Easily configure automatic
deployment to a remote host or VM and manage your infrastructure with Vagrant and Docker.

Database tools:
Access Oracle, SQL Server, PostgreSQL, MySQL and other databases right from the IDE. Rely
on PyCharm’s help when editing SQL code, running queries, browsing data, and altering
schemas.

5.3 Customizable and Cross-platform IDE


Use PyCharm on Windows, macOS and Linux with a single license key. Enjoy a fine-tuned
workspace with customizable color schemes and key-bindings, with VIM emulation available.

Customizable UI
Are there any software developers who don't like to tweak their tools? We have yet to meet one,
so we've made PyCharm UI customization a breeze. Enjoy a fine-tuned workspace with
customizable color schemes and key-bindings.

Plugins
More than 10 years of IntelliJ platform development gives PyCharm 50+ IDE plugins of
different nature, including support for additional VCS, integrations with different tools and
frameworks, and editor enhancements such as Vim emulation.

Cross-platform IDE
PyCharm works on Windows, macOS or Linux. You can install and run PyCharm on as many
machines as you have, and use the same environment and functionality across all your machines.

6. CONCLUSION AND FUTURE SCOPE 

6.1 Conclusion

The aim of this project is to build an assistant that is reliable, cost-effective, and provides many services to the user. Such assistants are used in scientific, educational, and commercial applications. The main agenda of this project is to serve people with physical disabilities: when a person has difficulty communicating with the system, communication breaks down, and the assistant we developed helps them interact with the system efficiently. The assistant responds from a finite set of predefined responses built from pre-existing information and training. Assistants of this kind are a big step forward in enhancing human-computer interaction.

6.2 Future Scope


There are some limitations to our project:
At present the assistant supports only one language; in the near future this barrier can be broken by developing a more sophisticated multilingual assistant.
To use the assistant efficiently, a strong and stable internet connection is needed; without it, responses may be delayed. The Python modules used to convert voice data into text require an internet connection. In the future, this drawback can be overcome with more efficient offline solutions.

7. REFERENCES
[1] https://www.ijrte.org/wp-content/uploads/papers/v9i2/A2753059120.pdf
[2] https://media.neliti.com/media/publications/263312-an-intelligent-behaviour-shown-by-chatbo-7020467d.pdf
[3] Apte, T. V., Ghosalkar, S., Pandey, S., & Padhra, S. (2014). Android app for blind using speech technology. International Journal of Research in Computer and Communication Technology (IJRCCT), #(3), 391-394.
[4] https://opensource.adobe.com/Spry/samples/data_region/JSONDataSetSample.html
[5] https://support.etlworks.com/hc/en-us/articles/360014078293-JSON-dataset-Format#:~:text=The%20JSON%20dataset%20is%20a,REST%20APIs%20in%20Etlworks%20Integrator
[6] Anwani, R., Santuramani, U., Raina, D., & RL, P. (2015). VMail: Voice Based Email Application. International Journal of Computer Science and Information Technologies, 6(3).
[7] https://keras.io/guides/sequential_model/
[8] https://keras.io/guides/making_new_layers_and_models_via_subclassing/
[9] https://keras.io/guides/training_with_built_in_methods/
[10] https://analyticsindiamag.com/a-tutorial-on-sequential-machine-learning/
[11] https://doc.qt.io/qtforpython-5/PySide2/QtWidgets/QWidget.html
