
Speech Recognition

BATCH 10
Guide: Mr. VSVS Murthy

Team members:
M Siva Badarinath 1215316025
G Manish Reddy 1215316015
M Sumanth 1215316027
Y Raviteja 1215316059

Project Brief:

Technologies used:

Python

Libraries: gTTS, SpeechRecognition, PyAudio


Abstract:

• The aim of the project is to convert speech to text and browse the internet with voice commands (much like Google Assistant); a minimal sketch follows below.

• Read text documents in the English language and convert them into voice.

• To make browsing the internet more comfortable and accessing documents easier.
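
A minimal sketch of the voice-command browsing idea, assuming the SpeechRecognition package (with PyAudio for microphone access) and the standard-library webbrowser module; the "search" keyword convention is a hypothetical example, not taken from the slides.

```python
# Minimal sketch: listen for one spoken command and open a web search.
# Assumes: pip install SpeechRecognition PyAudio, and an internet
# connection (recognize_google calls the Google Web Speech API).
import webbrowser

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to room noise
    print("Say something like: 'search speech recognition in python'")
    audio = recognizer.listen(source)

text = recognizer.recognize_google(audio)  # speech -> text
print("You said:", text)

# Hypothetical command convention: anything after "search" becomes the query.
if text.lower().startswith("search "):
    query = text[len("search "):]
    webbrowser.open("https://www.google.com/search?q=" + query)
```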
gTTS: Google Text-to-Speech

• A Python library and tool to interface with Google Translate’s text-to-speech API.

• Writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.

• Features flexible pre-processing and tokenizing, as well as automatic retrieval of supported languages. A usage sketch follows below.
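
A minimal sketch of gTTS in use, assuming the library is installed (pip install gTTS); the sample text and filename are arbitrary.

```python
# Minimal sketch: convert a line of text to a spoken mp3 file with gTTS.
from gtts import gTTS

tts = gTTS("Hello, this is a text to speech demo.", lang="en")
tts.save("demo.mp3")  # writes spoken mp3 data to a file

# gTTS can also write to a file-like object for further audio manipulation:
# import io; buf = io.BytesIO(); tts.write_to_fp(buf)
```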
PyAudio:

• PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library.

• With PyAudio, you can easily use Python to play and record audio on a variety of platforms; a recording sketch follows below.

• PyAudio is inspired by pyPortAudio/fastaudio: Python bindings for the PortAudio v18 API.
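
A minimal sketch of recording a few seconds of microphone audio with PyAudio and saving it as a WAV file; the sample rate, duration, and filename are arbitrary choices, not from the slides.

```python
# Minimal sketch: record ~3 seconds from the default microphone with PyAudio.
import wave

import pyaudio

CHUNK = 1024   # frames per buffer
RATE = 16000   # samples per second
SECONDS = 3

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(RATE // CHUNK * SECONDS)]
stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

# Write the captured frames to a standard WAV file.
with wave.open("capture.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```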


Speech Recognition

• Speech recognition is an important feature in several applications, such as home automation, artificial intelligence, etc.

• This aims to provide an introduction to making use of the SpeechRecognition library of Python; a minimal sketch follows below.

• This is useful as it can be used on microcontrollers such as Raspberry Pis with the help of an external microphone.
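
A minimal sketch of the SpeechRecognition library transcribing one utterance from the microphone, assuming the package is installed along with PyAudio for microphone support.

```python
# Minimal sketch: capture one utterance and transcribe it.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:  # requires PyAudio
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    print(recognizer.recognize_google(audio))  # Google Web Speech API
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as err:
    print("API request failed:", err)
```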
Outcome:

• Can detect different voices

• Generates audio files from the given text

• Reduces the time spent browsing or typing files


Conclusion 1

We have analysed and looked into the various parameters in order to take conventional voice engines to the next level. Our model demonstrates how the voice engine can be used to perform real-time activities, such as controlling appliances with mere voice commands and no physical movement. The model is currently developed on the base version of Android Froyo 2.2, and it is possible to make it compatible with later versions by following certain procedures.

The principle of DTMF signals has been followed to demonstrate the automation of electrical appliances, as the hardware setup is cost-effective and any individual can implement the same without much expenditure.

Author: P. Magesh Kannan
Date of Conference: 28-30 Aug. 2014
Conclusion 2

In this paper, the results of the time optimization of a real-time speaker recognition system were presented. The obtained parameters prove that the system accuracy can be held at the same level (or even increased) while reducing the number of computations related to the MFCC (8 times fewer FFT computations) and GMM (half as many computations) algorithms. As a consequence, the sampling rate can be increased to provide more accurate real-time speaker recognition (by more than 10%).

Author: Radosław Weychan
Date of Conference: 23-25 Sept. 2015
Conclusion 3

This research project has developed a technique for converting a text image directly to speech using Python and a Raspberry Pi 3 minicomputer. The hardware provides a portable and economical way of converting an image to text. Our method is more reliable than others, as Tesseract OCR has an accuracy of 99% and eSpeak uses two methods to read out the image with more human compassion.

Author: Hasan U. Zaman
Date of Conference: 26-28 Oct. 2018
Conclusion 4

In this work, an end-to-end speech-to-text conversion model using neural networks is implemented. Techniques such as max pooling and batch normalization are used to further optimize the model and boost its accuracy. The process of porting the trained model to a Raspberry Pi is explained. The usage of these kinds of neural network models is confined to the labels used in the dataset; better datasets with more labels and the inclusion of various accents would improve the application's efficiency.

Author: A. Pardha Saradhi
Date of Conference: 6 April 2019
THANK YOU
