
Speech Recognition

BATCH 10
Guide: Mr. VSVS Murthy

Team members:
M Siva Badarinath 1215316025
G Manish Reddy 1215316015
M Sumanth 1215316027
Y Raviteja 1215316059

Project Brief:

Technologies used:

Python

Libraries: gTTS, SpeechRecognition, PyAudio


Abstract:

• The aim of the project is to convert speech to text and browse the internet with voice commands (much like Google Assistant); a minimal sketch follows below.

• Read text documents in the English language and convert them into voice.

• To make browsing the internet more comfortable and accessing documents easier.
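
A minimal sketch of the voice-command browsing idea, assuming the SpeechRecognition package (with PyAudio for microphone access) and the standard-library webbrowser module; the "search" keyword convention is a hypothetical example, not taken from the slides.

```python
# Minimal sketch: listen for one spoken command and open a web search.
# Assumes: pip install SpeechRecognition PyAudio, and an internet
# connection (recognize_google calls the Google Web Speech API).
import webbrowser

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to room noise
    print("Say something like: 'search speech recognition in python'")
    audio = recognizer.listen(source)

text = recognizer.recognize_google(audio)  # speech -> text
print("You said:", text)

# Hypothetical command convention: anything after "search" becomes the query.
if text.lower().startswith("search "):
    query = text[len("search "):]
    webbrowser.open("https://www.google.com/search?q=" + query)
```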
gTTS: Google Text-to-Speech

• A Python library and tool to interface with Google Translate’s text-to-speech API.

• Writes spoken mp3 data to a file, a file-like object (bytestring) for further audio manipulation, or stdout.

• Features flexible pre-processing and tokenizing, as well as automatic retrieval of supported languages. A usage sketch follows below.
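
A minimal sketch of gTTS in use, assuming the library is installed (pip install gTTS); the sample text and filename are arbitrary.

```python
# Minimal sketch: convert a line of text to a spoken mp3 file with gTTS.
from gtts import gTTS

tts = gTTS("Hello, this is a text to speech demo.", lang="en")
tts.save("demo.mp3")  # writes spoken mp3 data to a file

# gTTS can also write to a file-like object for further audio manipulation:
# import io; buf = io.BytesIO(); tts.write_to_fp(buf)
```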
PyAudio:

• PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library.

• With PyAudio, you can easily use Python to play and record audio on a variety of platforms; a recording sketch follows below.

• PyAudio is inspired by pyPortAudio/fastaudio: Python bindings for the PortAudio v18 API.
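
A minimal sketch of recording a few seconds of microphone audio with PyAudio and saving it as a WAV file; the sample rate, duration, and filename are arbitrary choices, not from the slides.

```python
# Minimal sketch: record ~3 seconds from the default microphone with PyAudio.
import wave

import pyaudio

CHUNK = 1024   # frames per buffer
RATE = 16000   # samples per second
SECONDS = 3

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)
frames = [stream.read(CHUNK) for _ in range(RATE // CHUNK * SECONDS)]
stream.stop_stream()
stream.close()
sample_width = pa.get_sample_size(pyaudio.paInt16)
pa.terminate()

# Write the captured frames to a standard WAV file.
with wave.open("capture.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(sample_width)
    wf.setframerate(RATE)
    wf.writeframes(b"".join(frames))
```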


Speech Recognition

• Speech recognition is an important feature in several applications, such as home automation, artificial intelligence, etc.

• This aims to provide an introduction to making use of the SpeechRecognition library of Python; a minimal sketch follows below.

• This is useful as it can be used on microcontrollers such as Raspberry Pis with the help of an external microphone.
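
A minimal sketch of the SpeechRecognition library transcribing one utterance from the microphone, assuming the package is installed along with PyAudio for microphone support.

```python
# Minimal sketch: capture one utterance and transcribe it.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:  # requires PyAudio
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source)

try:
    print(recognizer.recognize_google(audio))  # Google Web Speech API
except sr.UnknownValueError:
    print("Could not understand the audio")
except sr.RequestError as err:
    print("API request failed:", err)
```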
Outcome:

• Can detect different voices

• Generates audio files from the given text

• Reduces the time spent browsing or typing files


Conclusion 1

We have analysed and looked into the various parameters in order to take conventional voice engines to the next level. Our model demonstrates how the voice engine can be used to perform real-time activities, such as controlling appliances with mere voice commands and no physical movement. The model is currently developed on the base version of Android Froyo 2.2, and it is possible to make it compatible with later versions by following certain procedures.

The principle of DTMF signals has been followed to demonstrate the automation of electrical appliances, as the hardware setup is cost-effective and any individual can implement the same without much expenditure.

Author: P. Magesh Kannan
Date of Conference: 28-30 Aug. 2014
Conclusion 2

In this paper, the results of the time optimization of a real-time speaker recognition system were presented. The obtained parameters prove that the system accuracy can be held at the same level (or even increased) while reducing the number of computations related to the MFCC (8 times fewer FFT computations) and GMM (half as many computations) algorithms. As a consequence, the sampling rate can be increased to provide more accurate real-time speaker recognition (by more than 10%).

Author: Radosław Weychan
Date of Conference: 23-25 Sept. 2015
Conclusion 3

This research project has developed a technique for converting a text image directly to speech using Python and a Raspberry Pi 3 minicomputer. The hardware provides a portable and economical way of converting an image to text. Our method is more reliable than others, as Tesseract OCR has an accuracy of 99% and eSpeak uses two methods to read out the image with more human compassion.

Author: Hasan U. Zaman
Date of Conference: 26-28 Oct. 2018
Conclusion 4

In this work, an end-to-end speech-to-text conversion model using neural networks is implemented. Techniques such as max pooling and batch normalization are used to further optimize the model and boost its accuracy. The process of porting the trained model to a Raspberry Pi is explained. The usage of these kinds of neural network models is confined to the labels used in the dataset; better datasets with more labels and the inclusion of various accents would improve the application's efficiency.

Author: A. Pardha Saradhi
Date of Conference: 6 April 2019
THANK YOU
