
SPEECH RECOGNITION

CONTENTS:-

• Introduction
• Meaning of Speech Recognition
• Working of Speech Recognition
• Speech Recognition Flowchart
• Recognition Process Flow Summary
• Advantages
• Disadvantages
• The Future of Speech Recognition
• Conclusion
• Screenshots

Introduction

• Speech recognition is the process of converting an acoustic signal, captured
  by a microphone or a telephone, into a sequence of words.
• The recognized words can be an end in themselves, as in applications such
  as command and control, data entry, and document preparation.
• They can also serve as the input to further linguistic processing in order to
  achieve speech understanding.
• It is also known as Automatic Speech Recognition (ASR), Computer
  Speech Recognition, or Speech to Text (STT).

What is Speech Recognition?

• Speech recognition basically means talking to a computer and having it
  recognize what we are saying.
• Formally, speech recognition is an interdisciplinary subfield of
  computational linguistics that develops methodologies and technologies
  enabling computers to recognize spoken language and translate it into text.

How does it work?

• The process functions as a pipeline that converts PCM (pulse code
  modulation) digital audio from a sound card into recognized speech.
• It uses algorithms for acoustic and language modeling. Acoustic modeling
  represents the relationship between the linguistic units of speech and audio
  signals; language modeling matches sounds with word sequences to help
  differentiate between words that sound similar.
• Hidden Markov models (HMMs) are also used to identify temporal patterns
  in speech and improve accuracy.
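
To make the hidden Markov model point concrete, here is a minimal sketch of Viterbi decoding, the standard way to find the most likely sequence of hidden states in an HMM. It is only an illustration: the two phoneme states, the coarse acoustic labels ("hiss", "tone") and every probability are made-up assumptions, not values from any real recognizer.

```python
# Toy HMM: hidden states are phonemes, observations are coarse acoustic labels.
# All names and probabilities below are invented for illustration.
states = ["s", "iy"]
start_p = {"s": 0.8, "iy": 0.2}
trans_p = {
    "s": {"s": 0.6, "iy": 0.4},
    "iy": {"s": 0.1, "iy": 0.9},
}
emit_p = {
    "s": {"hiss": 0.9, "tone": 0.1},
    "iy": {"hiss": 0.2, "tone": 0.8},
}


def viterbi(observations):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[state] = (probability of the best path ending in state, that path)
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for cur in states:
            # Pick the predecessor that maximizes the path probability so far.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][cur], best[prev][1]) for prev in states),
                key=lambda pair: pair[0],
            )
            step[cur] = (prob * emit_p[cur][obs], path + [cur])
        best = step
    return max(best.values(), key=lambda pair: pair[0])


prob, path = viterbi(["hiss", "hiss", "tone", "tone"])
print(path, prob)
```

With these toy numbers the decoder stays in "s" while the signal is noisy and switches to "iy" once it becomes tonal, which is the kind of temporal pattern an HMM is meant to capture.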
TYPES OF SPEECH RECOGNITION

1) Speaker-Dependent
2) Speaker-Independent

1) Speaker-Dependent

• Speaker-dependent software is commonly used for dictation, while
  speaker-independent software is more commonly found in telephone
  applications.
• Speaker-dependent software works by learning the unique characteristics of
  a single person's voice, in a way similar to voice recognition.
• New users must first "train" the software by speaking to it, so the computer
  can analyze how the person talks.
• This often means users have to read a few pages of text to the computer
  before they can use the speech recognition software.

2) Speaker-Independent

• Speaker-independent software is designed to recognize anyone's voice, so
  no training is involved. This makes it the only real option for
  applications such as interactive voice response systems, because
  businesses can't ask callers to read pages of text before using the system.
• The downside is that speaker-independent software is generally less
  accurate than speaker-dependent software.
• Speech recognition engines that are speaker-independent generally deal
  with this by limiting the grammars they use.
• By using a smaller list of recognized words, the speech engine is more
  likely to correctly recognize what a speaker said.
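
The effect of a limited grammar can be sketched in a few lines. This is a rough illustration, not how a production engine works: it matches text rather than audio, the menu phrases are hypothetical, and the standard-library `difflib.get_close_matches` stands in for the engine's real scoring.

```python
import difflib

# Hypothetical small grammar for an interactive voice response menu.
GRAMMAR = ["check balance", "transfer funds", "pay bill", "speak to agent"]


def match_to_grammar(hypothesis, grammar=GRAMMAR, cutoff=0.6):
    """Return the grammar phrase closest to a raw hypothesis, or None."""
    matches = difflib.get_close_matches(hypothesis.lower(), grammar, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_to_grammar("check ballance"))  # slightly wrong hypothesis
print(match_to_grammar("order a pizza"))   # phrase outside the grammar
```

Because only four phrases are allowed, a slightly misheard input can still be mapped to the intended command, while input outside the grammar is rejected instead of being guessed at.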
Recognition Process Flow Summary

• Step 1: User Input
  The system captures the user's voice in the form of an analog acoustic signal.
• Step 2: Digitization
  The analog acoustic signal is digitized.
• Step 3: Phonetic Breakdown
  The signal is broken into phonemes.
• Step 4: Statistical Modeling
  The phonemes are mapped to their phonetic representation using a
  statistical model.
• Step 5: Matching
  Based on the grammar, the phonetic representation, and the dictionary, the
  system returns an n-best list (i.e., each entry is a word plus a confidence score).
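
The n-best list returned in Step 5 can be sketched as a simple ranking of candidates by confidence. The candidate words and scores below are hypothetical, and `n_best` is an illustrative helper rather than part of any real speech API.

```python
import heapq


def n_best(candidates, n=3):
    """Return the n highest-confidence (word, score) pairs, best first."""
    return heapq.nlargest(n, candidates.items(), key=lambda item: item[1])


# Hypothetical confidence scores produced for one utterance.
scores = {"wreck": 0.21, "recognize": 0.64, "reckon": 0.09, "recognise": 0.05}

for word, confidence in n_best(scores, n=3):
    print(f"{word}\t{confidence:.2f}")
```

An application would usually take the first entry, but keeping the runners-up lets it ask the user to confirm when the top scores are close.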
ADVANTAGES

• Helps people with disabilities use computers.
• Organizations: increases productivity and reduces costs and errors.
• Lower operational costs.
• Advances in technology will allow consumers and businesses to implement
  speech recognition systems at a relatively low cost.
    • Cell-phone users can dial pre-programmed numbers by voice command.
    • Users can trade stocks through a voice-activated trading system.
    • Speech recognition technology can also replace touch-tone input.

DISADVANTAGES

• It is difficult to build a perfect system.
• Conversations:
    • They involve more than just words (non-verbal communication,
      stutters, etc.).
    • Every human being differs in voice, mouth, and speaking style.
• Filtering background noise is a task that can be difficult even for humans.

The Future of Speech Recognition

• The Defense Advanced Research Projects Agency (DARPA) has three
  teams of researchers working on Global Autonomous Language
  Exploitation (GALE), a program that will take in streams of information
  from foreign news broadcasts and newspapers and translate them.
• It hopes to create software that can instantly translate two languages with at
  least 90 percent accuracy.
• DARPA is also funding an R&D effort called TRANSTAC to enable
  soldiers to communicate more effectively with civilian populations in
  non-English-speaking countries.

Conclusion:

1. At some point in the future, speech recognition may become speech
   understanding.
2. The statistical models that allow computers to decide what a person just
   said may someday allow them to grasp the meaning behind the words.
3. Although that would be a huge leap in terms of computational power and
   software sophistication, some researchers argue that speech recognition
   development offers the most direct line from the computers of today to
   true artificial intelligence.

Screenshots:-

Presented by:
Rakesh C N
IIIrd Sem MCA

Note (Step 5): A grammar is the set of words or phrases that constrains the
range of input or output in a voice application.
