International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 9 Sep 2013

ISSN: 2231-2803 Page 3036

A survey on Speech Recognition

V. Malarmathi, M.C.A., Dr. E. Chandra, M.Sc., M.Phil., Ph.D.

Research Scholar, Dr.SNS Rajalakshmi College of Arts & Science, Coimbatore, India.
Director, Department of Computer Science, Dr.SNS Rajalakshmi College of Arts & Science, Coimbatore-32, India

Abstract Speech is the most prominent and primary mode of
communication among human beings. Communication between
humans and computers takes place through a human-computer
interface. Speech processing is the study of speech signals and
of the methods used to process them. The signals are usually
processed in a digital representation. Speech processing is
closely tied to Natural Language Processing (NLP); an example
is speech-to-text conversion. Since even before the time of
Alexander Graham Bell's revolutionary invention, engineers and
scientists have studied the phenomenon of speech communication
with an eye on creating more efficient and effective systems of
human-to-human and human-to-machine communication. Our goal is
to provide a useful introduction to the wide range of important
concepts that comprise the field of digital speech processing,
which seeks a more efficient representation of the speech
signal. Speech processing is divided into various categories
such as speech recognition, speaker recognition, speech coding,
voice analysis, speech synthesis and speech enhancement. This
paper mainly discusses speech recognition. Speech recognition,
the process of speaking words into the computer and having text
appear on the screen or having the computer perform various
functions, is one of the most exciting and potential-filled
technologies available for students with special needs.

Keywords Speech Recognition, Automatic Speech Recognition
(ASR), Voice Analysis, Speech Synthesis


Speech recognition is the process of recognizing speech
uttered by a speaker, and it has long been an active field of
research. Voice communication is the most effective mode of
communication used by humans. The significance of speech
recognition lies in its simplicity. It can be used in many
applications such as security devices, household appliances,
cellular phones, ATMs, machines and computers. In computer
science, speech recognition (SR) is the translation of spoken
words into text: through a speech recognition program or
application, the computer is able to process the words you say
and turn them into text on the screen. It is also known as
"automatic speech recognition", "ASR", "computer speech
recognition", "speech to text", or just "STT". Some SR systems
use "training", where an individual speaker reads sections of
text into the SR system. These systems analyze the person's
specific voice and use it to fine-tune the recognition of that
person's speech, resulting in more accurate transcription.
Systems that do not use training are called
"speaker-independent" systems. Systems that use training are
called "speaker-dependent" systems. Most speech recognition
systems can be classified according to the following
categories [1][2].


A. Isolated-Word Recognition
Isolated-word recognition can be implemented in both
speaker-trained and speaker-independent forms. This technology
opened up a class of applications called command-and-control
applications, in which the system is capable of recognizing a
single-word command (from a small vocabulary of single-word
commands) and responding appropriately to the recognized
command. One key problem with this technology is its
sensitivity to background noise (which is often recognized as
spurious spoken words) and to extraneous speech inadvertently
spoken along with the command word. Various types of
keyword-spotting algorithms evolved to solve these types of
problems [1][2].
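
The paper does not name a specific matching algorithm for isolated-word recognition. As a minimal sketch, assuming the classical template-matching approach with dynamic time warping (DTW), the code below compares an utterance against one stored template per vocabulary word. The toy one-dimensional "features", the `TEMPLATES` vocabulary and the `reject_threshold` parameter are illustrative assumptions, not taken from the paper.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local Euclidean distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize_word(features, templates, reject_threshold=None):
    """Return the closest vocabulary word, or None if the match is too poor."""
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        dist = dtw_distance(features, template)
        if dist < best_dist:
            best_word, best_dist = word, dist
    if reject_threshold is not None and best_dist > reject_threshold:
        return None  # treat as background noise or out-of-vocabulary speech
    return best_word

# Hypothetical two-word command vocabulary with toy 1-D feature frames.
TEMPLATES = {
    "stop": [(0.0,), (1.0,), (2.0,), (1.0,)],
    "go": [(5.0,), (4.0,), (5.0,)],
}

# A slower (time-stretched) rendition of "stop" still matches its template.
slow_stop = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
print(recognize_word(slow_stop, TEMPLATES))
```

The optional rejection threshold is one simple way to address the noise sensitivity described above: an input that is far from every template is discarded instead of being forced onto the nearest command.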

B. Connected Word Recognition
Connected-word recognition can be implemented in both
speaker-trained and speaker-independent forms. This technology
was built on top of word recognition technology, choosing to
exploit the word models that were successful in isolated-word
recognition and to extend the modeling to recognize a
concatenated sequence (a string) of such word models as a
word string. This technology opened up a class of applications
based on recognizing digit strings and alphanumeric strings,
and led to a variety of systems for voice dialing, credit card
authorization, directory assistance lookups, and catalog

C. Continuous or Fluent Speech Recognition
Continuous or fluent speech recognition can be implemented
in both speaker-trained and speaker-independent forms. This
technology led to the first large-vocabulary recognition
systems, which were used to access databases (the DARPA
Resource Management Task), to do constrained dialogue access
to information, to handle very large vocabulary read speech
for dictation (the DARPA NAB Task), and eventually for desktop
dictation systems in PC environments [1][3].

D. Speech Understanding Systems
Speech understanding systems (so-called unconstrained
dialogue systems) are capable of determining the underlying
message embedded within the speech, rather than just
recognizing the spoken words. Such systems, which are only
beginning to appear, enable services like customer care and
intelligent agent systems that provide access to information
sources by voice dialogues (the AT&T Maxwell Task) [1][2].
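
To illustrate the difference between recognizing words and determining the underlying message, the following is a deliberately simple sketch that maps an already-recognized transcript to an intent by keyword overlap. The intent names and keyword sets in `INTENT_KEYWORDS` are hypothetical, and real speech understanding systems use far richer language models than this.

```python
import re

# Hypothetical intents and trigger words, for illustration only.
INTENT_KEYWORDS = {
    "collect_call": {"collect"},
    "billing": {"bill", "charge", "charges"},
    "operator": {"operator", "agent"},
}

def understand(transcript):
    """Map a recognized transcript to the intent with the most keyword hits."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    scores = {intent: len(words & keywords)
              for intent, keywords in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # With no keyword hit at all, report that the message was not understood.
    return best if scores[best] > 0 else "unknown"

print(understand("I would like to make a collect call"))
```

The recognizer's output (the word string) is only the input here; the returned intent is what a dialogue system would act on.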

E. Spontaneous Conversation Systems
Spontaneous conversation systems are able both to recognize
the spoken material accurately and to understand its meaning.
Such systems, which are currently beyond the limits of
existing technology, will enable new services such as
conversation summarization, business meeting notes, topic
spotting in fluent speech (e.g., from radio or TV broadcasts),
and ultimately even language translation services between any
pair of existing languages [1][4].

F. Applications
Speech recognition applications include voice user
interfaces such as voice dialing (e.g., "Call home"), call
routing (e.g., "I would like to make a collect call"), domotic
appliance control, search (e.g., find a podcast where particular
words were spoken), simple data entry (e.g., entering a credit
card number), preparation of structured documents (e.g., a
radiology report), speech-to-text processing (e.g., word
processors or emails), and aircraft (usually termed Direct
Voice Input) [1][2].
Applications also include the automation of complex
operator-based tasks, e.g., customer care, dictation,
form-filling applications, provisioning of new services,
customer help lines and e-commerce [3][1].

G. Issues in Speech Recognition
The term "voice recognition" refers to finding the identity of
"who" is speaking, rather than what they are saying.
Recognizing the speaker can simplify the task of translating
speech in systems that have been trained on specific persons'
voices, or it can be used to authenticate or verify the
identity of a speaker as part of a security process [6][2].
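
As a rough sketch of the verification use case, assuming each utterance has already been reduced to a fixed-length feature vector, a claimed identity can be checked by comparing the new vector with an enrolled "voiceprint". The averaging-based `enroll` step, the cosine similarity measure and the `threshold` value are illustrative assumptions rather than a method described in the paper.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def enroll(samples):
    """Average several feature vectors from one speaker into a voiceprint."""
    count = len(samples)
    return [sum(column) / count for column in zip(*samples)]

def verify(voiceprint, features, threshold=0.9):
    """Accept the claimed identity only if the similarity reaches the threshold."""
    return cosine_similarity(voiceprint, features) >= threshold

# Hypothetical 3-dimensional feature vectors for one enrolled speaker.
voiceprint = enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.0]])
print(verify(voiceprint, [1.0, 2.0, 3.1]))   # same speaker, similar features
print(verify(voiceprint, [3.0, -1.0, 0.2]))  # different speaker
```

Raising the threshold makes false acceptances less likely at the cost of rejecting more genuine attempts, which is the usual trade-off in a security setting.
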
The central challenge is to convert a speech signal into a
text message accurately and efficiently, independent of the
device, the speaker or the environment. The extracted speech
features should be easy to measure and should be stable over
time [3][1].
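
The paper does not say how accuracy is measured. One widely used metric is the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence; the function name and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

# One substituted word out of three reference words.
print(word_error_rate("call home now", "call phone now"))
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many more words than were actually spoken.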


A. Advantages
Speech input is easy to perform because it does not require
a specialized skill, as typing or push-button operation does.
Information can be input even when the user is moving or
doing other activities involving the hands, legs, eyes or ears.
ASR is divided into two major categories: speaker-dependent
and speaker-independent. Speaker-dependent automatic speech
recognition requires speaker training or enrollment prior to
use: the primary user trains the speech recognizer with samples
of his or her own speech. Speaker-independent automatic speech
recognition does not require speaker training prior to use.
The speech recognizer is pre-trained during system
development with speech samples from a collection of

In this review, we have discussed the types of speech
recognition systems. We also presented the applications of,
and the issues considered in, speech recognition systems.
Speech recognition technology has evolved for more than 40
years, spurred on by advances in signal processing, algorithms,
architectures, and hardware. Today, high-quality speech
recognition technology packages are available in the form of
inexpensive software-only desktop packages (IBM ViaVoice,
Dragon NaturallySpeaking, Kurzweil, etc.) and technology
engines that run on either the desktop or a workstation [1].


