Chapter One Genesis_011542

CHAPTER ONE: INTRODUCTION
1.0 Introduction
Speech to text translation is a technology that converts spoken language into written text. With
advancements in natural language processing and artificial intelligence, it has gained significant
importance in various industries and applications. This chapter explores the main setback, key
benefit, uses, and challenges of speech to text translation, followed by the problem statement,
aim and objectives, and scope and limitations of the study.
Speech recognition is one of the fastest-growing engineering innovations, with uses and possible
benefits in a variety of fields. This project has been planned and built with that fact in mind.
Nearly 20% of the world's population has some kind of disability, and a large number of these
people are blind or unable to use their hands properly. People who are blind or who have reading
difficulties may listen to a researched article using an accessible device. In such situations,
speech recognition systems come in handy, allowing users to exchange information with others
while operating a device by voice input. A speech to text (STT) system converts speech into text
in a human language, and a text to speech (TTS) system converts text into speech in a human
language. The proposed device is a hardware solution for synthesizing speech and allowing voice
access to digital content. Our goal is a system capable of recognizing speech and converting
audio input into text, and of recognizing text and converting text input into audio.

Setback
One of the major setbacks in speech to text translation is accuracy. Achieving high accuracy
levels in understanding and translating spoken language can be challenging due to factors such as
background noise, accents, dialects, and variations in speech patterns. This setback has prompted
researchers and developers to strive towards improving the accuracy of speech to text translation
systems.
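Accuracy of a speech to text system is commonly reported as word error rate (WER): the number of
word substitutions, deletions, and insertions needed to turn the system's output into a reference
transcript, divided by the number of words in the reference. The following is a minimal Python
sketch of that metric and is not part of the proposed system. The function name and the sample
transcripts are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: two substituted words and one inserted word.
    reference = "the patient reports mild chest pain"
    hypothesis = "the patient report mild chess pain today"
    print(f"WER = {word_error_rate(reference, hypothesis):.2f}")  # WER = 0.50
```

A lower WER means a more accurate transcript. Because insertions are counted, the value can
exceed 1.0 when the output is much longer than the reference.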
Key Benefit
The key benefit of speech to text translation is the time it saves. By converting speech into
text, individuals can quickly document conversations, meetings, and interviews without the need
for manual transcription. The technology is also highly beneficial for people with disabilities,
as it enables them to communicate effectively in written form.
How it is used
Speech to text translation is used in a variety of applications across different domains. For
instance, in healthcare, doctors can dictate patient notes, allowing for efficient medical record
keeping. In customer service, speech to text translation aids in transcribing customer calls,
enabling better analysis and quality control. In education, students can record lectures and later
convert them into text to review key points.
1.1 Background of the Study
Speech to text translation technology has been evolving for several decades, with researchers and
developers constantly working towards improving its accuracy, speed, and reliability.
Advancements in machine learning and deep learning algorithms have significantly enhanced the
performance and usability of these systems.
Speech recognition (SR) is the ability to turn a spoken word or dictation into text. Also known
as "automatic speech recognition" (ASR) or "speech to text" (STT), it is a technology that
converts speech into text:
- The process of translating an acoustic signal captured by a microphone or other
peripherals into a series of words is known as speech recognition.
- Linguistic processing can be used to enhance speech comprehension.
- For applications such as command and control, data entry, and document preparation, the
recognized words may be an end in themselves.
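One classic way to match an acoustic signal to a word is template matching with dynamic time
warping (DTW), which was used in early isolated-word recognizers. The sketch below is a toy
illustration under stated assumptions and is not the method of this project: the feature
sequences are hypothetical one-dimensional numbers rather than features extracted from real
audio, and the names `dtw_distance`, `recognize`, and `TEMPLATES` are invented for the example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(a)][len(b)]


def recognize(features, templates):
    """Return the word whose stored template is closest to the input."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))


# Hypothetical feature tracks, one stored template per vocabulary word.
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 2.0],
}

if __name__ == "__main__":
    # The same shape as "yes", but spoken more slowly (some frames repeated).
    spoken = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
    print(recognize(spoken, TEMPLATES))  # yes
```

DTW tolerates differences in speaking rate because one frame of the input may align with several
frames of the template. Current systems generally rely on statistical or neural models instead,
but the underlying task is the same: map a signal to the most likely sequence of words.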
Mention speech recognition today, and it is almost inevitable that someone will point to HAL,
the computer from 2001: A Space Odyssey. This illustration of where the technology is headed
has lulled many information technology (IT) managers into ignoring speech recognition because
it is obvious that computers that can hold an intelligent conversation will remain science fiction
for a long time. The fact is that practical, usable speech recognition products are here now. For
example, several companies now sell continuous speech recognition applications that offer the
everyday computer user the ability to input and format text in a word processing program with,
the companies claim, speed and accuracy that now rival the traditional keyboard and mouse.
Dragon Systems introduced the first general-purpose continuous-speech recognition program for
the personal computer (PC) in June 1997. IBM Corp. followed soon after with the introduction
of IBM ViaVoice. There is no doubt that speech recognition software has improved significantly
in a short time. Most current speech recognition applications offer large active vocabularies.
Also, the speech recognition engines have become robust. More importantly, a user can now
dictate directly into most popular applications like Microsoft Word (Alwang, 2020).
History of Speech Recognition
In its most basic form, speech recognition involves the process of a computer matching an
acoustic signal to some text. While this may sound relatively simple, speech recognition software
development spans a huge range of scientific disciplines, from linguistics and biology to
computer science and artificial intelligence. The ultimate aim of those working on speech
recognition is to produce a system that enables humans to communicate with computers as they
would with other humans, i.e., using natural speech (Rodman, 2021).
Voice and speech recognition have been around since the early 1970s, when research was
conducted on these technologies at the Defense Advanced Research Projects Agency (DARPA)
(Katz, 1993). While commercial applications existed in the '80s and early '90s, they were cost
and technology prohibitive. Today's microprocessor technology, though, has brought voice and
speech recognition out of the laboratories, and into the mainstream consumer market (Lorek,
2022).
Although many companies have attempted over the last 15 years to introduce low-cost products
utilizing speech recognition, such products have been few and far between and market failures
have been common. For example, the toy industry has historically been plagued by poor speech
recognition accuracy, leading to a very high percentage of returns on products such as
voice-controlled cars (Markowitz, 2019).
Specific Problem
In the early 1950s, the rationale behind computerization was to gain productivity improvements
associated with the substitution of machinery for labor. However, little direct evidence is
available of the relationship between information technology investment and performance. When
a relationship is found, the results cannot be generalized beyond the particular industry studied
(Katz, 2019).
The obvious growth of computer information-processing industries since the 1950s might
suggest that every expectation of productivity payoffs has been fulfilled. With quality-adjusted
investment in new computer equipment near $500 billion during the 1990s, U.S. firms have
clearly embraced the computer. The problem, however, is that economy-wide productivity
growth remains well below historic averages (Brynjolfsson, 2020). The rise in computer
investment coupled with slow growth in productivity is commonly referred to as the "Computer
Productivity Paradox" (Brynjolfsson, 2020).
Challenges Faced
Developing accurate speech to text translation systems poses several challenges. Some of these
challenges include language-specific nuances, speaker adaptation, dealing with background
noise, handling multiple voices in a conversation, and extracting context from speech to improve
translation quality.
1.2 Problem Statement
The problem statement in speech to text translation revolves around improving accuracy and
efficiency in converting spoken language into written text. Researchers aim to reduce errors
caused by various factors and enhance the overall usability of speech to text translation systems.
1.3 Aim and Objectives
The aim of this study is to assess the current state of speech to text translation technology and
identify areas for improvement. The objectives include analyzing the challenges faced in
achieving high accuracy, evaluating existing methodologies and techniques used in speech to
text translation, and proposing potential solutions to enhance the performance of these systems.
1.4 Scope and Limitations
This study focuses on speech to text translation technology and its applications in various fields.
However, it is important to note that the scope of the study may be limited by the availability of
resources, the specific languages and dialects considered, and access to advanced tools and
technologies.
1.5 Justification of the Study
The Speech to Text Translator system has gained significant attention due to its potential to
overcome language barriers and enhance communication. This system utilizes voice recognition
technology to transcribe spoken words into text, which can then be translated into another
language. This has immense potential in various fields, such as education, healthcare, business,
and international relations.
1.6 Organization of the Project

This project will be organized in the following way:
Chapter 2 reviews the literature and related work.
Chapter 3 presents the methodology and introduces types of machine learning algorithms, one of
which will be used to build the model.
Chapter 4 is implementation and testing.
Chapter 5 includes the summary, conclusion, and recommendation.
