CHAPTER ONE: INTRODUCTION

1.0 Introduction

Speech to text translation is a technology that converts spoken language into written text. With

advancements in natural language processing and artificial intelligence, speech to text translation

has gained significant importance in various industries and applications. This chapter covers the

main setback, key benefit, uses, challenges, problem statement, aim and objectives, and scope and

limitations of speech to text translation.

Speech recognition is one of the fastest-growing engineering innovations, with uses and potential benefits in a variety of fields. This project has been planned and built with that fact in mind. Nearly 20% of the world's population has some kind of disability, and a large number of these people are blind or unable to use their hands properly. People who have difficulty reading may, for example, listen to an article read aloud by an accessible device. In such situations, speech recognition systems come in handy, allowing users to exchange information with others and to operate a device using voice input. A speech to text (STT) system converts speech into text in a human language, and a text to speech (TTS) system converts text into speech in a human language. The proposed device is a hardware solution for synthesizing speech and allowing voice access to digital content. Our goal is a system capable of both speech recognition, converting audio input to text, and speech synthesis, converting text input to audio.


Setback

One of the major setbacks in speech to text translation is accuracy. Achieving high accuracy

levels in understanding and translating spoken language can be challenging due to factors such as

background noise, accents, dialects, and variations in speech patterns. This setback has prompted

researchers and developers to strive towards improving the accuracy of speech to text translation

systems.
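
Accuracy in speech to text systems is commonly reported as the word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation, not the method used in this project. The function name word_error_rate and the sample sentences are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name); words are compared case-insensitively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the recognizer drops one word out of six.
wer = word_error_rate("the patient has a mild fever", "the patient has mild fever")
print(round(wer, 3))  # 0.167
```

A lower WER means a more accurate transcript. A value of 0.0 means the output matches the reference exactly, and the value can exceed 1.0 when the output contains many extra words.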

Key Benefit

The key benefit of speech to text translation is the time it saves. By converting speech into

text, individuals can quickly document conversations, meetings, and interviews without the need

for manual transcription. This technology proves to be highly beneficial for people with

disabilities, as it enables them to communicate effectively in written form.

How it is used

Speech to text translation is used in a variety of applications across different domains. For

instance, in healthcare, doctors can dictate patient notes, allowing for efficient medical record

keeping. In customer service, speech to text translation aids in transcribing customer calls,

enabling better analysis and quality control. In education, students can record lectures and later

convert them into text to review key points.

1.1 Background of the Study

Speech to text translation technology has been evolving for several decades, with researchers and

developers constantly working towards improving its accuracy, speed, and reliability.
Advancements in machine learning and deep learning algorithms have significantly enhanced the

performance and usability of these systems.

Speech recognition (SR), also known as "automatic speech recognition" (ASR) or "speech to text" (STT), is a technology that converts spoken words or dictation into text.

- Speech recognition is the process of translating an acoustic signal, captured by a microphone or other peripheral, into a series of words.

- Linguistic processing can be used to enhance speech comprehension.

- For applications such as command and control, data entry, and document preparation, the recognized words may be an end in themselves.

Mention speech recognition today, and it is almost inevitable that someone will point to HAL,

the computer from 2001: A Space Odyssey. This illustration of where the technology is headed

has lulled many information technology (IT) managers into ignoring speech recognition because

it is obvious that computers that can hold an intelligent conversation will remain science fiction

for a long time. The fact is that practical, usable speech recognition products are here now. For

example, several companies now sell continuous speech recognition applications that offer the

everyday computer user the ability to input and format text in a word processing program with,

the companies claim, speed and accuracy that now rival the traditional keyboard and mouse.

Dragon Systems introduced the first general-purpose continuous-speech recognition program for

the personal computer (PC) in June 1997. IBM Corp. followed soon after with the introduction

of IBM ViaVoice. There is no doubt that speech recognition software has improved significantly

in a short time. Most current speech recognition applications offer large active vocabularies.
Also, speech recognition engines have become more robust. More importantly, a user can now

dictate directly into most popular applications like Microsoft Word (Alwang, 2020).

History of Speech Recognition

In its most basic form, speech recognition involves the process of a computer matching an

acoustic signal to some text. While this may sound relatively simple, speech recognition software

development spans a huge range of scientific disciplines, from linguistics and biology to

computer science and artificial intelligence. The ultimate aim of those working on speech

recognition is to produce a system that enables humans to communicate with computers as they

would with other humans, that is, using natural speech (Rodman, 2021).

Voice and speech recognition have been around since the early 1970s, when research was

conducted on these technologies at the Defense Advanced Research Projects Agency (DARPA)

(Katz, 1993). While commercial applications existed in the '80s and early '90s, they were

prohibitively expensive and technically limited. Today's microprocessor technology, though, has brought voice and

speech recognition out of the laboratories, and into the mainstream consumer market (Lorek,

2022).

Although many companies have tried over the last 15 years to introduce low-cost products that use speech recognition, such products have been few and far between, and market failures have been common. For example, the toy industry has historically been plagued by poor speech recognition accuracy, leading to a very high percentage of returns on products such as voice-controlled cars (Markowitz, 2019).

Specific Problem
In the early 1950s, the rationale behind computerization was to gain productivity improvements

associated with the substitution of machinery for labor. However, little direct evidence is

available of the relationship between information technology investment and performance. When

a relationship is found, the results cannot be generalized beyond the particular industry studied

(Katz, 2019).

The obvious growth of computer information-processing industries since the 1950s might

suggest that every expectation of productivity payoffs has been fulfilled. With quality-adjusted

investment in new computer equipment near $500 billion during the 1990s, U.S. firms have

clearly embraced the computer. The problem, however, is that economy-wide productivity

growth remains well below historic averages (Brynjolfsson, 2020). The rise in computer

investment coupled with slow growth in productivity is commonly referred to as the "Computer

Productivity Paradox" (Brynjolfsson, 2020).

Challenges Faced

Developing accurate speech to text translation systems poses several challenges. Some of these

challenges include language-specific nuances, speaker adaptation, dealing with background

noise, handling multiple voices in a conversation, and extracting context from speech to improve

translation quality.

1.2 Problem Statement

The problem statement in speech to text translation revolves around improving accuracy and

efficiency in converting spoken language into written text. Researchers aim to reduce errors

caused by various factors and enhance the overall usability of speech to text translation systems.

1.3 Aim and Objectives

The aim of this study is to assess the current state of speech to text translation technology and

identify areas for improvement. The objectives include analyzing the challenges faced in

achieving high accuracy, evaluating existing methodologies and techniques used in speech to

text translation, and proposing potential solutions to enhance the performance of these systems.

1.4 Scope and Limitations

This study focuses on speech to text translation technology and its applications in various fields.

However, the scope of the study may be limited by the availability of resources, the specific languages and dialects considered, and access to advanced tools and technologies.

1.5 Justification of the Study

The Speech to Text Translator system has gained significant attention due to its potential to

overcome language barriers and enhance communication. This system utilizes voice recognition

technology to transcribe spoken words into text, which can then be translated into another

language. This has immense potential in various fields, such as education, healthcare, business,

and international relations.

1.6 Organization of the Project


This project will be organized in the following way:

Chapter 2 reviews the literature and related work.

Chapter 3 presents the methodology and introduces types of machine learning algorithms, one of which will be used to build the model.

Chapter 4 covers implementation and testing.

Chapter 5 presents the summary, conclusion, and recommendations.
