B.Tech(C.S.E) - 7th Sem

Speech and Natural Language Processing

Submitted By:
Name- Megha Yadav
Roll No.- CSE(L)-20/002

Submitted To:
Ms. Anjali Ma’am

 Machine Translation:
Machine Translation (MT) is the task of automatically converting text in one natural language
into another, preserving the meaning of the input text, and producing fluent text in the output
language. While machine translation is one of the oldest subfields of artificial intelligence
research, the recent shift towards large-scale empirical techniques has led to very significant
improvements in translation quality. The Stanford Machine Translation group's research
interests lie in techniques that utilize both statistical methods and deep linguistic analyses.

What is machine translation and how does it work?

Machine translation (MT), sometimes called automated translation, is the process by which
computer software translates text from one language into another without human assistance.
At its most basic level, machine translation simply substitutes individual words in one
language with words in another.
Corpus-based approaches make more complex translation possible: they handle differences in
linguistic typology better, recognise phrases, translate idioms, and isolate anomalies.
Current systems do not yet perform as well as a human translator, although this may become
achievable in the future.
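
The most basic form of machine translation described above, word-for-word substitution, can be sketched in a few lines of Python. This is only an illustrative sketch: the tiny English-to-Spanish lexicon and the function name `translate_word_for_word` are assumptions made for the example, not part of any real MT system.

```python
# Toy English-to-Spanish lexicon; the entries are illustrative assumptions.
LEXICON = {
    "the": "el",
    "cat": "gato",
    "drinks": "bebe",
    "milk": "leche",
}


def translate_word_for_word(sentence, lexicon):
    """Replace each word with its lexicon entry, leaving unknown words unchanged."""
    words = sentence.lower().split()
    return " ".join(lexicon.get(word, word) for word in words)


print(translate_word_for_word("The cat drinks milk", LEXICON))
```

Because each word is replaced in isolation, an idiom such as "it is raining cats and dogs" would be translated literally and lose its meaning. This is the kind of limitation that corpus-based approaches aim to overcome.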

Applications of machine translation:

1) Translation of text:
Automated text translation is widely used in both sentence-level and text-level translation
tools. Sentence-level applications include the translation of query and retrieval inputs and
the translation of optical character recognition (OCR) output from images. Text-level
applications include the translation of plain documents as well as documents containing
structured information.

2) Speech translation:
With the rapid development of mobile applications, voice input has become a convenient mode
of human-computer interaction, and speech translation has grown into an important
application scenario. The basic pipeline of speech translation is: source-language speech ->
source-language text -> target-language text -> target-language speech.
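
The speech translation cycle described above (speech to text, text to text, text to speech) can be viewed as three stages chained together. The sketch below only shows that structure. All three stage functions are hypothetical placeholders: a real system would call a speech recogniser, a translation model, and a speech synthesiser, while here bytes simply stand in for audio.

```python
def recognize_speech(audio):
    """Placeholder ASR stage: pretend the audio bytes are UTF-8 text."""
    return audio.decode("utf-8")


def translate_text(text):
    """Placeholder MT stage: look the whole phrase up in a tiny phrase table."""
    phrase_table = {"hello": "hola", "thank you": "gracias"}  # illustrative entries
    return phrase_table.get(text.lower(), text)


def synthesize_speech(text):
    """Placeholder TTS stage: encode text as bytes to stand in for audio."""
    return text.encode("utf-8")


def speech_to_speech(audio):
    """Chain the stages: source speech -> source text -> target text -> target speech."""
    source_text = recognize_speech(audio)
    target_text = translate_text(source_text)
    return synthesize_speech(target_text)


print(speech_to_speech(b"Hello").decode("utf-8"))
```

Keeping the stages separate means each one can be replaced or improved on its own, but it also means an error in the recognition stage is passed on to translation and synthesis.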

 User Interfaces:
 A natural-language user interface is a type of computer user interface in which
linguistic elements such as verbs, phrases, and clauses act as UI controls for creating,
selecting, and modifying data in software applications.
 In interface design, natural-language interfaces are sought after for their speed and
ease of use, but most struggle to understand the wide variety of ambiguous input.
 Natural-language interfaces are an active area of research in both computational
linguistics and natural-language processing. An intuitive, general natural-language
interface is one of the active goals of the Semantic Web.
 Text interfaces are "natural" to varying degrees. Many formal (unnatural)
programming languages incorporate idioms of everyday English. Likewise, a typical
keyword search engine could be described as a "shallow" natural-language user interface.
Uses and applications:
 Dictation: This is the most common use of automated speech recognition (ASR) systems
today. It includes medical transcription, legal and business dictation, and general
word processing. In some cases special vocabularies are used to increase the accuracy
of the system.
 Command and control: ASR systems that are designed to perform functions and
actions on the system are called command and control systems. Utterances like
"Open Netscape" and "Start a new xterm" will do just that.
 Telephony: Some PBX/voice mail systems allow callers to speak commands instead
of pressing buttons to send specific tones.
 Wearables: Because inputs are limited on wearable devices, speaking is a natural
way to interact with them.
 Medical/disabilities: Many people have difficulty typing due to physical limitations
such as repetitive strain injuries (RSI), muscular dystrophy, and many others. Similarly,
people with difficulty hearing could use a system connected to their
telephone to convert a caller's speech to text.
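
A command and control system, as described in the list above, can be sketched as a lookup from a recognised utterance to an action. The example assumes the speech recogniser has already produced a text transcript. The action functions and the `COMMANDS` table are hypothetical and only return strings instead of launching real programs.

```python
def open_browser():
    """Stand-in for launching a browser."""
    return "browser opened"


def start_terminal():
    """Stand-in for launching a terminal."""
    return "terminal started"


# Maps a normalised utterance to the action it should trigger.
COMMANDS = {
    "open netscape": open_browser,
    "start a new xterm": start_terminal,
}


def handle_utterance(transcript):
    """Normalise case and whitespace, then run the matching action if there is one."""
    key = " ".join(transcript.lower().split())
    action = COMMANDS.get(key)
    if action is None:
        return "unrecognized command"
    return action()


print(handle_utterance("Open Netscape"))
```

Restricting the system to a small, fixed set of commands is one reason command and control is easier than open dictation: the recogniser only has to choose among a few known utterances.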

 Man-Machine interfaces:
A man-machine interface can be defined as the mediator between users and machines. It is a
system that handles all aspects of communication: it presents the machine's "knowledge,"
functionality, and available information in a form compatible with the end user's
communication channels, and it converts the user's actions (user input) into a form
(instructions/commands) that the machine can understand.

As increasingly sophisticated systems, products, and services arrive on the market, more
user-friendly man-machine interfaces are becoming increasingly important for their use and,
in turn, for their market success. The development of graphical user interfaces, audio-based
interaction, speech synthesis and understanding, natural languages, direct manipulation, and
multimodal interaction dialogues, together with ergonomics and human factors evaluation, has
produced more powerful, complex, and demanding user interfaces. Recently introduced concepts
are expected to have a significant impact on future developments in man-machine interaction
technology. These include alternative interaction methods and input/output devices,
user-tailored environments (interfaces compatible with the user's available communication
channels), user-support environments (able to assist the user during the interaction
process), 3D direct manipulation, multimodal-multimedia interfaces (providing concurrent
interaction by means of different media), virtual environments (realising interaction
environments far beyond what is physically possible), and cooperative environments (providing
concurrent access to the same environment for several local or remote users, allowing
co-operation and collaboration between them).

Potential users of computer systems and telecommunications equipment will need to use
these cutting-edge technologies, interacting and communicating in a multimodal, multiprocess,
and collaborative environment. Disabled and elderly users may face difficulties when an
interface relies on modalities that are unavailable to them or demands strong cognitive and
interaction skills. On the other hand, such difficulties can be reduced by the availability of
interface entities in redundant forms (messages, selection menus, workspaces, etc.), the
accessibility of a variety of interaction techniques that make use of alternative modalities
(e.g., selection, position, quantity), and the choice of suitable interaction metaphors (e.g.,
the desktop metaphor used in most currently available graphical user interfaces).

 Natural language querying:

A natural language query consists only of normal terms in the user’s language, without any
special syntax or format. ATG Search enables users to enter terms in any format, such as a
statement, a question, or a straightforward list of keywords.

A natural language query is entered using only words or phrases as they would normally be
spoken or written, without non-language characters such as the plus sign or the asterisk and
without any special formatting or syntax. Natural language queries can be made through a
text or voice interface.

Depending on the application, natural language processing (NLP) enables software to
"understand" ordinary human speech or written text as input and, in some cases, to respond to
it. For instance, a virtual assistant can reply to text or spoken commands. However, since no
computer programme can extract meaning directly from human language, NLP uses methods that
convert between human language and representations a machine can process.

NLP uses syntactic techniques such as parsing for grammatical analysis, word segmentation to
divide text into smaller units, sentence breaking to place boundaries in unbroken
text, morphological segmentation to identify the structure and form of words, and stemming
to reduce words to the stems to which suffixes and prefixes attach. Along with these
techniques, NLP employs methods such as named entity recognition (NER) and word sense
disambiguation to interpret user queries, and natural language generation (NLG) to return
human-understandable responses.
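
Three of the techniques named above (sentence breaking, word segmentation, and stemming) can be illustrated with a short Python sketch. It is deliberately simplified: the regular expressions and the three-suffix list are assumptions chosen for the example, and real stemmers such as the Porter stemmer apply far more rules.

```python
import re

# Deliberately tiny suffix list, for illustration only.
SUFFIXES = ("ing", "ed", "s")


def split_sentences(text):
    """Sentence breaking: split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Word segmentation: lowercase the sentence and pull out word-like tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def stem(word):
    """Stemming: strip the first matching suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "The parser handled nested clauses. It is parsing them now!"
for sentence in split_sentences(text):
    print([stem(token) for token in tokenize(sentence)])
```

Note that a crude stemmer like this produces stems such as "handl" and "pars", which are not dictionary words. That is normal for stemming, whose goal is only to map related word forms to the same string.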

Tutoring and authoring systems:

A software programme known as an intelligent tutoring system (ITS) supports the idea of
"learning by doing" by attempting to mimic the performance of a human tutor. A student's
performance can be improved by ITSs in a variety of academic areas. Despite their
advantages, ITSs have not been widely adopted because of the difficulty in developing them.
An ITS must be created from scratch, which necessitates knowledge of computer science,
cognitive psychology, and artificial intelligence. A variety of authoring tools have been
created in an effort to lower the level of expertise needed to create ITSs. In this thesis, I
outline a number of contributions to the field of intelligent tutoring, including updates to an
existing ITS authoring tool, analyses of authoring tool paradigms, and the creation of
authoring tools for non-programmers in the challenging fields of natural language processing
and 3D game environments. One authoring tool, the Extensible Problem Specific Tutor
(xPST), facilitates the rapid creation of model-tracing-like tutors on existing
interfaces such as websites. xPST's language was made more expressive by adding new
checktypes needed for answer checking in problems from fields such as geometry and
statistics.

ITS Authoring with Natural Language:

Many ITS have been developed for mathematically well-formed topics, including algebra,
geometry, programming languages, and physics.
Unfortunately, even with specialised authoring tools, building mathematically oriented ITS
is costly: development can take as much as 100 hours of work for 1 hour of instruction.
However, over the past ten years, a number of ITS have been developed for knowledge domains
that have a natural language foundation rather than a foundation in mathematics or precise
analytic reasoning. The learning gains of these natural language ITS are in line with the
sizable effects reported in ITS meta-analyses.

These conversational ITS based in natural language share two defining attributes. First, they
are based on naturalistic observations and computational modeling of human tutoring
strategies embedded in tutorial dialogue. A common strategy is the so-called five-step
dialogue frame:

1) Tutor asks a deep reasoning question,
2) Student gives an answer,
3) Tutor gives immediate feedback or pumps the student,
4) Tutor and student collaboratively elaborate an answer, and
5) Tutor assesses the student’s understanding.

The five-step dialogue frame illustrates the other defining attribute of natural language ITS,
which is their interactive and collaborative nature: the tutor and student are co-constructing
an explanation together. According to theories of learning, the interactive and collaborative
nature of tutoring is what makes it more effective than activities like individual problem
solving.

 Speech Recognition:
Speech recognition, also known as automatic speech recognition (ASR), computer speech
recognition, or speech-to-text, is the capacity of a programme to convert spoken language
into written text. Speech recognition is frequently confused with voice recognition, but
speech recognition focuses on converting speech from a verbal format to a text format, whereas
voice recognition only seeks to identify the voice of an individual user.

Key features of effective speech recognition:

There are numerous speech recognition applications and devices, but the most sophisticated
ones make use of artificial intelligence and machine learning. To understand and process
human speech, they combine the grammar, syntax, structure, and composition of audio and voice
signals. Ideally, they learn as they go, refining their responses with each interaction.

The best systems also allow organizations to customize and adapt the technology to
their specific requirements, from language and nuances of speech to brand
recognition. For example:

 Language weighting: Improve precision by weighting specific words that are
spoken frequently (such as product names or industry jargon), beyond terms already
in the base vocabulary.
 Speaker labeling: Output a transcription that cites or tags each speaker’s
contributions to a multi-participant conversation.
 Acoustics training: Attend to the acoustical side of the business. Train the system
to adapt to an acoustic environment (like the ambient noise in a call center) and
speaker styles (like voice pitch, volume and pace).
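
The idea behind language weighting can be sketched as rescoring the recogniser's candidate transcripts so that candidates containing favoured words win. This is a simplified sketch, not how any particular product implements it: the function `rescore`, the fixed `boost` value, the candidate scores, and the product name "zentro" are all assumptions made for the example.

```python
def rescore(candidates, boosted_terms, boost=0.1):
    """Return the transcript with the highest score after boosting favoured words.

    candidates: list of (transcript, base_score) pairs from a recogniser
    boosted_terms: set of lowercase words (e.g. product names) to favour
    """
    def boosted_score(item):
        transcript, base_score = item
        hits = sum(1 for word in transcript.lower().split() if word in boosted_terms)
        return base_score + boost * hits

    return max(candidates, key=boosted_score)[0]


# Hypothetical recogniser output: the correct transcript has the lower base score.
candidates = [
    ("please open my zen tro account", 0.52),
    ("please open my zentro account", 0.48),
]
print(rescore(candidates, {"zentro"}))
```

Without the boost, the first candidate would be chosen because its base score is higher. With "zentro" weighted, the second candidate overtakes it.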

 Commercial use of NLP:

1. Sentiment Analysis:
Because people frequently use sarcasm and irony when expressing opinions, natural
language processing by machines can be particularly challenging. Sentiment analysis, however,
can pick up on subtle variations in feelings and opinions and assess how positive or
negative they are.
2. Text Classification:
Text classification involves automatically understanding, interpreting, and categorising
unstructured text. It is a text analysis task, and sentiment analysis is one instance of it.
3. Chatbots & Virtual Assistants:
Chatbots and virtual assistants are used for automatic question answering, designed to
understand natural language and deliver an appropriate response through natural language
generation.
4. Text Extraction:
Text extraction, or information extraction, automatically detects specific information in a
text, such as names, companies, places, and more. This is also known as named entity
recognition. You can also extract keywords within a text, as well as pre-defined features such
as product serial numbers and models.

5. Machine Translation:
Machine translation (MT) is one of the first applications of natural language processing. Even
though Facebook's translations have been declared superhuman, machine translation still
faces the challenge of understanding context.
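
To illustrate the sentiment analysis item in the list above, the following sketch scores a text by counting positive and negative words. It is a minimal lexicon-based sketch, not a production method: the word lists are assumptions, and negation is handled only for the word immediately after a negator.

```python
# Small illustrative word lists; real sentiment lexicons are much larger.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Return a score above 0 for positive text and below 0 for negative text."""
    score = 0
    negate = False
    for raw_word in text.lower().split():
        word = raw_word.strip(".,!?")
        if word in NEGATORS:
            negate = True  # flip the polarity of the next word only
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score


print(sentiment_score("I love this great phone"))
print(sentiment_score("Oh great, it broke again."))
```

The second sentence is sarcastic, yet the word "great" still gives it a positive score. This shows why sarcasm and irony make sentiment analysis difficult for simple word-counting methods.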

