
Name : Rajlaxmi Roy

Registration Number : 18BCE1080


Course : Natural Language Processing

DIGITAL ASSIGNMENT – 03

Question:
Part 1 : As part of the Recent Trends in NLP, tabulate the weaknesses of some recent online
applications of NLP.
Part 2 : Describe which layer contributes to the problem and how.
Part 3 : List the kind of additional knowledge or training that might be added to these NLP
applications to overcome the shortcomings.

Answer :

Five of the most famous NLP applications are :


1. Sentiment Analysis
2. Machine Translation
3. Voice Recognition
4. Chat-bot
5. Text Correction

Application 1 : Sentiment Analysis


The goal of sentiment analysis is to identify sentiment across several posts, or even within the same
post where emotion is not always explicitly expressed. Companies use natural language processing
applications, such as sentiment analysis, to identify opinions and sentiment online to help them
understand what customers think about their products and services.

Limitations of automated sentiment analysis


Computer programs have problems recognizing things like sarcasm and irony, negations, jokes, and
exaggerations - the sorts of things a person would have little trouble identifying. And failing to
recognize these can skew the results.

Example 1 : "Disappointed" may be classified as a negative word for the purposes of sentiment
analysis, but within the phrase "I wasn't disappointed", it should be classified as positive.
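The negation problem can be reproduced with a toy lexicon-based scorer. The following is a minimal sketch, not a real sentiment system; the word lists are hypothetical illustrative examples:

```python
# Toy sentiment lexicon: word -> polarity score (illustrative only).
LEXICON = {"disappointed": -1, "terrible": -1, "loving": 1, "great": 1}
NEGATIONS = {"not", "wasn't", "isn't", "never", "no"}

def naive_score(tokens):
    # Sum raw word polarities, ignoring context.
    return sum(LEXICON.get(t, 0) for t in tokens)

def negation_aware_score(tokens):
    # Flip a word's polarity when the previous token is a negation word.
    score = 0
    for i, t in enumerate(tokens):
        s = LEXICON.get(t, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            s = -s
        score += s
    return score

tokens = "i wasn't disappointed".split()
```

Here the naive scorer marks the sentence negative because of the word "disappointed", while the negation-aware variant correctly flips it to positive.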

Example 2 : We would find it easy to recognize as sarcasm the statement "I'm really loving the
enormous pool at my hotel!", if this statement is accompanied by a photo of a tiny swimming pool;
whereas an automated sentiment analysis tool probably would not, and would most likely classify it
as an example of positive sentiment.
So, automated sentiment analysis tools do a really great job of analyzing text for opinion and
attitude, but they're not perfect.

Reasons that contribute to the problems


When validating a sentiment analysis system, the testing methodology is crucial. The data source,
cleanliness of language, how it is scored, subject matter and volume of data tested are all significant
variables that can dramatically affect results.
For an optimal test, the data source should closely match the intended uses. For example, if your
intended application is analysis of online dialog, the data used to test system accuracy should also
be sourced from online dialog.
Volume of data tested is also important, and a general rule of thumb here is “the more data the
better the test”.

In a nutshell, the main reason that contributes to the misinterpretation of the sentiment of a sentence
is that the data set is not trained effectively or is not suitable for the job.
Another reason is that many in the industry focus on a single metric: precision, often conflated with
accuracy. While certainly important, this measure alone does not tell anywhere close to the
whole story.

Overcome the shortcomings


Another metric, known as recall, is equally important to the understanding of how these systems
perform. Finally, there is F-Score or F-Measure, which is a more holistic account of overall
performance.
The formula for calculating F1 Score is:
F1 = 2 × (precision × recall) / (precision + recall)

The larger the sample set, the better.
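The metrics above can be computed directly from true-positive, false-positive, and false-negative counts; a minimal sketch in Python, where the counts in the example are made-up numbers:

```python
def precision_recall_f1(tp, fp, fn):
    # Precision: fraction of predicted positives that are correct.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: fraction of actual positives that were found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical evaluation counts for a sentiment classifier.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
```

Note how a system can have high precision (0.8 here) while recall lags behind; the F1 score penalizes that imbalance.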


Application 2 : Machine Translation

As the amount of information available online is growing, the need to access it becomes
increasingly important and the value of natural language processing applications becomes clear.
Machine translation helps us conquer language barriers that we often encounter by translating
technical manuals, support content or catalogs at a significantly reduced cost. The challenge with
machine translation technologies is not in translating words, but in understanding the meaning of
sentences to provide a true translation.

Limitations of machine translation


Machine translation does not offer accuracy on a consistent basis. You can get the gist of a draft or
document, but machine translation often translates word for word without comprehending the
information, so the output may have to be corrected manually later on.


Reasons that contribute to the problems

The fast progress of Machine Translation has boosted translation quality significantly, but,
unfortunately, machine translation approaches are not equally successful for all language pairs.
Morphologically rich languages are problematic in MT, especially if the translation is from a
morphologically less complex to a morphologically more complex language.
Morphological distinctions not present in the source language need to be generated in the target
language.

Overcome the shortcomings


Train your machine translation engine.
Translation Memory is a core component of any learning translation tool. Translation memory also
boosts the user’s productivity by working in conjunction with Dynamic Machine Learning to auto-
populate future translations based on previously translated content.
This is how to improve machine translation in the most cost-effective way.
The more Translation Memories you create over time, the better your translation quality becomes
and the less input you need to give.
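The core idea of a translation memory is a lookup over previously translated segments, with fuzzy matching for near-repeats. The following is a toy sketch only; the segments, translations, and threshold are made-up examples, not a real TM implementation:

```python
import difflib

class TranslationMemory:
    def __init__(self):
        self.memory = {}  # source segment -> previously approved target segment

    def add(self, source, target):
        self.memory[source] = target

    def lookup(self, source, threshold=0.8):
        # Exact match first.
        if source in self.memory:
            return self.memory[source], 1.0
        # Otherwise return the best fuzzy match above the threshold.
        best, best_ratio = None, 0.0
        for seg, target in self.memory.items():
            ratio = difflib.SequenceMatcher(None, source, seg).ratio()
            if ratio > best_ratio:
                best, best_ratio = target, ratio
        if best_ratio >= threshold:
            return best, best_ratio
        return None, best_ratio

tm = TranslationMemory()
tm.add("press the power button", "appuyez sur le bouton d'alimentation")
```

A near-identical new segment such as "press the power buttons" is then auto-populated from the stored translation instead of being retranslated from scratch.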
Application 3 : Voice Recognition

NLP refers to the evolving set of computer and AI-based technologies that allow computers to learn,
understand, and produce content in human languages. The technology works closely with
speech/voice recognition and text recognition engines. While text/character recognition and
speech/voice recognition allow computers to take in information, NLP makes sense of
this information.

Limitations of Voice Recognition


1. Lack of Accuracy and Misinterpretation
Voice recognition software won't always put your words on the screen completely accurately.
Programs cannot understand the context of language the way that humans can, leading to errors that
are often due to misinterpretation. When you talk to people, they decode what you say and give it a
meaning. Voice recognition software can do this but may not be capable of choosing the correct
meaning.
Example: it cannot always differentiate between homonyms, such as "their" and "there." It may also
have problems with slang, technical words and acronyms.

2. Accents and Speech Recognition


Voice recognition systems can have problems with accents. Even though some may learn to decode
your speech over time, you have to learn to talk consistently and clearly at all times to minimize
errors. If you mumble, talk too fast or run words into each other, the software will not always be
able to cope. Programs may also have problems recognizing speech as normal if your voice
changes, say when you have a cold, cough, sinus or throat problem.

3. Background Noise Interference


To get the best out of voice recognition software, you need a quiet environment. Systems don't work
so well if there is a lot of background noise. They may not be able to differentiate between your
speech, other people talking and other ambient noise, leading to transcription mix-ups and errors.
This can cause problems if you work in a busy office or noisy environment. Wearing close-talking
microphones or noise-canceling headsets can help the system focus on your speech.

Reasons that contribute to the problems


Speech recognition offers many useful applications that can make day-to-day activities easier.
Whether it is used to search for something online, unlock a smartphone, or operate a car
infotainment system: More and more programs use voice recordings.
Since every person speaks differently based on their dialect, individual mannerisms, or potential
speech impediments, the program needs to be trained to recognize the same words in various
iterations. This is why the human factor plays such an important role in gathering speech
recognition training data. Simply using one recording to train the system would not yield the desired
results.
Programming that does not include “human reason” and “human behavior” factors cannot lead to an
ideal speech recognition system. In many cases, the users’ voice commands are not recognized, or
they are misunderstood.

Overcome the shortcomings


Speech recordings of thousands of different people with their individual commands and
pronunciations are needed to optimize the range of the system for it to be able to recognize the
individual voice commands of potential users.
Create efficient data sets to improve speech recognition software.
Application 4 : Chat-bot

Machine learning is revolutionizing the current organizational structure by offering solutions to
limited manpower capacity at reduced costs with better results. One such tool developed to enhance
customer interaction, which is gradually becoming a part of our lives, is the chat-bot.

Limitations of Chat-bot
Inability to Understand
Because of their fixed programming, chat-bots can get stuck when presented with a query they were
not trained on. This can lead to customer dissatisfaction and result in lost business. Repeated back-
and-forth messaging can also be taxing for users and degrade the overall experience on the website.

Time-Consuming
Chat-bots are installed with the motive of speeding up responses and improving customer
interaction. However, due to limited data availability and the time required for self-updating, the
process can appear slow and expensive. Instead of attending to several customers at a time, chat-
bots can appear confused about how to communicate with people.

Zero decision-making
Chat-bots are infamous for their inability to make decisions. A similar situation landed big
companies like Microsoft in trouble when their chat-bot went on a racist rant. It is therefore critical
to ensure proper programming of your chat-bot to prevent any such incident that could harm your
brand.

Reasons that contribute to the problems


1. Understanding human language : Human interaction is complicated, and that has a lot to do with
how rich and diverse human languages are. Conversational datasets allow chat-bots to learn from a
large number of examples, from which they can learn sentence construction. Such datasets also
allow chat-bots to learn exceptions to grammar rules.

2. Tone detection: As native speakers of a language, we understand which words signify which
tones. We understand which statements represent happiness or sadness and pleasure or anger. While
these things are simple to us, they need to be ingrained into a chat-bot. We can’t have a chat-bot
responding to an “I’ve been having a bad day.” with an “I’m so happy for you!”
Understanding tone matters a lot while we communicate, and it ought to matter for intelligent
beings trying to interact with us.

3. Clean conversational data : If the training datasets aren’t clean or free of issues, do not expect
your AI/ML model to function as intended. With conversational AI, the clarity and cleanness of its
training data determines its ability to interact fluently with people.
Common issues with chat-bot training data include:
• Incorrect punctuation
• Inaccurate word choices
• Unintelligible sentences
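Some of these surface issues can be normalized with a small cleaning pass before training. The following is a minimal sketch under stated assumptions; the regular expressions shown are illustrative, not a production data-cleaning pipeline:

```python
import re

def clean_utterance(text):
    # Collapse repeated punctuation ("!!!" -> "!").
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    # Normalize runs of whitespace to single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Ensure a space after sentence punctuation that runs into the next word.
    text = re.sub(r"([!?.,])(?=\S)", r"\1 ", text)
    return text
```

For example, `clean_utterance("hello!!!how are   you??")` normalizes the punctuation and spacing in one pass; deeper issues like inaccurate word choices still require human review.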

Overcome the shortcomings


Chat-bot needs to be fed the relevant datasets.
A chat bot gets defined by the training data it consumes. It truly becomes what it eats. Chat bots are
being adopted all across numerous areas of our lives, and results have shown that we like
interacting with these intelligent beings. They make the interaction between people and
organizations simpler. They enhance customer service and improve overall efficiency.
Application 5 : Text Correction

When dealing with text, we may need to deal with incorrect text.
A simple approach is to compare the typed word against a large dictionary; if the word is not found
there, find the dictionary words closest to the typed word to display as "suggestions", and possibly
use the closest of all to "auto-correct".

This idea of how “close” two words are to one another is complicated. There is an entire branch of
Computer Science devoted to finding good ways to do this.
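One standard measure of closeness is the Levenshtein edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one word into another, computed by dynamic programming. A minimal sketch (the dictionary below is a toy example):

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def suggest(word, dictionary):
    # Return dictionary words sorted by closeness to the typed word.
    return sorted(dictionary, key=lambda w: edit_distance(word, w))
```

A spellchecker built this way would rank "hello" above "world" as a suggestion for the typo "helo", since only one insertion is needed.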

Limitations of Text Correction


Typically, if you have misspelled a word the spellchecker will offer a list of alternatives. Unless
your initial attempt is reasonably close to the correct spelling, you are unlikely to be offered
sensible alternatives, and, even if you are, you have to be able to make sense of what is on offer.

Example : You may correctly spell a word but simply use the wrong one; for example, 'After I had
eaten my super I went straight to bed.' A spellchecker will not spot that it should be 'supper' not
'super'.

Reasons that contribute to the problems


The main reason why the spell checker is not working as expected might be that the model was not
exposed to a sufficiently diverse and large set of misspellings during training.

Overcome the shortcomings


There are no such large datasets of diverse misspellings publicly available for training a general-
domain spell checker.
One idea here is to artificially generate noisy text from clean text.
Method to implement : For example, we can take some clean text (which is available from, for
example, scraped web text almost indefinitely) and replace some letters at random. If you pair
artificially-generated noisy text created this way with the original, clean text, this will effectively
create a new, larger dataset on which you can train an even better spell checker!

The remaining issue we need to address is how to "corrupt" clean text to generate realistic spelling
errors that look like the ones made by humans. We can write a Python script that, for example,
replaces, deletes, and/or swaps letters at random, although there is no guarantee that typos made this
way are similar to those made by humans, or that the resulting artificial dataset provides useful
training signal for the Transformer model.
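A corruption script of this kind can be sketched in a few lines of Python. This is a toy illustration only; the corruption probability, the choice of operations, and the replacement alphabet are all assumptions:

```python
import random

def corrupt(text, p=0.1, seed=0):
    # Randomly replace, delete, or swap letters to simulate typos.
    # A fixed seed keeps the noisy/clean pairing reproducible.
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < p:
            op = rng.choice(["replace", "delete", "swap"])
            if op == "replace":
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            elif op == "swap" and i + 1 < len(chars):
                # Transpose this character with the next one.
                out.append(chars[i + 1])
                out.append(c)
                i += 1
            # "delete" (or "swap" on the last char) appends nothing.
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Pairing `corrupt(clean_text)` with `clean_text` yields the artificial (noisy, clean) training pairs described above.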
Method to implement : If we "flip" the direction of the original dataset we used to train the spell
checker, we can model how humans make typos. If you treat the clean text as the source language
and the noisy text as the target, and train a Seq2Seq model in that direction, you are effectively
training a "spell corrupter": a Seq2Seq model that inserts realistic-looking spelling errors into clean
text.
This technique of using the "inverse" of the original training data to artificially generate a large
amount of data in the source language from a real corpus in the target language is called back-
translation in the machine translation literature. It is a very common, popular technique to improve
the quality of machine translation systems.
