Application 1: Sentiment Analysis
DIGITAL ASSIGNMENT – 03
Question:
Part 1 : As part of Recent Trends in NLP, tabulate the weaknesses of some recent online
applications of NLP.
Part 2 : Describe which layer contributes to the problem, and how.
Part 3 : List the kind of additional knowledge or training that might be given to these NLP
applications to overcome the shortcomings.
Answer :
Example 1 : 'Disappointed' may be classified as a negative word for the purposes of sentiment
analysis, but within the phrase “I wasn't disappointed”, it should be classified as positive.
Example 2 : We would find it easy to recognize the statement “I'm really loving the enormous
pool at my hotel!” as sarcasm if it were accompanied by a photo of a tiny swimming pool,
whereas an automated sentiment analysis tool probably would not, and would most likely classify
it as an example of positive sentiment.
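Both shortcomings can be reproduced with a toy word-counting classifier. The word lists below are illustrative assumptions, not a real sentiment lexicon:

```python
# A naive lexicon-based sentiment scorer (the word lists are illustrative
# assumptions), showing why negation and sarcasm break word counting.

POSITIVE = {"loving", "great", "happy", "enormous"}
NEGATIVE = {"disappointed", "bad", "sad"}

def naive_sentiment(text):
    words = [w.strip(".,!?").lower() for w in text.replace("'", "").split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Negation ignored: the sentence is positive, but the scorer says negative.
print(naive_sentiment("I wasn't disappointed"))  # negative
# Sarcasm invisible: the scorer sees only positive-looking words.
print(naive_sentiment("I'm really loving the enormous pool at my hotel!"))  # positive
```

The scorer treats each word in isolation, which is exactly why both examples above are misclassified.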
So, automated sentiment analysis tools do a good job of analyzing text for opinion and
attitude, but they are not perfect.
In a nutshell, the main reason the sentiment of a sentence is misinterpreted is that the data set
is not trained effectively or is not suitable for the task.
Another reason is that many in the industry focus on a single metric: precision, which is often
loosely referred to as accuracy. While certainly important, this measure alone does not tell us
anywhere close to the whole story.
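The point can be made concrete with hypothetical evaluation counts: a classifier can score high precision while its recall reveals that it misses most of the cases it should catch. The counts below are made up for illustration:

```python
# Hypothetical evaluation counts for a classifier that predicts the
# minority class only when very sure. Precision alone looks great;
# recall exposes how many cases it misses.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

tp, fp, fn = 10, 1, 40  # correct hits, false alarms, missed cases

print(f"precision = {precision(tp, fp):.2f}")  # 0.91 -- looks excellent
print(f"recall    = {recall(tp, fn):.2f}")     # 0.20 -- misses most cases
```

This is why a single headline metric can hide poor real-world behavior.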
As the amount of information available online is growing, the need to access it becomes
increasingly important and the value of natural language processing applications becomes clear.
Machine translation helps us conquer language barriers that we often encounter by translating
technical manuals, support content or catalogs at a significantly reduced cost. The challenge with
machine translation technologies is not in translating words, but in understanding the meaning of
sentences to provide a true translation.
Machine translation does not offer accuracy on a consistent basis. You can get the gist of a
draft or document, but machine translation often performs word-for-word translation without
comprehending the information, which may have to be corrected manually later on.
The fast progress of Machine Translation has boosted translation quality significantly, but,
unfortunately, machine translation approaches are not equally successful for all language pairs.
Morphologically rich languages are problematic in MT, especially if the translation is from a
morphologically less complex to a morphologically more complex language.
Morphological distinctions not present in the source language need to be generated in the target
language.
NLP refers to the evolving set of computer and AI-based technologies that allow computers to learn,
understand, and produce content in human languages. The technology works closely with
speech/voice recognition and text recognition engines. While text/character recognition and
speech/voice recognition allows computers to input the information, NLP allows making sense of
this information.
Limitations of Chat-bot
Inability to Understand
Because they follow fixed programs, chat-bots can get stuck when presented with a query they
have not been trained on. This can lead to customer dissatisfaction and lost business. Repeated
back-and-forth messaging can also be taxing for users and degrade the overall experience on the
website.
Time-Consuming
Chat-bots are installed with the aim of speeding up responses and improving customer
interaction. However, due to limited data availability and the time required for self-updating, the
process can turn out to be more time-consuming and expensive. Instead of attending to several
customers at a time, chat-bots end up confused about how to communicate with people.
Zero decision-making
Chat-bots are infamous for their inability to make decisions. A similar situation landed big
companies like Microsoft in trouble when their chat-bot went on a racist rant. It is therefore
critical to ensure proper programming of your chat-bot to prevent any such incident, which could
hamper your brand.
Tone detection
As native speakers of a language, we understand which words signify which tones. We
understand which statements represent happiness or sadness, pleasure or anger. While these
things are simple for us, they need to be ingrained into a chat-bot. We can’t have a chat-bot
responding to an “I’ve been having a bad day.” with an “I’m so happy for you!”
Understanding tone matters a lot while we communicate, and it ought to matter for intelligent
beings trying to interact with us.
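As a rough sketch, even a keyword-based tone check can keep a chat-bot from replying cheerfully to a negative message. The lexicon and replies below are illustrative assumptions, not a real tone model:

```python
# A minimal keyword-based tone guard (illustrative lexicon), used to
# pick an appropriate reply instead of a tone-deaf one.

NEGATIVE_TONE = {"bad", "terrible", "awful", "sad", "angry"}

def reply(user_message):
    words = {w.strip(".,!?\"").lower() for w in user_message.split()}
    if words & NEGATIVE_TONE:  # any negative-tone word present?
        return "I'm sorry to hear that. How can I help?"
    return "Great! What can I do for you?"

print(reply("I've been having a bad day."))  # sympathetic reply, not a cheerful one
```

A production chat-bot would use a trained classifier rather than a word list, but the principle of checking tone before choosing a response is the same.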
Clean conversational data
If the training datasets aren’t clean and free of issues, do not expect your AI/ML model to
function as intended. With conversational AI, the clarity and cleanness of its training data
determine its ability to interact fluently with people.
Common issues with chat-bot training data include:
• Incorrect punctuation
• Inaccurate word choices
• Illegible sentences
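A minimal cleaning pass over conversational training text might look like the sketch below; the specific rules (normalizing whitespace, collapsing repeated punctuation, dropping mostly non-alphabetic lines) are assumptions chosen for illustration:

```python
import re

# A minimal cleaning pass for conversational training text (assumed rules).

def clean_utterance(text):
    text = re.sub(r"\s+", " ", text).strip()    # normalize whitespace
    text = re.sub(r"([!?.,])\1+", r"\1", text)  # "!!!" -> "!"
    text = re.sub(r"\s+([!?.,])", r"\1", text)  # no space before punctuation
    # Discard "illegible" lines that are mostly non-alphabetic
    letters = sum(c.isalpha() for c in text)
    if len(text) == 0 or letters / len(text) < 0.5:
        return None
    return text

print(clean_utterance("hello   ,  how are you ???"))  # hello, how are you?
```

Utterances that return None would simply be dropped from the training set.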
When dealing with text, we may also need to handle misspellings. A common approach is to
compare each typed word against a large dictionary; if the word is not found there, find the
dictionary words closest to the typed word to display as “suggestions”, and possibly use the
closest of all to “auto-correct”.
This idea of how “close” two words are to one another is complicated. There is an entire branch of
Computer Science devoted to finding good ways to do this.
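One standard measure of closeness is Levenshtein (edit) distance: the minimum number of insertions, deletions, and substitutions needed to turn one word into another. A sketch, with a toy dictionary:

```python
# Levenshtein (edit) distance via dynamic programming with rolling rows.

def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Rank toy-dictionary words by distance to a typo to get "suggestions":
dictionary = ["super", "supper", "sipper", "suppose"]
typo = "suppr"
print(sorted(dictionary, key=lambda w: edit_distance(typo, w)))
```

Real spell checkers combine a distance like this with word frequency and keyboard-layout information to rank suggestions.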
Example : You may correctly spell a word but simply use the wrong one; for example, 'After I had
eaten my super I went straight to bed.' A spellchecker will not spot that it should be 'supper' not
'super'.
The remaining issue we need to address is how to "corrupt" clean text to generate realistic
spelling errors that look like the ones made by humans. We can write a Python script that, for
example, replaces, deletes, and/or swaps letters at random, although there is no guarantee that
typos made this way are similar to those made by humans, or that the resulting artificial dataset
provides useful training signal for the Transformer model.
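Such a random-corruption script might look like the following sketch; the corruption probability and the choice of operations are arbitrary assumptions:

```python
import random

# Randomly replace, delete, or swap letters to corrupt clean text.
# The probability p and the operation mix are arbitrary assumptions.

def corrupt(text, p=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if chars[i].isalpha() and rng.random() < p:
            op = rng.choice(["replace", "delete", "swap"])
            if op == "replace":
                out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], chars[i]])
                i += 1
            # "delete" (or a swap at end of text): append nothing
        else:
            out.append(chars[i])
        i += 1
    return "".join(out)

print(corrupt("after i had eaten my supper i went straight to bed"))
```

As the text above notes, errors generated this way are uniformly random, whereas human typos cluster around adjacent keys and phonetically similar spellings.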
Method to implement : If we "flip" the direction of the original dataset we used to train the spell
checker, we can observe how humans make typos. If we treat the clean text as the source
language and the noisy text as the target, and train a Seq2Seq model in that direction, we are
effectively training a "spell corrupter": a Seq2Seq model that inserts realistic-looking spelling
errors into clean text.
This technique of using the "inverse" of the original training data to artificially generate a large
amount of data in the source language from a real corpus in the target language is called back-
translation in the machine translation literature. It is a popular technique for improving the
quality of machine translation systems.
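The data "flip" itself is trivial: the same parallel pairs, with source and target swapped, become training data for the inverse model. The pairs below are made up for illustration:

```python
# Flipping (noisy, clean) pairs to train a "spell corrupter" in the
# inverse direction. The example pairs are invented for illustration.

spell_check_pairs = [
    ("i wen to the stor", "i went to the store"),  # (noisy, clean)
    ("she sayed hello", "she said hello"),
]

# The spell checker trains on noisy -> clean ...
checker_data = spell_check_pairs
# ... and the spell corrupter trains on the flipped direction, clean -> noisy:
corrupter_data = [(clean, noisy) for noisy, clean in spell_check_pairs]

print(corrupter_data[0])  # ('i went to the store', 'i wen to the stor')
```

Running the trained corrupter over a large clean corpus then yields abundant synthetic (noisy, clean) pairs for further spell-checker training, which is the back-translation idea described above.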