
Natural Language Processing is one of the branches of AI that helps machines to understand, interpret, and manipulate human languages such as English or Hindi to analyze and derive their meaning.

NLP takes as input the data from spoken words, verbal commands, or speech recognition software that humans use in their daily lives, and operates on it. Before getting deeper into the concept, follow the link below to play a game based on NLP.

The mystery animal game: Play from here

Just open the browser, play the game, and then answer a few questions like:

1. Were you able to guess the animal?


2. If yes, in how many questions were you able to guess it?
3. If no, how many times did you try playing this game?
4. What according to you was the task of the machine?
5. Were there any challenges that you faced while playing this game? If yes, list them down.
6. What approach must one follow to win this game?
In the next section of Unit 6 Natural Language Processing AI Class 10, we are going to cover the applications of natural language processing.

Applications of Natural Language processing

We have already seen some common uses of Natural Language Processing in our daily lives, like virtual assistants and Google Translate. Here are some more applications of NLP:

Automatic Summarization

 Today the internet is a huge source of information, so it is very difficult to access specific information from the web.
 Summarization also helps us understand the emotional meaning within the information, for example on social media.
 It can also be helpful to provide an overview of a blog post or news story, avoiding redundancy from multiple sources and maximizing the diversity of content obtained.
 For example, newsletters, social media marketing, video scripting etc.
Sentiment Analysis

 Sometimes companies need to identify opinions and sentiment online to help them understand what customers think about their products and services.
 Sentiment analysis can also help define a company's overall reputation.
 It helps customers decide whether to purchase a product or service based on the opinions expressed, and helps understand sentiment in context.
 For example, a customer might write "I like the new smartphone, but it has weak battery backup". Other uses include brand monitoring, customer support analysis, customer feedback analysis, market research etc.
Text Classification

 It helps to assign predefined categories to a document and organize it in such a way that helps customers find the information they want.
 It also helps to simplify related activities.
 For example, spam filtering in email, auto tagging in social media, categorization of news articles etc.
Virtual Assistants

 We use Google Assistant, Cortana, Siri, and Alexa daily.
 We can talk with them, and they make our tasks easy and comfortable.
 They can be used for keeping notes and reminders for our scheduled tasks, making calls, sending messages etc.
 They can also identify the words spoken by the user.
Getting Started with NLP
Let's start Natural Language Processing by revisiting the AI project cycle. Here I will use the same example given in the CBSE study material. So let's go through it!

Revisiting the AI project cycle – The scenario

 The world is competitive nowadays.
 People face competition in every task and are expected to give their best at every point in time.
 Sometimes people are not able to meet these expectations; they get stressed and could go into depression.
 We get to hear a lot of cases where people are depressed due to many reasons.
 These reasons include: pressure, studies, family issues, relationships etc.
 To overcome this, Cognitive Behavioural Therapy (CBT) is considered one of the best methods to address stress, as it is easy to implement on people and also gives good results.
 CBT includes the behaviour and mindset of a person in normal life.
Follow this link to get more information about CBT:
Cognitive Behavioural Therapy
Problem Scoping

The CBT technique is used to cure depression and stress. But what about people who do not want to consult a psychiatrist? Let us try to look at this problem with the 4Ws problem canvas.

Who Canvas

Who? People who suffer from stress and depression.

What do we know about them? People who are going through stress are reluctant to consult a psychiatrist.

What Canvas

What is the problem? People who need help are reluctant to consult a psychiatrist and hence live miserably.

How do you know it is a problem? Studies around mental stress and depression are available from various authentic sources.

Where Canvas

What is the context/situation in which the stakeholders experience this problem? When they are going through a stressful period of time, or due to some unpleasant experiences.

Why Canvas

What would be of key value to the stakeholders?
 People get a platform where they can talk and vent out their feelings anonymously.
 People get a medium that can interact with them, apply primitive CBT on them, and suggest help whenever needed.

How would it improve their situation?
 People would be able to vent out their stress.
 They would consider going to a psychiatrist whenever required.

After the 4Ws canvas, the problem statement template is created, which is as follows:

Our: People undergoing stress (Who?)

Have a problem of: Not being able to share their feelings (What?)

While: They need help in venting out their emotions (Where?)

An ideal solution would: Provide them a platform to share their thoughts anonymously and suggest help whenever required (Why?)

This leads us to the goal of our project which is:

“To create a chatbot which can interact with people, help them to vent out their feelings and take them through primitive CBT.”

Data Acquisition

 To understand the sentiments of people, data should be collected.
 After collecting the data, the machine can interpret the words people use and understand their meaning.
 Data can be collected through various means:
 Surveys
 Observing therapist's sessions
 Databases available on the internet
 Interviews
Data Exploration

 Once data is collected, it needs to be processed and cleaned.
 After processing, a simpler version of the text can be sent to the machine.
 The text is normalised through various steps and reduced to a minimal vocabulary, since the machine does not require grammatically correct statements but only their essence.
Modelling

 Once the text has been normalised, it is then fed to an NLP based AI model.
 Note that in NLP, modelling requires data pre-processing first; only after that is the data fed to the machine.
 Depending upon the type of chatbot we try to make, there are a lot of AI models available which help us build the foundation of our project.
Evaluation

 The trained model is then evaluated, and its accuracy is generated on the basis of the relevance of the answers the machine gives to the user's responses.
 To understand the efficiency of the model, the answers suggested by the chatbot are compared to the actual answers.

In the above diagram, the blue line represents the model's output while the green one is the actual output along with the data samples.

1. Figure 1: The model's output does not match the true function at all. Hence the model is said to be underfitting and its accuracy is low.
2. Figure 2: In the second case, the model's performance matches the true function well, which means the model has optimum accuracy and is called a perfect fit.
3. Figure 3: In the third case, the model tries to cover all the data samples, even those out of alignment with the true function. This model is said to be overfitting, and it too has low accuracy.
Once the model is evaluated thoroughly, it is then deployed in the form of an app that people can use easily.

Chatbot
 A chatbot is a very common and popular model of NLP.
 There are lots of chatbots available and many of them use the same approach.
 Some of the popular chatbots are as following:
 Mitsuku Bot
 Clever Bot
 Jabberwacky
 Haptik
 Rose
 OChatbot
Answer the following questions:

 Which chatbot did you try? Name any one.


 What is the purpose of this chatbot?
 How was the interaction with the chatbot?
 Did the chat feel like talking to a human or a robot? Why do you think so?
 Do you feel that the chatbot has a certain personality?
Types of chatbot

There are two types of chatbots:

1. Script-bot:
 Script-bots are very easy to make
 They work around the script which is programmed into them
 Free to use
 Easy to integrate with a messaging platform
 No or little processing skill required
 Offer limited functionality
 Story Speaker is an example of a script-bot
2. Smart-bot
 Flexible and powerful
 Work on bigger databases and other resources directly
 Learn with more data
 Coding is required to take them on board
 Wide functionality
 Examples: Google Assistant, Alexa, Siri, Cortana etc.
In the next section of Unit 6 Natural Language Processing AI Class 10, we will discuss human language vs computer language.

Human Language vs Computer Language


 Humans communicate through language which we process all the time.
 Our brain keeps on processing the sounds that it hears around itself and tries to make sense out of them all the time.
 Even in the classroom, as the teacher delivers the session, our brain is continuously processing everything and storing it in some place. Also,
while this is happening, when your friend whispers something, the focus of your brain automatically shifts from the teacher’s speech to your
friend’s conversation.
 So now, the brain is processing both the sounds but is prioritising the one on which our interest lies.
 The sound reaches the brain through a long channel.
 As a person speaks, the sound travels from the speaker's mouth to the listener's eardrum.
 The sound striking the eardrum is converted into a neural impulse, which is transported to the brain and then processed.
 After processing the signal, the brain gains an understanding of its meaning.
 If it is clear, the signal gets stored. Otherwise, the listener asks the speaker for clarity.
 This is how humans process human languages.
 On the other hand, the computer understands the language of numbers.
 Everything that is sent to the machine has to be converted to numbers.
 And while typing, if a single mistake is made, the computer throws an error and does not process that part.
 The communications made by the machines are very basic and simple.
 Now, if we want the machine to understand our language, how should this happen?
 What are the possible difficulties a machine would face in processing natural language?
Arrangement of the words and meaning

 Every language has its own rules.
 In human language we have nouns, verbs, adverbs, adjectives etc.
 A word can be a noun at one time and a verb or an adjective at another.
 These rules provide a structure for the language.
 Every language has its own syntax. Syntax refers to the grammatical structure of a sentence.
 Similarly, rules are required to process language on a computer.
 To do so, part-of-speech tagging is required, which allows the computer to identify the different parts of speech.
 Human language has multiple characteristics that are easy for a human to understand but extremely difficult for a computer.
Analogy with programming

 Different syntax, same semantics: 2+3 = 3+2
 Here the way these statements are written is different, but their meanings are the same, that is 5.
 Different semantics, same syntax: 2/3 (Python 2.7) ≠ 2/3 (Python 3)
 Here the statements written have the same syntax but their meanings are different. In Python 2.7, 2/3 performs integer division and results in 0, while in Python 3 it performs true division and gives an output of 0.6666….
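To see this difference concretely, here is a minimal Python sketch (this is standard Python behaviour; run it under both interpreters to compare):

# Same syntax, different semantics across Python versions.
# In Python 2.7, / between two integers performs integer (floor) division;
# in Python 3 it performs true division.
print(2 / 3)    # Python 2.7 prints 0; Python 3 prints 0.6666666666666666
print(2 // 3)   # // is explicit floor division: prints 0 in both versions
print(2 + 3 == 3 + 2)   # different syntax, same semantics: prints True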
In the next section of Unit 6 Natural Language Processing AI Class 10 we are going to discuss the multiple meanings of words. Here it is!

Multiple meanings of a word


Let’s consider these three sentences:

“His face turned red after he found out that he took the wrong bag”

 What does this mean?


 Is he feeling ashamed because he took another person’s bag instead of his?
 Is he feeling angry because he did not manage to steal the bag that he has been targeting?
“The red car zoomed past his nose”

 Probably talking about the color of the car


“His face turns red after consuming the medicine”

 Is he having an allergic reaction?


 Or is he not able to bear the taste of that medicine?
 Here we can see that context is important.
 We understand a sentence almost intuitively, depending on our history of using the language, and the memories that have been built within.
 In all three sentences, the word red has been used in three different ways, and according to the context of the statement its meaning changes completely.
 Thus, in natural language, it is important to understand that a word can have multiple meanings and the meanings fit into the statement
according to the context of it.
Perfect Syntax, no meaning
 Sometimes a sentence has perfect syntax but no meaning. For example,
“Chickens feed extravagantly while the moon drinks tea.”

 This statement is correct in syntax, but does it make any sense?
 In human language, a perfect balance of syntax and semantics is important for better understanding.
Data Processing
 Humans communicate with each other very easily.
 Natural language can be used very conveniently and efficiently by humans for speaking and understanding.
 At the same time, it is very difficult and complex for a computer to process it.
 Now the question is: how can a machine understand and speak natural language just like humans?
 Since a computer understands only numbers, the basic step is to convert each word or letter into numbers.
 This conversion requires text normalization.
Text Normalization
 As human languages are too complex, they need to be simplified in order to be understood.
 Text normalization helps in cleaning up the textual data in such a way that its complexity comes down to a level lower than the actual data.
 In text normalization, we follow several steps to normalise the text to a lower level.
 We first need to collect the text for text normalization.
What does text normalization include?

The process of transforming a text into a canonical (standard) form. For example, the words 'Welll' and 'Wel' can both be transformed to 'Well'.

Why is text normalization important?

During text normalization we reduce randomness and bring the text closer to a predefined standard. This reduces the amount of different information that the computer has to deal with, and therefore improves efficiency.

Corpus
 The whole textual data collected from all the documents altogether is known as the corpus.
 To work on the corpus, the following steps are required:
Sentence Segmentation

 Sentence segmentation divides the corpus into sentences.
 Each sentence is taken as a separate piece of data, so the corpus gets reduced to sentences.

Tokenisation

 After sentence segmentation, each sentence is further divided into tokens.
 Token is a term used for any word, number, or special character occurring in a sentence.
 Under tokenisation, every word, number and special character is considered separately, and each of them is now a separate token.
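As an illustration, here is a minimal Python sketch of sentence segmentation and tokenisation. It assumes the NLTK library is installed and its 'punkt' tokenizer data has been downloaded; NLTK is one common tool for this, not the only one:

import nltk
nltk.download('punkt')   # one-time download of the sentence tokenizer models
from nltk.tokenize import sent_tokenize, word_tokenize

corpus = "Divya and Rani are stressed. Rani went to a therapist."

sentences = sent_tokenize(corpus)                # sentence segmentation
print(sentences)
# ['Divya and Rani are stressed.', 'Rani went to a therapist.']

tokens = [word_tokenize(s) for s in sentences]   # tokenisation
print(tokens)
# [['Divya', 'and', 'Rani', 'are', 'stressed', '.'],
#  ['Rani', 'went', 'to', 'a', 'therapist', '.']]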
Removing Stopwords, Special Characters and Numbers

 Stopwords are those words which occur very frequently in the corpus but do not add any value to it.
 In human language there are certain words used for grammar which do not add any essence to the corpus.
 Some examples of stopwords are: a, an, and, are, for, is, of, the, to etc.

The above words have little or no meaning in the corpus, hence these words are removed so that the focus stays on meaningful terms. Along with these stopwords, the corpus may have some special characters and numbers. Sometimes some of them are meaningful, sometimes not. For example, in email IDs, the symbol @ and some numbers are very important. If special characters and numbers are not meaningful, they can be removed like stopwords.
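A minimal sketch of this step, again assuming NLTK (whose built-in English stopword list is one of many possible lists):

import nltk
nltk.download('stopwords')   # one-time download of the stopword lists
from nltk.corpus import stopwords

tokens = ['Divya', 'and', 'Rani', 'are', 'stressed', '.']
stop_words = set(stopwords.words('english'))

# Keep only alphabetic tokens that are not stopwords; this also drops
# special characters and numbers, which may or may not be desirable.
filtered = [t for t in tokens if t.isalpha() and t.lower() not in stop_words]
print(filtered)   # ['Divya', 'Rani', 'stressed']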

Converting text to a common case

 The next step after removing stopwords is to convert the whole text into the same case.
 The most preferred case is lower case.
 This ensures that the case-sensitivity of the machine does not treat the same words as different just because of different cases.

In the above example, the word "hello" is written in six different forms; each is converted into lower case, and hence all of them are treated as the same word by the machine.

Stemming

 In this step the words are reduced to their root words.
 Stemming is the process in which the affixes of words are removed and the words are converted to their base form.
 Note that in stemming, the stemmed words (the words we get after removing the affixes) might not be meaningful.
 In the example below, healed, healing and healer are all reduced to heal, but studies is reduced to studi after affix removal, which is not a meaningful word.
 Stemming does not take into account whether the stemmed word is meaningful or not.
 It just removes the affixes, hence it is faster.
Word Affix Stem

healed ed heal

healing ing heal

healer er heal

studies es studi

studying ing study


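A minimal sketch using NLTK's Porter stemmer (one popular stemming algorithm; its exact outputs may differ slightly from the table above, e.g. it reduces both studies and studying to studi):

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ['healed', 'healing', 'healer', 'studies', 'studying']:
    print(word, '->', stemmer.stem(word))
# Stemming just strips affixes by rule, so outputs like 'studi'
# need not be meaningful dictionary words.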
In the next section of Unit 6 Natural Language Processing AI Class 10 we are going to discuss lemmatization. Here we go!

Lemmatization

 It is an alternative process to stemming.
 It also removes the affixes from the words in the corpus.
 The difference between lemmatization and stemming is that the output of lemmatization is a meaningful word.
 The final output is known as a lemma.
 It takes longer to execute than stemming.
 The following table shows the process.
Word Affix Lemma

healed ed heal

healing ing heal

healer er heal

studies es study
studying ing study
Compare the stemming and lemmatization tables: you will find the word studies converted into studi by stemming, whereas its lemma is study.

Observe the following example to understand stemming and lemmatization:
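A minimal Python sketch can reproduce this comparison, assuming NLTK's WordNet data has been downloaded (the pos='v' argument tells the lemmatizer to treat each word as a verb):

import nltk
nltk.download('wordnet')   # one-time download of the WordNet lemma database
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
for word in ['healed', 'healing', 'studies', 'studying']:
    print(word, '->', lemmatizer.lemmatize(word, pos='v'))
# healed -> heal, healing -> heal, studies -> study, studying -> study
# Unlike stemming, every output is a meaningful word (a lemma).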

After normalisation of the corpus, let's convert the tokens into numbers. To do so, the bag of words algorithm will be used.

Bag of words
 Bag of words is an NLP model that extracts features from the text, which can be helpful in machine learning algorithms.
 We get the occurrences of each word and develop the vocabulary for the corpus.
 The above image shows how the bag of words algorithm works.
 The text given on the left is the normalised corpus obtained after going through all the steps of text processing.
 The image in the middle shows the bag of words algorithm. Here we put all the words we got from text processing.
 The image on the right shows the unique words returned by the bag of words algorithm, along with their occurrences in the text corpus.
 Eventually, bag of words returns us two things:
 A vocabulary of words
 The frequency of words
 The name "bag" of words symbolises that the sequence of sentences or tokens does not matter in this case, as all we need are the unique words and their frequencies.
The step by step process to implement bag of words algorithm

The following steps should be followed to implement bag of words (a Python sketch of the whole process appears after the worked example below):

1. Text Normalization
2. Create dictionary
3. Create document vectors
4. Create document vectors for all documents
Text Normalization:

This step collects the data and pre-processes it. For example:

Document 1: Divya and Rani are stressed

Document 2: Rani went to a therapist

Document 3: Divya went to download a health chatbot

The above example consists of three documents having one sentence each. After text normalization, the text would be:

Document 1: [Divya, and, Rani, are, stressed]

Document 2: [Rani, went, to, a, therapist]

Document 3: [Divya, went, to, download, a, health, chatbot]


 Note that no tokens have been removed in the stopword removal step.
 This is because we have very little data, and since the frequency of all the words is almost the same, no word can be said to have less value than the others.
Step 2 Create Dictionary

To create the dictionary, write all the words which occur in the three documents:

Dictionary:

Divya, and, Rani, went, are, stressed, to, a, therapist, download, health, chatbot

In this step, repeated words are written just once; we create a list of unique words.

Step 3 Create a document vector

 In this step, write all the words of the dictionary in the top row.
 Then, for each document, check for each word and write 1 (one) under it if the word is found, otherwise write 0 (zero).
 If the same word appears more than once, increment the count for that word.
            Divya  and  Rani  went  are  stressed  to  a  therapist  download  health  chatbot

Document 1  1      1    1     0     1    1         0   0  0          0         0       0

Document 2  0      0    1     1     0    0         1   1  1          0         0       0

Document 3  1      0    0     1     0    0         1   1  0          1         1       1

In the above table:

1. The header row contains the vocabulary of the corpus.
2. The three rows represent the three documents.
3. Analyse the 1s and 0s.
4. This gives the document vector table for the corpus.
5. Finally, the tokens are converted into numbers.
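Here is the promised Python sketch of the whole bag of words process, using only the standard library. The documents follow the example above; note that the vocabulary comes out in order of first appearance, which may differ from the table's column order:

# Step 1: text-normalized documents (tokenised, as derived above)
documents = [
    ['Divya', 'and', 'Rani', 'are', 'stressed'],
    ['Rani', 'went', 'to', 'a', 'therapist'],
    ['Divya', 'went', 'to', 'download', 'a', 'health', 'chatbot'],
]

# Step 2: create the dictionary of unique words
vocabulary = []
for doc in documents:
    for word in doc:
        if word not in vocabulary:
            vocabulary.append(word)

# Steps 3 and 4: create a document vector for every document
vectors = [[doc.count(word) for word in vocabulary] for doc in documents]

print(vocabulary)
for vector in vectors:
    print(vector)   # e.g. first row: [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]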
To convert these tokens into numbers, TFIDF (Term Frequency and Inverse Document Frequency) is used.

TFIDF (Term Frequency and Inverse Document Frequency)

There are two terms in TFIDF, one is Term Frequency and another one is Inverse Document Frequency.

Term Frequency

 It helps to identify the value of each word in a document.
 Term frequency can easily be found from the document vector table, as that table mentions the frequency of each vocabulary word in each document. (Refer to the table above.)
 The numbers represent the frequency of each word.
Divya  and  Rani  went  are  stressed  to  a  therapist  download  health  chatbot

1      1    1     0     1    1         0   0  0          0         0       0

0      0    1     1     0    0         1   1  1          0         0       0

1      0    0     1     0    0         1   1  0          1         1       1

Inverse Document Frequency

 Now let's discuss Inverse Document Frequency.
 Document frequency refers to the number of documents in which a word occurs, irrespective of how many times it has occurred in those documents.
 For example,

Divya  and  Rani  went  are  stressed  to  a  therapist  download  health  chatbot

2      1    2     2     1    1         2   2  1          1         1       1

 In the above table you can observe that the words "Divya", "Rani", "went", "to" and "a" have a document frequency of 2, as they occur in two documents.
 The rest of the terms occur in just one document.
 Now, for inverse document frequency, put the document frequency in the denominator and the total number of documents in the numerator.
 For example,

Divya  and  Rani  went  are  stressed  to   a    therapist  download  health  chatbot

3/2    3/1  3/2   3/2   3/1  3/1       3/2  3/2  3/1        3/1       3/1     3/1

 The formula is:

TFIDF(W) = TF(W) * log( IDF(W) )

where IDF(W) is the total number of documents divided by the number of documents in which the word W occurs.

 Here, log is to the base 10. Don't worry! You don't need to calculate the log values by yourself. Simply use the log function on a calculator and find out!
 Now, let's multiply the IDF values with the TF values.
 Note that the TF values are for each document while the IDF values are for the whole corpus.
 Hence, we need to multiply the IDF values with each row of the document vector table:

Document 1: 1*log(3/2)  1*log(3)  1*log(3/2)  0*log(3/2)  1*log(3)  1*log(3)  0*log(3/2)  0*log(3/2)  0*log(3)  0*log(3)  0*log(3)  0*log(3)

Document 2: 0*log(3/2)  0*log(3)  1*log(3/2)  1*log(3/2)  0*log(3)  0*log(3)  1*log(3/2)  1*log(3/2)  1*log(3)  0*log(3)  0*log(3)  0*log(3)

Document 3: 1*log(3/2)  0*log(3)  0*log(3/2)  1*log(3/2)  0*log(3)  0*log(3)  1*log(3/2)  1*log(3/2)  0*log(3)  1*log(3)  1*log(3)  1*log(3)
 The resulting TFIDF values for each word in each document are as follows:
Divya  and    Rani   went   are    stressed  to     a      therapist  download  health  chatbot

0.176  0.477  0.176  0      0.477  0.477     0      0      0          0         0       0

0      0      0.176  0.176  0      0         0.176  0.176  0.477      0         0       0

0.176  0      0      0.176  0      0         0.176  0.176  0          0.477     0.477   0.477

 Finally, the words have been converted to numbers.
 These numbers are the value of each word for each document.
 Here, you can see that since we have a small amount of data, words like 'are' and 'and' also have a high value.
 But as a word appears in more documents, its IDF fraction shrinks and the value of that word decreases.
 For example:
Total Number of documents: 10

Number of documents in which ‘and’ occurs: 10

Therefore, IDF(and) = 10/10 = 1

Which means: log(1) = 0.

Hence, the value of ‘and’ becomes 0.

On the other hand, suppose the number of documents in which 'Artificial' occurs is 3. Then IDF(Artificial) = 10/3 = 3.3333…

This means log(3.3333) = 0.522, which shows that the word 'Artificial' has considerable value in the corpus.
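A minimal Python sketch of the whole calculation, using the term frequencies from the document vector table above and the formula TFIDF(W) = TF(W) * log(N / document frequency of W):

import math

vocabulary = ['Divya', 'and', 'Rani', 'went', 'are', 'stressed',
              'to', 'a', 'therapist', 'download', 'health', 'chatbot']
tf = [
    [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],   # Document 1
    [0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0],   # Document 2
    [1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1],   # Document 3
]
n_docs = len(tf)

# Document frequency: in how many documents does each word occur?
df = [sum(1 for row in tf if row[j] > 0) for j in range(len(vocabulary))]

# TFIDF, with log to the base 10 as in the text
for row in tf:
    print([round(row[j] * math.log10(n_docs / df[j]), 3)
           for j in range(len(vocabulary))])
# First row: [0.176, 0.477, 0.176, 0.0, 0.477, 0.477, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]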

Applications of TFIDF

 Document Classification: Helps in classifying the type and genre of a document.
 Topic Modelling: Helps in predicting the topic of a corpus.
 Information Retrieval System: Helps to extract the important information out of a corpus.
 Stop word filtering: Helps in removing unnecessary words from a text body.

Here is a corpus for you to challenge yourself with. Use the knowledge you have gained in the above sections and try completing the whole exercise by yourself.
The Corpus
Document 1: We can use health chatbots for treating stress.
Document 2: We can use NLP to create chatbots and we will be making health chatbots now!
Document 3: Health Chatbots cannot replace human counsellors now.
Accomplish the following challenges on the basis of the corpus given above. You can use the tools available online for these challenges.

1. What is Natural Language Processing class 10?

Natural Language Processing class 10 is one of the topics of the Artificial Intelligence skill course. It is Unit 6 of the CBSE Class 10 Artificial Intelligence curriculum.

2. What is NLP explain with an example?

NLP stands for Natural Language Processing. It is one of the subfields or domains of AI that helps machines understand, interpret, and manipulate human languages such as English or Hindi to analyze and derive their meaning. For example, virtual assistants like Siri and Alexa use NLP to understand spoken commands.

3. What is Natural Language Processing explain in detail?

Natural Language Processing is one of the branches of AI that helps machines to understand, interpret, and manipulate human languages
such as English or Hindi to analyze and derive their meaning.

NLP takes the data as input from the spoken words, verbal commands or speech recognition software that humans use in their daily lives and
operates on this.

4. What are the steps of NLP?

NLP follows these steps:


1. Text Normalization
2. Sentence Segmentation
3. Tokenization
4. Bag of Words
5. TFIDF
[1] What are the domains of Artificial Intelligence?
Ans. :
1. Data Science
2. Computer Vision
3. Natural Language Processing

[2] Identify: I work around numbers and tabular data


Ans. Data Science

[3] I am all about visual data like images and video. – Who am I?
Ans. Computer Vision

[4] What do you mean by natural language processing?


Ans. NLP refers to the branch of AI that takes data from the natural language humans use in daily life and operates on it.

[5] Name the AI game you played which uses NLP.


Ans. Mystery Animal

Follow this link to know more about the game: Mystery Animal

[6] Sagar is collecting data from social media platforms. He collected a large amount of data, but he wants specific information from it. Which NLP application would help him?
Ans.: Automatic Summarization

[7] What are the features of automatic summarization?


Ans.:
1. Summarizing the meaning of documents and information
2. Understanding the emotional meaning within the information
3. Providing an overview of a news item or blog post
4. Avoiding redundancy from multiple sources
5. Maximizing the diversity of content
[8] I am used to identify opinions online to help understand what customers think about products and services. Who am I?
Ans. Sentiment Analysis

[9] Which application of NLP assigns predefined categories to a document and organizes it to help customers find the information they want?
Ans. Text Classification

[10] What is an example of text classification?


Ans. Spam filtering in email, auto-tagging customer queries, understanding audience response from social media, categorization of news articles on specific topics.

[11] What is an example of automatic summarization?


Ans. media monitoring, newsletters, video scripting, automated content creation

[12] I am helping humans to keep notes of their important tasks, make calls for them, send messages and many more. Who am I?
Ans. Virtual Assistant or Chatbot

[13] Name popular virtual assistants.


Ans.:
1. Apple Siri
2. Google Assistant
3. Amazon Alexa
4. Microsoft Cortana

[14] Give the full form of CBT.


Ans. CBT – Cognitive Behavioural Therapy

[15] How does CBT help human beings?


Ans. CBT helps a therapist understand the behaviour and mindset of an individual. It helps people overcome their stress and live a happier life.

[16] what do you mean by chatbots?


Ans. A chatbot is a computer program that’s designed to simulate human conversation through voice commands or text chats or both. Eg:
Mitsuku Bot, Jabberwacky etc.
OR
A chatbot is a computer program that can learn over time how to best interact with humans. It can answer questions and troubleshoot
customer problems, evaluate and qualify prospects, generate sales leads and increase sales on an ecommerce site.
OR
A chatbot is a computer program designed to simulate conversation with human users. A chatbot is also known as an artificial conversational
entity (ACE), chat robot, talk bot, chatterbot or chatterbox.
OR
A chatbot is a software application used to conduct an on-line chat conversation via text or text-to-speech, in lieu of providing direct contact
with a live human agent.

Note: Write any definition.

[17] What are the types of chatbots?


Ans.:
1. Scriptbot
2. Smartbot

[18] The customer care section of various companies includes which type of chatbot?
Ans.: The customer care sections of most companies include script-bots.

[19] Virtual assistants like Siri, Cortana, Google Assistant etc. can be taken as which type of chatbot?
Ans. Virtual assistants like Siri, Cortana, Google Assistant etc. can be taken as smart-bots.

[20] Give the full form of NLP.


Ans.: NLP – Natural Language Processing

[21] What do you mean by syntax?


Ans.: The grammatical structure of a sentence is known as syntax. This structure helps in interpreting the message.

[22] How does the computer interpret the syntax of a language?


Ans. The computer interprets the syntax of a language using part-of-speech tagging, which allows it to identify the different parts of speech.

[23] What do you mean by semantics?


Ans.: The meaning of a word or sentence is known as semantics.
[24] Write an example of different syntax, same semantics.
Ans. 5 + 3 = 3 + 5

[25] Write an example of different semantics, same syntax.


Ans.: 2/3 in Python 2.7 is not the same as in Python 3. In Python 2.7, 2/3 performs integer division and returns 0, whereas in Python 3 it returns 0.6666….

[26] What do you mean by perfect syntax, no meaning?


Ans. Perfect syntax, no meaning refers to a statement which has correct syntax but does not convey any meaning.

[27] What do you mean by corpus?


Ans.: In text normalization, the whole textual data from all documents taken together is known as the corpus.

[28] What is sentence segmentation in text normalization?


Ans.: In text normalization, the whole text is divided into different sentences. This process is known as sentence segmentation. Each sentence is taken as a separate piece of data.

[29] Name the term used for any word or number or special character occurring in a sentence.
Ans. Tokens

[30] In which process is every word or number or special character considered separately?
Ans. Tokenization
