Lab Manual - NLP


UDS21303J - Introduction to Natural Language Processing
Lab Experiment List

S.No Experiment List
1 Install and import the NLTK package for Natural Language Processing.
2 Install and import the spaCy package and load the Language Model.
3 In Google Colab, develop a simple Python program to count the number of words in a paragraph; the user should be prompted to supply the paragraph through a file. Note: The paragraph should not be hand-typed.
4 In Google Colab, develop a simple Python program to reverse the sentence, count the length of each word in the sentence, and store them in a list starting with the word which has the highest string length.
5 In Google Colab, develop a simple Python program to create a customer database by key and value, for example:
o Customer Id
o Customer Name
o Customer Address
o Customer Phone
6 Tokenize text with stop words as delimiters and remove stop words in a text.
7 Perform Stemming and Lemmatization in the given text and remove stop words.
8 Find the most common words in the text, excluding stop words, and spell-check the text.
9 Extract Nouns, Pronouns, Verbs and Adjectives from the given text.
10 Find the similarity between two words and similarity between two documents using Word2Vec.
11 Find the similarity between two words using cosine similarity.
12 Summarize the given text using different summarization algorithms available.
13 Build a text classifier with TextBlob and train a text classifier using Simple Transformers.
14 Create a Question-Answering system from a given context.
15 Classify a text as a positive, negative, or neutral sentiment using NLP models.
Beyond the syllabus
1 Punkt package for regular-expression-based tokenizer.
2 spaCy package for sentence segmentation.
3 Neural network model development using Keras.
4 Python for NLP: Movie Sentiment Analysis.
5 Emoji creation.
Ex.no:1
Date:
Install and import the NLTK package for Natural Language Processing.

Aim :
Install and import the NLTK package for Natural Language Processing.
Procedure:
NLTK Tokenizer Package Tokenizers divide strings into lists of substrings. For example,
tokenizers can be used to find the words and punctuation in a string:
Install and import the libraries of nltk, punkt, and tokenize
Code explanation
 The word_tokenize module is imported from the NLTK library.
 A variable "text" is initialized with two sentences.
 The text variable is passed to word_tokenize and the result is printed. The tokenizer separates each word and punctuation mark, as shown in the output.
Coding:
!pip install nltk
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
text = "God is Great! I won the match."
print(word_tokenize(text))
Output:
Installation and import
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-
wheels/public/simple/
Requirement already satisfied: nltk in /usr/local/lib/python3.7/dist-packages (3.7) ………
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip. True

Tokenize output:
['God', 'is', 'Great', '!', 'I', 'won', 'the', 'match', '.']

Screenshot:

Result:
The NLTK and punkt packages were installed and imported, and the text was tokenized into words.

Ex.no : 2
Install and import the spacy package and load the Language Model.
Date:

Aim :
Install the spaCy package, load the language model, tokenize the text, and find the part-of-speech tags.
Procedure:
spaCy provides fast and accurate syntactic analysis. Import the spaCy package to tokenize the text and to assign POS tags. POS tagging is the task of automatically assigning part-of-speech tags to all the words of a sentence.
Coding:
Tokenize text segmentation
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('I am going to visit India next week.')
for token in doc:
    print(token.text)

POS tag
import spacy
nlp = spacy.load('en_core_web_sm')
# Create an nlp object
doc = nlp("He went to play basketball")
# Iterate over the tokens
for token in doc:
    # Print the token and its part-of-speech tag
    print(token.text, "-->", token.pos_)
spacy.explain("PRON")
Output:
Text segmentation:
I
am
going
to
visit
India
next
week
POS Tag
He --> PRON
went --> VERB
to --> PART
play --> VERB
basketball --> NOUN
Explanation of tag : PRONOUN
Screenshot:

Result :
The spaCy package was imported, the text was tokenized, and the POS tags were identified.

Ex.no:3
Python Program to count the number of words in a paragraph
Date:

Aim :
In Google Colab, develop a simple Python program to count the number of words in a paragraph; the user should be prompted to supply the paragraph through a file. Note: The paragraph should not be hand-typed.
Procedure:
 Take the file name from the user.
 Read each line from the file and split the line to form a list of words.
 Find the length of items in the list and print it.
Coding:
fname = input("Enter file name: ")

num_words = 0

with open(fname, 'r') as f:
    for line in f:
        words = line.split()
        num_words += len(words)

print("Number of words:")
print(num_words)

Output:
Enter file name: text.txt
Number of words:
15
Screenshot:

Result:
The Python program to count the number of words in a paragraph read from a text file was executed successfully.

Ex.no:4
Program to reverse the sentence, count the length of each word in the sentence, and store them in a list starting with the word which has the highest string length
Date:

Aim:
In Google Colab, develop a simple Python program to reverse the sentence, count the length of each word in the sentence, and store them in a list starting with the word which has the highest string length.
Instruction:
We are given a string and we need to reverse words of a given string
#Add extra space after string to get the last word in the given string
#Split the string into words
#Add word to array words
#Initialize small and large with first word in the string
#If length of large is less than any word present in the string
#Store value of word into large
Coding:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

text = "A Little progress Each Day Adds Up to Big Results"
words = word_tokenize(text)
words.reverse()
print(words)
print("the length of the words in this sentence: ", len(words))

word = ""
words = []
# Add an extra space after the string to capture the last word
text = text + " "
for i in range(0, len(text)):
    if text[i] != ' ':
        word = word + text[i]
    else:
        words.append(word)
        # Make word an empty string
        word = ""

small = large = words[0]
for k in range(0, len(words)):
    if len(small) > len(words[k]):
        small = words[k]
    if len(large) < len(words[k]):
        large = words[k]
print("Smallest word: " + small)
print("Largest word: " + large)
Output:
['Results', 'Big', 'to', 'Up', 'Adds', 'Day', 'Each', 'progress', 'Little', 'A']
the length of the words in this sentence: 10
Smallest word: A
Largest word: progress
Screenshot:

Result:
The Python program to reverse the sentence, count the word lengths, and find the smallest and largest words was executed successfully.

Ex.no:5
In Google Colab, develop a simple Python program to create a customer database by key and value: Customer Id, Customer Name, Customer Address, Customer Phone
Date:

Aim:
Develop a simple Python program to create a customer database by key and value, for example Customer Id, Customer Name, Customer Address, and Customer Phone.
Procedure:
 Create a customer list along with customer id, name, address and phone number
 Call the values through input values
 Print the customer list details
 Convert list into dictionary using dictionary comprehension
Coding:

# vAR_Customer_list = [[341, 'Rajan', '56 Nehru Street', 7374838381], [265, 'kanmani', 'sathya nagar', 3455683838], [543, 'Manu', 'ramapuram', 3455683123], [124, 'sindhu', 'kannadasan street', 6485683123]]
vAR_Customer_list = []
vAR_n = int(input("Enter number of customers in the database : "))
for i in range(0, vAR_n):
    vAR1 = [int(input()), input(), input(), int(input())]
    vAR_Customer_list.append(vAR1)
print(vAR_Customer_list)

print("Customer list", str(vAR_Customer_list))
vAR_Dict = {}

# Convert the list of lists to a dictionary keyed by customer id
for vAR2 in vAR_Customer_list:
    vAR_Dict[vAR2[0]] = vAR2[1:]
print(vAR_Dict)

import pandas as pd
vAR_df = pd.DataFrame.from_dict(vAR_Dict, orient='index')
vAR_df
vAR_df.to_csv("customer.csv")

Output:
Enter number of customers in the database : 1
321
rajan
muthunagar
78989098
[[321, 'rajan', 'muthunagar', 78989098]]
Customer list [[321, 'rajan', 'muthunagar', 78989098]]

{321: ['rajan', 'muthunagar', 78989098]}

Screen shot:

Result:
Using Google Colab, a Python program was created to build a customer database.

Ex.no:6
Tokenize text with stop words as delimiters and remove stop
words in a text.
Date:

Aim:
Write a program for tokenize text with stop words as delimiters and remove stopwords in
a text.
Instruction:
Stopwords are common words that are present in the text but generally do not contribute
to the meaning of a sentence. They hold almost no importance for the purposes of information
retrieval and natural language processing. For example – ‘the’ and ‘a’. Most search engines will
filter out stop words from search queries and documents.
NLTK library comes with a stopwords corpus – nltk_data/corpora/stopwords/ that contains word
lists for many languages.
Coding:
import nltk
nltk.download('punkt')
nltk.download('stopwords')
import re
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import collections
from nltk.tokenize import word_tokenize

#stopword

vAR_input = """Machine learning (ML) is a type of artificial intelligence (AI) that allows softwa
re applications to become more accurate at predicting outcomes without being explicitly program

16
med to do so. Machine learning algorithms use historical data as input to predict new output valu
es."""
stopwords = nltk.corpus.stopwords.words('english')
print("The stopwords are:",stopwords)
vAR_input = word_tokenize(vAR_input)
vAR_1 = [i for i in vAR_input if i.lower() not in stopwords ]
vAR_tokenized=' '.join(vAR_1)
print(vAR_tokenized)

Output:
a) [nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
b) The stopwords are: ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've",
"you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her',
'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who',
'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have',
'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after',
'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then',
'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most',
'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will',
'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't",
'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven',
"haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan',
"shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn',
"wouldn't"]
Machine learning ( ML ) type artificial intelligence ( AI ) allows software applications become
accurate predicting outcomes without explicitly programmed . Machine learning algorithms use
historical data input predict new output values .

Screen shot:

Result:
Tokenizing the text and removing the stop words was executed successfully.

Ex.no:7
Python program for stemming the given text and removing stop words
Date:

Aim:
Write a Python program to perform stemming on the given text and remove the stop words.
Instruction:
Stemming and lemmatization are forms of word normalization, which means reducing a word to its root form.
In most natural languages, a root word can have many variants. For example, the word 'play' can be used as 'playing', 'played', 'plays', etc. You can think of similar examples (and there are plenty).
Coding:
Stemming:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

set(stopwords.words('english'))

text = """My name is Sachin. I’m from Delhi where I finished my schooling last year from SRM
School. Is there anyone here from my city?.

I like watching movies, at least once a month. I play basketball on weekends and chess whenever
I get time. I’m into reading thriller novels as well, Dan Brown being my favorite novelist.

I’m happy to step into college life, which provides more freedom and where, finally, I don’t have
to come in a uniform. Post-college, I aspire to work in consulting industry."""

stop_words = set(stopwords.words('english'))

19
word_tokens = word_tokenize(text)

filtered_sentence = []

for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

Stem_words = []
ps = PorterStemmer()
for w in filtered_sentence:
    rootWord = ps.stem(w)
    Stem_words.append(rootWord)
print(filtered_sentence)
print(Stem_words)
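
The syllabus item for this exercise also asks for lemmatization. A minimal sketch (assuming the WordNet corpus is available; newer NLTK versions may also need the 'omw-1.4' download) that lemmatizes the same filtered tokens:

# Lemmatization (sketch): reduce each filtered token to its dictionary form.
import nltk
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
Lemma_words = [lemmatizer.lemmatize(w) for w in filtered_sentence]
print(Lemma_words)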

Output:
['My', 'name', 'Sachin', '.', 'I', '’', 'Delhi', 'I', 'finished', 'schooling', 'last', 'year', 'SRM', 'School', '.',
'Is', 'anyone', 'city', '?', '.', 'I', 'like', 'watching', 'movies', ',', 'least', 'month', '.', 'I', 'play', 'basketball',
'weekends', 'chess', 'whenever', 'I', 'get', 'time', '.', 'I', '’', 'reading', 'thriller', 'novels', 'well', ',', 'Dan',
'Brown', 'favorite', 'novelist', '.', 'I', '’', 'happy', 'step', 'college', 'life', ',', 'provides', 'freedom', ',',
'finally', ',', 'I', '’', 'come', 'uniform', '.', 'Post-college', ',', 'I', 'aspire', 'work', 'consulting', 'industry',
'.']

['my', 'name', 'sachin', '.', 'i', '’', 'delhi', 'i', 'finish', 'school', 'last', 'year', 'srm', 'school', '.', 'is', 'anyon',
'citi', '?', '.', 'i', 'like', 'watch', 'movi', ',', 'least', 'month', '.', 'i', 'play', 'basketbal', 'weekend', 'chess',
'whenev', 'i', 'get', 'time', '.', 'i', '’', 'read', 'thriller', 'novel', 'well', ',', 'dan', 'brown', 'favorit', 'novelist',
'.', 'i', '’', 'happi', 'step', 'colleg', 'life', ',', 'provid', 'freedom', ',', 'final', ',', 'i', '’', 'come', 'uniform', '.',
'post-colleg', ',', 'i', 'aspir', 'work', 'consult', 'industri', '.']

Screenshot:

Result:
The Python program to stem the given text and remove stop words was executed, and the stemmed words were found successfully.

Ex.no:8
Find the most common words in the text, excluding stop words
and spell-check the text.
Date:

Aim:
Write a program to find the most common words in the text, excluding stop words, and to spell-check the text.
Instruction:
Excluding the stop words: The words which are generally filtered out before processing a natural language are called stop words. These are actually the most common words in any language (like articles, prepositions, pronouns, conjunctions, etc.) and do not add much information to the text. Examples of a few stop words in English are "the", "a", "an", "so", "what".
A spell checker detects and corrects spelling errors in the input text. The pyspellchecker library used below generates candidate corrections with Levenshtein edit distance and ranks them by word frequency.

Coding:
1. Finding most common words in text
import nltk
nltk.download('punkt')
nltk.download('stopwords')
import re
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import collections

a = """India is famous for hosting the world’s tallest statue, known as the Statue of Unity. The 597 ft (182 m) high statue is the sculpture of Sardar Vallabhbhai Patel, an eminent freedom fighter and the first Home Minister of independent India.
The Statue of Unity is twice the size of New York’s Statue of Liberty. It was unveiled on October 31, 2018, to commemorate the birth anniversary of Sardar Patel."""

stopwords = set(nltk.corpus.stopwords.words('english'))

Dict = {}

for vAR_article_text in a.lower().split():
    vAR_article_text = re.sub("[()]", "", vAR_article_text)  # Removing parentheses
    vAR_article_text = re.sub(r'\[[0-9]*\]', '', vAR_article_text)
    vAR_article_text = re.sub(r',', ' ', vAR_article_text)
    vAR_article_text = re.sub(r'\s+', '', vAR_article_text)

    # Removing special characters and digits
    vAR_formatted_article_text = re.sub('[^a-zA-Z]', '', vAR_article_text)
    vAR_formatted_article_text = re.sub(r'\s+', '', vAR_formatted_article_text)

    if vAR_article_text not in stopwords:
        if vAR_article_text not in Dict:
            Dict[vAR_article_text] = 1
        else:
            Dict[vAR_article_text] += 1

print("Dict is", Dict)

vAR_mostcommon = collections.Counter(Dict)
print("The most common words and their counts are:", vAR_mostcommon)

2. Spell check
a = """India is famous for hosting the world’s tallest statue, known as the Statue of Unity. The 597 ft (182 m) high statue is the sculpture of Sardar Vallabhbhai Patel, an eminent freedom fighter and the first Home Minister of independent India.

The Statue of Unity is twice the size of New York’s Statue of Liberty. It was unveiled on October 31, 2018, to commemorate the birth anniversary of Sardar Patel."""
vAR_article_text = re.sub("[()]", "", a)  # Changes here
vAR_article_text = re.sub(r'\[[0-9]*\]', ' ', vAR_article_text)
vAR_article_text = re.sub(r'\s+', ' ', vAR_article_text)
# Removing special characters and digits
vAR_formatted_article_text = re.sub('[^a-zA-Z]', ' ', vAR_article_text)
# vAR_formatted_article_text = re.sub(r'\s+', ' ', vAR_formatted_article_text)
type(vAR_formatted_article_text)

!pip3 install pyspellchecker

from spellchecker import SpellChecker as Chk


obj_chk = Chk()
vAR_text = list(vAR_formatted_article_text.split(" "))
print(vAR_text)
vAR_wrong = obj_chk.unknown(vAR_text)
print("The following words have incorrect spelling",vAR_wrong)

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
Dict is {'india': 1, 'famous': 1, 'hosting': 1, 'world’s': 1, 'tallest': 1, 'statue': 5, 'known': 1, 'unity.': 1,
'597': 1, 'ft': 1, '182': 1, 'high': 1, 'sculpture': 1, 'sardar': 2, 'vallabhbhai': 1, 'patel': 1, 'eminent': 1,
'freedom': 1, 'fighter': 1, 'first': 1, 'home': 1, 'minister': 1, 'independent': 1, 'india.': 1, 'unity': 1,
'twice': 1, 'size': 1, 'new': 1, 'york’s': 1, 'liberty.': 1, 'unveiled': 1, 'october': 1, '31': 1, '2018': 1,
'commemorate': 1, 'birth': 1, 'anniversary': 1, 'patel.': 1}
The most common words and their counts are: Counter({'statue': 5, 'sardar': 2, 'india': 1, 'famous':
1, 'hosting': 1, 'world’s': 1, 'tallest': 1, 'known': 1, 'unity.': 1, '597': 1, 'ft': 1, '182': 1, 'high': 1,
'sculpture': 1, 'vallabhbhai': 1, 'patel': 1, 'eminent': 1, 'freedom': 1, 'fighter': 1, 'first': 1, 'home': 1,
'minister': 1, 'independent': 1, 'india.': 1, 'unity': 1, 'twice': 1, 'size': 1, 'new': 1, 'york’s': 1, 'liberty.':
1, 'unveiled': 1, 'october': 1, '31': 1, '2018': 1, 'commemorate': 1, 'birth': 1, 'anniversary': 1, 'patel.':
1})

Spell check:

['India', 'is', 'famous', 'for', 'hosting', 'the', 'world', 's', 'tallest', 'statue', '', 'known', 'as', 'the', 'Statue',
'of', 'Unity', '', 'The', '', '', '', '', 'ft', '', '', '', '', 'm', 'high', 'statue', 'is', 'the', 'sculpture', 'of', 'Sardar',
'Vallabhbhai', 'Patel', '', 'an', 'eminent', 'freedom', 'fighter', 'and', 'the', 'first', 'Home', 'Minister', 'of',
'independent', 'India', '', 'The', 'Statue', 'of', 'Unity', 'is', 'twice', 'the', 'size', 'of', 'New', 'York', 's',
'Statue', 'of', 'Liberty', '', 'It', 'was', 'unveiled', 'on', 'October', '', '', '', '', '', '', '', '', '', '', 'to',
'commemorate', 'the', 'birth', 'anniversary', 'of', 'Sardar', 'Patel', '']
The following words have incorrect spelling {'', 'vallabhbhai', 's', 'm', 'ft'}

Screenshot:

Spell check

Result:
The most common words in the text, excluding stop words, were found and the text was spell-checked successfully through this program.

Ex.no:9
Extract Noun, Pronoun, Verbs and Adjectives from the given text.
Date:

Aim :
Write a program to extract nouns, pronouns, verbs and adjectives from the given text.
Procedure:
 Information extraction is a powerful NLP concept that will enable you to parse through
any piece of text
 Learn how to perform information extraction using NLP techniques in Python
 Tokenize text (word_tokenize)
 apply pos_tag to above step that is nltk.pos_tag(tokenize_text)

Abbreviation Meaning
CC coordinating conjunction
CD cardinal digit
DT determiner
EX existential there
FW foreign word
IN preposition/subordinating conjunction
JJ This NLTK POS Tag is an adjective (large)
JJR adjective, comparative (larger)
JJS adjective, superlative (largest)
LS list item marker
MD modal (could, will)
NN noun, singular (cat, tree)
NNS noun plural (desks)
NNP proper noun, singular (sarah)

NNPS proper noun, plural (indians or americans)
PDT predeterminer (all, both, half)
POS possessive ending (parent's)
PRP personal pronoun (hers, herself, him, himself)
PRP$ possessive pronoun (her, his, mine, my, our )
RB adverb (occasionally, swiftly)
RBR adverb, comparative (greater)
RBS adverb, superlative (biggest)
RP particle (about)
TO infinitive marker (to)
UH interjection (goodbye)
VB verb (ask)
VBG verb gerund (judging)
VBD verb past tense (pleaded)
VBN verb past participle (reunified)
VBP verb, present tense not 3rd person singular(wrap)
VBZ verb, present tense with 3rd person singular (bases)
WDT wh-determiner (that, what)
WP wh- pronoun (who)
WRB wh- adverb (how)

Coding:
import nltk
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('averaged_perceptron_tagger')

text = "In India people are using their natural language for communication. English is the most language in all over the world"
text_tokens = word_tokenize(text)
nltk.pos_tag(text_tokens)
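
The tagger above only labels the tokens; to actually extract the nouns, pronouns, verbs and adjectives, the tagged pairs can be filtered by the Penn Treebank prefixes listed in the table. A minimal sketch:

# Group tagged tokens by part of speech (sketch).
tagged = nltk.pos_tag(text_tokens)
nouns = [w for w, t in tagged if t.startswith('NN')]
pronouns = [w for w, t in tagged if t.startswith('PRP') or t == 'WP']
verbs = [w for w, t in tagged if t.startswith('VB')]
adjectives = [w for w, t in tagged if t.startswith('JJ')]
print("Nouns:", nouns)
print("Pronouns:", pronouns)
print("Verbs:", verbs)
print("Adjectives:", adjectives)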

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] /root/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
[('In', 'IN'),
('India', 'NNP'),
('people', 'NNS'),
('are', 'VBP'),
('using', 'VBG'),
('their', 'PRP$'),
('natural', 'JJ'),
('language', 'NN'),
('for', 'IN'),
('communication', 'NN'),
('.', '.'),
('English', 'NNP'),
('is', 'VBZ'),
('the', 'DT'),
('most', 'RBS'),
('language', 'NN'),
('in', 'IN'),
('all', 'DT'),
('over', 'IN'),

('the', 'DT'),
('world', 'NN')]

Screenshot:

Result:
Nouns, pronouns, verbs and adjectives were extracted from the given text using POS tags.

Ex.no:10
Find the similarity between two words and similarity between
two documents using Word2Vec.
Date:

Aim :
Write a program to find the similarity between two words and between two documents using Word2Vec.
Procedure:
1. The nltk library is imported, from which the Brown corpus is downloaded in the next step.
2. Gensim is imported. If Gensim Word2Vec is not installed, please install it using the command "pip3 install gensim".
3. Pass the preprocessed sentences to the Word2Vec model, which is imported from Gensim.
4. The vocabulary is stored by the model.
5. The model is tested on sample words, as these sentences are related to India and Britain.
6. Here the similarity between "india" and "britain" and the words most similar to "india" are predicted by the model.
Coding:

from nltk.corpus import brown


from gensim.models import Word2Vec
import string
import nltk
from gensim.test.utils import common_texts

nltk.download("brown")

# Preprocessing data to lowercase all words and remove single punctuation words
vAR_document = brown.sents()

vAR_preprocessed = []
for sent in vAR_document:
    vAR_1 = []
    for word in sent:
        vAR_lowercase = word.lower()
        if vAR_lowercase[0] not in string.punctuation:
            vAR_1.append(vAR_lowercase)
    if len(vAR_1) > 0:
        vAR_preprocessed.append(vAR_1)

# Creating Word2Vec
model = Word2Vec(sentences=vAR_preprocessed)
print(vAR_document)

vAR_w2v = model.wv['india']
print("vector for word is",vAR_w2v)
vAR_sim_words = model.wv.similarity('india', 'britain')
print("Similarity between 'india' and 'britain':", vAR_sim_words)
vAR_nsim = model.wv.most_similar('india', topn=10)
print("Words most similar to 'india':", vAR_nsim)

vAR_w2v_1 = model.wv['india']
vAR_w2v_2 = model.wv['britain']
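
The aim also asks for document similarity. One common approach is gensim's n_similarity(), which compares the mean vectors of two token lists. The two short example sentences below are hypothetical and assume every token is in the trained vocabulary:

# Document similarity (sketch): cosine similarity of the mean vectors of two token lists.
vAR_doc1 = ['india', 'is', 'a', 'large', 'country']
vAR_doc2 = ['britain', 'is', 'a', 'small', 'country']
vAR_doc_sim = model.wv.n_similarity(vAR_doc1, vAR_doc2)
print("Document similarity:", vAR_doc_sim)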

Output:
[['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', "Atlanta's", 'recent',
'primary', 'election', 'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities', 'took', 'place',
'.'], ['The', 'jury', 'further', 'said', 'in', 'term-end', 'presentments', 'that', 'the', 'City', 'Executive',
'Committee', ',', 'which', 'had', 'over-all', 'charge', 'of', 'the', 'election', ',', '``', 'deserves', 'the', 'praise',
'and', 'thanks', 'of', 'the', 'City', 'of', 'Atlanta', "''", 'for', 'the', 'manner', 'in', 'which', 'the', 'election',
'was', 'conducted', '.'], ...]
vector for word is [-1.46023659e-02 -3.97142351e-01 -3.46496969e-01 2.92024426e-02

-6.06963992e-01 -1.69270322e-01 -2.08131835e-01 2.05122411e-01
-3.98775429e-01 7.15890080e-02 3.99377942e-01 2.81080846e-02
2.58334190e-01 1.02505200e-01 6.56543136e-01 -2.50810415e-01
2.40424704e-02 2.95228530e-02 7.16620609e-02 3.53642344e-01
6.23054653e-02 -4.44498621e-02 2.75789171e-01 -4.25714791e-01
-2.22657254e-04 -1.29600152e-01 2.08670542e-01 -2.15745702e-01
5.01390934e-01 2.05001235e-02 8.83393362e-02 9.43912379e-03
2.37939730e-01 -2.68860251e-01 4.25316006e-01 5.78801744e-02
-6.81024641e-02 5.04875965e-02 -2.94249445e-01 -3.77295196e-01
-2.33928598e-02 2.27336884e-01 -2.03150421e-01 -4.60095733e-01
-7.28669465e-02 1.44617528e-01 9.84417871e-02 3.10397744e-01
2.23221675e-01 1.98421299e-01 -6.95077479e-01 -4.73527819e-01
5.23682218e-03 2.56097317e-01 -3.91944677e-01 2.82986183e-03
3.82751413e-02 -6.79237843e-02 -1.77189022e-01 1.60648689e-01
-3.91146541e-01 3.82104039e-01 -3.77353460e-01 -4.69972491e-02
-3.40877384e-01 4.51696590e-02 9.03422832e-02 -4.73987788e-01
-2.98473351e-02 -2.83494350e-02 1.20413445e-01 -1.05972020e-02
5.24148829e-02 2.46571988e-01 -3.34149301e-01 2.41344586e-01
1.95572972e-02 7.13833496e-02 -2.57676572e-01 2.43796315e-02
1.83959201e-01 2.37521097e-01 -5.56657836e-02 1.38133556e-01
-2.13655666e-01 2.24126860e-01 8.80653113e-02 3.57075363e-01
7.40339577e-01 8.63423720e-02 3.35932933e-02 -9.80085880e-02
-2.83281595e-01 5.67644954e-01 9.64761749e-02 -1.21349290e-01
-2.33463109e-01 3.70934844e-01 -2.25646347e-01 -1.05537347e-01]

Screenshot:

Result:
Using Word2Vec, the similarity between two words (and between two documents) was computed through this program.

Ex.no:11
Find the similarity between two sentences using cosine similarity.
Date:

Aim :
Write a program to find the similarity between two sentences using cosine similarity.
Procedure:
Cosine similarity and nltk toolkit module are used in this program. To execute this program nltk
must be installed in your system. In order to install nltk module follow the steps below –
 sudo pip3 install nltk
 python3
 import nltk
 nltk.download(‘all’)
Functions used
nltk.tokenize: It is used for tokenization. Tokenization is the process by which a big quantity of text is divided into smaller parts called tokens. word_tokenize(X) splits the given sentence X into words and returns a list.
nltk.corpus: In this program, it is used to get a list of stopwords. A stop word is a commonly used word (such as "the", "a", "an", "in").

Coding:
# Program to measure the similarity between
# two sentences using cosine similarity.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
nltk.download('punkt')
nltk.download('stopwords')

# X = input("Enter first string: ").lower()


# Y = input("Enter second string: ").lower()
X ="I love horror movies"

35
Y ="Lights out is a horror movie"

# tokenization
X_list = word_tokenize(X)
Y_list = word_tokenize(Y)

# sw contains the list of stopwords


sw = stopwords.words('english')
l1 =[];l2 =[]

# remove stop words from the string


X_set = {w for w in X_list if not w in sw}
Y_set = {w for w in Y_list if not w in sw}

# form a set containing keywords of both strings


rvector = X_set.union(Y_set)
for w in rvector:
    if w in X_set: l1.append(1)  # create a vector
    else: l1.append(0)
    if w in Y_set: l2.append(1)
    else: l2.append(0)
c=0

# cosine formula
for i in range(len(rvector)):
    c += l1[i] * l2[i]
cosine = c / float((sum(l1)*sum(l2))**0.5)
print("similarity: ", cosine)

Output:
similarity: 0.2886751345948129

Screen shot:

Result:
The program to find the cosine similarity between two sentences was executed successfully.

Ex.no:12
Summarize the given text using the summarization algorithms available.
Date:

Aim :
Write a program to summarize the given text using summarization algorithms.
Procedure:
 Import libraries
 Pre-processing
 Removing square brackets and extra spaces
 Removing special characters and digits
 Converting text to sentences
 Calculate weighted sentence scores
 Getting summary
Coding:

pip install --upgrade pip
pip install beautifulsoup4
pip install lxml
pip install nltk

import bs4 as bs
import urllib.request
import re
import nltk
nltk.download('punkt')
nltk.download('stopwords')

scraped_data = urllib.request.urlopen('https://en.wikipedia.org/wiki/Severe_acute_respiratory_syndrome_coronavirus_2')
article = scraped_data.read()

38
parsed_article = bs.BeautifulSoup(article,'lxml')

paragraphs = parsed_article.find_all('p')

article_text = ""

for p in paragraphs:
    article_text += p.text
# Removing Square Brackets and Extra Spaces
article_text = re.sub(r'\[[0-9]*\]', ' ', article_text)
article_text = re.sub(r'\s+', ' ', article_text)
# Removing special characters and digits
formatted_article_text = re.sub('[^a-zA-Z]', ' ', article_text )
formatted_article_text = re.sub(r'\s+', ' ', formatted_article_text)
sentence_list = nltk.sent_tokenize(article_text)
stopwords = nltk.corpus.stopwords.words('english')

word_frequencies = {}
for word in nltk.word_tokenize(formatted_article_text):
    if word not in stopwords:
        if word not in word_frequencies.keys():
            word_frequencies[word] = 1
        else:
            word_frequencies[word] += 1

maximum_frequency = max(word_frequencies.values())
for word in word_frequencies.keys():
    word_frequencies[word] = word_frequencies[word] / maximum_frequency

sentence_scores = {}
for sent in sentence_list:
    for word in nltk.word_tokenize(sent.lower()):
        if word in word_frequencies.keys():
            if len(sent.split(' ')) < 30:
                if sent not in sentence_scores.keys():
                    sentence_scores[sent] = word_frequencies[word]
                else:
                    sentence_scores[sent] += word_frequencies[word]
import heapq
summary_sentences = heapq.nlargest(7, sentence_scores, key=sentence_scores.get)

summary = ' '.join(summary_sentences)


print(summary)
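
Since the exercise title mentions different summarization algorithms, an abstractive alternative can be sketched with the Hugging Face transformers pipeline. This is only a sketch: it assumes the transformers package is installed and truncates the article so the input fits the default model's length limit.

# Abstractive summarization with a pretrained transformer (sketch).
# !pip3 install transformers
from transformers import pipeline

summarizer = pipeline("summarization")        # default pretrained summarization model
chunk = article_text[:3000]                   # keep the input within the model's limit
abstractive = summarizer(chunk, max_length=130, min_length=30, do_sample=False)
print(abstractive[0]['summary_text'])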

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
SARS-CoV-2 is a virus of the species severe acute respiratory syndrome–related coronavirus
(SARSr-CoV), related to the SARS-CoV-1 virus that caused the 2002–2004 SARS outbreak. Other
studies have suggested that the virus may be airborne as well, with aerosols potentially being able
to transmit the virus. The host protein neuropilin 1 (NRP1) may aid the virus in host cell entry
using ACE2. During the initial outbreak in Wuhan, China, various names were used for the virus;
some names used by different sources included "the coronavirus" or "Wuhan coronavirus". The
virus previously had a provisional name, 2019 novel coronavirus (2019-nCoV), and has also been
called human coronavirus 2019 (HCoV-19 or hCoV-19). Severe acute respiratory syndrome
coronavirus 2 (SARS-CoV-2) is a strain of coronavirus that causes COVID-19 (coronavirus
disease 2019), the respiratory illness responsible for the ongoing COVID-19 pandemic. The
original source of viral transmission to humans remains unclear, as does whether the virus became
pathogenic before or after the spillover event.

Screenshot:

Result :
The summarization of text has been executed successfully.

Ex.no:13
Build a text classifier with TextBlob and train a text classifier
using Simple transformers.

Date:

Aim :
Write a program to build a text classifier with TextBlob and train a text classifier using Simple Transformers.
Instructions:
 Our first classifier will be a simple sentiment analyzer trained on a small dataset of fake
tweets.
 To begin, we'll import the textblob.classifiers and create some training and test data.
 We create a new classifier by passing training data into the constructor for
a NaiveBayesClassifier.
 We can now classify arbitrary text using the NaiveBayesClassifier.classify(text) method.
 Another way to classify strings of text is to use TextBlob objects. You can pass classifiers into the constructor of a TextBlob, then call the classify() method on the blob.
 Use TextBlob's sentence tokenization and classify each sentence individually.
 Check the accuracy on the test set.
 Find the most informative features.

Coding:
import nltk
nltk.download('punkt')
from textblob.classifiers import NaiveBayesClassifier
from textblob import TextBlob

train = [
('I love this sandwich.', 'pos'),

('This is an amazing place!', 'pos'),
('I feel very good about these beers.', 'pos'),
('This is my best work.', 'pos'),
("What an awesome view", 'pos'),
('I do not like this restaurant', 'neg'),
('I am tired of this stuff.', 'neg'),
("I can't deal with this", 'neg'),
('He is my sworn enemy!', 'neg'),
('My boss is horrible.', 'neg')
]
test = [
('The beer was good.', 'pos'),
('I do not enjoy my job', 'neg'),
("I ain't feeling dandy today.", 'neg'),
("I feel amazing!", 'pos'),
('Gary is a friend of mine.', 'pos'),
("I can't believe I'm doing this.", 'neg')
]

cl = NaiveBayesClassifier(train)

# Classify some text


print(cl.classify("Their burgers are amazing.")) # "pos"
print(cl.classify("I don't like their pizza.")) # "neg"

# Classify a TextBlob
blob = TextBlob("The beer was amazing. But the hangover was horrible. "
"My boss was not pleased.", classifier=cl)
print(blob)
print(blob.classify())

for sentence in blob.sentences:
    print(sentence)
    print(sentence.classify())

# Compute accuracy
print("Accuracy: {0}".format(cl.accuracy(test)))

# Show 5 most informative features


cl.show_informative_features(5)
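
The second half of the exercise asks for a classifier trained with Simple Transformers. The sketch below is an assumption-laden outline: it reuses the TextBlob training tuples, maps the labels to integers (0 = negative, 1 = positive), and trains a small BERT model with the simpletransformers package (installable with pip3 install simpletransformers).

# Text classification with Simple Transformers (sketch).
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Reuse the TextBlob training data, mapping labels to integers.
train_df = pd.DataFrame(
    [(text, 1 if label == 'pos' else 0) for text, label in train],
    columns=["text", "labels"])

st_model = ClassificationModel("bert", "bert-base-cased", use_cuda=False,
                               args={"num_train_epochs": 1, "overwrite_output_dir": True})
st_model.train_model(train_df)

predictions, raw_outputs = st_model.predict(["Their burgers are amazing."])
print(predictions)   # 1 -> positive, 0 -> negative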

Output:
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
pos
neg
The beer was amazing. But the hangover was horrible. My boss was not pleased.
neg
The beer was amazing.
pos
But the hangover was horrible.
neg
My boss was not pleased.
neg
Accuracy: 0.8333333333333334
Most Informative Features
contains(this) = True neg : pos = 2.3 : 1.0
contains(this) = False pos : neg = 1.8 : 1.0
contains(This) = False neg : pos = 1.6 : 1.0
contains(an) = False neg : pos = 1.6 : 1.0
contains(I) = False pos : neg = 1.4 : 1.0

Screenshot:

Result:
Text classification using TextBlob was executed successfully.

Ex.no:14
Create a Question-Answering system from given context

Date:

Aim:
Write a program to create a question-answering system from a given context.
Procedure:
Install transformers
Import pipeline as QA
Create a variable for question and paragraph
Object model for producing output from the paragraph
Print the question and answer
Coding:
!pip3 install transformers
from transformers import pipeline as QA

obj_model = QA("question-answering",model="distilbert-base-cased-distilled-squad")
#first argument - "question-answering" is the task
#second argument is the bert-base-cased-distilled, trained on Squad Dataset
vAR_Ques = "What is the model for Question Generation"
vAR_para = "Deep learning is an important element of data science, which includes statistics and
predictive modeling.Transformers is a model for question generation"
vAR_out = obj_model(question = vAR_Ques, context = vAR_para)
print("Question:",vAR_Ques)
print("Answer:",vAR_out['answer'])

Output:
Question: What is the model for Question Generation

Answer: Transformers
Screen shot:

Result:
A question-answering system was created for the given context through this program.

Ex.no:15
Classify a text as a positive, negative, or neutral sentiment
with NLP models

Date:
Aim:
Write a program to classify a text as a positive, negative, or neutral sentiment
with NLP models.

Coding:
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')
vAR_filepath = r'/content/drive/MyDrive/Introduction_to_Natural_Language_Processing/Unit9/IMDB_Dataset.csv'
# Read Input data
vAR_df=pd.read_csv(vAR_filepath)
vAR_df.head()

train = vAR_df[:1000]
import numpy as np
vAR_train = np.array(train)
import nltk
nltk.download('punkt')
from textblob.classifiers import NaiveBayesClassifier

obj_cls = NaiveBayesClassifier(vAR_train)

vAR_pred = obj_cls.classify("The movie was boring")
print(vAR_pred)
vAR_test = vAR_df[1000:1050]['review']
vAR_test = np.array(vAR_test)

vAR_pred_array = []
for i in vAR_test:
    vAR_pred = obj_cls.classify(i)
    vAR_pred_array.append(vAR_pred)
vAR_test_label = vAR_df[1000:1050]['sentiment']
vAR_test_label = np.array(vAR_test_label)
vAR_test_label
vAR_pred_array

from sklearn.metrics import classification_report


from sklearn.metrics import accuracy_score
vAR_acc = accuracy_score(vAR_test_label, vAR_pred_array)
print("Accuracy using textblob is:",vAR_acc*100,"%")

Output:
array(['negative', 'negative', 'negative', 'positive', 'positive', 'negative', 'negative', 'positive',
'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'positive', 'positive', 'positive',
'negative', 'positive', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative', 'negative',
'positive', 'negative', 'negative', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative',
'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative',
'positive', 'positive', 'positive', 'positive', 'positive', 'positive'], dtype=object)

['negative',
'negative',
'negative',
'positive',
'positive',
'positive',
'negative',
'positive',
'negative',
'negative',
'positive',
'positive',
'negative',
'negative',
'negative',
'negative',
'negative',
'negative',
'positive',
'negative',
'negative',
'positive',
'negative',
'negative',
'negative',
'negative',
'positive',
'negative',
'negative',
'positive',
'positive',

'negative',
'positive',
'negative',
'negative',
'negative',
'positive',
'positive',
'positive',
'positive',
'negative',
'positive',
'positive',
'negative',
'positive',
'positive',
'positive',
'negative',
'positive',
'negative']
Accuracy using textblob is: 76.0 %
Screenshot:

Result: The program for classifying text sentiment was executed successfully.

Beyond the syllabus:

1. Punkt package for regular-expression-based tokenizer

Aim: Import the punkt package and demonstrate the regular-expression-based tokenizer.

Procedure:
The word_tokenize tokenizer requires the punkt sentence tokenization models to be installed. NLTK also provides a simpler, regular-expression-based tokenizer, wordpunct_tokenize, which splits text on whitespace and punctuation.
Coding:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk.tokenize import wordpunct_tokenize
s = '''Good muffins cost 300 in SRM Canteen. Please buy me
... two of them.\n\nThanks.'''
print(wordpunct_tokenize(s))
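
For comparison, the punkt-based word_tokenize imported above handles contractions differently from the regular-expression tokenizer. A small sketch on a hypothetical sample sentence:

# Contrast the punkt-based tokenizer with the regex-based one (sketch).
vAR_sample = "Don't hesitate to ask questions."
print(word_tokenize(vAR_sample))       # ['Do', "n't", 'hesitate', ...]
print(wordpunct_tokenize(vAR_sample))  # ['Don', "'", 't', 'hesitate', ...]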
Output:
['Good', 'muffins', 'cost', '300', 'in', 'SRM', 'Canteen', '.', 'Please', 'buy', 'me', '...', 'two', 'of',
'them', '.', 'Thanks', '.']
Screen shot

Result:
The punkt package was installed and the regular-expression-based tokenizer was demonstrated.

2. Import Spacy package for Sentence Segmentation

Aim:
Program for splitting the sentences from the paragraphs using sentence
segmentation.
Instruction:
Sentence segmentation is the process of deciding where sentences start and end, in other words dividing a paragraph into sentences. In Python, we implement this part of NLP using the spaCy library.
Coding:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("India, officially the Republic of India, is a country in South Asia. It is the seve
nth-largest country by area, the second-
most populous country, and the most populous democracy in the world. Tamil Nadu, a So
uth Indian state, is famed for its Dravidian-
style Hindu temples. In Madurai, Meenakshi Amman Temple has high ‘gopuram’ towers
ornamented with colourful figures. On Pamban Island, Ramanathaswamy Temple is a pil
grimage site. The town of Kanyakumari, at India’s southernmost tip, is the site of ritual s
unrises.")
for sent in doc.sents:
print(sent.text)
Output:
India, officially the Republic of India, is a country in South Asia.
It is the seventh-largest country by area, the second-most populous country, and the most
populous democracy in the world.
Tamil Nadu, a South Indian state, is famed for its Dravidian-style Hindu temples.
In Madurai, Meenakshi Amman Temple has high ‘gopuram’ towers ornamented with
colourful figures.
On Pamban Island, Ramanathaswamy Temple is a pilgrimage site.

The town of Kanyakumari, at India’s southernmost tip, is the site of ritual sunrises.

Screenshot:

Result:

The spaCy package was installed and used for sentence segmentation.

3. Develop a neural network model using keras

Aim : Write a Python program to develop a neural network model using Keras.

Procedure:
KerasNLP is a simple and powerful API for building Natural Language Processing (NLP) models within the Keras ecosystem. KerasNLP provides modular building blocks following standard Keras interfaces (layers, metrics) that allow you to quickly and flexibly iterate on your task.
Use the NumPy library to load the dataset and two classes from the Keras library to define the model.
Load the dataset. In this Keras tutorial, you will use the Pima Indians onset of diabetes dataset.
Load the file as a matrix of numbers using the NumPy function loadtxt().
Once the CSV file is loaded into memory, split the columns of data into input and output variables.
You are now ready to define your neural network model.
Compile the Keras model, fit it on the data, and evaluate it.
Coding:
# first neural network with keras tutorial
from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))

model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
Screen shot:

Result:

A Python program was developed to build a neural network model using Keras.

4. Python for NLP: Movie Sentiment Analysis

Aim :
Write a program for movie sentiment analysis using NLP

Procedure:
 Load libraries
 Import movie reviews
 Find the categories of movies
 Train and test by the given array values

Coding:
import nltk
nltk.download('movie_reviews')
import random
from nltk.corpus import movie_reviews

reviews = [(list(movie_reviews.words(fileid)), category)
           for category in movie_reviews.categories()
           for fileid in movie_reviews.fileids(category)]
new_train, new_test = reviews[0:100], reviews[101:200]
print(new_train[0])
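
The procedure mentions training and testing on these arrays. One possible next step, sketched below with simple word-presence features, is an NLTK NaiveBayesClassifier; the reviews are shuffled first because the slices above would otherwise contain only one class.

# Train a simple Naive Bayes sentiment classifier on word-presence features (sketch).
from nltk.classify import NaiveBayesClassifier
from nltk.classify.util import accuracy

def word_features(words):
    # Mark every word in the review as a present feature.
    return {w: True for w in words}

random.shuffle(reviews)   # mix 'pos' and 'neg' reviews before splitting
train_feats = [(word_features(words), label) for words, label in reviews[:200]]
test_feats = [(word_features(words), label) for words, label in reviews[200:300]]

clf = NaiveBayesClassifier.train(train_feats)
print("Accuracy:", accuracy(clf, test_feats))
clf.show_most_informative_features(5)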

Output:

[nltk_data] Downloading package movie_reviews to /root/nltk_data...


[nltk_data] Unzipping corpora/movie_reviews.zip.
(['plot', ':', 'two', 'teen', 'couples', 'go', 'to', 'a', 'church', 'party', ',', 'drink', 'and', 'then', 'drive', '.', 'they', 'get',
'into', 'an', 'accident', '.', 'one', 'of', 'the', 'the', 'guys', 'dies', ',', 'but', 'his',
'girlfriend', 'continues', 'to', 'see', 'him', 'in', 'her', 'life', ',',
'and', 'has', 'nightmares', '.', 'what', "'", 's', 'the', 'deal', '?',
'watch', 'the', 'movie', 'and', '"', 'sorta', '"', 'find', 'out', '.',
'.', '.', 'critique', ':', 'a', 'mind', '-', 'fuck', 'movie', 'for',
'the', 'teen', 'generation', 'that', 'touches', 'on', 'a', 'very', 'cool',
'idea', ',', 'but', 'presents', 'it', 'in', 'a', 'very', 'bad', 'package',
'.', 'which', 'is', 'what', 'makes', 'this', 'review', 'an', 'even',
'harder', 'one', 'to', 'write', ',', 'since', 'i', 'generally', 'applaud',
'films', 'which', 'attempt', 'to', 'break', 'the', 'mold', ',', 'mess',
'with', 'your', 'head', 'and', 'such', '(', 'lost', 'highway', '&',
'memento', ')', ',', 'but', 'there', 'are', 'good', 'and', 'bad', 'ways',
'of', 'making', 'all', 'types', 'of', 'films', ',', 'and', 'these',
'folks', 'just', 'didn', "'", 't', 'snag', 'this', 'one', 'correctly',
'.', 'they', 'seem', 'to', 'have', 'taken', 'this', 'pretty', 'neat',
'concept', ',', 'but', 'executed', 'it', 'terribly', '.', 'so', 'what',
'are', 'the', 'problems', 'with', 'the', 'movie', '?', 'well', ',', 'its',
'main', 'problem', 'is', 'that', 'it', "'", 's', 'simply', 'too',
'jumbled', '.', 'it', 'starts', 'off', '"', 'normal', '"', 'but', 'then',
'downshifts', 'into', 'this', '"', 'fantasy', '"', 'world', 'in', 'which',

'you', ',', 'as', 'an', 'audience', 'member', ',', 'have', 'no', 'idea',
'what', "'", 's', 'going', 'on', '.', 'there', 'are', 'dreams', ',',
'there', 'are', 'characters', 'coming', 'back', 'from', 'the', 'dead',
',', 'there', 'are', 'others', 'who', 'look', 'like', 'the', 'dead', ',',
'there', 'are', 'strange', 'apparitions', ',', 'there', 'are',
'disappearances', ',', 'there', 'are', 'a', 'looooot', 'of', 'chase',
'scenes', ',', 'there', 'are', 'tons', 'of', 'weird', 'things', 'that',
'happen', ',', 'and', 'most', 'of', 'it', 'is', 'simply', 'not',
'explained', '.', 'now', 'i', 'personally', 'don', "'", 't', 'mind',
'trying', 'to', 'unravel', 'a', 'film', 'every', 'now', 'and', 'then',
',', 'but', 'when', 'all', 'it', 'does', 'is', etc….

Screenshot:

Result:

Movie reviews were loaded and split into training and test sets for sentiment analysis by this program.

5. Emoji in Python

Aim : Write a program to create emojis using the emoji module.

Procedure:
Using the emoji module:
In Python, the emoji module is used to render emoji characters. To install it, run the following command in the terminal:
pip install emoji
Coding:
!pip install emoji
import emoji
print(emoji.emojize(":grinning_face_with_big_eyes:"))
print(emoji.emojize(":winking_face_with_tongue:"))
print(emoji.emojize(":zipper-mouth_face:"))

Result:
Emoji was created successfully.

