Basic introduction to NLTK

AUTHOR : Aditya Ojha

The only prerequisite for this notebook is that you love Python :)

In [20]: nltk.download('movie_reviews')

[nltk_data] Downloading package movie_reviews to

[nltk_data] C:\Users\ABC\AppData\Roaming\nltk_data...
[nltk_data] Unzipping corpora\
Out[20]: True

In [1]: !pip install nltk

Requirement already satisfied: nltk in c:\users\abc\anaconda3\lib\site-packages (3.4)

Requirement already satisfied: six in c:\users\abc\anaconda3\lib\site-packages (from nltk) (1.12.0)
Requirement already satisfied: singledispatch in c:\users\abc\anaconda3\lib\site-packages (from nltk) (

Tokenizing means splitting text into units such as words or sentences.

Importing necessary packages

In [1]: import nltk

from nltk import sent_tokenize, word_tokenize

Sample Data

In [2]: para = "This is sample text. We are testing nltk packages. Do not disappoint us."

Sentence Tokenization

In [3]: a = sent_tokenize(para)

In [4]: a

Out[4]: ['This is sample text.',

'We are testing nltk packages.',
'Do not disappoint us.']

In [5]: for j in sent_tokenize(para):
    print(j)

This is sample text.

We are testing nltk packages.
Do not disappoint us.

Word Tokenization

In [6]: b = word_tokenize(para)

In [7]: b

Out[7]: ['This',
'not',
NLTK-Tutorial-/nltk_practice1.ipynb at master · adityaojha07/NLTK-Tutorial-

In [8]: for i in word_tokenize(para):
    print(i)


Stop words are words that do not have much impact on sentence analysis, for example is, am, was, will, etc.

In [9]: from nltk.corpus import stopwords

In [10]: example_sentence = "This is an example of stop words filtaration. Hope it will run"

In [11]: stop_words = set(stopwords.words("english"))

In [12]: print(stop_words)

{'where', 'have', 'than', 'or', "haven't", 'some', 'because', "mightn't", 'your', 'its', "weren't",
"couldn't", 'been', 'did', 'these', 'it', 'for', 'wouldn', 'being', 'my', 'wasn', 'mightn', "should
n't", 'his', 'ma', 'same', 'has', 'do', 'will', 'couldn', 'when', 'won', 'each', 'doing', 'over', "t
hat'll", 'all', 'below', 're', 'any', 'you', 'which', "didn't", "wouldn't", 'as', 'once', "should'v
e", "you'll", 'before', 'why', 'at', 'after', "won't", 'a', 'then', 'above', 'most', 'hadn', 'y', 'n
ow', 'shouldn', 'only', 'itself', 'ours', 'aren', 'her', 'those', 'in', 'just', 'up', 'this', 'thems
elves', "doesn't", 'an', 'there', 'she', 'own', 'whom', 'how', 'not', 'were', "she's", 'should',
'd', 'so', "it's", 'himself', 'me', 'are', 'if', 'who', 'mustn', 'with', 'few', 'haven', 'that', 'b
y', "don't", 'm', 'theirs', 'the', 'between', 's', 'nor', "mustn't", "needn't", 'weren', 'to', 'ou
t', 'here', 'we', 'until', 've', "shan't", 'further', 'into', 'is', "you're", 'our', 'yourselves',
'while', "isn't", 'am', "aren't", 'and', 'during', 'needn', 'i', 'hers', 'them', 'their', 'having',
'o', 'was', 'yourself', 'of', 'didn', 'doesn', 'can', 'had', 'other', 'very', 'ain', "you'd", 'abou
t', 'off', 'he', 'under', 'myself', 'but', "hasn't", 't', 'through', "wasn't", "you've", 'too', "had
n't", 'him', 'be', 'against', 'on', 'they', 'isn', 'what', 'both', 'down', 'll', 'yours', 'no', 'do
n', 'herself', 'hasn', 'does', 'more', 'from', 'shan', 'ourselves', 'again', 'such'}

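In [13]: words = word_tokenize(example_sentence)

filtered_sentence = []

for w in words:
    if w not in stop_words:
        # keep only the tokens that are not stop words
        filtered_sentence.append(w)

print(filtered_sentence)
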
['This', 'example', 'stop', 'words', 'filtaration', '.', 'Hope', 'run']
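
Notice that 'This' survives in the output above: the membership test is case-sensitive, and the English stop word list printed earlier is all lowercase. A minimal sketch of a case-insensitive filter follows. The small `toy_stop_words` set and the hand-written token list are stand-ins (assumptions for illustration), so the block runs without downloading any corpus.

```python
# Hypothetical, hand-written stand-in for set(stopwords.words("english")).
toy_stop_words = {"this", "is", "an", "of", "it", "will"}

# Tokens of the example sentence, written out by hand.
tokens = ["This", "is", "an", "example", "of", "stop", "words",
          "filtaration", ".", "Hope", "it", "will", "run"]

# Lowercase each token only for the membership test; keep the original casing.
filtered = [w for w in tokens if w.lower() not in toy_stop_words]
print(filtered)
```

With the real NLTK list, the same `w.lower() not in stop_words` test could replace the condition in the loop above.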

Stemming reduces a word to its root form. For example, the root of 'riding' is 'ride'.


In [14]: from nltk.stem import PorterStemmer

In [15]: ps = PorterStemmer()

In [16]: example_words = ["python","pythoner","pythoning","pythoned","pythonly"]

for w in example_words:
    print(ps.stem(w))

In [17]: sample = "You are not considering the considerable performance considerably. Its consequences will not be considerd"
words = sent_tokenize(sample)
sample = sample.split(' ')

for w in sample:
    # stem each whitespace-separated word
    print(ps.stem(w))


Snowball Stemming

In [18]: from nltk.stem import SnowballStemmer

sb = SnowballStemmer("english")

In [19]: example_words = ["python","pythoner","pythoning","pythoned","pythonly"]

for w in example_words:
    print(sb.stem(w))

In [20]: sample = ("You are not considering the considerable performance considerably. Its consequences will not be considerd")
words = word_tokenize(sample)
sample = sample.split(' ')

for w in sample:
    # stem each whitespace-separated word
    print(sb.stem(w))



In [3]: import nltk

from nltk import sent_tokenize, word_tokenize
from nltk.corpus import state_union  # State of the Union addresses by various American presidents
#from nltk.tokenize import sent_tokenize
from nltk.tokenize import PunktSentenceTokenizer

In [22]: train_text = state_union.raw("2006-GWBush.txt")

sample_text = state_union.raw("2005-GWBush.txt")

In [23]: custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

In [24]: def process_content():
    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)  # list of (word, tag) pairs
            print(tagged)

    except Exception as e:
        print(str(e))

process_content()

[('PRESIDENT', 'NNP'), ('GEORGE', 'NNP'), ('W.', 'NNP'), ('BUSH', 'NNP'), ("'S", 'POS'), ('ADDRESS',
'NNP'), ('BEFORE', 'IN'), ('A', 'NNP'), ('JOINT', 'NNP'), ('SESSION', 'NNP'), ('OF', 'IN'), ('THE',
'NNP'), ('CONGRESS', 'NNP'), ('ON', 'NNP'), ('THE', 'NNP'), ('STATE', 'NNP'), ('OF', 'IN'), ('THE',
'NNP'), ('UNION', 'NNP'), ('February', 'NNP'), ('2', 'CD'), (',', ','), ('2005', 'CD'), ('9:10', 'C
D'), ('P.M', 'NNP'), ('.', '.')]
[('EST', 'IN'), ('THE', 'NNP'), ('PRESIDENT', 'NNP'), (':', ':'), ('Mr.', 'NNP'), ('Speaker', 'NN
P'), (',', ','), ('Vice', 'NNP'), ('President', 'NNP'), ('Cheney', 'NNP'), (',', ','), ('members',
'NNS'), ('of', 'IN'), ('Congress', 'NNP'), (',', ','), ('fellow', 'JJ'), ('citizens', 'NNS'), (':',
':'), ('As', 'IN'), ('a', 'DT'), ('new', 'JJ'), ('Congress', 'NNP'), ('gathers', 'NNS'), (',', ','),
('all', 'DT'), ('of', 'IN'), ('us', 'PRP'), ('in', 'IN'), ('the', 'DT'), ('elected', 'JJ'), ('branch
es', 'NNS'), ('of', 'IN'), ('government', 'NN'), ('share', 'NN'), ('a', 'DT'), ('great', 'JJ'), ('pr
ivilege', 'NN'), (':', ':'), ('We', 'PRP'), ("'ve", 'VBP'), ('been', 'VBN'), ('placed', 'VBN'), ('i
n', 'IN'), ('office', 'NN'), ('by', 'IN'), ('the', 'DT'), ('votes', 'NNS'), ('of', 'IN'), ('the', 'D
T'), ('people', 'NNS'), ('we', 'PRP'), ('serve', 'VBP'), ('.', '.')]
[('And', 'CC'), ('tonight', 'NN'), ('that', 'WDT'), ('is', 'VBZ'), ('a', 'DT'), ('privilege', 'NN'),
('we', 'PRP'), ('share', 'NN'), ('with', 'IN'), ('newly-elected', 'JJ'), ('leaders', 'NNS'), ('of',
'IN'), ('Afghanistan', 'NNP'), (',', ','), ('the', 'DT'), ('Palestinian', 'JJ'), ('Territories', 'NN
P'), (',', ','), ('Ukraine', 'NNP'), (',', ','), ('and', 'CC'), ('a', 'DT'), ('free', 'JJ'), ('and',
'CC'), ('sovereign', 'JJ'), ('Iraq', 'NNP'), ('.', '.')]
[('(', '('), ('Applause', 'NNP'), ('.', '.'), (')', ')')]
[('Two', 'CD'), ('weeks', 'NNS'), ('ago', 'RB'), (',', ','), ('I', 'PRP'), ('stood', 'VBD'), ('on',
'IN'), ('the', 'DT'), ('steps', 'NNS'), ('of', 'IN'), ('this', 'DT'), ('Capitol', 'NNP'), ('and', 'C
C'), ('renewed', 'VBN'), ('the', 'DT'), ('commitment', 'NN'), ('of', 'IN'), ('our', 'PRP$'), ('natio
n', 'NN'), ('to', 'TO'), ('the', 'DT'), ('guiding', 'VBG'), ('ideal', 'NN'), ('of', 'IN'), ('libert
y', 'NN'), ('for', 'IN'), ('all', 'DT'), ('.', '.')]
[('This', 'DT'), ('evening', 'NN'), ('I', 'PRP'), ('will', 'MD'), ('set', 'VB'), ('forth', 'JJ'),
('policies', 'NNS'), ('to', 'TO'), ('advance', 'VB'), ('that', 'DT'), ('ideal', 'NN'), ('at', 'IN'),
('home', 'NN'), ('and', 'CC'), ('around', 'IN'), ('the', 'DT'), ('world', 'NN'), ('.', '.')]
[('Tonight', 'NNP'), (',', ','), ('with', 'IN'), ('a', 'DT'), ('healthy', 'JJ'), (',', ','), ('growi
ng', 'VBG'), ('economy', 'NN'), (',', ','), ('with', 'IN'), ('more', 'JJR'), ('Americans', 'NNS'),
('going', 'VBG'), ('back', 'RB'), ('to', 'TO'), ('work', 'NN'), (',', ','), ('with', 'IN'), ('our',
'PRP$'), ('nation', 'NN'), ('an', 'DT'), ('active', 'JJ'), ('force', 'NN'), ('for', 'IN'), ('good',
'JJ'), ('in', 'IN'), ('the', 'DT'), ('world', 'NN'), ('--', ':'), ('the', 'DT'), ('state', 'NN'),
('of', 'IN'), ('our', 'PRP$'), ('union', 'NN'), ('is', 'VBZ'), ('confident', 'JJ'), ('and', 'CC'),
('strong', 'JJ'), ('.', '.')]
[('(', '('), ('Applause', 'NNP'), ('.', '.'), (')', ')')]
[('Our', 'PRP$'), ('generation', 'NN'), ('has', 'VBZ'), ('been', 'VBN'), ('blessed', 'VBN'), ('--',
':'), ('by', 'IN'), ('the', 'DT'), ('expansion', 'NN'), ('of', 'IN'), ('opportunity', 'NN'), (',',
','), ('by', 'IN'), ('advances', 'NNS'), ('in', 'IN'), ('medicine', 'NN'), (',', ','), ('by', 'IN'),
('the', 'DT'), ('security', 'NN'), ('purchased', 'VBN'), ('by', 'IN'), ('our', 'PRP$'), ('parents',
'NNS'), ("'", 'POS'), ('sacrifice', 'NN'), ('.', '.')]
[('Now', 'RB'), (',', ','), ('as', 'IN'), ('we', 'PRP'), ('see', 'VBP'), ('a', 'DT'), ('little', 'J
J'), ('gray', 'NN'), ('in', 'IN'), ('the', 'DT'), ('mirror', 'NN'), ('--', ':'), ('or', 'CC'), ('a',
'DT'), ('lot', 'NN'), ('of', 'IN'), ('gray', 'NN'), ('--', ':'), ('(', '('), ('laughter', 'NN'),
(')', ')'), ('--', ':'), ('and', 'CC'), ('we', 'PRP'), ('watch', 'VBP'), ('our', 'PRP$'), ('childre
n', 'NNS'), ('moving', 'VBG'), ('into', 'IN'), ('adulthood', 'NN'), (',', ','), ('we', 'PRP'), ('as
k', 'VBP'), ('the', 'DT'), ('question', 'NN'), (':', ':'), ('What', 'WP'), ('will', 'MD'), ('be', 'V
B'), ('the', 'DT'), ('state', 'NN'), ('of', 'IN'), ('their', 'PRP$'), ('union', 'NN'), ('?', '.')]
[('Members', 'NNS'), ('of', 'IN'), ('Congress', 'NNP'), (',', ','), ('the', 'DT'), ('choices', 'NN
S'), ('we', 'PRP'), ('make', 'VBP'), ('together', 'RB'), ('will', 'MD'), ('answer', 'VB'), ('that',
'DT'), ('question', 'NN'), ('.', '.')]
[('Over', 'IN'), ('the', 'DT'), ('next', 'JJ'), ('several', 'JJ'), ('months', 'NNS'), (',', ','),
('on', 'IN'), ('issue', 'NN'), ('after', 'IN'), ('issue', 'NN'), (',', ','), ('let', 'VB'), ('us',
'PRP'), ('do', 'VB'), ('what', 'WP'), ('Americans', 'NNPS'), ('have', 'VBP'), ('always', 'RB'), ('do
ne', 'VBN'), (',', ','), ('and', 'CC'), ('build', 'VB'), ('a', 'DT'), ('better', 'JJR'), ('world',
'NN'), ('for', 'IN'), ('our', 'PRP$'), ('children', 'NNS'), ('and', 'CC'), ('our', 'PRP$'), ('grandc
hildren', 'NNS'), ('.', '.')]
[('(', '('), ('Applause', 'NNP'), ('.', '.'), (')', ')')]
[('First', 'RB'), (',', ','), ('we', 'PRP'), ('must', 'MD'), ('be', 'VB'), ('good', 'JJ'), ('steward
s', 'NNS'), ('of', 'IN'), ('this', 'DT'), ('economy', 'NN'), (',', ','), ('and', 'CC'), ('renew', 'V
B'), ('the', 'DT'), ('great', 'JJ'), ('institutions', 'NNS'), ('on', 'IN'), ('which', 'WDT'), ('mill
ions', 'NNS'), ('of', 'IN'), ('our', 'PRP$'), ('fellow', 'JJ'), ('citizens', 'NNS'), ('rely', 'RB'),
('.', '.')]
[('America', 'NNP'), ("'s", 'POS'), ('economy', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('fastest', 'JJ
S'), ('growing', 'NN'), ('of', 'IN'), ('any', 'DT'), ('major', 'JJ'), ('industrialized', 'VBN'), ('n
ation', 'NN'), ('.', '.')]
[('In', 'IN'), ('the', 'DT'), ('past', 'JJ'), ('four', 'CD'), ('years', 'NNS'), (',', ','), ('we',
'PRP'), ('provided', 'VBD'), ('tax', 'NN'), ('relief', 'NN'), ('to', 'TO'), ('every', 'DT'), ('perso
n', 'NN'), ('who', 'WP'), ('pays', 'VBZ'), ('income', 'NN'), ('taxes', 'NNS'), (',', ','), ('overcom
e', 'VBP'), ('a', 'DT'), ('recession', 'NN'), (',', ','), ('opened', 'VBD'), ('up', 'RP'), ('new',
'JJ'), ('markets', 'NNS'), ('abroad', 'RB'), (',', ','), ('prosecuted', 'JJ'), ('corporate', 'JJ'),
('criminals', 'NNS'), (',', ','), ('raised', 'VBD'), ('homeownership', 'NN'), ('to', 'TO'), ('its',
'PRP$'), ('highest', 'JJS'), ('level', 'NN'), ('in', 'IN'), ('history', 'NN'), (',', ','), ('and',
'CC'), ('in', 'IN'), ('the', 'DT'), ('last', 'JJ'), ('year', 'NN'), ('alone', 'RB'), (',', ','), ('t
he', 'DT'), ('United', 'NNP'), ('States', 'NNPS'), ('has', 'VBZ'), ('added', 'VBN'), ('2.3', 'CD'),
('million', 'CD'), ('new', 'JJ'), ('jobs', 'NNS'), ('.', '.')]
[('(', '('), ('Applause', 'NNP'), ('.', '.'), (')', ')')]
[('When', 'WRB'), ('action', 'NN'), ('was', 'VBD'), ('needed', 'VBN'), (',', ','), ('the', 'DT'),
('Congress', 'NNP'), ('delivered', 'VBN'), ('--', ':'), ('and', 'CC'), ('the', 'DT'), ('nation', 'N
N'), ('is', 'VBZ'), ('grateful', 'JJ'), ('.', '.')]
[('Now', 'RB'), ('we', 'PRP'), ('must', 'MD'), ('add', 'VB'), ('to', 'TO'), ('these', 'DT'), ('achie
vements', 'NNS'), ('.', '.')]
[('By', 'IN'), ('making', 'VBG'), ('our', 'PRP$'), ('economy', 'NN'), ('more', 'RBR'), ('flexible',
'JJ'), (',', ','), ('more', 'RBR'), ('innovative', 'JJ'), (',', ','), ('and', 'CC'), ('more', 'RB
R'), ('competitive', 'JJ'), (',', ','), ('we', 'PRP'), ('will', 'MD'), ('keep', 'VB'), ('America',
'NNP'), ('the', 'DT'), ('economic', 'JJ'), ('leader', 'NN'), ('of', 'IN'), ('the', 'DT'), ('world',
'NN'), ('.', '.')]
POS tag list:

CC coordinating conjunction

CD cardinal digit

DT determiner

EX existential there (like: "there is" ... think of it like "there exists")
11/14/2019 NLTK-Tutorial-/nltk_practice1.ipynb at master · adityaojha07/NLTK-Tutorial-
FW foreign word

IN preposition/subordinating conjunction

JJ adjective 'big'

JJR adjective, comparative 'bigger'

JJS adjective, superlative 'biggest'

LS list marker 1)

MD modal could, will

NN noun, singular 'desk'

NNS noun plural 'desks'

NNP proper noun, singular 'Harrison'

NNPS proper noun, plural 'Americans'

PDT predeterminer 'all the kids'

POS possessive ending parent's

PRP personal pronoun I, he, she

PRP$ possessive pronoun my, his, hers

RB adverb very, silently,

RBR adverb, comparative better

RBS adverb, superlative best

RP particle give up

TO to go 'to' the store.

UH interjection errrrrrrrm

VB verb, base form take

VBD verb, past tense took

VBG verb, gerund/present participle taking

VBN verb, past participle taken

VBP verb, sing. present, non-3rd person take

VBZ verb, 3rd person sing. present takes

WDT wh-determiner which

WP wh-pronoun who, what

WP$ possessive wh-pronoun whose

WRB wh-adverb where, when


One of the main goals of chunking is to group words into what are known as "noun phrases." These are phrases of one or more words that contain
a noun, maybe some descriptive words, maybe a verb, and maybe something like an adverb. The idea is to group nouns with the words
that relate to them.

In order to chunk, we combine the part of speech tags with regular expressions. From regular expressions, we will mainly use the following:

+ = match 1 or more

? = match 0 or 1 repetitions.

* = match 0 or MORE repetitions

. = Any character except a new line
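
To see how these operators behave, here is a small sketch using Python's built-in `re` module on part-of-speech tag names. It only illustrates the regex operators themselves; the chunk grammar used below applies the same ideas to tags written inside angle brackets, such as <VB.?>. The `tags` list is made up for the example.

```python
import re

# A few tag names from the POS tag list, chosen for illustration.
tags = ["VB", "VBD", "VBZ", "NN", "NNP", "RB", "RBR"]

# "VB.?"  -> "VB" followed by 0 or 1 of any character
verb_tags = [t for t in tags if re.fullmatch(r"VB.?", t)]

# "NN.*"  -> "NN" followed by 0 or more of any character
noun_tags = [t for t in tags if re.fullmatch(r"NN.*", t)]

# "N+P"   -> one or more "N", then a "P"
proper_tags = [t for t in tags if re.fullmatch(r"N+P", t)]

print(verb_tags)
print(noun_tags)
print(proper_tags)
```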

In [25]: train_text = state_union.raw("2006-GWBush.txt")

sample_text = state_union.raw("2005-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():
    try:
        for i in tokenized[:3]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)

            # optional adverbs, optional verbs, one or more proper nouns, optional noun
            chunkGram = r"""Chunk: {<RB.?>*<VB.?>*<NNP>+<NN>?}"""

            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)

            for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Chunk'):
                print(subtree)

    except Exception as e:
        print(str(e))

process_content()

(Chunk THE/NNP UNION/NNP February/NNP)
(Chunk P.M/NNP)
(Chunk THE/NNP UNION/NNP February/NNP)
(Chunk P.M/NNP)
(Chunk Mr./NNP Speaker/NNP)
(Chunk Vice/NNP President/NNP Cheney/NNP)
(Chunk Congress/NNP)
(Chunk Congress/NNP)
(Chunk Mr./NNP Speaker/NNP)
(Chunk Vice/NNP President/NNP Cheney/NNP)
(Chunk Congress/NNP)
(Chunk Congress/NNP)
(Chunk Afghanistan/NNP)
(Chunk Territories/NNP)
(Chunk Ukraine/NNP)
(Chunk Iraq/NNP)
(Chunk Afghanistan/NNP)
(Chunk Territories/NNP)
(Chunk Ukraine/NNP)
(Chunk Iraq/NNP)


Chinking is a lot like chunking: it is basically a way for you to remove a chunk from a chunk. The chunk that you remove from your chunk is
your chink.

The code is very similar; you just denote the chink, after the chunk, with }{ instead of the chunk's {}.
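
The idea can be sketched without NLTK: start with the whole tagged sentence as one chunk, then cut out every token whose tag matches the chink pattern, which splits the chunk into smaller pieces. This is a simplified stand-in for what a chink rule does, not NLTK's implementation; the helper `chink` and the hand-tagged sample sentence are hypothetical.

```python
import re

def chink(tagged, chink_pattern):
    """Split a tagged sentence into chunks, dropping tokens whose tag matches."""
    chunks, current = [], []
    for word, tag in tagged:
        if re.fullmatch(chink_pattern, tag):
            # A chinked token closes the current chunk and is itself excluded.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append((word, tag))
    if current:
        chunks.append(current)
    return chunks

# Hand-tagged sample sentence (an assumption, not tagger output).
tagged = [("We", "PRP"), ("saw", "VBD"), ("the", "DT"),
          ("yellow", "JJ"), ("dog", "NN")]

chunks = chink(tagged, r"VB.?|IN|DT|TO")
print(chunks)
```

Here the verb and the determiner are chinked out, leaving one chunk before them and one after.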

In [6]: #train_text = state_union.raw("2005-GWBush.txt")

#sample_text = state_union.raw("2006-GWBush.txt")

#custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

#tokenized = custom_sent_tokenizer.tokenize(sample_text)

#def process_content():
# try:
# for i in tokenized[:5]:
# words = nltk.word_tokenize(i)
# tagged = nltk.pos_tag(words)

# chunkGram = r"""Chunk: {<.*>+}

# }<VB.?|IN|DT|TO>+{"""

# chunkParser = nltk.RegexpParser(chunkGram)
# chunked = chunkParser.parse(tagged)

# chunked.draw()

# except Exception as e:
# print(str(e))



The idea of named entity recognition is to have the machine immediately pull out "entities" like people, places, things, locations, monetary figures, and more.

In [7]: import nltk

from nltk import sent_tokenize, word_tokenize
from nltk.corpus import state_union  # State of the Union addresses by various American presidents
from nltk.tokenize import PunktSentenceTokenizer

In [10]: train_text = state_union.raw("2006-GWBush.txt")

sample_text = state_union.raw("2005-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

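def process_content():
    try:
        for i in tokenized[:3]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            # binary=False keeps the entity types instead of a single NE label
            namedEnt = nltk.ne_chunk(tagged, binary=False)
            print(namedEnt)
    except Exception as e:
        print(str(e))

process_content()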


Lemmatizing is similar to stemming. The major difference is that, as you saw earlier, stemming can often create non-existent
words, whereas lemmas are actual words.

In [52]: from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [53]: print(lemmatizer.lemmatize("cats"))


In [56]: #(lemmatizer.lemmatize("better"))
print(lemmatizer.lemmatize("worse", pos='a'))  # a = adjective
#print(lemmatizer.lemmatize("best", pos='a'))



WordNet is a lexical database for the English language.

We can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more. Let's cover some examples.

In [1]: from nltk.corpus import wordnet

In [20]: syns = wordnet.synsets("good")
print(syns)

[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('go

od.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.
01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('d
ear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.
s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset
('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.0

In [13]: print(syns[0].name())


In [14]: print(syns[0].lemmas()[0].name())


In [22]: print(syns[0].definition())


In [21]: print(syns[0].examples())

['for your own good', "what's the good of worrying?"]

Next, we can find synonyms and antonyms of a word. The lemmas will be synonyms, and then we can use .antonyms to find the
antonyms of the lemmas. As such, we can populate some lists like this:

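In [23]: synonyms = []

antonyms = []

for syn in wordnet.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))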

{'serious', 'goodness', 'near', 'estimable', 'undecomposed', 'effective', 'upright', 'full', 'skillf

ul', 'thoroughly', 'ripe', 'proficient', 'salutary', 'good', 'commodity', 'in_force', 'honorable',
'unspoiled', 'just', 'in_effect', 'beneficial', 'secure', 'expert', 'well', 'dependable', 'trade_goo
d', 'honest', 'safe', 'adept', 'practiced', 'respectable', 'dear', 'unspoilt', 'soundly', 'sound',
'skilful', 'right'}
{'evilness', 'ill', 'evil', 'badness', 'bad'}

Next, we can also easily use WordNet to compare the similarity of two words and their tenses, by incorporating the Wu and Palmer method
for semantic relatedness.
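
The Wu and Palmer score is commonly defined as 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor shared by both concepts. Below is a minimal sketch on a tiny hand-made taxonomy, counting the root as depth 1. The `parent` table is hypothetical and is not WordNet data, so the numbers will differ from what WordNet returns.

```python
# Hypothetical toy taxonomy: child -> parent (None marks the root).
parent = {
    "entity": None,
    "vehicle": "entity",
    "vessel": "vehicle",
    "ship": "vessel",
    "boat": "vessel",
    "animal": "entity",
    "dog": "animal",
}

def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def depth(node):
    # The root has depth 1.
    return len(path_to_root(node))

def wup(a, b):
    ancestors_a = set(path_to_root(a))
    # The first shared node on b's path to the root is the deepest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wup("ship", "boat"))
print(wup("ship", "dog"))
```

In this toy tree, "ship" and "boat" share a deep ancestor and score 0.75, while "ship" and "dog" only share the root and score much lower.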

Let's compare the nouns "ship" and "boat":

In [24]: w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('boat.n.01')
print(w1.wup_similarity(w2))

In [27]: w1 = wordnet.synset('sheep.n.01')
w2 = wordnet.synset('dog.n.01')
print(w1.wup_similarity(w2))

In [6]: #w1 = wordnet.synset('ship.n.01')

#w2 = wordnet.synset('cat.n.01')


We're going to start by trying to use the movie reviews database that is part of the NLTK corpus. From there we'll try to use words as
"features" which are a part of either a positive or negative movie review. The NLTK corpus movie_reviews data set has the reviews, and
they are labeled already as positive or negative.

In [3]: import nltk

import random
from nltk.corpus import movie_reviews

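In [4]: documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

random.shuffle(documents)

# show one labeled review (the index is arbitrary after shuffling)
print(documents[1])

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)
print(all_words.most_common(15))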

(['he', 'has', 'spent', 'his', 'entire', 'life', 'in', 'an', 'awful', 'little', 'apartment', ',', 'r
aised', 'and', 'cared', 'for', 'and', 'imprisoned', 'by', 'his', 'domineering', 'mother', '.', 'sh
e', 'inspires', 'his', 'love', 'and', 'his', 'fear', ',', 'and', 'instills', 'in', 'him', 'a', 'simi
lar', 'love', 'and', 'fear', 'of', 'jesus', '.', 'he', 'has', 'a', 'rudimentary', 'grasp', 'of', 'la
nguage', ',', 'mouthing', 'monosyllables', 'and', 'repetitions', 'of', 'his', 'mother', "'", 's', 'p
hrases', '.', 'he', 'is', 'taught', 'that', 'the', 'world', 'outside', 'is', 'fatally', 'poisonous',
';', 'his', 'mother', 'dons', 'a', 'gasmask', 'whenever', 'she', 'goes', 'out', 'the', 'door', '.',
'he', 'is', '35', '-', 'years', '-', 'old', 'in', 'body', ',', 'but', 'a', 'child', 'in', 'mind', 'a
nd', 'spirit', '.', 'he', 'is', 'the', 'premise', 'for', 'bad', 'boy', 'bubby', ',', 'a', 'defiantl
y', 'original', 'australian', 'movie', 'about', 'a', 'man', 'called', 'bubby', '(', 'nicholas', 'hop
e', ')', 'who', 'has', 'spent', 'his', 'entire', 'life', 'in', 'an', 'awful', 'little', 'apartment',
',', 'etc', '.', ',', 'etc', '.', 'then', 'one', 'day', 'his', 'father', '(', 'ralph', 'cotterill',
')', 'appears', '.', 'his', 'father', 'is', 'a', 'shabby', 'down', '-', 'at', '-', 'heels', 'pries
t', 'who', 'appears', 'to', 'have', 'permanently', 'misplaced', 'his', 'religion', '.', 'unsurprisin
gly', ',', 'he', 'is', 'not', 'thrilled', 'with', 'the', 'way', '"', 'his', '"', 'boy', 'has', 'turn
ed', 'out', '.', 'he', 'is', ',', 'however', ',', 'rather', 'pleased', 'at', 'renewing', 'his', 'acq
uaintance', 'with', 'the', 'mother', '(', 'claire', 'benito', ')', ',', 'and', ',', 'more', 'to', 't
he', 'point', ',', 'her', 'ample', 'breasts', '.', 'soon', 'they', 'are', 'copulating', 'on', 'the',
'dingy', 'couch', ',', 'while', 'bubby', 'crouches', ',', 'confused', ',', 'in', 'the', 'next', 'roo
m', ',', 'acutely', 'aware', 'that', 'the', 'mother', 'who', 'had', 'devoted', 'all', 'her', 'attent
ion', 'to', 'him', 'has', 'a', 'new', 'interest', '.', 'bubby', "'", 's', 'relationship', 'to', 'th
e', 'world', 'may', 'be', 'warped', ',', 'but', 'it', 'is', 'at', 'least', 'stable', '.', 'the', 'fa
ther', "'", 's', 'arrival', 'disturbs', 'his', 'precarious', 'balance', ',', 'causing', 'an', 'oedip
al', 'conflict', 'which', 'ends', '--', 'freud', 'would', 'be', 'pleased', '--', 'in', 'violence',
'and', ',', 'as', 'a', 'result', ',', 'freedom', '.', 'bubby', 'intuits', 'from', 'his', 'father',
"'", 's', 'arrival', 'that', 'the', 'air', 'outside', 'is', 'breathable', ':', 'he', 'leaves', 'th
e', 'apartment', ',', 'his', 'past', ',', 'his', 'world', ',', 'behind', '.', 'so', 'far', ',', 's
o', 'good', '.', 'the', 'first', 'thirty', 'minutes', 'or', 'so', 'of', 'bad', 'boy', 'bubby', ',',
'which', 'bring', 'us', 'to', 'this', 'point', ',', 'are', 'quite', 'brilliant', '.', 'the', 'movi
e', 'is', 'at', 'its', 'best', 'when', 'its', 'stays', 'within', 'the', 'constraints', 'of', 'bubb
y', "'", 's', 'hermetic', 'two', '-', 'room', 'universe', '.', 'it', 'follows', 'through', 'unrelent
ingly', 'on', 'the', 'implications', 'of', 'its', 'premise', ':', 'bubby', 'is', 'used', 'by', 'hi
s', 'mother', 'for', 'sex', ',', 'he', 'unwittingly', 'suffocates', 'the', 'pet', 'cat', 'with', 'ce
llophane', ',', 'he', 'is', 'terrifed', 'by', 'the', 'notion', 'that', 'jesus', 'will', 'beat', 'hi
m', 'senseless', 'if', 'he', 'sins', '.', 'it', 'is', 'grim', 'and', 'savage', 'and', 'appalling',
',', 'but', 'also', 'strangely', 'tender', '--', 'de', 'heer', ',', 'having', 'imagined', 'a', 'lif
e', 'as', 'bizarre', 'as', 'bubby', "'", 's', ',', 'does', 'not', 'exaggerate', 'for', 'comic', 'o
r', 'grotesqe', 'purposes', ',', 'but', 'simply', 'empathizes', '.', 'he', 'observes', 'what', 'it',
'might', 'be', 'like', '.', 'the', 'intensity', 'of', 'these', 'opening', 'scenes', ',', 'with', 'th
eir', 'minimalist', 'mise', '-', 'en', '-', 'scene', ',', 'immerses', 'us', 'in', 'a', 'claustrophob
ic', 'environment', 'which', 'seems', 'to', 'be', 'a', 'decayed', 'stratum', 'of', 'our', 'own', 'wo
rld', ',', 'and', 'owes', 'much', 'to', 'david', 'lynch', "'", 's', 'eraserhead', ',', 'not', 'leas
t', 'the', 'ambient', 'industrial', 'white', 'noise', 'of', 'the', 'soundtrack', '.', 'for', 'thirt
y', 'minutes', ',', 'the', 'movie', 'maintains', 'the', 'feel', 'and', 'mood', 'of', 'a', 'reality',
'that', 'does', 'not', 'seem', 'far', 'removed', 'from', 'our', 'own', '.', 'then', 'de', 'heer', 'l
ets', 'bubby', 'out', ',', 'brings', 'him', 'into', 'contact', 'with', 'our', 'world', ',', 'and',
'the', 'film', 'never', 'quite', 'recovers', '.', 'our', 'unlikely', 'hero', 'finds', 'himself', 'i
n', 'port', 'adelaide', ',', 'where', 'he', 'wanders', 'the', 'streets', 'and', 'meets', 'people',
',', 'where', 'he', 'suffers', 'and', 'learns', 'and', 'survives', '.', 'he', 'is', 'seduced', 'by',
'a', 'young', 'woman', 'from', 'a', 'salvation', 'army', 'band', '(', 'how', 'an', 'anti', '-', 'soc
ial', 'half', '-', 'wit', 'with', 'no', 'sense', 'of', 'hygiene', 'manages', 'to', 'get', 'laid', 'm
ere', 'hours', 'after', 'his', 'escape', 'is', 'not', 'the', 'sort', 'of', 'question', 'the', 'fil
m', 'encourages', ',', 'wisely', ')', ';', 'he', 'is', 'given', 'free', 'pizza', 'by', 'a', 'sympath
etic', 'waitress', ';', 'he', 'insults', 'a', 'traffic', 'cop', 'and', 'is', 'punched', 'in', 'the',
'stomach', ';', 'he', 'shares', 'a', 'few', 'beers', 'in', 'the', 'back', 'of', 'a', 'truck', 'wit
h', 'a', 'rock', 'group', ';', 'he', 'is', 'imprisoned', 'and', 'raped', ';', 'he', 'becomes', 'a',
'translator', 'for', 'mentally', 'handicapped', 'people', 'whose', 'speech', 'is', 'impaired', 'beyo
nd', 'comprehension', ';', 'he', 'is', 'loved', 'by', 'a', 'motherly', 'large', '-', 'breasted', 'nu
rse', '(', 'carmel', 'johnson', ')', '.', '.', '.', 'it', 'goes', 'on', ',', 'by', 'turns', 'inventi
ve', ',', 'silly', ',', 'tasteless', ',', 'endearing', ',', 'and', 'sometimes', 'all', 'of', 'thes
e', 'things', 'at', 'once', '.', 'de', 'heer', 'never', 'seems', 'to', 'be', 'sure', 'how', 'bubby',
'should', 'interface', 'with', 'the', 'real', 'world', ':', 'the', 'tone', 'shifts', ',', 'uneasil
y', ',', 'from', 'fable', 'to', 'realism', 'to', 'satire', 'and', 'back', 'again', '.', 'the', 'scen
es', 'which', 'try', 'to', 'touch', 'base', 'with', 'a', 'believable', 'version', 'of', 'reality',
'are', 'the', 'weakest', ';', 'the', 'film', 'is', 'best', 'understood', 'as', 'a', 'kind', 'of', 'p
arable', ',', 'and', ',', 'indeed', ',', 'the', 'religious', 'implications', 'of', 'bubby', "'",
's', 'experiences', 'are', 'foregrounded', ':', 'icons', 'of', 'jesus', 'on', 'the', 'cross', 'han
g', 'from', 'the', 'mother', "'", 's', 'walls', ',', 'bubby', 'dons', 'a', 'priest', "'", 's', 'coll
ar', 'stolen', 'from', 'his', 'father', ',', 'a', 'church', 'organ', '-', 'playing', 'atheist', 'lec
tures', 'him', 'on', 'the', 'necessity', 'of', 'unbelief', ',', 'the', 'woman', 'who', 'redeems', 'h
im', 'is', 'named', 'angel', '.', 'the', 'manifold', 'stresses', 'of', 'our', 'world', 'do', 'not',
'shatter', 'bubby', "'", 's', 'mind', ',', 'do', 'not', 'fragment', 'him', 'into', 'psychosis', ';',
'rather', ',', 'the', 'world', 'accomodates', 'him', ',', 'and', 'heals', 'him', '.', 'although', 'd
e', 'heer', "'", 's', 'touch', 'is', 'at', 'times', 'overbearing', ',', 'bubby', "'", 's', 'salvatio
n', 'is', 'touching', ';', 'what', 'seemd', 'at', 'first', 'a', 'harsh', 'lesson', 'in', 'the', 'dam
aging', 'effects', 'of', 'the', 'social', 'construction', 'of', 'reality', 'becomes', 'a', 'na',
'?', 've', 'humanist', 'tale', 'of', 'improbable', 'hope', '.', 'a', 'hapless', 'rock', 'group', 'wr
ite', 'a', 'song', 'about', 'bubby', 'and', 'sing', 'it', 'for', 'him', 'and', 'so', 'give', 'him',
'the', 'gift', 'of', 'community', '.', 'he', 'returns', 'the', 'favour', 'when', 'he', 'steps', 'o
n', 'stage', 'one', 'night', 'and', 'becomes', 'their', 'frontman', ',', 'turning', 'the', 'fragment
ed', 'impressions', 'of', 'his', 'experiences', 'into', 'performance', 'art', ',', 'and', 'turning',
'the', 'band', 'into', 'a', 'popular', 'draw', '.', 'innocence', 'triumphs', '.', 'bubby', 'become
s', 'a', 'holy', 'fool', ',', 'an', 'idiot', 'savant', ',', 'and', 'graces', 'us', 'with', 'wisdom',
'.', 'it', "'", 's', 'a', 'strange', 'turn', 'of', 'events', ',', 'but', 'by', 'now', 'we', 'should
n', "'", 't', 'be', 'surprised', ',', 'because', 'bad', 'boy', 'bubby', 'ain', "'", 't', 'like', 'ot
her', 'movies', '.'], 'pos')
[(',', 77717), ('the', 76529), ('.', 65876), ('a', 38106), ('and', 35576), ('of', 34123), ('to', 319
37), ("'", 30585), ('is', 25195), ('in', 21822), ('s', 18513), ('"', 17612), ('it', 16107), ('that',
15924), ('-', 15595)]

In [5]: all_words = nltk.FreqDist(all_words)



Words as Features for Learning

We're going to be building and compiling feature lists of words from positive reviews and words from the negative reviews to hopefully see
trends in specific types of words in positive or negative reviews.

In [6]: import nltk

import random
from nltk.corpus import movie_reviews

In [7]: documents = [(list(movie_reviews.words(fileid)), category)

for category in movie_reviews.categories()
for fileid in movie_reviews.fileids(category)]


all_words = []

for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:2500]

In [8]: def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features
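
To make the shape of these features concrete, here is a small sketch on made-up reviews (the `toy_reviews` data and the `top_n` value are hypothetical). It picks the feature words with `Counter.most_common`, which orders by frequency; `nltk.FreqDist` is a `Counter` subclass, so the same method is available there. This is a variation on the cell above: slicing `keys()` takes words in the order they were first seen rather than by frequency.

```python
from collections import Counter

# Hypothetical labeled reviews: (list of words, category).
toy_reviews = [
    (["a", "great", "fun", "movie"], "pos"),
    (["a", "dull", "boring", "movie"], "neg"),
    (["great", "cast", "great", "plot"], "pos"),
]

# Count every word across all reviews, then keep the most frequent ones.
counts = Counter(w.lower() for words, _ in toy_reviews for w in words)
top_n = 3
word_features = [w for w, _ in counts.most_common(top_n)]

def find_features(document):
    """Map each feature word to True/False depending on its presence in the document."""
    words = set(document)
    return {w: (w in words) for w in word_features}

# One (features, label) pair per review, the usual input shape for a classifier.
featuresets = [(find_features(words), label) for words, label in toy_reviews]
print(word_features)
print(featuresets[0])
```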

In [9]: print((find_features(movie_reviews.words('neg/cv000_29416.txt'))))

{'plot': True, ':': True, 'two': True, 'teen': True, 'couples': True, 'go': True, 'to': True, 'a': T
rue, 'church': True, 'party': True, ',': True, 'drink': True, 'and': True, 'then': True, 'drive': Tr
ue, '.': True, 'they': True, 'get': True, 'into': True, 'an': True, 'accident': True, 'one': True,
'of': True, 'the': True, 'guys': True, 'dies': True, 'but': True, 'his': True, 'girlfriend': True,
'continues': True, 'see': True, 'him': True, 'in': True, 'her': True, 'life': True, 'has': True, 'ni
ghtmares': True, 'what': True, "'": True, 's': True, 'deal': True, '?': True, 'watch': True, 'movi
e': True, '"': True, 'sorta': True, 'find': True, 'out': True, 'critique': True, 'mind': True, '-':
True, 'fuck': True, 'for': True, 'generation': True, 'that': True, 'touches': True, 'on': True, 'ver
y': True, 'cool': True, 'idea': True, 'presents': True, 'it': True, 'bad': True, 'package': True, 'w
hich': True, 'is': True, 'makes': True, 'this': True, 'review': True, 'even': True, 'harder': True,
'write': True, 'since': True, 'i': True, 'generally': True, 'applaud': True, 'films': True, 'attemp
t': True, 'break': True, 'mold': True, 'mess': True, 'with': True, 'your': True, 'head': True, 'suc
h': True, '(': True, 'lost': True, 'highway': True, '&': True, 'memento': True, ')': True, 'there':
True, 'are': True, 'good': True, 'ways': True, 'making': True, 'all': True, 'types': True, 'these':
True, 'folks': True, 'just': True, 'didn': True, 't': True, 'snag': True, 'correctly': True, 'seem':
True, 'have': True, 'taken': True, 'pretty': True, 'neat': True, 'concept': True, 'executed': True,
'terribly': True, 'so': True, 'problems': True, 'well': True, 'its': True, 'main': True, 'problem':
True, 'simply': True, 'too': True, 'jumbled': True, 'starts': True, 'off': True, 'normal': True, 'do
wnshifts': True, 'fantasy': True, 'world': True, 'you': True, 'as': True, 'audience': True, 'membe
r': True, 'no': True, 'going': True, 'dreams': True, 'characters': True, 'coming': True, 'back': Tru
e, 'from': True, 'dead': True, 'others': True, 'who': True, 'look': True, 'like': True, 'strange': T
rue, 'apparitions': True, 'disappearances': True, 'looooot': True, 'chase': True, 'scenes': True, 't
ons': True, 'weird': True, 'things': True, 'happen': True, 'most': True, 'not': True, 'explained': T
rue, 'now': True, 'personally': True, 'don': True, 'trying': True, 'unravel': True, 'film': True, 'e
very': True, 'when': True, 'does': True, 'give': True, 'me': True, 'same': True, 'clue': True, 'ove
r': True, 'again': True, 'kind': True, 'fed': True, 'up': True, 'after': True, 'while': True, 'bigge
st': True, 'obviously': True, 'got': True, 'big': True, 'secret': True, 'hide': True, 'seems': True,
'want': True, 'completely': True, 'until': True, 'final': True, 'five': True, 'minutes': True, 'do':
True, 'make': True, 'entertaining': True, 'thrilling': True, 'or': True, 'engaging': True, 'meantim
e': True, 'really': True, 'sad': True, 'part': True, 'arrow': True, 'both': True, 'dig': True, 'flic
ks': True, 'we': True, 'actually': True, 'figured': True, 'by': True, 'half': True, 'way': True, 'po
int': True, 'strangeness': True, 'did': True, 'start': True, 'little': True, 'bit': True, 'sense': T
rue, 'still': True, 'more': True, 'guess': True, 'bottom': True, 'line': True, 'movies': True, 'shou
ld': True, 'always': True, 'sure': True, 'before': True, 'given': True, 'password': True, 'enter': T
rue, 'understanding': True, 'mean': True, 'showing': True, 'melissa': True, 'sagemiller': True, 'run
ning': True, 'away': True, 'visions': True, 'about': True, '20': True, 'throughout': True, 'plain':
True, 'lazy': True, '!': True, 'okay': True, 'people': True, 'chasing': True, 'know': True, 'need':
True, 'how': True, 'giving': True, 'us': True, 'different': True, 'offering': True, 'further': True,
'insight': True, 'down': True, 'apparently': True, 'studio': True, 'took': True, 'director': True,
'chopped': True, 'themselves': True, 'shows': True, 'might': True, 've': True, 'been': True, 'decen
t': True, 'here': True, 'somewhere': True, 'suits': True, 'decided': True, 'turning': True, 'music':
True, 'video': True, 'edge': True, 'would': True, 'actors': True, 'although': True, 'wes': True, 'be
ntley': True, 'seemed': True, 'be': True, 'playing': True, 'exact': True, 'character': True, 'he': T
rue, 'american': True, 'beauty': True, 'only': True, 'new': True, 'neighborhood': True, 'my': True,
'kudos': True, 'holds': True, 'own': True, 'entire': True, 'feeling': True, 'unraveling': True, 'ove
rall': True, 'doesn': True, 'stick': True, 'because': True, 'entertain': True, 'confusing': True, 'r
arely': True, 'excites': True, 'feels': True, 'redundant': True, 'runtime': True, 'despite': True,
'ending': True, 'explanation': True, 'craziness': True, 'came': True, 'oh': True, 'horror': True, 's
lasher': True, 'flick': True, 'packaged': True, 'someone': True, 'assuming': True, 'genre': True, 'h
ot': True, 'kids': True, 'also': True, 'wrapped': True, 'production': True, 'years': True, 'ago': Tr
ue, 'sitting': True, 'shelves': True, 'ever': True, 'whatever': True, 'skip': True, 'where': True,
'joblo': True, 'nightmare': True, 'elm': True, 'street': True, '3': True, '7': True, '/': True, '1
0': True, 'blair': True, 'witch': True, '2': True, 'crow': True, '9': True, 'salvation': True, '4':
True, 'stir': True, 'echoes': True, '8': True, 'happy': False, 'bastard': False, 'quick': False, 'da
mn': False, 'y2k': False, 'bug': False, 'starring': False, 'jamie': False, 'lee': False, 'curtis': F
alse, 'another': False, 'baldwin': False, 'brother': False, 'william': False, 'time': False, 'stor
y': False, 'regarding': False, 'crew': False, 'tugboat': False, 'comes': False, 'across': False, 'de
serted': False, 'russian': False, 'tech': False, 'ship': False, 'kick': False, 'power': False, 'with
in': False, 'gore': False, 'bringing': False, 'few': False, 'action': False, 'sequences': False, 'vi
rus': False, 'empty': False, 'flash': False, 'substance': False, 'why': False, 'was': False, 'middl
e': False, 'nowhere': False, 'origin': False, 'pink': False, 'flashy': False, 'thing': False, 'hit':
False, 'mir': False, 'course': False, 'donald': False, 'sutherland': False, 'stumbling': False, 'aro
und': False, 'drunkenly': False, 'hey': False, 'let': False, 'some': False, 'robots': False, 'actin
g': False, 'below': False, 'average': False, 'likes': False, 're': False, 'likely': False, 'work': F
alse, 'halloween': False, 'h20': False, 'wasted': False, 'real': False, 'star': False, 'stan': Fals
e, 'winston': False, 'robot': False, 'design': False, 'schnazzy': False, 'cgi': False, 'occasional':
False, 'shot': False, 'picking': False, 'brain': False, 'if': False, 'body': False, 'parts': False,
'turn': False, 'otherwise': False, 'much': False, 'sunken': False, 'jaded': False, 'viewer': False,
'thankful': False, 'invention': False, 'timex': False, 'indiglo': False, 'based': False, 'late': Fal
se, '1960': False, 'television': False, 'show': False, 'name': False, 'mod': False, 'squad': False,
'tells': False, 'tale': False, 'three': False, 'reformed': False, 'criminals': False, 'under': Fals
e, 'employ': False, 'police': False, 'undercover': False, 'however': False, 'wrong': False, 'evidenc
e': False, 'gets': False, 'stolen': False, 'immediately': False, 'suspicion': False, 'ads': False,
'cuts': False, 'claire': False, 'dane': False, 'nice': False, 'hair': False, 'cute': False, 'outfit
s': False, 'car': False, 'chases': False, 'stuff': False, 'blowing': False, 'sounds': False, 'firs
t': False, 'fifteen': False, 'quickly': False, 'becomes': False, 'apparent': False, 'certainly': Fal
se, 'slick': False, 'looking': False, 'complete': False, 'costumes': False, 'isn': False, 'enough':
False, 'best': False, 'described': False, 'cross': False, 'between': False, 'hour': False, 'long': F
alse, 'cop': False, 'stretched': False, 'span': False, 'single': False, 'clich': False, 'matter': Fa
lse, 'elements': False, 'recycled': False, 'everything': False, 'already': False, 'seen': False, 'no
thing': False, 'spectacular': False, 'sometimes': False, 'bordering': False, 'wooden': False, 'dane
s': False, 'omar': False, 'epps': False, 'deliver': False, 'their': False, 'lines': False, 'bored':
False, 'transfers': False, 'onto': False, 'escape': False, 'relatively': False, 'unscathed': False,
'giovanni': False, 'ribisi': False, 'plays': False, 'resident': False, 'crazy': False, 'man': False,
'ultimately': False, 'being': False, 'worth': False, 'watching': False, 'unfortunately': False, 'sav
e': False, 'convoluted': False, 'apart': False, 'occupying': False, 'screen': False, 'young': False,
'cast': False, 'clothes': False, 'hip': False, 'soundtrack': False, 'appears': False, 'geared': Fals
e, 'towards': False, 'teenage': False, 'mindset': False, 'r': False, 'rating': False, 'content': Fal
se, 'justify': False, 'juvenile': False, 'older': False, 'information': False, 'literally': False,
'spoon': False, 'hard': False, 'instead': False, 'telling': False, 'dialogue': False, 'poorly': Fals
e, 'written': False, 'extremely': False, 'predictable': False, 'progresses': False, 'won': False, 'c
are': False, 'heroes': False, 'any': False, 'jeopardy': False, 'll': False, 'aren': False, 'basing':
False, 'nobody': False, 'remembers': False, 'questionable': False, 'wisdom': False, 'especially': Fa
lse, 'considers': False, 'target': False, 'fact': False, 'number': False, 'memorable': False, 'can':
False, 'counted': False, 'hand': False, 'missing': False, 'finger': False, 'times': False, 'checke
d': False, 'six': False, 'clear': False, 'indication': False, 'them': False, 'than': False, 'cash':
False, 'spending': False, 'dollar': False, 'judging': False, 'rash': False, 'awful': False, 'seein
g': False, 'avoid': False, 'at': False, 'costs': False, 'quest': False, 'camelot': False, 'warner':
False, 'bros': False, 'feature': False, 'length': False, 'fully': False, 'animated': False, 'steal':
False, 'clout': False, 'disney': False, 'cartoon': False, 'empire': False, 'mouse': False, 'reason':
False, 'worried': False, 'other': False, 'recent': False, 'challenger': False, 'throne': False, 'las
t': False, 'fall': False, 'promising': False, 'flawed': False, '20th': False, 'century': False, 'fo
x': False, 'anastasia': False, 'hercules': False, 'lively': False, 'colorful': False, 'palate': Fals
e, 'had': False, 'beat': False, 'hands': False, 'crown': False, '1997': False, 'piece': False, 'anim
ation': False, 'year': False, 'contest': False, 'arrival': False, 'magic': False, 'kingdom': False,
'mediocre': False, '--': False, 'd': False, 'pocahontas': False, 'those': False, 'keeping': False,
'score': False, 'nearly': False, 'dull': False, 'revolves': False, 'adventures': False, 'free': Fals
e, 'spirited': False, 'kayley': False, 'voiced': False, 'jessalyn': False, 'gilsig': False, 'early':
False, 'daughter': False, 'belated': False, 'knight': False, 'king': False, 'arthur': False, 'roun
d': False, 'table': False, 'dream': False, 'follow': False, 'father': False, 'footsteps': False, 'sh
e': False, 'chance': False, 'evil': False, 'warlord': False, 'ruber': False, 'gary': False, 'oldma
n': False, 'ex': False, 'gone': False, 'steals': False, 'magical': False, 'sword': False, 'excalibu
r': False, 'accidentally': False, 'loses': False, 'dangerous': False, 'booby': False, 'trapped': Fal
se, 'forest': False, 'help': False, 'hunky': False, 'blind': False, 'timberland': False, 'dweller':
False, 'garrett': False, 'carey': False, 'elwes': False, 'headed': False, 'dragon': False, 'eric': F
alse, 'idle': False, 'rickles': False, 'arguing': False, 'itself': False, 'able': False, 'medieval':
False, 'sexist': False, 'prove': False, 'fighter': False, 'side': False, 'pure': False, 'showmanshi
p': False, 'essential': False, 'element': False, 'expected': False, 'climb': False, 'high': False,
'ranks': False, 'differentiates': False, 'something': False, 'saturday': False, 'morning': False, 's
ubpar': False, 'instantly': False, 'forgettable': False, 'songs': False, 'integrated': False, 'compu
terized': False, 'footage': False, 'compare': False, 'run': False, 'angry': False, 'ogre': False, 'h
erc': False, 'battle': False, 'hydra': False, 'rest': False, 'case': False, 'stink': False, 'none':
False, 'remotely': False, 'interesting': False, 'race': False, 'bland': False, 'end': False, 'tie':
False, 'win': False, 'comedy': False, 'shtick': False, 'awfully': False, 'cloying': False, 'least':
False, 'signs': False, 'pulse': False, 'fans': False, "-'": False, '90s': False, 'tgif': False, 'wil
l': False, 'thrilled': False, 'jaleel': False, 'urkel': False, 'white': False, 'bronson': False, 'ba
lki': False, 'pinchot': False, 'sharing': False, 'nicely': False, 'realized': False, 'though': Fals
e, 'm': False, 'loss': False, 'recall': False, 'specific': False, 'providing': False, 'voice': Fals
e, 'talent': False, 'enthusiastic': False, 'paired': False, 'singers': False, 'sound': False, 'music
al': False, 'moments': False, 'jane': False, 'seymour': False, 'celine': False, 'dion': False, 'mus
t': False, 'strain': False, 'through': False, 'aside': False, 'children': False, 'probably': False,
'adults': False, 'grievous': False, 'error': False, 'lack': False, 'personality': False, 'learn': Fa
lse, 'goes': False, 'synopsis': False, 'mentally': False, 'unstable': False, 'undergoing': False, 'p
sychotherapy': False, 'saves': False, 'boy': False, 'potentially': False, 'fatal': False, 'falls': F
alse, 'love': False, 'mother': False, 'fledgling': False, 'restauranteur': False, 'unsuccessfully':
False, 'attempting': False, 'gain': False, 'woman': False, 'favor': False, 'takes': False, 'picture
s': False, 'kills': False, 'comments': False, 'stalked': False, 'yet': False, 'seemingly': False, 'e
ndless': False, 'string': False, 'spurned': False, 'psychos': False, 'getting': False, 'revenge': Fa
lse, 'type': False, 'stable': False, 'category': False, '1990s': False, 'industry': False, 'theatric
al': False, 'direct': False, 'proliferation': False, 'may': False, 'due': False, 'typically': False,
'inexpensive': False, 'produce': False, 'special': False, 'effects': False, 'stars': False, 'serve':
False, 'vehicles': False, 'nudity': False, 'allowing': False, 'frequent': False, 'night': False, 'ca
ble': False, 'wavers': False, 'slightly': False, 'norm': False, 'respect': False, 'psycho': False,
'never': False, 'affair': False, ';': False, 'contrary': False, 'rejected': False, 'rather': False,
'lover': False, 'wife': False, 'husband': False, 'entry': False, 'doomed': False, 'collect': False,
'dust': False, 'viewed': False, 'midnight': False, 'provide': False, 'suspense': False, 'sets': Fals
e, 'interspersed': False, 'opening': False, 'credits': False, 'instance': False, 'serious': False,
'sounding': False, 'narrator': False, 'spouts': False, 'statistics': False, 'stalkers': False, 'pond
ers': False, 'cause': False, 'stalk': False, 'implicitly': False, 'implied': False, 'men': False, 's
hown': False, 'snapshot': False, 'actor': False, 'jay': False, 'underwood': False, 'states': False,
'daryl': False, 'gleason': False, 'stalker': False, 'brooke': False, 'daniels': False, 'meant': Fals
e, 'called': False, 'guesswork': False, 'required': False, 'proceeds': False, 'begins': False, 'obvi
ous': False, 'sequence': False, 'contrived': False, 'quite': False, 'brings': False, 'victim': Fals
e, 'together': False, 'obsesses': False, 'follows': False, 'tries': False, 'woo': False, 'plans': Fa
lse, 'become': False, 'desperate': False, 'elaborate': False, 'include': False, 'cliche': False, 'mu
rdered': False, 'pet': False, 'require': False, 'found': False, 'exception': False, 'cat': False, 's
hower': False, 'events': False, 'lead': False, 'inevitable': False, 'showdown': False, 'survives': F
alse, 'invariably': False, 'conclusion': False, 'turkey': False, 'uniformly': False, 'adequate': Fal
se, 'anything': False, 'home': False, 'either': False, 'turns': False, 'toward': False, 'melodrama':
False, 'overdoes': False, 'words': False, 'manages': False, 'creepy': False, 'pass': False, 'demand
s': False, 'maryam': False, 'abo': False, 'close': False, 'played': False, 'bond': False, 'chick': F
alse, 'living': False, 'daylights': False, 'equally': False, 'title': False, 'ditzy': False, 'stron
g': False, 'independent': False, 'business': False, 'owner': False, 'needs': False, 'proceed': Fals
e, 'example': False, 'suspicions': False, 'ensure': False, 'use': False, 'excuse': False, 'decides':
False, 'return': False, 'toolbox': False, 'left': False, 'place': False, 'house': False, 'leave': Fa
lse, 'door': False, 'answers': False, 'opens': False, 'wanders': False, 'returns': False, 'enters':
False, 'our': False, 'heroine': False, 'danger': False, 'somehow': False, 'parked': False, 'front':
False, 'right': False, 'oblivious': False, 'presence': False, 'inside': False, 'whole': False, 'epis
ode': False, 'places': False, 'incredible': False, 'suspension': False, 'disbelief': False, 'questio
ns': False, 'validity': False, 'intelligence': False, 'receives': False, 'highly': False, 'derivativ
e': False, 'somewhat': False, 'boring': False, 'cannot': False, 'watched': False, 'rated': False, 'm
ostly': False, 'several': False, 'murder': False, 'brief': False, 'strip': False, 'bar': False, 'off
ensive': False, 'many': False, 'thrillers': False, 'mood': False, 'stake': False, 'else': False, 'ca
psule': False, '2176': False, 'planet': False, 'mars': False, 'taking': False, 'custody': False, 'ac
cused': False, 'murderer': False, 'face': False, 'menace': False, 'lot': False, 'fighting': False,
'john': False, 'carpenter': False, 'reprises': False, 'ideas': False, 'previous': False, 'assault':
False, 'precinct': False, '13': False, 'homage': False, 'himself': False, '0': False, '+': False, 'b
elieves': False, 'fight': False, 'horrible': False, 'writer': False, 'supposedly': False, 'expert':
False, 'mistake': False, 'ghosts': False, 'drawn': False, 'humans': False, 'surprisingly': False, 'l
ow': False, 'powered': False, 'alien': False, 'addition': False, 'anybody': False, 'made': False, 'g
rounds': False, 'sue': False, 'chock': False, 'full': False, 'pieces': False, 'prince': False, 'dark
ness': False, 'surprising': False, 'managed': False, 'fit': False, 'admittedly': False, 'novel': Fal
se, 'science': False, 'fiction': False, 'experience': False, 'terraformed': False, 'walk': False, 's
urface': False, 'without': False, 'breathing': False, 'gear': False, 'budget': False, 'mentioned': F
alse, 'gravity': False, 'increased': False, 'earth': False, 'easier': False, 'society': False, 'chan
ged': False, 'advanced': False, 'culture': False, 'women': False, 'positions': False, 'control': Fal
se, 'view': False, 'stagnated': False, 'female': False, 'beyond': False, 'minor': False, 'technologi
cal': False, 'advances': False, 'less': False, '175': False, 'expect': False, 'change': False, 'te
n': False, 'basic': False, 'common': False, 'except': False, 'yes': False, 'replaced': False, 'tack
y': False, 'rundown': False, 'martian': False, 'mining': False, 'colony': False, 'having': False, 'c
riminal': False, 'napolean': False, 'wilson': False, 'desolation': False, 'williams': False, 'facin
g': False, 'hoodlums': False, 'automatic': False, 'weapons': False, 'nature': False, 'behave': Fals
e, 'manner': False, 'essentially': False, 'human': False, 'savages': False, 'lapse': False, 'imagina
tion': False, 'told': False, 'flashback': False, 'entirely': False, 'filmed': False, 'almost': Fals
e, 'tones': False, 'red': False, 'yellow': False, 'black': False, 'powerful': False, 'scene': False,
'train': False, 'rushing': False, 'heavy': False, 'sadly': False, 'buildup': False, 'terror': False,
'creates': False, 'looks': False, 'fugitive': False, 'wannabes': False, 'rock': False, 'band': Fals
e, 'kiss': False, 'building': False, 'bunch': False, 'sudden': False, 'jump': False, 'sucker': Fals
e, 'thinking': False, 'scary': False, 'happening': False, 'standard': False, 'haunted': False, 'shoc
k': False, 'great': False, 'newer': False, 'unimpressive': False, 'digital': False, 'decapitations':
False, 'fights': False, 'short': False, 'stretch': False, 'release': False, 'mission': False, 'panne
d': False, 'reviewers': False, 'better': False, 'rate': False, 'scale': False, 'following': False,
'showed': False, 'liked': False, 'moderately': False, 'classic': False, 'comment': False, 'twice': F
alse, 'ask': False, 'yourself': False, '8mm': False, 'eight': False, 'millimeter': False, 'wholesom
e': False, 'surveillance': False, 'sight': False, 'values': False, 'becoming': False, 'enmeshed': Fa
lse, 'seedy': False, 'sleazy': False, 'underworld': False, 'hardcore': False, 'pornography': False,
'bubbling': False, 'beneath': False, 'town': False, 'americana': False, 'sordid': False, 'sick': Fal
se, 'depraved': False, 'necessarily': False, 'stop': False, 'order': False, 'satisfy': False, 'twist
ed': False, 'desires': False, 'position': False, 'influence': False, 'kinds': False, 'demented': Fal
se, 'talking': False, 'snuff': False, 'supposed': False, 'documentaries': False, 'victims': False,
'brutalized': False, 'killed': False, 'camera': False, 'joel': False, 'schumacher': False, 'credit':
False, 'batman': False, 'robin': False, 'kill': False, 'forever': False, 'client': False, 'thirds':
False, 'unwind': False, 'fairly': False, 'conventional': False, 'persons': False, 'drama': False, 'a
lbeit': False, 'particularly': False, 'unsavory': False, 'core': False, 'threatening': False, 'alon
g': False, 'explodes': False, 'violence': False, 'think': False, 'finally': False, 'tags': False, 'r
idiculous': False, 'self': False, 'righteous': False, 'finale': False, 'drags': False, 'unpleasant':
False, 'trust': False, 'waste': False, 'hours': False, 'nicolas': False, 'snake': False, 'eyes': Fal
se, 'cage': False, 'private': False, 'investigator': False, 'tom': False, 'welles': False, 'hired':
False, 'wealthy': False, 'philadelphia': False, 'widow': False, 'determine': False, 'whether': Fals
e, 'reel': False, 'safe': False, 'documents': False, 'girl': False, 'assignment': False, 'factly': F
alse, 'puzzle': False, 'neatly': False, 'specialized': False, 'skills': False, 'training': False, 'e
asy': False, 'cops': False, 'toilet': False, 'tanks': False, 'clues': False, 'deeper': False, 'dig
s': False, 'investigation': False, 'obsessed': False, 'george': False, 'c': False, 'scott': False,
'paul': False, 'schrader': False, 'occasionally': False, 'flickering': False, 'whirs': False, 'sproc
kets': False, 'winding': False, 'projector': False, 'reminding': False, 'task': False, 'hints': Fals
e, 'toll': False, 'lovely': False, 'catherine': False, 'keener': False, 'frustrated': False, 'clevel
and': False, 'ugly': False, 'split': False, 'level': False, 'harrisburg': False, 'pa': False, 'conde
mn': False, 'condone': False, 'subject': False, 'exploits': False, 'irony': False, 'seven': False,
'scribe': False, 'andrew': False, 'kevin': False, 'walker': False, 'vision': False, 'lane': False,
'limited': False, 'hollywood': False, 'product': False, 'snippets': False, 'covering': False, 'late
r': False, 'joaquin': False, 'phoenix': False, 'far': False, 'adult': False, 'bookstore': False, 'fl
unky': False, 'max': False, 'california': False, 'cover': False, 'horrid': False, 'screened': False,
'familiar': False, 'revelation': False, 'sexual': False, 'deviants': False, 'indeed': False, 'monste
rs': False, 'everyday': False, 'neither': False, 'super': False, 'nor': False, 'shocking': False, 'b
anality': False, 'exactly': False, 'felt': False, 'weren': False, 'nine': False, 'laughs': False, 'm
onths': False, 'terrible': False, 'mr': False, 'hugh': False, 'grant': False, 'huge': False, 'dork':
False, 'oral': False, 'sex': False, 'prostitution': False, 'referring': False, 'bugs': False, 'annoy
ing': False, 'adam': False, 'sandler': False, 'jim': False, 'carrey': False, 'eye': False, 'flutter
s': False, 'nervous': False, 'smiles': False, 'slapstick': False, 'fistfight': False, 'delivery': Fa
lse, 'room': False, 'culminating': False, 'joan': False, 'cusack': False, 'lap': False, 'paid': Fals
e, '$': False, '60': False, 'included': False, 'obscene': False, 'double': False, 'entendres': Fals
e, 'obstetrician': False, 'pregnant': False, 'pussy': False, 'size': False, 'hairs': False, 'coat':
False, 'nonetheless': False, 'exchange': False, 'cookie': False, 'cutter': False, 'originality': Fal
se, 'humor': False, 'successful': False, 'child': False, 'psychiatrist': False, 'psychologist': Fals
e, 'scriptwriters': False, 'could': False, 'inject': False, 'unfunny': False, 'kid': False, 'dad': F
alse, 'asshole': False, 'eyelashes': False, 'offers': False, 'smile': False, 'responds': False, 'eng
lish': False, 'accent': False, 'attitude': False, 'possibly': False, '_huge_': False, 'beside': Fals
e, 'includes': False, 'needlessly': False, 'stupid': False, 'jokes': False, 'olds': False, 'everyon
e': False, 'shakes': False, 'anyway': False, 'finds': False, 'usual': False, 'reaction': False, 'flu
ttered': False, 'paves': False, 'possible': False, 'pregnancy': False, 'birth': False, 'gag': False,
'book': False, 'friend': False, 'arnold': False, 'provides': False, 'cacophonous': False, 'funny': F
alse, 'beats': False, 'costumed': False, 'arnie': False, 'dinosaur': False, 'draw': False, 'parallel
s': False, 'toy': False, 'store': False, 'jeff': False, 'goldblum': False, 'hid': False, 'dreadful':
False, 'hideaway': False, 'artist': False, 'fear': False, 'simultaneous': False, 'longing': False,
'commitment': False, 'doctor': False, 'recently': False, 'switch': False, 'veterinary': False, 'medi
cine': False, 'obstetrics': False, 'joke': False, 'old': False, 'foreign': False, 'guy': False, 'mis
pronounces': False, 'stereotype': False, 'say': False, 'yakov': False, 'smirnov': False, 'favorite':
False, 'vodka': False, 'hence': False, 'take': False, 'volvo': False, 'nasty': False, 'unamusing': F
alse, 'heads': False, 'simultaneously': False, 'groan': False, 'failure': False, 'loud': False, 'fai
led': False, 'uninspired': False, 'lunacy': False, 'sunset': False, 'boulevard': False, 'arrest': Fa
lse, 'please': False, 'caught': False, 'pants': False, 'bring': False, 'theaters': False, 'faces': F
alse, '90': False, 'forced': False, 'unauthentic': False, 'anyone': False, 'q': False, '80': False,
'sorry': False, 'money': False, 'unfulfilled': False, 'desire': False, 'spend': False, 'bucks': Fals
e, 'call': False, 'road': False, 'trip': False, 'walking': False, 'wounded': False, 'stellan': Fals
e, 'skarsg': False, 'rd': False, 'convincingly': False, 'zombified': False, 'drunken': False, 'lose
r': False, 'difficult': False, 'smelly': False, 'boozed': False, 'reliable': False, 'swedish': Fals
e, 'adds': False, 'depth': False, 'significance': False, 'plodding': False, 'aberdeen': False, 'sent
imental': False, 'painfully': False, 'mundane': False, 'european': False, 'playwright': False, 'augu
st': False, 'strindberg': False, 'built': False, 'career': False, 'families': False, 'relationship
s': False, 'paralyzed': False, 'secrets': False, 'unable': False, 'express': False, 'longings': Fals
e, 'accurate': False, 'reflection': False, 'strives': False, 'focusing': False, 'pairing': False, 'a
lcoholic': False, 'tomas': False, 'alienated': False, 'openly': False, 'hostile': False, 'yuppie': F
alse, 'kaisa': False, 'lena': False, 'headey': False, 'gossip': False, 'haven': False, 'spoken': Fal
se, 'wouldn': False, 'norway': False, 'scotland': False, 'automobile': False, 'charlotte': False, 'r
ampling': False, 'sand': False, 'rotting': False, 'hospital': False, 'bed': False, 'cancer': False,
'soap': False, 'opera': False, 'twist': False, 'days': False, 'live': False, 'blitzed': False, 'ste
p': False, 'foot': False, 'plane': False, 'hits': False, 'open': False, 'loathing': False, 'each': F
alse, 'periodic': False, 'stops': False, 'puke': False, 'dashboard': False, 'whenever': False, 'mutt
ering': False, 'rotten': False, 'turned': False, 'sloshed': False, 'viewpoint': False, 'recognizes':
False, 'apple': False, 'hasn': False, 'fallen': False, 'tree': False, 'nosebleeds': False, 'snortin
g': False, 'coke': False, 'sabotages': False, 'personal': False, 'indifference': False, 'restrain':
False, 'vindictive': False, 'temper': False, 'ain': False, 'pair': False, 'true': False, 'notes': Fa
lse, 'unspoken': False, 'familial': False, 'empathy': False, 'note': False, 'repetitively': False,
'bitchy': False, 'screenwriters': False, 'kristin': False, 'amundsen': False, 'hans': False, 'pette
r': False, 'moland': False, 'fabricate': False, 'series': False, 'contrivances': False, 'propel': Fa
lse, 'forward': False, 'roving': False, 'hooligans': False, 'drunks': False, 'nosy': False, 'flat':
False, 'tires': False, 'figure': False, 'schematic': False, 'convenient': False, 'narrative': False,
'reach': False, 'unveil': False, 'dark': False, 'past': False, 'simplistic': False, 'devices': Fals
e, 'trivialize': False, 'conflict': False, 'mainstays': False, 'wannabe': False, 'exists': False, 'p
urely': False, 'sake': False, 'weak': False, 'unimaginative': False, 'casting': False, 'thwarts': Fa
lse, 'pivotal': False, 'role': False, 'were': False, 'stronger': False, 'actress': False, 'perhaps':
False, 'coast': False, 'performances': False, 'moody': False, 'haunting': False, 'cinematography': F
alse, 'rendering': False, 'pastoral': False, 'ghost': False, 'reference': False, 'certain': False,
'superior': False, 'indie': False, 'intentional': False, 'busy': False, 'using': False, 'furrowed':
False, 'brow': False, 'convey': False, 'twitch': False, 'insouciance': False, 'paying': False, 'atte
ntion': False, 'maybe': False, 'doing': False, 'reveal': False, 'worthwhile': False, 'earlier': Fals
e, 'released': False, '2001': False, 'jonathan': False, 'nossiter': False, 'captivating': False, 'wo
nders': False, 'disturbed': False, 'parental': False, 'figures': False, 'bound': False, 'ceremonia
l': False, 'wedlock': False, 'differences': False, 'presented': False, 'significant': False, 'lumino
us': False, 'diva': False, 'preening': False, 'static': False, 'solid': False, 'performance': False,
'pathetic': False, 'drunk': False, 'emote': False, 'besides': False, 'catatonic': False, 'sorrow': F
alse, 'genuine': False, 'ferocity': False, 'sexually': False, 'charged': False, 'frisson': False, 'd
uring': False, 'understated': False, 'confrontations': False, 'suggest': False, 'gray': False, 'zon
e': False, 'complications': False, 'accompany': False, 'torn': False, 'romance': False, 'stifled': F
alse, 'curiosity': False, 'thoroughly': False, 'explores': False, 'neurotic': False, 'territory': Fa
lse, 'delving': False, 'americanization': False, 'greece': False, 'mysticism': False, 'illusion': Fa
lse, 'deflect': False, 'pain': False, 'overloaded': False, 'willing': False, 'come': False, 'traditi
onal': False, 'ambitious': False, 'sleepwalk': False, 'rhythms': False, 'timing': False, 'driven': F
alse, 'stories': False, 'complexities': False, 'depressing': False, 'answer': False, 'lawrence': Fal
se, 'kasdan': False, 'trite': False, 'useful': False, 'grand': False, 'canyon': False, 'steve': Fals
e, 'martin': False, 'mogul': False, 'pronounces': False, 'riddles': False, 'answered': False, 'advic
e': False, 'heart': False, 'french': False, 'sees': False, 'parents': False, 'tim': False, 'roth': F
alse, 'oops': False, 'vows': False, 'taught': False, 'musketeer': False, 'dude': False, 'used': Fals
e, 'fourteen': False, 'arrgh': False, 'swish': False, 'zzzzzzz': False, 'original': False, 'lacks':
False, 'energy': False, 'next': False, 'hmmmm': False, 'justin': False, 'chambers': False, 'basicall
y': False, 'uncharismatic': False, 'version': False, 'chris': False, 'o': False, 'donnell': False,
'range': False, 'mena': False, 'suvari': False, 'thora': False, 'birch': False, 'dungeons': False,
'dragons': False, 'miscast': False, 'deliveries': False, 'piss': False, 'poor': False, 'ms': False,
'fault': False, 'definitely': False, 'higher': False, 'semi': False, 'saving': False, 'grace': Fals
e, 'wise': False, 'irrepressible': False, 'once': False, 'thousand': False, 'god': False, 'beg': Fal
se, 'agent': False, 'marketplace': False, 'modern': False, 'day': False, 'roles': False, 'romantic':
False, 'gunk': False, 'alright': False, 'yeah': False, 'yikes': False, 'notches': False, 'fellas': F
alse, 'blares': False, 'ear': False, 'accentuate': False, 'annoy': False, 'important': False, 'behin
d': False, 'recognize': False, 'epic': False, 'fluffy': False, 'rehashed': False, 'cake': False, 'cr
eated': False, 'shrewd': False, 'advantage': False, 'kung': False, 'fu': False, 'phenomenon': False,
'test': False, 'dudes': False, 'keep': False, 'reading': False, 'editing': False, 'shoddy': False,
'banal': False, 'stilted': False, 'plentiful': False, 'top': False, 'horse': False, 'carriage': Fals
e, 'stand': False, 'opponent': False, 'scampering': False, 'cut': False, 'mouseketeer': False, 'rop
e': False, 'tower': False, 'jumping': False, 'chords': False, 'hanging': False, 'says': False, '14':
False, 'shirt': False, 'strayed': False, 'championing': False, 'fun': False, 'stretches': False, 'at
rocious': False, 'lake': False, 'reminded': False, 'school': False, 'cringe': False, 'musketeers': F
alse, 'fat': False, 'raison': False, 'etre': False, 'numbers': False, 'hoping': False, 'packed': Fal
se, 'stuntwork': False, 'promoted': False, 'trailer': False, 'major': False, 'swashbuckling': False,
'beginning': False, 'finishes': False, 'juggling': False, 'ladders': False, 'ladder': False, 'defini
te': False, 'keeper': False, 'regurgitated': False, 'crap': False, 'tell': False, 'deneuve': False,
'placed': False, 'hullo': False, 'barely': False, 'ugh': False, 'small': False, 'annoyed': False, 't
rash': False, 'gang': False, 'vow': False, 'stay': False, 'thank': False, 'outlaws': False, '5': Fal
se, 'crouching': False, 'tiger': False, 'hidden': False, 'matrix': False, 'replacement': False, 'kil
lers': False, '6': False, 'romeo': False, 'die': False, 'shanghai': False, 'noon': False, 'remembere
d': False, 'dr': False, 'hannibal': False, 'lecter': False, 'michael': False, 'mann': False, 'forens
ics': False, 'thriller': False, 'manhunter': False, 'scottish': False, 'brian': False, 'cox': False,
'works': False, 'usually': False, 'schlock': False, 'halfway': False, 'goodnight': False, 'meaty': F
alse, 'substantial': False, 'brilliant': False, 'check': False, 'dogged': False, 'inspector': False,
'opposite': False, 'frances': False, 'mcdormand': False, 'ken': False, 'loach': False, 'agenda': Fal
se, 'harrigan': False, 'disturbing': False, 'l': False, 'e': False, '47': False, 'picked': False, 's
undance': False, 'distributors': False, 'scared': False, 'budge': False, 'dares': False, 'speak': Fa
lse, 'expresses': False, 'seeking': False, 'adolescents': False, 'pad': False, 'bothered': False, 'm
embers': False, 'presentation': False, 'oddly': False, 'empathetic': False, 'light': False, 'tempere
d': False, 'robust': False, 'listens': False, 'opposed': False, 'friends': False, 'wire': False, 'ac
t': False, 'confused': False, 'lives': False, 'pay': False, 'courtship': False, 'charming': False,
'temptations': False, 'grown': False, 'stands': False, 'island': False, 'expressway': False, 'slice
s': False, 'malls': False, 'class': False, 'homes': False, 'suburbia': False, 'filmmaker': False, 'c
uesta': False, 'uses': False, 'transparent': False, 'metaphor': False, '15': False, 'protagonist': F
alse, 'howie': False, 'franklin': False, 'dano': False, 'reveals': False, 'morbid': False, 'preoccup
ation': False, 'death': False, 'citing': False, 'deaths': False, 'alan': False, 'j': False, 'pakul
a': False, 'songwriter': False, 'harry': False, 'chapin': False, 'exit': False, '52': False, 'fascin
ated': False, 'feelings': False, 'projected': False, 'bright': False, 'move': False, 'force': False,
'complex': False, 'molesters': False, 'beast': False, 'ashamed': False, 'worked': False, 'ill': Fals
e, 'advised': False, 'foray': False, 'unnecessary': False, 'padding': False, 'miserable': False, 'br
uce': False, 'altman': False, 'seat': False, 'collar': False, 'crime': False, 'degenerate': False,
'youngsters': False, 'kicks': False, 'robbing': False, 'houses': False, 'homoerotic': False, 'shenan
igans': False, 'ass': False, 'terrio': False, 'billy': False, 'kay': False, 'handsome': False, 'artf
ul': False, 'dodger': False, 'add': False, 'themes': False, 'suburban': False, 'ennui': False, 'need
ed': False, 'awkward': False, 'subplots': False, 'concurrently': False, 'relationship': False, 'even
ly': False, 'paced': False, 'exceptionally': False, 'acted': False, 'sporting': False, 'baseball': F
alse, 'cap': False, 'faded': False, 'marine': False, 'tattoo': False, 'bluff': False, 'bluster': Fal
se, 'quiet': False, 'glance': False, 'withdrawn': False, 'whose': False, 'dramatic': False, 'choice
s': False, 'broad': False, 'calling': False, 'haley': False, 'restraint': False, 'admirable': False,
'screenplay': False, 'material': False, 'reads': False, 'walt': False, 'whitman': False, 'poem': Fal
se, 'moment': False, 'precious': False, 'lingers': False, 'ecstatic': False, 'hearing': False, 'glen
n': False, 'gould': False, 'performing': False, 'bach': False, 'goldberg': False, 'variations': Fals
e, 'involving': False, 'walter': False, 'masterson': False, 'jealous': False, 'newbie': False, 'thre
ad': False, 'predictably': False, 'leads': False, 'observational': False, 'portrait': False, 'aliena
tion': False, 'royally': False, 'screwed': False, 'terry': False, 'zwigoff': False, 'superb': False,
'confidence': False, 'ambivalent': False, 'typical': False, 'cinema': False, 'wrap': False, 'bulle
t': False, 'sparing': False, 'writers': False, 'philosophical': False, 'regard': False, 'countless':
False, 'share': False, 'blockbuster': False, 'solved': False, 'obstacle': False, 'removed': False,
'often': False, 'extend': False, 'question': False, 'striving': False, 'realism': False, 'destroy':
False, 'janeane': False, 'garofalo': False, 'couple': False, 'truth': False, 'cats': False, 'dogs':
False, 'excruciating': False, 'matchmaker': False, 'books': False, 'plods': False, 'predestined': Fa
lse, 'surprises': False, 'jumps': False, 'popular': False, 'political': False, 'satire': False, 'ban
dwagon': False, 'campaign': False, 'aide': False, 'massacusetts': False, 'senator': False, 'sander
s': False, 'reelection': False, 'denis': False, 'leary': False, 'stereotypical': False, 'strategis
t': False, 'ethics': False, 'scandal': False, 'plagued': False, 'play': False, 'irish': False, 'root
s': False, 'boston': False, 'roman': False, 'catholic': False, 'democrat': False, 'contingent': Fals
e, 'kennedy': False, 'family': False, 'orders': False, 'ireland': False, 'relatives': False, 'exploi
t': False, 'soon': False, 'learns': False, 'said': False, 'done': False, 'mantra': False, 'tiny': Fa
lse, 'misses': False, 'bus': False, 'hotel': False, 'ends': False, 'smallest': False, 'trashiest': F
alse, 'dog': False, 'luggage': False, 'roger': False, 'ebert': False, 'calls': False, 'meet': False,
'happens': False, 'unconventional': False, 'cinematic': False, 'walks': False, 'bathroom': False, 'n
ude': False, 'sean': False, 'david': False, 'hara': False, 'bathtub': False, 'points': False, 'guess
ing': False, 'water': False, 'hates': False, 'instant': False, 'saw': False, 'irishman': False, 'hat
e': False, 'awhile': False, 'succumb': False, 'charms': False, 'happily': False, 'superficial': Fals
e, 'detail': False, 'throw': False, 'turmoil': False, 'reconcile': False, 'tune': False, 'annual': F
alse, 'matchmaking': False, 'festival': False, 'lonely': False, 'county': False, 'future': False, 'b
liss': False, 'milo': False, 'shea': False, 'snyder': False, 'pops': False, 'onscreen': False, 'spe
w': False, 'souls': False, 'assured': False, 'match': False, 'utter': False, 'predictability': Fals
e, 'message': False, 'respectable': False, 'person': False, 'comedic': False, 'distinction': False,
'sell': False, 'script': False, 'excited': False, 'stays': False, 'stateside': False, 'yelling': Fal
se, 'phone': False, 'undoes': False, 'microphone': False, 'speech': False, 'known': False, 'flying':
False, 'hong': False, 'kong': False, 'style': False, 'filmmaking': False, 'classics': False, 'nod':
False, 'asia': False, 'france': False, 'lukewarm': False, 'dumas': False, 'asian': False, 'stunt': F
alse, 'coordinator': False, 'xing': False, 'xiong': False, 'prior': False, 'attempts': False, 'chore
ography': False, 'laughable': False, 'van': False, 'damme': False, 'vehicle': False, 'team': False,
'dennis': False, 'rodman': False, 'simon': False, 'sez': False, 'thrown': False, 'air': False, 'resu
lt': False, 'tepid': False, 'adventure': False, 'rip': False, 'stinks': False, 'indiana': False, 'jo
nes': False, 'simple': False, 'grandmother': False, 'adapted': False, 'artagnan': False, 'vengeful':
False, 'son': False, 'slain': False, 'travels': False, 'paris': False, 'join': False, 'royal': Fals
e, 'meets': False, 'cunning': False, 'cardinal': False, 'richelieu': False, 'stephen': False, 'rea':
False, 'overthrow': False, 'associate': False, 'febre': False, 'killer': False, 'disbanded': False,
'rounds': False, 'aramis': False, 'nick': False, 'moran': False, 'athos': False, 'jan': False, 'greg
or': False, 'kremp': False, 'porthos': False, 'steven': False, 'spiers': False, 'wrongfully': False,
'imprisoned': False, 'leader': False, 'treville': False, 'prison': False, 'frisky': False, 'interes
t': False, 'chambermaid': False, 'francesca': False, 'footsy': False, 'coo': False, 'hunts': False,
'queen': False, 'captured': False, 'menancing': False, 'forcing': False, 'regroup': False, 'leadin
g': False, 'charge': False, 'peter': False, 'hyams': False, 'wanted': False, 'blend': False, 'easter
n': False, 'western': False, 'styles': False, 'disaster': False, 'reality': False, 'ones': False, 'j
et': False, 'li': False, 'risk': False, 'ironically': False, 'swordplay': False, 'spread': False, 'c
arry': False, 'bulk': False, '30': False, 'minute': False, 'picture': False, 'weighs': False, 'monot
onous': False, 'gene': False, 'quintano': False, 'prosaic': False, 'wedding': False, 'planner': Fals
e, 'mousy': False, 'artangnan': False, 'hyam': False, 'candles': False, 'torches': False, 'grime': F
alse, 'filth': False, '17th': False, 'noted': False, 'standout': False, 'mortal': False, 'kombat': F
alse, 'annihilation': False, 'reviewed': False, 'multiple': False, 'levels': False, 'rampant': Fals
e, 'usage': False, 'randian': False, 'subtext': False, 'pervades': False, 'occasionaly': False, 'iro
nic': False, 'depreciating': False, 'remark': False, 'tosses': False, 'clearly': False, 'marxist': F
alse, 'imagery': False, 'kidding': False, 'seriousness': False, 'fair': False, '*': False, 'necessar
y': False, 'viewpoints': False, 'watcher': False, 'unfamiliar': False, 'marginally': False, 'fan': F
alse, 'games': False, '1995': False, 'concerned': False, 'martial': False, 'arts': False, 'tournamen
t': False, 'decide': False, 'fate': False, 'billion': False, 'inhabitants': False, 'mortals': False,
'theory': False}

In [10]: featuresets = [(find_features(rev), category) for (rev, category) in documents]


The first algorithm we are going to use is the Naive Bayes classifier. Before we can train and test it, we need to split the data into a
training set and a testing set. This approach is called supervised machine learning, because we show the machine labeled data, telling it
"this data is positive" or "this data is negative." Once training is done, we show the machine some new data and ask it, based on what it
learned before, which category it thinks the new data belongs to.

In [11]: training_set = featuresets[:1900]

testing_set = featuresets[1900:]
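
A plain slice like the one above only gives a fair split if the items are already in mixed order. The cells shown here do not say whether `documents` was shuffled earlier, so the following is only a minimal sketch of a shuffled split. The helper name `split_featuresets` and the toy data are hypothetical stand-ins, not part of the notebook.

```python
import random

def split_featuresets(featuresets, n_train, seed=0):
    """Shuffle a copy of featuresets, then split it into (train, test)."""
    shuffled = list(featuresets)           # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]

# Toy stand-in for the notebook's featuresets: 10 "neg" items followed by 10 "pos" items.
toy_featuresets = [({"id": i}, "neg" if i < 10 else "pos") for i in range(20)]

train, test = split_featuresets(toy_featuresets, n_train=15)
print(len(train), len(test))
```

With the real data, the call would presumably look like `split_featuresets(featuresets, 1900)`. Without shuffling, a list built category by category could leave the testing set with reviews of only one label.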

In [12]: classifier = nltk.NaiveBayesClassifier.train(training_set)

In [13]: print("Naive Bayes Algo accuracy:", (nltk.classify.accuracy(classifier, testing_set))*100)

Naive Bayes Algo accuracy: 78.0
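
The accuracy figure is the fraction of testing items whose predicted label matches the true label, multiplied by 100 in the print call. The sketch below illustrates that calculation in plain Python. The `ContainsGoodClassifier` class and the toy testing set are hypothetical and only stand in for the trained NLTK classifier, which also exposes a `classify` method.

```python
def accuracy(classifier, labeled_featuresets):
    """Fraction of items whose predicted label equals the gold label."""
    correct = sum(
        1
        for features, label in labeled_featuresets
        if classifier.classify(features) == label
    )
    return correct / len(labeled_featuresets)

class ContainsGoodClassifier:
    """Hypothetical stand-in: predicts 'pos' only if the 'good' feature is True."""
    def classify(self, features):
        return "pos" if features.get("good") else "neg"

toy_test_set = [
    ({"good": True}, "pos"),
    ({"good": False}, "neg"),
    ({"good": True}, "neg"),   # this one is misclassified
    ({"good": False}, "neg"),
]

print(accuracy(ContainsGoodClassifier(), toy_test_set) * 100)  # 75.0
```

If `featuresets` holds all 2,000 reviews of the movie_reviews corpus, the testing slice contains 100 reviews, so a score of 78.0 would correspond to 78 correct predictions.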

In [14]: classifier.show_most_informative_features(15)

Most Informative Features

annual = True pos : neg = 9.0 : 1.0
unimaginative = True neg : pos = 7.7 : 1.0
frances = True pos : neg = 7.6 : 1.0
schumacher = True neg : pos = 7.0 : 1.0
shoddy = True neg : pos = 7.0 : 1.0
mena = True neg : pos = 7.0 : 1.0
atrocious = True neg : pos = 7.0 : 1.0
suvari = True neg : pos = 7.0 : 1.0
regard = True pos : neg = 7.0 : 1.0
turkey = True neg : pos = 6.4 : 1.0
kidding = True neg : pos = 6.4 : 1.0
singers = True pos : neg = 6.3 : 1.0
stinks = True neg : pos = 5.8 : 1.0
justin = True neg : pos = 5.8 : 1.0
bothered = True neg : pos = 5.8 : 1.0

This tells us the ratio of occurrences in negative to positive reviews, or vice versa, for each word. So here, we can see that the term "annual"
appears 9.0 times as often in positive reviews as it does in negative reviews, and "unimaginative" appears 7.7 times as often in negative reviews.
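
Each ratio in the listing can be read as a quotient of two proportions: the share of reviews in one category that contain the word, divided by the share in the other category. Here is a rough sketch with made-up counts. The function name `likelihood_ratio` is hypothetical, and NLTK smooths its probability estimates, so the values it reports will not match raw counts exactly.

```python
def likelihood_ratio(count_a, total_a, count_b, total_b):
    """Share of class-A reviews containing a word, divided by the share in class B."""
    return (count_a / total_a) / (count_b / total_b)

# Made-up counts: the word occurs in 18 of 950 positive
# and in 2 of 950 negative training reviews.
ratio = likelihood_ratio(18, 950, 2, 950)
print(f"pos : neg = {ratio:.1f} : 1.0")  # pos : neg = 9.0 : 1.0
```

Words with a high ratio are rare overall but heavily skewed toward one category, which is why names such as "mena" or "suvari" can rank next to opinion words such as "shoddy".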

Saving Classifiers with NLTK

