NLP Lab-1 20bci7108
KARAKA.RUPASREE 20BCI7108
import nltk
nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> d
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Downloader> q
True
#Q1
import nltk
para="""The name of my village is Pakdiyar which falls in the Gopalganj district of Bihar.During the summer and winter vacations, I visit
My grandparents house is one of the biggest pakka houses in the village.My grandmother does a lot of social work for the villagers. There
People fetch water from these sources for daily use, irrigation, etc. They celebrate their joys together and stand united in tough times.
Every person in my village is hard working. My village does not have tall buildings and glittering lights. But it has peace, warmth and a
I love spending vacations in my village along with my parents and grandparents."""
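from nltk.tokenize import word_tokenize
# Split the paragraph into word and punctuation tokens
wt = word_tokenize(para)
print(wt)
# Note: this count includes punctuation tokens as well as words
print("\nNo. of words in the paragraph:", len(wt))
from nltk.probability import FreqDist
# Count how often each token occurs
fd = FreqDist(wt)
# Named word_counts rather than `list`, which would shadow the built-in type
word_counts = [(m, n) for m, n in fd.items()]
print("\n", word_counts)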
['The', 'name', 'of', 'my', 'village', 'is', 'Pakdiyar', 'which', 'falls', 'in', 'the', 'Gopalganj', 'district', 'of', 'Bihar.Durin
[('The', 1), ('name', 1), ('of', 4), ('my', 5), ('village', 7), ('is', 3), ('Pakdiyar', 1), ('which', 1), ('falls', 1), ('in', 7),
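In the token list above, a token beginning 'Bihar.Durin' appears because the paragraph has no space after that period, so the tokenizer keeps the two words together. A minimal sketch of one way to repair such spacing before tokenizing follows. The helper name `add_space_after_period` is hypothetical, and the rule it applies (a period directly between a lowercase and an uppercase letter marks a sentence boundary) is an assumption that may misfire on abbreviations or mixed-case names.

```python
import re

def add_space_after_period(text):
    """Insert a space after a period that sits directly between
    a lowercase letter and an uppercase letter (assumed sentence boundary)."""
    return re.sub(r"(?<=[a-z])\.(?=[A-Z])", ". ", text)

sample = "falls in the Gopalganj district of Bihar.During the summer"
fixed = add_space_after_period(sample)
print(fixed)
```

Passing the repaired text to `word_tokenize` instead of the original paragraph should then yield separate tokens for the two words and the period.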
#Q2
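from urllib import request
# Plain-text edition of "Crime and Punishment" from Project Gutenberg
url = "http://www.gutenberg.org/files/2554/2554-0.txt"
response = request.urlopen(url)
# Decode the downloaded bytes as UTF-8 text
raw = response.read().decode('utf8')
# Note: printing the whole book can exceed the notebook's output rate limit
print(raw)
# Tokenize the full text (word_tokenize was imported in Q1)
uwt = word_tokenize(raw)
print(uwt)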
Gutenberg-tm concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg-tm eBooks with only a loose network of
volunteer support.
Most people start at our website which has the main PG search
facility: www.gutenberg.org
Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)
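The output above ends in Project Gutenberg license text, which means the tokens and counts in the later questions include that boilerplate as well as the novel itself. A rough sketch of trimming it is shown here. The helper name `strip_gutenberg_boilerplate` is hypothetical, and the marker strings `*** START OF` and `*** END OF` are assumptions about how the file delimits the book; they should be checked against the downloaded text before relying on them.

```python
def strip_gutenberg_boilerplate(text, start_marker="*** START OF", end_marker="*** END OF"):
    """Return the text between the start-marker line and the end marker.
    If the markers are not found in the expected order, return the text unchanged."""
    start = text.find(start_marker)
    if start == -1:
        return text
    end = text.find(end_marker, start)
    # The body begins on the line after the start marker
    body_start = text.find("\n", start)
    if end == -1 or body_start == -1 or body_start > end:
        return text
    return text[body_start:end].strip()

# Small synthetic stand-in for the downloaded file
sample = (
    "Header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK X ***\n"
    "Chapter one.\nBody text.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK X ***\n"
    "License"
)
body = strip_gutenberg_boilerplate(sample)
# Print a short preview instead of the whole text
print(body[:60])
```

Printing only a slice such as `body[:60]` also keeps the cell output small.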
#Q3
# Frequency distribution over every token in the book
fd1 = FreqDist(uwt)
list1 = [(m, n) for m, n in fd1.items()]
print(list1)
[('\ufeffThe', 1), ('Project', 84), ('Gutenberg', 28), ('eBook', 11), ('of', 3849), ('Crime', 4), ('and', 6279), ('Punishment', 2),
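The first entry in this output, '\ufeffThe', carries a byte order mark (BOM): the file starts with one, and decoding with 'utf8' keeps it as a character attached to the first word. The short sketch below, which uses a synthetic byte string rather than the downloaded file, shows that Python's 'utf-8-sig' codec drops the mark during decoding.

```python
# Synthetic bytes that start with a UTF-8 byte order mark, like the downloaded file
data = "\ufeffThe Project Gutenberg eBook".encode("utf-8")

with_bom = data.decode("utf8")          # the BOM stays attached to the first word
without_bom = data.decode("utf-8-sig")  # the BOM is stripped while decoding

print(repr(with_bom.split()[0]))
print(repr(without_bom.split()[0]))
```

Using `decode('utf-8-sig')` in Q2 would therefore be expected to give a plain 'The' as the first token.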
#Q4
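from matplotlib import pyplot as plt
# Plot the 30 most frequent tokens as plain (non-cumulative) counts
fd1.plot(30,cumulative=False)
# The four most frequent tokens with their counts
print(fd1.most_common(4))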
plt.show()
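Because the distribution counts every token, the most frequent entries tend to be punctuation and function words such as 'of' and 'and'. The following is a minimal sketch of counting only alphabetic tokens, case-insensitively. `collections.Counter` is used as a stand-in so the snippet runs without NLTK, and the helper name `top_words` and the sample tokens are illustrative assumptions, not part of the lab.

```python
from collections import Counter

def top_words(tokens, n):
    """Return the n most common alphabetic tokens, ignoring case."""
    words = [t.lower() for t in tokens if t.isalpha()]
    return Counter(words).most_common(n)

# Small illustrative token list
tokens = ["The", "village", ",", "the", "village", ".", "Village", "of", "peace"]
result = top_words(tokens, 2)
print(result)
```

The same filtered list of words could be passed to `FreqDist` and plotted as in Q4.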