
DIGITAL ASSIGNMENT-1

NAME: BEJUGAM SHIVA SUPRITH

REG NO: 18BCE0427

FACULTY: NATARAJAN P

SLOT: L45+L46
1. Write a program to tokenize the given document or data and compute the frequency of words.

CODE:

#18BCE0427

import nltk
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer data; download it once if it is missing
# nltk.download('punkt')  # on newer NLTK releases the resource is named 'punkt_tab'

data = ('Web Mining is the process of Data Mining techniques to automatically discover and '
        'extract information from Web documents and services. The main purpose of web mining is '
        'discovering useful information from the World-Wide Web and its usage patterns.')

# tokenising the given data
word = word_tokenize(data)
print(word)

# calculating word frequency: map each token to the number of times it occurs
word_count = {}
for i in word:
    if i in word_count:
        word_count[i] = word_count[i] + 1
    else:
        word_count[i] = 1

print(word_count)
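
The same frequency count can also be sketched with the standard library's collections.Counter. This is only an illustrative alternative: the sample sentence is a shortened stand-in for the assignment text, and the regex tokenizer is an assumption used here so the sketch runs without NLTK data; it is not equivalent to word_tokenize.

```python
import re
from collections import Counter

# Assumption: a shortened sample sentence, not the assignment's full text.
text = "Web mining is the process of data mining. The main purpose of web mining is discovery."

# Assumption: a simple regex tokenizer stands in for nltk's word_tokenize.
# It keeps runs of letters, digits, and hyphens and drops punctuation.
# Lowercasing first makes "Web" and "web" count as the same word.
tokens = re.findall(r"[A-Za-z0-9-]+", text.lower())

# Counter builds the token -> frequency mapping in one pass.
word_count = Counter(tokens)

# most_common(n) lists the n most frequent tokens with their counts.
print(word_count.most_common(3))
```

Unlike the dictionary loop, this version folds case before counting, so the totals can differ from a case-sensitive count of the same text.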

OUTPUT:

2. Write a program to remove the stop words and perform stemming.

CODE:

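from nltk.corpus import stopwords

# the stop word list needs the 'stopwords' corpus; download it once if it is missing
# nltk.download('stopwords')

# use a separate name so the imported stopwords module is not overwritten
stop_words = set(stopwords.words('english'))
print(stop_words)

# word is the token list produced in question 1
# the stop word list is lowercase, so each token is compared in lowercase
filtered_words = []
for i in word:
    if i.lower() not in stop_words:
        filtered_words.append(i)

print(filtered_words)

## stemming
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# reduce each remaining token to its stem
stemmed_words = []
for i in filtered_words:
    stemmed_words.append(ps.stem(i))

print(stemmed_words)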

OUTPUT:
