Digital Assignment-1: Name: Bejugam Shiva Suprith REG NO: 18BCE0427 Faculty: Natarajan P SLOT: L45+L46
1. Write a program to tokenize the given document or data and compute the frequency of
words.
CODE:
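#18BCE0427
import nltk
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer data (one-time download);
# newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')

data = ('Web Mining is the process of Data Mining techniques to automatically discover and '
        'extract information from Web documents and services. The main purpose of web mining is '
        'discovering useful information from the World-Wide Web and its usage patterns.')

# Split the text into word and punctuation tokens
word = word_tokenize(data)
print(word)

# Count how many times each token occurs
word_count = {}
for i in word:
    if i in word_count:
        word_count[i] = word_count[i] + 1
    else:
        word_count[i] = 1
print(word_count)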
OUTPUT:
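As a variation on the word-count loop above, the same frequencies can be sketched with only the Python standard library. This is a minimal sketch, not the assignment's method: it assumes a simple regular-expression tokenizer (the hypothetical helper `simple_tokenize`) in place of NLTK's `word_tokenize`, so punctuation tokens such as '.' are dropped and the counts can differ slightly from the NLTK version.

```python
import re
from collections import Counter

# Same sample text as in the assignment code
data = ('Web Mining is the process of Data Mining techniques to automatically discover and '
        'extract information from Web documents and services. The main purpose of web mining is '
        'discovering useful information from the World-Wide Web and its usage patterns.')

def simple_tokenize(text):
    # Assumption: a token is a run of letters or digits, optionally joined by hyphens.
    # Unlike word_tokenize, this drops punctuation such as '.'.
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

tokens = simple_tokenize(data)
word_count = Counter(tokens)  # maps each token to its number of occurrences

# Print the five most frequent tokens, highest count first
for token, count in word_count.most_common(5):
    print(token, count)
```

Counter is case-sensitive here, just like the dictionary loop, so 'Web' and 'web' are counted as different words.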
CODE:
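from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# The stop word list needs a one-time download
nltk.download('stopwords')

# Use a separate name so the imported stopwords module is not overwritten
stop_words = set(stopwords.words('english'))
print(stop_words)

# Keep only the tokens that are not stop words
filtered_words = []
for i in word:
    if i not in stop_words:
        filtered_words.append(i)
print(filtered_words)

##stemming
# Reduce each remaining token to its stem with the Porter algorithm
ps = PorterStemmer()
stemmed_words = []
for i in filtered_words:
    stemmed_words.append(ps.stem(i))
print(stemmed_words)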
OUTPUT:
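One detail of the filtering step above: NLTK's English stop word list is all lowercase, so a capitalized token such as 'The' passes the membership check and stays in the filtered list. The following is a minimal sketch of a case-insensitive filter. It assumes a small hand-written stop word set (`SAMPLE_STOP_WORDS`, a hypothetical stand-in for the NLTK list) and a hypothetical helper `remove_stop_words`, so that it runs without any downloads.

```python
# Hypothetical stand-in for set(stopwords.words('english')); the real NLTK list is much longer
SAMPLE_STOP_WORDS = {'the', 'is', 'of', 'to', 'and', 'from', 'its'}

def remove_stop_words(tokens, stop_words):
    # Lowercase each token only for the membership test,
    # so the original spelling is preserved in the result.
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ['The', 'main', 'purpose', 'of', 'web', 'mining', 'is',
          'discovering', 'useful', 'information']
filtered = remove_stop_words(tokens, SAMPLE_STOP_WORDS)
print(filtered)
```

With the real NLTK list, the same function could be called as `remove_stop_words(word, set(stopwords.words('english')))`.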