9228 NLP Expt 4
Practical No: 4
Evaluation:
Experiment – 4
N-gram Model
Task 1:
Quiz output:
import pandas as pd
import nltk
from nltk.util import ngrams
from nltk.corpus import movie_reviews
from nltk.classify import NaiveBayesClassifier
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import mark_negation
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
df = pd.read_csv('/content/Tweets.csv', usecols=[2,3])
df.head()
df.head() output (sample row):
    selected_text   sentiment
2   bullying me     negative
df = df.dropna()
le = LabelEncoder()
df['sentiment'] = le.fit_transform(df['sentiment'])
X = list(df['selected_text'])
y = list(df['sentiment'])
# 80/20 train/test split (split parameters assumed)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Baseline: unigram features only
cv = CountVectorizer(analyzer='word', ngram_range=(1, 1), stop_words='english')
X_train_cv = cv.fit_transform(X_train)
X_test_cv = cv.transform(X_test)
clf = MultinomialNB()
clf.fit(X_train_cv, y_train)
y_pred = clf.predict(X_test_cv)
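A quick way to score this unigram baseline on the held-out test set (a minimal sketch, not part of the original cell; accuracy_score is imported above):

# Score the unigram baseline on the held-out test set
print('Unigram accuracy:', accuracy_score(y_test, y_pred))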
cv = CountVectorizer(analyzer='word',ngram_range=(1,2), stop_words='english')
X_train_cv = cv.fit_transform(X_train)
X_test_cv = cv.transform(X_test)
clf = MultinomialNB()
clf.fit(X_train_cv, y_train)
y_pred = clf.predict(X_test_cv)
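For a per-class view of the (1,2) model, a short sketch using scikit-learn's classification_report (an addition for illustration, not in the original cell):

# Accuracy plus per-class precision/recall/F1 for the unigram+bigram model
from sklearn.metrics import classification_report
print('Unigram+bigram accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=le.classes_))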
for N in range(1, 11):
    cv = CountVectorizer(analyzer='word', ngram_range=(1, N), stop_words='english')
    X_train_cv = cv.fit_transform(X_train)
    X_test_cv = cv.transform(X_test)
    clf = MultinomialNB()
    clf.fit(X_train_cv, y_train)
    y_pred = clf.predict(X_test_cv)
    print(N, accuracy_score(y_test, y_pred))
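To make the trend behind the conclusion visible, a variant of the same sweep (illustrative; matplotlib and the scores list are additions) collects each accuracy and plots it against N:

import matplotlib.pyplot as plt

# Re-run the sweep, keeping the accuracy for each n-gram range
scores = []
for N in range(1, 11):
    cv = CountVectorizer(analyzer='word', ngram_range=(1, N), stop_words='english')
    clf = MultinomialNB().fit(cv.fit_transform(X_train), y_train)
    scores.append(accuracy_score(y_test, clf.predict(cv.transform(X_test))))

plt.plot(range(1, 11), scores, marker='o')
plt.xlabel('maximum n in ngram_range=(1, n)')
plt.ylabel('test accuracy')
plt.show()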
Conclusion: Based on the results, the model performs best with an n-gram range of (1,5). Training the model with n-grams ranging from unigrams up to 5-grams achieves the best accuracy, while larger n-grams only make the input features sparser, which hampers model performance.
Post-lab:

Perplexity is a metric for evaluating N-gram language models in the field of NLP. It gauges how well a model predicts a sequence of tokens based on the probabilities it assigns to them.

Perplexity(W) = P(w1, w2, ..., wN) ^ (-1/N)

where
N = number of words in the sequence
P(w1, w2, ..., wN) = probability assigned by the model to the entire sequence
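As a minimal sketch of this formula (the per-token probabilities below are purely illustrative values, not taken from any trained model), perplexity can be computed directly in Python:

import math

# Hypothetical per-token probabilities assigned by a model to a 3-word sequence
token_probs = [0.2, 0.1, 0.05]   # P(w1), P(w2 | w1), P(w3 | w2)

N = len(token_probs)                               # number of words in the sequence
log_prob = sum(math.log(p) for p in token_probs)   # log P(w1, ..., wN)
perplexity = math.exp(-log_prob / N)               # P(w1, ..., wN) ** (-1/N)
print(perplexity)                                  # lower perplexity = better model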