
Department of Computer Engineering

Academic Term: July-November 2023

Rubrics for Lab Experiments

Class: B.E. Computer          Subject Name: NLP

Semester: VII                 Subject Code: CSDC7023

Practical No: 4

Title: N-Gram Model

Date of Performance: 17/08/2023

Roll No: 9228

Name of the Student: Ruben Rodrigues

Evaluation:

Performance Indicator | Below average | Average | Good | Excellent | Marks
On time submission (2) | Not submitted (0) | Submitted after deadline (1) | Early or on-time submission (2) | --- |
Test cases and output (4) | Incorrect output (1) | The expected output is verified only for a few test cases (2) | The expected output is verified for all test cases but is not presentable (3) | Expected output is obtained for all test cases; presentable and easy to follow (4) |
Coding efficiency (2) | The code is not structured at all (0) | The code is structured but not efficient (1) | The code is structured and efficient (2) | - |
Knowledge (2) | Basic concepts not clear (0) | Understood the basic concepts (1) | Could explain the concept with a suitable example (1.5) | Could relate the theory with a real-world application (2) |
Total
Natural Language Processing (BE COMP – Sem-VII)

Experiment – 4
N-gram Model

Aim: To implement the N-gram model

Task 1:

Quiz output:

I sit you EOS : 0.00808


Can you sit near I EOS : 1/29700
I can sit EOS : 0.0121
You sit EOS : 0.0181
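The quiz values above come from whichever training corpus the quiz specifies. As a rough, self-contained sketch of the mechanics only (chain-rule product of bigram MLE estimates), the following uses a made-up toy corpus with hypothetical <s>/EOS sentence markers, so the numbers it prints will not match the quiz values:

from collections import Counter

# hypothetical toy corpus (NOT the quiz corpus); sentences are padded
# with <s> (begin) and EOS (end) markers
corpus = [
    ["<s>", "i", "can", "sit", "EOS"],
    ["<s>", "you", "sit", "EOS"],
    ["<s>", "can", "you", "sit", "EOS"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter()
for sent in corpus:
    bigrams.update(zip(sent, sent[1:]))

def sentence_prob(sentence):
    # chain rule with bigram MLE: P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})
    p = 1.0
    for prev, cur in zip(sentence, sentence[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(sentence_prob(["<s>", "you", "sit", "EOS"]))  # 0.333... on this toy corpus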

import pandas as pd

# keep only the selected_text and sentiment columns (positions 2 and 3)
df = pd.read_csv('/content/Tweets.csv', usecols=[2,3])

df.head()

   selected_text                          sentiment
0  I`d have responded, if I were going    neutral
1  Sooo SAD                               negative
2  bullying me                            negative
3  leave me alone                         negative
4  Sons of ****,                          negative

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

df = df.dropna()  # drop rows with missing text or sentiment

le = LabelEncoder()
df['sentiment'] = le.fit_transform(df['sentiment'])  # encode sentiment labels as integers

X = list(df['selected_text'])
y = list(df['sentiment'])

# hold out 25% of the tweets for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(analyzer='word', ngram_range=(1,1), stop_words='english')

X_train_cv = cv.fit_transform(X_train)
X_test_cv = cv.transform(X_test)

from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score
import numpy as np

clf = MultinomialNB()
clf.fit(X_train_cv, y_train)

y_pred = clf.predict(X_test_cv)

score = f1_score(y_test, y_pred, average='micro')


print('F-1 score : {}'.format(np.round(score,4)))

F-1 score : 0.7699

cv = CountVectorizer(analyzer='word', ngram_range=(1,2), stop_words='english')

X_train_cv = cv.fit_transform(X_train)
X_test_cv = cv.transform(X_test)

clf = MultinomialNB()
clf.fit(X_train_cv, y_train)

y_pred = clf.predict(X_test_cv)

score = f1_score(y_test, y_pred, average='micro')


print('F-1 score : {}'.format(np.round(score,4)))

F-1 score : 0.768


for N in range(1, 11):
    cv = CountVectorizer(analyzer='word', ngram_range=(1, N), stop_words='english')

    X_train_cv = cv.fit_transform(X_train)
    X_test_cv = cv.transform(X_test)

    clf = MultinomialNB()
    clf.fit(X_train_cv, y_train)
    y_pred = clf.predict(X_test_cv)

    score = np.round(f1_score(y_test, y_pred, average='micro'), 4)

    print('F-1 score of model with n-gram range of {}: {}'.format((1, N), score))

F-1 score of model with n-gram range of (1, 1): 0.7699


F-1 score of model with n-gram range of (1, 2): 0.768
F-1 score of model with n-gram range of (1, 3): 0.7655
F-1 score of model with n-gram range of (1, 4): 0.7652
F-1 score of model with n-gram range of (1, 5): 0.7658
F-1 score of model with n-gram range of (1, 6): 0.7662
F-1 score of model with n-gram range of (1, 7): 0.7662
F-1 score of model with n-gram range of (1, 8): 0.7659
F-1 score of model with n-gram range of (1, 9): 0.7659
F-1 score of model with n-gram range of (1, 10): 0.7659

Conclusion: Based on the results, the model performs best with the n-gram range of (1, 1), i.e. with unigram features alone (F-1 = 0.7699). Widening the range to include bigrams and larger n-grams does not improve performance: the additional n-grams mostly make the input feature space sparser without adding useful signal, which slightly hampers the model.
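To see this sparsity effect directly, one can count the features CountVectorizer produces at each range. A minimal sketch, reusing X_train from above (the feature counts and density it prints depend on the dataset, so treat the trend, not any specific numbers, as the point):

from sklearn.feature_extraction.text import CountVectorizer

for N in range(1, 6):
    cv = CountVectorizer(analyzer='word', ngram_range=(1, N), stop_words='english')
    X_cv = cv.fit_transform(X_train)
    # fraction of non-zero entries in the document-term matrix
    density = X_cv.nnz / (X_cv.shape[0] * X_cv.shape[1])
    print('ngram_range=(1,{}): {} features, density {:.6f}'.format(N, X_cv.shape[1], density))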

Postlab:

Perplexity is a metric for evaluating language models, including N-gram models, in the field of NLP. It measures how well a model predicts a sequence of tokens: the lower the value, the better the model.

For an N-gram model, which is a type of probabilistic language model that predicts the next word in a sequence based on the preceding words, perplexity is computed as:

PP(W) = P(w1, w2, ..., wN)^(-1/N)

Where:
N = number of words in the sequence
P(w1, w2, ..., wN) = probability assigned by the model to the entire sequence
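As an illustration, here is a minimal sketch of this computation for a bigram model on a made-up toy corpus, assuming unsmoothed MLE probabilities (so every test bigram must occur in training) and hypothetical <s>/EOS sentence markers:

import math
from collections import Counter

train = [["<s>", "i", "can", "sit", "EOS"],
         ["<s>", "you", "sit", "EOS"]]

unigrams = Counter(w for sent in train for w in sent)
bigrams = Counter()
for sent in train:
    bigrams.update(zip(sent, sent[1:]))

def perplexity(sentence):
    # PP(W) = P(w1..wN)^(-1/N), accumulated in log space for numerical stability
    log_p = 0.0
    for prev, cur in zip(sentence, sentence[1:]):
        log_p += math.log(bigrams[(prev, cur)] / unigrams[prev])
    n = len(sentence) - 1  # number of predicted tokens
    return math.exp(-log_p / n)

print(perplexity(["<s>", "i", "can", "sit", "EOS"]))  # low PP = sequence fits the model well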
