Welcome to Scribd!

Bag of Words 03 and 04 Model

Uploaded by

0% found this document useful (0 votes)

11 views4 pages

The document shows code for text processing and natural language processing. It imports NLTK and regular expression libraries. It tokenizes a sample paragraph into sentences and words. It then creates a dictionary counting word frequencies. The most frequent words are identified and sentences are represented as vectors of these frequent words.

Original Description:

Original Title

BAG OF WORDS 03 AND 04 MODEL

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

11 views4 pages

Bag of Words 03 and 04 Model

Uploaded by

Premjit Sengupta

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 4

Search inside document

4/15/2020 Untitled10 - Jupyter Notebook

In [1]:

import nltk

In [2]:

import re

In [21]:

import heapq
import numpy as np

In [4]:

paragraph = """Thank you all so very much. Thank you to the Academy.Thank you to all of you

In [5]:

dataset = nltk.sent_tokenize(paragraph)
for i in range(len(dataset)):
dataset[i] = dataset[i].lower()
dataset[i] = re.sub(r'\W',' ',dataset[i])
dataset[i] = re.sub(r'\s+',' ',dataset[i])

In [6]:

dataset

Out[6]:

['thank you all so very much ',

'thank you to the academy thank you to all of you in this room ',
'i have to congratulate the other incredible nominees this year ',
'the revenant was ']

In [7]:

word2count = {}
for data in dataset:
words = nltk.word_tokenize(data)
for word in words:
if word not in word2count.keys():
word2count[word] = 1
else:
word2count[word] += 1

127.0.0.1:8888/notebooks/Untitled10.ipynb?kernel_name=python3 1/4
4/15/2020 Untitled10 - Jupyter Notebook

In [8]:

word2count

Out[8]:

{'thank': 3,
'you': 4,
'all': 2,
'so': 1,
'very': 1,
'much': 1,
'to': 3,
'the': 3,
'academy': 1,
'of': 1,
'in': 1,
'this': 2,
'room': 1,
'i': 1,
'have': 1,
'congratulate': 1,
'other': 1,
'incredible': 1,
'nominees': 1,
'year': 1,
'revenant': 1,
'was': 1}

In [18]:

freq_words = heapq.nlargest(100,word2count,key=word2count.get)

127.0.0.1:8888/notebooks/Untitled10.ipynb?kernel_name=python3 2/4
4/15/2020 Untitled10 - Jupyter Notebook

In [19]:

freq_words

Out[19]:

['you',
'thank',
'to',
'the',
'all',
'this',
'so',
'very',
'much',
'academy',
'of',
'in',
'room',
'i',
'have',
'congratulate',
'other',
'incredible',
'nominees',
'year',
'revenant',
'was']

In [22]:

# Converting sentences to vectors

X = []
for data in dataset:
vector = []
for word in freq_words:
if word in nltk.word_tokenize(data):
vector.append(1)
else:
vector.append(0)
X.append(vector)

X = np.asarray(X)

In [23]:

Out[23]:

array([[1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]])

In [ ]:

127.0.0.1:8888/notebooks/Untitled10.ipynb?kernel_name=python3 3/4
4/15/2020 Untitled10 - Jupyter Notebook

127.0.0.1:8888/notebooks/Untitled10.ipynb?kernel_name=python3 4/4

Python Fundamentals For Machine Learning Version1
Document58 pages
Python Fundamentals For Machine Learning Version1
NABI M
No ratings yet
Additional Reading Material-Probability
Document11 pages
Additional Reading Material-Probability
Premjit Sengupta
No ratings yet
Week 18 Imperialism
Document5 pages
Week 18 Imperialism
api-301032464
No ratings yet
KUKA Programming Notes
Document13 pages
KUKA Programming Notes
92dalli
No ratings yet
Ams03 GB
Document27 pages
Ams03 GB
murtali
No ratings yet
Assignment No - 7
Document4 pages
Assignment No - 7
Sid Chabukswar
No ratings yet
Computer Programming Lectures
Document10 pages
Computer Programming Lectures
Fosu Adom
No ratings yet
18SOECE11008 Assignment 2
Document8 pages
18SOECE11008 Assignment 2
DEVENDRABHAI DABHI
No ratings yet
Ilovepdf Merged
Document101 pages
Ilovepdf Merged
enkaiwoo
No ratings yet
Python Basics
Document9 pages
Python Basics
mkesav3070
No ratings yet
Dsbda 7
Document1 page
Dsbda 7
monaliauti2
No ratings yet
01-NumPy Arrays - Jupyter Notebook
Document10 pages
01-NumPy Arrays - Jupyter Notebook
Kamal
No ratings yet
Act5 Wisnu Trenggono Wirayuda 57418379
Document9 pages
Act5 Wisnu Trenggono Wirayuda 57418379
Wisnu Trenggono Wirayuda
No ratings yet
BA - Codes
Document40 pages
BA - Codes
Narsipalli Kalyani Kousika
No ratings yet
Intro PDS
Document12 pages
Intro PDS
Albin P J
No ratings yet
NLP PDF
Document17 pages
NLP PDF
vlmohammedadnan9
No ratings yet
Python Quick Note
Document23 pages
Python Quick Note
Jyotirmay Sahu
No ratings yet
CH1 Introduction To Python
Document13 pages
CH1 Introduction To Python
gec
No ratings yet
MD - Shawkat Hossain - 20CSE046
Document14 pages
MD - Shawkat Hossain - 20CSE046
zahidul.cse7.bu
No ratings yet
Experiment 8 & 9
Document14 pages
Experiment 8 & 9
Sharath Chandan
No ratings yet
CSE 3024: Web Mining: Lab Assessment - 3
Document13 pages
CSE 3024: Web Mining: Lab Assessment - 3
Nikitha Reddy
No ratings yet
ASSIGNMENTS
Document5 pages
ASSIGNMENTS
Song Covers By DT
No ratings yet
1 Basics of Python
Document6 pages
1 Basics of Python
venkatesh m
No ratings yet
Lect2 Slides
Document24 pages
Lect2 Slides
tqbrowne
No ratings yet
Asss 7
Document4 pages
Asss 7
Ashwini Patil
No ratings yet
Hello, World!: Indentation
Document37 pages
Hello, World!: Indentation
program program
No ratings yet
Notebook Python Crash Course
Document25 pages
Notebook Python Crash Course
darayir140
No ratings yet
Linear Algebra: Prepared by Asif Bhat
Document42 pages
Linear Algebra: Prepared by Asif Bhat
joopraya
No ratings yet
Linear Algebra
Document42 pages
Linear Algebra
BI bi
No ratings yet
AND - Gate - Biased
Document4 pages
AND - Gate - Biased
mostafa98999
No ratings yet
Numeric Types
Document5 pages
Numeric Types
Sanket Patil
No ratings yet
Final Practical File 2022-23
Document87 pages
Final Practical File 2022-23
Kuldeep Singh
No ratings yet
Python 1
Document25 pages
Python 1
reema dsouza
No ratings yet
Python Basic - Jupyter Notebook
Document15 pages
Python Basic - Jupyter Notebook
Keyshaa
No ratings yet
PythonNotesV1 0
Document3 pages
PythonNotesV1 0
murthygrs
No ratings yet
Hands - Python
Document11 pages
Hands - Python
Thejovathi Raju
No ratings yet
Collections Datetime
Document4 pages
Collections Datetime
mgm9211
No ratings yet
Class 10 Loop
Document16 pages
Class 10 Loop
TheAncient01
No ratings yet
Final Tutorial
Document64 pages
Final Tutorial
Mohit Maurya
No ratings yet
AP19110010030 R Lab-Assignment-1
Document12 pages
AP19110010030 R Lab-Assignment-1
Sravan Kilaru AP19110010030
No ratings yet
Python - Basics - Dr. CHADI
Document11 pages
Python - Basics - Dr. CHADI
m.shaggyspringer
No ratings yet
1.4 Python Basics 2
Document4 pages
1.4 Python Basics 2
noman iqbal (nomylogy)
No ratings yet
ML Lab Manual
Document14 pages
ML Lab Manual
Priya Neelam
No ratings yet
AI Labfile
Document29 pages
AI Labfile
virgulatiit21a1068
No ratings yet
11 CSWTJan 2023 MS
Document3 pages
11 CSWTJan 2023 MS
moiiifitbituser
No ratings yet
3 - Intro - To - Python: 1 Using Python As A Calculator
Document16 pages
3 - Intro - To - Python: 1 Using Python As A Calculator
vb_murthy
No ratings yet
Python Exam Practice - Exercises
Document6 pages
Python Exam Practice - Exercises
Ariyan Jahanyar
No ratings yet
Numpy Functions
Document9 pages
Numpy Functions
Ramesh Kumar Mojjada
No ratings yet
Chemistry Project Main
Document62 pages
Chemistry Project Main
rahul kumar jha
No ratings yet
BECOB236 Code
Document10 pages
BECOB236 Code
kp
No ratings yet
Numpy Arrays
Document7 pages
Numpy Arrays
Adam Taufik
No ratings yet
Traversing 2D
Document1 page
Traversing 2D
haribabu ocr
No ratings yet
Numpy - KickStart - Jupyter Notebook
Document14 pages
Numpy - KickStart - Jupyter Notebook
dimple mahadule
No ratings yet
Python Introduction
Document20 pages
Python Introduction
Kalaivani V
No ratings yet
ABCD
Document9 pages
ABCD
skill share premium
No ratings yet
Python Assignment 4
Document5 pages
Python Assignment 4
Jai Sai
No ratings yet
Abhi Lab File
Document31 pages
Abhi Lab File
Abhi Goyal
No ratings yet
Computer Science S
Document30 pages
Computer Science S
Kumar Munendra Prf
No ratings yet
Daily Task - Day - 2 - Jupyter Notebook
Document6 pages
Daily Task - Day - 2 - Jupyter Notebook
Vrushali Vishwasrao
No ratings yet
PDF Assi
Document2 pages
PDF Assi
irron349
No ratings yet
1.a Numpy Code
Document2 pages
1.a Numpy Code
venkatesh m
No ratings yet
Lab 1
Document9 pages
Lab 1
m.o.ren.je.n.l.y
No ratings yet
Python For Beginners
From Everand
Python For Beginners
Célio Azevedo
No ratings yet
Profound Python Data Science
From Everand
Profound Python Data Science
Onder Teker
No ratings yet
N Gram Model
Document2 pages
N Gram Model
Premjit Sengupta
No ratings yet
02-Stemming - Jupyter Notebook
Document4 pages
02-Stemming - Jupyter Notebook
Premjit Sengupta
No ratings yet
K Nearest Neighbor Algorithm in Python - Towards Data Science
Document7 pages
K Nearest Neighbor Algorithm in Python - Towards Data Science
Premjit Sengupta
No ratings yet
Binomialdistribution 190124111432 PDF
Document26 pages
Binomialdistribution 190124111432 PDF
Premjit Sengupta
No ratings yet
Big Data With Hadoop & Spark - Introduction
Document28 pages
Big Data With Hadoop & Spark - Introduction
Premjit Sengupta
No ratings yet
Advertisement Analysis
Document4 pages
Advertisement Analysis
bhavani
No ratings yet
Producing Clinical Laboratory Shift Tables From Adam Data: Rao Bingi, Octagon Research Solutions, Wayne, PA
Document12 pages
Producing Clinical Laboratory Shift Tables From Adam Data: Rao Bingi, Octagon Research Solutions, Wayne, PA
lenith
No ratings yet
Power Distance in 10 Minutes 2015-09-05
Document8 pages
Power Distance in 10 Minutes 2015-09-05
Harshdeep Bhatia
No ratings yet
Example of Essay Correction by YES-IELTS
Document2 pages
Example of Essay Correction by YES-IELTS
YES IELTS
86% (7)
Kent 1997 Milling Modelling
Document7 pages
Kent 1997 Milling Modelling
gustool7
No ratings yet
Vocabulary Teaching
Document44 pages
Vocabulary Teaching
Ni Made Ivana Swastiana
100% (1)
Toshiba Placement Paper
Document12 pages
Toshiba Placement Paper
panakkaltino
100% (1)
Smith - Lesson Plan Final
Document2 pages
Smith - Lesson Plan Final
api-429426959
No ratings yet
Mif Brochure PDF
Document28 pages
Mif Brochure PDF
commerce.academy.bam
No ratings yet
UT Dallas Syllabus For Entp6375.0g1.10f Taught by Daniel Bochsler (dcb091000)
Document20 pages
UT Dallas Syllabus For Entp6375.0g1.10f Taught by Daniel Bochsler (dcb091000)
UT Dallas Provost's Technology Group
No ratings yet
Calf Rac
Document72 pages
Calf Rac
Emerson Ipiales Gudiño
No ratings yet
Junjhunu CGWB PDF
Document28 pages
Junjhunu CGWB PDF
sumanpunia
No ratings yet
2 Force and Acceleration
Document22 pages
2 Force and Acceleration
Mohd Harris Idris
No ratings yet
Stability of Structures: Basic Concepts
Document17 pages
Stability of Structures: Basic Concepts
miry89
No ratings yet
Bedanking Thesis
Document6 pages
Bedanking Thesis
afkodkedr
100% (2)
India - A Superpower
Document4 pages
India - A Superpower
Srinivasan
No ratings yet
Lords Friendships
Document1 page
Lords Friendships
purinaresh85
100% (1)
Da 1998
Document17 pages
Da 1998
Paul Dumitru
No ratings yet
Speech Acceptance and Words of Commitment
Document13 pages
Speech Acceptance and Words of Commitment
Jennifer Angelou Derije
No ratings yet
Quantum Compatibility Rules
Document4 pages
Quantum Compatibility Rules
Abel Andrada
No ratings yet
Objection-Handling Techniques and Methods
Document4 pages
Objection-Handling Techniques and Methods
avignamasthu
No ratings yet
Performance Appraisal On The City Bank Ltd.
Document43 pages
Performance Appraisal On The City Bank Ltd.
Norul Bashar Sagar
50% (2)
Nanotechnology
Document32 pages
Nanotechnology
Ashwin Kumar
No ratings yet
PSY 369: Psycholinguistics: Introductions & Brief History of Psycholinguistics
Document30 pages
PSY 369: Psycholinguistics: Introductions & Brief History of Psycholinguistics
Deka
No ratings yet
How To Enhance Learners' Vocabulary Retention Through The Use of Acronyms
Document14 pages
How To Enhance Learners' Vocabulary Retention Through The Use of Acronyms
Maxwell Mclaughlin
No ratings yet
CV Fahad
Document1 page
CV Fahad
Fahad Iqbal
No ratings yet
Audit Methodologies
Document3 pages
Audit Methodologies
Ahmed639
No ratings yet