Héctor Bállega Fernández

Web Science and Engineering

WSE HOMEWORK - SEMANTIC WEB 2


Section 1 (basic analysis)

The selected book is Don Quixote by Miguel de Cervantes Saavedra:


http://www.gutenberg.org/ebooks/996

1. How many tokens (words and punctuation symbols) are in the text?


250,817 tokens.
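One way this count could have been obtained with NLTK is sketched below; the filename don_quixote.txt is a placeholder for the plain-text edition downloaded from the link above, and the Project Gutenberg header/footer would still need to be stripped before counting:

import nltk

nltk.download("punkt")  # tokenizer model used by word_tokenize

# Read the plain-text edition of the book.
with open("don_quixote.txt", encoding="utf-8") as f:
    text = f.read()

tokens = nltk.word_tokenize(text)
print(len(tokens))  # total number of word and punctuation tokens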

2. How many unique tokens (unique words and punctuation) does the text have?
13,953 unique tokens.
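Reusing the tokens list from the sketch above, the unique-token count is simply the size of the corresponding set:

unique_tokens = set(tokens)
print(len(unique_tokens))  # number of distinct word and punctuation tokens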

3. After lemmatizing the words, how many unique tokens does the text have?
11,270 lemmas (found in WordNet).
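A possible reading of this step uses NLTK's WordNet lemmatizer and keeps only lemmas that WordNet knows about; the exact normalisation in the original notebook (lower-casing, POS handling) is not shown, so this is only an assumption:

import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")
lemmatizer = WordNetLemmatizer()

# Lemmatize every token and keep the lemmas that have a WordNet entry,
# matching the "(found in WordNet)" remark above.
lemmas = {lemmatizer.lemmatize(tok.lower()) for tok in tokens}
print(len({lemma for lemma in lemmas if wordnet.synsets(lemma)}))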

4. What are the 20 most frequently occurring (unique) tokens in the text? What is their frequency?
[(',', 18109), ('the', 10664), ('and', 8347), ('to', 7187), ('of', 6859),
 ('that', 4165), ('in', 3659), ('a', 3400), ('he', 3155), ('I', 3086),
 ('.', 2904), ('it', 2844), (';', 2646), ('his', 2561), ('for', 2518),
 ('“', 2299), ('as', 2260), ('”', 2251), ('was', 2098), ('not', 1985)]
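Such a top-20 list can be produced with an NLTK frequency distribution over the same tokens list:

from nltk import FreqDist

fdist = FreqDist(tokens)
print(fdist.most_common(20))  # (token, count) pairs, most frequent first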

Section 2 (Word frequency)

Word frequencies can also be used to learn more about the contents of a document, such as the book you are analysing. The idea is that the most frequent words should characterize what the book is about. A nice way to illustrate this is via a word cloud; please provide a screenshot of the word cloud for each of the following settings:

1. No Filter: Consider all the words



2. No Stopwords: Remove stopwords. You probably noticed that the most important
words were mostly uninformative. To address this problem, a typical approach is to
remove so-called stopwords, which don't carry a lot of meaning.

3. NER: To filter the words further and keep only the named entities in the text, first
extract the entities using the NLTK library and then generate the word cloud, as sketched below.
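The three clouds could be produced along the following lines, using the wordcloud package together with NLTK's stopword list and named-entity chunker (ne_chunk over POS-tagged sentences). This reuses the text and tokens variables from the Section 1 sketches, and the plotting details are assumptions rather than the exact notebook settings:

import nltk
from nltk.corpus import stopwords
from wordcloud import WordCloud
import matplotlib.pyplot as plt

nltk.download(["punkt", "stopwords", "averaged_perceptron_tagger",
               "maxent_ne_chunker", "words"])

def show_cloud(words, title):
    # Build and display a word cloud from a list of words.
    cloud = WordCloud(width=800, height=400).generate(" ".join(words))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()

# 1. No filter: every alphabetic token.
word_tokens = [t for t in tokens if t.isalpha()]
show_cloud(word_tokens, "All words")

# 2. No stopwords: drop common function words first.
stops = set(stopwords.words("english"))
show_cloud([t for t in word_tokens if t.lower() not in stops], "Without stopwords")

# 3. NER: keep only the named entities found by NLTK's chunker.
entities = []
for sent in nltk.sent_tokenize(text):
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)))
    for subtree in tree:
        if hasattr(subtree, "label"):  # an entity chunk such as PERSON or GPE
            entities.append(" ".join(word for word, _ in subtree.leaves()))
show_cloud(entities, "Named entities")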

Section 3 (Word embedding)


Assume we want to find books similar to the one we chose. For this, please perform the
following steps:

- pick at least 7 additional books from Project Gutenberg

- extract their named entities

- measure the semantic similarity (using word embeddings) between the additional selected books and the initial book, based on their extracted entities

- rank the additional books by their similarity to the initial one (descending order). In this way you are able to find the book most similar to the initial one. (List the book names and their similarity to the initial book; a sketch of these steps follows below.)
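One way these steps could be carried out is sketched below, using spaCy's medium English model (en_core_web_md) both for entity extraction and for its bundled word vectors. The file names, book titles, and the document-level similarity measure are placeholders and may differ from the notebook shown in the screenshot:

import spacy

# Assumes `python -m spacy download en_core_web_md` has been run;
# the medium model ships with word vectors.
nlp = spacy.load("en_core_web_md")

def entity_text(path):
    # Return the named entities of a book as one space-separated string.
    with open(path, encoding="utf-8") as f:
        doc = nlp(f.read()[:1_000_000])  # stay under spaCy's default max_length
    return " ".join(ent.text for ent in doc.ents)

# Placeholder file names; replace with the actual Gutenberg downloads.
initial = nlp(entity_text("don_quixote.txt"))
others = {
    "Book A": "book_a.txt",
    "Book B": "book_b.txt",
    # ... at least seven additional books in total
}

# Similarity between the entity "documents" (average word vectors),
# ranked from most to least similar to the initial book.
ranking = sorted(
    ((title, initial.similarity(nlp(entity_text(path))))
     for title, path in others.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for title, score in ranking:
    print(f"{title}: {score:.3f}")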

Solution (screenshot from notebook):
