Verma - NLP Lab Manual


Prestige Institute of Engineering, Management & Research, Indore

Department of Artificial Intelligence and Data Science

Lab Manual

AD 802
Natural Language Processing

Name : Kartik Verma


Roll No : 0863AD201021
Branch : B. Tech ( AI & DS )
Semester : VIII ( 4th year )
Submitted To : Dr. Pritika Bahad

Prestige Institute of Engineering Research & Management

LIST OF EXPERIMENTS
S.No. | Name of Experiment | Date of Experiment | Faculty Signature

1. Installation and exploring features of NLTK and spaCy tools. Download WordCloud and a few corpora.

2. (i) Write a program to implement Word, Sentence and Paragraph Tokenizers.
(ii) Check how many words there are in any corpus. Also check how many distinct words there are.

3. (i) Write a program to implement both user-defined and pre-defined functions to generate (a) Uni-grams (b) Bi-grams (c) Tri-grams (d) N-grams.
(ii) Write a program to find the word (w2) with the highest probability of occurring after another word (w1).

4. (i) Write a program to identify word collocations.
(ii) Write a program to print all words beginning with a given sequence of letters.
(iii) Write a program to print all words longer than four characters.

5. (i) Write a program to identify the mathematical expressions in a given sentence.
(ii) Write a program to identify the different components of an email address.

6. (i) Write a program to identify all antonyms and synonyms of a word.
(ii) Write a program to find hyponymy, homonymy and polysemy for a given word.

7. (i) Write a program to find all the stop words in any given text.
(ii) Write a function that finds the 50 most frequently occurring words of a text that are not stop words.

8. Write a program to implement various stemming techniques and prepare a chart with the performance of each method.

9. Write a program to implement various lemmatization techniques and prepare a chart with the performance of each method.

10. (i) Write a program to implement Part-of-Speech (PoS) tagging for any corpus.
(ii) Write a program to identify the word with the greatest number of distinct tags. What are the tags, and what do they represent?
(iii) Write a program to list tags in order of decreasing frequency. What do the 20 most frequent tags represent?
(iv) Write a program to identify the tags that nouns most commonly follow. What do these tags represent?

11. Write a program to implement TF-IDF for any corpus.

12. (i) Write a program to find all the misspelled words in a paragraph.
(ii) Write a program to prepare a table with the frequency of mis-spelled tags for any given text.

13. Write a program to implement all the NLP pre-processing techniques required to perform further NLP tasks.

Case Study-2. Write a program to perform auto-correction of spellings for any text.


Experiment No. 1
Aim:- Install NLTK and spaCy and explore their features. Download WordCloud and a few corpora.

To install NLTK:

To install spaCy:

Download NLTK corpora:

Similarly, for spaCy, you can download pre-trained models for different languages:

Install WordCloud Library:
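The installation commands themselves were screenshots that did not survive. The following is a minimal Python sketch of the same setup, assuming pip and internet access. `missing_packages` only checks which of `nltk`, `spacy` and `wordcloud` are importable. The commented-out calls perform the real installs with `pip`, the NLTK downloads with `nltk.download`, and the spaCy model download with `python -m spacy download en_core_web_sm`. The list of NLTK resources is an assumption covering the later experiments.

```python
import importlib.util
import subprocess
import sys

# Import name -> pip package name for the libraries used in this manual.
REQUIRED = {"nltk": "nltk", "spacy": "spacy", "wordcloud": "wordcloud"}

# Assumption: NLTK resources used by later experiments; adjust to your needs.
NLTK_RESOURCES = ["punkt", "stopwords", "wordnet", "averaged_perceptron_tagger", "brown"]


def missing_packages(required=REQUIRED):
    """Return the import names that are not installed in this interpreter."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def install(packages):
    """pip-install packages for the current interpreter (needs internet)."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])


def download_resources():
    """Fetch NLTK corpora and a small English spaCy model (needs internet)."""
    import nltk
    for resource in NLTK_RESOURCES:
        nltk.download(resource)
    subprocess.check_call([sys.executable, "-m", "spacy", "download", "en_core_web_sm"])


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", missing or "none")
    # Uncomment to perform the actual setup:
    # install([REQUIRED[name] for name in missing])
    # download_resources()
```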


Experiment No. 2
i. Write a program to implement Word, Sentence and Paragraph Tokenizers.

Word Tokenizer:

Sentence Tokenizer:

Paragraph Tokenizer / Blankline Tokenizer:
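NLTK's pre-defined tokenizers are `nltk.word_tokenize`, `nltk.sent_tokenize` (both need the `punkt` data) and `nltk.tokenize.BlanklineTokenizer`. The following is a dependency-free regex sketch of the same three ideas. The function names are hypothetical. The sentence rule is a simplification: it splits after `.`, `!` or `?` followed by whitespace, so abbreviations such as "Dr." will be split wrongly.

```python
import re


def simple_word_tokenize(text):
    """Words (runs of letters/digits/underscore) and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def simple_sentence_tokenize(text):
    """Split after '.', '!' or '?' when the mark is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_paragraph_tokenize(text):
    """Split wherever one or more blank lines occur, like a blankline tokenizer."""
    return [p.strip() for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]


sample = "NLP is fun. Is it hard?\n\nTokenizers split text into units."
print(simple_word_tokenize(sample))
print(simple_sentence_tokenize(sample))
print(simple_paragraph_tokenize(sample))
```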


ii. Check how many words there are in any corpus. Also check how many distinct words there are.
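The following is a minimal sketch that counts total and distinct words, ignoring case and punctuation. It treats a word as a run of letters, so "don't" counts as "don" and "t". To use a real corpus, the text could come from NLTK, for example `" ".join(nltk.corpus.gutenberg.words("austen-emma.txt"))` after downloading the `gutenberg` corpus.

```python
import re


def word_counts(text):
    """Return (total words, distinct words), case-insensitive, letters only."""
    words = re.findall(r"[a-z]+", text.lower())
    return len(words), len(set(words))


corpus = "The cat sat on the mat. The mat was red."
total, distinct = word_counts(corpus)
print(f"Total words: {total}, distinct words: {distinct}")
```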


Experiment No. 3

i. Write a program to implement both user-defined and pre-defined functions to generate

(a) Uni-grams (b) Bi-grams (c) Tri-grams (d) N-grams
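The following sketch pairs a user-defined n-gram function with NLTK's pre-defined `nltk.util.ngrams`, which needs no corpus download. The name `user_ngrams` is hypothetical. If NLTK is not installed, the comparison with NLTK is skipped. An n-gram here is every run of n consecutive tokens, so a list of k tokens yields k - n + 1 n-grams.

```python
def user_ngrams(tokens, n):
    """User-defined n-grams: every run of n consecutive tokens, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


try:
    # Pre-defined equivalent shipped with NLTK.
    from nltk.util import ngrams as nltk_ngrams
except ImportError:
    nltk_ngrams = None

tokens = "natural language processing is fun".split()
for n, label in [(1, "Uni-grams"), (2, "Bi-grams"), (3, "Tri-grams"), (4, "4-grams")]:
    print(label, user_ngrams(tokens, n))
    if nltk_ngrams is not None:
        # Both approaches should produce identical tuples.
        assert list(nltk_ngrams(tokens, n)) == user_ngrams(tokens, n)
```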

Output:


ii. Write a program to find the word (w2) with the highest probability of occurring after another word (w1).
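The following is a minimal sketch using the maximum-likelihood bigram estimate P(w2 | w1) = C(w1 w2) / C(w1). In this sketch, C(w1) counts w1 only where it starts a bigram, so the last token of the text is not counted. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def next_word_probabilities(tokens):
    """Map each w1 to {w2: P(w2 | w1)} estimated from bigram counts."""
    follow = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        follow[w1][w2] += 1
    probs = {}
    for w1, counter in follow.items():
        total = sum(counter.values())  # C(w1) as the first word of a bigram
        probs[w1] = {w2: c / total for w2, c in counter.items()}
    return probs


def most_likely_next(tokens, w1):
    """Return (w2, P(w2 | w1)) with the highest probability, or None."""
    dist = next_word_probabilities(tokens).get(w1)
    if not dist:
        return None
    return max(dist.items(), key=lambda item: item[1])


tokens = "the cat sat on the mat and the cat ran".split()
print(most_likely_next(tokens, "the"))
```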

Output:


Experiment No. 4

i. Write a program to identify the word collocations.
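One common way to find collocations is to rank adjacent word pairs by pointwise mutual information (PMI), PMI(x, y) = log2(P(x, y) / (P(x) P(y))), after dropping rare pairs. In NLTK this is done with `BigramCollocationFinder.from_words(tokens)`, `finder.apply_freq_filter(2)` and `finder.nbest(BigramAssocMeasures().pmi, 10)`. The following is a dependency-free sketch of the same scoring. The `min_freq` threshold of 2 is an assumption.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_freq=2, top_n=10):
    """Rank adjacent word pairs by PMI, ignoring pairs seen < min_freq times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), max(len(tokens) - 1, 1)
    scored = []
    for (x, y), count in bigrams.items():
        if count < min_freq:
            continue  # PMI overrates pairs that occur only once
        p_xy = count / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scored.append(((x, y), math.log2(p_xy / (p_x * p_y))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


text = "new york is big . i love new york . new york never sleeps"
print(pmi_collocations(text.split()))
```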

Output:


ii. Write a program to print all words beginning with a given sequence of letters.
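The following is a minimal sketch that matches the prefix case-insensitively. It returns each matching word once, in order of first appearance. The function name and sample sentence are illustrative.

```python
import re


def words_starting_with(text, prefix):
    """Distinct words (first-appearance order) that begin with prefix, any case."""
    words = re.findall(r"[A-Za-z]+", text)
    return list(dict.fromkeys(w for w in words if w.lower().startswith(prefix.lower())))


sentence = "Natural language processing needs patient practice and precise programs."
print(words_starting_with(sentence, "pr"))
```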

Output:


iii. Write a program to print all words longer than four characters.
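The following is a minimal sketch. It counts only letters as characters, so punctuation attached to a word does not make the word "longer". Each word is reported once.

```python
import re


def long_words(text, longer_than=4):
    """Distinct words with more than `longer_than` letters, in order of appearance."""
    words = re.findall(r"[A-Za-z]+", text)
    return list(dict.fromkeys(w for w in words if len(w) > longer_than))


print(long_words("Natural language processing is fun and useful"))
```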

Output:


Experiment No. 5

i. Write a program to identify the mathematical expression in a given sentence.
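The following is a regex-based sketch. It treats an expression as operands (numbers or identifiers) joined by at least one operator. It assumes the operator set is `+ - * / ^ % = < >`. One limitation is that hyphenated words such as "well-known" also match.

```python
import re

# Operand: number or identifier; expression: operands joined by operators.
# Assumption: only + - * / ^ % = < > count as operators.
EXPRESSION = re.compile(r"[\w.]+(?:\s*[-+*/^%=<>]\s*[\w.]+)+")


def find_math_expressions(sentence):
    """Return each operator-joined run of operands found in the sentence."""
    return [m.group(0) for m in EXPRESSION.finditer(sentence)]


print(find_math_expressions("If x = 5 then 2*x + 3 equals 13."))
```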

Output:


ii. Write a program to identify different components of an email address.
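The following is a sketch that splits an address into username, domain and top-level domain with named regex groups. This is a simplified pattern, not a full RFC 5322 validator. Sub-domains stay inside `domain`, and the top-level domain is assumed to be at least two letters.

```python
import re

# username @ domain . tld  (sub-domains such as "mail.example" stay in domain)
EMAIL = re.compile(
    r"(?P<username>[\w.+-]+)@(?P<domain>[\w-]+(?:\.[\w-]+)*)\.(?P<tld>[A-Za-z]{2,})"
)


def email_components(address):
    """Return {'username', 'domain', 'tld'} or None if the address does not match."""
    match = EMAIL.fullmatch(address.strip())
    return match.groupdict() if match else None


print(email_components("student.name@mail.example.com"))
```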

Output:


Experiment No. 6

i. Write a program to identify all antonyms and synonyms of a word.

Output:


ii. Write a program to find hyponymy, homonymy, polysemy for a given word.

Output:


Experiment No. 7

i. Write a program to find all the stop words in any given text.
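The following is a minimal sketch that reports which stop words occur in a text. The short `STOP_WORDS` set is an illustrative assumption. In practice, the full list would come from `nltk.corpus.stopwords.words("english")` after `nltk.download("stopwords")`.

```python
import re

# Assumption: a small hand-picked stop list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
              "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with"}


def find_stop_words(text, stop_words=STOP_WORDS):
    """Stop words present in text, each once, in order of first appearance."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(dict.fromkeys(w for w in words if w in stop_words))


print(find_stop_words("The cat is on the mat, and it is happy."))
```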

Output:


ii. Write a function that finds the 50 most frequently occurring words of a text that are not stop words.
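The following is a sketch using `collections.Counter.most_common`. It assumes the same kind of small illustrative stop list as in part (i). NLTK's `stopwords.words("english")` would be the usual replacement. `n` defaults to 50 as the aim requires.

```python
import re
from collections import Counter

# Assumption: a small hand-picked stop list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
              "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with"}


def top_content_words(text, n=50, stop_words=STOP_WORDS):
    """The n most frequent (word, count) pairs that are not stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)


print(top_content_words("The data and the model and the data"))
```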

Output:


Experiment No. 8

Aim:- Write a program to implement various stemming techniques and prepare a chart
with the performance of each method.
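The following is a sketch comparing NLTK's Porter, Lancaster and Snowball stemmers, which need no corpus download. The performance measures are assumptions chosen for illustration: the number of distinct stems produced (fewer means more aggressive conflation) and the run time. The word list is illustrative. The table is printed as text; the same numbers could be drawn as a bar chart, e.g. with matplotlib.

```python
import time

from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

WORDS = ["running", "runs", "ran", "easily", "fairly", "studies", "studying",
         "happiness", "generous", "generously", "connection", "connected"]


def compare_stemmers(words=WORDS):
    """Stem every word with each method; record stems, distinct count and time."""
    stemmers = {
        "Porter": PorterStemmer(),
        "Lancaster": LancasterStemmer(),
        "Snowball": SnowballStemmer("english"),
    }
    report = {}
    for name, stemmer in stemmers.items():
        start = time.perf_counter()
        stems = [stemmer.stem(w) for w in words]
        elapsed_ms = (time.perf_counter() - start) * 1000
        report[name] = {"stems": stems, "distinct": len(set(stems)), "ms": elapsed_ms}
    return report


if __name__ == "__main__":
    report = compare_stemmers()
    print(f"{'Method':<10}{'Distinct stems':>16}{'Time (ms)':>12}")
    for name, r in report.items():
        print(f"{name:<10}{r['distinct']:>16}{r['ms']:>12.3f}")
```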

Output:


Experiment No. 9

Aim:- Write a program to implement various lemmatization techniques and prepare a chart with the performance of each method.

Output:


Experiment No. 10
i. Write a program to implement Part-of-Speech (PoS) tagging for any corpus.
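NLTK's ready-made tagger is `nltk.pos_tag(tokens)`, which needs the tagger model download. The following sketch shows the simpler most-frequent-tag (unigram) approach, the idea behind NLTK's `UnigramTagger`. The tiny training sample is hypothetical and uses Penn Treebank tags. A real run would train on e.g. `nltk.corpus.treebank.tagged_sents()`. Unknown words default to `NN`, which is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training sentences (Penn Treebank tags).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("dogs", "NNS"), ("run", "VBP"), ("fast", "RB")],
    [("they", "PRP"), ("run", "VBP"), ("a", "DT"), ("race", "NN")],
]


def train_unigram_tagger(tagged_sents):
    """Learn each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(tokens, model, default="NN"):
    """Tag tokens with the learned tag, falling back to `default` for unseen words."""
    return [(tok, model.get(tok.lower(), default)) for tok in tokens]


model = train_unigram_tagger(TRAIN)
print(tag("The cat barks".split(), model))
```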

Output:


ii. Write a program to identify the word with the greatest number of distinct tags. What are the tags, and what do they represent?
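The following sketch collects the set of tags seen for each word and picks the largest set. The short tagged sample is hypothetical (Penn Treebank tags). On a real corpus you would pass e.g. `nltk.corpus.treebank.tagged_words()`. The `MEANINGS` entries give the standard Penn Treebank descriptions of the tags in this sample.

```python
from collections import defaultdict

# Hypothetical tagged sample: "I know that that book is the one that you book trips with"
TAGGED = [
    ("I", "PRP"), ("know", "VBP"), ("that", "IN"), ("that", "DT"), ("book", "NN"),
    ("is", "VBZ"), ("the", "DT"), ("one", "NN"), ("that", "WDT"), ("you", "PRP"),
    ("book", "VB"), ("trips", "NNS"), ("with", "IN"),
]
MEANINGS = {"DT": "determiner", "IN": "preposition or subordinating conjunction",
            "WDT": "wh-determiner", "NN": "noun, singular", "VB": "verb, base form"}


def word_with_most_tags(tagged_words):
    """Return (word, sorted distinct tags) for the word with the most distinct tags."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_words:
        tags_by_word[word.lower()].add(tag)
    word = max(tags_by_word, key=lambda w: len(tags_by_word[w]))
    return word, sorted(tags_by_word[word])


word, tags = word_with_most_tags(TAGGED)
print(word, [(t, MEANINGS.get(t, "?")) for t in tags])
```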

Output:


iii. Write a program to list tags in order of decreasing frequency. What do the 20 most frequent tags represent?
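The following sketch counts tags with `Counter.most_common`, using `top=20` as the aim requires. The one-sentence tagged sample is hypothetical (Penn Treebank tags). The `MEANINGS` table lists standard Penn Treebank descriptions only for the tags in the sample; a real corpus would need the full tag list.

```python
from collections import Counter

# Hypothetical tagged sample: "the quick brown fox jumps over the lazy dog"
TAGGED = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
MEANINGS = {"DT": "determiner", "JJ": "adjective", "NN": "noun, singular",
            "VBZ": "verb, 3rd person singular present",
            "IN": "preposition or subordinating conjunction"}


def tags_by_frequency(tagged_words, top=20):
    """(tag, count) pairs in order of decreasing frequency."""
    return Counter(tag for _, tag in tagged_words).most_common(top)


for tag, count in tags_by_frequency(TAGGED):
    print(f"{tag:<5}{count:>3}  {MEANINGS.get(tag, '?')}")
```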

Output:


iv. Write a program to identify the tags that nouns most commonly follow. What do these tags represent?
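The following sketch walks over consecutive (word, tag) pairs. Whenever the second word is a noun (any tag starting with `NN`: NN, NNS, NNP, NNPS), it counts the tag of the word just before it. The tagged sample is hypothetical. In it, nouns follow JJ (adjective) and DT (determiner).

```python
from collections import Counter

# Hypothetical tagged sample (Penn Treebank tags).
TAGGED = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
          ("the", "DT"), ("cat", "NN"), ("sat", "VBD"), ("on", "IN"), ("the", "DT"),
          ("mat", "NN")]


def tags_before_nouns(tagged_words):
    """(tag, count) of words that immediately precede a noun, most common first."""
    return Counter(
        prev_tag
        for (_, prev_tag), (_, tag) in zip(tagged_words, tagged_words[1:])
        if tag.startswith("NN")
    ).most_common()


print(tags_before_nouns(TAGGED))
```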

Output:


Experiment No. 11

Aim:- Write a program to implement TF-IDF for any corpus.
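The following is a from-scratch sketch using the textbook definitions: tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. A term that occurs in every document therefore scores 0. scikit-learn's `TfidfVectorizer` uses a smoothed idf by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """One {term: tf-idf score} dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                       for t, c in counts.items()})
    return scores


docs = ["the cat sat", "the dog sat", "the cat ran"]
for doc, score in zip(docs, tf_idf(docs)):
    print(doc, "->", {t: round(s, 3) for t, s in score.items()})
```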

Output:


Experiment No. 12

i. Write a program to find all the mis-spelled words in a paragraph.
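The following is a minimal dictionary-lookup sketch: any word not in a reference vocabulary is reported. The tiny `VOCABULARY` is an assumption for illustration. A realistic checker would use a full word list such as `nltk.corpus.words.words()` after `nltk.download("words")`. This approach cannot catch real-word errors such as "their" for "there".

```python
import re

# Assumption: a tiny reference vocabulary for illustration only.
VOCABULARY = {"this", "is", "a", "simple", "paragraph", "has", "some", "spelling",
              "errors", "the", "with", "and", "of"}


def misspelled_words(paragraph, vocabulary=VOCABULARY):
    """Words not found in the vocabulary, each once, in order of appearance."""
    words = re.findall(r"[a-z]+", paragraph.lower())
    return [w for w in dict.fromkeys(words) if w not in vocabulary]


print(misspelled_words("This simpel paragraph has speling erors."))
```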

Output:


ii. Write a program to prepare a table with frequency of mis-spelled tags for any given text.
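The following sketch counts how often each out-of-vocabulary word occurs and prints the counts as a two-column table. It uses the same vocabulary-lookup assumption as part (i), with a tiny illustrative `VOCABULARY` standing in for a full word list.

```python
import re
from collections import Counter

# Assumption: a tiny reference vocabulary for illustration only.
VOCABULARY = {"the", "cat", "and", "dog", "saw", "a", "bird", "it", "will", "you"}


def misspelling_table(text, vocabulary=VOCABULARY):
    """(misspelled word, frequency) rows, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in vocabulary).most_common()


def print_table(rows):
    """Print rows as a simple fixed-width table."""
    print(f"{'Misspelled word':<18}{'Frequency':>10}")
    for word, freq in rows:
        print(f"{word:<18}{freq:>10}")


print_table(misspelling_table("Teh cat and teh dog saw teh bird. Will you recieve it?"))
```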

Output:


Experiment No. 13

Aim:- Write a program to implement all the NLP Pre-Processing Techniques required to
perform further NLP tasks.
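The following is a dependency-free sketch of a typical pre-processing pipeline:
1. Case folding.
2. URL and HTML-tag removal.
3. Digit and punctuation removal.
4. Whitespace tokenization.
5. Stop-word removal.

The order of steps and the small `STOP_WORDS` set are assumptions. URLs and tags are stripped before punctuation so that their pieces do not survive as tokens. Stemming or lemmatization (Experiments 8 and 9) would typically be applied to the resulting tokens.

```python
import re
import string

# Assumption: a small illustrative stop list; NLTK's stopwords list is the usual choice.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "on", "the", "to"}
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stop_words=STOP_WORDS):
    """Return cleaned, stop-word-free tokens from raw text."""
    text = text.lower()                                    # case folding
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)     # remove URLs
    text = re.sub(r"<[^>]+>", " ", text)                   # remove HTML tags
    text = re.sub(r"\d+", " ", text)                       # remove digits
    text = text.translate(PUNCT_TO_SPACE)                  # punctuation -> spaces
    tokens = text.split()                                  # whitespace tokenization
    return [t for t in tokens if t not in stop_words]      # stop-word removal


print(preprocess("Visit https://example.com for <b>10</b> NLP tips, and the TRICKS!"))
```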

Output:


Case Study: Write a program to perform Auto-Correction of spellings for any text.
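The following sketch follows the approach of Peter Norvig's well-known spelling corrector. It generates every string one edit away (delete, transpose, replace, insert) and keeps the candidates that are known words. If there are none, it tries two edits. It then picks the candidate that is most frequent in a reference corpus. The small `SAMPLE_CORPUS` is an assumption; meaningful frequencies need a large text. `autocorrect` lowercases its input.

```python
import re
from collections import Counter

# Assumption: a small sample corpus supplying the known words and their frequencies.
SAMPLE_CORPUS = """natural language processing helps computers understand human language.
spelling correction is a classic language processing task. the correction picks the most
frequent known word that is close to the misspelled word."""
WORD_COUNTS = Counter(re.findall(r"[a-z]+", SAMPLE_CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Subset of words that appear in the reference corpus."""
    return {w for w in words if w in WORD_COUNTS}


def correct_word(word):
    """Prefer the word itself, then known words 1 edit away, then 2 edits away."""
    candidates = (known([word]) or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1)) or {word})
    return max(candidates, key=lambda w: WORD_COUNTS[w])


def autocorrect(text):
    """Lowercase text and correct every alphabetic word, keeping other characters."""
    return re.sub(r"[a-z]+", lambda m: correct_word(m.group(0)), text.lower())


print(autocorrect("Speling corection is a clasic langauge task!"))
```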

Output:
