ML Project Report: (Text Learning Case Study)


Module 6 DSBA

Ankit Bhagat
Date of Submission: 8th Jan

Business problem

Problem 1: Text Learning

Text learning Case Study

Problem 2:
In this project, we work with the inaugural corpus from NLTK in Python.
We will be looking at the following speeches of the Presidents of the United States of America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973
(Hint: use .words(), .raw(), .sents() for extracting counts)

2.1 Find the number of characters, words, and sentences for the mentioned documents.

Answer:
President Franklin D. Roosevelt's 1941 speech

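R represents the character count

from nltk.corpus import inaugural  # requires the NLTK 'inaugural' corpus

# Roosevelt, 1941: character count is the length of the raw text
R1 = len(inaugural.raw('1941-Roosevelt.txt'))
R1

R1 = 7571

# Kennedy, 1961
R2 = len(inaugural.raw('1961-Kennedy.txt'))
R2

R2 = 7618

# Nixon, 1973
R3 = len(inaugural.raw('1973-Nixon.txt'))
R3

R3 = 9991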

S represents the sentence count

Roosevelt S1 = 68
Kennedy S2 = 52
Nixon S3 = 69

W represents the word count

Roosevelt W1 = 1536
Kennedy W2 = 1546
Nixon W3 = 2028
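
The three kinds of counts above can be gathered with one small helper. The following is a minimal sketch, not the code used in the report: the names `speech_counts` and `inaugural_counts` are hypothetical, and the second function assumes that NLTK is installed and that the `inaugural` corpus (plus the sentence tokenizer data that `sents()` relies on) has already been downloaded.

```python
def speech_counts(reader, fileid):
    """Return character, word, and sentence counts for one corpus file.

    `reader` is any object exposing raw(), words(), and sents(),
    such as nltk.corpus.inaugural. The function name is illustrative only.
    """
    return {
        "characters": len(reader.raw(fileid)),   # R in the report
        "words": len(reader.words(fileid)),      # W in the report
        "sentences": len(reader.sents(fileid)),  # S in the report
    }


def inaugural_counts(fileids):
    """Apply speech_counts to files of the NLTK inaugural corpus.

    Assumption: the 'inaugural' corpus is available locally,
    e.g. after running nltk.download('inaugural').
    """
    # Imported lazily so speech_counts stays usable without NLTK.
    from nltk.corpus import inaugural

    return {fid: speech_counts(inaugural, fid) for fid in fileids}
```

Calling `inaugural_counts(['1941-Roosevelt.txt', '1961-Kennedy.txt', '1973-Nixon.txt'])` would return one dictionary per speech. The figures should correspond to R, S, and W above, provided the same corpus version is installed.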

2.2 Remove all the stopwords from the three speeches. Show the word count before and
after the removal of stopwords. Show a sample sentence after the removal of stopwords.

1) Roosevelt: Word1 = stopword line items for the Roosevelt text

2) Kennedy: stopwords

3) Nixon: stopwords

Before stopword removal

Roosevelt W1 = 1536
Kennedy W2 = 1546
Nixon W3 = 2028

After stopword removal

Roosevelt A1 = 670
Kennedy A2 = 716
Nixon A3 = 857
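
The report does not show the code behind the before and after counts, so the following is only a sketch of one common way to remove stopwords. It rests on assumptions: the names `remove_stopwords` and `english_stopwords` are hypothetical, tokens are lowercased before comparison, and non-alphabetic tokens (punctuation) are dropped as well. Because of these choices, its counts may differ from A1, A2, and A3.

```python
def remove_stopwords(tokens, stop_words):
    """Return the tokens that are alphabetic and not in stop_words.

    Assumptions (not taken from the report): the comparison is
    case-insensitive and punctuation tokens are discarded.
    """
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.isalpha() and t.lower() not in stop_set]


def english_stopwords():
    """Load NLTK's English stopword list.

    Assumption: the 'stopwords' corpus is available locally,
    e.g. after running nltk.download('stopwords').
    """
    # Imported lazily so remove_stopwords stays usable without NLTK.
    from nltk.corpus import stopwords

    return stopwords.words("english")
```

With the corpus loaded, `len(inaugural.words(fileid))` gives the count before removal and `len(remove_stopwords(inaugural.words(fileid), english_stopwords()))` the count after. Joining the filtered tokens of a single sentence from `inaugural.sents(fileid)` with spaces gives a sample sentence of the kind the task asks for.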

2.3 Which word occurs most often in each president's inaugural address? Mention the
top three words (after removing the stopwords).

After removing the stopwords

a) Roosevelt

FreqDist({'know': 10, 'spirit': 9, 'life': 9, 'us': 8, 'democracy': 8, 'people': 7,
'Nation': 7, 'America': 7, 'years': 6, 'freedom': 6, ...})

1) know
2) spirit
3) life

b) Kennedy

FreqDist({'world': 8, 'sides': 8, 'new': 7, 'pledge': 7, 'citizens': 5, 'I': 5,
'power': 5, 'shall': 5, 'To': 5, 'free': 5, ...})

1) world
2) sides
3) new

c) Nixon

FreqDist({'America': 21, 'peace': 19, 'world': 17, 'new': 15, "'": 14, 'I': 12,
'responsibility': 11, 'great': 9, 'home': 9, 'nation': 9, ...})

1) America
2) peace
3) world
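
Picking the top three words from a frequency table can be sketched as follows. This example uses the standard-library `collections.Counter` in place of NLTK's `FreqDist` so that it runs without NLTK. The helper name `top_words` is hypothetical, and the sample counts are copied from the Nixon output above.

```python
from collections import Counter


def top_words(tokens, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


# Counts copied from the Nixon FreqDist output shown in the report.
nixon_counts = Counter({"America": 21, "peace": 19, "world": 17, "new": 15})
nixon_top3 = nixon_counts.most_common(3)
print(nixon_top3)  # [('America', 21), ('peace', 19), ('world', 17)]
```

Entries such as 'I', 'To', and "'" in the outputs above suggest that the report's stopword filtering was case-sensitive and kept some punctuation. Lowercasing tokens and dropping punctuation before counting would likely change the lower ranks of each list.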
