ML Project Report: (Text Learning Case Study)
Ankit Bhagat
Date of Submission: 8th Jan
Business problem
Text learning Case Study
Problem 2:
In this project, we work with the inaugural corpus from the NLTK library in Python.
We will be looking at the following speeches of the Presidents of the United States of America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973
(Hint: use .words(), .raw(), .sents() for extracting counts)
2.1 Find the number of characters, words, and sentences for the mentioned documents.
Answer:
R denotes the number of characters in each speech (notebook outputs shown as comments).
from nltk.corpus import inaugural
R1 = len(inaugural.raw('1941-Roosevelt.txt'))  # Roosevelt: R1 = 7571
R2 = len(inaugural.raw('1961-Kennedy.txt'))    # Kennedy: R2 = 7618
R3 = len(inaugural.raw('1973-Nixon.txt'))      # Nixon: R3 = 9991
S denotes the number of sentences:
Roosevelt S1 = 68
Kennedy S2 = 52
Nixon S3 = 69
W denotes the number of words:
Roosevelt W1 = 1536
Kennedy W2 = 1546
Nixon W3 = 2028
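The three counts above can be gathered with one call per speech. The following is a minimal sketch rather than the report's original code: `count_units`, `load_inaugural`, and `report_counts` are hypothetical helper names, and it assumes NLTK is installed with the `inaugural` corpus (plus the sentence tokenizer data that `.sents()` relies on) already downloaded.

```python
from typing import Dict


def count_units(reader, fileid: str) -> Dict[str, int]:
    """Return character, word, and sentence counts for one corpus file.

    `reader` is any object exposing the NLTK corpus-reader methods
    .raw(), .words(), and .sents(), each accepting a file id.
    """
    return {
        "characters": len(reader.raw(fileid)),
        "words": len(reader.words(fileid)),
        "sentences": len(reader.sents(fileid)),
    }


def load_inaugural():
    """Import the corpus lazily so count_units itself does not need NLTK."""
    from nltk.corpus import inaugural  # assumes the corpus is downloaded
    return inaugural


def report_counts() -> None:
    """Print the counts for the three speeches studied in this report."""
    inaugural = load_inaugural()
    for fileid in ("1941-Roosevelt.txt", "1961-Kennedy.txt", "1973-Nixon.txt"):
        print(fileid, count_units(inaugural, fileid))
```

Calling `report_counts()` prints one dictionary per speech. Note that `.words()` returns punctuation marks as separate tokens, so the word counts W1 to W3 include punctuation.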
2.2 Remove all the stopwords from the three speeches. Show the word count before and
after the removal of stopwords. Show a sample sentence after the removal of stopwords.
2) Kennedy stopword removal
3) Nixon stopword removal
Word count before stopword removal:
Roosevelt W1 = 1536
Kennedy W2 = 1546
Nixon W3 = 2028
Word count after stopword removal:
Roosevelt A1 = 670
Kennedy A2 = 716
Nixon A3 = 857
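The filtering step behind these before/after counts can be sketched as follows. This is an illustration under stated assumptions, not the report's actual code: `remove_stopwords` and `DEMO_STOPWORDS` are hypothetical names, the stopword set is a small hand-picked stand-in for NLTK's English list, and the sketch lowercases tokens before comparing and drops non-alphabetic tokens. The FreqDist output in section 2.3 still contains 'I' and an apostrophe, which suggests the report's filtering was case-sensitive and kept some punctuation, so the counts above need not be reproduced exactly by this version.

```python
from typing import Iterable, List, Set

# Tiny illustrative stopword set (an assumption for this sketch); in practice
# one would pass set(nltk.corpus.stopwords.words("english")) instead.
DEMO_STOPWORDS: Set[str] = {"not", "what", "your", "can", "do", "for", "you"}


def remove_stopwords(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Keep alphabetic tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.isalpha() and t.lower() not in stop_words]


# A short fragment from Kennedy's 1961 address, already split into tokens.
tokens = ["ask", "not", "what", "your", "country", "can", "do", "for", "you"]
filtered = remove_stopwords(tokens, DEMO_STOPWORDS)

print(len(tokens), len(filtered))  # 9 2
print(" ".join(filtered))          # ask country
```

The same function applied to `inaugural.words(fileid)` gives the "after" word count via `len(...)`, and joining the filtered tokens of one sentence from `inaugural.sents(fileid)` gives a sample sentence after stopword removal.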
2.3 Which word occurs most often in each president's inaugural address? Mention the
top three words (after removing the stopwords).
a) Roosevelt
1) Know
2) Spirit
3) Life
b) Kennedy
1) world
2) sides
3) New
c) Nixon
FreqDist({'America': 21, 'peace': 19, 'world': 17, 'new': 15, "'": 14, 'I': 12,
'responsibility': 11, 'great': 9, 'home': 9, 'nation': 9, ...})
1) America
2) peace
3) world
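The top-three lookup can be sketched with the standard library. `nltk.FreqDist` is a subclass of `collections.Counter`, so `most_common` behaves the same way on both; the sketch below uses `Counter` directly so that it runs without NLTK. `top_words` is a hypothetical helper name, and the sample tokens are made up for illustration rather than taken from any speech.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(tokens: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokens).most_common(n)


# Hypothetical pre-filtered tokens, not taken from any of the speeches.
sample = [
    "peace", "world", "peace", "America", "world",
    "peace", "America", "America", "America",
]
print(top_words(sample))  # [('America', 4), ('peace', 3), ('world', 2)]
```

Counting is case-sensitive, so 'New' and 'new' are tallied separately unless the tokens are lowercased first. Punctuation tokens and capitalised stopwords also compete for the top places unless they are filtered out beforehand, as the apostrophe and 'I' entries in the Nixon output show.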