
Prashansa Ranjan, Aug 2020 Batch

Problem 2:
In this particular project, we are going to work on the inaugural corpora from the nltk in
Python. We will be looking at the following speeches of the Presidents of the United
States of America:
1. President Franklin D. Roosevelt in 1941
2. President John F. Kennedy in 1961
3. President Richard Nixon in 1973
• Find the number of characters, words and sentences for the mentioned documents. – 3
Marks

(Hint: use .words(), .raw(), .sents() for extracting counts)


• Remove all the stopwords from all three speeches. – 3 Marks
• Which word occurs most often in each president's inaugural address?
Mention the top three words. (after removing the stopwords) – 3 Marks
• Plot a word cloud for each of the speeches. (after removing the stopwords) –
3 Marks [ refer to the End-to-End Case Study done in the Mentored Learning Session ]

Code Snippet to extract the three speeches:


import nltk
nltk.download('inaugural')
from nltk.corpus import inaugural
inaugural.fileids()
inaugural.raw('1941-Roosevelt.txt')
inaugural.raw('1961-Kennedy.txt')
inaugural.raw('1973-Nixon.txt')

This study source was downloaded by 100000772831368 from CourseHero.com on 09-21-2022 20:19:33 GMT -05:00

https://www.coursehero.com/file/88675887/Problem-2-Businessreport-MLdocx/
ANSWER:
First, we import the libraries needed for the analysis and then download the
required NLTK data sets.

After this, we import the required speeches one by one.
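The import and loading steps can be sketched as follows. This is a minimal sketch, not the notebook's actual code: the file IDs come from the assignment snippet, while the names `SPEECH_FILES` and `load_speeches` are introduced here only for illustration.

```python
# File IDs as given in the assignment's code snippet.
SPEECH_FILES = {
    "Roosevelt": "1941-Roosevelt.txt",
    "Kennedy": "1961-Kennedy.txt",
    "Nixon": "1973-Nixon.txt",
}


def load_speeches():
    """Return {president: raw text} for the three inaugural addresses."""
    # Imported lazily so the mapping above is usable without NLTK installed.
    import nltk
    from nltk.corpus import inaugural

    nltk.download("inaugural", quiet=True)  # fetches the corpus on first use
    return {name: inaugural.raw(fid) for name, fid in SPEECH_FILES.items()}
```

Calling `load_speeches()` needs NLTK installed and, on the first run, network access to download the corpus.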

Now, we want to know the number of characters in each speech, so we compute it
for each of the three.
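One way to get the character counts is to take the length of each raw text. The sketch below assumes the speeches are already held in a dictionary of raw strings; `character_counts` is a hypothetical helper name, and the sample texts are made-up stand-ins so that it runs without the corpus.

```python
def character_counts(speeches):
    """Map each speech name to the length of its raw text.

    Spaces, newlines and punctuation all count as characters.
    """
    return {name: len(text) for name, text in speeches.items()}


# Made-up stand-in texts, NOT the real speeches.
sample = {"A": "We the people.", "B": "Let us begin."}
print(character_counts(sample))  # {'A': 14, 'B': 13}
```

With the real corpus, the dictionary values would be the strings returned by `inaugural.raw(...)`.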

This gives us the number of characters in each speech.
Next, we want to know the number of words in each speech.
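A possible way to count words uses the corpus reader's `.words()` method, as the hint suggests. In this sketch, `word_counts` and its `alphabetic_only` flag are hypothetical. The `corpus` argument is assumed to be a reader such as `nltk.corpus.inaugural`. The tokens from `.words()` include punctuation marks as separate tokens, so the count depends on whether those are kept.

```python
def word_counts(corpus, file_ids, alphabetic_only=False):
    """Map each file ID to its token count.

    `corpus` is any reader exposing `.words(fileid)`, for example
    `nltk.corpus.inaugural`. With `alphabetic_only=True`, punctuation
    and numeric tokens are left out of the count.
    """
    counts = {}
    for fid in file_ids:
        tokens = corpus.words(fid)
        if alphabetic_only:
            tokens = [t for t in tokens if t.isalpha()]
        counts[fid] = len(tokens)
    return counts


# Usage with the real corpus (requires NLTK and the downloaded corpus):
#   from nltk.corpus import inaugural
#   word_counts(inaugural, ["1941-Roosevelt.txt", "1961-Kennedy.txt"])
```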

This tells us the number of words in each speech. Now, we want to know the number of
sentences in each speech.
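Sentence counts can be obtained in the same spirit with `.sents()`, which returns each sentence as a list of tokens. The helper name `sentence_counts` is hypothetical. Note that `.sents()` relies on NLTK's Punkt sentence tokenizer data, which may need a separate `nltk.download(...)` call; the exact resource name depends on the NLTK version.

```python
def sentence_counts(corpus, file_ids):
    """Map each file ID to its number of sentences.

    `corpus` is any reader exposing `.sents(fileid)`, for example
    `nltk.corpus.inaugural`.
    """
    return {fid: len(corpus.sents(fid)) for fid in file_ids}


# Usage with the real corpus (requires NLTK, the corpus and Punkt data):
#   from nltk.corpus import inaugural
#   sentence_counts(inaugural, ["1973-Nixon.txt"])
```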

This gives us the number of sentences in each speech. Now, we have to remove
the stopwords from each speech.

We then run the stopword-removal code on each speech in turn. For Roosevelt:

For Kennedy:

For Nixon:
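The stopword removal for each speech could look roughly like the sketch below. It is an assumption, not the notebook's code: the helper names are hypothetical, and the choices to lower-case tokens and to drop punctuation and numbers along with the stopwords are ours, since the report does not say how these were handled.

```python
def remove_stopwords(tokens, stop_words):
    """Lower-case tokens, then drop stopwords and non-alphabetic tokens."""
    stop_set = set(stop_words)
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t.isalpha() and t not in stop_set]


def english_stopwords():
    """Return NLTK's English stopword list as a set."""
    # Imported lazily so remove_stopwords works without NLTK installed.
    import nltk
    from nltk.corpus import stopwords

    nltk.download("stopwords", quiet=True)
    return set(stopwords.words("english"))


# Usage with the real corpus (requires NLTK and its data):
#   from nltk.corpus import inaugural
#   remove_stopwords(inaugural.words("1961-Kennedy.txt"), english_stopwords())
```

The same call is repeated for each of the three file IDs to produce one filtered token list per president.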

The output of all three code cells above can be seen in the attached Jupyter
notebook.
Now, we want to know the most common words in each speech after removing
the stopwords. For this, we count how often each remaining word occurs.
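A frequency count of this kind can be sketched with the standard library's `collections.Counter` (NLTK's `FreqDist` is a `Counter` subclass and would work similarly). The helper name `top_words` is hypothetical, and the sample tokens are invented for illustration; they are not the real counts from any speech.

```python
from collections import Counter


def top_words(tokens, n=3):
    """Return the n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(n)


# Invented tokens for illustration only, NOT taken from the speeches.
sample_tokens = ["nation", "know", "nation", "spirit", "know", "nation"]
print(top_words(sample_tokens))  # [('nation', 3), ('know', 2), ('spirit', 1)]
```

In practice the input would be the stopword-free token list of one speech.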

Roosevelt’s three most common words, after removing stopwords, are: Nation, Know
and Spirit.
We will do the same for Kennedy:

The three most common words are: Let, Us and World.

Doing the same for Nixon:

The three most common words in Nixon’s speech are: Us, Let and America.
After this, we generate a word cloud for each of the three speeches. We first run the
necessary code (please refer to the Jupyter notebook) and then generate the
three word clouds below:
Roosevelt:

Kennedy:

Nixon:
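Word clouds like the three above could be produced roughly as follows. This sketch assumes the third-party `wordcloud` and `matplotlib` packages are installed. The function names, image size and background colour are illustrative choices, not the notebook's settings.

```python
from collections import Counter


def word_frequencies(tokens):
    """Count how often each token occurs, as a plain dict."""
    return dict(Counter(tokens))


def build_wordcloud(tokens, width=800, height=400):
    """Build a word cloud from an already-filtered token list."""
    # Third-party package, imported lazily: pip install wordcloud
    from wordcloud import WordCloud

    cloud = WordCloud(width=width, height=height, background_color="white")
    # generate_from_frequencies skips WordCloud's own tokenising and
    # stopword handling, so the cloud reflects exactly the tokens passed in.
    return cloud.generate_from_frequencies(word_frequencies(tokens))


def show_wordcloud(cloud, title):
    """Display a word cloud with matplotlib."""
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 5))
    plt.imshow(cloud.to_array(), interpolation="bilinear")
    plt.axis("off")
    plt.title(title)
    plt.show()
```

For each president, `build_wordcloud` would be called on that speech's stopword-free tokens and the result passed to `show_wordcloud` with the president's name as the title.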
