text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
Using the data above, complete the following tasks:
1. Make a list of all four-letter words in text1. How many are there?
2. In text1, find all words longer than 17 letters. How many are there?
3. Using the built-in functions set() and sorted(), create a dictionary (a sorted list of unique words) for each sentence (sent1, […], sent9) and a joint dictionary for all the sentences.
4. Define a function vocab_size() that, for a given text, returns the size of its dictionary (that is, the number of unique words). What is the vocabulary size of each book?
5. Print the 10 most commonly occurring words in text1.
6. Check which words are the longest in each of the texts.
7. Check how many unique bigrams there are in text5. For the 10 most common bigrams, compute the total number of occurrences and compare it with the total for the 10 most commonly occurring words.
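As a starting point, here is a minimal sketch of helper functions for several of these tasks. It runs on a plain Python list of tokens, so it does not need NLTK installed. The toy sentence and all function names other than vocab_size() are illustrative assumptions, not part of the exercise.

```python
from collections import Counter

# Toy token list standing in for a text such as text1 (assumption: the
# NLTK texts can be iterated over like a list of token strings).
tokens = "the cat sat on the mat and the dog sat on the log".split()


def words_of_length(tokens, n):
    """Sorted unique words with exactly n letters (task 1)."""
    return sorted({w for w in tokens if len(w) == n})


def words_longer_than(tokens, n):
    """Sorted unique words with more than n letters (task 2)."""
    return sorted({w for w in tokens if len(w) > n})


def vocab(tokens):
    """Dictionary of a text: sorted list of its unique words (task 3)."""
    return sorted(set(tokens))


def vocab_size(tokens):
    """Number of unique words in a text (task 4)."""
    return len(set(tokens))


def most_common_words(tokens, k=10):
    """The k most frequent words with their counts (task 5)."""
    return Counter(tokens).most_common(k)


def longest_words(tokens):
    """All unique words that share the maximum length (task 6)."""
    max_len = max(len(w) for w in tokens)
    return sorted({w for w in tokens if len(w) == max_len})


def bigram_counts(tokens):
    """Counter of adjacent word pairs (task 7)."""
    tokens = list(tokens)  # copy so slicing works for any iterable
    return Counter(zip(tokens, tokens[1:]))


print(words_of_length(tokens, 2))
print(vocab_size(tokens))
print(most_common_words(tokens, 3))
print(len(bigram_counts(tokens)))  # number of unique bigrams
```

If the texts are loaded with `from nltk.book import *`, the same functions should accept them directly, for example `vocab_size(text1)` or `bigram_counts(text5).most_common(10)`. Note that these sketches count punctuation tokens as words and treat "The" and "the" as different words; filtering with `w.isalpha()` or lowercasing with `w.lower()` is a design choice left to the reader.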