1. Write a program to pre-process a text document: convert to lowercase, remove stop words, remove punctuation, remove numbers, apply stemming and lemmatization, convert numbers to text, and remove extra whitespace.
2. Implement a program for retrieval of documents using inverted
files.
3. Implement e-mail spam filtering using a text classification algorithm with an appropriate dataset. Analyse the performance of the algorithm.
4. Perform text classification using Naive Bayes/Logistic Regression/Support Vector Machines/Decision Tree Classifier. Use the "Economic news article tone and relevance" dataset, in which news articles are tagged as relevant or not relevant to the US economy. Keep the required columns and save them in a dataframe. Explore the process of training and testing a text classifier on this dataset and analyse its performance.
for implementation).
6. Build a web crawler to pull IMDB information and analyse the ratings.
7. Build a web crawler to fetch web content such as PPT and image files. The files should be downloaded to the local machine.
8. Write a program to construct a Bayesian network from medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set (you can use Java/Python ML library classes/APIs).
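As a possible starting point for assignments 1 and 2, the following is a minimal sketch that chains basic pre-processing into an inverted index using only the Python standard library. The stop-word list, the sample documents, and the function names (`preprocess`, `build_inverted_index`, `search`) are illustrative assumptions, not part of the assignment text. Stemming, lemmatization, and number-to-text conversion are left out here because they typically rely on an external library.

```python
import re
import string
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list; a real solution would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to"}


def preprocess(text):
    """Lowercase, strip punctuation and digits, collapse whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)  # remove numbers
    tokens = text.split()  # split() also collapses runs of whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (boolean AND)."""
    terms = preprocess(query)
    if not terms:
        return []
    postings = [set(index.get(t, ())) for t in terms]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, used only to exercise the functions.
docs = {
    1: "The crawler downloads 20 image files.",
    2: "Spam filtering is a text classification task!",
    3: "Text   classification of news articles in 2016.",
}
index = build_inverted_index(docs)
print(search(index, "text classification"))
```

The query passes through the same `preprocess` function as the documents, so query terms and index terms are normalised identically. A fuller solution might add ranking (for example term frequency) on top of the boolean AND lookup shown here.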