
Text classification

Activity:
This activity assesses your understanding of text classifiers. You will work through a
spam email example and then use different ensemble learning techniques to test the
performance of the classifier.

Task 1: Spam email example

Activity for Ensemble Learning


Question: Is the given email a Fraud, Spam, or Normal?

What is the Label/Class column?


Which type of classification is it?

Step 1: Import basic libraries


Step 2: Read file

Step 3: Filter out unnecessary columns

Step 4: Find target class

Step 5: Find the frequency of target class
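
Steps 2 to 5 can be sketched with pandas as follows. This is only an illustration: the file contents, the column names (id, text, label, source), and the rows are hypothetical stand-ins, because the activity's real data file is not shown here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data file; in the activity you would
# pass the path of your own CSV file to pd.read_csv instead.
raw_csv = io.StringIO(
    "id,text,label,source\n"
    "1,Win a free prize now,Spam,web\n"
    "2,Meeting moved to 3pm,Normal,mail\n"
    "3,Send your bank details to claim funds,Fraud,mail\n"
    "4,Lunch tomorrow?,Normal,mail\n"
    "5,Cheap pills limited offer,Spam,web\n"
)

df = pd.read_csv(raw_csv)                   # Step 2: read file
df = df[["text", "label"]]                  # Step 3: keep only the needed columns
classes = sorted(df["label"].unique())      # Step 4: find the target classes
class_counts = df["label"].value_counts()   # Step 5: frequency of each class

print(classes)
print(class_counts)
```

The class counts show whether the data set is balanced, which matters when accuracy is interpreted later.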

Step 6: Import libraries for text processing


Step 7: Define functions to process text

Step 8: Call functions for text processing
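
A minimal sketch of Steps 6 to 8, using only the Python standard library. The function names, the tiny stop-word list, and the sample emails are assumptions made for illustration; the original activity may well rely on a dedicated text-processing library instead.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption); a real run would
# normally use a much fuller list.
STOP_WORDS = {"a", "an", "the", "to", "is", "and", "of", "your", "you", "for"}


def clean_text(text):
    """Lowercase, strip punctuation and digits, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def remove_stop_words(text):
    """Drop words that appear in the stop-word list."""
    return " ".join(word for word in text.split() if word not in STOP_WORDS)


def process_text(text):
    """Step 7: the full processing pipeline for one email."""
    return remove_stop_words(clean_text(text))


# Step 8: call the functions on each email (hypothetical examples).
emails = ["WIN a FREE prize now!!!", "Send your bank details to claim $1000."]
processed = [process_text(email) for email in emails]
print(processed)
```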


Step 9: Convert text to vector
The code sample below converts text to vectors using TfidfVectorizer with
unigrams. In addition, try other combinations of CountVectorizer,
unigrams, and bigrams.
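
One way to compare the combinations is sketched below, assuming scikit-learn is installed. The three documents are hypothetical, already-processed emails; the ngram_range parameter selects unigrams (1, 1), bigrams (2, 2), or both (1, 2).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical, already-processed emails.
docs = ["win free prize now", "meeting moved afternoon", "free prize claim now"]

vectorizers = {
    "tfidf_unigram": TfidfVectorizer(ngram_range=(1, 1)),
    "tfidf_uni_bigram": TfidfVectorizer(ngram_range=(1, 2)),
    "count_unigram": CountVectorizer(ngram_range=(1, 1)),
    "count_bigram": CountVectorizer(ngram_range=(2, 2)),
}

shapes = {}
for name, vectorizer in vectorizers.items():
    X = vectorizer.fit_transform(docs)   # sparse matrix: documents x features
    shapes[name] = X.shape
    print(name, X.shape)
```

Adding bigrams enlarges the feature space, so each combination is worth checking against the classifier's accuracy rather than assuming more features help.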

Step 10: Convert target values to numbers
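
A short sketch of this step, assuming scikit-learn's LabelEncoder is the intended tool (the later note about inverse_transform() suggests it, but the original code is not shown). The label list is hypothetical.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column.
labels = ["Spam", "Normal", "Fraud", "Normal", "Spam"]

encoder = LabelEncoder()
y = encoder.fit_transform(labels)   # classes are numbered in sorted order

print(list(encoder.classes_))
print(list(y))
```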

Step 11: Develop machine learning models using one or more of the Ensemble Learning techniques: Bagging, Boosting, and Random Forest

Use the following resources:
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
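
The three classifiers linked above can be fitted side by side, as in the sketch below. The toy emails, the labels, and the hyperparameter values are assumptions chosen only to make the example run; they are not tuned and not taken from the activity.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical, already-processed emails and their labels.
texts = [
    "win free prize now", "free prize claim today", "cheap pills free offer",
    "send bank details claim funds", "verify bank account password",
    "transfer funds bank urgent", "meeting moved afternoon",
    "lunch tomorrow team", "project report attached meeting",
]
labels = ["Spam"] * 3 + ["Fraud"] * 3 + ["Normal"] * 3

X = TfidfVectorizer().fit_transform(texts)
y = LabelEncoder().fit_transform(labels)

# Illustrative settings only.
models = {
    "bagging": BaggingClassifier(n_estimators=10, random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X, y)
    print(name, "fitted")
```

In a real run the models would be fitted on a training split only, so that the later steps evaluate them on unseen emails.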

Step 12: Check predictions

Step 13: Check accuracy scores
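
Steps 12 and 13 can be sketched as below with a held-out test set. The data, the split size, and the choice of RandomForestClassifier are assumptions for illustration; on such a tiny toy set the accuracy value itself is not meaningful.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical, already-processed emails and their labels.
texts = [
    "win free prize now", "free prize claim today",
    "cheap pills free offer", "free offer win cash",
    "send bank details claim funds", "verify bank account password",
    "transfer funds bank urgent", "bank account details urgent",
    "meeting moved afternoon", "lunch tomorrow team",
    "project report attached meeting", "team meeting tomorrow afternoon",
]
labels = ["Spam"] * 4 + ["Fraud"] * 4 + ["Normal"] * 4

train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=3, stratify=labels, random_state=0
)

# Fit the vectorizer on the training emails only, then reuse it for the test emails.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)              # Step 12: check predictions
accuracy = accuracy_score(y_test, y_pred)   # Step 13: fraction predicted correctly

print(list(zip(y_test, y_pred)))
print(accuracy)
```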

Step 14: Display confusion matrix


Explain how to read the confusion matrix.
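
As a reading aid: in scikit-learn's confusion_matrix, each row is an actual class and each column is a predicted class, so the diagonal holds the correct predictions. The small hand-made label arrays below are hypothetical and chosen so the counts are easy to verify by eye.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical encoded labels: 0, 1 and 2 stand for the three classes.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# cm[i, j] counts the samples whose actual class is i and predicted class is j.
correct = cm.trace()   # sum of the diagonal
total = cm.sum()
print(correct, "of", total, "predictions are correct")
```

Here cm[1, 2] is 1, meaning one email of class 1 was wrongly predicted as class 2; off-diagonal cells like this show which classes the model confuses.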

Step 15: Convert encoded labels back to text


Note: Step 10 converts the target labels to numbers. These numbers can be converted back to
text using the inverse_transform() function.
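
A brief sketch of the round trip, assuming the encoder from Step 10 is a scikit-learn LabelEncoder. The class names are the ones from the activity's question; the encoded values shown are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder().fit(["Fraud", "Normal", "Spam"])

encoded = encoder.transform(["Spam", "Fraud"])   # text -> numbers
decoded = encoder.inverse_transform(encoded)     # numbers -> text

print(list(encoded))
print(list(decoded))
```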

Step 16: Using the information in step 15, display the confusion matrix with text labels on the X and Y axes
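
One possible way to do this is to wrap the matrix in a pandas DataFrame whose row and column labels are the decoded class names. The label arrays are hypothetical, and a DataFrame is just one choice; scikit-learn's ConfusionMatrixDisplay with display_labels is an alternative if a plot is wanted.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder().fit(["Fraud", "Normal", "Spam"])

# Hypothetical actual and predicted labels, encoded as in Step 10.
y_true = encoder.transform(["Spam", "Spam", "Normal", "Fraud", "Normal", "Fraud"])
y_pred = encoder.transform(["Spam", "Normal", "Normal", "Fraud", "Normal", "Spam"])

codes = encoder.transform(encoder.classes_)   # numeric code of every class
names = encoder.inverse_transform(codes)      # matching text labels (Step 15)

cm = confusion_matrix(y_true, y_pred, labels=codes)
cm_df = pd.DataFrame(
    cm,
    index=pd.Index(names, name="Actual"),
    columns=pd.Index(names, name="Predicted"),
)
print(cm_df)
```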
