The C & C: The Classification and Clustering Project
Chosen option: MultinomialNB. Train and test score.
Word cloud visualisation of weighted words: messages that should / shouldn't have been spam, respectively.
Chosen option: Random Forest.
k=3: centroids are shown.
The input matrix is displayed; the sample after each step is displayed.
FRUIT OF THE LABOUR (OUTCOME)
1) SPAM DETECTION
Classifier             Test set accuracy   Training set accuracy
Multinomial NB         95.16%              96.81%
Random Forest          97.28%              100%
Logistic Regression    96.41%              97.40%
K Nearest Neighbours   92.22%              95.07%
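The train/test accuracies above would come from a workflow along these lines. A minimal sketch, assuming scikit-learn and a tiny toy corpus in place of the Kaggle SMS dataset (the texts, split ratio, and `random_state` values are assumptions, so the toy scores will not match the table):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the SMS corpus (1 = spam, 0 = ham); the real project
# used the Kaggle sms-spam-collection dataset.
texts = ["win cash now", "free prize claim", "urgent free offer",
         "lunch at noon", "call me later", "see you tomorrow"] * 10
labels = [1, 1, 1, 0, 0, 0] * 10

X = CountVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.3, random_state=42)

scores = {}
for model in (MultinomialNB(), RandomForestClassifier(random_state=42)):
    model.fit(X_tr, y_tr)
    name = type(model).__name__
    # score() returns mean accuracy on the given split.
    scores[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(name, "train:", scores[name][0], "test:", scores[name][1])
```

The same loop extends to `LogisticRegression` and `KNeighborsClassifier` to reproduce all four table rows.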
2) CLUSTERING: K MEANS
From the evaluation of the cost function at different values of k, the elbow point was found at k=4; that is, the optimal number of clusters is k=4.
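The elbow search evaluates the K-means cost (inertia) over a range of k and looks for the point where the drop flattens. A minimal sketch, assuming scikit-learn's `KMeans` and synthetic 2-D blobs in place of the project's input matrix (the centres, scale, and seeds are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points drawn around 4 well-separated centres
# (a stand-in for the project's input matrix).
rng = np.random.default_rng(0)
centres = np.array([[0, 0], [8, 0], [0, 8], [8, 8]])
data = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

# Evaluate the K-means cost (inertia) for each k; the elbow is where
# the decrease in cost levels off.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(data).inertia_
            for k in range(1, 8)}
for k, cost in inertias.items():
    print(f"k={k}  cost={cost:.1f}")
```

On this data the cost falls steeply up to k=4 and only marginally after, so the elbow lands at the true number of blobs.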
3) CLUSTERING: AGGLOMERATIVE CLUSTERING
Different clustering outcomes were observed when different sets of input points were fed in.
THE FINAL PLATTER (CONCLUSION)
1) SPAM DETECTION
As observed from the table in the outcomes section, we can infer and conclude that the highest accuracy was seen in the case of the Random Forest classifier for both test and training data.
THE REASONS HAVE BEEN STATED CLASSIFIER-WISE:
Logistic Regression vs Random Forest:
• The random forest (RF) is an "ensemble learning" technique consisting of the aggregation of a large number of decision trees, resulting in a reduction of variance.
• LR is comparatively faster than RF.

Random Forest vs KNN:
• Random Forest is a complex and large model whereas KNN is a relatively smaller model.
• Both need large training sets.

Multinomial Naïve Bayes vs Random Forest:
• Random Forest is a complex and large model whereas Naïve Bayes is a relatively smaller model.
• Naïve Bayes performs better with small training data, whereas RF needs a larger set of training data.
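The "complex and large model versus relatively smaller model" contrast can be made concrete by counting what each fitted model stores. A minimal sketch, assuming scikit-learn and a synthetic dataset (not the project's SMS corpus):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB

# Synthetic non-negative features (MultinomialNB expects counts-like input);
# purely illustrative data.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X = np.abs(X)

nb = MultinomialNB().fit(X, y)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Naive Bayes stores one log-probability per (class, feature) plus the
# class priors; the forest stores every node of every tree.
nb_params = nb.feature_log_prob_.size + nb.class_log_prior_.size
rf_nodes = sum(tree.tree_.node_count for tree in rf.estimators_)
print("Naive Bayes parameters:", nb_params)
print("Random forest tree nodes:", rf_nodes)
```

With 2 classes and 20 features, Naive Bayes holds just 42 numbers, while the 100-tree forest holds thousands of nodes, which is the size gap the comparison above refers to.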
THE HELPING HAND (REFERENCES)
• https://www.kaggle.com/uciml/sms-spam-collection-dataset
• https://medium.com/@dannymvarghese/comparative-study-on-classic-machine-learning-algorithms-part-2-5ab58b683ec0
• https://www.edureka.co/blog/k-nearest-neighbors-algorithm/
• https://matplotlib.org/api/_
THANK YOU!