Project 2

EMAIL SPAM CLASSIFICATION
Faizan Raza Siddique (CS-201034)
Hasan Haider Syed (CS-201041)
Syed Muhammad Sibtain Naqvi (CS-201249)
DHA SUFFA UNIVERSITY
CS-3102L Artificial Intelligence
Ms. Aniqa Hussain
19-06-2023
CONTENT
■ Slide 1: Title: Email Spam Classification
■ Slide 2: Title: Dataset Overview
■ Slide 3: Title: Data Preprocessing & Vectorization
■ Slide 4: Title: Naive Bayes & Logistic Regression Classifiers
■ Slide 5: Title: Accuracy Comparison & Email Prediction
INTRODUCTION
• Introduction:
• Email spam is a significant issue affecting individuals, businesses, and organizations worldwide.
• Spam emails often contain malicious content, scams, or unwanted advertising.
• Accurately classifying spam and ham (non-spam) emails is crucial for ensuring email security and
protecting users from potential threats.
• Goal of the Code:
• The purpose of the presented code is to demonstrate the application of machine learning algorithms for
email spam classification.
• By training and evaluating classifiers on a labeled dataset, the code aims to accurately differentiate
between spam and legitimate emails.
• Importance of Email Spam Classification:
• Enhanced Security: Efficient spam classification helps prevent users from falling victim to phishing
attempts, malware distribution, and fraudulent schemes.
• User Experience: Reducing the influx of spam emails improves productivity and ensures that users
receive relevant and legitimate messages.
• Resource Optimization: Identifying and filtering out spam emails saves storage space and reduces the
burden on email servers.
Data Overview
• 'spam_ham_dataset.csv' dataset used in the code.
• Contains labeled examples of spam and ham emails.
Data Preprocessing & Vectorization
• Data Splitting:
• Dataset split into 80% training and 20% testing sets.
• Random state 42 for reproducibility.
• Vectorization:
• CountVectorizer converts text data into numerical feature vectors.
• Training and testing data transformed using CountVectorizer.
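The splitting and vectorization steps above can be sketched roughly as follows, assuming scikit-learn is installed. The slides do not show the code or the CSV's column names, so a small hypothetical in-memory list stands in for the dataset, and all variable names are illustrative. Fitting the vectorizer on the training text only is also an assumption about how the project did it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the rows of 'spam_ham_dataset.csv';
# the real file's columns are not described in the slides.
texts = [
    "win a free prize now",
    "claim your free money today",
    "cheap pills limited offer",
    "urgent you won the lottery",
    "free offer click now",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch tomorrow with the team",
    "project deadline is next friday",
    "notes from the budget meeting",
]
labels = ["spam"] * 5 + ["ham"] * 5

# 80% training / 20% testing, with random_state=42 for reproducibility.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

# Learn the vocabulary from the training text, then reuse it for the test text
# so both matrices share the same feature columns.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

print("training matrix:", X_train.shape)
print("testing matrix:", X_test.shape)
```

Each row of the resulting sparse matrices is one email, and each column counts how often one vocabulary word appears in it.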
Naive Bayes & Logistic Regression
Classifiers
• Naive Bayes Classifier:
• Training: MultinomialNB classifier trained with vectorized training data.
• Prediction: Naive Bayes predicts labels for the test data.
• Accuracy Calculation: accuracy_score metric used to evaluate Naive Bayes
accuracy.
• Logistic Regression Classifier:
• Training: Logistic Regression classifier trained with vectorized training data.
• Prediction: Logistic Regression predicts labels for the test data.
• Accuracy Calculation: accuracy_score metric used to evaluate Logistic
Regression accuracy.
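A minimal sketch of the training, prediction, and accuracy steps for both classifiers, assuming scikit-learn. The toy emails are hypothetical and replace the real dataset, the variable names are illustrative, and `max_iter=1000` is an assumed setting that the slides do not mention. Accuracies from this toy data say nothing about the project's actual results.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy emails standing in for 'spam_ham_dataset.csv'.
texts = [
    "win a free prize now",
    "claim your free money today",
    "cheap pills limited offer",
    "urgent you won the lottery",
    "free offer click now",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch tomorrow with the team",
    "project deadline is next friday",
    "notes from the budget meeting",
]
labels = ["spam"] * 5 + ["ham"] * 5

X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Naive Bayes: train on word counts, predict test labels, score with accuracy_score.
nb_model = MultinomialNB()
nb_model.fit(X_train, y_train)
nb_predictions = nb_model.predict(X_test)
nb_accuracy = accuracy_score(y_test, nb_predictions)

# Logistic Regression: same three steps on the same features.
lr_model = LogisticRegression(max_iter=1000)  # max_iter is an assumed setting
lr_model.fit(X_train, y_train)
lr_predictions = lr_model.predict(X_test)
lr_accuracy = accuracy_score(y_test, lr_predictions)

print("Naive Bayes accuracy:", nb_accuracy)
print("Logistic Regression accuracy:", lr_accuracy)
```

Both models see the same vectorized features, so the two accuracy values are directly comparable.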
Accuracy Comparison & Email Prediction
Accuracy Comparison
• Bar chart comparing accuracies of Naive Bayes and Logistic Regression classifiers.
• X-axis: Algorithm names (Naive Bayes, Logistic Regression).
• Y-axis: Corresponding accuracies.
Email Prediction Examples:
• Sample test spam and ham emails used for prediction.
• Predictions made by both Naive Bayes and Logistic Regression classifiers.
• Display predicted labels for the test spam and ham emails.
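The bar chart and the sample predictions described above might look like the sketch below, assuming scikit-learn and matplotlib. The training emails, the two sample emails, the helper `plot_accuracies`, and the output file name are all hypothetical. The plotted values are training accuracies on the toy data, used only so the chart has something to show; they are not the project's reported results.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB


def plot_accuracies(accuracies, path):
    """Save a bar chart with algorithm names on the x-axis and accuracy on the y-axis."""
    fig, ax = plt.subplots()
    ax.bar(list(accuracies.keys()), list(accuracies.values()))
    ax.set_xlabel("Algorithm")
    ax.set_ylabel("Accuracy")
    ax.set_ylim(0, 1)
    ax.set_title("Accuracy Comparison")
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical toy training emails standing in for the real dataset.
train_texts = [
    "win a free prize now",
    "claim your free money today",
    "cheap pills limited offer",
    "meeting moved to monday morning",
    "please review the attached report",
    "lunch tomorrow with the team",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
models = {
    "Naive Bayes": MultinomialNB().fit(X_train, train_labels),
    "Logistic Regression": LogisticRegression(max_iter=1000).fit(X_train, train_labels),
}

# Sample emails to classify; new text must go through the same fitted vectorizer.
sample_emails = [
    "claim your free prize today",
    "please review the meeting report",
]
X_samples = vectorizer.transform(sample_emails)
predictions = {name: list(model.predict(X_samples)) for name, model in models.items()}
for name, predicted in predictions.items():
    print(name, "->", predicted)

# Toy training accuracies, used here only to drive the chart.
accuracies = {name: model.score(X_train, train_labels) for name, model in models.items()}
chart_path = plot_accuracies(
    accuracies, os.path.join(tempfile.gettempdir(), "accuracy_comparison.png")
)
```

In a real run the dictionary passed to `plot_accuracies` would hold the test-set accuracies computed with `accuracy_score`.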
SUMMARY

■ In this project, we successfully developed machine learning models for email spam
classification using the Naive Bayes and logistic regression algorithms. The models
exhibit high accuracy and effectively differentiate between spam and ham email
messages. These results validate the efficacy of the implemented algorithms in
addressing the email spam problem. We can further enhance the classifiers by
incorporating advanced techniques or exploring ensemble methods.
THANK YOU
