Resteam 253 - Cap2

Capstone Project 02
RESTEAM253
TEAM MEMBERS
ANIL KUMAR VEMULA
V. TEJA SAI BALARAM
GUTTA TEJ BHARGAV
Y. S. PRAKASH REDDY

Email Spam Detection using ML and CNNs
INCORPORATING OPTIMIZED NAIVE BAYES AND CONVOLUTIONAL NEURAL NETWORKS
INTRODUCTION
• Email spam remains a pervasive problem, inconveniencing users and creating security risks.
• This work introduces a hybrid approach to email spam detection that combines a traditional machine learning classifier (optimized Naive Bayes) with convolutional neural networks.
MOTIVATION
• Legacy spam filters struggle to adapt to the constantly changing tactics that spammers use.
• This research is motivated by the need for a more sophisticated, adaptive solution to the persistent challenges posed by modern spam campaigns.
METHODOLOGY
• The methodology proceeds in stages: data collection, preprocessing, feature extraction, model selection, training, and evaluation.
• By covering every stage from data collection to model selection, the methodology aims at an end-to-end solution.
DATA PREPARATION
Collected Diverse Dataset:
o Assembled a diverse dataset representing various types of spam and non-spam emails.
Textual Data Processing:
o Preprocessed the text by removing stop words and punctuation and applying stemming.
Non-Textual Data Processing:
o Extracted and preprocessed non-textual elements, including images and emojis.
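The textual preprocessing steps above can be sketched in Python as follows. This is a minimal, standard-library-only illustration, not the project's code: the stop-word list is a small hypothetical subset, and `crude_stem` is a simplified suffix-stripping stand-in for a real stemmer such as NLTK's `PorterStemmer`. The source does not name the tools actually used.

```python
import string

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "or", "of",
              "for", "in", "on", "you", "your", "this", "it"}

# Suffixes removed by the crude stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, remove punctuation, drop stop words, then stem each token."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(tok) for tok in text.split() if tok not in STOP_WORDS]


print(preprocess("Claimed offers for you: get FREE tickets now!!!"))
# ['claim', 'offer', 'get', 'free', 'ticket', 'now']
```

A real stemmer handles many cases this sketch gets wrong (for example, it turns "running" into "runn"), so it only shows where stemming sits in the pipeline.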
BASIC STEPS IN MACHINE LEARNING
1. Data Splitting:
• Randomly divided the dataset into training (80%) and testing (20%) subsets.
2. Feature Extraction:
• Utilized TF-IDF for textual features and histogram-based representations for non-textual features.
3. Model Selection:
• Chose Naive Bayes for textual classification and a Convolutional Neural Network (CNN) architecture for non-textual classification.
4. Training:
• Trained Naive Bayes on textual features and the CNN on non-textual features.
5. Evaluation:
• Assessed model performance using metrics such as accuracy, precision, recall, and F1-score.
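The splitting, TF-IDF, and Naive Bayes steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation. Documents are assumed to be already tokenized. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, which is scikit-learn's `TfidfVectorizer` default, but without that class's L2 normalization. The classifier is a multinomial Naive Bayes over TF-IDF weights with additive smoothing `alpha`. All function names are hypothetical, and the toy emails are invented.

```python
import math
import random
from collections import Counter


def train_test_split(docs, labels, test_frac=0.2, seed=42):
    """Shuffle indices and hold out round(n * test_frac) items (80/20 by default)."""
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_frac)
    train, test = idx[n_test:], idx[:n_test]

    def pick(ids, xs):
        return [xs[i] for i in ids]

    return pick(train, docs), pick(train, labels), pick(test, docs), pick(test, labels)


def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Raw term frequency times IDF; tokens unseen when fitting are dropped."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    vocab = {t for v in vectors for t in v}
    model = {}
    for c in sorted(set(labels)):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        totals = Counter()
        for v in rows:
            totals.update(v)  # sums the float weights per term
        denom = sum(totals.values()) + alpha * len(vocab)
        log_lik = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
        model[c] = (math.log(len(rows) / len(labels)), log_lik)
    return model


def predict_nb(model, vec):
    """Return the class with the highest log-posterior score."""
    def score(c):
        prior, log_lik = model[c]
        return prior + sum(w * log_lik[t] for t, w in vec.items() if t in log_lik)
    return max(model, key=score)


# Toy, pre-tokenized emails (invented for illustration).
docs = [["win", "cash", "now"], ["free", "prize", "win"], ["cash", "free", "offer"],
        ["meeting", "tomorrow", "agenda"], ["project", "report", "agenda"],
        ["lunch", "tomorrow", "team"]]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

X_tr, y_tr, X_te, y_te = train_test_split(docs, labels)
idf = fit_idf(X_tr)  # fit IDF on the training split only
model = train_nb([tfidf(d, idf) for d in X_tr], y_tr)
preds = [predict_nb(model, tfidf(d, idf)) for d in X_te]
```

Fitting the IDF table on the training split only keeps test-set statistics out of the features, so the 20% hold-out remains a fair estimate.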
OPTIMIZATION WITH K-FOLD CROSS-VALIDATION
• Implemented 5-fold Cross-Validation:
• Applied k-fold cross-validation (k = 5) to tune the Naive Bayes classifier.
• Iterative Model Tuning:
• Conducted repeated rounds of training and validation to find the best hyperparameters.
• Enhanced Model Performance:
• Improved accuracy and robustness relative to the baseline model through this optimization process.
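Here is a minimal sketch of 5-fold cross-validation used to choose a Naive Bayes hyperparameter. The source does not say which hyperparameters were tuned. This sketch assumes the additive smoothing value `alpha` is the one searched, over a hypothetical grid, with mean validation accuracy as the selection criterion. To stay self-contained, the classifier here works on raw token counts rather than TF-IDF weights. The function names and toy data are invented.

```python
import math
import random
from collections import Counter


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) for k folds over shuffled indices (requires n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # fold sizes differ by at most one
        yield idx[:start] + idx[start + size:], idx[start:start + size]
        start += size


def train_nb(docs, labels, alpha):
    """Multinomial Naive Bayes on token counts with additive smoothing alpha > 0."""
    vocab = {t for d in docs for t in d}
    model = {}
    for c in sorted(set(labels)):
        rows = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(t for d in rows for t in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik = {t: math.log((counts[t] + alpha) / denom) for t in vocab}
        model[c] = (math.log(len(rows) / len(docs)), log_lik)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior score."""
    def score(c):
        prior, log_lik = model[c]
        return prior + sum(log_lik[t] for t in doc if t in log_lik)
    return max(model, key=score)


def cv_accuracy(docs, labels, alpha, k=5):
    """Mean validation accuracy over k folds."""
    scores = []
    for train, val in k_fold_indices(len(docs), k):
        model = train_nb([docs[i] for i in train], [labels[i] for i in train], alpha)
        hits = sum(predict_nb(model, docs[i]) == labels[i] for i in val)
        scores.append(hits / len(val))
    return sum(scores) / k


def select_alpha(docs, labels, grid=(0.01, 0.1, 0.5, 1.0, 2.0), k=5):
    """Return the grid value with the best mean cross-validated accuracy."""
    return max(grid, key=lambda a: cv_accuracy(docs, labels, a, k))


# Toy, pre-tokenized emails (invented for illustration).
docs = [["win", "cash"], ["free", "prize"], ["cash", "offer"], ["win", "free"],
        ["prize", "claim"], ["meeting", "agenda"], ["report", "draft"],
        ["lunch", "team"], ["agenda", "team"], ["draft", "review"]]
labels = ["spam"] * 5 + ["ham"] * 5
best_alpha = select_alpha(docs, labels)
```

Under this setup, the selected `alpha` would then be used to retrain on the full training split before the final evaluation on the held-out 20%.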
INCORPORATION OF CNNs
• Preprocessing for Non-Textual Data:
• Normalized and resized non-textual data, especially images, to fit CNN input requirements.
• CNN Architecture Selection:
• Chose VGG16, a pre-trained CNN architecture, for image recognition.
• Integration of Features:
• Integrated Naive Bayes for textual features and CNN for non-textual features into a unified, hybrid model.
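This section involves two steps that can be sketched in plain Python. The first is resizing and scaling images. VGG16's standard input is 224 × 224 pixels with three colour channels, so the sketch resizes to that size with nearest-neighbour sampling and divides channel values by 255. If the Keras ImageNet weights are used, `tf.keras.applications.vgg16.preprocess_input` would normally replace this simple scaling. The second is integration. The source does not describe how the two models' outputs are combined, so `fuse` assumes a hypothetical late fusion: it takes a weighted average of the Naive Bayes and CNN spam probabilities (weight `w_text`) and thresholds the result. All names here are illustrative.

```python
def resize_nearest(img, out_h=224, out_w=224):
    """Nearest-neighbour resize of an image given as rows of (R, G, B) tuples."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize(img):
    """Scale 0-255 channel values into the range [0, 1]."""
    return [[tuple(ch / 255.0 for ch in px) for px in row] for row in img]


def fuse(p_text, p_image, w_text=0.5):
    """Hypothetical late fusion: weighted average of the two spam probabilities."""
    return w_text * p_text + (1.0 - w_text) * p_image


def classify(p_text, p_image=None, w_text=0.5, threshold=0.5):
    """Use the text model alone when an email has no image; otherwise fuse both."""
    p = p_text if p_image is None else fuse(p_text, p_image, w_text)
    return "spam" if p >= threshold else "ham"


tiny = [[(0, 0, 0), (255, 255, 255)]]  # a 1 x 2 image: one black, one white pixel
x = normalize(resize_nearest(tiny))    # 224 x 224 grid of RGB tuples in [0, 1]
```

Late fusion keeps the two models independent, so an email without images can still be scored by the text model alone. The weight and threshold would have to be tuned on validation data.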
RESULTS
• Comparative Performance Metrics:
• Presented a detailed comparison of key performance metrics for the baseline and optimized models.
• Performance Improvement:
• Observed higher accuracy, precision, recall, and F1-score for the optimized model than for the baseline.
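The four metrics compared above can be computed from true and predicted labels as in this sketch. It treats "spam" as the positive class, which is an assumption because the source does not say which class is positive. `compare` is a hypothetical helper that tabulates baseline and optimized scores side by side. The labels in the usage lines are made up and are not the project's results.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1, with `positive` as the spam class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def compare(y_true, baseline_pred, optimized_pred):
    """Map each metric to (baseline, optimized, optimized - baseline)."""
    base, opt = evaluate(y_true, baseline_pred), evaluate(y_true, optimized_pred)
    return {m: (base[m], opt[m], opt[m] - base[m]) for m in base}


# Made-up labels, for illustration only.
y_true = ["spam", "ham", "spam", "ham"]
baseline = ["spam", "spam", "ham", "ham"]
optimized = ["spam", "spam", "spam", "ham"]
for metric, (b, o, d) in compare(y_true, baseline, optimized).items():
    print(f"{metric:>9}: baseline {b:.3f}  optimized {o:.3f}  change {d:+.3f}")
```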
EXPERIMENTAL RESULTS
[Chart values: 99.496, 98.253, 98.445]
IMPLICATIONS & FUTURE DIRECTIONS
• The findings from this research have implications for the broader field of email security.
• As spam tactics evolve, detection systems need continuous exploration and adaptation. Future research should explore more complex CNN architectures and track emerging spam tactics.
CONCLUSIONS
• In conclusion, the hybrid model developed in this research is an effective tool against email spam.
• These findings contribute to the broader field of cybersecurity by showing how traditional machine learning algorithms and neural network architectures can complement each other.
• Combining machine learning and deep learning techniques makes this model a promising solution for organizations that need robust, adaptive spam detection.
