Review 2

An Industry Oriented Mini Project
On
SPAM MAIL DETECTION USING MACHINE LEARNING
DEPARTMENT OF INFORMATION TECHNOLOGY

TEEGALA KRISHNA REDDY ENGINEERING COLLEGE
MEDBOWLI, SAROORNAGAR, HYDERABAD

PROJECT GUIDE
Mrs. S. PAVANI, ASSISTANT PROFESSOR

SUBMITTED BY
K. NIHARIKA 20R91A1225
K. SAI KUMAR 20R91A1224
K. RAHUL 20R91A1226
CONTENTS
1. Abstract
2. Problem Statement
3. Existing System
4. Proposed System
5. Hardware and Software Requirements
6. Modules
7. Architecture Diagram
8. UML Diagrams
9. Conclusion
10. References
ABSTRACT
Unsolicited e-mail, also known as spam, has become a major concern for every e-mail
user. Spam has recently become very difficult to filter because these emails are
deliberately written so that anti-spam filters cannot detect them. A logistic regression
model can be used to build a more intelligent spam detection classifier. Its most common
use case is binary logistic regression, where the outcome is a binary classification
(0 or 1). This model can reach an accuracy of up to 99%. It is a supervised learning
algorithm, so it learns from labeled examples and can produce accurate predictions.
PROBLEM STATEMENT
• With a tiny investment, a spammer can send over 100,000 bulk emails per hour.
• Junk mail wastes storage and transmission bandwidth.
• Spam is a problem because its cost is forced onto the recipient.
• There is no integrated approach for classifying and filtering an email as spam or ham.
EXISTING SYSTEM
• Spam mail detection systems use a combination of techniques such as content analysis,
machine learning algorithms and blacklists to identify and filter out spam emails. These
systems analyze email content and headers to determine whether an email is likely to be spam.
• Machine learning models are trained on large datasets to recognize patterns in spam
behavior.
• It is a pattern-based approach.
PROPOSED SYSTEM
• The spam detection system distinguishes spam from non-spam emails using a self-learning
algorithm.
• The data is used to train and test a logistic regression model.
• After testing, the system reads the data stored in the dataset using a semantic-based
approach.
• If the result is 1, the email is classified as spam; if it is 0, it is classified as ham.
• It is a semantic-based approach.
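The train-and-test step can be sketched in plain Python. This is a minimal illustration and not the project's actual code: the toy feature vectors (counts of the words "free" and "meeting"), the learning rate, and the helper names `train` and `predict` are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function: maps a score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(spam) minus the true label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    """Return 1 (spam) if the estimated probability is at least 0.5, else 0 (ham)."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0

# Hypothetical toy data: each row is [count of "free", count of "meeting"].
X_train = [[3, 0], [2, 0], [0, 2], [0, 1], [1, 0], [0, 3]]
y_train = [1, 1, 0, 0, 1, 0]          # 1 = spam, 0 = ham
X_test = [[2, 0], [0, 2]]
y_test = [1, 0]

w, b = train(X_train, y_train)
accuracy = sum(predict(w, b, x) == t for x, t in zip(X_test, y_test)) / len(y_test)
print("test accuracy:", accuracy)
```

In practice a library implementation would normally replace the hand-written loop; the sketch only shows how a 1 (spam) or 0 (ham) result comes out of the trained model.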
REQUIREMENTS
Hardware Requirements
• Processor : Intel Core i3 or above
• Hard Disk : 256 GB or above
• RAM : 4 GB or above
• Internet : 3 Mbps or above
Software Requirements
• Operating System : Windows, macOS, Linux
• Platform : Google Colab
• Programming Language : Python
MODULES
• Email
• Receiver
• Logistic Regression
• Trained & Tested
• Dataset
• Semantic Based Approach
• Spam(1) or Ham(0)
ARCHITECTURE DIAGRAM
Incoming Email → Preprocessing → Feature Extraction → Training Phase →
Logistic Regression Model → Classification Result (Spam or Ham)
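The preprocessing and feature extraction stages of the architecture could look roughly like the sketch below. It assumes a simple bag-of-words representation; the deck does not say which features are used, and the function names `preprocess`, `build_vocabulary` and `extract_features` are hypothetical.

```python
import re
from collections import Counter

def preprocess(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(emails):
    """Assign a column index to every distinct token seen in the training emails."""
    tokens = sorted({tok for email in emails for tok in preprocess(email)})
    return {tok: i for i, tok in enumerate(tokens)}

def extract_features(text, vocabulary):
    """Turn one email into a vector of word counts (bag of words)."""
    counts = Counter(preprocess(text))
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:          # words unseen during training are ignored
            vector[vocabulary[tok]] = count
    return vector

# Hypothetical example emails.
vocabulary = build_vocabulary(["free prize", "team meeting"])
features = extract_features("FREE free prize now", vocabulary)
print(vocabulary)
print(features)
```

The resulting count vectors are what a logistic regression model would take as input during the training phase.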
USE CASE DIAGRAM
• A UML (Unified Modeling Language) use case diagram is a visual representation of the
interactions between users and a system.
• It captures the functional requirements of a system and helps to identify how different users
interact with the system to achieve specific tasks.
• Use case diagrams are typically used during the early stages of software development.
• A single use case diagram captures one particular functionality of a system, so several
use case diagrams are needed to model the entire system.
[Use case diagram: USER and EMAIL are linked to the use cases Email-Data sent, Email-ID,
Email-From, Email-To, Email-Subject, Email-Body and Spam score-Logistic Regression]
ACTIVITY DIAGRAM
• Activity diagrams describe how activities are coordinated to provide a service, which can be at
different levels of abstraction.
• Activity diagrams are graphical representations of workflows of stepwise activities and actions.
• An activity diagram is a type of Unified Modeling Language (UML) flowchart that shows the flow
from one activity to another in a process; it can also be used to construct the executable system
by using forward and reverse engineering techniques.
• It is also termed an object-oriented flowchart. The flow can be sequential or branched.
• The activities are initiated at the initial node and terminated at the final node.
START → Read Email → Spam or Ham? (NO: Sample the Ham Email; YES: Sample the Spam Email) →
End of Datasets? (NO: read the next email; YES: END)
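The loop in the activity diagram can be sketched as follows. The dataset format (a list of `(text, label)` pairs with 1 for spam and 0 for ham) and the function name `sample_emails` are assumptions made for this example.

```python
def sample_emails(dataset):
    """Walk through the dataset once and sort each email into a ham or spam sample."""
    ham, spam = [], []
    for text, label in dataset:        # "Read Email"
        if label == 1:                 # "Spam or Ham?" -> YES branch
            spam.append(text)          # "Sample the Spam Email"
        else:                          # NO branch
            ham.append(text)           # "Sample the Ham Email"
    return ham, spam                   # the loop stops at "End of Datasets"

# Hypothetical labeled emails.
dataset = [
    ("Win a free prize now", 1),
    ("Meeting moved to 3 pm", 0),
    ("Claim your reward today", 1),
]
ham_sample, spam_sample = sample_emails(dataset)
print(len(ham_sample), "ham,", len(spam_sample), "spam")
```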
SEQUENCE DIAGRAM
• A sequence diagram shows the time sequence of interactions between the various objects in a
process. It consists of vertical parallel lines (lifelines) and horizontal arrows indicating the
direction of the messages, which makes the flow easy to follow.
• It describes interactions among classes in terms of an exchange of messages over time.
Sequence diagrams are also called event diagrams.
• A sequence diagram is a good way to visualize and validate various scenarios.
• It can help to predict how a system will behave and to discover the responsibilities a class
may need to have when modeling a new system.
Participants: USER, ML Model, Dataset
Messages: Receives Email; Logistic Regression; Splitting data; Train Data; Test Data;
Transfers the data to Dataset; Detection; Reads the data in Semantic Based Approach
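The "Splitting data" step that produces the train data and test data might be done as in this sketch. The 80/20 ratio, the fixed seed and the function name `train_test_split` are assumptions; the deck does not state how the data is divided.

```python
import random

def train_test_split(records, test_ratio=0.2, seed=42):
    """Shuffle a copy of the records and split it into train and test portions."""
    shuffled = list(records)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # seeded so the split is reproducible
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical record ids standing in for labeled emails.
records = list(range(10))
train_data, test_data = train_test_split(records)
print(len(train_data), "training records,", len(test_data), "test records")
```

Shuffling before splitting keeps the test portion from being dominated by whichever class happens to come last in the file.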
CLASS DIAGRAM
• A class diagram describes the structure of a system by showing the system's classes,
their attributes, methods and the relationships among objects.
• It resembles an ER (entity-relationship) diagram and gives a clear picture of the
system's structure.
• Class diagrams are the only UML diagrams that can be mapped directly to object-oriented
languages.
Classes: <<User>>, <<Email>>
Association: User Receives Email
Attributes: email-id : string, email-from : string, email-to : string, email-body : string,
email-semanticbasedapproach : string, spamscore-logisticregression : int, email : string
Operations: receiveemail(), viewemail(), storeemail(), reademail(),
semanticbasedapproach(), scorelogisticregression()
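One possible mapping of the class diagram to Python is sketched below. Which attributes and operations belong to which class is an assumption, since the diagram does not make it explicit; the snake_case names are adapted from the diagram, and only two of the operations are shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Email:
    email_id: str
    email_from: str
    email_to: str
    email_body: str
    # Result of the classifier: 1 = spam, 0 = ham, None = not scored yet.
    spam_score: Optional[int] = None

@dataclass
class User:
    address: str
    inbox: List[Email] = field(default_factory=list)

    def receive_email(self, email: Email) -> None:
        """Store an incoming email in the user's inbox."""
        self.inbox.append(email)

    def view_emails(self) -> List[str]:
        """Return the bodies of all stored emails."""
        return [email.email_body for email in self.inbox]

# Hypothetical usage.
user = User("reader@example.com")
message = Email("1", "sender@example.com", "reader@example.com", "hello")
user.receive_email(message)
print(user.view_emails())
```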
CODE OUTPUT
CONCLUSION
We used previously collected data to train the model and then predicted the category of
new incoming emails. This shows the importance of labeling the data correctly: even one
mislabeled email can degrade the model. For example, if you receive an email in Gmail or
any other email account and believe it is spam, do not simply ignore it; report it as
spam. Doing so helps many other people who receive the same kind of email but do not
recognize it as spam. A wrong spam tag, on the other hand, can move a genuine email to
the spam folder, so be careful before you tag an email as spam or not spam.
REFERENCES
➢ Sjarif, Nila, & Amir, N. (2019). SMS Spam Message Detection using Term
Frequency-Inverse Document Frequency and Random Forest Algorithm. Procedia Computer
Science, 509-515.
➢ Shankar, S. (2018). Advanced Detection of Spam and Email Filtering using NLP Algorithms.
IJARIT.
➢ Emmanuel, Gbengadada, & Joseph. (2016). Machine learning for email spam filtering:
review, approaches and open research problems.
➢ https://www.scribd.com/document/459203809/email-spam
THANK YOU!
