Review 2
On
SPAM MAIL DETECTION USING MACHINE LEARNING
• With a tiny investment, a spammer can send over 100,000 bulk emails per hour.
• Junk mails waste storage and transmission bandwidth.
• Spam is a problem because its cost is forced onto us, the recipients.
• There is no integrated approach for classifying and filtering an email as spam or ham.
EXISTING SYSTEM
• Spam mail detection systems use a combination of techniques such as content analysis,
machine learning algorithms, and blacklists to identify and filter out spam emails. These
systems analyze email content and headers to determine whether an email is likely to be spam.
• Machine learning models are trained on large datasets to recognize patterns in spam
behavior.
• This is a pattern-based approach.
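A pattern-based filter of the kind described above can be sketched in a few lines of Python. This is only an illustration: the blacklist entry, the keyword patterns, and the function name `is_spam_by_pattern` are made-up placeholders, not rules taken from any real filter.

```python
import re

# Hypothetical blacklist and keyword patterns, for illustration only.
BLACKLISTED_SENDERS = {"promo@bulk-offers.example"}
SPAM_PATTERNS = [
    re.compile(r"\bfree\s+money\b", re.IGNORECASE),
    re.compile(r"\bclick\s+here\b", re.IGNORECASE),
    re.compile(r"\bwinner\b", re.IGNORECASE),
]


def is_spam_by_pattern(sender: str, subject: str, body: str) -> bool:
    """Return True if the sender is blacklisted or any pattern matches."""
    if sender.lower() in BLACKLISTED_SENDERS:
        return True
    text = f"{subject} {body}"
    return any(pattern.search(text) for pattern in SPAM_PATTERNS)
```

A filter like this only catches messages that match a rule someone has already written, which is the limitation the proposed system tries to address with a learned model.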
PROPOSED SYSTEM
• The spam detection system distinguishes spam from non-spam (ham) emails using a self-learning
algorithm.
• The data is used to train and test a logistic regression model.
• After testing, the system reads the emails stored in the dataset using a semantic-based
approach.
• If the result is 1, the email is spam; if the result is 0, the email is ham.
• This is a semantic-based approach.
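The train-then-classify idea in the bullets above can be illustrated with a very small logistic regression written in plain Python. This is a minimal sketch under stated assumptions, not the project's actual code: the six training messages are invented, the features are simple word occurrences, and the function names (`train`, `predict`, `score`) are hypothetical. In practice a library implementation of logistic regression would normally be used instead.

```python
import math

# Tiny made-up training set: label 1 = spam, 0 = ham.
TRAIN = [
    ("win free cash prize now", 1),
    ("free offer claim your prize", 1),
    ("cheap loans win cash", 1),
    ("meeting agenda for monday", 0),
    ("please review the project report", 0),
    ("lunch on monday with the team", 0),
]


def tokenize(text):
    """Split a message into lowercase words."""
    return text.lower().split()


def sigmoid(z):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def score(text, weights, bias):
    """Linear score: bias plus the summed weights of the words present."""
    return bias + sum(weights.get(word, 0.0) for word in tokenize(text))


def train(samples, epochs=200, learning_rate=0.5):
    """Fit word weights with stochastic gradient descent on the log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in samples:
            # Gradient of the log loss with respect to the linear score.
            error = sigmoid(score(text, weights, bias)) - label
            bias -= learning_rate * error
            for word in tokenize(text):
                weights[word] = weights.get(word, 0.0) - learning_rate * error
    return weights, bias


def predict(text, weights, bias):
    """Return 1 (spam) if the predicted probability is at least 0.5, else 0 (ham)."""
    return 1 if sigmoid(score(text, weights, bias)) >= 0.5 else 0


weights, bias = train(TRAIN)
```

Calling `predict("win a free prize", weights, bias)` returns 1 (spam) and `predict("project meeting on monday", weights, bias)` returns 0 (ham), matching the 1/0 convention used in the slides.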
REQUIREMENTS
Hardware Requirements
• Processor : Intel Core i3 or above
• Hard Disk : 256 GB or above
• RAM : 4 GB or above
• Internet : 3 Mbps or above
Software Requirements
• Operating System : Windows, macOS, Linux
• Platform : Google Colab
• Programming Language : Python
MODULES
• Email
• Receiver
• Logistic Regression
• Trained & Tested
• Dataset
• Semantic Based Approach
• Spam(1) or Ham(0)
ARCHITECTURE DIAGRAM
[Architecture diagram: Incoming Email -> Preprocessing -> Feature Extraction -> Training Phase -> Logistic Regression Model -> Classification Result -> Spam or Ham]
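The Preprocessing and Feature Extraction stages of the architecture could look roughly like the following sketch. The slides do not say which cleaning steps or features the project uses, so the stop-word list, the letters-only cleanup, and the bag-of-words counts here are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "to", "is", "and", "of", "for", "you", "your"}


def preprocess(raw_email: str) -> list[str]:
    """Lowercase, replace non-letters with spaces, tokenize, drop stop words."""
    text = raw_email.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


def extract_features(tokens: list[str]) -> dict[str, int]:
    """Bag-of-words: map each token to its number of occurrences."""
    return dict(Counter(tokens))


tokens = preprocess("Claim your FREE prize now!!! Free entry.")
features = extract_features(tokens)
```

The resulting word counts are the kind of numeric input that a classifier such as logistic regression can be trained on.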
USE CASE DIAGRAM
• A UML (Unified Modeling Language) use case diagram is a visual representation of the
interactions between users and a system.
• It captures the functional requirements of a system and helps to identify how different users
interact with the system to achieve specific tasks.
• Use case diagrams are typically used during the early stages of software development.
[Use case diagram: the USER actor is connected to the EMAIL system and its use cases: Email-From, Email-To, Email-Subject, Email-Body, and Spam score-Logistic Regression]
ACTIVITY DIAGRAM
• Activity diagrams describe how activities are coordinated to provide a service, which can be at
different levels of abstraction.
• Activity diagrams are graphical representations of workflows of stepwise activities and actions.
• An activity diagram is a type of Unified Modeling Language (UML) flowchart that shows the flow
from one activity to another in a process. Activity diagrams are also used to construct the
executable system by using forward and reverse engineering techniques.
[Activity diagram: Read Email -> decision "Spam or Ham" (NO / YES branches) -> decision "End of Datasets" (NO / YES branches) -> END]
SEQUENCE DIAGRAM
• A sequence diagram shows the time sequence between the various objects in the process. It
consists of vertical parallel lines (lifelines) and horizontal arrows indicating the
direction of the messages, which makes the interaction easy to follow.
• It describes interactions among classes in terms of an exchange of messages over time.
Sequence diagrams are also called event diagrams.
• These diagrams can help to predict how a system will behave and to discover the
responsibilities a class may need to have when modeling a new system.
[Sequence diagram: participants USER, ML Model, Dataset. Messages: Receives Email; Logistic Regression; Splitting data (Train Data, Test Data); Transfers the data to Dataset; Detection; Reads the data in Semantic Based Approach]
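The "Splitting data" step in the sequence diagram, which produces the Train Data and Test Data sets, can be sketched as follows. The 80/20 ratio, the fixed seed, and the function name `split_dataset` are assumptions for illustration; the slides do not state how the project splits its data.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    # A seeded generator makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labeled emails: (text, label) with 1 = spam, 0 = ham.
samples = [(f"email {i}", i % 2) for i in range(10)]
train_data, test_data = split_dataset(samples)
```

Keeping the test emails out of training is what lets the test accuracy say something about how the model behaves on new incoming mail.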
CLASS DIAGRAM
• A class diagram describes the structure of a system by showing the system’s classes,
their attributes, methods, and the relationships among objects.
• Class diagrams are the only UML diagrams that can be mapped directly to object-oriented
languages.
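[Class diagram: <<User>> is linked to <<Email>> by the association "Receives Email"]
<<Email>> attributes:
• email-id : string
• email-from : string
• email-to : string
• email-body : string
• email-semanticbasedapproach : string
• spamscore-logisticregression : int
<<User>> attributes:
• email : string
Operations shown in the diagram:
• receiveemail()
• viewemail()
• storeemail()
• reademail()
• semanticbasedapproach()
• scorelogisticregression()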
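Since class diagrams map closely onto object-oriented code, the two classes could be sketched in Python as below. The diagram does not make clear which class owns which operation, so the assignment of `receive_email` and `view_email` to the user, the `inbox` list, and the `-1` "not yet scored" default are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Email:
    email_id: str
    email_from: str
    email_to: str
    email_body: str
    spam_score: int = -1  # 1 = spam, 0 = ham, -1 = not yet scored (assumed)


@dataclass
class User:
    email: str  # the user's own address
    inbox: list = field(default_factory=list)  # hypothetical storage

    def receive_email(self, message: Email) -> None:
        """Store a message only if it is addressed to this user."""
        if message.email_to == self.email:
            self.inbox.append(message)

    def view_email(self, email_id: str):
        """Return the stored message with the given id, or None."""
        for message in self.inbox:
            if message.email_id == email_id:
                return message
        return None
```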
CODE OUTPUT
CONCLUSION
We used previously collected data to train the model and predicted the category of
new incoming emails. This indicates the importance of labeling the data correctly: mislabeled
examples can mislead the model. For example, when you receive an email in Gmail or any other
email account that you think is spam, do not just ignore it; report it as spam. This helps many
other people who receive the same kind of email but do not recognize it as spam. Conversely,
a wrong spam tag can move a genuine email to the spam folder, so be careful
when you tag an email as spam or not spam.
THANK YOU !