
An Industry Oriented Mini Project

On
SPAM MAIL DETECTION USING MACHINE LEARNING

DEPARTMENT OF INFORMATION TECHNOLOGY


TEEGALA KRISHNA REDDY ENGINEERING COLLEGE
MEDBOWLI, SAROORNAGAR, HYDERABAD

PROJECT GUIDE
Mrs S. PAVANI, ASSISTANT PROFESSOR

SUBMITTED BY
K. NIHARIKA 20R91A1225
K. SAI KUMAR 20R91A1224
K. RAHUL 20R91A1226
CONTENTS
1. Abstract
2. Problem Statement
3. Existing System
4. Proposed System
5. Hardware and Software Requirements
6. Modules
7. Architecture Diagram
8. UML Diagrams
9. Conclusion
10. References
ABSTRACT
Unsolicited e-mail, also known as spam, has become a major concern for every e-mail
user. Spam has become very difficult to filter because these emails are deliberately
crafted so that anti-spam filters cannot detect them. A logistic regression model can be
used to build a more intelligent spam detection classifier. The most common use case is
binary logistic regression, where the outcome is a binary classification (0 or 1). This
model achieves an accuracy of up to 99%. It is a supervised learning algorithm: it learns
from emails that are already labeled as spam or ham, which allows it to produce accurate
predictions.
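Binary logistic regression, as described above, passes a weighted sum of an email's features through the sigmoid function to get a spam probability, P(spam | x) = 1 / (1 + e^-(w·x + b)). It outputs 1 (spam) when that probability is at least a threshold and 0 (ham) otherwise. The sketch below shows only this decision rule. The 0.5 threshold is the usual default and is an assumption. The two features and the weight values are hypothetical and were picked by hand, not learned from the project's dataset.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps any real score into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)

def predict_label(weights: list[float], bias: float, features: list[float],
                  threshold: float = 0.5) -> int:
    """Return 1 (spam) if P(spam | features) >= threshold, else 0 (ham)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical features: [count of the word "free", count of the word "meeting"]
weights = [2.0, -1.5]  # hand-picked for illustration, not trained
bias = -1.0

print(predict_label(weights, bias, [3, 0]))  # z = -1 + 6 = 5, spam
print(predict_label(weights, bias, [0, 2]))  # z = -1 - 3 = -4, ham
```

In a trained model the weights come from the training phase. Words that are frequent in spam get positive weights and words typical of ham get negative ones.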
PROBLEM STATEMENT

• With a tiny investment, a spammer can send over 100,000 bulk emails per hour.
• Junk mail wastes storage space and transmission bandwidth.
• Spam is a problem because its cost is forced onto us, the recipients.
• There is no integrated approach for classifying and filtering an email as spam or ham
(legitimate email).
EXISTING SYSTEM

• Spam mail detection systems use a combination of techniques, such as content analysis,
machine learning algorithms and blacklists, to identify and filter out spam emails. These
systems analyze the email content and headers to determine whether an email is likely to
be spam.
• Machine learning models are trained on large datasets to recognize patterns in spam
behavior.
• It is a pattern-based approach.
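The pattern-based approach of existing systems can be illustrated with a minimal filter that combines a sender blacklist with content patterns. The blacklisted domains, the regular expressions and the function names below are hypothetical examples. They are not taken from any particular existing product.

```python
import re

# Hypothetical rule sets; a real filter would load maintained lists.
BLACKLISTED_DOMAINS = {"spam-domain.example", "cheap-pills.example"}
SPAM_PATTERNS = [
    re.compile(r"\bwin(ner)?\b.*\bprize\b", re.IGNORECASE),
    re.compile(r"\bfree\s+money\b", re.IGNORECASE),
    re.compile(r"\bclick\s+here\b", re.IGNORECASE),
]

def sender_domain(address: str) -> str:
    """Return the part of an address after the last '@', lower-cased."""
    return address.rsplit("@", 1)[-1].lower()

def is_spam_rule_based(sender: str, subject: str, body: str) -> bool:
    """Flag an email if its sender is blacklisted or its text matches a known pattern."""
    if sender_domain(sender) in BLACKLISTED_DOMAINS:
        return True
    text = f"{subject}\n{body}"
    return any(pattern.search(text) for pattern in SPAM_PATTERNS)

print(is_spam_rule_based("promo@spam-domain.example", "Hello", "Just saying hi"))
print(is_spam_rule_based("boss@company.example", "Meeting", "See you at 10"))
```

These rules match fixed strings, so a spammer who rewords the message (for example "fr3e m0ney") slips past them. This is the weakness the abstract describes.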
PROPOSED SYSTEM
• The spam detection system distinguishes spam from non-spam (ham) emails using a
self-learning algorithm.
• A logistic regression model is trained and tested on the data.
• After testing, the system reads the email data stored in the dataset using a
semantic-based approach.
• If the model outputs 1, the email is classified as spam; if it outputs 0, it is
classified as ham.
• It is a semantic-based approach.
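The train-and-test step of the proposed system can be sketched in plain Python as full-batch gradient descent on the logistic (log) loss. This is an illustrative stand-in, not the project's code. The three word-count features, the toy training and test emails, the learning rate and the epoch count are all assumptions. A library implementation such as scikit-learn's LogisticRegression could be used instead.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias with full-batch gradient descent on the average log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Error term: predicted spam probability minus the true label (0 or 1).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict(w, b, X):
    """Return 1 (spam) or 0 (ham) for each feature vector."""
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
            for xi in X]

# Hypothetical features: [count of "free", count of "win", count of "meeting"]
X_train = [[1, 0, 0], [3, 0, 0], [2, 1, 0], [1, 2, 0],   # spam
           [0, 0, 1], [0, 0, 3], [0, 1, 1]]              # ham
y_train = [1, 1, 1, 1, 0, 0, 0]
X_test = [[2, 0, 0], [0, 0, 2]]
y_test = [1, 0]

w, b = train_logistic_regression(X_train, y_train)
y_pred = predict(w, b, X_test)
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(y_pred, accuracy)
```

Accuracy is measured only on the held-out test vectors, which the model never saw during training.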
REQUIREMENTS

Hardware Requirements
• Processor : Intel Core i3 or above
• Hard Disk : 256 GB or above
• RAM : 4 GB or above
• Internet : 3 Mbps or above
Software Requirements
• Operating System : Windows, macOS, Linux
• Platform : Google Colab
• Programming Language : Python
MODULES
• Email
• Receiver
• Logistic Regression
• Trained & Tested
• Dataset
• Semantic Based Approach
• Spam(1) or Ham(0)
ARCHITECTURE DIAGRAM
Incoming Email → Preprocessing → Feature Extraction → Training Phase → Logistic Regression
Model → Classification Result (Spam or Ham)
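The Preprocessing and Feature Extraction stages of the architecture can be illustrated with a simple bag-of-words scheme. The text is lower-cased and split into word tokens, and the number of times each vocabulary word occurs is counted. The document does not state which features the project used. The tokenizer and the bag-of-words representation are therefore assumptions; TF-IDF weighting, as in the first reference, is a common alternative.

```python
import re
from collections import Counter

def preprocess(text: str) -> list[str]:
    """Lower-case the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents: list[str]) -> list[str]:
    """Sorted list of every distinct token seen in the training emails."""
    vocab: set[str] = set()
    for doc in documents:
        vocab.update(preprocess(doc))
    return sorted(vocab)

def extract_features(text: str, vocabulary: list[str]) -> list[int]:
    """Bag-of-words vector: how often each vocabulary word occurs in the email."""
    counts = Counter(preprocess(text))
    return [counts[word] for word in vocabulary]

train_emails = ["Win a FREE prize now", "Meeting moved to Monday"]
vocab = build_vocabulary(train_emails)
print(vocab)
# ['a', 'free', 'meeting', 'monday', 'moved', 'now', 'prize', 'to', 'win']
print(extract_features("free free prize", vocab))
# [0, 2, 0, 0, 0, 0, 1, 0, 0]
```

The vocabulary is built from training emails only. Words that appear only in new emails are ignored, so the feature vectors keep a fixed length for the logistic regression model.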
USE CASE DIAGRAM
• A UML (Unified Modeling Language) use case diagram is a visual representation of the
interactions between users and a system.

• It captures the functional requirements of a system and helps identify how different users
interact with the system to achieve specific tasks.

• Use case diagrams are typically used during the early stages of software development to
capture and define the functional requirements of a system.

• A single use case diagram captures one particular functionality of a system, so several use
case diagrams are needed to model the entire system.
Actors: USER, EMAIL
Use cases: Email-Data sent, Email-ID, Email-From, Email-To, Email-Subject, Email-Body,
Spam score-Logistic Regression
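The Email-ID, Email-From, Email-To, Email-Subject and Email-Body items in the use case diagram correspond to standard email headers and the message body. Python's standard email package can extract them from a raw message. The sample message below is made up. The sketch assumes a simple single-part email, because get_payload() returns a list of parts for multipart messages.

```python
from email import message_from_string

RAW_MESSAGE = """\
Message-ID: <123@example.com>
From: promo@spam-domain.example
To: user@example.com
Subject: You are a winner!

Click here to claim your FREE prize.
"""

def extract_use_case_fields(raw: str) -> dict:
    """Map a raw single-part email to the fields named in the use case diagram."""
    msg = message_from_string(raw)
    return {
        "Email-ID": msg["Message-ID"],
        "Email-From": msg["From"],
        "Email-To": msg["To"],
        "Email-Subject": msg["Subject"],
        "Email-Body": msg.get_payload(),  # a str for non-multipart messages
    }

fields = extract_use_case_fields(RAW_MESSAGE)
print(fields["Email-From"], "|", fields["Email-Subject"])
```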
ACTIVITY DIAGRAM

• Activity diagrams describe how activities are coordinated to provide a service, which can be
modeled at different levels of abstraction.

• Activity diagrams are graphical representations of workflows of stepwise activities and actions.

• An activity diagram is a type of Unified Modeling Language (UML) flowchart that shows the flow
from one activity to another in a process. Activity diagrams are also used to construct the
executable system through forward and reverse engineering techniques.

• It is also called an object-oriented flowchart. The flow can be sequential or branched.

• The activities start at the initial node and terminate at the final node.
START → Read Email → Spam or Ham?
NO (ham) → Sample the Ham Email; YES (spam) → Sample the Spam Email
→ End of Datasets? NO → back to Read Email; YES → END
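The activity diagram's loop maps onto a single pass over a labeled dataset. Each email is read, the flow branches on spam or ham, the email is sampled into the matching group, and this repeats until the end of the dataset. This sketch assumes the dataset is a list of (text, label) pairs with 1 for spam and 0 for ham. The function name is hypothetical.

```python
def split_by_label(dataset):
    """Walk the dataset once, sampling each email into the ham or the spam list.

    `dataset` is an iterable of (text, label) pairs, label 1 = spam, 0 = ham.
    """
    ham_emails, spam_emails = [], []
    for text, label in dataset:        # Read Email
        if label == 1:                 # Spam? YES
            spam_emails.append(text)   # Sample the Spam Email
        else:                          # Spam? NO
            ham_emails.append(text)    # Sample the Ham Email
    return ham_emails, spam_emails     # End of dataset reached: END

sample = [("Win a free prize", 1), ("Lunch at noon?", 0), ("Cheap loans now", 1)]
ham, spam = split_by_label(sample)
print(len(ham), len(spam))  # 1 2
```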
SEQUENCE DIAGRAM

• A sequence diagram shows the time sequence of interactions between the objects in a process.
It consists of vertical parallel lines (lifelines) and horizontal arrows that indicate the
direction of the messages, which makes the interactions easy for the reader to follow.
• It describes interactions among classes in terms of an exchange of messages over time.
Sequence diagrams are also called event diagrams.

• A sequence diagram is a good way to visualize and validate various scenarios.

• Sequence diagrams can help predict how a system will behave and discover the responsibilities
a class may need to have when modeling a new system.
Participants: USER, Dataset, ML Model
Messages: Receives Email; Transfers the data to Dataset; Splitting data into Train Data and Test
Data; Logistic Regression; Detection; Reads the data in Semantic Based Approach
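The "Splitting data" step of the sequence diagram divides the dataset into train data and test data. A minimal standard-library sketch follows. The 80/20 ratio, the fixed seed and the name split_train_test are assumptions; scikit-learn offers a comparable train_test_split function.

```python
import random

def split_train_test(emails, labels, test_fraction=0.2, seed=42):
    """Shuffle example indices with a fixed seed, then cut off a test partition.

    Returns (train_emails, test_emails, train_labels, test_labels).
    """
    if len(emails) != len(labels):
        raise ValueError("emails and labels must have the same length")
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    indices = list(range(len(emails)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([emails[i] for i in train_idx], [emails[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])

emails = [f"email {i}" for i in range(10)]
labels = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
X_train, X_test, y_train, y_test = split_train_test(emails, labels)
print(len(X_train), len(X_test))  # 8 2
```

Shuffling before the cut prevents the test set from containing only the emails that happened to be stored last. The fixed seed makes the split repeatable between runs.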
CLASS DIAGRAM
• A class diagram describes the structure of a system by showing the system’s classes,
their attributes, methods and the relationships among objects.

• It is similar to an ER (Entity Relationship) diagram and gives a clear picture of the
system's data and structure.

• Class diagrams are the only UML diagrams that can be mapped directly to object-oriented
languages.
<<Email>>
email-id : string
email-from : string
email-to : string
email-body : string
email-semanticbasedapproach : string
spamscore-logisticregression : int

<<User>>
email : string

Association: User Receives Email

Operations: receiveemail(), viewemail(), storeemail(), reademail(), semanticbasedapproach(),
scorelogisticregression()
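The class diagram can be mapped roughly to Python dataclasses, as sketched below. Attribute names are adapted to Python identifiers: email-id becomes email_id, and spamscore-logisticregression becomes spam_score. Only a subset of the attributes and operations is shown. The inbox list and the include_spam option are hypothetical additions used to illustrate the "Receives Email" association.

```python
from dataclasses import dataclass, field

@dataclass
class Email:
    email_id: str
    email_from: str
    email_to: str
    email_body: str
    spam_score: int = 0  # spamscore-logisticregression: 1 = spam, 0 = ham

@dataclass
class User:
    email: str  # the user's own address
    inbox: list[Email] = field(default_factory=list)  # hypothetical: received messages

    def receive_email(self, message: Email) -> None:
        """'Receives Email' association: store an incoming message (receiveemail/storeemail)."""
        self.inbox.append(message)

    def view_email(self, include_spam: bool = False) -> list[Email]:
        """Return received messages, hiding those scored as spam unless include_spam is True."""
        return [m for m in self.inbox if include_spam or m.spam_score == 0]

user = User("user@example.com")
user.receive_email(Email("1", "friend@example.com", "user@example.com", "Lunch at noon?", 0))
user.receive_email(Email("2", "promo@spam-domain.example", "user@example.com", "Win a prize", 1))
print(len(user.view_email()), len(user.view_email(include_spam=True)))  # 1 2
```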
CODE OUTPUT
CONCLUSION

We used previously collected data to train the model and then predicted the category of new
incoming emails. This shows the importance of labeling the data correctly: a single mislabeled
example can make the model less accurate. For example, in Gmail or any other email account,
when you receive an email that you think is spam, you may simply ignore it; instead, you should
report it as spam. This helps many other people who receive the same kind of email but do not
recognize it as spam. On the other hand, a wrong spam tag can move a genuine email to the spam
folder, so you have to be careful before tagging an email as spam or not spam.
REFERENCES
➢ Sjarif, Nila, & Amir, N. (2019). SMS Spam Message Detection using Term
Frequency-Inverse Document Frequency and Random Forest Algorithm. Procedia Computer
Science, 509-515.
➢ Shankar, S. (2018). Advanced Detection of Spam and Email Filtering using NLP Algorithms.
IJARIT.
➢ Emmanuel, Gbengadada, & Joseph. (2016). Machine learning for email spam filtering:
review, approaches and open research problems.
➢ https://www.scribd.com/document/459203809/email-spam
THANK YOU!
