Email Spam Detection Using Machine Learning


Email Spam Detection Using Machine Learning Algorithms

PERFORMED BY:
AYUSH KUNWAR
SOURAV KAMBLE
Abstract

Email is widely used for both personal and professional communication. Information shared
through email, such as banking information, credit reports, and login details, is often
sensitive and confidential. This makes it valuable to cyber criminals, who can exploit the
data for malicious purposes. Phishing is a technique that fraudsters use to acquire
confidential data from individuals by claiming to be from trusted sources. In a phishing
email, the sender tries to persuade the recipient to provide personal information under false
pretences. Phishing detection is treated here as an intelligent and efficient model built on
data mining algorithms for classification or association.
Introduction

Phishing is a lucrative type of fraud in which the criminal deceives recipients and obtains
confidential information from them under false pretences. Phishing emails may direct users
to click a link to a website or open an attachment, where they are asked to provide
confidential information such as passwords or credit card details. The phisher sends the
messages to thousands of users; usually only a small percentage of recipients fall into the
trap, but this can still result in high profits for the sender.
Objective

• To design and develop an approach for email phishing detection from large synthetic as
well as real-time data using machine learning.
• To develop an approach using various machine learning algorithms and explore the
accuracy using a majority voting technique.
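
The second objective combines several algorithms into one decision. A minimal sketch of one way to do this, assuming a simple majority vote over binary spam/not-spam classifiers. The class name MajorityVote, the method isSpam, and the three keyword rules are hypothetical stand-ins for trained models, not the authors' implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class MajorityVote {

    /**
     * Returns true (spam) when strictly more than half of the
     * classifiers flag the email. A tie counts as not spam.
     */
    static boolean isSpam(String email, List<Predicate<String>> classifiers) {
        int spamVotes = 0;
        for (Predicate<String> classifier : classifiers) {
            if (classifier.test(email)) {
                spamVotes++;
            }
        }
        return spamVotes * 2 > classifiers.size();
    }

    public static void main(String[] args) {
        // Hypothetical rule-based stand-ins for trained classifiers.
        List<Predicate<String>> classifiers = Arrays.asList(
                e -> e.toLowerCase().contains("verify your account"),
                e -> e.contains("http://"),
                e -> e.toLowerCase().contains("password"));

        String email = "Please verify your account at http://example.test/login";
        System.out.println("Flagged as spam: " + isSpam(email, classifiers));
    }
}
```

Using an odd number of classifiers avoids ties; with an even number, this sketch treats a tie as not spam, which is a design choice rather than something the slides specify.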
System Architecture

First, the system collects data from the Internet, such as synthetic and real-time spam email
data, and applies cross-validation. Pre-processing is applied in both the training and testing
phases, followed by feature extraction and selection. The system is then trained with
different machine learning algorithms to generate training rules. All test data is classified
as normal or spam based on the weight obtained for each test sample. Finally, the accuracy of
the entire system is evaluated using confusion matrices.
Fig 1: System Architecture of Proposed System
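
The final evaluation step can be illustrated with a small confusion matrix for the two classes, normal and spam. This is a minimal sketch under the assumption that spam is the positive class; the class ConfusionMatrix and its method names are hypothetical and are not taken from the proposed system.

```java
final class ConfusionMatrix {
    private int truePositives;   // spam predicted as spam
    private int falsePositives;  // normal predicted as spam
    private int trueNegatives;   // normal predicted as normal
    private int falseNegatives;  // spam predicted as normal

    /** Records one test sample. */
    void add(boolean actualSpam, boolean predictedSpam) {
        if (actualSpam && predictedSpam) {
            truePositives++;
        } else if (!actualSpam && predictedSpam) {
            falsePositives++;
        } else if (!actualSpam) {
            trueNegatives++;
        } else {
            falseNegatives++;
        }
    }

    /** Fraction of all samples classified correctly. */
    double accuracy() {
        int total = truePositives + falsePositives + trueNegatives + falseNegatives;
        return total == 0 ? 0.0 : (double) (truePositives + trueNegatives) / total;
    }

    /** Fraction of emails flagged as spam that really are spam. */
    double precision() {
        int flagged = truePositives + falsePositives;
        return flagged == 0 ? 0.0 : (double) truePositives / flagged;
    }

    /** Fraction of actual spam emails that were flagged. */
    double recall() {
        int actual = truePositives + falseNegatives;
        return actual == 0 ? 0.0 : (double) truePositives / actual;
    }

    public static void main(String[] args) {
        // Made-up labels for illustration only, not results from the project.
        boolean[] actual    = {true, true, false, false, true};
        boolean[] predicted = {true, false, false, true, true};

        ConfusionMatrix matrix = new ConfusionMatrix();
        for (int i = 0; i < actual.length; i++) {
            matrix.add(actual[i], predicted[i]);
        }
        System.out.println("Accuracy:  " + matrix.accuracy());
        System.out.println("Precision: " + matrix.precision());
        System.out.println("Recall:    " + matrix.recall());
    }
}
```

With cross-validation, one such matrix could be filled per fold and the accuracies averaged. Precision and recall are included because spam data is often imbalanced, where accuracy alone can be misleading.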
Software and Hardware Requirements

Front End
• Operating System: Windows XP/7/8
• Programming Language: Java/J2EE
• Tools: Eclipse or NetBeans, HeidiSQL, JDK 1.7 or higher
• Database: MySQL 5.1
Hardware Requirements

• Processor: Intel Pentium 4 or above
• Memory: 2 GB or above
• Other peripherals: Printer
• Hard Disk: 500 GB
Thank you
