Detection of Phishing On Apps and Websites - Project Report
Engineering
Project Report
Submitted By
Deepika S – 19MIS0005
Sethumadhavan V – 19MIS0010
Varun S – 19MIS0046
Surya Teja G – 19MIS0247
Yuvaraj S – 19MIS0427
Abstract:
Phishing is a type of social engineering attack often used to steal user
data, including login credentials and credit card numbers. It occurs when an
attacker, masquerading as a trusted entity, dupes a victim into opening an email,
instant message, or text message. A phishing attack is one of the simplest ways
to obtain sensitive information from unsuspecting users. The aim of phishers is
to acquire critical information such as usernames, passwords and bank account
details. Cybersecurity professionals are now looking for reliable and stable
techniques for detecting phishing websites. This paper applies machine
learning to the detection of phishing URLs by extracting and analyzing
various features of legitimate and phishing URLs. Decision Tree, Random Forest
and Support Vector Machine algorithms are used to detect phishing websites.
The aim of the paper is to detect phishing URLs and to identify the best
machine learning algorithm by comparing the accuracy, false positive rate and
false negative rate of each algorithm.
In recent years, advancements in Internet and cloud technologies have led to a
significant increase in electronic trading, in which consumers make online
purchases and transactions. This growth exposes users’ sensitive information to
unauthorized access and damages the resources of enterprises. Phishing is
one of the familiar attacks that trick users into accessing malicious content
and giving up their information. In terms of website interface and uniform
resource locator (URL), most phishing webpages look identical to the actual
webpages. Various strategies for detecting phishing websites, such as blacklists
and heuristics, have been suggested. However, because these security
technologies are inefficient, there is an exponential increase in the number of
victims. The anonymous and uncontrollable framework of the Internet makes it
more vulnerable to phishing attacks. Existing research shows that the
performance of phishing detection systems is limited. There is a demand for an
intelligent technique to protect users from cyber-attacks. In this study, the
authors propose a URL detection technique based on machine learning approaches.
A recurrent neural network method is employed to detect phishing URLs. The
proposed method was evaluated with 7900 malicious and 5800 legitimate sites.
The experiments’ outcome shows that the proposed method performs better
than recent approaches in malicious URL detection.
Detection of Phishing on Apps And Websites
Introduction:
Phishing has become a main area of concern for security researchers
because it is not difficult to create a fake website that looks very close
to a legitimate one. Experts can identify fake websites, but not all users
can, and such users become victims of phishing attacks. The main aim of the
attacker is to steal bank account credentials. Businesses in the United States
lose US $10 billion per year because their clients fall victim to phishing.
The 3rd Microsoft Computing Safer Index Report, released in February 2020,
estimated that the annual worldwide impact of phishing could be as high as
$5 billion. Phishing attacks succeed because of a lack of user awareness.
Since phishing attacks exploit weaknesses found in users, they are very
difficult to mitigate, which makes it all the more important to enhance
phishing detection techniques.
Problem Definition:
Phishing is a type of social engineering attack often used to
steal user data, including login credentials and credit card numbers. It occurs
when an attacker, masquerading as a trusted entity, dupes a victim into opening
an email, instant message, or text message. A phishing attack is one of the
simplest ways to obtain sensitive information from unsuspecting users. The aim
of phishers is to acquire critical information such as usernames, passwords and
bank account details. Cybersecurity professionals are now looking for reliable
and stable techniques for detecting phishing websites. This paper applies
machine learning to the detection of phishing URLs by extracting and analyzing
various features of legitimate and phishing URLs. Decision Tree, Random Forest
and Support Vector Machine algorithms are used to detect phishing websites.
The aim of the paper is to detect phishing URLs and to identify the best
machine learning algorithm by comparing the accuracy, false positive rate and
false negative rate of each algorithm.
COMPLETE DESIGN:
Proposed Approach:
MODULE DESCRIPTION:
➢ The first step is to load the data from a CSV file that
contains different types of URLs.
➢ The URLs are then vectorized. We used Count Vectorizer and
gathered words using a tokenizer, since some words in URLs
are more important than others, e.g. ‘virus’, ‘.exe’,
‘.data’.
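The two steps above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the project's code: `re.findall` stands in for the Regexp Tokenizer, a `Counter`-based bag of words stands in for Count Vectorizer, and the sample URLs, the `[A-Za-z]+` pattern and the helper names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample URLs; the real project reads them from a CSV file.
urls = [
    "example.com/login/update.exe",
    "docs.python.org/3/library/re.html",
]

def tokenize(url):
    """Split a URL into alphabetic word tokens (assumed pattern)."""
    return re.findall(r"[A-Za-z]+", url)

def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(tokens, vocabulary):
    """Return a count vector with one slot per vocabulary word."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocabulary]

token_lists = [tokenize(u) for u in urls]
vocabulary = build_vocabulary(token_lists)
vectors = [vectorize(toks, vocabulary) for toks in token_lists]
```

Each URL becomes a row of word counts, so a token such as `exe` contributes a nonzero entry only for URLs that contain it.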
MAJOR MODULES/TECHNIQUES INCORPORATED IN THE
PROJECT:
➢ Regexp Tokenizer
➢ Snowball Stemmer
➢ Beautiful Soup
➢ Logistic Regression
➢ Multinomial NB
DATASET:
➢ The dataset is named phishing_site_urls.csv.
➢ It is stored as a comma-separated values (.csv) file.
➢ The file contains 549,346 unique entries.
➢ There are two columns.
➢ The Label column is the prediction target and has two categories:
❖ A. Good - the URL does not contain malicious content
and the site is not a phishing site.
❖ B. Bad - the URL contains malicious content
and the site is a phishing site.
➢ There are no missing values in the dataset.
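The dataset properties listed above (two label categories, no missing values) can be checked with a few lines. The sketch below uses the standard `csv` module on a tiny in-memory sample; the column name `URL` and the sample rows are assumptions, and the real file would be opened by name instead.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for phishing_site_urls.csv (rows are made up).
sample_csv = io.StringIO(
    "URL,Label\n"
    "example.com/login/update.exe,Bad\n"
    "docs.python.org/3/library/csv.html,Good\n"
    "example.org/about,Good\n"
)

rows = list(csv.DictReader(sample_csv))

# Count how many rows fall into each label category.
label_counts = Counter(row["Label"] for row in rows)

# A value counts as missing if the cell is absent or empty.
missing = sum(1 for row in rows for value in row.values() if not value)
```

On the full file, `label_counts` would show the balance between Good and Bad URLs, which matters when interpreting accuracy later.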
SEQUENCE DIAGRAM:
[Figure: sequence diagram. Participants: user, platform, website. Recoverable
labels: train data, data visualization, graphical representation, data quality,
confusion matrix, model stored in .pkl file, makes decisions.]
ACTIVITY DIAGRAM:
[Figure: activity diagram. Recoverable labels: user, desktop, website, train the
data, data visualization, build a model, launch a model.]
IMPLEMENTATION:
Regexp Tokenizer:
Snowball Stemmer:
Visualization:
➢ Visualize some important keywords using a word cloud.
➢ Create a function to visualize the important keywords from a URL.
Beautiful Soup:
➢ It is used for getting data out of HTML, XML, and other markup languages.
➢ Use the Beautiful Soup library to extract only the hyperlinks relevant for
Google, i.e. only links in <a> tags with href attributes.
➢ Turn the URLs into a DataFrame.
➢ After you get the list of your websites with hyperlinks, turn them into a
pandas DataFrame with columns “from” (URL where the link resides) and
“to” (link destination URL).
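The link-extraction steps above can be sketched without any third-party package. In the example below, the standard library's `html.parser.HTMLParser` is swapped in for Beautiful Soup, and the class name `LinkCollector`, the helper `link_rows` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only <a> tags that carry a non-empty href are kept.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)

def link_rows(page_url, html):
    """Return one {"from", "to"} row per hyperlink found in html."""
    collector = LinkCollector()
    collector.feed(html)
    return [{"from": page_url, "to": target} for target in collector.links]

# Hypothetical page content used only for illustration.
sample_html = (
    '<p><a href="https://example.com/a">A</a>'
    '<a name="no-href">skip</a>'
    '<a href="/b">B</a></p>'
)
rows = link_rows("https://example.com", sample_html)
```

A list of dictionaries like `rows` is a shape that `pandas.DataFrame` accepts directly, which would give the “from” and “to” columns described above.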
Logistic Regression:
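As an illustration of the technique named here, the following is a minimal from-scratch sketch of binary logistic regression trained by batch gradient descent. It is not the project's code (which is not shown in this report); the toy count vectors, the learning rate and the function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, learning_rate=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            z = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(z) - label  # derivative of the log loss w.r.t. z
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias

def predict(features, weights, bias):
    """Return 1 (phishing) if the predicted probability is at least 0.5."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy count vectors: column 0 = count of "login", column 1 = count of "docs".
X = [[1, 0], [2, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]  # 1 = Bad (phishing), 0 = Good
weights, bias = train_logistic_regression(X, y)
predictions = [predict(row, weights, bias) for row in X]
```

The learned weight for a token is positive when that token appears mostly in phishing URLs and negative otherwise, which makes the model easy to inspect.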
MultinomialNB:
Multinomial Naive Bayes is commonly applied to NLP problems. The Naive
Bayes classifier is a family of probabilistic algorithms based on
applying Bayes' theorem with the “naive” assumption of conditional
independence between every pair of features.
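The independence assumption means a URL's score for a class is simply the class prior multiplied by one likelihood per token. A minimal from-scratch sketch is shown below, assuming Laplace smoothing with `alpha=1.0`; the training tokens and function names are made up for illustration and are not the project's code.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(token_lists, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods."""
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(token_lists, labels):
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        total = sum(token_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            tok: math.log((token_counts[c][tok] + alpha) / total)
            for tok in vocabulary
        }
    return log_prior, log_likelihood

def predict_nb(tokens, log_prior, log_likelihood):
    """Pick the class with the highest posterior log score."""
    scores = {}
    for c in log_prior:
        # Tokens never seen in training are ignored.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][tok] for tok in tokens if tok in log_likelihood[c]
        )
    return max(scores, key=scores.get)

# Made-up training tokens for illustration.
train_tokens = [
    ["secure", "login", "update", "exe"],
    ["login", "verify", "account"],
    ["docs", "python", "library"],
    ["news", "sports", "library"],
]
train_labels = ["Bad", "Bad", "Good", "Good"]
log_prior, log_likelihood = train_multinomial_nb(train_tokens, train_labels)
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.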
Implementation
Software Details:
Jupyter Notebook - (Anaconda Navigator):
Python:
The Python code deploys the model with FastAPI using the .pkl file
generated from the Jupyter notebook.
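The hand-off through a .pkl file can be sketched with the standard `pickle` module. This is an illustration only: the dictionary-based `model`, its weights, the file name `phishing.pkl` and the helper names are hypothetical, and the web framework layer is left out.

```python
import os
import pickle
import tempfile

# Hypothetical fitted model, represented as plain data so it pickles anywhere.
model = {"weights": {"login": 1.2, "exe": 2.0, "docs": -1.5}, "bias": -0.5}

def score(url_tokens, model):
    """Sum the weights of known tokens plus the bias."""
    return model["bias"] + sum(model["weights"].get(t, 0.0) for t in url_tokens)

# Training side (notebook): write the model to a .pkl file.
model_path = os.path.join(tempfile.mkdtemp(), "phishing.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Serving side (web app): load the model once at start-up and reuse it.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

def classify(url_tokens):
    """Return "Bad" for a positive score, otherwise "Good"."""
    return "Bad" if score(url_tokens, loaded_model) > 0 else "Good"
```

In a web application, a function like `classify` would typically be called from a route handler so that the model is loaded once rather than on every request. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.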
Sample code:
FastAPI:
We use FastAPI to deploy our project as a website.
This is an interactive and responsive website that is
used to detect whether a given website is legitimate or phishing. The
website is built with the web languages HTML, CSS and JavaScript.
Results and discussion:
The approach is simple yet effective. We get an accuracy of 98%, which is a
very high value for a model that detects malicious URLs.
We then test some links to see whether the model gives good predictions.
GIVING THE SAFE INPUT WEBSITE LINK:
GIVING THE UNSAFE INPUT WEBSITE LINK:
Performance metrics:
From the results obtained with the above models,
logistic regression has the highest performance. We therefore
conclude that logistic regression achieves higher accuracy than the
other models in detecting phishing websites.
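The comparison criteria named in the abstract (accuracy, false positive rate and false negative rate) can be computed from a confusion matrix. The sketch below is a minimal illustration; the label lists are made-up examples, not the project's results, and "Bad" is assumed to be the positive class.

```python
def confusion_counts(y_true, y_pred, positive="Bad"):
    """Return (tp, fp, tn, fn) with `positive` as the phishing class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Compute accuracy, false positive rate and false negative rate."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

# Made-up labels for illustration only.
y_true = ["Bad", "Bad", "Bad", "Good", "Good", "Good", "Good", "Good"]
y_pred = ["Bad", "Bad", "Good", "Good", "Good", "Good", "Bad", "Good"]
metrics = rates(*confusion_counts(y_true, y_pred))
```

For a phishing detector, the false negative rate (phishing URLs classified as Good) is usually the costlier error, so it is worth reporting alongside accuracy.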
CONCLUSION:
Phishing has become a main area of concern for security researchers
because it is not difficult to create a fake website that looks very close to
a legitimate one. Experts can identify fake websites, but not all users can,
and such users become victims of phishing attacks.
The main aim of the attacker is to steal bank account credentials. Businesses
in the United States lose US $10 billion per year because their clients
fall victim to phishing. The 3rd Microsoft Computing Safer Index Report,
released in February 2020, estimated that the annual worldwide impact of
phishing could be as high as $5 billion. Phishing attacks succeed
because of a lack of user awareness. Since phishing attacks exploit
weaknesses found in users, they are very difficult to mitigate, which makes it
all the more important to enhance phishing detection techniques.