
Assessing Machine Learning Tools for Web Page Phishing Detection: A Performance Evaluation
&
Modernizing Phishing Defense: A Groundbreaking Ensemble Machine Learning Approach

Presented By: Bakkireddygari Sai Sravanthi


Student Id: 112121037

Table of Contents

Introduction
Ensemble Model Architecture
PhishNet Overview
Methodology Overview
Data Collection and Pre-Processing
Feature Extraction
Results
Rule Extraction
PhishNet Implementation
Conclusion and Future Work

Introduction

• Phishing attacks surged during COVID-19, targeting individuals and
organizations globally.
• Cybercriminals use social engineering to trick users into sharing
sensitive data, which underlines the need for strong security measures.
• This study introduces an ensemble ML approach integrated into a
browser extension, PhishNet, to combat phishing threats.

Ensemble Model Architecture
Random Forest Classifier (RFC) combined with:
■ Artificial Neural Network (ANN)
■ k-Nearest Neighbors (KNN)
■ Decision Tree (C4.5)

The ensemble model harnesses the collective intelligence of diverse
classifiers to enhance phishing detection accuracy and robustness.
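
The pairing of a Random Forest with one partner classifier can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the slides do not say how the classifiers are combined, so soft voting through scikit-learn's VotingClassifier is assumed here. The data is synthetic, and scikit-learn ships no C4.5 learner, so an entropy-based DecisionTreeClassifier (a CART tree) stands in for it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 30 features, label 1 = phishing, 0 = legitimate.
X, y = make_classification(n_samples=600, n_features=30, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def make_pair(partner_name, partner):
    """Pair a Random Forest with one partner classifier via soft voting (assumed scheme)."""
    return VotingClassifier(
        estimators=[
            ("rfc", RandomForestClassifier(n_estimators=100, random_state=0)),
            (partner_name, partner),
        ],
        voting="soft",  # average the predicted class probabilities of both members
    )


pairs = {
    "RFC + ANN": make_pair(
        "ann", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    ),
    "RFC + KNN": make_pair("knn", KNeighborsClassifier(n_neighbors=5)),
    # Entropy-based CART tree used as a stand-in for C4.5.
    "RFC + C4.5": make_pair("tree", DecisionTreeClassifier(criterion="entropy", random_state=0)),
}

scores = {}
for name, model in pairs.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on the held-out split
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

The scores printed here come from synthetic data and say nothing about the results reported in this presentation.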

PhishNet Overview
PhishNet is a browser extension for Google Chrome designed to
detect phishing websites effectively.

It analyzes webpage characteristics in real time and alerts users if a
phishing attempt is detected, enhancing web security.

Methodology Overview
Data Collection: Obtained 1000 phishing URLs from PhishTank and 400
legitimate internet banking URLs.

Feature Extraction: Extracted 14 features, including the presence of an IP
address, SSL security, and URL characteristics.

Model Building and Training: Trained SVM, Random Forest, and k-NN
models using Python's Scikit-learn library.
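
The training step can be sketched roughly as below. This is an illustration only: the data is synthetic (1400 rows and 14 features to mirror the counts above), and the hyperparameters, the feature scaling, and the 80/20 split are all assumptions, since the slides give none of these details.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 1400 samples, 14 features,
# roughly 29% legitimate (label 0) and 71% phishing (label 1).
X, y = make_classification(
    n_samples=1400, n_features=14, n_informative=8,
    weights=[0.29, 0.71], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# SVM and k-NN are distance-based, so they are wrapped with feature scaling.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = model.score(X_test, y_test)  # held-out accuracy
    print(f"{name}: accuracy = {accuracies[name]:.3f}")
```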

Model Assessment: Evaluated model performance using metrics such as accuracy,
true positive rate, and true negative rate.

Rule Extraction: Extracted decision rules from the best-performing model
(Random Forest).

PhishNet Implementation: Integrated the extracted rules into a Google Chrome
extension using web technologies.

Data Collection and Pre-Processing
The collected URL dataset comprised 1000 phishing URLs and 400 legitimate
Internet banking URLs.

The dataset used for training comprises 11,055 instances and 30
features sourced from the UCI Machine Learning Repository:
⚬ Phishing URLs: 55.69%
⚬ Legitimate URLs: 44.31%

Feature Extraction

⚬ Presence of IP address in the URL
⚬ SSL security availability
⚬ Number of dots in the URL
⚬ Length of the URL
⚬ Presence of "@" symbol in the URL
⚬ Subdomains
⚬ Domain registration length
⚬ Redirects
⚬ Website popularity
⚬ Website age
⚬ Unusual characters
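
Several of the URL-based features listed above can be computed from the URL string alone. The sketch below is a hypothetical illustration, not the extractor used in the study: the function name and dictionary keys are invented for the example, the HTTPS check is only a crude proxy for SSL availability, and the subdomain count naively treats the last two host labels as the registered domain. Features such as domain registration length, website age, and popularity need external lookups and are left out.

```python
import ipaddress
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few lexical phishing features from a URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # Presence of an IP address in place of a domain name.
    try:
        ipaddress.ip_address(host)
        has_ip = True
    except ValueError:
        has_ip = False

    # Naive subdomain count: every label beyond the last two (e.g. "example.com").
    labels = [part for part in host.split(".") if part]
    num_subdomains = 0 if has_ip else max(len(labels) - 2, 0)

    return {
        "has_ip_address": has_ip,
        "uses_https": parsed.scheme == "https",  # crude proxy for SSL availability
        "num_dots": url.count("."),
        "url_length": len(url),
        "has_at_symbol": "@" in url,
        "num_subdomains": num_subdomains,
    }


print(extract_url_features("http://192.168.0.1/login"))
print(extract_url_features("https://secure.login.example.com/account"))
```

A real extractor would likely use a public-suffix list to count subdomains correctly for domains such as example.co.uk.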

Results
Ensemble Model Performance:
⚬ RFC + ANN achieved an F1-score of 0.975 and an accuracy of 97.16%.
⚬ RFC + KNN achieved the highest accuracy of the three ensembles, 97.33%,
with an F1-score of 0.976.
⚬ RFC + C4.5 achieved an F1-score of 0.976 and an accuracy of 96.36%.
Model Building and Training
Random Forest:
⚬ Achieved an accuracy of 98.35%
⚬ Achieved a true positive rate of 100% and a true negative rate of
90.48%
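
The metrics quoted above all derive from the four cells of a confusion matrix. The sketch below shows that arithmetic. The counts in the usage example are hypothetical: the slides do not give the confusion matrix, and these values were chosen only because they happen to reproduce the reported Random Forest rates. The function assumes every denominator is non-zero.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Derive common metrics from confusion-matrix counts (phishing = positive class)."""
    tpr = tp / (tp + fn)                        # true positive rate (recall)
    tnr = tn / (tn + fp)                        # true negative rate (specificity)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "tnr": tnr, "accuracy": accuracy, "precision": precision, "f1": f1}


# Hypothetical counts (not from the source): 100 phishing pages all caught,
# 19 of 21 legitimate pages correctly passed.
metrics = classification_metrics(tp=100, fn=0, tn=19, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, a perfect true positive rate and a lower true negative rate still give high accuracy, because the positive class dominates the sample.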

Rule Extraction

Decision rules extracted from the decision trees of the trained Random
Forest model highlight the key features that signal phishing behavior.

These rules support PhishNet's detection system, enabling instant
identification of potential phishing attempts.
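
One way to read rules out of a trained forest is to print the decision path of an individual tree. The sketch below does this with scikit-learn's export_text. It is an assumption-laden illustration: the slides do not describe the extraction procedure, the data is synthetic, the feature names are hypothetical, and only the first tree of a small, shallow forest is shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

# Hypothetical feature names, used only to make the printed rules readable.
feature_names = [
    "has_ip_address", "uses_https", "num_dots",
    "url_length", "has_at_symbol", "num_subdomains",
]

# Synthetic stand-in data (assumption): 6 features, binary label.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0, random_state=1
)

# A small, shallow forest keeps each tree short enough to read as rules.
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=1)
forest.fit(X, y)

# Each fitted tree is a DecisionTreeClassifier; export one as nested if/else rules.
first_tree = forest.estimators_[0]
rules = export_text(first_tree, feature_names=feature_names)
print(rules)
```

Each root-to-leaf path in the printed text corresponds to one threshold rule, which is the kind of logic that could be ported into browser-extension code.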

PhishNet Implementation
Screenshots or diagrams illustrating PhishNet's intuitive interface
and its seamless integration into the Google Chrome browser.

Highlight the user-friendly nature of PhishNet and its proactive
role in safeguarding users against phishing attacks.

[Screenshot: PhishNet analyses a page]
[Screenshot: PhishNet detects a phishing site]

Conclusion and Future work

The study's findings highlight the efficacy of the proposed
ensemble model and its integration into the practical PhishNet
browser extension.

Future research avenues may include exploring performance on
diverse datasets, refining feature extraction techniques, and
enhancing PhishNet's capabilities through continued development.
