Final CPE

Government Polytechnic Khamgaon
Computer Department
Capstone Project Presentation on

Phishing Website Detection System using ML
Presented By:
2100210097 27 Aasawari Kshirsagar
2100210098 28 Rasika Majgaonkar
2100210112 37 Vaishnavi Sable
2100210125 49 Tanvi Wankhede
Guided By: Prof. V. M. Bande
CONTENTS
➢ Introduction
➢ Project Overview
➢ Problem Statement
➢ Methodology
➢ Project Plan and Timeline
➢ Challenges Encountered
➢ Achievements and Progress
➢ Lessons Learned
➢ Next Steps
➢ Conclusion
➢ References
INTRODUCTION
• A phishing website attack is a type of cyber threat in which attackers create deceptive websites that mimic legitimate ones, aiming to trick users into divulging sensitive information.
• The ultimate goal of a phishing attack is to exploit the victim's trust and obtain sensitive information that can be used for fraudulent activities, unauthorized access, or identity theft.
• Phishing website detection uses machine learning techniques to identify and block phishing websites.
PROJECT OVERVIEW
• The Phishing Website Detection project aims to create a robust system that accurately identifies whether a website entered by the user is a phishing website or not.
• The project aims to improve the accuracy of identifying phishing websites compared to existing models, addressing the growing social issue of phishing attacks, which continue to increase despite strong security measures.
• The ultimate objective is to contribute to overcoming this social problem by implementing a highly effective phishing detection system.
PROBLEM STATEMENT
METHODOLOGY
01 Data Collection
Utilized the Kaggle dataset as our primary source of data. Preprocessed the dataset by handling missing values, removing duplicates, and normalizing features to ensure data quality and consistency.
02 Feature Extraction
We focus on feature selection and engineering. We carefully choose features that are highly indicative of phishing behavior. By selecting and engineering these features, we aim to provide our models with the information they need to make accurate predictions.
03 Model Implementation and Training
Implemented machine learning models using the selected features and trained them on the preprocessed dataset to learn the patterns and relationships between features and phishing behavior.
04 Deployment and Monitoring
Preparing to deploy the trained models in the upcoming weeks. This involves finalizing model selection, conducting thorough testing, and ensuring compatibility with the deployment environment.
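The preprocessing listed under Data Collection (handling missing values, removing duplicates, normalizing features) can be sketched in plain Python. This is a minimal illustration, assuming each record is a dict of numeric features in which a missing value appears as `None`; the column names `url_length` and `num_dots` are hypothetical and not taken from the project's Kaggle dataset. Missing values are filled with the column median, exact duplicate rows are dropped, and each feature is min-max scaled into [0, 1].

```python
from statistics import median


def preprocess(rows, columns):
    """Impute missing values, drop duplicate rows, then min-max scale each column.

    Assumes each row is a dict mapping every column name to a number or None.
    """
    if not rows:
        return []

    # 1. Handle missing values: replace None with the median of the observed values.
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed) if observed else 0.0
    filled = [
        {col: medians[col] if row[col] is None else row[col] for col in columns}
        for row in rows
    ]

    # 2. Remove duplicates: keep only the first occurrence of each identical row.
    seen = set()
    unique = []
    for row in filled:
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 3. Normalize: min-max scale every column into [0, 1].
    scaled = [dict(row) for row in unique]
    for col in columns:
        values = [row[col] for row in unique]
        low, high = min(values), max(values)
        spread = high - low
        for row in scaled:
            # A constant column has no spread, so every value maps to 0.0.
            row[col] = (row[col] - low) / spread if spread else 0.0
    return scaled


# Hypothetical records (column names are illustrative only).
records = [
    {"url_length": 54, "num_dots": 3},
    {"url_length": None, "num_dots": 5},
    {"url_length": 54, "num_dots": 3},  # exact duplicate of the first record
    {"url_length": 120, "num_dots": 1},
]
print(preprocess(records, ["url_length", "num_dots"]))
```

In practice the same three steps are usually done with pandas and scikit-learn scalers; the plain-Python version just makes each step explicit.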
TOOLS AND TECHNOLOGIES
• Anaconda Environment with Python: Anaconda provides a convenient environment for managing Python packages and dependencies.
• Python Flask, HTML, CSS, JS: Used for designing the user interface and for backend integration.
• Machine Learning with Python Libraries: Scikit-learn's algorithms are used to train our models and evaluate their performance. Once trained, the model is integrated into the Flask application to perform real-time detection.
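A minimal sketch of how a trained model could be exposed through Flask for real-time detection. It assumes a hypothetical `predict_fn` callable that takes a URL string and returns 1 for phishing and 0 for legitimate; in the real application it would wrap feature extraction plus the trained scikit-learn model. The `/predict` route and the JSON request/response shape are illustrative choices, not the project's actual interface.

```python
from flask import Flask, jsonify, request


def create_app(predict_fn):
    """Build a Flask app that classifies a submitted URL.

    predict_fn is a hypothetical callable: URL string -> 1 (phishing) or 0 (legitimate).
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict):
            payload = {}
        url = str(payload.get("url", "")).strip()
        if not url:
            # Reject requests that do not include a URL to check.
            return jsonify({"error": "missing 'url' field"}), 400
        label = predict_fn(url)
        return jsonify({"url": url, "phishing": bool(label)})

    return app
```

Passing the model in as a function keeps the web layer independent of which classifier is finally chosen. For a local trial, something like `create_app(model_wrapper).run(debug=True)` would start the development server, where `model_wrapper` is whatever function wraps the trained model.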
PROJECT PLAN AND TIMELINE
Week 1: We created the user interface for our project.
Week 2: We studied different datasets and decided which dataset to use.
Week 3: We learnt about the algorithms we decided to implement.
Week 4: We implemented the algorithms and trained our models on the selected dataset.
Week 5: We compared the accuracy of all the implemented models and implemented hyperparameter tuning to improve it.
CHALLENGES ENCOUNTERED
1. Finding a Suitable Environment -
Anaconda is one of the best environments for this work, as it comes with most of the required libraries, such as scikit-learn and pandas, pre-installed.
2. Accuracy -
We used a technique called hyperparameter tuning to increase the accuracy of the algorithms and to find the parameter values that contribute most to accuracy.
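Hyperparameter tuning is commonly done with scikit-learn's `GridSearchCV`, which cross-validates every combination in a parameter grid and keeps the best one. The sketch below tunes a Random Forest on synthetic data from `make_classification` because the project's dataset is not reproduced here; the grid values are illustrative assumptions, not the settings the team used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the phishing dataset (the real data is not reproduced here).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative grid: every combination (2 x 2 = 4) is evaluated with 5-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 4))
# The best combination is refitted on the full training set as best_estimator_;
# scoring it on held-out data gives an unbiased check of the tuned model.
test_accuracy = search.best_estimator_.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 4))
```

Grid search grows multiplicatively with each added parameter, which is why it tends to consume the extra time and resources mentioned in Lessons Learned; `RandomizedSearchCV` is a cheaper alternative for larger grids.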
ACHIEVEMENTS AND PROGRESS
• Successful completion of the GUI
• All the selected algorithms are implemented
DEMONSTRATION
• User Interface
MODELS IMPLEMENTED
• Ensemble Technique
1] Bagging
Random Forest Algorithm
RANDOM FOREST
(Slide figures: accuracy score, classification report, confusion matrix, and train and test accuracy graph)
VARIABLE IMPORTANCE
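The Random Forest slides report accuracy, a classification report, a confusion matrix, train and test accuracy, and variable importance. The sketch below shows how these are typically produced with scikit-learn. It uses synthetic data from `make_classification` and hypothetical feature names `f0` to `f7` in place of the project's dataset, so its numbers will not match the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 8 numeric features, binary label (1 = phishing, 0 = legitimate).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=0)
feature_names = [f"f{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Bagging: each tree is fit on a bootstrap sample and considers a random subset of
# features at each split; the forest averages the trees' predicted class probabilities.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Train vs. test accuracy: a large gap between the two is a sign of overfitting.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, y_pred)
print(f"Train accuracy: {train_acc:.4f}  Test accuracy: {test_acc:.4f}")

print(classification_report(y_test, y_pred, target_names=["legitimate", "phishing"]))
print(confusion_matrix(y_test, y_pred))  # rows = true class, columns = predicted class

# Variable importance: impurity-based scores, normalized to sum to 1 across features.
ranked = sorted(zip(feature_names, model.feature_importances_), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favour features with many distinct values; `sklearn.inspection.permutation_importance` on the test set is a common cross-check.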
XGBOOST
• Ensemble Technique
2] Boosting
(Slide figure: boosting diagram)
XGBOOST
(Slide figures: accuracy score and confusion matrix)
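Boosting differs from bagging: trees are built one after another, each fit to correct the errors of the ensemble built so far. The project used XGBoost; because the `xgboost` package may not be installed everywhere, this sketch substitutes scikit-learn's `GradientBoostingClassifier`, which implements the same gradient-boosting idea (XGBoost adds regularization terms and second-order gradient information, among other optimizations). `xgboost.XGBClassifier` follows the same `fit`/`predict` interface, so it could be swapped in. The data and parameters are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (1 = phishing, 0 = legitimate).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Stand-in for XGBoost: trees are added one at a time, each fit to the negative
# gradient (pseudo-residuals) of the log-loss and scaled by the learning rate.
model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=1
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy:", round(accuracy_score(y_test, y_pred), 4))
print(confusion_matrix(y_test, y_pred))

# staged_predict yields the ensemble's predictions after each boosting stage,
# showing how accuracy changes as more trees are added.
staged_acc = [accuracy_score(y_test, p) for p in model.staged_predict(X_test)]
print("After 1 tree:", round(staged_acc[0], 4), "| after 100 trees:", round(staged_acc[-1], 4))
```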

LOGISTIC REGRESSION
• Logistic regression is a statistical method used for binary classification by estimating the probability of a binary
outcome based on one or more predictor variables.
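Concretely, logistic regression computes a weighted sum of the predictors, z = b + w1·x1 + w2·x2 + … + wn·xn, and passes it through the sigmoid function p = 1 / (1 + e^(-z)) to get the probability of the positive class; a website is labelled phishing when p reaches a threshold, usually 0.5. The sketch shows only this prediction step. The weights, bias and the two 0/1 features are made-up assumptions; in practice they are learned during training, for example by scikit-learn's `LogisticRegression`.

```python
import math


def sigmoid(z):
    """Map any real number into (0, 1), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """Probability of the positive class (here: phishing) for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(weights, bias, features, threshold=0.5):
    """Return 1 (phishing) if the probability reaches the threshold, else 0."""
    return int(predict_proba(weights, bias, features) >= threshold)


# Hypothetical learned parameters for two 0/1 features,
# e.g. [URL contains '@', host is an IP address].
weights, bias = [2.0, 1.5], -1.0
print(predict_proba(weights, bias, [1, 1]))  # z = 2.5
print(predict(weights, bias, [0, 0]))        # z = -1.0
```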
LOGISTIC REGRESSION
(Slide figures: accuracy score and confusion matrix)

K-NEAREST NEIGHBOUR (KNN)
• The K-NN algorithm classifies a data point by finding its K nearest neighbors according to a distance metric, such as Euclidean distance, and assigning the class that is most common among those neighbors.
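The K-NN rule can be sketched in a few lines of plain Python: compute the Euclidean distance from the query point to every training point, keep the K closest, and return the majority label. This is a brute-force illustration on made-up two-feature data, not the project's implementation; scikit-learn's `KNeighborsClassifier` applies the same rule with faster neighbour search.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote; with an odd k and two classes there are no ties.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up 2-feature samples (e.g. scaled URL length, scaled dot count); 1 = phishing.
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
train_labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_points, train_labels, (0.2, 0.2)))  # prints 0
print(knn_predict(train_points, train_labels, (0.9, 0.9)))  # prints 1
```

Because distances mix all features, K-NN is sensitive to feature scale, which is one reason normalization appears in the preprocessing step.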
K-NEAREST NEIGHBOUR (KNN)
(Slide figures: accuracy score and confusion matrix)

LESSONS LEARNED
• Python proficiency: Learned Python and libraries such as NumPy, Pandas, Matplotlib and Scikit-learn in order to understand and implement machine learning algorithms.
• Environment choice: Selected Anaconda as the primary environment for its efficient package management system and support for data science tools.
• Dataset selection: Identified and acquired a dataset suited to the project's requirements.
• Data preprocessing: Prioritized data preprocessing to ensure high-quality input for model training.
• Hyperparameter tuning: Recognized the importance of hyperparameters and allocated more resources and time to hyperparameter tuning.
NEXT STEPS
• Integration: We will now integrate the main model with the front end of the website.
• Feature Extraction: Design a mechanism to extract and preprocess features from the URLs entered by users on the website.
• Model Evaluation: Perform experiments to compare the performance of different models on real URL inputs.
• Deployment and Maintenance: Deploy the finalized web application with the integrated machine learning model.
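One possible shape for the URL feature-extraction mechanism is sketched below using only Python's standard library. The features chosen (URL length, '@' symbol, IP-address host, '-' in the domain, number of dots in the host, HTTPS scheme) are common phishing indicators used here as illustrative assumptions. A trained model can only use features that match the columns it was trained on, so the real mechanism must reproduce the Kaggle dataset's features and encoding, which are not listed in this presentation.

```python
import ipaddress
from urllib.parse import urlparse


def extract_features(url):
    """Turn a raw URL into a dict of simple numeric features (illustrative set)."""
    raw = url
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    parsed = urlparse(url)
    host = parsed.hostname or ""  # userinfo before '@' and the port are stripped

    # Hosts given as bare IP addresses are a classic phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "url_length": len(raw),
        "has_at_symbol": int("@" in raw),
        "host_is_ip": host_is_ip,
        "hyphen_in_domain": int("-" in host),
        "dots_in_host": host.count("."),
        "uses_https": int(parsed.scheme == "https"),
    }


print(extract_features("http://paypal.com@192.168.0.1/login"))
print(extract_features("https://secure-login.example.com/verify"))
```

Before calling the model's `predict`, the resulting dict would still need to be arranged into the same column order and scaling that were used during training.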
CONCLUSION
• In the Capstone Project Execution, we implemented several classification models to predict phishing websites.
• The Random Forest and XGBoost classifiers performed better than the other models.
• In the next phase of the project, we will integrate the classification model that performs best in a real-world environment.
Algorithm | Accuracy
Random Forest | 99.97%
XGBoost Classifier | 99.56%
KNN | 90%
Logistic Regression | 91%
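Accuracy, as reported in the table above, is the share of all predictions that are correct. For a binary confusion matrix with true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), accuracy = (TP + TN) / (TP + TN + FP + FN). Precision and recall, which appear in a classification report, are worth reading alongside it, because a very high accuracy can hide poor performance on the minority class. The counts in the example are made up for illustration.

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 for the positive (phishing) class."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Precision: of the URLs flagged as phishing, how many really were phishing.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the real phishing URLs, how many were caught.
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for a test set of 1,000 URLs.
print(metrics_from_confusion(tp=450, tn=460, fp=40, fn=50))
```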
REFERENCES
• https://ieeexplore.ieee.org/document/9730579
THANK YOU
