Final CPE
Computer Department
Presented By -
➢ Introduction
➢ Project Overview
➢ Problem Statement
➢ Methodology
➢ Project Plan and Timeline
➢ Challenges Encountered
➢ Achievements and Progress
➢ Lessons Learned
➢ Next Steps
➢ Conclusion
➢ References
INTRODUCTION
The ultimate goal of a phishing attack is to exploit the victim's trust and
obtain sensitive information that can be used for fraudulent activities,
unauthorized access, or identity theft.
Anaconda Environment with Python: Anaconda provides a convenient environment for managing Python
packages and dependencies.
Python Flask, HTML, CSS, JS: Used to design the user interface and integrate it with the backend.
Machine Learning with Python Libraries: Used to train the model with Scikit-learn's algorithms and evaluate its
performance. Once trained, the model is integrated into the Flask application to perform real-time detection.
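The train-and-evaluate step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the synthetic data from `make_classification`, the 80/20 split, and the Random Forest settings are all assumptions standing in for the real phishing dataset and configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the phishing dataset (assumption: binary labels,
# 1 = phishing, 0 = not phishing; the real feature columns are not shown).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out 20% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train one of the models named in this presentation.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out rows with the same three outputs the slides list.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print(classification_report(y_test, y_pred))
print(matrix)
```

The fitted `model` object is what a Flask route would later call `predict` on; that wiring is left out here because the application code is not part of this presentation.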
PROJECT PLAN AND TIMELINE
Week 2
We studied different datasets and decided which dataset to use.
Week 4
We implemented the algorithms and trained our models on the selected dataset.
CHALLENGES ENCOUNTERED
1. Environment -
Anaconda is one of the best environments, as it already comes with most of the required libraries
pre-installed, such as scikit-learn and pandas.
2. Accuracy -
We use a technique called hyperparameter tuning to increase the accuracy of the algorithms and
to find the parameter values that contribute most to the accuracy.
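One common way to do hyperparameter tuning with scikit-learn is an exhaustive grid search with cross-validation. The sketch below assumes that approach; the presentation does not say which search method, parameter grid, or scoring was actually used, so the grid values and the synthetic data are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the phishing dataset (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical grid: every combination of these values is tried.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
}

# 3-fold cross-validation scores each combination by mean accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, round(best_score, 4))
```

`search.cv_results_` holds the score for every combination, which is one way to see which parameters move the accuracy the most.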
ACHIEVEMENTS AND PROGRESS
• User Interface
MODELS IMPLEMENTED
• Ensemble Technique
1] Bagging
Random Forest Algorithm
RANDOM FOREST
Accuracy :
Classification Report :
Confusion Matrix
• Ensemble Technique
2] Boosting
XGBOOST
Accuracy :
Classification Report :
Confusion Matrix
• Logistic regression is a statistical method used for binary classification by estimating the probability of a binary
outcome based on one or more predictor variables.
LOGISTIC REGRESSION
Accuracy :
Classification Report :
Confusion Matrix
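To make the probability estimate concrete: logistic regression passes a weighted sum of the predictor variables, z, through the sigmoid function, p = 1 / (1 + e^(-z)), and applies a threshold to obtain the class. The sketch below shows only that prediction step. The weights, bias, and two-feature inputs are hypothetical, not values learned from the project's dataset.

```python
import math


def predict_probability(features, weights, bias):
    """Return P(y = 1 | features) under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Label 1 (phishing) when the probability reaches the threshold, else 0."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical model parameters, chosen only for illustration.
weights = [1.5, -2.0]
bias = 0.0

p_zero = predict_probability([0.0, 0.0], weights, bias)  # z = 0
label_a = classify([2.0, 0.0], weights, bias)            # z = 3.0
label_b = classify([0.0, 2.0], weights, bias)            # z = -4.0
print(p_zero, label_a, label_b)
```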
• The K-NN algorithm works by finding the K nearest neighbors to a given data point based on a distance metric,
such as Euclidean distance.
K - NEAREST NEIGHBOUR (KNN)
Accuracy :
Classification Report :
Confusion Matrix
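The neighbour search described for K-NN can be sketched in plain Python: compute the Euclidean distance from the query point to every training point, keep the K closest, and take a majority vote over their labels. The toy points and labels below are made up for illustration, and a library implementation such as scikit-learn's would normally be used instead.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (assumption): label 0 = not phishing, label 1 = phishing.
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

near_origin = knn_predict(train_points, train_labels, (0.05, 0.05), k=3)
near_one = knn_predict(train_points, train_labels, (1.0, 0.95), k=3)
print(near_origin, near_one)
```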
LESSONS LEARNED
Python proficiency: Learned Python and its libraries, such as NumPy, Pandas, Matplotlib and Scikit-learn, in order to
understand and apply machine learning algorithms.
Environment choice: Selected Anaconda as the primary environment for its efficient package management system
and support for data science tools.
Dataset selection: Identified and acquired a dataset suited to the project's requirements.
Data preprocessing: Prioritized data preprocessing to ensure high-quality input for model training.
Hyperparameter tuning: Recognized the importance of hyperparameters and allocated more resources and time to
tuning them.
NEXT STEPS
Integration: We will now integrate the main model with the front end of the website.
Feature Extraction: Design a mechanism to extract and preprocess features from URLs entered by users on
the website.
Model Comparison: Perform experiments to evaluate and compare the performance of the different models using real
URL inputs.
Deployment and Maintenance: Deploy the finalized web application with the integrated machine learning
model.
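The feature-extraction step listed above could look something like the following. This is only a sketch under assumptions: the presentation does not say which features the dataset uses, so the feature names and the `extract_url_features` helper are hypothetical. In the real application the extracted features must match the columns the model was trained on.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url):
    """Turn a raw URL string into a dictionary of simple numeric features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        # Crude dotted-quad check: an IP address used in place of a domain name.
        "host_is_ip": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
    }


suspicious = extract_url_features("http://192.168.0.1/login@secure")
ordinary = extract_url_features("https://www.example.com/")
print(suspicious)
print(ordinary)
```

The resulting values, placed in a fixed column order, form the row that would be passed to the trained model's `predict` method inside the Flask route.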
CONCLUSION
In the Capstone Project Execution phase, we implemented several classification models to predict phishing websites.
The Random Forest classifier and the XGBoost classifier performed better than the other models.
In the next phase of project implementation, we will integrate the classification model that performs best in a real
environment.
Algorithm              Accuracy
Random Forest          99.97%
XGBoost Classifier     99.56%
KNN                    90%
Logistic Regression    91%
REFERENCES
https://ieeexplore.ieee.org/document/9730579
https://ieeexplore.ieee.org/document/10169697
https://ieeexplore.ieee.org/document/10249799
https://ieeexplore.ieee.org/document/9824544
https://ieeexplore.ieee.org/document/10049452
THANK YOU