Appendices e F

Appendix E
Endorsement
Republic of the Philippines
BULACAN STATE UNIVERSITY
SARMIENTO CAMPUS
City of San Jose del Monte, Bulacan
Tel. No: (044) 815-4089
Email Address: sarmiento@bulsu.edu.ph
INFORMATION TECHNOLOGY AND DATA SCIENCE DEPARTMENT
Form 4
Final Defense Endorsement
Proponents / Researchers
1) Bacera, Aljhon A.
2) Corneja, Welson P.
3) Oliveros, Nikie A.
4) Quillo, Erica Mae J.
5) Santos, Mark Ivan C.
I. THE CAPSTONE PROJECT ENTITLED FAKE AND PHISHING WEBSITE DETECTOR
USING MACHINE LEARNING ALGORITHM HAS BEEN REVIEWED BY THE
UNDERSIGNED AND IS RECOMMENDED FOR FINAL DEFENSE ON NOVEMBER ,
2023.
Marlon Hernandez, DEng

Signature Over Printed Name
Technical Adviser
Noted:
DR. MARY GRACE G. HERMOGENES

Faculty, Caps 02
Approved:
MS. MARY ROSE C. COLUMBRES

IIT Department Head
Appendix F
IMRAD
Fake and Phishing Website Detector using Machine Learning Algorithm
Aljhon A. Bacera1, Welson P. Corneja2, Nikie A. Oliveros3, Erica Mae J. Quillo4, Mark Ivan C. Santos5, Marlon Hernandez6
1 Bulacan State University-Sarmiento Campus; Email: baceraaljhonardiente@gmail.com
2 Bulacan State University-Sarmiento Campus; Email: cornejawelson21@gmail.com
3 Bulacan State University-Sarmiento Campus; Email: nikie.oliveros.a@bulsu.edu.ph
4 Bulacan State University-Sarmiento Campus; Email: ericaquillo23@gmail.com
5 Bulacan State University-Sarmiento Campus; Email: markivan.santos.c@bulsu.edu.ph
6 Bulacan State University-Sarmiento Campus; Email: marlon.hernandez@bulsu.edu.ph
Abstract: Almost 90% of cyber-attacks start with phishing. Gullible people are the most likely to be deceived, and because phishing tactics keep evolving, phishing has become a significant problem worldwide. To prevent losses and damage, a reliable and effective phishing website detector is needed. Several approaches have been used, and machine learning is a good weapon against it.
In this project, different algorithms were considered, but based on the literature review Gradient Boost has the highest accuracy, so we decided to use it. To build the model, we used a dataset from Kaggle with 30 features. The Gradient Boost model we trained performs well, with 98.2% accuracy and low false negative and false positive rates. Even though the model has high accuracy, it still cannot keep up with new phishing tactics.

Keywords: phishing, Gradient Boost.

II. INTRODUCTION
As time goes by, technology improves, but so do scammers. Phishing is one of the top cyber threats globally: attackers lure people to malicious websites that mimic legitimate sites. Many people, especially the gullible and those who are not knowledgeable about technology, have been deceived, and both individuals and companies have lost a lot of assets. To prevent people from being deceived, a phishing site detector that uses a machine learning algorithm is a great weapon against it.

III. LITERATURE REVIEW

In this section we review some studies that use machine learning algorithms to detect phishing sites.

Vinay Kumar Kureel et al. (2022) compared different classifiers, namely Support Vector Machine, Random Forest, Decision Tree, and Gradient Boost Classifier, using 30 features. After implementation, the results show that the Gradient Boost Classifier gives better detection accuracy (97%), with the lowest false negative rate, than the other classifiers.

Dr. M Prasad and Ansifa Kouser (2023) proposed a phishing website detector using a machine learning algorithm, because it is a strong tool to oppose phishers and to stop the rapid advancement of phishing techniques. Gradient Boost Classifier is the model used to identify phishing websites based on URL significance. The finished work was a good anti-phishing system: the Gradient Boost Classifier has an accuracy of 97% with the lowest false positive rate, which is good for effectively detecting phishing websites.

IV. MATERIALS AND METHODS
The iterative model was used as the SDLC to gradually enhance a simpler system until it is complete. As the requirements become better understood, new functional capabilities are added at each iteration. The steps are:
• Gathering requirements. This is vital and includes identifying the factors or features of a phishing website, the datasets to be used (both legitimate and fake sites), the approach, and the best algorithm.
• Building a model, then training and testing it using the datasets. Thirty (30) features were used, drawn from Address Bar-based features, Abnormal-based features, HTML and JavaScript-based features, and Domain-based features. After finishing the model, we added it to our Flask application.
• Testing the system. If some features or functionalities are still needed, proceed to the next iteration.
Figure 2. Model train and test
This figure shows the code used to train and test the Gradient Boost model.
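To illustrate what Address Bar-based features can look like, the following is a minimal sketch that derives a few of them from a URL using only the Python standard library. The feature names, the -1/1 encoding (-1 = suspicious, 1 = looks legitimate), and the thresholds are assumptions made for illustration; the source does not list the exact 30 features or how they were computed.

```python
import ipaddress
from urllib.parse import urlparse


def has_ip_host(host: str) -> bool:
    """Return True when the host part of the URL is a raw IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def address_bar_features(url: str) -> dict:
    """Derive a few illustrative address-bar features from a URL.

    Hypothetical encoding: -1 = suspicious, 1 = looks legitimate.
    The thresholds (75 characters, more than 3 dots) are assumptions.
    """
    host = urlparse(url).hostname or ""
    return {
        "uses_ip_address": -1 if has_ip_host(host) else 1,
        "long_url": -1 if len(url) > 75 else 1,
        "has_at_symbol": -1 if "@" in url else 1,
        "hyphen_in_domain": -1 if "-" in host else 1,
        "many_subdomains": -1 if host.count(".") > 3 else 1,
    }


suspicious = address_bar_features("http://192.168.0.1/login@verify")
ordinary = address_bar_features("https://www.example.com/")
```

In a full pipeline, a vector like this would be extended to all 30 features and passed to the trained model.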
V. RESULT AND DISCUSSION

This section presents the findings and outcomes of the research.
Gradient Boost was used as the machine learning algorithm, with 30 attributes as features.
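For readers unfamiliar with how gradient boosting works, the sketch below is a deliberately simplified, from-scratch illustration: it repeatedly fits a one-split decision stump to the residuals (label minus predicted probability) and adds it to the running score. It is not the authors' code, and the source text does not state which implementation was used. The toy data, the three -1/1 features, the label convention (1 = phishing, 0 = legitimate), and all function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Find the one-feature split that best fits the residuals (squared error).

    Returns (feature, threshold, left_value, right_value), or None if no split exists.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, t, left_mean, right_mean)
    return None if best is None else best[1:]


def fit_gradient_boost(X, y, n_rounds=20, learning_rate=0.5):
    """Boost stumps on the log-loss residuals y - sigmoid(score)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [label - sigmoid(s) for label, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        j, t, left_value, right_value = stump
        stumps.append(stump)
        scores = [s + learning_rate * (left_value if row[j] <= t else right_value)
                  for row, s in zip(X, scores)]
    return stumps


def predict(stumps, row, learning_rate=0.5):
    """Return 1 (phishing) or 0 (legitimate) for one feature row."""
    score = sum(learning_rate * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical toy data: three -1/1 features per site; label 1 = phishing.
X = [[-1, 1, 1], [-1, -1, 1], [1, 1, -1], [1, 1, 1], [-1, 1, -1], [1, -1, 1]]
y = [1, 1, 0, 0, 1, 0]
model = fit_gradient_boost(X, y)
predictions = [predict(model, row) for row in X]
```

A production model would normally rely on a tested library implementation with deeper trees and a held-out test set; the sketch only shows the additive, residual-fitting idea.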
Figure 1. Datasets
This figure shows the dataset used to build the machine learning model.
Figure 3. Confusion Matrix
The confusion matrix shows that the model performs well, with an accuracy of 98.2%, computed as (TP+TN)/(TP+TN+FP+FN), and low false negative and false positive rates on both the training and test sets.
Figure 4. Recall, Precision, and F1-score
This figure shows that the recall, precision, and F1-score are high, which indicates that the Gradient Boost model performs well.
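The metrics discussed above can all be derived from the four confusion-matrix counts. The following is a small sketch of those formulas. The counts in the example are hypothetical and are not the values behind the reported 98.2% accuracy, which the source text does not list. The function assumes every denominator is non-zero.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute common metrics from confusion-matrix counts.

    Assumes phishing is the positive class and that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

For a phishing detector, the false negative rate is the share of phishing sites that slip through as legitimate, so it is worth reporting alongside accuracy.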
Figure 5. User Interface
This is the user interface of the finished product. There is an input field for the link to be assessed as phishing or not, and on the right side there is a table of phishing website factors and the percentage of the URL's legitimacy.
Figure 5.1
This is a sample of detecting an unsafe URL.
VI. CONCLUSION AND RECOMMENDATION

The model's performance is promising and it can detect phishing websites, but the tactics of phishers keep evolving, so this system might not accurately detect new phishing tactics. Future researchers might need to establish a new framework to keep up with the evolving nature of phishing tactics. Building a database and updating the datasets from time to time would also help. In addition, a browser inside the system is recommended for assessing a URL, to avoid clicking the URL while attempting to paste it into the input field.
REFERENCES
[1] Prasad, M. H. M. K., & M, A. (2023). Phishing website prediction using gradient boosting classifier. International Journal for Research in Applied Science and Engineering Technology, 11(7), 1329–1335. https://doi.org/10.22214/ijraset.2023.54854
https://www.irjet.net/archives/V9/i12/IRJET-V9I12228.pdf
[2] Suman, B., Chetan Kumar, P., & Praveen Kumar, P. (2017, March 3). Detecting Phishing Websites, a Heuristic Approach. International Journal of Latest Engineering Research and Applications (IJLERA), ISSN: 2455-7137. http://www.ijlera.com/papers/v2-i3/20.201703073.pdf
