
Undergraduate Final Year Project Handbook

COMSATS UNIVERSITY ISLAMABAD

DEPARTMENT OF COMPUTER SCIENCE


COMSATS University Islamabad, Vehari, Pakistan

Project Proposal
for
Detecting E Banking Phishing Websites Using Associative Classification PHP and
machine learning

By
Ali Raza CUI/FA16-BCS-003/VHR

M Faraz Saleem CUI/FA16-BCS-111/VHR

Supervisor
Mam Riffat Javed

Bachelor of Science in Computer Science (2016-2020)


Supervisor Signature

Date: 18-Oct-19

Table of Contents
Serial No  Content Name

1. Category
2. Abstract
3. Introduction
4. Problem Statements
5. Problem Solution for Proposed System
6. Related System Analysis/Literature Review
7. Advantages/Benefits of Proposed System
8. Module
9. Scope
10. Tools and Technologies
11. Software Process Methodology
12. System Limitations/Constraints
13. Project Stakeholders and Roles
14. Team Members Individual Tasks/Work Division
15. Data Gathering Approach
16. Concepts
17. Conclusion
18. References


Category: (B-Web Application/Web Application based Information System )

A- Desktop Application/Information System
B- Web Application/Web Application based Information System
C- Problem Solving and Artificial Intelligence
D- Simulation and Modeling
E- Smartphone Application
F- Smartphone Game
G- Networks
H- Image Processing
Other (specify category) ______________________


Abstract

Tremendous resources are spent by organizations guarding against and recovering from
cybersecurity attacks by online hackers who gain access to sensitive and valuable user data.
Many cyber infiltrations are accomplished through phishing attacks where users are tricked
into interacting with web pages that appear to be legitimate. In order to successfully fool a
human user, these pages are designed to look like legitimate ones. Since humans are so
susceptible to being tricked, automated methods of differentiating between phishing websites
and their authentic counterparts are needed as an extra line of defense. The aim of this research
is to develop these methods of defense utilizing various approaches to categorize websites.
Specifically, we have developed a system that uses machine learning techniques to classify
websites based on their URL. We used four classifiers: the decision tree, Naïve Bayesian
classifier, support vector machine (SVM), and neural network. The classifiers were tested with
a data set containing 1,353 real world URLs where each could be categorized as a legitimate
site, suspicious site, or phishing site. The results of the experiments show that the classifiers
were successful in distinguishing real websites from fake ones over 90% of the time.


1. Introduction
While cybersecurity attacks continue to escalate in both scale and sophistication, social engineering
approaches are still some of the simplest and most effective ways to gain access to sensitive or
confidential information. The United States Computer Emergency Readiness Team (US-CERT)
defines phishing as a form of social engineering that uses e-mails or malicious websites to solicit
personal information from an individual or company by posing as a trustworthy organization or
entity [1]. While organizations should educate employees about how to recognize phishing e-mails
or links to help protect against the above types of attacks, software such as HTTrack is readily
available for users to duplicate entire websites for their own purposes. As a result, even trained
users can still be tricked into revealing private or sensitive information by interacting with a
malicious website that they believe to be legitimate.
The above problem implies that computer-based solutions for guarding against phishing attacks are
needed along with user education. Such a solution would enable a computer to have the ability to
identify malicious websites in order to prevent users from interacting with them. One general
approach to recognizing illegitimate phishing websites relies on their Uniform Resource Locators
(URLs). A URL is a global address of a document in the World Wide Web, and it serves as the
primary means to locate a document on the Internet. Even in cases where the content of a website
is duplicated, the URLs could still be used to distinguish real sites from imposters.
One solution approach is to use a blacklist of malicious URLs developed by anti-virus groups. The
problem with this approach is that the blacklist cannot be exhaustive because new malicious URLs
keep cropping up continuously. Thus, approaches are needed that can automatically classify a new,
previously unseen URL as either a phishing site or a legitimate one. Such solutions are typically
machine-learning based approaches where a system can categorize new phishing sites through a
model developed using training sets of known attacks.
One of the main problems with developing machine-learning based approaches for this problem is
that very few training data sets containing phishing URLs are available in the public domain. As a
result, studies are needed that evaluate the effectiveness of machine-learning approaches based on
the data sets that do exist. This work aims to contribute to this need. Specifically, the goal of this
research is to compare the performance of the commonly used machine learning algorithms on the
same phishing data set. In this work, we use a data set, where features from the data URLs have
already been extracted, and the class labels are available. We have tested common machine
learning algorithms for the purpose of classifying URLs such as SVM, Naïve Bayes’ classifier,
decision tree, and neural network.
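To make the classification setting concrete, the following is a minimal sketch of one of the classifier families named above, a Naïve Bayes classifier, written in plain Python. It assumes that each URL has already been reduced to a small vector of categorical feature values (-1, 0, or 1) with one of three class labels. The training rows, the feature encoding, and all function names are invented for illustration and are not taken from the data set used in this work.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: each feature value is -1, 0 or 1, and each label is
# "phishing", "suspicious" or "legitimate". These rows are invented.
TRAIN = [
    ((-1, -1, 0), "phishing"),
    ((-1, 0, -1), "phishing"),
    ((-1, -1, -1), "phishing"),
    ((0, 0, 1), "suspicious"),
    ((0, 1, 0), "suspicious"),
    ((1, 1, 1), "legitimate"),
    ((1, 0, 1), "legitimate"),
    ((1, 1, 0), "legitimate"),
]
FEATURE_VALUES = (-1, 0, 1)


def train_naive_bayes(rows):
    """Count class frequencies and per-feature value frequencies per class."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    for features, label in rows:
        for index, value in enumerate(features):
            value_counts[(label, index)][value] += 1
    return class_counts, value_counts


def predict(model, features):
    """Return the class with the highest log posterior, using Laplace smoothing."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for index, value in enumerate(features):
            seen = value_counts[(label, index)][value]
            # Add-one smoothing so unseen values do not zero out the posterior.
            score += math.log((seen + 1) / (count + len(FEATURE_VALUES)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, (-1, -1, -1)))
print(predict(model, (1, 1, 1)))
```

A comparison of several classifiers would train each one on the same rows and evaluate all of them on the same held-out URLs.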

2. Problem Statement
What problem does your software solve?
Our system is developed to protect users from having their credentials stolen while buying
items or transferring money online. Some websites are not genuine: they only imitate the
interface of the original site in order to collect data without the user's permission. Our
system identifies these fake URLs and blocks the links.
Why are you developing this system?
We are developing this system because people need to feel protected when they buy products on
the internet. The system helps users detect which websites are stealing data and raises
awareness of phishing.
Does the same system already exist?
Yes. Some systems are available on the internet, but they are not as efficient as our system.
Our system gives users a way to secure their credentials against phishing sites.
What skills do you expect to learn from this project?
We expect this project to teach us much more, such as how to link a database with JavaScript,
PHP, CSS, etc. The project will also give us practice in creating different types of tables,
such as the employee, user, admin, and feedback tables.
3. Problem Solution for Proposed System

Blacklist method
This is the most commonly used approach. A list of known phishing URLs is stored in a database; if a
visited URL is found in the database, it is flagged as a phishing URL and a warning is shown,
otherwise it is treated as legitimate. The approach is easy to implement and fast, because it only
checks whether the URL is in the database. Its limitations are that a small change to a URL is enough
to bypass a list-based technique, and that the list must be updated frequently to counter new attacks.
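The blacklist lookup can be sketched as follows. This is a minimal illustration, assuming an in-memory set in place of the database table; the blacklist entries and the names `normalize` and `check_url` are hypothetical. Python is used here only for illustration and is not necessarily the implementation language of the proposed system.

```python
from urllib.parse import urlsplit

# Hypothetical blacklist entries; a real deployment would load these from the database.
BLACKLIST = {
    "secure-login.example.net/account",
    "bank-update.example.org/verify",
}


def normalize(url):
    """Reduce a URL to lowercase host plus path so trivial variations still match."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    return host + path


def check_url(url):
    """Return 'phishing' if the normalized URL is blacklisted, else 'legitimate'."""
    return "phishing" if normalize(url) in BLACKLIST else "legitimate"


print(check_url("http://WWW.Secure-Login.example.net/account/"))
print(check_url("http://secure-login.example.net/account2"))
```

The second call shows the limitation described above: changing a single character in the path is enough to miss the list, even though normalization catches differences in letter case, the `www.` prefix, and trailing slashes.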
Heuristic based method
This is an extension of the blacklist method. It can detect new attacks because it uses features
extracted from phishing sites to recognize a phishing attempt. Its limitations are that it cannot
detect all new attacks and that it is easier to bypass once the attacker knows the algorithm or the
features used. In addition, detection can be poor because a phishing site may or may not have the
common features.
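A heuristic check of this kind can be sketched with a handful of lexical URL features. The feature set, the thresholds (75 characters, more than three dots, two features firing), and the function names below are assumptions chosen for illustration, not the features of any particular published system.

```python
import re
from urllib.parse import urlsplit

IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url):
    """Compute a few illustrative lexical features of a URL (assumed feature set)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "uses_ip_address": bool(IPV4_PATTERN.match(host)),
        "has_at_symbol": "@" in url,
        "is_long": len(url) > 75,                # threshold is an assumption
        "many_subdomains": host.count(".") > 3,  # threshold is an assumption
        "no_https": parts.scheme != "https",
    }


def heuristic_score(url):
    """Count how many suspicious features are present."""
    return sum(extract_features(url).values())


def classify(url, threshold=2):
    """Flag the URL when at least `threshold` features fire (assumed cut-off)."""
    return "phishing" if heuristic_score(url) >= threshold else "legitimate"


print(classify("http://192.168.10.5/login@bank"))
print(classify("https://www.example.com/"))
```

Because the rules are fixed and visible, an attacker who knows them can craft a URL that stays under the threshold, which is the bypass risk noted above.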

Machine learning
This approach works efficiently on large datasets. It also removes the drawbacks of the existing
approaches and is able to detect zero-day attacks. Machine learning based classifiers have been
reported to achieve accuracy of more than 99%. Performance depends on the size of the training data,
the feature set, and the type of classifier. A limitation is that detection fails when the attacker
uses a compromised domain to host the phishing site.
Much research has been performed in the area of phishing detection. Most of it has worked on
improving the accuracy of phishing website detection using different classifiers. The classifiers
used include KNN, SVM, decision tree (DT), ANN, Naïve Bayes, PART, ELM, and random forest (RF).
Among all of these, according to our literature survey, the tree-based classifiers DT and RF perform
best as the dataset grows. Therefore, the proposed approach will be phishing website detection using
tree-based classifiers.
The performance measures used to identify the best algorithm include F-measure, precision, recall,
accuracy, AUC, and the ROC curve.
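Several of these measures follow directly from the counts of correct and incorrect predictions. The sketch below computes precision, recall, F-measure, and accuracy for a binary phishing/legitimate labelling; the label lists are invented toy data and the function names are hypothetical.

```python
def confusion_counts(actual, predicted, positive="phishing"):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual, predicted, positive="phishing"):
    """Compute precision, recall, F-measure and accuracy from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Invented example: 4 phishing and 6 legitimate sites, with two mistakes.
actual = ["phishing"] * 4 + ["legitimate"] * 6
predicted = ["phishing"] * 3 + ["legitimate"] + ["phishing"] + ["legitimate"] * 5
print(evaluate(actual, predicted))
```

With one missed phishing site and one false alarm, precision and recall are both 3/4 while accuracy is 8/10, which shows why accuracy alone can hide errors on the phishing class.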

4. Related System Analysis/Literature Review


One study proposed a classification approach that uses heuristic-based feature extraction. The
extracted features are classified into three categories: URL obfuscation features, third-party-based
features, and hyperlink-based features. The proposed technique achieves 99.55% accuracy. Its drawback
is that, because the model uses third-party features, classification of a website depends on the
speed of the third-party services. The model also depends heavily on the quality and quantity of the
training set, and broken-link feature extraction takes more execution time for websites with a
larger number of links.
Another study proposed an anti-phishing approach that extracts features on the client side only. The
approach is fast and reliable because it does not depend on third parties, but it extracts features
only from the URL and the source code. The paper reports 99.09% overall detection accuracy for
phishing websites. It concludes that the approach is limited to webpages written in HTML; non-HTML
webpages cannot be checked by this approach.


Table 1 Related System Analysis with proposed project solution


Application Name | Weakness | Proposed Project Solution
PHISHCOOP | It does not provide a good interface and lacks some other features. | Our system provides a good user interface and 5 GB of user data storage.
5. Advantages/Benefits of Proposed System
• This system can be used by many e-commerce websites to maintain good customer relationships.
• Users can make online payments securely.
• The data mining algorithm used in this system provides better performance than other
  traditional classification algorithms.
• With the help of this system, users can purchase products online without any hesitation.

6. Scope
Phishing is a way to obtain a user's private information via email or a website. Internet usage is
now vast, and almost everything is available online, whether shopping for clothes, electronic
gadgets, and crockery or paying mobile, TV, and electricity bills. Rather than standing in line
for hours, people are becoming aware of online methods. As a result, phishers have wide scope to
carry out phishing scams. Although a lot of research has been done in this area, no single
technique is enough to detect all types of phishing attack. As technology advances, phishing
attackers use new methods day by day. This motivates us to find an effective classifier for
phishing detection. Based on this, we consider tree-based classifiers in the machine learning
approach more suitable than the others.

Modules
1. Registration
2. Login
3. Add to Blacklist
4. Check Website
5. Feedback
6. Change Password

Explanation of a Module:

The system comprises 6 major modules as follows:

Module 1: Registration:
A visitor can register on the website to gain access to it.
Module 2: Login:
After successful registration, a user or admin may enter their credentials to log in to the
system.
Module 3: Add to Blacklist:
Here, the system administrator adds a malicious website to the blacklist.
Module 4: Check Website:
Here, the user checks whether a website is blacklisted by entering its URL.
Module 5: Feedback:
A user can send feedback regarding a website to the admin.


Module 6: Change Password:
The admin may change their password for security purposes by entering the old and new passwords.
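The Change Password flow can be sketched as follows. This is an illustrative outline only: it assumes passwords are stored as salted PBKDF2 hashes rather than plain text, uses an in-memory `Account` class as a stand-in for a row of the admin or user table, and is written in Python for brevity. The class, the function names, and the iteration count are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 from the standard library."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, digest):
    """Check a candidate password against the stored salt and digest."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


class Account:
    """In-memory stand-in for a row of the admin or user table."""

    def __init__(self, username, password):
        self.username = username
        self.salt, self.digest = hash_password(password)

    def change_password(self, old_password, new_password):
        """Replace the password only if the old one is correct and the new one differs."""
        if not verify_password(old_password, self.salt, self.digest):
            return False
        if old_password == new_password:
            return False
        self.salt, self.digest = hash_password(new_password)
        return True


admin = Account("admin", "old-secret")
print(admin.change_password("wrong-guess", "new-secret"))
print(admin.change_password("old-secret", "new-secret"))
```

Requiring the old password before accepting the new one means that someone who only has access to an unlocked session cannot silently take over the account.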

7. System Limitations/Constraints
• If the Internet connection fails, this system won’t work.
• All e-banking website related data will be stored in one place.

8. Software Process Methodology


Object Oriented Methodology:
It models real-world problems, and the work is easier to organize and compose when we are dealing
with objects.
9. Tools and Technologies

Table 2 Tools and Technologies for Proposed Project

Tools | Version | Rationale
MS Visual Studio | 2017 | IDE
MS SQL Server | 2017 | DBMS
Adobe Photoshop | CS6 | Design Work
MS Word | 2017 | Documentation
MS Power Point | 2017 | Presentation
Pencil | 2.0.5 | Mockups Creation

Technology | Version | Rationale
C# | 6.0 | Programming language
SQL | 2017 | Query Language
HTML | 5 | Web Development
CSS | 3 | Web Designing
PHP | 3.0 | Server-side scripting and database access


10. Project Stakeholders and Roles


Write down the project stakeholders and their roles.

Table 3 Project Stakeholders for Proposed Project

Project Sponsor:
All web applications and desktop applications should have real client.
Mention your project sponsor.
Default option will be: COMSATS University, Islamabad

Stakeholder:
Mention your stake holders with their roles and responsibilities.
Default option will be:
• Students names
• Project Supervisor Name: Mr./Miss …
• Final Year Project Committee: Evaluation of project

11. Team Members Individual Tasks/Work Division


Table 4 Team Member Work Division for Proposed Project

Student Name | Student Registration Number | Responsibility/Modules
Ali Raza | FA16-BCS-003 | Designing models, applying the algorithm to the database, and interconnecting these models.
M Faraz Saleem | FA16-BCS-111 | Documentation and help in designing page templates and database tables.


12. Data Gathering Approach


- Using the internet
- Through interviews
- Through questions

13. Concepts
Through this project we learn SQL, Photoshop, JavaScript, HTML, CSS, and PHP.

Conclusion
Phishing is a way to obtain a user's private information via email or a website. Internet
usage is now vast, and almost everything is available online, whether shopping for clothes,
electronic gadgets, and crockery or paying mobile, TV, and electricity bills. Rather than
standing in line for hours, people are becoming aware of online methods. As a result,
phishers have wide scope to carry out phishing scams. Although a lot of research has been
done in this area, no single technique is enough to detect all types of phishing attack.
As technology advances, phishing attackers use new methods day by day. This motivates us
to find an effective classifier for phishing detection. Based on this, we consider
tree-based classifiers in the machine learning approach more suitable than the others.
References
www.stackoverflow.com/

www.slideshare.com

www.githut.com
