FINAL MAJOR PROJECT FILE

“Phishing Link Detection
System”
Major project- Report
Submitted by: -
Aditya Yadav 0206IS201006
Aryan Jha 0206IS201015
Pooja Soni 0206IS201045
Rashi Nagaich 0206IS201050
Vishal Kumar 0206IS201067
Yash Kumar Singh 0206IS201069
In Partial Fulfillment For The Award Of The Degree
Of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE & ENGINEERING – IOT and CYBER SECURITY

INCLUDING BLOCKCHAIN
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GYAN GANGA INSTITUTE OF TECHNOLOGY & SCIENCES

JABALPUR (M.P.)
RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA,
BHOPAL (M.P.)
Dec- 2023
CERTIFICATE
This is to certify that the Major Project Report entitled “Phishing Link
Detection System” submitted by Aditya Yadav, Aryan Jha, Pooja Soni, Rashi
Nagaich, Vishal Kumar, Yash Kumar Singh has been carried out under my
guidance & supervision. The project report is approved for submission towards
partial fulfillment of the requirement for the award of degree of BACHELOR OF
TECHNOLOGY in COMPUTER SCIENCE & ENGINEERING from RAJIV
GANDHI PROUDYOGIKI VISHWAVIDYALAYA, BHOPAL (M.P.).
Prof. Satendra Sonare                    Dr. Ashok Kumar Verma
Guide                                    HOD
Dept. of Computer Science and            Dept. of Computer Science and
Engineering                              Engineering
DECLARATION
We hereby declare that the project report entitled “Phishing Link Detection
System” which is being submitted in partial fulfillment of the requirement for
award of the Degree of Bachelor of Technology in Computer Science and
Engineering to “RAJIV GANDHI PROUDYOGIKI VISHWAVIDYALAYA,
BHOPAL (M.P.)” is an authentic record of our own work done under the
guidance of Prof. Satendra Sonare, Department of Computer Science &
Engineering, GYAN GANGA INSTITUTE OF TECHNOLOGY &
SCIENCES, JABALPUR.
The matter reported in this report has not been submitted earlier for the
award of any other degree.
Date:
Place: JABALPUR
ACKNOWLEDGEMENT
We sincerely express our indebtedness to our esteemed and revered guide,
Prof. Satendra Sonare, Computer Science and Engineering Department, for his
invaluable guidance, supervision and encouragement throughout the work. Without
his kind patronage and guidance the synopsis would not have taken shape.
We take this opportunity to express our deep sense of gratitude to Dr. Ashok
Kumar Verma, Dean of Computer Science and Engineering, for his
encouragement and kind approval. We also thank him for providing the computer
lab facility. We would like to express our sincere regards to him for his advice and
counseling from time to time.
We owe sincere thanks to all the faculty members in the Department of Computer
Science and Engineering for their advice and counseling from time to time.
Date:
Place: JABALPUR
Aditya Yadav
Aryan Jha
Pooja Soni
Rashi Nagaich
Vishal Kumar
Yash Kumar Singh
TABLE OF CONTENTS
Abstract
1. Introduction
1.1 Purpose of the Project
1.2 Scope of the Project
1.3 Project and Product Overview
1.4 Design Goals
1.5 Intended Audience
1.6 Team Architecture
1.7 Survey of Technology
1.8 Overall Description
1.9 Project Modules and Timeline
2. Problem Statement
2.1 Selection of Project
2.2 System Requirement
2.3 Existing System
2.4 Drawbacks of Existing System
2.5 Proposed System
2.6 Advantages of Proposed System
2.7 Limitation of Proposed System
2.8 Application
3. Specific Requirements
3.1 User Interface
3.2 Hardware Interface
3.3 Software Interface
3.4 Communication Interface
3.5 Non Functional Requirements
3.6 Software System Attributes
4. Software Process Model
4.1 Determining Project Feasibility
4.2 Agile
4.3 Dev-ops Technology
5. Overall Implementation Design
5.1 About the Frontend
5.2 About the Backend
6. List of Figures
7. Future Enhancement
8. Conclusion
9. Reference
ABSTRACT
In an era where cyber security threats continue to escalate, the detection of phishing links
has become imperative to safeguard online users and organizations. This report delves into the
innovative application of Artificial Intelligence (AI) and Machine Learning (ML) techniques for
the identification and mitigation of phishing attacks through the analysis of malicious URLs. The
study involves the collection of diverse datasets encompassing both phishing and legitimate
URLs, followed by rigorous data pre-processing and feature extraction. Leveraging state-of-the-
art ML algorithms, the model is trained to discern patterns indicative of phishing, ultimately
enhancing the accuracy and efficiency of detection mechanisms. The evaluation metrics,
including accuracy, precision, recall, and F1 score, provide a comprehensive assessment of the
model's performance. Through this research, we aim to contribute to the ongoing efforts in
fortifying online security, demonstrating the potential of ML in mitigating the evolving threat
landscape posed by phishing attacks. The findings not only showcase the efficacy of the
proposed approach but also pave the way for future advancements in adaptive and proactive
cybersecurity measures.
1. Introduction
1.1 Purpose of the Project
The purpose of the project, "Phishing Link Detection Using AI and ML," is to enhance
and fortify cybersecurity measures in the face of escalating phishing threats. Phishing attacks,
often camouflaged within seemingly legitimate URLs, pose a significant risk to individuals and
organizations alike. The primary objectives of this project include:
1. Identification of Phishing Links:
- Develop an advanced system capable of accurately distinguishing between phishing and
legitimate URLs through the utilization of Artificial Intelligence and Machine Learning
techniques.
2. Data-Driven Analysis:
- Conduct a comprehensive analysis of diverse datasets containing examples of both phishing
and legitimate URLs to extract meaningful features and patterns that contribute to the
identification process.
3. Model Training and Optimization:
- Implement and train machine learning models on the curated datasets, optimizing their
performance to achieve high accuracy and efficiency in the detection of phishing links.
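The feature extraction step named in the objectives above can be illustrated with a minimal sketch. The report does not list the features it uses, so the feature names and the SUSPICIOUS_WORDS list below are assumptions chosen for illustration, not the project's actual feature set.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list; the report does not specify one.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str) -> dict:
    """Turn a URL string into a small dictionary of numeric features."""
    # Add a scheme when it is missing so that urlparse can find the host.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(bool(IPV4_HOST.match(host))),
        "uses_https": int(parsed.scheme == "https"),
        "num_suspicious_words": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    print(extract_features("http://192.168.10.5/secure-login/verify?id=1"))
```

Each URL becomes one numeric row, which is the form a machine learning model expects as input.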
1.2 Scope of the Project

The "Phishing Link Detection Using AI and ML" project is a critical response to the
escalating threat of phishing attacks in our digitally interconnected world. By harnessing the
power of Artificial Intelligence (AI) and Machine Learning (ML), the project aims to create an
advanced system capable of accurately identifying phishing links amidst the vast sea of online
data.
The project's scope begins with the meticulous collection of diverse datasets, representing
both phishing and legitimate URLs. Through data-driven analysis, the team extracts features
that serve as key indicators in the identification of deceptive links. The implementation phase
involves training machine learning models using the Extra Trees (extremely randomized trees)
classifier and optimizing them to achieve high accuracy and efficiency in distinguishing
phishing from legitimate URLs.
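As a rough illustration of the training step, the sketch below fits scikit-learn's ExtraTreesClassifier on a handful of hand-made feature rows. The feature columns, the toy data, and the hyperparameters are assumptions made for the example; the report does not state the dataset or the settings actually used.

```python
from sklearn.ensemble import ExtraTreesClassifier

# Toy, hand-made rows: [url_length, dots_in_host, has_at_symbol, host_is_ip]
# Labels: 1 = phishing, 0 = legitimate. Real training needs a large labeled dataset.
X_train = [
    [95, 4, 1, 0],
    [120, 5, 0, 1],
    [88, 3, 1, 1],
    [22, 1, 0, 0],
    [30, 2, 0, 0],
    [18, 1, 0, 0],
]
y_train = [1, 1, 1, 0, 0, 0]

# random_state fixes the randomized splits so that runs are repeatable.
model = ExtraTreesClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)

# predict_proba returns one [P(legitimate), P(phishing)] row per input.
new_url_features = [[110, 4, 1, 1]]
phishing_probability = model.predict_proba(new_url_features)[0][1]
print(f"Estimated phishing probability: {phishing_probability:.2f}")
```

In practice the rows would come from a feature extraction step applied to the collected phishing and legitimate URLs, with part of the data held out for evaluation.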
Evaluation metrics, such as accuracy and precision, validate the effectiveness of the models
under real-world conditions. Beyond technicalities, the project extends its reach to educational
initiatives, fostering awareness and empowering users to recognize and thwart phishing attempts.
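The evaluation metrics named in this report (accuracy, precision, recall, and F1 score) can be computed directly from predicted and true labels. The following is a minimal sketch; the label lists are made-up illustration data, not results from the project.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = phishing)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

Precision reflects how many flagged URLs were truly phishing (false positives lower it), while recall reflects how many phishing URLs were caught (false negatives lower it).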
1.3 Project and Product Overview
The "Phishing Link Detection Using AI and ML" project endeavors to fortify cybersecurity by
developing a Phishing Detection System. This comprehensive product integrates cutting-edge
technologies, leveraging AI and ML algorithms, to identify and mitigate phishing links in real-
time. The project involves meticulous data collection, feature extraction, and model training,
culminating in the creation of a robust system capable of adapting to evolving phishing tactics.
The Phishing Detection System offers real-time alerts, a user-friendly interface, and adaptive
security measures, seamlessly integrating into existing infrastructures. Beyond technical
sophistication, the product includes educational features to empower users in recognizing and
thwarting phishing attempts, making it a holistic solution to the persistent threat of phishing
attacks in the dynamic digital landscape.
1.4 Design Goals

The design goals for the "Phishing Link Detection Using AI and ML" project and its associated
product, the Phishing Detection System, are crafted to ensure a comprehensive, effective, and
user-centric solution to address the escalating threat of phishing attacks. The key design goals
include:
1. Accuracy and Reliability:
- Develop machine learning models that achieve high accuracy in distinguishing between
phishing and legitimate links, minimizing false positives and false negatives to enhance the
reliability of the system.
2. Real-time Detection:
- Enable the Phishing Detection System to provide real-time alerts, ensuring prompt responses to
potential threats and minimizing the window of vulnerability for users and organizations.
3. Scalability:
- Design the solution to be scalable, accommodating increasing volumes of data and user traffic,
and capable of handling the evolving complexities of phishing attacks.
4. Continuous Improvement Mechanism:
- Establish a mechanism for continuous improvement, including regular updates and patches to
adapt to new threats, improve system performance, and incorporate user feedback.
1.5 Intended Audience
The intended audience for the "Phishing Link Detection Using AI and ML" project and its
associated product, the Phishing Detection System, encompasses a diverse range of stakeholders
with varying expertise and responsibilities in the fields of cyber security, technology, and user
education. The primary target audiences include:
1. Cyber security Professionals:
- Security analysts, experts, and professionals involved in securing digital environments and
networks.
- Individuals responsible for implementing and managing cybersecurity solutions within
organizations.
2. Machine Learning Practitioners:
- Data scientists, machine learning engineers, and researchers interested in the application of
AI and ML techniques in cyber security.
3. IT Administrators and System Integrators:
- Professionals responsible for the integration, deployment, and maintenance of cybersecurity
systems within organizational IT infrastructures.
4. Educators and Trainers:
- Professionals involved in cyber security education and training programs, aiming to
incorporate the project's findings into academic curricula or training materials.
5. End-Users:
- Individuals who use digital platforms and are potential targets of phishing attacks.
- Everyday users who can benefit from the protection provided by the Phishing Detection
System and educational features.
1.6 Team Architecture

The development of the "Phishing Link Detection Using AI and ML" project involves a collaborative
effort from a skilled team, with each member contributing expertise to specific aspects of the
project. The team is divided into three groups: the Frontend Team, the Backend Team, and the
Integration Team.
Frontend Team:
1. Aryan Jha – User Interface (UI):
• Design an intuitive dashboard for security analysts to interact with the system.
• Display real-time analytics related to detected phishing links.
2. Pooja Soni – CSS:
• Style the application using CSS.
• Offer different visual themes to cater to user preferences and enhance the overall
user experience.
Backend Team:
3. Aditya Yadav (Team Leader) – Data Ingestion:
• Develop a module for collecting and preprocessing data, including URLs and associated metadata.
• Ensure secure handling and storage of sensitive information.
4. Rashi Nagaich – Evaluation and Implementation:
• Implement routines for regularly evaluating the performance of the machine learning model.
• Implement the required code.
Integration Team:
5. Vishal Kumar – Machine Learning Model:
• Implement the AI/ML model for phishing link detection, trained on labeled datasets.
• Integrate the model into the backend to analyze incoming URLs.
6. Yash Kumar Singh – APIs for Model Access:
• Create APIs to expose the ML model's functionality for real-time URL analysis.
Team Collaboration:
• Regular Meetings: The teams hold regular meetings to discuss progress, address
challenges, and synchronize efforts between frontend and backend development.
• Version Control: The team uses a version control system, such as Git, to manage codebase
changes and keep the development process cohesive and well coordinated.
• Testing and Quality Assurance: The team collaborates on testing strategies, with each
member responsible for testing their respective components, to ensure
comprehensive quality assurance throughout the development lifecycle.
• Agile Methodology: The team adopts agile methodologies to facilitate iterative development,
allowing for flexibility in response to changing requirements.
• Communication Channels: The team establishes efficient communication channels, such as
messaging platforms or project management tools, to facilitate real-time collaboration
and quick issue resolution.
1.7 Survey of Technology

The Phishing Link Detection System capitalizes on a meticulously chosen array of technologies,
harmoniously working together to enhance the system's functionality, performance, and user
experience. The examined technology stack spans both frontend and backend components,
guaranteeing a unified and scalable approach to development.
Frontend Technologies:
1. HTML:
• HTML (Hypertext Markup Language) is the standard language for creating web
pages and web applications. In the context of phishing link detection, HTML can
be used to analyze and understand the structure and content of a web page.
2. CSS:
• Purpose: Utility-first CSS framework for streamlined and customizable styling.
• Benefits: Rapid development, consistent design, and responsiveness across
various screen sizes.
Backend Technologies:
3. API (Application Programming Interface):
• Purpose: Defines endpoints for communication between frontend and backend
components.
• Benefits: Enables a modular and scalable architecture, facilitating communication
between different parts of the application.
Collaboration and Development Tools:
4. Git (Version Control):
• Purpose: Manages collaborative coding efforts, tracks changes, and facilitates
version control.
• Benefits: Coordination among team members, codebase management, and easy
rollback to previous versions.
5. Agile Methodology:
• Purpose: Facilitates iterative development, ensuring adaptability to changing
requirements.
• Benefits: Incremental progress, regular feedback loops, and continuous
improvement.
TIER ARCHITECTURE.
The tier architecture of the Phishing Link Detection System encompasses a well-organized
distribution of components across three primary tiers: the Presentation Tier, the Application
(or Logic) Tier, and the Data Tier. This architectural framework is designed to uphold
modularity, scalability, and streamlined communication between distinct layers of the
application.
1. Presentation Tier:
• Responsibility:
- Handles user interface, user interaction, and presentation of information.
• Components:
1. CSS (Styling):
- Provides utility-first styling for a consistent and visually appealing user
interface.
- Ensures responsive design across various devices.
2. Application (Logic) Tier:
• Responsibility:
- Manages the application logic, processes user requests, and orchestrates
communication between the frontend and backend.
• Components:
1. Flask (Backend Framework):
- Handles routing, middleware, and server-side logic.
- Communicates with the frontend to process user requests and deliver responses.
2. API (Application Programming Interface):
- Defines and manages endpoints for communication between frontend and
backend.
- Orchestrates the flow of data and functionality.
3. Data Tier:
• Responsibility:
- Manages data storage, retrieval, and database-related operations.
Additional Considerations:
• Git (Version Control):
- Not tied to a specific tier but crucial for version control and collaboration.
Advantages of Tier Architecture:
1. Modularity:
• Enables the separation of concerns, making it easier to manage, maintain, and
scale different components.
2. Scalability:
• Allows for the independent scaling of tiers based on demand, optimizing
performance.
3. Flexibility:
• Provides flexibility in choosing and updating individual components without
affecting the entire system.
4. Security:
• Facilitates the implementation of security measures at different tiers, ensuring a
layered security approach.
5. Maintainability:
• Simplifies debugging, testing, and updates as each tier has a specific and defined
role.
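The separation between the three tiers can be sketched in a framework-independent way. The sketch below is an assumption-laden illustration, not the project's code: VerdictStore, classify_url, and handle_check_request are hypothetical names, the in-memory dictionary stands in for a real database, and the single rule in classify_url stands in for the trained model. In a Flask deployment, a route function would be expected to pass the request body to a handler of this kind.

```python
import json


# --- Data tier: hypothetical in-memory store standing in for a database ---
class VerdictStore:
    def __init__(self):
        self._verdicts = {}

    def save(self, url, verdict):
        self._verdicts[url] = verdict

    def get(self, url):
        return self._verdicts.get(url)


# --- Application tier: request validation and detection logic ---
def classify_url(url):
    """Placeholder rule standing in for the trained model."""
    return "phishing" if "@" in url else "legitimate"


def handle_check_request(body, store):
    """Validate a JSON request body and return (response_dict, http_status)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"error": "body must be valid JSON"}, 400
    url = payload.get("url") if isinstance(payload, dict) else None
    if not isinstance(url, str) or not url.strip():
        return {"error": "field 'url' is required"}, 400
    url = url.strip()
    verdict = store.get(url)  # reuse a stored verdict when one exists
    if verdict is None:
        verdict = classify_url(url)
        store.save(url, verdict)
    return {"url": url, "verdict": verdict}, 200


# --- Presentation tier: what a browser-side form would send and display ---
if __name__ == "__main__":
    store = VerdictStore()
    print(handle_check_request('{"url": "http://user@bank.example"}', store))
```

Because each tier only talks to the one next to it, the placeholder rule can later be swapped for a trained model, or the dictionary for a database, without touching the other tiers.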
1.8 Overall Description
The "Phishing Link Detection System" is an advanced and user-focused cybersecurity application
aimed at transforming the way individuals identify and protect themselves from phishing attacks
online. Leveraging state-of-the-art technologies and a modular architecture, the project aims to deliver
a personalized, secure, and vigilant experience in detecting and thwarting phishing attempts.
Key Features:
1. Phishing Link Detection:

• Users can customize their phishing detection preferences, creating a personalized system tailored
to their security needs.
• Real-time updates ensure users receive immediate alerts about the latest and most relevant
phishing threats.
2. Interactive User Features:
• The application includes a robust set of interactive features, such as reporting, voting, and
sharing information about phishing attempts.
• Users can actively participate in building a community defense, sharing insights, and collectively
combating phishing attacks.
3. Secure Authentication and Data Protection:
• Implements secure authentication mechanisms using JWT, ensuring the confidentiality of user
credentials.
• User data, including threat reports and preferences, is securely stored and managed, with
sensitive information encrypted for enhanced security.
4. Third-Party Threat Intelligence Integration:
• Dynamically fetches threat intelligence data from external sources, providing a comprehensive
and real-time view of emerging phishing threats.
• Enhances the system's capabilities by incorporating external threat data for a more robust defense
against evolving phishing tactics.
5. Responsive Design and Intuitive Navigation:
• Utilizes CSS for a responsive design, ensuring a seamless user experience across devices.
• Intuitive navigation allows users to explore different sections of the application effortlessly,
empowering them to stay vigilant against phishing threats.
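The JWT-based authentication mentioned under "Secure Authentication and Data Protection" can be illustrated with a minimal HS256 sketch built only on the Python standard library. The function names, the claims used (sub, exp), and the one-hour lifetime are assumptions for illustration; a production system would more likely rely on a maintained JWT library and proper secret management.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(user_id: str, secret: bytes, lifetime_seconds: int = 3600) -> str:
    """Build a signed token of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "exp": int(time.time()) + lifetime_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes):
    """Return the payload dict if the token is valid and unexpired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # compare_digest avoids leaking information through timing differences.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(payload, dict) or payload.get("exp", 0) < time.time():
        return None
    return payload
```

A token of this kind is signed, not encrypted: anyone can read the payload, so it should carry an identifier rather than credentials or other sensitive data.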
• Tiered Architecture:
- Divides the application into Presentation, Application, and Data Tiers for modularity, scalability,
and efficient communication.
• Technology Stack:
- Frontend development is powered by CSS for a dynamic and visually appealing user
interface.
- Backend functionality is implemented using the Flask framework.
User-Centric Design:
• Profile Customization:
- Users can create and manage personalized profiles, tailoring their phishing
detection experience.
• Community Interaction:
- Interactive features such as comments and voting foster community engagement, encouraging
discussions and the exchange of opinions.
Collaboration and Agile Development:
• Multidisciplinary Team:
- The project is executed by a collaborative team with members specializing in frontend,
backend, security, and database development.
• Agile Methodology:
- Adopts agile methodologies for iterative development, ensuring adaptability to changing
requirements and continuous improvement.
Future Enhancements:
• The project lays the groundwork for future enhancements, with the potential for feature
expansions, improved personalization algorithms, and additional integrations to further enrich
the user experience.
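The third-party threat intelligence integration listed among the key features of this section could, in its simplest form, amount to checking a URL's host against a feed of known-bad hostnames. The sketch below assumes a plain-text feed with one hostname per line and "#" comments; that format and the function names are illustrative assumptions, since the report does not name a specific feed or API.

```python
from urllib.parse import urlparse


def parse_feed(feed_text: str) -> set:
    """Parse a newline-separated feed of hostnames, ignoring blanks and # comments."""
    hosts = set()
    for line in feed_text.splitlines():
        line = line.strip().lower()
        if line and not line.startswith("#"):
            hosts.add(line)
    return hosts


def is_blocklisted(url: str, blocked_hosts: set) -> bool:
    """True if the URL's host, or any parent domain of it, is in the blocklist."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Check "login.evil.example", then "evil.example", then "example".
    return any(".".join(labels[i:]) in blocked_hosts for i in range(len(labels)))


if __name__ == "__main__":
    # Hypothetical feed content for illustration only.
    feed = "# sample feed\nevil.example\n\nBad-Site.test\n"
    blocked = parse_feed(feed)
    print(is_blocklisted("http://login.evil.example/reset", blocked))
    print(is_blocklisted("https://example.org/", blocked))
```

A blocklist lookup only catches hosts that have already been reported, which is why it complements rather than replaces the machine learning model.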
1.9 Project Modules
1. API Integration Module:
• Objective: Fetches diverse and up-to-date threat intelligence data from external sources.
• Key Features:
- Integration with an external API.
- Seamless updating of the application's threat data.
2. Backend Logic and API Module:
• Objective: Manages server-side logic and communication with the frontend.
• Key Features:
- Flask for routing and middleware.
- API definition and management for frontend-backend communication.
- Integration with other modules for cohesive functionality.
3. Responsive Design and UI/UX Module:
• Objective: Ensures a visually appealing and user-friendly interface across devices.
• Key Features:
- CSS for responsive and consistent styling.
- Intuitive navigation and user interface design.
DURATION:
Phase 1: Project Initiation and Planning (Week 1-2)
1. Week 1: Project Kickoff
• Conduct a kickoff meeting to introduce the team and establish project goals.
• Define roles and responsibilities.
• Set up communication channels and collaboration tools.
2. Week 2: Requirement Gathering and Analysis
• Engage with stakeholders to gather detailed project requirements.
• Analyze requirements to identify technical specifications and dependencies.
• Document user stories, features, and system specifications.
Phase 2: Design and Architecture (Week 3-5)
3. Week 3: Architectural Design
• Define the overall architecture, including tiered structure and technology stack.
• Allocate responsibilities to team members based on their expertise.
• Review and finalize the architecture with the entire team.
4. Week 4: UI/UX Design and Wireframing
• Begin the design of the user interface, considering responsiveness and user
experience.
• Create wireframes for key application screens.
• Collect feedback from stakeholders and iterate on design elements.
5. Week 5: Detailed Design and Database Schema
• Develop detailed design specifications for each module.
• Finalize the database schema, ensuring efficiency and scalability.
• Review design documents with the team for consensus.
Phase 3: Development (Week 6-10)
6. Week 6-8: Frontend and Backend Development
• Start frontend development using HTML and CSS.
• Simultaneously, begin backend development with Flask and the API.
7. Week 9: API Integration
• Integrate the API for fetching diverse and real-time threat intelligence data.
• Ensure seamless communication between the application and the external API.
Phase 4: Testing and Refinement (Week 11-13)
8. Week 12: Testing and Debugging
• Perform comprehensive testing, including functional, usability, and performance
testing.
• Address bugs, issues, and inconsistencies identified during testing.
9. Week 13: Optimization and Performance Tuning
• Optimize code for performance and responsiveness.
• Conduct load testing to ensure the application can handle expected user traffic.
• Refine UI/UX based on feedback from testing phases.
Phase 5: Deployment and Launch (Week 14-15)
10. Week 14: Final Preparations
• Conduct final checks and validations.
• Set up hosting environment for production deployment.
11. Week 15: Deployment and Launch
• Deploy the application to the production environment.
• Monitor for any issues post-deployment and address them promptly.
Phase 6: Post-Launch and Iteration (Ongoing)
12. Ongoing: Monitoring and Maintenance
• Implement monitoring tools to track application performance.
• Address any post-launch issues promptly.
• Plan for regular maintenance and updates based on user feedback and evolving
requirements.
2. Problem Statement:
2.1 Selection of Project
• Relevance:
Phishing attacks are prevalent and cause significant damage. Creating a detection system
contributes to combating this cyber security threat.
• Learning Opportunity:
The project allows the team to delve into various aspects of cyber security, machine learning,
data analysis, and system design, gaining practical skills in these areas.
• Impact:
A successful system could have a tangible impact by safeguarding users' sensitive
information and preventing potential financial or data loss.
• Innovation:
Developing a novel approach or improving existing methods in phishing detection can be
innovative and contribute to the field of cyber security.
• Career Prospects:
Projects in cyber security often attract attention from potential employers, showcasing the
team's abilities in an area with high demand for skilled professionals.
2.2 System Requirement
Hardware Requirements:
1. Server Infrastructure:
• Multiple servers for load balancing and redundancy.
• Sufficient processing power and memory to handle concurrent user requests.
• Storage capacity for application files, user data, and threat data.
2. Network Infrastructure:
• High-speed internet connectivity to ensure quick data transfer.
• Load balancers for distributing incoming traffic across multiple servers.
Software Requirements:
3. Operating System:
• For Servers: Linux-based operating system (e.g., Ubuntu, CentOS) for stability
and security.
• For Development: Compatible with Windows, macOS, and Linux
development environments.
4. Security Tools:
• SSL/TLS certificates for secure data transmission.
• Security protocols and firewalls to protect against unauthorized access.
Environmental Requirements:
5. Development Tools:
• Git for version control.
• Integrated Development Environment (IDE) compatible with Python and Flask
(e.g., Visual Studio Code).
Performance and Scalability:
6. Load Testing Tools:
• Load testing tools (e.g., Apache JMeter) to simulate high traffic conditions.
• Performance monitoring tools for tracking system behavior under different loads.
Compatibility:
7. Cross-Browser Compatibility:
• Compatibility with major web browsers (e.g., Chrome, Firefox, Safari, Edge).
8. Responsive Design:
• Responsive design to ensure a consistent user experience across various devices
(desktops, laptops, tablets, smartphones).
2.2.1 Usage
The web forms should be self-explanatory and usable. We do not want prospective clients
dropping off the website because they cannot understand the forms or find them cumbersome.
2.3 Existing System:

The current system for phishing link detection relies on manual processes. Users are required to
manually identify and report phishing links, lacking an automated mechanism for timely and
efficient detection. This manual approach poses challenges in terms of scalability, accuracy, and
real-time responsiveness. Additionally, the system lacks proactive measures to adapt to evolving
phishing tactics and does not provide a comprehensive analysis of diverse data sets to extract
meaningful features for accurate identification. There is no automated training of machine
learning models, and the absence of evaluation metrics makes it challenging to assess the
system's performance objectively. In essence, the existing system falls short in providing a
robust, automated, and adaptive solution for the effective detection of phishing links.
Key Features of the Existing System:
The key features of the existing phishing link detection system include:
1. Manual Identification and Reporting: Users play a central role in identifying and reporting
phishing links, contributing to the system's data input.
2. Lack of Automation: The system relies on manual processes, lacking automated mechanisms
for the efficient and timely detection of phishing links.
3. Scalability Challenges: Due to its manual nature, the existing system may face challenges in
handling large volumes of data and scaling to meet increasing demands.
4. Absence of Real-Time Responsiveness: The system may not provide real-time responses to
emerging phishing threats, potentially leaving users and organizations vulnerable for
extended periods.
5. Limited Adaptability: There is a lack of proactive measures to adapt to evolving phishing
tactics, potentially making the system less effective against new and sophisticated attacks.
6. Insufficient Analysis of Diverse Datasets: The system may not conduct a comprehensive
analysis of diverse datasets containing both phishing and legitimate URLs, limiting its ability
to extract meaningful features for accurate identification.
7. No Automated Model Training: The absence of automated training for machine learning
models hinders the system's ability to continuously improve and stay updated with the latest
threat landscape.
8. Lack of Evaluation Metrics: The existing system may not employ evaluation metrics such as
accuracy, precision, recall, and F1 score, making it challenging to objectively assess its
performance.
In summary, the current system relies heavily on manual efforts, which may result in limitations
related to scalability, real-time responsiveness, adaptability, and overall effectiveness in
detecting phishing links.
2.4 Drawbacks of Existing System:
The existing phishing link detection system faces several drawbacks that limit its
effectiveness and user satisfaction:
1. Manual Intervention: Heavy reliance on manual identification and reporting makes the
system labor-intensive, prone to human error, and less efficient in detecting phishing
links promptly.
2. Limited Automation: Lack of automated processes hampers the system's ability to adapt
to the dynamic and rapidly evolving nature of phishing attacks, leading to delays in
response and increased vulnerability.
3. Scalability Issues: The manual nature of the system may struggle to handle a growing
volume of data, making it challenging to scale effectively to meet increasing demands
and data complexities.
4. Delayed Responses: Without real-time detection and response capabilities, the system
may fail to provide timely alerts, allowing phishing threats to persist for extended periods
before mitigation measures are implemented.
5. Inability to Learn and Improve: The absence of automated model training means the
system may not learn from new data and adapt its detection mechanisms over time,
resulting in a lack of continuous improvement.
6. Limited Analysis of Diverse Datasets: The system may not conduct thorough analysis of
diverse datasets, limiting its ability to identify emerging patterns and characteristics of
phishing links effectively.
7. Lack of Comprehensive Evaluation Metrics: The absence of evaluation metrics such as
accuracy, precision, recall, and F1 score makes it difficult to objectively measure and
assess the system's performance.
8. Vulnerability to Sophisticated Attacks: The static nature of the system may render it less
effective against sophisticated and evolving phishing tactics, as it may not proactively
adjust its detection strategies.
9. Dependency on User Reporting: Relying solely on users for reporting phishing links may
lead to underreporting or delays in identification, reducing the overall effectiveness of
the system.
Addressing these drawbacks would likely involve implementing more advanced, automated,
and adaptive technologies, such as machine learning and artificial intelligence, to enhance
the system's capabilities in phishing link detection.
2.3 Proposed System

A proposed phishing link detection system could integrate various approaches to
mitigate existing drawbacks:
Hybrid Detection Models:

Combine heuristic-based approaches with machine learning techniques to minimize
false positives and effectively adapt to evolving phishing tactics.
Real-Time Updates:
Implement mechanisms for continuous updates from various sources, ensuring the system
remains current and capable of identifying new threats promptly.
Behavioral Analysis:
Include features to analyze user behavior patterns, such as click patterns or browsing
habits,
to enhance the accuracy of link classification.
User-Friendly Alerts:
Design intuitive and informative alerts or warnings for users, ensuring that genuine
websites
are not mistaken as phishing links and maintaining user trust.
Scalability and Efficiency:

Develop algorithms and systems that are resource-efficient, allowing for scalability
without a significant increase in hardware requirements.
Collaboration and Feedback:

Enable mechanisms for users or administrators to provide feedback on flagged links,
improving the system's accuracy through continuous learning.
Integration with Education Initiatives:

Pair the system with educational campaigns to raise user awareness about phishing
threats, empowering users to identify potential risks.
A holistic approach combining technology, user interaction, and ongoing improvement
strategies would form the basis of an effective proposed phishing link detection system.
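The hybrid detection idea described in this section can be sketched as a weighted blend of a rule-based score and a model probability. Everything below is an assumption made for illustration: the rules, the keyword list, the weight of 0.7 and the threshold of 0.5 are hypothetical, and the model probability is passed in as a plain number rather than produced by the project's trained classifier.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list; a real deployment would tune or learn these.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "account")


def _is_ip_address(host: str) -> bool:
    """Return True when the host is a raw IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def heuristic_score(url: str) -> float:
    """Return a rule-based suspicion score in [0, 1] for a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    checks = [
        "@" in url,                      # '@' can hide the real host
        _is_ip_address(host),            # raw IP instead of a domain name
        host.count(".") >= 4,            # many nested subdomains
        len(url) > 75,                   # unusually long URL
        parsed.scheme != "https",        # no TLS
        any(word in url.lower() for word in SUSPICIOUS_WORDS),
    ]
    return sum(checks) / len(checks)


def hybrid_verdict(url: str, ml_probability: float,
                   ml_weight: float = 0.7, threshold: float = 0.5) -> bool:
    """Blend the heuristic score with a model's phishing probability."""
    combined = (ml_weight * ml_probability
                + (1 - ml_weight) * heuristic_score(url))
    return combined >= threshold
```

Keeping the two signals separate means the rules can be updated quickly when a new tactic appears, while the model weight can be raised as the classifier improves.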
2.4 Applications of Proposed System:
The application of a phishing link detection system spans various domains and industries:
Web Browsers and Extensions:
Integrating the system into web browsers or developing browser extensions can provide
real-time protection to users while browsing.
Email Security:
Incorporating the system into email clients or servers helps identify and block phishing
links embedded within emails, preventing users from accessing malicious websites.
Enterprise Security Solutions:
Deploying the system within corporate networks can bolster the organization's
cybersecurity posture, protecting employees from phishing attacks across various
communication channels.
E-commerce and Financial Services:
Implementing the system in online transactions and financial platforms helps ensure the
security of sensitive information, reducing the risk of financial fraud.
Social Media Platforms:
Integrating the system into social media platforms can safeguard users from clicking on
malicious links shared within posts, messages, or advertisements.
Government and Public Services:
Utilizing the system in government websites or public service portals enhances
cybersecurity measures, protecting citizens' data and information.
Mobile Applications:
Incorporating the system into mobile apps can provide on-the-go protection, securing users
from phishing attempts while using various applications on their smartphones.
By integrating phishing link detection into diverse applications and platforms, the system
can effectively mitigate the risks posed by phishing attacks across multiple digital touchpoints.
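For the email security application, one common indicator is a link whose visible text shows one address while the underlying target points somewhere else. The sketch below illustrates that single check using only the Python standard library. The class and function names are hypothetical, and this is one possible pre-filter, not the detection method used by this project.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag in an HTML body."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html_body: str) -> list:
    """Return hrefs whose visible text shows a different host than the target."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    suspicious = []
    for href, text in collector.links:
        if not text.lower().startswith(("http://", "https://")):
            continue  # the visible text is not itself a URL
        shown = urlparse(text).hostname
        target = urlparse(href).hostname
        if shown and target and shown != target:
            suspicious.append(href)
    return suspicious
```

Links flagged this way could then be passed to the main classifier for a full verdict.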
Limitations:
 Dependency on Data Quality: The accuracy and effectiveness of the system heavily rely on
the quality and completeness of the data used for training and updating the detection models.
 User Awareness: Systems might not fully prevent phishing if users ignore warnings or fail to
understand the system's alerts, making user education crucial alongside the technical solution.
 Maintenance and Updates: Regular maintenance and updates are necessary to keep the system
effective, requiring ongoing efforts to update databases, algorithms, and detection
mechanisms.
 False Positives: Overly stringent detection criteria might result in legitimate links being
flagged as phishing, leading to user inconvenience or distrust in the system's accuracy.
3. SPECIFIC REQUIREMENTS
3.1 User Interface
User Interface for Phishing Link Detection System
1. Landing Page:
- Present a visually appealing and user-friendly landing page introducing the application's core
features.
2. Dashboard:
- Design an interactive and visually appealing user dashboard displaying real-time phishing
link detection results.
- Include sections for flagged URLs, recent scans, and educational content.
- Implement user-friendly navigation for easy exploration of different sections within the
application.
3. Notifications:
- Implement a notification system for user interactions, such as flagged links and system
updates.
4. Error Handling:
- Display an error if the user enters an incorrect URL.
- Offer helpful tips to guide users in resolving common problems.
5. Responsive Design:
- Ensure a responsive and optimized user interface across various devices, including desktops,
tablets, and smartphones.
- Conduct thorough testing to guarantee a consistent and user-friendly experience across
different screen sizes.
6. Accessibility:
- Design the interface with accessibility in mind.
- Provide alternative text for images and ensure compatibility with screen readers.
7. Security Measures:
- Implement secure protocols (HTTPS) to protect user data during transmission.
- Utilize industry-standard encryption for storing and handling user information.
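The error-handling requirement above (showing an error when the user enters an incorrect URL) can be sketched as a small validation step that runs before any detection. The function name, the messages and the exact rules are assumptions for illustration. The check is deliberately simple and, for example, rejects hosts without a dot such as localhost.

```python
from urllib.parse import urlparse


def validate_url(raw: str):
    """Return (is_valid, message) for a URL typed by the user."""
    candidate = raw.strip()
    if not candidate:
        return False, "Please enter a URL."
    if any(ch.isspace() for ch in candidate):
        return False, "A URL cannot contain spaces."
    try:
        parsed = urlparse(candidate)
        host = parsed.hostname
    except ValueError:
        return False, "The URL is malformed."
    if parsed.scheme not in ("http", "https"):
        return False, "The URL must start with http:// or https://."
    if not host or "." not in host:
        return False, "The URL must include a domain name, e.g. example.com."
    return True, "OK"
```

Returning a message together with the flag lets the interface show the helpful tip next to the input field instead of a generic failure notice.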
3.2 Hardware Interface
Processor:
 Multi-core processor with a clock speed of 2.0 GHz or higher.
Primary Memory:
 Minimum 2 GB RAM for basic functionality.
 Recommended 4 GB or higher for improved performance.
Storage:
 At least 10 GB of available storage.
Display:
 A monitor with a minimum resolution of 1024x768.
 Support both portrait and landscape orientations on devices that allow orientation
changes.
 Ensure that the app adapts to different orientations without compromising usability.
Internet Connection:
 An active and stable internet connection.
3.3 Software Interface
Operating System:
 Windows, Linux, Mac OS
Software:
 Flask
Web Browser:
 Any Web Browser
3.4 Communication Interface
 The phishing link detection app will present a graphical user interface (GUI) to users.
 Users will communicate with the app in online mode.
3.5 Non-functional Requirements
Security:
 Ensure user data privacy and secure authentication.
 Protect against common web vulnerabilities (e.g. XSS, CSRF, SQL injection).
Scalability:
 Design the system to handle a growing number of users.
Usability:
 The app should have an intuitive and user-friendly interface.
 Conduct usability testing to gather user feedback for improvements.
Reliability:
 Minimize downtime and errors.
 Implement regular backups and error handling mechanisms.
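Two of the web vulnerabilities named under Security, XSS and SQL injection, are usually addressed by escaping output and by using parameterized queries. The sketch below shows both with the Python standard library. The table name, the column names and the function names are hypothetical, and the in-memory database is used only for illustration.

```python
import html
import sqlite3


def render_result(url: str, verdict: str) -> str:
    """Build an HTML fragment, escaping user input so it cannot inject markup."""
    safe_url = html.escape(url)
    safe_verdict = html.escape(verdict)
    return f"<p>{safe_url} was classified as <b>{safe_verdict}</b>.</p>"


def save_scan(conn: sqlite3.Connection, url: str, verdict: str) -> None:
    """Store a scan result using placeholders, never string concatenation."""
    conn.execute("INSERT INTO scans (url, verdict) VALUES (?, ?)", (url, verdict))
    conn.commit()


# In-memory database with a hypothetical 'scans' table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (url TEXT, verdict TEXT)")
save_scan(conn, "http://x.test/'); DROP TABLE scans;--", "phishing")
```

Because the URL is passed as a bound parameter, the malicious-looking string is stored as ordinary text and the table is left intact. When pages are rendered through Flask's Jinja2 templates, values in .html templates are escaped by default, so manual escaping is mainly needed when HTML is assembled by hand as above.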
3.6 Software System Attributes
Reliability:
The app should be simple to use and free of errors so that users can operate it
safely and with confidence.
Availability:
The project should be available 24 hours a day, 7 days a week. The system will be available to
the user whenever the user needs it.
Maintainability:
Our approach extends beyond reactive updates. Proactive maintenance strategies are in place to
anticipate and address potential issues before they impact system performance. This forward-
looking approach enhances system reliability and minimizes the need for urgent, disruptive
fixes.
Portability:
Our project will be portable across platforms, so users can access it easily from anywhere
and with fast response times.
4. Software Process Model
4.1 Determining project feasibility

The feasibility study is not a full-blown systems study. Rather, the feasibility study is used to
gather broad data to decide whether to proceed with a full system study. System project
feasibility is assessed in three principal ways:
 Economically
 Technically
 Operationally
Economic Feasibility:
The organization has evaluated the cost of the software and hardware required for the system,
including data storage. The benefits expected from the system are studied to assess the cost
reduction that the new system brings.
Technical Feasibility:
The organization has shown willingness to purchase all the hardware and software tools that we
recommend for implementing the system successfully. Hence there are no technical limitations on
the development of the system. As far as programming effort is concerned, we are familiar
with Java programming. Thus the project is technically feasible.
Operational Feasibility:
Operational feasibility depends on the people who will use the software once it is
ready and installed. The software will have a user-friendly interface that is
convenient to use. Thus the project is operationally feasible.
4.2 Agile Model

Introduction:
- The development approach for the Phishing Link Detection System will adhere to Agile
principles, emphasizing adaptability, collaboration, and incremental progress. Agile
methodology enables responsiveness to evolving requirements and ensures the continuous
delivery of value to end-users.
Key Principles:
1. Iterative Development:
- Break down the project into manageable increments or sprints lasting 1-4 weeks.
- Deliver functional features regularly within each sprint to facilitate continuous feedback and
tangible progress.
2. Adaptability:
- Embrace changes in requirements, allowing adjustments throughout the development
process.
- Prioritize responding to user feedback and evolving needs to enhance the overall
effectiveness of the system.
3. Collaboration:
- Foster open communication and collaboration among team members, stakeholders, and end-
users.
- Regularly engage with stakeholders to gather feedback and refine priorities for ongoing
improvements.
4. Continuous Improvement:
- Conduct regular retrospectives at the end of each sprint to reflect on achievements and
identify areas for enhancement.
- Apply lessons learned to enhance team efficiency and the quality of the phishing link
detection system.
Roles and Responsibilities:

- Product Owner:
- Represents the interests of end-users and stakeholders.
- Defines and prioritizes features for each sprint.
- Scrum Master:
- Facilitates collaboration within the team.
- Removes impediments and ensures adherence to Agile principles.
- Development Team:
- Cross-functional team responsible for delivering increments of the phishing link detection
system.
- Collaborates on task assignments and collectively owns the quality of deliverables.
Sprint Planning:
- Frequency:
- Conduct sprint planning meetings at the beginning of each sprint.
- Activities:
- Define sprint goals and select user stories based on priority.
- Break down tasks, estimate effort, and create a sprint backlog.
Daily Stand-ups:
- Frequency:
- Hold daily stand-up meetings to maintain team alignment.
- Activities:
- Share progress updates, discuss challenges, and plan for the day.
- Identify and address any impediments requiring resolution.
Sprint Review:
- Frequency:
- Conduct sprint reviews at the end of each sprint.
- Activities:
- Demonstrate completed features to stakeholders.
- Collect feedback for further refinement of the phishing link detection system.
Retrospective:
- Frequency:
- Hold retrospectives at the end of each sprint.
- Activities:
- Reflect on achievements, identify areas for improvement, and plan actions for the next sprint.
- Continuously refine team processes and collaboration strategies.
User Stories and Backlog:

- User Stories:
- Define user stories to capture end-user requirements.
- Prioritize user stories based on value and dependencies.
- Backlog:
- Maintain a product backlog with user stories for future sprints.
- Continuously refine and reprioritize the backlog based on evolving needs.
Continuous Integration and Deployment:

- Automation:
- Implement continuous integration for automated testing and validation.
- Establish a continuous deployment pipeline for seamless releases of the phishing link
detection system.
Documentation:
- Lightweight Documentation:
- Prioritize working software over comprehensive documentation.
- Maintain just enough documentation to support ongoing development efforts.
Flexibility in Design:
- Iterative Design:
- Embrace an iterative approach to design, allowing for adjustments based on user feedback
and evolving requirements for the phishing link detection system.
4.3 DevOps Technology
Version Control System:

- Git:
- A distributed version control system to track changes in source code during website
development.
- Popular Git repository hosting services like GitHub, GitLab, and Bitbucket support
collaborative code management.
Containerization:
- Docker:
- A platform for developing, shipping, and running applications in containers.
- Ensures consistency across different environments by encapsulating applications and their
dependencies.
- Kubernetes:
- An open-source container orchestration platform automating deployment, scaling, and
management of containerized applications.
Infrastructure as Code (IaC):

- Terraform:
- An open-source IaC tool allowing users to define and provision infrastructure using a
declarative configuration language.
- Supports various cloud providers for seamless infrastructure management.
- AWS CloudFormation:
- A service enabling users to define and provision AWS infrastructure as code using a template
language.
5. Overall Implementation Design
5.1 Frontend Implementation Design:

1. Technology Stack:
 Framework: Flask
 Structure : HTML
 Styling: CSS
 AI&ML Algo: Extra Tree
2. Project Structure:
 Components:
 Create reusable components for different UI elements (e.g., buttons, cards,
navigation).
 Pages:
 Organize pages for different sections (e.g., Home, About pages).
 Styles:
 Utilize CSS for styling, keeping styles modular and responsive.
3. Personalization:
 Implementation:
 The detection API retrains itself on newly introduced links.
4. Responsive Design:
 Implementation:
 Utilize CSS for responsive design.
 Test and optimize the app for various screen sizes.
5.2 Backend Implementation Design:

1. Technology Stack:
 Framework: Flask
2. Project Structure:
 Routes:
 Define routes for About and Help pages.
3. API Endpoints:
 Implementation:
 Create RESTful API endpoints for frontend-backend communication.
 Ensure proper validation and error handling.
4. Security:
 Implementation:
 Implement secure communication with HTTPS.
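The backend design above (a Flask application exposing RESTful endpoints with validation and error handling) can be sketched as follows. The endpoint path /api/check, the JSON field names and the classify function are assumptions made for this example. In particular, classify is a stand-in rule and not the project's trained Extra Tree model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(url: str) -> str:
    """Placeholder for the trained model; a stand-in rule, NOT the real classifier."""
    return "phishing" if "@" in url else "legitimate"


@app.route("/api/check", methods=["POST"])  # hypothetical endpoint path
def check_url():
    # silent=True returns None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    url = payload.get("url", "")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return jsonify(error="Provide a JSON body with an http(s) 'url' field."), 400
    return jsonify(url=url, verdict=classify(url)), 200


@app.errorhandler(404)
def not_found(_error):
    # Return JSON for unknown routes so API clients always get a parseable reply.
    return jsonify(error="Unknown endpoint."), 404
```

A frontend page would send the URL entered by the user as JSON to this endpoint and display the returned verdict. Serving the application over HTTPS is handled by the deployment setup rather than by this code.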
6. List of Figures
a) Use case Diagram

b) Sequence Diagram
c) Class Diagram
d) Data Flow Diagram
e) Activity Diagram
(A) Use Case Diagram:

A use case diagram at its simplest is a representation of a user’s interaction with the system,
depicting the specification of a use case. A use case is a description of how an end user will use
the software. It describes a task or a series of tasks that the user will accomplish using the
software and includes the responses of the software to user actions.
FIG 6.1
(B) Sequence Diagram:
A sequence diagram in the Unified Modeling Language (UML) is a kind of interaction diagram that
shows how processes operate with one another and in what order. It is a construct of a message
sequence chart. Sequence diagrams are sometimes called event diagrams.
A sequence diagram shows, as parallel vertical lines (lifelines), different processes or objects that
live simultaneously and, as horizontal arrows, the messages exchanged between them, in the order
in which they occur. This allows the specification of simple run-time scenarios in a graphical
manner.
FIG 6.2
(C) Class diagram:
A class diagram is a type of static structure diagram in the Unified Modeling Language (UML)
that represents the structure and organization of a system or application in terms of classes, their
attributes, methods, and the relationships between them. Class diagrams are widely used in
software engineering to visually depict the key aspects of a system's design.
FIG 6.3
(D) DFD:
A Data Flow Diagram (DFD) is a graphical description of the system’s data and of how processes
transform that data. It shows the information flow and the transformations that are applied as
data moves from input to output. It is the starting point of the design phase, functionally
decomposing the requirement specifications down to the lowest level of detail. Thus a DFD
describes what data flows (the logical view) rather than how the data is processed.
Unlike a detailed flowchart, a data flow diagram does not supply a detailed description of the
modules; it graphically describes how a system’s data interacts with the system. To construct a
data flow diagram, we use:
 Arrows
 Circles
 Open-ended boxes
 Squares
An arrow identifies data flow, that is, data in motion. It is a pipeline through which information
flows, like the rectangle in a flowchart. A circle stands for a process that converts data into
information. An open-ended box represents a data store: data at rest or a temporary repository of
data. A square defines a source or destination of system data.
FIG 6.5 DFD
(E) Activity diagram:
An activity diagram is basically a flowchart that represents the flow from one activity to another
activity. An activity can be described as an operation of the system. The diagram captures the
dynamic behavior of the system.
FIG 6.6
7. Future Enhancements:
1. SMS Phishing Detection:

- Expand the system's capabilities to include detection of SMS phishing (smishing)
attempts, providing a comprehensive solution across multiple communication channels.
2. Machine Learning Model Refinement:

- Continuously refine and update machine learning models to adapt to evolving phishing
techniques and enhance the accuracy of link detection.
3. Behavioral Analysis:
- Integrate behavioral analysis to detect anomalies in user behavior and identify patterns
indicative of phishing activities, adding an extra layer of security.
4. Multi-language Support:
- Extend language support for phishing link detection to cater to a broader user base and
address phishing threats in various languages.
5. Mobile Application Integration:

- Develop a dedicated mobile application for iOS and Android platforms, providing users
with on-the-go access to phishing link detection features.
6. Browser Extension Improvements:

- Enhance browser extensions by adding features such as real-time URL scanning during
page loads, offering users immediate warnings about potential phishing threats.
7. Integration with Email Clients:

- Collaborate with popular email clients to integrate phishing link detection directly into
email platforms, ensuring users are protected at the point of interaction.
8. Community Reporting System:

- Implement a community reporting system where users can report suspected phishing
links, contributing to a collaborative effort in keeping the database of malicious URLs
up-to-date.
9. Advanced URL Analysis:

- Incorporate advanced URL analysis techniques, such as lexical analysis and semantic
analysis, to improve the system's ability to detect subtle phishing attempts.
10. User Training Modules:

- Develop interactive user training modules within the application, educating users on
how to recognize and avoid phishing attempts, thereby reducing the likelihood of
falling victim to such attacks.
11. Integration with Security Information and Event Management (SIEM) Systems:
- Integrate with SIEM systems to enhance the overall security posture of organizations by
feeding phishing threat intelligence into broader security analytics.
12. Enhanced Reporting and Analytics:

- Provide users and administrators with detailed reports and analytics on phishing
attempts, allowing for insights into trends and potential areas of vulnerability.
13. Blockchain Integration for URL Reputation:

- Explore the use of blockchain technology to enhance the security and integrity of the
URL reputation database, ensuring tamper-proof records of known malicious URLs.
14. Automated Threat Response:

- Implement automated threat response mechanisms, allowing the system to take predefined
actions against identified phishing threats without manual intervention.
15. Cross-Platform Collaboration:

- Collaborate with other cybersecurity platforms and services to create a more
interconnected and resilient defense against evolving phishing threats.
Continuous research, adaptation to emerging technologies, and a commitment to user
education will be crucial in keeping the phishing link detection system robust and
effective in the face of evolving cybersecurity challenges.
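The lexical analysis mentioned under Advanced URL Analysis (item 9) works on the URL string alone, without fetching the page. The sketch below extracts a handful of such features. The feature set and the names are illustrative assumptions, not the feature list used by this project, and the resulting dictionary would still need to be converted into a numeric vector before being given to a classifier.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; random-looking strings score higher."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lexical_features(url: str) -> dict:
    """Extract simple lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "host_entropy": shannon_entropy(host),
    }
```

Features of this kind are cheap to compute, which makes them suitable for real-time scanning in a browser extension or mobile application.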
8. Conclusion:
In conclusion, the "Phishing Link Detection System" embodies a robust solution
tailored to empower users with a vigilant and secure online experience. By centering on cutting-
edge Artificial Intelligence and Machine Learning, the system excels in identifying and
thwarting phishing threats, offering a comprehensive defense against malicious activities.
Throughout the developmental journey, leveraging technologies such as Python, Extra Tree,
Flask, HTML, CSS, and JavaScript has fortified the system's foundation. The user interface
prioritizes simplicity and functionality, providing a seamless experience for real-time URL
scanning, email verification, educational purposes, API integration, and browser extension
usage. Looking forward, the system is poised for continual improvement, with a focus on
enhancing machine learning models, user education, containerization with Docker, infrastructure
scalability, user feedback integration, and potential expansion into SMS phishing detection. This
commitment to evolution ensures the "Phishing Link Detection System" remains an adaptive and
resilient solution in the ever-evolving landscape of cybersecurity. As technology advances and
new threats emerge, the development team remains dedicated to refining and expanding the
system's capabilities to meet the diverse and dynamic challenges of online security. The journey
extends beyond the initial release, marking the beginning of a vigilant and responsive platform
that evolves in tandem with the ever-changing landscape of cyber threats. In essence, the
"Phishing Link Detection System" not only aims to identify malicious links but also strives to
cultivate a community where users actively participate in the protection of their online
environment. With unwavering commitment to innovation, user security, and excellence, the
system stands as a testament to the potential at the intersection of technology and proactive
cybersecurity.