Group 30 Draft B

PROJECT REPORT
On
DETECTION AND CLASSIFICATION OF CYBER ATTACKS USING ML
Submitted For Partial Fulfillment of Award of
BACHELOR OF TECHNOLOGY
In
Department of Computer Science and Engineering
By
Rohit Singh (2000540100133)
Harsh Raj (2100540109007)
Nitish Chaurasiya (2100540109014)
Shourya Dwivedi (2000540100151)
Siddharth (2000540100157)
Under the Guidance of
Mr. Rajeev Srivastava
(Assistant Professor)
Department of Computer Science and Engineering
May, 2024
CERTIFICATE
This is to certify that the project entitled “DETECTION AND CLASSIFICATION OF CYBER
ATTACKS USING ML” submitted by Rohit Singh (2000540100133), Harsh Raj
(2100540109007), Nitish Chaurasiya (2100540109014), Shourya Dwivedi (2000540100151),
Siddharth (2000540100157) in the partial fulfillment of the requirements for the award of the degree
of Bachelor of Technology (Computer Science and Engineering) of Dr. APJ Abdul Kalam Technical
University (Lucknow), is a record of the students' own work carried out under our supervision and guidance.
The project report embodies results of original work and studies carried out by the students, and the
contents do not form the basis for the award of any other degree to the candidates or to anybody
else.
Mr. Rajeev Srivastava          Dr. Anurag Tiwari
Assistant Professor            Professor
(Project Guide)                (Head of Department)
Date:
Place: Lucknow
DECLARATION
We hereby declare that the project entitled “Detection and Classification of Cyber Attacks Using
Machine Learning” submitted by us in the partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology (Computer Science and Engineering) of Dr. APJ Abdul Kalam
Technical University (Lucknow), is a record of our own work carried out under the supervision and
guidance of Mr. Rajeev Srivastava (Assistant Professor, Department of Computer Science and
Engineering), and has not formed the basis for the award of any other degree or diploma, in this or
any other institution or university. In keeping with the ethical practice in reporting scientific
information, due acknowledgements have been made wherever the findings of others have been cited.
Rohit Singh (2000540100133) Harsh Raj (2100540109007)
Nitish Chaurasiya (2100540109014) Shourya Dwivedi (2000540100151)
Siddharth (2000540100157)
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of the B.Tech. project undertaken during the
B.Tech. final year. We owe a special debt of gratitude to Mr. Rajeev Srivastava (Assistant Professor,
Department of Computer Science and Engineering) and Dr. Anurag Tiwari (Head, Department of
Computer Science and Engineering), Babu Banarasi Das Institute of Technology and Management,
Lucknow, for their constant support and guidance throughout the course of our work. Their sincerity,
thoroughness and perseverance have been a constant source of inspiration for us. It is only because of
their conscientious efforts that our endeavors have seen the light of day. We also would not like to miss the
opportunity to acknowledge the contribution of all faculty members of the department for their kind
assistance and cooperation during the development of our project. Last but not least, we
acknowledge our family and friends for their contribution to the completion of the project.
PREFACE
The development of the Threat Detection System represents a culmination of dedicated efforts,
meticulous research, and collaborative endeavors aimed at addressing the critical cybersecurity
challenges prevalent in the web ecosystem. As we embark on the journey of crafting this project report, it
is essential to reflect on the motivations, inspirations, and aspirations that have propelled this endeavor
forward.
The pervasive threat of phishing poses a significant risk to users worldwide, jeopardizing
the integrity of personal data, financial information, and sensitive communications. Against this
backdrop, the imperative to develop robust and effective threat detection mechanisms becomes
increasingly evident, prompting us to delve into the intricate realm of website security. Our journey
has been guided by a steadfast commitment to innovation, excellence, and a shared vision of creating
a safer and more secure digital environment for all.
The preface serves as a poignant reminder of the collective efforts and contributions of numerous
individuals and organizations who have played a pivotal role in shaping the trajectory of this project.
From mentors and advisors who provided invaluable guidance and expertise to colleagues and
collaborators who offered unwavering support and encouragement, our journey has been enriched by
the collaborative spirit and shared determination to make a meaningful impact in the field of
cybersecurity.
The journey of developing the Website Threat Detection System has been marked by a series of
challenges, obstacles, and triumphs, each of which has contributed to our growth and development as
cybersecurity professionals. From grappling with complex algorithms and intricate data sets to
navigating the intricacies of web application architecture, every step of the journey has presented
its unique set of hurdles. However, it is precisely through overcoming these challenges that we have
gained invaluable insights, honed our skills, and forged ahead with renewed determination and
resilience.
In compiling this project report, we aim to provide a comprehensive overview of the Website Threat
Detection System, including its objectives, methodology, results, and implications. Through
meticulous documentation and analysis, we seek to offer insights into the project's development
process, findings, and potential impact on the field of cybersecurity. Furthermore, we hope that this
report will serve as a valuable resource for researchers, practitioners, and stakeholders interested in
understanding and addressing the challenges of web application security.
ABSTRACT
The "Cyber Threat Detection and Classification Framework" stands as a cutting-edge
web-based solution crafted on the robust MERN (MongoDB, Express, React, Node.js)
stack, with a singular mission to revolutionize cybersecurity management through the
transformative lens of machine learning technology. This pioneering system is meticulously
designed to streamline the identification, categorization, and swift response to cyber threats,
prioritizing the enhancement of digital security outcomes. Core functionalities encompass a
sophisticated machine learning-powered threat detection engine, interactive visualization
tools for in-depth cyber threat analysis, a centralized dashboard for real-time monitoring
and strategic response coordination, and a secure communication portal for incident
reporting. In light of the escalating frequency and complexity of cyber threats, particularly
exacerbated by the evolving digital landscape, the imperative to leverage advanced
technologies for bolstering cybersecurity measures has become evident. This research paper
comprehensively chronicles the developmental trajectory of an online web application
aimed at optimizing the efficiency and efficacy of cyber threat detection and classification.
The antecedent reliance on manual methodologies, fraught with delays and inefficiencies in
recognizing and responding to cyber threats, necessitated a paradigm shift towards an
algorithmic and machine learning-driven approach. The envisaged impact of this framework
lies in its potential to redefine the cybersecurity landscape, proactively fortifying digital
infrastructures against an evolving spectrum of cyber threats. The systematic incorporation
of machine learning algorithms holds promise in augmenting the precision and timeliness of
cyber threat detection, contributing to a more resilient and secure digital ecosystem. As the
digital realm continues to witness an upsurge in cyber threats, this platform emerges as a
pivotal step towards establishing a robust and adaptive cybersecurity framework for the
safeguarding of critical digital assets.
TABLE OF CONTENTS
DESCRIPTION PAGE NUMBER
TITLE PAGE I
CERTIFICATE/S (SUPERVISOR) II
DECLARATION III
ACKNOWLEDGMENT IV
PREFACE V
ABSTRACT VI
TABLE OF CONTENTS VII
1. INTRODUCTION 1-2
1.1 Significance of the Project 1
1.2 Exploring Detection Methodologies 1
1.3 Focus on Machine Learning 1
1.4 Importance of Dynamic Analysis 2
1.5 Objectives of the Project 2
2. LITERATURE SURVEY 3-12
2.1 Comparative study 10-12
3. PROPOSED METHODOLOGY 13-30
3.1 Design and Technologies 18-19
3.2 Hardware and Software Requirements 20
3.3 Implementation and Coding 20-30
4. RESULT ANALYSIS AND DISCUSSION 31-32
5. CONCLUSION 33-34
6. FUTURE SCOPE OF THE PROJECT 35-36
APPENDIX viii-ix
LIST OF FIGURES x
LIST OF TABLES xi
LIST OF ABBREVIATIONS AND SYMBOLS USED xii-xiv
REFERENCES & BIBLIOGRAPHY xv-xxii
APPENDICES
A. Data Sources and Preprocessing
• Data Sources
  • Description of datasets used (e.g., DARPA dataset, KDD Cup 99 dataset, UNSW-NB15 dataset).
  • Source and access methods for each dataset.
  • Summary statistics of each dataset (number of records, features, classes).
• Preprocessing Steps
  • Data cleaning methods applied (e.g., handling missing values, removing duplicates).
  • Feature selection and extraction techniques.
  • Data normalization and scaling procedures.
B. Machine Learning Models
• Model Selection
  • List of machine learning models evaluated (e.g., Decision Trees, Random Forests, Support Vector Machines, Neural Networks).
  • Justification for selecting specific models for further experimentation.
• Model Training
  • Description of training procedures.
  • Hyperparameter tuning methods used (e.g., Grid Search, Random Search, Bayesian Optimization).
  • Training and validation split details (e.g., k-fold cross-validation).
C. Experimental Setup
• Hardware and Software Configuration
  • Details of hardware used (e.g., CPU, GPU specifications).
  • Software libraries and frameworks utilized (e.g., Scikit-learn, TensorFlow, PyTorch).
• Evaluation Metrics
  • Description of metrics used to evaluate model performance (e.g., Accuracy, Precision, Recall, F1-score, ROC-AUC).
  • Reasons for selecting these metrics.
D. Results and Analysis
• Performance Metrics
  • Summary tables of performance metrics for each model.
  • Confusion matrices for each classification model.
• Comparative Analysis
  • Discussion on model performance comparison.
  • Insights drawn from the performance of different models.
E. Challenges and Limitations
• Challenges Faced
  • Data-related challenges (e.g., imbalanced data, noisy data).
  • Computational challenges (e.g., long training times, resource constraints).
• Limitations
  • Limitations of the study and methods used.
  • Potential impacts on the results due to these limitations.
F. Future Work
• Improvements and Enhancements
  • Potential improvements in data preprocessing.
  • Suggestions for exploring advanced machine learning models (e.g., deep learning, ensemble methods).
• Broader Applications
  • Extension of the methods to other types of cyber attacks.
  • Application of the developed models in real-world scenarios.
LIST OF FIGURES
Figure No. Figure Caption Page No.
1.1 Phishing Detection Process 2
3.1 Data Preprocessing Diagram 16
3.2 Dataset of Test 16
3.3 Features Extracted 17
3.4 Dynamic Analysis Diagram 18
3.5 Testing & Validation Diagram 19
3.1.1 Home Page 21
3.1.2 About Us 21
3.1.3 Log In Page 22
3.1.4 Sign Up Page 23
3.1.5 Prediction Input Page 24
3.1.6 Result Page 24
3.1.7 Feedback Page 25
LIST OF TABLES
Table No Table Caption Page No
2.1.1 Comparative study of Research Papers 10-14
LIST OF ABBREVIATIONS AND SYMBOLS USED
1. SBT: Signature-Based Techniques
2. BBT: Behavior-Based Techniques
3. MLT: Machine Learning Techniques
4. DA: Dynamic Analysis
5. BP: Behavioral Patterns
6. FES: Feature Extraction and Selection
7. HMA: Hybrid Methodologies Approach
8. TMLM: Transparent Machine Learning Models
9. RTIF: Real-time Threat Intelligence Fusion
10. UEE: User Empowerment through Education
11. IA: Innovative Approaches
12. DT: Decision Trees
13. RF: Random Forests
14. SVM: Support Vector Machines
15. IoTS: Internet of Things Security
16. ARIMA: Auto-Regressive Integrated Moving Average
17. CVE: Common Vulnerabilities and Exposures
18. API: Application Programming Interface
19. MITM: Man-in-the-Middle
20. NLP: Natural Language Processing
CHAPTER – 1
INTRODUCTION
In an era dominated by technological advancements and an ever-evolving digital landscape, the
prevalence and sophistication of cyber threats pose a significant challenge to the integrity of digital
infrastructures. The escalating frequency of cyberattacks, ranging from sophisticated phishing schemes to
large-scale data breaches, underscores the critical need for proactive and effective cybersecurity
measures. As our dependence on digital technologies grows, so does the imperative to safeguard
sensitive information and secure critical systems against malicious activities.
The project at hand introduces an initiative titled "Detection and Classification of Cyber
Attacks Using Machine Learning." This endeavor addresses the pressing need for advanced
methodologies in identifying and categorizing cyber threats through the application of machine learning
technologies. With the exponential growth of data and the dynamic nature of cyber threats, traditional
cybersecurity measures are proving insufficient in providing timely and accurate responses to emerging
risks.
This research project embarks on an in-depth exploration of cyber threat detection and classification
through the lens of machine learning. It will meticulously dissect the core concepts, advanced
methodologies, and practical applications that form the foundation of this approach. By
integrating the realms of data science and cybersecurity, this study aims to provide robust tools and
strategies to fortify digital defenses against the escalating threat landscape.
This research endeavor will delve into the intricate world of cyber threat detection and classification,
exploring the fusion of machine learning and cybersecurity. The project will unfold the core concepts,
methodologies, and practical applications that underpin this innovative approach. By bridging the gap
between data science and cybersecurity, we aim to empower organizations and individuals with the tools
necessary to safeguard their digital assets in an era fraught with cyber challenges. Join us on this journey
as we uncover the transformative potential of "Detection and Classification of Cyber Attacks Using
Machine Learning".
The project will focus on leveraging machine learning algorithms to detect and classify
various types of cyber attacks, including malware, phishing, and network intrusions. We will investigate
the efficacy of different machine learning paradigms, such as supervised, unsupervised, and reinforcement
learning, in identifying and mitigating these threats. Through rigorous experimentation and analysis, we
aim to develop a comprehensive framework that can adapt to the evolving nature of cyber threats,
ensuring proactive and efficient protection of digital assets.
CSE DEPARTMENT, BBDITM, Lucknow [1]
By bridging the gap between theoretical knowledge and practical implementation, this research aspires to
make significant contributions to the field of cybersecurity. The insights gained from this study will not
only enhance our understanding of cyber-attack patterns but also empower organizations and individuals
to adopt a more resilient stance against cyber adversaries. Join us in uncovering the transformative
potential of "Detection and Classification of Cyber Attacks Using Machine Learning" as we strive to
reshape and strengthen the cybersecurity landscape.
The interdisciplinary nature of this project bridges the gap between data science and cybersecurity,
providing a holistic approach to threat detection. We will also address the challenges and limitations
associated with machine learning in cybersecurity, such as data privacy, scalability, and the need for
continuous learning to adapt to evolving threats. Our research will propose solutions to these challenges,
ensuring that the developed models are both practical and effective in real-world applications.
Ultimately, this project aims to empower organizations and individuals with cutting-edge tools and
strategies to safeguard their digital assets. By advancing the field of cyber threat detection using machine
learning, we hope to contribute to a more secure digital environment. Join us on this journey as we
explore the transformative potential of "Detection and Classification of Cyber Attacks Using Machine
Learning," and work towards reshaping the cybersecurity landscape with innovative and adaptive technologies.

DETECTION AND CLASSIFICATION
OF CYBER USING ML
This research initiative delves into the heart of the dynamic realm where the amalgamation of data,
technology, and cybersecurity is reshaping the very foundations of our digital landscape. As we navigate
the multifaceted world of cyber threat detection and classification, we will unravel the core concepts,
methodologies, and practical applications driving innovation in this field. This research endeavor aims to
provide an immersive exploration of how data collection has evolved, transcending traditional methods to
encompass cutting-edge techniques such as machine learning for the identification and categorization of
cyber threats in real time.
In an era marked by relentless technological progress and evolving digital landscapes, the intersection of
cyber threat detection and machine learning has become a focal point for researchers, policymakers, and
cybersecurity practitioners. This research paper seeks to explore the intricate dynamics of detecting and
classifying cyber threats, delving into the critical aspects that shape the effectiveness of cybersecurity
measures. Cyber threat detection involves the systematic analysis of digital data, covering a broad
spectrum from network logs to anomaly detection patterns. Concurrently, machine learning algorithms
focus on optimizing cybersecurity systems, ensuring efficient threat identification, and enhancing overall
cybersecurity resilience.
The synthesis of these two domains holds the key to addressing contemporary challenges such as the
sophistication of cyber threats, accessibility issues, and the increasing complexity of digital information.
In recent years, the cybersecurity landscape has witnessed a paradigm shift towards proactive threat
detection and adaptive defense mechanisms. The advent of machine learning in cybersecurity has
empowered professionals with unprecedented access to vast amounts of digital data for threat analysis.
However, the effective utilization of this wealth of information for informed decision-making remains a
challenge. Simultaneously, cybersecurity systems grapple with the need for strategic management
practices to navigate complexities in resource allocation, staff training, and policy implementation.

CHAPTER – 2
LITERATURE SURVEY
Shailendra Singh & Sanjay Silakari, et al. (2023) In this paper, we have investigated some new
techniques for cyber attack detection systems and evaluated their performance on the benchmark
KDD Cup 99 cyber attack data. We have explored C4.5 and iSVM as cyber attack detection models. Next,
we designed a hybrid C4.5-iSVM model and an ensemble approach with the C4.5, iSVM and C4.5-iSVM
models as base classifiers. Empirical results reveal that C4.5 gives better or equal accuracy for the Normal
and Probe classes and the iSVM gives better accuracy for the Normal and DOS classes. The hybrid C4.5-
iSVM classifier improves accuracy for the R2L and U2R classes when compared to the individual accuracy of
the classifiers. The ensemble classifiers gave the best performance for the Probe and R2L classes. The
ensemble approach gave 100% accuracy for the Probe class, and this suggests that if proper base classifiers are
chosen, 100% accuracy might be possible for other classes too. Finally, we propose an ensemble
approach with a new framework for cyber attack detection systems to make optimum use of the best
performances delivered by the individual base classifiers and ensemble classifiers.
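One plausible way to "make optimum use of best performances delivered by the individual base classifier" is to weight each classifier's vote by how well it performed on the class it predicts. The following is a minimal sketch under that assumption and is not the framework proposed in the cited paper. The function name `weighted_vote` and every accuracy figure in `CLASS_ACCURACY` are hypothetical placeholders chosen for illustration, not results reported by the authors.

```python
from collections import defaultdict


def weighted_vote(predictions, class_accuracy):
    """Combine base-classifier predictions for one network record.

    predictions    -- {classifier_name: predicted_class}
    class_accuracy -- {classifier_name: {class_name: validation accuracy}}
    Each vote is weighted by that classifier's validation accuracy
    on the class it predicted.
    """
    scores = defaultdict(float)
    for name, label in predictions.items():
        scores[label] += class_accuracy[name].get(label, 0.0)
    # The class with the largest accumulated weight wins.
    return max(scores, key=scores.get)


# Hypothetical per-class validation accuracies (illustrative only).
CLASS_ACCURACY = {
    "C4.5": {"Normal": 0.99, "Probe": 0.98, "DOS": 0.96, "R2L": 0.90, "U2R": 0.60},
    "iSVM": {"Normal": 0.99, "Probe": 0.97, "DOS": 0.99, "R2L": 0.85, "U2R": 0.55},
    "C4.5-iSVM": {"Normal": 0.99, "Probe": 0.98, "DOS": 0.97, "R2L": 0.95, "U2R": 0.70},
}

votes = {"C4.5": "Probe", "iSVM": "DOS", "C4.5-iSVM": "Probe"}
print(weighted_vote(votes, CLASS_ACCURACY))
```

With weights of this kind, a classifier that is strong on one class can override the others for that class only, which is the behavior the per-class results above motivate.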
Nuno Oliveira, Isabel Praça, Eva Maia and Orlando Sousa, et al. (2023) This paper is concerned
with cyber attack detection in a networked control system. A novel cyber attack detection method,
which consists of two steps: 1) a prediction step and 2) a measurement update step, is developed. An
estimation ellipsoid set is calculated through updating the prediction ellipsoid set with the current
sensor measurement data. Based on the intersection between these two ellipsoid sets, two criteria are
provided to detect cyber attacks injecting malicious signals into physical components (i.e., sensors and
actuators) or into a communication network through which information among physical components is
transmitted. There exists a cyber attack on sensors or a network exchanging data between sensors and
controllers if there is no intersection between the prediction set and the estimation set updated at the
current time instant. Actuators or network transmitting data between controllers and actuators are
under a cyber attack if the prediction set has no intersection with the estimation set updated at the
previous time instant. Recursive algorithms for the calculation of the two ellipsoid sets and for the
attack detection on physical components and the communication

Eman Mousavinejad & Qing-Long Han, et al. (2023) With the latest advances in information and
communication technologies, greater amounts of sensitive user and corporate information are shared
continuously across the network, making it susceptible to attacks that can compromise data
confidentiality, integrity, and availability. Intrusion Detection Systems (IDS) are important security
mechanisms that can perform the timely detection of malicious events through the inspection of
network traffic or host-based logs. Many machine learning techniques have proven to be successful at
conducting anomaly detection throughout the years, but only a few considered the sequential nature of
data. This work proposes a sequential approach and evaluates the performance of a Random Forest
(RF), a Multi-Layer Perceptron (MLP), and a Long Short-Term Memory (LSTM) network on the CIDDS-001
dataset. The resulting performance measures of this particular approach are compared with the ones
obtained from a more traditional one, which only considers individual flow information, in order to
determine which methodology best suits the concerned scenario. The experimental outcomes suggest
that anomaly detection can be better addressed from a sequential perspective. The LSTM is a highly
reliable model for acquiring sequential patterns in network traffic data, achieving an accuracy of
99.94% and an F1-score of 91.66%.
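The difference between the sequential perspective and the per-flow one lies mainly in how the model inputs are built. The sketch below illustrates the idea with a sliding window over time-ordered flows. It is a simplified assumption rather than the preprocessing used in the cited work: the function name `make_sequences`, the choice to label each sequence with the label of its last flow, and the toy feature values are all hypothetical.

```python
def make_sequences(flows, labels, window):
    """Group consecutive flow records into fixed-length sequences.

    flows  -- list of per-flow feature vectors, ordered by time
    labels -- list of per-flow labels, same length as flows
    window -- number of consecutive flows per sequence
    Returns (sequences, targets); each sequence carries the label of
    its last flow (an assumed labelling convention).
    """
    if window < 1 or len(flows) != len(labels):
        raise ValueError("window must be >= 1 and inputs must align")
    sequences, targets = [], []
    for end in range(window, len(flows) + 1):
        sequences.append(flows[end - window:end])
        targets.append(labels[end - 1])
    return sequences, targets


# Toy flows as [duration, bytes]; the values are made up for illustration.
flows = [[0.1, 200], [0.2, 180], [5.0, 9000], [4.8, 9500], [0.1, 210]]
labels = ["normal", "normal", "attack", "attack", "normal"]

seqs, targets = make_sequences(flows, labels, window=3)
print(len(seqs), targets)
```

A per-flow classifier would see each row of `flows` in isolation, whereas a sequence model such as an LSTM would receive each element of `seqs` and can therefore use the preceding flows as context.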
Jamal Raiyn, et al. (2023) This paper introduced and discussed different cyber attack detection
strategies. We have carried out a comparison and analysis of different cyber attack detection strategies.
Cyber attack techniques have improved dramatically over time, especially in the past few years.
Developing new cyber attack detection schemes is necessary because cyber attackers continuously develop
their strategies too. Information fusion from multiple sources requires intelligent techniques
to characterize cyber attackers. It seems that traditional cyber attack detection schemes may
stop cyber attackers only temporarily and partially. To overcome the shortcomings of traditional cyber attack
detection schemes, we propose a new scheme for real-time and short-term response to actual attacks.
J. Water Resour, et al. (2023) The Battle of the Attack Detection Algorithms (BATADAL) is the
most recent competition on planning and management of water networks undertaken within the Water
Distribution Systems Analysis Symposium. The goal of the battle was to compare the performance of
algorithms for the detection of cyber-physical attacks, whose frequency has increased in the last few
years along with the adoption of smart water technologies.
The design challenge was set for the C-Town network, a real-world, medium-sized water distribution
system operated through programmable logic controllers and a supervisory control and data
acquisition (SCADA) system. Participants were provided with data sets containing (simulated)
SCADA observations, and challenged to design an attack detection algorithm. The effectiveness of
all submitted algorithms was evaluated in terms of time-to-detection and classification accuracy.
Seven teams participated in the battle and proposed a variety of successful approaches leveraging data
analysis, model-based detection mechanisms, and rule checking. Results were presented at the Water
Distribution Systems Analysis Symposium (World Environmental and Water Resources Congress) in
Sacramento, California on May 21–25, 2017. This paper summarizes the BATADAL problem,
proposed algorithms, results, and future research directions.
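The two evaluation criteria named above, time-to-detection and classification accuracy, can be illustrated on a sequence of alarm flags. The sketch below is a simplified reading of those terms and is not the official BATADAL scoring procedure. The function names, the inclusive attack interval, and the toy `truth` and `alarms` sequences are assumptions made for illustration.

```python
def time_to_detection(alarms, attack_start, attack_end):
    """Time steps between the start of an attack and the first alarm
    raised while the attack is ongoing.

    alarms       -- list of 0/1 flags, one per time step
    attack_start -- index of the first time step of the attack
    attack_end   -- index of the last time step of the attack (inclusive)
    Returns None when the attack is never flagged.
    """
    for step in range(attack_start, attack_end + 1):
        if alarms[step]:
            return step - attack_start
    return None


def accuracy(alarms, truth):
    """Fraction of time steps whose alarm flag matches the ground truth."""
    matches = sum(1 for a, t in zip(alarms, truth) if a == t)
    return matches / len(truth)


# Toy data: one attack spanning time steps 2 to 5 (illustrative only).
truth = [0, 0, 1, 1, 1, 1, 0, 0]
alarms = [0, 0, 0, 0, 1, 1, 1, 0]

print(time_to_detection(alarms, attack_start=2, attack_end=5))
print(accuracy(alarms, truth))
```

The two numbers pull in different directions: an algorithm that raises alarms eagerly shortens the time-to-detection but tends to lose accuracy through false alarms, which is why a competition of this kind reports both.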
Vibekananda Dutta, Michał Choraś, Marek Pawlicki and Rafał Kozik, et al. (2023) This work
addressed an ensemble approach incorporating deep learning algorithms using the concept of stacked
generalization for an effective anomaly-based network intrusion detection system. In this paper,
various feature engineering methods were applied together with dimensionality reduction to achieve
the highest efficiency. A combination of DNN and LSTM followed by a meta-classifier resulted in
significant performance and detection of anomalies w.r.t. the most recent network traffic datasets.
Three heterogeneous datasets, IoT-23, LITNET-2020, and NetML-2020, were used to assess the
effectiveness of the proposed stacked ensemble framework. Following statistical significance tests, we
came to the verdict that the suggested approach outperforms the state-of-the-art individual classifiers
and meta-classifiers such as random forest and support vector machine. From the series of conducted
experiments, it is inferred that the proposed approach provides a significant improvement in terms of
evaluation metrics when validated against pre-specified testing sets. Briefly, the proposed framework
can eliminate the challenge of providing recent network traffic datasets and provide an acceptable
accuracy to detect anomalous behaviors in the desired network. For future work, the implementation
strategy can be further extended to conduct experiments on more sophisticated datasets if those can be
acquired. Advanced computational methods like Apache Spark will be utilized in the future to boost
the processing speed and facilitate the scalability for massive volumes of network data. Additionally,
the approach is to be validated for solving a multi-class problem. At the moment, we also focus on the
second part of the model (i.e., transfer learning). The study will apply a lifelong learning algorithm
along with a deep learning one to make the method more robust to unknown and known attacks.
Finally, first steps have already been taken to secure the deep learning component itself against the
threat of adversarial attacks, and we plan to continue research in that regard.
Qasem Abu Al-Haija and Saleh Zein-Sabatto, et al. (2023) An efficient and intelligent
deep-learning-based detection and classification system for cyberattacks in IoT communication networks
(called IoT-IDCS-CNN) was proposed, developed, tested, and validated in this study. The proposed
IoT-IDCS-CNN makes use of high-performance computing by employing the robust Nvidia GPUs
(Quad-Cores, CUDA-based) and the parallel processing employing the high-speed Intel CPUs
(N-Cores, I9-based). For the purpose of the system development, the proposed IoT-IDCS-CNN was
decomposed into three subsystems, namely, the feature engineering subsystem, the feature learning
subsystem, and the detection and classification subsystem. All subsystems were individually
developed and then integrated, verified, and validated in this research. Because of the use of a
CNN-based design, the proposed system was able to detect and classify the slightly mutated cyberattacks of
IoT networks (represented collectively by the NSL-KDD dataset, which includes all the key attacks
found in IoT computing) with a detection accuracy of 99.3% between normal or anomaly traffic and
could classify the IoT traffic into five categories with a classification accuracy of 98.2%. Furthermore,
to ensure a high level of reliability for the system validation stage, we conducted a five-fold
cross-validation process that encompassed five different experiments for each classification model.
Moreover, to provide more insight into the performance of the system, the proposed system was
evaluated using the confusion matrix parameters (i.e., TN, TP, FN, FP), from which some other
performance evaluation metrics were computed, namely, the classification precision, the classification
recall, the F1-score of the classification, and the false alarm rate. Finally, the experimental evaluation
results of the IoT-IDCS-CNN system surpassed the results of many recent existing IDS systems in the
same area of study.
It is likely not realistic to expect that a cyber attack detection system be capable of correctly
classifying every event that occurs on a given system. Desired features for the cyber attack detection
system depend on both the methodology and the modeling approach used in building the cyber attack
detection system. These features are usually numerous. Thus, considering the volume of data,
processing all of them will take quite a while. In order to speed up the process, these features are
usually pre-processed to reduce their size while increasing their information value. There are
numerous approaches reported in this area, but a new methodology is still needed to reduce
the input features of the network data without degrading the accuracy of the system.
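The evaluation metrics mentioned above (precision, recall, F1-score, and false alarm rate) all follow from the four confusion matrix parameters TP, TN, FP, and FN. The sketch below shows the standard definitions for a binary normal-versus-attack setting. The function name `detection_metrics` and the example counts are hypothetical and are not taken from the cited study.

```python
def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, tn, fp, fn):
    """Compute standard binary detection metrics from confusion matrix counts.

    tp -- attacks correctly flagged         fn -- attacks missed
    tn -- normal traffic correctly passed   fp -- normal traffic flagged
    """
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
        # False alarm rate: share of normal traffic wrongly flagged as attack.
        "false_alarm_rate": _safe_div(fp, fp + tn),
    }


# Hypothetical counts for illustration only.
metrics = detection_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting the false alarm rate alongside accuracy matters in intrusion detection because normal traffic usually dominates, so a high accuracy can coexist with an impractical number of false alarms.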

Amir Namavar Jahromi, Hadis Karimipour, Ali Dehghantanha and Kim-Kwang Raymond
Choo et al. (2023) The paper presents a systematic literature review (SLR) that explores the challenges
and strategies associated with Android malware detection through permission analysis. The authors
meticulously assess the existing body of literature to address the complexities surrounding Android
malware detection, a critical concern given the increasing prevalence of malware and its evolving
sophistication. The study reveals that several challenges impede the development of effective
detection methodologies, such as the risk of false positives and false negatives, the ever-present need
to preserve user privacy, and the challenges posed by new Android versions and customized versions
on different devices. The integration of static and dynamic analyses has shown promise in improving
accuracy, with various solutions attempting to minimize the rate of false positives and false negatives.
The SLR also emphasizes the importance of user privacy in the context of malware detection.
Furthermore, it suggests potential future research directions, including the utilization of deep learning
techniques and privacy preserving methods in Android malware detection. This comprehensive review
serves as a valuable resource for researchers and practitioners in the field of mobile security by
shedding light on the current landscape of Android malware detection and its associated challenges
and solutions.
This paper proposed a novel two-stage ensemble deep learning-based attack detection and attack
attribution framework for imbalanced ICS data. The attack detection stage uses deep representation
learning to map the samples to a new higher-dimensional space and applies a decision tree (DT) to
detect the attack samples. This stage is robust to imbalanced datasets and capable of detecting
previously unseen attacks. The attack attribution stage is an ensemble of several one-vs-all
classifiers, each trained on a specific attack attribute. The entire model forms a complex DNN with a
partially connected and a fully connected component that can accurately attribute cyberattacks.
Despite the complex architecture of the proposed framework, the computational complexities of the
training and testing phases are O(n^4) and O(n^2) respectively (n is the number of training samples),
which are similar to those of other DNN-based techniques in the literature. Moreover, the proposed
framework can detect and attribute the samples in a timely manner with better recall and F-measure
than previous works. Future extensions include the design of a cyber-threat hunting component to
facilitate the identification of anomalies invisible to the detection component, for example by building
a normal profile over the entire system and its assets.
Kinan Ghanem, Francisco J. Aparicio-Navarro, Konstantinos G. Kyriakopoulos, Sangarapillai
Lambotharan, Jonathon A. Chambers, et al. (2022): In this paper, we considered the SVM as an ML
technique that could complement the performance of our IDS, or as an alternative detection technique.
We have assessed the performance of our unsupervised anomaly-based IDS against one-class and two-
class SVMs, using linear and non-linear with RBF forms. In order to provide variability to the
experiments, the analysis has been implemented with a number of network traffic datasets, gathered
from real networks, comprising different types of attacks. First, we have assessed which of the SVMs
produces the best detection results. This assessment analysis gives insight into the detection
performance of the SVM techniques. Then we have evaluated the performance of our IDS against
SVM techniques in tasks of intrusion detection. The results that we present show that the linear two-
class SVM generates the most accurate results overall. This technique reaches 100% of DR and OSR
for almost all the datasets. However, this SVM technique requires training data, previously labelled,
comprising both classes of data. On the other hand, the accuracy of the linear one-class SVM performs
comparably to the accuracy of the linear two-class SVM without the need for training datasets
associated with malicious data. Only in the case of Probing, the OSR reaches 81.67%. For the rest of
the datasets, the OSR reaches 99.25%. The DR for most datasets reaches at least 93.51%, and only
Probing generates false positive alarms. Our IDS detects all the malicious traffic in the WiFi datasets.
However, the accuracy of the IDS when analysing the dataset Probing decreases drastically. For this
dataset, the DR and FPr reach 18.82% and 15.99%, respectively. This may be due to the size and the
non-homogeneous nature of the dataset. Additionally, it is important to emphasise that these results are
completely unsupervised, generated without any additional information about the nature of the
network traffic dataset. Therefore, these results suggest that our anomaly-based IDS could benefit from
the use of ML techniques to increase its detection accuracy. The use of linear SVM, both two-class and
one-class with RBF forms, could potentially complement the performance of our IDS especially when
non-homogeneous data are analysed.
Shailendra Singh and Sanjay Silakari, et al. (2022): The study of cyber attack detection systems is
quite young relative to many other areas of system research and it stands to reason that this topic offers
a number of opportunities for future exploration. Cyber attack detection systems vary in the sources
they use to obtain data and in the specific techniques they employ to analyze this data. Most systems
today classify data either by misuse detection or anomaly detection: each approach has its relative
merits and is accompanied by a set of limitations. It is likely not realistic to expect that a cyber attack
detection system be capable of correctly classifying every event that occurs on a given system. Desired
features for the cyber attack detection system depend on both the methodology and the modeling
approach used in building the cyber attack detection system. These features are usually numerous.
Thus, considering the volume of data, processing all of them will take quite a while. In order to speed
up the process, these features are usually pre-processed to reduce their size while increasing their
information value. Numerous approaches have been reported in this area, but there is still a need for
new methodologies that reduce the input features of the network data without degrading the accuracy
of the system.

2.1 Comparative study:-
S. No. | Title | Author | Publication | Methodology | Year
1 | An Ensemble Approach for Cyber Attack Detection System | Shailendra Singh & Sanjay Silakari | Published by Atlantis Press | DOS: Denial of Service attack; R2L: Remote to Local (User) attack; Probing: Surveillance and other probing | 2023
2 | Intelligent Cyber Attack Detection and Classification for Network-Based Intrusion Detection Systems | Nuno Oliveira, Isabel Praça, Eva Maia and Orlando Sousa | IOP Publishing | Intrusion detection systems; machine learning; anomaly detection; sequential analysis; random forest; multi-layer perceptron; long-short term memory | 2023
3 | A Novel Cyber Attack Detection Method in Networked Control Systems | Eman Mousavinejad & Qing-Long Han | IOP Publishing | Bias injection attacks; cyber attack detection; networked control systems (NCSs); replay attacks; set-membership filtering | 2023
4 | A survey of Cyber Attack Detection Strategies | Jamal Raiyn | IOP Publishing | CAD&CIS Infra Structures; IP tracking Process; User's Activity Tracking; Roaming Process | 2023
5 | Battle of the Attack Detection Algorithms: Disclosing Cyber Attacks on Water Distribution Networks | J. Water Resour | IJIIE - International Journal of Innovations & Implementations in Engineering | Water distribution systems; Cyber-physical attacks; Cyber security; EPANET; Smart water networks; Attack detection | 2023
6 | A Deep Learning Ensemble for Network Anomaly and Cyber-Attack Detection | Vibekananda Dutta, Michał Choraś, Marek Pawlicki and Rafał Kozik | Institute of Telecommunications and Computer Science, UTP University of Science and Technology | Anomaly detection; cyber-attacks; data pre-processing; deep learning; feature engineering; machine learning; network intrusion | 2023
7 | An Efficient Deep-Learning-Based Detection and Classification System for Cyber-Attacks in IoT Communication Network | Qasem Abu Al-Haija and Saleh Zein-Sabatto | Department of Electrical and Computer Engineering, Tennessee State University, Nashville, TN 37209, USA | Deep learning; convolutional neural network; IoT networks; cyber-attack detection; classification | 2022
8 | Toward Detection and Attribution of Cyber-Attacks in IoT-enabled Cyber-physical Systems | Amir Namavar Jahromi, Hadis Karimipour, Ali Dehghantanha and Kim-Kwang Raymond Choo | IEEE Internet of Things Journal | Cyber-attacks; Deep representation learning; Cyber threat detection; Cyber threat attribution; Industrial Control | 2022
9 | Support Vector Machine for Network Intrusion and Cyber-Attack Detection | Kinan Ghanem, Francisco J. Aparicio-Navarro, Konstantinos G. Kyriakopoulos, Sangarapillai Lambotharan, Jonathon A. Chambers | Institute for Digital Technologies, Loughborough University London | Classification Algorithms; Cyber Security; Intrusion Detection Systems; Machine Learning Techniques; Network Security; Support Vector Machine; SVM | 2022
10 | A Survey of Cyber Attack Detection Systems | Shailendra Singh and Sanjay Silakari | Rajiv Gandhi Technological University, Bhopal INDIA | Host Intrusion Detection Systems (HIDS); Network Intrusion Detection Systems (NIDS) | 2022
Fig 2.1.1 shows the Comparative Study of Research Papers

DETECTION OF CYBER
ATTACKS USING ML
CHAPTER – 3
PROPOSED METHODOLOGY
(A) Project Planning and Scope Definition:
• Define the scope and objectives of the website threat detection system.
• Establish a project plan outlining tasks, timelines, and resource requirements.
(B) Requirement Gathering:
• Engage with stakeholders, including cybersecurity experts, to identify specific requirements and challenges.
• Gather information on the types of threats.
(C) System Architecture Design:
• Design the architecture of the website threat detection system.
• Define the components, modules, and interactions required for effective detection.
(D) Data Collection and Preprocessing:
• Explanation of the process for collecting datasets from various sources such as repositories and malware databases.
• Preprocessing steps including data cleaning, feature extraction, and labeling of benign and malicious samples.
• Handling of imbalanced datasets and strategies for ensuring data quality.
Fig. 3.1 Data Preprocessing Diagram
Fig 3.2 Dataset of Test

(E) Feature Extraction and Selection:
• Implement techniques for feature extraction from PhishTank data.
• Utilize feature selection methods to identify and prioritize relevant attributes for detection.
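As an illustration of step (E), the sketch below derives a few lexical features from a URL using the standard library. The feature set, the `extract_features` name, and the short `SHORTENERS` list are assumptions chosen for the example; the keys `ip_present` and `is_url_shortened` simply reuse the property names that appear in the result template later in this report.

```python
import re
from urllib.parse import urlparse

# Hypothetical shortener list for illustration; a real list would be longer.
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co"}
IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url):
    """Return a dict of simple lexical features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "ip_present": int(bool(IPV4_RE.match(host))),
        "is_url_shortened": int(host in SHORTENERS),
    }


features = extract_features("http://192.168.0.1/login?id=42")
```

Each feature is numeric, so the resulting dictionaries can be turned into feature vectors for whichever classifier is trained in step (H).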
(F) Signature-Based Detection:
• Utilization of predefined patterns or signatures to identify known phishing attacks.
• Implementation of signature matching algorithms and integration into the detection system.
• Evaluation of signature-based detection performance and limitations.
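A signature matcher of the kind described in (F) might look like the sketch below. The blocklisted hosts and the regular expressions are invented for illustration and are not taken from the project or from any real threat feed; `match_signatures` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

# Hypothetical signatures for illustration only.
BLOCKLISTED_HOSTS = {"phish.example", "malware.example"}
SIGNATURE_PATTERNS = [
    re.compile(r"paypa1|g00gle", re.IGNORECASE),  # look-alike brand names
    re.compile(r"@"),                             # '@' used to disguise the real host
]


def match_signatures(url):
    """Return the names of all signatures that the URL triggers."""
    hits = []
    host = urlparse(url).hostname or ""
    if host in BLOCKLISTED_HOSTS:
        hits.append("blocklisted_host")
    for pattern in SIGNATURE_PATTERNS:
        if pattern.search(url):
            hits.append(f"pattern:{pattern.pattern}")
    return hits
```

The limitation noted in the last bullet is visible here: a URL that matches none of the stored patterns returns an empty list, so previously unseen attacks pass through unless another detection method catches them.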
(G) Behavior-Based Detection:
• Real-time analysis of application behavior to identify suspicious or harmful actions.
• Implementation of behavior monitoring algorithms and integration into the detection system.
• Techniques for capturing dynamic behavior patterns and deviations from normal behavior.
• Evaluation of behavior-based detection effectiveness and challenges.
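One simple way to express "deviation from normal behavior" in (G) is a z-score test against a baseline of normal measurements. The sketch below is an assumption about how such a check could be written, not the project's monitoring algorithm; the metric (requests per minute), the sample values, and the threshold of 3 standard deviations are illustrative.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` sample standard
    deviations away from the mean of the baseline measurements."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return observed != mu  # constant baseline: any change is a deviation
    return abs(observed - mu) / sigma > threshold


# Hypothetical baseline: outbound requests per minute during normal use.
normal_rates = [10, 12, 11, 9, 10, 11, 12, 10]
```

A value close to the baseline, such as 11, is accepted, while a burst of 40 requests per minute is flagged.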
(H) Machine Learning Approaches:
• Utilization of supervised learning algorithms such as Support Vector Machines (SVM) and Random Forests.
• Feature engineering and selection to extract relevant features from website data.
• Training machine learning models using labeled datasets.
• Evaluation of machine learning model performance through cross-validation and testing on unseen data.
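The cross-validation procedure mentioned in (H) can be sketched without any external library. In the example below a toy one-feature threshold rule stands in for the SVM or Random Forest model, purely to keep the sketch self-contained; the function names and the sample data are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_threshold(xs, ys):
    """Toy stand-in model: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def cross_validate(xs, ys, k=5):
    """Return the mean accuracy over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        threshold = fit_threshold([xs[i] for i in train_idx],
                                  [ys[i] for i in train_idx])
        correct = sum((xs[i] > threshold) == bool(ys[i]) for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical single feature (e.g. URL length) with alternating labels,
# so every contiguous fold contains both classes.
xs = [20, 95, 25, 90, 30, 85, 22, 99, 28, 91]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
mean_accuracy = cross_validate(xs, ys, k=5)
```

Each sample is used for testing exactly once and for training in the remaining folds, which is what makes the averaged score a fairer estimate than a single train/test split.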
(I) Dynamic Analysis and Behavioural Pattern Recognition:
• Explanation of dynamic analysis techniques for web applications.
• Implementation of sandboxing or emulation environments for website security and observing their behavior.
• Real-time monitoring of application activities and system calls to detect malicious behavior.
• Integration of dynamic analysis modules into the detection system for continuous monitoring.
• Evaluation of dynamic analysis effectiveness in identifying zero-day threats and previously unseen malware.
(J) Hybrid Approach Integration:
• Exploration of hybrid methodologies combining signature-based, behavior-based, and machine learning approaches.
• Development of an ensemble detection system that leverages the strengths of each technique.
• Integration of multiple detection modules for comprehensive threat identification and mitigation.
• Evaluation of the hybrid approach performance compared to individual detection methods.
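The ensemble idea in (J) can be illustrated with a weighted combination of the three detector outputs. The weights, the decision threshold, and the function names below are assumptions for the sake of the example; the report does not specify how the modules are combined.

```python
def hybrid_score(signature_hit, behavior_score, ml_probability,
                 weights=(0.5, 0.2, 0.3)):
    """Combine three detector outputs into one risk score in [0, 1].

    signature_hit  -- True if any known signature matched
    behavior_score -- anomaly score already scaled to [0, 1]
    ml_probability -- classifier's estimated probability of 'malicious'
    """
    w_sig, w_beh, w_ml = weights
    score = (w_sig * float(signature_hit)
             + w_beh * behavior_score
             + w_ml * ml_probability)
    return score / (w_sig + w_beh + w_ml)  # normalise by the total weight


def classify(score, threshold=0.5):
    """Turn the combined risk score into a label."""
    return "malicious" if score >= threshold else "benign"
```

With these illustrative weights a signature match alone is enough to cross the threshold, whereas moderate behavior and classifier scores without a signature match are not.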
(K) User Interface and Feedback Mechanism:
• Design and implementation of a user-friendly interface for the threat detection system.
• Visualization of detection results, including identified threats and recommended actions.
• Incorporation of feedback mechanisms for users to report false positives or provide additional information.
• Integration of user education materials and best practices for web security.
(L) Performance Evaluation and Validation:
• Comprehensive evaluation of the detection system's performance metrics, including accuracy, precision, and recall.
• Testing the system on diverse datasets containing both known and unknown phishing samples.
• Validation of detection results through manual analysis and comparison with ground truth labels.
• Benchmarking against existing cyber-attack detection solutions and commercial antivirus software.
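The metrics named in (L) follow directly from the confusion-matrix counts. The sketch below computes them for binary labels (1 = malicious, 0 = benign); the label vectors are made-up values used only to exercise the functions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 = malicious."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall, guarding against empty denominators."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical ground truth and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
```

Precision reflects how many flagged URLs were truly malicious (false positives lower it), while recall reflects how many malicious URLs were caught (false negatives lower it).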
(M) Performance Optimization:
• Optimize the system's performance by identifying and addressing bottlenecks.
• Implement mechanisms for continuous improvement and adaptation to emerging threats.
(N) Documentation and Knowledge Transfer:
• Document the development process, system architecture, and algorithms.
• Facilitate knowledge transfer to ensure the maintainability of the system.
(O) User Training and Deployment:
• Provide training sessions for end-users, administrators, and cybersecurity personnel.
• Deploy the website threat detection system in a controlled environment.
(P) Monitoring and Continuous Improvement:
• Implement monitoring mechanisms to track the system's performance in real-world scenarios.
• Continuously update and improve the system based on feedback, new threats, and the evolving cybersecurity landscape.

3.1 Design and Technologies:
1) Technologies and Framework Used:

• HTML:- Hypertext Markup Language or HTML is the standard markup language for
documents designed to be displayed in a web browser. It defines the content and
structure of web content. It provides structural semantics for text such as headings,
paragraphs, lists, links, quotes, and other items. Web browsers receive HTML documents
from a web server or from local storage and render the documents into multimedia web
pages.
• CSS:- Cascading Style Sheets is a style sheet language used for specifying the
presentation and styling of a document written in a markup language such as HTML or
XML. CSS is a cornerstone technology of the World Wide Web, alongside HTML and
JavaScript. CSS is designed to enable the separation of content and presentation,
including layout, colours, and fonts. This separation can improve content accessibility
and provide more flexibility and control in the specification of presentation characteristics.
• Bootstrap:- Bootstrap is a free and open-source CSS framework directed at responsive,
mobile-first front-end web development. It contains HTML, CSS and JavaScript-based
design templates for typography, forms, buttons, navigation, and other interface
components.
• Pandas:- Pandas is an open-source software library built on top of Python specifically
for data manipulation and analysis. It offers data structures and operations for powerful,
flexible, and easy-to-use data analysis and manipulation. Pandas strengthens Python by
giving it the capability to work with spreadsheet-like data, enabling fast loading,
aligning, manipulating, and merging, in addition to other key functions.
2) Design:-
• Home Page:- The homepage of a website serves as its virtual front door, welcoming
visitors and providing a glimpse into what the site offers. It guides users to navigate
further into the site's content while showcasing key features, products, or services.
The home page consists of a navigation bar where the user interface elements are
present.

Fig. 3.1.1 Home Page
Fig 3.1.2 Depicting Safety

3.2 Hardware and Software Specifications
Hardware Requirement
Processor : Intel i3 or above
Operating System : Windows 7 or later
RAM : 2GB (min)
Hardware Devices : Desktops, Tablets, Mobiles
Hard Disk : 128GB (min)
Keyboard : Standard
Software Requirement
Technology implemented: Full-Stack Web Development
Language Used : HTML, CSS, JavaScript, React
Database : PhishTank
Version Control : Git/GitHub
Deployment : Vercel, Docker
Web Browser : All Existing Browsers
IDE : VS Code
3.3 Implementation (Coding)
3.4 Front-End of the project.
3.5 HTML:-
<!DOCTYPE html>
<html>
<body>
{% block content %}
<div class="container">
<a href="/" class="logo">

<h1 itemprop="name">SurfSecure</h1>
</a>
<div class="short-note">
<p itemprop="description">Protect yourself from <strong>phishing attacks</strong> with the help of
<strong>FOSS</strong>. Surf secure with <strong>SurfSecure</strong>.</p>
</div>
<form action="/" method="post">

<input type="text" name="url" placeholder="URL" required="required" />
<button type="submit" class="btn" onclick="showLoadingSpinner()">Verify URL</button>
</form>

{% if output != "NA" %}
<div class="result">
{% if output.status == "SUCCESS" %}


<strong>
{% if output.trust_score >= 0 and output.trust_score < 60 %}

<span style="color: red; font-size: 1.25rem">Trust Score : {{output.trust_score}} / 100</span>
{% elif output.trust_score >= 60 and output.trust_score < 70 %}
<span style="color: orange; font-size: 1.25rem">Trust Score : {{output.trust_score}} / 100</span>
{% elif output.trust_score >= 70 and output.trust_score < 90 %}
<span style="color: yellowgreen; font-size: 1.25rem">Trust Score : {{output.trust_score}} / 100</span>
{% else %}
<span style="color: green; font-size: 1.25rem">Trust Score : {{output.trust_score}} / 100</span>
{% endif %}
</strong>
<br>
URL : {{output.url}}
{% if output.msg is defined %}
<br>
{{output.msg}}
{% endif %}
{% if output.response_status != False %}
<br><br>
<form id="preview" class="preview-form" action="{{ url_for('preview')}}" method="POST" target="_blank">
<input type="hidden" name="url" value="{{output.url}}">
</form>
<button class="preview-button" onclick="document.getElementById('preview').submit()">Preview URL within SurfSecure</button>
<form id="source-code" class="source-code-form" action="{{ url_for('view_source_code')}}" method="POST" target="_blank">
<input type="hidden" name="url" value="{{output.url}}">
</form>
<button class="preview-button" onclick="document.getElementById('source-code').submit()">Show Source Code of URL</button>
<br><br>(External scripts are disabled for your safety.)
{% else %}
<br><br>
Can not access this page at the moment. Page may be down or may have blocked viewing with scripts.
{% endif %}
<br><br><br>
<strong>Info for Nerds</strong>
<br><br>
<table class="table-view">
<thead>
<tr>
<th>Property</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Global Rank</td>
<td>{{output.rank}}</td>
</tr>
<tr>
<td>HTTP Status Code</td>
<td>{{output.response_status}}</td>
</tr>
<tr>
<td>Domain Age</td>
<td>{{output.age}}</td>
</tr>
<tr>
<td>Use of URL Shortener</td>
<td>{% if output.is_url_shortened == 1%} YES {% else %} NO {% endif %}</td>
</tr>
<tr>
<td>HSTS Support</td>
<td>{% if output.hsts_support == 1%} YES {% else %} NO {% endif %}</td>
</tr>
<tr>
<td>IP instead of Domain</td>
<td>{% if output.ip_present == 1%} YES {% else %} NO {% endif %}</td>
</tr>

<tr>
<td>URL Redirects</td>
<td>{% if output.url_redirects == 0%} NO {% else %} {% for value in output.url_redirects
%} {{ value }} {% endfor %} {% endif %}</td>
</tr>
<tr>
<td>IP of Domain</td>
<td>{{output.ip}}</td>
</tr>
</tbody>
</table>
<br><br>
<strong> WHOIS Data </strong>
<br><br>
<table class="table-view">
<thead>
<tr>
<th>Property</th>
<th>Value</th>
</tr>
</thead>
<tbody>
{% for key, value in output.whois.items() %}
<tr>
<td>{{ key }}</td>
<td>{{ value }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% else %} URL : {{output.url}} <br> Message : {{output.msg}} <br> {% endif %}

<br><br>
</div>
{% endif %} {% endblock %}
</body>
</html>
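The colour bands applied to the trust score in the template above can be restated as a small function, which makes the thresholds easier to check. `trust_band` is a hypothetical helper written for this explanation; it is not part of the project's code and simply mirrors the template's `if`/`elif`/`else` chain.

```python
def trust_band(trust_score):
    """Map a trust score to the colour used by the result template."""
    if 0 <= trust_score < 60:
        return "red"
    if 60 <= trust_score < 70:
        return "orange"
    if 70 <= trust_score < 90:
        return "yellowgreen"
    # Same as the template's final {% else %}: every remaining value,
    # including scores of 90 and above, is shown in green.
    return "green"
```

Because the last branch is a catch-all, any value outside the range 0 to 89 is rendered green, exactly as in the template, so the back end is expected to supply scores between 0 and 100.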

3.6 CSS.
body {
/* background-color: #5ccedf; */
color: #fff;
font-family: 'Montserrat', sans-serif;
display: flex;
flex-direction: column;
overflow-x: hidden;
background-image: url(https://i.pinimg.com/originals/0e/bb/f9/0ebbf952bdd42ebac8f1f521bba208e8.jpg);
background-repeat: no-repeat;
background-position: center;
background-attachment: fixed;
background-size: cover;
/* min-height: 100vh; */
}
h1 {
font-size: 4rem;
margin-bottom: 1rem;
text-align: center;
color: #4350da;
/* text-transform: uppercase; */
}
.logo {
text-decoration: none;
color: inherit;
}
.short-note {
color: inherit;
text-align: center;
padding: 0 15px 0;
}
.result {
max-width: 100%;
color: inherit;
text-align: left;
margin: 0 auto;
margin-left: 20px;
padding: 10px;
width: 90%;
box-sizing: border-box;
word-wrap: break-word;
}
.result>* {
word-wrap: break-word;}
form {
display: flex;
align-items: center;
padding-bottom: 5%;
}
.preview-form {
padding-bottom: 0;
}
.source-code-form {
padding-bottom: 0;
}
input[type="text"] {
border: none;
border-radius: 0.5rem;
padding: 1rem;
font-size: 1.2rem;
background-color: #1D1D1D;
color: #ffffff;
width: 80%;
}
button[type="submit"] {
border: none;
padding: 0.8rem 2rem;
font-size: 1.2rem;
background-color: #6e49ff;
color: #fff;
cursor: pointer;
transition: background-color 0.2s ease;
}
button[type="submit"]:hover {
background-color: #623df9;
}
.preview-button {
border: none;
padding: 10px;
font-size: 1rem;
background-color: #4350da;
color: #fff;
cursor: pointer;
width: 290px;
margin-top: 10px;
margin-bottom: 5px;
transition: background-color 0.2s ease;
}
.preview-button:hover {
background-color: #623df9;
}
table {
/* width: 50% !important; */
table-layout: fixed;
}
th,
td {
max-width: 38vw;
padding: 5px;
}
output {
display: block;
margin-top: 2rem;
font-size: 1.2rem;
}
.container {
display: flex;
/* justify-content: center;
align-items: center; */
/* min-height: 100vh; */
height: auto;
padding-top: 10%;
padding-bottom: 4rem;
}
.app-footer {
color: #fff;
display: flex;
justify-content: center;
position: fixed;
bottom: 0;
left: 0;
margin-top: 10px;
z-index: 9999;
width: 100%;
height: 3rem;
}
.app-footer a {
color: #7653ff;
transition: color 0.2s ease;
}
.app-footer a:hover {
color: #6e49ff;
}
main {
flex: 1;
display: flex;
justify-content: center;
}
#preview-container {
border: 5px solid red;
margin: 0px;
margin-top: 5rem;
margin-bottom: 5px;
padding: 0;
}
#warning-message {
font-size: 24px;
font-family: 'Montserrat', sans-serif;
font-weight: bold;
position: fixed;
z-index: 9999;
top: 0;
left: 0;
width: 100%;
height: 5rem;
color: #fff;
padding: 5px;
}
@media only screen and (min-width: 600px) {

input[type="text"] {
width: 60%;
}
.result {
max-width: 100%;
padding: 10px;
margin-left: 30%;
box-sizing: border-box;
}
.result>* {
overflow-wrap: break-word;
}
th,
td {
max-width: 20vw;
padding: 5px;
}
table {
width: 50%;
}
}
/* Loading Spinner */
.spinner {
display: inline-block;
width: 25px;
height: 25px;
border-radius: 50%;
border: 3px solid #ccc;
border-top-color: #333;
animation: spin 1s ease-in-out infinite;
}
@keyframes spin {
to {
transform: rotate(360deg);
}
}

The HTML template and CSS listed above define the structure and styling of the SurfSecure page of the project
"Detection and classification of cyber attacks using ML". The page contains a URL submission form, a colour-coded
trust score, buttons for previewing the URL and viewing its source code, and tables of URL properties and WHOIS
data. A media query adjusts the widths of the input field, result block, and tables on screens at least 600px wide.

BACK-END:-
FLASK:-
from flask import Flask, request, render_template
from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin
from controller import Controller
import onetimescript
from db import db

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///domains.db'
db.init_app(app)
with app.app_context():
    db.create_all()

controller = Controller()


@app.route('/', methods=['GET', 'POST'])
def home():
    # A plain GET carries no form data, so the template receives 'NA'.
    try:
        url = request.form['url']
        output = controller.main(url)
    except Exception:
        output = 'NA'
    return render_template('index.html', output=output)


@app.route('/preview', methods=['POST'])
def preview():
    try:
        url = request.form.get('url')
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')

        # inject external resources into HTML
        for link in soup.find_all('link'):
            if link.get('href'):
                link['href'] = urljoin(url, link['href'])

        # Uncomment this if you want to enable scripts
        # for script in soup.find_all('script'):
        #     if script.get('src'):
        #         script['src'] = urljoin(url, script['src'])

        for img in soup.find_all('img'):
            if img.get('src'):
                img['src'] = urljoin(url, img['src'])

        return render_template('preview.html', content=soup.prettify())
    except Exception as e:
        return f"Error: {e}"


@app.route('/source-code', methods=['GET', 'POST'])
def view_source_code():
    try:
        url = request.form.get('url')
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')
        formatted_html = soup.prettify()
        return render_template('source_code.html', formatted_html=formatted_html, url=url)
    except Exception as e:
        return f"Error: {e}"


@app.route('/update-db')
def update_db():
    try:
        response = onetimescript.update_db()
        print("Database populated successfully!")
        return response, 200
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return "An error occurred: " + str(e), 500


@app.route('/update-json')
def update_json():
    try:
        response = onetimescript.update_json()
        print("JSON updated successfully!")
        return response, 200
    except Exception as e:
        print(f"An error occurred: {str(e)}")
        return "An error occurred: " + str(e), 500


if __name__ == '__main__':
    app.debug = True
    app.run()
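The preview route rewrites the `href` and `src` attributes with `urljoin` so that resources referenced by relative paths still load when the page is rendered from a different origin. The short sketch below shows how `urljoin` resolves the common cases; the page URL and resource paths are made-up examples.

```python
from urllib.parse import urljoin

# Hypothetical page being previewed.
page_url = "https://example.com/blog/post.html"

resolved = {
    # Relative to the page's directory.
    "style.css": urljoin(page_url, "style.css"),
    # Root-relative path: replaces the whole path.
    "/static/logo.png": urljoin(page_url, "/static/logo.png"),
    # Parent-directory reference.
    "../img/a.png": urljoin(page_url, "../img/a.png"),
    # Already absolute: returned unchanged.
    "https://cdn.example.net/x.css": urljoin(page_url, "https://cdn.example.net/x.css"),
}
```

Absolute URLs pass through untouched, so applying `urljoin` to every `link` and `img` tag is safe regardless of how the original page wrote its paths.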

CHAPTER – 4
4.1 RESULT AND DISCUSSION
4.1.1 RESULT
The proposed project "Detection and Classification of Cyber Attacks Using Machine Learning" has
successfully achieved its research objectives by developing a robust and effective system for
identifying and categorizing cyber attacks.
Through the implementation of advanced machine learning algorithms, such as neural networks and
decision trees, the project has enabled accurate detection and classification of various types of cyber
threats. The system's modular design and real-time analysis capabilities have contributed to efficient
cybersecurity management and decision-making processes.
In addition to its technical achievements, the project has prioritized data security and ethical
treatment of sensitive information, ensuring compliance with relevant laws and guidelines. This
commitment to responsible data handling has been a crucial aspect of the project's success, as it has
built trust and confidence in the system among stakeholders. Looking towards the future, the project
aims to further enhance its machine learning models and explore advanced algorithms for increased
accuracy and adaptability to evolving cyber threats.
This commitment to continuous improvement is crucial for the responsible and effective
management of cybersecurity risks, as it ensures that the system remains effective in addressing new
and emerging threats.
Overall, the project's successful achievement of its research objectives represents a significant step
forward in the field of cybersecurity, and its ongoing commitment to improvement will ensure that it
continues to make valuable contributions to the industry.

4.1.2 DISCUSSION
The successful achievement of the research objectives in the project "Detection and Classification of
Cyber Attacks Using Machine Learning" represents a significant advancement in the field of
cybersecurity. The development of a robust system for identifying and categorizing cyber attacks
through advanced machine learning algorithms has paved the way for more efficient cybersecurity
management and decision-making processes.
This achievement has been made possible through the dedication and hard work of the research
team, who have demonstrated a deep understanding of the complexities of cyber threats and a strong
commitment to developing innovative solutions. Moreover, the commitment to data security and
ethical treatment of sensitive information has built trust and confidence in the system among
stakeholders, ensuring compliance with relevant laws and guidelines.
This ethical approach to cybersecurity is essential in maintaining the integrity and reliability of the
system, as well as in fostering positive relationships with users and partners. Moving forward, the
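project aims to enhance its machine learning models and explore advanced algorithms for increased
accuracy and adaptability to evolving cyber threats. This commitment to continuous improvement is
crucial for the responsible and effective management of cybersecurity risks, ensuring that the
system remains effective in addressing new and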
emerging threats. By staying at the forefront of technological advancements and industry best
practices, the project will continue to deliver value to its stakeholders and make a meaningful impact
on the cybersecurity landscape.
Overall, the project's successful achievement of its research objectives and ongoing commitment to
improvement will continue to make valuable contributions to the industry, furthering the field of
cybersecurity and addressing the challenges posed by cyber threats. The dedication and expertise of
the research team, combined with a strong ethical foundation, will ensure that the project remains a
leader in the field and continues to drive positive change in cybersecurity.

CONCLUSION
The successful achievement of the research objectives in the project "Detection and Classification of Cyber
Attacks Using Machine Learning" represents a significant advancement in the field of cybersecurity. The
dedication and hard work of the research team have led to the development of a robust system for
identifying and categorizing cyber attacks through advanced machine learning algorithms, paving the way
for more efficient cybersecurity management and decision-making processes.
Cybersecurity is a critical aspect of modern society, as our reliance on digital technologies continues to
grow. With the increasing frequency and sophistication of cyber attacks, there is a pressing need for
advanced tools and techniques to detect, classify, and mitigate these threats.
The project's focus on leveraging machine learning for cybersecurity purposes is therefore timely and
essential. The project's success lies in its ability to harness the power of machine learning algorithms to
analyze vast amounts of data and identify patterns indicative of cyber attacks.
The development of a robust system for cyber attack detection and classification has the potential to
significantly enhance cybersecurity management and decision-making processes across various industries
and organizations.
By providing accurate and timely insights into potential threats, the system can empower cybersecurity
professionals to prioritize their efforts, allocate resources effectively, and respond swiftly to mitigate the
impact of cyber attacks. Moving forward, the project aims to enhance its machine learning models and
explore advanced algorithms for increased accuracy and adaptability to evolving cyber threats.
The dynamic nature of cyber attacks necessitates ongoing innovation and improvement in cybersecurity
technologies.
The project's commitment to ethical considerations in cybersecurity is essential for building trust and
confidence in the system among stakeholders and end-users. Overall, the commitment to continuous
improvement and ethical cybersecurity practices will ensure that the project remains a leader in the field and
continues to drive positive change in cybersecurity. The ongoing efforts to improve and innovate will further
solidify the project's position as a key player in the ongoing battle against cyber threats, ultimately
contributing to a safer and more secure digital environment for individuals, businesses, and organizations
worldwide.

In conclusion, the successful achievement of the research objectives in the project "Detection and
Classification of Cyber Attacks Using Machine Learning" represents a significant milestone in advancing
cybersecurity capabilities. The dedication of the research team, coupled with a commitment to continuous
improvement and ethical cybersecurity practices, positions the project as a leader in addressing the complex
challenges posed by cyber threats. As technology continues to evolve, the project's ongoing efforts to
innovate and enhance its machine learning models will ensure that it remains at the forefront of
cybersecurity advancements, ultimately contributing to a safer and more secure digital environment.

CHAPTER – 6
FUTURE SCOPE OF THE PROJECT

Despite the success achieved, there are promising avenues for future research and development to
further enhance the Android App Threat Detection System:
• Innovative Hybrid Approaches:
Explore unconventional hybrid methodologies that seamlessly integrate signature-based,
behavior-based, and machine learning techniques to optimize their synergies and mitigate
individual limitations.
  • Integration of Multiple Techniques: Investigate and implement hybrid methodologies that
  combine signature-based detection (which identifies known threats based on predefined patterns),
  behavior-based detection (which monitors and analyzes system behavior to identify anomalies), and
  machine learning techniques. This approach leverages the strengths of each method while mitigating
  their individual weaknesses, resulting in a more robust and comprehensive cyber attack detection
  system.
  • Optimized Synergies: Develop frameworks that optimize the synergy between these
  techniques, ensuring they complement each other effectively. For example, machine learning can
  enhance behavior-based systems by identifying subtle anomalies that signature-based methods
  might miss, while signature-based detection can provide quick identification of known threats,
  allowing machine learning models to focus on novel and sophisticated attacks.
• Transparent Machine Learning Models:
Develop machine learning models that prioritize interpretability and transparency, facilitating a
deeper understanding of decision-making processes and instilling trust in automated detection
systems.
  • Interpretability: Focus on creating machine learning models that are interpretable and
  transparent, ensuring that their decision-making processes are understandable to cybersecurity
  professionals. Techniques such as decision trees, attention mechanisms in neural networks, and
  explainable AI (XAI) methods will be employed to provide clear insights into how the models
  arrive at their conclusions.
  • Trust and Accountability: By prioritizing transparency, these models will help build trust in
  automated detection systems. Security analysts can verify and validate the decisions made by machine
  learning models, leading to greater acceptance and confidence in their use within cybersecurity
  operations.
• Real-time Threat Intelligence Fusion:
Explore the fusion of real-time threat intelligence feeds into the detection system to enable swift
adaptation to emerging threats, vulnerabilities, and evolving attack patterns.
  • Integration with Threat Intelligence Feeds: Incorporate real-time threat intelligence feeds into
  the machine learning-based detection system. These feeds provide up-to-date information on emerging
  threats, vulnerabilities, and attack patterns, enabling the system to stay current with the rapidly evolving
  cyber threat landscape.
  • Swift Adaptation: Develop mechanisms for the detection system to quickly adapt to new threats
  identified through threat intelligence. This includes updating model parameters, retraining models with
  new data, and dynamically adjusting detection strategies to address the latest cyber attack vectors
  effectively.
• Empowering Organizations and Individuals:
  • Developing Robust Defence Mechanisms: Providing organizations with advanced tools and strategies to
  detect and mitigate cyber attacks proactively.
  • Enhancing Cybersecurity Posture: Enabling individuals and organizations to adopt a more resilient
  stance against cyber adversaries, reducing the risk of successful attacks.
  • Educational and Training Resources: Offering comprehensive training materials and resources to
  help cybersecurity professionals understand and implement machine learning-based detection and
  classification systems.
