CYBER ATTACKS CLASSIFICATION USING SUPERVISED

MACHINE LEARNING TECHNIQUES

MINI PROJECT REPORT


Submitted in partial fulfillment of the requirements
for the award of the degree of
Bachelor of Computer Applications

SUBMITTED BY
SATHYA S
211314103314

Under the guidance of


Mrs. D. GAYATHRI, M.Sc.
GUEST LECTURER
BACHELOR OF COMPUTER APPLICATIONS

GURU NANAK COLLEGE


(AUTONOMOUS)
Affiliated to University of Madras
Accredited at ‘A++’ Grade by NAAC | An ISO 9001:2015 Certified Institution
Guru Nanak Salai, Velachery, Chennai – 600 042.
MARCH- 2024
GURU NANAK COLLEGE
(AUTONOMOUS)

Affiliated to University of Madras


Accredited at ‘A++’ Grade by NAAC | An ISO 9001:2015 Certified Institution
Guru Nanak Salai, Velachery, Chennai – 600 042.

BACHELOR OF COMPUTER APPLICATIONS

BONAFIDE CERTIFICATE

This is to certify that this is a bonafide record of the work done by SATHYA S, 211314103314, for the Final Year Project during the Academic Year 2023-24.

PROJECT GUIDE HEAD OF THE DEPARTMENT

Submitted for the Project Viva Voce Examination held on ________________ at

GURU NANAK COLLEGE (Autonomous), Guru Nanak Salai, Velachery, Chennai - 600 042.

Internal Examiner External Examiner


Date: Date:
DECLARATION

I, SATHYA S, 2113141033142, studying III Year, Bachelor of Computer Applications at Guru Nanak College (Autonomous), Chennai, hereby declare that this Report of my Project entitled CYBER ATTACKS CLASSIFICATION USING SUPERVISED MACHINE LEARNING TECHNIQUES is the record of the original work carried out by me under the Guidance and Supervision of Mrs. D. GAYATHRI towards the partial fulfillment of the requirements for the award of the Degree of Bachelor of Computer Applications. I further declare that this work has not been submitted anywhere before for the award of any Degree, Diploma or other similar title.

PLACE : CHENNAI SATHYA S


DATE : 2113141003142
ACKNOWLEDGEMENT

I would like to thank the Principal Dr. T. K. Avvai Kothai and Vice Principal Dr.
Anitha Malisetty for providing the necessary resources and facilities for the completion of this
project.

I extend my deepest thanks to Dr. K. RAVIYA, Head of the Department, whose guidance,
support, and encouragement were invaluable throughout this endeavor. Her expertise and insights
have been instrumental in shaping this project and enhancing its quality.

I owe my Guide Mrs. D. GAYATHRI a debt of gratitude for her invaluable guidance,
patience, and encouragement. Her mentorship has been a beacon of light, steering me through the
complexities of this project and helping me realize my potential.

I would also like to extend my thanks to the faculty members of the BACHELOR OF COMPUTER APPLICATIONS department for their valuable suggestions during the course of my project.

Last but not least, I thank my family and friends for their unwavering encouragement
and understanding during this journey.
TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Objective
   1.2 Modules of the Project

2. SYSTEM SPECIFICATION
   2.1 Hardware Requirements
   2.2 Software Requirements

3. SURVEY OF TECHNOLOGIES
   3.1 Features of the Front End
   3.2 Features of the Back End

4. SELECTED SOFTWARE
   4.1 HTML
   4.2 CSS
   4.3 JavaScript
   4.4 Bootstrap
   4.5 Python
   4.6 Django
   4.7 MySQL
   4.8 SQLite

5. SYSTEM ANALYSIS
   5.1 Existing System
   5.2 Characteristics of Proposed System

6. SYSTEM DESIGN
   6.1 Data Visualization
   6.2 Use Case Diagram
   6.3 Entity Relationship Diagram

7. PROGRAM CODING
   7.1 Source Code
   7.2 Screenshots

8. TESTING
   8.1 Software Testing
   8.2 Types of Testing

9. CONCLUSION

10. REFERENCE
ABSTRACT

This project addresses cyber-attack classification through the utilization of supervised machine learning methods. The
system is designed to categorize diverse cyber-attacks by employing a meticulously curated dataset
encompassing a wide array of attack types, including but not limited to malware, phishing, and distributed
denial-of-service (DDoS) attacks. Feature extraction techniques are applied to both network traffic data and
behavioural attributes, facilitating the training of a robust classification model. Various supervised learning
algorithms, such as decision trees, support vector machines, and neural networks, are evaluated for their
efficacy in accurately predicting attack categories. The training process involves labelling historical attack
instances, enabling the model to discern intricate patterns and subtle differentiators among attack types.
Regular model updates and retraining with new attack data ensure its relevance in dynamically evolving
threat landscapes. The system's predictive accuracy empowers cybersecurity teams to swiftly identify and
respond to cyber threats, thereby bolstering overall defense strategies. Through this research, we contribute
to the proactive identification and mitigation of cyber-attacks, ultimately fortifying digital security
frameworks.
INTRODUCTION

1. INTRODUCTION

1.1 OBJECTIVES
The objective of this research is to explore and highlight the significance of employing supervised
machine learning techniques for the classification of cyber-attacks in the realm of modern
cybersecurity. The focus is on leveraging labelled datasets to train algorithms for the swift and accurate
identification and categorization of diverse cyber threats. The ultimate goal is to enable organizations
to respond effectively, mitigate potential damage, and strengthen their overall cybersecurity defenses.

• Pivotal Aspect of Cybersecurity: Recognition of the pivotal role played by supervised machine learning techniques in the contemporary landscape of cybersecurity.

• Growing Sophistication and Frequency of Cyber Threats: Acknowledgment of the increasing complexity and frequency of cyber threats in the evolving digital environment.

• Swift and Accurate Categorization: Emphasis on the ability of supervised machine learning to provide a quick and accurate categorization of various types of cyber-attacks, aiding organizations in timely responses.

• Leveraging Labelled Datasets: Highlighting the crucial role of labelled datasets in training machine learning algorithms for effective cyber-attack classification.

• Challenges in Cybersecurity Classification: Recognition of challenges, including the diversity of attack methods, adaptability of attackers, and imbalanced data, which contribute to the complexity of cyber threat classification.

• Applications Across Cybersecurity Domains: Identification of diverse applications, ranging from intrusion detection and email filtering to malware identification and anomaly detection, showcasing the versatility of supervised machine learning in cybersecurity.
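As a minimal, hedged illustration of the supervised workflow outlined above (this sketch is not part of the project's code), a classifier can be trained on a small set of hand-labelled feature vectors and then asked to categorize a new sample; the feature values and labels are invented purely for illustration.

from sklearn.tree import DecisionTreeClassifier

# Each labelled row is a hypothetical (packet_rate, failed_logins) pair.
X_train = [[900, 0], [950, 1], [5, 40], [3, 55], [10, 0], [12, 1]]
y_train = ["DDoS", "DDoS", "Brute-force", "Brute-force", "Benign", "Benign"]

# Train on the labelled data, then categorize an unseen traffic sample.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict([[880, 2]]))  # expected output: ['DDoS']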
1.2 MODULES OF THE PROJECT

• Data Pre-processing
• Data Analysis and Visualization
• Implementing Algorithm 1
• Implementing Algorithm 2
• Implementing Algorithm 3
• Deployment

1. Data Pre-processing:

- Cleans and prepares raw data, addressing missing values and optimizing features for subsequent
analysis.

2. Data Analysis and Visualization:

- Extracts insights and patterns through statistical analysis and visualization, laying the groundwork for
informed decision-making.

3. Algorithm Implementation (1, 2, 3):

- Applies and evaluates multiple algorithms (1, 2, 3) to identify the most effective solution based on
performance metrics.

4. Deployment:

- Integrates the selected algorithm into a practical setting, ensuring it is adapted for operational use with
user interfaces and continuous monitoring.
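The sketch below shows, in highly simplified form, how these modules could be chained together; the file name, the 'Label' column, and the three candidate algorithms mirror the source code in Chapter 7, but this is a hedged outline rather than the project's exact implementation.

import pandas as pd
import joblib
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from catboost import CatBoostClassifier

# 1. Data pre-processing: clean the raw dataset.
df = pd.read_csv('CYBER.csv').dropna().drop_duplicates()

# 2. Data analysis / visualization would be performed here (see Chapter 6).

# 3. Algorithm implementation: train several models and keep the best one.
x = df.drop(labels='Label', axis=1)
y = df['Label']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42)

models = [GaussianNB(), AdaBoostClassifier(), CatBoostClassifier(verbose=0)]
best = max(models, key=lambda m: accuracy_score(y_test, m.fit(x_train, y_train).predict(x_test)))

# 4. Deployment: persist the winning model for use in the application.
joblib.dump(best, 'best_model.pkl')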

SYSTEM SPECIFICATION

2. SYSTEM SPECIFICATION

2.1 Hardware Requirements

Processor : Pentium IV/III

Hard disk : minimum 80 GB

RAM : minimum 2 GB

2.2 Software Requirements

Operating System : Windows

Tool : Anaconda with Jupyter Notebook

SURVEY OF
TECHNOLOGIES

3.SURVEY OF TECHNOLOGIES

3.1 FEATURES OF FRONT END

The part of an application that the user interacts with directly is termed the front end. It is also referred to as the 'client side' of the application. It includes everything that users experience directly: text colors and styles, images, graphs and tables, buttons, colors, and the navigation menu. HTML, CSS, and JavaScript are the core languages used for front-end development. The structure, design, behaviour, and content of everything seen on screen when websites, web applications, or mobile apps are opened is implemented by front-end developers. Responsiveness and performance are the two main objectives of the front end. The developer must ensure that the site is responsive, i.e. it appears correctly on devices of all sizes, and no part of the website should behave abnormally irrespective of the size of the screen. Some front-end development tools are HTML, CSS, XML, Bulma, Tailwind CSS, and Sass.

3.2 FEATURES OF BACK END

Backend is the server side of the application/website. It stores and arranges data, and also makes sure everything on the client side of the application/website works fine. It is the part of the application/website that you cannot see or interact with directly. It is the portion of software that does not come in direct contact with the users. The parts and characteristics developed by backend developers are indirectly accessed by users through a frontend application. Activities like writing APIs, creating libraries, and working with system components without user interfaces, or even systems of scientific programming, are also included in the backend. Some back-end development tools are PHP, Java, C++, Python, Firebase, and MySQL.

SELECTED SOFTWARE

4.SELECTED SOFTWARE

4.1 HTML

HTML, or HyperText Markup Language, is the fundamental language of web development. Created by
Tim Berners-Lee, HTML uses tags to structure content, define links, and incorporate multimedia. Key
features include hyperlinks for navigation, support for multimedia elements, interactive forms, semantic
markup for accessibility, and cross-browser compatibility. HTML continues to evolve, with HTML5
introducing new features like canvas for graphics and improved support for mobile devices. As the
backbone of the web, HTML is essential for creating structured and visually appealing online content.

Features:

• New features should be based on HTML, CSS, the DOM, and JavaScript.
• The need for external plug-ins (like Flash) should be reduced.
• Error handling should be easier than in previous versions.
• Scripting should be replaced by more markup.
• Some of the most interesting new features in HTML5 are:
• The <canvas> element for drawing.
• The <video> and <audio> elements for media playback.
• Support for local storage.
4.2 CSS

Cascading Style Sheets (CSS) is a vital web development technology that complements HTML by styling
web pages. Using selectors and declarations, CSS separates content and presentation, allowing developers
to define the appearance of HTML elements. Key features include layout control, responsiveness through
media queries, external style sheets for modularity, and support for animations. CSS enhances the visual
appeal and consistency of web pages, playing a crucial role in creating engaging and well-designed online
content.

KEY FEATURES:

• Selectors: CSS3 introduces several new selectors that allow you to target specific elements in a more precise way, such as :nth-child(), :not(), and :checked.
• Box model: CSS3 adds new properties for controlling the size, padding, border, and margin of boxes, such as box-sizing, border-radius, and box-shadow.
• Colors: CSS3 introduces new color formats, such as HSL and RGBA, which allow you to specify colors in a more intuitive way.
• Fonts: CSS3 adds new properties for controlling the font size, style, and weight, as well as new font formats, such as web fonts.
4.3 JAVASCRIPT

JavaScript is predominantly used for client-side scripting in web development. It runs directly
in the web browser, enabling developers to create dynamic and interactive web pages that respond to
user actions in real time without needing to communicate with the server. JavaScript is the backbone
of many modern web applications, including social media platforms, online collaboration tools, and
e-commerce websites. It allows developers to create rich, interactive user interfaces and deliver a
seamless user experience. With the advent of platforms like Node.js, JavaScript can also be used for
server-side development. Node.js allows developers to build scalable and high-performance web
servers and backend services using JavaScript. JavaScript allows manipulation of the Document
Object Model (DOM), enabling developers to dynamically update and modify the content, structure,
and style of web pages based on user actions or application state changes.

KEY FEATURES:

• Dynamic Content
• Event Handling
• Data Manipulation
• Modularity
• Event-Driven Programming

4.4 BOOTSTRAP

Bootstrap is a popular front-end framework that provides a collection of CSS, JavaScript, and HTML components for building responsive, mobile-first web applications. It was first released in 2011 by Twitter and has since become one of the most widely used front-end frameworks on the web.

KEY FEATURES:

• Responsive design: Bootstrap's grid system makes it easy to create responsive designs that adapt to different screen sizes and devices.

• Pre-designed UI components: Bootstrap includes a large set of pre-designed UI components, such as buttons, forms, tables, and navigation bars, that can be easily customized to fit your application's needs.
4.5 PYTHON

Python is a widely used, high-level programming language renowned for its readability and versatility. Created in 1991, Python's simplicity, extensive standard library, and dynamic typing make it suitable for diverse applications, including web development and data analysis. Its community-driven development, cross-platform compatibility, and ease of learning contribute to Python's popularity and widespread adoption.

KEY FEATURES:

• Readable Syntax: Clear and concise code structure for readability.

• Versatility: Supports both procedural and object-oriented programming.

• Extensive Library: Rich standard library simplifies development.

• Interpreted and Interactive: Allows rapid testing and experimentation.

4.6 Django

Django is a high-level Python web framework known for rapid development. With an MVC
architecture, built-in ORM system, and templating engine, it simplifies common tasks.
Features like an automatic admin interface, security measures, and scalability contribute to its
popularity. Supported by a vibrant community, Django is versatile, suitable for various
applications, and includes Django REST framework for API development.

Key Features :

1. Rapid Development: Facilitates quick and efficient web development.

2. MVC Architecture: Organizes code in a Model-View-Controller pattern.

3. ORM System: Simplifies database interactions with Python code.

4. Django REST framework: Enhances Django for modern API development.
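As a brief, hedged sketch of how the ORM described above might be used in this context (the model name and fields are assumptions, not part of the project), a classified attack could be stored as a Django model and queried without writing SQL:

from django.db import models

class AttackRecord(models.Model):
    # Hypothetical fields for a classified network flow.
    label = models.CharField(max_length=50)          # predicted attack category
    flow_duration = models.FloatField()              # example feature value
    detected_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"{self.label} at {self.detected_at}"

# Example ORM query (assumes the app is registered and migrations are applied):
# recent = AttackRecord.objects.filter(label='DDoS').order_by('-detected_at')[:10]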

4.7 MySQL

MySQL is an open-source relational database management system (RDBMS)


that uses Structured Query Language (SQL). One of its key features is its support for
ACID (Atomicity, Consistency, Isolation, Durability) properties, ensuring data
integrity and reliability even in complex transactional scenarios.

Key Features :

• Data Querying: Enables efficient retrieval and manipulation of data.

• Data Definition Language (DDL): Defines and modifies database structures.

• Data Manipulation Language (DML): Manipulates data within the database.

• Data Integrity: Ensures accuracy and consistency of stored data.

• Transaction Control: Manages transactions for data consistency.

• Security: Implements access controls and permissions for data protection.

• Scalability: Adaptable for handling growing volumes of data.
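A minimal sketch of these features from Python, assuming the mysql-connector-python package and placeholder credentials (none of these names come from the project), touches DDL, DML, transaction control, and querying:

import mysql.connector

# Placeholder connection details for illustration only.
conn = mysql.connector.connect(host='localhost', user='root',
                               password='secret', database='cyber')
cur = conn.cursor()

# DDL: define a table for classified attacks.
cur.execute("CREATE TABLE IF NOT EXISTS attacks ("
            "id INT AUTO_INCREMENT PRIMARY KEY, "
            "label VARCHAR(50), flow_duration DOUBLE)")

# DML inside a transaction: insert a record, then commit for consistency.
cur.execute("INSERT INTO attacks (label, flow_duration) VALUES (%s, %s)", ("DDoS", 1.23))
conn.commit()

# Querying: retrieve the stored rows.
cur.execute("SELECT label, flow_duration FROM attacks")
print(cur.fetchall())

cur.close()
conn.close()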

4.8 SQLite

SQLite is a lightweight and serverless relational database management system, known for its
simplicity and efficiency. With a zero-configuration approach, it operates from a single file,
making it easy to integrate into various applications. SQLite supports standard SQL syntax, is
cross-platform, and boasts a low memory footprint, making it a popular choice for embedded
systems, mobile apps, and desktop software. As open-source software, SQLite has a robust
community providing support and resources for developers.

Key Features :

• Serverless & Embedded: Lightweight and serverless, operates from a single file.

• Zero Configuration: Requires minimal setup for easy integration.

• Cross-Platform: Compatible with various operating systems.

• Transactional Support: Ensures reliable transaction processing.

• SQL Compatibility: Follows standard SQL syntax for seamless use.

• Low Memory Footprint: Designed for memory efficiency.

• Open Source: Freely available and modifiable.
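The sketch below illustrates SQLite's zero-configuration, file-based style using Python's built-in sqlite3 module; the file and table names are illustrative assumptions, not taken from the project.

import sqlite3

# The database is just a single file, created on first use.
conn = sqlite3.connect('cyber_results.db')
cur = conn.cursor()

# Standard SQL syntax works as-is; changes are wrapped in a transaction.
cur.execute("CREATE TABLE IF NOT EXISTS predictions (label TEXT, score REAL)")
cur.execute("INSERT INTO predictions VALUES (?, ?)", ("Phishing", 0.97))
conn.commit()

for row in cur.execute("SELECT label, score FROM predictions"):
    print(row)

conn.close()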

SYSTEM ANALYSIS

5.SYSTEM ANALYSIS

5.1 Existing System

The use of invariants in developing security mechanisms has become an attractive research
area because of their potential to both prevent attacks and detect attacks in Cyber-Physical
Systems (CPS). In general, an invariant is a property that is expressed using design parameters
along with Boolean operators and which always holds in normal operation of a system, in
particular, a CPS. Invariants can be derived by analysing operational data of various design
parameters in a running CPS, or by analysing the system’s requirements/design documents,
with both of the approaches demonstrating significant potential to detect and prevent cyber-
attacks on a CPS. While data-driven invariant generation can be fully automated, design-driven
invariant generation has a substantial manual intervention. In this paper, we aim to highlight
the shortcomings in data-driven invariants by demonstrating a set of adversarial attacks on such
invariants. We propose a solution strategy to detect such attacks by complementing them with
design-driven invariants. We perform all our experiments on a real water treatment testbed. We
shall demonstrate that our approach can significantly reduce false positives and achieve high
accuracy in attack detection on CPSs.

Disadvantages:

• Higher time complexity for implementation process.


• Complexity and usability.
• Accuracy was low.
• Limited scalability.

5.2 Characteristics of Proposed System

• Enhanced Efficiency: The proposed system is designed to improve overall operational efficiency by streamlining processes and reducing manual intervention.

• User-Friendly Interface: A user-friendly interface ensures ease of use for all stakeholders, promoting accessibility and reducing the learning curve.

• Scalability: The system is scalable to accommodate future growth or changes in user requirements, ensuring adaptability to evolving needs.

• Robust Security Measures: Implementation of robust security features safeguards sensitive data and protects against potential cyber threats, ensuring data integrity and user privacy.

• Integration Capabilities: The proposed system is capable of integrating seamlessly with existing systems and technologies, promoting interoperability and minimizing disruptions.

• Reliability and Stability: The system is designed for reliability, minimizing downtime and ensuring stable performance under varying conditions.

• Data Accuracy and Consistency: Measures are in place to ensure the accuracy and consistency of data through validation and verification processes.

• Real-time Reporting and Analytics: The system provides real-time reporting and analytics capabilities, empowering users with timely and meaningful insights for decision-making.

• Audit Trail and Traceability: Comprehensive audit trails are implemented to track system activities, ensuring traceability and accountability for all transactions.

• Adherence to Regulatory Standards: The proposed system complies with relevant regulatory standards and industry best practices, ensuring legal and ethical integrity.

• Flexibility and Customization: The system is flexible, allowing for customization to meet specific organizational needs and evolving business requirements.

• Automated Workflows: Automation of key workflows reduces manual tasks, minimizing errors and improving overall process efficiency.

• Collaborative Features: Collaboration tools and features are integrated to enhance communication and teamwork among users.

• Regular Updates and Maintenance: A systematic approach to updates and maintenance ensures the system's longevity and responsiveness to changing technological landscapes.

• User Training and Support: Provision of adequate training resources and ongoing support to users ensures effective utilization of the system and addresses any issues promptly.

These characteristics collectively contribute to the effectiveness and success of the proposed system, aligning it with the organization's goals and requirements.

SYSTEM DESIGN

6.SYSTEM DESIGN

6.1 Data visualization

Data visualization is an important skill in applied statistics and machine learning. Statistics does indeed focus on quantitative descriptions and estimations of data. Data visualization provides an important suite of tools for gaining a qualitative understanding. This can be helpful when exploring and getting to know a dataset and can help with identifying patterns, corrupt data, outliers, and much more. With a little domain knowledge, data visualizations can be used to express and demonstrate key relationships in plots and charts that are more visceral to stakeholders than measures of association or significance. Data visualization and exploratory data analysis are whole fields in themselves, and a deeper dive into some of the books mentioned at the end is recommended.

Sometimes data does not make sense until it can be looked at in a visual form, such as with charts and plots. Being able to quickly visualize data samples is an important skill both in applied statistics and in applied machine learning. This section covers the types of plots needed when visualizing data in Python and how to use them to better understand your own data:

➢ How to chart time series data with line plots and categorical quantities with bar charts.
➢ How to summarize data distributions with histograms and box plots.
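As a small, hedged illustration of the plot types listed above, the sketch below uses synthetic data rather than the project's CYBER.csv so that it runs on its own.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
series = pd.Series(rng.normal(100, 15, 200),
                   index=pd.date_range('2024-01-01', periods=200))
counts = pd.Series(rng.integers(10, 50, 4),
                   index=['DDoS', 'Phishing', 'Malware', 'Benign'])

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].plot(series)                      # line plot for time series data
axes[0, 1].bar(counts.index, counts.values)  # bar chart for categorical quantities
axes[1, 0].hist(series, bins=20)             # histogram of a distribution
axes[1, 1].boxplot(series)                   # box plot of the same distribution
plt.tight_layout()
plt.show()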

MODULE DIAGRAM

Given input: data
Expected output: visualized data

6.2 Use Case Diagram

Use case diagrams are considered for high-level requirement analysis of a system. When the requirements of a system are analyzed, the functionalities are captured in use cases. It can therefore be said that use cases are nothing but the system functionalities written in an organized manner.

6.3 Entity Relationship Diagram

An entity relationship diagram (ERD), also known as an entity relationship model, is a


graphical representation of an information system that depicts the relationships among people,
objects, places, concepts or events within that system. An ERD is a data modeling technique
that can help define business processes and be used as the foundation for a relational database.
Entity relationship diagrams provide a visual starting point for database design that can also be
used to help determine information system requirements throughout an organization. After a
relational database is rolled out, an ERD can still serve as a referral point, should any debugging
or business process re-engineering be needed later.

PROGRAM CODING

7.PROGRAM CODING

7.1 Source Code

DATA PREPROCESSING AND DATA CLEANING

# Import the necessary libraries.

import pandas as pd

import numpy as np

# Avoid unnecessary warnings, (EX: software updates, version mismatch, and so on.)

import warnings

warnings.filterwarnings('ignore')

# Load the datasets

df=pd.read_csv('CYBER.csv')

# Check the top5 values

df.head()

# Check the bottom five values.

df.tail()

# Check the dimension of our datasets

df.shape

# Check the dataset size

df.size

# Check the columns of dataset

df.columns

# To know the information of our dataset

df.info()

# Check the unique values of a specific column

df['Label'].unique()

# Transform the columns value(ex: int to str, str to int) for classification purpose.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

var = ['Label']

for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)

# Check the value is null or notnull

df.isnull().head()

# Remove the null value

df = df.dropna()

# Describe the dataset from a statistical point of view
df.describe()

# Check the relation between each individual columns

df.corr().head()

# Check the events for specific columns

pd.crosstab(df["'Tot Fwd Pkts'"], df["'Tot Bwd Pkts'"]).head()

# Group the data by specific columns

df.groupby(["'Flow Byts/s'","'Pkt Len Std'"]).groups

# Check the value counts for specific columns

df["Label"].value_counts()

# Check the categorical distribution of a specific column

pd.Categorical(df["'Idle Min'"]).describe()

# Check if the value is duplicated or not

df.duplicated()

# Calculate the total number of duplicated values

sum(df.duplicated())

# Remove the duplicate values

df=df.drop_duplicates()

# Calculate the total number of duplicated values

sum(df.duplicated())

DATA VISUALIZATION AND DATA ANALYSIS

# Import the necessary libraries.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Avoid unnecessary warnings, (EX: software updates, version mismatch, and so on.)

import warnings

warnings.filterwarnings('ignore')

# Load the datasets

df=pd.read_csv('CYBER.csv')

# Check the top5 values

df.head()

# Remove the null value

df = df.dropna()

# Remove the duplicate values

df=df.drop_duplicates()

# Transform the columns value(ex: int to str, str to int) for classification purpose.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

var = ['Label']

for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)

# Check whether the data is balanced or imbalanced; that is why we use a count plot.

plt.figure(figsize=(12,7))

sns.countplot(x='Label',data=df)

# Plot a Histogram

plt.figure(figsize=(15,5))

plt.subplot(1,2,1)

plt.hist(df["'Flow Duration'"],color='red')

plt.subplot(1,2,2)

plt.hist(df["'Active Std'"],color='blue')

# Check how many columns are in datasets

df.columns

# Plot a Histogram.

df.hist(figsize=(15,55), color='green')

plt.show()

# Plot a Histogram

df["'Pkt Len Mean'"].hist(figsize=(10,5),color='yellow',bins=25)

# Check for outliers in our dataset.

plt.boxplot(df["'Pkt Size Avg'"])

# Plot a density plot

df["'Pkt Len Mean'"].plot(kind='density')

# Plot a distribution plot

sns.displot(df["'Bwd Pkt Len Mean'"], color='purple')

# Other options: barplot, boxenplot, boxplot, countplot, displot, distplot, ecdfplot, histplot, kdeplot, pointplot, violinplot, stripplot

# Plot a distribution plot.

sns.displot(df["'Pkt Len Mean'"], color='coral') # residplot, scatterplot

# Plot a heat map of the correlations between each pair of columns.

fig, ax = plt.subplots(figsize=(20,15))

sns.heatmap(df.corr(),annot = True, fmt='0.2%',cmap = 'autumn',ax=ax)

# Plot a Piechart

def plot(df, variable):

dataframe_pie = df[variable].value_counts()

ax = dataframe_pie.plot.pie(figsize=(9,9), autopct='%1.2f%%', fontsize = 10)

ax.set_title(variable + ' \n', fontsize = 10)

return np.round(dataframe_pie/df.shape[0]*100,2)

plot(df, 'Label')

GaussianNB CLASSIFIER ALGORITHM

# Import the necessary libraries.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Avoid unnecessary warnings, (EX: software updates, version mismatch, and so on.)

import warnings

warnings.filterwarnings('ignore')

# Load the datasets

df=pd.read_csv('CYBER.csv')

# Check the top5 values

df.head()

del df["'TotLen Fwd Pkts'"]

del df["'TotLen Bwd Pkts'"]

del df["'Fwd Pkt Len Max'"]

del df["'Fwd Pkt Len Min'"]

del df["'Fwd Pkt Len Mean'"]

del df["'Fwd Pkt Len Std'"]

del df["'Bwd Pkt Len Max'"]

del df["'Bwd Pkt Len Mean'"]

del df["'Idle Std'"]

del df["'Flow Byts/s'"]

del df["'Flow IAT Std'"]

del df["'Flow IAT Min'"]

del df["'Pkt Len Max'"]

del df["'Bwd Pkt Len Min'"]

del df["'Flow IAT Max'"]

del df["'Fwd IAT Max'"]

del df["'Fwd IAT Min'"]

del df["'Bwd IAT Std'"]

del df["'Bwd IAT Max'"]

del df["'Fwd IAT Std'"]

del df["'Bwd IAT Min'"]

del df["'Bwd PSH Flags'"]

del df["'Bwd URG Flags'"]

del df["'Pkt Len Min'"]

del df["'Pkt Len Std'"]

del df["'Pkt Len Var'"]

del df["'FIN Flag Cnt'"]

del df["'RST Flag Cnt'"]

del df["'PSH Flag Cnt'"]

del df["'ACK Flag Cnt'"]

del df["'URG Flag Cnt'"]

del df["'CWE Flag Count'"]

# Remove the null value

df=df.dropna()

# Check the columns of dataset

df.columns

# Transform the columns value(ex: int to str, str to int) for classification purpose.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

var = ['Label']

for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)

# Check the top5 values

df.head()

# Remove the duplicate values

df=df.drop_duplicates()

# Split the dataset into dependent and independent variables

# X is the independent variable (input features)

x1 = df.drop(labels='Label', axis=1)

# Y is the dependent variable (target variable)

y1 = df.loc[:,'Label']

# This process is executed to balance the dataset classes.

import imblearn

from imblearn.over_sampling import RandomOverSampler

from collections import Counter

ros =RandomOverSampler(random_state=42)

x,y=ros.fit_resample(x1,y1)

print("OUR DATASET COUNT : ", Counter(y1))

print("OVER SAMPLING DATA COUNT : ", Counter(y))

# Split the dataset into two parts: training and testing sets.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42, stratify=y)

print("NUMBER OF TRAIN DATASET : ", len(x_train))

print("NUMBER OF TEST DATASET : ", len(x_test))

print("TOTAL NUMBER OF DATASET : ", len(x_train)+len(x_test))

print("NUMBER OF TRAIN DATASET : ", len(y_train))

print("NUMBER OF TEST DATASET : ", len(y_test))

print("TOTAL NUMBER OF DATASET : ", len(y_train)+len(y_test))

# Implement Gaussian naive bayes algorithm learning patterns

from sklearn.naive_bayes import GaussianNB

GNB = GaussianNB()

# Fit is the training function for this algorithm.

GNB.fit(x_train,y_train)

# Predict is the test function for this algorithm

predicted = GNB.predict(x_test)

# Check classification report for this algorithm

from sklearn.metrics import classification_report

cr = classification_report(y_test,predicted)

print('THE CLASSIFICATION REPORT OF GAUSSIANNB CLASSIFIER:\n\n',cr)

# Check the confusion matrix for this algorithms

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test,predicted)

print('THE CONFUSION MATRIX SCORE OF GAUSSIANNB CLASSIFIER:\n\n\n',cm)

# Check the cross value score of this algorithm.

from sklearn.model_selection import cross_val_score

accuracy = cross_val_score(GNB, x, y, scoring='accuracy')

print('THE CROSS VALIDATION TEST RESULT OF ACCURACY :\n\n\n', accuracy*100)

# Check the accuracy score of this algorithms.

from sklearn.metrics import accuracy_score

a = accuracy_score(y_test,predicted)

print("THE ACCURACY SCORE OF GAUSSIANNB CLASSIFIER IS :",a*100)

# Check the hamming loss of this algorithm.

from sklearn.metrics import hamming_loss

hl = hamming_loss(y_test,predicted)

print("THE HAMMING LOSS OF GAUSSIANNB CLASSIFIER IS :",hl*100)

# Plot a confusion matrix for this algorithm.

def plot_confusion_matrix(cm, title='THE CONFUSION MATRIX SCORE OF GAUSSIANNB CLASSIFIER\n\n', cmap=plt.cm.cool):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

cm1 = confusion_matrix(y_test, predicted)

print('THE CONFUSION MATRIX SCORE OF GAUSSIANNB CLASSIFIER:\n\n')

print(cm)

plot_confusion_matrix(cm)

# Plot the worm plot for this model.

import matplotlib.pyplot as plt

df2 = pd.DataFrame()

df2["y_test"] = y_test

df2["predicted"] = predicted

df2.reset_index(inplace=True)

plt.figure(figsize=(20, 5))

plt.plot(df2["predicted"][:100], marker='x', linestyle='dashed', color='red')

plt.plot(df2["y_test"][:100], marker='o', linestyle='dashed', color='green')

plt.show()

ADABOOST CLASSIFIER ALGORITHM
# Import the necessary libraries.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Avoid unnecessary warnings, (EX: software updates, version mismatch, and so on.)

import warnings

warnings.filterwarnings('ignore')

# Load the datasets

df=pd.read_csv('CYBER.csv')

# Check the top5 values

df.head()

del df["'TotLen Fwd Pkts'"]

del df["'TotLen Bwd Pkts'"]

del df["'Fwd Pkt Len Max'"]

del df["'Fwd Pkt Len Min'"]

del df["'Fwd Pkt Len Mean'"]

del df["'Fwd Pkt Len Std'"]

del df["'Bwd Pkt Len Max'"]

del df["'Bwd Pkt Len Mean'"]

del df["'Idle Std'"]

del df["'Flow Byts/s'"]

del df["'Flow IAT Std'"]

del df["'Flow IAT Min'"]

del df["'Pkt Len Max'"]

del df["'Bwd Pkt Len Min'"]

del df["'Flow IAT Max'"]

del df["'Fwd IAT Max'"]

del df["'Fwd IAT Min'"]

del df["'Bwd IAT Std'"]

del df["'Bwd IAT Max'"]

del df["'Fwd IAT Std'"]

del df["'Bwd IAT Min'"]

del df["'Bwd PSH Flags'"]

del df["'Bwd URG Flags'"]

del df["'Pkt Len Min'"]

del df["'Pkt Len Std'"]

del df["'Pkt Len Var'"]

del df["'FIN Flag Cnt'"]

del df["'RST Flag Cnt'"]

del df["'PSH Flag Cnt'"]

del df["'ACK Flag Cnt'"]

del df["'URG Flag Cnt'"]

del df["'CWE Flag Count'"]

# Remove the null value

df=df.dropna()

# Check the columns of dataset

df.columns

# Transform the columns value(ex: int to str, str to int) for classification purpose.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

var = ['Label']

for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)

# Check the top5 values

df.head()

# Remove the duplicate values

df=df.drop_duplicates()

# Split the dataset into dependent and independent variables

# X is the independent variable (input features)

x1 = df.drop(labels='Label', axis=1)

# Y is the dependent variable (target variable)

y1 = df.loc[:,'Label']

# This process is executed to balance the dataset classes.

import imblearn

from imblearn.over_sampling import RandomOverSampler

from collections import Counter

ros =RandomOverSampler(random_state=42)

x,y=ros.fit_resample(x1,y1)

print("OUR DATASET COUNT : ", Counter(y1))

print("OVER SAMPLING DATA COUNT : ", Counter(y))

# Split the dataset into two parts: training and testing sets.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42, stratify=y)

print("NUMBER OF TRAIN DATASET : ", len(x_train))

print("NUMBER OF TEST DATASET : ", len(x_test))

print("TOTAL NUMBER OF DATASET : ", len(x_train)+len(x_test))

print("NUMBER OF TRAIN DATASET : ", len(y_train))

print("NUMBER OF TEST DATASET : ", len(y_test))

print("TOTAL NUMBER OF DATASET : ", len(y_train)+len(y_test))

# Implement Adaboost classifier algorithm learning patterns

from sklearn.ensemble import AdaBoostClassifier

ABC = AdaBoostClassifier()

# Fit is the training function for this algorithm.

ABC.fit(x_train,y_train)

# Predict is the test function for this algorithm

predicted = ABC.predict(x_test)

# Check classification report for this algorithm

from sklearn.metrics import classification_report

cr = classification_report(y_test,predicted)

print('THE CLASSIFICATION REPORT OF ADABOOST CLASSIFIER:\n\n',cr)

# Check the confusion matrix for this algorithms.

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test,predicted)

print('THE CONFUSION MATRIX SCORE OF ADABOOST CLASSIFIER:\n\n\n',cm)

# Check the cross value score of this algorithm.

from sklearn.model_selection import cross_val_score

accuracy = cross_val_score(ABC, x, y, scoring='accuracy')

print('THE CROSS VALIDATION TEST RESULT OF ACCURACY :\n\n\n', accuracy*100)

# Check the accuracy score of this algorithms.

from sklearn.metrics import accuracy_score

a = accuracy_score(y_test,predicted)

print("THE ACCURACY SCORE OF ADABOOST CLASSIFIER IS :",a*100)

# Check the hamming loss of this algorithm.

from sklearn.metrics import hamming_loss

hl = hamming_loss(y_test,predicted)

print("THE HAMMING LOSS OF ADABOOST CLASSIFIER IS :",hl*100)

# Plot a confusion matrix for this algorithm.

def plot_confusion_matrix(cm, title='THE CONFUSION MATRIX SCORE OF ADABOOST CLASSIFIER\n\n', cmap=plt.cm.cool):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

cm1 = confusion_matrix(y_test, predicted)

print('THE CONFUSION MATRIX SCORE OF ADABOOST CLASSIFIER:\n\n')

print(cm)

plot_confusion_matrix(cm)

# Plot the worm plot for this model.

import matplotlib.pyplot as plt

df2 = pd.DataFrame()

df2["y_test"] = y_test

df2["predicted"] = predicted

df2.reset_index(inplace=True)

plt.figure(figsize=(20, 5))

plt.plot(df2["predicted"][:100], marker='x', linestyle='dashed', color='red')

plt.plot(df2["y_test"][:100], marker='o', linestyle='dashed', color='green')

plt.show()

CAT BOOST CLASSIFIER ALGORITHM

# Import the necessary libraries.

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Avoid unnecessary warnings, (EX: software updates, version mismatch, and so on.)

import warnings

warnings.filterwarnings('ignore')

# Load the datasets

df=pd.read_csv('CYBER.csv')

del df["'TotLen Fwd Pkts'"]

del df["'TotLen Bwd Pkts'"]

del df["'Fwd Pkt Len Max'"]

del df["'Fwd Pkt Len Min'"]

del df["'Fwd Pkt Len Mean'"]

del df["'Fwd Pkt Len Std'"]

del df["'Bwd Pkt Len Max'"]

del df["'Bwd Pkt Len Mean'"]

del df["'Idle Std'"]

del df["'Flow Byts/s'"]

del df["'Flow IAT Std'"]

del df["'Flow IAT Min'"]

del df["'Pkt Len Max'"]

del df["'Bwd Pkt Len Min'"]

del df["'Flow IAT Max'"]

del df["'Fwd IAT Max'"]

del df["'Fwd IAT Min'"]

del df["'Bwd IAT Std'"]

del df["'Bwd IAT Max'"]

del df["'Fwd IAT Std'"]

del df["'Bwd IAT Min'"]

del df["'Bwd PSH Flags'"]

del df["'Bwd URG Flags'"]

del df["'Pkt Len Min'"]

del df["'Pkt Len Std'"]

del df["'Pkt Len Var'"]

del df["'FIN Flag Cnt'"]

del df["'RST Flag Cnt'"]

del df["'PSH Flag Cnt'"]

del df["'ACK Flag Cnt'"]

del df["'URG Flag Cnt'"]

del df["'CWE Flag Count'"]

# Check the columns of dataset

df.columns

# Check the top5 values

df.head()

# Remove the null value

df=df.dropna()

df['Label'].value_counts()

# Transform the columns value(ex: int to str, str to int) for classification purpose.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

var = ['Label']

for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)

df['Label'].value_counts()

# Check the top5 values

df.head()

# Remove the duplicate values

df=df.drop_duplicates()

# Split the dataset into dependent and independent variables

# X is the independent variable (input features)

x1 = df.drop(labels='Label', axis=1)

# Y is the dependent variable (target variable)

y1 = df.loc[:,'Label']

# This process is executed to balance the dataset classes.
import imblearn

from imblearn.over_sampling import RandomOverSampler

from collections import Counter

ros =RandomOverSampler(random_state=42)

x,y=ros.fit_resample(x1,y1)

print("OUR DATASET COUNT : ", Counter(y1))

print("OVER SAMPLING DATA COUNT : ", Counter(y))

# Split the dataset into two parts: training and testing sets.

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=42, stratify=y)

print("NUMBER OF TRAIN DATASET : ", len(x_train))

print("NUMBER OF TEST DATASET : ", len(x_test))

print("TOTAL NUMBER OF DATASET : ", len(x_train)+len(x_test))

print("NUMBER OF TRAIN DATASET : ", len(y_train))

print("NUMBER OF TEST DATASET : ", len(y_test))

print("TOTAL NUMBER OF DATASET : ", len(y_train)+len(y_test))

# Implement Catboost classifier algorithm learning patterns

from catboost import CatBoostClassifier

CBC = CatBoostClassifier()

# Fit is the training function for this algorithm.

CBC.fit(x_train,y_train)

# Predict is the test function for this algorithm

predicted = CBC.predict(x_test)

# Check classification report for this algorithm

from sklearn.metrics import classification_report

cr = classification_report(y_test,predicted)

print('THE CLASSIFICATION REPORT OF CAT BOOST CLASSIFIER:\n\n',cr)

# Check the confusion matrix for this algorithms.

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test,predicted)

print('THE CONFUSION MATRIX SCORE OF CAT BOOST CLASSIFIER:\n\n\n',cm)

# Check the cross value score of this algorithm.

from sklearn.model_selection import cross_val_score

accuracy = cross_val_score(CBC, x, y, scoring='accuracy')

print('THE CROSS VALIDATION TEST RESULT OF ACCURACY :\n\n\n', accuracy*100)

# Check the accuracy score of this algorithms.

from sklearn.metrics import accuracy_score

a = accuracy_score(y_test,predicted)

print("THE ACCURACY SCORE OF CAT BOOST CLASSIFIER IS :",a*100)

# Check the hamming loss of this algorithm.

from sklearn.metrics import hamming_loss

hl = hamming_loss(y_test,predicted)

print("THE HAMMING LOSS OF CAT BOOST CLASSIFIER IS :",hl*100)

# Plot a confusion matrix for this algorithm.

def plot_confusion_matrix(cm, title='THE CONFUSION MATRIX SCORE OF CAT BOOST CLASSIFIER\n\n', cmap=plt.cm.cool):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()

cm1 = confusion_matrix(y_test, predicted)

print('THE CONFUSION MATRIX SCORE OF CAT BOOST CLASSIFIER:\n\n')

print(cm)

plot_confusion_matrix(cm)

# Plot the worm plot for this model.

import matplotlib.pyplot as plt

df2 = pd.DataFrame()

df2["y_test"] = y_test

df2["predicted"] = predicted

df2.reset_index(inplace=True)

plt.figure(figsize=(20, 5))

plt.plot(df2["predicted"][:100], marker='x', linestyle='dashed', color='red')

plt.plot(df2["y_test"][:100], marker='o', linestyle='dashed', color='green')

plt.show()

# Save the trained CatBoost model with joblib

import joblib

joblib.dump(CBC, 'cyber1.pkl')
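A short, hedged sketch of how the saved model could be reused at deployment time is shown below; loading with joblib mirrors the dump above, and the sample rows are simply the first few test flows.

# Load the saved model and classify a few unseen flows (deployment-time sketch).

model = joblib.load('cyber1.pkl')

sample = x_test.iloc[:5]

print(model.predict(sample))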

7.2 SCREENSHOTS

(Screenshots of the application output appear here in the original report.)
TESTING

8.TESTING

8.1 Software Testing

The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests, and each test type addresses a specific testing requirement.

8.2 TYPES OF TESTING

• Unit Testing
• White-box Testing
• Black-box Testing
• Validation Testing
• Backend Testing

Unit Testing: Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program input produces valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application; it is done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
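As a hedged sketch of what such a unit test could look like for the classification pipeline (using Python's built-in unittest module and a tiny synthetic dataset rather than the project's CYBER.csv), consider:

import unittest
import numpy as np
from sklearn.naive_bayes import GaussianNB

class TestClassifierUnit(unittest.TestCase):
    def setUp(self):
        # Two well-separated synthetic classes, so the model should learn them easily.
        rng = np.random.default_rng(0)
        self.x = np.vstack([rng.normal(0, 1, (20, 3)), rng.normal(5, 1, (20, 3))])
        self.y = np.array([0] * 20 + [1] * 20)
        self.model = GaussianNB().fit(self.x, self.y)

    def test_prediction_shape(self):
        # Valid input produces exactly one prediction per row.
        self.assertEqual(len(self.model.predict(self.x)), len(self.y))

    def test_known_sample(self):
        # A point near the second cluster should be assigned class 1.
        self.assertEqual(self.model.predict([[5.0, 5.0, 5.0]])[0], 1)

if __name__ == '__main__':
    unittest.main()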

White-box Testing: It is a test case design method that uses the control structure of the procedural design to derive test cases. Using white-box testing methods, it was guaranteed that most of the independent paths within modules had been exercised at least once, all logical decisions had been exercised on their true and false sides, all loops had been executed at their boundaries, and internal data structures had been exercised to ensure their data validity. White-box testing has been done to achieve the following objectives. Logic errors and incorrect assumptions are inversely proportional to the probability that a program path will be executed. Errors tend to creep into the work when we design and implement functions, conditions or control flow that are out of the mainstream. We often believe that a logical path is not likely to be executed when in fact it may be executed on a regular basis. When a program is translated into programming-language source code, it is likely that some typing errors will occur. Many will be uncovered by syntax and type-checking mechanisms, but others may go undetected until testing begins.

Black-box Testing: Although tests are designed to uncover errors, they are also used to demonstrate that the software functions are operational, that input is properly accepted and output is correctly produced, and that the integrity of external information is maintained. A black-box test examines some fundamental aspects of a system with little regard for the internal logical structure of the software. All input screens were thoroughly tested for data validity and smoothness of data-entry operations. Test cases were formulated to verify whether the system works properly in rare conditions also. Error conditions were checked. Data-entry operations are to be user-friendly and smooth; it would be easier for the operators if they could enter data through the keyboard only.

Validation Testing: Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that can be reasonably expected by the customer. After validation tests have been conducted, one of two possible conditions exists: either the function or performance characteristics are acceptable and conform to specification, or a deviation from specification is uncovered and a deficiency list is created. System validation checks the quality of the software in both simulated and live environments. First the software goes through a phase in which errors and failures based on simulated user requirements are verified and studied.

Back-end Testing: Whenever an input or data is entered in a front-end application, it is stored in the database, and the testing of such a database is known as Database Testing or Backend Testing. There are different databases, such as SQL Server and MySQL. Database testing involves testing of table structure, schema, stored procedures, data structure and so on.

Functional Testing: Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals. Functional testing is centered on the following items: 1. Valid Input: identified classes of valid input must be accepted. 2. Invalid Input: identified classes of invalid input must be rejected. 3. Functions: identified functions must be exercised. 4. Output: identified classes of application outputs must be exercised.

CONCLUSION

9. CONCLUSION

The analytical process started from data cleaning and processing, handling of missing values, and exploratory analysis, and finally moved to model building and evaluation. The algorithm with the highest accuracy score on the public test set is identified, and the selected model is used in the application, which can help to find the type of cyber-attack.

FUTURE WORK

• Deploying the project in the cloud.

• To optimize the work for implementation in IoT systems.

REFERENCE

10. REFERENCE

1. HTML: MDN Web Docs.


https://developer.mozilla.org/en-US/docs/Web/HTML

2. CSS: MDN Web Docs.


https://developer.mozilla.org/en-US/docs/Web/CSS

3. JavaScript: MDN Web Docs.


https://developer.mozilla.org/en-US/docs/Web/JavaScript

4. Bootstrap: Bootstrap Documentation.
https://getbootstrap.com/docs/4.0/getting-started/introduction/

5. Python: Python Software Foundation.


https://www.python.org/doc/

6. Django: Django Documentation.


https://docs.djangoproject.com/

7. MySQL: MySQL Documentation.


https://dev.mysql.com/doc/

8. SQLite: SQLite Documentation.


https://www.sqlite.org/docs.html
