Cybersecurity System
SUBMITTED BY
SATHYA S
211314103314
BONAFIDE CERTIFICATE
This is to certify that this is a bonafide record of work done by SATHYA S (211314103314) of
GURU NANAK COLLEGE (Autonomous), Guru Nanak Salai, Velachery, Chennai - 600 042.
I would like to thank the Principal Dr. T. K. Avvai Kothai and Vice Principal Dr.
Anitha Malisetty for providing the necessary resources and facilities for the completion of this
project.
I extend my deepest thanks to Dr. K. RAVIYA, Head of the Department, whose guidance,
support, and encouragement were invaluable throughout this endeavor. Her expertise and insights
have been instrumental in shaping this project and enhancing its quality.
I owe my guide Mrs. D. GAYATHRI a debt of gratitude for her invaluable guidance,
patience, and encouragement. Her mentorship has been a beacon of light, steering me through the
complexities of this project and helping me realize my potential.
Last but not least, I thank my family and friends for their unwavering encouragement
and understanding during this journey.
TABLE OF CONTENTS
1. INTRODUCTION
   1.1 Objective
   1.2 Modules of the Project
2. SYSTEM SPECIFICATION
3. SURVEY OF TECHNOLOGIES
4. SELECTED SOFTWARE
   4.1 HTML
   4.2 CSS
   4.3 JavaScript
   4.4 Bootstrap
   4.5 Python
   4.6 Django
   4.7 MySQL
   4.8 SQLite
5. SYSTEM ANALYSIS
6. SYSTEM DESIGN
7. PROGRAM CODING
   7.2 Screenshots
8. TESTING
9. CONCLUSION
10. REFERENCE
ABSTRACT
This work addresses cyberattack classification through the utilization of supervised machine learning methods. The
system is designed to categorize diverse cyber-attacks by employing a meticulously curated dataset
encompassing a wide array of attack types, including but not limited to malware, phishing, and distributed
denial-of-service (DDoS) attacks. Feature extraction techniques are applied to both network traffic data and
behavioural attributes, facilitating the training of a robust classification model. Various supervised learning
algorithms, such as decision trees, support vector machines, and neural networks, are evaluated for their
efficacy in accurately predicting attack categories. The training process involves labelling historical attack
instances, enabling the model to discern intricate patterns and subtle differentiators among attack types.
Regular model updates and retraining with new attack data ensure its relevance in dynamically evolving
threat landscapes. The system's predictive accuracy empowers cybersecurity teams to swiftly identify and
respond to cyber threats, thereby bolstering overall defense strategies. Through this research, we contribute
to the proactive identification and mitigation of cyber-attacks, ultimately fortifying digital security
frameworks.
1. INTRODUCTION
1.1 OBJECTIVES
The objective of this research is to explore and highlight the significance of employing supervised
machine learning techniques for the classification of cyber-attacks in the realm of modern
cybersecurity. The focus is on leveraging labelled datasets to train algorithms for the swift and accurate
identification and categorization of diverse cyber threats. The ultimate goal is to enable organizations
to respond effectively, mitigate potential damage, and strengthen their overall cybersecurity defenses.
• Swift and Accurate Categorization: Emphasis on the ability of supervised machine learning
to provide a quick and accurate categorization of various types of cyber-attacks, aiding
organizations in timely responses.
• Leveraging Labelled Datasets: Highlighting the crucial role of labelled datasets in training
machine learning algorithms for effective cyber-attack classification.
1.2 MODULES OF THE PROJECT
• Data Pre-processing
• Data Analysis and Visualization
• Implementing Algorithm 1
• Implementing Algorithm 2
• Implementing Algorithm 3
• Deployment
1. Data Pre-processing:
- Cleans and prepares the raw data, addressing missing values and optimizing features for subsequent
analysis.
2. Data Analysis and Visualization:
- Extracts insights and patterns through statistical analysis and visualization, laying the groundwork for
informed decision-making.
3. Implementing Algorithms 1, 2, and 3:
- Applies and evaluates multiple algorithms to identify the most effective solution based on
performance metrics.
4. Deployment:
- Integrates the selected algorithm into a practical setting, ensuring it is adapted for operational use with
user interfaces and continuous monitoring.
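As a rough sketch, the four modules above can be chained together in Python. Everything here is illustrative: the synthetic dataset from make_classification stands in for the real CYBER.csv data, and the three candidate classifiers are placeholders for Algorithms 1, 2, and 3.

```python
# Illustrative end-to-end pipeline on a synthetic dataset.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# 1. Data pre-processing: build a frame, drop missing and duplicate rows.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
df = pd.DataFrame(X).assign(Label=y).dropna().drop_duplicates()

# 2. Data analysis: check the class balance before modelling.
class_counts = df['Label'].value_counts()

# 3. Implement and compare the candidate algorithms on a held-out split.
x_train, x_test, y_train, y_test = train_test_split(
    df.drop(columns='Label'), df['Label'], test_size=0.2, random_state=42)
candidates = {
    'decision_tree': DecisionTreeClassifier(random_state=42),
    'naive_bayes': GaussianNB(),
    'logistic_regression': LogisticRegression(max_iter=1000),
}
scores = {}
for name, model in candidates.items():
    model.fit(x_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(x_test))

# 4. Deployment: keep the best-performing model for operational use.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]
print(best_name, scores[best_name])
```

The best-scoring candidate would then be persisted and served by the deployment module.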
2. SYSTEM SPECIFICATION
RAM : minimum 2 GB
3.SURVEY OF TECHNOLOGIES
The part of an application that the user interacts with directly is termed the frontend. It is also
referred to as the ‘client side’ of the application. It includes everything that users experience directly:
text colors and styles, images, graphs and tables, buttons, and navigation menus. HTML is
the language used for front-end development. The structure, design, behaviour, and content of
everything seen on screen when websites, web applications, or mobile apps are opened is
implemented by front-end developers. Responsiveness and performance are the two main
objectives of the frontend. The developer must ensure that the site is responsive, i.e. it appears
correctly on devices of all sizes, and no part of the website should behave abnormally irrespective
of the size of the screen. Some frontend development tools are HTML, CSS, XML, Bulma,
Tailwind CSS, and Sass.
Backend is the server side of the application/website. It stores and arranges data, and
also makes sure everything on the client side of the application/website works correctly. It is the part
of the application/website that you cannot see or interact with directly. It is the portion of the software
that does not come in direct contact with the users. The parts and characteristics developed by
backend developers are indirectly accessed by users through a frontend application. Activities
like writing APIs, creating libraries, and working with system components without user
interfaces, or even systems of scientific programming, are also included in the backend. Some
backend development tools are PHP, Java, C++, Python, Firebase, and MySQL.
4.SELECTED SOFTWARE
4.1 HTML
HTML, or HyperText Markup Language, is the fundamental language of web development. Created by
Tim Berners-Lee, HTML uses tags to structure content, define links, and incorporate multimedia. Key
features include hyperlinks for navigation, support for multimedia elements, interactive forms, semantic
markup for accessibility, and cross-browser compatibility. HTML continues to evolve, with HTML5
introducing new features like canvas for graphics and improved support for mobile devices. As the
backbone of the web, HTML is essential for creating structured and visually appealing online content.
Features :
4.2 CSS
Cascading Style Sheets (CSS) is a vital web development technology that complements HTML by styling
web pages. Using selectors and declarations, CSS separates content and presentation, allowing developers
to define the appearance of HTML elements. Key features include layout control, responsiveness through
media queries, external style sheets for modularity, and support for animations. CSS enhances the visual
appeal and consistency of web pages, playing a crucial role in creating engaging and well-designed online
content.
KEY FEATURES:
• Selectors: CSS3 introduces several new selectors that allow you to target specific
elements in a more precise way, such as :nth-child(), :not(), and :checked.
• Box model: CSS3 adds new properties for controlling the size, padding, border, and
margin of boxes, such as box-sizing, border-radius, and box-shadow.
• Colors: CSS3 introduces new color formats, such as HSL and RGBA, which allow
you to specify colors in a more intuitive way.
• Fonts: CSS3 adds new properties for controlling the font size, style, and weight, as
well as new font formats, such as web fonts.
4.3 JAVA SCRIPT
JavaScript is predominantly used for client-side scripting in web development. It runs directly
in the web browser, enabling developers to create dynamic and interactive web pages that respond to
user actions in real time without needing to communicate with the server. JavaScript is the backbone
of many modern web applications, including social media platforms, online collaboration tools, and
e-commerce websites. It allows developers to create rich, interactive user interfaces and deliver a
seamless user experience. With the advent of platforms like Node.js, JavaScript can also be used for
server-side development. Node.js allows developers to build scalable and high-performance web
servers and backend services using JavaScript. JavaScript allows manipulation of the Document
Object Model (DOM), enabling developers to dynamically update and modify the content, structure,
and style of web pages based on user actions or application state changes.
KEY FEATURES:
• Dynamic Content
• Event Handling
• Data Manipulation
• Modularity
• Event-Driven Programming
4.4 BOOTSTRAP
KEY FEATURES:
• Responsive design: Bootstrap's grid system makes it easy to create responsive designs
that adapt to different screen sizes and devices.
4.5 PYTHON
KEY FEATURES:
4.6 Django
Django is a high-level Python web framework known for rapid development. With an MVC
architecture, built-in ORM system, and templating engine, it simplifies common tasks.
Features like an automatic admin interface, security measures, and scalability contribute to its
popularity. Supported by a vibrant community, Django is versatile, suitable for various
applications, and includes Django REST framework for API development.
Key Features :
4.7 MySQL
Key Features :
4.8 SQLite
SQLite is a lightweight and serverless relational database management system, known for its
simplicity and efficiency. With a zero-configuration approach, it operates from a single file,
making it easy to integrate into various applications. SQLite supports standard SQL syntax, is
cross-platform, and boasts a low memory footprint, making it a popular choice for embedded
systems, mobile apps, and desktop software. As open-source software, SQLite has a robust
community providing support and resources for developers.
Key Features :
• Serverless & Embedded: Lightweight and serverless, operates from a single file.
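The serverless, single-file model described above can be sketched with Python's built-in sqlite3 module. The attacks table and its labels are made up purely for illustration, and an in-memory database keeps the example self-contained.

```python
# Minimal sketch of SQLite usage: no server process, just a connection.
import sqlite3

conn = sqlite3.connect(':memory:')        # in-memory database, zero configuration
cur = conn.cursor()
cur.execute("CREATE TABLE attacks (id INTEGER PRIMARY KEY, label TEXT)")
cur.executemany("INSERT INTO attacks (label) VALUES (?)",
                [('DDoS',), ('Phishing',), ('Malware',)])
conn.commit()

# Standard SQL syntax works as usual.
rows = cur.execute("SELECT label FROM attacks ORDER BY id").fetchall()
labels = [r[0] for r in rows]
conn.close()
print(labels)
```

Replacing ':memory:' with a file path gives the single-file on-disk database that embedded systems and mobile apps typically use.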
5.SYSTEM ANALYSIS
The use of invariants in developing security mechanisms has become an attractive research
area because of their potential to both prevent attacks and detect attacks in Cyber-Physical
Systems (CPS). In general, an invariant is a property that is expressed using design parameters
along with Boolean operators and which always holds in normal operation of a system, in
particular, a CPS. Invariants can be derived by analysing operational data of various design
parameters in a running CPS, or by analysing the system’s requirements/design documents,
with both of the approaches demonstrating significant potential to detect and prevent cyber-attacks
on a CPS. While data-driven invariant generation can be fully automated, design-driven
invariant generation requires substantial manual intervention. In this work, we aim to highlight
the shortcomings in data-driven invariants by demonstrating a set of adversarial attacks on such
invariants. We propose a solution strategy to detect such attacks by complementing them with
design-driven invariants. We perform all our experiments on a real water treatment testbed. We
shall demonstrate that our approach can significantly reduce false positives and achieve high
accuracy in attack detection on CPSs.
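The idea of an invariant as a Boolean property over design parameters can be sketched in a few lines of Python. The tank-level bounds and sensor readings below are hypothetical illustrations, not values from the water treatment testbed discussed above.

```python
# Hedged sketch: an invariant expressed with design parameters and Boolean operators.
def invariant_holds(level, valve_open, low=0.2, high=0.8):
    # Invariant: the tank level always stays above the low mark, and while
    # the inlet valve is open it must not exceed the high mark.
    return level >= low and (not valve_open or level <= high)

# Two hypothetical sensor readings: one normal, one violating the invariant.
readings = [
    {'level': 0.5, 'valve_open': True},   # normal operation
    {'level': 0.95, 'valve_open': True},  # possible attack or fault
]
alerts = [r for r in readings if not invariant_holds(r['level'], r['valve_open'])]
print(len(alerts))
```

A detection system would evaluate many such invariants on every reading and raise an alert whenever one fails.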
Disadvantages:
5.2 Characteristics of Proposed System
• Reliability and Stability: The system is designed for reliability, minimizing downtime
and ensuring stable performance under varying conditions.
• Data Accuracy and Consistency: Measures are in place to ensure the accuracy and
consistency of data through validation and verification processes.
• Adherence to Regulatory Standards: The proposed system complies with relevant
regulatory standards and industry best practices, ensuring legal and ethical integrity.
• User Training and Support: Provision of adequate training resources and ongoing
support to users ensures effective utilization of the system and addresses any issues
promptly.
6.SYSTEM DESIGN
Data visualization is an important skill in applied statistics and machine learning. Statistics
does indeed focus on quantitative descriptions and estimations of data. Data visualization
provides an important suite of tools for gaining a qualitative understanding. This can be helpful
when exploring and getting to know a dataset, and can help with identifying patterns, corrupt
data, outliers, and much more. With a little domain knowledge, data visualizations can be used
to express and demonstrate key relationships in plots and charts that are more visceral to
stakeholders than measures of association or significance. Data visualization and exploratory
data analysis are whole fields in themselves, and a deeper dive into some of the books mentioned
at the end is recommended.
Sometimes data does not make sense until you can look at it in a visual form, such as with charts
and plots. Being able to quickly visualize data samples is an important skill both
in applied statistics and in applied machine learning. This chapter covers the many types of plots
that you will need to know when visualizing data in Python, and how to use them to better
understand your own data.
➢ How to chart time series data with line plots and categorical quantities with bar charts.
➢ How to summarize data distributions with histograms and box plots.
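The histogram summarization mentioned above can be sketched with NumPy alone, so the bin counts can be inspected directly. The exponential "flow duration" values are synthetic stand-ins for a real traffic feature.

```python
# Summarize a distribution with a histogram, working with the raw bin counts.
import numpy as np

rng = np.random.default_rng(42)
flow_duration = rng.exponential(scale=100.0, size=1000)  # synthetic feature

counts, bin_edges = np.histogram(flow_duration, bins=10)
# Each count is the number of samples falling between consecutive bin edges.
for lo, hi, c in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{lo:8.1f} - {hi:8.1f}: {c}")
```

The same counts are what matplotlib's plt.hist draws as bars; computing them explicitly makes skewed distributions easy to spot programmatically.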
MODULE DIAGRAM
[Module diagram placeholder — input: data]
6.2 Use Case Diagram
Use case diagrams are considered for high-level requirement analysis of a system. So
when the requirements of a system are analyzed, the functionalities are captured in use cases.
It can therefore be said that use cases are nothing but the system functionalities written in an
organized manner.
6.3 Entity Relationship Diagram
7.PROGRAM CODING
# Import the necessary libraries.
import pandas as pd
import numpy as np
# Suppress unnecessary warnings (e.g. version-mismatch notices).
import warnings
warnings.filterwarnings('ignore')
# Load the dataset and inspect it.
df = pd.read_csv('CYBER.csv')
df.head()
df.tail()
df.shape
df.size
# Check the columns of the dataset.
df.columns
df.info()
df['Label'].unique()
# Transform the target column values (str to int) for classification.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
var = ['Label']
for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)
# Check and drop missing values.
df.isnull().head()
df = df.dropna()
# Summary statistics and correlations.
df.describe()
df.corr().head()
df["Label"].value_counts()
pd.Categorical(df["'Idle Min'"]).describe()
# Check and drop duplicate rows.
df.duplicated()
sum(df.duplicated())
df = df.drop_duplicates()
sum(df.duplicated())
# Import the necessary libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Suppress unnecessary warnings (e.g. version-mismatch notices).
import warnings
warnings.filterwarnings('ignore')
# Load and clean the dataset.
df = pd.read_csv('CYBER.csv')
df.head()
df = df.dropna()
df = df.drop_duplicates()
# Transform the target column values (str to int) for classification.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
var = ['Label']
for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)
# Plot the class distribution.
plt.figure(figsize=(12,7))
sns.countplot(x='Label', data=df)
# Plot a histogram of two features.
plt.figure(figsize=(15,5))
plt.subplot(1,2,1)
plt.hist(df["'Flow Duration'"], color='red')
plt.subplot(1,2,2)
plt.hist(df["'Active Std'"], color='blue')
# Check how many columns are in the dataset.
df.columns
# Plot a histogram for every column.
df.hist(figsize=(15,55), color='green')
plt.show()
# Summarize class percentages for a pie chart.
fig, ax = plt.subplots(figsize=(20,15))
# (The original helper definition was lost in extraction; the signature below is reconstructed.)
def plot(df, variable):
    dataframe_pie = df[variable].value_counts()
    return np.round(dataframe_pie/df.shape[0]*100, 2)
plot(df, 'Label')
# Import the necessary libraries.
import pandas as pd
import numpy as np
# Suppress unnecessary warnings (e.g. version-mismatch notices).
import warnings
warnings.filterwarnings('ignore')
# Load the dataset.
df = pd.read_csv('CYBER.csv')
df.head()
# Drop an unused feature and the missing values.
del df["'Fwd IAT Std'"]
df = df.dropna()
df.columns
# Transform the target column values (str to int) for classification.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
var = ['Label']
for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)
df.head()
df = df.drop_duplicates()
# x1 holds the independent variables (input features), y1 the target.
x1 = df.drop(labels='Label', axis=1)
y1 = df.loc[:, 'Label']
# Balance the classes by random oversampling.
import imblearn
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=42)
x, y = ros.fit_resample(x1, y1)
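For intuition, what RandomOverSampler's fit_resample does here can be sketched with plain NumPy: minority-class rows are re-sampled with replacement until every class matches the largest class. This is only an illustration of the idea on toy data, not the library's actual implementation.

```python
# Toy illustration of random oversampling to balance classes.
import numpy as np

rng = np.random.default_rng(42)
y = np.array([0]*90 + [1]*10)            # imbalanced labels: 90 vs 10
X = np.arange(100).reshape(-1, 1)        # one toy feature column

target = np.bincount(y).max()            # size of the largest class
idx = []
for cls in np.unique(y):
    cls_idx = np.flatnonzero(y == cls)
    # Duplicate minority rows (sampled with replacement) up to the target size.
    extra = rng.choice(cls_idx, size=target - len(cls_idx), replace=True)
    idx.extend(cls_idx)
    idx.extend(extra)

X_res, y_res = X[idx], y[idx]
print(np.bincount(y_res))
```

After resampling, both classes contribute equally to training, which keeps the classifier from simply favouring the majority class.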
# Split the dataset into training and testing sets.
# (test_size is an assumption; the original split parameters were not preserved.)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
# Train a Gaussian Naive Bayes classifier.
from sklearn.naive_bayes import GaussianNB
GNB = GaussianNB()
GNB.fit(x_train, y_train)
predicted = GNB.predict(x_test)
# Check the classification report and metrics for this algorithm.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, hamming_loss
cr = classification_report(y_test, predicted)
cm = confusion_matrix(y_test, predicted)
a = accuracy_score(y_test, predicted)
hl = hamming_loss(y_test, predicted)
# Plot a confusion matrix for this algorithm.
# (The original plotting helper was lost in extraction; reconstructed below.)
import matplotlib.pyplot as plt
def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
cm1 = confusion_matrix(y_test, predicted)
print(cm)
plot_confusion_matrix(cm)
# Compare actual and predicted labels side by side.
df2 = pd.DataFrame()
df2["y_test"] = y_test
df2["predicted"] = predicted
df2.reset_index(inplace=True)
plt.figure(figsize=(20, 5))
plt.show()
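The split/train/evaluate pattern above can be reproduced end to end on a synthetic dataset, so the sketch below runs without the CYBER.csv file; the sample sizes and split parameters are arbitrary choices for illustration.

```python
# Self-contained sketch of the Gaussian Naive Bayes workflow on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Three classes stand in for the attack categories in the real dataset.
X, y = make_classification(n_samples=400, n_features=6, n_classes=3,
                           n_informative=4, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

gnb = GaussianNB()
gnb.fit(x_train, y_train)                 # training step
predicted = gnb.predict(x_test)           # prediction step

acc = accuracy_score(y_test, predicted)
cm = confusion_matrix(y_test, predicted)  # rows: true class, cols: predicted
print(acc)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions per class, which is what the plotting helper above visualizes.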
# Import the necessary libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Suppress unnecessary warnings (e.g. version-mismatch notices).
import warnings
warnings.filterwarnings('ignore')
# Load the dataset.
df = pd.read_csv('CYBER.csv')
df.head()
# Drop unused features and the missing values.
del df["'Bwd Pkt Len Mean'"]
del df["'CWE Flag Count'"]
df = df.dropna()
df.columns
# Transform the target column values (str to int) for classification.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
var = ['Label']
for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)
df.head()
df = df.drop_duplicates()
# X is the independent variable (input features), y the target.
x1 = df.drop(labels='Label', axis=1)
y1 = df.loc[:, 'Label']
# Balance the classes by random oversampling.
import imblearn
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=42)
x, y = ros.fit_resample(x1, y1)
# Split the dataset into training and testing sets.
# (test_size is an assumption; the original split parameters were not preserved.)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print("NUMBER OF TEST DATASET : ", len(y_test))
# Train an AdaBoost classifier.
from sklearn.ensemble import AdaBoostClassifier
ABC = AdaBoostClassifier()
ABC.fit(x_train, y_train)
predicted = ABC.predict(x_test)
# Check the classification report and metrics for this algorithm.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, hamming_loss
from sklearn.model_selection import cross_val_score
cr = classification_report(y_test, predicted)
cm = confusion_matrix(y_test, predicted)
a = accuracy_score(y_test, predicted)
hl = hamming_loss(y_test, predicted)
# Plot a confusion matrix for this algorithm.
# (The original plotting helper was lost in extraction; reconstructed below.)
def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
cm1 = confusion_matrix(y_test, predicted)
print(cm)
plot_confusion_matrix(cm)
# Compare actual and predicted labels side by side.
df2 = pd.DataFrame()
df2["y_test"] = y_test
df2["predicted"] = predicted
df2.reset_index(inplace=True)
plt.figure(figsize=(20, 5))
plt.show()
# Import the necessary libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Suppress unnecessary warnings (e.g. version-mismatch notices).
import warnings
warnings.filterwarnings('ignore')
# Load the dataset.
df = pd.read_csv('CYBER.csv')
# Drop an unused feature and the missing values.
del df["'Bwd PSH Flags'"]
df.columns
df.head()
df = df.dropna()
df['Label'].value_counts()
# Transform the target column values (str to int) for classification.
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
var = ['Label']
for i in var:
    df[i] = le.fit_transform(df[i]).astype(int)
df['Label'].value_counts()
df.head()
df = df.drop_duplicates()
# x1 holds the independent variables (input features), y1 the target.
x1 = df.drop(labels='Label', axis=1)
y1 = df.loc[:, 'Label']
# Balance the classes by random oversampling.
import imblearn
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=42)
x, y = ros.fit_resample(x1, y1)
# Split the dataset into training and testing sets.
# (test_size is an assumption; the original split parameters were not preserved.)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
# Train a CatBoost classifier; fit() is the training function for this algorithm.
from catboost import CatBoostClassifier
CBC = CatBoostClassifier()
CBC.fit(x_train, y_train)
predicted = CBC.predict(x_test)
# Check the classification report and metrics for this algorithm.
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, hamming_loss
cr = classification_report(y_test, predicted)
cm = confusion_matrix(y_test, predicted)
a = accuracy_score(y_test, predicted)
print("THE ACCURACY SCORE OF CAT BOOST CLASSIFIER IS :", a*100)
hl = hamming_loss(y_test, predicted)
# Plot a confusion matrix for this algorithm.
# (The original plotting helper was lost in extraction; reconstructed below.)
def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
cm1 = confusion_matrix(y_test, predicted)
print(cm)
plot_confusion_matrix(cm)
# Compare actual and predicted labels side by side.
df2 = pd.DataFrame()
df2["y_test"] = y_test
df2["predicted"] = predicted
df2.reset_index(inplace=True)
plt.figure(figsize=(20, 5))
plt.show()
# Save the trained model for deployment.
import joblib
joblib.dump(CBC, 'cyber1.pkl')
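At deployment time the persisted model is loaded back with joblib. The sketch below illustrates the dump/load round trip with a small scikit-learn tree standing in for the trained CatBoost model, and a temporary file path standing in for 'cyber1.pkl'.

```python
# Illustrative dump/load round trip for model deployment.
import os
import tempfile
import joblib
from sklearn.tree import DecisionTreeClassifier

# A tiny stand-in model: the label simply follows the first feature.
X = [[0, 0], [1, 1], [0, 1], [1, 0]]
y = [0, 1, 0, 1]
model = DecisionTreeClassifier(random_state=0).fit(X, y)

path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
joblib.dump(model, path)                 # persist the trained model to disk
restored = joblib.load(path)             # reload it at serving time
print(restored.predict([[1, 1]]))
```

In the application, the restored object answers predict() calls exactly like the original, so training and serving can live in separate processes.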
7.2 SCREENSHOTS
8.TESTING
• Unit Testing
• White-box Testing
• Black-box Testing
• Validation Testing
• Backend Testing
Unit Testing: Unit testing involves the design of test cases that validate that the
internal program logic is functioning properly, and that program input produces
valid outputs. All decision branches and internal code flow should be validated. It
is the testing of individual software units of the application. It is done after the
completion of an individual unit before integration. This is structural testing that
relies on knowledge of the unit's construction and is invasive. Unit tests perform
basic tests at component level and test a specific business process, application,
and/or system configuration. Unit tests ensure that each unique path of a business
process performs accurately to the documented specifications and contains clearly
defined inputs and expected results.
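A unit test for one small unit might look like the following sketch, using Python's built-in unittest module. The encode_labels function is a hypothetical stand-in for a project routine, not actual project code.

```python
# Illustrative unit test: defined inputs, expected results, one unit under test.
import unittest

def encode_labels(labels):
    # Map each distinct label to an integer, assigned in sorted order.
    mapping = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [mapping[lab] for lab in labels]

class TestEncodeLabels(unittest.TestCase):
    def test_known_labels(self):
        # 'Benign' sorts before 'DDoS', so it gets code 0.
        self.assertEqual(encode_labels(['DDoS', 'Benign', 'DDoS']), [1, 0, 1])

    def test_empty_input(self):
        self.assertEqual(encode_labels([]), [])

suite = unittest.TestLoader().loadTestsFromTestCase(TestEncodeLabels)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Each test exercises one path of the unit with clearly defined inputs and expected results, matching the description above.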
White-box Testing: It is a test case design method that uses the control structure
of the procedural design to derive test cases. Using white-box testing methods, it
was guaranteed that most of the independent paths within modules had been
exercised at least once, all logical decisions exercised on their true and false sides,
all loops executed at their boundaries, and internal data structures exercised to
ensure their data validity. White-box testing has been done to achieve the following
objectives. Logic errors and incorrect assumptions are inversely proportional to the
probability that a program path will be executed. Errors tend to creep into the work
when we design and implement functions, conditions, or controls that are out of the
mainstream. We often believe that a logical path is not likely to be executed when
in fact it may be executed on a regular basis. When a program is translated into
programming language source code, it is likely that some typing errors will occur.
Many will be uncovered by syntax and type-checking mechanisms, but others may
go undetected until testing begins.
Black-box Testing: Although tests are designed to uncover errors, they are also
used to demonstrate that the software functions are operational, input is properly
accepted, output is correctly produced, and the integrity of external information is
maintained. A black-box test examines some fundamental aspects of a system with
little regard for the internal logical structure of the software. All input screens were
thoroughly tested for data validity and smoothness of data entry operations. Test
cases were formulated to verify whether the system works properly in rare
conditions as well. Error conditions were checked. Data entry operations should be
user-friendly and smooth; it is easier for operators if they can enter data through
the keyboard only.
Backend Testing: Database testing is also known as Backend testing. There are
different databases like SQL Server and MySQL. Database testing involves testing
of table structure, schema, stored procedures, data structure, and so on.
Functional Testing: Functional tests provide systematic demonstrations that
functions tested are available as specified by the business and technical
requirements, system documentation, and user manuals. Functional testing is
centered on the following items: 1. Valid Input: identified classes of valid input
must be accepted. 2. Invalid Input: identified classes of invalid input must be
rejected. 3. Functions: identified functions must be exercised. 4. Output: identified
classes of application outputs must be exercised.
9. CONCLUSION
The analytical process started from data cleaning and processing, missing-value
treatment, and exploratory analysis, and finished with model building and
evaluation. The algorithm with the highest accuracy score on the public test set is
identified, and that model is used in the application to help determine the type of
cyberattack.
FUTURE WORK
10. REFERENCE
4. Bootstrap: Bootstrap Documentation.
https://getbootstrap.com/docs/4.0/getting-started/introduction/