A Machine Learning Proposal


A MACHINE LEARNING APPROACH TO INFORMATION SECURITY

BY

OLATAYO, Raymond Oluwafemi


REG NO: 16283260

SUBMITTED TO
MRS. M.M USMAN

DEPARTMENT OF COMPUTER SCIENCE


UNIVERSITY OF ABUJA

JUNE, 2021

CHAPTER ONE: INTRODUCTION

1.1 Background of the Study


Information security, infosec for short, is the practice of protecting information against
attackers and other unauthorized parties. It is the preservation of the confidentiality,
integrity and availability of information; other properties, such as authenticity,
accountability, non-repudiation and reliability, can also be involved (ISO/IEC
27000:2009). Put differently, infosec is the protection of information and information systems
from unauthorized access, use, disclosure, disruption, modification, or destruction in order to
provide confidentiality, integrity, and availability (CNSS, 2010).
1.2 Statement of Problem
Information attacks and cyber-attacks are increasing. Advanced security measures are needed to
reduce or prevent them. Information security attacks and threats take various forms. Some of
the most common threats today are software attacks, theft of intellectual property, theft of
identity, theft of equipment or information, sabotage, and information extortion.
1.3 Aim and Objectives of the Study

The aim of this study is to examine a machine learning approach to information security.
Specifically, it seeks to:
1. Investigate the use of the UNSW-NB15 (University of New South Wales, NB 2015) dataset for
the protection of information systems
2. Find out how Naive Bayes is used for the protection of information systems
3. Examine the use of the C4.5 Decision Tree machine learning algorithm for the protection of
information systems
4. Ascertain how KNN (K-Nearest Neighbour) is used for the protection of information
systems
1.4 Scope and Limitation of Study

The study examines a machine learning approach to information security. Machine
learning approaches are widely used to solve various types of information security problems. The
proposed project covers a machine learning-based network intrusion detection system for the
protection of information systems, built on the UNSW-NB15 dataset and the Naive Bayes, KNN and
Decision Tree models.

However, in carrying out this research, the researcher expects to face constraints of time and
finance.
1.5 Significance of the Study
The results of this study will help cyber security experts by showing them how to safeguard
and secure an information system against the activities of hackers and cyber attackers. The aim
of this research work is to keep information systems secured, and sustained in a secured state,
throughout their period of usage (lifetime).

1.6 Definition of Terms
Algorithm: a process or set of rules to be followed by a computer.
Cyber attack: any attempt to expose, alter, disable, destroy, steal or gain information through
unauthorized means.
Cyber security: the practice of protecting systems, networks, and programs from digital attacks.
Machine learning: the study of computer algorithms that improve automatically through
experience and by the use of data.
Information security: the practice of protecting information by mitigating information risks,
sometimes shortened to infosec. It is part of information risk management.
CHAPTER TWO: LITERATURE REVIEW
2.1 Machine Learning
Machine learning (ML) is the study of computer algorithms that improve automatically through
experience and by the use of data. It is seen as a part of artificial intelligence.
2.2 Machine Learning Approaches
Machine learning approaches are traditionally divided into three broad categories, depending on
the nature of the "signal" or "feedback" available to the learning system:

• Supervised learning: The computer is presented with example inputs and their desired
outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to
outputs.
• Unsupervised learning: No labels are given to the learning algorithm, leaving it on its own to
find structure in its input. Unsupervised learning can be a goal in itself (discovering hidden
patterns in data) or a means towards an end (feature learning).
• Reinforcement learning: A computer program interacts with a dynamic environment in
which it must perform a certain goal (such as driving a vehicle or playing a game against an
opponent). As it navigates its problem space, the program is provided feedback that's
analogous to rewards, which it tries to maximize.
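
As a rough illustration of the first two categories, the sketch below contrasts a supervised learner, which uses labels to place a decision threshold, with an unsupervised one, which groups the same values without seeing any labels. The data, the single feature and both function names are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy data: one feature (e.g. packets per second) per connection.
values = [1.0, 1.2, 0.9, 8.0, 8.5, 9.1]
labels = [0, 0, 0, 1, 1, 1]  # 0 = normal, 1 = attack (used only by the supervised learner)


def fit_supervised_threshold(xs, ys):
    """Supervised: use the labels to place a threshold midway between the class means."""
    mean_normal = mean(x for x, y in zip(xs, ys) if y == 0)
    mean_attack = mean(x for x, y in zip(xs, ys) if y == 1)
    return (mean_normal + mean_attack) / 2


def two_means(xs, iterations=10):
    """Unsupervised: group the values into two clusters without looking at any labels."""
    low, high = min(xs), max(xs)  # initial cluster centres
    for _ in range(iterations):
        near_low = [x for x in xs if abs(x - low) <= abs(x - high)]
        near_high = [x for x in xs if abs(x - low) > abs(x - high)]
        low, high = mean(near_low), mean(near_high)
    return low, high


threshold = fit_supervised_threshold(values, labels)
centres = two_means(values)
```

On this toy data both approaches separate the small values from the large ones; the difference is that only the supervised learner was told which group is the attack.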

2.3 Information Security


Information security, sometimes shortened to infosec, is the practice of protecting information by
mitigating information risks. It is part of information risk management.
2.4 Information Security Threats 
Information security threats come in many different forms. Some of the most common threats
today are software attacks, theft of intellectual property, theft of identity, theft of equipment or
information, sabotage, and information extortion. Most people have experienced software attacks
of some sort.
2.5 Responses to Threats
Possible responses to a security threat or risk are:

• reduce/mitigate – implement safeguards and countermeasures to eliminate vulnerabilities or
block threats
• assign/transfer – place the cost of the threat onto another entity or organization such as
purchasing insurance or outsourcing
• accept – evaluate if the cost of the countermeasure outweighs the possible cost of loss due to
the threat
CHAPTER THREE: ANALYSIS AND DESIGN
3.1.0 ANALYSIS OF THE EXISTING SYSTEM
The existing system is a machine learning-based network intrusion detection system for the
protection of information systems. It refers to the systems, tools and processes that are designed
and then deployed to shield sensitive and confidential data from being compromised or tampered with.
3.1.1 STRENGTH OF THE EXISTING SYSTEM
The strength of this system is that it safeguards and secures an information system against the
activities of hackers and cyber attackers.
3.1.2 WEAKNESSES OF THE EXISTING SYSTEM
The weakness of the existing system is that infosec was traditionally considered an IT
problem, which could not be further from the truth. Attacks can originate from any weak link in
the company, regardless of hierarchy or department, so it is imperative that the entire enterprise
is protected by seamless security programmes.
3.2 ANALYSIS OF THE PROPOSED SYSTEM
The UNSW-NB15 dataset has two attributes that can serve as the class label: label and
attack_cat. The label attribute is binary, with a value of 0 for a normal connection and a
value of 1 for an attack connection. The attack_cat attribute has 10 values: one for each of the
nine attack categories and one for normal connections.
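
The relationship between the two class-label attributes can be sketched as follows. The category names are those commonly listed for UNSW-NB15 and should be checked against the copy of the dataset in use; the records and the binary_label helper are hypothetical.

```python
# Category names as commonly listed for UNSW-NB15 (an assumption to verify
# against the dataset files, where spelling can vary slightly).
ATTACK_CATEGORIES = [
    "Fuzzers", "Analysis", "Backdoor", "DoS", "Exploits",
    "Generic", "Reconnaissance", "Shellcode", "Worms",
]
ALL_CATEGORIES = ["Normal"] + ATTACK_CATEGORIES  # the 10 values of attack_cat


def binary_label(attack_cat):
    """Return 0 for a normal connection and 1 for any attack category."""
    return 0 if attack_cat == "Normal" else 1


# Hypothetical records holding only the class-label attributes.
records = [
    {"attack_cat": "Normal"},
    {"attack_cat": "DoS"},
    {"attack_cat": "Exploits"},
]
for record in records:
    record["label"] = binary_label(record["attack_cat"])
```

Training on label gives a binary (normal versus attack) detector, while training on attack_cat gives a 10-class classifier.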
3.3 METHODOLOGY
This section presents machine learning-based information security intrusion detection models.
The methodology comprises several processing steps: exploring the security dataset, preparing raw
data, determining feature importance and ranking, and building the resultant models.
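
The first three steps can be sketched on a miniature dataset as shown below. The feature names, the rows and the ranking score (the gap between class means after scaling) are illustrative assumptions, not the method used in this study; a real pipeline would apply the same steps to UNSW-NB15 and pass the ranked features to the models.

```python
from collections import Counter

# Hypothetical miniature dataset: each row is (features, label).
FEATURES = ["duration", "bytes_sent"]
rows = [
    ([0.1, 200.0], 0),
    ([0.2, 220.0], 0),
    ([0.1, 5000.0], 1),
    ([0.3, 5200.0], 1),
]

# Step 1: explore the data, here just the class distribution.
class_counts = Counter(label for _, label in rows)


# Step 2: prepare the raw data by min-max scaling each feature to [0, 1].
def min_max_scale(data):
    columns = list(zip(*(features for features, _ in data)))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for features, label in data:
        new = [
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(features, lows, highs)
        ]
        scaled.append((new, label))
    return scaled


scaled_rows = min_max_scale(rows)


# Step 3: rank features by a simple stand-in score, the gap between class means.
def class_mean_gap(data, index):
    normal = [f[index] for f, y in data if y == 0]
    attack = [f[index] for f, y in data if y == 1]
    return abs(sum(attack) / len(attack) - sum(normal) / len(normal))


ranking = sorted(
    FEATURES,
    key=lambda name: class_mean_gap(scaled_rows, FEATURES.index(name)),
    reverse=True,
)
```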
3.4 SYSTEM DESIGN
System design is a solution to a problem; it demands the translation of the requirements
uncovered in analysis into possible ways of meeting them (E.O Nwachukwu).
3.4.1 INPUT AND OUTPUT SPECIFICATION
The inputs and outputs of a machine learning task may be of different kinds. Generally, they
take the form of numeric (both discrete and real-valued) or nominal attributes. Numeric attributes
may take continuous numeric values, whereas nominal attributes take values from a
predefined set.
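
To make the distinction concrete, the sketch below turns one record with a real-valued, a discrete and a nominal attribute into a purely numeric vector, using one-hot encoding for the nominal value. The attribute names and the predefined protocol set are assumptions made for illustration.

```python
def one_hot(value, categories):
    """Encode a nominal value as a 0/1 vector over a predefined category set."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Hypothetical predefined set for a nominal attribute such as the protocol.
PROTOCOLS = ["tcp", "udp", "icmp"]

# One record with a real-valued, a discrete, and a nominal attribute.
record = {"duration": 0.52, "packet_count": 14, "protocol": "udp"}

# Numeric attributes pass through; the nominal one expands to three columns.
encoded = [record["duration"], float(record["packet_count"])] + one_hot(
    record["protocol"], PROTOCOLS
)
```

Distance-based models such as KNN need this kind of numeric encoding, whereas decision trees and Naive Bayes can work with nominal values directly.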

CHAPTER FOUR: SYSTEM IMPLEMENTATION AND TESTING
4.1 Implementation
4.1.1 Naïve Bayes (NB).
Naive Bayes algorithms are probabilistic classifiers which make the a priori assumption that the
features of the input dataset are independent of each other given the class. They are scalable and
do not require huge training datasets to produce appreciable results.
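
A minimal sketch of a Naive Bayes classifier for nominal features is given below, assuming add-one (Laplace) smoothing. The function names and the toy connections are hypothetical; this is not the implementation evaluated in this work.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for features, label in zip(samples, labels):
        for index, value in enumerate(features):
            value_counts[(label, index)][value] += 1
            feature_values[index].add(value)
    return class_counts, value_counts, feature_values


def predict_naive_bayes(model, features):
    """Pick the class with the highest log-posterior under the independence assumption."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for index, value in enumerate(features):
            # Add-one smoothing avoids zero probabilities for unseen values.
            numerator = value_counts[(label, index)][value] + 1
            denominator = count + len(feature_values[index])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_class, best_score = label, score
    return best_class


# Hypothetical toy connections described by (protocol, service).
samples = [("tcp", "http"), ("tcp", "http"), ("udp", "dns"), ("udp", "-"), ("tcp", "-")]
labels = ["normal", "normal", "normal", "attack", "attack"]
model = train_naive_bayes(samples, labels)
prediction = predict_naive_bayes(model, ("udp", "-"))
```

Because the per-feature probabilities are simply multiplied (added in log space), training reduces to counting, which is why the method scales well.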
4.1.2 K-Nearest Neighbour (KNN).
KNN classifiers are used for classification and can handle multi-class problems. However, they
are computationally demanding at test time: to classify each test sample, they
compare it against all the training samples.
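
The comparison against every training sample can be sketched as below, assuming Euclidean distance and majority voting. The function name, the two-feature points and the class names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Every prediction scans the whole training set, which makes KNN costly at test time.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical scaled feature vectors with three classes.
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.5, 0.1)]
train_labels = ["normal", "normal", "normal", "dos", "dos", "probe"]

prediction = knn_predict(train_points, train_labels, (0.85, 0.85), k=3)
```

Since distances depend on feature scale, the features would normally be scaled to a common range before KNN is applied.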
4.1.3 C4.5 Decision Tree
In this type of classification, the target concept is represented in the form of a tree.
The tree is built by using the principle of recursive partitioning. An attribute is selected as a
partitioning attribute (also referred to as a node) based on some criterion (such as information
gain) [Mit97].
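
The selection of a partitioning attribute by information gain can be sketched as follows. The rows, attribute names and helper functions are hypothetical; C4.5 itself normalises this score into a gain ratio and then repeats the selection recursively on each partition, which is omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning the rows on `attribute`."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Choose the partitioning attribute (node) with the highest information gain."""
    return max(attributes, key=lambda name: information_gain(rows, labels, name))


# Hypothetical connections: here the state separates the classes, the protocol does not.
rows = [
    {"protocol": "tcp", "state": "FIN"},
    {"protocol": "tcp", "state": "INT"},
    {"protocol": "udp", "state": "INT"},
    {"protocol": "udp", "state": "FIN"},
]
labels = [0, 1, 1, 0]
root = best_attribute(rows, labels, ["protocol", "state"])
```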
4.2 Testing
The models were evaluated using the testing dataset, from the work,
CHAPTER FIVE: SUMMARY, CONCLUSION AND RECOMMENDATION
5.1 Summary
5.2 Conclusion
5.3 Recommendation
5.4 Future work
REFERENCE
APPENDIX
