Article Vivien Armel Eyangolo

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

1

Design and implementation of artificial intelligence for the prevention of


cybercrime in the Republic of Congo based on machine learning: Approach
based on artificial learning in cybersecurity for the detection of intrusions.

Vivien Armel EYANGOLO1 Roch Corneille NGOUBOU2


Bircham International University Computer Training Center-Computer and Research
Computer of science Center for the Army and Security,
Madrid, Spain Sciences and technologies Faculty
vivieneyangolo@yahoo.fr Brazzaville, CONGO
corneillerochdesuz@gmail.com

Pierre KAFUNDA KATALAY3


Faculty of science
Mathematics and computer science laboratory University
Kinshasa, DRC.
Pierre.kafunda@unikin.ac.cd

Abstract— Companies produce enough data in their INTRODUCTION


operations and this data is stored in their information
In our modern technological age, cybersecurity
systems. These information systems operate in an plays an essential role in the protection against the
insecure environment, that is to say less secure because ever-present dangers of cybercrime. These
of the communication channel used which is the malicious attacks, such as hacks, ransomware and
online fraud, highlight the importance of ensuring
Internet. It is exposed to several types of attacks such
the protection of our sensitive data. Faced with the
as: denial of service, SQL injection, password cracking, ever-evolving tactics employed by cybercriminals,
etc. This is how cyber security was developed to deal maintaining cybersecurity is a constant battle. This
with these scourges of attacks which can alter the requires the development of revolutionary solutions
capable of effectively protecting the integrity,
proper functioning of an organization. Cyber security
confidentiality and accessibility of information in
is a set of processes aimed at protecting an information the field of virtual reality
system against cyber-attack. Our approach in this
Most learning systems assume that
article is to develop a security system based on artificial
all the datasets used to learn are balanced. However,
intelligence, more precisely on artificial learning. The
objective of artificial learning is to create predictive
when it comes to real applications, this balance is not

models from a training sample. In this article we always verified. And in this case, the classifier
propose a new approach, which consists of controlling cannot accurately detect the false positive and false
the sensitivity of support vector machine using the negative.
perturbation method.
In a two-class classification
Keywords—component; Artificial learning, Predictive problem, when the training data of the majority class
model, Support vector machine, Imbalanced classes,
are much greater in number than those of the
Sensitivity and Specificity.
minority class, the algorithms allowing to obtain a
minimum error rate will always tend to neglect the
class minority, because of this disproportion. Which
2

justifies the great connection between the two forms - Section 3 is the
of asymmetry in supervised learning. Indeed, evaluation of our model.
asymmetry comes in two main forms: class
imbalance and cost asymmetry. Class imbalance
concerns problems where one of the modalities of
the target variable is much less represented than the
others, which disrupts the learning algorithms. Cost
asymmetry concerns cases where the costs of errors
are not symmetrical. In this article, it is the
asymmetry of classes that interests us. To deal with
this problem, several algorithms have been
developed, for example sampling, under sampling.
In this article, we propose a new approach, which is
an algorithmic approach which consists of
perturbing the machine, so as to take into account the
minority class.

Our problem is presented as


follows: Given a learning sample in an asymmetric
situation, how to create a machine capable of making
optimal assignments, without being influenced by
this class imbalance. To solve this problem, we
propose an algorithmic approach based on the
disturbance of SVM by inserting two parameters at
the level of the economic function, one of which is
the cost of misclassification of positive examples
and the other designates the cost of misclassification
of negative examples.

This article is organized as


follows:

- Section 1 shows how to


determine model
parameters when the data
is balanced;
- in section 2 which is our
contribution, we showed
how to determine the
model parameters when
the data is unbalanced;
3

I. CYBER SECURITY to be easy to access, in particular to protect the


simple IS from general and simple threats.

Several definitions of the term “cybersecurity” have For its part, the risk analysis method makes it
been established at the national and international possible to resolve the problem from above by
levels. For the purposes of this document, studying the threats and terrible events which
“cybersecurity” means all tools, policies, guidelines, particularly affect the IS studied, in order to better
risk management methods, actions, training, best adapt to the security measures in place. Therefore,
practices, safeguards and technologies that can be risks can have various origins or properties, but it is
used to protect the availability, integration and obvious that what people can now call cyber risks is
confidentiality of assets in connected infrastructures a risk that puts pressure on countries, organizations
of government, private organizations and users. and individuals. Risk analysis methods are
These assets include connected computing devices, particularly suited to complex systems subject to
personnel, infrastructure, applications, services, major threats. However, it must rely on a relatively
telecommunications systems and data in the cyber diversified and specific skills base, which will allow
environment. the implementation of appropriate methodological
tools. Depending on the objectives set for the
Before going any further, this is probably where we exercise, it may require a substantial investment of
need to try to clarify the use of several expressions resources.
and terms commonly used almost interchangeably,
such as: These two methods complement each other:
depending on the issues of the IS studied, the
 information system security (ISS) which compliance method will constitute a basic
can be understood as a set of measures foundation which can effectively complement the
implemented to achieve and maintain the
risk analysis work, and adapt more precisely and
cybersecurity state of an information
specifically to the IS concerned and its context. The
system (IS);
 cybersecurity which is therefore the business processes in which he participates and the
desirable state to achieve; risk scenarios that put him under pressure.
 cyber defense which also includes Cyber risk, the threat of exploiting one or more
measures implemented to maintain a state
vulnerabilities in a digital system, is itself a risk. It
of cybersecurity, but in the face of
has strategic importance, and its achievements can
particularly marked adversity and within a
very specific time frame. be fatal and dizzying for an organization. For all
Therefore, the protection of the Information System these reasons, if the notion of cyber risk historically
(IS) aims to identify, analyze and evaluate the risks refers to a very technical area, it is now obvious that
that affect these assets (which can be hardware, it will inevitably extend to areas that matter to
software, business processes), and to take the managers and decision-makers, and will come more
necessary measures to control these risks. In order to from all areas. At the functional level, any profession
properly protect itself from threats from online that uses digital technology to support its value chain
sources, a country or organization can take certain or produces digital equipment or services will be
security measures. In this area, you have the choice affected by cyber risks. Digital security is, as we
between several levers. However, the often say, everyone’s business. This awareness and
implementation of such measures must be done in an this collective and holistic involvement in the search
informed and reasonable manner, compatible with for the state of cybersecurity is not trivial.
the threat of pressure on the elements intended to be Taking preventive measures is obviously an
protected and the value attributed to them by their important aspect of ensuring the security of the
owners. system network. However, this is not unique. Once
The first basic approach consists of setting up a the protection elements are deployed, it is also useful
measurement base based on a priori benchmarks to dedicate resources to the monitoring system in
applicable to the organization's environment. These order to detect possible security incidents that may
standards can be associated with professional or arise. For this reason, security monitoring
activity-specific regulations. They can also be very capabilities rely at least on software or hardware
basic lists of indicators. This approach focuses tools and data to guide them properly. In order to
resources on implementing security measures, detect non-trivial attacks, manual analysis is almost
instead of defining a list of measures to be essential. When capabilities are implemented as
implemented. It is not very personal, but is designed services for the entire organization or a group of
4

organizations, larger infrastructure can be deployed 𝑥 → ф (𝑥 )


and dedicated human resources can be hired to
provide monitoring functions, analysis and possible With representation space of larger dimension than
answers.
the data spaceℱX
A country that wishes to respond to a victim attack
carries out a detailed analysis of the opportunity to
use specific levers. It will be difficult to find a
combination of levers expressed as precisely as
possible and consistent with strategic political
objectives, to maximize the effectiveness of the ф
response while remaining within a precise
framework of action, such as the framework of
international law. In order to conduct the analysis Figure 1 :Data space transformation
and decision-making process as calmly as possible,
during a real attack, it would be interesting if the
country concerned could prepare in advance, which The set of training data becomes:
will help it direct its final response and, if necessary,
where appropriate, to guide its implementation of E = , where: and Є {-1, +1}.
operational planning.
[8,14]{(ф (𝑥𝑖 ) , 𝑦𝑖 )}1≤i≤M ф (𝑥𝑖 ) ∈ ℱ 𝑦𝑖
II. NON LINEAR SUPPORT VECTOR
Therefore, the optimization problem can be written
MACHINES [5, 7,10,12, 14,15,22]
as follows:
The Linear Separator (decision
1 2
function) defined by the SVM is given by: min ‖w‖
2
𝒇(𝒙⃗⃗⃗ )=. +b, where: 𝒘𝒙 { 𝑆/𝑐 (1)
w is a vector perpendicular to the linear separator, ∀i, yi (w. ф (𝑥𝑖 ) + b) ≥ 1
called a weight vector;
1
b is called bias; 𝐿 = ‖𝜔‖2 − ∑𝑀
𝑖=1 𝛼𝑖 [𝑦𝑖 (𝜔. Φ(𝑥𝑖 ) + 𝑏) − 1](2)
2
“. » represents the dot product of the vectors
and.𝒘𝒙𝒙 ⃗⃗⃗ 𝒘
⃗⃗⃗⃗ [5, 7] By solving the system

𝜕𝐿
Assume that the data is nonlinearly =0
{𝜕𝜔
𝜕𝐿
separable. The determination of the decision =0
𝜕𝑏
function first involves a transformation of the data
space into another characteristic () or representation We obtain the result:
space, possibly of high dimension, where the data
w = ∑𝑀 𝑖=1 αi 𝑦𝑖 ф (𝑥𝑖 )
becomes linearly separable.ℱ { 𝑀 (3)
∑𝑖=1 αi 𝑦𝑖 = 0

This approach is based on Cover's Withw ∈ ℱ


theorem in 1965 which indicates that a set of
By replacing (3) in (2) we obtain the dual of
examples transformed nonlinearly into a higher-
problem (1):
dimensional space is more likely to be linearly
separable than in its original space. [22] 1
max ∑𝑀
𝑖=1 𝛼𝑖 − ∑𝑀 𝑀
𝑖=1 ∑𝑗=1 𝛼𝑖 𝛼𝑗 𝑦𝑖 𝑦𝑗 ⟨ф (𝑥𝑖 ), ф (𝑥𝑗 )⟩
2
𝑆/𝐶
 Determination of Wide Margin Separator
∀i, ∑𝑀𝑖=1 𝛼𝑖 𝑦𝑖 = 0
[10, 12] { 𝛼𝑖 ≥ 0
(4)
Consider the applicationф ∶ X → ℱ
5

For certain characteristic spaces and 𝜉𝑖 Indicates to what extent the example is on
associated applications, the scalar products are easily the wrong side or not.𝑥𝑖
calculated using specific functions, called kernel
If =0 then is well classified𝜉𝑖 𝑥𝑖
functions such as:ф
If ≥1 then is misclassified𝜉𝑖 𝑥𝑖
𝐾(𝑥𝑖 , 𝑥𝑗 ) = ⟨ф (𝑥𝑖 ), ф (𝑥𝑗 )⟩
(5) Thus, in general, the optimization problem of
Support Vector Machines can be written as follows
The interest of these kernel functions is to
[4,6,23]:
make it possible to calculate scalar products without
1
2
having to explicitly transform the data by the min ‖w‖ + 𝐶 ∑𝑀
𝑖=1 𝜉𝑖
2
function, therefore, without necessarily knowing this 𝑆/𝐶 (7)
∀i, yi (w. ф (𝑥𝑖 ) + b) ≥ 1 − 𝜉𝑖
function.ℱ ф ф
{ 𝜉𝑖 ≥ 0
By integrating equation (1) into (2), we obtain the
The dual of the general optimization problem to be
following dual problem:
solved will therefore be:
1
max ∑𝑀
𝑖=1 𝛼𝑖 − ∑𝑀 𝑀
𝑖=1 ∑𝑗=1 𝛼𝑖 𝛼𝑗 𝑦𝑖 𝑦𝑗 𝐾(𝑥𝑖 , 𝑥𝑗 ) 1
2 max ∑𝑀
𝑖=1 𝛼𝑖 − ∑𝑀 𝑀
𝑖=1 ∑𝑗=1 𝛼𝑖 𝛼𝑗 𝑦𝑖 𝑦𝑗 𝐾(𝑥𝑖 , 𝑥𝑗 )
2
𝑆/𝐶
𝑆/𝐶
∀i, ∑𝑀𝑖=1 𝛼𝑖 𝑦𝑖 = 0 𝑀

∀i, 𝑖=1 𝛼𝑖 𝑦𝑖 = 0
{ 𝛼𝑖 ≥ 0
{ 0 ≤ 𝛼𝑖 ≤ 𝐶
(6)
(8)

Classification of New Data[12, 15]


 Core Function

From the above, we can therefore deduce


Consider a set X of observations in the
the decision function, for the classification of new
training data. The Gram matrix of the kernel
data:
associated with this set is a square matrix of order M

𝑓(𝑥) = 𝑠𝑔𝑛 (𝑤. ф (𝑥 ) + 𝑏) and general term: . [1.8,


25]{𝑥𝑖 }1≤i≤M 𝑘(. , . )𝐾(𝑖, 𝑗) = 𝑘(𝑥𝑖 , 𝑥𝑗 )
= 𝑠𝑔𝑛 ((∑𝑀 ∗
𝑖=1 𝛼 i 𝑦𝑖 ф (𝑥𝑖 )). ф (𝑥 ) + 𝑏)

Theorem 1[8]:
= 𝑠𝑔𝑛 (∑𝑀 ∗
𝑖=1 𝛼 i 𝑦𝑖 ⟨ф (𝑥𝑖 ), ф (𝑥 )⟩ + 𝑏)
A function is a valid kernel if it is symmetric and
= 𝑠𝑔𝑛 (∑𝑀 ∗
𝑖=1 𝛼 i 𝑦𝑖 𝐾(𝑥𝑖 , 𝑥) + 𝑏) positive definite.𝒌𝑿 × 𝑿 → ℝ
In other words, a function is kernel if and only if:𝑘

- 𝐾(𝑖, 𝑗) = 𝐾(𝑗, 𝑖) ;
A- Soft margin [4,8,23,25]
- ∑𝑀 𝑀
𝑖=1 ∑𝑗=1 𝑐𝑖 𝑐𝑗 𝑘(𝑥𝑖 , 𝑥𝑗 ) ≥ 0,

In the case of soft margin, we proceed in the ∀ 𝑐1 , 𝑐2 … 𝑐𝑀 ∈ ℝ


same way as we did above. Our contribution
This last condition results in the fact that all
consisted of disrupting the economic function and
the eigenvalues of the Gram matrix are positive and
relaxing the constraints by introducing error terms.𝜉𝑖
non-zero.
6

Proof: (see [8])

B- Examples of Some Kernel Functions [2,3,


17] C- SENSITIVITY AND SPECIFICITY [20]

 Linear: 𝐾(𝑥𝑖 , 𝑥𝑗 ) = 𝑥𝑖 . 𝑥𝑗 1- Sensitivity

 Polynomial: or𝐾(𝑥𝑖 , 𝑥𝑗 ) =
The probability that a positive example is classified
(𝑥𝑖 . 𝑥𝑗 )𝑑 (𝑥𝑖 . 𝑥𝑗 + 𝑐)𝑑 as positive by the model is what we call the
sensitivity rate:
 RBF (Radial Basic Function): 𝐾(𝑥𝑖 , 𝑥𝑗 ) =
2
𝑒 −𝛾‖𝑥𝑖 −𝑥𝑗 ‖ , 𝛾 > 0 𝑉𝑃
𝑠𝑒𝑛 =
𝑉𝑃 + 𝐹𝑁
III. UNBALANCED CLASSES AND SVM
[20,24] 2- Specificity

The probability that a negative individual will be


A- Performance Measurement of a Classifier
classified as negative is what we call the specificity
rate:
Performance evaluation is a very
𝑉𝑁
important step when doing supervised learning. We 𝑠𝑝𝑒 =
𝑉𝑁 + 𝐹𝑃
have said that imbalance poses a performance
problem when it is not taken into account during the
By aggregating these two indices, we can obtain a
learning process. The question is how to tell that the
single index:
model is efficient or not?
To measure the performance of a classifier we use
3- Youden index [24]
the confusion matrix.
𝑦𝑜𝑢𝑑𝑒𝑛 = 𝑠𝑒𝑛 + 𝑠𝑝𝑒 − 1
B- Indicators in the case of two classes
The decision rule is as follows: "the model becomes
As we said above, real supervised better when the Youden index is close to 1".
learning situations are generally two-class problems
where one of them is the class of interest. Very often, 4- Likelihood Report𝐿[19.24]
we notice that the examples of this so-called class of
interest are in the minority. In the following, we 𝑠𝑒𝑛
consider the examples of the minority class as 𝐿=
1 − 𝑠𝑝𝑒
positive and those of the majority class as negative.
The decision rule is: “The higher the likelihood ratio,
The basic tool for model evaluation is the confusion the better the model”
matrix. Evidence (see [20]).

The following table gives certain results according


TABLE I. Confusion Matrix [18,24]
to the likelihood ratio.

TABLE II. Contribution of the model according to the


Real Predicted class Sum
likelihood ratio. [24]
class + -
+ True False 𝑁+
Positives Negatives L Contribution of the Model
(VP) (FN) 10 and above Important
- False True 𝑁− 5-10 Moderate
Positives Negatives 1-5 Weak
(FP) (TN) 1 None
7

problems, and thus generates very high costs when


5- Geometric Mean the classifier misclassifies elements of the minority
class. In the case of intrusion detection for example,
𝐺 = √𝑠𝑒𝑛 ∗ 𝑠𝑝𝑒
saying that an intrusion is normal access is very
This indicator is most used when the data is serious. [9.16]
unbalanced.

Despite the importance attached to


handling unbalanced data, most classifiers tend to
Noticed:
only optimize accuracy without taking into account
These indicators allow, by focusing the relative distribution of each class. As a result,
on the class of interest, to evaluate the quality of the these classifiers poorly classify elements of the
prediction. The recall is equal to the sensitivity
minority class when the data distribution is highly
presented in the previous point. However, precision
is the proportion of positive individuals among those skewed. This poses a performance problem. To solve
who were classified as positive. this problem, we propose an algorithmic solution
which is our contribution which consists of
𝑉𝑃
𝑟= disrupting the economic function by inserting two
𝑉𝑃 + 𝐹𝑁
parameters 𝐶 + And 𝐶 − which are respectively the
𝑉𝑃
𝑝= cost of misclassifications for positive examples and
𝑉𝑃 + 𝐹𝑃
the costs of misclassifications for negative examples.
These two indicators are better when it comes to [11, 20,24]
evaluating the performance of a model where one of
By assigning high misclassification
the classes is the one of interest.
When and, then such a classifier is said to be perfect cost for minority class compared to majority class (),
because it classifies all positive individuals well and the effect of class imbalance will be reduced 𝐶 + >
does not classify a negative individual as
𝐶−
positive.𝑟 = 1𝑝 = 1
We therefore optimize the following
mathematical program:

D- SVM IN CASES OF UNBALANCED


CLASSES [9,11,16,20,21,24,]

The problem of unbalanced classes 1


min ‖𝜔‖2 + 𝐶 + ∑𝑁
𝑖|𝑦𝑖 =1 𝜉𝑖 +
𝜔,𝜉𝑖 2
is becoming very frequent with the applications of
𝐶 − ∑𝑁
𝑖|𝑦𝑖 =−1 𝜉𝑖 (9)
Machine Learning algorithms in several fields,
notably in telecommunications in the detection of 𝑠/𝑐 𝑦𝑖 (𝜔𝑡 𝑥𝑖 + 𝑏) ≥ 1 − 𝜉𝑖
telephone fraud, bioinformatics, text classification,
𝜉𝑖 ≥ 0, 𝑖 = 1, … … . , 𝑁
voice recognition, intrusion detection and others.
[9.21]

In telephone fraud detection,


fraudulent credit card detection and intrusion
detection, the imbalance is real, and when it is not
taken into account, it leads to serious performance
8

𝑁 𝑁
‖𝜔‖2 – Calculation𝐿𝑝
𝐿𝑝 = + 𝐶 + ∑ 𝜉𝑖 + 𝐶 − ∑ 𝜉𝑖
2
𝑖|𝑦𝑖 =1 𝑖|𝑦𝑖 =−1 - Determine𝛼𝑖
𝑁

− ∑ 𝛼𝑖 [𝑦𝑖 (𝜔. 𝑥𝑖 + 𝑏) − 1 + 𝜉𝑖 ] – Calculate the bias b


𝑖=1
Error : 𝐸𝑖 = 𝑓(𝑥𝑖 ) −
𝑁
𝑀
− ∑ 𝜇𝑖 𝜉𝑖 𝑦𝑖 =(∑𝑗=1 𝛼𝑗 𝑦𝑗 𝐾(𝑥𝑗 , 𝑥𝑖 ) + 𝑏) −
𝑖=1 𝑦𝑖

Or𝛼𝑖 ≥ 0 𝑒𝑡 𝜇𝑖 ≥ 0 The complexity of this algorithm is:𝑂(𝑛2 ).


By solving the system:
𝜕𝐿𝑝
𝜕𝜔
=0 IV. DIGITAL EXPERIMENTATION AND
𝜕𝐿𝑝
=0
RESULTS
𝜕𝑏
𝜕𝐿𝑝
=0
𝜕 𝜉𝑖
The data on intrusion detection is
𝛼𝑖 [𝑦𝑖 (𝜔. 𝑥𝑖 + 𝑏) − 1 + 𝜉𝑖 ] = 0
{ 𝜇𝑖 𝜉𝑖 = 0 truly unbalanced. There are many more negative
(10) examples (normal use) than positive examples
(intrusion). In this experiment, we use data from
KDD (Data Mining and Knowlegment Discovery)
Cup 1999 [source:
We obtain the following dual program:
kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
1
𝐿𝐷 = ∑𝑁 𝑁
𝑖=1 𝛼𝑖 − ∑𝑖,𝑗=1 𝛼𝑖 𝛼𝑗 𝑦𝑖 𝑦𝑗 𝑘(𝑥𝑖 𝑥𝑗 ) −
2 ]which are relatively complex data and suitable for
1 1
∑𝑁 𝛼 2 − − ∑𝑁 𝛼 2(11)
4𝑐 + (𝑖 |𝑦𝑖 = +1) 𝑖 4𝑐 (𝑖 |𝑦𝑖 = −1) 𝑖 testing the effectiveness of the modified SVM model

Either𝜖 + =
1
, 𝜖+ =
1 on imbalanced data.
4𝑐 + 4𝑐 −

Indeed, this data was collected and


distributed by the MIT laboratory with the sponsor
The balance between sensitivity and specificity can
be controlled using the following scheme: of DARPA (Defense Advanced Research Projects

The components of the diagonal of the kernel Agency) and AFRL (Air Force Research
matrices are fixed as follows: Laboratory) for the evaluation of research on
intrusion detection. The KDD data is obtained from
raw data from tcpdump, a command-line packet
𝑘(𝑥𝑖 , 𝑥𝑖 ) + 𝜖 + 𝑝𝑜𝑢𝑟 𝑦𝑖 = +1
𝑘(𝑥𝑖 , 𝑥𝑖 ) = {
𝑘(𝑥𝑖 , 𝑥𝑖 ) + 𝜖 − 𝑝𝑜𝑢𝑟 𝑦𝑖 = −1 analyzer, based on simulations on the United States

Algorithm Air Force network over a nine-week period. This


data concerns several attacks and is divided into five
Beginning
classes in total which are: normal, DOS (Denial of
• Introduce: C, Core
Service), R2L (Remote To Local Attack), U2R (User
• Calculation𝐿𝑝 To Root Attack) and Probing Attack. And each
record has 30 attributes shown in the following table.
• Initialize b and all to 0𝛼𝑖
This learning database contains 4940000 examples
• Do Until KT (Kuhn and Tucker conditions) for a size of 744 MB and 10% are used as training
are satisfied:
data. In our experiment, we want to treat the problem
– Find the vector𝜔 in a binary case, which is why we consider the DOS,
9

R2L, U2R, Probing examples belonging to the same 17 num_fil_creation Number of file
creation
class (attack). Which means that we only have two
operations
classes: normal and intrusion. 18 num_shells Number of
command
prompts or
shells requested
TABLE III. Attributes characterizing records 19 num_access_files Number of
[kdd.ics.uci.edu/databases/kddcup99/kddcup99.html] configuration
file access
No. Attribute Description operations
1 Duration Connection 20 num_out_bound_cmds Number of
duration in commands
seconds outside the ftp
2 Protocol Protocol type session
3 Service Network service 21 is_host_login 1 If access
for the belongs to the
destination hot list and 0
4 Flag Connection otherwise
status (normal, 22 is_guess_login 1 if invited 0
error otherwise
5 src_bytes Data size in 23 Count Number of
bytes from connections to
source to the same host as
destination the current
6 dst_bytes Data size in connection two
bytes from seconds ago
destination to 24 srv_count Number of
source connections to
7 land 1 if the the same
connection uses service as the
the same port at current service
the source as at two seconds
the destination ago
8 wrong_fragment Wrong number 25 serror_rate % of
of fragments connections
9 Urgent Number of with SYN error
urgent packets 26 srv_serror_rate % of
10 Hot Number of hot connections
indicators with SYN error
11 num_failed_logins Number of 27 rerror_rate % of
incorrect access connections
attempts with REJ error
12 logged in 1 if accessed 28 srv_rerror_rate % of
successfully and connections
0 otherwise with REJ error
13 num_compromised Number of 29 same_srv_rate % of
compromised connections to
conditions the same
14 root_shell 1 if root shell service
obtained and 0 30 diff_srv_rate % of
otherwise connections to
15 su_attempted 1 if attempting a different
"su root" services
command and 0
otherwise
16 num_root Number of root
accesses
10

Results Obtained Hypothesis test

We implemented our algorithm ((9), (10)) in python We note that, faced with unbalanced data, classical
and found the following results by considering algorithms provide very poor results. We must
1830 example datasets.
therefore take this imbalance into account as we did
in section 2, to improve the performance of the
TABLE IV. Implementation on the algorithm
model. Our proposed approach gives better results.
Attack Normal Total
Attack 57 243 300
Normal 3 1527 1530
Total 60 1770 1830
CONCLUSION
These are the confusion matrices for the classic case Most real data is often unbalanced, and this poses a
and the unbalanced case
performance problem for the classifier which is more
Classic case: likely in this case to classify individuals from the
Out of 300 attacks the machine predicted 57 as attack positive (minority) class as negative, which
and 243 as normal
constitutes a great danger. In this article, we solved
Out of 1530 normal, the machine predicted 3 as this problem of imbalance of training games. We
attack and 1527 as normal
took the case of intrusion detection using the
Machine Learning approach, we used the SVM with
TABLE V. Implementation on the algorithm an algorithmic modification, and the KDD99 cup
Attack Normal Total data. We found very satisfactory results. As future
Attack 218 82 300
prospects we propose the use of ensemble methods
Normal 1 1529 1530
Total 219 1611 1830 which involve several classifiers and then combine
them by majority vote. This way of doing things can
On the other hand, in the table just above, this is the also lead to good results since we use several experts.
case for unbalanced classes and the results are
improved.

Out of 300 attacks the machine predicted 218 as


attack and 82 as normal and out of 1530 normals the
machine predicted 1 as attack and 1529 as normal REFERENCES

By calculating the different parameters allowing the


performance of the model to be measured, we
obtain the following: 1. BOUSQUET Olivier, Introduction to
Support Vector Machines", Center for

TABLE VI. Simulation of model performance Applied Mathematics, Ecole Polytechnique


Palaiseau, Orsay 2001
Classic SVM Our Algorithm
Se Sp Se MS 2. CAPO-CHICHI, Machine learning for
0.19 0.99803922 0.72666667 0.9994641 business relationship detection, University
of Montreal, 2012
TABLE VII. Simulation of model performance
3. FIESCHI Marius, Data Mining, data
Classic SVM Our Algorithm mining: Concepts and techniques,
G_Means G_Means
0.4354234 0.85216883 University of the Mediterranean Aix
Marseille II, February 2006.
11

4. GINNY Mak, "the implementation of 14. MILHEM Hélène, Machine Vector


support vector machines using the Support, Toulouse Institute of
sequential minimal optimization Mathematics, INSA Toulouse, France IUP
algorithm", McGill University, Montreal, SID, 2011-2012
Canada, 2000 15. ORCHARD, B., Yang, C. and Ali, M.:
5. GRAF Geraldine and Julien, Analytical Innovation in Applied Artificial
CRM OLAP analysis tools and Data Intelligence: 17th International Conference
Mining, University of Fribourg, April 26, on Industrial and Engineering Applications
2008 of Artificial Intelligence and Expert
6. HAMOUI Fady, "Fraud detection and Systems, Springer, New York, 2004, 1272
knowledge extraction", Montpellier II p
University, 2007 16. PERNER, P.: Machine Learning and Data
7. EL HASSANI Imane, SVR with boosting Mining in Pattern Recognition: 5th
for long-term forecasting, Polytechnic International Conference, Springer, New
School of the University of Tours, 2011- York, 2007, 913 p.
2012. 17. PLAT John, Fast Training of Support
8. ISHAK Ben Anis, Selection of Variables Vector Machines using Sequential Minimal
by Support Vector Machines for Binary and Optimization, Microsoft Research, August
Multiclass Discrimination in High 14, 2000
Dimensions, thesis, University of the 18. PREUX Ph, Data mining, Course notes,
Mediterranean, 2007 University of Lille 3, August 31, 2009.
9. JAYSHREE Jha, Intrusion Detection 19. RAKOTOMAMONJY Alain and GASSO
System using Support Vector Machine, Gilles, Wide Margin Separators, INSA
International Journal of Applied Rouen-ASI Department, LITIS Laboratory,
Information Systems, 2013 November 11, 2014
10. KAFUNDA Katalay Pierre, Supervised 20. REHAN A. et al, Applying Support Vector
Machine Learning based on Vector Machines to Imbalanced Datasets,
Machine Support for Churn analysis in a Springer-Verlag Berlin Heidelberg, 2004
telecommunications company, Anales de la 21. RUNG-CHING Chen and Kai-Fan Cheng,
faculty des Sciences, volume 1/2017, Using Rough Set and Support
UNIKIN, 2017 VectorMachine for Network Intrusion
11. KIM Sungchul and HWANYO YU, SVM: Detection System, First Asian Conference
Classification, Regression and Ranking, on Intelligent information and Database
Springer-Verlag Berlin Heidelberg 2012. Systems, 2009
12. LIAUDET Bertrand, Data Mining Course, 22. SONGLUN Zhao, "Intrusion detection
EPF – 4, 5th year, Business and Project using Support Vector Machine enhanced
Engineering Option, 2002. with a feature-weight kernel", Computer
13. MARREF Nadia, "Incremental learning & Science, University of Regina, 2007
Support Vector Machines", HADJ
LAKHDAR-BATNA University, 2013
12

23. VAPNIK, VN: The Nature of Statistical Engineering Mathematics, Bristol Universit
Learning Theory, Springer, New York, y, Bristol BS8 1TR, United King, 1999
1995, 188p. 25. WANG, L.: Support Vector Machines:
24. VEROPOULOS, C. CAMPBELL, N. Theory and Applications, Springer, New
CRISTIANINI, Controlling the Sensitivit y York, 2005, 431 p
of Supp ort Vector Machines Department of

You might also like