Machine Learning Algorithms For Spotting 6G Network Penetration For Different Attacks


Anindya Bal, Dept. of EEE, BRAC University, Dhaka, Bangladesh, anindyabal007@gmail.com
Nabonita Halder, Dept. of CSE, PSTU, Barishal, Bangladesh, nabonitahalder.cse@gmail.com
Rahat Tajwar, Dept. of EEE, BRAC University, Dhaka, Bangladesh, rahat.tajwar@gmail.com

Abstract—With the exponential evolution of current Internet and mobile connectivity technology, networking schemes, computers and services have become more diverse and heterogeneous in recent years. More sophistication must be deployed in order to effectively coordinate, handle, operate and optimize networking frameworks. Network security devices are extremely critical in securing an information infrastructure. With the enormous development of the internet, assault incidents as well as new possible attacks are growing every day. An Intrusion Detection System (IDS) is one of the main approaches that can be used to deal with the problem. One of several techniques used in IDS is machine learning. Machine Learning IDS have been delivering high precision and good tracking of novel threats in recent decades. In this paper, the efficiency of a Machine Learning algorithm called Decision Tree (J48) is analyzed and compared to three other Machine Learning algorithms: Neural Networks (NN), Support Vector Machines (SVM), and a flexible narrative web-based learning algorithm called fast machine learning (FML). The algorithms were evaluated in terms of precision rate (PR), selection rate (SR), false alarm rate (FAR), and accuracy for four different network attacks. According to the results of the tests, the Decision Tree (J48) algorithm outperforms the other three algorithms.

Index Terms—J48, NN, SVM, FML, Precision rate, Selection rate, False alarm rate.

I. INTRODUCTION

Machine Learning (ML) has been viewed as a paradigm change for 6G networks, where network knowledge can be recognized from the transmitter to the receiver in order to satisfy more complex and diverse specifications. Machine learning must be analysed from the viewpoint of connectivity in order to achieve system intelligence in potential 6G networks. The fact that this methodology has already been used in online games, digital medical instruments, and self-driving cars encourages one to concentrate more on the architecture of machine learning techniques and operating equipment. Furthermore, because this methodology is so distinct from the conventional scientific methods used in today's information technology, heterogeneous wireless scenarios and networking architectures must be reevaluated, allowing established machine learning algorithms to work much better together.

The value of information privacy in a company has risen dramatically in recent years. A company might not want its information to be shared with other companies, especially its opponents. Companies are ready to spend billions to secure their data, indicating the seriousness of the issue. Using an IDS has been one of the ways to protect data. Put bluntly, an IDS, which can take the shape of a program or of hardware such as a router, detects unusual or unauthorized network activity. Anderson initially developed the IDS in 1980 [1] and Denning updated it in 1987 [2].

There are two basic Intrusion Detection approaches: detecting abnormalities and detecting abuse [3]. The premise behind abnormality detection is that attacker activity differs from ordinary user activity. The process requires scanning a system or network for unexpected or abnormal actions. During detection, the data from typical behavior is compared to the data from the real user. If the deviation is less than the predefined threshold, the user's conduct can be assumed to be legitimate and without any criminal intent [4]. One of the perks of this detection method is that it has high accuracy and can discover potential attacks.

Detecting abuse [5] (also referred to as signature identification) relies on a similarity measure. It compares the observed data with the identities in the signature database to identify an attacker; if the system recognizes the pattern in the signature database, the activity is recognized as an attack. This monitoring method has a high recognition accuracy and a low false alarm rate. The first strategy (detecting abnormalities) is utilized in this research. Additional methods such as Stochastic Modelling, computational immune-system techniques, and Machine Learning can be used to detect abnormalities. Moreover, the Machine Learning approach itself can use a variety of different algorithms. Four Machine Learning methods are examined in this study: NN, SVM, DT (J48), and FML.

II. LINKED PREVIOUS WORKS

There are many ways of addressing the Intrusion Detection challenge in the Remote Monitoring area of studies. The author of [6] suggested a hybrid mathematical approach combining data mining and a decision tree. The article claims that by using this method, the discrepancy of false positives and the inability to differentiate between attacks and false positives will be minimized.
In another work, the authors implemented the Network-dependent Incoming Packets abnormalities detector model in [7].
The model was created to detect abnormal network traffic packet activity using three different network and transport layer protocols: ICMP, TCP, and UDP. One of its secrets to good classification outcomes is the intelligent use of secondary attributes, which significantly aids the classification technique in producing good output guidelines in the IDS model.
Furthermore, the authors of [8] recommend a K-NN, K-type Segmentation (KTS) protocol for abnormalities detection. They also validated their hypothesis by improving the detection performance for risky attacks such as R2L and U2R.
Other authors investigated and suggested a replication focused on a c-fuzzy decision tree classifier for an Intrusion Detection framework in [9], with the goal of increasing detection efficiency. To build an appropriate IDS, the authors used a c-fuzzy decision tree to pick the best attribute of the data collection.
In [10], it was suggested that an Intrusion Identifier model be built using a data analysis method. According to the authors, the learned rules that correctly describe the behaviour of intrusions and relevant items can also be used for detecting attacks and abnormalities.
Finally, the authors of [11] explored how mining techniques can be used in intrusion detection. This data analysis technique works by acquiring attack-free (normal) training data and then using an optimization to group attacks from the data. It also employs association rules to store information about the existence of patterns in the data, which may help to increase recognition accuracy. To find attacks in audit results, they merged association rules and a classification technique.

III. DISCUSSED ALGORITHMS

Here we discuss the details of the four algorithms mentioned above.

A. NN

A neural network is a statistical or computational model that mimics the structure and behaviour of biological neural systems. It processes data using a connectionist approach and is made up of units with activation functions. Digital NNs are mathematical simulation techniques that are often used to model various input-output relationships in order to identify patterns in information [12]. Neural Networks are a methodology based on the same ideas as how the human brain works. A vast system of interconnected neurons connects all sensory and motor neurons in the human brain. Neurons in the brain, according to most researchers, operate by firing an electrical signal through the cortex to other neurons [13].

B. SVM

Support Vector Machines are among the most general and popular approaches for regression and classification tasks in machine learning [14]. The approach is given a series of training examples, each of which is labelled as belonging to one of two groups. The SVM algorithm then creates a model which can determine which of the two categories a specific observation belongs to.

C. DT(J48)

The Decision Tree algorithm is commonly used for classification tasks. In this process, a model is learned and built from the data set, so that when a new data item is assessed, it is classified appropriately. The Decision Tree algorithm is also used to detect intrusions: the algorithm learns and builds a model from the training samples and, depending on the model created, determines which attack category a future data item belongs to. One of the Decision Tree's advantages is that it can handle large data sets. This is crucial because computing networks have a lot of data flowing through them. It functions well in actual intrusion detection since the Decision Tree has the best detection efficiency and can efficiently create and analyse models. The generalization accuracy of the Decision Tree is another helpful feature of the Intrusion Detection model, because new attacks will keep appearing in the future, and these attacks can be identified thanks to the Decision Tree's ability to generalize.

D. FML

FML is accompanied by a realistic protocol that is focused on the characteristics of the upcoming 6G cell connection. It investigates various beams over time while taking qualitative knowledge into account [15]. The findings of this exploration are used to respond to grid dynamics, including the appearance of obstructions and shifts in traffic conditions. FML detects pinhole leaks by analysing the accumulated data obtained by each vehicle for each chosen beam. Furthermore, FML aims to find the right communication beam, regardless of path, while maintaining sufficient protection and avoiding attacks. The algorithm divides the background area into small groups with identical circumstances and considers the output of the various beams in each of these groups independently. In each of its discrete cycles, the process then enters either a discovery process or an extraction (optimization) process. It decides which process to enter depending on the circumstances of oncoming traffic and a control mechanism. During the discovery process, the algorithm chooses a random subset of beams; during the optimization process, the algorithm chooses the beams that performed best in past cycles. The algorithm obtains quality measurements of the beams by analyzing the volume of data obtained by vehicles in the system; as a result, it observes the behavior of individual beams in different circumstances across time.

IV. EXPERIMENT WITH ALGORITHMS

In this study, we test the efficiency of the NN, SVM, DT (J48), and FML algorithms against the KDD-cup database. As this dataset is so enormous, it is nearly impossible to test it on a regular computer; the full dataset demands the use of high-end computers. As a result, in this study, we break the data down into smaller chunks and run them on a regular computer. The assessment of the data began with the choice of 10,000 data points from the training dataset, with the percentage of normal data ranging from 10% to 90% and the remaining percentage devoted to the assault
data, which were uniformly distributed and selected. The two datasets were then combined into a single one. The percentage indicates that if the share of normal data is 10%, the remaining data (that is, the attack data) will be 90%. The remaining data is made up of 39 different assault variants that are evenly distributed.

The following is a description of how attacks and regular data behave:
• Accurate Favourable (AF): data identified as attack data is indeed attack data.
• Accurate Non-Favourable (AN): data identified as normal data is indeed normal data.
• Inaccurate Favourable (IF): regular data is mistakenly identified as attack data.
• Inaccurate Non-Favourable (IN): assault data is mistaken for regular data.

The fundamental goal of an IDS is to have high detection accuracy and a low false alarm rate.
An experiment was conducted to evaluate the efficiency of all four machine learning techniques. The KDD 99 database was used in the research (see Table I and Table II for more details).

TABLE I
TESTING DATA

Class  Class Name  No. of Examples  Percentage (%)
0      Normal      96345            20.6%
1      Probe       5432             0.96%
2      DoS         398427           80.22%
3      U2R         66               0.02%
4      R2L         1236             0.35%

TABLE II
TRAINING DATA

Class  Class Name  No. of Examples  Percentage (%)
0      Normal      60582            20.4%
1      Probe       6177             1.43%
2      DoS         276401           72.77%
3      U2R         44               0.03%
4      R2L         17623            5.21%

The rest of the data, namely the attack data, were then uniformly ordered and pooled into a single data set. During the research we used the Decision Tree C4.5 method, also known as J48 in WEKA. The results of the test are then compared with the results of the other three techniques: NN, SVM, and FML.
One of the goals of this study is to provide a reasonable comparison for identifying various types of attacks. Rather than using the entire KDD 99 database, 10,000 data points were used for each assault group. We used Weka 3.7 for this study. Weka works with the database in ARFF format. After installing Weka 3.7, we also had to set the Java Runtime Environment option -Xmx1600m in Weka so that 1600 MB of RAM could be allocated, which expands the memory available for machine learning in Weka. Aside from the options mentioned above, the default values for the Weka parameters of the decision tree J48 algorithm, such as reducedErrorPruning, confidenceFactor, and minNumObj, were used. We used 10-fold cross-validation as our test option during the experiment. Cross-validation is a method for assessing how well the results of a statistical model generalize to an independent data set. It operates by splitting the data into complementary subsets, training on one and validating on the other. In order to get the best outcome from the database, we used 10-fold cross-validation in our study.

Using 10-fold cross-validation, the data was uniformly subdivided into 10 subsamples. One subsample was held out for validation and testing, while the other nine were used for training. This procedure was then repeated ten times (folds), with each of the ten subsamples serving as validation data once. The results of the ten folds are then combined to create a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once.

V. SIMULATION RESULT

Figure 1 depicts the outcomes of the experiment, which were collected using the original class labels with a percentage of normal data ranging from 10% to 90%. We can see in Figure 1 that as the proportion of normal data grows, the precision of the four algorithms rises as well. This is because while the amount of normal data is low, the attack data rate is high. As a result, when the percentage of normal data is limited, all four algorithms struggle to distinguish all 39 assaults. In the experiment, the percentage of normal data increased from 10% to 90%, while the percentage of attack data decreased from 90% to 10%. As a consequence, the amount of accurately categorized data increases. With an overall precision of 99 percent, the Decision Tree algorithm surpassed the NN, SVM, and FML implementations.

Fig. 1. Precision rate performance on the KDD 99 database

Figure 2 depicts the selection rate for the percentage of
normal data, which ranges from low to high. As seen in Figure 2, as the proportion of normal data increases, the selection rate for these four algorithms decreases. This occurred because as the share of normal data increases (from 10% to 90%), the share of attacks decreases (from 90% to 10%). As a result, learning the attack data becomes harder because too little attack data is available to learn from.

Fig. 2. Selection rate performance on the KDD 99 database

As seen in Figure 3, the false alarm rate (FAR) declines as the proportion of normal data rises. This happens because the high percentage of normal data helps the algorithms learn what normal behaviour looks like. As a consequence, they can easily distinguish between normal and abnormal data, lowering the false alarm rate (FAR). The FAR results of the NN and DT algorithms are similar to one another, as seen in Figure 3: the DT algorithm has a slightly lower FAR (1.59%) than the NN algorithm (1.63%). A successful IDS ought to have a low false alarm rate (FAR), and among these four algorithms, SVM (0.94%) generated a very low FAR, and FML (0.80%) also generated a very low FAR.

Fig. 3. FAR performance on the KDD 99 database

The efficiency of the four machine learning techniques for each attack type is compared below.

Probe: In [12], from Table III, we can see that when the percentage of normal data is 10%, the DT (J48) algorithm produces the best results in terms of precision rate. Furthermore, the DT (J48) algorithm outperformed the NN, SVM, and FML algorithms with a mean of 97.7%. In general, all four algorithms are still effective at detecting Probe assaults.

DoS: In [12], from Table III, the precision of the NN, SVM, and FML algorithms was rather poor, with averages of 59.6%, 60.5% and 70.11% respectively, while the DT (J48) algorithm had a precision of 98.7%. This demonstrates that the NN, SVM, and FML algorithms had trouble spotting DoS attacks, while the Decision Tree (J48) algorithm performed well in this regard.

U2R: In [12], from Table III, the precision of the NN, SVM, and FML algorithms was nearly identical, at 64.4%, 65.5% and 67.55% respectively. This demonstrates that the NN, SVM, and FML algorithms are all capable of detecting U2R attacks. The DT (J48) algorithm, on the other hand, has a detection rate of 99.7% for this kind of attack.

R2L: The average precision of the NN, SVM, and FML algorithms was almost identical, at 14.8%, 14.9% and 15.1% respectively. R2L is the attack type on which the NN, SVM, and FML algorithms have the lowest detection efficiency, and the same holds for the DT (J48) algorithm. This demonstrates that all four algorithms have the most trouble spotting this type of attack. However, the DT (J48) algorithm outperformed the NN, SVM, and FML algorithms on this sort of attack, with an overall detection rate of 96.2%, which is much better than the other three algorithms.

The identification rate for four separate attacks (Probe, DoS, U2R, and R2L), which are the most frequent types of attacks, shows that DT (J48) outperforms the other methods listed here for all four kinds of attacks. The average performance of all four algorithms was measured to see which algorithm has the highest detection precision. To conclude, DT (J48) has shown outstanding performance on these four types of attack, outperforming the other three algorithms, with successful detection for Probe (97.7%), DoS (98.7%), U2R (99.7%) and R2L (96.2%).

VI. CONCLUSION AND FUTURE WORK

In this article, the selection rate, false alarm rate, and precision of four different Machine Learning algorithms for four different types of attacks were compared under different percentages of normal data. Using the KDD 99 database, the aim of this study is to find the best technique that can be used as a reference for remote monitoring analysis. In Intrusion Detection, the KDD 99 database was the latest standard database. However, since the database is unevenly distributed, using only one set of data could result in an error. As a result, the dataset was distributed equally in this study, and the training and testing databases were merged. The key reason for combining the databases is to ensure that all 42 attacks in both databases can be evaluated at the same time with varying percentages of normal data to obtain an overall picture. Among all the algorithms used here, DT (J48) outperforms the other algorithms for the 4 different attacks.
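
The three rates compared in this study can be derived from the four outcome counts of Section IV (AF, AN, IF, IN). The paper does not state the formulas, so the sketch below assumes the conventional definitions: precision rate = AF / (AF + IF), selection rate = AF / (AF + IN), and false alarm rate = IF / (IF + AN). The function names, the `Outcomes` container, and the "normal" label are illustrative choices, not part of the original experiment, and any attack label predicted for attack data is counted as AF (a binary attack-versus-normal view).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Outcomes:
    af: int   # Accurate Favourable: attack data flagged as attack
    an: int   # Accurate Non-Favourable: normal data identified as normal
    if_: int  # Inaccurate Favourable: normal data flagged as attack
    in_: int  # Inaccurate Non-Favourable: attack data mistaken for normal


def count_outcomes(actual: Sequence[str], predicted: Sequence[str],
                   normal_label: str = "normal") -> Outcomes:
    """Count AF, AN, IF and IN, treating every non-normal label as an attack."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    af = an = if_ = in_ = 0
    for truth, guess in zip(actual, predicted):
        is_attack = truth != normal_label
        flagged = guess != normal_label
        if is_attack and flagged:
            af += 1
        elif not is_attack and not flagged:
            an += 1
        elif not is_attack and flagged:
            if_ += 1
        else:
            in_ += 1
    return Outcomes(af, an, if_, in_)


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 instead of dividing by zero when a class is absent.
    return numerator / denominator if denominator else 0.0


def precision_rate(o: Outcomes) -> float:
    # Assumed definition: share of flagged records that are real attacks.
    return _ratio(o.af, o.af + o.if_)


def selection_rate(o: Outcomes) -> float:
    # Assumed definition: share of real attacks that were flagged.
    return _ratio(o.af, o.af + o.in_)


def false_alarm_rate(o: Outcomes) -> float:
    # Assumed definition: share of normal records wrongly flagged as attack.
    return _ratio(o.if_, o.if_ + o.an)
```

With these helpers, a classifier's predictions at each percentage of normal data (10% to 90%) could be passed to `count_outcomes` and the three rates plotted against that percentage, which is the kind of comparison shown in Figures 1 to 3.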
REFERENCES
