Intrusion Detection System Through Advance Machine Learning For The Internet of Things Networks
Network security is the main issue in Internet of Things (IoT) networks because most manufacturers do not focus on security standards during design. The performance of firewalls, intrusion prevention systems, and intrusion detection systems depends upon accuracy; thus, it is essential to enhance the detection rate and lessen false alarms. The objective of a firewall is to find violations of predefined rules and block risky incoming traffic. However, it is very difficult to differentiate between malicious and regular traffic because of advanced attack techniques. To address these issues, a two-stage hybrid method is proposed. First, a genetic algorithm (GA) is applied to select appropriate features and improve the accuracy of the proposed framework. Next, well-known machine learning (ML) algorithms, including the support vector machine (SVM), ensemble classifier, and decision tree, are employed. The achieved accuracy is 99.8% through 10-fold cross-validation using the multiclass NSL-KDD database.
Intrusion is a process through which intruders enter a network system without authorization to modify, steal, or encrypt important or confidential information. It could also be used to damage the hardware system in a short time. Moreover, intrusion causes huge financial losses, and thus, companies are using different intelligent tools to enhance security against intrusion.1 In a global network, more security issues may arise due to Internet of Things (IoT) enabled devices, as online systems are more attractive to intruders. For example, microwaves, blenders, wearable devices, clothing, cognitive buildings, etc., allow open access to the internet.2 Therefore, extensive research in IoT security is the need of the day. This requires intelligent systems to ensure reliability and to protect network infrastructure against intrusion. Therefore, computer and network security are essential and fundamental for organizations and individuals, as well as to protect against the causes of damage. To avoid such situations, advanced machine learning (ML) techniques have been employed to develop an intrusion detection system (IDS) for IoT networks. Therefore, developing an efficient IDS is vital to defend network systems. The proposed model uses a genetic algorithm (GA) to select appropriate features, together with advanced ML algorithms.

The rest of the article is organized as follows: Related work on recent IDS techniques and cyber-attacks in IoT environments is discussed in “Related Work.” The proposed IDS model is described in “Proposed IDS for IoT Based on Machine Learning Technique.” The experimental results are exhibited in “Performance Measures, Results, Discussions,” and “Conclusion and Future Works” concludes the research.

1520-9202 © 2021 IEEE
Digital Object Identifier 10.1109/MITP.2020.2992710
Date of current version 31 March 2021.
RELATED WORK
Intrusion is a severe security problem because a single occurrence of intrusion might be enough to delete or steal data from a computer. In the past few years, IDS has become essential to handle security breaches and to resolve the related concerns.3 ML techniques for security challenges in IoT environments have not received much attention. However, a large number of research works are presently being presented in the area of intrusion detection. Different approaches concentrate on enhancing the system’s ability to separate network traffic into normal and abnormal packets.4
Horng et al.5 proposed a support vector machine (SVM) based IDS that uses a hierarchical clustering algorithm to preprocess the data, and claimed high accuracy. Eduardo et al.6 developed a classification model to find network anomalies using statistical techniques and self-organizing maps. Initially, principal component analysis is used for feature selection. Network traffic is then classified as normal or malicious through Probabilistic Self-Organizing Maps. Ravale et al.7 proposed a hybrid method that utilizes a mix of data mining techniques: K-means clustering is used for feature selection and SVM is utilized for classification. Guo et al.8 suggested a hybrid learning approach termed distance sum-based SVM (DSSVM) to demonstrate an efficient IDS.

Bostani et al.9 described that wireless sensor networks and the internet expose the IoT environment to several attacks. The results showed that the proposed system could achieve a false positive rate and a true positive rate of 5.92% and 76.19%, respectively. Yang et al.10 presented a study declaring that IoT networks work under limited security guarantees. For this purpose, an anomaly-based detection system was developed to secure the environment from false data injection attacks. Several ML classifiers have been incorporated to enhance the functionality of IDS, for example, K-nearest neighbor (K-NN),11 SVM,4 ANN,12 decision tree (DT),13 naive Bayes network,14 etc. However, detection performance depends not only on the employed techniques but also on the features of the given data.4 Indeed, the results of an IDS depend on the quality of the data. Still, regarding the security concerns and the hurdle of intrusions in the IoT environment, we observed that its framework is not standardized so far, but organizations such as ITU and IEEE are working on it.

Cyber-Attacks in IoT Environments
Secure routing is a difficulty when designing intrusion detection (ID) measures in IoT networks. Several reviews15 have disclosed that IoT networks are insecure due to different attacks. Among them, a few are lack of authorization/authentication mechanisms, insecure web interfaces, and lack of transport encryption, which make it challenging to apply security measures in the IoT network. Some of the most widespread attacks that make IoT networks insecure are explained in the following.

Denial of Service (DoS) Attack: In such attacks, machines are made inaccessible to authentic users. When different devices become a part of this attack, it is termed distributed denial of service. It disturbs network and computation capacity, etc.
Sybil Attack: During this attack, an attacker generates multiple fake identities to control a peer network.
Sinkhole Attack: This attack attracts all network traffic from adjacent nodes by advertising fake routing tables. It badly affects IoT networks, and such attacks are hard to detect since RPL is utilized as the routing protocol.
Jamming: During this attack, attackers transmit fake signals to disturb communication in IoT networks.

PROPOSED IDS FOR IoT BASED ON MACHINE LEARNING TECHNIQUE
Figure 1 shows the proposed IDS, which is based on machine learning techniques for IoT networks. The proposed model includes feature selection and training through different classifiers.

FIGURE 1. Framework of the proposed model.

Database Description
The performance analysis of the proposed model is carried out using a dataset called NSL-KDD.16 Each record of the NSL-KDD database has 41 features. The 42nd attribute comprises the information regarding five different classes (1 normal class and 4 attack classes). The attack classes are categorized as DoS
attack, remote to local (R2L) access, probing attack (Probe), and user to root (U2R) access.

Genetic Algorithm as Feature Selection
In the proposed model, the system is trained through NSL-KDDTrain+20%.17 This dataset includes 25192 cases, of which 13449 are considered normal data and 11743 are attack data. First, the attack type is transformed into numeric categories before feature selection: 1 is used for DoS attacks, and 2, 3, 4, and 5 are given to Probe, R2L, U2R, and normal data, respectively. Then, a genetic algorithm (GA) is used for feature selection.18

Algorithm: IDS; Take input, perform preprocessing and feature selection, classification
procedure data_processing (selection);
Input = NSL_KDD (41 features);
GA = fselection (Input);
return v1, v2, v3, v5, v6, v17, v29, v30, v33, v34;
procedure classification [(v1, v2, v3, v5, v6, v17, v29, v30, v33, v34), trueclass];
Model = training [(v1, v2, v3, v5, v6, v17, v29, v30, v33, v34), trueclass];
Predictedclass = testing [Model, (v1, v2, v3, v5, v6, v17, v29, v30, v33, v34)];
Accuracy = Confusionmat (Predictedclass, trueclass);
return Predictedclass, Accuracy.

GA is a search heuristic employed to solve optimization problems. It is an evolutionary algorithm inspired by natural evolution. GA replicates the procedure of natural selection, where the fittest individuals are nominated for reproduction to produce children for the next generation. Currently, GA has attracted huge interest in the feature selection process because it does not apply any formula in the calculation. The process of GA is explained as follows.

Initially, a random population of chromosomes is produced. Then, the fitness function of each chromosome is evaluated. Usually, GA comprises three key operations: parent selection, crossover, and mutation. For the selection of parents, the roulette wheel method is utilized. This method picks two parents based on a probability calculated from the fitness value, so a chromosome with better fitness has a higher probability of being selected as a parent. Once crossover is accomplished between the two nominated parents, mutation is applied based on the mutation rate (MR). The MR specifies the probability that a feature bit flips from 0 to 1 or from 1 to 0. The fitness of the newly created population is assessed in the next step. Afterward, the newly produced population is merged into the current population and sorted according to fitness value. Finally, the best chromosomes are chosen for the next generation, whereas the remainder is removed. The process is terminated after the maximum number of iterations is reached. The N (= 10) best chromosome is nominated as the feature subset, i.e., v1, v2, v3, v5, v6, v17, v29, v30, v33, v34, through GA, which is to be used for classification.

Classification
The selection of classifiers is a significant step for IDS. In the proposed model, supervised ML techniques have been employed to evaluate the behaviors of IoT devices in malware detection. The selected features are evaluated by applying Decision Tree (DT), SVM, and Ensemble classifiers to mark traffic as malicious or normal. In the proposed work, 10-fold cross-validation is chosen for the classification to prevent overfitting.19

Support Vector Machine
The SVM works on the principle of statistical learning theory, and it attempts to find the best hyperplane to minimize the classification error in multidimensional space. The hyperplane can be formulated as in (1)

$$ V \cdot x + b = 0 \tag{1} $$

where $x$ indicates the n-dimensional input vector, $b$ is the bias of the model, and $V$ is termed the weight vector, described as $V_1, V_2, \ldots, V_n$.

SVM uses a “kernel function” to transform the input data so that classes that are not linearly separable become linearly separable. The kernel function is expressed as

$$ K(x_u, x_v) = \varphi(x_u)^T \varphi(x_v). \tag{2} $$

With the kernel trick, the calculation of the separating hyperplane does not need information of $\varphi$, but only of $K$. Mostly four kernel functions are utilized in SVM: Linear, Gaussian, Quadratic, and Cubic. However, we choose the Quadratic kernel function because it is computationally less intensive. The Quadratic function is presented as

$$ K(x_u, x_v) = 1 - \frac{\lVert x_u - x_v \rVert^2}{\lVert x_u - x_v \rVert^2 + C}. \tag{3} $$

Decision Tree
It categorizes the cases by sorting them based on values using a top-down methodology. Each node of decision tree denotes instances to be classified, and
each branch denotes an attribute of that case. A case is classified by starting at the root node, testing the individual value at this node, and then moving down the tree branch that corresponds to the value supplied. The same process is then repeated for the subtree rooted at the new node. The variable employed in a split is chosen so as to maximize the reduction in impurity. We employed the Gini diversity index (GDI) as the splitting criterion to select the best variable, which is stated as

$$ u(t) = \sum_{v \neq u} P(v \mid t)\, P(u \mid t). \tag{4} $$

[...] (6). A true positive (TP) occurs when the model recognizes a malicious attack as malicious, whereas a true negative (TN) arises when a normal packet is recognized as normal. On the other side, a false positive (FP) occurs when traffic is normal and it is recognized as malicious, whereas a false negative (FN) arises when a particular attack is malicious and is recognized as normal

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{6} $$
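As a minimal sketch of how the accuracy in (6) and the per-class TP rate could be computed from predicted and true labels, the following Python fragment may help. It is not the authors' implementation: the function names and the toy label lists are hypothetical, and the multiclass accuracy is taken here as the confusion-matrix diagonal divided by the total count, which reduces to (6) when there are only two classes (malicious and normal).

```python
from typing import Dict, List, Sequence


def binary_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy as written in (6): (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     classes: Sequence[int]) -> List[List[int]]:
    """Count matrix with true classes as rows and predicted classes as columns."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def true_positive_rates(matrix: List[List[int]],
                        classes: Sequence[int]) -> Dict[int, float]:
    """Per-class TP rate, TP / (TP + FN), computed row by row."""
    rates = {}
    for i, label in enumerate(classes):
        row_total = sum(matrix[i])
        rates[label] = matrix[i][i] / row_total if row_total else 0.0
    return rates


# Toy labels (made up) using the article's encoding:
# 1 = DoS, 2 = Probe, 3 = R2L, 4 = U2R, 5 = normal.
CLASSES = [1, 2, 3, 4, 5]
y_true = [1, 1, 2, 3, 4, 4, 5, 5, 5, 5]
y_pred = [1, 1, 2, 3, 5, 4, 5, 5, 5, 1]

cm = confusion_matrix(y_true, y_pred, CLASSES)
print(accuracy(cm))                      # 0.8
print(true_positive_rates(cm, CLASSES))  # class 4 (U2R) has a TP rate of 0.5
```

Reporting the per-class TP rate next to the overall accuracy is useful on imbalanced data such as NSL-KDD, where a rare class can be detected poorly while the overall accuracy stays high.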
[...] results, whereas the TP rates of the R2L and U2R attacks are 90% and 27%, respectively.

CONCLUSION AND FUTURE WORKS
A secure network is highly significant for an organization. Network security must be considered during the design phase of IoT devices. IDS using advanced ML is most important for IoT networks. GA with SVM, decision tree, and ensemble classifiers is utilized for IDS. A high classification accuracy has been observed on the NSL-KDD dataset through an ensemble classifier using the 10-fold cross-validation technique. Future research work may focus on detecting anomalies in IoT and IoMT (Internet of Medical Things) networks using hybrid feature selection, such as PSO-GA, and multiclass classification methods.

ACKNOWLEDGMENTS
This work was supported by the Artificial Intelligence and Data Analytics Lab (AIDA), Prince Sultan University, Riyadh, Saudi Arabia.
AMJAD REHMAN is a Senior Researcher with the AIDA Lab, Prince Sultan University, Riyadh, Saudi Arabia. His research interests include data mining, health informatics, and pattern recognition. He received the Ph.D. degree from the Faculty of Computing Universiti Teknologi Malaysia with a specialization in Forensic Documents Analysis and Security in 2010. He is the corresponding author of this article. Contact him at rkamjad@gmail.com.

ZAHID MEHMOOD is currently with the Department of Computer Engineering, University of Engineering and Technology (UET), Taxila, Pakistan, where he received the Ph.D. degree in computer engineering in 2016. He received the MS degree in electronic engineering in 2012 with specialization in signal and image processing from International Islamic University, Islamabad, Pakistan, and the BS degree in computer engineering (Hons) in 2009 from the COMSATS University of Sciences and Technology, Wah Campus, Pakistan. He is a team-lead of the Forensic Analysis, Machine Learning, and Information Retrieval (FAMLIR) research group. Contact him at zahid.mehmood@uettaxila.edu.pk.

QAISAR JAVAID is currently the Working Chairman of the Department of Computer Science and Software Engineering, International Islamic University, Islamabad, Pakistan. He is consistently doing research in the areas of computer networks, information security, cloud computing, and IoT. He is also heading the CISCO Networking Academy of International Islamic University, Islamabad, Pakistan. Contact him at qaisar@iiu.edu.pk.