
Received: 30 March 2020 Revised: 19 April 2020 Accepted: 6 May 2020

DOI: 10.1002/spy2.112

RESEARCH ARTICLE

Model based IoT security framework using multiclass adaptive boosting with SMOTE

Pandit Byomakesha Dash1, Janmenjoy Nayak2, Bighnaraj Naik1, Etuari Oram1, S.K. Hafizul Islam3

1 Computer Application, Veer Surendra Sai University of Technology, Burla, India
2 Computer Science and Engineering, Aditya Institute of Technology and Management (AITAM), Tekkali, India
3 Computer Science and Engineering, Indian Institute of Information Technology Kalyani, Kalyani, India

Correspondence
Janmenjoy Nayak, Computer Science and Engineering, Aditya Institute of Technology and Management, Tekkali, K Kotturu, Andhra Pradesh 532201, India.
Email: mailforjnayak@gmail.com

Funding information
Science and Engineering Research Board, Grant/Award Number: EEQ/2017/000355

Abstract
Security threats are growing immensely due to the higher usage of internet of things (IoT) applications in all aspects of life. Due to the imbalanced nature of IoT security data, designing model-based anomaly detection for IoT networks poses a challenge, as most machine learning models assume an equal number of samples for each class. Approximately 2.79% of IoT network profiles are of anomaly types, which imposes a severe imbalance in which there are about three samples of the anomaly types for every hundred samples of the majority normal class. This results in poor predictive performance for identification of the anomaly types, which is a problem because the anomaly types are more sensitive than the normal activity type. This work proposes a multiclass adaptive boosting ensemble learning-based model with the synthetic minority oversampling technique (SMOTE) for prediction of anomalies in IoT networks. The proposed approaches are simulated with DS2OS data and the performance is compared with other machine learning approaches. Evaluation metrics such as sensitivity, F1-score, and receiver operating characteristic-AUC indicate that the proposed approach handles the imbalanced nature of the data efficiently and identifies both the anomaly types and normal activity.

KEYWORDS
adaptive boosting, anomaly detection, ensemble learning, IoT security, SMOTE

1 INTRODUCTION

In recent years, the internet has become one of the most significant parts of people's lives. More than 2 billion people throughout the world use the internet to share huge amounts of data, send and receive mail, use social networking applications, and much more. As internet use increases every day, another immense area is emerging that uses the internet as a worldwide platform for letting smart objects and machines coordinate, calculate, and communicate: the internet of things (IoT).1 IoT can transmit data over a network without human-to-computer or human-to-human interaction. It is a system of interconnected devices, objects, people, and mechanical and digital machines that are provided with unique identifiers. IoT relies on three building blocks: identification, interaction, and communication. The major challenges of IoT are device heterogeneity, embedded security and privacy-preserving mechanisms, ubiquitous data exchange through wireless technologies, scalability, localization and tracking capabilities,

Security Privacy. 2020;e112. wileyonlinelibrary.com/journal/spy2 © 2020 John Wiley & Sons, Ltd. 1 of 15
https://doi.org/10.1002/spy2.112
2 of 15 DASH et al.

semantic interoperability and data management, self-organization capabilities, and energy-optimized solutions. The motivation behind IoT is to optimize the usage of public resources,2 to build smart cities, raise the quality of services, and reduce the operational costs of the services. IoT is one of the fastest growing areas in computing history and plays a vital role in improving smart applications in real life. The pervasive and crosscutting nature of IoT, with several components involved in the deployment of such systems, has introduced novel security challenges.
IoT systems are complex and integrate many components. Their devices are mainly deployed in unattended environments and are generally connected through wireless networks, where an intruder may obtain confidential data from a communication channel by eavesdropping. Retaining security in an IoT system is therefore a challenging task. Security is a major factor in IoT applications and services, and IoT security, the practice of securing connected IoT devices and networks, is a current research area that is drawing increasing attention in academic, governmental, and industrial research. Researchers encounter various challenges, mainly in security-related fields, in offering a large number of reliable services.3 Connecting everything in a smart environment requires a mechanism that ensures security and privacy as a priority. This entails studies on the security requirements, threat models, and challenges of securing IoT devices.4,5 Furthermore, the privacy policies adopted for consumer data collection practices are also an essential component of consumer privacy6 and security.7 The data created by IoT devices are huge, and because of this, conventional methods of data collection, storage, and processing may not work properly. At the same time, such a large amount of data can be utilized for patterns, behaviors, estimation, and prediction. Moreover, the variety of data produced by IoT poses another challenge for existing data processing mechanisms. So, to harness the value of IoT generated information, novel mechanisms are required. Machine learning (ML) is considered one of the most appropriate computational paradigms for providing embedded intelligence in IoT devices.8 ML refers to intelligent techniques used to optimize performance criteria using example data or past experience. More precisely, ML algorithms construct models of behavior using mathematical methods on massive datasets. ML can assist smart devices and machines in gathering useful information from machine or human-produced data. ML techniques can be used in several tasks such as density estimation, classification, and regression. ML is a powerful tool of data exploration for learning normal and abnormal behavior from how IoT components and devices interact with each other in the IoT environment. Moreover, ML methods can be significant in predicting new attacks, which are often variations of prior attacks, as they can intelligently predict future unknown attacks by learning from existing examples.
Due to the imbalanced nature of IoT security data, designing model-based anomaly detection for IoT networks poses a challenge, as most machine learning models assume an equal number of samples for each class, which results in poor predictive performance for identification of the anomaly types. This is a problem because the anomaly types are more sensitive than the normal activity type. To address these issues, this work proposes ensemble learning-based methods with the Synthetic Minority Over-sampling Technique (SMOTE) for the prediction of anomalies in the IoT network. In particular, a multiclass adaptive boosting based model is proposed. The remaining sections are organized as follows: Section 2 reviews the literature on various ML-based methods in IoT applications with their limitations. Section 3 describes the proposed methodology. Section 4 discusses the experimental setup with parameter settings and the result analysis. Section 5 concludes the work with some future directions.

2 LITERATURE STUDY

Pahl and Aubet9 developed a novel strategy for the detection of anomalies in IoT microservices. The K-means and BIRCH clustering algorithms were considered for the study, and classification accuracy was used as the performance metric for evaluating the proposed methodology. A high detection rate and an accuracy of 96.3% were reported. To defend against the severe wireless sensor network attack named ON-OFF, Liu et al10 proposed a trust joint light probe-based defense methodology in 2018. Identification accuracy was considered as the evaluation metric for the study. The performance analysis showed an identification accuracy above 0.80 with a low error rate. Diro and Chilamkurti11 proposed a method for the detection of distributed attacks with the help of neural networks (NNs). NSL-KDD was used as the dataset for the performance study and accuracy as the evaluation metric; the proposed methodology yields higher attack detection rates with an accuracy of 98.27%. Anthi et al12 developed an intrusion detection system for IoT. An ML approach named Naive Bayes (NB) has been

TABLE 1 Other ML approaches for IoT security

| S.No. | ML method used | Dataset used | Performance metric | Type of classification | References |
|---|---|---|---|---|---|
| 1 | NB (Naive Bayes) | Own | Precision; Recall; F-score | Multiclass | 18 |
| 2 | k-NN, SVM, MLP, NB, RF | KDD Cup 99 | Accuracy; Precision; Recall | - | 19 |
| 3 | RF | UNSW-NB15 | False positive rate; Classification accuracy | Multi | 20 |
| 4 | LR, RF, NN | CRP | Classification rate; Cloning rate | - | 21 |
| 5 | SVM | Generic | Classification accuracy | Binary | 22 |
| 6 | DNN (deep neural network) | NSL-KDD | Accuracy; Recall; Precision | - | 23 |
| 7 | SVM | Own | Accuracy rate | Multi | 24 |
| 8 | SVM, DT | RDD (resilient distributed dataset) | Classification accuracy | Multiclass | 25 |
| 9 | RF, k-NN, Gaussian NB | Network traffic | Accuracy; Precision; Recall; F1-score | Multi | 26 |
| 10 | k-NN, SVM, DT | Honey pot | Detection rate | Binary | 27 |
| 11 | Graph based clustering algorithm | Honey pot | Accuracy rate | - | 28 |
| 12 | RF | KDD Cup 1999 | Classification time; Accuracy rate; Learning time | - | 29 |
| 13 | SVM | KDD | Detection accuracy | Two-stage | 30 |
| 14 | MLP, RNN (recurrent neural network), alternate DT | UNSW-NB15 | Detection rate | - | 31 |
| 15 | RF, AdaBoost, etc. | CIDDS-001, UNSWNB15, and NSL-KDD | Accuracy; Specificity; Sensitivity | Binary | 32 |
| 16 | GD (gradient descent), SVM | Probe packet | Accuracy rate; False detection | - | 33 |
| 17 | ANN | UNSW-15 | Accuracy; Detection rate | Binary | 34 |
| 18 | DT | Real time multivariate | Accuracy; F1-score, etc. | Binary | 35 |
| 19 | MLP (multilayer perceptron) | CRP (challenge response pairs) | Prediction accuracy | - | 36 |
| 20 | RF, k-NN (k-nearest neighbor) | Own | Spoofing detection accuracy; Computation complexity, etc. | - | 37 |
| 21 | ANN (artificial neural network) | Own synthetic | Accuracy rate; False detection | - | 38 |
| 22 | k-NN, LSVM (support vector machine with linear kernel), DT (decision tree), RF, NN | DoS attack traffic | Classification accuracy | Binary | 39 |
| 23 | k-NN, SVM | Network traffic | Accuracy; Recall; Precision, etc. | - | 40 |
| 24 | ANN | Own complex | Accuracy; MCC (Matthews correlation coefficient); Sensitivity | - | 41 |
| 25 | SVM | Synthetic | Accuracy rate | - | 42 |
| 26 | SVM, k-NN | Network traffic | Accuracy rate | Binary | 43 |
| 27 | SVM, DT | NSL-KDD | Classification accuracy | - | 44 |
| 28 | Dimension reduction algorithm, k-NN, Softmax regression | KDD Cup 99 | Detection rate; Accuracy; False alarm rate | Multiclass | 45 |
| 29 | RF (random forest) | Own | Detection speed; Classification accuracy | Multiclass | 46 |
| 30 | C4.5 DT | KDD Cup 1999 | Detection rate | - | 47 |
| 31 | K-means, DT | Own | Detection rate | - | 48 |
| 32 | ANN | Own | Prediction rate | - | 49 |
| 33 | ANN | IoT network TRAFFIC | Accuracy rate | - | 50 |
| 34 | SVM | Traffic | Convergence | - | 51 |
| 35 | LSVM | Device | Accuracy; Precision | - | 52 |

employed in the proposed methodology. The authors claimed that the method acts as a tool for the identification of network scanning probing and denial of service (DoS) attacks. Kozik et al13 developed a classification approach for attack detection in edge computing environments (ECE). A benchmark ML algorithm named extreme learning machine was used for the study, and higher detection accuracy was found for detecting attacks in ECE with the proposed technique. To cope with security concerns while developing embedded IoT technologies, a method based on digital watermarks was proposed by Usmonov et al.14 The problem of data preservation in IoT was addressed with the developed method. Pajouh et al15 developed a novel strategy for anomaly-based intrusion detection (ABID) in IoT backbone networks (BBNs) with the help of a two-tier classification algorithm using NB and k-nearest neighbor. NSL-KDD was considered as the dataset for the evaluation of the study. Identification rate was used as the performance factor, and an accuracy of 84.82 was found for the identification of anomalies in IoT BBNs. D'angelo et al16 proposed a novel technique for network anomaly detection. An ML algorithm named U-BRAIN was employed on the NSL-KDD dataset, with accuracy rate as the performance factor. An accuracy of 94.1 on testing data and 97.4 on real-world data was found with the proposed technique. Aggarwal et al17 developed a novel strategy for data stream clustering with the help of clustering algorithms such as BIRCH and K-means. Standard deviation was considered as a performance metric, and a higher accuracy rate of 96.3% for the clustering of microservice data streams was found with the proposed technique. Moreover, many other studies have addressed the security of IoT with the help of ML approaches; some of them are listed in Table 1.

FIGURE 1 Class distribution in A, original dataset and B, oversampled dataset using synthetic minority over-sampling technique

3 PROPOSED METHODOLOGY

The proposed work has two phases: (a) obtaining a balanced corpus of IoT profiles from the original imbalanced data9 by using SMOTE and (b) designing a multiclass adaptive boosting based model for the prediction of anomalies in the IoT network. In this work, we have used an IoT security dataset from Kaggle53 for the model evaluation. This dataset was produced in a virtual environment through the distributed smart space orchestration system (DS2OS) and is composed of communication information among various IoT nodes in the application layer. There are a total of 357 952 samples, each having 13 attributes. The dataset covers eight classes: normal activity and seven types of anomalies, namely “anomalous (dataProbing)” (dP), “anomalous (DoSattack)” (DoS), “anomalous (malitiousControl)” (mC), “anomalous (malitiousOperation)” (mO), “anomalous (scan)” (scan), “anomalous (spying)” (spying), and “anomalous (wrongSetUp)” (wSU). The percentage distributions of the anomalies dP, DoS, mC, mO, scan, spying, and wSU are 3.41%, 57.70%, 8.87%, 8.03%, 15.44%, 5.31%, and 1.21%, respectively. When implementing a machine learning algorithm, the structure of the data, particularly the balance between the numbers of observations for each potential target, has a significant influence on the model's prediction performance.54 Due to the imbalanced nature of the IoT anomalies data,9 we have used SMOTE55 to achieve a balance between the underrepresented classes and the majority class. Algorithms 1 and 2 have been used to transform the original imbalanced IoT dataset9 into a balanced corpus of IoT profiles. Figure 1A,B represents the distribution of samples in the original data and the transformed corpus of IoT profiles, respectively.

Algorithm 1. Oversampling using SMOTE

[N: Amount of SMOTE in %, M: Number of minority class samples, k: Number of nearest neighbours, num_attrs: Number of attributes, Sample_Minority: Two dimensional array representing original minority class samples, Sample_Synthetic: Two dimensional array representing synthetic samples, nn_index: One dimensional array to store indices]

1. Set N, M and k
2. If (N < 100)
       M = (N/100) × M
       N = 100
   End_If
3. N = round (N/100)
4. new_index = 0
5. For i = 1 to M
       Find out k nearest neighbours for ith minority sample.
       Store the indices in the nn_index
       Generate_Sample (N, i, nn_index)
   End_For
6. End_Algorithm1

Algorithm 2. Generate_Sample (N, i, nn_index)

1. While (N != 0)
       nn = round (rand (1, k)), that is, choose one random value of nn between 1 and k.
       For j = 1 to num_attrs
           d = Sample_Minority[nn_index[nn]][j] − Sample_Minority[i][j]
           m = rand (0, 1)
           Sample_Synthetic[new_index][j] = Sample_Minority[i][j] + m × d
       End_For
       new_index = new_index + 1
       N = N − 1
2. End_While
3. End_Algorithm2
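The oversampling procedure of Algorithms 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `smote`, the `seed` parameter, the Euclidean distance, and the toy data are all hypothetical, and neighbours are searched over the full minority set.

```python
import math
import random


def smote(minority, n_percent, k, seed=0):
    """Sketch of Algorithms 1 and 2: return a list of synthetic minority samples.

    minority  : list of feature vectors (lists of floats) of one minority class
    n_percent : amount of SMOTE in percent (N in Algorithm 1)
    k         : number of nearest neighbours
    seed      : hypothetical parameter, only used to make the sketch repeatable
    """
    if len(minority) < 2:
        raise ValueError("at least two minority samples are needed")
    rng = random.Random(seed)
    chosen = list(range(len(minority)))
    if n_percent < 100:
        # Step 2: use only a random fraction of the minority samples.
        rng.shuffle(chosen)
        chosen = chosen[: int(n_percent / 100 * len(minority))]
        n_percent = 100
    per_sample = round(n_percent / 100)  # Step 3
    k = min(k, len(minority) - 1)

    synthetic = []
    for i in chosen:  # Step 5
        x = minority[i]
        # k nearest neighbours of sample i (Euclidean distance), excluding itself.
        nn_index = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        for _ in range(per_sample):  # Algorithm 2
            neighbour = minority[rng.choice(nn_index)]
            # Each attribute moves a random fraction of the way to the neighbour.
            synthetic.append(
                [xj + rng.random() * (nj - xj) for xj, nj in zip(x, neighbour)]
            )
    return synthetic


# Toy data (hypothetical): four 2-D minority samples, 200% oversampling.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_percent=200, k=2)
print(len(new_points))  # two synthetic samples per original sample
```

Because every synthetic attribute is an interpolation between a minority sample and one of its neighbours, the new points stay inside the region already occupied by the minority class.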

Let $I = \{I_1, I_2, \ldots, I_n\}$ be the corpus of IoT network activity profiles obtained through SMOTE, where $I_i$ denotes the ith IoT network activity profile.

$$I_i = \{I_{i,1}, I_{i,2}, \ldots, I_{i,m}, a_i\} \tag{1}$$

In Equation (1), m is the number of features of a communication profile in the dataset and $a_i \in a$, $a = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$, where $a_1$, $a_2$, $a_3$, $a_4$, $a_5$, $a_6$, $a_7$, and $a_8$ symbolize dP, DoS, mC, mO, scan, spying, wSU, and Normal, respectively. Here, the data are preprocessed through label encoding. The proposed multiclass adaptive boosting (AdaBoost)56 model makes use of a decision tree (DT) as the base classifier for prediction of the anomaly type $a_i$. In this work, AdaBoost has been used to boost the performance of DTs for multiclass classification problems. The proposed ensemble model predicts the anomaly type from N DTs constructed from weighted instances (IoT profiles) in the training data. DTs are added sequentially and trained by using weighted instances in the training data, and the prediction error is observed. This process continues until there is no further improvement in prediction performance or the required number of DTs (ie, N) has been created. The final anomaly prediction is obtained by computing the weighted average of the predictions of the resultant pool of DTs. The working schema of the proposed model and its step-by-step computation are presented in Algorithm 3. In Algorithms 1 and 2, the value of k has been set to 8, that is, 8 nearest neighbors of the ith minority sample are selected while generating samples in SMOTE. This is because the considered IoT traffic traces dataset contains 8 classes. In Algorithm 3, we have set the number of DTs constructed from weighted instances (IoT profiles) in the training data to N = 8 through experiment. It can be seen from Figure 5 that there is no change in error from N = 8 onwards.

Algorithm 3. Multiclass adaptive boosting based model for anomaly prediction

1. Initialize the weights (Equation (2)) of each $I_i \in I$.

$$W_i^0 = \frac{1}{n} \tag{2}$$

2. For t = 0 to N

(i) Add decision tree $DT^t(I)$ sequentially, splitting along features by using information gain computation (Equation (3)) with the Gini index (Equation (4)).

$$\mathrm{InfoGain}(F_I, f_I^j) = \mathrm{InfoMeasure}(F_I') - \frac{F_I^L}{F_I}\,\mathrm{InfoMeasure}(F_I^L) - \frac{F_I^R}{F_I}\,\mathrm{InfoMeasure}(F_I^R) \tag{3}$$

$$\mathrm{InfoMeasure}_{gini}(I[F_I^S]) = 1 - \sum_{a_i \in a} P(a_i \mid I), \quad S \in \{L, R\} \tag{4}$$

In Equations (3) and (4), $f_I^j \in F_I$, $F_I = \{f_I^1, f_I^2, \ldots, f_I^m\}$, is the selected feature for splitting of I, where $F_I$ is the feature vector of I. $F_I^L$ and $F_I^R$ are the features at the left and right sub-trees of $DT^t$.

(ii) Predict the anomalies (Equation (5)) from the trained model $DT^t(I)$.

$$a' = DT^t(I) \tag{5}$$

In Equation (5), $a'$ is the vector of anomaly predictions and $DT^t(I)$ is the tth decision tree.

(iii) Select the model with the least weighted prediction error (Equation (6)):

$$e^t = \mathrm{Error}\left(W^t\,[\mathbf{1}_{a'_i \neq a_i}]_{i=1}^{n}\right) \tag{6}$$

In Equation (6), $e^t$ is the vector of weighted anomaly prediction error and $W^t$ is the tth weight vector.

(iv) Compute the weight parameter of the tth model (Equation (7)):

$$\delta^t = \frac{1}{2} \times \ln\left(\frac{1 - e^t}{e^t}\right) \tag{7}$$

In Equation (7), $\delta^t$ is the weight parameter of the tth model.

(v) Update the weight (Equation (8)) of each IoT profile $I_i$:

$$W_{I_i}^{t+1} = \frac{W^t(I_{i,1}, I_{i,2}, \ldots, I_{i,m}, a_i)\, e^{(-\delta^t \times a_i \times DT^t(I_i))}}{\theta} \tag{8}$$

In Equation (8), $W_{I_i}^{t+1}$ is the (t + 1)th weight of $I_i$ and $\theta$ is the normalization factor such that $\sum_{i=1}^{n} W_i^t = 1$.

(vi) If ($e^t - e^{t+1} < \lambda$, where $\lambda$ is the threshold) then Break
     Else Continue
   End_For

3. Return the final prediction (Equation (9)):

$$\Psi_{AdaBoost}(I) = \sigma\left(\sum_{t=1}^{N} \delta^t\, DT^t(I)\right) \tag{9}$$

In Equation (9), $\Psi_{AdaBoost}(I)$ is the final prediction on I and $\sigma(\cdot)$ is the sigmoid activation.
End_Algorithm3
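The weight initialization, error, model weight, and weight update of Equations (2) and (6) to (9) can be traced on a toy problem. The sketch below is an illustration under simplifying assumptions, not the proposed multiclass model: it uses binary labels in {−1, +1}, one-dimensional threshold stumps instead of Gini-based decision trees, a fixed number of rounds without the early-stopping test of step (vi), and hypothetical helper names (`best_stump`, `adaboost`, `predict`).

```python
import math


def best_stump(xs, ys, weights):
    """Return (error, threshold, polarity) of the stump with least weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x <= threshold else -polarity for x in xs]
            # Weighted error: total weight of the misclassified samples (Eq. 6).
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, n_rounds):
    """Train n_rounds stumps; return a list of (delta, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n  # Eq. (2)
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        delta = 0.5 * math.log((1 - error) / error)  # Eq. (7)
        preds = [polarity if x <= threshold else -polarity for x in xs]
        # Eq. (8): shrink weights of correct samples, grow weights of wrong ones.
        weights = [w * math.exp(-delta * y * p)
                   for w, y, p in zip(weights, ys, preds)]
        theta = sum(weights)  # normalization factor
        weights = [w / theta for w in weights]
        ensemble.append((delta, threshold, polarity))
    return ensemble


def predict(ensemble, x):
    """Eq. (9): sigmoid of the delta-weighted vote, thresholded at 0.5."""
    score = sum(d * (p if x <= t else -p) for d, t, p in ensemble)
    sigma = 1.0 / (1.0 + math.exp(-score))
    return 1 if sigma >= 0.5 else -1


# Toy data (hypothetical): no single threshold separates the two classes.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single stump can fit this data, but after three rounds the weighted vote of the three stumps labels every training point correctly, which is the effect the reweighting in Equation (8) is designed to produce.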

4 EXPERIMENTAL SETUP AND RESULTS ANALYSIS

In this section, the experimental setup and result analysis are presented and discussed in detail.

TABLE 2 Performance analysis with SMOTE and without SMOTE

| Prediction model | TP | FP | TN | FN | Sensitivity (TPR) | FPR | Precision | Specificity | F1-score | ROC-AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| RF | 11 746 041 | 0 | 5780 | 154 691 | 0.98 | 0.0 | 1.0 | 1.0 | 0.99 | 0.99 |
| NB | 9 135 447 | 0 | 5780 | 251 171 | 0.97 | 0.0 | 1.0 | 1.0 | 0.98 | 0.98 |
| LR | 4 187 764 | 5780 | 2000 | 576 234 | 0.87 | 0.74 | 0.99 | 0.25 | 0.93 | 0.56 |
| DT | 9 752 627 | 18 900 | 2000 | 2800 | 0.99 | 0.90 | 0.99 | 0.09 | 0.99 | 0.54 |
| LDA | 10 330 274 | 0 | 5780 | 139 265 | 0.98 | 0.0 | 1.0 | 1.0 | 0.99 | 0.99 |
| MLP | 12 306 460 | 0 | 5780 | 24 039 | 0.99 | 0.0 | 1.0 | 1.0 | 0.99 | 0.99 |
| AdaBoost | 16 925 026 | 14 000 | 3780 | 0 | 1.0 | 0.78 | 0.99 | 0.21 | 0.99 | 0.60 |
| RF_SMOTE | 16 424 309 | 0 | 5780 | 28 102 | 0.99 | 0.0 | 1.0 | 1.0 | 0.99 | 0.99 |
| NB_SMOTE | 9 130 582 | 0 | 5780 | 251 717 | 0.97 | 0.0 | 1.0 | 1.0 | 0.98 | 0.98 |
| LR_SMOTE | 4 295 538 | 19 780 | 0 | 509 535 | 0.89 | 1.0 | 0.99 | 0.0 | 0.94 | 0.44 |
| DT_SMOTE | 15 572 162 | 0 | 5780 | 4900 | 0.99 | 0.0 | 1.0 | 1.0 | 0.99 | 0.99 |
| LDA_SMOTE | 10 322 826 | 0 | 5780 | 140 567 | 0.98 | 0.0 | 1.0 | 1.0 | 0.99 | 0.99 |
| MLP_SMOTE | 13 640 808 | 12 000 | 3780 | 95 305 | 0.99 | 0.76 | 0.99 | 0.23 | 0.99 | 0.61 |
| Proposed AdaBoost_SMOTE | 17 102 402 | 0 | 5780 | 0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |

4.1 Simulation setup

The proposed method has been implemented on a system with an Intel(R) Core(TM) i7-6700 CPU @3.40 GHz, 4.00 GB RAM, and 64-bit Windows 10. The simulation environment includes the Python Anaconda open source distribution and the Spyder IDE. In the programming setup, we have used the Pandas, Imblearn, and Numpy frameworks for data analysis; the Matplotlib framework for data visualization; the sklearn framework for data preprocessing and classification models; and the classification-metrics framework for performance measurement and analysis. Some other frameworks have also been used: Seaborn as a high-level interface for informative statistical graphs in correlation analysis, and scipy and Itertools for scientific computing and efficient looping, respectively. All the classifiers' parameters are set by selecting suitable values on a trial and error basis as follows: Decision Tree (max_depth = 5, random_state = 1), LinearDiscriminantAnalysis (random_state = 1), LogisticRegression (random_state = 1), MLP Classifier (random_state = 1), GaussianNB (random_state = 1), Random Forest (n_estimators = 100, max_depth = 5, random_state = 1), AdaBoostClassifier(base_estimator = DecisionTree(max_depth = 5, random_state = 1), n_estimators = 50), with a training and testing split of 70%-30%.

4.2 Results and discussion

The performance evaluation of a learning model aims to estimate the accuracy of the model on future data, and the metrics help decide which technique is most appropriate for this work.57 Various performance measures, namely accuracy (Equation (10)), true positives (TP), false positives (FP), true negatives (TN), false negatives (FN), sensitivity (also known as true positive rate [TPR]) (Equation (11)), false positive rate (FPR) (Equation (12)), precision (Equation (13)), specificity (Equation (14)), F1-score (Equation (15)), and the area under the receiver operating characteristic curve (ROC-AUC), have been computed and compared (Table 2) to study the effectiveness of the proposed method.

FIGURE 2 Receiver operating characteristic analysis of A, DT; B, LDA; C, LR; D, MLP; E, NB; F, RF; G, AdaBoost

FIGURE 3 Receiver operating characteristic analysis of A, DT_SMOTE; B, LDA_SMOTE; C, LR_SMOTE; D, MLP_SMOTE; E, NB_SMOTE; F, RF_SMOTE; G, AdaBoost_SMOTE

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{11}$$

$$FPR = \frac{FP}{TN + FP} \tag{12}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{13}$$

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{14}$$

$$\text{F1-Score} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{15}$$
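Equations (10) to (15) depend only on the four confusion counts, so they can be checked with a small helper. The sketch below is illustrative: the function name `classification_metrics` and the example counts are hypothetical and are not taken from Table 2.

```python
def classification_metrics(tp, fp, tn, fn):
    """Evaluate Equations (10)-(15) from the four confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (10)
        "sensitivity": tp / (tp + fn),                 # Eq. (11), also TPR
        "fpr": fp / (tn + fp),                         # Eq. (12)
        "precision": tp / (tp + fp),                   # Eq. (13)
        "specificity": tn / (tn + fp),                 # Eq. (14)
        "f1": 2 * tp / (2 * tp + fp + fn),             # Eq. (15)
    }


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that specificity and FPR always sum to 1, since both share the denominator TN + FP; this gives a quick consistency check on any row of reported values.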

The ROC-AUC curves of DT, LDA, LR, MLP, NB, RF, and AdaBoost without SMOTE are shown in Figure 2A to G, respectively. Figure 3A to G shows the ROC-AUC curves of DT, LDA, LR, MLP, NB, RF, and AdaBoost with SMOTE. The comparison of all the considered models based on various performance metrics is shown in Table 2. On close observation of the performance metrics, the proposed approach is better in sensitivity, F1-score, and ROC-AUC, which signifies its efficiency in identifying anomalous activities and its capability to handle the imbalanced IoT data as compared to the other models.

FIGURE 4 Comparison of accuracy among all the methods

FIGURE 5 Error analysis of models

Figure 4 represents the performance comparison in terms of accuracy. From the figure, it is evident that the classification accuracy of the proposed classifier is better than that of the other standard classifiers. The proposed ensemble model for detecting intrusions has also been analyzed with respect to the variation in error rate with the number of estimators (Figure 5). Tables 3 and 4 present the confusion matrices of AdaBoost and the proposed AdaBoost_SMOTE, respectively. The proposed ensemble-based method is compared with some other competitive research in the literature. Table 5 gives the performance comparison of some of the competing research on IoT attack detection with the present work. From the table, it may be inferred that the proposed system can detect attacks with higher accuracy than the other methods.

5 CONCLUSION

Developing a suitable IoT based system with zero tolerance of misclassification has always been a research challenge for researchers at all levels. Machine learning is one of the finest approaches for solving such a problem in IoT and has emerged as a solution for network problems such as internet traffic management, resource allocation, and security issues. This work proposes ensemble learning-based methods with SMOTE for the prediction of anomalies in the DS2OS IoT network data. Simulation results show that the proposed approach has successfully handled the

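TABLE 3 Confusion matrix from AdaBoost

| | dP | DoS | mC | mO | scan | spying | wSU | Normal |
|---|---|---|---|---|---|---|---|---|
| dP | 3780 | 0 | 0 | 0 | 0 | 0 | 0 | 2000 |
| DoS | 0 | 342 | 0 | 0 | 0 | 0 | 0 | 0 |
| mC | 0 | 0 | 889 | 0 | 0 | 0 | 0 | 0 |
| mO | 0 | 0 | 0 | 805 | 0 | 0 | 0 | 0 |
| Scan | 0 | 0 | 0 | 0 | 1416 | 0 | 0 | 131 |
| Spying | 0 | 0 | 0 | 0 | 0 | 420 | 0 | 112 |
| wSU | 0 | 0 | 0 | 0 | 0 | 0 | 122 | 0 |
| Normal | 0 | 0 | 0 | 149 | 18 | 12 537 | 0 | 335 231 |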
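Per-class sensitivity can be read directly off the AdaBoost confusion matrix in Table 3. The sketch below assumes that rows are the actual classes and columns the predicted classes, and that the Normal row reads 0, 0, 0, 149, 18, 12 537, 0, 335 231; both are readings of the printed table rather than facts stated in the text, and the helper name `per_class_recall` is hypothetical.

```python
labels = ["dP", "DoS", "mC", "mO", "scan", "spying", "wSU", "Normal"]

# Table 3, one row per actual class (assumed orientation).
confusion = [
    [3780, 0, 0, 0, 0, 0, 0, 2000],
    [0, 342, 0, 0, 0, 0, 0, 0],
    [0, 0, 889, 0, 0, 0, 0, 0],
    [0, 0, 0, 805, 0, 0, 0, 0],
    [0, 0, 0, 0, 1416, 0, 0, 131],
    [0, 0, 0, 0, 0, 420, 0, 112],
    [0, 0, 0, 0, 0, 0, 122, 0],
    [0, 0, 0, 149, 18, 12537, 0, 335231],
]


def per_class_recall(matrix, names):
    """Recall (sensitivity) of each class: diagonal entry over its row sum."""
    return {name: row[i] / sum(row)
            for i, (name, row) in enumerate(zip(names, matrix))}


recall = per_class_recall(confusion, labels)
for name, value in recall.items():
    print(f"{name}: {value:.4f}")
```

Under this reading, the row sums add up to the 357 952 samples of the dataset, and the classes that lose recall without SMOTE are dP, scan, spying, and Normal.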

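TABLE 4 Confusion matrix from AdaBoost_SMOTE

| | dP | DoS | mC | mO | scan | spying | wSU | Normal |
|---|---|---|---|---|---|---|---|---|
| dP | 5780 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DoS | 0 | 342 | 0 | 0 | 0 | 0 | 0 | 0 |
| mC | 0 | 0 | 889 | 0 | 0 | 0 | 0 | 0 |
| mO | 0 | 0 | 0 | 805 | 0 | 0 | 0 | 0 |
| Scan | 0 | 0 | 0 | 0 | 1547 | 0 | 0 | 0 |
| Spying | 0 | 0 | 0 | 0 | 0 | 532 | 0 | 0 |
| wSU | 0 | 0 | 0 | 0 | 0 | 0 | 122 | 0 |
| Normal | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 347 935 |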

TABLE 5 Comparison of the proposed work with other research

| Method name | Dataset used | Accuracy % | References |
|---|---|---|---|
| Neural network | NSL-KDD | 98.27 | 11 |
| K-means and BIRCH | Synthetic | 96.3 | 9 |
| Naive Bayes | Synthetic | 97.7 | 12 |
| PCA | KDD Cup 99 | 84.4 | 45 |
| Proactive based approach | Synthetic | 89 | 42 |
| AdaBoost | DS2OS | 93.40 | Studied |
| AdaBoost_SMOTE | DS2OS | 100 | Proposed method |

imbalanced nature of the data and is efficient in identifying both anomaly types and normal activity. The proposed method is found superior to the other methods in terms of various evaluation metrics, that is, accuracy, precision, sensitivity, F1-score, and ROC-AUC. This indicates its efficiency in identifying anomalous activities over other approaches. In real-world IoT scenarios, the data are mostly in an unstructured format, and designing a machine learning-based method for such data has always been a challenging research problem. Moreover, variations in the microservices over varying periods in an IoT environment may lead to anomalies.

ACKNOWLEDGMENT
This research work is supported by the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), New Delhi, Govt. of India, under the research project entitled “Mining Socio-economic Factors Affecting Agricultural Productivity in Sambalpur District, Odisha State: Soft Computing based Machine Learning Approaches” (Grant No. EEQ/2017/000355).

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

ORCID
Janmenjoy Nayak https://orcid.org/0000-0002-9746-6557
S.K. Hafizul Islam https://orcid.org/0000-0002-2703-0213

REFERENCES
1. Suri B, Taneja S, Sahni N, Varshney A, Sharma A, Vidhani R. Smart threat alert system using IoT. Paper presented at: 2017 International
Conference on Computing, Communication and Automation (ICCCA); May 5, 2017; IEEE: 1246–1251.
2. Zanella A, Bui N, Castellani A, Vangelista L, Zorzi M. Internet of things for smart cities. IEEE Internet Things J. 2014;1(1):22-32.
3. Mosenia A, Jha NK. A comprehensive study of security of internet-of-things. IEEE Trans Emerg Top Comput. 2016 Sep 7;5(4):586-602.
4. Samaila MG, Neto M, Fernandes DA, Freire MM, Inácio PR. Challenges of securing internet of things devices: a survey. Secur Privacy.
2018;1(2):e20.
5. Losavio MM, Chow KP, Koltay A, James J. The internet of things and the smart city: legal challenges with digital forensics, privacy, and
security. Secur Privacy. 2018;1(3):e23.
6. Perez AJ, Zeadally S, Cochran J. A review and an empirical analysis of privacy policy and notices for consumer internet of things. Secur
Privacy. 2018;1(3):e15.
7. Raniyal MS, Woungang I, Dhurandher SK, Ahmed SS. Passphrase protected device-to-device mutual authentication schemes for smart
homes. Secur Privacy. 2018;1(3):e42.
8. Hussain F, Hussain R, Hassan SA, Hossain E. Machine learning in IoT security: current solutions and future challenges. arXiv preprint
arXiv:1904.05735; 2019.
9. Pahl MO, Aubet FX. All eyes on you: distributed multi-dimensional IoT microservice anomaly detection. Paper presented at: 2018 14th
International Conference on Network and Service Management (CNSM); November 5, 2018; IEEE: 72–80.
10. Liu X, Liu Y, Liu A, Yang LT. Defending ON–OFF attacks using light probing messages in smart sensors for industrial communication
systems. IEEE Trans Ind Inform. 2018;14(9):3801-3811.
11. Diro AA, Chilamkurti N. Distributed attack detection scheme using deep learning approach for internet of things. Future Gener Comput
Syst. 2018;82:761-768.
12. Anthi E, Williams L, Burnap P. Pulse: an adaptive intrusion detection for the internet of things
13. Kozik R, Choraś M, Ficco M, Palmieri F. A scalable distributed machine learning approach for attack detection in edge computing
environments. J Parallel Distrib Comput. 2018;119:18-26.
14. Usmonov B, Evsutin O, Iskhakov A, Shelupanov A, Iskhakova A, Meshcheryakov R. The cybersecurity in development of IoT embedded
technologies. Paper presented at: 2017 International Conference on Information Science and Communications Technologies (ICISCT);
November 2, 2017; IEEE: 1–4.
15. Pajouh HH, Javidan R, Khayami R, Ali D, Choo KK. A two-layer dimension reduction and two-tier classification model for anomaly-based
intrusion detection in IoT backbone networks. IEEE Trans Emerg Top Comput. 2016 Nov;29:314-323.
16. D'angelo G, Palmieri F, Ficco M, Rampone S. An uncertainty-managing batch relevance-based approach to network anomaly detection.
Appl Soft Comput. 2015 Nov 1;36:408-418.
17. Aggarwal CC, Philip SY, Han J, Wang J. A framework for clustering evolving data streams. Paper presented at: Proceedings 2003 VLDB
Conference, Morgan Kaufmann; January 1, 2003: 81–92.
18. Mridha MF, Hamid MA, Asaduzzaman M. Issues of internet of things (IoT) and an intrusion detection system for IoT using machine
learning paradigm. Paper presented at: Proceedings of International Joint Conference on Computational Intelligence; 2020; Springer,
Singapore: 395–406.
19. Maleh Y. Machine learning techniques for IoT intrusions detection in aerospace cyber-physical systems. Machine Learning and Data
Mining in Aerospace Technology. Cham: Springer; 2020:205-232.
20. Alrashdi I, Alqazzaz A, Aloufi E, Alharthi R, Zohdy M, Ming H. AD-IoT: anomaly detection of IoT cyberattacks in smart city using machine
learning. Paper presented at: 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC); January 7,
2019; IEEE: 0305–0310.
21. Laguduva V, Islam SA, Aakur S, Katkoori S, Karam R. Machine learning based IoT edge node security attack and countermeasures. Paper
presented at: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI); July 15, 2019; IEEE: 670–675.
22. Ioannou C, Vassiliou V. Classifying security attacks in IoT networks using supervised learning. Paper presented at: 2019 15th International
Conference on Distributed Computing in Sensor Systems (DCOSS); May 29, 2019; IEEE: 652–658.
23. Liang C, Shanmugam B, Azam S, Jonkman M, De Boer F, Narayansamy G. Intrusion detection system for internet of things based on a
machine learning approach. Paper presented at: 2019 International Conference on Vision Towards Emerging Trends in Communication
and Networking (ViTECoN); March 30, 2019; IEEE: 1–6.
24. Hsu HT, Jong GJ, Chen JH, Jhe CG. Improve IoT security system of smart-home by using support vector machine. Paper presented at: 2019
IEEE 4th International Conference on Computer and Communication Systems (ICCCS); February 23, 2019; IEEE: 674–677.
25. Kotenko I, Saenko I, Kushnerevich A, Branitskiy A. Attack detection in IoT critical infrastructures: a machine learning and big data
processing approach. Paper presented at: 2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based
Processing (PDP); February 13, 2019; IEEE: 340–347.
26. Kumar A, Lim TJ. EDIMA: early detection of IoT malware network activity using machine learning techniques. Paper presented at: 2019
IEEE 5th World Forum on Internet of Things (WF-IoT); April 15, 2019; IEEE: 289–294.
27. Vishwakarma R, Jain AK. A honeypot with machine learning based detection framework for defending IoT based botnet DDoS attacks.
Paper presented at: 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI); April 23, 2019; IEEE: 1019–1024.
28. Sun P, Li J, Bhuiyan MZ, Wang L, Li B. Modeling and clustering attacker activities in IoT through machine learning techniques. Inform
Sci. 2019;479:456-471.
29. Kunugi Y, Suzuki H, Koyama A. IoT security viewer system using machine learning. Paper presented at: International Conference on
Advanced Information Networking and Applications; March 27, 2019; Springer, Cham: 1071–1081.
30. Al-Hadhrami Y, Hussain FK. A machine learning architecture towards detecting denial of service Attack in IoT. Paper presented at:
Conference on Complex, Intelligent, and Software Intensive Systems; July 3, 2019; Springer, Cham: 417–429.
31. Shafi Q, Qaisar S, Basit A. Software defined machine learning based anomaly detection in fog based IoT network. Paper presented at:
International Conference on Computational Science and its Applications; July 1, 2019; Springer, Cham: 611–621.
32. Verma A, Ranga V. Machine learning based intrusion detection systems for IoT applications. Wirel Pers Commun. 2019;30:1-24.
33. Liu L, Yang J, Meng W. Detecting malicious nodes via gradient descent and support vector machine in internet of things. Comput Electric
Eng. 2019;77:339-353.
34. Hanif S, Ilyas T, Zeeshan M. Intrusion detection in IoT using artificial neural networks on UNSW-15 dataset. Paper presented at: 2019
IEEE 16th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT and AI (HONET-ICT); October 6, 2019;
IEEE: 152–156.
35. Madhawa S, Balakrishnan P, Arumugam U. Roll forward validation based decision tree classification for detecting data integrity attacks
in industrial internet of things. J Intell Fuzzy Syst. 2019;36(3):2355-2366.
36. Aseeri AO, Zhuang Y, Alkatheiri MS. A machine learning-based security vulnerability study on XOR PUFs for resource-constraint internet
of things. Paper presented at: 2018 IEEE International Congress on Internet of Things (ICIOT); July 2, 2018; IEEE: 49–56.
37. Xiao L, Wan X, Lu X, Zhang Y, Wu D. IoT security techniques based on machine learning: how do IoT devices use AI to enhance security?
IEEE Signal Process Mag. 2018 Sep 3;35(5):41-49.
38. Chatterjee B, Das D, Sen S. RF-PUF: IoT security enhancement through authentication of wireless nodes using in-situ machine learning.
Paper presented at: 2018 IEEE International Symposium on Hardware Oriented Security and Trust (HOST); April 30, 2018; IEEE: 205–208.
39. Doshi R, Apthorpe N, Feamster N. Machine learning DDoS detection for consumer internet of things devices. Paper presented at: 2018
IEEE Security and Privacy Workshops (SPW); May 24, 2018; IEEE: 29–35.
40. Dash A, Pal S, Hegde C. Ransomware auto-detection in IoT devices using machine learning. Int J Eng Sci. 2018;8(12):19538-19546.
41. Zolanvari M, Teixeira MA, Jain R. Effect of imbalanced datasets on security of industrial IoT using machine learning. Paper presented at:
2018 IEEE International Conference on Intelligence and Security Informatics (ISI); November 9, 2018; IEEE: 112–117.
42. Baracaldo N, Chen B, Ludwig H, Safavi A, Zhang R. Detecting poisoning attacks on machine learning in IoT environments. Paper presented
at: 2018 IEEE International Congress on Internet of Things (ICIOT); July 2, 2018; IEEE: 57–64.
43. Gurulakshmi K, Nesarani A. Analysis of IoT bots against DDoS attack using machine learning algorithm. Paper presented at: 2018 2nd
International Conference on Trends in Electronics and Informatics (ICOEI); May 11, 2018; IEEE: 1052–1057.
44. Amfo JK, Hayfron-Acquah JB. Modeling of hybrid intrusion detection system in internet of things using support vector machine and
decision tree. Int J Comput Appl. 2018;181(15):45-52.
45. Zhao S, Li W, Zia T, Zomaya AY. A dimension reduction model and classifier for anomaly-based intrusion detection in internet of
things. Paper presented at: 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive
Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress
(DASC/PiCom/DataCom/CyberSciTech); November 6, 2017; IEEE: 836–843.
46. Meidan Y, Bohadana M, Shabtai A, Ochoa M, Tippenhauer NO, Guarnizo JD, Elovici Y. Detection of unauthorized IoT devices using
machine learning techniques. arXiv preprint arXiv:1709.04647; 2017.
47. Alghuried A. A model for anomalies detection in internet of things (IoT) using inverse weight clustering and decision tree.
https://doi.org/10.21427/D7WK7S.
48. Shukla P. ML-IDS: a machine learning approach to detect wormhole attacks in internet of things. Paper presented at: 2017 Intelligent
Systems Conference (IntelliSys); September 7, 2017; IEEE: 234–240.
49. Canedo J, Skjellum A. Using machine learning to secure IoT systems. Paper presented at: 2016 14th Annual Conference on Privacy,
Security and Trust (PST); December 12, 2016; IEEE: 219–222.
50. Hodo E, Bellekens X, Hamilton A, Dubouilh PL, Iorkyase E, Tachtatzis C, Atkinson R. Threat analysis of IoT networks using artificial neural
network intrusion detection system. Paper presented at: 2016 International Symposium on Networks, Computers and Communications
(ISNCC); May 11, 2016; IEEE: 1–6.
51. Liang G. Automatic traffic accident detection based on the internet of things and support vector machine. Int J Smart Home.
2015;9(4):97-106.
52. Ham HS, Kim HH, Kim MS, Choi MJ. Linear SVM-based android malware detection for reliable IoT services. J Appl Math. 2014;2014:1-10.
53. Pahl M-O, Aubet F-X. DS2OS traffic traces, 2018. https://www.kaggle.com/francoisxa/ds2ostraffictraces. Accessed January 20, 2020.
54. Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning.
J Mach Learn Res. 2017;18(1):559-563.
55. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res.
2002;16:321-357.
56. Hastie T, Rosset S, Zhu J, Zou H. Multi-class adaboost. Stat Interface. 2009;2(3):349-360.
57. Xin Y, Kong L, Liu Z, et al. Machine learning and deep learning methods for cybersecurity. IEEE Access. 2018;6:35365-35381.
https://doi.org/10.1109/ACCESS.2018.2836950.

How to cite this article: Dash PB, Nayak J, Naik B, Oram E, Islam SH. Model based IoT security framework
using multiclass adaptive boosting with SMOTE. Security and Privacy. 2020;e112. https://doi.org/10.1002/spy2.112
