Professional Documents
Culture Documents
IoT Network Anomaly Detection in Smart Homes Using Machine Learning
IoT Network Anomaly Detection in Smart Homes Using Machine Learning
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number
ABSTRACT In this modern age of technology, the Internet of things has covered all aspects of life including smart
situations, smart homes, and smart spaces. Smart homes have a large number of IoT objects that are working continuously
without any interruption. Better security and authentication of these smart devices can provide peaceful environments to live
in such spaces. It is important to monitor the activities of smart IoT devices to make them work fault free. Such devices are
small, consume relatively less power and resources, and are easily attackable by attackers. It is crucial to protect the integrity
and characteristics of the smart home environment from external attacks. Machine Learning played a vital role in
recognizing such malicious activities and attempts. Several Machine Learning approaches are available to detect the normal
and abnormal behavior of IoT device traffic. This study proposed a machine learning-based anomaly detection approach for
smart homes using different classifiers. Testing and evaluation are performed using the University of New South Wales
(UNSW) BoT IoT dataset. Machine learning models based on four classifiers are built using an IoT devices dataset. For the
Test dataset, the Weighted Precision, Recall and F1 score of Random forest, decision tree, and AdaBoost is 1 as compared to
ANN which has 0.98, 0.96, and 0.96 respectively Results show that high performance, precision, and robustness can be
achieved using the proposed methodology. In this way, smart homes' security and identity of devices can be monitored and
anomalies can be detected with high accuracy. Attack categories include binary class, multiclass class, and subclasses.
Results show Random Forest algorithm outperforms enough to use this methodology in smart environments.
INDEX TERMS Smart Homes, IoT Environment, Cyber Security, Network Anomaly Detection, Smart
Environments, Machine Learning.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
In smart homes, such devices are playing an significant role Open Challenges, and Prospects [44], discusses machine
in day-to-day activities. One must consider the security and learning frameworks for DDoS attack detection in SD-IoT
privacy of data shared by IoT objects. Gartner predicts that networks.
by 2020, more than 25 percent of assaults on businesses For IoT-based smart home environments, Alippi et al. [26]
would include Internet of Things (IoT) devices [18]. presented a framework for detecting changes at the level of
Using these obstacles and numbers as a jumping-off point, sensors. The approach constructs a dynamical model of the
we discovered that almost all studies with the intent of anticipated sensor-generated signal over time [22]. The
analyzing the difficulties encountered in the IoT area system then continuously compares the predicted signals
include a portion dedicated to security and privacy concerns from the model with the actual signals (data streams)
[19]. The previous few years have seen a number of cyber- arriving from various sensors. Specifically, it uses a method
security problems involving networked, pervasive, and for detecting transitions known as "Intersection-of-
Internet of Things technology. Numerous attacks have been Confidence" to look for differences [27]. Author investigate
made against the Internet of Things (IoT), including notable the deep learning based solutions for IDS and achieved
Distributed Denial of Service (DDoS) attacks on Dyn's 95% precision and 97% Recall after various attack
DNS (domain name system; see solutions [38]. Also was discussed about Imbalanced data
reference).[20]Attacks/vulnerabilities on/off self-driving issues during intrusion detection and applied SMOTE
cars [21], ransomware attacks [21], attacks on smart home technique to balance dataset [39]. The author used IG and
and medical equipment [22,23], and more. GR filter-based approaches for feature ranking and applied
The IoT and critical infrastructures pose new security five ML algorithms, including kNN, ANN, bagging, J48,
threats. The limited resources aboard the devices used to and Ensemble methods, to classification of traffic features.
construct IoT and CPS apps make it difficult to mitigate Both classification approaches are implemented in this
these hazards. In addition, the gadgets are inherently study, Binary and multi [40]. NSL-KDD dataset was used
insecure since they need a constant Inter-net connection. to test the IoT based cyber-attack classification (Binary and
New instances and studies show that greater investigation is Multi) [41].
required [22]. They also reveal, sadly, that present solutions The available studies for these smart homes' security and
are far from being acceptable to halt the exponential threat protection are yet not mature enough to fulfill the
development in the number and complexity of assaults [24]. domain requirements. Such environments need security
Thus, the necessity for further layers of defense to make mechanisms, privacy mechanisms, and data transmission
smart home IoT devices more durable and proof of such mechanisms with machine intelligence for a secure smart
assaults is vital [22]. home. The proposed study consists of Machine Learning
[25] introduced a host-based anomaly discovery method in based layered architecture as shown in Fig 3, which can
which device features and activities are considered using a provide improved and reliable detection of anomalies in
mobile phone application. Their system uses a supervised smart homes.
machine learning approach to evaluate the gathered data At level 1, which is L1, the dataset generation and
and determine whether an app is benign or dangerous. collection are performed. In the proposed study, the
Device metrics including CPU load, active processes, and UNSW−2018−IoT−Botnet dataset is used. This dataset
application programming interface (API) invocations are all contains the data of IoT devices of a smart home. This
included in their feature vector [22]. Based on their claimed dataset contains 0 to 668521 records that are sufficient to be
results, their method successfully distinguishes between used for the study. This information is forwarded to L2,
apps that are games and those that are tools 99.7 percent of which is the data pre-processing level. At this level, data is
the time. However, they fail to reveal or comment on preprocessed for missing values, feature engineering, and
whether the shown or mentioned programs are dangerous. data balancing. After preprocessing, another vital level is
Relatedly, Burguera et al. (2011) devised a method for feature selection. The most relevant features are selected
detecting malware on smart devices at the device level [22]. using correlation among several features. After selecting
This study introduces a system called Crowdroid, which relevant and ideal features for classification, training data is
records and analyses system calls as they occur in real-time. used for classification. This classification is performed
A mobile application receives this data. The collected data, using different machine learning algorithms. These
represented by Linux system calls, is sent to a central server algorithms include AdaBoost, Decision Trees, Random
(off-device research) where it is analyzed for harmful Forest, LSTM, Auto Encoder, and ANN.
software. Towards a Machine Learning-based Framework
for DDoS Attack Detection in Software-Defined IoT (SD-
IoT) Networks[42]. This study discusses machine learning
frameworks for DDoS attack detection in SD-IoT networks
[43]Toward Software-Defined Networking-Based IoT
Frameworks: A Systematic Literature Review, Taxonomy,
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
1) DETAILS OF DATASET ATTACK CATEGORIES attack, which is studied in the proposed work. A comparison
UNSW BoT IoT dataset contains six different classes with a of the BoT IoT dataset with other standard network datasets
different number of records as shown in Table 1. There are is given in Table 3. This shows that the BoT IoT dataset is far
six different types of categories considered in the pro-posed more realistic and reliable to test smart network
study. At first, binary classification is done in which there are environments as compared to other datasets, which are used
two classes i.e. attack and normal. In multiclass in previous research. T represents true and F represents False.
classification, the attack category is subcategorized into True shows the presence of that characteristic and False
Keylogging, Data Exfiltration, Service Scan, OS Fingerprint, shows the absence of the characteristic.
and UDP. Table 2 presents the description of each type of
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
B. FEATURE ENGINEERING variations in network traffic, the model can identify abnormal
behaviors associated with potential attacks or malicious
The Machine Learning models are based on exploratory data activities. For instance, sudden spikes in packet sizes or
analysis and observations. A dataset must be in the unexpected protocol usage can be indicative of an anomaly.
appropriate format to be given to the classification model.
For this purpose, categorical attributes are handled by using 2. Communication Patterns: Features related to
different transformation techniques. The dataset used in this communication patterns involve attributes like the number of
study contains both categorical and numerical features. In connections, data transfer rates, and the frequency of
order to convert the input into numeric characteristics, one interactions between devices. Anomalies in these patterns,
hot encoding and label encoding technique is used. In this such as a device initiating an unusually high number of
research, the data was transformed into a feature vector using connections, may suggest a potential security threat.
label encoding techniques [11]. 3. Categorical Features: These features pertain to
After this process, we ensure that data must be balanced so qualitative aspects of network traffic, including service types
that the efficiency of the model can be achieved. 46 features and flags. By one-hot encoding categorical features, the
selected after encoding are presented in Fig 8. In this BoT model can effectively handle categorical data and distinguish
IoT dataset, there are different attack categories. Each patterns of behavior for different services or flags. Anomalies
category contains a different number of records. For efficient may manifest as unexpected or malicious usage of particular
performance, it is important to balance the dataset. For data services.
balancing, the oversampling technique is used. The balanced
dataset after over-sampling is shown in Fig 9.
TABLE 1: ATTACK COUNT DESCRIPTION IN THE DATASET
To aid in the computational efficiency and simplicity of the
machine learning models, only relevant characteristics are
Sr. Class Sub Class No of Samples
chosen in the feature selection process using correlation as 1 576884
shown in Fig 10. In addition, this method can mitigate the Data Exfiltration
effects of overfitting when all characteristics are used [11],
below features are extracted. Network Traffic Features: 2 Attack Keylogging 73168
4 OS Fingerprint 477
5 Service Scan 73
1. These features encompass data related to network 6 UDP 6
protocols, packet sizes, flow durations, and 7 Normal …. 17914
source/destination IP addresses. By analyzing patterns and
2 Act of tracking and recording every keystroke entry made on the network without
Keylogging
permission.
4 Attack OS Fingerprint Obtain OS information of target hosts to prepare for future attacks.
5 Service Scan The system is scanned for some information that aims to corrupt data.
6 Forward a huge amount of UDP packets to exhaust the bandwidth of target servers/
UDP
devices.
Dataset Darpa98 KDD99 ISCX LBNL DEFCON8 CAIDA UNIBS CICIDS TUIDS UNSWNB15 BoT
2017 IoT
Realistic Traffic No No Yes Yes No Yes Yes Yes Yes Yes Yes
Realistic Yes Yes Yes No No Yes Yes Yes Yes Yes Yes
Testbed
Configuration
Labeled data Yes Yes Yes No No No Yes Yes Yes Yes Yes
New generate No Yes Yes No No No No Yes Yes Yes Yes
features
Full packet Yes Yes Yes No Yes No Yes Yes Yes Yes Yes
capture
Diverse attack Yes Yes Yes Yes Yes No No Yes Yes Yes Yes
scenarios
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
In feature selection, No machine learning method similar to Figure 11. Oversampling process in the proposed methodology
that used by Pahl et al. has been used here for feature
selection. [32] for the proposed study [11]. Because of the Algorithm 1
large dataset, it is important to perform feature selection for Outlier detection and handling algorithm:
improved performance and minimum biasedness. This 1. Sort the dataset D in ascending order
process of oversampling is shown in Fig 11. IQR and 2. estimate the 1st and 3rd quartiles(Q1, Q3) in D features
SMOTE techniques are used for balancing and 3. calculate IQR=Q3-Q1
oversampling. SMOTE is a technique of oversampling in 4. calculate lower bound = (Q1–1.5*IQR), upper bound =
which artificial samples are created for the minority class. (Q3+1.5*IQR)
This technique helps in overcoming the overfitting problem 5. loop through the values of the dataset and check for those who
brought on by random oversampling. IQR is used to fall below the lower bound and above the upper bound and mark
identify outliers in the dataset. IQR (inter Quartile Range) them as outliers
is used to identify outliers. 1.5 IQR outlier is used to 6. Replace outlier values with Median value
identify abnormalities.
Algorithm 2
SMOTE algorithm:
1. Take difference between a sample i and its nearest neighbor ki
2. Multiply the difference by a random number between 0 and 1
Dif =Random(0,1) * (i*ki)
3. Add this difference to the sample to generate a new synthetic
example in feature space
Add Dif to Feature space
4. Continue on with next nearest neighbour up to user-defined
number.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
covariance between x and y. where Sd is the standard emerged. Then it evolved into a tree, or more accurately, a
deviation. flowchart.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
B. RESULT ANALYSIS
4) F1 SCORE
The data description section explains machine-learning
F1-score (F1) represents the measure of the harmonic mean
approaches used on the dataset UNSW BoT IoT. Validation
of recall (or TPR) and precision [36]. The formula for the
of data is performed using 5-fold cross-validation.
F1 measure is given in equation 6.
Performance parameters used for result evaluation are
confusion matrix; accuracy, precision, recall, and F1 score as
discussed in section 3.4. The accuracy comparison of
Random Forest, Decision Tree, AdaBoost, LSTM, Artificial
( 6) Neural Network (ANN), and Auto Encoder is shown in Table
4 and Fig 12.
5) CONFUSION MATRIX C. Comparison of Accuracy
When evaluating machine learning classification models, the
confusion matrix is a useful tool for getting a feel for how Accuracy is the measurement of correctly classified items
things will shake out. A comparison between the actual among all items. This comparison of accuracy is shown in
classes and the expected ones is shown in the table below. It Table 6. In this table accuracy of ML, models including
facilitates a speedy diagnosis of misidentification between Random Forest, Decision Tree, ANN, and AdaBoost are
actual and anticipated classes. Prediction outcomes from a compared. For training data, AdaBoost Decision Tree and
Random forest achieve 100% accuracy whereas ANN
classification task may be summarised using a confusion
achieves 93.17% accuracy. For the test dataset, these
matrix [37].
algorithms achieve 99.9% and ANN achieves 95.74%
accuracy. Fig 12 shows the graphical representation of
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
accuracy, in which classifiers are shown on the x-axis and the shown in blue color. Accuracy shows that ANN performance
accuracy percentage is shown on the y-axis. Training dataset is low compared to AdaBoost, Decision tree, and RF. ANN
accuracy is shown in yellow color and test dataset accuracy is performed low with training and testing datasets.
TABLE 4. ACCURACY COMPARISON OF ML APPROACHES.
Accuracy of ML Approaches
Classifiers
Performance
AdaBoost Decision ANN Random LSTM Auto
Metrics
Tree Forest Encoder
Training Data 100% 100% 93.17% 100%
91.40% 88.30%
Accuracy
Testing Data 99.99% 99.99% 95.74% 100%
90.20% 84.30%
Accuracy
D. Comparison of Confusion Matrices From the 5-fold cross-validation results, it can be concluded
The confusion Matrix shows the actual values of true that RF performed best as shown in Fig 14. In the first fold,
positive, false positive, true negative, and false negative ANN performed with approximately 94% accuracy, and DT,
instances predicted by the specific ML model. Confusion RF, and ADA performs between 96% and 98%. ADA and
matrices of ML techniques, which are shown in Fig 13, RF, DT are similar in the first fold whereas RF is slightly lower
DT, and ADA performed well in terms of True Positive and than these. In the second fold, ANN accuracy is below 92%,
True Negative predictions and resulted in high precision, whereas ADA, DT, and RF are better than ANN. In the third
recall, and accuracy. In ADA, 10 instances of fold, DT and ADA accuracy is dropped and ANN accuracy is
OS_Fingerprint are misclassified out of 5450 instances, and 2 comparatively higher than these two, Whereas RF performs
instances of keylogging are misclassified. In RF, only 2 highest in the third fold. In the last two folds, ADA and DT
instances of keylogging and 1 instance of Service_scan are accuracy are similar and RF performed better in all of the
misclassified. In DT, 2 instances of keylogging, 7 instances five folds. ANN, LSTM, and Auto Encoder have less
of OS_Fingerprint, and 3 instances of Service_scan are performance.
misclassified. Whereas in ANN, 3 instances of keylogging, 8
instances of Normal, 21 instances of OS_Fingerprint, and
8519 instances of Service_scan are misclassified.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
Figure 14. Cross-Validation Accuracy of Decision Tree, Random Forest, ANN, and ADA
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
F. Comparison of Weighted Precision, Recall, and F1 B, C, D, E, F) then weighted precision can be calculated
score using the following formula.
Weighted performance measures are presented in Table 6.
In weighted measures, weights (number of instances) are
( 11)
attached to the measures. If the number of instances is (A,
( 22)
Similarly, F1 measure and Recall can be produced in this score for RF, ADA, and DT are 1 compared to ANN which
way. Fig 16 shows that weighted precision, recall, and F1 has low measures.
TABLE 1: WEIGHTED PRECISION, RECALL, AND F1 SCORE OF ML ALGORITHMS
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
G. Summary of Right Classified and Misclassified respectively. With RF, 5450 and 0 are RC and MC
Instances respectively. LSTM has an 8765 MC value and Autoencoder
Table 7 presents the right classified (RC) and misclassified has 9504 misclassified instances. Their identification rate of
(MC) instances with different algorithms. The total count the malicious node is above 80%. Results show that the
shows the total number of instances in that particular class. proposed technique provides a detailed description of
Data Exfiltration has 2 instances with each classifier and the anomaly detection with large datasets. Feature engineering is
right predictions are made by each algorithm. Keylogging performed using one-hot encoding, label encoding,
has 28 counts of which 2 are misclassified by RF, DT, and oversampling, and correlation-based feature selection.
ADA whereas 3 are misclassified by ANN. For the normal Multiclass classification is performed accurately, which is
category, ANN correctly classifies 149, and 8 instances are more difficult to predict than binary classification. The
misclassified. proposed method outperforms Random forest and is best
with DT, and ADA.
For OS Fingerprint, 5450 is the total count out of which
accurately classified are 5440, and 10 instances are
misclassified by ADA. For the Decision Tree, 5453 is RC
and 7 is MC. For ANN, 5429 and 21 are RC and MC
TABLE 2: PROPOSED METHODOLOGY COMPARISON USING DIFFERENT ALGORITHMS
Attacks Total AdaBoost Decision Tree ANN Random Auto Encoder LSTM
Counts Forest
RC MC RC MC RC MC RC MC RC MC RC MC
Data 2 2 0 2 0 2 0 2 0 1 1 2 0
Exfiltration
Keylogging 28 26 2 26 2 25 3 26 2 10 18 20 08
Figure 17. Comparison of the proposed approach with the existing approach of the IoT anomaly detection system
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
Figure 17. Comparison of the proposed approach with the proposed Machine learning algorithms
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
towards making security by design retrofittable into existing [13] Diro, A. A., & Chilamkurti, N. Distributed attack detection
scheme using deep learning approach for Internet of Things.
IoT installations. Future generation computer systems, 82, (2018). 761-768.
In the future, the proposed methodology of anomaly [14] Alsharif, M. H., Kelechi, A. H., Yahya, K., & Chaudhry, S. A.
detection can be tested and validated with new ML models Machine learning algorithms for smart data analysis in Internet of
things environment: taxonomies and research trends. Symmetry,
and other datasets like CIDIDS, KDD, Bot−IoT, and CTU-13 12(1), (2020). 88.
and results can be compared. More Machine Learning [15] Porkodi, R., & Bhuvaneswari, V. The Internet of Things (IoT)
classifiers can be tested. Unsupervised techniques can also be applications and communication enabling technology standards:
tested and compared with the algorithms used in this study. An overview. Paper presented at the 2014 International
Conference on intelligent computing applications (2014).
Other feature selection techniques can be tested with this [16] Saleem, K., Bajwa, I. S., Sarwar, N., Anwar, W., & Ashraf, A.
methodology. IoT Healthcare: Design of Smart and Cost-Effective Sleep
The limitation of the proposed study is that it needs more in- Quality Monitoring System. Journal of Sensors, 2020. (2020).
[17] Klötzer, C., Weißenborn, J., & Pflaum, A. The evolution of cyber-
depth focus to interpret the behavior of the device in different physical systems as a driving force behind digital transformation.
environments. Paper presented at the 2017 IEEE 19th Conference on Business
Informatics (CBI). (2017).
[18] Garg, S., Kaur, K., Kaddoum, G., & Choo, K.-K. R. Toward
CONFLICTS OF INTEREST secure and provable authentication for Internet of Things:
The authors declare no conflicts of interest. Realizing industry 4.0. IEEE Internet of Things Journal, 7(5),
(2019). 4598-4606.
[19] Khalfi, B., Hamdaoui, B., & Guizani, M. Extracting and
Data Availability Statement:
exploiting inherent sparsity for efficient IoT support in 5G:
The dataset supporting the conclusions of this article is Challenges and potential solutions. IEEE Wireless
available at the repository: Communications, 24(5), (2017). 68-73.
https://research.unsw.edu.au/projects/bot-iot-dataset. [20] Donno, M. D., Dragoni, N., Giaretta, A., & Mazzara, M.
AntibIoTic: protecting IoT devices against DDoS attacks. Paper
presented at the International Conference in Software Engineering
for Defence Applications. (2016).
REFERENCES [21] Dargahi, T., Dehghantanha, A., Bahrami, P. N., Conti, M.,
[1] Chen, S., Xu, H., Liu, D., Hu, B., & Wang, H. A vision of IoT: Bianchi, G., & Benedetto, L. A cyber-kill-chain-based taxonomy
Applications, challenges, and opportunities with China of crypto-ransomware features. Journal of Computer Virology
perspective. IEEE Internet of Things Journal, 1(4), (2014) 349- and Hacking Techniques, 15(4), (2019). 277-305.
359. [22] Albasir, A. Detection of Anomalous Behavior of IoT/CPS
[2] Lee, I., & Lee, K. The Internet of Things (IoT): Applications, Devices Using Their Power Signals. (2020).
investments, and challenges for enterprises. Business Horizons, [23] Ullah, F. Javeed, A. Salam, M. Ahmad, N. Sarwar, D. Shah, and
58(4), (2015) 431-440. M. Abrar, “Modified decision tree technique for ransomware
[3] Xiao, L., Wan, X., Lu, X., Zhang, Y., & Wu, D. IoT security detection at runtime through API Calls,” Scientific Programming,
techniques based on machine learning: How do IoT devices use 4(2) 2020 1-21.
AI to enhance security? IEEE Signal Processing Magazine, 35(5), [24] Gou, Q., Yan, L., Liu, Y., & Li, Y.. Construction and strategies
(2018). 41-49. in IoT security system. Paper presented at the 2013 IEEE
[4] Chaabouni, N., Mosbah, M., Zemmari, A., Sauvignac, C., & international conference on green computing and
Faruki, P. Network intrusion detection for IoT security based on Communications and IEEE Internet of Things and IEEE Cyber,
learning techniques. IEEE Communications Surveys & Tutorials, physical and social computing. (2013)
21(3), (2019). 2671-2701. [25] Shabtai, A., Kanonov, U., Elovici, Y., Glezer, C., & Weiss, Y.
[5] Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., & “Andromaly”: a behavioral malware detection framework for
Ayyash, M. Internet of things: A survey on enabling technologies, Android devices. Journal of Intelligent Information Systems,
protocols, and applications. IEEE Communications Surveys & 38(1), (2012). 161-190.
Tutorials, 17(4), (2015). 2347-2376. [26] Alippi, C., D'Alto, V., Falchetto, M., Pau, D., & Roveri, M.
[6] Balta-Ozkan, N., Davidson, R., Bicket, M., & Whitmarsh, L.. Detecting changes at the sensor level in cyber-physical systems:
Social barriers to the adoption of smart homes. Energy Policy, 63, Methodology and technological implementation. Paper presented
(2013) 363-374. at the 2017 International Joint Conference on Neural Networks
[7] Sperotto, A., & Pras, A.. Flow-based intrusion detection. Paper (IJCNN). (2017).
presented at the 12th IFIP/IEEE International Symposium on [27] Alippi, C., Boracchi, G., & Roveri, M. A just-in-time adaptive
Integrated Network Management (IM 2011) and Workshops. classification system based on the intersection of confidence
(2011) intervals rule. Neural Networks, 24(8), (2011). 791-800.
[8] Ali, B. Internet of Things Based Smart Homes: Security Risk [28] Shanthamallu, U. S., Spanias, A., Tepedelenlioglu, C., & Stanley,
Assessment and Recommendations. (2016) M. A brief survey of machine learning methods and their sensor
[9] Bin, S., Yuan, L., & Xiaoyi, W. Research on data mining models and IoT applications. Paper presented at the 2017 8th
for the Internet of things. Paper presented at the 2010 International Conference on Information, Intelligence, Systems &
International Conference on Image Analysis and Signal Applications (IISA). (2017).
Processing. (2010) [29] Aysa, M. H., Ibrahim, A. A., & Mohammed, A. H. IoT DDoS
[10] Steinberg, J. These devices may be spying on you (even in your attack detection using machine learning. Paper presented at the
own home). Forbes. Retrieved, 27. (2014). 2020 4th International Symposium on Multidisciplinary Studies
[11] Hasan, M., Islam, M. M., Zarif, M. I. I., & Hashem, M. Attack and Innovative Technologies (ISMSIT). (2020).
and anomaly detection in IoT sensors in IoT sites using machine [30] Koroniotis, N., Moustafa, N., Sitnikova, E., & Turnbull, B.
learning approaches. Internet of Things, 7, 100059. (2019). Towards the development of realistic botnet dataset in the internet
[12] Poyner, I., & Sherratt, R. Privacy and security of consumer IoT of things for network forensic analytics: Bot-iot dataset. Future
devices for the pervasive monitoring of vulnerable people. Paper generation computer systems, 100, (2019). 779-796.
presented at the Living in the Internet of Things: Cybersecurity of [31] Koroniotis, N., Moustafa, N., Sitnikova, E., & Slay, J. Towards
the IoT-2018. (2018). developing network forensic mechanism for botnet activities in
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3325929
the IoT based on machine learning techniques. Paper presented at and Soft Computing, PLOS ONE, IET Software, Security and
the International Conference on Mobile Networks and Communication Networks, International Journal of Telemedicine and
Management. (2017). Applications, Journal of Healthcare Engineering, Milestone transactions
[32] Pahl, M.-O., & Aubet, F.-X. All eyes on you: Distributed Multi- on medical Technometrics, Mathematics and Computer Science, SCIREA
Dimensional IoT microservice anomaly detection. Paper journal of Computer. Worked for more than 25 journals as and reviewer
presented at the 2018 14th International Conference on Network and PC member.
and Service Management (CNSM). (2018).
[33] Hastie, T., Tibshirani, R., & Friedman, J. H.. The elements of
statistical learning: data mining, inference, and prediction (Vol.
2): Springer. (2009) IMRAN SARWAR BAJWA received the
[34] Krogh, A. What are artificial neural networks? Nature Ph.D. degree in computer science from the
Biotechnology, 26(2), (2008). 195-197. University of Birmingham, U.K. He has more
[35] Xu, W., Jang-Jaccard, J., Singh, A., Wei, Y., & Sabrina, F. than 18 years of teaching research experience
Improving performance of autoencoder-based network anomaly in various universities of Pakistan, Portugal,
detection on nsl-kdd dataset. IEEE Access, 9, (2021). 140136- and U.K. He is currently an Associate
140146. Professor with the Department of Computer
[36] Hossin, M., & Sulaiman, M. N. A review on evaluation metrics Science and Information Technology, The
for data classification evaluations. International journal of data Islamia University of Bahawalpur. He has
mining & knowledge management process, 5(2), 1. (2015). published more than 190 articles in well
[37] Peterson, J. M., Leevy, J. L., & Khoshgoftaar, T. M. A Review reputed peer-refereed conferences and
and Analysis of the Bot-IoT Dataset. Paper presented at the 2021 journals. He has more than 80 publications and 1000 citations in Google
IEEE International Conference on Service-Oriented System Scholar. Additionally, he has more than 90 articles in Scopus and 370
Engineering (SOSE). (23-26 Aug. 2021). citations of his work in Scopus. His personal impact factor is more than
[38] Thamilarasu, G.; Chawla, S. Towards Deep-Learning-Driven 40. His current research interests include artificial intelligence, the IoT, big
data, and machine learning. He is an Associate Editor of IEEE ACCESS
Intrusion Detection for the Internet of Things. Sensors 2019, 19,
journal. He has very good programming skills in Java and C#.
1977. https://doi.org/10.3390/s19091977
[39] Alosaimi, S.; Almutairi, S.M. An Intrusion Detection System
Using BoT-IoT. Appl. Sci. 2023, 13, 5427.
https://doi.org/10.3390/app13095427 Muhammad Zunnurain Hussain
[40] Albulayhi, K.; Abu Al-Haija, Q.; Alsuhibany, S.A.; Jillepalli, working as Assistant Professor at
A.A.; Ashrafuzzaman, M.; Sheldon, F.T. IoT Intrusion Detection Department of Computer Science,
Using Machine Learning with a Novel High Performing Feature Bahria University, Lahore, Pakistan. He
Selection Method. Appl. Sci. 2022, 12, 5015. has 10 years proven Network & Security
https://doi.org/10.3390/app12105015 specialist with comprehensive lifecycle
[41] Abu Al-Haija, Q.; Zein-Sabatto, S. An Efficient Deep-Learning- advisory, client development,
Based Detection and Classification System for Cyber-Attacks in management, & technology consulting
IoT Communication Networks. Electronics 2020, 9, 2152. for fortune 100 clients. His research
https://doi.org/10.3390/electronics9122152 interests include the Computer
[42] Bhayo, J., Shah, S. A., Hameed, S., Ahmed, A., Nasir, J., & Networks, Information Security, IoT
Draheim, D. Towards a machine learning-based framework for Security, Cloud Computing, Machine
DDOS attack detection in software-defined IoT (SD-IoT) Learning.
networks. Engineering Applications of Artificial
Intelligence, 123, 2023, 106432.
[43] Siddiqui, S., Hameed, S., Shah, S. A., Ahmad, I., Aneiba, A.,
Draheim, D., & Dustdar, S. (2022). Towards Software-Defined
Muhammad Ibrahim received the Ph.D.
Networking-based IoT Frameworks: A Systematic Literature
degree in computer science from the
Review, Taxonomy, Open Challenges and Prospects. IEEE
Islamia University of Bahawalpur,
Access.
Pakistan. He has more than 10 years of
[44] Khalid, M., Hameed, S., Qadir, A., Shah, S. A., & Draheim, D. teaching research experience in various
(2023). Towards SDN-based smart contract solution for IoT institutes of Pakistan. He is currently an
access control. Computer Communications, 198, 1-31. Lecturer with the Department of Computer
Science, The Islamia University of
Bahawalpur. His current research interests
include artificial intelligence, the IoT, big
data, and machine learning.
NADEEM SARWAR is
perusing his PhD (CS) from
Islamia University of
Bahawalpur, Pakistan and
working as Assistant Professor
Khizra Saleem is working as a
at Department of Computer
Lecturer in Department of Computer
Science, Bahria University,
Science, The Islamia University of
Lahore, Pakistan. He has 10
Bahawalpur. Her research Interest
years of teaching and research
Includes Machine Learning, IoT, and
experience. He has published
Cloud Computing.
more than 60 international and
National Journal/conference
publications in the last five
years. He is also working as
Editorial Board Member for various research journals named are CMC-
Computers, Materials & Continua, Applied Computational Intelligence
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4