Othmane Friha, Mohamed Amine Ferrag, Lei Shu, Leandros Maglaras, Kim-Kwang
Raymond Choo et al.
PII: S0743-7315(22)00057-0
DOI: https://doi.org/10.1016/j.jpdc.2022.03.003
Reference: YJPDC 4526
Please cite this article as: O. Friha, M.A. Ferrag, L. Shu et al., FELIDS: Federated Learning-based Intrusion Detection System for
Agricultural Internet of Things, Journal of Parallel and Distributed Computing, doi: https://doi.org/10.1016/j.jpdc.2022.03.003.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and
formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and
review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal
pertain.
exploiting them can cause serious damage. A good example dates from 2017, when Maersk, one of the leading shipping companies worldwide, was massively damaged by the NotPetya malware, costing its business $250 million–$300 million [4]. In a more recent example from 2021, JBS, the biggest meat supplier in the world, was the target of a ransomware attack that forced it to shut down some operations in multiple countries and affected thousands of workers [5].

To prevent serious financial and reputational damage, the cybersecurity community is focusing on finding solutions that can respond quickly and effectively to these threats. An Intrusion Detection System (IDS) is a dedicated security tool that regularly monitors events within a network or computer system and assesses them for evidence of an intrusion. In general, intrusion detection strategies can be classified into two categories: signature-based methods and anomaly-based methods. In signature-based IDSs, the attack patterns, or signatures, are already specified and stored. The major drawback of such methods is that the IDS will fail to detect new or previously unseen (zero-day) attacks. In contrast, anomaly-based methods use Machine Learning (ML) techniques [6], such as Deep Learning (DL), to build a model of normal behavior from its features, so that any deviation is detected as an anomaly.

Conventional machine learning techniques require centralizing the learning data on a single machine or in the cloud. However, this brings up various issues, including data privacy [7], significant communication overhead, and high power consumption. Federated Learning (FL) is a new approach that allows knowledge sharing while maintaining privacy and reducing costs. FL allows devices to collectively learn a shared model while keeping all of the learning data on the device, thus separating the machine learning functionality from the requirement of centralized data storage [8]. Each device (or client, in FL jargon) downloads the current model from the FL server, enhances it by learning from the local private data, and then securely communicates the model parameters to the server, where they are averaged with other clients' updates to improve the shared model. FL can considerably reduce privacy and security risks since training data are kept on the client side [9].

II. RELATED WORKS
In this section, we present a review of relevant literature related to FL and ML for intrusion detection in IoT networks.

A. Federated Learning and Federated Edge Learning
Thanks to recent developments, FL is taking an important step towards widespread adoption, as it offers the opportunity to share knowledge across multiple users while preserving privacy. McMahan et al. [8] proposed the FedAvg algorithm, by which an aggregated model (i.e., global model) can be trained while the clients' data is not uploaded to a centralized server. Federated Edge Learning (FEEL) has similar design aspects to FL, except that the server is located at the edge of the network. Mills et al. [10] proposed a communication-efficient FedAvg, called CE-FedAvg, which reduces the required rounds to achieve the desired accuracy, and the total amount of data downloaded per round relative to FedAvg. CE-FedAvg experiments using the MNIST and CIFAR-10 datasets on an edge environment showed that CE-FedAvg reduces the convergence time compared to FedAvg.

B. Intrusion Detection in IoT environments
Increasingly, IoT devices are being deployed in the agricultural sector [2]. However, much of this equipment is vulnerable as a result of flawed design, poor implementation, and bad configuration. Consequently, many Agri-IoT networks contain vulnerable IoT devices that can be compromised with little or no effort. Ferrag et al. [11] proposed a deep learning-based IDS to mitigate DDoS attacks in Agriculture 4.0 environments. Experimental results on the CIC-DDoS2019 and TON_IoT datasets demonstrated significant performance for the proposed system. In [12], the authors proposed a hierarchical IDS named RDTIDS for IoT networks. RDTIDS incorporates a mixture of various classifiers, namely the REP tree, the JRip algorithm, and Forest PA. The experimental results obtained when analyzing the proposed IDS using both the CICIDS2017 and BoT-IoT datasets showed a high detection rate with few false alarms. Integrating Blockchain and IDS has proven to be a successful approach in the last few years. However, all of the preceding studies require the centralization of data for model training and testing, which is a drawback for privacy-sensitive data.

C. Machine Learning for SDN security
Software-Defined Networking (SDN) technology decouples the traditional user plane from the control plane, to enhance the network's overall efficiency and provide better network management and programmability [13]. However, these infrastructures are subject to a range of security threats [14]. Nanda et al. [15] proposed an ML-based model employing four classifiers, namely C4.5, BayesNet, Decision Table, and Naive-Bayes, that were trained using historical data of network attacks, for predicting vulnerable hosts in the SDN. Results showed that the proposed model reaches an accuracy of 91.68% with BayesNet. However, detailed information on the training data was not provided by the authors. Cusack et al. [16] provided an ML-based ransomware detection method in SDN, by gathering packet-based network monitoring data at high rates using programmable forwarding engines, along with training a random forest classifier to detect ransomware. In the experiments, the proposed method achieved a detection accuracy of about 87%. However, it has a high false-negative rate of about 10%, and only detects one type of attack, ransomware.

D. Federated Deep Learning-based IDS for IoT
An ML-based IDS typically depends on adequate amounts of data to build a model. However, the centralization of all data for training and testing may not be feasible in sophisticated environments where there are multiple parties to be monitored, which raises privacy issues and high communication latency [4]. FL ensures data privacy without compromising
the accuracy of the generated models at low latency and communication costs [8]. Deep learning for FL has been investigated and implemented for IDSs. Preuveneers et al. [4] investigated a federated IDS based on deep learning in conjunction with a permissioned blockchain. The authors implemented an autoencoder that was trained on the CICIDS2017 dataset, in addition to the MultiChain platform to deliver full transparency through the distributed learning process. However, compared to more sophisticated neural networks proposed in the literature, the neural network used in the experiment has a quite simple architecture, which makes generalizing the proposed work's results to other applications non-trivial. In addition, the FedAvg algorithm requires customization for each specific application to perform best, potentially impacting the MultiChain blockchain interaction indirectly. Rahman et al. [17] proposed an FL-based IDS for IoT networks. The authors conducted three experiments of centralized, device-based, and federated learning using the NSL-KDD dataset. The experimental evaluation showed that the FL-based IDS has an accuracy of about 83.09%, which is close to the centralized model evaluation. However, the dataset is fairly old, which makes it unsuitable for evaluating IDSs, since new cyber attacks emerge continuously. Nguyen et al. [18] introduced DÏoT, an FL-based IDS for the IoT that uses an automated technique for device type-specific IDS. On average, it detected 95.6% of attacks in 257 ms, with only a few false alarms when tested in a physical implementation involving devices infected with Mirai malware. However, the proposed model is designed to only detect attacks against IoT devices; other threats targeting additional entities in the entire ecosystem, such as sophisticated networking technologies (e.g., SDN) and services (such as FTP and SSH), are not considered. Li et al. [19] proposed an FL-based IDS for Industrial Cyber-Physical Systems (CPSs). Experiments conducted on a real industrial CPS dataset showed good results for the proposed scheme. Schneble et al. [20] implemented an FL-based IDS solution for Medical Cyber-Physical Systems (MCPS). The proposed system was evaluated using a real patient dataset and showed 99.0% accuracy. However, the authors only considered three types of malicious data, namely denial of service, data modification, and data injection. This limits the threat model, and other types of attacks, such as malware attacks, IoT protocol attacks (such as MQTT attacks), application attacks (such as XSS and Brute-force), and service attacks (such as FTP and SSH attacks), remain out of scope. Huong et al. [21] proposed a Low-Complexity Cyberattack Detection in IoT Edge Computing system, named LocKedge. The authors compared the performance of LocKedge to other ML methods using the BoT-IoT dataset. Zhao et al. [22] proposed an FL-aided Long Short-Term Memory framework, named FL-LSTM, for intelligent IDSs. The accuracy rate was about 90%. However, the dataset used for performance evaluation consists of command blocks, which were created from a compiled list of user commands. Therefore, other threats residing in the network traffic are not taken into account, and the dataset cannot be used by network-based IDSs to detect network intrusions.

It is clear from the previous discussion that some gaps remain in the literature, including outdated or contextually inappropriate datasets, privacy issues for centralized models, and limited threat models that only address a subset of attack vectors, all of which must be effectively addressed to secure the agricultural Internet of Things. This work's main contributions are:

• We propose FELIDS, a federated IDS based on deep learning for mitigating cyberattacks on Agri-IoT infrastructures.
• We consider the implementation of three deep learning classifiers, namely Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
• Three recent real-world traffic datasets (CSE-CIC-IDS2018, MQTTset, and InSDN) are used to evaluate the performance of each classifier. The datasets used include a large amount of recent network traffic, both malicious and benign, which further develops the threat model.
• We also provide a detailed performance evaluation along with a thorough comparative analysis between the proposed FELIDS model, the centralized machine learning model, and the state-of-the-art works.

The rest of this paper is organized as follows. Section II reviews the state-of-the-art studies for IoT-based IDS and federated learning-based IDS. Section III provides an overview of the Agri-IoT framework. Section IV presents the implementation of the FELIDS scheme. Section V provides the performance evaluations and comparative studies. Section VI presents the conclusions.

III. AGRI-IOT FRAMEWORK
In this section, we provide an overview of the Agri-IoT framework by describing its architecture and highlighting the threat model.

A. Architecture description
Emerging technologies have been widely applied in Agriculture 4.0. Fig. 2 illustrates an advanced Agri-IoT framework, with both high and low-level architectures. As we can see, the framework integrates various technologies such as smart sensors, robotics, SDN, and Blockchain. These technologies aim to supply the agricultural sector with suitable tools to assist it in decision-making and automation activities by delivering the right combination of materials, knowledge, and services, which results in a better profit.

[Figure: Agri-IoT framework architecture (caption not recovered). Recoverable labels: permissioned Blockchain network (clients, validators, distributed ledger with blocks 0 (genesis) to N and transactions T1 to TN; blockchain node with REST API, interconnect, consensus engine, block management, transactions handling, consensus proxy, state database, transactions processor, P2P interface); SDN network (NorthBound API, distributed core, SouthBound API, applications); cloud network (cloud node hosting the FELIDS server: models aggregation, global model generation, global model update, global model broadcast; virtualization, hardware, virtual switch, server pools); edge network (edge node hosting the FELIDS client: local dataset, local model generation, local model upload, local model update; network security: detection, forward, drop, classification, network privacy); agricultural IoT network (5G base station; IoT device with sensors, actuator, microcontroller, A/D converter, CPU, memory, storage, I/O unit, connectivity, power management).]

Ensuring that the agricultural data is protected and can only be accessed by authorized parties is of great importance. Authentication and data encryption are essential to achieve this assurance. The Agri-IoT framework ensures data confidentiality by:

1) Agri-IoT layer: For an IoT device to collect or send data, it must first authenticate itself to the network. Crypto-capable IoT devices encrypt each piece of data before sending it to the fog layer, while less powerful devices rely on access points or microcontrollers for this task.

2) Edge layer: Edge nodes inter-operate to manage the network resources efficiently or to perform a specific task, and each node is authenticated to the network and secures the communication channel before the collaboration begins.

3) SDN layer: Since OpenFlow can securely run on Transport Layer Security (TLS) [13], it provides encryption and authentication of fog nodes to the SDN controller while receiving network rules from it. The use of an SDN cluster (or distributed SDN controller architecture) is desired in large-scale systems because it overcomes some of the challenges associated with a standalone SDN controller, including the single point of failure (SPOF) problem, i.e., if the primary (and only) SDN controller fails, the entire system stops working. Moreover, in conjunction with other technologies, such as blockchain, with more controllers being used as miners, the number of validated transactions grows, as demonstrated by Barka et al. [23].
4) Blockchain layer: Each Blockchain node or client has a pair of keys used for the authentication and encryption of data, because the architecture is based on a permissioned network, where only known peers are allowed to participate in the system. Blockchain can help network members to track relevant information for effective supply chain management. Ensuring the presence of this information in the supply chain increases the traceability of materials and mitigates losses due to counterfeiting and the gray market. Furthermore, this technology can be used to create a multinational industrial IoT (IIoT) platform, while being effective in mitigating DoS/DDoS threats, message tampering attacks, and authentication delays, as proposed by Rathee et al. [24].

While encryption and authentication are mandatory to secure communications and preserve data confidentiality, these countermeasures fail in the face of authorized internal parties who conduct cybersecurity attacks by abusing their authorization. Our proposed IDS, named FELIDS, is designed to secure the Agri-IoT network against such threats. A detailed system overview will be provided in Section IV.

B. Threat Model
Even though Agriculture 4.0 is considered the new standard, several potential threats may stand in the way of its widespread acceptance and adoption. Some of these threats, such as rough weather conditions, have persisted throughout history. However, there are additional ones that can be traced to the broad adoption of technology, leading to significant security flaws and critical attack vectors.

1) Conventional network-based attacks: in these types of attacks, an adversary A uses the well-known TCP/IP Internet protocol suite to launch attacks against the Agri-IoT infrastructure. Possible examples of such attacks include authentication compromise, traffic jamming, exploitation of the agri-product manufacturing process by malware, commercial fraud by injecting fake product data, compromising vulnerable quality and control systems, SQL and XSS injections against cloud-based applications, and DDoS attacks against the consortium Blockchain.

2) IoT protocol-based attacks: in these types of attacks, A uses IoT protocols, such as MQTT, to target vulnerable devices. Examples of these attacks include, but are not limited to: controlling livestock health-related IoT devices to ruin good reputations, GPS spoofing, compromising surveillance, and controlling farm machinery to delay or make malicious decisions in the field.

3) Sophisticated network-based attacks: in these types of attacks, A uses SDN-based protocols, such as OpenFlow, to carry out attacks on the Agri-IoT framework. Some examples of these attacks include, but are not limited to: OpenFlow protocol-related attacks, DDoS, or rerouting SDN-enabled switches or controllers so that Agri-IoT tasks are delayed or interrupted due to network failure.

IV. FELIDS: PRIVACY-PRESERVING IDS
In this section, we provide a comprehensive and detailed description of the proposed solution by highlighting its design goals, the FL-based approach used, and the system design.

A. Adversary Model and Defense Objectives
An adversary A can be either external, typically a remote entity launching cyberattacks from the internet, such as taking down the entire network, exploiting internet-accessible applications, injecting malicious code into a database, or stealing sensitive data, or an insider, where A may be on the Agri-IoT network, such as an infected IoT device or another entity within the network. For example, IoT malware may carry out attacks against, or execute attacks from, weak devices in the Agri-IoT network to detect and exploit vulnerable IoT devices and services. The main objective of FELIDS is to detect both legacy and recent cyber attacks originating from external and internal entities on the Agri-IoT infrastructure so that appropriate countermeasures can be taken as quickly as possible.

Moreover, we assume the following:

• Edge nodes are not compromised. Edge nodes are the gatekeepers for network security in Agri-IoT infrastructures, so we assume that they are not compromised. If these nodes were compromised, the Agri-IoT network would no longer be protected by them.
• FL aggregators are trustworthy. Since aggregation servers are an essential part of the learning process, a level of trust in the server that coordinates the training is always necessary.
• No default malicious devices. IoT devices may have built-in vulnerabilities when released for the first time by a manufacturer. However, these devices should not be subject to any contamination or infection of any kind in the first use. This will allow devices to generate only legitimate communications, enabling FELIDS to learn from the benign patterns before an adversary A finds and exploits any vulnerabilities.

B. Design Goals
The Agriculture 4.0 value chain is composed of different parties with different operational roles, with data involved at each step of the food chain. However, most of the time this data is private, which makes it unfeasible to centralize all of it for an ML-based IDS to learn from in order to create an effective anomaly detection model. Also, an effective IDS model needs to ensure that benign behavioral patterns are well learned, to distinguish them from malicious actions. However, with the massive number of different IoT device types and applications, network latency issues, limited processing and storage capabilities, etc., the task becomes even more challenging.

Our goal is to develop a decentralized and distributed IDS for Agriculture 4.0, which can benefit from the rich knowledge of Agri-IoT networks without violating data privacy, while keeping costs low.

1) Federated Deep Learning: The most suitable problems for FL are those with the following properties: 1) training on real data from distributed devices offers an advantage over training on centralized data located in one central place, 2) data is privacy sensitive or enormous, and 3) labeling the data can be obtained from distributed devices
for supervised tasks [8]. Therefore, FL is the best approach for our work.

As illustrated in Fig. 3, in a centralized learning environment, every client uploads its data to a centralized deep learning server to train the IDS model. However, in an FL environment, instead of training and evaluating the model on a single machine, all K clients learn a local model that has the same structure but is trained with different local datasets. Then, they communicate these local models to an aggregation server, which aggregates all local models and produces an enhanced global model with optimized parameters.

[Fig. 3 (caption not recovered): centralized learning versus federated learning. Centralized side: clients upload local data to a centralized server, which trains a global model on the centralized dataset. FL side: over FL rounds, clients download the global model, train on local data, upload local model parameters, and receive global model updates from the FL server.]

C. Deep Learning Classifiers
The rapid development of deep learning theory and technology in recent years has opened the door for a new age of ML [25] and provided an entirely new avenue for the development of intelligent IDS technology, since deep learning has considerable value in extracting enhanced data representations for creating efficient models. Although there are several types of neural networks, they always include these fundamental components: neurons, weights, biases, and functions. Also, in general, neural networks have the same purpose of associating an input $x$ with an output $y$, such that $y = f(x, \theta)$, where $\theta$ is the parameter vector. Each connection record is taken as an input record $X \in \mathbb{R}^d$. Each input record $X$ consists of a set of input characteristics $X = (x_1, x_2, \dots, x_d)$. Since we employ a supervised learning scheme, we associate each $X$ with its label $Y$, to indicate whether a record is a normal (benign) communication or an intrusion. Every artificial neuron is a function $f_j$ (Eq. 1) that takes as input $X$, a vector of connection weights $w_j = (w_{j,1}, \dots, w_{j,d})$, a bias $b_j$, and an activation function $g(\cdot)$.

$$f_j(x) = g(\langle w_j, x \rangle + b_j) \tag{1}$$

Rectified Linear Unit (ReLu) (Eq. 2) activation functions are used in the hidden layers:

$$g(x) = \max(0, x) \tag{2}$$

Since we have a multi-class classification, the output layer is a SoftMax layer, which holds one neuron per class $i$, to provide a prediction of $P(Y = i \mid X)$. The total sum of all these values is equal to 1. In order to compute the probability distribution over the classes, the SoftMax function (Eq. 3) calculates the probability for each class as follows:

$$\mathrm{SoftMax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}} \tag{3}$$

FELIDS uses the categorical_crossentropy loss function. The optimization process is achieved with the Adam optimizer [26], with error back propagation. Overfitting prevention is carried out with two methods, namely dropout and L2 regularization. We used three popular deep learning techniques, namely DNN, CNN, and RNN.

1) DNN: is an Artificial Neural Network (ANN) with multiple hidden layers (Eq. 4) between the input layer ($h^{(0)}(x) = x$) and the output layer (Eq. 5).

For $k = 1, \dots, L$:
$$h^{(k)}(x) = \mathrm{ReLu}(W^{(k)} h^{(k-1)}(x) + b^{(k)}) \tag{4}$$

For $k = L + 1$:
$$h^{(L+1)}(x) = \mathrm{SoftMax}(W^{(L+1)} h^{(L)}(x) + b^{(L+1)}) \tag{5}$$

2) CNN: is intended to process data in the form of multiple arrays [25]. It is made up of convolutional layers, pooling layers, and fully connected layers. The role of a convolution layer is to identify local characteristics at distinct locations in the input feature maps with learnable kernels $k_{ij}^{(l)}$, for the connection weights between feature maps $i$ and $j$ at layers $l-1$ and $l$, respectively.

The activation $A_j^{(l)}$ of convolution layer $l$ is computed as follows [27]:

$$A_j^{(l)} = g\Big(\sum_{i=1}^{M^{(l-1)}} A_i^{(l-1)} * k_{ij}^{(l)} + b_j^{(l)}\Big) \tag{6}$$

where $M$ is the number of feature maps, $*$ is the convolution operator, $b$ is the bias, and $g(\cdot)$ is the ReLu activation function. The purpose of the pooling layer is to subsample the characteristic
maps from the previous layer, thereby reducing the settings and calculations required.

3) RNN: is intended to process a sequence of inputs one piece at a time, while holding in its hidden units a so-called "state vector" which implicitly holds the history of all past elements of the sequence [25]. According to Pascanu et al. [28], an RNN can be deepened further using the following: input-to-hidden function, hidden-to-hidden transition, and hidden-to-output function. A standard RNN at time $t$ is formalized as follows: given an input sequence $X = (x_1, \dots, x_T)$, and by computing a hidden state sequence $H = (h_1, \dots, h_T)$ and an output vector sequence $y = (y_1, \dots, y_T)$, both the hidden state $h_t$ and the output $y_t$ at time step $t$ are calculated as follows:

$$h_t = g(x_t W^{(in,hi)} + h_{t-1} W^{(hi,hi)} + b^{(h)}), \qquad y_t = g(h_t W^{(hi,ou)} + b^{(y)}) \tag{7}$$

where $x_t$ is the input at time step $t$, while $W^{(in,hi)}$, $W^{(hi,hi)}$, and $W^{(hi,ou)}$ denote the weight matrices for the input layer to the hidden layer, the hidden layer to the hidden layer, and the hidden layer to the output layer, respectively. In addition, $b$ denotes the bias, and $g(\cdot)$ is the activation function. Long Short-Term Memory (LSTM) is a gated RNN that has cells for learning long time dependencies and for forgetting unrelated information [29]. The formulation of the LSTM hidden layer at time step $t$ is as follows:

$$\begin{aligned}
i_t &= \mathrm{sigmoid}(x_t W^{(x,i)} + h_{t-1} W^{(h,i)} + c_{t-1} W^{(c,i)} + b^{(i)}) \\
f_t &= \mathrm{sigmoid}(x_t W^{(x,f)} + h_{t-1} W^{(h,f)} + c_{t-1} W^{(c,f)} + b^{(f)}) \\
c_t &= f_t c_{t-1} + i_t \tanh(x_t W^{(x,c)} + h_{t-1} W^{(h,c)} + b^{(c)}) \\
o_t &= \mathrm{sigmoid}(x_t W^{(x,o)} + h_{t-1} W^{(h,o)} + c_t W^{(c,o)} + b^{(o)}) \\
h_t &= o_t \tanh(c_t) \\
y_t &= \mathrm{SoftMax}(W h_t + b)
\end{aligned} \tag{8}$$

where $i_t$ is the input gate, $f_t$ is the forget gate, $c_t$ is the memory cell, $o_t$ is the hidden output, and $y_t$ is the output layer, all at time step $t$.

Table I summarizes the values used for the different classifier parameters in the FELIDS scheme.

TABLE I: FELIDS Settings for Deep Learning Classifiers
Classifier | Parameter | Value
DNN | Hidden nodes | 64-80
DNN | Hidden layers | 1-2
DNN | Dropout | 0.1-0.4
CNN | Convolutional layers | 2-3 Conv1D
CNN | Filters | 16-74
CNN | Kernel size | 3
CNN | Pooling layers | 1 Global Average Pooling 1D
CNN | Hidden nodes | 120-130
CNN | Hidden layers | 2-3
CNN | Dropout | 0.1-0.4
RNN | Hidden nodes | 20-80
RNN | Hidden LSTM layers | 2
RNN | Dropout | 0.2
Global | Batch size | 100
Global | Local epochs | 1
Global | Global epochs | 50
Global | Learning rate | 0.01-0.5
Global | Regularization | L2
Global | Loss function | categorical_crossentropy
Global | Activation function | ReLu
Global | Classification function | SoftMax
Global | Optimizer | Adam

D. Learning Process
Initially, a fraction C of the K edge nodes (FELIDS clients) is selected by the FELIDS server to participate in the FL process and perform computation for R FL rounds. Client-server communication is done through a secure gRPC channel, since all data exchanged must be encrypted. gRPC has built-in SSL/TLS support to authenticate both the clients and the server, and also to encrypt the entire communication between them [30]. Then, after all selected clients are connected, the entire process, as illustrated in Fig. 4 and presented in Alg. 1 and Alg. 2, which are adapted from the FedAvg algorithm [8], works as follows:

1) Step 1: at $t = 0$, the FELIDS server generates a generic neural network model with a set of initial weights $w$. At this point, the neural network architecture, the hyperparameters, and both local and global epochs are identified. However, this generic model is random at this stage.

2) Step 2: every FELIDS client $k$, where $k \in [1, \dots, K]$, downloads the generic model from the FELIDS server.

3) Step 3: each of the K FELIDS clients re-trains the generic model locally with its private data, in parallel, and computes a new local set of weights $w_{t+1}^k$ for the newly generated local model. Every client $k$ has a local preprocessed dataset $P$, split into local minibatches of size $B$, giving the set of minibatches $\mathcal{B}$. Client $k$ then computes $w \leftarrow w - \eta \nabla f(w, b)$ over multiple local epochs $E$ before the averaging step, where $\eta$ is the learning rate, $\nabla f(w, b)$ is the average gradient, and $b$ is the minibatch used for that local epoch.

4) Step 4: instead of sharing training data with the server as in centralized learning, FELIDS clients only share the updated model parameters, which were trained on the local data.

5) Step 5: the FELIDS server aggregates these parameters from the different FELIDS clients. Once all these updates are received, the FELIDS server creates a new updated global model by applying the update $w_{t+1} \leftarrow \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k$, where $n_k$ is the number of local examples for client $k$ and $n = \sum_k n_k$.

6) Step 6: the FELIDS server forwards the updated global model parameters to FELIDS clients.

7) Step 7: every FELIDS client applies the updated parameters from the updated global model, and enhances them using its new local data.

Steps 4 to 7 are repeated for ongoing learning and enhancement of the global model.

E. FELIDS Complexity
Since FELIDS is expected to operate in the edge layer, it should have both low time complexity and low power consumption, given that energy and time in computer
[Fig. 4 (caption not recovered): FELIDS learning process between Client 1 to Client n and the FELIDS server over FL rounds. Recoverable labels: private local data, test data, data preparation; 2. generic model sending to clients; 4. upload model to server; 5. clients models aggregation; 7. training global model on local data; server states: idle (waiting for requests) and active.]
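As an illustration only, the learning process of Section IV-D (local minibatch updates followed by the weighted average of Step 5) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the FELIDS implementation: the local model is a single linear SoftMax layer rather than a DNN, CNN, or RNN, plain SGD stands in for Adam, every client participates in every round, the initial weights are zero rather than random, and the toy data as well as all function names (`local_update`, `aggregate`, `fedavg`, `make_client`) are hypothetical.

```python
import math
import random

def softmax(z):
    """SoftMax (Eq. 3), shifted by max(z) for numerical stability."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(w, x):
    """Class probabilities of a linear model; w[c] holds the weights then the bias of class c."""
    return softmax([sum(wi * xi for wi, xi in zip(row[:-1], x)) + row[-1] for row in w])

def local_update(w, data, lr, batch_size, epochs, rng):
    """Step 3: minibatch SGD, w <- w - lr * (average cross-entropy gradient)."""
    w = [row[:] for row in w]  # copy, so the global model is left untouched
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = [[0.0] * len(row) for row in w]
            for x, y in batch:
                p = predict(w, x)
                for c in range(len(w)):
                    err = p[c] - (1.0 if c == y else 0.0)  # gradient w.r.t. logit c
                    for j, xj in enumerate(x):
                        grad[c][j] += err * xj
                    grad[c][-1] += err
            for c in range(len(w)):
                for j in range(len(w[c])):
                    w[c][j] -= lr * grad[c][j] / len(batch)
    return w

def aggregate(client_weights, client_sizes):
    """Step 5: w_{t+1} = sum_k (n_k / n) * w_{t+1}^k."""
    n = sum(client_sizes)
    rows, cols = len(client_weights[0]), len(client_weights[0][0])
    return [[sum(nk / n * wk[c][j] for wk, nk in zip(client_weights, client_sizes))
             for j in range(cols)] for c in range(rows)]

def fedavg(clients, num_features, num_classes, rounds=20, lr=0.5,
           batch_size=10, epochs=1, seed=0):
    rng = random.Random(seed)
    # Step 1: initial global model (zeros here, an assumption made for reproducibility)
    w = [[0.0] * (num_features + 1) for _ in range(num_classes)]
    for _ in range(rounds):
        # Steps 2-4: each client trains a copy of the global model and shares only weights
        updates = [local_update(w, data, lr, batch_size, epochs, rng) for data in clients]
        # Steps 5-6: the server averages the updates and redistributes the result
        w = aggregate(updates, [len(d) for d in clients])
    return w

def accuracy(w, data):
    correct = 0
    for x, y in data:
        p = predict(w, x)
        correct += int(p.index(max(p)) == y)
    return correct / len(data)

def make_client(n, rng):
    """Synthetic two-class records (hypothetical): label 1 when x0 + x1 > 0."""
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [([a, b], int(a + b > 0)) for a, b in pts]

rng = random.Random(42)
clients = [make_client(n, rng) for n in (40, 60, 100)]
global_w = fedavg(clients, num_features=2, num_classes=2)
```

Only the weight lists leave `local_update`, which mirrors Step 4: the raw records in `clients` are never passed to `aggregate`. Clients with more local examples contribute proportionally more to the global model through the `nk / n` factor.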
• Pre-processing: To prevent overfitting, we dropped the following features: 'Flow ID', 'Src IP', 'Src Port', 'Dst IP', and 'Timestamp'. We then encoded the 'Protocol' feature with one-hot encoding and normalized the remaining numerical features with Z-score normalization.
In Table II we report a statistical summary for each attack type in every dataset for both training and testing purposes.
B. Experimental setup
We conducted our experiments on Google Colaboratory [36], with Python 3 as the main programming language. FELIDS is implemented using well-known libraries: NumPy for the manipulation of multi-dimensional arrays and matrices; Pandas for data structure manipulation and analysis tools; TensorFlow and Keras for machine learning and deep learning; Scikit-learn for a wide range of supervised and unsupervised ML algorithms; and SMOTE to oversample data in minority classes. The federated learning part is conducted using the recent Sherpa.ai FL framework [37]. FELIDS training energy consumption is reported using carbontracker [38].
1) Performance Metrics: When analyzing intrusion detection performance, the following metrics are typically used:
• True Positive (TP): reports the number of attack samples that are correctly classified as attacks.
• False Positive (FP): reports the number of benign samples that are wrongly classified as attacks.
• True Negative (TN): reports the number of benign samples that are correctly classified as benign.
• False Negative (FN): reports the number of attack samples that are wrongly classified as benign.
• Accuracy: reports the ratio of the number of correct classifications to the total number of inputs, and is given by:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (14)$$
• Precision: reports the ratio of correct attack classifications to the total number of predicted attacks, and is given by:
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (15)$$
• Recall: reports the ratio of correct attack classifications to the total number of samples that should have been identified as attacks, and is given by:
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (16)$$
• F1-Score: reports the harmonic mean of Precision and Recall, and is given by:
$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (17)$$
2) Objectives: The objectives of our experiments are the following: 1) build and evaluate a centralized model, where the training and test data are located in one place; 2) evaluate our proof-of-concept implementation of the FELIDS scheme; and 3) compare the centralized model and the FELIDS model results in terms of privacy and performance.
classes. Fig. 5 presents the accuracy of deep learning tech-
[Fig. 5: bar chart of classifier accuracy (%).]
[Fig. 6: validation accuracy (%) versus FL round (0 to 50). Legend: Centralized; IID and Non-IID with K = 5, K = 10, and K = 15. Panels: (a) CSE-CIC-IDS2018 dataset, (b) InSDN dataset, (c) MQTTset dataset.]
local data. In addition, individual components of the FELIDS system have been executed in a completely isolated ecosystem to avoid data leakage, and only communicate with the FELIDS server through secured channels. During each of the 50 FL rounds, FELIDS clients perform one local epoch, then send the updated model parameters back to the FELIDS server. All connected FELIDS clients participate in each FL round to simulate the real-life scenario of the Agri-IoT framework.
Fig. 6 presents the validation accuracy for the centralized model and the validation accuracy of all global models across all three client distribution sets (i.e., K = 5, K = 10, and K = 15), with both data distribution sets (i.e., IID and Non-IID) over 50 FL rounds, again using three different deep learning classifiers, over three different datasets. Fig. 6 (a), Fig. 6 (b), and Fig. 6 (c) present the validation accuracies of the centralized model and the global FELIDS models obtained with the CSE-CIC-IDS2018, InSDN, and MQTTset datasets, respectively, using DNN, CNN and RNN.
The main observation is that the accuracy improves with the number of rounds in all FELIDS global models, meaning that all clients improve and benefit from the global model simultaneously. Another important observation is that, in some cases, global models have been able to match or approach the performance of the centralized model.
Fig. 7 and Fig. 8 represent the time and energy consumption of FELIDS after 50 FL rounds of global model training, considering the datasets used, the number of clients (K = 5, 10, 15), and the data distribution technique (IID or Non-IID). This experiment was conducted using an Intel Core i5-6300U CPU @ 2.4GHz, 8.00 GB of RAM, and Ubuntu 20.04.3 LTS. We can see that DNN is the most efficient, with the lowest time and energy consumption, followed by CNN, whereas RNN consumes the most time and energy. Another observation is that the data distribution technique does not have a large impact on time or energy consumption, although the Non-IID experiments usually tend to consume slightly less than their IID counterparts with the same number of clients. From these results, we can conclude that the time and energy consumption mainly depend on two major factors, namely the DL algorithm used and the number of clients involved in the FL training.
In Table IV we present a comparison between the validation accuracy of all global models, best clients, and worst clients, for all client distribution sets, with both data distribution sets, at the first and the 50th FL rounds. In the case of IID, the difference between the performance of the worst and the best client is small, which is expected since all clients were able to learn from all classes. The global models achieved performance close to that of the centralized model while preserving the privacy of the clients' data. For Non-IID, however, the gap is consistently large, since some clients have a very small number of classes. An example is the InSDN dataset with the CNN classifier and K = 10: at the first round, the worst client performance was 0.40%, while the best client was 50.90%, and the global model was 28.53%. Nevertheless, at the 50th round, the worst client was able to improve its performance to 66.55%, the best client reached 99.38%, and the global model reached 99.60%, which is better than the centralized model performance. The worst client was able to benefit from its peers without sharing any of its data, showing that our FELIDS model can be effective and preserve privacy at the same time.
In Table V we provide a comparison between the performance of our work and other FL-based state-of-the-art IDSs. The scope of the comparison covers the targeted deployment, dataset used, machine learning classifier, number of clients, and data distribution methods.
[Fig. 7: time consumption in minutes (axis 0 to 100) of FELIDS after 50 FL rounds for DNN, CNN, and RNN, grouped by 5-IID, 5-NIID, 10-IID, 10-NIID, 15-IID, and 15-NIID; three panels.]
[Fig. 8: energy consumption (axis 0 to 3, scaled by 10^-2) of FELIDS after 50 FL rounds for DNN, CNN, and RNN, grouped by 5-IID, 5-NIID, 10-IID, 10-NIID, 15-IID, and 15-NIID; three panels.]
VI. CONCLUSIONS AND FUTURE WORK
In this paper, we proposed FELIDS, a federated deep learning IDS for the cybersecurity of Agri-IoT networks. Our federated learning experiments on (a) three deep learning classifiers, namely deep neural networks, convolutional neural networks, and recurrent neural networks, and (b) three recent real traffic datasets, namely CSE-CIC-IDS2018, MQTTset, and InSDN, illustrated the practical feasibility of the proposed system. FELIDS achieved performance close to the centralized model in some cases, such as in the CSE-CIC-IDS2018 dataset where the centralized model's accuracy was 93.58%, 94.24%,
and 94.22% for DNN, CNN, and RNN, respectively, compared to 93.29%, 94.09%, and 94.15% for FELIDS while preserving data privacy. Moreover, FELIDS outperforms the centralized model in other cases, such as in the InSDN dataset where the centralized model achieved accuracies of 98.54%, 97.71% and 97.84% for the same DL algorithms, compared to 98.63%, 99.71% and 99.05% for FELIDS. For future work, we aim to improve the model in adversarial settings, where one or more edge nodes in the network may be malicious. In such settings, the adversary can perform data poisoning attacks on federated machine learning so that the global model fulfills the adversary's objective of going undetected. In addition, we will study the efficiency of the proposed system with the recent IoT and IIoT dataset named Edge-IIoTset [39].
REFERENCES
[1] “more people, more food, worse water?” http://www.fao.org/3/ca0146en/CA0146EN.pdf, last accessed 2020-04-13.
[2] O. Friha, M. A. Ferrag, L. Shu, L. Maglaras, and X. Wang, “Internet of things for the future of smart agriculture: A comprehensive survey of emerging technologies,” IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 4, pp. 718–752, 2021.
[3] G. Fortino, C. Savaglio, G. Spezzano, and M. Zhou, “Internet of things as system of systems: A review of methodologies, frameworks, platforms, and tools,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 1, pp. 223–236, 2020.
[4] D. Preuveneers, V. Rimmer, I. Tsingenopoulos, J. Spooren, W. Joosen, and E. Ilie-Zudor, “Chained anomaly detection models for federated learning: An intrusion detection case study,” Applied Sciences, vol. 8, no. 12, p. 2663, 2018.
[5] “Jbs: Cyber-attack hits world’s largest meat supplier,” https://www.bbc.com/news/world-us-canada-57318965, last accessed 2021-06-02.
[6] M. M. Hassan, A. Gumaei, A. Alsanad, M. Alrubaian, and G. Fortino, “A hybrid deep learning model for efficient intrusion detection in big data environment,” Information Sciences, vol. 513, pp. 386–396, 2020.
[7] Q. Yang, Y. Liu, T. Chen, and Y. Tong, “Federated machine learning: Concept and applications,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
[8] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[9] “Federated learning: Collaborative machine learning without centralized training data,” https://ai.googleblog.com/2017/04/federated-learning-collaborative.html, last accessed 2021-05-20.
[10] J. Mills, J. Hu, and G. Min, “Communication-efficient federated learning for wireless edge intelligence in iot,” IEEE Internet of Things Journal, vol. 7, no. 7, pp. 5986–5994, 2019.
[11] M. A. Ferrag, L. Shu, H. Djallel, and K.-K. R. Choo, “Deep learning-based intrusion detection for distributed denial of service attack in agriculture 4.0,” Electronics, vol. 10, no. 11, p. 1257, 2021.
[12] M. A. Ferrag, L. Maglaras, A. Ahmim, M. Derdour, and H. Janicke, “Rdtids: Rules and decision tree-based intrusion detection system for internet-of-things networks,” Future internet, vol. 12, no. 3, p. 44, 2020.
[13] D. Kreutz, F. M. Ramos, P. E. Verissimo, C. E. Rothenberg, S. Azodolmolky, and S. Uhlig, “Software-defined networking: A comprehensive survey,” Proceedings of the IEEE, vol. 103, no. 1, pp. 14–76, 2014.
[14] O. Friha, M. A. Ferrag, L. Shu, and M. Nafa, “A robust security framework based on blockchain and sdn for fog computing enabled agricultural internet of things,” in 2020 International Conference on Internet of Things and Intelligent Applications (ITIA). IEEE, 2020, pp. 1–5.
[15] S. Nanda, F. Zafari, C. DeCusatis, E. Wedaa, and B. Yang, “Predicting network attack patterns in sdn using machine learning approach,” in 2016 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN). IEEE, 2016, pp. 167–172.
[16] G. Cusack, O. Michel, and E. Keller, “Machine learning-based detection of ransomware using sdn,” in Proceedings of the 2018 ACM International Workshop on Security in Software Defined Networks & Network Function Virtualization, 2018, pp. 1–6.
[17] S. A. Rahman, H. Tout, C. Talhi, and A. Mourad, “Internet of things intrusion detection: Centralized, on-device, or federated learning?” IEEE Network, vol. 34, no. 6, pp. 310–317, 2020.
[18] T. D. Nguyen, S. Marchal, M. Miettinen, H. Fereidooni, N. Asokan, and A.-R. Sadeghi, “Dïot: A federated self-learning anomaly detection system for iot,” in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019, pp. 756–767.
[19] B. Li, Y. Wu, J. Song, R. Lu, T. Li, and L. Zhao, “Deepfed: Federated deep learning for intrusion detection in industrial cyber-physical systems,” IEEE Transactions on Industrial Informatics, 2020.
[20] W. Schneble and G. Thamilarasu, “Attack detection using federated learning in medical cyber-physical systems,” in 2019 28th International Conference on Computer Communication and Networks, ICCCN, 2019, pp. 1–8.
[21] T. T. Huong, T. P. Bac, D. M. Long, B. D. Thang, N. T. Binh, T. D. Luong, and T. K. Phuc, “Lockedge: Low-complexity cyberattack detection in iot edge computing,” IEEE Access, vol. 9, pp. 29 696–29 710, 2021.
[22] R. Zhao, Y. Yin, Y. Shi, and Z. Xue, “Intelligent intrusion detection based on federated learning aided long short-term memory,” Physical Communication, vol. 42, p. 101157, 2020.
[23] E. Barka, S. Dahmane, C. A. Kerrache, M. Khayat, and F. Sallabi, “Sthm: A secured and trusted healthcare monitoring architecture using sdn and blockchain,” Electronics, vol. 10, no. 15, p. 1787, 2021.
[24] G. Rathee, F. Ahmad, R. Sandhu, C. A. Kerrache, and M. A. Azad, “On the design and implementation of a secure blockchain-based hybrid framework for industrial internet-of-things,” Information Processing & Management, vol. 58, no. 3, p. 102526, 2021.
[25] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, pp. 436–444, 2015.
[26] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[27] H.-I. Suk, “An introduction to neural networks and deep learning,” in Deep Learning for Medical Image Analysis. Elsevier, 2017, pp. 3–24.
[28] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” arXiv preprint arXiv:1312.6026, 2013.
[29] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[30] “grpc,” https://grpc.io/docs/guides/auth/, last accessed 2021-05-28.
[31] R. Mînea, “A study on privacy-preserving federated learning and enhancement through transfer learning,” 2021.
[32] X. Qiu, T. Parcollet, J. Fernandez-Marques, P. P. B. de Gusmao, D. J. Beutel, T. Topal, A. Mathur, and N. D. Lane, “A first look into the carbon footprint of federated learning,” arXiv preprint arXiv:2102.07627, 2021.
[33] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in ICISSp, 2018, pp. 108–116.
[34] I. Vaccari, G. Chiola, M. Aiello, M. Mongelli, and E. Cambiaso, “Mqttset, a new dataset for machine learning techniques on mqtt,” Sensors, vol. 20, no. 22, p. 6578, 2020.
[35] M. S. Elsayed, N.-A. Le-Khac, and A. D. Jurcut, “Insdn: A novel sdn intrusion dataset,” IEEE Access, vol. 8, pp. 165 263–165 284, 2020.
[36] “Google colaboratory,” https://colab.research.google.com/, last accessed 2021-05-28.
[37] N. Rodríguez-Barroso, G. Stipcich, D. Jiménez-López, J. A. Ruiz-Millán, E. Martínez-Cámara, G. González-Seco, M. V. Luzón, M. A. Veganzones, and F. Herrera, “Federated learning and differential privacy: Software tools analysis, the sherpa.ai fl framework and methodological guidelines for preserving data privacy,” Information Fusion, vol. 64, pp. 270–292, 2020.
[38] L. F. W. Anthony, B. Kanding, and R. Selvan, “Carbontracker: Tracking and predicting the carbon footprint of training deep learning models,” arXiv preprint arXiv:2007.03051, 2020.
[39] M. A. Ferrag, O. Friha, D. Hamouda, L. Maglaras, and H. Janicke, “Edge-iiotset: A new comprehensive realistic cyber security dataset of iot and iiot applications for centralized and federated learning,” 2022.
Othmane Friha received the master's degree in computer science from Badji Mokhtar - Annaba University, Algeria, in 2018. He is currently working toward the Ph.D. degree at the University of Badji Mokhtar - Annaba, Algeria. His current research interests include network and computer security, Internet of Things, and applied cryptography.
Lei Shu received the B.S. degree in computer science from South Central University for Nationalities, China, in 2002, the M.S. degree in computer engineering from Kyung Hee University, South Korea, in 2005, and the Ph.D. degree from the Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland, in 2010. Until 2012, he was a Specially Assigned Researcher with the Department of Multimedia Engineering, Graduate School of Information Science and Technology, Osaka University, Japan. He is currently a Distinguished Professor with Nanjing Agricultural University, China, and a Lincoln Professor with the University of Lincoln, U.K. He is also the Director of the NAU-Lincoln Joint Research Center of Intelligent Engineering. He has published over 400 papers in related conferences, journals, and books in the areas of sensor networks and Internet of Things. His current H-index is 65 and i10-index is 272 in Google Scholar Citation. His current research interests include wireless sensor networks and Internet of Things. He has also served as a TPC member for more than 160 conferences, such as ICDCS, DCOSS, MASS, ICC, GLOBECOM, ICCCN, WCNC, and ISCC. He was a recipient of the 2014 Top Level Talents in Sailing Plan of Guangdong Province, China, the 2015 Outstanding Young Professor of Guangdong Province, and the GLOBECOM 2010, ICC 2013, ComManTel 2014, WICON 2016, SigTelCom 2017 Best Paper Awards, the 2017 and 2018 IEEE Systems Journal Best Paper Awards, the 2017 Journal of Network and Computer Applications Best Research Paper Award, and the Outstanding Associate Editor Award of 2017, and the 2018 IEEE ACCESS. He has also served as a Co-Chair for over 60 international conferences/workshops, such as IWCMC, ICC, ISCC, ICNC, and Chinacom, especially as the Symposium Co-Chair for IWCMC 2012 and ICC 2012, the General Co-Chair for Chinacom 2014, Qshine 2015, Collaboratecom 2017, DependSys 2018, and SCI 2019, and the TPC Chair for InisCom 2015, NCCA 2015, WICON 2016, NCCA 2016, Chinacom 2017, InisCom 2017, WMNC 2017, and NCCA 2018.
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:
Othmane Friha: Conceptualization, Methodology, Software, Writing - Original draft preparation. Mohamed Amine Ferrag: Supervision, Reviewing. Leandros Maglaras: Supervision. Lei Shu: Supervision. Kim-Kwang Raymond Choo: Supervision.