
Information Sciences 568 (2021) 147–162

Contents lists available at ScienceDirect

Information Sciences
journal homepage: www.elsevier.com/locate/ins

A novel IoT network intrusion detection approach based on Adaptive Particle Swarm Optimization Convolutional Neural Network
Xiu Kan a,b,⇑, Yixuan Fan a, Zhijun Fang a, Le Cao a,⇑, Neal N. Xiong c, Dan Yang a, Xuan Li d
a School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
b School of Mathematics, Southeast University, Nanjing 210096, China
c Department of Mathematics and Computer Science, Northeastern State University, OK, USA
d College of Science, Donghua University, Shanghai 201620, China

Article info

Article history:
Received 14 October 2020
Received in revised form 7 March 2021
Accepted 29 March 2021
Available online 6 April 2021

Keywords:
IoT network security
Adaptive Particle Swarm Optimization
Convolutional Neural Network
Attack detection

Abstract

In the field of network security, it is of great significance to accurately detect the various types of Internet of Things (IoT) network intrusion attacks launched by attacker-controlled zombie hosts. In this paper, we propose a novel IoT network intrusion detection approach based on an Adaptive Particle Swarm Optimization Convolutional Neural Network (APSO-CNN). In particular, the PSO algorithm with a varying inertia weight is used to adaptively optimize the structure parameters of a one-dimensional CNN. The cross-entropy loss on the validation set, obtained from the first training cycle of the CNN, is taken as the fitness value of PSO. In addition, we define a new evaluation method that considers both the prediction probability assigned to each category and the prediction label, and use it to compare the proposed APSO-CNN algorithm with a CNN whose parameters are set manually (R-CNN). Meanwhile, the comprehensive performance of APSO-CNN and three other well-known algorithms is compared using five traditional evaluation indicators and the accuracy statistics of 10 independent experiments. The simulation results show that APSO-CNN is effective and reliable for the multi-type IoT network intrusion attack detection task.
© 2021 Elsevier Inc. All rights reserved.

1. Introduction

Nowadays, the rapid development of Internet technology worldwide has also brought about frequent network attacks. Through ransomware (extortion viruses), attackers exploit the vulnerabilities and security defects of network information systems to attack systems and resources and gain access to website data, users' personal information, and so on. In particular, the proliferation of industrial IoT devices tends to expose security and privacy threats, allowing a large number of attackers to spread malicious content, which has prompted extensive research on IoT security. For instance, some researchers discuss the existing attacks and main security problems at each layer of IoT devices [1,2] and further put forward several privacy protection techniques [3,4]. Meanwhile, network security problems inevitably arise in the smart grid because communication facilities are connected to power facilities, see [5–7]. In particular, [7] considers a detection and mitigation scheme that involves the random uncertainty related to communication noise during this process. Moreover, the general network security problems of smart grid infrastructure are summarized in detail, and various classifications of smart grid attacks are provided, see [8,9]. Clearly, these tremendous network threats have caused immense damage and loss to the security of public property and to quality of life. Therefore, research on intrusion detection technology for abnormal network attacks is an essential and urgent task.

⇑ Corresponding authors at: School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China (X. Kan).
E-mail addresses: xiu.kan@sues.edu.cn (X. Kan), caole00012@163.com, caole00012@sues.edu.cn (L. Cao).

https://doi.org/10.1016/j.ins.2021.03.060
0020-0255/© 2021 Elsevier Inc. All rights reserved.
It is well known that machine learning is a multidisciplinary field covering probability theory, statistics, approximation theory, convex analysis, algorithm complexity, etc. Network anomaly intrusion detection is essentially a binary classification task. Many traditional machine learning techniques have been applied to anomalous traffic detection, improving the intrusion detection rate and reducing the false alarm rate, see [10,11]. However, it has been shown in [12–14] that many common traditional machine learning algorithms do not distinguish among the intricate types of intrusion but merely detect abnormal data, and that the preprocessing of the data set is complex and time-consuming. In contrast, deep learning [15,16] can automatically learn non-linear combinations of the features of the original data set, mapping them into a higher-dimensional and more distinguishable feature space, and it can be combined with traditional machine learning classification models to reduce preprocessing time.
Several methods have already been verified on the botnet data set used in this paper. In [17], feature selection is applied to minimize the number of features needed to detect IoT bots with higher accuracy, with a focus on providing interpretable results through decision trees that can produce signatures for contemporary intrusion detection. [18,19] adopt an anomaly-based detection method that models the behavior of normal traffic and train a deep autoencoder to identify distributed denial of service (DDoS) attacks caused by botnets; their experiments show that the anomaly detection algorithm can greatly improve the detection accuracy of abnormal traffic. In [20], an LSTM model is proposed; because this deep learning method requires high computational capability, the authors use a cloud environment to perform the anomaly detection task. Although multiple network attacks are considered there, each attack is treated as abnormal traffic to be distinguished from normal traffic, so the problem is implemented as multiple binary classification tasks.
Among deep learning methods, CNN can automatically extract effective features from the original data feature plane through convolution layers, and it has been widely applied in the image field, see [21–23]. However, many parameters of the network structure must be determined, and finding a suitable structure by manual setting is full of uncertainty and randomness. When such parameters are unknown, a natural way to improve the adaptability of CNN is to use parameter optimization algorithms. To improve the performance of deep learning models, bio-inspired heuristic algorithms such as the genetic algorithm, PSO algorithm, biogeography-based optimization algorithm, ant colony algorithm, and artificial bee colony algorithm [24–27] have been introduced, using their search strategies to find optimized parameters. The PSO algorithm is an optimization algorithm inspired by the foraging behavior of birds, which can obtain the global optimal solution with a high probability. At the same time, however, it is prone to premature convergence and to falling into local extrema. As a result, improvements to the PSO algorithm have been further explored in recent years, see [28–30]. In [31,32], the PSO algorithm was introduced into CNN structure design for hyperparameter optimization in the image domain, which significantly improves network performance and reduces the time cost of manual search.
Motivated by the above discussion, this paper provides a method to optimize the hyperparameters of the CNN structure based on the APSO algorithm, which has been successfully applied to the multi-type network attack detection task. The main contributions of this paper can be summarized in three aspects as follows.

(1) An adaptive inertia weight factor is introduced to improve the original PSO algorithm. Exploiting the global search ability of the adaptive particle swarm, the structure parameters of a one-dimensional CNN are taken as the position parameters of APSO, and the cross-entropy loss on the validation set after the first training cycle of the CNN is taken as the fitness value of APSO.
(2) Besides tracking four indexes in each training cycle, a novel evaluation method is proposed that jointly considers the deviation between the prediction probability and the actual label and the deviation between the prediction label and the actual label. Taking the two deviations as coordinates, the distance from the resulting point to the origin is visualized.
(3) The proposed APSO-CNN algorithm is successfully applied to various attack data launched by IoT devices in a botnet. Compared with SVM, FNN, and R-CNN, its comprehensive performance is evaluated with five traditional multi-classification indicators. Its stability is further verified through the statistical characteristics of the model accuracy obtained from 10 independent experiments.

The rest of this paper is organized as follows. Section 2 discusses the data sources and preprocessing methods for the network attack type detection task. Section 3 introduces the APSO-CNN algorithm, describing in detail the structure of the one-dimensional CNN and the optimization process combined with the APSO algorithm. Section 4 analyzes the simulation results of the APSO-CNN algorithm and compares it with the other three algorithms. Finally, Section 5 summarizes the research and future work of this paper.


2. Data acquisition and preprocessing

2.1. Data source

In this paper, we perform multi-type network attack classification on the public IoT data set created in [18], which was collected from nine IoT devices infected by two notorious botnets, Mirai and BASHLITE. Botnets are spread by attackers and infect a large number of network hosts. Control commands are then sent to the infected zombie hosts through the command and control (C&C) channel to launch various malicious network attacks. The executed network attacks include Ack, COMBO, Junk, Scan, Syn, TCP, UDP, and UDPplain. The process of a zombie virus attack is shown in Fig. 1.
Both normal and malicious traffic data are collected by taking a behavioral snapshot of the hosts and protocols that communicate each packet. The snapshot captures the context of the packet over five time windows: the most recent 100 ms, 500 ms, 1.5 s, 10 s, and 1 min. Each time window summarizes 23 statistical features, including the mean and variance of the outbound packet size from the source IP, the source MAC-IP, the channel, and the socket, as well as the packet count from each of these sources. They also include the magnitude, radius, covariance, and correlation coefficient of both inbound and outbound packet sizes from the channel and socket, and the mean, variance, and count of packet jitter from the channel.

2.2. Data preprocessing

The data set of one Danmini Doorbell device is used for the multi-classification task. First, 10000 records are randomly sampled from each class of data to obtain an original data set with 90000 samples and 115 feature dimensions. Values of the same feature can diverge widely across samples, and abnormally small or large values would mislead model training; the scattered distribution would also affect the training results. Accordingly, the features are standardized by transforming them to zero mean and unit variance.
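The class-balanced sampling step can be sketched with pandas as follows. This is a minimal illustration rather than the authors' code: it assumes the records of each traffic class have already been loaded into separate DataFrames (nine classes here, since 9 × 10000 = 90000), and the helper name `build_balanced_set` is hypothetical.

```python
import pandas as pd


def build_balanced_set(frames: dict, per_class: int = 10000, seed: int = 0) -> pd.DataFrame:
    """Draw `per_class` random rows from each class DataFrame and stack them.

    `frames` maps a class label to a DataFrame whose columns are the 115
    traffic features (hypothetical layout). A `label` column is appended.
    """
    parts = []
    for label, df in frames.items():
        # Sampling without replacement keeps every selected record distinct.
        sample = df.sample(n=per_class, random_state=seed).copy()
        sample["label"] = label
        parts.append(sample)
    # ignore_index gives the combined set a clean 0..N-1 index.
    return pd.concat(parts, ignore_index=True)
```

With nine classes of at least 10000 rows each, the result has 90000 rows: the 115 feature columns plus the label.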
Denote the original data matrix by $X_{n\times p}$, where n and p represent the number of samples and the number of features, respectively. The main process of standardization can be described as follows:

(1) Calculate the mean of each feature:
$$\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ji}, \tag{1}$$
where $\mu_j$ denotes the mean of the j-th feature, $x_{ji}$ denotes the j-th feature of the i-th sample, and n is the number of samples.
(2) Calculate the variance of each feature:
$$\sigma_j^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_{ji} - \mu_j\right)^2, \tag{2}$$
where $\sigma_j^2$ denotes the variance of the j-th feature.
(3) Calculate the standardized features of each sample:
$$\tilde{x}_{ji} = \frac{x_{ji} - \mu_j}{\sigma_j}, \tag{3}$$
where $\tilde{x}_{ji}$ denotes the standardized j-th feature of the i-th sample.

Fig. 1. Zombie virus attack process.
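Eqs. (1)–(3) translate directly into NumPy. The sketch below is illustrative; the function name `standardize` and the handling of constant features (σ = 0) are assumptions, since the text does not address that case.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each column of an (n, p) matrix following Eqs. (1)-(3)."""
    n = X.shape[0]
    mu = X.sum(axis=0) / n                       # Eq. (1): per-feature mean
    var = ((X - mu) ** 2).sum(axis=0) / n        # Eq. (2): variance, divided by n
    sigma = np.sqrt(var)
    # Assumption: constant features (sigma == 0) are mapped to 0 instead of NaN.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma                      # Eq. (3)
```

Because Eq. (2) divides by n, this matches `X.std(axis=0)` with NumPy's default `ddof=0`.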

The 90000 standardized records are divided into 81000 training samples and 9000 validation samples, which are used to train the model and to verify its validity, respectively.
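A random 81000/9000 split can be sketched as follows. The text does not say whether the split is stratified by class, so this version assumes a plain random permutation; the helper name `train_val_split` is hypothetical.

```python
import numpy as np


def train_val_split(X, y, n_val=9000, seed=0):
    """Shuffle row indices once and hold out `n_val` rows for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]
```

Applied to the 90000 standardized records, this yields 81000 training and 9000 validation samples.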

3. Novel APSO-CNN detection algorithm

3.1. One-dimensional CNN model

CNN is a neural network with a multi-layer structure, and each layer of the network consists of several two-dimensional planes [33]. The output of each neuron is obtained by applying an activation function to the weighted sum of the elements in the previous layer. In this paper, we use a one-dimensional CNN built with Keras to detect the types of abnormal network attacks. The network structure is shown in Fig. 2.
(1) The first layer is the input layer.
Since each network traffic record consists of 23 statistical features computed in each of 5 time windows, there are 5 main feature groups. Accordingly, the input of each sample is set to a 23 × 5 feature plane. The working process of the convolution kernel is shown in Fig. 3.
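Turning each 115-feature row into a 23 × 5 plane is a reshape, but the right reshape depends on the column order, which the text does not specify. The sketch below is therefore hypothetical and supports two assumed layouts. Placing the 23 statistics on the sequence axis and the 5 windows on the channel axis is also an assumption, chosen to match a Keras-style `(steps, channels)` input.

```python
import numpy as np

N_STATS, N_WINDOWS = 23, 5   # 23 statistics x 5 time windows = 115 features


def to_feature_plane(X: np.ndarray, window_major: bool = True) -> np.ndarray:
    """Reshape (n, 115) rows into (n, 23, 5) planes for a 1-D convolution.

    window_major=True assumes all 23 statistics of window 1 come first, then
    those of window 2, and so on. window_major=False assumes the five window
    values of each statistic are adjacent. Both layouts are assumptions.
    """
    n = X.shape[0]
    if window_major:
        # (n, 5, 23) -> swap the last two axes so statistics form the sequence axis.
        return X.reshape(n, N_WINDOWS, N_STATS).transpose(0, 2, 1)
    return X.reshape(n, N_STATS, N_WINDOWS)
```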
The output of the j-th neuron on the i-th feature plane in the C1 layer, denoted by $\mathrm{c1}^{out}_{ij}$, is given as follows:
$$\mathrm{c1}^{out}_{ij} = F\left(\sum_{t=1}^{fl\times 5} w^{in}_{t}\cdot \mathrm{f{-}raw}_{t}\right), \tag{4}$$
where $w^{in}_{t}$ denotes the weight at the t-th position of the convolution kernel, fl denotes the length of the filter, and $\mathrm{f{-}raw}_{t}$ denotes the input-layer feature at the position of the feature plane that corresponds to this kernel weight. In this paper, three types of nonlinear activation functions, sigmoid, tanh, and relu, denoted by $F_i(\cdot)$ (i = 1, 2, 3), are used:
$$F_1(x) = \mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}, \tag{5}$$
$$F_2(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \tag{6}$$
$$F_3(x) = \mathrm{relu}(x) = \max(0, x). \tag{7}$$
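For a single filter, Eq. (4) and the activations in Eqs. (5)–(7) can be sketched in NumPy as below. This is illustrative only. Stride 1 and "valid" padding are assumptions. As in deep learning libraries, the operation is a cross-correlation (no kernel flip). Eq. (4) as written has no bias term, so none is added. The function names are hypothetical.

```python
import numpy as np


def sigmoid(x):                       # Eq. (5)
    return 1.0 / (1.0 + np.exp(-x))


def tanh(x):                          # Eq. (6); np.tanh computes the same ratio without overflow
    return np.tanh(x)


def relu(x):                          # Eq. (7)
    return np.maximum(0.0, x)


def conv1d_single_filter(plane, kernel, act=relu):
    """One C1 feature map per Eq. (4).

    plane:  (L, C) input, e.g. the 23 x 5 feature plane (L steps, C channels).
    kernel: (fl, C) weights w^in_t, so each output sums fl*C products.
    Output position j covers plane[j:j+fl] (stride 1, 'valid' padding).
    """
    fl = kernel.shape[0]
    steps = plane.shape[0] - fl + 1
    out = np.array([np.sum(kernel * plane[j:j + fl]) for j in range(steps)])
    return act(out)
```

With the 23 × 5 plane and filter length fl, each filter produces 23 − fl + 1 outputs.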

Fig. 2. Structure model of one-dimensional convolution neural network.


Fig. 3. One dimensional convolution process.

The convolution layer extracts deeper, higher-level features of the original data, and the resulting feature plane is flattened before being connected to the next layer. During training, the network may achieve high accuracy on the training set but low accuracy on the validation set, i.e., overfitting. In [34], Hinton proposed that half of the feature detectors be randomly switched off in each training step, which improves the generalization ability of the network; this technique is known as dropout. With dropout, each neuron stops working with a certain probability, so a different set of neurons is active for each batch of training samples. Because not all neurons work at the same time, the generalization ability of the network is improved.
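Dropout can be sketched as a random mask. This version uses the common "inverted" variant, which scales the surviving units by 1/keep_prob at training time so that no rescaling is needed at inference. The argument is a keep probability, matching how the C1-F2-dropout hyperparameter is defined later in this section. The function name is hypothetical. Note that Keras' `Dropout(rate)` takes the fraction to *drop*, so a keep probability p corresponds to `rate = 1 - p`.

```python
import numpy as np


def dropout(h, keep_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability 1 - keep_prob."""
    if not training:
        return h                                  # all neurons work at inference
    mask = rng.random(h.shape) < keep_prob        # True = neuron keeps working this batch
    return h * mask / keep_prob                   # rescale to preserve the expected value
```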
(2) The second layer is the first fully connected layer (F2 layer).
The number of neurons and the activation function are set for the fully connected layer. In each training batch, each neuron in this layer is connected to the neurons of the previous layer that are still working. The output of the m-th neuron in the F2 layer, $\mathrm{full1}^{out}_{m}$, is given as follows:
$$\mathrm{full1}^{out}_{m} = F\left(\sum_{n=1}^{n_{keep}} w^{keep}_{mn}\cdot \mathrm{keep}^{f2}_{n} + b_m\right), \tag{8}$$
where $n_{keep}$ denotes the number of neurons remaining after flattening and dropout, $w^{keep}_{mn}$ denotes the connection weight between the n-th remaining working neuron of the previous layer and the m-th neuron of the F2 layer, $\mathrm{keep}^{f2}_{n}$ denotes the output of the n-th remaining working neuron, and $b_m$ denotes the bias of the m-th neuron of the F2 layer.
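Eq. (8) computes all F2 outputs at once as a matrix-vector product. The sketch below shows this together with the flattening of the C1 output. It is illustrative: `flatten` and `dense_forward` are hypothetical names, and `keep` is taken to be the vector of neurons still working after dropout, as in Eq. (8).

```python
import numpy as np


def flatten(feature_maps):
    """Flatten a (steps, n_filter) C1 output into the vector fed to F2."""
    return feature_maps.reshape(-1)


def dense_forward(keep, W, b, act):
    """Eq. (8) for every m at once: act(W @ keep + b).

    keep: (n_keep,) outputs of the remaining working neurons
    W:    (m, n_keep) weights w^keep_{mn};  b: (m,) biases b_m
    """
    return act(W @ keep + b)
```

The F3 layer uses the same computation with its own weights, biases, and activation.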
(3) The third layer is the second fully connected layer (F3 layer).
The number of F3 neurons and their activation function are set initially. Each neuron in this layer is connected to the previous layer in the same way as in the F2 layer. The two fully connected layers are designed to better learn non-linear combinations of the flattened features, i.e., of the high-level features extracted by the convolution layer, through weighted connections.
(4) The fourth layer is the output layer.
The output of F3 is passed to the output layer, whose number of neurons is determined by the number of categories in the multi-classification task. In this paper, softmax is used to classify the various types of network attacks; that is, the output layer gives the probability of each type through the softmax formula:
$$\mathrm{softmax}(y)_i = \frac{e^{y_i}}{\sum_{j=1}^{kind} e^{y_j}}, \tag{9}$$
where $y_i$ denotes the output value of the i-th neuron in the output layer and kind denotes the number of network attack types. During training, the class with the highest probability is the prediction result. The cross-entropy loss function $H(p, q)$ is used to measure the distance between the actual probability distribution $p(x)$ and the predicted probability distribution $q(x)$:
$$H(p, q) = -\sum_{x} p(x)\log q(x). \tag{10}$$
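Eqs. (9) and (10) can be sketched as follows. Subtracting the maximum before exponentiating does not change the softmax result but avoids overflow. Clipping q before the logarithm is an added safeguard against log(0) and is not part of Eq. (10). Averaging over samples is an assumption about how the per-batch loss is reported.

```python
import numpy as np


def softmax(y):
    """Eq. (9), applied along the last axis."""
    e = np.exp(y - np.max(y, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def cross_entropy(p, q, eps=1e-12):
    """Eq. (10) averaged over samples; p is the one-hot label, q the prediction."""
    q = np.clip(q, eps, 1.0)
    return float(np.mean(-np.sum(p * np.log(q), axis=-1)))
```

For a one-hot p, the loss reduces to −log of the probability assigned to the true class. A uniform three-class prediction therefore costs log 3.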


Finally, inspired by previous work, the choice of optimizer for model training is considered. In [35,36], optimizers were compared on the CIFAR-10 data set using a VGG + BN + Dropout network and a DenseNet architecture; SGD achieved the best test accuracy, whereas Adam led to a generalization gap. In short, although Adam combines first-order and second-order momentum and converges faster than SGD, its final result is not as good as that of SGD. CIFAR-10 contains 60000 color images in 10 classes; like the recognition of multiple network attack types in this paper, it is a multi-classification task. Therefore, SGD is selected to optimize the training process for better detection accuracy. To overcome the frequent oscillation of SGD in local optimal gullies, first-order momentum set to 0.9 is introduced, and the learning rate is dynamically decayed to improve training. Based on this analysis, the optimizer adopts the SGD method, which adjusts the connection weights and biases of the neurons according to the cross-entropy loss with a given learning rate. As network training continues, the final classification accuracy improves.
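The update rule of SGD with momentum 0.9 can be sketched on a one-dimensional problem. The velocity form `v ← momentum·v − lr·g; w ← w + v` is the one used by Keras' SGD optimizer. The time-based decay `lr / (1 + decay·t)` is an assumption, because the text only says the learning rate is "dynamically attenuated". The default `lr=0.28` is taken from Table 3, and `decay=1e-3` is a hypothetical value.

```python
def sgd_momentum(grad_fn, w0, lr=0.28, momentum=0.9, decay=1e-3, steps=200):
    """Minimize a 1-D function with SGD + momentum and time-based lr decay."""
    w, v = float(w0), 0.0
    for t in range(steps):
        lr_t = lr / (1.0 + decay * t)          # assumed decay schedule
        v = momentum * v - lr_t * grad_fn(w)   # momentum accumulates past gradients
        w = w + v
    return w
```

In Keras, the same optimizer corresponds to `tf.keras.optimizers.SGD(learning_rate=..., momentum=0.9)`, with decay supplied through a learning-rate schedule.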

3.2. APSO algorithm

The PSO algorithm is an evolutionary algorithm proposed by Kennedy and Eberhart, inspired by the foraging behavior of birds [37]. The process of birds searching for food can be compared to particles searching for the optimal solution. Without knowing the optimal fitness value, the current global best fitness value and the local best fitness value of each particle determine its velocity, which moves the whole swarm toward the optimal solution (i.e., the minimum fitness value).
Each particle has two parameters: a position $x_i^k = [x_{i1}^k, x_{i2}^k, x_{i3}^k, \ldots, x_{id}^k]$ and a velocity $v_i^k = [v_{i1}^k, v_{i2}^k, v_{i3}^k, \ldots, v_{id}^k]$. During the iteration, the velocity and position of each particle are updated as follows:
$$v_{id}^{k+1} = \omega v_{id}^{k} + c_1 r_1\left(pbest_{id} - x_{id}^{k}\right) + c_2 r_2\left(gbest_{d} - x_{id}^{k}\right), \tag{11}$$
$$x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}, \tag{12}$$
where $v_{id}^{k+1}$ and $v_{id}^{k}$ denote the d-th component of the velocity of the i-th particle in iterations k+1 and k; $x_{id}^{k+1}$ and $x_{id}^{k}$ denote the d-th component of the position of the i-th particle in iterations k+1 and k; $pbest_{id}$ denotes the d-th component of the local optimal position of the i-th particle (the best position it has found so far); $gbest_d$ denotes the d-th component of the global optimal position of the population; ω denotes the inertia weight; k denotes the current iteration number; $c_1$ and $c_2$ denote the acceleration coefficients, called the cognitive and social parameters, respectively; and $r_1$ and $r_2$ are two random numbers uniformly distributed over [0, 1].
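Eqs. (11) and (12) can be applied to the whole swarm at once with array operations. In this sketch, `r1` and `r2` are drawn independently for every particle and dimension. That is a common choice, but an assumption here, since the text only states they are uniform on [0, 1]. The function name `pso_step` is hypothetical.

```python
import numpy as np


def pso_step(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """One velocity/position update, Eqs. (11)-(12), for the whole swarm.

    x, v, pbest: (n_particles, d) arrays; gbest: (d,) array.
    w: scalar or (n_particles,) array of per-particle inertia weights.
    """
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(x.shape)                  # r1, r2 ~ U[0, 1]
    r2 = rng.random(x.shape)
    if np.ndim(w):                            # per-particle weights -> column vector
        w = np.asarray(w, dtype=float).reshape(-1, 1)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (11)
    x_new = x + v_new                                                # Eq. (12)
    return x_new, v_new
```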
When the inertia weight is too large, particles can easily escape the current local optimum, but they are not guaranteed to converge in late iterations. When the inertia weight is too small, particles easily fall into local optima. It is therefore necessary to adjust the inertia weight adaptively. A dynamic-inertia-weight adaptive particle swarm optimization algorithm observes that the swarm tends to gather at one or a few specific positions, which indicates that changes in fitness value affect the direction of the whole swarm. Based on this characteristic, an APSO algorithm with dynamic inertia weight is introduced, which adjusts the inertia weight adaptively according to the fitness value:
$$\omega = \begin{cases} \omega_{min} - \left(\omega_{max} - \omega_{min}\right)\cdot\dfrac{f_{cur} - f_{min}}{f_{avg} - f_{min}}, & f_{cur} \le f_{avg},\\[2ex] \omega_{max}, & f_{cur} > f_{avg}, \end{cases} \tag{13}$$
where $f_{cur}$ denotes the fitness value of the current particle, $f_{avg}$ denotes the average fitness value of the current population, and $f_{min}$ denotes the smallest particle fitness value in the current population.
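Eq. (13) can be evaluated for every particle at once, as sketched below. The formula is implemented exactly as printed. Returning ω_min when all particles share the same fitness (so f_avg = f_min) is an assumption, because Eq. (13) is undefined in that case. The function name is hypothetical.

```python
import numpy as np


def adaptive_inertia(f, w_min=0.4, w_max=0.9):
    """Per-particle inertia weight from Eq. (13) for a minimization problem.

    Particles worse than the swarm average (f > f_avg) get w_max to keep
    exploring; the rest get w_min - (w_max - w_min)(f - f_min)/(f_avg - f_min).
    """
    f = np.asarray(f, dtype=float)
    f_avg, f_min = f.mean(), f.min()
    denom = f_avg - f_min
    if denom == 0:                     # assumption: identical fitness -> w_min
        return np.full_like(f, w_min)
    w_low = w_min - (w_max - w_min) * (f - f_min) / denom
    return np.where(f <= f_avg, w_low, w_max)
```

With the settings used in Section 4 (ω_min = 0.4, ω_max = 0.9), the printed formula gives 0.4 at f_min and decreases linearly toward 2ω_min − ω_max = −0.1 at f_avg.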

3.3. APSO-CNN algorithm

In this paper, the APSO algorithm is used to optimize the structural parameters of the one-dimensional CNN and to find hyperparameters suited to the network attack type detection task, avoiding the high labor cost of manual tuning. First, the parameters of each layer of the one-dimensional CNN form the position parameters of a particle, and each dimension of the position parameters is initialized. The parameter setting ranges of the particle swarm are shown in Tables 1 and 2.
Here, C1-n-filter is the number of convolution kernels in the C1 layer, C1-filter-length is the filter length in the C1 layer, C1-act is the activation function type in the C1 layer, C1-F2-dropout is the probability that a node between the C1 and F2 layers keeps working, F2-neuron is the number of neurons in the F2 layer, F2-act is the activation function type in the F2 layer, F3-neuron is the number of neurons in the F3 layer, F3-act is the activation function type in the F3 layer, batch-size is the size of each training batch, and learning-rate is the step size used to update the weights during backpropagation.

Table 1
Position parameter value ranges.

Position        Hyperparameter       Particle value range
$x_{i1}^k$      C1-n-filter          100–600 (int)
$x_{i2}^k$      C1-filter-length     1–5 (int)
$x_{i3}^k$      C1-act               sigmoid (0), tanh (1), relu (2)
$x_{i4}^k$      C1-F2-dropout        0.4–0.8 (float)
$x_{i5}^k$      F2-neuron            256–1024 (int)
$x_{i6}^k$      F2-act               sigmoid (0), tanh (1), relu (2)
$x_{i7}^k$      F3-neuron            256–1024 (int)
$x_{i8}^k$      F3-act               sigmoid (0), tanh (1), relu (2)
$x_{i9}^k$      batch-size           16–300 (int)
$x_{i10}^k$     learning-rate        0.01–1 (float)

Table 2
Initial velocity parameter setting ranges.

Velocity        Particle initial value setting range
$v_{i1}^1$      1–20 (int)
$v_{i2}^1$      0–1 (float)
$v_{i3}^1$      0–1 (float)
$v_{i4}^1$      0–1 (float)
$v_{i5}^1$      1–100 (int)
$v_{i6}^1$      0–1 (float)
$v_{i7}^1$      1–100 (int)
$v_{i8}^1$      0–1 (float)
$v_{i9}^1$      1–20 (int)
$v_{i10}^1$     0–1 (float)
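Before a particle can be evaluated, its continuous 10-dimensional position has to be turned into concrete CNN hyperparameters. The sketch below shows one way to do this using the Table 1 ranges. Clipping out-of-range components and rounding integer ones are assumptions about how the authors handle fractional or out-of-range positions, and `BOUNDS` and `decode_position` are hypothetical names.

```python
import numpy as np

ACTS = ("sigmoid", "tanh", "relu")   # activation codes 0, 1, 2 in Table 1

# (name, low, high, is_int) for x_i1 ... x_i10, from Table 1
BOUNDS = [
    ("C1-n-filter", 100, 600, True),
    ("C1-filter-length", 1, 5, True),
    ("C1-act", 0, 2, True),
    ("C1-F2-dropout", 0.4, 0.8, False),
    ("F2-neuron", 256, 1024, True),
    ("F2-act", 0, 2, True),
    ("F3-neuron", 256, 1024, True),
    ("F3-act", 0, 2, True),
    ("batch-size", 16, 300, True),
    ("learning-rate", 0.01, 1.0, False),
]


def decode_position(x):
    """Map a 10-D particle position to a dict of CNN hyperparameters."""
    params = {}
    for value, (name, lo, hi, is_int) in zip(x, BOUNDS):
        value = float(np.clip(value, lo, hi))      # keep inside the Table 1 range
        if is_int:
            value = int(round(value))              # integer hyperparameters
        params[name] = ACTS[value] if name.endswith("-act") else value
    return params
```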

Algorithm 1 outlines the procedure for searching for the optimal structure parameters of the CNN described above. The position parameters of each particle correspond to one CNN structure, and the best structure parameters are found through continuous iterative search. To detect all kinds of network attacks effectively in practice, a model with good prediction performance must be trained. A loss function is generally used to evaluate the difference between the real and predicted values: the lower the difference, the better the model. Cross-entropy, a log-likelihood-based loss, is often used in binary and multi-class classification tasks. In the CNN model, the actual labels are one-hot encoded, and the prediction probability of each category is obtained through the softmax layer. Therefore, to make the difference between the real distribution and the predicted probability distribution as small as possible, the cross-entropy loss function is used to train the model. Meanwhile, to speed up the search, we seek a low cross-entropy loss already in the first training cycle. Based on these considerations, the cross-entropy value of the CNN after the first training cycle is used as the fitness value for evaluating particle quality. Through APSO, the structure parameters of each network layer that minimize this loss are found. The flowchart of the APSO-CNN algorithm is shown in Fig. 4.

Algorithm 1 The algorithm for searching for the optimal hyperparameters
1. Initialize:
   Set the position parameters $x_{id}^k$ according to Table 1;
   Set the velocity parameters $v_{id}^k$ according to Table 2;
2. while k < maximum iteration do
3.    Calculate the fitness values and determine $f_{cur}$, $f_{avg}$ and $f_{min}$;
4.    Update the local optimal position $pbest_{id}$ and the global optimal position of all particles $gbest_d$ according to the fitness values;
5.    Update the inertia weight ω according to (13);
6.    Update the velocity $v_{id}^{k+1}$ and position $x_{id}^{k+1}$ according to (11) and (12);
7.    k = k + 1;
8. end while
9. Output: the optimal hyperparameters.
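Algorithm 1 can be sketched end to end in NumPy with a generic fitness callable. In APSO-CNN, the fitness would decode the position, train the CNN for one cycle, and return the validation cross-entropy. Here any function works, so the loop can be tried on a toy objective. The defaults follow the settings reported in Section 4 (30 iterations, 20 particles, c1 = c2 = 2, ω between 0.4 and 0.9). Clipping positions back into the search box is an assumption not stated in Algorithm 1, and `apso` is a hypothetical name.

```python
import numpy as np


def apso(fitness, lo, hi, v_lo, v_hi, n_particles=20, max_iter=30,
         c1=2.0, c2=2.0, w_min=0.4, w_max=0.9, seed=0):
    """Sketch of Algorithm 1: minimize `fitness` over the box [lo, hi]."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))        # step 1, cf. Table 1
    v = rng.uniform(v_lo, v_hi, size=(n_particles, lo.size))    # step 1, cf. Table 2
    pbest, pbest_f = x.copy(), np.full(n_particles, np.inf)
    gbest, gbest_f = x[0].copy(), np.inf
    for _ in range(max_iter):                                   # step 2
        f = np.array([fitness(p) for p in x])                   # step 3
        better = f < pbest_f                                    # step 4: personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        if f.min() < gbest_f:                                   # step 4: global best
            gbest, gbest_f = x[f.argmin()].copy(), float(f.min())
        f_avg, f_min = f.mean(), f.min()                        # step 5, Eq. (13)
        denom = f_avg - f_min if f_avg > f_min else 1.0
        w = np.where(f <= f_avg,
                     w_min - (w_max - w_min) * (f - f_min) / denom, w_max)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)       # step 6, Eq. (11)
        v = w[:, None] * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)                              # Eq. (12) + clipping
    return gbest, gbest_f                                       # step 9
```

For example, `apso(lambda p: float(np.sum((p - 0.3) ** 2)), lo=[0, 0], hi=[1, 1], v_lo=0.0, v_hi=0.1)` searches a 2-D box for the point (0.3, 0.3).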


Fig. 4. Flowchart of the APSO-CNN algorithm.

As can be seen in Fig. 4, the first step is to initialize the position parameters of the APSO-CNN algorithm according to Table 1. In the second step, the fitness value of each particle in the current iteration is obtained by computing the cross-entropy loss after the first training cycle, using the 10 position parameters of each particle. The best fitness value of each particle and the global best fitness value are saved, and the average fitness value of the particles is computed. In the third step, the inertia weight is adaptively updated according to formula (13). In the fourth step, the velocity and position parameters are updated according to formulas (11) and (12). In the fifth step, the iteration counter is increased by one. If the maximum number of iterations has not been reached, the process returns to the second step; otherwise, the iteration stops and the minimum fitness value is output.

4. Simulation and discussion of the APSO-CNN algorithm

In this paper, the parameters of our algorithm are set as follows: the maximum number of iterations of the particle swarm is 30, the population size is 20, the learning factors are $c_1 = c_2 = 2$, the minimum inertia weight is $\omega_{min} = 0.4$, and the maximum inertia weight is $\omega_{max} = 0.9$. The change of the minimum fitness value with the number of iterations in the APSO-CNN algorithm is shown in Fig. 5.
After iterative optimization, the best fitness value of the particles is 0.18677; that is, this particle's position parameters give the lowest validation loss after the first training cycle. Each component of the best particle's position determines one hyperparameter, and the resulting optimal hyperparameters of the CNN are shown in Table 3.
Substituting the parameters in Table 3 into the network structure described in Section 3.1 gives the training-set cross-entropy loss, training-set accuracy, validation-set cross-entropy loss, and validation-set accuracy for each training cycle. Comparing these four indexes with those of R-CNN, trained with manually set parameters, shows the advantages of the proposed method. Early stopping is used in the training program to speed up the training process: if the validation accuracy does not improve within one training cycle, training stops and the best network model is saved. The comparison results are shown in Figs. 6–9.
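The early-stopping rule (stop once validation accuracy fails to improve for one cycle, keep the best model) can be sketched as a function over an accuracy history. This is illustrative: `early_stop` is a hypothetical helper, and epochs are counted from 1 to match Figs. 6–9. In Keras, a comparable setup is `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=1, restore_best_weights=True)`.

```python
def early_stop(val_acc, patience=1):
    """Return (stop_epoch, best_epoch), 1-based, for a validation-accuracy history.

    Training stops once accuracy has not improved for `patience` consecutive
    epochs, and the model from best_epoch is the one kept.
    """
    best, best_epoch, wait = float("-inf"), 0, 0
    for epoch, acc in enumerate(val_acc, start=1):
        if acc > best:
            best, best_epoch, wait = acc, epoch, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_acc), best_epoch
```

For example, a history whose accuracy first drops in the fifth cycle stops at epoch 5 and keeps the model from epoch 4.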

[Figure: Min-fitness value versus Iteration (0–30).]

Fig. 5. Iterative results of APSO-CNN algorithm.

Table 3
The optimal hyperparameters of CNN.

Position   Hyperparameter       Optimized value
$x_1$      C1-n-filter          540
$x_2$      C1-filter-length     1
$x_3$      C1-act               relu
$x_4$      C1-F2-dropout        0.4
$x_5$      F2-neuron            654
$x_6$      F2-act               relu
$x_7$      F3-neuron            326
$x_8$      F3-act               tanh
$x_9$      batch-size           171
$x_{10}$   learning-rate        0.28

[Figure: Train-loss versus epoch (1–6) for R-CNN and APSO-CNN.]

Fig. 6. Cross-entropy loss function in training set.


[Figure: Train-accuracy versus epoch (1–6) for R-CNN and APSO-CNN.]

Fig. 7. Accuracy in training set.

[Figure: Validation-loss versus epoch (1–6) for R-CNN and APSO-CNN.]

Fig. 8. Cross-entropy loss function in validation set.

It can be clearly observed that, compared with R-CNN, the proposed APSO-CNN algorithm performs better in every respect. First, as shown in Figs. 6 and 7, the cross-entropy loss of APSO-CNN on the training set is far lower than that of R-CNN in every training epoch, while its accuracy is higher. This indicates that APSO-CNN trains more effectively. Likewise, Figs. 8 and 9 show how the cross-entropy loss and the accuracy change on the validation set: the loss follows a downward trend, and APSO-CNN again attains a lower loss and a higher accuracy than R-CNN in every epoch. A lower cross-entropy loss means that the predicted values are closer to the actual values, and a higher accuracy means a higher overall detection rate, which verifies the prediction performance of the model trained by the proposed algorithm. Although, as shown in the previous sections, a small cross-entropy loss generally goes with a high accuracy, what matters in practice is a model with a high overall detection rate of network attacks. Hence, whether the validation accuracy improves in an epoch is used as the condition for stopping training. Under this early-stopping setting, the validation accuracy of R-CNN decreased in the fifth training epoch, whereas that of APSO-CNN decreased in the sixth. At that point training is stopped and the model from the previous epoch, which has the higher accuracy, is saved.

Fig. 9. Accuracy in validation set. [Plot: Validation-accuracy vs. epoch (1 to 6) for R-CNN and APSO-CNN.]
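The stopping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `early_stopping` is hypothetical, ties are treated as "no improvement" (an assumption), and the accuracy values in the usage line are made up rather than read from Fig. 9.

```python
def early_stopping(val_acc):
    """Return (stop_epoch, kept_epoch), both 1-based.

    Training stops at the first epoch whose validation accuracy does not
    improve on the previous epoch, and the model from the previous epoch is
    kept. If accuracy improves in every epoch, all epochs run and the last
    model is kept. Ties count as "no improvement" (an assumption).
    """
    if not val_acc:
        raise ValueError("need at least one epoch")
    for epoch in range(2, len(val_acc) + 1):
        if val_acc[epoch - 1] <= val_acc[epoch - 2]:
            return epoch, epoch - 1
    return len(val_acc), len(val_acc)


# Hypothetical accuracies: the drop occurs in epoch 5, so epoch 4 is kept.
print(early_stopping([0.80, 0.85, 0.88, 0.90, 0.89, 0.91]))  # (5, 4)
```

Since the networks are built with Keras, a comparable behaviour should be obtainable with `tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=0, restore_best_weights=True)`, assuming the model is compiled with `metrics=["accuracy"]`. That callback compares against the best epoch so far rather than strictly the previous one, which coincides with the rule above as long as accuracy has been rising.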
Meanwhile, to evaluate the two models directly, this paper proposes an evaluation index that considers both the deviation between the predicted probabilities and the actual label and the deviation between the predicted label and the actual label. The softmax layer of the network outputs a probability distribution, that is, the probability of belonging to each class. The difference between this distribution and the actual label distribution describes how well the model has been trained, so this difference should be kept as small as possible. However, since the model is used in practice to identify network attacks, the class with the maximum probability is the detection result, and it also plays an important role. To obtain good detection performance in real life, the difference between the predicted class vector and the actual one should also be as small as possible. Based on these considerations, a joint evaluation method is proposed: the deviation between the probabilities and the true value reflects the training effect of the model, while the deviation between the predicted and true labels reflects the actual detection effect. The two deviations are calculated by Eqs. (14) and (15):
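$$\mathrm{bias1}_i = \operatorname{abs}\left(\operatorname{softmax}(y)_i - y_i^{\mathrm{true}}\right), \tag{14}$$

$$\mathrm{bias2}_i = \operatorname{abs}\left(y_i^{\mathrm{predict}} - y_i^{\mathrm{true}}\right), \tag{15}$$

where $y_i^{\mathrm{true}}$ denotes the actual label of the i-th sample and $y_i^{\mathrm{predict}}$ denotes the predicted label of the i-th sample.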
Keeping each probability deviation as small as possible also helps ensure the accuracy of the predicted results; in general, the smaller the deviations, the better the prediction. As a new evaluation method, the distance between the point $(\mathrm{bias1}_i, \mathrm{bias2}_i)$ and the origin $(0, 0)$ is visualized. The prediction results of R-CNN and APSO-CNN are shown in Figs. 10 and 11, respectively.
To make the prediction effect easier to observe, the deviations of every 1000 samples in the validation set are accumulated. Since a shorter distance indicates a better prediction, the fact that most points in Fig. 11 lie closer to the origin illustrates the advantage of the APSO-CNN algorithm. Furthermore, the points in Fig. 11 are scattered in the lower-left part of the plot, below the arc, while those in Fig. 10 are scattered in the upper-right part, above the arc. This means that both indices investigated are smaller for the proposed algorithm: its training and prediction effects are better, and it is expected to detect real-world network attacks more reliably.
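The deviation index of Eqs. (14) and (15), together with the accumulation over blocks of 1000 samples, can be sketched as follows. The text does not state how the absolute difference between a probability vector and a label is reduced to the single scalar plotted in Figs. 10 and 11, so this sketch assumes one-hot labels and sums the absolute differences over classes; the function names are hypothetical. Under this assumption bias1 equals 2(1 - p_true), where p_true is the probability assigned to the actual class, and bias2 is 2 for a misclassified sample and 0 otherwise.

```python
import math


def sample_deviations(probs, true_label):
    """Return (bias1, bias2) for one sample.

    probs      : softmax output, one probability per class
    true_label : integer index of the actual class
    ASSUMPTION: labels are one-hot encoded and the absolute differences in
    Eqs. (14)-(15) are summed over classes to give one scalar each.
    """
    k = len(probs)
    true_vec = [1.0 if c == true_label else 0.0 for c in range(k)]
    pred_label = max(range(k), key=lambda c: probs[c])  # class with max probability
    pred_vec = [1.0 if c == pred_label else 0.0 for c in range(k)]
    bias1 = sum(abs(p - t) for p, t in zip(probs, true_vec))
    bias2 = sum(abs(p - t) for p, t in zip(pred_vec, true_vec))
    return bias1, bias2


def accumulated_points(all_probs, labels, block=1000):
    """Sum the deviations over consecutive blocks of `block` samples and
    return one (sum_bias1, sum_bias2, distance_to_origin) tuple per block.
    An incomplete final block is dropped."""
    points = []
    for start in range(0, len(labels) - block + 1, block):
        b1 = b2 = 0.0
        for probs, y in zip(all_probs[start:start + block], labels[start:start + block]):
            d1, d2 = sample_deviations(probs, y)
            b1 += d1
            b2 += d2
        points.append((b1, b2, math.hypot(b1, b2)))
    return points
```

Each returned tuple corresponds to one plotted point; a smaller third component (the distance to the origin) indicates better combined training and detection behaviour.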

5. Results and performance analysis

In order to evaluate the effectiveness of different classification methods in identifying the various types of network attacks launched by zombie hosts, the APSO-CNN algorithm proposed in this paper is compared with three popular algorithms: SVM, FNN, and R-CNN. The comprehensive performance of the four algorithms is then evaluated with five evaluation indicators.

Fig. 10. Prediction results of R-CNN.

Fig. 11. Prediction results of APSO-CNN.

(1) Classification accuracy (i.e. accuracy in this paper)


$$\mathrm{accuracy} = n_{\mathrm{true}} / n_{\mathrm{total}}, \tag{16}$$

where $n_{\mathrm{true}}$ denotes the number of correctly classified samples and $n_{\mathrm{total}}$ denotes the total number of samples.
Classification accuracy is the proportion of correctly classified samples among all samples. It determines the overall detection performance on captured network traffic in real life, in particular how well normal and abnormal traffic are distinguished, so a high proportion of correct detections is required in practice.
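Eq. (16) translates directly into code. A minimal sketch with integer class labels; the function name and the toy labels are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Eq. (16): fraction of samples whose predicted class equals the actual class."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and equally long")
    n_true = sum(t == p for t, p in zip(y_true, y_pred))
    return n_true / len(y_true)


# Toy labels (3 traffic classes, one misclassified sample out of four).
print(accuracy([0, 1, 2, 2], [0, 2, 2, 2]))  # 0.75
```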
(2) Average precision (i.e. average detection rate in this paper)
$$\text{ave\_precision} = \frac{1}{\mathit{class}} \sum_{i=1}^{\mathit{class}} \frac{n_i^{\mathrm{true}}}{n_i^{\mathrm{true}} + n_i^{\mathrm{false}}}, \tag{17}$$

where $n_i^{\mathrm{true}}$ denotes the number of samples correctly assigned to the i-th category, $n_i^{\mathrm{false}}$ denotes the number of samples wrongly assigned to the i-th category, and $\mathit{class}$ denotes the number of categories (types of network traffic).
Average precision measures, for each class of network traffic, the proportion of samples assigned to that class that actually belong to it, averaged over all classes. In real life, the specific type of network attack must be detected to provide a basis for subsequent decisions, so the detection precision of each type of network attack is needed. A high average precision across all categories therefore indicates, to a certain extent, a high recognition rate for each type of network attack.
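A minimal sketch of Eq. (17) follows. Note that this "average precision" is the macro-averaged precision over classes, not the ranking-based average precision computed by, for example, scikit-learn's `average_precision_score`; for integer labels it should agree with `sklearn.metrics.precision_score(y_true, y_pred, average="macro", zero_division=0)`. Treating a class that is never predicted as precision 0 is an assumption, since Eq. (17) is undefined in that case.

```python
def average_precision(y_true, y_pred, n_classes):
    """Eq. (17): per-class precision n_i^true / (n_i^true + n_i^false), averaged.

    A class that is never predicted has an undefined precision; it is
    counted as 0 here (an assumption).
    """
    total = 0.0
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == c and t != c)
        total += tp / (tp + fp) if tp + fp else 0.0
    return total / n_classes


# Toy labels: class 0 -> 1/1, class 1 never predicted -> 0, class 2 -> 2/3.
print(round(average_precision([0, 1, 2, 2], [0, 2, 2, 2], 3), 4))  # 0.5556
```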
(3) Kappa coefficient

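$$\kappa = \frac{\mathrm{accuracy} - p_e}{1 - p_e}, \tag{18}$$

where $p_e = \left(\sum_{i=1}^{\mathit{class}} n_i^{\mathrm{label\_true}} \cdot n_i^{\mathrm{predict\_true}}\right) / \left(n_{\mathrm{total}} \cdot n_{\mathrm{total}}\right)$, $n_i^{\mathrm{label\_true}}$ denotes the number of samples actually belonging to the i-th category, and $n_i^{\mathrm{predict\_true}}$ denotes the number of samples predicted to belong to the i-th category.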
The kappa coefficient is commonly used to evaluate the agreement between the model's predictions and the actual classes. Its value lies in [-1, 1], although in practice it usually falls in [0, 1]. Accuracy alone only reflects the overall proportion of correctly detected traffic; when normal traffic forms the majority, it cannot reveal how well the few minority attack classes are detected. Kappa therefore takes the bias of the trained model into account when evaluating agreement, which gives a more practical assessment of the network attack detection model. The relationship between the kappa coefficient and the consistency grade is shown in Table 4.
The larger the kappa coefficient, the higher the consistency and the better the classification performance of the model.
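Eq. (18) and the grading of Table 4 can be sketched together. The function names are hypothetical; for unweighted kappa the value should agree with `sklearn.metrics.cohen_kappa_score`. How Table 4 assigns the boundary values (for example, exactly 0.4) is not stated, so half-open intervals are an assumption, as is reporting negative values as "low".

```python
from collections import Counter


def kappa(y_true, y_pred):
    """Eq. (18): (accuracy - p_e) / (1 - p_e) with p_e from the class counts."""
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # p_e: chance agreement, sum_i n_i^label_true * n_i^predict_true / n_total^2
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when p_e = 1")
    return (acc - p_e) / (1 - p_e)


def consistency_grade(k):
    """Map a kappa value to the grades of Table 4.

    Assumptions: intervals are half-open [a, b), and values below 0
    (not covered by Table 4) are reported as "low".
    """
    for upper, grade in [(0.2, "low"), (0.4, "common"), (0.6, "medium"), (0.8, "high")]:
        if k < upper:
            return grade
    return "almost entire"


k = kappa([0, 1, 2, 2], [0, 2, 2, 2])
print(round(k, 4), consistency_grade(k))  # 0.5556 medium
```

In the toy example accuracy is 0.75 but p_e = 7/16, so kappa drops to 5/9: the correction removes the agreement that the class frequencies alone would produce.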
(4) Hamming loss

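$$\text{hamming\_loss} = \frac{1}{n_{\mathrm{total}}} \sum_{i=1}^{n_{\mathrm{total}}} \frac{\operatorname{count}\left(y_i^{\mathrm{true}} \oplus y_i^{\mathrm{predict}}\right)}{\mathit{class}}, \tag{19}$$

where $n_{\mathrm{total}}$ denotes the total number of samples, $\operatorname{count}(\cdot)$ denotes the number of ones, $y_i^{\mathrm{true}}$ denotes the actual label of the i-th sample, $y_i^{\mathrm{predict}}$ denotes the predicted label of the i-th sample, and $\mathit{class}$ denotes the number of sample categories.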
Hamming loss also applies to multi-class classification. It measures the distance between the predicted labels and the true labels and takes values between 0 and 1: a value of 0 indicates that the prediction is exactly the same as the true result, and a value of 1 indicates that it is completely opposite to the desired result. Since the model is trained to predict the type of network attack correctly for practical use, Hamming loss describes how many predicted labels are inconsistent with the actual labels, and thus verifies whether the model identifies actual network attack types reasonably. For network attack detection, therefore, the smaller the indicator, the better the classifier.
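A minimal sketch of Eq. (19), assuming one-hot label vectors and reading count(.) as the number of ones in the element-wise XOR of the true and predicted vectors (the positions where they differ); the function name is hypothetical. With single-label one-hot vectors, a wrong prediction differs in exactly two positions, so its per-sample term is 2/class. Passing one-hot indicator matrices to `sklearn.metrics.hamming_loss` should give the same value, whereas passing 1-D class labels there returns the plain misclassification rate (0.25 for the toy data below).

```python
def hamming_loss(y_true, y_pred, n_classes):
    """Eq. (19) with one-hot label vectors.

    count(.) is taken as the number of ones in the element-wise XOR of the
    two vectors, i.e. the number of positions where they differ.
    """
    def one_hot(c):
        return [1 if j == c else 0 for j in range(n_classes)]

    total = 0.0
    for t, p in zip(y_true, y_pred):
        differing = sum(a ^ b for a, b in zip(one_hot(t), one_hot(p)))
        total += differing / n_classes
    return total / len(y_true)


# One wrong sample out of four, 3 classes: (2 / 3) / 4
print(round(hamming_loss([0, 1, 2, 2], [0, 2, 2, 2], 3), 4))  # 0.1667
```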
(5) Jaccard similarity coefficient

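$$J = \frac{1}{n_{\mathrm{total}}} \sum_{i=1}^{n_{\mathrm{total}}} \frac{\left|y_i^{\mathrm{true}} \cap y_i^{\mathrm{predict}}\right|}{\left|y_i^{\mathrm{true}} \cup y_i^{\mathrm{predict}}\right|}. \tag{20}$$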

From the formula it can be seen that, for a single-label sample, the ratio equals 1 when the predicted label is identical to the actual label and 0 otherwise. By counting the intersection of the predicted and actual labels, this indicator measures the correctly detected network attacks and thus evaluates the overall detection performance of the model trained by the proposed APSO-CNN algorithm. The Jaccard similarity coefficient therefore reflects the quality of the model.
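A minimal sketch of Eq. (20), with labels represented as Python sets so that multi-label samples are also handled; the function name is hypothetical, and scoring a sample whose true and predicted sets are both empty as 1 is an assumption.

```python
def jaccard(y_true_sets, y_pred_sets):
    """Eq. (20): mean over samples of |true ∩ pred| / |true ∪ pred|."""
    total = 0.0
    for t, p in zip(y_true_sets, y_pred_sets):
        union = t | p
        # Both sets empty: treated as perfect agreement (an assumption).
        total += len(t & p) / len(union) if union else 1.0
    return total / len(y_true_sets)


# Single-label case: wrap each class index in a one-element set.
y_true = [{c} for c in [0, 1, 2, 2]]
y_pred = [{c} for c in [0, 2, 2, 2]]
print(jaccard(y_true, y_pred))  # 0.75
```

For single-label data, as in this toy example, Eq. (20) therefore coincides with the accuracy of Eq. (16); the two differ only when a sample can carry several labels.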
Using these five evaluation indicators, the proposed APSO-CNN algorithm is compared with SVM, FNN, and R-CNN, and the value of each indicator is computed for every algorithm. The comparison results are shown in Fig. 12.
Fig. 12 shows the value of each evaluation indicator for each algorithm. To a certain extent, all five indicators are related to the performance of a model used to detect network attacks. It can be clearly seen that the comprehensive performance of multi-type network attack detection based on the APSO-CNN algorithm is significantly better than that of the other three algorithms. This indicates that APSO-CNN can effectively identify various network attacks and provide a basis for further decision-making, and it also demonstrates the effectiveness and rationality of the proposed algorithm in multi-classification tasks. When any type of attack is detected, the affected IoT devices can be disconnected from the network, which can significantly reduce losses.

Table 4
The relationship between kappa coefficient and consistency grade

Coefficient range | 0–0.2 | 0.2–0.4 | 0.4–0.6 | 0.6–0.8 | 0.8–1.0
Consistency grade | low | common | medium | high | almost entire

Fig. 12. Performance of evaluation index in four algorithms.
Furthermore, to verify the stability of the proposed APSO-CNN algorithm, 10 independent experiments are carried out, each with a different split of the samples into training and validation sets. The maximum, minimum, median, and mean validation accuracies of the four algorithms are compared in Fig. 13.

Fig. 13. Stability of four algorithms. [Plot: Validation-accuracy statistics (max, min, median, mean) for SVM, FNN, R-CNN, and APSO-CNN.]

In Fig. 13, the broken line with asterisks represents the model obtained by the proposed APSO-CNN algorithm. The other three algorithms do not perform as well as APSO-CNN in terms of the maximum, minimum, median, or mean. Across the 10 experiments, the performance of the proposed model is very stable; even when the minimum values are compared, the models obtained by the other three algorithms achieve lower accuracy on the validation set. Fig. 13 therefore shows that the overall accuracy of the proposed APSO-CNN algorithm in detecting network attacks is stable. This provides a basis for applying the proposed network attack detection model and can help guide related work in the field of intrusion detection.
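The repeated-split protocol can be sketched as follows. The validation fraction (20%), the seed, and the `train_and_evaluate` callback are hypothetical placeholders, since the text does not give them; any of the four classifiers would be plugged in through the callback, which receives index lists and returns a validation accuracy.

```python
import random
import statistics


def repeated_split_stats(n_samples, train_and_evaluate, runs=10,
                         val_fraction=0.2, seed=0):
    """Run `runs` experiments, each on a freshly shuffled train/validation
    split, and summarize the validation accuracies.

    train_and_evaluate(train_idx, val_idx) -> float is a hypothetical
    callback that trains one model and returns its validation accuracy.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_val = int(n_samples * val_fraction)
    scores = []
    for _ in range(runs):
        rng.shuffle(indices)  # new random split for every run
        val_idx, train_idx = indices[:n_val], indices[n_val:]
        scores.append(train_and_evaluate(train_idx, val_idx))
    return {"max": max(scores), "min": min(scores),
            "median": statistics.median(scores), "mean": statistics.mean(scores)}
```

Collecting these four statistics per algorithm yields the kind of comparison plotted in Fig. 13; a narrow gap between max and min indicates a stable model.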

6. Conclusion and future work

In this paper, a novel APSO-CNN algorithm is proposed and successfully applied to the detection of multi-type attacks launched by zombie hosts infected with zombie viruses. The tunable hyperparameters of each layer of the CNN built with Keras are taken as the positions of the particles, and the cross-entropy loss on the validation set after the first training epoch of the CNN is taken as the fitness value of APSO. During the iterative optimization, the velocities and positions of the particles are updated to search for smaller fitness values. Meanwhile, the inertia weight is adaptively adjusted according to the fitness value, which reduces the risk of PSO becoming trapped in a local optimum and yields suitable CNN structure parameters. Finally, the effectiveness of the proposed APSO-CNN detection algorithm is demonstrated by comparison with three popular detection algorithms.
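The adaptive-inertia PSO loop summarized above can be sketched in plain Python. The inertia rule used here (particles better than the swarm average get a weight between w_min and w_max in proportion to their fitness, worse particles get w_max) is a common adaptive variant chosen for illustration and may differ from the exact APSO-CNN rule; all parameter values and the toy sphere fitness are assumptions. In APSO-CNN the fitness would instead be the validation cross-entropy after one training epoch of a CNN built from the particle position (x1 to x10 in Table 3), which would also require mapping continuous positions to integer and categorical hyperparameters.

```python
import random


def apso_minimize(fitness, bounds, n_particles=20, iters=100,
                  w_min=0.4, w_max=0.9, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` with a PSO whose inertia weight adapts per particle."""
    rng = random.Random(seed)
    dim = len(bounds)
    vmax = [0.2 * (hi - lo) for lo, hi in bounds]  # velocity clamp per dimension
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    fit = [fitness(p) for p in pos]
    pbest, pbest_fit = [p[:] for p in pos], fit[:]
    g = min(range(n_particles), key=lambda i: fit[i])
    gbest, gbest_fit = pos[g][:], fit[g]
    for _ in range(iters):
        f_min, f_avg = min(fit), sum(fit) / n_particles
        for i in range(n_particles):
            # Better-than-average particles get a smaller weight (exploit);
            # worse-than-average particles keep w_max (explore).
            if fit[i] <= f_avg and f_avg > f_min:
                w = w_min + (w_max - w_min) * (fit[i] - f_min) / (f_avg - f_min)
            elif fit[i] <= f_avg:
                w = w_min
            else:
                w = w_max
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax[d], min(vmax[d], v))
                lo, hi = bounds[d]
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            fit[i] = fitness(pos[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit[i]
                if fit[i] < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit[i]
    return gbest, gbest_fit


# Toy usage: minimize the 2-D sphere function on [-5, 5]^2.
print(apso_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0), (-5.0, 5.0)]))
```

Keeping the weight high for poor particles preserves exploration, while shrinking it for good particles lets them refine the current optimum, which is the mechanism the conclusion credits with helping PSO avoid local extrema.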
Furthermore, several future research directions are worth exploring. When considered in isolation, individual stages of a multi-stage network attack against IoT devices may appear to be normal activity [38,39]. In addition, the attack stages evolve over time. Therefore, for network security, besides detecting the current attack state, it is also of great significance to predict the attacker's next attack state. Real data usually come from heterogeneous platforms and inevitably contain abnormal and redundant instances [40]. Moreover, feature selection can reduce model training time [41], and the search time of the heuristic optimization algorithm for the offline model also needs to be reduced. Based on the above discussion, future work can be summarized in three aspects: (1) whether other improved PSO algorithms can be applied here to reduce the time complexity [42]; (2) whether a suitable model exists for predicting the evolution of network attack stages [43,44]; and (3) given the heterogeneity of different platforms, how to merge and filter the existing network attack datasets and select effective features [45,46].

CRediT authorship contribution statement

Xiu Kan: Conceptualization, Methodology, Validation, Investigation, Writing - original draft, Writing - review & editing.
Yixuan Fan: Conceptualization, Methodology, Validation, Investigation, Data curation, Writing - original draft, Writing -
review & editing. Le Cao: Validation, Investigation, Data curation, Writing - review & editing. Neal N. Xiong: Writing - review
& editing. Dan Yang: Data curation. Xuan Li: Investigation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.

Acknowledgment

This work was supported in part by the National Key R & D Program of China under Grant 2020AAA0109301, and in part by the National Natural Science Foundation of China under Grants 61803255 and 61703270.

References

[1] J. Lin, W. Yu, N. Zhang, X.Y. Yang, H.L. Zhang, W. Zhao, A survey on internet of things: architecture, enabling technologies, security and privacy, and
applications, IEEE Internet of Things Journal 4 (5) (2017) 1125–1142, https://doi.org/10.1109/JIOT.2017.2683200.
[2] M.A. Khan, K. Salah, IoT security: review, blockchain solutions, and open challenges, Future Generation Computer Systems 82 (2018) 395–411, https://
doi.org/10.1016/j.future.2017.11.022.
[3] Z. Fei, W. Wei, Y.X. Zhang, X.F. Zhang, Privacy-preserving authentication for general directed graphs in industrial IoT, Information Sciences 502 (2019)
218–228, https://doi.org/10.1016/j.ins.2019.06.032.
[4] C. Perera, A. Bandara, B.A. Price, B. Nuseibeh, Designing privacy-aware internet of things applications, Information Sciences 512 (2020) 238–257,
https://doi.org/10.1016/j.ins.2019.09.061.
[5] N.M. Dehkordi, H.R. Baghaee, N. Sadati, J.M. Guerrero, Distributed noise-resilient secondary voltage and frequency control for islanded microgrids, IEEE
Transactions on Smart Grid 10 (4) (2019) 3780–3790, https://doi.org/10.1109/TSG.2018.2834951.
[6] J. Hu, Z. Wang, G. Liu, C. Jia, J. Williams, Event-triggered recursive state estimation for dynamical networks under randomly switching topologies and
multiple missing measurements, Automatica 115 (2020) 108908, https://doi.org/10.1016/j.automatica.2020.108908.
[7] B.P. Poudel, A. Mustafa, A. Bidram, H. Modares, Detection and mitigation of cyber-threats in the DC microgrid distributed control system, International
Journal of Electrical Power & Energy Systems 120 (2020) 105968, https://doi.org/10.1016/j.ijepes.2020.105968.
[8] W.Y. Wang, Z. Lu, Cyber security in the smart grid: survey and challenges, Computer Networks 57 (5) (2013) 1344–1371, https://doi.org/10.1016/
j.comnet.2012.12.017.
[9] B. Shen, Z. Wang, D. Wang, Q. Li, State-saturated recursive filter design for stochastic time-varying nonlinear complex networks under deception
attacks, IEEE Transactions on Neural Networks and Learning Systems 31 (10) (2020) 3788–3800, https://doi.org/10.1109/TNNLS.2019.2946290.
[10] W.L. Al-Yaseen, Z.A. Othman, M.Z.A. Nazri, Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for
intrusion detection system, Expert Systems with Applications 67 (2017) 296–303, https://doi.org/10.1016/j.eswa.2016.09.041.


[11] F.A. Narudin, A. Feizollah, N.B. Anuar, A. Gani, Evaluation of machine learning classifiers for mobile malware detection, Soft Computing 20 (2016) 343–
357, https://doi.org/10.1007/s00500-014-1511-6.
[12] T. Shon, J. Moon, A hybrid machine learning approach to network anomaly detection, Information Sciences 177 (18) (2007) 3799–3821, https://doi.org/
10.1016/j.ins.2007.03.025.
[13] S. Mohammadi, H. Mirvaziri, M. Ghazizadeh-Ahsaee, H. Karimipour, Cyber intrusion detection by combined feature selection algorithm, Journal of
Information Security and Applications 44 (2019) 80–88, https://doi.org/10.1016/j.jisa.2018.11.007.
[14] Y.Y. Zhou, G. Cheng, S.Q. Jiang, M. Dai, Building an efficient intrusion detection system based on feature selection and ensemble classifier, Computer
Networks 174 (2020) 107247, https://doi.org/10.1016/j.comnet.2020.107247.
[15] D. Tran, H. Mac, T. Van, H.A. Tran, N.L. Giang, A LSTM based framework for handling multiclass imbalance in DGA botnet detection, Neurocomputing
275 (2018) 2401–2413, https://doi.org/10.1016/j.neucom.2017.11.018.
[16] A. Pektas, T. Acarman, Deep learning to detect botnet via network flow summaries, Neural Computing and Applications 31 (2018) 8021–8033, https://
doi.org/10.1007/s00521-018-3595-x.
[17] H. Bahsi, S. Nõmm, F.B.L. Torre, Dimensionality reduction for machine learning based IoT botnet detection, in: Proceedings of the 3rd International
Renewable and Sustainable Energy Conference (2018) 18–21, https://doi.org/10.1109/ICARCV.2018.8581205.
[18] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, Y. Elovici, N-BaIoT: network-based detection of IoT botnet attacks using
deep autoencoders, IEEE Pervasive Computing 17 (3) (2018) 12–22, https://doi.org/10.1109/MPRV.2018.03367731.
[19] Y. Mirsky, T. Doitshman, Y. Elovici, A. Shabtai, Kitsune: an ensemble of autoencoders for online network intrusion detection, in: Proceedings of the
Network and Distributed Systems Security Symposium, 2018, pp. 18–21.
[20] G.D.L.T. P, P. Rad, K.R. Choo, N. Beebe, Detecting internet of things attacks using distributed deep learning, Journal of Network and Computer
Applications 163 (2020) 102662, https://doi.org/10.1016/j.jnca.2020.102662.
[21] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A.L. Yuille, DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution,
and fully connected CRFs, IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (4) (2018) 834–848, https://doi.org/10.1109/
TPAMI.2017.2699184.
[22] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: a deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on
Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495, https://doi.org/10.1109/TPAMI.2016.2644615.
[23] Z.M. Peng, Z.C. Li, J.G. Zhang, Y. Li, G.J. Qi, J.H. Tang, Few-shot image recognition with knowledge transfer, in: Proceedings of 2019 IEEE/CVF
International Conference on Computer Vision (ICCV), 2019, pp. 441–449, https://doi.org/10.1109/ICCV.2019.00053.
[24] B.Q. Li, X.H. Hu, Effective vehicle logo recognition in real-world application using mapreduce based convolutional neural networks with a pre-training
strategy, Journal of Intelligent and Fuzzy Systems 34 (3) (2018) 1985–1994, https://doi.org/10.3233/JIFS-17592.
[25] Y.J. Zheng, S.Y. Chen, Y. Xue, J.Y. Xue, A pythagorean-type fuzzy deep denoising autoencoder for industrial accident early warning, IEEE Transactions on
Fuzzy Systems 25 (6) (2017) 1561–1575, https://doi.org/10.1109/TFUZZ.2017.2738605.
[26] N. Zeng, Z. Wang, B. Zineddin, Y. Li, M. Du, L. Xiao, X. Liu, T. Young, Image-based quantitative analysis of gold immunochromatographic strip via cellular
neural network approach, IEEE Transactions on Medical Imaging 33 (5) (2014) 1129–1136, https://doi.org/10.1109/TMI.2014.2305394.
[27] H. Badem, A. Basturk, A. Caliskan, M.E. Yuksel, A new efficient training strategy for deep neural networks by hybridization of artificial bee colony and
limited-memory BFGS optimization algorithms, Neurocomputing 266 (2017) 506–526, https://doi.org/10.1016/j.neucom.2017.05.061.
[28] A. Ratnaweera, S.K. Halgamuge, H.C. Watson, Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients, IEEE
Transactions on Evolutionary Computation 8 (3) (2004) 240–255, https://doi.org/10.1109/tevc.2004.826071.
[29] Z.H. Zhan, J. Zhang, Y. Li, H.S.H. Chung, Adaptive particle swarm optimization, IEEE Transactions on Systems Man and Cybernetics Part B-Cybernetics 39
(6) (2009) 1362–1381, https://doi.org/10.1109/TSMCB.2009.2015956.
[30] N. Zeng, Z. Wang, H. Zhang, K.-E. Kim, Y. Li, X. Liu, An improved particle filter with a novel hybrid proposal distribution for quantitative analysis of gold
immunochromatographic strips, IEEE Transactions on Nanotechnology 18 (1) (2019) 819–829, https://doi.org/10.1109/TNANO.2019.2932271.
[31] G.L.F.D. Silva, T.L.A. Valente, A.C. Silva, A.C.D. Paiva, M. Gattass, Convolutional neural network-based PSO for lung nodule false positive reduction on CT
images, Computer Methods and Programs in Biomedicine 162 (2018) 109–118, https://doi.org/10.1016/j.cmpb.2018.05.006.
[32] T.Y. Tan, L. Zhang, C.P. Lim, B. Fielding, Y.H. Yu, E. Anderson, Evolving ensemble models for image segmentation using enhanced particle swarm
optimization, IEEE Access 7 (2019) 34004–34019, https://doi.org/10.1109/ACCESS.2019.2903015.
[33] F. Hu, M.R. Zhou, P.C. Yan, D.T. Li, W.H. Lai, K. Bian, R.Y. Dai, Identification of mine water inrush using laser-induced fluorescence spectroscopy
combined with one-dimensional convolutional neural network, RSC Advances 9 (2019) 7673–7679, https://doi.org/10.1039/c9ra00805e.
[34] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors,
Computer Science 3 (4) (2012) 212–223, https://arxiv.org/abs/1207.0580.
[35] A.C. Wilson, R. Roelofs, M. Stern, N. Srebro, B. Recht, The marginal value of adaptive gradient methods in machine learning, in: Proceedings of the 31st
Conference on Neural Information Processing Systems, 2017.
[36] N.S. Keskar, R. Socher, Improving generalization performance by switching from Adam to SGD, 2017, https://arxiv.org/abs/1712.07628.
[37] R. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in: Proceedings of the Sixth International Symposium on Micro Machine and
Human Science, 1995, pp. 39–43, https://doi.org/10.1109/MHS.1995.494215.
[38] T. Chadza, K.G. Kyriakopoulos, S. Lambotharan, Analysis of hidden Markov model learning algorithms for the detection and prediction of multi-stage
network attacks, Future Generation Computer Systems 108 (2020) 636–649, https://doi.org/10.1016/j.future.2020.03.014.
[39] J. Hu, P. Zhang, Y. Kao, H. Liu, D. Chen, Sliding mode control for Markovian jump repeated scalar nonlinear systems with packet dropouts: the uncertain
occurrence probabilities case, Applied Mathematics and Computation 362 (2019) 124574, https://doi.org/10.1016/j.amc.2019.124574.
[40] Y.Y. Zhou, G. Cheng, S.Q. Jiang, M. Dai, Building an efficient intrusion detection system based on feature selection and ensemble classifier, Computer
Networks 174 (2020) 107247, https://doi.org/10.1016/j.comnet.2020.107247.
[41] X.K. Li, W. Chen, Q.R. Zhang, L.F. Wu, Building auto-encoder intrusion detection system based on random forest feature selection, Computers &
Security 95 (2020) 101851, https://doi.org/10.1016/j.cose.2020.101851.
[42] J.P.B. Mapetu, Z. Chen, L.F. Kong, Low-time complexity and low-cost binary particle swarm optimization algorithm for task scheduling and load
balancing in cloud computing, Applied Intelligence 49 (2019) 3308–3330, https://doi.org/10.1007/s10489-019-01448-x.
[43] H.Y. Wang, W.Q. Song, E. Zio, A. Kudreyko, Y.J. Zhang, Remaining useful life prediction for Lithium-ion batteries using fractional Brownian motion and
Fruit-fly Optimization Algorithm, Measurement 161 (2020) 107904, https://doi.org/10.1016/j.measurement.2020.107904.
[44] H. Liu, W.Q. Song, M. Li, A. Kudreyko, E. Zio, A generalized cauchy method for remaining useful life prediction of wind turbine gearboxes, Mechanical
Systems and Signal Processing 153 (2021) 107471, https://doi.org/10.1016/j.ymssp.2020.107471.
[45] Z.C. Li, J.H. Tang, Semi-supervised local feature selection for data classification, Science China Information Sciences, accepted.
[46] Z.C. Li, J.H. Tang, L.Y. Zhang, J. Yang, Weakly-supervised semantic guided hashing for social image retrieval, International Journal of Computer Vision
128 (8) (2020) 2265–2278, https://doi.org/10.1007/s11263-020-01331-0.
