Brain and Nature-Inspired Learning, Computation and Recognition
Licheng Jiao
Xidian University, Xi’an, China
Ronghua Shang
Xidian University, Xi’an, China
Fang Liu
Xidian University, Xi’an, China
Weitong Zhang
Xidian University, Xi’an, China
Chapter Outline
1.1 A brief introduction to the neural network
1.1.1 The development of neural networks
1.1.2 Neuron and feedforward neural network
1.1.3 Backpropagation algorithm
1.1.4 The learning paradigm of neural networks
1.2 Natural inspired computation
1.2.1 Fundamentals of nature-inspired computation
1.2.2 Evolutionary algorithm
1.2.3 Artificial immune system (AIS)
1.2.4 Other methods
1.3 Machine learning
1.3.1 Development of machine learning
1.3.2 Dimensionality reduction
1.3.3 Sparseness and low-rank
1.3.4 Semisupervised learning
1.4 Compressive sensing learning
1.4.1 The development of compressive sensing
1.4.2 Sparse representation
1.4.3 Compressive observation
1.4.4 Sparse reconstruction
1.5 Applications
1.5.1 Community detection
1.5.2 Capacitated arc routing optimization
1.5.3 Synthetic aperture radar image processing
1.5.4 Hyperspectral image processing
References

1.1 A brief introduction to the neural network

Over the years, scientists have explored the secrets of the human brain from many
perspectives, including medicine, biology, physiology, philosophy, computer science,
cognition, and organization synergetics, hoping to build artificial neurons that simulate
the human brain. In the course of this research, a new multidisciplinary, cross-technology
field has formed in recent years, called the “artificial neural network.” Research into
neural networks involves a wide range of disciplines, which combine with, infiltrate, and
promote each other.

Brain and Nature-Inspired Learning, Computation and Recognition.
Copyright © 2020 Tsinghua University Press.
Published by Elsevier Inc. All rights reserved.

An artificial neural network (ANN) is an adaptive nonlinear dynamic system composed of a
large number of simple basic elements, the neurons. The structure and function of each
neuron are relatively simple, but the system behavior produced by combining a large
number of neurons is very complex. The basic structure of an artificial neural network
mimics the human brain and reflects some basic characteristics of human brain function.
It can adapt itself to the environment, summarize rules, and carry out certain computation,
recognition, or process-control tasks. Artificial neural networks process information in
parallel, which can greatly improve processing speed.

1.1.1 The development of neural networks

The development of artificial neural networks has gone through three climaxes: control
theory from the 1940s to the 1960s [1-3], connectionism from the 1980s to the mid-1990s
[4, 5], and deep learning since 2006 [6, 7].
In 1943, Warren McCulloch and Walter Pitts created a neural network model based on a
mathematical algorithm called threshold logic [8]. This linear model distinguishes two
different types of input by testing whether the response output is positive or negative.
The study of neural networks is divided into the study of biological processes in the
brain and the study of artificial intelligence (artificial neural networks).
In 1949, Hebb published The Organization of Behavior and put forward the famous “Hebb
theory” [2]. Hebb theory mainly argues that when the axon of neuron A is close to
neuron B and neuron A repeatedly and persistently takes part in exciting neuron B,
a growth process or metabolic change takes place in one or both of the neurons,
which enhances the effectiveness of neuron A in stimulating neuron B [9]. Hebb theory
was confirmed by Nobel Prize winner Kandel and his animal experiments in 2000 [10].
Many later unsupervised machine learning algorithms are, more or less, variants of Hebb
theory. In 1958, Frank Rosenblatt invented a neural network model called the
“perceptron,” which he simulated on an IBM 704 computer [11]. This model can perform
some simple visual processing tasks. Rosenblatt believed that the perceptron would
eventually be able to learn, make decisions, and translate languages. In 1959, two other
American engineers, Widrow and Hoff [12], put forward the adaptive linear element
(Adaline). This was a variant of the perceptron and one of the progenitor models of
machine learning. Its main difference from the perceptron is that the Adaline neuron has
a linear activation function, which allows the output to take any value. In 1969, Marvin
Minsky and Seymour Papert identified two major defects of neural networks: first, the
basic perceptron could not handle the XOR problem [13]; second, the computing power of
the computers of the time was not sufficient for large neural networks. The study of
neural networks then stagnated. In 1974, Paul Werbos proposed training the multilayer
perceptron with a “backpropagation algorithm” to overcome the defect that left the
single-layer perceptron unable to deal with the XOR problem [14]. However, because neural
network research was at a low ebb at that time, this method did not attract much attention.
Interest in neural networks began to revive in the 1980s. In 1982, Hopfield [15] proposed
a novel neural network called the “Hopfield network.” The Hopfield network is a kind of
recurrent neural network that combines a storage system and a binary system. It
introduced the concept of an energy function for the first time, so that the equilibrium
state of the neural network had a clear criterion. However, owing to the limitations of
computing, for the rest of the 20th century the popularity of support vector machines and
other simpler algorithms, such as linear classifiers, gradually exceeded that of neural
networks.
In 1998, LeCun proposed a convolutional neural network called LeNet-5, which was trained
by backpropagation and achieved good results on a handwritten digit database [16]. In the
early 21st century, the computing power of computers improved greatly with the help of
GPUs and distributed computing, and neural networks have since attracted great attention
and developed rapidly. In 2006, Geoffrey Hinton [17] effectively trained a deep belief
network with greedy layer-wise pretraining. Researchers then extended this technique to
many different neural networks, greatly improving the generalization of the models on
test sets. In 2012, Hinton’s group won ImageNet 2012 [18]; their image classification
accuracy far exceeded that of the second-place entry. In some areas, deep neural network
algorithms have a great advantage over traditional algorithms.
In 2016, AlphaGo [19], an artificial intelligence program developed by Google
DeepMind, beat a top human professional Go player. Its principle was to use a
Monte Carlo tree search method combined with two different deep neural networks. The
emergence of AlphaGo once again pushed the development of neural networks to the

1.1.2 Neuron and feedforward neural network

A neuron model is based on the nerve cells of a biological nervous system. In the study
of biological nervous systems, the biological mechanism of the neuron can be represented
mathematically, which yields a computational model of the neuron. A neuron contains
three parts: the cell body, the dendrites, and the axon. The cell body is a complex
structure formed by many molecules. It is the energy supply area of neuronal activity,
where metabolic activities are carried out. Dendrites are the inlets that receive
information from other neurons. Axons are the outlets through which a stimulated neuron
transmits information. The synapse is the structure that connects one neuron to another
and transmits information between them.
Neural networks are described on the basis of the mathematical model of neurons. The
model is represented by network topology, node characteristics, and learning rules. The
main advantages of neural networks are as follows:
(1) Parallel distributed processing
(2) High robustness and fault tolerance
(3) Distributed storage and learning ability
(4) The ability to closely approximate complex nonlinear relationships.
From the characteristics and biological function of neurons, it is known that a neuron
is an information-processing unit with multiple inputs and a single output. Its
processing of information is nonlinear, and we abstract it into a simple mathematical
model, as shown in Fig. 1.1.
The specific mathematical formulas are as follows:

$$
\begin{cases}
v = \sum_{i=1}^{m} x_i w_i + b \\
y = \phi(v)
\end{cases} \tag{1.1}
$$

[Figure 1.1 diagram: inputs x_1, ..., x_m with weights w_1, ..., w_m and a bias feed a sum unit (∑), followed by the activation ϕ, which gives the output y.]

Figure 1.1
The mathematical model of neurons.
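As a minimal sketch of Formula (1.1), the snippet below computes the weighted sum v and the output y for one neuron. The input values, weights, bias, and the choice of activation functions are illustrative assumptions, not values from the text.

```python
import math


def neuron(x, w, b, phi):
    """Compute y = phi(sum_i x_i * w_i + b) for a single neuron."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + b
    return phi(v)


def sigmoid(v):
    """Logistic activation, one possible choice for phi."""
    return 1.0 / (1.0 + math.exp(-v))


def identity(v):
    """Linear activation: the output equals the weighted sum."""
    return v


# Hypothetical example values (m = 3 inputs).
x = [1.0, 2.0, -1.0]
w = [0.5, -0.25, 0.5]
b = 0.5

y_linear = neuron(x, w, b, identity)   # weighted sum v itself
y_sigmoid = neuron(x, w, b, sigmoid)   # v passed through the sigmoid
```

Passing the activation as a function argument keeps the summation step separate from the nonlinearity, mirroring the two lines of Formula (1.1).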

Typical activation functions include the sigmoid function, tanh function, radial basis
function, wavelet function, ReLU function, softplus function, etc. The corresponding
formula is (1.2).

$$
\begin{cases}
\mathrm{sigmoid}(x) = \dfrac{1}{1 + e^{-x}} \\
\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \\
\mathrm{ReLU}(x) = \max(0, x) \\
\mathrm{softplus}(x) = \log\left(1 + e^{x}\right)
\end{cases} \tag{1.2}
$$

Neuroscientists have found that neurons exhibit unilateral inhibition, a wide excitation
boundary, sparse activation, and so on. Compared with other activation functions, the
rectified linear unit (ReLU) function therefore has better biological interpretability.
In addition, the softplus function, whose derivative is the logistic function, is a
smooth form of the rectified linear unit. Although it also exhibits unilateral inhibition
and a wide excitation boundary, it does not produce sparse activation.
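The activation functions of Formula (1.2) can be sketched directly with the standard library. The snippet also checks numerically, at a few arbitrarily chosen points, that the derivative of softplus matches the sigmoid (logistic) function, and shows that ReLU returns exact zeros for non-positive inputs while softplus stays strictly positive. The helper `numerical_derivative` and the sample points are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    return max(0.0, x)


def softplus(x):
    return math.log(1.0 + math.exp(x))


def numerical_derivative(f, x, eps=1e-6):
    """Central finite difference, used only to check the softplus derivative."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


points = [-3.0, -1.0, 0.0, 1.0, 3.0]

# Largest gap between d/dx softplus(x) and sigmoid(x) over the sample points.
max_gap = max(abs(numerical_derivative(softplus, x) - sigmoid(x)) for x in points)

sparse = [relu(x) for x in points]       # exact zeros for non-positive inputs
smooth = [softplus(x) for x in points]   # strictly positive everywhere
```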
Based on the mathematical neuron model, neural networks can be divided, according to the
topology of their connections, into forward networks (directed acyclic) and feedback
networks (undirected complete graph, also called cyclic networks). For a feedback
network, the stability of the network model is closely related to associative memory.
The Hopfield network and the Boltzmann machine are of this type. A forward network can
be realized by the repeated composition of simple nonlinear functions, and its structure
is very simple. The forward neural network is introduced below.
Its network structure is shown in Fig. 1.2.
Figure 1.2
Feedforward neural network with a single hidden layer.

The corresponding mathematical formula is (1.3).

$$
\begin{cases}
h = \phi^{(1)}\left( \sum_{i=1}^{m} x_i \cdot w_i^{(1)} + b^{(1)} \right) \\
y = \phi^{(2)}\left( \sum_{j=1}^{n} h_j \cdot w_j^{(2)} + b^{(2)} \right)
\end{cases} \tag{1.3}
$$

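where the input $x \in \mathbb{R}^{m}$, the hidden layer $h \in \mathbb{R}^{n}$, and the output $y \in \mathbb{R}^{K}$. $w^{(1)} \in \mathbb{R}^{m \times n}$ and
$b^{(1)} \in \mathbb{R}^{n}$ are the weight connection matrix and bias from the input layer to the hidden layer,
respectively. $w^{(2)} \in \mathbb{R}^{n \times K}$ and $b^{(2)} \in \mathbb{R}^{K}$ are the weight connection matrix and bias from the
hidden layer to the output layer. $\phi^{(1)}$ and $\phi^{(2)}$ are the activation functions. In practical
applications, the training data set is assumed to be

$$
\begin{cases}
\left\{ x^{(n)}, y^{(n)} \right\}_{n=1}^{N} \\
x^{(n)} \in \mathbb{R}^{m} \\
y^{(n)} \in \mathbb{R}^{K}
\end{cases} \tag{1.4}
$$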
The model between the input and output is Formula (1.5).

$$
y = T(x; \theta) = \phi^{(2)}\left( \sum_{j=1}^{n} \phi^{(1)}\left( \sum_{i=1}^{m} x_i \cdot w_i^{(1)} + b^{(1)} \right) w_j^{(2)} + b^{(2)} \right) \tag{1.5}
$$
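A forward pass through the single hidden layer network of Formulas (1.3) and (1.5) can be sketched as follows. The layer sizes (m = 2, n = 3, K = 1), the weight values, and the choice of a sigmoid hidden layer with a linear output are assumptions made for the example.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def identity(v):
    return v


def layer(inputs, weights, biases, phi):
    """One fully connected layer.

    weights[i][j] connects input unit i to output unit j, so the matrix has
    shape (len(inputs), len(biases)).
    """
    return [
        phi(sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + biases[j])
        for j in range(len(biases))
    ]


def forward(x, w1, b1, w2, b2):
    """Return the hidden output h and the network output y."""
    h = layer(x, w1, b1, sigmoid)     # input layer -> hidden layer
    y = layer(h, w2, b2, identity)    # hidden layer -> output layer
    return h, y


# Hypothetical parameters: w1 is 2 x 3, w2 is 3 x 1.
w1 = [[0.0, 1.0, -1.0],
      [0.0, 1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[2.0], [0.0], [0.0]]
b2 = [1.0]

h, y = forward([1.0, -1.0], w1, b1, w2, b2)
```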

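The parameter $\theta = (w^{(1)}, b^{(1)}; w^{(2)}, b^{(2)})$ is then optimized with respect to the objective
(composed of the loss term and the regular term).

$$
\min_{\theta} L(\theta) = \frac{1}{N} \sum_{n=1}^{N} \left\| y^{(n)} - T\left(x^{(n)}; \theta\right) \right\|_F^2 + \lambda \sum_{l=1}^{2} \left\| w^{(l)} \right\|_F^2 \tag{1.6}
$$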
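The gradient descent method is used to solve for the parameter $\theta$.

$$
\begin{cases}
\theta^{k} = \theta^{k-1} - \alpha \cdot \nabla\theta\big|_{\theta=\theta^{k-1}} \\
\nabla\theta\big|_{\theta=\theta^{k-1}} = \dfrac{\partial L(\theta)}{\partial \theta}\bigg|_{\theta=\theta^{k-1}}
\end{cases} \tag{1.7}
$$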

As the iteration number $k$ increases, the parameters converge (this can be observed
indirectly by plotting the objective function $L(\theta^{k})$).

$$
\lim_{k \to \infty} \theta^{k} = \theta^{*} \tag{1.8}
$$
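The update rule of Formula (1.7) can be illustrated on a deliberately simple convex case. The sketch below assumes a scalar linear model T(x; θ) = w·x + b in place of the network, with the squared loss and an optional penalty on w; the data, learning rate, and step count are illustrative assumptions. The recorded loss history shows the indirect way of observing convergence mentioned above.

```python
def loss(w, b, xs, ys, lam):
    """Mean squared error plus lam * w^2 (the bias is not penalized)."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n + lam * w * w


def gradient(w, b, xs, ys, lam):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(xs)
    dw = sum(-2.0 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n + 2.0 * lam * w
    db = sum(-2.0 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
    return dw, db


def gradient_descent(xs, ys, lam=0.0, alpha=0.1, steps=2000):
    """Iterate theta^k = theta^(k-1) - alpha * gradient, recording the loss."""
    w, b = 0.0, 0.0
    history = [loss(w, b, xs, ys, lam)]
    for _ in range(steps):
        dw, db = gradient(w, b, xs, ys, lam)
        w, b = w - alpha * dw, b - alpha * db
        history.append(loss(w, b, xs, ys, lam))
    return w, b, history


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [-1.0, 0.0, 1.0, 2.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b, history = gradient_descent(xs, ys)
```

With `lam=0.0` the iterates approach the generating parameters; a positive `lam` shrinks `w` toward zero, which is the role of the regular term in Formula (1.6).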

The reason for convergence is that the above objective function is convex. In principle,
the objective function can be optimized directly with a closed-form solution. However,
when the amount of data is large, storage and reading become very time-consuming.
Therefore, it is usually solved by stochastic gradient descent (with batch processing).
For neural networks with a fixed topology, Hornik et al. [20-22] proved that if the
output layer uses a linear activation function and the hidden layer uses a sigmoid
function, a single hidden layer neural network can approximate any continuous function
on a compact set to any accuracy.
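A minimal sketch of gradient descent with batch processing follows. It again assumes a scalar linear model w·x + b rather than a network, so that the behavior is easy to verify; the batch size, learning rate, epoch count, seed, and data are illustrative assumptions. Each update uses only the samples in one randomly drawn batch instead of the whole data set.

```python
import random


def sgd(xs, ys, alpha=0.05, batch_size=2, epochs=500, seed=0):
    """Mini-batch stochastic gradient descent for the model w * x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            dw = sum(-2.0 * xs[i] * (ys[i] - (w * xs[i] + b)) for i in batch) / len(batch)
            db = sum(-2.0 * (ys[i] - (w * xs[i] + b)) for i in batch) / len(batch)
            w, b = w - alpha * dw, b - alpha * db
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [-1.0, 0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd(xs, ys)
```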
When the network has more than one hidden layer, it is called a multi hidden
layer feedforward neural network, or a deep feedforward neural network. Its structure is
shown in Fig. 1.3.
The topology of the deep feedforward neural network is multi hidden layer, fully
connected, and directed acyclic. Using the following notation, the model between the
input and output of the network is given.
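The input layer is $x \in \mathbb{R}^{m}$, the output layer is $y \in \mathbb{R}^{s}$, and the output of the hidden layer is
written as

$$
\begin{cases}
h^{(l)} = \phi^{(l)}\left( \sum_{i=1}^{n_{l-1}} h_i^{(l-1)} \cdot w_i^{(l)} + b^{(l)} \right) \\
l = 1, 2, \ldots, L \\
h^{(0)} = x \\
h^{(L)} = y
\end{cases} \tag{1.9}
$$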
Removing the input layer $h^{(0)}$ and the output layer $h^{(L)}$, the number of hidden layers is $L-1$,
and the corresponding hyperparameters (the number of layers, the number of hidden units,
the activation function) are represented as:

$$
\begin{cases}
L + 1 \rightarrow \text{the number of layers} \\
\left[ n_0, n_1, n_2, \ldots, n_{L-1}, n_L \right] \rightarrow \text{dimensions of each layer} \\
\left[ \phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(L-1)}, \phi^{(L)} \right] \rightarrow \text{activation function}
\end{cases} \tag{1.10}
$$

where $n_0 = m$ and $n_L = s$. The parameters to be learned are represented as

$$
\begin{cases}
\theta = (\theta_1, \theta_2, \ldots, \theta_L) \\
\theta_l = \left( w^{(l)} \in \mathbb{R}^{n_{l-1} \times n_l},\; b^{(l)} \in \mathbb{R}^{n_l} \right) \\
l = 1, 2, \ldots, L
\end{cases} \tag{1.11}
$$

Figure 1.3
Feedforward neural network with multi hidden layers.

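The relationship between input and output is represented as

$$
\begin{aligned}
y = h^{(L)} &= \phi^{(L)}\left( \sum_{i_L=1}^{n_{L-1}} h_{i_L}^{(L-1)} \cdot w_{i_L}^{(L)} + b^{(L)} \right)
  && \text{written as } \phi^{(L)}\left( h^{(L-1)}, \theta_L \right) \\
&= \phi^{(L)}\left( \sum_{i_L=1}^{n_{L-1}} \phi^{(L-1)}\left( \sum_{i_{L-1}=1}^{n_{L-2}} h_{i_{L-1}}^{(L-2)} \cdot w_{i_{L-1}}^{(L-1)} + b^{(L-1)} \right) w_{i_L}^{(L)} + b^{(L)} \right)
  && \text{written as } \phi^{(L)}\left( \phi^{(L-1)}\left( h^{(L-2)}, \theta_{L-1} \right), \theta_L \right) \\
&= \phi^{(L)}\left( \phi^{(L-1)}\left( \cdots \phi^{(1)}(x, \theta_1) \cdots, \theta_{L-1} \right), \theta_L \right)
  && \text{written as } f(x, \theta)
\end{aligned} \tag{1.12}
$$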
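The nested composition f(x, θ) above can be sketched as a loop over layers, where each layer output h^(l) feeds the next. The layer sizes [2, 2, 2, 1], the parameter values, and the use of ReLU for hidden layers with a linear output are assumptions chosen so that the result is easy to check by hand.

```python
def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def dense(h_prev, w, b, phi):
    """h_j = phi(sum_i h_prev_i * w[i][j] + b_j), with w of shape (n_prev, n_cur)."""
    return [
        phi(sum(h_prev[i] * w[i][j] for i in range(len(h_prev))) + b[j])
        for j in range(len(b))
    ]


def deep_forward(x, thetas, activations):
    """Apply the layers in order and return [h^(0), h^(1), ..., h^(L)]."""
    outputs = [x]                                   # h^(0) = x
    for (w, b), phi in zip(thetas, activations):
        outputs.append(dense(outputs[-1], w, b, phi))
    return outputs                                  # outputs[-1] is h^(L) = y


# Hypothetical parameters theta_l = (w^(l), b^(l)) for L = 3.
thetas = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, -3.0]),
    ([[2.0], [1.0]], [0.0]),
]
activations = [relu, relu, identity]

outputs = deep_forward([1.0, 2.0], thetas, activations)
y = outputs[-1]
```

Keeping every intermediate h^(l) is a design choice made here because the backward pass of gradient-based training needs those values.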

In practical applications, the training data set is assumed to be

$$
\begin{cases}
\left\{ x^{(n)}, y^{(n)} \right\}_{n=1}^{N} \\
x^{(n)} \in \mathbb{R}^{m} \\
y^{(n)} \in \mathbb{R}^{s}
\end{cases} \tag{1.13}
$$
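The optimized objective function (the loss term and the regular term) is as follows:

$$
\min_{\theta} J(\theta) = L(\theta) + \lambda R(\theta) \tag{1.14}
$$

where $\hat{y}_n = f(x_n, \theta)$ and

$$
\begin{cases}
l\left(y_n, \hat{y}_n\right) = \left\| y_n - \hat{y}_n \right\|_F^2 \\
L(\theta) = \dfrac{1}{N} \sum_{n=1}^{N} l\left(y_n, \hat{y}_n\right) \\
R(\theta) = \sum_{l=1}^{L} \left\| \theta_l \right\|_F^2 = \sum_{l=1}^{L} \left\| w^{(l)} \right\|_F^2
\end{cases} \tag{1.15}
$$
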
The loss function $l(\cdot)$ can take many forms, such as an energy function or the cross
entropy loss, and the regularization term $R(\cdot)$ includes the Frobenius norm (to prevent
overfitting) and sparse regularization (to simulate biological response characteristics).
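As a small worked example of Formulas (1.14) and (1.15), the sketch below evaluates J(θ) from given targets, predictions, and weight matrices, using the squared error loss and the squared Frobenius norm of the weights. All numeric values are hypothetical, and the predictions are supplied directly rather than computed by a network.

```python
def squared_error(y, y_hat):
    """l(y, y_hat): squared Euclidean distance between target and prediction."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def data_loss(ys, y_hats):
    """L(theta): the per-sample loss averaged over the N samples."""
    return sum(squared_error(y, y_hat) for y, y_hat in zip(ys, y_hats)) / len(ys)


def frobenius_penalty(weights):
    """R(theta): sum of squared entries of every weight matrix (biases excluded)."""
    return sum(v * v for w in weights for row in w for v in row)


def objective(ys, y_hats, weights, lam):
    """J(theta) = L(theta) + lam * R(theta)."""
    return data_loss(ys, y_hats) + lam * frobenius_penalty(weights)


# Hypothetical targets, predictions, and two weight matrices.
ys = [[1.0, 0.0], [0.0, 1.0]]
y_hats = [[0.5, 0.5], [0.0, 1.0]]
weights = [[[1.0, -1.0]], [[2.0], [0.0]]]

J = objective(ys, y_hats, weights, lam=0.5)
```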

1.1.3 Backpropagation algorithm

In order to optimize the objective function in Formula (1.14), we must first determine
whether the function is convex or nonconvex (Fig. 1.4). If the feasible region is a
convex set and the objective is a convex function defined on that set, the problem is a
convex optimization problem, and the solution obtained does not depend on the choice of
initial value and is the global optimum. Usually, the objective function of a deep
feedforward neural network is nonconvex, so the solution for the parameters depends on
the initial parameter setting (there are many saddle points and local extreme points in
the feasible region). With a reasonable initialization, falling into a local optimum can
be avoided. To illustrate the backpropagation algorithm (based on the gradient descent
method), the following rule for updating the parameters is described.

$$
\begin{cases}
\theta^{(k)} = \theta^{(k-1)} - \alpha \cdot \nabla\theta\big|_{\theta=\theta^{(k-1)}} \\
\nabla\theta\big|_{\theta=\theta^{(k-1)}} = \dfrac{\partial L(\theta)}{\partial \theta} + \lambda \dfrac{\partial R(\theta)}{\partial \theta}
\end{cases} \tag{1.16}
$$
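where $\alpha$ is the learning rate, and the specific parameters on each layer are updated as

$$
\begin{cases}
\theta_l^{(k)} = \theta_l^{(k-1)} - \alpha \cdot \nabla\theta_l\big|_{\theta_l=\theta_l^{(k-1)}} \\
\nabla\theta_l\big|_{\theta_l=\theta_l^{(k-1)}} = \dfrac{\partial L(\theta)}{\partial \theta_l}\bigg|_{\theta_l=\theta_l^{(k-1)}} + \lambda \dfrac{\partial R(\theta)}{\partial \theta_l}\bigg|_{\theta_l=\theta_l^{(k-1)}}
\end{cases} \tag{1.17}
$$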


Figure 1.4
An illustration of the backpropagation algorithm.

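where $\theta_l^{(k)}$ is the updated value for the $l$th layer in the $k$th iteration. An error
propagation term is introduced to compute the gradient. According to the
chain rule, the gradient expands to:

$$
\frac{\partial L(\theta)}{\partial \theta_l} = \frac{\partial h^{(l)}}{\partial \theta_l} \cdot \frac{\partial h^{(l+1)}}{\partial h^{(l)}} \cdots \frac{\partial h^{(L)}}{\partial h^{(L-1)}} \cdot \frac{\partial L(\theta)}{\partial h^{(L)}} \tag{1.18}
$$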
The error propagation term is written as:

$$
\delta^{(l)} = \frac{\partial L(\theta)}{\partial h^{(l)}} \tag{1.19}
$$
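With the further use of $\theta_l = (w^{(l)}, b^{(l)})$, the corresponding derivatives of the hidden layer
output with respect to the parameters are represented as:

$$
\begin{cases}
\dfrac{\partial h^{(l)}}{\partial w^{(l)}} = \dfrac{\partial \phi^{(l)}\left( \left(h^{(l-1)}\right)^{T} \cdot w^{(l)} + b^{(l)} \right)}{\partial w^{(l)}} = h^{(l-1)} \cdot \left( \phi^{(l)} \right)' \\
\dfrac{\partial h^{(l)}}{\partial b^{(l)}} = \dfrac{\partial \phi^{(l)}\left( \left(h^{(l-1)}\right)^{T} \cdot w^{(l)} + b^{(l)} \right)}{\partial b^{(l)}} = 1 \cdot \left( \phi^{(l)} \right)'
\end{cases} \tag{1.20}
$$

where “$\cdot$” is the Hadamard product. Formula (1.17) contains the derivatives of the loss
term with respect to the parameters, and the derivatives of the regular term are:

$$
\frac{\partial R(\theta)}{\partial \theta_l} = \frac{\partial}{\partial \theta_l} \sum_{l=1}^{L} \left\| \theta_l \right\|_F^2 = \frac{\partial \left\| \theta_l \right\|_F^2}{\partial \theta_l} \tag{1.21}
$$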

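Usually, the constraints in the regular term apply only to the weight matrix, and the
bias is not regularized, so that:

$$
\begin{cases}
\dfrac{\partial R(\theta)}{\partial w^{(l)}} = \dfrac{\partial \left\| w^{(l)} \right\|_F^2}{\partial w^{(l)}} = 2 w^{(l)} \\
\dfrac{\partial R(\theta)}{\partial b^{(l)}} = \dfrac{\partial \left\| w^{(l)} \right\|_F^2}{\partial b^{(l)}} = 0
\end{cases} \tag{1.22}
$$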
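The derivatives above can be combined into a short backpropagation sketch. To keep it small, the example assumes one sigmoid hidden layer, a scalar linear output, a single training sample, and a squared error loss with a Frobenius penalty on the weights only; these choices, the parameter values, and the helper names are assumptions for illustration. The analytic gradient is compared with central finite differences as a sanity check.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, w1, b1, w2, b2):
    """Forward pass: sigmoid hidden layer, scalar linear output."""
    h = [sigmoid(sum(x[i] * w1[i][j] for i in range(len(x))) + b1[j])
         for j in range(len(b1))]
    y = sum(h[j] * w2[j] for j in range(len(h))) + b2
    return h, y


def objective(x, t, w1, b1, w2, b2, lam):
    """Squared error plus lam * squared Frobenius norm of the weights only."""
    _, y = forward(x, w1, b1, w2, b2)
    penalty = sum(v * v for row in w1 for v in row) + sum(v * v for v in w2)
    return (t - y) ** 2 + lam * penalty


def backprop(x, t, w1, b1, w2, b2, lam):
    """Return gradients of the objective with respect to w1, b1, w2, b2."""
    h, y = forward(x, w1, b1, w2, b2)             # step 1: forward propagation
    delta_out = 2.0 * (y - t)                     # error term at the output
    # Step 2: propagate the error back through the sigmoid, whose derivative
    # is h * (1 - h).
    delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
    grad_w2 = [delta_out * h[j] + 2.0 * lam * w2[j] for j in range(len(h))]
    grad_b2 = delta_out                           # bias is not regularized
    grad_w1 = [[delta_hid[j] * x[i] + 2.0 * lam * w1[i][j] for j in range(len(h))]
               for i in range(len(x))]
    grad_b1 = delta_hid                           # bias is not regularized
    return grad_w1, grad_b1, grad_w2, grad_b2


def max_gradient_error(x, t, w1, b1, w2, b2, lam, eps=1e-6):
    """Largest gap between backprop and finite-difference gradients for w1."""
    grad_w1, _, _, _ = backprop(x, t, w1, b1, w2, b2, lam)
    worst = 0.0
    for i in range(len(w1)):
        for j in range(len(w1[0])):
            original = w1[i][j]
            w1[i][j] = original + eps
            plus = objective(x, t, w1, b1, w2, b2, lam)
            w1[i][j] = original - eps
            minus = objective(x, t, w1, b1, w2, b2, lam)
            w1[i][j] = original                   # restore the parameter
            numeric = (plus - minus) / (2.0 * eps)
            worst = max(worst, abs(numeric - grad_w1[i][j]))
    return worst


# Hypothetical parameters: 2 inputs, 3 hidden units, 1 output.
rng = random.Random(0)
w1 = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
b1 = [0.1, -0.2, 0.3]
w2 = [rng.uniform(-1.0, 1.0) for _ in range(3)]
b2 = 0.05

error = max_gradient_error([0.5, -1.5], 1.0, w1, b1, w2, b2, lam=0.01)
```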
The optimization of the parameter $\theta_l$ of the $l$th hidden layer is mainly determined by
the gradient (first derivative) of the loss term $L(\theta)$ and the regular term $R(\theta)$ with
respect to $\theta_l$. Error propagation is realized by introducing the error propagation term
[Formula (1.19)]. Training of a feedforward neural network is divided into two steps. The
first is to calculate the output value of each layer in the forward propagation process
according to the current parameter values. The second is to backpropagate the error term
of each layer according to the difference between the actual output and the expected
output. The partial derivatives of each layer’s output are then combined to update the
parameters. The two steps are repeated until the network converges. When the network is
deep, the gradient of the error with respect to the parameters of each layer gradually
shrinks from the output toward the input: the closer a layer is to the output, the larger
its gradient, and the closer it is to the input, the smaller its gradient, which may even
be zero. This makes it difficult to obtain good parameters for the whole network by
training. Setting aside the global minima and saddle points of the feasible region, it
also makes the objective function tend to fall into a local optimum. This is
the vanishing gradient problem.
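The shrinking of the gradient toward the input can be made concrete with a deliberately simplified chain of scalar sigmoid layers. The sketch below assumes one unit per layer, weights of 1, and biases of 0, and computes the magnitude of the derivative of the final output with respect to each layer's output. Because the sigmoid derivative never exceeds 1/4, each extra layer multiplies the gradient by at most 1/4 in this setting.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gradient_magnitudes(x, weights, biases):
    """Return |d h^(L) / d h^(l)| for l = 0..L in a chain of scalar sigmoid layers."""
    # Forward pass, storing each layer's output.
    outputs = [x]
    for w, b in zip(weights, biases):
        outputs.append(sigmoid(w * outputs[-1] + b))

    # Backward pass: multiply local derivatives from the output toward the input.
    grads = [1.0]                                   # d h^(L) / d h^(L)
    for l in range(len(weights), 0, -1):
        h = outputs[l]
        local = weights[l - 1] * h * (1.0 - h)      # d h^(l) / d h^(l-1)
        grads.append(grads[-1] * local)
    grads.reverse()                                 # index l now matches layer l
    return [abs(g) for g in grads]


depth = 10
mags = gradient_magnitudes(0.5, [1.0] * depth, [0.0] * depth)
# mags[-1] belongs to the output layer, mags[0] to the input.
```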

1.1.4 The learning paradigm of neural networks

The basic neural network still follows the machine learning paradigm, which consists of
four parts: data, model, optimization, and solving. Machine learning emphasizes learning
data features based on prior knowledge (including extracting and screening features to
obtain discriminable features) and classifier design. However, the expressive ability of
the model is limited by the features that are learned. Its advantage is that the objective
function can be optimized quickly with a convex optimization algorithm or software. Its
core is the pursuit of speed and precision.
Compared with machine learning, a deep neural network reduces the dependence on prior
knowledge of the data. The representation ability of the model becomes increasingly deep
and essential as layers are added. However, it faces the following difficulties:
(1) In the training stage, labeled data are scarce while the model has many parameters to
be trained. This leads to insufficient training or overfitting.
(2) The optimization objective is nonconvex, so the result depends on the selection of the
initial value. Choosing a proper initial value avoids falling prematurely into a local
optimum and yields a solution close to the optimal one. If the selection is poor, the
network is prone to underfitting.
(3) When the backpropagation algorithm is used, the vanishing gradient problem can easily
occur, which leads to inadequate training of the network model.
The differences within the data are crucial to a deep neural network. For classification
tasks, stronger aggregation within a class means that the data belonging to the same
class have greater similarity: the common features dominate, and the individual
characteristics are secondary. Large sparsity between classes indicates greater
difference between classes: individual characteristics dominate, and the common features
are secondary. When a deep neural network is used for feature learning, the multilevel
combination of hierarchical parameters gives the weight parameters a discriminative
character, emphasizing commonality within a class and attending to individuality among
classes. The most satisfactory model under the combination of parameters also
indirectly indicates that the two factors mentioned above are contradictory yet unified.
In essence, a deep neural network represents data hierarchically. A high-level
representation is built on low-level representations, and a complex problem is divided
into a series of nested, simple representation learning problems. For example, the first
hidden layer identifies edges from the values of pixels and their adjacent pixels in the
image. The second hidden layer integrates the edges to identify outlines and corners. The
third hidden layer extracts specific outlines and corners as abstract high-level semantic
features. Finally, a linear classifier is used to identify the target in the image.

1.2 Natural inspired computation

1.2.1 Fundamentals of nature-inspired computation

Bio-intelligence is a very important source of theoretical inspiration in artificial
intelligence research. From the perspective of information processing, an organism is an
excellent information processor, and its ability to solve problems through its own
evolution dwarfs that of the best current computers. In recent years, artificial
intelligence researchers have come to refer to intelligent algorithms inspired by natural
phenomena as nature-inspired computation (NIC). Based on the functions, characteristics,
and mechanisms of organisms in nature, NIC studies the rich processing mechanisms they
contain, constructs corresponding computational models, designs corresponding algorithms,
and applies them to various fields. Nature-inspired computation is not only a new hotspot
in artificial intelligence research, but also a new way of thinking for the development
of artificial intelligence and a new result of the transformation of methodology. Its
research results include artificial neural networks, evolutionary algorithms, artificial
immune systems, fuzzy logic, quantum computing, and complex adaptive systems.
Nature-inspired computation can solve many complex problems that are difficult to solve
with traditional computing methods. It has good application prospects in solving
large-scale complex optimization problems, intelligent control, and computer network
security. This section focuses on evolutionary algorithms and artificial immune systems.

1.2.2 Evolutionary algorithm

Evolutionary computation is a kind of adaptive artificial intelligence technique that
simulates the process and mechanism of biological evolution to solve problems. The
core idea comes from a basic understanding that the process of evolution from simple to
complex and from low level to high level is a natural, parallel, and robust optimization
process. The goal of this process is to achieve optimization through adaptation to the
environment, the “survival of the fittest,” and genetic variation of the biological


Evolutionary algorithms (EAs) are a kind of random search technique based on the above
ideas. They simulate the learning process of a group of individuals, each of which
represents a point in the search space of a given problem. An evolutionary algorithm
starts from selected initial solutions and gradually improves the current solutions
through an iterative evolutionary process until the best solution or a satisfactory
solution is found. In the course of evolution, the algorithm applies operations similar
to natural selection and sexual reproduction to a set of solutions, generating
next-generation solutions with better performance indicators on the basis of the
inherited superior genes.
The general steps for solving an optimization problem using an evolutionary algorithm are
as follows:
(1) Generate a set of initial solutions randomly;
(2) Evaluate the performance of the current set of solutions;
(3) If the current solutions satisfy the requirements or the evolution process reaches a
given number of generations, terminate the calculation;
(4) According to the evaluation result of (2), select a certain number of solutions from the
current solutions as the objects of genetic operations;
(5) Perform genetic operations on the selected solutions, such as crossover and mutation,
to get a new set of solutions. Then go to (2).
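The five steps above can be sketched as a small genetic algorithm. Everything specific here is an assumption made for illustration: the bit-string encoding, the toy fitness function (the number of ones), binary tournament selection, one-point crossover, bit-flip mutation, and keeping the best individual unchanged in each generation (elitism).

```python
import random


def fitness(bits):
    """Hypothetical objective: the number of ones in the bit string."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, max_generations=100, p_mut=0.02, seed=1):
    rng = random.Random(seed)
    # (1) Generate a set of initial solutions randomly.
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(max_generations):
        # (2) Evaluate the current set of solutions.
        population.sort(key=fitness, reverse=True)
        best = population[0]
        history.append(fitness(best))
        # (3) Terminate if the requirement is met (here: all bits are one).
        if fitness(best) == n_bits:
            break

        # (4) Select parents by binary tournament.
        def tournament():
            a, b = rng.sample(population, 2)
            return a if fitness(a) >= fitness(b) else b

        # (5) Crossover and mutation produce the next set of solutions.
        children = [best[:]]                        # elitism: keep the best as is
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = children                       # then go back to (2)
    return max(population, key=fitness), history


best, history = evolve()
```

Because the best individual is always copied into the next generation, the recorded best fitness never decreases from one generation to the next.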
The commonly used search methods fall into three categories: enumeration, analytical
methods, and randomization. Enumeration lists all feasible solutions within the feasible
set in order to find the optimal solution. A continuous function must first be
discretized. However, many practical problems correspond to a large search space, so
this method is very inefficient. The analytical method mainly uses properties of the
objective function in the solution process, such as the first derivative and the second
derivative. It can be divided into direct and indirect methods. The direct method
determines the next search direction from the gradient of the objective function, so it
has difficulty finding the global optimal solution, while the indirect method derives a
set of equations from the necessary conditions for extreme values and then solves that
system of equations. However, the derived equations are generally nonlinear and very
difficult to solve. The random method introduces random changes to the search direction
during the search, which lets the algorithm jump out of local extreme points with a
greater probability. Randomization can be further divided into blind randomization and
guided randomization. The former randomly selects different points in the feasible
solution space for testing; the latter changes the current search direction with a
certain probability and searches in
other directions.
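The contrast between a direct (gradient-based) method and a random method can be sketched on a one-dimensional function with two minima. The function, the starting point, the step sizes, and the simple accept-if-better random perturbation rule are all assumptions chosen for illustration; the rule is one basic form of randomized search, not a specific algorithm from the text.

```python
import random


def f(x):
    """Test function with a local minimum near x = 1.13 and a lower one near x = -1.3."""
    return x ** 4 - 3.0 * x ** 2 + x


def df(x):
    """First derivative of f, used by the direct method."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def gradient_descent(x, alpha=0.01, steps=500):
    """Direct method: always move along the negative gradient."""
    for _ in range(steps):
        x -= alpha * df(x)
    return x


def random_perturbation_search(x, sigma=1.0, steps=2000, seed=0):
    """Random method: try a random jump and keep it only if f improves."""
    rng = random.Random(seed)
    for _ in range(steps):
        candidate = x + rng.gauss(0.0, sigma)
        if f(candidate) < f(x):
            x = candidate
    return x


x_gd = gradient_descent(2.0)                 # settles in the nearby local minimum
x_rand = random_perturbation_search(x_gd)    # random jumps can leave that basin
```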
EAs are random search methods that use randomness in the generation of the initial
solutions and in genetic operations such as selection, crossover, and mutation. Compared
with traditional search algorithms, they have the following characteristics:
(1) EAs do not act directly on the solution space, but use some kind of encoded
representation of the solution.
(2) EAs start from a group of multiple points rather than one point, which is one of the
main reasons why they can find the global optimal solution with a large probability.
(3) EAs use only the fitness information of a solution (i.e., the value of the objective
function) and trade off increasing gain against reducing overhead, while traditional
search algorithms typically use derivatives.
(4) EAs use stochastic transition rules rather than deterministic transition rules.
In addition, the main features of EAs compared with traditional algorithms are reflected
in the following two aspects.
Intelligence: The intelligence of EAs includes self-organization, self-adaptation, and
self-learning. Once the coding scheme, fitness function, and genetic operators are
determined, an EA uses the information obtained during evolution to organize the search
by itself. This intelligent feature also gives EAs the ability to automatically discover
the characteristics and laws of the environment as the environment changes.
Essential parallelism: The essential parallelism of EAs is manifested in two aspects. The
first is inherent parallelism: an EA is by nature well suited to massively parallel
implementation. The second is implicit parallelism: an EA searches with a population, so
it can search multiple areas of the solution space at the same time and exchange
information among them.
The EAs currently studied are mainly divided into four types [1-11]: genetic algorithms
(GAs), evolutionary programming (EP), evolution strategy (ES), and genetic programming
(GP). The first three were developed independently of each other, and the last
is a branch developed on the basis of the genetic algorithm. Although these branches
differ subtly in implementation, they share a common
feature: they all rely on the ideas and principles of biological evolution to solve
practical problems.
Evolutionary computation is the product of multidisciplinary integration and infiltration. It
has developed into a comprehensive self-organizing and self-adaptive technology,
which has been widely used in computer science, engineering technology, management
science, and social science. At present, the research into evolutionary computation mainly
focuses on basic theory, function optimization, combinatorial optimization, classification
system, parallel evolutionary algorithm, image processing, evolutionary neural network,
and artificial life.

1.2.3 Artificial immune system (AIS)

The artificial immune system (AIS), inspired by immunology, is an adaptive system that
solves complex problems by simulating immune functions, principles, and models [12]. As
early as the mid-1980s, Farmer et al. [13] took the lead in providing a dynamic model of
the immune system based on immune network theory and discussed the relationship between
the immune system and artificial intelligence methods, which opened up research on
artificial immune systems. However, few research findings followed. It was not until
December 1996, at an international symposium on immune systems held in Japan, that the
concept of the “artificial immune system” was first proposed. Subsequently, research on
the artificial immune system developed rapidly, and the related papers and research
results increased year by year. In 1997 and 1998, the IEEE Systems, Man and Cybernetics
International Conference organized related topic discussions and established the
“Artificial Immune System Memory Application Branch.” The topic of artificial immune
systems was then taken up in turn at well-known international conferences in the field of
artificial intelligence, such as the International Joint Conference on Artificial
Intelligence (IJCAI), the International Joint Conference on Neural Networks (IJCNN), the
IEEE Congress on Evolutionary Computation (CEC), and the Genetic and Evolutionary
Computation Conference (GECCO). Since 2002, six consecutive international
conferences on artificial immune systems have been held in the United Kingdom, Italy,
Canada, and Brazil. After more than a decade of development, research into artificial
immune system algorithms has focused on the negative selection algorithm [14], the clonal
selection algorithm [15], and the immune network algorithm [16], and the research results
mainly relate to anomaly detection, computer security, data mining, and optimization.
An organism is a large, complex system whose information processing is carried out by
three subsystems that operate on different temporal and spatial scales: the brain and
nervous system, the immune system, and the endocrine system. The immune system,
consisting of immune organs, tissues, cells, immune effector molecules, and related genes,
is a necessary defense mechanism for organisms, especially vertebrates, and protects the
body against the invasion of pathogens, harmful foreign bodies, cancer cells, and
pathogenic factors [13]. Its functions mainly include immune defense, immune stability,
and immune surveillance. From the perspective of engineering applications and information
processing, biological immune systems provide many information-processing mechanisms
for artificial intelligence. Recognizing the richness of these mechanisms, Farmer et al. took
the lead in giving a dynamic model of the immune system based on immune network
theory and discussed the relationship between the immune system and other artificial
intelligence methods, which began the research into the artificial immune system [13].
The artificial immune system is an intelligent method that imitates the natural immune
system. It implements a learning technique inspired by the biological immune system and
its natural defense against foreign substances, and it offers noise tolerance, unsupervised
learning, self-organization, and memory. Combining some of the advantages of classifiers,
neural networks, and machine inference, the artificial immune system has the potential to
provide novel solutions to problems. Its research results involve many fields, such as
control, mathematical processing, optimization learning, and fault diagnosis. It has become
another research hotspot of artificial intelligence following neural networks, fuzzy logic,
and evolutionary computation. Although the artificial immune system has received
increasing attention from researchers, research on it remains at a relatively early stage
compared with artificial neural networks, whose methods and models are more mature,
whether in the understanding of immune mechanisms, the construction of immune
algorithms, or engineering applications.
Research into the artificial immune system focuses on three aspects: artificial immune
system models, artificial immune system algorithms, and applications of the artificial
immune system. This book focuses on the research and applications of immune
optimization algorithms. Among the research results on the artificial immune system,
immune computation aimed at solving optimization problems has attracted the attention of
many researchers. Representative results include the clonal selection algorithm proposed
by de Castro et al. [15], the B-cell algorithm proposed by Timmis et al. [16], the immune
network algorithm proposed by de Castro et al. [17], the vaccine-based immune algorithm
proposed by Jiao et al. [18], the immune optimization algorithm (opt-IA) proposed by
Cutello et al. [19], and a series of advanced clonal selection algorithms. Many scholars
have taken great interest in these studies, proposed a series of improved algorithms, and
conducted extensive research on the application of these algorithms.

1.2.4 Other methods

In addition, research into NIC also includes quantum computation (QC) and the complex
adaptive system (CAS), etc.
The study of quantum computing began in 1982. Quantum computing was first viewed as a
physical process by Richard Feynman, a Nobel laureate in physics, and has now become
one of the frontier disciplines closely followed by countries around the world. The
parallelism, exponential storage capacity, and exponential speedup of quantum computing
demonstrate its powerful computational capabilities [20,21]. In 1994, Peter Shor proposed
a quantum algorithm for factoring large integers into their prime factors,
which would take only a few minutes to solve the RSA-129 problem (an instance of a public
key cryptosystem), a task that required 1600 classical computers working for 250 days.
RSA is widely regarded as one of the safest public key systems and is considered
impractical to break with classical computers, but it could be easily broken by a quantum
computer [22]. In 1996, Grover
proposed a quantum search algorithm that can replace approximately $3.5 \times 10^{16}$
steps of a classical computer with only 200 million steps for deciphering the widely used
56-bit Data Encryption Standard (DES, a cipher used to protect interbank and other
financial transactions), showing that quantum computers are a factor of $O(\sqrt{N})$
faster than classical computers in exhaustive search problems over $N$ candidates [23].
At present, quantum computing has been successfully applied in the
fields of secure communication, cryptographic systems, and database search. The United
States developed a prototype quantum computer as early as 1999. Computing experts
predict that, as the open problems in quantum computer research are solved, this century
will see the emergence and application of a quantum computer that is 1000 times faster
than electronic technology.
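The square-root speedup of Grover's search can be illustrated with a small classical
simulation. The sketch below is not taken from the cited work: the function names
grover_search and des_step_estimates are hypothetical, and reading the DES figures quoted
above as roughly half of the 2^56 key space (classical average) and about (pi/4) * sqrt(2^56)
Grover iterations is an assumption made here for illustration.

```python
import math


def grover_search(n_items, marked):
    """Simulate Grover's search over n_items basis states with real amplitudes.

    Returns (iterations used, probability of measuring the marked state).
    """
    # Uniform superposition: every basis state starts with amplitude 1/sqrt(N).
    amps = [1.0 / math.sqrt(n_items)] * n_items
    # About (pi/4) * sqrt(N) iterations bring the marked probability close to 1.
    iterations = int(math.pi / 4.0 * math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked state's amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / n_items
        amps = [2.0 * mean - a for a in amps]
    return iterations, amps[marked] ** 2


def des_step_estimates(key_bits=56):
    """Rough step counts for exhaustive key search (assumed interpretation)."""
    keyspace = 2 ** key_bits
    classical_avg = keyspace / 2                  # half the keys on average
    grover = math.pi / 4.0 * math.sqrt(keyspace)  # about (pi/4) * sqrt(N)
    return classical_avg, grover


if __name__ == "__main__":
    iters, p_success = grover_search(1024, marked=123)
    print("iterations:", iters, "success probability:", p_success)
    print("DES estimates (classical, Grover):", des_step_estimates())
```

For 1024 items the simulation needs 25 iterations rather than several hundred classical
probes, and the two DES estimates come out near 3.6 * 10^16 and 2.1 * 10^8, the same orders
of magnitude as the figures quoted in the text.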
Quantum algorithms are related to classical algorithms, but their most essential features
are the use of the superposition and coherence of quantum states, as well as the
entanglement between quantum bits. They are the product of quantum mechanics in the
field of algorithms, and quantum parallelism is their most essential difference from
classical algorithms [24, 25]. In a probabilistic algorithm, the system is no longer in a
fixed state; instead, the state probability vector assigns a probability to each possible
state. If the initial state probability vector and the state transition matrix are known,
the probability vector at any time can be obtained by repeatedly multiplying the state
probability vector by the state transition matrix [26]. A quantum algorithm is similar,
except that it works with the probability amplitudes of the quantum states, whose squared
magnitudes are normalized, so that a probability amplitude can be $\sqrt{N}$ times larger
than the corresponding classical probability. The state transition is carried out by
operations such as the Walsh-Hadamard transform and the phase rotation [27].
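The contrast between the two kinds of state vector can be made concrete with a short
sketch. It is only an illustration under stated assumptions: the helper names step, evolve,
and walsh_hadamard are hypothetical, the 2 x 2 transition matrix is an arbitrary example,
and the transition matrix is taken to be row-stochastic, so the update is p'[j] = sum over i
of p[i] * T[i][j].

```python
import math


def step(p, T):
    """One classical step: multiply the probability vector by the transition matrix."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(len(T[0]))]


def evolve(p0, T, steps):
    """Apply the transition matrix `steps` times to the initial probability vector."""
    p = list(p0)
    for _ in range(steps):
        p = step(p, T)
    return p


def walsh_hadamard(amps):
    """Normalized Walsh-Hadamard transform of an amplitude vector of length 2^k."""
    n = len(amps)
    out = list(amps)
    h = 1
    while h < n:
        # Butterfly pass: combine entries that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # keeps the sum of squared amplitudes equal to 1
    return [x * scale for x in out]


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.5, 0.5]]                      # arbitrary example matrix
    print(evolve([1.0, 0.0], T, 2))                   # probabilities still sum to 1
    print(walsh_hadamard([1.0, 0.0, 0.0, 0.0]))       # uniform amplitudes over 4 states
```

With N = 4 states, the transform of the first basis state gives every state the amplitude
1/2, whereas a uniform classical distribution gives every state the probability 1/4; the
ratio is sqrt(4) = 2, which is the sqrt(N) relation described above.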
The complex adaptive system (CAS), proposed by Professor Holland in his research on
complex systems at the Santa Fe Institute (SFI), consists of networks of parallel,
interacting agents [28, 29]. Such systems include the human brain, the immune system,
ecosystems, cells, ant colonies, political parties, and organizations in human society. The
basic idea of a complex adaptive system is that the individuals (elements) of the system,
called agents [30], have their own purpose and initiative and are active and adaptive.
Agents can “learn” and “accumulate experience” in the ongoing interaction with the
environment and other agents so that they can change their structure and behavior based
on learned “experiences.” It is this initiative, together with the repeated interaction of
agents with the environment and with other agents, through which they constantly change
themselves and the environment, that is the basic driving force for system development
and evolution. The evolution of the entire system, including the emergence and
