Ke-Lin Du · M.N.S. Swamy

Search and Optimization by Metaheuristics
Techniques and Algorithms Inspired by Nature

Ke-Lin Du
Xonlink Inc
Ningbo, Zhejiang, China
and
Department of Electrical and Computer Engineering
Concordia University
Montreal, QC, Canada

M.N.S. Swamy
Department of Electrical and Computer Engineering
Concordia University
Montreal, QC, Canada

ISBN 978-3-319-41191-0 ISBN 978-3-319-41192-7 (eBook)


DOI 10.1007/978-3-319-41192-7
Library of Congress Control Number: 2016943857

Mathematics Subject Classification (2010): 49-04, 68T20, 68W15

© Springer International Publishing Switzerland 2016


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This book is published under the trade name Birkhäuser


The registered company is Springer International Publishing AG Switzerland
(www.birkhauser-science.com)
To My Friends Jiabin Lu and Biaobiao Zhang
Ke-Lin Du

and

To My Parents
M.N.S. Swamy
Preface

Optimization is a branch of applied mathematics and numerical analysis. Almost
every problem in engineering, science, economics, and life can be formulated as an
optimization or a search problem. While some of these problems are simple enough
to be solved by traditional optimization methods based on mathematical analysis,
most of them are very hard to solve using analysis-based approaches. Fortunately,
we can solve these hard optimization problems by drawing inspiration from nature,
since nature is a system of vast complexity that consistently generates near-optimum
solutions.
Natural computing is concerned with computing inspired by nature, as well as
with computations taking place in nature. Well-known examples of natural com-
puting are evolutionary computation, neural computation, cellular automata, swarm
intelligence, molecular computing, quantum computation, artificial immune sys-
tems, and membrane computing. Together, they constitute the discipline of com-
putational intelligence.
Among all the nature-inspired computational paradigms, evolutionary computation
is the most influential. It is a computational method for obtaining the best possible
solutions in a huge solution space, based on Darwin's survival-of-the-fittest
principle. Evolutionary algorithms are a class of effective global optimization
techniques for many hard problems.
More and more biologically inspired methods have been proposed in the past
two decades. The most prominent ones are particle swarm optimization, ant colony
optimization, and the immune algorithm. These methods are widely used due to their
particular features compared with evolutionary computation. All these biologically
inspired methods are population-based. Computation is performed by autonomous
agents, and these agents exchange information through social behaviors. The memetic
algorithm models the behavior of knowledge propagation among animals.
There are also many other nature-inspired metaheuristics for search and opti-
mization. These include methods inspired by physical laws, chemical reaction,
biological phenomena, social behaviors, and animal thinking.
Metaheuristics are a class of intelligent self-learning algorithms for finding
near-optimum solutions to hard optimization problems, mimicking intelligent
processes and behaviors observed from nature, sociology, thinking, and other
disciplines. Metaheuristics may be nature-inspired paradigms, stochastic, or

probabilistic algorithms. Metaheuristics-based search and optimization are widely
used for fully automated decision-making and problem-solving.
In this book, we provide a comprehensive introduction to nature-inspired
metaheuristic methods for search and optimization. While each metaheuristic
method has its specific strengths for particular cases, according to the no-free-lunch
theorem its performance, averaged over the entire set of search and optimization
problems, is the same as that of random search. Thus, any claim about the
performance of an optimization method is actually based on benchmarking examples
that are representative of some particular class of problems.
This book is intended as an accessible introduction to metaheuristic optimization
for a broad audience. It provides an understanding of some fundamental insights on
metaheuristic optimization, and serves as a helpful starting point for those interested
in more in-depth studies of metaheuristic optimization. The computational para-
digms described in this book are of general purpose in nature. This book can be
used as a textbook for advanced undergraduate students and graduate students. All
those interested in search and optimization can benefit from this book. Readers
interested in a particular topic will benefit from the appropriate chapter.
A roadmap for navigating through the book is given as follows. Except for the
introductory Chapter 1, the contents of the book can be broadly divided into five
categories and an appendix.
• Evolution-based approach is covered in Chapters 3–8:
Chapter 3. Genetic Algorithms
Chapter 4. Genetic Programming
Chapter 5. Evolutionary Strategies
Chapter 6. Differential Evolution
Chapter 7. Estimation of Distribution Algorithms
Chapter 8. Topics in Evolutionary Algorithms
• Swarm intelligence-based approach is covered in Chapters 9–15:
Chapter 9. Particle Swarm Optimization
Chapter 10. Artificial Immune Systems
Chapter 11. Ant Colony Optimization
Chapter 12. Bee Metaheuristics
Chapter 13. Bacterial Foraging Algorithm
Chapter 14. Harmony Search
Chapter 15. Swarm Intelligence
• Sciences-based approach is covered in Chapters 2, 16–18:
Chapter 2. Simulated Annealing
Chapter 16. Biomolecular Computing
Chapter 17. Quantum Computing
Chapter 18. Metaheuristics Based on Sciences
• Human-based approach is covered in Chapters 19–21:
Chapter 19. Memetic Algorithms
Chapter 20. Tabu Search and Scatter Search
Chapter 21. Search Based on Human Behaviors
• General optimization problems are treated in Chapters 22–23:
Chapter 22. Dynamic, Multimodal, and Constrained Optimizations
Chapter 23. Multiobjective Optimization
• The appendix contains auxiliary benchmarks helpful to test new and existing
algorithms.
In this book, hundreds of different metaheuristic methods are introduced.
However, due to space limitations, detailed descriptions are given only for a large
number of the most popular metaheuristic methods. Some computational examples
for representative metaheuristic methods are given. The MATLAB codes for these
examples are available at the book website. We have also collected MATLAB
codes for some other metaheuristics. These codes are general purpose in nature;
the reader just needs to run them with his or her own objective function.
For instructors, this book has been designed to serve as a textbook for courses on
evolutionary algorithms or nature-inspired optimization. This book can be taught in
12 two-hour sessions. We recommend that Chapters 1–11, 19, 22, and 23 be
taught. In order to acquire a mastery of these popular metaheuristic algorithms,
some programming exercises using the benchmark functions given in the appendix
should be assigned to the students. The MATLAB codes provided with the book are
useful for learning the algorithms.
For readers, we suggest that you start with Chapter 1, which covers basic
concepts in optimization and metaheuristics. When you have digested the basics,
you can delve into one or more specific metaheuristic paradigms that you are
interested in or that suit your specific problems. The MATLAB codes accompanying
the book are very useful for learning those popular algorithms, and they
can be directly used for solving your specific problems. The benchmark functions
are also very useful to researchers for evaluating their own algorithms.
We would like to thank Limin Meng (Zhejiang University of Technology,
China), and Yongyao Yang (SUPCON Group Inc, China) for their consistent
help. We would like to thank all the helpful and thoughtful staff at Xonlink Inc. Last
but not least, we would like to recognize the assistance of Benjamin Levitt and the
production team at Springer.

Ningbo, China            Ke-Lin Du
Montreal, Canada         M.N.S. Swamy
Contents

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Computation Inspired by Nature . . . . . . . . . . . . . . . . . . . . . 1
1.2 Biological Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Evolution Versus Learning . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Swarm Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.1 Group Behaviors . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.2 Foraging Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Heuristics, Metaheuristics, and Hyper-Heuristics . . . . . . . . . . 9
1.6 Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6.1 Lagrange Multiplier Method . . . . . . . . . . . . . . . . . . 12
1.6.2 Direction-Based Search and Simplex Search . . . . . . . 13
1.6.3 Discrete Optimization Problems . . . . . . . . . . . . . . . 14
1.6.4 P, NP, NP-Hard, and NP-Complete . . . . . . . . . . . . . 16
1.6.5 Multiobjective Optimization Problem . . . . . . . . . . . . 17
1.6.6 Robust Optimization . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Performance Indicators. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.8 No Free Lunch Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.9 Outline of the Book. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2 Basic Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 Variants of Simulated Annealing . . . . . . . . . . . . . . . . . . . . . 33
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3 Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.1 Introduction to Evolutionary Computation . . . . . . . . . . . . . . . 37
3.1.1 Evolutionary Algorithms Versus Simulated
Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2 Terminologies of Evolutionary Computation . . . . . . . . . . . . . 39
3.3 Encoding/Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4 Selection/Reproduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Crossover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46


3.6 Mutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.7 Noncanonical Genetic Operators . . . . . . . . . . . . . . . . . . . . . 49
3.8 Exploitation Versus Exploration . . . . . . . . . . . . . . . . . . . . . 51
3.9 Two-Dimensional Genetic Algorithms . . . . . . . . . . . . . . . . . 55
3.10 Real-Coded Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . 56
3.11 Genetic Algorithms for Sequence Optimization . . . . . . . . . . . 60
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4 Genetic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.2 Syntax Trees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4.3 Causes of Bloat. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.4 Bloat Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.4.1 Limiting on Program Size . . . . . . . . . . . . . . . . . . . 77
4.4.2 Penalizing the Fitness of an Individual
with Large Size. . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.4.3 Designing Genetic Operators . . . . . . . . . . . . . . . . . 77
4.5 Gene Expression Programming . . . . . . . . . . . . . . . . . . . . . . 78
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5 Evolutionary Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Basic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3 Evolutionary Gradient Search and Gradient Evolution . . . . . . 85
5.4 CMA Evolutionary Strategies . . . . . . . . . . . . . . . . . . . . . . . 88
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6 Differential Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2 DE Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.3 Variants of DE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.4 Binary DE Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.5 Theoretical Analysis on DE . . . . . . . . . . . . . . . . . . . . . . . . . 100
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
7 Estimation of Distribution Algorithms . . . . . . . . . . . . . . . . . . . . . 105
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
7.2 EDA Flowchart. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.3 Population-Based Incremental Learning . . . . . . . . . . . . . . . . 108
7.4 Compact Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . 110
7.5 Bayesian Optimization Algorithm . . . . . . . . . . . . . . . . . . . . 112
7.6 Convergence Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
7.7 Other EDAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.7.1 Probabilistic Model Building GP. . . . . . . . . . . . . . . 115
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

8 Topics in Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . 121
8.1 Convergence of Evolutionary Algorithms . . . . . . . . . . . . . . . . 121
8.1.1 Schema Theorem and Building-Block Hypothesis . . . 121
8.1.2 Finite and Infinite Population Models . . . . . . . . . . . 123
8.2 Random Problems and Deceptive Functions . . . . . . . . . . . . . 125
8.3 Parallel Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . 127
8.3.1 Master–Slave Model . . . . . . . . . . . . . . . . . . . . . . . 129
8.3.2 Island Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.3.3 Cellular EAs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
8.3.4 Cooperative Coevolution . . . . . . . . . . . . . . . . . . . . 133
8.3.5 Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.3.6 GPU Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.4 Coevolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.4.1 Coevolutionary Approaches . . . . . . . . . . . . . . . . . . 137
8.4.2 Coevolutionary Approach for Minimax
Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.5 Interactive Evolutionary Computation . . . . . . . . . . . . . . . . . 139
8.6 Fitness Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.7 Other Heredity-Based Algorithms . . . . . . . . . . . . . . . . . . . . 141
8.8 Application: Optimizing Neural Networks . . . . . . . . . . . . . . 142
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
9 Particle Swarm Optimization. . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.2 Basic PSO Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
9.2.1 Bare-Bones PSO . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.2.2 PSO Variants Using Gaussian or Cauchy
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.2.3 Stability Analysis of PSO. . . . . . . . . . . . . . . . . . . . 157
9.3 PSO Variants Using Different Neighborhood Topologies . . . . 159
9.4 Other PSO Variants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.5 PSO and EAs: Hybridization . . . . . . . . . . . . . . . . . . . . . . . 164
9.6 Discrete PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.7 Multi-swarm PSOs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
10 Artificial Immune Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
10.2 Immunological Theories . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
10.3 Immune Algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
10.3.1 Clonal Selection Algorithm . . . . . . . . . . . . . . . . . . 180
10.3.2 Artificial Immune Network. . . . . . . . . . . . . . . . . . . 184
10.3.3 Negative Selection Algorithm . . . . . . . . . . . . . . . . . 185
10.3.4 Dendritic Cell Algorithm . . . . . . . . . . . . . . . . . . . . 186
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

11 Ant Colony Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
11.2 Ant-Colony Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 192
11.2.1 Basic ACO Algorithm . . . . . . . . . . . . . . . . . . . . . . 194
11.2.2 ACO for Continuous Optimization . . . . . . . . . . . . . 195
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
12 Bee Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
12.2 Artificial Bee Colony Algorithm . . . . . . . . . . . . . . . . . . . . . 203
12.2.1 Algorithm Flowchart . . . . . . . . . . . . . . . . . . . . . . . 203
12.2.2 Modifications on ABC Algorithm . . . . . . . . . . . . . . 207
12.2.3 Discrete ABC Algorithms. . . . . . . . . . . . . . . . . . . . 208
12.3 Marriage in Honeybees Optimization . . . . . . . . . . . . . . . . . . 209
12.4 Bee Colony Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 210
12.5 Other Bee Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
12.5.1 Wasp Swarm Optimization . . . . . . . . . . . . . . . . . . . 212
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
13 Bacterial Foraging Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 217
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
13.2 Bacterial Foraging Algorithm . . . . . . . . . . . . . . . . . . . . . . . 219
13.3 Algorithms Inspired by Molds, Algae, and Tumor Cells . . . . . 222
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
14 Harmony Search. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
14.2 Harmony Search Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 228
14.3 Variants of Harmony Search . . . . . . . . . . . . . . . . . . . . . . . . 230
14.4 Melody Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
15 Swarm Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
15.1 Glowworm-Based Optimization. . . . . . . . . . . . . . . . . . . . . . 237
15.1.1 Glowworm Swarm Optimization . . . . . . . . . . . . . . . 238
15.1.2 Firefly Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 239
15.2 Group Search Optimization. . . . . . . . . . . . . . . . . . . . . . . . . 240
15.3 Shuffled Frog Leaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
15.4 Collective Animal Search . . . . . . . . . . . . . . . . . . . . . . . . . . 242
15.5 Cuckoo Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
15.6 Bat Algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246
15.7 Swarm Intelligence Inspired by Animal Behaviors. . . . . . . . . 247
15.7.1 Social Spider Optimization . . . . . . . . . . . . . . . . . . . 247
15.7.2 Fish Swarm Optimization . . . . . . . . . . . . . . . . . . . . 249
15.7.3 Krill Herd Algorithm . . . . . . . . . . . . . . . . . . . . . . . 250
15.7.4 Cockroach-Based Optimization . . . . . . . . . . . . . . . . 251
15.7.5 Seven-Spot Ladybird Optimization . . . . . . . . . . . . . 252

15.7.6 Monkey-Inspired Optimization . . . . . . . . . . . . . . . . 252


15.7.7 Migrating-Based Algorithms . . . . . . . . . . . . . . . . . . 253
15.7.8 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
15.8 Plant-Based Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . . 255
15.9 Other Swarm Intelligence-Based Metaheuristics. . . . . . . . . . . 257
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
16 Biomolecular Computing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
16.1.1 Biochemical Networks . . . . . . . . . . . . . . . . . . . . . . 267
16.2 DNA Computing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
16.2.1 DNA Data Embedding. . . . . . . . . . . . . . . . . . . . . . 271
16.3 Membrane Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
16.3.1 Cell-Like P System . . . . . . . . . . . . . . . . . . . . . . . . 272
16.3.2 Computing by P System . . . . . . . . . . . . . . . . . . . . 273
16.3.3 Other P Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 275
16.3.4 Membrane-Based Optimization . . . . . . . . . . . . . . . . 277
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
17 Quantum Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
17.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
17.2 Fundamentals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
17.2.1 Grover's Search Algorithm . . . . . . . . . . . . . . . . . . . 286
17.3 Hybrid Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
17.3.1 Quantum-Inspired EAs. . . . . . . . . . . . . . . . . . . . . . 287
17.3.2 Other Quantum-Inspired Hybrid Algorithms . . . . . . . 290
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
18 Metaheuristics Based on Sciences . . . . . . . . . . . . . . . . . . . . . . . . 295
18.1 Search Based on Newton's Laws . . . . . . . . . . . . . . . . . . . . . 295
18.2 Search Based on Electromagnetic Laws . . . . . . . . . . . . . . . . 297
18.3 Search Based on Thermal-Energy Principles . . . . . . . . . . . . . 298
18.4 Search Based on Natural Phenomena . . . . . . . . . . . . . . . . . . 299
18.4.1 Search Based on Water Flows . . . . . . . . . . . . . . . . 299
18.4.2 Search Based on Cosmology . . . . . . . . . . . . . . . . . 301
18.4.3 Black Hole-Based Optimization . . . . . . . . . . . . . . . 302
18.5 Sorting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
18.6 Algorithmic Chemistries. . . . . . . . . . . . . . . . . . . . . . . . . . . 304
18.6.1 Chemical Reaction Optimization . . . . . . . . . . . . . . . 304
18.7 Biogeography-Based Optimization. . . . . . . . . . . . . . . . . . . . 306
18.8 Methods Based on Mathematical Concepts . . . . . . . . . . . . . . 309
18.8.1 Opposition-Based Learning. . . . . . . . . . . . . . . . . . . 310
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
19 Memetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
19.2 Cultural Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

19.3 Memetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318


19.3.1 Simplex-based Memetic Algorithms. . . . . . . . . . . . . 320
19.4 Application: Searching Low Autocorrelation Sequences . . . . . 321
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
20 Tabu Search and Scatter Search . . . . . . . . . . . . . . . . . . . . . . . . . 327
20.1 Tabu Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
20.1.1 Iterative Tabu Search . . . . . . . . . . . . . . . . . . . . . . . 330
20.2 Scatter Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
20.3 Path Relinking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
21 Search Based on Human Behaviors . . . . . . . . . . . . . . . . . . . . . . . 337
21.1 Seeker Optimization Algorithm . . . . . . . . . . . . . . . . . . . . . . 337
21.2 Teaching–Learning-Based Optimization . . . . . . . . . . . . . . . . 338
21.3 Imperialist Competitive Algorithm. . . . . . . . . . . . . . . . . . . . 340
21.4 Several Metaheuristics Inspired by Human Behaviors . . . . . . 342
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
22 Dynamic, Multimodal, and Constrained Optimizations . . . . . . . . . 347
22.1 Dynamic Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
22.1.1 Memory Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . 348
22.1.2 Diversity Maintaining or Reinforcing . . . . . . . . . . . . 348
22.1.3 Multiple Population Scheme . . . . . . . . . . . . . . . . . . 349
22.2 Multimodal Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 350
22.2.1 Crowding and Restricted Tournament Selection . . . . 351
22.2.2 Fitness Sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
22.2.3 Speciation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
22.2.4 Clearing, Local Selection, and Demes . . . . . . . . . . . 356
22.2.5 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
22.2.6 Metrics for Multimodal Optimization . . . . . . . . . . . . 359
22.3 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 359
22.3.1 Penalty Function Method . . . . . . . . . . . . . . . . . . . . 360
22.3.2 Using Multiobjective Optimization Techniques . . . . . 363
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
23 Multiobjective Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
23.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
23.2 Multiobjective Evolutionary Algorithms . . . . . . . . . . . . . . . . 373
23.2.1 Nondominated Sorting Genetic Algorithm II. . . . . . . 374
23.2.2 Strength Pareto Evolutionary Algorithm 2 . . . . . . . . 377
23.2.3 Pareto Archived Evolution Strategy (PAES) . . . . . . . 378
23.2.4 Pareto Envelope-Based Selection Algorithm . . . . . . . 379
23.2.5 MOEA Based on Decomposition (MOEA/D) . . . . . . 380
23.2.6 Several MOEAs . . . . . . . . . . . . . . . . . . . . . . . . . . 381

23.2.7 Nondominated Sorting . . . . . . . . . . . . . . . . . . .... 384


23.2.8 Multiobjective Optimization
Based on Differential Evolution . . . . . . . . . . . . . . . 385
23.3 Performance Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
23.4 Many-Objective Optimization . . . . . . . . . . . . . . . . . . . . . . . 389
23.4.1 Challenges in Many-Objective Optimization . . . . . . . 389
23.4.2 Pareto-Based Algorithms . . . . . . . . . . . . . . . . . . . . 391
23.4.3 Decomposition-Based Algorithms . . . . . . . . . . . . . . 393
23.5 Multiobjective Immune Algorithms . . . . . . . . . . . . . . . . . . . 394
23.6 Multiobjective PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
23.7 Multiobjective EDAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
23.8 Tabu/Scatter Search Based Multiobjective Optimization . . . . . 399
23.9 Other Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
23.10 Coevolutionary MOEAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403

Appendix A: Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
Abbreviations

Ab Antibody
ABC Artificial bee colony
AbYSS Archive-based hybrid scatter search
ACO Ant colony optimization
ADF Automatically defined function
AI Artificial intelligence
aiNet Artificial immune network
AIS Artificial immune system
BBO Biogeography-based optimization
BFA Bacterial foraging algorithm
BMOA Bayesian multiobjective optimization algorithm
CCEA Cooperative coevolutionary algorithm
cGA Compact GA
CLONALG Clonal selection algorithm
CMA Covariance matrix adaptation
C-MOGA Cellular multiobjective GA
COMIT Combining optimizers with mutual information trees algorithm
COP Combinatorial optimization problem
CRO Chemical reaction optimization
CUDA Compute unified device architecture
DE Differential evolution
DEMO DE for multiobjective optimization
DMOPSO Dynamic population multiple-swarm multiobjective PSO
DNA Deoxyribonucleic acid
DOP Dynamic optimization problem
DSMOPSO Dynamic multiple swarms in multiobjective PSO
DT-MEDA Decision-tree-based multiobjective EDA
EA Evolutionary algorithms
EASEA Easy specification of EA
EBNA Estimation of Bayesian networks algorithm
EDA Estimation of distribution algorithm
EGNA Estimation of Gaussian networks algorithm
ELSA Evolutionary local selection algorithm


EPUS-PSO Efficient population utilization strategy for PSO


ES Evolution strategy
FDR-PSO Fitness-distance-ratio-based PSO
G3 Generalized generation gap
GA Genetic algorithm
GEP Gene expression programming
GP Genetic programming
GPU Graphics processing unit
HypE Hypervolume-based algorithm
IDCMA Immune dominance clonal multiobjective algorithm
IDEA Iterated density-estimation EA
IEC Interactive evolutionary computation
IMOEA Incrementing MOEA
IMOGA Incremental multiple-objective GA
LABS Low autocorrelation binary sequences
LCSS Longest common subsequence
LDWPSO Linearly decreasing weight PSO
LMI Linear matrix inequality
MCMC Markov chain Monte Carlo
meCGA Multiobjective extended compact GA
MIMD Multiple instruction multiple data
MIMIC Mutual information maximization for input clustering
MISA Multiobjective immune system algorithm
MOEA/D MOEA based on decomposition
MOGA Multiobjective GA
MOGLS Multiple-objective genetic local search
mohBOA Multiobjective hierarchical BOA
MOP Multiobjective optimization problem
moPGA Multiobjective parameterless GA
MPMO Multiple populations for multiple objectives
MST Minimum spanning tree
MTSP Multiple traveling salesmen problem
NetKeys Network random keys
NMR Nuclear magnetic resonance
NNIA Nondominated neighbor immune algorithm
NPGA Niched-Pareto GA
NSGA Nondominated sorting GA
opt-aiNet Optimized aiNet
PAES Pareto archived ES
PBIL Population-based incremental learning
PCB Printed circuit board
PCSEA Pareto corner search EA
PCX Parent-centric recombination
PICEA Preference-inspired coevolutionary algorithm
PIPE Probabilistic incremental program evolution

POLE Program optimization with linkage estimation


PSL Peak sidelobe level
PSO Particle swarm optimization
QAP Quadratic assignment problem
QSO Quantum swarm optimization
REDA Restricted Boltzmann machine-based multiobjective EDA
RM-MEDA Regularity model-based multiobjective EDA
SA Simulated annealing
SAGA Speciation adaptation GA
SAMC Stochastic approximation Monte Carlo
SDE Shift-based density estimation
SIMD Single instruction multiple data
SPEA Strength Pareto EA
SVLC Synapsing variable-length crossover
TLBO Teaching–learning-based optimization
TOPSIS Technique for order preference by similarity to ideal solution
TSP Traveling salesman problem
TVAC Time-varying acceleration coefficients
UMDA Univariate marginal distribution algorithm
UNBLOX Uniform block crossover
VEGA Vector-evaluated GA
VIV Virtual virus
1 Introduction

This chapter introduces background material on global optimization and the concept
of metaheuristics. Basic definitions of optimization, swarm intelligence, biological
processes, evolution versus learning, and the no-free-lunch theorem are described. We
hope this chapter will arouse your interest in reading the other chapters.

1.1 Computation Inspired by Nature

Artificial intelligence (AI) is an old discipline for making intelligent machines.
Search is a key concept in AI, because it serves all of its subdisciplines. In general,
the search spaces of practical problems are typically so large that they cannot be
enumerated, which rules out the use of traditional calculus-based and
enumeration-based methods. Computational intelligence paradigms were introduced for
this purpose, and their approach mainly depends on the cooperation of agents.
Optimization is the process of searching for the optimal solution. The three search
mechanisms are analytical, enumerative, and heuristic search techniques. Analytical
search is calculus-based: the search may be guided by the gradient or the
Hessian of the objective function, leading to a local minimum. Random search and
enumeration are unguided methods that simply sample or enumerate the search space
and search exhaustively for the optimal solution. Heuristic search is guided search
that in most cases produces high-quality solutions.
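To make the distinction concrete, the following minimal Python sketch (our illustration, not part of the book's MATLAB code package) contrasts a calculus-based search, which follows the gradient of the objective to the nearest local minimum, with unguided random search, which simply samples the search space; the objective function and parameter values are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative sketch: f(x) = x^4 - 3x^2 + x has two local minima,
# so a gradient-guided search can get trapped while random sampling cannot.

def f(x):
    return x**4 - 3*x**2 + x

def grad_f(x):
    return 4*x**3 - 6*x + 1

def gradient_descent(x0, lr=0.01, iters=200):
    # Calculus-based search: follows the gradient to the nearest local minimum.
    x = x0
    for _ in range(iters):
        x -= lr * grad_f(x)
    return x, f(x)

def random_search(bounds=(-2.0, 2.0), samples=200, seed=0):
    # Unguided search: samples the space and keeps the best point found.
    rng = np.random.default_rng(seed)
    xs = rng.uniform(bounds[0], bounds[1], samples)
    best = xs[np.argmin(f(xs))]
    return best, f(best)

print("gradient descent from x0 = 1.5:", gradient_descent(1.5))
print("random search over [-2, 2]:   ", random_search())
```

Starting gradient descent from x0 = 1.5 lands in the local minimum near x ≈ 1.13, whereas random sampling over [-2, 2] can also land in the basin of the global minimum near x ≈ -1.3; heuristic search tries to combine such guidance with mechanisms for escaping local minima.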
Computational intelligence is a field of AI. It investigates adaptive mechanisms
that facilitate intelligent behavior in complex environments. Unlike classical AI,
which relies on knowledge derived from human expertise, computational intelligence
depends on collected numerical data. It includes a set of nature-inspired computational
paradigms. Major subjects in computational intelligence include neural networks for
pattern recognition, fuzzy systems for reasoning under uncertainty, and evolutionary
computation for stochastic optimization search.


Nature is the primary source of inspiration for new computational paradigms. For
instance, Wiener’s cybernetics was inspired by feedback control processes observ-
able in biological systems. Changes in nature, from microscopic scale to ecological
scale, can be treated as computations. Natural processes always reach an equilibrium
that is optimal. Such analogies can be used for finding useful solutions for search
and optimization. Examples of natural computing paradigms are artificial neural
networks [43], simulated annealing (SA) [37], genetic algorithms [30], swarm intel-
ligence [22], artificial immune systems [16], DNA-based molecular computing [1],
quantum computing [28], membrane computing [51], and cellular automata (von
Neumann 1966).
From bacteria to humans, biological entities have social interaction ranging from
altruistic cooperation to conflict. Swarm intelligence borrows the idea of the collec-
tive behavior of biological population. Cooperative problem-solving is an approach
that achieves a certain goal by the cooperation of a group of autonomous enti-
ties. Cooperation mechanisms are common in agent-based computing paradigms,
whether biologically based or not. Cooperative behavior has inspired research in biology,
economics, and multi-agent systems. This approach is based on the notion of the
payoffs associated with pursuing certain strategies.
Game theory studies situations of competition and cooperation between multiple
parties. The discipline started with von Neumann's study of zero-sum games [48].
It has many applications in strategic warfare, economic and social problems, animal
behavior, and political voting.
Evolutionary computation, DNA computing, and membrane computing depend
on knowledge of the microscopic cell structure of life. Evolutionary computation
evolves a population of individuals over generations, generating offspring by
mutation and recombination and selecting the fittest to survive into each new generation.
DNA computing and membrane computing are emerging computational paradigms at the
molecular level.
Quantum computing is characterized by principles of quantum mechanics, com-
bined with computational intelligence [46]. Quantum mechanics is a mathematical
framework or set of rules for the construction of physical theories.
All effective formal behaviors can be simulated by Turing machines. For phys-
ical devices used for computational purpose, it is widely assumed that all physical
machine behaviors can be simulated by Turing machines. When a computational
model computes the same class of functions as the Turing machine, and potentially
faster, it is called a super-Turing model. Hypercomputation refers to computation
that goes beyond the Turing limit, and it is in the sense of super-Turing computation.
While Deutsch’s (1985) universal quantum computer is a super-Turing model, it is not
hypercomputational. The physicality of hypercomputational behavior is considered
in [55] from first principles, by showing that quantum theory can be reformulated in
a way that explains why physical behaviors can be regarded as computing something
in standard computational state machine sense.

1.2 Biological Processes

The deoxyribonucleic acid (DNA) is the carrier of the genetic information of organisms.
Nucleic acids are linear unbranched polymers, i.e., chain molecules, of nucleotides.
Nucleotides are divided into purines (adenine, A; guanine, G) and pyrimidines
(thymine, T; cytosine, C). The DNA is organized into a double helix structure.
Complementary nucleotides (bases) are paired with each other: A with T, and G with C.
The DNA structure is shown in Figure 1.1. The double helix, composed of phos-
phate groups (triangles) and sugar components (squares), is the backbone of the DNA
structure. The double helix is stabilized by two hydrogen bonds between A and T,
and three hydrogen bonds between G and C.
A sequence of three nucleotides is a codon or triplet. With three exceptions,
each of the 4³ = 64 codons codes for one of 20 amino acids, and synonymous codons
code for identical amino acids. Proteins are polypeptide chains consisting of the 20
amino acids. An amino acid consists of a carboxyl group and an amino group, and
amino acids differ in their remaining groups, which may also contain the hexagonal
benzene ring. The peptide bonds of the long polypeptide chains form between the
amino group of one molecule and the carboxyl group of its neighbor. Proteins are the
basic modules of all cells and are the actors of life processes. They build characteristic
three-dimensional structures, e.g., the alpha helix.
The human genome is about 3 billion base pairs long and specifies about 20,488
genes, arranged in 23 pairs of homologous chromosomes. All base pairs of the DNA
from a single human cell have an overall length of 2.6 m when unraveled and
stretched out, but are compressed in the cell nucleus to a size of about 200 µm. Locations
on these chromosomes are referred to as loci. A locus with a specific function is known
as a gene. The state of the genes is called the genotype, and the observable expression
of the genotype is called the phenotype. A genetic marker is a locus with a known DNA
sequence that can be found in each person in the general population.
The transformation from genotype to phenotype is called gene expression. In the
transcription phase, the DNA is transcribed into RNA. In the translation phase, the
RNA then directs the synthesis of proteins.
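As a toy illustration of this two-phase process (ours, not from the book), the following Python sketch treats the given DNA strand as the coding strand, transcribes it by replacing T with U, and translates the resulting RNA codon by codon using a small excerpt of the 64-entry genetic code:

```python
# Illustrative sketch: gene expression as a two-step computation.
# Transcription maps DNA to RNA (T -> U); translation reads the RNA three
# bases (one codon) at a time. Only a few entries of the codon table are shown.

CODON_TABLE = {          # partial genetic code, for illustration only
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def transcribe(dna):
    return dna.upper().replace("T", "U")

def translate(rna):
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "?")
        if aa == "STOP":
            break
        protein.append(aa)
    return protein

rna = transcribe("ATGTTTGGCAAATAA")
print(rna, "->", translate(rna))   # AUGUUUGGCAAAUAA -> ['Met', 'Phe', 'Gly', 'Lys']
```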

Figure 1.1 The DNA structure.



Figure 1.2 A gene on a chromosome (Courtesy U.S. Department of Energy, Human Genome Program).

Figure 1.2 displays a chromosome, its DNA makeup, and identifies one gene.
The genome directs the construction of a phenotype, especially because the genes
specify sequences of amino acids which, when properly folded, become proteins. The
phenotype contains the genome. It provides the environment necessary for survival,
maintenance, and replication of the genome.
Heredity is relevant to information theory as a communication process [5]. The
conservation of genomes over intervals at the geological timescale and the existence
of mutations at shorter intervals can be conciliated, assuming that genomes possess
intrinsic error-correction codes. The constraints incurred by DNA molecules result
in a nested structure. Genomic codes resemble modern codes, such as low-density
parity-check (LDPC) codes or turbocodes [5]. The high redundancy of genomes
achieves good error-correction performance by simple means. At the same time,
DNA is a cheap material.
In AI, some of the most important components are the processes of memory
formation, filtering, and pattern recognition. In biological systems, as in the human
brain, a model can be constructed of a network of neurons that fire signals with
different time sequence patterns for various input signals. The unit pulse is called an
action potential, involving a depolarization of the cell membrane and the successive
repolarization to the resting potential. The physical basis of this unit pulse is from
active transport of ions by chemical pumps [29]. The learning process is achieved by
taking into account the plasticity of the weights with which the neurons are connected
to one another. In biological nervous systems, the input data are first processed locally
and then sent to the central nervous system [33]. This preprocessing is partly to avoid
overburdening the central nervous system.
Connectionist systems (neural networks) are mainly based on a single brain-like
connectionist principle of information processing, where learning and information
exchange occur in the connections. In [36], the connectionist paradigm is
extended to integrative connectionist learning systems, which integrate in their
structure and learning algorithms principles from different hierarchical levels of
information processing in the brain, including the neuronal, genetic, and quantum levels.
Spiking neural networks are used as the basic connectionist learning model.

1.3 Evolution Versus Learning

The adaptation of creatures to their environments results from the interaction of two
processes, namely, evolution and learning. Evolution is a slow stochastic process
at the population level that determines the basic structures of a species. Evolution
operates on populations of biological entities, rather than on the individuals themselves.
At the other end, learning is a process of gradually improving an individual's adaptation
capability to its environment by tuning the structure of the individual.
Evolution is based on the Darwinian model, also called the principle of natural
selection or survival of the fittest, while learning is based on the connectionist model
of the human brain. In the Darwinian evolution, knowledge acquired by an individual
during the lifetime cannot be transferred into its genome and subsequently passed
on to the next generation. Evolutionary algorithms (EAs) are stochastic search meth-
ods that employ a search technique based on the Darwinian model, whereas neural
networks are learning methods based on the connectionist model.
Combinations of learning and evolution, embodied by evolving neural networks,
have better adaptability to a dynamic environment [39,66]. Evolution and learning
can interact in the form of the Lamarckian evolution or be based on the Baldwin
effect. Both processes use learning to accelerate evolution.
The Lamarckian strategy allows the inheritance of the acquired traits during an
individual’s life into the genetic code so that the offspring can inherit its charac-
teristics. Everything an individual learns during its life is encoded back into the
chromosome and remains in the population. Although the Lamarckian evolution is
biologically implausible, EAs as artificial biological systems can benefit from the
Lamarckian theory. Ideas and knowledge are passed from generation to generation,
and the Lamarckian theory can be used to characterize the evolution of human
cultures. The Lamarckian evolution has proved effective within computer applications.
Nevertheless, the Lamarckian strategy has been pointed out to distort the population
so that the schema theorem no longer applies [62].
The Baldwin effect is biologically more plausible. In the Baldwin effect, learning
has an indirect influence, that is, learning makes individuals adapt better to their envi-
ronments, thus increasing their reproduction probability. In effect, learning smoothes
the fitness landscape and thus facilitates evolution [27]. On the other hand, learning
has a cost, thus there is evolutionary pressure to find instinctive replacements for
learned behaviors. When a population evolves a new behavior, in the early phase,
there will be a selective pressure in favor of learning, and in the latter phase, there
will be a selective pressure in favor of instinct. Strong bias is analogous to instinct,
and weak bias is analogous to learning [60]. The Baldwin effect only alters the fitness
landscape and the basic evolutionary mechanism remains purely Darwinian. Thus,
the schema theorem still applies to the Baldwin effect [59].
A parent cannot pass its learned traits to its offspring, instead only the fitness after
learning is retained. In other words, the learned behaviors become instinctive behav-
iors in subsequent generations, and there is no direct alteration of the genotype.
The acquired traits finally come under direct genetic control after many genera-
tions, namely, genetic assimilation. The Baldwin effect is purely Darwinian, not
Lamarckian in its mechanism, although it has consequences that are similar to those
of the Lamarckian evolution [59]. A computational model of the Baldwin effect is
presented in [27].
Hybridization of EAs and local search can be based either on the Lamarckian
strategy or on the Baldwin effect. Local search corresponds to the phenotypic plas-
ticity in biological evolution. The hybrid methods based on the Lamarckian strategy
and the Baldwin effect are very successful with numerous implementations.
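A minimal sketch of such a hybrid is given below (our illustration, not code from the book): an evolutionary loop in which every individual undergoes a simple hill-climbing "learning" phase before evaluation. Setting lamarckian=True writes the refined genotype back into the population (Lamarckian strategy), while lamarckian=False keeps only the refined fitness (Baldwin effect). The test function and parameter values are arbitrary assumptions.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x**2))

def local_search(x, f, step=0.05, iters=20, rng=None):
    # Simple hill climber used as the "learning" phase.
    rng = rng or np.random.default_rng()
    best, best_f = x.copy(), f(x)
    for _ in range(iters):
        cand = best + rng.normal(0, step, size=x.size)
        if f(cand) < best_f:
            best, best_f = cand, f(cand)
    return best, best_f

def hybrid_ea(f, dim=5, pop=20, gens=50, lamarckian=True, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, size=(pop, dim))
    for _ in range(gens):
        fitness = np.empty(pop)
        for i in range(pop):
            refined, refined_f = local_search(X[i], f, rng=rng)
            fitness[i] = refined_f        # learning always improves the evaluated fitness
            if lamarckian:
                X[i] = refined            # Lamarckian: acquired traits enter the genotype
        # truncation selection plus Gaussian mutation yields the next generation
        parents = X[np.argsort(fitness)[:pop // 2]]
        children = parents + rng.normal(0, 0.1, size=parents.shape)
        X = np.vstack([parents, children])
    return min(f(x) for x in X)

print("Lamarckian:", hybrid_ea(sphere, lamarckian=True))
print("Baldwinian:", hybrid_ea(sphere, lamarckian=False))
```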

1.4 Swarm Intelligence

The definition of swarm intelligence was introduced in 1989, in the context of cellular
robotic systems [6]. Swarm intelligence is a collective intelligence of groups of
simple agents [8]. Swarm intelligence deals with collective behaviors of decentralized
and self-organized swarms, which result from the local interactions of individual
components with one another and with their environment [8]. Although there is
normally no centralized control structure dictating how individual agents should
behave, local interactions among such agents often lead to the emergence of global
behavior.
Most species of animals show social behaviors. Biological entities often engage
in a rich repertoire of social interactions that may range from altruistic cooperation
to open conflict. Well-known examples of swarms are bird flocks, herds of
quadrupeds, bacterial molds, and fish schools among vertebrates, and colonies of social
insects such as termites, ants, bees, and cockroaches, all of which perform collective behaviors.
Through flocking, individuals gain a number of advantages, such as reduced
chances of being captured by predators, the ability to follow migration routes precisely
and robustly through collective sensing, improved energy efficiency during
travel, and more opportunities for mating.
The concept of individual–organization [57] has been widely used to understand
collective behavior of animals. The principle of individual–organization indicates
that simple repeated interactions between individuals can produce complex behav-
ioral patterns at the group level [57]. The agents of these swarms behave without
supervision, and each of these agents has a stochastic behavior due to its perception of,
and also influence on, its neighborhood and the environment. The behaviors can
be accurately described in terms of individuals following simple sets of rules. The
existence of collective memory in animal groups [15] establishes that the previous
history of the group structure influences the collective behavior in future stages.
Grouping individuals often have to make rapid decisions about where to move
or what behavior to perform, in uncertain or dangerous environments. Groups are
often composed of individuals that differ with respect to their informational status,
and individuals are usually not aware of the informational state of others. Some
animal groups are based on a hierarchical structure according to a fitness principle
known as dominance. The top member of the group leads all members of that group,
e.g., in the cases of lions, monkeys, and deer. Such animal behaviors lead to stable
groups with better cohesion properties among individuals [9]. Some animals, like
birds, fish, and sheep, live in groups but have no leader. These animals have no global
knowledge about their group or environment. Instead, they move in the
environment by exchanging data with their adjacent members.
Different swarm intelligence systems have inspired several approaches, including
particle swarm optimization (PSO) [21], based on the movement of bird flocks and
fish schools; the immune algorithm, inspired by the immune systems of mammals;
bacterial foraging optimization [50], which models the chemotactic behavior of Escherichia coli;
ant colony optimization (ACO) [17], inspired by the foraging behavior of ants; and
artificial bee colony (ABC) [35], based on the foraging behavior of honeybee swarms.
Unlike EAs, which are primarily competitive within the population, PSO and
ACO adopt a more cooperative strategy. They can be treated as ontogenetic, since
the population resembles a multicellular organism optimizing its performance by
adapting to its environment.
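The cooperative character of PSO can be seen directly in its update rule, sketched below in Python (our illustration; parameter values are common defaults rather than prescriptions from the book): each particle is attracted both to its own best position and to the best position found by the swarm as a whole.

```python
import numpy as np

def pso(f, dim=2, swarm=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (swarm, dim))          # positions
    v = np.zeros((swarm, dim))                    # velocities
    pbest = x.copy()                              # each particle's own best position
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)]             # best position shared by the swarm
    for _ in range(iters):
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        v = w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x)   # cognitive + social terms
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[np.argmin(pbest_f)]
    return gbest, pbest_f.min()

print(pso(lambda p: np.sum(p**2)))   # minimizes the sphere function
```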
Many population-based metaheuristics are actually social algorithms. The cultural
algorithm [53] was introduced for modeling social evolution and learning. Ant colony
optimization is a metaheuristic inspired by ant colony behavior in finding the shortest
path to food sources. Particle swarm optimization is inspired by the social
behavior and movement dynamics of insect swarms, bird flocking, and fish schooling.
The artificial immune system is inspired by biological immune systems, and exploits
their characteristics of learning and memory to solve optimization problems. The society
and civilization method [52] utilizes the intra- and intersociety interactions within a
society and the civilization model.

1.4.1 Group Behaviors

In animal behavioral ecology, group living is a widespread phenomenon. Animal
search behavior is an active movement by which an animal attempts to find resources
such as food, mates, oviposition, or nesting sites. In nature, group members often
have different search and competitive abilities. Subordinates, who are less efficient
foragers than the dominant, will be dispersed from the group. Dispersed animals
may adopt ranging behavior to explore and colonize new habitats.
Group search usually adopts two foraging strategies within the group: producing
(searching for food) and joining (scrounging). Joining is a ubiquitous trait found in
most social animals such as birds, fish, spiders, and lions. In order to analyze the
optimal policy for joining, two models for joining are information-sharing [13] and
producer–scrounger [4]. The information-sharing model assumes that foragers search
concurrently for their own resource while searching for opportunities to join. In
the producer–scrounger model, foragers are assumed to use producing (finding) or join-
ing (scrounging) strategies exclusively; they are divided into leaders and followers.
For the joining policy of ground-feeding birds, the producer–scrounger model is
more plausible than the information-sharing model. In the producer–scrounger model, three
basic scrounging strategies are observed in house sparrows (Passer domesticus):
area copying—moving across to search in the immediate area around the producer,
following—following another animal around without exhibiting any searching behavior,
and snatching—taking a resource directly from the producer.
The organization of collective behaviors in social insects can be understood as a
combination of the four functions of organization: coordination, cooperation, deliber-
ation, and collaboration [3]. The coordination function regulates the spatio-temporal
density of individuals, while the collaboration function regulates the allocation of
their activities. The deliberation function represents the mechanisms that support
the decisions of the colony, while the cooperation function represents the mecha-
nisms that overstep the limitations of the individuals. Together, the four functions of
organization produce solutions to the colony problems.
The extracted general cooperative group behaviors, search strategies, and
communication methods are useful within a computing context [3]:

• Cooperation and group behavior.


Cooperation among individuals of the same or different species must benefit the
cooperators, whether directly or indirectly. Socially, the group may be individuals
working together for mutual benefit, or individuals each with their own specialized
role. Competition for the available resources may restrict the size of the group.
• Search strategies.
The success of a species depends on many factors, including its ability to search
effectively for resources, such as food and water, in a given environment. Search
strategies can be broadly divided into sit and wait (for ambush) and foraging widely
(for active searchers). Compared to the latter, the former has a lower opportunity
to get food, but with a low energy consumption.
• Communication strategies.
Inter-group communication is necessary for group behavior. Communication
strategies are often multimodal and can be either direct or indirect.

These aspects are not only general to biologically inspired natural computing, but
are also applicable to all agent-based paradigms.
In biological populations, there is a continuous interplay between individuals of
the same species and individuals of different species. Such ecological interplay is
observed as symbiosis, host–parasite systems, and prey–predator systems, in which
two organisms mutually support each other, one exploits the other, or they fight
against each other. For instance, symbiosis between plants and fungi is very common,
where the fungus invades and lives among the cortex cells of the secondary roots
and, in turn, helps the host plant absorb minerals from the soil. Cleaning symbiosis
is common in fish.

1.4.2 Foraging Theory

Natural selection has a tendency to eliminate animals having poor foraging strategies
and favor the ones with successful foraging strategies to propagate their genes. After

many generations, poor foraging strategies are either eliminated or shaped into good
ones.
Foraging can be modeled as an optimization process where an animal seeks to
maximize the energy obtained per unit time spent in foraging, or to maximize the
long-term average rate of energy intake, under constraints of its own physiology and
environment. Optimization models are also valid for social foraging where groups
of animals cooperatively forage.
Some animals forage as individuals and others forage as groups with a type of
collective intelligence. Although an animal needs communication capabilities to perform
social foraging, in return it can exploit the sensing capabilities of the entire group. The
group can catch large prey, and individuals can obtain protection from predators while
in a group.
In general, a foraging strategy involves finding a patch of food, deciding whether
to proceed and search for food, and when to leave the patch. There are predators and
risks, energy required for travel, and physiological constraints (sensing, memory,
cognitive capabilities). Foraging scenarios can be modeled and optimal policies can
be found using dynamic programming. Search and optimal foraging decision-making
of animals can be one of three basic types: cruise (e.g., tuna and hawks), saltatory
(e.g., birds, fish, lizards, and insects), and ambush (e.g., snakes and lions). In cruise
search, an animal searches the perimeter of a region; in an ambush, it sits and waits;
in saltatory search, an animal typically moves in some direction, stops or slows
down, looks around, and then changes direction, covering a whole region.

1.5 Heuristics, Metaheuristics, and Hyper-Heuristics

Many real-life optimization problems are difficult to solve by exact optimization


methods, due to properties, such as high dimensionality, multimodality, epistasis
(parameter interaction), and non-differentiability. Hence, approximate algorithms are
an alternative approach for these problems. Approximate algorithms can be decom-
posed into heuristics and metaheuristics. The words meta and heuristic both have
their origin in the old Greek: meta means upper level, and heuristic denotes the art
of discovering new strategies [58].
Heuristic refers to experience-based techniques for problem-solving and learning.
It gives a satisfactory solution in a reasonable amount of computational time, which
may not be optimal. Specific heuristics are problem-dependent and designed only
for the solution of a particular problem. Examples of this method include using a rule
of thumb, an educated guess, an intuitive judgment, or even common sense. Many
algorithms, either exact algorithms or approximation algorithms, are heuristics.
The term metaheuristic was coined by Glover in 1986 [25] to refer to a set of
methodologies conceptually ranked above heuristics in the sense that they guide
the design of heuristics. A metaheuristic is a higher level procedure or heuristic
designed to find, generate, or select a lower level procedure or heuristic (partial
search algorithm) that may provide a sufficiently good solution to an optimization

problem. By searching over a large set of feasible solutions, metaheuristics can often
find good solutions with less computational effort than calculus-based methods, or
simple heuristics, can.
Metaheuristics can be single-solution-based or population-based. Single-solution
based metaheuristics are based on a single solution at any time and comprise
local search-based metaheuristics such as SA, Tabu search, iterated local search
[40,42], guided local search [61], pattern search or random search [31], Solis–Wets
algorithm [54], and variable neighborhood search [45]. In population-based meta-
heuristics, a number of solutions are updated iteratively until the termination condi-
tion is satisfied. Population-based metaheuristics are generally categorized into EAs
and swarm-based algorithms. Single-solution-based metaheuristics are regarded as
more exploitation-oriented, whereas population-based metaheuristics are more
exploration-oriented.
The idea of hyper-heuristics can be traced back to the early 1960s [23]. Hyper-
heuristics can be thought of as heuristics to choose heuristics or as search algorithms
that explore the space of problem solvers. A hyper-heuristic is a heuristic search
method that seeks to automate the process of selecting, combining, generating, or
adapting several simpler heuristics to efficiently solve hard search problems. The low-
level heuristics are simple local search operators or domain-dependent heuristics,
which operate directly on the solution space for a given problem instance. Unlike
metaheuristics that search in a space of problem solutions, hyper-heuristics always
search in a space of low-level heuristics.
Heuristic selection and heuristic generation are currently the two main method-
ologies in hyper-heuristics. In the first method, the hyper-heuristic chooses heuristics
from a set of known domain-dependent low-level heuristics. In the second method,
the hyper-heuristic evolves new low-level heuristics by utilizing the components
of the existing ones. Hyper-heuristics can be based on genetic programming [11]
or grammatical evolution [10], which makes them excellent candidates for heuristic
generation.
Several Single-Solution-Based Metaheuristics
Search strategies that randomly generate initial solutions and perform a local search
are also called multi-start descent search methods. However, to randomly create an
initial solution and perform a local search often results in low solution quality as the
complete search space is uniformly searched and search cannot focus on promising
areas of the search space.
Variable neighborhood search [45] combines local search strategies with dynamic
neighborhood structures subject to the search progress. The local search is an inten-
sification step focusing the search in the direction of high-quality solutions. Diver-
sification is a result of changing neighborhoods. By changing neighborhoods, the
method can easily escape from local optima. With an increasing cardinality of the
neighborhoods, diversification gets stronger as the shaking steps can choose from a
larger set of solutions and local search covers a larger area of the search space.
Guided local search [61] uses a similar principle and dynamically changes the
fitness landscape subject to the progress that is made during the search so that local

search can escape from local optima. The neighborhood structure remains constant.
It starts from a random solution x0 and performs a local search returning the local
optimum x1 . To escape the local optimum, a penalty is added to the fitness function
f such that the resulting fitness function h allows local search to escape. A new local
search is started from x1 using the modified fitness function h. Search continues until
a termination criterion is met.
Iterated local search [40,42] connects the unrelated local search phases as it creates
initial solutions not randomly but based on solutions found in previous local search
runs. If the perturbation steps are too small, the search cannot escape from a local
optimum. If perturbation is too strong, the search has the same behavior as multi-start
descent search methods. The modification step as well as the acceptance criterion
can depend on the search history.
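To make the interplay between the perturbation step, the local search, and the acceptance criterion concrete, the following Python sketch outlines a generic iterated local search for a minimization problem. The helper functions neighbors and perturb are hypothetical placeholders that must be supplied for a concrete problem, and the acceptance criterion here simply keeps improving solutions.

```python
def iterated_local_search(f, x0, neighbors, perturb, iters=100):
    """Generic iterated local search sketch for minimizing f (assumed helper functions)."""

    def local_search(x):
        # Greedy descent: move to the best improving neighbor until none improves.
        while True:
            best_neighbor = min(neighbors(x), key=f, default=x)
            if f(best_neighbor) >= f(x):
                return x
            x = best_neighbor

    best = local_search(x0)
    for _ in range(iters):
        # Perturb the incumbent (rather than restarting at random) and search again.
        candidate = local_search(perturb(best))
        if f(candidate) < f(best):   # acceptance criterion: keep improvements only
            best = candidate
    return best
```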

1.6 Optimization

Optimization can generally be categorized into discrete or continuous optimization,


depending on whether the variables are discrete or continuous ones. There may be
limits or constraints on the variables. Optimization can be a static or a dynamic
problem depending upon whether the output is a function of time. Traditionally,
optimization is solved by calculus-based methods, random search, or
enumerative search. Heuristics-based optimization is the topic treated in this book.
Optimization techniques can generally be divided into derivative methods and
nonderivative methods, depending on whether or not derivatives of the objective
function are required for the calculation of the optimum. Derivative methods are
calculus-based methods, which can be either gradient search methods or second-
order methods. These methods are local optimizers. The gradient descent is also
known as steepest descent. It searches for a local minimum by taking steps along
the negative direction of the gradient of the function. Examples of second-order
methods are Newton’s method, the Gauss-Newton method, quasi-Newton methods,
the trust-region method, and the Levenberg-Marquardt method. Conjugate gradient
and natural gradient methods can also be viewed as reduced forms of the quasi-
Newton method.
Derivative methods can also be classified into model-based and metric-based
methods. Model-based methods improve the current point by a local approximating
model. Newton and quasi-Newton methods are model-based methods. Metric-based
methods perform a transformation of the variables and then apply a gradient search
method to improve the point. The steepest-descent, quasi-Newton, and conjugate
gradient methods belong to this latter category.
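As a minimal illustration of a derivative method, the sketch below performs steepest descent with a fixed step size; the quadratic test function, step size, and stopping rule are illustrative assumptions rather than recommended settings.

```python
import numpy as np

def steepest_descent(grad, x0, lr=0.01, tol=1e-8, max_iter=10000):
    """Take steps along the negative gradient until the gradient is nearly zero."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - lr * g
    return x

# Minimize f(x) = (x1 - 1)^2 + 10 (x2 + 2)^2, whose gradient is given analytically.
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
print(steepest_descent(grad, [0.0, 0.0]))   # approaches (1, -2)
```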
Methods that do not require gradient information to perform a search and sequen-
tially explore the solution space are called direct search methods. They maintain
a group of points. They utilize some sort of deterministic exploration methods to
search the space and almost always utilize a greedy method to update the maintained
points. Simplex search and pattern search are two examples of effective direct search
methods.

Figure 1.3 The landscape of the Rosenbrock function f(x) with two variables x1, x2 ∈
[−204.8, 204.8]. The spacing of the grid is set as 1. There are many local minima, and the global
minimum 0 is at (1, 1).
Typical nonderivative methods for multivariable functions are random-restart
hill-climbing, random search, many heuristic and metaheuristic methods, and their
hybrids. Hill-climbing attempts to optimize a discrete or continuous function for
a local optimum. When operating on continuous space, it is called gradient ascent.
Other nonderivative search methods include univariant search parallel to an axis (i.e.,
coordinate search method), sequential simplex method, and acceleration methods in
direct search such as the Hooke-Jeeves method, Powell’s method and Rosenbrock’s
method. Interior-point methods represent state-of-the-art techniques for solving lin-
ear, quadratic, and nonlinear optimization programs.

Example 1.1: The Rosenbrock function



f(x) = \sum_{i=1}^{n-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + (1 - x_i)^2 \right]
has the global minimum f (x) = 0 at xi = 1, i = 1, . . . , n. Our simulation is limited
to the two-dimensional case (n = 2), with x1 , x2 ∈ [−204.8, 204.8]. The landscape
of this function is shown in Figure 1.3.
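A short Python sketch of the function used in this example is given below; the grid sampling mirrors the settings quoted above (grid spacing 1 over [−204.8, 204.8]), and the actual surface plotting is left out.

```python
import numpy as np

def rosenbrock(x):
    """Rosenbrock function for an n-dimensional input vector."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

print(rosenbrock([1.0, 1.0]))   # 0.0, the global minimum
print(rosenbrock([0.0, 0.0]))   # 1.0

# Sample the two-dimensional landscape on a coarse grid, as in Figure 1.3.
x1, x2 = np.meshgrid(np.arange(-204.8, 204.8, 1.0), np.arange(-204.8, 204.8, 1.0))
z = 100.0 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
```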

1.6.1 Lagrange Multiplier Method

The Lagrange multiplier method can be used to analytically solve continuous func-
tion optimization problem subject to equality constraints [24]. By introducing the

Lagrangian formulation, the dual problem associated with the primal problem is
obtained, based on which the optimal values of the Lagrange multipliers can be
found.
Let f (x) be the objective function and hi (x) = 0, i = 1, . . . , m, be the constraints.
The Lagrange function can be constructed as

L(x; \lambda_1, \ldots, \lambda_m) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x), \qquad (1.1)
where λi , i = 1, . . . , m, are called the Lagrange multipliers.
The constrained optimization problem is converted into an unconstrained opti-
mization problem: Optimize L (x; λ1 , . . . , λm ). By setting

\frac{\partial}{\partial x} L(x; \lambda_1, \ldots, \lambda_m) = 0, \qquad (1.2)

\frac{\partial}{\partial \lambda_i} L(x; \lambda_1, \ldots, \lambda_m) = 0, \quad i = 1, \ldots, m, \qquad (1.3)
and solving the resulting set of equations, we can obtain the x position at the extremum
of f (x) under the constraints.
To deal with constraints, the Karush-Kuhn-Tucker (KKT) theorem, as a gener-
alization to the Lagrange multiplier method, introduces a slack variable into each
inequality constraint before applying the Lagrange multiplier method. The conditions
derived from the procedure are known as the KKT conditions [24].
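As a toy illustration of the procedure, the SymPy sketch below minimizes x^2 + y^2 subject to x + y = 1 by solving the stationarity conditions (1.2)–(1.3); the problem instance is hypothetical and chosen only for readability.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)

# Objective and a single equality constraint h(x, y) = 0.
f = x**2 + y**2
h = x + y - 1
L = f + lam * h   # Lagrange function of Eq. (1.1)

# Setting the partial derivatives of L to zero recovers the constrained optimum.
sols = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)], [x, y, lam], dict=True)
print(sols)   # [{x: 1/2, y: 1/2, lambda: -1}]
```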

1.6.2 Direction-Based Search and Simplex Search

In direct search, generally the gradient information cannot be obtained; thus, it is


impractical to implement a step in the negative gradient direction for a minimum
problem. However, when the objectives of a group of solutions are available, the
best one can guide the search direction of the other solutions. Many direction-based
search methods and EAs are inspired by this intuitive idea.
Some of the direct search methods use improvement direction information to
search the objective space. Thus, it is useful to embed these directions into an EA as
either a local search method or an exploration operator.
Simplex search [47], introduced by Nelder and Mead in 1965, is a well-known deter-
ministic direction-based search method. MATLAB contains a direct search toolbox
based on simplex search. Scatter search [26] includes the elitism mechanism into
simplex search. Like simplex search, for a group of points, the algorithm finds new
points, accepts the better ones, and discards the worse ones. Differential evolution
(DE) [56] uses the directional information from the current population. The mutation
operator of DE needs three randomly selected different individuals from the current
population for each individual to form a simplex-like triangle.

Simplex Search
Simplex search is a group-based deterministic local search method capable of explor-
ing the objective space very fast. Thus many EAs use simplex search as a local search
method after mutation.
A simplex is a collection of n + 1 points in n-dimensional space. In an optimization
problem involving n variables, the simplex method searches for an optimal solution
by evaluating a set of n + 1 points. The method continuously forms new simplices
by replacing the point having the worst performance in a simplex with a new point.
The new point is generated by reflection, expansion, and contraction operations.
In a multidimensional space, the subtraction of two vectors means a new vector
starting at one vector and ending at the other, like x2 − x1 . We often refer to the
subtraction of two vectors as a direction. Addition of two vectors can be implemented
in a triangular way, moving the start of one vector to the end of the other to form
another vector. The expression x3 + (x2 − x1 ) can be regarded as the destination of
a moving point that starts at x3 and has a length and direction of x2 − x1 .
For every new simplex, several points are assigned according to their objective
values. Then simplex search repeats reflection, expansion, contraction, and shrink in
a very efficient and deterministic way. Vertices of the simplex will move toward the
optimal point and the simplex will become smaller and smaller. Stop criteria can be
selected as a predetermined maximum number of iterations, the edge length of the
simplex, or the improvement rate of the best vertex.
Simplex search for minimization is shown in Algorithm 1.1. The coefficients for
the reflection, expansion, contraction, and shrinking operations are typically selected
as α = 1, β = 2, γ = −1/2, and δ = 1/2. The initial simplex is important. The
search may easily get stuck for too small an initial simplex. This simplex should be
selected depending on the nature of the problem.

1.6.3 Discrete Optimization Problems

The discrete optimization problem is also known as combinatorial optimization prob-


lem (COP). Any problem that has a large set of discrete solutions and a cost function
for rating those solutions relative to one another is a COP. COPs are known to be
NP-complete (namely, nondeterministic polynomial-time complete). The goal for COPs is to find an optimal solution or sometimes a
nearly optimal solution. In COPs, the number of solutions grows exponentially with
the size of the problem n, at O(n!) or O(e^n), such that no algorithm can find the global
minimum solution in polynomial computational time.

Definition 1.1 (Discrete optimization problem). A discrete optimization problem
is denoted as (X, f, Ω), or as minimizing the objective function

min f(x), x ∈ X, subject to Ω, \qquad (1.4)

where X ⊂ R^N is the search space defined over a finite set of N discrete decision
variables x = (x1, x2, . . . , xN)^T, f : X → R, and Ω is the set of constraints on x. The space
X is constructed according to all the constraints imposed on the problem.

Algorithm 1.1 (Simplex Search).

1. Initialize parameters.
   Randomize the set of individuals xi.
2. Repeat:
   a. Find the worst and best individuals as xh and xl.
      Calculate the centroid of all xi's, i ≠ h, as x̄.
   b. Enter reflection mode:
      xr = x̄ + α(x̄ − xh);
   c. if f(xl) < f(xr) < f(xh), xh ← xr;
      else if f(xr) < f(xl), enter expansion mode:
         xe = x̄ + β(x̄ − xh);
         if f(xe) < f(xl), xh ← xe;
         else xh ← xr;
         end
      else if f(xr) > f(xi), ∀i ≠ h, enter contraction mode:
         xc = x̄ + γ(x̄ − xh);
         if f(xc) < f(xh), xh ← xc;
         else enter shrinking mode:
            xi = xl + δ(xi − xl), ∀i ≠ l;
         end
      end
   until termination condition is satisfied.
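For readers who prefer a ready-made implementation, the Nelder–Mead option of SciPy's minimize follows the same reflection, expansion, contraction, and shrinking logic as Algorithm 1.1; the sketch below applies it to the Rosenbrock function of Example 1.1, with illustrative option values.

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

# Nelder-Mead needs no gradients; it only compares objective values at the
# n + 1 simplex vertices.
result = minimize(rosenbrock, x0=[-1.2, 1.0], method='Nelder-Mead',
                  options={'xatol': 1e-8, 'fatol': 1e-8, 'maxiter': 2000})
print(result.x, result.fun)   # close to (1, 1) with an objective value near 0
```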

Definition 1.2 (Feasible solution). A vector x that satisfies the set of constraints for
an optimization problem is called a feasible solution.

Traveling salesman problem (TSP) is perhaps the most famous COP. Given a set
of points, either nodes on a graph or cities on a map, find the shortest possible tour
that visits every point exactly once and then returns to its starting point. There are
(n − 1)!/2 possible tours for an n-city TSP. TSP arises in numerous applications,
from routing of wires on a printed circuit board (PCB), VLSI circuit design, to fast
food delivery.
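To make the factorial growth tangible, the brute-force sketch below enumerates the tours of a toy four-city instance; the distance matrix is hypothetical, and the approach is only feasible for very small n.

```python
import itertools
import math

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def brute_force_tsp(dist):
    """Enumerate all (n-1)! orderings with city 0 fixed as the start
    (each distinct tour appears twice, once per travel direction)."""
    n = len(dist)
    best_tour, best_len = None, math.inf
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        length = tour_length(tour, dist)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# Hypothetical symmetric distance matrix for four cities.
dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(brute_force_tsp(dist))   # ((0, 1, 3, 2), 18)
```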
Multiple traveling salesmen problem (MTSP) generalizes TSP using more than
one salesman. Given a set of cities and a depot, m salesmen must visit all cities
according to the constraints that the route formed by each salesman must start and
end at the depot, that each intermediate city must be visited once and by a single
salesman, and that the cost of the routes must be minimum. TSP with a time window
is a variant of TSP in which each city is visited within a given time window.
The vehicle routing problem concerns the transport of items between depots and
customers by means of a fleet of vehicles. It can be used for logistics and public

services, such as milk delivery, mail or parcel pick-up and delivery, school bus
routing, solid waste collection, dial-a-ride systems, and job scheduling. Two well-
known routing problems are TSP and MTSP.
The location-allocation problem is defined as follows. Given a set of facilities,
each of which serves a certain number of nodes on a graph, the objective is to place
the facilities on the graph so that the average distance between each node and its
serving facility is minimized.

1.6.4 P, NP, NP-Hard, and NP-Complete

An issue related to the efficiency and efficacy of an algorithm is how hard the problem
itself is. The optimization problem is first transformed into a decision problem.
Problems that can be solved using a polynomial-time algorithm are tractable. A
polynomial-time algorithm has an upper bound O(n^k) on its running time, where k is
a constant and n is the problem size (input size). Usually, tractable problems are easy
to solve as running time increases relatively slowly with n. In contrast, problems are
intractable if they cannot be solved by a polynomial-time algorithm and there is a
lower bound on the running time which is Ω(k^n), where k > 1 is a constant and n is
the input size.
The complexity class P (standing for polynomial time complexity) is defined as
the set of decision problems that can be solved by a deterministic Turing machine
using an algorithm with worst-case polynomial time complexity. P problems are
usually easy as there are algorithms that solve them in polynomial time.
The class NP (standing for nondeterministic polynomial time complexity) is the
set of all decision problems that can be verified by a nondeterministic Turing machine
using a nondeterministic algorithm in worst-case polynomial time. Although nonde-
terministic algorithms cannot be executed directly on conventional computers, this
concept is important and helpful for the analysis of the computational complexity
of problems. All problems in P also belong to the class NP, i.e., P ⊆ NP. There are
also problems where correct solutions cannot be verified in polynomial time.
All decision problems in P are tractable. Those problems that are in NP, but not in
P, are difficult as no polynomial-time algorithms exist for them. There are problems
in NP where no polynomial algorithm is available and which can be transformed into
one another with polynomial effort. A problem is said to be NP-hard if every problem
in NP is polynomial-time reducible to it. Therefore, NP-hard problems are at least as hard as any
other problem in NP, and are not necessarily in NP.
The set of NP-complete problems is a subset of NP [14]. A decision problem A is
said to be NP-complete, if A is in NP and A is also NP-hard. NP-complete problems
are the hardest problems in NP. They all have the same complexity. They are difficult
as no polynomial-time algorithms are known. Decision problems that are not in NP
are even more difficult. The relationship between all these classes is illustrated in
Figure 1.4.

Figure 1.4 The relationship between P, NP, NP-complete, and NP-hard classes.

Practical COPs are all NP-complete or NP-hard. Right now, no algorithm with
polynomial time complexity can guarantee that an optimal solution will be found.

1.6.5 Multiobjective Optimization Problem

A multiobjective optimization problem (MOP) requires finding a variable vector x


in the domain X that optimizes the objective vector f (x).

Definition 1.3 (Multiobjective optimization problem). MOP is to optimize a sys-


tem with k conflicting objectives
min f (x) = (f1 (x), f2 (x), . . . , fk (x))T , x ∈ X (1.5)
subject to
gi (x) ≤ 0, i = 1, 2, . . . , m, (1.6)

hi (x) = 0, i = 1, 2, . . . , p, (1.7)
where x = (x1 , x2 , . . . , xn )T ∈ Rn , the objective functions fi : Rn → R, i = 1, . . . , k,
and gi , hj : Rn → R, i = 1, . . . , m, j = 1, . . . , p are the constraint functions of the
problem.

Objectives are conflicting when increasing the quality of one objective
tends to simultaneously decrease the quality of another objective. The solution to
an MOP is not a single optimal solution, but a set of solutions representing the best
trade-offs among the objectives.
In order to optimize a system with conflicting objectives, the weighted sum of
these objectives is usually used as the compromise of the system

F(x) = \sum_{i=1}^{k} w_i \bar{f}_i(x), \qquad (1.8)

where \bar{f}_i(x) = \frac{f_i(x)}{|\max(f_i(x))|} are the normalized objectives, and \sum_{i=1}^{k} w_i = 1.
For many problems, there are difficulties in normalizing the individual objectives,
and also in selecting the weights. The lexicographic order optimization is based on
the ranking of the objectives in terms of their importance.

The Pareto method is a popular method for multiobjective optimization. It is based


on the principle of nondominance. The Pareto optimum gives a set of solutions for
which there is no way of improving one criterion without deteriorating another
criterion. In MOPs, the concept of dominance provides a means by which multiple
solutions can be compared and subsequently ranked.

Definition 1.4 (Pareto dominance). A variable vector x1 ∈ Rn is said to dominate


another vector x2 ∈ Rn, denoted x1 ≺ x2, if and only if x1 is better than or equal to
x2 in all attributes, and strictly better in at least one attribute, i.e., ∀i: fi(x1) ≥ fi(x2)
∧ ∃j: fj(x1) > fj(x2).

For two solutions x1 , x2 , if x1 is better in all objectives than x2 , x1 is said to


strongly dominate x2 . If x1 is not worse than x2 in all objectives and better in at least
one objective, x1 is said to dominate x2 . A nondominated set is a set of solutions that
are not weakly dominated by any other solution in the set.

Definition 1.5 (Nondominance). A variable vector x1 ∈ X ⊂ Rn is nondominated


with respect to X , if there does not exist another vector x2 ∈ X such that x2 ≺ x1 .

Definition 1.6 (Pareto optimality). A variable vector x∗ ∈ F ⊂ Rn (F is the fea-


sible region) is Pareto optimal if it is nondominated with respect to F .

Definition 1.7 (Pareto optimal frontier). The Pareto optimal frontier P∗ is defined
by the space in Rn formed by all Pareto optimal solutions, P∗ = {x ∈ F | x is Pareto optimal}.

The Pareto optimal frontier is a set of optimal nondominated solutions, which


may be infinite.

Definition 1.8 (Pareto front). The Pareto front PF ∗ is defined by


PF ∗ = {f (x) ∈ Rk |x ∈ P ∗ }. (1.9)
The Pareto front is the image set of the Pareto optimal frontier mapping into the
objective space.

Obtaining the Pareto front of a MOP is the main goal of multiobjective optimiza-
tion. A good solution must contain a limited number of points, which should be as
close as possible to the exact Pareto front, and they should be uniformly spread
so that no regions are left unexplored.
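The dominance relation is straightforward to operationalize in code. The sketch below checks Pareto dominance and filters out the nondominated vectors under the minimization convention of Definition 1.3; the four objective vectors are a toy example.

```python
def dominates(f1, f2):
    """True if objective vector f1 dominates f2: no component worse, at least one strictly better."""
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def nondominated(points):
    """Return the objective vectors not dominated by any other vector in the set."""
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]

objs = [(1, 1), (1, 2), (2, 1), (2, 2)]
print(nondominated(objs))   # [(1, 1)]
```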
An illustration of Pareto optimal solutions for a two-dimensional problem with
two objectives is given in Figure 1.5. The upper border from points A to B of the
domain X , denoted P ∗ , contains all Pareto optimal solutions. The frontier from points
f A to f B along the lower border of the domain Y , denoted PF ∗ , contains all Pareto
frontier in the objective space. For two points a and b, their mapping f a dominates f b ,

Figure 1.5 An illustration of Pareto optimal solutions for a two-dimensional problem with two
objectives. X ⊂ Rn is the domain of x, and Y ⊂ Rm is the domain of f(x).

Figure 1.6 Different Pareto fronts. a Convex. b Concave. c Discontinuous.

denoted f a ≺ f b . Hence, the decision vector xa is a nondominated solution. Figure 1.6


illustrates that Pareto fronts can be convex, concave, or discontinuous.

Definition 1.9 (ε-dominance). A variable vector x1 ∈ Rn is said to ε-dominate


another vector x2 ∈ Rn, denoted x1 ≺ε x2, if and only if x1 is better than or
equal to εx2 in all attributes, and strictly better in at least one attribute, i.e., ∀i:
fi(x1) ≥ fi(εx2) ∧ ∃j: fj(x1) > fj(εx2) [69].

If ε = 1, ε-dominance is the same as Pareto dominance; otherwise, the area dom-


inated by xi is enlarged or shrunk. Thus, ε-dominance relaxes the area of Pareto
dominance by a factor of ε.

1.6.6 Robust Optimization

The robustness of a particular solution can be confirmed by resampling or by reusing


neighborhood solutions. Resampling is reliable, but computationally expensive. In

contrast, the method of reusing neighborhood solutions is cheap but unreliable. A


confidence measure increases the reliability of the latter method. In [44], confidence-
based operators are defined for robust metaheuristics. The confidence metric and five
confidence-based operators are employed to design confidence-based robust PSO
and confidence-based robust GA. History can be utilized in helping to estimate the
expected fitness of an individual to produce more robust solutions in EAs.
Confidence metric defines the confidence level of a robust solution. The highest
confidence is achieved when there are a large number of solutions available with
greatest diversity within a suitable neighborhood around the solution in the parameter
space. Mathematically, confidence is expressed by [44]
n
C= , (1.10)

where n is the number of sampled points in the neighborhood, r is the radius of the
neighborhood, and σ is the distribution of the available points in the neighborhood.

1.7 Performance Indicators

For evaluation of different EAs or iterative algorithms, one can implement overall


performance indicators and evolving performance indicators.
Overall Performance Indicators
The overall performance indicators provide a general description for the perfor-
mance. Overall performance can be compared according to their efficacy, efficiency,
and reliability on a benchmark problem with many runs.
Efficacy evaluates the quality of the results without caring about the speed of an
algorithm. Mean best fitness (MBF) is defined as the average of the best fitness in the
last population over all runs. The best fitness values thus far can be used as a more
absolute measure for efficacy.
Reliability indicates the extent to which the algorithm can provide acceptable
results. Success rate (SR) is defined as the percentage of runs terminated with success.
A successful run is one in which the difference between the best fitness value in the last
generation f∗ and a predefined target value f_o is below a predefined threshold ε.
Efficiency requires finding the global optimal solution rapidly. Average number
of evaluations to a solution (AES) is defined as the average number of evaluations
it takes for the successful runs. If an algorithm has no successful runs, its AES is
undefined.
Low SR and high MBF may indicate that the algorithm converges slowly, while
high SR and low MBF may indicate that the algorithm is basically reliable, but may
provide very bad results accidentally. It is desirable to have smaller AES and larger
SR, thus small AES/SR criterion considers reliability and efficiency at the same time.

Evolving Performance Indicators


Several generation-based evolving performance indicators can provide more detailed
information.

• Best-so-far (BSF) records the best solution found by the algorithm thus far for
each generation in every run. BSF index is monotonic.
• Best-of-current-population (BCP) records the best solution in each generation in
every run. MBF is the average of final BCP or final BSF over multiple runs.
• Average-of-current-population (ACP) records the average solution in each gener-
ation in every run.
• Worst-of-current-population (WCP) records the worst solution in each generation
in every run.

After many runs with random initial setting, we can draw conclusions on an algo-
rithm by applying statistical descriptions, e.g., statistical visualization, descriptive
statistics, and statistical inference.
Statistical visualization uses graphs to describe and compare algorithms. The box
plot is widely used for this purpose. Suppose we run an algorithm on a problem 100
times and get 100 values of a performance indicator. We can rank the 100 numbers
in ascending order. On each box, the central mark is the median, the lower and upper
edges are the 25th and 75th percentiles, the whiskers extend to the most extreme
data points not considered outliers, and outliers are plotted individually by +. The
interquartile range (IQR) is between the lower and upper edges of the box. Any
data point that lies more than 1.5 IQR below the lower quartile or 1.5 IQR above
the upper quartile is considered an outlier. Two lines called whiskers are plotted to
indicate the smallest number that is not a lower outlier and the largest number that
is not a higher outlier. The default 1.5 IQR corresponds to approximately ±2.7σ and
99.3 % coverage if the data are normally distributed.
The box plot for BSF performance of two algorithms is illustrated in Figure 1.7.
Algorithm 2 has a larger median BSF and a smaller IQR, that is, better average
performance along with smaller variance; thus it outperforms Algorithm 1. Also, for
the evolving process of many runs, the convergence graph illustrating the performance
over the number of fitness evaluations (NOFE) is quite useful.

Figure 1.7 Box plot of the BSF of two algorithms.
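A box plot such as Figure 1.7 can be produced with a few lines of matplotlib; the BSF samples below are synthetic placeholders standing in for indicator values collected over 100 runs, and matplotlib's default whisker setting implements the 1.5 IQR outlier rule described above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic BSF values of two algorithms over 100 independent runs (placeholders).
bsf_alg1 = rng.normal(loc=0.0, scale=1.5, size=100)
bsf_alg2 = rng.normal(loc=1.0, scale=0.5, size=100)

# The default whis=1.5 draws whiskers at the most extreme points within 1.5 IQR.
plt.boxplot([bsf_alg1, bsf_alg2])
plt.xticks([1, 2], ['Algorithm 1', 'Algorithm 2'])
plt.ylabel('BSF')
plt.show()
```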
Graphs are easy to understand. When the difference between different algorithms
is small, one has to calculate specific numbers to describe and compare the perfor-
mance. The most often used descriptive statistics are mean and variance (or standard
deviation) of all performance indicators and compare them. Statistical inference
includes parameter estimation, hypothesis testing, and many other techniques.

1.8 No Free Lunch Theorem

Before the no free lunch theorem [63,64] was proposed in 1995, people intuitively
believed that there exist universally beneficial algorithms for search, and
many actually made efforts to design such algorithms. The no free lunch theorem
asserts that there is no universally beneficial algorithm.
The original no free lunch theorem for optimization states that no search algorithm
is better than another in locating an extremum of a function when averaged over the
set of all possible discrete functions. That is, all search algorithms achieve the same
performance as random enumeration, when evaluated over the set of all functions.

Theorem 1.1 (No free lunch theorem). Given the set of all functions F and a set
of benchmark functions F1 , if algorithm A1 is better on average than algorithm A2
on F1 , then algorithm A2 must be better than algorithm A1 on F \ F1 .

When there is no structural knowledge at all, all algorithms have equal perfor-
mance. The no free lunch theorem holds for non-revisiting algorithms with no
problem-specific knowledge. Its plausibility can be seen from deceptive functions and
random functions: deceptive functions lead a hill-climber away from the optimum,
while for random functions the search for the optimum has no direction at all. For both
classes of functions, optimization is like finding a needle in a haystack.
No free lunch theorem is concerned with the average performance for solving
all problems. In applications, such a scenario is hardly ever realistic since there is
almost always some knowledge about typical solutions. Practical problems always
contain priors such as smoothness, symmetry, and i.i.d. samples. The performance
of any algorithm is determined by the knowledge concerning the cost function. Thus,
it is meaningless to evaluate the performance of an algorithm without specifying the
prior knowledge. Thus, developing search algorithms actually builds special-purpose
methods to solve application-specific problems. For example, there are potentially
free lunches for coevolutionary approaches [65].
No free lunch theorem was later extended to coding methods, cross-validation [67],
early stopping [12], avoidance of overfitting, and noise prediction [41]. Again, it has
been asserted that no one method is better than the others for all problems.

No free lunch theorem is also applicable to hyper-heuristics. Some of the different


selection strategies and acceptance criteria are combined in [49] and implemented
on benchmark exam timetabling problems. The experimental results showed that no
combination of heuristic selection strategies and acceptance criteria can dominate
others.
No free lunch theorem was expressed within a probabilistic framework [63].
Probability is inadequate to confirm no free lunch results in the general case [2].
No free lunch variants assume that the set of functions, the distribution of functions,
or the relationship between those functions and algorithms considered have special
properties. Specialized distributions are assumed in the case of the nonuniform no
free lunch [34].
Except for the analysis of randomized algorithms, a set-theoretic framework
related to no free lunch which obviates measure-theoretic limitations is presented in
[20], where functions are restricted to some benchmark, and algorithms are restricted
to some collection or limited to some number of steps, or the performance measure
is given.

1.9 Outline of the Book

In this book, we treat metaheuristics-based search and optimization techniques that


draw inspiration from nature or from human thinking. For nature-inspired metaheuristics, there
is a hierarchical model of various computing paradigms. A biological system per-
forms information processing at different hierarchical levels: quantum, molecular,
genetic, cell, single neuron, ensemble of neurons, immune system, cognitive, and
evolutionary [19]:

• At a quantum level, the particles that constitute every molecule move continuously,
being in several states at the same time, characterized by probability, phase,
frequency, and energy. These states can change following the principles of quantum
mechanics. This motivates quantum computing.
• At a molecular level, RNA and protein molecules evolve in a cell and interact in
a continuous way, based on the stored information in the DNA and on external
factors, and affect the functioning of a cell (neuron). DNA computing simulates
this.
• At a cell level, chemicals exchange between cells and the functioning of organisms
is guaranteed. Membrane computing is inspired by this function.
• At the level of a single neuron, the internal information processes and the exter-
nal stimuli change the synapses and cause the neuron to produce a signal to be
transferred to other neurons. At the level of neuronal ensembles, all neurons oper-
ate together as a function of the ensemble through continuous learning. Artificial
neural network is a paradigm inspired by the neuronal system.

• At the level of immune systems, the whole immune system of the biological entity
is working together to protect the body from damage of antigens. Artificial immune
systems simulate the activity of the immune system.
• At the level of the brain, cognitive processes take place in a life-long incremental
multiple task/multiple modalities learning mode, such as language and reasoning,
and global information processes are manifested, such as consciousness. Tabu
search, fuzzy logic and reasoning simulate how human thinks.
• At the level of a population of individuals, species evolve through evolution via
changing the genetic DNA code. Evolutionary algorithms are inspired by this idea.
• At the level of a population of individuals, individuals interact with one another
by social behaviors. Swarm intelligence contains a large class of algorithms that
simulate the social behaviors of a wide range of animals, from bacteria, insects,
fishes, mammals, to humans.

There are also many algorithms inspired by various natural phenomena, such as
rivers, tornado, plant reproduction, or by physical laws. Building computational
models that integrate principles from different levels may be efficient for solving
complex problems.
In the subsequent chapters we, respectively, introduce optimization methods
inspired from physical annealing, biological evolution, Bayesian inference, cultural
propagation, swarming of animals, artificial immune systems, ant colony, bee for-
aging, bacteria foraging, music harmony, quantum mechanics, DNA and molecular
biology, human strategies for problem-solving, and numerous other natural phenom-
ena.
In addition to the specific metaheuristics-based methods, we have also described
some general topics that are common to all optimization problems. The topics treated
are dynamic optimization, multimodal optimization, constrained optimization, mul-
tiobjective optimization, and coevolution.
Recurrent neural network models are also used for solving discrete as well as con-
tinuous optimization in the form of quadratic programming. Reinforcement learning
is a metaheuristic dynamic programming method for solving Markov and semi-
Markov decision problems. Since these neural network methods are useful for a
particular class of optimization problems, we do not treat them in this book. Inter-
ested readers are referred to the textbook entitled Neural Networks and Statistical
Learning by the same authors [19].

Problems

1.1 Plot the two-dimensional Rastrigin function in 3D space



f(x) = 10n + \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2\pi x_i) \right], \quad -5.12 \le x_i < 5.12, \; n = 2.
1.2 Find the minimum of the function
f = x_1^2 + x_2^2 + x_3^2 + x_4^2

subject to
x1 + 2x2 − x3 + x4 = 2,
2x1 − x2 + x3 + x4 = 4
by using the Lagrange multiplier method.
1.3 Consider the function f(x) = x^3 + 4x^2 + 3x + 1.
(a) Compute its gradient.
(b) Find all its local and global maxima/minima.
1.4 Given a set of points and a multiobjective optimization problem, judge the state-
ment that one point always dominates the others.
1.5 Given four points and their objective function values for multiobjective mini-
mization:
f1 (x1 ) = 1, f2 (x1 ) = 1,
f1 (x2 ) = 1, f2 (x2 ) = 2,
f1 (x3 ) = 2, f2 (x3 ) = 1,
f1 (x4 ) = 2, f2 (x4 ) = 2,
1) Which point dominates all the others?
2) Which point is nondominated?
3) Which point is Pareto optimal?
1.6 Apply exhaustive search to find the Pareto set and Pareto front of the problem
min{sin(x1 + x2 ), sin(x1 − x2 )},
where x1 , x2 ∈ (0, π], and the search resolution is 0.02.
1.7 What are the path, adjacent, ordinal, and matrix representations of the path
1 → 2 → 3 → 4 → 5?
1.8 MATLAB Global Optimization Toolbox provides MultiStart solver for find-
ing multiple local minima of smooth problems by using efficient gradient-based
local solvers. Try MultiStart solver on a benchmark function given in the
Appendix. Test the influence of different parameters.
1.9 Implement the patternsearch solver of MATLAB Global Optimization
Toolbox for solving a benchmark function given in the Appendix. Test the influ-
ence of different parameters.

References
1. Adleman LM. Molecular computation of solutions to combinatorial problems. Science.
1994;266:1021–4.
2. Auger A, Teytaud O. Continuous lunches are free plus the design of optimal optimization
algorithms. Algorithmica. 2010;57:121–46.
3. Banks A, Vincent J, Phalp K. Natural strategies for search. Nat Comput. 2009;8:547–70.

4. Barnard CJ, Sibly RM. Producers and scroungers: a general model and its application to captive
flocks of house sparrows. Anim Behav. 1981;29:543–50.
5. Battail G. Heredity as an encoded communication process. IEEE Trans Inf Theory.
2010;56(2):678–87.
6. Beni G, Wang J. Swarm intelligence in cellular robotics systems. In: Proceedings of NATO
Advanced Workshop on Robots Biological Systems, Toscana, Italy, June 1989, p. 703–712.
7. Bishop CM. Neural networks for pattern recognition. New York: Oxford University Press; 1995.
8. Bonabeau E, Dorigo M, Theraulaz G. Swarm intelligence: from natural to artificial systems.
New York: Oxford University Press; 1999.
9. Broom M, Koenig A, Borries C. Variation in dominance hierarchies among group-living ani-
mals: modeling stability and the likelihood of coalitions. Behav Ecol. 2009;20:844–55.
10. Burke EK, Hyde MR, Kendall G. Grammatical evolution of local search heuristics. IEEE Trans
Evol Comput. 2012;16(3):406–17.
11. Burke EK, Hyde MR, Kendall G, Ochoa G, Ozcan E, Woodward JR. Exploring hyper-heuristic
methodologies with genetic programming. In: Mumford CL, Jain LC, editors. Computational
intelligence: collaboration, fusion and emergence. Berlin, Heidelberg: Springer; 2009. p. 177–
201.
12. Cataltepe Z, Abu-Mostafa YS, Magdon-Ismail M. No free lunch for early stopping. Neural
Comput. 1999;11:995–1009.
13. Clark CW, Mangel M. Foraging and flocking strategies: information in an uncertain environment.
Am Nat. 1984;123(5):626–41.
14. Cook SA. The complexity of theorem-proving procedures. In: Proceedings of the 3rd ACM
symposium on theory of computing, Shaker Heights, OH, USA, May 1971, p. 151–158.
15. Couzin ID, Krause J, James R, Ruxton GD, Franks NR. Collective memory and spatial sorting
in animal groups. J Theoret Biol. 2002;218:1–11.
16. de Castro LN, Timmis J. Artificial immune systems: a new computational intelligence approach.
Springer; 2002.
17. Dorigo M, Maniezzo V, Colorni A. Ant system: an autocatalytic optimizing process. Technical
Report 91-016, Politecnico di Milano, Milan, Italy, 1991.
18. Dorigo M, Maniezzo V, Colorni A. The ant system: optimization by a colony of cooperating
agents. IEEE Trans Syst, Man, Cybern Part B. 1996;26(1):29–41.
19. Du K-L, Swamy MNS. Neural networks and statistical learning. London: Springer; 2014.
20. Duenez-Guzman EA, Vose MD. No free lunch and benchmarks. Evol Comput. 2013;21(2):293–
312.
21. Eberhart R, Kennedy J. A new optimizer using particle swarm theory. In: Proceedings of the
6th International symposium on micro machine and human science, Nagoya, Japan, October
1995, p. 39–43.
22. Engelbrecht AP. Fundamentals of computational swarm intelligence. New Jersey: Wiley; 2005.
23. Fisher H, Thompson GL. Probabilistic learning combinations of local job shop scheduling rules.
In: Muth JF, Thompson GL, editors. Industrial scheduling. New Jersey: Prentice Hall;1963. p.
225–251.
24. Fletcher R. Practical methods of optimization. New York: Wiley; 1991.
25. Glover F. Future paths for integer programming and links to artificial intelligence. Comput
Oper Res. 1986;13:533–49.
26. Glover F, Laguna M, Marti R. Scatter search. In: Ghosh A, Tsutsui S, editors. Advances in
evolutionary computing: theory and applications. Berlin: Springer;2003. p. 519–537.
27. Hinton GE, Nowlan SJ. How learning can guide evolution. Complex Syst. 1987;1:495–502.
28. Hirvensalo M. Quantum computing. Springer. 2004.
29. Hodgkin AL, Huxley AF. Quantitative description of membrane current and its application to
conduction and excitation in nerve. J Physiol. 1952;117:500.
30. Holland JH. Outline for a logical theory of adaptive systems. J ACM. 1962;9(3):297–314.

31. Hooke R, Jeeves TA. “Direct search” solution of numerical and statistical problems. J ACM.
1961;8(2):212–29.
32. Hopfield JJ, Tank DW. Neural computation of decisions in optimization problems. Biol Cybern.
1985;52:141–52.
33. Hoppe W, Lohmann W, Markl H, Ziegler H. Biophysics. New York: Springer; 1983.
34. Igel C, Toussaint M. A no-free-lunch theorem for non-uniform distributions of target functions.
J Math Model Algorithms. 2004;3(4):313–22.
35. Karaboga D. An idea based on honey bee swarm for numerical optimization. Technical Report
TR06, Erciyes University, Kayseri, Turkey. 2005.
36. Kasabov N. Integrative connectionist learning systems inspired by nature: current models,
future trends and challenges. Natural Comput. 2009;8:199–218.
37. Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing. Science.
1983;220:671–80.
38. Kleene SC. Introduction to metamathematics. Amsterdam: North Holland; 1952.
39. Ku KWC, Mak MW, Siu WC. Approaches to combining local and evolutionary search for
training neural networks: a review and some new results. In: Ghosh A, Tsutsui S, editors.
Advances in evolutionary computing: theory and applications. Berlin: Springer; 2003. p. 615–
641.
40. Lourenco HR, Martin O, Stutzle T. Iterated local search: framework and applications. In:
Handbook of metaheuristics, 2nd ed. New York: Springer. 2010.
41. Magdon-Ismail M. No free lunch for noise prediction. Neural Comput. 2000;12:547–64.
42. Martin O, Otto SW, Felten EW. Large-step Markov chains for the traveling salesman problem.
Complex Syst. 1991;5:299–326.
43. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull
Math Biophys. 1943;5:115–33.
44. Mirjalili S, Lewis A, Mostaghim S. Confidence measure: a novel metric for robust meta-
heuristic optimisation algorithms. Inf Sci. 2015;317:114–42.
45. Mladenovic N, Hansen P. Variable neighborhood search. Comput Oper Res. 1997;24:1097–
100.
46. Moore M, Narayanan A. Quantum-inspired computing. Technical Report, Department of Com-
puter Science, University of Exeter, Exeter, UK. 1995.
47. Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7:308–13.
48. von Neumann J. Zur Theorie der Gesellschaftsspiele. Ann Math. 1928;100:295–320.
49. Ozcan E, Bilgin B, Korkmaz EE. A comprehensive analysis of hyper-heuristics. Intell Data
Anal. 2008;12(1):3–23.
50. Passino KM. Biomimicry of bacterial foraging for distributed optimisation and control. IEEE
Control Syst Mag. 2002;22(3):52–67.
51. Paun G. Membrane computing: an introduction. Berlin: Springer; 2002.
52. Ray T, Liew KM. Society and civilization: an optimization algorithm based on simulation of
social behavior. IEEE Trans Evol Comput. 2003;7:386–96.
53. Reynolds RG. An introduction to cultural algorithms. In: Proceedings of the 3rd Annual con-
ference on evolutionary programming, San Diego, CA, USA. New Jersey: World Scientific;
1994. p. 131–139
54. Solis FJ, Wets RJ. Minimization by random search techniques. Math Oper Res. 1981;6:19–30.
55. Stannett M. The computational status of physics: a computable formulation of quantum theory.
Nat Comput. 2009;8:517–38.
56. Storn R, Price K. Differential evolution—a simple and efficient heuristic for global optimization
over continuous spaces. J Glob Optim. 1997;11:341–59.
57. Sumper D. The principles of collective animal behaviour. Philos Trans R Soc B.
2006;36(1465):5–22.
58. Talbi E-G. Metaheuristics: from design to implementation. Hoboken, NJ: Wiley; 2009.

59. Turney P. Myths and legends of the Baldwin effect. In: Proceedings of the 13th international
conference on machine learning, Bari, Italy, July 1996, p. 135–142.
60. Turney P. How to shift bias: lessons from the Baldwin effect. Evol Comput. 1997;4(3):271–95.
61. Voudouris C, Tsang E. Guided local search. Technical Report CSM-247, University of Essex,
Colchester, UK. 1995.
62. Whitley D, Gordon VS, Mathias K. Lamarckian evolution, the Baldwin effect and function
optimization. In: Proceedings of the 3rd Conference on parallel problem solving from nature
(PPSN III), Jerusalem, Israel, October 1994. p. 6–15.
63. Wolpert DH, Macready WG. No free lunch theorems for search. Technical Report SFI-TR-95-
02-010, Santa Fe Institute, Sante Fe, NM, USA. 1995.
64. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Com-
put. 1997;1(1):67–82.
65. Wolpert DH, Macready WG. Coevolutionary free lunches. IEEE Trans Evol Comput.
2005;9(6):721–35.
66. Yao X. Evolving artificial neural networks. Proc IEEE. 1999;87(9):1423–47.
67. Zhu H. No free lunch for cross validation. Neural Comput. 1996;8(7):1421–6.
68. Zimmermann HJ, Sebastian HJ. Intelligent system design support by fuzzy-multi-criteria deci-
sion making and/or evolutionary algorithms. In: Proceedings of IEEE international conference
on fuzzy systems, Yokohama, Japan, March 1995. p. 367–374.
69. Zitzler E, Thiele L, Laumanns M, Fonseca CM, da Fonseca VG. Performance assessment of
multiobjective optimizers: an analysis and review. IEEE Trans Evol Comput. 2003;7:117–32.
2 Simulated Annealing

This chapter is dedicated to simulated annealing (SA) metaheuristic for optimization.


SA is a probabilistic single-solution-based search method inspired by the annealing
process in metallurgy. Annealing is a physical process where a solid is slowly cooled
until its structure is eventually frozen at a minimum energy configuration. Various
SA variants are also introduced.

2.1 Introduction

Annealing refers to tempering certain alloys of metal, glass, or crystal by
heating the material above its melting point, holding its temperature, and then cooling it very
slowly until it solidifies into a perfect crystalline structure. This physical/chemical
process produces high-quality materials. The simulation of this process is known
as simulated annealing (SA) [4,10]. The defect-free crystal state corresponds to the
global minimum energy configuration. There is an analogy of SA with an optimiza-
tion procedure. The physical material states correspond to problem solutions, the
energy of a state to cost of a solution, and the temperature to a control parameter.
The Metropolis algorithm is a simple method for simulating the evolution to the
thermal equilibrium of a solid for a given temperature [14]. SA [10] is a variant of
the Metropolis algorithm, where the temperature is changing from high to low. SA
is basically composed of two stochastic processes: one process for the generation of
solutions and the other for the acceptance of solutions. The generation temperature is
responsible for the correlation between generated probing solutions and the original
solution.
SA is a descent algorithm modified by random ascent moves in order to escape
local minima which are not global minima. The annealing algorithm simulates a
nonstationary finite state Markov chain whose state space is the domain of the cost
function to be minimized. Importance sampling is the main principle that underlies


SA. It has been used in statistical physics to choose sample states of a particle
system model to efficiently estimate some physical quantities. Importance sampling
probabilistically favors states with lower energies.
SA is a general-purpose, serial algorithm for finding a global minimum for a
continuous function. It is also a popular Monte Carlo algorithm for any optimization
problem including COPs. The solutions by this technique are close to the global
minimum within a polynomial upper bound for the computational time and are
independent of the initial conditions. Some parallel algorithms for SA have been
proposed aiming to improve the accuracy of the solutions by applying parallelism [5].

2.2 Basic Simulated Annealing

According to statistical thermodynamics, Pα , the probability of a physical system


being in state α with energy E α at temperature T satisfies the Boltzmann distribution1
P_\alpha = \frac{1}{Z} e^{-E_\alpha/(k_B T)}, \qquad (2.1)
where k_B is Boltzmann's constant, T is the absolute temperature, and Z is the
partition function, defined by

Z = \sum_{\beta} e^{-E_\beta/(k_B T)}, \qquad (2.2)
the summation being taken over all states β with energy E β at temperature T . At
high T , the Boltzmann distribution exhibits uniform preference for all the states,
regardless of the energy. When T approaches zero, only the states with minimum
energy have nonzero probability of occurrence.
In SA, the constant k B is omitted. At high T , the system ignores small changes in
the energy and approaches thermal equilibrium rapidly, that is, it performs a coarse
search of the space of global states and finds a good minimum. As T is lowered, the
system responds to small changes in the energy, and performs a fine search in the
neighborhood of the already determined minimum and finds a better minimum. At
T = 0, any change in the system states does not lead to an increase in the energy,
and thus, the system must reach equilibrium if T = 0.
When performing SA, theoretically a global minimum is guaranteed to be reached
with high probability. The artificial thermal noise is gradually decreased in time. T is
a control parameter called computational temperature, which controls the magnitude
of the perturbations of the energy function E(x). The probability of a state change is
determined by the Boltzmann distribution of the energy difference of the two states:
P = e^{-\Delta E/T}. \qquad (2.3)

1 Also known as the Boltzmann–Gibbs distribution.



The probability of uphill moves in the energy function (ΔE > 0) is large at high T ,
and is low at low T . SA allows uphill moves in a controlled fashion: It attempts to
improve on greedy local search by occasionally taking a risk and accepting a worse
solution. SA can be performed as Algorithm 2.1 [10].

Algorithm 2.1 (SA).

1. Initialize the system configuration.


Randomize x(0).
2. Initialize T with a large value.
3. Repeat:
a. Repeat:
i. Apply random perturbations to the state x = x + Δx.
ii. Evaluate ΔE(x) = E(x + Δx) − E(x):
if ΔE(x) < 0, keep the new state;
otherwise, accept the new state with probability P = e^{-\Delta E/T}.

until the number of accepted transitions is below a threshold level.


b. Set T = T − ΔT .
until T is small enough.
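A minimal Python sketch of Algorithm 2.1 is given below; it uses the geometric cooling schedule T ← αT that is commonly applied in practice (see the discussion later in this section), and the parameter values, move generator, and one-dimensional test function are illustrative assumptions.

```python
import math
import random

def simulated_annealing(energy, x0, neighbor, T0=100.0, alpha=0.95,
                        steps_per_temp=100, T_min=1e-3):
    """SA sketch: random perturbations, Metropolis acceptance, geometric cooling."""
    x, T = x0, T0
    best = x
    while T > T_min:
        for _ in range(steps_per_temp):
            x_new = neighbor(x)
            dE = energy(x_new) - energy(x)
            # Accept improvements; accept uphill moves with probability exp(-dE/T).
            if dE < 0 or random.random() < math.exp(-dE / T):
                x = x_new
                if energy(x) < energy(best):
                    best = x
        T *= alpha   # geometric cooling instead of the slow logarithmic schedule
    return best

# Toy usage: minimize f(x) = x^2 with Gaussian moves around the current state.
f = lambda x: x * x
print(simulated_annealing(f, x0=10.0, neighbor=lambda x: x + random.gauss(0.0, 1.0)))
```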

The basic SA procedure is known as Boltzmann annealing. The cooling schedule


for T is critical to the efficiency of SA. If T is reduced too rapidly, a premature
convergence to a local minimum may occur. In contrast, if it is too slow, the algorithm
is very slow to converge. Based on a Markov-chain analysis on the SA process,
Geman and Geman [6] have proved that a simple necessary and sufficient condition
on the cooling schedule for the algorithm state to converge in probability to the set
of globally minimum cost states is that T must be decreased according to
$$T(t) \geq \frac{T_0}{\ln(1 + t)}, \quad t = 1, 2, \ldots \qquad (2.4)$$
to ensure convergence to the global minimum with probability one, where T0 is a
sufficiently large initial temperature.
Given a sufficiently large number of iterations at each temperature, SA is proved
to converge almost surely to the global optimum [8]. In [8], it is shown that $T_0$ must be
greater than or equal to the depth of the deepest local minimum which is not a global
minimum state. In order to guarantee that Boltzmann annealing converges to the global
minimum with probability one, T (t) needs to decrease logarithmically with time.
This is practically too slow. In practice, one usually applies, in Step 3b, a fast schedule
T (t) = αT (t − 1) with 0.85 ≤ α ≤ 0.96, to achieve a suboptimal solution.
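As an illustration, the following is a minimal MATLAB sketch of Algorithm 2.1 using the fast geometric cooling schedule; the energy function, the Gaussian perturbation, the inner-loop length, and all parameter values are illustrative assumptions, not prescriptions from the text.

% Minimal SA sketch (Algorithm 2.1) with geometric cooling T = alpha*T.
% The energy E, the perturbation model, and the parameter values below
% are illustrative assumptions.
E = @(x) sum(x.^2);              % example energy (sphere function)
x = 10*rand(1, 5) - 5;           % random initial configuration x(0)
T = 100;                         % large initial temperature
alpha = 0.9;                     % cooling factor in [0.85, 0.96]
sigma = 0.5;                     % magnitude of the random perturbation
xbest = x;  Ebest = E(x);
while T > 1e-3                   % stop when T is small enough
    for k = 1:100                % inner loop at a fixed temperature
        xnew = x + sigma*randn(size(x));   % random perturbation
        dE = E(xnew) - E(x);
        if dE < 0 || rand < exp(-dE/T)     % Metropolis acceptance rule
            x = xnew;
            if E(x) < Ebest, xbest = x; Ebest = E(x); end
        end
    end
    T = alpha*T;                 % fast (suboptimal) cooling schedule
end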

However, due to its Monte Carlo nature, SA would require for some problems
even more iterations than complete enumeration in order to guarantee convergence to
an exact solution. For example, for an n-city TSP, SA using the logarithmic cooling
schedule needs a computational complexity of $O\!\left(n^{n^{2n-1}}\right)$, which is far more than
$O((n-1)!)$ for complete enumeration and $O\!\left(n^2 2^n\right)$ for dynamic programming [1].
Thus, one has to apply heuristic fast cooling schedules to improve the convergence
speed.

Example 2.1: We want to minimize the Easom function of two variables:

$$\min_{x} f(x) = -\cos x_1 \cos x_2 \exp\!\left(-(x_1 - \pi)^2 - (x_2 - \pi)^2\right), \quad x \in [-100, 100]^2.$$

The Easom function is plotted in Figure 2.1. The global minimum value is −1 at
$x = (\pi, \pi)^T$. This problem is hard since the search space is wide, the function
rapidly decays to values very close to zero, and it has numerous local minima with
function values close to zero. The function is similar to a needle-in-a-haystack
function: the global optimum is confined to a very small region.
MATLAB Global Optimization Toolbox provides an SA solver,
simulannealbnd, which assumes that the objective function takes a single input x.
We run simulannealbnd with the default settings: an initial temperature
of 100 for each dimension and the temperatureexp temperature function with a
factor of 0.95. The solver fails to find the global optimum in all of ten runs
when the initial point $x_0$ is randomly selected within the range $[-100, 100]^2$. Even if we
set $x_0 = (3, 3)$, which is very close to the global optimum, the algorithm still cannot
find the global minimum.

Figure 2.1 The Easom function when $x \in [-10, 10]^2$.



Figure 2.2 The evolution of a random run of the simulannealbnd solver over about 1600 iterations: the best function value (reaching −0.99969) and the current function value (ending near −7.7973e−005).

After restricting the search space to $[-10, 10]^2$ and selecting a random initial
point $x_0 \in [-0.5, 0.5]^2$, a random run gives $f(x) = -0.9997$ at
(3.1347, 3.1542) after 1597 function evaluations. The evolution of the
simulannealbnd solver is given in Figure 2.2.
These results are very close to the global minimum.
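For reference, a sketch of the restricted-search run of Example 2.1 is given below; it assumes the Global Optimization Toolbox is installed, and the exact numbers returned vary from run to run.

% Sketch of Example 2.1 with the restricted search space [-10, 10]^2.
easom = @(x) -cos(x(1))*cos(x(2))*exp(-(x(1)-pi)^2 - (x(2)-pi)^2);
x0 = rand(1, 2) - 0.5;                   % random start in [-0.5, 0.5]^2
lb = [-10 -10];  ub = [10 10];           % restricted bounds
[xmin, fmin] = simulannealbnd(easom, x0, lb, ub);
% A typical run returns xmin near (pi, pi) and fmin near -1.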

2.3 Variants of Simulated Annealing

Standard SA is a stochastic search method, and its convergence to the global opti-
mum is too slow when a reliable cooling schedule is used. Many methods, such as Cauchy
annealing [18], simulated reannealing [9], generalized SA [19], and SA with known
global value [13] have been proposed to accelerate SA search. There are also global
optimization methods that make use of the idea of annealing [15,17].
Cauchy annealing [18] replaces the Boltzmann distribution with the Cauchy dis-
tribution, also known as the Cauchy–Lorentz distribution. The infinite variance pro-
vides a better ability to escape from local minima and allows for the use of faster
schedules, such as T decreasing according to $T(t) = T_0/t$.
In simulated reannealing [9], T decreases exponentially with t:

$$T = T_0\, e^{-c_1 t^{1/J}}, \qquad (2.5)$$

where the constant c1 > 0, and J is the dimension of the input space. The intro-
duction of reannealing also permits adaptation to changing insensitivities in the
multidimensional parameter space.
Generalized SA [19] generalizes both Cauchy annealing [18] and Boltzmann
annealing [10] within a unified framework inspired by the generalized thermostatis-
tics. Opposition-based SA [20] improves SA in accuracy and convergence rate using
opposite neighbors.
An SA algorithm under the simplifying assumption of known global value [13]
is the same as Algorithm 2.1 except that at each iteration a uniform random point is
generated over a sphere whose radius depends on the difference between the current
function value E (x(t)) and the optimal value E ∗ , and T is also decided by this
difference. The algorithm has guaranteed convergence and an upper bound for the
expected first hitting time, namely, the expected number of iterations before reaching
the global optimum value within a given accuracy [13].
The idea of annealing is a general optimization principle, which can be extended
using fuzzy logic. In the fuzzy annealing scheme [15], fuzzification is performed by
adding an entropy term. The fuzziness at the beginning of the entire procedure is
used to prevent the optimization process from getting stuck at an inferior local optimum.
Fuzziness is reduced step by step. The fuzzy annealing scheme results in an increase
in the computation speed by a factor of one hundred or more compared to SA [15].
Since SA works by simulating from a sequence of distributions scaled with dif-
ferent temperatures, it can be regarded as Markov chain Monte Carlo (MCMC) with
a varying temperature. The stochastic approximation Monte Carlo (SAMC) algo-
rithm [12] has a remarkable feature of its self-adjusting mechanism. If a proposal
is rejected, the weight of the subregion that the current sample belongs to will be
adjusted to a larger value, and thus the proposal of jumping out from the current
subregion will be less likely rejected in the next iteration. Annealing SAMC [11] is
a space-annealing version of SAMC. Under mild conditions, it can converge weakly
at a rate of $1/\sqrt{t}$ toward a neighboring set (in the space of energy) of the global
minimizers.
Reversible jump MCMC [7] is a framework for the construction of reversible
Markov chain samplers that jump between parameter subspaces of differing dimen-
sionality. The measure of interest occurs as the stationary measure of the chain. This
iterative algorithm does not depend on the initial state. At each step, a transition
from the current state to a new state is accepted with a probability. This acceptance
ratio is computed so that the detailed balance condition is satisfied, under which
the algorithm converges to the measure of interest. The proposition kernel can be
decomposed into several kernels, each corresponding to a reversible move. In order
for the underlying sampler to ensure the jump between different dimensions, the
various moves used are the birth move, death move, split move, merge move, and
perturb move, each with a probability of 0.2 [2]. SA with reversible-jump MCMC
method [2] has proved convergence.
SA makes a random search on the energy surface. Deterministic annealing [16,17]
is a deterministic method that replaces stochastic simulations by the use of expecta-
tion. It is a method where randomness is incorporated into the energy or cost function,

which is then deterministically optimized at a sequence of decreasing temperature.


The iterative procedure of deterministic annealing is monotone nonincreasing in the
cost function. Deterministic annealing is able to escape local minima and reach a
global solution quickly. The approach is derived within a probabilistic framework
from basic information-theoretic principles (e.g., maximum entropy and random
coding). The application-specific cost is minimized subject to a constraint on the
randomness (Shannon entropy) of the solution, which is gradually lowered [17]. The
annealing process is equivalent to computation of Shannon’s rate-distortion function,
and the annealing temperature is inversely proportional to the slope of the curve.
Parallel SA algorithms take advantage of parallel processing. In [3], a fixed set of
samplers operates each at a different temperature. Each sampler performs the gener-
ate, evaluate, and decide cycle at a different temperature. A solution that costs less is
propagated from the higher temperature sampler to the neighboring sampler operat-
ing at a lower temperature. Therefore, the best solution at a given time is propagated
to all the samplers operating at a lower temperature. Coupled SA [21] is charac-
terized by a set of parallel SA processes coupled by their acceptance probabilities.
Coupling is performed by a term in the acceptance probability function, which is a
function of the energies of the current states of all SA processes. The addition of the
coupling and the variance control leads to considerable improvements with respect
to the uncoupled case.

Problems

2.1 Implement SA to minimize the 5-dimensional Ackley function. The parameters
are: inverse cooling β = 0.01, initial temperature 100, and iteration number 1000.
Keep track of the best-so-far solution $x_k^*$ as a function of the iteration number k
for 10 runs. Plot the average value of $x_k^*$ over the 10 runs.
2.2 Implement the simulannealbnd solver of MATLAB Global Optimization
Toolbox for solving a benchmark function. Test the influence of different para-
meter settings.
2.3 Run the accompanying MATLAB code of SA to find the global minimum of
six-hump-camelback function in the Appendix. Investigate how the parameters
influence the performance.

References
1. Aarts E, Korst J. Simulated annealing and Boltzmann machines. Chichester: Wiley; 1989.
2. Andrieu A, de Freitas JFG, Doucet A. Robust full Bayesian learning for radial basis networks.
Neural Comput. 2001;13:2359–407.
3. Azencott R. Simulated annealing: parallelization techniques. New York: Wiley; 1992.
4. Cerny V. Thermodynamical approach to the traveling salesman problem: an efficient simulation
algorithm. J Optim Theory Appl. 1985;45:41–51.

5. Czech ZJ. Three parallel algorithms for simulated annealing. In: Proceedings of the 4th inter-
national conference on parallel processing and applied mathematics, Naczow, Poland. London:
Springer; 2001. p. 210–217.
6. Geman S, Geman D. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration
of images. IEEE Trans Pattern Anal Mach Intell. 1984;6:721–41.
7. Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model deter-
mination. Biometrika. 1995;82:711–32.
8. Hajek B. Cooling schedules for optimal annealing. Math Oper Res. 1988;13(2):311–29.
9. Ingber L. Very fast simulated re-annealing. Math Comput Model. 1989;12(8):967–73.
10. Kirkpatrick S, Gelatt CD Jr, Vecchi MP. Optimization by simulated annealing. Science.
1983;220:671–80.
11. Liang F. Annealing stochastic approximation Monte Carlo algorithm for neural network train-
ing. Mach Learn. 2007;68:201–33.
12. Liang F, Liu C, Carroll RJ. Stochastic approximation in Monte Carlo computation. J Am Stat
Assoc. 2007;102:305–20.
13. Locatelli M. Convergence and first hitting time of simulated annealing algorithms for contin-
uous global optimization. Math Methods Oper Res. 2001;54:171–99.
14. Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E. Equations of state calculations
by fast computing machines. J Chem Phys. 1953;21(6):1087–92.
15. Richardt J, Karl F, Muller C. Connections between fuzzy theory, simulated annealing, and
convex duality. Fuzzy Sets Syst. 1998;96:307–34.
16. Rose K, Gurewitz E, Fox GC. A deterministic annealing approach to clustering. Pattern Recog-
nit Lett. 1990;11(9):589–94.
17. Rose K. Deterministic annealing for clustering, compression, classification, regression, and
related optimization problems. Proc IEEE. 1998;86(11):2210–39.
18. Szu HH, Hartley RL. Nonconvex optimization by fast simulated annealing. Proc IEEE.
1987;75:1538–40.
19. Tsallis C, Stariolo DA. Generalized simulated annealing. Phys A. 1996;233:395–406.
20. Ventresca M, Tizhoosh HR. Simulated annealing with opposite neighbors. In: Proceedings
of the IEEE symposium on foundations of computational intelligence (SIS 2007), Honolulu,
Hawaii, 2007. p. 186–192.
21. Xavier-de-Souza S, Suykens JAK, Vandewalle J, Bolle D. Coupled simulated annealing. IEEE
Trans Syst Man Cybern Part B. 2010;40(2):320–35.
3 Genetic Algorithms

Evolutionary algorithms (EAs) are the most influential metaheuristics for optimiza-
tion. Genetic algorithm (GA) is the most popular form of EA. In this chapter, we
first give an introduction to evolutionary computation. A state-of-the-art description
of GA is then presented.

3.1 Introduction to Evolutionary Computation

Evolutionary algorithms (EAs) are a class of general-purpose stochastic global opti-


mization algorithms under the universally accepted neo-Darwinian paradigm for
simulating the natural evolution of biological systems. The neo-Darwinian para-
digm is a combination of classical Darwinian evolutionary theory, the selectionism
of Weismann, and the genetics of Mendel. Evolution itself can be accelerated by
integrating learning, either in the form of the Lamarckian strategy or based on the
Baldwin effect. EAs are currently a major approach to adaptation and optimization.
EAs and similar population-based methods are simple, parallel, general-purpose,
global optimization methods. They are useful for any optimization problem, par-
ticularly when conventional calculus-based optimization techniques are difficult to
implement or are inapplicable. EAs can reliably and quickly solve hard problems that are large,
complex, noncontinuous, nondifferentiable, and multimodal. The approach is easy
to hybridize, and can be directly interfaced to existing simulations and models. EAs
can usually reach a near-optimum or the global optimum. EAs possess inherent
parallelism by evaluating multiple points simultaneously.
A typical EA may consist of a population generator and selector, a fitness estima-
tor, and three basic genetic operators, namely, crossover (also called recombination),
mutation, and selection. Individuals in a population compete and exchange informa-
tion with one another.
Biologically, both crossover and mutation are considered the driving forces of evo-
lution. Crossover occurs when two parent chromosomes, normally two homologous
instances of the same chromosome, break and then reconnect but to the different end


pieces. Mutations can be caused by copying errors in the genetic material during cell
division and by external environment factors. Although the overwhelming majority
of mutations have no real effect, some can cause disease in organisms due to partially
or fully nonfunctional proteins arising from the errors in the protein sequence.
The procedure of a typical EA (in the form of GA) is given by Algorithm 3.1.
The initial population is usually generated randomly, while the populations of subsequent
generations are generated by a selection/reproduction procedure. The search
process of an EA terminates when a termination criterion is met; otherwise a
new generation is produced and the search continues. The termination
criterion can be a maximum number of generations, or the convergence
of the genotypes of the individuals. Convergence of the genotypes occurs when all
the values in the same positions of all the strings are identical; crossover then has no
effect on further processing. Phenotypic convergence without genotypic convergence
is also possible. For a given system, the objective values are required to be mapped
into fitness values so that the fitness values are always greater than zero.

Algorithm 3.1 (EA).

1. Set t = 0.
2. Randomize initial population P (0).
3. Repeat:
a. Evaluate fitness of each individual of P (t).
b. Select individuals as parents from P (t) based on fitness.
c. Apply search operators (crossover and mutation) to parents, and generate
P (t + 1).
d. Set t = t + 1.
until termination criterion is satisfied.
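A minimal MATLAB sketch of Algorithm 3.1 for binary chromosomes follows; the fitness function, parameter values, and the particular operators (roulette-wheel selection, one-point crossover, bit-flip mutation) are illustrative assumptions anticipating the operators described later in this chapter.

% Minimal GA sketch of Algorithm 3.1 on binary strings (illustrative).
NP = 30;  L = 20;  Pc = 0.8;  Pm = 1/L;  Tmax = 100;
fitness = @(P) sum(P, 2);                 % example: maximize number of ones
P = rand(NP, L) > 0.5;                    % random initial population P(0)
for t = 1:Tmax
    f = fitness(P);
    cp = cumsum(f)/sum(f);                % roulette-wheel parent selection
    idx = zeros(NP, 1);
    for i = 1:NP, idx(i) = find(cp >= rand, 1); end
    Q = P(idx, :);                        % selected parents
    for i = 1:2:NP-1                      % one-point crossover on pairs
        if rand < Pc
            c = randi(L-1);
            tmp = Q(i, c+1:end);
            Q(i, c+1:end) = Q(i+1, c+1:end);
            Q(i+1, c+1:end) = tmp;
        end
    end
    mask = rand(NP, L) < Pm;              % uniform bit-flip mutation
    Q(mask) = ~Q(mask);
    P = Q;                                % generational replacement
end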

EAs perform a directed stochastic global search. They employ a structured, yet ran-
domized, parallel multipoint search strategy that is biased toward reinforcing search
points of high fitness. The evaluation function must be calculated for all the individ-
uals of the population, resulting in a high computational load. This cost can be
reduced by introducing learning into EAs, depending on the prior knowledge of a
given optimization problem.
EAs can be broadly divided into genetic algorithms (GAs) [46,47], evolution
strategies (ESs) [75], evolutionary programming [25], genetic programming (GP)
[55], differential evolution (DE) [88], and estimation of distribution algorithms
(EDAs) [67].
C++ code libraries for EAs are available such as Wall’s GALib at http://lancet.
mit.edu/ga/, and EOlib at http://eodev.sourceforge.net.

3.1.1 Evolutionary Algorithms Versus Simulated Annealing

Markov chain Monte Carlo (MCMC) methods are often used to sample from
intractable target distributions. SA is an instance of MCMC. The stochastic process
of EAs is basically similar to that of MCMC algorithms: Both are Markov chains with
fixed transition matrices between individual states, for instance, transition matrices
given by mutation and recombination operators for EAs and by perturbation operators
for MCMC. MCMC uses a single chain whereas EAs use a population of individuals
that interact.
In SA, at each search step, the choice between accepting and rejecting a candidate state
is controlled by a random function. In an EA, this is achieved by the crossover and
mutation operations. Whether an EA converges prematurely to a local minimum or reaches
the global optimum is usually controlled by suitably selecting the probabilities of crossover and mutation.
This is comparable to the controlled lowering of the temperature in SA. Thus, SA
can be viewed as a subset of EAs with a population of one individual and a changing
mutation rate.
SA is too slow for practical use. EAs are much more effective in finding the
global minimum due to their simplicity and parallel nature. The inherent parallel
property also offsets their high computational cost. A combination of SA and EAs
inherits the parallelization of EAs and avoids the computational bottleneck of EAs
by incorporating elements of SA. The hybrid retains the best properties of both
paradigms. Many efforts in the synergy of the two approaches have been made in
the past decade [12,100].
Guided evolutionary SA [100] incorporates the idea of SA into the selection
process of evolutionary computation in place of arbitrary heuristics. The hybrid
method is practically a number of parallel SA processes. Genetic SA [12] provides
a completely parallel, easily scalable hybrid GA/SA method. The hybrid method
combines the recombinative power of GA and annealing schedule of SA. Population
MCMC [21,56] simulates several (Markov) chains in parallel. The MCMC chains
interact through recombination and selection.

3.2 Terminologies of Evolutionary Computation

Some terminologies that are used in the evolutionary computation literature are listed
below. These terminologies are an analogy to their biological counterparts.

Definition 3.1 (Population). A set of individuals in a generation is called a popu-
lation, $P(t) = \{x_1, x_2, \ldots, x_{N_P}\}$, where $x_i$ is the ith individual and $N_P$ is the size
of the population.

Definition 3.2 (Chromosome). Each individual x i in a population is a single chro-


mosome. A chromosome, sometimes called a genome, is a set of parameters that
define a solution to the problem.

Biologically, a chromosome is a long, continuous piece of DNA, that contains


many genes, regulatory elements and other intervening nucleotide sequences. Nor-
mal members of a particular species all have the same number of chromosomes.
For example, human body cells contain 46 chromosomes in 23 pairs, i.e., a set of 23
chromosomes from the mother and a set of 23 chromosomes from the father. Col-
lectively, a chromosome is used to encode a biological organism, that is, to store all
the genetic information of an individual. Diploid GA [101] has two chromosomes in
each individual. It promises robustness and preserves population diversity in contrast
to simple GA.

Definition 3.3 (Gene). In EAs, each chromosome x comprises a string of elements
$x_i$, called genes, i.e., $x = (x_1, x_2, \ldots, x_n)$, where n is the number of genes in the
chromosome. Each gene encodes a parameter of the problem into the chromosome.
A gene is usually encoded as a binary string or a real number.

In biology, genes are entities that parents pass to offspring during reproduction.
These entities encode information essential for the construction and regulation of pro-
teins and other molecules that determine the growth and functioning of the organism.

Definition 3.4 (Allele). Biologically, an allele is any one of a number of alternative
forms of the same gene occupying a given position on a chromosome; this position
is called the locus of the gene.

Alleles are the smallest information units in a chromosome. In nature, alleles exist
pairwise, whereas in EAs an allele is represented by only one symbol and it indicates
the value of a gene.
Figure 3.1 illustrates the differences between chromosome, gene, and allele.

Definition 3.5 (Genotype). A genotype is biologically referred to the underlying


genetic coding of a living organism, usually in the form of DNA. In EAs, a genotype
represents a coded solution, that is, an individual’s chromosome.

The genotype of each organism corresponds to an observable, known as a phenotype.

Figure 3.1 Alleles, genes, and chromosomes (illustrated on the example binary chromosome 10111001011011).



Definition 3.6 (Phenotype). Biologically, the phenotype of an organism is either


its total physical appearance and constitution or a specific manifestation of a trait.
There is a phenotype associated with each individual. The phenotype of an individual
is the set of all its traits (including its fitness and its genotype).

A phenotype is determined by genotype or multiple genes and influenced by environ-


mental factors. The concept of phenotypic plasticity describes the degree to which
an organism’s phenotype is determined by its genotype. A high level of plasticity
means that environmental factors have a strong influence on the particular phenotype
that develops. The ability to learn is the most obvious example of phenotypic plas-
ticity. As another example of phenotypic plasticity, sports can strengthen muscles.
However, some organs have very low phenotypic plasticity, for example, the color
of human eyes cannot be changed by environment.
The mapping of a set of genotypes to a set of phenotypes is referred to as a
genotype–phenotype map. In EAs, a phenotype represents a decoded solution.

Definition 3.7 (Fitness). Fitness in biology refers to the ability of an individual of a
certain genotype to reproduce. The set of all possible genotypes and their respective
fitness values is called a fitness landscape.

Fitness function is a particular type of objective function that quantifies the opti-
mality of a solution, i.e., a chromosome, in an EA. It is used to map an individual’s
chromosome into a positive number. Fitness is the value of the objective function
for a chromosome x i , namely f (x i ). The fitness function is used to convert the
phenotype’s parameter values into the fitness.

Definition 3.8 (Natural Selection). Natural selection alters biological populations
over time by propagating heritable traits that help individual organisms survive
and reproduce. It adapts a species to its environment. Natural selection does not
distinguish between its two forms, namely, ecological selection and sexual selection,
but it is concerned with those traits that help individuals to survive the environment
and to reproduce. Natural selection causes traits to become more prevalent when
they contribute to fitness.

Natural selection is different from artificial selection. Genetic drift and gene
flow are two other mechanisms in biological evolution. Gene flow, also known as
gene migration, is the migration of genes from one population to another.

Definition 3.9 (Genetic Drift). As opposed to natural selection, genetic drift is a


stochastic process that arises from random sampling in the reproduction. Genetic
drift is the tendency of the selection mechanism to converge over time toward a
uniform distribution of mutants of the fittest individual. It changes allele frequencies
(gene variations) in a population over many generations and affects traits that are
more neutral.

The genes of a new generation are a sampling from the genes of the successful
individuals of the previous one, but with some statistical error. Genetic drift is the
cumulative effect over time of this sampling error on the allele frequencies in the
population, and traits that do not affect reproductive fitness change in a population
over time. Like selection, genetic drift acts on populations, altering allele frequencies
and the predominance of traits. It occurs most rapidly in small populations and can
lead some alleles to become extinct or to become the only alleles in the population,
thus reducing the genetic diversity of finite populations.

3.3 Encoding/Decoding

GA uses binary coding. A chromosome x is a potential solution, denoted by a con-
catenation of the parameters $x = (x_1, x_2, \ldots, x_n)$, where each $x_i$ is a gene, and the
value of $x_i$ is an allele. x is encoded in the form

$$\underbrace{01\ldots00}_{x_1}\;\underbrace{10\ldots10}_{x_2}\;\cdots\;\underbrace{10\ldots11}_{x_n}. \qquad (3.1)$$

If the chromosome is l bits long, it has $2^l$ possible values. If the variable $x_i$ is in the
range $\left[x_i^-, x_i^+\right]$ with a coding $s_{l_i} \ldots s_2 s_1$, where $l_i$ is its bit-length in the chromosome
and $s_i \in \{0, 1\}$, then the decoding function is given by

$$x_i = x_i^- + \left(x_i^+ - x_i^-\right)\frac{1}{2^{l_i}-1}\sum_{j=0}^{l_i-1} s_j 2^j. \qquad (3.2)$$
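A small MATLAB sketch of the decoding rule (3.2) is given below; the function name and the convention that s(1) holds the least significant bit $s_1$ are assumptions made for illustration.

% Decoding of one gene according to (3.2); s is a 0-1 vector holding
% the bits s_1, ..., s_{l_i}, with s(1) the least significant bit.
function xi = decode_gene(s, xlow, xhigh)
li = numel(s);
val = sum(s(:)' .* 2.^(0:li-1));        % integer value of the bit string
xi = xlow + (xhigh - xlow)*val/(2^li - 1);
end

For example, decode_gene([1 1 1 1], 0, 1) returns 1, and the all-zero string returns 0.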

In binary coding, there is the so-called Hamming cliffs phenomenon, where large
Hamming distances between the binary codes of adjacent integers occur. Gray coding
is another approach to encoding the parameters into bits. The decimal value of a
Gray-encoded integer variable increases or decreases by 1 if only one bit is changed.
However, the Hamming distance does not monotonically increase with the difference
in integer values.
For a long period, Gray encoding was believed to outperform binary encoding
in GA. However, based on a Markov chain analysis of GA, there is little difference
between the performance of binary and Gray codings for all possible functions [10].
Also, Gray coding does not necessarily improve the performance for functions that
have fewer local minima in the Gray representation than in the binary representation.
This reiterates the no free lunch theorem, namely, no representation is superior for
all classes of problems.

Example 3.1: The conversion from binary coding to Gray coding is formulated as

$$g_i = \begin{cases} b_1, & i = 1,\\ b_i \oplus b_{i-1}, & i > 1, \end{cases} \qquad (3.3)$$

where gi and bi are, respectively, the ith Gray code bit and the ith binary code bit,
which are numbered from 1 to n starting on the left, and ⊕ denotes addition mod 2,
i.e., exclusive-or. Gray coding can be converted into binary coding by

$$b_i = \sum_{j=1}^{i} g_j, \qquad (3.4)$$

where the summation denotes summation mod 2. As an example, we can check the
equivalence between the binary code 1011011011 and the Gray code 1110110110.
From the two equations, the most significant i bits of the binary code determine the
most significant i bits of the Gray code and vice versa.
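The two conversions can be checked numerically with the following MATLAB sketch; the bit vectors are written with the most significant bit first, as in Example 3.1.

% Binary-to-Gray conversion (3.3) and Gray-to-binary conversion (3.4).
b = [1 0 1 1 0 1 1 0 1 1];                    % binary code of Example 3.1
g = [b(1), xor(b(2:end), b(1:end-1))];        % Gray code: 1110110110
b2 = mod(cumsum(g), 2);                       % mod-2 prefix sums recover b
isequal(b, b2)                                % returns true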

The performance of GA depends on the choice of the encoding techniques. GA


usually uses fixed-length binary coding, which results in limited accuracy and slow
convergence when approaching the optimum. This drawback can also be eliminated
by introducing adaptation into GA. Greater accuracy in the final solution is obtained
and convergence speed is increased by dynamically controlling the coding of the
search space. Examples of adaptive coding include delta coding [61], dynamic para-
meter encoding [81], and fuzzy coding [83,89]. There are also variable-length encod-
ing methods [34,61]. In messy GA [34], both the value and the position of each bit
are encoded in the chromosome.
The addition of fuzzy rules to control coding changes provides a more uniform
performance in GA search. Examples of fuzzy encoding techniques for GA are
the fuzzy GA parameter coding [89] and the fuzzy coding for chromosomes [83].
Compared with other coding methods, each parameter in the fuzzy coding always
falls within the desired range, thus removing the additional overheads on the genetic
operators. Prior knowledge from the problem domain can be integrated easily.

3.4 Selection/Reproduction

Selection embodies the principle of survival of the fittest, which provides a driving
force in GA. Selection is based on the fitness of the individuals. From a population
P (t), those individuals with strong fitness will be selected for reproduction so as to
generate a population of the next generation, P (t + 1). Chromosomes with larger
fitness are selected and are assigned a higher probability of reproduction.
Sampling chromosomes from the sample space can be in a stochastic manner,
a deterministic manner, or their mixed mode. The roulette-wheel selection [47] is
a stochastic selection method, while the ranking selection [33] and the tournament
selection [31] are mixed mode selection methods.
Other approaches that incorporate mating preferences into evolutionary systems
are correlative tournament selection [62] and seduction [76].

Roulette-Wheel Selection
The roulette-wheel or proportional selection [31,47] is a simple and popular selection
scheme. Segments of the roulette wheel are allocated to individuals of the population
in proportion to the individuals’ relative fitness scores. Selection of parents is carried
out by successive spins of the roulette wheel, and an individual’s possibility of being
selected is based on its fitness:
$$P_i = \frac{f(x_i)}{\sum_{j=1}^{N_P} f(x_j)}, \quad i = 1, 2, \ldots, N_P. \qquad (3.5)$$

Consequently, a chromosome with larger fitness has a higher probability of producing more
offspring.
Only two chromosomes will be selected to undergo genetic operations. Typically,
the population size N P is relatively small, and this proportional selection may select
a disproportionately large number of unfit chromosomes. This easily induces pre-
mature convergence when all the individuals in the population become very similar
after a few generations. GA thus degenerates into a Monte Carlo-type search method.
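A minimal MATLAB sketch of roulette-wheel selection based on (3.5) is shown below; it assumes the fitness values in f are positive, and the function name is illustrative.

% Roulette-wheel (fitness-proportional) selection: returns the indices
% of NP selected parents; f is a vector of positive fitness values.
function idx = roulette_select(f, NP)
cp = cumsum(f(:))/sum(f);              % cumulative selection probabilities
idx = zeros(NP, 1);
for i = 1:NP
    idx(i) = find(cp >= rand, 1);      % one spin of the wheel per parent
end
end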
Ranking Selection
Ranking selection [33] can eliminate some of the problems inherent in proportional
selection. It can maintain a more constant selective pressure. Individuals are sorted
according to their fitness values. The best individual is assigned the maximum rank
N P and the worst individual the lowest rank 1. The selection probability is linearly
assigned according to their ranks
 
$$P_i = \frac{1}{N_P}\left(\beta - 2(\beta - 1)\frac{i - 1}{N_P - 1}\right), \quad i = 1, 2, \ldots, N_P, \qquad (3.6)$$
where β is selected in [0, 2].
Tournament Selection
Tournament selection [31] involves h individuals at a time. The h chromosomes are
compared and a copy of the best performing individual becomes part of the mating
pool. The tournament will be performed repeatedly N P times until the mating pool
is filled. Typically, the tournament size h, which controls the selective pressure, is
selected as 2.
Tournament selection only uses local information. It is very easy to implement in
parallel and its time complexity is small. However, tournament selection suffers from
selection bias, and even the best individual may not be selected. Unbiased tournament selection
[86] is suggested to diminish this selection error.
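A corresponding MATLAB sketch of tournament selection with tournament size h is given below; sampling with replacement is assumed, and the function name is illustrative.

% Tournament selection: the fittest of h randomly picked individuals
% wins each of the NP tournaments (sampling with replacement).
function idx = tournament_select(f, NP, h)
idx = zeros(NP, 1);
for i = 1:NP
    cand = randi(numel(f), h, 1);      % pick h candidates at random
    [~, best] = max(f(cand));          % index of the fittest candidate
    idx(i) = cand(best);
end
end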
Boltzmann tournament selection [32] introduces probability into tournament
selection. In binary Boltzmann tournament selection, two individuals i and j are
picked randomly with replacement. The probability of i winning the tournament
is given by $p_i = \frac{1}{1 + \exp\left((f_j - f_i)/T\right)}$, where T is a temperature decreasing according to an annealing
schedule, and $f_i$ and $f_j$ are the fitness values of individuals i and j, respectively.

Elitism Strategy
The elitism strategy for selecting the individual with best fitness can improve the
convergence of GA [78]. The elitism strategy always copies the best individual of a
generation to the next generation. Although elitism may increase the possibility of
premature convergence, it improves the performance of GA in most cases and thus,
is integrated in most GA implementations [18].
Truncation selection is also an elitism strategy. It ranks all the individuals in the
current population according to their fitness and selects the best ones as parents.
Truncation selection is used as the basic selection scheme in ES and is also used in
breeder GA [68]. Breeder GA [68] was designed according to the methods used in
livestock breeding, and is based on artificial selection. Stud GA [52] uses the fittest
individual (the stud) in the population as one of the parents in all recombination
operations. Only one parent is selected stochastically.
Fitness-Uniform Selection/Deletion
Fitness-uniform selection and fitness-uniform deletion [49] achieve a population
which is uniformly distributed across fitness values, thus diversity is always preserved
in the population. Fitness-uniform selection generates selection pressure toward
sparsely populated fitness regions, not necessarily toward higher fitness. Fitness-
uniform deletion always deletes those individuals with very commonly occurring
fitness values. As fitness-uniform deletion is only a deletion scheme, EA still requires
a selection scheme. However, within a given fitness level genetic drift can occur,
although the presence of many individuals in other fitness levels to breed with will
reduce this effect.
Multikulti Selection
The natural mate selection of preferring somewhat different individuals has been
proved to increase the resistance to infection of the resulting offspring and thus
fitness. Multikulti methods [2] choose the individuals that are going to be sent to
other nodes based on the principle of multiculturality in an island model. In general,
multikulti policies outperform the usual migration policy of sending the best or a
random individual; however, the size of this advantage tends to be greater as the
number of nodes increases [2].
Replacement Strategies
The selection procedure needs to decide as to how many individuals in one population
will be replaced by the newly generated individuals so as to produce the population
for the new generation. Thus, the selection mechanism is split into two phases,
namely, parental selection and replacement strategy. There are many replacement
strategies such as the complete generational replacement, replace-random, replace-
worst, replace-oldest, and deletion by kill tournament [85]. In the crowding strategy
[20], an offspring replaces one of the parents whom it most resembles using the
similarity measure of the Hamming distance. These replacement strategies may result
in a situation where the best individuals in a generation may fail to reproduce. Elitism
strategy cures the problem by storing the best individuals obtained so far [18].

Statistically, the selective pressures of the different replacement strategies are ranked
as: replace worst > kill tournament > age-based replacement ≈ replace random.
Elitism increases the selective pressure. Elitism can be combined with the kill tour-
nament, the age-based replacement, and the replace random rule. One can define a
probability for replacement so that the individual selected by the replacement rule
will have a chance to survive. This technique decreases the selective pressure.

3.5 Crossover

In sexually reproducing animals, genetic recombination occurs during the fusion of


sperm and egg cells (gametes); this process is called meiosis. Genetic recombination
actually occurs in the initial stage of meiosis. During meiosis, chromosomes in
a diploid cell resegregate, forming four haploid cells. DNA replication has already
occurred prior to meiosis. Each of the chromosomes within the cell has already been
doubled, forming pairs of sister chromatids, or dyads, held together by the kinetochore.
The primary exploration operator in GA is crossover, which searches the range
of possible solutions based on existing solutions. Crossover, as a binary operator,
exchanges information between two selected parent chromosomes at randomly
selected positions and produces two new offspring (individuals). Both the children
will be different from either of their parents, yet retain some features of both.
The crossover method is highly dependent on the method of the genetic coding.
Some of the commonly used crossover techniques are one-point crossover [47],
two-point crossover [20], multipoint crossover [27], and uniform crossover [90].
The crossover points are typically at the same, random positions for both parent
chromosomes. These crossover operators are illustrated in Figure 3.2.
One-Point Crossover
One-point crossover requires one crossover point on the parent chromosomes, and
all the data beyond that point are swapped between the two parent chromosomes. It
is easy to model analytically. The operator generates bias toward bits at the ends of
the strings.
Two-Point Crossover
Two-point crossover selects two points on the parent chromosomes, and everything
between the two points is swapped. The operator causes a smaller schema disruption
than one-point crossover. It eliminates the disadvantage of one-point crossover, but
generates bias at a different level.
Two-point crossover does not sample all regions of the string equally, and the ends
of the string are rarely sampled. This problem can be solved by wrapping around the
string, such that the substring outside the region from the first cut point to the second
is crossed.

Figure 3.2 Illustration of crossover operators. a One-point crossover. b Two-point crossover.
c Multipoint crossover. d Uniform crossover. For multipoint crossover and uniform crossover,
the exchange between crossover points takes place at a fixed probability.

Multipoint Crossover
Multipoint crossover treats each string as a ring of bits divided by m crossover points
into m segments, and each segment is exchanged at a fixed probability.
Uniform Crossover
Uniform crossover exchanges bits of a string rather than segments. Individual bits in
the parent chromosomes are compared, and each of the nonmatching bits is proba-
bilistically swapped with a fixed probability, typically 0.5. The operator is unbiased
with respect to defining length. In half-uniform crossover [23], exactly half of the
nonmatching bits are swapped.
One-point and two-point crossover operations preserve schemata due to low dis-
ruption rates. In contrast, uniform crossover swaps are more exploratory, but have
a high disruptive nature. Uniform crossover is more suitable for small populations,
while two-point crossover is better for large populations. Two-point crossover per-
forms consistently better than one-point crossover [90].
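The following MATLAB fragment sketches one-point and uniform crossover on two binary parents; the example parent strings are arbitrary.

% One-point and uniform crossover on two binary parent strings.
p1 = [1 1 1 1 1 1 1 1];  p2 = [0 0 0 0 0 0 0 0];   % example parents
L = numel(p1);
c = randi(L-1);                         % one-point crossover: random cut
c1 = [p1(1:c), p2(c+1:end)];
c2 = [p2(1:c), p1(c+1:end)];
mask = rand(1, L) < 0.5;                % uniform crossover: swap each bit
u1 = p1;  u2 = p2;                      % with probability 0.5
u1(mask) = p2(mask);
u2(mask) = p1(mask);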
When all the chromosomes are very similar or even the same in the population, it
is difficult to generate a new structure by crossover only and premature convergence
takes place. Mutation operation can introduce genetic diversity into the population.
This prevents premature convergence from happening when all the individuals in the
population become very similar.

3.6 Mutation

Mutation is a unary operator that requires only one parent to generate an offspring.
A mutation operator typically selects a random position of a random chromosome
and replaces the corresponding gene or bit by other information. Mutation helps to
regain the lost alleles into the population.
Mutations can be classified into point mutations and large-scale mutations. Point
mutations are changes to a single position, which can be substitutions, deletions,
or insertions of a gene or a bit. Large-scale mutations can be similar to the point
mutations, but operate in multiple positions simultaneously, or at one point with
multiple genes or bits, or even on the chromosome scale. Functionally, mutation
introduces the necessary amount of noise to perform hill-climbing.
Inversion and rearrangement operators are also large-scale mutation operators.
Inversion operator [47] picks up a portion between two randomly selected positions
within a chromosome and then reverses it. Swap is the most primitive reordering
operator, based on which many new unary operators including inversion can be
derived. The rearrangement operator reshuffles a portion of a chromosome such
that the juxtaposition of the genes or bits is changed. Some mutation operations are
illustrated in Figure 3.3.
Uniform bit-flip mutation is a popular mutation for binary string representations.
It independently changes each bit of a chromosome with a probability of p. Typ-
ically, p = 1/L for a string of L bits. This in expectation changes one bit in each
chromosome. The probability distribution of fitness values after the operation can
be exactly computed as a polynomial in p [14].
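A direct realization of uniform bit-flip mutation in MATLAB is sketched below; the chromosome and mutation rate are illustrative.

% Uniform bit-flip mutation: flip each bit independently with probability p.
x = [1 0 1 1 0 0 1 0];                  % example binary chromosome
p = 1/numel(x);                         % typical setting p = 1/L
mask = rand(size(x)) < p;               % bits selected for flipping
x(mask) = 1 - x(mask);                  % flip the selected bits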

Figure 3.3 Illustration of some mutation operators. a Substitution. b Deletion. c Duplication.
d Inversion. e Insertion.

A high mutation rate can turn genetic search into random search. It may change
the value of an important bit, thereby slowing the convergence to a good solution or
the final stage of convergence. In simple GA, mutation is typically selected as a
substitution operation that changes one random bit in the chromosome at a time. An
empirically derived formula that can be used as the probability of mutation $P_m$ at a
starting point is $P_m = \frac{1}{T\sqrt{l}}$, for a total number of T generations and a string length
of l [80].
The random nature of mutation and its low probability of occurrence leads to slow
convergence of GA. The search process can be expedited by using the directed muta-
tion technique [6] that deterministically introduces new points into the population
by using gradient or extrapolation of the information acquired so far.
It is commonly agreed that crossover plays a more important role if the population
size is large, and mutation is more important if the population size is small [69].
In addition to traditional mutation operators, hill-climbing and bit climber are
two well-known local search operators, which can be treated as mutation operators.
Hill-climbing operators [65] find an alternative similar individual that represents a
local minimum close to the original individual in the solution space. The bit climber
[17] is a simple stochastic bit-flipping operator. The fitness is computed for an initial
string. A bit of the string is randomly selected and flipped, and the fitness is computed
at the new point. If the fitness is improved, the new string is retained as the current
string. The operation repeats until no bit flip improves the fitness. The
bit-based descent algorithm is several times faster than an efficient GA [17].

3.7 Noncanonical Genetic Operators

Most selection schemes are based on individuals’ fitness. The entropy-Boltzmann


selection method [58], which stems from the concept of entropy and the importance sampling
methods of Monte Carlo simulation, tends to escape from local optima. It avoids
the problem of premature convergence systematically. The adaptive fitness consists
of the usual fitness together with the entropy change due to the environment, which
may vary from generation to generation.
Many genetic operators, such as transposition, host–parasite interaction, and gene-
regulatory networks, have been applied to EAs from biological inspirations. Host–
parasite methods are based on the coevolution of two different populations, acting
as parasite and host, respectively. The parasites usually encode the problem domain,
and the hosts encode the solution to the problem [45,74].
Bacterial Operators
Bacteria usually reproduce asexually. The bacterial mutation operation optimizes
the chromosome of one bacterium. There are three main types of gene transfer
mechanisms for bacterial populations: transformation, transduction, and conjugation
[73]. Transformation is a natural phenomenon resulting from the uptake by a recipient
bacterium of a DNA fragment from the environment which can be incorporated to

the recipient chromosome. Transduction involves the transfer of genes from a donor
bacterium to a recipient one by a bacteriophage, namely, a virus whose hosts are
bacteria. In contrast with transduction, in conjugation, the absence of a bacteriophage
requires a direct physical contact between the donor bacterium and the recipient one.
Gene transfer operation [70] allows the transfer of a segment between the bacteria
in the population. Bacterial EA [70] substitutes the classical crossover with the gene
transfer operation. Each bacterium represents a solution for the original problem. A
segment of a bacterium is transferred to a destination bacterium, and those genes in
the destination bacterium that appear in the segment from the source bacterium are
removed after the transfer.
Based on a microbial tournament, microbial GA [39] is a minimal steady-state GA
implementation. Thus, once two parent chromosomes are chosen at random from a
population, the winner is unchanged, while the loser or less fit chromosome is infected
by a copy of a segment of the winner’s chromosome and further mutated. This form of
recombination is inspired in bacterial conjugation. A conjugation operator simulating
the genetic mechanism exhibited by bacterial colonies is introduced in [73].
Jumping-Gene Transposition
The jumping-gene (transposon) phenomenon is the gene transposition in the genome
that was first discovered in maize plants. The jumping genes can move around
the genome in two ways: cut-and-paste transposition and copy-and-paste (replicative)
transposition. Cut-and-paste cuts a piece of DNA and pastes it somewhere else. Copy-
and-paste means that the genes remain at the same location while the message in
the DNA is copied into RNA and then copied back into DNA at another place in the
genome. The jump of genes enables a transposition of gene(s) to be induced in the
same chromosome or even to other chromosomes.
Transposition operator [11,39,84] is a genetic operator that mimics the jumping-
gene phenomenon. It enables the gene mobility within the same chromosome, or
even to a different chromosome. Transposons resemble computer viruses: They are
the autonomous programs, which are transmissible from one site to another on the
same or another chromosome, or from parent to offspring in the reproduction process.
These autonomous parasitic programs cooperate with the host genetic programs, thus
realizing process of self-replication.
Crossover for Variable-Length GAs
The complexity of the human genome was not obtained at the beginning of evolution,
but rather it is generally believed that life started off from simple form and gradually
incremented its organism complexity through evolution. Variable-length GAs operate
within a variable parameter space. Consequently, they are usually applied to design
problems, where the phenotype can have a variable number of components and the
problem is incremental in nature.
Messy GA [34] utilizes a variable-length representation. In messy GA, the
crossover operator is implemented by cutting and splicing. Each parent genome
is first cut into two strings at a random point, obtaining four strings. The strings are

then spliced in a random order. Although messy GA uses a variable-length repre-
sentation, it is in fact based on a fixed-length scheme, since genes in messy GA
contain both a value and a tag that specifies the position or locus of that value in a
fixed-length genome.
Speciation adaptation GA (SAGA) cross [38], virtual virus (VIV) crossover
algorithm [7], and synapsing variable-length crossover (SVLC) algorithm [50] are
three biologically inspired methods for performing meaningful crossover between
variable-length genomes. They conduct recombination of parent genomes by exchang-
ing sections with good similarity.
SAGA cross [38] defines the longest common subsequence (LCSS) as the metric
for the sequence similarity of the two parent genomes. For each random crossover
point on the first genome, only the crossover point(s) with the highest LCSS score
are eligible as a crossover point for the second genome.
VIV crossover algorithm [7] is also based on the sequence similarity between
parent genomes. VIV adopts the standard four letter alphabet, {A, T, C, G}, and a
genome is a sequence of the four symbols. For modeling the effect of recombination
in viruses, VIV adopts a biologically plausible crossover operator called homologous
1-point crossover, in which the probability of crossover is controlled by the degree
of local similarity between two parents within a specified fixed size window. As in
the SAGA cross, a random crossover point is initially chosen on one of the parent
genomes. The algorithm compares a window of bases from the selected point with
all possible windows of the same size on the other parent genome. The genomes are
then crossed within the matched window.
SVLC algorithm [50] also uses the LCSS similarity metric for variable-length
genomes, and this creates the possibility of using speciation or niche formation
techniques in variable-length GAs. SVLC uses both parent strings as a template.
This preserves any common sequences, allowing only differences to be exchanged,
thereby producing complete child genomes which possess the common parental
sequence and any recombined differences between the parent genomes.

3.8 Exploitation Versus Exploration

For EAs, two fundamental processes that drive the evolution of a population are the
exploration process and the exploitation process. Exploitation means taking advan-
tage of the information already obtained, while exploration means searching differ-
ent regions of the search space. Exploitation is achieved by the selection procedure,
while exploration is achieved by genetic operators, which preserve genetic diversity
in the population. The two objectives are conflicting: increasing the selective pres-
sure leads to decreasing diversity, while keeping the diversity can result in delayed
convergence.
GA often converges rather prematurely before the optimal solution is found. To
prevent premature convergence, an appropriate diversity in the population has to be
maintained. Otherwise, the entire population tends to be very similar, and crossover

will be useless and GA reduces to parallel mutation climbing. The trade-off between
exploitation (convergence) and exploration (diversity) controls the performance of
GA and is determined by the choice of the control parameters, namely, the probability
of crossover Pc , the probability of mutation Pm , and the population size N P . Some
trade-offs are made for selecting the optimal control parameters:

• Increasing Pc results in fast exploration at the price of increasing the disruption of


good strings.
• Increasing Pm tends to transform genetic search into a random search, while it
helps reintroduce lost alleles into the population.
• Increasing N P increases the genetic diversity in the population and reduces the
probability of premature convergence, at the price of an increased time of conver-
gence.

These control parameters depend on one another, and their choices depend on the
nature of the problem. In GA practice, for small N P one can select relatively large Pm
and Pc , while for large N P smaller Pc and Pm are desirable. Empirical results show
that GA with N P = 20 – 30, Pc = 0.75 – 0.95, and Pm = 0.005 – 0.01 performs
well [80]. When crossover is not used, GA can start with large Pm , decreasing toward
the end of the run. In [66], the optimal Pm is analytically derived as Pm = 1/L for
a string length L.
It is concluded from a systematic benchmark investigation on the seven parameters
of GA in [64] that crossover most significantly influenced the success of GA, followed
by mutation rate and population size and then by rerandomization point and elite
strategy. Selection method and the representation precision for numerical values had
least influence.
Adapting Control Parameters
Adaptation of control parameters is necessary for the best search process. At the
beginning of a search process, GA should have more emphasis on exploration, while
at a later stage more emphasis should be on exploitation.
Increasing Pm and Pc promotes exploration at the expense of exploitation. A
simple method to adapt Pm is implemented by linearly decreasing Pm with the
number of generations, t. Pm can also be modified by [44]
$$P_m(t) = \frac{\alpha_0\, e^{-\gamma_0 t/2}}{N_P \sqrt{l}}, \qquad (3.7)$$
where the constants α0 > 0, γ0 ≥ 0, and l is the length of the chromosome. In [80],
α0 = 1.76 and γ0 = 0.
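A direct implementation of (3.7) is a one-liner in MATLAB; the sample values of $N_P$ and l below are illustrative.

% Adaptive mutation rate of (3.7).
Pm = @(t, NP, l, alpha0, gamma0) alpha0*exp(-gamma0*t/2)/(NP*sqrt(l));
Pm(1, 30, 20, 1.76, 0)                  % about 0.013 for NP = 30, l = 20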
In [87], a fitness-based rule is used to assign mutation and recombination rates,
with higher rates being assigned to those genotypes that are most different in fitness
from the fittest individual in the population. This results in a reduced probability of
crossover for the best solutions available in an attempt to protect them. When all
the individuals in the population are very similar, the exploration drive will be lost.
Rank GA [9] is obtained by assigning the mutation rate through a ranking of the

population by fitness. This protects only the current maximal fitness found, while
the rest perform random walks with different step sizes. The worst individuals will
undergo the most changes.
Dynamic control of GA parameters can be based on fuzzy logic techniques [40,
41,57]. In [57], the population sizes, and crossover and mutation rates are determined
from average and maximum fitness values and differentials of the fitness value by
fuzzy reasoning.
Controlling Diversity
The genetic diversity of the population can be easily improved so as to prevent
premature convergence by adapting the size of the population [1,34] and using partial
restart [23]. Partial restart is a simple approach to maintain genetic diversity [23]. It
can be implemented by a fixed restart schedule at a fixed number of generations, or
implemented when premature convergence occurs.
Periodic population reinitialization can increase the diversity of the population.
One methodology combining the effects of the two strategies is saw-tooth GA [54],
which follows a saw-tooth population scheme with a specific amplitude and period
of variation.
Duplicate removal can enhance the diversity substantially. The uniqueness opera-
tor [63] allows a child to be inserted into the population only if its Hamming distance
to all members of the population is greater than a threshold. Analysis of an EA with
N P > 1 using uniform bit mutation but no crossover [28] shows that the duplicate
removal method changes the time complexity of optimizing a plateau function from
exponential to polynomial. Each child is required to compare with all the solutions
in the current population.
Diversity-guided EA [94] uses the distance-to-average-point measure to alternate
between phases of exploration (mutation) and phases of exploitation (recombination
and selection). The diversity-guided EA has shown remarkable results not only in
terms of fitness, but also in terms of saving a substantial amount of fitness evaluations
compared to simple EA.
Since the selection operator has a tendency to reduce the population variance,
population variance can be increased by the variation operator to maintain adequate
diversity in the population. A variation operator [5] is a combination of the recombi-
nation and the mutation operator. For a variation operator, population mean decision
variable vector should remain the same before and after the variation operator.
Varying Population Size
Population sizing schemes for EAs may rely on the population sizing theory [60],
or include the concepts of age, lifetime, and competition among species for limited
resources. In [51], a thorough analysis of the role of the offspring population size
in an EA is presented using a simplified, but still realistic EA. The result suggests a
simple way to dynamically adapt this parameter when necessary.
Messy GA [34] starts with a large initial population and halves it at regular intervals
during the primordial stage. In the primordial stage only a selection operation is
applied. This helps the population to get enriched with good building blocks. Fast
messy GA [35] is an improved version of messy GA.
GENITOR [96] employs an elitist selection that is a deterministic, rank-based
selection method so that the best N P individuals found so far are preserved by
using a crossgenerational competition. Crossover produces only one offspring that
immediately enters the population. Offspring do not replace their parents, except for
those least-fit individuals in the population. This selection strategy is similar to the
(λ + μ) strategy of ES.
CHC algorithm [23] stands for crossgenerational elitist selection, heterogeneous
recombination, and cataclysmic mutation. Like GENITOR, it also borrows from the
(λ + μ) strategy of ES. Incest prevention is introduced so that similar individuals
are prevented from mating. Half-uniform crossover is applied, and mutation is not
performed. Diversity is reintroduced by restarting partial population whenever con-
vergence is detected. This is implemented by randomly flipping a fixed proportion
of the best individual found so far as template, and introducing the better offspring
into the population.
Parameterless population pyramid [36] is an efficient, general, parameterless
evolutionary approach without user-specified parameters. It replaces the genera-
tional model with a pyramid of multiple populations that are iteratively created and
expanded. The approach scales to the difficulty of the problem when combined with
local search, advanced crossover, and addition of diversity.
Aging
Aging provides a mechanism to make room for the development of the next genera-
tion. Aging is a general mechanism to increase genetic diversity. An optimal lifespan
plays an important role in improving the effectiveness of evolution. For intelligent
species which are able to learn from experience, aging avoids excessive experience
accumulation of older individuals to avoid their being always the superior competi-
tors.
Aging is often used by assigning age 0 to each new offspring. The age is increased
by 1 in each generation. In selection for replacement the age is taken into account:
Search points exceeding a predefined maximal age are removed from the collection
of search points.
GA with varying population size [1] does not use any variation of selection mech-
anism discussed earlier, but introduces the concept of age of a chromosome in the
number of generations.
In cohort GA [48], a string of high fitness produces offspring quickly, while a
string of low fitness may have to wait a long time before reproducing. All strings
can have the same number of offspring, say two, at the time they reproduce. To
implement this delayed-reproduction idea, the population of cohort GA is divided
into an ordered set of nonoverlapping subpopulations called cohorts. Reproduction
is carried out by cycling through the cohorts in the given order.
Figure 3.4 The evolution of a random run of simple GA: the maximum and average objectives

Example 3.2: In this example, we apply simple GA to solve optimization of Rosen-
brock function plotted in Example 1.1. The domain is x1 , x2 ∈ [−2048, 2048].
For simple GA without elite strategy, the size of population is 100, and the rep-
resentation for each variable is 30-bit Gray coding. Single-point crossover with
Pc = 0.98 is applied, and Pm = 0.02. The selection scheme is the roulette-wheel
selection. In each generation, 90 % of the population is newly generated. The evo-
lution for 300 generations for a typical random run is shown in Figure 3.4. At
the end of the 300th generation, the best solution is x1 = 1.4171, x2 = 1.9413,
and f = 0.6198. The best solution is present at the end of the 279th generation,
x1 = 0.8060, x2 = 0.6409, and f = 0.6198. The global minimum is f ∗ = 0 at
x1 = 1, x2 = 1.
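As a minimal sketch, the Gray coding used above can be implemented with a running XOR; the short bit string below is hypothetical and stands for one coded variable.

b = [1 0 1 1 0 1];                     % hypothetical binary segment (MSB first)
g = xor(b, [0, b(1:end-1)]);           % binary -> Gray: g(k) = b(k) XOR b(k-1)
b2 = g;                                % Gray -> binary: running XOR
for k = 2:numel(g)
    b2(k) = xor(b2(k-1), g(k));
end
isequal(b, double(b2))                 % true: decoding inverts encoding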

3.9 Two-Dimensional Genetic Algorithms

Under the scenario of two-dimensional problems such as image processing, con-
ventional GAs as well as EAs cannot be applied in a natural way, since linear
encoding causes a loss of two-dimensional correlations. Encoding an image into
a one-dimensional string increases the conceptual distance between the search space
and its representation, and thus introduces extra problem-specific operators. If an
image is encoded by concatenating horizontal lines, crossover operations result in a
large vertical disruption. In two-dimensional GAs [8,13], each individual is a two-
dimensional binary string.
In two-dimensional GA, mutation and reproduction operators can be applied in the
normal way, but two-point crossover operator samples the matrix elements in a two-
dimensional string very unevenly. Genetic operators for two-dimensional strings are
also defined, such as a crossover operator that exchanges rectangular blocks between
pairs of matrices [13], and an unbiased crossover operator called UNBLOX (uniform
block crossover) [8]. UNBLOX is a two-dimensional wraparound crossover and can
sample all the matrix positions equally. The convergence rates of two-dimensional
GAs are higher than that of simple GA for bitmaps [8].
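As a minimal sketch (a simplified rectangular-block exchange, without the wraparound used by UNBLOX), a two-dimensional crossover between two binary matrices can be written as:

A = randi([0 1], 8, 8);  B = randi([0 1], 8, 8);     % two parent bitmaps
r = sort(randi(8, 1, 2));  c = sort(randi(8, 1, 2)); % random block corners
blk = A(r(1):r(2), c(1):c(2));                       % exchange the block between the parents
A(r(1):r(2), c(1):c(2)) = B(r(1):r(2), c(1):c(2));
B(r(1):r(2), c(1):c(2)) = blk;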

3.10 Real-Coded Genetic Algorithms

Although GA is conventionally based on binary coding, for numerical optimization,
parameters are usually real numbers. The floating-point and fixed-point coding tech-
niques are two methods for representing real numbers. Fixed-point coding allows
more gradual mutation than floating-point coding for the change of a single bit, and
fixed-point coding is sufficient for most cases. Floating-point coding is widely used
in continuous numerical optimization. Real-coded GA has an advantage over binary-
coded GA in exploiting local continuities in function optimization. It is faster, more
consistent from run to run, and provides a higher precision than binary-coded GA.
Accordingly, genetic operators for real-coded GA need to be defined.
Crossover
In analogy to crossover operators for binary-coded GA, crossover operators for real-
coded GA such as one-point, two-point, multipoint, and uniform crossover operators
are also defined [90].
Crossover can also be defined as a linear combination of two parent vectors x 1
and x 2 and generates two offspring
x1′ = λ x1 + (1 − λ) x2 ,   (3.8)
x2′ = λ x2 + (1 − λ) x1 ,   (3.9)
where 0 < λ < 1. If λ = 0.5 only one offspring is obtained [82].
Assume that x 2 is an individual better than x 1 . In order to generate offspring with
better fitness than their parents, crossover can be defined by extrapolation of the two
points representing the two parents [98]
x′ = λ (x2 − x1) + x2 ,   (3.10)
where 0 < λ < 1 is a random number. This crossover operator is suitable for local
fine-tuning and for searching in the most promising direction.
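As a minimal sketch of (3.8)–(3.10) for two real-coded parents x1 and x2, with x2 assumed to be the fitter one (the parent vectors below are hypothetical):

x1 = [0.3 1.2];  x2 = [0.9 0.7];        % hypothetical parent vectors
lambda = rand;                          % 0 < lambda < 1
c1 = lambda*x1 + (1-lambda)*x2;         % offspring of (3.8)
c2 = lambda*x2 + (1-lambda)*x1;         % offspring of (3.9)
c3 = lambda*(x2 - x1) + x2;             % extrapolated offspring of (3.10)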
Neighborhood-based real-parameter crossover operators [43] determine the genes
of the offspring extracting values from intervals defined on neighborhoods associ-
ated with the genes of the parents through probability distributions. BLX-α [24],
PNX [3] and fuzzy recombination [95] are based on uniform, normal, and triangular
probability distributions, respectively. PNX chooses all the genes of the same par-
ent to generate the offspring, thus it is a parent-centric crossover operator [30]. In
fuzzy recombination each gene of the offspring is generated in the neighborhood
of the corresponding gene of one of the parents, and thus fuzzy recombination is a
gene-centric crossover operator. BLX-α is a blend operator.
Generalized generation gap (G3) model is a steady-state, elite-preserving, scal-
able, and fast algorithm for real-parameter optimization [19]. Parent-centric recom-
bination (PCX) operator favors solutions close to parents. This approach with PCX
consistently and reliably outperforms other real-coded GAs with unimodal normal
distribution crossover and simplex crossover, correlated self-adaptive ES, CMA-ES,
and DE.
Crossover operators with multiple descendants have been presented in [42,79,
98] and these produce more than two offspring for each pair of parents. In this
case, an offspring selection strategy limits the number of offspring that will become
population members.
In [59], the crossover operator is defined as that which generates four chromo-
somes from two parents according to a strategy of combining the maximum, min-
imum, or average of all the parameters encoded in the chromosome. Only the one
with the largest fitness, denoted x′ , is used as the offspring of the crossover operation.
Traditional crossover operators are defined on two parents, as this is biologically
reasonable. Multiparent crossover operators combine the features of more than two
parents for generating the offspring [19]. Some multiparent crossover operators for
real-coded GAs are p-sexual coding recombination [65], bit-simulated crossover
[91], and simplex crossover [93].
In [29], mating index is introduced to allow different mating strategies to be
developed within a uniform framework: an exploitative strategy called best-first, an
explorative strategy called best-last, and a self-adaptive strategy to achieve a balance
between exploitation and exploration in a domain-independent manner.
In [15], the proposed parallel-structured real-coded GA integrates ranking selec-
tion, direction-based crossover, and dynamic random mutation. A coordinator is
embedded in the inner parallel loop to organize the operations of direction-based
crossover and dynamic random mutation. Direction-based crossover divides the pop-
ulation into N P /2 pairs according to fitness rankings, and then directly uses the rela-
tive fitness information of each pair of parents to conduct 2n 1 crossover directions for
exploring potential offspring chromosomes. Dynamic random mutation dynamically
adjusts the mutation size through successive generations.
Mutation
Mutation can be conducted by replacing one or more genes xi , i = 1, . . . , n, with
a random number xi′ from the domain of the corresponding parameter. The popular
uniform mutation substitutes the values of one or more randomly selected genes with
random values within their domain.
Gaussian mutation [82] is usually applied in real-coded GA. It adds a Gaussian
random number to one or multiple genes of the chromosome x and produces a new
offspring x′ with one or more genes defined by
xi′ = xi + N(0, σi ),   (3.11)
where N(0, σi ) is a random number drawn from a normal distribution with zero
mean and standard deviation σi , traditionally selected as a decreasing function, such
as σi (t) = 1/√(1 + t), with t corresponding to the number of generations.
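As a minimal sketch of Gaussian mutation (3.11) with the decreasing standard deviation σi(t) = 1/√(1 + t), mutating one randomly chosen gene of a hypothetical chromosome:

x = [0.5 -1.2 2.0];                     % hypothetical real-coded chromosome
t = 50;                                 % current generation
sigma = 1/sqrt(1 + t);                  % decreasing mutation strength
i = randi(numel(x));                    % gene to mutate
x(i) = x(i) + sigma*randn;              % add zero-mean Gaussian noise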
Cauchy mutation replaces Gaussian distribution by Cauchy distribution, and it
is more likely to generate an offspring further away from its parent than Gaussian
mutation due to the long flat tails of Cauchy distribution [99]. Cauchy mutation,
however, has a weaker fine-tuning capability than Gaussian mutation in small to
mid-range regions. Thus, Cauchy mutation performs better when the current search
point is far from the global minimum, while Gaussian mutation is better at finding a
local minimum in a good region.
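As a minimal sketch, a standard Cauchy variate can be generated by the inverse transform tan(π(u − 0.5)) with u uniform in (0, 1), so Cauchy mutation of one gene reads:

x = [0.5 -1.2 2.0];                     % hypothetical parent
gam = 1;                                % scale of the Cauchy distribution
i = randi(numel(x));
x(i) = x(i) + gam*tan(pi*(rand - 0.5)); % long-tailed Cauchy perturbation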
In [59], for a parent, three new offspring are generated by allowing one parameter,
some of the parameters, and all the parameters in the chromosome to change by a
randomly generated number, subject to constraints on each parameter. Only one
of the offspring will be used to replace the chromosome with the smallest fitness,
according to a predefined probability criterion that, as in SA, allows uphill move
in a controlled fashion. The probability of accepting a bad offspring is aimed at
reducing the chance of converging to a local optimum. Hence, the search domain is
significantly enlarged.
Backtracking search [16] is a real-coded GA for numerical optimization. It
employs a random mutation strategy that mutates all individuals in the direction of
the search-direction matrix (i.e., the difference of a previous population and the cur-
rent population), and a nonuniform and more complex crossover strategy. A memory
is used to store a randomly chosen previous population for generating the search-
direction matrix. The method has a single control parameter. It outperforms PSO,
CMA-ES, ABC, and DE on the benchmark.
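A hedged sketch of the mutation step of backtracking search, as we read [16]: the whole population P is moved along the search-direction matrix (oldP − P), where oldP is a permuted copy of a stored previous population and F = 3·randn is the amplitude reported in the original paper; the nonuniform crossover that follows is omitted here.

NP = 20;  n = 5;                        % hypothetical population size and dimension
P    = rand(NP, n);                     % current population (bounds [0, 1] assumed)
oldP = rand(NP, n);                     % stored historical population (memory)
oldP = oldP(randperm(NP), :);           % permute its individuals
F = 3*randn;                            % amplitude of the search direction
Mutant = P + F*(oldP - P);              % mutate all individuals along (oldP - P)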

Example 3.3: In this example, we solve optimization of Rosenbrock function given
in Example 1.1 by using real-coded GA. The domain is x1 , x2 ∈ [−2048, 2048].
The global optimum fitness is 0 at (1, 1).
For most numerical optimization problems, real coding can usually generate a
performance better than that of binary coding. Here we include the elitism strategy
in the real-coded GA realization. Our numerical test shows that a high mutation rate
can usually yield good results. We select N P = 100, Pc = 0.9, Pm = 0.9. Roulette-
wheel selection scheme is adopted. The crossover operator generates, by averaging
two parents, only one offspring. One-point mutation is employed. The mutation
operator rejects infeasible chromosomes that are beyond the domain. An annealing
variance σ = σ0 (1 − t/T ) + σ1 is selected for Gaussian mutation, where σ0 = 30, and
σ1 = 0.1, and t corresponds to the number of generations. Except for the chromosome
of the old generation with the largest fitness, all chromosomes in the old generation
are replaced by the new offspring.
The evolution for T = 300 generations for a typical random run is shown in
Figure 3.5. At the end of the 300th generation, the solution of a typical run is x1 =
0.9338, x2 = 0.8719, and f = 0.0044. Based on this random run, real-coded GA
typically leads to a performance better than that of simple GA.
Figure 3.5 The evolution of a random run of the real-coded GA with the elitism strategy: the
maximum and average objectives

Example 3.4: We revisit the optimization problem treated in Example 2.1:
min_x f(x) = − cos x1 cos x2 exp(−(x1 − π)² − (x2 − π)²),   x ∈ [−100, 100]².
The Easom function is plotted in Figure 2.2. The global minimum value is −1 at x =
(π, π)T . As we described in Example 2.1, this problem is hard, since this function
is similar to a needle-in-a-haystack function.
MATLAB Global Optimization Toolbox provides a GA solver ga. Using the
default parameter settings, ga solver can find the global optimum nine out of ten
runs, for the range [−100, 100]2 . The GA solver has the default settings: real-coded
GA, with scattered crossover, Gaussian mutation, elite strategy, an initial population
randomly selected in (0, 1), a population size of 20, and other parameters. We notice
that by using the default initial population, the solver always finds the global optimum
very rapidly. This is because all the initial individuals are very close to the global
optimum.
A fair evaluation of GA is to set the initial population randomly from the entire
domain. We select an initial population size of 40. For a random run, we have f (x) =
−1.0000 at (3.1416, 3.1414) with 2080 function evaluations. All the individuals
converge toward the global optimum. For 10 random runs, the solver converged 9
times for 50 generations. The evolution of a random run is illustrated in Figure 3.6.
Further, after restricting the search space to [−10, 10]², the solver can always find the
global optimum. In summary, we conclude that the ga solver in real-coded mode is
much more efficient than the SA solver simulannealbnd.
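A minimal sketch of the setup described above; the option names follow MATLAB's Global Optimization Toolbox, and the initial population range is widened to the whole domain for a fair evaluation.

easom = @(x) -cos(x(1))*cos(x(2))*exp(-(x(1)-pi)^2 - (x(2)-pi)^2);
lb = [-100 -100];  ub = [100 100];
opts = optimoptions('ga', 'PopulationSize', 40, ...
    'InitialPopulationRange', [lb; ub]);   % draw the initial population from the whole domain
[x, fval] = ga(easom, 2, [], [], [], [], lb, ub, [], opts)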
Figure 3.6 The evolution of a random run of the real-coded ga solver: the minimum and average objectives

3.11 Genetic Algorithms for Sequence Optimization

For sequence optimization problems such as scheduling and TSP, permutation encod-
ing is a natural representation for a set of symbols, and each symbol can be identified
by a distinct integer. This representation avoids missing or duplicate alleles [37].
Genetic operators should be defined so that infeasible solutions do not occur or
a way is viable for repairing or rejecting infeasible solutions. Genetic operators for
reordering a sequence of symbols can be unary operators such as inversion and swap,
or binary operators that combine features of inversion and crossover, such as partial
matched crossover, order crossover, and cycle crossover [31], edge recombination
[97], as well as intersection and union [26].
The random keys representation [4] encodes each symbol with a random key, a
real-valued number in the interval (0, 1). By sorting the random keys in a descending
or ascending order, we can get a
decoded solution. For example, assume that we are solving a TSP of 5 cities, with
the chromosome for a route encoded as (0.52, 0.40, 0.81, 0.90, 0.23). If the genes
are sorted in a descending order, the largest random key is 0.90, so the fourth city is
the beginning of the route, and the whole route can be 4 → 3 → 1 → 2 → 5. This
representation avoids infeasible offspring by representing solutions in a soft manner,
such that real-coded GA and the ES can be applied directly for sequence optimization
problems. The random keys representation is simple and robust, and always allows
simple crossover operations to generate feasible solutions. Ordering messy GA [53]
is specialized for solving sequence optimization problems. It uses the mechanics of
fast messy GA [35] and represents the solutions using random keys.
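As a minimal sketch, decoding the 5-city random-key chromosome given above only requires a descending sort of the keys; crossover can then act on the keys themselves, and any offspring decodes to a feasible tour.

keys = [0.52 0.40 0.81 0.90 0.23];      % one random key per city
[~, tour] = sort(keys, 'descend');      % tour = [4 3 1 2 5]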
Biased random key GA [22] is a variation of random keys GA, but differs in the
way crossover is performed. In biased random key GA, the population is divided
into a small elite subpopulation and a nonelite subpopulation. To generate the off-
spring, biased random key GA selects one parent from the elite subpopulation and
the other parent from the nonelite subpopulation. Thus the offspring has a higher
probability of inheriting the keys of its elite parent.
Coding Spanning Trees
Many combinatorial problems seek solutions that either are or are derived from
spanning trees. For the minimum spanning tree (MST) problem, polynomial time
algorithms exist for identifying an optimal solution. Other problems, such as the
optimal communications spanning tree problem and the degree-constrained MST
problem have been shown to be NP-hard.
The concept of random keys [4] has been transferred from scheduling and ordering
problems to the encoding of trees. A tree is an undirected, fully connected graph with
no cycles. One of the most common representation schemes for networks is the char-
acteristic vector representation. Simple GAs with network random keys (NetKeys)
significantly outperform their counterparts using characteristic vectors and are much
faster for solving complex tree problems [77]. For NetKeys [77], a chromosome
assigns to each edge on the network a rating of its importance, which is referred to as
a weight, a real number in [0, 1]. A spanning tree is decoded from the chromosome
by adding edges from the network to an initially empty graph in order of importance,
ignoring edges that introduce cycles. Once n − 1 edges have been added, a span-
ning tree has been identified. NetKeys has high computational complexity. Since
the chromosome has length e = |E|, E being the set of edges, the time required for
crossover and mutation is O(e). Decoding is even more complex, since it requires
to identify an MST on the problem network.
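As a minimal sketch of the NetKeys decoding described above (the small network below is hypothetical, and a simple component-label array stands in for a union–find structure):

n = 4;                                  % number of nodes
E = [1 2; 1 3; 1 4; 2 3; 2 4; 3 4];     % hypothetical edge list of the network
w = rand(size(E,1), 1);                 % NetKeys chromosome: one weight per edge
[~, order] = sort(w, 'descend');        % consider edges in order of importance
comp = 1:n;  tree = zeros(0, 2);
for e = order'
    u = E(e,1);  v = E(e,2);
    if comp(u) ~= comp(v)               % skip edges that would introduce a cycle
        tree = [tree; u v];             %#ok<AGROW>
        comp(comp == comp(v)) = comp(u);% merge the two components
    end
    if size(tree,1) == n-1, break; end  % stop once n-1 edges form a spanning tree
end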
With a direct tree representation, the identity of all n − 1 edges in the spanning
tree can be identified directly from its chromosome. One example is the predecessor
code [71]. A node is designated as the root node of the tree and, for each of the other
nodes, the immediate predecessor pi on the path from the root to node i is recorded. A
spanning tree T = (V, E) is encoded as the vector P = { p1 , p2 , . . . , pn−1 }, where
(i, pi ) ∈ E and node n is designated as the root node. Although the code does not exclu-
sively encode spanning trees, it does ensure that each node belongs to at least one
edge and that no edge is represented more than twice.
The Dandelion code [92] represents each tree on n vertices as a string of (n − 2)
integers from the set [1, n]. The implementation of the Dandelion mapping has
O(n) complexity. Although the direct tree coding, which exhibits perfect heritability,
achieves the best results in the fewest generations, with NetKeys being a close second,
the Dandelion code is a strong alternative, particularly for very large networks, since
the Dandelion code is computationally the most efficient coding scheme for spanning
trees and locality seems to improve as the problem size increases. The decoding and
encoding algorithms for the Dandelion code may both be implemented in O(n) time
[72], and the locality is high.
Figure 3.7 The TSP with 30 randomly generated cities
Example 3.5: Consider the TSP for 30 randomly generated cities in the United States,
plotted in Figure 3.7.
When using the GA solver, city sequence is coded as a custom data type, and
the corresponding creation function, crossover function, and mutation function are
provided in the MATLAB Global Optimization Toolbox. We set the population size
as 50 and the number of generations as 400. The initial solutions are randomly
selected as a sequence of all the cities.
The evolution of a random run is illustrated in Figure 3.8. The final optimal route
length is 4.096 obtained at the 391st generation, with 19600 fitness evaluations.

Problems

3.1 What will happen if pc = pm = 1 in simple GA?


3.2 For a GA individual “10011010”, what are the chromosome, genotypes, and
phenotypes for this individual?
3.3 Show the first 3 generations of GA for optimizing y = x³ − 5 in the interval
0 ≤ x ≤ 10. The initial population is assumed to be {1101, 1011, 1010, 0101}.
3.4 For the TSP, design a crossover operator that preserves the TSP constraint that
each city is visited exactly once, so that each offspring is a valid tour.
3.5 Implement BLX-0.5 and simplex crossover in your programming environment.
3.6 Select and implement at least one method of deterministic, adaptive, and self-
adaptive control over pm or pc using a benchmark problem.
3.7 Implement simple GA from scratch in MATLAB language to repeat the solution
process of problem 3.3.
Figure 3.8 The GA evolution of the TSP: a the optimal solution, b minimum and average route
lengths

3.8 Suppose that we use binary GA to find x to a resolution of 0.01, in order
to minimize the two-dimensional Rastrigin function on the domain [−1, +1].
How many bits are needed to encode each chromosome?
3.9 Assume that we have four individuals in a population. The fitness values are
1, 3, 5, 7.
(1) What are the selection probabilities of each individual for one spin of a
roulette wheel?
(2) What if we use rank-based selection?
3.10 Gray codes are not unique. Give two alternative gray codings of the numbers
0–15.
3.11 Given a GA with 5 individuals xi , i = 1, . . . , 5, and the fitness of xi is f (xi ) =
i. Roulette wheel selection is used to select 4 parents for crossover. The first two
produce two offspring, and the next two produce two more offspring. What is
the probability that the most fit individual mates with itself at least once to produce
two cloned offspring?
3.12 For binary GA with population size N and mutation rate pm , and chromosome
length n bits, what is the probability that no bit is mutated in the entire
population in one generation?
3.13 Explain why the crossover operators defined for binary GA are not suitable for
real-coded GA.
3.14 Implement a fitness transformation for maximizing a function with negative
objective values.
3.15 Implement the ga solver for solving a benchmark function using both the
binary mode and the real-coded mode.
3.16 Implement the ga solver for solving the knapsack problem in the Appendix.

References
1. Arabas J, Michalewicz Z, Mulawka J. GAVaPS—a genetic algorithm with varying population
size. In: Proceedings of the 1st IEEE international conference on evolutionary computation,
Orlando, FL, USA, June 1994. p. 73–78.
2. Araujo L, Merelo JJ. Diversity through multiculturality: assessing migrant choice policies in
an island model. IEEE Trans Evol Comput. 2011;15(4):456–69.
3. Ballester PJ, Carter JN. An effective real-parameter genetic algorithm with parent centric
normal crossover for multimodal optimisation. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Seattle, WA, USA, June 2004. p. 901–913.
4. Bean J. Genetic algorithms and random keys for sequence and optimization. ORSA J Comput.
1994;6(2):154–60.
5. Beyer H-G, Deb K. On self-adaptive features in real-parameter evolutionary algorithms. IEEE
Trans Evol Comput. 2001;5(3):250–70.
6. Bhandari D, Pal NR, Pal SK. Directed mutation in genetic algorithms. Inf Sci. 1994;79:251–
70.
7. Burke DS, De Jong KA, Grefenstette JJ, Ramsey CL, Wu AS. Putting more genetics into
genetic algorithms. Evol Comput. 1998;6(4):387–410.
8. Cartwright HM, Harris SP. The application of the genetic algorithm to two-dimensional
strings: the source apportionment problem. In: Forrest S, editor, Proceedings of the 5th inter-
national conference on genetic algorithms, Urbana-Champaign, IL, USA, June 1993. San
Mateo, CA: Morgan Kaufmann; 1993. p. 631.
9. Cervantes J, Stephens CR. Limitations of existing mutation rate heuristics and how a rank
GA overcomes them. IEEE Trans Evol Comput. 2009;13(2):369–97.
10. Chakraborty UK, Janikow CZ. An analysis of Gray versus binary encoding in genetic search.
Inf Sci. 2000;156:253–69.
11. Chan TM, Man KF, Kwong S, Tang KS. A jumping gene paradigm for evolutionary multiob-
jective optimization. IEEE Trans Evol Comput. 2008;12(2):143–59.
12. Chen H, Flann NS, Watson DW. Parallel genetic simulated annealing: a massively parallel
SIMD algorithm. IEEE Trans Parallel Distrib Syst. 1998;9(2):126–36.
13. Cherkauer KJ. Genetic search for nearest-neighbor exemplars. In: Proceedings of the 4th
midwest artificial intelligence and cognitive science society conference, Utica, IL, USA,
1992. p. 87–91.
14. Chicano F, Sutton AM, Whitley LD, Alba E. Fitness probability distribution of bit-flip muta-
tion. Evol Comput. 2015;23(2):217–48.
15. Chuang Y-C, Chen C-T, Hwang C. A real-coded genetic algorithm with a direction-based
crossover operator. Inf Sci. 2015;305:320–48.
16. Civicioglu P. Backtracking search optimization algorithm for numerical optimization prob-
lems. Appl Math Comput. 2013;219:8121–44.
17. Davis L. Bit-climbing, representational bias, and test suite design. In: Proceedings of the 4th
international conference on genetic algorithms, San Diego, CA, USA, July 1991. San Mateo,
CA: Morgan Kaufmann; 1991. p. 18–23.
18. Davis L, Grefenstette JJ. Concerning GENESIS and OOGA. In: Davis L, editor. Handbook
of genetic algorithms. New York: Van Nostrand Reinhold; 1991. p. 374–377.
19. Deb K, Anand A, Joshi D. A computationally efficient evolutionary algorithm for real-
parameter optimization. Evol Comput. 2002;10(4):371–95.
20. De Jong K. An analysis of the behavior of a class of genetic adaptive systems. PhD Thesis,
University of Michigan, Ann Arbor, MI, USA, 1975.
21. Drugan MM, Thierens D. Recombination operators and selection strategies for evolutionary
Markov Chain Monte Carlo algorithms. Evol Intel. 2010;3(2):79–101.
22. Ericsson M, Resende MGC, Pardalos PM. A genetic algorithm for the weight setting problem
in OSPF routing. J Comb Optim. 2002;6:299–333.
23. Eshelman LJ. The CHC adaptive search algorithm: How to have safe search when engaging
in nontraditional genetic recombination. In: Rawlins GJE, editor. Foundations of genetic
algorithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 265–283.
24. Eshelman LJ, Schaffer JD. Real-coded genetic algorithms and interval-schemata. In: Whitley
LD, editor, Foundations of genetic algorithms 2. San Mateo, CA: Morgan Kaufmann; 1993.
p. 187–202.
25. Fogel L, Owens J, Walsh M. Artificial intelligence through simulated evolution. New York:
Wiley; 1966.
26. Fox BR, McMahon MB. Genetic operators for sequencing problems. In: Rawlins GJE, editor.
Foundations of genetic algorithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 284–300.
27. Frantz DR. Non-linearities in Genetic Adaptive Search. PhD Thesis, University of Michigan,
Ann Arbor, MI, USA, 1972.
28. Friedrich T, Hebbinghaus N, Neumann F. Rigorous analyses of simple diversity mechanisms.
In: Proceedings of genetic and evolutionary computation conference (GECCO), London, UK,
July 2007. p. 1219–1225.
29. Galan SF, Mengshoel OJ, Pinter R. A novel mating approach for genetic algorithms. Evol
Comput. 2012;21(2):197–229.
30. Garcia-Martinez C, Lozano M, Herrera F, Molina D, Sanchez AM. Global and local real-
coded genetic algorithms based on parent-centric crossover operators. Eur J Oper Res.
2008;185:1088–113.
31. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading,
MA, USA: Addison-Wesley; 1989.
32. Goldberg D. A note on Boltzmann tournament selection for genetic algorithms and population-
oriented simulated annealing. Complex Syst. 1990;4(4):445–60.
33. Goldberg DE, Deb K. A comparative analysis of selection schemes used in genetic algo-
rithms. In: Rawlins GJE, editor. Foundations of genetic algorithms. San Mateo, CA: Morgan
Kaufmann; 1991. p. 69–93.
34. Goldberg DE, Deb K, Korb B. Messy genetic algorithms: motivation, analysis, and first results.
Complex Syst. 1989;3:493–530.
35. Goldberg DE, Deb K, Kargupta H, Harik G. Rapid, accurate optimization of difficult problems
using fast messy genetic algorithms. In: Proceedings of the 5th international conference on
genetic algorithms, Urbana-Champaign, IL, USA, June 1993. p. 56–64.
36. Goldman BW, Punch WF. Fast and efficient black box optimization using the parameter-less
population pyramid. Evol Comput. 2015;23(2):451–79.
37. Grefenstette JJ, Gopal R, Rosmaita BJ, Gucht DV. Genetic algorithms for the traveling sales-
man problem. In: Proceedings of the 1st international conference on genetic algorithms and
their applications, Pittsburgh, PA, USA, July 1985. Mahwah, NJ: Lawrence Erlbaum Asso-
ciates; 1985. p. 160–168.
38. Harvey I. The SAGA cross: the mechanics of crossover for variable-length genetic algorithms.
In: Proceedings of the 2nd conference on parallel problem solving from nature (PPSN II),
Brussels, Belgium, Sept 1992. Amsterdam, The Netherlands: North Holland; 1992. p. 269–
278.
39. Harvey I. The microbial genetic algorithm. In: Proceedings of 10th european conference on
advances in artificial life: Darwin meets von Neumann, Budapest, Hungary, Sept 2009, Part
II, p. 126–133.
40. Herrera F, Lozano M. Adaptation of genetic algorithm parameters based on fuzzy logic con-
trollers. In: Herrera F, Verdegay JL, editors. Genetic algorithms and soft computing. Berlin:
Physica-Verlag; 1996. p. 95–125.
41. Herrera F, Lozano M. Fuzzy adaptive genetic algorithms: design, taxonomy, and future direc-
tions. Soft Comput. 2003;7:545–62.
42. Herrera F, Lozano M, Verdegay JL. Fuzzy connectives based crossover operators to model
genetic algorithms population diversity. Fuzzy Sets Syst. 1997;92(1):21–30.
43. Herrera F, Lozano M, Sanchez AM. A taxonomy for the crossover operator for real-coded
genetic algorithms: an experimental study. Int J Intell Syst. 2003;18(3):309–38.
44. Hesser J, Manner R. Towards an optimal mutation probability for genetic algorithms. In:
Proceedings of the 1st workshop on parallel problem solving from nature (PPSN I), Dortmund,
Germany, Oct 1990. p. 23–32.
45. Hillis WD. Co-evolving parasites improve simulated evolution as an optimization procedure.
Physica D. 1990;42:228–34.
46. Holland JH. Outline for a logical theory of adaptive systems. J ACM. 1962;9(3):297–314.
47. Holland J. Adaptation in natural and artificial systems. Ann Arbor, Michigan: University of
Michigan Press; 1975.
48. Holland JH. Building blocks, cohort genetic algorithms and hyperplane-defined functions.
Evol Comput. 2000;8(4):373–91.
49. Hutter M, Legg S. Fitness uniform optimization. IEEE Trans Evol Comput. 2006;10(5):568–
89.
50. Hutt B, Warwick K. Synapsing variable-length crossover: meaningful crossover for variable-
length genomes. IEEE Trans Evol Comput. 2007;11(1):118–31.
51. Jansen T, De Jong KA, Wegener I. On the choice of the offspring population size in evolu-
tionary algorithms. Evol Comput. 2005;13(4):413–40.
52. Khatib W, Fleming PJ. The stud GA: a mini revolution? In: Eiben A, Back T, Schoenauer
M, Schwefel H, editors. Proceedings of the 5th international conference on parallel problem
solving from nature (PPSN V). Amsterdam: The Netherlands; 1998. p. 683–691.
53. Knjazew D, Goldberg DE. OMEGA—Ordering messy GA: Solving permutation problems
with the fast messy genetic algorithm and random keys. In: Proceedings of genetic and evo-
lutionary computation conference (GECCO), Las Vegas, NV, USA, July 2000. p. 181–188.
54. Koumousis VK, Katsaras CP. A saw-tooth genetic algorithm combining the effects of vari-
able population size and reinitialization to enhance performance. IEEE Trans Evol Comput.
2006;10(1):19–28.
55. Koza JR. Genetic programming: On the programming of computers by means of natural
selection. Cambridge, MA: MIT Press; 1992.
56. Laskey KB, Myers JW. Population Markov chain Monte Carlo. Mach Learn. 2003;50:175–96.
57. Lee MA, Takagi H. Dynamic control of genetic algorithms using fuzzy logic techniques. In:
Proceedings of the 5th international conference on genetic algorithms (ICGA’93), Urbana,
IL, USA, July 1993. p. 76–83.
58. Lee CY. Entropy-Boltzmann selection in the genetic algorithms. IEEE Trans Syst Man Cybern
Part B. 2003;33(1):138–42.
59. Leung FHF, Lam HK, Ling SH, Tam PKS. Tuning of the structure and parameters of a neural
network using an improved genetic algorithm. IEEE Trans Neural Networks. 2003;14(1):79–
88.
60. Lobo FG, Lima CF. A review of adaptive population sizing schemes in genetic algorithms.
In: Proceedings of genetic and evolutionary computation conference (GECCO), Washington,
DC, USA, June 2005. p. 228–234.
61. Mathias K, Whitley LD. Changing representations during search: a comparative study of delta
coding. Evol Comput. 1995;2(3):249–78.
62. Matsui K. New selection method to improve the population diversity in genetic algorithms.
In: Proceedings of the 1999 IEEE International conference on systems, man, and cybernetics,
Tokyo, Japan, Oct 1999. p. 625–630.
63. Mauldin ML. Maintaining diversity in genetic search. In: Proceedings of the 4th national
conference on artificial intelligence (AAAI-84), Austin, TX, USA, Aug 1984. p. 247–250.
64. Mills KL, Filliben JJ, Haines AL. Determining relative importance and effective settings for
genetic algorithm control parameters. Evol Comput. 2015;23(2):309–42.
65. Muhlenbein H. Parallel genetic algorithms, population genetics and combinatorial optimiza-
tion. In: Proceedings of the 3rd international conference on genetic algorithms, Fairfax, VA,
USA, June 1989. San Mateo, CA: Morgan Kaufman; 1989. p. 416–421.
66. Muhlenbein H. How genetic algorithms really work: mutation and hill climbing. In: Manner
R, Manderick B, editors. Proceedings of the 2nd conference on parallel problem solving
from nature (PPSN II), Brussels, Belgium, Sept 1992. Amsterdam, The Netherlands: North
Holland; 1992. pp. 15–25.
67. Muhlenbein H, Paab G. From recombination of genes to the estimation of distributions. I.
Binary parameters. In: Proceedings of the 4th International conference on parallel problem
solving from nature (PPSN IV), Berlin, Germany, Sept 1996. p. 178–187.
68. Muhlenbein H, Schlierkamp-Voosen D. Predictive models for the breeder genetic algorithm:
continuous parameter optimization. Evol Comput. 1994;1(4):25–49.
69. Muhlenbein H, Schlierkamp-Voosen D. Analysis of selection, mutation and recombination in
genetic algorithms. In: Banzhaf W, Eeckman FH, editors. Evolution and biocomputation:
computational models of evolution. Berlin: Springer; 1995. p. 142–68.
70. Nawa NE, Furuhashi T. Fuzzy systems parameters discovery by bacterial evolutionary algo-
rithms. IEEE Trans Fuzzy Syst. 1999;7:608–16.
71. Palmer CC, Kershenbaum A. An approach to a problem in network design using genetic
algorithms. Networks. 1995;26:151–63.
72. Paulden T, Smith DK. From the Dandelion code to the Rainbow code: a class of bijective
spanning tree representations with linear complexity and bounded locality. IEEE Trans Evol
Comput. 2006;10(2):108–23.
73. Perales-Gravan C, Lahoz-Beltra R. An AM radio receiver designed with a genetic algorithm
based on a bacterial conjugation genetic operator. IEEE Trans Evol Comput. 2008;12(2):129–
42.
74. Potter MA, De Jong KA. Cooperative coevolution: an architecture for evolving coadapted
subcomponenets. Evol Comput. 2000;8(1):1–29.
75. Rechenberg I. Evolutionsstrategie-optimierung technischer systeme nach prinzipien der biol-
ogischen information. Freiburg, Germany: Formman Verlag; 1973.
76. Ronald E. When selection meets seduction. In: Proceedings of the 6th international conference
on genetic algorithms, Pittsburgh, PA, USA, July 1995. p. 167–173.
77. Rothlauf F, Goldberg DE, Heinzl A. Network random keys—a tree network representation
scheme for genetic and evolutionary algorithms. Evol Comput. 2002;10(1):75–97.
78. Rudolph G. Convergence analysis of canonical genetic algorithm. IEEE Trans Neural Net-
works. 1994;5(1):96–101.
79. Satoh H, Yamamura M, Kobayashi S. Minimal generation gap model for GAs considering
both exploration and exploitation. In: Proceedings of the 4th International conference on
soft computing (Iizuka’96): Methodologies for the conception, design, and application of
intelligent systems, Iizuka, Fukuoka, Japan, Sept 1996. p. 494–497.
80. Schaffer JD, Caruana RA, Eshelman LJ, Das R. A study of control parameters affecting
online performance of genetic algorithms for function optimisation. In: Proceedings of the
3rd international conference on genetic algorithms, Fairfax, VA, USA, June 1989. San Mateo,
CA: Morgan Kaufmann; 1989. p. 70–79.
81. Schraudolph NN, Belew RK. Dynamic parameter encoding for genetic algorithms. Mach
Learn. 1992;9(1):9–21.
82. Schwefel HP. Numerical optimization of computer models. Chichester: Wiley; 1981.
83. Sharma SK, Irwin GW. Fuzzy coding of genetic algorithms. IEEE Trans Evol Comput.
2003;7(4):344–55.
84. Simoes AB, Costa E. Enhancing transposition performance. In: Proceedings of congress on
evolutionary computation (CEC), Washington, DC, USA, July 1999. p. 1434–1441.
85. Smith J, Vavak F. Replacement strategies in steady state genetic algorithms: static environ-
ments. In: Banzhaf W, Reeves C, editors. Foundations of genetic algorithms 5. CA: Morgan
Kaufmann; 1999. p. 219–233.
86. Sokolov A, Whitley D. Unbiased tournament selection. In: Proceedings of the conference
on genetic and evolutionary computation (GECCO), Washington, DC, USA, June 2005. p.
1131–1138.
87. Srinivas M, Patnaik LM. Adaptive probabilities of crossover and mutation in genetic algo-
rithms. IEEE Trans Syst Man Cybern. 1994;24(4):656–67.
88. Storn R, Price K. Differential evolution–a simple and efficient adaptive scheme for global
optimization over continuous spaces. Technical Report TR-95-012, International Computer
Science Institute, Berkeley, CA, March 1995.
89. Streifel RJ, Marks RJ II, Reed R, Choi JJ, Healy M. Dynamic fuzzy control of genetic algorithm
parameter coding. IEEE Trans Syst Man Cybern Part B. 1999;29(3):426–33.
90. Syswerda G. Uniform crossover in genetic algorithms. In: Proceedings of the 3rd international
conference on genetic algorithms, Fairfax, VA, USA, June 1989. San Francisco: Morgan
Kaufmann; 1989. p. 2–9.
91. Syswerda G. Simulated crossover in genetic algorithms. In: Whitley LD, editor. Foundations
of genetic algorithms 2, San Mateo, CA: Morgan Kaufmann; 1993. p. 239–255.
92. Thompson E, Paulden T, Smith DK. The Dandelion code: a new coding of spanning trees for
genetic algorithms. IEEE Trans Evol Comput. 2007;11(1):91–100.
93. Tsutsui S, Yamamura M, Higuchi T. Multi-parent recombination with simplex crossover in
real coded genetic algorithms. In: Proceedings of the genetic and evolutionary computation
conference (GECCO), Orlando, FL, USA, July 1999. San Mateo, CA: Morgan Kaufmann;
1999. p. 657–664.
94. Ursem RK. Diversity-guided evolutionary algorithms. In: Proceedings of the 7th conference
on parallel problem solving from nature (PPSN VII), Granada, Spain, Sept 2002. p. 462–471.
95. Voigt HM, Muhlenbein H, Cvetkovic D. Fuzzy recombination for the breeder genetic algo-
rithm. In: Eshelman L, editor. Proceedings of the 6th international conference on genetic
algorithms, Pittsburgh, PA, USA, July 1995. San Mateo, CA: Morgan Kaufmann; 1995. p.
104–111.
96. Whitley D. The GENITOR algorithm and selective pressure. In: Proceedings of the 3rd inter-
national conference on genetic algorithms, Fairfax, VA, USA, June 1989. San Mateo, CA:
Morgan Kaufmann; 1989. p. 116–121.
97. Whitley D, Starkweather T, Fuquay D. Scheduling problems and traveling salesmen: the
genetic edge recombination operator. In: Proceedings of the 3rd international conference on
genetic algorithms, Fairfax, VA, USA, June 1989. San Mateo, CA: Morgan Kaufmann; 1989.
p. 133–140.
98. Wright AH. Genetic algorithms for real parameter optimization. In: Rawlins G, editor. Foun-
dations of genetic algorithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 205–218.
99. Yao X, Liu Y, Liang KH, Lin G. Fast evolutionary algorithms. In: Ghosh S, Tsutsui S, editors.
Advances in evolutionary computing: theory and applications. Berlin, Springer; 2003. p. 45–9.
100. Yip PPC, Pao YH. Combinatorial optimization with use of guided evolutionary simulated
annealing. IEEE Trans Neural Networks. 1995;6(2):290–5.
101. Yukiko Y, Nobue A. A diploid genetic algorithm for preserving population diversity—pseudo-
meiosis GA. In: Parallel problem solving from nature (PPSN III), Vol. 866 of the series Lecture
Notes in Computer Science. Berlin: Springer; 1994. p. 36–45.
4 Genetic Programming
Genetic programming (GP) is a variant of GA whose chromosomes have variable
length and data structure in the form of hierarchical trees. It is an automated method
for evolving computer programs from a high-level statement of a problem. This
chapter is dedicated to GP.

4.1 Introduction

GP [12] is a variant of GA for symbolic regression such as evolving computer pro-
grams, rather than for simple strings. GP is a hyper-heuristic search method. It is
particularly suitable for problems in which the optimal underlying structure must be
discovered, for instance, for automatic discovery of empirical laws. The design of
programming languages, compilers, and interpreters is an important topic in theo-
retical computer science.
GP has chromosomes of both variable length and data structure in the form of hier-
archical trees, instead of numeric vectors, or finite state machines. Internal nodes of
solution trees represent appropriate operators and leaf nodes represent input variables
or constants. For regression applications, the operators are mathematical functions
and the inputs are variables.
GP suffers from the so-called bloat phenomenon, resulting from the growth of non-
coding branches in the individuals. The bloat phenomenon may cause an excessive
consumption of computer resources and increase the cost of fitness computation. A
simple steady-state GP system is tinyGP (in Java, available at
http://cswww.essex.ac.uk/staff/rpoli/TinyGP/) [34].
Standard GP suffers from a structural difficulty problem in that it is unable to
search effectively for solutions requiring very full or very narrow trees [4]. This
deficiency is not due to the tree structure, but rather it may arise from the lack of local
structure-editing operators and GP’s fixed-arity expression tree representation [10].

Symbolic regression via GP has advantages over neural networks and SVMs in
terms of representation complexity, interpretability, and generalizing behavior. An
approach to generating data-driven regression models is proposed in [37]. These
models are obtained as solutions of the GP process for two-objective optimization
of low model error and low orders of expressional complexity. It is Pareto optimiza-
tion of the goodness of fit and expressional complexity, alternated with the Pareto
optimization of the goodness of fit and the order of nonlinearity at every generation.
Grammatical evolution [23] represents a grammar-based GP. Rather than repre-
senting the programs as parse trees, it uses a linear genome representation in the form
of a variable-length binary string. Grammatical evolution uses algorithmic maps to
define a phenotype from a genome, and uses a GA to search the space of struc-
tures specified by some context-free or attribute grammar. Christiansen grammar
evolution [24] extends grammatical evolution by replacing context-free grammars
by Christiansen grammars to improve grammatical evolution performance. Gram-
matical evolution only takes into account syntactic restrictions to generate valid
individuals, while Christiansen grammar evolution adds semantics to ensure that
both semantically and syntactically valid individuals are generated.
The inclusion of automatically defined functions (ADFs) in GP is widely adopted
by the GP research community. ADFs are reusable subroutines that are simultane-
ously evolved with the GP program, and are capable of exploiting any modularity
present in a problem to improve the performance of GP. However, the output of each
ADF is determined by evolution.
Gene expression programming (http://www.gepsoft.com/) [9] is a genotype/
phenotype GA for the creation of computer programs. In gene expression program-
ming, the genome is a symbol string of constant length, which may contain one or
more genes linked through a linking function. Thus, the algorithm distinguishes the
expression of genes (phenotype) from their representation (genotype). Gene expres-
sion programming considerably outperforms GP.
Cartesian GP uses directed graphs to represent programs, rather than trees. This
allows implicit reuse of nodes, as a node can be connected to the output of any
previous node in the graph. This is an advantage over tree-based GP representations
(without ADFs), where identical subtrees have to be constructed independently. Even
though Cartesian GP does not have ADFs, it performs better than GP with ADFs on a
number of problems. Embedded Cartesian GP [38] implements a form of ADF based
on the evolutionary module acquisition approach, which is capable of automatically
acquiring and evolving modules.

4.2 Syntax Trees

GP represents variables and algebraic operators in genes. Each chromosome is a
syntax tree that represents an algebraic expression. Lisp language is suitable for
crossover and mutation in GP. Linked lists are a major structure in Lisp. Lisp program
code is written with parentheses, with a function name followed by its arguments.
Many Lisp functions take a variable number of arguments. A parenthetical expression
in Lisp is called an s-expression (or symbolic expression), which corresponds to a
tree structure called syntax trees.
For example, the code (* x 100) represents x ∗ 100, and (sin (* x 100))
outputs sin(x ∗ 100). An s-expression for 4x + sin z is written as (+ (* 4 x)
(sin z)). Each s-expression in parentheses corresponds to a subtree. Those sym-
bols at the bottom of a syntax tree are called leaves. For another example, a
function to be identified, f(x) = x² + sin(x/3), can be expressed by a syntax
tree written as (+ (ˆ x 2) (sin (/ x 3))). The chromosome is encoded
as + ˆ x 2 sin / x 3. Figure 4.1 gives the solution tree for f(x) = x² +
sin(x/3).
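As a minimal sketch, a prefix-encoded chromosome such as + ˆ x 2 sin / x 3 can be evaluated recursively; the function name and the restricted operator set below are our own choices.

function [val, pos] = evalprefix(tokens, pos, x)
% Evaluate the prefix-encoded syntax tree in the cell array tokens from position pos,
% e.g. evalprefix({'+','^','x','2','sin','/','x','3'}, 1, 1.5) returns 1.5^2 + sin(0.5).
tok = tokens{pos};  pos = pos + 1;
switch tok
    case {'+','-','*','/','^'}                   % binary internal nodes
        [a, pos] = evalprefix(tokens, pos, x);
        [b, pos] = evalprefix(tokens, pos, x);
        switch tok
            case '+', val = a + b;
            case '-', val = a - b;
            case '*', val = a * b;
            case '/', val = a / b;
            case '^', val = a ^ b;
        end
    case 'sin'                                   % unary internal node
        [a, pos] = evalprefix(tokens, pos, x);
        val = sin(a);
    case 'x'                                     % leaf: the input variable
        val = x;
    otherwise                                    % leaf: a numeric constant
        val = str2double(tok);
end
end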
Tree-based crossover replaces an s-expression of a syntax tree by another s-
expression, and the syntax tree will remain valid. We can perform mutation by
replacing a randomly selected s-expression with a randomly generated s-expression,
leading to tree-based mutation.
Examples of tree-based crossover and mutation are illustrated in Figure 4.2 and
Figure 4.3.

Figure 4.1 Syntax tree for f(x) = x² + sin(x/3).

Figure 4.2 Tree-based crossover in GP.


Figure 4.3 Tree-based mutation in GP.

Example 4.1: GP has been used for generating nonlinear input–output models that
are linear in parameters. The models are represented in a tree structure [20]. For
linear-in-parameters models, the model complexity can be controlled by orthogonal
least squares (OLS) method. The model terms are sorted by error reduction ratio
values according to OLS method. The subtree that had the least error reduction ratio
is eliminated from the tree.
MATLAB GP-OLS Toolbox provides an efficient and fast method for data-based
identification of nonlinear models. Instead of the mean square error (MSE), the fitness
function is defined as a correlation coefficient of the measured and the calculated
output values, multiplied by a penalty factor controlling the model complexity. OLS
introduces the error reduction ratio which is a measure of the decrease in the variance
of the output by a given term. The user should only specify the input–output data,
the set of the variables or the maximum model order at the terminal nodes, the set of
mathematical operators at the internal nodes, and some parameters of GP.
We consider the nonlinear input–output model with linear parameters:
y(k) = 0.5u(k − 1)² + 0.6y(k − 1) − 0.6y(k − 2) − 0.2,
where u(k) and y(k) are the input and the output variables of the model at the kth
sample time.
This model is first used to generate the input and output data, as plotted in
Figure 4.4. Notice that the input and the output are polluted by 6 % and 3 % Gaussian
noise, respectively.
During the evolution, the function set F contained the basic arithmetic operations
F = {+, −, ∗}, and the terminal set T contained the arguments T = {u(k − 1),
u(k − 2), y(k − 1), y(k − 2)}. Parameters of GP are set as follows: N P = 50, the
maximum tree depth as 5, the maximum number of generations as 200, tournament
selection of size 2, one-point crossover pc = 0.8, point-mutation pm = 0.4, elitist
replacement, and generation gap as 0.9.
For ten random runs, the algorithm found perfect solution to the model structure
five times. For a random run, we got the best fitness 0.7596, the best MSE 0.7632,
and the evolved model
y(k) = 0.5074u(k − 1)² + 0.4533y(k − 1) − 0.4586y(k − 2) − 0.2041.
That is, GP-OLS method can correctly identify the model structure of nonlinear
systems. The evolution of the fitness and MSE are shown in Figure 4.5.
Figure 4.4 The input and output data for model identification.

Figure 4.5 The evolution of the fitness and MSE.

4.3 Causes of Bloat

Because GP uses a variable-length representation, the individuals within the evolving
population tend to grow rapidly without a corresponding return in fitness improve-
ment. This is a phenomenon known as bloat. GP generates solutions with large
amounts of irregular and unnecessary code, which grow dramatically over time,
consume memory, and are not matched by any proportionate increase in the quality of the solutions.
In fact, biological genotypes are also fairly irregular and not too compressible. In
GP, code bloat is almost inevitable [12,26]. Programs that are much larger than they
need to be may over-fit the training data, reducing the performance on unseen data.
Classical theories for explaining bloat are mainly based on the concept of introns,
areas of code that can be removed without altering the fitness value of the solution.
Introns in biology are noncoding regions of the DNA, that is, those that eventually do
not end up as part of a protein. Explicitly defined introns [21] control the probability
of particular nodes being chosen as the crossover point in an attempt to prevent
destructive crossover. Increasing the number of nodes of the tree makes it more
difficult to destroy with crossover.
The hitchhiking theory [35] showed that random selection in conjunction with standard
subtree crossover does not cause code growth, and it is therefore concluded that
fitness-based selection is the cause of size increase.
The removal bias theory [33] states that, assuming that redundant data are closer to
the leaves than to the root and applying crossover to redundant data does not modify
the fitness of a solution, evolution will favor the replacement of small branches.
Since there is not a bias for insertion, small branches will be replaced by average-
size branches, leading to bigger trees. In [15] experimental evidence is against the
claim that it is the crossover between introns that causes the bloat problem. Instead,
a generalization of the removal bias theory is used to explain the code growth.
In [26,29], a size evolution equation is developed, which provides an exact for-
malization of the dynamics of average program size. Also, the crossover bias theory
[5,29] states that while the mean size of programs is unaffected by crossover, higher
moments of the distribution are. The population evolves toward a distribution where
small programs have a higher frequency than longer ones.
Several non-intron theories of bloat have been proposed. The program search
space theory [14] relies on the idea that above a certain size, the distribution of
fitness does not vary with size. Since in the search space there are more big tree
structures than small ones, during the search process GP will tend to find bigger
trees. In [27], it is argued that GP will tend to produce larger trees simply because
there are more large programs than small ones within the search space. Theory of
modification point depth [16] argues that if deeper points are selected for crossover,
then it is less likely that crossover will significantly modify fitness. Therefore, there
is a bias for larger trees, which have deeper modification points.

4.4 Bloat Control

In tree-based GP, it is standard practice to place control on program size either by
limiting the number of nodes or the depth of the trees, or by adding a term to the
fitness function that rewards smaller programs (parsimony pressure), or based on
genetic operators.
4.4.1 Limiting Program Size

This method constrains the evolving population with the maximum allowed depth,
or size, of the trees. A limit can be placed on either the number of nodes or the depth
of the tree [12]. Children whose size exceeds the limit are rejected, placing copies of
their parents in the population in their stead. In [25], newly created programs do not
enter the population until after a number of generations proportional to their size,
the idea being to give smaller programs a chance to spread through the population
before being overwhelmed by their larger brethren.
Augmenting any bloat control method with a size limit never hurts [19]. However,
the population will quickly converge to the size limit, leading to premature con-
vergence. It is very difficult to set a good limit without prior knowledge. The dynamic
limits approach [31] refines the hard-limiting approach based on fitness. Bloat con-
trol methods based on operator equalization [6,32] eliminate bloat by biasing the
search toward a predefined size distribution.
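As a minimal sketch of the hard size/depth limiting described at the start of this subsection, the following Python fragment rejects an offspring whose depth exceeds a fixed limit and keeps a copy of its parent in its stead. The nested-tuple tree encoding, the limit value, and the helper names are illustrative assumptions, not part of any particular GP system.

# Minimal sketch of depth limiting in tree-based GP (illustrative only).
# Trees are nested tuples: a terminal is a string, an internal node is
# (function_name, child_1, child_2, ...).

MAX_DEPTH = 8  # hypothetical depth limit

def depth(tree):
    """Depth of a tree; a lone terminal has depth 1."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + max(depth(child) for child in tree[1:])

def accept_offspring(child, parent, max_depth=MAX_DEPTH):
    """Return the child if it respects the depth limit, otherwise a copy of the parent."""
    return child if depth(child) <= max_depth else parent

# Example: a shallow child is accepted, an overly deep one is replaced by its parent.
parent = ('+', 'x', '1')
child = ('*', ('+', 'x', '1'), 'x')
print(accept_offspring(child, parent, max_depth=2))   # -> ('+', 'x', '1')
print(accept_offspring(child, parent, max_depth=5))   # -> ('*', ('+', 'x', '1'), 'x')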

4.4.2 Penalizing the Fitness of an Individual with Large Size

Another approach is to penalize the fitness of an individual if it is too large. Such
methods are called parsimony pressure methods. In [27], the fitness of an individual
is nullified if its size is larger than the average size of the entire population. More
generally, the fitness of each individual is reduced by an amount determined by its
size, normally with respect to the rest of the population [30]. In the covariant
parsimony pressure method [30], the parsimony coefficient is recalculated at each
generation to ensure that the mean program size of the population remains constant
throughout the evolution. Parsimony pressure can also be implemented by using
tree size as a secondary objective for lexicographic ordering [17] or multiobjective
optimization [8].
Inclusion of parsimony pressure in the selection method is accomplished either
by selecting a proportion of individuals based on size or by holding two tournaments,
one on fitness and another on size. In double tournament selection [18], a series of
tournaments is first run using program size to determine the winners; these winners
then contest a final tournament in which fitness determines the overall winner.
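A minimal sketch of this double tournament idea is given below; the dictionary-based individual records, the tournament sizes, and the assumption that both size and fitness are to be minimized are illustrative choices for the example only.

import random

# Illustrative double tournament selection: a first layer of size tournaments
# produces qualifiers, then the fittest qualifier is chosen (minimization).

def size_tournament(population, k=2):
    """Pick k individuals at random and return the smallest one."""
    contestants = random.sample(population, k)
    return min(contestants, key=lambda ind: ind['size'])

def double_tournament(population, n_qualifiers=7, k=2):
    """Qualify individuals by size tournaments, then pick the fittest qualifier."""
    qualifiers = [size_tournament(population, k) for _ in range(n_qualifiers)]
    return min(qualifiers, key=lambda ind: ind['fitness'])

random.seed(0)
pop = [{'size': random.randint(5, 200), 'fitness': random.random()} for _ in range(50)]
print(double_tournament(pop))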

4.4.3 Designing Genetic Operators

Several bloat control schemes are based on the genetic operators, such as modified
crossover operators [2,14] and selection strategies that eliminate larger trees [19,25].
An editing operator [12] periodically simplifies the trees, eliminating the subtrees
that do not add anything to the final solution. In [7], a mutation operator that performs
algebraic simplification of the tree expression is introduced, in a way similar to
the editing operator. In [11], the algebraic simplification approach is extended by
considering numerical simplification.
A simple crossover for tree-based GP is the one-point crossover developed in [28].
One-point crossover only allows crossover to happen between regions of two program
trees that share the same shape. Before choosing the crossover points, both parent
trees are aligned, starting from the root node, to determine the common region shared
between them. Unlike standard subtree crossover, one-point crossover makes the
population converge just like in GA [28]. One-point crossover effectively eliminates
the possibility of bloat [28]. To provide a wider exploration of the search space, one-
point crossover is combined with one-point mutation [28]. By substituting standard
subtree crossover with the one-point crossover coupled with subtree mutation, an
order of magnitude reduction in bloat is achieved when compared with standard
GP [36].
Size fair crossover is introduced in [13], and size fair mutation is introduced in [3].
In size fair operators, the size of the subtree to be deleted is calculated and this is used
to guide the random choice of the second crossover point. Size fair crossover and
homologous crossover [13] explicitly consider the size of the subtree that is removed
from one parent when choosing which subtree to insert from the other. In addition,
homologous crossover not only accounts for size, but it also considers shape. In this
sense homologous crossover can be seen as a generalization of one-point crossover
combined with size fair crossover.
The prune and plant method [1] is an elaborate mutation operator inspired by an
agricultural practice for fruit trees. It prunes some branches of trees and plants them
in order to grow new trees. The method creates two offspring from a single parent.
The pruned branch will be planted in the population as a new tree. Prune and plant
can be considered as the combination of two mutation operators: shrink (or trunc) and
hoist. The shrink operator removes a branch of a tree and replaces it with a terminal.
Hoist selects an inner node and returns a copy of this subtree as a new individual.
Prune and plant maintains the quality of the final solutions in terms of fitness while
achieving a substantial reduction of the mean tree size [1].
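The shrink and hoist ingredients of prune and plant can be sketched as follows on a nested-tuple tree encoding; the terminal set, the random choices, and the helper names are assumptions made only for illustration.

import random

# Illustrative shrink and hoist mutations on nested-tuple trees.
TERMINALS = ['x', 'y', '1']   # hypothetical terminal set

def subtrees(tree, path=()):
    """Enumerate (path, subtree) pairs; the path lists child indices from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def shrink(tree):
    """Replace a random internal branch by a random terminal (the pruning step)."""
    internal = [(p, s) for p, s in subtrees(tree) if isinstance(s, tuple)]
    path, _ = random.choice(internal)
    return replace(tree, path, random.choice(TERMINALS))

def hoist(tree):
    """Return a copy of a random inner subtree as a new individual (the planting step)."""
    inner = [s for p, s in subtrees(tree) if p and isinstance(s, tuple)]
    return random.choice(inner) if inner else tree

random.seed(1)
parent = ('*', ('+', 'x', 'y'), ('-', 'x', '1'))
print(shrink(parent))   # pruned parent
print(hoist(parent))    # planted branch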

4.5 Gene Expression Programming

In nature, the phenotype has multiple levels of complexity: tRNAs, proteins, ribo-
somes, cells, and the organism itself, all of which are products of expression and
are ultimately encoded in the genome. The expression of the genetic information
starts with transcription (the synthesis of RNA) and, for protein genes, proceeds
with translation (the synthesis of proteins).
Gene expression programming (GEP) [9] incorporates both the linear fixed-length
chromosomes of GA type and the expression trees of different sizes and shapes
similar to the parse trees of GP. The chromosomes have fixed length and are composed
of one or more equal-size genes structurally organized in a head and a tail. Since the
expression trees are totally encoded in the linear chromosomes of fixed length, the
genotype and phenotype are finally separated from each other. Thus, the phenotype
consists of the same kind of ramified structure used in GP.
In GEP, from the simplest individual to the most complex, the expression of
genetic information starts with translation, the transfer of information from a gene
into an expression tree. There is no need for transcription: the message in the gene is
directly translated into an expression tree. The expression trees are the expression of
a totally autonomous genome. Only the genome is passed on to the next generation,
and the modified simple linear structure will grow into an expression tree.
The chromosomes function as a genome and are subjected to modification by
means of mutation, transposition, root transposition, gene transposition, gene recom-
bination, and one- and two-point recombination. The chromosomes encode expres-
sion trees which are the object of selection.
The Karva language is used to read and express the information encoded in the chro-
mosomes. A K-expression corresponds to an open reading frame (ORF) of a gene; it
is obtained by the straightforward reading of the expression tree from left to right
and from top to bottom, so the genotype can easily be inferred from the phenotype.
The length of an ORF is variable, and it may be equal to or less than the length of a
gene. The resulting noncoding regions in the genes allow modification of the genome
using any genetic operator without restrictions, always producing syntactically
correct programs.
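As an illustration of Karva translation, the following Python sketch fills an expression tree level by level (breadth first) from a linear K-expression. The particular K-expression, the arity table, and the use of Q for the square-root function are assumptions made for this example only.

from collections import deque

# Sketch of Karva translation: the K-expression is read left to right and the
# expression tree is filled level by level. Function arities are assumed.
ARITY = {'Q': 1, '+': 2, '-': 2, '*': 2, '/': 2}   # Q = square root

def translate(k_expression):
    """Build a nested-list tree [symbol, children...] from a K-expression."""
    symbols = list(k_expression)
    root = [symbols.pop(0)]
    queue = deque([root])
    while queue and symbols:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):
            child = [symbols.pop(0)]
            node.append(child)
            queue.append(child)
    return root

def to_infix(node):
    """Render the tree as a readable expression string."""
    if len(node) == 1:
        return node[0]
    if node[0] == 'Q':
        return 'sqrt(' + to_infix(node[1]) + ')'
    return '(' + to_infix(node[1]) + node[0] + to_infix(node[2]) + ')'

print(to_infix(translate('Q*+-abcd')))   # -> sqrt(((a+b)*(c-d)))

Only the first eight symbols of the string are consumed here; any remaining symbols would form the noncoding tail of the gene, which is why genetic operators can be applied anywhere without producing invalid programs.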
However, experiments show that GEP does not have a better performance than
other GP techniques [22].
Self-learning GEP [39] features a chromosome representation in which each chro-
mosome is embedded with subfunctions that can be deployed to construct the final
solution. The subfunctions are self-learned or self-evolved during the evolutionary
search. Self-learning GEP is simple and generic, and has far fewer control parameters
than GEP.

Problems

4.1 Write an s-expression that returns 1 if x > 10, and −1 otherwise.


4.2 Give the defining length, order, and length of the tree-structure schema
(if (#x#)1#).
4.3 Write an s-expression for f (x) = √(x^2 + a^2 x + ab − 1/(2b^2)). Draw the syntax
tree. What is the depth of the syntax tree?
4.4 Find a chromosome that can be decoded as cos(x) sin^2(x^2).
4.5 Generate data samples from the model [20]:

ẋ1 = 10(6 − x1 ) − 2.5468x1 x2 ,
ẋ2 = 80u − 10.1022x2 ,
ẋ3 = 0.024121x1 x2 + 0.112191x2 − 10x3 ,
ẋ4 = 245.978x1 x2 − 10x4 ,
y = x4 /x3 .
Then find the model structure by using GP-OLS Toolbox.

References
1. Alfaro-Cid E, Merelo JJ, Fernandez de Vega F, Esparcia-Alcazar AI, Sharman K. Bloat con-
trol operators and diversity in genetic programming: a comparative study. Evol Comput.
2010;18(2):305–32.
2. Blickle T, Thiele L. Genetic programming and redundancy. In: Hopf J, editor. Proceedings
of KI-94 workshop on genetic algorithms within the framework of evolutionary computation.
Germany: Saarbrucken; September 1994. p. 33–8.
3. Crawford-Marks R, Spector L. Size control via size fair genetic operators in the PushGP genetic
programming system. In: Proceedings of the genetic and evolutionary computation conference
(GECCO), New York, USA, July 2002. pp. 733–739.
4. Daida JM, Li H, Tang R, Hilss AM. What makes a problem GP-hard? validating a hypothesis
of structural causes. In: Cantu-Paz E, et al., editors. Proceedings of genetic and evolutionary
computation conference (GECCO), Chicago, IL, USA; July 2003. p. 1665–77.
5. Dignum S, Poli R. Generalisation of the limiting distribution of program sizes in tree-based
genetic programming and analysis of its effects on bloat. In: Proceedings of the 9th annual
conference on genetic and evolutionary computation (GECCO), London, UK, July 2007. p.
1588–1595.
6. Dignum S, Poli R. Operator equalisation and bloat free GP. In: Proceedings of the 11th European
conference on genetic programming (EuroGP), Naples, Italy, March 2008. p. 110–121.
7. Ekart A. Shorter fitness preserving genetic programs. In: Proceedings of the 4th European con-
ference on artificial evolution (AE’99), Dunkerque, France, November 1999. Berlin: Springer;
2000. p. 73–83.
8. Ekart A, Nemeth SZ. Selection based on the Pareto nondomination criterion for controlling
code growth in genetic programming. Genet Program Evol Mach. 2001;2(1):61–73.
9. Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems.
Complex Syst. 2001;13(2):87–129.
10. Hoai NX, McKay RIB, Essam D. Representation and structural difficulty in genetic program-
ming. IEEE Trans Evol Comput. 2006;10(2):157–66.
11. Kinzett D, Johnston M, Zhang M. Numerical simplification for bloat control and analysis of
building blocks in genetic programming. Evol Intell. 2009;2:151–68.
12. Koza JR. Genetic programming: on the programming of computers by means of natural selec-
tion. Cambridge: MIT Press; 1992.
13. Langdon WB. Size fair and homologous tree genetic programming crossovers. Genet Program
Evol Mach. 2000;1:95–119.
14. Langdon WB, Poli R. Fitness causes bloat. In: Proceedings of the world conference on soft
computing in engineering design and manufacturing, London, UK, June 1997. p. 13–22.
15. Luke S. Code growth is not caused by introns. In: Proceedings of the genetic and evolutionary
computation conference (GECCO’00), Las Vegas, NV, USA, July 2000. p. 228–235.
16. Luke S. Modification point depth and genome growth in genetic programming. Evol Comput.
2003;11(1):67–106.
17. Luke S, Panait L. Lexicographic parsimony pressure. In: Proceedings of the genetic and evo-
lutionary computation conference (GECCO), New York, USA, July 2002. p. 829–836.
18. Luke S, Panait L. Fighting bloat with nonparametric parsimony pressure. In: Proceedings of
the 7th international conference on parallel problem solving from nature (PPSN VII), Granada,
Spain, September 2002. p. 411–421.
19. Luke S, Panait L. A comparison of bloat control methods for genetic programming. Evol
Comput. 2006;14(3):309–44.
20. Madar J, Abonyi J, Szeifert F. Genetic programming for the identification of nonlinear input-
output models. Ind Eng Chem Res. 2005;44(9):3178–86.
21. Nordin P, Francone F, Banzhaf W. Explicitly defined introns and destructive crossover in genetic
programming. In: Rosca JP, editor. Proceedings of the workshop on genetic programming: from
theory to real-world applications, Tahoe City, July 1995. p. 6–22.
22. Oltean M, Grosan C. A comparison of several linear genetic programming techniques. Complex
Syst. 2003;14(4):285–314.
23. O’Neill M, Ryan C. Grammatical evolution. IEEE Trans Evol Comput. 2001;5(4):349–58.
24. Ortega A, de la Cruz M, Alfonseca M. Christiansen grammar evolution: grammatical evolution
with semantics. IEEE Trans Evol Comput. 2007;11(1):77–90.
25. Panait L, Luke S. Alternative bloat control methods. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Seattle, WA, USA, June 2004. p. 630–641.
26. Poli R. General schema theory for genetic programming with subtree-swapping crossover. In:
Proceedings of the 4th European conference on genetic programming (EuroGP), Lake Como,
Italy, April 2001. p. 143–159.
27. Poli R. A simple but theoretically-motivated method to control bloat in genetic programming.
In: Proceedings of the 6th European conference on genetic programming (EuroGP), Essex,
UK, April 2003. p. 204–217.
28. Poli R, Langdon WB. Genetic programming with one-point crossover. In: Chawdhry PK, Roy
R, Pant RK, editors. Soft computing in engineering design and manufacturing, Part 4. Berlin:
Springer; 1997. p. 180–189.
29. Poli R, McPhee NF. General schema theory for genetic programming with subtree-swapping
crossover: Part II. Evol Comput. 2003;11(2):169–206.
30. Poli R, McPhee NF. Parsimony pressure made easy. In: Proceedings of the 10th annual confer-
ence on genetic and evolutionary computation (GECCO’08), Atlanta, GA, USA, July 2008. p.
1267–1274.
31. Silva S, Costa E. Dynamic limits for bloat control in genetic programming and a review of past
and current bloat theories. Genet Program Evol Mach. 2009;10(2):141–79.
32. Silva S, Dignum S. Extending operator equalisation: fitness based self adaptive length distribu-
tion for bloat free GP. In: Proceedings of the 12th European conference on genetic programming
(EuroGP), Tubingen, Germany, April 2009. p. 159–170.
33. Soule T, Foster JA. Removal bias: a new cause of code growth in tree based evolutionary pro-
gramming. In: Proceedings of the IEEE international conference on evolutionary computation,
Anchorage, AK, USA, May 1998. p. 781–786.
34. Syswerda G. A study of reproduction in generational and steady state genetic algorithms.
In: Rawlins GJE, editor. Foundations of genetic algorithms. San Mateo: Morgan Kaufmann;
1991. p. 94–101.
35. Tackett WA. Recombination, selection and the genetic construction of genetic programs. PhD
thesis, University of Southern California, Los Angeles, CA, USA, 1994.
36. Trujillo L. Genetic programming with one-point crossover and subtree mutation for effective
problem solving and bloat control. Soft Comput. 2011;15:1551–67.
37. Vladislavleva EJ, Smits GF, den Hertog D. Order of nonlinearity as a complexity measure for
models generated by symbolic regression via Pareto genetic programming. IEEE Trans Evol
Comput. 2009;13(2):333–49.
38. Walker JA, Miller JF. The automatic acquisition, evolution and reuse of modules in Cartesian
genetic programming. IEEE Trans Evol Comput. 2008;12(4):397–417.
39. Zhong J, Ong Y, Cai W. Self-learning gene expression programming. IEEE Trans Evol Comput.
2016;20(1):65–80.
5 Evolutionary Strategies

The evolutionary strategy (ES) paradigm is one of the most successful EAs. Evolutionary
gradient search and gradient evolution are two methods that use EA to construct gra-
dient information for directing the search efficiently. Covariance matrix adaptation
(CMA) ES [11] accelerates the search efficiency by supposing that the local solution
space of the current point has a quadratic shape.

5.1 Introduction

ES [20,22] is another popular EA. ES was originally developed for numerical opti-
mization problems [22]. It was later extended to discrete optimization problems [13].
The objective parameters x and strategy parameters σ are directly encoded into the
chromosome using regular numerical representation, and thus no coding or decoding
is necessary.
Evolutionary programming [9] was presented for evolving artificial intelligence
for predicting changes in an environment, which was coded as a sequence of symbols
from a finite alphabet. Each chromosome is encoded as a finite state machine. The
approach was later generalized for solving numerical optimization problems based
on Gaussian mutation [8]. Evolutionary programming is very similar to ES with the
(λ + λ) strategy, but it does not use crossover, and it uses probabilistic competition
for selection.
Unlike GA, the primary search operator in ES is mutation. There are some major
differences between ES and GA.

• Selection procedure. The selection procedure in ES is deterministic: it always
selects the specified number of best individuals as a population, and each individual
in the population has the same mating probability. In contrast, the selection
procedure in GA is random and the chances of selection and mating are propor-
tional to an individual’s fitness.
• Relative order of selection and genetic operations. In ES, the selection procedure
is implemented after crossover and mutation, while in GA, it is carried out before
crossover and mutation are applied.
• Adaptation of control parameters. In ES, the strategy parameters σ are evolved
automatically by encoding them into chromosomes. In contrast, the control para-
meters in GA are problem-specific and need to be prespecified.
• Function of mutation. In GA, mutation is used to regain the lost genetic diversity,
while in ES, mutation functions as a hill-climbing search operator with adaptive
step size σ. Due to the normal distribution nature in Gaussian mutation, the tail
part of the distribution may generate a chance for escaping from a local optimum.

Other differences are embodied in the encoding methods and genetic operators.
However, the line between the different evolutionary computation methods is now
being blurred, since both methods are improved by borrowing the ideas from each
other. For example, CHC [7] has the properties of both GA and ES.
For continuous functional optimization, it is generally known that evolutionary
programming or ES works better than GA [2]. CMA-ES belongs to the
best-performing direct search strategies for real-valued black-box optimization of
unconstrained problems, based on the results of the 2009 and 2010 GECCO black-
box optimization benchmarking.

5.2 Basic Algorithm

Canonical ES uses only mutation operations. Biologically, this corresponds to
asexual reproduction. However, crossover operators used for real-coded GA can be
introduced into ES. For example, crossover operator can be defined by recombining
two parents x1 and x2 such that the ith gene of the generated offspring x takes the
value
xi = (x1,i + x2,i )/2 (5.1)
or is selected as either x1,i or x2,i . An offspring obtained from recombination is
required to be mutated before it is evaluated and entered into the population.
Mutation can be applied to a parent or to an offspring generated by crossover. For
a chromosome x = (x1 , x2 , . . . , xn ), Gaussian mutation produces a new offspring x′
with one or more genes defined by
x′i = xi + N (0, σi ) , i = 1, . . . , n, (5.2)
where N (0, σi ) is a Gaussian distribution with zero mean and standard deviation
σi , and σ = (σ1 , . . . , σn )T . The optimal σi is problem-dependent, and is evolved
automatically by encoding it into the chromosome. In practical implementations, σi
is usually mutated first and then xi is mutated using the new σ′i :
σ′i = σi exp (N (0, δσi )) , (5.3)
where δσi is a parameter of the method.
The performance of ES depends substantially on σi . In canonical ES, the self-
adaptation strategy provides each offspring with an individual σi computed from the
best μ offspring of the previous generation. σi can also be adjusted by cumulative
step-size adaptation [19].
For ES, two major selection schemes are usually applied, namely, the (λ + μ)
and (λ, μ) strategies, where μ is the population size and λ is the number of off-
spring generated from the population. As opposed to GA, both selection schemes
are deterministic sampling methods. These ranking-based selection schemes make
ES more robust than GA. In the (λ + μ) strategy, μ fittest individuals are selected
from the (λ + μ) candidates to form the next generation, while in the (λ, μ) scheme,
μ fittest individuals are selected from λ (λ ≥ μ) offspring to form the next generation.
The (λ + μ) strategy is elitist and therefore guarantees a monotonically improving
performance. This selection strategy, however, is unable to deal with changing envi-
ronments and jeopardizes the self-adaptation mechanism with respect to the strategy
parameters, especially within small populations. The (λ, μ) strategy, with λ/μ = 7,
is recommended in this case [2].
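As a concrete illustration of (5.2), (5.3), and the plus-selection strategy described above, the following minimal Python sketch evolves a population with self-adaptive step sizes on the sphere function; the objective, the parameter values, and the ratio λ/μ = 7 follow the recommendations above, but everything else is an illustrative assumption rather than a canonical implementation.

import numpy as np

# Minimal (mu + lambda)-ES sketch with self-adaptive step sizes (illustrative).
rng = np.random.default_rng(0)

def sphere(x):
    return float(np.sum(x ** 2))

def plus_es(f, dim=5, mu=10, lam=70, generations=200, delta_sigma=0.2):
    xs = rng.uniform(-5, 5, (mu, dim))          # object variables
    sigmas = np.full((mu, dim), 1.0)            # strategy parameters, one per gene
    for _ in range(generations):
        parents = rng.integers(0, mu, lam)      # each offspring picks a random parent
        # Mutate the step sizes first (5.3), then the object variables (5.2).
        child_sigmas = sigmas[parents] * np.exp(delta_sigma * rng.standard_normal((lam, dim)))
        child_xs = xs[parents] + child_sigmas * rng.standard_normal((lam, dim))
        # (mu + lambda) selection: the mu best of parents and offspring survive.
        all_xs = np.vstack([xs, child_xs])
        all_sigmas = np.vstack([sigmas, child_sigmas])
        order = np.argsort([f(x) for x in all_xs])[:mu]
        xs, sigmas = all_xs[order], all_sigmas[order]
    return xs[0], f(xs[0])

best_x, best_f = plus_es(sphere)
print(best_f)   # should be close to 0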
To explore a wider region of the search space globally, Cauchy and Levy probability
distributions, which have a larger variation at a single mutation, are used instead
of the Gaussian distribution as the primary mutation operator in evolutionary programming
[15,23].
Dynamic systems analysis derives a nonlinear system of difference equations that
describes the mean value evolution of ES. Some examples of dynamic analysis of ES
are implemented for (1, λ)-ES with σ self-adaptation on the sphere [3], (μ/μI , λ)-ES
with σ self-adaptation on the ellipsoid model [5], and (μ/μI , λ)-ES with cumula-
tive step-size adaptation on the ellipsoid model [6]. (μ/μI , λ)-ES with cumulative
step-size adaptation exhibits linear convergence order. Compared to canonical ES,
the control rule of cumulative step-size adaptation allows for a mutation strength
approximately μ-fold larger, which accounts for its superior performance in non-
noisy environments [6]. The convergence behaviors of ES and information-geometric
optimization are analyzed in [4] based on information geometry. The ES philosophy of
optimizing the expected value of the objective function is shown to lead to sublinear
convergence toward the optimizer.

5.3 Evolutionary Gradient Search and Gradient Evolution

The gradient is a generalization of the derivative of a function in one dimension to a
function in several dimensions. It represents the slope of the tangent of the function.
More precisely, the gradient points in the direction of the greatest rate of increase of
the function and its magnitude is the slope of the graph in that direction. A positive
gradient represents an increasing function, while a negative gradient represents a
decreasing function. When the gradient is zero, the curve at that point is flat. This
point is called an extreme or stationary point. An optimal solution is located at an
extreme point.
Classical gradient methods provide fast and reliable search on a differentiable solu-
tion landscape, but may be trapped at a local optimal solution. Evolutionary gradient
search uses EAs to construct gradient information on a nondifferential landscape and
later developed it for noisy environment optimization [1,21]. It uses self-adaptive
control for mutation, i.e., the chromosome is coded as x = (x1 , x2 , . . . , xn , σ). Mul-
tidirectional searches are carried out. The gradient is the direction calculated from
the evolutionary movement instead of the single movement of a solution. A centred
differencing approach is used for gradient estimation.
Evolutionary gradient search has the sense of (1, λ)-ES. It only works on one
individual. From current point x, the method generates λ new individuals t 1 , . . . , t λ
using normal mutation, and calculates their fitness values as f (t 1 ), . . . , f (t λ ). The
estimated gradient is given by
λ

g= (f (t i ) − f (x)) (t i − x), (5.4)
i=1
g
which is normalized as e = g .
Evolutionary gradient search generates two trial points:
x1 = x + (σψ)e, x2 = x + (σ/ψ)e, (5.5)
where ψ > 1 is a factor. The new individual is given by
x′ = x + σ′ e, (5.6)
with
σ′ = σψ if f (x1 ) > f (x2 ), and σ′ = σ/ψ if f (x1 ) ≤ f (x2 ). (5.7)
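A single search run following (5.4)–(5.7) might be sketched as below. Since the update moves the point along the estimated direction of increasing f, the sketch maximizes a fitness function (the negative sphere); the test function, the number of trials λ, the factor ψ, and all other settings are illustrative assumptions rather than values prescribed by the method.

import numpy as np

# Sketch of evolutionary gradient search following (5.4)-(5.7).
rng = np.random.default_rng(0)

def fitness(x):
    # Fitness to be maximized: the negative sphere function.
    return -float(np.sum(x ** 2))

def egs_step(f, x, sigma, lam=20, psi=1.5):
    trials = x + sigma * rng.standard_normal((lam, x.size))          # lambda mutants
    g = np.sum([(f(t) - f(x)) * (t - x) for t in trials], axis=0)    # estimated gradient (5.4)
    e = g / (np.linalg.norm(g) + 1e-12)
    x1, x2 = x + sigma * psi * e, x + (sigma / psi) * e               # two trial points (5.5)
    sigma_new = sigma * psi if f(x1) > f(x2) else sigma / psi         # step-size rule (5.7)
    return x + sigma_new * e, sigma_new                               # new individual (5.6)

x, sigma = np.array([3.0, -2.0]), 1.0
for _ in range(100):
    x, sigma = egs_step(fitness, x, sigma)
print(x, sigma)   # x should approach the optimum at the origin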
Gradient evolution [14] is a population-based metaheuristic method. Similar to
evolutionary gradient search, gradient evolution uses a gradient estimation approach
that is based on a centered differencing approach. Its population comprises a num-
ber of vectors that represent possible solutions. Gradient evolution searches for the
optimal solution over several iterations. In each iteration, all vectors are updated
using three operators: vector updating, jumping, and refreshing. Gradient evolution
algorithm uses an elitist strategy. Gradient evolution performs better than, or as well
as, PSO, DE, ABC, and continuous GA, for most of the benchmark problems tested.
The updating rule for gradient evolution is derived from a gradient estimation
method. It modifies the updating rule for the individual-based search, which is
inspired from a Taylor series expansion. The search direction is determined by the
Newton–Raphson method. Vector jumping and refreshing help to avoid local optima.
The algorithm simply sets a jumping rate to determine whether or not a vector must
jump. Vector refreshing is performed when a vector does not move to another location
after multiple iterations. Only a chosen vector can jump to a different direction.

Example 5.1: We revisit the Rosenbrock function treated in Example 3.3:
min_x f (x) = Σ_{i=1}^{n−1} [ 100 (xi+1 − xi^2)^2 + (1 − xi )^2 ], x ∈ [−2048, 2048]^2 .
The function has the global minimum f (x) = 0 at xi = 1, i = 1, . . . , n. The landscape
of this function is shown in Figure 1.3.
We apply μ + λ-ES with intermediate recombination. The implementation sets
the population size as μ = λ = 100, the maximum number of generations as 100,
and selects the initial population randomly from the entire domain. For a random
run, we have f (x) = 0.0039 at (0.9381, 0.8791) with 4000 function evaluations.
All the individuals converge toward the global optimum. For 10 random runs, the
solver always converged toward a point very close to the global optimum within 100
generations. The evolution of a random run is illustrated in Figure 5.1.

Example 5.2: The Easom function is treated in Example 2.1 and Example 3.4. Here
we solve this same problem using ES with the same ES settings given in Example
5.1. The global minimum value is −1 at x = (π, π)T .
For a random run, we have f (x) = −1.0000 at (3.1416, 3.1413) with 9000 func-
tion evaluations. All the individuals converge toward the global optimum. For 10
random runs, the solver always converged to the global optimum within 100 gener-
ations. The evolution of a random run is illustrated in Figure 5.2.


Figure 5.1 The evolution of a random run of ES for Rosenbrock function: the minimum and
average objectives.


Figure 5.2 The evolution of a random run of ES for the Easom function: the minimum and average
objectives.

From this example and Example 5.1, it is concluded that the ES implementa-
tion gives better results than SA and GA for both Rosenbrock function and Easom
function.

5.4 CMA Evolutionary Strategies

Evolutionary gradient search uses EA to construct gradient information that is used to
direct the search efficiently. Covariance matrix adaptation (CMA) ES [11] accelerates
the search efficiency by supposing that the local solution space of the current point
has a quadratic shape, i.e., the Taylor series of f (x) around xk .
In self-adaptive ES, the standard deviations and the covariances (or the corresponding
rotation angles) of the multidimensional normal distribution can be encoded into the
chromosome to be optimized by the algorithm. In CMA-ES, the λ new individuals
generated with normal distribution are regarded as samplings on the solution space.
The density function is a quadratic function of x. If we could simplify the local area
of the solution space as a convex quadratic surface, the μ best individuals among λ
might form a better density function by which we could generate better individuals.
The set of all mutation steps that yield improvements is called an evolution path of
ES [10]. CMA-ES is a technique that uses information embedded in the evolution path
to accelerate the convergence. CMA is a completely derandomized self-adaptation
scheme. Subsequent mutation steps are uncorrelated with the previous ones. The
mutation operator is defined by
x′ = x + δBz, (5.8)
where δ is a global step size, z is a random vector whose elements are drawn from a
normal distribution N(0, 1), and the columns of the rotation matrix B are the eigen-
vectors of the covariance matrix C of the distribution of mutation points. The step
size δ is also adaptive. CMA implements PCA of the previously selected mutation
steps to determine the new mutation distribution [11].
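The sampling step implied by (5.8) can be illustrated as follows. The covariance matrix C below is fixed and chosen arbitrarily, whereas CMA-ES adapts it during the run, and the scaling by the square roots of the eigenvalues of C, which (5.8) absorbs into its notation, is written out explicitly; everything in the sketch is an illustrative assumption.

import numpy as np

# Illustrative sampling of a correlated mutation step from N(0, C).
rng = np.random.default_rng(0)

C = np.array([[4.0, 1.5],
              [1.5, 1.0]])              # assumed covariance of the mutation distribution
delta = 0.3                             # global step size
x = np.array([1.0, 2.0])                # current search point

eigvals, B = np.linalg.eigh(C)          # columns of B: eigenvectors of C
D = np.sqrt(eigvals)                    # axis lengths of the mutation ellipsoid

z = rng.standard_normal(2)              # isotropic normal sample
x_new = x + delta * B @ (D * z)         # rotated and scaled mutation step, cf. (5.8)
print(x_new)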
By suitably defining mutation operators, ES can evolve significantly faster. CMA-
based mutation operator makes ES two orders of magnitude faster than conventional
ES [10–12]. CMA implements the concepts of derandomization and cumulation for
self-adaptation of the mutation distribution [11].
In CMA-ES (https://www.lri.fr/~hansen/), not only is the step size of the mutation
operator adjusted at each generation, but also is the step direction. Heuristics for
setting search parameters, detecting premature convergence, and a restart strategy can
also be introduced into CMA-ES. CMA is one of the best real-parameter optimization
algorithms.
In [12], the original CMA-ES [11] is modified to adapt the covariance matrix
by exploiting more of the information contained in larger populations. Instead of
updating the covariance matrix with rank-one information, higher rank information
is included. This reduces the time complexity from O(n^2) to O(n), for a problem
dimension of n.
BI-population CMA-ES with alternative restart strategy combines two modes of
parameter settings for each restart [17]. It is the winner of the competition on real-
parameter single objective optimization at IEEE CEC-2013.
Limited memory CMA-ES [18] is an alternative to limited memory BFGS method.
Inspired by limited memory BFGS, limited memory CMA-ES samples candidate
solutions according to a covariance matrix reproduced from m direction vectors
selected during the optimization process. Limited memory CMA-ES outperforms
CMA-ES and its large scale versions on non-separable ill-conditioned problems
with a factor that increases with problem dimension. The algorithm demonstrates a
performance comparable to that of limited memory BFGS on non-trivial large-scale
optimization problems.
Mixed integer evolution strategies [16] are natural extensions of ES for mixed
integer optimization problems whose parameter vectors consist of continuous
variables as well as nominal discrete and integer variables. They use specialized
mutation operators tailored for the mixed parameter classes. For each type of variable,
the choice of mutation operators is governed by a natural metric for this variable type,
maximal entropy, and symmetry considerations. All distributions used for mutation
can be controlled in their shape by means of scaling parameters, allowing self-
adaptation to be implemented. Global convergence of the method is proved on a
very general class of problems.
The evolution path technique employed by CMA-ES is a fine example of exploit-
ing history. History was also used in developing efficient EAs that adaptively mutate
and never revisit [24]. An archive is used to store all the solutions that have been
explored before. It constitutes an adaptive mutation operator that has no parameter.
The algorithm has superior performance over CMA-ES.

Problems

5.1 Find out the global search mechanism, the convergence mechanism, and the
up-hill mechanism of ES.
5.2 Explain how an elitist GA is similar to (μ + λ)-ES.
5.3 Minimize the 10-dimensional Rastrigin function on the domain [−5.12, 5.12]
using (μ + λ)-ES with μ = 10 and λ = 10. Set the standard deviation of the
mutation in each dimension to 0.02.
(1) Record the best individual at each generation for 100 generations.
(2) Run 50 simulations.
(3) Plot the average minimum cost values as a function of generation number.

References
1. Arnold D, Salomon R. Evolutionary gradient search revisited. IEEE Trans Evol Comput.
2007;11(4):480–95.
2. Back T, Schwefel H. An overview of evolutionary algorithms for parameter optimization. Evol
Comput. 1993;1(1):1–23.
3. Beyer H-G. Toward a theory of evolution strategies: self-adaptation. Evol Comput.
1995;3(3):311–47.
4. Beyer H-G. Convergence analysis of evolutionary algorithms that are based on the paradigm
of information geometry. Evol Comput. 2014;22(4):679–709.
5. Beyer H-G, Melkozerov A. The dynamics of self-adaptive multi-recombinant evolution strate-
gies on the general ellipsoid model. IEEE Trans Evol Comput. 2014;18(5):764–78.
6. Beyer H-G, Hellwig M. The dynamics of cumulative step size adaptation on the ellipsoid
model. Evol Comput. 2016;24:25–57.
7. Eshelman LJ. The CHC adaptive search algorithm: how to have safe search when engaging in
nontraditional genetic recombination. In: Rawlins GJE, editor. Foundations of genetic algo-
rithms. San Mateo, CA: Morgan Kaufmann; 1991. p. 265–283.
8. Fogel DB. An analysis of evolutionary programming. In: Proceedings of the 1st annual con-
ference on evolutionary programming, La Jolla, CA, May 1992. p. 43–51.
9. Fogel L, Owens J, Walsh M. Artificial intelligence through simulated evolution. New York:
Wiley; 1966.
10. Hansen N, Ostermeier A. Adapting arbitrary normal mutation distributions in evolution strate-
gies: the covariance matrix adaptation. In: Proceedings of IEEE international conference on
evolutionary computation, Nagoya, Japan, 1996. p. 312–317.
11. Hansen N, Ostermeier A. Completely derandomized self-adaptation in evolution strategies.
Evol Comput. 2001;9(2):159–95.
12. Hansen N, Muller SD, Koumoutsakos P. Reducing the time complexity of the derandomized
evolution strategy with covariance matrix adaptation (CMA-ES). Evol Comput. 2003;11(1):1–
18.
13. Herdy M. Application of the evolution strategy to discrete optimization problems.In: Schwe-
fel HP, Manner R, editors. Parallel problem solving from nature, Lecture notes on computer
science, vol. 496. Berlin: Springer; 1991. p. 188–192
14. Kuo RJ, Zulvia FE. The gradient evolution algorithm: a new metaheuristic. Inf Sci.
2015;316:246–65.
15. Lee CY, Yao X. Evolutionary programming using mutations based on the Levy probability
distribution. IEEE Trans Evol Comput. 2004;8(1):1–13.
16. Li R, Emmerich MTM, Eggermont J, Back T, Schutz M, Dijkstra J, Reiber JHC. Mixed integer
evolution strategies for parameter optimization. Evol Comput. 2013;21(1):29–64.
17. Loshchilov I. CMA-ES with restarts for solving CEC 2013 benchmark problems. In: Proceed-
ings of IEEE congress on evolutionary computation (CEC 2013), Cancun, Mexico, June 2013.
p. 369–376.
18. Loshchilov I. LM-CMA: an alternative to L-BFGS for large scale black-box optimization. Evol
Comput. 2016.
19. Ostermeier A, Gawelczyk A, Hansen N. Step-size adaptation based on non-local use of selection
information. In: Parallel problem solving from nature (PPSN III), Lecture notes in computer
science, vol. 866. Berlin: Springer; 1994. p. 189–198.
20. Rechenberg I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Freiburg, Germany: Formman Verlag; 1973.
21. Salomon R. Evolutionary algorithms and gradient search: similarities and differences. IEEE
Trans Evol Comput. 1998;2(2):45–55.
22. Schwefel HP. Numerical optimization of computer models. Chichester: Wiley; 1981.
23. Yao X, Liu Y, Lin G. Evolutionary programming made faster. IEEE Trans Evol Comput.
1999;3(2):82–102.
24. Yuen SY, Chow CK. A genetic algorithm that adaptively mutates and never revisits. IEEE Trans
Evol Comput. 2009;13(2):454–72.
6 Differential Evolution

Differential evolution (DE) is a popular, simple yet efficient EA for solving real-
parameter global optimization problems [30]. DE is an elitist EA. It creates new
candidate solutions by a multiparent reproduction strategy. DE uses the directional
information from the current population for each individual to form a simplex-like
triangle.

6.1 Introduction

Differential evolution (DE) uses a one-to-one spawning and selection relationship
between each individual and its offspring. It creates new candidate solutions by a
multiparent reproduction strategy. In this sense, DE is not biologically plausible. A
detailed review on DE is given in [8].
Unlike traditional EAs, DE variants perturb the current generation population
members with the scaled differences of randomly selected and distinct population
members. Thus, it owes a lot to the Nelder–Mead algorithm and the controlled random
search algorithm, which also rely on the difference vectors to perturb the current trial
solutions. A candidate replaces a parent only if it has better fitness.
The space complexity of DE is low as compared to some of the most competitive
real-parameter optimizers like CMA-ES. Although restart CMA-ES was able to beat
DE at CEC 2005 competition, the gross performance of DE in terms of accuracy,
convergence speed, and robustness still makes it attractive for various real-world
optimization problems.
DE faces significant difficulty on functions that are not linearly separable and can
be outperformed by CMA-ES [27]. On such functions, DE must rely primarily on
its differential mutation procedure, which is rotationally invariant [33].


6.2 DE Algorithm

DE is a kind of direction-based search. Unlike the random step size of mutation
along each dimension of ES, DE uses the directional information from the current
population.
Each individual in the current generation is allowed to breed through mating
with other randomly selected individuals from the population. Specifically, for each
individual xti , i = 1, . . . , NP , at the current generation t, three other random distinct
individuals are selected from the population such that j, k, l ∈ {1, . . . , NP } and
i ≠ j ≠ k ≠ l. Thus a parent pool of four individuals is formed to breed an offspring.
After initialization, DE creates a mutated vector v ti corresponding to each population
member through mutation, and then a trial vector uti using arithmetic recombination,
in the current generation. It is the method for creating the mutated vector
that differentiates one DE scheme from another. The five most frequently used muta-
tion strategies are implemented in the DE codes (in C, http://www.icsi.berkeley.edu/
~storn/code.html). These mutation, crossover, and selection operators defined for
DE are somewhat similar to those for real-coded GA.
In DE, mutation is applied before crossover, as opposed to GA. Moreover, in GA,
mutation is applied occasionally, whereas in DE it is a regular operation applied to
generate each offspring. The general convention for naming the mutation strategies
is DE/x/y/z, where x specifies the vector to be mutated, y is the number of difference
vectors considered for perturbation of x, and z is for the type of crossover being
used (exp: exponential; bin: binomial). DE family of algorithms can use two kinds
of crossover schemes: exponential and binomial [30–32]. In the high-performance
DE/rand/1/either-or variant [32], the trial vectors that are pure mutants occur with a
probability PF and those that are pure recombinants occur with a probability 1 − PF .
Differential Mutation
The standard mutation operator of DE needs three randomly selected different indi-
viduals from the current population for each individual to form a simplex-like trian-
gle. It prevents premature local convergence and ensures global convergence in the
final stage as all individuals in general evolve to one optimal point.
A frequently used mutation is denoted DE/rand/1. This differential mutation operation
generates a mutated individual v ti by
v ti = xtj + F(xtk − xtl ), (6.1)
where j ≠ k ≠ l ≠ i, and typically, 0 < F < 1 controls the strength of the direction.
Another mutation operation, denoted DE/best/2, is
v ti = xtbest + F(xtj − xtk ) + F(xtl − xtn ), (6.2)
where j, k, l, n correspond to four distinct points taken randomly from P (not
coinciding with the current xi ), xtbest is the point of P with the minimal function
value, and 0 < F < 1 is a scaling factor.

The directional mutation operator [46] attempts to recognize good variation directions
and to increase the number of generations having fitness improvement. The method
constructs a pool of difference vectors, calculated whenever fitness is improved at a
generation. The difference vector pool then guides the mutation search in the next
generation only. The directional mutation operator can be applied to any DE mutation
strategy, resulting in an improved performance for most DE algorithms, and it
outperforms the proximity-based mutation operator on five DE variants.
A proximity-based mutation operator [11] selects the vectors to perform mutation
operation using a distance related probability.
Crossover
The mutated individual v ti is mated with xti , generating the offspring or trial individ-
ual uti . The genes of uti are inherited from xti and v ti , determined by the crossover
probability Cr ∈ [0, 1]:
uti,m = vti,m if rand(m) ≤ Cr or m = rn(i), and uti,m = xti,m if rand(m) > Cr and m ≠ rn(i), (6.3)
where m = 1, . . . , N corresponds to the mth element of an individual vector,
rand(m) ∈ [0, 1) is the mth evaluation of a uniform random number generator, and
rn(i) ∈ {1, . . . , N} is a randomly chosen index which ensures that uti gets at least one
element from v ti . Equation (6.3) ensures that at least one element of xi is changed
even if Cr = 0.
DE applies selection pressure only when picking survivors. Competition is con-
ducted between each individual xti and its offspring uti , and the winner is selected
deterministically based on objective function values and promoted to the next gen-
eration.
DE works with two populations P (old generation) and Q (new generation) of
the same size NP . A new trial point ui is composed of the current point xi of the old
generation and the point v i obtained by using mutation. If f (ui ) < f (xi ) the point
ui is inserted into the new generation Q instead of xi . After completion of the new
generation Q, the old generation P is replaced by Q and the search continues until
the stopping condition is fulfilled. DE in pseudo-code is written as Algorithm 6.1.
Due to the specific recombination operator, DE is very likely to prematurely con-
verge unless its parameters are carefully chosen. DE has three control parameters,
namely, population size NP , scaling factor F, and crossover rate Cr . Storn and Price
suggested NP ∈ [5D, 10D] for D-dimension problems, and a good initial choice of
F = 0.5 and Cr = 0.1; and to use 0.5 ≤ F ≤ 1 and 0 ≤ Cr ≤ 1 depending on the
results of preliminary tuning [30]. In [12], it is suggested that a plausible choice of
NP ∈ [3D, 8D], with F = 0.6 and Cr ∈ [0.3, 0.9]. In [27], F ∈ (0.4, 0.95) is sug-
gested, with F = 0.9 to be a good first choice; Cr typically lies in (0, 0.2) when the
function is separable, while in (0.9, 1) when the function’s parameters are dependent.
Under suitable assumptions, the dynamics of DE asymptotically converge to the
global optimum of the objective function, assuming the shape of a Dirac delta dis-
tribution [13].

Algorithm 6.1 (DE).

1. Generate P = (x1 , x2 , . . . , xNP ).


2. Repeat:
a. for i = 1 to NP do
i. Compute a mutant vector v i .
ii. Create ui by the crossover of v i and xi .
iii. if f (ui ) < f (xi ) then insert ui into Q.
else insert xi into Q.
end if
end for
b. P ← Q.
until stopping condition is satisfied.
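A compact Python rendering of Algorithm 6.1 with DE/rand/1 mutation (6.1) and binomial crossover (6.3) is sketched below; the sphere objective and all parameter values are illustrative assumptions, not part of the original algorithm description.

import numpy as np

# Minimal DE/rand/1/bin sketch of Algorithm 6.1 (illustrative settings).
rng = np.random.default_rng(0)

def sphere(x):
    return float(np.sum(x ** 2))

def de(f, dim=10, NP=50, F=0.5, Cr=0.9, generations=200, bounds=(-5.12, 5.12)):
    P = rng.uniform(bounds[0], bounds[1], (NP, dim))      # old generation P
    fP = np.array([f(x) for x in P])
    for _ in range(generations):
        Q, fQ = P.copy(), fP.copy()                       # new generation Q
        for i in range(NP):
            # DE/rand/1 mutation (6.1): three distinct indices, all different from i.
            j, k, l = rng.choice([m for m in range(NP) if m != i], 3, replace=False)
            v = P[j] + F * (P[k] - P[l])
            # Binomial crossover (6.3): u inherits at least one component from v.
            mask = rng.random(dim) <= Cr
            mask[rng.integers(dim)] = True
            u = np.where(mask, v, P[i])
            # One-to-one selection: the trial enters Q only if it is better.
            fu = f(u)
            if fu < fP[i]:
                Q[i], fQ[i] = u, fu
        P, fP = Q, fQ
    best = int(np.argmin(fP))
    return P[best], fP[best]

x_best, f_best = de(sphere)
print(f_best)   # should be close to 0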

Eigenvector-based crossover operator [15] utilizes the eigenvectors of the covariance
matrix of individual solutions to make the binomial crossover rotationally invariant.
During crossover, the donor vectors are projected onto the eigenvector basis that
provides a proper coordinate system, so that the rotated fitness landscape becomes
pseudo-separable. To preserve the population diversity, a parameter that controls the
ratio between the binomial crossover and the eigenvector-based crossover is intro-
duced. Incorporation of the proposed eigenvector-based crossover in six DE variants
demonstrates either solid performance gains, or statistically identical behaviors.

Example 6.1: Consider the Rastrigin function:
min_x f (x) = 10n + Σ_{i=1}^{n} [ xi^2 − 10 cos(2πxi ) ], x ∈ [−5.12, 5.12]^n . (6.4)
It is a multimodal function. The global optimum is f (x) = 0 at x∗ = 0. The function
is shown in Figure 6.1.
We now find the global optimum by using DE. The population size is selected as
50, F = 0.5, Cr = 0.9, and the maximum number of generations is 100. The initial
population is randomly generated from the entire domain.
For a random run, we have f (x) = 0 at (−0.1449 × 10^−8 , −0.0307 × 10^−8 ). All
the individuals converge toward the global optimum. For 10 random runs, the solver
always converged to the global optimum within 100 generations. The evolution of a
random run is illustrated in Figure 6.2.

Figure 6.1 The landscape of Rastrigin function f (x) with two variables.


Figure 6.2 The evolution of a random run of DE for Rastrigin function: the minimum and average
objectives.

6.3 Variants of DE

DE basically outperforms PSO and other EAs in terms of solution quality [35].
However, it still suffers from slow and/or premature convergence.
By dynamically controlling F and/or Cr using fuzzy logic controllers, fuzzy adap-
tive DE [18] converges much faster than DE, particularly when the dimensionality
of the problem is high or the problem concerned is complicated. Self-adaptive DE
(SaDE) [24] adapts both trial vector generation strategies and their associated control
parameter values by learning from their previous experience in generating promising
solutions so as to match different phases of the search process/evolution.
Opposition-based DE [25] outperforms DE and fuzzy adaptive DE [18] in terms of
convergence speed and solution accuracy. It is specially suited to noisy optimization
problems. In [1], opposition-based learning is used in shuffled DE, where population
is divided into several memeplexes and each memeplex is improved by DE.
The success of DE is highly dependent on the search operators and control para-
meters that are often decided a priori. In [29], a DE algorithm is proposed that
dynamically selects the best performing combinations of parameters for a problem
during the course of a single run.
DE with self-adaptive parameters [3] adaptively adjusts the control parameters
F and Cr by implementing the DE scheme of DE/rand/1/bin strategy. The method
encodes F and Cr into the individual and evolves their values by using two probabil-
ities τ1 and τ2 . The self-adaptive method outperforms fuzzy adaptive DE [18]. The
method is improved by including a dynamic population size reduction mechanism
in [4].
In the parameter adaptation strategy for DE [39], the idea of controlling the pop-
ulation diversity is implemented. A multipopulation approach to the adaptive DE
algorithm is also analyzed.
DE with neighborhood search [40] performs mutation by adding a normally dis-
tributed random value to each component of the target vector. Self-adaptive DE with
neighborhood search [41] incorporates self-adaptation ideas from self-adaptive DE
[24] and proposes three self-adaptive strategies: self-adaptive choice of the mutation
strategy between two alternatives, self-adaptation of F, and self-adaptation of Cr .
In [26], the proposed hybrid DE algorithm uses local search to improve conver-
gence and an adaptive value for Cr . This adaptive mechanism combines the binary
crossover and the linear recombination in view of the diversity, and a population
refreshment mechanism is used to avoid stagnation. The algorithm gives competi-
tive results compared to existing methods on the CEC 2011 Competition benchmark.
The performance of standard DE can be enhanced by a crossover-based adaptive
local search operation [21]. The method adaptively adjusts the length of the search,
using a hill-climbing heuristic. Bare-bones DE [22] and Gaussian bare-bones DE [36]
are almost parameter-free optimization algorithms that are inspired by bare-bones
PSO.
DE/target-to-best/1 favors exploitation only, since all the vectors are attracted by
the same best position found so far by the entire population, thereby converging faster
towards the same point. The family of improved DE variants of the DE/target-to-
best/1/bin scheme [6] addresses this drawback by using a hybrid mutation operator
which is a linear combination of neighborhood-based and global DE mutations. The
local neighborhood mutation mutates each vector using the best position found so
far in a small neighborhood of it. The global mutation takes into account the globally
best vector xtbest of the entire population at current generation G for mutating a
population member.
DE Markov chain [34], as a population MCMC algorithm, solves an important
problem in MCMC, namely, that of choosing an appropriate scale and orientation
for the jumping distribution. In DE Markov chain, the jumps are simply a fixed
multiple of the differences of two random parameter vectors that are currently in
the population. The selection process of DE Markov chain works via the Metropolis
ratio which defines the probability with which a proposal is accepted.
JADE [45] implements a mutation strategy DE/current-to-pbest with optional
external archive and updates control parameters in an adaptive manner. DE/current-
to-pbest is a generalization of classical DE/current-to-best, while the optional archive
operation utilizes historical data to provide information of progress direction. Both
operations diversify the population and improve the convergence performance.
Current-to-pbest utilizes the information of multiple best solutions to balance the
greediness of the mutation and the diversity of the population. JADE is better than,
or at least comparable to, other DE algorithms, canonical PSO, and other EAs in
terms of convergence performance.
Geometric DE is a formal generalization of traditional DE that can be used to
derive specific DE algorithms for both continuous and combinatorial spaces retain-
ing the same geometric interpretation of the dynamics of DE search across represen-
tations. Specific geometric DE algorithms are derived for search spaces associated with binary
strings, permutations, vectors of permutations and genetic programs [20].
In [7], switched parameter DE modifies basic DE by switching the values of
the scale factor (mutation step size) and crossover rate in a uniformly random way
between two extreme corners of their feasible ranges for different individuals. Each
individual is mutated either by DE/rand/1 scheme or by DE/best/1 scheme. The
individual is subjected to that mutation strategy which was responsible for its last
successful update. Switched parameter DE achieves very competitive results against
the best known algorithms under the IEEE CEC 2008 and 2010 competitions.
In DE, the use of different mutation and crossover strategies with different parame-
ter settings can be appropriate during different stages of the evolution. In evolving
surrogate model-based DE method [19], a surrogate model, which is constructed
based on the population members of the current generation, is used to assist DE
in order to generate competitive offspring using the appropriate parameter setting
during different stages of the evolution. From the generated offspring members, a
competitive offspring is selected based on the surrogate model evaluation. A Krig-
ing model is employed to construct the surrogate. Evolving surrogate model-based
DE performs statistically similar or better than the state-of-the-art self-adaptive DE
algorithms.

6.4 Binary DE Algorithms

Standard DE and its variants typically operate in the continuous space. Several DE
algorithms are proposed for binary and discrete optimization problems.
In artificial immune system-based binary DE [16], the scaling factor is treated as
a random bit-string and the trial individuals are generated by Boolean operators. A
modified binary DE algorithm [37] improves the Boolean mutation operator based
on the binary bit-string framework. In binary-adapted DE [14], the scaling factor
is regarded as the probability of the scaled difference bit to take on one. Stochastic
diffusion binary DE [28] hybridizes binary-adapted DE [14] with ideas extracted
from stochastic diffusion search. These binary DE algorithms discard the updating
formulas of standard DE and generate new individuals based on different Boolean
operators.
Angle modulated DE is a binary DE inspired by angle modulated PSO. In angle
modulated DE [23], standard DE is adopted to update the four real-coded parameters
of angle modulated function which is sampled to generate the binary-coded solutions
till the global best solution is found. Thus, angle modulated DE actually works in
continuous space.
In discrete binary DE [5], the sigmoid function used in discrete binary PSO [17]
is directly taken to convert the real individuals to bit strings. Discrete binary DE
searches in the binary space directly, but it is very sensitive to the setting of the
control parameters. Moreover, the value transformed by the sigmoid function is not
symmetrical in discrete binary DE, which reduces the global searching ability.
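One possible reading of the sigmoid conversion used in discrete binary DE [5] is sketched below; the mutant vector, the random number generator, and the parameter values are illustrative assumptions, and the sketch shows only the real-to-binary mapping, not the full algorithm.

import numpy as np

# Sketch of the sigmoid conversion from a real-valued DE mutant to a bit string,
# in the spirit of discrete binary DE [5] (illustrative only).
rng = np.random.default_rng(0)

def to_bits(v):
    """Map each real component to a bit with probability sigmoid(v_m)."""
    prob_one = 1.0 / (1.0 + np.exp(-v))
    return (rng.random(v.size) < prob_one).astype(int)

v = np.array([-2.0, -0.1, 0.0, 0.4, 3.0])   # a real-valued mutant vector
print(to_bits(v))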
Another modified binary DE [38] develops a probability estimation operator to
generate individuals. The method reserves the updating strategy of DE. The prob-
ability estimation operator is utilized to build the probability model for generating
binary-coded mutated individuals, and it can keep the diversity of population better
and is robust to the setting of parameters. It outperforms discrete binary DE [5],
modified binary DE [37], discrete binary PSO [17] and binary ant system in terms
of accuracy and convergence speed.

6.5 Theoretical Analysis on DE

Some theoretical results on DE are provided in [42,43], where the influence of the
mutation and crossover operators and their parameters on the expected population
variance is theoretically analyzed. In case of applying mutation and recombination
but no selection, the expected population variance of DE is shown to be greater than
that of ES [42].
In [44], the influence of the crossover rate on the distribution of the number of
mutated components and on the mutation probability is theoretically analyzed for
several variants of crossover, including binomial and exponential strategies in DE.
The behavior of exponential crossover variants is more sensitive to the problem size
than that of its binomial crossover counterparts.

The theoretical studies on the evolutionary search dynamics of DE are given in
[9,10]. A simple mathematical model of the underlying evolutionary dynamics of a
1-D DE-population (evolving with the DE/rand/1/bin algorithm) is proposed in [9],
based on the fact that DE perturbs each dimension separately and if a D-dimensional
objective function is separable, this function can be optimized in a sequence of D
1-D optimization processes. The model reveals that the fundamental dynamics of
each search-agent in DE employs the gradient-descent type search strategy, with a
learning rate that depends on F and Cr . It is due to the gradient-descent strategy
that DE converges much faster than some variants of GA or PSO over unimodal
benchmarks [35]. The stability and convergence behavior of the proposed dynamics
is analyzed in the light of Lyapunov’s stability theorems in [10].

Problems

6.1 The mutation DE/current-to-best/1 is defined by


v ti = xti + F(xtbest − xti ) + F(xtj − xtk ).
Write the mutations DE/best/1, DE/current-to-best/1, DE/best/2, DE/current-to-
rand/1, and DE/rand/2.
6.2 Classical DE requires the generation of three random integers, but the random
number generation might need to be repeated due to the restrictions on their values.
On average, how many random number generations are required for generating
acceptable xj , xk , xl ?
6.3 Use DE to minimize the 5-dimensional Ackley function. The selected parameters are population size N = 100 and 50 generations.
(1) Run 10 simulations for crossover rate C_r = 0.8 and each of the step sizes F = 0.1, 0.4, 0.7, 0.9. Plot the average of the best cost of each set as a function of the generation number.
(2) Run the same procedure for F = 0.3, but with C_r = 0.1, 0.5, 0.9.

References
1. Ahandani MA, Alavi-Rad H. Opposition-based learning in the shuffled differential evolution
algorithm. Soft Comput. 2012;16:1303–37.
2. Ahandani MA, Shirjoposht NP, Banimahd R. Three modified versions of differential evolution
algorithm for continuous optimization. Soft Comput. 2010;15:803–30.
3. Brest J, Greiner S, Boskovic B, Mernik M, Zumer V. Self-adapting control parameters in
differential evolution: a comparative study on numerical benchmark problems. IEEE Trans
Evol Comput. 2006;10(6):646–57.
4. Brest J, Maucec MS. Population size reduction for the differential evolution algorithm. Appl
Intell. 2008;29:228–47.
5. Chen P, Li J, Liu Z. Solving 0-1 knapsack problems by a discrete binary version of differential evolution. In: Proceedings of second international symposium on intelligent information technology application, Shanghai, China, Dec 2008. p. 513–516.

6. Das S, Abraham A, Chakraborty UK, Konar A. Differential evolution using a neighborhood-based mutation operator. IEEE Trans Evol Comput. 2009;13(3):526–53.
7. Das S, Ghosh A, Mullick SS. A switched parameter differential evolution for large scale global
optimization—simpler may be better. In: Proceedings of MENDEL 2015, Vol. 378 of Recent
Advances in Soft Computing. Berlin: Springer; 2015. p. 103–125.
8. Das S, Suganthan PN. Differential evolution: a survey of the state-of-the-art. IEEE Trans Evol
Comput. 2011;15(1):4–31.
9. Dasgupta S, Das S, Biswas A, Abraham A. The population dynamics of differential evolution:
a mathematical model. In: Proceedings of IEEE congress on evolutionary computation, June
2008. p. 1439–1446.
10. Dasgupta S, Das S, Biswas A, Abraham A. On stability and convergence of the population-
dynamics in differential evolution. AI Commun. 2009;22(1):1–20.
11. Epitropakis MG, Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Enhancing dif-
ferential evolution utilizing proximity-based mutation operators. IEEE Trans Evol Comput.
2011;15(1):99–119.
12. Gamperle R, Muller SD, Koumoutsakos A. Parameter study for differential evolution. In:
Proceedings of WSEAS NNA-FSFS-EC 2002, Interlaken, Switzerland, Feb 2002. p. 293–298.
13. Ghosh S, Das S, Vasilakos AV, Suresh K. On convergence of differential evolution over a class
of continuous functions with unique global optimum. IEEE Trans Syst Man Cybern Part B.
2012;42(1):107–24.
14. Gong T, Tuson AL. Differential evolution for binary encoding. In: Soft computing in industrial
applications, Vol. 39 of Advances in Soft Computing. Berlin: Springer; 2007. p. 251–262.
15. Guo S-M, Yang C-C. Enhancing differential evolution utilizing eigenvector-based crossover
operator. IEEE Trans Evol Comput. 2015;19(1):31–49.
16. He X, Han L. A novel binary differential evolution algorithm based on artificial immune system.
In: Proceedings of IEEE congress on evolutionary computation (CEC), 2007. p. 2267–2272.
17. Kennedy J, Eberhart RC. A discrete binary version of the particle swarm algorithm. In: Proceed-
ings of IEEE international conference on systems, man, and cybernetics, 1997. p. 4104–4108.
18. Liu J, Lampinen J. A fuzzy adaptive differential evolution algorithm. Soft Comput.
2005;9(6):448–62.
19. Mallipeddi R, Lee M. An evolving surrogate model-based differential evolution algorithm.
Appl Soft Comput. 2015;34:770–87.
20. Moraglio A, Togelius J, Silva S. Geometric differential evolution for combinatorial and pro-
grams spaces. Evol Comput. 2013;21(4):591–624.
21. Noman N, Iba H. Accelerating differential evolution using an adaptive local search. IEEE Trans
Evol Comput. 2008;12(1):107–25.
22. Omran MGH, Engelbrecht AP, Salman A. Bare bones differential evolution. Eur J Oper Res.
2009;196(1):128–39.
23. Pampara G, Engelbrecht AP, Franken N. Binary differential evolution. In: Proceedings of IEEE
congress on evolutionary computation (CEC), 2006. p. 1873–1879.
24. Qin AK, Huang VL, Suganthan PN. Differential evolution algorithm with strategy adaptation
for global numerical optimization. IEEE Trans Evol Comput. 2009;13(2):398–417.
25. Rahnamayan S, Tizhoosh HR, Salama MMA. Opposition-based differential evolution. IEEE
Trans Evol Comput. 2008;12(1):64–79.
26. Reynoso-Meza G, Sanchis J, Blasco X, Herrero JM. Hybrid DE algorithm with adaptive
crossover operator for solving real-world numerical optimization problems. In: Proceedings
of IEEE congress on evolutionary computation (CEC), New Orleans, LA, USA, June 2011. p.
1551–1556.
27. Ronkkonen J, Kukkonen S, Price KV. Real parameter optimization with differential evolution.
In: Proceedings of IEEE congress on evolutionary computation (CEC-2005), vol. 1. Piscataway,
NJ: IEEE Press; 2005. p. 506–513.

28. Salman AA, Ahmad I, Omran MGH. A metaheuristic algorithm to solve satellite broadcast
scheduling problem. Inf Sci. 2015;322:72–91.
29. Sarker RA, Elsayed SM, Ray T. Differential evolution with dynamic parameters selection for
optimization problems. IEEE Trans Evol Comput. 2014;18(5):689–707.
30. Storn R, Price K. Differential evolution—a simple and efficient adaptive scheme for global
optimization over continuous spaces. International Computer Science Institute, Berkeley, CA,
Technical Report TR-95-012, March 1995.
31. Storn R, Price KV. Differential evolution—a simple and efficient heuristic for global optimiza-
tion over continuous spaces. J Global Optim. 1997;11(4):341–59.
32. Storn R, Price KV, Lampinen J. Differential evolution—a practical approach to global opti-
mization. Berlin, Germany: Springer; 2005.
33. Sutton AM, Lunacek M, Whitley LD. Differential evolution and non-separability: using selec-
tive pressure to focus search. In: Proceedings of the 9th annual conference on GECCO, July
2007. p. 1428–1435.
34. Ter Braak CJF. A Markov chain Monte Carlo version of the genetic algorithm differential
evolution: Easy Bayesian computing for real parameter spaces. Stat Comput. 2006;16:239–49.
35. Vesterstrom J, Thomson R. A comparative study of differential evolution, particle swarm opti-
mization, and evolutionary algorithms on numerical benchmark problems. In: Proceedings of
IEEE congress on evolutionary computation (CEC), Portland, OR, June 2004. p. 1980–1987.
36. Wang H, Rahnamayan S, Sun H, Omran MGH. Gaussian bare-bones differential evolution.
IEEE Trans Cybern. 2013;43(2):634–47.
37. Wu CY, Tseng KY. Topology optimization of structures using modified binary differential
evolution. Struct Multidiscip Optim. 2010;42:939–53.
38. Wang L, Fu X, Mao Y, Menhas MI, Fei M. A novel modified binary differential evolution
algorithm and its applications. Neurocomputing. 2012;98:55–75.
39. Zaharie D. Control of population diversity and adaptation in differential evolution algorithms.
In: Proceedings of MENDEL 2003, Brno, Czech, June 2003. p. 41–46.
40. Yang Z, He J, Yao X. Making a difference to differential evolution. In: Advances in metaheuris-
tics for hard optimization. Berlin: Springer; 2007. p. 415–432.
41. Yang Z, Tang K, Yao X. Self-adaptive differential evolution with neighborhood search. In:
Proceedings of IEEE congress on evolutionary computation (CEC), Hong Kong, June 2008.
p. 1110–1116.
42. Zaharie D. On the explorative power of differential evolution. In: Proceedings of 3rd interna-
tional workshop on symbolic numerical algorithms and scientific computing, Oct 2001. http://
web.info.uvt.ro/~dzaharie/online?papers.html.
43. Zaharie D. Critical values for the control parameters of differential evolution algorithms. In:
Proceedings of the 8th international mendel conference on soft computing, 2002. p. 62–67.
44. Zaharie D. Influence of crossover on the behavior of differential evolution algorithms. Appl
Soft Comput. 2009;9(3):1126–38.
45. Zhang J, Sanderson AC. JADE: adaptive differential evolution with optional external archive.
IEEE Trans Evol Comput. 2009;13(5):945–58.
46. Zhang X, Yuen SY. A directional mutation operator for differential evolution algorithms. Appl
Soft Comput. 2015;30:529–48.
7 Estimation of Distribution Algorithms

Estimation of distribution algorithm (EDA) is one of the most successful paradigms of EAs. EDAs are derived by inspiration from evolutionary computation and machine learning. This chapter describes EDAs as well as several classical EDA implementations.

7.1 Introduction

EDAs [28,37] are also called probabilistic model-building GAs [41] and iterated
density-estimation EAs (IDEAs) [7]. They borrow two concepts from evolutionary
computation: population-based search, and exploration by combining and perturbing
promising solutions. They also use probabilistic models from machine learning to
guide exploration of the search space. EDAs usually differ in the representation of
candidate solutions, the class of probabilistic models, or the procedures for learning
and sampling probabilistic models. EDAs can also cope with noisy fitness information.
EDAs have the ability to uncover the hidden regularities of problems and then
exploit them for effective search. EDA uses a probabilistic model to estimate the
distribution of promising solutions, and to further guide the exploration of the search
space. Estimating the probability distribution from data corresponds to tuning the
model for the inductive search bias. The probabilistic model is further employed to
generate new points.
In EDAs, classical genetic operators are replaced by the estimation of a prob-
abilistic model and its simulation in order to generate the next population. EDAs
perform two steps: building a probabilistic model from promising solutions found so
far, and then using this model to generate new individuals to replace the old popula-
tion. EDAs often require fewer fitness evaluations than EAs. A population is usually
not maintained between generations. A drawback of EDAs is that the computational
complexity increases rapidly with increasing dimensionality.


The crossover operator in EAs sometimes destroys building blocks because it is carried out randomly. EDAs overcome this defect of crossover operators. EDAs are also theoretically attractive: genetic operators are extremely hard to understand and predict, whereas replacing genetic operators and populations with a simple yet powerful model makes it simpler to understand system behavior.
EDAs ensure an effective mixing and reproduction of promising partial solutions,
thereby solving GA-hard problems with linear or subquadratic performance in terms
of fitness function evaluations [1,41]. A number of EDAs have been developed
for discrete and continuous variables: Factorized distribution algorithm (FDA) [36],
estimation of Bayesian networks algorithm (EBNA) [18], and Bayesian optimization
algorithm (BOA) [41,42] for discrete variables, and estimation of Gaussian networks
algorithm (EGNA) [29], IDEAs [9], mixed BOA [40], and real-coded BOA [1] for
continuous variables.
EDAs mainly differ in the class of probabilistic models used and the methods
applied to learn and sample these models. EDAs can exploit first-order or higher order
statistics. EDAs using only first-order statistics or simplest univariate EDAs include
several well-known ones such as compact GA (cGA) [24], population-based incre-
mental learning (PBIL) [4], and univariate marginal distribution algorithm (UMDA)
[35,37], the latter being a special case of PBIL. They employ probability models
in which all the variables in p(x, t + 1) are independent; hence, only the marginal
probability of each variable needs to be estimated in the selected solutions at each
iteration. PBIL and bit-based simulated crossover [57] use extremely simple models,
where each bit is generated independently.
Univariate EDAs can be easily implemented in hardware. They lead to a significant
reduction in memory requirements, as only the probability vector, instead of an entire population of solutions, is stored. This feature makes them particularly attractive
for memory-constrained applications such as evolvable-hardware [20] or complex
combinatorial problems.
Most existing EDAs use low-order dependence relationships in modeling the
posterior probability of promising solutions in order to avoid exponential explosion.
EDAs using higher order statistics use a conditional dependence chain or network
to model the probability distributions.
In EDAs, the correlations between different variables are explicitly expressed
through the joint probability distribution associated with the individuals selected
at each iteration. EDAs can capture the structure of variable interactions, identify-
ing and manipulating crucial building blocks [64]. Some EDAs modeling bivariate dependencies are the mutual information maximization for input clustering (MIMIC) algorithm [15], the combining optimizers with mutual information trees (COMIT) algorithm [6], and the bivariate marginal distribution algorithm [43].
Some EDAs for modeling multivariate variable interactions are FDA [39], EBNA
[18], BOA [42], IDEA [9], and extended compact GA [23]. FDA with no variable
overlaps is equivalent to UMDA [39].

The first EDA for real-valued random variables was an adaptation of binary PBIL
[49,52]. Unsupervised estimation of Bayesian network algorithm [44] is for effective
and efficient globally multimodal problem optimization. It uses a Bayesian network
for data clustering in order to factorize the joint probability distribution for the
individuals selected at each iteration.
ACO belongs to EDAs. EDAs and ACO are very similar and differ mainly in the
way the probabilistic model is updated [14,34].
Mateda-2.0 (http://www.jstatsoft.org/article/view/v035i07) is a MATLAB pack-
age for the implementation and analysis of EDAs.

7.2 EDA Flowchart

In EDAs, a probabilistic model is induced from some of the individuals in population P_t, and then the next population P_{t+1} is obtained by sampling this probabilistic model. In EDAs, the estimation of distribution is often separated into two phases: model selection and model fitting.
Unlike GA, which uses an explicit representation of the population, an EDA uses a probability distribution over the choices available at each position in the vector that represents a population member. If the chromosome codes L bits, EDA uses a single vector of L probabilities (p_1, p_2, ..., p_L), where p_i is the probability of bit i being 1, to create an arbitrary number of candidate solutions. This is a compact representation that also helps to avoid premature convergence.
The flowchart of EDA is shown in Algorithm 7.1. EDAs iterate the three steps
until some termination criteria are satisfied: select good candidates (i.e., solutions)
from a population of solutions, estimate the probability distribution from the selected
individuals, and generate new candidates (i.e., offspring) from the estimated distri-
bution.

Algorithm 7.1 (EDA).

1. Set t = 1.
Initialize the probability model p(x, t) to some prior (e.g., a uniform distribution).
2. Repeat:
a. Sampling step: Generate a population P (t) of N P individuals by sampling the
model.
b. Evaluation step: Determine the fitness of the individuals in the population.
c. Selection step: Create an improved data set by selecting M ≤ N P points.
d. Learning step: Create a new model p(x, t) from the old model and the improved
data set.
e. Generate O(t) by generating N P new points from the distribution p(x, t).
f. Incorporate O(t) into P (t).
g. Set t = t + 1.
until termination criteria are met.
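The following minimal Python sketch illustrates the loop of Algorithm 7.1 for bit strings using the simplest univariate model, in which the new model replaces the old one (UMDA-style learning); the OneMax objective and all parameter values are placeholders, and no replacement strategy is shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def univariate_eda(fitness, n_bits, pop_size=100, n_select=30, generations=50):
    p = np.full(n_bits, 0.5)                                     # initial uniform model
    for _ in range(generations):
        pop = (rng.random((pop_size, n_bits)) < p).astype(int)   # sampling step
        fit = np.array([fitness(ind) for ind in pop])            # evaluation step
        best = pop[np.argsort(fit)[-n_select:]]                  # selection step (maximization)
        p = best.mean(axis=0)                                    # learning step
    return p

onemax = lambda bits: bits.sum()   # placeholder objective
print(np.round(univariate_eda(onemax, n_bits=20), 2))
```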

EDA evolves a population of candidate solutions. Each iteration starts by evaluating the candidate solutions and selecting promising solutions so that solutions
of higher quality are given more copies than solutions of lower quality. EDAs can
use any standard selection method of EAs. Next, a probabilistic model is built for
the selected solutions and new solutions are generated by sampling the probabilis-
tic model. New solutions are then incorporated into the original population using a
replacement strategy, and the next iteration is executed unless the termination criteria
are met.
EDAs tend to keep searching the regions of the space they have already visited, just like GA without a mutation operation. When the probability distribution of a decision variable is close to 1 or 0, its value is difficult to change. This is the so-called fixed-point problem, and it may lead the search process to a local optimum. Thus, EDAs are not ergodic. This effect is analogous to the phenomenon of genetic drift in evolutionary dynamics.
EDAs are susceptible to premature convergence. Finite population sampling in
selection results in fluctuations, which get reinforced when the probability model is
updated. These attractors are only removed when the population size or the learning
rate is scaled with the system size in a suitable way. For hard problems exponential
scaling will be required, whereas for easy ones polynomial scaling is necessary.
A significant difference between discrete and real-coded EDAs exists from the
viewpoint of probabilistic model learning. Discrete EDAs can easily estimate a prob-
ability distribution of a given/observed data set by simply counting the number of
instances for possible combinations. The estimated distribution converges to its true
distribution as the data size increases. Real-coded EDAs cannot use this simple
counting method to estimate a probability distribution for real-valued data.
Hybrid DE/EDA algorithm [56] tries to guide its search toward a promising area
by sampling new solutions from a probability model. It outperforms DE and EDA.

7.3 Population-Based Incremental Learning

Population-based incremental learning (PBIL) [4] was designed as an abstraction of binary-coded GA, which explicitly maintains the statistics contained in a GA population. As a combination of evolutionary optimization and hill-climbing, PBIL outperforms standard GAs and hill-climbing algorithms [5]. It aims to generate a real-valued probability vector p = {p_1, ..., p_L} for L bits, which creates high-quality solutions with high probabilities when sampled.
PBIL supposes that all the variables are independent. It employs a Bernoulli
random variable as the model for each bit. PBIL starts from a probability vector with
all elements set to 0.5. During evolution, the value of each element will be updated
by the best individual in the population, modifying its estimation about the structure
of good individuals. The algorithm will converge to a vector with each element being
0 or 1.

PBIL uses a Hebbian-inspired rule to update the probability vector:
$$p(t+1) = (1-\alpha)\,p(t) + \alpha\,\frac{1}{\mu}\sum_{k=1}^{\mu} x_k^s(t), \qquad (7.1)$$
where α ∈ (0, 1] is the learning rate, and x_k^s is a sample in the set of the μ best samples out of the N_P individuals. In [5], N_P = 200, α = 0.005, and μ = 2.
A mutation step can further be applied to the learned probability vector p. If random(0, 1) < p_m, then
$$p(t+1) = (1-\delta)\,p(t) + \delta\,b(t), \qquad (7.2)$$
where δ is the mutation shift and p_m is the mutation rate. α and δ can be set to small values, e.g., α = 0.1 and δ = 0.02.
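A minimal sketch of one PBIL iteration combining (7.1) and (7.2) is given below; b(t) is assumed here to be a uniformly random bit vector, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def pbil_update(p, best_samples, alpha=0.1, p_m=0.02, delta=0.05):
    """Move p toward the mean of the mu best samples, as in (7.1), then mutate
    each component toward a random bit with probability p_m, as in (7.2)."""
    p = (1 - alpha) * p + alpha * best_samples.mean(axis=0)
    mutate = rng.random(p.shape) < p_m
    b = rng.integers(0, 2, size=p.shape)   # assumption: b(t) is a uniform random bit vector
    p[mutate] = (1 - delta) * p[mutate] + delta * b[mutate]
    return p

p = np.full(8, 0.5)
best = np.array([[1, 0, 1, 1, 0, 1, 0, 1],
                 [1, 1, 1, 0, 0, 1, 0, 1]])   # the mu = 2 best samples
print(np.round(pbil_update(p, best), 3))
```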
Bit-based simulated crossover [57] regenerates the probability vector at each gen-
eration and it also uses selection probabilities to generate the probability vector. In
contrast, PBIL does not regenerate the probability vector at each generation, but
updates it using a few of the best performing individuals. Also, PBIL does not use
selection probabilities.
In UMDA the new probabilistic model replaces the old one, while in PBIL the new
model is used to refine the old one by means of a parameter α. UMDA corresponds
to a particular case of PBIL when α = 1.
Two variations of PBIL are, respectively, based on mutations and learning from
negative examples [5]. Mutations in PBIL serve a purpose to inhibit premature con-
vergence, by perturbing the probability vector with a small probability in a random
direction. The amount of the perturbation is generally kept small in relation to the
learning rate. The second variation is to learn from negative examples. The probabil-
ity update rule and sampling procedure for PBIL presented in [61] could effectively
utilize the increased diversity of opposition to generate significantly improved results
over PBIL. Opposition-based learning is an effective method to enhance PBIL [61].
PBIL has been extended to continuous spaces using a Gaussian distribution model
[49,52]. Gaussian distribution is the product of a set of univariate Gaussians for each
variable. To accommodate for these normal pdfs, in [49] the probability vector from
PBIL is replaced with a vector that specifies for each variable the mean and variance
of the associated normal pdf. The means are updated using an update rule similar
to that in PBIL. The variances are initially relatively large and are annealed to a
small value using a geometrically decaying schedule. In [52] a normal pdf is used
for each variable, but the variance is updated using the same update rule as that for
the mean. It starts with a general distribution with the mean vector of its Gaussian
in the middle of the search space. In each generation, the mean vector x is updated
by a combination of the best, the second best and the worst individuals:
$$x(t+1) = (1-\alpha)\,x(t) + \alpha\,(x_{best1} + x_{best2} - x_{worst}). \qquad (7.3)$$
The standard deviation σ of the univariate Gaussian determines the diversity of the
population. A strategy for dynamically adapting σ is derived from the distribution
of best individuals [52].
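A minimal sketch of such a continuous update step is given below; the mean update follows (7.3), while σ is annealed with a geometric schedule as in [49] (the decay factor and the test function are arbitrary illustrative choices).

```python
import numpy as np

def continuous_pbil_step(mean, sigma, population, fitness, alpha=0.1, sigma_decay=0.98):
    """Move the Gaussian mean toward x_best1 + x_best2 - x_worst, as in (7.3),
    and shrink the per-variable standard deviation geometrically."""
    order = np.argsort([fitness(x) for x in population])   # ascending: minimization
    best1, best2, worst = population[order[0]], population[order[1]], population[order[-1]]
    mean = (1 - alpha) * mean + alpha * (best1 + best2 - worst)
    return mean, sigma * sigma_decay

sphere = lambda x: np.sum(x ** 2)      # illustrative objective
pop = np.random.randn(20, 2) * 5       # current population samples
print(continuous_pbil_step(np.zeros(2), np.ones(2), pop, sphere))
```

New individuals for the next generation would then be sampled as mean + sigma * randn(dim).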

Figure 7.1 The evolution of a random run of PBIL for Ackley function: the minimum and average objectives (function value versus generation, 0–500; best value 0.3272, mean value 19.2897).

Example 7.1:
We now minimize the Ackley function of two variables:
$$\min_{x} f(x) = 20 + e - 20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right),$$
where x ∈ [−32, 32]^2. The global minimum value is 0 at x* = 0.


We implement PBIL on this problem by setting the population size as 200, the
maximum number of iterations as 500, an elitism strategy of passing 5 best individuals
to the next generation, α = 0.1, p_m = 0.9, and selecting the initial population randomly
from the entire domain. The probability vector is updated from 20 best individuals of
each generation. In order to maintain the population diversity, a program is applied
to make sure there are no duplicate individuals in the population.
For a random run, we have f (x) = 0.3272 at (−0.0645, 0.0285). The evolution
of the search is illustrated in Figure 7.1. For 10 random runs, the solver always
converged toward the global optimum.
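For readers who wish to reproduce an experiment of this kind, the following sketch gives the Ackley function together with a simple bit-string decoding of the sort a binary PBIL needs on a continuous domain; the 16-bit-per-variable encoding and the function names are illustrative assumptions, not the exact setup used above.

```python
import numpy as np

def ackley(x):
    n = len(x)
    return (20 + np.e
            - 20 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / n))

def decode(bits, low=-32.0, high=32.0, bits_per_var=16):
    """Read each group of bits_per_var bits as an unsigned integer and scale it
    linearly into [low, high]."""
    groups = np.asarray(bits).reshape(-1, bits_per_var)
    ints = groups.dot(2 ** np.arange(bits_per_var)[::-1])
    return low + (high - low) * ints / (2 ** bits_per_var - 1)

print(ackley(np.zeros(2)))                          # ~0 at the global minimum
print(ackley(decode(np.random.randint(0, 2, 32))))  # a random 2-D candidate
```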

7.4 Compact Genetic Algorithms

Compact GAs (cGAs) [24] evolve a probability vector that describes the hypothetic
distribution of a population of solutions in the search space to mimic the first-order
behavior of simple GA with uniform crossover. It was primarily inspired by the ran-
dom walk model, proposed to estimate GA convergence on a class of problems where

there is no interaction between the building blocks constituting the solution. cGA
iteratively processes the probability vector with updating mechanisms that mimic
the typical selection and recombination operations performed in standard GA, and
is almost equivalent to simple GA with binary tournament selection and uniform
crossover on a number of test problems [24].
Elitism-based cGAs [2] are EDAs for solving difficult optimization problems
without compromising on memory and computation costs. The idea is to deal with
issues connected with lack of memory by allowing a selection pressure that is high
enough to offset the disruptive effect of uniform crossover. The analogies between
cGAs and (1 + 1)-ES are discussed and a mathematical model of ES is also extended
to cGAs obtaining useful analytical performance in [2].
cGA represents the population by means of a vector of probabilities p_i ∈ [0, 1], i = 1, ..., l, for the l alleles needed to represent the solutions. Each p_i measures the proportion of individuals in the simulated population that have a zero (one) in the ith locus. By treating these values as probabilities, new individuals can be generated and, based on their fitness, the probability vector updated in order to favor the generation of better individuals.
The probabilities pi are initially set to 0.5 for a randomly generated population. At
each iteration cGA generates two individuals on the basis of the current probability
vector and compares their fitness. Let W be the individual with better fitness and L
the individual with worse fitness. The probability vector at step k + 1 is updated by
$$p_i^{k+1} = \begin{cases} p_i^k + \frac{1}{N}, & \text{if } w_i = 1 \text{ and } l_i = 0,\\ p_i^k - \frac{1}{N}, & \text{if } w_i = 0 \text{ and } l_i = 1,\\ p_i^k, & \text{if } w_i = l_i, \end{cases} \qquad (7.4)$$
where N is the size of the simulated population and w_i (or l_i) is the value of the ith allele of W (or L). cGA stops when the values of the probability vector p are all equal to zero or one, which gives the final solution.
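A minimal Python sketch of cGA that implements the update rule (7.4) is given below; the virtual population size, the iteration cap, and the OneMax objective are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)

def cga(fitness, n_bits, n_virtual=100, max_iters=20000):
    p = np.full(n_bits, 0.5)
    for _ in range(max_iters):
        a = (rng.random(n_bits) < p).astype(int)       # two competing individuals
        b = (rng.random(n_bits) < p).astype(int)
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        p += (winner - loser) / n_virtual              # componentwise rule (7.4)
        np.clip(p, 0.0, 1.0, out=p)                    # guard against float drift
        if np.all((p == 0.0) | (p == 1.0)):            # converged: p encodes the solution
            break
    return p

print(cga(lambda bits: bits.sum(), n_bits=12))         # OneMax as a placeholder objective
```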
Since cGA mimics the first-order behavior of standard GA, it is basically a 1-bit
optimizer and ignores the interactions among the genes. To solve problems with
higher-order building blocks, GAs with both higher selection pressure and larger
population sizes have to be exploited to help cGA to converge to better solutions [24].
cGA can be used to quickly assess the difficulty of a problem. A problem is
easy if it can be solved with cGA exploiting a low selection rate; the higher the selection rate needed to solve the problem, the more difficult the problem is. Given a population of individuals, cGA updates the probability vector in steps of 1/N. Only log_2 N bits are needed to store the finite set of values for each p_i. cGA, therefore, requires l log_2 N bits, compared to the Nl bits needed by simple GA, hence saving memory.
Real-valued cGA [32] works directly with real-valued chromosomes. For an opti-
mization problem with m real-valued variables, it uses as probability vector an m × 2
matrix describing the mean and the standard deviation of the distribution of each gene
in the hypothetical population. New variants of the update rules are then introduced
to evolve the probability vector in a way that mimics binary-coded cGA.

7.5 Bayesian Optimization Algorithm

BOA employs general probabilistic models for discrete variables [41]. It utilizes
techniques for modeling multivariate data by Bayesian networks so as to estimate
the joint probability distribution of promising solutions. The method is very effec-
tive even on large decomposable problems with loose and tight linkage of building
blocks. The superior subsolutions are identified as building blocks. Theoretically
and empirically, BOA finds the optimal solution with subquadratic scaleup behavior.
BOA realizes probabilistic building-block crossover that approximates population-
wise building-block crossover by a probability distribution estimated on the basis of
proper decomposition [41,42].
Real-coded BOA [1] employs a Bayesian factorization that estimates a joint prob-
ability distribution for multivariate variables by a product of univariate conditional
distributions of each random variable. It deals with a real-valued optimization by
evolving a population of promising solutions such that new offspring are generated
in line with the estimated probabilistic models of superior parent population. An ini-
tial population is generated at random. Superior solutions are selected by a method
such as tournament or truncation. A probabilistic model is learned from the selected
solutions by exploiting an information metric. New solutions are drawn by sampling
the learned model. The procedure iterates until some termination criteria are satisfied.
Real-coded BOA empirically solves numerical optimization problems of bounded
difficulty with subquadratic scaleup behavior [1]. A theoretical analysis shows that
real-coded BOA finds the optimal solution with a subquadratic (in problem size)
scalability for uniformly scaled decomposable problems [3]. The analytical models
of real-coded BOA have been verified by experimental studies. The analysis has been
extended for exponentially scaled problems, and the quasi-quadratic scalability has
also found experimental support.

7.6 Convergence Properties

The behaviors of PBIL with elitist selection in discrete space have been studied in
[22,26]. Having a sufficiently small learning rate, PBIL is modeled using a discrete
dynamic system and the local optima of an injective function with respect to Ham-
ming distance are stable fixed points of PBIL [22]. The dynamic behavior of UMDA
is shown to be very similar to that of GA with uniform crossover [35].
UMDA and PBIL can locate the optimum of a linear function, but cannot solve
problems with nonlinear variable interactions [22,26]. These results suggest that
EDAs using only first-order statistics have very limited ability to find global optimal
solutions.
PBIL and cGA are modeled by a Markov process and the behavior is approximated
using an ordinary differential equation, which, with sufficiently small learning rates,
converges to local optima of the function to be optimized, with respect to Hamming
distance [46,48]. Bounds on the probability of convergence to the optimal solution

are obtained in [45] for cGA and PBIL. Moreover, a sufficient condition for conver-
gence to the optimal solution is given, and a range of possible values for algorithmic
parameters is computed, at which the algorithm converges to the optimal solution
with a predefined confidence level.
The dynamic behaviors of the limit models of UMDA and FDA with tournament
selection are studied in [63] for discrete optimization problems. The local optima
with respect to the Hamming distance are asymptotically stable. The limit model
of UMDA can be trapped at any local optimum for some initial probability models.
In the case of an additively decomposable objective function, FDA can converge to
the global optimal solution [63]. Based on the dynamic analysis of the distributions
of infinite population in EDAs, FDA under proportional selection converges to the
global optimum for optimization of continuous additively decomposable functions
with overlaps [64].
In addition to convergence time, the time complexity of EDAs can be measured
by the first hitting time. The first hitting time of cGA with population size 2 is
analyzed in [17] by employing drift analysis and Chernoff bounds on linear pseudo-
boolean functions. On the pseudo-boolean injective function, the worst-case mean
exponential first hitting time in the problem size are proved for four commonly used
EDAs using the analytical Markov chain framework [21].
In [13], a classification of problem hardness for EDAs and the corresponding prob-
ability conditions are proposed based on the first hitting time measure. An approach
to analyzing the first hitting time for EDAs with finite population was introduced,
which is implemented on UMDA with truncation selection using discrete dynamic
systems and Chernoff bounds on two unimodal problems.
For EDAs, theoretical results on convergence are available based on infinite pop-
ulation assumption [12,22,38,64].
In consideration of the premature convergence phenomenon, the dynamics of
EDAs are analyzed in [55] in terms of Markov chains and general EDAs cannot
satisfy two necessary conditions for being effective search algorithms. In the case of
UMDA, the global optimum is found only if the population size is sufficiently large.
When the initial configuration is fixed and the learning rate is close to zero,
a unified convergence behavior of PBIL is presented in [30] based on the weak
convergence property of PBIL, and the results are further generalized to the case
when the individuals are randomly selected from the population.

7.7 Other EDAs

Traditional EDAs have difficulties in solving higher-dimensional problems because of the curse of dimensionality and rapidly increasing computational costs. EDA with
model complexity control [16] scales up continuous EDAs. By employing weakly
dependent variable identification and subspace modeling, it significantly outperforms
traditional EDAs on high-dimensional problems. Moreover, the computational cost
and the requirement of large population sizes can be reduced.

Several EDAs based on the multivariate Gaussian distribution have been proposed, such as EMNAglobal [28], normal IDEA [8,9], and EGNA [28]. EMNAglobal adopts
a conventional maximum likelihood estimated multivariate Gaussian distribution.
In normal IDEA and EGNA, after obtaining the maximum likelihood estimation
of mean and deviation, a Bayesian factorization (i.e., a Gaussian network) is con-
structed, usually by greedy search. Since these EDAs are essentially based on the
same multivariate Gaussian distribution, their performances are similar. EDAs adopt-
ing Gaussian mixture distribution [3,19] have been proposed for solving multimodal
and hard deceptive problems.
CMA-ES could be regarded as an EDA, which considers the solution landscape
as a probability density function space and uses the population to estimate that
probability distribution.
Stochastic GA [58] employs a stochastic coding strategy. The search space is
explored region by region. Regions are dynamically created using a stochastic
method. In each region, a number of children are produced through random sam-
pling, and the best child is chosen to represent the region. The variance values are
decreased if at least one of five generated children results in improved fitness; other-
wise, they are increased. Stochastic GA codes each chromosome as a representative
of a stochastic region described by a multivariate Gaussian distribution rather than
a single candidate solution. On average, the computational cost is significantly less
than that of the other algorithms.
Edge histogram-based sampling algorithm [59] and node histogram-based sam-
pling algorithm [60] are EDAs specifically designed for permutation-based problems.
Mallows EDA [10] applies a probabilistic model that estimates an explicit probability
distribution in the domain of permutations. The Mallows model is a distance-based
exponential probabilistic model considered analogous to the Gaussian probability
distribution over the space of permutations. Mallows EDA is able to outperform
edge histogram-based sampling algorithm and node histogram-based sampling algo-
rithm for the permutation flow shop problem with the makespan criterion [10]. In
[11], a general EDA based on the generalized Mallows model is introduced to deal
with permutation-based optimization problems. It consists of EDA and a variable
neighborhood search.
Variation operators in EAs directly use the location information of the locally opti-
mal solutions found so far. The offspring thus produced are close to their parents.
On the other hand, EDAs use the global statistical information to sample offspring.
EA with guided mutation [65] is a hybrid of EA and EDA. Originally developed
for discrete optimization problems, it is also suitable for continuous optimization
problems. The algorithm is scalable. The algorithm flowchart is similar to that of
PBIL except that the offspring for the next generation are produced using guided
mutation operator. According to guided mutation rate β, the operator samples new
offspring by copying the location information either from the parent or from the
probability vector p; with a larger value of β, more genes of the offspring are sam-
pled from the probability vector. The algorithm has the control parameters: learning
rate λ, guided-mutation rate β, and population size [65]. In [27], on the CEC-2010
benchmark functions for large-scale global optimization, EA with guided mutation

outperforms PBIL in solution quality, but at a higher computational cost; its perfor-
mance is comparable to that of MA-SW-Chains [33], the winner of CEC’2010.
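A minimal sketch of the guided mutation operator for bit strings, as described above, is shown below; the parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def guided_mutation(parent, p, beta=0.5):
    """Generate an offspring: with probability beta a gene is sampled from the
    probability vector p, otherwise it is copied directly from the parent."""
    from_model = rng.random(len(parent)) < beta
    sampled = (rng.random(len(parent)) < p).astype(int)
    return np.where(from_model, sampled, parent)

parent = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p = np.full(8, 0.7)
print(guided_mutation(parent, p, beta=0.3))
```

With a larger beta, more genes are drawn from the probability vector and the offspring relies more on the global statistical information; with a smaller beta, it stays closer to its parent.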
Source code of various EDAs can be downloaded from the following sources: extended cGA [23] (C++), BOA (C++), and BOA with decision graphs (http://www-illigal.ge.uiuc.edu); adaptive mixed BOA (http://jiri.ocenasek.com); real-coded BOA (http://www.evolution.re.kr); naive multiobjective mixture-based IDEA, normal IDEA-induced chromosome elements exchanger, and normal IDEA (http://homepages.cwi.nl/~bosman). There are also Java applets for several real-valued and permutation EDAs (http://www2.hannan-u.ac.jp/~tsutsui/research-e.html).

7.7.1 Probabilistic Model Building GP

Probabilistic model building GP can be broadly classified into algorithms based on a prototype tree, and those based on grammar-guided GPs.
Examples of prototype tree-based approach are probabilistic incremental pro-
gram evolution (PIPE) [50], estimation of distribution programming [62], extended
compact GP [51], BOA programming [31], and program optimization with linkage
estimation (POLE) [25]. The prototype tree-based approach is easy to apply. For
example, PIPE extends PBIL to program evolution, while extended compact GP
combined extended compact GA [23] with GP. PIPE [50] combines probability vec-
tor coding of program instructions, PBIL, and tree-coded programs like those used
in some variants of GP. POLE [25] is a program evolution algorithm employing a
Bayesian network for generating new individuals. This approach employs a special
chromosome called the expanded parse tree, which significantly reduces the size
of the conditional probability table. However, there are two problems pertinent to
program evolution applications: the number of symbols and the syntactic correctness.
Examples of context-free grammar-based approach are stochastic grammar-based
GP [47], program evolution with explicit learning [53], and grammar model-based
program evolution [54].

Problems

7.1 Given a uniform random variable in (0, 1), find the function y(x) with the pdf
$$p(y) = \begin{cases} 3a, & \text{if } 0 < y < 3/4,\\ a, & \text{if } 3/4 < y < 1. \end{cases}$$
Solve for a so that p(y) is a valid pdf.
7.2 Plot Ackley function of two variables.
7.3 Write the algorithmic flowchart of PBIL.
7.4 Use PBIL to minimize the 10-dimensional Ackley function, using eight bits per
dimension. Run for 30 generations using N_P = 200, α = 0.005, and μ = 2.
7.5 Write the algorithmic flowchart of cGA.
7.6 Download Mateda-2.0 package and learn by running the examples. Then use it
for optimizing a general benchmark in the Appendix.

References
1. Ahn CW, Goldberg DE, Ramakrishna RS. Real-coded Bayesian optimization algorithm: bring-
ing the strength of BOA into the continuous world. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Seattle, WA, USA, June 2004. p. 840–851.
2. Ahn CW, Ramakrishna RS. Elitism based compact genetic algorithms. IEEE Trans Evol Com-
put. 2003;7(4):367–85.
3. Ahn CW, Ramakrishna RS. On the scalability of real-coded Bayesian optimization algorithm.
IEEE Trans Evol Comput. 2008;12(3):307–22.
4. Baluja S. Population-based incremental learning: a method for integrating genetic search based
function optimization and competitive learning. Technical Report CMU-CS-94-163, Computer
Science Department, Carnegie Mellon University, Pittsburgh, PA, 1994.
5. Baluja S, Caruana R. Removing the genetics from the standard genetic algorithm. In: Prieditis
A, Russel S, editors. Proceedings of the 12th international conference on machine learning.
San Mateo, CA: Morgan Kaufmann; 1995. p. 38–46.
6. Baluja S, Davies S. Fast probabilistic modeling for combinatorial optimization. In: Proceedings
of the 15th national conference on artificial intelligence (AAAI-98), Madison, WI, 1998. p.
469–476.
7. Bosman PAN, Thierens D. An algorithmic framework for density estimation based evolutionary
algorithms. Technical Report UU-CS-1999-46, Utrecht University, 1999.
8. Bosman PAN, Thierens D. Expanding from discrete to continuous estimation of distribution
algorithms: The IDEA. In: Proceedings of parallel problem solving from nature (PPSN VI),
vol. 1917 of Lecture Notes in Computer Science. Springer: Berlin; 2000. p. 767–776.
9. Bosman PAN, Thierens D. Advancing continuous IDEAs with mixture distributions and factor-
ization selection metrics. In: Proceedings of genetic and evolutionary computation conference
(GECCO-2001). San Francisco, CA; 2001. p. 208–212.
10. Ceberio J, Mendiburu A, Lozano JA. Introducing the Mallows model on estimation of distribu-
tion algorithms. In: Proceedings of international conference on neural information processing
(ICONIP), Shanghai, China, Nov 2011. p. 461–470.
11. Ceberio J, Irurozki E, Mendiburu A, Lozano JA. A distance-based ranking model estimation
of distribution algorithm for the flowshop scheduling problem. IEEE Trans Evol Comput.
2014;18(2):286–300.
12. Chen T, Tang K, Chen G, Yao X. On the analysis of average time complexity of estimation of
distribution algorithms. In: Proceedings of IEEE congress on evolutionary computation (CEC),
Singapore, Sept 2007. p. 453–460.
13. Chen T, Tang K, Chen G, Yao X. Analysis of computational time of simple estimation of
distribution algorithms. IEEE Trans Evol Comput. 2010;14(1):1–22.
14. Cordon O, de Viana IF, Herrera F, Moreno L. A new ACO model integrating evolutionary com-
putation concepts: the best-worst ant system. In: Proceedings of second international workshop
ant algorithms (ANTS’2000): from ant colonies to artificial ants, Brussels, Belgium, 2000.
p. 22–29.
15. de Bonet JS, Isbell Jr CL, Viola P. MIMIC: finding optima by estimating probability densities.
In: Mozer MC, Jordan MI, Petsche T. editors, Advances in neural information processing
systems, vol. 9. Cambridge, MA: MIT Press; 1997. p. 424–424.
16. Dong W, Chen T, Tino P, Yao X. Scaling up estimation of distribution algorithms for continuous
optimization. IEEE Trans Evol Comput. 2013;17(6):797–822.
17. Droste S. A rigorous analysis of the compact genetic algorithm for linear functions. Nat Comput.
2006;5(3):257–83.
18. Etxeberria R, Larranaga P. Global optimization using Bayesian networks. In: Proceedings of
2nd symposium on artificial intelligence (CIMAF-99), Habana, Cuba, 1999. p. 332–339.

19. Gallagher M, Frean M, Downs T. Real-valued evolutionary optimization using a flexible prob-
ability density estimator. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Orlando, Florida, July 1999. p. 840–846.
20. Gallagher JC, Vigraham S, Kramer G. A family of compact genetic algorithms for intrinsic
evolvable hardware. IEEE Trans Evol Comput. 2004;8:111–26.
21. Gonzalez C. Contributions on theoretical aspects of estimation of distribution algorithms.
Doctoral Dissertation, Department of Computer Science and Artificial Intelligence, University
of Basque Country, Donostia, San Sebastian, Spain, 2005.
22. Gonzalez C, Lozano JA, Larranaga P. Analyzing the PBIL algorithm by means of discrete
dynamical systems. Complex Syst. 2000;12(4):465–79.
23. Harik G. Linkage learning via probabilistic modeling in the ECGA. Berlin, Germany: Springer;
1999.
24. Harik GR, Lobo FG, Goldberg DE. The compact genetic algorithm. IEEE Trans Evol Comput.
1999;3(4):287–97.
25. Hasegawa Y, Iba H. A Bayesian network approach to program generation. IEEE Trans Evol
Comput. 2008;12(6):750–63.
26. Hohfeld M, Rudolph, G. Towards a theory of population-based incremental learning. In: Pro-
ceedings of the 4th IEEE conference on evolutionary computation, Indianapolis, IN, 1997. p.
1–5.
27. Khan IH. A comparative study of EAG and PBIL on large-scale global optimization problems.
Appl Comput Intell Soft Comput. 2014; Article ID 182973:10 p.
28. Larranaga P, Lozano JA, editors. Estimation of distribution algorithms: a new tool for evolu-
tionary computation. Norwell, MA: Kluwer Academic Press; 2001.
29. Larranaga P, Lozano JA, Bengoetxea E. Estimation of distribution algorithms based on multi-
variate normal and gaussian networks. Department of Computer Science and Artificial Intelli-
gence, University of Basque Country, Vizcaya, Spain, Technical Report KZZA-1K-1-01, 2001.
30. Li H, Kwong S, Hong Y. The convergence analysis and specification of the population-based
incremental learning algorithm. Neurocomputing. 2011;74:1868–73.
31. Looks M, Goertzel B, Pennachin C. Learning computer programs with the Bayesian optimiza-
tion algorithm. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Washington, DC, 2005, vol. 2, p. 747–748.
32. Mininno E, Cupertino F, Naso D. Real-valued compact genetic algorithms for embedded micro-
controller optimization. IEEE Trans Evol Comput. 2008;12(2):203–19.
33. Molina D, Lozano M, Herrera F. MA-SW-Chains: memetic algorithm based on local search
chains for large scale continuous global optimization. In: Proceedings of the IEEE world
congress on computational intelligence (WCCI’10), Barcelona, Spain, July 2010, p. 1–8.
34. Monmarche N, Ramat E, Dromel G, Slimane M, Venturini G. On the Similarities between
AS, BSC and PBIL: toward the birth of a new meta-heuristic. Technical Report 215, Ecole
d’Ingenieurs en Informatique pour l’Industrie (E3i), Universite de Tours, France, 1999.
35. Muhlenbein H. The equation for response to selection and its use for prediction. Evol Comput.
1998;5:303–46.
36. Muhlenbein H, Mahnig T. FDA—a scalable evolutionary algorithm for the optimization of
additively decomposed function. Evol Comput. 1999;7(4):353–76.
37. Muhlenbein H, Paab G. From recombination of genes to the estimation of distributions. I.
Binary parameters. In: Voigt H-M, Ebeling W, Rechenberg I, Schwefel H-P. editors, Parallel
problem solving from nature (PPSN IV), Lecture Notes in Computer Science 1141. Berlin:
Springer; 1996. p. 178–187.
38. Muhlenbein H, Schlierkamp-Voosen D. Predictive models for the breeder genetic algorithm,
i: continuous parameter optimization. Evol Comput. 1993;1(1):25–49.
39. Muhlenbein H, Mahnig T, Rodriguez AO. Schemata, distributions, and graphical models in
evolutionary optimization. J Heuristics. 1999;5(2):215–47.

40. Ocenasek J, Schwarz J. Estimation of distribution algorithm for mixed continuous-discrete opti-
mization problems. In: Proceedings of the 2nd euro-international symposium on computational
intelligence, Kosice, Slovakia, 2002. p. 115–120.
41. Pelikan M. Bayesian optimization algorithm: from single level to hierarchy. PhD thesis, Uni-
versity of Illinois at Urbana-Champaign, Urbana, IL, 2002. Also IlliGAL Report No. 2002023.
42. Pelikan M, Goldberg DE, Cantu-Paz E. BOA: the Bayesian optimization algorithm. In: Pro-
ceedings of genetic and evolutionary computation conference, Orlando, FL, 1999. p. 525–532.
43. Pelikan M, Muhlenbein H. The bivariate marginal distribution algorithm. In: Roy R, Furuhashi
T, Chawdhry PK. editors, Advances in soft computing: engineering design and manufacturing.
London, U.K.: Springer; 1999. p. 521–53.
44. Pena JM, Lozano JA, Larranaga P. Globally multimodal problem optimization via an estimation
of distribution algorithm based on unsupervised learning of Bayesian networks. Evol Comput.
2005;13(1):43–66.
45. Rastegar R. On the optimal convergence probability of univariate estimation of distribution
algorithms. Evol Comput. 2011;19(2):225–48.
46. Rastegar R, Hariri A. A step forward in studying the compact genetic algorithm. Evol Comput.
2006;14(3):277–89.
47. Ratle A, Sebag M. Avoiding the bloat with probabilistic grammar-guided genetic programming.
In: Proceedings of the 5th international conference on artificial evolution, Creusot, France,
2001. p. 255–266.
48. Rastegar R, Hariri A. The population-based incremental learning algorithm converges to local
optima. Neurocomputing. 2006;69:1772–5.
49. Rudlof S, Koppen M. Stochastic hill climbing with learning by vectors of normal distributions.
In: Furuhashi T, editor. Proceedings of the 1st Online Workshop on Soft Computing (WSC1).
Nagoya, Japan: Nagoya University; 1996. p. 60–70.
50. Salustowicz R, Schmidhuber J. Probabilistic incremental program evolution. Evol. Comput.
1997;5(2):123–41.
51. Sastry K, Goldberg DE. Probabilistic model building and competent genetic programming. In:
Riolo RL, Worzel B, editors. Genetic programming theory and practice, ch. 13. Norwell, MA:
Kluwer; 2003. p. 205–220.
52. Sebag M, Ducoulombier A. Extending population–based incremental learning to continuous
search spaces. In: Eiben AE et al, editors. Parallel problem solving from nature (PPSN) V.
Berlin: Springer; 1998. p. 418–427.
53. Shan Y, McKay RI, Abbass HA, Essam D. Program evolution with explicit learning: A new
framework for program automatic synthesis. In: Proceedings of 2003 congress on evolutionary
computation (CEC), Canberra, Australia, 2003. p. 1639–1646.
54. Shan Y, McKay RI, Baxter R, Abbass H, Essam D, Hoai NX. Grammar model-based program
evolution. In: Proceedings of 2004 IEEE congress on evolutionary computation, Portland, OR,
2004. p. 478–485.
55. Shapiro JL. Drift and scaling in estimation of distribution algorithms. Evol Comput.
2005;13(1):99–123.
56. Sun J, Zhang Q, Tsang E. DE/EDA: a new evolutionary algorithm for global optimization. Inf
Sci. 2005;169:249–62.
57. Syswerda G. Simulated crossover in genetic algorithms. In: Whitley DL, editor. Foundations
of genetic algorithms 2. San Mateo, CA: Morgan Kaufmann; 1993. p. 239–255.
58. Tu Z, Lu Y. A robust stochastic genetic algorithm (StGA) for global numerical optimization.
IEEE Trans Evol Comput. 2004;8(5):456–70.
59. Tsutsui S. Probabilistic model-building genetic algorithms in permutation representation
domain using edge histogram. In: Proceedings of the 7th international conference on parallel
problem solving from nature (PPSN VII), Granada, Spain, September 2002. p. 224–233.
60. Tsutsui S. Node histogram vs. edge histogram: a comparison of probabilistic model-building
genetic algorithms in permutation domains. In: Proceedings of IEEE congress on evolutionary
computation (CEC), Vancouver, BC, Canada, July 2006. p. 1939–1946.

61. Ventresca M, Tizhoosh H. A diversity maintaining population-based incremental learning algorithm. Inf Sci. 2008;178:4038–56.
62. Yanai K, Iba H. Estimation of distribution programming based on Bayesian network. In: Pro-
ceedings of 2003 congress on evolutionary computation (CEC), Canberra, Australia, 2003. p.
1618–1625.
63. Zhang Q. On stability of fixed points of limit models of univariate marginal distribution algo-
rithm and factorized distribution algorithm. IEEE Trans Evol Comput. 2004;8(1):80–93.
64. Zhang Q, Muhlenbein H. On the convergence of a class of estimation of distribution algorithms.
IEEE Trans Evol Comput. 2004;8(2):127–36.
65. Zhang Q, Sun J, Tsang E. An evolutionary algorithm with guided mutation for the maximum
clique problem. IEEE Trans Evol Comput. 2005;9(2):192–200.
8 Topics in Evolutionary Algorithms

This chapter continues to introduce topics on EAs. Convergence of EAs is first analyzed by using the schema theorem and building-block hypothesis, and then by using
finite and infinite population models. Various parallel implementations of EAs are
then described in detail. Some other associated topics including coevolution and
fitness approximation are finally introduced.

8.1 Convergence of Evolutionary Algorithms

The behavior of EAs is often analyzed by using the schema-based approach [51],
Markov chain models [79], and infinite population models [91].

8.1.1 Schema Theorem and Building-Block Hypothesis

Schema Theorem
The two most important theoretical foundations of GA are Holland’s schema theorem
[50] and Goldberg’s building-block hypothesis [40]. The convergence analysis of
simple GA is based on the concept of schema [50]. A schema is a bit pattern that
functions as a set of binary strings.
A schema is a similarity template describing a subset of strings with the same bits
(0 or 1) at certain positions. A schema h = (h_1, h_2, ..., h_l) is defined as a ternary string of length l, where h_i ∈ {0, 1, ∗}, with ∗ denoting the do-not-care symbol. The
size or order o(h) of a schema h is defined as the number of fixed positions (0s or
1s) in the string. A position in a schema is fixed if there is either a 0 or a 1 in this
position. The defining length δ(h) of a schema h is defined as the maximum distance
between any two fixed bits. The fitness of a schema is defined as the average fitness


of all instances of this schema:


$$f(h) = \frac{1}{\|h\|}\sum_{x\in\{h\}} f(x), \qquad (8.1)$$
where ‖h‖ is the number of individuals x that are instances of the schema h.
The instances of a schema h are all genotypes with x^g ∈ {h}. For example, x^g = 01101 and x^g = 01100 are instances of h = 0∗1∗∗. The number of individuals that are instances of a schema h can be calculated as 2^{l−o(h)}.
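These definitions can be illustrated with a short Python sketch that computes the order, the defining length, and instance membership of a schema (an illustrative helper using string notation, with the do-not-care symbol written as '*'):

```python
def order(h):
    """Order o(h): the number of fixed (non-*) positions in the schema."""
    return sum(c != '*' for c in h)

def defining_length(h):
    """Defining length delta(h): the maximum distance between two fixed positions."""
    fixed = [i for i, c in enumerate(h) if c != '*']
    return fixed[-1] - fixed[0] if len(fixed) >= 2 else 0

def is_instance(x, h):
    """True if the bit string x matches the schema h at every fixed position."""
    return all(hc == '*' or hc == xc for xc, hc in zip(x, h))

h = '0*1**'
print(order(h), defining_length(h))                      # 2 2
print(is_instance('01101', h), is_instance('01100', h))  # True True
print(2 ** (len(h) - order(h)))                          # number of instances: 8
```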


The combined effect of selection, crossover, and mutation gives the reproductive
schema growth inequality [50]
 
$$m(h, t+1) \ge m(h, t)\cdot\frac{f(h)}{\bar{f}(t)}\left[1 - P_c\,\frac{\delta(h)}{l-1} - o(h)P_m\right], \qquad (8.2)$$
where m(h, t) is the number of examples of a particular schema h within a population at generation t, and f̄(t) is the average fitness of the whole population at generation t.
The schema theorem [40,50] states that in the long run the best bit patterns will
dominate the whole population. The schema theorem asserts that the proportions of
the better schemata to the overall population increases as the generation progresses
and eventually the search converges to the best solution with respect to the opti-
mization function [40]. The schema theorem, given in Theorem 8.1, can be readily
derived from (8.2) [50].

Theorem 8.1 (Schema Theorem). Above-average schemata with short defining


length and low order will receive exponentially increasing trials in subsequent gen-
erations of a GA.

The schema theory for GA [50,51] aims to predict the expected numbers of
solutions in a given schema (a subset of the search space) at the next generation,
in terms of quantities measured at the current generation. According to the schema
theorem, schemata with high fitness and small defining lengths grow exponentially
with time. Thus, GA simultaneously processes a large number of schemata. For a
population of N_P individuals, GA implicitly evaluates approximately N_P^3 schemata
in one generation [40]. This is called implicit parallelism. The theorem holds for all
schemata represented in the population.
The schema theorem works for GP as well, based on the idea of defining a schema
as the subspace of all trees that contain a predefined set of subtrees [59,82]. A schema
theorem for GP was derived in the presence of fitness-proportionate selection and
crossover in [82].
The exact schema theorems for GA and GP have been derived for exactly predict-
ing the expected characteristics of the population at the next generation [85,107]. The
schema theorem based on the concept of effective fitness [107] shows that schemata
of higher than average effective fitness receive an exponentially increasing number
of trials over time. However, generically there is no preference for short, low-order
schemata [107]. Based on the theory proposed in [107], a macroscopic exact schema

theorem for GP with one-point crossover is provided in [85]. These schema theorems
have also been written for standard GP with subtree crossover [87,88].
A simpler definition of the schema of GP given in [86] is close to the original
concept of schema in GA. Along with one-point crossover and point mutation, this
concept of schema has been used to derive an improved schema theorem for GP that
describes the propagation of schemata from one generation to the next [86].
An exact microscopic model for the dynamics of a GA with generalized recom-
bination is presented in [106]. It is shown that the schema dynamics have the same
functional form as that of strings and a corresponding exact schema theorem is
derived.
However, the schema theorem has attracted considerable criticism. The schema growth
inequality provides a lower bound for a one-generation transition of GA. Over multiple
generations, the predictions of the schema theorem may be useless or misleading due to the
inexactness of the inequality [43].
Building-Block Hypothesis
The building-block hypothesis [40] is the assumption that strings with high fitness can
be located by sampling building blocks with high fitness and combining these building
blocks effectively. This is given in Theorem 8.2.

Theorem 8.2 (Building-block Hypothesis). A GA seeks near-optimal performance
by the juxtaposition of short, low-order, and highly fit schemata, called building
blocks.

The building-block hypothesis explains how GA operates. It indicates that
successively better solutions are generated by combining useful parts of existing
solutions. Schemata are viewed as building blocks that may be useful for constructing
complete solutions.
It is argued in [13] that neither the building block hypothesis nor the notion of
implicit parallelism explains the reason that makes a GA a function optimizer. An
explanation is derived in [13] from basic EA principles derived from the ES theory.
The principles studied are general, and hold for all EAs, including GAs.
Crossover is beneficial because it can capitalize on mutations that have both ben-
eficial and disruptive effects on building blocks: crossover is able to repair the dis-
ruptive effects of mutation in later generations [109]. Compared to mutation-based
EAs, this makes multi-bit mutations more useful.

8.1.2 Finite and Infinite Population Models

Many attempts have been made at characterizing the dynamics of EAs. This helps
in understanding the conditions for EAs to converge to the global optimum.
Markov chains are widely used mathematical models for the theoretical analysis of
EAs [21,28,79,96,97]. An EA is characterized as a Markov chain with the current
population being the state variables, because the state of the (t + 1)th generation
often depends only on the tth generation. Convergence is analyzed in the sense of

probability. An EA with elitist selection strategy can be modeled by an absorbing
Markov chain. Such an exact approach has been successfully applied to EAs with
finite population for some typical examples.
In [28], a Markov-chain analysis was conducted for a population of one-locus
binary genes to reach different levels of convergence in an expected number of
generations under random selection. In [79], simple GA has been analyzed in the
form of a Markov chain, and the trajectory followed by finite populations is related
to the evolutionary path predicted by the infinite population model. In [33], a Markov
chain model is used to describe the evolution of abstract GA, which generalizes and
unifies GA and SA. By setting the mutation probability to zero, the relationships between
premature convergence and the effects of GA parameters were analyzed using Markov
chains in [66].
By using a homogeneous Markov chain, simple GA with proportional selection
is shown never to converge to the global optimum, while its elitist variants will
eventually converge to the global optimal solution [96]. If the mutation rate is nonzero,
GA will eventually converge in the sense that it will visit the global optimum in finite
time with probability one [97]. The convergence rates of EAs are investigated in [49].
In [52], Markov chains are used to analyze the stochastic effects of the niching
operator of a niched GA; absorbing and ergodic Markov chain models were used
to estimate the convergence of the niched GA.
It was proven in [98] that elitist EAs with a self-adaptation mechanism resembling
(1,1)-ES will get caught by non-global optima with positive probability even under
an infinite time horizon.
The crossover operator makes theoretical analyses of EAs difficult. It can be useful
only if the current population has a certain diversity. In [54], it is proved that an EA
can produce enough diversity such that the use of crossover reduces the expected
optimization time from superpolynomial to a polynomial of small degree.
Drift analysis draws properties of a stochastic process from its mean drift, and has
been used to study properties of the general Markov chain. Drift analysis has been
applied to estimate the mean first hitting time of EAs [46,47]. The first hitting time
is defined as the first time for a stochastic optimization algorithm to reach the global
optimum. In [105], a Markov chain analysis has been made to model the expected
time for a single member of the optimal class to take over finite populations in the
case of different replacement strategies. Other methods for estimating the first hitting
time are Chernoff bounds [31,53], and convergence rate [126].
The infinite population model assumes an infinite population size [91,115,118]. As a
result, the process of an EA is modeled by a deterministic nonlinear mapping. Such
models are often represented by deterministic dynamic systems, and the analysis
becomes easier. However, an upper bound on the error between the actual EA and its
model is not easily estimated. The behavior of an EA with a large population can be
approximated by that of the deterministic dynamic system. An infinite population
model introduced in [115] provides a complete model as to how all strings in the
search space are processed by simple GA. An infinite population model of simple
GA for permutation-based representations has been developed in [118]. In [125],
lower and upper bounds for the expected probability of the global optimal solution

are derived based on the infinite population model under proportional selection and
uniform crossover but no elitist selection. The result is then extended to the finite
population model.
A rigorous runtime analysis of a nonelitist-based EA with linear ranking selection
is presented by using an analytical tool called multi-type branching processes in [65].
The results point out situations where a correct balance between selection pressure
and mutation rate is essential for finding the optimal solution in polynomial time.
Building on known results on the performance of the (1+1) EA, an analysis of the
performance of the (1 + λ) EA has been presented for different offspring population
size λ [53]. A simple way is suggested to dynamically adapt this parameter when
necessary.
In [108], a method for establishing lower bounds on the expected running time of
EAs is presented. It is based on fitness-level partitions and an additional condition
on transition probabilities between fitness levels. It yields exact or near-exact lower
bounds for all functions with a unique optimum.

8.2 Random Problems and Deceptive Functions

Example 8.1: The needle-in-a-haystack problem is to find a needle in a haystack.
The problem can be formalized by assuming a discrete search space X and the
objective function

    f(x) = \begin{cases} 0 & \text{if } x \ne x_{opt} \\ 1 & \text{if } x = x_{opt} \end{cases}.
This function is illustrated in Figure 8.1.
When physically searching in a haystack for a needle, there is no good strategy
for choosing promising areas of the haystack. This is a random search problem. No
method can outperform random search. The complexity for solving this problem
increases linearly with the size of the search space, |X|.

Example 8.2: A deceptive function, plotted in Figure 8.2, is given by

    f(x) = \begin{cases} 500 & \text{if } x = 0 \\ 2x & \text{if } x \in (0, 100] \end{cases}.

For this problem, the optimal solution is x^* = 0 with f(x^*) = 500. The solution
x = 100 is a deceptive attractor: guided search methods that move in the direction
of increasing objective value are always led to x = 100, which is not the
optimal solution.

For the above two problems, guided search methods perform worse than many
other methods, since the fitness landscape leads the search method away from the

Figure 8.1 Illustration of the needle-in-a-haystack problem.

Figure 8.2 Illustration of a deceptive function.

optimal solution. For these problems, random search is likely to be the most
efficient approach.
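The contrast between guided search and random search on such a landscape can be demonstrated in a few lines. The following sketch (illustrative only; the starting point and evaluation budget are arbitrary choices) applies greedy hill-climbing and pure random search to the deceptive function of Example 8.2 over the integers 0..100.

```python
# Greedy hill-climbing is drawn to the deceptive attractor x = 100,
# while random search can stumble on the true optimum x = 0.
import random

def f(x):
    return 500 if x == 0 else 2 * x   # optimum at x = 0, deceptive attractor at x = 100

def hill_climb(x=50, steps=200):
    for _ in range(steps):
        neighbors = [x + d for d in (-1, 1) if 0 <= x + d <= 100]
        best = max(neighbors, key=f)
        if f(best) <= f(x):
            break
        x = best
    return x

def random_search(evals=200):
    return max((random.randint(0, 100) for _ in range(evals)), key=f)

print(hill_climb())      # ends at the deceptive attractor 100
print(random_search())   # returns 0 whenever x = 0 is sampled at least once
```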
GA-deceptive functions are a class of functions where low-order building blocks
are misleading, and their combinations cannot generate higher order building blocks.
Deceptive problems remain hard problems. Due to deceptive problems, the
building-block hypothesis has faced strong criticism [43]. When the global optimum
is surrounded by a region of the landscape with low average payoff, it is highly
unlikely to be found by GA, and thus, GA may converge to a suboptimal
solution. For deceptive functions, the fitness of an individual of the population is not
correlated to the expected ability of its representational components. Messy GA [41]
was specifically designed to handle bounded deceptive problems. In [43], the static
building-block hypothesis was proposed as the underlying assumption for defining
deception, and augmented GAs for deceptive problems were also proposed.
Through deception, objective functions may actually prevent the objective from
being reached. Objective functions themselves may actively misdirect search toward

dead ends. Novelty search [64] circumvents deception and also yields a perspective
on open-ended evolution. It simply explores the search space by searching for
behavioral novelty and ignoring the objective, even in an objective-based problem.
In the maze navigation and biped walking tasks, novelty search significantly outper-
forms objective-based search.

8.3 Parallel Evolutionary Algorithms

Synchronism/asynchronism and homogeneity/heterogeneity describe the properties
of distributed algorithms. The speedup, distributed efficiency, fault-tolerance, and
scalability are performance metrics for evaluating distributed algorithms.
The synchronism/asynchronism issue characterizes the communications among
processors. If all communications are controlled by a clock signal, the algorithm
is said to be synchronous; otherwise it is asynchronous. In an asynchronous
distributed algorithm, communication is driven by the data.
Homogeneity/heterogeneity describes whether the evolution tasks on different
processors are of the same settings. In a homogeneous mode, each processor adopts
the same algorithmic settings, whereas in a heterogeneous mode, the local algorith-
mic settings vary.
A distributed algorithm is qualified by a speedup measure, which is the ratio of
sequential execution time to parallel execution time of the algorithm. Ideally, the
speedup is equal to the number of processors being used. Accordingly, distributed
efficiency is defined as the ratio of speedup to the number of processors and its ideal
value is 100%. In practice, the speedup and efficiency of distributed algorithms may
be limited by the computational overhead, the most loaded node, and the communi-
cation speed between processors.
Fault-tolerance measures the ability of a distributed algorithm to continue opti-
mization in the condition of some physical components failing.
The scalability of a distributed algorithm involves size scalability and task scala-
bility. Size scalability refers to the ability to achieve proportionally increased perfor-
mance by increasing the number of processors. Task scalability refers to the ability
to adapt to the changes in the problem scale, e.g., to retain its efficiency when the
problem dimension increases.
Most EAs use a single population (panmixia) of individuals and apply the opera-
tors on them as a whole. Conversely, there exist structured EAs, in which the popu-
lation is decentralized somehow.
Distributed EAs and models can be classified into two groups according to their
task division mechanism [42]. Population-distributed models are presented in the
form of global parallelized (master-slave), coarse-grained parallel (island), fine-
grained parallel (cellular), hierarchical, and pool architectures, which parallelize an
evolution task at population, individual, or operation levels. Dimension-distributed
models include coevolution and multiagent models, which focus on dimension
reduction.

In general, distributed EAs are capable of higher quality solutions than panmictic
EAs due to better diversity.

• Global parallelized EAs implement EAs in master–slave parallel mode across a
cluster of computers, where the whole population can be kept in a master processor
that selects individuals for mating and sends them to slave processors for performing
other operations. This scheme could also overcome the drawbacks due to the
heterogeneous speed of the slave processors.
• Coarse-grained parallel EAs, also known as island model or multi-deme model, are
distributed EAs, where the population is divided into a few isolated subpopulations,
called demes or islands. Individuals can migrate from one deme to another. Each
deme is run on a processor.
• Fine-grained parallel EAs, also called cellular EAs or massively parallel EAs, par-
tition the population into many very small subpopulations, typically one individual
per deme. This technique is also called diffusion model, where the individuals mate
only with individuals in the local neighborhood. This technique is particularly
suitable for massively parallel computers with a fast local intercommunication
network.
• Pool-based methods are represented by cloud computing. A task is submitted to
the cloud, and the MapReduce infrastructure relieves the user of the implementation
details, so that the user needs to care only about the problem and the algorithm.
Cloud computing is well suited to building highly scalable and cost-effective
distributed EAs for solving problems with variable demands.
• Cooperative coevolution solves a problem by dividing it into subcomponents, based
on a divide-and-conquer strategy. A complete solution is obtained by assembling
best individual subcomponents from each of the species.

Figure 8.3 illustrates panmictic EA, master–slave EA, island EA, and cellular EA.
Various hierarchical EAs can be obtained by hybridizing these models, producing
such models as the island–master–slave hybrid, the island–cellular hybrid, and the
island–island hybrid.
Figure 8.4 illustrates pool-based EA. The pool is a shared global array of n tasks.
Each of the p processors processes a segment of size u.
Figure 8.5 illustrates coevolutionary EA. Each of the p processors handles one
dimension of the decision variable, and the final solution is obtained by assembling
these components. Each processor treats one variable as the primary variable, and
the other variables as secondary variables.
Scheduling in distributed systems, such as grid computing, is a challenging task in
terms of time. Energy saving is also a promising objective for meta-schedulers.
Energy consumption and execution time can be optimized simultaneously using
multiobjective optimization [9].

Figure 8.3 Parallel EA: panmictic, master–slave, island model, and cellular.

Figure 8.4 Pool-based EA: a shared pool of n tasks and p processors.

8.3.1 Master–Slave Model

The master–slave model summarizes a distributed approach to the EA operations
and domain evaluations. A commonly used method is to distribute not only the
evaluation tasks but also the individual update tasks to slave nodes. The master
performs crossover, mutation, and selection operations, but sends individuals to slaves
for fitness evaluations; there is no communication among the slaves. For problems whose

Figure 8.5 Coevolutionary EA.

evaluation costs are not relatively high, employing a master–slave model may become
inefficient in that communications occupy a large proportion of time.
Another approach is a coarse-grained master–slave model in which each slave
processor contains a subpopulation, while the master receives the best individual
from each slave and sends the global best information to all the slaves [122]. The master
conducts a basic EA for global search, whereas the slaves execute local search by
considering the individuals received from the master as neighborhood centers.
In a master–slave algorithm, synchronization plays a vital role in algorithm per-
formance on load-balanced problems, while asynchronous distributed EAs are more
efficient for load-imbalanced problems [102]. The speedup and efficiency of master–
slave distributed EAs may be limited by the master’s performance and by the com-
munication speed between the master and the slaves. In a master–slave model, with
increasing number of slave nodes, the speedup will eventually become poor when
the master saturates. The master–slave distributed EAs are fault-tolerant unless the
master node fails.
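A global parallelized (master–slave) EA is straightforward to prototype when fitness evaluation is the dominant cost. The sketch below is a minimal illustration rather than the book's implementation; the objective function, truncation selection, and the use of Python's multiprocessing pool of worker (slave) processes are assumptions made for brevity.

```python
# A minimal master-slave sketch: the master holds the population and applies
# selection and mutation, while fitness evaluations are farmed out to slaves.
import random
from multiprocessing import Pool

def fitness(x):                      # expensive evaluation done on a slave
    return -sum(v * v for v in x)    # example: maximize -||x||^2

def mutate(x, sigma=0.1):
    return [v + random.gauss(0, sigma) for v in x]

def evolve(pop_size=40, dim=10, generations=50, workers=4):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    with Pool(workers) as pool:                      # the slaves
        for _ in range(generations):
            fits = pool.map(fitness, pop)            # parallel fitness evaluation
            ranked = [x for _, x in sorted(zip(fits, pop), reverse=True)]
            parents = ranked[:pop_size // 2]         # truncation selection (master)
            pop = [mutate(random.choice(parents)) for _ in range(pop_size)]
    return max(pop, key=fitness)

if __name__ == "__main__":
    print(evolve())
```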

8.3.2 Island Model

The island model is a well-known way to parallelize EAs [4]. The population is
split into smaller subpopulations, which evolve independently for certain periods of
time and periodically exchange solutions through a process called migration. The
approach can execute an existing EA within each deme. To promote information
sharing, a migration mechanism allows some of the best individuals to be periodically
exported to other nodes according to a predefined topology.
Using coarse-grained parallelization can have several advantages. This approach
introduces very little overhead, compared to parallelizing function evaluations,

Figure 8.6 Parallel DE: unidirectional ring topology with master node.

because the amount of communication between different machines is very low. Fur-
thermore, the effort of managing a small population can be much lower than that of
managing a large, panmictic population, as some operations require time that grows
superlinearly with the population size. Also, a small population is more likely to fit
into a cache than a big one. For EA speedups, a linear speedup in the size of the
population or even a superlinear speedup have been reported [2,48], which means
that the total execution time across all machines may be even lower than that for
its sequential counterpart. Diversity is also an advantage, since the subpopulations
evolve independently for certain periods of time. An island distributed EA is often
synchronous, in that the best individual on each island propagates to all the other islands
at a specific interval of generations [127]. In asynchronous island models, an island
can receive migrated information as soon as it is ready.
A rigorous runtime analysis for island models is performed in [62]. A simple island
model with migration finds a global optimum in polynomial time, while panmictic
populations as well as island models without migration need exponential time, with
very high probability.
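A minimal island model can be sketched as follows; the objective function, the simple (mu+lambda)-style island EA, the ring topology, and the migration interval are all illustrative choices, not prescriptions from the text.

```python
# A minimal island-model sketch: several subpopulations evolve independently and
# periodically migrate their best individual to the next island on a ring.
import random

def fitness(x):
    return -sum(v * v for v in x)            # example objective (maximize)

def step(pop, sigma=0.1):
    """One generation of a simple (mu+lambda)-style EA on one island."""
    offspring = [[v + random.gauss(0, sigma) for v in random.choice(pop)]
                 for _ in range(len(pop))]
    merged = sorted(pop + offspring, key=fitness, reverse=True)
    return merged[:len(pop)]

def island_ea(n_islands=4, island_size=20, dim=10, generations=200, interval=10):
    islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                for _ in range(island_size)] for _ in range(n_islands)]
    for g in range(generations):
        islands = [step(pop) for pop in islands]
        if g % interval == 0:                          # migration along the ring
            for i, pop in enumerate(islands):
                migrant = max(pop, key=fitness)
                target = islands[(i + 1) % n_islands]
                target[-1] = list(migrant)             # replace the worst individual
    return max((max(pop, key=fitness) for pop in islands), key=fitness)

print(fitness(island_ea()))
```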
GENITOR II [117] is a coarse-grained parallel version of GENITOR. Individuals
migrate at fixed intervals to neighboring nodes, and immigrants replace the worst
individuals in the target deme. In an asynchronous parallel GA [74], each individual
of the population improves its fitness by hill-climbing.
In the parallel DE scheme [112], an entire subpopulation is mapped to a proces-
sor using island model, allowing different subpopulations to evolve independently
toward a solution. It is organized around one master node and m subpopulations,
each running on one node and organized as a unidirectional ring, as shown in
Figure 8.6. Migrating individuals pass through the master. This method
is improved in [116].
In religion-based EAs [113], individuals are allowed to move around and interact
with one another as long as they do not violate the religion membership rules. Mating
is prohibited among individuals of different religions and exchange of individuals
between religions is provided only via religious conversion. Briefly, the religious
rules include commitments to reproduce, to believe in no other religion and to con-
vert nonbelievers. Like other structured population GAs, genetic information is able
to spread slowly due to the spatial topology of the population model which restricts

mating to neighboring individuals only. In addition, religion-based EA also provides
flexible subpopulation sizes, self-adaptive control of migration, and spatial
neighborhood between subpopulations.
Motivated by real-world human community and social interactions are the social-
based GA [6] and human-community-based GA [5] models. Like the island model,
the problem space is divided into subgroups, each of which represents a community.
Mimicking the natural and social selection in human societies, recombination oper-
ation is restricted by genders, society, age, and social level, that is, higher probability
of recombination (marriage) if both parents are from the same society and social
level. Additionally, in human-community-based GA, family relations must be
maintained so that no incest occurs: two mating individuals must not share the same
parents and must be of different genders. Other interesting operators include the birth and the death
operators.
Multilevel cooperative GA [94] is based on the fact that evolution occurs at dif-
ferent levels in a population. The population is made up of subpopulations known
as groups and evolution occurs at individual level as well as group level. Individuals
within groups go through the normal process of GA reproduction. Occasional
interactions occur among cooperative groups in the form of information exchange. Meanwhile,
evolution takes the form of colonization, where the worst group is selected for extinction
and replaced by a colonist group.
A coarse-grained (island) GP model is given in [78]. Parallel distributed GP [84]
represents programs as directed graphs without using genotype–phenotype mapping,
and it uses sophisticated crossover and mutation to manipulate subgraphs.

8.3.3 Cellular EAs

Fine-grained parallel EAs [24,70,75,128] organize the population of chromosomes
as a two-dimensional square grid, with each grid point representing a chromosome, and
interactions among individuals are restricted to a set neighborhood. The processes
of selection and mating are confined to a local area. In cellular GAs, an individual
may only cooperate with its nearby neighbors in the breeding loop [70]. This local
reproduction has the effect of reducing selection pressure to achieve more exploration
of the search space.
same run, and is much more robust [24]. The dynamic-adaptive cellular EA [3] is
very effective for solving a diverse set of single-objective optimization problems. A
fine-grained (grid) model is given in [37].
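The following sketch illustrates the cellular (diffusion) model on a toroidal grid; the OneMax objective, the von Neumann neighborhood, and the synchronous update rule are illustrative assumptions rather than the models cited above.

```python
# A minimal cellular (fine-grained) GA sketch: individuals live on a 2-D
# toroidal grid and mate only within a von Neumann neighborhood.
import random

SIZE, DIM = 10, 8                                    # 10x10 grid of 8-bit strings

def fitness(x):
    return sum(x)                                    # example: OneMax

def neighbors(i, j):
    return [((i - 1) % SIZE, j), ((i + 1) % SIZE, j),
            (i, (j - 1) % SIZE), (i, (j + 1) % SIZE)]

grid = [[[random.randint(0, 1) for _ in range(DIM)] for _ in range(SIZE)]
        for _ in range(SIZE)]

for _ in range(100):                                 # synchronous updates
    new_grid = [[None] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        for j in range(SIZE):
            mate = max((grid[a][b] for a, b in neighbors(i, j)), key=fitness)
            cut = random.randrange(1, DIM)           # one-point crossover
            child = grid[i][j][:cut] + mate[cut:]
            if random.random() < 0.05:               # bit-flip mutation
                k = random.randrange(DIM)
                child[k] ^= 1
            # replace the resident only if the child is at least as fit
            new_grid[i][j] = max(grid[i][j], child, key=fitness)
    grid = new_grid

print(max(fitness(x) for row in grid for x in row))
```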
A cellular EA can also be either synchronous or asynchronous. In a synchronous
mode, all cells update their individuals simultaneously, whereas in an asynchronous
mode, the cells are updated one by one. In island and cellular models, the predefined
topology and the rigid connectivity restrict the amount of islands or cells to be used
and the spontaneous cooperation among the nodes. For island, cellular, and hierar-
chical models, failure of some processors will result in loss of some subpopulations
or individuals. The fault-tolerance is medium to high.

In multiagent GA [128], each agent represents a candidate solution, and has its
own purpose and behaviors and can also use knowledge. An agent interacts with its
neighbors by transferring information. In this manner, the information is diffused to
the whole agent lattice. Four evolutionary operators are designed: The neighborhood
competition operator and the neighborhood orthogonal crossover operator realize
the behaviors of competition and cooperation, respectively; the mutation operator
and the self-learning (local search) operator realize the behaviors of making use of
knowledge. Theoretical analysis shows that multiagent GA converges to the global
optimum. Multiagent GA can find high-quality solutions at a computational cost
better than a linear complexity. Similar ideas are implemented in multiagent EA for
constraint satisfaction problems [67] and in multiagent EA for COPs [68].
By analyzing the behavior of a three-dimensional cellular GA against different
grid shapes and selection rates to investigate their influence on the performance of
the algorithm, convergence-speed-guided three-dimensional cellular GA [7] dynam-
ically balances between exploration and exploitation processes. A diversity speed
measure is used to guide the algorithm.

8.3.4 Cooperative Coevolution

Cooperative coevolution has been introduced into EAs for solving increasingly com-
plex optimization problems through a divide-and-conquer paradigm. In the cooper-
ative coevolution model [89,90], each subcomponent is evolved in a genetically
isolated subpopulation (species). These species cohabit in an ecosystem where each
of them occupies a niche. These species collaborate with one another. Species are
evolved in separate instances of an EA executing in parallel. The individuals are eval-
uated in collaboration with the best individuals of the other species. Credit assign-
ment at the species level is defined in terms of the fitness of the complete solutions
in which the species members participate. The evolution of each species is handled
by a standard EA.
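A minimal cooperative-coevolution sketch is given below; the block decomposition, the simple truncation-plus-mutation species EA, and the use of the best members of the other species as collaborators follow the general scheme of [89,90], but all concrete choices are illustrative.

```python
# A minimal cooperative-coevolution sketch: the decision vector is split into
# blocks, each evolved by its own subpopulation (species); an individual is
# evaluated by inserting it into a context built from the other species' best.
import random

DIM, BLOCKS = 12, 3
BLOCK = DIM // BLOCKS

def f(x):
    return -sum(v * v for v in x)                    # full objective (maximize)

def evaluate(part, k, context):
    full = context[:k * BLOCK] + part + context[(k + 1) * BLOCK:]
    return f(full)

species = [[[random.uniform(-5, 5) for _ in range(BLOCK)] for _ in range(20)]
           for _ in range(BLOCKS)]
best = [random.choice(sp)[:] for sp in species]      # current collaborators

for _ in range(100):
    context = [v for part in best for v in part]
    for k in range(BLOCKS):
        scored = sorted(species[k], key=lambda p: evaluate(p, k, context),
                        reverse=True)
        parents = scored[:10]                        # truncation selection
        species[k] = parents + [[v + random.gauss(0, 0.1)
                                 for v in random.choice(parents)]
                                for _ in range(10)]
        best[k] = species[k][0][:]                   # best collaborator of species k

print(f([v for part in best for v in part]))
```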
A key issue in cooperative coevolution is the task of problem decomposition. An
automatic decomposition strategy called differential grouping [80] can uncover the
underlying interaction structure of the decision variables and form subcomponents
such that the interdependence between them is kept to a minimum. In [38], the inter-
dependencies among variables are captured by a fast search operator, and problem
decomposition is then performed.
Another key issue involved is the optimization of the subproblems. In [38], a
cross-cluster mutation strategy is utilized to enhance exploitation and exploration.
More specifically, each operator is identified as exploitation-biased or exploration-
biased. The population is divided into several clusters. For individuals within each
cluster, exploitation-biased operators are applied. For individuals among different
clusters, exploration-biased operators are applied. These operators are incorporated
into DE. A cooperative coevolution GP is given in [60].

For the dimension-distributed models, failure of a processor will result in losing
subcomponents of the global solution and hence lead to a crash of the entire algorithm.
Therefore, these models are not fault-tolerant.

8.3.5 Cloud Computing

Although clusters, computing grids [35], and peer-to-peer networks [119] have been
widely used as physical platforms for distributed algorithms, the implementation of
distributed EAs on a cloud platform has received increasing attention since 2008.
Cloud computing is an emerging technology that is now a commercial reality. Cloud
computing represents a pool of virtualized computer resources. It utilizes virtualiza-
tion and autonomic computing techniques to realize dynamic resource allocations.
MapReduce [30] is a programming model for accessing and processing of scal-
able data with parallel and distributed algorithms. It has been applied in various web-
scale and cloud computing applications. Hadoop is a popular Java-based open-source
clone of Google’s private MapReduce infrastructure. The MapReduce infrastructure
provides all the functional components including communications, load balancing,
fault-tolerance, resource allocation, and file distribution. A user needs only to imple-
ment the map and the reduce functions. Thus, the user needs to focus on the problem
and algorithm only.
As an on-demand computing paradigm, cloud computing is well suited to building
highly scalable and cost-effective distributed EA systems for solving problems with
variable demands. The cloud computing paradigm prefers availability
to efficiency, and hence is more suitable for business and engineering
applications. The speedup and distributed efficiency of distributed EAs deployed on clouds
are lower than those deployed on clusters and computing grids, due to the higher
communication overhead. As a pool-based model, the set of participating proces-
sors can be dynamically changed, which enables the algorithms to achieve superior
performance.
Studies in cloud storage are mainly related to content delivery or designing data
redundancy schemes to ensure information integrity. The public FluidDB platform
is a structured storage system. It is an ideal candidate for acting as the substrate of a
persistent or pool-based EA, leading to fluid EA [71].
A cloud-computing-based EA uses a synchronous storage service as pool for
information exchange among population of solutions [72]. It uses free cloud storage
as a medium for holding distributed evolutionary computation, in a parasitic way.
In parasitic computing [11], one machine forces target computers to solve a piece
of a complex computational task merely by engaging them in standard communica-
tion, and the target computers are unaware of having performed computation for a
commanding node.

8.3.6 GPU Computing

Population-based computational intelligence naturally benefits from parallel
hardware. Parallelization efforts have focused on using graphics processing units (GPUs),
i.e., graphics cards, to provide fast parallel hardware. A GPU consists of a large
number of processors, and recent devices operate as multiple instruction multiple
data (MIMD) architectures. Today, GPUs can be programmed by any user to
perform general-purpose computation.
Population parallel and fitness parallel are two methods for exploiting the parallel
architecture of GPU. In the fitness parallel method, all the fitness cases are executed
in parallel with only one individual being evaluated at a time. In the population
parallel method, multiple individuals are evaluated simultaneously. Each of them
executes exactly the same instruction at the same time, but on different data. This is
known as single instruction multiple data (SIMD) parallelism.
Many types of metaheuristic algorithms have been implemented on GPUs, includ-
ing a complete GA [36,121], binary and real-coded GAs with crossover [8], multi-
objective EA [120], DE [29], memetic algorithm [63], and BOA [76].
Compute unified device architecture (CUDA) is a parallel computing architecture
developed by Nvidia (http://www.nvidia.com). It provides an application
programming interface (API) for easy access to the single instruction multiple data
(SIMD) architecture. It allows us to take advantage of the computing capacity of
its GPUs using a subset of C/C++. CUDA programming model executes kernels as
batches of parallel threads in a SIMD programming style. CUDA (http://www.nvidia.
com/object/cuda_home_new.html) and OpenCL (https://www.khronos.org/opencl/)
offer sophisticated general-purpose GPU facilities.
EAsy Specification of EA (EASEA) [23] is a software platform dedicated to the
implementation of EAs that can port different types of EAs on general-purpose GPUs
or over clusters of machines using an island model. EASEA platform is designed to
produce an EA from a problem description.
Open-MP is a set of compiler directives and callable runtime library routines
designed to support portable implementation of parallel programs for shared mem-
ory multiprocessor architectures [55]. It extends FORTRAN, C and C++ to express
shared memory parallelism. It is simple in implementation. Compared to CUDA
applied on GPU in which the accuracy is inversely proportional to the speedup rate,
Open-MP gives a high accuracy equal to that of the sequential implementation. In
general, it provides an incremental path for parallel conversion of any existing soft-
ware, as well as targeting scalability and performance for a complete rewrite or
entirely new software [20].
A GPU model for GP is given in [73]. Genetic parallel programming [22] is a
GP paradigm which evolves parallel program solutions that run on a tightly coupled,
MIMD register machine. It bears some similarities with parallel distributed GP, but
represents programs in a linear list of parallel instructions with a specific genotype–
phenotype mapping.

Table 8.1 Comparison of distributed models [42]

Model         Parallelism level                  Diversity           Communication cost  Scalability     Fault-tolerance
Master-slave  Operation, evaluation              Like sequential EA  Medium to high      Medium          High
Island        Population                         Good                Low to medium       Low             Medium
Cellular      Individual                         Good                Medium              Medium to high  Medium to high
Hierarchical  Population, individual, operation  Good                Medium              Medium to high  Medium to high
Pool          Population, individual, operation  –                   Low                 High            High
Coevolution   Variable, variable-block           –                   Medium              Low             Low

The computing power of GPUs can also be used to implement other distributed
population-based metaheuristic models such as fine-grained parallel fitness evaluation
[16] and parallel implementations of ACO [10] and PSO [18].
A comparison of these distributed EA models is given in Table 8.1.

8.4 Coevolution

The introduction of ecological models and coevolutionary architectures is an effective
way to improve the efficacy of EAs. The coevolutionary paradigm is inspired by
the reciprocal evolution driven by the cooperative or competitive interaction between
different species.
In case of coevolution, two or more populations coexist during the execution of EA,
interacting and evolving simultaneously. The most important benefit of coevolution
is the possibility of defining several components to represent a problem and assigning
them to several populations to handle each one separately. This allows EA to employ
a divide-and-conquer strategy, where each population can focus its efforts on solving
a part of the problem. If the solutions obtained by the populations are joined correctly,
and the interaction between individuals is managed in a suitable way, coevolution can
lead to high-quality solutions, often improving those obtained by noncoevolutionary
approaches.

8.4.1 Coevolutionary Approaches

Cooperative Coevolution
Cooperative coevolution is inspired by the ecological relationship of symbiosis,
where different species live together in a mutually beneficial relationship. The relation
between butterflies and plants is an example of coevolution. Cooperative coevolutionary
algorithms apply a divide-and-conquer approach to simplify the search: an optimization
problem is divided into many modules, each module is evolved separately using a
species, and the modules are then combined together to form the whole system [90].
depends on its ability to collaborate with individuals from other species. Each pop-
ulation evolves individuals representing a component of the final solution. Thus, a
full solution is obtained by joining an individual chosen from each population. In
this way, increases in a collaborative fitness value are shared between individuals of
all the populations of the algorithm [90].
Cooperative coevolutionary algorithm [89] decomposes a high-dimensional prob-
lem into multiple lower dimensional subproblems, and tackles each subproblem sep-
arately by a subpopulation. An overall solution can be derived from a combination
of subsolutions, which are evolved from individual subpopulations. The cooperative
coevolution framework is applied to PSO in [114].
The method performs poorly on nonseparable problems, because the interde-
pendencies among different variables cannot be captured well enough. Existing
algorithms perform poorly on nonseparable problems with 100 or more real-valued
variables [123]. Theoretical and empirical arguments show that cooperative coevo-
lutionary algorithms tend to converge to suboptimal solutions in the search space.
An extended formal model for cooperative coevolutionary algorithms, under specific
conditions, can be guaranteed to converge to the globally optimal solution [83].
Teacher-learner type coevolutionary algorithms are a popular approach for imple-
menting active learning, where active learning is divided into two complementary
subproblems: one population infers models using a dynamic dataset while the second
adds to the dataset by designing experiments that disambiguate the current candidate
models. Each EA leverages the advancements in its counterpart to achieve superior
results in a unified active learning framework [15].
Competitive Coevolution
Competitive coevolution resembles predator–prey or host–parasite interactions,
where predators (or hosts) implement the potential solutions to the optimization prob-
lem, while preys (or parasites) find individual fitness. In competitive coevolutionary
optimization, there are usually two independently evolving populations of hosts and
parasites, and an inverse fitness interaction exists between the two subpopulations.
To survive, the losing subpopulation adapts to counter the winning subpopulation in
order to become the new winner. The individuals of each population compete with
one another. This competition is usually represented by a decrease in the fitness value
of an individual when the fitness value of its antagonist increases [95].

Cooperative–Competitive Coevolution
Cooperative–competitive coevolution paradigm, which tries to achieve the advan-
tages of cooperation and competition at different levels of the model, has been suc-
cessfully employed in dynamic multiobjective optimization [39]. Multiple-species
models have also been used to evolve coadapted subcomponents. Because the host
and parasite species are genetically isolated and only interact through their fitness
functions, they are full-fledged species in a biological sense.

8.4.2 Coevolutionary Approach for Minimax Optimization

Many robust design problems can be described by minimax optimization problems.
Classical techniques for solving these problems are typically limited to a discrete
form of the problem. Examples of coevolutionary EAs for solving minimax
optimization problems are alternating coevolutionary GA [12], parallel coevolutionary
GA [45], and alternating coevolutionary PSO [103].
The minimax problem is defined as

    \min_{x \in X} \max_{s \in S} f(x, s),                                  (8.3)
where f (·, ·) is the objective or fitness function.
By analogy, the set X stands for preys and the set S for predators. The prey with
the optimal performance with respect to the worst possible predator is sought. A
population of preys and a population of predators evolve independently and simulta-
neously. The two populations are tied together through fitness evaluation. The fitness
of an individual in one population is based on its performance against the individuals
in the other. The fitness for the prey population x \in P_X should be minimized:

    F(x) = \max_{s \in P_S} f(x, s).                                        (8.4)

By a security strategy, a predator’s fitness is assigned with respect to the prey that
performs best against it, that is, by maximizing
    G(s) = \min_{x \in P_X} f(x, s).                                        (8.5)
In alternating coevolutionary GA [12], the evolution of the two populations is
staggered. It assigns fitness to preys and predators alternatively. The populations are
randomly initialized and evaluated against each other. The algorithm then fixes the
prey population, while it evolves the predator population for several generations.
Then, it fixes the predator population and evolves the prey population for several
generations. This process repeats a fixed number of times. Parallel coevolutionary
GA [45] is similar to alternating coevolutionary GA except that the two populations
evolve simultaneously. The two populations are randomly initialized and evaluated
against each other. There is no fixed fitness landscape. Alternating coevolutionary
PSO [103] is the same as alternating coevolutionary GA except that it is implemented
using PSO. In [61], an approach based on coevolutionary PSO is applied to solve
minimax problems. Two populations of independent PSO using Gaussian distribution

are evolved: one for the variable vector and the other for the Lagrange multiplier
vector. A method of solving general minimax optimization problems using GA is
proposed in [26].
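The alternating scheme of [12] can be sketched compactly; the payoff function f(x, s), the simple mutation-only population updates, and all constants below are illustrative assumptions, not the cited algorithms themselves.

```python
# A minimal alternating-coevolution sketch for the minimax problem (8.3):
# a prey population over x and a predator population over s are evolved in
# turn, using the fitness assignments (8.4) and (8.5).
import random

def f(x, s):                                         # example payoff function
    return (x - s) ** 2 + 0.1 * s

def evolve(pop, score, maximize, gens=20, sigma=0.2):
    for _ in range(gens):
        parent = max(pop, key=score) if maximize else min(pop, key=score)
        child = parent + random.gauss(0, sigma)
        pop = sorted(pop + [child], key=score, reverse=maximize)[:len(pop)]
    return pop

preys = [random.uniform(-2, 2) for _ in range(15)]
predators = [random.uniform(-2, 2) for _ in range(15)]

for _ in range(30):                                  # alternate the two populations
    F = lambda x: max(f(x, s) for s in predators)    # (8.4): minimized over preys
    preys = evolve(preys, F, maximize=False)
    G = lambda s: min(f(x, s) for x in preys)        # (8.5): maximized over predators
    predators = evolve(predators, G, maximize=True)

x_best = min(preys, key=lambda x: max(f(x, s) for s in predators))
print(x_best)
```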

8.5 Interactive Evolutionary Computation

Many design tasks, such as artistic or aesthetic design, control for virtual reality or
comfort, signal processing to increase visibility or audibility, and applications
in engineering, edutainment and other fields, require human evaluation. For
domains in which fitness is subjective or difficult to formalize (e.g., for aesthetic
appeal), interactive evolutionary computation (IEC) is an approach to evolutionary
computation in which human evaluation replaces the fitness function. IEC is applied
to human–computer interaction to optimize a target system based on human subjec-
tive evaluation [111].
Genetic art encompasses a variety of digital media, including images, movies,
three-dimensional models, and music [14]. GenJam is an IEC system for evolving
improvisational jazz music [14]. IEC has been applied to police face sketching [19].
In [44], IEC is applied to particle system effects for generating special effects in
computer graphics.
A typical IEC application presents the current generation of solutions, which may
be in the form of sounds or images, to the user. The user interactively gives his or
her subjective evaluations as numerical inputs, based on which EA generates new
parameters for candidate solutions as the next generation of solutions. The parameters
of the target system are optimized toward each user’s preference by iterating this
procedure. In each generation, the user selects the most promising designs, which
are then mated and mutated to create the next generation. This initial population is
evolved through a process similar to domesticated animal and plant breeding.
IEC is often limited by human fatigue. An IEC process usually lasts dozens
of generations for a single user [111]. Collaborative interactive evolution systems
[110] involve multiple users in one IEC application, working to create products with
broader appeal and greater significance. Users vote on a particular individual selected
by the system. To overcome user fatigue, the system combines these inputs to form
a fitness function for GA. GA then evolves an individual to meet the combined user
requirements. Imagebreeder system (http://www.imagebreeder.com/) also offers an
online community coupled with an IEC client for evolving images.

8.6 Fitness Approximation

For optimization, much of the computational complexity is due to fitness function
evaluations. Fitness approximation through modeling helps to reduce the number of
expensive fitness evaluations. With a fitness model, one can improve EA efficiency by
directly sampling new solutions, developing hybrid guided evolutionary operators or

using the model as a surrogate for an expensive fitness function. Fitness models have
also been applied to handle noisy fitness functions, smooth multimodal landscapes,
and define a continuous fitness in domains that lack an explicit fitness (e.g., evolving
art and music).
Fitness Inheritance
An approach to function approximation is fitness inheritance. By fitness inheritance,
an offspring inherits a fitness value from its parents rather than through function
evaluations. An individual is evaluated indirectly by interpolating the fitness of its
parents. In [104], fitness inheritance can be implemented by taking the average fitness
of the two parents or by taking a weighted average. Convergence time and population
sizing of EAs with fitness inheritance are derived for single-objective GAs in [100].
In [99], fitness of a child is the weighted sum of its parents; a fitness and associated
reliability value are assigned to each new individual that is evaluated using the true
fitness function only if the reliability value is below a certain threshold.
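A minimal sketch of fitness inheritance is given below; weighting the parents' fitness by the child's similarity to each parent, and the inheritance probability, are illustrative choices rather than the exact schemes of [99,100,104].

```python
# Fitness inheritance: with probability p_inherit an offspring receives a
# weighted average of its parents' fitness instead of a true (expensive)
# evaluation.
import random

def true_fitness(x):                      # the expensive evaluation
    return -sum(v * v for v in x)

def offspring_fitness(p1, f1, p2, f2, child, p_inherit=0.7):
    if random.random() < p_inherit:
        # weight by similarity of the child to each parent (an assumption here)
        d1 = sum((a - b) ** 2 for a, b in zip(child, p1)) ** 0.5
        d2 = sum((a - b) ** 2 for a, b in zip(child, p2)) ** 0.5
        w = d2 / (d1 + d2) if d1 + d2 > 0 else 0.5
        return w * f1 + (1 - w) * f2      # inherited (approximate) fitness
    return true_fitness(child)            # occasional exact evaluation

p1, p2 = [1.0, 2.0], [2.0, 0.0]
child = [1.5, 1.0]
print(offspring_fitness(p1, true_fitness(p1), p2, true_fitness(p2), child))
```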
An exact evaluation of fitness may not be necessary as long as a proper rank is
approximately preserved. By using fitness granulation via an adaptive fuzzy simi-
larity analysis, the number of fitness evaluations can be reduced [1]. An individual’s
fitness is computed only if it has insufficient similarity to a pool of fuzzy granules
whose fitness has already been computed.
Fitness Approximation by Metamodeling
Fitness approximation can be obtained through metamodeling [81,92,93]. Data col-
lected for all previously evaluated points can be used during the evolution to build
metamodels, and the cost of training a metamodel depends on its type and the training
set size. Many statistics such as Bayesian interpolation and neural network models
[81] can be used to construct surrogate models.
Screening methods also consider the confidence of the predicted output [34,57,
92]. Among the previous evaluated points, the less promising generation members
are screened out, and expensive evaluations are only necessary for the most promising
population members. For multimodal problems and in multiobjective optimization,
the confidence information provided by Bayesian interpolation should be used in
order to boost evaluations toward less explored regions. In EAs assisted by local
Bayesian interpolation [34], predictions and their confidence intervals predicted by
Bayesian interpolation are used by EA. It selects the promising members in each
generation and carries out exact, costly evaluations only for them.
In [129], a data parallel Gaussian process-based global surrogate model and a
Lamarckian evolution-based neural network local metamodel are combined in a
hierarchical framework to accelerate convergence.
Efficient global optimization [57] makes use of Gaussian process to model the
search landscape from solutions visited during the search. It does not just choose the
solution that the model predicts would minimize the cost. Rather, it automatically
balances exploitation and exploration. The method uses a closed-form expression
for the expected improvement, and it is thus possible to search the decision space
globally for the solution that maximizes this.
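The closed-form expected improvement can be written in a few lines; the sketch below assumes a minimization problem and a surrogate that returns a Gaussian predictive mean and standard deviation at a candidate point, which is the usual setting of [57].

```python
# Expected improvement for a minimization problem, given the surrogate's
# predictive mean mu and standard deviation sigma at a candidate point and the
# best cost f_min observed so far.
from math import erf, exp, pi, sqrt

def expected_improvement(mu, sigma, f_min):
    if sigma <= 0.0:
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))           # standard normal CDF
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)         # standard normal PDF
    return (f_min - mu) * Phi + sigma * phi

# A point with a slightly worse mean but large uncertainty can have higher EI
# than a point with a better mean and little uncertainty.
print(expected_improvement(mu=1.2, sigma=1.0, f_min=1.0))
print(expected_improvement(mu=0.95, sigma=0.01, f_min=1.0))
```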

An algorithm that coevolves fitness predictors is proposed in [101]. Fitness
predictors may or may not represent the objective fitness, opening opportunities to adapt
selection pressures and diversify solutions. Coevolved predictors scale favorably
with problem complexity on a series of randomly generated test problems. Fitness
prediction can also reduce solution bloat and find solutions more reliably.
In [17], the Markov network fitness model is defined in terms of Walsh functions
to identify qualitative features of the fitness function. Fitness prediction correlation
metric is defined to measure fitness modeling capability of local and global fitness
models. This metric is used to investigate the effects of population size and selection
on the tradeoff between model quality and complexity for the Markov network fitness
model.
In evolving surrogate model-based DE [69], a surrogate model constructed based
on the population members of the current generation is used to assist DE in order to
generate competitive offspring using the appropriate parameter setting during dif-
ferent stages of the evolution. The surrogate model, constructed by a simple Kriging
model, evolves over the iterations to better represent the basin of search by DE.
Fitness Imitation
Another solution is to cluster individuals in a population into several groups. Only
the individual that represents its cluster or is closest to the cluster center needs to be
evaluated using the fitness function, and the fitness value of other individuals in the
same cluster will be estimated from the representative individual based on a distance
measure [58] or based on a neural network ensemble [56]. This is referred to as
fitness imitation in contrast to fitness inheritance.
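A minimal fitness-imitation sketch is given below; the use of plain k-means and the assignment of the representative's exact fitness to all members of its cluster are illustrative simplifications of the schemes in [56,58].

```python
# Fitness imitation: the population is clustered, only the individual closest
# to each cluster centre is evaluated exactly, and the other cluster members
# imitate that fitness.
import random

def true_fitness(x):
    return -sum(v * v for v in x)

def dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

def imitated_fitness(pop, k=5, iters=10):
    centres = random.sample(pop, k)
    for _ in range(iters):                               # plain k-means
        clusters = [[] for _ in range(k)]
        for x in pop:
            clusters[min(range(k), key=lambda c: dist(x, centres[c]))].append(x)
        centres = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centres[c]
                   for c, cl in enumerate(clusters)]
    fits = {}
    for c, cl in enumerate(clusters):
        if not cl:
            continue
        rep = min(cl, key=lambda x: dist(x, centres[c]))  # cluster representative
        f_rep = true_fitness(rep)                         # only exact evaluation
        for x in cl:
            fits[tuple(x)] = f_rep                        # imitation
    return fits

pop = [[random.uniform(-3, 3) for _ in range(4)] for _ in range(40)]
print(len(imitated_fitness(pop)))   # 40 fitness values from only ~5 exact evaluations
```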

8.7 Other Heredity-Based Algorithms

Sheep Flock Heredity Algorithm
Sheep flock heredity algorithm [77] simulates the heredity of a sheep flock in a prairie.
Normally, sheep in each flock live within their own flock under the control
of shepherds. Therefore, genetic inheritance only occurs within the flock. Occasionally,
flocks are mixed with one another. In addition to normal genetic
operations, the algorithm works with two kinds of genetic operators: sub-chromosome
level operators and chromosome level operators. This hierarchical step is referred to as
multistage genetic operation. The algorithm has been successfully used in various
scheduling problems.
Selfish Gene Theory
Selfish gene theory of Dawkins gives a different view on the evolution [27]. In this
theory, the population can be regarded as a pool of genes and the individual genes
strive for their appearances in the genotype of vehicles. The survival of the fittest is a
battle fought by genes, not individuals. Only good genes can survive in the evolution
process.

In an evolutionary optimization strategy based on selfish gene theory [25], the
population is like a store room of genes, which is called a virtual population. Individuals
are generated when necessary and are dumped after the statistical analysis
of genes. The individuals are stored with genes in a virtual population and can be
selected after sampling by the density function. Each variation of a gene, an allele, is
in a constant battle against other alleles for the same spot on a chromosome, and an
allele more successful at increasing its presence over others has a better chance
of winning this battle over altruistic or passive genes [25].
is often measured by the frequency with which it appears in the virtual population.
Each solution is implicitly generated by changing the frequencies or the probabilities
of the alleles. The algorithm proceeds by choosing two individuals randomly according
to the frequencies of the alleles and comparing the fitness of these two individuals.
The individual with higher fitness is kept.
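The allele-frequency view can be sketched for binary strings as follows; this minimal version (which closely resembles a compact GA) uses a OneMax objective and a fixed reinforcement step as illustrative choices, not the exact update of [25].

```python
# A selfish-gene-style search on binary strings: a virtual population is kept
# only as allele frequencies, two individuals are sampled from those
# frequencies, and the winner's alleles have their frequencies reinforced.
import random

DIM, STEP = 20, 0.02

def fitness(x):
    return sum(x)                                    # example: OneMax

freq = [0.5] * DIM                                   # P(allele '1') at each locus

def sample():
    return [1 if random.random() < p else 0 for p in freq]

for _ in range(2000):
    a, b = sample(), sample()
    winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
    for i in range(DIM):
        if winner[i] != loser[i]:                    # reward the winning allele
            freq[i] += STEP if winner[i] == 1 else -STEP
            freq[i] = min(max(freq[i], 0.0), 1.0)

print(sample(), sum(freq))
```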

8.8 Application: Optimizing Neural Networks

Training of a neural network is a search process for the minimization of an error
function of network parameters, including the structure, the parameters, and/or the
nonlinear activation function of the neural network [32]. With evolution, prior knowl-
edge of a neural network is not necessary. The optimization capability of EAs can
lead to a minimal configuration that reduces the total training time as well as the
performing time for new patterns. In many cases, a chromosome is selected as a
whole network. Competition occurs among those individual networks, based on the
performance of each network.
EAs can be used to optimize network structure and parameters, or to optimize spe-
cific network performance and algorithmic parameters. EAs are suitable for learning
networks with nondifferentiable activation function. When EAs are used to optimize
network parameters, the fitness can be defined by f(s) = 1/(1+E), where s is the genetic
coding of the network parameters, and E is the objective function. The fitness func-
tion can also be selected as a complexity criterion such as AIC, which gives a tradeoff
between the training error and the network complexity.
Although the framework of evolving both structures and weights can avoid the one-
to-many mapping problem, there still exists another structural–functional mapping issue,
called the permutation problem, wherein the network parameters can be permuted
without affecting their function. It is mainly caused by the many-to-one mapping
from the genotypes to the phenotypes [124]. Permutation results in a topological
symmetry, and consequently in a high number of symmetries in the error function.
Thus, the number of local minima is high. Figure 8.7 shows two functionally equiv-
alent networks that order their hidden units differently in their chromosomes. This
leads to a coded string that looks quite different.
For two networks with permuted parameters, crossover almost certainly leads
nowhere, and thus, the algorithm converges very slowly. The permutation problem

Figure 8.7 Permutation problem: The two networks with permuted weights and neurons are
equivalent, but their chromosomes are quite different.

can be resolved by sorting the strings appropriately before crossover. When evolving
the architecture of the network, crossover is usually avoided and only mutations are
adopted.
Coding of network parameters is critical in view of the convergence speed of
search. Each instance of the neural network is encoded by the concatenation of all
the network parameters in one chromosome. A heuristic concerning the order of the
concatenation of the network parameters is to put connection weights terminating at
the same unit together.
The architecture of a neural network is referred to as its topological structure,
i.e., connectivity. Given certain performance criteria, such as minimal training error
and lowest network complexity, the performance levels of all architectures form a
discrete surface in the space due to a discrete number of nodes. The performance
surface is nondifferentiable and multimodal.
Direct and indirect encodings are used for encoding the architecture. For direct
encoding, every connection of the architecture is encoded into the chromosome. For
indirect encoding, only the most important parameters of the architecture, such as
the number of hidden layers and the number of hidden units in each hidden layer, are
encoded. Only the architecture of a network is evolved, whereas other parameters of
the architecture such as the connection weights have to be learned after a near-optimal
architecture is found.
In direct encoding, each parameter c_{ij}, the connectivity from node i to node j, can be
represented by a bit denoting the presence or absence of a connection. An architecture
of N_n nodes is represented by an N_n x N_n matrix C = [c_{ij}]. If c_{ij} is represented
by real-valued connection weights, both the architecture and connection weights
are evolved simultaneously. The binary string representing the architecture is the
concatenation of all the rows of the matrix. For a feedforward network, only the
upper triangle of the matrix will have nonzero entries, and thus only this part of the
connectivity matrix needs to be encoded into the chromosome. As an example, a
2-2-1 feedforward network is shown in Figure 8.8. Only the upper triangle of the
connectivity matrix is encoded in the chromosome, and we get “0110 110 01 1.”
A chromosome is required to be converted back to a neural network in order to
evaluate the fitness of each chromosome. The neural network is then trained after
being initialized with random weights. The training error is used to measure the
fitness. In this way, EAs explore all possible connectivities.

Figure 8.8 Direct encoding of a 2-2-1 feedforward network architecture. The number above each
node denotes the cardinal of the node. The connectivity matrix is

    C = [ 0 0 1 1 0
          0 0 1 1 0
          0 0 0 0 1
          0 0 0 0 1
          0 0 0 0 0 ].
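The encoding and decoding of the upper triangle can be sketched directly; the code below reproduces the chromosome "0110 110 01 1" of Figure 8.8 (the use of numpy is merely for convenience).

```python
# Direct encoding: flatten the upper triangle of the connectivity matrix of a
# feedforward network into a bit string, and recover the matrix from it.
import numpy as np

# 2-2-1 network: nodes 1-2 are inputs, 3-4 hidden, 5 output
C = np.array([[0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]])

def encode(C):
    n = C.shape[0]
    return ''.join(str(C[i, j]) for i in range(n) for j in range(i + 1, n))

def decode(bits, n):
    C = np.zeros((n, n), dtype=int)
    it = iter(bits)
    for i in range(n):
        for j in range(i + 1, n):
            C[i, j] = int(next(it))
    return C

chromosome = encode(C)
print(chromosome)                       # '0110110011', i.e., "0110 110 01 1"
assert (decode(chromosome, 5) == C).all()
```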

The direct encoding scheme has the problem of scalability. A large network
would require a very large matrix and thus, the computation time of the evolu-
tion is increased. Prior knowledge can be used to reduce the size of the matrix. For
example, for multilayer perceptrons, two adjacent layers are in complete connection,
and therefore, its architecture can be encoded by the number of hidden layers and
the number of hidden units in each layer. This leads to indirect encoding.
Indirect encoding can effectively reduce the chromosome length of the architec-
ture by encoding only some characteristics of the architecture. The details of each
connection are either predefined or specified by some rules. Indirect encoding may
not be very good at finding a compact network with good generalization ability.
Each network architecture may be encoded by a chromosome consisting of a set
of parameters such as the number of hidden layers, the number of hidden nodes in
each layer, and the number of connections between two layers. In this case, EAs can
only search a limited subset of the whole feasible architecture space. This parametric
representation method is most suitable when the type of architecture is known.
One major problem with the evolution of architectures without connection weights
is noisy fitness evaluation [124]. The noise is dependent on the random initialization
of the weights and the training algorithm used. The noise identified is caused by
the one-to-many mapping from genotypes to phenotypes. This drawback can be
alleviated by simultaneously evolving network architectures and connection weights.
The activation function for each neuron can be evolved by symbolic regression
among some popular nonlinear functions such as the Heaviside, sigmoidal, and
Gaussian functions during the learning period.

Example 8.3: We consider the iris classification problem. The iris data set has 150
patterns belonging to 3 classes, shown in Figure 8.9. Each pattern has four numeric
properties. We use a 4-4-3 multilayer perceptron to learn this problem, with three dis-
crete values representing different classes. The logistic sigmoidal function is selected
for the hidden neurons and linear function is used for the output neurons. We use
GA to train the neural network and hope to find a global optimum solution for the
weights.
There are a total of 28 weights in the network, which are encoded as a string of 28
numbers. The fitness function is defined as f = 1/(1 + E), where E is the training error,
that is, the mean squared error. Real encoding is employed. A fixed population size of
20 is used. The selection scheme is roulette-wheel selection. Only mutation is
employed. Only one random gene of a chromosome is mutated by adding Gaussian

Figure 8.9 Plot of the iris dataset: a x1 vs. x2; b x3 vs. x4.

Figure 8.10 The evolution of real-coded GA for training a 4-4-3 MLP: the best fitness and average fitness. t corresponds to the number of generations.


noise with variance σ = σ0 (1 − t/T) + σ1. The initial population is randomly
generated with all genes of the chromosomes as random numbers in (0, 1). σ0 and σ1
are, respectively, selected as 10 and 0.5. Elitism strategy is adopted. The results for
a typical random run are shown in Figures 8.10 and 8.11. The computation time is
461.66 s for 500 generations. Although the training error is relatively large, E =
2.9171, the rate of correct classification for the training examples is 96.67%.
In the above implementation, the selection of variance σ is of vital importance. In
ESs, σ itself is evolved, and some other measures beneficial to numerical optimization
are also used.
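A compact Python sketch of the GA loop described above is given below; the routine mlp_error, which computes the training MSE of the 4-4-3 network for a given weight vector, is assumed to be available and is not shown, and the decaying σ is used directly as the scale of the Gaussian noise.

import numpy as np

def ga_train(mlp_error, n_genes=28, pop_size=20, T=500, sigma0=10.0, sigma1=0.5, seed=None):
    # Real-coded GA with roulette-wheel selection, single-gene Gaussian mutation,
    # and elitism, following the settings of Example 8.3.
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_genes))                 # genes initialized in (0, 1)
    fit = np.array([1.0 / (1.0 + mlp_error(w)) for w in pop])
    for t in range(T):
        sigma = sigma0 * (1.0 - t / T) + sigma1           # decaying mutation scale
        elite = pop[fit.argmax()].copy()                  # elitism: keep the best chromosome
        parents = pop[rng.choice(pop_size, size=pop_size, p=fit / fit.sum())]
        for w in parents:                                 # mutate one random gene per chromosome
            w[rng.integers(n_genes)] += sigma * rng.standard_normal()
        parents[0] = elite
        pop = parents
        fit = np.array([1.0 / (1.0 + mlp_error(w)) for w in pop])
    best = pop[fit.argmax()]
    return best, 1.0 / fit.max() - 1.0                    # best weights and their training MSE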

Figure 8.11 The evolution of real-coded GA for training a 4-4-3 MLP: the training error. t corresponds to the number of generations.

References
1. Akbarzadeh-T M-R, Davarynejad M, Pariz N. Adaptive fuzzy fitness granulation for evolu-
tionary optimization. Int J Approx Reason. 2008;49:523–38.
2. Alba E. Parallel evolutionary algorithms can achieve superlinear performance. Inf Process
Lett. 2002;82(1):7–13.
3. Alba E, Dorronsoro B. The exploration/exploitation tradeoff in dynamic cellular evolutionary
algorithms. IEEE Trans Evol Comput. 2005;9(2):126–42.
4. Alba E, Tomassini M. Parallelism and evolutionary algorithms. IEEE Trans Evol Comput.
2002;6(5):443–62.
5. Al-Madi NA. De Jong’s sphere model test for a human community based genetic algorithm
model (HCBGA). Int J Adv Comput Sci Appl. 2014;5(1):166–172.
6. Al-Madi NA, Khader AT. A social based model for genetic algorithms. In: Proceedings of the
3rd international conference on information technology (ICIT), Amman, Jordan, May 2007.
p. 23–27
7. Al-Naqi A, Erdogan AT, Arslan T. Adaptive three-dimensional cellular genetic algorithm for
balancing exploration and exploitation processes. Soft Comput. 2013;17:1145–57.
8. Arora R, Tulshyan R, Deb K. Parallelization of binary and real-coded genetic algorithms on
GPU using CUDA. In: Proceedings of IEEE world congress on computational intelligence,
Barcelona, Spain, July 2010. p. 3680–3687.
9. Arsuaga-Rios M, Vega-Rodriguez MA. Multiobjective energy optimization in grid systems
from a brain storming strategy. Soft Comput. 2015;19:3159–72.
10. Bai H, Ouyang D, Li X, He L, Yu H. MAX-MIN ant system on GPU with CUDA. In:
Proceedings of the IEEE 4th international conference on innovative computing, information
and control (ICICIC), Kaohsiung, Taiwan, Dec 2009. p. 801–204.
11. Barabasi AL, Freeh VW, Jeong H, Brockman JB. Parasitic computing. Nature.
2001;412(6850):894–7.

12. Barbosa HJC. A genetic algorithm for min-max problems. In: Proceedings of the 1st inter-
national conference on evolutionary computation and applications, Moscow, Russia, 1996. p.
99–109.
13. Beyer H-G. An alternative explanation for the manner in which genetic algorithms operate.
Biosystems. 1997;41(1):1–15.
14. Biles J. Genjam: a genetic algorithm for generating jazz solos. In: Proceedings of international
computer music conference, Arhus, Denmark, 1994. p. 131–137.
15. Bongard J, Zykov V, Lipson H. Resilient machines through continuous self-modeling. Science.
2006;314(5802):1118–21.
16. Bozejko W, Smutnicki C, Uchronski M. Parallel calculating of the goal function in meta-
heuristics using GPU. In: Proceedings of the 9th international conference on computational
science, Baton Rouge, LA, USA, May 2009, vol. 5544 of Lecture Notes in Computer Science.
Berlin: Springer; 2009. p. 1014–2023.
17. Brownlee AEI, McCall JAW, Zhang Q. Fitness modeling with Markov networks. IEEE Trans
Evol Comput. 2013;17(6):862–79.
18. Calazan RM, Nedjah N, De Macedo Mourelle L. Parallel GPU-based implementation of high
dimension particle swarm optimizations. In: Proceedings of the IEEE 4th Latin American
symposium on circuits and systems (LASCAS), Cusco, Peru, Feb 2013. p. 1–4.
19. Caldwell C, Johnston VS. Tracking a criminal suspect through “face-space” with a genetic
algorithm. In: Proceedings of the 4th international conference on genetic algorithms, San
Diego, CA, USA, July 1991. San Diego, CA: Morgan Kaufmann; 1991. p. 416–421
20. Candan C, Dreo J, Saveant P, Vidal V. Parallel divide-and-evolve: experiments with Open-MP
on a multicore machine. In: Proceedings of GECCO, Dublin, Ireland, July 2011. p. 1571–
1578.
21. Cerf R. Asymptotic convergence of genetic algorithms. Adv Appl Probab. 1998;30(2):521–50.
22. Cheang SM, Leung KS, Lee KH. Genetic parallel programming: design and implementation.
Evol Comput. 2006;14(2):129–56.
23. Collet P, Lutton E, Schoenauer M, Louchet J. Take it EASEA. In: Proceedings of the 6th
international conference on parallel problem solving from nature (PPSN VI), Paris, France,
Sept 2000, vol. 1917 of Lecture Notes in Computer Science. London: Springer; 2000. p.
891–901
24. Collins RJ, Jefferson DR. Selection in massively parallel genetic algorithms. In: Belew RK,
Booker LB, editors. Proceedings of the 4th international conference on genetic algorithms,
San Diego, CA, USA, July 1991. San Diego, CA: Morgan Kaufmann; 1991. p. 249–256.
25. Corno F, Reorda M, Squillero G. The selfish gene algorithm: a new evolutionary optimization
strategy. In: Proceedings of the 13th annual ACM symposium on applied computing (SAC),
Atlanta, Georgia, USA, 1998. p. 349–355.
26. Cramer AM, Sudhoff SD, Zivi EL. Evolutionary algorithms for minimax problems in robust
design. IEEE Trans Evol Comput. 2009;13(2):444–53.
27. Dawkins R. The selfish gene. Oxford: Oxford University Press; 1989.
28. De Jong K. An analysis of the behavior of a class of genetic adaptive systems. PhD Thesis,
University of Michigan, Ann Arbor, 1975.
29. de Veronese PL, Krohling RA. Differential evolution algorithm on the GPU with C-CUDA.
In: Proceedings of IEEE world congress on computational intelligence, Barcelona, Spain,
July 2010. p. 1878–1884.
30. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: Proceed-
ings of the 6th symposium on operating system design and implementation (OSDI), San
Francisco, CA, 2004. p. 137–147.
31. Droste S, Jansen T, Wegener I. On the analysis of the (1+1) evolutionary algorithm. Theor
Comput Sci. 2002;276:51–81.
32. Du K-L, Swamy MNS. Neural networks and statistical learning. London: Springer; 2014.

33. Eiben AE, Aarts EHL, Van Hee KM. Global convergence of genetic algorithms: a Markov
chain analysis. In: Proceedings of the 1st workshop on parallel problem solving from nature
(PPSN I), Dortmund, Germany, Oct 1990. Berlin: Springer; 1991. p. 3–12.
34. Emmerich MTM, Giannakoglou KC, Naujoks B. Single- and multiobjective evolutionary
optimization assisted by Gaussian random field metamodels. IEEE Trans Evol Comput.
2006;10(4):421–39.
35. Ewald G, Kurek W, Brdys MA. Grid implementation of a parallel multiobjective genetic algo-
rithm for optimized allocation of chlorination stations in drinking water distribution systems:
Chojnice case study. IEEE Trans Syst Man Cybern Part C. 2008;38(4):497–509.
36. Fok K-L, Wong T-T, Wong M-L. Evolutionary computing on consumer graphics hardware.
IEEE Intell Syst. 2007;22:69–78.
37. Folino G, Pizzuti C, Spezzano G. A scalable cellular implementation of parallel genetic
programming. IEEE Trans Evol Comput. 2003;7(1):37–53.
38. Ge H, Sun L, Yang X, Yoshida S, Liang Y. Cooperative differential evolution with fast variable
interdependence learning and cross-cluster mutation. Appl Soft Comput. 2015;36:300–14.
39. Goh C-K, Tan KC. A competitive-cooperative coevolutionary paradigm for dynamic multi-
objective optimization. IEEE Trans Evol Comput. 2009;13(1):103–27.
40. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading,
MA, USA: Addison-Wesley; 1989.
41. Goldberg DE, Deb K, Korb B. Messy genetic algorithms: motivation, analysis, and first results.
Complex Syst. 1989;3:493–530.
42. Gong Y-J, Chen W-N, Zhan Z-H, Zhang J, Li Y, Zhang Q, Li J-J. Distributed evolutionary
algorithms and their models: a survey of the state-of-the-art. Appl Soft Comput. 2015;34:286–
300.
43. Grefenstette JJ. Deception considered harmful. In: Whitley LD, editor. Foundations of genetic
algorithms, vol. 2. Morgan Kaufmann: San Mateo, CA; 1993. p. 75–91.
44. Hastings EJ, Guha RK, Stanley KO. Interactive evolution of particle systems for computer
graphics and animation. IEEE Trans Evol Comput. 2009;13(2):418–32.
45. Herrmann JW. A genetic algorithm for minimax optimization problems. In: Proceedings of
the congress on evolutionary computation (CEC), Washington DC, July 1999, vol. 2. p. 1099–
1103.
46. He J, Yao X. Drift analysis and average time complexity of evolutionary algorithms. Artif
Intell. 2001;127:57–85.
47. He J, Yao X. From an individual to a population: an analysis of the first hitting time of
population-based evolutionary algorithms. IEEE Trans Evol Comput. 2002;6(5):495–511.
48. He J, Yao X. Analysis of scalable parallel evolutionary algorithms. In: Proceedings of the
IEEE congress on evolutionary computation (CEC), Vancouver, BC, Canada, July 2006. p.
120–127.
49. He J, Yu X. Conditions for the convergence of evolutionary algorithms. J Syst Arch.
2001;47(7):601–12.
50. Holland J. Adaptation in natural and artificial systems. Ann Arbor, Michigan: University of
Michigan Press; 1975.
51. Holland JH. Building blocks, cohort genetic algorithms and hyperplane-defined functions.
Evol Comput. 2000;8(4):373–91.
52. Horn J. Finite Markov chain analysis of genetic algorithms with niching. In: Proceedings of
the 5th international conference on genetic algorithms, Urbana, IL, July 1993. San Francisco,
CA: Morgan Kaufmann Publishers; 1993. p. 110–117
53. Jansen T, De Jong KA, Wegener I. On the choice of the offspring population size in evolu-
tionary algorithms. Evol Comput. 2005;13(4):413–40.
54. Jansen T, Wegener I. The analysis of evolutionary algorithms—a proof that crossover really
can help. Algorithmica. 2002;33:47–66.

55. Jin H, Frumkin M, Yan J. The OpenMP implementation of NAS parallel benchmarks and its
performance. MRJ Technology Solutions, NASA Contract NAS2-14303, Moffett Field, CA,
Oct 1999.
56. Jin Y, Sendhoff B. Reducing fitness evaluations using clustering techniques and neural network
ensembles. In: Proceedings of genetic and evolutionary computation, Seattle, WA, USA, July
2004. p. 688–699.
57. Jones DR, Schonlau M, Welch WJ. Efficient global optimization of expensive black-box
functions. J Global Optim. 1998;13(4):455–92.
58. Kim H-S, Cho S-B. An efficient genetic algorithms with less fitness evaluation by clustering.
In: Proceedings of IEEE congress on evolutionary computation (CEC), Seoul, Korea, May
2001. p. 887–894.
59. Koza JR. Genetic programming: on the programming of computers by means of natural
selection. Cambridge, MA: MIT Press; 1992.
60. Krawiec K, Bhanu B. Coevolution and linear genetic programming for visual learning. In:
Proceedings of genetic and evolutionary computation conference (GECCO), Chicago, Illinois,
USA, vol. 2723 of Lecture Notes of Computer Science. Berlin: Springer; 2003. p. 332–343
61. Krohling RA, Coelho LS. Coevolutionary particle swarm optimization using Gaussian distri-
bution for solving constrained optimization problems. IEEE Trans Syst Man Cybern Part B.
2006;36(6):1407–16.
62. Lassig J, Sudholt D. Design and analysis of migration in parallel evolutionary algorithms.
Soft Comput. 2013;17:1121–44.
63. Lastra M, Molina D, Benitez JM. A high performance memetic algorithm for extremely
high-dimensional problems. Inf Sci. 2015;293:35–58.
64. Lehman J, Stanley KO. Abandoning objectives: evolution through the search for novelty
alone. Evol Comput. 2011;19(2):189–223.
65. Lehre PK, Yao X. On the impact of mutation-selection balance on the runtime of evolutionary
algorithms. IEEE Trans Evol Comput. 2012;16(2):225–41.
66. Leung Y, Gao Y, Xu Z-B. Degree of population diversity: a perspective on premature con-
vergence in genetic algorithms and its Markov chain analysis. IEEE Tran Neural Netw.
1997;8(5):1165–76.
67. Liu J, Zhong W, Jiao L. A multiagent evolutionary algorithm for constraint satisfaction prob-
lems. IEEE Trans Syst Man Cybern Part B. 2006;36(1):54–73.
68. Liu J, Zhong W, Jiao L. A multiagent evolutionary algorithm for combinatorial optimization
problems. IEEE Trans Syst Man Cybern Part B. 2010;40(1):229–40.
69. Mallipeddi R, Lee M. An evolving surrogate model-based differential evolution algorithm.
Appl Soft Comput. 2015;34:770–87.
70. Manderick B, Spiessens P. Fine-grained parallel genetic algorithms. In: Schaffer JD, editor.
Proceedings of the 3rd international conference on genetic algorithms, Fairfax, Virginia, USA,
June 1989. San Mateo, CA: Morgan Kaufmann; 1989. p. 428–433.
71. Merelo-Guervos JJ. Fluid evolutionary algorithms. In: Proceedings of IEEE congress on
evolutionary computation, Barcelona, Spain, July 2010. p. 1–8.
72. Meri K, Arenas MG, Mora AM, Merelo JJ, Castillo PA, Garcia-Sanchez P, Laredo JLJ. Cloud-
based evolutionary algorithms: an algorithmic study. Natural Comput. 2013;12(2):135–47.
73. Meyer-Spradow J, Loviscach J. Evolutionary design of BRDFs. In: Chover M, Hagen H, Tost
D, editors. Eurographics 2003 short paper proceedings. Spain: Granada; 2003. p. 301–6.
74. Muhlenbein H. Parallel genetic algorithms, population genetics and combinatorial optimiza-
tion. In: Schaffer JD, editor. Proceedings of the 3rd international conference on genetic
algorithms, Fairfax, Virginia, USA, June 1989. San Mateo, CA: Morgan Kaufman; 1989.
p. 416–421.
75. Muhlenbein H, Schomisch M, Born J. The parallel genetic algorithm as a function optimizer.
In: Proceedings of the 4th international conference on genetic algorithms, San Diego, CA,
July 1991. p. 271–278.

76. Munawar A, Wahib M, Munawar A, Wahib M. Theoretical and empirical analysis of a GPU
based parallel Bayesian optimization algorithm. In: Proceedings of IEEE international confer-
ence on parallel and distributed computing, applications and technologies, Higashi Hiroshima,
Japan, Dec 2009. p. 457–462.
77. Nara K, Takeyama T, Kim H. A new evolutionary algorithm based on sheep flocks hered-
ity model and its application to scheduling problem. In: Proceedings of IEEE international
conference on systems, man, and cybernetics, Tokyo, Japan, Oct 1999, vol. 6. p. 503–508.
78. Niwa T, Iba H. Distributed genetic programming: empirical study and analysis. In: Proceedings
of the 1st annual conference on genetic programming, Stanford University, CA, USA, July
1996. p. 339–344.
79. Nix AE, Vose MD. Modeling genetic algorithms with Markov chains. Ann Math Artif Intell.
1992;5:79–88.
80. Omidvar MN, Li X, Mei Y, Yao X. Cooperative co-evolution with differential grouping for
large scale optimization. IEEE Trans Evol Comput. 2014;18(3):378–93.
81. Ong YS, Nair PB, Kean AJ. Evolutionary optimization of computationally expensive problems
via surrogate modeling. AIAA J. 2003;41(4):687–96.
82. O’Reilly UM, Oppacher F. The troubling aspects of a building-block hypothesis for genetic
programming. In: Whitley LD, Vose MD, editors. Foundations of genetic algorithm 3. San
Francisco, CA: Morgan Kaufmann; 1995. p. 73–88
83. Panait L. Theoretical convergence guarantees for cooperative coevolutionary algorithms. Evol
Comput. 2010;18(4):581–615.
84. Poli R. Parallel distributed genetic programming. In: Come D, Dorigo M, Glover F, editors.
New ideas in optimization. New York: McGraw-Hill; 1999.
85. Poli R. Exact schema theory for GP and variable-length GAs with one-point crossover. Genetic
Progr Evol Mach. 2001;2:123–63.
86. Poli R, Langdon WB. Schema theory for genetic programming with one-point crossover and
point mutation. Evol Comput. 2001;6(3):231–52.
87. Poli R, McPhee NF. General schema theory for genetic programming with subtree-swapping
crossover: part i. Evol Comput. 2003;11(1):53–66.
88. Poli R, McPhee NF. General schema theory for genetic programming with subtree-swapping
crossover: part ii. Evol Comput. 2003;11(2):169–206.
89. Potter MA, de Jong KA. A cooperative coevolutionary approach to function optimization.
In: Proceedings of the 3rd conference on parallel problem solving from nature (PPSN III),
Jerusalem, Israel, Oct 1994. Berlin: Springer; 1994. p. 249–257.
90. Potter MA, De Jong KA. Cooperative coevolution: an architecture for evolving coadapted
subcomponenets. Evol Comput. 2000;8(1):1–29.
91. Qi X, Palmieri F. Theoretical analysis of evolutionary algorithms with an infinite population
size in continuous space, part 1: basic properties of selection and mutation. IEEE Trans Neural
Netw. 2004;5(1):102–19.
92. Ratle A. Accelerating the convergence of evolutionary algorithms by fitness landscape approx-
imation. In: Parallel problem solving from nature (PPSN V), 1998. p. 87–96.
93. Regis RG, Shoemaker CA. Local function approximation in evolutionary algorithms for the
optimization of costly functions. IEEE Trans Evol Comput. 2004;8(5):490–505.
94. Reza A, Vahid Z, Koorush Z. MLGA: a multilevel cooperative genetic algorithm. In: Pro-
ceedings of the IEEE 5th international conference on bio-inspired computing: theories and
applications (BIC-TA), Changsha, China, Sept 2010. p. 271–277.
95. Rosin C, Belew R. New methods for competitive coevolution. Evol Comput. 1997;15(1):1–29.
96. Rudolph G. Convergence analysis of canonical genetic algorithm. IEEE Trans Neural Netw.
1994;5(1):96–101.
97. Rudolph G. Finite Markov chain results in evolutionary computation: a tour d’horizon. Fun-
damenta Informaticae. 1998;35:67–89.

98. Rudolph G. Self-adaptive mutations may lead to premature convergence. IEEE Trans Evol
Comput. 2001;5:410–4.
99. Salami M, Hendtlass T. A fast evaluation strategy for evolutionary algorithms. Appl Soft
Comput. 2003;2(3):156–73.
100. Sastry K, Goldberg DE, Pelikan M. Don’t evaluate, inherit. In: Proceedings of genetic evolu-
tionary computation conference (GECCO), San Francisco, CA, USA, July 2001. p. 551–558.
101. Schmidt MD, Lipson H. Coevolution of fitness predictors. IEEE Trans Evol Comput.
2008;12(6):736–49.
102. Schutte JF, Reinbolt JA, Fregly BJ, Haftka RT, George AD. Parallel global optimization with
the particle swarm algorithm. Int J Numer Methods Eng. 2004;61(13):2296–315.
103. Shi Y, Krohling RA. Co-evolutionary particle swarm optimization to solve min-max problems.
In: Proceedings of the congress on evolutionary computation (CEC), Honolulu, HI, May 2002,
vol. 2. p. 1682–1687.
104. Smith RE, Dike BA, Stegmann SA. Fitness inheritance in genetic algorithms. In: Proceedings
of ACM symposium on applied computing, Nashville, Tennessee, USA, 1995. p. 345–350.
105. Smith J, Vavak F. Replacement strategies in steady state genetic algorithms: static environ-
ments. In: Banzhaf W, Reeves C, editors. Foundations of genetic algorithms, vol. 5. CA:
Morgan Kaufmann; 1999. p. 219–233.
106. Stephens CR, Poli R. Coarse-grained dynamics for generalized recombination. IEEE Trans
Evol Comput. 2007;11(4):541–57.
107. Stephens CR, Waelbroeck H. Schemata evolution and building blocks. Evol Comput.
1999;7:109–29.
108. Sudholt D. A new method for lower bounds on the running time of evolutionary algorithms.
IEEE Trans Evol Comput. 2013;17(3):418–35.
109. Sudholt D. How crossover speeds up building-block assembly in genetic algorithms. Evol
Comput 2016.
110. Szumlanski SR, Wu AS, Hughes CE. Conflict resolution and a framework for collaborative
interactive evolution. In: Proceedings of the 21st national conference on artificial intelligence
(AAAI), Boston, Massachusetts, USA, July 2006. p. 512–517.
111. Takagi H. Interactive evolutionary computation: fusion of the capacities of EC optimization
and human evaluation. Proc IEEE. 2001;89(9):1275–96.
112. Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Parallel differential evolution. In:
Proceedings of the IEEE congress on evolutionary computation, Portland, OR, USA, June
2004. p. 2023–2029.
113. Thomsen R, Rickers P, Krink T. A religion-based spatial model for evolutionary algorithms.
In: Proceedings of the 6th international conference on parallel problem solving from nature
(PPSN VI), Paris, France, September 2000, vol. 1917 of Lecture Notes in Computer Science.
London: Springer; 2000. p. 817–826.
114. van den Bergh F, Engelbrecht A. A cooperative approach to particle swarm optimization.
IEEE Trans Evol Comput. 2004;8(3):225–39.
115. Vose M, Liepins G. Punctuated equilibria in genetic search. Complex Syst. 1991;5:31–44.
116. Weber M, Neri F, Tirronen V. Distributed differential evolution with explorative-exploitative
population families. Genetic Progr Evol Mach. 2009;10:343–471.
117. Whitley D, Starkweather T. GENITOR II: a distributed genetic algorithm. J Exp Theor Artif
Intell. 1990;2(3):189–214.
118. Whitley D, Yoo NW. Modeling simple genetic algorithms for permutation problems. In:
Whitley D, Vose M, editors. Foundations of genetic algorithms, vol. 3. San Mateo, CA:
Morgan Kaufmann; 1995. p. 163–184.
119. Wickramasinghe W, van Steen M, Eiben A. Peer-to-peer evolutionary algorithms with adaptive
autonomous selection. In: Proceedings of the 9th annual conference on genetic and evolu-
tionary computation (GECCO), London, U.K., July 2007. p. 1460–1467.

120. Wong M-L, Cui G. Data mining using parallel multiobjective evolutionary algorithms on
graphics hardware. In: Sobrevilla P, editors. Proceedings of IEEE world congress on compu-
tational intelligence, Barcelona, Spain, July 2010. p. 3815–3822.
121. Wong M-L, Wong T-T, Fok K-L. Parallel evolutionary algorithms on graphics processing
unit. In: Proceedings of the IEEE congress on evolutionary computation, Edinburgh, UK,
Sept 2005. p. 2286–2293.
122. Xu L, Zhang F. Parallel particle swarm optimization for attribute reduction. In: Proceedings
of the 8th ACIS international conference on software engineering, artificial intelligence, net-
working, and parallel/distributed computing, Qingdao, China, July 2007, vol. 1. p. 770–775.
123. Yang Z, Tang K, Yao X. Large scale evolutionary optimization using cooperative coevolution.
Inf Sci. 2008;178(15):2985–99.
124. Yao X, Liu Y. A new evolutionary system for evolving artificial neural networks. IEEE Trans
Neural Netw. 1997;8(3):694–713.
125. Yuen SY, Cheung BKS. Bounds for probability of success of classical genetic algorithm based
on Hamming distance. IEEE Trans Evol Comput. 2006;10(1):1–18.
126. Yu Y, Zhou Z-H. A new approach to estimating the expected first hitting time of evolutionary
algorithms. Artif Intell. 2008;172(15):1809–32.
127. Zhang C, Chen J, Xin B. Distributed memetic differential evolution with the synergy of
Lamarckian and Baldwinian learning. Appl Soft Comput. 2013;13(5):2947–59.
128. Zhong W, Liu J, Xue M, Jiao L. A multiagent genetic algorithm for global numerical opti-
mization. IEEE Trans Syst Man Cybern Part B. 2004;34(2):1128–41.
129. Zhou Z, Ong YS, Nair PB, Keane AJ, Lum KY. Combining global and local surrogate models
to accelerate evolutionary optimization. IEEE Trans Syst Man Cybern Part C. 2007;37(1):
66–76.
9 Particle Swarm Optimization

PSO can locate the region of the optimum faster than EAs, but once in this region
it progresses slowly due to the fixed velocity stepsize. Almost all variants of PSO
try to solve the stagnation problem. This chapter is dedicated to PSO as well as its
variants.

9.1 Introduction

The notion of employing many autonomous particles that act together in simple ways
to produce seemingly complex emergent behavior was initially considered to solve
the problem of rendering images in computer animations [79]. A particle system
stochastically generates a series of moving points. Each particle is assigned an initial
velocity vector. It may also have additional characteristics such as color, texture, and
limited lifetime. Iteratively, velocity vectors are adjusted by some random factor. In
computer graphics and computer games, particle systems are ubiquitous and are the
de facto method for producing animated effects such as fire, smoke, clouds, gunfire,
water, cloth, explosions, magic, lighting, electricity, flocking, and many others. They
are defined by a set of points in space and a set of rules guiding their behavior
and appearance, e.g., velocity, color, size, shape, transparency, and rotation. This
decouples the creation of new complex effects from mathematics and programming.
Today, particle systems are even more popular in global optimization.
PSO originates from studies of synchronous bird flocking, fish schooling, and bees
buzzing [22,44,45,59,83]. It evolves populations or swarms of individuals called
particles. Particles work under social behavior in swarms. PSO finds the global
best solution by simply adjusting the moving vector of each particle according to
its personal best (cognition aspect) and the global best (social aspect) positions of
particles in the entire swarm at each iteration.


Compared with ant colony algorithms and EAs, PSO requires only primitive
mathematical operators, less computational bookkeeping and generally fewer lines
of code, and thus it is computationally inexpensive in terms of both memory require-
ments and speed. PSO is popular due to its simplicity of implementation and its
ability to quickly converge to a reasonably acceptable solution.

9.2 Basic PSO Algorithms

The socio-cognitive learning process of basic PSO is based on a particle’s own
experience and the experience of the most successful particle. For an optimization
problem of n variables, a swarm of N P particles is defined, where each particle is
assigned a random position in the n-dimensional space as a candidate solution. Each
particle has its own trajectory, namely position x i and velocity v i , and moves in the
search space by successively updating its trajectory. Populations of particles modify
their trajectories based on the best positions visited earlier by themselves and other
particles. All particles have fitness values that are evaluated by the fitness function
to be optimized. The particles are flown through the solution space by following the
current optimum particles. The algorithm initializes a group of particles with random
positions and then searches for optima by updating generations. In every iteration,
each particle is updated by following the two best values, namely, the particle best
pbest, denoted x i∗ , i = 1, . . . , N P , which is the best solution it has achieved so far,
and the global best gbest, denoted x g , which is the best value obtained so far by any
particle in the population. The best value for the population in a generation is a local
best, lbest.
At iteration t + 1, the swarm can be updated by [45]
v_i(t + 1) = v_i(t) + c r1 (x_i*(t) − x_i(t)) + c r2 (x_g(t) − x_i(t)),   (9.1)
x_i(t + 1) = x_i(t) + v_i(t + 1),   i = 1, . . . , N_P,   (9.2)
where the acceleration constant c > 0, and r1 and r2 are uniform random numbers
within [0, 1]. This basic PSO may lead to swarm explosion and divergence due to
lack of control of the magnitude of the velocities. This can be solved by setting a
threshold vmax on the absolute value of velocity v i .
PSO can be physically interpreted as a particular discretization of a stochastic
damped mass–spring system: the so-called PSO continuous model. From (9.1), the
velocities of particles are determined by their previous velocities, cognitive learning
(the second term), and social learning (the third term). Due to social learning, all the
particles are attracted by gbest and move toward it. The other two parts correspond to
the autonomy property, which makes particles keep their own information. Therefore,
during the search all particles move toward the region where gbest is located.
Because all particles in the swarm learn from gbest even if gbest is far from the
global optimum, particles may easily be attracted to the gbest region and get trapped in
a local optimum for multimodal problems. In case the gbest positions locate on local
minimum, other particles in the swarm may also be trapped. If an early solution is
suboptimal, the swarm can easily stagnate around it without any pressure to continue
exploration. This can be seen from (9.1). If x i (t) = x i∗ (t) = x g (t), then the velocity
update will depend only on the value of αv i (t). If their previous velocities v i (t) are
very close to zero, then all the particles will stop moving once they catch up with
the gbest particle. Even worse, the gbest point may not be a local minimum. This
phenomenon is referred to as stagnation. To avoid stagnation, reseeding or partial
restart is introduced by generating new particles at distinct places of the search space.
Almost all variants of PSO try to solve the local optimum or stagnation problem.
PSO can locate the region of the optimum faster than EAs. However, once in this
region it progresses slowly due to the fixed velocity stepsize. Linearly decreasing
weight PSO (LDWPSO) [83] effectively balances the global and local search abilities
of the swarm by introducing a linearly decreasing inertia weight on the previous
velocity of the particle into (9.1):
v_i(t + 1) = α v_i(t) + c1 r1 (x_i*(t) − x_i(t)) + c2 r2 (x_g(t) − x_i(t)),   (9.3)
where α is called the inertia weight, and the positive constants c1 and c2 are, respec-
tively, cognitive and social parameters. Typically, c1 = 2.0, c2 = 2.0, and α gradu-
ally decreases from αmax to αmin :
α(t) = αmax − (αmax − αmin) t/T,   (9.4)
T being the maximum number of iterations. One can select αmax = 1 and αmin = 0.1.
The flowchart of PSO is given by Algorithm 9.1.
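A minimal Python sketch of this procedure, using the inertia-weight update (9.3) with the schedule (9.4), is given below; the swarm size, bound handling, and velocity clamp are illustrative choices rather than part of the standard algorithm.

import numpy as np

def pso(f, lb, ub, n_particles=40, T=100, c1=2.0, c2=2.0, alpha_max=1.0, alpha_min=0.1, seed=None):
    # Minimize f over the box [lb, ub] with inertia-weight PSO, eqs. (9.2)-(9.4).
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    n = lb.size
    x = lb + (ub - lb) * rng.random((n_particles, n))       # random initial positions
    v = np.zeros((n_particles, n))
    pbest, pbest_f = x.copy(), np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_f.argmin()].copy()
    vmax = 0.2 * (ub - lb)                                   # illustrative velocity clamp
    for t in range(T):
        alpha = alpha_max - (alpha_max - alpha_min) * t / T  # eq. (9.4)
        r1 = rng.random((n_particles, n))
        r2 = rng.random((n_particles, n))
        v = alpha * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # eq. (9.3)
        v = np.clip(v, -vmax, vmax)
        x = np.clip(x + v, lb, ub)                           # eq. (9.2), kept inside the box
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, pbest_f.min()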
Center PSO [57] introduces a center particle into LDWPSO and is updated as
the swarm center at every iteration. The center particle has no velocity, but it is
involved in all operations in the same way as the ordinary particles, such as fitness
evaluation and competition for the best particle, except for the velocity calculation. All
particles oscillate around the swarm center and gradually converge toward it. The
center particle often becomes the gbest of the swarm during the run. Therefore, it
has more opportunities to guide the search of the whole swarm, and influences the
performance greatly. CenterPSO achieves not only better solutions but also faster
convergence than LDWPSO does.
PSO, DE, and CMA-ES are compared using certain fitness landscapes evolved
with GP in [52]. DE may get stuck in local optima most of the time for some problem
landscapes. However, over similar landscapes PSO will always find the global optima
correctly within a maximum time bound. DE sometimes has a limited ability to move
its population large distances across the search space if the population is clustered
in a limited portion of it.
Instead of applying inertia to the velocity memory, constriction PSO [22] applies
a constriction factor χ to control the magnitude of velocities:
v_i(t + 1) = χ {v_i(t) + ϕ1 r1 (x_i*(t) − x_i(t)) + ϕ2 r2 (x_g(t) − x_i(t))},   (9.5)

χ = 2 / |2 − ϕ − sqrt(ϕ^2 − 4ϕ)|,   (9.6)

where ϕ = ϕ1 + ϕ2 > 4.

Algorithm 9.1 (PSO).

1. Set t = 1.
Initialize each particle in the population by randomly selecting values for its position
x i and velocity v i , i = 1, . . . , N P .
2. Repeat:
a. Calculate the fitness value of each particle i.
If the fitness value of particle i is better than its best fitness value found so far, then revise x_i*(t).
b. Determine the location of the particle with the highest fitness and revise x g (t) if
necessary.
c. For each particle i, calculate its velocity according to (9.1) or (9.3).
d. Update the location of each particle i according to (9.2).
e. Set t = t + 1.
until stopping criteria are met.

With this formulation, the velocity limit vmax is no longer
necessary, and the algorithm could guarantee convergence without clamping the
velocity. It is suggested that ϕ = 4.1 (ϕ1 = ϕ2 = 2.05) and χ = 0.729 [27]. When
α = χ and ϕ1 + ϕ2 > 4, the constriction and inertia approaches are algebraically
equivalent and improved performance could be achieved across a wide range of
problems [27]. Constriction PSO has faster convergence than LDWPSO, but it is
prone to be trapped in local optima for multimodal functions.
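A quick numerical check of (9.6), for example in Python, reproduces the constant quoted above:

from math import sqrt

phi = 4.1                                             # phi = phi1 + phi2, with phi1 = phi2 = 2.05
chi = 2.0 / abs(2.0 - phi - sqrt(phi**2 - 4.0 * phi))
print(round(chi, 4))                                  # 0.7298, i.e., the value 0.729 quoted above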

9.2.1 Bare-Bones PSO

Bare-bones PSO [42], as the simplest version of PSO, eliminates the velocity equation
of PSO and uses a Gaussian distribution based on pbest and gbest to sample the
search space. It does not use the inertia weight, acceleration coefficient or velocity.
The velocity update equation (9.3) is not used and a Gaussian distribution with the
global and local best positions is used to update the particles’ positions. Bare-bones
PSO has the following update equations:
x_{i,j}(t + 1) = g_{i,j}(t) + σ_{i,j}(t) N(0, 1),   (9.7)
g_{i,j}(t) = 0.5 (x*_{i,j}(t) + x_{g,j}(t)),   (9.8)
σ_{i,j}(t) = |x*_{i,j}(t) − x_{g,j}(t)|,   (9.9)

where subscripts i, j denote the ith particle and jth dimension, respectively, N (0, 1)
is the Gaussian distribution with zero mean and unit variance. The method can be
derived from basic PSO [68]. An alternative version is to set x_{i,j}(t + 1) according to (9.7) with
50% chance, and to the previous best position x*_{i,j}(t) with 50% chance. Bare-bones
PSO still suffers from the problem of premature convergence.
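A per-particle position update following (9.7)-(9.9) can be sketched in a few lines of Python; the personal best and global best vectors are assumed to be maintained as in the earlier PSO sketch.

import numpy as np

def barebones_update(pbest_i, gbest, rng):
    # Sample the particle's new position from a Gaussian centered midway between
    # its personal best and the global best, eqs. (9.7)-(9.9), dimension by dimension.
    mean = 0.5 * (pbest_i + gbest)          # eq. (9.8)
    sigma = np.abs(pbest_i - gbest)         # eq. (9.9)
    return rng.normal(mean, sigma)          # eq. (9.7)

rng = np.random.default_rng()
print(barebones_update(np.array([0.0, 1.0]), np.array([2.0, 1.0]), rng))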

9.2.2 PSO Variants Using Gaussian or Cauchy Distribution

In basic PSO, a uniform probability distribution is used to generate random numbers
for the coefficients r1 and r2 . The use of Gaussian or Cauchy probability distributions
may improve the ability of fine-tuning or even escaping from local optima. In [24],
truncated Gaussian and Cauchy probability distributions are used to generate random
numbers for the velocity updating equation. In [80], a rule is used for moving the
particles of the swarm a Gaussian distance from the gbest and lbest. An additional
perturbation term can be introduced to the velocity updating equation as a Gaussian
mutation operator [34,85] or as a Cauchy mutation operator [29]. A Gaussian dis-
tribution is also used in a simplified PSO algorithm [42]. The velocity equation can
be updated based on the Gaussian distribution, where the constants c1 and c2 are
generated using the absolute value of the Gaussian distribution with zero mean and
unit standard deviation [51].
In [32], PSO is combined with Levy flights to get rid of local minima and improve
global search capability. Levy flight is a random walk determining stepsize using
Levy distribution. A more efficient search takes place in the search space, thanks to
the long jumps to be made by the particles. A limit value is defined for each particle,
and if the particles could not improve self-solutions at the end of current iteration,
this limit is increased. If the limit value determined is exceeded by a particle, the
particle is redistributed in the search space with Levy flight method.
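As a hedged illustration of the common ingredient in these variants, a heavy-tailed perturbation of the velocity can be written as follows; the scale parameter and the use of a standard Cauchy sample are arbitrary choices for illustration, not the settings of any particular reference above.

import numpy as np

def cauchy_perturbed_velocity(v, scale=0.1, rng=None):
    # Add a heavy-tailed Cauchy perturbation to a velocity vector; the occasional
    # long jump helps particles escape local optima.
    rng = rng or np.random.default_rng()
    v = np.asarray(v, float)
    return v + scale * rng.standard_cauchy(v.shape)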

9.2.3 Stability Analysis of PSO

In [22], the stability analysis of PSO is implemented by simplifying PSO through
treating the random coefficients as constants; this leads to a deterministic second-
order linear dynamical system whose stability depends on the system poles or the
eigenvalues of the state matrix. In [41], sufficient conditions for the stability of
the particle dynamics are derived using Lyapunov stability theorem. A stochastic
analysis of the linear continuous and generalized PSO models for the case of a
stochastic center of attraction are presented in [31]. Generalized PSO tends to the
continuous PSO, when time step approaches zero.
Theoretically, each particle in PSO is proved to converge to the weighted average
of x_i* and gbest [22,89]:

lim_{t→∞} x_i(t) = (c1 x_i* + c2 x_g) / (c1 + c2),   (9.10)
where c1 and c2 are the two learning factors in PSO.

It is shown in [15] that during stagnation in PSO, the points sampled by the leader
particle lie on a specific line. The condition under which particles stick to exploring
one side of the stagnation point only is obtained, and the case where both sides
are explored is also given. Information about the gradient of the objective function
during stagnation in PSO is also obtained.
Under the generalized theoretical deterministic PSO model, conditions for particle
convergence to a point are derived in [20]. The model greatly weakens the stagnation
assumption, by assuming that each particle’s personal best and neighborhood best
can occupy an arbitrarily large number of unique positions.
In [21], an objective function is designed for assumption-free convergence analysis
of some PSO variants. It is found that canonical particle swarm’s topology does not
have an impact on the parameter region needed to ensure convergence. The parameter
region needed to ensure convergent particle behavior has been empirically obtained
for fully informed PSO, bare-bones PSO, and standard PSO 2011.
The issues associated with PSO are the stagnation of particles in some points in
the search space, inability to change the value of one or more decision variables,
poor performance in case of small swarm, lack of guarantee to converge even to
a local optimum, poor performance for an increasing number of dimensions, and
sensitivity to the rotation of the search space. A general form of velocity update rule
for PSO proposed in [10] guarantees to address all of these issues if the user-definable
function f satisfies the two conditions: (i) f is designed in such a way that for any
input vector x in the search space, there exists a region A which contains x and f (x)
can be located anywhere in A, and (ii) f is invariant under any affine transformation.

Example 9.1: We revisit the optimization problem treated in Example 2.1. The
Easom function is plotted in Figure 2.1. The global minimum value is −1 at
x = (π, π)T .
MATLAB Global Optimization Toolbox provides a PSO solver, particleswarm.
Using the default parameter settings, the particleswarm solver can always
find the global optimum very rapidly for ten random runs over the range [−100, 100]^2.
This is because all the initial individuals, which are randomly selected in (0, 1), are
very close to the global optimum.
A fair evaluation of PSO is to set the initial population randomly from the entire
domain. We select an initial population size of 40 and other default parameters. For
20 random runs, the solver converged 19 times for a maximum of 100 generations.
For a random run, we have f (x) = −1.0000 at (3.1416, 3.1416) with 2363 function
evaluations, and all the individuals converge toward the global optimum. The evolu-
tion of a random run is illustrated in Figure 9.1. For this problem, we conclude that
the particleswarm solver outperforms ga and simulannealbnd solvers.
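For reference, the Easom objective can be written in Python and passed to the PSO sketch of Section 9.2; this is only a hedged stand-in, since the results reported above were obtained with the MATLAB particleswarm solver.

import numpy as np

def easom(x):
    # Easom function: global minimum -1 at x = (pi, pi).
    return -np.cos(x[0]) * np.cos(x[1]) * np.exp(-((x[0] - np.pi)**2 + (x[1] - np.pi)**2))

# e.g., with the inertia-weight sketch given earlier:
# xbest, fbest = pso(easom, lb=[-100, -100], ub=[100, 100], n_particles=40, T=100)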

Best: -1 Mean: -0.988523


0.2
Best fitness
0 Mean fitness

-0.2
Function value

-0.4

-0.6

-0.8

-1
0 10 20 30 40 50 60
Iteration

Figure 9.1 The evolution of a random run of PSO: the minimum and average objectives.

9.3 PSO Variants Using Different Neighborhood Topologies

A key feature of PSO is social information sharing among the neighborhood. Typical
neighborhood topologies are the von-Neumann neighborhood, gbest and lbest, as
shown in Figure 9.2. The simplest neighbor structure might be the ring structure.
Basic PSO uses gbest topology, in which the neighborhood consists of the whole
swarm, meaning that all the particles have the information of the globally found best
solution. Every particle is a neighbor of every other particle.
The lbest neighborhood has ring lattice topology: each particle generates a neigh-
borhood consisting of itself and its two or more immediate neighbors. The neighbors
may not be close to the generating particle either regarding the objective function
values or the positions, instead they are chosen by their adjacent indices.

Figure 9.2 Swarms with different social networks: lbest, gbest, and von Neumann.

For the von-Neumann neighborhood, each particle possesses four neighbors on a
two-dimensional lattice that is wrapped on all four sides (torus), and a particle is in
the middle of its four neighbors. The neighborhood size is thus fixed at four.
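Both topologies are defined purely by particle indices; a small Python sketch of the index arithmetic (illustrative only) is given below.

def ring_neighbors(i, n_particles, k=1):
    # lbest ring lattice: particle i plus its k immediate neighbors on each side.
    return [(i + d) % n_particles for d in range(-k, k + 1)]

def von_neumann_neighbors(i, rows, cols):
    # von Neumann topology: the four neighbors of particle i on a wrapped rows x cols grid.
    r, c = divmod(i, cols)
    return [((r - 1) % rows) * cols + c, ((r + 1) % rows) * cols + c,
            r * cols + (c - 1) % cols, r * cols + (c + 1) % cols]

print(ring_neighbors(0, 10))            # [9, 0, 1]
print(von_neumann_neighbors(0, 4, 5))   # [15, 5, 4, 1]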
Based on testing on several social network structures, PSO with a small neigh-
borhood tends to perform better on complex problems, while PSO with a large
neighborhood would perform better on simple problems [46,47]. The von-Neumann
neighborhood topology performs consistently better than gbest and lbest do [46].
To prevent premature convergence, in fully informed PSO [64], a particle uses
information from all its topological neighbors to update the velocity. The influence of
each particle on its neighbors is weighted based on its fitness value and the neighbor-
hood size. This scheme outperforms basic PSO. The constriction factor is adopted
in fully informed PSO, with the value ϕ being equally distributed among all the
neighbors of a particle.
Unified PSO [70] is obtained by modifying the constricted algorithm to harness the
explorative behavior of global variant and exploitative nature of a local neighborhood
variant. Two velocity updates are initially calculated and are then linearly combined
to form a unified velocity update, which is then applied to the current position.
The lbest topology is better for exploring the search space while gbest converges
faster. The variable neighborhood operator [86] begins the search with an lbest ring
lattice and slowly increases the size of the neighborhood, until the population is fully
connected.
In hierarchical PSO [38], particles are arranged in a dynamic hierarchy to define
a neighborhood structure. Depending on the quality of their pbests, the particles
move up or down the hierarchy. A good particle on the higher hierarchy has a larger
influence on the swarm. The shape of the hierarchy can be dynamically adapted.
Different behavior to the individual particles can also be assigned with respect to
their level in the hierarchy.

9.4 Other PSO Variants

Particle swarm adaptation is an optimization paradigm that simulates the ability
of human societies to process knowledge. Similar to social-only PSO [49], many
optimizing liaisons optimization [73] is a simplified PSO by not having any attraction
to the particle’s personal best position. It has a performance comparable to that of
PSO, and has behavioral parameters that are easier to tune.
Basic PSO [45] is synchronous PSO in which communication between particles is
synchronous. Particles communicate their best positions and respective objective val-
ues to their neighbors, and the neighbors do the same immediately. Hence, particles
have perfect information from their neighbors before updating their positions. For
asynchronous PSO models [1,12,50], in a given iteration, each particle updates and
communicates its memory to its neighbors immediately after its move to a new posi-
tion. Thus, the particles that remain to be updated in the same iteration can exploit the
new information immediately, instead of waiting for the next iteration as in the syn-
chronous model. In general, the asynchronous model has faster convergence speed
than synchronous PSO, yet at the cost of getting trapped by rapidly attracting all parti-
cles to a deceitful solution. Random asynchronous PSO is a variant of asynchronous
PSO where particles are selected at random to perform their operations. Random
asynchronous PSO has the best general performance in large neighborhoods, while
synchronous PSO has the best one in small neighborhoods [77].
In fitness-distance-ratio-based PSO (FDR-PSO) [74], each particle utilizes an
additional information of the nearby higher fitness particle that is selected according
to fitness–distance ratio, i.e., the ratio of fitness improvement over the respective
weighted Euclidean distance. The algorithm moves particles toward nearby particles
of higher fitness, instead of attracting each particle toward just the gbest position. This
combats the problem of premature convergence observed in PSO. Concurrent PSO
[6] avoids the possible crosstalk effect of pbest and gbest with nbest in FDR-PSO
by concurrently simulating modified PSO and FDR-PSO algorithms with frequent
message passing between them.
To avoid stagnation and to keep the gbest particle moving until it has reached a
local minimum, guaranteed convergence PSO [87] uses a different velocity update
equation for the x g particle, which causes the particle to perform a random search
around x g within a radius defined by a scaling factor. Its ability to operate with small
swarm sizes makes it an enabling technique for parallel niching solutions.
For large parameter optimization problems, orthogonal PSO [35] uses an intel-
ligent move mechanism, which applies orthogonal experimental design to adjust a
velocity for each particle by using a divide and conquer approach in determining the
next move of particles.
In [14], basic PSO and Michigan PSO are used to solve the problem of prototype
placement for nearest prototype classifiers. In the Michigan approach, a member of
the population only encodes part of the solution, and the whole swarm is the potential
solution to the problem. This reduces the dimension of the search space. Adaptive
Michigan PSO [14] uses modified PSO equations with both particle competition
and cooperation between the closest neighbors and a dynamic neighborhood. The
Michigan PSO algorithms introduce a local fitness function to guide the particles’
movement and dynamic neighborhoods that are calculated on each iteration.
Diversity can be maintained by relocating the particles when they are too close
to each other [60] or using some collision-avoiding mechanisms [8]. In [71], trans-
formations of the objective function through deflection and stretching are used to
overcome local minimizers and a repulsion source at each detected minimizer is
used to repel particles away from previously detected minimizers. This combina-
tion is able to find as many global minima as possible by preventing particles from
moving to a previously discovered minimal region.
In [30], PSO is used to improve simplex search. Clustering-aided simplex PSO
[40] incorporates simplex method to improve PSO performance. Each particle in
PSO is regarded as a point of the simplex. On each iteration, the worst particle is
replaced by a new particle generated by one iteration of the simplex method. Then,
all particles are again updated by PSO. PSO and simplex methods are performed
iteratively.

Incremental social learning is a way to improve the scalability of systems com-
posed of multiple learning agents. The incremental particle swarm optimizer [26]
has a growing population size, with the initial position of new particles being biased
toward the best-so-far solution. Solutions are further improved through a local search
procedure. The population size is increased if the optimization problem at hand can-
not be solved satisfactorily by local search alone.
Efficient population utilization strategy for PSO (EPUS-PSO) [36] adopts a popu-
lation manager to improve the efficiency of PSO. The population manager eliminates
redundant particles and recruits new ones or maintains particle numbers according to
the solution-searching status. If the particles cannot find a better solution to update
gbest, they may be trapped into the local minimum. To keep gbest updated and to find
better solutions, new particles should be added into the swarm. A maximal popula-
tion size should be predefined. The population manager will adjust population size
depending on whether the gbest has not been updated in k consecutive generations.
A mutation-like ES and two built-in sharing strategies can prevent the solutions from
falling into the local minimum.
The population size of PSO can be adapted by assigning a maximum lifetime
to groups of particles based on their performance and spatial distribution [53]. PSO
with an aging leader and challengers (ALC-PSO) [17] improves PSO by overcoming
the problem of premature convergence. The leader of the swarm is assigned with a
growing age and a lifespan, and the other individuals are allowed to challenge the
leadership when the leader becomes aged. The lifespan of the leader is adaptively
tuned according to the leader’s leading power. If a leader shows strong leading power,
it lives longer to attract the swarm toward better positions. Otherwise, it gets old and
new particles emerge to challenge and claim the leadership, bringing in diversity.
Passive congregation is an important biological force preserving swarm integrity.
It has been introduced into the velocity update equation as an additional compo-
nent [33].
In [84], PSO is improved by applying diversity to both the velocity and the popula-
tion by a predator particle and several scout particles. The predator particle balances
the exploitation and exploration of the swarm, while scout particles implement dif-
ferent exploration strategies. The closer the predator particle is to the best particle,
the higher the probability of perturbation.
Opposition-based learning can be used to improve the performance of PSO by
replacing the least-fit particle with its antiparticle. In [91], opposition-based learning
is applied to PSO, where the particle’s own position and the position opposite the
center of the swarm are evaluated for each randomly selected particle, along with a
Cauchy mutation to keep the gbest particle moving and thus avoiding its premature
convergence.
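A minimal sketch of the opposition step described above, reflecting a position about the swarm center, might be written as follows (bound handling is left to the caller):

import numpy as np

def opposite_position(x, center):
    # Reflect a particle's position about the swarm center, as in the
    # opposition-based variant described above.
    return 2.0 * np.asarray(center, float) - np.asarray(x, float)

print(opposite_position([1.0, 4.0], center=[2.0, 2.0]))   # [3. 0.]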
Animals react to negative as well as positive stimuli, e.g., an animal looking for
food is also conscious of danger. In [92], each particle adjusts its position according
to its own personal worst solution and its group’s global worst based on similar
formulae of regular PSO. This strategy outperforms PSO by avoiding those worse
areas.

Adaptive PSO [93] first performs a real-time procedure that, by evaluating the population
distribution and particle fitness, identifies one of four defined evolutionary states
(exploration, exploitation, convergence, and jumping out) in each generation.
It enables the automatic control of algorithmic parameters at run time to
improve the search efficiency and convergence speed. Then, an elitist learning strat-
egy is performed when the evolutionary state is classified as convergence state. The
strategy will act on the gbest particle to jump out of the likely local optima. Adaptive
PSO substantially enhances the performance of PSO in terms of convergence speed,
global optimality, solution accuracy, and algorithm reliability.
Chaotic PSO [2] utilizes chaotic maps for parameter adaptation which can improve
the search ability of basic PSO. Frankenstein’s PSO [25] combines a number of algo-
rithmic components such as time-varying population topology, the velocity updating
mechanism of fully informed PSO [64], and decreasing inertia weight, showing
advantages in terms of optimization speed and reliability. Particles are initially con-
nected with fully connected topology, which is reduced over time with certain pattern.
Comprehensive learning PSO (http://www.ntu.edu.sg/home/epnsugan) [55] uses
all other particles’ pbest information to update a particle’s velocity. It learns each
dimension of a particle from just one particle’s historical best information, while
each particle learns from different particles’ historical best information for different
dimensions for a few generations. This strategy helps to preserve the diversity to
discourage premature convergence. The method outperforms PSO with inertia weight
[83] and PSO with constriction factor [22] in solving multimodal problems.
Inspired by the social behavior of clan, clan PSO [13] divides the PSO population
into several clans. Each clan will first perform the search and the particle with the best
fitness is selected as the clan leader. The leaders then meet to adjust their position.
Dynamic clan PSO [7] allows particles in one clan migrate to another clan.
Motivated by the social phenomenon in which multiple good exemplars assist the
crowd to progress better, example-based learning PSO [37] employs an example set of
multiple gbest particles to update the particles' positions.
Charged PSO [8] utilizes an analogy of electrostatic energy, where some mutually
repelling particles orbit a nucleus of neutral particles. This nucleus corresponds to
a basic PSO swarm. The particles with identical charges produce a repulsive force
between them. The neutral particles allow exploitation while the charged particles
enforce separation to maintain exploration.
Random black hole PSO [95] is a PSO algorithm based on the concept of black
holes in physics. In each dimension of a particle, a black hole located nearest to
the best particle of the swarm in current generation is randomly generated and then
particles of the swarm are randomly pulled into the black hole with a probability
p. This helps the algorithm fly out of local minima, and substantially speed up the
evolution process to global optimum.
Social learning plays an important role in behavior learning among social animals.
In contrast to individual learning, social learning allows individuals to learn behaviors
from others without the cost of individual trials and errors. Social learning PSO [18]
introduces social learning mechanisms into PSO. Each particle learns from any of
the better particles (termed demonstrators) in the current swarm. Social learning
PSO adopts a dimension-dependent parameter control method. It performs well on
low-dimensional problems and is promising for solving large-scale problems as well.
In [5], agents in the swarm are categorized into explorers and settlers, which can
dynamically exchange their role in the search process. This particle task differen-
tiation is achieved through a different way of adjusting the particle velocities. The
coefficients of the cognitive and social component of the stochastic acceleration as
well as the inertia weight are related to the distance of each particle from the gbest
position found so far. This particle task differentiation enhances the local search
ability of the particles close to the gbest and improves the exploration ability of the
particles far from the gbest.
PSO lacks mechanisms that add diversity to exploration during the search process.
Inspired by the collective response behavior of starlings, starling PSO [65] introduces
such a diversity mechanism into PSO, consisting of initialization, identification of the
seven nearest neighbors, and orientation change.

9.5 PSO and EAs: Hybridization

In PSO, the particles move through the solution space via perturbations of their
positions, which are influenced by other particles, whereas in EAs, individuals breed
with one another to produce new individuals. Compared to EAs, PSO is easy to
implement and there are few parameters to adjust. In PSO, every particle remembers
its pbest and gbest, thus having a more effective memory capability than EAs have.
PSO is also more efficient in maintaining the diversity of the swarm, since all the
particles use the information related to the most successful particle in order to improve
themselves, whereas in EAs only the good solutions are saved.
Hybridization of EAs and PSO is usually implemented by incorporating genetic
operators into PSO to enhance the performance of PSO: to keep the best particles
[4], to increase the diversity, and to improve the ability to escape local minima [61].
In [4], a tournament selection process is applied to replace each poorly performing
particle’s velocity and position with those of better performing particles. In [61], basic
PSO is combined with arithmetic crossover. The hybrid PSOs combine the velocity
and position update rules with the ideas of breeding and subpopulations. The swarm
is divided into subpopulations, and a breeding operator is used within a subpopulation
or between the subpopulations to increase the diversity of the population. In [82], the
standard velocity and position update rules of PSO are combined with the concepts
of selection, crossover, and mutation. A breeding ratio is employed to determine
the proportion of the population that undergoes breeding procedure in the current
generation and the portion to perform regular PSO operation. Grammatical swarm
adopts PSO coupled to a grammatical evolution genotype–phenotype mapping to
generate programs [67].
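As a rough illustration of the breeding idea, the sketch below applies arithmetic (blend) crossover to the positions of randomly paired parents; the breeding ratio and the per-pair blend factor are illustrative assumptions rather than the exact settings of [61] or [82].

import numpy as np

def breed_positions(positions, breeding_ratio=0.2, rng=np.random):
    """Replace a fraction of the swarm's positions with arithmetic-crossover offspring (a sketch)."""
    n = positions.shape[0]
    n_parents = max(2, int(breeding_ratio * n)) // 2 * 2   # use an even number of parents
    parents = rng.permutation(n)[:n_parents]
    for p1, p2 in zip(parents[0::2], parents[1::2]):
        alpha = rng.rand()                                  # random blend factor for this pair
        child1 = alpha * positions[p1] + (1 - alpha) * positions[p2]
        child2 = alpha * positions[p2] + (1 - alpha) * positions[p1]
        positions[p1], positions[p2] = child1, child2
    return positions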
Evolutionary self-adapting PSO [63] endows a PSO scheme with an explicit selection
procedure and with self-adapting properties for its parameters. This selection
acts on the weights or parameters governing the behavior of a particle, and a particle
movement operator is introduced to generate diversity.
In [39], mutation, crossover, and elitism are incorporated into PSO. The upper
half of the best-performing individuals, known as elites, are regarded as a swarm
and enhanced by PSO. The enhanced elites constitute half of the population in the
new generation, while crossover and mutation operations are applied to the enhanced
elites to generate the other half.
AMALGAM-SO [90] implements self-adaptive multimethod search using a sin-
gle universal genetic operator for population evolution. It merges the strengths of
CMA-ES, GA, and PSO for population evolution during each generation and imple-
ments a self-adaptive learning strategy to automatically tune the number of offspring.
The method scales well with an increasing number of dimensions, converges in close
proximity to the global minimum for functions with noise-induced multimodality,
and is designed to take full advantage of the power of distributed computer networks.
Time-varying acceleration coefficients (TVAC) [78] are introduced to efficiently
control the local search and convergence to the global optimum, in addition to the
time-varying inertia weight factor in PSO. Mutated PSO with TVAC adds a perturbation
to a randomly selected modulus of the velocity vector of a random particle with a
predefined probability. Self-organizing hierarchical PSO with TVAC considers only
the social and cognitive parts and eliminates the inertia term in the velocity update
rule. Particles are reinitialized whenever they stagnate in the search space, or whenever
any component of a particle's velocity vector becomes very close to zero.
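A common realization of the TVAC schedule decreases the cognitive coefficient and increases the social coefficient linearly over the run; the endpoint values below (2.5 and 0.5) are typical choices reported in the literature, not mandatory settings.

def tvac_coefficients(t, t_max, c1_init=2.5, c1_final=0.5, c2_init=0.5, c2_final=2.5):
    """Time-varying acceleration coefficients: a sketch of the usual linear schedule."""
    frac = t / float(t_max)
    c1 = c1_init + (c1_final - c1_init) * frac   # cognitive coefficient decreases over time
    c2 = c2_init + (c2_final - c2_init) * frac   # social coefficient increases over time
    return c1, c2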

9.6 Discrete PSO

Basic PSO is applicable to optimization problems with continuous variables. A discrete
version of PSO is proposed in [43] for problems with binary-valued solution
elements. It moves the particles through the problem space by converting each
component of the velocity into the probability of the corresponding bit being in one
state or the other. The particle is composed of binary variables, and the velocity is
transformed into a change of probability.
Assume N_P particles in the population. Each particle x_i = (x_{i,1}, ..., x_{i,n})^T,
x_{i,d} ∈ {0, 1}, has n bits. As in basic PSO, each particle adjusts its velocity by using
(9.1), where c_1 r_1 + c_2 r_2 is usually limited to 4 [44]. The velocity value is then
converted into a probability that bit x_{i,d}(t) takes the value one, by generating a
threshold T_{i,d} using the logistic function

T_{i,d} = \frac{1}{1 + e^{-v_{i,d}(t)}}.        (9.11)

Generate a random number r for each bit. If r < T_{i,d}, then x_{i,d} is interpreted as 1;
otherwise, as 0. The velocity term is limited to |v_{i,d}| < V_max. To prevent T_{i,d} from
approaching 0 or 1, one can enforce V_max = 4 [44].
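A minimal sketch of one iteration of this binary PSO is given below: the velocity is updated as in basic PSO (without an inertia weight), clamped to Vmax = 4, and converted to a bit probability through the logistic threshold of (9.11); the acceleration constants and array shapes are illustrative assumptions.

import numpy as np

def binary_pso_step(x, v, pbest, gbest, c1=2.0, c2=2.0, vmax=4.0, rng=np.random):
    """One binary PSO update; x, v, pbest are (N, n) arrays, gbest is (n,)."""
    r1 = rng.rand(*x.shape)
    r2 = rng.rand(*x.shape)
    v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v = np.clip(v, -vmax, vmax)                  # keep |v| <= Vmax so T stays away from 0 and 1
    T = 1.0 / (1.0 + np.exp(-v))                 # Eq. (9.11): logistic threshold per bit
    x = (rng.rand(*x.shape) < T).astype(int)     # each bit is set to 1 with probability T
    return x, v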
Based on the discrete PSO proposed in [43], multiphase discrete PSO [3] is for-
mulated by using an alternative velocity update technique, which incorporates hill
climbing with a random step size in the search space. The particles are divided into
groups that follow different search strategies. A discrete PSO algorithm is proposed
in [56] for flowshop scheduling, where the particle and velocity are redefined, an
efficient approach is developed to move a particle to the new sequence, and a local
search scheme is incorporated.
Jumping PSO [62] is a discrete PSO inspired by the jumping of frogs. The positions x_i
of the particles jump from one solution to another; no velocity is used. Each
particle has three attractors: its own best position, the best position of its social
neighborhood, and the gbest position. A jump approaching an attractor consists of
changing a feature of the current solution by a feature of the attractor.

9.7 Multi-swarm PSOs

Multiple swarms in PSO explore the search space together to attain the objective of
finding the optimal solutions. This resembles many bird species joining to form a
flock in a geographical region, to achieve certain foraging behaviors that benefit one
another. Each species has different food preferences. This corresponds to multiple
swarms locating possible solutions in different regions of the solution space. This
is also similar to people all over the world: In each country, there is a different
lifestyle that is best suited to the ethnic culture. A species can be defined as a group
of individuals sharing common attributes according to some similarity metric.
Multi-swarm PSO is used for solving multimodal problems and combating PSO’s
tendency toward premature convergence. It typically adopts a heuristically chosen number
of swarms with a fixed swarm size throughout the search process. Multi-swarm PSO
is also used to locate and track changing optima in a dynamic environment.
Based on guaranteed convergence PSO [87], niching PSO [11] creates a subswarm
from a particle and its nearest spatial neighbor if the variance of that particle's fitness
is below a threshold. Niching PSO initially sets up subswarm leaders by training the
main swarm with basic PSO using no social information (c2 = 0). Niches are
then identified and a subswarm radius is set. As optimization progresses, particles
are allowed to join subswarms, which are in turn allowed to merge. Once the velocities
have become sufficiently small, the particles converge to their subswarm optimum.
In turbulent PSO [19], the population is divided into two subswarms: one subswarm
follows the gbest, while the other moves in the opposite direction. The
particles' positions depend on their lbest, their corresponding subswarm's
best, and the gbest collected from the two subswarms. If the gbest has not improved
for fifteen successive iterations, the worst particles of a subswarm are replaced by the
best ones from the other subswarm, and the subswarms switch their flight directions.
Turbulent PSO avoids premature convergence by replacing the velocity memory with
a random turbulence operator when a particle's velocity stagnates. Fuzzy adaptive turbulent
PSO [58] is a hybrid of turbulent PSO with a fuzzy logic controller to adaptively
regulate the velocity parameters.
Speciation-based PSO [54] uses spatial speciation for locating multiple local
optima in parallel. Each species is grouped around a dominating particle called the
species seed. At each iteration, species seeds are identified from the entire popula-
tion, and are then adopted as neighborhood bests for these individual species groups
separately. Dynamic speciation-based PSO [69] modifies speciation-based PSO to
track multiple optima in a dynamic environment: the fitness of each particle's current
lbest is compared with its previous record to continuously monitor the moving peaks,
and a predefined species population size is used to quantify the crowdedness of a
species before redundant particles are reinitialized randomly in the solution space
to search for new possible optima.
In adaptive sequential niche PSO [94], the fitness values of the particles are mod-
ified by a penalty function to prevent all subswarms from converging to the same
optima. A niche radius is not required. It can find all optimal solutions of a multimodal
function sequentially.
In [48], the swarm population is clustered into a certain number of clusters. Then,
a particle’s lbest is replaced by its cluster center, and the particles’ gbest is replaced
by the neighbors’ best. This approach has improved the diversity and exploration of
PSO. In [72], in order to solve multimodal problems, clustering is used to identify
the niches in the swarm population and then to restrict the neighborhood of each
particle to the other particles in the same cluster in order to perform a local search
for any local minima located within the clusters.
In [9], the population of particles is split into a set of interacting swarms,
which interact locally through an exclusion parameter and globally through a new anti-
convergence operator. Each swarm maintains diversity either by using charged or
quantum particles. Quantum swarm optimization (QSO) builds on the atomic pic-
ture of charged PSO, and uses a quantum analogy for the dynamics of the charged
particles. Multi-QSO uses multiple swarms [9].
In multigrouped PSO [81], N solutions of a multimodal function can be searched
with N groups. A repulsive velocity component is added to the particle update equa-
tion, which will push the intruding particles out of the other group’s gbest radius.
The predefined radius is allowed to increase linearly during the search process to
prevent several groups from settling on the same peak.
When multi-swarms are used for enhancing diversity of PSO, each swarm per-
forms a PSO paradigm independently. After some predefined generations, the swarms
will exchange information based on a diversified list of particles. Some strategies
for information exchange between two or more swarms are given in [28,75]. In [28],
two subswarms are updated independently for a certain interval, and then, the best
particles (information) in each subswarm are exchanged. In [75], swarm population
is initially clustered into a predefined number of swarms. Particles’ positions are first
updated using a PSO equation where three levels of communications are facilitated,
namely, personal, global, and neighborhood levels. At every iteration, the particles in
a swarm are divided into two sets: One set of particles is sent to another swarm, while
the other set of particles will be replaced by the individuals from other swarms [75].
Cooperative PSO [88] employs cooperative behavior among multiple swarms to
improve the performance of PSO on multimodal problems, based on the cooperative
coevolutionary GA. The decision variables are divided into multiple parts, and
different parts are assigned to different swarms for optimization. In multipopulation
cooperative PSO [66], the swarm population comprises a master swarm and multiple
slave swarms. The slave swarms explore the search space independently to maintain
the diversity of particles, while the master swarm evolves via the best particles collected
from the slave swarms [66].
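The core evaluation mechanism of cooperative PSO can be sketched as follows: a sub-vector proposed by one swarm is scored by plugging it into a context vector assembled from the other swarms' current best parts. The function names, the sphere objective, and the dimension split below are illustrative assumptions.

import numpy as np

def evaluate_part(objective, context, part_indices, candidate_part):
    """Score a sub-vector by inserting it into the shared context vector (a sketch)."""
    trial = context.copy()
    trial[part_indices] = candidate_part
    return objective(trial)

# Illustrative usage: a 6-dimensional sphere function split between two swarms.
sphere = lambda z: float(np.sum(z ** 2))
context = np.zeros(6)                        # assembled from the best parts of all swarms so far
first_swarm_dims = np.array([0, 1, 2])       # dimensions owned by the first swarm
print(evaluate_part(sphere, context, first_swarm_dims, np.array([0.5, -0.2, 0.1])))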
Coevolutionary particle swarm optimizer with parasitic behavior (PSOPB) [76]
divides the population into two swarms: a host swarm and a parasite swarm. The
parasitic behavior is mimicked from three aspects: the parasites obtaining nourishment
from the host, the host immunity, and the evolution of the parasites. With a predefined
probability, which reflects the facultative parasitic behavior, the two swarms
exchange particles according to the fitness values in each swarm. The host immunity is
mimicked in two ways: the number of exchanged particles is linearly decreased
over iterations, and particles in the host swarm can learn from the global best position
in the parasite swarm. Two mutation operators are utilized to simulate two aspects
of the evolution of the parasites. Particles with poor fitness in the host swarm are
replaced by randomly initialized particles. PSOPB outperforms eight PSO variants
in terms of solution accuracy and convergence speed.
PS2O [16] is a multi-swarm PSO inspired by the coevolution of symbiotic species
(or heterogeneous cooperation) in natural ecosystems. The interacting swarms are
modeled by constructing a hierarchical interaction topology and enhanced dynamical
update equations. Information exchange takes place not only between the particles
within each swarm, but also between different swarms. Each individual is influenced
by three attractors: its own previous best position, the best position of its neighbors
from its own swarm, and the best position of its neighboring swarms.
TRIBES [23], illustrated in Figure 9.3, is a parameter-free PSO system. The topology,
including the size of the population, evolves over time in response to performance
feedback. In TRIBES, only adaptation rules can be modified or added by the user,
while the parameters change according to the swarm behavior. The population is
divided into subpopulations called tribes, each maintaining its own order and structure.

Figure 9.3 TRIBES topology: A tribe is a fully connected network. Each tribe is linked to the others via its shaman (denoted by a black particle).

Tribes may benefit from the removal of their weakest member, or from the addition of a new
member. The best particles of the tribes are exchanged among all the tribes. Relation-
ships between particles in a tribe are similar to those defined in global PSO. TRIBES
is efficient in quickly finding a good region of the landscape, but less efficient for
local refinement.

Problems

9.1 Explain why, in basic PSO with a neighborhood structure, a larger neighborhood size
leads to faster convergence, but in fully informed PSO the opposite is true.
9.2 Apply the particleswarm solver of the MATLAB Global Optimization
Toolbox to a benchmark function. Test the influence of different parameter settings.

References
1. Akat SB, Gazi V. Decentralized asynchronous particle swarm optimization. In: Proceedings of
the IEEE swarm intelligence symposium, St. Louis, MO, USA, September 2008. p. 1–8.
2. Alatas B, Akin E, Ozer AB. Chaos embedded particle swarm optimization algorithms.
Chaos Solitons Fractals. 2009;40(5):1715–34.
3. Al-kazemi B, Mohan CK. Multi-phase discrete particle swarm optimization. In: Proceedings
of the 4th international workshop on frontiers in evolutionary algorithms, Kinsale, Ireland,
January 2002.
4. Angeline PJ. Using selection to improve particle swarm optimization. In: Proceedings of IEEE
congress on evolutionary computation, Anchorage, AK, USA, May 1998. p. 84–89.
5. Ardizzon G, Cavazzini G, Pavesi G. Adaptive acceleration coefficients for a new search diver-
sification strategy in particle swarm optimization algorithms. Inf Sci. 2015;299:337–78.
6. Baskar S, Suganthan P. A novel concurrent particle swarm optimization. In: Proceedings of
IEEE congress on evolutionary computation (CEC), Beijing, China, June 2004. p. 792–796.
7. Bastos-Filho CJA, Carvalho DF, Figueiredo EMN, de Miranda PBC. Dynamic clan particle
swarm optimization. In: Proceedings of the 9th international conference on intelligent systems
design and applications (ISDA’09), Pisa, Italy, November 2009. p. 249–254.
8. Blackwell TM, Bentley P. Don’t push me! Collision-avoiding swarms. In: Proceedings of
congress on evolutionary computation, Honolulu, HI, USA, May 2002, vol. 2. p. 1691–1696.
9. Blackwell T, Branke J. Multiswarms, exclusion, and anti-convergence in dynamic environ-
ments. IEEE Trans Evol Comput. 2006;10(4):459–72.
10. Bonyadi MR, Michalewicz Z. A locally convergent rotationally invariant particle swarm opti-
mization algorithm. Swarm Intell. 2014;8:159–98.
11. Brits R, Engelbrecht AP, van den Bergh F. A niching particle swarm optimizer. In: Proceedings
of the 4th Asia-Pacific conference on simulated evolution and learning, Singapore, November
2002. p. 692–696.
12. Carlisle A, Dozier G. An off-the-shelf PSO. In: Proceedings of workshop on particle swarm
optimization, Indianapolis, IN, USA, January 2001. p. 1–6.
13. Carvalho DF, Bastos-Filho CJA. Clan particle swarm optimization. In: Proceedings of IEEE
congress on evolutionary computation (CEC), Hong Kong, China, June 2008. p. 3044–3051.
14. Cervantes A, Galvan IM, Isasi P. AMPSO: a new particle swarm method for nearest neighbor-
hood classification. IEEE Trans Syst Man Cybern Part B. 2009;39(5):1082–91.
15. Chatterjee S, Goswami D, Mukherjee S, Das S. Behavioral analysis of the leader particle during
stagnation in a particle swarm optimization algorithm. Inf Sci. 2014;279:18–36.
16. Chen H, Zhu Y, Hu K. Discrete and continuous optimization based on multi-swarm coevolution.
Nat Comput. 2010;9:659–82.
17. Chen W-N, Zhang J, Lin Y, Chen N, Zhan Z-H, Chung HS-H, Li Y, Shi Y-H. Particle swarm
optimization with an aging leader and challengers. IEEE Trans Evol Comput. 2013;17(2):241–
58.
18. Cheng R, Jin Y. A social learning particle swarm optimization algorithm for scalable optimiza-
tion. Inf Sci. 2015;291:43–60.
19. Chen G, Yu J. Two sub-swarms particle swarm optimization algorithm. In: Advances in natural
computation, vol. 3612 of Lecture notes in computer science. Berlin: Springer; 2005. p. 515–
524.
20. Cleghorn CW, Engelbrecht AP. A generalized theoretical deterministic particle swarm model.
Swarm Intell. 2014;8:35–59.
21. Cleghorn CW, Engelbrecht AP. Particle swarm variants: standardized convergence analysis.
Swarm Intell. 2015;9:177–203.
22. Clerc M, Kennedy J. The particle swarm-explosion, stability, and convergence in a multidi-
mensional complex space. IEEE Trans Evol Comput. 2002;6(1):58–73.
23. Clerc M. Particle swarm optimization. In: International scientific and technical encyclopaedia.
Hoboken: Wiley; 2006.
24. Coelho LS, Krohling RA. Predictive controller tuning using modified particle swarm optimi-
sation based on Cauchy and Gaussian distributions. In: Proceedings of the 8th online world
conference soft computing and industrial applications, Dortmund, Germany, September 2003.
p. 7–12.
25. de Oca MAM, Stutzle T, Birattari M, Dorigo M. Frankenstein’s PSO: a composite particle
swarm optimization algorithm. IEEE Trans Evol Comput. 2009;13(5):1120–32.
26. de Oca MAM, Stutzle T, Van den Enden K, Dorigo M. Incremental social learning in particle
swarms. IEEE Trans Syst Man Cybern Part B. 2011;41(2):368–84.
27. Eberhart RC, Shi Y. Comparing inertia weights and constriction factors in particle swarm
optimization. In: Proceedings of IEEE congress on evolutionary computation (CEC), La Jolla,
CA, USA, July 2000. p. 84–88.
28. El-Abd M, Kamel MS. Information exchange in multiple cooperating swarms. In: Proceedings
of IEEE swarm intelligence symposium, Pasadena, CA, USA, June 2005. p. 138–142.
29. Esquivel SC, Coello CAC. On the use of particle swarm optimization with multimodal func-
tions. In: Proceedings of IEEE congress on evolutionary computation (CEC), Canberra, Aus-
tralia, 2003. p. 1130–1136.
30. Fan SKS, Liang YC, Zahara E. Hybrid simplex search and particle swarm optimization for the
global optimization of multimodal functions. Eng Optim. 2004;36(4):401–18.
31. Fernandez-Martinez JL, Garcia-Gonzalo E. Stochastic stability analysis of the linear continuous
and discrete PSO models. IEEE Trans Evol Comput. 2011;15(3):405–23.
32. Hakli H, Uguz H. A novel particle swarm optimization algorithm with Levy flight. Appl Soft
Comput. 2014;23:333–45.
33. He S, Wu QH, Wen JY, Saunders JR, Paton RC. A particle swarm optimizer with passive
congregation. Biosystems. 2004;78:135–47.
34. Higashi N, Iba H. Particle swarm optimization with Gaussian mutation. In: Proceedings of
IEEE swarm intelligence symposium, Indianapolis, IN, USA, April 2003. p. 72–79.
35. Ho S-Y, Lin H-S, Liauh W-H, Ho S-J. OPSO: orthogonal particle swarm optimization and its
application to task assignment problems. IEEE Trans Syst Man Cybern Part A. 2008;38(2):288–
98.
36. Hsieh S-T, Sun T-Y, Liu C-C, Tsai S-J. Efficient population utilization strategy for particle
swarm optimizer. IEEE Trans Syst Man Cybern Part B. 2009;39(2):444–56.
37. Huang H, Qin H, Hao Z, Lim A. Example-based learning particle swarm optimization for
continuous optimization. Inf Sci. 2012;182:125–38.
38. Janson S, Middendorf M. A hierarchical particle swarm optimizer and its adaptive variant.
IEEE Trans Syst Man Cybern Part B. 2005;35(6):1272–82.
39. Juang C-F. A hybrid of genetic algorithm and particle swarm optimization for recurrent network
design. IEEE Trans Syst Man Cybern Part B. 2004;34(2):997–1006.
40. Juang C-F, Chung I-F, Hsu C-H. Automatic construction of feedforward/recurrent fuzzy
systems by clustering-aided simplex particle swarm optimization. Fuzzy Sets Syst.
2007;158(18):1979–96.
41. Kadirkamanathan V, Selvarajah K, Fleming PJ. Stability analysis of the particle dynamics in
particle swarm optimizer. IEEE Trans Evol Comput. 2006;10(3):245–55.
42. Kennedy J. Bare bones particle swarms. In: Proceedings of IEEE swarm intelligence sympo-
sium, Indianapolis, IN, USA, April 2003. p. 80–87.
43. Kennedy J, Eberhart RC. A discrete binary version of the particle swarm algorithm. In: Pro-
ceedings of IEEE conference on systems, man, and cybernetics, Orlando, FL, USA, October
1997. p. 4104–4109.
44. Kennedy J, Eberhart RC. Swarm intelligence. San Francisco, CA: Morgan Kaufmann; 2001.
45. Kennedy J, Eberhart R. Particle swarm optimization. In: Proceedings of IEEE international
conference on neural networks, Perth, WA, USA, November 1995, vol. 4. p. 1942–1948.
46. Kennedy J, Mendes R. Population structure and particle swarm performance. In: Proceedings
of congress on evolutionary computation, Honolulu, HI, USA, May 2002. p. 1671–1676.
47. Kennedy J. Small worlds and mega-minds: Effects of neighborhood topology on particle swarm
performance. In: Proceedings of congress on evolutionary computation (CEC), Washington,
DC, USA, July 1999. p. 1931–1938.
48. Kennedy J. Stereotyping: improving particle swarm performance with cluster analysis. In:
Proceedings of congress on evolutionary computation (CEC), La Jolla, CA, July 2000. p.
1507–1512.
49. Kennedy J. The particle swarm: social adaptation of knowledge. In: Proceedings of IEEE
international conference on evolutionary computation, Indianapolis, USA, April 1997. p. 303–
308.
50. Koh B-I, George AD, Haftka RT, Fregly BJ. Parallel asynchronous particle swarm optimization.
Int J Numer Methods Eng. 2006;67:578–95.
51. Krohling RA. Gaussian swarm: a novel particle swarm optimization algorithm. In: Proceedings
of IEEE conference cybernetics and intelligent systems, Singapore, December 2004. p. 372–
376.
52. Langdon WB, Poli R. Evolving problems to learn about particle swarm optimizers and other
search algorithms. IEEE Trans Evol Comput. 2007;11(5):561–78.
53. Lanzarini L, Leza V, De Giusti A. Particle swarm optimization with variable population size.
In: Proceedings of the 9th international conference on artificial intelligence and soft computing,
Zakopane, Poland, June 2008, vol. 5097 of Lecture notes in computer science. Berlin: Springer;
2008. p. 438–449.
54. Li X. Adaptively choosing neighbourhood bests using species in a particle swarm optimizer for
multimodal function optimization. In: Proceedings of genetic and evolutionary computation
conference (GECCO), Seattle, WA, USA, June 2004. p. 105–116.
55. Liang JJ, Qin AK, Suganthan PN, Baskar S. Comprehensive learning particle swarm optimizer
for global optimization of multimodal functions. IEEE Trans Evol Comput. 2006;10(3):281–
95.
56. Liao C-J, Tseng C-T, Luarn P. A discrete version of particle swarm optimization for flowshop
scheduling problems. Comput Oper Res. 2007;34:3099–111.
57. Liu Y, Qin Z, Shi Z, Lu J. Center particle swarm optimization. Neurocomputing. 2007;70:672–
9.
58. Liu H, Abraham A. Fuzzy adaptive turbulent particle swarm optimization. In: Proceedings of
the 5th international conference on hybrid intelligent systems (HIS’05), Rio de Janeiro, Brazil,
November 2005. p. 445–450.
59. Loengarov A, Tereshko V. A minimal model of honey bee foraging. In: Proceedings of IEEE
swarm intelligence symposium, Indianapolis, IN, USA, May 2006. p. 175–182.
60. Lovbjerg M, Krink T. Extending particle swarm optimisers with self-organized criticality. In:
Proceedings of congress on evolutionary computation (CEC), Honolulu, HI, USA, May 2002.
p. 1588–1593.
61. Lovbjerg M, Rasmussen TK, Krink T. Hybrid particle swarm optimiser with breeding and sub-
populations. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Menlo Park, CA, USA, August 2001. p. 469–476.
62. Martinez-Garcia FJ, Moreno-Perez JA. Jumping frogs optimization: a new swarm method for
discrete optimization. Technical Report DEIOC 3/2008, Department of Statistics, O.R. and
Computing, University of La Laguna, Tenerife, Spain, 2008.
63. Miranda V, Fonseca N. EPSO—Best of two worlds meta-heuristic applied to power system
problems. In: Proceedings of IEEE congress on evolutionary computation, Honolulu, HI, USA,
May 2002. p. 1080–1085.
64. Mendes R, Kennedy J, Neves J. The fully informed particle swarm: simpler, maybe better.
IEEE Trans Evol Comput. 2004;8(3):204–10.
65. Netjinda N, Achalakul T, Sirinaovakul B. Particle swarm optimization inspired by starling flock
behavior. Appl Soft Comput. 2015;35:411–22.
66. Niu B, Zhu Y, He X. Multi-population cooperative particle swarm optimization. In: Proceedings
of European conference on advances in artificial life, Canterbury, UK, September 2005. p. 874–
883.
67. O’Neill M, Brabazon A. Grammatical swarm: the generation of programs by social program-
ming. Nat Comput. 2006;5:443–62.
68. Pan F, Hu X, Eberhart RC, Chen Y. An analysis of bare bones particle swarm. In: Proceedings
of the IEEE swarm intelligence symposium, St. Louis, MO, USA, September 2008. p. 21–23.
69. Parrott D, Li X. Locating and tracking multiple dynamic optima by a particle swarm model
using speciation. IEEE Trans Evol Comput. 2006;10(4):440–58.
70. Parsopoulos KE, Vrahatis MN. UPSO: a unified particle swarm optimization scheme. In: Pro-
ceedings of the international conference of computational methods in sciences and engineering,
2004. The Netherlands: VSP International Science Publishers; 2004. pp. 868–873.
71. Parsopoulos KE, Vrahatis MN. On the computation of all global minimizers through particle
swarm optimization. IEEE Trans Evol Comput. 2004;8(3):211–24.
72. Passaro A, Starita A. Clustering particles for multimodal function optimization. In: Proceedings
of ECAI workshop on evolutionary computation, Riva del Garda, Italy, 2006. p. 124–131.
73. Pedersen MEH, Chipperfield AJ. Simplifying particle swarm optimization. Appl Soft Comput.
2010;10(2):618–28.
74. Peram T, Veeramachaneni K, Mohan CK. Fitness-distance-ratio based particle swarm opti-
mization. In: Proceedings of the IEEE swarm intelligence symposium, Indianapolis, IN, USA,
April 2003. p. 174–181.
75. Pulido GT, Coello CAC. Using clustering techniques to improve the performance of a par-
ticle swarm optimizer. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Seattle, WA, USA, June 2004. p. 225–237.
76. Qin Q, Cheng S, Zhang Q, Li L, Shi Y. Biomimicry of parasitic behavior in a coevolutionary par-
ticle swarm optimization algorithm for global optimization. Appl Soft Comput. 2015;32:224–
40.
77. Rada-Vilela J, Zhang M, Seah W. A performance study on synchronicity and neighborhood
size in particle swarm optimization. Soft Comput. 2013;17:1019–30.
78. Ratnaweera A, Halgamuge SK, Watson HC. Self-organizing hierarchical particle swarm opti-
mizer with time-varying acceleration coefficients. IEEE Trans Evol Comput. 2004;8(3):240–
55.
79. Reeves WT. Particle systems—a technique for modeling a class of fuzzy objects. ACM Trans
Graph. 1983;2(2):91–108.
80. Secrest BR, Lamont GB. Visualizing particle swarm optimization - Gaussian particle swarm
optimization. In: Proceedings of the IEEE swarm intelligence symposium, Indianapolis, IN,
USA, April 2003. p. 198–204.
81. Seo JH, Lim CH, Heo CG, Kim JK, Jung HK, Lee CC. Multimodal function optimization
based on particle swarm optimization. IEEE Trans Magn. 2006;42(4):1095–8.
82. Settles M, Soule T. Breeding swarms: a GA/PSO hybrid. In: Proceedings of genetic and evo-
lutionary computation conference (GECCO), Washington, DC, USA, June 2005. p. 161–168.
83. Shi Y, Eberhart RC. A modified particle swarm optimizer. In: Proceedings of IEEE congress
on evolutionary computation, Anchorage, AK, USA, May 1998. p. 69–73.
84. Silva A, Neves A, Goncalves T. An heterogeneous particle swarm optimizer with predator
and scout particles. In: Proceedings of the 3rd international conference on autonomous and
intelligent systems (AIS 2012), Aveiro, Portugal, June 2012. p. 200–208.
85. Stacey A, Jancic M, Grundy I. Particle swarm optimization with mutation. In: Proceedings of
IEEE congress on evolutionary computation (CEC), Canberra, Australia, December 2003. p.
1425–1430.
86. Suganthan PN. Particle swarm optimizer with neighborhood operator. In: Proceedings of IEEE
congress on evolutionary computation (CEC), Washington, DC, USA, July 1999. p. 1958–1962.
87. van den Bergh F, Engelbrecht AP. A new locally convergent particle swarm optimizer. In: Pro-
ceedings of IEEE conference on systems, man, and cybernetics, Hammamet, Tunisia, October
2002, vol. 3. p. 96–101.
88. van den Bergh F, Engelbrecht AP. A cooperative approach to particle swarm optimization.
IEEE Trans Evol Comput. 2004;8(3):225–39.
89. van den Bergh F, Engelbrecht AP. A study of particle swarm optimization particle trajectories.
Inf Sci. 2006;176(8):937–71.
90. Vrugt JA, Robinson BA, Hyman JM. Self-adaptive multimethod search for global optimization
in real-parameter spaces. IEEE Trans Evol Comput. 2009;13(2):243–59.
91. Wang H, Liu Y, Zeng S, Li C. Opposition-based particle swarm algorithm with Cauchy muta-
tion. In: Proceedings of the IEEE congress on evolutionary computation (CEC), Singapore,
September 2007. p. 4750–4756.
92. Yang C, Simon D. A new particle swarm optimization technique. In: Proceedings of the 18th
IEEE international conference on systems engineering, Las Vegas, NV, USA, August 2005. p.
164–169.
93. Zhan Z-H, Zhang J, Li Y, Chung HS-H. Adaptive particle swarm optimization. IEEE Trans
Syst Man Cybern Part B. 2009;39(6):1362–81.
94. Zhang J, Huang DS, Lok TM, Lyu MR. A novel adaptive sequential niche technique for
multimodal function optimization. Neurocomputing. 2006;69:2396–401.
95. Zhang J, Liu K, Tan Y, He X. Random black hole particle swarm optimization and its application.
In: Proceedings on IEEE international conference on neural networks and signal processing,
Nanjing, China, June 2008. p. 359–365.
10 Artificial Immune Systems

EAs and PSO tend to converge to a single optimum and hence progressively lose
diversity. This is not the case for artificial immune systems (AISs). AISs are based on
four main immunological theories, namely, clonal selection, immune networks, neg-
ative selection, and danger theory. This chapter introduces four immune algorithms
inspired by the four immunological theories.

10.1 Introduction

Artificial immune system (AIS) is inspired by ideas gleaned from the biological
immune system. The immune system is a collection of defense mechanisms in a
living body that protects the body from disease by detecting, identifying, and killing
pathogens and tumor cells. It discriminates between the host organism’s own mole-
cules and external pathogenic molecules. It has inherent mechanisms for maintaining
and boosting the diversity of the immune repertoire.
The immune system of vertebrates protects living bodies against the invasion of
various foreign substances (called antigens or pathogens) such as viruses, harmful
bacteria, parasites and fungi, and eliminates debris and malfunctioning cells. This
job does not depend upon prior knowledge of these pathogens. The immune sys-
tem has a memory of previously encountered pathogens in the form of memory
cells. Immune response then quickly destroys the nonself-cells and stores memory
for similar intruders. This protection property, along with the distributed and self-
organized nature, has made the immune system particularly useful within computer
science and for intrusion detection.
The immune system is made up of some organs (e.g., thymus, spleen, lymph
nodes) and a huge number of cells (10^12–10^13 in a human being) of different types.
Like the neural system, the immune system has a high degree of robustness. The two

basic components of the immune system are two types of white blood cells, called
B lymphocytes (B cells) and T lymphocytes (T cells).
B lymphocytes are blood cells produced by bone marrow and migrate to the
spleen, where they mature and differentiate into mature B lymphocytes, which are
then released into the blood and lymph systems. T cells are also produced in the
bone marrow, but migrate to and mature in the thymus. Both B and T cells can
encounter antigens, proliferate and evolve, and mature into fully functional cells.
Cell-mediated immunity is mediated by T lymphocytes, and humoral immunity is
mediated by secreted antibodies produced in B lymphocytes.
Roughly 10^7 distinct types of B lymphocytes exist in a human body. B lymphocytes
generate a Y-shaped molecular structure called antibody (Ab) on their surfaces to
recognize and bind to foreign cells (antigens) or malfunctioning self-cells. After
maturation, they have B cell receptors of one specific type on their membrane, called
immunoglobulin (Ig) receptors. When a B cell encounters its specific antigen for the
first time through its Ig receptors and receives additional signals from a helper T cell,
it further differentiates into an effector cell called a plasma cell. Plasma cells, instead
of having Ig receptors, produce antibodies, which lock onto the antigens. Antibodies
bind to antigens on the surfaces of invading pathogens and trigger their destruction.
Phagocytes destroy any pathogens.
T lymphocytes regulate the production of antibodies from B lymphocytes. T cells
are produced through negative selection. They express unique T cell receptors. When
a T cell receptor encounters any antigen with major histocompatibility complex
molecule, it undergoes proliferation as well as production of memory cells.
Lymphocytes normally stay in a passive state until they encounter antigens. After
an infection, the antigen leaves a genetic blueprint memory on B or T lymphocytes
so that each lymphocyte recognizes one type of antigen. Some cloned B cells can
differentiate into B memory cells. Adaptive cells that are not stimulated by any
antigen are eliminated. This phenomenon is called immunological memory. Memory
cells circulate through the body. They live long by costimulating one another in a
way that mimics the presence of the antigen.
When exposed to an antigenic stimulus, B lymphocytes differentiate into plasma cells
that are capable of producing high-affinity antibodies for the specific antigen. These
newly cloned cells undergo high-rate somatic mutation (or hypermutation) that will
promote their genetic variation; a mechanism of selective pressure will result in the
survival of cells with increased affinity. An antibody recognizes and eliminates a
specific type of antigen. The damaged antigens are eliminated by scavenger cells
called macrophages. The immune system is required to recognize all cells (or mole-
cules) within the body as self or nonself. A simplified view of the immune system is
illustrated in Figure 10.1.
Memory can also be developed in an artificial manner by means of vaccination.
Vaccines are attenuated live virus or dead pathogenic cells that can activate the
immune system to develop resistance to particular pathogen groups. When a vaccine
is administered, the immune system detects the vaccine and develops resistance
against the pathogen in the vaccine. These memory cells recognize real pathogens
and defend the body before severe damage results.
Figure 10.1 Principle of the immune system (infect, recognize, destroy): the red shape stands for
the antigen, the blue ones for immune system detectors, and the green one denotes the antibody.
Antigens are eliminated by general-purpose scavenger cells (macrophages). Reproduced from
Figure 1 in [15].

AIS has unique characteristics of pattern recognition, self-identity, optimization,
and machine learning [6]. The adaptability of the immune system to diverse bacteria
and viruses in the environment can conceptually be formulated as a multimodal
function optimization problem, with the antibodies being points in the decision space
and the antigens being the solutions.

10.2 Immunological Theories

Four main immunological theories are clonal selection [1,3], immune networks [21],
negative selection, and danger theory [23]. The learning and memory mechanisms of
the immune system typically take clonal selection and immune network theories as a
basis, whereas the selection of detectors for identifying anomalous entities is based
on the negative selection theory. The biological immune system has the features of
immunological memory and immunological tolerance.
Artificial immune networks [2,10,29] employ two types of dynamics. The short-
term dynamics govern the concentration of a fixed set of lymphocyte clones and
the corresponding immunoglobulins. The metadynamics govern the recruitment of
new species from an enormous pool of lymphocytes freshly produced by the bone
marrow. The short-term dynamics correspond to a set of cooperating or competing
agents, while the metadynamics refine the results of the short-term dynamics. In this
sense, the short-term dynamics resemble evolution and the metadynamics resemble
learning.
Clonal Selection Theory
In clonal selection theory [3], when immune cells are stimulated by antigens, clonal
proliferation occurs; a large number of clones are generated, and then these clones
differentiate into effector cells and memory cells. Effector cells generate a large number
of antibodies, which duplicate and mutate to make affinities gradually increase and
eventually reach affinity maturation. Clonal selection theory simulates the evolution
of immune cells, which can learn and memorize the patterns of antigens.
The antibodies with good affinity value are selected as parents and are led to
proliferation by producing multiple offspring in an asexual manner (mitosis). In
immunology, cloning corresponds to asexual reproduction so that multiple identical
cells can be obtained from a parent by mitosis. These offspring are copies of parent
antibodies, but further undergo affinity maturation. An offspring replaces the parent
only if it has improved its fitness value.
Clonal selection theory describes the basic features of an immune response to an
antigenic stimulus. The clonal operation is an antibody random map induced by the
affinity, including four steps, namely clone, clonal crossover, clonal mutation, and
clonal selection.
Immune Networks
Immune network theory [21] states that the B cells are interconnected to form a
network. It is an important complement to clonal selection theory. When a B cell
is stimulated by an antigen, the stimulated cell activates other B cells in the net-
work through its paratopes [26]. Cells with close resemblance to one another are
suppressed, and new cells are generated to replace lost cells. Thus, the network can
maintain population diversity and equilibrium. B cells can edit their receptors by
randomly changing the genetic orientation of their receptors. The change may result
in higher affinity between the antigen epitope and the B cell antibody. When a B
cell is first activated, it increases in number; this stimulates the neighboring cell to
suppress the first stimulated antibody. Differential equations are designed to accom-
modate idiotypic interactions, in consideration of antigenic recognition, death of
unstimulated cells, and influx of new cells [12].
Idiotypic network theory is derived from immune network theory. It postulates
that the immune system can be seen as a network in which the interactions can not
only be between antigens and antibodies, but also between antibodies and antibodies.
This induces either stimulating or suppressive immune responses. These result in a
series of immunological behaviors, including tolerance and memory emergence.
Negative Selection
Negative selection is a way of differentiating self from nonself. The immune system
destroys all generated antibodies that are similar to self, so as to avoid self-destructive
immune responses. Negative selection is performed in the thymus, where all T cells
that recognize self-cells are excluded, whereas T cells having less affinity to self-cells
are tolerated and released to the system. The negative selection algorithm mimics
this biological process of generating mature T cells and self-/nonself-discrimination.
This allows the immune system to detect previously unseen harmful cells.
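The censoring idea can be sketched in a few lines: candidate detectors are generated at random, and only those that do not match any self sample survive; the real-valued representation, the Euclidean matching rule, and the threshold below are illustrative choices rather than a standard fixed by the theory.

import numpy as np

def generate_detectors(self_samples, n_detectors, threshold, dim, rng=np.random):
    """Negative selection sketch: keep only candidates that do not recognize any self sample."""
    detectors = []
    while len(detectors) < n_detectors:
        candidate = rng.rand(dim)                                   # random detector in [0, 1]^dim
        if np.all(np.linalg.norm(self_samples - candidate, axis=1) > threshold):
            detectors.append(candidate)                             # self-matching candidates are censored
    return np.array(detectors)

def is_nonself(sample, detectors, threshold):
    """A sample is flagged as nonself if it lies within the matching threshold of any detector."""
    return bool(np.any(np.linalg.norm(detectors - sample, axis=1) <= threshold))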
Danger Theory
Danger theory [23,24], proposed by Matzinger in 1994, argues that the objective of
the immune system is not to discriminate between self and nonself, but to react to signs
of damage to the body. It explains how the immune system is able to distinguish
nonself-antigens from self-antigens: nonself-antigens cause the body to produce
biochemical reactions that deviate from its normal patterns, and these reactions produce
danger signals of different levels. Danger theory introduces the environmental factors of the
body. It can explain some immune phenomena, such as autoimmune diseases.
Danger theory states that the immune system will only respond when damage
is indicated and is actively suppressed otherwise. The immune system is triggered
by a danger signal produced by a necrotic cell which unexpectedly dies due to a
pathogenic infection. When a cell is infected, it establishes a danger zone around
itself to mitigate and localize the impact of the attack. In principle, danger theory
views all cells in the human body as antigens. It relies on the function of dendrite
cells, a family of cells known as macrophages. In nature, dendritic cells are the
intrusion detection agents of the human body, monitoring the tissue and organs for
potential invaders in the form of pathogens.
Signals are collected by dendritic cells from their local environment. Dendritic
cells combine molecular information, interpret it for the T cells, and control the
activation state of T cells in the lymph nodes. The dendrite cell has three states,
namely the immature, semi-mature, and mature states.
The immune system produces danger signals in the form of molecules based
on the environmental changes. These molecules are released as a by-product of
unplanned cell death, necrosis. By combining the signals from the tissue, these
cells produce their own output signals to instruct the responder cells of the immune
system to deal with the potential damage. The danger signal creates a danger zone
around itself and immune cells within this zone will be activated to participate in
the immune response. Danger signals are indicators of abnormality. The PAMP
(pathogen-associated molecular pattern) signals are a class of molecules expressed
exclusively by microorganisms such as bacteria. They are processed as environmental
input and are a strong indicator that a non-host-based entity is present. Safe signals
are released as a result of healthy tissue cell function and this form of cell death is
termed apoptosis.
At the beginning of the detection process, the dendrite cells are initially immature
cells in thymus. The dendrite cell collects the body cell protein paired with its three
signals in cell tissue. Based on the collected input, the dendrite cell will evolve from
being immature into either a semi-mature (apoptotic death) or a mature state (necrotic
death). At this phase, the dendrite cell is migrated from cell tissue to lymph node.
Reaching a mature state indicates that the cell has experienced more danger signals
throughout its life span, and that the harmful antigen has been detected and a danger
zone will be released. In a mature state, T cells are activated to release antibody. A
semi-mature state indicates that apoptotic death has occurred as part of normal cell
function, and the semi-mature dendrite cells cannot activate T cells and they are
tolerized to the presented antigen.
10.3 Immune Algorithms

An AIS incorporates many properties of natural immune systems of vertebrates,
including diversity, distributed computation, error tolerance, dynamic learning and
adaptation, and self-monitoring. The immune system distinguishes between dan-
gerous and nondangerous pathogens through learning. In general, clonal selection
principle is utilized to design the immune algorithms due to its self-organizing and
learning capability.
AIS is very similar to GA. The antigen/pathogen is defined as the problem to opti-
mize, and antibodies are candidate solutions to the problem. Qualities of candidate
solutions correspond with affinities between antibodies and antigens. The process of
seeking feasible solutions is the process of immune cells recognizing antigens and
making immune responses in the immune system. Nonself-antigens are constraints.
In a way similar to EAs, immune algorithms typically evolve solutions by repeatedly
applying a cloning, mutation, and selection cycle to a population of candidate solutions
and retaining good solutions in the population. The antigen can also be defined as the
pattern to be recognized or the training data.

10.3.1 Clonal Selection Algorithm

Clonal selection algorithm (CLONALG) [10] simulates the activation process of
immune cells. It searches for the global optimal solutions through the cloning and
high-frequency variation of immune cells that can recognize antigens. CLONALG
imitates the learning and affinity maturation processes deployed by immune (B)
cells. Antibodies are used to represent a variety of immune cells. The algorithm
is formulated based on clonal selection principle and affinity maturation process
of adaptive immune response, and exploits the clonal selection, proliferation, and
differentiation features to design a selection method in a way to reproduce good
solutions and replace weak solutions. Affinity maturation corresponds to learning
from new patterns. The solutions are justified by their affinity toward the antigens,
which can be treated as fitness. CLONALG introduces suppressor cells to change the
search scope and memory cells to keep the candidate solutions.
Antibodies’ affinity to an antigen is first tested. Only those lymphocytes that
recognize the antigens are selected to proliferate. The selected lymphocytes are
subject to an affinity maturation process, which improves their affinity to the antigens.
Antibodies with higher affinity are then cloned and hypermutated to attain an even
better affinity. Hypermutation helps antibodies in the cloned set to exploit their
local area in the decision space. Antibodies with low affinity will be eliminated and
replaced by new antibodies, or they will undergo receptor editing in an attempt to
improve their affinity.
Learning involves raising the relative population size and affinity of those lym-
phocytes. Immune algorithm first recognizes the antigen, and produces antibodies
from memory cells. Then it calculates the affinity between antibodies. Antibodies
are dispersed to the memory cell and the concentration of antibodies is controlled
by stimulating or suppressing antibodies. A diversity of antibodies for capturing
unknown antigens is generated using genetic operators.
CLONALG is described in Algorithm 10.1.

Algorithm 10.1 (CLONALG).

1. Set t = 0.
2. Initialize a population P of cells (antibodies) and a set of memory M = ∅.
3. Repeat:
   a. Selection. Select the n best cells (antibodies) to generate a new population Pn
      according to the affinity principle.
   b. Cloning. Reproduce a population of clones C from the population Pn. More
      offspring are produced for higher affinity cells.
   c. Maturation. Hypermutate the cells to create the population C*.
   d. Reselection. Reselect the improved cells from C* and update the memory set M.
   e. Diversity introduction. Replace d cells in P with newly generated cells.
   f. Set t = t + 1.
   until the termination criterion is satisfied.
In Step 3c, a lower mutation rate Pm is assigned to higher affinity cells and vice
versa. This keeps high-affinity cells close to a local optimum, while cells far from
an optimum take larger steps toward other regions of the affinity landscape. In Step 3e,
the lower affinity cells have a higher probability of being replaced.
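A compact real-valued sketch of the CLONALG cycle (selection, rank-proportional cloning, affinity-inverse hypermutation, reselection, and diversity introduction) is given below. The real-valued coding, the Gaussian hypermutation, and the scaling constants are illustrative choices rather than the reference binary implementation; the parameter values mirror those used later in Example 10.1.

import numpy as np

def clonalg(fitness, dim, bounds, pop_size=50, n_best=20, beta=0.7, d=5, iters=100, rng=np.random):
    """A minimal real-valued CLONALG sketch for minimization (not the reference implementation)."""
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    for _ in range(iters):
        fit = np.apply_along_axis(fitness, 1, pop)
        order = np.argsort(fit)                                    # Step 3a: rank antibodies by affinity
        for rank, idx in enumerate(order[:n_best], start=1):
            n_clones = max(1, int(round(beta * pop_size / rank)))  # Step 3b: more clones for better ranks
            # Step 3c: hypermutation scale shrinks with affinity (smaller steps for better antibodies)
            scale = 0.1 * (hi - lo) * np.exp(-(n_best - rank) / float(n_best))
            clones = np.clip(pop[idx] + rng.normal(0.0, scale, size=(n_clones, dim)), lo, hi)
            clone_fit = np.apply_along_axis(fitness, 1, clones)
            best = np.argmin(clone_fit)
            if clone_fit[best] < fit[idx]:                         # Step 3d: reselect an improved clone
                pop[idx], fit[idx] = clones[best], clone_fit[best]
        # Step 3e: diversity introduction, replacing the d worst antibodies with random ones
        pop[order[-d:]] = rng.uniform(lo, hi, size=(d, dim))
    fit = np.apply_along_axis(fitness, 1, pop)
    return pop[np.argmin(fit)], float(np.min(fit))

# Illustrative usage on the two-dimensional Rastrigin function of Example 10.1.
rastrigin = lambda x: 10 * len(x) + float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))
print(clonalg(rastrigin, dim=2, bounds=(-5.12, 5.12)))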
For solving multimodal problems, multiple optima need to be located within
a single population of antibodies. In this case, all the n antibodies from Pn are
selected for cloning, and affinity-proportionate cloning is not necessarily applicable.
Each antibody is viewed locally and has the same clone size as the others.
The antigenic affinity is only used to determine the hypermutation rate for each
antibody, with lower rates for higher affinity.
CLONALG is similar to mutation-based EAs and has good features for optimiza-
tion and search. There are some major differences between CLONALG and GA.
Inspired by the immune system, CLONALG performs proportionate selection and
affinity inversely proportional hypermutation, but no crossover operation. CLON-
ALG has a unique clone step. It has an elitism mechanism. It uses binary repre-
sentation in antibodies. CLONALG is capable of allocating multiple optima and
maintaining local optimal solutions, while GA tends to converge the whole popu-
lation toward the best candidate solution. CLONALG can be regarded as a parallel
version of a (1 + round(β N_P))-ES with adaptive mutation control, where β is a clone
factor and N_P is the population size.
CLONALG favors only high-affinity antibodies, making it suitable for high-peak
problems [10]. In CLONALG, the algorithmic parameters need to be specifically
defined by users. These parameters are evolved in [16].
In [31], antibody clustering is introduced in the clonal selection process. A single
population is expanded to a multipopulation by performing antibody clustering.
Competition selection is introduced in each subpopulation. The current best antibody
that has the maximum affinity is used to represent the cluster center in the elite set.
Gaussian mutation is used to promote local search, whereas Cauchy mutation is used
to explore new search areas.
Aging is also used in AISs. In the static pure aging scheme [5], search points are
associated with an individual age and the age is increased by 1 in each round. The
offspring inherits the age of its parent and is only assigned age 0 if its fitness is strictly
larger than its parent’s fitness. This aging scheme gives an equal opportunity to each
improving new search point to effectively explore the landscape. The performance
improvement of AIS with aging can be achieved when aging is replaced by an
appropriate restart strategy.
Inspired by EDAs, Bayesian AIS [7] uses a Bayesian network as the probabilistic
model to replace the mutation and cloning operators for generating new antibodies.
In Bayesian AIS, the initial population is generated at random. From the current
population, the best solutions are selected. A Bayesian network that properly fits the
selected antibodies is constructed. A number of new antibodies sampled from the
network are then inserted into the population and those similar to and with lower
fitness than selected ones are eliminated. A few individuals are generated randomly
and inserted into the population in order to favor diversity. Bayesian AIS performs
multimodal optimization and dynamically adjusts the size of the population according
to the problem.
Vaccine-enhanced AIS is designed for multimodal function optimization [28,30].
Vaccines are emulated to promote exploration in the search space. In [28], the points
randomly initialized in the decision space are considered as antibodies, whereas
points generated in a special manner to explore new areas are treated as weak antigens
that could be used for vaccination to enhance exploration in the decision space.
Multiple subpopulations are produced and undergo parallel search in all subspaces
by performing mutation and selection in each subpopulation. Similar antibodies
are eliminated, retaining those with better affinity. In [30], the decision space is
first divided into equal subspaces. The vaccine is then randomly extracted from each
subspace. A few of these vaccines, in the form of weakened antigens, are then injected
into the algorithm to enhance the exploration of global and local optima.
The immune mechanism can also be defined as a genetic operator and integrated
into GA [22]. The immune operator overcomes the blindness in the action of crossover
and mutation and makes the fitness of the population increase steadily. Composed of the
vaccination and immune selection operations, it utilizes reasonably selected vaccines
to intervene in the variation of genes in an individual chromosome.
The immune system is useful for maintaining diversity in the population of a GA used
to solve multimodal optimization problems [27]. The main idea is to construct a
population of antigens and a population of antibodies. Matching of an antibody and
an antigen is determined if their bit strings are complementary. Antibodies are then
matched against antigens and a fitness value is assigned to each antibody based on this
matching. Finally, simple GA is used to replicate the antibodies that better match
the antigens present. In this model, GA must discover a set of pattern-matching
antibodies that effectively match a set of antigen patterns. In this way, GA can
automatically evolve and sustain a diverse, cooperative population. This effect is
similar to fitness sharing in GA.

Example 10.1: Revisit the Rastrigin function treated in Example 6.1 and Example 14.1.
The global optimum is f(x) = 0 at x* = 0.
We implement CLONALG with the following parameters: the population size is set
as 50, the best population size as 20, the clone size factor as 0.7, and the maximum
number of iterations as 100. The initial population is randomly generated from the
entire domain. For a random run, we have the optimum solution f(x) = 1.2612 × 10^{-6}
at (−0.1449 × 10^{-4}, −0.7840 × 10^{-4}). For 10 random runs, the solver always
converged toward a point very close to the global optimum. The evolution of a random
run is illustrated in Figure 10.2. The average cost is the mean of the 20 best solutions,
and it is very close to the optimum solution. CLONALG has very good diversity,
since the clone and mutation operations are applied to the 20 best solutions in
each generation. It continuously searches for the global optimum even after many
iterations.

Figure 10.2 The evolution of a random run of CLONALG for the Rastrigin function: the minimum
and average objective values (fitness value versus iteration, logarithmic scale).
10.3.2 Artificial Immune Network

The aiNet (artificial immune network) [9] combines CLONALG with immune net-
work theory for solving optimization problems. It is a connectionist, competitive and
constructive network, where the antibodies correspond to the network nodes and the
antibody concentration and affinity are their states. Learning is responsible for the
changes in antibody concentration and affinity. The decision as to which node is to
be cloned, suppressed, or maintained depends on the interaction established by the
immune network using an affinity measure. Learning aims at building a memory set
that recognizes and represents the antigenic spatial distribution. The nodes work as
internal images of ensembles of patterns, and the connection strengths describe the
similarities among these ensembles.
Optimized aiNet (opt-aiNet) [8] adapts aiNet for multimodal optimization prob-
lems by locating and maintaining a memory of multiple optimal solutions. It can
dynamically adjust the population size and maintain stable local optima solutions.
Opt-aiNet represents cells by real-valued vectors in the search space. The initial
population goes through fitness evaluation, cloning, and mutation operations. After
these operations, fitter antibodies from each clone are selected and passed to form
the memory set. This process is repeated until the available population stabilizes
in the local search. When this population reaches a stable state, the cells interact with one another in a network form, and cells that are too similar to one another are eliminated to avoid redundancy: the affinity between two cells is measured by their Euclidean distance, and a cell is suppressed when its distance to another cell is less than the suppression threshold. Afterward, new antibodies are introduced to the system to encourage exploration in the decision space. Opt-aiNet delegates the selection process to the clone level by selecting the best individual from each clone. Roughly, the computational complexity of the algorithm is quadratic in the number of cells in
the network. Opt-aiNet algorithm is described in Algorithm 10.2.
As a member of aiNet family, omni-aiNet [4] presents self-maintenance of diver-
sity in the population, simultaneous search for multiple high-quality solutions, and
dynamical adjustment of its population by adapting to the optimization problem.
The dopt-aiNet algorithm [11] enhances the diversity of the population, and refines
individuals of solutions to suit dynamic optimization. It introduces golden section
line search procedure for choosing the best step size of mutation, and two muta-
tion operators, namely, one-dimensional mutation and gene duplication, are used.
Danger theory-based immune network algorithm [32], named dt-aiNet, introduces
danger theory into aiNet algorithm in order to increase the solution quality and the
population diversity.

Algorithm 10.2 (opt-aiNet).

1. Set t = 0.
2. Initialize a population P with N cells.
Initialize Nc , Ns , σs .
3. Repeat:
a. for each cell:
Generate Nc clones.
Mutate the clones.
Determine the fitness of each clone.
Select the best cell among the clones and parent cell to form the new population.
end for
b. Determine the average fitness of the new population.
c. if clone suppression should be made (t mod Ns == 0)
Determine the affinity (distance) among all cells.
Suppress cells according to threshold σs .
Introduce randomly generated cells.
end if
d. Set t = t + 1.
until termination criterion is satisfied.
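Step 3c of Algorithm 10.2 (network suppression) can be sketched as follows in Python. The convention that the fitter of two redundant cells is retained, and all identifiers, are our assumptions for illustration.

import numpy as np

def suppress(cells, fitness, sigma_s):
    # Network suppression step of opt-aiNet: among cells closer than the
    # suppression threshold sigma_s (Euclidean distance), keep only the fitter one.
    cells = np.asarray(cells, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    keep = np.ones(len(cells), dtype=bool)
    for i in range(len(cells)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(cells)):
            if keep[j] and np.linalg.norm(cells[i] - cells[j]) < sigma_s:
                # the worse of the two redundant cells is eliminated
                if fitness[i] >= fitness[j]:
                    keep[j] = False
                else:
                    keep[i] = False
                    break
    return cells[keep], fitness[keep]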

10.3.3 Negative Selection Algorithm

Negative selection algorithm [14] is inspired from the negative selection mecha-
nism with the ability to detect unknown antigens. An efficient implementation of the algorithm (for binary strings) runs in time linear in the number of self input patterns [14].
At the beginning, the algorithm treats the profiled normal patterns as self-patterns, which represent the typical properties of the data stream to be protected. Then, it generates a number of random patterns (called detectors) and compares each of them to every self-pattern to check whether it recognizes a self-pattern. If a detector matches a self-pattern, it is discarded; otherwise it is kept as a detector pattern. This process is repeated until sufficient detectors are accumulated. In the monitoring phase, if a detector pattern matches any newly profiled pattern, an anomaly must have occurred, since the data have been corrupted or altered. It is, however, difficult to determine a set of detectors that covers all the data to be protected.
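A minimal Python sketch of the censoring and monitoring phases is given below, using the r-contiguous-bits matching rule as one possible matching rule for binary strings; the function names and parameters are illustrative assumptions.

import random

def r_contiguous_match(a, b, r):
    # True if bit strings a and b agree in at least r contiguous positions.
    run = best = 0
    for x, y in zip(a, b):
        run = run + 1 if x == y else 0
        best = max(best, run)
    return best >= r

def generate_detectors(self_set, n_detectors, length, r, rng=None):
    # Censoring phase: random candidates that match any self pattern are discarded.
    rng = rng or random.Random(0)
    detectors = []
    while len(detectors) < n_detectors:
        cand = ''.join(rng.choice('01') for _ in range(length))
        if not any(r_contiguous_match(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def monitor(sample, detectors, r):
    # Monitoring phase: an activated detector signals an anomaly (non-self).
    return any(r_contiguous_match(sample, d, r) for d in detectors)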
The negative selection algorithm has been applied to anomaly detection, such as
detecting computer security in computer networks [14,15]. In [20], AIS is applied
to computer security in the form of a network intrusion detection system.
Receptor density algorithm [25] is an AIS developed from models of the immuno-
logical T cell and the T-cell receptor’s ability to contribute to T-cell discrimination. It
is an anomaly detection system for generation of clean signatures. Stochastic analy-
sis of the T-cell mechanism modeling results in a hypothesis for T-cell activation,
which is abstracted to a simplified model retaining key mechanisms. The algorithm
places a receptor at each discretized location within the spectrum. At each time step,
each receptor takes an input and produces a binary classification on whether that
location is considered anomalous.

10.3.4 Dendritic Cell Algorithm

Danger theory provides inspiration for a robust, highly distributed, adaptive, and
autonomous detection mechanism for early outbreak notification with excellent
detection results. Dendritic cell algorithm [17,19] is a population-based algorithm
inspired by the function of the dendritic cells of the human immune system. It incor-
porates the principles of danger theory in immunology. The algorithm is a multi-
sensor data fusion and correlation algorithm that can perform anomaly detection on
time series datasets.
Dendritic cell algorithm does not require a training phase and knowledge of nor-
mality and anomaly is acquired through statistical analysis. It has a linear computa-
tional complexity, making it ideal for anomaly detection tasks, which require high
detection speed. Dendritic cell algorithm has shown a high detection rate and a low
rate of false alarms.
Each dendritic cell in the population has a set of instructions which is followed
each time a dendritic cell is updated. Each dendritic cell performs its own antigen
sampling and signal collection. It is capable of combining multiple data streams and
can add context to data suspected as anomalous. Diversity is generated by migration
of the dendritic cells. Each dendritic cell can perform fusion of signal input to produce
its own signal output. The assessment of the signal output of the entire population is
used to perform correlation with suspect data items.
In dendritic cell algorithm, three types of signals are used. PAMP signal is a con-
fident indicator of anomaly. Danger signal is an indicator of a potential abnormality.
Safe is a confident indicator of normal, predictable, or steady-state system behavior.
Predefined weights are incorporated for each signal category. The output signals are
used to evaluate the status of the monitored system. By defining the danger zone
to calculate danger signals for each antibody, the algorithm adjusts antibodies’ con-
centrations through its own danger signals and then triggers immune responses of
self-regulation.
The input data is mapped to the underlying problem domain. Signals are rep-
resented as vectors of real-valued numbers. Antigens are categorical values repre-
senting what are to be classified within a problem domain. The algorithm aims to
incorporate a relationship to identify antigens that are responsible for the anomalies
reflected by signals. The algorithm first identifies whether anomalies occurred in
the past based on the input data. Then it correlates the identified anomalies with the
potential causes, generating an anomaly scene per suspect.
The dendrite cell acts as an agent that is responsible for collecting antigen coupled with its three context signals. The antigen represents a record contained in the dataset, and the signals represent the normalized values of the selected attributes. Each
dendrite cell accumulates the changes that occur in the monitored system and deter-
mines which antigen causes the changes. All input signals are transformed into three
output signals, namely, the immature (co-stimulatory molecules), mature, and semi-mature states:

O_j(x) = \frac{\sum_{i=0}^{3} W_{ij} I_{ij}(x)}{\sum_{i=0}^{3} |W_{ij}|},     (10.1)

where W = [Wi j ] is the weight matrix, I = [Ii j ] is the input signal matrix, O is the
output signal vector, i is the input signal category, and j is the output signal category.
The dendrite cell samples input signals and antigens multiple times. This is analo-
gous to sampling a series of suspected antigens in human body such that the dendrite
cell will hold several antigens until it matures. Throughout the sampling process, each cell accumulates experience, which is documented in the immature (O1), mature (O2), and semi-mature (O3) output signals. The sam-
pling process stops when the cell is ready to migrate. This occurs when O1 reaches
the migration threshold and the cell is then removed from the population for anti-
gen presentation. After migration, the outputs O2 and O3 are compared in order to
derive a context for the presented item. The antigen is treated mature if O2 > O3
or semi-mature if O2 < O3 . Then, the migrated dendrite cell is replaced with a new
cell to restart sampling and return to the population. This process is iterated several
times.
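The signal-processing and context-assignment steps described above can be sketched as follows in Python. The weight matrix entries and the migration threshold are placeholders rather than published values, and for simplicity the same input signal vector is applied to every output category in Eq. (10.1).

import numpy as np

# Illustrative weight matrix W[i, j]: rows are the input signal categories
# (PAMP, danger, safe, plus a fourth optional category), columns are the
# output signals (co-stimulatory O1, mature O2, semi-mature O3).
W = np.array([[ 2.0,  2.0,  1.0],
              [ 1.0,  1.0,  0.5],
              [ 2.0, -3.0,  3.0],
              [ 1.0,  1.0,  1.0]])

def fuse_signals(I, W=W):
    # Weighted fusion of Eq. (10.1), applied with the same input vector I
    # for every output category j.
    return (W * I[:, None]).sum(axis=0) / np.abs(W).sum(axis=0)

def present_context(signal_samples, migration_threshold=5.0):
    # Accumulate the three outputs over sampled signal vectors until O1 reaches
    # the migration threshold, then compare O2 and O3 to derive the context.
    O = np.zeros(3)
    for I in signal_samples:
        O += fuse_signals(np.asarray(I, dtype=float))
        if O[0] >= migration_threshold:
            break
    return "mature" if O[1] > O[2] else "semi-mature"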
A prototype dendritic cell algorithm [17] has been applied to a binary classification
problem which can perform two-class discrimination on an ordered dataset, using
a time stamp as antigen and a combination of features forming the three signal
categories. Deterministic dendritic cell algorithm [18] provides a controllable system
by removing a large amount of randomness from the algorithm.

Problems

10.1 Compare aiNet with (μ + λ)-ES for optimization.


10.2 List the major features of the four theories of immunology.

References
1. Ada GL, Nossal GJV. The clonal selection theory. Sci Am. 1987;257(2):50–7.
2. Atlan H, Cohen IR. Theories of immune networks. Berlin: Springer; 1989.
3. Burnet FM. The clonal selection theory of acquired immunity. Cambridge, UK: Cambridge
University Press; 1959.
4. Coelho GP, Von Zuben FJ. Omni-aiNet: an immune-inspired approach for omni optimiza-
tion. In: Proceedings of the 5th international conference on artificial immune systems, Oeiras,
Portugal, Sept 2006. p. 294–308.
5. Cutello V, Nicosia G, Pavone M. An immune algorithm with stochastic aging and Kullback
entropy for the chromatic number problem. J Combinator Optim. 2007;14(1):9–33.
6. Dasgupta D. Advances in artificial immune systems. IEEE Comput Intell Mag. 2006;1(4):40–9.
7. de Castro PAD, Von Zuben FJ. BAIS: a Bayesian artificial immune system for the effective
handling of building blocks. Inf Sci. 2009;179(10):1426–40.
8. de Castro LN, Timmis J. An artificial immune network for multimodal function optimization.
In: Proceedings of IEEE congress on evolutionary computation, Honolulu, HI, USA, May
2002, vol. 1, p. 699–704.

9. de Castro LN, Von Zuben FJ. aiNet: an artificial immune network for data analysis. In: Abbass
HA, Sarker RA, Newton CS, editors. Data mining: a heuristic approach. Hershey, USA: Idea
Group Publishing; 2001. p. 231–259.
10. de Castro LN, Von Zuben FJ. Learning and optimization using the clonal selection principle.
IEEE Trans Evol Comput. 2002;6(3):239–51.
11. de Franca FO, Von Zuben FJ, de Castro LN. An artificial immune network for multimodal
function optimization on dynamic environments. In: Proceedings of genetic and evolutionary
computation conference (GECCO), Washington, DC, USA, June 2005. p. 289–296.
12. Engelbrecht AP. Computational intelligence: an introduction. New York: Wiley; 2007.
13. Ferreira C. Gene expression programming: a new adaptive algorithm for solving problems.
Complex Syst. 2001;13(2):87–129.
14. Forrest S, Perelson AS, Allen L, Cherukuri R. Self-nonself discrimination in a computer. In:
Proceedings of IEEE symposium on security and privacy, Oakland, CA, USA, May 1994. p.
202–212.
15. Forrest S, Hofmeyr SA, Somayaji A. Computer immunology. Commun ACM. 1997;40(10):88–
96.
16. Garret SM. Parameter-free, adaptive clonal selection. In: Proceedings of IEEE congress on
evolutionary computation (CEC), Portland, OR, June 2004. p. 1052–1058.
17. Greensmith J, Aickelin U. Dendritic cells for SYN scan detection. In: Proceedings of genetic
and evolutionary computation conference (GECCO), London, UK, July 2007. p. 49–56.
18. Greensmith J, Aickelin U. The deterministic dendritic cell algorithm. In: Proceedings of the
7th International conference on artificial immune systems (ICARIS), Phuket, Thailand, August
2008. p. 291–303.
19. Greensmith J, Aickelin U, Cayzer S. Introducing dendritic cells as a novel immune-inspired
algorithm for anomaly detection. In: Proceedings of the 4th international conference on artificial
immune systems (ICARIS), Banff, Alberta, Canada, Aug 2005. p. 153–167.
20. Hofmeyr SA, Forrest S. Architecture for an artificial immune system. Evol Comput.
2000;8(4):443–73.
21. Jerne NK. Towards a network theory of the immune system. Annales d’Immunologie (Paris).
1974;125C:373–89.
22. Jiao L, Wang L. A novel genetic algorithm based on immunity. IEEE Trans Syst Man Cybern
Part A. 2000;30(5):552–61.
23. Matzinger P. Tolerance, danger and the extended family. Annu Rev Immunol. 1994;12:991–
1045.
24. Matzinger P. The danger model: a renewed sense of self. Science. 2002;296(5566):301–5.
25. Owens NDL, Greensted A, Timmis J, Tyrrell A. T cell receptor signalling inspired kernel
density estimation and anomaly detection. In: Proceedings of the 8th international conference
on artificial immune systems (ICARIS), York, UK, Aug 2009. p. 122–135.
26. Perelson AS. Immune network theory. Immunol Rev. 1989;110:5–36.
27. Smith RE, Forrest S, Perelson AS. Population diversity in an immune system model: implica-
tions for genetic search. In: Whitley LD, editor. Foundations of genetic algorithms, vol. 2. San
Mateo, CA: Morgan Kaufmann Publishers; 1993. p. 153–165.
28. Tang T, Qiu J. An improved multimodal artificial immune algorithm and its convergence
analysis. In: Proceedings of world congress on intelligent control and automation, Dalian,
China, June 2006. p. 3335–3339.
29. Varela F, Sanchez-Leighton V, Coutinho A. Adaptive strategies gleaned from immune networks:
Viability theory and comparison with classifier systems. In: Goodwin B, Saunders PT, editors.
Theoretical biology: epigenetic and evolutionary order (a Waddington Memorial Conference).
Edinburgh, UK: Edinburgh University Press; 1989. p. 112–123.
30. Woldemariam KM, Yen GG. Vaccine-enhanced artificial immune system for multimodal func-
tion optimization. IEEE Trans Syst Man Cybern Part B. 2010;40(1):218–28.

31. Xu X, Zhang J. An improved immune evolutionary algorithm for multimodal function opti-
mization. In: Proceedings of the 6th international conference on natural computing, Haikou,
China, Aug 2007. p. 641–646.
32. Zhang R, Li T, Xiao X, Shi Y. A danger-theory-based immune network optimization algorithm.
Sci World J. 2013; Article ID 810320, 13 p.
11 Ant Colony Optimization

Ants are capable of finding the shortest path between the food and the colony using
a pheromone-laying mechanism. ACO is a metaheuristic optimization approach
inspired by this foraging behavior of ants. This chapter is dedicated to ACO.

11.1 Introduction

Eusociality has evolved independently among the hymenoptera insects (ants and
bees), and among the isoptera insects (termites). These two orders of social insects
have almost identical social structures: populous colonies consisting of sterile work-
ers, often differentiated into castes that are the offspring of one or a few reproductively
competent individuals. This type of social structure is similar to a superorganism, in
which the colony has many attributes of an organism, including physiological and
structural differentiation, coordinated and goal-directed action.
Many ant species exhibit collective foraging behavior. Two contrasting strategies of ponerine ants are the army-ant-style foraging of the genus Leptogenys and the partitioned space search of Pachycondyla apicalis.
Termite swarms are organized through a complex language of tactile and chem-
ical signals between individual members. These drive the process of recruitment in
response to transient perturbation of the environment. A termite can either experience
a perturbation directly, or is informed of it by other termites. The structures as well
as their construction of the mound of Macrotermes have been made clear in [22].
Swarm cognition in these termites is in the form of extended cognition, whereby
the swarm’s cognitive abilities arise both from interaction among agents within a
swarm, and from the interaction of the swarm with the environment, mediated by
the mound’s dynamic architecture.


Ants are capable of finding the shortest path between the food and the colony (nest)
due to a simple pheromone-laying mechanism. Inspired by the foraging behavior of
ants, ACO is a metaheuristic approach for solving discrete or continuous optimization
problems [1,2,4–6]. Unlike in EAs, PSO and multiagent systems where agents do
not communicate with each other, agents in ant-colony system communicate with
one another with pheromone. The optimization is the result of the collective work of
all the ants in the colony.
Ants use their pheromone trails as a medium for communicating information.
All the ants secrete pheromone and contribute to the pheromone reinforcement, and
old trails will vanish due to evaporation. The pheromone builds up on the traversed
links between nodes. An ant selects a link probabilistically based on the intensity of
the pheromone. Ant-Q [3,8] merges ant-colony system with reinforcement learning
such as Q-learning to update the amount of pheromone on the succeeding link. Ants
in the ant-colony system use only one kind of pheromone for their communication,
while natural ants also use haptic information for communication and possibly learn
the environment with their micro brain.
In ACO, simulated ants walk around the graph representing the problem to solve.
ACO has an advantage over SA and GA when the graph changes dynamically. ACO
has been extended to continuous domains without any major conceptual change
to ACO structure, applied to continuous and mixed discrete-continuous problems
[18,19].

11.2 Ant-Colony Optimization

ACO (http://www.aco-metaheuristic.org/) can be applied to discrete COPs, where
solutions can be expressed in terms of feasible paths on a graph. In every iteration,
artificial ants construct solutions randomly but guided by pheromone information
from former ants that found good solutions. Among all feasible paths, ACO can locate
the one with a minimum cost. ACO algorithm includes initialization, construction of
ants’ solutions, applying optional local search, updating pheromones, and evaluation
of the termination criterion.
Ant system [5] was initially designed for solving the classical TSP. The ant system
uses the terminology of EAs. Several generations (iterations) of artificial ants search
for good solutions. Every ant of a generation builds up a complete solution, step by
step, going through several decisions by choosing the nodes on a graph according
to a probabilistic state transition rule, called the random-proportional rule. When
building its solution, each ant collects information based on the problem character-
istics and its own performance. The information collected by the ants during the
search process is stored in pheromone trails τ associated to the connection of all
edges. The ants cooperate in finding the solution by exchanging information via the

pheromone trials. Edges can also have an associated heuristic value to represent a
priori information about the problem instance definition or runtime information pro-
vided by a source different from the ants. Once all ants have completed their tours
at the end of each generation, the algorithm updates the pheromone trails. Different
ACO algorithms arise from different pheromone update rules.
The probability for ant k at node i moving to node j at generation t is defined by [5]
P_{i,j}^k(t) = \frac{\tau_{i,j}(t)\, d_{i,j}^{-\beta}}{\sum_{u \in J_i^k} \tau_{i,u}(t)\, d_{i,u}^{-\beta}}, \qquad j \in J_i^k,     (11.1)
where τi, j is the intensity of the pheromone on edge i → j, di, j is the distance
between nodes i and j, Jik is the set of nodes that remain to be visited by ant k
positioned at node i to make the solution feasible, and β > 0. A tabu list is used to
save the nodes already visited during each generation. When a tour is completed, the
tabu list is used to compute the ant’s current solution.
Once all the ants have built their tours, the pheromone is updated on all edges
i → j according to a global pheromone updating rule

\tau_{i,j}(t+1) = (1-\rho)\,\tau_{i,j}(t) + \sum_{k=1}^{N_P} \Delta\tau_{i,j}^k(t),     (11.2)

where \Delta\tau_{i,j}^k is the intensity of the pheromone on edge i → j laid by ant k, taking the value 1/L_k if ant k passes edge i → j and 0 otherwise, ρ ∈ (0, 1) is a pheromone decay parameter, L_k is the length of the tour performed by ant k, and N_P is the number
of ants. Consequently, a shorter tour gets a higher reinforcement. Each edge has a
long-term memory to store the pheromone intensity. In ACO, pheromone evaporation
provides an effective strategy to avoid rapid convergence to local optima and to favor
the exploration of new areas of the search space.
Finally, a lower bound on the pheromone is enforced by

\tau_{i,j}(t+1) \leftarrow \max\{\tau_{\min}, \tau_{i,j}(t+1)\} \quad \forall (i, j).     (11.3)
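To make the rules (11.1)-(11.3) concrete, the following Python sketch performs one generation of the basic ant system on a TSP distance matrix; the function signature and parameter defaults are ours and are not taken from a reference implementation.

import numpy as np

def ant_system_iteration(dist, tau, beta=5.0, rho=0.7, tau_min=1e-6, n_ants=20, rng=None):
    # One generation of the basic ant system: tour construction by Eq. (11.1)
    # and pheromone update by Eqs. (11.2)-(11.3). dist and tau are n x n arrays.
    rng = np.random.default_rng() if rng is None else rng
    n = len(dist)
    eta = 1.0 / (dist + np.eye(n))            # heuristic 1/d; diagonal padded to avoid division by zero
    tours, lengths = [], []
    for _ in range(n_ants):
        start = rng.integers(n)
        tour, visited = [start], {start}
        while len(tour) < n:
            i = tour[-1]
            cand = [j for j in range(n) if j not in visited]
            w = np.array([tau[i, j] * eta[i, j] ** beta for j in cand])
            j = cand[rng.choice(len(cand), p=w / w.sum())]   # random-proportional rule (11.1)
            tour.append(j)
            visited.add(j)
        L = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
        tours.append(tour)
        lengths.append(L)
    tau *= (1.0 - rho)                         # evaporation, Eq. (11.2)
    for tour, L in zip(tours, lengths):
        for k in range(n):
            i, j = tour[k], tour[(k + 1) % n]
            tau[i, j] += 1.0 / L               # deposit of ant k (symmetric TSP)
            tau[j, i] += 1.0 / L
    np.maximum(tau, tau_min, out=tau)          # lower bound, Eq. (11.3)
    return tours, lengths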
Ant-colony system [4] improves on ant system [5]. It applies a pseudorandom-
proportional state transition rule. The global pheromone updating rule is applied only
to edges that belong to the best ant tour, while in ant system, the pheromone update
is performed at a global level by every ant. Ant-colony system also applies a local
pheromone updating rule during the construction of a solution, which is performed
by every ant every time node j is added to the path being built.
Max–min ant system [20] improves ant system by introducing explicit maximum
and minimum trail strengths on the arcs to alleviate the problem of early stagnation.
In both max–min ant system and ant-colony system, only the best ant updates the
trails in each iteration. The two algorithms differ mainly in the way how a premature
stagnation of the search is prevented.
A convergence proof to the global optimum, which is applicable to a class of ACO
algorithms that constrain all pheromone values not smaller than a given positive lower
bound, is given in [21]. This lower bound prevents the probability of generating any
solution becoming zero. This proof applies directly to ant-colony system [4] and max–min ant system [20].
In [14], the dynamics of ACO algorithms are analyzed for certain types of per-
mutation problems using a deterministic pheromone update model that assumes an
average expected behavior of the algorithms. In [16], a runtime analysis of sim-
ple ACO algorithm is presented. By deriving lower bounds on the tails of sums of
independent Poisson trials, the effect of the evaporation factor is almost completely
determined and a transition from exponential to polynomial runtime is proved. In
[11], an analysis of ACO convergence time is made based on the absorbing Markov
chain model, and the relationship between convergence time and pheromone rate is
established.

11.2.1 Basic ACO Algorithm

An NP-hard COP can be denoted by (S, Ω, f), where S is the discrete solution space, Ω is the constraint set, f : S → R^+ is the objective function, and R^+ is the
positive real domain. The output is the best solution sbest .
ACO has been widely used to tackle COPs [2,6]. In ACO, artificial ants randomly
walk on a graph G = (V, E, W L , WT ), where V is the set of vertices, E is the
set of edges, and W L and WT are, respectively, the length and weight matrices
of the edges. Besides the initialization step, ACO is a loop of the ant’s solution
construction, evaluation of the solutions, optional local search, and the pheromone
update, until the termination condition is satisfied. The basic ACO algorithm is given
by Algorithm 11.1 [6].
In Algorithm 11.1, T = [τi, j ] is the pheromone matrix and Ss (t) is the set of
solutions obtained by ants. Step 2 initializes the pheromone matrix, τi, j (0) = τ0 ≥
τmin > 0, i, j = 1, . . . , n, where n is the number of nodes (size of the problem).
In Step 4(b)i, each ant first starts at a randomly selected vertex i and then chooses
the next vertex j according to Pi,j until a solution s contains all the nodes: s = x_n, where x_i = {s_1, s_2, . . . , s_i}, s_i is the node visited by the ant at step i, and i = 1, . . . , n.

Example 11.1: Consider the TSP for Berlin52 benchmark in TSPlib. Berlin52 pro-
vides coordinates of 52 locations in Berlin, Germany. The length of the optimal tour
is 7542 when using Euclidean distances. In this example, we implement max–min
ant system. The parameters are selected as β = 5, ρ = 0.7. We set the population
size as 40 and the number of iterations as 1000. The best result obtained is 7544.4.
For a random run, the optimal solution is illustrated in Figure 11.1, and the evolution
of a random run is illustrated in Figure 11.2.

Algorithm 11.1 (ACO).

1. Set t = 0.
2. Initialize the pheromone matrix T(0), the number of ants N P .
3. sbest ← Null .
4. Repeat:
a. Initialize the set of solutions obtained by ants: Ss (t) ← ∅.
b. for k = 1, . . . , N P do
i. Ant k builds a solution s ∈ S.
S ← {1, 2, . . . , n}.
for i = 1 to n do
Choose item j ∈ S with probability pi j .
S ← S \ { j}.
Build s by the selected items.
end for
ii. if f(s) ≤ f(s_best) or s_best = Null, then s_best ← s.
iii. Ss(t) ← Ss(t) ∪ {s}.
end for
c. Update pheromone T(t) according to Ss (t), sbest .
for all (i, j): τij ← (1 − ρ)τij + Δτij.
d. Set t = t + 1.
until termination condition is satisfied.

Figure 11.1 The best TSP solution by ACO.

11.2.2 ACO for Continuous Optimization

ACO was originally introduced to solve discrete (combinatorial) optimization prob-
lems. In order to expand ACO for continuous optimization, an intuitive idea is to
change the discrete distributed pheromone on the edge into a continuous distributed
probabilistic distribution function on the solution landscape.

Figure 11.2 The TSP evolution by ACO (global best route length: 7544.3659).

API [15] simulates the foraging behavior of Pachycondyla apicalis ants, which
use visual landmarks but not pheromones to memorize the positions and search the
neighborhood of the hunting sites.
Continuous ACO [1,23] generally hybridizes with other algorithms for maintain-
ing diversity. Pheromones are placed on the points in the search space. Each point is
a complete solution, indicating a region for the ants to perform local neighborhood
search. Continuous interacting ant-colony algorithm [7] uses both the pheromone
information and the ants’ direct communications to accelerate the diffusion of infor-
mation. Continuous orthogonal ant-colony algorithm [10] adopts an orthogonal
design method and a global pheromone modulation strategy to enhance the search
accuracy and efficiency.
By analyzing the relationship between the position distribution and the food source
in the process of ant-colony foraging, a distribution model of ant-colony foraging
is proposed in [13], based on which a continuous domain optimization algorithm is
implemented.
Traditional ACO is extended for solving both continuous and mixed discrete–
continuous optimization problems in [18]. ACOR [19] is an implementation of con-
tinuous ACO. In ACOR, an archive with k best solutions with n variables are main-
tained and used to generate normal distribution density functions, which are later
used to generate m new solutions by ants. Then, the m newly generated solutions
replace the worst solutions in the archive. In ACOR, the construction of new solu-
tions by the ants is accomplished in an incremental manner, variable by variable. At
first, an ant is used to generate a variable value, just like it is used to generate a step in
TSP. For a problem with n variables, an ant needs n steps to generate a solution, just
like it needs n steps to generate a Hamiltonian cycle in TSP. ACOR is quite similar
to CMA and EDA. Similar realizations of this type are reported in [17].
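A minimal sketch of one ACOR iteration is given below in Python. The Gaussian-kernel weights and the per-variable standard deviations follow the scheme summarized above; the parameter values q and ξ are typical settings and, like the identifiers, are assumptions made for illustration.

import numpy as np

def acor_step(archive, costs, func, n_ants=2, q=1e-4, xi=0.85, rng=None):
    # One iteration of ACO_R (sketch): sample new solutions from Gaussian
    # kernels built around the k archived solutions, then keep the best k.
    rng = np.random.default_rng() if rng is None else rng
    k, n = archive.shape
    order = np.argsort(costs)
    archive, costs = archive[order], costs[order]
    ranks = np.arange(1, k + 1)
    w = np.exp(-(ranks - 1) ** 2 / (2 * (q * k) ** 2)) / (q * k * np.sqrt(2 * np.pi))
    p = w / w.sum()
    new_sols = []
    for _ in range(n_ants):
        l = rng.choice(k, p=p)                       # choose a guiding archive solution
        x = np.empty(n)
        for i in range(n):                           # construct the solution variable by variable
            sigma = xi * np.abs(archive[:, i] - archive[l, i]).sum() / (k - 1)
            x[i] = rng.normal(archive[l, i], sigma)
        new_sols.append(x)
    new_sols = np.array(new_sols)
    new_costs = np.array([func(x) for x in new_sols])
    merged = np.vstack([archive, new_sols])
    merged_costs = np.concatenate([costs, new_costs])
    best = np.argsort(merged_costs)[:k]              # the new solutions replace the worst archive members
    return merged[best], merged_costs[best]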

SamACO [9] extends ACO to solving continuous optimization problems by focus-
ing on continuous variable sampling as a key to transforming ACO from discrete
optimization to continuous optimization. SamACO consists of three major steps,
namely, the generation of candidate variable values for selection, the ants’ solu-
tion construction, and the pheromone update process. The distinct characteristics of
SamACO are the cooperation of a sampling method for discretizing the continuous
search space and an efficient incremental solution construction method based on the
sampled values.
ACOMV [12] extends ACOR to tackle mixed-variable optimization problems.
The decision variables of an optimization problem can be explicitly declared as con-
tinuous, ordinal, or categorical, which allows the algorithm to treat them adequately.
ACOMV includes three solution generation mechanisms: a continuous optimization
mechanism (ACOR), a continuous relaxation mechanism (ACOMV-o) for ordinal
variables, and a categorical optimization mechanism (ACOMV-c) for categorical
variables.

Problems

11.1 Given an ant-colony system with four cities, and that the kth ant is in city 1
and P_{11}^k = 0, P_{12}^k = 1/4, P_{13}^k = 1/4, P_{14}^k = 1/2.
What is the probability of the kth ant proceeding to each of the four cities?
11.2 TSP consists in finding a Hamiltonian circuit of minimum cost on an edge-
weighted graph G = (N , E), where N is the set of nodes, and E is the set of
edges. Let xi j (s) be a binary variable taking 1 if edge <i, j> is included in
the tour, and 0 otherwise. Let ci, j be the cost associated with edge <i, j>.
The goal is to find such a tour that minimizes the function

f (s) = ci j xi j (s).
i∈N j∈N

Set the algorithmic parameters of ACO for TSP. [Hint: τi j = 1/ci j ].


11.3 In quadratic assignment problem, n facilities and n locations are given,
together with two n × n matrices A = [ai j ] and B = [buv ], where ai j is the
distance from location i to j, and buv is the flow from facility u to v. A solution
s is an assignment of each facility to a location. Let xi (s) denote the facility
assigned to location i. The goal is to find an assignment that minimizes the
function
f(s) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, b_{x_i(s) x_j(s)}.

Set the algorithmic parameters of ACO for this problem. [Hint: β = 0; or τ_{ij} = 1/\sum_{l=1}^{n} a_{il}.]
11.4 Implement ACOR on the Rastrigin function given in the Appendix.

References
1. Bilchev G, Parmee IC. The ant colony metaphor for searching continuous design spaces. In:
Fogarty TC, editor. Proceedings of AISB workshop on evolutionary computing, Sheffield, UK,
April 1995, vol. 993 of Lecture notes in computer science. London: Springer; 1995. p. 25–39.
2. Dorigo M, Di Caro G, Gambardella LM. Ant algorithms for discrete optimization. Artif Life.
1999;5(2):137–72.
3. Dorigo M, Gambardella LM. A study of some properties of Ant-Q. In: Proceedings of the 4th
international conference on parallel problem solving from nature (PPSN IV), Berlin, Germany,
September 1996. p. 656–665.
4. Dorigo M, Gambardella LM. Ant colony system: a cooperative learning approach to the trav-
eling salesman problem. IEEE Trans Evol Comput. 1997;1(1):53–66.
5. Dorigo M, Maniezzo V, Colorni A. Positive feedback as a search strategy. Dipartimento di
Elettronica, Politecnico di Milano, Milan, Italy, Technical Report 91-016, 1991.
6. Dorigo M, Stutzle T. Ant colony optimization. Cambridge: MIT Press; 2004.
7. Dreo J, Siarry P. Continuous interacting ant colony algorithm based on dense heterarchy. Future
Gener Comput Syst. 2004;20(5):841–56.
8. Gambardella LM, Dorigo M. Ant-Q: a reinforcement learning approach to the traveling sales-
man problem. In: Proceedings of the 12th international conference on machine learning, Tahoe
City, CA, USA, July 1995. p. 252–260.
9. Hu X-M, Zhang J, Chung HS-H, Li Y, Liu O. SamACO: variable sampling ant colony
optimization algorithm for continuous optimization. IEEE Trans Syst Man Cybern Part B.
2010;40:1555–66.
10. Hu X-M, Zhang J, Li Y. Orthogonal methods based ant colony search for solving continuous
optimization problems. J Comput Sci Technol. 2008;23(1):2–18.
11. Huang H, Wu C-G, Hao Z-F. A pheromone-rate-based analysis on the convergence time of
ACO algorithm. IEEE Trans Syst Man Cybern Part B. 2009;39(4):910–23.
12. Liao T, Socha K, Montes de Oca MA, Stutzle T, Dorigo M. Ant colony optimization for
mixed-variable optimization problems. IEEE Trans Evol Comput. 2013;18(4):503–18.
13. Liu L, Dai Y, Gao J. Ant colony optimization algorithm for continuous domains based on
position distribution model of ant colony foraging. Sci World J. 2014; 2014:9 p. Article ID
428539.
14. Merkle D, Middendorf M. Modeling the dynamics of ant colony optimization. Evol Comput.
2002;10(3):235–62.
15. Monmarche N, Venturini G, Slimane M. On how Pachycondyla apicalis ants suggest a new
search algorithm. Future Gener Comput Syst. 2000;16(9):937–46.
16. Neumann F, Witt C. Runtime analysis of a simple ant colony optimization algorithm. In:
Proceedings of the 17th international symposium on algorithms and computation, Kolkata,
India, December 2006. vol. 4288 of Lecture notes in computer science. Berlin: Springer; 2006.
p. 618–627.
17. Pourtakdoust SH, Nobahari H. An extension of ant colony system to continuous optimization
problems. In: Proceedings of the 4th international workshop on ant colony optimization and
swarm intelligence (ANTS 2004), Brussels, Belgium, September 2004. p. 294–301.
18. Socha K. ACO for continuous and mixed-variable optimization. In: Proceedings of the 4th
international workshop on ant colony optimization and swarm intelligence (ANTS 2004),
Brussels, Belgium, September 2004. p. 25–36.
19. Socha K, Dorigo M. Ant colony optimization for continuous domains. Eur J Oper Res.
2008;185(3):1115–73.
20. Stutzle T, Hoos HH. The MAX-MIN ant system and local search for the traveling salesman
problem. In: Proceedings of IEEE international conference on evolutionary computation (CEC),
Indianapolis, IN, USA, April 1997. p. 309–314.

21. Stutzle T, Dorigo M. A short convergence proof for a class of ant colony optimization algo-
rithms. IEEE Trans Evol Comput. 2002;6(4):358–65.
22. Turner JS. Termites as models of swarm cognition. Swarm Intell. 2011;5:19–43.
23. Wodrich M, Bilchev G. Cooperative distributed search: the ants’ way. Control Cybern.
1997;26(3):413–46.
12 Bee Metaheuristics

This chapter introduces various algorithms that are inspired by the foraging, mating,
fertilization, and communication behaviors of honey bees. Artificial bee colony
(ABC) algorithm and marriage in honeybees optimization are described in detail.

12.1 Introduction

In nature, each bee performs only a single task, yet through a variety of forms of communication between bees, such as the waggle dance and special odors, the entire colony can complete complex work, such as hive building and pollen harvesting [51]. A number of optimization algorithms are inspired by the intelligent behavior of
honey bees, such as artificial bee colony (ABC) [27], bee colony optimization [57],
bees algorithm [48], and bee nectar search optimization [7].

Bee Foraging Behavior

A dancing bee crawls along a straight line while swinging its abdomen, then turns and loops back, tracing a figure of eight. This is the waggle dance, and the angle between the direction of gravity and the central axis of the dance is equal to the angle between the sun and the food source. The waggle dance also delivers information about the distance and direction of the food sources. The nature and duration of a waggle dance depend on the nectar content of the food source. Bees in the hive each select a food source to search for nectar, or investigate new food sources around the hive, based on the information delivered by the waggle dance [54]. Through this kind of information exchange and learning, the colony reliably finds relatively rich nectar sources. Fol-
lowing a visit to a nectar-rich inflorescence, a bee will fly a short distance to the
next inflorescence, but direction is maintained; this is believed to avoid revisiting a


site that it has depleted. When an inflorescence provides poor rewards, the bee will
extend its flight and increase its turn angles to move away from the area.
Initially, some scout bees search the region around the hive for food. After the
search, they return to the hive and inform other bees of the locations, quantity and
quality of food sources. In case they have discovered nectar, they will dance in the
so-called dance floor area of the hive, to advertise food locations so as to encourage
the other bees to follow them. If a bee decides to leave the hive and collect nectar,
it will follow one of the dancing scout bees to the destination. Upon arriving at the
food source, the foraging bee takes a load of nectar and returns to the hive, passing
the nectar to a food storer. It can abandon the food location and return to its role of an
uncommitted follower, or continue with the foraging behavior, or recruit other bees
by dancing before returning to the food location. Several bees may attempt to recruit
other bees at the dance floor area simultaneously. The process continues repeatedly,
while bees accumulate nectar and explore new areas with potential food sources.
The essential components of a colony are food sources, unemployed foragers and
employed foragers [27]. Unemployed foragers can be either onlookers or scouts.
They are continually looking for a food source to exploit. Scout bees perform exploration, whereas employed and onlooker bees perform exploitation.

• Employed bees are those that are presently exploiting a food source. They bring
loads of nectar from the food sources to the hive and share the information (via
waggle dance) about food sources with onlooker bees. They carry information
about a particular source, and share this information with certain probability.
• Onlookers are those that search for a better food source in the neighborhood of
the memorized food sources based on the information from the employed bees.
Onlookers wait in the dance area of the hive for the information from the employed
bees about the food sources. They watch the dance of the employed bees, and then
choose a food source.
• Scout bees are those that are randomly looking for new food sources in the vicinity
of the hive without any prior knowledge. The percentage of scout bees varies from 5 to 30 % according to the information available in the hive [51].

Onlooker bees observe numerous dances before choosing a food source with a prob-
ability proportional to the nectar content of that food source. Therefore, good food
sources attract more bees than bad ones. Whenever a bee, whether it is a scout or
an onlooker, finds a food source it becomes employed. Whenever a food source is
completely exhausted, all the employed bees associated with it leave, and can again
become scouts or onlookers.

Bee Mating Behavior

A typical bee colony is composed of the queen, drones (male bees), and workers (female bees). The queen lives for a couple of years and is the only mother of the colony; she is the only bee capable of laying eggs. Drones are produced from

unfertilized eggs and are the fathers of the colony. Their number is around a couple of
hundreds. Worker bees are produced from fertilized eggs, and they work on all pro-
cedures in the colony, such as feeding the colony and the queen, maintaining broods,
building combs, and collecting food. Their numbers are around 10–60 thousand.
The mating flight happens only once during the life of the queen. Mating starts with the dance of the queen. Drones follow and mate with the queen during the flight. Mating of a drone with the queen depends on the queen's speed and their fitness. The sperm of the drone is stored in the spermatheca of the queen, where the gene pool of future generations is created. The queen has a certain amount of energy at the start of the flight and returns to the nest when her energy falls to a minimum or when her spermatheca is full. After going back to the nest, broods are generated by crossover and mutation and are then improved by the worker bees. The queen lays approximately two thousand fertilized eggs a day (two hundred thousand a year). After the spermatheca is discharged, she lays unfertilized eggs [45].

12.2 Artificial Bee Colony Algorithm

ABC algorithm [27,29,30,54] is mainly applied in continuous optimization prob-
lems. It simulates the waggle dance behavior that a swarm of bees performs during the foraging process. ABC algorithm has better performance on function optimization problems than GA, DE, ES, and PSO [28–30]. Its main advantage lies in the fact that it conducts a local search in each iteration. ABC can produce better solutions and is thus more effective than the other methods on several optimization problems [25,54].
In ABC, the position of a food source represents a possible solution to the problem,
and the nectar amount of a food source corresponds to the quality (fitness) of the
solution. ABC begins with random solutions and attempts to find better solutions
by searching the neighborhoods of the current best solutions. The solutions are
represented as food sources that are each associated with an employed bee. An equal
number of onlooker bees each choose one of those food sources to be exploited
based on the quality or fitness, using roulette-wheel selection. Both onlooker and
employed bees try to locate better food sources in the neighborhood of their current
food source by perturbing a randomly chosen dimension of their food source position
toward another randomly chosen food source.

12.2.1 Algorithm Flowchart

ABC associates all employed bees with food sources (solutions). Unlike real bee
colonies, there is a one-to-one correspondence between employed bees and food
sources (solutions). That is, the number of food sources is the same as that of
employed bees.

In the initialization phase, a population of food sources (solutions) are initial-
ized by scout bees, and control parameters are set. A scout bee generates a food
source (solution) randomly and it is then associated with this food source to make it
employed.
After initialization of the ABC parameters and swarm, it requires iterations of
the three phases, namely, employed bees phase, onlooker bees phase and scout bees
phase. In the employed bees phase, employed bees search for new food sources
having more nectar within the neighborhood of the food source x m in their memory.
They find a neighbor food source and then evaluate its fitness. A greedy selection is
applied between the two sources. This is a local search step. After that, employed
bees share their food source information with onlooker bees waiting in the hive by
dancing on the dancing area.
A foraging bee has several options related to the residual amount of nectar. If the nectar amount decreases to a low level or is exhausted, it abandons the food source and becomes an unemployed bee. If there is still a sufficient amount of nectar, it can continue to forage without sharing the food source information with its nest mates, or it can perform a waggle dance to inform the nest mates about the food source.
The onlooker bees phase begins when all employed bees have shared the nectar information of their corresponding food sources with the onlookers. Onlookers select a food source i with a probability Pi determined by roulette-wheel selection,

P_i = \frac{f_i}{\sum_{j=1}^{M} f_j},     (12.1)
where f i is the fitness of the solution corresponding to the food source i, and M is
the total number of food sources which is equal to the number of employed bees.
The fitness of a solution can be defined from its objective function f(x_i) by

f_i = \begin{cases} 1/(1 + f(x_i)), & \text{if } f(x_i) \geq 0, \\ 1 + |f(x_i)|, & \text{if } f(x_i) < 0. \end{cases}     (12.2)
After a food source is chosen for an onlooker bee, a neighborhood source is
determined. As in employed bees phase, a greedy selection is applied between two
sources. Onlooker bees phase ends when the new locations of all food sources are
determined.
In scout bees phase, employed bees whose solutions cannot be improved after a
specified number of trials become scouts, and their solutions are abandoned. Those
food sources are assumed to be exhausted and the associated employed bees become
scouts. A scout then searches for a random solution x i and is associated with it, and
it again becomes employed. If a new food source has equal or better nectar than
old source, it replaces the old one in the memory. Hence, those sources which are
initially poor or have been made poor by exploitation are abandoned. The three steps
are repeated until a termination criterion is satisfied.
The general flowchart of ABC is given as Algorithm 12.1 (Source code of ABC
is available at http://mf.erciyes.edu.tr/abc). ABC algorithm has only three control
parameters: the bee colony size (equal to twice the number of food sources), the local
search abandoning limit, and the maximum number of search cycles (or a fitness-based termination criterion). Parameter tuning of ABC has been investigated in [3].

Algorithm 12.1 (Artificial Bee Colony).

1. Initialize the parameters.


2. Generate randomly distributed food sources x i , i = 1, . . . , M, over the search space,
and evaluate their nectar (fitness).
3. Send the employed bees to the current food sources.
4. Repeat:
a. for each employed bee:
Find a new food source in its neighborhood, and evaluate the fitness.
Apply greedy selection on the two food sources.
end for
b. Calculate the probability Pi for each food source.
c. for each onlooker bee:
for each food source i:
if (rand() < Pi )
Send the onlooker bee to the food source of the ith employed bee.
Find a new food source in the neighborhood, and evaluate the fitness.
Apply greedy selection on the two food sources.
end if
continue
end for
end for
d. if any employed bee becomes a scout bee
Send the scout bee to a randomly produced food source.
end if
e. Memorize the best food source (solution) found so far.
until termination criterion is satisfied.
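A compact Python sketch of Algorithm 12.1 follows. The neighborhood move v_ij = x_ij + φ (x_ij − x_kj), with φ drawn uniformly from [−1, 1], is the update rule commonly used with ABC and is assumed here; all names and defaults are illustrative.

import numpy as np

def abc_minimize(func, dim, bounds, n_food=50, limit=100, n_cycles=200, seed=None):
    # Minimal ABC sketch: employed, onlooker, and scout bee phases.
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    foods = rng.uniform(lo, hi, (n_food, dim))
    cost = np.array([func(x) for x in foods])
    trials = np.zeros(n_food, dtype=int)

    def fitness(c):                        # Eq. (12.2)
        return 1.0 / (1.0 + c) if c >= 0 else 1.0 + abs(c)

    def try_neighbor(i):
        v = foods[i].copy()
        j = rng.integers(dim)              # perturb one randomly chosen dimension
        k = rng.choice([m for m in range(n_food) if m != i])
        v[j] += rng.uniform(-1, 1) * (foods[i, j] - foods[k, j])
        v[j] = np.clip(v[j], lo, hi)
        c = func(v)
        if c < cost[i]:                    # greedy selection
            foods[i], cost[i], trials[i] = v, c, 0
        else:
            trials[i] += 1

    for _ in range(n_cycles):
        for i in range(n_food):            # employed bees phase
            try_neighbor(i)
        fit = np.array([fitness(c) for c in cost])
        p = fit / fit.sum()                # roulette wheel, Eq. (12.1)
        for _ in range(n_food):            # onlooker bees phase
            try_neighbor(rng.choice(n_food, p=p))
        worn = np.argmax(trials)           # scout bees phase: abandon an exhausted source
        if trials[worn] > limit:
            foods[worn] = rng.uniform(lo, hi, dim)
            cost[worn] = func(foods[worn])
            trials[worn] = 0
    best = np.argmin(cost)
    return foods[best], cost[best]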

Example 12.1: The Easom function is treated in Examples 2.1, 3.4, and 5.2. Here
we solve this same problem by using ABC. The global minimum value is −1 at
x = (π, π )T .
By setting the maximum number of search cycles as 200, the bee colony size as
100, the local search abandoning limit as 2000, the implementation always finds a
solution close to the global optimum.
For a random run, we have f (x) = −0.9988 at (3.1687, 3.1486) with 9000 func-
tion evaluations. All the individuals converge toward the global optimum. For 10
random runs, the solver always converged to the global optimum within 200 search
cycles. For a random run, the evolution is shown in Figure 12.1, and the evolution
of the best solution at each cycle is shown in Figure 12.2. Note that in Figure 12.2,
we show only a small region of the domain for illustration purpose.


Figure 12.1 The evolution of a random run of ABC for the Easom function: the minimum and
average objectives.

Figure 12.2 The evolution of the best solution at each cycle for a random run of ABC for the
Easom function.

12.2.2 Modifications on ABC Algorithm

Due to roulette-wheel selection in the onlooker phase, ABC suffers from some inher-
ent drawbacks like slow or premature convergence when dealing with certain com-
plex models [28,54]. Boltzmann selection mechanism is employed instead in [24]
for improving the convergence ability of ABC.
Intermediate ABC [53] modifies the structure of ABC. The potential food sources
are generated by using the intermediate positions between the uniformly generated
random numbers and random numbers generated by opposition-based learning. Inter-
mediate ABC is further modified by guiding the bees toward the best food location
in the population to improve the convergence rate.
Hybrid simplex ABC [25] combines ABC with Nelder–Mead simplex method to
solve inverse analysis problems. Memetic ABC proposed in [20] hybridizes ABC
with two local search heuristics: the Nelder-Mead algorithm and the random walk
with direction exploitation.
Interactive ABC [58] introduces the Newtonian law of universal gravitation into the onlooker phase of ABC, which again modifies roulette-wheel selection. Gbest-
guided ABC [62] incorporates the gbest solution into the solution search equation. In
[11], different chaotic maps are used for parameter adaptation in order to improve the
convergence characteristics and to prevent ABC from getting stuck in local solutions.
In ABC, only one dimension of the food source position is updated by the
employed or onlooker bees. In order to accelerate the convergence, in ABC with
modification rate [4], a control parameter called modification rate (in [0, 1]) is intro-
duced to decide whether a dimension will be updated. If a random number is less
than the modification rate, the dimension j is modified and at least one dimension
is updated. A lower modification rate may cause solutions to improve slowly while
a higher value may cause too much diversification in the population.
The undirected random search in ABC causes slow convergence of the algorithm
to the optimum or near optimum. Directed ABC [35] adds directional information for
each dimension of each food source position to ABC. The directions of information
for all dimensions are initially set to 0. If the new solution is better than old one,
the direction information is updated. If previous value of the dimension is less than
current value, the direction information of this dimension is set to −1; otherwise the
direction information of this dimension is set to 1. If new solution is worse than old
one, the direction information of the dimension is set to 0. The direction information
of each dimension of each food source position is used. Directed ABC is better than
ABC and ABC with modification rate in terms of solution quality and convergence
rate.
ABC is excellent in exploration but poor in exploitation. Gaussian bare-bones
ABC [61] designs a search equation based on utilizing the global best solution.
The generalized opposition-based learning strategy is employed to generate new
food sources for scout bees. In [40], exploitation is improved by integrating the
information of previous best solution into the search equation for employed bees
and global best solution into the update equation for onlooker bees. S-type adaptive

scaling factors are introduced in the search equation of employed bees. The search
policy of scout bees is modified to update food source in each cycle in order to
increase diversity and stochasticity of the bees.
In [8], ABC is modified by replacing the employed bee operator with a hill-climbing optimizer, controlled by a hill-climbing rate, to strengthen its exploitation capability. The algorithm is applied to the nurse rostering problem.
ABC uses differential position update rule. When food sources gather on the
similar points within the search space, differential position update rule can cause
stagnation during the search process. Distribution-based update rule for ABC [9]
uses the mean and standard deviation of the selected two food sources to obtain a
new candidate solution. This effectively overcomes stagnation behavior.
Rosenbrock ABC [26] combines Rosenbrock’s rotational direction method with
ABC. In [18], two variants of ABC apply new methods for the position update of
the artificial bees. An improved version of ABC [50] uses mutation based on Levy
probability distributions.
In [29], ABC is extended for solving constrained optimization problems. In [13],
an improved version of ABC is proposed for constrained optimization problems. In
[43], an algorithm is introduced based on ABC to solve constrained real-parameter
optimization problems, in which a dynamic tolerance control mechanism for equality
constraints is added to the algorithm in order to facilitate the approach to the fea-
sible region of the search space. In a modified ABC algorithm [55] for constrained
problems, a smart bee having memory is employed to keep the location and quality
of food sources.
Quick ABC [32] models the behavior of onlooker bees more accurately and
improves the performance of standard ABC in terms of local search ability; its performance has been analyzed as a function of the neighborhood radius on a set of benchmark problems. ABC with memory [38] adds a memory mechanism to the artificial bees so that they memorize their previous successful foraging experiences. ABC with memory outperforms ABC and quick ABC.
Opposition-based Levy flight ABC [52] incorporates Levy flight random-walk-
based local search strategy with ABC along with opposition-based learning strategy.
It outperforms basic ABC, gbest-guided ABC [62], best-so-far ABC [10] and a
modified ABC [4] in most of the experiments.

12.2.3 Discrete ABC Algorithms

Binary versions of ABC are available for binary optimization problems [34,47].
Discrete ABC [34] uses a differential expression which employs a measure of dis-
similarity between binary vectors in place of the vector subtraction operator used in
ABC. In [47], the binary ABC is based on genetic operators such as crossover and
swap; it improves the global–local search ability of basic ABC in binary domain by
integrating the neighborhood searching mechanism of basic ABC.

In [37], concepts of inertia weight and acceleration coefficients from PSO have
been utilized to improve the search process of ABC. In [31], a combinatorial ABC
is introduced for traveling salesman problems. ABC programming is applied to
symbolic regression in [33].
In another ABC for binary optimization [36], artificial bees work on the continuous
solution space, and the obtained food source position is converted to binary values
before the objective function is evaluated.

12.3 Marriage in Honeybees Optimization

Marriage in honeybees optimization [1,23] is a metaheuristic algorithm inspired by
the marriage behavior and fertilization of honey bees. It simulates the evolution of
honeybees starting with a solitary colony (single queen without a family) to the
emergence of an eusocial colony (one or more queens with a family) by the mating
process of the queen.
The mating process of the queen begins when the queen flies away from the nest, performing the mating flight, during which the drones follow the queen and mate with
her in the air [1,2]. The algorithm uses a swarm of bees where there are three kinds
of bees, the queen, the drones, and the workers. There are a number of procedures
that can be applied inside the swarm. In the algorithm, the procedure of mating of the
queen with the drones is described. First, the queen is flying randomly in the air and,
based on her speed and her energy, if she meets a drone then there is a possibility to
mate with him. Even if the queen mates with the drone, she does not create directly
a brood, but stores the genotype of the drone in her spermatheca and the brood is
created only when the mating flight has been completed. A crossover operator is used
in order to create the broods. In a hive the role of the workers is simply the brood
care (i.e., to feed them with the royal jelly) and, thus, they are only a local search
phase in the algorithm. Thus, this algorithm combines both the mating process of
the queen and one part of the foraging behavior of the honey bees inside the hive. If
a brood is better (fittest) than the queen, then this brood replaces the queen.
In [56], annealing is applied in the algorithm for determining the gene pool of male bees. Marriage in honeybees optimization is modified in [14] for solving combinatorial problems and infinite-horizon discounted-cost stochastic dynamic programming problems.
Mating of the drone with the queen takes place according to the annealing probability of the drone being added to the spermatheca of the queen [1]:
$P_f = e^{-\Delta(f)/S(t)}$,  (12.3)
where $\Delta(f)$ is the absolute difference between the drone's fitness and the queen's fitness, and $S(t)$ is the speed of the queen at time $t$. When the queen's speed is high or the drone's fitness is close to the queen's fitness, the mating probability is high. The speed $S(t)$ and energy $E(t)$ of the queen in each pass are defined by
$S(t+1) = \alpha S(t)$, $E(t+1) = E(t) - \gamma$,  (12.4)

Algorithm 12.2 (Marriage in Honeybees Optimization).

1. Initialize workers. Randomly generate the queens.


2. Apply local search to find a good queen.
3. Repeat:
a. for each queen:
Initialize energy E, speed S, and position.
The queen moves between states.
Probabilistically choose drones by $P_f = e^{-\Delta(f)/S(t)}$.
if a drone is selected
Add its sperm to the queen’s spermatheca;
$S(t+1) = \alpha S(t)$, $E(t+1) = E(t) - \gamma$.
end if
Update queen’s energy and speed.
end for
b. Generate broods by crossover and mutation.
c. Use workers to improve the broods.
d. Update worker’s fitness.
e. while the best brood is better than the worst queen,
Replace the least-fit queen with the best brood
Remove the best brood from the brood list
end while
until a maximum number of mating flights is reached.

where α ∈ [0, 1] and γ is the amount of energy reduction in each pass. The algorithm
[1] is shown in Algorithm 12.2.
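A minimal Python sketch of the drone-selection loop inside a mating flight, combining Eqs. (12.3) and (12.4); the function signature and the stopping conditions (energy exhausted or spermatheca full) follow the description above, but the names are illustrative.

import math
import random

def mating_flight(queen_fit, drone_fits, S0, E0, alpha, gamma, sperm_capacity):
    """One mating flight: probabilistically add drones' genotypes to the
    queen's spermatheca while her speed and energy decay (Eqs. 12.3-12.4)."""
    S, E = S0, E0
    spermatheca = []
    for i, f in enumerate(drone_fits):
        if E <= 0 or len(spermatheca) >= sperm_capacity:
            break
        p = math.exp(-abs(f - queen_fit) / S)   # annealing-like mating probability
        if random.random() < p:
            spermatheca.append(i)               # store the drone's genotype index
        S *= alpha                               # speed decays in each pass
        E -= gamma                               # energy decreases in each pass
    return spermatheca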

12.4 Bee Colony Optimization

Bee colony optimization [42,57] is a stochastic swarm optimization method inspired


by the foraging behavior of bees. A population of artificial bees searches for the
optimal solution in solving COPs.
A step-by-step solution is produced by each forager bee. Every bee generates
a solution to the problem through a sequence of construction steps, over multiple
iterations, until a stopping criterion is met. After NC forward/backward passes are
performed, all B solutions are generated. The best among them is used to update
the gbest solution and one iteration of the bee colony optimization is completed.
Iterations are repeated until a stopping criterion is met.
During each forward pass, every bee is exploring the neighborhood of its current
solution by a certain number of moves which construct and/or improve the solu-
tion. Every bee adds new components to its partial solution. Having obtained new
solutions, the bees return to the nest and start the backward pass in the iteration.

During the backward pass, all bees share their solutions using waggle dance. Each
bee decides, with certain probability, whether to keep its solution or not: a bee with
better solution has a higher chance of keeping and advertising its solution. The bees
that are loyal to their partial solutions are called recruiters. Every remaining bee has
to decide whether to continue to explore its own solution in the next forward pass or
to start exploring the neighborhood of one of the solutions advertised. The followers
have to choose a bee to follow and adopt its solution. Selection of a recruiter is made
probabilistically. Once a solution is abandoned, the bee becomes uncommitted, and
has to select one of the advertised solutions probabilistically, in such a way that
better advertised solutions have higher chances to be chosen for further exploration.
Within each backward pass, all bees are divided into two groups (R recruiters and B − R uncommitted bees). The number of components is calculated in such a way that one iteration of bee colony optimization is completed after NC forward/backward passes.
At the end of the forward pass the new (partial or complete) solution is generated
for each bee.
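The loyalty and recruiting decisions of the backward pass are probabilistic and biased toward better solutions, but their exact formulas vary between implementations. The sketch below is therefore only one plausible realization for a minimization problem, using a fitness-proportional (roulette-wheel) rule; all names are assumptions.

import numpy as np

def backward_pass(costs, rng=np.random.default_rng()):
    """Decide which bees keep (advertise) their partial solutions and which
    follow a recruiter, with better (lower-cost) solutions favored."""
    costs = np.asarray(costs, dtype=float)
    quality = costs.max() - costs + 1e-12          # higher is better
    p_loyal = quality / quality.max()              # the best bee is always loyal
    loyal = rng.random(costs.size) < p_loyal       # recruiters keep their solutions
    recruiters = np.flatnonzero(loyal)
    p_recruit = quality[recruiters] / quality[recruiters].sum()
    followed = {b: rng.choice(recruiters, p=p_recruit)
                for b in np.flatnonzero(~loyal)}   # uncommitted bees pick a recruiter
    return recruiters, followed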
Bee colony optimization is good at exploration but weak at exploitation. Weighted
bee colony optimization [44] improves the exploitation power by allowing the bees
to search in the solution space deliberately while considering policies to share the
attained information about the food sources heuristically. It considers global and
local weights for each food source, where the former is the rate of popularity of a
given food source in the swarm and the latter is the relevancy of a food source to a
category label. To preserve diversity in the population, new policies are embedded in
the recruiter selection stage to ensure that uncommitted bees follow the most similar
committed ones.

12.5 Other Bee Algorithms

Other approaches that simulate the behavior of bees are virtual bee algorithm [60],
beehive algorithm [59], bee swarm optimization [19], bees algorithm [48], honey bee
colony algorithm [15], beehive model [46], and honey bee social foraging algorithm
[49].
Virtual bee algorithm [60] associates the population of bees with a memory and a food source, and the bees communicate through a waggle dance procedure. A swarm of virtual bees is generated and allowed to move randomly in the phase space; the bees interact when they find some target nectar. Nectar sources correspond to the encoded values of the function, and the solution can be obtained from the intensity of bee interactions.

Bees algorithm [48] mimics the foraging behavior of honey bees. It performs
neighborhood search combined with random search and can be used for both combi-
natorial and continuous optimization. A population of initial solutions (food sources)
are randomly generated. Then, the bees are assigned to the solutions based on their
fitness function. The bees return to the hive and based on their food sources, a number
of bees are assigned to the same food source in order to find a better neighborhood
solution. Each bee is represented as an individual whose behavior is regulated by a
behavior-control structure.
Beehive algorithm [59] is inspired by the communication in the hive of honey bees and has been applied to routing in networks. It uses a protocol inspired by the bees' dance language and foraging: the behavior of each bee is determined by the internal and external information available to it and by its motivational state, according to a set of specific rules identical for every bee. Since the perceptible environment differs for bees at different spatial locations, their behavior also differs. Bees can show different behaviors as well, given differences in their foraging experience and/or their motivational state.
Bee swarm optimization [5,19] uses modified formulas for the different phases of ABC. Different types of flying patterns are introduced to maintain a proper balance between global and local search by injecting diversity into the swarm of bees, and penalty and repulsion factors are introduced to mitigate stagnation. In bee swarm optimization, a bee initially finds a solution (food source), and the other solutions are produced from it with certain strategies. Every bee is then assigned to a solution and, when the bees accomplish their search, they communicate among themselves with a waggle dance strategy, and the best solution becomes the new reference solution. A tabu list is used to avoid cycling.
Bee collecting pollen algorithm [41] is a metaheuristic optimization algorithm
for discrete problems such as TSP, inspired by the pollen-collecting behavior of
honeybees.

12.5.1 Wasp Swarm Optimization

Wasp swarm optimization [16,17] is a heuristic stochastic method for solving dis-
crete optimization problems. It mimics the behavior of a wasp colony, in particular,
the assignment of resources to individual wasps is based on their social status. For
example, if the colony has to fight a war against an enemy colony, then wasp sol-
diers will receive more food than others. Generally, the method assigns resources
to individual solution components stochastically, where the probabilities depend
on the strength of each option. The function for computing this strength is highly
application-dependent. In [16], a stochastic tournament mechanism is used to pick a
solution based on the probabilities calculated from the given strengths. The algorithm
needs to decide the application-specific strength function and the way to stochasti-
cally pick options.

Problems

12.1 Compare the similarity of ABC and DE.


12.2 Run the accompanying MATLAB code of bees algorithm to find the global
minimum of six-hump-camelback function in the Appendix. Investigate how
to improve the result by adjusting the parameters.

References
1. Abbass HA. MBO: Marriage in honey bees optimization—a haplometrosis polygynous swarm-
ing approach. In: Proceedings of the IEEE congress on evolutionary computation (CEC2001),
Seoul, Korea, May 2001. p. 207–214.
2. Afshar A, Bozog Haddad O, Marino MA, Adams BJ. Honey-bee mating optimization (HBMO)
algorithm for optimal reservoir operation. J Frankl Inst. 2007;344:452–462.
3. Akay B, Karaboga D. Parameter tuning for the artificial bee colony algorithm. In: Proceedings
of the 1st international conference on computational collective intelligence (ICCCI): Semantic
web, social networks and multiagent systems, Wroclaw, Poland, October 2009. p. 608–619.
4. Akay B, Karaboga D. A modified artificial bee colony algorithm for real-parameter optimiza-
tion. Inf Sci. 2012;192:120–42.
5. Akbari R, Mohammadi A, Ziarati K. A novel bee swarm optimization algorithm for numerical
function optimization. Commun Nonlinear Sci Numer Simul. 2010;15:3142–55.
6. Alam MS, Ul Kabir MW, Islam MM. Self-adaptation of mutation step size in artificial bee
colony algorithm for continuous function optimization. In: Proceedings of the 13th international
conference on computer and information technology (ICCIT), Dhaka, Bangladesh, December
2010. p. 69–74.
7. Alfonso W, Munoz M, Lopez J, Caicedo E. Optimización de funciones inspirada en el compor-
tamiento de búsqueda de néctar en abejas. In: Congreso Internacional de Inteligenicia Com-
putacional (CIIC2007), Bogota, Colombia, September 2007.
8. Awadallah MA, Bolaji AL, Al-Betar MA. A hybrid artificial bee colony for a nurse rostering
problem. Appl Soft Comput. 2015;35:726–39.
9. Babaoglu I. Artificial bee colony algorithm with distribution-based update rule. Appl Soft
Comput. 2015;34:851–61.
10. Banharnsakun A, Achalakul T, Sirinaovakul B. The best-so-far selection in artificial bee colony
algorithm. Appl Soft Comput. 2011;11(2):2888–901.
11. Bilal A. Chaotic bee colony algorithms for global numerical optimization. Expert Syst Appl.
2010;37:5682–7.
12. Brajevic I, Tuba M, Subotic M. Improved artificial bee colony algorithm for constrained prob-
lems. In: Proceedings of the 11th WSEAS International conference on evolutionary computing,
world scientific and engineering academy and society (WSEAS), Stevens Point, WI, USA, June
2010. p. 185–190.
13. Brajevic I, Tuba M, Subotic M. Performance of the improved artificial bee colony algorithm
on standard engineering constrained problems. Int J Math Comput Simul. 2011;5(2):135–43.
14. Chang HS. Converging marriage in honey-bees optimization and application to stochastic
dynamic programming. J Glob Optim. 2006;35(3):423–41.
15. Chong CS, Low MYH, Sivakumar AI, Gay KL. A bee colony optimization algorithm to job
shop scheduling. In: Proceedings of the winter simulation conference, Monterey, CA, USA,
December 2006. p. 1954–1961.

16. Cicirello VA, Smith SF. Improved routing wasps for distributed factory control. In: Proceedings
of IJCAI workshop on artificial intelligence and manufacturing, Seattle, WA, USA, August
2001. p. 26–32.
17. Cicirello VA, Smith SF. Wasp-like agents for distributed factory coordination. Auton Agents
Multi-Agent Syst. 2004;8:237–66.
18. Diwold K, Aderhold A, Scheidler A, Middendorf M. Performance evaluation of artificial bee
colony optimization and new selection schemes. Memetic Comput. 2011;3:149–62.
19. Drias H, Sadeg S, Yahi S. Cooperative bees swarm for solving the maximum weighted satisfi-
ability problem. In: Computational intelligence and bioinspired systems, vol. 3512 of Lecture
notes in computer science. Berlin: Springer; 2005. p. 318–325.
20. Fister I, Fister Jr I, Zumer JB. Memetic artificial bee colony algorithm for large-scale global
optimization. In: Proceedings of IEEE congress on evolutionary computation (CEC), Brisbane,
Australia, June 2012. p. 1–8.
21. Gao W, Liu S. Improved artificial bee colony algorithm for global optimization. Inf Process
Lett. 2011;111(17):871–82.
22. Gao WF, Liu SY. A modified artificial bee colony algorithm. Comput Oper Res.
2012;39(3):687–97.
23. Haddad OB, Afshar A, Marino MA. Honey-bees mating optimization (HBMO) algo-
rithm: a new heuristic approach for water resources optimization. Water Resour Manage.
2006;20(5):661–80.
24. Haijun D, Qingxian F. Artificial bee colony algorithm based on Boltzmann selection policy.
Comput Eng Appl. 2009;45(31):53–5.
25. Kang F, Li J, Xu Q. Structural inverse analysis by hybrid simplex artificial bee colony algo-
rithms. Comput Struct. 2009;87(13):861–70.
26. Kang F, Li J, Ma Z. Rosenbrock artificial bee colony algorithm for accurate global optimization
of numerical functions. Inf Sci. 2011;181:3508–31.
27. Karaboga D. An Idea based on honey bee swarm for numerical optimization. Technical Report,
Erciyes University, Engineering Faculty Computer Engineering Department, Erciyes, Turkey,
2005.
28. Karaboga D, Akay B. A comparative study of artificial bee colony algorithm. Appl Math
Comput. 2009;214:108–32.
29. Karaboga D, Basturk B. A powerful and efficient algorithm for numerical function optimization:
artificial bee colony (ABC) algorithm. J Glob Optim. 2007;39(3):459–71.
30. Karaboga D, Basturk B. On the performance of artificial bee colony (ABC) algorithm. Appl
Soft Comput. 2008;8(1):687–97.
31. Karaboga D, Gorkemli B. A combinatorial artificial bee colony algorithm for traveling salesman
problem. In: Proceedings of international symposium on innovations in intelligent systems and
applications (INISTA), Istanbul, Turkey, June 2011. p. 50–53.
32. Karaboga D, Gorkemli B. A quick artificial bee colony (qABC) algorithm and its performance
on optimization problems. Appl Soft Comput. 2014;23:227–38.
33. Karaboga D, Ozturk C, Karaboga N, Gorkemli B. Artificial bee colony programming for
symbolic regression. Inf Sci. 2012;209:1–15.
34. Kashan MH, Nahavandi N, Kashan AH. DisABC: a new artificial bee colony algorithm for
binary optimization. Appl Soft Comput. 2012;12:342–52.
35. Kiran MS, Findik O. A directed artificial bee colony algorithm. Appl Soft Comput.
2015;26:454–62.
36. Kiran MS. The continuous artificial bee colony algorithm for binary optimization. Appl Soft
Comput. 2015;33:15–23.
37. Li G, Niu P, Xiao X. Development and investigation of efficient artificial bee colony algorithm
for numerical function optimization. Appl Soft Comput. 2012;12:320–32.
38. Li X, Yang G. Artificial bee colony algorithm with memory. Appl Soft Comput. 2016;41:362–
72.

39. Liu Y, Passino KM. Biomimicry of social foraging bacteria for distributed optimization: models,
principles, and emergent behaviors. J Optim Theor Appl. 2002;115(3):603–28.
40. Liu J, Zhu H, Ma Q, Zhang L, Xu H. An artificial bee colony algorithm with guide of global and
local optima and asynchronous scaling factors for numerical optimization. Appl Soft Comput.
2015;37:608–18.
41. Lu X, Zhou Y. A novel global convergence algorithm: bee collecting pollen algorithm. In:
Proceedings of the 4th international conference on intelligent computing, Shanghai, China,
September 2008, vol. 5227 of Lecture notes in computer science. Berlin: Springer; 2008. p.
518–525.
42. Lucic P, Teodorovic D. Computing with bees: attacking complex transportation engineering
problems. Int J Artif Intell Tools. 2003;12:375–94.
43. Mezura-Montes E, Velez-Koeppel RE. Elitist artificial bee colony for constrained real-
parameter optimization. In: Proceedings of IEEE congress on evolutionary computation (CEC),
Barcelona, Spain, July 2010. p. 1–8.
44. Moayedikia A, Jensen R, Wiil UK, Forsati R. Weighted bee colony algorithm for discrete opti-
mization problems with application to feature selection. Eng Appl Artif Intell. 2015;44:153–67.
45. Moritz RFA, Southwick EE. Bees as super-organisms. Berlin, Germany: Springer; 1992.
46. Navrat P. Bee hive metaphor for web search. In: Proceedings of the international conference
on computer systems and technologies (CompSysTech), Veliko Turnovo, Bulgaria, 2006. p.
IIIA.12.
47. Ozturk C, Hancer E, Karaboga D. A novel binary artificial bee colony algorithm based on
genetic operators. Inf Sci. 2015;297:154–70.
48. Pham DT, Kog E, Ghanbarzadeh A, Otri S, Rahim S, Zaidi M. The bees algorithm—a novel tool
for complex optimisation problems. In: Proceedings of the 2nd international virtual conference
on intelligent production machines and systems (IPROMS), Cardiff, UK, July 2006. p. 454–
459.
49. Quijano N, Passino KM. Honey bee social foraging algorithms for resource allocation, Part i:
algorithm and theory; part ii: application. In: Proceedings of the American control conference,
New York, NY, USA, July 2007. p. 3383–3388, 3389–3394.
50. Rajasekhar A, Abraham A, Pant M. Levy mutated artificial bee colony algorithm for global opti-
mization. In: Proceedings of IEEE international conference on systems, man and cybernetics,
Anchorage, AK, USA, October 2011. p. 665–662.
51. Seeley TD. The wisdom of the hive: the social physiology of honey bee colonies. Massachusetts:
Harvard University Press; 1995.
52. Sharma H, Bansal JC, Arya KV. Opposition based Levy flight artificial bee colony. Memetic
Comput. 2013;5:213–27.
53. Sharma TK, Pant M. Enhancing the food locations in an artificial bee colony algorithm. Soft
Comput. 2014;17:1939–65.
54. Singh A. An artificial bee colony algorithm for the leaf-constrained minimum spanning tree
problem. Applied Soft Comput. 2009;9(2):625–31.
55. Stanarevic N, Tuba M, Bacanin N. Enhanced artificial bee colony algorithm performance. In:
Proceedings of the 14th WSEAS international conference on computers, world scientific and
engineering academy and society (WSEAS). Stevens Point, WI, USA, June 2010. p. 440–445.
56. Teo J, Abbass HA. A true annealing approach to the marriage in honey-bees optimization
algorithm. Int J Comput Intell Appl. 2003;3:199–208.
57. Teodorovic D, Dell’Orco M. Bee colony optimization—a cooperative learning approach to
complex transportation problems. In: Proceedings of the 10th meeting of the EURO working
group on transportation, Poznan, Poland, September 2005. p. 51–60.
58. Tsai P-W, Pan J-S, Liao B-Y, Chu S-C. Enhanced artificial bee colony optimization. Int J
Innovative Comput Inf Control. 2009;5(12):5081–92.
59. Wedde HF, Farooq M, Zhang Y. BeeHive: an efficient fault-tolerant routing algorithm inspired
by honey bee behavior. In: Dorigo M, editors. Ant colony optimization and swarm intelligence,
vol. 3172 of Lecture notes in computer science. Berlin: Springer; 2004. pp. 83–94.

60. Yang XS. Engineering optimizations via nature-inspired virtual bee algorithms. In: Mira J,
lvarez JR, editors. Artificial intelligence and knowledge engineering applications: a bioinspired
approach, vol. 3562 of Lecture notes in computer science. Berlin: Springer; 2005. pp. 317–323.
61. Zhou X, Wu Z, Wang H, Rahnamayan S. Gaussian bare-bones artificial bee colony algorithm.
Soft Comput. 2016: 1–18. doi:10.1007/s00500-014-1549-5.
62. Zhu G, Kwong S. Gbest-guided artificial bee colony algorithm for numerical function opti-
mization. Appl Math Comput. 2010;217:3166–73.
13 Bacterial Foraging Algorithm

This chapter describes bacterial foraging algorithm inspired by the social foraging
behavior of Escherichia coli present in human intestine. Several algorithms inspired
by molds, algae, and tumor cells are also introduced.

13.1 Introduction

The social foraging behavior of Escherichia coli, present in the human intestine, and of M. xanthus bacteria was explained in [9]. Through social foraging, both species of bacteria are able to climb noisy nutrient gradients. The foraging behavior is modeled
as an optimization process where bacteria seek to maximize the energy intake per
unit time spent for foraging, considering all the constraints presented by their own
physiology and environment. Bacterial foraging algorithm [9,14] is a population-
based stochastic optimization technique inspired by the behavior of Escherichia coli
bacteria that forage for food. Bacterial chemotaxis algorithm [11] tackles optimiza-
tion problems by employing the way in which bacteria react to chemoattractants in
concentration gradients.
Bacterial foraging behavior is known as bacterial chemotaxis. Chemotaxis, a cell
movement in response to gradients of chemical concentrations present in the envi-
ronment, is a survival strategy that allows bacteria to search for nutrients and avoid
noxious environments. The chemotactical behavior of bacteria as an optimization
process was modeled in the early 1970s [2].
The chemotactical behavior of bacteria is modeled by making the following
assumptions [3]. (1) The path of a bacterium is a sequence of straight-line trajec-
tories joined by instantaneous turns, each trajectory being characterized by speed,
direction, and duration. (2) All trajectories have the same constant speed. (3) When
a bacterium turns, its choice of a new direction is governed by a probability distri-
bution, which is azimuthally symmetric about the previous direction. (4) The angle


between two successive trajectories is governed by a probability distribution. (5)


The duration of a trajectory is governed by an exponentially decaying probability
distribution. (6) The probability distributions for both the angle and the duration are
independent of the parameters of the previous trajectory.
A bacterium is a prokaryotic unicellular organism. Many bacteria have a series of rotating flagella on their cell surface that act as propellants, allowing them to swim at a speed of 10–35 µm/s [5]. They have potent receivers (chemoreceptors) for detecting temporal and spatial changes of chemical concentrations in the environment. When an external perturbation is detected, bacteria use their memory to make a temporal–spatial comparison of the gradients [15].
An Escherichia coli bacterium consists of the main cell body, the pili (used for the transfer of DNA to other bacteria), and flagella (long, left-handed helical, whip-like projections that enable motor activity). A bacterium has 8–10 flagella placed randomly on its cell body. These flagella can rotate at a high speed of 270 revolutions per second, stop momentarily, and change the direction of rotation in a controlled manner [5]. When all of the flagella rotate counterclockwise, they act as propellants moving the bacterium forward very fast in an almost rectilinear movement called a swim. If the flagella rotate clockwise, they destabilize, causing the bacterium to tumble randomly.
Chemotaxis is a foraging strategy that implements a type of local optimization
where the bacteria try to climb up the nutrient concentration, avoid noxious substance,
and search for ways out of neutral media. The chemotaxis step has resemblance with
a biased random walk model [7]. It is a cell movement in response to gradients of
chemical concentrations present in the environment. In a tumble, a bacterium moves randomly one step in any direction (through 360°) around its current location as an initial guess at a food location. If the location after the tumble contains more nutrient than its original location, the bacterium swims in the direction of the tumble at a higher speed until it reaches a location of higher nutrient. However, if the location after the tumble has a lower nutrient value, the bacterium repeats the tumble by selecting another random location around its current position, and it continues to tumble until a better nutrient position is found.
Generally, the bacteria move for a longer distance in a friendly environment. In a harmful place, they tumble frequently to find a nutrient gradient. When placed into a neutral environment, where there are no nutrients or harmful substances, the bacteria work independently, tumbling and swimming for equal time periods. Upon discovering a nutrient, the bacteria engage in chemotaxis. In an environment with a constant level of nutrient, the chemotaxis is similar to the neutral case, except that the mean swim length and speed increase at the cost of tumbling time. The bacteria always seek positive nutrient gradients, even in nutrient-rich environments; in the presence of a harmful substance, negative gradients are sought instead.
When the bacteria get food in sufficient amount, they grow in length and, at a suitable temperature, break in the middle to form exact replicas of themselves. Due to sudden environmental changes or attacks, the chemotactic progress may be destroyed, and a group of bacteria may move to some other place, or other bacteria may be introduced into the swarm of concern. This constitutes the

event of elimination–dispersal in the real bacterial population, where all the bacteria
in a region are killed or a group is dispersed into a new part of the environment.
In summary, the chemotactical strategy of Escherichia coli can be given as follows [14]. If a bacterium finds a neutral environment or an environment without gradients, it alternately tumbles and swims; if it finds a nutrient gradient, the bacterium spends more time swimming and less time tumbling, so the directions of movement are biased toward increasing nutrient gradients; if it finds a negative gradient or noxious substances, it swims toward better environments or runs away from dangerous places.

13.2 Bacterial Foraging Algorithm

Overall, bacterial foraging algorithm is a very effective search approach for global
optimization problems [4,14]. However, it is relatively complex and more computa-
tion time might be needed [9].
In bacterial foraging algorithm, a set of bacteria tries to reach an optimum cost
by following four stages: chemotaxis, swarming, reproduction, and elimination and
dispersal. All the stages are continuous and they are repeated until the end of bacteria
life.
At the beginning, each bacterium produces a solution iteratively for a set of para-
meters. In the chemotaxis phase, the step size of bacterium movement determines the
performance of the algorithm both in terms of the convergence speed and the accu-
racy. In the swarming stage, each bacterium signals another bacterium via attractants
to swarm together. This is the cell-to-cell signaling stage. During the process of reach-
ing toward the best food location, the bacterium which has searched the optimum
path produces an attraction signal to other bacteria to swarm to the desired location.
In the reproduction stage, all the bacteria are sorted and grouped into two classes.
The first half of the bacteria with high fitness is cloned to inherit their good features.
Each bacterium splits into two bacteria, which are placed at the same location; the
other half are eliminated from the population. In the elimination and dispersal stage,
any bacterium from the total set can be either eliminated or dispersed to randomly
distribute within the search area to search for other better nutrient location. This
stage prevents the bacteria from attaining the local optimum.
Let x be the position of a bacterium and J (x) be the value of the objective
function. The conditions J (x) < 0, J (x) = 0, and J (x) > 0 indicate whether
the bacterium at location x is in nutrient-rich, neutral, and noxious environments,
respectively. Chemotaxis tries to find lower values of J (x), and avoids positions x
where J (x) ≥ 0.
The chemotaxis process simulates the movement of the bacteria via swarming
and tumbling. The chemotactic movement can be represented by
$x_i^{j+1,k,l} = x_i^{j,k,l} + C_i \dfrac{\Delta_i}{\sqrt{\Delta_i^T \Delta_i}}$,  (13.1)

where $x_i^{j,k,l}$ is the position of the ith bacterium at the jth chemotactic, kth reproduction, and lth elimination–dispersal step, the step size $C_i$ is taken in the random direction specified by the tumble (swim), and $\Delta_i$ is a random vector with each entry lying in $[-1, 1]$.
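A minimal Python sketch of the tumble-and-swim move of Eq. (13.1) for a minimization problem; the swim-length limit Ns and the function names are assumptions made for illustration. The position x is a NumPy array.

import numpy as np

def chemotactic_step(x, J, C, Ns, rng=np.random.default_rng()):
    """Tumble in a random direction, then keep swimming in that direction
    (up to Ns steps) while the objective J keeps improving (Eq. 13.1)."""
    delta = rng.uniform(-1.0, 1.0, size=x.size)        # random tumble vector
    direction = delta / np.sqrt(delta @ delta)          # unit direction
    best_x, best_J = x, J(x)
    for _ in range(Ns):
        trial = best_x + C * direction                  # swim one step
        trial_J = J(trial)
        if trial_J < best_J:                            # nutrient improves: keep swimming
            best_x, best_J = trial, trial_J
        else:
            break                                       # stop; the next call tumbles again
    return best_x, best_J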
A mathematical analysis of the chemotactic step in bacterial foraging algorithm is
performed based on gradient descent approach in [4]. The stability and convergence
behavior of the dynamics is analyzed according to Lyapunov stability theorems. The
analysis suggests that chemotaxis employed in standard bacterial foraging algorithm
usually results in sustained oscillation in the vicinity of the global minimum. The
step size can be made adaptive to avoid oscillation: a high nutrient value corresponds
to a large step size, and in the vicinity of the optima the step size can be reduced.
During the movements, cells release signals to other cells to swarm, depending on whether they find a nutrient-rich environment or avoid a noxious one. A time-varying term associated with the number of bacteria $N_P$ and the number of variables p is added to the actual objective function.
The swarming pattern of the cell-to-cell attraction and repellence in bacterial
foraging algorithm reduces the precision of optimization. Bacteria in the local optima
may attract those in global optimum and thus lower the convergence speed. Fast
bacterial swarming algorithm [10] assumes that bacteria have the ability, similar to
that of birds to follow the best bacteria in the optimization domain. The position of
each bacterium is updated by
$x_i^{j+1,k,l} = x_i^{j,k,l} + C_i \left(x_*^{j,k,l} - x_i^{j,k,l}\right)$, if $J_i^{j,k,l} > J_{\min}^{j,k,l}$,  (13.2)
where $x_*^{j,k,l}$ is the best position the bacterium has at the moment, and $J_i^{j,k,l}$ is the health status of the ith bacterium at the jth chemotaxis, kth reproduction, and lth elimination–dispersal stage.
To accelerate the convergence speed near optima, the chemotactic step size C is
made adaptive in [4]:
$C = \dfrac{1}{\psi + \lambda/|J(x) - J^*|}$,  (13.3)

where λ is a positive constant, typically λ = 400 and ψ ∈ [0, 1], and J ∗ is the fitness
of the global best bacterium. When the distance between the two fitness values is
much smaller than λ, C ≈ 1/λ. In [13], the step size of bacteria movement is
dynamically adjusted by using linear and nonlinear relationships based on the index
of iteration, index of bacteria, and fitness cost.
At the reproduction stage, the population is sorted according to the accumulated cost; the $N_P/2$ least healthy bacteria die, and the remaining $N_P/2$ healthier bacteria are used for asexual reproduction, each being split into two bacteria placed at the same location as their parent and keeping the same value. That is, after $N_c$ chemotactic steps, the fitness value of the ith bacterium in the chemotactic loop is accumulated and calculated by
$J_i^{health} = \sum_{j=1}^{N_c+1} J_i^{j,k,l}$.  (13.4)

Figure 13.1 The evolution of a random run of bacterial foraging algorithm for Rastrigin function: the minimum objective (fitness, plotted on a log scale) at each iteration.

For the purpose of improving the global search ability, after $N_{re}$ steps of reproduction, an elimination–dispersal event is applied to the algorithm. Each bacterium may be eliminated and dispersed to a random position in the search space according to the probability $P_{ed}$ and its health status. Some bacteria are liquidated at random with a small probability (commonly set to 0.25), while the new replacements are randomly initialized over the search space.
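The reproduction and elimination–dispersal stages can be sketched compactly as below (for a minimization problem, so a smaller accumulated cost in Eq. (13.4) means a healthier bacterium); array shapes and names are illustrative.

import numpy as np

def reproduce(pop, health):
    """Sort bacteria by accumulated cost (Eq. 13.4); the healthier half is
    duplicated in place of the least healthy half (asexual reproduction)."""
    order = np.argsort(health)                 # ascending cost = healthiest first
    survivors = pop[order[: len(pop) // 2]]
    return np.vstack([survivors, survivors.copy()])

def eliminate_disperse(pop, lb, ub, p_ed, rng=np.random.default_rng()):
    """With probability p_ed, each bacterium is dispersed to a random position."""
    mask = rng.random(len(pop)) < p_ed
    pop[mask] = lb + (ub - lb) * rng.random((mask.sum(), pop.shape[1]))
    return pop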
A few variants of the classical algorithm as well as hybridizations of bacterial
foraging algorithm with other naturally inspired algorithms have been introduced in
[4,14]. New versions of bacterial foraging algorithm have been proposed in [1,17].
Quantum-inspired bacterial foraging algorithm [6] applies several quantum comput-
ing principles, and a mechanism is proposed to encode and observe the population.

Example 13.1: We revisit the Rastrigin function considered in Example 6.1. The global optimum is f(x) = 0 at $x^* = 0$.
We now find the global optimum by using bacterial foraging algorithm. The pop-
ulation size is selected as 40, the numbers of reproduction steps, chemotactic steps
and swarming steps are all set as 20, C = 0.001, Ped = 0.8, and the maximum
number of iterations is 100. The initial population is randomly generated from the
entire domain.
For a random run, we have f (x) = 0.0107 at (−0.0050, 0.0054) at the end of
the iteration, and the evolution is illustrated in Figure 13.1. For 10 random runs, the
solver found the solution near the global optimum at the end of the iteration for three
runs. The performance is undesirable compared to that of the other methods, as the
algorithm lacks an elitism strategy to retain the best solutions found thus far.

Bacterial Chemotaxis Algorithm


Bacterial chemotaxis algorithm [11] is an optimization algorithm inspired by the reaction of bacteria to environmental chemical attractants; it performs similarly to standard GA, but worse than ES with enhanced convergence properties. In bacterial chemotaxis algorithm, every bacterium searches for the optimal value according to its own judgment. Bacteria use their own memory to make a temporal–spatial comparison of the gradients found, and decide the length and duration of their next movement. Because the length and duration are drawn from probability distributions, the bacteria are able to escape from locally optimal solutions and find the globally optimal value. Bacterial colony chemotaxis optimization [8] introduces the colony and adds communication features to bacterial chemotaxis algorithm; it outperforms bacterial chemotaxis algorithm in terms of convergence ability and computation speed.

13.3 Algorithms Inspired by Molds, Algae, and Tumor Cells

Physarum Polycephalum Algorithm


The slime mold Physarum polycephalum is a large, single-celled amoeboid organism
with a body made up of tubes. Assume the shape of Physarum is represented by a
graph, in which a plasmodial tube refers to an edge of the graph and a junction
between tubes refers to a node. It can form a dynamic tubular network linking the
discovered food sources during foraging. The physiological mechanism behind the
tube formation and selection contributes to the Physarum’s ability of path finding:
tubes thicken in a given direction when the flux through it persists in that direction
for a certain time. It behaves as an intelligent nonlinear spatially extended active
medium encapsulated in an elastic membrane. The cell optimizes its growth patterns
in configurations of attractants and repellents. On a nutrient substrate, Physarum expands as an omnidirectional wave, e.g., as a classical excitation wave in a two-dimensional excitable medium.
It is capable of solving many graph theoretical problems including shortest path
problem [12,21], and network design [19]. By extracting the underlying physiolog-
ical mechanism of tube construction and degeneration, a path finding mathematical
model is constructed in [18], and it is shown that the model is capable of finding the shortest route and navigating roads in complex road networks [19].
Artificial Algae Algorithm
The term algae refers to a diverse group of photosynthetic eucaryotes (excluding the blue-green algae, or cyanobacteria) that have a discrete nucleus and an internal green photosynthetic pigment called chlorophyll. Using chlorophyll, algae combine CO2 and H2O to form starch or related substances as their own food, simultaneously releasing oxygen in the presence of sunlight.
Artificial algae algorithm [20] is a population-based metaheuristic optimization
algorithm inspired by the living behaviors of microalgae, photosynthetic species. On

CEC05, it has balanced search performance, arising from the contribution of adap-
tation and evolutionary process, semi-random selection while choosing the source
of light in order to avoid local minima, and balancing of helical movement methods.
Artificial algae corresponds to solutions in the problem space. Artificial algae
algorithm has three control parameters (energy loss, adaptation parameter, and shear
force). Energy loss parameter determines the number of new candidate solutions
of algal colonies produced at each iteration. Each algal colony can produce new
candidate solutions in direct proportion to its energy (the success achieved in the
previous iteration). A small energy loss parameter corresponds to a high local search
capability, whereas a high parameter leads to a high global search ability. It uses an
adaptive energy loss parameter.
Similar to the real algae, artificial algae can move toward the source of light to
photosynthesize with helical swimming, and they can adapt to the environment, are
able to change the dominant species, and can reproduce by mitotic division. The
algorithm is composed of three basic parts: evolutionary process, adaptation, and
helical movement. In the adaptation process, in each iteration, an insufficiently grown algal colony tries to resemble the biggest algal colony in the environment. This process ends up changing the starvation level: the starvation value increases with time when an algal cell receives insufficient light. In the evolutionary process, the single
algal cell of the smallest algal colony dies and it is replaced by the replicated algal
cell of the biggest algal colony; this process achieves fine-tuning to find the global
optimum. Helical movement is applied to produce a new candidate solution. The
algorithm employs a greedy selection process between the candidate and the current
solutions.
The whole population is composed of algal colonies. An algal colony is a group
of algal cells living together. Under sufficient nutrient conditions, if the algal colony
receives enough light, it grows and reproduces itself to generate two new algal cells,
similar to the real mitotic division. When a single algal cell is divided to produce two
new algal cells, they live adjacently. An algal colony behaves like a single cell, moves
together, and cells in the colony may die under unsuitable life conditions. An external
force like a shear force may distribute the colony, and each distributed part becomes a
new colony as life proceeds. An algal colony not receiving enough light survives for
a while but eventually dies. An algal colony providing good solutions grows more
as the amount of nutrient obtained is high. In a randomly selected dimension, algal
cell of the smallest algal colony dies and algal cell of the biggest colony replicates
itself.
Algal cells and colonies generally swim and try to stay close to the water surface, because adequate light for survival is available there. They swim helically in the liquid with their flagella, which provide forward movement. As the friction surface of a growing algal cell gets larger, the frequency of its helical movements increases, thereby increasing its local search ability. Each algal cell can move in proportion to its energy, and the energy of an algal cell is directly proportional to the amount of nutrient uptake at the time. The gravity restricting the movement is set to 0, and viscous drag is modeled as a shear force, which is proportional to the size of the algal cell.

Invasive Tumor Growth Optimization


The tumor growth mechanism shows that each tumor cell strives for the nutrients in its microenvironment in order to grow and proliferate. Tumor cells are divided into proliferative cells, quiescent cells, and dying cells. Invasive tumor growth optimization [16] is based on the principle of invasive tumor growth. Cell movement relies on chemotaxis, random-walk motion, and interaction with other cells in different categories. Invasive behaviors of proliferative cells and quiescent cells are simulated by Levy flight, and the behavior of dying cells is simulated through interaction with proliferative and quiescent cells.

Problem

13.1 What type of selection is used for bacterial foraging optimization?

References
1. Abraham A. A synergy of differential evolution and bacterial foraging optimization for global
optimization. Neural Netw World. 2007;17(6):607–26.
2. Bremermann H. Chemotaxis and optimization. J Franklin Inst. 1974;297:397–404.
3. Dahlquist FW, Elwell RA, Lovely PS. Studies of bacterial chemotaxis in defined concentration
gradients—a model for chemotaxis toward l-serine. J Supramol Struct. 1976;4:329–42.
4. Dasgupta S, Das S, Abraham A, Biswas A. Adaptive computational chemotaxis in bacterial
foraging optimization: an analysis. IEEE Trans Evol Comput. 2009;13(4):919–41.
5. Eisenbach M. Chemotaxis. London: Imperial College Press; 2004.
6. Huang S, Zhao G. A comparison between quantum inspired bacterial foraging algorithm and
Ga-like algorithm for global optimization. Int J Comput Intell Appl. 2012;11(3):19. Paper no.
1250016.
7. Hughes BD. Random walks and random environments. London: Oxford University Press;
1996.
8. Li WW, Wang H, Zou ZJ, Qian JX. Function optimization method based on bacterial colony
chemotaxis. J Circ Syst. 2005;10:58–63.
9. Liu Y, Passino KM. Biomimicry of social foraging bacteria for distributed optimization: models,
principles and emergent behaviors. J Optim Theory Appl. 2002;115(3):603–28.
10. Mi H, Liao H, Ji Z, Wu QH. A fast bacterial swarming algorithm for high-dimensional function
optimization. In: Proceedings of IEEE world congress on computational intelligence, Hong
Kong, China, June 2008. p. 3135–3140.
11. Muller SD, Marchetto J, Airaghi S, Koumoutsakos P. Optimization based on bacterial chemo-
taxis. IEEE Trans Evol Comput. 2002;6:16–29.
12. Nakagaki T, Kobayashi R, Nishiura Y, Ueda T. Obtaining multiple separate food sources:
behavioural intelligence in the Physarum plasmodium. Proc R Soc B: Biol Sci. 2004;271:2305–
10.
13. Nasir ANK, Tokhi MO, Abd Ghani NM. Novel adaptive bacteria foraging algorithms for global
optimization. Appl Comput Intell Soft Comput. 2014:7. Article ID 494271.
14. Passino KM. Biomimicry of bacterial foraging for distributed optimization and control. IEEE
Control Syst Mag. 2002;22(3):52–67.

15. Segall J, Block S, Berg H. Temporal comparisons in bacterial chemotaxis. Proc Natl Acad Sci
U S A. 1986;83(23):8987–91.
16. Tang D, Dong S, Jiang Y, Li H, Huang Y. ITGO: invasive tumor growth optimization algorithm.
Appl Soft Comput. 2015;36:670–98.
17. Tang WJ, Wu QH. Bacterial foraging algorithm for dynamic environments. In: Proceedings
of the IEEE congress on evolutionary computation (CEC), Vancouver, Canada, July 2006. p.
1324–1330.
18. Tero A, Kobayashi R, Nakagaki T. A mathematical model for adaptive transport network in
path finding by true slime mold. J Theor Biol. 2007;244:553–64.
19. Tero A, Yumiki K, Kobayashi R, Saigusa T, Nakagaki T. Flow-network adaptation in Physarum
amoebae. Theory Biosci. 2008;127:89–94.
20. Uymaz SA, Tezel G, Yel E. Artificial algae algorithm (AAA) for nonlinear global optimization.
Appl Soft Comput. 2015;31:153–71.
21. Zhang X, Wang Q, Chan FTS, Mahadevan S, Deng Y. A Physarum polycephalum optimization
algorithm for the bi-objective shortest path problem. Int J Unconv Comput. 2014;10:143–62.
14 Harmony Search

Harmony search and melody search are population-based metaheuristic optimiza-


tion techniques inspired by the improvisation process of music players or group
improvisation. They represent the vertical aspect and the horizontal aspect of music
space.

14.1 Introduction

Harmony search is a population-based metaheuristic optimization technique that


mimics the improvisation process of music players when a musician is attempting
to find a state of pleasing harmony and continues to polish the pitches to obtain a
better harmony [9–11,17,18]. It can handle both discrete and continuous variables.
The concepts of harmony search are musicians, notes, harmonies, improvisation, pitch, audio aesthetic standard, practice, pleasing harmony, and harmony memory. In the numerical optimization context, the musicians are the
decision variables. The notes played by the musicians are the values of the variables.
A harmony contains the notes played by all musicians, namely, a solution vector.
Improvisation corresponds to generation, and pitch to value, audio aesthetic standard
to objective function, practice to iteration, and pleasing harmony to good solution.
Harmony memory contains harmonies (solution vectors) played by the musicians.
It is represented in a matrix where all the solution vectors are stored. The rows contain
harmonies and the number of rows is predefined. Each column is dedicated to one
musician (a decision variable); it not only stores the good notes previously played by
the musician but also provides the pool of playable notes for future improvisations.
Harmony search is not sensitive to the initial values. It iteratively generates a new
solution after considering all the existing solutions. It has a stochastic derivative
which reduces the number of iterations for converging toward local minima [11].


Harmony search uses five parameters, including three core parameters such as the
size of harmony memory (HMS), the harmony memory considering rate (PHMCR ),
and the maximum number of iterations or improvisations (NI), and two optional ones
such as the pitch adjustment rate (PAR), and the adjusting bandwidth (BW) or fret
width (FW). HMS is similar to the population size in GA. PHMCR ∈ (0, 1) is the rate
of choosing one value from the harmony memory, while 1 − PHMCR is the rate of
randomly selecting one value from the domain. The number of improvisations (NI)
corresponds to the number of iterations. PAR decides whether the decision variables
are to be adjusted to a neighboring value. In [8], three PARs are used for moving
rates to the nearest, second nearest, and third nearest cities. The number of musicians
N is equal to the number of variables in the optimization function. In [12], fret width
is introduced to replace the static valued bandwidth, making the algorithm adaptive
to the variance in the variable range and suitable for solving real-valued problems.
Generating a new harmony is called improvisation. Harmony search generates a
new vector that encodes a candidate solution, after considering a selection of existing
quality vectors. It is an iterative improvement method initiated with a number of
provisional solutions that are stored in the harmony memory. At each iteration, a
new solution (harmony) x is generated that is based on three operations: memory
consideration for exploitation, random consideration for diversification, and pitch
adjustment for local search. A new harmony is then evaluated against an objective
function, and replaces the worst harmony in the harmony memory, only if its fitness
is better than that of the worst harmony. This process is repeated until an acceptable
solution is obtained.
Consider four decision variables, each of which has stored experience values in
the harmony memory as follows: x1 : {10, 20, 4}, x2 : {133, 50, 60}, x3 : {100, 23,
393}, and x4 : {37, 36, 56}. In an iteration, if x1 is assigned 20 from its memory, x2 is
adjusted from the value 133 stored in its memory to be 28, x3 is assigned 23 from its
memory, and x4 is assigned 67 from its feasible range x4 ∈ [0, 100]. The objective
function of a constructed solution (20, 28, 23, 67) is evaluated. If the new solution
is better than the worst solution in the harmony memory, then it replaces the worst
solution. This process is repeated until an optimal solution is reached.

14.2 Harmony Search Algorithm

In basic harmony search, randomly generated feasible solutions are initialized in the
harmony memory. In each iteration, the algorithm aims at improvising the harmony
memory. Harmony search algorithm can be summarized in four steps: initialization
of the harmony memory, improvisation of a new harmony, inclusion of the newly
generated harmony in the harmony memory if its fitness improves the worst fitness
value in the harmony memory, and loop until a termination criterion is satisfied.
The first step is initialization of the control parameters: HMS, HMCR, PAR, BW,
NI. Randomly generate feasible solution vectors from the solution xt obtained from
tabu search. The harmony memory is initialized with the solution obtained from

tabu search plus HMS − 1 solutions that are randomly chosen in the neighborhood
of xt :
xi = xt + rand(−0.5, 0.5), i = 1, 2, . . . , HMS. (14.1)
Then the solutions are sorted by the objective function as
$\mathrm{HM} = \begin{bmatrix} x_1^1 & \cdots & x_j^1 & \cdots & x_n^1 \\ \vdots & & \vdots & & \vdots \\ x_1^i & \cdots & x_j^i & \cdots & x_n^i \\ \vdots & & \vdots & & \vdots \\ x_1^{\mathrm{HMS}} & \cdots & x_j^{\mathrm{HMS}} & \cdots & x_n^{\mathrm{HMS}} \end{bmatrix}$,  (14.2)
where n is the number of variables.
The next step is to generate new solutions. A new solution xi can be obtained by
choosing from the harmony memory with the probability of PHMCR , or generated
randomly with probability 1 − PHMCR in the feasible search space. PHMCR can be
selected as 0.9. This solution is then adjusted by a random number with probability
PAR, and remains unchanged with probability 1 − PAR. PAR can be selected as 0.8.
In pitch adjustment, the solution is changed slightly in the neighborhood space of
the solution.
The harmony memory is then updated. The new solution xi is substituted for
the worst solution in the harmony memory, if it outperforms the worst one. New
solutions are generated and the harmony memory is updated, until the stopping
criterion is satisfied. The harmony search procedure is given in Algorithm 14.1.
GA considers only two vectors for generating a new solution or offspring, whereas
harmony search takes into account, componentwise and on a probabilistic basis, all
the existing solutions (melodies) in the harmony memory. Harmony search is able
to infer new solutions merging the characteristics of all individuals by simply tuning
the values of its probabilistic parameters. Besides, it independently operates on each
constituent variable (note) of a solution vector (harmony), to which stochastic opera-
tors for fine-tuning and randomization are applied. The convergence rate of harmony
search and the quality of the produced solutions are not dramatically affected by
the initialized values of the constituent melodies in the harmony memory. Besides,
harmony search utilizes a probabilistic gradient which does not require the derivative
of the fitness function to be analytically solvable, nor even differentiable over the
whole solution space. Instead, the probabilistic gradient converges to progressively
better solutions iteration by iteration. Harmony search performs satisfactorily in both
continuous and discrete optimization problems. It is able to handle both decimal and
binary alphabets without modifying the definition of the original HMCR and PAR
parameters of the algorithm.

Algorithm 14.1 (Harmony Search).

1. Initialize the HM.


2. Evaluate the fitness.
3. Repeat:
a. for i = 1 to n do
if rand < HMCR //memory consideration
xi = xi^a, a ∈ {1, 2, . . . , HMS}.
if rand < PAR //pitch adjustment
xi = xi + BW · (2 · rand − 1).
endif
else
Randomly select xi in its domain.
endif
end for
b. Evaluate the fitness of x.
c. Update the HM by replacing the worst HM member xw by x if
f (x) is better than f (xw ), or disregard x otherwise.
d. Update the best harmony vector.
e. Set t = t + 1.
until termination criterion is satisfied.
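A minimal Python sketch of one improvisation and memory-update cycle of Algorithm 14.1 for a minimization problem; the variable names and the clipping to the box [lb, ub] are illustrative choices.

import numpy as np

def improvise_and_update(HM, fitness, obj, lb, ub, hmcr=0.9, par=0.3, bw=0.05,
                         rng=np.random.default_rng()):
    """One harmony search iteration: memory consideration, pitch adjustment,
    random consideration, then replacement of the worst harmony if improved."""
    hms, n = HM.shape
    x = np.empty(n)
    for i in range(n):
        if rng.random() < hmcr:                      # memory consideration
            x[i] = HM[rng.integers(hms), i]
            if rng.random() < par:                   # pitch adjustment
                x[i] += bw * (2 * rng.random() - 1)
        else:                                        # random consideration
            x[i] = lb[i] + (ub[i] - lb[i]) * rng.random()
    x = np.clip(x, lb, ub)
    fx = obj(x)
    worst = np.argmax(fitness)
    if fx < fitness[worst]:                          # accept only if better than the worst
        HM[worst], fitness[worst] = x, fx
    return HM, fitness

Repeating this update for NI improvisations, starting from a harmony memory filled with random feasible solutions, constitutes the basic harmony search loop.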

14.3 Variants of Harmony Search

There are some improved and hybridized variants of harmony search. The HMCR and
PAR parameters help harmony search in searching for globally and locally improved
solutions, respectively. Harmony search is not successful in performing local search
in numerical optimization [19].
Improved harmony search [19] dynamically adjusts the parameters PAR and BW over the search iterations. It linearly increases PAR from its minimum to its maximum, while exponentially decreasing BW from its maximum value to its minimum, as the iterations proceed.
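One common way to write these schedules (a sketch; the exact expressions used in [19] may differ in notation) is the following, where t is the current improvisation and NI the maximum number of improvisations.

import math

def ihs_parameters(t, NI, par_min, par_max, bw_min, bw_max):
    """Linearly increase PAR and exponentially decrease BW with iteration t."""
    par = par_min + (par_max - par_min) * t / NI
    bw = bw_max * math.exp(math.log(bw_min / bw_max) * t / NI)
    return par, bw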
Global-best harmony search [23] hybridizes PSO concept with harmony search
operators. The pitch adjustment operator is modified to improve the convergence rate,
such that the new improvised harmony is directly selected from the best solution in
the harmony memory. PAR is dynamically updated. Instead of making a random
change in the generated solution after the harmony memory consideration phase, the
solution is replaced with the best solution in harmony memory with the probability
of PAR. Improved global-best harmony search [21] combines a novel improvisation
scheme with an existing PAR and BW updating mechanism.

Self-adaptive global-best harmony search [25] adopts a new improvisation scheme


and an adaptive parameter tuning method. According to the applied pitch adjustment
rule in the new improvisation scheme, $x_i^{new}$ is assigned the value of the corresponding decision variable $x_i^{best}$ of the best harmony. In order to avoid getting trapped in locally optimal
solutions, a modified memory consideration operator is used in the algorithm. Fur-
thermore, HMCR and PAR are dynamically updated to a suitable range by recording
their historical values corresponding to generated harmonies entering the HM. BW
is decreased with increasing generations by a dynamic method.
In [7], sequential quadratic programming is used as a local optimizer to improve
the new harmony for harmony search. By mathematically analyzing the evolution of
population variance for harmony search, a small but effective amendment to harmony
search is proposed in [6] to increase its explorative power. Inspired by the local
version of PSO, local-best harmony search with dynamic subpopulations [24] divides
the harmony memory into many subharmony memories.
In a global harmony search method [28], harmony memory consideration and
pitch adjustment are not used, but genetic mutation with low probability is included,
and a new variable updating technique is applied.
In global dynamic harmony search [16], all the parameters are dynamically adjusted, and the domain is changed to a dynamic mode to speed up convergence. The method outperforms other harmony search variants as well as the GA, PSO, DE, and ABC algorithms.
A location-updating strategy is designed which makes the algorithm converge more easily. Another improvement to harmony search replaces the pitch adjustment operation with a mutation strategy borrowed from DE [5].
Enhanced harmony search [20] enables harmony search to quickly escape from
local optima. The harmony memory updating phase is enhanced by considering also
designs that are worse than the worst design stored in the harmony memory but are
far enough from local optima.
Intelligent tuned harmony search [27] maintains a balance between diversification
and intensification by automatically selecting PAR based on its harmony memory.
The performance of the algorithm is influenced by other parameters, such as HMS
and HMCR.
Self-adaptive harmony search [26] uses the minimum and maximum of the present
harmony memory members (self-consciousness), to automatically control the pitch
adjustment step. The method linearly updates PAR from its maximum to its minimum
during the iterations, as in improved harmony search, but BW is completely removed.
We have

xnew + [max(HMi ) − xnew
i ]rand() with probability p = 0.5
xnew = i (14.3)
i xi − [xnew
new
i − min(HMi )]rand() with probability 1 − p = 0.5
where min(HMi ) and max(HMi ) are the lowest and highest values of the ith decision
variable in the harmony memory, and rand() generates a uniform random number
in [0, 1]. Since min(HMi ) and max(HMi ) gradually approach the optimum design,
finer adjustments of the harmony memory are produced.
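As an illustration, a minimal Python sketch of the pitch adjustment rule (14.3) is given below. It operates on one decision variable of the newly improvised harmony; the function name and the way the harmony memory column is passed are illustrative, not part of the original algorithm description.

import random

def self_adaptive_pitch_adjust(x_new_i, hm_column):
    # Pitch adjustment of Eq. (14.3): pull the improvised value toward the
    # current maximum or minimum of the ith decision variable in the harmony
    # memory (hm_column lists that variable over all stored harmonies).
    hm_min, hm_max = min(hm_column), max(hm_column)
    if random.random() < 0.5:
        # adjust upward, at most to max(HM_i)
        return x_new_i + (hm_max - x_new_i) * random.random()
    else:
        # adjust downward, at most to min(HM_i)
        return x_new_i - (x_new_i - hm_min) * random.random()

As min(HM_i) and max(HM_i) shrink toward the optimum over the iterations, the adjustment step shrinks automatically, which is the self-adaptive behavior described above.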

A selection mechanism in harmony search is introduced in [1,15]. In [15], the


tournament selection-based harmony search is basically the same as the improved
harmony search [19], except that a predefined number of harmonies participate in a
tournament and the winner of the tournament is selected as a harmony for impro-
visation. In [1], different selection schemes are considered, including global-best,
fitness-proportional, tournament, linear rank, and exponential rank selection. A selec-
tion scheme in the process of memory consideration has a beneficial effect on the
performance of the harmony search algorithm.
Geometric selective harmony search [4] integrates a selection procedure in the
improvisation phase, a memory consideration process that makes use of a recom-
bination operator, and a mutation operator. On CEC 2010 suite, the algorithm out-
performs the other studied harmony search variants with statistical significance in
almost all the benchmark problems considered.
In [14], two varying control parameters are used to generate new harmony vectors.
Both the parameters are selected from the average values that are observed within
the current harmony memory matrix using a given probability density function.
Parameter-setting-free harmony search [13] has a rehearsal step, in which certain
numbers of new solutions are generated with the initial HMCR and PAR. The adaptive
HMCR and PAR are then calculated based on the rehearsal results evaluated.
In [6], the exploratory power of harmony search is analyzed based on the evolution
of the population variance over successive generations of the harmony memory. In
exploratory harmony search [6], BW is set to be proportional to the standard deviation
of the harmony memory population. Exploratory harmony search outperforms IHS
and GHS.

Best value:6.0243e−006 Mean value:6.0243e−006


2
10
Best value
Mean value
0
10
Function value

−2
10

−4
10

−6
10
0 20 40 60 80 100
Iteration

Figure 14.1 The evolution of a random run of harmony search for Rastrigin function: the minimum
and average objectives.

Example 14.1: Revisit the Rastrigin function treated in Example 6.1. The global
optimum is f (x) = 0 at x∗ = 0.
We now find the global optimum by using the improved harmony search [19]. We
select HMCR = 0.9, PAR linearly decreasing from 0.9 to 0.3, and BW exponentially
decreasing from 0.5 to 0.2. The harmony memory size is selected as 50, and the
maximum number of iterations is 100. The initial harmonies are randomly generated
from the entire domain.
For 10 random runs, the solver always converged to the global optimum. For a
random run, it gives the optimum solution f(x) = 6.0243 × 10^−6 at (−0.1232 × 10^−3, −0.1232 × 10^−3), and all the individuals converged toward the global optimum. The evolution of a random run is illustrated in Figure 14.1.
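A minimal Python sketch of the improved harmony search run used in this example is given below, assuming the two-dimensional Rastrigin function on [−5.12, 5.12]^2 and the parameter settings stated above; the helper names are illustrative, and details such as the exact BW decay law may differ from [19].

import math, random

def rastrigin(x):
    return 10 * len(x) + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

def improved_harmony_search(dim=2, lo=-5.12, hi=5.12, hms=50, iters=100,
                            hmcr=0.9, par_max=0.9, par_min=0.3,
                            bw_max=0.5, bw_min=0.2):
    hm = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    fit = [rastrigin(h) for h in hm]
    for t in range(iters):
        par = par_max - (par_max - par_min) * t / iters                    # linear decrease of PAR
        bw = bw_max * math.exp(math.log(bw_min / bw_max) * t / iters)      # exponential decrease of BW
        new = []
        for i in range(dim):
            if random.random() < hmcr:
                xi = random.choice(hm)[i]                                  # harmony memory consideration
                if random.random() < par:
                    xi += bw * random.uniform(-1, 1)                       # pitch adjustment
            else:
                xi = random.uniform(lo, hi)                                # random selection
            new.append(min(max(xi, lo), hi))
        worst = max(range(hms), key=lambda k: fit[k])
        if rastrigin(new) < fit[worst]:                                    # replace the worst harmony
            hm[worst], fit[worst] = new, rastrigin(new)
    best = min(range(hms), key=lambda k: fit[k])
    return hm[best], fit[best]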

14.4 Melody Search

In music, harmony is the use of simultaneous pitches or chords, and is the vertical
aspect of music space. Melodic line is the horizontal aspect, as shown in Figure 14.2.
Melody is a linear succession of individual pitches. Figure 14.3 illustrates the melody
search model.
Melody search [2], as an improved version of harmony search method, mimics
performance processes of the group improvisation for finding the best series of
pitches within a melody. In such a group, the music players can improvise the melody differently and lead one another, so that the group achieves the best subsequence of pitches faster.
In melody search, each melodic pitch corresponds to a decision variable, and each melody is generated by a player and corresponds to a solution of the problem. Each
player produces a series of subsequent pitches within their possible ranges; if the
succession of pitches makes a good melody, that experience is stored into the player
memory. Unlike harmony search that uses a single harmony memory, melody search
employs several memories named player memory.

Figure 14.2 Melody and


harmony.

Melody

Harmony Harmony

Figure 14.3 Melodies and optimization: Melody 1, by player 1; Melody 2, by player 2; Melody 3, by player 3.

Applying an alternative improvisation procedure [3] makes the algorithm more capable of optimizing shifted and rotated unimodal and multimodal problems than basic melody search, and it finds better solutions than harmony search and a number of its variants.
Utilizing different player memories and their interactive process enhances the
algorithm efficiency compared to harmony search, while the possible range of variables can vary over the algorithmic iterations.
Method of musical composition [22] is a multiagent metaheuristic, based on an
artificial society that uses a dynamic creative system to compose music, for continu-
ous optimization problems. In this method, composers exchange information among
themselves and their environment, generate for each agent a new tune, and use their
knowledge to improve their musical works. These interactions produce a learning
that is used to adapt the individual to the current environment faster. The method
outperforms harmony search, improved harmony search, global-best harmony search
and self-adaptive harmony search on a set of multimodal functions.

References
1. Al-Betar MA, Doush IA, Khader AT, Awadallah MA. Novel selection schemes for harmony
search. Appl Math Comput. 2012;218:6095–117.
2. Ashrafi SM, Dariane AB. A novel and effective algorithm for numerical optimization: melody
search. In: Proceedings of the 11th international conference on hybrid intelligent systems (HIS),
Malacca, Malaysia, Dec 2011. p. 109–114.
3. Ashrafi SM, Dariane AB. Performance evaluation of an improved harmony search algorithm
for numerical optimization: melody Search (MS). Eng Appl Artif Intell. 2013;26:1301–21.
4. Castelli M, Silva S, Manzoni L, Vanneschi L. Geometric selective harmony search. Inf Sci.
2014;279:468–82.
5. Chakraborty P, Roy GG, Das S, Jain D, Abraham A. An improved harmony search algorithm
with differential mutation operator. Fundamenta Informaticae. 2009;95(4):401–26.

6. Das S, Mukhopadhyay A, Roy A, Abraham A, Panigrahi BK. Exploratory power of the harmony
search algorithm: analysis and improvements for global numerical optimization. IEEE Trans
Syst Man Cybern Part B. 2011;41(1):89–106.
7. Fesanghary M, Mahdavi M, Minary-Jolandan M, Alizadeh Y. Hybridizing harmony search
algorithm with sequential quadratic programming for engineering optimization problems.
Comput Methods Appl Mech Eng. 2008;197:3080–91.
8. Geem ZW, Tseng C, Park Y. Harmony search for generalized orienteering problem: best touring
in China. In: Wang L, Chen K, Ong Y editors. Advances in natural computation, vol. 3412 of
Lecture Notes in Computer Science. Berlin: Springer; 2005. p. 741–750.
9. Geem ZW, Kim JH, Loganathan GV. A new heuristic optimization algorithm: harmony search.
Simulation. 2001;76(2):60–8.
10. Geem ZW, Kim JH, Loganathan GV. Harmony search optimization: application to pipe network
design. Int J Model Simul. 2002;22:125–33.
11. Geem ZW. Novel derivative of harmony search algorithm for discrete design variables. Appl
Math Comput. 2008;199(1):223–30.
12. Geem ZW. Recent advances in harmony search algorithm. Berlin: Springer; 2010.
13. Geem ZW, Sim K-B. Parameter-setting-free harmony search algorithm. Appl Math Comput.
2010;217(8):3881–9.
14. Hasannebi O, Erdal F, Saka MP. Adaptive harmony search method for structural optimization.
ASCE J Struct Eng. 2010;136(4):419–31.
15. Karimi M, Askarzadeh A, Rezazadeh A. Using tournament selection approach to improve har-
mony search algorithm for modeling of proton exchange membrane fuel cell. Int J Electrochem
Sci. 2012;7:6426–35.
16. Khalili M, Kharrat R, Salahshoor K, Sefat MH. Global dynamic harmony search algorithm:
GDHS. Appl Math Comput. 2014;228:195–219.
17. Lee KS, Geem ZW. A new structural optimization method based on the harmony search algo-
rithm. Comput Struct. 2004;82:781–98.
18. Lee KS, Geem ZW. A new meta-heuristic algorithm for continuous engineering optimization:
harmony search theory and practice. Comput Methods Appl Mech Eng. 2005;194:3902–33.
19. Mahdavi M, Fesanghary M, Damangir E. An improved harmony search algorithm for solving
optimization problems. Appl Math Comput. 2007;188(2):1567–79.
20. Maheri MR, Narimani MM. An enhanced harmony search algorithm for optimum design of
side sway steel frames. Comput Struct. 2014;136:78–89.
21. Mohammed EA. An improved global-best harmony search algorithm. Appl Math Comput.
2013;222:94–106.
22. Mora-Gutierrez RA, Ramirez-Rodriguez J, Rincon-Garcia EA. An optimization algorithm
inspired by musical composition. Artif Intell Rev. 2014;41:301–15.
23. Omran MGH, Mahdavi M. Global-best harmony search. Appl Math Comput. 2008;198(2):643–
56.
24. Pan QK, Suganthan PN, Liang JJ, Tasgetiren MF. A local-best harmony search algorithm with
dynamic subpopulations. Eng Optim. 2010;42(2):101–17.
25. Pan QK, Suganthan PN, Tasgetiren MF, Liang JJ. A self-adaptive global best harmony search
algorithm for continuous optimization problems. Appl Math Comput. 2010;216:830–48.
26. Wang CM, Huang YF. Self-adaptive harmony search algorithm for optimization. Expert Syst
Appl. 2010;37:2826–37.
27. Yadav P, Kumar R, Panda SK, Chang CS. An intelligent tuned harmony search algorithm for
optimization. Inf Sci. 2012;196:47–72.
28. Zou D, Gao L, Wu J, Li S. Novel global harmony search algorithm for unconstrained problems.
Neurocomputing. 2010;73:3308–18.
15 Swarm Intelligence

Nature-inspired optimization algorithms can, generally, be grouped into


evolutionary approaches and swarm intelligence methods. EAs try to improve the
candidate solutions (chromosomes) using evolutionary operators. Swarm intelli-
gence methods use differential position update rules for obtaining new candidate
solutions. The popularity of the swarm intelligence methods is due to their sim-
plicity, easy adaptation to the problem, and effectiveness in solving complex optimization problems.

15.1 Glowworm-Based Optimization

Glowworms or fireflies belong to a family of beetles. They emit bioluminescent light


to attract their mates or prey. The brighter the glow, the more the attraction. The light
intensity is proportional to the associated luminescence quantity called luciferin and
it interacts with other glowworms within a variable neighborhood.
Most fireflies produce short and rhythmic flashes. The pattern of flashes is often
unique for a particular species. The flashing light is produced by a process of bio-
luminescence. Such flashes are to attract mating partners, and to attract potential
prey, and to serve as a protective warning mechanism. The rhythmic flash, the rate of
flashing, and the amount of time form part of the signal system that brings both sexes
together. Females respond to a male’s unique pattern of flashing in the same species,
while in some species females can mimic the mating flashing pattern of other species
so as to lure and eat the male fireflies.


15.1.1 Glowworm Swarm Optimization

Inspired by the natural behavior of glowworms in emitting luciferin in order to attract


other glowworms, glowworm swarm optimization [35,37] was developed for the
simultaneous computation of multiple optima of multimodal functions. The related
theoretical foundation is reported in [36].
Glowworms carry luciferin along with them. A glowworm identifies its neigh-
bors and computes its movements by exploiting an adaptive neighborhood, which is
bounded above by its sensor range. Glowworms with larger emissions of luciferin are
more attractive. Each agent selects a neighbor that has a luciferin value greater than
its own (within the local decision range) and moves toward it using a probabilistic
mechanism. The algorithm starts by placing a population of glowworms randomly
in the solution space. The glowworms encode the fitness of their current locations
into a luciferin value that they broadcast to their neighbors.
Initially, all the glowworms contain an equal quantity of luciferin l_0. Each iteration
consists of three consecutive phases: luciferin update phase, movement phase, and
neighborhood range update phase (or local decision range update phase).
Each glowworm, using a probabilistic mechanism, selects a neighbor that has
a luciferin value higher than its own and moves toward it. These movements, that
are based only on local information and selective neighbor interactions, enable the
swarm to partition into disjoint subgroups that converge to multiple optima of a given
multimodal function.
The luciferin update depends on the function value at the glowworm position. Each
glowworm adds, to its previous luciferin level, a luciferin quantity proportional to
the fitness of its current location. The luciferin update rule is given by
l_i(t + 1) = (1 − ρ) l_i(t) + γ f(x_i(t + 1)),   (15.1)
where l_i(t) is the luciferin level associated with glowworm i at time t, 0 < ρ < 1 is the luciferin decay constant, γ is the luciferin enhancement constant, and f(x_i(t + 1)) represents the objective value of agent i at time t + 1.
During the movement phase, for each glowworm i, the probability of moving
toward a neighbor j is given by
P_ij(t) = (l_j(t) − l_i(t)) / Σ_{k∈N_i(t)} (l_k(t) − l_i(t)),   (15.2)
where j ∈ Ni (t), Ni (t) is the set of neighbors of glowworm i at time t that have
luciferin value higher than that of glowworm i. The neighborhood range of each
glowworm is defined by an Euclidean distance, and is adaptively updated.
Let glowworm i select glowworm j with Pi j (t). x i (t) is the location of glowworm
i at time t, and it is updated by
x_i(t + 1) = x_i(t) + α (x_j(t) − x_i(t)) / ||x_j(t) − x_i(t)||,   (15.3)
where α > 0 is the step size which can be linearly decreasing.
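A minimal Python sketch of one glowworm swarm optimization iteration, combining (15.1)–(15.3), is shown below. A fixed neighborhood radius is assumed for simplicity (the original algorithm updates this range adaptively), the objective f is to be maximized, and the function and parameter names are illustrative.

import math, random

def gso_iteration(positions, luciferin, f, rho=0.4, gamma=0.6, alpha=0.03, radius=1.0):
    # luciferin update phase, Eq. (15.1)
    luciferin = [(1 - rho) * l + gamma * f(x) for l, x in zip(luciferin, positions)]
    new_positions = []
    for i in range(len(positions)):
        # neighbors: within the sensing radius and brighter than glowworm i
        nbrs = [j for j in range(len(positions)) if j != i
                and luciferin[j] > luciferin[i]
                and math.dist(positions[i], positions[j]) < radius]
        if not nbrs:
            new_positions.append(positions[i])
            continue
        # probabilistic neighbor selection, Eq. (15.2)
        weights = [luciferin[j] - luciferin[i] for j in nbrs]
        j = random.choices(nbrs, weights=weights)[0]
        # move a step of length alpha toward the chosen neighbor, Eq. (15.3)
        d = math.dist(positions[i], positions[j])
        new_positions.append([pi + alpha * (pj - pi) / d
                              for pi, pj in zip(positions[i], positions[j])])
    return new_positions, luciferin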
A glowworm swarm optimization algorithm [31] is proposed to find the optimal
solution for multiobjective environmental economic dispatch problem. Technique

for order preference by similarity to ideal solution (TOPSIS) is employed as an overall


fitness ranking tool to evaluate the multiple objectives simultaneously. In addition, a
time-varying step size is incorporated in the algorithm to get better performance. By
taking advantage of its ability to solve multimodal optimization, in [2] glowworm
swarm optimization is combined with MapReduce parallelization methodology for
clustering big data.

15.1.2 Firefly Algorithm

Similar to PSO, firefly algorithm [83] is inspired by the ability of fireflies in emitting
light (bioluminescence) in order to attract other fireflies for mating purposes. It was
first proposed for multimodal continuous optimization [83]. A further study on the
firefly algorithm is presented for constrained continuous optimization problems in
[45]. In [69] a discrete firefly algorithm is presented to minimize makespan for
flowshop scheduling problems.
A firefly’s flash mainly acts as a signal to attract mating partners and potential
prey. Flashes also serve as a protective warning mechanism. In firefly algorithm [83],
a firefly will be attracted to other fireflies regardless of their sex. Its attractiveness
is proportional to its brightness, and they both decrease as the distance increases. If
there is no brighter one than a particular firefly, it will move randomly. The brightness
of a firefly is affected by the landscape of the objective function.
The attractiveness of a firefly is determined by its light intensity I , which can be
defined by the fitness function f (x). The attractiveness may be calculated by
β(r) = β_0 e^{−γ r^2},   (15.4)
where r is the distance between any two fireflies, β0 is the initial attractiveness at
r = 0, and γ is an absorption coefficient, which controls the decrease in the intensity
of light.
A less attractive firefly i moves toward a more attractive firefly j by
x_i = x_i + β_0 e^{−γ ||x_j − x_i||^2} (x_j − x_i) + α (rand − 0.5),   (15.5)
where α ∈ [0, 1], and rand ∈ (0, 1) is a uniformly distributed random number. Typically, γ = 0.8, α = 0.01, β_0 = 1.
Firefly algorithm is implemented as follows. For all the N_P fireflies: if intensity I_j < I_i, move firefly j toward i; update attractiveness and light intensity. The algorithm repeats until the termination criterion is satisfied.
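A minimal Python sketch of this loop for a minimization problem is given below, using (15.4) and (15.5) and the parameter values suggested above; a lower objective value is treated as a brighter firefly, and all names are illustrative.

import math, random

def firefly_algorithm(f, dim, lo, hi, n_fireflies=25, iters=100,
                      beta0=1.0, gamma=0.8, alpha=0.01):
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    obj = [f(xi) for xi in x]                        # lower objective = brighter firefly
    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if obj[j] < obj[i]:                  # firefly j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
                    beta = beta0 * math.exp(-gamma * r2)       # attractiveness, Eq. (15.4)
                    x[i] = [min(max(a + beta * (b - a) + alpha * (random.random() - 0.5), lo), hi)
                            for a, b in zip(x[i], x[j])]       # move, Eq. (15.5)
                    obj[i] = f(x[i])
    best = min(range(n_fireflies), key=lambda k: obj[k])
    return x[best], obj[best]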
Firefly movement is based on the local optima, but is not influenced by the global
optima. Thus the exploration rate of firefly algorithm is very limited. Fuzzy firefly
algorithm [26] increases the exploration and improves the global search of fire-
fly algorithm. In each iteration, the global optima and some brighter fireflies have
influence on the movement of fireflies. The effect of each firefly depends on its
attractiveness, which is selected as a fuzzy variable.
Eagle strategy [89] is a two-stage hybrid search method for stochastic optimiza-
tion. It combines the random search using Levy walk with firefly algorithm in an
iterative manner.

15.2 Group Search Optimization

Group search optimization [28,29] is a swarm intelligence algorithm inspired by


the search behavior of animals (such as lions and wolves) and group living theory. The
population is called a group and each individual is called a member. The framework
is based on producer–scrounger model. General animal scanning mechanisms (e.g.,
vision) are employed for producers. The method is not sensitive to most of the
algorithm parameters except the percentage of rangers [29]. It is effective and robust
on solving multimodal problems.
A group consists of three types of members: producers, scroungers, and dispersed
members who perform random walk motions to avoid entrapments in local minima.
Producer–scrounger model is simplified by assuming that there is only one producer
at each search bout and the remaining members are scroungers and dispersed mem-
bers. All scroungers will join the resource found by the producer. In optimization
problems, unknown optima can be regarded as open patches randomly distributed in
a search space. The producer and the scroungers can switch between the two roles.
At each iteration, the member located at the best position is chosen as the producer
G best . Each member has position x i (k) and head angle φ i (k). At the kth iteration,
the producer position x p (k) scans three points around it, namely, a point in front of
it, a point on its left side, and a point on the right-hand side, to find a better position.
If the producer finds that the best position in the three points is better than its current
position, it moves to the best position and changes its head angle. Otherwise, it stays
at the original position. If the producer fails to find a better point after a iterations, it turns its head back and scans the front again.
In the computation, most of the members are chosen as scroungers. If the ith
member is chosen as a scrounger at the kth iteration, it moves toward the producer
with a random distance. The rest of the members are dispersed members, acting as
rangers. If the ith member is chosen as a ranger at the kth iteration, it turns its head
to a random angle and then moves to a search direction.
At each iteration, a member, which is located in the most promising area and
conferring the best fitness value, is chosen as the producer. It then stops and scans
the environment to seek resources (optima). Scanning can be accomplished through
physical contact or by visual, chemical, or auditory mechanisms. Vision is employed
by the producer.
During each searching bout, a number of group members are selected as
scroungers. The scroungers will keep searching for opportunities to join the resources
found by the producer. In this method, only area copying behavior in sparrows is
adopted. At the kth iteration, the area copying behavior of the ith scrounger can be
modeled as a random walk toward the producer helping the group to escape from
local minima in the earlier search bouts.
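A much simplified Python sketch of one producer–scrounger iteration is given below for a minimization problem. The head-angle bookkeeping and the precise front/left/right scanning geometry of [28,29] are omitted: the producer simply samples three points within a scan radius, scroungers move a random fraction toward the producer, and rangers take a random walk; all names and parameter values are illustrative.

import random

def gso_step(members, fitness, f, scan_radius=0.5, ranger_frac=0.2, lo=-5.0, hi=5.0):
    best = min(range(len(members)), key=lambda k: fitness[k])   # producer = best member
    # producer scans three candidate points around its position
    for _ in range(3):
        cand = [min(max(p + random.uniform(-scan_radius, scan_radius), lo), hi)
                for p in members[best]]
        if f(cand) < fitness[best]:
            members[best], fitness[best] = cand, f(cand)
    for i in range(len(members)):
        if i == best:
            continue
        if random.random() < ranger_frac:
            # ranger: random walk to avoid entrapment in local minima
            members[i] = [min(max(p + random.uniform(-scan_radius, scan_radius), lo), hi)
                          for p in members[i]]
        else:
            # scrounger: area copying, move a random distance toward the producer
            r = random.random()
            members[i] = [p + r * (q - p) for p, q in zip(members[i], members[best])]
        fitness[i] = f(members[i])
    return members, fitness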
In social network structure, the spread of information is more efficient, where
each individual can gather information from its neighbors. A network structure called
small-world topology [78] is inspired by the human social network by building a small
number of shortcuts between nodes which are far from one another. In [82], group
search optimization is improved by increasing the diversity of scroungers’ behavior

through introducing small-world scheme in complex network. Each scrounger selects


a subset of members as its neighbors, and evolves with the effects of global best
member and local best member within neighbors at each iteration.
In [44], area-restricted search behavior has inspired synthetic predator search
algorithm for solving combinatorial optimization problems.

15.3 Shuffled Frog Leaping

Shuffled frog leaping [21,22] is a memetic metaheuristic inspired by the group


search of frogs for food resources in a swamp. It combines the benefits of memetic
algorithm, PSO and information sharing of parallel local search. The method has been
used to solve discrete and continuous optimization problems. It has a performance
similar to that of PSO, and outperforms GA [20].
In the method, frogs with worst positions are directed to leap toward a position
with more food by sharing ideas/beliefs with the local/global best frogs. Thus, all
frogs approach to the best solution on evolution. The method performs local evolution
through social interaction of the species within local community and achieves global
evolution by shuffling the whole population after every generation.
The algorithm has three stages: partitioning, local search, and shuffling. The pop-
ulation consists of a set of frogs (solutions) that are partitioned into subsets referred
to as memeplexes. The memeplexes are considered as different cultures of frog, each
performing a local search. Within each memeplex, the individuals hold ideas, which
can be influenced by the idea of others, and evolve through a process of memetic
evolution. Within each memeplex, frogs search for the maximum amount of food available in the pond, and the food search is optimized by improving the positions of the worst frogs. Members of each memeplex are improved in a way similar to that of
PSO. After a defined number of memetic evolution steps, ideas are passed among
memeplexes in a shuffling process [20]. The stopping criteria are checked.
The position of the worst frog is optimized by adapting its movement through sharing ideas/beliefs with either the best frog within the memeplex or the global best frog of the entire population; this brings the worst frog to a position anywhere between its current position and that of the best frog. Thus, contrary to the expectation of retaining diverse species, there is a high probability that upon evolution all frogs in a memeplex approach the best solution. In [40], shuffled frog leaping is modified by adding an inertia component to the existing leaping rule to improve the position of the worst frog.
Shuffled DE presents a structure for DE that is derived from partitioning and
shuffling concepts of shuffled frog leaping.
The shuffled frog leaping flowchart is given in Algorithm 15.1.

Algorithm 15.1 (Shuffled Frog Leaping).

1. Generate initial population of N P frogs x i , i = 1, 2, . . . , N P .


2. for each individual i in P : calculate fitness f (i).
3. Sort the population P in descending order of their fitness.
4. Repeat:
a. Divide P into m memeplexes.
b. for each memeplex:
Determine the best x b and worst x w frogs.
Improve the worst frog position by x w = x w + rand()(x b − x w ).
Repeat for a specific number of iterations.
end for
c. Combine the evolved memeplexes.
d. Sort the population P in descending order of their fitness.
until termination criterion is satisfied.
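A minimal Python sketch of the memeplex improvement step (step 4b of Algorithm 15.1) for a minimization problem is given below; the random reinitialization of the worst frog when the leap fails is a common fallback in the literature rather than part of the listing above, and all names are illustrative.

import random

def improve_memeplex(memeplex, f, lo, hi, n_leaps=10):
    for _ in range(n_leaps):
        memeplex.sort(key=f)                         # best frog first, worst frog last
        xb, xw = memeplex[0], memeplex[-1]
        # worst frog leaps toward the best frog of the memeplex
        new = [min(max(w + random.random() * (b - w), lo), hi) for w, b in zip(xw, xb)]
        if f(new) < f(xw):
            memeplex[-1] = new
        else:
            # if no improvement, replace the worst frog with a random position
            memeplex[-1] = [random.uniform(lo, hi) for _ in xw]
    return memeplex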

Jumping frogs optimization [50] is another metaheuristic optimization method


inspired by frogs jumping. It is suitable for discrete problems. The procedure derives
from PSO, except that the velocity concept is not used but the notion of attraction
of the leaders is kept. Instead of velocity and inertia, a random component in the
movement of particles, in the form of jumps, are considered. Local search is also
included to improve the evolving solutions. After each random or approaching-to-
attractor movement, a local search is applied to every particle in the swarm.

15.4 Collective Animal Search

By mimicking the collective animal behavior, collective animal behavior algorithm


[17] is a metaheuristic algorithm for multimodal optimization. Searcher agents are
a group of animals which interact with one another based on the biologic laws of
collective motion, which are simple behavioral rules. A memory is incorporated to
store the best animal positions (best solutions) considering a competition-dominance
mechanism. The memory maintains the best found positions in each generation
(Mg), and the best history positions during the complete evolutionary process (Mh).
Collective animal behavior algorithm starts by generating random solutions or animal
positions. The fitness value refers to the animal dominance with respect to the group.
The algorithm then keeps the positions of the best individuals. The individuals move
from or to nearby neighbors (local attraction and repulsion). Some individuals move
randomly, and compete for the space inside of a determined distance (updating the
memory). This process repeats until the termination criterion is met.

Free search [63] is inspired from the animals’ behavior and operates on a set of
solutions called population. In free search, each animal has original peculiarities
called sense and mobility. The sense is an ability of the animal for orientation within
the search space, and it is used for selection of location for the next step. The sen-
sibility varies during the optimization process. The animal can select any location
marked with pheromone, which fits its sense. During the exploration walk, the ani-
mals step within the neighbor space. The neighbor space also varies for the different
animals. Therefore, the probability for access to any location of the search space
is nonzero. During the exploration, each animal achieves some favor (an objective
function value) and distributes a pheromone in amount proportional to the amount
of the found favor. The pheromone is fully replaced with a new one after each walk.
Particularly, the animals in the algorithm are mobile. Each animal can operate either
with small precise steps for local search or with large steps for global exploration.
Moreover, the individual decides how to search personally.
Animal migration optimization [42] is a heuristic optimization method inspired
by the ubiquitous animal migration behavior, such as birds, mammals, fish, reptiles,
amphibians, insects, and crustaceans. In the first process, the algorithm simulates how
the groups of animals move from the current position to the new position. During
this process, each individual should obey three main rules. In the latter process, the
algorithm simulates how some animals leave the group and some join the group
during the migration.

15.5 Cuckoo Search

Cuckoo search is a metaheuristic search algorithm for global optimization, imitating


cuckoo bird’s behavior [73,74,87,88]. Cuckoo search was inspired by the obligate
brood parasitism of some cuckoo species by laying their eggs in the nests of other
host birds, in combination of the Levy flight behavior of some birds and fruit flies.
If an egg is discovered by the host bird as not its own, it will either throw the
unknown egg away or simply abandon its nest and build a new nest elsewhere. Some
other species have evolved in such a way that female parasitic cuckoos are often very
specialized in mimicking the color and pattern of the eggs of a few chosen host species.
This reduces the probability of their eggs being abandoned and thus increases their
population. Further, the cuckoos often choose a nest where the host bird just laid
its eggs. Typically, the cuckoo eggs hatch a little earlier than the host eggs, and the
cuckoo chicks may evict the host eggs out of the nest. The cuckoo chicks also mimic
the call of host chicks for feeding.
In cuckoo search algorithm, each egg in a nest represents a solution, and a cuckoo
egg represents a new solution, the aim is to use a cuckoo egg to replace a solution
in the nests. This algorithm follows three idealized rules: Each cuckoo lays one egg
at a time, and put its egg in randomly chosen nest; The best nests with high-quality
eggs will carry over to the next generations; The number of available host nests is
fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability

Algorithm 15.2 (Cuckoo Search).

1. Generate initial population of N P host nests x i and evaluate their fitness Fi , i =


1, 2, . . . , N P .
2. Repeat
a. for i = 1 to N P do
i. Get cuckoo egg x i from random host nest by using Levy flights
evaluate its fitness Fi .
Choose nest j among N P randomly.
ii. if Fi > F j , replace solution j by the new solution.
iii. A fraction Pa of worst nests are abandoned and new ones are built.
iv. Keep the best solutions or nests.
v. Rank the fitness of the solutions/nests, and find the current best solution.
end for
until the maximum number of generations.

Pa ∈ [0, 1]. The nests discovered by the host bird are abandoned and removed from
the population, and they are replaced by new nests (with new random solutions).
Levy flights algorithm is a stochastic algorithm for global optimization [62]. It is a
random walk that is characterized by a series of straight jumps chosen from a heavy-
tailed probability density function [77]. Unlike Gaussian and Cauchy distributions,
Levy distribution is nonsymmetrical, and has infinite variance with an infinite mean.
The foraging path of an animal commonly has the next move based on the current
state and the variation probability to the next state. The flight behavior of many birds
and insects has the characteristics of Levy flights.
When generating new solution x(t + 1) for the ith cuckoo, a Levy flight is per-
formed:
x_i(t + 1) = x_i(t) + α Levy(λ),   (15.6)
where α > 0 is the step size, and the random step length is drawn from a Levy distribution u = t^{−λ}, λ ∈ (1, 3], which has an infinite variance with an infinite mean. This escapes local minima more easily than Gaussian random steps do.
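A minimal Python sketch of generating such a move is given below. Mantegna's algorithm is used to draw the Levy-distributed step lengths, which is a common choice in cuckoo search implementations but is not prescribed by the text; the exponent lam and the step size alpha are illustrative.

import math, random

def levy_step(dim, lam=1.5):
    # Mantegna's algorithm for a Levy-stable step with exponent lam
    sigma_u = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2) /
               (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    return [random.gauss(0, sigma_u) / abs(random.gauss(0, 1)) ** (1 / lam)
            for _ in range(dim)]

def cuckoo_move(x_i, alpha=0.01):
    # new candidate via Eq. (15.6): x_i(t+1) = x_i(t) + alpha * Levy(lambda)
    return [xi + alpha * s for xi, s in zip(x_i, levy_step(len(x_i)))]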
Cuckoo search algorithm consists of three parameters: Pa (probability of worse
nests to be abandoned), step size α, and random step length λ. The optimal solutions
obtained by cuckoo search are far better than the best solutions obtained by PSO or
GA [88]. The algorithm flowchart is given in Algorithm 15.2.
In [19], cuckoo search is enhanced with multimodal optimization capacities by
incorporating a memory mechanism to register potential local optima according
to their fitness value and the distance to other potential solutions, modifying the
individual selection strategy to accelerate the detection process of new local minima,
and including a depuration procedure to cyclically eliminate duplicated memory
elements.

Cuckoo optimization algorithm [66] is another population-based metaheuristic


inspired from cuckoo survival competition by egg laying and breeding. The cuckoos,
in different societies, exist in the forms of mature cuckoos and eggs. Mature cuckoos
lay eggs in other birds’ nest and if these eggs are not recognized and not killed by the
host birds, they grow and become mature cuckoos. During the survival competition,
some of the cuckoos or their eggs demise. The survived cuckoo societies immigrate
to a better environment and start reproducing and laying eggs. Cuckoos’ survival
effort hopefully converges to a state that there is only one cuckoo society, all with the
same profit values. Environmental features and the migration of societies (groups) of
cuckoos hopefully lead them to converge and find the best environment for breeding
and reproduction. In [49], cuckoo optimization algorithm is modified for discrete
optimization problems.

Example 15.1: We revisit Rosenbrock function treated in Examples 3.3 and 5.1. The
function has the global minimum f(x) = 0 at x_i = 1, i = 1, . . . , n. The landscape
of this function is shown in Figure 1.3.
We apply cuckoo search algorithm to solve this problem. The implementation
sets the number of nests (solutions) as 30, the maximum number of iterations as
1000, Pa = 0.25, and selects the initial nests randomly from the entire domain.
For a random run, we have f(x) = 3.0920 × 10^−4 at (0.9912, 0.9839) with 60,000
function evaluations. All the individuals converge toward the global optimum. For
10 random runs, the solver always converged toward a point very close to the global
optimum. The evolution of a random run is illustrated in Figure 15.1.

Figure 15.1 The minimum objective of a random run of cuckoo search for Rosenbrock function.

15.6 Bat Algorithm

Bats are the only volitant mammals in the world. There are nearly 1,000 species
of bats. Many bats have echolocation (https://askabiologist.asu.edu/echolocation);
they can emit a very loud and short sound pulse and receive the echo reflected
from the surrounding objects by their extraordinary big auricle. The emitted pulse
could be as loud as 110 dB in the ultrasonic region. The loudness varies from the
loudest when searching for prey and to a quieter base when homing toward the
prey. This echo is then analyzed in their brain, from which they can discriminate
direction for their flight pathway and also distinguish different insects and obstacles,
to hunt prey and avoid collisions effectively. A natural bat increases the rate of pulse emission and decreases the loudness when it finds prey [7]. The echolocation signal can simultaneously serve as a communication function, allowing for social communication in bat populations.
Bat algorithm [84,85] is a metaheuristic optimization method inspired by the
echolocation or biosonar behavior of bats. In the algorithm, all bats navigate by
using echolocation to sense distance and detect the surroundings. Bats fly randomly
with velocity v i at position x i with a fixed frequency f min , varying wavelength λ,
and loudness A0 to search for prey. They automatically adjust the wavelength of
their emitted pulses and adjust the rate of pulse emission r ∈ [0, 1], depending on
the proximity of their target. Typically, the rate of pulse emission r increases and the
loudness A decreases when the population draws nearer to the local optimum. The
loudness varies from a positive large value A0 to a minimum value Amin .
Apart from the population size N P and maximum iteration number, the algorithm
employs two control parameters: pulse rate and loudness. The pulse rate regulates
an improvement of the best solution, while the loudness influences an acceptance of
the best solution.
Bat algorithm controls the size and orientation of bats’ moving speed by adjusting
the frequency of each bat and then moves to a new location. To some extent, PSO
is a special case of bat algorithm. Bat algorithm utilizes a balanced combination of
PSO and the local/global search mode controlled by loudness A and pulse rate r .
Each bat in the population represents a candidate solution x i , i = 1, . . . , N P .
Bat algorithm consists of initialization, variation operation, local search, solution
evaluation, and replacement steps.
In the initialization step, the algorithm parameters are initialized. Then, an initial
population of N P solutions (bats) x i is generated randomly. Next, this population is
evaluated, and the best solution is determined as x best .
The variation operator moves the virtual bats in the search space. In local search,
the current best solution is improved by the random walk direct exploitation heuris-
tics. The replacement step replaces the current solution with the newly generated
solution according to some probability. A local search is launched with the probability given by the pulse rate r. The probability of accepting and saving the new best solution depends on the loudness A. The position of bat i at step t is updated by
x_i^t = x_i^{t−1} + v_i^t,   (15.7)

where the velocity of movement v_i^t is calculated by
v_i^t = v_i^{t−1} + (x_i^t − x_best) f_i,   (15.8)
with f_i being the frequency of the ith bat, and x_best the global best solution found so far. f_i can be set as a uniform random value between f_min and f_max. It is recommended that f_min = 0 and f_max = 2. If a variable overflows the allowed search space limits, it is set to the value of the closer limit.
The current best is then improved. For each x_i^t, it is updated by
x_new = x_best + ε A^t,   if rand1 > r_i^t,   (15.9)
or x_new = x_i^t otherwise, where rand1 is a uniform random value in [0, 1], ε is a uniform random value in [−1, 1], A^t = ⟨A_i^t⟩ is the average loudness of all bats at step t, and r_i^t is the pulse rate function.
r_i^t = r_i^0 (1 − e^{−βt}),   (15.10)
where β is a constant and r_i^0 are the initial pulse rates in the range [0, 1]. It can be seen that this function controls the intensive local search. The pulse rate can be simply
determined in the range from 0 to 1, where 0 means that there is no emission and 1
means that the bat’s emitting is at maximum.
Next, the solution x_new and f(x_new) are accepted as the new solution and objective function value for x_i^t if rand2 < A_i^t and f(x_new) is better than f(x_i^{t−1}); otherwise x_i^t = x_i^{t−1}, where rand2 is a uniform random number in [0, 1]. The loudness A_i^t is given by
A_i^t = α A_i^{t−1},   (15.11)
where α is a constant and plays a role similar to the cooling factor of a cooling schedule.
Discrete bat algorithms have been proposed for the optimal permutation flow shop
scheduling problem [46] and for the symmetric and asymmetric TSPs [60]. In [32],
chaotic-based strategies are incorporated into bat swarm optimization. Ergodicity
and non-repetitious nature of chaotic functions can diversify the bats. The loudness
is computed via multiplying a linearly decreasing function by chaotic map function.
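A minimal Python sketch of one bat's update using (15.7)–(15.11) for a minimization problem is given below. The bat's own loudness is used in the local random walk instead of the population average of Eq. (15.9), and the acceptance test is written for minimization; all names and parameter values are illustrative.

import math, random

def bat_update(x, v, x_best, A, r, r0, t, f, lo, hi,
               f_min=0.0, f_max=2.0, alpha=0.9, beta=0.9):
    freq = f_min + (f_max - f_min) * random.random()
    v = [vi + (xi - xb) * freq for vi, xi, xb in zip(v, x, x_best)]        # Eq. (15.8)
    x_new = [min(max(xi + vi, lo), hi) for xi, vi in zip(x, v)]            # Eq. (15.7)
    if random.random() > r:
        # local random walk around the current best, scaled by the loudness, Eq. (15.9)
        x_new = [min(max(xb + random.uniform(-1, 1) * A, lo), hi) for xb in x_best]
    if random.random() < A and f(x_new) < f(x):
        x = x_new                                                          # accept the move
        A = alpha * A                                                      # Eq. (15.11)
        r = r0 * (1 - math.exp(-beta * t))                                 # Eq. (15.10)
    return x, v, A, r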

15.7 Swarm Intelligence Inspired by Animal Behaviors

15.7.1 Social Spider Optimization

Spiders are air-breathing arthropods having eight legs and chelicerae with fangs.
Most of them detect prey by sensing vibrations on their webs. Some social species,
e.g., Mallos gregalis and Oecobius civitas, live in groups and interact with others in

Algorithm 15.3 (Bat Algorithm).

1. Initialization.
Set t = 1.
Set bat population N P .
Set loudness Ai , pulse frequency f i at x i , pulse rate ri .
Initialize x_i, v_i.
2. Repeat
a. Generate new solutions by adjusting frequency, and updating velocities and locations.
b. for bat i:
if (rand > ri )
Select a solution among the best solutions.
Generate a location solution around the selected best solution.
end if
Generate a new solution by flying randomly.
if (rand < Ai and f (x i ) < f (x ∗ ))
Accept the new solution.
Increase ri and reduce Ai .
end if
end for
c. Rank the bats and find the current best x ∗ .
d. Set t = t + 1.
until termination criterion is met.

the same group. Spiders have accurate senses of vibration. They can separate different
vibrations and sense their respective intensities. The social spiders passively receive
vibrations generated by other spiders on the same web to have a clear view of the
web. The foraging behavior of the social spider can be described as the cooperative
movement of the spiders toward the food source.
Social spider optimization [18] is a swarm algorithm imitating the mating behavior
of social spiders. A group of spiders interact with one another based on the biological
laws of the cooperative colony. The algorithm considers the gender of the spiders.
Depending on gender, each individual is conducted by a set of different evolutionary
operators which mimic different cooperative behaviors that are typically found in
the colony.
Social spider algorithm [92] solves global optimization problems, imitating the
information-sharing foraging strategy of social spiders, utilizing the vibrations on
the spider web to determine the positions of preys. The search space is formulated
as a hyper-dimensional spider web, on which each position represents a feasible
solution. The web also serves as the transmission media of the vibrations generated
by the spiders. Each spider on the web holds a position and the fitness of the solution
is based on the objective function, and represented by the potential of finding a
food source at the position. The spiders can move freely on the web. When a spider
moves to a new position, it generates a vibration which is propagated over the web.

The intensity of the vibration is correlated with the fitness of the position. In this
way, the spiders on the same web share their personal information with others to
form a collective social knowledge.

15.7.2 Fish Swarm Optimization

Artificial fish swarm optimization [39,41] is a population-based optimization technique


inspired by the collective movement of the fish and their social behaviors. It has a
behavior similar to that of PSO. Based on a series of instinctive behaviors, the fish
always try to maintain their colonies. The action of artificial fish occurs only within
the radius of its visual circle. An area with more fish is generally more nutritious.
The algorithm imitates the fish behaviors with local search of fish individual for
reaching the global optimum. It uses a greedy selection method: A fish x i moves to
a new position only if the new position is better. The environment where an artificial
fish lives consists mainly of the solution space and the states of other artificial fish. Its next
behavior depends on its current state and its local environmental state. A fish would
influence the environment via its own activities and its companions’ activities. This
algorithm has a high convergence speed, but it may fall in local optimum and has
high time complexity.
Prey behavior is a basic biological behavior to find food. Swarm, follow, and
move behaviors are basic behaviors to seek food or companions in larger ranges.
Leap behavior can move to new state to avoid local minima. The search process
iteratively performs prey, swarm, follow, move, and leap behaviors, and bookkeeps
the global best solution, until the termination condition is satisfied.
In [30], an improved artificial fish swarm algorithm selects a behavior based on
log-linear model, which is used to implement a multiprobability adaptive model. A
variety of knowledge sources are added to the model in the form of a feature function
to enhance decision-making ability. Adaptive movement behavior based on adaptive
weight dynamically adjusts according to the diversity of fishes. Population inhibition
behavior is introduced to accelerate the convergence speed at later stages. After a
period of evolution, the big fishes will eat small fishes, and the occupied space of
small fishes will be cleared. Population expansion behavior is then applied to restore
diversity.
Fish school search (FSS) [23] is a swarm-based metaheuristic that excels on high-
dimensional multimodal search problems. It uses three operators: feeding, swim-
ming, and breeding. These operators provide automatic selection between explo-
ration and exploitation. FSS is comparable to PSO. FSS needs to specify the step
used in some operators and to evaluate the fitness function twice per fish per iteration.
FSS-II [6] improves FSS: it has high exploitation capability and uses just one fitness
evaluation per fish per iteration.

15.7.3 Krill Herd Algorithm

Krill herd algorithm [24] is a metaheuristic swarm intelligence optimization method,


inspired from the herding behavior of the krill swarms in response to specific bio-
logical and environmental processes.
In krill herd algorithm, the time-dependent position of a krill is formulated by three
main factors: motion induced by the presence of other individuals, foraging motion,
and physical diffusion. The foraging motion of a krill is formulated in terms of the
food location and the previous experience about the food location. The diffusion
of a krill can be considered to be a random process. Only time interval should be
fine-tuned in krill herd algorithm which is a remarkable advantage in comparison
with other nature-inspired algorithms.
The Lagrangian model of the ith krill is given by
dx_i/dt = N_i + F_i + D_i,   (15.12)
where x_i denotes the position of the ith krill, N_i is the motion induced by other krills, F_i is the foraging motion, and D_i is the physical diffusion of the ith krill.
In general, the defined motions frequently change the position of a krill toward
the best fitness. The foraging motion and the motion induced by other krills contain
two global and two local strategies. The motion during the interval t to t + Δt is given by
x_i(t + Δt) = x_i(t) + Δt · dx_i/dt.   (15.13)
The population of krills is then sorted and the krill with best fitness is found. This
procedure is repeated until the termination criterion is satisfied.
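A minimal Python sketch of the position update (15.12)–(15.13) is given below. The induced motion N_i and the foraging motion F_i are assumed to be computed elsewhere by the detailed formulations of [24]; here only the random physical diffusion and the Euler-style step of (15.13) are shown, with illustrative parameter values.

import random

def krill_position_update(x, N_i, F_i, d_max=0.005, dt=0.5):
    # physical diffusion D_i: a small random vector
    D_i = [d_max * random.uniform(-1, 1) for _ in x]
    # dx/dt = N_i + F_i + D_i, Eq. (15.12)
    dxdt = [n + fo + d for n, fo, d in zip(N_i, F_i, D_i)]
    # x(t + dt) = x(t) + dt * dx/dt, Eq. (15.13)
    return [xi + dt * dxi for xi, dxi in zip(x, dxdt)]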
Stud krill herd algorithm [75] introduces stud selection and crossover operator
into krill herd algorithm during the krill updating process. The best krill, i.e., the
stud, provides its optimal information for all the other individuals in the population
using genetic operators instead of stochastic selection. A discrete krill herd algorithm
was proposed for network route optimization in [72].
In krill herd algorithm, a krill is influenced by its neighbors and the optimal krill,
and the sensing distance of each krill is fixed. But in nature, the action of each
krill is free and uncertain. Free search krill herd algorithm [43] introduces into krill
herd algorithm the opposition-based learning strategy for generating more uniform
initial populations and free search operator for simulating the freedom and uncertain
individual behavior of krill herd. Each krill searches according to its own perception
and scope of activities. The free search strategy allows nonzero probability for access
to any location of the search space and highly encourages the individuals to escape
from being trapped in a local optimal solution. When increasing the sensitivity, a
krill will approach the whole population’s current best value, while reducing the
sensitivity, the krill can search around other neighborhood. Free search krill herd
algorithm outperforms PSO, DE, krill herd algorithm, harmony search, free search,
and bat algorithm.

15.7.4 Cockroach-Based Optimization

Cockroaches prefer to concurrently optimize number of friends and level of shelter


darkness. Roach infestation optimization [27] is a metaheuristic optimization method
inspired by the social behavior of cockroaches. As an improvement on PSO, it is based
partly on PSO equations. However, the agents are designed to congregate under
dark shelters, whereas PSO particles are designed to gather food. Roach infestation
optimization uses local best positions to replace the global best position of PSO, and
adds a random search behavior to prevent convergence on local minima. Thus, the
algorithm finds global optima more effectively than PSO.
Hungry behavior is also introduced. At interval of time, when a cockroach is
hungry, it migrates from its comfortable shelter and friends to look for food [27].
Hungry PSO and hungry roach infestation optimization are hungry versions of PSO
and roach infestation optimization.
Roach infestation optimization implements optimization by the three behaviors.

• Find darkness. A roach moves at velocity v_i at position x_i with a recognized personal best (darkest) position x_i^p in search for a comfortable (dark) position.
• Find friends. A roach communicates with roaches near its current position, depending on group parameters, to attain a local best position x_i^l, and searches the hyperrectangle formed by x_i^p and x_i^l in search for an optimally comfortable position.
• Find food. Each roach grows hungry over time and will eventually leave its com-
fortable position and seek a new position b to satisfy its hunger.

Thus, each roach updates its position by
x_i(t + 1) = x_i(t) + v_i(t + 1),   if hunger_i < T_hunger,
x_i(t + 1) = b,   if hunger_i ≥ T_hunger,   (15.14)
where hunger_i is an incremental hunger counter initially determined at random from [0, T_hunger].
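A minimal Python sketch of the hunger-driven position update (15.14) is given below. The PSO-like velocity v_next toward the personal and local best positions is assumed to be computed elsewhere, and the food position b is represented here by a random relocation within the bounds; all names are illustrative.

import random

def roach_position_update(x, v_next, hunger, t_hunger, lo, hi):
    if hunger < t_hunger:
        # not yet hungry: move with the velocity, first case of Eq. (15.14)
        x = [min(max(xi + vi, lo), hi) for xi, vi in zip(x, v_next)]
        hunger += 1                                   # hunger grows over time
    else:
        # hungry: relocate to a food position b, second case of Eq. (15.14)
        x = [random.uniform(lo, hi) for _ in x]
        hunger = 0
    return x, hunger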
Cockroach swarm optimization [14,15] mimics chase swarming, dispersion, and
ruthless social behavior of cockroaches. In [59], hunger component is introduced to
cockroach swarm optimization to prevent local optimum and enhance diversity of
population. The algorithm is somewhat like PSO. The improved algorithm outper-
forms roach infestation optimization, and hungry roach infestation optimization.
Cockroach swarm optimization is based on four behaviors. The algorithm executes
initialization, and then execute a loop of find personal bests and global best, chase-
swarming behavior, hunger behavior, dispersion behavior, and ruthless behavior,
until the stopping criterion is satisfied.
The position x i of a cockroach corresponds to a solution. For chase-swarming
behavior, a cockroach x i is attracted by its personal best within the visual range x i∗ , or
by the global best x g . Dispersion behavior is characterized by adding a random vector
to the individuals. Ruthless behavior sets an individual to the global best position x_g.
Hunger behavior is modeled using a partial differential equation. A threshold hunger
is defined. When a cockroach reaches threshold hunger, it migrates to food source
x f ood within the search space.

15.7.5 Seven-Spot Ladybird Optimization

The seven-spot ladybird, Coccinella septempunctata, is a common insect. Seven-spot


ladybirds are effective predators of aphids and other homopteran pests. Seven-spot
ladybirds use different kinds of pheromones at different stages of their life, such as
eggs, larvae, pupa, and adult stages. Seven-spot ladybird optimization [76] is a
metaheuristic algorithm inspired by the foraging behavior of a seven-spot ladybird.
Seven-spot ladybird optimization is somewhat similar to PSO, which uses lbest
and gbest for search. By dividing the space into patches, the algorithm can search in
intensive and extensive modes. Movement between prey within aggregates of aphids
is referred to as intensive search which is slow. Movement between aggregates within
a patch is referred to as extensive search which is relatively linear and fast. Movement
between patches is called dispersal and movement from patches to hibernation is
called migration. Seven-spot ladybirds locate their prey via extensive search and
then switch to intensive search after feeding. While searching for its prey, a seven-
spot ladybird holds its antennae parallel to its searching substratum and its maxillary
palpi perpendicular to the substratum. The ladybird vibrates its maxillary palpi and
turns its head from side to side. The sideward vibration can increase the area wherein
the prey may be located.

15.7.6 Monkey-Inspired Optimization

Monkey search [57] is a metaheuristic optimization algorithm inspired by the behav-


ior of a monkey climbing trees for food. The tree branches are represented as pertur-
bations between two neighboring feasible solutions. The monkey marks and updates
these branches leading to good solutions as it climbs up and down the tree. A wide
selection of perturbations can be applied based on other metaheuristic methods for
global optimization.
Spider monkey optimization [5] is a numerical optimization approach inspired by
intelligent foraging behavior of fission–fusion social structure based animals such
as spider monkeys. The animals which follow fission–fusion social systems, split
themselves from large to smaller groups and vice versa based on the scarcity or
availability of food.
Monkey king EA [52] is a memetic EA for global optimization. It outperforms
PSO variants on robustness, optimization accuracy, and convergence speed on BBOB
and CEC benchmark functions.

Monkey Algorithm
Monkey algorithm [94] is a swarm intelligent algorithm. It was put forward for solv-
ing large-scale, multimodal optimization problems. The method derives from the sim-
ulation of mountain-climbing processes of monkeys. It consists of three processes:
climb process, watch–jump process, and somersault process. In the original monkey
algorithm, the time consumed mainly lies in using the climb process to search local
optimal solutions.

The climb process is a step-by-step procedure to change the monkeys’ position


from the initial positions to new ones that can make an improvement in the objective
function. The climb process uses the pseudo-gradient-based simultaneous perturba-
tion stochastic approximation. The calculation of the pseudo-gradient of the objective
function only requires two measurements of the objective function regardless of the
dimension of the optimization problem.
After the climb process, each monkey arrives at its own mountaintop and enters the
watch–jump process. It then takes a look and determines whether there are other
points around it that are higher than the current one. If yes, it will jump from the
current position and then repeat the climb process until it reaches the top of the
mountain.
The purpose of the somersault process is to make the monkeys find new search domains and avoid being trapped in local search. After repetitions of the climb process and watch–jump process, each monkey will find a locally maximal mountaintop around its initial point. In order to find a much higher mountaintop, it
is natural for each monkey to somersault to a new search domain. In the original
monkey algorithm, the monkeys will somersault along the direction pointing to the pivot, which is the barycenter of all monkeys' current positions.

15.7.7 Migrating-Based Algorithms

Self-organizing migrating algorithm [93] is a stochastic population-based optimiza-


tion algorithm that is modeled on the social behavior of cooperating individuals.
It has the ability to converge toward the global optimum. It works in loops called
migration loops. The population is randomly distributed over the search space at
the beginning of the search. In each loop, the population is evaluated and the solu-
tion with the highest fitness becomes the leader L. Apart from the leader, in one
migration loop, all individuals will traverse the input space in the direction of the
leader. Mutation ensures the diversity among the individuals and it also provides the
means to restore lost information in a population. A parameter called PRT is used to
achieve perturbation. The PRT vector defines the final movement of an active indi-
vidual in search space. The randomly generated binary perturbation vector controls
the allowed dimensions for an individual.
Differential search algorithm [16] is a metaheuristic search technique that mimics
the migration behavior of living beings, which move away from a habitat having
low food capacity toward habitat having more food capacity. The migration process
entails the Brownian-like random walk movement of a large number of individu-
als comprising a superorganism. Once the superorganism finds new fruitful habitat
named as stopover site, it settles in the new habitat for the time being and contin-
ues its migration toward more fruitful habitats. The algorithm starts by generating

individuals of the respective optimization problem corresponding to an artificial superorganism. Thereafter, the artificial superorganism tries to migrate from its current position to the global minimum. It simulates a superorganism migrating between two stopover sites. The algorithm has unique mutation and crossover operators, and only two control parameters that are used for controlling the movement of superorganisms.

15.7.8 Other Methods

Society and civilization algorithm [67] is a stochastic optimization algorithm designed
for single-objective constrained optimization problems, using intra- and intersociety
interactions of animal societies, e.g., human and social insect societies.
Gray wolf optimizer [56] is inspired by gray wolves (Canis lupus). It mimics the
leadership hierarchy and hunting mechanism of gray wolves in nature. Four types of
gray wolves are employed for simulating the leadership hierarchy. The three main
steps of hunting, namely, searching for prey, encircling prey, and attacking prey,
are implemented. The method is competitive with gravitational search algorithm
and DE.
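For reference, the encircling and attacking behavior of gray wolf optimizer is usually written in terms of the three best wolves (alpha, beta, delta) and a coefficient a that decreases from 2 to 0 over the iterations. The Python sketch below follows that commonly cited formulation; treat it as an illustration rather than a reproduction of the code of [56].

import numpy as np

def gwo_update(x, alpha, beta, delta, a):
    """One grey wolf position update, a minimal sketch of the standard equations.

    x                  : position of the wolf being updated (1-D array)
    alpha, beta, delta : positions of the three best wolves found so far
    a                  : control scalar, usually decreased linearly from 2 to 0
    """
    def move_toward(leader):
        A = 2.0 * a * np.random.rand(x.size) - a   # A in [-a, a]
        C = 2.0 * np.random.rand(x.size)           # C in [0, 2]
        D = np.abs(C * leader - x)                 # encircling distance
        return leader - A * D
    # The average of the three leader-driven moves encodes the social hierarchy.
    return (move_toward(alpha) + move_toward(beta) + move_toward(delta)) / 3.0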
Dog group wild chase and hunt drive algorithm is a metaheuristic simulating
intelligent chasing and hunting method adopted by the dogs to chase and hunt their
prey in groups [11].
Bird mating optimizer [4] is a metaheuristic optimization algorithm inspired by
evolution of bird species and the intelligent behavior of birds during mating season.
Raven roosting optimization algorithm [10] is inspired from the social roosting
and foraging behavior of one species of bird, the common raven. Some species of
birds and bats engage in social roosting. These roosts can serve as information centers
to spread knowledge concerning the location of food resources in the environment.
Dolphin partner optimization [70] is a metaheuristic inspired by the clustering
behavior of dolphins. It predicts the best position according to the positions and
fitness of the team members.
Echolocation is the biological sonar used by dolphins and several kinds of other
animals for navigation and hunting in various environments. Dolphin echolocation
optimization [34] mimics this echolocation ability of dolphins. It has few parameters
to set. Likewise, electrolocation is a location technique based on the electric wave
propagation. Fish electrolocation optimization [25] is a metaheuristic optimization
method that mixes the foraging behaviors based on the active electrolocation of
elephant nose fish and on the passive electrolocation of shark.
Fruit fly optimization algorithm [61] is a simple and robust swarm optimization
algorithm inspired by the foraging behavior of fruit flies. In [80], the randomness
and fuzziness of the foraging behavior of the fruit fly swarm are described by the normal
cloud model to improve the convergence and the global search ability of fruit fly opti-
mization algorithm. Bimodal adaptive fruit fly optimization algorithm [81] divides
the population into the search and capture groups, and uses normal cloud learning
and an adaptive parameter updating strategy. The search group is mainly based on
the fruit fly’s olfactory sensors to find possible global optima in a large range, while
the capture group makes use of its keen vision to exploit the neighborhood of the
current best food source found by the search group. The randomness and fuzziness
of the foraging behavior of fruit fly swarm during the olfactory phase are described
by a normal cloud model. The algorithm outperforms, or performs similarly to, PSO
and DE.
Antlion optimizer (http://www.alimirjalili.com/ALO.html) [54] is a population-
based global optimization metaheuristic that mimics the hunting mechanism of
antlions in nature. Five main steps of hunting prey, namely the random walk of
ants, building traps, entrapment of ants in traps, catching prey, and rebuilding
traps, are implemented.
Moths fly at night in search of food by maintaining a fixed angle with respect
to the moon, a very effective mechanism called transverse orientation for traveling
in a straight line over long distances. However, these insects can be trapped in a
useless/deadly spiral path around artificial lights. Moth flame optimization (http://
www.alimirjalili.com/MFO.html) [55] is a population-based metaheuristic optimiza-
tion method inspired by the navigation strategy of moths.

15.8 Plant-Based Metaheuristics

Invasive Weed Optimization


Invasive weed optimization [51] is a metaheuristic optimization method inspired
by the natural principles and behaviors of weed invasion and colonization in a
shifting and turbulent environment. The method has been extended for multiobjective
optimization problems [38].
The algorithm has four steps, namely, initialization, reproduction, spatial dispersal,
and competitive exclusion. First, a population of solutions is initialized and
dispersed in the search space uniformly and randomly. Then, each individual is per-
mitted to reproduce seeds according to its own fitness, the colony’s lowest and highest
fitness. The fitness of each individual is normalized and the number of seeds that
each individual reproduces lies between given minimum and maximum values and increases
linearly with fitness. Next, offspring are randomly distributed over the search space by normally
distributed random numbers with mean equal to zero but varying variance. Through
this, a group of offspring are produced around their parent individual and thus weed
colony is formed to enhance the search ability. Furthermore, standard deviation of
the normally distributed random function will be reduced from a predefined initial
value to a small final value over every generation. Finally, with the growth and repro-
duction of weeds, after several generations, the number of weeds in a colony will
reach its maximum. Exclusion mechanism is applied to eliminate weeds with low
fitness and select good weeds that reproduce more than undesirable ones. These steps
are repeated until termination criterion is reached.
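A compact Python sketch of the reproduction and spatial dispersal steps is given below. The seed-count bounds, the nonlinear modulation index n, and the initial and final standard deviations are assumed typical values; the competitive exclusion step, which truncates the colony to its maximum size by fitness, is omitted for brevity.

import numpy as np

def iwo_reproduce(population, fitness, it, max_it,
                  s_min=0, s_max=5, sigma_init=1.0, sigma_final=0.01, n=3):
    """One reproduction/dispersal step of invasive weed optimization, a sketch.

    population : (N, d) array of weeds; fitness : length-N array (minimization)
    it, max_it : current and maximum generation index
    s_min, s_max, sigma_init, sigma_final, n : assumed typical parameter values
    """
    worst, best = fitness.max(), fitness.min()
    # Dispersal spread shrinks from sigma_init to sigma_final over the generations.
    sigma = ((max_it - it) / max_it) ** n * (sigma_init - sigma_final) + sigma_final
    offspring = []
    for weed, fit in zip(population, fitness):
        # Seed count increases linearly from s_min (worst weed) to s_max (best weed).
        ratio = (worst - fit) / (worst - best + 1e-12)
        n_seeds = int(round(s_min + (s_max - s_min) * ratio))
        for _ in range(n_seeds):
            offspring.append(weed + sigma * np.random.randn(weed.size))
    return np.array(offspring)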

Flower Pollination Algorithm


Flower pollination algorithm [86] is a population-based metaheuristic optimization
algorithm simulating the pollination behavior of flowering plants. Pollen dispersion
may occur by wind or other ballistic means.
Flower pollination algorithm can efficiently combine global and local searches,
inspired by the cross-pollination and self-pollination of flowering plants, respectively.
It uses Levy flights instead of standard Gaussian random walks. The algorithm
maintains a population of flowers/pollens. In global pollination, a solution is moved
toward the current global best solution with a step size generated from a Levy
distribution, whereas local pollination is performed by combining any two pollens in
the population. The best pollens are kept in the population in each generation.
On a set of benchmark functions, flower pollination algorithm has been shown to
outperform both GA and PSO in solution quality and convergence rate [86]. It has
been extended to multiobjective optimization problems [90] and has demonstrated
very good efficiency in solving them [91].
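A minimal Python sketch of one generation is given below, with global pollination using a Levy step generated by Mantegna's method and local pollination combining two randomly chosen flowers. The switch probability p = 0.8 and the greedy replacement rule are assumptions commonly used with the algorithm, not requirements of [86].

import numpy as np
from math import gamma, sin, pi

def levy_step(d, lam=1.5):
    """Levy-distributed step via Mantegna's algorithm (lam is the Levy exponent)."""
    sigma = (gamma(1 + lam) * sin(pi * lam / 2) /
             (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = np.random.randn(d) * sigma
    v = np.random.randn(d)
    return u / np.abs(v) ** (1 / lam)

def fpa_generation(pop, fitness, best, f, p=0.8):
    """One flower pollination generation, a minimal sketch (minimization)."""
    n, d = pop.shape
    for i in range(n):
        if np.random.rand() < p:   # global pollination via Levy flight toward best
            candidate = pop[i] + levy_step(d) * (best - pop[i])
        else:                      # local pollination with two random flowers
            j, k = np.random.randint(n, size=2)
            candidate = pop[i] + np.random.rand() * (pop[j] - pop[k])
        val = f(candidate)
        if val < fitness[i]:       # greedy replacement
            pop[i], fitness[i] = candidate, val
    best = pop[np.argmin(fitness)].copy()
    return pop, fitness, best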
Plant Propagation Algorithm
Plants rely heavily on the dispersion of their seeds to colonize new territories and to
improve their survival. Plants have evolved a variety of ways to propagate. Propa-
gation through seeds is perhaps the most common of them all and one which takes
advantage of all sorts of agents ranging from wind to water, birds, and animals. The
strawberry plant uses both runners and seeds to propagate. Because of the periodic
nature of fruit and seed production, it amounts to setting up a feeding station for the
attention of potential seed-dispersing agents. The same applies to birds and animals
visiting and feeding on ripe fruit produced by plants such as the strawberry plant.
Modeling it as a queuing process results in a seed-based optimisation algorithm.
Plant propagation algorithm [68,71], also known as strawberry algorithm, is based
on the way the strawberry plant propagates using runners. In the case of the straw-
berry plant, given the way the seeds stick to the surface of the fruit, dispersion by
wind or mechanical means is very limited. Animals and birds are the ideal agents for
dispersion. Seed-based plant propagation algorithm is entirely based on the propaga-
tion by seeds of the strawberry plant. It follows the principle that plants in good spots
with plenty of nutrients will send many short runners. They send few long runners
when in nutrient poor spots. With long runners plant propagation algorithm tries to
explore the search space while short runners enable it to exploit the solution space
well.

Other Plant Methods


Artificial plant optimization algorithm (APOA) [12] is a methodology that maps
the growing process of a tree into an optimization problem. The method designs
three operators, namely, photosynthesis operator, phototropism operator and apical
dominance operator, to simulate three important phenomena. The light-response
curve of the photosynthesis operator can be selected as a rectangular hyperbolic
model, and a parabola model performs even better.
Plants have to adapt to environmental changes and adopt new techniques to defend
themselves from natural predators (herbivores). In [13], an optimization algorithm
inspired by the self-defense mechanisms of plants is presented based on a
predator–prey model, where two populations are maintained and the objective is to
keep a balance between them.
Runner-root algorithm [53] is a metaheuristic inspired by the function of runners
and roots of some plants in nature. The plants which are propagated through runners
look for water resources and minerals by developing runners and roots (as well as root
hairs). Runners help the plant to search far around with random big steps, while roots
are appropriate for searching nearby with small steps. Moreover, the plant which is placed
at a good location by chance spreads in a larger area through its longer runners and
roots. Runner-root algorithm has two means for exploration: random jumps with big
steps and a reinitialization strategy in case of trapping in local optima. Exploitation
is performed by the roots and root hairs which respectively apply random large and
small changes to the variables of the best computational agent separately (in case of
stagnation).
Artificial root foraging optimization [48] is a metaheuristic inspired by plant root
foraging behaviors. It mimics the adaptation and randomness of plant root foraging
behaviors, e.g., branching, regrowing, and tropisms.

15.9 Other Swarm Intelligence-Based Metaheuristics

The idea underlying all swarm intelligence algorithms is similar. Shuffled frog leap-
ing algorithm, group search optimizer, firefly algorithm, ABC and the gravitational
search algorithm are all algorithmically identical to PSO under certain conditions
[47]. However, their implementation details result in notably different performance
levels.
More and more emerging computational paradigms are inspired by the metaphor
of nature. This section gives an introduction to some of them.

Amorphous Computing
Amorphous computing [1,64,79] presents a computational paradigm that consists
of a set of tiny, independent and self-powered processors or robots that can com-
municate wirelessly over a limited distance. Such systems can also be compared
to so-called population protocols [3], whose underlying model considers anonymous
finite-state agents that compute a predicate of the multiset of their inputs via
two-way or one-way interactions in the all-pairs family of communication networks.

Stochastic Diffusion Search


Stochastic diffusion search [9,58], as another swarm intelligence method, is a
population-based pattern-matching algorithm. The agents perform cheap, partial
evaluations of a hypothesis (a candidate solution) to the search problem. Diffusion of
information is implemented through direct one-to-one communication. High-quality
solutions can be identified from clusters of agents with the same hypothesis. In [65],
the proposed diffusion dynamics uses a spatially inhomogeneous diffusion coeffi-
cient. By appropriately constructing the inhomogeneous diffusion, one can improve
the speed of convergence of the overall dynamics to the stationary distribution. The
stationary Gibbs distribution of the introduced dynamics is identical to that of the
homogeneous diffusion. Adapting the diffusion coefficient to the Hamiltonian allows
the dynamics to escape wide local minima and speeds up its convergence to the
global minima.
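The test and diffusion phases can be sketched in a few lines of Python. The example assumes hashable hypotheses (here, candidate offsets of a pattern in a text) and a cheap randomized partial test; it is illustrative rather than a faithful reproduction of [9,58].

import random

def sds(partial_test, random_hypothesis, n_agents=50, iterations=100):
    """A minimal stochastic diffusion search sketch.

    partial_test(h)     : cheap, randomized partial evaluation; returns True/False
    random_hypothesis() : draws a random candidate solution
    """
    hypotheses = [random_hypothesis() for _ in range(n_agents)]
    for _ in range(iterations):
        # Test phase: each agent partially evaluates its hypothesis.
        active = [partial_test(h) for h in hypotheses]
        # Diffusion phase: inactive agents poll a random agent.
        for i in range(n_agents):
            if not active[i]:
                j = random.randrange(n_agents)
                hypotheses[i] = hypotheses[j] if active[j] else random_hypothesis()
    # Clusters of agents sharing a hypothesis indicate good solutions.
    return max(set(hypotheses), key=hypotheses.count)

# Example: locate "pattern" in a text; hypotheses are candidate offsets.
text, model = "xxpatternxx", "pattern"
def test(h):
    k = random.randrange(len(model))          # compare one random character only
    return text[h + k] == model[k]
print(sds(test, lambda: random.randrange(len(text) - len(model) + 1)))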

Hyper-Spherical Search
Hyper-spherical search [33] is a population-based metaheuristic. Population indi-
viduals are particles and hyper-sphere centers that all together form particle sets.
Searching the hyper-sphere inner space made by the hyper-sphere center and its
particle is the basis of the algorithm. The algorithm hopefully converges to a state
at which there exists only one hyper-sphere center, and its particles are at the same
position and have the same cost function value as the hyper-sphere center.
Weighted superposition attraction algorithm [8] is a swarm-based metaheuristic for
global optimization based on the superposition principle in combination with the
attracted movement of agents, both of which are observable in many systems. It
attempts to model and simulate the dynamically changing superposition that arises
from the dynamic nature of the system, together with the attracted movement of
agents.

Problems

15.1 Give some specific conditions under which the firefly algorithm can be con-
sidered as a special case of the PSO algorithm.
15.2 Run the accompanying MATLAB code of firefly algorithm to find the global
minimum of Schwefel function in the Appendix. Investigate how to improve
the result by adjusting the parameters.
15.3 Run the accompanying MATLAB code of bat algorithm to find the global
minimum of Griewank function in the Appendix. Understand the principle
of the algorithm.
15.4 Run the accompanying MATLAB code of gray wolf optimizer to find the
global minimum of Schwefel function in the Appendix. Compare its performance
with that of the algorithms in Problems 15.2 and 15.3.
15.5 Run the accompanying MATLAB code of collective animal behavior algo-
rithm to find the global minimum of Griewank function in the Appendix.
Understand the principle of the algorithm.
15.6 Run the accompanying MATLAB code of differential search algorithm to
find the global minimum of Griewank function in the Appendix. Understand
the principle of the algorithm.
15.7 Run the accompanying MATLAB code of ant lion optimizer to find the global
minimum of Michalewicz function in the Appendix. Understand the principle
of the algorithm.

References
1. Abelson H, Allen D, Coore D, Hanson C, Homsy G, Knight TF Jr, Nagpal R, Rauch E, Sussman GJ, Weiss R. Amorphous computing. Commun ACM. 2000;43(5):74–82.
2. Al-Madi N, Aljarah I, Ludwig SA. Parallel glowworm swarm optimization clustering algo-
rithm based on MapReduce. In: Proceedings of IEEE symposium on swarm intelligence (SIS),
Orlando, FL, December 2014. p. 1–8.
3. Angluin D, Aspnes J, Eisenstat D, Ruppert E. The computational power of population protocols.
Distrib Comput. 2007;20(4):279–304.
4. Askarzadeh A, Rezazadeh A. A new heuristic optimization algorithm for modeling of proton
exchange membrane fuel cell: bird mating optimizer. Int J Energ Res. 2013;37(10):1196–204.
5. Bansal JC, Sharma H, Jadon SS, Clerc M. Spider monkey optimization algorithm for numerical
optimization. Memetic Comput. 2014;6(1):31–47.
6. Bastos-Filho CJA, Nascimento DO. An enhanced fish school search algorithm. In: Proceed-
ings of 2013 BRICS congress on computational intelligence and 11th Brazilian congress on
computational intelligence, Ipojuca, Brazil, September 2013. p. 152–157.
7. Bates ME, Simmons JA, Zorikov TV. Bats use echo harmonic structure to distinguish their
targets from background clutter. Science. 2011;333(6042):627–30.
8. Baykasoglu A, Akpinar S. Weighted Superposition Attraction (WSA): a swarm intelligence
algorithm for optimization problems - part 1: unconstrained optimization; part 2: constrained
optimization. Appl Soft Comput. 2015;37:396–415.
9. Bishop JM. Stochastic searching networks. Proceedings of IEE conference on artificial neural
networks, London, UK, October 1989. p. 329–331.
10. Brabazon A, Cui W, O’Neill M. The raven roosting optimisation algorithm. Soft Comput.
2016;20(2):525–45.
11. Buttar AS, Goel AK, Kumar S. Evolving novel algorithm based on intellectual behavior of
wild dog group as optimizer. In: Proceedings of IEEE symposium on swarm intelligence (SIS),
Orlando, FL, December 2014. p. 1–7.
12. Cai X, Fan S, Tan Y. Light responsive curve selection for photosynthesis operator of APOA.
Int J Bio-Inspired Comput. 2012;4(6):373–9.
13. Caraveo C, Valdez F, Castillo O. A new bio-inspired optimization algorithm based on the self-
defense mechanisms of plants. In: Design of intelligent systems based on fuzzy logic, neural
networks and nature-inspired optimization, vol. 601 of studies in computational intelligence.
Berlin: Springer; 2015. p. 211–218.
14. Chen Z. A modified cockroach swarm optimization. Energ Procedia. 2011;11:4–9.
15. Chen Z, Tang H. Cockroach swarm optimization. In: Proceedings of the 2nd international
conference on computer engineering and technology (ICCET’10). April 2010, vol. 6. p. 652–
655.
16. Civicioglu P. Transforming geocentric cartesian coordinates to geodetic coordinates by using
differential search algorithm. Comput Geosci. 2012;46:229–47.

17. Cuevas E, Gonzalez M. An optimization algorithm for multimodal functions inspired by col-
lective animal behavior. Soft Comput. 2013;17:489–502.
18. Cuevas E, Cienfuegos M, Zaldvar D, Prez-Cisneros M. A swarm optimization algorithm
inspired in the behavior of the social-spider. Expert Syst Appl. 2013;40(16):6374–84.
19. Cuevas E, Reyna-Orta A. A cuckoo search algorithm for multimodal optimization. Sci World
J. 2014;2014:20. Article ID 497514.
20. Elbeltagi E, Hegazy T, Grierson D. Comparison among five evolutionary-based optimization
algorithms. Adv Eng Inf. 2005;19(1):43–53.
21. Eusuff MM, Lansey KE. Optimization of water distribution network design using the shuffled
frog leaping algorithm. J Water Resour Plan Manage. 2003;129(3):210–25.
22. Eusuff MM, Lansey K, Pasha F. Shuffled frog-leaping algorithm: a memetic meta-heuristic for
discrete optimization. Eng Optim. 2006;38(2):129–54.
23. Filho C, de Lima Neto FB, Lins AJCC, Nascimento AIS, Lima MP. A novel search algorithm
based on fish school behavior. In: Proceedings of IEEE international conference on systems,
man and cybernetics, Singapore, October 2008. p. 2646–2651.
24. Gandomi AH, Alavi AH. Krill herd: A new bio-inspired optimization algorithm. Commun
Nonlinear Sci Numer Simul. 2012;17(12):4831–45.
25. Haldar V, Chakraborty N. A novel evolutionary technique based on electrolocation principle of elephant nose fish and shark: fish electrolocation optimization. Soft Comput. 2016:22. doi:10.1007/s00500-016-2033-1.
26. Hassanzadeh T, Kanan HR. Fuzzy FA: a modified firefly algorithm. Appl Artif Intell.
2014;28:47–65.
27. Havens TC, Spain CJ, Salmon NG, Keller JM. Roach infestation optimization. In: Proceedings
of the IEEE swarm intelligence symposium, St. Louis, MO, USA, September 2008. p. 1–7.
28. He S, Wu QH, Saunders JR. A novel group search optimizer inspired by animal behavioral
ecology. In: Proceedings of IEEE congress on evolutionary computation (CEC), Vancouver,
BC, Canada, July 2006. p. 1272–1278.
29. He S, Wu QH, Saunders JR. Group search optimizer: an optimization algorithm inspired by
animal searching behavior. IEEE Trans Evol Comput. 2009;13(5):973–90.
30. Huang Z, Chen Y. Log-linear model based behavior selection method for artificial fish swarm
algorithm. Comput Intell Neurosci. 2015;2015:10. Article ID 685404.
31. Jayakumar N, Venkatesh P. Glowworm swarm optimization algorithm with TOPSIS for solving multiple objective environmental economic dispatch problem. Appl Soft Comput. 2014;23:375–86.
32. Jordehi AR. Chaotic bat swarm optimisation (CBSO). Appl Soft Comput. 2015;26:523–30.
33. Karami H, Sanjari MJ, Gharehpetian GB. Hyper-spherical search (HSS) algorithm: a novel
meta-heuristic algorithm to optimize nonlinear functions. Neural Comput Appl. 2014;25:1455–
65.
34. Kaveh A, Farhoudi N. A new optimization method: dolphin echolocation. Adv Eng Softw.
2013;59:53–70.
35. Krishnanand KN, Ghose D. Detection of multiple source locations using a glowworm metaphor
with applications to collective robotics. In: Proceedings of IEEE swarm intelligence sympo-
sium, 2005. p. 84–91.
36. Krishnanand KN, Ghose D. Theoretical foundations for rendezvous of glowworm-inspired
agent swarms at multiple locations. Robot Auton Syst. 2008;56(7):549–69.
37. Krishnanand KN, Ghose D. Glowworm swarm optimization for simultaneous capture of mul-
tiple local optima of multimodal functions. Swarm Intell. 2009;3:87–124.
38. Kundu D, Suresh K, Ghosh S, Das S, Panigrahi BK, Das S. Multi-objective optimization with
artificial weed colonies. Inf Sci. 2011;181(12):2441–54.
39. Li XL, Lu F, Tian GH, Qian JX. Applications of artificial fish school algorithm in combinatorial
optimization problems. Chin J Shandong Univ (Eng Sci). 2004;34(5):65–7.

40. Li X, Luo J, Chen M-R, Wang N. An improved shuffled frog-leaping algorithm with extremal
optimisation for continuous optimisation. Inf Sci. 2012;192:143–51.
41. Li XL, Shao ZJ, Qian JX. An optimizing method based on autonomous animals: fish-swarm
algorithm. Syst Eng—Theory Pract. 2002;22(11):32–8.
42. Li X, Zhang J, Yin M. Animal migration optimization: an optimization algorithm inspired by
animal migration behavior. Neural Comput Appl. 2014;24:1867–77.
43. Li L, Zhou Y, Xie J. A free search krill herd algorithm for functions optimization. Math Probl
Eng. 2014;2014:21. Article ID 936374.
44. Linhares A. Synthesizing a predatory search strategy for VLSI layouts. IEEE Trans Evol
Comput. 1999;3(2):147–52.
45. Lukasik S, Zak S. Firefly algorithm for continuous constrained optimization tasks. In: Proceed-
ings of the 1st international conference on computational collective intelligence: Semantic web,
social networks and multiagent systems, Wroclaw, Poland, October 2009. p. 97–106.
46. Luo Q, Zhou Y, Xie J, Ma M, Li L. Discrete bat algorithm for optimal problem of permutation
flow shop scheduling. Sci World J. 2014;2014:15. Article ID 630280.
47. Ma H, Ye S, Simon D, Fei M. Conceptual and numerical comparisons of swarm intelligence
optimization algorithms. Soft Comput. 2016:1–20. doi:10.1007/s00500-015-1993-x.
48. Ma L, Zhu Y, Liu Y, Tian L, Chen H. A novel bionic algorithm inspired by plant root foraging
behaviors. Appl Soft Comput. 2015;37:95–113.
49. Mahmoudi S, Lotfi S. Modified cuckoo optimization algorithm (MCOA) to solve graph coloring
problem. Appl Soft Comput. 2015;33:48–64.
50. Martinez-Garcia FJ, Moreno-Perez JA. Jumping frogs optimization: a new swarm method for
discrete optimization. Technical Report DEIOC 3/2008. Spain: Universidad de La Laguna;
2008.
51. Mehrabian AR, Lucas C. A novel numerical optimization algorithm inspired from weed colo-
nization. Ecol Inf. 2006;1:355–66.
52. Meng Z, Pan J-S. Monkey king evolution: a new memetic evolutionary algorithm and its
application in vehicle fuel consumption optimization. Knowl.-Based Syst. 2016;97:144–57.
53. Merrikh-Bayat F. The runner-root algorithm: a metaheuristic for solving unimodal and mul-
timodal optimization problems inspired by runners and roots of plants in nature. Appl Soft
Comput. 2015;33:292–303.
54. Mirjalili S. The ant lion optimizer. Adv Eng Softw. 2015;83:80–98.
55. Mirjalili S. Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm.
Knowl-Based Syst. 2015;89:228–49.
56. Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Adv Eng Softw. 2014;69:46–61.
57. Mucherino A, Seref O. Monkey search: a novel metaheuristic search for global optimization. In: AIP conference proceedings 953: Data mining, systems analysis and optimization in biomedicine, Gainesville, FL, USA, March 2007. New York: American Institute of Physics; 2007. p. 162–173.
58. Nasuto SJ, Bishop JM. Convergence analysis of stochastic diffusion search. Parallel Algorithms
Appl. 1999;14:89–107.
59. Obagbuwa IC, Adewumi AO. An improved cockroach swarm optimization. Sci World J. 2014;2014:13. Article ID 375358.
60. Osaba E, Yang X-S, Diaz F, Lopez-Garcia P, Carballedo R. An improved discrete bat algorithm
for symmetric and asymmetric traveling salesman problems. Eng Appl Artif Intell. 2016;48:59–
71.
61. Pan W-T. A new fruit fly optimization algorithm: taking the financial distress model as an
example. Knowl-Based Syst. 2012;26:69–74.
62. Pavlyukevich I. Levy flights, non-local search and simulated annealing. J Comput Phys.
2007;226(2):1830–44.
63. Penev K, Littlefair G. Free search-a comparative analysis. Inf Sci. 2005;172:173–93.

64. Petru L, Wiedermann J. A universal flying amorphous computer. In: Proceedings of the 10th
International conference on unconventional computation (UC’2011), Turku, Finland, June
2011. p. 189–200.
65. Poliannikov OV, Zhizhina E, Krim H. Global optimization by adapted diffusion. IEEE Trans
Sig Process. 2010;58(12):6119–25.
66. Rajabioun R. Cuckoo optimization algorithm. Appl Soft Comput. 2011;11(8):5508–18.
67. Ray T, Liew KM. Society and civilization: an optimization algorithm based on the simulation
of social behavior. IEEE Trans Evol Comput. 2003;7(4):386–96.
68. Salhi A, Fraga ES. Nature-inspired optimisation approaches and the new plant propagation
algorithm. In: Proceedings of the international conference on numerical analysis and optimiza-
tion (ICeMATH’11), Yogyakarta, Indonesia, June 2011. p. K2-1–K2-8.
69. Sayadia MK, Ramezaniana R, Ghaffari-Nasab N. A discrete firefly meta-heuristic with local
search for makespan minimization in permutation flow shop scheduling problems. Int J Ind
Eng Comput. 2010;1(1):1–10.
70. Shiqin Y, Jianjun J, Guangxing Y. A dolphin partner optimization. In: Proceedings of IEEE
WRI global congress on intelligent systems, Xiamen, China, May 2009, vol. 1. p. 124–128.
71. Sulaiman M, Salhi A. A seed-based plant propagation algorithm: the feeding station model.
Sci World J. 2015;2015:16. Article ID 904364.
72. Sur C. Discrete krill herd algorithm—a bio-inspired metaheuristics for graph based network
route optimization. In: Natarajan R, editor. Distributed computing and internet technology, vol.
8337 of Lecture notes in computer science. Berlin: Springer; 2014. p. 152–163.
73. Tuba M, Subotic M, Stanarevic N. Modified cuckoo search algorithm for unconstrained opti-
mization problems. In: Proceedings of the european computing conference (ECC), Paris,
France, April 2011. p. 263–268.
74. Tuba M, Subotic M, Stanarevic N. Performance of a modified cuckoo search algorithm for
unconstrained optimization problems. WSEAS Trans Syst. 2012;11(2):62–74.
75. Wang G-G, Gandomi AH, Alavi AH. Stud krill herd algorithm. Neurocomputing.
2014;128:363–70.
76. Wang P, Zhu Z, Huang S. Seven-spot ladybird optimization: a novel and efficient metaheuristic
algorithm for numerical optimization. Sci World J. 2013;2013:11. Article ID 378515.
77. Walton S, Hassan O, Morgan K, Brown M. Modified cuckoo search: a new gradient free
optimisation algorithm. J Chaos, Solitons Fractals. 2011;44(9):710–8.
78. Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–
2.
79. Wiedermann J, Petru L. On the universal computing power of amorphous computing systems.
Theor Comput Syst. 2009;46(4):995–1010.
80. Wu L, Zuo C, Zhang H. A cloud model based fruit fly optimization algorithm. Knowl-Based
Syst. 2015;89:603–17.
81. Wu L, Zuo C, Zhang H, Liu Z. Bimodal fruit fly optimization algorithm based on cloud model
learning. Soft Comput. 2016:17. doi:10.1007/s00500-015-1890-3.
82. Yan X, Yang W, Shi H. A group search optimization based on improved small world and its application on neural network training in ammonia synthesis. Neurocomputing. 2012;97:94–107.
83. Yang XS. Firefly algorithms for multimodal optimization. In: Proceedings of the 5th inter-
national symposium on stochastic algorithms: Foundations and applications, SAGA 2009,
Sapporo, Japan, October 2009. p. 169–178.
84. Yang X-S. A new metaheuristic bat-inspired algorithm. In: González JR, Pelta DA, Cruz C, Terrazas G, Krasnogor N, editors. Nature inspired cooperative strategies for optimization (NICSO 2010), vol. 284 of Studies in computational intelligence. Berlin, Germany: Springer; 2010. p. 65–74.
85. Yang X-S. Bat algorithm for multi-objective optimisation. Int J Bio-Inspired Comput.
2011;3:267–74.

86. Yang X-S. Flower pollination algorithm for global optimization. In: Unconventional computa-
tion and natural computation, vol. 7445 of Lecture notes in computer science. Berlin: Springer;
2012. p. 240–249.
87. Yang XS, Deb S. Cuckoo search via Levy flights. In: Proceedings of world congress on nature
and biologically inspired computing, Coimbatore, India, December 2009. p. 210–214.
88. Yang XS, Deb S. Engineering optimisation by cuckoo search. Int J Math Modell Numer Optim.
2010;1(4):330–43.
89. Yang X-S, Deb S. Eagle strategy using Levy walk and firefly algorithms for stochastic opti-
mization. In: Gonzalez JR, Pelta DA, Cruz C, Terrazas G, Krasnogor N, editors. Nature inspired
cooperative strategies for optimization (NISCO 2010), vol. 284 of Studies in computational
intelligence. Berlin: Springer; 2010. p. 101–111.
90. Yang X-S, Karamanoglu M, He X. Multi-objective flower algorithm for optimization. Procedia
Comput Sci. 2013;18:861–8.
91. Yang X-S, Karamanoglu M, He XS. Flower pollination algorithm: a novel approach for mul-
tiobjective optimization. Eng Optim. 2014;46(9):1222–37.
92. Yu JJQ, Li VOK. A social spider algorithm for global optimization. Appl Soft Comput.
2015;30:614–27.
93. Zelinka I. SOMA—Self organizing migrating algorithm. In: Onwubolu GC, Babu BV, edi-
tors. New optimization techniques in engineering, vol. 141 of Studies in fuzziness and soft
computing. New York: Springer; 2004. p. 167–217.
94. Zhao R, Tang W. Monkey algorithm for global numerical optimization. J Uncertain Syst.
2008;2(3):164–75.
16 Biomolecular Computing

Biomolecular computing studies the potential of using biological molecules to
perform computation. DNA (deoxyribonucleic acid) computing [49] and membrane
computing [46] are two natural computing techniques at the biomolecular level. This
chapter gives a conceptual introduction to these computing paradigms.

16.1 Introduction

A multicellular organism consists of a vast number of cells that run their own cycles
in parallel to keep the organism alive and functional. A single cell is the building
block of living organisms. As such, either development alone, or development
combined with evolution, could be a suitable design method for a cellular computing
machine. Living
cells can be categorized into the prokaryotes (including bacteria and archaea) and
eukaryotes (including animals, plants and fungi). Eukaryotic cells contain complex
functional substructures enclosed in membranes, whereas prokaryotic cells largely
organize their substructures without using them. P systems (or membrane systems)
are eukaryotical models of computation [45]. Biological functions are the results of
the interactions between modules made up of many molecular species.
The transport of chemicals (symbol objects) across membranes is a fundamental
function of a cell. The transport can be passive or active. It is passive when molecules
(symbol objects) pass across the membrane from a region of higher concentration to
a region of lower concentration, while it is active in the reverse case, which requires
some metabolic energy to accomplish the transport. Respiration is the biological process
that allows the cells (from bacteria to humans) to obtain energy. In short, respiration
promotes a flux of electrons from electron donors to a final electron acceptor, which
in most cases is the molecular oxygen.
Cellular architectures are appealing for hardware implementation. BioSpice [60]
is a simulation tool providing models of cells and cell communication at different
levels of abstraction. In [58], an adaptive method for designing a cellular computing
machine has been addressed, implementing the cellular computing machine on an
existing silicon technology—FPGA. The design method involves artificial develop-
ment.
DNA, RNA, and protein molecules are the fundamental devices of biomolecular
computers, in which computation is carried out by intra/intermolecular reactions of
biomolecules. In particular, DNA has been playing the leading role in biomolecular
computing in that DNA bears strong affinity to the electronic computer architecture.
In contrast, protein molecules are not directly compatible with ordinary electronic
computer architecture: in proteins, inter/intramolecular recognition or structural for-
mation is intrinsically complex. As a consequence, biomolecular computing usually
uses proteins as mere black box devices to manipulate the information on DNA.
DNA is well known as the blueprint of life. Genome sequencing examines the
long-term primary structure of DNA. Analysis of epigenetic inheritance deals with
intermediate-term storage. Epigenetic inheritance relies upon chemical modification
of DNA in ways that do not alter sequence content but which affect access to specific
regions of the genome. Genetic engineering with recombinant DNA is a widespread
technology that enables biologists to redesign life forms by modifying their DNA.
DNA is a unique data structure: a naturally occurring DNA molecule has a double
helix structure. Each strand is a directed sequence of bases A, C, G, and T. Two
single DNA strands assemble into a double stranded DNA molecule, which is stabi-
lized by hydrogen bonds between the nucleotides. Natural double strands are bound
together by mutual attractions of each A to a T and each C to a G, forming base
pairs. Thus, every DNA sequence has a natural complement. This determines the
complementarity principle, also known as Watson–Crick base pairing of the DNA
double helix. The A and T base pair aligns through a double hydrogen bond and the
G and C pair glues with a triple hydrogen bond, which is the reason for the higher
stability of the G–C Watson–Crick base pair over the A–T Watson–Crick base pair.
The overall stability of the DNA molecule increases with increasing proportion of
the G–C base pairs. The two single DNA strands are complementarily aligned in a
reverse direction. For example, if a sequence is ATTACGTCA, its complement is
TAATGCAGT. A simple way to separate complementary strands is to boil them; as
they cool, they will seek each other out to recombine. However, all DNA sequences
tend to bind to one another, or even to parts of themselves.
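The complementation rule is easy to state programmatically. The following Python lines reproduce the example above and also show the complement read in the reverse (antiparallel) direction; they are purely illustrative.

# Watson-Crick complement of a DNA strand; a small illustrative sketch.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(seq):
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[b] for b in seq)

def reverse_complement(seq):
    """Complementary strand read in its own 5'-to-3' direction."""
    return complement(seq)[::-1]

assert complement("ATTACGTCA") == "TAATGCAGT"   # the example from the text
print(reverse_complement("ATTACGTCA"))           # TGACGTAAT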
The bases (known as nucleotides) are spaced every 0.35 nm along the DNA mole-
cule, corresponding to an impressive data density of nearly 18 Mb per inch. In the
cell, DNA is modified biochemically by a variety of enzymes, which are tiny protein
machines that read and process DNA according to nature’s design. These enzymes
or operational proteins manipulate DNA on the molecular level: cutting, pasting,
copying, repairing, and many other. There are well-developed techniques for per-
forming many of these cellular functions in test tubes. In the test tube, enzymes do
not function sequentially, rather many copies of the enzymes work on different DNA
molecules simultaneously. DNA becomes a desirable material because of its excel-
lent properties such as minute size, extraordinary information density, and an abil-
ity to self-assemble into well-defined structures. The massively parallel processing
capabilities allow a DNA-based computer to solve hard problems in a reasonable
amount of time.

16.1.1 Biochemical Networks

In biological systems, biochemical networks emerge from protein-mediated molecu-


lar interactions taking place within cells. These complex dynamical networks under-
lie both the structure and function of biological organisms. Three kinds of biochem-
ical network are metabolic, genetic, and signaling networks, which are described,
respectively, as the self-organizing, self-modifying, and self-reshaping components
of a cell’s biochemical network [40].
A metabolic network results from self-organizing interactions between the
enzyme-mediated reactions that take place within a cell. It emerges when the prod-
ucts of certain reactions become the substrates of others, forming chains of reactions
known as metabolic pathways. Product–substrate sharing between pathways results
in the metabolic network. A genetic network emerges from the regulatory interac-
tions between genes. It captures how they regulate one another’s protein expression
levels over time through the production of transcription factors. A signaling network
comprises the protein-mediated reaction pathways through which chemical messages
are delivered to the cell’s internal environment.
These three networks are coupled. By regulating protein production, the genetic
network modifies the behavior of both the metabolic and signaling networks. By
delivering chemical signals to different subcellular locations, the signaling network
modulates the behavior of both genetic and metabolic networks. In single-celled
organisms, these interactions allow the cell’s metabolism to be reconfigured for
different nutrient environments; in multicellular organisms, they are the basis of
cellular differentiation and morphogenesis.
Artificial metabolic networks are modeled on the self-organizing behavior of cel-
lular chemistries. It is a minimal implementation of an artificial chemistry, capturing
the key idea that a set of computational elements manipulate a set of chemicals
over a period of time, but abstracting away the elements found in more complicated
chemistries such as nondeterminism, internal chemical structure, and spatial distri-
bution. It comprises an indexed set of enzyme-analogous elements which transform
the concentrations of an indexed set of real-valued chemicals. Each enzyme has a set
of substrates, a set of products, and a mapping which calculates the concentrations
of its products based on the concentrations of its substrates.
An artificial genetic network is a computational architecture modeled on the reg-
ulatory interactions between genes. The simplest and the best known example of
an artificial genetic network is the random Boolean network. A random Boolean
network is a closed system comprising a set of interconnected genes, each of which
has a Boolean state and a Boolean regulatory function. Random Boolean networks
have been used to successfully model the dynamics of real genetic networks [2].
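A random Boolean network is simple to simulate: each gene receives K Boolean inputs from randomly chosen genes and updates synchronously through a random truth table. The Python sketch below is illustrative; the gene count, connectivity, number of steps, and seed are arbitrary choices.

import random

def random_boolean_network(n_genes=8, k=2, steps=20, seed=1):
    """Synchronous random Boolean network, a minimal sketch (N genes, K inputs each)."""
    rng = random.Random(seed)
    # Each gene gets K regulators and a random Boolean function (truth table).
    inputs = [rng.sample(range(n_genes), k) for _ in range(n_genes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n_genes)]
    state = [rng.randint(0, 1) for _ in range(n_genes)]
    trajectory = [tuple(state)]
    for _ in range(steps):
        # Look up each gene's next value from its regulators' current values.
        state = [tables[g][int("".join(str(state[i]) for i in inputs[g]), 2)]
                 for g in range(n_genes)]
        trajectory.append(tuple(state))
    return trajectory

# Repeated states in the trajectory reveal the attractor the network falls into.
print(random_boolean_network())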
Signaling networks carry out a number of computationally interesting behaviors
[19,35]. In [19], the authors discuss the manner in which signaling pathways integrate
and preprocess diverse incoming signals, likening their behavior to that of a fuzzy
classifier system. In [35], the author draws parallels between the adaptive behaviors
of various signaling pathways and those of engineered controllers.

16.2 DNA Computing

Adleman's seminal work [1] on the use of DNA molecules for solving a seven-city
routing problem (an instance of the directed Hamiltonian path problem) pioneered an
era in DNA computing. The application was realized by creating a solution
environment in the biology laboratory and using biochemical reactions. The cities
and the connections between them were coded as DNA strands, and the operations
for the solution were carried out using polymerase chain reactions.
DNA computing is an optimization metaheuristic that performs computation using
the DNA molecules of living things. The paradigm utilizes the natural biomolecular
characteristics of DNA molecules, such as the inherent logic of DNA hybridization,
massive parallelism, high memory capacity without corruption over many years, and
energy efficiency. Techniques for isolating, copying, and preparing nucleotide
sequences make DNA computing a powerful alternative to silicon computers.
However, equipment such as solution tubes and gel electrophoresis systems is needed
for producing DNA strands, synthesizing DNA, and acquiring and analyzing the
results.
DNA molecules exist in the form of single strands and double-helix strands.
Synthesis and reproduction of DNA molecules is realized using the single DNA
strands. Double-helix DNA strands are created according to the Watson–Crick
complementation rule: A and T combine, and G and C combine.
The unique double stranded data structure can be exploited in many ways, such
as error correction. Errors in DNA occur due to mistakes made by DNA enzymes or
damage from thermal energy and ultraviolet energy from the sun. If the error occurs
in only one strand of double stranded DNA, repair enzymes can restore the proper
DNA sequence by using the complement strand as a reference. In biological systems,
due to this error correction capability, the error rate for DNA operations can be quite
low. A typical error rate for DNA replication is 10⁻⁹.
DNA molecules can perform sophisticated, massively parallel computations. The
potential of polymerase chain reaction as a verification and readout approach has been
shown for computations at the molecular level. In [1], polymerase chain reaction is
used to amplify all the correct solutions of the routing problem. A 10-bit RNA combinator-
ial library was reverse transcribed and amplified through colony polymerase chain
reaction followed by multiplex linear polymerase chain reaction to determine the
configuration of knights on a chessboard [18].
The experimental side often focused on the implementation of molecular circuits
and gates that mimic their digital counterparts [53,55,57,61]. Bacterial genetic ele-
ments have been connected to create logic gates that approximate Boolean functions
such as NOT, OR, and AND [26]. A set of deoxyribozyme-based logic gates (NOT,
AND, and XOR) is presented in [57]. As the input and output of the gates are both
DNA strands, different gates can communicate with one another. In [55], DNA-based
digital logic gates for constructing large reliable circuits are implemented. In addi-
tion to logic gates they demonstrated signal restoration and amplification. In [53], a
set of catalytic logic gates suitable for scaling up to large circuits is presented and a
formalism for representing and analyzing circuits based on these gates is developed.
Engineered nucleic acid logic switches based on hybridization and conformational
changes have also been successfully demonstrated in vivo [31]. These switches have
been extended to more complex logical gates in [61]. Their gates are part of a single
molecule of RNA which can fold on itself into a special structure. It can detect spe-
cific chemical molecules as input, and either cleave itself or remain intact based on
the input(s) and the function of the gate. Advances have also been made in designing
simple molecular machines that open and close like a clamp [65].
RTRACS (Reverse transcription and TRanscription-based Autonomous Comput-
ing System) is a molecular computing system constructed with DNA, RNA, and
enzymes. In [33], a two-input logic gate is reported that receives input and produces
output in the form of RNA molecules. Each of the two-input molecules is chosen
from a set of two, and the logic gate produces an output molecule for each of the four
possible input combinations. Since the RNA strands can be arbitrarily assigned log-
ical values, this module is capable of performing multiple logical operations, including
AND, NAND, OR, and NOR.
The processing of the information stored in DNA is rather random, incomplete,
and complex, especially as the size and sequence diversity of the oligonucleotide
mix increases. DNA computing generates solutions in a probabilistic manner, where
any particular solution is generated with some probability based on the complex
dynamics of the bonding process. By increasing the number of each strand in the
initial solution, one can assume with reasonable certainty that all possible solutions
will be constructed in the initial solution set.
Many theoretical designs have been proposed for DNA automata and Turing
machines [6,10,54]. An in vitro combination of DNA, restriction enzymes and
DNA ligase has been used to construct a programmable finite automaton (Turing
machine) [5].
Synthetic gene networks allow cells to be engineered in the same way that we currently
program computers. To program cell behavior, a component library of genetic cir-
cuit building blocks is necessary. These building blocks perform computation and
communications using DNA-binding proteins, small inducer molecules that inter-
act with these proteins, and segments of DNA that regulate the expression of these
proteins. A component library of cellular gates that implement several digital logic
functions is described in [59]. To represent binary streams, the chemical concentra-
tions of specific DNA-binding proteins and inducer molecules act as the input and
output signals of the genetic logic gates. Biochemical inverters are used to construct
more sophisticated gates and logic circuits. Figure 16.1 depicts a circuit in which a
NAND gate is connected to an inverter. For simplicity, both mRNA and their corre-
sponding protein products are used to denote the signals, or the circuit wires. The
regulation of the promoter and mRNA and protein decay enable the gate to perform
computation. The NAND gate protein output is expressed in the absence of either
of the inputs, and transcription of the output gene is only inhibited when both input
repressor proteins are present.

Figure 16.1 A biochemical NAND gate connected to a downstream inverter. The two-input NAND
gate consists of two separate inverters, each with a different input, but both connected to the same
output protein [59].
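The inverter-based construction can be sketched quantitatively with a steady-state Hill repression function: the output of each inverter falls as its input repressor concentration rises, and the NAND behavior emerges because the two inverters express the same output protein. The toy Python model below uses assumed parameter values and is not the model of [59].

def inverter(repressor, k=1.0, n=2.0, max_output=1.0):
    """Steady-state output of a repressor-based inverter (Hill repression), a toy model."""
    return max_output / (1.0 + (repressor / k) ** n)

def nand_gate(a, b):
    """Two inverters, each repressed by one input, both expressing the same
    output protein (as in Figure 16.1): output is low only when both inputs are high."""
    return inverter(a) + inverter(b)

# Truth-table-like behavior for low (0.0) and high (10.0) repressor levels.
for a in (0.0, 10.0):
    for b in (0.0, 10.0):
        print(a > 0, b > 0, round(nand_gate(a, b), 3))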
One of the most promising applications of DNA computing might be a DNA
memory [4]. Information stored on DNA can be kept without deteriorating for
a long period of time because DNA is very hard to degrade. A DNA memory could
have a capacity greater than that of the human brain at a minute scale (e.g., a few
hundred microliters) [4]. A DNA-based memory has been implemented with in vitro learning
and associative recall [11]. The learning protocol stores the sequences to which it
is exposed, and memories are recalled by sequence content through DNA-to-DNA
template annealing reactions. Theoretically, the memory has a pattern separation
capability that is very large, and can learn long DNA sequences. The learning and
recall protocols are massively parallel, as well as simple, inexpensive, and quick. The
design of an irreversible memory element for use in DNA-based computing systems
is presented in [7]. A DNA memory with 16.8 million addresses was achieved in
[64]. The data embedded into a unique address was correctly extracted through
an addressing process based on nested polymerase chain reaction. In the decoding
process, multiple data items with different addresses can also be accessed
simultaneously by using a mixture of the corresponding address primers.
Recombinant DNA technology allows the manipulation of the genetic information
of the genome of a living cell. It facilitates the alteration of bio-nanomachines within
the living cells and leads to genetically modified organisms. Manipulation of DNA
mimics the horizontal gene transfer in the test tube.
Numerical DNA Computing
Unlike DNA computing using DNA molecules, numerical DNA computing is similar
to GA, but it uses A, T, G, and C bases to code the solution set [63]. A, G, C, and T
bases can be converted into numerical data using 0, 1, 2, and 3, respectively. DNA
computing has two new mutation operations: enzyme and virus mutations. Enzyme
mutation deletes one or more DNA parts from a DNA strand, while virus mutation
adds one or more DNA parts to a DNA strand. The two mutations provide continuous
renewal of the population and prevent focusing on local optima.
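The numerical coding and the two mutation operators can be sketched as follows in Python. The segment lengths and the uniform choice of positions are assumptions made purely for illustration.

import random

BASES = "AGCT"                      # coded as 0, 1, 2, 3, respectively
ENCODE = {b: i for i, b in enumerate(BASES)}

def enzyme_mutation(strand, max_len=3):
    """Delete a randomly chosen segment of the DNA strand (enzyme mutation)."""
    if len(strand) <= 1:
        return strand
    length = random.randint(1, min(max_len, len(strand) - 1))
    start = random.randrange(len(strand) - length + 1)
    return strand[:start] + strand[start + length:]

def virus_mutation(strand, max_len=3):
    """Insert a random segment into the DNA strand (virus mutation)."""
    segment = "".join(random.choice(BASES) for _ in range(random.randint(1, max_len)))
    pos = random.randrange(len(strand) + 1)
    return strand[:pos] + segment + strand[pos:]

strand = "ATTACGTCA"
print(enzyme_mutation(strand), virus_mutation(strand))
print([ENCODE[b] for b in strand])   # numerical coding with values 0-3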
DNA computing has some limitations in terms of convergence speed, adaptability,
and effectiveness. In [34], the DNA computing algorithm is improved by adapting its
parameters toward the desired goal using quantum-behaved PSO, where the population
size, crossover rate, maximum number of operations, enzyme and virus mutation
rates, and fitness function are tuned simultaneously in order to increase population
diversity and prevent focusing on local optima.

16.2.1 DNA Data Embedding

Since a DNA sequence is conceptually equivalent to a sequence of quaternary sym-


bols (bases), DNA data embedding, diversely called DNA watermarking or DNA
steganography, can be seen as a digital communication problem where channel errors
are analogous to mutations of DNA bases. Depending on the use of coding or non-
coding DNA host sequences, which, respectively, denote DNA segments that can or
cannot be translated into proteins, DNA data embedding is essentially a problem of
communications with or without side information at the encoder.
The two broad fields of application of DNA data embedding techniques are the
use of DNA strands as self-replicating nanomemories with the ability to store huge
amounts of data in an ultracompact and energy-efficient way [15,62], and security
and tracking applications made possible by embedding nongenetic information in
DNA such as DNA watermarking [27], DNA steganography [14], and DNA tagging.
The purpose of DNA sequence compression is to find an efficient encoding method
and reduce the space to store the exponentially increasing sequence data. DNA
compression must be lossless to retain all genetic information enciphered in the
sequences and to guarantee the reliability of the raw data. DNA sequences are com-
monly recorded in the form of text; however, traditional text compression techniques
(e.g., bzip2, gzip, and compress) fail in compressing them efficiently. Unlike com-
mon text data, DNA sequences contain an abundance of repeated fragments, which
could occur at long intervals and in peculiar patterns. The intrinsic characteristics of
DNA sequences have thus led to the introduction of specialized compression algo-
rithms. One of the most commonly used DNA sequence compression technique is
substitutional compression, which compresses sequences by substituting repeated
subsequences with a convenient or specially designed coding scheme. BioCompress
[25] is a substitutional method that compresses the exact DNA subsequence repeats
using Fibonacci coding. BioCompress-2 [24] improves the algorithm by introducing
a Markov model to encode the non-repeated regions.
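The substitutional idea, replacing an exact repeat by a reference to its earlier occurrence, can be illustrated with a toy encoder over the {A, C, G, T} alphabet. The Python sketch below emits (offset, length) pairs for repeats and literal bases otherwise; it is a didactic simplification and does not use the Fibonacci coding of BioCompress.

def compress(seq, min_repeat=4):
    """Toy substitutional compressor: emit (offset, length) for exact repeats
    of at least min_repeat bases, and literal bases otherwise. Illustrative only."""
    out, i = [], 0
    while i < len(seq):
        best_len, best_off = 0, 0
        # Search earlier text for the longest match starting at position i.
        for j in range(i):
            k = 0
            while i + k < len(seq) and seq[j + k] == seq[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_repeat:
            out.append((best_off, best_len))
            i += best_len
        else:
            out.append(seq[i])
            i += 1
    return out

print(compress("ACGTACGTACGTTTTT"))   # ['A', 'C', 'G', 'T', (4, 8), (1, 4)]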

16.3 Membrane Computing

All life forms process information on a biomolecular level, which is robust, self-
organizing, adaptive, decentralized, asynchronous, fault-tolerant, and evolvable.
These properties have been exploited in artificial chemical systems like P systems
or artificial hormone systems.
Membrane computing (http://ppage.psystems.eu) [46,48] is a parallel and distributed
computing model that abstracts formal computing models from the structure
and functioning of the living cells as well as from the cooperation of cells in tissues,
organs, and other higher-order structures.
Membrane systems or P systems [45] belong to artificial metabolic networks.
They are a special case of a group of algorithms known as artificial chemistries [17]
and algorithmic chemistries. These comprise three elements: a set of chemicals, a set
of reactions, and an algorithm that determines how chemicals move about and when
reactions can take place. Chemicals may be symbols to which some computational
meaning can be associated.
In a membrane system, multisets are placed in the compartments defined by the
membrane structure, the symbol-objects are evolved by executing the reaction rules
in a maximally parallel and nondeterministic manner. Reaction rules are inspired by
the biochemical reactions in an internal biological living cell.
The membrane system is able to trade space for time. It can solve intractable
problems in a feasible time by making use of exponential space. Most mem-
brane systems are computationally universal, and they are equal to Turing machines
in computing power.

16.3.1 Cell-Like P System

The structure of the basic cell-like P system consists of several membranes arranged
in a hierarchical structure inside a main membrane (the skin), and delimiting regions.
A cell-like P system is defined as a hierarchical arrangement of compartments delim-
ited by membranes. Each compartment may contain a finite multiset of objects (chem-
icals) and a finite set of rules, as well as a finite set of other compartments. The rules
perform transformation and communication operations. Each membrane identifies
a region inside the system. A region contains some objects representing molecules
and evolution rules representing chemical reactions, and possibly other membranes.
A region without any membrane inside is called an elementary one. Objects inside
the regions are delimited by membranes, and rules assigned to the regions of the
membrane structure. The objects can be described by symbols or by strings of sym-
bols. They can evolve and/or move from a region to a neighboring one according to
given evolution rules, associated with the regions. Usually, the rules are applied in
a nondeterministic and maximally parallel way. The evolution of the system corre-
sponds to a computation. The evolution rules represent biochemical interactions or
chemical reactions. During execution, rules are iteratively applied to the symbolic
state of each compartment, and compartments may break open, causing the compo-
sition of symbolic states. The molecular species (ions, proteins, etc.) floating inside
cellular compartments are represented by multisets of objects described by means
of symbols or strings over a given alphabet.
The membrane structure and its associated tree are shown in Figure 16.2. It has a
parenthesized expression for the membranes: [ [ [ ]₅ [ [ ]₆ ]₄ ]₂ [ ]₃ ]₁.
Figure 16.2 An illustrative membrane structure of a cell-like P system and its associated tree.

A membrane system can perform computations in the following way. Starting
from an initial configuration which is defined by the multisets of objects initially
placed inside the membranes, the system evolves by applying the evolution rules
of each membrane in a nondeterministic and maximally parallel manner. A rule is
applicable when all the objects that appear in its left hand side are available in the
region where the rule is placed. The maximally parallel way of using the rules means
that in each step, in each region of the system, we apply a maximal multiset of rules,
namely a multiset of rules such that no further rule can be added to this multiset. A
halting configuration is reached when no rule is applicable. The result is represented
by the number of objects from a specified membrane.
For each evolution rule there are two multisets of objects, describing the reactants
and the products of the chemical reaction. A rule in a membrane can be applied only
to objects in the same membrane. Some objects produced by the rule remain in the
same membrane, some are sent out of the membrane, and others are sent into the inner
membranes. Symport/antiport rules allow simultaneous transmembrane transportation of
objects either in the same direction (symport) or in opposite directions (antiport).
In membrane channels, the passage of objects through membranes is allowed only
through specific channels associated with membranes.
The system will go from one configuration to a new one by applying the rules in a
nondeterministic and maximally parallel manner. A computation is defined by a set
of steps, when the system moves from one configuration to another one. The system
will halt when no more rules are available to be applied. Usually, the result of the
computation is obtained in a specified component of the system, called the output
region.

16.3.2 Computing by P System

Definition 16.1 A P system is a tuple Π = (V, µ; w1, . . . , wn; R1, . . . , Rn), where
V is a finite alphabet whose elements are called objects; µ ⊂ N × N describes
the tree structure of membranes, i.e., the hierarchical arrangement of n compartments
called regions delimited by membranes, with (i, j) ∈ µ denoting that the membrane
labeled by j is contained in the membrane labeled by i; wi, i = 1, . . . , n, represents
the initial multiset occurring in region i; and Ri, i = 1, . . . , n, denotes the set of
processing rules applied in region i.

The membrane structure µ is denoted by a string of left and right brackets ([, ]),
each with the label of the membrane it points to and describing the position of this
membrane in the hierarchy. The rules in each region have the form u → (a_1, t_1), . . . , (a_m, t_m),
where u is a multiset of symbols from V, a_i ∈ V, and t_i ∈ {in, out, here}, i =
1, . . . , m, indicates whether the symbol a_i remains in the current compartment, is sent to
the outer compartment, or is sent to one of the arbitrarily chosen compartments contained
in the current one. When the rule is applied to a multiset u in the current compartment,
u is replaced by the symbols a_i.
A configuration of the P system Π is a tuple c = (u_1, . . . , u_n), where u_i ∈ V*
is the multiset associated with compartment i, i = 1, . . . , n. A computation from a
configuration c_1 to c_2 using the maximal parallelism mode is denoted by c_1 =⇒ c_2.
A configuration is a terminal configuration if there is no compartment i such that u_i
can be further developed.
A sequence of transitions between configurations of a given P system Π is called
a computation. A computation is successful if and only if it reaches a configuration in
which no rule is applicable. The result of a successful computation is the multiset of
objects sent out of the skin membrane during the computation. An unsuccessful
computation never halts and generates no result.
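The output convention can be illustrated with the same toy simulator sketched in the
previous subsection: if the skin membrane is designated as the output side, the result
would be read from the objects sent out of membrane 1, while a designated inner
membrane (here membrane 2) can equally serve as the output region; for the assumed
rules above, the halting configuration leaves three copies of c in each, so the computed
result is the number 3.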
This framework provides polynomial-time solutions to NP-complete problems
by trading space for time; its efficient simulation poses challenges in three different
aspects: the intrinsic massive parallelism of P systems, the exponential computational
workspace, and the non-intensive floating-point nature of the computations. Specifically,
these models were inspired by the capability of cells to produce an exponential num-
ber of new membranes in linear time, through mitosis (membrane division) and/or
autopoiesis (membrane creation) processes.
Computing Capability of P Systems
In [28] it was proved that a P system with symport/antiport operating under maxi-
mal parallelism, with only one symbol and degree 2n + 3, can simulate a partially
blind register machine with n registers. If priorities are added to the rules, then the
obtained P system, having n + 3 compartments, can simulate register machines with
n registers. The former result was improved in [20], where it was proved that any
partially blind register machine with n registers can be simulated by a P system with
symport/antiport with only one symbol, degree n + 3 and operating under maximal
parallelism. It was proved in [21] that P systems with symport/antiport operating
under maximal parallelism, with only one symbol and degree 2n + 1, can simulate
register machines with n registers. P systems can solve a number of NP-hard prob-
lems in linear or polynomial time complexity and even solve PSPACE problems in
a feasible time [3,32].
The first super-Turing model of computation rooted in biology rather than physics
is introduced in [8]. In [23], the accelerating P system model [8] is extended, and it
is shown that the resulting systems have hyperarithmetical computational power.

16.3.3 Other P Systems

In addition to basic cell-like P systems [45], there are tissue-like P systems [41],
neural-like P systems [29], metabolic P systems [39], and population P systems [46].
In all cases, there are basic components (membranes, cells, neurons, etc.) hier-
archically arranged, through a rooted tree, for cell-like P systems, or distributed
across a network, like a directed graph, for tissue-like P systems, with a common
environment. Neural-like P systems consider neurons as their cells organized with
a network structure as a directed graph. Various variants of P systems with Turing
computing power have been developed and polynomial or linear solutions to a variety
of computationally hard, NP-complete or PSPACE-complete, problems have been
obtained [48].
A biological motivation of tissue P systems [41] is the intercellular communication
and cooperation between tissue cells by the interchange of signaling molecules.
Tissue P systems can simulate a Turing machine even when using a small number
of cells, each of them having a small number of states.
Tissue-like P systems consider arbitrary graphs as underlying structures, with
membranes placed in the nodes while edges correspond to communication channels
[41]. In tissue-like P systems, several one-membrane cells are considered as evolving
in a common environment [16]. Neural-like P systems can be similar to tissue-like
P systems, or be spiking neural P systems, which use only one type of object, the
spike. Results are output as the distance between consecutive spikes. The
computing systems obtained are proved to be equivalent to Turing machines [47] even
when using restricted combinations of features. In the evolution–communication P
systems, communication rules are represented by symport/antiport rules that simulate
some of the biochemical transport mechanisms present in the cell.
Figure 16.3 shows the membrane structure of a tissue-like P system for evolving
the optimal solution. It consists of q cells. The region 0 is the environment and out-
put region of the system. The directed lines indicate the communication of objects
between the cells. Each object in the cells expresses a solution. The cells are arranged
as a loop topology based on the communication rules. Each cell runs independently.
The environment stores the global best object found so far. The communication
mechanism exchanges the objects between each cell and its two adjacent cells and
updates the global best object in the environment by using communication antiport
rule and symport rule. The role of evolution rules is to evolve the objects in cells to
generate new objects used in the next computing step. During the evolution, each cell
maintains a population of objects. After objects are evolved, each cell communicates
its best object found in the current computing step to the environment to update
the global best object. When the system halts, the objects in the environment are
regarded as the output of the system. This membrane computing approach has been
implemented for clustering in [50].

Figure 16.3 An illustrative structure of a tissue-like P system, with q cells arranged in
a loop and the environment (output region) labeled 0.
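A minimal Python sketch of this loop-topology scheme follows (the sphere objective,
the Gaussian evolution rule, and the parameter values are illustrative assumptions, not
the clustering algorithm of [50]); each cell evolves its own objects, exchanges best
objects with its two ring neighbors, and sends its best object to the environment:

import random

def objective(x):                            # illustrative fitness (to be minimized)
    return sum(xi * xi for xi in x)

DIM, Q, POP, STEPS = 5, 4, 10, 200           # dimension, cells, objects per cell, steps
cells = [[[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP)]
         for _ in range(Q)]
env_best = min((obj for cell in cells for obj in cell), key=objective)

for _ in range(STEPS):
    for c in range(Q):
        # Evolution rules: perturb every object in the cell (illustrative Gaussian rule).
        cells[c] = [[xi + random.gauss(0, 0.1) for xi in obj] for obj in cells[c]]
        # Communication rules: take the best object of each ring neighbor (antiport),
        # replacing the receiving cell's worst object.
        for nb in ((c - 1) % Q, (c + 1) % Q):
            best_nb = min(cells[nb], key=objective)
            worst_idx = max(range(POP), key=lambda k: objective(cells[c][k]))
            cells[c][worst_idx] = list(best_nb)
        # Symport rule: send the cell's best object to the environment.
        cell_best = min(cells[c], key=objective)
        if objective(cell_best) < objective(env_best):
            env_best = list(cell_best)

print(objective(env_best), env_best)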
The inspiration for tissue P systems with cell separation [44] is that new cells are
produced by cell separation in tissues in a natural way. An upper bound of the power
of tissue P systems with cell separation is demonstrated in [56]. The class of problems
solvable by uniform families of these systems in polynomial time is contained in the
class PSPACE, which characterizes the power of many classical models of parallel
computing machines, such as the alternating Turing machine, relating classical and
bio-inspired parallel computing devices.
Spiking neural P systems [29,38] are a class of distributed parallel computing
models inspired by the neurophysiological behavior of neurons sending electrical
impulses (spikes) along axons to other neurons where there is a synapse between
each pair of connected neurons. Spiking neural P systems can also be viewed as an
evolution of P systems shifting from cell-like to neural-like architectures. They have
been shown to be computationally universal [29]. They employ the basic principle
of spiking neural networks — computation by sending spikes via a fixed network of
synapses between neurons, using membrane computing background. Each spiking
neuron is represented as a discrete device equipped with a counter of spikes it receives
from its neighbors. Even very restricted spiking neural P systems keep their universal
(in Turing sense) computational power [22].
Metabolic P systems [39] are a quantitative extension of P systems for modeling
metabolic processes. They are deterministic P systems developed to model the dynamics
of biological phenomena related to metabolism and signal transduction in the
living cell. The classical viewpoint on metabolic dynamics, in terms of ordinary
differential equations, is replaced by suitable generalizations of chemical principles.
P systems with active membranes [46] have been proved to be complete from a
computational viewpoint, equivalent in this respect to Turing machines. The mem-
brane division can be used to solve computationally hard problems, e.g., NP-complete
problems, in polynomial or even linear time, by a space–time trade-off. In this com-
puting paradigm, decision problems are solved by using families of recognizer con-
fluent P systems [51], where all possible computations with the same initial config-
uration must give the same answer. In confluent recognizer P systems, all computa-
tions halt, only two possible outputs exist (usually named yes and no), and the result
produced by the system only depends upon its input, and is not influenced by the
particular sequence of computation steps taken to produce it.
Reconfig-P [43] is an implementation of membrane computing based on recon-
figurable hardware that is able to execute P systems at high performance. It exploits
the reconfigurability of the hardware by constructing and synthesizing a customized
hardware circuit for the specific P system to be executed. The Reconfig-P hardware
design treats reaction rules as the primary computational entities and represents
regions only implicitly. A generic simulator on GPUs for a family of recognizer P
systems with active membranes was presented in [9].

The computational power of energy-based P systems, where a fixed amount of


energy is associated with each object and the rules transform objects by manipulating
their energy, is studied in [36]. If local priorities are assigned to the rules, then energy-
based P systems are shown to be as powerful as Turing machines. Moreover, instances
of a special symbol are used to denote free energy units occurring inside the regions
of the system. These energy units can be used to transform objects, using appropriate
rules that satisfy the principle of energy conservation. By allowing the presence of
a potentially infinite amount of free energy units, energy-based P systems are able
to simulate register machines, hence the model reaches the computational power of
Turing machines.
It is known [51,66] that the class of all decision problems which can be solved
in polynomial time by a family of recognizer P systems that use only evolution,
communication and dissolution rules coincides with the standard complexity class
P. Hence, in order to solve computationally hard (such as NP-complete or PSPACE-
complete) problems in polynomial time by means of P systems, it seems necessary
to be able to produce an exponential number of membranes in a polynomial number
of computation steps. Recognizer P systems with active membranes (using division
rules and, possibly, polarizations associated to membranes) have been successfully
used to solve NP-complete problems efficiently. The first solutions were given in the
so-called semi-uniform setting [66], which means that we assume the existence of
a deterministic Turing machine that, for every instance of the problem, produces in
polynomial time a description of the P system that solves such an instance. Recog-
nizer P systems having three polarizations associated to the membranes [52] are able
to solve the PSPACE-complete problem quantified 3SAT when working in polyno-
mial space and exponential time. As it happens with Turing machines, this model of
P systems can solve in exponential time and polynomial space problems that cannot
be solved in polynomial time, unless P = PSPACE; that is, the systems of [52] trade time for space. This
constraint implies that the number of computation steps is at most exponential with
respect to the input size.
With exponential precomputed resources SAT is solvable in constant time with
spiking neural P systems [12]. It is proved in [42] that there is no standard spiking
neural P system that simulates Turing machines with less than exponential time and
space overheads. These spiking neural P systems have a constant number of neurons
that is independent of the input length. Following this, a universal spiking neural P
system with exhaustive use of rules was constructed in [42] that simulates Turing
machines in linear time and has only 10 neurons. Extended spiking neural P systems with
exhaustive use of rules were proved computationally universal in [30]. Using the
simulation algorithm in [30] gives an exponential time overhead when simulating
Turing machines.

16.3.4 Membrane-Based Optimization

Membrane-inspired EAs integrate the membrane structures, evolution rules and


computational mechanisms of P systems with the search principles of various
metaheuristics. Until now two classes of membrane structures, the hierarchical struc-
ture of a cell-like P system (formally, a rooted tree) and the network structure of a
tissue-like P system (formally, a directed graph) have been used to design a variety
of membrane-inspired EAs.
DEPS [13] is a membrane algorithm for numerical optimization, which combines
DE, a local search (such as the simplex method), and P systems. The hierarchical
structure of cell-like P systems is used to organize the objects consisting of real-
valued strings, and the rules are composed of mutation, crossover and selection
operations in elementary membranes, a local search in the skin membrane and
transformation/communication-like rules in P systems. DEPS outperforms DE.
In [67], an EA is introduced by using the concepts and principles of the quantum-
inspired evolutionary approach and the hierarchical arrangement of the compart-
ments of a P system for solving NP-hard COPs.
The main structure of an animal cell includes the cell membrane, cell cytoplasm,
and cell nucleus. Many nuclear pores distributed on the cell’s nucleus are channels
for the transportation of macromolecules, such as mRNA, related enzymes,
and some proteins. Those macromolecules are essential substances for metabolism
of the cell, but some other substances are forbidden to enter the cell nucleus. Due
to the nuclear pores, the nucleus has the ability to select essential substances to
keep itself alive and stronger by means of substance filtration. Cell nuclear pore
optimization [37] is inspired by this phenomenon from cell biology. This
means the optimal solution can be obtained by continuously filtering the
potential optimal solutions. The method obtains the potential essential samples from
the common samples according to certain evaluation criteria; if the potential essential
samples meet some evaluation criteria, they are the real essential samples. Each
common sample is accompanied by a pore vector of 0s and 1s, whose elements
are generated by some initial conditions or initial rules.

References
1. Adleman LM. Molecular computation of solutions to combinatorial problems. Science.
1994;266(5187):1021–4.
2. Albert R, Othmer HG. The topology of the regulatory interactions predicts the expression
pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol. 2003;223(1):
1–18.
3. Alhazov A, Martin-Vide C, Pan L. Solving a PSPACE complete problem by recognizing P
systems with restricted active membranes. Fundamenta Informaticae. 2003;58(2):67–77.
4. Baum EB. Building an associative memory vastly larger than the brain. Science. 1995;268:583–
5.
5. Benenson Y, Paz-Elizur T, Adar R, Keinan E, Livneh Z, Shapiro E. Programmable and
autonomous computing machine made of biomolecules. Nature. 2001;414:430–4.
6. Benenson Y, Gil B, Ben-Dor U, Adar R, Shapiro E. An autonomous molecular computer for
logical control of gene expression. Nature. 2004;429(6990):423–9.
7. Blenkiron M, Arvind DK, Davies JA. Design of an irreversible DNA memory element. Nat
Comput. 2007;6:403–11.
8. Calude CS, Paun G. Bio-steps beyond Turing. BioSystems. 2004;77:175–94.
9. Cecilia JM, Garcia JM, Guerrero GD, Martinez-del-Amor MA, Perez-Hurtado I, Perez-
Jimenez MJ. Simulation of P systems with active membranes on CUDA. Briefings Bioinform.
2010;11(3):313–22.
10. Chen H, Anindya D, Goel A. Towards programmable molecular machines. In: Proceedings of
the 5th conference on foundation of nanoscience, Snowbird, Utah, 2008. p. 137–139.
11. Chen J, Deaton R, Wang YZ. A DNA-based memory with in vitro learning and associative
recall. Nat Comput. 2005;4:83–101.
12. Chen H, Ionescu M, Ishdorj T. On the efficiency of spiking neural P systems. In: Gutierrez-
Naranjo MA, Paun G, Riscos-Nunez A, Romero-Campero FJ, editors. Proceedings of fourth
brainstorming week on membrane computing, Sevilla, Spain, February 2006. p. 195–206.
13. Cheng J, Zhang G, Zeng X. A novel membrane algorithm based on differential evolution for
numerical optimization. Int J Unconv Comput. 2011;7:159–83.
14. Clelland CT, Risca V, Bancroft C. Hiding messages in DNA microdots. Nature.
1999;399(6736):533–4.
15. Cox JP. Long-term data storage in DNA. Trends Biotechnol. 2001;19(7):247–50.
16. Diaz-Pernil D, Gutierrez-Naranjo MA, Perez-Jimenez MJ, Riscos-Nuez A. A linear-time tis-
sue P system based solution for the 3-coloring problem. Electron Notes Theor Comput Sci.
2007;171(2):81–93.
17. Dittrich P, Ziegler J, Banzhaf W. Artificial chemistries—a review. Artif Life. 2001;7(3):225–75.
18. Faulhammer D, Cukras AR, Lipton RJ, Landweber LF. Molecular computation: RNA solutions
to chess problems. Proc Nat Acad Sci U.S.A. 2000;97:1385–9.
19. Fisher MJ, Paton RC, Matsuno K. Intracellular signalling proteins as ‘smart’ agents in parallel
distributed processes. BioSystems. 1999;50(3):159–71.
20. Frisco P. Computing with cells: advances in membrane computing. Oxford: Oxford University
Press; 2009.
21. Frisco P. P Systems and unique-sum sets. In: Proceedings of international conference on mem-
brane computing, Lecture notes of computer science 6501. Berlin: Springer; 2010. p. 208–225.
22. Garcia-Arnau M, Perez D, Rodriguez-Paton A, Sosik P. On the power of elementary features
in spiking neural P systems. Nat Comput. 2008;7:471–83.
23. Gheorghe M, Stannett M. Membrane system models for super-Turing paradigms. Nat Comput.
2012;11:253–9.
24. Grumbach S, Tahi F. A new challenge for compression algorithms: genetic sequences. Inf
Process Manag. 1994;30:875–86.
25. Grumbach S, Tahi F. Compression of DNA sequences. In: Proceedings of data compression
conference, Snowbird, UT, March 1993. p. 340–350.
26. Hasty J, McMillen D, Collins JJ. Engineered gene circuits. Nature. 2002;420:224–30.
27. Heider D, Barnekow A. DNA-based watermarks using the DNA-crypt algorithm. BMC Bioin-
form. 2007;8:176.
28. Ibarra OH, Woodworth S. On symport/antiport P systems with small number of objects. Int J
Comput Math. 2006;83(7):613–29.
29. Ionescu M, Paun G, Yokomori T. Spiking neural P systems. Fundamenta Informaticae.
2006;71:279–308.
30. Ionescu M, Paun G, Yokomori T. Spiking neural P systems with an exhaustive use of rules. Int
J Unconv Comput. 2007;3(2):135–53.
31. Isaacs FJ, Dwyer DJ, Ding C, Pervouchine DD, Cantor CR, Collins JJ. Engineered riboregu-
lators enable post-transcriptional control of gene expression. Nat Biotechnol. 2004;22:841–7.
32. Ishdorj T, Leporati A, Pan L, Zeng X, Zhang X. Deterministic solutions to QSAT and Q3SAT by
spiking neural P systems with pre-computed resources. Theor Comput Sci. 2010;411:2345–58.
33. Kan A, Sakai Y, Shohda K, Suyama A. A DNA based molecular logic gate capable of a variety
of logical operations. Nat Comput. 2014;13:573–81.
34. Karakose M, Cigdem U. QPSO-based adaptive DNA computing algorithm. Sci World J.
2013;2013:8. Article ID 160687.
35. Lauffenburger DA. Cell signaling pathways as control modules: complexity for simplicity?
PNAS. 2000;97(10):5031–3.
36. Leporati A, Besozzi D, Cazzaniga P, Pescini D, Ferretti C. Computing with energy and chemical
reactions. Nat Comput. 2010;9:493–512.
37. Lin L, Guo F, Xie X. Novel informative feature samples extraction model using cell nuclear
pore optimization. Eng Appl Artif Intell. 2015;39:168–80.
38. Maass W. Computing with spikes. Found Inf Process TELEMATIK. 2002;8:32–6.
39. Manca V, Bianco L, Fontana F. Evolution and oscillation in P systems: applications to biolog-
ical phenomena. In: Mauri G, Paun G, Perez-Jimenez MJ, Rozenberg G, Salomaa A, editors.
Workshop on membrane computing, Lecture notes in computer science 3365. Berlin: Springer;
2004. p. 63–84.
40. Marijuan PC. Enzymes, artificial cells and the nature of biological information. BioSystems.
1995;35:167–70.
41. Martin-Vide C, Paun G, Pazos J, Rodriguez-Paton A. Tissue P systems. Theor Comput Sci.
2003;296(2):295–326.
42. Neary T. On the computational complexity of spiking neural P systems. Nat Comput.
2010;9:831–51.
43. Nguyen V, Kearney D, Gioiosa G. An implementation of membrane computing using recon-
figurable hardware. Comput Inform. 2008;27:551–69.
44. Pan L, Perez-Jimenez M. Computational complexity of tissue-like P systems. J Complex.
2010;26:296–315.
45. Paun G. Computing with membranes. J Comput Syst Sci. 2000;61(1):108–43.
46. Paun G. Membrane computing: an introduction. Berlin: Springer; 2002.
47. Paun G. A quick introduction to membrane computing. J Logic Algebraic Program.
2010;79(6):291–4.
48. Paun G, Rozenberg G, Salomaa A, editors. Handbook of membrane computing. Oxford, UK:
Oxford University Press; 2010.
49. Paun G, Rozenberg G, Salomaa A. DNA computing. Berlin: Springer; 1998.
50. Peng H, Luo X, Gao Z, Wang J, Pei Z. A novel clustering algorithm inspired by membrane
computing. Sci World J. 2015;2015:8. Article ID 929471.
51. Perez-Jimenez MJ, Romero-Jimenez A, Sancho-Caparrini F. Complexity classes in models of
cellular computing with membranes. Nat Comput. 2003;2(3):265–85.
52. Porreca AE, Leporati A, Mauri G, Zandron C. P systems with active membranes: trading time
for space. Nat Comput. 2011;10:167–82.
53. Qian L, Winfree E. A simple DNA gate motif for synthesizing large-scale circuits. In: DNA
computing, Volume 5347 of Lecture notes in computer science. Berlin: Springer; 2008. p.
70–89.
54. Rothemund P. A DNA and restriction enzyme implementation of turing machines. In: DNA
based computers, DIMACS series in discrete mathematics and theoretical computer science,
no. 27. Providence, RI: American Mathematical Society; 1996. p. 75–120.
55. Seelig G, Soloveichik D, Zhang DY, Winfree E. Enzyme-free nucleic acid logic circuits. Sci-
ence. 2006;314(5805):1585.
56. Sosik P, Cienciala L. Computational power of cell separation in tissue P systems. Inf Sci.
2014;279:805–15.
57. Stojanovic MN, Mitchell TE, Stefanovic D. Deoxyribozyme-based logic gates. J Am Chem
Soc. 2002;124(14):3555–61.
58. Tufte G, Haddow PC. Towards development on a silicon-based cellular computing machine.
Nat Comput. 2005;4:387–416.
59. Weiss R, Basu S, Hooshansi S, Kalmbach A, Karig D, Mehreja R, Netravalt I. Genetic circuit
building blocks for cellular computation, communications, and signal processing. Nat Comput.
2003;2:47–84.
60. Weiss R, Knight Jr TF, Sussman G. Genetic process engineering. In: Amos M, editor. Cellular
computation. Oxford, UK: Oxford University Press; 2004. p. 43–73.
61. Win MN, Smolke CD. Higher-order cellular information processing with synthetic RNA
devices. Science. 2008;322(5900):456–60.
62. Wong PC, Wong K, Foote H. Organic data memory using the DNA approach. Commun ACM.
2003;46(1):95–8.
63. Xu J, Qiang X, Yang Y, Wang B, Yang D, Luo L, Pan L, Wang S. An unenumerative DNA
computing model for vertex coloring problem. IEEE Trans Nanobiosci. 2011;10(2):94–8.
64. Yamamoto M, Kashiwamura S, Ohuchi A, Furukawa M. Large-scale DNA memory based on
the nested PCR. Nat Comput. 2008;7:335–46.
65. Yurke B, Turberfield A, Mills A Jr, Simmel F, Neumann J. A DNA-fuelled molecular machine
made of DNA. Nature. 2000;406:605–8.
66. Zandron C, Ferretti C, Mauri G. Solving NP-complete problems using P systems with active
membranes. In: Antoniou CS, Calude MJ, Dinneen I, editors. Unconventional models of com-
putation. London: Springer; 2000. p. 289–301.
67. Zhang GX, Gheorghe M, Wu CZ. A quantum-inspired evolutionary algorithm based on P
systems for knapsack problem. Fundamenta Informaticae. 2008;87:93–116.
17 Quantum Computing

Quantum computing is inspired by the theory of quantum mechanics, which
describes the behavior of particles of atomic size. Quantum computing involves
research on quantum computers and quantum algorithms. Some quantum algorithms
perform exponentially faster than the best known classical algorithms [30]. Quan-
tum computers were proposed in the 1980s [1,6]. This chapter introduces some basic
quantum computing algorithms and quantum-based hybrid metaheuristic algorithms.

17.1 Introduction

The quantum principle of superposition of states assumes that a system is in a super-
position of all of its possible states at the same time, each weighted by a probability
amplitude, and that all states can be processed in parallel in order to optimize an
objective function. Quantum computing uses unitary operators acting on discrete
state vectors. Quantum processing allows an optimization problem to be solved by
exhaustive search on all its possible solutions. Such efficiency is ensured when the
algorithm is run on a quantum computer, whereas on a classical computer it can be
very resource-consuming.
Parallelism and entanglement cause quantum computations and communications
to exhibit speedups. Parallelism is the superposition of an exponential number of
states representing several solutions of the problem, including the best one. This
allows for exponential speedup and storage in a quantum register in terms of the
number of basis states.
Entanglement is the potential for quantum states to exhibit correlations that cannot
be accounted for classically, in particular, for associating a fitness to each solution
and making a decision. The principle of entanglement states that two or more particles,
regardless of their location, can be viewed as correlated, indistinguishable, synchro-
nized, and coherent. If one particle is measured and collapses, it causes all other
particles to collapse too.
The well-known quantum algorithms include the Deutsch–Jozsa algorithm [7],
Shor’s quantum factoring algorithm [29,30], and Grover’s database search algorithm
[9,10]. Shor’s algorithm can give an exponential speedup for factoring large integers
into prime numbers, and has been implemented using nuclear magnetic resonance
(NMR) [33]. Shor’s quantum algorithm [30] is exponentially faster than any known
classical algorithm. It can factorize large integers faster than any Turing program,
and this suggests that quantum theory has super-Turing potential.
In ensemble quantum computation, all computations are performed on an ensem-
ble of computers rather than on a single computer. Measurements of qubits in a single
computer cannot be performed, and only expectation values of each particular bit
over all the computers can be read out. The randomizing strategy and the sorting strat-
egy resolve the ensemble-measurement problem in most cases [2]. NMR computing
[5,8] is a promising implementation of quantum computing. Several quantum algo-
rithms involving only few qubits have been demonstrated [5,8,18,26,33]. In such
NMR systems, each molecule is used as a computer. Different qubits in the com-
puter are represented by spins of different nuclei. There is an ensemble of quantum
computers.

17.2 Fundamentals

Quantum information processing has the limitations of demolition of quantum mea-


surement and no-cloning theorem. Demolition of quantum measurement states that
measurement of a quantum state results in its disturbance. No-cloning theorem states
that there is no way of copying unknown quantum states faithfully. Heisenberg’s
uncertainty principle gives the uncertainty relation on measurements.
Quantum systems can be described by a wave function ψ that exists in a
Hilbert space. In quantum mechanics, the state of a physical system is identified
with a ray in a complex separable Hilbert space. For states (vectors) of a Hilbert
space, the so-called Dirac notation is usually used: |φ⟩ is called a ket vector and
⟨φ| is called a bra vector.
Unlike classical bits, a quantum bit or qubit may be in state 1 or 0, or in a super-
position of both states. A quantum system is said to be coherent if it is in a linear
superposition of its basis states. Observation of such a state, or its interaction with its
environment, leads to an instantaneous choice among those states; the system collapses
into one of them and remains in that state. Entanglement is the nonclassical correlation
that may exist between separated quantum systems.
Mathematically, a qubit is represented by a unit vector in the two-dimensional
complex Hilbert space, and can be written in the Dirac notation

|ψ⟩ = α|0⟩ + β|1⟩,                                                    (17.1)
where |0⟩ and |1⟩ are the two basis states, and α and β are complex amplitudes
determining which of the corresponding states is likely to appear when a qubit
is read (measured, collapsed). |α|^2 and |β|^2 give the probability of a qubit being
found in state 0 or 1, respectively. Thus, |α|^2 + |β|^2 = 1 at any time. After loss of
coherence, the qubit will collapse into one of the states |0⟩ or |1⟩.
With the exception of measurements, all other operations allowed by quantum
mechanics are unitary operations on the Hilbert space in which qubits live. They
are represented by gates, much as in a classical circuit. The Hadamard gate H maps
|0⟩ → (1/√2)(|0⟩ + |1⟩) and |1⟩ → (1/√2)(|0⟩ − |1⟩). It makes the eigenstates into
a superposition of |0⟩ and |1⟩ with equal probability amplitudes.
The evolution of a quantum system is described by special linear operators, unitary
operators U, which give

U|ψ⟩ = U[α|0⟩ + β|1⟩] = αU|0⟩ + βU|1⟩.                                (17.2)

That is, the evolution of a two-level quantum system is a linear combination of the
evolutions of its basis states.
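As a minimal numerical illustration (a NumPy sketch, assumed for this discussion only),
the snippet below represents a qubit as a two-component complex vector, applies the
Hadamard gate, and checks that the total measurement probability |α|^2 + |β|^2 is
preserved by the unitary evolution of (17.2):

import numpy as np

ket0 = np.array([1, 0], dtype=complex)                        # |0>
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate

psi = H @ ket0                                 # |psi> = (|0> + |1>)/sqrt(2)
alpha, beta = psi                              # amplitudes of |0> and |1>
print(abs(alpha) ** 2, abs(beta) ** 2)         # 0.5 0.5
print(abs(alpha) ** 2 + abs(beta) ** 2)        # 1.0: probabilities always sum to 1

# Unitarity (U^dagger U = I) is what preserves this normalization under (17.2).
print(np.allclose(H.conj().T @ H, np.eye(2)))  # True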
Analogous to logic gates in classical computers, quantum computing tasks can be
completed through quantum logic gates. In order to modify the probability ampli-
tudes, quantum gates can be applied to the states of a qubit. Quantum gates are
unitary operators that transform quregisters into quregisters. Being unitary, gates
represent characteristic reversible transformations. Some most useful quantum gates
for quantum computation are NOT-gate, controlled-NOT (CNOT) gate, phase-shift
gate, and Hadamard gate. Phase-shift gate is an important element to carry out the
Grover iteration for reinforcing a good decision. The quantum analog for exploring
the search space is quantum gate, which is a unitary transformation.
A set of gates is said to be universal for quantum computation if any unitary
operation may be approximated to arbitrary accuracy by a quantum circuit involving
only those gates. Any arbitrary unitary operation can be approximated to arbitrary
accuracy using Hadamard, phase, CNOT, and π/8 gates. Further, any classical circuit
can be made reversible by introducing a special gate called the Toffoli gate. Since a
quantum version of the Toffoli gate has been developed, any classical reversible circuit can be
converted to a quantum circuit that computes the same function.
The basic components of quantum circuits are linear quantum gates, which imple-
ment unitary (and reversible) transformations as rotations in the complex qubit vector
space. Rotations maintain the orthogonality of basis vectors, and hence, the validity
of the measurement postulate. Each quantum gate is therefore represented by a suit-
able unitary matrix U. As a consequence of linearity for matrix-vector multiplication,
the gate operation is equivalently represented by the transformation of every basis
vector in the quantum state space. The unitary property implies that quantum states
cannot be copied or cloned; this is also known as the no-cloning property.
The no-cloning theorem states that it is not possible to clone a quantum state |ψ⟩, and
thus to obtain full information on the coefficients α and β from a single copy of |ψ⟩.
Entanglement is another feature arising from the linearity of quantum mechanics.
The state of a composite classical system is completely determined by the states
of the subsystems. The state |ψ⟩_AB of a composite quantum system lives in the tensor
product ⊗ of the state spaces of the component systems. Consider the two-qubit state

|Bell⟩_AB = (1/√2)[|0⟩_A ⊗ |0⟩_B + |1⟩_A ⊗ |1⟩_B].                    (17.3)

This state cannot be written as a product of states of the two subsystems; such a Bell
state is said to be entangled.
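The following NumPy sketch (illustrative only) builds this Bell state by applying a
Hadamard gate to the first qubit followed by a CNOT gate, and samples joint
measurements to show the perfect correlation between the two qubits:

import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)           # Hadamard on one qubit
CNOT = np.array([[1, 0, 0, 0],                         # controlled-NOT on two qubits
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket00 = np.array([1.0, 0.0, 0.0, 0.0])                 # |0>_A (x) |0>_B
bell = CNOT @ (np.kron(H, np.eye(2)) @ ket00)          # (|00> + |11>)/sqrt(2)
print(bell)

# Joint measurements in the basis {00, 01, 10, 11}: only '00' and '11' occur,
# so the two qubits are perfectly correlated.
probs = bell ** 2
samples = np.random.choice(4, size=10, p=probs)
print([format(s, '02b') for s in samples])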
Quantum algorithms rely on properties of quantum parallelism and quantum
superposition. Quantum parallelism arises from the ability of a quantum memory
register to exist in a superposition of base states. A quantum memory register can
exist in a superposition of states and each component of this superposition may be
thought of as a single argument to a function. Since the number of possible states is
2^n for n qubits in the quantum register, we can perform in one operation on a quan-
tum computer what would take an exponential number of operations on a classical
computer. As the number of superposed states increases in the quantum register,
the probability of measuring any state will start decreasing. In quantum computing,
by contrast, all solutions are guaranteed to be generated and we need not concern
ourselves with the possibility of missing potential solutions.
A quantum algorithm searches for the best solution probabilistically. It has the drawback
of premature convergence and stagnation in the late stage of evolution.

17.2.1 Grover’s Search Algorithm

Grover’s algorithm [10] solves the problem of searching in an unstructured database.


The basic idea is to amplify the coefficients of the superposition of all elements that
correspond to the solutions of the given problem, while reducing the others. This
procedure is performed by applying a unitary operator O(√N) times.
Nonstructuredness is essential for achieving the speedup stated above; other-
wise, classical binary tree search would solve the problem in O(log N). By utilizing
Grover's algorithm, it is possible to search a database of N entries in time
O(√N), compared to O(N) in the classical setting. By repeating the whole quantum
procedure, however, it is possible to obtain other solutions. When the number of
solutions is known in advance, one can use Grover’s algorithm to look for one of
them. Grover's algorithm can achieve a quadratic speedup over classical algorithms in
unsorted database searching, and has been realized using NMR [4,17] and quantum
optics [20].
Assume that there is a system with N = 2^n states labeled S_1, S_2, …, S_N, repre-
sented by n-bit strings. Assume that there is a unique marked element S_m that satisfies
a condition C(S_m) = 1, and C(S) = 0 for all other states. Grover's algorithm can
find S_m by minimizing the number of evaluations of C.
Grover’s algorithm initially places the state register in an equal superposition of
all states, that is, the amplitude of all states is set as the same positive value. It then
implements two unitary transformations, namely a selective phase inversion and an
inversion about average operation, for a number of times. A selective phase inversion
of the marked state followed by an inversion about average step has the effect of
increasing the amplitude of the marked state by O(1/√N), at the expense of the
nonmarked states, in a manner analogous to interference of waves. Therefore, after
O(√N) iterations, the probability of measuring the marked state approaches 1. Grover
showed that performing a measurement after about (π/4)√N iterations is highly
likely to give the correct result for sufficiently large N.
Grover's algorithm is given in Algorithm 17.1.

Algorithm 17.1 (Grover's Search Algorithm).

1. Prepare a quantum register to be normalized and uniquely in the first state.
2. Place the register in an equal superposition of all states, (1, 1, . . . , 1)/√N, by applying
   the Walsh–Hadamard operator W.
3. Repeat for O(√N) times:
   a. For any state S: if C(S) = 1, rotate the phase by π; else, leave the state unaltered.
   b. Apply the inversion about average operator A, whose matrix representation is
      [A_ij] = 2/N if i ≠ j and [A_ij] = −1 + 2/N if i = j, on the quantum register.
4. Measure the quantum register. The measurement yields the n-bit label of the marked
   state S_m (with C(S_m) = 1) with probability at least 1/2.

Example 17.1: In this simple example, we search for a needle in a haystack, i.e., we
find a particular element among the elements of a database. We simulate Grover's
algorithm using six qubits, so there are 2^6 = 64 database elements. The desired ele-
ment is randomly generated from among the 64 elements. The Walsh–Hadamard
transformation and the operators for the phase rotation and the inversion about average
are realized as matrices. By testing different values for the number of iterations, we
verified that the optimal number of iterations is determined by (π/4)√N, as proposed
by Grover. Figure 17.1 shows the probability dynamics for the desired element being
selected and the resulting distribution for each element being selected.
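An analogous simulation can be written in a few lines of Python/NumPy (a sketch
under the same setup, not the accompanying MATLAB code): the amplitude vector
over the N = 64 states is manipulated directly by the phase-inversion and
inversion-about-average steps of Algorithm 17.1.

import numpy as np

n = 6
N = 2 ** n
marked = np.random.randint(N)                # index of the desired element

amp = np.ones(N) / np.sqrt(N)                # equal superposition of all N states
iterations = int(round(np.pi / 4 * np.sqrt(N)))
for _ in range(iterations):
    amp[marked] *= -1                        # selective phase inversion (rotation by pi)
    amp = 2 * amp.mean() - amp               # inversion about the average
print(iterations, amp[marked] ** 2)          # 6 iterations, probability about 0.9966

With N = 64 the loop performs 6 iterations and the final probability of the marked
state is about 0.9966, matching the value reported in Figure 17.1.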

17.3 Hybrid Methods

17.3.1 Quantum-Inspired EAs

By injecting the power of quantum parallelism into computational intelligence such


as EAs and swarm intelligence, many hybridized algorithms, such as quantum-
inspired EA [11,12,24], quantum-inspired immune algorithm [16], quantum-inspired
SA [19], and quantum PSO [32], have been introduced for global optimization of
COPs and numerical optimization problems.

Figure 17.1 a The probability dynamics for the desired element being selected (final
probability 0.9966), and b the resulting distribution for each element being selected.

In EA, variation operators like crossover or mutation operations are used to explore
the search space. The quantum analog for these operators is called a quantum gate.
Mutation can be performed by deducing probability distribution or by Q-gate rotation
[34], while quantum collapse concept was introduced to maintain diversity among
quantum chromosomes in [35].
Quantum or quantum-inspired EAs have been proposed to improve the existing
EAs. In such hybrid methods, qubits are generally used to encode the individuals
and quantum operations are used to define the genetic operations. There are binary
observation quantum-inspired EA [11,12], and real-observation quantum EA [37].

Quantum-inspired EAs were introduced to solve the traveling salesman prob-


lem [24], in which the crossover operation was performed based on the concept of
interference. Binary quantum EA [11] solves COPs. It uses qubits to represent the
individuals, and searches for the optimum by observing the quantum states. It can
work with small population sizes without running into premature convergence [13].
Quantum EA is analyzed using a simple test function with a single individual in [13].
Qubit representation for the elements of the population is a key point for the use
of quantum algorithm. It provides probabilistically a linear superposition of multiple
states. By adopting a qubit chromosome representation, a classical population can
be generated by repeatedly measuring the quantum population, and then its best
elements are used to update the quantum population [11]. Diversity is caused by
qubit representation, which is further driven towards better solutions by quantum
gate as variation operator. Quantum EA is elitist. It can exploit the search space for
a global solution with a small number of individuals, even with one element [11].
Chaotic behavior has also been incorporated into quantum EA [21].
Quantum EA has the following advantages owing to its high degree of parallelism:
automatic balancing ability between global and local search, inclusion of individ-
ual’s past history, involving fewer individuals thereby demanding less memory with
increased performance, less computation time, clearer termination-condition, and
higher precision. In quantum EA with binary encoding, qubit measurement has haphazard
and blind characteristics, and frequent binary-to-decimal conversion is required. Quantum
EA also has some disadvantages, such as the demand for large memory for coding, a
tendency toward premature convergence, and the need for an appropriate selection of the
rotation angle of the quantum rotation gate [11].
For solving continuous function optimization problems, the update strategy of
quantum gates depends on prior knowledge of the criterion of the optimal solution.
An extended coarse-grained hierarchical ring model is presented in coarse-grained
parallel quantum EA [36]. Real-observation quantum-inspired EA [37] solves global
numerical optimization problems.
Each quantum gene is denoted by

x_j^t = [ α_{j1}^t  α_{j2}^t  · · ·  α_{jm}^t ;  β_{j1}^t  β_{j2}^t  · · ·  β_{jm}^t ],          (17.4)

where m is the number of qubits, the first row collects the α amplitudes, and the second
row collects the β amplitudes. At the beginning, an equal-probability superposition is
employed in the coding of the chromosome, α_{ji}^0 = β_{ji}^0 = 1/√2.
The quantum gate U is defined by the rotation

[ α_{ji}^{t+1} ; β_{ji}^{t+1} ] = [ cos θ  −sin θ ; sin θ  cos θ ] [ α_{ji}^t ; β_{ji}^t ],      (17.5)

where θ represents the direction and angle of the quantum gate rotation. The angle is
decided by the fitness value and two reference bits. For instance, if the kth qubit q_{ki} in
the ith solution has to be updated, θ is set according to the fitness value of x_i^t and the
two bits x_{i,k}^t and b_{i,k}^{best,t}.
The flowchart of quantum EA is given in Algorithm 17.2.

Algorithm 17.2 (Quantum EA).

1. Quantum initialization.
Set the generation t = 1.
Set the population size N , and the number of quantum genes m.
2. Repeat:
a. Measurement: Convert the quantum encoding to a binary encoding (b_1^t, b_2^t, . . . , b_N^t).
Produce a random number r ∈ (0, 1);
if r < |α_{ji}^t|^2, then b_i = 0; otherwise b_i = 1.
b. Fitness evaluation.
c. Determine the rotation angle θ.
d. Apply the quantum gate U on each x_j^t.
e. Record the best solution.
f. Set t = t + 1.
until termination criterion is satisfied.
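A compact Python sketch of this flow is given below for the illustrative OneMax
problem (maximize the number of ones in a bit string); the fixed rotation magnitude
of 0.01π and the simple sign rule for θ are simplifications of the lookup-table scheme
of [11], so the sketch is indicative rather than a reference implementation.

import numpy as np

M, POP, GENS = 20, 10, 200                    # qubits per individual, population, generations
DELTA = 0.01 * np.pi                          # fixed rotation magnitude (illustrative)

def fitness(bits):                            # OneMax: maximize the number of ones
    return bits.sum()

alpha = np.full((POP, M), 1 / np.sqrt(2))     # equal-probability superposition
beta = np.full((POP, M), 1 / np.sqrt(2))
best_bits, best_fit = None, -1

for t in range(GENS):
    # Measurement: each qubit collapses to 1 with probability |beta|^2 = 1 - |alpha|^2.
    bits = (np.random.rand(POP, M) >= alpha ** 2).astype(int)
    fits = np.array([fitness(b) for b in bits])
    if fits.max() > best_fit:
        best_fit, best_bits = fits.max(), bits[fits.argmax()].copy()
    # Rotation gate: rotate each qubit so that the best solution becomes more probable
    # (a simplified sign rule standing in for the lookup table of the literature).
    theta = DELTA * np.where(bits == best_bits, 0.0,
                             np.where(best_bits == 1, 1.0, -1.0))
    a, b = alpha, beta
    alpha = np.cos(theta) * a - np.sin(theta) * b
    beta = np.sin(theta) * a + np.cos(theta) * b

print(best_fit, best_bits)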

Quantum EA is a multimodal probability EDA exploring quantum parallelism


based on the probabilistic superposition of states [13,28]. Versatile quantum EA
[27] belongs to the class of EDAs. In versatile quantum EA, elitism is switched off
and the search at a given time is driven by the best solution found at that time. The
information of the search space collected during evolution is not kept at the individual
level, but continuously renewed and periodically shared among the groups or even the
whole population. Versatile quantum EA continuously adapts the search according
to local information while the quantum individuals act as memory buffers to keep
track of the search history. This leads to a much smoother and more efficient long-
term exploration of the search space. Versatile quantum EA outperforms quantum
EA in terms of speed, solution quality, and scalability. Relatively to three classical
EDAs, versatile quantum EA provides comparatively good results in terms of loss
of diversity, scalability, solution quality, and robustness to fitness noise.
In [23], quantum GA is implemented using the Compute Unified Device Architecture
(CUDA) software platform from NVIDIA, in particular, the MATLAB Graphics
Processing Unit (GPU) library.

17.3.2 Other Quantum-Inspired Hybrid Algorithms

Quantum-inspired PSO employs a probabilistic searching technique. In quantum PSO
[32], the search space is transferred from the classical space to the quantum space, and
the movement of the particles follows quantum mechanics. In the iteration process,
the positions of the population are first initialized, and then the particles search
according to the wave nature of particles. A particle appears anywhere in the search
space with a certain probability. To evaluate an individual, one needs to learn the
precise position of every particle and then obtain its fitness. In quantum mechanics,
the position is measured by Monte Carlo methods.
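One commonly used position update for quantum-behaved PSO (a common variant of
the scheme in [32]; the mean-best term, the sphere objective, and the parameter values
below are illustrative assumptions) samples each new position around a local attractor
p with a logarithmic law instead of a velocity term:

import numpy as np

def f(x):                                     # illustrative objective (minimize)
    return np.sum(x ** 2, axis=-1)

POP, DIM, ITERS, BETA = 20, 10, 300, 0.75     # BETA: contraction-expansion coefficient
x = np.random.uniform(-5, 5, (POP, DIM))
pbest = x.copy()
gbest = pbest[np.argmin(f(pbest))].copy()

for _ in range(ITERS):
    mbest = pbest.mean(axis=0)                # mean of the personal best positions
    phi = np.random.rand(POP, DIM)
    p = phi * pbest + (1 - phi) * gbest       # local attractor of each particle
    u = np.random.rand(POP, DIM)
    sign = np.where(np.random.rand(POP, DIM) < 0.5, 1.0, -1.0)
    # Monte Carlo sampling of the position from the quantum (wave-like) model:
    x = p + sign * BETA * np.abs(mbest - x) * np.log(1.0 / u)
    better = f(x) < f(pbest)
    pbest[better] = x[better]
    gbest = pbest[np.argmin(f(pbest))].copy()

print(f(gbest))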
Quantum-inspired PSO for binary optimization is addressed in [15,22]. The qubit
individual is used for the probabilistic representation of a particle, thereby elimi-
nating the velocity update procedure in PSO. In [15], the inertia weight factor and
two acceleration coefficients are removed and only rotation angle is needed when
modifying the position of particles. The proposed rotation gate includes a coordi-
nate rotation gate for updating qubits, and a dynamic rotation angle approach for
determining the magnitude of rotation angle.
Quantum-inspired tabu search [3] is a combination of quantum EA [11] and tabu
search. It includes the diversification and the intensification strategies from quantum
EA. Quantum EA is modified with another way of updating the quantum state. This combi-
nation not only prevents premature convergence, but also obtains the optimal
solution more quickly. A population of qubit states is maintained, which are binary strings by mea-
surement. The process of qubit measurement is a probability operation that increases
diversification; a quantum rotation gate used to search toward attractive regions will
increase intensification. The repair procedure keeps the possible solutions in the fea-
sible domain. After obtaining possible solutions, the best and worst solutions are
used as the reference bits to update the qubits original state. This gradually forces
the possible solutions toward the elite solutions.
Quantum-inspired binary gravitational search algorithm [14,25] combines grav-
itational search algorithm and quantum computing to present a robust optimiza-
tion tool to solve binary encoded problems. Quantum binary gravitational search
algorithm [14] uses the rotation angle to determine the new position of the agent.
Acceleration updating in binary gravitational search algorithm is converted to obtain
the rotation angle, and the magnitude of the rotation angle is used to replace the
gravitation mass. A quantum-inspired gravitational search algorithm for continuous
optimization is given in [31].

Problem

17.1 Open the accompanying Quantum Information Toolkit (QIT) MATLAB code.
(a) Run and understand Shor’s factorization algorithm.
(b) Run and understand Grover’s search algorithm.

References
1. Benioff P. The computer as a physical system: a microscopic quantum mechanical Hamiltonian
model of computers as represented by Turing machines. J Stat Phys. 1980;22(5):563–91.
2. Boykin PO, Mor T, Roychowdhury V, Vatan F. Algorithms on ensemble quantum computers.
Natural Comput. 2010;9(2):329–45.
3. Chiang H-P, Chou Y-H, Chiu C-H, Kuo S-Y, Huang Y-M. A quantum-inspired tabu search
algorithm for solving combinatorial optimization problems. Soft Comput. 2014;18:1771–81.
4. Chuang IL, Gershenfeld N, Kubinec M. Experimental implementation of fast quantum search-
ing. Phys Rev Lett. 1998;80(15):3408–11.
5. Cory DG, Fahmy AF, Havel TF. Ensemble quantum computing by nuclear magnetic resonance
spectroscopy. Proc Natl Acad Sci USA. 1997;94:1634–9.
6. Deutsch D. Quantum theory, the Church-Turing principle and the universal quantum computer.
Proc Royal Soc Lond A. 1985;400(1818):97–117.
7. Deutsch D, Jozsa R. Rapid solution of problems by quantum computation. Proc Royal Soc
Lond A. 1992;439(1907):553–8.
8. Gershenfeld N, Chuang IL. Bulk spin-resonance quantum computation. Science.
1997;275(5298):350–6.
9. Grover LK. Quantum mechanics helps in searching for a needle in a haystack. Phys Rev Lett.
1997;79(2):325–8.
10. Grover LK. A fast quantum mechanical algorithm for database search. In: Proceedings of the
28th annual ACM symposium on theory of computing (STOC’96), Philadelphia, PA, USA,
May 1996. New York: ACM Press; 1996. p. 212–219.
11. Han KH, Kim JH. Quantum-inspired evolutionary algorithm for a class of combinatorial opti-
mization. IEEE Trans Evol Comput. 2002;6(6):580–93.
12. Han KH, Kim JH. Quantum-inspired evolutionary algorithms with a new termination criterion,
H gate, and two-phase scheme. IEEE Trans Evol Comput. 2004;8(2):156–69.
13. Han KH, Kim JH. On the analysis of the quantum-inspired evolutionary algorithm with a single
individual. In: Proceedings of IEEE congress on evolutionary computation (CEC), Vancouver,
BC, Canada, July 2006. p. 2622–2629.
14. Ibrahim AA, Mohamed A, Shareef H. A novel quantum-inspired binary gravitational search
algorithm in obtaining optimal power quality monitor placement. J Appl Sci. 2012;12:822–30.
15. Jeong Y-W, Park J-B, Jang S-H, Lee KY. A new quantum-inspired binary PSO: application to
unit commitment problems for power systems. IEEE Trans Power Syst. 2010;25(3):1486–95.
16. Jiao L, Li Y, Gong M, Zhang X. Quantum-inspired immune clonal algorithm for global opti-
mization. IEEE Trans Syst Man Cybern Part B. 2008;38(5):1234–53.
17. Jones JA. Fast searches with nuclear magnetic resonance computers. Science.
1998;280(5361):229.
18. Jones JA, Mosca M, Hansen RH. Implementation of a quantum search algorithm on a quantum
computer. Nature. 1998;393:344–6.
19. Kadowaki T, Nishimori H. Quantum annealing in the transverse Ising model. Phys Rev E.
1998;58:5355–63.
20. Kwiat PG, Mitchell JR, Schwindt PDD, White AG. Grover’s search algorithm: an optical
approach. J Modern Optics. 2000;47:257–66.
21. Liao G. A novel evolutionary algorithm for dynamic economic dispatch with energy saving
and emission reduction in power system integrated wind power. Energy. 2011;36:1018–29.
22. Meng K, Wang HG, Dong ZY, Wong KP. Quantum-inspired particle swarm optimization for
valve-point economic load dispatch. IEEE Trans Power Syst. 2010;25(1):215–22.
23. Montiel O, Rivera A, Sepulveda R. Design and acceleration of a quantum genetic algorithm
through the Matlab GPU library. In: Design of intelligent systems based on fuzzy logic, neural
networks and nature-inspired optimization, vol. 601 of Studies in Computational Intelligence.
Berlin: Springer; 2015. p. 333–345.
24. Narayanan A, Moore M. Quantum-inspired genetic algorithms. In: Proceedings of IEEE inter-
national conference on evolutionary computation, Nogaya, Japan, May 1996. p. 61–66.
25. Nezamabadi-pour H. A quantum-inspired gravitational search algorithm for binary encoded
optimization problems. Eng Appl Artif Intell. 2015;40:62–75.
26. Nielsen MA, Knill E, Laflamme R. Complete quantum teleportation using nuclear magnetic
resonance. Nature. 1998;396:52–5.
27. Platel MD, Schliebs S, Kasabov N. A versatile quantum-inspired evolutionary algorithm. In:
Proceedings of IEEE congress on evolutionary computation (CEC), Singapore, Sept 2007. p.
423–430.
28. Platel MD, Schliebs S, Kasabov N. Quantum-inspired evolutionary algorithm: a multimodel
EDA. IEEE Trans Evol Comput. 2009;13(6):1218–32.
29. Shor PW. Algorithms for quantum computation: discrete logarithms and factoring. In: Pro-
ceedings of the 35th annual symposium on foundations of computer science, Sante Fe, NM,
USA, Nov 1994. pp. 124–134.
30. Shor PW. Polynomial-time algorithms for prime factorization and discrete logarithms on a
quantum computer. SIAM J Comput. 1997;26:1484–509.
31. Soleimanpour-moghadam M, Nezamabadi-pour H, Farsangi MM. A quantum-inspired gravi-
tational search algorithm for numerical function optimization. Inf Sci. 2014;276:83–100.
32. Sun J, Feng B, Xu WB. Particle swarm optimization with particles having quantum behavior.
In: Proceedings of IEEE congress on evolutionary computation (CEC), Portland, OR, USA,
June 2004. p. 325–331.
33. Vandersypen LMK, Steffen M, Breyta G, Yannoni CS, Sherwood MH, Chuang IL. Experi-
mental realization of Shor’s quantum factoring algorithm using nuclear magnetic resonance.
Nature. 2001;414(6866):883–7.
34. Vlachogiannis JG, Ostergaard J. Reactive power and voltage control based on general quantum
genetic algorithms. Expert Syst Appl. 2009;36:6118–26.
35. Yang S, Wang M, Jiao L. A genetic algorithm based on quantum chromosome. In: Proceedings
of the 7th international conference on signal processing, Beijing, China, Aug 2004. p. 1622–
1625.
36. Zhang G, Jin W, Hu L. A novel parallel quantum genetic algorithm. In: Proceedings of the 4th
international conference on parallel and distributed computing, applications and technologies,
Chengdu, China, Aug 2003. p. 693–697.
37. Zhang GX, Rong HN. Real-observation quantum-inspired evolutionary algorithm for a class
of numerical optimization problems. In: Proceedings of the 7th international conference on
computational science, Beijing, China, May 2007, vol. 4490 of Lecture Notes in Computer
Science. Berlin: Springer; 2007. p. 989–996.
18 Metaheuristics Based on Sciences

This chapter introduces dozens of metaheuristic optimization algorithms that are


related to physics, natural phenomena, chemistry, biogeography, and mathematics.

18.1 Search Based on Newton’s Laws

Gravitational Search Algorithm


Gravitational search algorithm [45] is a stochastic optimization technique inspired
by the metaphor of the Newtonian theory of gravitational interaction between masses.
The search agents are a collection of objects having masses, and their interactions are
based on the Newtonian laws of gravity and motion. The force causes a global move-
ment of all objects toward the objects with heavier masses. Hence, masses cooperate
through gravitational force. Each agent represents a solution. Each agent has four
specifications: position, inertial mass, active gravitational mass, and passive gravita-
tional mass. The position of an agent corresponds to a solution of the problem, and
its gravitational and inertial masses are determined using a fitness function. Heavy
masses, which correspond to good solutions, move more slowly than lighter ones;
this guarantees the exploitation step of the algorithm. The algorithm is navigated by
properly adjusting the gravitational and inertia masses. Masses are attracted by the
heaviest object, corresponding to an optimum solution in the search space.
Gravitational search algorithm is somewhat similar to PSO in the position and
velocity update equations. However, the velocity update is based on the acceleration
obtained by the gravitational law of Newton. Consequently, position of each agent is
updated using the modified velocity. The gravitational constant adjusts the accuracy
of the search, so it speeds up the solution process. Furthermore, although gravitational
search algorithm is memoryless, it works efficiently like algorithms with memory, and it
can be considered an adaptive learning algorithm.

In gravitational search algorithm, the algorithmic gravitational forces lead directly


to changes in the position of search points in a continuous space. In most cases,
gravitational search algorithm provides superior or at least comparable results with
PSO and central force optimization [23]. Gravitational search algorithm is easier to
implement in parallel with Open-MP compared to central force optimization.
In binary gravitational search algorithm [46], trajectories are changes in the prob-
ability that a coordinate will take on a zero or one value depending on the forces.
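The following Python sketch (an illustrative implementation with an assumed sphere
objective and assumed parameter values, not the reference code of [45]) shows the core
update: masses derived from fitness, pairwise gravitational attraction with a decaying
gravitational constant, and the resulting acceleration, velocity, and position updates.

import numpy as np

def f(x):                                       # illustrative objective (minimize)
    return np.sum(x ** 2, axis=1)

POP, DIM, ITERS = 20, 5, 200
G0, DECAY, EPS = 100.0, 20.0, 1e-12
x = np.random.uniform(-5, 5, (POP, DIM))
v = np.zeros((POP, DIM))

for t in range(ITERS):
    fit = f(x)
    best, worst = fit.min(), fit.max()
    m = (fit - worst) / (best - worst + EPS)    # better fitness -> heavier mass
    M = m / (m.sum() + EPS)                     # normalized masses
    G = G0 * np.exp(-DECAY * t / ITERS)         # decaying gravitational constant
    acc = np.zeros((POP, DIM))
    for i in range(POP):
        for j in range(POP):
            if i != j:
                R = np.linalg.norm(x[j] - x[i])
                # F_ij = G M_i M_j (x_j - x_i)/(R + eps); a_i = F_i / M_i, so M_i cancels.
                acc[i] += np.random.rand() * G * M[j] * (x[j] - x[i]) / (R + EPS)
    v = np.random.rand(POP, DIM) * v + acc      # velocity update
    x = x + v                                   # position update

print(f(x).min())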
Artificial Physics Optimization
Inspired by Newton's second law of force, artificial physics optimization [60] is
a stochastic population-based global optimization algorithm. Each entity is treated
as a physical individual with attributes of mass, position, and velocity. The rela-
tionship between an individual’s mass and its fitness is constructed. The better the
objective function value, the bigger is the mass, and the higher is the magnitude
of attraction. The individuals move toward the better fitness region, which can be
mapped to individuals moving toward others with bigger masses. In addition, the
individual attracts ones with worse fitness while repelling those with better fitness.
Especially, the individual with the best fitness attracts all others, whereas it is never
repelled or attracted by others. An individual moves toward other particles with
larger masses (better fitness values) and away from lower mass particles (worse
fitness values).
In rank-based multiobjective artificial physics optimization algorithm [59], the
mass function is dealt with by assigning different ranks to individuals by evaluating
the Pareto dominant relationships between individuals and their crowding degree. In
addition, crowding degree within the individual’s neighborhood is checked as another
index to evaluate the performance of individuals with the same Pareto-dominated
rank.
Central Force Optimization
Central force optimization [23] is a deterministic metaheuristic for optimization
based on the metaphor of gravitational kinematics. It models probes that fly through
the decision space by analogy to masses moving under the influence of gravity. Every
run beginning with the same parameters leads to the same result. The acceleration
update equation is dependent on the updated position and fitness for all probes. The
convergence is analyzed by using the gravitational kinematics theory [12]. Distrib-
uted multiobjective central force optimization [12] is a multigroup variant of central
force optimization.
Vibration Damping Optimization
Vibration damping optimization [36] is a metaheuristic algorithm based on the con-
cept of the vibration damping in mechanical vibration. It has been used for optimizing
the parallel machine scheduling problem.

18.2 Search Based on Electromagnetic Laws

Charged system search [30] is a multiagent optimization algorithm based on principles of the Coulomb law from electrostatics and the Newtonian laws of mechanics. The
agents are charged particles, which can affect one another based on their fitness val-
ues and their distances. The resultant force is determined by using the electrostatics
laws and the movement is determined using Newtonian mechanics laws.
Electromagnetism-Like Algorithm
Electromagnetism-like algorithm [11] is a population-based heuristic inspired by
theory of electromagnetism in physics, in which charged particles exert attractive or
repulsive forces on one another. The basic idea behind the algorithm is to force parti-
cles to search for the optimum in a multidimensional space by applying a collective
force on them. The algorithm can be used for solving optimization problems with
bounded variables.
A candidate solution is associated with a charged particle in a multidimensional
space using a real-coded position vector x. Each particle has a charge qi , which is
related to its objective function value f (x). Each particle exerts a repulsive or attractive force on other particles according to the charges they carry. x i is updated
by the resultant force F i on particle i at each iteration. For a minimization prob-
lem, a candidate particle i will attract particle j if particle i has an objective func-
tion value better than particle j has ( f (x i ) < f (x j )), or repel particle j otherwise.
Electromagnetism-like algorithm has four phases: initialization, calculation of par-
ticle charges and force vectors, movement according to the resultant force, and local
search to exploit the local minima. The last three phases repeat until the iteration is
completed.
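A simplified MATLAB sketch of the charge calculation and the force-driven movement phase (with the local search phase omitted) is given below. The objective and bounds are assumed, and the formulas follow the general scheme just described; consult [11] for the full formulation.

% Simplified electromagnetism-like mechanism (local search phase omitted).
f = @(x) sum(x.^2, 2);                        % objective to be minimized (assumed)
N = 20; D = 2; lo = -5; hi = 5;
X = lo + (hi - lo)*rand(N, D);
for t = 1:200
    fit = f(X);
    [fbest, ib] = min(fit);
    q = exp(-D * (fit - fbest) / (sum(fit - fbest) + eps));  % charges from relative objective values
    F = zeros(N, D);
    for i = 1:N
        for j = 1:N
            if j == i, continue; end
            d2 = sum((X(j,:) - X(i,:)).^2) + eps;
            if fit(j) < fit(i)                % a better particle attracts
                F(i,:) = F(i,:) + (X(j,:) - X(i,:)) * q(i)*q(j) / d2;
            else                              % a worse particle repels
                F(i,:) = F(i,:) - (X(j,:) - X(i,:)) * q(i)*q(j) / d2;
            end
        end
    end
    for i = 1:N
        if i == ib, continue; end             % the best particle is not moved
        Fi = F(i,:) / (norm(F(i,:)) + eps);   % normalized force direction
        lam = rand;
        for d = 1:D
            if Fi(d) > 0                      % move toward the upper bound
                X(i,d) = X(i,d) + lam*Fi(d)*(hi - X(i,d));
            else                              % move toward the lower bound
                X(i,d) = X(i,d) + lam*Fi(d)*(X(i,d) - lo);
            end
        end
    end
end
disp(X(ib,:));                                % best particle of the final iteration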
Ions Motion Optimization
Charged atoms or molecules are called ions. Ions with negative charge are called anions, whereas
ions with positive charge are called cations. Ions motion optimization [27] is a
population-based algorithm inspired from the fact that ions with similar charges
tend to repel, whereas ions with opposite charges attract each other. It has few tuning
parameters, low computational complexity, fast convergence, and high local optima
avoidance.
The ions represent candidate solutions for a particular problem and attrac-
tion/repulsion forces move the ions around the search space. The population of
candidate solutions divides into two sets: negative charged ions and positive charged
ions. Ions are required to move toward best ions with opposite charges. The fitness
of ions is proportional to the value of the objective function. Anions move toward the
best cation, whereas cations move toward the best anion. The amount of their move-
ment depends on the attraction/repulsion forces between them. The size of this force
specifies the momentum of each ion. In the liquid state, the ions have greater freedom
of motion compared to the solid phase (crystal) where high attraction forces between
ions prevent them from moving around freely. In fact, ions face minor motion and
mostly vibrate in their position in solid phase. The IMO algorithm also mimics these
two phases to perform diversification and intensification during optimization.

Magnetic Optimization Algorithm


Magnetic optimization algorithm [56] is inspired by the principles of magnetic field
theory. The possible solutions are magnetic particles scattered in the search space.
Each magnetic particle has a measure of mass and magnetic field according to its fit-
ness. The fitter magnetic particles have higher magnetic field and higher mass. Since
the electromagnetic force is proportional to the fitness of particles, search agents tend
to be attracted toward the fittest particles. Therefore, the search agents are improved
by moving toward the best solutions. A similar algorithm is the gravitational search
algorithm. These particles are located in a lattice-like environment and apply a force
of attraction to their neighbors. The cellular structure allows a better exploitation of
local neighborhoods before they move toward the global best.

Optimization by Optics
Similar to other multiagent methods, ray optimization [29] has a number of particles
consisting of the variables of the problem. These agents are considered as rays of light.
Based on the Snell’s light refraction law, when light travels from a lighter medium to
a darker medium, it refracts and its direction changes. This behavior helps the agents
to explore the search space in early stages of the optimization process and to make
them converge in the final stages.
Optics inspired optimization [28] treats the surface of the numerical function to
be optimized as a reflecting surface in which each peak is assumed to reflect as a
convex mirror and each valley to reflect as a concave one.
Filter machine [24] is an optical model for computation in solving combinatorial
problems. It consists of optical filters as data storage and imaging operation for
computation. Each filter is a long optical sensitive sheet, divided into cells. Filter
machine is able to generate every Boolean function.

18.3 Search Based on Thermal-Energy Principles

States of Matter Search


States of matter search [14] is inspired by the states of matter phenomenon. Indi-
viduals emulate molecules which interact with one another by using evolutionary
operations which are based on the thermal-energy motion principles. The evolution-
ary process is divided into three phases which emulate the three states of matter: gas,
liquid, and solid. In each state, molecules (individuals) exhibit different movement
capacities. Beginning from the gas state (pure exploration), the algorithm modifies
the intensities of exploration and exploitation until the solid state (pure exploitation)
is reached.

Heat Transfer Search


Heat transfer search [43] is a metaheuristic optimization algorithm inspired by the
law of thermodynamics and heat transfer. The search agents are molecules that inter-
act with one another as well as with the surrounding to attain thermal equilibrium
state. The interactions of molecules are through various modes of heat transfer: con-
duction, convection, and radiation.

Kinetic Gas Molecule Optimization


The atomic theory of gases states that each substance is composed of a large number
of very small particles (molecules or atoms). Basically, all of the properties of the
gases, including the pressure, volume and temperature, are the consequence of the
actions of the molecules that compose the gas. Gas molecules attract one another
based on weak electrical intermolecular van der Waals forces, where the electrical
force is the result of positive and negative charges in the molecules. Kinetic gas
molecule optimization [40] is based on the kinetic energy and the natural motion of
gas molecules. It can converge toward the global minima quickly, and is also more
accurate and can decrease the mean square error (MSE) by orders of magnitude
compared to PSO and gravitational search algorithm.
The agents are gas molecules that are moving in the search space; they are subject
to the kinetic theory of gases, which defines the rules for gas molecule interactions in
the model. Kinetic energy is used in measuring the performance. Each gas molecule
(agent) has four specifications: position, kinetic energy, velocity, and mass. The
kinetic energy of each gas molecule determines its velocity and position. The gas
molecules explore the whole search space to reach the point that has the lowest
temperature. The gas molecules move in the container until they converge in the part
of the container that has the lowest temperature and kinetic energy.

18.4 Search Based on Natural Phenomena

18.4.1 Search Based on Water Flows

Intelligent Water Drops Algorithm


Intelligent water drops algorithm [49] is a swarm-based optimization algorithm
inspired from observing natural water drops that flow in rivers. A natural river often
finds good paths in its ways from the source to destination. These paths are obtained
by a permanent interaction between swarm of water drops and the riverbeds. The
water drops are created with velocity and soil. The optimal path is revealed as the
one with the lowest soil on its links. During its trip, an intelligent water drop removes
some soil in the environment and may gain some speed. This soil is removed from
the path joining the two locations. The solutions are incrementally constructed.

Water Cycle Algorithm


Water cycle algorithm [22] mimics the flow of rivers and streams toward the sea,
considering the rain and precipitation phenomena. The algorithm is based on the pro-
cedure that water moves from streams to the rivers, and then from rivers to the sea.
The sea corresponds to the best optimum solution, and rivers are a number of best
selected solutions except the best one (sea). This procedure leads to indirect move-
ments toward the best solution. The evaporation and raining process corresponds to
the exploration phase, which avoids getting trapped in local optimum solutions.
The population includes a sea, some rivers, and some streams. Sea is the best
solution for the current population. It absorbs more streams than rivers do. For a
minimization problem, more streams flow to the sea, which has the lowest cost, and other streams flow to rivers, which have lower costs. An initial population of design variables (population of streams) is randomly generated after the raining process. The
best individual (i.e., the best stream), having the minimum cost function, is chosen
as the sea. Then, a number of good streams (i.e., cost function values close to the
current best record) are chosen as rivers, while all other streams flow to the rivers
and sea.
Great Deluge Algorithm
Great deluge algorithm is a single-solution based metaheuristic for continuous global
optimization [19]. It is similar to SA to some extent in that both of the approaches
accept inferior solutions based on an acceptance rule during the solution search
process in order to escape from local optima. The inferior solution acceptance rule
is controlled by a variable called level, where any inferior solution with a penalty
cost value that is lower than level will be accepted.
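The acceptance rule can be sketched in a few lines of MATLAB; the objective, the neighborhood move, and the linear decay of the level are assumed choices for illustration.

% Great deluge acceptance rule for minimization (illustrative parameters).
f = @(x) sum(x.^2);                  % objective (assumed)
x = 4*rand(1, 2) - 2; fx = f(x);
level = fx;                          % initial level set to the starting cost
decay = fx / 5000;                   % linear level decrease per iteration (assumed)
for t = 1:5000
    xn = x + 0.1*randn(1, 2);        % random neighbor of the current solution
    fn = f(xn);
    if fn < level                    % accept any solution whose cost is below the level
        x = xn; fx = fn;
    end
    level = level - decay;           % tighten the level toward better costs
end
disp([x, fx]);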
In [1], great deluge algorithm is improved by maintaining a two-stage memory
architecture and search operators exploiting the accumulated experience in memory.
The level-based acceptance criterion is applied for each best solution extracted in a
particular iteration. In [2], this method is hybridized with a variant of tabu search as
local search for solving quadratic assignment problem.
Plant Growth Algorithm
Plant growth algorithm [64] is a metaheuristic optimization method inspired by
plant growth mechanism. The basic rules include phototropism, negative geotropism,
apical dominance, and branch in plant growth. The starting point of the algorithm
is the seed germ (first bud) and the target point of the algorithm is the light source.
The algorithm includes six steps, namely, initialization, light intensity calculation,
random branch, growth vector calculation, plant growth, and path output.

18.4.2 Search Based on Cosmology

Big Bang Big Crunch Algorithm


Big bang big crunch algorithm [21] is a global optimization method inspired by a
theory of the evolution of the universe, namely, the big bang and big crunch theory.
In the big bang phase, energy dissipation produces disorder, and randomness is the
main feature of this phase; thus, candidate solutions are randomly distributed over
the search space. In the big crunch phase, randomly distributed particles are drawn
into an order. The method generates random points in the big bang phase and shrinks
those points to a single point via a center of mass or minimal cost approach in the
big crunch phase. All subsequent big bang phases are randomly distributed about
the center of mass or the best fit individual. The working principle can be explained
as the transformation of a convergent solution to a chaotic state and then back to a
single tentative solution point. The algorithm outperforms compact GA in terms of
computational time and convergence speed.
Spiral or Vortex Search
Spiral dynamics algorithm [55] is a metaheuristic algorithm inspired from spiral
phenomena in nature such as spiral of galaxy, hurricanes and tornados, geometry of
nautilus shell, and shape of human fingerprint. It contains a powerful spiral model,
which forms a spiral motion trajectory for search agents in a search space for both
exploration and exploitation strategies. The model causes all the agents to be diversi-
fied and intensified at the initial and final phases of the search operation, respectively.
With the presence of spiral model, the step size of agents can be steadily controlled
and dynamically varied throughout the search operation. In addition, the motion of
all agents is always guided toward the best optimum location found in that particular
iteration. The convergence speed and fitness accuracy of the algorithm are mostly
determined by the constant spiral radius r and angular displacement θ . The conver-
gence rate for all agents toward the best location is uniform. This may potentially lead them to local optima. The problem might be alleviated by varying r and θ.
Vortex search [17] is a single-solution based metaheuristic inspired from the vor-
tex pattern created by the vortical flow of the stirred fluids. To provide a good balance
between the explorative and exploitative behavior of a search, the method models its
search behavior as a vortex pattern by using an adaptive step size adjustment scheme.
The algorithm is simple and does not require any additional parameters. Vortex search
outperforms SA, and its performance is comparable to population-based metaheuris-
tics.

Cloud-Based Optimization
Atmosphere clouds model optimization [62] is a stochastic optimization algorithm
inspired from the behaviors of cloud in the natural world. It simulates the generation,
moving and spreading behaviors of cloud in a simple way. The search space is divided
into many disjoint regions according to some rules, and each region has its own
humidity value and air pressure value. There are some rules. (a) Clouds can only be
generated in regions whose humidity values are higher than a certain threshold. (b)
Under wind, clouds move from regions with higher air pressure to regions with lower
air pressure. (c) In the moving process, the droplets of one cloud would spread or
gather according to the air pressure difference between the region where this cloud
is located before move behavior and the region where the cloud is located after move
behavior. (d) One cloud is regarded as having disappeared when its coverage exceeds
a certain value or its droplets number is less than a threshold. The humidity value and
air pressure value of a region are updated every time after the generation, moving
and spreading behaviors of clouds.
Lightning Search
Lightning search [50] is a metaheuristic method for solving constrained optimiza-
tion problems, which is inspired by the natural phenomenon of lightning and the
mechanism of step leader propagation using the concept of fast particles known as
projectiles. Three projectile types are developed to represent the transition projec-
tiles that create the first step leader population, the space projectiles that attempt to
become the leader, and the lead projectile that represents the projectile fired from the best positioned step leader. The major exploration feature of the algorithm is mod-
eled using the exponential random behavior of space projectile and the concurrent
formation of two leader tips at fork points using opposition theory.
Wind Driven Optimization
Wind driven optimization [7,8] is a population-based global optimization algorithm
inspired by atmospheric motion. At its core, a population of infinitesimally small
air parcels navigates over a search space, where the velocity and the position of
wind controlled air parcels are updated following Newton’s second law of motion.
Compared to PSO, wind driven optimization employs additional terms in the velocity
update equation (e.g., gravitation and Coriolis forces).

18.4.3 Black Hole-Based Optimization

A black hole is a region of space-time whose gravitational field is so strong that nothing that enters it, not even light, can escape. Black holes all share a common
characteristic; they are so dense that not even light can escape their gravity. No
information can be obtained from this region. The sphere-shaped boundary of a
black hole in space is known as the event horizon [26].
Black hole-based optimization [26] is a population-based algorithm inspired by the
black hole phenomenon. At the initialization step, a randomly generated population
of candidate solutions, called stars, are placed in the search space of the problem.
At each iteration, the algorithm performs the black hole update, the movement of stars, and star replacement. These steps are repeated until a termination criterion is reached. In
other words, at each iteration, the objective function or the fitness value of each star is
evaluated and the best star is selected as the black hole x B H , which then starts pulling
other stars around it. If a star gets too close to the black hole, it will be swallowed
by the black hole and is gone forever. In such a case, a new star (candidate solution)
is randomly generated and placed in the search space and starts a new search.
Once the stars are initialized and the black hole is designated, the black hole starts
absorbing the stars around it. Therefore, all stars move toward the black hole [26]:
x i = x i + rand(x B H − x i ), ∀i, i ≠ best. (18.1)
The black hole does not move, because it has the best fitness value and then attracts
all other particles. While moving toward the black hole, a star may reach a location
with lower cost (with a best fitness) than the black hole. Therefore, the black hole is
updated by selecting this star.
If a star crosses the event horizon of the black hole, i.e., if the distance between
a star and the black hole is less than the Schwarzschild radius, this star dies. A new
star is born and it is distributed randomly in the search space. The radius of the event
horizon is calculated by [26]
R = \frac{f_{BH}}{\sum_{i=1}^{N} f_i}, (18.2)
where f B H is the fitness value of the black hole, f i is the fitness value of the ith star,
and N is the number of stars.
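A compact MATLAB sketch of this loop, combining the star movement of (18.1) with the event horizon test of (18.2), is given below; the objective and bounds are assumed for illustration.

% Black hole-based optimization sketch using Eqs. (18.1) and (18.2).
f = @(x) sum(x.^2, 2);                     % objective to be minimized (assumed)
N = 30; D = 2; lo = -5; hi = 5;
X = lo + (hi - lo)*rand(N, D);
for t = 1:300
    fit = f(X);
    [~, bh] = min(fit);                    % the best star is designated the black hole
    for i = 1:N
        if i ~= bh
            X(i,:) = X(i,:) + rand*(X(bh,:) - X(i,:));   % Eq. (18.1)
        end
    end
    fit = f(X);
    [fbh, bh] = min(fit);                  % update the black hole if a star became better
    R = fbh / sum(fit);                    % event horizon radius, Eq. (18.2)
    for i = 1:N
        if i ~= bh && norm(X(i,:) - X(bh,:)) < R
            X(i,:) = lo + (hi - lo)*rand(1, D);          % swallowed star replaced by a new one
        end
    end
end
disp(X(bh,:));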
Stellar-mass black hole optimization [6] is another metaheuristic technique
inspired from the property of a black hole’s gravity that is present in the Universe. It
outperforms PSO and cuckoo search on the benchmark.
Multi-verse optimizer [38] is a metaheuristic inspired from three concepts in
cosmology: white hole, black hole, and wormhole. The mathematical models of
these three concepts are developed to perform exploration, exploitation, and local
search, respectively.

18.5 Sorting

Computer processing mainly depends on sorting and searching methods. There are
many sorting algorithms, like bubble sort and library sort [9]. Beadsort [4] is a natural
sorting algorithm where the basic operation can be compared to the manner in which
beads slide on parallel poles, such as on an abacus. Rainbow sort [47] is based on the
physical concepts of refraction and dispersion, where light beams of longer wave-
lengths are refracted to a lesser degree than beams of a shorter wavelength. Spaghetti
sort [15] can be illustrated by using uncooked rods of spaghetti. Centrifugal sort
[41] represents the numbers to be sorted by the density of the liquids. The gravitation
acceleration would be sufficient for sorting. Higher values of acceleration and speed
may speed up the process. Friction-based sorting [16] is to associate to each number
a ball with weight proportional to that number. All the balls fall in the presence of
friction, and the heavier ball corresponding to the greater input number will reach
the ground earlier.

18.6 Algorithmic Chemistries

Artificial chemistry algorithms mimic a real chemistry process, in some cases by assigning kinetic coefficients, defining molecule representation and focusing on an
efficient energy conservation state. Algorithmic chemistries are intended as com-
putation models. In [37], the potential roles of energy in algorithmic chemistries
are illustrated. A simple yet sufficiently accurate energy model can efficiently steer
resource usage. An energy framework keeps the molecules within reasonable length
bounds, allowing the algorithm to behave thermodynamically and kinetically similar
to real chemistry.
A chemical reaction network comprises a set of reactants, a set of products (often
intersecting the set of reactants), and a set of reactions. For example, the pair of
combustion reactions 2H2 + O2 → 2H2 O forms a reaction network. The reactions
are represented by the arrows. The reactants appear to the left of the arrows, in this
example they are H2 (hydrogen), and O2 (oxygen). The products appear to the right
of the arrows, here they are H2 O (water).
Chemical reaction networks model chemistry in a well-mixed solution. A natural
language for describing the interactions of molecular species in a well-mixed solution
is that of (finite) chemical reaction networks, i.e., finite sets of chemical reactions.
Chemical reaction networks can simulate a bounded-space Turing machine effi-
ciently, if the number of reactions is allowed to scale polynomially with the Turing
machine’s space usage [57]. Even Turing universal computation is possible with an
arbitrarily small, nonzero probability of error over all time [54].

18.6.1 Chemical Reaction Optimization

Chemical reaction optimization (CRO) [31] is a population-based metaheuristic for combinatorial optimization problems, inspired by the phenomenon of interactions
between molecules in a chemical reaction process based on the principle that reac-
tions yield products with the lowest energy on the potential energy surface. The
objective function of a combinatorial optimization problem can be viewed as the
potential energy of the molecules in CRO. CRO has demonstrated good performance
in solving task scheduling in grid computing [61]. CRO was later developed for the
continuous problems [33], where an adaptive design reduces the number of control
parameters.
CRO loosely couples optimization with chemical reactions. The underlying princi-
ple of CRO is the conservation of energy. The molecules possess molecular structures
with lower and lower potential energy in each subsequent change. This phenomenon
is the driving force of CRO to ensure convergence to lower energy state.
The details of chemical reactions, e.g., quantum and statistical mechanics, are not
captured in the canonical design of CRO. The manipulated agents in CRO are mole-
cules, each of which maintains a molecular structure, potential energy, kinetic energy,
the number of hits, the minimum hit number, and the minimum value. CRO has a
variable population size. All quantities related to energy should have nonnegative
values.
Molecular structure represents the feasible solution of the optimization problem
currently attained by the molecule. Potential energy quantifies the molecular struc-
ture in terms of energy and is modeled as the cost function value of the optimization
problem. Kinetic energy characterizes the degree of the molecule’s activity, indicat-
ing the solution’s ability of jumping out of local optima. Number of hits counts the
number of hits experienced by the molecule. Minimum hit number is recorded at
the hit when the molecule possesses the current best solution. Thus, the difference
between the number of hits and minimum hit number is the number of hits that the
molecule has experienced without finding a better solution. This is also used as the
criterion for decomposition. Minimum value is the cost function value of the solu-
tion generated at the time when the minimum hit number is updated, that is, it is the
minimum potential energy experienced by the molecule itself.
Imagine that there is a closed container with a certain number of molecules.
These molecules collide and undergo elementary reactions, which may modify their
molecular structures and the attained energies. Elementary reactions are operators,
which update the solutions. Through a random sequence of elementary reactions,
the algorithm explores the solution space and converges to the global minimum.
Chemical reactions occur due to the formation and breaking of chemical bonds
that are produced by the motion of electrons of the molecules. Four types of elementary
reactions are on-wall ineffective collision, decomposition, intermolecular ineffective
collision, and synthesis. Through a random sequence of elementary reactions, CRO
explores the solution space and converges to the global minimum. The two ineffective
collisions modify the molecules to new molecular structures that are close to the
original ones, thus enabling the molecules to search their immediate surroundings on
the potential energy space (solution space). Conversely, decomposition and synthesis
tend to produce new molecular structures. Among the four collisions, local search is
contributed by on-wall ineffective collision and intermolecular ineffective collision,
whereas global search is intensified by decomposition and synthesis.
In initialization, a population of molecules is randomly generated, their potential
energies are determined, and they are assigned proper initial kinetic energies.
Then, CRO enters into the stage of iterations. The manipulated agents are mole-
cules and the events for manipulating the solutions represented by the molecules
are classified into four elementary reactions. In each iteration, the collision is first
identified as unimolecular or intermolecular. In each iteration of the algorithm, only
one elementary reaction will take place, depending on the conditions of the chosen
molecules for that iteration. The algorithm then checks if any new solution superior
to the best-so-far solution is found. If so, the solution will be kept in memory. This
iteration stage repeats until a stopping criterion is satisfied. Finally, the solution with
the lowest cost function value is outputted.
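As an illustration of how energy conservation constrains a single operator, the following MATLAB function sketches one common realization of the on-wall ineffective collision. The neighborhood move and the kinetic energy loss rate KELossRate are assumed choices, and buffer denotes a central energy buffer that collects the kinetic energy lost to the surroundings; see [31] for the full operator set.

% On-wall ineffective collision sketch with the energy conservation check.
function [x, PE, KE, buffer] = onwall_collision(x, PE, KE, buffer, f, KELossRate)
xnew = x + 0.1*randn(size(x));            % neighborhood move (assumed)
PEnew = f(xnew);
if PE + KE >= PEnew                       % the change is allowed by energy conservation
    q = KELossRate + (1 - KELossRate)*rand;        % fraction of the surplus kept as kinetic energy
    KEnew = (PE + KE - PEnew)*q;
    buffer = buffer + (PE + KE - PEnew)*(1 - q);   % the remainder goes to the central buffer
    x = xnew; PE = PEnew; KE = KEnew;
end                                       % otherwise the molecule keeps its structure
end

For a molecule at x with potential energy PE = f(x), a call such as [x, PE, KE, buffer] = onwall_collision(x, PE, KE, buffer, f, 0.2) either leaves the molecule unchanged or moves it to a neighboring structure while conserving the total energy.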
In [32], some convergence results are presented for several generic versions of
CRO, each adopting different combinations of elementary reactions. By modeling
CRO as a finite absorbing Markov chain, CRO is shown to converge to a global
optimum solution with a probability arbitrarily close to one, when time tends to
infinity. The convergence of CRO is shown to be determined by both the elementary reactions and the total energy of the system. A necessary condition for convergence is
provided from the perspective of elementary reaction design. A lower bound of total
energy that can guarantee CRO’s convergence is derived. The finite time behavior of
CRO is also explored.
Chemical reaction algorithm [5] performs a stochastic search for optimal solutions
within a defined search space. It has a simpler parameter representation than CRO.
Since only the general schema of the chemical reactions is taken into consideration,
no extra parameters (such as mass and kinetic coefficient) are added. Every solution
is represented as an element (or compound), and the fitness or performance of the
element is evaluated in accordance with the objective function. The main character-
istics of this algorithm are the exploiting/exploring mechanisms combined with the
elitist survival strategy.
Artificial chemical reaction optimization algorithm [3] is another metaheuristic inspired by the types and occurrence of chemical reactions.
Chemical Reaction Networks
Chemical reaction networks formally model chemistry in a well-mixed solution.
They are widely used to describe information processing occurring in natural cel-
lular regulatory networks, and are a promising language for the design of artificial
molecular control circuitry. They have been shown to be efficiently Turing universal
when allowing for a small probability of error.
Chemical reaction networks that are guaranteed to converge on a correct answer
have been shown to decide only the semilinear predicates (a multidimensional gen-
eralization of eventually periodic sets). Computation of a function f : N^k → N^l is represented by a count of some molecular species. The function f is deterministically computed by a stochastic chemical reaction network if and only if its graph is a semilinear set [13]. The time complexity of the algorithm is lower than quadratic in the total number of input molecules. Deterministic chemical reaction networks without a leader have been shown to retain the same computability power as stochastic
chemical reaction networks [18].

18.7 Biogeography-Based Optimization

Biogeography is the study of the geographical distribution of biological organisms over space and time. It is nature's way of distributing species. Mathematical models
of biogeography describe the migration, speciation, and extinction of species [34,35].
Species migrate between islands. Islands that are well suited as residences for bio-
logical species are said to be highly habitable. Habitability features include rainfall,
diversity of vegetation, diversity of topographic features, land area, and tempera-
ture. Islands that are highly habitable tend to have many species. Highly habitable
islands have a high emigration rate due to the accumulation of random effects on
their large populations. They have a low immigration rate because they are already nearly saturated with species. Islands with low habitability have a high species immigration rate. Immigration of new species to islands might raise the habitability of those islands because habitability is proportional to biological diversity.

Figure 18.1 A linear model of species richness: a habitat's immigration rate λi and emigration rate μi.

Biogeography-based optimization (BBO) (MATLAB code, http://academic.csuohio.edu/simond/bbo) [51,53] is a population-based stochastic global optimization algo-
rithm based on biogeography theory. In BBO, a set of solutions is called archipelago,
a solution is called a habitat (island) with a habitat suitability index (HSI) as the fit-
ness of the solution, and a solution feature is called species. BBO adopts migration
operator to share information between solutions. It maintains its set of solutions from
one iteration to the next one.
BBO has migration and mutation operators. As with every other EA, mutation and
elitism might also be incorporated. Each individual has its own immigration rate λi
and emigration rate μi , which are functions of its fitness. A good solution has higher
μi and lower λi , and vice versa. In a linear model of species richness (as illustrated
in Figure 18.1), a habitat’s immigration rate λi and emigration rate μi are calculated
based on its fitness f i by
\lambda_i = I \frac{f_{max} - f_i}{f_{max} - f_{min}}, \qquad \mu_i = E \frac{f_i - f_{min}}{f_{max} - f_{min}}, (18.3)
where f max and f min are, respectively, the maximum and minimum fitness values
among the population and I and E are, respectively, the maximum possible immi-
gration rate and emigration rate. That is, with the increase of HSI f i , λi linearly
decreases from I to 0, while μi linearly increases from 0 to E.
The probability of immigrating to x k is λk , and the probability of emigrating from x k is based on roulette-wheel selection with probability \mu_k / \sum_{j=1}^{N} \mu_j , where N is the population size.
For each habitat i, a species count probability Pi computed from λi and μi indi-
cates the likelihood that the habitat was expected a priori as a solution. Mutation is
a probabilistic operator that randomly modifies a decision variable of a candidate
solution to increase diversity among the population. The mutation rate of habitat i is
inversely proportional to its probability: pm,i = pm,max (1 − Pi /Pmax ), where pm,max is
a control parameter and Pmax is the maximum habitat probability in the population.
The BBO flowchart is given in Algorithm 18.1.

Algorithm 18.1 (BBO).

1. Generate initial population of N P islands x i , i = 1, 2, . . . , N P .


2. Repeat
a. for i = 1 to N P do
i. Calculate the fitness f i , the immigration rate λi , and the emigration rate μi for
each individual x i .
ii. Select x i with probability proportional to λi .
iii. if rand(0, 1) < λi , //immigration
for j = 1 to N P do
Select x j with probability proportional to μ j .
if rand(0, 1) < μ j
Randomly select a variable x from x j .
Replace the corresponding variable in x i with x.
end if
end for
end if
iv. Mutation by pm,i .
b. Update f max , Pmax and the best known solution.
3. until the maximum number of generations.
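A compact MATLAB sketch of Algorithm 18.1 is given below, using the linear rates of (18.3) with the HSI taken as the negative of the cost. The objective, bounds, and a fixed mutation probability (in place of the probability-based rate pm,i) are assumed, and elitism is omitted for brevity.

% Compact BBO sketch following Algorithm 18.1 and Eq. (18.3); parameters are illustrative.
f = @(x) sum(x.^2, 2);                          % cost to be minimized (assumed)
NP = 30; D = 2; lo = -5; hi = 5; I = 1; E = 1; pm = 0.05;
X = lo + (hi - lo)*rand(NP, D);
for gen = 1:100
    HSI = -f(X);                                % habitat suitability index (fitness)
    fmax = max(HSI); fmin = min(HSI);
    lambda = I*(fmax - HSI)/(fmax - fmin + eps);   % immigration rates, Eq. (18.3)
    mu     = E*(HSI - fmin)/(fmax - fmin + eps);   % emigration rates, Eq. (18.3)
    Xnew = X;
    for i = 1:NP
        for d = 1:D
            if rand < lambda(i)                 % habitat i immigrates this feature
                c = cumsum(mu); j = find(c >= rand*c(end), 1);  % roulette wheel on mu
                Xnew(i,d) = X(j,d);
            end
            if rand < pm                        % mutate the feature
                Xnew(i,d) = lo + (hi - lo)*rand;
            end
        end
    end
    X = Xnew;
end
[fb, ib] = min(f(X)); disp([X(ib,:), fb]);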

BBO migration strategy is similar to the global recombination approach of ES, in which many parents can contribute to a single offspring. BBO outperforms most
EAs [51]. A conceptual comparison and contrast between GAs and BBO is discussed
in [53]. BBO migration strategy is conceptually similar to a combination of global
recombination and uniform crossover. Global recombination means that many par-
ents can contribute to a single offspring, and uniform crossover means that each
decision variable in an offspring is generated independently. The entire population
is used as potential contributors to each offspring, and fitness-based selection is used
for each decision variable in each offspring. BBO reduces to GA with a combination
of global recombination and uniform crossover with the setting λk = 1 for all k.
BBO maintains its set of solutions from one iteration to the next, relying on
migration to probabilistically adapt those solutions; this is in common with strategies
such as PSO and DE. BBO solutions are changed directly via migration from other
solutions (islands). That is, BBO solutions directly share their attributes with other
solutions.
Based on a simplified version of BBO, an approximate analysis of the BBO pop-
ulation is performed using probability theory in [52]. The analysis provides approx-
imate values for the expected number of generations before the population’s best
solution improves, and the expected amount of improvement.
DE/BBO [25] combines the exploration of DE with the exploitation of BBO effec-
tively. Oppositional BBO [20] modifies BBO by employing opposition-based learn-
ing alongside migration rates. Quasi-reflection oppositional BBO [10] accelerates the convergence of BBO. Instead of opposite numbers, they use quasi-reflected numbers for population initialization and also for generation jumping.

Figure 18.2 The evolution of a random run of BBO for the Easom function: the minimum and average objectives.

Example 18.1: The Easom function is treated in Examples 2.1, 3.4, and 5.2. Here
we solve this same problem by using BBO. The global minimum value is −1 at
x = (π, π )T .
We implement BBO on this problem by setting the number of habitats (population
size) as 50, the maximum number of iterations as 100, a keep rate of 0.2, α = 0.9,
pm = 0.1, and σ = 4, and by selecting the initial population randomly from the entire domain.
For a random run, we have f (x) = −1.0000 at (3.1416, 3.1416) with 4010 func-
tion evaluations. All the individuals converge toward the global optimum. The evolu-
tion of the search is illustrated in Figure 18.2. For 10 random runs, the solver always
converged to the global optimum within 100 generations.

18.8 Methods Based on Mathematical Concepts

From the mathematical aspect, chaos is defined as a pseudorandom behavior generated by nonlinear deterministic systems. Chaos has several important dynamical
characteristics, namely, the sensitive dependence on initial conditions, pseudoran-
domness, ergodicity, and strange attractor with self-similar fractal pattern. Chaos
theory has been used to develop global optimization techniques. Chaos has also
been widely integrated with metaheuristic algorithms.
Chaos optimization algorithms [42] are population-based metaheuristics based
on the use of pseudorandom numerical sequences generated by means of chaotic
map. Chaos optimization can carry out overall searches at higher speeds and escape
from local minima more easily than stochastic ergodic searches that depend on the
probabilities [42]. The parallel chaos optimization algorithm proposed in [63] uses
migration and merging operations to achieve a good balance between exploration
and exploitation.
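The simplest chaotic carrier is a one-dimensional map whose iterates are mapped onto the search interval. The following MATLAB sketch uses the logistic map (see Problem 18.4) for a crude global scan of an assumed one-dimensional objective; in a full chaos optimization algorithm this scan would be followed by a refined local chaotic search around the best point.

% Chaotic search sketch: logistic-map iterates mapped onto the search interval.
f = @(x) x.^2 + 10*sin(5*x);          % one-dimensional objective (assumed)
lo = -3; hi = 3;
z = 0.2345;                           % chaotic variable in (0,1); 0.25, 0.5, 0.75 would collapse to a fixed point
fbest = inf; xbest = NaN;
for k = 1:2000
    z = 4*z*(1 - z);                  % logistic map, ergodic in (0,1)
    x = lo + (hi - lo)*z;             % map the chaotic variable onto [lo, hi]
    fx = f(x);
    if fx < fbest, fbest = fx; xbest = x; end
end
disp([xbest, fbest]);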
Sine cosine algorithm (http://www.alimirjalili.com/SCA.html) [39] is a
population-based metaheuristic optimization algorithm. It creates multiple initial
random solutions and requires them to fluctuate outwards or toward the best solution
using a mathematical model based on sine and cosine functions. Several random and
adaptive variables are integrated to enable exploration and exploitation of the search
space.
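A minimal MATLAB sketch of the sine–cosine position update is shown below; the objective, bounds, and the control constant a = 2 are assumptions made for illustration, and the destination point P is the best solution found so far.

% Sine cosine algorithm sketch; parameters are illustrative.
f = @(x) sum(x.^2, 2);                        % objective to be minimized (assumed)
N = 30; D = 2; T = 300; a = 2; lo = -5; hi = 5;
X = lo + (hi - lo)*rand(N, D);
fit = f(X); [fP, ip] = min(fit); P = X(ip,:);            % destination (best) solution
for t = 1:T
    r1 = a - t*(a/T);                                    % shrinks from a to 0 over the run
    for i = 1:N
        for d = 1:D
            r2 = 2*pi*rand; r3 = 2*rand; r4 = rand;
            if r4 < 0.5                                  % sine-driven move
                X(i,d) = X(i,d) + r1*sin(r2)*abs(r3*P(d) - X(i,d));
            else                                         % cosine-driven move
                X(i,d) = X(i,d) + r1*cos(r2)*abs(r3*P(d) - X(i,d));
            end
        end
    end
    X = min(max(X, lo), hi);                             % keep solutions within the bounds
    fit = f(X); [fb, ib] = min(fit);
    if fb < fP, fP = fb; P = X(ib,:); end                % update the destination
end
disp([P, fP]);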

18.8.1 Opposition-Based Learning

The concept of opposition-based learning [58] has been utilized in a wide range of
learning and optimization fields. A mathematical proof shows that in terms of conver-
gence speed, utilizing random numbers and their oppositions is more beneficial than
using the pure randomness to generate initial estimates without a prior knowledge
about the solution of a continuous domain optimization problem [44]. It is mathemat-
ically proven in [48] that opposition-based learning performs well in binary spaces.
The proposed binary opposition-based scheme can be embedded inside many binary
population-based algorithms. Opposition-based learning is applied to accelerate the
convergence rate of binary gravitational search algorithm [48].
Opposition-based strategy in optimization algorithms uses the concept of
opposition-based learning [58]. In EAs, opposition-based learning is implemented
by comparing the fitness of an individual to its opposite and retaining the fitter one in
the population. Opposition-based learning is an effective method to enhance various
optimization techniques.

Definition 18.1 (Opposition number). The opposition number of a real number x ∈ [a, b] is defined by x̃ = a + b − x. For a vector x, each dimension xi ∈ [ai , bi ],
the corresponding dimension of the opposite point is denoted by x̃i = ai + bi − xi .

Definition 18.2 (Opposition number in binary domain). Let x ∈ {0, 1}. The oppo-
site number x̃ is denoted by x̃ = 1 − x. For a vector x in binary space, each dimension
xi , the corresponding dimension of the opposite point is denoted by x̃i = 1 − xi .
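For example, opposition-based initialization evaluates each random point together with its opposite (Definition 18.1) and retains the better half. A short MATLAB sketch follows; the objective and the common bounds a and b for all dimensions are assumed.

% Opposition-based population initialization using Definition 18.1.
f = @(x) sum((x - 1).^2, 2);               % objective (assumed)
N = 20; D = 3; a = -5; b = 5;              % common bounds for every dimension (assumed)
X  = a + (b - a)*rand(N, D);               % random initial points
Xo = a + b - X;                            % opposite points (Definition 18.1)
pool = [X; Xo];
[~, idx] = sort(f(pool));                  % rank the 2N points by cost
P = pool(idx(1:N), :);                     % keep the N best as the initial population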

Problems

18.1 Give the similarity of gravitational search algorithm and PSO.


18.2 Run the accompanying MATLAB code of wind driven optimization to find
the global minimum of six-hump-camelback function in the Appendix. Inves-
tigate how the parameters influence the performance.
18.3 Run the accompanying MATLAB code of multi-verse optimizer to find the
global minimum of Schaffer function in the Appendix. Investigate how to
improve the result by adjusting the parameters.
18.4 Consider the four one-dimensional maps for generating chaotic behaviors.
a) Logistic map xn+1 = 4xn (1 − xn ) generates chaotic sequences in (0,1).
b) Chebyshev map xn+1 = cos(5 cos−1 xn ) generates chaotic sequences in
[−1, 1].
c) Cubic map xn+1 = 2.59xn (1 − xn2 ) generates chaotic sequences in (0,1).
d) Sinusoidal map xn+1 = sin(π xn ) generates chaotic sequences in (0,1).
Draw the chaotic motions in two-dimensional space (x1 , x2 ) for 200 iterations
for an initial value of x1 = 0.2. Observe their ergodic property.

References
1. Acan A, Unveren A. A two-stage memory powered great deluge algorithm for global optimiza-
tion. Soft Comput. 2015;19:2565–85.
2. Acan A, Unveren A. A great deluge and tabu search hybrid with two-stage memory support
for quadratic assignment problem. Appl Soft Comput. 2015;36:185–203.
3. Alatas B. ACROA: artificial chemical reaction optimization algorithm for global optimization.
Expert Syst Appl. 2011;38:13170–80.
4. Arulanandham JJ, Calude C, Dinneen MJ. Bead-sort: a natural sorting algorithm. Bull Eur
Assoc Theor Comput Sci. 2002;76:153–61.
5. Astudillo L, Melin P, Castillo O. Introduction to an optimization algorithm based on the chem-
ical reactions. Inf Sci. 2015;291:85–95.
6. Balamurugan R, Natarajan AM, Premalatha K. Stellar-mass black hole optimization for biclus-
tering microarray gene expression data. Appl Artif Intell. 2015;29:353–81.
7. Bayraktar Z, Komurcu M, Werner DH. Wind driven optimization (WDO): a novel nature-
inspired optimization algorithm and its application to electromagnetics. In: Proceedings of
IEEE antennas and propagation society international symposium (APSURSI), Toronto, ON,
Canada, July 2010. p. 1–4.
8. Bayraktar Z, Komurcu M, Bossard JA, Werner DH. The wind driven optimization technique
and its application in electromagnetics. IEEE Trans Antennas Propag. 2013;61(5):2745–57.
9. Bender MA, Farach-Colton M, Mosteiro MA. Insertion sort is O(n log n). Theory Comput
Syst. 2006;39(3):391–7.
10. Bhattacharya A, Chattopadhyay P. Solution of economic power dispatch problems using oppo-
sitional biogeography-based optimization. Electr Power Compon Syst. 2010;38:1139–60.
11. Birbil SI, Fang S-C. An electromagnetism-like mechanism for global optimization. J Global
Optim. 2003;25(3):263–82.

12. Chao M, Sun Z-X, Liu S-M. Neural network ensembles based on copula methods and distributed multiobjective central force optimization algorithm. Eng Appl Artif Intell.
2014;32:203–12.
13. Chen H-L, Doty D, Soloveichik D. Deterministic function computation with chemical reaction
networks. Nat Comput. 2014;13:517–34.
14. Cuevas E, Echavarria A, Ramirez-Ortegon MA. An optimization algorithm inspired by the states of matter that improves the balance between exploration and exploitation. Appl Intell.
2014;40:256–72.
15. Dewdney AK. On the spaghetti computer and other analog gadgets for problem solving. Sci
Am. 1984;250(6):19–26.
16. Diosan L, Oltean M. Friction-based sorting. Nat Comput. 2011;10:527–39.
17. Dogan B, Olmez T. A new metaheuristic for numerical function optimization: vortex search
algorithm. Inf Sci. 2015;293:125–45.
18. Doty D, Hajiaghayi M. Leaderless deterministic chemical reaction networks. Nat Comput.
2015;14:213–23.
19. Dueck G. New optimization heuristics: the great deluge algorithm and the record-to-record
travel. J Comput Phys. 1993;104:86–92.
20. Ergezer M, Simon D, Du D. Oppositional biogeography-based optimization. In: Proceedings of
IEEE conference on systems, man, and cybernetics, San Antonio, Texas, 2009. p. 1035–1040.
21. Erol OK, Eksin I. A new optimization method: big bang big crunch. Adv Eng Softw.
2006;37(2):106–11.
22. Eskandar H, Sadollah A, Bahreininejad A, Hamdi M. Water cycle algorithm—a novel meta-
heuristic optimization method for solving constrained engineering optimization problems.
Comput Struct. 2012;110:151–60.
23. Formato RA. Central force optimization: a new metaheuristic with application in applied elec-
tromagnetics. Prog Electromagn Res. 2007;77:425–91.
24. Goliaei S, Jalili S. Computation with optical sensitive sheets. Nat Comput. 2015;14:437–50.
25. Gong W, Cai Z, Ling CX. DE/BBO: a hybrid differential evolution with biogeography-based
optimization for global numerical optimization. Soft Comput. 2010;15:645–65.
26. Hatamlou A. Black hole: a new heuristic optimization approach for data clustering. Inf Sci.
2013;222:175–84.
27. Javidy B, Hatamlou A, Mirjalili S. Ions motion algorithm for solving optimization problems.
Appl Soft Comput. 2015;32:72–9.
28. Kashan AH. A New metaheuristic for optimization: optics inspired optimization (OIO). Tech-
nical Report, Department of Industrial Engineering, Tarbiat Modares University. 2013.
29. Kaveh A, Khayatazad M. A new meta-heuristic method: ray optimization. Comput Struct.
2012;112:283–94.
30. Kaveh A, Talatahari S. A novel heuristic optimization method: charged system search. Acta
Mech. 2010;213:267–89.
31. Lam AYS, Li VOK. Chemical-reaction-inspired metaheuristic for optimization. IEEE Trans
Evol Comput. 2010;14(3):381–99.
32. Lam AYS, Li VOK, Xu J. On the convergence of chemical reaction optimization for combina-
torial optimization. IEEE Trans Evol Comput. 2013;17(5):605–20.
33. Lam AYS, Li VOK, Yu JJQ. Real-coded chemical reaction optimization. IEEE Trans Evol
Comput. 2012;16(3):339–53.
34. Lomolino M, Riddle B, Brown J. Biogeography. 3rd ed. Sunderland, MA: Sinauer Associates;
2009.
35. MacArthur R, Wilson E. The theory of biogeography. Princeton, NJ: Princeton University;
1967.
36. Mehdizadeh E, Tavakkoli-Moghaddam R, Yazdani M. A vibration damping optimization algo-
rithm for a parallel machines scheduling problem with sequence-independent family setup
times. Appl Math Modell. 2016. in press.

37. Meyer T, Yamamoto L, Banzhaf W, Tschudin C. Elongation control in an algorithmic chem-


istry. In: Advances in artificial life. Darwin Meets von Neumann, Lecture Notes on Computer
Science, vol. 5777. Berlin: Springer; 2011. p. 273–280.
38. Mirjalili S, Mirjalili SM, Hatamlou A. Multi-verse optimizer: a nature-inspired algorithm for
global optimization. Neural Comput Appl. 2015;49:1–19.
39. Mirjalili S. SCA: a sine cosine algorithm for solving optimization problems. Knowl-Based
Syst. 2016;96:120–33.
40. Moein S, Logeswaran R. KGMO: a swarm optimization algorithm based on the kinetic energy
of gas molecules. Inf Sci. 2014;275:127–44.
41. Murphy N, Naughton TJ, Woods D, Henley B, McDermott K, Duffy E, van der Burgt PJM,
Woods N. Implementations of a model of physical sorting. Int J Unconv Comput. 2008;1(4):3–
12.
42. Okamoto T, Hirata H. Global optimization using a multi-point type quasi-chaotic optimization
method. Appl Soft Comput. 2013;13(2):1247–64.
43. Patel VK, Savsani VJ. Heat transfer search (HTS): a novel optimization algorithm. Inf Sci.
2015;324:217–46.
44. Rahnamayan S, Tizhoosh HR, Salama MMA. Opposition versus randomness in soft computing
techniques. Appl Soft Comput. 2008;8(2):906–18.
45. Rashedi E, Nezamabadi-Pour H, Saryazdi S. GSA: a gravitational search algorithm. Inf Sci.
2009;179(13):2232–48.
46. Rashedi E, Nezamabadi-pour H, Saryazdi S. BGSA: binary gravitational search algorithm. Nat
Comput. 2010;9:727–45.
47. Schultes D. Rainbow sort: sorting at the speed of light. Nat Comput. 2006;5(1):67–82.
48. Seif Z, Ahmadi MB. Opposition versus randomness in binary spaces. Appl Soft Comput.
2015;27:28–37.
49. Shah-Hosseini H. The intelligence water drops algorithm: a nature-inspired swarm-based opti-
mization algorithm. Int J Bio-Inspired Comput. 2009;1:71–9.
50. Shareef H, Ibrahim AA, Mutlag AH. Lightning search algorithm. Appl Soft Comput.
2015;36:315–33.
51. Simon D. Biogeography-based optimization. IEEE Trans Evol Comput. 2008;12(6):702–13.
52. Simon D. A probabilistic analysis of a simplified biogeography-based optimization algorithm.
Evol Comput. 2011;19(2):167–88.
53. Simon D, Rarick R, Ergezer M, Du D. Analytical and numerical comparisons of biogeography-
based optimization and genetic algorithms. Inf Sci. 2011;181(7):1224–48.
54. Soloveichik D, Cook M, Winfree E, Bruck J. Computation with finite stochastic chemical
reaction networks. Nat Comput. 2008;7:615–33.
55. Tamura K, Yasuda K. Primary study of spiral dynamics inspired optimization. IEE J Trans
Electr Electron Eng. 2011;6:98–100.
56. Tayarani NMH, Akbarzadeh-T MR. Magnetic optimization algorithms: a new synthesis. In:
IEEE International conference on evolutionary computations, Hong Kong, June 2008. p. 2664–
2669.
57. Thachuk C, Condon A. Space and energy efficient computation with DNA strand displacement
systems. In: Proceedings of the 18th international meeting on DNA computing and molecular
programming, Aarhus, Denmark, Aug 2012. p. 135–149.
58. Tizhoosh HR. Opposition-based learning: a new scheme for machine intelligence. In: Pro-
ceedings of international conference on computational intelligence for modelling, control and
automation, Vienna, Austria, Nov 2005, vol. 1, p. 695–701.
59. Wang Y, Zeng J-C. A multi-objective artificial physics optimization algorithm based on ranks
of individuals. Soft Comput. 2013;17:939–52.
60. Xie LP, Zeng JC, Cui ZH. Using artificial physics to solve global optimization problems. In:
Proceedings of the 8th IEEE international conference on cognitive informatics (ICCI), Hong
Kong, 2009.

61. Xu J, Lam AYS, Li VOK. Chemical reaction optimization for task scheduling in grid computing.
IEEE Trans Parallel Distrib Syst. 2011;22(10):1624–31.
62. Yan G-W, Hao Z-J. A novel optimization algorithm based on atmosphere clouds model. Int J Comput Intell Appl. 2013;12(1): article no. 1350002, 16 pp.
63. Yuan X, Zhang T, Xiang Y, Dai X. Parallel chaos optimization algorithm with migration and
merging operation. Appl Soft Comput. 2015;35:591–604.
64. Zhou Y, Wang Y, Chen X, Zhang L, Wu K. A novel path planning algorithm based on plant
growth mechanism. Soft Comput. 2016. p. 1–11. doi:10.1007/s00500-016-2045-x.
19 Memetic Algorithms

The term meme was coined by Dawkins in 1976 in his book The Selfish Gene [7].
The sociological definition of a meme is the basic unit of cultural transmission or
imitation. A meme is the social analog of genes for individuals. Universal Darwinism
draws the analogy on the role of genes in genetic evolution to that of memes in a
cultural evolutionary process [7]. The science of memetics [3] represents the mind-
universe analog to genetics in cultural evolution, spanning the fields of anthropology,
biology, cognition, psychology, sociology and sociobiology. This chapter is dedicated
to memetic and cultural algorithms.

19.1 Introduction

The meme is a unit of intellectual or cultural information that can pass from mind to
mind, when people exchange ideas. As genes propagate in the gene pool via sperms
or eggs, memes propagate in the meme pool by spreading from brain to brain via a
process called imitation. Unlike genes, memes are typically adapted by the people
who transmit them before being passed on; that is, a meme is a lifetime learning procedure capable of generating refinement in individuals.
Like genes that serve as building blocks in genetics, memes are building blocks of
meaningful information that is transmissible and replicable. Memes can be thought
of as schemata that are modified and passed on over a learning process. The concept of schemata being passed on is analogous to the way behaviors or thoughts are passed on as memes.
The typical memetic algorithm uses an additional mechanism to modify schemata
during an individual’s lifetime, taken as the period of evaluation from the point of
view of GA, and that refinement can be passed on to an individual’s offspring.
Memetic computation is a computational paradigm that encompasses the con-
struction of a comprehensive set of memes. It involves the additional dimension of
cultural evolution through memetic transmission, selection, replication, imitation, or variation, in the context of problem-solving. Memetic computation is the hybridization of a population-based global search and the local improvement, which strikes a
balance between exploration and exploitation of the search space.
An important step in memetic computation is to identify a suitable memetic rep-
resentation of the memotype. The memetic evolutionary process is primarily driven
by imitation [3], which takes place during transmission of memes. Individuals make
choices and imitate others who have obtained high payoffs in the previous rounds. For
imitation, memetic selection decides whom to imitate, memetic transmission decides
how to imitate, and memetic variation relates to what is imitated or assimilated.
In memetic expression and assimilation, the focus is placed on the socio-types
(the social expression of a meme, analogous to the phenotype of a gene)
instead of memotypes of the agents. The agent assimilates memes by observing the
behaviors expressed by other agents. Expression and assimilation stages exist in
the iterated learning model in [15], whereby each agent indirectly acquires linguis-
tic memes from another by learning from a set of meaning-signal pairs generated
from the linguistic memes of another agent. During the process of imitation, memes
are constantly injected either into the meme pool of an agent [15] or the common
meme pool [10]. This results in a competition among the existing memes and the
injected memes during the retention stage. Memetic variation process refers to the
self-generation and reconfiguration of memes. The process takes place during the
various stages of meme propagation.
Cultural algorithms are similar to memetic computation in that both use the notion
of domain specific cultural knowledge to bias the search during problem-solving.
While the former predefines an appropriate belief space representation, memetic
computation encodes high-level knowledge representation that undergoes memetic
transmission, selection, replication, and/or variation. Further, memetic computation
embraces sociotype transmission as a realization of meme imitation, as opposed to
the typical memotype transmission in cultural algorithms.

19.2 Cultural Algorithms

EAs easily fall into premature convergence because implicit information and domain knowledge are not fully used. Cultural algorithms [30] are motivated by human
culture evolution process. They can effectively improve the evolution performance
[6,28,31].
Cultural algorithms [30] are a computational framework consisting of two differ-
ent spaces: population space and belief space. Selected experiences of the successful
agents during the population evolution will produce knowledge that can be commu-
nicated to the belief space, where it gets manipulated and used to affect the evolution
process of the population. The interaction between both spaces yields a dual inheri-
tance structure in which the evolution of the agents and the evolved beliefs take place
in parallel, in a way similar to the evolution of human cultures. Figure 19.1 presents
the components of a cultural algorithm.
Figure 19.1 Block diagram of cultural algorithm: the belief space and the population space interact through the Accept() and Influence() functions; the belief space is updated by Adjust(), and the population space by the Reproduce() and Performance() functions.

The population space comprises a set of possible solutions to the problem, and can
be modeled using any population-based approach. Inspired by culture as information
storage in the society, the belief space is information which does not depend on the
individuals who generated it and can be accessed by all members in the population
space [31]. In belief space, implicit knowledge is extracted from better individuals
in the population. This knowledge is utilized to direct the evolution in the population space so that it can escape from local optima. The two spaces first evolve separately, and then exchange experience through the accept and affect operations. Individuals in the population space contribute their experience to the belief space using the accept operation, and the belief space influences the individuals in the population space using the affect operation. The two spaces can be modeled using any swarm-based computing model.
Five basic categories of knowledge are stored in the belief space: situational,
normative, topographical, domain, and history knowledge [6,28].

• Normative knowledge denotes acceptable ranges for individuals’ behavior. It memorizes the feasible search space of optimization problems. It consists of a set of promising ranges.
• Historical knowledge is a time series of individuals’ behavior (temporal patterns).
Historical knowledge keeps track of the history of the search process and records
key events in the search. Individuals use the history knowledge for guidance in
selecting a moving direction. A tabu list is also a form of history knowledge.
• Situational knowledge is a set of exemplary individuals (successful and unsuccessful) that represents useful experiences for all individuals. It guides all individuals to learn from the exemplary individuals.
• Topographical knowledge denotes geographically recorded behaviors of the indi-
viduals (spatial patterns). It describes and is responsible for updating the distrib-
ution of good solutions in the feasible search space [6]. It uses the search space
found by normative knowledge to uniformly divide the search region into subre-
gions alongside each of the dimensions of the problem.
• Domain knowledge comprises knowledge of domain entities and their interactions and relationships. It adopts information about the problem domain to lead
the search. Domain knowledge is modeled separately from the population, due to
the independence between them.

Algorithm 19.1 (Cultural Algorithm).

1. Initialize t = 0.
   Initialize the population space P_s^t and the belief space B_s^t.
   Initialize all the individuals in the two spaces randomly.
   Evaluate the fitness of each individual.
2. Repeat:
   a. Update the individuals in the two spaces according to their own rules and evaluate the fitness value of each individual in P_s^t.
   b. Update the belief space by the accept operation: B_s^t = evolve(B_s^t, accept(P_s^t)).
   c. Update the population space by the influence operation: P_s^t = create(P_s^t, influence(B_s^t)).
   d. Set t = t + 1.
   e. Choose P_s^t from P_s^{t-1}.
   until the stopping criterion is satisfied.

Domain knowledge and history knowledge are useful on dynamic landscape prob-
lems [28].
The process of cultural algorithm is described as Algorithm 19.1. Individuals get
assessed with the performance function. The accept function then selects the best
agents in the population space to inform the belief space so as to update the belief
space. Knowledge in the belief space is then allowed to enhance those individuals
selected to the next generation through the affect operation. The algorithm replicates
this process iteratively until the stopping condition is reached.
Crossover operators are often used, but they have no biological analogy; rather, they mimic obsequious and rebellious behaviors found in cultural systems. The problem-solving experience of individuals selected from the population space is used to generate problem-solving knowledge in the belief space. This knowledge can control the evolution of individuals by means of an influence function, by modifying any aspect of the individuals.
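To make the accept–adjust–influence cycle of Algorithm 19.1 concrete, the following Python fragment sketches a cultural algorithm that maintains only normative knowledge (promising variable ranges) in its belief space. It is an illustration under our own assumptions (the acceptance fraction, the Gaussian influence rule, and the greedy replacement are simplifications), not the accompanying MATLAB code of this book.

```python
import numpy as np

def cultural_algorithm(f, lb, ub, pop_size=30, iters=100, accept_frac=0.2, seed=None):
    """Minimal cultural algorithm sketch with normative knowledge only (minimization)."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    pop = rng.uniform(lb, ub, (pop_size, len(lb)))      # population space
    fit = np.array([f(x) for x in pop])
    n_acc = max(1, int(accept_frac * pop_size))
    for _ in range(iters):
        elite = pop[np.argsort(fit)[:n_acc]]            # accept(): best individuals inform the belief space
        lo, hi = elite.min(axis=0), elite.max(axis=0)   # adjust(): update the normative (promising) ranges
        width = hi - lo + 1e-12
        # influence(): reproduce children with steps scaled by the promising ranges
        children = np.clip(pop + rng.normal(0.0, 0.3, pop.shape) * width, lb, ub)
        child_fit = np.array([f(x) for x in children])
        better = child_fit < fit                        # performance(): keep improvements only
        pop[better], fit[better] = children[better], child_fit[better]
    return pop[np.argmin(fit)], fit.min()

# Example: minimize the 2-D Rosenbrock function (cf. Problem 19.1)
rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
best_x, best_f = cultural_algorithm(rosen, [-2, -2], [2, 2], seed=1)
```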
Multi-population cultural algorithm [8] adopts individual migration. Only best
solutions coming from each sub-population are exchanged in terms of given migra-
tion rules. It does not use implicit knowledge extracted from a sub-population. A
method proposed in [1] divides sub-population based on fuzzy clustering and gives
cultural exchange among sub-populations.
In [2], the cultural algorithm uses DE as the population space. The belief space uses different knowledge sources to influence the variation operator of DE in order to reduce the amount of computation spent on evaluating fitness values.

19.3 Memetic Algorithms

Motivated by the evolution of ideas, memetic algorithm [23,24], also called genetic
local search, is another cultural algorithm framework based upon the cultural evolu-
tion that can exhibit local refinement. It is a dual inheritance system that consists of

a social population and a belief space, and models the evolution of culture or ideas. The owner of an idea can improve upon it by incorporating local search.
Memetic algorithm was inspired by both the neo-Darwinian paradigm and
Dawkins’ notion of a meme defined as a unit of cultural evolution that is capa-
ble of local refinements. Evolution and learning are combined using the Lamarckian
strategy. Memetic algorithm can be considered as EA with local search. It combines
the evolutionary adaptation of a population with individual learning of its members.
Memetic algorithm is considerably faster than simple GA.
Though encompassing characteristics of cultural evolution in the form of local
refinement in the search cycle, memetic algorithm is not a true evolving system
according to universal Darwinism, since the principles of inheritance/memetic trans-
mission, variation and selection are missing.
In [25], a probabilistic memetic framework that governs memetic algorithms as
a process involving whether evolution or individual learning should be favored is
presented and the probability of each process in locating the global optimum is
analyzed. The framework balances evolution and individual learning by governing
the learning intensity of each individual according to the theoretical upper bound
derived during the search process.
Another class of memetic algorithms exhibits the principles of memetic trans-
mission and selection in their design. In multi-meme memetic algorithm [16], the
memetic material is encoded as part of the genotype. Subsequently, the decoded
meme of each respective individual is then used to perform a local refinement. The
memetic material is then transmitted through a simple inheritance mechanism from
parent to offspring. In hyper-heuristics [14] and the meta-Lamarckian memetic algorithm [27], the candidate memes in the pool compete, based on their past merits in generating local improvements, through a reward mechanism that decides which meme is selected for future local refinements.
In coevolution and self-generation memetic algorithms [17,32], all three principles satisfying the definitions of a basic evolving system have been considered. A
rule-based representation of local search is co-adapted alongside candidate solutions
within the evolutionary system, thus capturing regular repeated features or patterns
in the problem space.
By combining cellular GA with a random walk local search [11], a better conver-
gence rate is achieved on the satisfiability problems. For cellular memetic algorithm
[12], adaptive mechanisms that tailor the amount of exploration versus exploita-
tion of local solutions are carried out. A memetic version of DE, called memDE
[26], applies crossover-based local search, called fittest individual refinement, for
exploring the neighborhood of the best solution in each generation for enhanced
convergence speed and robustness. Evolutionary gradient search [34] adapts gradi-
ent search into evolutionary mechanism. The bacterial memetic algorithm [4] is a
kind of memetic algorithm based on the bacterial approach. An intense continuous
local search is proposed in the framework of memetic algorithms [22].
Real-coded memetic algorithm [18] applies a crossover hill-climbing to solutions
produced by the genetic operators. Crossover hill-climbing exploits the self-adaptive
capacity of real-parameter crossover operators with the aim of producing an effective

local tuning on the solutions. The algorithm employs an adaptive mechanism that
determines the probability with which every solution should receive the application
of crossover hill-climbing.
In [36], greedy crossover-based hill-climbing and steepest mutation-based hill-
climbing are used as an adaptive hill-climbing strategy within the framework of
memetic algorithms for solving dynamic optimization problems.
In memetic algorithms, local search is used to search around the most promis-
ing solutions. As the local region extension increases with the dimensionality,
high-dimensional problems require a high number of evaluations during each local
search process, called local search intensity. MA-SW-Chains [21], the winner of
the CEC’2010 competition, is a memetic algorithm for large scale global optimiza-
tion. It combines a steady-state GA with a Solis–Wets local search method. MA-
SW-Chains introduces the concept of local search chains to adapt the local search
intensity assigned to the local search method, by exploiting with higher intensity the
most promising individuals. It assigns to each individual a local search intensity that
depends on its features, by chaining different local search applications. MA-SW-
Chains adapts the local search intensity by applying the local search several times
over the same individual, with a fixed local search intensity, and storing its final
parameters, creating local search chains [21]. MA-SW-Chains uses a relatively small
population, and iteratively improves the best current solution.
A diversity-based adaptive local search strategy based on parameterized Gaussian
distribution [35] is integrated into the framework of the parallel memetic algorithm
to address large scale COPs.

19.3.1 Simplex-based Memetic Algorithms

The simplex method is a robust, easily programmed, fast, nonderivative local search algorithm. Many attempts have been made to hybridize EAs with simplex methods [5,29,37]. Simplex can be used as a local search method after mutation.
Simplex evolution [33] is based on a deterministic simplex operator that is equiv-
alent to one cycle of classical Nelder–Mead simplex method. An iteration of simplex
evolution starts by setting the first individual from the current population as the base
point, randomly selecting other individuals from the current population to form a
simplex, and performing simplex operator on the selected simplex to generate a new
individual and put it into the new generation.
m-simplex evolution [19] combines DE and classical Nelder–Mead simplex
method. It incorporates stochastic reflection and contraction operators of classical
Nelder–Mead simplex method with an additional step, in which an individual not
attaining at least the average fitness of the overall population will take a deterministic
step toward the best individual or away from the worst one.
Global simplex search [20] is an EA based on the stochastic modifications of the
reflection and expansion operators of the simplex method. The reflection and expan-
sion operators with random reflection and expansion factors have been employed as
the recombination operators, and a low mutation rate is also used. The concept of

generation does not exist in global simplex search; this allows for smooth decrease
of the population from an initial size to a final one.
Global simplex optimization [13] is a population-based EA incorporating a spe-
cial multistage, stochastic and weighted version of the reflection operator of classical
Nelder–Mead simplex method for minimization of continuous multimodal functions.
The method incorporates a weighted stochastic recombination operator inspired from
the reflection and expansion operators of the simplex method, but no mutation oper-
ator.

19.4 Application: Searching Low Autocorrelation Sequences

Binary sequences with low aperiodic autocorrelation levels, defined in terms of the
peak sidelobe level and/or merit factor, have many important engineering applica-
tions, such as radars, sonars, channel synchronization and tracking, spread spectrum
communications, system identification, and cryptography. Searching for low auto-
correlation binary sequences (LABS) is a notorious combinatorial problem.
For a binary sequence of length L, a = a_1 a_2 . . . a_L with a_i ∈ {−1, +1} for all i, its autocorrelation function is given by

C_k(a) = \sum_{i=1}^{L-k} a_i a_{i+k},    k = 0, ±1, . . . , ±(L − 1).    (19.1)
For k = 0, the value of the autocorrelation function equals L and is called the peak, and for k ≠ 0, the values of the autocorrelation function are called the sidelobes. The peak sidelobe level (PSL) of a binary sequence a of length L is defined as

PSL(a) = \max_{k=1,...,L-1} |C_k(a)|.    (19.2)

The minimum peak sidelobe (MPS) for all possible binary sequences of length L is defined as

MPS(L) = \min_{a ∈ {−1,+1}^L} PSL(a).    (19.3)

The merit factor F of a binary sequence a is defined as

F(a) = \frac{L^2}{2 \sum_{k=1}^{L-1} C_k(a)^2}.    (19.4)
The sum term in the denominator is called the sidelobe energy of the sequence.
LABS search targets at low PSL or at high merit factor (or equivalently, low
sidelobe energy). Our focus is to search for long LABS with low PSL, which is more
challenging because of the nonanalytical maximum operator in its definition. Both
versions of the LABS problem are hard, since the search space grows exponentially
with the sequence length and there are numerous local minima as well as many
optima. The brute-force exhaustive search requires to examine 2 L binary sequences.
EAs have attained the best results so far [9].
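Before describing the EA, it is useful to see how the quantities in (19.1)–(19.4) are computed directly. The following Python sketch (the function names are our own) evaluates the aperiodic autocorrelations, the PSL, and the merit factor of a ±1 sequence; the length-13 Barker sequence, whose PSL is 1 and whose merit factor is about 14.08, serves as a check.

```python
import numpy as np

def autocorrelations(a):
    """Aperiodic autocorrelations C_k(a), k = 1, ..., L-1, as in (19.1)."""
    a = np.asarray(a)
    L = len(a)
    return np.array([np.dot(a[:L - k], a[k:]) for k in range(1, L)])

def psl(a):
    """Peak sidelobe level (19.2)."""
    return np.max(np.abs(autocorrelations(a)))

def merit_factor(a):
    """Merit factor (19.4): L^2 divided by twice the sidelobe energy."""
    C = autocorrelations(a)
    return len(a)**2 / (2.0 * np.sum(C**2))

# Example: the length-13 Barker sequence (PSL = 1, merit factor about 14.08)
barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1])
print(psl(barker13), merit_factor(barker13))
```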

Our EA for the LABS problem integrates the key features of GA, ES and memetic
algorithms. Binary coding is a natural coding scheme for this problem. Each chromo-
some is encoded by a string. The EA design incorporates several features, including
(λ + μ) ES-like scheme, two-point mutation, a bit-climber used as a local search
operator, partial population restart, and a fast scheme for calculating autocorrela-
tion. Crossover operation is not applied. The algorithm can efficiently discover long
LABS of lengths up to several thousands.
Represent the binary sequences a_i's as ±1-valued bit strings. The evaluation of the fitness function takes O(L^2) operations for calculating the C_k(a)'s. For the bit-climber, for each bit flip at a_i, the new value C_k'(a) can be calculated from its previous value C_k(a) by the update equation

C_k'(a) =
  C_k(a) − 2 a_i a_{i+k},              if 1 ≤ i ≤ k and i ≤ L − k;
  C_k(a) − 2 a_i (a_{i−k} + a_{i+k}),  if k + 1 ≤ i ≤ L − k;
  C_k(a) − 2 a_{i−k} a_i,              if L − k + 1 ≤ i ≤ L and i ≥ k + 1;
  C_k(a),                              otherwise.    (19.5)
This reduces the complexity for updating all Ck (a)’s to O(L). The resultant saving
is significant, especially because each mutated or randomly generated individual
is subject to L bit flips and fitness evaluations. For example, compared to direct
calculation of Ck ’s, the computing time of the EA is reduced by a factor of 4 when
calculating Ck ’s for L = 31 by (19.5).
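A minimal Python sketch of the single-bit-flip update (19.5) is shown below; the zero-based indexing and the in-place update of the array C, where C[k-1] stores C_k(a) for k = 1, . . . , L − 1, are our own conventions.

```python
import numpy as np

def flip_update(a, C, i):
    """Update all C_k, k = 1..L-1, in O(L) after flipping bit a[i] (0-based), per (19.5)."""
    L = len(a)
    for k in range(1, L):
        delta = 0
        if i + k < L:          # a[i] pairs with a[i+k] in C_k
            delta += a[i] * a[i + k]
        if i - k >= 0:         # a[i-k] pairs with a[i] in C_k
            delta += a[i - k] * a[i]
        C[k - 1] -= 2 * delta  # each affected product changes sign when a[i] flips
    a[i] = -a[i]               # apply the flip itself
```

Each candidate flip examined by the bit-climber can thus be evaluated in O(L) time instead of the O(L^2) time of a full recomputation.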
In addition to PSL and merit factor, another fitness function is defined by

f(a) = \frac{F(a)}{PSL(a)}.    (19.6)
The results for using several fitness functions were compared in terms of both PSL
and merit factor in [9].
Denote the population size by N_P, the number of children by N_O, the number of generations between restarts by G_RS, the maximal number of generations by G_max, and the population size for partial restart by N_RS.
Before applying the algorithm for finding long LABS with low PSL, we first
address the problem of which fitness function is most suitable for the task at hand.
We set N_P = 4L, N_O = 20L, G_RS = 5, G_max = 100, N_RS = 10L. The fitness func-
tions PSL, F and f are evaluated for 5 random runs of the EA on a Linux system
with Intel’s Core 2 Duo processor. When PSL is selected as the fitness function, the
F performance is the poorest. In contrast, when F is selected as the fitness function,
the PSL performance is poorest. Better tradeoffs are achieved by the fitness functions
f . In particular, f achieves the best tradeoff between the achieved PSL and F [9].
For each length, we implemented 3 random runs of our program, and the best
result was retained. To reduce the computing time, the population and children
sizes for longer lengths are decreased. For L = 300 to 1000, we set N_P = L, N_O = 2L, G_RS = 5, G_max = 200, N_RS = L. When L > 1000, we set N_P = N_O = 1000, G_RS = 5, G_max = 200, N_RS = 1000.

Table 19.1 Results for L = 1024 and 4096, obtained from 3 random runs of the algorithm
L PSL F Hexadecimal form
1024 28 3.9683 4A3850
61EB56D8C3A37BEDFF2EEBC30 96B47CF2CE9EBA6C28A6895AF
4CDF08090AB612DA8043C3D1F E644D50A15E908692AC4DC095
218D398A6A66B389D16C8A6BC AF26896612DF211D48CBC027C
7C451B6B5B14EECD199CE823E 63C07C4E20AECF7513F41329D
56706E05F66D22A6EEC152A83 0F9378B07D7F3DC2D9FF88C08
4096 61 3.4589 E30A5D894A09A4CE0D11987E
FC7E8DC88127C078FBD569A4A D05AB26D86A2D067C1E274783
B891CBF64617E0906673F029A ED144133B3FF48DF2DB8A1878
6780075E9C2B0CC46E6D0DA62 3CF1F50F1DF94177C28076F3C
E44BC24C69D242E8D6F49F678 E71C2D4D72C9412C828734AA3
9CA28EA2A7E5891B451ADA9B2 408E666BA052C81509DE81789
7E4AF9FE4F504846D80D6B14C EEBDD9402A35C03AFD4EAE97B
7ECB690094681EFD13837398A CECAA9AB5FC10682B00CA74BD
15B5C0D7C53BAF35BF70612CB 4DDE55EB4CF2F028596ED8382
3F5D1A73463B9953326AE6950 CF1299AB6ACB432887A56E9F0
42957BAE604C003E982152DFE AFA75968C0D8B0FEAA2ED33FC
20DE73FBA4E21F154CB291291 58F8BB5B9977C57B6F77A7363
4D9164A6FEA9647EAA1E1D631 14B6BA1E9F065D66E5F5BF15B
0D46EF9CED3216DB9DF0298E1 CFBE0AF7596E9EB4BCBBBDA10
8A2B6088380B8D73797F9E9DB 094FCC06FF0544F46E261FE4E
F60AABCA0A32A5D1694B818B0 3A6D5351B28BAF523D1AE65D6
048136003CFBA56CF22E0E1A2 F2973C8163731272219255826
1DC2BEC886EBBBD73B5D1EFC2 9BB7E91F72964943D6D3560C3
A8E20D11EC5A81C106E04D5F5 9218D9FD9D823B118AD4FB1D6
C1435461E338D9F171B337E5D D7320CCD9CFE5DC651051E0F6
678550BA09F9892E76D6E17C4 9ECD63F71B71FF351EEAF6DEB

The computing time is 16.1136 hours for L = 1019. For lengths up to 4096, the
computing time required empirically shows a seemingly quadratic growth with L.
In particular, the parameters have been adjusted to trade performance for search time in the case of long sequences. This flexible tradeoff is in fact one of the key advantages of the algorithm. The sequences obtained for L = 1024 and 4096 are
listed in Table 19.1. A detailed implementation of the algorithm and a full list of best
sequences thus far is given in [9].

Problem

19.1 Run the accompanying MATLAB code of cultural algorithm to find the global
minimum of Rosenbrock function. Investigate how to improve the program.

References
1. Alami J, Imrani AE, Bouroumi A. A multi-population cultural algorithm using fuzzy clustering.
Appl Soft Comput. 2007;7(2):506–19.
2. Becerra RL, Coello CAC. Cultured differential evolution for constrained optimization. Comput
Meth Appl Mech Eng. 2006;195:4303–22.
3. Blackmore S. The meme machine. New York: Oxford University Press; 1999.
4. Botzheim J, Cabrita C, Koczy LT, Ruano AE. Fuzzy rule extraction by bacterial memetic
algorithms. Int J Intell Syst. 2009;24(3):1563–8.
5. Chelouah R, Siarry P. Genetic and Nelder-Mead algorithms hybridized for a more accurate
global optimization of continuous multiminima functions. Eur J Oper Res. 2003;148:335–48.
6. Chung CJ, Reynolds RG. Function optimization using evolutionary programming with self-
adaptive cultural algorithms. In: Proceedings of Asia-Pacific conference on simulated evolution
and learning, Taejon, Korea, 1996. p. 17–26.
7. Dawkins R. The selfish gene. Oxford, UK: Oxford University Press; 1976.
8. Digalakis JG, Margaritis KG. A multi-population cultural algorithm for the electrical generator
scheduling problem. Math Comput Simul. 2002;60(3):293–301.
9. Du K-L, Mow WH, Wu WH. New evolutionary search for long low autocorrelation binary
sequences. IEEE Trans Aerosp Electron Syst. 2015;51(1):290–303.
10. Farahmand AM, Ahmadabadi MN, Lucas C, Araabi BN. Interaction of culture-based learning
and cooperative coevolution and its application to automatic behavior-based system design.
IEEE Trans Evol Comput. 2010;14(1):23–57.
11. Folino G, Pizzuti C, Spezzano G. Combining cellular genetic algorithms and local search for
solving satisfiability problems. In: Proceedings of the 12th IEEE international conference on
tools with artificial intelligence, Taipei, Taiwan, November 1998. p. 192–198.
12. Huy NQ, Soon OY, Hiot LM, Krasnogor N. Adaptive cellular memetic algorithms. Evol Com-
put. 2009;17(2):231–56.
13. Karimi A, Siarry P. Global simplex optimization—a simple and efficient metaheuristic for
continuous optimization. Eng Appl Artif Intell. 2012;25:48–55.
14. Kendall G, Soubeiga E, Cowling P. Choice function and random hyperheuristics. In: Pro-
ceedings of the 4th Asia-Pacific conference on simulated evolution and learning, Singapore,
November 2002. p. 667–671.
15. Kirby S. Spontaneous evolution of linguistic structure: an iterated learning model of the emer-
gence of regularity and irregularity. IEEE Trans Evol Comput. 2001;5(2):102–10.
16. Krasnogor N. Studies on the theory and design space of memetic algorithms. PhD Thesis, University of the West of England, Bristol, UK, 2002.
17. Lee JT, Lau E, Ho Y-C. The Witsenhausen counterexample: a hierarchical search approach for
nonconvex optimization problems. IEEE Trans Autom Control. 2001;46(3):382–97.
18. Lozano M, Herrera F, Krasnogor N, Molina D. Real-coded memetic algorithms with crossover
hill-climbing. Evol Comput. 2004;12(3):273–302.
19. Luo C, Yu B. Low dimensional simplex evolution—a new heuristic for global optimization. J
Glob Optim. 2012;52(1):45–55.
20. Malaek SM, Karimi A. Development of a new global continuous optimization algorithm based
on Nelder–Mead Simplex and evolutionary process concepts. In: Proceedings of the 6th inter-
national conference on nonlinear problems in aerospace and aviation (ICNPAA), Budapest,
Hungary, June 2006. p. 435–447.
21. Molina D, Lozano M, Garcia-Martinez C, Herrera F. Memetic algorithms for continuous opti-
mization based on local search chains. Evol Comput. 2010;18(1):27–63.
22. Molina D, Lozano M, Herrera F. MA-SW-Chains: memetic algorithm based on local search
chains for large scale continuous global optimization. In: Proceedings of the IEEE Congress
on evolutionary computation (CEC), Barcelona, Spain, July 2010. p. 1–8.

23. Moscato P. On evolution, search, optimization, genetic algorithms and martial arts: towards
memetic algorithms. Technical Report 826, Caltech Concurrent Computation Program, Cali-
fornia Institute of Technology, Pasadena, CA, 1989.
24. Moscato P. Memetic algorithms: a short introduction. In: Corne D, Glover F, Dorigo M, editors.
New ideas in optimization. McGraw-Hill; 1999. p. 219–234.
25. Nguyen QH, Ong Y-S, Lim MH. A probabilistic memetic framework. IEEE Trans Evol Comput.
2009;13(3):604–23.
26. Noman N, Iba H. Enhancing differential evolution performance with local search for high
dimensional function optimization. In: Proceedings of genetic and evolutionary computation
conference (GECCO), Washington DC, June 2005. p. 967–974.
27. Ong YS, Keane AJ. Meta-Lamarckian learning in memetic algorithms. IEEE Trans Evol Com-
put. 2004;8(2):99–110.
28. Peng B, Reynolds RG. Cultural algorithms: knowledge learning in dynamic environments. In:
Proceedings of IEEE congress on evolutionary computation, Portland, OR, 2004. p. 1751–1758.
29. Renders J-M, Bersini H. Hybridizing genetic algorithms with hill-climbing methods for global
optimization: two possible ways. In: Proceedings of the 1st IEEE conference on evolutionary
computation, Orlando, FL, June 1994, vol. 1. p. 312–317.
30. Reynolds RG. An introduction to cultural algorithms. In: Sebald AV, Fogel LJ, editors. Pro-
ceedings of the 3rd annual conference on evolutionary programming. River Edge, NJ: World
Scientific; 1994. p. 131–139.
31. Reynolds RG. Cultural algorithms: theory and applications. In: Corne D, Dorigo M, Glover
F, editors. Advanced topics in computer science series: new ideas in optimization. New York:
McGraw-Hill; 1999. p. 367–377.
32. Smith JE. Coevolving memetic algorithms: a review and progress report. IEEE Trans Syst Man
Cybern Part B. 2007;37(1):6–17.
33. Sotiropoulos DG, Plagianakos VP, Vrahatis MN. An evolutionary algorithm for minimizing
multimodal functions. In: Proceedings of the 5th Hellenic–European conference on computer
mathematics and its applications (HERCMA), Athens, Greece, September 2001, vol. 2. Athens,
Greece: LEA Press; 2002. p. 496–500.
34. Salomon R. Evolutionary algorithms and gradient search: similarities and differences. IEEE Trans Evol Comput. 1998;2(2):45–55.
35. Tang J, Lim M, Ong YS. Diversity-adaptive parallel memetic algorithm for solving large scale
combinatorial optimization problems. Soft Comput. 2007;11(9):873–88.
36. Wang H, Wang D, Yang S. A memetic algorithm with adaptive hill climbing strategy for
dynamic optimization problems. Soft Comput. 2009;13:763–80.
37. Yen J, Liao JC, Lee B, Randolph D. A hybrid approach to modeling metabolic systems using a
genetic algorithm and simplex method. IEEE Trans Syst Man Cybern Part B. 1998;28:173–91.
20 Tabu Search and Scatter Search

Tabu search is a single-solution-based stochastic metaheuristic global optimization method. It is a hill-climbing method that imitates human memory structure to improve decision-making. Scatter search is a population-based metaheuristic algorithm. Scatter search and its generalized form, called path relinking, are intimately related to tabu search, and they derive additional advantages by using an adaptive memory mechanism.

20.1 Tabu Search

Tabu search (prohibited search) is a stochastic metaheuristic global optimization method, which was originally developed for very large COPs [3–5,9] and was later extended to continuous optimization [2,19]. Like SA, tabu search is a single-solution-based metaheuristic.
Tabu, or taboo, means forbidden or banned. Tabu search uses a set of strategies
and learned information to mimic human insights for problem-solving. Tabu search
pioneered the systematic exploration of memory in search processes, while EAs
pioneered the idea of combining solutions. Tabu search is conceptually much simpler
than EAs or SA and is easy to implement. For many optimization problems, it is superior to SA and EAs, both in terms of the computation time needed to reach a solution and in terms of the solution quality.
Tabu search is essentially a greedy local search (also known as hill-climbing)
method that explores the solution space beyond local optimality and adopts a mem-
ory structure that imitates human behavior, and uses past experiences to improve
decision-making. By employing the concepts of best improvement, tabu lists and
aspiration criteria, it avoids getting premature convergence to a local optimum.


Once a potential solution has been determined, it will be marked as tabu, and
the algorithm will not visit it repeatedly. The approach uses memories to avoid en-
trapment in cycles and pursues the search when the optimization process encounters
local optima, where cycling back to formerly visited solutions is prohibited through
the use of memory lists called tabu lists, which trace the recent search history. Best
improvement is implemented by always replacing each current solution by its best
neighbor, even if the best neighbor is worse than the current solution. This can avoid
getting stuck at local optima. In order to avoid cycling among already visited solu-
tions, a tabu list is used to keep the information about the past steps of the search,
and to create and exploit new solutions in the search space.
Tabu search starts from a current solution and constructs a set of feasible solutions from its neighborhood, excluding moves forbidden by the tabu list. The tabu list T holds a record of recently visited states. The constructed solutions are evaluated, and the one with the best metric value is selected as the next solution; the tabu list is then updated. However, forbidding all solutions corresponding to a tabu attribute may forbid some good or even optimal solutions that have not yet been visited. Records in T cannot be used to form the next feasible solution unless they satisfy the aspiration criteria, which allow better solutions to be chosen even if they have been tabooed. If T follows a FIFO policy, the larger the set T, the longer a move remains prohibited.
An aspiration criterion is a condition that, if satisfied, allows to set a solution
obtained by performing a tabu move as new current solution. It is a rule that allows
the tabu status to be overridden in cases where the forbidden exchange exhibits
desirable properties. A typical aspiration criterion is to keep a solution that is better
than the best solution found so far. In this metaheuristic, intensification is provided
by the local search mechanism, while diversification is given by the use of tabu lists.
Basic tabu search is given in Algorithm 20.1, where x and y are feasible solutions of a COP, A(x, t) is the set of solutions among which the new current solution is chosen at iteration t, N(x) is the set of neighbors of x, T(x, t) is the set of tabu moves at iteration t, T̃(x, t) is the set of tabu moves satisfying at least one aspiration criterion, and f(·) is the objective (metric) function. The stopping criterion may be a maximum number of consecutive iterations without an improving solution, or A(x, t) becoming an empty set.
Step 4.a can be implemented as follows. The set A(x, t) is generated by generating
M children x  from the neighborhood of x. These children satisfy the conditions that
their features do not belong to T , or they satisfy at least one of the aspirations T̃ . Step
4.b determines the new solution x  by selecting the one with the minimum fitness.
Step 4.c updates the tabu list T by including the features from x  , and updates x by
x  , if f (x  ) < f (x). Simple tabu search, in most cases, will find a local optimum
rather than a global optimum.
Tabu search has a strong reliance on the initial solution and its quality. The conver-
gence speed of tabu search to the global optimum is dependent on the initial solution,
since it is a form of iterative search. A multistart method is one that executes multiple

times from different initial settings. In [14], strategic diversification is utilized within
the tabu search framework for the QAP, by incorporating several diversification and
multistart tabu search variants.

Algorithm 20.1 (Tabu Search).

1. Set t = 0.
2. Generate an initial solution x.
3. Initialize the tabu list T ← ∅ and the size of the tabu list L.
4. Repeat:
   a. Set the candidate set A(x, t) = {x' ∈ N(x) \ T(x, t) ∪ T̃(x, t)}.
   b. Find the best x' in A(x, t): set x' = arg min_{y ∈ A(x,t)} f(y).
   c. If f(x') is better than f(x), x ← x'.
   d. Update the tabu list and the aspiration criteria.
   e. If the tabu list T is full, replace the oldest features in T.
   f. Set t = t + 1.
   until the termination criterion is satisfied.
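As a concrete illustration of Algorithm 20.1, the following Python sketch applies tabu search with single bit-flip moves to a generic pseudo-Boolean minimization problem. The tabu tenure, the full-neighborhood scan, and the aspiration rule (a tabu move is admissible if it improves the best solution found so far) are simplifying assumptions, not the book's accompanying code.

```python
import numpy as np

def tabu_search_binary(f, n, iters=200, tenure=10, seed=None):
    """Minimal tabu search for minimizing f over {0,1}^n with bit-flip moves."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, n)
    best_x, best_f = x.copy(), f(x)
    tabu_until = np.zeros(n, dtype=int)        # iteration until which each move stays tabu
    for t in range(iters):
        best_move, best_move_f = None, np.inf
        for j in range(n):                     # examine the full bit-flip neighborhood
            y = x.copy()
            y[j] ^= 1
            fy = f(y)
            # a tabu move is admissible only if it satisfies the aspiration criterion
            if t < tabu_until[j] and fy >= best_f:
                continue
            if fy < best_move_f:
                best_move, best_move_f = j, fy
        if best_move is None:                  # all moves tabu and none aspirational
            continue
        x[best_move] ^= 1                      # accept the best admissible neighbor,
        tabu_until[best_move] = t + tenure     # even if it is worse than the current x
        if best_move_f < best_f:
            best_x, best_f = x.copy(), best_move_f
    return best_x, best_f

# Example: minimize the number of zero bits
best_x, best_f = tabu_search_binary(lambda v: np.sum(1 - v), 30)
```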

By introducing parallelism, tabu search can find the promising regions of the
search space very quickly. A parallel tabu search model, which is based on the
crossover operator of GA, has been described in [15]. Theoretical properties of
convergence of tabu search to the optimal solutions has been analyzed in [13].
Diversification-driven tabu search [12] repeatedly alternates between simple tabu
search and a diversification phase founded on a memory-based perturbation operator.
Starting from an initial random solution, the method uses tabu search to reach a local
optimum. Then, perturbation operator is applied to displace the solution to a new
region, whereupon a new round of tabu search is launched. The tabu search procedure
uses a neighborhood defined by single 1-flip moves, which consist of flipping a single
variable x j to its complement value 1 − x j . The diversification strategy utilizes a
memory-based perturbation operator.
CA-TS [1] combines cultural algorithms and tabu search, where tabu search is used
to transform history knowledge in the belief space from a passive knowledge source
to an active one. In each generation of the cultural algorithm, the best individual
solution is calculated and then the best new neighbor of that solution is sought
in the social network for that population using tabu search. In order to speed up
the convergence process through knowledge dissemination, simple forms of social
network topologies are used to describe the connectivity of individual solutions. The
integration of tabu search as a local enhancement process enables CA-TS to leap
over false peaks and local optima.

20.1.1 Iterative Tabu Search

Random search, or pattern search, is a fixed-step-size random search based on basic mathematical analysis. It iteratively moves to better positions in the search space that are sampled from a hypersphere surrounding the current position. The step size significantly affects the performance of such algorithms. The Solis–Wets algorithm is a randomized hill climber with an adaptive step size; it is a general and fast search algorithm with good behavior.
Iterated local search [17] creates a sequence of solutions iteratively according to
a local search heuristic. After a new solution is created by local search, it is modified
by perturbation to escape from local extremum, and an intermediate solution is
produced. A neighborhood-based local search procedure is also designed to return
an enhanced solution. An acceptance measure is also delineated deciding which
solution is selected for further evolution. The new solution replaces the previous
one if it has better quality. The procedure continues until a termination criterion is
satisfied.
Iterated tabu search [18], as a special case of iterative local search, combines
tabu search with perturbation operators to avoid getting stuck in local optima. The
local search phase is replaced by a tabu search phase. At each iteration, solution ŝ is
perturbed resulting in solution s  , which is then improved by tabu search to obtain
solution s̄. If solution s̄ satisfies the acceptance criterion, the search continues with
solution s̄, otherwise the search proceeds with solutions ŝ. The best-known feasible
solution encountered s ∗ and its function value are recorded.
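The iterated tabu search loop can be written compactly. The following Python fragment is only a sketch under our assumptions: `tabu_search` and `perturb` are placeholders for problem-specific components, and a better-only acceptance criterion is used.

```python
def iterated_tabu_search(f, s0, tabu_search, perturb, iters=50):
    """Sketch of iterated tabu search: perturb, improve by tabu search, accept or reject."""
    s_hat = tabu_search(s0)           # improve the initial solution
    s_star, f_star = s_hat, f(s_hat)  # best-known solution and its value
    for _ in range(iters):
        s_prime = perturb(s_hat)      # escape the current local optimum
        s_bar = tabu_search(s_prime)  # improve the perturbed solution
        if f(s_bar) < f(s_hat):       # simple acceptance criterion (better-only)
            s_hat = s_bar
        if f(s_bar) < f_star:
            s_star, f_star = s_bar, f(s_bar)
    return s_star, f_star
```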

Example 20.1: Reconsider the TSP for Berlin52 benchmark in TSPlib, which is
treated in Example 11.1. The length of the optimal tour is 7542 when using Euclidean
distances. In this example, we implement tabu search. We set the maximum number
of iterations as 1000, and the tabu list length as 500.

Figure 20.1 The best TSP solution (global best tour) found by tabu search.



Figure 20.2 The TSP evolution by tabu search: the iterative cost (route length) versus iteration; the best route length is 7782.9844.

For a random run, the best route length obtained is 7782.9844 at the 100th iteration; the best route found is illustrated in Figure 20.1, and the evolution of the run is illustrated in Figure 20.2. Compared to the ACO implementation given in Example 11.1, the implementation given here always converges to a local minimum. A more elaborate strategy is required to help the search get out of local minima.

20.2 Scatter Search

Scatter search [6,10,16] is a population-based metaheuristic algorithm. Initially, scatter search was simply considered as one of the component processes available within the tabu search framework. Like EAs, it constructs new solutions by combining existing ones. It can be used for solving combinatorial and nonlinear optimization problems.
Scatter search and its generalized form called path relinking are intimately related
to tabu search, and they derive additional advantages by using adaptive memory and
associated memory-exploiting mechanisms.
Scatter search starts from the solutions obtained by means of a suitable heuristic
technique. New solutions are then generated on the basis of a subset of the best
solutions obtained from the start. A set of the best solutions is then selected from
these newly found solutions and the entire process is repeated.
Scatter search [11,16] is an ES-like algorithm by including the elitism mechanism
in simplex search. The basic idea of scatter search is the same as that of simplex
search. Given a group of points, the algorithm finds new points, accepts the better
ones and discards the worse ones.

Scatter search explores the solution space by evolving a set of reference points
(solutions) stored in the reference set (RefSet). These points are initially generated
with a diversification method and the evolution of these reference points is induced
by the application of four methods: subset generation, combination, improvement,
and update. Furthermore, these new individuals can be improved by applying a local
search method.
Scatter search is a kind of direction-based method that utilizes the subtraction of
two solutions as the perturbation direction in an evolution episode. A set of solutions
with high evaluation are used to generate new solutions to replace less promising
solutions at each iteration of the implementation process. A local search procedure is
usually applied over each solution of the population and each combined new solution.
The scatter search method builds a reference set (RefSet for short) of solutions to
maintain a good balance between intensification and diversification of the solution
process. Reference set stores b high-quality solutions: RefSet1 with b1 solutions in
terms of objective value, and RefSet2 with b2 = b − b1 solutions in terms of diversity
(crowdedness) and far away from RefSet1 points. With a generation procedure,
subsets are generated from the reference set. A combination procedure is then carried
out to form new solutions from subsets, and the new solutions experience local search
by the improvement procedure to become better solutions. There are update rules to
determine whether an improved solution could enter a reference set.
Scatter search has four main steps. The initialization of scatter search randomly
generates solutions in such a way that the more the individuals generate in one area,
the less opportunity this area will have to generate new ones. This ensures that the
initial solutions of scatter search have maximum diversity. Scatter search then makes
use of simplex search to improve the initial solutions. After that, RefSet1 is selected
from the improvement results according to the objective quality, and RefSet2 is
selected according to the distance to RefSet1 of the remaining improved individuals
(the larger the better). Then the algorithm starts the main loop. The reference set is
used to generate subsets. The solutions in the subsets are combined in various ways
to get Psize new solutions, which are then improved by local search such as simplex
search. If the improvement results in shrinking of the population, diversification
is applied again until the total number of improved solutions reaches the desired
target. Based on the improved solutions, the reference update is applied to construct
the reference set. Then scatter search continues in a loop that consists of applying
solution combination followed by improvement and the reference update. Finally,
the improved solutions will replace some solutions of the reference set if they are
good with respect to objective quality or diversity. This loop terminates when the
reference set does not change and all the subsets have already been subjected to
solution combination. At this point, diversification generation is used to construct a
new Refset2 and the search continues. The whole scatter search is terminated when
the predefined termination criterion is satisfied.
There are four types of subsets to be generated in scatter search: two-element
subsets, three-element subsets, four-element subsets, and subsets containing the best
five elements or more. There are many types of combinations for generating new

solutions from subsets. Let us give an example for a two-element subset, {x_1, x_2}. We can first define a vector starting at x_1 and pointing to x_2 as d = (x_2 − x_1)/2. Three types of recombination are suggested [11]:

x_new = x_1 − r d,    x_new = x_1 + r d,    x_new = x_2 + r d,    (20.1)

where r is a random number uniformly drawn from (0, 1).
Every subset can generate several new solutions according to the composition of
the subset. When both x 1 and x 2 belong to RefSet1, which means that they are all
good solutions, four new solutions are generated by types 1 and 3 once and type
2 twice. When only one of x 1 and x 2 belong to RefSet1, three new solutions are
generated by types 1, 2, 3 once. When neither x 1 nor x 2 belongs to RefSet1, which
means that they are all uncrowded solutions, two new solutions are generated by type
2 once and by type 1 or 3 once.
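The combination rules in (20.1), together with this composition-dependent choice among them, can be sketched as follows. The Python fragment below is an illustration under our own assumptions and is not the scatter search template of [11].

```python
import numpy as np

def combine_pair(x1, x2, in_refset1, seed=None):
    """New solutions from a two-element subset {x1, x2} via the three rules in (20.1).

    in_refset1 is a pair of booleans indicating whether x1 and x2 belong to RefSet1
    (the high-quality part of the reference set).
    """
    rng = np.random.default_rng(seed)
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    d = (x2 - x1) / 2.0                          # direction from x1 toward x2
    type1 = lambda: x1 - rng.random() * d        # step backward from x1
    type2 = lambda: x1 + rng.random() * d        # step between x1 and x2
    type3 = lambda: x2 + rng.random() * d        # step beyond x2
    if all(in_refset1):                          # both solutions are high quality: 4 children
        return [type1(), type2(), type2(), type3()]
    if any(in_refset1):                          # exactly one is high quality: 3 children
        return [type1(), type2(), type3()]
    other = type1() if rng.random() < 0.5 else type3()
    return [type2(), other]                      # both came from the diverse subset: 2 children
```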
Simplex search is used to improve the new solutions. If an improved solution is
better than the worst one in RefSet1, it will replace the worst one. If an improved
solution’s distance to the closest reference set solutions is larger than that of most
crowded solutions in RefSet2, it will replace the most crowded one. If reference
set does not change in the updating procedure and the stop criterion has not been
satisfied, then the initialization procedure will be started to construct a new RefSet2.
It is suggested that Psize = max(100, 5b) [11]. Scatter search can be considered as a (b + Psize)-ES, but the objective value is not the only criterion in the updating (replacement) phase.
Scatter search is given in Algorithm 20.2.
Global search [20], called OptQuest/NLP, is a global optimization heuristic for
pure and mixed integer nonlinear problems with many constraints and variables,
where all problem functions are differentiable with respect to the continuous vari-
ables. The procedure combines the global optimization abilities of OptQuest with
the superior accuracy and feasibility-seeking behavior of gradient-based local NLP
solvers. OptQuest, a commercial implementation of scatter search developed by
OptTek Systems, provides starting points for any gradient-based local NLP solver.

20.3 Path Relinking

Path relinking [7,9] is a metaheuristic originally proposed as a method to integrate intensification and diversification strategies in the context of tabu search [4]. It creates combinations of high-quality solutions by generating paths between selected points in neighborhood space. The approach takes advantage of the path interpretation of solution combinations as a foundation for numerous applications in combinatorial optimization.

Algorithm 20.2 (Scatter Search).

1. Set D = ∅.
2. Repeat:
   Construct a solution x by the diversification method.
   If x ∉ D, then D = D ∪ {x}.
   until |D| = DSize.
3. Build RefSet = {x_1, . . . , x_b} from D with a one-by-one max–min selection.
4. Order the solutions in RefSet by their objective function values such that x_1 is the best.
5. NewSolutions ← TRUE.
6. while (NewSolutions) do:
   a. Generate NewSubsets, which consists of all pairs of solutions in RefSet that include at least one new solution.
      NewSolutions ← FALSE.
   b. while (NewSubsets ≠ ∅) do:
      i. Select the next subset S in NewSubsets.
      ii. Apply solution combination on S to obtain one or more new solutions x.
          if (x ∉ RefSet and f(x) < f(x_b))
             x_b ← x, and reorder RefSet.
             NewSolutions ← TRUE.
          end if
      iii. NewSubsets ← NewSubsets \ S.
      end while
   end while

Path relinking is an intensification strategy to explore trajectories connecting high-quality (elite) solutions [9], generating intermediate solutions that can eventually be
better than the high-quality solutions being connected. Instead of directly generating a
new solution by combining two or more original solutions, it generates paths between
and beyond the selected solutions in the neighborhood space. Path relinking generally
operates by starting from an initial solution, selected from a subset of high-quality
solutions, and generating a path in the neighborhood space that leads toward the other
solutions in the subset, which are called guiding solutions. This is accomplished by
selecting moves that introduce attributes contained in the guiding solutions. The roles
of the initiating and guiding solutions are interchangeable.
Path relinking operates on a set of solutions, called the elite set, typically sorted
from best to worst. Path relinking maintains a reference set of elite and diverse
solutions, and generates new solutions between and beyond initiating and guiding
solutions selected from this set.
Given two solutions S and S' in the elite set, the standard implementation of path relinking, called interior path relinking, starts from the initiating solution S and gradually transforms it into the guiding solution S'. This transformation is accomplished by swapping elements selected in S with elements in S', generating a sequence of intermediate solutions. The elements present in both solutions (S ∩ S') remain selected in the solutions generated on the path between them. The set of elements in S and not in S' is S \ S'. Symmetrically, S' \ S is the set of elements selected in S' and not selected in S. To obtain the first intermediate solution on this path, we remove a single element u ∈ S \ S' and include a single element v ∈ S' \ S, thus obtaining S_1 = S \ {u} ∪ {v}, denoted by S_1 = move(S, u, v). In general, the (k + 1)-th intermediate solution is constructed from the previous solution as S_{k+1} = move(S_k, u, v) with u ∈ S_k \ S' and v ∈ S' \ S_k.
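For set-based solutions, the interior path relinking walk just described can be sketched as below; the greedy choice of which swap to perform at each step and the minimization objective are our own simplifying assumptions.

```python
def interior_path_relinking(S, S_prime, f):
    """Walk from the initiating solution S toward the guiding solution S_prime
    (both represented as sets), evaluating every intermediate solution with f
    (smaller is better) and returning the best solution visited."""
    current, guiding = set(S), set(S_prime)
    best, best_f = current, f(current)
    while current - guiding and guiding - current:
        candidates = []
        for u in current - guiding:              # element to drop from the current solution
            for v in guiding - current:          # element to pick up from the guiding solution
                step = (current - {u}) | {v}     # move(current, u, v)
                candidates.append((f(step), step))
        step_f, step = min(candidates, key=lambda c: c[0])   # greedy choice of the next move
        current = step
        if step_f < best_f:
            best, best_f = step, step_f
    return best, best_f
```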
In the between-form of path relinking (interior path relinking), paths in the neigh-
borhood solution space connecting good solutions are explored between these so-
lutions in the search for improvements. The beyond-form of path relinking, called
exterior path relinking [8], is a variant of the more common interior path relinking.
It explores paths beyond those solutions. This is accomplished by considering an
initiating solution and a guiding solution and introducing in the initiating solution
attributes not present in the guiding solution. To complete the process, the roles of
initiating and guiding solutions are exchanged.

Problems

20.1 Find out the global search mechanism, the convergence mechanism, and the
uphill mechanism of scatter search.
20.2 GlobalSearch solver of MATLAB Global Optimization Toolbox imple-
ments global search algorithm [20] for finding global optimum solution of
smooth problems. Try GlobalSearch solver on a benchmark function given
in the Appendix. Test the influence of different parameters.
20.3 Run the accompanying MATLAB code of tabu search for n-queens problem.
Understand the principle of the algorithm. Investigate how to improve the
result by adjusting the parameters.

References
1. Ali MZ, Reynolds RG. Cultural algorithms: a Tabu search approach for the optimization of
engineering design problems. Soft Comput. 2014;18:1631–44.
2. Cvijovic D, Klinowski J. Taboo search: an approach to the multiple minima problem. Science.
1995;267(3):664–6.
3. Glover F. Future paths for integer programming and links to artificial intelligence. Comput
Oper Res. 1986;13(5):533–49.
4. Glover F. Tabu search-Part I. ORSA J Comput. 1989;1(3):190–206.
5. Glover F. Tabu search-Part II. ORSA J Comput. 1990;2(1):4–32.
6. Glover F. A template for scatter search and path relinking. In: Proceedings of the 3rd European
conference on artificial evolution, Nimes, France, Oct 1997, vol. 1363 of Lecture Notes in
Computer Science. Berlin: Springer; 1997. p. 3–51.
7. Glover F. Tabu search and adaptive memory programming: advances, applications and chal-
lenges. In: Barr RS, Helgason RV, Kennington JL, editors. Interfaces in computer science
and operations research: advances in metaheuristics, optimization, and stochastic modeling
technologies. Boston, USA: Kluwer Academic Publishers; 1997. p. 1–75.

8. Glover F. Exterior path relinking for zero-one optimization. Int J Appl Metaheuristic Comput. 2014;5(3):8 pages.
9. Glover F, Laguna M. Tabu search. Norwell, MA, USA: Kluwer Academic Publishers; 1997.
10. Glover F, Laguna M, Marti R. Fundamentals of scatter search and path relinking. Control
Cybernet. 2000;29(3):653–84.
11. Glover F, Laguna M, Marti R. Scatter search. In: Koza JR, editors. Advances in evolutionary
computation: theory and applications. Berlin: Springer; 2003. p. 519–537.
12. Glover F, Lv Z, Hao JK. Diversification-driven tabu search for unconstrained binary quadratic
problems. 4OR Q J Oper Res. 2010;8:239–53.
13. Hanafi S. On the convergence of tabu search. J Heuristics. 2000;7(1):47–58.
14. James T, Rego C, Glover F. Multistart tabu search and diversification strategies for the quadratic
assignment problem. IEEE Trans Syst Man Cybern Part A. 2009;39(3):579–96.
15. Kalinli A, Karaboga D. Training recurrent neural networks by using parallel tabu search algo-
rithm based on crossover operation. Eng Appl Artif Intell. 2004;17:529–42.
16. Laguna M, Marti R. Scatter search: methodology and implementations in C. Dordrecht: Kluwer
Academic; 2003.
17. Lourenco HR, Martin OC, Stutzle T. Iterated local search: framework and applications. In:
Glover F, Kochenberger G, editors. Handbook of metaheuristics, 2nd ed. Boston, USA: Kluwer
Academic Publishers; 2010. p. 363–397.
18. Misevicius A, Lenkevicius A, Rubliauskas D. Iterated tabu search: an improvement to standard
tabu search. Inf Technol Control. 2006;35:187–97.
19. Siarry P, Berthiau G. Fitting of tabu search to optimize functions of continuous variables. Int
J Numer Methods Eng. 1997;40:2449–57.
20. Ugray Z, Lasdon L, Plummer JC, Glover F, Kelly J, Marti R. Scatter search and local NLP
solvers: a multistart framework for global optimization. INFORMS J Comput. 2007;19(3):328–
40.
21 Search Based on Human Behaviors

Human beings are the most intelligent creatures on this planet. This chapter introduces various search metaheuristics inspired by behaviors of the human creative problem-solving process.

21.1 Seeker Optimization Algorithm

Seeker optimization algorithm [7] is a population-based metaheuristic search algorithm for real-parameter optimization problems that simulates the act of human searching. It operates on a set of solutions called the human search team (or population), and the individuals are called seekers. The choice of search direction is based on an empirical gradient obtained by evaluating the response to position changes, and the choice of step length is based on a simple fuzzy rule.
Unlike PSO and DE, seeker optimization algorithm deals with search direction and step length independently [6,7]. In seeker optimization algorithm, the search direction is determined by a randomized compromise among the seeker's egoistic, altruistic, and proactiveness behaviors, while the step length is given by a fuzzy reasoning rule:

d_i(t) = sign(ω d_{i,pro} + φ_1 d_{i,ego} + φ_2 d_{i,alt}),    (21.1)

where sign(·) is the signum function, ω linearly decreases from 0.9 to 0.1, and φ_1 and φ_2 are random numbers uniformly drawn in [0, 1].
The egoistic and altruistic directions of the ith seeker are defined by

d_{i,ego} = x_{i,best}^p(t) − x_i(t),    (21.2)

d_{i,alt} = x_{best}^g − x_i(t),    (21.3)


where x_i(t) is the position of the ith seeker, x_{i,best}^p is its own personal best position so far, and x_{best}^g is the neighborhood best position so far.
Each seeker may be proactive in changing his search direction according to his past behavior and the environment. The proactiveness direction for seeker i can be determined by the empirical gradient obtained from the latest three positions:

d_{i,pro} = x_i(t_1) − x_i(t_2),    (21.4)

where x_i(t_1) and x_i(t_2) are the best and worst positions from {x_i(t − 2), x_i(t − 1), x_i(t)}, respectively.
The position update is given by

x_i(t + 1) = x_i(t) + α_i(t) d_i(t),    (21.5)

where α_i(t) is a step size, which is given by a Gaussian membership function.
Compared to PSO with inertia weight, PSO with constriction factor and DE, seeker
optimization algorithm has faster convergence speed and better global search ability
with more successful runs for the benchmark functions.
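A single seeker update following (21.1)–(21.5) can be sketched in Python as follows. The fuzzy step-length rule is simplified here to the absolute value of a Gaussian sample, which is our own assumption.

```python
import numpy as np

def seeker_step(x_hist, p_best, g_best, omega, alpha_scale, seed=None):
    """One position update of a seeker following (21.1)-(21.5) for minimization.

    x_hist is a list of the seeker's last three (position, objective) pairs,
    ordered in time: [(x(t-2), f), (x(t-1), f), (x(t), f)].
    """
    rng = np.random.default_rng(seed)
    x_t = np.asarray(x_hist[-1][0], float)
    phi1, phi2 = rng.random(), rng.random()
    ordered = sorted(x_hist, key=lambda p: p[1])          # best first, worst last
    d_pro = np.asarray(ordered[0][0], float) - np.asarray(ordered[-1][0], float)  # (21.4)
    d_ego = np.asarray(p_best, float) - x_t               # egoistic direction (21.2)
    d_alt = np.asarray(g_best, float) - x_t               # altruistic direction (21.3)
    d = np.sign(omega * d_pro + phi1 * d_ego + phi2 * d_alt)       # search direction (21.1)
    alpha = np.abs(rng.normal(0.0, alpha_scale, size=x_t.shape))   # simplified step length
    return x_t + alpha * d                                # position update (21.5)
```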

21.2 Teaching–Learning-Based Optimization

Teaching–learning-based optimization (TLBO) [20–22] is a population-based method inspired from the philosophy of teaching and learning that consists of a
group of learners. It is based on the effect of the influence of a teacher on the output
of learners in a class which is considered in terms of results or grades. The method
consists of teacher phase and learner phase, where the individuals can learn from
the teacher and from the interaction of other individuals, respectively. The teacher
is generally considered as a highly learned person who shares his or her knowledge
with the learners.
In TLBO, the population is considered as a group of learners or a class of learners.
Different design variables will be analogous to different subjects offered to learners
and the learners’ result is analogous to the fitness. The teacher is considered as the
best learner (solution) obtained so far. Each candidate solution x i is characterized
by a string of variables which represent the results of a student that consists of grade
point of different subjects. The students try to improve their results by acquiring
knowledge from the teacher and this process is termed as teaching phase. At the
same time they also improve their performance through mutual interaction with
other students and this is learning phase.
Each n-dimensional individual x i within the population represents the possible
solution to an optimization problem, where n represents the number of subjects of-
fered to the learners. The algorithm attempts to improve the knowledge (represented
by fitness) of each learner through the two learning phases, namely, the teacher phase
and the learner phase. The learners will be replaced if the new solutions produced
during the teacher or learner phases have better fitness. This algorithm will be re-
peated until the termination criteria are met. During the teacher phase, each learner is

learning from the teacher x teacher , who is the best individual in the population. The
learner will move their position toward x teacher , by taking into account the current
mean value of the learners (x mean ) that represents the average qualities of all learners
in the population.
During the teacher phase, the learner x_i updates his/her position by

x_new,i = x_i + r (x_teacher − TF x_mean),    (21.6)

where r is a random number ranging from 0 to 1, and TF is a teaching factor that is used to emphasize the importance of the learners' average qualities x_mean. TF = 1 or 2 is heuristically obtained by TF = round[1 + rand(0, 1)].
In the learner phase, each learner x_i randomly selects a peer learner x_j. It will move toward or away from x_j depending on whether x_j has better fitness than x_i:

x_new,i = x_i + r (x_j − x_i)  if f(x_j) > f(x_i);
x_new,i = x_i + r (x_i − x_j)  if f(x_j) < f(x_i).    (21.7)
TLBO has many features in common with DE.
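The two phases (21.6) and (21.7) translate directly into code. The following Python sketch assumes a minimization problem, so a "better" learner has a lower objective value and the inequalities of (21.7) are applied accordingly; greedy replacement of a learner whenever its new position improves the objective is also assumed. The sketch is an illustration, not a reference implementation, and the search range in the example is our own choice.

```python
import numpy as np

def tlbo(f, lb, ub, pop_size=50, iters=100, seed=None):
    """Minimal TLBO sketch for minimizing f over the box [lb, ub]."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    pop = rng.uniform(lb, ub, (pop_size, len(lb)))
    fit = np.array([f(x) for x in pop])
    for _ in range(iters):
        teacher = pop[np.argmin(fit)]
        mean = pop.mean(axis=0)
        for i in range(pop_size):
            # teacher phase (21.6)
            TF = rng.integers(1, 3)                      # teaching factor, 1 or 2
            new = np.clip(pop[i] + rng.random(len(lb)) * (teacher - TF * mean), lb, ub)
            if (fn := f(new)) < fit[i]:
                pop[i], fit[i] = new, fn
            # learner phase (21.7): interact with a random peer j != i
            j = rng.choice([k for k in range(pop_size) if k != i])
            if fit[j] < fit[i]:                          # peer is better: move toward it
                new = pop[i] + rng.random(len(lb)) * (pop[j] - pop[i])
            else:                                        # peer is worse: move away from it
                new = pop[i] + rng.random(len(lb)) * (pop[i] - pop[j])
            new = np.clip(new, lb, ub)
            if (fn := f(new)) < fit[i]:
                pop[i], fit[i] = new, fn
    return pop[np.argmin(fit)], fit.min()

# Example: 2-D Ackley function, as in Example 21.1 (search range chosen for illustration)
def ackley(x):
    x = np.asarray(x)
    return (-20 * np.exp(-0.2 * np.sqrt(np.mean(x**2)))
            - np.exp(np.mean(np.cos(2 * np.pi * x))) + 20 + np.e)

best_x, best_f = tlbo(ackley, [-32, -32], [32, 32])
```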
Teaching and peer-learning PSO [16] adapts TLBO into PSO. It adopts the teach-
ing and peer-learning phases. The particle first enters into the teaching phase and
updates its velocity based on its historical best and the global best information.
Particle that fails to improve its fitness in the teaching phase then enters into the
peer-learning phase, where an exemplar is selected as the guidance particle. Roulette
wheel selection is employed to ensure that a fitter particle has a higher probability of being selected as the exemplar. Additionally, a stagnation prevention strategy is
employed.
In bare-bones TLBO [26], each learner in the teacher phase employs an interactive learning strategy that hybridizes the teacher-phase learning strategy of TLBO with Gaussian sampling learning based on neighborhood search, while each learner in the learner phase employs either the learner-phase strategy of TLBO or a new neighborhood search strategy. The bare-bones method outperforms TLBO.
TLBO is a parameter-free stochastic search technique. It has gained popularity due to its ability to achieve better results with comparatively faster convergence than GA, PSO, and ABC.
In [25], TLBO is enhanced with the learning experience of other learners. In this method, two random possibilities are used to determine the learning methods of learners in different phases. In the teacher phase, the learners improve their grades by utilizing the mean information of the class and the learning experience of other learners according to a random probability. In the learner phase, a learner acquires knowledge from another learner randomly selected from the whole class, or from the mutual learning experience of two randomly selected learners. An area-copying operator from the producer–scrounger model is also applied to part of the learners to increase the learning speed.

Figure 21.1 The evolution of a random run of TLBO for Ackley function: the minimum and average objectives.

Example 21.1:
We now reconsider the Ackley function, which was solved in Example 7.1. The global minimum value is 0 at x∗ = 0.
We implement TLBO on this problem by setting the population size as 50, the
maximum number of iterations as 100, the teaching factor randomly as 1 or 2, and
selecting the initial population randomly from the entire domain. For a random
run, we have f (x) = 8.8818 × 10−16 at (−0.2663, −0.1530) × 10−15 . The con-
vergence curves are illustrated in Figure 21.1. TLBO always converged toward the
global optimum very rapidly during the random runs. When we take the number
of dimensions as 10, a random run gives the minimum value 1.8854 × 10−10 at
(0.1217, 0.5837, −0.2207, 0.7190, −0.2184, 0.1717, −0.1355, −0.7541, 0.5743,
−0.5536) ×10−10 .
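A minimal Python sketch of the two phases of (21.6) and (21.7), applied to the Ackley function of Example 21.1, is given below for illustration. It is written for minimization, so a learner moves toward a peer with a smaller objective value; the greedy acceptance of improved solutions, the search bounds, and all names are assumptions made for the sketch rather than the exact implementation used for the example.

```python
import numpy as np

def ackley(x):
    # Ackley function; global minimum 0 at the origin
    d = len(x)
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / d))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / d) + 20.0 + np.e)

def tlbo(f, dim=2, pop_size=50, iters=100, lo=-32.0, hi=32.0, seed=None):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(lo, hi, (pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(iters):
        # Teacher phase, Eq. (21.6): move toward the best learner
        teacher = pop[np.argmin(fit)]
        mean = pop.mean(axis=0)
        for i in range(pop_size):
            TF = rng.integers(1, 3)          # teaching factor, 1 or 2
            r = rng.random(dim)
            cand = np.clip(pop[i] + r * (teacher - TF * mean), lo, hi)
            fc = f(cand)
            if fc < fit[i]:                  # keep the learner only if improved
                pop[i], fit[i] = cand, fc
        # Learner phase, Eq. (21.7): interact with a random peer
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            r = rng.random(dim)
            step = (pop[j] - pop[i]) if fit[j] < fit[i] else (pop[i] - pop[j])
            cand = np.clip(pop[i] + r * step, lo, hi)
            fc = f(cand)
            if fc < fit[i]:
                pop[i], fit[i] = cand, fc
    best = np.argmin(fit)
    return pop[best], fit[best]

if __name__ == "__main__":
    x_best, f_best = tlbo(ackley)
    print(x_best, f_best)
```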

21.3 Imperialist Competitive Algorithm

Imperialist competitive algorithm [3,17] is inspired by the human sociopolitical evo-


lution process of imperialistic competition. It is a population-based method for solv-
ing continuous optimization problems. It can be regarded as the social counterpart
of GA.
The algorithm starts by generating a set of random solutions called the initial coun-
tries in the search space. The cost function of the optimization problem determines
the power of each country. Some of the best initial countries, i.e., the countries with
the least cost function value, become imperialists and start taking control of other
countries (called colonies) and form the initial empires.
Two major operators are assimilation and revolution. Assimilation makes the
colonies of each empire get closer to the imperialist state in the space of sociopolitical
characteristics (i.e., search space). Revolution causes sudden random changes in the
characteristics of some of the countries in the search space. During assimilation and
revolution, a colony might reach a better position and has the chance to take control
of the entire empire and replace the current imperialist of the empire. All the empires
try to win imperialistic competition and take possession of colonies of other empires.
Based on their power, all the empires have a chance to take control of one or more of the colonies of the weakest empire. Weak empires lose their power gradually and will finally be eliminated. The algorithm continues with these steps (assimilation, revolution, and competition) until a stopping condition is satisfied.
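As a rough illustration, the assimilation and revolution operators acting on the colonies of one empire might be sketched in Python as below; the uniform assimilation step scaled by the coefficient β, the revolution rate, and the search bounds are assumptions made for the sketch, not a prescription from [3,17].

```python
import numpy as np

rng = np.random.default_rng()

def assimilate(colonies, imperialist, beta=2.0):
    # Move each colony a random fraction (up to beta) of the way
    # toward its imperialist in the search space
    d = imperialist - colonies
    return colonies + rng.uniform(0.0, beta, colonies.shape) * d

def revolve(colonies, rate=0.3, lo=-32.0, hi=32.0):
    # Replace a fraction of colonies with random countries (sudden changes)
    out = colonies.copy()
    mask = rng.random(len(colonies)) < rate
    out[mask] = rng.uniform(lo, hi, (mask.sum(), colonies.shape[1]))
    return out

def update_empire(colonies, imperialist, cost):
    # A colony that becomes cheaper than the imperialist takes over the empire
    costs = np.array([cost(c) for c in colonies])
    best = np.argmin(costs)
    if costs[best] < cost(imperialist):
        colonies[best], imperialist = imperialist.copy(), colonies[best].copy()
    return colonies, imperialist
```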
In [5], imperialist competitive algorithm is combined with a policy-learning func-
tion for solving the TSP. All offspring of each country represent feasible solutions for
the TSP. All countries can grow increasingly strong by learning the effective policies
of strong countries. Weak countries will generate increasingly excellent offspring by
learning the policies of strong countries while retaining the characteristics of their
own countries.

Example 21.2: The global minimum of the Ackley function was solved in Examples 7.1 and 21.1. We now solve the same problem by using imperialist competitive algorithm. We set the number of initial countries as 200, the number of initial imperialists as 8, the number of all colonies as 192, the number of decades as 100, the revolution rate as 0.3, the assimilation coefficient β = 2, the assimilation angle coefficient γ = 0.5, the cost penalizing parameter of all colonies as 0.02, the damping ratio as 0.99, the uniting threshold as 0.02, and α = 0.1. The algorithm stops when just one empire remains. The algorithm always converges to the global optimum very rapidly for random runs. For a random run, we have f (x) = 3.1093 × 10−4 at (0.0269, −0.1065) × 10−3 . The convergence curves are illustrated in Figure 21.2.

Figure 21.2 The evolution of a random run of imperialist competitive algorithm for Ackley function: the minimum and average objectives.

21.4 Several Metaheuristics Inspired by Human Behaviors

League Championship Algorithm


League championship algorithm [14] is a stochastic population-based metaheuristic for continuous global optimization. It mimics the championship process in sport leagues, wherein a number of individuals, acting as sport teams, compete in an artificial league for several weeks (iterations). Based on the league schedule in each week, teams play in pairs and their game outcome is determined in terms of win or loss (or tie), given the playing strength (fitness value) along with the particular team formation/arrangement (solution) followed by each team. Keeping track of the previous week's events, each team devises the required changes in its formation/playing style (a new solution is generated) for the next week's contest, and the championship goes on for a number of seasons (stopping condition). The teams are similar to PSO's particles, but with a quite different way of performing their search. The way in which a new solution associated with a team is generated imitates the match analysis process followed by coaches to design a suitable arrangement for their forthcoming match. In a typical match analysis, coaches modify their arrangement on the basis of their own game experience and their opponent's style of play.
Golden Ball Metaheuristic
Golden ball [18,19] is a multiple-population metaheuristic based on soccer concepts.
It was designed to solve combinatorial optimization problems. In the initialization
phase, the population of players is created. These players are divided among the
different subpopulations called teams. Each team has its own training method or
coach. The competition phase is divided into seasons. Each season is composed of
weeks, in which the teams train independently and face one another creating a league
competition. At the end of every season, a transfer procedure happens, in which the
players and coaches can switch teams. The competition phase is repeated until the
termination criterion is met.

Squeaky Wheel Optimization


Squeaky wheel optimization [12] is a metaheuristic based on the observation that, in combinatorial problems, solutions consist of components which are intricately woven together in a nonlinear, nonadditive fashion.
An initial solution is first constructed by a greedy algorithm. The solution is
then analyzed to assign blame to the components which cause trouble in the solu-
tion, and this information is used to modify the priority order in which the greedy
algorithm constructs the new solutions. This cycle continues until a stopping con-
dition is reached. At each iteration, squeaky wheel optimization does a complete
construction of a solution starting from the empty assignment. Hence, the cycle has
the consequence that problem components that are hard to handle tend to rise in the
priority queue, and components that are easy to handle tend to sink. In essence, this
method finds good quality solutions by searching in two spaces simultaneously: the
traditional solution space and the new priority space.
The method has poor scalability due to the random starting point of the greedy
constructor and slow convergence due to the inability to make small moves in the
solution space. If its construction process only started from a partial solution for each
cycle, squeaky wheel optimization would speed up significantly. In addition, if it were
possible to restrict changes of components to the trouble-makers only, the changes in
the corresponding solutions would be relatively small. Evolutionary squeaky wheel
optimization [1] is designed to improve the intensification by keeping the good
components of solutions and only using squeaky wheel optimization to reconstruct
other poorer components of the solution. It incorporates two additional operators, selection and mutation, into the cycle to improve squeaky wheel optimization. It moves through the space of partial assignments, in contrast to local search (such as SA), which only moves through complete assignments. It is analyzed
in [15] based on Markov chains.
Exchange Market Algorithm
Exchange market algorithm [11] is a metaheuristic method for continuous optimization, inspired by the procedure of trading shares on the stock market. The way in which elites trade stocks on the stock market forms the basis of this optimization algorithm. All of the shareholders try to present themselves to the market as the most successful individuals, and the individuals with less fitness tend to take greater risks. Similar to a real stock market, each individual carries unique trades and risks. Shareholders are ranked after each fitness evaluation. The individuals in the first group, as successful people in the market, remain unchanged in all stages of the market. The second and third groups trade according to separate equations. In a non-oscillating market, the individuals in the second and third groups select stocks which are the same as or close to the shares of the first group; in other words, the algorithm recruits members toward the elite members. In an oscillating market, the individuals in the second and third groups trade with separate relations at high risk; that is, the algorithm searches for unknown points.

Group Counseling Optimization


Group counseling optimization [8,9] is a population-based optimization algorithm
based on emulating the behavior of human beings in life problem-solving through
counseling within a group. The inspiration arises from the various striking points of analogy between group counseling and population-based optimization. A multiobjective version of group counseling optimization [2] gives
promising results in solving multiobjective optimization problems.
Human Learning Optimization
Human learning optimization (HLO) [24] is a metaheuristic inspired by human learning mechanisms, in which an individual learning operator, a social learning operator, a random exploration learning operator, and a relearning operator are developed to generate new solutions and search for the optima by mimicking the human learning process. HLO has been applied to solve multidimensional knapsack problems.
Creative Thinking-Based Optimization
Creative thinking plays an essential role in the progress of human society. Creativity-
oriented optimization [10] is a metaheuristic for continuous optimization problems,
inspired by the creative thinking process. The method is constructed by simplifying
the procedure of creative thinking. The approach has high intelligence, effectiveness,
parallelism and low computational complexity for complex problems. Brain storm
optimization [23] is inspired by the human brainstorming process. It simulates the
problem-solving process of a group of people.
Immigrant Population Search
Immigrant population search [13] is a population-based algorithm for solving con-
strained combinatorial optimization, inspired by the pattern of human population
migration to find better habitats. In this algorithm, the life environment is the solu-
tion space of the problem. Every point of this space is a solution, either feasible or
infeasible, and the quality of life at that point is the value of fitness function for that
solution. Each population group tries to investigate feasible and better habitats.
Democracy-Inspired PSO with the Concept of Peer Groups
The concept of governance in human society is integrated with PSO. Democracy-
inspired PSO with the concept of peer groups [4] solves multidimensional multi-
modal optimization problems by exploiting the concept of peer-influenced topology,
where the particles are given a choice to follow two possible leaders who have been
selected on the basis of a voting mechanism. The leader and the opposition have their
influences proportional to the total number of votes polled in their favor.

Problems

21.1 Write the flow chart of seeker optimization algorithm.


21.2 Write the flow chart of imperialist competitive algorithm.

References
1. Aickelin U, Burke EK, Li J. An evolutionary squeaky wheel optimisation approach to personnel
scheduling. IEEE Trans Evol Comput. 2009;13:433–43.
2. Ali H, Khan FA. Group counseling optimization for multi-objective functions. In: Proceedings
of IEEE congress on evolutionary computation (CEC), Cancun, Mexico, June 2013. p. 705–
712.
3. Atashpaz-Gargari E, Lucas C. Imperialist competitive algorithm: an algorithm for optimization
inspired by imperialistic competition. Proceedings of IEEE congress on evolutionary compu-
tation (CEC), Singapore, September 2007. p. 4661–4666.
4. Burman R, Chakrabarti S, Das S. Democracy-inspired particle swarm optimizer with the con-
cept of peer groups. Soft Comput. 2016, p. 1–20. doi:10.1007/s00500-015-2007-8.
5. Chen M-H, Chen S-H, Chang P-C. Imperial competitive algorithm with policy learning for the
traveling salesman problem. Soft Comput. 2016, p. 1–13. doi:10.1007/s00500-015-1886-z.
6. Dai C, Chen W, Zhu Y, Zhang X. Seeker optimization algorithm for optimal reactive power
dispatch. IEEE Trans Power Syst. 2009;24(3):1218–31.
7. Dai C, Zhu Y, Chen W. Seeker optimization algorithm. In: Wang Y, Cheung Y, Liu H, editors.
Computational intelligence and security, vol. 4456 of Lecture Notes in Computer Science.
Berlin: Springer; 2007. p. 167–176.
8. Eita MA, Fahmy MM. Group counseling optimization: a novel approach. In: Proceedings of
the 29th SGAI international conference on innovative techniques and applications of artificial
intelligence (AI-2009), Cambridge, UK, Dec 2009, p. 195–208.
9. Eita MA, Fahmy MM. Group counseling optimization. Appl Soft Comput. 2014;22:585–604.
10. Feng X, Zou R, Yu H. A novel optimization algorithm inspired by the creative thinking process.
Soft Comput. 2015;19:2955–72.
11. Ghorbani N, Babaei E. Exchange market algorithm. Appl Soft Comput. 2014;19:177–87.
12. Joslin D, Clements DP. Squeaky wheel optimization. J Artif Intell Res. 1999;10:353–73.
13. Kamali HR, Sadegheih A, Vahdat-Zad MA, Khademi-Zare H. Immigrant population search
algorithm for solving constrained optimization problems. Appl Artif Intell. 2015;29:243–58.
14. Kashan AH. League championship algorithm (LCA): an algorithm for global optimization
inspired by sport championships. Appl Soft Comput. 2014;16:171–200.
15. Li J, Parkes AJ, Burke EK. Evolutionary squeaky wheel optimization: a new framework for
analysis. Evol Comput. 2011;19(3):405–28.
16. Lim WH, Isa NAM. Teaching and peer-learning particle swarm optimization. Appl Soft Com-
put. 2014;18:39–58.
17. Nazari-Shirkouhi S, Eivazy H, Ghodsi R, Rezaie K, Atashpaz-Gargari E. Solving the integrated
product mix-outsourcing problem by a novel meta-heuristic algorithm: imperialist competitive
algorithm. Expert Syst Appl. 2010;37(12):7615–26.
18. Osaba E, Diaz F, Onieva E. A novel meta-heuristic based on soccer concepts to solve routing
problems. In: Proceedings of the 15th ACM annual conference on genetic and evolutionary
computation (GECCO), Amsterdam, The Netherlands, July 2013. p. 1743–1744.
19. Osaba E, Diaz F, Onieva E. Golden ball: a novel metaheuristic to solve combinatorial opti-
mization problems based on soccer concepts. Appl Intell. 2014;41(1):145–66.
20. Rao RV, Patel V. An elitist teaching-learning-based optimization algorithm for solving complex
constrained optimization problems. Int J Ind Eng Comput. 2012;3:535–60.
21. Rao RV, Savsania VJ, Balic J. Teaching-learning-based optimization algorithm for uncon-
strained and constrained real-parameter optimization problems. Eng Optim. 2012;44:1447–62.
22. Rao RV, Savsani VJ, Vakharia DP. Teaching-learning-based optimization: an optimization
method for continuous non-linear large scale problems. Inf Sci. 2012;183(1):1–15.
23. Shi Y. Brain storm optimization algorithm. In: Advances in swarm intelligence, Vol. 6728 of
Lecture Notes in Computer Science. Berlin: Springer; 2011. p. 303–309.
24. Wang L, Yang R, Ni H, Ye W, Fei M, Pardalos PM. A human learning optimization algorithm
and its application to multi-dimensional knapsack problems. Appl Soft Comput. 2015;34:736–
43.
25. Zou F, Wang L, Hei X, Chen D. Teaching-learning-based optimization with learning experience
of other learners and its application. Appl Soft Comput. 2015;37:725–36.
26. Zou F, Wang L, Hei X, Chen D, Jiang Q, Li H. Bare-bones teaching-learning-based optimiza-
tion. Sci World J. 2014; 2014: 17 pages. Article ID 136920.
22 Dynamic, Multimodal, and Constrained Optimizations

This chapter treats several hard problems associated with metaheuristic optimization,
namely, dynamic, multimodal, and constrained optimization problems.

22.1 Dynamic Optimization

For dynamic optimization problems (DOPs, http://www.dynamic-optimization.org),


the evaluation function and/or problem-specific constraints, such as design variables
and environmental conditions, may change over time. In such cases, the goal of an
optimization algorithm is no longer to find a satisfactory solution to a fixed problem,
but to track the moving optimum in search space as closely as possible within the
time specified by the rate of change.
The simplest strategy to cope with a change of the environment is to regard every
change as the arrival of a new optimization problem that has to be solved from
scratch. However, this strategy generally requires substantial computational efforts.
Therefore, the goal is no longer to locate a stationary optimal solution, but to track
its movement through the solution and time spaces as closely as possible.
EAs often place emphasis on adaptability. Once converged, EAs cannot adapt
well to the changing environment. Several specific strategies have been proposed for
EAs on DOPs, including diversity reinforcing or maintaining schemes [16,30,78],
memory schemes [7,80], multipopulation schemes [8,72], adaptive schemes [48,55],
multiobjective optimization methods [12], and problem change detection approaches
[60].
The random immigrants scheme addresses dynamic environments by maintaining
the population diversity throughout the run via introducing new individuals into the
current population, while the memory scheme aims to adapt EAs quickly to new
environments by reusing historical information. Random immigrants are usually
beneficial to improve the performance of GAs in dynamic environments.


22.1.1 Memory Scheme

Memory scheme works by implicitly using redundant representation or explicitly


storing good solutions, usually the best ones of the population, regularly during the
run in an extra memory and reusing them when the environment changes. Memory works especially well when the environment changes cyclically; indeed, the memory technique is only useful when periodic changes occur.
A memory-based immigrants and an elitism-based immigrants scheme are pre-
sented in [78] for GAs in dynamic environments. In these schemes, the best individual
from memory or the elite from the previous generation is retrieved as the base to
create immigrants into the population by mutation. This way, not only can diversity
be maintained but it is done more efficiently to adapt GAs to the current environ-
ment. The memory-based immigrants scheme combines the principles of memory
and random immigrants and consistently improves the performance of GAs in dy-
namic environments. On the other hand, the elitism-based immigrants scheme has
inconsistent effect on the performance of GAs in dynamic environments. When the
environment involves slight changes consistently, elitism-based immigrants scheme
outperforms memory-based immigrants scheme.
The application of the memory scheme for PBIL algorithms is investigated for
DOPs in [1]. A PBIL-specific associative memory scheme, which stores best solu-
tions as well as corresponding environmental information in the memory, is investi-
gated to improve its adaptability in dynamic environments. In [80], the interactions
between the memory scheme and random immigrants, multipopulation, and restart
schemes for PBILs in dynamic environments are investigated. A dynamic environ-
ment generator that can systematically generate dynamic environments of different
difficulty with respect to memory schemes is also proposed. The proposed memory
scheme is efficient for PBILs in dynamic environments.
Many experimental studies have shown that locating and tracking a set of optima
rather than a single global optimum is an effective idea to solve DOPs [56,79].

22.1.2 Diversity Maintaining or Reinforcing

Among the techniques for maintaining or reinforcing the diversity of EAs, immigrants schemes are the simplest to implement and have been validated to be efficient [16,30,72,77,83]. Immigrants schemes attempt to maintain the diversity of the population by introducing new individuals into the current population.
The existing immigrants schemes are categorized into direct and indirect immi-
grants schemes. The random immigrants scheme works well in environments where
there are occasional, large changes in the location of the optimum.
The direct immigrants scheme generates immigrants based on the current population. An example is the elitism-based immigrants scheme [77], in which immigrants are obtained by mutating the elite of the previous generation, for slowly and slightly changing environments.
rent population, then generates immigrants according to the model. In [83], a vector
with the allele distribution of the population was first calculated and then was used
to generate immigrants for GAs to address DOPs with some preliminary results.
As to the number of immigrants, in order to prevent immigrants from disrupting
the ongoing search progress too much, the ratio of the number of the immigrants to
the population size, i.e., the replacement rate, is usually set to a small value, e.g., 0.2
or 0.3.
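A minimal sketch of a random immigrants step for a real-coded population is shown below; the default replacement rate of 0.2 and the choice of replacing the worst individuals are common conventions assumed here for illustration.

```python
import numpy as np

def random_immigrants(pop, fitness, lo, hi, rate=0.2, rng=None):
    """Replace the worst individuals with random immigrants.

    pop: (N, d) array of real-coded individuals; fitness: length-N array
    (smaller is better); rate: replacement rate, typically 0.2-0.3.
    """
    rng = rng or np.random.default_rng()
    n_imm = max(1, int(rate * len(pop)))
    worst = np.argsort(fitness)[-n_imm:]        # indices of the worst individuals
    pop[worst] = rng.uniform(lo, hi, (n_imm, pop.shape[1]))
    return pop, worst
```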
Some special diversity schemes have been developed for PSO in dynamic envi-
ronments. In charged PSO [5], a nucleus of neutral particles is surrounded by some
charged particles. The charge imposes a repulsion force between particles and thus
hinders the swarm from converging. Other techniques include quantum particles [6] based on a quantum model, the replacement of global by local neighborhoods [41], and hierarchical neighborhood structures [36].

22.1.3 Multiple Population Scheme

Multiple population methods [6,41,56,79] are used to enhance the population diver-
sity for an algorithm with the aim of maintaining multiple populations in different
subareas in the fitness landscape. One challenging issue of using the multipopulation
method is that of how to create an appropriate number of subpopulations with an ap-
propriate number of individuals to cover different subareas in the fitness landscape.
Clustering particle swarm optimizer of [79] can solve this problem. A hierarchical
clustering method is employed to automatically create a proper number of subpopu-
lations in different subareas. A hierarchical clustering method is investigated in [39]
to locate and track multiple optima for dynamic optimization problems.
In another multi-swarm approach [56], the number and size of swarms are adjusted
dynamically by a speciation mechanism, which was originally proposed for finding
multiple optima in multimodal landscapes.
The dynamic forecasting genetic program (DyFor GP) model [74] is a dynamic
GP model that is specifically tailored for forecasting in nonstatic environments. It
incorporates features that allow it to adapt to changing environments automatically,
as well as retain knowledge learned from previously encountered environments.
By adapting the concept of forking GA [70] to time-varying multimodal opti-
mization problems, multinational GA [72] uses multiple GA populations known as
nations to track multiple peaks in a dynamic environment, with each nation having
a policy representing the best point of the nation.
The self-organizing scouts approach [8] divides the population into a parent popu-
lation that searches the solution space and child populations that track known optima.
The parent population is periodically analyzed for clusters of partly converged indi-
viduals which are split off as child populations centered on the best individual in the
child population. Members of the parent population are then excluded from the child
population’s space. The size of child populations is altered to give large populations
to optima demonstrating high fitness or dynamism.

Metric for Dynamic Optimization


A metric for evaluating dynamic optimization is defined by
$$e = \frac{1}{N_c} \sum_{c=1}^{N_c} \left( f_{opt}^{(c)} - f_{best}^{(c)} \right), \qquad (22.1)$$
where $N_c$ is the number of changes in the problem, and $f_{opt}^{(c)}$ and $f_{best}^{(c)}$ are the function values of the problem global optimum and the algorithm's best solution before change c, respectively.
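In code, the metric of (22.1) is simply the average of the best-of-period errors; the sketch below assumes that the optimum value and the algorithm's best value have been recorded just before each change, following the maximization convention of the formula.

```python
def dynamic_error(f_opt, f_best):
    """Error metric of Eq. (22.1): mean gap between the global optimum value
    and the algorithm's best value recorded before each environment change.
    Written for a maximization convention, so f_opt[c] >= f_best[c]."""
    assert len(f_opt) == len(f_best)
    return sum(o - b for o, b in zip(f_opt, f_best)) / len(f_opt)

# e.g. dynamic_error([10.0, 9.5, 11.0], [9.8, 9.5, 10.2]) -> 0.333...
```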

22.2 Multimodal Optimization

Multimodal optimization is a generic and large class of optimization problems, where


the search space is split into regions containing local optima. The objective is to
identify a number of local optima and to maintain these solutions while continuing
to search other local optima. Each peak in the solution landscape can be treated as
a separate environment niche. A niche makes a particular subpopulation (species)
unique. The niching mechanism embodies both cooperation and competition.
An ecosystem is composed of different physical spaces (niches) that allow the
formation and maintenance of different types of species. A species is formed by
individuals with similar biological features that can breed among themselves, but
cannot breed with individuals of other species. A species adapts to the specific
features of the niche where it lives. The fitness of an individual measures its ability to
exploit environmental resources to generate offspring. In artificial systems, a niche
corresponds to a peak of the fitness landscape, while a species to a subpopulation of
individuals that have similar features. Niches are thus partitions of an environment,
and species are partitions of a population competing within the environment [34].
Horn defines implicit niching as the sharing of resources, and explicit niching as the
sharing of fitness [34].
Niching [50] is referred to as the technique of finding and preserving multiple
stable niches, or favorable parts of the solution space possibly around multiple solu-
tions, so as to prevent convergence to a single solution. For each optimum solution,
a niche is formed in the population of an EA. Besides multimodal problems, nich-
ing techniques are also frequently employed for solving multiobjective and dynamic
optimization problems [35].
The two main objectives of niching are to converge to multiple, highly fit, and
significantly different solutions, and to slow down convergence in cases where only
one solution is required. Niching can be implemented by preventing or suppressing
the crossover of solutions that are dissimilar to each other. It is also necessary to
prevent a large subpopulation from creating a disproportionate number of offspring.
Niching techniques are usually based on the ideas of crowding [21,49,69], fitness
sharing [29], clustering [82], restricted tournament selection [32], and speciation
[40,54,58].

Generally, the niching methods can be divided into two major categories: sequen-
tial niching and parallel niching. Sequential niching develops niches sequentially
over time. As niches are discovered, the search space of a problem is adapted to
repel other individuals from traversing the area around the recently located solu-
tion. The sequential niching technique [4] modifies the evaluation function in the
region of the solution to eliminate the solution found once an optimum is found. GA
continues the search for new solutions without restarting the population. In order to
avoid repeated search within previously visited areas, individuals in the vicinity of
a discovered optimum are punished by a fitness derating function.
Parallel niching forms and maintains several different niches simultaneously. The
search space is not modified. Parallel niching techniques therefore not only depend
on finding a good measure to locate possible solutions, but also need to organize
individuals in a way that maintains their organization in the search space over time,
to populate locations around solutions [29,34]. Most multimodal GAs adopt a parallel
scheme [27,29,40,54,70,71,82].
The dynamic niche clustering approach [27] starts from N small niches with given initial radii. It merges niches approaching the same optimum and splits niches focusing on different optima. Each niche has an independent radius, which is dynamically adjusted, with an initial radius $\sigma_{initial} = \lambda\sqrt{d}/N^{1/d}$, where d is the dimensionality of the problem at hand and λ is a constant. Dynamic niche clustering is able to identify niches of variable radii. It also allows some overlap between niches. In [22], each individual has its own radius, and the niche radius is incorporated as an additional variable of the optimization problem.
Crossover between individuals from different niches may lead to unviable off-
spring and is usually avoided. It introduces a strong selection advantage to the niche
with the largest population, and thus accelerates symmetry breaking of the search
space and causes the population to become focused around one region of the search
space. This, however, prevents a thorough exploration of the fitness landscape and
makes it more likely to find a suboptimal solution.

22.2.1 Crowding and Restricted Tournament Selection

A crowd is a subpopulation and crowding is inspired by a naturally occurring phenomenon in ecologies, namely, competition among similar individuals for limited
resources. Similar individuals compete to occupy the same ecological niche, while
dissimilar individuals do not compete, as they do not occupy the same ecological
niche. When a niche has reached its carrying capacity, older individuals are replaced
by newer individuals. The carrying capacity of the niche does not change, so the pop-
ulation size will remain constant. Crowding makes individuals within a single niche
compete with one another over limited resources, and thus prevents a single geno-
type from dominating a population and allows other less-fit niches to form within
the population.
Crowding was originally devised as a diversity preservation technique [21]. For
GA, at each step, the crowding algorithm selects only a portion of the current generation to reproduce. In each generation, a fraction of the population dies and is replaced by new offspring. A newly generated individual replaces the individual that
replaced by new offspring. A newly generated individual replaces the individual that
is most similar to it from a pool of randomly selected individuals. Crowding factor
CF is used to control the size of the sample. CF is generally set to 2 or 3. The com-
putational complexity of crowding is O(N) for a population size of N. Crowding is
simple. However, replacement error is the main disadvantage of crowding. A higher
crowding factor (e.g., one equal to the population size) could avoid replacement
errors [69]. However, when optima are close to one another, the replacement errors
may still occur since a new individual may replace another similar individual that
belongs to a different species.
Deterministic crowding [49,50] tries to improve the original crowding. It elimi-
nates niching parameter CF, reduces the replacement errors, and restores selection
pressure. This method also faces the problem of loss of niches, as it also uses local-
ized tournament selection between similar individuals. In deterministic crowding,
each offspring competes only with its most similar parent. Both the original crowd-
ing and deterministic crowding methods select better individuals deterministically.
In GA with crossover, deterministic crowding works as follows. In every generation,
the population is partitioned into λ/2 pairs of individuals, assuming λ to be even.
These pairs are then recombined and mutated. Every offspring then competes with
one of its parents and may replace it if the offspring is not worse. When crossover is not used, the same idea of offspring competing with their parents can be adapted for a steady-state mutation-based algorithm. In deterministic crowding, phenotypic
similarity measures are used instead of genotypic measures. Deterministic crowd-
ing compares an offspring only to its parents and not to a random sample of the
population. Random selection is used to select individuals for reproduction.
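A compact sketch of the deterministic crowding replacement rule for one mated pair is given below, assuming a real-coded representation, Euclidean distance as the phenotypic similarity measure, and a minimization objective.

```python
import numpy as np

def deterministic_crowding(parents, offspring, f):
    """One replacement step: each offspring competes with its closest parent.

    parents, offspring: (2, d) arrays for one mated pair (real-coded);
    f: objective to minimize. Returns the two survivors.
    """
    p1, p2 = parents
    c1, c2 = offspring
    # Pair each child with the parent it most resembles (Euclidean distance)
    if (np.linalg.norm(p1 - c1) + np.linalg.norm(p2 - c2)
            <= np.linalg.norm(p1 - c2) + np.linalg.norm(p2 - c1)):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    # An offspring replaces its parent if it is not worse
    return np.array([c if f(c) <= f(p) else p for p, c in pairs])
```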
To prevent the loss of niches with lower fitness or loss of local optima, probabilistic
crowding [52] uses a probabilistic replacement rule which replaces the lower fitness
individuals by higher fitness individuals in proportion to their fitnesses. This technique
leads to high diversity. On the other hand, this method suffers from slow convergence
and poor fine searching ability. Probabilistic crowding is a tournament selection
algorithm using distance-based tournaments, and it employs a probabilistic rather
than a deterministic acceptance function as basis for replacement.
Generalized crowding [26] encompasses both deterministic and probabilistic
crowding, and allows better control over the selection pressure.
In a spirit similar to crowding, restricted tournament selection [32] method selects
a random sample of w (window size) individuals from the population and determines
which one is the closest to the offspring, by either Euclidean (for real variables)
or Hamming (for binary variables) distance measure. The closest member within
the w individuals will compete with the offspring and the one with higher fitness
will survive in the next generation. The complexity is O(Nw). The method has
replacement error.
In [53], niching using crowding techniques is investigated in the context of local
tournament algorithms. The family of local tournament algorithms includes Metropo-
lis algorithm, SA, restricted tournament selection, and parallel recombinative SA. An
algorithmic and analytical framework is used to analyze the probabilistic crowding


algorithm.
Crowding DE [69] extends DE with a crowding scheme to allow it to tackle
multimodal optimization problems. Crowding DE, with a crowding factor equal to the
population size, has outperformed sharing DE on standard benchmarks. In crowding
DE, when an offspring is generated, its fitness is only compared with the most similar
(in terms of the Euclidean distance) individual in the current population. Crowding
DE limits the competition between nearest members to maintain the diversity.

22.2.2 Fitness Sharing

Fitness sharing was originally introduced as a diversity-maintenance technique [29].


It is a method for creating subpopulations of like individuals. It is a parallel, explicit
niching approach. The algorithm regards each niche as a finite resource, and shares
this resource among all individuals in the niche. Sharing in EAs is implemented
by scaling the fitness of an individual based on the number of similar individuals
present in the population. The fitness fi of individual i is adapted to its shared fitness.
Fitness sharing treats fitness as a shared resource among similar individuals, and
thus prevents a single genotype from dominating the population and encourages the
development of new niches. Crowding is simple, while sharing is far more complex,
yet far more effective in multimodal optimization.
Sharing also encourages the search in unexplored regions of the space by in-
creasing the diversity of the population. However, specifying the niching parameter
σshare requires prior knowledge of how far apart the optima are. Sharing method is
computationally more expensive than other commonly used niching techniques.
The similarity between individuals x and y is measured by a sharing function
sh(x, y) ∈ [0, 1], where a large value corresponds to large similarity. The idea is that
if there are several copies of the same individual in the population, these individuals
have to share their fitness. As a consequence, selection is likely to remove such
clusters and to keep the individuals apart. It is a common practice to use a sharing
distance a such that individuals only share fitness if they have a distance less than a.
Fitness sharing attempts to maintain the diversity of the population by altering the
fitnesses of potential optima to approximately the same level. A niche is a group of
individuals within the same vicinity of a potential optimum. In each niche, the raw
fitness $f_i$ of an individual i is scaled in the following manner
$$f_{sh,i} = \frac{f_i}{m_i}, \qquad (22.2)$$
where $m_i$ is a niche count given by
$$m_i = \sum_{j=1}^{N} sh(d_{ij}), \qquad (22.3)$$
N being the total number of individuals in the population.



The sharing function $sh(d_{ij})$ is defined by
$$sh(d_{ij}) = \begin{cases} 1 - \left(\dfrac{d_{ij}}{\sigma_{share}}\right)^{\alpha}, & \text{if } d_{ij} < \sigma_{share} \\ 0, & \text{otherwise,} \end{cases} \qquad (22.4)$$
where α is a constant used to configure the shape of the sharing function, $d_{ij}$ is the phenotypic or genotypic distance between two individuals i and j, and the niche radius
$$\sigma_{share} = \frac{\sqrt{d}}{2\, p^{1/d}}, \qquad (22.5)$$
d being the dimensionality of the problem at hand and p the estimated number of optima in the search space, which is usually unavailable. It is assumed that niches occur at least a minimum distance, $2\sigma_{share}$, from each other.
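The shared fitness of (22.2)–(22.4) can be computed directly from pairwise distances, as in the sketch below for a real-coded population with a fitness to be maximized; the double loop makes visible the quadratic cost discussed next.

```python
import numpy as np

def shared_fitness(pop, fit, sigma_share, alpha=1.0):
    """Fitness sharing, Eqs. (22.2)-(22.4): divide each raw fitness by its
    niche count. pop: (N, d) array; fit: raw fitnesses (to be maximized)."""
    n = len(pop)
    m = np.zeros(n)
    for i in range(n):
        for j in range(n):
            d_ij = np.linalg.norm(pop[i] - pop[j])
            if d_ij < sigma_share:                  # sharing function sh(d_ij)
                m[i] += 1.0 - (d_ij / sigma_share) ** alpha
    # m[i] >= 1 since sh(0) = 1, so the division is always defined
    return np.asarray(fit, dtype=float) / m
```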
Standard fitness sharing suffers from a time complexity of O(N²), incurred by
the computation of the sharing function. One of the major improvements is the
introduction of cluster analysis [27,54,82]. Individuals are clustered into different
groups based on their location in the search space. When computing the sharing
function value of an individual, the individual is compared to the center of each
cluster, rather than all other individuals in the population. Hence, the overall cost
is reduced to O(NK), for K clusters. Sharing DE [69] integrated the fitness sharing
concept with DE.
Dynamic fitness sharing [15] allows an explicit, dynamic identification of the
species discovered at each generation, their localization on the fitness landscape, the
application of the sharing mechanism to each species separately, and a species elitist
strategy. It performs significantly better than fitness sharing and species-conserving
GA without requiring any further assumption on the fitness landscape than those
assumed by the fitness sharing itself.
It is known, both empirically and through theoretical analysis, that fitness sharing
is fairly robust to parameterization [18,34].
Restricted mating techniques do not select a mate uniformly at random, and have
been successfully developed for specific contexts such as fitness sharing [18] and
incest prevention [23]. Fitness sharing uses a restricted mating approach to avoid the
creation of lethal (low fitness) individuals, and restricted mating among individuals
of the same niche is promoted. Incest prevention [23] promotes restricted mating
between dissimilar enough individuals. In comparison to random mating, similarity-
based restricted mating has been shown to produce a more effective exploration of
the search space, both in fitness sharing [18] and in incest prevention [23].

22.2.3 Speciation

In general, speciation refers to the partitioning of a population into subpopulations


such that each of them occupies a different region of attraction on the fitness land-
scape. That is, each subpopulation is expected to independently cover a peak and its
surrounding region. The idea of speciation is commonly used in multimodal opti-
mization [40,58].

Nonspeciation-based niching techniques try to maintain diversity at the individual


level. Speciation-based niching techniques rely on speciation to partition a population
into subpopulations (species) such that each occupies a different region of attraction
(niche) on the fitness landscape.
Speciation method also needs to specify a radius parameter rs . The center of a
species is called species seed. Each of the species is built around the dominating
species’ seed. All individuals that fall within the radius from the species seed are
of the same species. In this way, the whole population is classified into different
groups according to their similarity. The complexity of speciation is between O(N)
and O(N²). Speciation is able to maintain high diversity and stable niches over
generations.
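The species-seed partitioning described above can be sketched as follows: individuals are scanned in descending order of fitness, and each either joins the species of the first seed lying within the radius rs or founds a new species. The function name and return values are illustrative.

```python
import numpy as np

def find_species_seeds(pop, fit, r_s):
    """Partition a population into species using a radius parameter r_s.

    pop: (N, d) array; fit: fitnesses (larger is better).
    Returns the seed indices and, for each individual, the index of its seed.
    """
    order = np.argsort(fit)[::-1]          # best individuals first
    seeds, species_of = [], np.empty(len(pop), dtype=int)
    for i in order:
        for s in seeds:
            if np.linalg.norm(pop[i] - pop[s]) <= r_s:
                species_of[i] = s          # joins an existing species
                break
        else:
            seeds.append(i)                # becomes a new species seed
            species_of[i] = i
    return seeds, species_of
```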
Species conservation techniques take explicit measures to preserve identified
species from one generation to another [40]. Species-wise local evolution techniques
evolve different species independently to facilitate convergence to their respective
optima [41,42,71]. Two state-of-the-art EAs for multimodal optimization, namely,
biobjective multipopulation GA [81] and topological species conservation version 2
[66], are both speciation-based.
Speciation methods mainly differ in the way they determine whether two individu-
als are of the same species, and can thereby be categorized into distance-based [56],
or topology-based [72]. Distance-based methods rely on the intuition that closer
individuals are more likely to be of the same species. Typically, a fixed distance
threshold, called the niche radius, is specified [29]. In the CMA-ES niching algorithm,
there is a niche radius for each individual, which is dynamically adapted according
to the step size of that individual [63]. For spherically shaped niches, Mahalanobis
distance is used based on the self-adapted covariance matrix of each individual in
CMA-ES [64]. An important advantage of these distance-based methods is that they
rely on distance calculations only and do not refer to the fitness landscape for any
information.
Topology-based methods are more flexible and have fewer assumptions than
distance-based methods. They rely on the intuition that two individuals should be of
different species if there exists a valley between them on the fitness landscape. This
methodology is able to form species of varying sizes and shapes. Topology-based
methods do not use the niche radius. Existing topology-based methods all require
sampling and evaluating new individuals to capture the landscape topography, incur-
ring extra fitness evaluations.
Hill-valley is a widely used topology-based method to determine whether two
individuals are of the same species. Hill-valley was originally integrated in multi-
national GA [71,72]. It was employed by other EAs for multimodal optimization,
including dynamic niche clustering [27], crowding clustering GA [47], and topo-
logical species conservation [66]. To determine whether two points are of the same
species, hill-valley examines the landscape topography along the line segment con-
necting the two points. If there exists a third point on the line segment whose fitness
is lower than that of both points, a valley that separates the two points to different
hills is then identified, and the two points are determined to be of different species.

History-based topological speciation [44] is a parameter-free speciation method


that relies exclusively on search history to capture the landscape topography and,
therefore, does not require any additional fitness evaluations. The crucial task is
to approximate a continuous point sequence along a line segment using evaluated
history points.
Species-conserving GA [40] divides the population into several species. All dom-
inant individuals, called the species seed, are finally filtered by an acceptance thresh-
old. Individuals with fitness above the threshold are identified as global optima. The
only difference between this approach and GA is the introduction of two processes:
the selection of seeds and the conservation of species.
Species-conserving GA and opt-aiNet do not consider any sharing mechanism.
Once a new species is discovered, its fittest individual is retained in the next gener-
ations until a fitter individual for that species is generated, thus realizing a sort of
elitism. However, such a behavior implies that each species populating a region of the
fitness landscape survives during the entire evolution, whether or not it corresponds
to an actual niche. In addition, the number of individuals forming a species is not
related to the niche carrying capacity.
There are also speciation-based DE [42] and speciation-based PSO [41], which form species based on Euclidean distance.

22.2.4 Clearing, Local Selection, and Demes

Clearing eliminates similar individuals and maintains the diversity among the se-
lected individuals. Clearing [58] determines the dominant individuals of the sub-
populations and removes the remaining population members from the mating pool.
The algorithm first sorts the population members in descending order of their fit-
ness values. It then picks one individual at a time from the top and removes all the
individuals with worse fitness than the selected one within the specified clearing ra-
dius σclear . This step is repeated until all the individuals in the population are either
selected or removed. The complexity of clearing is O(cN), for c niches maintained
during the generations. Clearing is simpler than sharing. It is also able to preserve
the best elements of the niches during the generations. However, clearing can be
slow to converge and may not locate local optima effectively. In clearing, the cleared
individuals still occupy population slots. In [65] these individuals are reallocated
outside the range of their respective fittest individuals. It is known that clearing is
particularly sensitive to parameterization [40].
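A sketch of the basic clearing procedure with a niche capacity of one is given below; it assumes real-coded individuals, a fitness to be maximized, and nonnegative fitness values so that cleared individuals can be reset to zero.

```python
import numpy as np

def clearing(pop, fit, sigma_clear, capacity=1):
    """Clearing: keep the best `capacity` individuals per niche and reset the
    fitness of the remaining individuals inside the clearing radius to zero."""
    fit = np.asarray(fit, dtype=float).copy()
    order = np.argsort(fit)[::-1]                  # descending fitness
    for idx, i in enumerate(order):
        if fit[i] <= 0.0:
            continue                               # already cleared
        kept = 1                                   # the winner i itself
        for j in order[idx + 1:]:
            if fit[j] > 0.0 and np.linalg.norm(pop[i] - pop[j]) < sigma_clear:
                if kept < capacity:
                    kept += 1
                else:
                    fit[j] = 0.0                   # cleared individual
    return fit
```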
In local selection scheme [51], fitness is the result of an individual’s interaction
with the environment and its finite shared resources. Individual fitnesses are com-
pared to a fixed threshold to decide as to who gets the opportunity to reproduce.
Local selection is an implicitly niched scheme. It maintains genetic diversity in a
way similar to, yet generally more efficient than, fitness sharing. Local selection is
suitable for parallel implementations. It can effectively avoid premature convergence
and it applies minimal selection pressure upon the population.

Another technique is to split the population into subpopulations or demes. The


demes evolve independently except for an occasional migration of individuals be-
tween demes. In cellular GA, local mating explores the peak in each deme, and finds
and maintains multiple solutions. Forking GA [70] is such a technique for multimodal
optimization. Depending on the convergence status and the solution obtained so far,
forking GA divides the search space into subspaces. It is a multipopulation scheme
that includes one parent population and one or more child populations, each exploit-
ing a subspace [70]. A parent population continuously searches for new peaks, while
a number of child populations try to exploit previously detected promising areas.

22.2.5 Other Methods

In [66], the conservation of the best successive local individuals is integrated with
a topological method of separating the subpopulations instead of the conventional
radius-triggered manner.
Some niching techniques integrated with PSO are given in [11,56]. In [43], a
simple lbest PSO employing the ring topology is used to ensure stable niching be-
haviors. Index-based neighborhood is utilized in ring-topology-based PSO to control
the speed of convergence for the PSO population.
In [9], a neighborhood-based mutation is integrated with three different DE nich-
ing algorithms, namely, crowding DE, species-based DE, and sharing DE to solve
multimodal optimization problems. In neighborhood mutation, difference vector gen-
eration is limited to m similar individuals. In this way, each individual is evolved
toward its nearest optimal point and the possibility of between niche difference vec-
tor generation is reduced. Generally, m should be chosen between 0.05 to 0.2 of the
population size. In [59], an Euclidean neighborhood-based mutation is integrated
with various niching DE algorithms. Neighborhood mutation is able to restrict the
production of offspring within a local area or the same niche as their parents.
In addition to genetic-operator-based techniques such as clearing [58] and species-conserving GA [40], there are also some population-based multimodal GA techniques, such as multinational GA [71], multipopulation GA, forking GA, and roaming. Forking GA [70] uses a multipopulation scheme, which involves one parent
population that explores one subspace and one or more child populations that exploit
other subspaces. Multinational GA [71] maintains a number of nations. Each nation
corresponds to a promising optimum area in the search space. Mating is locally re-
stricted within individual nation. Selection is performed either globally (weighted
selection) or locally (national selection). In multinational GA with national selection,
individuals only compete with other individuals from the same nation.
CMA-ES with self-adaptive niche radius [64] addresses the so-called niche radius problem by introducing radii-based niching methods with derandomized ES, in which a concept of an adaptive individual niche radius is applied to niching with CMA-ES.

By adapting crossover, mutation, and selection parameters, ACROMUSE [28]


creates and maintains a diverse population of highly fit (healthy) individuals, ca-
pable of adapting quickly to fitness landscape change and well-suited to efficient
optimization of multimodal fitness landscapes. Standard population diversity (SPD)
measure is employed to adapt crossover and mutation, while selection pressure is
controlled by adapting tournament size according to healthy population diversity
(HPD) measure. SPD is calculated by finding the position of the average individual
within the population and summing the gene-wise Euclidean distances from this
average point to the location of each individual. HPD describes fitness-weighted
diversity while SPD solely describes solution space diversity. ACROMUSE tourna-
ment selection mechanism selects individuals according to healthy diversity rather
than fitness. ACROMUSE achieves an effective balance between exploration and
exploitation.
Collective animal behavior algorithm [17] is a multimodal optimization algo-
rithm. It generally outperforms deterministic crowding [50], probabilistic crowding
[52], sequential fitness sharing [4], clearing procedure [58], clustering analysis [82],
species-conserving GA [40], and aiNet, regarding efficiency and solution quality,
typically showing significant efficiency speedups.
Biobjective Approaches
Biobjective approaches are proposed for solving multimodal optimization problems.
Biobjective multipopulation GA [81] uses two complementary objective functions
for simultaneous detection of multiple peaks over a function landscape. The first
objective is the multimodal function itself, and the second objective is chosen as
gradient of the function for continuous problems and a numerical estimation of the
gradient for discrete problems. Based on these two objectives, all the population
members are ranked into two ranking lists. Next, a clustering algorithm is employed
to form subpopulations around potential optima. The subpopulations are allowed to
evolve independently toward their potential optima.
In niching-based NSGA-II [20], the second objective is designed from the inspira-
tion of gradient, though without using the exact or estimated gradient. For a solution
x, it is a count of the neighboring solutions that are better than this solution. Based
on these two objectives, the nondominated front members are identified using the
modified domination principle and a clearing operation.
In [3], a second objective is introduced to increase the population diversity, and
is chosen to maximize the mean Euclidean distance of a solution from all other
population members. An external archive is maintained to keep track of solutions
having the current best fitness values. The archive prevents the generation of new
solutions from near points already stored in the archive. It also helps to reduce the total
number of function evaluations. Clearing of solutions in the archive is done during
archive update. The biobjective formulation is solved using DE with nondominated
sorting and hypervolume measure-based sorting.

22.2.6 Metrics for Multimodal Optimization

The popular performance metrics for multimodal optimization are effective number
of the peaks maintained (ENPM), maximum peak ratio (MPR), and Chi-square-like
performance criterion.
A large ENPM value indicates a good ability to identify and maintain multiple
peaks. After running a niche EA several times, the average and the standard deviation
of the ENPM are calculated to characterize the algorithm.
ENPM does not consider the influence of peak heights. Suppose that a problem has k peaks with heights $h_1, \ldots, h_k$, and that the algorithm has found m peaks with heights $h'_1, \ldots, h'_m$. MPR is defined by [54]
$$MPR = \frac{\sum_{i=1}^{m} h'_i}{\sum_{j=1}^{k} h_j}. \qquad (22.6)$$
MPR grants higher peaks with more preference. A larger MPR value means a better
convergence to peaks. It takes a maximum value of 1, when all the peaks have been
identified and maintained correctly.
ENPM and MPR do not consider the distribution of individuals in the last gen-
eration. Chi-square-like (CSL) performance criterion [18], which has the form of
chi-square distribution, can be used to evaluate the distribution of a population.
Suppose that every individual at the end of the evolution converges to one peak and that the probability $p_i$ of an individual being on peak i is given by
$$p_i = \frac{h_i}{\sum_{l=1}^{k} h_l}. \qquad (22.7)$$
CSL is defined by
$$CSL = \sqrt{\sum_{i=1}^{k+1} \frac{(x_i - \mu_i)^2}{\sigma_i^2}}, \qquad (22.8)$$
where $x_i$ is the number of individuals on peak i at the end of the evolution, $\mu_i = N_P p_i$ and $\sigma_i^2 = N_P p_i(1 - p_i)$, $N_P$ being the population size. It is seen that if the number of individuals on every peak equals the mean for that peak, the CSL value is zero. A smaller CSL value implies a better individual distribution.
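The metrics of (22.6)–(22.8) are straightforward to compute once the peak assignments are known; the sketch below evaluates MPR from the heights of the peaks found and CSL over the k peaks only, omitting for brevity the extra bin for individuals that converge to no peak.

```python
import math

def mpr(found_heights, true_heights):
    """Maximum peak ratio, Eq. (22.6): sum of the heights of the peaks found
    divided by the sum of all true peak heights."""
    return sum(found_heights) / sum(true_heights)

def csl(counts, true_heights, pop_size):
    """Chi-square-like criterion, Eqs. (22.7)-(22.8), computed over the k peaks
    only (the extra bin for individuals on no peak is omitted in this sketch)."""
    total = sum(true_heights)
    s = 0.0
    for x_i, h_i in zip(counts, true_heights):
        p_i = h_i / total                      # Eq. (22.7)
        mu_i = pop_size * p_i
        var_i = pop_size * p_i * (1.0 - p_i)   # sigma_i^2
        if var_i > 0.0:                        # guard against a single-peak case
            s += (x_i - mu_i) ** 2 / var_i
    return math.sqrt(s)
```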

22.3 Constrained Optimization

When dealing with optimization problems with constraints, two kinds of constraints,
namely, equality and inequality constraints, may arise. The existence of equality con-
straints reduces the size of the feasible space significantly, which makes it difficult to
locate feasible and optimal solutions. Popular methods include penalizing infeasible
individuals, repairing infeasible individuals, or considering bound violation as an
additional objective. Each constraint can also be handled as an objective, leading to


multiobjective optimization techniques.
Lagrange-based method is a classical approach to convert constrained optimiza-
tion problems into unconstrained optimization problems. For pure equality con-
straints, one can use the Lagrange multiplier method. Inequality constraints can be
converted into equality constraints by introducing extra slack variables, and Lagrange
multiplier method is then applied. This leads to the KKT method. These methods are
actually useful for analysis-based optimization, but not applicable for heuristics-
based optimization. The two-population novelty search approach [45] handles the
constraints by maintaining and evolving two separate populations, one with feasible
and the other with infeasible individuals.
For population-based implementations of constrained optimization, one needs
to handle infeasible solutions in a population. One can simply reject the infeasible
individuals or penalize the infeasible individuals. The former, known as death penalty
method, is the easiest way to incorporate constraints. In death penalty method, all
the infeasible solutions are discarded and new solutions are created until enough
feasible ones are available. During iterations, if an individual moves to an infeasible
position, it is reset to its previous position. Death penalty can be used only with
inequality constraints as it is very difficult to randomly generate solutions that satisfy
equality constraints. The method suffers from premature step size reduction because
of insufficient birth surplus. Penalty function method is a simple and commonly used
method for handling constraints.
Repair algorithms generate a feasible solution from an infeasible one. They either
replace the infeasible solutions with the repaired ones, or use the repaired solutions only for fitness evaluation of the corresponding infeasible ones [10,30]. They can be seen as local search methods that reduce the
constraint violation. A gradient-based repair [14] uses the gradient information to
direct the infeasible solutions towards the feasible region.
For constraint handling, a replacement procedure is described in [19] for single-
objective and multiobjective GAs. A solution x1 is considered better than x2 in case
when: (1) both vectors are feasible but x1 has the better objective value, (2) x1 is
feasible but x2 is not, or (3) both vectors are infeasible but x1 exhibits the lower sum
of constraint violation (scaled constraints).
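A minimal sketch of this feasibility-based comparison, assuming inequality constraints written as $g_i(x) \le 0$ and a minimization problem, is given below; the function names are illustrative.

```python
def total_violation(g_values):
    """Sum of constraint violations for constraints written as g_i(x) <= 0."""
    return sum(max(0.0, g) for g in g_values)

def is_better(f1, g1, f2, g2):
    """Return True if solution 1 is preferred over solution 2 according to
    the three feasibility rules described above (minimization assumed)."""
    v1, v2 = total_violation(g1), total_violation(g2)
    if v1 == 0.0 and v2 == 0.0:    # (1) both feasible: better objective wins
        return f1 < f2
    if v1 == 0.0:                  # (2) only solution 1 is feasible
        return True
    if v2 == 0.0:                  # only solution 2 is feasible
        return False
    return v1 < v2                 # (3) both infeasible: lower violation wins

# Toy usage: a feasible solution beats an infeasible one with a better objective.
print(is_better(5.0, [-1.0, -0.2], 1.0, [0.3, -0.2]))   # True
```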

22.3.1 Penalty Function Method

Penalty function method exploits infeasible solutions by adding some penalty value
to the objective function of each infeasible individual so that it will be penalized for
violating the constraints. It converts equality and/or inequality constraints into a new
objective function, so that outside the feasible region the objective value is abruptly worsened. Most of the constraint handling methods are based on penalty functions.
Penalty function method transforms constraint optimization problems into uncon-
strained optimization problems by defining an objective function in the form such
as [33]
$$f'(x) = f(x) + f_p(x), \qquad (22.9)$$
where $f(x)$ is the objective function, and $f_p(x)$ is a penalty term that is zero for feasible solutions and, for minimization, positive for infeasible solutions. A constrained problem can be solved by a sequence
of unconstrained optimizations in which the penalty factors are stepwise intensified.
Static penalty functions usually require the user to control the amount of penalty
added when multiple constraints are violated. These parameters are usually prob-
lem dependent and chosen heuristically. The penalties are the weighted sum of the
constraint violations.
In adaptive penalty function methods [24,37,68,76], information gathered from
the search process, such as the generation number t of EA, are used to control the
amount of penalty added to infeasible solutions, and they do not require users to
define parameters explicitly.
As the number of generations increases, the penalty also increases, and this puts
more and more selective pressure on GA to find a feasible solution [37]. Penalty
factors can be defined statically or depending on the number of satisfied constraints.
They can dynamically depend on the number of generations [37]
$$\tilde{f}(x) = f(x) + (Ct)^{\alpha} G(x), \qquad (22.10)$$
where C and α are user-defined, and G(x) is a penalty function. Typically C = 0.5,
α = 1 or 2.
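A minimal sketch of this dynamic penalty, assuming constraints written as $g_i(x) \le 0$ and $G(x)$ taken as the sum of squared violations (one common choice; the text leaves $G$ unspecified), might look as follows.

```python
def dynamic_penalty(f, constraints, C=0.5, alpha=2.0):
    """Build the penalized objective of Eq. (22.10):
    f_tilde(x, t) = f(x) + (C*t)**alpha * G(x), with t the generation number
    and G(x) the sum of squared violations of constraints g_i(x) <= 0."""
    def f_tilde(x, t):
        G = sum(max(0.0, g(x)) ** 2 for g in constraints)
        return f(x) + (C * t) ** alpha * G
    return f_tilde

# Hypothetical usage: minimize x**2 subject to x >= 1, i.e., 1 - x <= 0.
f_tilde = dynamic_penalty(lambda x: x ** 2, [lambda x: 1.0 - x])
print(f_tilde(0.5, t=10), f_tilde(0.5, t=100))   # penalty grows with generation t
```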
In [38], the average value of the objective function in the current population and
the level of violation of each constraint during the evolution process are used to define
the penalty parameters. For each constraint violation, a different penalty coefficient
is assigned so that a higher penalty value will be added for larger violation of a given
constraint. This requires an extra computation to evaluate the average value of the
objective function in each generation.
In [24], an infeasibility measure is used to form a two-stage penalty that is im-
posed upon the infeasible solutions to ensure that those infeasible individuals with
low fitness value and low constraint violation remain fit. The worst infeasible in-
dividual is first penalized to have objective function value equal to or greater than
the best feasible solution. This value is then increased to twice the original value
by penalizing. All other individuals are penalized accordingly. The method requires
no parameter tuning and no initial feasible solution. However, the algorithm fails to
produce feasible solutions in every run.
In [68], infeasible individuals with low objective value and low constraint violation
are exploited to facilitate finding feasible individuals in each run as well as producing
quality results. The number of feasible individuals in the population is used to guide
the search process either toward finding more feasible individuals or searching for
the optimum solution. Two types of penalties are added to each infeasible individual
to identify the best infeasible individuals in the current population. The amount of
the two penalties added is controlled by the number of feasible individuals currently
present in the population. If there are few feasible individuals, large penalty will be
added to infeasible individuals with higher constraint violation. On the other hand,
if there are sufficient numbers of feasible individuals present, infeasible individuals
with larger objective function values will be penalized more than infeasible ones
with smaller objective function values. The algorithm can find feasible solutions in
problems having small feasible space compared to the search space. The proposed
method is simple to implement and does not need any parameter tuning. It is able to
find feasible solutions in every run for all of the benchmark functions tested.
In [76], the constraint handling technique extends the single-objective optimiza-
tion algorithm proposed in [68] for multiobjective optimization. It is based on an
adaptive penalty function and a distance measure. These two functions vary de-
pending upon the objective function value and the sum of constraint violations of
an individual. The objective space is modified to account for the performance and
constraint violation of each individual. The modified objective functions are used
in the nondominance sorting to facilitate the search of optimal solutions not only
in the feasible space but also in the infeasible regions. The search in the infeasible
space is designed to exploit those individuals with better objective values and lower
constraint violations. The number of feasible individuals in the population is used
to guide the search process either toward finding more feasible solutions or favor in
search for optimal solutions.
The constrained optimum is usually located at the boundary between feasible
and infeasible domains. Self-organizing adaptive penalty method [46] attempts to
maintain an equal number of designs on each side of the constraint boundary. The
method adjusts the penalty parameter value of each constraint according to the ratio
of the number of solutions that satisfy the constraint to the number of solutions that
violate the constraint. The penalty cost is calculated as the sum of a penalty factor
multiplying the constraint violation and the penalty pressure term that increases as
the generation increases.

Example 22.1: We want to minimize the following function of two variables:


$$\min_{x} f(x) = 2x_1^2 + x_2^2 - x_1 x_2 - 2x_1 - 3x_2$$
subject to
$$x_1 + x_2 \le 2, \quad -5x_1 + 4x_2 \le 3, \quad 3x_1 + 2x_2 \le 5, \quad x_1, x_2 \ge 0.$$
For this linearly constrained problem, the ga solver uses the gacreationlinearfeasible creation function to create a well-dispersed initial population that satisfies the linear constraints and bounds. We then apply ga with the constraints and default parameters.
The evolution of the ga solver is shown in Figure 22.1, wherein the feasible domain is a pentagonal region. Initially, the individuals are well distributed over the region and close to its border. They rapidly converge toward the global optimum. The solution obtained is f(x) = −3.5630 at x = (0.6355, 1.3655).

Figure 22.1 The evolution of a random run of GA for a linearly constrained problem: (a) at the 5th generation (best fitness −3.5533, mean fitness −2.7103); (b) at the end of evolution (best fitness −3.563, mean fitness −3.5626). Each panel shows the population in the (Variable 1, Variable 2) plane together with the best and mean fitness values versus generation.

22.3.2 Using Multiobjective Optimization Techniques

Constraint satisfaction and multiobjective optimization both involve the simultaneous optimization of a number of functions. Constraints can be treated as hard objectives, which must be satisfied before the soft objectives are optimized.

Constraint violation and objective function can be optimized separately using multi-
objective optimization techniques [25,61,62,67,75]. A single-objective constrained
optimization problem can be converted into a MOP by treating the constraints as one
or more objectives of constraint violation to be minimized.
To be more specific, a constrained optimization problem can be transformed into
a two-objective problem, where one objective is the original objective and the other
is the overall violation of the constraints [13]. The method maintains two groups of
individuals: one for the population in GAs, and the other for best infeasible indi-
viduals close to the feasible region. A constrained optimization problem can also be
transformed into a (k + 1)-objective problem, where k objectives are related to the
k constraints, and one is the original objective.
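As a minimal sketch of the two-objective formulation (not the specific algorithm of [13]), the conversion below maps a constrained minimization problem with constraints $g_i(x) \le 0$ into the pair (objective, overall constraint violation); the linearly constrained problem of Example 22.1 is used as an illustration.

```python
import numpy as np

def to_biobjective(f, constraints):
    """Turn min f(x) s.t. g_i(x) <= 0 into two objectives to be minimized
    jointly: the original objective and the overall constraint violation."""
    def objectives(x):
        violation = sum(max(0.0, g(x)) for g in constraints)
        return np.array([f(x), violation])
    return objectives

# Hypothetical usage with the problem of Example 22.1.
f = lambda x: 2 * x[0] ** 2 + x[1] ** 2 - x[0] * x[1] - 2 * x[0] - 3 * x[1]
gs = [lambda x: x[0] + x[1] - 2,
      lambda x: -5 * x[0] + 4 * x[1] - 3,
      lambda x: 3 * x[0] + 2 * x[1] - 5,
      lambda x: -x[0],
      lambda x: -x[1]]
print(to_biobjective(f, gs)(np.array([1.0, 1.5])))   # [-3.75, 1.5]
```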
In [61,62], stochastic ranking is introduced to achieve a balance between objec-
tive and penalty functions stochastically in terms of the dominance of penalty and
objective functions. A probability factor is used to determine whether the objective
function value or the constraint violation value determines the rank of each individual.
Suitable ranking alone is capable of improving the search performance significantly.
In [62], the simulation results reveal that the unbiased multiobjective approach to
constraint handling may not be effective. A nondominated rank removes the need
for setting a search bias. However, this does not eliminate the need for having a bias
in order to locate feasible solutions.
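A minimal sketch of stochastic ranking, written as the bubble-sort-like procedure described in [61] (the probability of comparing by the objective, the sweep count, and the function names are illustrative choices), is given below.

```python
import random

def stochastic_ranking(objectives, violations, p_f=0.45, sweeps=None):
    """Stochastic ranking sketch: in each pass, adjacent individuals are
    compared by the objective with probability p_f (or always, when both
    are feasible), and by the constraint violation otherwise."""
    n = len(objectives)
    idx = list(range(n))
    sweeps = sweeps if sweeps is not None else n
    for _ in range(sweeps):
        swapped = False
        for j in range(n - 1):
            a, b = idx[j], idx[j + 1]
            both_feasible = violations[a] == 0 and violations[b] == 0
            if both_feasible or random.random() < p_f:
                if objectives[a] > objectives[b]:
                    idx[j], idx[j + 1] = b, a
                    swapped = True
            elif violations[a] > violations[b]:
                idx[j], idx[j + 1] = b, a
                swapped = True
        if not swapped:
            break
    return idx   # indices ordered from best to worst rank

print(stochastic_ranking([1.0, 0.2, 0.5], [0.0, 2.0, 0.0]))
```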
In [73], constrained optimization problems are solved by a two-phase algorithm. In
the first phase, only constraint satisfaction is considered. The search is directed toward
finding a single feasible solution using ranking. In the second phase, simultaneous
optimization of the objective function and the satisfaction of the constraints is treated
as a biobjective optimization problem. In this case, nondominated ranking is used
to rank individuals, and niching scheme is used to preserve diversity. The algorithm
can always find feasible solutions for all problems.
α-constrained method [67] introduces a satisfaction level of a search point for
the constraints. It can convert an algorithm for unconstrained problems into an algo-
rithm for constrained problems by replacing ordinary comparisons with the α level
comparisons.
Hybrid constrained optimization EA [75] effectively combines multiobjective
optimization with global and local search models. A niching GA based on tournament
selection is used to perform global search. A parallel local search operator is adopted
to implement a clustering partition of the population and multiparent crossover is
used to generate the offspring population. The dominated individuals in the parent
population are replaced by nondominated individuals in the offspring population.
Infeasible individuals are replaced in a way to rapidly guide the population toward
the feasible region of the search space.

Problems

22.1 Explain why fitness sharing technique does not suffer from genetic drift after
all relevant peaks have already been found.

22.2 Suggest a way to use DE for multimodal optimization. Compare it with standard fitness sharing.
22.3 Implement a real-coded GA with three types of niching techniques to mini-
mize 10-dimensional Griewank function. Try to find ten multimodal solutions
as well as the global minimum.
22.4 Consider the convex programming problem:
minimize $f(x) = 2x_1^2 + x_2^2 + x_1 x_2 - 12x_1 - 10x_2$,
subject to: $x_1^2 + (x_2 - 5)^2 \le 64$,
$(x_1 + 3)^2 + (x_2 - 1)^2 \le 36$,
$(x_1 - 3)^2 + (x_2 - 1)^2 \le 36$,
$x_1 \ge 0$, and $x_2 \ge 0$.
Find the unique global minimum using ga solver.
22.5 Consider the problem:
minimize $f(x) = (x_1^2 + x_2 - 11)^2 + (x_1 + x_2^2 - 7)^2$,
subject to $g_1(x) = 4 - (x_1 - 0.5)^2 - (x_2 - 2)^2 \ge 0$,
$g_2(x) = x_1^2 + (x_2 - 2)^2 - 6 \ge 0$,
$x_1, x_2 \in [0, 8]$.
Define a penalty method to convert this constrained problem to an uncon-
strained one. Then solve this problem using real-coded GA.
22.6 For constrained EAs, name some ways of measuring the degree of a constraint
violation.

References
1. Baluja S. Population-based incremental learning: a method for integrating genetic search based
function optimization and competitive learning. Computer Science Department, Carnegie Mel-
lon University, Pittsburgh, PA, USA, Technical Report CMU-CS-94-163. 1994.
2. Barbosa HJC, Lemonge ACC. An adaptive penalty scheme in genetic algorithms for constrained
optimization problems. In: Proceedings of the genetic and evolutionary computation conference
(GECCO), New York, July 2002. p. 287–294.
3. Basak A, Das S, Tan KC. Multimodal optimization using a biobjective differential evo-
lution algorithm enhanced with mean distance-based selection. IEEE Trans Evol Comput.
2013;17(5):666–85.
4. Beasley D, Bull DR, Martin RR. A sequential niche technique for multimodal function opti-
mization. Evol Comput. 1993;1(2):101–25.
5. Blackwell TM, Bentley PJ. Dynamic search with charged swarms. In: Proceedings of the
genetic and evolutionary computation conference (GECCO), New York, July 2002. p. 19–26.
6. Blackwell T, Branke J. Multi-swarm optimization in dynamic environments. In: Applications
of Evolutionary Computing, vol. 3005 of Lecture Notes in Computer Science. Berlin: Springer.
p. 489–500.
7. Branke J. Memory enhanced evolutionary algorithms for changing optimization problems. In:
Proceedings of the IEEE congress on evolutionary computation (CEC), Washington, DC, USA,
July 1999. p. 1875–1882.
8. Branke J, Kaubler T, Schmidt C, Schmeck H. A multi-population approach to dynamic optimization problems. In: Evolutionary design and manufacture: selected papers from adaptive
computing in design and manufacture. London: Springer; 2000. p. 299–307.
9. Brest J, Maucec MS. Population size reduction for the differential evolution algorithm. Appl
Intell. 2008;29(3):228–47.
10. Brits R, Engelbrecht AP, van den Bergh F. Solving systems of unconstrained equations us-
ing particle swarm optimization. In: Proceedings of IEEE conference on systems, man, and
cybernetics, Hammamet, Tunisia, October 2002, vol. 3, p. 102–107.
11. Brits R, Engelbrecht AP, van den Bergh F. A niching particle swarm optimizer. In: Proceedings
of the 4th Asia-Pacific conference on simulated evolution and learning, Singapore, November
2002. p. 692–696.
12. Bui LT, Abbass HA, Branke J. Multiobjective optimization for dynamic environments. In:
Proceedings of congress on evolutionary computation (CEC), Edinburgh, UK, September 2005.
p. 2349–2356.
13. Cai Z, Wang Y. A multiobjective optimization-based evolutionary algorithm for constrained
optimization. IEEE Trans Evol Comput. 2006;10:658–75.
14. Chootinan P, Chen A. Constraint handling in genetic algorithms using a gradient-based repair
method. Comput Oper Res. 2006;33:2263–81.
15. Cioppa AD, De Stefano C, Marcelli A. Where are the niches? dynamic fitness sharing. IEEE
Trans Evol Comput. 2007;11(4):453–65.
16. Cobb HG, Grefenstette JJ. Genetic algorithms for tracking changing environments. In: Pro-
ceedings of the 5th International conference on genetic algorithms, Urbana-Champaign, IL,
USA, June 1993. San Mateo, CA: Morgan Kaufmann; 1993. p. 523–530.
17. Cuevas E, Gonzalez M. An optimization algorithm for multimodal functions inspired by col-
lective animal behavior. Soft Comput. 2013;17:489–502.
18. Deb K, Goldberg DE. An investigation of niche and species formation in genetic function
optimization. In: Schaffer JD, editor. Proceedings of the 3rd International conference on genetic
algorithms, Fairfax, Virginia, USA, June 1989. San Mateo, CA: Morgan Kaufmann; 1989. p.
42–50
19. Deb K, Pratab A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm:
NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
20. Deb K, Saha A. Finding multiple solutions for multimodal optimization problems using a
multiobjective evolutionary approach. In: Proceedings of the 12th Annual conference on genetic
and evolutionary computation (GECCO), Portland, Oregon, USA, July 2010. p. 447–454.
21. De Jong K. An analysis of the behavior of a class of genetic adaptive systems. PhD Thesis,
University of Michigan, Ann Arbor, MI, USA, 1975.
22. Dilettoso E, Salerno N. A self-adaptive niching genetic algorithm for multimodal optimization
of electromagnetic devices. IEEE Trans Magn. 2006;42(4):1203–6.
23. Eshelman LJ, Schaffer JD. Preventing premature convergence in genetic algorithms by pre-
venting incest. In: Proceedings of the 4th International conference on genetic algorithms, San
Diego, CA, USA, July 1991. San Mateo, CA, USA: Morgan Kaufmann Publishers; 1991. p.
115–122.
24. Farmani R, Wright J. Self-adaptive fitness formulation for constrained optimization. IEEE
Trans Evol Comput. 2003;7(5):445–55.
25. Fonseca CM, Fleming PJ. Multiobjective optimization and multiple constraint handling with
evolutionary algorithms—Part i: a unified formulation; Part ii: application example. IEEE Trans
Syst, Man, Cybern Part A. 1998;28(1):26–37, 38–47.
26. Galan SF, Mengshoel OJ. Generalized crowding for genetic algorithms. In: Proceedings of
genetic and evolutionary computation conference (GECCO), Portland, Oregon, USA, July
2010. p. 775–782.
27. Gan J, Warwick K. A genetic algorithm with dynamic niche clustering for multimodal function
optimisation. In: Proceedings of the international conference on artificial neural networks and
genetic algorithms, Portoroz, Slovenia. Vienna: Springer; 1999. p. 248–255
28. Mc Ginley B, Maher J, O’Riordan C, Morgan F. Maintaining healthy population diversity using
adaptive crossover, mutation, and selection. IEEE Trans Evol Comput. 2011;15(5):692–714.
29. Goldberg DE, Richardson J. Genetic algorithms with sharing for multimodal function opti-
mization. In: Grefenstette J, edtor. Proceedings of the 2nd International conference on genetic
algorithms and their application, Cambridge, MA, USA, July 1987. Hillsdale, New Jersey:
Lawrence Erlbaum; 1987. p. 41–49.
30. Grefenstette JJ. Genetic algorithms for changing environments. In: Proceedings of the 2nd
International conference on parallel problem solving from nature (PPSN II), Brussels, Belgium,
September 1992. p. 137–144.
31. Hansen N. Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed.
In: Proceedings of Genetic and Evolutionary Computation Conference (GECCO), Montreal,
Canada, July 2009, pp. 2389–2395.
32. Harik GR. Finding multimodal solutions using restricted tournament selection. In: Proceedings
of the 6th International conference on genetic algorithms, Pittsburgh, PA, USA, July 1995. San
Mateo, CA: Morgan Kaufmann; 1995. p. 24–31.
33. Homaifar A, Lai SHY, Qi X. Constrained optimization via genetic algorithms. Simulation.
1994;62(4):242–54.
34. Horn J. The nature of niching: genetic algorithms and the evolution of optimal, cooperative
populations. Ph.D. Thesis, Genetic Algorithm Lab, University of Illinois at Urbana-Champaign
Champaign, IL, USA, 1997.
35. Horn J, Nafpliotis N, Goldberg DE. A niched pareto genetic algorithm for multiobjective
optimization. In: Proceedings of the 1st IEEE Conference on evolutionary computation (CEC),
Orlando, FL, USA, June 1994, vol. 1, p. 82–87.
36. Janson S, Middendorf M. A hierarchical particle swarm optimizer for dynamic optimization
problems. In: Applications of evolutionary computing, vol. 3005 of Lecture Notes in Computer
Science. Berlin: Springer; 2004. p. 513–524.
37. Joines JA, Houck CR. On the use of non-stationary penalty functions to solve nonlinear con-
strained optimization problems with GAs. In: Proceedings of IEEE Congress on evolutionary
computation (CEC), Orlando, FL, USA, June 1994, p. 579–584.
38. Lemonge ACC, Barbosa HJC. An adaptive penalty scheme in genetic algorithms for constrained
optimization problems. In: Proceedings of genetic and evolutionary computation conference
(GECCO), New York, July 2002, p. 287–294.
39. Li C, Yang S. A general framework of multipopulation methods with clustering in undetectable
dynamic environments. IEEE Trans Evol Comput. 2012;16(4):556–77.
40. Li J-P, Balazs ME, Parks GT, Clarkson PJ. A species conserving genetic algorithm for multi-
modal function optimization. Evol Comput. 2002;10(3):207–34.
41. Li X. Adaptively choosing neighborhood bests using species in a particle swarm optimizer for
multimodal function optimization. In: Proceedings of the genetic and evolutionary computation
conference (GECCO), Seattle, WA, USA, June 2004. p. 105–116.
42. Li X. Efficient differential evolution using speciation for multimodal function optimization. In:
Proceedings of conference on genetic and evolutionary computation (GECCO), Washington,
DC, USA, June 2005. p. 873–880.
43. Li X. Niching without niching parameters: particle swarm optimization using a ring topology.
IEEE Trans Evol Comput. 2010;14(1):150–69.
44. Li L, Tang K. History-based topological speciation for multimodal optimization. IEEE Trans
Evol Comput. 2015;19(1):136–50.
45. Liapis A, Yannakakis GN, Togelius J. Constrained novelty search: a study on game content
generation. Evol Comput. 2015;23(1):101–29.
46. Lin CY, Wu WH. Self-organizing adaptive penalty strategy in constrained genetic search. Struct
Multidiscip Optim. 2004;26(6):417–28.
47. Ling Q, Wu G, Yang Z, Wang Q. Crowding clustering genetic algorithm for multimodal function
optimization. Appl Soft Comput. 2008;8(1):88–95.
48. Liu L, Yang S, Wang D. Particle swarm optimization with composite particles in dynamic
environments. IEEE Trans Syst, Man, Cybern Part B. 2010;40(6):1634–48.
49. Mahfoud SW. Crowding and preselection revisited. In: Manner R, Manderick B, editors. Pro-
ceedings of the 2nd International conference on parallel problem solving from nature (PPSN
II), Brussels, Belgium, September 1992. Amsterdam: Elsevier; 1992. p. 27–36.
50. Mahfoud SW. Niching methods for genetic algorithms. Technical Report 95001, Illinois Ge-
netic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA,
1995.
51. Menczer F, Belew RK. Local selection. In: Proceedings of the 7th International conference on
evolutionary programming, San Diego, CA, USA, March 1998, vol. 1447 of Lecture Notes in
Computer Science. Berlin: Springer; 1998. p. 703–712.
52. Mengshoel OJ, Goldberg DE. Probability crowding: deterministic crowding with probabilistic
replacement. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Orlando, FL, USA, July 1999. p. 409–416.
53. Mengshoel OJ, Goldberg DE. The crowding approach to niching in genetic algorithms. Evol
Comput. 2008;16(3):315–54.
54. Miller BL, Shaw MJ. Genetic algorithms with dynamic niche sharing for multimodal function
optimization. In: Proceedings of IEEE International conference on evolutionary computation
(CEC), Nagoya, Japan, May 1996. p. 786–791.
55. Morrison RW, De Jong KA. Triggered hyper mutation revisited. In: Proceedings of congress
on evolutionary computation (CEC), San Diego, CA, USA, July 2000. p. 1025–1032.
56. Parrott D, Li X. Locating and tracking multiple dynamic optima by a particle swarm model
using speciation. IEEE Trans Evol Comput. 2006;10(4):440–58.
57. Parsopoulos KE, Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Vector evaluated
differential evolution for multiobjective optimization. In: Proceedings of IEEE congress on
evolutionary computation (CEC), Portland, OR, USA, June 2004. p. 204–211.
58. Petrowski A. A CLEARING procedure as a niching method for genetic algorithms. In: Pro-
ceedings of IEEE International conference on evolutionary computation (CEC), Nagoya, Japan,
May 1996. p. 798–803.
59. Qu BY, Suganthan PN, Liang JJ. Differential evolution with neighborhood mutation for mul-
timodal optimization. IEEE Trans Evol Comput. 2012;16(5):601–14.
60. Richter H. Detecting change in dynamic fitness landscapes. In: Proceedings of congress on
evolutionary computation (CEC), Trondheim, Norway, May 2009. p. 1613–1620.
61. Runarsson TP, Yao X. Stochastic ranking for constrained evolutionary optimization. IEEE
Trans Evol Comput. 2000;4(3):284–94.
62. Runarsson TP, Yao X. Search bias in constrained evolutionary optimization. IEEE Trans Syst,
Man, Cybern Part C. 2005;35:233–43.
63. Shir OM, Back T. Niche radius adaptation in the CMA-ES niching algorithm. In: Proceedings of
the 9th International conference on parallel problem solving from nature (PPSN IX), Reykjavik,
Iceland, September 2006, vol. 4193 of Lecture Notes in Computer Science. Berlin: Springer;
2006. p. 142–151.
64. Shir OM, Emmerich M, Back T. Adaptive niche radii and niche shapes approaches for niching
with the CMA-ES. Evol Comput. 2010;18(1):97–126.
65. Singh G, Deb K. Comparison of multimodal optimization algorithms based on evolution-
ary algorithms. In: Proceedings of the 8th Annual conference on genetic and evolutionary
computation (GECCO), Seattle, WA, USA, June 2006. p. 1305–1312.
66. Stoean C, Preuss M, Stoean R, Dumitrescu D. Multimodal optimization by means of a topo-
logical species conservation algorithm. IEEE Trans Evol Comput. 2010;14(6):842–64.
67. Takahama T, Sakai S. Constrained optimization by applying the α-constrained method to the
nonlinear simplex method with mutations. IEEE Trans Evol Comput. 2005;9(5):437–51.
68. Tessema B, Yen GG. An adaptive penalty formulation for constrained evolutionary optimiza-
tion. IEEE Trans Syst, Man, Cybern Part A. 2009;39(3):565–78.
69. Thomsen R. Multimodal optimization using crowding-based differential evolution. In: Pro-
ceedings of IEEE Congress on evolutionary computation (CEC), Portland, OR, USA, June
2004. p. 1382–1389.
70. Tsutsui S, Fujimoto Y, Ghosh A. Forking genetic algorithms: GAs with search space division
schemes. Evol Comput. 1997;5:61–80.
71. Ursem RK. Multinational evolutionary algorithms. In: Proceedings of the IEEE Congress on
evolutionary computation (CEC), Washington, DC, USA, July 1999. p. 1633–1640.
72. Ursem RK. Multinational GAs: multimodal optimization techniques in dynamic environments.
In: Proceedings of the genetic and evolutionary computation conference (GECCO), Las Vegas,
NV, USA, July 2000. p. 19–26.
73. Venkatraman S, Yen GG. A generic framework for constrained optimization using genetic
algorithms. IEEE Trans Evol Comput. 2005;9(4):424–35.
74. Wagner N, Michalewicz Z, Khouja M, McGregor RR. Time series forecasting for dynamic
environments: the DyFor genetic program model. IEEE Trans Evol Comput. 2007;11(4):433–
52.
75. Wang Y, Cai Z, Guo G, Zhou Y. Multiobjective optimization and hybrid evolutionary algo-
rithm to solve constrained optimization problems. IEEE Trans Syst, Man, Cybern Part B.
2007;37(3):560–75.
76. Woldesenbet YG, Yen GG, Tessema BG. Constraint handling in multiobjective evolutionary
optimization. IEEE Trans Evol Comput. 2009;13(3):514–25.
77. Yang S. Genetic algorithms with elitism-based immigrants for changing optimization problems.
In: Applications of evolutionary computing, vol. 4448 of Lecture Notes in Computer Science.
Berlin: Springer; 2007. p. 627–636.
78. Yang S. Genetic algorithms with memory- and elitism-based immigrants in dynamic environ-
ments. Evol Comput. 2008;16(3):385–416.
79. Yang S, Li C. A clustering particle swarm optimizer for locating and tracking multiple optima
in dynamic environments. IEEE Trans Evol Comput. 2010;14(6):959–74.
80. Yang S, Yao X. Population-based incremental learning with associative memory for dynamic
environments. IEEE Trans Evol Comput. 2008;12(5):542–61.
81. Yao J, Kharma N, Grogono P. Bi-objective multipopulation genetic algorithm for multimodal
function optimization. IEEE Trans Evol Comput. 2010;14(1):80–102.
82. Yin X, Germay N. A fast genetic algorithm with sharing scheme using cluster analysis meth-
ods in multimodal function optimization. In: Proceedings of the International conference on
artificial neural nets and genetic algorithms, Innsbruck, Austria, 1993. Vienna: Springer; 1993.
p. 450–457.
83. Yu X, Tang K, Yao X. An immigrants scheme based on environmental information for genetic
algorithms in changing environments. In: Proceedings of the IEEE Congress on evolutionary
computation (CEC), Hong Kong, June 2008. p. 1141–1147.
23 Multiobjective Optimization

Multiobjective optimization problems (MOPs) involve several conflicting objectives to be optimized simultaneously. The challenge is to find a Pareto set involving non-
dominated solutions that are evenly distributed along the Pareto Front. Metaheuristics
for multiobjective optimization have been established as efficient approaches to solve
MOPs.

23.1 Introduction

Metaheuristics for multiobjective optimization, termed multiobjective evolutionary algorithms (MOEAs) in general, have been established as efficient approaches to
solve MOPs.
Metaheuristics for multiobjective optimization can be non-Pareto-based or Pareto-based. Two popular non-Pareto-based approaches are the lexicographic method and
aggregating function method. In lexicographic method, the objectives are ranked in
a decreasing order and optimization proceeds from higher to lower order objectives,
one at a time. Once an objective is optimized, the aim is to improve as much as
possible the next objective(s) without decreasing the quality of the previous one(s).
In the aggregating function method, all the objectives are combined via a weighted average into a single
objective to be optimized. Since objectives tend to be defined in very different ranges,
normalization is normally required. Varying the weights during the run allows, in
general, to generate different nondominated solutions in one run.
Vector-evaluated GA (VEGA) [140] is the first EA for multiobjective optimiza-
tion. The population is divided into equal-sized subpopulations, each for searching
the optimum of a single objective, and then all the subpopulations are merged and
mixed. When performing crossover, individuals that are good in one objective will
recombine with individuals that are good in another one. This sort of approach pro-
duces several nondominated solutions in a single run, but it typically misses good
compromises among the objectives. Some other heuristics are used to prevent the

system from converging toward solutions that are not with respect to any criterion.
This algorithm, however, has bias toward some regions [150].
In multiple-objective genetic local search (MOGLS) [74], the MOP is reformu-
lated as a simultaneous optimization of all weighted Tchebycheff functions or a
weighted sum of multiple objectives as a fitness function. A local search procedure
is applied to each individual generated by genetic operations. MOGLS randomly
specifies weight values whenever a pair of parent solutions are selected. It examines
only a small number of neighborhood solutions of a current solution in the local
search procedure.
EAs tend to converge to a single solution if run long enough. Therefore, a
mechanism to maintain diversity is required in order to deal with MOPs. All the
nondominated solutions should be considered equally good by the selection mech-
anism. Goldberg [58] introduced nondominated sorting to rank a search population
according to Pareto optimality. Pareto ranking assigns a rank to each solution based
on its Pareto dominance, such that nondominated solutions are all sampled at the
same rate. In Pareto rank method, all individuals need to be compared with others
using a Pareto dominance concept to determine the nondominated solutions in the
current population. Pareto ranking gives nondominated individuals the highest rank,
i.e., rank 1. Then rank-1 individuals are removed from the population, the nondom-
inated solutions are determined in the remaining individuals, and rank 2 are given.
The procedure is repeated until all individuals have been assigned a rank number.
Niching and speciation techniques can be used to promote genetic diversity so that
the entire Pareto frontier is covered. Equal probability of reproduction is assigned to
all nondominated individuals in the population.
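A minimal sketch of this front-by-front Pareto ranking (with all objectives to be minimized; the function names are illustrative) is given below.

```python
import numpy as np

def dominates(a, b):
    """a Pareto-dominates b (all objectives to be minimized)."""
    return np.all(a <= b) and np.any(a < b)

def pareto_ranks(F):
    """Assign ranks by repeatedly peeling off the nondominated front:
    rank 1 to the nondominated points, remove them, rank 2 to the next
    front, and so on."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    ranks = np.zeros(n, dtype=int)
    remaining = set(range(n))
    rank = 1
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining -= set(front)
        rank += 1
    return ranks

print(pareto_ranks([[1, 5], [2, 2], [3, 1], [4, 4], [5, 5]]))   # [1 1 1 2 3]
```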
Multiobjective GA (MOGA) [50,51] uses a rank-based fitness assignment method.
Niche formation techniques are used to promote diversity among preferable candi-
dates. If an individual xi at generation t is dominated by pi (t) individuals in the
current population, its current rank is given by rank(xi (t)) = 1 + pi (t) [50]. All non-
dominated individuals are assigned rank 1, see Figure 23.1.
The rank-based fitness assignment can be implemented in three steps [50]. First,
the population is sorted according to rank. Then, fitness is assigned to individuals by interpolating from the best (rank 1) to the worst (rank $n \le N_P$) according to

Figure 23.1 Multiobjective ranking in two-dimensional objective space $(f_1, f_2)$: each point is labeled with its rank, and all nondominated points have rank 1.
some function, say, a linear function. Finally, the fitnesses of individuals having the
same rank are averaged, so that all of them will be sampled at the same rate. This
procedure keeps the global population fitness constant while maintaining appropriate
selective pressure. The vast majority of MOEAs resort to Pareto ranking as a fitness
assignment methodology.
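The sketch below illustrates, under the same minimization convention, the MOGA rank $1 + p_i(t)$ and the interpolate-then-average fitness assignment described above; the linear raw-fitness choice and the function names are illustrative assumptions.

```python
import numpy as np

def moga_rank(F):
    """MOGA rank: 1 + number of individuals dominating the given one
    (objectives to be minimized)."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    return np.array([1 + sum(np.all(F[j] <= F[i]) and np.any(F[j] < F[i])
                             for j in range(n)) for i in range(n)])

def rank_based_fitness(ranks):
    """Sort by rank, interpolate fitness linearly from best to worst, then
    average the fitness of individuals sharing the same rank."""
    n = len(ranks)
    order = np.argsort(ranks)          # best (rank 1) first
    raw = np.linspace(n, 1, n)         # linearly decreasing raw fitness
    fitness = np.empty(n)
    fitness[order] = raw
    for r in np.unique(ranks):         # average within each rank
        mask = ranks == r
        fitness[mask] = fitness[mask].mean()
    return fitness

ranks = moga_rank([[1, 5], [2, 2], [3, 1], [4, 4], [5, 5]])
print(ranks, rank_based_fitness(ranks))
```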
Pearson’s correlation coefficient has been used as the measure of conflict among
the objectives in KOSSA [75], thus aiding in dimension reduction. Another method
selects a subset of objectives and performs the MOEA based on those objectives only.
In the context of objective reduction, a principal component analysis (PCA)-based
algorithm has been suggested in [40]. In [16], δ-conflict is defined as a measure of
conflict among objective functions, and it is used to select a subset of the original
set of objectives, which preserve the weak Pareto dominance.
To extend multiobjective optimization algorithms in the presence of noise in fit-
ness estimates, a common strategy is to utilize the concept of sampling (fitness
reevaluation of the same trial solution) to improve fitness estimates in the presence
of noise [6].
Dynamic MOPs require an optimization algorithm to continuously track the
moving Pareto front over time. In [162], a directed search strategy combines two
mechanisms for achieving a good balance between exploration and exploitation of
MOEAs in changing environments. The first mechanism reinitializes the population
based on the predicted moving direction as well as the directions that are orthogonal
to the moving direction of the Pareto set, when a change is detected. The second
mechanism aims to accelerate the convergence by generating solutions in predicted
regions of the Pareto set according to the moving direction of the nondominated
solutions between two consecutive generations.

23.2 Multiobjective Evolutionary Algorithms

Pareto-based methods can be nonelitist and elitist MOEAs. They typically adopt
Pareto ranking, some form of elitism and diversity maintenance strategy. They have
the ability to find multiple Pareto optimal solutions in one single run. Nonelitist
MOEAs do not retain the nondominated solutions that they generate. Representative
nonelitist MOEAs are nondominated sorting GA (NSGA) [149], niched Pareto GA
(NPGA) [71], and MOGA [50]. Elitist MOEAs retain these solutions either in an
external archive or in the main population. Elitism allows solutions that are globally
nondominated to be retained. Popular Pareto-based elitist MOEAs are strength Pareto
EA (SPEA) [181], strength Pareto EA 2 (SPEA2) [180], Pareto archived ES (PAES)
[84], and nondominated sorting GA II (NSGA-II) [39].
A good MOEA for MOPs should satisfy the requirements of convergence, dis-
tribution, and elitism. MOEAs should have a convergence mechanism so that they
can find the Pareto front as soon as possible. They should distribute their individuals
as evenly as possible along the Pareto front so as to provide more nondominated

solutions. The elitism mechanism should contain a set of nondominated individuals found thus far, often called the archive in MOEAs.
Rank density-based GA [106] utilizes VEGA to deal with the tradeoff between
convergence and distribution. It transforms an m-objective problem into a two-
objective problem. Each individual has two fitness values: rank and density. With
respect to both of them, the smaller, the better.
Coello maintains an evolutionary multiobjective optimization repository (http://delta.cs.cinvestav.mx/~ccoello/EMOO) in which most MOEA algorithms can be
found. The source code of NSGA-II, PAES and MOPSO is available from the EMOO
repository.

23.2.1 Nondominated Sorting Genetic Algorithm II

NSGA [150] implements a nondominated sorting in GA along with a niching and speciation method to find multiple Pareto optimal points simultaneously. It converges
to the Pareto front with a good spread of solutions. NSGA combines Pareto ranking
and fitness sharing. After Pareto ranking, every individual in the same rank r gets
the same dummy fitness value fr (f1 > f2 > . . .). Individuals share their fitness in the
same rank. In this way, separated lower rank individuals have a selection advantage,
which pushes NSGA toward the Pareto front with good distribution. Before selection
is performed, all nondominated individuals have an equal reproductive potential.
NSGA-II (C language implementation, http://www.iitk.ac.in/kangal/codes.shtml)
[37,39] improves NSGA by introducing elitism and a crowd comparison operator.
The convergence, distribution, and elitism mechanisms in NSGA-II are Pareto rank
and tournament selection, the crowding distance, and the introduction of the archive
A, respectively. The Pareto ranking (nondominated sorting) procedure is improved
to have lower time complexity by adopting a better bookkeeping scheme to assign
rank to individuals. The resulting complexity of generating the nondominated fronts in one generation is $O(MN^2)$, compared with $O(MN^3)$ for NSGA, for a population size
N and M objective functions. The requirement for the niche radius is eliminated by
the new crowding distance method. In the case of a tie in rank during the selection
process, the individual with a lower density count will be chosen. The capacity of
the archive is the population size.
NSGA-II uses (μ + λ) selection as its elitist mechanism, with μ = λ equal to the population size. NSGA-II is able to find a much better spread of solutions and
better convergence near the true Pareto optimal front compared to PAES and SPEA.
NSGA-II ranks the population by layers. SPEA and NSGA-II are parameter-free
techniques. NSGA-II has become a benchmark.
In [44], preference order is compared with Pareto dominance-based ranking within
NSGA-II, along with two strategies that make different use of the conditions of
efficiency provided. Preference order ranking enables NSGA-II to achieve better
scalability properties compared with Pareto dominance-based ranking.

Algorithm 23.1 (One Generation of NSGA-II).

1. Assignment for Pareto rank and crowding distance.
a. Combine the population P (t) (popsize) and the archive A(t) (popsize) to get
2NP individuals.
b. Assign each individual a Pareto rank.
c. Calculate the crowding distance for each individual.
2. Generation of the new archive A(t + 1).
a. Insert the individuals into A(t + 1). The individuals in rank 1 should be inserted
first, then rank 2, and so on.
b. If rank r cannot be fully inserted into A(t + 1), then insert individuals in
descending order of the crowding distance until A(t + 1) is full with NP indi-
viduals.
3. Generation of the new population P (t + 1).
a. Select from A(t + 1) using binary tournament selection to form a mating pool.
If two individuals in A(t + 1) have different ranks, the one with the lower rank
or the one with the same rank but larger crowding distance wins the tournament.
b. Generate the new population P (t + 1) by simulated binary crossover and poly-
nomial mutation from the mating pool.
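As a minimal illustration of steps 1c and 2b of Algorithm 23.1 (not the reference NSGA-II implementation), the sketch below computes the crowding distance within one front and truncates a partially admitted front; the function names are illustrative.

```python
import numpy as np

def crowding_distance(F):
    """Crowding distance within one front: for every objective, sort the
    front, give boundary points infinite distance and interior points the
    normalized gap between their two neighbors."""
    F = np.asarray(F, dtype=float)
    n, m = F.shape
    d = np.zeros(n)
    for k in range(m):
        order = np.argsort(F[:, k])
        d[order[0]] = d[order[-1]] = np.inf
        span = F[order[-1], k] - F[order[0], k]
        if span == 0:
            continue
        for pos in range(1, n - 1):
            d[order[pos]] += (F[order[pos + 1], k] - F[order[pos - 1], k]) / span
    return d

def truncate_front(front_indices, F, slots):
    """Step 2b of Algorithm 23.1: keep the 'slots' members of a partially
    admitted front that have the largest crowding distance."""
    d = crowding_distance(F[front_indices])
    keep = np.argsort(-d)[:slots]
    return [front_indices[i] for i in keep]

F = np.array([[0.0, 4.0], [1.0, 2.5], [2.0, 1.5], [4.0, 0.0]])
print(crowding_distance(F))
print(truncate_front([0, 1, 2, 3], F, 3))
```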

PCX-NSGA-II [41] uses parent-centric recombination (PCX) to generate a set of new trial solutions, all of which are mutated by a polynomial mutation operator.
The solution process of one generation in NSGA-II is listed in Algorithm 23.1.
The source code of NSGA-II and MOPSO is available at http://delta.cs.cinvestav.
mx/~ccoello/EMOO/.
MATLAB Global Optimization Toolbox provides gamultiobj solver which
uses a variant of NSGA-II. An elitist GA always favors individuals with better fit-
ness value (rank). A controlled elitist GA favors individuals that can help increase
the diversity of the population even if they have a lower fitness value. Diversity is
maintained by controlling the elite members of the population as the algorithm pro-
gresses. Pareto fraction limits the number of individuals on the Pareto front (elite
members). A distance function, such as the crowding distance function, helps to maintain
diversity on a front by favoring individuals that are relatively far away on the front.
The crowding distance measure function in the toolbox takes an optional argument
to calculate distance either in function space (phenotype) or design space (genotype).
If ’genotype’ is chosen, then the diversity on a Pareto front is based on the design
space. The default choice is ’phenotype’ and, in that case, the diversity is based on
the function space.

Example 23.1:
In this example, we run gamultiobj solver to minimize the Schaffer function:
$$\min_{x} \; f_1(x) = x^2, \quad f_2(x) = (x - 2)^2, \quad x \in [-10, 10]. \qquad (23.1)$$
This function has a convex, continuous Pareto optimal front with x ∈ [0, 2]. The
Schaffer function is plotted in Figure 23.2.

The population size is selected as 50. The solver will try to limit the number
of individuals in the current population that are on the Pareto front to 40 % of the
population size by setting the Pareto fraction to 0.4. Thus, the number of points on
the Pareto front is 20. The initial population is randomly selected within the domain.
Crowding distance function in genotype space is selected for diversity control.
For a random run, the solver terminated at the 200th generation, which is the
default maximum number of generations, with 10051 function evaluations. The
solutions on the Pareto front have an average distance of 0.00985864 and a spread
of 0.127192. The obtained Pareto front is given in Figure 23.3. It is shown that the

Figure 23.2 The Schaffer function: $f_1(x) = x^2$ and $f_2(x) = (x - 2)^2$ plotted over $x \in [-10, 10]$.

Figure 23.3 The Pareto front obtained by a random run of the multiobjective GA, plotted in the (Objective 1, Objective 2) space.

solutions are well distributed over the Pareto front. The objective values on the Pareto
front are both within [0, 4], corresponding to the Pareto optimal solutions x ∈ [0, 2].
For a random run with crowding distance function in phenotype space, the solver
terminated at the 139th generation, with 7001 function evaluations. The solutions on
the Pareto front have an average distance of 0.0190628 and a spread of 0.100125.

23.2.2 Strength Pareto Evolutionary Algorithm 2

SPEA [181] is an elitist Pareto-based strategy that uses an external archive to store the
nondominated solutions found so far. The Pareto-based fitness assignment method
is itself a niching technique that does not require the concept of distance. The fitness
(strength) of a nondominated solution stored in the external archive is proportional
to the number of individuals covered, while the fitness of a dominated individual
is calculated by summing the strength of the nondominated solutions that cover
it. This fitness assignment criterion results in the definition of a niche that can be
identified with the portion of the objective function space covered by a nondominated
solution. Both the population and the external nondominated set participate in the
selection phase (the smaller the fitness, the higher the reproduction probability). The
secondary population is updated every generation and pruned by clustering if the
number of the nondominated individuals exceeds a predefined size. SPEA can be
very effective in sampling along the entire Pareto optimal front and distributing the
generated solutions over the tradeoff surface.
A systematic comparison of various MOEAs is provided by [177] (http://www.
tik.ee.ethz.ch/~zitzler/testdata.html) using six carefully chosen test functions [35]:
convexity, non-convexity, discrete Pareto fronts, multimodality, deception and biased
search spaces. A clear hierarchy of algorithms emerges regarding the distance to the
Pareto optimal front in descending order of merit: SPEA, NSGA, VEGA, HLGA,
NPGA, FFGA. While there is a clear performance gap between SPEA and NSGA,
as well as between NSGA and the remaining algorithms, the fronts achieved by VEGA, HLGA, NPGA, and FFGA are rather close. Elitism is shown to be an impor-
tant factor for improving evolutionary multiobjective search. An elitist variant of
NSGA (NSGA-II) equals the performance of SPEA. The performance of the other
algorithms improved significantly when elitist strategy was included.
SPEA2 (in C language, http://www.tik.ee.ethz.ch/sop/pisa/) [179] improves SPEA
by incorporating a fine-grained fitness assignment strategy, a nearest neighbor density
estimation technique, and an enhanced archive truncation method. The convergence,
distribution, and elitism mechanisms in SPEA2 are raw fitness assignment, density,
and archive, respectively. Both the archive and population are assigned a fitness based
upon the strength and density estimation. SPEA2 and NSGA-II seem to behave in
a very similar manner, and they both outperform Pareto envelope-based selection
algorithm (PESA).

In SPEA2, the strength of an individual is the number of individuals it dominates in the union of the archive A and population P:
$$S(i) = \left|\left\{x_j \mid x_j \in P + A \ \wedge\ x_i \prec x_j\right\}\right|, \qquad (23.2)$$
where | · | denotes the operation of cardinality.
To avoid selection bias, a raw fitness is defined to describe how good an individual is in terms of convergence. An individual's raw fitness is given by
$$R(i) = \sum_{x_j \in P + A,\ x_j \prec x_i} S(j). \qquad (23.3)$$

If $x_i$ is a nondominated solution in the union of A and P, it is assigned the best raw fitness (i.e., zero).
Density estimation function is an adaptation of the kth nearest neighbor, where
the density at any point is a decreasing function of the distance to the kth nearest data
point. Density D(i) is defined to describe the crowdedness of xi , based on ranking
the distances of every individual to all the other individuals. A truncation method
based upon the density is applied to keep the archive at a fixed size.
Every individual is assigned a fitness, which forms the basis of the binary tournament selection:
F(i) = R(i) + D(i). (23.4)
For nondominated individual xi , R(i) = 0 and D(i) < 1, thus the MOP is transformed
into a single-objective minimization problem.
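A compact sketch of this fitness assignment (objectives to be minimized; the choice of the k-th nearest neighbor and the function names follow common practice but are illustrative here) is given below.

```python
import numpy as np

def spea2_fitness(F):
    """SPEA2 fitness sketch for a combined population-plus-archive set F:
    strength S (23.2), raw fitness R (23.3), a density D derived from the
    distance to the k-th closest point, and F(i) = R(i) + D(i) (23.4)."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    dom = np.array([[np.all(F[i] <= F[j]) and np.any(F[i] < F[j])
                     for j in range(n)] for i in range(n)])
    S = dom.sum(axis=1)                                    # individuals i dominates
    R = np.array([S[dom[:, i]].sum() for i in range(n)])   # sum over dominators of i
    k = int(np.sqrt(n))                                    # index of neighbor used
    dist = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
    sigma_k = np.sort(dist, axis=1)[:, k]    # column 0 is the point itself
    D = 1.0 / (sigma_k + 2.0)                # density, always smaller than 1
    return R + D

print(spea2_fitness([[1, 5], [2, 2], [3, 1], [4, 4], [5, 5]]))
```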

23.2.3 Pareto Archived Evolution Strategy (PAES)

The adaptive grid mechanism was first used in PAES [84]. In PAES, an external
archive is incorporated to store all the nondominated solutions obtained. In reality,
the adaptive grid is a space formed by hypercubes. As it is effective in maintaining
diversity of nondominated solutions, adaptive grid and its variations are used by
a number of algorithms. The adaptive grid is started when the upper limitation of
external archive is reached. This means that it cannot maintain good distribution of
nondominated solutions when the number of nondominated solutions is below the
upper limitation.
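A minimal sketch of such a grid-location computation is shown below; the number of divisions, the bound handling, and the function names are illustrative simplifications, since the actual PAES grid is built recursively and adapts its bounds to the current archive.

```python
import numpy as np

def grid_location(f, lower, upper, divisions=4):
    """Map an objective vector to the hypercube it falls in when each
    objective range is split into 2**divisions cells."""
    f = np.asarray(f, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    cells = 2 ** divisions
    idx = np.floor((f - lower) / (upper - lower) * cells).astype(int)
    idx = np.clip(idx, 0, cells - 1)        # per-objective cell index
    key = 0
    for i in idx:                           # flatten indices into one grid key
        key = key * cells + int(i)
    return key

# Two nearby solutions in a 2-objective space normalized by the archive bounds.
print(grid_location([0.10, 0.90], [0, 0], [1, 1]))
print(grid_location([0.12, 0.88], [0, 0], [1, 1]))   # same hypercube, same key
```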
PAES [84] expands ES to solve MOPs. PAES ensures that the nondominated
solutions residing in an uncrowded location will survive. In the simplest (1+1)-
PAES, there are three groups: one current individual, one updated individual, and
one archive containing all the nondominated individuals found thus far. It consists
of a (1 + 1)-ES employing local search in combination with a reference archive that
records some of the nondominated solutions previously found in order to identify the
approximate dominance ranking of the current and candidate solution vectors. If the
archive size exceeds a threshold, then it is pruned by removing the individual that has
the smallest crowding distance. This archive is used as a reference set against which
each mutated individual is compared. The mechanism used to maintain diversity
consists of a crowding procedure that divides the objective space in a recursive

manner. Each solution is placed in a certain grid location based on the values of its
objectives. A map of such grid is maintained, indicating the number of solutions that
reside in each grid location. Since the procedure is adaptive, no extra parameters are
required (except for the number of divisions of the objective space). (1 + λ)-PAES
and (μ + λ)-PAES extend the basic algorithm. (1 + 1)-PAES is comparable with
NSGA-II.
Memetic PAES [85] associates a global search evolutionary scheme with mutation
and recombination operators of a population, with the local search method of (1 + 1)-
PAES [84]. Memetic PAES outperforms (1 + 1)-PAES, and performs similar to
SPEA.

23.2.4 Pareto Envelope-Based Selection Algorithm

Motivated by SPEA, PESA [31] uses an external archive to store the evolved
Pareto front and an internal population to generate new candidate solutions. PESA
maintains a hyper grid-based scheme to keep track of the degree of crowding in
different regions of the archive, which is applied to maintain the diversity of the
external population and to select the internal population from the external archive.
In PESA, mating selection is only performed on the archive which stores the current
nondominated set. The same holds for NSGA-II. PESA uses binary tournament
selection to generate new population from the archive. The archive in PESA only
contains the nondominated solutions found thus far. The one with the smaller squeeze
factor, i.e., the one residing in the less crowded hyperbox, wins the tournament. PESA
generally outperforms both SPEA and PAES. Both SPEA and PESA outperform
NSGA and NPGA.
As to the diversity mechanisms, NSGA-II uses the crowding distance and SPEA2
the density function. In PESA, hyperbox method and squeeze factor concept are
used. For the archive-updating mechanism, if a new individual is nondominated in
both the population and the archive, and the archive is full, then select the individual
in the archive with the largest squeeze factor to be replaced by the new one.
PESA-II [29] differs from PESA only in the selection mechanism. In PESA-II,
the unit of selection is a hyperbox in the objective space. Every hyperbox has its own
squeeze factor. The hyperbox with the smallest squeeze factor will be selected first
and then a randomly chosen individual is selected. Region-based selection could
ensure a good distribution along the Pareto front. Instead of assigning a selective
fitness to an individual, selective fitness is assigned to the hyperboxes in an elitist
fashion in the objective space which are currently occupied by at least one individual
in the current approximation to the Pareto frontier. A hyperbox is thereby selected,
and the resulting selected individual is randomly chosen from this hyperbox. This
method of selection is more sensitive to ensuring a good spread of development
along the Pareto frontier than individual-based selection. PESA-II gives significantly
superior results to PAES, PESA, and SPEA.

23.2.5 MOEA Based on Decomposition (MOEA/D)

MOEAs can be Pareto-based methods or decomposition-based methods. A well-known decomposition approach is the cooperative coevolution technique. Coopera-
tive coevolution with global search has been designed to handle large scale optimiza-
tion problems by decomposing a large scale problem into smaller scale subproblems.
A Pareto optimal solution can be obtained by optimizing an appropriate scalarizing func-
tion [114]. The scalarizing function-based fitness evaluation approach is an alterna-
tive to the Pareto dominance relation. MOEA based on decomposition (MOEA/D)
[170] decomposes a MOP into a number of scalar optimization subproblems and
optimizes them in a collaborative manner by using an EA. MOEA/D is based on the
weighted sum method. A weighted Tchebycheff approach [114] is used in MOEA/D
algorithm for scalarization. A neighborhood relationship among all the subproblems
is defined based on the distances of their weight vectors. For the problem of scaling,
adaptive normalization is used for each objective. Each subproblem is optimized by
only using information from its several neighboring subproblems. MOEA/D employs
a set of N individuals and uniformly distributed weight vectors. Each of these weight
vectors formulates a single-objective problem, that is, a subproblem. A neighbor-
hood is defined for each individual in the population based on the distances between
the weight vectors.
MOEA/D converts a MOP into a number of scalar optimization subproblems
using Tchebycheff approach:
$$\min_{x \in \Omega} g^{tch}(x \,|\, w, z^*) = \min_{x \in \Omega} \max_{i=1,2,\ldots,m} \{w_i \, |f_i(x) - z_i^*|\}, \qquad (23.5)$$
where $w = (w_1, w_2, \ldots, w_m)$, with $\sum_{i=1}^{m} w_i = 1$ and $w_i \ge 0$, $i = 1, \ldots, m$, is the weight vector, and $z^* = (z_1^*, z_2^*, \ldots, z_m^*)$ is the reference point with $z_i^* = \min\{f_i(x) \mid x \in \Omega\}$, $i = 1, 2, \ldots, m$, for minimization problems; $\Omega$ denotes the feasible region.
These subproblems are optimized by evolving a set of solutions simultaneously.
The population saves the best solution found so far for each subproblem at each gener-
ation. The neighborhood relations among these subproblems are based on Euclidean
distances between their weight vectors. It solves each subproblem by only using the
information of its neighboring subproblems.
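The following sketch illustrates the Tchebycheff scalarization of Eq. (23.5) and a neighborhood-based replacement step in the spirit of MOEA/D on a toy 2-objective example; the data structures and function names are illustrative simplifications, not the reference implementation.

```python
import numpy as np

def tchebycheff(f, w, z_star):
    """Tchebycheff scalarizing function of Eq. (23.5)."""
    return np.max(w * np.abs(np.asarray(f) - np.asarray(z_star)))

def moead_update(pop_f, weights, neighbors, child_f, child_index):
    """A child generated for one subproblem replaces a neighboring
    subproblem's current solution when it achieves a better Tchebycheff
    value for that neighbor's weight vector; returns the replaced indices."""
    z_star = np.min(np.vstack([pop_f, child_f]), axis=0)   # update reference point
    replaced = []
    for j in neighbors[child_index]:
        if tchebycheff(child_f, weights[j], z_star) < \
           tchebycheff(pop_f[j], weights[j], z_star):
            pop_f[j] = child_f
            replaced.append(j)
    return replaced

# Toy usage: 3 subproblems of a 2-objective problem.
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
pop_f = np.array([[0.2, 0.9], [0.5, 0.5], [0.9, 0.2]])
neighbors = {1: [0, 1, 2]}                  # neighborhood of subproblem 1
print(moead_update(pop_f, weights, neighbors, np.array([0.3, 0.3]), 1))   # [1]
```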
MOEA/D has high search ability for continuous optimization, combinatorial opti-
mization, and also problems with complex Pareto sets. MOEA/D outperforms or
performs similarly to MOGLS and NSGA-II in terms of solution quality but with
lower complexity.
MOEA/D algorithm employs a linear DE mutation operator LIMO as a crossover
operator. The offspring generated by crossover is subsequently mutated using a muta-
tion operator with a probability to produce a new offspring [171]. MOEA/D-DRA
[171], an MOEA/D variant with a dynamic resource allocation scheme, was the
winner in the CEC2009 unconstrained MOEA competition.
A general class of continuous multiobjective optimization test instances with
arbitrary prescribed Pareto set shapes is introduced in [95]. MOEA/D based on
DE (MOEA/D-DE) (http://cswww.essex.ac.uk/staff/zhang/) [95] uses a DE operator
and a polynomial mutation operator for producing new solutions, and it has two
extra measures for maintaining the population diversity. Compared with NSGA-
II with the same reproduction operators on the test instances, MOEA/D-DE is less
sensitive to the control parameters in DE operator than NSGA-II-DE. MOEA/D could
significantly outperform NSGA-II on these test instances. An optional local search is
used to guarantee that the offspring is a legal and feasible solution, and an archive
is used to contain the nondominated solutions found thus far.
In MOEA/D, each subproblem is paired with a solution in the current popula-
tion. Subproblems and solutions are two sets of agents. The selection of promising
solutions for subproblems can be regarded as a matching between subproblems and
solutions. Stable matching, proposed in economics, can effectively resolve conflicts
of interests among selfish agents in the market. MOEA/D-STM [97] is derived from
MOEA/D-DRA [171]. The only difference between MOEA/D-STM and MOEA/D-
DRA is in the selection. MOEA/D-STM uses stable matching model to implement
the selection operator in MOEA/D. The subproblem preference encourages conver-
gence, whereas the solution preference promotes population diversity. Stable match-
ing model is used to balance these two preferences and thus, trading off the conver-
gence and diversity of the evolutionary search. The stable outcome produced by the
stable matching model matches each subproblem with one single solution, whereas
each subproblem agent in MOEA/D, by using its aggregation function, ranks all
solutions in the solution pool.

23.2.6 Several MOEAs

In micro-GA [27], the initial population memory is divided into a replaceable and a
non-replaceable portion. The non-replaceable portion is randomly generated, never
changes during the entire run and it provides the required diversity. The replaceable
portion changes after each generation. The population of each generation is taken
randomly from both population portions, and then undergoes conventional genetic
operators. After one generation, two nondominated vectors are chosen from the final
population and they are compared with the contents of the external archive. If either
of them (or both) remains as nondominated after comparing it against the vectors in
this archive, then they are included in the archive. This is the historical archive of
nondominated vectors. All dominated vectors contained in the archive are eliminated.
Micro-GA uses three forms of elitism. It can produce an important portion of the
Pareto front at a very low computational cost. During the evolving process, micro-
GA will start from points getting closer and closer to the Pareto front, which makes
micro-GA very efficient. The crowdedness evaluation in micro-GA is the squeeze
factor.
Incrementing MOEA [151] has a dynamic population size that is computed adap-
tively according to the discovered Pareto front and desired population density. It
incorporates the method of fuzzy boundary local perturbation with interactive local
fine-tuning for broader neighborhood exploration.
Incremental multiple-objective GA (IMOGA) [22] considers each objective incre-
mentally. The whole evolution is divided into as many phases as the number of
objectives, and one more objective is considered in each phase. In each phase, an
independent population is first evolved to optimize one specific objective, and then
the better-performing individuals from the single-objective population evolved and
the multiobjective population evolved in the last phase are joined together. IMOGA
uses Pareto-based fitness assignment, elitism, and a parameter-free diversity main-
tenance strategy. IMOGA generally outperforms NSGA-II, SPEA, and PAES.
Dynamic population size MOEA [168] includes a population-growing strategy
that is based on the converted fitness and a population-declining strategy that resorts
to the three qualitative indicators: age, health, and crowdedness. A cell-based rank
and density estimation strategy is proposed to efficiently compute dominance and
diversity information when the population size varies dynamically. Meanwhile, an
objective space compression strategy continuously refines the quality of the resulting
Pareto front. The performance is competitive with or even superior to incrementing
MOEA [151], NSGA-II and SPEA2.
In cellular multiobjective GA (C-MOGA) [119], each individual is located in a cell
with a different weight vector. This weight vector governs the selection operation.
The selection is performed in the neighborhood of each cell. C-MOGA uses weighted
sums of the objectives as its guided functions and there is no mechanism for keeping
the best solution found so far to each subproblem in its internal population. C-MOGA
inserts solutions from its external archive to its internal population at each generation
for dealing with nonconvex Pareto fronts.
In [161], a local search operator is employed in MOEAs. The operator employs
quadratic approximations of the objective functions and constraints using the pre-
vious EA function evaluations. It is applied to a few individuals of the EA pop-
ulation only. The local search phase consists of solving the auxiliary multiobjec-
tive quadratic optimization problem by the common linear matrix inequality (LMI)
solvers, which are based on interior point methods. The individuals generated by
the operator are pooled with the individuals generated by usual operators (such as
mutation or crossover), and a fixed-length population is evaluated.
Evolutionary local selection algorithm [113] has a local selection scheme informed
by ecological modeling. In local selection, individual fitnesses are accumulated over
time and compared to a fixed threshold to decide who will reproduce. Local selec-
tion maintains diversity in a way similar to fitness sharing, but is more efficient.
Evolutionary local selection algorithm outperforms NPGA.
MOCell [121] is an elitist cellular GA for solving multiobjective optimization
problems. An external archive is used to store nondominated solutions found so far
(like PAES or SPEA2), and solutions from this archive randomly replace existing
individuals in the population after each iteration. MOCell obtains competitive results
in terms of convergence and hypervolume, and it clearly outperforms NSGA-II and
SPEA2 concerning the diversity of the solutions along the Pareto front.
In [148], the presented weight-based EA tries to approximate the Pareto frontier
and to evenly distribute the solutions over the frontier. Each member selects its own
weight for a weighted Tchebycheff distance function to define its fitness score. The
fitness scores favor solutions that are closer to the Pareto frontier and that are located
at underrepresented regions.
ParEGO [83] is an extension of efficient global optimization to the multiobjective
framework. The objective values of solutions are scalarized with a weighted Tcheby-
cheff function and a model based on the Gaussian process at each iteration is used
to better approximate the Pareto frontier. ParEGO generally outperforms NSGA-II.
Hill climber with sidestep [91] is a local search-based procedure. It has been
integrated into a given evolutionary method such as SPEA2 and NSGA-II leading
to new memetic algorithms. The local search procedure is intended to be capable of
both moving toward and along the (local) Pareto set depending on the distance of
the current iterate toward this set. It utilizes the geometry of the directional cones of
such optimization problems and works with or without gradient information.
Genetic diversity evaluation method [155] is a diversity-preserving mechanism.
It considers a distance-based measure of genetic diversity as a real objective in
fitness assignment. This provides a dual selection pressure toward the exploitation
of current nondominated solutions and the exploration of the search space. Fitness
assignment is performed by ranking the solutions according to the Pareto ranks scored
with respect to the objectives of the MOP and a distance-based measure of genetic
diversity, creating a two-criteria optimization problem in which the objectives are
the goals of the search process itself. Genetic diversity EA [155] is a multiobjective
EA that is strictly designed around genetic diversity evaluation method, and features
a (μ + λ) selection scheme as an elitist strategy.
NPGA [71] is a global nonelitist selection algorithm for finding the Pareto optimal
set. It modifies GA to deal with multiple objectives by incorporating the concept of
Pareto dominance in its selection operator, and applying a niching pressure to spread
its population out along the Pareto optimal tradeoff surface. Niched Pareto GA 2
(NPGA2) [46] improves NPGA by using Pareto-rank-based tournament selection
and criteria-space niching to find nondominated frontiers.
Hypervolume-based algorithm (HypE) [7] uses a hypervolume estimation algo-
rithm for multiobjective optimization, by which the accuracy of the estimates can
be traded off against the available computing resources. Like standard MOEA, it is
based on fitness assignment schemes, and consists of successive application of mat-
ing selection, variation, and environmental selection. The hypervolume indicator
is applied in environmental selection. In HypE, a Monte Carlo simulation method
is used to approximate the exact hypervolume value. This approximation method
significantly reduces the computational load and makes HypE very competitive for
solving many-objective optimization problems.
Single front GA [20,21] is an island model for multiobjective problems with a
clearing procedure that uses a grid in the objective space for maintaining diversity
and the distribution of the solutions in the Pareto front. Each subpopulation (island)
is associated with a different area in the search space. Compared with NSGA-II and
SPEA2, single front GAs (SFGA, and especially SFGA2) have obtained adequate
quality in the solutions in very little time. Single front GAs could be appropriate
in dealing with optimization problems with high rates of change, and thus stronger
time constraints, such as multiobjective optimization for dynamic problems.
Direction-based multiobjective EA (DMEA) [18] incorporates the concept of
direction of improvement. A population of solutions is evolved under the guidance
of directions of improvement. Two types of directions are used including conver-
gence direction (from a dominated solution to a nondominated one) and spreading
direction (between two nondominated solutions) for generation of offspring along
those directions. An archive is not only used for contributing elite solutions for the
next generation, but also for deriving the directions. DMEA-II [124] adapts a balance
between convergence and spreading by using an adaptive ratio between the conver-
gence and spreading directions being selected. It introduces a concept of ray-based
density for niching and a selection scheme based on the ray-based density. DMEA-II
yields quite good results on primary performance metrics, including the generation
distance, inverse generation distance, hypervolume and the coverage set.
Wolbachia pipientis is a bacterium that is widespread among insect species. Wol-
bachia has the capacity to spread very rapidly in an uninfected population due to
the induction of a biological mechanism known as cytoplasmic incompatibility. This
mechanism causes the death of the progeny when an uninfected female mates with
an infected male. The host infected with Wolbachia bacteria is endowed with resis-
tance to virus infection. Wolbachia infection is mainly used as a biological disease
control strategy to combat vector borne diseases. Wolbachia infection GA [60] is
an MOEA that simulates a Wolbachia infection during reproduction in the popula-
tion to help achieve better solutions and in some cases in fewer generations. By using
Wolbachia infection and cytoplasmic incompatibility, the best solutions are spread
to assert exploitation. At every generation, some individuals are infected with the
Wolbachia bacteria. Those individuals in the Pareto front are the infected ones. These
individuals are selected to reproduce because they are the best solutions found.
MOEA based on reputation [79] introduces reputation concept to measure the
dynamic competency of operators and parameters across problems and stages of
the search. Each individual in MOEAs selects operators and parameters with the
probabilities correlated to their reputation. In addition to population initialization,
the initial reputation score of operators and parameters is set as the same value which
offers all an equal chance to participate in the search process. Individual solutions
generate offspring by choosing operators and parameters based on reputation scores.
Credit assignment rewards the selected operators and parameters based on whether
the offspring can survive to the next generation. The reputation of operators and
parameters is updated based on the aggregation of historical rewards gained in the
previous generation and new rewards in the current generation. The Java source codes
of MOEAs based on reputation, and four adaptive MOEA/D variants are available
at http://trust.sce.ntu.edu.sg/~sjiang1/.

23.2.7 Nondominated Sorting

Among various dominance comparison mechanisms, nondominated sorting is very
effective for finding Pareto optimal solutions. Nondominated sorting is a procedure
where solutions in the population are assigned to different fronts based on their
dominance relationships. In most nondominated sorting algorithms, a solution needs
to be compared with all other solutions before it can be assigned to a front. This
can be computationally expensive, especially when the number of individuals in the
population becomes large.
Nondominated sorting is a selection strategy implemented in NSGA. It has a
time complexity of O(MN^3) and a space complexity of O(N), for M objectives and
N solutions in the population. Fast nondominated sort [149] has a reduced time
complexity of O(MN^2), but its space complexity is O(N^2). In [77], a divide and
conquer strategy is adopted for nondominated sorting, and the time complexity is
O(N log^{M-1} N).
Climbing sort and deductive sort [23] improve nondominated sorting by inferring
some dominance relationships between solutions based on recorded comparison
results. The space complexity of deductive sort is O(N), and its best-case time
complexity is O(MN√N).
In efficient nondominated sort [174], a solution to be assigned to a front needs
to be compared only with those that have already been assigned to a front, thereby
avoiding many unnecessary dominance comparisons. The population is sorted in one
objective before efficient nondominated sort is applied. Thus, a solution added to the
fronts cannot dominate solutions that are added before. Efficient nondominated sort
has a space complexity of O(1). The time complexity of efficient nondominated sort
is O(MN log N) in good cases, and O(MN^2) in the worst case.
In [47], a nondominated sorting algorithm is presented to generate the nondomi-
nated fronts in MOEAs, particularly NSGA-II. It reduces the number of redundant
comparisons existing in NSGA-II by recording the dominance information among
solutions from their first comparisons. By utilizing the dominance tree data structure
and the divide and conquer mechanism, the algorithm generates the same nondomi-
nated fronts as in NSGA-II, but uses less time.
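For reference, a minimal Python sketch of the fast nondominated sort of [149] is given below; it performs the O(MN^2) dominance comparisons explicitly and returns the fronts as lists of solution indices. The function names are our own and the code is only illustrative, not the NSGA-II reference implementation.

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return np.all(a <= b) and np.any(a < b)

def fast_nondominated_sort(F):
    """Fast nondominated sort [149]: F is an (N, M) array of objective
    vectors; returns a list of fronts, each a list of solution indices."""
    N = F.shape[0]
    S = [[] for _ in range(N)]        # solutions dominated by i
    n = np.zeros(N, dtype=int)        # number of solutions dominating i
    fronts = [[]]
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            if dominates(F[i], F[j]):
                S[i].append(j)
            elif dominates(F[j], F[i]):
                n[i] += 1
        if n[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in S[i]:
                n[j] -= 1
                if n[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]                # drop the trailing empty front

# Example with three biobjective points: the first two are mutually
# nondominated, the third is dominated by both.
print(fast_nondominated_sort(np.array([[1, 2], [2, 1], [3, 3]])))
```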

23.2.8 Multiobjective Optimization Based on Differential Evolution

DE has been extended for multiobjective optimization, such as Pareto-frontier DE
algorithm [1], self-adaptive Pareto DE [2], Pareto DE [111], and vector-evaluated
DE [126], DE for multiobjective optimization (DEMO) [136], generalized DE [88],
and multiobjective DE with spherical pruning [135].
Pareto-frontier DE [1] solves a multiobjective problem by incorporating Pareto
dominance, self-adaptive crossover, and mutation operator. Pareto DE [111] extended
DE by incorporating nondominated sorting and ranking selection scheme of NSGA-
II. Inspired by VEGA [140], a parallel multipopulation DE algorithm is introduced for
multiobjective optimization [126]. Generalized DE (GDE3) [88] extends the selec-
tion operator of DE to handle constrained multiobjective optimization problems.
GDE3 (and DE in general) is notable for rotationally invariant operators—they pro-
duce offspring independent of the orientation of the fitness landscape. GDE3 was a
strong competitor in the CEC 2009 Competition.
In self-adaptive Pareto DE [2], Cr is encoded into each individual and simultane-
ously evolves with other search variables, whereas F was generated for each variable
from a Gaussian distribution N(0, 1). The approach self-adapts the crossover and
mutation rates. Based on self-adaptive Pareto DE [2], DE with self-adapting popula-
tions [154] self-adapts the population size, in addition to the crossover and mutation
rates.
DEMO [136] combines DE with Pareto-based ranking and crowding distance sort-
ing. AMS-DEMO [43] is an asynchronous master–slave implementation of DEMO.
It utilizes queues for each slave, which reduces the slave idle time to a negligible
amount. The number of processors is not required to divide the population size and
may even exceed it. AMS-DEMO achieves speedups larger than the population size,
and therefore larger than the theoretical limit for generational algorithms. Asynchro-
nous nature makes AMS-DEMO robust to communication failures and able to handle
dynamic allocation of processing resources, and thus suitable for grid computing.
In [135], multiobjective DE with spherical pruning and multiobjective DE with
spherical pruning based on preferences are proposed for finding a set of nondomi-
nated and Pareto optimal solutions.

23.3 Performance Metrics

The three performance objectives are convergence to the Pareto front, evenly distrib-
uted Pareto optimal solutions and coverage of the entire front.
In multiobjective optimization, a theoretically well-supported alternative to Pareto
dominance is the use of a set-based indicator function to measure the quality of a
Pareto front approximation of solution sets. Some performance metrics are described
in this section.
Generational Distance
Generational distance [157] measures how close a set of nondominated solutions is
to the Pareto optimal front. An algorithm with the minimum generational distance
has the best convergence to the Pareto optimal front.
Generational distance is defined as the average distance from an approximation
set of solutions, P, found by evolution to the global Pareto optimal set (i.e., the
reference set):
$$D(P, P^*) = \frac{\sum_{s \in P} \min\left\{\|x_1^* - s\|_2, \ldots, \|x_{N_P}^* - s\|_2\right\}}{|P|} = \frac{\sum_{i=1}^{n} d_i}{n}, \qquad (23.6)$$
where $|P^*| = N_P$ is the cardinality of the reference set $P^* = \{x_1^*, \ldots, x_{N_P}^*\}$, $d_i$ is the Euclidean
distance (in the objective space) from solution $i$ of P to the nearest solution in the Pareto
optimal set, and $n = |P|$ is the size of P. This metric describes convergence, but not the
distribution of the solutions over the entire Pareto optimal front. The metric measures
the distance of the elements in the set P from the nearest point of the reference
Pareto frontier, P being an approximation of the true front and P ∗ the reference
Pareto optimal set.
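A direct transcription of (23.6) into Python might look as follows; the array layout (one row per solution, one column per objective) and the function name are our own assumptions.

```python
import numpy as np

def generational_distance(P, P_star):
    """Generational distance (23.6): mean Euclidean distance, in objective
    space, from each solution in the approximation set P to its nearest
    neighbor in the reference Pareto optimal set P*."""
    P, P_star = np.asarray(P, float), np.asarray(P_star, float)
    # d[i, j] = distance between P[i] and P_star[j]
    d = np.linalg.norm(P[:, None, :] - P_star[None, :, :], axis=-1)
    return d.min(axis=1).mean()

# Example: two obtained points measured against a three-point reference front.
print(generational_distance([[0.1, 1.0], [0.6, 0.5]],
                            [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))
```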
Spacing
Spacing metric [141] is used to measure the distribution of the nondominated solu-
tions obtained by an algorithm, i.e., the obtained Pareto optimal front [36]
$$Sp = \sqrt{\frac{1}{|P|} \sum_{i=1}^{|P|} (d_i - \bar{d})^2}, \qquad (23.7)$$
where $|P|$ is the number of members in the approximate Pareto optimal front P, and $d_i$ is the
distance between the member $x_i$ in P and its nearest member in P,
$$d_i = \min_{x_j \in P,\; j \neq i} \sum_{m=1}^{k} |f_m(x_i) - f_m(x_j)|, \quad j = 1, 2, \ldots, |P|, \qquad (23.8)$$
$\bar{d}$ is the average value of $d_i$, and $k$ is the number of objective functions $f_m$.
A smaller Sp indicates a more uniform distribution in P. If all nondominated solutions
are uniformly distributed in the objective space PF, then di = d̄, and Sp = 0.
The average Sp over all time is represented by
$$\overline{Sp} = \frac{\sum_{i=1}^{T_{max}} Sp_i}{T_{max}}, \qquad (23.9)$$
where $Sp_i$ is the performance metric at time step $i$, and $T_{max}$ is the maximum
number of time steps.
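The spacing metric of (23.7) and (23.8) can be sketched as below, using the city-block distance of (23.8) between members of P; the function name and array layout are ours.

```python
import numpy as np

def spacing(P):
    """Spacing metric Sp of (23.7): standard deviation of the city-block
    distances (23.8) from each member of P to its nearest neighbor in P."""
    P = np.asarray(P, float)
    # pairwise sum of absolute objective differences, as in (23.8)
    d = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=-1)
    np.fill_diagonal(d, np.inf)       # exclude the self-distance
    d_min = d.min(axis=1)
    return np.sqrt(np.mean((d_min - d_min.mean()) ** 2))

# A perfectly evenly spaced front gives Sp = 0.
print(spacing([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))
```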
Spread or Diversity
Spread or diversity metric [37,39] measures the spread of the nondominated solutions
obtained by an algorithm. This criterion indicates how the solutions are extended
in P:
$$\Delta = \frac{d_f + d_l + \sum_{i=1}^{|P|} |d_i - \bar{d}|}{d_f + d_l + (|P| - 1)\bar{d}}, \qquad (23.10)$$
where $d_f$ and $d_l$ are the Euclidean distances between the extreme (boundary) solutions of the
true Pareto front $PF_{optimal}$ and of the obtained Pareto front P, respectively, and $d_i$ is the
Euclidean distance between point $x_i$ in P and the closest point in $PF_{optimal}$.
$\Delta$ is always nonnegative, and a smaller $\Delta$ means a better distribution and spread
of the solutions. $\Delta = 0$ is the perfect condition, indicating that the extreme solutions of
$PF_{optimal}$ have been found and that $d_i = \bar{d}$ for all nondominated points.
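A literal reading of (23.10) for a biobjective front might be coded as follows. Note that definitions of the spread metric vary (some take d_i between consecutive solutions of P rather than against the true front), so this sketch follows the wording above and assumes both fronts can be ordered by the first objective; the function name is ours.

```python
import numpy as np

def spread(P, PF_opt):
    """Spread/diversity metric Delta of (23.10), read literally: d_i is the
    distance from each member of P to its closest point on PF_optimal, and
    d_f, d_l are the distances between the extreme (boundary) solutions of
    the true and the obtained fronts."""
    P, PF_opt = np.asarray(P, float), np.asarray(PF_opt, float)
    P = P[np.argsort(P[:, 0])]              # order both fronts by f1
    PF_opt = PF_opt[np.argsort(PF_opt[:, 0])]
    d_f = np.linalg.norm(PF_opt[0] - P[0])   # first-boundary distance
    d_l = np.linalg.norm(PF_opt[-1] - P[-1])  # last-boundary distance
    d = np.linalg.norm(P[:, None, :] - PF_opt[None, :, :], axis=-1).min(axis=1)
    d_bar = d.mean()
    return (d_f + d_l + np.abs(d - d_bar).sum()) / (d_f + d_l + (len(P) - 1) * d_bar)

print(spread([[0.05, 0.95], [0.5, 0.5], [0.95, 0.05]],
             [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))
```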
Inverse Generational Distance
Inverted generational distance (IGD) metric [12,170] measures the average distance
from the points in the Pareto front to their closest solution in the obtained set. It
provides combined information about convergence and diversity of a solution set. A
388 23 Multiobjective Optimization

low IGD value (ideally zero) is preferable, indicating that the obtained solution set is
close to the Pareto front as well as has a good distribution. Knowledge of the Pareto
front of a test problem is required for the calculation of IGD. The DTLZ problems
have known optimal fronts.
Let PF be the Pareto optimal set (a reference set representing the Pareto front);
the IGD value from PF to the obtained solution set P is defined by
$$IGD(P) = \frac{\sum_{x \in PF} d(x, P)}{|PF|} = \frac{\sum_{i=1}^{|PF|} d_i}{|PF|}, \qquad (23.11)$$
where the cardinality $|PF|$ is the size of the Pareto optimal set, $d(x, P)$ is the minimum
Euclidean distance (in the objective space) from $x$ to P, and $d_i$ is the Euclidean distance
from solution $i$ in the Pareto optimal set to the nearest solution in P.
In order to get a low IGD value, P needs to cover all parts of the Pareto optimal
set. However, this metric only considers, for each point of the Pareto optimal set, the
single solution in P that is closest to it.
The average IGD over all time is a dynamic performance metric
$$\overline{IGD} = \frac{\sum_{i=1}^{T_{max}} IGD_i}{T_{max}}, \qquad (23.12)$$
where $IGD_i$ is the performance metric at time step $i$, and $T_{max}$ is the maximum
number of time steps.
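A minimal Python sketch of the IGD metric of (23.11) is given below; the function name and array layout are ours.

```python
import numpy as np

def igd(P, PF):
    """Inverted generational distance (23.11): mean distance from every point
    of the reference Pareto front PF to its nearest solution in the set P."""
    P, PF = np.asarray(P, float), np.asarray(PF, float)
    d = np.linalg.norm(PF[:, None, :] - P[None, :, :], axis=-1)
    return d.min(axis=1).mean()

# IGD is zero only when P covers every reference point exactly.
print(igd([[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]))
```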
Hypervolume
Hypervolume metric [177,181] is a unary measure of the hypervolume in the objec-
tive space that is dominated by a set of nondominated points. Hypervolume measures
the volume of the objective space covered/dominated by the approximation set, rep-
resenting a combination of proximity and diversity. For the problem whose Pareto
front is unknown, hypervolume is a popular performance metric.
The hypervolume measure is strictly monotonic with regard to Pareto dominance,
i.e., if a set A dominates the set B, then the hypervolume metric HV(A) > HV(B),
assuming the metric is to be maximized. However, hypervolume calculation requires
a runtime that increases exponentially with respect to the number of objectives. The R2
metric is considered as an alternative to hypervolume. The R2 metric [64] is weakly
monotonic, i.e., HV(A) ≥ HV(B) if A weakly dominates B. Its calculation is much
easier.
Hypervolume calculates the volume of the objective space between the obtained
solution set and a reference point, and a larger value is preferable. Before computing
hypervolume, the values of all objectives are normalized to the range of a reference
point for each test problem. Choosing a reference point that is slightly larger than
the worst value of each objective on the Pareto front is suitable since the effects
of convergence and diversity of the set can be well balanced [5]. For minimization
problems, hypervolume values are normalized as
$$HV_k = \frac{HV_k^*}{\max_{i=1,2,\ldots,N} HV_i^*}, \qquad (23.13)$$
where $HV_k^*$, $k = 1, 2, \ldots, N$, is the kth hypervolume value for a test problem, and
$HV_k$ is the normalized value of $HV_k^*$.
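Exact hypervolume computation is expensive for many objectives, but in the biobjective case it reduces to a simple sweep. The sketch below assumes minimization, a mutually nondominated set, and a reference point dominated by every solution; it is illustrative only, and the function name is ours.

```python
import numpy as np

def hypervolume_2d(P, ref):
    """Hypervolume of a biobjective (minimization) nondominated set P with
    respect to a reference point ref, computed by sweeping along f1."""
    P = np.asarray(P, float)
    P = P[np.argsort(P[:, 0])]            # sort by the first objective
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in P:
        # each point adds the rectangle it uniquely dominates
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

print(hypervolume_2d([[0.2, 0.8], [0.5, 0.4], [0.9, 0.1]], ref=(1.0, 1.0)))
```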
Two Set Coverage
Two set coverage (SC) metric [177], as a relative coverage comparison of two sets,
is defined as the mapping of the ordered pair (A, B) to [0, 1]:
$$SC(A, B) = \frac{|\{x_b \in B;\ \exists x_a \in A : x_b \prec x_a\}|}{|B|}. \qquad (23.14)$$
SC(A, B) = 1 means that all points in B are dominated by or equal to points in A, while
SC(A, B) = 0 means that no point in B is covered by any point in A. SC(A, B) denotes
the percentage of solutions in B that are dominated by solutions in A. Note that SC(A, B)
is not necessarily equal to 1 − SC(B, A), so both directions have to be evaluated.
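The set coverage metric of (23.14) can be sketched as follows, counting a solution of B as covered when it is dominated by or equal to some solution of A (minimization assumed); the helper names are ours.

```python
import numpy as np

def weakly_dominates(a, b):
    """True if a Pareto-dominates or equals b (minimization)."""
    return np.all(np.asarray(a) <= np.asarray(b))

def set_coverage(A, B):
    """Two set coverage SC(A, B) of (23.14): fraction of solutions in B that
    are dominated by (or equal to) at least one solution in A."""
    return np.mean([any(weakly_dominates(a, b) for a in A) for b in B])

A = [[0.0, 1.0], [1.0, 0.0]]
B = [[0.2, 1.2], [0.5, 0.5]]
print(set_coverage(A, B), set_coverage(B, A))   # generally not complementary
```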
The additive ε-indicator (ε+-indicator) is the smallest distance by which the approximation set
must be translated so that every objective vector in the reference set is covered.
This identifies situations in which the approximation set contains one or more out-
lying objective vectors with poor proximity. A binary ε-dominance-based indicator
is defined in [182].

23.4 Many-Objective Optimization

A multiobjective problem involving a large number of objectives (M > 4) is generally
said to be a many-objective optimization problem. In multiobjective optimization, it is
generally observed that the conflict between convergence and diversity is aggravated
with the increase of the number of objectives, and the Pareto dominance loses its
effectiveness for a high-dimensional space but works well on a low-dimensional
space.
Most classical Pareto-based MOEAs, such as NSGA-II, SPEA2, and PAES,
encounter difficulties in dealing with many-objective problems [82,159]. The poor
performance of EAs is due to the loss of selection pressure in fitness evaluation.
This is improved in [66], where the whole population first quickly approaches a
small number of target points near the true Pareto front, and a diversity improvement
strategy is then applied to facilitate these individuals to spread and well distribute.

23.4.1 Challenges in Many-Objective Optimization

Balancing convergence and diversity is not an easy task in many-objective opti-
mization. One major reason is that the proportion of nondominated solutions in a
population rises rapidly with the number of objectives. This makes the Pareto-based
primary selection fail to distinguish individuals, and makes the diversity-based sec-
ondary selection play a leading role in determining the survival of individuals. In
this case, the performance of algorithms may worsen since they prefer dominance
resistant solutions [72], i.e., solutions with an extremely poor value in at least one of
the objectives, but with near optimal values in the others. Consequently, the solutions
in the final solution set may be distributed uniformly in the objective space, but away
from the desired Pareto front.
Some studies have shown that a random search algorithm may even achieve bet-
ter results than Pareto-based algorithms in MOPs with a high number of objectives
[86,130]. The selection rule created by the Pareto dominance makes the solutions
nondominated with respect to one another, at an early stage of MOEAs [30,48]. In
these algorithms, the ineffectiveness of the Pareto dominance relation for a high-
dimensional space leads diversity maintenance mechanisms to play a leading role
during the evolutionary process, while the preference of diversity maintenance mech-
anisms for individuals in sparse regions results in the final solutions distributed widely
over the objective space but distant from the desired Pareto front.
Let us consider a solution set of size N for an M (M > 4) objective optimization
problem. Assume that each of the solutions is distinct in all M objectives and each of
the objective values are continuous variables. The expected number of nondominated
solutions is given by [17]
$$A(N, M) = \sum_{k=1}^{N} (-1)^{k+1} \binom{N}{k} \frac{1}{k^{M-1}}. \qquad (23.15)$$
By dividing the above expression by N, we have
$$P(N, M) = \frac{A(N, M)}{N} = \frac{\sum_{k=1}^{N} (-1)^{k+1} \binom{N}{k} \frac{1}{k^{M-1}}}{N}. \qquad (23.16)$$
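A quick numerical check of (23.16), using exact rational arithmetic to avoid cancellation in the alternating sum, might look as follows; the function name is ours.

```python
from fractions import Fraction
from math import comb

def expected_nondominated_fraction(N, M):
    """P(N, M) of (23.16): expected fraction of mutually nondominated
    solutions among N random solutions with M independent objectives."""
    A = sum(Fraction((-1) ** (k + 1) * comb(N, k), k ** (M - 1))
            for k in range(1, N + 1))
    return float(A / N)

# The fraction of nondominated solutions approaches 1 as M grows.
for M in (2, 5, 10, 15):
    print(M, round(expected_nondominated_fraction(50, M), 4))
```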
For given N, P(N, M) converges to 1 with increasing M, as shown in Figure 23.4.
This indicates that if we follow the selection rule defined by Pareto dominance, the
chance of getting a nondominated solution increases as the number M of objectives
is increased. The problem can be solved by changing the dominance criterion [48].
Figure 23.4 P(N, M) as a function of the number of objectives M, for N = 10, 20, and 50.
Nondominance is an inadequate strategy for convergence to the Pareto front for
such problems, as almost all solutions in the population become nondominated,


resulting in loss of convergence pressure. However, for some problems, it may be
possible to generate the Pareto front using only a few of the objectives, rendering
the rest of the objectives redundant. Such problems may be reducible to a man-
ageable number of relevant objectives, which can be optimized using conventional
MOEAs. Pareto corner search EA [145] searches for the corners of the Pareto front
instead of searching for the complete Pareto front, and then the corner solutions are
used for dimensionality reduction to identify the relevant objectives. The approach
does not use nondominated sorting, and hence does not suffer from the lack of
selection pressure during evolutionary search. Consequently, it takes much fewer
evaluations to find solutions close to the Pareto front. A very large population size
is not required with growing number of objectives, because the approach does not
attempt to approximate the entire Pareto front, but only a few characteristic solutions
instead. For improving convergence, there have been proposals such as average rank-
ing [30], modifying dominance relations [139], indicator-based ranking [178], and
substitute distance assignments [87]. To aid such preference-based search, proposals
based on reference point method [42] and preference articulation method [49] have
been formulated.

23.4.2 Pareto-Based Algorithms

Based on DEMO [136], α-DEMO [8] implements the technique of selecting a subset
of conflicting objectives using a correlation-based ordering of objectives. α is a
parameter determining the number of conflicting objectives to be selected. A new
form of elitism is proposed so as to restrict the number of higher ranked solutions that
are selected in the next population. α-DEMO algorithm [8] works faster than other
MOEAs based on dimensionality reduction, such as KOSSA [75], MOEA/D [170],
and HypE [7] for many-objective optimization problems, while having competitive
performance.
Shift-based density estimation (SDE) [98] can accurately reflect the density of
individuals in the population. It is a modification of traditional density estimation
in Pareto-based algorithms for dealing with many-objective problems. By shifting
individuals’ positions according to their relative proximity to the Pareto front, SDE
considers both convergence and diversity for each individual in the population. The
implementation of SDE is simple and it can be applied to any density estimator with-
out additional parameters. SDE has been applied to three Pareto-based algorithms,
namely, NSGA-II, SPEA2, and PESA-II. SPEA2+SDE provides a good balance
between convergence and diversity. When addressing a many-objective problem,
SDE may be easily and effectively adopted, as long as the algorithm’s density esti-
mator can accurately reflect the density of individuals.
Grid-based EA [165] solves many-objective optimization problems. It exploits
the potential of the grid-based approach to strengthen the selection pressure toward
the optimal direction while maintaining an extensive and uniform distribution among
solutions. Grid dominance and grid difference are used to determine the mutual rela-
tionship of individuals in a grid environment. Three grid-based criteria, namely, grid
ranking, grid crowding distance, and grid coordinate point distance, are incorporated
into the fitness of individuals to distinguish them in both the mating and environ-
mental selection processes. Moreover, a fitness adjustment strategy is developed by
adaptively punishing individuals based on the neighborhood and grid dominance
relations in order to avoid partial overcrowding as well as guide the search toward
different directions in the archive. The designed density estimator of an individ-
ual takes into account not only the number of its neighbors, but also the distance
difference between itself and these neighbors.
A diagnostic assessment framework for rigorously evaluating the effectiveness,
reliability, efficiency, and controllability of many-objective evolutionary optimiza-
tion as well as identifying their search controls and failure modes is proposed in [62].
Given the variety of fitness landscapes and the complexity of search population
dynamics, the operators used during multiobjective search are adapted based on
their success in guiding search [158]. Building on this, Borg MOEA [63] handles
many-objective multimodal problems using an auto-adaptive multioperator recom-
bination operator. This adaptive configuration of simulated binary crossover, DE,
parent-centric recombination (PCX), unimodal normal distribution crossover, sim-
plex crossover, polynomial mutation, and uniform mutation enables Borg MOEA to
quickly adapt to the problem’s local characteristics. The auto-adaptive multioperator
recombination, adaptive population sizing, and time continuation components all
exploit dynamic feedback from an ε-dominance archive to guarantee convergence
and diversity throughout search, according to the theoretical analysis of [94]. Borg
MOEA combines ε-dominance, a measure of convergence speed named ε-progress,
randomized restarts, and auto-adaptive multioperator recombination into a unified
optimization framework. Borg meets or exceeds ε-NSGA-II, ε-MOEA, OMOPSO,
GDE3, MOEA/D, SPEA2, and NSGA-II on the majority of the tested problems.
NSGA-III [38,76] is a reference point-based many-objective EA following NSGA-
II framework. It emphasizes population members that are nondominated, yet close to
a set of supplied reference points. NSGA-III outperforms MOEA/D-based algorithm
for unconstrained and constrained problems with a large number of objectives.
Clustering–ranking EA [19] implements clustering and ranking sequentially for
many-objective optimization. Clustering incorporates NSGA-III, using a series of
reference lines as the cluster centroid. The solutions are ranked according to their
degree of closeness to the true Pareto front. An environmental selection operation is
performed on every cluster to promote both convergence and diversity.
MOEA equipped with the preference relation can be integrated into an interactive
optimization method. A preference relation based on a reference point approach [108]
enables integrating decision-maker’s preferences into an MOEA. Besides finding the
optimal solution of the achievement scalarizing function, the new preference relation
allows the decision-maker to find a set of solutions around that optimal solution.
Since the preference relation induces a finer order on vectors of the objective space
than that achieved by the Pareto dominance relation, it is appropriate to cope with
many-objective problems.
Preference-inspired coevolutionary algorithm (PICEA) [160] coevolves a fam-
ily of decision-maker preferences together with a population of candidate solu-
tions. A realization of this method, PICEA-g, is systematically compared with a
Pareto dominance relation-based algorithm (NSGA-II), an ε-dominance relation-
based algorithm (ε-MOEA), a scalarizing function-based algorithm (MOEA/D), and
an indicator-based algorithm (HypE). For many-objective problems, PICEA-g and
HypE have comparable performance, and tend to outperform NSGA-II, ε-MOEA,
and MOEA/D.
To deal with many-objective optimization problems, bi-goal evolution [99]
converts a MOP into a bi-objective optimization problem regarding convergence
and diversity, and then handles it using the Pareto dominance relation in this bi-goal
domain. Implemented with performance estimation of individuals and the nondomi-
nated sorting procedure, bi-goal evolution divides individuals into different nondom-
inated layers and attempts to put well-converged and well-distributed individuals into
the first few layers.

23.4.3 Decomposition-Based Algorithms

Convergence to the Pareto optimal front for decomposition-based algorithms can
often be superior to that of Pareto-based alternatives. Performance of decomposition-
based approaches for many-objective optimization problems are largely dependent
on means of reference point generation, schemes to simultaneously deal with con-
vergence and diversity, and methods to associate solutions to reference directions.
In a decomposition-based EA introduced in [4], uniformly distributed reference
points are generated via systematic sampling (the same as adopted in NSGA-III [38]),
balance between convergence and diversity is maintained using two independent dis-
tance measures, and a simple preemptive distance comparison scheme is used for
association. Scalarization has been addressed via two fundamental means, namely,
through a systematic association and niche preservation mechanism as in NSGA-III
[38] or through an aggregation of the projected distance along a reference direction
and the perpendicular distance from a point to a given reference direction within
the framework of MOEA/D. The association of solutions to reference directions are
based on two independent distance measures. The reference directions are gener-
ated using systematic sampling, wherein the points are systematically generated on
a hyperplane with unit intercepts in each objective axis. The distance along the ref-
erence direction controls the convergence, whereas the perpendicular distance from
the solution to the reference direction controls the diversity. In order to improve the
efficiency of the algorithm, a steady state form is adopted in contrast to a generational
model used in NSGA-III [38]. In order to deal with constraints, an adaptive epsilon
formulation is used.
MOEA/D has two difficulties for many-objective problems. First, the number
of constructed weight vectors is not arbitrary and the weight vectors are mainly
distributed on the boundary of weight space for many-objective problems. Second,
the relationship between the optimal solution of subproblem and its weight vec-
tor is nonlinear for the Tchebycheff decomposition approach used by MOEA/D. To
deal with these difficulties, MOEA/D-UDM (MOEA/D with uniform decomposition
measurement and the modified Tchebycheff decomposition approach) [110] obtains
uniform initial weight vectors in any amount based on the uniform decomposition
measurement, and uses the modified Tchebycheff decomposition approach to alle-
viate the inconsistency between the weight vector of subproblems and the direction
of its optimal solution in the Tchebycheff decomposition approach. MOEA/D-UDM
combines simplex-lattice design with transformation method to generate alternative
weight vectors and then selects uniform weight vectors from the alternative weight
vectors based on the uniform design measurement.
Decomposition-based algorithms have difficulties in effectively distributing Pareto
optimal solutions in a high-dimensional space. Generalized decomposition [55] pro-
vides a framework with which the decision-maker can guide the underlying search
algorithm toward regions of interest, or the entire Pareto front, with the desired distri-
bution of Pareto optimal solutions. It focuses on only the performance of convergence
to the Pareto front. A set of weighting vectors can be generated near regions of inter-
est, thus avoiding a waste of resources in a search for Pareto optimal solutions away
from such regions. Since generalized decomposition-based algorithms have a way to
distribute solutions on the Pareto front very precisely, the necessity of using elaborate
archiving strategies and sharing is diminished. Many-objective cross entropy based
on generalized decomposition (MACE-gD) [55] is a scalable framework for tackling
many-objective problems with respect to GD-metric. It is established on generalized
decomposition and an EDA based on low-order statistics, namely, the cross-entropy
method. MACE-gD is competitive with MOEA/D and RM-MEDA [173] in terms of
GD-metric.

23.5 Multiobjective Immune Algorithms

Based on clonal selection principle, multiobjective immune system algorithm (MISA)
[25] uses Pareto dominance and feasibility to identify solutions that deserve to be
cloned. MISA adopts an affinity measure to control the amount of hypermutation
applied to each antibody, and uses two types of mutation: uniform mutation applied
to the clones produced and nonuniform mutation applied to the not-so-good antibod-
ies. An archive is used to store the nondominated solutions found along the search
process. Such archive allows the elitist mechanism to move toward the true Pareto
front over time. MISA decides how many clones to produce from a certain antibody
based on how crowded is the region to which it belongs in the archive.
Immune dominance clonal multiobjective algorithm (IDCMA) [78] defines
antibody–antibody (Ab-Ab) affinity as a custom distance measure between the dom-
inated individuals and one of the nondominated individuals found so far. According
to the values of Ab-Ab affinity, all dominated individuals (antibodies) are divided
into subdominant antibodies and cryptic antibodies. A heuristic search only applies
to the subdominant antibodies, but the cryptic antibodies can become subdominant
(active) antibodies in the subsequent evolution. IDCMA has difficulties in converg-
ing to the true Pareto optimal front and obtaining well-distributed solutions for some
complicated problems.
Both MISA and IDCMA adopt binary representation. Nondominated neighbor
immune algorithm (NNIA) (http://see.xidian.edu.cn/iiip/mggong/Projects/NNIA.
htm) [59] modifies IDCMA by using real-coded representation. It uses nondomi-
nated neighbor-based selection, an immune-inspired operator, two heuristic search
operators, elitism, and a population maintenance strategy. In NNIA, the fitness value
of each nondominated individual is assigned as the average distance of two non-
dominated individuals on either side of this individual along each of the objectives,
namely, the crowding distance. Inspired by immune theory, only a few nondominated
individuals with greater crowding distance values are selected as active antibodies to
perform proportional cloning, recombination, and hypermutation. By using nondom-
inated neighbor-based selection and proportional cloning, NNIA realizes enhanced
local search in the less crowded regions of the current trade-off front. The algorithm
scales well along the number of objectives.
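The crowding distance referred to above (originating in NSGA-II) can be sketched as follows; assigning an infinite distance to the boundary solutions and normalizing by the objective range are common conventions rather than details prescribed in the text, and the function name is ours.

```python
import numpy as np

def crowding_distance(F):
    """Crowding distance of each solution in a nondominated set: for every
    objective, the distance between the two neighbors enclosing the solution,
    summed (after range normalization) over the objectives."""
    F = np.asarray(F, float)
    n, m = F.shape
    dist = np.zeros(n)
    for j in range(m):
        order = np.argsort(F[:, j])
        span = F[order[-1], j] - F[order[0], j]
        dist[order[0]] = dist[order[-1]] = np.inf   # keep boundary solutions
        if span == 0:
            continue
        dist[order[1:-1]] += (F[order[2:], j] - F[order[:-2], j]) / span
    return dist

print(crowding_distance([[0.0, 1.0], [0.2, 0.7], [0.6, 0.3], [1.0, 0.0]]))
```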
Double-module immune algorithm [103] embeds two evolutionary modules to
simultaneously improve the convergence speed and population diversity. The first
module optimizes each objective independently by using a subpopulation composed
of the competitive individuals in this objective. DE crossover is performed to enhance
the corresponding objective. The second one applies immune algorithm, where pro-
portional cloning, recombination, and hypermutation operators are operated to con-
currently strengthen the multiple objectives. The method outperforms NSGA-II,
SPEA2, MOEA/D, and NNIA.
Vector AIS [52] is a multiobjective optimization algorithm based on opt-aiNet.
Its performance is similar to or better than those produced by NSGA-II.

23.6 Multiobjective PSO

Various multiobjective PSO algorithms have been developed for MOPs [14,26,28,
67,68,100,101,129,133,134,144]. These designs generally use a fixed population
size throughout the process of searching for possible nondominated solutions until
the Pareto optimal set is obtained.
Multiobjective PSO (MOPSO) [28] incorporates Pareto dominance and an adap-
tive mutation operator. It uses an archive of particles for guiding the flight of other
particles. The algorithm is relatively easy to implement. It is able to cover the full
Pareto front of all the functions used.
Multiswarm multiobjective PSO [26] divides the decision space into multiple
subswarms via clustering to improve the diversity of solutions on the Pareto optimal
front. PSO is executed in each subswarm. Every particle will deposit its flight expe-
riences after each flight cycle. At some points during the search, different subswarms
exchange information so that each subswarm chooses a different leader to preserve
diversity. The number of particles in each swarm is predetermined. AMOPSO [129]
is similar to multiswarm multiobjective PSO [26]. AMOPSO uses the concept of
Pareto dominance to determine the flight direction of a particle and to maintain pre-
viously found nondominated vectors in a global repository that is later used by other
particles to guide their own flight. A mutation operator is also used to act both on
the particles of the swarm and on the range of each design variable of the problem.
AMOPSO outperforms NSGA-II, multiswarm multiobjective PSO [26] and PAES.
Built on AMOPSO [129], dynamic population multiple-swarm multiobjective
PSO (DMOPSO) [107] is integrated with a dynamic population strategy. The
dynamic population strategy only applies to the swarm population, while the mul-
tiple swarms are grouped via a clustering technique. DMOPSO also incorporates a
cell-based rank density estimation scheme to quickly update the location of the new
particles in the objective space and to provide easy access to the rank and density
information of the particles, and adaptive local archives to improve the selection
of group leaders to produce a better distributed Pareto front associated with each
swarm. An appropriate number of swarms is needed to be prespecified.
In dynamic multiple swarms in multiobjective PSO (DSMOPSO) [167], the num-
ber of swarms is adaptively adjusted throughout the search process by a swarm-
growing strategy and a swarm-declining strategy. Cell-based rank density estimation
scheme is used to keep track of the rank and density values of the particles. Objective
space compression and expansion strategy are used to adjust the size of the objective
space whenever needed to progressively search for high-precision true Pareto front.
PSO updating equation is modified to exploit its usefulness and to accommodate the
multiple-swarm concept. Swarm lbest archive is updated based on the progression
of the swarm leaders.
Vector-evaluated PSO [127] modifies the VEGA idea to fit in the PSO framework.
It uses two or more swarms to search the space. When updating the velocity and
position, a particle learns from its personal best experience and the best experience
of its neighbor swarm.
Based on Pareto dominance, OMOPSO [144] divides the population into three
subswarms of equal size, each adapting to a different mutation (or turbulence) oper-
ator. It uses a crowding factor to filter out the list of available leaders. The ε-dominance
concept is also incorporated to fix the size of the set of final solutions.
In [133], the proposed multiobjective PSO algorithm mimics the social behavior
of a real swarm: The individuals of a swarm update their flying direction through
communication with their neighboring leaders with an aim to collectively attain a
common goal. The algorithm employs a multilevel sieve to generate a set of leaders,
a probabilistic crowding radius-based strategy for leader selection and a simple gen-
erational operator for information transfer. It is effective for problems with multiple
suboptimal Pareto fronts.
Selecting a proper personal guide has a significant impact on the performance of
multiobjective PSO algorithms [14]. In [14], a notion of allowing each particle to
memorize all nondominated personal best particles it has encountered is proposed
and several strategies are investigated for selecting a pbest particle from the personal
archive.
In [67], PSO is modified by using a dynamic neighborhood strategy, particle
memory updating, and one-dimension optimization to deal with multiple objectives.
In a dynamic neighborhood, m closest particles in performance space are selected to
be its new neighborhood in each generation, and the nbest is selected among them.
In modified DNPSO [68], an extended memory is introduced to store global Pareto
optimal solutions. This can significantly decrease the computation time.
Nondominated sorting PSO [100] uses nondominated sorting concept and two
parameter-free niching methods for multiobjective optimization. It extends PSO by
making a better use of particles’ pbests and offspring for more effective nondomi-
nation comparisons. Instead of a single comparison between a particle’s pbest and
its offspring, nondominated sorting PSO compares all particles’ pbests and their off-
spring in the entire population. This proves to be effective in providing an appropriate
selection pressure to push the swarm population toward the Pareto optimal front.
For multiobjective optimization, maximinPSO [101] uses a fitness function
derived from maximin strategy to determine Pareto domination. By using the max-
imin fitness function, no additional clustering or niching technique is required, since
the maximin fitness of a solution gives not only whether a solution is dominated, but
also whether it is clustered with other solutions. On the ZDT test function series,
maximinPSO outperforms NSGA-II in all the performance measures used.
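A sketch of the maximin fitness, as commonly defined for maximinPSO-style algorithms, is given below under a minimization convention; this is our reading of the strategy, not code from [101]. A fitness value below zero indicates a nondominated solution, and larger values indicate domination and crowding by other solutions.

```python
import numpy as np

def maximin_fitness(F):
    """Maximin fitness for each solution (minimization): for solution i,
    max over j != i of min over objectives of (f_m(x_i) - f_m(x_j))."""
    F = np.asarray(F, float)
    diff = F[:, None, :] - F[None, :, :]          # diff[i, j, m]
    inner = diff.min(axis=2)                      # min over the objectives
    np.fill_diagonal(inner, -np.inf)              # exclude j == i
    return inner.max(axis=1)

# The last point is dominated, so its maximin fitness is positive.
print(maximin_fitness([[0.0, 1.0], [0.5, 0.5], [0.4, 0.6], [0.8, 0.9]]))
```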
Elitist-mutation multiobjective PSO [134] incorporates Pareto dominance crite-
ria for nondomination selection and an efficient elitist-mutation strategy into PSO.
To create effective selection pressure among the nondominated solutions, it uses a
variable size archive for elitism and crowding distance comparison operator. The
elitist-mutation mechanism effectively explores the feasible search space and speeds
up the search for the true Pareto optimal region.
FMOPSO [104] is a multiobjective MA within the context of PSO. It combines
the global search ability of PSO with synchronous local search for directed local
fine-tuning. The particle updating strategy is based on the concept of fuzzy gbest
for diversity maintenance within the swarm. A synchronous particle local search
performs directed local fine-tuning, which helps to discover a well-distributed Pareto
front.
Fuzzy clustering-based PSO [3] solves highly constrained MOPs involving con-
flicting objectives and constraints. It uses an archive to preserve nondominated par-
ticles found along the search process. Fuzzy clustering technique manages the size
of the archive within limits without destroying the characteristics of the Pareto front.
Niching mechanism is incorporated to direct the particles toward lesser explored
regions of the Pareto front. A self-adaptive mutation operator is used, and the algo-
rithm incorporates a fuzzy-based feedback mechanism.
Strength Pareto PSO [45] for multiobjective optimization is based on the strength
Pareto approach. It shows a slower convergence, but requires less CPU time,
compared to SPEA2 and a competitive multiobjective PSO using several metrics.
Combining strength Pareto PSO with EAs leads to superior hybrid algorithms that
outperform SPEA2, a competitive multiobjective PSO, and strength Pareto PSO in
terms of different metrics.

23.7 Multiobjective EDAs

Many multiobjective EDAs have been developed for continuous-valued multiob-
jective optimization. Most of them use real representation. Examples using binary
representation are BMOA [92], mohBOA [128], multiobjective extended compact
GA (meCGA) [138], and multiobjective parameterless GA (moPGA) [147].
Voronoi-based EDA [125] adjusts its reproduction process to the problem structure
for solving multiobjective optimization. A Voronoi diagram is used to construct
stochastic models, based on which new offspring will be generated. Voronoi-based
EDA outperforms NSGA-II when a limited number of fitness evaluations is allowed.
In MIDEA [13], the probabilistic model is a mixture distribution, and each compo-
nent in the mixture is a univariate factorization. Mixture distributions allow for wide
spread exploration of the Pareto front in multiobjective optimization, and diversity
is preserved by using a diversity-preserving selection. The number of Gaussians is
determined by adaptive clustering.
Restricted Boltzmann machine-based multiobjective EDA (REDA) [153] uses
restricted Boltzmann machine for probabilistic model building and applies other
canonical operators for multiobjective optimization. REDA uses only global infor-
mation in guiding the search. It may be trapped at local optima. To overcome this
limitation, REDA is hybridized with PSO [57]. In [143], an EDA based on restricted
Boltzmann machines is hybridized with a PSO algorithm in a discrete domain for
handling multiobjective optimization problems in a noisy environment. In [142], the
behaviors of the sampling techniques in terms of energy levels are investigated for
REDA, and a sampling mechanism that exploits the energy information of the solu-
tions in a trained network is proposed to improve the search capability. REDA is then
hybridized with GA and local search based on an evolutionary gradient approach.
Bayesian multiobjective optimization algorithm (BMOA) [92] combines BOA
with the selection and replacement mechanisms of SPEA2 to approximate the set
of Pareto optimal solutions. BOA with binary decision trees is used to capture the
mutual dependencies between the decision variables. The probability information is
constructed in the built binary decision tree. In [81], the selection strategy in NSGA-
II is combined with BOA for multiobjective and hierarchically difficult problems. In
[92], SPEA2 is combined with BOA for solving multiobjective knapsack problem.
Multiobjective hierarchical BOA (mohBOA) [128] utilizes BOA as its probabilis-
tic modeling approach. It is a scalable algorithm for solving multiobjective decom-
posable problems by combining hierarchical BOA with NSGA-II and clustering in
the objective space. Nondominance sorting is employed by replacing genetic opera-
tors with a Bayesian-based modeling approach and a sampling operator. Clustering
divides the objective space into different regions for modeling purposes.
Multiobjective extended compact GA (meCGA) [138] is studied on a class of
bounding adversarial problems with scalable decision variables. The moPGA algo-
rithm [147] incorporates extended compact GA as its modeling approach, competent
mutation as its enhanced searching operator, clustering as a diversity enhancement
approach, and an archive to preserve the promising solutions found.
In multiobjective Parzen-based estimation of distribution algorithm (MOPED)
[32], Parzen estimator is used to estimate the population density of the promising
solutions. A spreading technique that utilizes Parzen estimator in the objective space
is used to improve the population diversity. MOPED takes few fitness evaluations to
reach a satisfying performance.
In a decision tree-based multiobjective EDA (DT-MEDA) for continuous-valued
optimization [175], the conditional dependencies among the decision variables are
extracted by a decision tree-based probabilistic model. Offspring solutions are then
produced by sampling the tree from the root node to the leaf nodes. In [112], an algo-
rithm is designed that uses growing neural gas network as its probabilistic modeling
technique.
Regularity model-based multiobjective EDA (RM-MEDA) [173] models a promis-
ing area in the search space by a probability model whose centroid is a piecewise
continuous manifold. Local PCA is used for building such a model by extracting
the regularity patterns of the candidate solutions from previous searches. The pop-
ulation is partitioned into disjoint clusters whose centroids and variances are then
estimated. New trial solutions are sampled from the model thus built. A nondomi-
nated sorting-based selection is used for choosing solutions for the next generation.
Overall, RM-MEDA outperforms GDE3 [88], PCX-NSGA-II [41], and MIDEA
[13] on problems with complex Pareto space. It has promising scalability in terms of
decision variables. In [172], biased crossover and biased initialization are added to
RM-MEDA to enhance its global search ability for problems with many local Pareto
fronts. In [176], RM-MEDA is improved to approximate the set of Pareto optimal
solutions in both the decision and objective spaces.

23.8 Tabu/Scatter Search Based Multiobjective Optimization

Archive-based hybrid scatter search (AbYSS) [123] follows scatter search structure
but uses mutation and crossover operators from EAs for solving MOPs. AbYSS
incorporates Pareto dominance, density estimation, and an archive. An archive is
used to store the nondominated solutions found during the search, following the
scheme applied by PAES, but using the crowding distance of NSGA-II as a niching
measure instead of the adaptive grid. Selection of solutions from the initial set used
to build the reference set applies the SPEA2 density estimation. AbYSS outperforms
NSGA-II and SPEA2 in terms of diversity of solutions, and it obtains very competitive
results with respect to convergence to the true Pareto front and the hypervolume
metric.
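As a point of reference for the crowding-distance niching used here and in several later methods, the following is a minimal MATLAB sketch (an assumed helper, not the book's accompanying code); F is an N-by-M matrix of archived objective values under minimization.

% Minimal sketch of the NSGA-II crowding distance; boundary solutions receive
% an infinite distance so that they are always preserved.
function cd = crowding_distance(F)
    [N, M] = size(F);
    cd = zeros(N, 1);
    for m = 1:M
        [vals, idx] = sort(F(:, m));        % sort the archive along objective m
        cd(idx(1)) = Inf;
        cd(idx(N)) = Inf;
        span = vals(N) - vals(1);
        if span > 0 && N > 2
            cd(idx(2:N-1)) = cd(idx(2:N-1)) + (vals(3:N) - vals(1:N-2)) / span;
        end
    end
end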
MOSS [11] is a hybrid tabu/scatter search method for MOPs. It uses a weighted
sum approach. Multistart tabu search is used as the diversification method for gener-
ating a diverse approximation to the Pareto optimal set of solutions. It is also applied
to rebuild the reference set after each iteration of scatter search. Each tabu search
works with its own starting point, recency memory, and aspiration threshold. Fre-
quency memory is used to diversify the search and it is shared between the tabu
search algorithms. SSPMO [117] is also a hybrid scatter/tabu search algorithm for
continuous MOPs. Part of the reference set is obtained by selecting the best solu-
tions from the initial set for each objective function. The rest of the reference set
is obtained using the usual approach of selecting the remaining solutions from the
initial set which maximize the distance to the solutions already in the reference set.
SSMO [122] is a scatter search-based algorithm for solving MOPs. It incorpo-
rates Pareto dominance, crowding, and Pareto ranking. It is characterized by using
a nondominated sorting procedure to build the reference set from the initial set
where all the nondominated solutions found in the scatter search loop are stored,
and a mutation-based local search is used to improve the solutions obtained from the
reference set.
M-scatter search [156] extends scatter search to multiobjective optimization by
using nondominated sorting and niched-type penalty method of NSGA. It uses an
archive to store nondominated solutions found during the computation. NSGA nich-
ing method is applied for updating the archive so as to keep nondominated solutions
uniformly distributed along the Pareto front.

23.9 Other Methods

Multiobjective SA [120] uses the dominance concept and an annealing scheme for efficient
search. In [120], the relative dominance of the current and proposed solutions is
used in the state change probabilities, and the proposal is accepted
when the proposed solution dominates the current solution. In [146], multiobjective
optimization is mapped to single-objective optimization by using the true tradeoff
surface, and is then solved by single-objective SA. Exploration of the full tradeoff
surface is encouraged. The method uses the relative dominance of a solution as the
system energy for optimization. It promotes rapid convergence to the true Pareto front
with a good coverage of solutions across it, comparing favorably with both NSGA-II
and multiobjective SA [120]. SA-based multiobjective optimization [9] incorporates
an archive to provide a set of tradeoff solutions. To determine the acceptance prob-
ability of a new solution against the current solution, an elaborate procedure takes
into account the domination status of the new solution with the current solution, as
well as those in the archive.
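The dominance tests underlying these acceptance rules reduce to a componentwise comparison. A minimal MATLAB sketch (assuming minimization of all objectives; f1 and f2 are objective vectors of equal length) is:

% Minimal sketch of the Pareto-dominance test (minimization): f1 dominates f2
% if it is no worse in every objective and strictly better in at least one.
function d = dominates(f1, f2)
    d = all(f1 <= f2) && any(f1 < f2);
end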
Multiobjective ACO algorithms are proposed in [53]. In [118], different coarse-
grained distribution schemes for multiobjective ACO algorithms are built on inde-
pendent multi-colony structures. An island-based model is introduced where the
colonies communicate by migrating ants, following a neighborhood topology that
fits the search space. The methods aim to cover the whole Pareto front; thus
each subcolony searches for solutions in a limited area.
Dynamic multi-colony multiobjective ABC [163] uses the multi-deme model and
a dynamic information exchange strategy. Colonies search independently most of
the time and share information occasionally. Each colony contains S bees, com-
prising equal numbers of employed bees and onlooker bees. For each food source,
the employed or onlooker bee explores a temporary position generated by using
neighboring information, and the better one, determined by a greedy selection strategy,
is kept for the next iteration. The external archive is employed to store nondominated
solutions found during the search process, and the diversity over the archived indi-
viduals is maintained by using crowding distance strategy. If a randomly generated
number is smaller than the migration rate, then an elite, identified as the intermediate
individual with the maximum crowding distance value, is used to replace the worst
food source in a randomly selected colony.
In the elite-guided multiobjective ABC algorithm [70], fast nondominated sorting and
a population selection strategy are applied to measure the quality of the solutions and
select the better ones. The neighborhoods of the existing solutions are exploited to
generate new solutions under the guidance of the elite. A fitness calculation method
is used to calculate the selection probability for onlookers.
The bacterial chemotaxis algorithm for multiobjective optimization [61] uses a fast
nondominated sorting procedure, communication between the colony members, and
a simple chemotactic strategy to change the bacterial positions in order to explore
the search space to find several optimal solutions. Multiobjective bacterial colony
chemotaxis algorithm [109] adds improved adaptive grid, oriented mutation based
on grid, and adaptive external archive to bacterial colony chemotaxis algorithm to
improve the convergence and the diversity of nondominated solutions.
A general framework for combining MOEAs with interactive preference infor-
mation and ordinal regression is presented in [15]. The interactive MOEA attempts
to learn a value function capturing the users’ true preferences. At regular intervals,
the user is asked to rank a single pair of solutions. This information is used to update
the algorithm’s internal value function model, and the model is used in subsequent
generations to rank solutions that are incomparable according to dominance.
HP-CRO [102] is a hybrid of PSO and CRO for multiobjective optimization. It
creates new molecules (particles) used by CRO operations as well as by mechanisms
of PSO. HP-CRO outperforms FMOPSO, MOPSO, NSGA-II and SPEA2.
Examples of other methods for multiobjective optimization are multiobjective
backtracking search algorithm [116], multiobjective cultural algorithm along with
evolutionary programming [24], multiobjective ABC by combining modified near-
est neighbor approach and improved inver-over operation [96], hybrid multiobjective
optimization based on shuffled frog leaping and bacteria optimization [131], mul-
tiobjective cuckoo search [65], self-adaptive multiobjective harmony search [34],
multiobjective teaching–learning-based optimization [132], multiobjective fish
school search [10], multiobjective invasive weed optimization [90], multiobjective
BBO [33,115], multiobjective bat algorithm [166], multiobjective brainstorming
optimization [164], multiobjective water cycle algorithm (MOWCA) [137], Gaussian
bare-bones multiobjective imperialist competitive algorithm [54], multiobjective dif-
ferential search algorithm [89], and multiobjective membrane algorithms [69].
23.10 Coevolutionary MOEAs

The coevolutionary paradigm has been integrated into multiobjective optimization in
the form of cooperative coevolution [73,152] or competitive coevolution [24,105].
Multiobjective coevolutionary algorithms are particularly suitable for dynamic mul-
tiobjective optimization. A fast convergence can be achieved by coevolution while
maintaining a good diversity of solutions.
In [93], a predator–prey model is applied in a multiobjective ES. The model is
similar to cellular GA, because solutions (prey) are placed on the vertices of an
undirected connected graph, thus defining neighborhoods, where they are caught by
predators.
Multiobjective cooperative coevolutionary algorithm (MOCCGA) [80] integrates
the cooperative coevolutionary effect and the search mechanisms utilized in multiob-
jective GA [50]. Nondominated sorting cooperative coevolutionary algorithm [73]
extends NSGA-II.
Cooperative coevolutionary algorithm (CCEA) for multiobjective optimization
[152] applies divide and conquer approach to decompose decision vectors into
smaller components and evolves multiple solutions in the form of cooperative sub-
populations. For m-parameter problems, CCEA assigns m subpopulations, each of which
optimizes only a single parameter. Incorporating various features such as archiving,
dynamic fitness sharing, and an extending operator, CCEA is capable of maintaining
archive diversity in the evolution and distributing the solutions uniformly along the
Pareto front. Exploiting the inherent parallelism of cooperative coevolution, CCEA
can be formulated into a distributed CCEA suitable for concurrent processing that
allows intercommunication of subpopulations residing in networked computers.
Competitive–cooperation coevolutionary paradigm [56] exploits the complemen-
tary diversity-preserving mechanism of both competitive and cooperative models.
It hybridizes competitive and cooperative mechanisms to track the Pareto front in
a dynamic environment. The decomposition process of the optimization problem
is allowed to adapt. Each species subpopulation competes to represent a particu-
lar subcomponent of the MOP, and the final winners cooperate to evolve for better
solutions. A dynamic coevolutionary algorithm that incorporates the features of sto-
chastic competitors and temporal memory is capable of tracking the Pareto front over
different environmental changes.
Multiple populations for multiple objectives (MPMO) [169] is a coevolutionary
technique for solving MOPs by letting each population correspond to only one
objective. The individuals’ fitness in each population can be assigned by the corre-
sponding objective. Coevolutionary multiswarm PSO adopts PSO for each popula-
tion, a shared archive for different populations to exchange search information, and
two designs to enhance the performance. One design is to modify the velocity update
equation to use the search information found by different populations to approxi-
mate the whole Pareto front fast. The other is to use an elitist learning strategy for
the archive update to bring in diversity to avoid local Pareto fronts.
Problems

23.1 Apply gamultiobj solver to solve the ZDT1 problem in the Appendix as an
instance of unconstrained multiobjective optimization.
23.2 Apply gamultiobj solver to solve the Srinivas problem in the Appendix as
an instance of constrained multiobjective optimization.
23.3 Run the accompanying MATLAB code of MOEA/D to find the Pareto front
of Fonseca function in the Appendix. Investigate how to improve the result by
adjusting the parameters.
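A minimal MATLAB sketch for Problem 23.1 is given below; it assumes the Global Optimization Toolbox and the standard ZDT1 definition with 30 decision variables in [0, 1], and the parameter settings are illustrative only.

% Minimal sketch: approximate the ZDT1 Pareto front with gamultiobj.
function zdt1_demo
    nvars = 30;
    lb = zeros(1, nvars);
    ub = ones(1, nvars);
    opts = optimoptions('gamultiobj', 'PopulationSize', 100, ...
        'MaxGenerations', 250, 'PlotFcn', @gaplotpareto);
    [x, f] = gamultiobj(@zdt1, nvars, [], [], [], [], lb, ub, opts);
end

function f = zdt1(x)
    % Standard ZDT1: f1 = x1, g = 1 + 9*sum(x(2:n))/(n-1), f2 = g*(1 - sqrt(f1/g)).
    n = numel(x);
    f1 = x(1);
    g = 1 + 9 * sum(x(2:n)) / (n - 1);
    f = [f1, g * (1 - sqrt(f1 / g))];
end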

References
1. Abbass HA, Sarker R, Newton C. PDE: a Pareto-frontier differential evolution approach for
multi-objective optimization problems. In: Proceedings of IEEE congress on evolutionary
computation (CEC), Seoul, South Korea, May 2001. p. 971–978.
2. Abbass HA. The self-adaptive pareto differential evolution algorithm. In: Proceedings of IEEE
congress on evolutionary computation (CEC), Honolulu, HI, USA, May 2002. p. 831–836.
3. Agrawal S, Panigrahi BK, Tiwari MK. Multiobjective particle swarm algorithm with fuzzy
clustering for electrical power dispatch. IEEE Trans Evol Comput. 2008;12(5):529–41.
4. Asafuddoula M, Ray T, Sarker R. A decomposition-based evolutionary algorithm for many
objective optimization. IEEE Trans Evol Comput. 2015;19(3):445–60.
5. Auger A, Bader J, Brockhoff D, Zitzler E. Theory of the hypervolume indicator: optimal μ-
distributions and the choice of the reference point. In: Proceedings of the 10th ACM SIGEVO
workshop on foundations of genetic algorithms (FOGA), Orlando, FL, USA, Jan 2009. p.
87–102.
6. Babbar M, Lakshmikantha A, Goldberg DE. A modified NSGA-II to solve noisy multi-
objective problems. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Chicago, IL, USA, July 2003. p. 21–27.
7. Bader J, Zitzler E. HypE: an algorithm for fast hypervolume-based many-objective optimiza-
tion. Evol Comput. 2011;19(1):45–76.
8. Bandyopadhyay S, Mukherjee A. An algorithm for many-objective optimization with
reduced objective computations: a study in differential evolution. IEEE Trans Evol Com-
put. 2015;19(3):400–13.
9. Bandyopadhyay S, Saha S, Maulik U, Deb K. A simulated annealing-based multiobjective
optimization algorithm: AMOSA. IEEE Trans Evol Comput. 2008;12(3):269–83.
10. Bastos-Filho CJA, Guimaraes ACS. Multi-objective fish school search. Int J Swarm Intell
Res. 2015;6(1):18p.
11. Beausoleil RP. Moss: multiobjective scatter search applied to nonlinear multiple criteria opti-
mization. Eur J Oper Res. 2006;169(2):426–49.
12. Bosman PAN, Thierens D. The balance between proximity and diversity in multiobjective
evolutionary algorithms. IEEE Trans Evol Comput. 2003;7(2):174–88.
13. Bosman PAN, Thierens D. The naive MIDEA: a baseline multi-objective EA. In: Proceed-
ings of the 3rd international conference on evolutionary multi-criterion optimization (EMO),
Guanajuato, Mexico, March 2005. p. 428–442.
14. Branke J, Mostaghim S. About selecting the personal best in multiobjective particle swarm
optimization. In: Proceedings of conference on parallel problem solving from nature (PPSN
IX), Reykjavik, Iceland, Sept 2006. Berlin: Springer; 2006. p. 523–532.
15. Branke J, Greco S, Slowinski R, Zielniewicz P. Learning value functions in interactive evo-
lutionary multiobjective optimization. IEEE Trans Evol Comput. 2015;19(1):88–102.
16. Brockhoff D, Zitzler E. Objective reduction in evolutionary multiobjective optimization: the-
ory and applications. Evol Comput. 2009;17(2):135–66.
17. Buchta C. On the average number of maxima in a set of vectors. Inf Process Lett.
1989;33(2):63–5.
18. Bui LT, Liu J, Bender A, Barlow M, Wesolkowski S, Abbass HA. DMEA: a direction-based
multiobjective evolutionary algorithm. Memetic Comput. 2011;3:271–85.
19. Cai L, Qu S, Yuan Y, Yao X. A clustering-ranking method for many-objective optimization.
Appl Soft Comput. 2015;35:681–94.
20. Camara M, de Toro F, Ortega J. An analysis of multiobjective evolutionary algorithms for
optimization problems with time constraints. Appl Artif Intell. 2013;27:851–79.
21. Camara M, Ortega J, de Toro F. A single front genetic algorithm for parallel multi-objective
optimization in dynamic environments. Neurocomputing. 2009;72:3570–9.
22. Chen Q, Guan S-U. Incremental multiple objective genetic algorithms. IEEE Trans Syst Man
Cybern Part B. 2004;34(3):1325–34.
23. Clymont KM, Keedwell E. Deductive sort and climbing sort: new methods for non-dominated
sorting. Evol Comput. 2012;20(1):1–26.
24. Coello CAC, Becerra RL. Evolutionary multiobjective optimization using a cultural algorithm.
In: Proceedings of IEEE swarm intelligence symposium, Indianapolis, IN, USA, April 2003.
p. 6–13.
25. Coello CAC, Cortes NC. Solving multiobjective optimization problems using an artificial
immune system. Genet Program Evolvable Mach. 2005;6:163–90.
26. Coello CAC, Lechuga MS. MOPSO: a proposal for multiple objective particle swarm opti-
mization. In: Proceedings of IEEE congress on evolutionary computation (CEC), Honolulu,
HI, USA, May 2002. p. 1051–1056.
27. Coello CAC, Pulido GT. A micro-genetic algorithm for multiobjective optimization. In: Pro-
ceedings of the 1st international conference on evolutionary multi-criterion optimization
(EMO), Zurich, Switzerland, March 2001. p. 126–140.
28. Coello CAC, Pulido GT, Lechuga MS. Handling multiple objectives with particle swarm
optimization. IEEE Trans Evol Comput. 2004;8(3):256–79.
29. Corne DW, Jerram NR, Knowles JD, Oates MJ. PESA-II: region-based selection in evolu-
tionary multiobjective optimization. In: Proceedings of genetic and evolutionary computation
conference (GECCO), San Francisco, CA, USA, July 2001. p. 283–290.
30. Corne DW, Knowles JD. Techniques for highly multiobjective optimization: some nondomi-
nated points are better than others. In: Proceedings of the 9th ACM genetic and evolutionary
computation conference (GECCO), London, UK, July 2007. p. 773–780.
31. Corne DW, Knowles JD, Oates MJ. The pareto envelope-based selection algorithm for multi-
objective optimisation. In: Proceedings of the 6th international conference on parallel problem
solving from nature (PPSN VI), Paris, France, Sept 2000. Berlin: Springer; 2000. p. 839–848.
32. Costa M, Minisci E. MOPED: a multi-objective Parzen-based estimation of distribution algo-
rithm for continuous problems. In: Proceedings of the 2nd international conference on evo-
lutionary multi-criterion optimization (EMO), Faro, Portugal, April 2003. p. 282–294.
33. Costa e Silva MA, Coelho LDS, Lebensztajn L. Multiobjective biogeography-based optimiza-
tion based on predator-prey approach. IEEE Trans Magn. 2012;48(2):951–954.
34. Dai X, Yuan X, Zhang Z. A self-adaptive multi-objective harmony search algorithm based on
harmony memory variance. Appl Soft Comput. 2015;35:541–57.
35. Deb K. Multi-objective genetic algorithms: problem difficulties and construction of test prob-
lems. Evol Comput. 1999;7(3):205–30.
36. Deb K. Multi-objective optimization using evolutionary algorithms. Chichester: Wiley; 2001.
37. Deb K, Agrawal S, Pratap A, Meyarivan T. A fast elitist non-dominated sorting genetic
algorithm for multi-objective optimization: NSGA-II. In: Proceedings of the 6th international
conference on parallel problem solving from nature (PPSN VI), Paris, France, Sept 2000.
Berlin: Springer; 2000. p. 849–858.
38. Deb K, Jain H. An evolutionary many-objective optimization algorithm using reference-point
based non-dominated sorting approach, part i: solving problems with box constraints. IEEE
Trans Evol Comput. 2013;18(4):577–601.
39. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multi-objective genetic algorithm:
NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
40. Deb K, Saxena DK. On finding Pareto-optimal solutions through dimensionality reduc-
tion for certain large-dimensional multi-objective optimization problems. KanGAL Report,
No.2005011. 2005.
41. Deb K, Sinha A, Kukkonen S. Multi-objective test problems, linkages, and evolutionary
methodologies. In: Proceedings of genetic and evolutionary computation conference (GECCO),
Seattle, WA, USA, July 2006. p. 1141–1148.
42. Deb K, Sundar J. Reference point based multiobjective optimization using evolutionary algo-
rithms. In: Proceedings of the 8th genetic and evolutionary computation conference (GECCO),
Seattle, WA, USA, July 2006. p. 635–642.
43. Depolli M, Trobec R, Filipic B. Asynchronous master-slave parallelization of differential
evolution for multi-objective optimization. Evol Comput. 2013;21(2):261–91.
44. di Pierro F, Khu S-T, Savic DA. An investigation on preference order ranking scheme for
multiobjective evolutionary optimization. IEEE Trans Evol Comput. 2007;11(1):17–45.
45. Elhossini A, Areibi S, Dony R. Strength Pareto particle swarm optimization and hybrid EA-
PSO for multi-objective optimization. Evol Comput. 2010;18(1):127–56.
46. Erickson M, Mayer A, Horn J. The niched pareto genetic algorithm 2 applied to the design
of groundwater remediation systems. In: Proceedings of the 1st international conference on
evolutionary multi-criterion optimization (EMO), Zurich, Switzerland, March 2001. p. 681–
695.
47. Fang H, Wang Q, Tu Y-C, Horstemeyer MF. An efficient non-dominated sorting method for
evolutionary algorithms. Evol Comput. 2008;16(3):355–84.
48. Farina M, Amato P. On the optimal solution definition for many-criteria optimization prob-
lems. In: Proceedings of the annual meeting of the North American fuzzy information process-
ing society (NAFIPS), New Orleans, LA, USA, June 2002. p. 233–238.
49. Fleming PJ, Purshouse RC, Lygoe RJ. Many-objective optimization: an engineering design
perspective. In: Proceedings of international conference on evolutionary multi-criterion opti-
mization (EMO), Guanajuato, Mexico, March 2005. p. 14–32.
50. Fonseca CM, Fleming PJ. Genetic algorithms for multiobjective optimization: formulation,
discussion and generalization. In: Forrest S, editor. Proceedings of the 5th international con-
ference on genetic algorithms, July 1993. San Francisco, CA: Morgan Kaufmann; 1993. p.
416–423.
51. Fonseca CM, Fleming PJ. Multiobjective optimization and multiple constraint handling with
evolutionary algorithms—Part i: a unified formulation; Part ii: application example. IEEE
Trans Syst Man Cybern Part A. 1998;28(1):26–37, 38–47.
52. Freschi F, Repetto M. Multiobjective optimization by a modified artificial immune system
algorithm. In: Proceedings of the 4th international conference on artificial immune systems
(ICARIS), Banff, Alberta, Canada, Aug 2005. pp. 248–261.
53. Garcia-Martinez C, Cordon O, Herrera F. A taxonomy and an empirical analysis of mul-
tiple objective ant colony optimization algorithms for the bi-criteria TSP. Eur J Oper Res.
2007;180(1):116–48.
54. Ghasemi M, Ghavidel S, Ghanbarian MM, Gitizadeh M. Multi-objective optimal electric
power planning in the power system using Gaussian bare-bones imperialist competitive algo-
rithm. Inf Sci. 2015;294:286–304.
55. Giagkiozis I, Purshouse RC, Fleming PJ. Generalized decomposition and cross entropy meth-
ods for many-objective optimization. Inf Sci. 2014;282:363–87.
56. Goh C-K, Tan KC. A competitive-cooperative coevolutionary paradigm for dynamic multi-
objective optimization. IEEE Trans Evol Comput. 2009;13(1):103–27.
57. Goh CK, Tan KC, Liu DS, Chiam SC. A competitive and cooperative coevolutionary
approach to multi-objective particle swarm optimization algorithm design. Eur J Oper Res.
2010;202(1):42–54.
58. Goldberg DE. Genetic algorithms in search, optimization, and machine learning. Reading,
MA, USA: Addison-Wesley; 1989.
59. Gong M, Jiao L, Du H, Bo L. Multiobjective immune algorithm with nondominated neighbor-
based selection. Evol Comput. 2008;16(2):225–55.
60. Guevara-Souza M, Vallejo EE. Using a simulated Wolbachia infection mechanism to improve
multi-objective evolutionary algorithms. Nat Comput. 2015;14:157–67.
61. Guzman MA, Delgado A, De Carvalho J. A novel multi-objective optimization algorithm
based on bacterial chemotaxis. Eng Appl Artif Intell. 2010;23:292–301.
62. Hadka D, Reed P. Diagnostic assessment of search controls and failure modes in many-
objective evolutionary optimization. Evol Comput. 2012;20(3):423–52.
63. Hadka D, Reed P. Borg: an auto-adaptive many-objective evolutionary computing framework.
Evol Comput. 2013;21:231–59.
64. Hansen MP, Jaszkiewicz A. Evaluating the quality of approximations to the non-dominated
set. Technical Report IMM-REP-1998-7, Institute of Mathematical Modeling, Technical Uni-
versity of Denmark, Denmark; 1998.
65. He X-S, Li N, Yang X-S. Non-dominated sorting cuckoo search for multiobjective optimiza-
tion. In: Proceedings of IEEE symposium on swarm intelligence (SIS), Orlando, FL, USA,
Dec 2014. p. 1–7.
66. He Z, Yen GG. Many-objective evolutionary algorithm: objective space reduction and diversity
improvement. IEEE Trans Evol Comput. 2016;20(1):145–60.
67. Hu X, Eberhart RC. Multiobjective optimization using dynamic neighborhood particle swarm
optimization. In: Proceedings of congress on evolutionary computation (CEC), Honolulu, HI,
USA, May 2002. p. 1677–1681.
68. Hu X, Eberhart RC, Shi Y. Particle swarm with extended memory for multiobjective opti-
mization. In: Proceedings of IEEE swarm intelligence symposium, Indianapolis, IN, USA,
April 2003. p. 193–197.
69. Huang L, He XX, Wang N, Xie Y. P systems based multi-objective optimization algorithm.
Prog Nat Sci. 2007;17:458–65.
70. Huo Y, Zhuang Y, Gu J, Ni S. Elite-guided multi-objective artificial bee colony algorithm.
Appl Soft Comput. 2015;32:199–210.
71. Horn J, Nafpliotis N, Goldberg DE. A niched pareto genetic algorithm for multiobjective opti-
mization. In: Proceedings of the 1st IEEE conference on evolutionary computation, Orlando,
FL, USA, June 1994. p. 82–87.
72. Ikeda K, Kita H, Kobayashi S. Failure of Pareto-based MOEAs: does non-dominated really
mean near to optimal? In: Proceedings of congress on evolutionary computation (CEC), Seoul,
Korea, May 2001. p. 957–962.
73. Iorio AW, Li X. A cooperative coevolutionary multiobjective algorithm using non-dominated
sorting. In: Proceedings of genetic and evolutionary computation conference (GECCO), Seat-
tle, WA, USA, June 2004. p. 537–548.
74. Ishibuchi H, Murata T. Multi-objective genetic local search algorithm and its application to
flowshop scheduling. IEEE Trans Syst Man Cybern Part C. 1998;28(3):392–403.
75. Jaimes AL, Coello CAC, Barrientos JEU. Online objective reduction to deal with many-
objective problems. In: Proceedings of the 5th international conference on evolutionary multi-
criterion optimization (EMO), Nantes, France, April 2009. p. 423–437.
76. Jain H, Deb K. An evolutionary many-objective optimization algorithm using reference-point
based non-dominated sorting approach, part ii: handling constraints and extending to an
adaptive approach. IEEE Trans Evol Comput. 2013;18(4):602–22.
77. Jensen MT. Reducing the run-time complexity of multiobjective eas: the NSGA-II and other
algorithms. IEEE Trans Evol Comput. 2003;7(5):503–15.
78. Jiao L, Gong M, Shang R, Du H, Lu B. Clonal selection with immune dominance and energy
based multiobjective optimization. In: Proceedings of the 3rd international conference on
evolutionary multi-criterion optimization (EMO), Guanajuato, Mexico, March 2005. p. 474–
489.
79. Jiang S, Zhang J, Ong Y-S. Multiobjective optimization based on reputation. Inf Sci.
2014;286:125–46.
80. Keerativuttitumrong N, Chaiyaratana N, Varavithya V. Multi-objective co-operative co-
evolutionary genetic algorithm. In: Proceedings of the 7th international conference on parallel
problem solving from nature (PPSN VII), Granada, Spain, Sept 2002. Berlin: Springer; 2002.
p. 288–297.
81. Khan N. Bayesian optimization algorithms for multi-objective and hierarchically difficult
problem. IlliGAL Report No. 2003021, Department of General Engineering, University of
Illinois at Urbana-Champaign, Urbana, IL, USA. 2003.
82. Khare V, Yao X, Deb K. Performance scaling of multiobjective evolutionary algorithms. In:
Proceedings of the 2nd international conference on evolutionary multi-criterion optimization
(EMO), Faro, Portugal, April 2003. p. 376–390.
83. Knowles J. ParEGO: a hybrid algorithm with on-line landscape approximation for expensive
multiobjective optimization problems. IEEE Trans Evol Comput. 2006;10(1):50–66.
84. Knowles JD, Corne DW. Approximating the nondominated front using the Pareto archived
evolution strategy. Evol Comput. 2000;8(2):149–72.
85. Knowles JD, Corne DW. M-PAES: a memetic algorithm for multiobjective optimization. In:
Proceedings of IEEE congress on evolutionary computation (CEC), La Jolla, CA, USA, July
2000. p. 325–332.
86. Knowles JD, Corne DW. Quantifying the effects of objective space dimension in evolutionary
multiobjective optimization. In: Proceedings of the 4th international conference on evolution-
ary multi-criterion optimization (EMO), Matsushima, Japan, March 2007. p. 757–771.
87. Koppen M, Yoshida K. Substitute distance assignments in NSGAII for handling many-
objective optimization problems. In: Proceedings of the 4th international conference on evo-
lutionary multi-criterion optimization (EMO), Matsushima, Japan, March 2007. p. 727–741.
88. Kukkonen S, Lampinen J. GDE3: the third evolution step of generalized differential evolution.
In: Proceedings of IEEE congress on evolutionary computation (CEC), Edinburgh, UK, Sept
2005. p. 443–450.
89. Kumar V, Chhabra JK, Kumar D. Differential search algorithm for multiobjective problems.
Procedia Comput Sci. 2015;48:22–8.
90. Kundu D, Suresh K, Ghosh S, Das S, Panigrahi BK, Das S. Multi-objective optimization with
artificial weed colonies. Inf Sci. 2011;181(12):2441–54.
91. Lara A, Sanchez G, Coello CAC, Schutze O. HCS: a new local search strategy for memetic
multiobjective evolutionary algorithms. IEEE Trans Evol Comput. 2010;14(1):112–32.
92. Laumanns M, Ocenasek J. Bayesian optimization algorithms for multi-objective optimization.
In: Proceedings of the 7th international conference on parallel problem solving from nature
(PPSN-VII), Granada, Spain, Sept 2002. Berlin: Springer; 2002. p. 298–307.
93. Laumanns M, Rudolph G, Schwefel H-P. A spatial predator-prey approach to multiobjective
optimization: a preliminary study. In: Proceedings of the 5th international conference on
parallel problem solving from nature (PPSN-V), Amsterdam, The Netherlands, Sept 1998.
Berlin: Springer; 1998. p. 241–249.
94. Laumanns M, Thiele L, Deb K, Zitzler E. Combining convergence and diversity in evolution-
ary multi-objective optimization. Evol Comput. 2002;10(3):263–82.
95. Li H, Zhang Q. Multiobjective optimization problems with complicated Pareto sets, MOEA/D
and NSGA-II. IEEE Trans Evol Comput. 2009;13(2):284–302.
96. Li JQ, Pan QK, Gao KZ. Pareto-based discrete artificial bee colony algorithm for multi-
objective flexible job shop scheduling problems. Int J Adv Manuf Technol. 2011;55:1159–69.
97. Li K, Zhang Q, Kwong S, Li M, Wang R. Stable matching-based selection in evolutionary
multiobjective optimization. IEEE Trans Evol Comput. 2014;18(6):909–23.
98. Li M, Yang S, Liu X. Shift-based density estimation for Pareto-based algorithms in many-
objective optimization. IEEE Trans Evol Comput. 2014;18(3):348–65.
99. Li M, Yang S, Liu X. Bi-goal evolution for many-objective optimization problems. Artif Intell.
2015;228:45–65.
100. Li X. A non-dominated sorting particle swarm optimizer for multiobjective optimization. In:
Proceedings of genetic and evolutionary computation conference (GECCO), Chicago, IL,
USA, July 2003. p. 37–48.
101. Li X. Better spread and convergence: particle swarm multiobjective optimization using the
maximin fitness function. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Seattle, WA, USA, June 2004. p. 117–128.
102. Li Z, Nguyen TT, Chen SM, Truong TK. A hybrid algorithm based on particle swarm and
chemical reaction optimization for multi-object problems. Appl Soft Comput. 2015;35:525–
40.
103. Liang Z, Song R, Lin Q, Du Z, Chen J, Ming Z, Yu J. A double-module immune algorithm
for multi-objective optimization problems. Appl Soft Comput. 2015;35:161–74.
104. Liu D, Tan KC, Goh CK, Ho WK. A multiobjective memetic algorithm based on particle
swarm optimization. IEEE Trans Syst Man Cybern Part B. 2007;37(1):42–50.
105. Lohn JD, Kraus WF, Haith GL. Comparing a coevolutionary genetic algorithm for multiob-
jective optimization. In: Proceedings of the world on congress on computational intelligence,
Honolulu, HI, USA, May 2002. p. 1157–1162.
106. Lu H, Yen G. Rank-density-based multiobjective genetic algorithm and benchmark test func-
tion study. IEEE Trans Evol Comput. 2003;7(4):325–43.
107. Leong W-F, Yen GG. PSO-based multiobjective optimization with dynamic population size
and adaptive local archives. IEEE Trans Syst Man Cybern Part B. 2008;38(5):1270–93.
108. Lopez-Jaimes A, Coello Coello CA. Including preferences into a multiobjective evolu-
tionary algorithm to deal with many-objective engineering optimization problems. Inf Sci.
2014;277:1–20.
109. Lu Z, Zhao H, Xiao H, Wang H, Wang H. An improved multi-objective bacteria colony
chemotaxis algorithm and convergence analysis. Appl Soft Comput. 2015;31:274–92.
110. Ma X, Qi Y, Li L, Liu F, Jiao L, Wu J. MOEA/D with uniform decomposition measurement
for many-objective problems. Soft Comput. 2014;18:2541–64.
111. Madavan NK. Multiobjective optimization using a Pareto differential evolution approach. In:
Proceedings of IEEE congress on evolutionary computation (CEC), Honolulu, HI, USA, May
2002. p. 1145–1150.
112. Marti L, Garcia J, Berlanga A, Molina JM. Solving complex high-dimensional problems with
the multi-objective neural estimation of distribution algorithm. In: Proceedings of the 11th
genetic and evolutionary computation conference (GECCO), Montreal, Canada, July 2009.
p. 619–626.
113. Menczer F, Degeratu M, Steet WN. Efficient and scalable Pareto optimization by evolutionary
local selection algorithms. Evol Comput. 2000;8(2):223–47.
114. Miettinen K. Nonlinear multiobjective optimization. Boston: Kluwer; 1999.
115. Mo H, Xu Z, Xu L, Wu Z, Ma H. Constrained multiobjective biogeography optimization
algorithm. Sci World J. 2014;2014, Article ID 232714:12p.
116. Modiri-Delshad M, Rahim NA. Multi-objective backtracking search algorithm for economic
emission dispatch problem. Appl Soft Comput. 2016;40:479–94.
117. Molina J, Laguna M, Marti R, Caballero R. SSPMO: a scatter tabu search procedure for
non-linear multiobjective optimization. INFORMS J Comput. 2007;19(1):91–100.
118. Mora AM, Garcia-Sanchez P, Merelo JJ, Castillo PA. Pareto-based multi-colony multi-
objective ant colony optimization algorithms: an island model proposal. Soft Comput.
2013;17:1175–207.
119. Murata T, Ishibuchi H, Gen M. Specification of genetic search direction in cellular multi-
objective genetic algorithm. In: Proceedings of the 1st international conference on evolution-
ary multicriterion optimization (EMO), Zurich, Switzerland, March 2001. Berlin: Springer;
2001. p. 82–95.
120. Nam DK, Park CH. Multiobjective simulated annealing: a comparative study to evolutionary
algorithms. Int J Fuzzy Syst. 2000;2(2):87–97.
121. Nebro AJ, Durillo JJ, Luna F, Dorronsoro B, Alba E. MOCell: a cellular genetic algorithm
for multiobjective optimization. Int J Intell Syst. 2009;24:726–46.
122. Nebro AJ, Luna F, Alba E. New ideas in applying scatter search to multiobjective optimization.
In: Proceedings of the 3rd international conference on evolutionary multicriterion optimization
(EMO), Guanajuato, Mexico, March 2005. p. 443–458.
123. Nebro AJ, Luna F, Alba E, Dorronsoro B, Durillo JJ, Beham A. AbYSS: adapting scatter
search to multiobjective optimization. IEEE Trans Evol Comput. 2008;12(4):439–57.
124. Nguyen L, Bui LT, Abbass HA. DMEA-II: the direction-based multi-objective evolutionary
algorithm-II. Soft Comput. 2014;18:2119–34.
125. Okabe T, Jin Y, Sendhoff B, Olhofer M. Voronoi-based estimation of distribution algorithm for
multi-objective optimization. In: Proceedings of IEEE congress on evolutionary computation
(CEC), Portland, OR, USA, June 2004. p. 1594–1601.
126. Parsopoulos KE, Tasoulis DK, Pavlidis NG, Plagianakos VP, Vrahatis MN. Vector evaluated
differential evolution for multiobjective optimization. In: Proceedings of IEEE congress on
evolutionary computation (CEC), Portland, Oregon, USA, June 2004. p. 204–211.
127. Parsopoulos KE, Tasoulis DK, Vrahatis MN. Multiobjective optimization using parallel vector
evaluated particle swarm optimization. In: Proceedings of the IASTED international confer-
ence on artificial intelligence and applications, Innsbruck, Austria, Feb 2004. p. 823–828.
128. Pelikan M, Sastry K, Goldberg DE. Multiobjective HBOA, clustering, and scalability. In:
Proceedings of international conference on genetic and evolutionary computation; 2005. p.
663–670.
129. Pulido GT, Coello CAC. Using clustering techniques to improve the performance of a par-
ticle swarm optimizer. In: Proceedings of genetic and evolutionary computation conference
(GECCO), Seattle, WA, USA, June 2004. p. 225–237.
130. Purshouse RC, Fleming PJ. On the evolutionary optimization of many conflicting objectives.
IEEE Trans Evol Comput. 2007;11(6):770–84.
131. Rahimi-Vahed A, Mirzaei AH. A hybrid multi-objective shuffled frog-leaping algorithm for
a mixed-model assembly line sequencing problem. Comput Ind Eng. 2007;53(4):642–66.
132. Rao RV, Patel V. Multi-objective optimization of two stage thermoelectric cooler using a mod-
ified teaching-learning-based optimization algorithm. Eng Appl Artif Intell. 2013;26:430–45.
133. Ray T, Liew KM. A swarm metaphor for multiobjective design optimization. Eng Optim.
2002;34(2):141–53.
134. Reddy MJ, Kumar DN. An efficient multi-objective optimization algorithm based on swarm
intelligence for engineering design. Eng Optim. 2007;39(1):49–68.
135. Reynoso-Meza G, Sanchis J, Blasco X, Martinez M. Design of continuous controllers using
a multiobjective differential evolution algorithm with spherical pruning. In: Applications of
evolutionary computation. Lecture notes in computer science, vol. 6024. Berlin: Springer;
2010. p. 532–541.
136. Robic T, Filipic B. DEMO: differential evolution for multiobjective optimization. In: Proceed-
ings of the 3rd international conference on evolutionary multi-criterion optimization (EMO),
Guanajuato, Mexico, March 2005. p. 520–533.
137. Sadollah A, Eskandar H, Kim JH. Water cycle algorithm for solving constrained multi-
objective optimization problems. Appl Soft Comput. 2015;27:279–98.
138. Sastry K, Goldberg DE, Pelikan M. Limits of scalability of multi-objective estimation of dis-
tribution algorithms. In: Proceedings of IEEE congress on evolutionary computation (CEC),
Edinburgh, UK, Sept 2005. p. 2217–2224.
139. Sato H, Aguirre H, Tanaka K. Controlling dominance area of solutions and its impact on the
performance of MOEAs. In: Proceedings of the 4th international conference on evolutionary
multi-criterion optimization (EMO), Matsushima, Japan, March 2007. p. 5–20.
140. Schaffer JD. Multiple objective optimization with vector evaluated genetic algorithms. In:
Grefenstette JJ, editor. Proceedings of the 1st international conference on genetic algorithms,
Pittsburgh, PA, USA, July 1985. Hillsdale, NJ, USA: Lawrence Erlbaum; 1985. p. 93–100.
141. Schott JR. Fault tolerant design using single and multicriteria genetic algorithm optimization.
Master’s Thesis, Department of Aeronautics and Astronautics, Massachusetts Institute of
Technology, Cambridge, MA; 1995.
142. Shim VA, Tan KC, Cheong CY. An energy-based sampling technique for multi-objective
restricted Boltzmann machine. IEEE Trans Evol Comput. 2013;17(6):767–85.
143. Shim VA, Tan KC, Chia JY, Al Mamun A. Multi-objective optimization with estimation of
distribution algorithm in a noisy environment. Evol Comput. 2013;21(1):149–77.
144. Sierra MR, Coello CAC. Improving PSO-based multiobjective optimization using crowding,
mutation and ε-dominance. In: Proceedings of the 3rd international conference on evolution-
ary multi-criterion optimization (EMO), Guanajuato, Mexico, March 2005. p. 505–519.
145. Singh HK, Isaacs A, Ray T. A Pareto corner search evolutionary algorithm and dimen-
sionality reduction in many-objective optimization problems. IEEE Trans Evol Comput.
2011;15(4):539–56.
146. Smith KI, Everson RM, Fieldsend JE, Murphy C, Misra R. Dominance-based multiobjective
simulated annealing. IEEE Trans Evol Comput. 2008;12(3):323–42.
147. Soh H, Kirley M. moPGA: toward a new generation of multiobjective genetic algorithms. In:
Proceedings of IEEE congress on evolutionary computation, Vancouver, BC, Canada, July
2006. p. 1702–1709.
148. Soylu B, Köksalan M. A favorable weight-based evolutionary algorithm for multiple criteria
problems. IEEE Trans Evol Comput. 2010;14(2):191–205.
149. Srinivas N, Deb K. Multiobjective optimization using nondominated sorting in genetic algo-
rithms. Evol Comput. 1994;2(3):221–48.
150. Srinivas M, Patnaik LM. Adaptive probabilities of crossover and mutation in genetic algo-
rithms. IEEE Trans Syst Man Cybern. 1994;24(4):656–67.
151. Tan KC, Lee TH, Khor EF. Evolutionary algorithms with dynamic population size and local
exploration for multiobjective optimization. IEEE Trans Evol Comput. 2001;5(6):565–88.
152. Tan KC, Yang YJ, Goh CK. A distributed cooperative coevolutionary algorithm for multiob-
jective optimization. IEEE Trans Evol Comput. 2006;10(5):527–49.
153. Tang HJ, Shim VA, Tan KC, Chia JY. Restricted Boltzmann machine based algorithm for
multi-objective optimization. In: Proceedings of IEEE congress on evolutionary computation
(CEC), Barcelona, Spain, July 2010. p. 3958–3965.
154. Teo J. Exploring dynamic self-adaptive populations in differential evolution. Soft Comput.
2006;10(8):673–86.
155. Toffolo A, Benini E. Genetic diversity as an objective in multi-objective evolutionary algo-
rithms. Evol Comput. 2003;11(2):151–67.
156. Vasconcelos JA, Maciel JHRD, Parreiras RO. Scatter search techniques applied to electro-
magnetic problems. IEEE Trans Magn. 2005;4:1804–7.
157. Veldhuizen DAV, Lamont GB. Multiobjective evolutionary algorithm research: a history and
analysis. Technical Report TR-98-03, Department of Electrical and Computer Engineering,
Graduate School of Engineering, Air Force Institute of Technology, Wright-Patterson AFB,
OH, USA; 1998.
158. Vrugt JA, Robinson BA, Hyman JM. Self-adaptive multimethod search for global optimization
in real-parameter spaces. IEEE Trans Evol Comput. 2009;13(2):243–59.
159. Wagner T, Beume N, Naujoks B. Pareto-, aggregation-, and indicator-based methods in many-
objective optimization. In: Proceedings of the 4th international conference on evolutionary
multi-criterion optimization (EMO), Matsushima, Japan, March 2007. p. 742–756.
160. Wang R, Purshouse RC, Fleming PJ. Preference-inspired coevolutionary algorithms for many-
objective optimization. IEEE Trans Evol Comput. 2013;17(4):474–94.
161. Wanner EF, Guimaraes FG, Takahashi RHC, Fleming PJ. Local search with quadratic approx-
imations into memetic algorithms for optimization with multiple criteria. Evol Comput.
2008;16(2):185–224.
162. Wu Y, Jin Y, Liu X. A directed search strategy for evolutionary dynamic multiobjective
optimization. Soft Comput. 2015;19:3221–35.
163. Xiang Y, Zhou Y. A dynamic multi-colony artificial bee colony algorithm for multi-objective
optimization. Appl Soft Comput. 2015;35:766–85.
164. Xue J, Wu Y, Shi Y, Cheng S. Brain storm optimization algorithm for multi-objective opti-
mization problems. In: Proceedings of the 3rd international conference on advances in swarm
intelligence, Shenzhen, China, June 2012. Berlin: Springer; 2012. p. 513–519.
165. Yang S, Li M, Liu X, Zheng J. A grid-based evolutionary algorithm for many-objective
optimization. IEEE Trans Evol Comput. 2013;17(5):721–36.
166. Yang X-S. Bat algorithm for multi-objective optimization. Int J Bio-Inspired Comput.
2011;3(5):267–74.
167. Yen GG, Leong WF. Dynamic multiple swarms in multiobjective particle swarm optimization.
IEEE Trans Syst Man Cybern Part A. 2009;39(4):890–911.
168. Yen GG, Lu H. Dynamic multiobjective evolutionary algorithm: adaptive cell-based rank and
density estimation. IEEE Trans Evol Comput. 2003;7(3):253–74.
169. Zhan Z-H, Li J, Cao J, Zhang J, Chung HS-H, Shi Y-H. Multiple populations for multi-
ple objectives: a coevolutionary technique for solving multiobjective optimization problems.
IEEE Trans Cybern. 2013;43(2):445–63.
170. Zhang Q, Li H. MOEA/D: a multiobjective evolutionary algorithm based on decomposition.
IEEE Trans Evol Comput. 2007;11(6):712–31.
171. Zhang Q, Liu W, Li H. The performance of a new version of MOEA/D on CEC09 uncon-
strained MOP test instances. In: Proceedings of the IEEE conference on evolutionary com-
putation (CEC), Trondheim, Norway, May 2009. p. 203–208.
172. Zhang Q, Zhou A, Jin Y. Global multiobjective optimization via estimation of distribution
algorithm with biased initialization and crossover. In: Proceedings of the genetic and evolu-
tionary computation conference (GECCO), London, UK, July 2007. p. 617–622.
173. Zhang Q, Zhou A, Jin Y. RM-MEDA: a regularity model-based multi-objective estimation of
distribution algorithm. IEEE Trans Evol Comput. 2008;12(1):41–63.
174. Zhang X, Tian Y, Cheng R, Jin Y. An efficient approach to non-dominated sorting for evolu-
tionary multi-objective optimization. IEEE Trans Evol Comput. 2015;19(2):201–15.
175. Zhong X, Li W. A decision-tree-based multi-objective estimation of distribution algorithm. In:
Proceedings of international conference on computational intelligence and security, Harbin,
China, Dec 2007. p. 114–118.
176. Zhou A, Zhang Q, Jin Y. Approximating the set of pareto-optimal solutions in both the
decision and objective spaces by an estimation of distribution algorithm. IEEE Trans Evol Comput.
2009;13(5):1167–89.
177. Zitzler E, Deb K, Thiele L. Comparison of multiobjective evolutionary algorithms: empirical
results. Evol Comput. 2000;8(2):173–95.
178. Zitzler E, Kunzli S. Indicator-based selection in multiobjective search. In: Proceedings of the
8th international conference on parallel problem solving from nature (PPSN VIII), Birming-
ham, UK, Sept 2004. Berlin: Springer; 2004. p. 832–842.
179. Zitzler E, Laumanns M, Thiele L. SPEA2: improving the strength Pareto evolutionary algo-
rithm. TIK-Report 103, Department of Electrical Engineering, Swiss Federal Institute of
Technology, Switzerland. 2001.
180. Zitzler E, Laumanns M, Thiele L. SPEA2: improving the strength pareto evolutionary algo-
rithm. In: Proceedings of evolutionary methods for design, optimisation and control. CIMNE,
Barcelona, Spain; 2002. p. 95–100.
181. Zitzler E, Thiele L. Multiobjective evolutionary algorithms: a comparative case study and the
strength Pareto approach. IEEE Trans Evol Comput. 1999;3(4):257–71.
182. Zitzler E, Thiele L, Laumanns M, Fonseca CM, da Fonseca VG. Performance assessment of
multiobjective optimizers: an analysis and review. IEEE Trans Evol Comput. 2003;7:117–32.
Appendix A
Benchmarks

This appendix gives benchmark functions for discrete optimization as well as for real-
valued unconstrained, multimodal, multiobjective, and dynamic optimization.

A.1 Discrete Benchmark Functions

This section gives a few well-known benchmark functions for evaluating discrete
optimization methods.
Quadratic Assignment Problem (QAP)
The quadratic assignment problem (QAP) is a well-known NP-hard COP with a
wide variety of applications, including the facility location problem. For the facility
location problem, the objective is to find a minimum cost assignment of facilities
to locations considering the flow of materials between facilities and the distance
between locations. The facility location problem can be formulated as

n 
n
min z p = f i j d pi p j , (A.1)
p∈P
i=1 j=1
 
where f i j is the flow matrix with the flow f i j between the two facilities i and j,
di j is the distance matrix, p is a permutation vector of n indices of facilities (or
locations) mapping a possible assignment of n facilities to n locations, and P is the
set of all n-vector permutations.
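A minimal MATLAB sketch (an assumed helper, not part of QAPLIB) of the objective (A.1) for a permutation p, a flow matrix F, and a distance matrix D:

% Minimal sketch: QAP cost (A.1); D(p, p) reorders rows and columns by p.
function z = qap_cost(p, F, D)
    z = sum(sum(F .* D(p, p)));
end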
The p-center problem [3], also known as minimax location-allocation problem,
consists of locating p facilities (centers) on a network such that the maximum of
the distances between nodes and their nearest centers is minimized. In the p-center
problem, N nodes (customers) and distances between nodes are given, and p centers
should be located at any of the N given nodes. The p-center problem can be used in
applications such as locating fire stations, police departments, or emergency centers.

The location-allocation problem can be formulated as [8]
\[
\min \sum_{i=1}^{m} \sum_{j=1}^{n} d_{ij} x_{ij} \tag{A.2}
\]
subject to
\[
\sum_{j=1}^{n} a_{ij} x_{ij} = b_i, \quad \forall i = 1, \ldots, m, \tag{A.3}
\]
\[
\sum_{i=1}^{m} x_{ij} = 1, \quad \forall j = 1, \ldots, n, \tag{A.4}
\]
\[
x_{ij} \in \{0, 1\}, \quad \forall i = 1, \ldots, m;\ j = 1, \ldots, n, \tag{A.5}
\]
where $[x_{ij}] \in \{0, 1\}^{m \times n}$ is a variable matrix, and $d_{ij} \ge 1$, $a_{ij} \ge 1$, and $b_i \ge 1$
are constant integers.
Task assignment problem is a QAP application. It involves assigning a number of
tasks to a number of processors in a distributed system. The objective is to minimize
the total execution and communication cost incurred by task assignment, which is
limited by the resource requirements. This is a 0-1 quadratic integer programming
problem. The general formulation is given by
\[
\min Q(X) = \sum_{i=1}^{r} \sum_{k=1}^{n} e_{ik} x_{ik} + \sum_{i=1}^{r-1} \sum_{j=i+1}^{r} c_{ij} \left(1 - \sum_{k=1}^{n} x_{ik} x_{jk}\right), \tag{A.6}
\]
subject to
\[
\sum_{k=1}^{n} x_{ik} = 1, \quad \forall i = 1, 2, \ldots, r, \tag{A.7}
\]
\[
\sum_{i=1}^{r} m_i x_{ik} \le M_k, \quad \forall k = 1, 2, \ldots, n, \tag{A.8}
\]
\[
\sum_{i=1}^{r} p_i x_{ik} \le P_k, \quad \forall k = 1, 2, \ldots, n, \tag{A.9}
\]
\[
x_{ik} \in \{0, 1\}, \quad \forall i, k, \tag{A.10}
\]
where the variable $x_{ik} = 1$ if task $i$ is assigned to processor $k$, and 0 otherwise, $n$ is the
number of processors, $r$ is the number of tasks, $e_{ik}$ is the execution cost incurred if
task $i$ is executed on processor $k$, $c_{ij}$ is the communication cost incurred between
tasks $i$ and $j$ if they are executed on different processors, $m_i$ is the memory requirement
of task $i$, $M_k$ is the memory capacity of processor $k$, $p_i$ is the processing requirement of
task $i$ on its execution processor, and $P_k$ is the processing capacity of processor $k$.
Constraint (A.7) specifies that each task should be assigned to exactly one processor.
Constraint (A.8) specifies that the sum of the memory requirements of the tasks assigned to
processor $k$ should not exceed the memory capacity of processor $k$. Constraint (A.9)
specifies that the sum of the processing requirements of the tasks assigned to processor $k$
should not exceed the processing capacity of processor $k$.
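A minimal MATLAB sketch (an assumed helper) that evaluates the cost (A.6) and checks the constraints (A.7)-(A.9) for a binary assignment matrix X of size r-by-n:

% Minimal sketch: cost (A.6) and feasibility of the task assignment problem.
% X is r-by-n binary, E the r-by-n execution costs, C the r-by-r communication
% costs, m and p the task memory/processing requirements (length r), and
% M and P the processor memory/processing capacities (length n).
function [cost, feasible] = task_assignment_eval(X, E, C, m, p, M, P)
    cost = sum(sum(E .* X));
    r = size(X, 1);
    for i = 1:r-1
        for j = i+1:r
            cost = cost + C(i, j) * (1 - X(i, :) * X(j, :)');
        end
    end
    feasible = all(sum(X, 2) == 1) ...       % (A.7): one processor per task
        && all(m(:)' * X <= M(:)') ...       % (A.8): memory capacities respected
        && all(p(:)' * X <= P(:)');          % (A.9): processing capacities respected
end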
Other QAP applications have been encountered in a variety of other domains
such as the backboard wiring problem in electronics, the arrangement of electronic
components in printed circuit boards and in microchips, machine scheduling in man-
ufacturing, load balancing and task allocation in parallel and distributed computing,
statistical data analysis, and transportation. A set of test problems can be obtained
from QAPLIB (http://www.seas.upenn.edu/qaplib/inst.html) and Taillard’s reposi-
tory (http://mistic.heig-vd.ch/taillard/problemes.dir/qap.dir/qap.html).
Traveling Salesman Problem
For symmetric TSP, the distances between nodes are independent of the direction,
i.e., $d_{ij} = d_{ji}$ for every pair of nodes. In asymmetric TSP, at least one pair of nodes
satisfies $d_{ij} \ne d_{ji}$.
The problem can be described as
\[
\min \sum_{x} \sum_{y \ne x} \sum_{i} d_{xy} v_{xi} \left(v_{y,i+1} + v_{y,i-1}\right) \tag{A.11}
\]
subject to
\[
\sum_{x} \sum_{i} \sum_{j \ne i} v_{xi} v_{xj} = 0, \tag{A.12}
\]
\[
\sum_{i} \sum_{x} \sum_{y \ne x} v_{xi} v_{yi} = 0, \tag{A.13}
\]
\[
\left(\sum_{x} \sum_{i} v_{xi} - n\right)^2 = 0. \tag{A.14}
\]
The objective is to find the shortest tour. The first constraint is satisfied if and only if
each city row x contains no more than one 1, i.e., the rest of the entries are zero. The
second constraint is satisfied if and only if each position-in-tour column contains no
more than one 1, i.e., the rest of the entries are zero. The third constraint is satisfied
if and only if there are n entries of one in the entire matrix. The first three terms
describe the feasibility requirements, which define a valid tour when they equal zero [4].
The last term represents the objective function of the TSP.
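When a metaheuristic works directly with permutations rather than with the 0-1 matrix above, the tour length can be evaluated by a simple helper; a minimal MATLAB sketch (assumed, not from TSPLIB) is:

% Minimal sketch: length of a closed tour (a permutation of 1..n) under a
% symmetric distance matrix D.
function len = tour_length(tour, D)
    n = numel(tour);
    len = 0;
    for i = 1:n
        len = len + D(tour(i), tour(mod(i, n) + 1));  % wrap around to the start
    end
end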
TSPLIB (http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/) provides TSP
problem benchmarks from Reinelt [9]. Ulysses16 provides coordinates of 16 loca-
tions of Odysseus’ journey home to Ithaca, also known as Homer’s Odyssey, given
in Table A.1. The length of the optimal tour is 6859 when geographical distances are
used.
Some benchmarks can be found in TSPlib. Berlin52 provides coordinates of 52
locations in Berlin, Germany. The length of optimal tour is 7542 when using Euclid-
ean distances. Bier127 provides coordinates of 127 beer gardens in Augsburg, Ger-
many. The length of optimal tour is 118282 when using Euclidean distances. Gr666
provides coordinates of 666 cities on earth. The length of optimal tour is 294358 in
case of using geographical distances.
Table A.1 Coordinates of problem ulysses16, geographical distances


City Latitude Longitude City Latitude Longitude
1 38.24 20.42 9 41.23 9.10
2 39.57 26.15 10 41.17 13.05
3 40.56 25.32 11 36.08 −5.21
4 36.26 23.12 12 38.47 15.13
5 33.48 10.54 13 38.15 15.35
6 37.56 12.19 14 37.51 15.17
7 38.42 13.11 15 35.49 14.32
8 37.52 20.44 16 39.36 19.56

Knapsack Problem
The knapsack problem consists in finding a subset of an original set of objects such
that the total profit of the selected objects is maximized while a set of resource con-
straints are satisfied. The knapsack problem is a model of many real applications such
as cutting stock problems, project selection and cargo loading, allocating processors
and databases in a distributed computer system.
The knapsack problem is an NP-hard problem. It can be formulated as an inte-
ger linear programming problem. The most common 0/1 knapsack problem can be
formulated as [1]
\[
\max \sum_{i=1}^{n} p_i x_i \tag{A.15}
\]
subject to
\[
\sum_{j=1}^{n} r_{ij} x_j \le b_i, \quad i = 1, \ldots, m, \tag{A.16}
\]
\[
x_i \in \{0, 1\}, \quad i = 1, \ldots, n, \tag{A.17}
\]
where $p = (p_1, p_2, \ldots, p_n)^T$ with $p_i > 0$ denoting the profit of item $i$,
$x = (x_1, x_2, \ldots, x_n)^T$ with $x_i = 1$ denoting that item $i$ is among the selected items
(the knapsack) and $x_i = 0$ otherwise, $m$ is the number of resource constraints,
$b_i \ge 0$, $i = 1, 2, \ldots, m$, denotes the budget of constraint $i$, and the weight $r_{ij}$
represents the investment on item $j$ subject to constraint $i$.
The bounded knapsack problem replaces the constraint (A.17) by $x_i \in \{0, 1, \ldots, c_i\}$,
$i = 1, \ldots, n$, where $c_i$ is an integer value. The unbounded knapsack problem replaces
the constraint (A.17) by $x_i \ge 0$, that is, $x_i$ is a nonnegative integer. There are also
multidimensional knapsack problems, multiple knapsack problems, and multiobjective
multiple knapsack problems. The multiple knapsack problem is similar to the bin packing
problem.
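A minimal MATLAB sketch (an assumed helper) for the 0/1 formulation (A.15)-(A.17), returning the profit of a binary selection x and whether all resource budgets are respected:

% Minimal sketch: profit (A.15) and feasibility (A.16) of a 0/1 selection x,
% with profit vector p, m-by-n weight matrix R, and budget vector b.
function [profit, feasible] = knapsack_eval(x, p, R, b)
    profit = p(:)' * x(:);
    feasible = all(R * x(:) <= b(:));
end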
Maximum Diversity Problem


Maximum diversity problem is to select a subset of m elements from a set of n
elements in such a way that the sum of the distances between the chosen ele-
ments is maximized. MDPLIB (http://www.optsicom.es/mdp) is a comprehensive
set of benchmarks representative of the collections previously used for computa-
tional experiments.
Maximum diversity problem can be formulated as [1]
\[
\max \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} d_{ij} x_i x_j \tag{A.18}
\]
subject to
\[
\sum_{i=1}^{n} x_i = m, \tag{A.19}
\]
\[
x_i \in \{0, 1\}, \quad i = 1, \ldots, n, \tag{A.20}
\]
where $d_{ij}$ is simply the distance between element $i$ and element $j$.
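A minimal MATLAB sketch (an assumed helper) of the diversity value (A.18) for a binary selection x under a distance matrix D:

% Minimal sketch: diversity (A.18) of the elements selected by binary vector x.
function val = diversity_value(x, D)
    sel = find(x(:));
    Dsel = D(sel, sel);
    val = sum(sum(triu(Dsel, 1)));   % sum distances over selected pairs i < j
end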
Bin Packing Problem
In the bin packing problem, objects of different volumes must be packed into a finite
number of bins or containers each of volume V in a way that minimizes the number
of bins used. There are many variations of this problem, such as 2D packing, linear
packing, packing by weight, packing by cost, and so on. They have many applications,
such as filling up containers, loading trucks with weight capacity constraints, and
creating file backups in media.
The bin packing problem is an NP-hard COP. It can also be seen as a special
case of the cutting stock problem. When the number of bins is restricted to 1 and
each item is characterized by both a volume and a value, the problem of maximizing
the value of items that can fit in the bin is known as the knapsack problem. The
2D bin packing problem is to pack objects with various width and length sizes into
minimized number of 2D bins.
Nurse Rostering Problem
The nurse rostering problem is a COP in which a set of shifts is assigned to a set of
nurses, each of whom has specific skills and a work contract, over a predefined roster-
ing period according to a set of constraints. The standard dataset published in the
First International Nurse Rostering Competition 2010 (INRC2010) consists of 69
instances which reflect this problem in many real-world cases that are varied in size
and complexity.
A.2 Test Functions

Many problems from the EA literature, each belonging to the important class of real-
valued, unconstrained, multiobjective test problems, are systematically reviewed
and analyzed in [6], where a flexible toolkit is presented for constructing well-
designed test problems. The CEC2005 benchmark [10] is a well-known benchmark that includes 25 functions for real-parameter optimization algorithms; Matlab, C, and Java code for these functions can be found at http://www.ntu.edu.sg/home/EPNSugan/.
IEEE Congress on Evolutionary Computation provides a series of CEC benchmark
functions for testing various optimization algorithms. The Black-Box Optimization
Benchmarking (BBOB) Workshop of the Genetic and Evolutionary Computation
Conference (GECCO) also provides a series of BBOB benchmark functions, which
are composed of noisy and noiseless test functions.
The optimal reactive power dispatch problem is a well-known nonlinear optimization problem in power systems. It seeks the combination of control variables that minimizes power loss and voltage deviation. Two examples are the IEEE 30-bus system and the IEEE 118-bus system.
Some test functions are illustrated at http://en.wikipedia.org/wiki/Test_functions_for_optimization and http://www.sfu.ca/~ssurjano/optimization.html. MATLAB codes for various metaheuristics are available at http://yarpiz.com.
A.2.1 Test Functions for Unconstrained and Multimodal Optimization
Ackley Function
\[ 20 + e - 20 \exp\left( -0.2 \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2} \right) - \exp\left( \frac{1}{n} \sum_{i=1}^{n} \cos(2\pi x_i) \right). \tag{A.21} \]
Decision space: [−32, 32]^n.
Minimum: 0 at x^* = 0.
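A direct Python transcription of (A.21), assuming the conventional square-root form of the first exponential term, is the sketch below (the function name is arbitrary):

import numpy as np

def ackley(x):
    """Ackley function (A.21); global minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return (20.0 + np.e
            - 20.0 * np.exp(-0.2 * np.sqrt(np.mean(x ** 2)))
            - np.exp(np.mean(np.cos(2.0 * np.pi * x))))

print(ackley(np.zeros(10)))   # ~0 up to floating-point error
print(ackley(np.ones(10)))    # a positive value away from the optimum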
Alpine Function
\[ \sum_{i=1}^{n} \left| x_i \sin x_i + 0.1 x_i \right|. \tag{A.22} \]
Decision space: [−10, 10]^n.
Minimum: 0.
Six-Hump-Camelback Function
\[ 4x_1^2 - 2.1x_1^4 + \frac{x_1^6}{3} + x_1 x_2 - 4x_2^2 + 4x_2^4. \tag{A.23} \]
Decision space: [−5, 5]^2.
Minimum: −1.03163.
Sphere Function
\[ \|x\|^2 = \sum_{i=1}^{n} x_i^2. \tag{A.24} \]
Decision space: [−100, 100]^n.
Minimum: 0 at x^* = 0.
Drop Wave Function
\[ -\frac{1 + \cos\left( 12 \|x\| \right)}{\frac{1}{2}\|x\|^2 + 2}. \tag{A.25} \]
Decision space: [−5.12, 5.12]^n.
Minimum: −1 at x = (0, 0)^T.
Easom Function
\[ -\cos x_1 \cos x_2 \exp\left( -(x_1 - \pi)^2 - (x_2 - \pi)^2 \right). \tag{A.26} \]
Decision space: [−100, 100]^2.
Minimum: −1 at x = (π, π)^T.
Griewank Function
\[ \frac{\|x\|^2}{4000} - \prod_{i=1}^{n} \cos\left( \frac{x_i}{\sqrt{i}} \right) + 1. \tag{A.27} \]
Decision space: [−600, 600]^n.
Minimum: 0 at x^* = 0.
Michalewicz Function
\[ -\sum_{i=1}^{n} \sin(x_i) \left[ \sin\left( \frac{i x_i^2}{\pi} \right) \right]^{20}. \tag{A.28} \]
Decision space: [0, π]^n.
Minimum: −1.8013 at x^* = (2.20, 1.57)^T for n = 2.
Pathological Function
\[ \sum_{i=1}^{n-1} \left( 0.5 + \frac{\sin^2 \sqrt{100 x_i^2 + x_{i+1}^2} - 0.5}{1 + 0.001\left( x_i^2 - 2 x_i x_{i+1} + x_{i+1}^2 \right)^2} \right). \tag{A.29} \]
Decision space: [−100, 100]^n.
Minimum: 0.
Rastrigin Function
\[ 10n + \sum_{i=1}^{n} \left[ x_i^2 - 10 \cos(2\pi x_i) \right]. \tag{A.30} \]
Decision space: [−5.12, 5.12]^n.
Minimum: 0 at x^* = 0.
Rosenbrock Function
\[ \sum_{i=1}^{n-1} \left[ 100\left( x_{i+1} - x_i^2 \right)^2 + (x_i - 1)^2 \right]. \tag{A.31} \]
Decision space: [−100, 100]^n.
Minimum: 0 at x^* = (1, 1, . . . , 1)^T.
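The Rastrigin and Rosenbrock functions can be transcribed directly from (A.30) and (A.31); the following Python sketch (function names arbitrary) is one possible rendering:

import numpy as np

def rastrigin(x):
    """Rastrigin function (A.30); minimum 0 at the origin."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):
    """Rosenbrock function (A.31); minimum 0 at (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)

print(rastrigin(np.zeros(5)), rosenbrock(np.ones(5)))   # 0.0 0.0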
Salomon Function
\[ 1 - \cos\left( 2\pi \|x\| \right) + 0.1 \|x\|. \tag{A.32} \]
Decision space: [−100, 100]^n.
Minimum: 0 at x^* = 0.
Needle-in-Haystack
\[ f(x) = \left( \frac{a}{b + x_1^2 + x_2^2} \right)^2 + \left( x_1^2 + x_2^2 \right)^2 \tag{A.33} \]
with a = 3.0, b = 0.05.
Decision space: x ∈ [−5.12, 5.12]^2.
Schaffer Function
\[ f(x) = 0.5 + \frac{\sin^2 \sqrt{x_1^2 + x_2^2} - 0.5}{\left[ 1 + 0.001\left( x_1^2 + x_2^2 \right) \right]^2}. \tag{A.34} \]
Decision space: [−100, 100]^2.
Minimum: 0 at x = 0.
Schwefel Function
\[ 418.9829\, n - \sum_{i=1}^{n} x_i \sin\left( \sqrt{|x_i|} \right). \tag{A.35} \]
Decision space: [−500, 500]^n.
Minimum: 0 at x = (420.9687, . . . , 420.9687).
Sum of Powers Function
\[ \sum_{i=1}^{n} |x_i|^{i+1}. \tag{A.36} \]
Decision space: [−1, 1]^n.
Minimum: 0.
Tirronen Function
\[ 3 \exp\left( -\frac{\|x\|^2}{10n} \right) - 10 \exp\left( -8\|x\|^2 \right) + \frac{2.5}{n} \sum_{i=1}^{n} \cos\left( 5\left( x_i + (1 + i \bmod 2)\cos(\|x\|^2) \right) \right). \tag{A.37} \]
Decision space: [−10, 5]^n.
Minimum: 0.
Whitley Function
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \frac{y_{i,j}^2}{4000} - \cos\left( y_{i,j} \right) + 1 \right), \tag{A.38} \]
\[ y_{i,j} = 100\left( x_j - x_i^2 \right)^2 + (1 - x_i)^2. \tag{A.39} \]
Decision space: [−100, 100]^n.
Minimum: 0.
Zakharov Function
\[ \|x\|^2 + \left( \sum_{i=1}^{n} \frac{i x_i}{2} \right)^2 + \left( \sum_{i=1}^{n} \frac{i x_i}{2} \right)^4. \tag{A.40} \]
Decision space: [−5, 10]^n.
Minimum: 0.
Axis Parallel Hyper-ellipsoid Function
\[ \sum_{i=1}^{n} i\, x_i^2. \tag{A.41} \]
Decision space: [−5.12, 5.12]^n.
Minimum: 0.
Moved Axis Function
\[ \sum_{i=1}^{n} 5 i\, x_i^2. \tag{A.42} \]
Decision space: [−5.12, 5.12]^n.
Test Functions for Multimodal Optimization
Those test functions listed in Section A.2.1 that contain sin and cos terms exhibit periodic properties and can thus be used as benchmarks for multimodal optimization. For example, the Ackley, Rastrigin, Griewank, and Schwefel functions are typically used.
A.2.2 Test Functions for Constrained Optimization

The following test functions for constrained optimization are extracted from [7].
g06
\[ \min f(x) = (x_1 - 10)^3 + (x_2 - 20)^3 \tag{A.43} \]
subject to
\[ g_1(x) = -(x_1 - 5)^2 - (x_2 - 5)^2 + 100 \le 0, \tag{A.44} \]
\[ g_2(x) = (x_1 - 6)^2 + (x_2 - 5)^2 - 82.81 \le 0, \tag{A.45} \]
where 13 ≤ x_1 ≤ 100 and 0 ≤ x_2 ≤ 100. The minimum is f(x^*) = −6961.81387558015 at x^* = (14.09500000000000064, 0.8429607892154795668)^T.
g08
\[ \min f(x) = -\frac{\sin^3(2\pi x_1)\, \sin(2\pi x_2)}{x_1^3 (x_1 + x_2)} \tag{A.46} \]
subject to
\[ g_1(x) = x_1^2 - x_2 + 1 \le 0, \tag{A.47} \]
\[ g_2(x) = 1 - x_1 + (x_2 - 4)^2 \le 0, \tag{A.48} \]
where 0 ≤ x_1 ≤ 10 and 0 ≤ x_2 ≤ 10. The minimum is f(x^*) = −0.0958250414180359 at x^* = (1.227, 4.245)^T.
g11
\[ \min f(x) = x_1^2 + (x_2 - 1)^2 \tag{A.49} \]
subject to
\[ h(x) = x_2 - x_1^2 = 0, \tag{A.50} \]
where −1 ≤ x_1 ≤ 1 and −1 ≤ x_2 ≤ 1. The minimum is f(x^*) = 0.7499 at x^* = (−0.707036070037170616, 0.500000004333606807)^T.
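A common way to attack g06-g11 with an unconstrained metaheuristic is the static penalty function method; the Python sketch below shows it for g06 (the helper names and the penalty weight are illustrative choices, not prescribed by the benchmark).

import numpy as np

def g06_objective(x):
    """Objective of g06, Eq. (A.43)."""
    return (x[0] - 10.0) ** 3 + (x[1] - 20.0) ** 3

def g06_constraints(x):
    """Inequality constraints (A.44)-(A.45) in g(x) <= 0 form."""
    g1 = -(x[0] - 5.0) ** 2 - (x[1] - 5.0) ** 2 + 100.0
    g2 = (x[0] - 6.0) ** 2 + (x[1] - 5.0) ** 2 - 82.81
    return np.array([g1, g2])

def g06_penalized(x, weight=1e7):
    """Static penalty: objective plus weighted squared violations."""
    violation = np.maximum(g06_constraints(x), 0.0)
    return g06_objective(x) + weight * np.sum(violation ** 2)

x_near_opt = np.array([14.095, 0.84296])       # close to the reported optimum
print(g06_objective(x_near_opt))               # about -6961.8
print(np.max(g06_constraints(x_near_opt)))     # both constraints nearly active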
A.2.3 Test Functions for Unconstrained Multiobjective Optimization

Test functions for unconstrained and constrained multiobjective optimization can be found in [2,12] and at http://www.tik.ee.ethz.ch/~zitzler/testdata.html. Benchmarks can also be constructed using the WFG Toolkit [5]. IEEE Congress on Evolutionary Computation provides the CEC2009 MOEA Competition benchmark for multiobjective optimization [11].
Schaffer
Objective functions:
\[ f_1(x) = x^2, \quad f_2(x) = (x - 2)^2. \tag{A.51} \]
Variable bounds: [−10^3, 10^3].
Optimal solutions: x ∈ [0, 2].
This function has a convex, continuous Pareto optimal front.
Fonseca
\[ f_1(x) = 1 - \exp\left( -\sum_{i=1}^{n} \left( x_i - \frac{1}{\sqrt{n}} \right)^2 \right), \tag{A.52} \]
\[ f_2(x) = 1 - \exp\left( -\sum_{i=1}^{n} \left( x_i + \frac{1}{\sqrt{n}} \right)^2 \right), \tag{A.53} \]
n = 3.
Variable bounds: [−4, 4].
Optimal solutions: x_1 = x_2 = x_3 ∈ [−1/√3, 1/√3].
This function has a nonconvex, continuous Pareto optimal front.
ZDT1
\[ f_1(x) = x_1, \tag{A.54} \]
\[ f_2(x) = g(x)\left[ 1 - \sqrt{\frac{x_1}{g(x)}} \right], \tag{A.55} \]
\[ g(x) = 1 + 9\, \frac{\sum_{i=2}^{n} x_i}{n - 1}, \tag{A.56} \]
n = 30.
Variable bounds: x = (x_1, x_2, . . . , x_n)^T, x_i ∈ [0, 1].
Optimal solutions: x_1 ∈ [0, 1], x_i = 0, i = 2, . . . , 30.
This function has a convex, continuous Pareto optimal front which corresponds to g(x) = 1.
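For reference, a Python transcription of ZDT1 (assuming the standard square-root form of f_2 given above) is:

import numpy as np

def zdt1(x):
    """ZDT1 objectives (A.54)-(A.56) for x in [0, 1]^n, n >= 2."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * np.sum(x[1:]) / (x.size - 1)
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2

# On the Pareto optimal front x_i = 0 for i >= 2, so g = 1 and f2 = 1 - sqrt(f1):
x = np.zeros(30)
x[0] = 0.25
print(zdt1(x))   # (0.25, 0.5)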
ZDT2
\[ f_1(x) = x_1, \tag{A.57} \]
\[ f_2(x) = g(x)\left[ 1 - \left( \frac{x_1}{g(x)} \right)^2 \right], \tag{A.58} \]
\[ g(x) = 1 + 9\, \frac{\sum_{i=2}^{n} x_i}{n - 1}. \tag{A.59} \]
Variable bounds: x = (x_1, x_2, . . . , x_n)^T, x_i ∈ [0, 1].
Optimal solutions: x_1 ∈ [0, 1], x_i = 0, i = 2, . . . , 30.
This function has a nonconvex, continuous Pareto optimal front which corresponds to g(x) = 1.
ZDT3
\[ f_1(x) = x_1, \tag{A.60} \]
\[ f_2(x) = g(x)\left[ 1 - \sqrt{\frac{x_1}{g(x)}} - \frac{x_1}{g(x)} \sin(10\pi x_1) \right], \tag{A.61} \]
\[ g(x) = 1 + 9\, \frac{\sum_{i=2}^{n} x_i}{n - 1}. \tag{A.62} \]
Variable bounds: x = (x_1, x_2, . . . , x_n)^T, x_i ∈ [0, 1].
Optimal solutions: x_1 ∈ [0, 1], x_i = 0, i = 2, . . . , 30.
This function has a convex, discontinuous Pareto optimal front which corresponds to g(x) = 1.
ZDT4
\[ f_1(x) = x_1, \tag{A.63} \]
\[ f_2(x) = g(x)\left[ 1 - \sqrt{\frac{x_1}{g(x)}} \right], \tag{A.64} \]
\[ g(x) = 1 + 10(n - 1) + \sum_{i=2}^{n} \left[ x_i^2 - 10 \cos(4\pi x_i) \right]. \tag{A.65} \]
n = 30.
Variable bounds: x_1 ∈ [0, 1], x_i ∈ [−5, 5], i = 2, . . . , n.
Optimal solutions: x_1 ∈ [0, 1], x_i = 0, i = 2, . . . , 30.
This function is highly multimodal, with many local Pareto optimal fronts; its global Pareto optimal front, which corresponds to g(x) = 1, is convex and continuous.
ZDT6
\[ f_1(x) = 1 - \exp(-4x_1)\, \sin^6(6\pi x_1), \tag{A.66} \]
\[ f_2(x) = g(x)\left[ 1 - \left( \frac{f_1(x)}{g(x)} \right)^2 \right], \tag{A.67} \]
\[ g(x) = 1 + 9\left[ \frac{\sum_{i=2}^{n} x_i}{n - 1} \right]^{0.25}. \tag{A.68} \]
n = 30.
Variable bounds: x = (x_1, x_2, . . . , x_n)^T, x_i ∈ [0, 1].
Optimal solutions: x_1 ∈ [0, 1], x_i = 0, i = 2, . . . , 30.
This function has a nonconvex, many-to-one, nonuniformly spaced Pareto optimal front which corresponds to g(x) = 1.

A.2.4 Test Functions for Constrained Multiobjective Optimization

Osyczka2
Objective functions:
\[ f_1(x) = -\left[ 25(x_1 - 2)^2 + (x_2 - 2)^2 + (x_3 - 1)^2 + (x_4 - 4)^2 + (x_5 - 1)^2 \right], \tag{A.69} \]
\[ f_2(x) = x_1^2 + x_2^2 + x_3^2 + x_4^2 + x_5^2 + x_6^2. \tag{A.70} \]
Constraints:
\[ g_1(x) = x_1 + x_2 - 2 \ge 0, \tag{A.71} \]
\[ g_2(x) = 6 - x_1 - x_2 \ge 0, \tag{A.72} \]
\[ g_3(x) = 2 - x_2 + x_1 \ge 0, \tag{A.73} \]
\[ g_4(x) = 2 - x_1 + 3x_2 \ge 0, \tag{A.74} \]
\[ g_5(x) = 4 - (x_3 - 3)^2 - x_4 \ge 0, \tag{A.75} \]
\[ g_6(x) = (x_5 - 3)^2 + x_6 - 4 \ge 0. \tag{A.76} \]
Variable bounds: x_1 ∈ [0, 10], x_2 ∈ [0, 10], x_3 ∈ [1, 5], x_4 ∈ [0, 6], x_5 ∈ [1, 5], x_6 ∈ [0, 10].
Tanaka
Objective functions:
\[ f_1(x) = x_1, \tag{A.77} \]
\[ f_2(x) = x_2. \tag{A.78} \]
Constraints:
\[ g_1(x) = -x_1^2 - x_2^2 + 1 + 0.1\cos\left( 16\arctan(x_1/x_2) \right) \le 0, \tag{A.79} \]
\[ g_2(x) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 \le 0.5. \tag{A.80} \]
Variable bounds: xi ∈ [−π, π].


ConstrEx
Objective functions:
\[ f_1(x) = x_1, \tag{A.81} \]
\[ f_2(x) = (1 + x_2)/x_1. \tag{A.82} \]
Constraints:
\[ g_1(x) = x_2 + 9x_1 \ge 6, \tag{A.83} \]
\[ g_2(x) = -x_2 + 9x_1 \ge 1. \tag{A.84} \]
Variable bounds: x_1 ∈ [0.1, 1.0], x_2 ∈ [0, 5].
Srinivas
Objective functions:
\[ f_1(x) = (x_1 - 2)^2 + (x_2 - 1)^2 + 2, \tag{A.85} \]
\[ f_2(x) = 9x_1 - (x_2 - 1)^2. \tag{A.86} \]
Constraints:
\[ g_1(x) = x_1^2 + x_2^2 \le 225, \tag{A.87} \]
\[ g_2(x) = x_1 - 3x_2 \le -10. \tag{A.88} \]
Variable bounds: x_i ∈ [−20, 20].
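Constrained multiobjective test problems such as ConstrEx are typically handled by checking constraint feasibility alongside the objective values; the Python sketch below (function name and the test point are illustrative) returns both objectives and the constraint slacks of (A.83)-(A.84), which are nonnegative for feasible points.

import numpy as np

def constr_ex(x):
    """ConstrEx objectives (A.81)-(A.82) and constraint slacks (A.83)-(A.84)."""
    f1 = x[0]
    f2 = (1.0 + x[1]) / x[0]
    slacks = np.array([x[1] + 9.0 * x[0] - 6.0,
                       -x[1] + 9.0 * x[0] - 1.0])
    return (f1, f2), slacks

objs, slacks = constr_ex(np.array([0.8, 2.0]))
print(objs, bool(np.all(slacks >= 0.0)))   # (0.8, 3.75) True, a feasible point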

A.2.5 Test Functions for Dynamic Optimization

Moving Peaks Benchmark (http://www.aifb.uni-karlsruhe.de/~jbr/MovPeaks/) is a test benchmark for DOPs. The idea is to have an artificial multidimensional landscape consisting of several peaks, where the height, width, and position of each peak are altered slightly every time a change in the environment occurs. A repository on EAs for dynamic optimization problems is available at http://www.aifb.uni-karlsruhe.de/~jbr/EvoDOP.
Those test functions listed in Section A.2.3 can be modified to act as benchmarks for dynamic multiobjective optimization.
DZDT1
\[ f_1(y) = y_1, \tag{A.89} \]
\[ f_2(y) = g(y)\left[ 1 - \sqrt{\frac{y_1}{g(y)}} \right], \tag{A.90} \]
\[ g(y) = 1 + 9\, \frac{\sum_{i=2}^{n} y_i}{n - 1}, \tag{A.91} \]
\[ t = \lfloor f_c / FES_c \rfloor, \tag{A.92} \]
\[ y_1 = x_1, \tag{A.93} \]
\[ y_i = \left| x_i - \frac{t}{n_T} \right| / H(t), \quad i = 2, \ldots, n, \tag{A.94} \]
\[ H(t) = \max\left\{ \left| 1 - \frac{t}{n_T} \right|, \left| -1 - \frac{t}{n_T} \right| \right\}. \tag{A.95} \]
n = 30.
Variable bounds: x_1 ∈ [0, 1], x_i ∈ [−1, 1], i = 2, . . . , n.
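A minimal Python sketch of the DZDT1 transformation, parameterized directly by the change index t (here n_T, the assumed number of problem states before the landscape repeats, is an illustrative parameter), is:

import numpy as np

def dzdt1(x, t, n_T=10):
    """DZDT1 objectives (A.89)-(A.95) at change index t.
    x[0] lies in [0, 1] and x[i] in [-1, 1] for i >= 2."""
    x = np.asarray(x, dtype=float)
    H = max(abs(1.0 - t / n_T), abs(-1.0 - t / n_T))
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = np.abs(x[1:] - t / n_T) / H
    g = 1.0 + 9.0 * np.sum(y[1:]) / (x.size - 1)
    f1 = y[0]
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2

# The optimal values of x_i (i >= 2) move with t: x_i = t/n_T gives g = 1.
x = np.full(30, 0.3)
x[0] = 0.5
print(dzdt1(x, t=3))   # on the moving optimum since t/n_T = 0.3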
DZDT2
\[ f_1(y) = y_1, \tag{A.96} \]
\[ f_2(y) = g(y)\left[ 1 - \left( \frac{y_1}{g(y)} \right)^2 \right], \tag{A.97} \]
\[ g(y) = 1 + 9\, \frac{\sum_{i=2}^{n} y_i}{n - 1}, \tag{A.98} \]
\[ t = \lfloor f_c / FES_c \rfloor, \tag{A.99} \]
\[ y_1 = x_1, \tag{A.100} \]
\[ y_i = \left| x_i - \frac{t}{n_T} \right| / H(t), \quad i = 2, \ldots, n, \tag{A.101} \]
\[ H(t) = \max\left\{ \left| 1 - \frac{t}{n_T} \right|, \left| -1 - \frac{t}{n_T} \right| \right\}. \tag{A.102} \]
n = 30.
Variable bounds: x_1 ∈ [0, 1], x_i ∈ [−1, 1], i = 2, . . . , n.
DZDT3
\[ f_1(y) = y_1, \tag{A.103} \]
\[ f_2(y) = g(y)\left[ 1 - \sqrt{\frac{y_1}{g(y)}} - \frac{y_1}{g(y)} \sin(10\pi y_1) \right], \tag{A.104} \]
\[ g(y) = 1 + 9\, \frac{\sum_{i=2}^{n} y_i}{n - 1}, \tag{A.105} \]
\[ t = \lfloor f_c / FES_c \rfloor, \tag{A.106} \]
\[ y_1 = x_1, \tag{A.107} \]
\[ y_i = \left| x_i - \frac{t}{n_T} \right| / H(t), \quad i = 2, \ldots, n, \tag{A.108} \]
\[ H(t) = \max\left\{ \left| 1 - \frac{t}{n_T} \right|, \left| -1 - \frac{t}{n_T} \right| \right\}. \tag{A.109} \]
n = 30.
Variable bounds: x_1 ∈ [0, 1], x_i ∈ [−1, 1], i = 2, . . . , n.
DZDT4
\[ f_1(y) = y_1, \quad f_2(y) = g(y)\left[ 1 - \sqrt{\frac{y_1}{g(y)}} \right], \tag{A.110} \]
\[ g(y) = 1 + 10(n - 1) + \sum_{i=2}^{n} \left[ y_i^2 - 10 \cos(4\pi y_i) \right], \tag{A.111} \]
\[ t = \lfloor f_c / FES_c \rfloor, \tag{A.112} \]
\[ y_1 = x_1, \tag{A.113} \]
\[ y_i = \left| x_i - \frac{t}{n_T} \right| / H(t), \quad i = 2, \ldots, n, \tag{A.114} \]
\[ H(t) = \max\left\{ \left| 1 - \frac{t}{n_T} \right|, \left| -1 - \frac{t}{n_T} \right| \right\}. \tag{A.115} \]
n = 10.
Variable bounds: x_1 ∈ [0, 1], x_i ∈ [−1, 1], i = 2, . . . , n.

Problem

A.1 Plot the deceptive multimodal objective function:


\[ f(x) = -0.9 x^2 + \left( 5 |x|^{0.001} / 5^{0.001} \right)^2, \quad x \in [-5, 5]. \]
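Under the reading of the exponents given above, a short matplotlib sketch for this plot is (the sampling density is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-5.0, 5.0, 2001)
f = -0.9 * x ** 2 + (5.0 * np.abs(x) ** 0.001 / 5.0 ** 0.001) ** 2

plt.plot(x, f)
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title("Deceptive multimodal objective function of Problem A.1")
plt.show()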

References
1. Chu PC, Beasley JE. A genetic algorithm for the multidimensional knapsack problem. J Heuris-
tics. 1998;4:63–86.
2. Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multi-objective genetic algorithm:
NSGA-II. IEEE Trans Evol Comput. 2002;6(2):182–97.
3. Drezner Z. The p-center problem: heuristic and optimal algorithms. J Oper Res Soc.
1984;35(8):741–8.
4. Hopfield JJ, Tank DW. Neural computation of decisions in optimization problems. Biol Cybern.
1985;52:141–52.
5. Huband S, Barone L, While RL, Hingston P. A scalable multiobjective test problem toolkit. In:
Proceedings of the 3rd international conference on evolutionary multi-criterion optimization
(EMO), Guanajuato, Mexico, March 2005. p. 280–295.
6. Huband S, Hingston P, Barone L, While L. A review of multiobjective test problems and a
scalable test problem toolkit. IEEE Trans Evol Comput. 2006;10(5):477–506.
7. Kramer O. Self-adaptive heuristics for evolutionary computation. Berlin: Springer; 2008.
8. Matsuda S. "Optimal" Hopfield network for combinatorial optimization with linear cost func-
tion. IEEE Trans Neural Netw. 1998;9(6):1319–30.
9. Reinelt G. TSPLIB–a traveling salesman problem library. ORSA J Comput. 1991;3:376–84.
10. Suganthan PN, Hansen N, Liang JJ, Deb K, Chen Y-P, Auger A, Tiwari S. Problem defini-
tions and evaluation criteria for the CEC 2005 special session on real-parameter optimization.
Technical Report, Nanyang Technological University, Singapore, and KanGAL Report No.
2005005, Kanpur Genetic Algorithms Laboratory, IIT Kanpur, India, May 2005. http://www.
ntu.edu.sg/home/EPNSugan/.
11. Zhang Q, Zhou A, Zhao S, Suganthan PN, Liu W, Tiwari S. Multiobjective optimization
test instances for the CEC 2009 special session and competition. Technical Report CES-487,
University of Essex and Nanyang Technological University, Essex, UK/Singapore, 2008.
12. Zitzler E, Deb K, Thiele L. Comparison of multiobjective evolutionary algorithms: empirical
results. Evol Comput. 2000;8(2):173–95.
Index

A C
Adaptive coding, 43 Cauchy annealing, 33
Affinity, 180 Cauchy mutation, 58
Affinity maturation process, 180 Cell-like P system, 272
Algorithmic chemistry, 304 Cellular EA, 128, 132
Allele, 40 Central force optimization, 296
Animal migration optimization, 243 Chemical reaction network, 306
Annealing, 29 Chemical reaction optimization, 304
Antibody, 180 Chemotaxis, 217
Antigen, 180 Chromosome, 40
Artificial algae algorithm, 222 Clonal crossover, 178
Artificial fish swarm optimization, 249 Clonal mutation, 178
Artificial immune network, 184 Clonal selection, 177, 178
Artificial physics optimization, 296 Clonal selection algorithm, 180
Artificial selection, 41 Clone, 178
Cloud computing, 134
CMA-ES, 88
B Cockroach swarm optimization, 251
Backtracking search, 58 Coevolution, 136
Bacterial chemotaxis algorithm, 222 Collective animal behavior algorithm, 242
Baldwin effect, 5 Combinatorial optimization problem, 14
Bare-bones PSO, 156 Compact GA, 110
Bat algorithm, 246 Computational temperature, 30
Bee colony optimization, 210 Constrained optimization, 359
Belief space, 316 Cooperative coevolution, 133
Big bang big crunch, 301 Crossover, 46, 56
Binary coding, 42 Crowding, 351
Bin packing problem, 417 Cuckoo search, 243
Biochemical network, 267 Cycle crossover, 60
Bit climber, 49
Black hole-based optimization, 302 D
Bloat phenomenon, 71 Danger theory, 178
Boltzmann annealing, 31 Darwinian model, 5
Boltzmann distribution, 30 Deceptive function, 125
Building block, 123 Deceptive problem, 126
Building-block hypothesis, 123 Deme, 128, 356

Dendritic cell algorithm, 186 H


Deterministic annealing, 34 Hamming cliff phenomenon, 42
Differential mutation, 94 Heat transfer search, 299
Diffusion model, 128 Heuristics, 9
Diffusion search, 258 Hill-climbing operator, 49
DNA computing, 268 Hyper-heuristics, 9

I
E
Immune algorithm, 180
Ecological selection, 41
Immune network, 178
Electromagnetism-like algorithm, 297
Immune selection, 182
Elitism strategy, 45
Immune system, 175
Evolutionary gradient search, 85
Imperialist competitive algorithm, 340
Evolutionary programming, 83 Individual, 40
Exchange market algorithm, 343 Intelligent water drops algorithm, 299
Exploitation/Exploration, 51 Invasive tumor growth optimization, 224
Invasive weed optimization, 255
F Inversion operator, 48
Firefly algorithm, 239 Ions motion optimization, 297
Fitness, 41 Island, 128
Fitness approximation, 139 Island model, 130
Fitness imitation, 141 Iterated local search, 11
Fitness inheritance, 140 Iterated tabu search, 330
Fitness landscape, 41
Fitness sharing, 350 J
Flower pollination algorithm, 256 Jumping-gene phenomenon, 50
Free search, 243
K
Kinetic gas molecule optimization, 299
G
KKT conditions, 13
Gausssian mutation, 57
Knapsack problem, 416
Gene, 40 Krill herd algorithm, 250
Gene expression programming, 78
Generational distance, 386
L
Genetic assimilation, 5
Lagrange multiplier method, 12
Genetic diversity, 47, 51 Lamarckian strategy, 5, 319
Genetic drift, 41 (λ + μ) strategy, 85
Genetic flow, 41 (λ, μ) strategy, 85
Genetic migration, 41 Large-scale mutation, 48
Genotype, 40 League championship algorithm, 342
Genotype–phenotype map, 41 Levy flights, 244
Glowworm swarm optimization, 238 Lexicographic order optimization, 17
Golden ball metaheuristic, 342 Location-allocation problem, 414
GPU computing, 135 Locus, 40
Gradient evolution, 85
Gravitational search algorithm, 295 M
Gray coding, 42 Magnetic optimization algorithm, 298
Great deluge algorithm, 300 MapReduce, 134
Group search optimization, 240 Markov chain analysis, 124
Grover’s search algorithm, 286 Marriage in honeybees optimization, 209
Guided local search, 10 Master–slave model, 129
Maximum diversity problem, 417 Population space, 316


Melody search, 233 Population-based incremental learning, 108
Membrane computing, 271 Premature convergence, 44
Memetic algorithm, 318 Principle of natural selection, 5
Memory cell, 175
Messy GA, 53 Q
Metaheuristics, 9 Quadratic assignment problem, 413
Metropolis algorithm, 29
MOEA/D, 380
R
Multimodal optimization, 350
Random keys representation, 60
Multipoint crossover, 47
Ranking selection, 44
Mutation, 48, 57
Ray optimization, 298
Real-coded GA, 56
N
Rearrangement operator, 48
Natural selection, 41
Replacement strategy, 45
Negative selection, 178
Reproduction, 43
Negative selection algorithm, 185
Roach infestation optimization, 251
Neo-Darwinian paradigm, 37
Roulette-wheel selection, 44
Niching, 350
Niching mechanism, 350
No free lunch theorem, 22 S
Nondominated sorting, 372, 384 Scatter search, 331
NP-complete, 14 Schema theorem, 121, 122
NSGA-II, 374 Seeker optimization algorithm, 337
Nuclear magnetic resonance, 284 Selection, 43
Nurse rostering problem, 417 Selfish gene theory, 141
Sequence optimization problem, 60
O Seven-spot ladybird optimization, 252
One-point crossover, 46 Sexual selection, 41
Opposition-based learning, 310 Sheep flock heredity algorithm, 141
Order crossover, 60 Shuffled frog leaping, 241
Simplex search, 14
Social spider optimization, 247
P
Sorting, 303
PAES, 378
SPEA2, 377
Pareto method, 18
Pareto optimum, 18 Squeaky wheel optimization, 342
Partial matched crossover, 60 States of matter search, 298
Partial restart, 53 Statistical thermodynamics, 30
Particle, 153 Suppress cell, 180
Pathogen, 180 Survival of the fittest, 5
Path relinking, 333 Swarm intelligence, 6
Penalty function method, 360 Syntax tree, 72
Permutation encoding, 60
Permutation problem, 142 T
Phenotype, 41 Tabu list, 193, 328
Phenotypic plasticity, 6, 41 Teaching–learning-based optimization, 338
Physarum polycephalum algorithm, 222 Tournament selection, 44
Plant growth algorithm, 300 Transposition operator, 50
Plant propagation algorithm, 256 Traveling salesman problem, 415
Point mutation, 48 Two-dimensional GA, 55
Population, 39 Two-point crossover, 46
U Vortex search, 301


Uniform crossover, 47
Uniform mutation, 57
W
V Wasp swarm optimization, 212
Vaccination, 182 Water cycle algorithm, 300
Variable neighborhood search, 10 Wind driven optimization, 302
