BSCM-02 Evolutionary Algorithms


Bioinspired and soft-computing methods in data analysis and optimization

Evolutionary Algorithms
Population-based metaheuristics

Krzysztof Michalak
krzysztof.michalak@ue.wroc.pl
https://krzysztof-michalak.pl/

Drawbacks of classical methods
- Classical (e.g. gradient) optimization methods start from an initial solution x0 and search for better solutions around it
- The direction of the search is determined using the function derivative or gradient
- Iteratively, the process converges to the optimum x*
  [Figure: f(x) over the search space Ω, with the iterates x0, x1, …, x*]

Drawbacks of classical methods
- A function with local suboptima
- The optimizer can get stuck in a suboptimum x'
  [Figure: f(x) over Ω, with x0, x1 converging to x' instead of x*]
- An attempt to remedy this: multiple restarts
  - At least some starts occur close to the optimum and converge to it
  - Disadvantage: no information transfer between runs

Drawbacks of classical methods
- Multiple restarts are very time consuming in some cases
  [Figure: the Rastrigin function, https://en.wikipedia.org/wiki/Rastrigin_function]

The Four Peaks problem (4PP)
- A benchmark problem (used for testing algorithms)
- The search space: binary vectors of length n = 100
- Evaluation f(x) of a solution x ∈ { 0, 1 }^100
  - z(x) – the number of contiguous zeros at the start
  - o(x) – the number of contiguous ones at the end
  - REWARD
    - 100 if z(x) > T and o(x) > T
    - 0 otherwise
  - f(x) = max( z(x), o(x) ) + REWARD
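A minimal sketch of this evaluation in C# (the class and method names FourPeaks, Z, O and Evaluate are illustrative, not taken from the lecture; solutions are assumed to be stored as bool arrays):

using System;

static class FourPeaks
{
    // z(x): the number of contiguous zeros at the start of the vector
    static int Z(bool[] x)
    {
        int count = 0;
        while (count < x.Length && !x[count]) count++;
        return count;
    }

    // o(x): the number of contiguous ones at the end of the vector
    static int O(bool[] x)
    {
        int count = 0;
        while (count < x.Length && x[x.Length - 1 - count]) count++;
        return count;
    }

    // f(x) = max(z(x), o(x)) + REWARD, where REWARD = 100 if z(x) > T and o(x) > T
    public static int Evaluate(bool[] x, int T)
    {
        int z = Z(x), o = O(x);
        int reward = (z > T && o > T) ? 100 : 0;
        return Math.Max(z, o) + reward;
    }
}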

The Four Peaks problem (4PP)
- The 4PP is a deceptive problem
- Two locally optimal solutions:
  x0 = 0 0 0 … 0 0 0   f(x0) = max(z(x0), o(x0)) = 100
  x1 = 1 1 1 … 1 1 1   f(x1) = max(z(x1), o(x1)) = 100
- Two globally optimal solutions for T = 15:
  x*0 = 0 0 0 … 1 1 1 (16 zeros followed by 84 ones)   f(x*0) = max(z(x*0), o(x*0)) + REWARD = 184
  x*1 = 0 0 0 … 1 1 1 (84 zeros followed by 16 ones)   f(x*1) = max(z(x*1), o(x*1)) + REWARD = 184

The Four Peaks problem (4PP)
- The 4PP instance for n = 16 and T = 4
  [Figure]

The Four Peaks problem (4PP)
- All solutions of the 4PP instance for n = 16 and T = 4
  [Figure]
  For this visualization, all 65536 solutions were mapped from { 0, 1 }^16 to ℝ^2 using the LDEE technique [1]. LDEE tries to find positions of points in the target space (ℝ^2) in such a way that the distances are "similar" to those in the source space ({ 0, 1 }^16).
  [1] K. Michalak, "Low-Dimensional Euclidean Embedding for Visualization of Search Spaces in Combinatorial Optimization", IEEE Transactions on Evolutionary Computation, 23(2), pp. 232-246, IEEE, 2019.

The Four Peaks problem (4PP)
- Solutions found by an evolutionary algorithm solving the 4PP instance for n = 100 and T = 15, mapped to ℝ^2 using the LDEE technique (red: REWARDed solution)
  [Figure]
- See the 4pp_ea and 4pp_3d movies!

The Four Peaks problem (4PP)
- The gradual improvement suggests increasing the number of either zeros or ones
- Getting (gradually) from a locally optimal solution to a REWARDed one requires worsening of the solution

  x0   = 000…00000000000000000   f(x0) = 100
  x1   = 000…00000000000000100   f(x1) = 97
  x2   = 000…01000000000000100   f(x2) = 84
  x3   = 000…01000001000000100   f(x3) = 84
  …
  xn-1 = 000…01111111110111111   f(xn-1) = 84
  xn   = 000…01111111111111111   f(xn) = 184   (84 leading zeros)

- At the first steps (x0 → x1 → x2), where f(x) decreases, the selective pressure drives us away from a REWARDed solution!
- On the f(x) = 84 plateau the selective pressure vanishes; if only a few bits have to be fixed, a REWARDed solution could be found by the genetic drift.

The Four Peaks problem (4PP)
- There is a gradual improvement path from less optimized solutions to the REWARDed ones
  x0 = 0100010010110000…0010010100100011   f(x0) = 2
  x1 = 0100010010110000…0010010100100111   f(x1) = 3
  x2 = 0000010010110000…0010010100100111   f(x2) = 5
  x3 = 0000000010110000…0010010100100111   f(x3) = 8
  x4 = 0000000000110000…0010010100100111   f(x4) = 10
  …
  (only the first 16 and the last 16 positions are shown)

The Four Peaks problem (4PP)
- A problem: when the left or right part of the solution is improved, there is no incentive to work on the other end of the vector
  x2 = 0000010010110000…0010010100100111   f(x2) = 5
  x3 = 0000010010110000…0010010100101111   f(x3) = 5
- The more improved one end of the vector, the harder it is to improve a solution by gradually modifying the other end
  x4 = 0000000000110000…0010010100100111   f(x4) = 10
  x5 = 0000000000110000…0010010111111111   f(x5) = 10

The Four Peaks problem (4PP)
- Solution: process a population of solutions and allow transferring information between them
- Some solutions improve the left part of the vector, others the right part
  - If the right part is not optimized, gradually improving the left part gives a solution an advantage
  - If the left part is not optimized, gradually improving the right part gives a solution an advantage

  x0  = 0 1 0 0 0 1 0 0 1 0 1 1 0 0 0 0 … 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1   f(x0) = 2
  xn  = 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 … 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1   f(xn) = 10
  x'n = 0 1 0 0 0 1 0 0 1 0 1 1 0 0 0 0 … 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1   f(x'n) = 11

- Combining these partial solutions produces a vector improved on both ends
  xn ⊗ x'n = 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 … 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1   f(xn ⊗ x'n) = 11

The Four Peaks problem (4PP)
- Conclusions
  - If a population of solutions is processed, each population member can improve a different part of the solution
  - Combining these partial solutions can be beneficial
  - A less optimized but more diverse population can find a REWARDed solution more easily

Population-based metaheuristics
- Design principles
  - Start with many initial solutions
  - Evaluate the goal function for these solutions
  - Use the obtained information from the entire population to guide the search
    - Details depend on a particular method
    - For example, in EAs information from parents is combined to generate offspring
  - As the result return:
    - The best solution
    - The entire population
    - Some other representation (e.g. a probabilistic model in the EDAs)

Population-based metaheuristics
- Advantages
  - Much less likely to get stuck in local optima
  - The algorithm gets a lot of information from many evaluated solutions
  - Suitable for problems which require a set of solutions as a result:
    - Multiobjective (the Pareto front)
    - Multimodal (many equally good solutions)
    - Novelty search (optimization focused on generating a variety of different solutions, e.g. designs)

Evolutionary Algorithms

Bio-inspiration
- Biological evolution
- Key drivers of evolution (both biological and artificial):
  - Recombination
    - BIO: mating, horizontal gene transfer
    - EA: crossover operators
  - Variation
    - BIO: mutation
    - EA: mutation
  - Selective pressure
    - BIO: competition for resources, predator vs. prey
    - EA: selection procedures

Evolution vs. a random search
- Evolution is not a random process (and neither are evolutionary algorithms)
  - Mutation can be assumed to be purely random
  - Selection usually involves an element of chance, but the probability distribution is skewed towards better solutions
  - The whole process is guided by the selective pressure
- Good analogy: a waterfall
  - Droplets fall chaotically
  - The entire flow is directed

Evolution vs. a random search
- The waterfall analogy, continued:
  - The chaotically falling droplets perform an optimization (minimization) of potential energy
  - The optimization landscape is the shape of the surface under the water

A broader view
- Evolution as a (very) general principle
  - Selection causes "better" individuals to dominate in the population
  - Population contains more and more "better" individuals
  - Statistically, the entire population "moves" towards a "better" region in the search space
  - Next generation is selected from an already "improved" population
- Everything else is just "implementational details"
  - Biological
  - Chemical (abiogenesis) – self-replicating molecules
  - Computer-based
  - Evolution of memes (ideas, behaviors, or styles that spread by means of imitation from person to person)

Evolutionary Algorithms (EAs)
- The general flow: Main population → Mating pool → Evolutionary operators → Next population

Terminology
- Phenotype: a solution in the solution space of the solved problem
  - Example: a tour in the TSP
- Genotype (a.k.a. chromosome): a solution encoded using a given representation
  - Example: in an evolutionary algorithm solving the TSP, tours can be represented as permutations
  - Note that multiple genotypes (permutations) encode the same phenotype (tour):
    [1234] [2341] [3412] [4123]
    [1432] [2143] [3214] [4321]
    [Figure: a tour visiting the four cities 1, 2, 3, 4 in a cycle]
  - If a matrix-based representation is used, the genotypes are permutation matrices, but phenotypes are still tours

Terminology
- Individual (a.k.a. specimen)
  - Can be used to denote both a genotype and a phenotype
- Often, genotypes are composed from elements
  - Examples: vectors, matrices, etc.
  - Locus (plural: loci), a.k.a. position, gene: where the element can be placed
  - Allele (a.k.a. value): what is actually stored

Terminology
- Representation
  - Two different meanings:
    - A particular representation (genotype) of a phenotype
    - A method of representing phenotypes in the genotype space
- Encoding and decoding
  - Encoding: the mapping from phenotypes to genotypes
  - Decoding: the mapping from genotypes to phenotypes

Terminology
- Population
  - The unit of evolution (both biological and artificial)
  - In EAs it is a multiset*) of genotypes
  - A parameter: Npop, the population size (the number of genotypes)
  - More advanced population structures:
    - Neighbourhood structure (e.g. in the MOEA/D algorithm)
    - In multipopulation algorithms: subpopulations, a migration scheme
- Archive
  - Contains genotypes which we want to prevent from being lost
  - Example: the best approximation of the Pareto front found so far in multiobjective evolutionary algorithms
  - In some algorithms (e.g. SPEA2) solutions from the archive can mix with the evolving population
*) A multiset is a set where multiple copies of an element are possible

Terminology
- Mating pool
  - A subset of the population
  - Contains solutions which are recombined to produce a new generation
  - Solutions that are included in the mating pool are referred to as parents
  - Selected in such a way that fitter solutions are preferred (e.g. more copies, more probable to be selected)
  - Depending on the recombination operators, the mating pool can be split into pairs or used in some other way
    - Differential evolution uses three solutions to create a single new one.
    - The inver-over operator (dedicated to the TSP) can mutate one parent or recombine it with another one. This is repeated many times, effectively involving numerous solutions in generating a single new one.

A Simple Genetic Algorithm (SGA)
- Pseudocode*) (an illustrative sketch is given after the list of components below)

*) Unless noted otherwise, the material from the following book is used:
A.E. Eiben, J.E. Smith, "Introduction to Evolutionary Computing", 2nd ed., Springer, 2015.
https://link.springer.com/book/10.1007%2F978-3-662-44874-8
It is free to download from the UEW network (http proxy, virtual machines or labs).

A Simple Genetic Algorithm (SGA)
- Evolutionary algorithms are metaheuristics (algorithm design patterns)
- There are many things to decide!
- Problem-oriented components
  - Solution representation. Can, but does not have to be, identical with the elements of the search space.
    - Permutations, permutation matrices, real vectors – can all be used to encode an ordering
  - Solution evaluation. A problem-specific numerical value representing the quality we want to optimize.
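The pseudocode figure is not reproduced in this text. As a rough reminder only, the general EA scheme from the Eiben & Smith book, which the SGA follows, reads approximately:

BEGIN
  INITIALISE population with random candidate solutions;
  EVALUATE each candidate;
  REPEAT UNTIL ( TERMINATION CONDITION is satisfied ) DO
    SELECT parents;
    RECOMBINE pairs of parents;
    MUTATE the resulting offspring;
    EVALUATE new candidates;
    SELECT individuals for the next generation;
  OD
END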

A Simple Genetic Algorithm (SGA)
- Algorithm-oriented components
  - Population initialization. Usually random generation.
  - Operators: crossover, mutation. Have to be compatible with the used solution representation (e.g. recombining permutations is not trivial).
  - Parent and survivor selection mechanisms
    - Roulette wheel
      - fitness-proportionate selection
      - needs a fitness-assignment scheme
    - Binary tournament
      - two randomly selected individuals meet
      - the better solution wins

A Simple Genetic Algorithm (SGA)
- Parameters
  - Population size Npop
  - Termination (stopping) condition
    - Number of generations Ngen
    - Number of solution evaluations (calculations of f(x)) Nfe – a good criterion to consider in the case of expensive evaluations (e.g. simulations)
    - Running time Nsec – may be important if the algorithm itself performs expensive computations (e.g. complex probability models)
  - Operator probabilities (crossover Pcross, mutation Pmut)

A Simple Genetic Algorithm for the Knapsack Problem (KP)
(binary vector representation)

SGA for the Knapsack Problem (KP)
- Given n items, each with a weight wi and a value vi, choose the items to pack into a knapsack with the weight limit W so that the total value is maximized.
- Search space: Ω = { 0, 1 }^n
- Solution representation: binary vectors
- Problem: many binary vectors encode infeasible solutions (breaking the weight constraint)
  [Image: CC BY-SA 2.5, https://commons.wikimedia.org/w/index.php?curid=985491]

SGA for the Knapsack Problem (KP)
- Remedy: use a constraint-aware solution decoding (a C# sketch is given below)
  - Given a binary vector x = [ xi ], i = 1, …, n, e.g.
    [ 0, 1, 0, 1, 1, 1, 0, 1, 0 ]
  - Go from left to right and only include an item if it is selected in x and the weight constraint is not violated
  - Thereby a new vector x' is obtained, possibly with fewer items selected
  - The weight constraint is satisfied: Σ_{i=1..n} wi · x'i ≤ W
  - The solution x is evaluated using the items left in x': f(x) = Σ_{i=1..n} vi · x'i

SGA for the Knapsack Problem (KP)
- Population initialization
  - Generate Npop = 500 binary vectors
  - At each position in each vector set 0 or 1 with P = 0.5
- Stopping criterion
  - No improvement in the last 25 generations
  - Good for detecting stagnation
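A sketch of the constraint-aware decoding and evaluation described above in C# (the names KnapsackDecoder, Decode and Evaluate, and passing the weights, values and limit as plain arrays, are my own assumptions):

using System;

static class KnapsackDecoder
{
    // Constraint-aware decoding: scan left to right and keep an item only if it is
    // selected in x and adding it does not violate the weight limit W.
    public static bool[] Decode(bool[] x, double[] w, double W)
    {
        var xPrime = new bool[x.Length];
        double totalWeight = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] && totalWeight + w[i] <= W)
            {
                xPrime[i] = true;
                totalWeight += w[i];
            }
        }
        return xPrime; // satisfies sum_i wi * x'i <= W by construction
    }

    // f(x): the total value of the items left in x'
    public static double Evaluate(bool[] x, double[] w, double[] v, double W)
    {
        bool[] xPrime = Decode(x, w, W);
        double value = 0.0;
        for (int i = 0; i < x.Length; i++)
            if (xPrime[i]) value += v[i];
        return value;
    }
}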

SGA for the Knapsack Problem (KP)
- Crossover
  - single-point and two-point
    [Figure]
  - n-point
    - A generalization of the two-point crossover
    - The vector is cut at n randomly selected positions
  - uniform
    - Values from each position are independently assigned to each of the two offspring with P = 0.5
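A sketch of the single-point and uniform crossovers for binary vectors in C# (class and method names and the shared Random instance are my own choices; the two-point and n-point variants follow the same pattern with more cut points):

using System;

static class BinaryCrossover
{
    static readonly Random Rng = new Random();

    // Single-point crossover: cut both parents at one random position and swap the tails.
    public static (bool[], bool[]) SinglePoint(bool[] p1, bool[] p2)
    {
        int n = p1.Length;
        int cut = Rng.Next(1, n); // cut point between positions cut-1 and cut
        var o1 = new bool[n];
        var o2 = new bool[n];
        for (int i = 0; i < n; i++)
        {
            o1[i] = i < cut ? p1[i] : p2[i];
            o2[i] = i < cut ? p2[i] : p1[i];
        }
        return (o1, o2);
    }

    // Uniform crossover: each position goes to each offspring independently with P = 0.5.
    public static (bool[], bool[]) Uniform(bool[] p1, bool[] p2)
    {
        int n = p1.Length;
        var o1 = new bool[n];
        var o2 = new bool[n];
        for (int i = 0; i < n; i++)
        {
            bool swap = Rng.NextDouble() < 0.5;
            o1[i] = swap ? p2[i] : p1[i];
            o2[i] = swap ? p1[i] : p2[i];
        }
        return (o1, o2);
    }
}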

SGA for the Knapsack Problem (KP)
- Parent selection
  - Binary tournament
    - Select two individuals randomly from the population with uniform probability
    - The one with the higher value is selected
    - Resolve draws at random
  - For the crossover operators we need two parents, so we perform two separate tournaments
- Note that the tournament is binary because two individuals take part. It has nothing to do with the binary representation used for the Knapsack Problem (it can be used, for example, also for the TSP based on permutations).
- When more than two individuals take part we get an n-ary tournament.
  - For n > 2 the selective pressure is increased – a solution has to be the best out of n to survive.
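A sketch of the binary tournament in C#, assuming maximization and a fitness array aligned with the population (all names are illustrative):

using System;

static class TournamentSelection
{
    static readonly Random Rng = new Random();

    // Binary tournament: pick two individuals uniformly at random,
    // return the index of the better one, resolving draws at random.
    public static int BinaryTournament(double[] fitness)
    {
        int a = Rng.Next(fitness.Length);
        int b = Rng.Next(fitness.Length);
        if (fitness[a] > fitness[b]) return a;
        if (fitness[b] > fitness[a]) return b;
        return Rng.NextDouble() < 0.5 ? a : b; // draw resolved at random
    }

    // For crossover we need two parents, so two separate tournaments are run.
    public static (int, int) SelectParents(double[] fitness)
    {
        return (BinaryTournament(fitness), BinaryTournament(fitness));
    }
}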

SGA for the Knapsack Problem (KP)
- Reproduction
  [Figure: Mating pool → Next population, branching with probabilities Pcross and 1 - Pcross]
  - With the probability Pcross
    - apply the crossover operator
    - evaluate the new solutions
  - With the probability 1 - Pcross just copy the parents (the evaluation does not have to be recalculated)
  - A setting of Pcross = 0.7 balances exploitation (keeping good solutions in the population) with exploration (generating new solutions)
  - Npop new solutions are generated

SGA for the Knapsack Problem (KP)
- Mutation
  - Bit-flip mutation
    - Applied independently to each position in each vector
    - With the probability Pmut the value is flipped from 0 to 1 or vice versa
  - A typical setting is Pmut = 1 / n
    - On average one value is changed in each individual
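A sketch of the bit-flip mutation and of the Pcross decision in C# (the Reproduce method reuses the BinaryCrossover sketch given earlier; the names and the shared Random instance are my own):

using System;

static class BitFlipMutation
{
    static readonly Random Rng = new Random();

    // Bit-flip mutation: each position is flipped independently with probability Pmut.
    // With Pmut = 1/n, on average one value is changed in each individual.
    public static void Mutate(bool[] x, double pMut)
    {
        for (int i = 0; i < x.Length; i++)
            if (Rng.NextDouble() < pMut)
                x[i] = !x[i];
    }

    // Reproduction step: with probability Pcross apply crossover, otherwise copy the parents.
    public static (bool[], bool[]) Reproduce(bool[] p1, bool[] p2, double pCross)
    {
        if (Rng.NextDouble() < pCross)
            return BinaryCrossover.SinglePoint(p1, p2); // new solutions must be re-evaluated
        return ((bool[])p1.Clone(), (bool[])p2.Clone()); // copies keep the parents' evaluations
    }
}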

SGA for the Knapsack Problem (KP)
- Survival selection
  - Generational: the offspring population replaces the parent population
- Additional mechanisms
  - Elitism: the best solution is always copied to the next generation without change
  - External archiving: a copy of the best solution is made after each generation and put aside (but not included in the population)

SGA for the Knapsack Problem (KP)
- Summary of design choices proposed in the Eiben&Smith book
  [Table not reproduced]

Real-coded representation
- Useful for problems described by numerical parameters, for example, engineering problems
- The genotype x is a real vector
  - Unbounded: x ∈ ℝ^d
  - Bounded, e.g. x ∈ [0, 1]^d
- These are continuous search spaces

Real-coded representation
- Cable-Stayed Bridge (CSB) optimization [1]
  [Figure]
- Evolutionary algorithm: SGA
- Fitness function combining structural constraints S(x) and costs C(x)

[1] Correia J., Ferreira F., "Designing Cable-Stayed Bridges with Genetic Algorithms", in: Castillo P. A. et al. (eds.): EvoApplications 2020, LNCS 12104, pp. 228–243, 2020.

Real-coded representation
- Crossover operators typical for arrays:
  - single-point (a.k.a. simple), two-point, n-point crossover
  - uniform (a.k.a. discrete) crossover
- Crossover operators dedicated for real vectors [1]
  - Notation: parent solutions C1 = (c11, …, c1d) and C2 = (c21, …, c2d)

[1] Herrera F., Lozano M., Verdegay J. L., "Tackling Real-Coded Genetic Algorithms: Operators and Tools for Behavioural Analysis", Artificial Intelligence Review, vol. 12, pp. 265–319, 1998.

Real-coded representation
- Flat crossover
  - One offspring H = (h1, …, hd) is generated, where hi is a randomly (uniformly) chosen value from the interval [ci1, ci2]
- Arithmetical crossover
  - A value λ ∈ [0, 1] is randomly (uniformly) drawn
  - Two offspring are generated: H1 = λ·C1 + (1 - λ)·C2 and H2 = λ·C2 + (1 - λ)·C1
  - If λ ∈ [-0.25, 1.25] we call it the extended line crossover

Real-coded representation
- BLX-α crossover
  - One offspring H = (h1, …, hd) is generated, where hi is a randomly (uniformly) chosen value from the interval:
    [cmin – I·α, cmax + I·α]
    where:
    cmin = min(ci1, ci2)
    cmax = max(ci1, ci2)
    I = cmax – cmin
  - BLX-0.0 = flat crossover
  - BLX-0.25 = extended intermediate crossover

Real-coded representation
- BLX-α crossover – the role of the α parameter
  - BLX-0.0: hi ∈ [cmin, cmax]
    If the parents are in the basin of attraction of a suboptimum x', the new solution converges to x'.
    [Figure: f(x) over Ω with x' and x*; the hi range covers only the basin of x']
  - BLX-0.25: hi ∈ [cmin – I·0.25, cmax + I·0.25]
    The new solution can be located outside the basin of attraction of a suboptimum x' and may converge to the true optimum x*.
    [Figure: f(x) over Ω with x' and x*; the hi range extends beyond the basin of x']
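A sketch of the BLX-α crossover in C# (one offspring per call; class and method names are my own):

using System;

static class BlxAlphaCrossover
{
    static readonly Random Rng = new Random();

    // BLX-alpha: for each gene i, draw hi uniformly from
    // [cmin - I*alpha, cmax + I*alpha], where I = cmax - cmin.
    // alpha = 0.0 gives the flat crossover, alpha = 0.25 the extended intermediate crossover.
    public static double[] Offspring(double[] c1, double[] c2, double alpha)
    {
        int d = c1.Length;
        var h = new double[d];
        for (int i = 0; i < d; i++)
        {
            double cMin = Math.Min(c1[i], c2[i]);
            double cMax = Math.Max(c1[i], c2[i]);
            double I = cMax - cMin;
            double lo = cMin - I * alpha;
            double hi = cMax + I * alpha;
            h[i] = lo + Rng.NextDouble() * (hi - lo);
        }
        return h;
    }
}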

Real-coded representation
- Mutation operators – notation
  - C = (c1, …, cd) – a genotype
  - ci ∈ [ai, bi] – a gene to be mutated
  - ci' – the mutated gene
- Random mutation
  - The mutated gene ci' is a number randomly (uniformly) drawn from [ai, bi]

Real-coded representation
- Muehlenbein's mutation
  - The mutated gene is: ci' = ci ± ri · γ
    where:
    ri – the mutation range, normally set to 0.1 · (bi - ai)
    ± – the sign, chosen with a probability of 0.5
    γ = Σ_{k=0..15} αk · 2^(-k), with αk ∈ { 0, 1 } randomly generated with P(αk = 1) = 1/16
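A sketch of Muehlenbein's mutation of a single gene in C#, following the γ = Σ αk·2^(-k) formulation given above (the class and method names are my own):

using System;

static class MuehlenbeinMutation
{
    static readonly Random Rng = new Random();

    // ci' = ci +/- ri * gamma, with ri = 0.1 * (bi - ai),
    // gamma = sum_{k=0..15} alpha_k * 2^-k, alpha_k in {0,1}, P(alpha_k = 1) = 1/16.
    public static double MutateGene(double ci, double ai, double bi)
    {
        double ri = 0.1 * (bi - ai);                 // mutation range
        double gamma = 0.0;
        for (int k = 0; k <= 15; k++)
            if (Rng.NextDouble() < 1.0 / 16.0)       // alpha_k = 1 with probability 1/16
                gamma += Math.Pow(2.0, -k);
        double sign = Rng.NextDouble() < 0.5 ? 1.0 : -1.0; // +/- chosen with P = 0.5
        return ci + sign * ri * gamma;
    }
}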

Permutation-based representation
- Useful for problems in which the ordering of items matters
  - The TSP: the order in which to visit cities.
  - Scheduling problems: the order in which to perform operations.
  - Assignment problems: the permutation π is used to match, e.g., facilities to locations in the Quadratic Assignment Problem (QAP). Facility k is matched to the location π[k].

Permutation-based representation
- Population initialization – random permutations of n elements
- The Fisher-Yates shuffle [1,2]

  // All numbers from 1 to n
  for i from 0 to n−1 do
      a[i] = i + 1
  // Shuffling
  for i from 0 to n−2 do
      j := a random integer such that i ≤ j < n
      exchange a[i] and a[j]

[1] Knuth D. E., "The Art of Computer Programming. Volume 2, Seminumerical Algorithms", Reading, MA: Addison–Wesley, pp. 139–140, 1969.
[2] https://en.wikipedia.org/wiki/Fisher-Yates_shuffle

Permutation-based representation
- The need for dedicated crossover operators for permutations
  - We can store permutations as vectors of integers
  - However, generic crossovers for vectors (arrays) produce invalid permutations
  - Example: a single-point crossover
    Parents:   1 2 3 4 5 6   and   6 5 4 3 2 1
    Offspring: 1 2 3 3 2 1   and   6 5 4 4 5 6   (not valid permutations)

Permutation-based representation
- Order Crossover (OX) [1,2]
- The intention is to transmit information about relative order
  1. Choose two crossover points at random and copy the segment between them from the first parent (P1) into the first offspring
  2. Starting from the second crossover point, copy the remaining unused numbers from the second parent to the first child in the order that they appear in the second parent, wrapping around at the end of the list.
  3. Create the second offspring in an analogous manner, with the parent roles reversed.

[1] Eiben&Smith 2015, pp. 73-74.
[2] Davis L., "Handbook of Genetic Algorithms", Van Nostrand Reinhold, 1991.

Permutation-based representation
- Order Crossover (OX) – example
  [Figure]
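A sketch of the OX steps in C# (first offspring only; the second is obtained by swapping the parent roles; all names are my own):

using System;
using System.Collections.Generic;

static class OrderCrossover
{
    static readonly Random Rng = new Random();

    // Order Crossover (OX), first offspring only.
    public static int[] Offspring(int[] p1, int[] p2)
    {
        int n = p1.Length;
        int a = Rng.Next(n), b = Rng.Next(n);
        int from = Math.Min(a, b), to = Math.Max(a, b); // segment [from, to]

        var child = new int[n];
        var used = new HashSet<int>();

        // 1. Copy the segment between the crossover points from P1.
        for (int i = from; i <= to; i++)
        {
            child[i] = p1[i];
            used.Add(p1[i]);
        }

        // 2. Starting after the second crossover point, copy the remaining values
        //    from P2 in the order they appear there, wrapping around at the end.
        int insertPos = (to + 1) % n;
        for (int k = 0; k < n; k++)
        {
            int value = p2[(to + 1 + k) % n];
            if (used.Contains(value)) continue;
            child[insertPos] = value;
            insertPos = (insertPos + 1) % n;
        }
        return child;
    }
}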

Permutation-based representation
- Cycle Crossover (CX) [1,2]
- The intention is to preserve as much information as possible about the absolute positions in which elements occur
  1. Divide parents into cycles
  2. From the first, third, etc. cycles copy the elements P1 → O1 and P2 → O2
  3. From the second, fourth, etc. cycles copy the elements P1 → O2 and P2 → O1

[1] Eiben&Smith 2015, pp. 73-74.
[2] Oliver I. M., Smith D. J., Holland J., "A study of permutation crossover operators on the travelling salesman problem", in: Grefenstette J. J. (ed.), Proceedings of the 2nd International Conference on Genetic Algorithms and Their Applications, Lawrence Erlbaum, Hillsdale, New Jersey, pp. 224–230, 1987.

Permutation-based representation
- Cycle Crossover (CX) – construction of one cycle
  1. Start with the first unused position and value v1 of P1
  2. Add v1 to the cycle
  3. Look at the value v2 in the same position in P2
  4. Add v2 to the cycle
  5. Go to the position with the value v2 in P1
  6. Repeat steps 3 through 5 until you arrive again at the value v1 in P1. When you do, the cycle is completed.
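A sketch of CX in C# (permutations stored as int arrays with values 1…n or 0…n-1; all names are my own):

using System;

static class CycleCrossover
{
    // Cycle Crossover (CX): identify cycles and copy them to the offspring,
    // alternating the parent they are taken from.
    public static (int[], int[]) Offspring(int[] p1, int[] p2)
    {
        int n = p1.Length;
        var o1 = new int[n];
        var o2 = new int[n];
        var assigned = new bool[n];

        // Position of each value in P1, needed to follow a cycle.
        var posInP1 = new int[n + 1];
        for (int i = 0; i < n; i++) posInP1[p1[i]] = i;

        bool copyStraight = true; // odd cycles: P1 -> O1, P2 -> O2; even cycles: swapped
        for (int start = 0; start < n; start++)
        {
            if (assigned[start]) continue;
            // Follow one cycle starting from the first unused position.
            int pos = start;
            do
            {
                o1[pos] = copyStraight ? p1[pos] : p2[pos];
                o2[pos] = copyStraight ? p2[pos] : p1[pos];
                assigned[pos] = true;
                pos = posInP1[p2[pos]]; // go to the position of this value in P1
            } while (pos != start);
            copyStraight = !copyStraight;
        }
        return (o1, o2);
    }
}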

Permutation-based representation
- Cycle Crossover (CX) – example
  [Figures]
  - The first cycle is (1, 9, 4, 8). Elements from this cycle are assigned: P1 → O1 and P2 → O2.
  - The second cycle is (2, 3, 5, 7). Elements from this cycle are assigned: P1 → O2 and P2 → O1.
  - The third cycle is (6). Elements from this cycle are assigned: P1 → O1 and P2 → O2.

Permutation-based representation
Note various applications!
- Linear Order Crossover (LOX)
  Falkenauer E., Bouffouix S., "A genetic algorithm for job shop", in: Proceedings of the 1991 IEEE International Conference on Robotics and Automation, pp. 824–829, 1991.
- Merging Crossover (MOX)
  Mumford C. L., "New order-based crossovers for the graph coloring problem", in: Runarsson T. P., Beyer H. G., Burke E., Merelo-Guervos J. J., Whitley D. L., Yao X. (eds.), PPSN IX, LNCS, vol. 4193, pp. 880–889, Springer, Heidelberg, 2006.
- Non-Wrapping Order Crossover (NWOX)
  Cicirello V. A., "Non-wrapping order crossover: an order preserving crossover operator that respects absolute position", in: Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pp. 1125–1132, ACM, New York, NY, USA, 2006.
- Order Based Crossover (OBX), Position Based Crossover (PBX)
  Syswerda G., "Schedule optimization using genetic algorithms", in: Davis L. (ed.), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.

Permutation-based representation
Note various applications!
- Partially Mapped Crossover (PMX)
  Goldberg D. E., Lingle Jr. R., "Alleles, loci, and the traveling salesman problem", in: Grefenstette J. J. (ed.), Proceedings of the First International Conference on Genetic Algorithms and Their Applications, pp. 154–159, Lawrence Erlbaum Associates Publishers, Hillsdale, 1985.
- Precedence Preservative Crossover (PPX)
  Bierwirth C., Mattfeld D. C., Kopfer H., "On permutation representations for scheduling problems", in: Proceedings of the 4th International Conference on Parallel Problem Solving from Nature, pp. 310–318, Springer, 1996.
  Blanton J. L. Jr., Wainwright R. L., "Multiple vehicle routing with time and capacity constraints using genetic algorithms", in: Proceedings of the 5th International Conference on Genetic Algorithms, pp. 452–459, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1993.
- Uniform Partially Mapped Crossover (UPMX)
  Cicirello V. A., Smith S. F., "Modeling GA performance for control parameter optimization", in: Whitley L. (ed.), GECCO-2000: Proceedings of the Genetic and Evolutionary Computation Conference, Wiley, Las Vegas, USA, 2000.

Permutation-based representation
- Swap Mutation
  - Two positions in the chromosome are selected at random and their values are swapped
- Insert Mutation
  - Two values are selected at random and the second is moved next to the first, shifting the others to make room
- Scramble Mutation
  - The entire chromosome, or some randomly chosen subset of it, is randomly shuffled
- Inversion Mutation
  - Two positions in the chromosome are selected and the elements between them are placed in the reversed order

Permutation-based representation
- Illustration of the mutation operators for the permutation-based representation
  [Figure]
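Sketches of these four mutation operators in C# (operating in place on int arrays; the names and the shared Random instance are my own choices):

using System;

static class PermutationMutation
{
    static readonly Random Rng = new Random();

    // Swap mutation: two positions selected at random, their values swapped.
    public static void Swap(int[] x)
    {
        int i = Rng.Next(x.Length), j = Rng.Next(x.Length);
        (x[i], x[j]) = (x[j], x[i]);
    }

    // Insert mutation: the second value is moved to follow the first,
    // shifting the elements in between to make room.
    public static void Insert(int[] x)
    {
        int i = Rng.Next(x.Length), j = Rng.Next(x.Length);
        if (i == j) return;
        int value = x[j];
        if (j > i)
        {
            for (int k = j; k > i + 1; k--) x[k] = x[k - 1];
            x[i + 1] = value;
        }
        else
        {
            for (int k = j; k < i; k++) x[k] = x[k + 1];
            x[i] = value;
        }
    }

    // Scramble mutation: a randomly chosen segment is randomly shuffled (Fisher-Yates).
    public static void Scramble(int[] x)
    {
        int i = Rng.Next(x.Length), j = Rng.Next(x.Length);
        int lo = Math.Min(i, j), hi = Math.Max(i, j);
        for (int k = lo; k < hi; k++)
        {
            int r = k + Rng.Next(hi - k + 1);
            (x[k], x[r]) = (x[r], x[k]);
        }
    }

    // Inversion mutation: the elements between two random positions are reversed.
    public static void Inversion(int[] x)
    {
        int i = Rng.Next(x.Length), j = Rng.Next(x.Length);
        int lo = Math.Min(i, j), hi = Math.Max(i, j);
        while (lo < hi) { (x[lo], x[hi]) = (x[hi], x[lo]); lo++; hi--; }
    }
}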

Dedicated operators for the TSP
- The Inver-Over Operator [1]
  - Random inversion rate parameter p
  - Combines two operations
    - With probability p: inversion mutation
    - With probability 1 - p: transfer of information from another solution using the "guided inversion" operation
  - Typical setting: p = 0.02

[1] Tao G., Michalewicz Z., "Inver-over operator for the TSP", in: Eiben A. E., Back T., Schoenauer M., Schwefel H. P. (eds.), Parallel Problem Solving from Nature PPSN V, Lecture Notes in Computer Science, vol. 1498, pp. 803–812, Springer, Berlin Heidelberg, 1998.

Dedicated operators for the TSP
- The Inver-Over Operator – example
  - Assume that the current individual is:
    S = [ 2, 3, 9, 4, 1, 5, 8, 6, 7 ]
  - A random city is selected from S, let's say c = 3
  - A random number x is drawn from [0, 1]
  - If x ≤ p then inversion mutation is performed
    - Another random city is selected from S, let's say c' = 8
    - The solution S is updated – the segment (c, c'] is reversed:
      S → [ 2, 3, 8, 5, 1, 4, 9, 6, 7 ]

Dedicated operators for the TSP
- The Inver-Over Operator – example (continued)
  - If x > p then another solution is randomly selected:
    S' = [ 1, 6, 4, 3, 5, 7, 9, 2, 8 ]
  - The city c' is selected as the next one after c in S'
  - Information is transferred from S' to S: the next city after c becomes c' (taken from S')
  - The solution S is updated – the segment (c, c'] is reversed:
    S  = [ 2, 3, 9, 4, 1, 5, 8, 6, 7 ]   (c = 3, c' = 5)
    S → [ 2, 3, 5, 1, 4, 9, 8, 6, 7 ]

Dedicated operators for the TSP
- The 2-opt operator
  - Randomly selects two non-consecutive edges <c1, c2> and <c3, c4>
  - Replaces them with the two other edges which do not cut the tour in two: <c1, c3> and <c2, c4>
  - This operator is intended to untangle loops in the tour
  - Used as a local search operator
  [Figure source: http://www.devx.com/dotnet/Article/33574/0/page/3]

Dedicated operators for the TSP
- The 2-opt operator – C# implementation
  [Code listing not reproduced]
  - Inputs:
    perm – the solution we start from
    costs – the cost matrix
  - Output:
    perm – the solution improved using 2-opt
  - Two nested loops enumerate all pairs of edges i1 → i2 and i3 → i4 that do not have any cities in common.
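The original C# listing is not reproduced in this text. The following is a sketch consistent with the description above (inputs perm and costs, nested loops over edge pairs, delta evaluation); it assumes a symmetric cost matrix and is not the lecture's actual code:

using System;

static class TwoOpt
{
    // 2-opt local search sketch.
    // perm  - the solution we start from (modified in place)
    // costs - the cost matrix (assumed symmetric)
    public static void Improve(int[] perm, double[,] costs)
    {
        int n = perm.Length;
        bool improved = true;
        while (improved)
        {
            improved = false;
            // These two nested loops enumerate all pairs of edges
            // i1 -> i2 and i3 -> i4 that do not have any cities in common.
            for (int i1 = 0; i1 < n - 1; i1++)
            {
                for (int i3 = i1 + 2; i3 < n; i3++)
                {
                    int i2 = i1 + 1;
                    int i4 = (i3 + 1) % n;
                    if (i4 == i1) continue; // wrap-around: the edges would share a city

                    int c1 = perm[i1], c2 = perm[i2], c3 = perm[i3], c4 = perm[i4];

                    // Delta evaluation: only the cost change of the two replaced edges.
                    double delta = costs[c1, c3] + costs[c2, c4]
                                 - costs[c1, c2] - costs[c3, c4];
                    if (delta < 0)
                    {
                        // Replace <c1,c2> and <c3,c4> with <c1,c3> and <c2,c4>
                        // by reversing the segment between i2 and i3.
                        for (int a = i2, b = i3; a < b; a++, b--)
                            (perm[a], perm[b]) = (perm[b], perm[a]);
                        improved = true;
                    }
                }
            }
        }
    }
}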

Dedicated operators for the TSP
- The 2-opt operator – C# implementation (continued)
  - Knowing the cost matrix we can:
    - Determine which changes are good
    - Perform a delta evaluation – we only calculate the cost change

Dedicated operators for the TSP
- The 2-opt operator – key features
  - Very fast, because only delta evaluation is needed
  - An informed operator
    - Uses information about the solved problem
    - The cost matrix is provided
  - Previously discussed operators (e.g. OX, CX) are non-informed
    - When the crossover is applied it does not "know" if the change is beneficial
    - The operators generate new solutions
    - The selective pressure promotes good solutions
    - The approach is known as "black-box optimization" (the details of the solved problem are not visible)
      - Advantages: the algorithm is independent of the solved problem; we do not have to know the problem very well
      - Disadvantage: the search usually takes longer

Fitness-proportionate selection

Fitness
- A single number that represents how good a solution is
- Should be higher for better individuals
- For maximization problems a possible choice is fitness(x) = f(x)
- For minimization problems the fitness should be higher for lower f(x)
- Fitness can summarize multiple objectives as well as other aspects, such as crowding or constraint violation

Fitness assignment
- A scheme which determines how the fitness depends on the evaluation f(x)
- If values of f(x) are large and differences are small, there can be little selective pressure
- Use normalized evaluations as fitness:
  - Maximization: fitness(x) = (f(x) - fmin) / (fmax - fmin)
  - Minimization: fitness(x) = (fmax - f(x)) / (fmax - fmin)
  - fmin, fmax – calculated in the whole population

Fitness assignment
- Rank-based fitness
  - Order solutions in the population w.r.t. f(x)
  - Use the (normalized) position in the ranking as fitness
- For constrained optimization problems fitness can be reduced by a penalty
  - For feasible solutions: no penalty
  - Penalty increases when constraints are violated
  - Infeasible solutions can contain valuable information
    - They should neither be allowed to dominate the population nor to go extinct entirely

Fitness sharing
- A technique that maintains diversity in the population
- Based on the distance between individuals
- Good solutions in densely populated regions are given a lower fitness value than comparably good solutions in sparsely populated regions
- Not a selection technique per se, but can be used with fitness-based selection techniques
- Example
  - Multiply fitness by the distance to the closest neighbour
  - Distance in the: genotype space, phenotype space, objective space (for multiobjective optimization problems)

Roulette-wheel selection
- A solution x can be selected with the probability proportional to fitness(x)
- Example (maximization)
  - Evaluations:
    38, 27, 36, 6, 18, 26, 56, 41
  - Normalized fitness values fi:
    0.64, 0.42, 0.60, 0.00, 0.24, 0.40, 1.00, 0.70
  - Sum of fitness values:
    4.00
  - Probabilities pi:
    0.160, 0.105, 0.150, 0.000, 0.060, 0.100, 0.250, 0.175

Roulette-wheel selection
- Given the fitness values fi, how do we select a solution?
  - Calculate the sum of fitness values f' = Σ_{i=1..n} fi
  - Generate a random number θ ∈ [0, f')
  - Then check where θ points to in the sequence of the fi:

    f'' = 0
    for i = 1, …, n
        f'' = f'' + fi
        if θ < f''
            return the i-th solution

Roulette-wheel selection
- Example (continued)
  - Normalized fitness values fi:
    0.64, 0.42, 0.60, 0.00, 0.24, 0.40, 1.00, 0.70
  - Sum of fitness values:
    4.00
  - Random number: θ = 2.15
  - The cumulative sums are 0.64, 1.06, 1.66, 1.66, 1.90, 2.30, …; θ = 2.15 first falls below the cumulative sum 2.30, so the 6th solution (fi = 0.40) is selected.

Stochastic remainder selection
- Given parameters
  - The population size n
  - The fitness values fi, for i = 1, …, n
- The expected number ei of copies that statistically should be carried over to the next generation is calculated for i = 1, …, n:

  ei = n · fi / Σ_{j=1..n} fj

Stochastic remainder selection
- The truncated integer number ⌊ei⌋ of copies of the i-th solution is propagated to the next generation (⌊ ⌋ denotes the rounding down (floor) operation)
- The fractional parts of each ei are used to allocate proportionally-sized slots on a roulette wheel
- The roulette wheel is randomly spun and the chosen individuals are copied to the next generation until the next generation contains n individuals
- This way at least ⌊ei⌋ copies are guaranteed to survive
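A sketch of stochastic remainder selection in C# (returns the indices of the selected individuals; the names are my own, and a small fallback guards against floating-point rounding in the wheel spin):

using System;
using System.Collections.Generic;

static class StochasticRemainderSelection
{
    static readonly Random Rng = new Random();

    // Returns the indices of the n individuals selected for the next generation.
    public static List<int> Select(double[] fitness)
    {
        int n = fitness.Length;
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += fitness[i];

        var selected = new List<int>(n);
        var fractions = new double[n];
        for (int i = 0; i < n; i++)
        {
            double ei = n * fitness[i] / sum;       // expected number of copies
            int guaranteed = (int)Math.Floor(ei);   // floor(ei) copies survive for sure
            for (int c = 0; c < guaranteed; c++) selected.Add(i);
            fractions[i] = ei - guaranteed;         // fractional part goes to the roulette wheel
        }

        // Fill the remaining slots by spinning a roulette wheel over the fractional parts.
        double fracSum = 0.0;
        for (int i = 0; i < n; i++) fracSum += fractions[i];
        while (selected.Count < n)
        {
            double theta = Rng.NextDouble() * fracSum;
            double acc = 0.0;
            int chosen = n - 1;                     // fallback for rounding issues
            for (int i = 0; i < n; i++)
            {
                acc += fractions[i];
                if (theta < acc) { chosen = i; break; }
            }
            selected.Add(chosen);
        }
        return selected;
    }
}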

Importance of various EA elements

The role of EA elements
- Crossover
  - Spreads information in the population
  - Higher crossover rate
    - More new solutions generated for the next generation
    - Good solutions from the previous generation less likely to survive
    - Increased exploration (tendency to look for new solutions)
    - High crossover rates may require preventing good solutions from being lost
      - Elitism (the best solution always copied to the next generation)
      - Truncation selection (Npop best solutions selected from a union of parents and offspring)
  - Typical settings: Pcross ≥ 0.5, often even Pcross = 1.0

The role of EA elements
- Mutation
  - Increases the diversity of the population
  - Higher mutation rate
    - Increases diversity
    - Allows finding solutions that differ from existing ones
    - Increased exploration (tendency to look for new solutions)
    - Lower risk of getting stuck in local suboptima
  - Typical settings
    - Pmut ≈ 0.01
    - For vector-based representations: Pmut = 1/n, where n is the genotype length
      - This causes the mutation to change, on average, one element in each genotype per generation

The role of EA elements
- Selection
  - Key driver of evolution (both biological and artificial)
  - Causes better solutions to have a higher chance of survival
  - Higher selective pressure
    - Better solutions found faster
    - Faster convergence (diversity decreases more rapidly)
    - Higher risk of getting stuck in local suboptima

The role of EA elements
- Selection
  - Increasing selective pressure
    - More solutions participating in a tournament
    - Nonlinear fitness assignment (e.g. f(x)^α with α > 1)
    - Truncation selection (the best Npop solutions selected from a union of parents and offspring)
  - Decreasing selective pressure
    - Allow the worse solution to win the tournament with some small probability
    - Nonlinear fitness assignment (e.g. f(x)^α with α < 1)

A sample problem (a TSP instance)
- A PCB drilling problem with 1002 locations
- Search space size |Ω| = 1002! ≈ 4.04 · 10^2573
- The optimal solution has f(x) = 259 045
  [Figure]

EA description
- Initial population
  - 200 specimens
  - Each specimen is a randomly generated permutation
- Crossover
  - CX, LOX, MOX, NWOX, OBX, OX, PBX, PMX, PPX, UPMX
  - In each generation one operator is selected randomly
- Mutation
  - Displacement, Insertion, Inversion, Scramble, Transpose
  - In each generation one operator is selected randomly

EA description
- Note that these operators are not TSP-specific and can be used for many permutation-based problems

EA variants
- "Random"
  - In each generation a new population is randomly generated
  - No operators, no selection
  - Random search, not an evolutionary algorithm
- "Random selection"
  - Crossover and mutation
  - Specimens for the new generation are randomly selected
  - Full EA, but with random selection
  - No selective pressure

EA variants
- "Elitism only"
  - The same as the "random selection" variant
  - The best specimen is preserved and copied to the next generation
  - Full EA, but with all specimens randomly selected, except the best one
- "No crossover" and "no mutation"
  - Only one of the operators is used
  - Specimens for the new generation are selected using the binary tournament
    - This creates a selective pressure
  - EA with one of the operators missing

EA variants
- "All" – full EA, no missing parts
  - Both operators are used
    - Crossover: allows information transfer
    - Mutation: increases diversity
  - Specimens for the new generation are selected using the binary tournament
    - This creates a selective pressure

Plots
- Generation 0 – the best solution in the initial population: f(x) = 6 216 147
- After each generation, for each of the variants, the best solution found so far is plotted

Plots
- The best solution found so far, f(x), for each variant (plots not reproduced):

  Generation | Random    | Random selection | Elitism only | No crossover | No mutation | All
  10 000     | 5 956 257 | 5 968 342        | 4 277 255    | 863 838      | 3 868 928   | 1 056 090
  100 000    | 5 956 257 | 5 954 514        | 2 157 273    | 305 276      | 3 868 928   | 320 296
  500 000    | 5 935 172 | 5 931 241        | 1 079 215    | 282 174      | 3 868 928   | 280 468
  1 000 000  | 5 935 172 | 5 931 241        | 801 917      | 281 355      | 3 868 928   | 279 238

Plots
- Generation 1 000 000 (annotated)
  - "Random selection" and "All" use the same crossover and mutation operators and only differ w.r.t. selection ("All" uses the binary tournament selection)
  - Random: 5 935 172, Random selection: 5 931 241, Elitism only: 801 917, No crossover: 281 355, No mutation: 3 868 928, All: 279 238

Results summary
- Random search performs worst
- The selection is very important!
  - Random selection is not much better than random search, even though it uses both crossover and mutation
  - Even "elitism only" performs much better
- Clearly, evolution is not random

Results summary
- Without mutation the algorithm improves the solutions, but then stagnates
- For this problem mutation is more important than crossover
- The best solution found is 7.8% worse than optimal
- Note that no TSP-specific elements were used

TSP-specific algorithm improvements
- TSP-specific elements that can improve the algorithm
  - The inver-over operator
  - A local search based on the 2-opt operator
    - Untangles "loops"
    - Is known to greatly improve the results
  - 3-opt (and possibly also k-opt with k > 3)
  - Lin-Kernighan (LK) algorithm
    - A variable k-opt algorithm
    - Decides which k is the most suitable at each iteration step
    - Improvements by Helsgaun – the LKH algorithm

TSP-specific algorithm improvements
- A comparison of the EA variants presented before with the EA using the 2-opt operator as the local search
- The "All" variant with the 2-opt local search applied to the best specimen ("LS-best") or to all specimens ("LS-all") after each generation
- Comparison after 10 000 generations
  [Plot not reproduced]

