
Chapter 7
Derivative-Free Optimization

Outline

Genetic algorithms (GA)


Simulated Annealing (SA)
Downhill simplex search
Random search
The Big Picture

Figure: soft computing spans two spaces.
• Model space: adaptive networks, neural networks, fuzzy inference systems
• Approach space: derivative-free optimization and derivative-based optimization
Genetic Algorithms

Motivation
• Look at what evolution has brought us:
- Vision
- Hearing
- Smell
- Taste
- Touch
- Learning and reasoning
• Can we emulate the evolutionary process with today's fast computers?
Genetic Algorithms

Outline:
 Basics of genetic algorithms
 Further evolution of GA
 Improved selection schemes
 Advanced operators
 Hybrid genetic algorithms
 Applications of genetic algorithms
Genetic Algorithm

A GA assumes:
 a way or method of encoding solutions of a problem into the form of chromosomes, and
 an evaluation function that returns a measurement of the cost value of any chromosome in the context of the problem.
GA vs. conventional optimization

 Searches peaks (extrema) in parallel and exchanges information between peaks, thus lessening the possibility of ending at a local minimum and missing the global minimum.
 Works with a coding of the parameters.
 Needs only the objective function to guide the search; there is no requirement for derivatives.
 The transition rules are probabilistic.
Genetic Algorithms

Terminology:
• Fitness function
• Population
• Encoding schemes
• Selection
• Crossover
• Mutation
• Elitism
Three operators: reproduction, crossover, mutation

Chromosome (each 4-bit group is a gene):
(11, 6, 9) → 1011 0110 1001

Crossover (crossover point after bit 4):
10011110 → 10010010
10110010 → 10111110

Mutation (one mutation bit flipped):
10011110 → 10011010
Reproduction

 Reproduction is a process in which individual strings are copied according to their fitness values.
 A fitter individual i is assigned a higher fitness f(i), denoting a good fit.
 The fitness function can be any nonlinear, non-differentiable, discontinuous function.
Example

Consider a population of six chromosomes (strings) with the fitness values shown in the table. The corresponding weighted roulette wheel is spun by generating numbers randomly from the interval [0, 50].

No.  String (Chromosome)  Fitness  % of total  Running total
1    01110                8        16          8
2    11000                15       30          23
3    00100                2        4           25
4    10010                5        10          30
5    01100                12       24          42
6    00011                8        16          50

Random number      26  2  49  15  40  36
Chromosome chosen  4   1  6   2   5   5
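The roulette-wheel reproduction step above can be sketched in a few lines of Python; the function name is ours, not from the slides:

```python
import bisect

def roulette_select(fitness, rand_vals):
    """Weighted roulette-wheel selection: each random number drawn
    from [0, sum(fitness)) picks the first chromosome (1-based index)
    whose running fitness total reaches it."""
    running = []
    total = 0
    for f in fitness:
        total += f
        running.append(total)
    return [bisect.bisect_left(running, r) + 1 for r in rand_vals]

fitness = [8, 15, 2, 5, 12, 8]                 # table values, sum = 50
print(roulette_select(fitness, [26, 2, 49, 15, 40, 36]))
# → [4, 1, 6, 2, 5, 5], matching the table
```

With the random numbers from the example, the selected chromosomes reproduce the table's last row.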
Crossover

An offspring inherits genes from both parents. The main operator working on the parents is crossover, which happens for a selected pair with a crossover probability Pc. For example, with the crossover site after bit 4:

Parent 1: 0101|110 → Child 1: 0101|001
Parent 2: 1111|001 → Child 2: 1111|110
Mutation

Mutation is applied with a low probability Pm. It inverts a randomly chosen bit of a string.
A GA requires the user to specify the following parameters:

n = population size,
pc = crossover probability,
pm = mutation probability,
G = generation gap:
  G = 1: non-overlapping populations
  0 < G < 1: overlapping populations
Genetic Algorithms

Figure: one GA generation. Elitism copies the best strings (e.g., 10010110, 01100010) unchanged from the current generation to the next, while selection, crossover, and mutation produce the remaining strings of the next generation.
Step 1: Establish a base population of chromosomes.

Step 2: Determine the fitness value of each chromosome.

Step 3: Duplicate the chromosomes according to their fitness values and create new chromosomes by mating current chromosomes (e.g., crossover, mutation).

Step 4: Delete undesirable members of the population.

Step 5: Insert the new chromosomes into the population to form a new population pool.
Example
Consider the problem of maximizing the function f(x) = x², where x varies over the 5-bit range 0 to 31. To use a GA, we code the variable x simply as a binary unsigned integer; e.g., the string "11000" represents the integer x = 24. The fitness function is f(x) itself.

TABLE 14.2 GA in Example: Reproduction Process

No.  Initial Population  x   f(x) = x²  pselect_i = fi/Σf  Count
1    01001               9   81         0.08               1
2    11000               24  576        0.55               2
3    00100               4   16         0.02               0
4    10011               19  361        0.35               1
                             Sum: 1034
                             Average: 259
Example (cont.)

(b) Crossover Process

Mating Pool after Reproduction  Mate  Site  New Population  x
0100|1                          2     4     01000           8
1100|0                          1     4     11001           25
11|000                          4     2     11011           27
10|011                          2     2     10000           16
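Putting reproduction, crossover, and mutation together, a toy GA for this x² example might look like the following sketch (all names and parameter values are our illustrative choices):

```python
import random

def ga_x_squared(pop_size=4, n_bits=5, pc=0.9, pm=0.02,
                 generations=30, seed=0):
    """Tiny GA maximizing f(x) = x^2 over 5-bit unsigned integers,
    following the encode / reproduce / crossover / mutate cycle of
    the example."""
    rng = random.Random(seed)
    fit = lambda s: int(s, 2) ** 2
    pop = [''.join(rng.choice('01') for _ in range(n_bits))
           for _ in range(pop_size)]
    for _ in range(generations):
        total = sum(fit(s) for s in pop) or 1
        def pick():
            # roulette-wheel selection proportional to fitness
            r = rng.uniform(0, total)
            acc = 0
            for s in pop:
                acc += fit(s)
                if acc >= r:
                    return s
            return pop[-1]
        nxt = []
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            if rng.random() < pc:          # one-point crossover
                site = rng.randrange(1, n_bits)
                a, b = a[:site] + b[site:], b[:site] + a[site:]
            for s in (a, b):               # bitwise mutation
                nxt.append(''.join('10'[int(c)] if rng.random() < pm
                                   else c for c in s))
        pop = nxt[:pop_size]
    return max(pop, key=fit)

best = ga_x_squared()
print(best, int(best, 2))
```

With a small population the GA typically converges toward the all-ones string "11111" (x = 31), the maximum of x² on the coded range.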
Schemata
Schema Theorem:
Short, low-order, above-average schemata receive
exponentially increasing trials in subsequent
generations.

The Building Block Hypothesis:


Short, low-order, highly fit schemata (building blocks)
are sampled, recombined, and re-sampled to form
strings of potentially higher fitness. In other words,
building blocks are combined to form better strings.
Schemata
 Schema: a similarity template describing a subset of strings with similarities at certain string positions.
 Order of a schema o(H): the number of fixed positions.
   ex: o(*111*) = 3, o(*1***) = 1
 Defining length of a schema δ(H): the distance between the first and the last specific (fixed) string position.
   ex: δ(011*1**) = 5 − 1 = 4, δ(0******) = 1 − 1 = 0
Mapping objective function values to fitness:

Minimize a cost function g(x):

f(x) = Cmax − g(x)   when g(x) < Cmax
f(x) = 0             otherwise

Maximize a profit or utility function u(x):

f(x) = u(x) + Cmin   when u(x) + Cmin > 0
f(x) = 0             otherwise
Fitness scaling

Diversity
Premature convergence
Linear scaling.
Linear Fitness Scaling

Denote the raw fitness as f and the scaled fitness as f'. Linear scaling defines the relationship between f' and f as:

f' = af + b,

where the coefficients a and b are chosen such that

f'_avg = f_avg   and   f'_max = Cmult · f_avg,

where Cmult is the expected number of copies desired for the best population member. For typically small populations (n = 50 to 100), Cmult = 1.2 to 2 has been used.
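Solving the two conditions for a and b gives a direct formula; the sketch below (names and sample numbers ours) checks that the scaled average stays put while the scaled maximum becomes Cmult times the average:

```python
def linear_scaling(f_avg, f_max, c_mult=2.0):
    """Coefficients a, b for f' = a*f + b such that
    f'_avg = f_avg and f'_max = c_mult * f_avg."""
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return a, b

a, b = linear_scaling(f_avg=10.0, f_max=30.0, c_mult=2.0)
print(a * 10.0 + b, a * 30.0 + b)   # 10.0 20.0
```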
Coding
Coding maps the parameters of an optimization problem onto a finite-length string.

Two fundamental guidelines:
 Meaningful building blocks: The user should select a coding such that short, low-order schemata are relevant to the underlying problem and relatively unrelated to schemata over other fixed positions.
 Minimal alphabets: The user should select the smallest alphabet that permits a natural expression of the problem: the binary alphabet.
Discretization
In practice, one successfully used method of coding multi-parameter optimization problems involving real parameters is concatenated, multi-parameter, mapped, fixed-point coding.

Figure: the search for a continuous force profile f(t) is reduced to a search for the six parameters f0*, …, f5* at times t0, …, t5, joined by linear segments (axes: force F vs. time t).
Optimization problems with constraints
To incorporate constraints into a GA search, we can use the penalty method, which degrades the fitness ranking in relation to the degree of constraint violation:

Minimize g(x)
subject to hi(x) ≥ 0, i = 1, 2, …, n

becomes

Minimize g(x) + r Σi=1..n Φ[hi(x)],

where Φ(·) is a proper penalty function and r is the penalty coefficient. For example, Φ[hi(x)] = hi(x)² for all violated constraints i.
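The penalty method can be sketched as a wrapper that adds squared violations to the cost; the function names and the sample constraint are ours:

```python
def penalized(g, constraints, r):
    """Penalty-method objective: g(x) + r * sum of squared violations.
    Each constraint h is assumed to require h(x) >= 0; only violated
    constraints (h(x) < 0) contribute to the penalty."""
    def obj(x):
        penalty = sum(min(0.0, h(x)) ** 2 for h in constraints)
        return g(x) + r * penalty
    return obj

# minimize g(x) = x^2 subject to x - 1 >= 0
f = penalized(lambda x: x * x, [lambda x: x - 1.0], r=100.0)
print(f(2.0), f(0.0))   # 4.0 100.0 (feasible vs. penalized point)
```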
FURTHER EVOLUTION
There are various basic techniques:

 Improved Selection Schemes
 Micro-operators
 Reordering operators
 Macro-operators
 Hybrid Genetic Algorithm
Improved Selection Schemes

Roulette-wheel selection problem: the best member of the population may fail to produce offspring in the next generation, causing a so-called stochastic error. Remedies:

 Elitism.
 Deterministic Sampling.
 Remainder Stochastic Sampling with Replacement.
 Remainder Stochastic Sampling Without Replacement.
 Ranking Procedure.
Elitism
The elitist strategy copies the best member of each generation into the succeeding generation. This strategy may increase the speed of domination by a super individual and thus improves local search at the expense of a global perspective, but on balance it appears to improve GA performance.
Deterministic Sampling

The probabilities of selection are calculated as usual, pselect_i = fi/Σf. Then the expected number of offspring ei is calculated by ei = n·pselect_i. Each string is allocated offspring according to the integer part of its ei value. The remaining strings needed to fill out the population are drawn from the top of the sorted list.
Remainder Stochastic Sampling with Replacement

 Mostly identical to the deterministic sampling scheme.
 The fractional parts of the expected number values are used to calculate weights in a roulette-wheel selection process that fills the remaining population slots.
Remainder Stochastic Sampling Without Replacement – Most popular in applications

 Identical to the deterministic scheme, except that the fractional parts of the expected number values are treated as probabilities: each string receives an additional offspring with probability equal to the fractional part of its expected count, until the number of offspring equals the population size n.
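The scheme can be sketched as follows; the function name is ours, and the Bernoulli sweep over the fractional parts is one plausible reading of "without replacement" (each fraction is used at most once):

```python
import math, random

def remainder_stochastic_sampling(fitness, n, rng=random):
    """Remainder stochastic sampling without replacement (sketch).
    Integer parts of expected counts e_i = n * f_i / sum(f) are
    assigned deterministically; fractional parts are treated as
    Bernoulli probabilities for the remaining slots."""
    total = sum(fitness)
    expected = [n * f / total for f in fitness]
    counts = [math.floor(e) for e in expected]
    fractions = [e - c for e, c in zip(expected, counts)]
    i = 0
    while sum(counts) < n:          # fill remaining slots stochastically
        if rng.random() < fractions[i]:
            counts[i] += 1
            fractions[i] = 0.0      # each fractional part used once
        i = (i + 1) % len(fitness)
    return counts

# fitness values from the x^2 example, population size 4
print(remainder_stochastic_sampling([81, 576, 16, 361], 4,
                                    random.Random(1)))
```

The integer parts alone account for 3 of the 4 slots here; exactly one slot is filled stochastically from the fractional parts.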
Ranking Procedure

The population is sorted according to the objective function (fitness) value. Individuals are then assigned an offspring count that is solely a function of their rank. In one such function, proposed by Baker, the expected number of offspring of individual i is computed according to

ei = ηmax − 2(ηmax − 1)(rank(i) − 1)/(n − 1),

where ηmax is a user-defined value, 1 ≤ ηmax ≤ 2, and n is the population size. The range of ei is then [2 − ηmax, ηmax].
Ranking Procedure

Figure: Baker's ranking procedure. The expected offspring count decreases linearly from a maximum at rank 1 to a minimum at rank n.
Advanced Operators

Two kinds:
• Micro-operators: operating at a chromosomal level, e.g., multiple-point crossover and reordering operators.
• Macro-operators: acting at a population level.

We shall first examine some micro-operators.
Micro-operators – multiple-point crossover

Chromosome 1: 10110001100
Chromosome 2: 00101101001

This problem can be solved by, for example, two-point crossover as follows:

Parent 1: 1011 | 0001 | 100
Parent 2: 0010 | 1101 | 001

Child 1: 1011 1101 100
Child 2: 0010 0001 001
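Two-point crossover swaps the middle segment between the two sites; this Python sketch (function name ours) reproduces the children shown above:

```python
def two_point_crossover(p1, p2, s1, s2):
    """Two-point crossover: swap the segment [s1:s2) between parents."""
    c1 = p1[:s1] + p2[s1:s2] + p1[s2:]
    c2 = p2[:s1] + p1[s1:s2] + p2[s2:]
    return c1, c2

c1, c2 = two_point_crossover('10110001100', '00101101001', 4, 8)
print(c1, c2)   # 10111101100 00100001001
```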
Reordering operators

Reordering operators search for better codes as well as better allele sets. These operators are appropriate for problems where fitness values depend on string arrangement, such as when fitness f depends on some combination of allele values v and ordering o: f = f(v, o). E.g., the inversion operator.
Inversion operator
For example, consider the string

12345678
01111010

where 1, 2, …, 8 represent positions or gene names. Two inversion sites are chosen at random:

12|3456|78
01|1110|10

Then after inversion, it becomes

12654378
01011110
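Inversion reverses a slice while keeping each (position, allele) pair intact; this sketch (function name ours) reproduces the example above:

```python
def invert(positions, alleles, i, j):
    """Inversion operator: reverse genes in the slice [i:j) while
    keeping (position, allele) pairs together."""
    pairs = list(zip(positions, alleles))
    pairs[i:j] = reversed(pairs[i:j])
    pos, al = zip(*pairs)
    return ''.join(map(str, pos)), ''.join(al)

pos, al = invert('12345678', '01111010', 2, 6)
print(pos, al)   # 12654378 01011110
```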
 
Macro-operators – Interspecies differentiation

 Interspecies differentiation is carried out in nature through speciation and niche exploitation.
 E.g., sharing function: a sharing function determines the neighborhood and degree of sharing for each string in the population. As a result, this mechanism limits the uncontrolled growth of particular species within a population.
Sharing function
A simple sharing function s(d(xi, xj)) is shown in the figure, where d(xi, xj) is the (Hamming) distance between the two strings xi and xj. The shared fitness is

fs(xi) = f(xi) / Σj=1..n s(d(xi, xj))

Figure: s(d) decreases linearly from 1.0 at distance dij = ||xi − xj|| = 0 to 0.0 at d = σshare.
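The triangular sharing function and the shared-fitness formula above can be sketched directly; function names and the small distance matrix are our illustrative choices:

```python
def sharing(d, sigma_share):
    """Triangular sharing function: 1 at d = 0, 0 beyond sigma_share."""
    return max(0.0, 1.0 - d / sigma_share)

def shared_fitness(i, fitness, dist, sigma_share):
    """fs(x_i) = f(x_i) / sum_j s(d(x_i, x_j))."""
    niche = sum(sharing(dist[i][j], sigma_share)
                for j in range(len(fitness)))
    return fitness[i] / niche

# two identical strings (d = 0) and one distant string
dist = [[0, 0, 5], [0, 0, 5], [5, 5, 0]]
print(shared_fitness(0, [10, 10, 10], dist, sigma_share=4))  # 5.0
print(shared_fitness(2, [10, 10, 10], dist, sigma_share=4))  # 10.0
```

The two crowded strings split their niche (fitness halved), while the isolated string keeps its raw fitness: exactly the growth-limiting effect described above.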
Example
In a path-finding problem, the fitness should increase as the path length decreases. The fitness F(q) of the string q = 0-4-8-10-15-20-23-26 can thus be given by F(q) = 1/(d0-4 + d4-8 + … + d23-26), where di-j is the distance between nodes i and j, equal to the length of the link on the static map.
Example
Parent 1: 0-1-2-3-6-9-11-15-16-21-24-26
Parent 2: 0-4-8-10-15-20-23-26
Since each parent contains the point 15, the parts of each string after 15 are exchanged, yielding:
Child 1: 0-1-2-3-6-9-11-15-20-23-26
Child 2: 0-4-8-10-15-16-21-24-26
After the crossover, children are checked to determine whether any string has a repeated number. For example, the child
Child 3: 0-1-2-3-6-9-11-15-10-7-11-17-22-25-26
repeats node 11; after the cutoff operation it becomes
Child 3': 0-1-2-3-6-9-11-17-22-25-26
Genetic Algorithms
Example: Find the max. of the “peaks” function
z = f(x, y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) - 10*(x/5 - x^3 -
y^5)*exp(-x^2-y^2) -1/3*exp(-(x+1)^2 - y^2).
Genetic Algorithms
GA process:

Initial population 5th generation 10th generation

MATLAB file: go_ga.m


Genetic Algorithms
Performance profile

MATLAB file: go_ga.m


Simulated Annealing
Terminology:
• Objective function E(x): function to be optimized
• Move set: set of next points to explore
• Generating function: to select the next point
• Acceptance function h(ΔE, T): to determine whether the selected point should be accepted or not. Usually h(ΔE, T) = 1/(1 + exp(ΔE/(cT))).
• Annealing (cooling) schedule: schedule for reducing the temperature T
Simulated Annealing
Flowchart:

Select a new point xnew in the move set via the generating function

Compute the objective function E(xnew)

Set x to xnew with probability determined by h(ΔE, T)

Reduce temperature T
Simulated Annealing
Example: Traveling Salesperson Problem (TSP)
How to traverse n cities once and only once with a minimal total distance?
Simulated Annealing
Move sets for TSP (figure):
• Inversion: 1-2-3-4-5-6-7-8-9-10-11-12 → 1-2-3-4-5-9-8-7-6-10-11-12
• Translation: a section of the tour is removed and reinserted elsewhere
• Switching: 1-2-11-4-8-7-5-9-6-10-3-12 → 1-2-3-4-8-7-5-9-6-10-11-12
Simulated Annealing
A 100-city TSP using SA

Initial random path During SA process Final path

MATLAB file: tsp.m


Simulated Annealing
100-city TSP with penalties when crossing the circle

Penalty = 0   Penalty = 0.5   Penalty = -0.3
Copyright © 2001
By Dr Djamel Bouchaffra

CSE 513 Soft Computing


Fall 2001
Oakland University
Chapter 7 (part 2):
Derivative-Free Optimization

Simulated Annealing (SA)

Random Search Method


Simulated Annealing (SA)
Introduction

• Introduced by Metropolis et al. in 1953 and adapted by Kirkpatrick (1983) in order to find optimum solutions to large-scale combinatorial optimization problems

• It is a derivative-free optimization method

• SA was derived from the physical characteristics of spin glasses
Simulated Annealing (cont.)

Introduction (cont.)

• The principle behind SA is similar to what happens when metals are cooled at a controlled rate

• The slow decrease of temperature allows the atoms in the molten metal to line themselves up and form a regular crystalline structure that possesses a low density and a low energy
Simulated Annealing (cont.)

Introduction (cont.)

• The value of the objective function that we intend to minimize corresponds to the energy in a thermodynamic system

• High temperatures correspond to the case where high-mobility atoms can orient themselves with other non-local atoms and the energy can increase: function evaluation accepts new points with higher energy
Simulated Annealing (cont.)

Introduction (cont.)

• Low temperatures correspond to the case where low-mobility atoms can only orient themselves with local atoms and the energy state is not likely to increase: function evaluation is performed only locally, and points with higher energy are more and more often refused

• The most important element of SA is the so-called annealing schedule (or cooling schedule), which expresses how fast the temperature is lowered from high to low
Simulated Annealing (cont.)

Terminology:

• Objective function E(x): function to be optimized

• Move set: set of next points to explore;
  Δx = xnew − x is a random variable with pdf g(·, ·)

• Generating function: pdf for selecting the next point,
  g(Δx, T) = (2πT)^(−n/2) exp[−||Δx||² / (2T)]
  (n is the dimension of the explored space)
Simulated Annealing (cont.)

Terminology (cont.)

• Acceptance function h(ΔE, T): to determine whether the selected point should be accepted or not. Usually h(ΔE, T) = 1/(1 + exp(ΔE/(cT))).

• Annealing (cooling) schedule: schedule for reducing the temperature T
Simulated Annealing (cont.)

Basic steps involved in a general SA method

• Step 1 [Initialization]: Choose an initial point x and a high temperature T, and set the iteration count k to 1

• Step 2 [Evaluation]: Evaluate the objective function E = f(x)

• Step 3 [Exploration]: Select Δx with probability g(Δx, T), and set xnew = x + Δx

• Step 4 [Reevaluation]: Compute the new value of the objective function Enew = f(xnew)
Simulated Annealing (cont.)

Basic steps involved in a general SA method (cont.)

• Step 5 [Acceptance]: Test whether xnew is accepted or not by computing h(ΔE, T), where ΔE = Enew − E

• Step 6: Lower the temperature according to the annealing schedule (T = ηT; 0 < η < 1)

• Step 7: k = k + 1; if k reaches the maximum iteration count, then stop, otherwise go to step 3
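The seven steps can be sketched in Python as follows; the Gaussian move generator, the cooling parameters, and the test function are our illustrative choices:

```python
import math, random

def simulated_annealing(f, x0, t0=10.0, eta=0.99, c=1.0,
                        max_iter=2000, seed=0):
    """Sketch of Steps 1-7: Gaussian exploration, acceptance
    h(dE, T) = 1/(1 + exp(dE/(c*T))), geometric annealing schedule
    T <- eta * T."""
    rng = random.Random(seed)
    x, t = list(x0), t0
    e = f(x)
    best, best_e = list(x), e
    for _ in range(max_iter):
        # Step 3: select a random move and form the new point
        x_new = [xi + rng.gauss(0.0, math.sqrt(t)) for xi in x]
        e_new = f(x_new)                      # Step 4: reevaluate
        arg = (e_new - e) / (c * t)
        accept = 0.0 if arg > 500 else 1.0 / (1.0 + math.exp(arg))
        if rng.random() < accept:             # Step 5: acceptance test
            x, e = x_new, e_new
            if e < best_e:
                best, best_e = list(x), e
        t *= eta                              # Step 6: lower temperature
    return best, best_e

best, val = simulated_annealing(lambda v: (v[0] - 3.0) ** 2 + v[1] ** 2,
                                [0.0, 0.0])
print(round(val, 3))
```

Early on (high T) uphill moves are often accepted; as T falls the acceptance function freezes the search near a minimum.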
Simulated Annealing (cont.)

Example: Traveling Salesperson Problem (TSP)

How to traverse n cities once and only once with a minimal total distance?

An NP-hard problem with (n − 1)!/2 possible tours to explore
Simulated Annealing (cont.)

Move sets for TSP (figure):
• Inversion: remove links 6-7 and 8-9 and reconnect the subtour in the opposite order:
  1-2-3-4-5-6-7-8-9-10-11-12 → 1-2-3-4-5-9-8-7-6-10-11-12
• Translation: remove section 8-7 and replace it in between two randomly selected cities 4 and 5
• Switching: select cities 3 and 11 and switch them:
  1-2-11-4-8-7-5-9-6-10-3-12 → 1-2-3-4-8-7-5-9-6-10-11-12
Simulated Annealing (cont.)

A 100-city TSP using SA

Initial random path During SA process Final path


Random Search
This method explores the parameter space of an objective function sequentially in a random fashion to find the point that optimizes the objective function.

This method is simple and converges to the global optimum surely on a compact set.
Random Search (cont.)

The steps involved are:

• Step 1: Choose an initial point x as the current point

• Step 2: Add a random vector dx to the current point x and evaluate the objective function at x + dx

• Step 3: If f(x + dx) < f(x), set x = x + dx

• Step 4: Stop the process if the maximum iteration count is reached; otherwise go back to step 2 to find a new point
Random Search (cont.)

This method is blind since the search directions are guided by a random number generator; therefore an improved version has been developed.

It is based on 2 observations:

• If a search in a direction results in a higher value of the objective function, then the opposite direction should provide a lower value of the objective function

• Successive successful (failed) searches in a certain direction should bias subsequent searches toward (against) this direction
Random Search (cont.)

Flowchart:

Select a random dx
If f(x + b + dx) < f(x): x = x + b + dx; b = 0.2b + 0.4dx
Else if f(x + b − dx) < f(x): x = x + b − dx; b = b − 0.4dx
Else: b = 0.5b
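The flowchart above (reverse step plus bias vector b) can be sketched directly; the function name, step size, and test function are our illustrative choices:

```python
import random

def guided_random_search(f, x0, max_iter=5000, step=0.5, seed=0):
    """Random search with the reverse-step and direction-bias
    heuristics: try x + b + dx, then x + b - dx, else shrink b."""
    rng = random.Random(seed)
    x = list(x0)
    b = [0.0] * len(x)                      # bias vector, initially 0
    fx = f(x)
    for _ in range(max_iter):
        dx = [rng.gauss(0.0, step) for _ in x]
        cand = [xi + bi + di for xi, bi, di in zip(x, b, dx)]
        if f(cand) < fx:                    # forward step succeeded
            x, fx = cand, f(cand)
            b = [0.2 * bi + 0.4 * di for bi, di in zip(b, dx)]
            continue
        cand = [xi + bi - di for xi, bi, di in zip(x, b, dx)]
        if f(cand) < fx:                    # reverse step succeeded
            x, fx = cand, f(cand)
            b = [bi - 0.4 * di for bi, di in zip(b, dx)]
        else:                               # both failed: halve the bias
            b = [0.5 * bi for bi in b]
    return x, fx

x, fx = guided_random_search(lambda v: (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2,
                             [0.0, 0.0])
print(round(fx, 4))
```

Successful moves reinforce the bias toward their direction; repeated failures shrink it back toward plain random search.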
Hybrid Genetic Algorithm

 Simulated annealing
  SA mutation (SAM), SA recombination (SAR)
  The temperature schedule is non-homogeneous: as the temperature is lowered, the acceptance rate in SAM and SAR increases.
 Application of GA
  For neural network parameter learning, a compromise is to hybridize GAs with gradient methods such as the backpropagation algorithm: an initial genetic search followed by a gradient method, or a gradient-descent step included as one of the genetic operators.
GA and Neural Network

Figure: a feedforward network whose connection weights are concatenated into a chromosome.

Encoding: (1.1, -0.7, 0.5, 0.9, 1.5, -0.8, -1.9, 3.5, -7.3, -1.6, 2.1)
Initialization of weights

An efficient initialization scheme is to choose the weights randomly with a probability distribution given by e^(-|x|), that is, a two-sided exponential distribution with a mean of 0 and a mean absolute value of 1. Mutation, crossover, and gradient genetic operators are then used to search for the optimal weight set:

 Unbiased-mutate-weights
 Biased-mutate-weights
 Mutate-nodes
Unbiased-mutate-weights

For each entry in the chromosome, this operator with fixed probability (p = 0.1) replaces it with a random value chosen from the initialization probability distribution.
Biased-mutate-weights

For each entry in the chromosome, this operator with fixed probability (p = 0.1) adds to it a random value chosen from the initialization probability distribution.
Mutate-nodes

 This operator selects n non-input nodes of the network that the parent chromosome represents.
 For each input link to these n nodes, the operator adds to the link's weight a random value from the initialization probability distribution.
 It then encodes this new network on the child's chromosome.
 Since the input links to a node form a logical subgroup of all links, confining the random weight changes to these subgroups seems more likely to result in a good evaluation.
Random Search
Properties:
• Intuitive
• Simple
Analogy:
• Get down to a valley blindfolded ( 盲人下山 )
Two heuristics:
• Reverse step
• Bias direction
Random Search
Flowchart: as shown earlier (reverse step and direction bias).
Random Search
Example: Find the min. of the “peaks” function
z = f(x, y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) - 10*(x/5 - x^3 -
y^5)*exp(-x^2-y^2) -1/3*exp(-(x+1)^2 - y^2).

MATLAB file: go_rand.m


Downhill Simplex Search
Simplex: a set of n + 1 points in n-dimensional space
• A triangle in a 2D space
• A tetrahedron in a 3D space
Concept of downhill simplex search:
• Repeatedly replaces the highest point with a lower one
• Consecutive successful replacements lead to the enlargement of the simplex
• Consecutive unsuccessful replacements lead to the shrinkage of the simplex
Downhill Simplex Search
Flowchart: Figure 7.9 on page 188
Behavior: The simplex can adapt itself to the objective function landscape (just like an amoeba), and eventually converges to a nearby local minimum.
Program: The search procedure is implemented as the function fmins.m that comes with MATLAB.
Downhill Simplex Search
Example: Find the min. of the “peaks” function
z = f(x, y) = 3*(1-x)^2*exp(-(x^2) - (y+1)^2) - 10*(x/5 - x^3 -
y^5)*exp(-x^2-y^2) -1/3*exp(-(x+1)^2 - y^2).

MATLAB file: go_simp.m


Chapter 7 (part 3):
Derivative-Free Optimization

• Downhill Simplex Search (DSS)
Downhill Simplex Search

 It is a derivative-free method for multidimensional function optimization

 It has an interesting geometrical interpretation

 Principle:
• It starts with an initial simplex and replaces the point having the highest function value in the simplex with another point
Downhill Simplex Search (cont.)

It is based on 4 operations:

• Reflection

• Reflection & expansion

• Contraction

• Shrinkage
Downhill Simplex Search (cont.)

Cycle of the DSS

• Start with a point P0 and set up the simplex Pi = P0 + λi ei (i = 1, …, n), where the ei are unit vectors and the λi are scaling constants. Define the following quantities:

- l = argmin_i {yi} (l for "low")

- h = argmax_i {yi} (h for "high")

- yi is the function value at the simplex point Pi
Downhill Simplex Search (cont.)

Cycle of the DSS (cont.)

- Interval 1: {y; y ≤ yl}

- Interval 2: {y; yl < y ≤ max_{i≠h} yi}

- Interval 3: {y; max_{i≠h} yi < y ≤ yh}

- Interval 4: {y; yh < y}
Downhill Simplex Search (cont.)

Cycle of the DSS (cont.)

• 4 steps are needed for each cycle of the DSS

A. Reflection: Define the reflection point P* and its value y* as

P* = P̄ + α(P̄ − Ph)   (P̄ is the centroid of the points Pi with i ≠ h)

y* = f(P*)   (α > 0)

(Choose the direction opposite to Ph)
Downhill Simplex Search (cont.)

Cycle of the DSS (cont.)

A. Reflection (cont.)

Test:
1. If y* ∈ Interval 1 → go to Expansion (continue in this direction)
2. If y* ∈ Interval 2 → replace Ph with P* and terminate the cycle
3. If y* ∈ Interval 3 → replace Ph with P* and go to Contraction
4. If y* ∈ Interval 4 → go to Contraction (change direction)
Downhill Simplex Search (cont.)

Cycle of the DSS (cont.)

B. Expansion: Define the expansion point

P** = P̄ + γ(P* − P̄)
y** = f(P**)   (γ > 1)

Test: if y** ∈ Interval 1, replace Ph with P** and terminate the cycle. Otherwise, replace Ph with the original reflection point P* and terminate the cycle.
Downhill Simplex Search (cont.)

Cycle of the DSS (cont.)

C. Contraction: Define

P** = P̄ + β(Ph − P̄)
y** = f(P**)   (0 < β < 1)

Test: if y** ∈ Interval 1, 2, or 3, replace Ph with P** and terminate the cycle. Otherwise go to Shrinkage.

D. Shrinkage: Replace each Pi with (Pi + Pl)/2 and terminate the cycle.
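The DSS cycle above (reflection, expansion, contraction, shrinkage) can be sketched as follows; the function name, the coefficient values α = 1, γ = 2, β = 0.5, and the interval tests are our illustrative reading of the steps:

```python
def downhill_simplex(f, x0, scale=1.0, alpha=1.0, gamma=2.0, beta=0.5,
                     max_iter=500):
    """Sketch of repeated DSS cycles on an initial simplex
    P_i = P_0 + scale * e_i."""
    n = len(x0)
    pts = [list(x0)] + [
        [x0[j] + (scale if j == i else 0.0) for j in range(n)]
        for i in range(n)]
    for _ in range(max_iter):
        ys = [f(p) for p in pts]
        l = ys.index(min(ys))
        h = ys.index(max(ys))
        second = max(y for i, y in enumerate(ys) if i != h)
        centroid = [sum(p[j] for i, p in enumerate(pts) if i != h) / n
                    for j in range(n)]
        # A. Reflection: P* = centroid + alpha*(centroid - P_h)
        refl = [c + alpha * (c - ph) for c, ph in zip(centroid, pts[h])]
        y_r = f(refl)
        if y_r < ys[l]:                       # Interval 1: expand
            exp_ = [c + gamma * (r - c) for c, r in zip(centroid, refl)]
            pts[h] = exp_ if f(exp_) < ys[l] else refl
        elif y_r <= second:                   # Interval 2: accept P*
            pts[h] = refl
        else:                                 # Intervals 3/4: contract
            if y_r < ys[h]:                   # Interval 3: take P* first
                pts[h] = refl
                ys[h] = y_r
            con = [c + beta * (ph - c) for c, ph in zip(centroid, pts[h])]
            if f(con) < ys[h]:
                pts[h] = con
            else:                             # D. Shrink toward best point
                pts = [[(pi + pl) / 2.0 for pi, pl in zip(p, pts[l])]
                       if i != l else p for i, p in enumerate(pts)]
    ys = [f(p) for p in pts]
    best = ys.index(min(ys))
    return pts[best], ys[best]

x, y = downhill_simplex(lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2,
                        [0.0, 0.0])
print(round(y, 6))
```

On a smooth quadratic the simplex contracts around the minimum, mirroring the amoeba-like behavior described earlier.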
