
Home Exercise 3: Dynamic Programming and Randomized Algorithms

Team members: Yifan WANG, Xingshan HE, Michele Natacha Ela Essola

1-Dynamic Programming for the Knapsack Problem (5 points)


Assume a 0-1-knapsack problem instance with weight restriction 𝑊 = 10 and 5 items with the following
profits and weights:

item 1 2 3 4 5
profit 4 3 5 6 2
weight 4 3 4 2 3

Follow the dynamic programming algorithm from the lecture and fill out the table below with the
corresponding profit values of the subproblems to pack the first 𝑖 items into a knapsack of weight 𝑗:
What is the optimal packing?

𝑖/𝑗 0 1 2 3 4 5 6 7 8 9 10
0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 4 4 4 4 4 4 4
2 0 0 0 3 4 4 4 7 7 7 7
3 0 0 0 3 5 5 5 8 9 9 9
4 0 0 6 6 6 9 11 11 11 14 15
5 0 0 6 6 6 9 11 11 11 14 15

Let the table (matrix) above be 𝑉;


Since 𝑉[5,10] = 𝑉[4,10], item 5 is not in the optimal packing; since 𝑉[4,10] ≠ 𝑉[3,10], item 4 is in the
optimal packing.
Item 4 has weight 2 and profit 6, and 𝑉[4−1, 10−2] = 𝑉[3,8] = 9 = 𝑉[4,10] − 6, so we move to 𝑉[3,8].

Since 𝑉[3,8] ≠ 𝑉[2,8], item 3 is in the optimal packing.

Item 3 has weight 4 and profit 5, and 𝑉[3−1, 8−4] = 𝑉[2,4] = 4 = 𝑉[3,8] − 5, so we move to 𝑉[2,4].

Since 𝑉[2,4] = 𝑉[1,4], item 2 is not in the optimal packing; since 𝑉[1,4] ≠ 𝑉[0,4] = 0, item 1 is in the
optimal packing.
So the optimal packing is:

𝑖1 𝑖2 𝑖3 𝑖4 𝑖5
1 0 1 1 0
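
The table and the traceback above can be reproduced with a short bottom-up implementation (a minimal Python sketch, not the lecture's pseudocode; the function and variable names are ours):

def knapsack(profits, weights, W):
    n = len(profits)
    # V[i][j] = best total profit when packing the first i items into capacity j
    V = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(W + 1):
            V[i][j] = V[i - 1][j]                       # item i not packed
            if weights[i - 1] <= j:                     # item i packed, if it fits and pays off
                V[i][j] = max(V[i][j], V[i - 1][j - weights[i - 1]] + profits[i - 1])
    # traceback: item i is in the optimal packing iff V[i][j] != V[i-1][j]
    packed, j = [], W
    for i in range(n, 0, -1):
        if V[i][j] != V[i - 1][j]:
            packed.append(i)
            j -= weights[i - 1]
    return V[n][W], sorted(packed)

print(knapsack([4, 3, 5, 6, 2], [4, 3, 4, 2, 3], 10))   # (15, [1, 3, 4])
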
2-Matrix Chain Multiplications (5 points)
Consider the multiplication of 𝑛 matrices 𝐴1 · 𝐴2 ··· 𝐴𝑛, where matrix 𝐴𝑖 is an 𝑎𝑖-by-𝑏𝑖 matrix. The
number of all possible orders of multiplication is exponential in 𝑛 and, thus, an enumeration/brute-force
approach will not be feasible. We consider dynamic programming instead here. Let 𝐶(𝑖, 𝑗) be the optimal
cost (in number of basic multiplications) to compute 𝐴𝑖 · 𝐴𝑖+1 ··· 𝐴𝑗.

1. Which values of 𝐶(𝑖, 𝑗) are easy to compute (“initialization of the dynamic programming”) and which
value of 𝐶(𝑖, 𝑗) corresponds to the optimal solution (the cost of the entire matrix chain multiplication)?
According to the lecture slides, dynamic programming is a method for solving a complex problem by breaking it
down into a collection of simpler subproblems. The typical algorithm design consists of (1) solving the
subproblems bottom-up (i.e. computing their optimal values), starting from the smallest ones, using a table
structure to store the optimal values and the Bellman equality; and (2) eventually constructing the final solution
(this step can be omitted if only the value of an optimal solution is sought).

In this case, the ‘bottom’ or ‘base’ cases (the simplest subproblems) are the costs 𝐶(𝑥, 𝑥) of ‘computing’ a
single matrix 𝐴𝑥 for 𝑖 ≤ 𝑥 ≤ 𝑗: they are all 0 and trivial to initialize.
The next tier of simple subproblems are the costs 𝐶(𝑥, 𝑥 + 1), the numbers of multiplications needed for
the products 𝐴𝑥 · 𝐴𝑥+1 with 𝑖 ≤ 𝑥 ≤ 𝑗 − 1. These are also easy to compute, since there is only one way to
multiply two adjacent matrices (they only require the base values 𝐶(𝑥, 𝑥)).
Computing the cost of a product of three or more adjacent matrices, such as 𝐴𝑥−1 · 𝐴𝑥 · 𝐴𝑥+1, already
requires the Bellman equation: there are several possible orders, e.g. (𝐴𝑥−1 · 𝐴𝑥) · 𝐴𝑥+1 and
𝐴𝑥−1 · (𝐴𝑥 · 𝐴𝑥+1), so it is no longer an easy computation.
The value obtained at the very top, 𝐶(1, 𝑛) (in general 𝐶(𝑖, 𝑗) for the whole chain considered), corresponds
to the optimal solution: it is the minimum number of multiplications needed to compute 𝐴𝑖 · 𝐴𝑖+1 ··· 𝐴𝑗,
and it can only be obtained after all smaller subproblems have been solved.

2. Consider that the corresponding solution for the subproblem 𝐶(𝑖, 𝑗) first computes 𝐴𝑖 ··· 𝐴𝑘, then
𝐴𝑘+1 ··· 𝐴𝑗, and finally multiplies the two resulting matrices, i.e. 𝐴𝑖 · 𝐴𝑖+1 ··· 𝐴𝑗 is computed as
(𝐴𝑖 ··· 𝐴𝑘) · (𝐴𝑘+1 ··· 𝐴𝑗). Note that the splitting point 𝑘 is unknown in advance.

Write down the Bellman equation to compute 𝐶(𝑖, 𝑗).


If 𝑖 = 𝑗, then 𝐶(𝑖, 𝑗) = 𝐶(𝑖, 𝑖) = 0: there is no cost to ‘compute’ the single matrix 𝐴𝑖.
If 𝑖 < 𝑗:
Let matrix 𝐴𝑥 have 𝑝𝑥−1 rows and 𝑝𝑥 columns, so the dimensions along the chain are:

𝐴𝑖 · 𝐴𝑖+1 ··· 𝐴𝑗−1 · 𝐴𝑗
(𝑝𝑖−1, 𝑝𝑖) (𝑝𝑖, 𝑝𝑖+1) ··· (𝑝𝑗−2, 𝑝𝑗−1) (𝑝𝑗−1, 𝑝𝑗)

The splitting point 𝑘 can range from 𝑖 to 𝑗 − 1, so there are 𝑗 − 𝑖 possible positions for 𝑘.


Splitting at position 𝑘 divides 𝐴𝑖 · 𝐴𝑖+1 ··· 𝐴𝑗 into (𝐴𝑖 ··· 𝐴𝑘) and (𝐴𝑘+1 ··· 𝐴𝑗). So 𝐶(𝑖, 𝑗) needs the
minimum numbers of multiplications of both subchains (which are exactly 𝐶(𝑖, 𝑘) and 𝐶(𝑘 + 1, 𝑗)) and,
additionally, the number of multiplications of the final product (𝐴𝑖 ··· 𝐴𝑘) · (𝐴𝑘+1 ··· 𝐴𝑗), which is
𝑝𝑖−1 · 𝑝𝑘 · 𝑝𝑗.

For a fixed splitting point 𝑘, the total number of multiplications is 𝐶(𝑖, 𝑘) + 𝐶(𝑘 + 1, 𝑗) + 𝑝𝑖−1 𝑝𝑘 𝑝𝑗.

To find 𝐶(𝑖, 𝑗), we compare these values for all 𝑗 − 𝑖 positions of 𝑘 and pick the smallest one.
So, the Bellman equation to compute 𝐶(𝑖, 𝑗) is:

𝐶(𝑖, 𝑗) = 0, if 𝑖 = 𝑗
𝐶(𝑖, 𝑗) = min_{𝑖 ≤ 𝑘 < 𝑗} { 𝐶(𝑖, 𝑘) + 𝐶(𝑘 + 1, 𝑗) + 𝑝𝑖−1 𝑝𝑘 𝑝𝑗 }, if 𝑖 < 𝑗
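
This Bellman equation translates directly into a memoized recursion (a minimal Python sketch under the convention that matrix 𝐴𝑥 has dimensions 𝑝𝑥−1 × 𝑝𝑥; the function names are ours):

from functools import lru_cache

def matrix_chain_cost(p):
    # p is the dimension vector: matrix A_x is p[x-1]-by-p[x], for x = 1..n
    n = len(p) - 1

    @lru_cache(maxsize=None)
    def C(i, j):
        if i == j:                                   # a single matrix needs no multiplication
            return 0
        # Bellman equation: try every splitting point k and keep the cheapest split
        return min(C(i, k) + C(k + 1, j) + p[i - 1] * p[k] * p[j] for k in range(i, j))

    return C(1, n)

# e.g. matrix_chain_cost([5, 2, 10, 1, 10, 2]) == 60
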

3. Consider the example of five matrices 𝐴1 (5-by-2), 𝐴2 (2-by-10), 𝐴3 (10-by-1), 𝐴4 (1-by-10), and 𝐴5
(10-by-2) and complete a table like the following one with the values of 𝐶(𝑖, 𝑗) as the dynamic
programming approach would do. What is the actual minimum number of basic multiplications needed?

𝑖/𝑗 1 2 3 4 5
1 0 100 30 80 60
2 - 0 20 40 44
3 - - 0 100 40
4 - - - 0 20
5 - - - - 0

𝑘 1 2 3 4 5
1 - - 1 3 3
2 - - - 3 3
3 - - - - 3
4 - - - - -
5 - - - - -

In this case, 𝑝0 = 5, 𝑝1 = 2, 𝑝2 = 10, 𝑝3 = 1, 𝑝4 = 10, 𝑝5 = 2


Let the first table above be the matrix 𝑀, and let the second be the matrix 𝐾, which records the optimal location of 𝑘;
For 𝑖 = 𝑗, 𝑀[𝑖, 𝑗] = 0;
𝑀[1,2] = 𝑝0 𝑝1 𝑝2 = 100, 𝑀[2,3] = 𝑝1 𝑝2 𝑝3 = 20, 𝑀[3,4] = 𝑝2 𝑝3 𝑝4 = 100, 𝑀[4,5] = 𝑝3 𝑝4 𝑝5 = 20
𝑀[1,3] = min{𝐶(1,1) + 𝐶(2,3) + 𝑝0 𝑝1 𝑝3, 𝐶(1,2) + 𝐶(3,3) + 𝑝0 𝑝2 𝑝3} = min{30, 150} = 30
𝑀[2,4] = min{𝐶(2,2) + 𝐶(3,4) + 𝑝1 𝑝2 𝑝4, 𝐶(2,3) + 𝐶(4,4) + 𝑝1 𝑝3 𝑝4} = min{300, 40} = 40
𝑀[3,5] = min{𝐶(3,3) + 𝐶(4,5) + 𝑝2 𝑝3 𝑝5, 𝐶(3,4) + 𝐶(5,5) + 𝑝2 𝑝4 𝑝5} = min{40, 300} = 40
𝑀[1,4] = min{𝐶(1,1) + 𝐶(2,4) + 𝑝0 𝑝1 𝑝4, 𝐶(1,2) + 𝐶(3,4) + 𝑝0 𝑝2 𝑝4, 𝐶(1,3) + 𝐶(4,4) + 𝑝0 𝑝3 𝑝4}
      = min{140, 700, 80} = 80
𝑀[2,5] = min{𝐶(2,2) + 𝐶(3,5) + 𝑝1 𝑝2 𝑝5, 𝐶(2,3) + 𝐶(4,5) + 𝑝1 𝑝3 𝑝5, 𝐶(2,4) + 𝐶(5,5) + 𝑝1 𝑝4 𝑝5}
      = min{80, 44, 80} = 44
𝑀[1,5] = min{𝐶(1,1) + 𝐶(2,5) + 𝑝0 𝑝1 𝑝5, 𝐶(1,2) + 𝐶(3,5) + 𝑝0 𝑝2 𝑝5, 𝐶(1,3) + 𝐶(4,5) + 𝑝0 𝑝3 𝑝5, 𝐶(1,4) + 𝐶(5,5) + 𝑝0 𝑝4 𝑝5}
      = min{64, 240, 60, 180} = 60
To get 𝐶(1,5), the optimal split is at 𝑘 = 3, so the chain is divided into (𝐴1 · 𝐴2 · 𝐴3) · (𝐴4 · 𝐴5);
to get 𝐶(1,3), the optimal split is at 𝑘 = 1, so the chain is further divided into (𝐴1 · (𝐴2 · 𝐴3)) · (𝐴4 · 𝐴5).
Check:
𝐶(1,5) = 𝑝1 𝑝2 𝑝3 + 𝑝3 𝑝4 𝑝5 + 𝑝0 𝑝1 𝑝3 + 𝑝0 𝑝3 𝑝5 = 20 + 20 + 10 + 10 = 60,
so the minimum number of basic multiplications is 60.
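
The two tables and the final check can be reproduced with a bottom-up version of the same recursion (again a Python sketch with our own naming; 𝑀 and 𝐾 are the two matrices defined above, indexed from 1):

def matrix_chain_tables(p):
    n = len(p) - 1
    M = [[0] * (n + 1) for _ in range(n + 1)]        # M[i][j] = C(i, j)
    K = [[0] * (n + 1) for _ in range(n + 1)]        # K[i][j] = optimal splitting point k
    for length in range(2, n + 1):                   # solve subchains of increasing length
        for i in range(1, n - length + 2):
            j = i + length - 1
            M[i][j], K[i][j] = min(
                (M[i][k] + M[k + 1][j] + p[i - 1] * p[k] * p[j], k) for k in range(i, j))
    return M, K

M, K = matrix_chain_tables([5, 2, 10, 1, 10, 2])
print(M[1][5], K[1][5], K[1][3])                     # 60 3 1, i.e. (A1 (A2 A3)) (A4 A5)
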

3-Roulette wheel and Binary Tournament selection (4 points)

a) Given the fitness function 𝑓(𝑥) = √𝑥 on a single variable x ∈ R, calculate the probability of selecting
the individuals 𝑎 with 𝑥 = 1, 𝑏 with 𝑥 = 4, and 𝑐 with 𝑥 = 9, using the roulette wheel selection from the
lecture.
individual              a     b     c
fitness                 1     2     3
𝑄𝑖                      1/6   2/6   3/6
Cumulative Probability  1/6   3/6   6/6

b) Calculate the probability of selecting the same individuals when the fitness function is 𝑔(𝑥) = (𝑓(𝑥))².

𝑔(𝑥) = (𝑓(𝑥))² = (√𝑥)² = 𝑥

individual              a      b      c
fitness                 1      4      9
𝑄𝑖                      1/14   4/14   9/14
Cumulative Probability  1/14   5/14   14/14
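
The probabilities in both tables follow from normalizing the fitness values by their sum; a minimal Python sketch of roulette wheel selection under this assumption (the helper names are ours):

import random

def roulette_probabilities(fitness):
    # selection probability Q_i of each individual: its fitness divided by the total fitness
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_select(population, fitness):
    # draw one individual with probability proportional to its fitness
    return random.choices(population, weights=fitness, k=1)[0]

population = ['a', 'b', 'c']
print(roulette_probabilities([1, 2, 3]))             # with f: [1/6, 2/6, 3/6]
print(roulette_probabilities([1, 4, 9]))             # with g: [1/14, 4/14, 9/14]
print(roulette_select(population, [1, 4, 9]))        # 'c' with probability 9/14
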

c) Calculate the probabilities of selecting each solution from the population {𝑎, 𝑏, 𝑐} (with the same
solutions as above, 𝑥 = 1 (a), 𝑥 = 4 (b), and 𝑥 = 9 (c), and for both fitness functions 𝑓(𝑥) and 𝑔(𝑥)) in a
single binary tournament. A single binary tournament means: we draw two solutions uniformly at random
from the population (with replacement) and pick the better one.
The table for all possible results is as shown:

Pair    𝑓 values   𝑔 values   Selected
𝑎, 𝑎    1, 1       1, 1       𝑎
𝑎, 𝑏    1, 2       1, 4       𝑏
𝑎, 𝑐    1, 3       1, 9       𝑐
𝑏, 𝑎    2, 1       4, 1       𝑏
𝑏, 𝑏    2, 2       4, 4       𝑏
𝑏, 𝑐    2, 3       4, 9       𝑐
𝑐, 𝑎    3, 1       9, 1       𝑐
𝑐, 𝑏    3, 2       9, 4       𝑐
𝑐, 𝑐    3, 3       9, 9       𝑐

The probability distribution for this single binary tournament is:

𝑃(𝑎) = 1/9, 𝑃(𝑏) = 3/9, 𝑃(𝑐) = 5/9

The choice of fitness function (𝑓 or 𝑔) does not change the result in this case, because tournament
selection only depends on the ranking of the individuals and 𝑔 is a monotone transformation of 𝑓.
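
The distribution above can also be obtained by enumerating all nine equally likely pairs programmatically; a sketch (our own helper, not lecture code), where ties are broken in favour of the first individual drawn:

from itertools import product
from collections import Counter

def binary_tournament_distribution(fitness):
    # exact selection probabilities of one binary tournament (two draws with replacement)
    individuals = list(fitness)
    wins = Counter()
    for x, y in product(individuals, repeat=2):      # all pairs are equally likely
        wins[x if fitness[x] >= fitness[y] else y] += 1
    return {ind: wins[ind] / len(individuals) ** 2 for ind in individuals}

f = {'a': 1, 'b': 2, 'c': 3}                         # fitness under f(x) = sqrt(x)
g = {'a': 1, 'b': 4, 'c': 9}                         # fitness under g(x) = x
print(binary_tournament_distribution(f))             # {'a': 1/9, 'b': 3/9, 'c': 5/9}
print(binary_tournament_distribution(g))             # identical: only the ranking matters
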

d) Which of the selection operators do you favor? Why?


Selection is the major determinant of the trade-off between exploitation and exploration, and both
roulette wheel selection and tournament selection play this role.
For roulette wheel selection, the trade-off between exploitation and exploration is controlled through the
choice of fitness function. In this case 𝑔(𝑥) leans more towards exploitation, since the individual with the
best performance (c in this case) has a much higher probability of being chosen; on the other hand, with
𝑔(𝑥) the search is more likely to get stuck in a local optimum. With fitness function 𝑓(𝑥), individuals with
inferior performance (a and b in this case) have higher selection probabilities than under 𝑔(𝑥), which helps
exploration.
For tournament selection, the trade-off between exploitation and exploration is controlled through the
tournament size. The larger the tournament size, the more exploitation (the individual with the best
performance has a higher chance of being drawn and winning the tournament); the smaller the tournament
size, the more exploration.
Both roulette wheel selection and tournament selection have advantages and disadvantages. For roulette
wheel selection, the selection probabilities 𝑄𝑖 depend on the scaling of 𝑓, whereas tournament selection is
a function-value-free operator (invariant under monotone transformations of the objective function).
Personally, I prefer tournament selection because of its efficiency and easy implementation. I think it has
the following four significant advantages:
(1) Low complexity, 𝑂(𝑛)
(2) Easy to parallelize
(3) Less prone to premature convergence to a local optimum
(4) No need to sort all fitness values
4-Pure Random Search (6 points)
The first stochastic optimization algorithm, introduced before any genetic algorithm (GA) or evolution
strategy (ES), is the so-called pure random search, or PRS for short.
In the following, we consider the optimization of the following two functions, both defined on bitstrings
of length 𝑛:
Example 1 For 𝑥 ∈ {0, 1}ⁿ, the function OM is defined as

𝑓𝑂𝑀(𝑥) = ∑ᵢ₌₁ⁿ 𝑥𝑖

Example 2 For 𝑥 ∈ {0, 1}ⁿ, the function TZ is defined as

𝑓𝑇𝑍(𝑥) = ∑ᵢ₌₁ⁿ ∏ⱼ₌₁ⁱ (1 − 𝑥𝑗)

a) Describe in words what the functions 𝑓𝑂𝑀 and 𝑓𝑇𝑍 compute


For 𝒇𝑶𝑴:
For 𝑥 ∈ {0, 1}ⁿ, every entry with 𝑥𝑖 = 0 contributes nothing to the sum, and every entry with 𝑥𝑖 = 1
contributes exactly 1.

So 𝑓𝑂𝑀(𝑥) is the total number of entries 𝑥𝑖 that are equal to 1, i.e. the number of ones in the bitstring.


For 𝒇𝑻𝒁:
𝑓𝑇𝑍(𝑥) = (1 − 𝑥1) + (1 − 𝑥1)(1 − 𝑥2) + ⋯ + (1 − 𝑥1)(1 − 𝑥2) ⋯ (1 − 𝑥𝑛)
If 𝑥1 = 1, then 𝑓𝑇𝑍(𝑥) = 0 + 0 + ⋯ + 0 = 0;
if 𝑥1 = 0 and 𝑥2 = 1, then 𝑓𝑇𝑍(𝑥) = 1 + 0 + ⋯ + 0 = 1;
in general, if 𝑥1 = 𝑥2 = ⋯ = 𝑥𝑚−1 = 0 and 𝑥𝑚 = 1, then 𝑓𝑇𝑍(𝑥) = 1 + 1 + ⋯ + 1 + 0 + ⋯ + 0 = 𝑚 − 1.
So 𝑓𝑇𝑍(𝑥) is the total number of entries 𝑥𝑖 = 0 before the first 𝑥𝑖 = 1, i.e. the number of leading zeros.

Equivalently, position 𝑓𝑇𝑍(𝑥) + 1 is the first position whose entry equals 1 (if such a position exists).
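
Both functions are straightforward to implement; a small sketch for bitstrings represented as Python lists of 0/1 (the function names are ours):

def f_om(x):
    # OM: the number of ones in the bitstring
    return sum(x)

def f_tz(x):
    # TZ: the number of leading zeros, i.e. zeros before the first one
    count = 0
    for bit in x:
        if bit == 1:
            break
        count += 1
    return count

print(f_om([1, 0, 1, 1]), f_tz([0, 0, 1, 0]))        # 3 2
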

b) What are the maxima of the two functions? What are the values of 𝑓𝑂𝑀 and of 𝑓𝑇𝑍 at those optima?
𝑓𝑂𝑀 is maximized when all entries of the bitstring equal 1, i.e. 𝑥𝑖 = 1 for 1 ≤ 𝑖 ≤ 𝑛; at this optimum
max(𝑓𝑂𝑀) = 𝑛.
𝑓𝑇𝑍 is maximized when all entries of the bitstring equal 0, i.e. 𝑥𝑖 = 0 for 1 ≤ 𝑖 ≤ 𝑛; at this optimum
max(𝑓𝑇𝑍) = 1 + 1 + ⋯ + 1 = 𝑛.
c) Compute the expected time (in number of function evaluations) to reach the optimum as a function of
the search space dimension n.
Hint: show that the time to reach the optimum follows a geometric distribution with a parameter to
determine.
For search space dimension 𝑛, each of the two functions has a unique optimum (the all-ones string for 𝑓𝑂𝑀
and the all-zeros string for 𝑓𝑇𝑍), so every uniformly drawn sample hits the optimum with probability
𝑝 = 1/2ⁿ.

The probability that the optimum is found for the first time at the 𝑘-th sample (𝑘 ≥ 1) is

Pr(𝑋 = 𝑘) = (1 − 𝑝)ᵏ⁻¹ · 𝑝 = (1 − 1/2ⁿ)ᵏ⁻¹ · 1/2ⁿ,

so the hitting time 𝑋 follows a geometric distribution with parameter 𝑝 = 1/2ⁿ.
The expected number of function evaluations to reach the optimum is therefore

𝐸(𝑋) = 1/𝑝 = 2ⁿ
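
The geometric-distribution argument can be illustrated empirically with a small pure random search experiment (a Python sketch with our own naming; only feasible for small 𝑛, since the expected number of evaluations is 2ⁿ):

import random

def prs_hitting_time(n, f, f_opt):
    # number of uniform random bitstrings drawn until one with f(x) = f_opt appears
    evaluations = 0
    while True:
        x = [random.randint(0, 1) for _ in range(n)]
        evaluations += 1
        if f(x) == f_opt:
            return evaluations

n = 10
runs = [prs_hitting_time(n, sum, n) for _ in range(200)]   # f_OM: the optimum value is n
print(sum(runs) / len(runs), 2 ** n)                       # empirical mean vs. 2**n = 1024
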
