
Formulation – Modeling

Michel Bierlaire

Practice quiz

An investment fund is trying to determine how to invest its assets for the
following year, in order to maximize its profit. Currently, the fund has 2.5
million euros that it can invest in state bonds, real estate loans, car loans or
scholarship loans. The annual interest rates of the listed investment types
are 4% for state bonds, 6% for real estate loans, 8% for car loans and 9% for
scholarship loans.
To minimize risks, the investment fund allows only the selection of a
strategy satisfying the following restrictions:

• the amount invested in car and scholarship loans must not exceed twice
the amount invested in bonds;

• the amount invested in car loans must be greater than or equal to the
amount invested in scholarship loans;

• the investment in car loans should not exceed the investment in real
estate loans by more than 70%.

Formulate this problem as an optimization problem by determining

1. the decision variables,

2. the objective function, and

3. the constraint(s).
Formulation – Modeling
Michel Bierlaire

Solution of the practice quiz

We proceed with the three steps of the modeling process.


Decision variables The decision variables are the amounts in euros invested for
• state bonds: x_sb,
• real estate loans: x_re,
• car loans: x_cℓ, and
• scholarship loans: x_sℓ.
Objective function As the company wants to maximize its profit, the objective function must provide the profit as a function of the decision variables:

f : R^4 → R : f(x_sb, x_re, x_cℓ, x_sℓ).

The profit of the investment fund for the following year is calculated based on the interest rates using the following formula:

f(x_sb, x_re, x_cℓ, x_sℓ) = 0.04 x_sb + 0.06 x_re + 0.08 x_cℓ + 0.09 x_sℓ.

Therefore, we can write the problem as

max_{x_sb, x_re, x_cℓ, x_sℓ} 0.04 x_sb + 0.06 x_re + 0.08 x_cℓ + 0.09 x_sℓ.

Constraints The total amount to invest is 2.5 million euros. This is modeled using the following constraint:

x_sb + x_re + x_cℓ + x_sℓ = 2'500'000.

Each restriction is modeled as follows:


• the total amount invested in car and scholarship loans (x_cℓ + x_sℓ) must not exceed twice the amount invested in bonds (2 x_sb):

x_cℓ + x_sℓ ≤ 2 x_sb.

• the amount invested in car loans (x_cℓ) must be greater than or equal to the amount invested in scholarship loans (x_sℓ):

x_sℓ ≤ x_cℓ.

• the investment in car loans (x_cℓ) should not exceed the investment in real estate loans (x_re) by more than 70%:

x_cℓ ≤ 1.7 x_re.

Finally, we must impose all the decision variables to be non negative:

x_sb, x_re, x_cℓ, x_sℓ ≥ 0.

Putting everything together, we obtain the following optimization problem:

max 0.04 x_sb + 0.06 x_re + 0.08 x_cℓ + 0.09 x_sℓ

subject to

x_sb + x_re + x_cℓ + x_sℓ = 2'500'000,
x_cℓ + x_sℓ ≤ 2 x_sb,
x_sℓ ≤ x_cℓ,
x_cℓ ≤ 1.7 x_re,
x_sb, x_re, x_cℓ, x_sℓ ≥ 0.

For your information, the optimal solution (rounded to 1/10 of a euro) is:

• State bonds: 696'721.3 euros

• Real estate loans: 409'836.1 euros

• Car loans: 696'721.3 euros

• Scholarship loans: 696'721.3 euros

for a total profit of 170'901.6 euros. The constraints are verified:

• Constraint 1: 696'721.3 + 409'836.1 + 696'721.3 + 696'721.3 = 2'500'000.0,

• Constraint 2: 696'721.3 + 696'721.3 = 1'393'442.6 ≤ 2 · 696'721.3 = 1'393'442.6,

• Constraint 3: 696'721.3 ≤ 696'721.3,

• Constraint 4: 696'721.3 ≤ 1.7 · 409'836.1 ≈ 696'721.4.
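For readers who want to check these numbers themselves, here is a minimal sketch using scipy.optimize.linprog (assuming SciPy is available; the variable ordering is ours, not part of the original quiz):

```python
# A minimal sketch verifying the solution with SciPy.
# Variables: x = (x_sb, x_re, x_cl, x_sl). linprog minimizes, so we negate the profit.
import numpy as np
from scipy.optimize import linprog

c = -np.array([0.04, 0.06, 0.08, 0.09])      # maximize profit <=> minimize -profit
A_ub = np.array([
    [-2, 0, 1, 1],    # x_cl + x_sl <= 2 x_sb
    [0, 0, -1, 1],    # x_sl <= x_cl
    [0, -1.7, 1, 0],  # x_cl <= 1.7 x_re
])
b_ub = np.zeros(3)
A_eq = np.array([[1.0, 1.0, 1.0, 1.0]])
b_eq = np.array([2_500_000.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.x)       # approx [696721.3, 409836.1, 696721.3, 696721.3]
print(-res.fun)    # approx 170901.6
```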


Formulation – Modeling
Michel Bierlaire

Practice quiz

The company Coola-Coola Ltd. wants to design a can of soda of volume


0.33 liters. They need to set the dimensions (in centimeters) of this can to
use the minimum amount of aluminium, knowing that the form of the can is a
perfect cylinder, and the thickness of the aluminium is the same everywhere.
Write the problem as an optimization problem.
Formulation – Modeling
Michel Bierlaire

Solution of the practice quiz

We consider the three steps of the modeling process.

Decision variables The design of the cylinder depends on two variables,


both expressed in centimeters:

• the radius of the basis: r,


• the height of the cylinder: h.

Objective function Since the thickness of the aluminium is the same at any part of the can, the total surface of the cylinder has to be minimized. The objective function must provide the total surface as a function of the decision variables:

f : R^2 → R : f(r, h).

• Each basis is a circle of radius r, so its surface is πr^2.

• The side of the can is a rectangle of area 2πrh.

Therefore, the objective function to minimize is

f(r, h) = 2πr^2 + 2πrh.


Constraints The volume of the can must be 0.33 liters, that is, 330 cm^3. The first constraint is therefore:

πr^2 h = 330.

We also need non negativity constraints:

r ≥ 0,
h ≥ 0.

The optimization problem is therefore:

min_{r,h} 2πr^2 + 2πrh

subject to

πr^2 h = 330,
r ≥ 0,
h ≥ 0.

The optimal solution of this problem is r = 3.746 cm and h = 7.491 cm.
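As a numeric check, one can eliminate h via the volume constraint and minimize in r only. A minimal sketch, assuming SciPy is available (the elimination step is ours):

```python
# Eliminate h via the volume constraint (h = 330 / (pi r^2)) and
# minimize the surface in the single variable r.
import numpy as np
from scipy.optimize import minimize_scalar

def surface(r):
    h = 330.0 / (np.pi * r**2)         # volume constraint: pi r^2 h = 330
    return 2 * np.pi * r**2 + 2 * np.pi * r * h

res = minimize_scalar(surface, bounds=(0.1, 20.0), method='bounded')
r = res.x
h = 330.0 / (np.pi * r**2)
print(r, h)                            # approx 3.74 and 7.49; note h = 2r
print((165.0 / np.pi) ** (1.0 / 3.0))  # closed form for r from f'(r) = 0
```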


Transformations
Michel Bierlaire

Practice quiz

Given the following optimization problem, transform it in such a way as to obtain a minimization problem in which all decision variables must be non negative and all constraints are defined by lower inequalities:

max −x_1^2 + sin x_2

subject to

6x_1 − x_2^2 ≥ 1,
x_1^2 + x_2^2 = 3,
x_1 ≥ 2,
x_2 ∈ R.
Transformations
Michel Bierlaire

Solution of the practice quiz

We apply a sequence of simple transformations to the optimization prob-


lem.

1. A maximization problem whose objective function is f(x) is equivalent to a minimization problem whose objective function is −f(x):

argmax_x f(x) = argmin_x −f(x)

and

max_x f(x) = − min_x −f(x).

By applying this transformation to our problem we obtain:

− min x_1^2 − sin x_2

subject to

6x_1 − x_2^2 ≥ 1,
x_1^2 + x_2^2 = 3,
x_1 ≥ 2,
x_2 ∈ R.

2. Now we transform the constraints. An equality constraint can be written as the combination of two inequalities:

g(x) = 0 ⟺ g(x) ≤ 0 and g(x) ≥ 0.

The optimization problem becomes

− min x_1^2 − sin x_2

subject to

6x_1 − x_2^2 ≥ 1,
x_1^2 + x_2^2 ≤ 3,
x_1^2 + x_2^2 ≥ 3,
x_1 ≥ 2,
x_2 ∈ R.

3. A constraint defined by a greater-than inequality can be multiplied by −1 to obtain a lower (less-than) inequality:

g(x) ≥ 0 ⟺ −g(x) ≤ 0.

− min x_1^2 − sin x_2

subject to

−6x_1 + x_2^2 ≤ −1,
x_1^2 + x_2^2 ≤ 3,
−x_1^2 − x_2^2 ≤ −3,
x_1 ≥ 2,
x_2 ∈ R.

4. If a variable x can take any real value, it can be replaced by two non negative artificial variables denoted by x^+ and x^−, such that

x = x^+ − x^−.

We can impose x^+ ≥ 0 and x^− ≥ 0 without loss of generality. We apply this to the variable x_2 in our formulation:

− min x_1^2 − sin(x_2^+ − x_2^−)

subject to

−6x_1 + (x_2^+ − x_2^−)^2 ≤ −1,
x_1^2 + (x_2^+ − x_2^−)^2 ≤ 3,
−x_1^2 − (x_2^+ − x_2^−)^2 ≤ −3,
x_1 ≥ 2,
x_2^+ ≥ 0,
x_2^− ≥ 0.

5. In the presence of a constraint x ≥ a, with a ∈ R, a simple change of variable

x = x̂ + a

transforms the constraint into

x̂ ≥ 0.

We apply this last principle to the variable x_1, and we obtain:

− min (x̂_1 + 2)^2 − sin(x_2^+ − x_2^−)

subject to

−6(x̂_1 + 2) + (x_2^+ − x_2^−)^2 ≤ −1,
(x̂_1 + 2)^2 + (x_2^+ − x_2^−)^2 ≤ 3,
−(x̂_1 + 2)^2 − (x_2^+ − x_2^−)^2 ≤ −3,
x̂_1 ≥ 0,
x_2^+ ≥ 0,
x_2^− ≥ 0.

As requested, it is a minimization problem, all decision variables are non negative, and all constraints are defined by lower inequalities. Any solution (x̂_1, x_2^+, x_2^−) of this problem corresponds to the following solution of the original problem:

x_1 = x̂_1 + 2,
x_2 = x_2^+ − x_2^−.
Transformations
Michel Bierlaire

Practice quiz

The following optimization problem is not linear, because of the absolute value in the objective function. Transform it into a linear problem in which all decision variables must be non negative:

min |x_1 − x_2|

subject to

x_1 ≥ 0,
x_2 ≥ 0.
Transformations
Michel Bierlaire

Solution of the practice quiz

To solve this exercise, we remember that if a variable x can take any real value, it can be replaced by two non negative artificial variables denoted by x^+ and x^−, such that x = x^+ − x^−. We also recall that the absolute value of x is defined as

|x| = x if x ≥ 0, and |x| = −x if x < 0.

In our case, we have that

|x_1 − x_2| = x_1 − x_2 if x_1 ≥ x_2, and |x_1 − x_2| = x_2 − x_1 if x_1 < x_2.

Since x1 and x2 are non negative real numbers, the difference x1 − x2 can
take any real value.
Let us study the absolute value by examining the two cases:

• If x_1 − x_2 ≥ 0, we can define the non negative quantity

y^+ = x_1 − x_2 if x_1 − x_2 > 0, and y^+ = 0 otherwise.

• If x_1 − x_2 < 0, we can define the positive quantity

y^− = x_2 − x_1 if x_1 − x_2 < 0, and y^− = 0 otherwise.

Consequently, the absolute value of the difference can then be written as

|x_1 − x_2| = y^+ + y^−.

Furthermore, if we impose that y^+ ≥ 0 and y^− ≥ 0, our minimization problem can be written as

min_{x_1, x_2, y^+, y^−} (y^+ + y^−)

subject to

y^+ ≥ x_1 − x_2,
y^− ≥ x_2 − x_1,
x_1 ≥ 0,
x_2 ≥ 0,
y^+ ≥ 0,
y^− ≥ 0.
Note that this formulation is not strictly equivalent to the original one for all feasible solutions. For instance, the feasible solution x_1 = 0, x_2 = 0, y^+ = 1, y^− = 1 has objective value 2 in the transformed problem, and 0 in the original one.

But it is equivalent at the optimal solution. Denote the optimal solution of the original problem (x_1^*, x_2^*).

• If x_1^* − x_2^* > 0, the lowest possible value for y^+, denoted by (y^+)^*, is x_1^* − x_2^* (because of the constraint y^+ ≥ x_1 − x_2), and the lowest possible value for y^−, denoted by (y^−)^*, is 0 (because of the constraint y^− ≥ 0). Therefore, the objective function of the transformed problem at the optimal solution is

(y^+)^* + (y^−)^* = x_1^* − x_2^* + 0 = |x_1^* − x_2^*|.

• If x_1^* − x_2^* < 0, for similar reasons, we have (y^+)^* = 0 and (y^−)^* = x_2^* − x_1^*. Therefore, the objective function of the transformed problem at the optimal solution is

(y^+)^* + (y^−)^* = 0 + x_2^* − x_1^* = |x_1^* − x_2^*|.

• If x_1^* − x_2^* = 0, we have (y^+)^* = 0 and (y^−)^* = 0. Therefore, the objective function of the transformed problem at the optimal solution is

(y^+)^* + (y^−)^* = 0 + 0 = |x_1^* − x_2^*|.
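The transformed problem can be checked numerically. A minimal sketch, assuming SciPy is available (the variable ordering is ours):

```python
# Numeric check of the transformed problem.
# Variables ordered as (x1, x2, y_plus, y_minus).
import numpy as np
from scipy.optimize import linprog

c = np.array([0.0, 0.0, 1.0, 1.0])       # minimize y+ + y-
A_ub = np.array([
    [1.0, -1.0, -1.0, 0.0],              # x1 - x2 <= y+
    [-1.0, 1.0, 0.0, -1.0],              # x2 - x1 <= y-
])
b_ub = np.zeros(2)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(res.fun)   # 0.0: at the optimum, y+ = y- = 0 and |x1 - x2| = 0
```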
Problem definition
Michel Bierlaire

Practice quiz

Can a function have both a global and a local minimum? If so, provide
an example. If not, explain why.
Problem definition
Michel Bierlaire

Solution of the practice quiz

Yes. Any global minimum is also a local minimum. If the function is convex, a local minimum is also a global minimum. If not, there may be local minima that are not global. An example of such a function is

f(x) = −2x^2 + x^4 − x^3.

[Plot of f(x) for x ∈ [−3, 3].]

We use the optimality conditions to identify the maxima and minima of the function. The first derivative of f(x) is

f'(x) = −4x + 4x^3 − 3x^2.

If we solve f'(x) = 0, we obtain three solutions:

x_1 = 0, x_2 = −0.693, x_3 = 1.443.

These are the only candidates to be minima or maxima.


• Consider the interval [−0.1, 0.1]. x1 reaches the maximum of f in this
neighborhood. It is therefore a local maximum.

• Consider the interval [−0.7, −0.6]. x2 reaches the minimum of f in this


neighborhood. It is therefore a local minimum.

• Consider the interval [1.4, 1.5]. x3 reaches the minimum of f in this


neighborhood. It is therefore a local minimum.

This can also be verified using the second derivative of the function f(x):

f''(x) = −4 + 12x^2 − 6x.

Indeed, if we substitute the solutions x in the function f''(x) we obtain:

f''(x_1) = −4 + 12(0)^2 − 6(0) = −4 < 0 ⇒ local maximum,
f''(x_2) = −4 + 12(−0.693)^2 − 6(−0.693) = 5.92 > 0 ⇒ local minimum,
f''(x_3) = −4 + 12(1.443)^2 − 6(1.443) = 12.32 > 0 ⇒ local minimum.

Since the function is a polynomial, we have

lim_{x→+∞} f(x) = lim_{x→+∞} (−2x^2 + x^4 − x^3) = +∞,
lim_{x→−∞} f(x) = lim_{x→−∞} (−2x^2 + x^4 − x^3) = +∞.

Therefore, the function has no global maximum. Its global minimum is the local minimum associated with the lowest value of f. We can conclude that the function has

• a local maximum at x_1 = 0, with f(x_1) = 0,

• a local minimum at x_2 = −0.693, with f(x_2) ≈ −0.397, and

• a global minimum at x_3 = 1.443, with f(x_3) ≈ −2.833.
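These critical points are easy to verify numerically. A minimal sketch using NumPy only (the script is ours, not part of the original solution):

```python
# Numeric verification of the critical points of f(x) = -2x^2 + x^4 - x^3.
import numpy as np

fprime_coeffs = [4, -3, -4, 0]            # f'(x) = 4x^3 - 3x^2 - 4x, highest degree first
critical = np.roots(fprime_coeffs)
f = lambda x: -2 * x**2 + x**4 - x**3
fsecond = lambda x: 12 * x**2 - 6 * x - 4  # f''(x)

for x in sorted(critical):
    kind = 'local minimum' if fsecond(x) > 0 else 'local maximum'
    print(f"x = {x:.3f}, f(x) = {f(x):.3f} -> {kind}")
```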


Problem definition
Michel Bierlaire

Practice quiz

Consider the following objective functions,

1. f_1(x) = x^2.

2. f_2(x) = 1/|x|.

3. f_3(x) = 1/x.

For each function, provide its infimum on R, or show that it does not
exist. Provide also its minimum on R, or show that it does not exist.
Answer the same question if the decision variable is constrained as follows:

−1 ≤ x ≤ 2.
Problem definition
Michel Bierlaire

Solution of the practice quiz


We first plot the function f_1 to get an intuition.

[Plot of f_1(x) = x^2 for x ∈ [−3, 3].]

The infimum of x^2 on R is 0:

inf_{x∈R} x^2 = 0.

Indeed, for each M > 0, there exists

y = √M / 2

such that y^2 = M/4 < M. The function also has a minimum at x^* = 0, as

f_1(x^*) = inf_{x∈R} x^2 = 0.

When the constraints are introduced, the same arguments can be used to reach the same conclusions: the infimum is 0, and x^* = 0 is an optimum.
We now analyze the function f_2.

[Plot of f_2(x) = 1/|x| for x ∈ [−2, 4].]

The infimum of 1/|x| on R is 0:

inf_{x∈R} 1/|x| = 0.

Indeed, for each M > 0, there exists

y = 2/M

such that f_2(y) = M/2 < M. However, there is no minimum, as there is no x such that

1/|x| = 0.

The infimum of 1/|x| on Y = {x ∈ R | −1 ≤ x ≤ 2} is 0.5. Indeed, for each M > 0.5, there exists

y = 2 ∈ Y

such that f_2(y) = 0.5 < M. And the minimum is x^* = 2, as

f_2(x^*) = 0.5 = inf_{x∈Y} 1/|x|.

Finally, we analyze the function f_3.

[Plot of f_3(x) = 1/x for x ∈ [−4, 4].]

There is no infimum on R. Indeed, the function is not bounded from below. As the infimum is the best lower bound, and there is no lower bound, there is no infimum. Consequently, there is no optimum either. As the function is also not bounded from below on the interval [−1, 2], we reach the same conclusions for the constrained case.
Problem definition
Michel Bierlaire

Practice quiz

Does the following function f(x_1, x_2) : R^2 → R have a global maximum and a global minimum in the feasible set A?

f(x_1, x_2) = x_1^2 + x_2^2 − 2 x_1 x_2,

A = {(x_1, x_2) ∈ R^2 | (x_1 − 1)^2 + (x_2 − 2)^2 ≤ 1}.

Hint: use the Weierstrass extreme value theorem.


Problem definition
Michel Bierlaire

Solution of the practice quiz

The function f(x_1, x_2) = x_1^2 + x_2^2 − 2 x_1 x_2 is a continuous function, because it is the sum of continuous functions.

[3D plot of f over the plane.]

The set A = {(x_1, x_2) ∈ R^2 | (x_1 − 1)^2 + (x_2 − 2)^2 ≤ 1} is a compact set: it is a closed disk, hence closed and bounded.

[Plot of the disk A in the (x_1, x_2) plane.]

Weierstrass's theorem guarantees that the given function has both a maximum and a minimum when optimized on the feasible set A.
Convexity
Michel Bierlaire

Practice quiz

Are the following functions convex or concave? Justify your answer.

1. f(x) = 2x^2 − 3.

2. g(x) = x^3 − 5x^2 + 6x.


Convexity
Michel Bierlaire

Solution of the practice quiz

1. The function f is plotted in Figure 1. This figure gives us the intuition that the function is convex.

[Figure 1: f(x) = 2x^2 − 3 for x ∈ [−3, 3].]

We use the definition of convexity to formally show it. Let x, y ∈ R and λ ∈ [0, 1]. In order for f(x) to be convex, it must satisfy the following condition:

f(λx + (1−λ)y) ≤ λ f(x) + (1−λ) f(y).

We first write the inequality and use algebra to simplify it:

2(λx + (1−λ)y)^2 − 3 ≤ λ(2x^2 − 3) + (1−λ)(2y^2 − 3)
2λ^2 x^2 + 4λ(1−λ)xy + 2(1−λ)^2 y^2 − 3 ≤ 2λx^2 − 3λ + 2y^2 − 3 − 2λy^2 + 3λ
2λ^2 x^2 + 4λ(1−λ)xy + 2(1−λ)^2 y^2 ≤ 2λx^2 + 2y^2 − 2λy^2
2λ^2 x^2 + 4λ(1−λ)xy + 2y^2 − 4λy^2 + 2λ^2 y^2 ≤ 2λx^2 + 2y^2 − 2λy^2
2λ^2 x^2 + 4λ(1−λ)xy − 4λy^2 + 2λ^2 y^2 ≤ 2λx^2 − 2λy^2
2(λ^2 − λ)x^2 + 4λ(1−λ)xy − 2λy^2 + 2λ^2 y^2 ≤ 0
2(λ^2 − λ)(x^2 − 2xy + y^2) ≤ 0
2(λ^2 − λ)(x − y)^2 ≤ 0.

Since λ ∈ [0, 1], the quantity (λ^2 − λ) is not positive. As (x − y)^2 ≥ 0 for any x, y ∈ R, the final inequality is always true, which makes the starting inequality true as well. With this, we conclude that f(x) is a convex function.
2. The function is plotted in Figure 2. It gives us the intuition that the function is neither convex nor concave. We show it.

[Figure 2: g(x) = x^3 − 5x^2 + 6x for x ∈ [−1, 4].]

In order for g(x) to be convex, it must satisfy the following condition:


g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y),
for each x, y ∈ R and each λ ∈ [0, 1].
Similarly, in order for g(x) to be concave, it must satisfy the following
condition:
g(λx + (1 − λ)y) ≥ λg(x) + (1 − λ)g(y),
for each x, y ∈ R and each λ ∈ [0, 1].
To show that the function is not convex, and not concave, we need to
prove the existence of x, y and λ that violate the above inequalities.
Let us first rearrange the function in the following manner:

g(x) = x^3 − 5x^2 + 6x = x(x − 2)(x − 3).

From here, we see easily that g(x) = 0 for x1 = 0, x2 = 2, and x3 = 3.


Let us further observe the function values at the points x_1, x_2, and
λx_1 + (1 − λ)x_2, where λ = 0.5 for the sake of simplifying the proof:

g(x1 ) = g(0) = 0
g(x2 ) = g(2) = 0
g(λx1 + (1 − λ)x2 ) = g(1) = 2

Since
g(λx1 + (1 − λ)x2 ) > λg(x1 ) + (1 − λ)g(x2 ),
we can conclude that g(x) is not convex.
Let us further observe the function values at the points x_2, x_3, and
λx_2 + (1 − λ)x_3, where λ = 0.5, again for the sake of simplifying the
proof:

g(x2 ) = g(2) = 0
g(x3 ) = g(3) = 0
g(λx2 + (1 − λ)x3 ) = g(2.5) = −0.625

Since
g(λx2 + (1 − λ)x3 ) < λg(x2 ) + (1 − λ)g(x3 ),
we can conclude that g(x) is not concave either.
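The two counterexamples are easy to check numerically. A minimal sketch using NumPy only (ours, not part of the original solution):

```python
# Numeric check of the two counterexamples for g(x) = x^3 - 5x^2 + 6x.
g = lambda x: x**3 - 5 * x**2 + 6 * x
lam = 0.5

# Not convex: the function lies above the chord between 0 and 2 at the midpoint.
print(g(lam * 0 + (1 - lam) * 2), '>', lam * g(0) + (1 - lam) * g(2))   # 2.0 > 0.0

# Not concave: the function lies below the chord between 2 and 3 at the midpoint.
print(g(lam * 2 + (1 - lam) * 3), '<', lam * g(2) + (1 - lam) * g(3))   # -0.625 < 0.0
```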
Convexity
Michel Bierlaire

Practice quiz

Consider a function f : R → R of a single variable, concave on [ℓ1 , u1 ],


where ℓ1 < u1 . Consider also the function g of two variables, defined by
g(x1 , x2 ) = f (x1 ) on A = [ℓ1 , u1 ] × [ℓ2 , u2 ], where ℓ2 < u2 . Show that g is
concave on A.
Remember that g is concave on a convex set A if

g((1 − λ)x + λy) ≥ (1 − λ)g(x) + λg(y)

for all x, y ∈ A and for all λ ∈ [0, 1].


Convexity
Michel Bierlaire

Solution of the practice quiz

Note first that the set A = [ℓ1 , u1 ] × [ℓ2 , u2 ] is non empty and convex.
Consider x and y in A and λ ∈ [0, 1]. By convexity of A, the point z =
(1 − λ)x + λy belongs to A.
As f is concave, we have

g(z) = f(z_1)
     = f((1−λ)x_1 + λy_1)
     ≥ (1−λ) f(x_1) + λ f(y_1)
     = (1−λ) g(x_1, x_2) + λ g(y_1, y_2)
     = (1−λ) g(x) + λ g(y),

which proves the concavity of g.


Differentiability: The first order
Michel Bierlaire

Practice quiz

Calculate the gradient of the following function:

f : R^3 → R : f(x_1, x_2, x_3) = e^{x_1} + x_1^2 x_3 − x_1 x_2 x_3.

What is the directional derivative along the direction d at a point (x_1, x_2, x_3), where

d = (d_1, d_2, d_3)^T ?
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

If f : R^n → R is a differentiable function, the function ∇f(x) : R^n → R^n is called the gradient of f and is defined as

∇f(x_1, x_2, x_3) = (∂f/∂x_1, ∂f/∂x_2, ∂f/∂x_3)^T.

Therefore, the gradient of the function is

∇f(x_1, x_2, x_3) = (e^{x_1} + 2 x_1 x_3 − x_2 x_3, −x_1 x_3, x_1^2 − x_1 x_2)^T.

Let f : R^n → R be a differentiable function. Consider x ∈ R^n and d ∈ R^n. The directional derivative of f at x in the direction d is given by

lim_{α→0} (f(x + αd) − f(x)) / α,

if the limit exists. Here, we have

f(x + αd) − f(x) = e^{x_1 + αd_1} + (x_1 + αd_1)^2 (x_3 + αd_3)
                   − (x_1 + αd_1)(x_2 + αd_2)(x_3 + αd_3)
                   − (e^{x_1} + x_1^2 x_3 − x_1 x_2 x_3)
                 = e^{x_1 + αd_1} − e^{x_1} + α x_1^2 d_3 + α^2 x_3 d_1^2 + α^3 d_1^2 d_3 + 2α x_1 x_3 d_1
                   + 2α^2 x_1 d_1 d_3 − α x_1 x_2 d_3 − α x_1 x_3 d_2 − α^2 x_1 d_2 d_3
                   − α x_2 x_3 d_1 − α^2 x_2 d_1 d_3 − α^2 x_3 d_1 d_2 − α^3 d_1 d_2 d_3.

Therefore,

lim_{α→0} (f(x + αd) − f(x)) / α = lim_{α→0} (e^{x_1 + αd_1} − e^{x_1}) / α
                                   + x_1^2 d_3 + 2 x_1 x_3 d_1 − x_1 x_2 d_3 − x_1 x_3 d_2 − x_2 x_3 d_1
                                 = d_1 e^{x_1} + x_1^2 d_3 + 2 x_1 x_3 d_1 − x_1 x_2 d_3 − x_1 x_3 d_2 − x_2 x_3 d_1
                                 = d_1 (e^{x_1} + 2 x_1 x_3 − x_2 x_3) − x_1 x_3 d_2 + (x_1^2 − x_1 x_2) d_3.

In addition, when the gradient exists, the directional derivative is the inner product between the gradient of f and the direction d, that is,

∇f(x)^T d.

Using this formula, the directional derivative is

∇f(x_1, x_2, x_3)^T d = d_1 (e^{x_1} + 2 x_1 x_3 − x_2 x_3) − d_2 x_1 x_3 + d_3 (x_1^2 − x_1 x_2).


You can verify that it is the same result as above.
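One can also verify this numerically with a finite difference. A minimal sketch, using NumPy only; the point and direction below are arbitrary choices of ours:

```python
# Finite-difference check of the gradient and the directional derivative.
import numpy as np

def f(x):
    return np.exp(x[0]) + x[0]**2 * x[2] - x[0] * x[1] * x[2]

def gradient(x):
    return np.array([np.exp(x[0]) + 2 * x[0] * x[2] - x[1] * x[2],
                     -x[0] * x[2],
                     x[0]**2 - x[0] * x[1]])

x = np.array([0.5, -1.0, 2.0])
d = np.array([1.0, 2.0, -1.0])
alpha = 1e-7
numeric = (f(x + alpha * d) - f(x)) / alpha    # definition of the directional derivative
analytic = gradient(x) @ d                     # inner product formula
print(numeric, analytic)                       # the two values should be close
```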
Differentiability: The first order
Michel Bierlaire

Practice quiz

Calculate the steepest descent direction of the following function

f(x) = (1/2) x_1^2 + 2 x_2^2,

at x = (1, 1). What is the directional derivative of the function at x in the steepest descent direction?
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

The steepest descent direction is the direction opposite to the gradient. We have

∇f(x) = (∂f/∂x_1, ∂f/∂x_2)^T = (x_1, 4 x_2)^T.

At the point x = (1, 1)^T, we have

∇f(x) = (1, 4)^T.

Therefore, the steepest descent direction is

d = (−1, −4)^T.

The directional derivative is

−∇f(x)^T ∇f(x) = (−1, −4)(1, 4)^T = −17.
Differentiability: first order
Michel Bierlaire

Practice quiz

Consider the function f : R^2 → R defined as

f(x) = (1/2) x_1^2 + 2 x_2^2.

1. Implement in Python a function that takes x as argument and returns the value of the function and its gradient (see the sketch after this list).

2. Evaluate the function at

x = (1, 1)^T.

3. Plot the function for x_1 and x_2 ranging from -6 to 6.

4. Consider the three following directions:

d_1 = −∇f(x), d_2 = (−1, −1)^T, d_3 = (1, −3)^T.

Plot the functions

g_i(α) = f(x + α d_i), i = 1, 2, 3,

for α ∈ [0, 1].

5. Calculate the directional derivatives of f at x along each direction.
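A minimal Python sketch addressing items 1, 2, 4 and 5 (assuming NumPy and Matplotlib are available; item 3 is a standard surface plot and is omitted here):

```python
import numpy as np
import matplotlib.pyplot as plt

def f_and_grad(x):
    value = 0.5 * x[0]**2 + 2 * x[1]**2
    grad = np.array([x[0], 4 * x[1]])
    return value, grad

x = np.array([1.0, 1.0])
value, grad = f_and_grad(x)
print(value, grad)                       # 2.5 and [1. 4.]

directions = [-grad, np.array([-1.0, -1.0]), np.array([1.0, -3.0])]
alphas = np.linspace(0, 1, 100)
for i, d in enumerate(directions, 1):
    g = [f_and_grad(x + a * d)[0] for a in alphas]
    plt.plot(alphas, g, label=f'g{i}')
    print(f'Directional derivative along d{i}:', grad @ d)  # -17, -5, -11
plt.legend()
plt.show()
```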


Differentiability: The first order
Michel Bierlaire

Practice quiz

Calculate the Jacobian matrix of the following function:

f : R^2 → R^3 : f(x_1, x_2) = ( x_1^2 x_2, cos(e^{x_1 + x_2^2}), x_2 )^T.
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

The Jacobian matrix is the transpose of the gradient matrix. In the gradient matrix, each row corresponds to a variable, and each column to a function. For the Jacobian matrix, it is the opposite. Here, as we have 2 variables and 3 functions, we have

∇f(x) ∈ R^{2×3}, J(x) ∈ R^{3×2}.

The gradient matrix is

∇f(x) = [ ∂f_1/∂x_1  ∂f_2/∂x_1  ∂f_3/∂x_1 ]
        [ ∂f_1/∂x_2  ∂f_2/∂x_2  ∂f_3/∂x_2 ]
      = [ 2 x_1 x_2   − sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}          0 ]
        [ x_1^2       −2 x_2 sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}     1 ].

The Jacobian matrix is

J(x) = ∇f(x)^T = [ 2 x_1 x_2                                       x_1^2                                          ]
                 [ − sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}           −2 x_2 sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}   ]
                 [ 0                                                1                                              ].
Differentiability: The first order
Michel Bierlaire

Practice quiz

The depth of a lake (in meters) at coordinates (x_1, x_2) is given by the function:

f(x_1, x_2) = 400 − 3 x_1^2 x_2^2,

where the coordinate system is also in meters. If a swimmer is located in the middle of the lake, at coordinates (1, −2), determine the direction d that she needs to swim in order to make the depth increase as fast as possible. Provide d such that its norm is 1. If she swims a distance of one meter in that direction, what is the depth of the lake at her new position? What if she swims a distance of eight meters?
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

The function

f(x_1, x_2) = 400 − 3 x_1^2 x_2^2

is differentiable at (1, −2)^T. Therefore the gradient is the direction of steepest ascent. We have

∇f(x_1, x_2) = (−6 x_1 x_2^2, −6 x_1^2 x_2)^T,

and

∇f(1, −2) = (−24, 12)^T.

As ‖∇f(1, −2)‖ = 12√5, the normalized direction in which the depth increases as fast as possible is then

d = (−2/√5, 1/√5)^T.

If she swims one meter along the direction, her new position is

x^+ = (1 − 2/√5, −2 + 1/√5)^T,

and the depth at this position is

f(x^+) = 400 − 3 (269 − 120√5)/25 ≈ 399.92.

If she swims a distance of 8 meters, her new position is

x^{++} = (1 − 16/√5, −2 + 8/√5)^T,

and the depth is

f(x^{++}) ≈ 117.

As the depth at the current position is

f(x) = 388,

this illustrates that following an ascent direction increases the value of the function, but only up to a point. If the step along the direction is too long, the value of the function may actually decrease. In this example, the step should be less than 5.59 in order to obtain an increase of the value of the function.
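These numbers are quick to check. A minimal sketch using NumPy only (ours, not part of the original solution):

```python
# Numeric check of the swimmer example.
import numpy as np

f = lambda x: 400 - 3 * x[0]**2 * x[1]**2
x = np.array([1.0, -2.0])
grad = np.array([-6 * x[0] * x[1]**2, -6 * x[0]**2 * x[1]])   # (-24, 12)
d = grad / np.linalg.norm(grad)                               # (-2/sqrt(5), 1/sqrt(5))

print(f(x))          # 388
print(f(x + d))      # approx 399.92, after one meter
print(f(x + 8 * d))  # approx 117, after eight meters: the step is too long
```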
Differentiability: the second order
Michel Bierlaire

Practice quiz

For each of the following functions,

1. calculate the gradient,

2. calculate the Hessian,

3. check if the function is convex, concave or neither,

4. calculate the curvature of the function in the direction d at the specified point x^+.

Hint: remember that a symmetric matrix is positive definite if all its eigenvalues are positive.

f(x) = (1/2) x_1^2 + (9/2) x_2^2,
x^+ = (0, 0)^T,
d = (1, 1)^T.

g(x) = x_1^3/3 + x_2^3 − x_1 − x_2,
x^+ = (9, 1)^T,
d = (9, 1)^T.
Differentiability: the second order
Michel Bierlaire

Solution of the practice quiz

f(x) = (1/2) x_1^2 + (9/2) x_2^2.

1. The gradient is

∇f(x_1, x_2) = (∂f/∂x_1, ∂f/∂x_2)^T = (x_1, 9 x_2)^T.

2. The Hessian is

∇^2 f(x_1, x_2) = [ 1 0 ; 0 9 ].

3. The Hessian is constant, and positive definite. Indeed, as it is a diagonal matrix, its diagonal entries are also its eigenvalues. As they are both positive, the matrix is positive definite. Consequently, the function f is convex everywhere.

4. The curvature of the function in the direction d = (1, 1)^T at x^+ = (0, 0)^T is obtained as

d^T ∇^2 f(x^+) d / d^T d = 10 / 2 = 5.

g(x) = x_1^3/3 + x_2^3 − x_1 − x_2.

1. The gradient is

∇g(x_1, x_2) = (x_1^2 − 1, 3 x_2^2 − 1)^T.

2. The Hessian is

∇^2 g(x_1, x_2) = [ 2 x_1 0 ; 0 6 x_2 ].

3. Consider the point x_a = (1, 1)^T. At that point, the Hessian is positive definite and, therefore, the function is convex at x_a. If you now consider the point x_b = (−1, −1)^T, the Hessian is negative definite, and the function is concave at x_b. Therefore, the function itself is neither a convex function nor a concave function.

4. The curvature of the function in the direction d = (9, 1)^T at the specified point x^+ = (9, 1)^T is obtained as

d^T ∇^2 g(x^+) d / d^T d = 1464 / 82 ≈ 17.85.
Differentiability: second order
Michel Bierlaire

Practice quiz
Consider f : R^2 → R defined as

f(x) = x_1^3/3 + x_2^3 − x_1 − x_2.
1. Implement in Python a function that takes x as argument and returns the value of the function, its gradient and its second derivatives matrix (see the sketch after this list).
2. Evaluate the function, its gradient and second derivatives matrix at

x = (9, 1)^T.

3. Plot the function for x1 and x2 ranging from -5 to 5.


4. Consider the following direction:
 
−1
d= .
−1
Plot the uni-dimensional function
g(α) = f (x + αd),
for α ∈ [0, 10].
5. Calculate the directional derivatives of f at x along the direction.
6. Calculate the curvature of f at x along the direction.
7. Calculate the eigenvalues and the eigenvectors of the matrix ∇2 f (x).
8. Calculate the curvature of f at x along the eigenvectors.
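A minimal Python sketch covering items 1, 2, 5, 6, 7 and 8, using NumPy only (the plotting items follow the same pattern as the earlier quiz):

```python
import numpy as np

def f_grad_hess(x):
    value = x[0]**3 / 3 + x[1]**3 - x[0] - x[1]
    grad = np.array([x[0]**2 - 1, 3 * x[1]**2 - 1])
    hess = np.array([[2 * x[0], 0.0], [0.0, 6 * x[1]]])
    return value, grad, hess

x = np.array([9.0, 1.0])
value, grad, hess = f_grad_hess(x)
d = np.array([-1.0, -1.0])

print(value, grad, hess)
print('Directional derivative:', grad @ d)              # -82
print('Curvature along d:', d @ hess @ d / (d @ d))     # 12
eigval, eigvec = np.linalg.eigh(hess)
print('Eigenvalues:', eigval)                           # [6, 18]
for i in range(2):
    u = eigvec[:, i]
    print('Curvature along eigenvector:', u @ hess @ u) # equals the eigenvalue
```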
Linearity
Michel Bierlaire

Practice quiz

Consider the function

f(x) = x^2/100 − 2x + 1.

Provide a Lipschitz constant for the derivative of this function.
Linearity
Michel Bierlaire

Solution of the practice quiz

We have

f(x) = x^2/100 − 2x + 1,
f'(x) = 2x/100 − 2,
f''(x) = 2/100.

We need to find M such that

|f'(x) − f'(y)| ≤ M |x − y|.

We have

|f'(x) − f'(y)| = |2x/100 − 2y/100| = (2/100) |x − y|.

Therefore, any M such that

M ≥ 2/100

is a Lipschitz constant.
Conditioning
Michel Bierlaire

Practice quiz
Consider the following quadratic function:

f(x_1, x_2) = 2 x_1^2 + 9 x_2^2.    (1)

1. Calculate the condition number of the Hessian of f at the point (x_1, x_2).

2. Apply the change of variables x' = M x, where

M = [ 2 0 ; 0 3√2 ],

that is,

x'_1 = 2 x_1,
x'_2 = 3√2 x_2.

Consider the function

f̃(x') = f(M^{-1} x').

Calculate the condition number of the Hessian (using the 2-norm) of f̃ at the point (x'_1, x'_2).

Remember that, if A ∈ R^{n×n} is a non singular symmetric matrix, then the condition number of A is

κ(A) = ‖A‖ ‖A^{-1}‖.    (2)

The 2-norm of a matrix is its largest singular value. Therefore, in this case,

κ_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = σ_1 / σ_n,    (3)

where σ_1 is the largest singular value of A and σ_n is the smallest. By extension, the condition number of a singular matrix (i.e., such that σ_n = 0) is +∞. If A is symmetric positive semidefinite, the singular values of A are its eigenvalues.
Conditioning
Michel Bierlaire

Solution of the practice quiz

We consider

f(x_1, x_2) = 2 x_1^2 + 9 x_2^2.

The second derivative matrix is

∇^2 f(x_1, x_2) = [ 4 0 ; 0 18 ].

1. The matrix is diagonal. Therefore, its eigenvalues are obtained directly as the diagonal entries. Consequently, the condition number is

κ_2(A) = 18/4 = 9/2,

for any x ∈ R^2.

2. We now apply the change of variables

x'_1 = 2 x_1, x'_2 = 3√2 x_2, that is, x_1 = x'_1 / 2, x_2 = (√2/6) x'_2.

We obtain

f̃(x'_1, x'_2) = f(x'_1 / 2, (√2/6) x'_2)
             = 2 (x'_1 / 2)^2 + 9 ((√2/6) x'_2)^2
             = (1/2) x'_1^2 + (1/2) x'_2^2.

The Hessian of f̃ is the identity matrix:

∇^2 f̃(x') = [ 1 0 ; 0 1 ].

Therefore, the two eigenvalues are equal to 1, and the condition number is 1, for any x' ∈ R^2.
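Both condition numbers can be checked with NumPy. A minimal sketch (ours, not part of the original solution):

```python
# Condition numbers before and after the change of variables.
import numpy as np

H = np.array([[4.0, 0.0], [0.0, 18.0]])              # Hessian of f
M = np.array([[2.0, 0.0], [0.0, 3.0 * np.sqrt(2.0)]])
Minv = np.linalg.inv(M)
H_tilde = Minv.T @ H @ Minv                          # Hessian of f(M^{-1} x')

print(np.linalg.cond(H, 2))        # 4.5
print(np.linalg.cond(H_tilde, 2))  # 1.0
```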
Necessary optimality conditions
Michel Bierlaire

Practice quiz
Consider the function

f(x_1, x_2) = 50 x_1^2 − x_2^3

illustrated in the figure below.

[3D plot of f for x_1, x_2 ∈ [−10, 10].]

1. Show that the point

x^* = (0, 0)^T

satisfies the necessary optimality conditions.

2. Show that this point is neither a local minimum nor a local maximum. Hint: consider the directions

d_1 = (0, 1)^T and d_2 = (0, −1)^T.
Necessary optimality conditions
Michel Bierlaire

Solution of the practice quiz

1. The gradient is

∇f(x) = (100 x_1, −3 x_2^2)^T, ∇f(x^*) = (0, 0)^T.

Therefore, the first order necessary condition is verified. The second derivative matrix is

∇^2 f(x) = [ 100 0 ; 0 −6 x_2 ], ∇^2 f(x^*) = [ 100 0 ; 0 0 ].

It is positive semidefinite. Indeed, the eigenvalues are read on the diagonal, as it is a diagonal matrix. And they are non negative. Therefore, the second order necessary condition is verified.

2. Note that the value of the objective function at x^* is

f(x^*) = 0.

Consider the direction

d_1 = (0, 1)^T

and the point x_1 obtained by following this direction from x^* with a step α:

x_1 = x^* + α d_1 = (0, 0)^T + α (0, 1)^T = (0, α)^T.

The value of the objective function at x_1 is

f(x_1) = −α^3.

Therefore, for any α > 0, we have

f(x_1) < f(x^*),

and x^* cannot be a local minimum.

Consider now the direction

d_2 = (0, −1)^T

and the point x_2 obtained by following this direction from x^* with a step α:

x_2 = x^* + α d_2 = (0, 0)^T + α (0, −1)^T = (0, −α)^T.

The value of the objective function at x_2 is

f(x_2) = α^3.

Therefore, for any α > 0, we have

f(x_2) > f(x^*),

and x^* cannot be a local maximum.


Necessary optimality conditions
Michel Bierlaire

Practice quiz

Consider the affine function

f : R^n → R : x → c^T x + b = Σ_{i=1}^n c_i x_i + b,

where c ∈ R^n and b ∈ R. For what values of x, b and c are the necessary optimality conditions verified?
Necessary optimality conditions
Michel Bierlaire

Solution of the practice quiz

The gradient of f is

∇f(x) = (∂f/∂x_1, …, ∂f/∂x_n)^T = (c_1, …, c_n)^T = c,

for any x ∈ R^n. As it is a constant vector, the second derivative matrix is zero for any x ∈ R^n:

∇^2 f(x) = 0.

We first note that the value of b is irrelevant to determine the necessary optimality conditions. Adding a constant to an objective function does not change the optima. If c = 0, that is, if

c_1 = … = c_n = 0,

then the necessary optimality conditions are verified for all x ∈ R^n. If c ≠ 0, that is, if there is at least one i such that c_i ≠ 0, then the first order necessary optimality conditions are not verified. The second order necessary conditions are always verified, as the null matrix is positive semidefinite.

Note that, if c = 0, the linear function is constant, with value b. And any x ∈ R^n is a (global) optimum. If not, the function is not constant. As there is no constraint, it is not bounded and there is no optimum.
Sufficient optimality conditions
Michel Bierlaire

Practice quiz

Consider the function

f(x_1, x_2) = (1/2) x_1^2 + x_1 cos x_2

illustrated in the figures below.

[3D plot and contour plot of f.]

Use the optimality conditions to identify the minima of this function.


Sufficient optimality conditions
Michel Bierlaire

Solution of the practice quiz

The gradient of the function is

∇f(x_1, x_2) = (x_1 + cos x_2, −x_1 sin x_2)^T.

This gradient is zero for

x^*_k = ((−1)^{k+1}, kπ)^T, k ∈ Z,

and for

x̄_k = (0, π/2 + kπ)^T, k ∈ Z.

[Contour plot of f showing the points x^*_k and x̄_k.]

The second derivative matrix is

∇^2 f(x_1, x_2) = [ 1 −sin x_2 ; −sin x_2 −x_1 cos x_2 ].

By evaluating this matrix at x^*_k, we obtain for any k ∈ Z

∇^2 f(x^*_k) = [ 1 0 ; 0 1 ].

Since this matrix is positive definite, each point x^*_k satisfies the sufficient optimality conditions and is a local minimum of the function.

By evaluating the second derivative matrix at x̄_k, we obtain for any k ∈ Z

∇^2 f(x̄_k) = [ 1 (−1)^{k+1} ; (−1)^{k+1} 0 ].

Regardless of k, this matrix is not positive semidefinite. Indeed, if k is even,

∇^2 f(x̄_k) = [ 1 −1 ; −1 0 ].

If k is odd,

∇^2 f(x̄_k) = [ 1 1 ; 1 0 ].

In both cases, the eigenvalues are −0.61803 and 1.61803. Therefore, no x̄_k satisfies the necessary optimality conditions. None of them can then be a local minimum.
Sufficient optimality conditions
Michel Bierlaire

Practice quiz

Consider the function

f(x_1, x_2) = 2 x_1^2 + 3 x_2^2 + x_1 − x_2 + 3

illustrated in the figures below.

[3D plot and contour plot of f.]

Show that

x^* = (−1/4, 1/6)^T

is a global minimum of f. Is it the only one?


Sufficient optimality conditions
Michel Bierlaire

Solution of the practice quiz

First, we show that the point

x^* = (−1/4, 1/6)^T

is a local minimum of the function f. The gradient of the function is

∇f(x_1, x_2) = (4 x_1 + 1, 6 x_2 − 1)^T,

which is zero at x^*.

[Contour plot of f showing x^*.]

The second derivative matrix is

∇^2 f(x_1, x_2) = [ 4 0 ; 0 6 ],

which is positive definite at x^*. Therefore, the point

x^* = (−1/4, 1/6)^T

satisfies the sufficient optimality conditions and is a local minimum.

As the second derivative matrix is positive definite for all x ∈ R^2, f is strictly convex. Therefore, x^* is the unique global minimum of the function. Thus, x^* = (−1/4, 1/6)^T is the only global minimum of f.
Quadratic functions
Michel Bierlaire

Practice quiz

Consider a quadratic function

f(x) = (1/2) x^T Q x + g^T x + c,

where Q = [ 1 α ; α 1 ], g ∈ R^2 and c ∈ R.

1. For which values of α does f not have a local minimum?

2. For which values of α does f have a unique global minimum?

3. What are the conditions on g and α so that the problem has an infinite number of global minima?
Quadratic functions
Michel Bierlaire

Solution of the practice quiz

The optimality conditions for quadratic functions are related to the positive definiteness of the (constant) second derivative matrix. The derivatives of the objective function are

∇f(x) = Qx + g, ∇^2 f(x) = Q.

The positive definiteness of Q can be identified from its eigenvalues, which are the roots of the characteristic polynomial. Therefore, we solve the equation

det(λI − Q) = (λ − 1)^2 − α^2 = λ^2 − 2λ + (1 − α^2) = (1 − α − λ)(1 + α − λ) = 0.

Therefore, we obtain the eigenvalues:

λ_1 = 1 − α,
λ_2 = 1 + α.

The corresponding (normalized) eigenvectors are

u_1 = (√2/2, −√2/2)^T, as Q u_1 = λ_1 u_1,
u_2 = (√2/2, √2/2)^T, as Q u_2 = λ_2 u_2.
1. For which values of α does f not have a local minimum? According to the theory, it happens when Q is not positive semidefinite, that is, when at least one eigenvalue is negative. This is the case if α > 1 (so that λ_1 < 0), or α < −1 (so that λ_2 < 0).
2. For which values of α does f have a unique global minimum? According to the theory, it happens when Q is positive definite. This is the case if −1 < α < 1, so that λ_1 > 0 and λ_2 > 0.
3. What are the conditions on g and α so that the problem has an infinite
number of global minima? According to the theory, it happens when
Q is positive semidefinite but not positive definite. It means that no
eigenvalue is negative, and at least one of them is zero. This is the case
if α = 1, so that λ1 = 0 and λ2 = 2, and if α = −1, so that λ1 = 2 and
λ2 = 0.
Geometrically, we can decompose the space into two subspaces. In the
subspace spanned by the eigenvectors of the positive eigenvalues, the
function is strictly convex. In this subspace, there is a unique minimum.
In the subspace spanned by the eigenvectors of the zero eigenvalues, the
function is linear, as there is no curvature. Therefore, it is bounded
only if it is constant. And, in that case, there is an infinite number of
minima.
Algebraically, we use the Schur decomposition of Q:

Q = U Λ U^T = [ √2/2 √2/2 ; −√2/2 √2/2 ] [ 1−α 0 ; 0 1+α ] [ √2/2 −√2/2 ; √2/2 √2/2 ],

where U is an orthogonal matrix composed of the eigenvectors of Q organized in columns.
Let's use the matrix U to decompose x and g:

U^T x = x' = (x'_1, x'_2)^T and U^T g = g' = (g'_1, g'_2)^T,

that is,

x' = (√2/2) (x_1 − x_2, x_1 + x_2)^T and g' = (√2/2) (g_1 − g_2, g_1 + g_2)^T.
We use the Schur decomposition, and the fact that the matrix U is orthogonal, so that U U^T = I, to rewrite the objective function as follows:

f(x) = (1/2) x^T Q x + g^T x + c
     = (1/2) x^T U Λ U^T x + g^T U U^T x + c.

Using the decomposition of the vectors, we obtain the function in the new variables:

f̃(x') = (1/2) x'^T Λ x' + g'^T x' + c
      = (1/2) λ_1 (x'_1)^2 + (1/2) λ_2 (x'_2)^2 + g'_1 x'_1 + g'_2 x'_2 + c.

The gradient of this function is

∇f̃(x') = (λ_1 x'_1 + g'_1, λ_2 x'_2 + g'_2)^T.

We see that, for the gradient to be zero (which is a necessary condition for optimality), we need

x'_i = −g'_i / λ_i if λ_i ≠ 0,
g'_i = 0 if λ_i = 0.

Now, we consider the two values of α identified above.

• If α = 1, then λ_1 = 0 and λ_2 = 2, and we need

g'_1 = (√2/2)(g_1 − g_2) = 0 and x'_2 = (√2/2)(x_1 + x_2) = −g'_2 / 2.

It means that the function must be such that g_1 = g_2. If so, any solution such that

x_1 = −g_1 − x_2

is a global minimum.

• If α = −1, then λ_1 = 2 and λ_2 = 0, and we need

x'_1 = (√2/2)(x_1 − x_2) = −g'_1 / 2 and g'_2 = (√2/2)(g_1 + g_2) = 0.

It means that the function must be such that g_1 = −g_2. If so, any solution such that

x_1 = −g_1 + x_2

is a global minimum.

It can easily be verified that the gradient of the function,

∇f(x) = Qx + g = (x_1 + α x_2 + g_1, α x_1 + x_2 + g_2)^T,

is indeed zero under these conditions.


Quadratic functions
Michel Bierlaire

Practice quiz

Consider the function

f(x_1, x_2) = (1/2)(x_1 − x_2)^2 + 3x_1 − 5,

which is illustrated in the figure below.

[3D plot of f for x_1, x_2 ∈ [−5, 5].]

1. Write the function in quadratic form.

2. Does the function have any local minimum?


Quadratic functions
Michel Bierlaire

Solution of the practice quiz

To write f in quadratic form

f(x) = (1/2) x^T Q x + g^T x + c,

we calculate its derivatives:

∇f(x) = (x_1 − x_2 + 3, −x_1 + x_2)^T and ∇^2 f(x) = [ 1 −1 ; −1 1 ].

Therefore,

Q = [ 1 −1 ; −1 1 ], g = (3, 0)^T, c = −5.

The positive definiteness of Q can be identified from its eigenvalues, which are the roots of the characteristic polynomial. Therefore, we solve the equation

det(λI − Q) = (λ − 1)^2 − 1 = λ^2 − 2λ = λ(λ − 2) = 0.

Therefore, the eigenvalues of Q are λ_1 = 2 and λ_2 = 0. We deduce that Q is positive semidefinite. Thus, either the problem is not bounded or there is an infinite number of global minima.

In order to find in which case we are, we consider the Schur decomposition of Q. To do so, we first have to determine the eigenvectors associated with each eigenvalue.

• For λ_1 we have

(Q − λ_1 I) x = [ −1 −1 ; −1 −1 ] (x_1, x_2)^T = (0, 0)^T,

which implies that the normalized eigenvector is

x_{λ_1} = (√2/2) (1, −1)^T.

• For λ_2 we have

(Q − λ_2 I) x = [ 1 −1 ; −1 1 ] (x_1, x_2)^T = (0, 0)^T,

which implies that the normalized eigenvector is

x_{λ_2} = (√2/2) (1, 1)^T.

Thus, we can write

Q = U Λ U^T = [ √2/2 √2/2 ; −√2/2 √2/2 ] [ 2 0 ; 0 0 ] [ √2/2 √2/2 ; −√2/2 √2/2 ]^T.

We have

U^T g = (√2/2) (3, 3)^T.

As the second entry of this vector, corresponding to the zero eigenvalue, is not zero, we can conclude that the problem is unbounded.
Solving equations: Newton with one variable
Michel Bierlaire

Practice quiz

Implement in Python Newton's algorithm to find the root of one equation with one unknown (a sketch follows the list below). More precisely, for a given F : R → R, find x^* such that

|F(x^*)| ≤ ε = 10^{−15}.

Use this implementation to find the root of the following functions.

1. F(x) = x^2 − 2 = 0, with x_0 = 2.

2. F(x) = x − sin(x) = 0, with x_0 = 1.

3. F(x) = arctan(x) = 0, with x_0 = 1.5.
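A minimal sketch of such an implementation, assuming the derivative is provided by the caller:

```python
def newton(F, dF, x, eps=1e-15, maxiter=100):
    """Find x such that |F(x)| <= eps, starting from x."""
    for _ in range(maxiter):
        if abs(F(x)) <= eps:
            return x
        x = x - F(x) / dF(x)     # Newton iteration
    return x   # may not have converged (e.g. arctan diverges from 1.5)

import math
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 2.0))                  # sqrt(2)
print(newton(lambda x: x - math.sin(x), lambda x: 1 - math.cos(x), 1.0))
print(newton(lambda x: math.atan(x), lambda x: 1 / (1 + x**2), 1.5))
```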


Solving equations: Newton with several
variables
Michel Bierlaire

Practice quiz

Newton's method is easily generalized to solve systems of equations with several variables. We want to find x such that F(x) = 0, where F : R^n → R^n. If J(x) is the Jacobian of F, the method consists in solving, at each iteration, the system of linear equations

J(x_k) d_{k+1} = −F(x_k),

and updating the iterate:

x_{k+1} ← x_k + d_{k+1}.

Remember that the element in row i and column j of the Jacobian matrix is

∂F_i / ∂x_j.
1. Implement in Python Newton's algorithm to find the root of a system of n equations with n unknowns. More precisely, for a given F : R^n → R^n, find x^* such that

‖F(x^*)‖ ≤ ε = 10^{−15}.

2. Apply it to the following system of equations, using

x_0 = (−2, −2)^T

as a starting point:

F(x) = (x_1^3 − 3 x_1 x_2^2 − 1, x_2^3 − 3 x_1^2 x_2)^T = 0.

3. The above system of equations happens to have three roots:

x^*(b) = (1, 0)^T, x^*(g) = (−1/2, √3/2)^T, x^*(w) = (−1/2, −√3/2)^T.

It is actually difficult to predict toward which root the algorithm will converge. To visualize the process, generate a plot using the following convention (a sketch of the basic solver follows this list):

• if Newton's method, when starting from the point x_0, converges toward the solution x^*(b), the point x_0 is colored in red;

• if Newton's method, when starting from the point x_0, converges toward the solution x^*(g), the point x_0 is colored in blue;

• if Newton's method, when starting from the point x_0, converges toward the solution x^*(w), the point x_0 is colored in yellow.
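A minimal sketch of the solver using NumPy; the fractal plot of item 3 can be built by calling it on a grid of starting points:

```python
import numpy as np

def newton_system(F, J, x, eps=1e-15, maxiter=100):
    """Find x such that ||F(x)|| <= eps, starting from x."""
    for _ in range(maxiter):
        if np.linalg.norm(F(x)) <= eps:
            return x
        d = np.linalg.solve(J(x), -F(x))   # solve J(x_k) d = -F(x_k)
        x = x + d
    return x

F = lambda x: np.array([x[0]**3 - 3 * x[0] * x[1]**2 - 1,
                        x[1]**3 - 3 * x[0]**2 * x[1]])
J = lambda x: np.array([[3 * x[0]**2 - 3 * x[1]**2, -6 * x[0] * x[1]],
                        [-6 * x[0] * x[1], 3 * x[1]**2 - 3 * x[0]**2]])
print(newton_system(F, J, np.array([-2.0, -2.0])))   # one of the three roots
```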
Newton’s local method
Michel Bierlaire

Practice quiz

1. Implement in Python Newton's local algorithm to find the minimum of a function of n variables. More precisely, for a given f : R^n → R, find x^* such that

‖∇f(x^*)‖ ≤ ε = 10^{−15}.

2. Apply it to the following function:

f(x_1, x_2) = 2x_1^3 + 6x_1 x_2^2 − 3x_2^3 − 150x_1,

using the following starting points:

x_0 = (6.2, −3.4)^T and x_0 = (2.8, 4.3)^T.

3. For each case, determine whether the algorithm indeed found a minimum or not.
Newton’s local method
Michel Bierlaire

Practice quiz

Consider the function f : R → R defined as

f(x) = −x^4 + 12x^3 − 47x^2 + 60x.

For each of the following points x̂,

• identify the quadratic model of the function f at x̂,

• identify the point at which the derivative of the quadratic model is zero,

• plot the function, the quadratic model and the zero, and

• comment on the corresponding iteration of Newton's local method.

The points to consider are

1. x̂ = 3,

2. x̂ = 4,

3. x̂ = 5.
Newton’s local method
Michel Bierlaire

Solution of the practice quiz

[1]: %matplotlib inline


import numpy as np
import matplotlib.pyplot as plt

Consider f : R → R defined as

f(x) = −x^4 + 12x^3 − 47x^2 + 60x.

[2]: def theFunction(x):


return(-x**4 + 12 * x**3 - 47 * x**2 + 60 * x)
x = np.arange(0, 5.5, 0.1)
y = theFunction(x)
plt.plot(x, y)

[2]: [<matplotlib.lines.Line2D at 0x115410940>]


The first derivative is

f'(x) = −4x^3 + 36x^2 − 94x + 60.

[3]: def fprime(x):


return -4 * x**3 + 36 * x**2 - 94 * x + 60

The second derivative is

f''(x) = −12x^2 + 72x − 94.

[4]: def fsecond(x):


return -12 * x**2 + 72 * x - 94

The quadratic model at the point x̂ is defined as

m(x) = f(x̂) + (x − x̂) f'(x̂) + (1/2)(x − x̂)^2 f''(x̂).

[5]: def model(x, xhat):
    return theFunction(xhat) + (x - xhat) * fprime(xhat) + 0.5 * (x - xhat)**2 * fsecond(xhat)

The first derivative of the quadratic model is zero at the point

x^+ = x̂ − f'(x̂) / f''(x̂).

[6]: def zero(xhat):


return xhat - fprime(xhat) / fsecond(xhat)

Plot the quadratic model at a given point

[7]: def plotModel(xhat):


print(f'f({xhat})={theFunction(xhat)}')
print(f'f\'({xhat})={fprime(xhat)}')
print(f'f\'\'({xhat})={fsecond(xhat)}')
x = np.arange(0, 5.5, 0.1)
y1 = theFunction(x)
y2 = model(x, xhat)
plt.ylim(-10, 25)
plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(xhat, theFunction(xhat), 'go')
z = zero(xhat)
print(f'Zero of the derivative of the model: {z:.3g}')
plt.plot(z, model(z, xhat), 'ro')

Quadratic model around x̂ = 3.


[8]: plotModel(3)

f(3)=0
f'(3)=-6
f''(3)=14
Zero of the derivative of the model: 3.43

The model is convex. Indeed, the second derivative is positive. The zero of the derivative is
therefore the minimum of the quadratic model. And there is a good adequacy between the model
and the function at that point. This is a favorable case for Newton’s local method.
Quadratic model around x̂ = 4.
[9]: plotModel(4)

f(4)=0
f'(4)=4
f''(4)=2
Zero of the derivative of the model: 2
The model is convex. Indeed, the second derivative is positive. The zero of the derivative is therefore the minimum of the quadratic model. However, there is not a good adequacy between the model and the function at that point. The value of the function actually increases at that point. This is not a favorable case for Newton's local method.
Quadratic model around x̂ = 5.
[10]: plotModel(5)

f(5)=0
f'(5)=-10
f''(5)=-34
Zero of the derivative of the model: 4.71
The model is not convex. Indeed, the second derivative is negative. Therefore, the model is not bounded from below and there is no minimum. The zero of the derivative corresponds to a maximum of the quadratic model. Even if there is a good adequacy between the model and the function at that point, the value of the function actually increases. This is not a favorable case for Newton's local method.
Preconditioned steepest descent
Michel Bierlaire

Practice quiz

1 Algorithm
Consider a function f : R^n → R. The preconditioned steepest descent algorithm is defined as

x_{k+1} = x_k + α d_k,

where

d_k = −D_k ∇f(x_k),

and D_k ∈ R^{n×n} is a positive definite matrix. Implement the algorithm with

α = − d_k^T ∇f(x_k) / (d_k^T ∇^2 f(x_k) d_k).

The iterations must be interrupted when

1. either the norm of the gradient is sufficiently close to zero, that is,

‖∇f(x_k)‖_2 ≤ ε,

where ε is a given precision,

2. or a maximum number of iterations is reached.

2 Illustration
Consider the function

f(x_1, x_2) = x_1^2 + 11x_2^2 + x_1 x_2.

Apply the preconditioned steepest descent algorithm with

x_0 = (4, 1)^T, ε = 10^{−7},

and the following preconditioners:

D_k = I,
D_k = [ 1 0 ; 0 1/10 ],
D_k = (1/43) [ 22 −1 ; −1 2 ].

Comment on the number of iterations and the global behavior of the algorithm. A sketch of a possible implementation follows.
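A minimal sketch using NumPy; here the Hessian of the quadratic example is constant, which simplifies the step computation:

```python
import numpy as np

H = np.array([[2.0, 1.0], [1.0, 22.0]])        # Hessian of f
grad = lambda x: H @ x                          # gradient of f

def preconditioned_steepest_descent(x, D, eps=1e-7, maxiter=1000):
    for k in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            return x, k
        d = -D @ g
        alpha = -(d @ g) / (d @ H @ d)          # exact step for a quadratic
        x = x + alpha * d
    return x, maxiter

x0 = np.array([4.0, 1.0])
for D in [np.eye(2),
          np.diag([1.0, 0.1]),
          np.array([[22.0, -1.0], [-1.0, 2.0]]) / 43.0]:
    x, k = preconditioned_steepest_descent(x0, D)
    print(k, x)    # the third preconditioner is the inverse Hessian:
                   # it should converge in one iteration
```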
Preconditioning
Michel Bierlaire

Practice quiz
Consider the function f : R^2 → R defined as

f(x) = (1/2) x_1^2 + (101/2) x_2^2 + x_1 x_2.

1. Consider a change of variables

x' = L_k^T x,

where

L_k L_k^T = ∇^2 f(x_k).

Write the function in the new variables:

f̃(x') = f(L_k^{−T} x').

2. Calculate its first and second derivatives.

3. Consider the point

x_0 = (1, 1)^T.

Calculate the corresponding point in the new variables, x'_0. Apply one iteration of the steepest descent algorithm on f̃ from that point, that is,

x'_{k+1} = x'_k − α ∇f̃(x'_k),

where the step size is

α = ∇f̃(x'_k)^T ∇f̃(x'_k) / (∇f̃(x'_k)^T ∇^2 f̃(x'_k) ∇f̃(x'_k)).

4. Finally, identify the corresponding point in the original variables.


Preconditioning
Michel Bierlaire

Solution of the practice quiz

The derivatives of the function are

∇f(x) = (x_1 + x_2, x_1 + 101 x_2)^T, ∇^2 f(x) = [ 1 1 ; 1 101 ].

1. The matrix L_k defining the change of variables is the Cholesky factor of the second derivative matrix, that is,

L_k = [ 1 0 ; 1 10 ], ∀k,

as

L_k L_k^T = [ 1 1 ; 1 101 ].

Therefore, the change of variables is x' = L_k^T x, that is,

x'_1 = x_1 + x_2,
x'_2 = 10 x_2.

We write the change of variables in the opposite direction as x = L_k^{−T} x', that is,

x_1 = x'_1 − x'_2 / 10,
x_2 = x'_2 / 10.

The function f̃ is therefore defined as

f̃(x') = f(x'_1 − x'_2/10, x'_2/10)
      = (1/2)(x'_1 − x'_2/10)^2 + (101/2)(x'_2/10)^2 + (x'_1 − x'_2/10)(x'_2/10)
      = (1/2) x'_1^2 + (1/2) x'_2^2.

2. The derivatives of f̃ are

∇f̃(x'_1, x'_2) = (x'_1, x'_2)^T, ∇^2 f̃ = [ 1 0 ; 0 1 ].

3. We write the point in the new variables:

x_0 = (1, 1)^T, x'_0 = (2, 10)^T,

so that

∇f̃(x'_0) = (2, 10)^T.

The step to perform along the steepest descent direction is

α = ∇f̃(x'_0)^T ∇f̃(x'_0) / (∇f̃(x'_0)^T ∇^2 f̃(x'_0) ∇f̃(x'_0)).

As ∇^2 f̃(x'_0) is the identity matrix, α = 1. Actually, this would be true for any x_0. Therefore, we obtain

x'_1 = x'_0 − α ∇f̃(x'_0) = (2, 10)^T − (2, 10)^T = (0, 0)^T.

4. In the original variables, we have

x_1 = x'_1 − x'_2/10 = 0,
x_2 = x'_2/10 = 0.

This happens to be the optimal solution of the problem.
Quadratic interpolation
Michel Bierlaire

Practice quiz

Consider the function h : R → R, and three points a, b and c such that


a < b < c, h(a) > h(b) and h(c) > h(b).

1. Write the polynomial of degree 2 that interpolates h at a, b and c. That


is, P (x) ∈ P2 such that P (a) = h(a), P (b) = h(b) and P (c) = h(c).

2. Show that this polynomial is convex.

3. Identify the point corresponding to the minimum of the polynomial.


Quadratic interpolation
Michel Bierlaire

Solution of the practice quiz

1. We first build a polynomial of degree 2 that is equal to 1 if x = a, and to 0 if x = b or x = c:

P_a(x) = (x − b)(x − c) / ((a − b)(a − c)).

Similarly, we define a polynomial of degree 2 that is equal to 1 if x = b, and to 0 if x = a or x = c:

P_b(x) = (x − a)(x − c) / ((b − a)(b − c)),

and a polynomial of degree 2 that is equal to 1 if x = c, and to 0 if x = a or x = b:

P_c(x) = (x − a)(x − b) / ((c − a)(c − b)).

Then, we define the polynomial

P(x) = h(a) P_a(x) + h(b) P_b(x) + h(c) P_c(x).

By construction, P(a) = h(a), P(b) = h(b) and P(c) = h(c).


2. To show that the polynomial is convex, we need to calculate the derivatives. We have

P'_a(x) = (2x − b − c) / ((a − b)(a − c)),   P''_a(x) = 2 / ((a − b)(a − c)),
P'_b(x) = (2x − a − c) / ((b − a)(b − c)),   P''_b(x) = 2 / ((b − a)(b − c)),
P'_c(x) = (2x − a − b) / ((c − a)(c − b)),   P''_c(x) = 2 / ((c − a)(c − b)),

and

P''(x) = h(a) P''_a(x) + h(b) P''_b(x) + h(c) P''_c(x)
       = 2 (h(a)(c − b) + h(b)(a − c) + h(c)(b − a)) / ((b − a)(c − a)(c − b)).
Because a < b < c, we have

2 / ((b − a)(c − a)(c − b)) > 0.

For the second factor, we have

h(a)(c − b) + h(b)(a − c) + h(c)(b − a)
= (h(a) − h(b))(c − b) + (h(c) − h(b))(b − a) + h(b)((c − b) + (a − c) + (b − a))
= (h(a) − h(b))(c − b) + (h(c) − h(b))(b − a).

Because a < b < c, h(a) > h(b) and h(c) > h(b), this factor is (strictly) positive, and so is the second derivative. Therefore, the polynomial is convex.
3. As the polynomial is convex, its minimum is reached where the first derivative is zero. We have

P'(x) = h(a) P'_a(x) + h(b) P'_b(x) + h(c) P'_c(x),

that is,

P'(x) = [h(a)(c − b)(2x − c − b) + h(b)(a − c)(2x − a − c) + h(c)(b − a)(2x − a − b)] / ((b − a)(c − a)(c − b))
      = 2 (h(a)(c − b) + h(b)(a − c) + h(c)(b − a)) x / ((b − a)(c − a)(c − b))
        + (h(a)(b^2 − c^2) + h(b)(c^2 − a^2) + h(c)(a^2 − b^2)) / ((b − a)(c − a)(c − b)).

Therefore P'(x^*) = 0 if

x^* = −(1/2) [h(a)(b^2 − c^2) + h(b)(c^2 − a^2) + h(c)(a^2 − b^2)] / [h(a)(c − b) + h(b)(a − c) + h(c)(b − a)],

or, equivalently,

x^* = (1/2) [h(a)(b^2 − c^2) + h(b)(c^2 − a^2) + h(c)(a^2 − b^2)] / [h(a)(b − c) + h(b)(c − a) + h(c)(a − b)].
Quadratic interpolation
Michel Bierlaire

Practice quiz

Consider the unidimensional function h : R → R, which is continuous


and decreasing at 0. That is, there exists η > 0 such that h(x) < h(0) for
each 0 < x ≤ η. In particular, assume that we have access to δ ∈ R such
that h(δ) < h(0).

1. Implement first a function that generates three points a, b and c such


that a < b < c, h(a) > h(b) and h(c) > h(b). Hint: start with a = 0,
b = δ, c = 2δ, and check if the conditions are verified. If not, consider
a = δ, b = 2δ, c = 4δ, etc.

2. Implement the exact line search algorithm with quadratic interpolation (a sketch follows this list).

3. Consider the function

h(x) = (2 + x) cos(2 + x).

Apply the quadratic interpolation algorithm with a precision ε = 10^{−3}. It means that the algorithm stops if

max(h(a), h(c)) − h(b) ≤ ε or c − a ≤ ε.

Identify the first 3 points using the procedure implemented under item
1 with δ = 6.
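A minimal sketch of items 1 and 2, assuming h is decreasing at 0 and δ with h(δ) < h(0) is given (the bracketing loop follows the hint, and it assumes the function eventually increases so that a bracket exists):

```python
import math

def bracket(h, delta):
    """Item 1: find a < b < c with h(a) > h(b) and h(c) > h(b)."""
    a, b, c = 0.0, delta, 2.0 * delta
    while not (h(a) > h(b) and h(c) > h(b)):
        a, b, c = b, c, 2.0 * c
    return a, b, c

def quadratic_min(a, b, c, ha, hb, hc):
    num = ha * (b**2 - c**2) + hb * (c**2 - a**2) + hc * (a**2 - b**2)
    den = ha * (b - c) + hb * (c - a) + hc * (a - b)
    return 0.5 * num / den

def quadratic_interpolation(h, delta, eps=1e-3):
    """Item 2: shrink the bracket around the minimum of the parabola."""
    a, b, c = bracket(h, delta)
    while max(h(a), h(c)) - h(b) > eps and c - a > eps:
        x = quadratic_min(a, b, c, h(a), h(b), h(c))
        if x == b:                        # degenerate case: perturb slightly
            x = b + 0.1 * (c - b)
        if x < b:
            a, b, c = (a, x, b) if h(x) < h(b) else (x, b, c)
        else:
            a, b, c = (b, x, c) if h(x) < h(b) else (a, b, x)
    return b

h = lambda x: (2.0 + x) * math.cos(2.0 + x)
print(quadratic_interpolation(h, 6.0))   # converges to a minimum of h, near x = 7.5
```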
Exact line search: Golden Section
Michel Bierlaire

Practice quiz

In the video, the recycling mechanism of the golden section method is illustrated in the case where α_2^k becomes the new upper bound, that is, when the iteration is [ℓ_{k+1}, u_{k+1}] = [ℓ_k, α_2^k]. We now consider an iteration of the golden section algorithm for which α_1^k becomes the new lower bound, that is,

[ℓ_{k+1}, u_{k+1}] = [α_1^k, u_k].

In this case, we select α_1^{k+1} = α_2^k so that there is no need to recalculate the value of the function at α_1^{k+1}.

1. Write α_2^{k+1} as a function of ρ, α_1^k and u_k.

2. Derive the value of ρ.


Exact line search: Golden Section
Michel Bierlaire

Solution of the practice quiz

1. Denote λ the length of the interval:

λ = u_k − ℓ_k.    (1)

By symmetry of the reduction of the intervals, we have

α_1^k − ℓ_k = u_k − α_2^k = ρ(u_k − ℓ_k) = ρλ,    (2)

and for the next iteration

α_1^{k+1} − ℓ_{k+1} = u_{k+1} − α_2^{k+1} = ρ(u_{k+1} − ℓ_{k+1}).    (3)

We now exploit the fact that [ℓ_{k+1}, u_{k+1}] = [α_1^k, u_k] and α_1^{k+1} = α_2^k to obtain

α_2^k − α_1^k = u_k − α_2^{k+1} = ρ(u_k − α_1^k).    (4)

Thus we get

α_2^{k+1} = u_k − ρ(u_k − α_1^k).    (5)

We now need to derive ρ.

2. We first derive

α_2^k − α_1^k = α_2^k − α_1^k + ℓ_k − ℓ_k + u_k − u_k
             = −(α_1^k − ℓ_k) − (u_k − α_2^k) + u_k − ℓ_k
             = −ρλ − ρλ + λ
             = λ(1 − 2ρ).    (6)

Then, we derive

u_k − α_1^k = ℓ_k − ℓ_k + u_k − α_1^k
           = −(α_1^k − ℓ_k) + (u_k − ℓ_k)
           = λ(1 − ρ).    (7)

Inserting (6) and (7) into (4), we obtain

λ(1 − 2ρ) = ρλ(1 − ρ),

or, equivalently,

ρ^2 − 3ρ + 1 = 0.    (8)

Equation (8) has two solutions:

(3 + √5)/2 and (3 − √5)/2.

As the shrinking factor ρ has to be less than 1/2, we select

ρ = (3 − √5)/2.    (9)

Finally, using (9) in equation (5), we obtain

α_2^{k+1} = u_k − ((3 − √5)/2)(u_k − α_1^k).
Golden section
Michel Bierlaire

Practice quiz

Consider a function h : R → R that is strictly unimodal in [ℓ, u]. It means that it has a unique global minimum α^* in [ℓ, u] and the following conditions are verified:

• h(α_1) > h(α_2) > h(α^*) for each α_1, α_2 such that α_1 < α_2 < α^*,

• h(α_2) > h(α_1) > h(α^*) for each α_1, α_2 such that α_2 > α_1 > α^*.

1. Implement the exact line search algorithm with golden section (a sketch follows this list).

2. Consider the function

h(x) = (2 + x) cos(2 + x),

on the interval [5, 10]. Verify visually that it is strictly unimodal, and apply the golden section algorithm with a precision ε = 10^{−3}. It means that the algorithm stops when

u − ℓ ≤ ε.

3. Same question for the interval [0, 8].
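A minimal sketch, using the shrinking factor ρ = (3 − √5)/2 derived in the previous quiz and recycling one function value per iteration:

```python
import math

def golden_section(h, low, up, eps=1e-3):
    rho = (3.0 - math.sqrt(5.0)) / 2.0
    alpha1 = low + rho * (up - low)
    alpha2 = up - rho * (up - low)
    h1, h2 = h(alpha1), h(alpha2)
    while up - low > eps:
        if h1 <= h2:                      # the minimum is in [low, alpha2]
            up, alpha2, h2 = alpha2, alpha1, h1
            alpha1 = low + rho * (up - low)
            h1 = h(alpha1)
        else:                             # the minimum is in [alpha1, up]
            low, alpha1, h1 = alpha1, alpha2, h2
            alpha2 = up - rho * (up - low)
            h2 = h(alpha2)
    return (low + up) / 2.0

h = lambda x: (2.0 + x) * math.cos(2.0 + x)
print(golden_section(h, 5.0, 10.0))   # approx 7.52 on [5, 10]
print(golden_section(h, 0.0, 8.0))    # [0, 8]: h is not unimodal there, so the
                                      # result is only one of the local minima
```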


First Wolfe condition
Michel Bierlaire

Practice quiz

Consider the unconstrained optimization problem

min_{x∈R^2} f(x) = 4x_1^2 − 4x_1 + x_2^2 + 2x_2,

and the point x_0 = (0, 0)^T.

1. Calculate Newton’s direction at x0 .

2. Verify that it is a descent direction.

3. Consider the first Wolfe condition with β1 = 0.1. What are the values
of the step α that verify the condition?
First Wolfe condition
Michel Bierlaire

Solution of the practice quiz

1. Newton's direction is the solution of the linear system

∇^2 f(x_k) d_k = −∇f(x_k).

The derivatives of the function are

∇f(x) = (8x_1 − 4, 2x_2 + 2)^T, ∇^2 f(x) = [ 8 0 ; 0 2 ].

Therefore, Newton's equations at x_0 = (0, 0)^T are

[ 8 0 ; 0 2 ] (d_1, d_2)^T = −(−4, 2)^T,

that is,

8 d_1 = 4,
2 d_2 = −2,

and Newton's direction is

d_N = (1/2, −1)^T.

2. The directional derivative of the function along d_N at x_0 is

∇f(x_0)^T d_N = −4.

It is negative, so d_N is a descent direction.

3. Consider the point

x_α = x_0 + α d_N = (α/2, −α)^T,

where α ≥ 0 is the step performed along d_N. The point x_α verifies the first Wolfe condition if

f(x_α) ≤ f(x_0) + α β_1 ∇f(x_0)^T d_N,
2α^2 − 4α ≤ 0 − 4αβ_1,

that is, if

α ≤ 2(1 − β_1).

If β_1 = 0.1, all steps α such that

α ≤ 1.8

verify the first Wolfe condition. It is illustrated by the figure below, where f(x_α) is represented in blue and the line f(x_0) + αβ_1 ∇f(x_0)^T d_N is represented in red. The slope of this line is β_1 ∇f(x_0)^T d_N = −0.4. The value α = 1.8 is represented by the dotted vertical line.

[Plot of f(x_α) and the first Wolfe condition line, for α ∈ [−0.5, 2.5].]
Second Wolfe condition
Michel Bierlaire

Practice quiz

Consider the unconstrained optimization problem

min_{x∈R^2} f(x) = 4x_1^2 − 4x_1 + x_2^2 + 2x_2,

and the point x_0 = (0, 0)^T.

1. Calculate Newton’s direction at x0 .

2. Verify that it is a descent direction.

3. Consider the second Wolfe condition with β2 = 0.7. What are the
values of the step α that verify the condition?
Second Wolfe condition
Michel Bierlaire

Solution of the practice quiz

1. Newton’s direction is a solution of the linear system

∇2 f (xk )dk = −∇f (xk ).

The derivatives of the function are


   
8x1 − 4 2 8 0
∇f (x) = , ∇ f (x) = . (1)
2x2 + 2 0 2

Therefore, Newton’s equations at x0 = (0, 0)T are


    
8 0 d1 −4
=− ,
0 2 d2 2

that is

8d1 = 4
2d2 = −2,

and Newton’s direction is


 
1/2
dN = .
−1

2. The directional derivative of the function along dN at x0 is

∇f (x0 )T dN = −4.

It is negative, so dN is a descent direction.


3. Consider the point
 
α/2
xα = x0 + αdN = ,
−α

where α ≥ 0 is the step performed along dN . Using (1), we have


 
4α − 4
∇f (xα ) = .
−2α + 2

Therefore, the point xα verifies the second Wolfe condition if

∇f (xα )T dN ≥ β2 ∇f (x0 )T dN ,
4(α − 1) ≥ −β2 4,

that is, if
α ≥ 1 − β2 .
If β2 = 0.7, all steps α such that

α ≥ 0.3.

verify the second Wolfe condition. This is illustrated by the figure below,
where f(xα) is represented in blue and the ratio

∇f(xα)T dN / ∇f(x0)T dN

is represented in red. Note that, for this representation, we consider
the second Wolfe condition written as

∇f(xα)T dN / ∇f(x0)T dN ≤ β2,

where the left-hand side is plotted in red on the figure. The value
α = 0.3 is represented by the dotted vertical line.
[Figure: f(xα) in blue and the ratio ∇f(xα)T dN / ∇f(x0)T dN in red with threshold β2 = 0.7; the dotted vertical line marks α = 0.3.]
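
As before, the threshold can be checked numerically; a minimal sketch assuming NumPy (the sampled values of α are our choice):

import numpy as np

grad = lambda x: np.array([8 * x[0] - 4, 2 * x[1] + 2])

x0 = np.array([0.0, 0.0])
dN = np.array([0.5, -1.0])
beta2 = 0.7
slope = grad(x0) @ dN  # equal to -4

for alpha in [0.1, 0.5, 1.0]:
    accepted = grad(x0 + alpha * dN) @ dN >= beta2 * slope
    print(alpha, accepted)  # expected: False, True, True
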
Validity of the Wolfe conditions
Michel Bierlaire

Practice quiz

We are performing a line search along a direction d at a point x. The
function evaluated at x + αd is defined as

g(α) = f(x + αd) = −1.833α³ + 8.5α² − 10α + 5.

Consider the step α = 3. Consider the first Wolfe condition with β1 = 0.15
and the second Wolfe condition with β2 = 0.8.

1. Is the step α = 3 too short, or too long?

2. Propose a value of α that verifies both Wolfe conditions.


Validity of the Wolfe conditions
Michel Bierlaire

Solution of the practice quiz

1. As

g(α) = f(x + αd) = −1.833α³ + 8.5α² − 10α + 5,

the directional derivative is

g′(α) = −3 · 1.833α² + 2 · 8.5α − 10.

Therefore, the second Wolfe condition for α = 3 is

g′(3)/g′(0) ≤? β2,
−8.5/−10 = 0.85 ≤? 0.8.

It is violated. Therefore, it is tempting to conclude that the step is too
short. The first Wolfe condition for α = 3 is

g(3) ≤? g(0) + 3β1 g′(0),
2.009 ≤? 5 − 4.5 = 0.5.

It is also violated. Therefore, it is tempting to conclude that the step
is too long.
However, a step cannot be both too short and too long. The two
conditions must be checked in the right order. The first Wolfe condition
must be checked first. If it is violated, we conclude that the step is too
long, and we do not need to consider the second condition. The second
Wolfe condition is checked only when the step is not deemed too long
by the first condition.
In the case described above, the step is too long, as it violates the first
condition, and the algorithm should make it shorter.
In the next figure, the value of g(α) is plotted in blue, the line

g(0) + αβ1 g′(0)

characterizing the first Wolfe condition is plotted in red, and the ratio

g′(α)/g′(0)

characterizing the second Wolfe condition is plotted in green, as well
as the threshold value β2 = 0.8. The dotted vertical line corresponds
to α = 3.

[Figure: g(α) in blue, the line g(0) + αβ1 g′(0) with β1 = 0.15 in red, the ratio g′(α)/g′(0) in green with threshold β2 = 0.8; the dotted vertical line marks α = 3.]

2. All values of α such that

0.12251 ≤ α ≤ 1.45912

verify both conditions. The two thresholds are identified by the two
dashed blue vertical lines on the picture above.
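
The two thresholds can be recovered numerically, for instance by scanning candidate steps on a fine grid. A minimal sketch, assuming NumPy (the grid resolution and the names are our choice):

import numpy as np

g = lambda a: -1.833 * a**3 + 8.5 * a**2 - 10 * a + 5
gp = lambda a: -3 * 1.833 * a**2 + 2 * 8.5 * a - 10  # g'(alpha)
beta1, beta2 = 0.15, 0.8

def wolfe_ok(a):
    w1 = g(a) <= g(0) + a * beta1 * gp(0)  # first condition: sufficient decrease
    w2 = gp(a) / gp(0) <= beta2            # second condition, in ratio form
    return w1 and w2

alphas = np.linspace(0.0, 3.0, 30001)  # grid with step 1e-4
ok = [a for a in alphas if wolfe_ok(a)]
print(min(ok), max(ok))  # approximately 0.1226 and 1.4591
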
Steepest descent algorithm
Michel Bierlaire

Practice quiz

1. Implement the steepest descent algorithm

xk+1 = xk − α∇f(xk),

where α is obtained from the inexact line search algorithm based on
the two Wolfe conditions (a sketch is given after item 3).

2. Apply the algorithm on the Rosenbrock function

f(x1, x2) = 100(x2 − x1²)² + (1 − x1)²,

starting from

x0 = (−1.5, 1.5)T,

to try to obtain x∗ such that ‖∇f(x∗)‖ ≤ ε = 10⁻⁷. Limit the number
of iterations to 10000.

3. Plot the iterations on the contours of the function.
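
A possible sketch in Python of items 1 and 2, assuming NumPy; the bracketing strategy in wolfe_step is one standard way to enforce the two Wolfe conditions, and all names and parameter values (β1 = 10⁻⁴, β2 = 0.9) are our choices, not prescribed by the quiz. The plot of item 3 is omitted.

import numpy as np

def wolfe_step(f, grad, x, d, beta1=1e-4, beta2=0.9):
    """Search a step satisfying both Wolfe conditions by
    doubling/bisection on a bracket [lo, hi] of candidate steps."""
    lo, hi, alpha = 0.0, np.inf, 1.0
    f0, slope0 = f(x), grad(x) @ d
    for _ in range(100):
        if f(x + alpha * d) > f0 + alpha * beta1 * slope0:
            hi = alpha  # first condition violated: step too long
        elif grad(x + alpha * d) @ d < beta2 * slope0:
            lo = alpha  # second condition violated: step too short
        else:
            return alpha
        alpha = 2 * lo if np.isinf(hi) else (lo + hi) / 2
    return alpha  # fallback if no acceptable step was found

def steepest_descent(f, grad, x0, eps=1e-7, maxiter=10000):
    x = x0.copy()
    iterates = [x.copy()]
    for _ in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - wolfe_step(f, grad, x, -g) * g
        iterates.append(x.copy())
    return x, iterates

# Rosenbrock function and its gradient
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([
    -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
    200 * (x[1] - x[0]**2),
])

xlast, iterates = steepest_descent(f, grad, np.array([-1.5, 1.5]))
print(len(iterates) - 1, xlast)  # expect slow progress: steepest descent
                                 # struggles in the Rosenbrock valley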


Newton with line search
Michel Bierlaire

Practice quiz
1. Implement a function that takes as input a square symmetric matrix
H and returns a scalar τ ≥ 0 and a lower triangular matrix L such that
H + τI = LLT.

2. Implement a function that solves the system

LLT d = −∇f(xk),

by solving two triangular systems.

3. Implement Newton's method with line search

xk+1 = xk − α(Lk LkT)⁻¹ ∇f(xk),

where α is obtained from the inexact line search algorithm based on
the two Wolfe conditions, and Lk is a lower triangular matrix such that

∇²f(xk) + τI = Lk LkT,

where τ ≥ 0.

4. Apply the algorithm on the Rosenbrock function

f(x1, x2) = 100(x2 − x1²)² + (1 − x1)²,

starting from

x0 = (−1.5, 1.5)T,

to try to obtain x∗ such that ‖∇f(x∗)‖ ≤ ε = 10⁻⁷. Limit the number
of iterations to 10000.

5. Plot the iterations on the contours of the function.

6. Report the value of τ and α for each iteration (a sketch covering items 1–3 and 6 is given after this quiz).
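
A possible sketch of items 1–3 and 6, assuming NumPy and SciPy, and reusing f, grad and wolfe_step from the steepest descent sketch above; the shifting strategy in modified_cholesky and the initial shift value are our choices. The contour plot of item 5 is omitted.

import numpy as np
from scipy.linalg import solve_triangular

def modified_cholesky(H, tau0=1e-3):
    """Item 1: return tau >= 0 and lower triangular L with H + tau I = L L^T."""
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(H.shape[0]))
            return tau, L
        except np.linalg.LinAlgError:
            tau = max(2 * tau, tau0)  # not positive definite: increase the shift

def newton_direction(L, g):
    """Item 2: solve L L^T d = -g by two triangular solves."""
    y = solve_triangular(L, -g, lower=True)
    return solve_triangular(L.T, y, lower=False)

def newton_line_search(f, grad, hess, x0, eps=1e-7, maxiter=10000):
    """Item 3, with the tau/alpha report of item 6."""
    x = x0.copy()
    for k in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        tau, L = modified_cholesky(hess(x))
        d = newton_direction(L, g)
        alpha = wolfe_step(f, grad, x, d)  # from the previous sketch
        print(f'iter {k}: tau = {tau:.2e}, alpha = {alpha:.4f}')
        x = x + alpha * d
    return x

# Hessian of the Rosenbrock function
hess = lambda x: np.array([
    [1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
    [-400 * x[0], 200.0],
])

print(newton_line_search(f, grad, hess, np.array([-1.5, 1.5])))
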
Unconstrained nonlinear optimization
Michel Bierlaire

Graded quiz

1 Formulation
Question 1
James Bond must reach a yacht moored 100 meters from shore. Currently,
James Bond is 80 meters from the nearest point to the yacht from the beach.
He is capable of running on the beach at 20 km/h and swimming at 6 km/h.
James Bond wants to determine which distance x in meters he needs to run
on the beach before jumping in the water in order to minimize the time to
get to the yacht. Which of the following formulations corresponds to this
problem?

Solution 1:

min 20x + 6(80 − x)


s.t.
x ≤ 80,
x ≥ 0.

Solution 2:
min (18/100) x + 0.6 √(100² + (80 − x)²)
s.t.
x ≤ 80,
x ≥ 0.

Solution 3:

min 20x + 0.6 √(100² + x²)
s.t.
x ≥ 0.

Solution 4:

min (18/100) x + 0.6 √(100² + (80 − x)²).
Question 2
Consider the following optimization model.


max −3√x1 − 2x2²

s.t. 2√x1 + x2² ≥ 4,
−3√x1 + 2x2² ≥ 2,
x1, x2 ≥ 0.

Which linear model in standard form corresponds to it?

Solution 1:

min 3y1 + 2y2


s.t. 2y1 + y2 − y3 = 4,
−3y1 + 2y2 − y4 = 2,
y1 , y2 , y3 , y4 ≥ 0.

Solution 2:

min 3y1 + 2y2


s.t. 2y1 + y2 = 4,
−3y1 + 2y2 = 2,
y1 , y2 ≥ 0.

Solution 3:

max −3√x1 − 2x2²

s.t. 2√x1 + x2² = 4,
−3√x1 + 2x2² = 2,
x1, x2 ≥ 0.

Solution 4:

min 3√x1 + 2y

s.t. 2√x1 + y ≥ 4,
−3√x1 + 2y ≥ 2,
x1, y ≥ 0.
Question 3
An iteration of the local Newton method is exactly an iteration of the
preconditioned steepest descent method when...
Solution 1: ∇²f(xk) is positive semidefinite and the step αk = 1/2 satisfies
the Wolfe conditions.

Solution 2: ∇²f(xk) is positive definite and the step αk = 1/2 satisfies the
Wolfe conditions.

Solution 3: ∇²f(xk) is positive semidefinite and the step αk = 1 satisfies
the Wolfe conditions.

Solution 4: ∇²f(xk) is positive definite and the step αk = 1 satisfies the
Wolfe conditions.
2 Objective function
Question 4
Consider the following function:

f (x1 , x2 ) = ln(x1 ),
where x1 > 0.
This function is...

Solution 1: neither convex nor concave.

Solution 2: both convex and concave.

Solution 3: convex.

Solution 4: concave.
Question 5
Consider the following function:

f(x1, x2) = e^{x1} cos(x2).

What is the directional derivative along the direction

d = (0, 5)T

at the point

x = (0, π/2)T ?

Solution 1: -1

Solution 2: 0

Solution 3: 1

Solution 4: -5
Question 6
Consider a function f , twice differentiable from R2 to R, and consider a
stationary point (x1 , x2 ) ∈ R2 . If the determinant of the Hessian matrix of
f at this point is negative, then ...

Solution 1: the stationary point is a saddle point.

Solution 2: the stationary point is a local minimum.

Solution 3: the stationary point is a local maximum.

Solution 4: the provided information is not enough to decide about the


nature of the stationary point.
Question 7
Consider the following function:

f(x1, x2, x3) = x2² + x1 x3.

What is the curvature of f along the direction

d = (2, −1, 2)T

at the point

x = (1, 2, 2)T ?

Solution 1: 2/3

Solution 2: 0

Solution 3: 1

Solution 4: 10/9
Question 8
Preconditioning is ...

Solution 1: a method to define a descent direction.

Solution 2: applied when there is no difference in the curvature of the


function among all directions.

Solution 3: a method to define a step size.

Solution 4: a change-of-variables procedure for differentiable functions.


3 Optimality conditions
Question 9
Consider the function f(x1, x2) = x1² + x2² + x1x2 + 1 and the point xT = (0, 0).
Which one of the following statements is correct?

Solution 1: x is not a global minimum.

Solution 2: The first-order necessary condition is not satisfied in x.

Solution 3: The sufficient optimality conditions are satisfied in x.

Solution 4: The first-order necessary condition is satisfied in x but the


second-order necessary condition is not satisfied.
Question 10
Consider the function

f(x1, x2) = (1/3)x1³ + x2² + 2x1x2 − 6x1 − 3x2 + 4

and the point

x = (−1, 5/2)T.

Which one of the following statements is correct?

Solution 1: x is a local minimum.

Solution 2: x is a local maximum.

Solution 3: x is a saddle point.

Solution 4: x is not a critical point.


Question 11
Consider the function

f(x1, x2) = 48x1 + 96x2 − x1² − 2x1x2 − 9x2²

and the point

x = (21, 3)T.
Which one of the following statements is correct?

Solution 1: x is a saddle point.

Solution 2: x is a local minimum.

Solution 3: x is a local maximum.

Solution 4: x is not a critical point.


Question 12
Consider the following quadratic function

f(x) = (1/2) xT Qx + gT x + c,

where Q = [[1, 2], [2, 1]], g ∈ R² and c ∈ R.
Which one of the following statements is correct?

Solution 1: f does not have a local minimum.

Solution 2: f has a unique global minimum.

Solution 3: f has an infinite number of local minima.

Solution 4: f is a convex function.


4 Solving equations: Newton
Question 13
We aim to find the root of the equation

F(x) = x² − 3 = 0.

More precisely, we aim to find x∗ such that |F(x∗)| ≤ ε = 10⁻¹⁵.
When applying the first step of Newton’s algorithm starting from point
x0 = 2, we obtain:

Solution 1: x1 = 1.73.

Solution 2: x1 = 1.732.

Solution 3: x1 = 1.74.

Solution 4: x1 = 1.75.
5 Newton’s local method
Question 14
Let f be a twice differentiable function. Which statement about Newton’s
local method for the minimization of f is correct?

Solution 1: If the algorithm converges, it always converges to a stationary


point of the function f .

Solution 2: The point obtained during the kth iteration of the algorithm
maximizes the quadratic model of the function f at the point xk.

Solution 3: If we start the algorithm from two different starting points,


it will always converge to two different local minima.

Solution 4: If the algorithm converges, it always finds a point
that satisfies the second-order necessary optimality condition.
Question 15
Consider the function f : R → R defined as

f(x) = −x⁵ + 2x³ + 40x.

The quadratic model of the function f at x̂ = 2 is:

Solution 1: m2(x) = 40x² − 44x + 92.

Solution 2: m2(x) = −68x² + 256x − 176.

Solution 3: m2(x) = −720x² + 64x + 81.

Solution 4: m2(x) = 360x² + 32x + 944.


6 Descent methods
Question 16
The purpose of preconditioning the function in the steepest descent method
is to

Solution 1: increase the speed of convergence of the algorithm.

Solution 2: avoid local optima.

Solution 3: obtain the optimum solution in maximum 3 iterations.

Solution 4: linearize the optimization problem.


Question 17
The purpose of an exact line search algorithm, applied to a descent method,
is to:

Solution 1: determine the descent direction in the current iteration.

Solution 2: determine an acceptable step to follow a descent direction in


the current iteration.

Solution 3: find the optimal value of the objective function in the next
iteration.

Solution 4: determine the step corresponding to a local minimum of the


function along a descent direction in the current iteration.
Question 18
The golden section method applied to a function h on an interval [ℓ, u] gen-
erates a sequence of intervals [ℓk , uk ] such that for each k we always have:

Solution 1: [ℓk+1 , uk+1 ] ⊂ [ℓk , uk ].

Solution 2: [ℓk , uk ] ⊂ [ℓk+1 , uk+1 ].

Solution 3: ℓk < ℓk+1 .

Solution 4: ℓk+1 ≤ ℓk .
Question 19
Suppose that f : Rn → R is a differentiable nonlinear function, xk ∈ Rn is a
point and dk ∈ Rn is a direction such that ∇f (xk )T dk < 0, and f is bounded
from below in the direction dk . The purpose of the first Wolfe condition:

f (xk + αdk ) ≤ f (xk ) + αβ1 ∇f (xk )T dk

is to:

Solution 1: check that dk is a descent direction.

Solution 2: ensure that the objective function decreases along the direc-
tion dk.

Solution 3: ensure that the descent algorithm will progress rapidly.

Solution 4: ensure a sufficient decrease of the objective function.


Question 20
Suppose that f : Rn → R is a differentiable nonlinear function, xk ∈ Rn is
a point and dk ∈ Rn is a descent direction such that ∇f (xk )T dk < 0, and
that f is bounded from below in the direction dk. If we write the two Wolfe
conditions as:

f(xk + αdk) ≤ f(xk) + αβ1 ∇f(xk)T dk,

and

∇f(xk + αdk)T dk / ∇f(xk)T dk ≤ β2,

which condition should be satisfied by the parameters β1 and β2 so that
we can be sure that there exists a step size α that satisfies both Wolfe
conditions?

Solution 1: This condition does not exist.

Solution 2: 0 < β2 < β1 < 1.

Solution 3: 0 < β1 = β2 < 1.

Solution 4: 0 < β1 < β2 < 1.
