
Formulation – Modeling

Michel Bierlaire

Practice quiz

An investment fund is trying to determine how to invest its assets for the
following year, in order to maximize its profit. Currently, the fund has 2.5
million euros that it can invest in state bonds, real estate loans, car loans or
scholarship loans. The annual interest rates of the listed investment types
are 4% for state bonds, 6% for real estate loans, 8% for car loans and 9% for
scholarship loans.
To minimize risks, the investment fund allows only the selection of a
strategy satisfying the following restrictions:

• the amount invested in car and scholarship loans must not exceed twice
the amount invested in bonds;

• the amount invested in car loans must be greater than or equal to the
amount invested in scholarship loans;

• the investment in car loans should not exceed the investment in real
estate loans by more than 70%.

Formulate this problem as an optimization problem by determining

1. the decision variables,

2. the objective function, and

3. the constraint(s).
Formulation – Modeling
Michel Bierlaire

Solution of the practice quiz

We proceed with the three steps of the modeling process.


Decision variables The decision variables are the amounts in euros invested for
• state bonds: x_sb,
• real estate loans: x_re,
• car loans: x_cℓ, and
• scholarship loans: x_sℓ.
Objective function As the company wants to maximize its profit, the objective function must provide the profit as a function of the decision variables:

f : R^4 → R : f(x_sb, x_re, x_cℓ, x_sℓ).

The profit of the investment fund for the following year is calculated based on the interest rates using the following formula:

f(x_sb, x_re, x_cℓ, x_sℓ) = 0.04 x_sb + 0.06 x_re + 0.08 x_cℓ + 0.09 x_sℓ.

Therefore, we can write the problem as

max_{x_sb, x_re, x_cℓ, x_sℓ} 0.04 x_sb + 0.06 x_re + 0.08 x_cℓ + 0.09 x_sℓ.

Constraints The total amount to invest is 2.5 million euros. This is modeled using the following constraint:

x_sb + x_re + x_cℓ + x_sℓ = 2'500'000.

Each restriction is modeled as follows:


• the total amount invested in car and scholarship loans (x_cℓ + x_sℓ) must not exceed twice the amount invested in bonds (2 x_sb):

x_cℓ + x_sℓ ≤ 2 x_sb.

• the amount invested in car loans (x_cℓ) must be greater than or equal to the amount invested in scholarship loans (x_sℓ):

x_sℓ ≤ x_cℓ.

• the investment in car loans (x_cℓ) should not exceed the investment in real estate loans (x_re) by more than 70%:

x_cℓ ≤ 1.7 x_re.

Finally, we must impose all the decision variables to be non negative:

x_sb, x_re, x_cℓ, x_sℓ ≥ 0.

Putting everything together, we obtain the following optimization problem:

max 0.04 x_sb + 0.06 x_re + 0.08 x_cℓ + 0.09 x_sℓ

subject to

x_sb + x_re + x_cℓ + x_sℓ = 2'500'000,
x_cℓ + x_sℓ ≤ 2 x_sb,
x_sℓ ≤ x_cℓ,
x_cℓ ≤ 1.7 x_re,
x_sb, x_re, x_cℓ, x_sℓ ≥ 0.

For your information, the optimal solution (rounded to 1/10 of a euro) is:

• State bonds: 696'721.3 euros

• Real estate loans: 409'836.1 euros

• Car loans: 696'721.3 euros

• Scholarship loans: 696'721.3 euros

for a total profit of 170'901.6 euros. The constraints are verified:

• Constraint 1: 696'721.3 + 409'836.1 + 696'721.3 + 696'721.3 = 2'500'000.0,

• Constraint 2: 696'721.3 + 696'721.3 = 1'393'442.6 ≤ 2 · 696'721.3 = 1'393'442.6,

• Constraint 3: 696'721.3 ≤ 696'721.3,

• Constraint 4: 696'721.3 ≤ 1.7 · 409'836.1 ≈ 696'721.4.
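For readers who want to check these numbers themselves, here is a minimal sketch using scipy.optimize.linprog (assuming SciPy is available; the variable ordering is ours, not part of the original quiz):

```python
# A minimal sketch verifying the solution with SciPy.
# Variables: x = (x_sb, x_re, x_cl, x_sl). linprog minimizes, so we negate the profit.
import numpy as np
from scipy.optimize import linprog

c = -np.array([0.04, 0.06, 0.08, 0.09])      # maximize profit <=> minimize -profit
A_ub = np.array([
    [-2, 0, 1, 1],    # x_cl + x_sl <= 2 x_sb
    [0, 0, -1, 1],    # x_sl <= x_cl
    [0, -1.7, 1, 0],  # x_cl <= 1.7 x_re
])
b_ub = np.zeros(3)
A_eq = np.array([[1.0, 1.0, 1.0, 1.0]])
b_eq = np.array([2_500_000.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.x)       # approx [696721.3, 409836.1, 696721.3, 696721.3]
print(-res.fun)    # approx 170901.6
```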


Formulation – Modeling
Michel Bierlaire

Practice quiz

The company Coola-Coola Ltd. wants to design a can of soda of volume


0.33 liters. They need to set the dimensions (in centimeters) of this can to
use the minimum amount of aluminium, knowing that the form of the can is a
perfect cylinder, and the thickness of the aluminium is the same everywhere.
Write the problem as an optimization problem.
Formulation – Modeling
Michel Bierlaire

Solution of the practice quiz

We consider the three steps of the modeling process.

Decision variables The design of the cylinder depends on two variables,


both expressed in centimeters:

• the radius of the basis: r,


• the height of the cylinder: h.

Objective function Since the thickness of the aluminium is the same at any part of the can, the total surface of the cylinder has to be minimized. The objective function must provide the total surface as a function of the decision variables:

f : R^2 → R : f(r, h).

• Each basis is a circle of radius r, so its surface is πr^2.

• The side of the can is a rectangle of area 2πrh.

Therefore, the objective function to minimize is

f(r, h) = 2πr^2 + 2πrh.


Constraints The volume of the can must be 0.33 liters, that is, 330 cm^3. The first constraint is therefore:

πr^2 h = 330.

We also need non negativity constraints:

r ≥ 0,
h ≥ 0.

The optimization problem is therefore:

min_{r,h} 2πr^2 + 2πrh

subject to

πr^2 h = 330,
r ≥ 0,
h ≥ 0.

The optimal solution of this problem is r = 3.746 cm and h = 7.491 cm.
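As a numeric check, one can eliminate h via the volume constraint and minimize in r only. A minimal sketch, assuming SciPy is available (the elimination step is ours):

```python
# Eliminate h via the volume constraint (h = 330 / (pi r^2)) and
# minimize the surface in the single variable r.
import numpy as np
from scipy.optimize import minimize_scalar

def surface(r):
    h = 330.0 / (np.pi * r**2)         # volume constraint: pi r^2 h = 330
    return 2 * np.pi * r**2 + 2 * np.pi * r * h

res = minimize_scalar(surface, bounds=(0.1, 20.0), method='bounded')
r = res.x
h = 330.0 / (np.pi * r**2)
print(r, h)                            # approx 3.74 and 7.49; note h = 2r
print((165.0 / np.pi) ** (1.0 / 3.0))  # closed form for r from f'(r) = 0
```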


Transformations
Michel Bierlaire

Practice quiz

Given the following optimization problem, transform it in such a way as to obtain a minimization problem in which all decision variables must be non negative and all constraints are defined by lower inequalities:

max −x_1^2 + sin x_2

subject to

6x_1 − x_2^2 ≥ 1,
x_1^2 + x_2^2 = 3,
x_1 ≥ 2,
x_2 ∈ R.
Transformations
Michel Bierlaire

Solution of the practice quiz

We apply a sequence of simple transformations to the optimization prob-


lem.

1. A maximization problem whose objective function is f(x) is equivalent to a minimization problem whose objective function is −f(x):

argmax_x f(x) = argmin_x −f(x)

and

max_x f(x) = − min_x −f(x).

By applying this transformation to our problem we obtain:

− min x_1^2 − sin x_2

subject to

6x_1 − x_2^2 ≥ 1,
x_1^2 + x_2^2 = 3,
x_1 ≥ 2,
x_2 ∈ R.

2. Now we transform the constraints. An equality constraint can be written as the combination of two inequalities:

g(x) = 0 ⟺ g(x) ≤ 0 and g(x) ≥ 0.

The optimization problem becomes

− min x_1^2 − sin x_2

subject to

6x_1 − x_2^2 ≥ 1,
x_1^2 + x_2^2 ≤ 3,
x_1^2 + x_2^2 ≥ 3,
x_1 ≥ 2,
x_2 ∈ R.

3. A constraint defined by a greater-than inequality can be multiplied by −1 to obtain a lower (less-than) inequality:

g(x) ≥ 0 ⟺ −g(x) ≤ 0.

− min x_1^2 − sin x_2

subject to

−6x_1 + x_2^2 ≤ −1,
x_1^2 + x_2^2 ≤ 3,
−x_1^2 − x_2^2 ≤ −3,
x_1 ≥ 2,
x_2 ∈ R.

4. If a variable x can take any real value, it can be replaced by two non negative artificial variables denoted by x^+ and x^−, such that

x = x^+ − x^−.

We can impose x^+ ≥ 0 and x^− ≥ 0 without loss of generality. We apply this to the variable x_2 in our formulation:

− min x_1^2 − sin(x_2^+ − x_2^−)

subject to

−6x_1 + (x_2^+ − x_2^−)^2 ≤ −1,
x_1^2 + (x_2^+ − x_2^−)^2 ≤ 3,
−x_1^2 − (x_2^+ − x_2^−)^2 ≤ −3,
x_1 ≥ 2,
x_2^+ ≥ 0,
x_2^− ≥ 0.

5. In the presence of a constraint x ≥ a, with a ∈ R, a simple change of variable

x = x̂ + a

transforms the constraint into

x̂ ≥ 0.

We apply this last principle to the variable x_1, and we obtain:

− min (x̂_1 + 2)^2 − sin(x_2^+ − x_2^−)

subject to

−6(x̂_1 + 2) + (x_2^+ − x_2^−)^2 ≤ −1,
(x̂_1 + 2)^2 + (x_2^+ − x_2^−)^2 ≤ 3,
−(x̂_1 + 2)^2 − (x_2^+ − x_2^−)^2 ≤ −3,
x̂_1 ≥ 0,
x_2^+ ≥ 0,
x_2^− ≥ 0.

As requested, it is a minimization problem, all decision variables are non negative, and all constraints are defined by lower inequalities. Any solution (x̂_1, x_2^+, x_2^−) of this problem corresponds to the following solution of the original problem:

x_1 = x̂_1 + 2,
x_2 = x_2^+ − x_2^−.
Transformations
Michel Bierlaire

Practice quiz

The following optimization problem is not linear, because of the absolute value in the objective function. Transform it into a linear problem in which all decision variables must be non negative:

min |x_1 − x_2|

subject to

x_1 ≥ 0,
x_2 ≥ 0.
Transformations
Michel Bierlaire

Solution of the practice quiz

To solve this exercise, we remember that if a variable x can take any real value, it can be replaced by two non negative artificial variables denoted by x^+ and x^−, such that x = x^+ − x^−. We also recall that the absolute value of x is defined as

|x| = x if x ≥ 0, and |x| = −x if x < 0.

In our case, we have that

|x_1 − x_2| = x_1 − x_2 if x_1 ≥ x_2, and |x_1 − x_2| = x_2 − x_1 if x_1 < x_2.

Since x1 and x2 are non negative real numbers, the difference x1 − x2 can
take any real value.
Let us study the absolute value by examining the two cases:

• If x_1 − x_2 ≥ 0, we can define the non negative quantity

y^+ = x_1 − x_2 if x_1 − x_2 > 0, and y^+ = 0 otherwise.

• If x_1 − x_2 < 0, we can define the positive quantity

y^− = x_2 − x_1 if x_1 − x_2 < 0, and y^− = 0 otherwise.

Consequently, the absolute value of the difference can then be written as

|x_1 − x_2| = y^+ + y^−.

Furthermore, if we impose that y^+ ≥ 0 and y^− ≥ 0, our minimization problem can be written as

min_{x_1, x_2, y^+, y^−} (y^+ + y^−)

subject to

y^+ ≥ x_1 − x_2,
y^− ≥ x_2 − x_1,
x_1 ≥ 0,
x_2 ≥ 0,
y^+ ≥ 0,
y^− ≥ 0.
Note that this formulation is not strictly equivalent to the original one for all feasible solutions. For instance, the feasible solution x_1 = 0, x_2 = 0, y^+ = 1, y^− = 1 has objective value 2 in the transformed problem, and 0 in the original one.

But it is equivalent at the optimal solution. Denote the optimal solution of the original problem (x_1^*, x_2^*).

• If x_1^* − x_2^* > 0, the lowest possible value for y^+, denoted by (y^+)^*, is x_1^* − x_2^* (because of the constraint y^+ ≥ x_1 − x_2), and the lowest possible value for y^−, denoted by (y^−)^*, is 0 (because of the constraint y^− ≥ 0). Therefore, the objective function of the transformed problem at the optimal solution is

(y^+)^* + (y^−)^* = x_1^* − x_2^* + 0 = |x_1^* − x_2^*|.

• If x_1^* − x_2^* < 0, for similar reasons, we have (y^+)^* = 0 and (y^−)^* = x_2^* − x_1^*. Therefore, the objective function of the transformed problem at the optimal solution is

(y^+)^* + (y^−)^* = 0 + x_2^* − x_1^* = |x_1^* − x_2^*|.

• If x_1^* − x_2^* = 0, we have (y^+)^* = 0 and (y^−)^* = 0. Therefore, the objective function of the transformed problem at the optimal solution is

(y^+)^* + (y^−)^* = 0 + 0 = |x_1^* − x_2^*|.
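The transformed problem can be checked numerically. A minimal sketch, assuming SciPy is available (the variable ordering is ours):

```python
# Numeric check of the transformed problem.
# Variables ordered as (x1, x2, y_plus, y_minus).
import numpy as np
from scipy.optimize import linprog

c = np.array([0.0, 0.0, 1.0, 1.0])       # minimize y+ + y-
A_ub = np.array([
    [1.0, -1.0, -1.0, 0.0],              # x1 - x2 <= y+
    [-1.0, 1.0, 0.0, -1.0],              # x2 - x1 <= y-
])
b_ub = np.zeros(2)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(res.fun)   # 0.0: at the optimum, y+ = y- = 0 and |x1 - x2| = 0
```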
Problem definition
Michel Bierlaire

Practice quiz

Can a function have both a global and a local minimum? If so, provide
an example. If not, explain why.
Problem definition
Michel Bierlaire

Solution of the practice quiz

Yes. Any global minimum is also a local minimum. If the function is convex, a local minimum is also a global minimum. If not, there may be local minima that are not global. An example of such a function is

f(x) = −2x^2 + x^4 − x^3.

[Plot of f(x) for x ∈ [−3, 3].]

We use the optimality conditions to identify the maxima and minima of the function. The first derivative of f(x) is

f'(x) = −4x + 4x^3 − 3x^2.

If we solve f'(x) = 0, we obtain three solutions:

x_1 = 0, x_2 = −0.693, x_3 = 1.443.

These are the only candidates to be minima or maxima.


• Consider the interval [−0.1, 0.1]. x1 reaches the maximum of f in this
neighborhood. It is therefore a local maximum.

• Consider the interval [−0.7, −0.6]. x2 reaches the minimum of f in this


neighborhood. It is therefore a local minimum.

• Consider the interval [1.4, 1.5]. x3 reaches the minimum of f in this


neighborhood. It is therefore a local minimum.

This can also be verified using the second derivative of the function f(x):

f''(x) = −4 + 12x^2 − 6x.

Indeed, if we substitute the solutions x in the function f''(x) we obtain:

f''(x_1) = −4 + 12(0)^2 − 6(0) = −4 < 0 ⇒ local maximum,
f''(x_2) = −4 + 12(−0.693)^2 − 6(−0.693) = 5.92 > 0 ⇒ local minimum,
f''(x_3) = −4 + 12(1.443)^2 − 6(1.443) = 12.32 > 0 ⇒ local minimum.

Since the function is a polynomial, we have

lim_{x→+∞} f(x) = lim_{x→+∞} (−2x^2 + x^4 − x^3) = +∞,
lim_{x→−∞} f(x) = lim_{x→−∞} (−2x^2 + x^4 − x^3) = +∞.

Therefore, the function has no global maximum. Its global minimum is the local minimum associated with the lowest value of f. We can conclude that the function has

• a local maximum at x_1 = 0, with f(x_1) = 0,

• a local minimum at x_2 = −0.693, with f(x_2) ≈ −0.397, and

• a global minimum at x_3 = 1.443, with f(x_3) ≈ −2.833.
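These critical points are easy to verify numerically. A minimal sketch using NumPy only (the script is ours, not part of the original solution):

```python
# Numeric verification of the critical points of f(x) = -2x^2 + x^4 - x^3.
import numpy as np

fprime_coeffs = [4, -3, -4, 0]            # f'(x) = 4x^3 - 3x^2 - 4x, highest degree first
critical = np.roots(fprime_coeffs)
f = lambda x: -2 * x**2 + x**4 - x**3
fsecond = lambda x: 12 * x**2 - 6 * x - 4  # f''(x)

for x in sorted(critical):
    kind = 'local minimum' if fsecond(x) > 0 else 'local maximum'
    print(f"x = {x:.3f}, f(x) = {f(x):.3f} -> {kind}")
```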


Problem definition
Michel Bierlaire

Practice quiz

Consider the following objective functions,

1. f_1(x) = x^2.

2. f_2(x) = 1/|x|.

3. f_3(x) = 1/x.

For each function, provide its infimum on R, or show that it does not
exist. Provide also its minimum on R, or show that it does not exist.
Answer the same question if the decision variable is constrained as follows:

−1 ≤ x ≤ 2.
Problem definition
Michel Bierlaire

Solution of the practice quiz


We first plot the function f_1 to get an intuition.

[Plot of f_1(x) = x^2 for x ∈ [−3, 3].]

The infimum of x^2 on R is 0:

inf_{x∈R} x^2 = 0.

Indeed, for each M > 0, there exists

y = √M / 2

such that y^2 = M/4 < M. The function also has a minimum at x^* = 0, as

f_1(x^*) = inf_{x∈R} x^2 = 0.

When the constraints are introduced, the same arguments can be used to reach the same conclusions: the infimum is 0, and x^* = 0 is an optimum.
We now analyze the function f_2.

[Plot of f_2(x) = 1/|x| for x ∈ [−2, 4].]

The infimum of 1/|x| on R is 0:

inf_{x∈R} 1/|x| = 0.

Indeed, for each M > 0, there exists

y = 2/M

such that f_2(y) = M/2 < M. However, there is no minimum, as there is no x such that

1/|x| = 0.

The infimum of 1/|x| on Y = {x ∈ R | −1 ≤ x ≤ 2} is 0.5. Indeed, for each M > 0.5, there exists

y = 2 ∈ Y

such that f_2(y) = 0.5 < M. And the minimum is x^* = 2, as

f_2(x^*) = 0.5 = inf_{x∈Y} 1/|x|.

Finally, we analyze the function f_3.

[Plot of f_3(x) = 1/x for x ∈ [−4, 4].]

There is no infimum on R. Indeed, the function is not bounded from below. As the infimum is the best lower bound, and there is no lower bound, there is no infimum. Consequently, there is no optimum either. As the function is also not bounded from below on the interval [−1, 2], we reach the same conclusions for the constrained case.
Problem definition
Michel Bierlaire

Practice quiz

Does the following function f(x_1, x_2) : R^2 → R have a global maximum and a global minimum in the feasible set A?

f(x_1, x_2) = x_1^2 + x_2^2 − 2 x_1 x_2,

A = {(x_1, x_2) ∈ R^2 | (x_1 − 1)^2 + (x_2 − 2)^2 ≤ 1}.

Hint: use the Weierstrass extreme value theorem.


Problem definition
Michel Bierlaire

Solution of the practice quiz

The function f(x_1, x_2) = x_1^2 + x_2^2 − 2 x_1 x_2 is a continuous function, because it is the sum of continuous functions.

[3D plot of f over the plane.]

The set A = {(x_1, x_2) ∈ R^2 | (x_1 − 1)^2 + (x_2 − 2)^2 ≤ 1} is a compact set: it is a closed disk, hence closed and bounded.

[Plot of the disk A in the (x_1, x_2) plane.]

Weierstrass's theorem guarantees that the given function has both a maximum and a minimum when optimized on the feasible set A.
Convexity
Michel Bierlaire

Practice quiz

Are the following functions convex or concave? Justify your answer.

1. f(x) = 2x^2 − 3.

2. g(x) = x^3 − 5x^2 + 6x.


Convexity
Michel Bierlaire

Solution of the practice quiz

1. The function f is plotted in Figure 1. This figure gives us the intuition that the function is convex.

[Figure 1: f(x) = 2x^2 − 3 for x ∈ [−3, 3].]

We use the definition of convexity to formally show it. Let x, y ∈ R and λ ∈ [0, 1]. In order for f(x) to be convex, it must satisfy the following condition:

f(λx + (1−λ)y) ≤ λ f(x) + (1−λ) f(y).

We first write the inequality and use algebra to simplify it:

2(λx + (1−λ)y)^2 − 3 ≤ λ(2x^2 − 3) + (1−λ)(2y^2 − 3)
2λ^2 x^2 + 4λ(1−λ)xy + 2(1−λ)^2 y^2 − 3 ≤ 2λx^2 − 3λ + 2y^2 − 3 − 2λy^2 + 3λ
2λ^2 x^2 + 4λ(1−λ)xy + 2(1−λ)^2 y^2 ≤ 2λx^2 + 2y^2 − 2λy^2
2λ^2 x^2 + 4λ(1−λ)xy + 2y^2 − 4λy^2 + 2λ^2 y^2 ≤ 2λx^2 + 2y^2 − 2λy^2
2λ^2 x^2 + 4λ(1−λ)xy − 4λy^2 + 2λ^2 y^2 ≤ 2λx^2 − 2λy^2
2(λ^2 − λ)x^2 + 4λ(1−λ)xy − 2λy^2 + 2λ^2 y^2 ≤ 0
2(λ^2 − λ)(x^2 − 2xy + y^2) ≤ 0
2(λ^2 − λ)(x − y)^2 ≤ 0.

Since λ ∈ [0, 1], the quantity (λ^2 − λ) is not positive. As (x − y)^2 ≥ 0 for any x, y ∈ R, the final inequality is always true, which makes the starting inequality true as well. With this, we conclude that f(x) is a convex function.
2. The function is plotted in Figure 2. It gives us the intuition that the function is neither convex nor concave. We show it.

[Figure 2: g(x) = x^3 − 5x^2 + 6x for x ∈ [−1, 4].]

In order for g(x) to be convex, it must satisfy the following condition:


g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y),
for each x, y ∈ R and each λ ∈ [0, 1].
Similarly, in order for g(x) to be concave, it must satisfy the following
condition:
g(λx + (1 − λ)y) ≥ λg(x) + (1 − λ)g(y),
for each x, y ∈ R and each λ ∈ [0, 1].
To show that the function is not convex, and not concave, we need to
prove the existence of x, y and λ that violate the above inequalities.
Let us first rearrange the function in the following manner:

g(x) = x^3 − 5x^2 + 6x = x(x − 2)(x − 3).

From here, we see easily that g(x) = 0 for x1 = 0, x2 = 2, and x3 = 3.


Let us further observe the function values at the points x_1, x_2, and
λx_1 + (1 − λ)x_2, where λ = 0.5 for the sake of simplifying the proof:

g(x1 ) = g(0) = 0
g(x2 ) = g(2) = 0
g(λx1 + (1 − λ)x2 ) = g(1) = 2

Since
g(λx1 + (1 − λ)x2 ) > λg(x1 ) + (1 − λ)g(x2 ),
we can conclude that g(x) is not convex.
Let us further observe the function values at the points x_2, x_3, and
λx_2 + (1 − λ)x_3, where λ = 0.5, again for the sake of simplifying the
proof:

g(x2 ) = g(2) = 0
g(x3 ) = g(3) = 0
g(λx2 + (1 − λ)x3 ) = g(2.5) = −0.625

Since
g(λx2 + (1 − λ)x3 ) < λg(x2 ) + (1 − λ)g(x3 ),
we can conclude that g(x) is not concave either.
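The two counterexamples are easy to check numerically. A minimal sketch using NumPy only (ours, not part of the original solution):

```python
# Numeric check of the two counterexamples for g(x) = x^3 - 5x^2 + 6x.
g = lambda x: x**3 - 5 * x**2 + 6 * x
lam = 0.5

# Not convex: the function lies above the chord between 0 and 2 at the midpoint.
print(g(lam * 0 + (1 - lam) * 2), '>', lam * g(0) + (1 - lam) * g(2))   # 2.0 > 0.0

# Not concave: the function lies below the chord between 2 and 3 at the midpoint.
print(g(lam * 2 + (1 - lam) * 3), '<', lam * g(2) + (1 - lam) * g(3))   # -0.625 < 0.0
```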
Convexity
Michel Bierlaire

Practice quiz

Consider a function f : R → R of a single variable, concave on [ℓ1 , u1 ],


where ℓ1 < u1 . Consider also the function g of two variables, defined by
g(x1 , x2 ) = f (x1 ) on A = [ℓ1 , u1 ] × [ℓ2 , u2 ], where ℓ2 < u2 . Show that g is
concave on A.
Remember that g is concave on a convex set A if

g((1 − λ)x + λy) ≥ (1 − λ)g(x) + λg(y)

for all x, y ∈ A and for all λ ∈ [0, 1].


Convexity
Michel Bierlaire

Solution of the practice quiz

Note first that the set A = [ℓ1 , u1 ] × [ℓ2 , u2 ] is non empty and convex.
Consider x and y in A and λ ∈ [0, 1]. By convexity of A, the point z =
(1 − λ)x + λy belongs to A.
As f is concave, we have

g(z) = f(z_1)
     = f((1−λ)x_1 + λy_1)
     ≥ (1−λ) f(x_1) + λ f(y_1)
     = (1−λ) g(x_1, x_2) + λ g(y_1, y_2)
     = (1−λ) g(x) + λ g(y),

which proves the concavity of g.


Differentiability: The first order
Michel Bierlaire

Practice quiz

Calculate the gradient of the following function:

f : R^3 → R : f(x_1, x_2, x_3) = e^{x_1} + x_1^2 x_3 − x_1 x_2 x_3.

What is the directional derivative along the direction d at a point (x_1, x_2, x_3), where

d = (d_1, d_2, d_3)^T ?
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

If f : R^n → R is a differentiable function, the function ∇f(x) : R^n → R^n is called the gradient of f and is defined as

∇f(x_1, x_2, x_3) = (∂f/∂x_1, ∂f/∂x_2, ∂f/∂x_3)^T.

Therefore, the gradient of the function is

∇f(x_1, x_2, x_3) = (e^{x_1} + 2 x_1 x_3 − x_2 x_3, −x_1 x_3, x_1^2 − x_1 x_2)^T.

Let f : R^n → R be a differentiable function. Consider x ∈ R^n and d ∈ R^n. The directional derivative of f at x in the direction d is given by

lim_{α→0} (f(x + αd) − f(x)) / α,

if the limit exists. Here, we have

f(x + αd) − f(x) = e^{x_1 + αd_1} + (x_1 + αd_1)^2 (x_3 + αd_3)
                   − (x_1 + αd_1)(x_2 + αd_2)(x_3 + αd_3)
                   − (e^{x_1} + x_1^2 x_3 − x_1 x_2 x_3)
                 = e^{x_1 + αd_1} − e^{x_1} + α x_1^2 d_3 + α^2 x_3 d_1^2 + α^3 d_1^2 d_3 + 2α x_1 x_3 d_1
                   + 2α^2 x_1 d_1 d_3 − α x_1 x_2 d_3 − α x_1 x_3 d_2 − α^2 x_1 d_2 d_3
                   − α x_2 x_3 d_1 − α^2 x_2 d_1 d_3 − α^2 x_3 d_1 d_2 − α^3 d_1 d_2 d_3.

Therefore,

lim_{α→0} (f(x + αd) − f(x)) / α = lim_{α→0} (e^{x_1 + αd_1} − e^{x_1}) / α
                                   + x_1^2 d_3 + 2 x_1 x_3 d_1 − x_1 x_2 d_3 − x_1 x_3 d_2 − x_2 x_3 d_1
                                 = d_1 e^{x_1} + x_1^2 d_3 + 2 x_1 x_3 d_1 − x_1 x_2 d_3 − x_1 x_3 d_2 − x_2 x_3 d_1
                                 = d_1 (e^{x_1} + 2 x_1 x_3 − x_2 x_3) − x_1 x_3 d_2 + (x_1^2 − x_1 x_2) d_3.

In addition, when the gradient exists, the directional derivative is the inner product between the gradient of f and the direction d, that is,

∇f(x)^T d.

Using this formula, the directional derivative is

∇f(x_1, x_2, x_3)^T d = d_1 (e^{x_1} + 2 x_1 x_3 − x_2 x_3) − d_2 x_1 x_3 + d_3 (x_1^2 − x_1 x_2).


You can verify that it is the same result as above.
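One can also verify this numerically with a finite difference. A minimal sketch, using NumPy only; the point and direction below are arbitrary choices of ours:

```python
# Finite-difference check of the gradient and the directional derivative.
import numpy as np

def f(x):
    return np.exp(x[0]) + x[0]**2 * x[2] - x[0] * x[1] * x[2]

def gradient(x):
    return np.array([np.exp(x[0]) + 2 * x[0] * x[2] - x[1] * x[2],
                     -x[0] * x[2],
                     x[0]**2 - x[0] * x[1]])

x = np.array([0.5, -1.0, 2.0])
d = np.array([1.0, 2.0, -1.0])
alpha = 1e-7
numeric = (f(x + alpha * d) - f(x)) / alpha    # definition of the directional derivative
analytic = gradient(x) @ d                     # inner product formula
print(numeric, analytic)                       # the two values should be close
```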
Differentiability: The first order
Michel Bierlaire

Practice quiz

Calculate the steepest descent direction of the following function

f(x) = (1/2) x_1^2 + 2 x_2^2,

at x = (1, 1). What is the directional derivative of the function at x in the steepest descent direction?
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

The steepest descent direction is the direction opposite to the gradient. We have

∇f(x) = (∂f/∂x_1, ∂f/∂x_2)^T = (x_1, 4 x_2)^T.

At the point x = (1, 1)^T, we have

∇f(x) = (1, 4)^T.

Therefore, the steepest descent direction is

d = (−1, −4)^T.

The directional derivative is

−∇f(x)^T ∇f(x) = (−1, −4)(1, 4)^T = −17.
Differentiability: first order
Michel Bierlaire

Practice quiz

Consider the function f : R^2 → R defined as

f(x) = (1/2) x_1^2 + 2 x_2^2.

1. Implement in Python a function that takes x as argument and returns the value of the function and its gradient (see the sketch after this list).

2. Evaluate the function at

x = (1, 1)^T.

3. Plot the function for x_1 and x_2 ranging from -6 to 6.

4. Consider the three following directions:

d_1 = −∇f(x), d_2 = (−1, −1)^T, d_3 = (1, −3)^T.

Plot the functions

g_i(α) = f(x + α d_i), i = 1, 2, 3,

for α ∈ [0, 1].

5. Calculate the directional derivatives of f at x along each direction.
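A minimal Python sketch addressing items 1, 2, 4 and 5 (assuming NumPy and Matplotlib are available; item 3 is a standard surface plot and is omitted here):

```python
import numpy as np
import matplotlib.pyplot as plt

def f_and_grad(x):
    value = 0.5 * x[0]**2 + 2 * x[1]**2
    grad = np.array([x[0], 4 * x[1]])
    return value, grad

x = np.array([1.0, 1.0])
value, grad = f_and_grad(x)
print(value, grad)                       # 2.5 and [1. 4.]

directions = [-grad, np.array([-1.0, -1.0]), np.array([1.0, -3.0])]
alphas = np.linspace(0, 1, 100)
for i, d in enumerate(directions, 1):
    g = [f_and_grad(x + a * d)[0] for a in alphas]
    plt.plot(alphas, g, label=f'g{i}')
    print(f'Directional derivative along d{i}:', grad @ d)  # -17, -5, -11
plt.legend()
plt.show()
```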


Differentiability: The first order
Michel Bierlaire

Practice quiz

Calculate the Jacobian matrix of the following function:

f : R^2 → R^3 : f(x_1, x_2) = ( x_1^2 x_2, cos(e^{x_1 + x_2^2}), x_2 )^T.
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

The Jacobian matrix is the transpose of the gradient matrix. In the gradient matrix, each row corresponds to a variable, and each column to a function. For the Jacobian matrix, it is the opposite. Here, as we have 2 variables and 3 functions, we have

∇f(x) ∈ R^{2×3}, J(x) ∈ R^{3×2}.

The gradient matrix is

∇f(x) = [ ∂f_1/∂x_1  ∂f_2/∂x_1  ∂f_3/∂x_1 ]
        [ ∂f_1/∂x_2  ∂f_2/∂x_2  ∂f_3/∂x_2 ]
      = [ 2 x_1 x_2   − sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}          0 ]
        [ x_1^2       −2 x_2 sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}     1 ].

The Jacobian matrix is

J(x) = ∇f(x)^T = [ 2 x_1 x_2                                       x_1^2                                          ]
                 [ − sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}           −2 x_2 sin(e^{x_1 + x_2^2}) e^{x_1 + x_2^2}   ]
                 [ 0                                                1                                              ].
Differentiability: The first order
Michel Bierlaire

Practice quiz

The depth of a lake (in meters) at coordinates (x_1, x_2) is given by the function:

f(x_1, x_2) = 400 − 3 x_1^2 x_2^2,

where the coordinate system is also in meters. If a swimmer is located in the middle of the lake, at coordinates (1, −2), determine the direction d that she needs to swim in order to make the depth increase as fast as possible. Provide d such that its norm is 1. If she swims a distance of one meter in that direction, what is the depth of the lake at her new position? What if she swims a distance of eight meters?
Differentiability: the first order
Michel Bierlaire

Solution of the practice quiz

The function

f(x_1, x_2) = 400 − 3 x_1^2 x_2^2

is differentiable at (1, −2)^T. Therefore the gradient is the direction of steepest ascent. We have

∇f(x_1, x_2) = (−6 x_1 x_2^2, −6 x_1^2 x_2)^T,

and

∇f(1, −2) = (−24, 12)^T.

As ‖∇f(1, −2)‖ = 12√5, the normalized direction in which the depth increases as fast as possible is then

d = (−2/√5, 1/√5)^T.

If she swims one meter along the direction, her new position is

x^+ = (1 − 2/√5, −2 + 1/√5)^T,

and the depth at this position is

f(x^+) = 400 − 3 (269 − 120√5)/25 ≈ 399.92.

If she swims a distance of 8 meters, her new position is

x^{++} = (1 − 16/√5, −2 + 8/√5)^T,

and the depth is

f(x^{++}) ≈ 117.

As the depth at the current position is

f(x) = 388,

this illustrates that following an ascent direction increases the value of the function, but only up to a point. If the step along the direction is too long, the value of the function may actually decrease. In this example, the step should be less than 5.59 in order to obtain an increase of the value of the function.
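These numbers are quick to check. A minimal sketch using NumPy only (ours, not part of the original solution):

```python
# Numeric check of the swimmer example.
import numpy as np

f = lambda x: 400 - 3 * x[0]**2 * x[1]**2
x = np.array([1.0, -2.0])
grad = np.array([-6 * x[0] * x[1]**2, -6 * x[0]**2 * x[1]])   # (-24, 12)
d = grad / np.linalg.norm(grad)                               # (-2/sqrt(5), 1/sqrt(5))

print(f(x))          # 388
print(f(x + d))      # approx 399.92, after one meter
print(f(x + 8 * d))  # approx 117, after eight meters: the step is too long
```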
Differentiability: the second order
Michel Bierlaire

Practice quiz

For each of the following functions,

1. calculate the gradient,

2. calculate the Hessian,

3. check if the function is convex, concave or neither,

4. calculate the curvature of the function in the direction d at the specified point x^+.

Hint: remember that a symmetric matrix is positive definite if all its eigenvalues are positive.

f(x) = (1/2) x_1^2 + (9/2) x_2^2,
x^+ = (0, 0)^T,
d = (1, 1)^T.

g(x) = x_1^3/3 + x_2^3 − x_1 − x_2,
x^+ = (9, 1)^T,
d = (9, 1)^T.
Differentiability: the second order
Michel Bierlaire

Solution of the practice quiz

f(x) = (1/2) x_1^2 + (9/2) x_2^2.

1. The gradient is

∇f(x_1, x_2) = (∂f/∂x_1, ∂f/∂x_2)^T = (x_1, 9 x_2)^T.

2. The Hessian is

∇^2 f(x_1, x_2) = [ 1 0 ; 0 9 ].

3. The Hessian is constant, and positive definite. Indeed, as it is a diagonal matrix, its diagonal entries are also its eigenvalues. As they are both positive, the matrix is positive definite. Consequently, the function f is convex everywhere.

4. The curvature of the function in the direction d = (1, 1)^T at x^+ = (0, 0)^T is obtained as

d^T ∇^2 f(x^+) d / d^T d = 10 / 2 = 5.

g(x) = x_1^3/3 + x_2^3 − x_1 − x_2.

1. The gradient is

∇g(x_1, x_2) = (x_1^2 − 1, 3 x_2^2 − 1)^T.

2. The Hessian is

∇^2 g(x_1, x_2) = [ 2 x_1 0 ; 0 6 x_2 ].

3. Consider the point x_a = (1, 1)^T. At that point, the Hessian is positive definite and, therefore, the function is convex at x_a. If you now consider the point x_b = (−1, −1)^T, the Hessian is negative definite, and the function is concave at x_b. Therefore, the function itself is neither a convex function nor a concave function.

4. The curvature of the function in the direction d = (9, 1)^T at the specified point x^+ = (9, 1)^T is obtained as

d^T ∇^2 g(x^+) d / d^T d = 1464 / 82 ≈ 17.85.
Differentiability: second order
Michel Bierlaire

Practice quiz
Consider f : R^2 → R defined as

f(x) = x_1^3/3 + x_2^3 − x_1 − x_2.
1. Implement in Python a function that takes x as argument and returns the value of the function, its gradient and its second derivatives matrix (see the sketch after this list).
2. Evaluate the function, its gradient and second derivatives matrix at

x = (9, 1)^T.

3. Plot the function for x1 and x2 ranging from -5 to 5.


4. Consider the following direction:
 
−1
d= .
−1
Plot the uni-dimensional function
g(α) = f (x + αd),
for α ∈ [0, 10].
5. Calculate the directional derivatives of f at x along the direction.
6. Calculate the curvature of f at x along the direction.
7. Calculate the eigenvalues and the eigenvectors of the matrix ∇2 f (x).
8. Calculate the curvature of f at x along the eigenvectors.
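A minimal Python sketch covering items 1, 2, 5, 6, 7 and 8, using NumPy only (the plotting items follow the same pattern as the earlier quiz):

```python
import numpy as np

def f_grad_hess(x):
    value = x[0]**3 / 3 + x[1]**3 - x[0] - x[1]
    grad = np.array([x[0]**2 - 1, 3 * x[1]**2 - 1])
    hess = np.array([[2 * x[0], 0.0], [0.0, 6 * x[1]]])
    return value, grad, hess

x = np.array([9.0, 1.0])
value, grad, hess = f_grad_hess(x)
d = np.array([-1.0, -1.0])

print(value, grad, hess)
print('Directional derivative:', grad @ d)              # -82
print('Curvature along d:', d @ hess @ d / (d @ d))     # 12
eigval, eigvec = np.linalg.eigh(hess)
print('Eigenvalues:', eigval)                           # [6, 18]
for i in range(2):
    u = eigvec[:, i]
    print('Curvature along eigenvector:', u @ hess @ u) # equals the eigenvalue
```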
Linearity
Michel Bierlaire

Practice quiz

Consider the function

f(x) = x^2/100 − 2x + 1.

Provide a Lipschitz constant for the derivative of this function.
Linearity
Michel Bierlaire

Solution of the practice quiz

We have

f(x) = x^2/100 − 2x + 1,
f'(x) = 2x/100 − 2,
f''(x) = 2/100.

We need to find M such that

|f'(x) − f'(y)| ≤ M |x − y|.

We have

|f'(x) − f'(y)| = |2x/100 − 2y/100| = (2/100) |x − y|.

Therefore, any M such that

M ≥ 2/100

is a Lipschitz constant.
Conditioning
Michel Bierlaire

Practice quiz
Consider the following quadratic function:

f(x_1, x_2) = 2 x_1^2 + 9 x_2^2.    (1)

1. Calculate the condition number of the Hessian of f at the point (x_1, x_2).

2. Apply the change of variables x' = M x, where

M = [ 2 0 ; 0 3√2 ],

that is,

x'_1 = 2 x_1,
x'_2 = 3√2 x_2.

Consider the function

f̃(x') = f(M^{-1} x').

Calculate the condition number of the Hessian (using the 2-norm) of f̃ at the point (x'_1, x'_2).

Remember that, if A ∈ R^{n×n} is a non singular symmetric matrix, then the condition number of A is

κ(A) = ‖A‖ ‖A^{-1}‖.    (2)

The 2-norm of a matrix is its largest singular value. Therefore, in this case,

κ_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = σ_1 / σ_n,    (3)

where σ_1 is the largest singular value of A and σ_n is the smallest. By extension, the condition number of a singular matrix (i.e., such that σ_n = 0) is +∞. If A is symmetric positive semidefinite, the singular values of A are its eigenvalues.
Conditioning
Michel Bierlaire

Solution of the practice quiz

We consider

f(x_1, x_2) = 2 x_1^2 + 9 x_2^2.

The second derivative matrix is

∇^2 f(x_1, x_2) = [ 4 0 ; 0 18 ].

1. The matrix is diagonal. Therefore, its eigenvalues are obtained directly as the diagonal entries. Consequently, the condition number is

κ_2(A) = 18/4 = 9/2,

for any x ∈ R^2.

2. We now apply the change of variables

x'_1 = 2 x_1, x'_2 = 3√2 x_2, that is, x_1 = x'_1 / 2, x_2 = (√2/6) x'_2.

We obtain

f̃(x'_1, x'_2) = f(x'_1 / 2, (√2/6) x'_2)
             = 2 (x'_1 / 2)^2 + 9 ((√2/6) x'_2)^2
             = (1/2) x'_1^2 + (1/2) x'_2^2.

The Hessian of f̃ is the identity matrix:

∇^2 f̃(x') = [ 1 0 ; 0 1 ].

Therefore, the two eigenvalues are equal to 1, and the condition number is 1, for any x' ∈ R^2.
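Both condition numbers can be checked with NumPy. A minimal sketch (ours, not part of the original solution):

```python
# Condition numbers before and after the change of variables.
import numpy as np

H = np.array([[4.0, 0.0], [0.0, 18.0]])              # Hessian of f
M = np.array([[2.0, 0.0], [0.0, 3.0 * np.sqrt(2.0)]])
Minv = np.linalg.inv(M)
H_tilde = Minv.T @ H @ Minv                          # Hessian of f(M^{-1} x')

print(np.linalg.cond(H, 2))        # 4.5
print(np.linalg.cond(H_tilde, 2))  # 1.0
```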
Necessary optimality conditions
Michel Bierlaire

Practice quiz
Consider the function

f(x_1, x_2) = 50 x_1^2 − x_2^3

illustrated in the figure below.

[3D plot of f for x_1, x_2 ∈ [−10, 10].]

1. Show that the point

x^* = (0, 0)^T

satisfies the necessary optimality conditions.

2. Show that this point is neither a local minimum nor a local maximum. Hint: consider the directions

d_1 = (0, 1)^T and d_2 = (0, −1)^T.
Necessary optimality conditions
Michel Bierlaire

Solution of the practice quiz

1. The gradient is

∇f(x) = (100 x_1, −3 x_2^2)^T, ∇f(x^*) = (0, 0)^T.

Therefore, the first order necessary condition is verified. The second derivative matrix is

∇^2 f(x) = [ 100 0 ; 0 −6 x_2 ], ∇^2 f(x^*) = [ 100 0 ; 0 0 ].

It is positive semidefinite. Indeed, the eigenvalues are read on the diagonal, as it is a diagonal matrix. And they are non negative. Therefore, the second order necessary condition is verified.

2. Note that the value of the objective function at x^* is

f(x^*) = 0.

Consider the direction

d_1 = (0, 1)^T

and the point x_1 obtained by following this direction from x^* with a step α:

x_1 = x^* + α d_1 = (0, 0)^T + α (0, 1)^T = (0, α)^T.

The value of the objective function at x_1 is

f(x_1) = −α^3.

Therefore, for any α > 0, we have

f(x_1) < f(x^*),

and x^* cannot be a local minimum.

Consider now the direction

d_2 = (0, −1)^T

and the point x_2 obtained by following this direction from x^* with a step α:

x_2 = x^* + α d_2 = (0, 0)^T + α (0, −1)^T = (0, −α)^T.

The value of the objective function at x_2 is

f(x_2) = α^3.

Therefore, for any α > 0, we have

f(x_2) > f(x^*),

and x^* cannot be a local maximum.


Necessary optimality conditions
Michel Bierlaire

Practice quiz

Consider the affine function

f : R^n → R : x → c^T x + b = Σ_{i=1}^n c_i x_i + b,

where c ∈ R^n and b ∈ R. For what values of x, b and c are the necessary optimality conditions verified?
Necessary optimality conditions
Michel Bierlaire

Solution of the practice quiz

The gradient of f is

∇f(x) = (∂f/∂x_1, …, ∂f/∂x_n)^T = (c_1, …, c_n)^T = c,

for any x ∈ R^n. As it is a constant vector, the second derivative matrix is zero for any x ∈ R^n:

∇^2 f(x) = 0.

We first note that the value of b is irrelevant to determine the necessary optimality conditions. Adding a constant to an objective function does not change the optima. If c = 0, that is, if

c_1 = … = c_n = 0,

then the necessary optimality conditions are verified for all x ∈ R^n. If c ≠ 0, that is, if there is at least one i such that c_i ≠ 0, then the first order necessary optimality conditions are not verified. The second order necessary conditions are always verified, as the null matrix is positive semidefinite.

Note that, if c = 0, the linear function is constant, with value b. And any x ∈ R^n is a (global) optimum. If not, the function is not constant. As there is no constraint, it is not bounded and there is no optimum.
Sufficient optimality conditions
Michel Bierlaire

Practice quiz

Consider the function

f(x_1, x_2) = (1/2) x_1^2 + x_1 cos x_2

illustrated in the figures below.

[3D plot and contour plot of f.]

Use the optimality conditions to identify the minima of this function.


Sufficient optimality conditions
Michel Bierlaire

Solution of the practice quiz

The gradient of the function is

∇f(x_1, x_2) = (x_1 + cos x_2, −x_1 sin x_2)^T.

This gradient is zero for

x^*_k = ((−1)^{k+1}, kπ)^T, k ∈ Z,

and for

x̄_k = (0, π/2 + kπ)^T, k ∈ Z.

[Contour plot of f showing the points x^*_k and x̄_k.]

The second derivative matrix is

∇^2 f(x_1, x_2) = [ 1 −sin x_2 ; −sin x_2 −x_1 cos x_2 ].

By evaluating this matrix at x^*_k, we obtain for any k ∈ Z

∇^2 f(x^*_k) = [ 1 0 ; 0 1 ].

Since this matrix is positive definite, each point x^*_k satisfies the sufficient optimality conditions and is a local minimum of the function.

By evaluating the second derivative matrix at x̄_k, we obtain for any k ∈ Z

∇^2 f(x̄_k) = [ 1 (−1)^{k+1} ; (−1)^{k+1} 0 ].

Regardless of k, this matrix is not positive semidefinite. Indeed, if k is even,

∇^2 f(x̄_k) = [ 1 −1 ; −1 0 ].

If k is odd,

∇^2 f(x̄_k) = [ 1 1 ; 1 0 ].

In both cases, the eigenvalues are −0.61803 and 1.61803. Therefore, no x̄_k satisfies the necessary optimality conditions. None of them can then be a local minimum.
Sufficient optimality conditions
Michel Bierlaire

Practice quiz

Consider the function

f(x_1, x_2) = 2 x_1^2 + 3 x_2^2 + x_1 − x_2 + 3

illustrated in the figures below.

[3D plot and contour plot of f.]

Show that

x^* = (−1/4, 1/6)^T

is a global minimum of f. Is it the only one?


Sufficient optimality conditions
Michel Bierlaire

Solution of the practice quiz

First, we show that the point

x^* = (−1/4, 1/6)^T

is a local minimum of the function f. The gradient of the function is

∇f(x_1, x_2) = (4 x_1 + 1, 6 x_2 − 1)^T,

which is zero at x^*.

[Contour plot of f showing x^*.]

The second derivative matrix is

∇^2 f(x_1, x_2) = [ 4 0 ; 0 6 ],

which is positive definite at x^*. Therefore, the point

x^* = (−1/4, 1/6)^T

satisfies the sufficient optimality conditions and is a local minimum.

As the second derivative matrix is positive definite for all x ∈ R^2, f is strictly convex. Therefore, x^* is the unique global minimum of the function. Thus, x^* = (−1/4, 1/6)^T is the only global minimum of f.
Quadratic functions
Michel Bierlaire

Practice quiz

Consider a quadratic function

f(x) = (1/2) x^T Q x + g^T x + c,

where Q = [ 1 α ; α 1 ], g ∈ R^2 and c ∈ R.

1. For which values of α does f not have a local minimum?

2. For which values of α does f have a unique global minimum?

3. What are the conditions on g and α so that the problem has an infinite number of global minima?
Quadratic functions
Michel Bierlaire

Solution of the practice quiz

The optimality conditions for quadratic functions are related to the positive definiteness of the (constant) second derivative matrix. The derivatives of the objective function are

∇f(x) = Qx + g, ∇^2 f(x) = Q.

The positive definiteness of Q can be identified from its eigenvalues, which are the roots of the characteristic polynomial. Therefore, we solve the equation

det(λI − Q) = (λ − 1)^2 − α^2 = λ^2 − 2λ + (1 − α^2) = (1 − α − λ)(1 + α − λ) = 0.

Therefore, we obtain the eigenvalues:

λ_1 = 1 − α,
λ_2 = 1 + α.

The corresponding (normalized) eigenvectors are

u_1 = (√2/2, −√2/2)^T, as Q u_1 = λ_1 u_1,
u_2 = (√2/2, √2/2)^T, as Q u_2 = λ_2 u_2.
1. For which values of α does f not have a local minimum? According to the theory, it happens when Q is not positive semidefinite, that is, when at least one eigenvalue is negative. This is the case if α > 1 (so that λ_1 < 0), or α < −1 (so that λ_2 < 0).
2. For which values of α does f have a unique global minimum? According to the theory, it happens when Q is positive definite. This is the case if −1 < α < 1, so that λ_1 > 0 and λ_2 > 0.
3. What are the conditions on g and α so that the problem has an infinite
number of global minima? According to the theory, it happens when
Q is positive semidefinite but not positive definite. It means that no
eigenvalue is negative, and at least one of them is zero. This is the case
if α = 1, so that λ1 = 0 and λ2 = 2, and if α = −1, so that λ1 = 2 and
λ2 = 0.
Geometrically, we can decompose the space into two subspaces. In the
subspace spanned by the eigenvectors of the positive eigenvalues, the
function is strictly convex. In this subspace, there is a unique minimum.
In the subspace spanned by the eigenvectors of the zero eigenvalues, the
function is linear, as there is no curvature. Therefore, it is bounded
only if it is constant. And, in that case, there is an infinite number of
minima.
Algebraically, we use the Schur decomposition of Q:

Q = U Λ U^T = [ √2/2 √2/2 ; −√2/2 √2/2 ] [ 1−α 0 ; 0 1+α ] [ √2/2 −√2/2 ; √2/2 √2/2 ],

where U is an orthogonal matrix composed of the eigenvectors of Q organized in columns.
Let's use the matrix U to decompose x and g:

U^T x = x' = (x'_1, x'_2)^T and U^T g = g' = (g'_1, g'_2)^T,

that is,

x' = (√2/2) (x_1 − x_2, x_1 + x_2)^T and g' = (√2/2) (g_1 − g_2, g_1 + g_2)^T.
We use the Schur decomposition, and the fact that the matrix U is orthogonal, so that U U^T = I, to rewrite the objective function as follows:

f(x) = (1/2) x^T Q x + g^T x + c
     = (1/2) x^T U Λ U^T x + g^T U U^T x + c.

Using the decomposition of the vectors, we obtain the function in the new variables:

f̃(x') = (1/2) x'^T Λ x' + g'^T x' + c
      = (1/2) λ_1 (x'_1)^2 + (1/2) λ_2 (x'_2)^2 + g'_1 x'_1 + g'_2 x'_2 + c.

The gradient of this function is

∇f̃(x') = (λ_1 x'_1 + g'_1, λ_2 x'_2 + g'_2)^T.

We see that, for the gradient to be zero (which is a necessary condition for optimality), we need

x'_i = −g'_i / λ_i if λ_i ≠ 0,
g'_i = 0 if λ_i = 0.

Now, we consider the two values of α identified above.

• If α = 1, then λ_1 = 0 and λ_2 = 2, and we need

g'_1 = (√2/2)(g_1 − g_2) = 0 and x'_2 = (√2/2)(x_1 + x_2) = −g'_2 / 2.

It means that the function must be such that g_1 = g_2. If so, any solution such that

x_1 = −g_1 − x_2

is a global minimum.

• If α = −1, then λ_1 = 2 and λ_2 = 0, and we need

x'_1 = (√2/2)(x_1 − x_2) = −g'_1 / 2 and g'_2 = (√2/2)(g_1 + g_2) = 0.

It means that the function must be such that g_1 = −g_2. If so, any solution such that

x_1 = −g_1 + x_2

is a global minimum.

It can easily be verified that the gradient of the function,

∇f(x) = Qx + g = (x_1 + α x_2 + g_1, α x_1 + x_2 + g_2)^T,

is indeed zero under these conditions.


Quadratic functions
Michel Bierlaire

Practice quiz

Consider the function

f(x_1, x_2) = (1/2)(x_1 − x_2)^2 + 3x_1 − 5,

which is illustrated in the figure below.

[3D plot of f for x_1, x_2 ∈ [−5, 5].]

1. Write the function in quadratic form.

2. Does the function have any local minimum?


Quadratic functions
Michel Bierlaire

Solution of the practice quiz

To write f in quadratic form

f(x) = (1/2) x^T Q x + g^T x + c,

we calculate its derivatives:

∇f(x) = (x_1 − x_2 + 3, −x_1 + x_2)^T and ∇^2 f(x) = [ 1 −1 ; −1 1 ].

Therefore,

Q = [ 1 −1 ; −1 1 ], g = (3, 0)^T, c = −5.

The positive definiteness of Q can be identified from its eigenvalues, which are the roots of the characteristic polynomial. Therefore, we solve the equation

det(λI − Q) = (λ − 1)^2 − 1 = λ^2 − 2λ = λ(λ − 2) = 0.

Therefore, the eigenvalues of Q are λ_1 = 2 and λ_2 = 0. We deduce that Q is positive semidefinite. Thus, either the problem is not bounded or there is an infinite number of global minima.

In order to find in which case we are, we consider the Schur decomposition of Q. To do so, we first have to determine the eigenvectors associated with each eigenvalue.

• For λ_1 we have

(Q − λ_1 I) x = [ −1 −1 ; −1 −1 ] (x_1, x_2)^T = (0, 0)^T,

which implies that the normalized eigenvector is

x_{λ_1} = (√2/2) (1, −1)^T.

• For λ_2 we have

(Q − λ_2 I) x = [ 1 −1 ; −1 1 ] (x_1, x_2)^T = (0, 0)^T,

which implies that the normalized eigenvector is

x_{λ_2} = (√2/2) (1, 1)^T.

Thus, we can write

Q = U Λ U^T = [ √2/2 √2/2 ; −√2/2 √2/2 ] [ 2 0 ; 0 0 ] [ √2/2 √2/2 ; −√2/2 √2/2 ]^T.

We have

U^T g = (√2/2) (3, 3)^T.

As the second entry of this vector, corresponding to the zero eigenvalue, is not zero, we can conclude that the problem is unbounded.
Solving equations: Newton with one variable
Michel Bierlaire

Practice quiz

Implement in Python Newton's algorithm to find the root of one equation with one unknown (a sketch follows the list below). More precisely, for a given F : R → R, find x^* such that

|F(x^*)| ≤ ε = 10^{−15}.

Use this implementation to find the root of the following functions.

1. F(x) = x^2 − 2 = 0, with x_0 = 2.

2. F(x) = x − sin(x) = 0, with x_0 = 1.

3. F(x) = arctan(x) = 0, with x_0 = 1.5.
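A minimal sketch of such an implementation, assuming the derivative is provided by the caller:

```python
def newton(F, dF, x, eps=1e-15, maxiter=100):
    """Find x such that |F(x)| <= eps, starting from x."""
    for _ in range(maxiter):
        if abs(F(x)) <= eps:
            return x
        x = x - F(x) / dF(x)     # Newton iteration
    return x   # may not have converged (e.g. arctan diverges from 1.5)

import math
print(newton(lambda x: x**2 - 2, lambda x: 2 * x, 2.0))                  # sqrt(2)
print(newton(lambda x: x - math.sin(x), lambda x: 1 - math.cos(x), 1.0))
print(newton(lambda x: math.atan(x), lambda x: 1 / (1 + x**2), 1.5))
```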


Solving equations: Newton with several
variables
Michel Bierlaire

Practice quiz

Newton's method is easily generalized to solve systems of equations with several variables. We want to find x such that F(x) = 0, where F : R^n → R^n. If J(x) is the Jacobian of F, the method consists in solving, at each iteration, the system of linear equations

J(x_k) d_{k+1} = −F(x_k),

and updating the iterate:

x_{k+1} ← x_k + d_{k+1}.

Remember that the element in row i and column j of the Jacobian matrix is

∂F_i / ∂x_j.
1. Implement in Python Newton's algorithm to find the root of a system of n equations with n unknowns. More precisely, for a given F : R^n → R^n, find x^* such that

‖F(x^*)‖ ≤ ε = 10^{−15}.

2. Apply it to the following system of equations, using

x_0 = (−2, −2)^T

as a starting point:

F(x) = (x_1^3 − 3 x_1 x_2^2 − 1, x_2^3 − 3 x_1^2 x_2)^T = 0.

3. The above system of equations happens to have three roots:

x^*(b) = (1, 0)^T, x^*(g) = (−1/2, √3/2)^T, x^*(w) = (−1/2, −√3/2)^T.

It is actually difficult to predict toward which root the algorithm will converge. To visualize the process, generate a plot using the following convention (a sketch of the basic solver follows this list):

• if Newton's method, when starting from the point x_0, converges toward the solution x^*(b), the point x_0 is colored in red;

• if Newton's method, when starting from the point x_0, converges toward the solution x^*(g), the point x_0 is colored in blue;

• if Newton's method, when starting from the point x_0, converges toward the solution x^*(w), the point x_0 is colored in yellow.
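A minimal sketch of the solver using NumPy; the fractal plot of item 3 can be built by calling it on a grid of starting points:

```python
import numpy as np

def newton_system(F, J, x, eps=1e-15, maxiter=100):
    """Find x such that ||F(x)|| <= eps, starting from x."""
    for _ in range(maxiter):
        if np.linalg.norm(F(x)) <= eps:
            return x
        d = np.linalg.solve(J(x), -F(x))   # solve J(x_k) d = -F(x_k)
        x = x + d
    return x

F = lambda x: np.array([x[0]**3 - 3 * x[0] * x[1]**2 - 1,
                        x[1]**3 - 3 * x[0]**2 * x[1]])
J = lambda x: np.array([[3 * x[0]**2 - 3 * x[1]**2, -6 * x[0] * x[1]],
                        [-6 * x[0] * x[1], 3 * x[1]**2 - 3 * x[0]**2]])
print(newton_system(F, J, np.array([-2.0, -2.0])))   # one of the three roots
```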
Newton’s local method
Michel Bierlaire

Practice quiz

1. Implement in Python Newton's local algorithm to find the minimum of a function of n variables. More precisely, for a given f : R^n → R, find x^* such that

‖∇f(x^*)‖ ≤ ε = 10^{−15}.

2. Apply it to the following function:

f(x_1, x_2) = 2x_1^3 + 6x_1 x_2^2 − 3x_2^3 − 150x_1,

using the following starting points:

x_0 = (6.2, −3.4)^T and x_0 = (2.8, 4.3)^T.

3. For each case, determine whether the algorithm indeed found a minimum or not.
Newton’s local method
Michel Bierlaire

Practice quiz

Consider the function f : R → R defined as

f(x) = −x^4 + 12x^3 − 47x^2 + 60x.

For each of the following points x̂,

• identify the quadratic model of the function f at x̂,

• identify the point at which the derivative of the quadratic model is zero,

• plot the function, the quadratic model and the zero, and

• comment on the corresponding iteration of Newton's local method.

The points to consider are

1. x̂ = 3,

2. x̂ = 4,

3. x̂ = 5.
Newton’s local method
Michel Bierlaire

Solution of the practice quiz

[1]: %matplotlib inline


import numpy as np
import matplotlib.pyplot as plt

Consider f : R → R defined as

f(x) = −x^4 + 12x^3 − 47x^2 + 60x.

[2]: def theFunction(x):


return(-x**4 + 12 * x**3 - 47 * x**2 + 60 * x)
x = np.arange(0, 5.5, 0.1)
y = theFunction(x)
plt.plot(x, y)

[2]: [<matplotlib.lines.Line2D at 0x115410940>]


The first derivative is

f'(x) = −4x^3 + 36x^2 − 94x + 60.

[3]: def fprime(x):


return -4 * x**3 + 36 * x**2 - 94 * x + 60

The second derivative is

f''(x) = −12x^2 + 72x − 94.

[4]: def fsecond(x):


return -12 * x**2 + 72 * x - 94

The quadratic model at the point x̂ is defined as

m(x) = f(x̂) + (x − x̂) f'(x̂) + (1/2)(x − x̂)^2 f''(x̂).

[5]: def model(x, xhat):
    return theFunction(xhat) + (x - xhat) * fprime(xhat) + 0.5 * (x - xhat)**2 * fsecond(xhat)

The first derivative of the quadratic model is zero at the point

x^+ = x̂ − f'(x̂) / f''(x̂).

[6]: def zero(xhat):


return xhat - fprime(xhat) / fsecond(xhat)

Plot the quadratic model at a given point

[7]: def plotModel(xhat):


print(f'f({xhat})={theFunction(xhat)}')
print(f'f\'({xhat})={fprime(xhat)}')
print(f'f\'\'({xhat})={fsecond(xhat)}')
x = np.arange(0, 5.5, 0.1)
y1 = theFunction(x)
y2 = model(x, xhat)
plt.ylim(-10, 25)
plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(xhat, theFunction(xhat), 'go')
z = zero(xhat)
print(f'Zero of the derivative of the model: {z:.3g}')
plt.plot(z, model(z, xhat), 'ro')

Quadratic model around x̂ = 3.


[8]: plotModel(3)

f(3)=0
f'(3)=-6
f''(3)=14
Zero of the derivative of the model: 3.43

The model is convex. Indeed, the second derivative is positive. The zero of the derivative is
therefore the minimum of the quadratic model. And there is a good adequacy between the model
and the function at that point. This is a favorable case for Newton’s local method.
Quadratic model around x̂ = 4.
[9]: plotModel(4)

f(4)=0
f'(4)=4
f''(4)=2
Zero of the derivative of the model: 2
The model is convex. Indeed, the second derivative is positive. The zero of the derivative is therefore the minimum of the quadratic model. However, there is not a good adequacy between the model and the function at that point. The value of the function actually increases at that point. This is not a favorable case for Newton's local method.
Quadratic model around x̂ = 5.
[10]: plotModel(5)

f(5)=0
f'(5)=-10
f''(5)=-34
Zero of the derivative of the model: 4.71
The model is not convex. Indeed, the second derivative is negative. Therefore, the model is not bounded from below and there is no minimum. The zero of the derivative corresponds to a maximum of the quadratic model. Even if there is a good adequacy between the model and the function at that point, the value of the function actually increases. This is not a favorable case for Newton's local method.
Preconditioned steepest descent
Michel Bierlaire

Practice quiz

1 Algorithm
Consider a function f : R^n → R. The preconditioned steepest descent algorithm is defined as

x_{k+1} = x_k + α d_k,

where

d_k = −D_k ∇f(x_k),

and D_k ∈ R^{n×n} is a positive definite matrix. Implement the algorithm with

α = − d_k^T ∇f(x_k) / (d_k^T ∇^2 f(x_k) d_k).

The iterations must be interrupted when

1. either the norm of the gradient is sufficiently close to zero, that is,

‖∇f(x_k)‖_2 ≤ ε,

where ε is a given precision,

2. or a maximum number of iterations is reached.

2 Illustration
Consider the function

f(x_1, x_2) = x_1^2 + 11x_2^2 + x_1 x_2.

Apply the preconditioned steepest descent algorithm with

x_0 = (4, 1)^T, ε = 10^{−7},

and the following preconditioners:

D_k = I,
D_k = [ 1 0 ; 0 1/10 ],
D_k = (1/43) [ 22 −1 ; −1 2 ].

Comment on the number of iterations and the global behavior of the algorithm. A sketch of a possible implementation follows.
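A minimal sketch using NumPy; here the Hessian of the quadratic example is constant, which simplifies the step computation:

```python
import numpy as np

H = np.array([[2.0, 1.0], [1.0, 22.0]])        # Hessian of f
grad = lambda x: H @ x                          # gradient of f

def preconditioned_steepest_descent(x, D, eps=1e-7, maxiter=1000):
    for k in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            return x, k
        d = -D @ g
        alpha = -(d @ g) / (d @ H @ d)          # exact step for a quadratic
        x = x + alpha * d
    return x, maxiter

x0 = np.array([4.0, 1.0])
for D in [np.eye(2),
          np.diag([1.0, 0.1]),
          np.array([[22.0, -1.0], [-1.0, 2.0]]) / 43.0]:
    x, k = preconditioned_steepest_descent(x0, D)
    print(k, x)    # the third preconditioner is the inverse Hessian:
                   # it should converge in one iteration
```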
Preconditioning
Michel Bierlaire

Practice quiz
Consider the function f : R^2 → R defined as

f(x) = (1/2) x_1^2 + (101/2) x_2^2 + x_1 x_2.

1. Consider a change of variables

x' = L_k^T x,

where

L_k L_k^T = ∇^2 f(x_k).

Write the function in the new variables:

f̃(x') = f(L_k^{−T} x').

2. Calculate its first and second derivatives.

3. Consider the point

x_0 = (1, 1)^T.

Calculate the corresponding point in the new variables, x'_0. Apply one iteration of the steepest descent algorithm on f̃ from that point, that is,

x'_{k+1} = x'_k − α ∇f̃(x'_k),

where the step size is

α = ∇f̃(x'_k)^T ∇f̃(x'_k) / (∇f̃(x'_k)^T ∇^2 f̃(x'_k) ∇f̃(x'_k)).

4. Finally, identify the corresponding point in the original variables.


Preconditioning
Michel Bierlaire

Solution of the practice quiz

The derivatives of the function are

∇f(x) = (x_1 + x_2, x_1 + 101 x_2)^T, ∇^2 f(x) = [ 1 1 ; 1 101 ].

1. The matrix L_k defining the change of variables is the Cholesky factor of the second derivative matrix, that is,

L_k = [ 1 0 ; 1 10 ], ∀k,

as

L_k L_k^T = [ 1 1 ; 1 101 ].

Therefore, the change of variables is x' = L_k^T x, that is,

x'_1 = x_1 + x_2,
x'_2 = 10 x_2.

We write the change of variables in the opposite direction as x = L_k^{−T} x', that is,

x_1 = x'_1 − x'_2 / 10,
x_2 = x'_2 / 10.

The function f̃ is therefore defined as

f̃(x') = f(x'_1 − x'_2/10, x'_2/10)
      = (1/2)(x'_1 − x'_2/10)^2 + (101/2)(x'_2/10)^2 + (x'_1 − x'_2/10)(x'_2/10)
      = (1/2) x'_1^2 + (1/2) x'_2^2.

2. The derivatives of f̃ are

∇f̃(x'_1, x'_2) = (x'_1, x'_2)^T, ∇^2 f̃ = [ 1 0 ; 0 1 ].

3. We write the point in the new variables:

x_0 = (1, 1)^T, x'_0 = (2, 10)^T,

so that

∇f̃(x'_0) = (2, 10)^T.

The step to perform along the steepest descent direction is

α = ∇f̃(x'_0)^T ∇f̃(x'_0) / (∇f̃(x'_0)^T ∇^2 f̃(x'_0) ∇f̃(x'_0)).

As ∇^2 f̃(x'_0) is the identity matrix, α = 1. Actually, this would be true for any x_0. Therefore, we obtain

x'_1 = x'_0 − α ∇f̃(x'_0) = (2, 10)^T − (2, 10)^T = (0, 0)^T.

4. In the original variables, we have

x_1 = x'_1 − x'_2/10 = 0,
x_2 = x'_2/10 = 0.

This happens to be the optimal solution of the problem.
Quadratic interpolation
Michel Bierlaire

Practice quiz

Consider the function h : R → R, and three points a, b and c such that


a < b < c, h(a) > h(b) and h(c) > h(b).

1. Write the polynomial of degree 2 that interpolates h at a, b and c. That


is, P (x) ∈ P2 such that P (a) = h(a), P (b) = h(b) and P (c) = h(c).

2. Show that this polynomial is convex.

3. Identify the point corresponding to the minimum of the polynomial.


Quadratic interpolation
Michel Bierlaire

Solution of the practice quiz

1. We first build a polynomial of degree 2 that is equal to 1 if x = a, and to 0 if x = b or x = c:

P_a(x) = (x − b)(x − c) / ((a − b)(a − c)).

Similarly, we define a polynomial of degree 2 that is equal to 1 if x = b, and to 0 if x = a or x = c:

P_b(x) = (x − a)(x − c) / ((b − a)(b − c)),

and a polynomial of degree 2 that is equal to 1 if x = c, and to 0 if x = a or x = b:

P_c(x) = (x − a)(x − b) / ((c − a)(c − b)).

Then, we define the polynomial

P(x) = h(a) P_a(x) + h(b) P_b(x) + h(c) P_c(x).

By construction, P(a) = h(a), P(b) = h(b) and P(c) = h(c).


2. To show that the polynomial is convex, we need to calculate the derivatives. We have

P'_a(x) = (2x − b − c) / ((a − b)(a − c)),   P''_a(x) = 2 / ((a − b)(a − c)),
P'_b(x) = (2x − a − c) / ((b − a)(b − c)),   P''_b(x) = 2 / ((b − a)(b − c)),
P'_c(x) = (2x − a − b) / ((c − a)(c − b)),   P''_c(x) = 2 / ((c − a)(c − b)),

and

P''(x) = h(a) P''_a(x) + h(b) P''_b(x) + h(c) P''_c(x)
       = 2 (h(a)(c − b) + h(b)(a − c) + h(c)(b − a)) / ((b − a)(c − a)(c − b)).
Because a < b < c, we have

2 / ((b − a)(c − a)(c − b)) > 0.

For the second factor, we have

h(a)(c − b) + h(b)(a − c) + h(c)(b − a)
= (h(a) − h(b))(c − b) + (h(c) − h(b))(b − a) + h(b)((c − b) + (a − c) + (b − a))
= (h(a) − h(b))(c − b) + (h(c) − h(b))(b − a).

Because a < b < c, h(a) > h(b) and h(c) > h(b), this factor is (strictly) positive, and so is the second derivative. Therefore, the polynomial is convex.
3. As the polynomial is convex, its minimum is reached where the first derivative is zero. We have

P'(x) = h(a) P'_a(x) + h(b) P'_b(x) + h(c) P'_c(x),

that is,

P'(x) = [h(a)(c − b)(2x − c − b) + h(b)(a − c)(2x − a − c) + h(c)(b − a)(2x − a − b)] / ((b − a)(c − a)(c − b))
      = 2 (h(a)(c − b) + h(b)(a − c) + h(c)(b − a)) x / ((b − a)(c − a)(c − b))
        + (h(a)(b^2 − c^2) + h(b)(c^2 − a^2) + h(c)(a^2 − b^2)) / ((b − a)(c − a)(c − b)).

Therefore P'(x^*) = 0 if

x^* = −(1/2) [h(a)(b^2 − c^2) + h(b)(c^2 − a^2) + h(c)(a^2 − b^2)] / [h(a)(c − b) + h(b)(a − c) + h(c)(b − a)],

or, equivalently,

x^* = (1/2) [h(a)(b^2 − c^2) + h(b)(c^2 − a^2) + h(c)(a^2 − b^2)] / [h(a)(b − c) + h(b)(c − a) + h(c)(a − b)].
Quadratic interpolation
Michel Bierlaire

Practice quiz

Consider the unidimensional function h : R → R, which is continuous


and decreasing at 0. That is, there exists η > 0 such that h(x) < h(0) for
each 0 < x ≤ η. In particular, assume that we have access to δ ∈ R such
that h(δ) < h(0).

1. Implement first a function that generates three points a, b and c such


that a < b < c, h(a) > h(b) and h(c) > h(b). Hint: start with a = 0,
b = δ, c = 2δ, and check if the conditions are verified. If not, consider
a = δ, b = 2δ, c = 4δ, etc.

2. Implement the exact line search algorithm with quadratic interpolation (a sketch follows this list).

3. Consider the function

h(x) = (2 + x) cos(2 + x).

Apply the quadratic interpolation algorithm with a precision ε = 10^{−3}. It means that the algorithm stops if

max(h(a), h(c)) − h(b) ≤ ε or c − a ≤ ε.

Identify the first 3 points using the procedure implemented under item
1 with δ = 6.
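A minimal sketch of items 1 and 2, assuming h is decreasing at 0 and δ with h(δ) < h(0) is given (the bracketing loop follows the hint, and it assumes the function eventually increases so that a bracket exists):

```python
import math

def bracket(h, delta):
    """Item 1: find a < b < c with h(a) > h(b) and h(c) > h(b)."""
    a, b, c = 0.0, delta, 2.0 * delta
    while not (h(a) > h(b) and h(c) > h(b)):
        a, b, c = b, c, 2.0 * c
    return a, b, c

def quadratic_min(a, b, c, ha, hb, hc):
    num = ha * (b**2 - c**2) + hb * (c**2 - a**2) + hc * (a**2 - b**2)
    den = ha * (b - c) + hb * (c - a) + hc * (a - b)
    return 0.5 * num / den

def quadratic_interpolation(h, delta, eps=1e-3):
    """Item 2: shrink the bracket around the minimum of the parabola."""
    a, b, c = bracket(h, delta)
    while max(h(a), h(c)) - h(b) > eps and c - a > eps:
        x = quadratic_min(a, b, c, h(a), h(b), h(c))
        if x == b:                        # degenerate case: perturb slightly
            x = b + 0.1 * (c - b)
        if x < b:
            a, b, c = (a, x, b) if h(x) < h(b) else (x, b, c)
        else:
            a, b, c = (b, x, c) if h(x) < h(b) else (a, b, x)
    return b

h = lambda x: (2.0 + x) * math.cos(2.0 + x)
print(quadratic_interpolation(h, 6.0))   # converges to a minimum of h, near x = 7.5
```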
Exact line search: Golden Section
Michel Bierlaire

Practice quiz

In the video, the recycling mechanism of the golden section method is illustrated in the case where α_2^k becomes the new upper bound, that is, when the iteration is [ℓ_{k+1}, u_{k+1}] = [ℓ_k, α_2^k]. We now consider an iteration of the golden section algorithm for which α_1^k becomes the new lower bound, that is,

[ℓ_{k+1}, u_{k+1}] = [α_1^k, u_k].

In this case, we select α_1^{k+1} = α_2^k so that there is no need to recalculate the value of the function at α_1^{k+1}.

1. Write α_2^{k+1} as a function of ρ, α_1^k and u_k.

2. Derive the value of ρ.


Exact line search: Golden Section
Michel Bierlaire

Solution of the practice quiz

1. Denote λ the length of the interval:

λ = u_k − ℓ_k.    (1)

By symmetry of the reduction of the intervals, we have

α_1^k − ℓ_k = u_k − α_2^k = ρ(u_k − ℓ_k) = ρλ,    (2)

and for the next iteration

α_1^{k+1} − ℓ_{k+1} = u_{k+1} − α_2^{k+1} = ρ(u_{k+1} − ℓ_{k+1}).    (3)

We now exploit the fact that [ℓ_{k+1}, u_{k+1}] = [α_1^k, u_k] and α_1^{k+1} = α_2^k to obtain

α_2^k − α_1^k = u_k − α_2^{k+1} = ρ(u_k − α_1^k).    (4)

Thus we get

α_2^{k+1} = u_k − ρ(u_k − α_1^k).    (5)

We now need to derive ρ.

2. We first derive

α_2^k − α_1^k = α_2^k − α_1^k + ℓ_k − ℓ_k + u_k − u_k
             = −(α_1^k − ℓ_k) − (u_k − α_2^k) + u_k − ℓ_k
             = −ρλ − ρλ + λ
             = λ(1 − 2ρ).    (6)

Then, we derive

u_k − α_1^k = ℓ_k − ℓ_k + u_k − α_1^k
           = −(α_1^k − ℓ_k) + (u_k − ℓ_k)
           = λ(1 − ρ).    (7)

Inserting (6) and (7) into (4), we obtain

λ(1 − 2ρ) = ρλ(1 − ρ),

or, equivalently,

ρ^2 − 3ρ + 1 = 0.    (8)

Equation (8) has two solutions:

(3 + √5)/2 and (3 − √5)/2.

As the shrinking factor ρ has to be less than 1/2, we select

ρ = (3 − √5)/2.    (9)

Finally, using (9) in equation (5), we obtain

α_2^{k+1} = u_k − ((3 − √5)/2)(u_k − α_1^k).
Golden section
Michel Bierlaire

Practice quiz

Consider a function h : R → R that is strictly unimodal in [ℓ, u]. It means that it has a unique global minimum α^* in [ℓ, u] and the following conditions are verified:

• h(α_1) > h(α_2) > h(α^*) for each α_1, α_2 such that α_1 < α_2 < α^*,

• h(α_2) > h(α_1) > h(α^*) for each α_1, α_2 such that α_2 > α_1 > α^*.

1. Implement the exact line search algorithm with golden section (a sketch follows this list).

2. Consider the function

h(x) = (2 + x) cos(2 + x),

on the interval [5, 10]. Verify visually that it is strictly unimodal, and apply the golden section algorithm with a precision ε = 10^{−3}. It means that the algorithm stops when

u − ℓ ≤ ε.

3. Same question for the interval [0, 8].
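A minimal sketch, using the shrinking factor ρ = (3 − √5)/2 derived in the previous quiz and recycling one function value per iteration:

```python
import math

def golden_section(h, low, up, eps=1e-3):
    rho = (3.0 - math.sqrt(5.0)) / 2.0
    alpha1 = low + rho * (up - low)
    alpha2 = up - rho * (up - low)
    h1, h2 = h(alpha1), h(alpha2)
    while up - low > eps:
        if h1 <= h2:                      # the minimum is in [low, alpha2]
            up, alpha2, h2 = alpha2, alpha1, h1
            alpha1 = low + rho * (up - low)
            h1 = h(alpha1)
        else:                             # the minimum is in [alpha1, up]
            low, alpha1, h1 = alpha1, alpha2, h2
            alpha2 = up - rho * (up - low)
            h2 = h(alpha2)
    return (low + up) / 2.0

h = lambda x: (2.0 + x) * math.cos(2.0 + x)
print(golden_section(h, 5.0, 10.0))   # approx 7.52 on [5, 10]
print(golden_section(h, 0.0, 8.0))    # [0, 8]: h is not unimodal there, so the
                                      # result is only one of the local minima
```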


First Wolfe condition
Michel Bierlaire

Practice quiz

Consider the unconstrained optimization problem

min_{x∈R^2} f(x) = 4x_1^2 − 4x_1 + x_2^2 + 2x_2,

and the point x_0 = (0, 0)^T.

1. Calculate Newton’s direction at x0 .

2. Verify that it is a descent direction.

3. Consider the first Wolfe condition with β1 = 0.1. What are the values
of the step α that verify the condition?
First Wolfe condition
Michel Bierlaire

Solution of the practice quiz

1. Newton's direction is the solution of the linear system

∇^2 f(x_k) d_k = −∇f(x_k).

The derivatives of the function are

∇f(x) = (8x_1 − 4, 2x_2 + 2)^T, ∇^2 f(x) = [ 8 0 ; 0 2 ].

Therefore, Newton's equations at x_0 = (0, 0)^T are

[ 8 0 ; 0 2 ] (d_1, d_2)^T = −(−4, 2)^T,

that is,

8 d_1 = 4,
2 d_2 = −2,

and Newton's direction is

d_N = (1/2, −1)^T.

2. The directional derivative of the function along d_N at x_0 is

∇f(x_0)^T d_N = −4.

It is negative, so d_N is a descent direction.

3. Consider the point

x_α = x_0 + α d_N = (α/2, −α)^T,

where α ≥ 0 is the step performed along d_N. The point x_α verifies the first Wolfe condition if

f(x_α) ≤ f(x_0) + α β_1 ∇f(x_0)^T d_N,
2α^2 − 4α ≤ 0 − 4αβ_1,

that is, if

α ≤ 2(1 − β_1).

If β_1 = 0.1, all steps α such that

α ≤ 1.8

verify the first Wolfe condition. It is illustrated by the figure below, where f(x_α) is represented in blue and the line f(x_0) + αβ_1 ∇f(x_0)^T d_N is represented in red. The slope of this line is β_1 ∇f(x_0)^T d_N = −0.4. The value α = 1.8 is represented by the dotted vertical line.

[Plot of f(x_α) and the first Wolfe condition line, for α ∈ [−0.5, 2.5].]
Second Wolfe condition
Michel Bierlaire

Practice quiz

Consider the unconstrained optimization problem

min_{x∈R^2} f(x) = 4x_1^2 − 4x_1 + x_2^2 + 2x_2,

and the point x_0 = (0, 0)^T.

1. Calculate Newton’s direction at x0 .

2. Verify that it is a descent direction.

3. Consider the second Wolfe condition with β2 = 0.7. What are the
values of the step α that verify the condition?
Second Wolfe condition
Michel Bierlaire

Solution of the practice quiz

1. Newton’s direction is a solution of the linear system

∇2 f (xk )dk = −∇f (xk ).

The derivatives of the function are


   
8x1 − 4 2 8 0
∇f (x) = , ∇ f (x) = . (1)
2x2 + 2 0 2

Therefore, Newton’s equations at x0 = (0, 0)T are


    
8 0 d1 −4
=− ,
0 2 d2 2

that is

8d1 = 4
2d2 = −2,

and Newton’s direction is


 
1/2
dN = .
−1

2. The directional derivative of the function along dN at x0 is

∇f (x0 )T dN = −4.

It is negative, so dN is a descent direction.


3. Consider the point
 
α/2
xα = x0 + αdN = ,
−α

where α ≥ 0 is the step performed along dN . Using (1), we have


 
4α − 4
∇f (xα ) = .
−2α + 2

Therefore, the point xα verifies the second Wolfe condition if

∇f (xα )T dN ≥ β2 ∇f (x0 )T dN ,
4(α − 1) ≥ −β2 4,

that is, if
α ≥ 1 − β2 .
If β2 = 0.7, all steps α such that

α ≥ 0.3.

verify the second Wolfe condition. This is illustrated by the figure below,
where f(xα) is represented in blue and the ratio

∇f(xα)T dN / ∇f(x0)T dN

is represented in red. Note that, for this representation, we consider
the second Wolfe condition written as

∇f(xα)T dN / ∇f(x0)T dN ≤ β2,

where the left-hand side is plotted in red on the figure. The value
α = 0.3 is represented by the dotted vertical line.
[Figure: f(xα) in blue and the ratio ∇f(xα)T dN / ∇f(x0)T dN in red with threshold β2 = 0.7; the dotted vertical line marks α = 0.3.]
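
As before, the threshold can be checked numerically; a minimal sketch assuming NumPy (the sampled values of α are our choice):

import numpy as np

grad = lambda x: np.array([8 * x[0] - 4, 2 * x[1] + 2])

x0 = np.array([0.0, 0.0])
dN = np.array([0.5, -1.0])
beta2 = 0.7
slope = grad(x0) @ dN  # equal to -4

for alpha in [0.1, 0.5, 1.0]:
    accepted = grad(x0 + alpha * dN) @ dN >= beta2 * slope
    print(alpha, accepted)  # expected: False, True, True
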
Validity of the Wolfe conditions
Michel Bierlaire

Practice quiz

We are performing a line search along a direction d at a point x. The
function evaluated at x + αd is defined as

g(α) = f(x + αd) = −1.833α³ + 8.5α² − 10α + 5.

Consider the step α = 3. Consider the first Wolfe condition with β1 = 0.15
and the second Wolfe condition with β2 = 0.8.

1. Is the step α = 3 too short, or too long?

2. Propose a value of α that verifies both Wolfe conditions.


Validity of the Wolfe conditions
Michel Bierlaire

Solution of the practice quiz

1. As

g(α) = f(x + αd) = −1.833α³ + 8.5α² − 10α + 5,

the directional derivative is

g′(α) = −3 · 1.833α² + 2 · 8.5α − 10.

Therefore, the second Wolfe condition for α = 3 is

g′(3)/g′(0) ≤? β2,
−8.5/−10 = 0.85 ≤? 0.8.

It is violated. Therefore, it is tempting to conclude that the step is too
short. The first Wolfe condition for α = 3 is

g(3) ≤? g(0) + 3β1 g′(0),
2.009 ≤? 5 − 4.5 = 0.5.

It is also violated. Therefore, it is tempting to conclude that the step
is too long.
However, a step cannot be both too short and too long. The two
conditions must be checked in the right order. The first Wolfe condition
must be checked first. If it is violated, we conclude that the step is too
long, and we do not need to consider the second condition. The second
Wolfe condition is checked only when the step is not deemed too long
by the first condition.
In the case described above, the step is too long, as it violates the first
condition, and the algorithm should make it shorter.
In the next figure, the value of g(α) is plotted in blue, the line

g(0) + αβ1 g′(0)

characterizing the first Wolfe condition is plotted in red, and the ratio

g′(α)/g′(0)

characterizing the second Wolfe condition is plotted in green, as well
as the threshold value β2 = 0.8. The dotted vertical line corresponds
to α = 3.

[Figure: g(α) in blue, the line g(0) + αβ1 g′(0) with β1 = 0.15 in red, the ratio g′(α)/g′(0) in green with threshold β2 = 0.8; the dotted vertical line marks α = 3.]

2. All values of α such that

0.12251 ≤ α ≤ 1.45912

verify both conditions. The two thresholds are identified by the two
dashed blue vertical lines on the picture above.
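
The two thresholds can be recovered numerically, for instance by scanning candidate steps on a fine grid. A minimal sketch, assuming NumPy (the grid resolution and the names are our choice):

import numpy as np

g = lambda a: -1.833 * a**3 + 8.5 * a**2 - 10 * a + 5
gp = lambda a: -3 * 1.833 * a**2 + 2 * 8.5 * a - 10  # g'(alpha)
beta1, beta2 = 0.15, 0.8

def wolfe_ok(a):
    w1 = g(a) <= g(0) + a * beta1 * gp(0)  # first condition: sufficient decrease
    w2 = gp(a) / gp(0) <= beta2            # second condition, in ratio form
    return w1 and w2

alphas = np.linspace(0.0, 3.0, 30001)  # grid with step 1e-4
ok = [a for a in alphas if wolfe_ok(a)]
print(min(ok), max(ok))  # approximately 0.1226 and 1.4591
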
Steepest descent algorithm
Michel Bierlaire

Practice quiz

1. Implement the steepest descent algorithm

xk+1 = xk − α∇f(xk),

where α is obtained from the inexact line search algorithm based on
the two Wolfe conditions (a sketch is given after item 3).

2. Apply the algorithm on the Rosenbrock function

f(x1, x2) = 100(x2 − x1²)² + (1 − x1)²,

starting from

x0 = (−1.5, 1.5)T,

to try to obtain x∗ such that ‖∇f(x∗)‖ ≤ ε = 10⁻⁷. Limit the number
of iterations to 10000.

3. Plot the iterations on the contours of the function.
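
A possible sketch in Python of items 1 and 2, assuming NumPy; the bracketing strategy in wolfe_step is one standard way to enforce the two Wolfe conditions, and all names and parameter values (β1 = 10⁻⁴, β2 = 0.9) are our choices, not prescribed by the quiz. The plot of item 3 is omitted.

import numpy as np

def wolfe_step(f, grad, x, d, beta1=1e-4, beta2=0.9):
    """Search a step satisfying both Wolfe conditions by
    doubling/bisection on a bracket [lo, hi] of candidate steps."""
    lo, hi, alpha = 0.0, np.inf, 1.0
    f0, slope0 = f(x), grad(x) @ d
    for _ in range(100):
        if f(x + alpha * d) > f0 + alpha * beta1 * slope0:
            hi = alpha  # first condition violated: step too long
        elif grad(x + alpha * d) @ d < beta2 * slope0:
            lo = alpha  # second condition violated: step too short
        else:
            return alpha
        alpha = 2 * lo if np.isinf(hi) else (lo + hi) / 2
    return alpha  # fallback if no acceptable step was found

def steepest_descent(f, grad, x0, eps=1e-7, maxiter=10000):
    x = x0.copy()
    iterates = [x.copy()]
    for _ in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        x = x - wolfe_step(f, grad, x, -g) * g
        iterates.append(x.copy())
    return x, iterates

# Rosenbrock function and its gradient
f = lambda x: 100 * (x[1] - x[0]**2)**2 + (1 - x[0])**2
grad = lambda x: np.array([
    -400 * x[0] * (x[1] - x[0]**2) - 2 * (1 - x[0]),
    200 * (x[1] - x[0]**2),
])

xlast, iterates = steepest_descent(f, grad, np.array([-1.5, 1.5]))
print(len(iterates) - 1, xlast)  # expect slow progress: steepest descent
                                 # struggles in the Rosenbrock valley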


Newton with line search
Michel Bierlaire

Practice quiz
1. Implement a function that takes as input a square symmetric matrix
H and returns a scalar τ ≥ 0 and a lower triangular matrix L such that
H + τI = LLT.

2. Implement a function that solves the system

LLT d = −∇f(xk),

by solving two triangular systems.

3. Implement Newton's method with line search

xk+1 = xk − α(Lk LkT)⁻¹ ∇f(xk),

where α is obtained from the inexact line search algorithm based on
the two Wolfe conditions, and Lk is a lower triangular matrix such that

∇²f(xk) + τI = Lk LkT,

where τ ≥ 0.

4. Apply the algorithm on the Rosenbrock function

f(x1, x2) = 100(x2 − x1²)² + (1 − x1)²,

starting from

x0 = (−1.5, 1.5)T,

to try to obtain x∗ such that ‖∇f(x∗)‖ ≤ ε = 10⁻⁷. Limit the number
of iterations to 10000.

5. Plot the iterations on the contours of the function.

6. Report the value of τ and α for each iteration (a sketch covering items 1–3 and 6 is given after this quiz).
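
A possible sketch of items 1–3 and 6, assuming NumPy and SciPy, and reusing f, grad and wolfe_step from the steepest descent sketch above; the shifting strategy in modified_cholesky and the initial shift value are our choices. The contour plot of item 5 is omitted.

import numpy as np
from scipy.linalg import solve_triangular

def modified_cholesky(H, tau0=1e-3):
    """Item 1: return tau >= 0 and lower triangular L with H + tau I = L L^T."""
    tau = 0.0
    while True:
        try:
            L = np.linalg.cholesky(H + tau * np.eye(H.shape[0]))
            return tau, L
        except np.linalg.LinAlgError:
            tau = max(2 * tau, tau0)  # not positive definite: increase the shift

def newton_direction(L, g):
    """Item 2: solve L L^T d = -g by two triangular solves."""
    y = solve_triangular(L, -g, lower=True)
    return solve_triangular(L.T, y, lower=False)

def newton_line_search(f, grad, hess, x0, eps=1e-7, maxiter=10000):
    """Item 3, with the tau/alpha report of item 6."""
    x = x0.copy()
    for k in range(maxiter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            break
        tau, L = modified_cholesky(hess(x))
        d = newton_direction(L, g)
        alpha = wolfe_step(f, grad, x, d)  # from the previous sketch
        print(f'iter {k}: tau = {tau:.2e}, alpha = {alpha:.4f}')
        x = x + alpha * d
    return x

# Hessian of the Rosenbrock function
hess = lambda x: np.array([
    [1200 * x[0]**2 - 400 * x[1] + 2, -400 * x[0]],
    [-400 * x[0], 200.0],
])

print(newton_line_search(f, grad, hess, np.array([-1.5, 1.5])))
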
Unconstrained nonlinear optimization
Michel Bierlaire

Graded quiz

1 Formulation
Question 1
James Bond must reach a yacht moored 100 meters from shore. Currently,
James Bond is 80 meters from the nearest point to the yacht from the beach.
He is capable of running on the beach at 20 km/h and swimming at 6 km/h.
James Bond wants to determine which distance x in meters he needs to run
on the beach before jumping in the water in order to minimize the time to
get to the yacht. Which of the following formulations corresponds to this
problem?

Solution 1:

min 20x + 6(80 − x)


s.t.
x ≤ 80,
x ≥ 0.

Solution 2:
min (18/100) x + 0.6 √(100² + (80 − x)²)
s.t.
x ≤ 80,
x ≥ 0.

Solution 3:

min 20x + 0.6 √(100² + x²)
s.t.
x ≥ 0.

Solution 4:

min (18/100) x + 0.6 √(100² + (80 − x)²).
Question 2
Consider the following optimization model.


max −3√x1 − 2x2²

s.t. 2√x1 + x2² ≥ 4,
−3√x1 + 2x2² ≥ 2,
x1, x2 ≥ 0.

Which linear model in standard form corresponds to it?

Solution 1:

min 3y1 + 2y2


s.t. 2y1 + y2 − y3 = 4,
−3y1 + 2y2 − y4 = 2,
y1 , y2 , y3 , y4 ≥ 0.

Solution 2:

min 3y1 + 2y2


s.t. 2y1 + y2 = 4,
−3y1 + 2y2 = 2,
y1 , y2 ≥ 0.

Solution 3:

max −3√x1 − 2x2²

s.t. 2√x1 + x2² = 4,
−3√x1 + 2x2² = 2,
x1, x2 ≥ 0.

Solution 4:

min 3√x1 + 2y

s.t. 2√x1 + y ≥ 4,
−3√x1 + 2y ≥ 2,
x1, y ≥ 0.
Question 3
An iteration of the local Newton method is exactly an iteration of the
preconditioned steepest descent method when...
Solution 1: ∇²f(xk) is positive semidefinite and the step αk = 1/2 satisfies
the Wolfe conditions.

Solution 2: ∇²f(xk) is positive definite and the step αk = 1/2 satisfies the
Wolfe conditions.

Solution 3: ∇²f(xk) is positive semidefinite and the step αk = 1 satisfies
the Wolfe conditions.

Solution 4: ∇²f(xk) is positive definite and the step αk = 1 satisfies the
Wolfe conditions.
2 Objective function
Question 4
Consider the following function:

f (x1 , x2 ) = ln(x1 ),
where x1 > 0.
This function is...

Solution 1: neither convex nor concave.

Solution 2: both convex and concave.

Solution 3: convex.

Solution 4: concave.
Question 5
Consider the following function:

f(x1, x2) = e^{x1} cos(x2).

What is the directional derivative along the direction

d = (0, 5)T

at the point

x = (0, π/2)T ?

Solution 1: -1

Solution 2: 0

Solution 3: 1

Solution 4: -5
Question 6
Consider a function f , twice differentiable from R2 to R, and consider a
stationary point (x1 , x2 ) ∈ R2 . If the determinant of the Hessian matrix of
f at this point is negative, then ...

Solution 1: the stationary point is a saddle point.

Solution 2: the stationary point is a local minimum.

Solution 3: the stationary point is a local maximum.

Solution 4: the provided information is not enough to decide about the


nature of the stationary point.
Question 7
Consider the following function:

f(x1, x2, x3) = x2² + x1 x3.

What is the curvature of f along the direction

d = (2, −1, 2)T

at the point

x = (1, 2, 2)T ?

Solution 1: 2/3

Solution 2: 0

Solution 3: 1

Solution 4: 10/9
Question 8
Preconditioning is ...

Solution 1: a method to define a descent direction.

Solution 2: applied when there is no difference in the curvature of the


function among all directions.

Solution 3: a method to define a step size.

Solution 4: a change-of-variables procedure for differentiable functions.


3 Optimality conditions
Question 9
Consider the function f(x1, x2) = x1² + x2² + x1x2 + 1 and the point xT = (0, 0).
Which one of the following statements is correct?

Solution 1: x is not a global minimum.

Solution 2: The first-order necessary condition is not satisfied in x.

Solution 3: The sufficient optimality conditions are satisfied in x.

Solution 4: The first-order necessary condition is satisfied in x but the


second-order necessary condition is not satisfied.
Question 10
Consider the function

f(x1, x2) = (1/3)x1³ + x2² + 2x1x2 − 6x1 − 3x2 + 4

and the point

x = (−1, 5/2)T.

Which one of the following statements is correct?

Solution 1: x is a local minimum.

Solution 2: x is a local maximum.

Solution 3: x is a saddle point.

Solution 4: x is not a critical point.


Question 11
Consider the function

f(x1, x2) = 48x1 + 96x2 − x1² − 2x1x2 − 9x2²

and the point

x = (21, 3)T.
Which one of the following statements is correct?

Solution 1: x is a saddle point.

Solution 2: x is a local minimum.

Solution 3: x is a local maximum.

Solution 4: x is not a critical point.


Question 12
Consider the following quadratic function

f(x) = (1/2) xT Qx + gT x + c,

where Q = [[1, 2], [2, 1]], g ∈ R² and c ∈ R.
Which one of the following statements is correct?

Solution 1: f does not have a local minimum.

Solution 2: f has a unique global minimum.

Solution 3: f has an infinite number of local minima.

Solution 4: f is a convex function.


4 Solving equations: Newton
Question 13
We aim to find the root of the equation

F(x) = x² − 3 = 0.

More precisely, we aim to find x∗ such that |F(x∗)| ≤ ε = 10⁻¹⁵.
When applying the first step of Newton’s algorithm starting from point
x0 = 2, we obtain:

Solution 1: x1 = 1.73.

Solution 2: x1 = 1.732.

Solution 3: x1 = 1.74.

Solution 4: x1 = 1.75.
5 Newton’s local method
Question 14
Let f be a twice differentiable function. Which statement about Newton’s
local method for the minimization of f is correct?

Solution 1: If the algorithm converges, it always converges to a stationary


point of the function f .

Solution 2: The point obtained during the kth iteration of the algorithm
maximizes the quadratic model of the function f at the point xk.

Solution 3: If we start the algorithm from two different starting points,


it will always converge to two different local minima.

Solution 4: If the algorithm converges, it always finds a point
that satisfies the second-order necessary optimality condition.
Question 15
Consider the function f : R → R defined as

f(x) = −x⁵ + 2x³ + 40x.

The quadratic model of the function f at x̂ = 2 is:

Solution 1: m2(x) = 40x² − 44x + 92.

Solution 2: m2(x) = −68x² + 256x − 176.

Solution 3: m2(x) = −720x² + 64x + 81.

Solution 4: m2(x) = 360x² + 32x + 944.


6 Descent methods
Question 16
The purpose of preconditioning the function in the steepest descent method
is to

Solution 1: increase the speed of convergence of the algorithm.

Solution 2: avoid local optima.

Solution 3: obtain the optimum solution in maximum 3 iterations.

Solution 4: linearize the optimization problem.


Question 17
The purpose of an exact line search algorithm, applied to a descent method,
is to:

Solution 1: determine the descent direction in the current iteration.

Solution 2: determine an acceptable step to follow a descent direction in


the current iteration.

Solution 3: find the optimal value of the objective function in the next
iteration.

Solution 4: determine the step corresponding to a local minimum of the


function along a descent direction in the current iteration.
Question 18
The golden section method applied to a function h on an interval [ℓ, u] gen-
erates a sequence of intervals [ℓk , uk ] such that for each k we always have:

Solution 1: [ℓk+1 , uk+1 ] ⊂ [ℓk , uk ].

Solution 2: [ℓk , uk ] ⊂ [ℓk+1 , uk+1 ].

Solution 3: ℓk < ℓk+1 .

Solution 4: ℓk+1 ≤ ℓk .
Question 19
Suppose that f : Rn → R is a differentiable nonlinear function, xk ∈ Rn is a
point and dk ∈ Rn is a direction such that ∇f (xk )T dk < 0, and f is bounded
from below in the direction dk . The purpose of the first Wolfe condition:

f (xk + αdk ) ≤ f (xk ) + αβ1 ∇f (xk )T dk

is to:

Solution 1: check that dk is a descent direction.

Solution 2: ensure that the objective function decreases along the direc-
tion dk.

Solution 3: ensure that the descent algorithm will progress rapidly.

Solution 4: ensure a sufficient decrease of the objective function.


Question 20
Suppose that f : Rn → R is a differentiable nonlinear function, xk ∈ Rn is
a point and dk ∈ Rn is a descent direction such that ∇f (xk )T dk < 0, and
that f is bounded from below in the direction dk. If we write the two Wolfe
conditions as:

f(xk + αdk) ≤ f(xk) + αβ1 ∇f(xk)T dk,

and

∇f(xk + αdk)T dk / ∇f(xk)T dk ≤ β2,

which condition should be satisfied by the parameters β1 and β2 so that
we can be sure that there exists a step size α that satisfies both Wolfe
conditions?

Solution 1: This condition does not exist.

Solution 2: 0 < β2 < β1 < 1.

Solution 3: 0 < β1 = β2 < 1.

Solution 4: 0 < β1 < β2 < 1.
