MATH412 QUIZ 3 Solution
28/11/2022
Dear students, the usual rules apply: you cannot leave the class during the test, move from your desk, or talk to your colleagues. Mobile phones must be switched off; a scientific calculator is fine. You may ask for additional sheets by raising your hand while remaining at your desk. Please write your name and ID number on the first sheet and hand back all your sheets numbered k/N, with k = 1, 2, ..., N.
1) The following set of experimental results is given, leading to a minimum least squares
estimation problem.
We wish to identify the best hyperplane 𝑋𝛽 approximating 𝑦 which minimizes the sum of
the squared errors ∑𝑡 𝑒𝑡², 𝑡 = 1, 2, ..., 6, where 𝛽 ∈ 𝑅³, 𝑋 ∈ 𝑅^(6,3), 𝑦 ∈ 𝑅⁶.
a. Formulate the least squares minimization problem associated with this set of data,
leading to the least squares estimate 𝛽̂. In particular: specify the objective function,
and define the errors and the associated data sets.
Write down the first order optimality conditions leading to the definition of 𝛽̂.
[Hint: you may solve the minimization problem, but notice that you are not expected to
do it. Indeed, I can anticipate that the solution is 𝜷̂ = (0.6032, 0.2677, 1.2841)ᵀ, resulting in a
minimum ∑𝑡 𝑒𝑡² = 7.6721.]
b. Clarify the mathematical motivation behind the LSE method and the general
assumptions under which this quadratic programming problem arises.
[Hint: I expect here a link between the definition of the errors, the solution of the
unconstrained quadratic program, and the rank of the matrix X.]
c. Given the LSE solution, what can you say regarding the linear dependence of 𝑦 on
𝑋₁, 𝑋₂?
Which type of probability distribution is associated, for 𝑡 = 1, 2, ..., 6, with the errors 𝑒𝑡?
Focus only on the mean and the variance. What can we say instead for 𝑇 ≤ 3?
[Hint: notice that here you are expected to cover 3 possibilities, namely 6 data
points as above, but also T = 3 and T < 3. For the first case you have been working
above: think of the difference between the observed data and the fitting
hyperplane. For T = 3 and T < 3, will there be an error to specify? Then the answer follows.]
Solution
From the objective min𝛽 (𝑦 − 𝑋𝛽)ᵀ(𝑦 − 𝑋𝛽) we can derive the first order optimality condition (gradient = 0) in matrix form as 𝑋ᵀ(𝑦 − 𝑋𝛽̂) = 0, i.e. the normal equations 𝑋ᵀ𝑋𝛽̂ = 𝑋ᵀ𝑦, so that 𝛽̂ = (𝑋ᵀ𝑋)⁻¹𝑋ᵀ𝑦 when 𝑋 has full column rank.
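As a numerical sketch of the normal equations above: the quiz's actual data table appears only on the exam sheet, so the matrix X and vector y below are placeholder data of the right shape (6 observations, 3 regressors), not the quiz's numbers.

```python
import numpy as np

# Placeholder data standing in for the quiz's experimental table.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # stand-in design matrix, R^(6x3)
y = rng.normal(size=6)        # stand-in response vector, R^6

# Least squares estimate from the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's built-in least squares solver
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_ref))   # True
```

With the quiz's real data, the same two lines would reproduce the quoted 𝛽̂ and the residual sum of squares.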
2) Consider the following minimization problem: min𝑥 𝑓(𝐱) = 𝑥₁² + 3𝑥₂² + 𝑥₃² − 3𝑥₁, subject to
two equality constraints: ℎ₁(𝐱): 2𝑥₁ + 𝑥₃ = 2 and ℎ₂(𝐱): 𝑥₂ − 𝑥₁ = 3, with 𝐱 ∈ 𝑅³, 𝐱 =
(𝑥₁, 𝑥₂, 𝑥₃)ᵀ. [Hint: this is a quadratic program with linear constraints. It is not,
however, in the form min𝑥 ½ 𝑥ᵀ𝑄𝑥 s.t. 𝐴𝑥 = 𝑏, due to the linear term −3𝑥₁ in the objective. It
may be extended to the classical case we have seen in class, but I suggest you just develop the
Lagrange method.]
a. Define the Lagrange function for this problem and derive the first order necessary
optimality conditions. Why, in general, are these first order conditions necessary
but not sufficient for a minimum (or a maximum), while they are sufficient in this
case (for this class of problems)? [Hint: for the sufficiency, elaborate on the fact
that both the objective and the constraints are convex functions.]
b. Specify the Jacobian matrix for this problem and show that any point 𝐱 =
(𝑥1 , 𝑥2 , 𝑥3 )𝑇 ∈ 𝑅 3 is a regular point on the constraint surface. Why is this condition
necessary to establish a candidate 𝐱 ∗ = argmin{𝑓(x) 𝑠. 𝑡. ℎ(𝑥) = 0} ?
[Hint: this question is a direct consequence of the Lagrange theorem in R^n and
focuses on its assumptions]
c. The function has a unique minimum given by 𝐱* = (−0.4375, 2.5625, 2.875)ᵀ. Evaluate the
objective function at this point and derive the optimal Lagrange multipliers. Verify
the optimality of 𝐱*. [Hint: verifying the optimality amounts to determining the
Lagrange multipliers and checking that, with the given set of optimal vectors 𝐱* and
𝛌*, the FONC are verified.]
d. Consider now the same objective function min𝑥 𝑓(𝐱) = 𝑥₁² + 3𝑥₂² + 𝑥₃² − 3𝑥₁, but
without any constraint. Determine the optimal decision vector 𝐱* and the optimal
𝒇(𝐱*). Is this optimal value unique? [Hint: the solution is immediate once you
write down the first order conditions. Notice that the function is quadratic and,
without the linear term, the optimal x would be the origin.]
e. Compare the solution of the constrained problem with that of the unconstrained
problem: when analysing the optimal values of the objective function and the
optimal vectors 𝐱*, how would you interpret the effect of the Lagrange multipliers
on the optimal solution of the constrained case? [Hint: you have all the
information from the previous answers, so this should be easy to answer. Focus on
the definition of the dual multipliers as shadow prices and as sensitivity coefficients
for constraint violation.]
Solution
a) We have two constraints, so this is in the form min𝑥 𝑓(𝑥) s.t. ℎ₁(𝑥) = 0, ℎ₂(𝑥) = 0. The
Lagrange function is 𝐿(𝒙, 𝝀) = 𝑥₁² + 3𝑥₂² + 𝑥₃² − 3𝑥₁ + 𝜆₁(2 − 2𝑥₁ − 𝑥₃) + 𝜆₂(3 − 𝑥₂ + 𝑥₁).
We derive the FONC:
∂𝐿(𝑥, 𝜆)/∂𝑥₁ = 2𝑥₁ − 3 − 2𝜆₁ + 𝜆₂ = 0
∂𝐿(𝑥, 𝜆)/∂𝑥₂ = 6𝑥₂ − 𝜆₂ = 0
∂𝐿(𝑥, 𝜆)/∂𝑥₃ = 2𝑥₃ − 𝜆₁ = 0
∂𝐿(𝑥, 𝜆)/∂𝜆₁ = 2 − 2𝑥₁ − 𝑥₃ = 0
∂𝐿(𝑥, 𝜆)/∂𝜆₂ = 3 + 𝑥₁ − 𝑥₂ = 0
These conditions are in general not sufficient to define an extremum, because they would
also be satisfied at a saddle point: only second order information can then clarify
whether the associated Hessian is indefinite, or rather positive or negative semidefinite.
In this case, however, the problem being quadratic with linear constraints, it is a convex
program with both the objective function and the constraints convex; in such a case the
conditions are also sufficient.
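The second order check behind the sufficiency claim can be made explicit: the Hessian of 𝑓 is constant and diagonal, so strict convexity is a one-line eigenvalue computation.

```python
import numpy as np

# Hessian of f(x) = x1^2 + 3*x2^2 + x3^2 - 3*x1: constant and diagonal.
H = np.diag([2.0, 6.0, 2.0])

# All eigenvalues positive -> H positive definite -> f strictly convex,
# so the FONC are also sufficient for this problem class.
eigvals = np.linalg.eigvalsh(H)
print(eigvals)   # all positive
```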
b) Consider the two constraints: the Jacobian matrix is the (2,3) matrix with the gradients of
ℎ₁(𝑥), ℎ₂(𝑥) (written as in the Lagrangian above) as rows; we have:
𝑫𝒉(𝒙) = [∂ℎᵢ/∂𝑥ⱼ] = [ −2   0  −1 ]
                     [  1  −1   0 ]
We see that the two gradients (rows of the Jacobian) are linearly independent and, the
constraints being linear, do not depend on x. Then any 𝑥 ∈ 𝑅³ is regular: accordingly, the
gradient of the objective function can be expressed as a linear combination of the two
constraints' gradients, which is the Lagrange theorem. Indeed:
𝑫𝒇(𝒙*) + 𝝀*ᵀ𝑫𝒉(𝒙*) = 𝟎ᵀ, thus 𝛁𝒇(𝒙*) = −𝑫𝒉(𝒙*)ᵀ𝝀*, which requires the rows of the
Jacobian to be linearly independent.
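The regularity argument can be checked in one line: since the Jacobian is constant, a single rank computation covers every 𝑥 ∈ 𝑅³.

```python
import numpy as np

# Jacobian of the constraints as written in the Lagrangian
# (rows: gradients of 2 - 2*x1 - x3 and 3 - x2 + x1); constant in x.
Dh = np.array([[-2.0,  0.0, -1.0],
               [ 1.0, -1.0,  0.0]])

# Full row rank (= 2) means the rows are linearly independent,
# so every point of R^3 is regular for these constraints.
print(np.linalg.matrix_rank(Dh))   # 2
```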
c) The problem has a unique solution that can be derived directly from the FONC, once you see
that this is a linear system in 5 unknowns of the form 𝐴𝑧 = 𝑏, with 𝑧 = (𝑥₁, 𝑥₂, 𝑥₃, 𝜆₁, 𝜆₂)ᵀ.
Then 𝑧* = 𝐴⁻¹𝑏 = (−0.4375, 2.5625, 2.875, 5.75, 15.375)ᵀ, with the two optimal multipliers
𝜆₁* = 5.75, 𝜆₂* = 15.375 already written down, and the
optimal objective given by 𝑓(𝑥*) = 29.47.
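Since the five FONC are linear in (𝑥₁, 𝑥₂, 𝑥₃, 𝜆₁, 𝜆₂), stacking them as 𝐴𝑧 = 𝑏 and solving reproduces the optimum quoted above:

```python
import numpy as np

# The five FONC stacked as a linear system A z = b,
# with z = (x1, x2, x3, lam1, lam2).
A = np.array([[ 2.0, 0.0, 0.0, -2.0,  1.0],   # dL/dx1 = 0
              [ 0.0, 6.0, 0.0,  0.0, -1.0],   # dL/dx2 = 0
              [ 0.0, 0.0, 2.0, -1.0,  0.0],   # dL/dx3 = 0
              [ 2.0, 0.0, 1.0,  0.0,  0.0],   # h1: 2*x1 + x3 = 2
              [-1.0, 1.0, 0.0,  0.0,  0.0]])  # h2: x2 - x1 = 3
b = np.array([3.0, 0.0, 0.0, 2.0, 3.0])

z = np.linalg.solve(A, b)
x_opt, lam_opt = z[:3], z[3:]
f_opt = x_opt[0]**2 + 3*x_opt[1]**2 + x_opt[2]**2 - 3*x_opt[0]

print(np.allclose(z, [-0.4375, 2.5625, 2.875, 5.75, 15.375]))  # True
print(round(f_opt, 5))                                         # 29.46875
```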
Since I gave you the optimal coordinates 𝐱* = (𝒙₁*, 𝒙₂*, 𝒙₃*), the optimal multipliers can be
recovered directly from the first order conditions written down in question 2.a.
To verify the optimality, it is then sufficient to check whether the gradient of f can indeed
be expressed as a linear combination of the gradients of the constraints, with
coefficients defined by 𝝀₁*, 𝝀₂*.
Indeed:
𝛁𝑓((−0.4375, 2.5625, 2.875)ᵀ) = (−3.875, 15.375, 5.75)ᵀ = 5.75 · (2, 0, 1)ᵀ + 15.375 · (−1, 1, 0)ᵀ
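This verification is a direct computation:

```python
import numpy as np

# Gradient of f at the given optimum x*
x = np.array([-0.4375, 2.5625, 2.875])
grad_f = np.array([2*x[0] - 3, 6*x[1], 2*x[2]])

# Multiplier-weighted combination of the constraint gradients
# (2, 0, 1) for h1 and (-1, 1, 0) for h2
combo = 5.75*np.array([2.0, 0.0, 1.0]) + 15.375*np.array([-1.0, 1.0, 0.0])

print(np.allclose(grad_f, combo))   # True -> FONC verified at (x*, lam*)
```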
d) Here we have only the objective, and in this case the FONC simplify to:
∂𝑓(𝑥)/∂𝑥₁ = 2𝑥₁ − 3 = 0
∂𝑓(𝑥)/∂𝑥₂ = 6𝑥₂ = 0
∂𝑓(𝑥)/∂𝑥₃ = 2𝑥₃ = 0
with stationary point 𝐱* = (1.5, 0, 0)ᵀ, which is unique, the function being convex, and
leading to an optimal value 𝒇(𝐱*) = −2.25.
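The unconstrained stationary point and value are immediate to confirm numerically:

```python
import numpy as np

def f(x):
    """Objective f(x) = x1^2 + 3*x2^2 + x3^2 - 3*x1."""
    return x[0]**2 + 3*x[1]**2 + x[2]**2 - 3*x[0]

# Stationary point from the simplified FONC: 2*x1 = 3, x2 = x3 = 0
x_star = np.array([1.5, 0.0, 0.0])
grad = np.array([2*x_star[0] - 3, 6*x_star[1], 2*x_star[2]])

print(grad)        # [0. 0. 0.]
print(f(x_star))   # -2.25
```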
e) In the unconstrained case there is a global minimum identified over the entire 𝑅³. This
feasible region is heavily restricted once we introduce the two linear equality constraints,
each of which specifies a hyperplane: the first does not involve 𝑥₂, so its gradient lies in the
(x1, x3) plane, and the second does not involve 𝑥₃, so its gradient lies in the (x1, x2) plane.
The Lagrange multipliers tell us what the impact on the optimal value of a violation of the
corresponding constraint would be. We see that in this case the second constraint has the
more relevant impact on the optimal value.
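The shadow-price reading of 𝜆₂* can be checked numerically: since the FONC form a linear system, perturbing the right-hand side of the second constraint by a small 𝜀 and re-solving shows the optimal value changing by approximately 𝜆₂* · 𝜀.

```python
import numpy as np

# FONC coefficient matrix, unknowns z = (x1, x2, x3, lam1, lam2)
A = np.array([[ 2.0, 0.0, 0.0, -2.0,  1.0],
              [ 0.0, 6.0, 0.0,  0.0, -1.0],
              [ 0.0, 0.0, 2.0, -1.0,  0.0],
              [ 2.0, 0.0, 1.0,  0.0,  0.0],
              [-1.0, 1.0, 0.0,  0.0,  0.0]])

def optimal_value(b2):
    """Optimal objective of the QP with the second constraint x2 - x1 = b2."""
    z = np.linalg.solve(A, np.array([3.0, 0.0, 0.0, 2.0, b2]))
    x = z[:3]
    return x[0]**2 + 3*x[1]**2 + x[2]**2 - 3*x[0], z[4]

eps = 1e-4
f0, lam2 = optimal_value(3.0)        # original problem, lam2 = 15.375
f1, _ = optimal_value(3.0 + eps)     # slightly perturbed constraint

# Finite-difference sensitivity of the optimal value vs. the multiplier
print((f1 - f0) / eps)   # close to 15.375
```

This is the envelope-theorem reading: the multiplier of a constraint is the first-order sensitivity of the optimal value to a violation (relaxation) of that constraint.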