
Lecture Analysis and Linear Algebra

Multivariate Calculus and Optimization

Rüdiger Frey

ruediger.frey@wu.ac.at
http://statmath.wu.ac.at/~frey

Winter 2018


Functions of several variables

From now on we consider functions f : G ⊂ Rⁿ → R,

(x₁, . . . , xₙ) ↦ f(x₁, . . . , xₙ).

Visualisation of a function y = f(x₁, x₂):

Plot the graph Gf = {(x₁, x₂, f(x₁, x₂)) : (x₁, x₂) ∈ G} (perspective plot).
Contour plot: plot the level sets or contours Hc = {(x₁, x₂) : f(x₁, x₂) = c}.
Plot sections, that is, graphs of the function x₁ ↦ f(x₁, x₂) for fixed values of x₂ and of x₂ ↦ f(x₁, x₂) for fixed x₁.


Example: plots of densities on R²


[Figure: contour plots and perspective plots of two densities on R², with x and y ranging over [−4, 4] and the vertical axis showing the density. Left: a bivariate normal density; right: a bivariate t density.]
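
A figure of this kind can be reproduced with a few lines of Python. The sketch below is my own illustration (not part of the slides) and assumes numpy, scipy and matplotlib are available; it draws a contour plot and a perspective plot of a standard bivariate normal density.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# grid on [-4, 4] x [-4, 4]
x = np.linspace(-4, 4, 200)
y = np.linspace(-4, 4, 200)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))

# density of a standard bivariate normal distribution
Z = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]]).pdf(pos)

fig = plt.figure(figsize=(10, 4))

# contour plot: level sets H_c = {(x1, x2) : f(x1, x2) = c}
ax1 = fig.add_subplot(1, 2, 1)
ax1.contour(X, Y, Z)
ax1.set_xlabel("x"); ax1.set_ylabel("y")

# perspective plot of the graph of f
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_surface(X, Y, Z, cmap="viridis")
ax2.set_xlabel("x"); ax2.set_ylabel("y"); ax2.set_zlabel("density")

plt.show()
```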


Further mathematical properties

A function f : G ⊂ Rⁿ → R is continuous at x₀ ∈ G if for every sequence (xₙ) ⊂ G with xₙ → x₀ it holds that limₙ→∞ f(xₙ) = f(x₀).
Be careful: it is not enough to check continuity of the sections x₁ ↦ f(x₁, x₂) and x₂ ↦ f(x₁, x₂).
Suppose that G is convex. Then f is convex on G if for all x, y ∈ G and all α ∈ [0, 1] it holds that

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).


A useful counterexample

Consider the function

f(x₁, x₂) = x₁x₂ / (x₁² + x₂²)   for x ∈ R² \ {0},     f(x₁, x₂) = 0   for x = 0.

Then f is constant on every ray through the origin: for arbitrary x = (x₁, x₂) and arbitrary t ≠ 0 one has f(tx₁, tx₂) = f(x₁, x₂). For instance f(t, 0) = 0 and f(t, t) = t²/(2t²) = 1/2. In particular, the sections t ↦ f(t, 0) and t ↦ f(0, t) are continuous, but f is not continuous at 0.
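
A quick numerical check (a Python sketch of my own, not part of the slides) illustrates the point: along each coordinate axis f is identically 0, while along the diagonal it equals 1/2, so the value near the origin depends on the direction of approach.

```python
def f(x1, x2):
    # f(x1, x2) = x1*x2 / (x1^2 + x2^2) for (x1, x2) != (0, 0), and f(0, 0) = 0
    if x1 == 0 and x2 == 0:
        return 0.0
    return x1 * x2 / (x1**2 + x2**2)

for t in [1.0, 0.1, 0.01, 0.001]:
    # f is constant along each ray through the origin
    print(t, f(t, 0.0), f(0.0, t), f(t, t))
# f(t, 0) = f(0, t) = 0 for every t, while f(t, t) = 1/2 for every t != 0,
# so f has no limit at the origin and is not continuous there.
```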


Differentiability and partial derivatives


Consider some f : U ⊂ Rⁿ → R, (x₁, . . . , xₙ) ↦ f(x₁, . . . , xₙ), where U is an open subset of Rⁿ, and some x ∈ U.
Definition. (1) The i-th partial derivative of f at x ∈ U is given by

∂f/∂xᵢ (x) = lim_{h→0} ( f(x + h eᵢ) − f(x) ) / h

(if the limit exists).
(2) f is called continuously differentiable on U (f ∈ C¹(U)) if for all x ∈ U all partial derivatives exist and are continuous functions of x.
(3) More generally, a function f : U ⊂ Rⁿ → Rᵐ,

(x₁, . . . , xₙ) ↦ (f₁(x₁, . . . , xₙ), . . . , fₘ(x₁, . . . , xₙ))ᵗ,

is continuously differentiable on U if all components f₁, . . . , fₘ belong to C¹(U).
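
The defining limit can be approximated by a forward difference quotient. The sketch below (my own Python illustration, using the function from example (1) on the next slide at the point (1, 2)) is meant as an illustration, not as part of the lecture.

```python
def f(x1, x2):
    return x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

def partial(f, x, i, h=1e-6):
    """Forward difference approximation of the i-th partial derivative at x."""
    xh = list(x)
    xh[i] += h                     # x + h * e_i
    return (f(*xh) - f(*x)) / h

x = (1.0, 2.0)
print(partial(f, x, 0))   # approximately 3*1*2 + 2*1*4 + 1 = 15
print(partial(f, x, 1))   # approximately 1 + 2*2 + 4 = 9
```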

Examples

(1) Let f(x₁, x₂) = x₁³x₂ + x₁²x₂² + x₁ + x₂². Then

∂f(x)/∂x₁ = 3x₁²x₂ + 2x₁x₂² + 1,   ∂f(x)/∂x₂ = x₁³ + 2x₁²x₂ + 2x₂.

These are obviously continuous functions of x; hence f ∈ C¹.

(2) Consider a symmetric 2 × 2 matrix A and let f(x₁, x₂) = xᵗAx = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂². Then

∂f(x)/∂x₁ = 2a₁₁x₁ + 2a₁₂x₂ = (2Ax)₁   and   ∂f(x)/∂x₂ = 2a₁₂x₁ + 2a₂₂x₂ = (2Ax)₂.
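
The hand computation in example (1) can also be verified symbolically. A short sketch using sympy (my own addition, assuming the library is available):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

print(sp.diff(f, x1))   # should agree with 3*x1**2*x2 + 2*x1*x2**2 + 1
print(sp.diff(f, x2))   # should agree with x1**3 + 2*x1**2*x2 + 2*x2
```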


Gradient and Jacobi matrix

Suppose that f : U → R is in C¹(U). Then the column vector

∇f(x) = (∂f/∂x₁(x), . . . , ∂f/∂xₙ(x))ᵗ

is the gradient of f.
For a C¹ function g : U → Rᵐ the Jacobi matrix Jg(x) is the m × n matrix whose i-th row is the transposed gradient of the component gᵢ, that is

(Jg(x))ᵢⱼ = ∂gᵢ(x)/∂xⱼ,   1 ≤ i ≤ m, 1 ≤ j ≤ n.

Sometimes one also uses the gradient matrix ∇g(x) = Jg(x)ᵗ.
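
For a concrete function the Jacobi matrix can be computed symbolically. The sketch below (my own illustration with a hypothetical example g(x₁, x₂) = (x₁x₂, x₁ + x₂²), assuming sympy is available) uses Matrix.jacobian:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
g = sp.Matrix([x1 * x2, x1 + x2**2])     # g : R^2 -> R^2

J = g.jacobian([x1, x2])                 # entry (i, j) is d g_i / d x_j
print(J)   # rows should be (x2, x1) and (1, 2*x2)
```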


First order (Taylor) approximation

Theorem. Consider some C¹ function f : U → R. Then it holds for any x, y ∈ U that

f(y) − f(x) = ∇f(x)ᵗ(y − x) + R(x, y − x)   (1)

where lim_{‖z‖→0} R(x, z)/‖z‖ = 0.
Idea. The function f can be approximated locally around x by the affine mapping y ↦ f(x) + ∇f(x)ᵗ(y − x).


Chain rule

Theorem. Consider C¹ functions f : Rⁿ → Rᵐ and g : Rᵏ → Rⁿ and let h := f ∘ g. Then h is C¹ and for the Jacobi matrix it holds that Jh(x) = Jf(g(x)) Jg(x), i.e. the Jacobian of the composition is the product of the individual Jacobi matrices.
Example. (Derivative along a vector.) Consider a C¹ function f : Rⁿ → R. We want to consider f along the straight line φ(t) := x + tv, for t ∈ R and x, v ∈ Rⁿ. We have Jφ(t) = v and Jf(x) = (∇f(x))ᵗ, and hence

d/dt f(φ(t)) = (∇f(x + tv))ᵗ v,   in particular d/dt f(φ(t)) at t = 0 equals (∇f(x))ᵗ v.
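
The formula for the derivative along a vector is easy to check numerically. Below is a small sketch of my own (not from the slides), using the hypothetical example f(x₁, x₂) = x₁² + 3x₁x₂ at x = (1, 1) with direction v = (2, −1):

```python
import numpy as np

def f(x):
    return x[0]**2 + 3 * x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 1.0])
v = np.array([2.0, -1.0])

# chain rule: d/dt f(x + t v) at t = 0 equals grad f(x) . v
h = 1e-6
numeric = (f(x + h * v) - f(x)) / h
exact = grad_f(x) @ v
print(numeric, exact)   # both approximately 7
```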


Second derivatives

Definition. Consider a C¹ function f : U ⊂ Rⁿ → R. Then the first order partial derivatives ∂f(x)/∂xᵢ, 1 ≤ i ≤ n, are themselves functions from U to R.
1. If all partial derivatives are C¹ functions, f is called twice continuously differentiable on U (f ∈ C²(U)). Fix i, j ∈ {1, . . . , n}. Then one writes

∂²f/∂xᵢ∂xⱼ (x) := ∂/∂xⱼ ( ∂f/∂xᵢ (x) )

for the second partial derivative in the directions xᵢ and xⱼ.
2. For f ∈ C²(U) the matrix Hf with Hfᵢⱼ(x) = ∂²f/∂xᵢ∂xⱼ (x) is the Hessian matrix of f.


Theorem of Young and Schwarz

Theorem. Consider f ∈ C²(U). Then the mixed second order partial derivatives coincide, that is

∂²f/∂xᵢ∂xⱼ (x) = ∂²f/∂xⱼ∂xᵢ (x),   1 ≤ i, j ≤ n.

It follows that the Hessian is a symmetric matrix, that is Hfᵢⱼ(x) = Hfⱼᵢ(x), 1 ≤ i, j ≤ n. In particular, the definiteness of Hf can be checked using eigenvalues or (for strictly definite matrices) with leading principal minors.


Example

(1) Consider f(x₁, x₂) = x₁³x₂ + x₁²x₂² + x₁ + x₂². Then we have

∂²f/∂x₁² = 6x₁x₂ + 2x₂²,   ∂²f/∂x₂² = 2x₁² + 2,   ∂²f/∂x₁∂x₂ = 3x₁² + 4x₁x₂.

(2) Consider f(x) = xᵗAx for some symmetric matrix A. Then Hf(x) = 2A.
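
The second derivatives in example (1) can again be checked symbolically; a short sketch of my own using sympy's hessian helper:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

H = sp.hessian(f, (x1, x2))
print(H)
# diagonal entries should agree with 6*x1*x2 + 2*x2**2 and 2*x1**2 + 2,
# off-diagonal entries with 3*x1**2 + 4*x1*x2 (the matrix is symmetric)
```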


Second order Taylor expansion

Theorem. If f ∈ C²(U) and x, y ∈ U, the Taylor formula becomes

f(y) − f(x) = ∇f(x)ᵗ(y − x) + ½ (y − x)ᵗ Hf(x)(y − x) + R₂(x, y − x)

where lim_{‖z‖→0} R₂(x, z)/‖z‖² = 0.
Idea. f can be approximated locally around x ∈ U by the quadratic function

y ↦ f(x) + ∇f(x)ᵗ(y − x) + ½ (y − x)ᵗ Hf(x)(y − x).

Locally, this is a better approximation than the first order Taylor approximation.
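
The claim that the quadratic approximation is locally better than the affine one can be illustrated numerically. The sketch below is my own example (f(x₁, x₂) = exp(x₁) + x₁x₂² around x = (0, 1), gradient and Hessian entered by hand), assuming numpy is available:

```python
import numpy as np

def f(x):
    return np.exp(x[0]) + x[0] * x[1]**2

x0 = np.array([0.0, 1.0])
g = np.array([np.exp(x0[0]) + x0[1]**2, 2 * x0[0] * x0[1]])         # gradient at x0
H = np.array([[np.exp(x0[0]), 2 * x0[1]], [2 * x0[1], 2 * x0[0]]])  # Hessian at x0

for r in [1e-1, 1e-2, 1e-3]:
    z = r * np.array([1.0, 1.0]) / np.sqrt(2)     # step of length r
    first = f(x0) + g @ z                          # first order Taylor value
    second = first + 0.5 * z @ H @ z               # second order Taylor value
    print(r, abs(f(x0 + z) - first), abs(f(x0 + z) - second))
# The first order error shrinks roughly like r**2, the second order error like r**3.
```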


Characterizing Convex and Concave functions

Let f : x ↦ f(x₁, x₂) be defined on an open and convex set U and have continuous partial derivatives of first and second order. If Hf(x) is positive semidefinite for all x ∈ U, then all sections of f along straight lines are convex functions.
Theorem. (1) f is a convex function on U ⇔ Hf is positive semidefinite on U.
(2) f is concave on U ⇔ Hf is negative semidefinite on U.
Note that we may decide convexity or concavity by finding the eigenvalues of Hf(x).


Example

Problem. Let f(x₁, x₂) = 2x₁ − x₂ − x₁² + 2x₁x₂ − x₂². Is f convex, concave, or neither?

Solution. Define the matrix

A = ( −1   1 )
    (  1  −1 )

An easy computation gives for the Hessian that Hf(x) = 2A. Hence we need to check the definiteness of A.
Approach via eigenvalues. The characteristic polynomial of A is P(λ) = λ² + 2λ; the equation P(λ) = 0 has the solutions (eigenvalues) −2 and 0. Hence A is negative semidefinite and the function is concave.
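
The eigenvalue computation can be reproduced numerically; a sketch of my own using numpy (not part of the slides):

```python
import numpy as np

A = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
H = 2 * A   # Hessian of f(x1, x2) = 2*x1 - x2 - x1**2 + 2*x1*x2 - x2**2

eigvals = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric matrix H, ascending
print(eigvals)                    # [-4., 0.]: all <= 0, so H is negative semidefinite
```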


Example
Problem. Let f(x₁, x₂) = x₁² − x₂² − x₁x₂ − x₁³. Find the largest domain in which f is concave.

Solution. We find easily that

Hf(x) = ( 2 − 6x₁   −1 )
        (   −1      −2 )  =: A(x)

We look for the maximal domain where A(x) is negative semidefinite. Now a symmetric 2 × 2 matrix

A = ( a   b )
    ( b   c )

is negative semidefinite if and only if a ≤ 0, c ≤ 0 and det(A) ≥ 0. This gives the conditions m₁ = 2 − 6x₁ ≤ 0 and m₂ = det(A) = 12x₁ − 5 ≥ 0 (the condition c = −2 ≤ 0 holds automatically). Therefore the solution is

G = {(x₁, x₂) : 2 − 6x₁ ≤ 0 and 12x₁ − 5 ≥ 0} = {(x₁, x₂) : x₁ ≥ 5/12}.
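
The Hessian and the determinant condition can be reproduced symbolically; the following is my own sympy sketch, not part of the slides:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 - x2**2 - x1 * x2 - x1**3

H = sp.hessian(f, (x1, x2))
print(H)                   # upper left entry 2 - 6*x1, off-diagonals -1, lower right -2
print(sp.expand(H.det()))  # 12*x1 - 5

# H is negative semidefinite iff 2 - 6*x1 <= 0 (i.e. x1 >= 1/3) and
# det(H) = 12*x1 - 5 >= 0 (i.e. x1 >= 5/12); together this gives x1 >= 5/12.
```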


Optimization problems

In its most general form an optimization problem is

minimize f(x) subject to x ∈ X̃   (2)

Here the set of admissible points X̃ is a subset of Rⁿ, and the cost function f is a function from X̃ to R. Often the admissible points are further restricted by explicit inequality constraints.
Note that maximization problems can be addressed by replacing f with −f, as sup_{x∈X̃} f(x) = − inf_{x∈X̃} {−f(x)}.


Unconstrained optimization: the problem

In this section we consider problems of the form

minimize f(x) for x ∈ X̃ = Rⁿ   (3)

Moreover, we assume that f is once or twice continuously differentiable. Most results also hold in the case where X̃ is an open subset of Rⁿ.


Local and global optima

Definition. Consider the optimization problem (3).
1. x∗ is called an (unconstrained) local minimum of f if there is some δ > 0 such that f(x∗) ≤ f(x) for all x ∈ Rⁿ with ‖x − x∗‖ < δ.
2. x∗ is called a global minimum of f if f(x∗) ≤ f(x) ∀x ∈ Rⁿ.
3. x∗ is said to be a strict local/global minimum if the inequality f(x∗) ≤ f(x) is strict for x ≠ x∗.
4. The value of the problem is f∗ := inf{f(x) : x ∈ Rⁿ}.
Remark. Local and global maxima are defined analogously.


Necessary optimality conditions

Proposition. Suppose that x∗ ∈ U is a local minimum of f.
1. If f is C¹ in U, then ∇f(x∗) = 0 (First Order Necessary Condition or FONC).
2. If moreover f ∈ C²(U), then Hf(x∗) is positive semi-definite (Second Order Necessary Condition or SONC).
Comments.
Any x∗ ∈ Rⁿ with ∇f(x∗) = 0 is called a stationary point of f.
The proof is based on the Taylor formula.
Necessary conditions for a local maximum: ∇f(x∗) = 0, Hf(x∗) negative semidefinite.
The necessary conditions are in general not sufficient: consider f(x) = x³, x∗ = 0.


Sufficient optimality conditions

Proposition. (Sufficient conditions.) Let f : U ⊂ Rⁿ → R be C² on U. Suppose that x∗ ∈ U satisfies the conditions

∇f(x∗) = 0,   Hf(x∗) strictly positive definite.   (4)

Then x∗ is a local minimum.

Comments.
The sufficient conditions are not necessary: consider e.g. f(x) = x⁴, x∗ = 0.
No global statements are possible.
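
In practice one often finds a stationary point numerically and then checks the sufficient condition at that point. The sketch below is my own illustration (assuming scipy and numpy are installed); it minimizes the Rosenbrock test function and verifies that the gradient vanishes and the Hessian is positive definite at the computed solution.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

# minimize the Rosenbrock function, a standard smooth test problem
res = minimize(rosen, x0=np.array([-1.2, 1.0]), jac=rosen_der, method="BFGS")
x_star = res.x
print(x_star)                                  # close to (1, 1)

# first order condition and sufficient second order condition at x_star
print(np.linalg.norm(rosen_der(x_star)))       # approximately 0
print(np.linalg.eigvalsh(rosen_hess(x_star)))  # all eigenvalues > 0: positive definite
```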


The case of convex functions

Lemma. Consider an open convex set X ⊂ Rⁿ. A C¹ function f : X → R is convex on X if and only if it holds for all x, z ∈ X that

f(z) ≥ f(x) + ∇f(x)ᵗ(z − x).

If f is C², a necessary and sufficient condition for the convexity of f on X is that Hf(x) is positive semi-definite for all x ∈ X.
Proposition. Let f : X → R be a convex function on some convex set X ⊂ Rⁿ. Then
1. A local minimum of f over X is also a global minimum. If f is strictly convex, there exists at most one global minimum.
2. If X is open, the condition ∇f(x∗) = 0 is necessary and sufficient for x∗ ∈ X to be a global minimum of f.


Example: Quadratic cost functions

Let f(x) = ½ xᵗQx − bᵗx, x ∈ Rⁿ, for a symmetric n × n matrix Q and some b ∈ Rⁿ. Then we have

∇f(x) = Qx − b and Hf(x) = Q.

a) Local minima. If x∗ is a local minimum we must have

∇f(x∗) = Qx∗ − b = 0,   Hf(x∗) = Q positive semi-definite;

hence if Q is not positive semi-definite, f has no local minima.
b) If Q is positive semi-definite, f is convex. In that case we need not distinguish global and local minima, and f has a global minimum if and only if there is some x∗ with Qx∗ = b.
c) If Q is positive definite, Q⁻¹ exists and the unique global minimum is attained at x∗ = Q⁻¹b.
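
Case c) is easy to illustrate numerically. The sketch below is my own example (a particular positive definite Q and a vector b chosen for illustration), assuming numpy is available:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric and positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

x_star = np.linalg.solve(Q, b)    # unique global minimum x* = Q^{-1} b
print(x_star)
print(np.linalg.eigvalsh(Q))      # positive eigenvalues confirm positive definiteness

# the value at x* is not larger than the value at nearby points
for d in [np.array([0.1, 0.0]), np.array([0.0, 0.1]), np.array([-0.1, 0.1])]:
    print(f(x_star) <= f(x_star + d))   # True
```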

Constrained optimization and Lagrange multipliers

Given C¹-functions f and h from Rⁿ to R, consider the problem

min_{x∈Rⁿ} f(x) subject to h(x) = 0   (5)

Proposition. (Existence of a Lagrange multiplier.) Let x∗ be a local minimum for Problem (5), and suppose that ∇h(x∗) ≠ 0. Then
1. (FONC) There exists a unique λ∗ ∈ R such that

∇f(x∗) + λ∗∇h(x∗) = 0.


Comments and Interpretation.

i) Let V(x∗) = {y ∈ Rⁿ : ∇h(x∗)ᵗy = 0}. This is the subspace of variations Δx = (x − x∗) for which the constraint h(x) = 0 holds 'up to first order'. The FONC states that ∇f(x∗) is orthogonal to all these 'locally permissible' variations: for y ∈ V(x∗) it holds that

∇f(x∗)ᵗy = −λ∗∇h(x∗)ᵗy = 0.

This condition is analogous to the condition ∇f(x∗) = 0 in unconstrained optimization.
ii) Using the Lagrange function L(x, λ) = f(x) + λh(x) we may write the FONC as ∂L(x∗, λ∗)/∂xᵢ = 0, 1 ≤ i ≤ n.


Examples

Geometric interpretation of multipliers: Consider the problem

min_{x∈R²} x₁ + x₂ subject to x₁² + x₂² = 2.

Lagrange multipliers can also be used to establish the geometric-arithmetic mean inequality.
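
The first example can be solved directly from the FONC ∇f(x∗) + λ∗∇h(x∗) = 0 together with the constraint. The following sympy sketch is my own illustration of this computation, not part of the slides:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)

f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + lam * h                       # Lagrange function L(x, lambda)

# FONC: dL/dx1 = dL/dx2 = 0 together with the constraint h = 0
eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
sols = sp.solve(eqs, [x1, x2, lam], dict=True)
for s in sols:
    print(s, "f =", f.subs(s))
# candidates (1, 1) with f = 2 and (-1, -1) with f = -2; the minimum is at (-1, -1).
```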
