
Lecture Analysis and Linear Algebra

Multivariate Calculus and Optimization

Rüdiger Frey

ruediger.frey@wu.ac.at
http://statmath.wu.ac.at/~frey

Winter 2018


Functions of several variables

From now on we consider functions f : G ⊂ Rⁿ → R,

(x₁, . . . , xₙ) ↦ f(x₁, . . . , xₙ).

Visualisation of a function y = f(x₁, x₂):

Plot the graph Gf = {(x₁, x₂, f(x₁, x₂)) : (x₁, x₂) ∈ G} (perspective plot).
Contour plot: plot the level sets or contours Hc = {(x₁, x₂) : f(x₁, x₂) = c}.
Plot sections, that is, graphs of the function x₁ ↦ f(x₁, x₂) for fixed values of x₂ and of x₂ ↦ f(x₁, x₂) for fixed x₁.


Example: plots of densities on R²


[Figure: contour plots and perspective plots of two densities on R², with x and y ranging over [−4, 4] and the vertical axis showing the density. Left: a bivariate normal density; right: a bivariate t density.]
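
A figure of this kind can be reproduced with a few lines of Python. The sketch below is my own illustration (not part of the slides) and assumes numpy, scipy and matplotlib are available; it draws a contour plot and a perspective plot of a standard bivariate normal density.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# grid on [-4, 4] x [-4, 4]
x = np.linspace(-4, 4, 200)
y = np.linspace(-4, 4, 200)
X, Y = np.meshgrid(x, y)
pos = np.dstack((X, Y))

# density of a standard bivariate normal distribution
Z = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]]).pdf(pos)

fig = plt.figure(figsize=(10, 4))

# contour plot: level sets H_c = {(x1, x2) : f(x1, x2) = c}
ax1 = fig.add_subplot(1, 2, 1)
ax1.contour(X, Y, Z)
ax1.set_xlabel("x"); ax1.set_ylabel("y")

# perspective plot of the graph of f
ax2 = fig.add_subplot(1, 2, 2, projection="3d")
ax2.plot_surface(X, Y, Z, cmap="viridis")
ax2.set_xlabel("x"); ax2.set_ylabel("y"); ax2.set_zlabel("density")

plt.show()
```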


Further mathematical properties

A function f : G ⊂ Rⁿ → R is continuous at x₀ ∈ G if for every sequence (xₙ) ⊂ G with xₙ → x₀ it holds that limₙ→∞ f(xₙ) = f(x₀).
Be careful: it is not enough to check continuity of the sections x₁ ↦ f(x₁, x₂) and x₂ ↦ f(x₁, x₂).
Suppose that G is convex. Then f is convex on G if for all x, y ∈ G and all α ∈ [0, 1] it holds that

f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).


A useful counterexample

Consider the function

f(x₁, x₂) = x₁x₂ / (x₁² + x₂²)   for x ∈ R² \ {0},     f(x₁, x₂) = 0   for x = 0.

Then f is constant on every ray through the origin: for arbitrary x = (x₁, x₂) and arbitrary t ≠ 0 one has f(tx₁, tx₂) = f(x₁, x₂). For instance f(t, 0) = 0 and f(t, t) = t²/(2t²) = 1/2. In particular, the sections t ↦ f(t, 0) and t ↦ f(0, t) are continuous, but f is not continuous at 0.
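
A quick numerical check (a Python sketch of my own, not part of the slides) illustrates the point: along each coordinate axis f is identically 0, while along the diagonal it equals 1/2, so the value near the origin depends on the direction of approach.

```python
def f(x1, x2):
    # f(x1, x2) = x1*x2 / (x1^2 + x2^2) for (x1, x2) != (0, 0), and f(0, 0) = 0
    if x1 == 0 and x2 == 0:
        return 0.0
    return x1 * x2 / (x1**2 + x2**2)

for t in [1.0, 0.1, 0.01, 0.001]:
    # f is constant along each ray through the origin
    print(t, f(t, 0.0), f(0.0, t), f(t, t))
# f(t, 0) = f(0, t) = 0 for every t, while f(t, t) = 1/2 for every t != 0,
# so f has no limit at the origin and is not continuous there.
```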


Differentiability and partial derivatives


Consider some f : U ⊂ Rⁿ → R, (x₁, . . . , xₙ) ↦ f(x₁, . . . , xₙ), where U is an open subset of Rⁿ, and some x ∈ U.
Definition. (1) The i-th partial derivative of f at x ∈ U is given by

∂f/∂xᵢ (x) = lim_{h→0} ( f(x + h eᵢ) − f(x) ) / h

(if the limit exists).
(2) f is called continuously differentiable on U (f ∈ C¹(U)) if for all x ∈ U all partial derivatives exist and are continuous functions of x.
(3) More generally, a function f : U ⊂ Rⁿ → Rᵐ,

(x₁, . . . , xₙ) ↦ (f₁(x₁, . . . , xₙ), . . . , fₘ(x₁, . . . , xₙ))ᵗ,

is continuously differentiable on U if all components f₁, . . . , fₘ belong to C¹(U).
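
The defining limit can be approximated by a forward difference quotient. The sketch below (my own Python illustration, using the function from example (1) on the next slide at the point (1, 2)) is meant as an illustration, not as part of the lecture.

```python
def f(x1, x2):
    return x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

def partial(f, x, i, h=1e-6):
    """Forward difference approximation of the i-th partial derivative at x."""
    xh = list(x)
    xh[i] += h                     # x + h * e_i
    return (f(*xh) - f(*x)) / h

x = (1.0, 2.0)
print(partial(f, x, 0))   # approximately 3*1*2 + 2*1*4 + 1 = 15
print(partial(f, x, 1))   # approximately 1 + 2*2 + 4 = 9
```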

Examples

(1) Let f(x₁, x₂) = x₁³x₂ + x₁²x₂² + x₁ + x₂². Then

∂f(x)/∂x₁ = 3x₁²x₂ + 2x₁x₂² + 1,   ∂f(x)/∂x₂ = x₁³ + 2x₁²x₂ + 2x₂.

These are obviously continuous functions of x; hence f ∈ C¹.

(2) Consider a symmetric 2 × 2 matrix A and let f(x₁, x₂) = xᵗAx = a₁₁x₁² + 2a₁₂x₁x₂ + a₂₂x₂². Then

∂f(x)/∂x₁ = 2a₁₁x₁ + 2a₁₂x₂ = (2Ax)₁   and   ∂f(x)/∂x₂ = 2a₁₂x₁ + 2a₂₂x₂ = (2Ax)₂.
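
The hand computation in example (1) can also be verified symbolically. A short sketch using sympy (my own addition, assuming the library is available):

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

print(sp.diff(f, x1))   # should agree with 3*x1**2*x2 + 2*x1*x2**2 + 1
print(sp.diff(f, x2))   # should agree with x1**3 + 2*x1**2*x2 + 2*x2
```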


Gradient and Jacobi matrix

Suppose that f : U → R is in C¹(U). Then the column vector

∇f(x) = (∂f/∂x₁(x), . . . , ∂f/∂xₙ(x))ᵗ

is the gradient of f.
For a C¹ function g : U → Rᵐ the Jacobi matrix Jg(x) is the m × n matrix whose i-th row is the transposed gradient of the component gᵢ, that is

(Jg(x))ᵢⱼ = ∂gᵢ(x)/∂xⱼ,   1 ≤ i ≤ m, 1 ≤ j ≤ n.

Sometimes one also uses the gradient matrix ∇g(x) = Jg(x)ᵗ.
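
For a concrete function the Jacobi matrix can be computed symbolically. The sketch below (my own illustration with a hypothetical example g(x₁, x₂) = (x₁x₂, x₁ + x₂²), assuming sympy is available) uses Matrix.jacobian:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
g = sp.Matrix([x1 * x2, x1 + x2**2])     # g : R^2 -> R^2

J = g.jacobian([x1, x2])                 # entry (i, j) is d g_i / d x_j
print(J)   # rows should be (x2, x1) and (1, 2*x2)
```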


First order (Taylor) approximation

Theorem. Consider some C¹ function f : U → R. Then it holds for any x, y ∈ U that

f(y) − f(x) = ∇f(x)ᵗ(y − x) + R(x, y − x)   (1)

where lim_{‖z‖→0} R(x, z)/‖z‖ = 0.
Idea. The function f can be approximated locally around x by the affine mapping y ↦ f(x) + ∇f(x)ᵗ(y − x).


Chain rule

Theorem. Consider C¹ functions f : Rⁿ → Rᵐ and g : Rᵏ → Rⁿ and let h := f ∘ g. Then h is C¹ and for the Jacobi matrix it holds that Jh(x) = Jf(g(x)) Jg(x), i.e. the Jacobian of the composition is the product of the individual Jacobi matrices.
Example. (Derivative along a vector.) Consider a C¹ function f : Rⁿ → R. We want to consider f along the straight line φ(t) := x + tv, for t ∈ R and x, v ∈ Rⁿ. We have Jφ(t) = v and Jf(x) = (∇f(x))ᵗ, and hence

d/dt f(φ(t)) = (∇f(x + tv))ᵗ v,   in particular d/dt f(φ(t)) at t = 0 equals (∇f(x))ᵗ v.
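
The formula for the derivative along a vector is easy to check numerically. Below is a small sketch of my own (not from the slides), using the hypothetical example f(x₁, x₂) = x₁² + 3x₁x₂ at x = (1, 1) with direction v = (2, −1):

```python
import numpy as np

def f(x):
    return x[0]**2 + 3 * x[0] * x[1]

def grad_f(x):
    return np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 1.0])
v = np.array([2.0, -1.0])

# chain rule: d/dt f(x + t v) at t = 0 equals grad f(x) . v
h = 1e-6
numeric = (f(x + h * v) - f(x)) / h
exact = grad_f(x) @ v
print(numeric, exact)   # both approximately 7
```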


Second derivatives

Definition. Consider a C¹ function f : U ⊂ Rⁿ → R. Then the first order partial derivatives ∂f(x)/∂xᵢ, 1 ≤ i ≤ n, are themselves functions from U to R.
1. If all partial derivatives are C¹ functions, f is called twice continuously differentiable on U (f ∈ C²(U)). Fix i, j ∈ {1, . . . , n}. Then one writes

∂²f/∂xᵢ∂xⱼ (x) := ∂/∂xⱼ ( ∂f/∂xᵢ (x) )

for the second partial derivative in the directions xᵢ and xⱼ.
2. For f ∈ C²(U) the matrix Hf with Hfᵢⱼ(x) = ∂²f/∂xᵢ∂xⱼ (x) is the Hessian matrix of f.


Theorem of Young and Schwarz

Theorem. Consider f ∈ C²(U). Then the mixed second order partial derivatives coincide, that is

∂²f/∂xᵢ∂xⱼ (x) = ∂²f/∂xⱼ∂xᵢ (x),   1 ≤ i, j ≤ n.

It follows that the Hessian is a symmetric matrix, that is Hfᵢⱼ(x) = Hfⱼᵢ(x), 1 ≤ i, j ≤ n. In particular, the definiteness of Hf can be checked using eigenvalues or (for strictly definite matrices) with leading principal minors.


Example

(1) Consider f(x₁, x₂) = x₁³x₂ + x₁²x₂² + x₁ + x₂². Then we have

∂²f/∂x₁² = 6x₁x₂ + 2x₂²,   ∂²f/∂x₂² = 2x₁² + 2,   ∂²f/∂x₁∂x₂ = 3x₁² + 4x₁x₂.

(2) Consider f(x) = xᵗAx for some symmetric matrix A. Then Hf(x) = 2A.
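
The second derivatives in example (1) can again be checked symbolically; a short sketch of my own using sympy's hessian helper:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = x1**3 * x2 + x1**2 * x2**2 + x1 + x2**2

H = sp.hessian(f, (x1, x2))
print(H)
# diagonal entries should agree with 6*x1*x2 + 2*x2**2 and 2*x1**2 + 2,
# off-diagonal entries with 3*x1**2 + 4*x1*x2 (the matrix is symmetric)
```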


Second order Taylor expansion

Theorem. If f ∈ C²(U) and x, y ∈ U, the Taylor formula becomes

f(y) − f(x) = ∇f(x)ᵗ(y − x) + ½ (y − x)ᵗ Hf(x)(y − x) + R₂(x, y − x)

where lim_{‖z‖→0} R₂(x, z)/‖z‖² = 0.
Idea. f can be approximated locally around x ∈ U by the quadratic function

y ↦ f(x) + ∇f(x)ᵗ(y − x) + ½ (y − x)ᵗ Hf(x)(y − x).

Locally, this is a better approximation than the first order Taylor approximation.
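
The claim that the quadratic approximation is locally better than the affine one can be illustrated numerically. The sketch below is my own example (f(x₁, x₂) = exp(x₁) + x₁x₂² around x = (0, 1), gradient and Hessian entered by hand), assuming numpy is available:

```python
import numpy as np

def f(x):
    return np.exp(x[0]) + x[0] * x[1]**2

x0 = np.array([0.0, 1.0])
g = np.array([np.exp(x0[0]) + x0[1]**2, 2 * x0[0] * x0[1]])         # gradient at x0
H = np.array([[np.exp(x0[0]), 2 * x0[1]], [2 * x0[1], 2 * x0[0]]])  # Hessian at x0

for r in [1e-1, 1e-2, 1e-3]:
    z = r * np.array([1.0, 1.0]) / np.sqrt(2)     # step of length r
    first = f(x0) + g @ z                          # first order Taylor value
    second = first + 0.5 * z @ H @ z               # second order Taylor value
    print(r, abs(f(x0 + z) - first), abs(f(x0 + z) - second))
# The first order error shrinks roughly like r**2, the second order error like r**3.
```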


Characterizing Convex and Concave functions

Let f : x ↦ f(x₁, x₂) be defined on an open and convex set U and have continuous partial derivatives of first and second order. If Hf(x) is positive semidefinite for all x ∈ U, then all sections of f along straight lines are convex functions.
Theorem. (1) f is a convex function on U ⇔ Hf is positive semidefinite on U.
(2) f is concave on U ⇔ Hf is negative semidefinite on U.
Note that we may decide convexity or concavity by finding the eigenvalues of Hf(x).


Example

Problem. Let f(x₁, x₂) = 2x₁ − x₂ − x₁² + 2x₁x₂ − x₂². Is f convex, concave, or neither?

Solution. Define the matrix

A = ( −1   1 )
    (  1  −1 )

An easy computation gives for the Hessian that Hf(x) = 2A. Hence we need to check the definiteness of A.
Approach via eigenvalues. The characteristic polynomial of A is P(λ) = λ² + 2λ; the equation P(λ) = 0 has the solutions (eigenvalues) −2 and 0. Hence A is negative semidefinite and the function is concave.
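
The eigenvalue computation can be reproduced numerically; a sketch of my own using numpy (not part of the slides):

```python
import numpy as np

A = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
H = 2 * A   # Hessian of f(x1, x2) = 2*x1 - x2 - x1**2 + 2*x1*x2 - x2**2

eigvals = np.linalg.eigvalsh(H)   # eigenvalues of the symmetric matrix H, ascending
print(eigvals)                    # [-4., 0.]: all <= 0, so H is negative semidefinite
```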


Example
Problem. Let f(x₁, x₂) = x₁² − x₂² − x₁x₂ − x₁³. Find the largest domain in which f is concave.

Solution. We find easily that

Hf(x) = ( 2 − 6x₁   −1 )
        (   −1      −2 )  =: A(x)

We look for the maximal domain where A(x) is negative semidefinite. Now a symmetric 2 × 2 matrix

A = ( a   b )
    ( b   c )

is negative semidefinite if and only if a ≤ 0, c ≤ 0 and det(A) ≥ 0. This gives the conditions m₁ = 2 − 6x₁ ≤ 0 and m₂ = det(A) = 12x₁ − 5 ≥ 0 (the condition c = −2 ≤ 0 holds automatically). Therefore the solution is

G = {(x₁, x₂) : 2 − 6x₁ ≤ 0 and 12x₁ − 5 ≥ 0} = {(x₁, x₂) : x₁ ≥ 5/12}.
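
The Hessian and the determinant condition can be reproduced symbolically; the following is my own sympy sketch, not part of the slides:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
f = x1**2 - x2**2 - x1 * x2 - x1**3

H = sp.hessian(f, (x1, x2))
print(H)                   # upper left entry 2 - 6*x1, off-diagonals -1, lower right -2
print(sp.expand(H.det()))  # 12*x1 - 5

# H is negative semidefinite iff 2 - 6*x1 <= 0 (i.e. x1 >= 1/3) and
# det(H) = 12*x1 - 5 >= 0 (i.e. x1 >= 5/12); together this gives x1 >= 5/12.
```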


Optimization problems

In its most general form an optimization problem is

minimize f(x) subject to x ∈ X̃   (2)

Here the set of admissible points X̃ is a subset of Rⁿ, and the cost function f is a function from X̃ to R. Often the admissible points are further restricted by explicit inequality constraints.
Note that maximization problems can be addressed by replacing f with −f, as sup_{x∈X̃} f(x) = − inf_{x∈X̃} {−f(x)}.


Unconstrained optimization: the problem

In this section we consider problems of the form

minimize f(x) for x ∈ X̃ = Rⁿ   (3)

Moreover, we assume that f is once or twice continuously differentiable. Most results also hold in the case where X̃ is an open subset of Rⁿ.


Local and global optima

Definition. Consider the optimization problem (3).
1. x∗ is called an (unconstrained) local minimum of f if there is some δ > 0 such that f(x∗) ≤ f(x) for all x ∈ Rⁿ with ‖x − x∗‖ < δ.
2. x∗ is called a global minimum of f if f(x∗) ≤ f(x) ∀x ∈ Rⁿ.
3. x∗ is said to be a strict local/global minimum if the inequality f(x∗) ≤ f(x) is strict for x ≠ x∗.
4. The value of the problem is f∗ := inf{f(x) : x ∈ Rⁿ}.
Remark. Local and global maxima are defined analogously.


Necessary optimality conditions

Proposition. Suppose that x∗ ∈ U is a local minimum of f.
1. If f is C¹ in U, then ∇f(x∗) = 0 (First Order Necessary Condition or FONC).
2. If moreover f ∈ C²(U), then Hf(x∗) is positive semi-definite (Second Order Necessary Condition or SONC).
Comments.
Any x∗ ∈ Rⁿ with ∇f(x∗) = 0 is called a stationary point of f.
The proof is based on the Taylor formula.
Necessary conditions for a local maximum: ∇f(x∗) = 0, Hf(x∗) negative semidefinite.
The necessary conditions are in general not sufficient: consider f(x) = x³, x∗ = 0.


Sufficient optimality conditions

Proposition. (Sufficient conditions.) Let f : U ⊂ Rⁿ → R be C² on U. Suppose that x∗ ∈ U satisfies the conditions

∇f(x∗) = 0,   Hf(x∗) strictly positive definite.   (4)

Then x∗ is a local minimum.

Comments.
The sufficient conditions are not necessary: consider e.g. f(x) = x⁴, x∗ = 0.
No global statements are possible.
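
In practice one often finds a stationary point numerically and then checks the sufficient condition at that point. The sketch below is my own illustration (assuming scipy and numpy are installed); it minimizes the Rosenbrock test function and verifies that the gradient vanishes and the Hessian is positive definite at the computed solution.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

# minimize the Rosenbrock function, a standard smooth test problem
res = minimize(rosen, x0=np.array([-1.2, 1.0]), jac=rosen_der, method="BFGS")
x_star = res.x
print(x_star)                                  # close to (1, 1)

# first order condition and sufficient second order condition at x_star
print(np.linalg.norm(rosen_der(x_star)))       # approximately 0
print(np.linalg.eigvalsh(rosen_hess(x_star)))  # all eigenvalues > 0: positive definite
```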


The case of convex functions

Lemma. Consider an open convex set X ⊂ Rⁿ. A C¹ function f : X → R is convex on X if and only if it holds for all x, z ∈ X that

f(z) ≥ f(x) + ∇f(x)ᵗ(z − x).

If f is C², a necessary and sufficient condition for the convexity of f on X is that Hf(x) is positive semi-definite for all x ∈ X.
Proposition. Let f : X → R be a convex function on some convex set X ⊂ Rⁿ. Then
1. A local minimum of f over X is also a global minimum. If f is strictly convex, there exists at most one global minimum.
2. If X is open, the condition ∇f(x∗) = 0 is necessary and sufficient for x∗ ∈ X to be a global minimum of f.


Example: Quadratic cost functions

Let f(x) = ½ xᵗQx − bᵗx, x ∈ Rⁿ, for a symmetric n × n matrix Q and some b ∈ Rⁿ. Then we have

∇f(x) = Qx − b and Hf(x) = Q.

a) Local minima. If x∗ is a local minimum we must have

∇f(x∗) = Qx∗ − b = 0,   Hf(x∗) = Q positive semi-definite;

hence if Q is not positive semi-definite, f has no local minima.
b) If Q is positive semi-definite, f is convex. In that case we need not distinguish global and local minima, and f has a global minimum if and only if there is some x∗ with Qx∗ = b.
c) If Q is positive definite, Q⁻¹ exists and the unique global minimum is attained at x∗ = Q⁻¹b.
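
Case c) is easy to illustrate numerically. The sketch below is my own example (a particular positive definite Q and a vector b chosen for illustration), assuming numpy is available:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric and positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

x_star = np.linalg.solve(Q, b)    # unique global minimum x* = Q^{-1} b
print(x_star)
print(np.linalg.eigvalsh(Q))      # positive eigenvalues confirm positive definiteness

# the value at x* is not larger than the value at nearby points
for d in [np.array([0.1, 0.0]), np.array([0.0, 0.1]), np.array([-0.1, 0.1])]:
    print(f(x_star) <= f(x_star + d))   # True
```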

Constrained optimization and Lagrange multipliers

Given C¹-functions f and h from Rⁿ to R, consider the problem

min_{x∈Rⁿ} f(x) subject to h(x) = 0   (5)

Proposition. (Existence of a Lagrange multiplier.) Let x∗ be a local minimum for Problem (5), and suppose that ∇h(x∗) ≠ 0. Then
1. (FONC) There exists a unique λ∗ ∈ R such that

∇f(x∗) + λ∗∇h(x∗) = 0.


Comments and Interpretation.

i) Let V(x∗) = {y ∈ Rⁿ : ∇h(x∗)ᵗy = 0}. This is the subspace of variations Δx = (x − x∗) for which the constraint h(x) = 0 holds 'up to first order'. The FONC states that ∇f(x∗) is orthogonal to all these 'locally permissible' variations: for y ∈ V(x∗) it holds that

∇f(x∗)ᵗy = −λ∗∇h(x∗)ᵗy = 0.

This condition is analogous to the condition ∇f(x∗) = 0 in unconstrained optimization.
ii) Using the Lagrange function L(x, λ) = f(x) + λh(x) we may write the FONC as ∂L(x∗, λ∗)/∂xᵢ = 0, 1 ≤ i ≤ n.


Examples

Geometric interpretation of multipliers: Consider the problem

min_{x∈R²} x₁ + x₂ subject to x₁² + x₂² = 2.

Lagrange multipliers can also be used to establish the geometric-arithmetic mean inequality.
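
The first example can be solved directly from the FONC ∇f(x∗) + λ∗∇h(x∗) = 0 together with the constraint. The following sympy sketch is my own illustration of this computation, not part of the slides:

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)

f = x1 + x2
h = x1**2 + x2**2 - 2
L = f + lam * h                       # Lagrange function L(x, lambda)

# FONC: dL/dx1 = dL/dx2 = 0 together with the constraint h = 0
eqs = [sp.diff(L, x1), sp.diff(L, x2), h]
sols = sp.solve(eqs, [x1, x2, lam], dict=True)
for s in sols:
    print(s, "f =", f.subs(s))
# candidates (1, 1) with f = 2 and (-1, -1) with f = -2; the minimum is at (-1, -1).
```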
