
Differential Calculus of Several

Variables
Fernando Barrera Mora
PoLaMaT-CIMA

January 2021
Introduction

The aim of writing notes to support courses lies at the very root of the original idea that
guided the founding and development of the Centro de Investigación en Matemáticas (CIMA),
which originated almost 20 years ago. Even though, after those 20 years, many changes have
occurred in the way students find sources and references to support their mathematical
learning, writing lecture notes still makes good sense. Taking this into account, it is reasonable
to incorporate the many sources available on the web to support the development of the tasks.
For instance, the use of GeoGebra, CoCalc, Wolfram and several other tools is welcome in the
discussion of lecture notes. Throughout these notes, the reader will find resources that she or he
can use to explore or deepen her or his view of the subject. Writing lecture notes also has the
advantage of presenting the author's view of the course as well as his or her point of view on
learning the subject under discussion. In this case, the first chapter is a quick review
of basic topics from one-variable calculus; chapter two is under construction, owing to the idea
of incorporating digital tools to help visualize important concepts related to the geometry of
calculus; in chapter three, basic algebraic and topological concepts of Rn are presented with the
aim of providing the foundations needed to approach the main results; in chapters
four, five and six the fundamental results concerning properties of functions of several variables
are discussed.
We think that writing lecture notes or other materials to help students in their mathematical
learning should be considered an important activity to be developed by instructors as part of
their mathematical tasks, especially if they are directly involved in a program such as LIMA.
On another, related point, this year marks the 20th anniversary of the Bachelor's Degree
Program in Applied Mathematics (LIMA) offered at CIMA. In this regard, we would like
to mention that over this period several dozen students have finished their studies at
CIMA and are now active in academia or elsewhere. This fact alone should
encourage everyone who has been working or studying at CIMA to continue doing so
as hard as they can.
Unfortunately, in the last thirteen months or so, the UAEH administrators have almost dismantled
CIMA's infrastructure; even so, we are sure that there are colleagues whose hard work
will continue helping present and future students at CIMA to reach their academic goals.

For a successful 20th anniversary of LIMA, let’s work hard!

Fernando Barrera Mora, January 2021

Pólya's Ten Famous Quotations (a decalogue for the learning and the advancing of
mathematics)

1. It is better to solve one problem five different ways, than to solve five problems one way.

2. Mathematics is the cheapest science. Unlike physics or chemistry, it does not require any
expensive equipment. All one needs for mathematics is a pencil and paper.

3. Solving problems is a practical art, like swimming, or skiing or playing the piano: you can
learn it only by imitation and practice.

4. If there is a problem you can’t solve, then there is an easier problem you can solve: find
it.

5. It may be more important in the mathematics class how you teach than what you teach.

6. Beauty in mathematics is seeing the truth without effort.

7. To teach effectively a teacher must develop a feeling for his subject; he cannot make his
students sense its vitality if he does not sense it himself. He cannot share its enthusiasm
when he has no enthusiasm to share. How he makes his point may be as important as the
point he makes; he must personally feel it to be important.

8. The future mathematician . . . should solve problems, choose the problems which are in his
line, meditate upon their solution, and invent new problems. By this means, and by all
other means, he should endeavor to make his first important discovery: he should discover
his likes and dislikes, his taste, his own line.

9. Mathematics consists of proving the most obvious thing in the least obvious way.

10. An idea which can be used only once is a trick. If one can use it more than once it becomes
a method.

A small collection of recommended videos and applets

number e
e to the pi i
e and pi are transcendental
The Riemann Conjecture, Explained
Directional derivative GeoGebra
DirectionalDerivativeAppletGeogebra
Birch and Swinnerton-Dyer Conjecture

Contents

1 A quick review of basic results from calculus of one variable
  1.1 Some properties of the reals
  1.2 Functions: limits, continuity and derivatives
      1.2.1 Some types of functions
      1.2.2 Limit of a function
      1.2.3 Continuous functions
  1.3 Exercises
  1.4 Derivatives

2 Elements of Geometry in R3

3 Basic Algebraic and Topological Properties of Rn
  3.1 Geometric Aspects of Rn
  3.2 Topology of Rn
  3.3 Exercises

4 Limit and Continuity of Functions
  4.1 Introduction
  4.2 Exercises

5 Differentiable Functions from Rn → Rm
  5.1 Introduction
  5.2 Derivatives in Rn
  5.3 Partial Derivatives and Matrix Representation of the Derivative
      5.3.1 Directional Derivatives and continuity
      5.3.2 Graphs, level curves and gradient
      5.3.3 Matrix Representation of the Derivative
  5.4 Exercises

6 Main theorems in several variables calculus
  6.1 Mean Value Theorem
  6.2 Taylor's Theorem
  6.3 Maxima and Minima of Real-valued Functions
      6.3.1 Second Derivative Method
      6.3.2 Lagrange Multipliers
      6.3.3 Inverse and Implicit Function Theorem
  6.4 Exercises

Chapter 1

A quick review of basic results from


calculus of one variable

It seems to me that when starting a discussion of calculus of several variables, it is
appropriate to review basic concepts from calculus of one variable. In order to do so, we
start by reviewing basic properties of the real numbers.

1.1 Some properties of the reals


The properties listed below are basic for developing the foundations of calculus (theory of
calculus = analysis). It is highly recommended to review them very carefully and to use them,
when solving problems, as often as possible.

1. Trichotomy law: Given x ∈ R, exactly one of the following conditions holds.

(a) x = 0,
(b) x > 0,
(c) −x > 0.

2. Archimedean principle (Eudoxus' axiom (Eucl. V, def. 4)): given ε > 0 and M ∈ R, there
is a natural number n so that M < nε.

3. The following principles are logically equivalent. For a proof of this, consult [8].
Principle 1 (Pigeonhole principle). Assume that m pigeons are to be distributed into n
pigeonholes. If m > n, then at least one pigeonhole contains more than one pigeon.
Principle 2 (Multiplication principle). If there exist n ways of performing one operation
and m ways of performing another operation (independent of the first), then there exist
mn ways of performing both operations, one followed by the other.
Principle 3 (Well-ordering principle). If S is a nonempty subset of the positive integers,
then there exists n0 ∈ S so that n0 ≤ n for every n ∈ S.
Principle 4 (Principle of Mathematical Induction). If S is a subset of the positive integers
that satisfies:

(a) 1 ∈ S, and

(b) whenever k ∈ S, then k + 1 ∈ S,

then S is the set of all positive integers.

4. Supremum of a set: Given S ⊆ R and α ∈ R, it is said that α is the supremum of S if

(a) s ≤ α for every s ∈ S,


(b) if β ∈ R is such that s ≤ β for every s ∈ S, then α ≤ β.

If α is the supremum of S, this is denoted by α = sup(S).

5. Supremum principle: Every nonempty subset S of R which is bounded above has a supremum.
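As a small numeric illustration (ours, not part of the original notes), consider the set S = {1 − 1/n : n ∈ N}: it is nonempty and bounded above, and sup(S) = 1 even though 1 ∉ S.

```python
# S = {1 - 1/n : n in N} is bounded above by 1; no element equals 1,
# but the elements come arbitrarily close, so sup(S) = 1 and 1 is not in S.
S = [1 - 1 / n for n in range(1, 10_001)]
print(max(S) < 1)         # every element is strictly below 1
print(1 - max(S) < 1e-3)  # the elements approach the supremum
```

Of course the finite list only samples S; the point is that no element reaches the supremum, yet the bound 1 cannot be improved.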

1.2 Functions: limits, continuity and derivatives


Recall that a function f : R → R is a "rule" that assigns to each x ∈ R a unique element of
R denoted by f(x). It is appropriate to mention that one of the most important concepts in
all of mathematics is that of a function. In order to get a better understanding of this important
mathematical object, some terms are needed.

1.2.1 Some types of functions


For each of the following types of functions, provide a definition and examples.

1. Increasing function.

2. Even or odd function.

3. Bounded function.

4. Injective, surjective or bijective function.

5. Inverse function.

1.2.2 Limit of a function


Definition 1.2.1. Let f : [a, b] → R be a function and let x0 ∈ ]a, b[. We say that f has limit l
at x0 if for every ε > 0, there exists δ > 0 such that 0 < |x − x0| < δ implies |f(x) − l| < ε.
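To make the ε–δ definition concrete, here is a small numeric sketch (ours, not from the notes): for f(x) = 2x + 1 with x0 = 2 and l = 5, the choice δ = ε/2 works, since |f(x) − 5| = 2|x − 2|.

```python
def f(x):
    return 2 * x + 1

def delta_works(eps, delta, samples=10_000):
    # sample points x with 0 < |x - 2| < delta and check |f(x) - 5| < eps
    pts = [2 - delta + k * (2 * delta) / samples for k in range(samples + 1)]
    return all(abs(f(x) - 5) < eps for x in pts if 0 < abs(x - 2) < delta)

for eps in (1.0, 0.1, 0.001):
    print(eps, delta_works(eps, eps / 2))  # True for each ε tested
```

Sampling cannot replace the proof, but it shows how δ shrinks together with ε, which is the heart of the definition.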

Theorem 1.2.1 (Arithmetic of limits). Let f, g : [a, b] → R be functions which have limits l and
l1 at x0 ∈ ]a, b[ respectively. Then

1. the function f + g has a limit at x0 and lim_{x→x0} (f(x) + g(x)) = lim_{x→x0} f(x) + lim_{x→x0} g(x) = l + l1;

2. lim_{x→x0} (f(x)g(x)) = (lim_{x→x0} f(x))(lim_{x→x0} g(x)) = l·l1;

3. if l1 ≠ 0, then the function f/g has a limit at x0 and lim_{x→x0} f(x)/g(x) = l/l1.

1.2.3 Continuous functions
In this section you should give an account of the main results for continuous functions, such as
the analogue of Theorem 1.2.1 for continuous functions, the analogue of the chain rule for continuous
functions, the so-called three strong theorems, and anything else you need to recall from calculus of
one variable.

1.3 Exercises
1. Let x and y be real numbers and n a positive integer. Factor xⁿ + yⁿ completely as
a product of real factors. Hint: note that xⁿ + yⁿ = yⁿ((x/y)ⁿ + 1) and solve the
equation Xⁿ + 1 = 0.
2. Formulate and give a proof of the fundamental theorem of arithmetic. Use this result to
prove that if p is a prime number, then √p is irrational.

3. Let a and n be positive integers so that a ≠ bⁿ for every integer b. Prove that ⁿ√a is
irrational.
4. Is there a function f : [0, 1] → R which is continuous at the irrationals and discontinuous
at the rationals?
5. For which values of a is the function f(x) = { a²x − a, if x ≥ 3; 4, if x < 3 } continuous
everywhere?

6. For which values of a and b is the function
f(x) = { ax − b, if x ≤ −1; 2x² + 3ax + b, if −1 < x ≤ 1; 4, otherwise }
continuous for every x ∈ R?


7. Prove Bernoulli’s inequality. If x ≥ −1 and n is a positive integer, then (1 + x)n ≥ 1 + nx.
8. Prove that exp(x) ≥ 1 + x for every x ∈ R.
9. Assume that lim_{x→0} f(x)/x = l. Prove that lim_{x→0} f(bx)/x = bl, if b ≠ 0.
10. Let f : R → R be a function so that f (x + y) = f (x) + f (y) for every x, y ∈ R.
(a) If f is continuous at zero, then f is continuous everywhere and f(x) = ax for some
a ∈ R.
(b) If f is not continuous at zero, then f is discontinuous everywhere and its graph is
dense in the plane.
11. Let f : [a, b] → R be a function which is continuous on [a, b] and differentiable on ]a, b[, and
satisfies: f(a) = f(b) = 0, and f′(x) = 0 implies f(x) = 0. Prove that f is constant.
12. Let {an} and {bn} be sequences of positive real numbers so that lim_{n→∞} an/bn exists and is
positive. Prove that Σ_{n=1}^∞ an < ∞ if and only if Σ_{n=1}^∞ bn < ∞.

13. Let f, g : R → R be differentiable functions so that f(0) = g(0) and f′(x) < g′(x) for
every x. Prove that f(x) < g(x) if and only if x > 0.

14. Let f : R → R be a continuous and increasing function so that f(0) = 0. Prove that
xf(x) = ∫_0^x f + ∫_0^{f(x)} f⁻¹.

15. Let f : [0, ∞[ → R be a differentiable function such that f(0) = 0 and f′ is increasing.
Prove that the function g(x) = f(x)/x is also increasing on ]0, ∞[.

1.4 Derivatives
In this section you should start by defining the derivative of a function and provide several
examples of functions which are differentiable at some points, including functions which are
differentiable at exactly one, two, three, etc. points.
Also, state the main results of differential calculus such as the Mean Value Theorem, Rolle's
Theorem, L'Hôpital's Rule, Taylor's Theorem, etc.

Chapter 2

Elements of Geometry in R3

In this chapter we shall review basic geometric objects in R3 , such as planes, lines, curves and
graphs of functions, using GeoGebra.
To start the discussion, let's present the graph of a function from R2 to R.
Graph of a function

Chapter 3

Basic Algebraic and Topological


Properties of Rn

3.1 Geometric Aspects of Rn


We assume that the reader is familiar with the basic properties of Rn as a vector space such
as linear dependence, basis, dimension, linear transformations, etc. However, as a matter of
completeness, in this section we discuss some basic concepts such as inner product and norm
concerning the euclidean vector space Rn . With this, we are preparing the necessary background
to introduce basic ideas such as limit and continuity of functions. We present the definition
of a general euclidean vector space, even though our objective is to study several geometric
properties of Rn .
Definition 3.1.1. Given a real vector space V, we say that it is euclidean if there is a function
⟨ , ⟩ : V × V → R which satisfies:

1. for every α, β ∈ V, ⟨α, β⟩ = ⟨β, α⟩;

2. for every α, β, γ ∈ V and for every a, b ∈ R, ⟨aα + bβ, γ⟩ = a⟨α, γ⟩ + b⟨β, γ⟩;

3. for every α ∈ V, ⟨α, α⟩ ≥ 0, and ⟨α, α⟩ = 0 only when α = 0.

The function that makes V a euclidean space is called an inner product, dot product or scalar
product.
Example 3.1.1. Let V be the space of continuous functions on the interval [0, 1]. Define on
V × V the function ⟨ , ⟩ given by ⟨f, g⟩ := ∫_0^1 f g.

Discussion: The first two properties are straightforwardly verified. The only one that needs an
argument is the third. That is, we need to show that ⟨f, f⟩ = ∫_0^1 f² ≥ 0, with equality if and
only if f = 0. A result from calculus guarantees that if g is a continuous function at x0 and
g(x0) > 0, then there is δ > 0 such that g(x) > g(x0)/2 for every x ∈ ]x0 − δ, x0 + δ[. We apply
this result to g = f², which is continuous. We know that if c ∈ [0, 1], then ∫_0^1 g = ∫_0^c g + ∫_c^1 g.
If g(x0) ≠ 0 for some x0 ∈ ]0, 1[, then applying the cited property, there is δ > 0 such that
g(x) > g(x0)/2 for every x ∈ ]x0 − δ, x0 + δ[ (shrinking δ if necessary, we may assume this
interval is contained in [0, 1]). From this, and using that the integral of a nonnegative function
is greater than or equal to zero, one obtains:

⟨f, f⟩ = ∫_0^1 g = ∫_0^{x0−δ} g + ∫_{x0−δ}^{x0+δ} g + ∫_{x0+δ}^1 g
       ≥ ∫_{x0−δ}^{x0+δ} g
       ≥ ∫_{x0−δ}^{x0+δ} g(x0)/2
       = δ g(x0) > 0.        (3.1)

From (3.1) one has that ⟨f, f⟩ = 0 implies f = 0. It is also clear that if f is zero, then ⟨f, f⟩ = 0.
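The integral inner product of Example 3.1.1 can be approximated numerically. The sketch below is our illustration (the functions f and g are arbitrary choices), using a midpoint Riemann sum:

```python
# Approximate <f, g> = ∫_0^1 f g by a midpoint Riemann sum.
def inner(f, g, n=10_000):
    h = 1.0 / n
    return sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(n)) * h

f = lambda x: x
g = lambda x: 1 - x
print(inner(f, g))        # ≈ 1/6, the exact value of ∫_0^1 x(1 - x) dx
print(inner(f, f) >= 0)   # <f, f> ≥ 0, as the discussion shows
```

The nonnegativity of ⟨f, f⟩ is visible in the sum itself: every summand f(x)² is nonnegative, which mirrors the argument given above.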

Example 3.1.2. Given X, Y ∈ Rn, say X = (x1, x2, . . . , xn) and Y = (y1, y2, . . . , yn), define
⟨X, Y⟩ = x1y1 + x2y2 + · · · + xnyn. This function is an inner product on Rn.
Definition 3.1.2. Let V be a euclidean vector space.

1. Given α, β ∈ V, we say that they are orthogonal if ⟨α, β⟩ = 0.

2. Given α ∈ V, its norm is defined by ||α|| := √⟨α, α⟩.

Remark 3.1.1. If a is a scalar, then ||aα|| = |a| ||α||.

Proof. By definition, ||aα|| := √⟨aα, aα⟩ = √(a²⟨α, α⟩) = |a|√⟨α, α⟩ = |a| ||α||.
Theorem 3.1.1 (Pythagorean Theorem). Let V be a euclidean vector space and α, β ∈ V. Then
α is orthogonal to β if and only if ||α + β||² = ||α||² + ||β||².
The geometric meaning of the theorem is illustrated in Figure 3.1.

Figure 3.1: Pythagorean Theorem

Proof: From the definition of norm one has ||α + β||² = ⟨α + β, α + β⟩ = ||α||² + 2⟨α, β⟩ + ||β||².
Now, using the definition of orthogonality, α is orthogonal to β if and only if ⟨α, β⟩ = 0, and
the conclusion follows. 

Theorem 3.1.2 (Cauchy-Schwarz inequality). If V is a euclidean space and α, β ∈ V, then
|⟨α, β⟩| ≤ ||α|| ||β||, with equality if and only if α and β are linearly dependent.

Proof: Fix α, β ∈ V. For any scalar x ∈ R,

f(x) := ⟨α + xβ, α + xβ⟩ = ||α||² + 2⟨α, β⟩x + ||β||²x²        (3.2)

defines a quadratic function which is nonnegative for every x ∈ R; hence the discriminant of f
satisfies d := 4⟨α, β⟩² − 4||α||² ||β||² ≤ 0, and f has a double zero if and only if d = 0.
Equivalently, this last occurs if and only if 4⟨α, β⟩² − 4||α||² ||β||² = 0, which in turn occurs
if and only if |⟨α, β⟩| = ||α|| ||β||. Now, f(x) = ⟨α + xβ, α + xβ⟩ = 0 for some x if and only if
α + xβ = 0 for that x, if and only if |⟨α, β⟩| = ||α|| ||β||. Arguing as before, one has f(x) > 0
for every x if and only if 4⟨α, β⟩² − 4||α||² ||β||² < 0; that is, α and β are linearly independent
if and only if |⟨α, β⟩| < ||α|| ||β||. 
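A quick numerical check of the inequality, using the standard inner product of Example 3.1.2 (the particular vectors are arbitrary choices of ours):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

a = [1.0, 2.0, 3.0]
b = [-2.0, 0.5, 4.0]
print(abs(dot(a, b)) <= norm(a) * norm(b))              # Cauchy-Schwarz holds
c = [2.0 * x for x in a]                                # c = 2a, dependent on a
print(math.isclose(abs(dot(a, c)), norm(a) * norm(c)))  # equality case
```

The second print illustrates the equality case of the theorem: for linearly dependent vectors the inequality becomes an equality.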

Figure 3.2: Orthogonal projection of α along β

Another Proof: The idea behind this proof is illustrated in Figure 3.2. From the definition of
orthogonal projection we know that α = u + (⟨α, β⟩/||β||²)β, with u and (⟨α, β⟩/||β||²)β
orthogonal. Applying the Pythagorean Theorem and Remark 3.1.1, one has:

||α||² = ||u||² + ||(⟨α, β⟩/||β||²)β||² ≥ ||(⟨α, β⟩/||β||²)β||² = (|⟨α, β⟩|²/||β||⁴) ||β||² = |⟨α, β⟩|²/||β||².

Hence ||α||² ≥ |⟨α, β⟩|²/||β||², and from this the conclusion follows by taking square roots and
transposing the denominator to the left-hand side of the inequality. Notice that equality occurs
if and only if u = 0, and this occurs if and only if α coincides with its orthogonal projection
along β, that is, α and β are linearly dependent. 
One of the advantages of the last proof is that it does not use the order properties of the real
numbers to analyze the behavior of a quadratic function; hence it can be adapted to inner
product spaces over the complex numbers.
Remark 3.1.2. If α, β ∈ V \ {0}, then the Cauchy-Schwarz inequality can be written as

−1 ≤ ⟨α, β⟩/(||α|| ||β||) ≤ 1.        (3.3)

Definition 3.1.3. Given α, β ∈ V \ {0}, we define the angle θ between α and β by

cos(θ) = ⟨α, β⟩/(||α|| ||β||).        (3.4)

Corollary 3.1.3 (Distance from a point to a hyperplane). Given a hyperplane H ⊂ Rn with
equation a1x1 + · · · + anxn + b = 0 and a point P = (b1, . . . , bn) ∉ H, the distance from P to H,
d(P, H), is given by

d(P, H) = |a1b1 + · · · + anbn + b| / √(a1² + · · · + an²).        (3.5)

Figure 3.3: Distance from a point to a hyperplane

Proof: Let X ∈ H be such that B = P − X is orthogonal to H; hence the norm of B is the
distance from P to H. We also have that N = (a1, a2, . . . , an) is orthogonal to the
(n − 1)-dimensional subspace W = {Y ∈ Rn : ⟨N, Y⟩ = 0}, and H is a translation of W; hence
B and N are linearly dependent. Applying Theorem 3.1.2 (the equality case) we have

||B|| ||N|| = |⟨N, B⟩| = |⟨N, P − X⟩| = |⟨N, P⟩ − ⟨N, X⟩| = |a1b1 + · · · + anbn + b|,

since X ∈ H gives ⟨N, X⟩ = −b. From this equation we have

d(P, H) = ||B|| = |a1b1 + · · · + anbn + b| / √(a1² + · · · + an²),        (3.6)

finishing the proof of the corollary. 
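Formula (3.5) is easy to evaluate directly; a small sketch (the particular line and point are our own choices):

```python
import math

# d(P, H) = |a1*b1 + ... + an*bn + b| / sqrt(a1^2 + ... + an^2), formula (3.5)
def dist_to_hyperplane(a, b, p):
    return abs(sum(ai * pi for ai, pi in zip(a, p)) + b) / math.sqrt(sum(ai * ai for ai in a))

# The line x + y - 2 = 0 in R2: the distance from the origin is 2/√2 = √2.
print(dist_to_hyperplane([1.0, 1.0], -2.0, [0.0, 0.0]))  # ≈ 1.4142
```

Note that the same function works in any dimension, since (3.5) only involves the coefficient vector N = (a1, . . . , an) and the point P.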
Theorem 3.1.4 (Triangle Inequality). Let V be a euclidean vector space. If α and β are
elements of V , then ||α + β|| ≤ ||α|| + ||β||.
Proof: One has ||α + β||² = ⟨α + β, α + β⟩ = ||α||² + 2⟨α, β⟩ + ||β||² ≤ ||α||² + 2|⟨α, β⟩| + ||β||².
Applying the Cauchy-Schwarz inequality on the right-hand side, one has
||α + β||² ≤ ||α||² + 2||α|| ||β|| + ||β||² = (||α|| + ||β||)².
The conclusion follows since the square root function is increasing. 

Figure 3.4: Triangle Inequality: ||α + β|| ≤ ||α|| + ||β||

3.2 Topology of Rn
In this section we discuss some basic topological concepts in Rn, such as open and closed sets;
accumulation, interior and cluster points; compact sets, covers of a set and connected sets, among
others. These concepts are important for the discussion of limits and continuity of functions.

Definition 3.2.1. Given r > 0 and a ∈ Rn . We define the open ball centered at a and radius
r as B(a, r) = {x ∈ Rn | ||x − a|| < r}.

Remark 3.2.1. When n = 3 the open ball centered at a with radius r is an ordinary solid ball
without its boundary sphere. If n = 2, the open ball is a disk, without its boundary circle,
centered at a with radius r; see Figure 3.5.

Figure 3.5: Open ball centered at a and radius r

Definition 3.2.2. Given a subset X ⊆ Rn , it is called open, if for every a ∈ X there is r > 0
such that B(a, r) = {x ∈ Rn | ||x − a|| < r} ⊆ X.

Example 3.2.1. The set D = {(x, y) ∈ R2 | y > x}, is open.

Figure 3.6: An open set contains an open ball for every a ∈ X

Figure 3.7: The semiplane defined by y > x is open.

In fact, let P = (x0, y0) with y0 > x0. We know that the distance from P to the line L : y = x
is given by

d(P, L) = |y0 − x0|/√2 > 0.

Set r = |y0 − x0|/(2√2). Claim: B(P, r) ⊆ D. To prove the claim we need to show that if
Q = (x, y) ∈ B(P, r), then y > x. The assumption Q = (x, y) ∈ B(P, r) implies

√((x − x0)² + (y − y0)²) < r,

which implies

(a) |x − x0| < r and

(b) |y − y0| < r,

or equivalently

(a)' x0 − r < x < x0 + r and

(b)' y0 − r < y < y0 + r.

To finish the proof of the claim it is enough to show that x0 + r < y0 − r, that is, 2r < y0 − x0,
which follows from 2r = (y0 − x0)/√2 < y0 − x0. The above example can be generalized to:
2
Example 3.2.2. The set D1 = {(x, y) ∈ R2 | y > mx + b} is open.

Figure 3.8: The semiplane defined by y > mx + b is open.

The proof follows the same ideas as before. Let P = (x0, y0) ∈ D1. The distance from P to the
line L : y = mx + b is given by

d(P, L) = |y0 − mx0 − b|/√(1 + m²) > 0.

Set r = |y0 − mx0 − b|/(2√(1 + m²)). Claim: B(P, r) ⊆ D1. Let (x, y) ∈ B(P, r); we need to
prove that y > mx + b. We treat the case m > 0; the other case follows the same ideas.
Let g(x) = mx + b; the condition m > 0 implies that g is increasing, so from (a)' above we
have mx + b < m(x0 + r) + b. We also have y > y0 − r. We shall prove that
y0 − r > m(x0 + r) + b, which is equivalent to showing that

y0 − mx0 − b > (1 + m)r = (1 + m)(y0 − mx0 − b)/(2√(1 + m²)),

or 2√(1 + m²) > 1 + m. This last inequality is equivalent to 4(1 + m²) > (1 + m)², and this is
equivalent to (m − 1/3)² + 8/9 > 0, which holds for every m ∈ R.
Theorem 3.2.1. Every open ball is an open set.

Proof: Let B(a, r) = {x ∈ Rn | ||x − a|| < r} be an open ball centered at a with radius r. We
need to show that for every b ∈ B(a, r) there is ε > 0 so that B(b, ε) ⊆ B(a, r); see Figure 3.9.
Since b ∈ B(a, r), we have ||a − b|| < r. Set ε = (r − ||a − b||)/2 > 0 and consider the open
ball B(b, ε).

Figure 3.9: An open ball B(a, r) contains an open ball centered at any point b ∈ B(a, r)

Let c ∈ B(b, ε). We shall show that ||a − c|| < r, that is, any element of B(b, ε) belongs to
B(a, r), as needed. We have

||a − c|| = ||a − c + b − b|| ≤ ||a − b|| + ||b − c|| < ||a − b|| + ε = ||a − b|| + (r − ||a − b||)/2 = r/2 + ||a − b||/2 < r. 
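The proof can be mimicked numerically (our sketch; the center a, radius r and point b are arbitrary choices): taking ε = (r − ||a − b||)/2, sampled points of B(b, ε) indeed stay inside B(a, r).

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

a, r = [0.0, 0.0], 1.0
b = [0.5, 0.3]                                   # some point of B(a, r)
eps = (r - norm([bi - ai for ai, bi in zip(a, b)])) / 2

random.seed(0)
for _ in range(1000):
    t = random.uniform(0, 2 * math.pi)           # random point of B(b, eps)
    s = random.uniform(0, eps)
    c = [b[0] + s * math.cos(t), b[1] + s * math.sin(t)]
    assert norm([ci - ai for ai, ci in zip(a, c)]) < r
print("every sampled point of B(b, eps) lies in B(a, r)")
```

The assertion inside the loop is exactly the triangle-inequality estimate of the proof: ||a − c|| ≤ ||a − b|| + ||b − c|| < ||a − b|| + ε < r.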
Definition 3.2.3. A subset F ⊆ Rn is called closed, if Rn \ F := {x ∈ Rn | x 6∈ F } is open.
Definition 3.2.4. Given X ⊆ Rn and a family {Ui}i∈I of open subsets, we say that the family
is an open cover of X if X ⊆ ⋃_{i∈I} Ui.

Definition 3.2.5. A subset K ⊆ Rn is compact if every open cover of K contains a finite
subcover.
Exercise 3.2.1. For some of the exercises, a picture could help you to understand and approach
a solution of the problem.
1. Prove that the union of open sets is open. Using De Morgan's laws, prove that the
intersection of closed sets is closed. Is it true that the intersection of open sets is open?

2. Let B̄(a, r) = {x ∈ Rn | ||x − a|| ≤ r}. Prove that B̄(a, r) is a closed set. Hint: for every
k ∈ N consider the set Fk = {x ∈ Rn | ||x − a|| > r + 1/k}. Show that Fk is open and that
Rn \ B̄(a, r) = ⋃_{k≥1} Fk, and use part 1 of this exercise.

3. Prove that the set {(x, y) ∈ R2 | y < 0} is open.

4. Is the set {(x, y) ∈ R2 | y < x + 1} open?

5. Is it true that a finite set is compact?

6. Is it true that the union of compact sets is compact?

7. Given a ∈ X ⊆ Rn, the point a is called an interior point of X if there is r > 0 such that
B(a, r) ⊆ X. The set of interior points of X is denoted by X̊. Show that X is open if and
only if X = X̊.

Theorem 3.2.2 (Nested Intervals). If In = [an, bn], n ≥ 1, is a sequence of nested closed intervals
in R, then ⋂_{n≥1} In ≠ ∅. Furthermore, if lim_{n→∞} (bn − an) = 0, then ⋂_{n≥1} In = {x0} for
some x0 ∈ R.

Figure 3.10: Nested Intervals

Proof: The assumption [a1, b1] ⊇ [a2, b2] ⊇ · · · ⊇ [an, bn] ⊇ · · · implies a1 ≤ a2 ≤ · · · ≤
an ≤ bn ≤ · · · ≤ b2 ≤ b1; hence {an} is bounded above and {bn} is bounded below. Let
c = sup({an}) and d = inf({bn}). From the very definition of sup and inf, one has c ≤ bn and
an ≤ d for every n, so c ≤ d. Also [c, d] ⊆ [an, bn] for every n, hence ∅ ≠ [c, d] ⊆ ⋂_{n≥1} In.
Now assume lim_{n→∞} (bn − an) = 0. On the other hand, [c, d] ⊆ [an, bn] implies
an + d ≤ an + bn ≤ bn + c, hence 0 ≤ d − c ≤ bn − an. Taking the limit in the last inequality
leads to c = d, as wanted. 
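The bisection construction used in the next theorem's proof is a concrete instance of nested intervals; a small sketch (the target point is an arbitrary choice of ours):

```python
# Repeatedly halve [0, 1], always keeping the half that contains a fixed
# target point: the lengths b - a shrink to 0 and the intervals close in
# on a single point x0 = target, as Theorem 3.2.2 predicts.
target = 2 ** 0.5 - 1          # some point of [0, 1]
a, b = 0.0, 1.0
for _ in range(50):
    m = (a + b) / 2
    if target < m:
        b = m
    else:
        a = m
print(b - a)                   # 2^(-50): the lengths tend to 0
print(abs(a - target) < 1e-12) # the intersection point is the target
```

Here the invariant a ≤ target < b holds at every step, so the intersection of all the intervals is exactly {target}.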

Theorem 3.2.3 (Heine-Borel for real intervals). Every closed interval in the reals is compact.

Proof: Before starting the proof, define the length of I = [a, b] by l([a, b]) := b − a. Let
{Ai}i∈I be an open cover of the interval I0 := [a, b]. We argue by contradiction, that is,
assume that {Ai}i∈I contains no finite subcover of I0. Then at least one of [a, (a + b)/2]
or [(a + b)/2, b] admits no finite subcover from {Ai}i∈I. Renaming the interval which has no
finite subcover by [a1, b1] and applying the same argument, we obtain a new interval [a2, b2]
which has no finite subcover from {Ai}i∈I. Notice that l([a1, b1]) = (b − a)/2 and
l([a2, b2]) = (b − a)/2². Continuing in this way, one obtains a nested sequence of closed
intervals In = [an, bn] which satisfy lim_{n→∞} (bn − an) = 0, none of which is covered by
finitely many of the Ai. Applying Theorem 3.2.2, one has ⋂_{n≥1} In = {p} with p ∈ I0, thus
p ∈ Ai for some i. Since Ai is open, there exists ε > 0 so that the ball with center p and
radius ε is contained in Ai. On the other hand, since lim_{n→∞} (bn − an) = 0, for some n we
have In = [an, bn] ⊆ Ai, contradicting that In admits no finite subcover from {Ai}i∈I. 
Remark 3.2.2. Theorem 3.2.3 holds in a more general setting: A subset of R is compact if and
only if it is closed and bounded. The same result, Theorem 3.2.7, holds in Rn .

Theorem 3.2.4. Let a ≤ b and c ≤ d be real numbers, and set R = [a, b] × [c, d]. Then R is
compact.

Figure 3.11: Product of closed intervals is compact

Proof: Let {Ui}i∈Ω be an open covering of R and let (x, y) ∈ R; then there is λ0 ∈ Ω so that
(x, y) ∈ Uλ0. Since all the Ui are open sets, there is r > 0 so that B((x, y), r) ⊆ Uλ0; see
Figure 3.11. From this we conclude that there are open intervals U(x) and U(y) containing
x and y respectively, with U(x) × U(y) ⊆ Uλ0. Now keep x fixed and vary y ∈ [c, d]; then
{U(y)} is an open covering of [c, d]. By Theorem 3.2.3, there are finitely many y1, y2, . . . , ym
so that {U(y1), U(y2), . . . , U(ym)} is an open covering of [c, d], and each rectangle U(x) × U(yi)
is contained in some Uλi.
Interchanging the roles of x and y, for each ys, where 1 ≤ s ≤ m, there are finitely many
xs1, xs2, . . . , xsjs so that {U(xs1), U(xs2), . . . , U(xsjs)} is an open covering of [a, b], and
U(xsj) × U(ys) is contained in some Uλsj. From this construction we conclude that there are
finitely many Uλ that cover R, proving what is needed. 

Theorem 3.2.4 can be extended to n intervals; the proof follows the same lines as above,
together with induction.

Theorem 3.2.5. The cartesian product of n closed intervals is a compact subset of Rn.

Proof: By following the ideas in the proof of Theorem 3.2.4, an inductive argument will take
care of the proof. Assume that n > 2 and that the result is true for n − 1 closed intervals. For
each j ∈ {1, 2, . . . , n} let Ij = [aj, bj] be a closed interval. By the induction assumption,
K = I1 × I2 × · · · × In−1 is compact. To prove that R = K × In is compact, adapt the ideas
in the proof of the previous theorem. 
In order to prove the theorem that characterizes compact sets in Rn, we need another result:

Lemma 3.2.6. Assume that K ⊆ Y, with K closed and Y compact. Then K is compact.

Proof: Since K is closed, K^c is open. Let {Uα}α∈I be an open cover of K; then Y is covered
by K^c together with {Uα}α∈I. Since Y is compact, this cover has a finite subcover, say K^c
and Uα1, Uα2, . . . , Uαk. It is straightforward to verify that Uα1, Uα2, . . . , Uαk is a finite
subcover of K. 
With the results already proved, we are in a position to state and prove the Heine-Borel
theorem:

Theorem 3.2.7 (Heine-Borel). A subset K of Rn is compact if and only if K is closed and
bounded.

Proof: ⇒) Assume that K is compact; we shall prove that K is closed and bounded. First we
prove that K is bounded. To this end, let B(0, l) be the open ball with center 0 ∈ Rn and
radius l > 0. It is clear that the family {B(0, l)}_{l∈N} is an open cover of Rn, hence also a
cover of K; therefore there exist integers l1, l2, . . . , lk so that K ⊆ B(0, l1) ∪ · · · ∪ B(0, lk).
Take the maximum of the integers l1, l2, . . . , lk and call it M; then K ⊆ B(0, M), proving
that K is bounded.
We now prove that K is closed, which is equivalent to showing that K^c is open. Let x ∈ K^c;
we will prove that there exists r > 0 so that B(x, r) ⊆ K^c. For every n ∈ N, let
Un = B(x, 1/n) be the open ball with center x and radius 1/n, and let Fn be the closure of Un.
Then the family {Fn} satisfies:

1. ⋂_{n≥1} Fn = {x}, and

2. {Fn^c} covers Rn \ {x}.

From these conditions and the assumption x ∉ K, one has that {Fn^c}_{n∈N} covers K, hence
there are n1, n2, . . . , nk so that K ⊆ F_{n1}^c ∪ · · · ∪ F_{nk}^c. Taking complements one has
B(x, 1/n) ⊆ Fn ⊆ F_{n1} ∩ · · · ∩ F_{nk} ⊆ K^c, where n is the maximum of the integers
n1, n2, . . . , nk.
⇐) We are assuming that K is closed and bounded, hence K is contained in a closed rectangle
R, which by Theorem 3.2.5 is compact. Now Lemma 3.2.6 implies that K is compact. 

Definition 3.2.6. Given A ⊆ Rn and x ∈ Rn, we say:

1. x is an accumulation point or limit point of A if for every r > 0 one has
A ∩ (B(x, r) \ {x}) ≠ ∅. The collection of accumulation points of A is denoted by A′.

2. x is a cluster point if for every r > 0 one has A ∩ B(x, r) ≠ ∅. The set of cluster points of
A is denoted by Ā.

3. x is an interior point if there is r > 0 such that B(x, r) ⊆ A. The set of all interior points
of A is denoted by Å.

Remark 3.2.3. A finite set has no accumulation points; hence if A has accumulation points,
then A is infinite. The converse is not true in general; however, if A is bounded and infinite,
then it has accumulation points. Theorem 3.2.8 characterizes the bounded sets which have
accumulation points.

Exercise 3.2.2. Assume that A ⊆ R is nonempty, closed and bounded. Prove that sup(A), inf(A) ∈ A.

Exercise 3.2.3. Let A be a subset of Rn . Prove that the following conditions are equivalent.

1. The subset A is closed.

2. The inclusion A′ ⊆ A holds.

3. The closure of A satisfies Ā = A ∪ A′ = A.

Theorem 3.2.8 (Bolzano-Weierstrass). Let A ⊆ Rn be a bounded subset. Then A′ ≠ ∅ if and only if A is infinite.

Proof: By Remark 3.2.3, it is enough to prove that A has accumulation points. We argue by contradiction. Assume that A is an infinite and bounded subset of Rn with no accumulation points; then A′ = ∅ ⊆ A, thus Exercise 3.2.3 implies that A is closed. By Theorem 3.2.7, A is compact with no accumulation points; therefore, for every x ∈ A, there is rx > 0 so that B(x, rx) ∩ A = {x}. Hence, the family {B(x, rx)}_{x∈A} is an open cover of A and, since A is infinite, such a cover has no finite subcover of A, contradicting compactness. □
One more concept which is important in the discussion is that of a connected set. The idea is that a connected set C is not "made" of two or more pieces; a ball, for instance, is connected. Connectedness is closely related to the idea of global continuity, which will be widely discussed in the next chapter. The following definition is important in this respect.

Definition 3.2.7. A subset A ⊆ Rn is called connected if, whenever A ⊆ V ∪ W with V and W open and V ∩ W = ∅, then A ⊆ V or A ⊆ W. If a set is not connected, it is called disconnected.

Examples of disconnected sets in R are the integers, the rationals and the irrationals among
others.

Theorem 3.2.9. The real interval [0, 1] is connected.

Proof: We shall prove something more general: every interval I = [a, b] is connected. The proof is by contradiction. Assume that there are two open sets A and B in R so that I ⊆ A ∪ B, A ∩ B = ∅, A ∩ I ≠ ∅ and B ∩ I ≠ ∅. The last conditions guarantee that there are c ∈ A ∩ I and d ∈ B ∩ I. Without loss of generality we may assume that c < d (c = d is impossible since A ∩ B = ∅). Since both A and B are open, we may assume that a < c < d < b. Set S = {x ∈ A | x < d}. It is clear that S ≠ ∅ and bounded, hence it has a supremum, which is less than or equal to d and greater than or equal to c, hence it belongs to I. Call it s; then s ∈ A ∪ B. If s ∈ B, since B is open, there is r > 0 so that ]s − r, s + r[ ⊂ B. Since A ∩ B = ∅, this implies that there is no x ∈ S with s − r < x ≤ s, contradicting that s is the sup of S. On the other hand, if s ∈ A, then s < d and, since A is open, there are elements of A greater than s and less than d, that is, elements of S greater than s, again a contradiction. With this we have proved that the interval I is connected. □
Remark 3.2.4. A converse of Theorem 3.2.9 also holds: if S ⊆ R is connected, then S is an interval. We invite the eager reader to explore ideas to provide a proof of this fact.

3.3 Exercises
1. Let A be a closed subset of Rn . Prove that A ∪ X is closed for any finite subset X ⊆ Rn .

2. Prove that a finite union of compact sets is compact.

3. Is {(x, y) ∈ R2 | x, y ∈ Q} a closed set?

4. Let W ⊆ Rn be a subspace not equal to Rn . Is W a closed set?

5. To approach this exercise we encourage the reader to use GeoGebra (click Geogebra) to
sketch a representation of the specified sets. Furthermore, provide arguments to decide if
the sets are: closed, open, compact, connected, bounded, etc.

(a) A = {(x, y) | x² + y² > 1},
(b) B = {(x, y) | y > x²},
(c) C = {(x, y) | 2x² + 5y² ≤ 1},
(d) D = {(x, y) | 1 > x² + y² > 0},
(e) E = {(x, y) | x + y sin(x) = 0},
(f) F = {(x, y) | x² + y³ + xy > 3}.
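Alongside a GeoGebra sketch, these sets can be encoded as membership predicates and probed numerically near their boundaries; the rough Python exploration below is a hint toward the open/closed questions, not a solution:

```python
# Membership tests for some of the sets above; probing points on or near the
# boundary hints at which sets are open (A, D) and which are closed (C).

def in_A(x, y):  # A = {(x, y) | x^2 + y^2 > 1}: open, unbounded
    return x**2 + y**2 > 1

def in_C(x, y):  # C = {(x, y) | 2x^2 + 5y^2 <= 1}: closed and bounded
    return 2 * x**2 + 5 * y**2 <= 1

def in_D(x, y):  # D = {(x, y) | 1 > x^2 + y^2 > 0}: open punctured disk
    return 0 < x**2 + y**2 < 1

assert not in_A(1, 0)          # the boundary circle is excluded: A is open
assert in_C(0, 0) and in_C(0.5, 0)   # interior and boundary-side points belong
assert not in_D(0, 0)          # the deleted center is not in D
```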

6. In Figure 3.12, the graph of x sin(x²) is shown in the interval [−2π, 2π]. Is it a closed subset of R2? Explain.

7. Let A be a closed set in R2 which contains the set {(x, y) | x, y ∈ Q ∩ [0, 1]}. Show that
{(x, y) | 0 ≤ x ≤ 1, 0 ≤ y ≤ 1} ⊆ A.

8. Let A be a closed subset in Rn and b ∈ Rn \ A. Show that there is d > 0 so that ||a − b|| ≥ d for every a ∈ A. More generally, if K is a compact subset of Rn so that A ∩ K = ∅, then there is d > 0 so that ||a − b|| ≥ d for every b ∈ K and for every a ∈ A. Is the previous result true if K is only closed?

Figure 3.12: The graph of x sin(x²) in the interval [−2π, 2π]

9. Let K and A be subsets of Rn. Assume that K is compact, A is open and K ⊂ A. Prove that there is a compact set K1 so that K ⊂ K1° ⊂ A. Here K1° represents the interior of K1.

10. Let n be an integer ≥ 1 and let A be a subset of Rn. Prove that A is open if and only if A° = A.

11. Give an example of a closed set A which is the complement of a compact set. Are there
many such closed sets?

12. Is the union of compact sets, compact?

13. Give examples of subsets in R2 whose derived set consists of all the points (n, m), where
n and m are integers.

14. Is there an infinite open cover of R2 so that deleting one member, it does not cover R2 ?
If your answer is positive, does it hold in Rn for any n?

15. Solve all the exercises from pages 4-5 in the book: Calculus on Manifolds, author, M.
Spivak.

Chapter 4

Limit and Continuity of Functions

4.1 Introduction
In this chapter we discuss the basic properties of the limit of a function, as well as the properties
of continuous functions, whose domain is a subset of Rn . We shall recall that the norm of an
element will be used to define the distance from that element to zero. We encourage the reader
to start by reviewing the basic definition of limit and continuity of a function of one variable.
The results and terms concerning limit and continuity of a function are so closely related that in several cases we only state them for continuity and ask the reader to provide the details and adjustments needed for both of them to hold.
The first definition has to do with the concept of limit.

Definition 4.1.1. Let f : A ⊆ Rn → Rm be a function, and let a ∈ A. We say that f has limit l at a, if for every ε > 0 there exists δ > 0 such that if x ∈ A ∩ (B(a, δ) \ {a}), then ||f(x) − l|| < ε. When f has limit l at a, it will be denoted by lim_{x→a} f(x) = l.

Figure 4.1: The open ball B(a, r) is mapped, by f , into the ball B(l, ε): shaded region

We should notice that this definition is the natural generalization of the one variable case for
the limit of a function. Also, when A is open the condition x ∈ A ∩ B(a, δ) is not needed, since
the ball can be chosen to be contained in A.

Exercise 4.1.1. By reviewing the results for the one variable case, prove the theorems for limits of functions as stated in Definition 4.1.1. Pay special attention when m = 1. Also, consider the case when f, g : A ⊆ Rn → Rm, lim_{x→a} f(x) = L and lim_{x→a} g(x) = L1; then lim_{x→a} ⟨f(x), g(x)⟩ = ⟨L, L1⟩, where ⟨L, L1⟩ is the inner product of L and L1 in Rm.
Example 4.1.1. Let f : R2 → R2 be the function given by:

  f(x, y) = (x² + y², 0) if x, y ∈ Q, and f(x, y) = (0, 0) otherwise.

This function has a limit at (0, 0). In fact, given ε > 0, we have that ||f(x, y) − (0, 0)|| = √((x² + y²)² + 0²) = x² + y² if x, y ∈ Q, and ||f(x, y) − (0, 0)|| = 0 otherwise. In either case it is less than ε ⇐⇒ x² + y² < ε, and this last condition is equivalent to ||(x, y)|| = √(x² + y²) < √ε; hence taking δ = √ε will lead to: ||(x, y)|| < δ implies ||f(x, y) − (0, 0)|| < ε. Using the usual notation one has lim_{(x,y)→(0,0)} f(x, y) = (0, 0). It is not difficult to show that if (a, b) ≠ (0, 0) then f has no limit at (a, b). Write the details.
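The ε–δ bookkeeping of this example can be sanity-checked numerically. The sketch below uses Python's exact rationals to play the role of the points with x, y ∈ Q (a convention assumed for the illustration, not part of the text):

```python
# Numerical check of Example 4.1.1: for eps = 0.01 the choice delta = sqrt(eps)
# works at (0, 0), while near (1, 0) rational and irrational points disagree.
from fractions import Fraction
from math import hypot, sqrt

def f(x, y):
    # Fraction inputs stand in for rational points (x, y)
    rational = isinstance(x, Fraction) and isinstance(y, Fraction)
    return (x * x + y * y, 0) if rational else (0, 0)

eps = 0.01
delta = sqrt(eps)
# rational, irrational-ish, and mixed sample points with ||(x, y)|| < delta
samples = [(Fraction(1, 50), Fraction(1, 50)), (0.02, 0.03), (Fraction(1, 100), 0.05)]
for x, y in samples:
    assert hypot(float(x), float(y)) < delta
    fx, fy = f(x, y)
    assert hypot(float(fx), fy) < eps   # ||f(x, y) - (0, 0)|| < eps

# near (1, 0): rational points give f close to (1, 0), irrational ones (0, 0),
# so no single limit can work there
assert f(Fraction(1), Fraction(0)) == (1, 0) and f(1.0000001, 0.0) == (0, 0)
```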
Definition 4.1.2. [Local Continuity] Let f : A ⊆ Rn → Rm be a function and a ∈ A. We say
that f is continuous at a if for every ε > 0 there exists δ > 0 such that if x ∈ A ∩ B(a, δ) then
||f (x) − f (a)|| < ε.
Remark 4.1.1. A function f is continuous at a, if and only if f has limit at a and its limit is
f (a).
Sometimes, when speaking about continuous functions, it is said that the function does not "break" its domain. This idea refers to global continuity, as stated below.
Definition 4.1.3 (Global Continuity). Let f : A ⊆ Rn → Rm be a function. We say that f is
continuous in A, if f is continuous at every element of A.
It is straightforward to see that the function of Example 4.1.1 is continuous at (0, 0) and is not continuous at any other point.
The first result concerning continuous functions has to do with alternative ways of stating the
same definition, which will be useful.
Theorem 4.1.1. Let f : A ⊆ Rn → Rm be a function and a ∈ A. Then the following conditions
are equivalent.
1. The function f is continuous at a.

2. For every open set V in Rm such that f (a) ∈ V , there exists an open set U ⊆ Rn containing
a so that if x ∈ U ∩ A then f (x) ∈ V .

3. For every sequence {ak} ⊆ A such that lim_{k→∞} ak = a, one has lim_{k→∞} f(ak) = f(a).

Proof: 1 ⇒ 2 We assume that f is continuous at a and we shall prove that if V is an open subset
of Rm which contains f (a), then there exists U , an open set in Rn , so that x ∈ U ∩ A implies
f (x) ∈ V . Since f (a) ∈ V and V is open, then there exists ε > 0 so that B(f (a), ε) ⊆ V .
Since f continuous at a, for this ε, there exists δ > 0 so that if x ∈ B(a, δ) ∩ A then f (x) ∈
B(f (a), ε) ⊆ V . Take U = B(a, δ) and the condition is satisfied.
2 ⇒ 3 Let {ak } ⊆ A be a sequence which converges to a. We will prove that f (ak ) converges
to f (a). Given ε > 0, consider V = B(f (a), ε) which is an open subset in Rm containing

f (a). Using the assumption, there is an open subset U ⊆ Rn containing a so that x ∈ A ∩ U
implies f (x) ∈ V . Since U is open and a ∈ U , then there is δ > 0 so that B(a, δ) ⊆ U . Since
ak converges to a, for this δ, there exists k0 ∈ N so that k ≥ k0 implies ak ∈ B(a, δ). Also,
{ak } ⊆ A implies that for every k ≥ k0 one has f (ak ) ∈ V = B(f (a), ε), that is, for k ≥ k0 it
holds that ||f (ak ) − f (a)|| < ε, proving that f (ak ) converges to f (a).
3 ⇒ 1 If f were not continuous at a, then there would exist ε > 0 so that for every δ > 0 there would be x ∈ B(a, δ) ∩ A so that ||f(x) − f(a)|| ≥ ε. In particular, for δk = 1/k, there exists ak ∈ B(a, δk) ∩ A so that ||f(ak) − f(a)|| ≥ ε. By the choice of δk, it is clear that ak converges to a and f(ak) does not converge to f(a), contradicting the assumption. □
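The sequential criterion of item 3 is easy to probe numerically. In the sketch below, the continuous function f and the sequence a_k are assumed examples chosen for illustration:

```python
# Sequential criterion (Theorem 4.1.1, item 3), checked numerically for the
# continuous function f(x, y) = (x*y, x + y) and a_k = (1/k, 2 - 1/k) -> (0, 2).
from math import hypot

def f(x, y):
    return (x * y, x + y)

a = (0.0, 2.0)
fa = f(*a)
errors = []
for k in [10, 100, 1000, 10000]:
    ak = (1.0 / k, 2.0 - 1.0 / k)
    fk = f(*ak)
    errors.append(hypot(fk[0] - fa[0], fk[1] - fa[1]))

# ||f(a_k) - f(a)|| shrinks toward 0 along the sequence
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))
assert errors[-1] < 1e-3
```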
Another important result concerning continuous functions is Theorem 4.1.3. For its proof, we
will need a couple of auxiliary results.

Lemma 4.1.2. If f : Rn → Rm is a function continuous at a ∈ Rn, then f is bounded in an open ball centered at a. Furthermore, if f(a) ≠ 0, then f(x) ≠ 0 for every x in an open ball centered at a.

Proof: Since f is continuous at a, for ε = 1 there is δ > 0 such that ||f(x) − f(a)|| < 1 for every x ∈ B(a, δ). We also have that ||f(x)|| − ||f(a)|| ≤ ||f(x) − f(a)||, thus ||f(x)|| < 1 + ||f(a)||; that is, 1 + ||f(a)|| is a bound for f in B(a, δ). If f(a) ≠ 0, then for ε = ||f(a)||/2 there is δ > 0 such that ||f(x) − f(a)|| < ||f(a)||/2 for every x ∈ B(a, δ). As was noted before, ||f(a)|| − ||f(x)|| ≤ ||f(x) − f(a)||, therefore 0 < ||f(a)||/2 < ||f(x)||, which implies f(x) ≠ 0 for every x ∈ B(a, δ). □

Theorem 4.1.3. Assume that the functions f, g : Rn → Rm are continuous at a ∈ Rn. Then f ± g and ⟨f, g⟩ are continuous at a. Furthermore, if m = 1 and g(a) ≠ 0, then fg and f/g are continuous at a.

Proof: We ask the reader to provide the details of the proof as a good exercise. 
Before going a little bit further in the discussion, we will introduce a few terms which will be
useful. Given a function f : Rn → Rm , it defines m functions fi : Rn → R, i = 1, 2, . . . , m
so that f (x) = (f1 (x), f2 (x), . . . , fm (x)) for every x. The functions f1 , f2 , . . . , fm are called the
coordinate functions of f . (
x2 + y 2 if x, y ∈ Q
If f is the function in example (4.1.1), then f = (f1 , f2 ), where f1 (x, y) =
0 otherwise
and f2 (x, y) = 0.
Sketch the graph of f1 and show that the only point where f1 is continuous, is at (0, 0).
If f : X → Y is a function, A ⊆ X and B ⊆ Y, we define the sets f(A) := {y ∈ Y | y = f(a) for some a ∈ A} and f⁻¹(B) := {x ∈ X | f(x) ∈ B}. The set f(A) is called the direct image of A under f and f⁻¹(B) is called the inverse image of B under f.
If f : R2 → R is given by f(x, y) = x² + y², then f(R2) is the set of nonnegative real numbers, while f⁻¹({−1}) = ∅, the empty set.

Exercise 4.1.2. Let f : X → Y be a function. Assume that {Bα }α∈I is a family of subsets of
Y and {Aα }α∈J is a family of subsets of X. Show that:

1. f⁻¹(⋃_{α∈I} Bα) = ⋃_{α∈I} f⁻¹(Bα),

2. f⁻¹(⋂_{α∈I} Bα) = ⋂_{α∈I} f⁻¹(Bα),

3. f(⋃_{α∈J} Aα) = ⋃_{α∈J} f(Aα),

4. f(⋂_{α∈J} Aα) ⊆ ⋂_{α∈J} f(Aα),

5. For every α, β ∈ I, f⁻¹(Bα \ Bβ) = f⁻¹(Bα) \ f⁻¹(Bβ).


Theorem 4.1.4. Let f : Rn → Rm be a function whose coordinate functions are f1, f2, . . ., fm. Assume that a ∈ Rn and l = (l1, l2, . . ., lm) ∈ Rm. Then lim_{x→a} f(x) = l if and only if lim_{x→a} fi(x) = li for every i = 1, 2, . . ., m.
Proof: Assume lim_{x→a} f(x) = l; we need to prove that lim_{x→a} fi(x) = li for every i = 1, 2, . . ., m. Given ε > 0, the assumption lim_{x→a} f(x) = l implies that there exists δ > 0 so that ||f(x) − l|| < ε for every x ∈ B(a, δ) \ {a}. We also have that |fi(x) − li| ≤ √((f1(x) − l1)² + · · · + (fm(x) − lm)²) = ||f(x) − l|| for every i = 1, 2, . . ., m; hence, if x ∈ B(a, δ) \ {a} then |fi(x) − li| < ε for every i = 1, 2, . . ., m.
Conversely, assume lim_{x→a} fi(x) = li for every i = 1, 2, . . ., m and let ε > 0. The assumption implies that there are δ1, δ2, . . ., δm so that x ∈ B(a, δi) \ {a} implies |fi(x) − li| < ε/√m for every i = 1, 2, . . ., m. Set δ = min{δi}_{i=1}^{m}; then if x ∈ B(a, δ) \ {a}, one has |fi(x) − li| < ε/√m for every i = 1, 2, . . ., m. Set M = max{|fi(x) − li|}_{i=1}^{m}; then M < ε/√m. On the other hand, ||f(x) − l|| = √((f1(x) − l1)² + · · · + (fm(x) − lm)²) ≤ √m M, hence ||f(x) − l|| ≤ √m M < √m · ε/√m = ε, proving what was needed to prove. □
Remark 4.1.2. The previous result applies, mutatis mutandis, to the continuous case. State and
prove the corresponding result.
Remark 4.1.3. Theorems 4.1.3 and 4.1.4, together with Remark 4.1.2, are among the main tools to check whether a function f : Rn → Rm is continuous.
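The coordinate-wise control in Theorem 4.1.4 can be seen numerically: each coordinate error is bounded by the total error, and conversely the total error is at most √m times the largest coordinate error. A sketch for m = 2 (the function f is an assumed example):

```python
# Theorem 4.1.4 in action for f(x, y) = (sin(x) + y, cos(x) * y), which has
# limit l = (1, 1) at a = (0, 1): coordinate errors and the total error
# control each other, with max error <= total <= sqrt(2) * max error.
from math import sin, cos, hypot

def f(x, y):
    return (sin(x) + y, cos(x) * y)   # coordinate functions f1, f2

a, l = (0.0, 1.0), (1.0, 1.0)
for t in [0.1, 0.01, 0.001]:
    x, y = a[0] + t, a[1] + t
    f1, f2 = f(x, y)
    e1, e2 = abs(f1 - l[0]), abs(f2 - l[1])
    total = hypot(f1 - l[0], f2 - l[1])   # ||f(x, y) - l||
    assert max(e1, e2) <= total <= 2**0.5 * max(e1, e2) + 1e-15
```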
The next result is an alternative definition of global continuity which will prove to be very
useful, especially for analyzing global continuity of a function.
Theorem 4.1.5. Let f : A ⊆ Rn → Rm be a function. Then f is continuous in A if and only if for every open set V ⊆ Rm there is an open set U ⊆ Rn so that f⁻¹(V) = A ∩ U. This condition is the same as saying that f⁻¹(V) is open in A.
Proof: (⇒) Assume that f is continuous in A and let V be an open set in Rm. If f⁻¹(V) = ∅, take U = ∅ and we are done. So, we may assume that f⁻¹(V) ≠ ∅ and let a ∈ f⁻¹(V); then by definition f(a) ∈ V. Since V is open, there is εa > 0 so that B(f(a), εa) ⊆ V. Now using the assumption that f is continuous at a, there is δa > 0 so that x ∈ B(a, δa) ∩ A implies f(x) ∈ B(f(a), εa) ⊆ V. Define U = ⋃_{a∈f⁻¹(V)} B(a, δa), which is open since each ball is open. Now it is straightforward to show that f⁻¹(V) = A ∩ U.
Conversely, assume that the condition is satisfied and let a ∈ A. Given ε > 0, let V = B(f(a), ε) be the open ball centered at f(a) with radius ε. By the assumption, there is an open set U so that A ∩ U = f⁻¹(V). Since a ∈ U and U is open, there exists δ > 0 so that B(a, δ) ⊆ U; hence if x ∈ A ∩ B(a, δ) ⊆ A ∩ U = f⁻¹(V), then f(x) ∈ V = B(f(a), ε), proving that f is continuous at a. Since a is any point of A, we have proven that f is continuous in A. □
Remark 4.1.4. In the previous theorem the condition "open" can be replaced by "closed" and the result is the same. That is, the function f is continuous in A if and only if for every closed set G ⊆ Rm, there is a closed set H ⊆ Rn so that f⁻¹(G) = A ∩ H.

Theorem 4.1.6. Let f : Rn → Rm be a continuous function and let K ⊆ Rn be a compact set; then f(K) is compact.

Proof: Let O be an open cover of f(K). From the previous theorem, for every V ∈ O there is an open UV so that f⁻¹(V) = K ∩ UV. It is clear that the family {UV} is an open cover of K. Since K is compact, there are finitely many V1, V2, . . ., Vk so that UV1, UV2, . . ., UVk cover K. It is straightforward to show that V1, V2, . . ., Vk is a finite cover of f(K), proving the theorem. □
The proof of the following result is straightforward and is left to the reader.

Theorem 4.1.7. Let K be a nonempty compact subset of R; then inf(K) and sup(K) are elements of K.

Theorem 4.1.8 (Mini-Max). Let K be a compact subset of Rn and let f : K → R be continuous. Then there are x0, y0 ∈ K so that f(x0) ≤ f(x) ≤ f(y0) for every x ∈ K.

Proof: The conclusion follows directly from Theorem 4.1.6 and Theorem 4.1.7. 
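A grid search gives a concrete picture of the Mini-Max theorem; the function and compact square below are assumed examples for the sketch:

```python
# The Mini-Max theorem guarantees that a continuous f attains its extreme
# values on a compact set; a grid search sketches this for
# f(x, y) = x^2 + y^2 on the compact square K = [-1, 1] x [-1, 1].
def f(x, y):
    return x * x + y * y

N = 200
grid = [(-1 + 2 * i / N, -1 + 2 * j / N) for i in range(N + 1) for j in range(N + 1)]
values = [f(x, y) for x, y in grid]

# minimum 0 attained at x0 = (0, 0), maximum 2 at the corners y0 = (±1, ±1)
assert min(values) == 0.0 and max(values) == 2.0
```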

Theorem 4.1.9 (Chain rule for continuity). Let f : A ⊆ Rn → B ⊆ Rm and g : B ⊆ Rm → C ⊆ Rp be functions. Assume that f is continuous at a ∈ A, f(a) ∈ B, and that g is continuous at f(a). Then g ∘ f is continuous at a.

Figure 4.2: Given ε > 0, it is needed to find δ > 0 such that, if x ∈ B(a, δ) ∩ A then g(f(x)) ∈ B(g(f(a)), ε) ∩ C.

Proof: Given ε > 0, since g is continuous at f (a), there is δ1 > 0 so that, if y ∈ B(f (a), δ1 ) ∩ B,
then g(y) ∈ B(g(f (a)), ε). Also, since f is continuous at a, then for δ1 > 0, there is δ > 0
so that, if x ∈ B(a, δ) ∩ A then f (x) ∈ B(f (a), δ1 ) ∩ B, hence for x ∈ B(a, δ) ∩ A one has
g(f (x)) ∈ B(g(f (a)), ε), proving the theorem. 

Corollary 4.1.10. Let f : Rn → Rm be continuous and let K be a compact subset in Rn . Then
there are x0 , y0 ∈ K so that ||f (x0 )|| ≤ ||f (x)|| ≤ ||f (y0 )|| for all x ∈ K.
Proof: The conclusion follows from Theorems 4.1.8, 4.1.9 and Exercise 7. p. 29. 
Another very important result is that continuous functions preserve connectivity, more precisely.
Theorem 4.1.11. Let f : Rn → Rm be a continuous function and let C ⊆ Rn be a connected
subset, then f (C) is connected.
Proof: Let U and W be disjoint open sets in Rm so that f(C) ⊆ U ∪ W; then C ⊆ f⁻¹(U) ∪ f⁻¹(W). Since f is continuous, f⁻¹(U) and f⁻¹(W) are open, and the disjointness of U and W implies that f⁻¹(U) and f⁻¹(W) are disjoint. Since C is connected, C is contained in one of them, say C ⊆ f⁻¹(W); hence f(C) ⊆ W, proving that f(C) is connected. □
The next result is the generalization of the intermediate value theorem. In the proof of this
result we shall use that Theorem 3.2.9 and Remark 3.2.4 guarantee that the only connected
subsets of R are the intervals.
Corollary 4.1.12. Let f : C ⊆ Rn → R be a continuous function. Assume that C is connected
and there are x0 , x1 ∈ C so that f (x0 )f (x1 ) < 0. Then there exists at least one c ∈ C so that
f (c) = 0.
Proof: From Theorem 4.1.11, f(C) is connected and, from Remark 3.2.4, f(C) is an interval, which contains one positive element and one negative element; hence it contains zero, that is, 0 ∈ f(C), so there is c ∈ C so that f(c) = 0. □
In the next result, linear transformations appear; they will remain at the center of the discussion up to the end of the course.
Theorem 4.1.13. Let T : Rn → Rm be a linear transformation. Then there exists a constant
M so that ||T (x)|| ≤ M ||x|| for every x ∈ Rn , consequently, T is continuous. Additionally, T
is one to one if and only if there is m > 0 so that m||x|| ≤ ||T (x)|| for every x ∈ Rn .
Proof: If x ∈ Rn, then there are scalars x1, x2, . . ., xn such that x = x1e1 + x2e2 + · · · + xnen, where ei is the i-th canonical basis element of Rn. From this and using that T is linear, we have ||T(x)|| = ||x1T(e1) + x2T(e2) + · · · + xnT(en)|| ≤ |x1| ||T(e1)|| + |x2| ||T(e2)|| + · · · + |xn| ||T(en)||. Also, |xi| ≤ ||x|| for every i, hence ||T(x)|| ≤ ||x||(||T(e1)|| + ||T(e2)|| + · · · + ||T(en)||). Set M1 = max{||T(ei)||}_{1≤i≤n}; then ||T(x)|| ≤ nM1||x||. Setting M = nM1, the first part of the theorem is proved.
For the second part, let K = {x ∈ Rn | ||x|| = 1} be the unit sphere. It is clear that K is bounded. Also K is closed (why?), hence K is compact. Since the function || || ∘ T is continuous (composition of continuous functions), the Mini-Max theorem, Theorem 4.1.8, yields constants m0 and M0 so that m0 ≤ ||T(x)|| ≤ M0 for every x ∈ K. Given any y ∈ Rn \ {0} one has y/||y|| ∈ K, hence m0 ≤ ||T(y/||y||)|| ≤ M0 for every y ∈ Rn \ {0}. Since T is linear, one has T(y/||y||) = (1/||y||)T(y), thus the previous inequality becomes m0||y|| ≤ ||T(y)|| ≤ M0||y|| for every y ≠ 0.
Now, assume that there is m > 0 so that m||x|| ≤ ||T(x)|| for every x ∈ Rn. If x is in the kernel of T, then T(x) = 0 and from this we have m||x|| ≤ ||T(x)|| = 0, thus x = 0; that is, T is injective.
Conversely, if T is injective, then m0 = inf{||T(x)|| | ||x|| = 1} > 0, since otherwise there would exist x0 so that ||x0|| = 1 and m0 = ||T(x0)|| = 0, contradicting that T is injective. □
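The bounds m0||y|| ≤ ||T(y)|| ≤ M0||y|| can be estimated by sampling the unit sphere, as in the proof. A numerical sketch for an assumed invertible 2×2 matrix A (so that T is injective and m0 > 0):

```python
# Numeric sketch of Theorem 4.1.13 for T(x) = A x on R^2: sampling the unit
# circle estimates m0 = min ||T(x)|| and M0 = max ||T(x)|| over the sphere K,
# and m0 > 0 reflects injectivity of this particular T.
from math import cos, sin, hypot, pi

A = [[2.0, 1.0], [0.0, 3.0]]   # an assumed invertible matrix

def T(x, y):
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)

norms = []
for k in range(1000):
    theta = 2 * pi * k / 1000
    u, v = cos(theta), sin(theta)       # a point on the unit sphere K
    norms.append(hypot(*T(u, v)))

m0, M0 = min(norms), max(norms)
assert 0 < m0 <= M0

# the bounds m0 ||y|| <= ||T(y)|| <= M0 ||y|| at a sample point y = (3, -4)
x, y = 3.0, -4.0
n = hypot(x, y)
assert m0 * n <= hypot(*T(x, y)) <= M0 * n
```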

Corollary 4.1.14. If T : Rn → Rm is an injective linear transformation, then T⁻¹ : Im(T) → Rn is linear and continuous.

Proof: It is straightforward to show that T⁻¹ is linear. From the previous theorem, there is a positive constant m so that m||x|| ≤ ||T(x)|| for every x ∈ Rn. Given y1, y2 ∈ Im(T), we will show that ||T⁻¹(y1) − T⁻¹(y2)|| ≤ M||y1 − y2|| for some constant M. Since y1, y2 ∈ Im(T), there are x1, x2 ∈ Rn so that T(x1) = y1 and T(x2) = y2, hence m||T⁻¹(y1) − T⁻¹(y2)|| = m||x1 − x2|| ≤ ||T(x1) − T(x2)|| = ||y1 − y2||. Take M = 1/m. □
m
The previous corollary shows that the inverse of a linear function is continuous, hence it is
natural to ask if it holds generally, that is, if f is continuous and has inverse g, is it true that g
is continuous? The answer is no in general. Consider the following example.
Let f : [0, 2π[→ R2 be given by f (θ) = (cos(θ), sin(θ)). Then f is injective and continuous.
Indeed, f is continuous since its coordinate functions are continuous. To show that f is injective,
we provide some ideas to the reader and leave her(him) the details. Assume that f (θ) = f (ω),
then (cos(θ), sin(θ)) = (cos(ω), sin(ω)), if and only if cos(θ) = cos(ω) and sin(θ) = sin(ω). Use
the identities

  sin(θ) − sin(ω) = 2 sin((θ − ω)/2) cos((θ + ω)/2),
  cos(θ) − cos(ω) = −2 sin((θ + ω)/2) sin((θ − ω)/2)

to show that θ = ω. Notice that the image of f is the unit circle, which is compact. If the
inverse of f were continuous, then the image of the unit circle under f −1 would be compact,
but the image of the unit circle under f −1 is [0, 2π[, which is not compact. Using Theorem 4.1.6
we conclude that f −1 is not continuous.
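The jump of f⁻¹ at (1, 0) can be exhibited numerically; the helper below computes the inverse of the parametrization with values in [0, 2π) (an assumed implementation for the sketch):

```python
# f(theta) = (cos(theta), sin(theta)) is continuous and injective on
# [0, 2*pi), but its inverse jumps at (1, 0): points just below the x-axis
# have inverse images near 2*pi, far from f_inv(1, 0) = 0.
from math import atan2, cos, sin, pi

def f_inv(x, y):
    """Inverse of the circle parametrization, with values in [0, 2*pi)."""
    theta = atan2(y, x)
    return theta if theta >= 0 else theta + 2 * pi

assert f_inv(1.0, 0.0) == 0.0
# a point on the circle close to (1, 0), approached from below the axis...
p = (cos(2 * pi - 1e-6), sin(2 * pi - 1e-6))
# ...whose inverse image is close to 2*pi, not to 0
assert abs(f_inv(*p) - 2 * pi) < 1e-5
```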
There are plenty of examples of injective continuous functions whose inverse is not continuous.
I will provide one more example and you are asked to give many more.
Let f : [0, 1] ∪ ]2, 3] → [0, 2] be given by f(x) = x if 0 ≤ x ≤ 1, and f(x) = x − 1 if 2 < x ≤ 3. Then f is continuous and injective; however, its inverse is not continuous. Explain why it is not continuous.
These examples lead to the question: under which conditions do we have that the inverse of a
continuous function is continuous? A partial answer is given by:

Theorem 4.1.15. Let K be a compact set in Rn and let f : K → Rm. Assume that f is continuous and injective; then the inverse of f is continuous.

Proof: Use arguments based on Theorem 4.1.6 and Remark 4.1.4 to complete the proof. 

4.2 Exercises
1. In each of the following exercises, find the domain of the function and determine the set
of points where it is continuous.

(a) f(x, y) = x² + y² − xy.

Figure 4.3: The graph of f. Figure 4.4: The graph of f⁻¹.

Figure 4.5: The graph of a continuous function and the graph of its inverse, which is not continuous.

(b) f(x, y) = sin(x² + y²) / (x² + y²).
(c) f(x, y) = xʸ.
(d) f(x, y) = exp(x − y).
(e) f(x, y) = x² + y² if x, y ∈ Q, and f(x, y) = 0 otherwise.
(f) f(x1, x2, . . ., xn) = sin(||X||) / ||X||, where X = (x1, x2, . . ., xn).
2. Let f : R2 → R be a function and assume that lim_{(x,y)→(a,b)} f(x, y) = L. Also assume that lim_{x→a} f(x, y) and lim_{y→b} f(x, y) both exist. Prove that lim_{x→a}[lim_{y→b} f(x, y)] = lim_{y→b}[lim_{x→a} f(x, y)] = L. These limits are called iterated limits.
3. Let f(x, y) = (x − y)/(x + y), if x + y ≠ 0. Compute the iterated limits at (0, 0). Is this a contradiction to the previous exercise? Can you conclude that the limit of f(x, y) does not exist when (x, y) → (0, 0)?
4. Let f(x, y) = x²y² / (x²y² + (x − y)²), whenever x²y² + (x − y)² ≠ 0. Prove that the iterated limits exist at (0, 0) and are equal; however, the function has no limit at (0, 0).
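A quick numerical exploration of the function in exercise 4 (a hint toward the phenomenon, not a solution): along the diagonal y = x the values are identically 1, while fixing one variable at 0 gives the value 0, so the two-variable limit cannot exist even though the iterated limits agree.

```python
# Exploring exercise 4 numerically: f(x, x) = x^4 / x^4 = 1 for x != 0,
# while f(x, 0) = f(0, y) = 0, so both iterated limits at (0, 0) are 0
# but no limit exists at the origin.
def f(x, y):
    return (x * y) ** 2 / ((x * y) ** 2 + (x - y) ** 2)

for t in [0.1, 0.01, 0.001]:
    assert f(t, t) == 1.0      # the diagonal forces the value 1
    assert f(t, 0.0) == 0.0    # fixing y = 0 forces the value 0
    assert f(0.0, t) == 0.0    # fixing x = 0 forces the value 0
```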
 
5. Let f(x, y) = x sin(1/y) if y ≠ 0, and f(x, y) = 0 if y = 0. Show that f has limit at (0, 0) but the iterated limits are not equal. Why does this not contradict exercise 2?

6. Find a function f : Rn → Rm which is continuous exactly at two points. Define a function which is continuous exactly at 100 points.

7. Prove that the norm is continuous in Rn . This will follow from the inequality | ||x||−||y|| | ≤
||x − y|| for every x, y ∈ Rn .

8. Let f : Rn → Rm be a function continuous at a. Additionally, assume that f(a) ≠ 0. Show that there is r > 0 so that f(x) ≠ 0 for every x ∈ B(a, r). In particular, if m = 1 and f(a) > 0, then there is r > 0 so that f(x) > 0 for every x ∈ B(a, r).

9. In this exercise you will prove some results which will be very useful. By a sequence
in Rn we understand a function a : N → Rn . The value of a in k will be denoted by
a(k) = ak . The sequence will be denoted by {ak }k∈N . Notice that the elements of a
sequence ak are elements in Rn , hence for every k one has ak = (a1k , a2k , . . . , ank ), that is,
each sequence in Rn determines n sequences in R, {aik }k∈N , i ∈ {1, 2, . . . , n}. Conversely,
n real sequences, {aik }k∈N , i ∈ {1, 2, . . . , n} determine a sequence in Rn . The sequences
{aik }k∈N , i ∈ {1, 2, . . . , n} are called the coordinate sequences of {ak }k∈N . A sequence
{ak }k∈N converges to l ∈ Rn , if for every ε > 0 there is k0 ∈ N so that k ≥ k0 implies
||ak − l|| < ε.

(a) Show that a sequence {ak }k∈N converges if and only if its coordinate sequences con-
verge.
(b) Let S be a subset of Rn. Prove that S is closed if and only if every convergent sequence contained in S has its limit in S.
(c) (Bolzano-Weierstrass) Prove that a bounded sequence in Rn has a subsequence which
converges (give the definition of a subsequence).
(d) A sequence {uk} ⊆ Rn, with ||uk|| = 1 for every k, is called uniformly linearly independent if there exist integers m ≥ n, p0 ≥ 0 and a constant c > 0 such that, for each k ≥ p0, max{|⟨x, u_{k+i}⟩|, i ∈ {1, 2, . . ., m}} ≥ c||x|| for all x ∈ Rn. Let e1, e2, . . ., en be the canonical basis of Rn and set uk = ei, where k = nq + i, i = 1, 2, . . ., n − 1, and uk = en if n divides k. Is {uk} uniformly linearly independent? If your answer is yes and A is a positive definite matrix, how can you mimic the above construction using the columns of A?

Chapter 5

Differentiable Functions from Rn → Rm

5.1 Introduction
In this chapter we shall discuss the main results concerning differentiable functions from Rn →
Rm . As we have been doing when introducing some new ideas to be discussed, we will try to
find the analogy between functions of one variable and functions of several variables. To start
with, we recall some basic facts concerning the derivative of a function from R to R. So, assume
that f : R → R and let a be a real number so that f is differentiable at a; that means that the limit

  lim_{h→0} (f(a + h) − f(a))/h = f′(a)    (5.1)

exists and is denoted by f′(a). The existence of this limit has interesting meanings from the geometric point of view: it is equivalent to the existence of a tangent line to the graph of f at the given point (a, f(a)), whose slope is f′(a). See Figure 5.1.
We shall rewrite equation (5.1) in a slightly different way:

  lim_{h→0} (f(a + h) − f(a) − f′(a)h)/h = 0.    (5.2)
We will see, fairly soon, that this new reformulation of the derivative extends to functions in Rn .
Before doing so, let’s recall that a linear transformation T : R → R is given by T (h) = bh, for
some fixed real number b. Hence the numerator in (5.2) is represented in terms of f and a linear
transformation. Defining ϕ(h, a) := f (a + h) − f (a) − f 0 (a)h we can say that f is differentiable
at a if and only if

  lim_{h→0} ϕ(h, a)/h = 0.    (5.3)
Summarizing, we can say that a function f : R → R is differentiable at a if there exists a linear
transformation T : R → R so that
  lim_{h→0} (f(a + h) − f(a) − T(h))/h = 0.    (5.4)
With the new definition of the derivative, there is a natural question: what is the relationship between T and f′(a)? The answer: f′(a) = T(1). In fact, a linear transformation is determined by its values on a basis, and the linear transformations from R → R are of the form x → mx, for some m; thus f′(a) = m = T(1).
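The reformulation can be checked numerically for a concrete one-variable function; below, f = sin at a = 1 is an assumed example, with T(h) = cos(1)·h as the candidate linear map:

```python
# The reformulated derivative in one variable: for f = sin at a = 1, the
# linear map T(h) = cos(1) * h makes the quotient in (5.2) small, and the
# ordinary derivative is recovered as f'(a) = T(1).
from math import sin, cos

a = 1.0

def T(h):
    return cos(a) * h    # candidate linear map satisfying (5.2)

def quotient(h):
    return abs(sin(a + h) - sin(a) - T(h)) / abs(h)

# the quotient shrinks with h, as (5.2) requires
assert quotient(1e-4) < 1e-3 and quotient(1e-6) < 1e-5
assert T(1) == cos(1.0)    # recovering f'(a) as T(1)
```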

Figure 5.1: The tangent line at a given point has slope equal to the derivative of f at the point.

5.2 Derivatives in Rn
In the introduction we saw that the definition of a differentiable function f : R → R is given by
the existence of a linear transformation T that satisfies (5.4). With this reformulation of the
derivative it is natural to ask, what is the definition of the derivative of a function f : Rn → Rm ?
We should notice that equation (5.4) has no meaning for a function f : Rn → Rm; however, after some appropriate changes things become much better. Indeed, (5.4) is equivalent to
  lim_{h→0} |f(a + h) − f(a) − T(h)| / |h| = 0,    (5.5)

and this will make perfect sense in Rn by changing the absolute value to the corresponding norm.
With this in mind, we are ready to state the main definition in this chapter.
Definition 5.2.1. Let f : A ⊆ Rn → Rm be a function, with A open, and let a ∈ A. We say that f is differentiable at a if there is a linear transformation T : Rn → Rm so that

  lim_{h→0} ||f(a + h) − f(a) − T(h)|| / ||h|| = 0.    (5.6)

The linear transformation T is called a derivative or total derivative of f at a.
It is important to notice that we have used the same symbol to denote the norm in Rn as well
as in Rm .
Before we prove that the derivative is unique, we will give some examples of functions for which
a derivative exists.
Example 5.2.1. Assume that f : Rn → Rm is a constant function, then f has a derivative at any
point a ∈ Rn . It is natural to think, as in the real case, that the derivative of f is the zero linear
transformation. Since f is constant, then for any a, h ∈ Rn one has f (a + h) − f (a) − 0(h) = 0,
hence lim_{h→0} ||f(a + h) − f(a) − 0(h)|| / ||h|| = 0, proving what we claimed.

Example 5.2.2. Assume that f : Rn → Rm is linear, then f is its own derivative. In
fact, since f is linear we have f(a + h) − f(a) − f(h) = 0 for every a, h ∈ Rn, hence lim_{h→0} ||f(a + h) − f(a) − f(h)|| / ||h|| = 0, proving what we asserted.
Example 5.2.3. Let f : Rn → R be given by f(x1, x2, . . ., xn) = a1x1 + a2x2 + · · · + anxn, for some constants a1, a2, . . ., an. It is clear that f is linear and we can identify it with the vector (a1, a2, . . ., an), in the sense that f(x1, x2, . . ., xn) = ⟨(a1, a2, . . ., an), (x1, x2, . . ., xn)⟩ (inner product). From the previous example, f has a derivative, which is equal to f itself.

Example 5.2.4. Let f : Rn → R be given by f(x1, x2, . . ., xn) = x1x2 · · · xn. Is f differentiable at some points, and what is its derivative? To understand the question, let us start with n = 2, that is, we consider the function f(x1, x2) = x1x2. If f is differentiable at (a, b), then there must exist a linear transformation T : R2 → R, say T(x1, x2) = mx1 + nx2, so that

  lim_{(h1,h2)→(0,0)} ||f(a + h1, b + h2) − f(a, b) − mh1 − nh2|| / ||(h1, h2)|| = 0.    (5.7)
Since

f (a + h1 , b + h2 ) − f (a, b) = (a + h1 )(b + h2 ) − ab
= ab + ah2 + bh1 + h1 h2 − ab
= ah2 + bh1 + h1 h2 ,
then taking m = b and n = a, we will need to prove that lim_{(h1,h2)→(0,0)} h1h2 / ||(h1, h2)|| = 0 in order to justify that f is differentiable at (a, b). In fact, for (h1, h2) ≠ (0, 0) we have 0 ≠ max{|h1|, |h2|} ≤ ||(h1, h2)||, hence 0 ≤ |h1||h2| / ||(h1, h2)|| ≤ |h1||h2| / max{|h1|, |h2|}. Clearly, the last quotient approaches zero when (h1, h2) → (0, 0). From the above, we have that the derivative of f(x, y) at the point (a, b) is the linear transformation T(x, y) = bx + ay. With the previous result at hand, find the derivative of f(x1, x2, . . ., xn) = x1x2 · · · xn at any point (a1, a2, . . ., an).
In the general case, you might use that if p is the product of more than two coordinates of
(h1, h2, . . . , hn), then 0 ≤ |p|/||(h1, h2, . . . , hn)|| ≤ |p|/max{|h1|, |h2|, . . . , |hn|}, and the last quotient
approaches zero when (h1, h2, . . . , hn) → 0.

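The limit above can be explored numerically, in the spirit of the digital tools mentioned in the introduction. The sketch below (an illustration added here, with the point (a, b) = (3, −2) chosen arbitrarily) evaluates the remainder quotient |f(a + h1, b + h2) − f(a, b) − T(h1, h2)|/||(h1, h2)|| for the candidate derivative T(h1, h2) = b·h1 + a·h2 and shrinking h:

```python
import math

# Remainder quotient for f(x1, x2) = x1*x2 at (a, b) with the candidate
# derivative T(h1, h2) = b*h1 + a*h2; it should shrink as h -> 0.
def remainder_quotient(a, b, h1, h2):
    f_ah = (a + h1) * (b + h2)          # f(a + h1, b + h2)
    f_a = a * b                          # f(a, b)
    T = b * h1 + a * h2                  # candidate derivative applied to h
    return abs(f_ah - f_a - T) / math.hypot(h1, h2)

a, b = 3.0, -2.0
quotients = [remainder_quotient(a, b, 10.0**-k, 10.0**-k) for k in (1, 2, 3, 4)]
print(quotients)
```

Each time h shrinks by a factor of 10 the quotient shrinks by the same factor, consistent with the remainder being h1h2.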
Before going further, we shall prove that the derivative is unique.

Theorem 5.2.1. If f : Rn → Rm is differentiable at a ∈ Rn and there are T and T1 linear


transformations which satisfy Equation (5.6), then T = T1 . Since there is only one linear
transformation satisfying (5.6), it will be denoted by Df (a).

Proof: Set ϕ(h) = f(a + h) − f(a) − T(h) and ϕ1(h) = f(a + h) − f(a) − T1(h). Now, using the
assumption that T and T1 both satisfy Equation (5.6), we have

lim_{h→0} (ϕ(h) − ϕ1(h))/||h|| = −lim_{h→0} (T(h) − T1(h))/||h|| = 0.     (5.8)

Now, take any fixed x ∈ Rn \ {0}; varying t ∈ R one has

lim_{t→0} (T(tx) − T1(tx))/||tx|| = lim_{t→0} t(T(x) − T1(x))/(|t| ||x||) = ±(T(x) − T1(x))/||x|| = 0,

so T(x) = T1(x) for any x ≠ 0. The equation also holds for x = 0 since T and T1 are linear. 
In the one variable case, we know that a differentiable function is continuous, the same result
also holds for functions defined in Rn .

Theorem 5.2.2. Let f : Rn → Rm be a differentiable function at a ∈ Rn , then f is continuous


at a.
Proof: By assumption, there is a linear T : Rn → Rm so that lim_{h→0} ||f(a + h) − f(a) − T(h)||/||h|| = 0.
Multiplying by ||h|| and taking limits one has lim_{h→0} ||f(a + h) − f(a) − T(h)|| = 0, which holds if and
only if lim_{h→0} f(a + h) − f(a) − T(h) = 0. On the other hand, T is continuous at zero, hence
lim_{h→0} f(a + h) − f(a) = 0, which implies that lim_{h→0} f(a + h) = f(a), proving what was needed. 
h→0 h→0

Example 5.2.5. We saw in Example 4.1.1 that the function f : R2 → R2 given by
f(x, y) = (x2 + y2, 0) if x, y ∈ Q, and f(x, y) = (0, 0) otherwise, has limit at (0, 0); actually f is continuous at (0, 0) and f
is not continuous at any other point. We encourage the reader to prove that f is differentiable
at (0, 0) and it is not differentiable at any other point. Hint: prove that Df(0, 0) = 0, the zero
linear transformation.

We know that computing derivatives of functions of one variable reduces to computing limits;
computing derivatives of functions of several variables is quite different, since the
derivative itself is not a single limit but a linear transformation. Hence, it is natural to ask: what conditions would guarantee that
the derivative of a function exists? In the following we will be dealing with this question, but
before, we shall prove the chain rule theorem, which is a very powerful tool when applicable.

Theorem 5.2.3 (Chain rule for derivatives). Assume that f : Rn → Rm is differentiable at


a ∈ Rn , and g : Rm → Rp is differentiable at f (a). Then g ◦ f is differentiable at a and
D(g ◦ f )(a) = Dg(f (a)) ◦ Df (a).

Proof: By assumption, there are linear transformations T : Rn → Rm and T1 : Rm → Rp so


that:
1. lim_{h→0} ||f(a + h) − f(a) − T(h)||/||h|| = 0.

2. lim_{k→0} ||g(f(a) + k) − g(f(a)) − T1(k)||/||k|| = 0.

We want to show that
lim_{h→0} ||(g ◦ f)(a + h) − (g ◦ f)(a) − (T1 ◦ T)(h)||/||h|| = 0.     (5.9)
h→0 ||h||

Defining
φ(h) = f (a + h) − f (a) − T (h), (5.10)
then (1) becomes lim_{h→0} ||φ(h)||/||h|| = 0.
Call the numerator of (5.9)

ρ(h) = (g ◦ f )(a + h) − (g ◦ f )(a) − (T1 ◦ T )(h). (5.11)

From (5.10) one has


T (h) = f (a + h) − f (a) − φ(h). Substituting this in (5.11) one has:

ρ(h) = (g ◦ f )(a + h) − (g ◦ f )(a) − T1 (f (a + h) − f (a) − φ(h))


= (g ◦ f )(a + h) − (g ◦ f )(a) − T1 (f (a + h) − f (a)) + T1 (φ(h)).

Hence, in order to show that (5.9) holds, it is enough to show that

lim_{h→0} ||(g ◦ f)(a + h) − (g ◦ f)(a) − T1(f(a + h) − f(a))||/||h|| = 0     (5.12)

and

lim_{h→0} ||T1(φ(h))||/||h|| = 0     (5.13)

hold.
The equation (5.13) follows directly from the condition that φ satisfies and the continuity of T1 .
To provethat (5.12) holds, set
 ||(g ◦ f )(a + h) − (g ◦ f )(a) − T1 (f (a + h) − f (a))|| if f (a + h) − f (a) 6= 0
ϕ1 (h) = ||f (a + h) − f (a)||
0 otherwise.

Set k = f (a + h) − f (a), since f is continuous at a, then k → 0 as h → 0. Using this last
equation to solve for f (a + h) and the assumption that g is differentiable at f (a), one concludes
that ϕ1 is continuous at zero.
Multiplying and dividing by ||f(a + h) − f(a)|| ≠ 0 in
A = ||(g ◦ f)(a + h) − (g ◦ f)(a) − T1(f(a + h) − f(a))||/||h||, and using (5.10), one has

A = ϕ1(h) · ||f(a + h) − f(a)||/||h|| = ϕ1(h) · ||φ(h) + T(h)||/||h||.     (5.14)

The result claimed in (5.12) follows, since ||φ(h)||/||h|| and ϕ1(h) approach zero when h → 0, and ||T(h)||/||h||
is bounded. 

Theorem 5.2.4. Let f : Rn → Rm be a function whose coordinate functions are f1 , f2 , . . . , fm .
Then f is differentiable at A = (a1 , a2 , . . . , an ) if and only if, fj is differentiable at (a1 , a2 , . . . , an ),
for every j ∈ {1, 2, . . . , m}, and Df (A) = (Df1 (A), Df2 (A), . . . , Dfm (A)).

Proof: (⇐) Before starting the proof we mention the following remark: a function G : Rn → Rm is
linear if and only if its coordinate functions are linear. Assume that each fj is differentiable at A
and let T = (Df1(A), Df2(A), . . . , Dfm(A)). We will show that lim_{h→0} ||f(A + h) − f(A) − T(h)||/||h|| = 0,
which is equivalent to: each coordinate of f(A + h) − f(A) − T(h), divided by ||h||, approaches
zero. By assumption, for every j ∈ {1, 2, . . . , m} one has lim_{h→0} (fj(A + h) − fj(A) − Dfj(A)(h))/||h|| = 0,
hence proving what was claimed.
(⇒) Let πi : Rm → R be given by πi(x1, x2, . . . , xm) = xi. Clearly, πi is linear, hence differentiable
at any point. We have that fi = πi ◦ f . Now apply the chain rule to finish the proof. 
Remark 5.2.1. The previous theorem, together with the theorems for limits and continuity, show
that the theory of functions from Rn → Rm can be reduced to the theory of functions from
Rn → R.

Theorem 5.2.5. Let f, g : Rn → R be differentiable at a, then

1. f + g is differentiable and D(f + g)(a) = Df (a) + Dg(a).

2. f g is differentiable and D(f g)(a) = g(a)Df (a) + f (a)Dg(a).


 
3. If g(a) ≠ 0, then f/g is differentiable and D(f/g)(a) = (g(a)Df(a) − f(a)Dg(a))/g(a)2.
Proof: The first two follow from Examples 5.2.3, 5.2.4 and the chain rule. To prove the third,
notice that it is enough to show that 1/g is differentiable and D(1/g)(a) = −Dg(a)/g(a)2.
We will prove that this last equation holds.
We have:

(1/g(a + h) − 1/g(a) + Dg(a)h/g(a)2)/||h||
= (g(a)2 − g(a)g(a + h) + g(a + h)Dg(a)h − g(a)Dg(a)h + g(a)Dg(a)h)/(g(a)2 g(a + h)||h||)
= (g(a)[g(a) − g(a + h) + Dg(a)h] + Dg(a)h[g(a + h) − g(a)])/(g(a)2 g(a + h)||h||)
= g(a)[g(a) − g(a + h) + Dg(a)h]/(g(a)2 g(a + h)||h||) + Dg(a)h[g(a + h) − g(a)]/(g(a)2 g(a + h)||h||)

The first term in the right hand of the equation approaches zero, by the definition of derivative.
Rewriting the second term, we have:
Dg(a)h[g(a + h) − g(a)]/(g(a)2 g(a + h)||h||) = ((g(a + h) − g(a))/(g(a)2 g(a + h))) · Dg(a)(h/||h||). Since g is continuous at a, the first
factor approaches zero and the second is bounded, since Dg(a) is linear, hence the product
approaches zero. With this we finish the proof. 

5.3 Partial Derivatives and Matrix Representation of the
Derivative
In this section we discuss the partial derivative concept and give sufficient conditions under
which a function has derivative.
In several applications, it is important to identify directions in which a given function varies the
most. To fix ideas, let us suppose that we have a function T that measures the temperature on
a flat plate. Furthermore, let us assume that on the plate there are one heater and one cooler.
At the point where the heater is placed, the temperature measures 100° Celsius, and at
the point where the cooler is placed the temperature reads −15°. How does the temperature
change when moving away from one given point? We can think of T as a function defined from
R2 → R. Let us assume that the point where an observer is situated is denoted by A and he
wants to observe how the temperature changes from A in the direction determined by the vector
u. How can he determine the way in which the temperature changes? The points on the line
that passes through A in the direction of u are given by A + tu, where t runs over the real numbers.
Hence the average change of temperature from A to A + tu is given by (T(A + tu) − T(A))/t, so
the instantaneous rate of change of T at A is given by lim_{t→0} (T(A + tu) − T(A))/t, in case the
limit exists.

Figure 5.2: A flat plate with a heater and a cooler

The above idea can be generalized to any function f : Rn → Rm , that is:

Definition 5.3.1. Let f : Rn → Rm be a function and let a and u be elements in Rn so that
u ≠ 0. If the limit

lim_{t→0} (f(a + tu) − f(a))/t     (5.15)

exists, it is called the directional derivative of f at a in the direction of u, and it will be denoted by
Du f(a) = lim_{t→0} (f(a + tu) − f(a))/t. When u = ei, the i-th canonical basis element of Rn, the directional
derivative of f will be called the i-th partial derivative of f at a and will be denoted by Di f(a) or
by (∂f/∂xi)(a).

Figure 5.3: Directional derivative: the slope of line l is Du f (a)

Example 5.3.1. Let f : R2 → R be given by f(x, y) = x2 + 2xy + y2, a = (1, 1) and u = (2, 1).
Then lim_{t→0} (f(a + tu) − f(a))/t = lim_{t→0} (f((1, 1) + t(2, 1)) − f(1, 1))/t = lim_{t→0} (12t + 9t2)/t = 12, or, using the
above notation, D(2,1) f(1, 1) = 12.
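The difference quotient in the example can also be evaluated numerically for shrinking t. The sketch below reuses f(x, y) = x2 + 2xy + y2, a = (1, 1) and u = (2, 1) from the example:

```python
def f(x, y):
    return x * x + 2 * x * y + y * y

def directional_quotient(a, u, t):
    # (f(a + t*u) - f(a)) / t, the quotient whose limit defines D_u f(a)
    ax, ay = a
    ux, uy = u
    return (f(ax + t * ux, ay + t * uy) - f(ax, ay)) / t

vals = [directional_quotient((1.0, 1.0), (2.0, 1.0), 10.0**-k) for k in (1, 2, 3, 4)]
print(vals)  # the quotients are 12 + 9t, so they approach 12
```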

It should be noticed, that if the directional derivative exists, it is an element in the space where
the function takes values.
Remark 5.3.1. We should notice that defining γ(t) = a + tu, t ∈ R, then the existence of Du f (a)
is equivalent to the existence of the derivative of the function f ◦ γ : R → Rm at zero.

5.3.1 Directional Derivatives and continuity


What is the geometric meaning of the directional derivative? The video “Directional derivative”
might help to understand this idea.
We know that if a function f is differentiable at a, then it is continuous at a, however, if a
function f has directional derivatives at a in all directions, it might not be continuous.
Consider the function defined by f(x, y) = xy2/(x2 + y4) if x ≠ 0, and f(x, y) = 0 if x = 0 and y ∈ R.

It can be verified that f has directional derivatives at (0, 0) in every direction u, while
lim_{x→0} f(x, x) = 0 and lim_{y→0} f(y2, y) = 1/2, hence f is not continuous at (0, 0).
x→0 y→0 2

This example shows that the existence of directional derivatives is such a weak condition that
it does not guarantee continuity. We already know that the existence of the derivative of
a function guarantees continuity, Theorem 5.2.2. In the next theorem it is proven that the
derivative condition also guarantees the existence of directional derivatives.
We know that a function f = (f1 , f2 , . . . , fm ), has derivative if and only if each of its coordinate
functions has derivative, hence, when considering the derivative of a function f , it is enough to
consider the derivatives of its coordinate functions, which are functions from Rn → R. With
this in mind, we have the following theorem.
Theorem 5.3.1. Let f : Rn → R be a function which has derivative at a. Then Du f(a) exists
for every u and Du f(a) = Df(a)(u). Moreover, under this condition, Df(a) is identified with
the vector ((∂f/∂x1)(a), (∂f/∂x2)(a), . . . , (∂f/∂xn)(a)).
Proof: Let u be any nonzero element of Rn and let γ : R → Rn be given by γ(t) = a + tu.
It is clear that γ has derivative at any t and Dγ(t) = u. From Remark 5.3.1, the existence of
Du f(a) is equivalent to the existence of D(f ◦ γ)(0). Since γ(0) = a and by the assumption on
f, the chain rule gives Du f(a) = Df(γ(0)) ◦ Dγ(0) = Df(a)(u). Now, since
Df(a) is a linear transformation from Rn → R, we know that Df(a) = (b1, b2, . . . , bn), hence
Du f(a) = Df(a)(u) = b1u1 + b2u2 + · · · + bnun, where u = (u1, u2, . . . , un). Taking u = ei, the
i-th canonical basis element of Rn, the expression for Du f(a) reduces to Di f(a) = bi, which
implies Df(a) can be identified with the vector ((∂f/∂x1)(a), (∂f/∂x2)(a), . . . , (∂f/∂xn)(a)), as claimed. 
The vector ((∂f/∂x1)(a), (∂f/∂x2)(a), . . . , (∂f/∂xn)(a)) is so important that it has a name and notation. One
of its properties is that it gives the direction of maximum growth of a function from Rn → R.
More precisely:
Definition 5.3.2. Let f : Rn → R be a function. Assume that all partial derivatives
of f exist at a ∈ Rn. The vector ∇f(a) = ((∂f/∂x1)(a), (∂f/∂x2)(a), . . . , (∂f/∂xn)(a)) is called the gradient of f
at a.
Corollary 5.3.2. The gradient of f at a indicates the direction of maximum change of f at a.
Proof: We may assume that kuk = 1. From Equation 3.4, p. 9 we have cos(θ)k∇f (a)kkuk =
h∇f (a), ui. From Theorem 5.3.1 we have Du f (a) = h∇f (a), ui, hence cos(θ)k∇f (a)kkuk =
Du f (a). Now the assumption kuk = 1 implies that cos(θ)k∇f (a)k = Du f (a), which is maximum
exactly when cos(θ) = 1, that is, when u and ∇f (a) have the same direction. 
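The corollary can be illustrated numerically. In the sketch below, the gradient (6, 4) is an assumed example (it arises, e.g., from f(x, y) = 3x2 + y2 at (1, 2)), and a brute-force search over unit vectors finds the angle maximizing Du f(a) = ⟨∇f(a), u⟩:

```python
import math

# Assumed gradient vector at the point a: grad f(a) = (6, 4).
grad = (6.0, 4.0)

# Scan 3600 unit vectors u = (cos t, sin t) and keep the angle maximizing
# the directional derivative <grad, u>.
best_angle = max(
    (k * 2 * math.pi / 3600 for k in range(3600)),
    key=lambda t: grad[0] * math.cos(t) + grad[1] * math.sin(t),
)
gradient_angle = math.atan2(grad[1], grad[0])
print(best_angle, gradient_angle)  # the two angles agree up to the grid step
```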
Remark 5.3.2. Notice that the definition of the gradient of f only requires the existence of the
partial derivatives at a; however, we know that the existence of the partial derivatives does not
guarantee even continuity of f at a. We also know that when f is differentiable at a then f
has directional derivatives at a in every direction, hence in particular the gradient of f exists.
When f is differentiable at a we will identify Df(a) with ∇f(a).
We have seen that the existence of Di f (a) for every i does not guarantee the existence of Df (a),
hence it is natural to ask, what additional conditions would guarantee that f has derivative?
The answer is given by the next result:
Theorem 5.3.3. Let f : Rn → R be a function. Assume that the partial derivatives of f exist
in a neighborhood of a and are continuous at a. Then there exists Df (a).

Remark 5.3.3. The general result for functions from Rn → Rm follows directly, by recalling that
f is differentiable if and only if, its coordinate functions are.
Proof: First of all, we will write the details for the case n = 2; the general case follows by adding
sufficient terms. Assume that a = (a1, a2) and h = (h1, h2). We know that if the derivative of
f exists at a, it would be identified with ∇f(a) = ((∂f/∂x1)(a), (∂f/∂x2)(a)), hence we need to prove
that

lim_{h→0} (f(a + h) − f(a) − (∂f/∂x1)(a)h1 − (∂f/∂x2)(a)h2)/||h|| = 0.     (5.16)
In order to achieve this goal we will show that the difference f (a + h) − f (a) can be written in
a very appropriate way!
Using coordinates we have:
f (a1 + h1 , a2 + h2 ) − f (a1 , a2 ) = f (a1 + h1 , a2 + h2 ) − f (a1 + h1 , a2 ) + f (a1 + h1 , a2 ) − f (a1 , a2 ).
Since the partial derivatives of f exist in a neighborhood of a, we may assume that h is chosen
so that the partial derivatives exist. Define g(t) = f (a1 + t, a2 ) and g1 (t) = f (a1 + h1 , a2 + t).
We will show that both functions have derivative in the intervals [0, h1 ] and [0, h2 ] respectively.
We write the details for g; the procedure is analogous for g1 .
By definition, the derivative of g at t is:

g′(t) = lim_{k→0} (g(t + k) − g(t))/k
     = lim_{k→0} (f(a1 + t + k, a2) − f(a1 + t, a2))/k
     = lim_{k→0} (f((a1 + t, a2) + ke1) − f(a1 + t, a2))/k,  definition of partial derivative of f at (a1 + t, a2)
     = D1 f(a1 + t, a2),  notation for the first partial derivative of f at (a1 + t, a2).
Just as we did before, g1′(t) = D2 f(a1 + h1, a2 + t); hence, by applying the mean value theorem
to the functions in the intervals [0, h1] and [0, h2] we have:
g(h1) − g(0) = h1 g′(b1) and g1(h2) − g1(0) = h2 g1′(b2) for some b1 ∈ ]0, h1[ and b2 ∈ ]0, h2[. Now
we notice that g′(b1) = D1 f(a1 + b1, a2) and g1′(b2) = D2 f(a1 + h1, a2 + b2).
From all of this one has:
f (a1 + h1 , a2 + h2 ) − f (a1 , a2 ) = g1 (h2 ) − g1 (0) + g(h1 ) − g(0)
= D2 f (a1 + h1 , a2 + b2 )h2 + D1 f (a1 + b1 , a2 )h1 .


Set

L = |f(a + h) − f(a) − (∂f/∂x1)(a)h1 − (∂f/∂x2)(a)h2| / ||h||.

Using this and the above, one obtains:

L = |(∂f/∂x2)(a1 + h1, a2 + b2)h2 + (∂f/∂x1)(a1 + b1, a2)h1 − (∂f/∂x1)(a)h1 − (∂f/∂x2)(a)h2| / ||h||
  ≤ (|(∂f/∂x1)(a1 + b1, a2) − (∂f/∂x1)(a)| |h1| + |(∂f/∂x2)(a1 + h1, a2 + b2) − (∂f/∂x2)(a)| |h2|) / ||h||.

We also have that |hi|/||h|| ≤ 1, hence
0 ≤ L ≤ |(∂f/∂x1)(a1 + b1, a2) − (∂f/∂x1)(a)| + |(∂f/∂x2)(a1 + h1, a2 + b2) − (∂f/∂x2)(a)|.
Recall the conditions on b1 and b2 , which imply

a1 ≤ b1 + a1 ≤ a1 + h1
a2 ≤ b2 + a2 ≤ a2 + h2 ,

hence (h1, h2) → (0, 0) implies (b1, b2) → (0, 0). Now, the continuity of the partial derivatives
guarantees that each term in the sum that dominates L approaches zero, hence so does L,
finishing the proof for the case n = 2.
The general case is obtained by adding the remaining terms. Please write the details as an
excellent exercise!! 

5.3.2 Graphs, level curves and gradient


There are several ways to represent a function, one possibility is by its graph. But what is the
graph of a function from Rn → R?

Definition 5.3.3. Given a function f : D ⊆ Rn → R. Its graph, Gf , is defined by Gf :=


{(x, f (x)) ∈ Rn × R | x ∈ D}.

Remark 5.3.4. It should be noticed that Gf can be identified with a subset of Rn+1 , especially
when n = 2.

Example 5.3.2. Sketch the graph of the function f (x, y) = x2 − y 2 .

Discussion: One possible approach to sketch the graph of f is by noticing that for each fixed value
y0 of y, the corresponding trace is a parabola, and the graph consists of the union of those
parabolas. 

Figure 5.4: The graph of f (x, y) = x2 − y 2

A much more sophisticated function is f(x, y) = sin(π(x2 + y2))/2, whose graph is shown in
Figure 5.5.

Figure 5.5: The graph of f(x, y) = sin(π(x2 + y2))/2

Definition 5.3.4. Given a function f : Rn → R and r ∈ R, the set Cr := {x ∈ Df | f (x) = r}


is called the level set of f at r. When n = 2, Cr is called a level curve of f at r.

Figure 5.6: Level curves view of f (x, y) = x2 + y 2 : Left, from the first octant; Right, from
above.

Remark 5.3.5. The graph of a function f can be obtained by gluing together all the curves
obtained by evaluating the function at the level curves.
Assume that f is a function from R2 → R which is differentiable, in particular it is differentiable
at (a, b). Consider the function F : R3 → R given by F (x, y, z) = f (x, y) − z, then F is
differentiable and at (a, b, f (a, b)) one has ∇F (a, b, f (a, b)) = (D1 f (a, b), D2 f (a, b), −1). From
the definition of F we have that the graph of f is the level set of F at zero. If γ : [0, 1] → R3 is
a differentiable function whose image contains (a, b, f (a, b)), that is, there is t0 ∈ [0, 1] so that
γ(t0 ) = (a, b, f (a, b)) then the tangent vector of γ at t0 is orthogonal to the gradient of F at
(a, b, f (a, b)).
In fact, by the assumption on γ one has that g = F ◦ γ is the zero function and is differentiable,

hence by the chain rule we have

g′(t) = DF(γ(t)) · Dγ(t)
     = ⟨(D1 f, D2 f, −1), (γ1′(t), γ2′(t), γ3′(t))⟩     (5.17)
     = ⟨∇F, γ′(t)⟩
     = 0.

From Equation (5.17) we have that the gradient is orthogonal to every curve that passes through
(a, b, f (a, b)). From a global view, we have that the level curves are orthogonal to the gradient.
More precisely, the tangent vectors to the level curves are orthogonal to the gradient at each
point.

Figure 5.7: Level curves and gradient vectors of f (x, y) = x2 − y 2

A useful interpretation of the fact that the gradient is orthogonal to the curves that pass
through (a, b, f (a, b)) and the fact that ∇F (a, b, f (a, b)) 6= 0, is that the tangent plane to
the graph of f at (a, b, f (a, b)) can be obtained by considering the points (x, y, z) that satisfy
h∇F (a, b, f (a, b)), (x − a, y − b, z − f (a, b))i = 0.
Equivalently, the equation of the plane tangent to the graph of f at (a, b, f (a, b)) is:

D1 f (a, b)(x − a) + D2 f (a, b)(y − b) − (z − f (a, b)) = 0 (5.18)

Example 5.3.3. Given the function f(x, y) = x2 + y2, obtain the equation of the tangent
plane at (5, 5, 50) and sketch its graph.

Discussion: the gradient of f is ∇f (x, y) = (2x, 2y), hence ∇f (5, 5) = (10, 10). From Equation
(5.18) we have that the equation of the tangent plane is 10(x − 5) + 10(y − 5) − (z − 50) = 0,
or 10x + 10y − z − 50 = 0. 
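One way to see that this plane is indeed tangent is to check that the gap between the surface and the plane is quadratic in the displacement from (5, 5). A small numeric sketch:

```python
def f(x, y):
    return x * x + y * y

def plane(x, y):
    # tangent plane at (5, 5, 50): z = 10*(x - 5) + 10*(y - 5) + 50
    return 10 * (x - 5) + 10 * (y - 5) + 50

# Along the diagonal displacement (h, h) the gap f - plane equals 2*h**2,
# so it shrinks quadratically, the hallmark of tangency.
gaps = [f(5 + h, 5 + h) - plane(5 + h, 5 + h) for h in (0.1, 0.01, 0.001)]
print(gaps)
```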

Figure 5.8: Tangent plane to the graph of f (x, y) = x2 + y 2 at (5, 5, 50).

5.3.3 Matrix Representation of the Derivative


In this subsection we present a practical way to represent the derivative of a function f : Rn →
Rm . First of all, if f = (f1 , f2 , . . . , fm ), then from Theorem 5.2.4,

Df (a) = (Df1 (a), Df2 (a), . . . , Dfm (a)). (5.19)


We shall recall how to represent a linear transformation with a matrix, given a basis. So, we
assume that the canonical basis in Rn and Rm are given to represent Df (a). To construct the
matrix representing Df (a), we evaluate the derivative at each canonical element ei , and the
scalar that appear in the linear combination are used to form the i-th column.
From equation (5.19) one has

Df (a)(ei ) = (Df1 (a)(ei ), Df2 (a)(ei ), . . . , Dfm (a)(ei )). (5.20)


Now, from Theorem 5.3.1 we have that the derivative of a function fj : Rn → R is given by
Dfj(a) = ((∂fj/∂x1)(a), (∂fj/∂x2)(a), . . . , (∂fj/∂xn)(a)); hence, evaluating at ei = (0, 0, . . . , 0, 1, 0, . . . , 0), with the 1 in the i-th place, one
has Dfj(a)(ei) = (∂fj/∂xi)(a), thus Equation (5.20) becomes

Df(a)(ei) = ((∂f1/∂xi)(a), (∂f2/∂xi)(a), . . . , (∂fm/∂xi)(a)),     (5.21)

From Equation (5.21) and the definition of a matrix representing a linear transformation in a
given basis, we have that Df(a) is represented by the matrix:

           ⎡ ∂f1/∂x1(a)   ∂f1/∂x2(a)   · · ·   ∂f1/∂xn(a) ⎤
           ⎢ ∂f2/∂x1(a)   ∂f2/∂x2(a)   · · ·   ∂f2/∂xn(a) ⎥
  Jf(a) =  ⎢      ⋮             ⋮         ⋱         ⋮     ⎥ .     (5.22)
           ⎣ ∂fm/∂x1(a)   ∂fm/∂x2(a)   · · ·   ∂fm/∂xn(a) ⎦
Definition 5.3.5. If f : Rn → Rm is a differentiable function at a, the matrix given in (5.22)
is called the Jacobian matrix of f at a.

Remark 5.3.6. It should be noticed that the Jacobian matrix of a function f = (f1, f2, . . . , fm)
exists as long as the partial derivatives of each fj exist, even if f is not differentiable. Also,
the derivative Df(a) must not be confused with the Jacobian Jf(a): the Jacobian
matrix is ONLY the representation of Df(a) with respect to the canonical basis.
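The Jacobian can be approximated column by column with difference quotients, since the i-th column collects the partial derivatives with respect to xi. The sketch below is an illustration with the assumed example f(x, y) = (x2y, x + sin y), whose exact Jacobian at (1, 2) is [[4, 1], [1, cos 2]]:

```python
import math

# Assumed example: f(x, y) = (x**2 * y, x + sin(y)).
def f(v):
    x, y = v
    return [x * x * y, x + math.sin(y)]

def jacobian(fun, a, h=1e-6):
    # build the m x n matrix of (5.22) via central differences, one column
    # (one input variable) at a time
    n, m = len(a), len(fun(a))
    J = [[0.0] * n for _ in range(m)]
    for i in range(n):
        ap = list(a); ap[i] += h
        am = list(a); am[i] -= h
        fp, fm = fun(ap), fun(am)
        for j in range(m):
            J[j][i] = (fp[j] - fm[j]) / (2 * h)
    return J

J = jacobian(f, [1.0, 2.0])
exact = [[4.0, 1.0], [1.0, math.cos(2.0)]]
print(J)
```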
The chain rule has the following matrix representation. Assume that f : Rn → Rm is differentiable
at a and that g : Rm → Rp is differentiable at f(a). Furthermore, assume that the
coordinate functions of f and g are (f1, f2, . . . , fm) and (g1, g2, . . . , gp) respectively; then the
derivative of g ◦ f is represented by the product of the corresponding matrices:

                 ⎡ ∂g1/∂u1(f(a))   · · ·   ∂g1/∂um(f(a)) ⎤ ⎡ ∂f1/∂x1(a)   · · ·   ∂f1/∂xn(a) ⎤
  D(g ◦ f)(a) =  ⎢       ⋮            ⋱          ⋮       ⎥ ⎢      ⋮          ⋱         ⋮     ⎥ .
                 ⎣ ∂gp/∂u1(f(a))   · · ·   ∂gp/∂um(f(a)) ⎦ ⎣ ∂fm/∂x1(a)   · · ·   ∂fm/∂xn(a) ⎦
Example 5.3.4. Let γ : [a, b] → R2 and f : R2 → R be differentiable functions, and let
z(t) = (f ◦ γ)(t); then z is differentiable and

dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).     (5.23)
Discussion. By the chain rule we have that

dz/dt = Df(γ(t)) dγ/dt.     (5.24)

Now, using a matrix representation of Df(γ(t)) and dγ/dt, the result follows.

Exercise 5.3.1. Let g : R2 → R2 and f : R2 → R be differentiable functions. More precisely,
assume that g(u, v) = (x(u, v), y(u, v)) and let z(u, v) = f(x(u, v), y(u, v)). Prove that z is
differentiable and

∂z/∂u = (∂z/∂x)(∂x/∂u) + (∂z/∂y)(∂y/∂u)
∂z/∂v = (∂z/∂x)(∂x/∂v) + (∂z/∂y)(∂y/∂v).

In the one variable case, the second and higher order derivatives play a very important role in
getting a better understanding of functions. The several variables case is no
exception; hence, in what follows, we will discuss concepts and results concerning derivatives
of higher order. We will start by considering the second partial derivatives of a given function
defined from Rn → R.
Assume that f : Rn → R is a function; furthermore, assume that f has all partial derivatives
in an open subset A of Rn. Then we can consider each partial derivative ∂f/∂xi as a function from
A → R. For instance, if f(x, y, z) = x2y + y2z + xz2, then f has partial derivatives in all R3. For
example, the function ∂f/∂x : R3 → R defined by (∂f/∂x)(x, y, z) = 2xy + z2 is the partial derivative
of f evaluated at (x, y, z); hence this can be considered as a new function from R3 → R.
When the partial derivatives, considered as functions, have partial derivatives themselves, these
are called second order partial derivatives. There are several ways to denote the second order
partial derivatives; the most common are ∂2f/∂xi∂xj or Dj,i f, which mean that you compute first
the partial derivative with respect to xj and then the partial derivative with respect to xi.
The special case i = j is denoted by ∂2f/∂xi2 or Di2 f. The above notation might be confusing;
fortunately we shall prove a theorem that establishes the equality in the most common cases,
that is, we will prove that under good conditions (∂2f/∂xi∂xj)(x) = (∂2f/∂xj∂xi)(x).
The next result is stated for functions from R2 → R, however, it holds in full generality, that
means it holds for functions from Rn → R.

Theorem 5.3.4. Assume that the function f : R2 → R satisfies the following properties.

1. There exists the partial derivatives D1 f , D2 f and D1,2 f in an open set U which contains
the point (a, b).

2. The second order partial derivative D1,2 f , is continuous at (a, b).

Then there exists D2,1 f (a, b) and D2,1 f (a, b) = D1,2 f (a, b).

Proof: Without loss of generality we may assume that (a, b) = (0, 0). Define:

A(h, k) = f (h, k) − f (h, 0) − f (0, k) + f (0, 0). (5.25)

Since the partial derivatives exist in U, we may assume that (h, k) ∈ U; hence we can apply
the one variable case of the mean value theorem to B(t) = f(t, k) − f(t, 0) in the interval [0, h].
Notice that A(h, k) = B(h) − B(0), and B′(t) = D1 f(t, k) − D1 f(t, 0). Therefore,

A(h, k) = B(h) − B(0) = hB′(c) = h(D1 f(c, k) − D1 f(c, 0)),     (5.26)


for some c ∈ ]0, h[. Since the second order partial derivative exists in U, we can apply the mean
value theorem to the function g(t) = D1 f(c, t) in the interval [0, k]. We have g′(t) = D1,2 f(c, t),
hence by the mean value theorem g(k) − g(0) = kg′(d) = D1,2 f(c, d)k for some d ∈ ]0, k[.
Therefore, D1 f(c, k) − D1 f(c, 0) = D1,2 f(c, d)k. Substituting this in (5.26) we have:

A(h, k) = hkD1,2 f (c, d). (5.27)


From this equation, the continuity of D1,2 f at (0, 0) and the conditions on c and d, one gets
directly that

lim_{(h,k)→(0,0)} A(h, k)/(hk) = D1,2 f(0, 0).     (5.28)
It is straightforward to verify that:

lim_{h→0} A(h, k)/(hk) = (1/k)(D1 f(0, k) − D1 f(0, 0))     (5.29)
and

lim_{k→0} A(h, k)/(hk) = (1/h)(D2 f(h, 0) − D2 f(0, 0)).     (5.30)
Now applying Exercise 2, p. 28 we have that

    
D1,2 f(0, 0) = lim_{h→0} lim_{k→0} A(h, k)/(hk) = lim_{h→0} (1/h)(D2 f(h, 0) − D2 f(0, 0)) = D2,1 f(0, 0),     (5.31)

proving what we claimed. 
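The quotient A(h, k)/(hk) from the proof, recentered at a point (a, b), gives a practical approximation of the mixed partials. The sketch below uses the assumed function f(x, y) = x2y + sin(xy) at (1, 1), where a direct computation gives D1,2 f(1, 1) = 2 + cos(1) − sin(1):

```python
import math

# Assumed test function: f(x, y) = x**2 * y + sin(x*y).
def f(x, y):
    return x * x * y + math.sin(x * y)

def mixed_quotient(a, b, h, k):
    # the quotient A(h, k)/(h*k) from the proof, centered at (a, b)
    return (f(a + h, b + k) - f(a + h, b) - f(a, b + k) + f(a, b)) / (h * k)

approx = mixed_quotient(1.0, 1.0, 1e-4, 1e-4)
exact = 2.0 + math.cos(1.0) - math.sin(1.0)  # D12 f(1,1) = D21 f(1,1)
print(approx, exact)
```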


Warning 5.3.1. Even though we have talked about higher order partial derivatives, we have not
said anything about the second derivative that generalizes the concept of a derivative. In other
words, if a function f : Rn → Rm has derivative in a set A ⊆ Rn, then for each point a ∈ A
there exists the derivative Df(a), which is a linear transformation from Rn → Rm. Denoting by
L(Rn; Rm) the vector space of all linear transformations from Rn → Rm, the existence of
Df(a) for every a ∈ A can be thought of as the existence of a function ϕ : A → L(Rn; Rm) so
that ϕ(a) := Df(a); hence the second derivative of f at a would be defined as the derivative
of ϕ at a. Being consistent with the definition of the derivative of a function f : Rn → Rm,
the derivative of ϕ would be a linear transformation from Rn → L(Rn; Rm), that is, an
element of L(Rn; L(Rn; Rm)). Fortunately, this complicated object is isomorphic to something
more “down to earth”: the vector space B(Rn × Rn; Rm), which is the set of all bilinear
functions from Rn × Rn → Rm.
To end this warning, do not be afraid of these objects; most of you can live without knowing
what derivatives of higher order are. You can learn about them in a differential calculus course
in Banach spaces! For the time being, let us go ahead and just learn partial derivatives!

5.4 Exercises
1. Read very carefully all the results presented up to this point. Give all the proofs needed
to understand the results. Give examples to illustrate the use of the theorems. As a good
exercise, write down the statements of the theorems without consulting any references,
including the lectures presented here.

2. Solve the problems included in lists 2-10 and 2-17 from the book: “Calculus on manifolds”,
Spivak, English version.

3. Find and discuss examples of functions which have directional derivatives at a given point,
in all directions, but are not continuous.

4. Consider the norm in Rn as a function from Rn → R. Determine the subset of points


where it has derivative and compute it.

5. Let f : Rn → Rn be a function that satisfies ||f(x)|| ≤ ||x||2 for every x. Is f differentiable
at 0?
 
6. Let f : Rn → R be the function given by f(x) = ||x||2 sin(1/||x||) if x ≠ 0, and f(x) = 0 otherwise.

Is f differentiable at 0?

7. A function f : Rn → R is called homogeneous of degree k if f(tx) = tk f(x) for every real
t and every x. Assume that f is differentiable at any x. Show that Σ_{i=1}^{n} xi Di f(x) = kf(x).
Hint: apply the chain rule to the function g(t) = f(tx).

8. Let T : Rn → Rn be a linear transformation and f : Rn → R given by f (x) = hx, T (x)i.


Explain why f is differentiable and compute Df (x). More generally, assume that f, g :
Rn → Rn are differentiable and set h(x) = hf (x), g(x)i. Prove that h is differentiable and
compute Dh(x).

9. Let f : Rn → R be a function which satisfies: there is an open ball B(a, r) such Du f (x) = 0
for every u and for every x ∈ B(a, r). Is f constant in B(a, r)?

10. Is there a function f : Rn → R so that Du f (a) > 0 for every u and a fixed a?

11. Let f : R2 → R be given by f (x, y) = 3x2 + y 2 . Find the direction u on which f has its
maximum value when (x, y) varies on the unit circle.

12. Find the directional derivatives of the given functions at the given points and in the given
directions.

(a) f (x, y, z) = 2x2 + 3xy + 4xyz at a = (1, 1, 1) in the direction u = (2, 1, 3).
(b) f (x, y, z) = (x2 + 3xy + xz, x + y + z) at a = (1, 0, −1) in the direction u = (1, 0, 3).
(c) f(x, y, z) = xyz at a = (1, 1, 1) in the direction u = (1, −1, −1).

13. Assume that f : Rn → R is differentiable at each point in an open ball B, furthermore,
assume that there are u1 , u2 , . . . , un linearly independent vectors so that Dui f (a) = 0 for
every i = 1, 2, . . . , n and every a ∈ B. Prove that f is constant in B. Does the same result
hold if f takes values in Rm?

14. Consider the following statements related to a function f : Rn → R and a given point
a ∈ Rn.

(a) f is continuous at a.
(b) f is differentiable at a.
(c) f has directional derivative at a in any direction.
(d) f has continuous partial derivatives (all) in a neighborhood of a.
(e) ∇f (a) = 0.
(f) f is constant in an open ball B containing a.

Construct a table to illustrate which implies the other(s).

15. Let γ(t) = (x(t), y(t)) be a function from R → R2 , and f : R2 → R. If u = f ◦ γ, express


u0 (t) in terms of the partial derivatives of f . You might assume all the differentiability
conditions needed.

16. Related to the previous exercise compute u0 (t) in each case:

(a) f (x, y) = x2 + y 2 , γ(t) = (t, t2 ).


(b) f(x, y) = e^(xy+y2) sin(xy), γ(t) = (cos(t), t2).
!
x2
1+e 2
(c) f (x, y) = ln y 2 , γ(t) = (e, e−t ).
1+e

17. Consider the ellipse of equation x^2/a^2 + y^2/b^2 = 1. Justify why γ(t) = (a cos(t), b sin(t)) is a
parameterisation of the ellipse. Assume that the foci of the ellipse are F1 and F2 and r1 ,
r2 are the distances from F1 and F2 to a point (x, y), respectively; then γ′(t) is orthogonal
to ∇(r1 + r2 ). Hint: use that r1 + r2 is constant.

18. Let f : Rn → Rm be a function. Assume that Du f (a) exists for some a. Show that
Dru f (a) = rDu f (a) for any r ≠ 0. If f is differentiable at a, show that Du+v f (a) = Du f (a) +
Dv f (a).

19. For a fixed a ∈ Rn , let Da (Rn , Rm ) = {f : Rn → Rm : Df (a) exists}. Prove that


Da (Rn , Rm ) is a vector space with the usual sum of functions and the usual product of a
scalar by a function. Let L(Da (Rn , Rm ); Rm ) be the vector space of all linear transforma-
tions from Da (Rn , Rm ) → Rm . Define ϕ : Rn → L(Da (Rn , Rm ); Rm ), by ϕ(u) = Tu where
Tu (f ) := Du f (a). Show that ϕ is a linear transformation.
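The chain-rule formula asked for in exercise 15 can be checked symbolically on item 16(a). The introduction of these notes welcomes digital tools; the sketch below uses Python with sympy (an assumption, any CAS would do) to compare u′(t) computed directly with the chain-rule expression D1 f (γ(t))x′(t) + D2 f (γ(t))y′(t).

```python
import sympy as sp

t, x, y = sp.symbols('t x y')

f = x**2 + y**2          # f(x, y) from exercise 16(a)
gamma = (t, t**2)        # gamma(t) = (t, t^2)

# direct computation: u(t) = f(gamma(t)), then differentiate in t
u = f.subs({x: gamma[0], y: gamma[1]})
u_prime_direct = sp.diff(u, t)

# chain rule: u'(t) = D1 f(gamma(t)) x'(t) + D2 f(gamma(t)) y'(t)
u_prime_chain = (sp.diff(f, x).subs({x: gamma[0], y: gamma[1]}) * sp.diff(gamma[0], t)
                 + sp.diff(f, y).subs({x: gamma[0], y: gamma[1]}) * sp.diff(gamma[1], t))
```

Both expressions simplify to 2t + 4t^3, as a computation by hand confirms.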

Chapter 6

Main theorems in several variables


calculus

6.1 Mean value Theorem


The mean value theorem for one variable functions is such an important result that it is natural
to ask for its generalization to the several variables case.
We start by recalling the mean value theorem in the one variable case and illustrate one of its
meanings.

If f : [a, b] → R is a function which is continuous in [a, b] and differentiable in ]a, b[,
then there is at least one c ∈]a, b[ so that f (b) − f (a) = (b − a)f′(c).

Figure 6.1: There is at least one tangent to the graph of f , which is parallel to the secant that
passes through the points (a, f (a)) and (b, f (b))

Theorem 6.1.1 (Mean Value Theorem). Let f : A ⊆ Rn → R be a function which is
differentiable in the segment [a, b] = {a + t(b − a) | 0 ≤ t ≤ 1} ⊆ A. Then there exists at least one
c ∈ [a, b] so that f (b) − f (a) = Df (c)(b − a).

Proof: Given a, b ∈ A, let γ : [0, 1] → Rn be given by γ(t) = a + t(b − a). Then γ is
differentiable and γ′(t) = b − a for every t. Since f is differentiable in [a, b], then by the
chain rule, the function g = f ◦ γ : [0, 1] → R is differentiable in ]0, 1[ and we have g′(t) =
Df (γ(t))γ′(t) = Df (γ(t))(b − a). Also g is continuous in [0, 1]. By the mean value theorem
in the one variable case, one has g(1) − g(0) = (1 − 0)g′(t0 ) = g′(t0 ) for some t0 ∈]0, 1[. On
the other hand, g(1) − g(0) = f (γ(1)) − f (γ(0)) = f (b) − f (a); hence, from the above and setting
c = γ(t0 ), we have f (b) − f (a) = Df (c)(b − a) as claimed. 
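The theorem can also be probed numerically. The sketch below (plain Python with numpy, an assumed tool) takes f (x, y) = x^2 + y^2 on the segment from a = (0, 0) to b = (1, 1) and locates, by bisection, a point c on the segment with ∇f (c) · (b − a) = f (b) − f (a); the function and segment are our choices for illustration.

```python
import numpy as np

def f(p):
    # sample function f(x, y) = x^2 + y^2
    return p[0]**2 + p[1]**2

def grad_f(p):
    # its gradient (2x, 2y)
    return np.array([2*p[0], 2*p[1]])

a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
target = f(b) - f(a)

def h(t):
    # h(t0) = 0 is exactly the mean value condition at c = a + t0 (b - a)
    return grad_f(a + t*(b - a)) @ (b - a) - target

lo, hi = 0.0, 1.0
for _ in range(60):              # bisection: here h(0) < 0 < h(1)
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if h(mid) < 0 else (lo, mid)

c = a + lo*(b - a)               # c = (1/2, 1/2) for this f
```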
In the one variable case one shows that the constant functions are the only functions whose
derivative is zero in an interval. We ask whether the same holds for functions from Rn → Rm . The
answer is given in the next result.

Theorem 6.1.2. Assume that A ⊆ Rn is an open set that satisfies: for every a, b ∈ A, there is
a sequence of points a1 , a2 , . . . , ak such that the segments [a, a1 ], [a1 , a2 ], . . . , [ai , ai+1 ], . . . , [ak , b]
are contained in A. Furthermore, if f : A → Rm is differentiable in A and Df (a) = 0 for every
a ∈ A, then f is constant.

Proof: Let f1 , f2 , . . . , fm be the coordinate functions of f , then each fi is differentiable in A and


Dfi (a) = 0 for every i = 1, 2, . . . , m and for every a ∈ A. It is enough to show that each fi is
constant. We need to show that for every a, b ∈ A, fi (a) = fi (b). Using the assumption about A,
there are a1 , a2 , . . . , ak so that [a, a1 ], [a1 , a2 ], . . . , [ak−1 , ak ], [ak , b] ⊆ A. Applying the mean value
theorem to [a, a1 ] we have that there is c ∈ [a, a1 ] so that fi (a1 ) − fi (a) = Dfi (c)(a1 − a) = 0,
since Dfi (x) = 0 for every x ∈ A; therefore fi (a) = fi (a1 ). Applying the same argument to each
segment one concludes that fi (a) = fi (a1 ) = · · · = fi (ak ) = fi (b), hence fi (a) = fi (b). 
Remark 6.1.1. It can be shown that if A ⊆ Rn is open and connected, then A satisfies the
condition in Theorem 6.1.2. Hence, that result can be stated for open connected subsets of Rn .

Corollary 6.1.3. Let A be as in Theorem 6.1.2. If f, g : A → Rn are differentiable functions


on A such that Df (a) = Dg(a) for every a ∈ A, then f = g + c, where c is a fixed element in
Rn .

Proof: Let F = f − g, then DF (a) = Df (a) − Dg(a) = 0 for every a ∈ A. Applying Theorem
6.1.2 one concludes that F = c for some constant c, so f = g + c. 

6.2 Taylor’s Theorem


Taylor’s theorem is a powerful result that can be used, among many other aspects, to represent
functions as polynomials in the one variable case. It can also be used to prove the results
concerning classification of critical points.

(Taylor’s Theorem, real case) If f : R → R is n + 1 times differentiable on the open
interval with end points a and x, then

f (x) = f (a) + f′(a)(x − a) + (f′′(a)/2)(x − a)^2 + · · · + (f^(n)(a)/n!)(x − a)^n + (f^(n+1)(ξ)/(n + 1)!)(x − a)^(n+1),

for some ξ in the interval determined by a and x.

Theorem 6.2.1. [Criteria for maxima or minima, real case] Assume that f has continuous
second derivative in an open interval containing a and f′(a) = 0. Under these conditions we
have:

1. if f′′(a) < 0, then f attains a maximum at a,

2. if f′′(a) > 0, then f attains a minimum at a,

3. if f′′(a) = 0, no conclusion is obtained.


Proof: From Taylor’s theorem, f (x) = f (a) + f′(a)(x − a) + (f′′(c)/2)(x − a)^2 , for some c in the
interval determined by a and x. Since f′′ is continuous at a, we can take x so that x − a is
small enough such that f′′(c) and f′′(a) have the same sign. Now, the assumption f′(a) = 0
guarantees that f (x) − f (a) = (f′′(c)/2)(x − a)^2 , hence the sign of f (x) − f (a) is the same as that
of f′′(a), that is, f (x) − f (a) ≤ 0 if and only if f′′(a) ≤ 0; hence, if f′′(a) < 0 then f (x) ≤ f (a)
for every x close enough to a, proving that f attains a maximum at a. The other cases are
obtained accordingly. 
For functions from Rn → R, we only state Taylor’s theorem for a second order approximation.
The general theorem is stated and proved, for instance, in [4].
As it was stated in Warning 5.3.1, the higher order derivatives of a function from Rn → R are
multilinear functions from Rn × Rn × · · · × Rn → R (the number of factors in Rn × Rn × · · · × Rn
is equal to the order of the derivative). Hence, without any further explanation, if f : Rn → R
is a differentiable function and if it has derivative Df (x) in an open set containing a, the second
derivative of f at a is denoted by D^2 f (a) : Rn × Rn → R and “declared” as D^2 f (a)(z, w) :=
Σ_{i,j=1}^n D_{i,j} f (a) z_i w_j , where z = (z1 , z2 , . . . , zn ) and w = (w1 , w2 , . . . , wn ). It is assumed that all
partial derivatives are continuous in an open subset containing a. Similarly, the third derivative
of f at a is “declared” to be D^3 f (a)(w, y, z) := Σ_{i,j,k=1}^n D_{i,j,k} f (a) w_i y_j z_k .

Theorem 6.2.2 (Taylor’s Theorem). Let f : D ⊆ Rn → R be a function and a, b ∈ D so that
f has continuous partial derivatives of order two in an open subset U containing the segment
[a, b] ⊆ U ⊆ D. Then f (b) = f (a) + Df (a)(b − a) + (1/2)D^2 f (c)(b − a, b − a), for some c ∈ [a, b].
2
Sketch of the Proof: Let γ : [0, 1] → Rn be given by γ(t) = a + t(b − a). Now apply the one
variable Taylor’s theorem to the function f ◦ γ : [0, 1] → R and use the chain rule. 
Remark 6.2.1. The general statement and proof of Taylor’s theorem is almost a matter of
notation, since the same idea works for the proof.
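A numerical illustration of the second order approximation: for f (x, y) = e^(x+y) at a = (0, 0), the remainder beyond the quadratic term is of third order, so halving the increment should divide the error by roughly 2^3 = 8. The sketch below is plain Python; the function and the point are our choices for illustration.

```python
import math

def f(x, y):
    # f(x, y) = e^(x+y); at a = (0, 0) its gradient is (1, 1)
    # and every entry of the Hessian equals 1
    return math.exp(x + y)

def taylor2(h1, h2):
    # f(a) + Df(a)(h) + (1/2) D^2 f(a)(h, h) at a = (0, 0)
    return 1.0 + (h1 + h2) + 0.5*(h1*h1 + 2*h1*h2 + h2*h2)

err1 = abs(f(0.1, 0.1) - taylor2(0.1, 0.1))
err2 = abs(f(0.05, 0.05) - taylor2(0.05, 0.05))
ratio = err1 / err2   # close to 8, consistent with a third-order remainder
```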

6.3 Maxima and Minima of Real valued Functions


One of the most important applications of differential calculus is to decide for maximum and
minimum attained by a function. In this section we present some methods to find maximum
and minimum of functions from Rn → R.

6.3.1 Second Derivative Method


Definition 6.3.1. Let f : A ⊆ Rn → R be a function and let a ∈ A. We say that f has a local
minimum at a if there is r > 0 so that f (x) ≥ f (a) for every x ∈ B(a, r) ∩ A. If the inequality
is reversed, that is, if f (x) ≤ f (a) for every x ∈ B(a, r) ∩ A, then f has a local maximum at a. If
neither one of the above inequalities holds for every x ∈ B(a, r) ∩ A, then a is called a saddle
point.

The function f (x, y) = x2 + y 2 has a local minimum at (0, 0), since f (x, y) ≥ f (0, 0) = 0 for
every (x, y).
In the one variable case, we have a result that tells us that the derivative vanishes at maximum
or minimum values of a function, the same result holds for the several variables case.
Theorem 6.3.1. Let f : A ⊆ Rn → R be a function that attains a maximum or minimum at
an interior point a ∈ A. Furthermore, if f is differentiable at a, then Df (a) = 0.
Proof: Since Df (a) is represented by the gradient ∇f (a) = (D1 f (a), D2 f (a), . . . , Dn f (a)),
it is enough to show that Di f (a) = 0 for every i = 1, 2, . . . , n. Let us assume that f
attains a maximum at a. Hence, for t small enough we have f (a + tei ) − f (a) ≤ 0. If
t > 0, then (f (a + tei ) − f (a))/t ≤ 0, therefore lim_{t→0+} (f (a + tei ) − f (a))/t ≤ 0. If t < 0, then
(f (a + tei ) − f (a))/t ≥ 0, thus lim_{t→0−} (f (a + tei ) − f (a))/t ≥ 0. Since the limit exists, one must have
Di f (a) = lim_{t→0} (f (a + tei ) − f (a))/t = 0. The case when f attains a minimum is treated in essentially
the same way. 
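A quick symbolic check of the theorem, using sympy (an assumed tool, in the spirit of the software mentioned in the introduction): the sample function f (x, y) = 4 − x^2 − y^2 attains its maximum at the origin, and its gradient vanishes there.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 4 - x**2 - y**2                       # attains its maximum at (0, 0)

grad = [sp.diff(f, v) for v in (x, y)]    # gradient (-2x, -2y)
grad_at_origin = [g.subs({x: 0, y: 0}) for g in grad]
```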
The above result will help us to find points where a function reaches a maximum or minimum,
hence it is important to have criteria to determine maxima and minima. The next result is
in order, but before stating it, we need to fix some terminology.
Definition 6.3.2. If a function f : Rn → R has continuous second order partial derivatives in
a neighborhood of a, the Hessian matrix of f at a is defined as the n × n matrix whose (i, j)
entry is ∂^2 f /∂x_i ∂x_j (a):

Hf (a) = ( ∂^2 f /∂x_i ∂x_j (a) )_{1≤i,j≤n} . (6.1)
Example 6.3.1. Given the function f (x, y) = (x^2 − 4)^2 + (y − 1)^2 find Hf (2, 1).
Discussion. It is clear that f has continuous partial derivatives of all orders at every point
(a, b) ∈ R2 . By computing the partial derivatives one has: D1 f (x, y) = 4x(x^2 − 4); D2 f (x, y) =
2(y − 1); hence D1,2 f (x, y) = D2,1 f (x, y) = 0, D2,2 f (x, y) = 2 and D1,1 f (x, y) = 12x^2 − 16.
Therefore, D1,1 f (2, 1) = 32. Finally, Hf (2, 1) is the diagonal matrix with diagonal entries
32 and 2.
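The computation in Example 6.3.1 can be reproduced with sympy, whose hessian helper builds the matrix (6.1) symbolically (sympy is an assumption; any of the tools mentioned in the introduction would serve):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = (x**2 - 4)**2 + (y - 1)**2

H = sp.hessian(f, (x, y))           # symbolic Hessian matrix of f
H_at_point = H.subs({x: 2, y: 1})   # evaluate at a = (2, 1)
```

The result is the diagonal matrix with entries 32 and 2, matching the computation by hand.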
Here we recall some terminology from linear algebra. Given an n × n symmetric matrix A and
X an n × 1 matrix, we define the quadratic form Q(X) = X^t AX. It is said that A is positive
definite (negative definite) if for every X ≠ 0, Q(X) > 0 (Q(X) < 0).
From Theorem 5.3.4, we have that Hf (a) is a real symmetric matrix, hence it does make sense
to talk about positiveness or negativeness of Hf (a). From linear algebra we know that all the
eigenvalues of Hf (a) are real. The next theorem characterizes positiveness of a matrix.

Theorem 6.3.2. A real symmetric matrix A is positive definite, if and only if all the eigenvalues
of A are positive.
The proof of Theorem 6.3.2 can be found in [3], p. 183.
Theorem 6.3.3. Let f : Rn → R be a function, a ∈ Rn . Assume that f has continuous
second order partial derivatives in an open ball B(a, r) and that Df (a) = 0. Then the following
statements hold.
1. If Hf (a) is positive definite, then f has a minimum at a.

2. If Hf (a) is negative definite, then f has a maximum at a.

3. If Hf (a) has both positive and negative eigenvalues, then f has a saddle point at a.


Proof: From Taylor’s theorem, Theorem 6.2.2, if x is close enough to a, then
f (x) = f (a) + Df (a)(x − a) + (1/2)D^2 f (c)(x − a, x − a), for some c in the segment determined by
a and x. The assumption Df (a) = 0 leads to

f (x) = f (a) + (1/2)D^2 f (c)(x − a, x − a), (6.2)

or f (x) − f (a) = (1/2)D^2 f (c)(x − a, x − a); hence the sign of f (x) − f (a) is the same as that of
D^2 f (c)(x − a, x − a). Since Hf (a) is given by the second order partial derivatives of f , the
continuity assumption on these partial derivatives guarantees that the function Q(x, c) =
x^t Hf (c)x is also continuous as a function of c (it is clearly continuous as a function of x).
Hence the sign of Q(x − a, c) is the same as that of Q(x − a, a) = (x − a)^t Hf (a)(x − a), for x
close enough to a, so the sign of f (x) − f (a) is the same as the sign of Q(x − a, a). Now, the sign
of Q(x − a, a) = (x − a)^t Hf (a)(x − a) is determined by the condition on Hf (a) being positive
definite or negative definite. The result follows directly from this. 
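In practice, Theorem 6.3.3 combined with Theorem 6.3.2 gives a classification recipe: compute the eigenvalues of Hf (a) at a critical point. A minimal numerical sketch, with numpy assumed and the helper name classify our own:

```python
import numpy as np

def classify(H, tol=1e-12):
    """Classify a critical point from the eigenvalues of its Hessian."""
    ev = np.linalg.eigvalsh(H)          # real eigenvalues of a symmetric matrix
    if np.all(ev > tol):
        return "minimum"                # positive definite (Theorem 6.3.2)
    if np.all(ev < -tol):
        return "maximum"                # negative definite
    if np.any(ev > tol) and np.any(ev < -tol):
        return "saddle"                 # indefinite
    return "inconclusive"               # some eigenvalue is (numerically) zero

# f(x, y) = x^2 + y^2 and f(x, y) = x^2 - y^2, both at the origin
min_case = classify(np.array([[2.0, 0.0], [0.0, 2.0]]))
saddle_case = classify(np.array([[2.0, 0.0], [0.0, -2.0]]))
```

The "inconclusive" branch corresponds exactly to the situation of Remark 6.3.1, where some eigenvalue vanishes and the test gives no information.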
Remark 6.3.1. If all the eigenvalues of Hf (a) are zero and ∇f (a) = 0, then nothing can be said
as the following examples show.
1. Let f (x, y) = x4 + y 4 then all the eigenvalues of Hf (0, 0) are zero, however, f has a
minimum at (0, 0).

2. If f (x, y) = x3 − y 3 , then all the eigenvalues of Hf (0, 0) are zero and f has a saddle point
at (0, 0).

3. The function f (x, y) = −x4 − y 4 has a maximum at (0, 0) but Hf (0, 0) is the zero matrix.
Exercise 6.3.1. For each of the given functions f , classify the points where Df (a) = 0.
1. f (x, y) = (x − 1)2 (x + 3)2 + (y − 1)(y + 2)2 .

2. f (x, y, z) = xy + yz + xyz.
3. f (x, y) = (x^2 + y^2 )e^(−(x^2 +y^2 )) .

4. f (x, y) = (x + y)2 − (x − y)2 .

5. f (x, y) = (x − 3)(y + 4)(x + y − 3)2 .

6.3.2 Lagrange Multipliers
In several optimization problems, it is necessary to identify maxima or minima of functions
subject to several constraints. There are several good references for approaching these problems.
One, which we recommend, is [6].
A simple example to illustrate a case where a function needs to be optimized is:
Example 6.3.2. Find the maxima and minima of the function f (x, y) = 2 − x^2 − 2y^2 on the
unit circle.
This example can be approached by writing the equation of the unit circle, x^2 + y^2 = 1, and
solving for y, that is, writing y = ±√(1 − x^2), and then evaluating the function f at points of
the form (x, ±√(1 − x^2)). Hence, we need to optimize the function g(x) = f (x, ±√(1 − x^2)) =
2 − x^2 − 2(1 − x^2 ) = x^2 in the interval [−1, 1]. Without any further calculation one sees that g(x)
has maximum values at x = ±1 and minimum value at x = 0. For these values of x, one has
the corresponding points on the unit circle (±1, 0) and (0, ±1); hence the function f attains:
a maximum of value f (±1, 0) = 2 − 1 = 1, and
a minimum of value f (0, ±1) = 2 − 2 = 0. See Figure 6.2.

Figure 6.2: The curve on the graph of f (blue) is the image of the unit circle (green) under f .
It can be noticed where the maximum and minimum occur when the constraint takes effect.

We notice that in Example 6.3.2 the constraint was given by the points where the function
g(x, y) = x^2 + y^2 − 1 is zero; then it was possible to solve for one of the variables and substitute
the value of y to evaluate the function f . By doing this, the problem of finding
maximum or minimum values of f is reduced to finding the maximum or minimum of a one variable
function. In general, this procedure fails, since in many cases it is not possible to solve for
one of the variables that appear in the constraint. When this happens, and there are “good
conditions” about differentiability, one has at one's disposal a powerful method, called the
Lagrange multiplier rule, which is based on the following result.
Theorem 6.3.4. [the Lagrange multiplier rule] Let U ⊆ Rn be an open set, and let f, gi : U ⊆
Rn → R be functions of class C^1 , i = 1, 2, . . . , m. Assume that x0 ∈ S0 = {x ∈ U | gi (x) = 0, i =
1, 2, . . . , m} satisfies: there is an open ball B(x0 , r) ⊆ Rn so that f (x0 ) ≥ f (x) or f (x0 ) ≤ f (x)
for every x ∈ B(x0 , r) ∩ S0 . Then

λ0 ∇f (x0 ) + λ1 ∇g1 (x0 ) + · · · + λm ∇gm (x0 ) = 0, (6.3)

for some real numbers λ0 , λ1 , λ2 , . . . , λm , not all zero, called Lagrange multipliers.

Remark 6.3.2. From some point of view, the Lagrange multiplier rule can be approached in the
following form. Finding a solution of equation (6.3) is equivalent to finding stationary points
of the function F : Rn × Rm → R given by:

F (x1 , x2 , . . . , xn , λ1 , . . . , λm ) = f (x1 , . . . , xn ) − (λ1 g1 (x1 , . . . , xn ) + · · · + λm gm (x1 , . . . , xn )), (6.4)

that is, it is needed to solve the equation ∇F (x1 , x2 , . . . , xn , λ1 , . . . , λm ) = (0, 0, . . . , 0, 0, . . . , 0).
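Remark 6.3.2 turns Example 6.3.2 into solving ∇F = 0. A sketch with sympy (an assumed tool) for f (x, y) = 2 − x^2 − 2y^2 with the single constraint g(x, y) = x^2 + y^2 − 1 = 0:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = 2 - x**2 - 2*y**2
g = x**2 + y**2 - 1                      # constraint g = 0 (unit circle)

F = f - lam*g                            # the function F of (6.4), with m = 1
eqs = [sp.diff(F, v) for v in (x, y, lam)]
sols = sp.solve(eqs, (x, y, lam), dict=True)

# constrained extreme values of f among the stationary points of F
values = sorted({f.subs(s) for s in sols})
```

The stationary points are (±1, 0) with λ = −1 and (0, ±1) with λ = −2, recovering the maximum 1 and the minimum 0 found by direct substitution.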


Recently, I read in [5] the following quote of Lagrange, dated in 1797.

“If a function of several variables should be a maximum or minimum and there


are between these variables one or several equations, then it will suffice to add
to the proposed function the functions that should be zero, each multiplied by an
undetermined quantity, and then to look for the maximum or the minimum as if
the variables were independent; the equations that one will find, combined with the
given equations, will serve to determine all the unknowns” (see [10]).

I think there is an analogy between the statement of the remark and the cited quote.
Proof: The proof of Theorem 6.3.4 will be given in several steps and follows the lines as in [5].
Step I. Notice that Equation 6.3 is equivalent to

∇f (x0 ) = λ1 ∇g1 (x0 ) + · · · + λm ∇gm (x0 ), (6.5)


if ∇g1 (x0 ), . . . , ∇gm (x0 ) are linearly independent, since in this case we can choose λ0 = 1 and
change signs if needed. In many instances, this is the way the theorem is stated.
We will show that if ∇f (x0 ), ∇g1 (x0 ), . . . , ∇gm (x0 ) are linearly independent, then x0 is not a
minimum nor a maximum of f . Since we are assuming that ∇f (x0 ), ∇g1 (x0 ), . . . , ∇gm (x0 ) are
linearly independent, then n ≥ m + 1. To fix ideas, and without loss of generality, we
may assume that x0 = 0, f (0) = 0 and n = m + 1. The first two conditions are justified by the
translations y = x − x0 and h(x) = f (x) − f (x0 ), since the gradient does not change under these
translations. The condition n = m + 1 can be obtained by adding linear constraints ⟨ai , x⟩ = 0
if needed, where the vectors ∇f (x0 ), ∇g1 (x0 ), . . . , ∇gm (x0 ), a1 , a2 , . . . , ak form a basis of Rn .
Step II. Let F = (f, g1 , . . . , gm ) : Rn → Rn be the function given in terms of its coordinate
functions; then F (0) = 0, F is differentiable in U and its derivative is represented by
the Jacobian JF (x), which is continuous by assumption, and whose rows are the gradients
∇f (x), ∇g1 (x), . . . , ∇gm (x).
Also, the differentiability of F implies

F (x) = DF (0)x + h(x), (6.6)

where h(x) satisfies lim_{x→0} h(x)/||x|| = 0.
It should be noticed that JF (0) is invertible, since its rows are linearly independent.

57
For every k ∈ N define
pk (x) = ||F (x) + (1/k^2 , 0, . . . , 0)||. (6.7)
It is clear that pk is continuous in U . Since B(0, 1/k) is compact, there is xk ∈ B(0, 1/k)
so that pk (xk ) ≤ pk (x) for every x ∈ B(0, 1/k).
Step III. Claim: ||xk || < 1/k. The proof will be accomplished by proving several statements.

1. lim_{x→0} ||x||^2 /||DF (0)x|| = 0. This equation follows from Theorem 4.1.13, p. 26.

2. lim_{x→0} ||F (x)||/||DF (0)x|| = 1. From Equation 6.6 and the triangle inequality we have

||F (x)||/||DF (0)x|| = ||DF (0)x + h(x)||/||DF (0)x|| ≤ (||DF (0)x|| + ||h(x)||)/||DF (0)x|| = 1 + ||h(x)||/||DF (0)x||. (6.8)

Since DF (0) is invertible, then again Theorem 4.1.13, p. 26 implies that there is m > 0 so that
m||x|| ≤ ||DF (0)x||. Thus

||F (x)||/||DF (0)x|| ≤ 1 + ||h(x)||/||DF (0)x|| ≤ 1 + ||h(x)||/(m||x||). (6.9)

The assumption on h implies lim_{x→0} ||F (x)||/||DF (0)x|| ≤ 1.
The triangle inequality also implies

||F (x)||/||DF (0)x|| = ||DF (0)x + h(x)||/||DF (0)x|| ≥ (||DF (0)x|| − ||h(x)||)/||DF (0)x|| = 1 − ||h(x)||/||DF (0)x||, (6.10)

hence lim_{x→0} ||F (x)||/||DF (0)x|| ≥ 1, since lim_{x→0} ||h(x)||/||DF (0)x|| = 0 (because m||x|| ≤ ||DF (0)x||).
On the other hand, we have

(||F (x)|| − 2||(||x||^2 , 0, . . . , 0)||)/||DF (0)x|| ≤ (||F (x) + (||x||^2 , 0, . . . , 0)|| − ||(||x||^2 , 0, . . . , 0)||)/||DF (0)x|| ≤ ||F (x)||/||DF (0)x||. (6.11)

Taking the limit in (6.11), and using statements 1 and 2, we have

lim_{x→0} (||F (x) + (||x||^2 , 0, . . . , 0)|| − ||(||x||^2 , 0, . . . , 0)||)/||DF (0)x|| = 1. (6.12)

From Equation 6.12 it follows that for k >> 1 and for each ||y|| = 1/k, ||F (y) + (||y||^2 , 0, . . . , 0)|| >
||(||y||^2 , 0, . . . , 0)||; thus for every ||y|| = 1/k we have pk (y) = ||F (y) + (||y||^2 , 0, . . . , 0)|| >
||(||y||^2 , 0, . . . , 0)|| = 1/k^2 = pk (0). From this we conclude that xk does not satisfy ||xk || = 1/k,
thus ||xk || < 1/k, proving Step III.
Step IV. For k >> 1 we have F (xk ) + (1/k^2 , 0, . . . , 0) = 0. In fact, we have that the Jacobian
JF (xk ) is invertible, since JF (x) is continuous and JF (0) is invertible. On the other hand,
pk^2 (x) = ⟨F (x) + (1/k^2 , 0, . . . , 0), F (x) + (1/k^2 , 0, . . . , 0)⟩. Computing the derivative of pk^2 (x),
using that xk is an interior point of the closed ball of radius 1/k, and evaluating at xk we have

2JF (xk )^t (F (xk ) + (1/k^2 , 0, . . . , 0)) = 0. (6.13)

The conclusion follows, since JF (xk ) has an inverse.
Finally, F (xk ) + (1/k^2 , 0, . . . , 0) = 0 implies

f (xk ) = −1/k^2 < 0 = f (0), gi (xk ) = 0, i = 1, 2, . . . , m, (6.14)

hence x = 0 was not a minimum of f . 
Remark 6.3.3. Lagrange’s multiplier rule is a very powerful result to search for maxima or
minima of real valued functions. We shall provide a few examples to show how it can be
applied.
Example 6.3.3. Find the closest point of the graph of the function h(x, y) = (x − 1)2 + (y − 1)2
to the origin.
Discussion: The problem can be stated in the following form: find the minimum of f (x, y, z) =
x2 + y 2 + z 2 subject to g(x, y, z) = z − (x − 1)2 − (y − 1)2 = 0.
Also, it can be stated as: find the minimum of the function f1 (x, y) = x^2 + y^2 + ((x − 1)^2 + (y − 1)^2 )^2 .
Notice that f1 is obtained from f by substituting the value of z, obtained from the constraint,
into f .
First approach. One needs to solve the system of equations

∇f (x, y, z) = λ∇g(x, y, z)
g(x, y, z) = 0

Equivalently, one needs to solve:

2x = −2λ(x − 1)
2y = −2λ(y − 1)
2z = λ
0 = z − (x − 1)^2 − (y − 1)^2

Doing calculations one shows that the solutions of this system are given by:

4(x − 1)^3 = −x
y = x
z = x/(2(1 − x))
λ = x/(1 − x).

To finish, it is necessary to solve the cubic 4(x − 1)^3 = −x. Asking Sagemath to solve the
equation, one gets that the solutions are: {1/2, (5 − √7 i)/4, (5 + √7 i)/4}. Taking the real solution
one has x = y = z = 1/2 and λ = 1, hence the minimum distance from the graph of z =
(x − 1)^2 + (y − 1)^2 to the origin is √(1/4 + 1/4 + 1/4) = √3/2.
Second approach. To find the closest point of the graph of h to the origin, we want to find a
minimum of f1 (x, y) = x^2 + y^2 + [(x − 1)^2 + (y − 1)^2 ]^2 .
We need to compute the points where ∇f1 (x, y) = (0, 0), that is, we need to solve the system:

∇f1 (x, y) = (2x + 4(x − 1)[(x − 1)^2 + (y − 1)^2 ], 2y + 4(y − 1)[(x − 1)^2 + (y − 1)^2 ]) = (0, 0) (6.15)

which is equivalent to

2x + 4(x − 1)[(x − 1)^2 + (y − 1)^2 ] = 0 (6.16)

2y + 4(y − 1)[(x − 1)^2 + (y − 1)^2 ] = 0 (6.17)

Multiplying the first equation by y and the second by −x, adding and factorizing one has:

[(x − 1)^2 + (y − 1)^2 ][4y(x − 1) − 4x(y − 1)] = 0, (6.18)

hence we must have (x − 1)^2 + (y − 1)^2 = 0 or 4y(x − 1) − 4x(y − 1) = 0. In the first case one
gets x = y = 1, which is not a solution of (6.17). From the second case one obtains x = y. Now,
substituting into one of the equations (6.16)–(6.17) leads to

x + 4(x − 1)^3 = 0, (6.19)

which is the main equation in the previous method.
We already know its solutions: {1/2, (5 − √7 i)/4, (5 + √7 i)/4}. Hence, the point where f1 reaches a
maximum or minimum is x0 = (1/2, 1/2). The next step is to compute the Hessian matrix of
f1 , evaluate it at x0 and decide if it is positive or negative definite.
In general, the Hessian matrix of f1 is:

Hf1 (x, y) = ( 2 + 12(x − 1)^2 + 4(y − 1)^2      8(x − 1)(y − 1)
              8(x − 1)(y − 1)                   2 + 12(y − 1)^2 + 4(x − 1)^2 ).

Evaluating at x0 = (1/2, 1/2) one has

Hf1 (1/2, 1/2) = ( 6  2
                   2  6 ).

It can be verified that it is positive definite, hence f1 attains a minimum at x0 of value
f1 (1/2, 1/2) = 1/4 + 1/4 + (1/2)^2 = 3/4, hence the minimum distance from the graph of
z = (x − 1)^2 + (y − 1)^2 to the origin is √3/2, just as before! 
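Both approaches in Example 6.3.3 hinge on the cubic 4(x − 1)^3 = −x. A numerical cross-check with numpy (an assumed tool; the notes used Sagemath for the same cubic):

```python
import numpy as np

# 4(x-1)^3 = -x expands to 4x^3 - 12x^2 + 13x - 4 = 0
roots = np.roots([4.0, -12.0, 13.0, -4.0])
real_roots = roots[np.abs(roots.imag) < 1e-10].real

x0 = real_roots[0]                      # the only real root, x = 1/2
y0 = x0                                 # y = x on the solution set
z0 = (x0 - 1)**2 + (y0 - 1)**2          # the point lies on the graph of h
dist = np.sqrt(x0**2 + y0**2 + z0**2)   # distance to the origin, sqrt(3)/2
```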

6.3.3 Inverse and Implicit Function Theorem
In the one variable case, the problem of computing the derivative of the inverse of a function is
solved with a couple of assumptions, as stated in Theorem 6.3.5. The several variables case
is more subtle. In this section we will be dealing with the inverse of a function f : Rn → Rn .

Theorem 6.3.5. Let f : [a, b] → R be a continuous, one to one function and let x0 ∈]a, b[
be so that f is differentiable at x0 and f′(x0 ) ≠ 0. Then f −1 is differentiable at f (x0 ) and
(f −1 )′(f (x0 )) = 1/f′(x0 ).
f (x0 )
Proof: Set g = f −1 . We need to prove that lim_{h→0} (g(f (x0 ) + h) − g(f (x0 )))/h exists and is equal to
1/f′(x0 ). If h is small enough, by the intermediate value theorem, there exists k, depending on h,
so that f (x0 ) + h = f (x0 + k), see Figure 6.3.

Figure 6.3: By the intermediate value theorem, there is k so that f (x0 + k) = f (x0 ) + h

From this equation one has

(g(f (x0 ) + h) − g(f (x0 )))/h = (g(f (x0 + k)) − g(f (x0 )))/h
= (x0 + k − x0 )/(f (x0 + k) − f (x0 ))
= k/(f (x0 + k) − f (x0 ))
= 1/((f (x0 + k) − f (x0 ))/k).

Notice that from the equation f (x0 ) + h = f (x0 + k), the continuity of f and the one to one
assumption on f , one has h → 0 if and only if k → 0. Hence, the conclusion follows by taking
the limit and applying the assumption f′(x0 ) ≠ 0. 
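The formula (f −1)′(f (x0 )) = 1/f′(x0 ) can be tested numerically. The sketch below (plain Python; the sample function f (x) = x^3 + x is our choice, not from the notes) inverts f by bisection and compares a central difference quotient of f −1 at f (1) = 2 against 1/f′(1) = 1/4.

```python
def f(x):
    # f(x) = x^3 + x is continuous and strictly increasing, so it has an inverse
    return x**3 + x

def f_inv(y, lo=-10.0, hi=10.0):
    # invert f by bisection, using that f is increasing
    for _ in range(80):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < y else (lo, mid)
    return lo

x0 = 1.0
y0 = f(x0)                 # y0 = 2, and f'(x0) = 3*x0^2 + 1 = 4
h = 1e-6
inv_deriv = (f_inv(y0 + h) - f_inv(y0 - h)) / (2*h)   # close to 1/4
```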
Remark 6.3.4. The assumption f′(x0 ) ≠ 0 is crucial, since otherwise the derivative of f −1 does
not exist. Arguing by contradiction: if it exists, by the chain rule the derivative of f −1 ◦ f (x) = x
at x0 is given by (f −1 )′(f (x0 ))f′(x0 ) = 1, which is impossible, since f′(x0 ) = 0.

Figure 6.4: The graph of f and the graph of f −1

Remark 6.3.5. A geometric method to sketch the graph of the inverse of a function f is based
on the property: if (x, f (x)) is a point on the graph of f , then the corresponding point on the
graph of f −1 is (f (x), x).
In fact, set y = f (x); then f −1 (y) = x. This must be interpreted as follows: over the horizontal
axis, locate the point y; then evaluating f −1 at that point gives f −1 (y) = x. Hence if we call
P = (x, f (x)), then the corresponding point on the graph of f −1 is Q = (y, f −1 (y)) = (f (x), x),
see Figure 6.4. A geometric construction of the graph of f −1 is given by showing that the
triangle OP Q is isosceles and Q is the reflection of P through the line y = x. In fact, we have
x^2 + f (x)^2 = y^2 + (f −1 (y))^2 , hence OP = OQ, showing that the triangle OP Q is isosceles. Also,
the segment P Q has slope equal to m = (x − f (x))/(f (x) − x) = −1, so it is perpendicular to the line y = x,
which has slope equal to 1. Hence, Q is the reflection of P through the line y = x. This method
is useful to construct the graph of f −1 without knowing the function f −1 explicitly.
An interesting example is: let f (x) = x + 1/(x^2 + 1); then f is strictly increasing, since f′(x) =
1 − 2x/(x^2 + 1)^2 > 0, as can be verified. From this we have that f has an inverse, which can be
obtained explicitly by solving the equation f (x) = x + 1/(x^2 + 1) = y for x in terms of y. It should
be noticed that the mentioned equation is cubic and can be solved by using Cardano’s formulas.
The graphs of f and f −1 are shown in Figure 6.5.
More examples of this sort can be approached by using almost any software that produces
graphs of functions. We invite you to produce a few more.

Figure 6.5: The graphs of f (x) = x + 1/(x^2 + 1) and of f −1

The main result concerning the derivative of the inverse of f : Rn → Rn is given by the inverse
function theorem. We start the discussion of this result with the:
Theorem 6.3.6. Let f : Rn → Rn be a function which has continuous first order partial
derivatives in the closed ball with center at X0 : B = {X | ||X − X0 || ≤ r}. Then there exists
M > 0 so that ||f (X) − f (Y )|| ≤ n^2 M ||X − Y || for every X, Y ∈ B.
Proof: Let f = (f1 , f2 , . . . , fn ) be the coordinate functions of f . Since B is compact and Dj fi is
continuous in B for every i, j = 1, 2, . . . , n, the image of B under each Dj fi is a compact
subset of R, hence there exists Mij > 0 so that |Dj fi (X)| ≤ Mij for every X ∈ B. Take M
equal to the greatest of all of the Mij ; then we have

|Dj fi (X)| ≤ M for every i, j = 1, 2, . . . , n and for every X ∈ B (6.20)


Let X, Y ∈ B, say X = (x1 , x2 , . . . , xn ) and Y = (y1 , y2 , . . . , yn ).

Figure 6.6: Points Yj in the closed ball centered at X0

For each j = 1, 2, . . . , n, set¹ Yj = (y1 , y2 , . . . , yj , xj+1 , . . . , xn ), with Y0 = X; then for each
i = 1, 2, . . . , n one has

fi (Y ) − fi (X) = Σ_{j=1}^n [fi (Yj ) − fi (Yj−1 )]. (6.21)

¹ According to Figure 6.6, the choice of Yj has to be made more carefully, since in some cases Yj ∉ B.

We will closely examine each term in the sum:

fi (Yj ) − fi (Yj−1 ) = fi (y1 , y2 , . . . , yj , xj+1 , . . . , xn ) − fi (y1 , y2 , . . . , yj−1 , xj , xj+1 , . . . , xn ). (6.22)

Set yj − xj = hj and ϕ(t) := fi (y1 , y2 , . . . , yj−1 , t + xj , xj+1 , . . . , xn ). By the assumption on Dj fi ,
we can apply the mean value theorem to ϕ(t) in the interval [0, hj ], obtaining that there is
cj ∈ [0, hj ] so that

fi (Yj ) − fi (Yj−1 ) = ϕ(hj ) − ϕ(0) = hj ϕ′(cj ) = hj Dj fi (Zj ) = (yj − xj )Dj fi (Zj ), (6.23)

where Zj = (y1 , y2 , . . . , yj−1 , cj + xj , xj+1 , . . . , xn ). Hence from equations (6.21) and (6.23) we
obtain:

fi (Y ) − fi (X) = Σ_{j=1}^n [fi (Yj ) − fi (Yj−1 )] = Σ_{j=1}^n (yj − xj )Dj fi (Zj ). (6.24)

Applying the triangle inequality, inequality (6.20) and the fact that |yj − xj | ≤ ||Y − X||, one
has:

|fi (Y ) − fi (X)| = |Σ_{j=1}^n (yj − xj )Dj fi (Zj )| ≤ nM ||Y − X||. (6.25)

On the other hand, we have that ||(x1 , x2 , . . . , xn )|| ≤ |x1 | + |x2 | + · · · + |xn |, which can be
verified by squaring both sides of the inequality; hence, using this and (6.25), one has
||f (X) − f (Y )|| ≤ |f1 (Y ) − f1 (X)| + |f2 (Y ) − f2 (X)| + · · · + |fn (Y ) − fn (X)| ≤ n^2 M ||Y − X||,
proving what was claimed. 
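The Lipschitz-type bound of Theorem 6.3.6 can be probed numerically. The sketch below (numpy assumed, and the map f (x, y) = (sin x + y^2, xy) is our sample choice) draws pairs of points in the closed unit ball centered at the origin, bounds all partial derivatives on the ball by M = 2, and checks ||f (X) − f (Y )|| ≤ n^2 M ||X − Y ||.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2

def f(p):
    x, y = p
    return np.array([np.sin(x) + y**2, x*y])

# partials on the ball ||X|| <= 1: |cos x| <= 1, |2y| <= 2, |y| <= 1, |x| <= 1,
# so every |Dj fi| is bounded by M = 2
M = 2.0

ok = True
for _ in range(1000):
    # sample points and project them into the closed unit ball
    X = rng.uniform(-1, 1, n); X /= max(1.0, np.linalg.norm(X))
    Y = rng.uniform(-1, 1, n); Y /= max(1.0, np.linalg.norm(Y))
    if np.linalg.norm(f(X) - f(Y)) > n**2 * M * np.linalg.norm(X - Y) + 1e-12:
        ok = False
```

The constant n^2 M is generous; the check is only a sanity test of the inequality, not a proof.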
The next result is known as the Inverse Function Theorem and it will solve the problem of
deciding when a function has an inverse.

Theorem 6.3.7 (Inverse Function Theorem). Let f : Rn → Rn be a continuously differentiable


function in an open set that contains the point a. Furthermore, assume that Df (a) is not
singular. Then there exists an open set V which contains a and an open set W that contains
f (a) so that f : V → W has inverse f −1 which is differentiable at any y ∈ W and Df −1 (y) =
(Df (f −1 (y)))−1 .

Proof: The proof is going to be given in several steps.


First. Reduction to the case Df (a) is the identity linear transformation. Set T = Df (a); since
Df (a) is not singular, there exists T −1 , which is linear and hence differentiable, so T −1 ◦ f is differentiable at a.
By applying the chain rule one has D(T −1 ◦ f )(a) = DT −1 (f (a)) ◦ Df (a) = T −1 ◦ Df (a) =
I, the identity linear transformation. Now it is clear that if the result holds for T −1 ◦ f , then
the result holds for f , since T is linear and bijective.
Second. We shall prove that there are open sets U and W so that f : U → W has a continuous
inverse. By the first step, we may assume that Df (a) = T is the identity. If h is such
that f (a + h) = f (a), then ||f (a + h) − f (a) − T (h)||/||h|| = ||h||/||h|| = 1. Also, by assumption
lim_{h→0} ||f (a + h) − f (a) − T (h)||/||h|| = 0, hence there must be a closed ball B, centered at a, so that

f (x) ≠ f (a) for every x ∈ B \ {a}. (6.26)

By the continuity of the partial derivatives of the coordinate functions of f and the assumption
on Df (a) being non singular, we may choose B so that:

det (Df (x)) ≠ 0 for every x ∈ B (6.27)

|Dj fi (x) − Dj fi (a)| < 1/(2n^2 ) for every x ∈ B (6.28)
Claim. f is injective in B. Let ϕ(x) := f (x) − x; then the coordinate functions of ϕ are
given by ϕi (x) = fi (x) − xi , where f = (f1 , f2 , . . . , fn ) and x = (x1 , x2 , . . . , xn ). Hence, one
has Dj ϕi (x) = Dj fi (x) − δij , where δij = 1 if i = j and δij = 0 otherwise.
On the other hand, we are assuming that Df (a) = I, the identity linear transformation, and
this linear transformation is represented by the identity matrix, that is, Dj fi (a) = δij . One
concludes that Dj ϕi (x) = Dj fi (x) − Dj fi (a). Applying Theorem 6.3.6 to ϕ(x) with M = 1/(2n^2 )
one has
||f (x) − x − f (y) + y|| ≤ ||x − y||/2; now applying the triangle inequality we obtain
||x − y|| − ||f (x) − f (y)|| ≤ ||x − y||/2 and from this we have:

||x − y|| ≤ 2||f (x) − f (y)||, for every x, y ∈ B. (6.29)

The injectivity of f in B follows directly from inequality (6.29).
We also have that the boundary of B, Fr(B), is a compact set; hence f(Fr(B)) is compact,
and from (6.26) one has that f(a) ∉ f(Fr(B)). Hence there is d > 0 such that
||f(x) − f(a)|| ≥ d for every x ∈ Fr(B). Set r = d/2 and W = B(f(a), r). If y ∈ W and
x ∈ Fr(B), then by the triangle inequality

d ≤ ||f(x) − f(a)|| ≤ ||f(x) − y|| + ||f(a) − y|| ≤ ||f(x) − y|| + d/2,

hence

||f(a) − y|| < d/2 ≤ ||f(x) − y||.    (6.30)
Third. f : U → W is onto, where U is the interior of B. Given y ∈ W, define g : B → R by
g(x) = ||y − f(x)||². It is straightforward to see that g is continuous; hence, since B is
compact, g attains a minimum on B. From (6.30) we conclude that if x is on the boundary of B,
then g(a) < g(x), hence the minimum of g is attained in U, which is open. Since g is
differentiable, we must have Dg(x0) = 0 at the minimizer x0 ∈ U. Notice that the j-th partial
derivative of g is given by:

Dj g(x0) = −2 Σ_{i=1}^{n} (yi − fi(x0)) Dj fi(x0).    (6.31)

Also, Dg(x0) = 0 is equivalent to Dj g(x0) = 0 for every j = 1, 2, . . . , n. From (6.27) we have
that the matrix (Dj fi(x0)) is invertible, hence the linear system given by (6.31) has only the
trivial solution, that is, yi − fi(x0) = 0 for every i = 1, 2, . . . , n; thus y = f(x0), proving
that f maps U onto W.

Set V = U ∩ f −1 (W ), since f is continuous and W is open, then V is the intersection of two
open sets, hence V is also open. Therefore, we have proved that f : V → W has an inverse.
It follows directly from (6.29), that f −1 is continuous in W .
Fourth. It only remains to prove that f⁻¹ is differentiable in W and that its derivative is the
inverse of the derivative of f. In order to prove that f⁻¹ has T⁻¹ = Df(x1)⁻¹ as its derivative
at y1, we need to prove that

lim_{y→y1} [f⁻¹(y) − f⁻¹(y1) − T⁻¹(y − y1)] / ||y − y1|| = 0,

where T = Df(x1), f(x1) = y1 and y = f(x) for some x, since f is bijective. Also notice that
y → y1 if and only if x → x1, by the continuity of both f and f⁻¹. Set
ϕ(y) = f⁻¹(y) − f⁻¹(y1) − T⁻¹(y − y1); then, applying T to ϕ(y)/||y − y1||, one has

T(ϕ(y)/||y − y1||) = T([f⁻¹(y) − f⁻¹(y1) − T⁻¹(y − y1)] / ||y − y1||)
                   = [T(x − x1) − (f(x) − f(x1))] / ||f(x) − f(x1)||
                   = ([T(x − x1) − (f(x) − f(x1))] / ||x − x1||) · (||x − x1|| / ||f(x) − f(x1)||).

Now, the first factor on the right approaches zero, by the differentiability of f at x1, while
the second is bounded by 2 (inequality (6.29)), hence the right-hand side approaches zero when
y → y1. Since T is injective, one concludes that

lim_{y→y1} ϕ(y)/||y − y1|| = 0,

as needed, finishing the proof of the theorem. □
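The conclusion Df⁻¹(y) = (Df(f⁻¹(y)))⁻¹ can be checked numerically with the digital tools mentioned in the introduction. The following sketch, written in Python with NumPy (the polar-type map, the sample point and the finite-difference helper `jacobian` are choices of ours, not part of the theorem), compares the inverse of the Jacobian of f(r, θ) = (r cos θ, r sin θ) with a finite-difference Jacobian of its explicit inverse.

```python
import numpy as np

def f(p):
    r, t = p
    return np.array([r * np.cos(t), r * np.sin(t)])

def f_inv(q):
    # explicit inverse of f away from the origin
    u, v = q
    return np.array([np.hypot(u, v), np.arctan2(v, u)])

def jacobian(g, p, h=1e-6):
    # central finite-difference Jacobian of g at p; column j holds the j-th partials
    p = np.asarray(p, dtype=float)
    cols = []
    for j in range(p.size):
        e = np.zeros(p.size)
        e[j] = h
        cols.append((g(p + e) - g(p - e)) / (2 * h))
    return np.column_stack(cols)

a = np.array([2.0, 0.7])           # a point where Df(a) is nonsingular (r != 0)
J_inv_thm = np.linalg.inv(jacobian(f, a))   # (Df(f^{-1}(y)))^{-1} with y = f(a)
J_inv_num = jacobian(f_inv, f(a))           # Df^{-1}(y) computed directly

print(np.round(J_inv_thm, 6))
print(np.round(J_inv_num, 6))
assert np.allclose(J_inv_thm, J_inv_num, atol=1e-5)
```

Away from r = 0 the determinant of Df(r, θ) equals r, so the nonsingularity hypothesis of the theorem holds at the chosen point.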


In several mathematical problems, as well as in applied ones, there are relations among several
variables such that, under suitable assumptions, one of the variables can be obtained in terms
of the rest. For example, consider the equation x² + y² + z² − 1 = 0, which represents the
sphere of radius one centered at the origin. This equation can be thought of as the level set
at zero of the function f(x, y, z) = x² + y² + z² − 1, that is, f(x, y, z) = 0. Solving, for
instance, for z, one has z = ±√(1 − x² − y²). Taking one sign, say the positive one, we have
z = g(x, y) = √(1 − x² − y²), hence f(x, y, g(x, y)) = 0; we also have that the graph of g is
the upper half of the sphere. Moreover, g is a differentiable function of x, y whenever
x² + y² < 1. On the other hand, when x² + y² = 1 one has z = 0 and ∂f/∂z = 2z = 0. Hence,
whenever ∂f/∂z ≠ 0, the condition f(x, y, z) = x² + y² + z² − 1 = 0 defines z implicitly as a
function of x and y.
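For this example, the implicit derivative ∂z/∂x = −(∂f/∂x)/(∂f/∂z) can be verified symbolically, in the spirit of the computational tools mentioned in the introduction. The following sketch uses Python with SymPy (the variable names are our own choices):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2 - 1
g = sp.sqrt(1 - x**2 - y**2)          # the explicit solution z = g(x, y), positive branch

# implicit formula: dz/dx = -f_x / f_z, evaluated on z = g(x, y)
dz_dx_implicit = (-sp.diff(f, x) / sp.diff(f, z)).subs(z, g)
# derivative computed directly from the explicit solution
dz_dx_explicit = sp.diff(g, x)

print(sp.simplify(dz_dx_implicit - dz_dx_explicit))  # → 0
```

The same check works for ∂z/∂y, and it only makes sense where ∂f/∂z = 2z ≠ 0, that is, off the equator x² + y² = 1.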
The next result gives a general view concerning the case when an equation of the form f (x1 , x2 , . . . , xn ) =
0 can be solved for one of the variables.

Figure 6.7: The equation x² + y² + z² − 1 = 0 defines a function whose graph is the upper half
of the sphere of radius one with center at the origin

Theorem 6.3.8 (Implicit Function Theorem). Suppose f : Rn × Rm → Rm, with coordinate
functions f1, f2, . . . , fm, is of class C¹ in an open set W ⊆ Rn × Rm containing (a, b).
Furthermore, assume that:

1. f(a, b) = 0;

2. the matrix A(a, b) = (Dn+j fi(a, b)), 1 ≤ i, j ≤ m, is nonsingular.

Then:

1. there are open sets U ⊆ Rn and V ⊆ Rm, with a ∈ U and b ∈ V, and a function l : U → V
such that f(x, l(x)) = 0 for every x ∈ U;

2. l is differentiable in U and Dl(x) is represented by the matrix −A(x, l(x))⁻¹A1(x), where
A1(x) = (Di fj(x, l(x))) with 1 ≤ i ≤ n and 1 ≤ j ≤ m.

Proof: 1) In the following argument we will use:

Fact. If

M = ( B  O
      O  C )

is a matrix defined by blocks, where B and C are square matrices and O denotes the zero matrix,
then |M| = |B||C|. This can be proved by computing the determinant of M using elementary
operations on the rows of M.
Define F : Rn × Rm → Rn × Rm by F(x, y) = (x, f(x, y)). Here we need to be a little careful
with the notation, since the pair (x, y) consists of an n-tuple and an m-tuple. If
x = (x1, x2, . . . , xn), then the coordinate functions of F are Fi(x, y) = xi for
i = 1, 2, . . . , n and Fn+i(x, y) = fi(x, y) for i = 1, 2, . . . , m. Since each Fi is linear for
i = 1, 2, . . . , n and each fi is differentiable, the function F is differentiable in the same
open set where f is. Moreover, the Jacobian of F at (a, b) is

JF(a, b) = ( I  O
             O  A ),

as can be verified by recalling what the coordinate functions of F are. Applying the Fact
above, the determinant of JF(a, b) equals |A|, which is nonzero; hence JF(a, b) is nonsingular,
and therefore F satisfies the assumptions of the Inverse Function Theorem. Thus there exist an
open set W′ ⊆ Rn × Rm containing F(a, b) = (a, f(a, b)) = (a, 0), and an open set D, which can
be taken to be of the form D = U × V, so that F : D → W′ has an inverse H : W′ → U × V of the
form H(x, y) = (x, g(x, y)) for some differentiable function g (convince yourself that H has
this form by using the definition of the inverse of a function). Let π : Rn × Rm → Rm be given
by π(x, y) = y. Then π ∘ F = f. From this equation one has:

f (x, g(x, y)) = f ◦ H(x, y)


= π ◦ F ◦ H(x, y)
= π(x, y)
= y,

hence f (x, g(x, 0)) = 0. Setting l(x) = g(x, 0) one has that l is differentiable and f (x, l(x)) = 0.
(2) It only remains to prove that Dl(x) = −A(x, l(x))⁻¹A1(x). Recall that for every i we have
πi : Rn → R given by πi(x1, . . . , xi, . . . , xn) = xi. Let l1, . . . , lm be the coordinate
functions of l; hence the coordinate functions of h(x) = (x, l(x)) are
π1, . . . , πn, l1, . . . , lm. Set g1 = f ∘ h; then g1 is differentiable and g1(x) = 0 for every
x ∈ U. Applying the chain rule one has:

0 = Dg1(x) = Df(h(x)) ∘ Dh(x).    (6.32)

To finish the proof, write (6.32) in matrix form using partial derivatives, that is,

0 = [A1 A2] ( In
              B2 ) = A1 + A2B2,    (6.33)

where:
A1 = (Di fj), 1 ≤ i ≤ n and 1 ≤ j ≤ m;
A2 = (Dn+k fr), 1 ≤ k ≤ m and 1 ≤ r ≤ m; and
B2 = (Di lj), 1 ≤ i ≤ n and 1 ≤ j ≤ m.
We notice that A2 is nonsingular at (a, b); the assumption that f is of class C¹ implies that
A2 is nonsingular in an open ball around (a, b), hence from (6.33) it follows that

B2 = −A2⁻¹A1,    (6.34)

finishing the proof of the theorem. □
Remark 6.3.6. It is important to notice that even though we do not know the function l
explicitly, we are able to compute its derivative, which contains very important information
about l, especially for performing a qualitative analysis of l.
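This remark can be illustrated numerically: even without a closed form for l, the formula Dl(x) = −A2⁻¹A1 is computable. The sketch below (Python with NumPy; the function f and the sample point are choices of ours) takes f(x, y, z) = z³ + z − x − y with n = 2, m = 1, where z = l(x, y) exists and is unique because z ↦ z³ + z is strictly increasing, and compares the formula with a finite-difference gradient of l.

```python
import numpy as np

def l(x, y):
    # the unique real root of z**3 + z - (x + y) = 0, found numerically
    roots = np.roots([1.0, 0.0, 1.0, -(x + y)])
    return float(roots[np.abs(roots.imag) < 1e-9].real[0])

x0, y0 = 0.8, -0.3
z0 = l(x0, y0)

# theorem's formula: Dl = -A2^{-1} A1 with A2 = f_z = 3 z^2 + 1 and A1 = (f_x, f_y) = (-1, -1)
grad_formula = -np.array([-1.0, -1.0]) / (3 * z0**2 + 1)

# central finite-difference gradient of l at (x0, y0)
h = 1e-6
grad_numeric = np.array([(l(x0 + h, y0) - l(x0 - h, y0)) / (2 * h),
                         (l(x0, y0 + h) - l(x0, y0 - h)) / (2 * h)])

print(grad_formula, grad_numeric)
assert np.allclose(grad_formula, grad_numeric, atol=1e-6)
```

Note that A2 = 3z² + 1 ≥ 1 is nonsingular everywhere for this f, so the hypotheses of Theorem 6.3.8 hold at every point.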

6.4 Exercises
1. Let f : Rn → Rn be a function which has derivative everywhere and Df (a) has rank n for
some a ∈ Rn . Let y ∈ Rn be a fixed element. Set ϕ(x) = x + Df (a)−1 (y − f (x)). Show
that ϕ is differentiable everywhere and compute Dϕ(x).

2. Assume that f : Rn → Rn is continuously differentiable in an open ball B(a, r) and Df(a)
is injective. Show that given ε > 0, there is 0 < d ≤ r such that |Dj fi(x) − Dj fi(a)| < ε and
det(Df(x)) ≠ 0 for every x ∈ B(a, d), where f = (f1, f2, . . . , fn).

3. Let f : R2 → R be a continuous function. Could f be one to one?

4. (One version of the mean value theorem) Let f : D ⊆ Rn → Rm be a function which is


differentiable on the segment [a, b] ⊆ D. For a given y ∈ Rm , there is c ∈ [a, b] so that
hf (b) − f (a), yi = hDf (c)(b − a), yi.

5. (One more version of the mean value theorem) Let f : D ⊆ Rn → Rm be a function which
is differentiable on the segment [a, b] ⊆ D. Then there exists a linear transformation
T : Rn → Rm so that f (b) − f (a) = T (b − a).

6. (Still one more version of the mean value theorem) Let Ax + By + Cz + D = 0 be the
equation of a plane with C ≠ 0. Let f : R2 → R be a function which satisfies:

(a) For a subset D ⊆ R2, which has nonempty interior, f is continuous in the closure of
D.
(b) f is differentiable in the interior of D.
(c) There exists a bounded subset F so that the closure of F is contained in D and the
points (x, y) on the boundary of F satisfy f(x, y) = −(Ax + By + D)/C.

Then there exists a point (x0, y0) in the interior of F so that D1 f(x0, y0) = −A/C and
D2 f(x0, y0) = −B/C. Geometrically, this means that the tangent plane to the graph of f at
(x0, y0, f(x0, y0)) is parallel to the plane of equation Ax + By + Cz + D = 0 with C ≠ 0.
Generalize the result to functions from Rn to R.

7. Show that the function f(x) = x⁵ + 3x³ + x + 1 has an inverse which is also differentiable.
Compute the derivative of the inverse of f and sketch the graph of f⁻¹.

8. In Figure 6.8 points D and E satisfy CD + BE = BC. In what position, do these points
satisfy that ED has minimum length?

Figure 6.8: In triangle ABC, points D and E satisfy CD + BE = BC

9. Find the intervals where the function f (x) = x3 − x has inverse and sketch its graph.

10. Construct several examples of functions from Rn to Rn which have inverses (do not forget
to consider particular cases, that is, n = 1, 2, 3, and so on). Notice that probably the easiest
way to construct such functions is to consider linear transformations, since in that case you
have many methods at your disposal to verify that a linear transformation has an inverse.
 
11. Let

A = ( a  b
      b  c )

be a real matrix. Show that A is:

(a) positive definite if and only if ac − b² > 0 and a > 0;

(b) negative definite if and only if ac − b² > 0 and a < 0.

12. Apply Exercise 11 to state criteria to classify maximum or minimum of functions f : R2 →


R in terms of second order partial derivatives.

13. Generalize Example 6.3.3 in the following sense: first, consider the function
h(x, y) = (x − a)² + (y − a)², a ≠ 0, and find the point on the graph of h which is closest to
the origin (0, 0, 0). Second, consider the function
h(x1, x2, . . . , xn) = (x1 − a)² + (x2 − a)² + · · · + (xn − a)² and find the point on its graph
closest to the origin in Rn+1.

14. Assume that f : R2 → R is continuous. Show that f has no inverse. Is the same result
true for a function f : Rn → Rm with m < n?

15. Assume that f : R → R satisfies f′(x) ≠ 0 for every x. Show that f has an inverse.

16. Let f : R2 → R2 be given by f(x, y) = e^x (cos(y), sin(y)). Show that Df(x, y) is
nonsingular for every (x, y); however, f is not one to one. Where is f one to one?
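A quick numerical exploration of this exercise (a sketch in Python with NumPy; the sample point is arbitrary, and this is not a substitute for the proof the exercise asks for):

```python
import numpy as np

# f(x, y) = e^x (cos y, sin y), the complex exponential viewed as a map R^2 -> R^2
def f(x, y):
    return np.array([np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)])

def jac(x, y):
    ex = np.exp(x)
    return np.array([[ex * np.cos(y), -ex * np.sin(y)],
                     [ex * np.sin(y),  ex * np.cos(y)]])

x, y = 0.3, 1.2
# det Jf(x, y) = e^{2x} > 0, so Df is nonsingular everywhere
assert np.isclose(np.linalg.det(jac(x, y)), np.exp(2 * x))

# yet f is not one to one: y and y + 2*pi have the same image
assert np.allclose(f(x, y), f(x, y + 2 * np.pi))
```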

17. Find the smallest value of x2 + y 2 subject to the constraint y + 3x = 3.

18. Find the maximum and minimum values of the function f(x, y) = x² + 2y² on the disk
x² + y² ≤ 1.
19. Find the extremal values of the function f(x, y) = xy subject to the constraint
x²/8 + y²/2 − 1 = 0.

20. Find the dimensions of a cardboard box of maximum volume, given that the surface area of
the box is fixed.

21. Find the extremal values of xy + xz + yz subject to the constraint x + y + z = 3.


22. Find the closest and furthest points between the circle x² + y² = 1 and the ellipse
(x − 4)²/4 + (y − 4)²/3 = 1. Draw a picture to represent your result.
23. (Cobb-Douglas Model) A manufacturer knows that his production is given by the Cobb-
Douglas function f(x, y) = 100x^{3/4}y^{1/4}, where x represents the number of units of labor
and y represents the number of units of capital. Each labor unit costs $200 and each capital
unit costs $250. The total expenses for labor and capital cannot exceed $50,000. Find the
maximal level of production.
 
24. Let f : R3 → R3 be given by
f(x, y, z) = (x/(1 + x + y + z), y/(1 + x + y + z), z/(1 + x + y + z)).
Determine the domain of f and show that at those points
det(Jf(x, y, z)) = (1 + x + y + z)⁻⁴. Show that f is one to one and find its inverse. What is
the generalization of this result to Rn, with n ≥ 4? Start the solution of the problem with
the case n = 2.

25. Let f : Rn → Rn be a function of class C¹ with coordinate functions f = (f1, f2, . . . , fn).
Assume that the functions g1, g2, . . . , gn : R → R are of class C¹ in R. Let h : Rn → Rn be
given by h(x1, x2, . . . , xn) = (f1(g1(x1), g2(x2), . . . , gn(xn)), . . . , fn(g1(x1), g2(x2), . . . , gn(xn))).
Compute the Jacobian matrix of h in terms of the Jacobian of f and the derivatives of the
functions gi, i = 1, 2, . . . , n. More precisely, show that
Jh(x1, x2, . . . , xn) = Jf(g1(x1), . . . , gn(xn)) · diag(g1′(x1), . . . , gn′(xn)),
where diag(·) denotes the diagonal matrix with the indicated entries. Hint: apply the chain
rule.

26. Maximize the function f(x1, x2, . . . , xn) = (x1 x2 · · · xn)² subject to the constraint
x1² + x2² + · · · + xn² = 1. Use the result to prove the arithmetic-geometric mean inequality,
that is: if a1, a2, . . . , an are nonnegative reals, then
(a1 a2 · · · an)^{1/n} ≤ (a1 + a2 + · · · + an)/n.

Bibliography

[1] T. M. Apostol (1969). Calculus, Vol. II, Second Edition. John Wiley and Sons, A Wiley
International Edition.

[2] T. M. Apostol (1972-1974). Análisis Matemático. Editorial Reverté, S. A.

[3] F. Barrera Mora (2007). Álgebra Lineal. Grupo Editorial Patria.

[4] R. G. Bartle (1964). The Elements of Real Analysis. John Wiley and Sons, A Wiley
International Edition.

[5] J. Brinkhuis and V. Protasov (2016). A new proof of the Lagrange multiplier rule.
Operations Research Letters, Volume 44, Issue 3, May 2016, Pages 400-402.

[6] J. Nocedal and S. J. Wright (2006). Numerical Optimization. Springer Science+Business
Media.

[7] M. Spivak (1965). Calculus on Manifolds. Addison-Wesley Publishing Company, The
Advanced Book Program.

[8] L. G. Swanson and R. T. Hansen (1988). The equivalence of the multiplication, pigeonhole,
induction, and well ordering principles, Vol. 19:1, 129-131.
