MAT2122 Multivariable Calculus


Multivariable Calculus

MAT 2122

Fall 2021


Alistair Savage

Department of Mathematics and Statistics

University of Ottawa

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Contents

Preface 4

1 Differentiation 6
1.1 Open sets and boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Affine functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4 Affine approximations in one and two variables . . . . . . . . . . . . . . . . . 14
1.5 The derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Differentiation rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.7 Paths, curves, and surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8 Directional derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.9 Higher order derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.10 Taylor’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.11 The implicit function theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.12 The inverse function theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2 Extrema 44
2.1 Maximum and minimum values . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.2 The second derivative test . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3 Constrained extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.4 Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5 Multiple constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.6 Finding global extrema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3 Double and triple integrals 63


3.1 Vertical slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Horizontal slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.3 Double integrals over a rectangle . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4 Fubini’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.5 Double integrals over more general regions . . . . . . . . . . . . . . . . . . . 75
3.6 Examples and applications of double integrals . . . . . . . . . . . . . . . . . 78
3.7 Triple integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84


4 Change of variables 90
4.1 Two variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.2 Polar coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3 Three variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Cylindrical coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.5 Spherical coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5 Vector fields 102


5.1 Definitions and first examples . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.2 Conservative vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.3 Curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

6 Path and line integrals 111


6.1 Path integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.2 Line integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
6.3 Reparameterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.4 Line integrals of conservative vector fields . . . . . . . . . . . . . . . . . . . 119
6.5 Path independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

7 Surface integrals 127


7.1 Parameterized surfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.2 Tangent planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.3 Surface area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.4 Integrals of scalar functions over surfaces . . . . . . . . . . . . . . . . . . . . 134
7.5 Centre of mass and moment of inertia . . . . . . . . . . . . . . . . . . . . . . 136
7.6 Surface integrals of vector fields . . . . . . . . . . . . . . . . . . . . . . . . . 140
7.7 Reparameterization and orientation . . . . . . . . . . . . . . . . . . . . . . . 144

8 Integral theorems 148


8.1 Gradient, divergence, and curl . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.2 Divergence theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.3 Green’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.4 Stokes’ theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Preface

These are notes for the course Multivariable calculus (MAT 2122) at the University of Ot-
tawa. This is a course on calculus in multiple dimensions aimed at students majoring in
mathematics or doing a joint degree with another subject. We will focus mainly on real-
valued functions of multiple real-valued inputs. Many concepts will be discussed using the
language of vectors and linear algebra, since this is the most natural setting for multivariable
calculus. We’ll see how much of the calculus you learned in previous courses generalizes to
multiple dimensions. This will allow us to explore interesting new mathematics, such as
integration over surfaces and three-dimensional regions.

Acknowledgements: I would like to thank Aaron Tikuisis and Tanya Schmah for sharing with
me their lecture notes for this course, upon which portions of these lecture notes are based.
The official textbooks for this course are the open access books [FRYc, FRYe]. Portions of
these notes follow those textbooks.

Alistair Savage

Course website: https://alistairsavage.ca/mat2122


Conventions
We will use the following conventions in these notes:

• N = {0, 1, 2, 3, . . . } denotes the set of nonnegative integers.


• R denotes the set of real numbers.
• We will denote vectors by bold letters, such as x, y, u, v, w. Since bold characters can
be hard to handwrite (e.g. during lectures), we will sometimes denote vectors with a
small overhead arrow, e.g. ⃗x, ⃗y, ⃗u, ⃗v, ⃗w, when writing by hand.

We will write vectors in Rn horizontally as

x = (x1 , . . . , xn ) (1)

or as column matrices

    x = ⎛ x1 ⎞
        ⎜ ⋮  ⎟ ,    (2)
        ⎝ xn ⎠

interchangeably. Note that (1) is not the same as the row matrix (x1 · · · xn). Rather, (1) is just a horizontally written notation (to save space) for the column matrix (2).
We write

    ∥x∥ = √(x1² + · · · + xn²)

for the norm of a vector. Note that some textbooks, including [FRYe], use the notation |x| for this norm.

Exercises
After most sections, there is a list of exercises. Some exercises are written directly in the
notes, but most refer to the problems book of the course textbooks. Sometimes the list of
exercises is quite long, but many of the problems are similar. In this case, you should attempt
a selection of them—enough so that you feel comfortable doing that type of question. Doing
exercises is the best way to learn the material!
Chapter 1

Differentiation

In this first chapter, we discuss differentiation in multiple dimensions. Some results will be
review from previous courses, while other material will be new.

1.1 Open sets and boundaries


Many definitions in calculus require one to be able to get “near” a point. The precise
formulation of this concept involves the notion of open sets, which we discuss here.

Definition 1.1.1 (Open ball). Let a ∈ Rn and ε > 0. The open ball of radius ε centered at
a is
a is

    Bε (a) := {x ∈ Rⁿ : ∥x − a∥ < ε}.

Definition 1.1.2 (Open set). A set A ⊆ Rn is open if

∀a ∈ A, ∃ε > 0 such that Bε (a) ⊆ A.

In other words, A is open if every point of A is the center of an open ball contained in A.



Example 1.1.3. Every open ball is an open set. Let’s prove it! Choose b ∈ Rn and δ > 0.
We will prove that A = Bδ (b) is open. Let a ∈ A. We must find ε > 0 such that Bε (a) ⊆ A.



Since a ∈ A = Bδ (b), we have ∥a − b∥ < δ. Let

ε = δ − ∥a − b∥ > 0.

Then, for all x ∈ Bε (a), we have ∥x − a∥ < ε, and so

∥x − b∥ ≤ ∥x − a∥ + ∥a − b∥ (triangle inequality)
< ε + ∥b − a∥
= δ − ∥a − b∥ + ∥b − a∥
= δ.

Thus x ∈ Bδ (b) = A. Hence Bε (a) ⊆ A, as desired. △
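The choice ε = δ − ∥a − b∥ in the proof above can also be sanity-checked numerically. The following Python sketch is an editorial illustration (not part of the original notes; the helper names are invented): it samples random points of Bε(a) and confirms they all land in Bδ(b).

```python
import math
import random

def norm(v):
    # Euclidean norm of a vector given as a tuple or list
    return math.sqrt(sum(c * c for c in v))

def inner_radius(a, b, delta):
    # The radius eps = delta - ||a - b|| used in the proof;
    # it is positive exactly when a lies in the open ball B_delta(b).
    return delta - norm([x - y for x, y in zip(a, b)])

# Concrete instance in R^2: b = origin, delta = 1, a = (0.3, 0.4), so eps = 0.5.
b, delta = (0.0, 0.0), 1.0
a = (0.3, 0.4)
eps = inner_radius(a, b, delta)
assert eps > 0

random.seed(0)
for _ in range(1000):
    # Pick a random point x with ||x - a|| < eps ...
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = random.uniform(0.0, 0.999 * eps)
    x = (a[0] + r * math.cos(theta), a[1] + r * math.sin(theta))
    # ... and verify it lies in B_delta(b), as the proof guarantees.
    assert norm([x[0] - b[0], x[1] - b[1]]) < delta
```

Of course, sampling proves nothing by itself; the triangle-inequality argument above is the actual proof.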

Example 1.1.4. The set A = (0, ∞) × (0, ∞) ⊆ R2 is open. Indeed, if (a, b) ∈ A, then
a, b > 0. If we take ε = min{a, b}, then Bε (a, b) ⊆ A. △


Example 1.1.5. The set A = [0, ∞) × (0, ∞) is not open. To see this, we note that for any
b > 0, we have (0, b) ∈ A. However, for all ε > 0, we have

    (−ε/2, b) ∈ Bε (0, b) but (−ε/2, b) ∉ A.

Hence Bε (0, b) ⊈ A. △

Recall that, if A and B are sets, then their set difference is

    A \ B := {x : x ∈ A, x ∉ B}.

Definition 1.1.6 (Boundary). Let A ⊆ Rn . The boundary of A is the set of all points
a ∈ Rn such that,
    ∀ε > 0, Bε (a) ∩ A ̸= ∅ and Bε (a) \ A ̸= ∅.
We denote the boundary of A by ∂A.

Note that Definition 1.1.6 does not require that a ∈ A.

Remark 1.1.7. In Section 8.4 we will give a different definition of boundary for surfaces in
R3 .

Example 1.1.8. If A = Bδ (b) ⊆ Rn , then ∂A = {x ∈ Rn : ∥x − b∥ = δ}.




In this case, none of the boundary points of A are in A. The closed ball has the same
boundary:
∂{x ∈ Rn : ∥x − b∥ ≤ δ} = {x ∈ Rn : ∥x − b∥ = δ}.
In this case, all of the boundary points are in the set itself. △

Example 1.1.9. If
A = [0, ∞) × (0, ∞)
then
∂A = {(x, 0) : x ≥ 0} ∪ {(0, y) : y ≥ 0}.
The boundary is the region colored green in the following picture:

Note that the sets


(0, ∞) × (0, ∞), (0, ∞) × [0, ∞), and [0, ∞) × [0, ∞)
all have the same boundary as A. △
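Definition 1.1.6 can also be probed computationally: a boundary point is one whose small neighbourhoods always meet both A and its complement. The following Python sketch is an editorial illustration for A = [0, ∞) × (0, ∞) (the function names are invented, and random sampling is a heuristic, not a proof).

```python
import random

def in_A(p):
    # Membership in A = [0, inf) x (0, inf)
    x, y = p
    return x >= 0 and y > 0

def looks_like_boundary_point(a, in_set, eps_list=(1.0, 0.1, 0.01), trials=5000):
    # Heuristic version of Definition 1.1.6: for each small eps, the ball
    # B_eps(a) should contain both points inside the set and points outside it.
    random.seed(1)
    for eps in eps_list:
        hit_in = hit_out = False
        for _ in range(trials):
            p = (a[0] + random.uniform(-eps, eps) * 0.7,
                 a[1] + random.uniform(-eps, eps) * 0.7)
            if in_set(p):
                hit_in = True
            else:
                hit_out = True
        if not (hit_in and hit_out):
            return False
    return True

# (0, 1) lies on the y-axis part of the boundary; (1, 1) is an interior point.
assert looks_like_boundary_point((0.0, 1.0), in_A)
assert not looks_like_boundary_point((1.0, 1.0), in_A)
```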

Exercises.
1.1.1. For each of the following sets, state whether or not the set is open and describe its
boundary.
(a) [0, 1] × (0, 1)
(b) (0, 1) × (0, 1)
(c) [0, 1] × [0, 1]
(d) {(x, y) : y ≥ x2 }
(e) {(x, y) : y > x2 }
(f) R2
(g) ∅

1.2 Limits
The notion of a limit underpins all of calculus. In this section we give a precise definition of
a limit in the multivariable setting, and discuss several important properties of limits.
Definition 1.2.1 (Limit). Suppose A ⊆ Rn is an open set, f : A → Rm , a ∈ A ∪ ∂A, and
L ∈ Rm . Then we write
lim f (x) = L
x→a

if
∀ε > 0, ∃δ > 0 such that (x ∈ A, 0 < ∥x − a∥ < δ) =⇒ ∥f (x) − L∥ < ε.
(Note that the condition 0 < ∥x − a∥ is equivalent to the condition x ̸= a.)
Remarks 1.2.2. (a) As in single variable calculus, limits may not exist. However, if they
exist, they are unique.
(b) Note that Definition 1.2.1 allows a to lie in the boundary of A, and does not require
a ∈ A. Thus, in some situations, we can discuss the limit of a function at a point that
does not lie in the domain of that function.

In practice, it is enough to consider the m = 1 case of Definition 1.2.1, as we now explain.


Suppose A ⊆ Rn . Then every function f : A → Rm can be written in the form

f (x) = (f1 (x), . . . , fm (x)), where fi : A → R for i = 1, 2, . . . , m. (1.1)

The fi are the component functions of f , and we write f = (f1 , . . . , fm ). The following
proposition states that the coordinates of the limit of f are limits of its component functions
(if these all exist).
Proposition 1.2.3. Suppose A ⊆ Rn is an open set, f : A → Rm , a ∈ A ∪ ∂A, and
L = (L1 , . . . , Lm ) ∈ Rm . Then
lim f (x) = L
x→a

if and only if
lim fi (x) = Li for i ∈ {1, 2, . . . , m}.
x→a

In particular limx→a f (x) exists if and only if limx→a fi (x) exists for each i.
Proof. Suppose limx→a f (x) = L. Let ε > 0. Then there exists δ > 0 such that

(x ∈ A, 0 < ∥x − a∥ < δ) =⇒ ∥f (x) − L∥ < ε.

Let i ∈ {1, 2, . . . , m}. Since


    |fi (x) − Li | ≤ √((f1 (x) − L1 )² + · · · + (fm (x) − Lm )²) = ∥f (x) − L∥,

we have
(x ∈ A, 0 < ∥x − a∥ < δ) =⇒ |fi (x) − Li | ≤ ∥f (x) − L∥ < ε.
10 Differentiation

Hence limx→a fi (x) = Li .


Conversely, suppose limx→a fi (x) = Li for i ∈ {1, 2, . . . , m}. Let ε > 0. For each
i ∈ {1, 2, . . . , m}, there exists δi > 0 such that

    (x ∈ A, 0 < ∥x − a∥ < δi ) =⇒ |fi (x) − Li | < ε/√m.

Let δ = min{δ1 , . . . , δm }. Then, for all x ∈ A satisfying 0 < ∥x − a∥ < δ, we have


    ∥f (x) − L∥ = √((f1 (x) − L1 )² + · · · + (fm (x) − Lm )²) ≤ √(m · ε²/m) = ε.

Hence limx→a f (x) = L.


Definition 1.2.4 (Continuity). Suppose A ⊆ Rn is an open set, f : A → Rm and a ∈ A.
We say f is continuous at a if
lim f (x) = f (a).
x→a

We say that f is continuous if it continuous at every point in its domain.


Corollary 1.2.5. Suppose A ⊆ Rn is an open set, f = (f1 , . . . , fm ) : A → Rm and a ∈ A.
Then f is continuous at a if and only if fi is continuous at a for each i ∈ {1, 2, . . . , m}.
Proof. This follows from Definition 1.2.4 and Proposition 1.2.3.

Remark 1.2.6. In light of Proposition 1.2.3 and Corollary 1.2.5, we will often restrict our
attention to real-valued functions f : A → R, A ⊆ Rn , when discussing limits and continuity.
We will see later that we can also make this simplification when studying derivatives. How-
ever, we cannot make analogous simplifications in the domain of a function. For example,
we will see that all the partial derivatives of a function may exist, even though the function
is not differentiable.

Theorem 1.2.7 (Squeeze theorem). Suppose A ⊆ Rn is open, a ∈ A ∪ ∂A, L ∈ R, and


f, h : A → R. If
    |f (x) − L| ≤ h(x) for all x ∈ A and limx→a h(x) = 0,

then limx→a f (x) = L.


Proof. The proof is very similar to the proof of the single-variable squeeze theorem, and so
we will omit it.
Proposition 1.2.8. Suppose f : A → R, where A ⊆ Rn is open, and limx→a f (x) = L.
Then, for any line ℓ : R → Rn such that ℓ(0) = a, we have limt→0 f (ℓ(t)) = L.


Proof. The proof of this proposition is Exercise 1.2.1.

Warning 1.2.9. The converse of Proposition 1.2.8 is false! There exist functions f such that
limt→0 f (ℓ(t)) = L for all lines ℓ, but limx→a f (x) does not exist. See [FRYc, §2.1.1] for an
example.

Example 1.2.10. Let


    f (x, y) = (x² + 5xy + 3y²) / (|x| + |y|).
Does lim(x,y)→(0,0) f (x, y) exist? If so, what is it? Let’s try approaching (0, 0) along the line

ℓ(t) = (t, 0).

We have
    f (ℓ(t)) = f (t, 0) = (t² + 5t · 0 + 3 · 0²) / (|t| + |0|) = t²/|t| = |t|.
Thus,
lim f (ℓ(t)) = lim |t| = 0.
t→0 t→0

Therefore, by Proposition 1.2.8, if lim(x,y)→(0,0) f (x, y) exists, it must be 0. Let’s prove


lim(x,y)→(0,0) f (x, y) = 0. Using the triangle inequality, we have

    |f (x, y) − 0| = |x² + 5xy + 3y²| / (|x| + |y|) ≤ |x|²/(|x| + |y|) + 5|xy|/(|x| + |y|) + 3|y|²/(|x| + |y|).

Now

    |x|²/(|x| + |y|) ≤ |x|(|x| + |y|)/(|x| + |y|) = |x|,
    5|xy|/(|x| + |y|) ≤ 5|x|(|x| + |y|)/(|x| + |y|) = 5|x|,
    3|y|²/(|x| + |y|) ≤ 3|y|(|x| + |y|)/(|x| + |y|) = 3|y|.

Therefore
|f (x, y) − 0| ≤ |x| + 5|x| + 3|y| = 6|x| + 3|y|.
Since lim(x,y)→(0,0) (6|x| + 3|y|) = 0, we have

lim f (x, y) = 0
(x,y)→(0,0)

by the squeeze theorem (Theorem 1.2.7). △
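The squeeze bound |f(x, y)| ≤ 6|x| + 3|y| derived above is easy to test numerically. Here is a short Python check (an editorial illustration, not part of the original notes):

```python
import random

def f(x, y):
    # f from Example 1.2.10 (not defined at the origin itself)
    return (x * x + 5 * x * y + 3 * y * y) / (abs(x) + abs(y))

# The squeeze bound |f(x, y)| <= 6|x| + 3|y|, checked at random points.
random.seed(2)
for _ in range(10000):
    x = random.uniform(-1, 1)
    y = random.uniform(-1, 1)
    if x == 0 and y == 0:
        continue
    assert abs(f(x, y)) <= 6 * abs(x) + 3 * abs(y) + 1e-12
```

The bound forces f(x, y) → 0 as (x, y) → (0, 0), exactly as the squeeze theorem concludes.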

Example 1.2.11. Consider the function

    f (x, y) = (x² − y²) / (x² + y²).

Does lim(x,y)→(0,0) f (x, y) exist? If so, what is it? Let’s try approaching the origin along the
two coordinate axes. First consider ℓ1 (t) = (t, 0). Then

    f (ℓ1 (t)) = f (t, 0) = (t² − 0²) / (t² + 0²) = 1.

Now consider ℓ2 (t) = (0, t). Then

    f (ℓ2 (t)) = f (0, t) = (0² − t²) / (0² + t²) = −1.

Thus, by Proposition 1.2.8, lim(x,y)→(0,0) f (x, y) does not exist. △
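The two-path argument is easy to see numerically; the following Python snippet (an editorial illustration, not part of the original notes) evaluates f along both coordinate axes:

```python
def f(x, y):
    # f from Example 1.2.11 (not defined at the origin)
    return (x * x - y * y) / (x * x + y * y)

# Along the x-axis the values are identically 1; along the y-axis, identically -1.
# By Proposition 1.2.8, the two-variable limit at (0, 0) therefore cannot exist.
for t in (0.5, 0.1, 1e-3, 1e-6):
    assert f(t, 0.0) == 1.0
    assert f(0.0, t) == -1.0
```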

We conclude this section with some properties of limits. The proofs of these properties
are analogous to the proofs in the single-variable case, so we omit them. For the remainder
of this section, A ⊆ Rn is open, f, g : A → R, and a ∈ A ∪ ∂A.

Proposition 1.2.12 (Uniqueness of limits). If

    limx→a f (x) = K and limx→a f (x) = L,

then K = L.

Proposition 1.2.13 (Arithmetic of limits). Suppose limx→a f (x), limx→a g(x) both exist.

(a) limx→a (f (x) + g(x)) = limx→a f (x) + limx→a g(x).


(b) limx→a f (x)g(x) = (limx→a f (x)) (limx→a g(x)).
(c) If f (x) ̸= 0 for all x ∈ A and limx→a f (x) ̸= 0, then

    limx→a 1/f (x) = 1/(limx→a f (x)).

Corollary 1.2.14. If a ∈ A and f, g are both continuous at a, then so are f + g, f g, and 1/f
(if defined).

Exercises.
1.2.1. Prove Proposition 1.2.8.

Exercises from [FRYb, §2.1] : Q1–Q11.



1.3 Affine functions


Suppose we have a function

f : Rn → Rm , n, m ∈ N.

Recall, from linear algebra, that f is linear if

f (x + y) = f (x) + f (y), f (sx) = sf (x),

for all x, y ∈ Rn , s ∈ R. The function f is linear if and only if it is given by multiplication


by some matrix. In other words, f is linear if and only if there exists an m × n matrix A
such that
f (x) = Ax for all x ∈ Rn ,

where we view x as an n × 1 column matrix.


More generally, the function f is affine if it is a composition of a linear map and a
translation. In other words, f is affine if and only if there exists an m × n matrix A and a
vector b ∈ Rm such that
f (x) = Ax + b.

Intuitively, both linear and affine functions are those functions whose graphs are “flat”.
However, the graphs of linear functions are required to pass through the origin. Note that if
x0 ∈ Rn is some fixed vector, then the function x 7→ A(x − x0 ) + b is also affine, since we
have

    A(x − x0 ) + b = Ax + (b − Ax0 ),

where x 7→ Ax is linear and adding the constant vector b − Ax0 is a translation.

Many of the affine maps we are about to see will be of this form.
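The identity A(x − x0) + b = Ax + (b − Ax0) can be checked mechanically. Here is a small Python sketch (an editorial illustration; the matrix and vectors are arbitrary sample data, not from the notes):

```python
def mat_vec(A, x):
    # Multiply an m x n matrix (list of rows) by a vector (list of numbers)
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Arbitrary sample data: a 2 x 2 matrix A, a shift x0, and a translation b.
A = [[2.0, -7.0], [1.0, 4.0]]
x0 = [1.0, 3.0]
b = [10.0, -5.0]

def shifted(x):
    # x |-> A(x - x0) + b
    Ax = mat_vec(A, [xi - x0i for xi, x0i in zip(x, x0)])
    return [v + bi for v, bi in zip(Ax, b)]

def linear_plus_translation(x):
    # x |-> Ax + (b - A x0): a linear map followed by a translation
    Ax = mat_vec(A, x)
    Ax0 = mat_vec(A, x0)
    return [v + bi - w for v, bi, w in zip(Ax, b, Ax0)]

# The two expressions agree at every sample point.
for x in ([0.0, 0.0], [2.0, -1.0], [3.5, 7.25]):
    assert shifted(x) == linear_plus_translation(x)
```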

Example 1.3.1. The function

    f : R → R, f (x) = x/2,

is linear. Its graph is a straight line passing through the origin. On the other hand, the function

    g : R → R, g(x) = x/2 + 1,

is not linear, but it is affine. Its graph is a straight line that does not pass through the origin. △

Exercises.
1.3.1. Which of the following functions are linear? Which are affine?

(a) f : R2 → R, f (x, y) = 2x − 7y + 10
(b) f : R2 → R, f (x, y) = 2(x − 1) + 5(y − 3) + 17
(c) f : R2 → R, f (x, y) = x2 + y
(d) f : R2 → R3 , f (x, y) = (x + 4y, 3x + y, 1)
(e) f : R3 → R2 , f (x, y, z) = (z, x)

1.4 Affine approximations in one and two variables


Since linear and affine functions are much easier to work with than arbitrary functions
(because we can use the tools of linear algebra), it is often useful to approximate a function
near some point by an affine function. You have done this in previous calculus courses when
you found tangent lines and tangent planes. A tangent line is the graph of the affine function
that best approximates a given function R → R, while a tangent plane is the graph of the
affine function that best approximates a given function R2 → R. Note that it is important
that we allow ourselves to work with affine functions, and not just linear ones, since the
best approximation will not pass through the origin in general.
Recall that if A ⊆ R is open (e.g. A is some open interval (b, c) for b < c), and f : A → R,
then the derivative of f at a ∈ A is

    f ′ (a) = (df /dx)(a) := limx→a (f (x) − f (a))/(x − a),
if this limit exists. If this limit exists, we say f is differentiable at a. We say f is differentiable
if it is differentiable at all a ∈ A.

Lemma 1.4.1. The function f is differentiable at a with derivative f ′ (a) if and only if

    limx→a (f (x) − (f (a) + f ′ (a)(x − a))) / |x − a| = 0.    (1.2)

Proof. Suppose first that f is differentiable at a with derivative f ′ (a). Then

    limx→a+ (f (x) − (f (a) + f ′ (a)(x − a))) / |x − a| = limx→a+ ((f (x) − f (a))/(x − a) − f ′ (a)) = f ′ (a) − f ′ (a) = 0

and

    limx→a− (f (x) − (f (a) + f ′ (a)(x − a))) / |x − a| = limx→a− ((f (x) − f (a))/(a − x) + f ′ (a)) = −f ′ (a) + f ′ (a) = 0.

Since the two one-sided limits exist and are equal, we have proved (1.2). The proof of the
reverse implication is similar, and is left as Exercise 1.5.1.

Recall that the tangent line to a differentiable function f : R → R at the point (a, f (a))
is given by
y = f ′ (a)(x − a) + f (a). (1.3)
In other words, it is the graph of the affine function

    x 7→ f ′ (a)(x − a) + f (a). (1.4)

Lemma 1.4.1 says that this function is the best affine approximation to the function f (x)
near the point a. Precisely, it says that the function (1.4) is the unique affine function such
that the difference between it and f (x) (the “error”) goes to zero faster than |x − a|.

Example 1.4.2. To refresh our memory, let’s find the tangent line to the function

    f : R → R, f (x) = x³ + x² − 2x,

at the point (−1, f (−1)) = (−1, 2). We first compute the derivative:

    f ′ (x) = 3x² + 2x − 2, hence f ′ (−1) = −1.

The tangent line to the graph of f at the point (−1, 2) is therefore given by

    y = f ′ (−1)(x − (−1)) + 2 = (−1)(x + 1) + 2 = 1 − x.




Drawing the tangent line, we see that it is the graph of an affine function. In particular, it
is the graph of the affine function

    g(x) = −x + 1. △
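In the spirit of Lemma 1.4.1, we can check numerically that the error between f and its tangent line vanishes faster than |x − a|. The following Python sketch is an editorial illustration, not part of the original notes:

```python
def f(x):
    # The cubic from Example 1.4.2
    return x ** 3 + x ** 2 - 2 * x

def tangent(x):
    # The tangent line y = 1 - x at a = -1 found above
    return 1 - x

a = -1.0
# The line touches the graph at the point of tangency.
assert f(a) == tangent(a) == 2.0

# The error |f(x) - tangent(x)| shrinks faster than |x - a|:
# here f(x) - tangent(x) = (x + 1)^2 (x - 1), so the ratio is O(h).
for h in (0.1, 0.01, 0.001):
    ratio = abs(f(a + h) - tangent(a + h)) / h
    assert ratio < 3 * h
```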

Now let’s move to two variables. Suppose A ⊆ R2 is open and we have a differentiable
function f : A → R. (We’ll return to the precise definition of a differentiable function in
multiple variables soon.) Remember that the tangent plane to the graph of f at the point
(a1 , a2 , f (a1 , a2 )) is given by

    z = fx (a1 , a2 )(x − a1 ) + fy (a1 , a2 )(y − a2 ) + f (a1 , a2 )
      = ∇f (a1 , a2 ) · (x − a1 , y − a2 ) + f (a1 , a2 ),    (1.5)

where

    fx = ∂f /∂x and fy = ∂f /∂y    (1.6)

are the partial derivatives of f ,

    ∇f = (fx , fy )    (1.7)

is the gradient of f , and · denotes the dot product.

Example 1.4.3. Consider the function

f : R2 → R, f (x, y) = x3 + xy − y 2 + 4.

Suppose we want to find the equation of the tangent plane to the graph of f (that is, the
surface z = f (x, y)) at the point (0, 1, f (0, 1)) = (0, 1, 3). We compute

fx = 3x2 + y, fy = x − 2y.

Thus, the equation of the tangent plane is

z = fx (0, 1)(x − 0) + fy (0, 1)(y − 1) + f (0, 1) = x − 2(y − 1) + 3 = x − 2y + 5. △
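As a numerical cross-check (editorial, not from the notes), the tangent plane should agree with f to second order near the point of tangency (0, 1):

```python
def f(x, y):
    # The function from Example 1.4.3
    return x ** 3 + x * y - y ** 2 + 4

def plane(x, y):
    # The tangent plane z = x - 2y + 5 computed above
    return x - 2 * y + 5

# The plane touches the graph at (0, 1, 3).
assert f(0.0, 1.0) == plane(0.0, 1.0) == 3.0

# Quadratic-order agreement as we step away from (0, 1):
# the error at distance ~h is O(h^2), not just O(h).
for h in (0.1, 0.01):
    err = abs(f(h, 1 + h) - plane(h, 1 + h))
    assert err < 10 * h * h
```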

Exercises.
Exercises from [FRYb, §2.5] : Q5–Q7, Q10.

1.5 The derivative


We’d like to use our knowledge of linear algebra to unify and generalize the above discussion
of finding a best affine approximation to a differentiable function. Let us suppose we have a
function
f : Rn → Rm , n, m ∈ N.
It is now useful to use the notation x1 , . . . , xn for the input variables and y1 , . . . , ym for the
output variables. Using vector notation, we set x = (x1 , . . . , xn ), y = (y1 , . . . , ym ), and

y = f (x).

We fix a point a ∈ Rn and let b = f (a). Then define

∆x = x − a, ∆y = y − b.

Note that a (and thus b) are fixed points, x is our variable, and y changes with x, according
to the equation y = f (x).
In our new notation, (1.3) becomes

∆y = f ′ (a)∆x (1.8)

and (1.5) becomes

    ∆y = fx (a)(x1 − a1 ) + fy (a)(x2 − a2 ) = [fx  fy ] ∆x,    (1.9)

where the righthand side involves matrix multiplication. These, of course, are the cases
where m = 1 and n = 1 or n = 2, respectively. You might now be able to guess the general
form.
Definition 1.5.1 (Differentiable function). Let A ⊆ Rn be open, f : A → Rm , and a ∈ A.
We say f is differentiable at a if there exists a linear map Df (a) : Rn → Rm such that

    limx→a ∥f (x) − f (a) − Df (a)(x − a)∥ / ∥x − a∥ = 0.    (1.10)

If this limit exists, then Df (a) is called the total derivative of f at a.


How do we find the total derivative in practice? As we noted in (1.1), every function
f : Rn → Rm can be written in the form

f (x) = (f1 (x), . . . , fm (x)), where fi : Rn → R for i = 1, 2, . . . , m.

If all the partial derivatives of the component functions fi exist, then the Jacobian matrix,
or matrix of partial derivatives, of f at a point a ∈ Rn is the m × n matrix

    [∂fi/∂xj (a)] = ⎡ ∂f1/∂x1 (a)  · · ·  ∂f1/∂xn (a) ⎤
                    ⎢      ⋮         ⋱         ⋮      ⎥
                    ⎣ ∂fm/∂x1 (a)  · · ·  ∂fm/∂xn (a) ⎦

Theorem 1.5.2. If f is differentiable at a, then its total derivative is the linear map cor-
responding to its Jacobian matrix (that is, the linear map given by left multiplication by the
Jacobian matrix).
We will not prove Theorem 1.5.2 in class, since the proof is a bit long and technical.
As in linear algebra, we often identify a matrix and the corresponding linear map. Thus,
Theorem 1.5.2 says that, if f is differentiable at a, then
    Df (a) = [∂fi/∂xj (a)] = ⎡ ∂f1/∂x1 (a)  · · ·  ∂f1/∂xn (a) ⎤
                             ⎢      ⋮         ⋱         ⋮      ⎥
                             ⎣ ∂fm/∂x1 (a)  · · ·  ∂fm/∂xn (a) ⎦

As we will see in Example 1.5.14, it is possible for the Jacobian matrix to exist (that is, for
all the partial derivatives of the component functions of f to exist) even though the function
is not differentiable. However, we will still use the notation Df (a) for the Jacobian matrix
in this case.
Definition 1.5.1 states that, if f : A → Rm , A ⊆ Rn , is a differentiable function, then the
best affine approximation to f at the point a ∈ A is given by

∆y = Df (a)∆x, where (1.11)


∆y = y − b = f (x) − f (a) and ∆x = x − a.

Note that, when n = m = 1, then (1.11) becomes (1.8), describing a tangent line. On the
other hand, when n = 2, m = 1, equation (1.11) becomes (1.9), describing a tangent plane.
In higher dimensions, we refer to the affine approximation as a hyperplane.

Remark 1.5.3. Some references use the term linear approximation instead of affine approxi-
mation. However, the term affine approximation is more correct.

When m = 1, then we are considering functions

f : A → R, A ⊆ Rn .

The gradient of such a function is


 
    ∇f = (∂f /∂x1 , . . . , ∂f /∂xn ).    (1.12)
(When n = 2, this agrees with (1.7).) In this case (that is, when m = 1) the total derivative
Df (a) is the transpose ∇⊺ f (a) of the gradient ∇f (a) at a. Then (1.10) becomes
    limx→a |f (x) − f (a) − ∇f (a) · (x − a)| / ∥x − a∥ = 0.    (1.13)
In fact, using the squeeze theorem (Theorem 1.2.7), we can remove the absolute value in the
numerator, if desired. In other words, f is differentiable if and only if
    limx→a (f (x) − f (a) − ∇f (a) · (x − a)) / ∥x − a∥ = 0.    (1.14)

Example 1.5.4. Consider the function

f : R2 → R, f (x, y) = 3xy 2 .

We have  
    ∇f = (∂f /∂x, ∂f /∂y) = (3y², 6xy), so ∇f (2, 1) = (3, 12).
Thus
    lim(x,y)→(2,1) (f (x, y) − f (2, 1) − ∇f (2, 1) · (x − 2, y − 1)) / ∥(x, y) − (2, 1)∥
        = lim(x,y)→(2,1) (3xy² − 6 − 3(x − 2) − 12(y − 1)) / ∥(x − 2, y − 1)∥
        = lim(x,y)→(2,1) (3xy² − 3x − 12y + 12) / ∥(x − 2, y − 1)∥.
Setting w = x − 2, z = y − 1, this limit is equal to
    lim(w,z)→(0,0) (3(w + 2)(z + 1)² − 3(w + 2) − 12(z + 1) + 12) / ∥(w, z)∥ = lim(w,z)→(0,0) (3wz² + 6z² + 6wz) / √(w² + z²).
Now, since |z| = √(z²) ≤ √(w² + z²), we have

    |3wz² + 6z² + 6wz| / √(w² + z²) = (|z| / √(w² + z²)) · |3wz + 6z + 6w| ≤ |3wz + 6z + 6w| → 0
as (w, z) → (0, 0). Thus, f is differentiable at (2, 1) with total derivative
    Df (2, 1) = ∇⊺f (2, 1) = [3  12],

and the tangent plane to the graph of f at the point (2, 1, f (2, 1)) = (2, 1, 6) is given by

    z = f (2, 1) + ∇f (2, 1) · (x − 2, y − 1) = 6 + (3, 12) · (x − 2, y − 1) = 3x + 12y − 12. △

Notice that, in Example 1.5.4, it was quite easy to compute the gradient ∇f , but then
it required some effort to prove that the function f was differentiable. It would be nice if
there were an easier way to see that a function is differentiable, at least for some large class
of functions. In fact, there is, as we now explain.
First, as we saw for limits, we can actually always reduce our attention to this m = 1
case, as the following proposition states.
Proposition 1.5.5. Let A ⊆ Rn be open, f = (f1 , . . . , fm ) : A → Rm , and a ∈ A. Then f
is differentiable at a if and only if each component function fi , 1 ≤ i ≤ m, is differentiable
at a. In this case, we have

    Df (a) = ⎡ ∇⊺f1 (a) ⎤
             ⎢    ⋮     ⎥
             ⎣ ∇⊺fm (a) ⎦

where ∇⊺fi (a) is the transpose of the gradient ∇fi (a).

Proof. This follows from Definition 1.5.1 and Proposition 1.2.3.


The following theorem (whose proof we omit) is very useful for showing that functions
are differentiable.

Theorem 1.5.6. Let A ⊆ Rn be open, f : A → R, and a ∈ A. If all partial derivatives of f


are continuous at a, then f is differentiable at a.

Corollary 1.5.7. Let A ⊆ Rn be open, f = (f1 , . . . , fm ) : A → Rm , and a ∈ A. If all partial
derivatives ∂fi /∂xj , 1 ≤ i ≤ m, 1 ≤ j ≤ n, are continuous at a, then f is differentiable at a.

Proof. This follows from Proposition 1.5.5 and Theorem 1.5.6.

Example 1.5.8. Consider the function

    f : R³ → R², f (x1 , x2 , x3 ) = (x1 e^{x2}, x1³ − x2 cos(x3 )).

The component functions are

    f1 (x1 , x2 , x3 ) = x1 e^{x2}, f2 (x1 , x2 , x3 ) = x1³ − x2 cos(x3 ),

which are all differentiable functions. The total derivative is


    Df = ⎡ ∂f1/∂x1  ∂f1/∂x2  ∂f1/∂x3 ⎤ = ⎡ e^{x2}  x1 e^{x2}  0          ⎤
         ⎣ ∂f2/∂x1  ∂f2/∂x2  ∂f2/∂x3 ⎦   ⎣ 3x1²    −cos(x3)   x2 sin(x3) ⎦

Since all the partial derivatives are continuous, the function f is differentiable at every point
of R3 , by Theorem 1.5.6.
If a = (2, 0, π), we have

    Df (a) = ⎡ 1   2  0 ⎤
             ⎣ 12  1  0 ⎦
Since f (a) = (2, 8), the best affine approximation to f at a is given by

    y − (2, 8) = ⎡ 1   2  0 ⎤ ⎡ x1 − 2 ⎤
                 ⎣ 12  1  0 ⎦ ⎢  x2    ⎥ ,
                              ⎣ x3 − π ⎦

that is,

y1 = 2 + (x1 − 2) + 2x2 = x1 + 2x2 ,


y2 = 8 + 12(x1 − 2) + x2 = 12x1 + x2 − 16.

Since the graph of f lives in five dimensions, we are not able to visualize this as we can do
for tangent lines and tangent planes. Nevertheless, we were still able to give an algebraic
description of the approximation. △
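A finite-difference approximation of the Jacobian gives a convenient numerical cross-check of such computations. The following Python sketch is an editorial illustration (the helper jacobian_fd is invented here, and central differences only approximate the true derivative):

```python
import math

def f(x):
    # The map of Example 1.5.8, R^3 -> R^2
    x1, x2, x3 = x
    return (x1 * math.exp(x2), x1 ** 3 - x2 * math.cos(x3))

def jacobian_fd(g, a, h=1e-6):
    # Numerical Jacobian of g at a by central differences
    # (an approximation and sanity check, not a proof).
    m = len(g(a))
    n = len(a)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        gp, gm = g(ap), g(am)
        for i in range(m):
            J[i][j] = (gp[i] - gm[i]) / (2 * h)
    return J

# At a = (2, 0, pi) the exact Jacobian computed above is [[1, 2, 0], [12, 1, 0]].
a = (2.0, 0.0, math.pi)
J = jacobian_fd(f, a)
expected = [[1.0, 2.0, 0.0], [12.0, 1.0, 0.0]]
for i in range(2):
    for j in range(3):
        assert abs(J[i][j] - expected[i][j]) < 1e-4
```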

Warning 1.5.9. The converse to Theorem 1.5.6 is false (even for single-variable functions!),
as the next example shows.

Example 1.5.10. Consider the function


    f (x) = { x² sin(1/x),  x ̸= 0,
            { 0,            x = 0.

We have

    f ′ (0) = limx→0 (f (x) − f (0))/x = limx→0 x sin(1/x) = 0

by the squeeze theorem (since |x sin(1/x)| ≤ |x|). Thus


    f ′ (x) = { 2x sin(1/x) − cos(1/x),  x ̸= 0,
              { 0,                       x = 0.

Therefore, f is differentiable everywhere. However, limx→0 f ′ (x) does not exist, and so f ′ (x)
is not continuous at x = 0. △
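Numerically (an editorial illustration, not part of the notes), one can see both facts at once: the difference quotients at 0 tend to 0, while f′ keeps taking values near −1 and +1 arbitrarily close to 0, so f′ has no limit there.

```python
import math

def f(x):
    # The function of Example 1.5.10
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    # The derivative computed in Example 1.5.10
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotients at 0 are bounded by |h|, so f'(0) = 0 ...
for h in (1e-2, 1e-4, 1e-6):
    assert abs((f(h) - f(0.0)) / h) <= abs(h)

# ... yet f' oscillates between values near -1 and +1 arbitrarily close to 0.
for k in (10, 100, 1000):
    x_minus = 1.0 / (2 * math.pi * k)            # cos(1/x) close to 1, so f'(x) near -1
    x_plus = 1.0 / (2 * math.pi * k + math.pi)   # cos(1/x) close to -1, so f'(x) near +1
    assert fprime(x_minus) < -0.9
    assert fprime(x_plus) > 0.9
```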

Remark 1.5.11. A function whose partial derivatives exist and are continuous is said to be of
class C¹. Thus, Theorem 1.5.6 and Corollary 1.5.7 say that any C¹ function is differentiable.
However, Example 1.5.10 shows that there are differentiable functions that are not of class
C¹.

You learned in earlier classes that, for single-variable functions, differentiability implies
continuity. We have the same result in higher dimensions.

Theorem 1.5.12. Let A ⊆ Rn be open, f : A → Rm , and a ∈ A. If f is differentiable at a,


then f is continuous at a.

Proof. The proof is very similar to the single-variable case, and so we will omit it.

Warning 1.5.13. The existence of all partial derivatives is not enough to ensure that a
function is differentiable, as the next example illustrates.

Example 1.5.14. Consider the function

    f : R² → R, f (x, y) = { x²/(x − y),  x ̸= y,
                           { 0,           x = y.

Then

    ∂f/∂x (x, y) = { (x² − 2xy)/(x − y)²,  x ̸= y,        ∂f/∂y (x, y) = { x²/(x − y)²,  x ̸= y,
                   { 1,                     x = y,                        { 0,            x = y.

However, f is not continuous at (0, 0). (See [FRYc, Example 2.2.9] for details.) Thus, the
partial derivatives exist everywhere (and are continuous everywhere except at (0, 0)), but f
is not differentiable at (0, 0), since it is not continuous there. △

Exercises.
1.5.1. Complete the proof of Lemma 1.4.1.

1.5.2. Compute the matrix of partial derivatives of the following functions:


(a) f : R2 → R2 , f (x, y) = (x, y)
(b) f : R2 → R3 , f (x, y) = (ex y + sin x, 2x, x − ey )
(c) f : R3 → R2 , f (x, y, z) = (2x − ez + y, x3 y)
(d) f : R3 → R3 , f (x, y, z) = (xyexz , y sin x, 3xyz)
1.5.3. Suppose f : Rn → Rm is a linear map. What is the derivative of f ?

1.6 Differentiation rules


The derivative in higher dimensions satisfies many of the same rules as you saw for single-
variable functions.
Theorem 1.6.1. Let A ⊆ Rn be open and f, g : A → Rm . Suppose that f and g are
differentiable at a ∈ A.
(a) If c ∈ R, then cf is differentiable at a and

D(cf )(a) = c(Df (a)).

(b) The function f + g is differentiable at a and

D(f + g)(a) = Df (a) + Dg(a).

(c) If m = 1, then f g is differentiable at a and

D(f g)(a) = g(a)Df (a) + f (a)Dg(a).

This is known as the product rule.


(d) If m = 1 and g(x) ≠ 0 for all x ∈ A, then f/g is differentiable at a and

\[
D\left(\frac{f}{g}\right)(a) = \frac{g(a)\,Df(a) - f(a)\,Dg(a)}{g(a)^2}.
\]
You first encountered the chain rule for single-variable functions, where you learned that,
for differentiable functions f, g : R → R,

(g ◦ f )′ (x) = g ′ (f (x))f ′ (x).

This result can be generalized to multivariable functions, using tools of linear algebra.

Theorem 1.6.2 (Chain rule). If

f : Rn → Rm and g : Rm → Rk

are differentiable functions and a ∈ Rn , then

D(g ◦ f )(a) = (Dg(f (a))) (Df (a)) ,

where the product on the right-hand side is matrix multiplication (equivalently, composition
of linear maps).

Example 1.6.3. Consider the functions

\[
f : \mathbb{R}^2 \to \mathbb{R}^2, \quad f(u, v) = (u^2 + 5v,\, 3uv), \qquad
g : \mathbb{R}^2 \to \mathbb{R}, \quad g(x, y) = xy^2.
\]

Then we have

\[
Df = \begin{bmatrix} 2u & 5 \\ 3v & 3u \end{bmatrix}, \qquad
Dg = \begin{bmatrix} y^2 & 2xy \end{bmatrix}.
\]

Thus, using the chain rule, we have

\begin{align*}
D(g \circ f) &= Dg(f(u, v))\,Df \\
&= Dg(u^2 + 5v,\, 3uv)\,Df \\
&= \begin{bmatrix} (3uv)^2 & 2(u^2 + 5v)(3uv) \end{bmatrix}
   \begin{bmatrix} 2u & 5 \\ 3v & 3u \end{bmatrix} \\
&= \begin{bmatrix} 18u^3v^2 + 18(u^2 + 5v)uv^2 & 45u^2v^2 + 18(u^2 + 5v)u^2v \end{bmatrix} \\
&= \begin{bmatrix} 36u^3v^2 + 90uv^3 & 18u^4v + 135u^2v^2 \end{bmatrix}.
\end{align*}

In particular, we have

\[
\frac{\partial(g \circ f)}{\partial u} = 36u^3v^2 + 90uv^3
\quad \text{and} \quad
\frac{\partial(g \circ f)}{\partial v} = 18u^4v + 135u^2v^2.
\]

Of course, we could check this by computing

\[
g \circ f = (u^2 + 5v)(3uv)^2 = 9u^4v^2 + 45u^2v^3
\]

and then calculating the partial derivatives. △
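This check can also be carried out symbolically. The snippet below (not part of the notes) uses the sympy computer algebra library, an external package, to compare the matrix product Dg(f(u, v)) Df with the direct partial derivatives of g ∘ f.

```python
# sympy verification (ours) of the chain rule computation in Example 1.6.3.
import sympy as sp

u, v, x, y = sp.symbols('u v x y')
f = sp.Matrix([u**2 + 5*v, 3*u*v])   # f : R^2 -> R^2
g = x*y**2                           # g : R^2 -> R

gof = g.subs({x: f[0], y: f[1]})     # g o f = 9u^4 v^2 + 45u^2 v^3
assert sp.expand(gof - (9*u**4*v**2 + 45*u**2*v**3)) == 0

# matrix chain rule: D(g o f) = Dg(f(u, v)) Df
Df = f.jacobian([u, v])
Dg_at_f = sp.Matrix([[g.diff(x), g.diff(y)]]).subs({x: f[0], y: f[1]})
chain = sp.expand(Dg_at_f * Df)
direct = sp.Matrix([[sp.diff(gof, u), sp.diff(gof, v)]])
assert sp.expand(chain - direct) == sp.zeros(1, 2)
```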

Exercises.
Exercises from [FRYb, §2.4] : Q1, Q5, Q6, Q12–Q14, Q18, Q22.

1.7 Paths, curves, and surfaces


In this section we briefly discuss paths, curves, and surfaces. We will return to these impor-
tant concepts in more detail later in the course.
Definition 1.7.1 (Path, curve). A path is a map
c : [a, b] → Rn ,
where a, b ∈ R, a ≤ b. (We will mostly be interested in continuous paths.)

(Figure: a path from c(a) to c(b), with an intermediate point c(t).)
A curve is the image of a path:
C = {c(t) : t ∈ [a, b]}.
Example 1.7.2. Consider the path

\[
c : [0, 2\pi] \to \mathbb{R}^2, \quad c(t) = (\cos(t), \sin(t)), \qquad c(0) = (1, 0) = c(2\pi).
\]

Its image is the unit circle. △

Example 1.7.3 (Parametric equation of a line). If p, v ∈ Rn , then


ℓ(t) = p + tv, t ∈ R,
is the parametric equation of the line through the point p and parallel to v. If we restrict
the domain to some interval [a, b], then this is a curve whose image is a line segment. △

Given a path c : [a, b] → Rn , we will usually denote its derivative by c′ (t) instead of Dc(t).
Thus, if
c(t) = (c1 (t), . . . , cn (t)) then c′ (t) = (c′1 (t), . . . , c′n (t)).
For t0 ∈ [a, b], the derivative c′ (t0 ) is the velocity at time t0 . (We will often use this termi-
nology, even if t does not really denote time in our particular situation.)

(Figure: the velocity vector c′(t0) attached at the point c(t0).)

Provided c′ (t0 ) ̸= 0, the tangent line to the curve is given by

ℓ(t) = c(t0 ) + c′ (t0 )(t − t0 ).

This is just a special case of (1.11).
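As a small illustration (ours, using only the Python standard library), the snippet below approximates the velocity of the circle path of Example 1.7.2 by a central difference and builds the tangent line ℓ(t) = c(t0) + c′(t0)(t − t0) at t0 = π/2.

```python
# Central-difference check of the velocity of the circle path
# c(t) = (cos t, sin t) from Example 1.7.2, and its tangent line at t0 = pi/2.
import math

def c(t):
    return (math.cos(t), math.sin(t))

def velocity(t, h=1e-6):
    # central-difference approximation of c'(t), componentwise
    return tuple((a - b) / (2*h) for a, b in zip(c(t + h), c(t - h)))

t0 = math.pi / 2
v = velocity(t0)                    # exact value is (-sin(pi/2), cos(pi/2)) = (-1, 0)
assert abs(v[0] + 1) < 1e-9 and abs(v[1]) < 1e-9

def tangent(t):
    # l(t) = c(t0) + c'(t0)(t - t0)
    return tuple(p + w*(t - t0) for p, w in zip(c(t0), v))

assert tangent(t0) == c(t0)         # the line passes through c(t0)
```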

Example 1.7.4. Suppose we are going on a hike. We have a differentiable function f : R2 → R


giving the elevation.

(Figure: the graph z = f(x, y) of the elevation function.)

On a two-dimensional map (ignoring elevation), our position at time t is given by some


differentiable path
c = (c1 , c2 ) : [a, b] → R2 .

Then, the three-dimensional coordinates of our hike at time t are given by

d : [a, b] → R3 , d(t) = (c1 (t), c2 (t), f (c(t))).

Our elevation at time t is f(c(t)). Using the chain rule, we have

\[
(f \circ c)'(t) = Df(c(t))\,Dc(t)
= \begin{bmatrix} \frac{\partial f}{\partial x}(c(t)) & \frac{\partial f}{\partial y}(c(t)) \end{bmatrix}
  \begin{bmatrix} c_1'(t) \\ c_2'(t) \end{bmatrix}
= \nabla f(c(t)) \cdot c'(t).
\]

This gives us the rate of change of elevation at time t. △
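A quick numeric check of this formula (ours; the elevation f and the path c below are made up for this snippet, not taken from the notes) compares a finite-difference approximation of (f ∘ c)′(t) with the dot product ∇f(c(t)) · c′(t).

```python
# Numeric check that (f o c)'(t) = grad f(c(t)) . c'(t) for a sample
# elevation f and path c chosen for this snippet only.
import math

def f(x, y):
    return 20 - x**2 - 2*y**2

def grad_f(x, y):
    return (-2*x, -4*y)

def c(t):
    return (math.cos(t), math.sin(t))

def c_prime(t):
    return (-math.sin(t), math.cos(t))

t0, h = 0.7, 1e-6
numeric = (f(*c(t0 + h)) - f(*c(t0 - h))) / (2*h)                # (f o c)'(t0)
exact = sum(g*w for g, w in zip(grad_f(*c(t0)), c_prime(t0)))    # grad f . c'
assert abs(numeric - exact) < 1e-6
```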

A surface is a two-dimensional object in three-dimensional space. There are three ways


of describing a surface:

(a) We can give a parameterization of the surface. For example,

Φ : [0, 1] × [0, 2π] → R3 , Φ(u, v) = (u cos(v), u sin(v), u)


describes a cone. (Figure: the cone z = √(x^2 + y^2), 0 ≤ z ≤ 1.)

We will discuss parameterizations of surfaces in more detail in Section 7.1.

(b) We can describe the surface as the set of solutions to an equation. For example,

S = {(x, y, z) : x^2 + y^2 ≤ 1, z ≥ 0, z^2 = x^2 + y^2}

is the same cone as above.

(c) We can describe a surface as the graph of a function f : A → R. For example, the
graph of the function

\[
f : A \to \mathbb{R}, \quad f(x, y) = \sqrt{x^2 + y^2}, \qquad A = \{(x, y) : x^2 + y^2 \le 1\}
\]

is the same cone as above.

If A ⊆ Rn , f : A → R, and k ∈ R, then the level set of f at k is the set

{x ∈ A : f (x) = k}. (1.15)

When n = 2, level sets are sometimes called contour lines. A contour plot is an image
showing multiple contour lines for the same function.

Example 1.7.5. Consider the function

f : R2 → R, f (x, y) = x2 y.

The level set of f at k is the set of all points (x, y) such that k = f(x, y) = x^2 y or,
equivalently (for x ≠ 0), y = k/x^2.

(Contour plot: the curves y = k/x^2 for k = 2, 1, −1, −2.) △

Exercises.
Exercises from [FRYb, §1.6] : Q1–Q4, Q6, Q10, Q21.
Exercises from [FRYb, §1.7] : Q2–Q11.

1.8 Directional derivatives


A good reference for the material in this section is [FRYc, §2.7]. Throughout this section,
A ⊆ Rn is an open set and f : A → R.
Definition 1.8.1 (Directional derivative). Let a ∈ A, and v ∈ Rn . The directional derivative
of f at a along v is

\[
D_v f(a) := \left. \frac{d}{dt} f(a + tv) \right|_{t=0}.
\]

Remark 1.8.2. Some references require the vector v in Definition 1.8.1 to be a unit vector,
that is, ∥v∥ = 1. We will not require this.

Example 1.8.3. Consider the function

\[
f : \mathbb{R}^3 \to \mathbb{R}, \quad f(x, y, z) = xy^2 - 3z,
\]

and let a = (q, r, s), v = (2, 1, −1). Then we have

\begin{align*}
f(a + tv) &= f(q + 2t,\, r + t,\, s - t) \\
&= (q + 2t)(r + t)^2 - 3(s - t) \\
&= qr^2 + 2tr^2 + 2qtr + 4t^2 r + qt^2 + 2t^3 - 3s + 3t.
\end{align*}

Thus

\[
\frac{d}{dt} f(a + tv) = 2r^2 + 2qr + 8tr + 2qt + 6t^2 + 3,
\]

and so

\[
D_v f(q, r, s) = \left. \frac{d}{dt} f(a + tv) \right|_{t=0} = 2r^2 + 2qr + 3. \qquad \triangle
\]

Proposition 1.8.4. If f is differentiable at a, then

\[
D_v f(a) = \nabla f(a) \cdot v.
\]

Proof. Let ℓ(t) = a + tv, t ∈ R. Then

\begin{align*}
D_v f(a) &= \left. \frac{d}{dt} (f \circ \ell)(t) \right|_{t=0} \\
&= (f \circ \ell)'(0) \\
&= Df(\ell(0))\,D\ell(0) && \text{(chain rule)} \\
&= \nabla f(a) \cdot \ell'(0) \\
&= \nabla f(a) \cdot v.
\end{align*}

Example 1.8.5. Returning to the function f(x, y, z) = xy^2 − 3z of Example 1.8.3, we have

\[
\nabla f(x, y, z) = (y^2,\, 2xy,\, -3).
\]

Thus

\[
D_{(2,1,-1)} f(x, y, z) = 2y^2 + 2xy + 3.
\]

Taking (x, y, z) = (q, r, s), this gives the same directional derivative that we computed in
Example 1.8.3. △
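The two computations can be compared symbolically. The snippet below (ours, using the external sympy library) evaluates the limit definition of Definition 1.8.1 and the dot product of Proposition 1.8.4 for this f and v, and checks that they agree.

```python
# sympy check (not from the notes) that the limit definition of the
# directional derivative matches the dot-product formula of Prop. 1.8.4.
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
f = x*y**2 - 3*z
v = (2, 1, -1)

# d/dt f((x, y, z) + t v), evaluated at t = 0
f_line = f.subs({x: x + t*v[0], y: y + t*v[1], z: z + t*v[2]}, simultaneous=True)
Dvf = sp.diff(f_line, t).subs(t, 0)

grad_dot_v = sum(f.diff(s)*vi for s, vi in zip((x, y, z), v))
assert sp.expand(Dvf - grad_dot_v) == 0
assert sp.expand(Dvf - (2*y**2 + 2*x*y + 3)) == 0   # as in Example 1.8.3
```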

Note that there is a significant computational advantage that comes from Proposi-
tion 1.8.4. Namely, we can compute the gradient ∇f once, and then find any directional
derivative by computing a simple dot product.
The following result gives two important interpretations of the gradient.

Proposition 1.8.6. Suppose f is differentiable at a ∈ A, and let k = f (a).

(a) The gradient ∇f (a) is a vector pointing in the direction in which f increases most
rapidly.
(b) If S is the level set of f at k (hence a ∈ S), then ∇f (a) is orthogonal to the tangent
line/plane/hyperplane to S at a.
(Figure: level sets of f at k = 1, 2, 3, with the point a on the level set k = 1 and ∇f(a)
perpendicular to it.)

Proof. (a) For any unit vector v, Dv f (a) tells us how fast f changes along the direction
v. We have
Dv f (a) = ∇f (a) · v = ∥v∥ ∥∇f (a)∥ cos θ = ∥∇f (a)∥ cos θ,
where θ is the angle between ∇f (a) and v. Now, cos θ attains its maximal value when θ = 0,
that is, when ∇f (a) and v are parallel.
(b) Let c(t) be a path in S such that c(t0 ) = a. Then c′ (t0 ) is a tangent vector lying
in the tangent plane to S at a. So we want to show that ∇f (a) ⊥ c′ (t0 ) for any such path.
For all t we have

c(t) ∈ S =⇒ f (c(t)) = k =⇒ (f ◦ c)′ (t) = 0.

Thus we have

0 = (f ◦ c)′ (t)
= D(f ◦ c)(t)
= ∇f (c(t)) · c′ (t). (chain rule)

Thus, taking t = t0 , we have 0 = ∇f (a) · c′ (t0 ), and so ∇f (a) ⊥ c′ (t0 ).

Exercises.
Exercises from [FRYb, §2.7] : Q1–Q8, Q10–Q20.

1.9 Higher order derivatives


A good reference for the material in this section is [FRYc, §2.3].
Suppose A ⊆ Rn is open and f : A → R. If the partial derivatives ∂f/∂x_i exist, then they are
again functions from A to R, and we can try to differentiate again. If the partial derivatives
are differentiable, we obtain the second order derivatives

\[
\frac{\partial^2 f}{\partial x_i\,\partial x_j} := \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j},
\qquad 1 \le i, j \le n.
\]

Of course, if the partial derivatives of these second order derivatives exist, we can differentiate
again, to get the third order derivatives

\[
\frac{\partial^3 f}{\partial x_i\,\partial x_j\,\partial x_k} := \frac{\partial}{\partial x_i} \frac{\partial^2 f}{\partial x_j\,\partial x_k},
\qquad 1 \le i, j, k \le n.
\]

We can continue as long as we obtain functions whose partial derivatives exist.
We may also use the subscript notation of (1.6), especially when our variables are denoted
x, y, z. Then, for example, we have

\[
f_{xx} := (f_x)_x = \frac{\partial^2 f}{\partial x\,\partial x}, \qquad
f_{xy} := (f_x)_y = \frac{\partial^2 f}{\partial y\,\partial x},
\]
\[
f_{yx} := (f_y)_x = \frac{\partial^2 f}{\partial x\,\partial y}, \qquad
f_{yy} := (f_y)_y = \frac{\partial^2 f}{\partial y\,\partial y}.
\]
Pay careful attention to the order of the subscripts!

Example 1.9.1. Suppose α, β ∈ R, and let f(x, y) = e^{αx+βy}. Then

\begin{align*}
f_x &= \frac{\partial f}{\partial x} = \alpha e^{\alpha x + \beta y}, &
f_y &= \frac{\partial f}{\partial y} = \beta e^{\alpha x + \beta y}, \\
f_{xx} &= \frac{\partial^2 f}{\partial x\,\partial x} = \alpha^2 e^{\alpha x + \beta y}, &
f_{xy} &= \frac{\partial^2 f}{\partial y\,\partial x} = \alpha\beta e^{\alpha x + \beta y}, \\
f_{yx} &= \frac{\partial^2 f}{\partial x\,\partial y} = \alpha\beta e^{\alpha x + \beta y}, &
f_{yy} &= \frac{\partial^2 f}{\partial y\,\partial y} = \beta^2 e^{\alpha x + \beta y}.
\end{align*}

In general, for m, n ≥ 0, we have

\[
\frac{\partial^{m+n} f}{\partial x^m\,\partial y^n} = \alpha^m \beta^n e^{\alpha x + \beta y}. \tag{1.16}
\]

Try proving this by induction (Exercise 1.9.1). △
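A symbolic spot check of (1.16) (ours, using the external sympy library) for one choice of m and n is easy to run; it does not replace the induction proof of Exercise 1.9.1.

```python
# sympy spot check of formula (1.16) with m = 2, n = 3; the symbols
# a, b play the role of alpha, beta from Example 1.9.1.
import sympy as sp

x, y, a, b = sp.symbols('x y a b')
f = sp.exp(a*x + b*y)

deriv = sp.diff(f, x, 2, y, 3)   # differentiate twice in x, three times in y
assert sp.simplify(deriv - a**2*b**3*f) == 0
```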

Example 1.9.2. Suppose

\[
f(x_1, x_2, x_3, x_4) = x_1^3 x_2 \cos(x_3 x_4).
\]

Then

\begin{align*}
\frac{\partial^4 f}{\partial x_3\,\partial x_1\,\partial x_2\,\partial x_4}
&= \frac{\partial^3}{\partial x_3\,\partial x_1\,\partial x_2} \left( -x_1^3 x_2 x_3 \sin(x_3 x_4) \right) \\
&= \frac{\partial^2}{\partial x_3\,\partial x_1} \left( -x_1^3 x_3 \sin(x_3 x_4) \right) \\
&= \frac{\partial}{\partial x_3} \left( -3x_1^2 x_3 \sin(x_3 x_4) \right) \\
&= -3x_1^2 \sin(x_3 x_4) - 3x_1^2 x_3 x_4 \cos(x_3 x_4).
\end{align*}

We also have

\begin{align*}
\frac{\partial^4 f}{\partial x_1\,\partial x_3\,\partial x_4\,\partial x_2}
&= \frac{\partial^3}{\partial x_1\,\partial x_3\,\partial x_4} \left( x_1^3 \cos(x_3 x_4) \right) \\
&= \frac{\partial^2}{\partial x_1\,\partial x_3} \left( -x_1^3 x_3 \sin(x_3 x_4) \right) \\
&= \frac{\partial}{\partial x_1} \left( -x_1^3 \sin(x_3 x_4) - x_1^3 x_3 x_4 \cos(x_3 x_4) \right) \\
&= -3x_1^2 \sin(x_3 x_4) - 3x_1^2 x_3 x_4 \cos(x_3 x_4). \qquad \triangle
\end{align*}
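The agreement of these two computations can be verified mechanically (a sympy check of ours, not part of the notes):

```python
# sympy check that the two differentiation orders in Example 1.9.2
# give the same answer.
import sympy as sp

x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')
f = x1**3 * x2 * sp.cos(x3*x4)

d1 = sp.diff(f, x4, x2, x1, x3)   # differentiate in x4, then x2, x1, x3
d2 = sp.diff(f, x2, x4, x3, x1)   # differentiate in x2, then x4, x3, x1
expected = -3*x1**2*sp.sin(x3*x4) - 3*x1**2*x3*x4*sp.cos(x3*x4)
assert sp.simplify(d1 - d2) == 0
assert sp.simplify(d1 - expected) == 0
```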

Definition 1.9.3 (Class C k , class C ∞ ). For k ∈ N, a function f : A → R, A ⊆ Rn , is


of class C k at a if all of the k-th order partial derivatives exist and are continuous at a.
More generally, f = (f1 , . . . , fm ) : A → Rm is of class C k at a if each component function
fi , 1 ≤ i ≤ m, is of class C k at a. A function is of class C ∞ at a if it is of class C k at a
for all k ∈ N. We say a function is of class C k (respectively, of class C ∞ ) if it is of class C k
(respectively, of class C ∞ ) at all a ∈ A.
Example 1.9.4. (a) A function is of class C 0 if it is continuous.
(b) A function is C 1 if all partial derivatives exist and are continuous. By Theorem 1.5.6,
all C 1 functions are differentiable.
We can think of the condition of being a class C k function as a “higher” continuity or
differentiability condition. △

Example 1.9.5. The functions sin x, cos x, e^x, 1/x, and log x are all of class C ∞ . (Note that
we are implicitly assuming that the domain of each function is the subset of R where it is
defined.) Similarly, all polynomials are of class C ∞ . △

We saw in Example 1.9.1 that fxy = fyx , and we saw in Example 1.9.2 that

\[
\frac{\partial^4 f}{\partial x_3\,\partial x_1\,\partial x_2\,\partial x_4}
= \frac{\partial^4 f}{\partial x_1\,\partial x_3\,\partial x_4\,\partial x_2}.
\]
In both examples, it did not matter in what order we computed the partial derivatives. The
following theorem explains this phenomenon.
Theorem 1.9.6 (Clairaut’s theorem or Schwarz’s theorem). Suppose A ⊆ Rn is open and
f : A → R. If the partial derivatives ∂²f/∂x_i∂x_j and ∂²f/∂x_j∂x_i exist and are continuous at a, then

\[
\frac{\partial^2 f}{\partial x_i\,\partial x_j}(a) = \frac{\partial^2 f}{\partial x_j\,\partial x_i}(a).
\]

In particular, if f is of class C^2, then

\[
\frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i}
\quad \text{for all } 1 \le i, j \le n.
\]
Proof. A proof of this theorem can be found in [FRYc, §2.3.1].

Remark 1.9.7. Using Theorem 1.9.6 and induction, we get analogues for higher order deriva-
tives. For example, if f : R3 → R is of class C 4 , then
\[
\frac{\partial^4 f}{\partial x\,\partial y\,\partial x\,\partial z}
= \frac{\partial^4 f}{\partial z\,\partial y\,\partial x^2}.
\]

Warning 1.9.8. There exist functions whose mixed partial derivatives exist, but are not
equal!

Example 1.9.9. Consider the function

\[
f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y) =
\begin{cases}
xy\,\dfrac{x^2 - y^2}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\
0 & \text{if } (x, y) = (0, 0).
\end{cases}
\]

Then one can show that

\[
\frac{\partial^2 f}{\partial x\,\partial y}(0, 0) = 1
\quad \text{and} \quad
\frac{\partial^2 f}{\partial y\,\partial x}(0, 0) = -1.
\]

See [FRYc, §2.3.2] for the details of this computation. △
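The key limits behind this computation can be reproduced with sympy (a check of ours; for full details see the cited reference): the first partials along the coordinate axes turn out to be f_x(0, y) = −y and f_y(x, 0) = x, whose cross derivatives at the origin disagree.

```python
# The limit computations behind Example 1.9.9: first partials along the
# axes, hence the unequal mixed partials at (0, 0).
import sympy as sp

x, y, h = sp.symbols('x y h')
f = x*y*(x**2 - y**2)/(x**2 + y**2)

fx_on_y_axis = sp.limit(f.subs(x, h)/h, h, 0)   # f_x(0, y), using f(0, y) = 0
fy_on_x_axis = sp.limit(f.subs(y, h)/h, h, 0)   # f_y(x, 0), using f(x, 0) = 0
assert fx_on_y_axis == -y and fy_on_x_axis == x

assert sp.diff(fx_on_y_axis, y) == -1   # (f_x)_y (0, 0) = -1
assert sp.diff(fy_on_x_axis, x) == 1    # (f_y)_x (0, 0) = 1
```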

The class of C k functions is closed under the following operations:


• Addition. If f, g : A → R are of class C k , then so is f + g.
• Scalar multiplication. If f : A → R is of class C k and α ∈ R, then αf is of class C k .
• Multiplication. If f, g : A → R are of class C k , then so is f g.
• Composition. If A ⊆ Rn , B ⊆ Rm ,

f : A → B, g : B → Rm ,

and f, g are both of class C k , then so is g ◦ f .

Exercises.
1.9.1. Prove (1.16) using induction.

Exercises from [FRYb, §2.3] : Q1–Q7.

1.10 Taylor’s theorem


Recall the single-variable version of Taylor’s theorem. It states that, if f : R → R is of class
C^k at a, then

\[
f(a + h) = \underbrace{f(a) + f'(a)h + \frac{1}{2}f''(a)h^2 + \cdots + \frac{1}{k!}f^{(k)}(a)h^k}_{k\text{-th Taylor polynomial}} + R_k(h),
\]

where

\[
\frac{R_k(h)}{h^k} \to 0 \quad \text{as } h \to 0.
\]

For a refresher on this topic, see [FRYa, §3.4].


When k = 1, Taylor’s theorem corresponds to the one-variable affine approximations we
discussed in Section 1.4. When k = 2, it yields the best possible quadratic approximation to
the function.
We would like to generalize Taylor’s theorem to more variables. For the remainder of
this section, suppose A ⊆ Rn is open and f : A → R.

Definition 1.10.1 (Taylor polynomial). If the k-th order partial derivatives of f exist at a,
then the k-th Taylor polynomial, or degree k Taylor polynomial, of f at a is

\[
p_k(a + h) = f(a) + \sum_{i=1}^{k} \frac{1}{i!} \sum_{j_1, \dots, j_i = 1}^{n}
\frac{\partial^i f}{\partial x_{j_1} \cdots \partial x_{j_i}}(a)\, h_{j_1} \cdots h_{j_i}, \tag{1.17}
\]

where h = (h_1, \dots, h_n). We define the remainder function

\[
R_{a,k}(h) := f(a + h) - p_k(a + h),
\]

so that

\[
f(a + h) = p_k(a + h) + R_{a,k}(h).
\]

Theorem 1.10.2 (Taylor’s theorem). If f is of class C^k at a, then

\[
\lim_{h \to 0} \frac{R_{a,k}(h)}{\|h\|^k} = 0.
\]

Proof. We will not prove this theorem in class. A proof of the k = 2 case can be found in
[MT11, §3.2, Th. 3].

Let us now describe the k = 2 case of Taylor’s theorem more explicitly.

Definition 1.10.3 (Hessian matrix). If the second partial derivatives of f exist, then the
Hessian matrix of f at a is

\[
Hf(a) :=
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2}(a) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\
\frac{\partial^2 f}{\partial x_2 \partial x_1}(a) & \frac{\partial^2 f}{\partial x_2^2}(a) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(a) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \frac{\partial^2 f}{\partial x_n \partial x_2}(a) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(a)
\end{bmatrix}
\in \operatorname{Mat}_{n \times n}(\mathbb{R}). \tag{1.18}
\]

Comparing (1.17) and (1.18), we see that the degree 2 Taylor polynomial is given by

\[
p_2(a + h) = f(a) + \nabla f(a) \cdot h + \frac{1}{2} h^\intercal (Hf(a)) h, \tag{1.19}
\]

where h^⊺ = [h_1 ⋯ h_n] is the transpose of h.

Corollary 1.10.4 (Second-order Taylor formula). If f is of class C^2 at a, then

\[
f(a + h) = f(a) + \nabla f(a) \cdot h + \frac{1}{2} h^\intercal (Hf(a)) h + R_{a,2}(h),
\]

where

\[
\lim_{h \to 0} \frac{R_{a,2}(h)}{\|h\|^2} = 0.
\]

Example 1.10.5. Consider the function

\[
f : \mathbb{R}^2 \to \mathbb{R}, \quad f(x, y) = e^y \cos(x^2 - 2y).
\]

Let us find the degree 2 Taylor polynomial at a = (\sqrt{\pi}, \pi). We have

\[
f(\sqrt{\pi}, \pi) = e^\pi \cos(-\pi) = -e^\pi.
\]

Next we compute the first-order partial derivatives:

\begin{align*}
\frac{\partial f}{\partial x} &= -2e^y x \sin(x^2 - 2y), &
\frac{\partial f}{\partial x}(\sqrt{\pi}, \pi) &= -2e^\pi \sqrt{\pi} \sin(-\pi) = 0, \\
\frac{\partial f}{\partial y} &= e^y \cos(x^2 - 2y) + 2e^y \sin(x^2 - 2y), &
\frac{\partial f}{\partial y}(\sqrt{\pi}, \pi) &= e^\pi \cos(-\pi) + 2e^\pi \sin(-\pi) = -e^\pi.
\end{align*}

Finally, we compute the second-order partial derivatives:

\begin{align*}
\frac{\partial^2 f}{\partial x^2} &= -2e^y \left( \sin(x^2 - 2y) + 2x^2 \cos(x^2 - 2y) \right), &
\frac{\partial^2 f}{\partial x^2}(\sqrt{\pi}, \pi) &= 4\pi e^\pi, \\
\frac{\partial^2 f}{\partial y\,\partial x} &= -2x \left( e^y \sin(x^2 - 2y) - 2e^y \cos(x^2 - 2y) \right), &
\frac{\partial^2 f}{\partial y\,\partial x}(\sqrt{\pi}, \pi) &= -4\sqrt{\pi}\, e^\pi, \\
\frac{\partial^2 f}{\partial y^2} &= -3e^y \cos(x^2 - 2y) + 4e^y \sin(x^2 - 2y), &
\frac{\partial^2 f}{\partial y^2}(\sqrt{\pi}, \pi) &= 3e^\pi, \\
\frac{\partial^2 f}{\partial x\,\partial y} &= -2x \left( e^y \sin(x^2 - 2y) - 2e^y \cos(x^2 - 2y) \right), &
\frac{\partial^2 f}{\partial x\,\partial y}(\sqrt{\pi}, \pi) &= -4\sqrt{\pi}\, e^\pi.
\end{align*}

Thus, the gradient and Hessian are

\[
\nabla f(\sqrt{\pi}, \pi) = (0, -e^\pi), \qquad
Hf(\sqrt{\pi}, \pi) = \begin{bmatrix} 4\pi e^\pi & -4\sqrt{\pi}\, e^\pi \\ -4\sqrt{\pi}\, e^\pi & 3e^\pi \end{bmatrix},
\]

and the degree 2 Taylor polynomial is

\begin{align*}
p_2\big( (\sqrt{\pi}, \pi) + (h_1, h_2) \big)
&= -e^\pi + (0, -e^\pi) \cdot (h_1, h_2)
+ \frac{1}{2} \begin{bmatrix} h_1 & h_2 \end{bmatrix}
\begin{bmatrix} 4\pi e^\pi & -4\sqrt{\pi}\, e^\pi \\ -4\sqrt{\pi}\, e^\pi & 3e^\pi \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix} \\
&= -e^\pi - e^\pi h_2 + 2\pi e^\pi h_1^2 + \frac{3}{2} e^\pi h_2^2 - 4\sqrt{\pi}\, e^\pi h_1 h_2.
\end{align*}

Since f is of class C^2, Corollary 1.10.4 tells us that the difference between the function and
this polynomial goes to zero faster than ∥h∥2 as h → 0. △
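These derivative evaluations can be confirmed symbolically (a sympy check of ours, not part of the notes):

```python
# sympy verification of the gradient and Hessian computed in
# Example 1.10.5 at the point a = (sqrt(pi), pi).
import sympy as sp

x, y = sp.symbols('x y')
f = sp.exp(y) * sp.cos(x**2 - 2*y)
a = {x: sp.sqrt(sp.pi), y: sp.pi}

grad = sp.Matrix([f.diff(x), f.diff(y)]).subs(a)
H = sp.hessian(f, (x, y)).subs(a)

E = sp.exp(sp.pi)
assert sp.simplify(grad - sp.Matrix([0, -E])) == sp.zeros(2, 1)
expected_H = sp.Matrix([[4*sp.pi*E, -4*sp.sqrt(sp.pi)*E],
                        [-4*sp.sqrt(sp.pi)*E, 3*E]])
assert sp.simplify(H - expected_H) == sp.zeros(2, 2)
```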

Exercises.
1.10.1 ([MT11, §3.2, Exercise 1]). Let f (x, y) = e^{x+y} .

(a) Find the first-order Taylor polynomial for f at (0, 0).


(b) Find the second-order Taylor polynomial for f at (0, 0).

1.10.2 ([MT11, §3.2, Exercise 2]). Suppose

L : R2 → R, L(x, y) = ax + by,

is a linear function.

(a) Find the first-order Taylor approximation for L. (You should find that p1 = L.)
(b) Find the second-order Taylor approximation for L.
(c) What will the higher-order approximations look like?

1.10.3 ([MT11, §3.2, Exercises 3–7]). For each of the following, determine the second-order
Taylor polynomial for the given function about the point (0, 0). Use this polynomial to
approximate f (−1, −1). Compare your approximation to the exact value using a calculator.

(a) f (x, y) = (x + y)^2
(b) f (x, y) = 1/(x^2 + y^2 + 1)
(c) f (x, y) = e^{x+y}
(d) f (x, y) = e^{−x^2−y^2} cos(xy)

1.10.4 ([MT11, §3.2, Exercise 11]). Let g(x, y) = sin(xy) − 3x^2 ln y + 1. Find the degree 2
polynomial that best approximates g near the point (π/2, 1).

1.11 The implicit function theorem


In this section we state the implicit function theorem, which is one of the most important
theorems in mathematical analysis. A detailed proof is beyond the scope of this course, but
we will give some intuition for why it is true. A good reference for the material in this section
is [MT11, §3.5].
Recall that if A ⊆ Rn , f : A → R, and k ∈ R, then the level set of f at k is

{x ∈ A : f (x) = k}.

If we define g : A → R by g(x) = f (x) − k, then

{x ∈ A : f (x) = k} = {x ∈ A : g(x) = 0}.



Thus, without loss of generality, we can assume that k = 0.


Now, consider such a level set, given by the equation

f (x, y) = 0.

When can we solve for one variable in terms of another? If the level set satisfies the vertical
line test (every vertical line meets the graph at most once), then we can write y as a function
of x.
(Figure: a level set in the plane passing the vertical line test.)

Now consider the level set given by the equation

\[
f(x, y) = x^3 - 8x^2 + 16x - y^2 = 0.
\]

(Figure: this level set, a curve crossing the x-axis at x = 0 and x = 4.)

This level set does not satisfy the vertical line test. However, near some points we could
write y as a function of x. For instance, consider the portion of the level set near the point
(1, 3), indicated in red above. In this small region, we can write y as a function of x. On
the other hand, we would not be able to do this near the point (0, 0). The problem is that
the tangent line is vertical there.

Definition 1.11.1 (Neighbourhood). A neighbourhood of a point a ∈ Rn is an open set


U ⊆ Rn such that a ∈ U .

To state the implicit function theorem, we use the fact that Rn+1 = Rn × R to write
points of Rn+1 in the form (x, z), where x ∈ Rn and z ∈ R. In particular, if F : Rn+1 → R,
then ∂F/∂z denotes the partial derivative with respect to the last variable.

Theorem 1.11.2 (Special implicit function theorem). Suppose F : Rn+1 → R is of class C 1
and (x0 , z0 ) ∈ Rn+1 , such that

\[
F(x_0, z_0) = 0 \quad \text{and} \quad \frac{\partial F}{\partial z}(x_0, z_0) \ne 0.
\]

Then there exist neighbourhoods U ⊆ Rn of x0 and V ⊆ R of z0 such that there is a unique
function

g : U → V such that F (x, g(x)) = 0 for all x ∈ U.

Furthermore, g is of class C 1 and

\[
\frac{\partial g}{\partial x_i}(x) = -\frac{\frac{\partial F}{\partial x_i}(x, g(x))}{\frac{\partial F}{\partial z}(x, g(x))} \tag{1.20}
\]

for all x ∈ U and i ∈ {1, . . . , n}.
Proof. This theorem is a special case of the general implicit function theorem that we will state
below (Theorem 1.11.5). Once it is known that z = g(x) exists and is differentiable, we can obtain
equation (1.20) by implicit differentiation: applying the chain rule to F (x, g(x)) = 0 gives

\[
\frac{\partial F}{\partial x_i}(x, g(x)) + \frac{\partial F}{\partial z}(x, g(x))\,\frac{\partial g}{\partial x_i}(x) = 0,
\]

which is equivalent to (1.20).

Example 1.11.3. Consider the points (1, 3), (0, 0), and (4, 0) of the level set

\[
F(x, y) = x^3 - 8x^2 + 16x - y^2 = 0.
\]

We have

\[
\frac{\partial F}{\partial y} = -2y.
\]

This is zero if and only if y = 0. Therefore, the special implicit function theorem (The-
orem 1.11.2) implies that we can write y as a function of x near every point on the level
set where y ≠ 0. In particular, we can do so near the point (1, 3). (Note that the z in
Theorem 1.11.2 is our y here.)

We could also interchange the roles of the variables, and try to write x as a function of
y. Then the z of Theorem 1.11.2 is our x. Since

\[
\frac{\partial F}{\partial x} = 3x^2 - 16x + 16 = (x - 4)(3x - 4),
\]

we can write x as a function of y near every point on the level set where x ≠ 4/3, 4. In
particular, we can do this near (0, 0), a point where we could not write y as a function of x.

The most problematic point is (4, 0), since both partial derivatives of F vanish here:

\[
\nabla F(4, 0) = 0.
\]

Here we cannot write any variable as a function of the others. Looking at the shape of the
level set, we see why this is the case. The level set does not satisfy the vertical line
test or the horizontal line test in any neighbourhood of the point (4, 0). △
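Formula (1.20) also computes the slope of the implicitly defined function. The sympy snippet below (ours, not part of the notes) checks that (1, 3) lies on the level set and evaluates dy/dx = −F_x/F_y there.

```python
# Implicit differentiation via (1.20), checked with sympy: along the
# level set of Example 1.11.3, dy/dx = -F_x / F_y wherever y != 0.
import sympy as sp

x, y = sp.symbols('x y')
F = x**3 - 8*x**2 + 16*x - y**2

dydx = -F.diff(x) / F.diff(y)
assert F.subs({x: 1, y: 3}) == 0                     # (1, 3) lies on the level set
assert dydx.subs({x: 1, y: 3}) == sp.Rational(1, 2)  # slope of the curve there
```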

Remark 1.11.4. As illustrated in Example 1.11.3, if F : Rn+1 → R is of class C 1 and x0 ∈


Rn+1 , such that
F (x0 ) = 0 and ∇F (x0 ) ̸= 0,
then we can write at least one variable in terms of the others.

We now give a more general version of the implicit function theorem. Suppose we have
a C 1 function

\[
F = (F_1, \dots, F_m) : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m.
\]

We write points of Rn × Rm = Rn+m as (x, z), with x = (x1 , . . . , xn ) ∈ Rn , z = (z1 , . . . , zm ) ∈
Rm . Define

\[
D_x F := \begin{bmatrix}
\frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n}
\end{bmatrix} \in \operatorname{Mat}_{m \times n}(\mathbb{R})
\quad \text{and} \quad
D_z F := \begin{bmatrix}
\frac{\partial F_1}{\partial z_1} & \cdots & \frac{\partial F_1}{\partial z_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial F_m}{\partial z_1} & \cdots & \frac{\partial F_m}{\partial z_m}
\end{bmatrix} \in \operatorname{Mat}_{m \times m}(\mathbb{R}).
\]

(This conflicts with our notation for the directional derivative, but the context should make
clear what we mean.) We view the equation F (x, z) = 0 as a collection of m equations:

\[
F_1(x, z) = 0, \quad \dots, \quad F_m(x, z) = 0.
\]

We’re interested in trying to solve for z1 , . . . , zm in terms of x1 , . . . , xn .
Theorem 1.11.5 (General implicit function theorem). Suppose F : Rn × Rm → Rm is a C 1
function and (x0 , z0 ) ∈ Rn × Rm , such that

F (x0 , z0 ) = 0 and Dz F (x0 , z0 ) is invertible.

Then there exist neighbourhoods U ⊆ Rn of x0 and V ⊆ Rm of z0 such that there is a unique
function

g : U → V such that F (x, g(x)) = 0 for all x ∈ U.

Furthermore, g is of class C 1 and

\[
Dg(x) = -\big( D_z F(x, g(x)) \big)^{-1} D_x F(x, g(x)).
\]
Sketch of proof. For a complete proof of this theorem, see [Gou59, p. 45]. We give here only
a sketch to provide some intuition as to why the theorem might hold. The first degree
Taylor polynomial of F at (x0 , z0 ) (see (1.17)) gives the approximation

\begin{align*}
F(x, z) &\approx F(x_0, z_0) + DF(x_0, z_0) \cdot \big( (x, z) - (x_0, z_0) \big) \\
&= F(x_0, z_0) + D_x F(x_0, z_0)(x - x_0) + D_z F(x_0, z_0)(z - z_0).
\end{align*}

If Dz F (x0 , z0 ) is invertible, then we can (approximately) solve the equation F (x, z) = 0 to
get

\[
z \approx z_0 - (D_z F(x_0, z_0))^{-1} \big( F(x_0, z_0) + D_x F(x_0, z_0)(x - x_0) \big).
\]

Even though this is only an approximation, it is still true that if Dz F (x0 , z0 ) is invertible,
then we can solve for z in terms of x at nearby points.

Remark 1.11.6. Remember that a square matrix is invertible if and only if its determinant
is nonzero. So we can see if Dz F (x0 , z0 ) is invertible or not by computing its determinant.

Example 1.11.7. Consider the set of points (x, y, z) satisfying the following two equations:

\begin{align*}
x \sin y + e^{x+z} &= 1, \\
x^2 y + e^{yz} - z^2 &= 1.
\end{align*}

Note that the points (1, 0, −1) and (−1, 0, 1) are both solutions. Can we solve for y and z
in terms of x around these points?

We will use the implicit function theorem (Theorem 1.11.5) with n = 1, m = 2, and
F = (F1 , F2 ) : R3 → R2 , with

\[
F_1(x, y, z) = x \sin y + e^{x+z} - 1, \qquad F_2(x, y, z) = x^2 y + e^{yz} - z^2 - 1.
\]

We have

\[
DF = \begin{bmatrix}
\frac{\partial F_1}{\partial x} & \frac{\partial F_1}{\partial y} & \frac{\partial F_1}{\partial z} \\
\frac{\partial F_2}{\partial x} & \frac{\partial F_2}{\partial y} & \frac{\partial F_2}{\partial z}
\end{bmatrix}
= \begin{bmatrix}
\sin y + e^{x+z} & x \cos y & e^{x+z} \\
2xy & x^2 + z e^{yz} & y e^{yz} - 2z
\end{bmatrix}.
\]

Thus

\[
D_{(y,z)} F = \begin{bmatrix} x \cos y & e^{x+z} \\ x^2 + z e^{yz} & y e^{yz} - 2z \end{bmatrix}, \qquad
D_{(y,z)} F(1, 0, -1) = \begin{bmatrix} 1 & 1 \\ 0 & 2 \end{bmatrix}.
\]

Since

\[
\det D_{(y,z)} F(1, 0, -1) = 1(2) - 1(0) = 2 \ne 0,
\]

the matrix D_{(y,z)}F(1, 0, −1) is invertible. Thus, the implicit function theorem applies, and we
can solve for (y, z) in terms of x near the point (1, 0, −1).

However,

\[
D_{(y,z)} F(-1, 0, 1) = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix},
\]

which is not invertible. Thus, the implicit function theorem does not apply. On the other
hand,

\[
D_{(x,z)} F = \begin{bmatrix} \sin y + e^{x+z} & e^{x+z} \\ 2xy & y e^{yz} - 2z \end{bmatrix}, \qquad
D_{(x,z)} F(-1, 0, 1) = \begin{bmatrix} 1 & 1 \\ 0 & -2 \end{bmatrix}
\]

is invertible. So the implicit function theorem tells us that we can solve for (x, z) in terms
of y near the point (−1, 0, 1). △
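The invertibility computations in this example are easy to double-check with sympy (a verification of ours, not part of the notes): a square matrix is invertible exactly when its determinant is nonzero, as noted in Remark 1.11.6.

```python
# sympy check of the invertibility computations in Example 1.11.7.
import sympy as sp

x, y, z = sp.symbols('x y z')
F = sp.Matrix([x*sp.sin(y) + sp.exp(x + z) - 1,
               x**2*y + sp.exp(y*z) - z**2 - 1])

DyzF = F.jacobian([y, z])
assert DyzF.subs({x: 1, y: 0, z: -1}).det() != 0    # can solve for (y, z)
assert DyzF.subs({x: -1, y: 0, z: 1}).det() == 0    # theorem does not apply

DxzF = F.jacobian([x, z])
assert DxzF.subs({x: -1, y: 0, z: 1}).det() != 0    # but can solve for (x, z)
```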

Warning 1.11.8. The converses to Theorems 1.11.2 and 1.11.5 are false! For example, if
∂F/∂z (x0 , z0 ) = 0 in the setting of Theorem 1.11.2, this does not imply that it is impossible to
write z as a function of x near (x0 , z0 ). See Exercise 1.11.1.

Exercises.
1.11.1. Use the function F (x, z) = x3 − z 3 to show that the converse to Theorem 1.11.2 is
false.

1.11.2 ([MT11, §3.5, Exercise 1]). Show that the equation x + y − z + cos(xyz) = 1 can be
solved for z = g(x, y) near the origin. Find ∂g/∂x and ∂g/∂y at (0, 0).

1.11.3 ([MT11, §3.5, Exercise 2]). Show that xy + z + 3xz^5 = 4 is solvable for z as a function
of (x, y) near (1, 0, 1). Compute ∂z/∂x and ∂z/∂y at (x, y) = (1, 0).

1.11.4 ([MT11, §3.5, Exercise 6]). Consider the surface S given by 3y 2 z 2 − 3x = 0.

(a) Using the implicit function theorem, verify that we can solve for x as a function of y
and z near any point on S. Explicitly write x as a function of y and z.
(b) Show that near (1, 1, −1) we can solve for either y or z, and give explicit expressions
for these variables in terms of the other two.

1.11.5 ([MT11, §3.5, Exercise 7]). Show that x^3 z^2 − z^3 yx = 0 is solvable for z as a function
of (x, y) near (1, 1, 1), but not near the origin. Compute ∂z/∂x and ∂z/∂y at (x, y) = (1, 1).

1.11.6 ([MT11, §3.5, Exercise 8]). Discuss the solvability of the system

3x + 2y + z 2 + u + v 2 = 0
4x + 3y + z + u2 + v + w + 2 = 0
x + z + w + u2 + 2 = 0

for u, v, w in terms of x, y, z near x = y = z = 0, u = v = 0, and w = −2.

1.11.7 ([MT11, §3.5, Exercise 14]). Consider the unit sphere

S = {(x, y, z) : x2 + y 2 + z 2 = 1}.

This sphere intersects the x-axis at two points. Which variables can we solve for at these
points? What about the points of intersection of S with the y-axis and z-axis?

1.12 The inverse function theorem


A special case of the implicit function theorem (Theorem 1.11.5) is the inverse function
theorem. Suppose we have a function

f : Rn → Rn .

When can we invert this function, at least locally (that is, in a neighbourhood of a point)?
The inverse function theorem says that we can do this when the derivative is invertible at
that point.

Theorem 1.12.1 (Inverse function theorem). Suppose f : Rn → Rn is a C 1 -function, x0 ∈


Rn , and Df (x0 ) is an invertible matrix. Then there exists a neighbourhood U of x0 and a
neighbourhood V of f (x0 ) such that f |U : U → V is a bijection (hence invertible). Moreover,
(f |U )−1 : V → U is C 1 .

Proof. Consider the function

F : Rn × Rn → Rn , F (x, y) = f (x) − y.

Thus,
y = f (x) ⇐⇒ F (x, y) = 0.
Let y0 = f (x0 ), so that F (x0 , y0 ) = 0.
Since
Dx F = Df,
our assumption that Df (x0 ) is invertible implies that Dx F (x0 ) is invertible. Hence, the
implicit function theorem (Theorem 1.11.5) implies that there exist neighbourhoods V of y0
and U of x0 such that there is a unique function

g: V → U such that F (g(y), y) = 0 for all y ∈ V.

In other words, g is the unique function satisfying

y = f (g(y)) for all y ∈ V. (1.21)

Thus g = (f |U )−1 . (More precisely, (1.21) shows that g is a right inverse to f |U , implying
that f |U is surjective. The uniqueness of g shows that f is also injective, so that g is a
two-sided inverse to f |U .)

Example 1.12.2 (Spherical coordinates). Consider the function

f : R3 → R3 , f (ρ, θ, φ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ).

Then we have  
sin φ cos θ −ρ sin φ sin θ ρ cos φ cos θ
Df =  sin φ sin θ ρ sin φ cos θ ρ cos φ sin θ  .
cos φ 0 −ρ sin φ

Using some basic trigonometric formulas (Exercise 1.12.4), we can compute that

det(Df ) = −ρ2 sin φ. (1.22)

Therefore, we can apply the inverse function theorem (Theorem 1.12.1) as long as ρ ̸= 0
and sin φ ̸= 0 (that is, φ is not a multiple of π).
The function f maps (ρ, θ, φ) to the point (x, y, z) ∈ R3 whose spherical coordinates are
(ρ, θ, φ).
When ρ = 0, we are at the origin. Then changes in θ and φ do not change the point (x, y, z).
Thus, the function is not locally invertible when ρ = 0. Similarly, when φ is a multiple of
π, we are on the z-axis, and changes in θ do not change the point. We will discuss spherical
coordinates in more detail in Section 4.5. △
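A symbolic spot check of (1.22) can be run with sympy (ours; it does not replace the trigonometric computation asked for in Exercise 1.12.4):

```python
# sympy verification of formula (1.22): det(Df) = -rho^2 sin(phi) for
# the spherical coordinate map of Example 1.12.2.
import sympy as sp

rho, theta, phi = sp.symbols('rho theta phi')
f = sp.Matrix([rho*sp.sin(phi)*sp.cos(theta),
               rho*sp.sin(phi)*sp.sin(theta),
               rho*sp.cos(phi)])

detDf = f.jacobian([rho, theta, phi]).det()
assert sp.simplify(detDf + rho**2*sp.sin(phi)) == 0
```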

Warning 1.12.3. The converse to Theorem 1.12.1 is false! See Exercise 1.12.1.

Exercises.
1.12.1. Show that the converse to Theorem 1.12.1 is false by giving a counterexample. Hint:
Try n = 1 and f (x) = x3 .

1.12.2 ([MT11, §3.5, Exercise 11]). Consider

\[
f : \mathbb{R}^2 \setminus \{(0, 0)\} \to \mathbb{R}^2, \quad
f(x, y) = \left( \frac{x^2 - y^2}{x^2 + y^2}, \frac{xy}{x^2 + y^2} \right).
\]

Does this map have a local inverse near (x, y) = (0, 1)?

1.12.3 ([MT11, §3.5, Exercise 12]). (a) Define

\[
F(r, \theta) = \big( x(r, \theta), y(r, \theta) \big) = (r \cos\theta, r \sin\theta).
\]

Show that the determinant of the Jacobian matrix at (r0 , θ0 ) is

\[
\det DF(r_0, \theta_0) = r_0.
\]

(b) When does there exist a smooth inverse function (r(x, y), θ(x, y))? Check directly and
with the inverse function theorem.

1.12.4. Prove (1.22).


Chapter 2

Extrema

One of the important applications of single-variable calculus that you saw in previous courses
is finding maxima and minima of functions of one variable. We would now like to extend
these techniques to functions of several variables. A good reference for the material in this
chapter is [FRYc, §2.9, §2.10].

2.1 Maximum and minimum values


We begin with some definitions. Recall the definition of the open ball Br (a) given in Defi-
nition 1.1.1.

Definition 2.1.1 (Maximum and minimum). Suppose A ⊆ Rn , f : A → R, and a ∈ A.

• We say a is a local maximum of f if there exists r > 0 such that

f (x) ≤ f (a) for all x ∈ Br (a).

• We say a is a local minimum of f if there exists r > 0 such that

f (x) ≥ f (a) for all x ∈ Br (a).

• We say a is a global maximum, or absolute maximum, of f if

f (x) ≤ f (a) for all x ∈ A.

• We say a is a global minimum, or absolute minimum of f if

f (x) ≥ f (a) for all x ∈ A.

We say a is a local extremum if it is a local minimum or a local maximum. We say a is a


global extremum if it is a global minimum or a global maximum. The plurals of maximum,
minimum, and extremum, are maxima, minima, and extrema, respectively.

Definition 2.1.2 (Critical point). Suppose A ⊆ Rn and f : A → R. We say a ∈ A is a


critical point if either ∇f (a) = 0 or f is not differentiable at a.


Remark 2.1.3. If a ∈ ∂A (that is, a is a boundary point of A), then f is not differentiable
at a. Hence, all boundary points are critical points.

Remark 2.1.4. Some texts (e.g. [FRYc]) use the term critical point only to refer to points a
such that ∇f (a) = 0, and call a ∈ A a singular point if f is not differentiable at a.

You learned in single-variable calculus that every continuous function defined on a closed
interval attains a global maximum and minimum. This result can be generalized to multiple
variables.
Definition 2.1.5 (Bounded, compact). We say that the set A ⊆ Rn is bounded if there
exists some r > 0 such that A ⊆ Br (0). If A is bounded and contains its boundary (that is,
∂A ⊆ A), we say A is compact.
Examples 2.1.6. (a) For r > 0 and a ∈ Rn , the open ball Br (a) is not compact since it
does not contain its boundary.
(b) For r > 0 and a ∈ Rn , the closed ball {x : ∥x − a∥ ≤ r} is compact.
(c) The sphere {(x, y, z) : x2 + y 2 + z 2 = 1} is compact since it is bounded and has no
boundary (hence it contains its boundary, since the empty set is contained in every
set).
(d) The set [0, 1]2 is compact.
(e) The set A = [0, 1] × (0, 1] is not compact, since it does not contain its boundary. For
example, (0.5, 0) is a boundary point of A that is not contained in A.
(f) The set A = R × [0, 1] is not compact, since is it not bounded.

Theorem 2.1.7 (Extreme value theorem). Suppose A ⊆ Rn is compact, and let f : A → R


be continuous. Then f attains a global maximum and minimum. In other words, there exist
xmin , xmax ∈ A such that

f (xmin ) ≤ f (x) ≤ f (xmax ) for all x ∈ A.

Proof. The proof of this theorem uses concepts from real analysis, and is beyond the scope of
this course. For an idea of the proof, see http://www.cut-the-knot.org/fta/fta_note.shtml.

Example 2.1.8. Let a, b ∈ R with a < b. The interval [a, b] ⊆ R is bounded and contains
its boundary, hence is compact. Then Theorem 2.1.7 reduces to the extreme value theorem
you saw in single-variable calculus: any continuous function f : [a, b] → R attains a global
maximum and minimum. △

A function f : A → R, A ⊆ Rⁿ, is bounded if there is a number M > 0 such that

−M ≤ f(x) ≤ M for all x ∈ A.

Note that this is equivalent to the image f(A) of f being a bounded set in the sense of
Definition 2.1.5.

Lemma 2.1.9. If A ⊆ Rn is compact and f : A → R is continuous, then f is bounded.


Proof. By Theorem 2.1.7, f attains a global maximum M1 and a global minimum m1. If we
define M = max{|M1|, |m1|}, then for all x ∈ A, we have

−M ≤ m1 ≤ f(x) ≤ M1 ≤ M.

Hence f is bounded.
Theorem 2.1.10 (First derivative test). Every local extremum is a critical point.
Proof. Suppose A ⊆ Rn and f : A → R. Suppose a = (a1 , . . . , an ) ∈ A is a local extremum.
If f is not differentiable at a, then a is a critical point, and we are done. Thus, we suppose
f is differentiable at a. For 1 ≤ i ≤ n, consider the function
gi : R → R,  gi(x) = f(a1, . . . , a_{i−1}, x, a_{i+1}, . . . , an).

Since a is a local extremum for f, the point ai is a local extremum for gi. By the single-variable
first derivative test, we have

0 = (dgi/dx)(ai) = (∂f/∂xi)(a).

Hence ∇f(a) = 0.

Warning 2.1.11. The converse to Theorem 2.1.10 is false, as the following example illustrates.

Example 2.1.12. Consider the function


f : R² → R,  f(x, y) = x² − y².

Its graph looks as follows:

[Figure: the saddle surface z = x² − y².]

We have ∇f = (2x, −2y). Hence ∇f(0, 0) = 0, and so (0, 0) is a critical point. However,
(0, 0) is not a local extremum. We have f(0, 0) = 0. However, for any r > 0, the ball Br(0, 0)
contains the points (r/2, 0) and (0, r/2). Since

f(r/2, 0) = r²/4 > 0  and  f(0, r/2) = −r²/4 < 0,

the point (0, 0) is neither a local maximum nor a local minimum. △



Definition 2.1.13 (Saddle point). Suppose A ⊆ Rn , f : A → R, and a ∈ A. We say that a


(or the corresponding point (a, f (a)) on the graph of f ) is a saddle point if ∇f (a) = 0, but
a is not a local extremum of f .

In Example 2.1.12, the point (0, 0, 0) is a saddle point for f.

Example 2.1.14. Let us find the critical points of the function

f : R² → R,  f(x, y) = x²y + xy² + xy.

We have

∂f/∂x = 2xy + y² + y = y(2x + y + 1),
∂f/∂y = x² + 2xy + x = x(x + 2y + 1).

Since f is differentiable everywhere, we must solve ∇f (x, y) = (0, 0). So we have

y(2x + y + 1) = 0 and x(x + 2y + 1) = 0.

Thus
(y = 0 or 2x + y + 1 = 0) and (x = 0 or x + 2y + 1 = 0).

So we have four cases:

• x = 0 and y = 0.
• y = 0 and x + 2y + 1 = 0, which implies that x = −1.
• 2x + y + 1 = 0 and x = 0, which implies that y = −1.
• 2x + y + 1 = 0 and x + 2y + 1 = 0. Solving this linear system (exercise!) gives
x = y = −1/3.

Thus, there are four critical points:

(0, 0),  (−1, 0),  (0, −1),  (−1/3, −1/3).

If f has any local extrema, then they must be amongst these points. We will return to this
example in Example 2.2.5. △
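The case analysis above is easy to double-check numerically. The following sketch (plain Python; the helper names fx and fy are our own, not from the text) evaluates the two partial derivatives computed above at the four candidate points:

```python
# Check that the four candidate points of Example 2.1.14 really are
# critical points of f(x, y) = x^2 y + x y^2 + x y, by evaluating the
# partial derivatives computed in the text.

def fx(x, y):
    return 2 * x * y + y**2 + y    # df/dx = y(2x + y + 1)

def fy(x, y):
    return x**2 + 2 * x * y + x    # df/dy = x(x + 2y + 1)

critical_points = [(0.0, 0.0), (-1.0, 0.0), (0.0, -1.0), (-1/3, -1/3)]

for (x, y) in critical_points:
    assert abs(fx(x, y)) < 1e-12 and abs(fy(x, y)) < 1e-12
print("all four points are critical points")
```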

Exercises.
2.1.1. Consider the function f(x, y) = 2(x² + y²)e^{−x²−y²}, whose graph looks as follows:

[Figure: the graph z = 2(x² + y²)e^{−x²−y²}, a circular ridge around the origin.]

Find the critical points.

Exercises from [FRYb, §2.9] : Q1–Q3

2.2 The second derivative test


If we are searching for extrema, the first derivative test (Theorem 2.1.10) allows us to restrict
our attention to critical points. However, once we have found the critical points, how do we
determine if they are extrema? One useful tool is the second derivative test, which we now
explore. We first need a few facts from linear algebra.
Note that, if B ∈ Matn×n (R) and h ∈ Rn , then h⊺ Bh is a 1 × 1 matrix, which we
sometimes just view as a real number.
Definition 2.2.1 (Positive/negative definite matrix). A square matrix B ∈ Matn×n(R) is
positive definite if

h⊺Bh > 0 for all h ∈ Rⁿ, h ≠ 0.

The matrix B is negative definite if

h⊺Bh < 0 for all h ∈ Rⁿ, h ≠ 0.

(Note that a matrix can be neither positive definite nor negative definite.)
For a matrix B = (bij)1≤i,j≤n ∈ Matn×n(R), define the submatrices

B^(1) = [b11],  B^(2) = [b11 b12; b21 b22],  . . . ,  B^(n) = B

(writing [a b; c d] for the 2 × 2 matrix with rows (a, b) and (c, d)). In other words, for
1 ≤ m ≤ n, B^(m) is the top-left m × m submatrix of B. Recall that the matrix B is
symmetric if bij = bji for 1 ≤ i, j ≤ n. For symmetric matrices, there is an easy way to
determine if the matrix is positive or negative definite.
Proposition 2.2.2. Suppose B ∈ Matn×n(R) is symmetric.

(a) The matrix B is positive definite if and only if det(B^(m)) > 0 for all 1 ≤ m ≤ n.
(b) The matrix B is negative definite if and only if, for 1 ≤ m ≤ n, det(B^(m)) < 0 if m
is odd and det(B^(m)) > 0 if m is even.

Proof. For a proof of this result, see [Nic, Th. 8.3.3].
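Proposition 2.2.2 translates directly into a small algorithm: compute the determinants of the top-left submatrices and check their signs. Here is a minimal sketch in Python (the helper names det, leading_minors, and so on are our own):

```python
# Definiteness test for a symmetric matrix via leading principal minors
# (Proposition 2.2.2).  det() is a tiny cofactor expansion, fine for
# the small matrices that appear in the second derivative test.

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def leading_minors(B):
    # determinants of B^(1), B^(2), ..., B^(n)
    return [det([row[:m] for row in B[:m]]) for m in range(1, len(B) + 1)]

def is_positive_definite(B):
    return all(d > 0 for d in leading_minors(B))

def is_negative_definite(B):
    return all((d < 0) if m % 2 == 1 else (d > 0)
               for m, d in enumerate(leading_minors(B), start=1))

print(is_negative_definite([[-2/3, -1/3], [-1/3, -2/3]]))  # True
print(is_positive_definite([[0, 1], [1, 0]]))              # False: det B^(1) = 0
```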
Recall the Hessian matrix from (1.18):

Hf(a) := [ ∂²f/(∂xi ∂xj)(a) ]1≤i,j≤n ∈ Matn×n(R),

the n × n matrix whose (i, j) entry is the second-order partial derivative ∂²f/(∂xi ∂xj)(a).

If f is of class C² at a, then it follows from Clairaut's theorem (Theorem 1.9.6) that Hf(a)
is a symmetric matrix.
Theorem 2.2.3 (Second derivative test). Suppose A ⊆ Rⁿ is open, x0 ∈ A, f : A → R is of
class C², and ∇f(x0) = 0.

(a) If Hf(x0) is positive definite, then x0 is a local minimum.
(b) If Hf(x0) is negative definite, then x0 is a local maximum.
(c) If det(Hf(x0)) ≠ 0 and Hf(x0) is neither positive definite nor negative definite, then
x0 is a saddle point.
(d) If det(Hf(x0)) = 0, then we cannot draw any conclusion without additional work.

Proof. For a detailed proof, see [MT11, §3.3, Th. 5]. For a general idea, note that, when
∇f(x0) = 0, the degree 2 Taylor polynomial from (1.19) becomes

p2(x0 + h) = f(x0) + (1/2) h⊺(Hf(x0))h.

If Hf(x0) is positive definite, then p2(x0 + h) > f(x0) for h ≠ 0. Since p2 is a good
approximation to f near x0, this implies that f(x) ≥ f(x0) for x near x0, and hence x0 is
a local minimum. Similarly, if Hf(x0) is negative definite, then f(x) ≤ f(x0) for x near x0,
and so x0 is a local maximum.

It can be useful for reference purposes to give the n = 2 case of the second derivative test
explicitly.
Corollary 2.2.4 (Two-variable second derivative test). Suppose A ⊆ R² is open, (a, b) ∈ A,
f : A → R is of class C², and ∇f(a, b) = 0. Then we have

det(Hf(a, b)) = (∂²f/∂x²)(a, b) · (∂²f/∂y²)(a, b) − ((∂²f/∂x∂y)(a, b))².

(a) If (∂²f/∂x²)(a, b) > 0 and det(Hf(a, b)) > 0, then (a, b) is a local minimum.
(b) If (∂²f/∂x²)(a, b) < 0 and det(Hf(a, b)) > 0, then (a, b) is a local maximum.
(c) If det(Hf(a, b)) < 0, then (a, b) is a saddle point.
(d) If det(Hf(a, b)) = 0, then we cannot draw any conclusion without additional work.
Example 2.2.5. Consider again the function
f : R² → R,  f(x, y) = x²y + xy² + xy.

In Example 2.1.14, we found that the critical points of f are

(0, 0),  (−1, 0),  (0, −1),  (−1/3, −1/3).

The second partial derivatives of f are

∂²f/∂x² = 2y,  ∂²f/∂x∂y = 2x + 2y + 1,  ∂²f/∂y² = 2x.

Therefore, the Hessian matrix is

Hf = [2y, 2x + 2y + 1; 2x + 2y + 1, 2x]

(rows separated by semicolons), and so

Hf(0, 0) = [0, 1; 1, 0],   Hf(−1, 0) = [0, −1; −1, −2],
Hf(0, −1) = [−2, −1; −1, 0],   Hf(−1/3, −1/3) = [−2/3, −1/3; −1/3, −2/3].
We can then look at the critical points:

• Since det(Hf (0, 0)) = −1 < 0, the point (0, 0, 0) is a saddle point.
• Since det(Hf (−1, 0)) = −1 < 0, the point (−1, 0, 0) is a saddle point.
• Since det(Hf (0, −1)) = −1 < 0, the point (0, −1, 0) is a saddle point.
• Since

∂²f/∂x²(−1/3, −1/3) = −2/3 < 0  and  det(Hf(−1/3, −1/3)) = (−2/3)² − (−1/3)² = 1/3 > 0,

the function attains a local maximum at (−1/3, −1/3).




In fact, the graph of f looks as follows:

[Figure: the graph z = f(x, y), with the three saddle points and the local maximum marked.]

We see that our local maximum is not a global maximum. △
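Corollary 2.2.4 is easy to automate for this example. The sketch below (the function name classify is our own) applies the two-variable test to the four critical points, using the second partials computed above:

```python
# Two-variable second derivative test (Corollary 2.2.4) applied to the
# critical points of f(x, y) = x^2 y + x y^2 + x y from Example 2.2.5.
# The second partials are f_xx = 2y, f_xy = 2x + 2y + 1, f_yy = 2x.

def classify(x, y):
    fxx, fxy, fyy = 2 * y, 2 * x + 2 * y + 1, 2 * x
    d = fxx * fyy - fxy**2        # det(Hf(x, y))
    if d < 0:
        return "saddle point"
    if d > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"

for p in [(0, 0), (-1, 0), (0, -1), (-1/3, -1/3)]:
    print(p, classify(*p))
# (0, 0), (-1, 0), (0, -1) are saddle points; (-1/3, -1/3) is a local maximum
```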

Exercises.
Exercises from [FRYb, §2.9] : Q4–Q29

2.3 Constrained extrema


We often want to maximize or minimize a function subject to some constraint. Geometrically,
this constraint means that we want to restrict the domain of our function.

Example 2.3.1. Consider the function



f : R² → R,  f(x, y) = √3 x − y.

We want to find the maximum or minimum value of this function, but not on its entire
domain R². Rather, we want to find the extreme values of f on the unit circle

S = {(x, y) : x² + y² = 1}.
One way to solve this problem would be to parameterize the circle, compose f with the
parameterization, and then find the extrema as usual. Since

S = {(cos t, sin t) : t ∈ [0, 2π]},

we can find the extrema of

h(t) := f(cos t, sin t) = √3 cos t − sin t.

This is now a single-variable calculus problem. We solve

0 = h′(t) = −√3 sin t − cos t  =⇒  tan t = sin t / cos t = −1/√3  =⇒  t = −π/6 (equivalently, t = 11π/6) or t = 5π/6.

At t = −π/6, we have

cos t = √3/2,  sin t = −1/2,  f(cos t, sin t) = 3/2 + 1/2 = 2.

At t = 5π/6, we have

cos t = −√3/2,  sin t = 1/2,  f(cos t, sin t) = −3/2 − 1/2 = −2.

Since the domain of h is [0, 2π], which is a closed and bounded interval (hence compact), we
know that h attains a maximum and minimum (by the extreme value theorem). Therefore,
the maximum value of f on the unit circle is 2, at the point (√3/2, −1/2), and the minimum
value of f on the unit circle is −2, at the point (−√3/2, 1/2). △

In theory, we can take a similar approach to finding extrema of functions restricted to
surfaces as well. In that case, our parameterization will involve two variables, and we would
reduce the problem to finding extrema of a two-variable function.
However, there are some limitations to the above approach:
• It is not always easy to find a parameterization of a constraint.
• This approach can sometimes get messy, especially if the parameterization is complicated.
• This approach is not as elegant as the Lagrange multiplier method we will develop in
the next section.
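Still, when a parameterization is available, it is easy to exploit, even numerically. As a sanity check of Example 2.3.1 (our own sketch, not part of the text's method), we can sample h(t) = √3 cos t − sin t on [0, 2π] and compare the sampled extremes with the exact values ±2:

```python
# Sample the composed function h(t) = f(cos t, sin t) on [0, 2*pi]
# and read off approximate extrema of f on the unit circle.
import math

def h(t):
    return math.sqrt(3) * math.cos(t) - math.sin(t)

ts = [2 * math.pi * k / 100000 for k in range(100001)]
values = [h(t) for t in ts]
print(max(values), min(values))  # approximately 2 and -2
```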

2.4 Lagrange multipliers


A good reference for the material in this section is [FRYc, §2.10].
Recall that if f : A → B is a function, and S ⊆ A, then

f |S : S → B, f |S (x) := f (x),

is the restriction of f to S. Finding extrema for f |S is the same as finding extrema for f
subject to the constraint that points lie in S. For example, a ∈ S is a local minimum for f|S
if there exists r > 0 such that

f (a) ≤ f (x) for all x ∈ Br (a) ∩ S.

Theorem 2.4.1 (Lagrange multipliers). Suppose U ⊆ Rn is open and f, g : U → R are C 1


functions. Furthermore, suppose k ∈ R and

S = {x ∈ U : g(x) = k}.

If x0 ∈ S is a local extremum for f |S , then ∇f (x0 ) and ∇g(x0 ) are linearly dependent, that
is, either

• ∇g(x0 ) = 0, or
• ∇f(x0) = λ∇g(x0) for some λ ∈ R (possibly λ = 0). The number λ is called a
Lagrange multiplier.

We call a point x0 where ∇f (x0 ) and ∇g(x0 ) are linearly dependent a critical point.

Proof. Let x0 ∈ S be a local extremum for f |S . Suppose we have a C 1 path in S passing


through x0 . More precisely, suppose we have a C 1 function

p : [−1, 1] → S, p(0) = x0 .

[Figure: a C¹ path p(t) in S passing through x0 = p(0), with tangent vector p′(0) and normal vector ∇g(x0).]

Then the function


f ◦ p : [−1, 1] → R

has a local extremum at 0. Since it is of class C¹, this implies that its derivative is zero.
Using the chain rule (Theorem 1.6.2), we have

0 = (f ◦ p)′(0) = ∇f(p(0)) · p′(0) = ∇f(x0) · p′(0)  =⇒  ∇f(x0) ⊥ p′(0).

Since this is true for all paths in S that pass through x0 at time 0, we see that ∇f (x0 ) is
orthogonal to all tangent vectors to S at x0 . Thus, ∇f (x0 ) is orthogonal to S at x0 .
But we already know, by Proposition 1.8.6, that ∇g(x0) is orthogonal to S at x0. Therefore,
∇f(x0) and ∇g(x0) are parallel vectors. In other words, they are linearly dependent.

Example 2.4.2. Let’s return to Example 2.3.1 and find the extrema using the method of
Lagrange multipliers. We define
g(x, y) = x² + y²,

and we want to find the extrema of

f : R² → R,  f(x, y) = √3 x − y,

subject to the constraint

g(x, y) = 1.

We compute

∇f = (√3, −1),  ∇g = (2x, 2y).

Since ∇g is only zero at (0, 0), which does not satisfy the constraint g(x, y) = 1, we want to
solve the vector equation ∇f(x) = λ∇g(x). Thus, we want to solve

(√3, −1) = λ(2x, 2y).

So we have three equations in three variables (don't forget the constraint!):

2λx = √3,  2λy = −1,  x² + y² = 1.

It follows from the first two equations that λ ≠ 0. Thus, we can solve these first two equations
for x and y, giving

(x, y) = (√3/(2λ), −1/(2λ)) = (1/λ)(√3/2, −1/2).   (2.1)

Note that this is the parametric equation of a line through the origin (with the origin removed,
since 1/λ can never be zero). We want to find the points where this line intersects the unit
circle.

To find these points of intersection, we substitute the expressions for x and y from (2.1) into
the constraint, to get

1 = (√3/(2λ))² + (−1/(2λ))²  =⇒  4λ² = 3 + 1  =⇒  λ² = 1  =⇒  λ = ±1.

Substituting these two values for λ into (2.1) yields the points

(x, y) = (√3/2, −1/2)  and  (x, y) = (−√3/2, 1/2).

These are the same points we found in Example 2.3.1, where we then concluded that the
maximum value of f on the unit circle is 2, at the point (√3/2, −1/2), and the minimum
value of f on the unit circle is −2, at the point (−√3/2, 1/2). △
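As a quick sanity check of Example 2.4.2 (our own sketch), one can verify directly that both points satisfy the constraint and the Lagrange equation ∇f = λ∇g with λ = ±1:

```python
# Verify the Lagrange multiplier solution of Example 2.4.2:
# f(x, y) = sqrt(3) x - y, subject to g(x, y) = x^2 + y^2 = 1.
import math

s3 = math.sqrt(3)
candidates = [((s3 / 2, -1 / 2), 1.0), ((-s3 / 2, 1 / 2), -1.0)]

for (x, y), lam in candidates:
    assert abs(x**2 + y**2 - 1) < 1e-12     # the constraint g(x, y) = 1
    assert abs(s3 - lam * 2 * x) < 1e-12    # first component of grad f = lam * grad g
    assert abs(-1 - lam * 2 * y) < 1e-12    # second component
print("both candidate points verified")
```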

Example 2.4.3. Let’s find the point on the sphere

S = {(x, y, z) : x2 + y 2 + z 2 = 6}

that is closest to the point (3, 1, −2).

[Figure: the sphere S and the external point (3, 1, −2).]

Instead of minimizing the distance, we will minimize the square of the distance from
(3, 1, −2), since this avoids some nasty square roots. (Since the square function t ↦ t²
is increasing on [0, ∞), the square of the distance attains its minimum precisely where the
distance attains its minimum.) So we want to minimize the function

f(x, y, z) = (x − 3)² + (y − 1)² + (z + 2)²

subject to the constraint

g(x, y, z) = x² + y² + z² = 6.

We have

∇f(x, y, z) = (2(x − 3), 2(y − 1), 2(z + 2)),  ∇g(x, y, z) = (2x, 2y, 2z).

So we need to solve the equations

2(x − 3) = 2λx  ⇐⇒  x = 3/(1 − λ)   (2.2)
2(y − 1) = 2λy  ⇐⇒  y = 1/(1 − λ)   (2.3)
2(z + 2) = 2λz  ⇐⇒  z = −2/(1 − λ)   (2.4)
x² + y² + z² = 6.   (2.5)

(Note that, in the first equation, λ = 1 yields the contradiction −6 = 0. So we can assume
λ ≠ 1, allowing us to divide by 1 − λ.) Substituting (2.2) to (2.4) into (2.5) gives

14/(1 − λ)² = 6  =⇒  6(1 − λ)² = 14  =⇒  3λ² − 6λ − 4 = 0  =⇒  λ = 1 ± √21/3,

where we used the quadratic formula in the last step. Substituting these values of λ into
(2.2) to (2.4) gives

(x, y, z) = (∓9/√21, ∓3/√21, ±6/√21).

The point (9/√21, 3/√21, −6/√21) is clearly closer to (3, 1, −2) than the point
(−9/√21, −3/√21, 6/√21) is. So the nearest point is (9/√21, 3/√21, −6/√21) and the
farthest point is (−9/√21, −3/√21, 6/√21). △
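A short numerical check (our own sketch) confirms that both critical points lie on the sphere and that the first is indeed the nearer one:

```python
# Sanity check for Example 2.4.3: the two critical points on the sphere
# x^2 + y^2 + z^2 = 6, and their distances to the point (3, 1, -2).
import math

s21 = math.sqrt(21)
p_near = (9 / s21, 3 / s21, -6 / s21)
p_far = (-9 / s21, -3 / s21, 6 / s21)
target = (3, 1, -2)

def dist2(p, q):
    # squared distance; as in the text, comparing squares suffices
    return sum((a - b) ** 2 for a, b in zip(p, q))

for p in (p_near, p_far):
    assert abs(sum(c * c for c in p) - 6) < 1e-12   # the point is on the sphere
assert dist2(p_near, target) < dist2(p_far, target)
print("nearest point verified")
```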

Exercises.
Exercises from [FRYb, §2.10] : Q1–Q9, Q12, Q13, Q15–25, Q27, Q28.

2.5 Multiple constraints


In Section 2.4, we learned how to use the method of Lagrange multipliers to maximize or
minimize a function subject to one constraint. However, sometimes we want to maximize or
minimize a function subject to several constraints. It turns out that the method of Lagrange
multipliers can be generalized to accommodate this scenario. For a reference discussing two
constraints, see [FRYc, §2.10.1].
Recall that vectors v1 , . . . , vk ∈ Rn are linearly dependent if there exist scalars λ1 , . . . , λk ∈
R, not all zero, such that
λ1 v1 + · · · + λk vk = 0.
Equivalently, the vectors are linearly dependent if at least one of them can be written as a
linear combination of the others.

Theorem 2.5.1 (Lagrange multipliers, several constraints). Suppose U ⊆ Rn is open,

f, g1 , . . . , gr : U → R

are C 1 functions, and k1 , . . . , kr ∈ R. Let

S = {x ∈ U : gi (x) = ki for all i = 1, . . . , r}.

If x0 ∈ S is a local extremum for f|S, then the vectors

∇f (x0 ), ∇g1 (x0 ), . . . , ∇gr (x0 )

are linearly dependent.

Example 2.5.2. Let’s find the extrema of

f : R³ → R,  f(x, y, z) = x − y,

subject to the constraints

g1(x, y, z) = x² + y² + z² = 11  and  g2(x, y, z) = x + y + z = 3.

One can show that the set

S = {(x, y, z) : g1(x, y, z) = 11 and g2(x, y, z) = 3}

is compact: it is contained in the sphere of radius √11, hence bounded, and it contains its
boundary. Thus, by the extreme value theorem (Theorem 2.1.7), f|S attains a maximum
and a minimum. So we can use the method of Lagrange multipliers to find these extrema.
We compute

∇f = (1, −1, 0), ∇g1 = (2x, 2y, 2z), ∇g2 = (1, 1, 1).

Since ∇f and ∇g2 are clearly linearly independent¹, these three vectors are linearly dependent
if and only if ∇g1 can be written as a linear combination of ∇f and ∇g2. So we have

∇g1 = λ1∇f + λ2∇g2
for some λ1 , λ2 ∈ R. Therefore, we want to solve the following equations:

2x = λ1 + λ2   (2.6)
2y = −λ1 + λ2   (2.7)
2z = λ2   (2.8)
x² + y² + z² = 11   (2.9)
x + y + z = 3   (2.10)
¹Recall that two vectors are linearly dependent if and only if one is a scalar multiple of the other.

Adding (2.6) and (2.7), then using (2.8) gives

2x + 2y = 2λ2 = 4z =⇒ x + y = 2z.

Substituting this into (2.10) gives

2z + z = 3 =⇒ z = 1.

Therefore
x + y = 2 =⇒ x = 2 − y.
Substituting into (2.9) then gives

(2 − y)² + y² + 1² = 11
=⇒ y² − 2y − 3 = 0
=⇒ (y − 3)(y + 1) = 0.

Thus y = 3 or y = −1, giving x = −1 or x = 3, respectively. Therefore, the extrema occur
at

(−1, 3, 1)  and  (3, −1, 1).

Since f(−1, 3, 1) = −1 − 3 = −4 and f(3, −1, 1) = 3 − (−1) = 4, we see that f|S attains its
minimum at (−1, 3, 1) and its maximum at (3, −1, 1). △
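A small check (our own sketch) that the two points found above satisfy both constraints, together with a direct comparison of the corresponding values of f:

```python
# Verify the candidate points of Example 2.5.2 against both constraints
# g1 = x^2 + y^2 + z^2 = 11 and g2 = x + y + z = 3, and evaluate f = x - y.

def g1(x, y, z): return x**2 + y**2 + z**2
def g2(x, y, z): return x + y + z
def f(x, y, z): return x - y

p_min, p_max = (-1, 3, 1), (3, -1, 1)
for p in (p_min, p_max):
    assert g1(*p) == 11 and g2(*p) == 3   # both constraints hold exactly

print(f(*p_min), f(*p_max))  # -4 and 4
```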

Exercises.
Exercises from [FRYb, §2.10] : Q10, Q11, Q14, Q26.

2.6 Finding global extrema


Let’s now combine all the things we have learned in this chapter. So far we have developed
techniques for finding local extrema. How do we find global extrema of a function f on
a set S? Since every global extremum is a local extremum, the two processes are related.
However, since not every local extremum is a global extremum, we still have some work to
do.
The first step is to determine whether or not the global extrema even exist! For this, one
of our most powerful tools is the extreme value theorem (Theorem 2.1.7). If S is compact
(recall that this means it bounded and contains its boundary) and f is continuous, the
extreme value theorem tells us that f does indeed attain a global maximum and minimum.
Thus, we only need to find the local extrema, and then see which of them is the largest (this
will be the global maximum) and which is the smallest (this will be the global minimum).
The situation is much more complicated if S is not compact. If S is not compact, then
we need to look at the limiting behaviour of f as ∥x∥ → ∞ or x → ∂S to decide if f is
even bounded and, if so, whether or not it has a global maximum and/or minimum. Doing

this is usually not so hard for functions of a single variable, but generally much harder for
multivariable functions.
In this course, we will focus mainly on the case where S is compact. Typically, our set S
will be a compact set of the form

S = {x : g(x) ≤ k}

for some C 1 function g : A → R, A ⊆ Rn , and some scalar k ∈ R. Then the boundary will
often (but not always!) be given by

∂S = {x : g(x) = k}.

We then split our search for the local extrema into two parts:

(a) We find the critical points in S \ ∂S. For this we use the first derivative test.
(b) We find the critical points in the boundary ∂S. For this we use the method of Lagrange
multipliers.

Example 2.6.1. Let’s find the global extrema (if they exist) of

f(x, y, z) = x² + y³ + z³

on the set

S = {(x, y, z) ∈ R³ : g(x, y, z) := x² + 3y² + 3z² ≤ 100}.
So S is a solid ellipsoid:

[Figure: the solid ellipsoid S.]

The set S is bounded and contains its boundary

∂S = {(x, y, z) : g(x, y, z) = 100}.

Therefore, by the extreme value theorem (Theorem 2.1.7), f attains a global minimum and
maximum on S.

First we look for critical points of f in the interior, S \ ∂S. So we solve

(0, 0, 0) = ∇f = (2x, 3y², 3z²)  =⇒  x = y = z = 0.

Don't forget to check that the points you find are actually in S! Here we have

0² + 3 · 0² + 3 · 0² = 0 ≤ 100,

and so (0, 0, 0) ∈ S.
Now we consider the boundary of S, which is the ellipsoid

∂S = {(x, y, z) ∈ R³ : g(x, y, z) = 100}.

We have

∇g = (2x, 6y, 6z).

Note that ∇g = (0, 0, 0) would imply that x = y = z = 0. Since (0, 0, 0) ∉ ∂S, we see that
∇g is never zero on ∂S. So ∇f and ∇g are linearly dependent if and only if ∇f = λ∇g for
some λ ∈ R. Therefore the Lagrange equations become:

2x = 2λx   (2.11)
3y² = 6λy   (2.12)
3z² = 6λz   (2.13)
x² + 3y² + 3z² = 100   (2.14)
From (2.11) to (2.13), we have

(λ − 1)x = 0  =⇒  x = 0 or λ = 1,
(y − 2λ)y = 0  =⇒  y = 0 or y = 2λ,
(z − 2λ)z = 0  =⇒  z = 0 or z = 2λ.
Considering all the above possibilities, together with the constraint (2.14), we have the
following critical points (including the one we found in the interior of S), together with the
corresponding value of the function f:

Critical point              f(x, y, z)
(0, 0, 0)                   0
(0, 0, ±10/√3)              ±1000/(3√3)
(0, ±10/√3, 0)              ±1000/(3√3)
±(0, 10/√6, 10/√6)          ±1000/(3√6)
(±10, 0, 0)                 100
(±2√19, 2, 2)               92
(±2√22, 0, 2)               96
(±2√22, 2, 0)               96

Note that 1000/(3√3) ≈ 192.45 and 1000/(3√6) ≈ 136.08. Thus, f attains its maximum value of
1000/(3√3) at the points (0, 0, 10/√3) and (0, 10/√3, 0), and it attains its minimum value of
−1000/(3√3) at the points (0, 0, −10/√3) and (0, −10/√3, 0). △
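The table can be double-checked numerically. The sketch below (our own check) verifies, for the "+" representative of each boundary critical point, that the constraint g(x, y, z) = 100 holds and that the listed value of f is correct:

```python
# Verify the critical-point table of Example 2.6.1 on the boundary
# g(x, y, z) = x^2 + 3y^2 + 3z^2 = 100, for f(x, y, z) = x^2 + y^3 + z^3.
import math

def f(x, y, z): return x**2 + y**3 + z**3
def g(x, y, z): return x**2 + 3 * y**2 + 3 * z**2

s3, s6 = math.sqrt(3), math.sqrt(6)
table = [
    ((0, 0, 10 / s3), 1000 / (3 * s3)),
    ((0, 10 / s3, 0), 1000 / (3 * s3)),
    ((0, 10 / s6, 10 / s6), 1000 / (3 * s6)),
    ((10, 0, 0), 100),
    ((2 * math.sqrt(19), 2, 2), 92),
    ((2 * math.sqrt(22), 0, 2), 96),
    ((2 * math.sqrt(22), 2, 0), 96),
]
for point, value in table:
    assert abs(g(*point) - 100) < 1e-9 and abs(f(*point) - value) < 1e-9
print("table verified")
```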

Example 2.6.2. Let’s find the global extrema, if they exist, of

f(x, y) = x² + 2y

on the set

S = {(x, y) ∈ R² : g(x, y) := 2x + y² ≤ 3}.
The set S looks as follows:

[Figure: the region S lying on and to the left of the parabola 2x + y² = 3.]
Its boundary is
∂S = {(x, y) : g(x, y) = 3}.
Note that S is not bounded, and so we cannot apply the extreme value theorem (Theorem 2.1.7)
to conclude that f has a global maximum or minimum on the set S.
Nevertheless, we begin by finding the critical points in the interior of S. Since

∇f = (2x, 2) ̸= (0, 0),

there are no such critical points.


Next, we use the method of Lagrange multipliers to find the critical points on the boundary.
Since ∇g = (2, 2y) is never zero, the Lagrange equations become:

2x = 2λ   (2.15)
2 = 2λy   (2.16)
2x + y² = 3   (2.17)

The equation (2.15) gives x = λ, while (2.16) implies that y = 1/λ. (Note that λ = 0
contradicts (2.16), so we can divide by λ.) Substituting into (2.17) gives

2λ + 1/λ² = 3  =⇒  2/y + y² = 3.
(Note that y = 0 would contradict (2.16), and so we have y ≠ 0.) Multiplying by y, we have

y³ − 3y + 2 = 0.

Since y = 1 is clearly a solution, we can use polynomial division to find the other factors:

y³ − 3y + 2 = (y − 1)(y² + y − 2) = (y − 1)²(y + 2).

Thus, y = 1 or y = −2. When y = 1, we have x = λ = 1/y = 1. When y = −2, we have
x = −1/2. Thus, the critical points on ∂S are

(1, 1)  and  (−1/2, −2).


Now,

f(1, 1) = 1² + 2 · 1 = 3  and  f(−1/2, −2) = (−1/2)² + 2(−2) = −15/4.

Are the points we found global extrema? Note that (x, 0) ∈ S for arbitrarily large
negative x. Since f(x, 0) = x², this means that f takes arbitrarily large values on S. So f
has no global maximum. On the other hand, f(x, y) → ∞ as ∥(x, y)∥ → ∞ with (x, y) ∈ S,
so f does attain a global minimum, and that minimum must occur at one of the critical
points we found. Hence f attains a minimum value of −15/4 at the point (−1/2, −2). △

Exercises.
2.6.1. Find the absolute maximum and minimum values of f(x, y) = x² + 3xy + y² + 7 on
D = {(x, y) : x² + y² ≤ 1}.

2.6.2. Find and classify the extreme values (if any) of the following functions defined on R²:

(a) x² − y⁵
(b) (x + 1)² + (x − y)²
(c) x² + xy² + y⁴

2.6.3. Find the maximum and minimum values of f(x, y) = xy − y + x − 1 subject to the
condition x² + y² ≤ 2.
Chapter 3

Double and triple integrals


In previous calculus courses, you learned about single-variable integrals, such as ∫_a^b f(x) dx.
In this chapter we will explore integration in two and three dimensions. We begin in
Sections 3.1 and 3.2 with an informal treatment, discussing how one computes double integrals
in practice (via iterated integrals). We then give a more precise definition of the double
integral in Sections 3.3 to 3.5. Finally, in Section 3.7, we discuss integration in three dimensions.
A good reference for the material in this chapter is [FRYc, §3.1–3.4].

3.1 Vertical slices


Suppose we want to compute the mass of a two-dimensional object occupying the region
D in the xy-plane. We first need to know the density of the object, which may depend on
position. Let f (x, y) denote the density at the point (x, y). For simplicity, let’s assume that
D is the region between the bottom curve y = B(x) and the top curve y = T (x), with x
running from a to b. Thus,

D = {(x, y) : a ≤ x ≤ b, B(x) ≤ y ≤ T (x)}.


[Figure: the region D between the bottom curve y = B(x) and the top curve y = T(x), for a ≤ x ≤ b.]

Now, if we wanted to compute the area of D (instead of its mass), we learned in single-variable
calculus that we should approximate the region by rectangles, which we call slices. We do
this by splitting the interval [a, b] into n subintervals and choosing a point x_i^* in the
i-th interval. Then, for each interval, we draw a rectangle over that interval, with top edge
at height T(x_i^*) and bottom edge at height B(x_i^*).

[Figure: the region approximated by rectangular slices; the i-th slice runs from (x_i^*, B(x_i^*)) to (x_i^*, T(x_i^*)).]

The sum of the areas of the slices (called a Riemann sum) is

Σ_{i=1}^n (T(x_i^*) − B(x_i^*)) Δx,  where Δx = (b − a)/n.

We then take the limit as n → ∞. If this limit exists, it is the area between the two curves:

Area = lim_{n→∞} Σ_{i=1}^n (T(x_i^*) − B(x_i^*)) Δx = ∫_a^b (T(x) − B(x)) dx.
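This limit can be watched numerically. A minimal sketch (the curves T(x) = x and B(x) = x² on [0, 1] are our own illustrative choice; the exact area between them is 1/6), using the midpoint of each subinterval as x_i^*:

```python
# Riemann-sum approximation of the area between y = B(x) and y = T(x),
# choosing x_i^* to be the midpoint of the i-th subinterval.

def riemann_area(T, B, a, b, n):
    dx = (b - a) / n
    return sum((T(a + (i + 0.5) * dx) - B(a + (i + 0.5) * dx)) * dx
               for i in range(n))

approx = riemann_area(lambda x: x, lambda x: x**2, 0.0, 1.0, 10000)
print(approx)  # close to the exact area 1/6
```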

We need to modify this procedure in order to calculate the mass of D, instead of its area.
To do this, we need to replace our approximation (T(x_i^*) − B(x_i^*))Δx of the area of the i-th
slice of D by an approximation of the mass of this slice. To do this, we subdivide the slice
into m rectangles of height

Δy = (T(x_i^*) − B(x_i^*)) / m.

Let

y_j = B(x_i^*) + jΔy

be the y-coordinate of the top of the j-th rectangle. For each j = 1, . . . , m, we choose a
number y_j^* between y_{j−1} and y_j. We then approximate the density on rectangle j in slice i

by the constant f(x_i^*, y_j^*).

[Figure: rectangle j in slice i, spanning heights y_{j−1} to y_j, with sample point (x_i^*, y_j^*).]

The mass of rectangle j in slice i is approximately

f(x_i^*, y_j^*) Δx Δy.

Taking the limit as m → ∞ (that is, the limit as the height of the rectangles goes to zero),
the Riemann sum becomes an integral:

Mass of slice i ≈ Δx ∫_{B(x_i^*)}^{T(x_i^*)} f(x_i^*, y) dy = F(x_i^*) Δx,

where

F(x) = ∫_{B(x)}^{T(x)} f(x, y) dy.

Notice that we started with a function f (x, y), and we have “integrated out” the variable y,
leaving a function F (x) of one variable.
Now we take the limit as n → ∞ (that is, the limit as the width of the slices goes to
zero), giving

Mass = lim_{n→∞} Σ_{i=1}^n Δx ∫_{B(x_i^*)}^{T(x_i^*)} f(x_i^*, y) dy = lim_{n→∞} Σ_{i=1}^n F(x_i^*) Δx.

Now, the sum

Σ_{i=1}^n F(x_i^*) Δx

is a Riemann sum approximation to the integral ∫_a^b F(x) dx. Thus,

Mass = ∫_a^b F(x) dx = ∫_a^b ( ∫_{B(x)}^{T(x)} f(x, y) dy ) dx.

Notation 3.1.1 (Double integral).

∫∫_D f(x, y) dx dy = ∫_a^b ( ∫_{B(x)}^{T(x)} f(x, y) dy ) dx
= ∫_a^b ∫_{B(x)}^{T(x)} f(x, y) dy dx = ∫_a^b dx ∫_{B(x)}^{T(x)} f(x, y) dy.

The last three integrals are called iterated integrals. Note the placement of dx and dy in
these iterated integrals. The integral signs (and their limits of integration) match up with
the dx or dy in the same way that you match parentheses in a mathematical expression.

Example 3.1.2. Let D be the region above the curve x² = 9y and to the right of the curve
3y² = x. Let's compute

∫∫_D xy dx dy.

The curves intersect at points (x, y) that satisfy both

x² = 9y  and  3y² = x.

Thus, we have

y = x²/9 = (3y²)²/9 = y⁴  =⇒  y⁴ − y = 0  =⇒  y(y³ − 1) = 0  =⇒  y = 0 or y = 1.
Therefore, the curves intersect at (0, 0) and (3, 1). So the region D looks as follows:

[Figure: the region D between the curves x² = 9y and 3y² = x, running from (0, 0) to (3, 1).]

For 0 ≤ x ≤ 3, the bottom boundary of the region D is given by y = x²/9 and the top
boundary is given by y = √(x/3). Thus, we have

∫∫_D xy dx dy = ∫_0^3 ( ∫_{x²/9}^{√(x/3)} xy dy ) dx
= ∫_0^3 [ (1/2)xy² ]_{y=x²/9}^{y=√(x/3)} dx
= ∫_0^3 ( x²/6 − x⁵/162 ) dx
= [ x³/18 − x⁶/(6 · 162) ]_0^3
= 3/2 − 3/4 = 3/4.
Note that we first integrated with respect to y, treating x as a constant. Then we integrated
with respect to x. △
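The value 3/4 can be confirmed with a crude numerical approximation of this iterated integral (our own sketch: a double midpoint Riemann sum over the same region):

```python
# Double midpoint Riemann sum for the integral of xy over the region D
# of Example 3.1.2: 0 <= x <= 3, x^2/9 <= y <= sqrt(x/3).
import math

def double_integral(n):
    total, dx = 0.0, 3.0 / n
    for i in range(n):
        x = (i + 0.5) * dx                 # midpoint in x
        lo, hi = x**2 / 9, math.sqrt(x / 3)
        dy = (hi - lo) / n
        for j in range(n):
            y = lo + (j + 0.5) * dy        # midpoint in y
            total += x * y * dx * dy
    return total

print(double_integral(400))  # close to 3/4
```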

Example 3.1.3. If f(x, y) = 1 for all (x, y) ∈ D, then we have

∫∫_D f(x, y) dx dy = ∫_a^b ∫_{B(x)}^{T(x)} dy dx = ∫_a^b [y]_{B(x)}^{T(x)} dx = ∫_a^b (T(x) − B(x)) dx.
D

This is precisely the formula for the area between two curves that you learned in single-
variable calculus! △

Exercises.
Exercises are deferred until Section 3.6.

3.2 Horizontal slices


In Section 3.1, to integrate over a region D, we approximated the region by vertical slices.
As you may expect, we can also approximate the region by horizontal slices. It is quite useful
to have the flexibility to use either method. For instance, consider the following region:

[Figure: a region D bounded on the left by the parabola 3y² = x and on the right by the line x = 2y + 1.]

Using the vertical slice method, we would have to split the region D into two pieces. For
0 ≤ x ≤ 1/3, the region is bounded above by the curve y = √(x/3) and below by the curve
y = −√(x/3). Then, for 1/3 ≤ x ≤ 3, the region is bounded above by y = √(x/3) and below by
y = (x − 1)/2. On the other hand, the region D is more simply described as the region to the
right of the curve 3y² = x and to the left of the line x = 2y + 1.
Let’s consider the general setup. Suppose we want to integrate the function f (x, y) over
the region

D = {(x, y) : c ≤ y ≤ d, L(y) ≤ x ≤ R(y)}.


[Figure: the region D between the left boundary x = L(y) and the right boundary x = R(y), for c ≤ y ≤ d.]

To integrate using horizontal slices, we follow the procedure of Section 3.1, but we draw our
slices horizontally and reverse the roles of x and y everywhere.

[Figure: the same region cut into horizontal slices; the slice at height y_i^* is subdivided into rectangles with sample points x_j^*.]

Interchanging x and y in our discussion from Section 3.1 then tells us that, in order to
integrate the function f(x, y) over the region D, we compute the double integral

∫_c^d ( ∫_{L(y)}^{R(y)} f(x, y) dx ) dy.

The following notation should be compared to Notation 3.1.1.



Notation 3.2.1 (Double integral).

∫∫_D f(x, y) dx dy = ∫_c^d ( ∫_{L(y)}^{R(y)} f(x, y) dx ) dy
= ∫_c^d ∫_{L(y)}^{R(y)} f(x, y) dx dy = ∫_c^d dy ∫_{L(y)}^{R(y)} f(x, y) dx.

Example 3.2.2. Let’s compute the integral from Example 3.1.2, but using horizontal slices
instead of vertical slices. So we want to compute
∫∫_D xy dx dy,

where D is the following region:

[Figure: the region D between the curves x² = 9y and 3y² = x.]

In order to use the method of horizontal slices, we describe the region as

D = {(x, y) : 0 ≤ y ≤ 1, 3y² ≤ x ≤ 3√y}.

Thus we have

∫∫_D xy dx dy = ∫_0^1 ( ∫_{3y²}^{3√y} xy dx ) dy
= ∫_0^1 [ (1/2)x²y ]_{x=3y²}^{x=3√y} dy
= ∫_0^1 ( 9y²/2 − 9y⁵/2 ) dy
= [ 3y³/2 − 3y⁶/4 ]_0^1
= 3/2 − 3/4 = 3/4.
Note that we got the same answer as we did in Example 3.1.2. We will see why this happened
in Theorem 3.4.1. △

Exercises.
Exercises are deferred until Section 3.6.

3.3 Double integrals over a rectangle


In Sections 3.1 and 3.2 we developed two ways to compute a double integral over a region in
the plane. In Example 3.1.2 and Example 3.2.2 we computed the same integral in both ways
and got the same answer. Of course, this is what we want to happen. But we haven’t yet
explained why this should be the case. In this section, we give an alternative definition of
the double integral, and then give some conditions that ensure that this definition matches
the calculation via vertical and horizontal slices.
Consider a closed rectangle

R = [a, b] × [c, d] ⊆ R².

A regular partition of R of order n is the collection of points {xj }nj=0 and {yj }nj=0 satisfying

a = x0 < x1 < · · · < xn = b, c = y0 < y1 < · · · < yn = d,

where
x_{j+1} − x_j = (b − a)/n,    y_{k+1} − y_k = (d − c)/n
for 0 ≤ j, k ≤ n − 1. Using these equally spaced points, we subdivide the rectangle R:
[Figure: the rectangle R = [a, b] × [c, d] subdivided by a regular partition of order 4, with a = x₀ < x₁ < x₂ < x₃ < x₄ = b and c = y₀ < y₁ < y₂ < y₃ < y₄ = d.]

Let
Rjk = [xj , xj+1 ] × [yk , yk+1 ]
and let cjk be any point in Rjk . Suppose f : R → R is bounded. For example, since R is
compact, it follows from Lemma 2.1.9 that f is bounded if f is continuous. Form the sum
S_n = Σ_{j,k=0}^{n−1} f(c_jk) Δx Δy = Σ_{j,k=0}^{n−1} f(c_jk) ΔA,    (3.1)

where
Δx = (b − a)/n,    Δy = (d − c)/n,    ΔA = Δx Δy.
Thus ∆A is the area of each of the subrectangles into which we have divided R. The sum
(3.1) is called a Riemann sum for f .
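To make the sums S_n concrete, here is a small numerical sketch (my addition, plain Python) that forms a Riemann sum by sampling f at the lower-left corner of each subrectangle:

```python
def riemann_sum(f, a, b, c, d, n):
    """Riemann sum S_n for f over [a, b] x [c, d], regular partition of
    order n, sampling each subrectangle at its lower-left corner."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for j in range(n):
        for k in range(n):
            total += f(a + j * dx, c + k * dy) * dx * dy
    return total

# For f(x, y) = x*y on [0, 1] x [0, 1] the double integral is exactly 1/4.
approx = riemann_sum(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0, 200)
print(approx)  # close to 0.25
```

Increasing n makes the sum approach the exact value, illustrating the limit in the definition below.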
We can now give the precise definition of the double integral.
Definition 3.3.1 (Double integral). If the sequence (S_n)_{n=1}^∞ converges to a limit S as n → ∞
and if the limit S is the same for any choice of points cjk in the rectangle Rjk , then we say
that f is integrable over R, and we write
∬_R f(x, y) dA,    ∬_R f(x, y) dx dy,    or    ∬_R f dx dy

for the limit S.


Note that some of the notation in Definition 3.3.1 is the same as Notation 3.1.1
and 3.2.1. We should justify why we can use the same notation. It is also very useful to
have a class of functions which we know are integrable.
Theorem 3.3.2. Any continuous function defined on a closed rectangle is integrable.
In fact, a more general result is true. Consider a function f : R → R whose graph looks
something like the following:

[Figure: the graph z = f(x, y) of a function with a jump discontinuity along a curve in R.]

This function is continuous everywhere except along a continuous curve (in the domain R). It
turns out that this type of function is also integrable.
Theorem 3.3.3. Suppose R is a closed rectangle and f : R → R is bounded. Furthermore,
suppose that the set of points where f is discontinuous lies on a finite union of graphs of
continuous functions. Then f is integrable over R.
Proof. We will not prove this theorem in this course. The basic idea behind the proof is the
same as that for single-variable functions, where a function with only finitely many disconti-
nuities is integrable. In the limit defining the integral, the terms involving the discontinuities
become zero.

Theorem 3.3.3 will be key in handling integrals over more general regions, and relating
Definition 3.3.1 to the vertical and horizontal slice methods of Sections 3.1 and 3.2. Before we
do this, we state some properties of integrals that follow from Definition 3.3.1 and properties
of limits.

Proposition 3.3.4 (Arithmetic of integrals). Suppose f and g are integrable functions on


a rectangle R, and let c ∈ R. Then f + g and cf are integrable, and we have the following.

(a) Linearity:
∬_R ( f(x, y) + g(x, y) ) dA = ∬_R f(x, y) dA + ∬_R g(x, y) dA .

(b) Homogeneity:

∬_R c f(x, y) dA = c ∬_R f(x, y) dA .

(c) Monotonicity: If f (x, y) ≥ g(x, y) for all (x, y) ∈ R, then


∬_R f(x, y) dA ≥ ∬_R g(x, y) dA .

Corollary 3.3.5. If f is integrable on a rectangle R, then so is |f |, and we have

| ∬_R f dA | ≤ ∬_R |f| dA .    (3.2)

Proof. Since
−|f (x, y)| ≤ f (x, y) ≤ |f (x, y)| for all (x, y) ∈ R,
we can use monotonicity and homogeneity of integration (with c = −1) to conclude that
− ∬_R |f| dA ≤ ∬_R f dA ≤ ∬_R |f| dA,

which is equivalent to (3.2).

Exercises.
Exercises are deferred until Section 3.6.
Fubini’s theorem 73

3.4 Fubini’s theorem


The following theorem states that, for continuous functions, one can compute the integral
over a rectangle by the vertical slice method from Section 3.1 or the horizontal slice method
from Section 3.2. In particular, the vertical and horizontal slice methods give the same
answer, explaining the phenomenon we saw in Example 3.1.2 and Example 3.2.2.
Theorem 3.4.1 (Fubini’s theorem, continuous case). Suppose R = [a, b] × [c, d] and f : R →
R is continuous. Then
∬_R f(x, y) dA = ∫_a^b ∫_c^d f(x, y) dy dx = ∫_c^d ∫_a^b f(x, y) dx dy .

Proof. For a proof of this theorem, see [MT11, Th. 5.2.3].

Example 3.4.2. Let R = [0, 3] × [e, 2e], and let


f : R → R,    f(x, y) = x/y.
By Fubini’s theorem (Theorem 3.4.1), we have
∬_R f dA = ∫_e^{2e} ∫_0^3 (x/y) dx dy
         = ∫_e^{2e} [ x²/(2y) ]_{x=0}^3 dy
         = ∫_e^{2e} 9/(2y) dy
         = (9/2) [ ln y ]_{y=e}^{2e}
         = (9/2)( ln(2e) − ln e )
         = (9/2)( ln 2 + ln e − ln e )
         = (9/2) ln 2.
Alternatively, we could first integrate with respect to y, giving
∬_R f dA = ∫_0^3 ∫_e^{2e} (x/y) dy dx
         = ∫_0^3 [ x ln y ]_{y=e}^{2e} dx
         = ∫_0^3 x ln 2 dx
         = [ (x²/2) ln 2 ]_{x=0}^3
         = (9/2) ln 2.
As expected, we obtain the same answer. △
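Both orders of integration can also be checked symbolically. The following sketch (my addition, assuming SymPy is available) computes the two iterated integrals from this example:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x / y

# Integrate over x first, then y.
I1 = sp.integrate(f, (x, 0, 3), (y, sp.E, 2 * sp.E))
# Integrate over y first, then x.
I2 = sp.integrate(f, (y, sp.E, 2 * sp.E), (x, 0, 3))

print(sp.simplify(I1 - I2), float(I1))  # prints 0 and approximately 3.1192 (= (9/2) ln 2)
```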

As we saw in Theorem 3.3.3, a function that is discontinuous only on a finite union of


graphs of continuous functions is still integrable. The following theorem generalizes Theo-
rem 3.4.1 to this setting.
Theorem 3.4.3 (Fubini’s theorem, general version). Suppose R = [a, b] × [c, d] and f : R →
R is a bounded function whose discontinuities lie on a finite union of graphs of continuous
functions.
(a) If the integral ∫_c^d f(x, y) dy exists for all x ∈ [a, b], then

∫_a^b ( ∫_c^d f(x, y) dy ) dx

exists and

∫_a^b ∫_c^d f(x, y) dy dx = ∬_R f(x, y) dA .

(b) If the integral ∫_a^b f(x, y) dx exists for all y ∈ [c, d], then

∫_c^d ∫_a^b f(x, y) dx dy = ∬_R f(x, y) dA .

Therefore, if all the above conditions hold, we have

∫_a^b ∫_c^d f(x, y) dy dx = ∫_c^d ∫_a^b f(x, y) dx dy = ∬_R f(x, y) dA .

We now give an example of a “problematic function”.

Example 3.4.4. Let R = [0, 1] × [0, 1] and


f : R → R,    f(x, y) = 1 if x and y are both rational numbers, and f(x, y) = 0 otherwise.

Then the integral ∬_R f dA does not exist. This is because, if we choose the points c_jk in
Definition 3.3.1 to have rational coordinates, the Riemann sums are all equal to 1, whereas
if we choose them to have irrational coordinates, the Riemann sums are all equal to 0. △
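The dependence on the sample points can be seen concretely. The following sketch (my addition, using SymPy for exact arithmetic; the sample-point choices are illustrative) forms the two kinds of Riemann sums described above on a regular partition of order 4:

```python
import sympy as sp

def riemann_sum(sample, n):
    """Riemann sum over [0,1]x[0,1] for f = 1 at rational points, 0 otherwise."""
    dA = sp.Rational(1, n**2)  # area of each subsquare
    total = sp.Integer(0)
    for j in range(n):
        for k in range(n):
            cx, cy = sample(j, k, n)
            if cx.is_rational and cy.is_rational:
                total += dA
    return total

n = 4
# Rational sample points: the lower-left corner (j/n, k/n) of each subsquare.
S_rat = riemann_sum(lambda j, k, n: (sp.Rational(j, n), sp.Rational(k, n)), n)
# Irrational sample points: shift each corner by sqrt(2)/(2n), which stays
# inside the subsquare since sqrt(2)/2 < 1.
shift = sp.sqrt(2) / (2 * n)
S_irr = riemann_sum(
    lambda j, k, n: (sp.Rational(j, n) + shift, sp.Rational(k, n) + shift), n)
print(S_rat, S_irr)  # 1 0
```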

Remark 3.4.5. If you take more advanced courses in analysis, you will learn about the
Lebesgue integral which does exist for the function f of Example 3.4.4, and is equal to
0, essentially because there are way more irrational numbers than rational numbers.

Exercises.
Exercises from [FRYb, §3.1] : Q1, Q2, Q3a
Further exercises are deferred until Section 3.6.

3.5 Double integrals over more general regions


Fubini’s theorem (Theorem 3.4.3) told us that the integral over a rectangle (in the sense of
Definition 3.3.1) is equal to the iterated integrals corresponding to the vertical and horizontal
slice methods. However, it is stated only for rectangular regions. We’d like to be able to
work with more general regions in the plane. Fortunately, there is an easy “trick” allowing
us to do this.
Suppose D is a bounded region in R2 , and we want to integrate a function f : D → R
over D.

Since D is bounded, we can enclose it in a rectangle R = [a, b] × [c, d].


[Figure: the bounded region D enclosed in a rectangle R = [a, b] × [c, d].]

Our “trick” is to extend the function f to R by defining it to be zero outside of D. Let us


make this more precise.
We say a set D ⊆ R2 is y-simple if there are two continuous functions

B, T : [a, b] → R

such that

D = {(x, y) : a ≤ x ≤ b, B(x) ≤ y ≤ T (x)}.


[Figure: a y-simple region, bounded below by y = B(x) and above by y = T(x), for a ≤ x ≤ b.]

We considered these types of regions in Section 3.1.


We say a set D ⊆ R2 is x-simple if there are continuous functions

L, R : [c, d] → R

such that

D = {(x, y) : c ≤ y ≤ d, L(y) ≤ x ≤ R(y)}.


[Figure: an x-simple region, bounded on the left by x = L(y) and on the right by x = R(y), for c ≤ y ≤ d.]
These are the types of regions we considered in Section 3.2.


We say a set D ⊆ R² is simple if it is both x-simple and y-simple. For instance, the unit disk

D = {(x, y) : x² + y² ≤ 1}

is simple,

since we have
D = {(x, y) : −1 ≤ x ≤ 1, −√(1 − x²) ≤ y ≤ √(1 − x²)}

[Figure: the unit disk with top boundary y = √(1 − x²) and bottom boundary y = −√(1 − x²).]

and
D = {(x, y) : −1 ≤ y ≤ 1, −√(1 − y²) ≤ x ≤ √(1 − y²)}.

[Figure: the unit disk with right boundary x = √(1 − y²) and left boundary x = −√(1 − y²).]

A set D ⊆ R2 is an elementary region if it is either x-simple or y-simple (or both).


We can now give the precise definition of the integral over a more general region in the
plane.
Definition 3.5.1 (Double integral over an elementary region). Suppose D is an elementary
region in the plane and R is a rectangle containing D. Furthermore, suppose f : D → R is
continuous. Define
f* : R → R,    f*(x, y) = f(x, y) if (x, y) ∈ D, and f*(x, y) = 0 otherwise.

Then f ∗ is integrable over R by Theorem 3.3.3. We define the integral of f over D to be


∬_D f(x, y) dA = ∬_R f*(x, y) dA .

Definition 3.5.1 is related to the vertical and horizontal slice methods of Sections 3.1
and 3.2 by the following result.

Theorem 3.5.2 (Reduction to iterated integrals). Suppose f : D → R is a continuous


function.

(a) If
D = {(x, y) : a ≤ x ≤ b, B(x) ≤ y ≤ T (x)}
is a y-simple region, then
∬_D f(x, y) dA = ∫_a^b ∫_{B(x)}^{T(x)} f(x, y) dy dx .

(b) If
D = {(x, y) : c ≤ y ≤ d, L(y) ≤ x ≤ R(y)}
is an x-simple region, then
∬_D f(x, y) dA = ∫_c^d ∫_{L(y)}^{R(y)} f(x, y) dx dy .

We can now add one more important property of integrals to those listed in Proposi-
tion 3.3.4. Recall that two sets X and Y are disjoint if X ∩ Y = ∅.

Proposition 3.5.3 (Additivity of the integral). If D1 , D2 ⊆ R2 are disjoint elementary


regions and f : D1 ∪ D2 → R is a continuous function, then
∬_{D₁∪D₂} f(x, y) dA = ∬_{D₁} f(x, y) dA + ∬_{D₂} f(x, y) dA .

Exercises.
Exercises from [FRYb, §3.1] : Q3
Further exercises are deferred until Section 3.6.

3.6 Examples and applications of double integrals


Example 3.6.1. Let

D = {(x, y) : 0 ≤ x ≤ 1, x ≤ y ≤ x + 1}

[Figure: the region D between the lines y = x and y = x + 1, for 0 ≤ x ≤ 1.]

and
f : D → R,    f(x, y) = 2y − (x + 1)².

Then

∬_D f dA = ∫_0^1 ∫_x^{x+1} ( 2y − (x + 1)² ) dy dx
         = ∫_0^1 [ y² − (x + 1)² y ]_{y=x}^{x+1} dx
         = ∫_0^1 ( (x + 1)² − (x + 1)³ − x² + x(x + 1)² ) dx
         = ∫_0^1 ( (x + 1)²(1 − x − 1 + x) − x² ) dx
         = − ∫_0^1 x² dx
         = − [ x³/3 ]_0^1
         = − 1/3.

Note that D is also an x-simple region. So we could compute the integral by first integrating
with respect to x. But this would involve splitting the region into two pieces. △
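The computation above can be double-checked with a computer algebra system; here is a sketch (my addition, assuming SymPy is available):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Vertical slices: for each x in [0, 1], y runs from x to x + 1.
I = sp.integrate(2 * y - (x + 1)**2, (y, x, x + 1), (x, 0, 1))
print(I)  # -1/3
```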

One important use of the double integral is computing volumes. You learned in single-
variable calculus how to compute the volume of certain types of solids (e.g. solids of rev-
olution). Using the double integral, we can compute more general volumes. In particular,
suppose D ⊆ R2 is an elementary region, and suppose we have two functions

ϕ1 , ϕ2 : D → R, ϕ1 (x, y) ≤ ϕ2 (x, y) for all (x, y) ∈ D.

Consider the solid

S = {(x, y, z) : (x, y) ∈ D, ϕ1 (x, y) ≤ z ≤ ϕ2 (x, y)}



[Figure: the solid S over the region D, bounded below by the graph z = ϕ₁(x, y) and above by the graph z = ϕ₂(x, y).]

where we have drawn the graphs of the functions ϕ1 , ϕ2 giving the top and bottom of the
solid. The volume of S is given by
Vol S := ∬_D ( ϕ₂(x, y) − ϕ₁(x, y) ) dA .    (3.3)

Example 3.6.2. Let’s compute the volume of the solid

S = {(x, y, z) : 0 ≤ z ≤ 1 − x2 − y 2 }.

[Figure: the solid S bounded below by z = 0 and above by the paraboloid z = 1 − x² − y².]

First we note that

S = {(x, y, z) : (x, y) ∈ D, 0 ≤ z ≤ 1 − x2 − y 2 },

where
D = {(x, y) : x2 + y 2 ≤ 1}
is the unit disk. The region D is simple, so we can choose to describe it as an x-simple region
or as a y-simple region. Let’s describe it as an x-simple region:
D = {(x, y) : −1 ≤ y ≤ 1, −√(1 − y²) ≤ x ≤ √(1 − y²)}.

Then we have

Vol S = ∬_D (1 − x² − y²) dA
      = ∫_{−1}^1 ∫_{−√(1−y²)}^{√(1−y²)} (1 − x² − y²) dx dy
      = ∫_{−1}^1 [ (1 − y²)x − x³/3 ]_{x=−√(1−y²)}^{√(1−y²)} dy
      = 2 ∫_{−1}^1 ( (1 − y²)√(1 − y²) − (1/3)(1 − y²)^{3/2} ) dy
      = (4/3) ∫_{−1}^1 (1 − y²)^{3/2} dy .

We now make the substitution

y = sin t,    t ∈ [−π/2, π/2],

so that

dy = cos(t) dt    and    √(1 − y²) = √(1 − sin²(t)) = cos t.

Thus

Vol S = (4/3) ∫_{−π/2}^{π/2} cos⁴(t) dt .

Now, recall the double-angle identity

cos(2t) = 2 cos²(t) − 1    ⟹    cos²(t) = ( cos(2t) + 1 )/2.

So we have

cos⁴(t) = ( ( cos(2t) + 1 )/2 )² = ( cos²(2t) + 2 cos(2t) + 1 )/4
        = ( cos(4t) + 1 )/8 + cos(2t)/2 + 1/4 = cos(4t)/8 + cos(2t)/2 + 3/8.

Therefore,

Vol S = (4/3) ∫_{−π/2}^{π/2} ( cos(4t)/8 + cos(2t)/2 + 3/8 ) dt
      = (4/3) [ sin(4t)/(4·8) + sin(2t)/(2·2) + 3t/8 ]_{t=−π/2}^{π/2}
      = (4/3) · (3/8)( π/2 + π/2 ) = π/2. △
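The volume can also be verified symbolically over the x-simple description of the disk; here is a sketch (my addition, assuming SymPy is available):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Volume between z = 0 and z = 1 - x^2 - y^2 over the unit disk.
V = sp.integrate(1 - x**2 - y**2,
                 (x, -sp.sqrt(1 - y**2), sp.sqrt(1 - y**2)),
                 (y, -1, 1))
print(V)  # pi/2
```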

It is sometimes useful to change the order of integration in an iterated integral. In some


cases, such a change makes the integral much easier to compute.

Example 3.6.3 ([MT11, Example 5.4.2]). Let’s evaluate the integral


∫_1^2 ∫_0^{ln x} (x − 1)√(1 + e^{2y}) dy dx .    (3.4)

This iterated integral is quite hard to compute as it is. However, it will become easier once
we change the order of integration.
We first note that the above integral is equal to
∬_D (x − 1)√(1 + e^{2y}) dA,

where

D = {(x, y) : 1 ≤ x ≤ 2, 0 ≤ y ≤ ln x}.
[Figure: the region D below the curve y = ln x and above y = 0, for 1 ≤ x ≤ 2; the region reaches height ln 2.]

The region D is simple, and is also given by

D = {(x, y) : 0 ≤ y ≤ ln 2, e^y ≤ x ≤ 2}.

Thus, the integral (3.4) is equal to

∫_0^{ln 2} ∫_{e^y}^2 (x − 1)√(1 + e^{2y}) dx dy = ∫_0^{ln 2} √(1 + e^{2y}) ( ∫_{e^y}^2 (x − 1) dx ) dy
    = ∫_0^{ln 2} √(1 + e^{2y}) [ x²/2 − x ]_{x=e^y}^2 dy
    = − ∫_0^{ln 2} ( e^{2y}/2 − e^y ) √(1 + e^{2y}) dy
    = − (1/2) ∫_0^{ln 2} e^{2y} √(1 + e^{2y}) dy + ∫_0^{ln 2} e^y √(1 + e^{2y}) dy .    (3.5)

In the first integral in (3.5), we substitute

u = e2y , du = 2e2y dy,

and in the second integral we substitute

v = ey , dv = ey dy .

This gives

− (1/4) ∫_1^4 √(1 + u) du + ∫_1^2 √(1 + v²) dv .

Both of these integrals can be computed using methods from single-variable calculus. (For
instance, the antiderivatives are included in standard lists.) For the first integral, we have
∫_1^4 √(1 + u) du = [ (2/3)(1 + u)^{3/2} ]_1^4 = (2/3)( 5^{3/2} − 2^{3/2} ).

The second integral is

∫_1^2 √(1 + v²) dv = (1/2) [ v√(1 + v²) + ln( √(1 + v²) + v ) ]_1^2
    = (1/2)( 2√5 + ln(√5 + 2) ) − (1/2)( √2 + ln(√2 + 1) ).

All together, the integral (3.4) is equal to

− (1/4) · (2/3)( 5^{3/2} − 2^{3/2} ) + (1/2)( 2√5 + ln(√5 + 2) ) − (1/2)( √2 + ln(√2 + 1) )
    = (1/2)( 2√5 − √2 + ln( (√5 + 2)/(√2 + 1) ) ) − (1/6)( 5^{3/2} − 2^{3/2} ). △
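Since the two iterated integrals are awkward to check symbolically, a small numerical sketch (my addition, plain Python) can compare both orders of integration using the midpoint rule:

```python
import math

def f(x, y):
    return (x - 1) * math.sqrt(1 + math.exp(2 * y))

def iterated(a, b, lo, hi, g, n=400):
    """Midpoint-rule approximation of the iterated integral of g(t, s)
    over a <= t <= b, lo(t) <= s <= hi(t)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        c, d = lo(t), hi(t)
        hs = (d - c) / n
        inner = 0.0
        for j in range(n):
            s = c + (j + 0.5) * hs
            inner += g(t, s) * hs
        total += inner * h
    return total

# dy dx order: x in [1, 2], y in [0, ln x]
I1 = iterated(1.0, 2.0, lambda x: 0.0, math.log, lambda x, y: f(x, y))
# dx dy order: y in [0, ln 2], x in [e^y, 2]
I2 = iterated(0.0, math.log(2.0), math.exp, lambda y: 2.0, lambda y, x: f(x, y))
print(I1, I2)  # both approximately 0.4181
```

Both approximations agree with the closed-form answer derived above.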

Example 3.6.4. How could we compute an integral over the region

D = {(x, y) : 1 ≤ x2 + y 2 ≤ 2}?
[Figure: the annulus D between the circles x² + y² = 1 and x² + y² = 2.]

The region D is neither x-simple nor y-simple. However, we can use additivity of the integral
(Proposition 3.5.3) and split D into two y-simple regions:

[Figure: the annulus split into an upper y-simple region D₁ and a lower y-simple region D₂.]

Alternatively, we could split D into two x-simple regions. △

Exercises.
Exercises from [FRYb, §3.1] : Q4–Q29.

3.7 Triple integrals


Having discussed double integrals in some depth, we now turn our attention to higher dimen-
sional integrals. The theory is analogous to the two-variable case and so we will be rather
brief here. Although one can define multiple integrals in any finite dimension, we restrict
our attention to triple integrals.
Our goal is to define the triple integral of a function over a box (more precisely, a rect-
angular parallelepiped)
B = [a, b] × [c, d] × [p, q].
As for double integrals, we partition the three sides of the box into equal parts by subdividing
the intervals
a = x₀ < x₁ < ··· < x_n = b,    x_{i+1} − x_i = Δx := (b − a)/n,
c = y₀ < y₁ < ··· < y_n = d,    y_{j+1} − y_j = Δy := (d − c)/n,
p = z₀ < z₁ < ··· < z_n = q,    z_{k+1} − z_k = Δz := (q − p)/n.
In this way we partition the box B into sub-boxes

Bijk := [xi , xi+1 ] × [yj , yj+1 ] × [zk , zk+1 ], 0 ≤ i, j, k ≤ n − 1.



Then we form the sum

S_n = Σ_{i,j,k=0}^{n−1} f(c_ijk) ΔV,

where cijk is any point in Bijk and

∆V = ∆x∆y∆z

is the volume of each sub-box Bijk .

Definition 3.7.1 (Triple integral). Suppose that B = [a, b] × [c, d] × [p, q] ⊆ R3 and that f
is a bounded function f : B → R. If lim_{n→∞} S_n = S exists and is independent of the choice
of the points c_ijk, then we say f is integrable and we call S the triple integral (or simply the
integral ) of f over B. We denote it by
∭_B f dV ,    ∭_B f(x, y, z) dV ,    or    ∭_B f(x, y, z) dx dy dz .

The triple integral satisfies the properties of linearity, homogeneity, monotonicity, and
additivity. (See Propositions 3.3.4 and 3.5.3.) As for the double integral, we compute the
triple integral in practice by reducing it to an iterated integral.

Theorem 3.7.2. Suppose that B = [a, b] × [c, d] × [p, q] ⊆ R3 and f : B → R is integrable.


Then any iterated integral that exists is equal to the triple integral. For example,
∭_B f(x, y, z) dV = ∫_p^q ∫_c^d ∫_a^b f(x, y, z) dx dy dz
                  = ∫_p^q ∫_a^b ∫_c^d f(x, y, z) dy dx dz
                  = ∫_a^b ∫_c^d ∫_p^q f(x, y, z) dz dy dx .

(There are six possible iterated integrals.)

Example 3.7.3. Let’s compute


∭_B (2x + y² − z) dV    where    B = [−1, 1] × [0, 1] × [0, 3].

Using Theorem 3.7.2, we have


∭_B (2x + y² − z) dV = ∫_0^3 ∫_0^1 ∫_{−1}^1 (2x + y² − z) dx dy dz
    = ∫_0^3 ∫_0^1 [ x² + (y² − z)x ]_{x=−1}^1 dy dz
    = 2 ∫_0^3 ∫_0^1 (y² − z) dy dz
    = 2 ∫_0^3 [ y³/3 − zy ]_{y=0}^1 dz
    = 2 ∫_0^3 ( 1/3 − z ) dz
    = 2 [ z/3 − z²/2 ]_{z=0}^3 = 2( 1 − 9/2 ) = −7. △
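The iterated triple integral can be verified with a computer algebra system; here is a sketch (my addition, assuming SymPy is available):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

I = sp.integrate(2 * x + y**2 - z, (x, -1, 1), (y, 0, 1), (z, 0, 3))
print(I)  # -7
```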

As for double integrals, we often want to integrate over more general regions W , not just
boxes. As we did in Section 3.5, we do this by enclosing the region in a box and extending
our function by zero outside of W .
An elementary region in R3 is one defined by restricting one of the variables to lie between
two functions of the remaining variables, and where the domain of these two functions is an
elementary region in the plane. For example, if D ⊆ R2 is an elementary region and

γ1 , γ2 : D → R, γ1 (x, y) ≤ γ2 (x, y) for all (x, y) ∈ D,

then
W = {(x, y, z) : (x, y) ∈ D, γ1 (x, y) ≤ z ≤ γ2 (x, y)}
is an elementary region in R3 .

[Figure: an elementary region W in R³, lying over D and bounded below by z = γ₁(x, y) and above by z = γ₂(x, y).]

Example 3.7.4. Let’s describe the unit ball

W = {(x, y, z) : x2 + y 2 + z 2 ≤ 1}

as an elementary region.
First note that we can write
W = { (x, y, z) : (x, y) ∈ D, −√(1 − x² − y²) ≤ z ≤ √(1 − x² − y²) },

where
D = {(x, y) : x2 + y 2 ≤ 1}
is the unit disk. We can then describe D as an x-simple or as a y-simple region. As a y-simple region, it is

D = {(x, y) : −1 ≤ x ≤ 1, −√(1 − x²) ≤ y ≤ √(1 − x²)}.
So we have described D by the inequalities

−1 ≤ x ≤ 1,
−√(1 − x²) ≤ y ≤ √(1 − x²),
−√(1 − x² − y²) ≤ z ≤ √(1 − x² − y²).

[Figure: the unit ball, bounded above and below by z = ±√(1 − x² − y²), over the disk bounded by y = ±√(1 − x²).]

Theorem 3.7.5 (Triple iterated integrals). Suppose D ⊆ R2 is an elementary region,

γ1 , γ2 : D → R, γ1 (x, y) ≤ γ2 (x, y) for all (x, y) ∈ D,

and
W = {(x, y, z) : (x, y) ∈ D, γ1 (x, y) ≤ z ≤ γ2 (x, y)}.
Then

∭_W f(x, y, z) dV = ∫_a^b ∫_{ϕ₁(x)}^{ϕ₂(x)} ∫_{γ₁(x,y)}^{γ₂(x,y)} f(x, y, z) dz dy dx

if
D = {(x, y) : a ≤ x ≤ b, ϕ1 (x) ≤ y ≤ ϕ2 (x)}
is a y-simple region, and
∭_W f(x, y, z) dV = ∫_c^d ∫_{ψ₁(y)}^{ψ₂(y)} ∫_{γ₁(x,y)}^{γ₂(x,y)} f(x, y, z) dz dx dy

if
D = {(x, y) : c ≤ y ≤ d, ψ1 (y) ≤ x ≤ ψ2 (y)}
is an x-simple region, provided all these integrals exist.
There are analogues of Theorem 3.7.5 for other orders of the variables.

Example 3.7.6. In the setting of Theorem 3.7.5, suppose f (x, y, z) = 1 for all (x, y, z) ∈ W .
Then

∭_W dV = ∬_D ( γ₂(x, y) − γ₁(x, y) ) dA

is the volume of W . (Compare to (3.3).) △

Often the hardest part of computing a triple integral is finding the limits of integration.

Example 3.7.7 (cf. [MT11, Example 5.5.5]). Let W be the region bounded by the planes x = 0, y = 0, and z = 2, and the surface z = x² + y², and lying in the quadrant x ≥ 0, y ≥ 0. Let's compute the integral ∭_W y dx dy dz.
We draw the region W from three different perspectives:
[Figure: the region W drawn from three different perspectives.]

We can describe this region as


W = {(x, y, z) : 0 ≤ x ≤ √2, 0 ≤ y ≤ √(2 − x²), x² + y² ≤ z ≤ 2}.

Thus we have

∭_W y dx dy dz = ∫_0^{√2} ∫_0^{√(2−x²)} ∫_{x²+y²}^2 y dz dy dx
    = ∫_0^{√2} ∫_0^{√(2−x²)} [ yz ]_{z=x²+y²}^2 dy dx
    = ∫_0^{√2} ∫_0^{√(2−x²)} y( 2 − x² − y² ) dy dx
    = ∫_0^{√2} [ y² − x²y²/2 − y⁴/4 ]_{y=0}^{√(2−x²)} dx
    = ∫_0^{√2} ( (2 − x²) − x²(2 − x²)/2 − (2 − x²)²/4 ) dx
    = ∫_0^{√2} ( x⁴/4 − x² + 1 ) dx
    = [ x⁵/20 − x³/3 + x ]_{x=0}^{√2} = √2/5 − 2√2/3 + √2 = 8√2/15. △
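Once the limits of integration are found, the rest can be delegated to a computer algebra system; here is a sketch (my addition, assuming SymPy is available):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)

I = sp.integrate(y,
                 (z, x**2 + y**2, 2),
                 (y, 0, sp.sqrt(2 - x**2)),
                 (x, 0, sp.sqrt(2)))
print(sp.simplify(I))  # 8*sqrt(2)/15
```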

Exercises.
Exercises from [FRYb, §3.5] : Q1–Q22.
Chapter 4

Change of variables

In this chapter we discuss changing variables in double and triple integrals. This is a very
useful technique and one can sometimes greatly simplify an integral by changing to more
suitable variables.

4.1 Two variables


Recall the change of variable formula, also called substitution, for single-variable integrals:
∫_a^b f(x) dx = ∫_{φ⁻¹(a)}^{φ⁻¹(b)} f(φ(u)) φ′(u) du,    (4.1)

where φ : [c, d] → [a, b] is a C 1 one-to-one function. We often write

x = φ(u), dx = φ′ (u) du,

to remember the extra factor of φ′ that we need to include. We made these types of change
of variables in Example 3.6.3. Sometimes we use the change of variables formula in the other direction:

∫_a^b f(φ(x)) φ′(x) dx = ∫_{φ(a)}^{φ(b)} f(u) du .
For example, to compute ∫_0^{√(π/2)} x cos(x²) dx, we can make the substitution

u = φ(x) = x²,    du = 2x dx,

so that

∫_0^{√(π/2)} x cos(x²) dx = (1/2) ∫_0^{π/2} cos u du = (1/2) [ sin u ]_0^{π/2} = 1/2.
In an iterated integral ∫_a^b ∫_c^d f(x, y) dy dx, we could make such a change of variables for
x and y independently. However, we often want to change x and y simultaneously, in such
a way that the new variables u and v are functions of both x and y:

(u, v) = T (x, y).


Since our definition of the double integral (Definition 3.3.1) involved dividing a region
into rectangles, the key to developing the change of variables formula in two dimensions is
seeing how the change of variables affects the area of a rectangle. In fact, it is enough to see
how it affects the area of the unit square.
Recall that any linear map R2 → R2 corresponds to left multiplication by a 2 × 2 matrix.
Consider the matrix

M = [ a b ]
    [ c d ]   ∈ Mat_{2×2}(R).
This sends the unit square [0, 1] × [0, 1], which has vertices

(0, 0), (1, 0), (0, 1), (1, 1),

to the parallelogram with vertices

(0, 0), (a, c), (b, d), (a + b, c + d).

[Figure: the unit square mapped by M to the parallelogram spanned by the columns of M.]

The area of this parallelogram is the absolute value of

ad − bc = det M,

the determinant of M .
In general, our change of variables may not be given by a linear map. However, we can
approximate it by a linear map. Suppose our change of variables is given by

(x, y) = T (u, v), where T : D∗ → R2 is C 1 .

We learned in Section 1.5 that the best linear approximation to the function T is given by its Jacobian matrix

DT = [ ∂x/∂u  ∂x/∂v ]
     [ ∂y/∂u  ∂y/∂v ] .

The determinant of this matrix is called the Jacobian determinant, or simply the Jacobian, of T. It is denoted

∂(x, y)/∂(u, v) := det [ ∂x/∂u  ∂x/∂v ]
                       [ ∂y/∂u  ∂y/∂v ] .

In the limit defining the two-variable integral, this approximation becomes better and better.
So when we change variables in two dimensions, we need to adjust by a factor equal to the
absolute value of the Jacobian, to correct for the change in area induced by the change
of variables. Recall that a function is bijective if it is injective (one-to-one) and surjective
(onto).

Theorem 4.1.1 (Change of variables for double integrals). Suppose that D and D∗ are
elementary regions in the plane, and that

T : D∗ → D, (u, v) 7→ (x(u, v), y(u, v)),

is bijective and of class C 1 . Then, for any integrable function f : D → R, we have


∬_D f(x, y) dx dy = ∬_{D*} f( x(u, v), y(u, v) ) |∂(x, y)/∂(u, v)| du dv,

where (x, y) = T (u, v).

We can remember the formula in Theorem 4.1.1 by

dA = dx dy = |∂(x, y)/∂(u, v)| du dv .    (4.2)
Example 4.1.2. Let P be the parallelogram bounded by

y = 3x, y = x, y = x + 2, y = 3x − 2.
[Figure: the parallelogram P with vertices (0, 0), (1, 1), (1, 3), (2, 4), bounded by the lines y = x, y = 3x, y = x + 2, and y = 3x − 2.]

Suppose we want to compute the integral


∬_P xy dx dy .

Since P is an elementary region, we could compute this as an iterated integral. However,


this would require breaking the region P into pieces. Alternatively, we can change variables
in such a way that the region becomes much easier to work with.

Let’s make the change of variables

x = u + v, y = 3u + v, so that T (u, v) = (u + v, 3u + v).

This T is the linear map given by the matrix


 
[ 1 1 ]
[ 3 1 ] .

Since the determinant of this matrix is −2, which is nonzero, it is one-to-one (as you learned
in linear algebra). Thus we can use Theorem 4.1.1. We chose T so that it maps the unit
square P ∗ to P :

[Figure: T maps the unit square P* in the (u, v)-plane onto the parallelogram P in the (x, y)-plane.]

Furthermore,

|∂(x, y)/∂(u, v)| = | det [ 1 1 ; 3 1 ] | = |−2| = 2.
Thus, we have
dA = dx dy = 2 du dv .
Therefore, by the change of variables formula (Theorem 4.1.1), we have
∬_P xy dx dy = ∬_{P*} (u + v)(3u + v) · 2 du dv = 2 ∫_0^1 ∫_0^1 ( 3u² + 4uv + v² ) du dv
    = 2 ∫_0^1 [ u³ + 2u²v + uv² ]_{u=0}^1 dv = 2 ∫_0^1 ( 1 + 2v + v² ) dv = 2 [ v + v² + v³/3 ]_{v=0}^1 = 14/3. △
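The Jacobian factor and the transformed integral can both be checked symbolically; here is a sketch (my addition, assuming SymPy is available):

```python
import sympy as sp

u, v = sp.symbols('u v')

# Jacobian of T(u, v) = (u + v, 3u + v); its absolute value is 2.
T = sp.Matrix([u + v, 3 * u + v])
jac = abs(T.jacobian([u, v]).det())

I = sp.integrate((u + v) * (3 * u + v) * jac, (u, 0, 1), (v, 0, 1))
print(jac, I)  # 2 14/3
```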

Exercises.
4.1.1. The one-variable analogue of Theorem 4.1.1 recovers (4.1) in the following sense.
Suppose a < b and c < d, and let

I = [a, b],    J = [c, d].

Furthermore, suppose that φ : [c, d] → [a, b] is a C¹ bijective function. Show that the formula

∫_a^b f(x) dx = ∫_c^d f(φ(u)) |φ′(u)| du

recovers the single-variable change of variable formula (4.1). The subtle point here is the
presence of the absolute value above. Hint: The assumptions on φ imply that it is monotonic.

4.1.2 ([MT11, §6.2, Exercises 1, 2]). For each of the following integrals, suggest a change of
variables that will simplify the integrand. Then find the corresponding Jacobian determinant.
(a) ∬_R (3x + 2y) sin(x − y) dA

(b) ∬_R e^{−4x+7y} cos(7x − 2y) dA

(c) ∬_R (5x + y)³ (x + 9y)⁴ dA

(d) ∬_R ( x sin(6x + 7y) − 3y sin(6x + 7y) ) dA

4.1.3 ([MT11, §6.2, Exercise 4]). Let D be the region 0 ≤ y ≤ x and 0 ≤ x ≤ 1. Evaluate
∬_D (x + y) dx dy

by making the change of variables x = u + v, y = u − v. Check your answer by evaluating


the integral directly (i.e. without the change of variables) using an iterated integral.

4.1.4 ([MT11, §6.2, Exercise 8]). Define T (u, v) = (u2 − v 2 , 2uv). Let

D∗ = {(u, v) : u2 + v 2 ≤ 1, u ≥ 0, v ≥ 0}.

Find D = T(D*) and evaluate ∬_D dx dy.

4.1.5 ([MT11, §6.2, Exercise 10]). Calculate

∬_R 1/(x + y) dy dx,

where R is the region bounded by x = 0, y = 0, x + y = 1, x + y = 4, by using the mapping


T (u, v) = (u − uv, uv).

4.1.6 ([MT11, §6.2, Exercise 14]). (a) Express ∫_0^1 ∫_0^{x²} xy dy dx as an integral over the triangle

D* = {(u, v) : 0 ≤ u ≤ 1, 0 ≤ v ≤ u}.
Hint: Find a one-to-one mapping T of D∗ onto the given region of integration.
(b) Evaluate this integral directly and as an integral over D∗ .

4.2 Polar coordinates


As a special case of the change of variables technique of Section 4.1, we now develop a
method for computing integrals in polar coordinates. A good reference for the material in
this section is [FRYc, §3.2].
Definition 4.2.1 (Polar coordinates). The polar coordinates of the point (x, y) ∈ R² are the ordered pair (r, θ), where r is the distance from (0, 0) to (x, y) and θ is the counter-clockwise angle between the x-axis and the line joining (x, y) to (0, 0).
[Figure: the point (x, y) at distance r from the origin, with θ the counter-clockwise angle from the positive x-axis.]

So we have

x = r cos θ,    y = r sin θ,
r = √(x² + y²),    θ = arctan(y/x).
We can apply the general theory of Section 4.1. Taking

(x, y) = T (r, θ) = (r cos θ, r sin θ), (4.3)

we have

∂(x, y)/∂(r, θ) = det DT = det [ cos θ  −r sin θ ]
                               [ sin θ   r cos θ ]  = r( cos²θ + sin²θ ) = r.
Note that T is not one-to-one, since it sends all (0, θ) to (0, 0). However, the change of
variables theorem is still valid. This is because the set of points where T is not one-to-one
lies on the graph of a smooth curve (in particular, it is a one-dimensional region with zero
area) and hence can be neglected for the purposes of integration.
Theorem 4.2.2 (Double integral in polar coordinates). Suppose that D is an elementary
region and that the map T of (4.3) gives a bijection from D∗ to D, except possibly for points
on the boundary of D∗ . Then
∬_D f(x, y) dx dy = ∬_{D*} f(r cos θ, r sin θ) r dr dθ .

Example 4.2.3. Let’s evaluate ZZ


ln(x2 + y 2 ) dx dy,
D

where D is the region in the first quadrant lying between the arcs of the circles x² + y² = a² and x² + y² = b², where 0 < a < b.

[Figure: T maps the rectangle D* = [a, b] × [0, π/2] in the (r, θ)-plane onto the quarter-annulus D in the first quadrant.]

In polar coordinates, the region D corresponds to

D* = {(r, θ) : a ≤ r ≤ b, 0 ≤ θ ≤ π/2} = [a, b] × [0, π/2].

Therefore, by Theorem 4.2.2, we have


∬_D ln(x² + y²) dx dy = ∫_a^b ∫_0^{π/2} r ln(r²) dθ dr = (π/2) ∫_a^b r ln(r²) dr = π ∫_a^b r ln r dr .

Using integration by parts, we obtain


π [ (r²/2) ln r − r²/4 ]_{r=a}^b = (π/4)( 2b² ln b − 2a² ln a + a² − b² ). △
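The symbolic answer can be verified with a computer algebra system; here is a sketch (my addition, assuming SymPy is available):

```python
import sympy as sp

r, theta, a, b = sp.symbols('r theta a b', positive=True)

# For r > 0 we may write ln(r^2) = 2 ln r before integrating.
I = sp.integrate(2 * r * sp.log(r), (theta, 0, sp.pi / 2), (r, a, b))
target = sp.pi / 4 * (2 * b**2 * sp.log(b) - 2 * a**2 * sp.log(a) + a**2 - b**2)
print(sp.simplify(I - target))  # 0
```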

Example 4.2.4. Let’s compute the area of a sector of a circle of radius R and angle α.

[Figure: a sector D of a circle of radius R and angle α.]

In polar coordinates, the region is given by

0 ≤ r ≤ R, 0 ≤ θ ≤ α.

So we have

Area(D) = ∬_D dA = ∫_0^R ∫_0^α r dθ dr = α ∫_0^R r dr = α [ r²/2 ]_{r=0}^R = (1/2) α R².

In particular, the area of the circle of radius R (so α = 2π) is πR2 . △

Exercises.
Exercises from [FRYb, §3.2] : Q1–Q24.

4.3 Three variables


We can generalize the theory of Section 4.1 to arbitrary dimensions. We covered two dimen-
sions in Section 4.1 and one dimension in previous courses (see also Exercise 4.1.1). We now
treat the case of three dimensions.
The situation for three dimensions is very similar to the two-dimensional case treated in
Section 4.1. The key now is that the linear map R3 → R3 given by left multiplication by the
matrix M ∈ Mat3×3 (R) sends the unit cube [0, 1]3 to the parallelepiped determined by the
vectors M e1 , M e2 , M e3 , where

e1 = (1, 0, 0), e2 = (0, 1, 0), e3 = (0, 0, 1)

are the standard basis vectors.


[Figure: M maps the unit cube to the parallelepiped determined by M e₁, M e₂, M e₃.]

This parallelepiped has volume | det M |.


Now suppose we introduce new variables u, v, w such that

(x, y, z) = T (u, v, w)

for some C 1 function


T : R3 → R3 .

As in Section 4.1, we use the fact that the best linear approximation to T is given by the
Jacobian matrix DT . Therefore, the factor we need to add to our integral when changing
coordinates is the absolute value of the Jacobian determinant
∂(x, y, z)/∂(u, v, w) := det(DT) = det [ ∂x/∂u  ∂x/∂v  ∂x/∂w ]
                                       [ ∂y/∂u  ∂y/∂v  ∂y/∂w ]
                                       [ ∂z/∂u  ∂z/∂v  ∂z/∂w ] .

Theorem 4.3.1 (Change of variables for triple integrals). Suppose that W and W ∗ are
elementary regions in R3 , and that
T : W ∗ → W, (u, v, w) 7→ (x(u, v, w), y(u, v, w), z(u, v, w)),
is of class C 1 and is bijective, except possibly on a set that is a finite union of graphs of
functions of two variables. Then, for any integrable function f : W → R, we have
∭_W f(x, y, z) dx dy dz = ∭_{W*} f( x(u, v, w), y(u, v, w), z(u, v, w) ) |∂(x, y, z)/∂(u, v, w)| du dv dw .

Exercises.
4.3.1 ([MT11, §6.2, Exercise 20]). Let T : R3 → R3 be defined by
T (u, v, w) = (u cos v cos w, u sin v cos w, u sin w).
(a) Show that T is onto the unit sphere; that is, every (x, y, z) with x2 + y 2 + z 2 = 1 can
be written as (x, y, z) = T (u, v, w) for some (u, v, w).
(b) Show that T is not one-to-one.

4.4 Cylindrical coordinates


Let’s apply the general change of variables formula for triple integrals (Theorem 4.3.1) to
cylindrical coordinates. A good reference for the material in this section is [FRYc, §3.6].
[Figure: cylindrical coordinates (r, θ, z): (r, θ) are polar coordinates of the projection (x, y, 0), and z is the height.]

Cylindrical coordinates, denoted (r, θ, z), of a point (x, y, z) ∈ R3 , are defined by:

• r is the distance from (x, y, 0) to (0, 0, 0), which is the same as the distance from
(x, y, z) to the z-axis;
• θ is the angle between the positive x-axis and the line joining (x, y, 0) to (0, 0, 0);
• z is the signed distance from (x, y, z) to the xy-plane (that is, it is the same z as in
cartesian coordinates).
In other words, (r, θ) are polar coordinates in the xy-plane, and z is the usual z.
Cartesian and cylindrical coordinates are related by:

x = r cos θ, y = r sin θ, z = z,
p
r = x2 + y 2 , θ = arctan xy , z = z.

So we consider the change of variables map

(x, y, z) = T (r, θ, z) = (r cos θ, r sin θ, z). (4.4)

The Jacobian matrix of T is

DT = [ cos θ  −r sin θ  0 ]
     [ sin θ   r cos θ  0 ]
     [   0        0     1 ] .

Its determinant is

∂(x, y, z)/∂(r, θ, z) = r( cos²θ + sin²θ ) = r.
Applying Theorem 4.3.1 to this change of variables, we obtain the following result.
Theorem 4.4.1 (Triple integral in cylindrical coordinates). Suppose that W is an elementary
region in R3 and that the map T of (4.4) gives a bijection from W ∗ to W , except possibly
for points on the boundary of W ∗ . Then
∭_W f(x, y, z) dx dy dz = ∭_{W*} f(r cos θ, r sin θ, z) r dr dθ dz .

Example 4.4.2. Consider the cone

W = {(x, y, z) : z ≥ 0, x^2 + y^2 ≤ z^2 ≤ 1}.

Suppose the density of the cone at point (x, y, z) is equal to z. What is the mass of the cone?

In cylindrical coordinates, the cone is given by

W ∗ = {(r, θ, z) : 0 ≤ z ≤ 1, 0 ≤ θ ≤ 2π, 0 ≤ r ≤ z}

Thus, we have
\[
\text{Mass} = \iiint_W z \, dV = \iiint_{W^*} rz \, dr \, d\theta \, dz
  = \int_0^1 \int_0^z \int_0^{2\pi} rz \, d\theta \, dr \, dz
\]
\[
= 2\pi \int_0^1 \int_0^z rz \, dr \, dz
  = 2\pi \int_0^1 \left[\frac{r^2 z}{2}\right]_{r=0}^{z} dz
  = \pi \int_0^1 z^3 \, dz
  = \pi \left[\frac{z^4}{4}\right]_{z=0}^{1}
  = \frac{\pi}{4}. \qquad \triangle
\]
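The iterated integral can also be evaluated with sympy, in the same integration order as above (a sanity-check sketch, not part of the notes):

```python
import sympy as sp

r, theta, z = sp.symbols('r theta z', positive=True)
# density z times the cylindrical Jacobian factor r, over the cone W*
mass = sp.integrate(r*z, (theta, 0, 2*sp.pi), (r, 0, z), (z, 0, 1))
print(mass)  # pi/4
```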

Exercises.
Exercises from [FRYb, §3.6] : Q1–Q17.

4.5 Spherical coordinates


Now let’s apply the general change of variables formula for triple integrals (Theorem 4.3.1)
to spherical coordinates. A good reference for the material in this section is [FRYc, §3.7].


Spherical coordinates, denoted (ρ, θ, φ), of a point (x, y, z) ∈ R3 are defined by:

• ρ is the distance from (x, y, z) to (0, 0, 0);


• θ is the angle between the positive x-axis and the line joining (x, y, 0) to (0, 0, 0);
• φ is the angle between the positive z-axis and the line joining (x, y, z) to (0, 0, 0).

The spherical coordinate θ is the same as the cylindrical coordinate θ, while the spherical
coordinate φ is new. We generally restrict these coordinates to the ranges

ρ ≥ 0, 0 ≤ θ ≤ 2π, 0 ≤ φ ≤ π.

Cartesian and spherical coordinates are related by


\[
x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta, \qquad z = \rho\cos\phi,
\]
\[
\rho = \sqrt{x^2 + y^2 + z^2}, \qquad \theta = \arctan\frac{y}{x}, \qquad \phi = \arctan\frac{\sqrt{x^2 + y^2}}{z}.
\]
So we consider the change of variables map
(x, y, z) = T (ρ, θ, φ) = (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ). (4.5)
As we noted in Example 1.12.2, we have
 
\[
DT = \begin{pmatrix}
  \sin\phi\cos\theta & -\rho\sin\phi\sin\theta & \rho\cos\phi\cos\theta \\
  \sin\phi\sin\theta & \rho\sin\phi\cos\theta & \rho\cos\phi\sin\theta \\
  \cos\phi & 0 & -\rho\sin\phi
\end{pmatrix}
\]
and
\[
\det(DT) = -\rho^2\sin\phi.
\]
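This determinant can be double-checked symbolically, with the columns taken in the order (ρ, θ, φ) to match DT above (a sympy sketch added for verification):

```python
import sympy as sp

rho, theta, phi = sp.symbols('rho theta phi', positive=True)
T = sp.Matrix([rho*sp.sin(phi)*sp.cos(theta),
               rho*sp.sin(phi)*sp.sin(theta),
               rho*sp.cos(phi)])
# Jacobian columns ordered (rho, theta, phi), as in the matrix DT
det_DT = sp.simplify(T.jacobian([rho, theta, phi]).det())
print(det_DT)  # -rho**2*sin(phi)
```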
Applying Theorem 4.3.1 to this change of variables, we obtain the following result.
Theorem 4.5.1 (Triple integral in spherical coordinates). Suppose that W is an elementary
region in R3 and that the map T of (4.5) gives a bijection from W ∗ to W , except possibly
for points on the boundary of W ∗ . Then
\[
\iiint_W f(x, y, z) \, dx \, dy \, dz = \iiint_{W^*} f(\rho\sin\phi\cos\theta, \rho\sin\phi\sin\theta, \rho\cos\phi) \, \rho^2\sin\phi \, d\rho \, d\theta \, d\phi.
\]

Example 4.5.2 (Volume of a ball). Consider the ball of radius R, centred at the origin:
B = {(x, y, z) : x^2 + y^2 + z^2 ≤ R^2}.
In spherical coordinates the ball is given by
B^* = {(ρ, θ, φ) : 0 ≤ ρ ≤ R, 0 ≤ θ ≤ 2π, 0 ≤ φ ≤ π}.
Therefore
\[
\operatorname{Vol} B = \iiint_B dx \, dy \, dz = \iiint_{B^*} \rho^2\sin\phi \, d\rho \, d\theta \, d\phi
  = \int_0^\pi \int_0^{2\pi} \int_0^R \rho^2\sin\phi \, d\rho \, d\theta \, d\phi
\]
\[
= \int_0^\pi \int_0^{2\pi} \left[\frac{\rho^3}{3}\right]_{\rho=0}^{R} \sin\phi \, d\theta \, d\phi
  = \frac{R^3}{3} \int_0^\pi \int_0^{2\pi} \sin\phi \, d\theta \, d\phi
  = \frac{2\pi R^3}{3} \int_0^\pi \sin\phi \, d\phi
\]
\[
= \frac{2\pi R^3}{3} \left[-\cos\phi\right]_{\phi=0}^{\pi} = \frac{4\pi R^3}{3}. \qquad \triangle
\]

Exercises.
Exercises from [FRYb, §3.7] : Q1–Q31.
Chapter 5

Vector fields

Many of the remaining topics in the course will involve vector fields. In this chapter we
define vector fields and discuss some important related concepts. A good reference for the
material in this chapter is [FRYe, §2.1–§2.3].

5.1 Definitions and first examples


Definition 5.1.1 (Vector field). (a) A vector field in the plane (or in two dimensions) is
a function
v : D → R2 , D ⊆ R2 .
We denote its components by v(x, y) = (v1 (x, y), v2 (x, y)).
(b) A vector field in space (or in three dimensions) is a function
v : W → R3 , W ⊆ R3 .
We denote its components by v(x, y, z) = (v1 (x, y, z), v2 (x, y, z), v3 (x, y, z)).
Examples 5.1.2. (a) If a moving fluid occupies some region W ⊆ R3 and the velocity of
the fluid is v(x, y, z) at position (x, y, z), then v is called a velocity field.
(b) If there is a force F(x, y, z) at position (x, y, z), then F is called a force field.

We will sometimes denote the unit vectors in the plane by


i = (1, 0), j = (0, 1),
and the unit vectors in space by
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1).
Note that we use the notation i to denote both (1, 0) and (1, 0, 0), and similarly for j. The
context should make clear which one we mean. Alternatively, if we view R2 as the subset
{(x, y, 0) : x, y ∈ R} ⊆ R3 , then there is no ambiguity.
It is sometimes useful to sketch a vector field by drawing its values as arrows at a selection
of points.


Example 5.1.3. The constant vector field

v(x, y) = 2i + j = (2, 1)

can be sketched as follows:

Example 5.1.4. The vector field


v(x, y) = yi

can be sketched as follows:

Example 5.1.5. The vector field

v(x, y) = −yi + xj = (−y, x)



can be sketched as follows:



We will sometimes use the notation


r = (x, y, z) and r = |r| = √(x^2 + y^2 + z^2)

to denote points in R3 , and describe vector fields in terms of r and r.

Example 5.1.6 (Gravity). Suppose an object of mass M is placed at the origin (0, 0, 0).
Newton's law of gravitation states that the magnitude of the gravitational force experienced
by a mass m at the point r = (x, y, z) is GMm/r^2, where G is the gravitational constant and
r = √(x^2 + y^2 + z^2) is the distance to the origin (in other words, r is the distance between the
two objects). This force points towards the origin. Thus, since −r/r is a unit vector pointing
towards the origin, the gravitational force field for the object of mass m is given by
\[
\mathbf{F}(\mathbf{r}) = -\frac{GMm}{r^3}\,\mathbf{r}. \qquad \triangle
\]
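Numerically, one can verify that this field has magnitude GMm/r^2 and points toward the origin. A small sketch with illustrative (non-physical) values for G, M, m:

```python
import numpy as np

G = M = m = 1.0                    # illustrative values, not physical constants
r_vec = np.array([1.0, 2.0, 2.0])  # an arbitrary sample point
r = np.linalg.norm(r_vec)          # distance to the origin (3.0 here)

F = -G*M*m/r**3 * r_vec            # F(r) = -(GMm/r^3) r

print(np.linalg.norm(F))           # magnitude GMm/r^2 = 1/9
print(F / np.linalg.norm(F))       # unit vector -r/r, pointing at the origin
```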

Exercises.
Exercises from [FRYd, §2.1] : Q1–Q15.

5.2 Conservative vector fields


As we will see in later chapters, some vector fields are considerably easier to work with than
others. In this section, we examine one such type of vector field.
Recall, from (1.12), that the gradient of a differentiable function f : A → R, A ⊆ R^m, is
\[
\operatorname{grad} f = \nabla f := \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_m} \right).
\]

If m = 2, then we have
\[
\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j},
\]
and, if m = 3, we have
\[
\nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}.
\]
We sometimes remember this by writing
\[
\nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z}. \tag{5.1}
\]
Definition 5.2.1 (Conservative vector field, potential, equipotential curve and surface).
(a) A vector field F is said to be conservative if there exists a function φ such that
F = ∇φ.
In this case, φ is called a potential for F. Note that if φ is a potential for F and C ∈ R,
then φ + C is also a potential for F.
(b) If F = ∇φ is a conservative field in three dimensions with potential φ and C ∈ R, then
the set
{(x, y, z) : φ(x, y, z) = C}
is called an equipotential surface. Similarly, in two dimensions,
{(x, y) : φ(x, y) = C}
is called an equipotential curve.
Note that equipotential curves and surfaces are just level sets (see (1.15)). It follows
from Proposition 1.8.6 that if F = ∇φ is a conservative vector field, then the vector field is
orthogonal to its equipotential surfaces or curves.

Warning 5.2.2. Physicists use a different convention for potentials. In physics, φ is called a
potential for F when F = −∇φ. Note the minus sign.

Example 5.2.3. Recall the gravitational force field of Example 5.1.6. This force is conserva-
tive, with potential
\[
\phi(\mathbf{r}) = \frac{GMm}{r}.
\]
Indeed, we have
\[
\frac{\partial}{\partial x}\phi(\mathbf{r}) = \frac{\partial}{\partial x}\frac{GMm}{\sqrt{x^2 + y^2 + z^2}} = -\frac{1}{2}\frac{GMm \cdot 2x}{(x^2 + y^2 + z^2)^{3/2}} = -\frac{GMm}{r^3}x,
\]
\[
\frac{\partial}{\partial y}\phi(\mathbf{r}) = \frac{\partial}{\partial y}\frac{GMm}{\sqrt{x^2 + y^2 + z^2}} = -\frac{1}{2}\frac{GMm \cdot 2y}{(x^2 + y^2 + z^2)^{3/2}} = -\frac{GMm}{r^3}y,
\]
\[
\frac{\partial}{\partial z}\phi(\mathbf{r}) = \frac{\partial}{\partial z}\frac{GMm}{\sqrt{x^2 + y^2 + z^2}} = -\frac{1}{2}\frac{GMm \cdot 2z}{(x^2 + y^2 + z^2)^{3/2}} = -\frac{GMm}{r^3}z.
\]
Thus,
\[
\nabla\phi(\mathbf{r}) = -\frac{GMm}{r^3}(x, y, z) = -\frac{GMm}{r^3}\mathbf{r} = \mathbf{F}(\mathbf{r}). \qquad \triangle
\]
r r

Example 5.2.4. Consider the vector field

F(x, y) = xi − yj.

Let's try to show this field is conservative and find a potential. A potential φ would have to
satisfy
\[
\frac{\partial\phi}{\partial x}(x, y) = x \quad \text{and} \quad \frac{\partial\phi}{\partial y}(x, y) = -y. \tag{5.2}
\]
In order to satisfy the first equation in (5.2), we would need to have
\[
\phi(x, y) = \frac{x^2}{2} + \psi(y),
\]
where ψ(y) is a function of y only. For this to also satisfy the second equation in (5.2), we
need
\[
-y = \frac{\partial\phi}{\partial y}(x, y) = \frac{\partial}{\partial y}\left(\frac{x^2}{2} + \psi(y)\right) = \psi'(y).
\]
This holds if and only if
\[
\psi(y) = -\frac{y^2}{2} + C, \qquad C \in \mathbb{R}.
\]
Therefore,
\[
\phi(x, y) = \frac{x^2}{2} - \frac{y^2}{2} + C
\]
is a potential for any C ∈ R. △
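The potential found here can be verified symbolically; the following sympy sketch (our own addition) checks that ∇φ recovers F:

```python
import sympy as sp

x, y = sp.symbols('x y')
phi = x**2/2 - y**2/2                 # the potential found above (with C = 0)
F = sp.Matrix([x, -y])                # F(x, y) = x i - y j
grad_phi = sp.Matrix([phi.diff(x), phi.diff(y)])
print(grad_phi == F)  # True
```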

We conclude this section with an example of a field that is not conservative.

Example 5.2.5. Consider the vector field

v(x, y) = −yi + xj.

Let’s prove by contradiction that v is not conservative. Thus, we begin by assuming that v
is conservative (and then try to arrive at a contradiction). So v = ∇φ for some differentiable
function φ : R2 → R. Thus we have
\[
\frac{\partial\phi}{\partial x}(x, y) = -y \quad \text{and} \quad \frac{\partial\phi}{\partial y}(x, y) = x.
\]
Proceeding as in Example 5.2.4, we have

φ(x, y) = −xy + ψ(y)

for some function ψ : R → R. Then we have


\[
x = \frac{\partial\phi}{\partial y}(x, y) = \frac{\partial}{\partial y}(-xy + \psi(y)) = -x + \psi'(y) \iff \psi'(y) = 2x.
\]
But this is a contradiction, since ψ(y) cannot depend on x. Hence, v is not conservative. △

Exercises.
Exercises are deferred until Section 5.3.

5.3 Curl
We now introduce the important notion of the curl of a vector field. In addition to being
important in later chapters, we will see that curl gives an important test for conservative
vector fields.

Definition 5.3.1 (Curl). The curl of a vector field F(x, y, z), is defined by
     
\[
\operatorname{curl}\mathbf{F} = \nabla\times\mathbf{F}
  = \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)\mathbf{i}
  + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)\mathbf{j}
  + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)\mathbf{k}
\]
\[
= \det\begin{pmatrix}
  \mathbf{i} & \mathbf{j} & \mathbf{k} \\
  \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
  F_1 & F_2 & F_3
\end{pmatrix}.
\]
(The determinant expression is just a mnemonic device.) If ∇ × F = 0, we say that F is
curl free.

Note that curl only applies to vector fields in three dimensions. However, any vector field
v(x, y) in two dimensions can be extended to a vector field in three dimensions by setting

ṽ(x, y, z) = v(x, y).

Then its curl is given by
\[
\nabla\times\tilde{\mathbf{v}} = \left(\frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}\right)\mathbf{k}. \tag{5.3}
\]
We will sometimes simply reuse the notation v for the three-dimensional extension, writing
∇ × v for its curl. The quantity
\[
\frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}
\]
is sometimes called the scalar curl of the two-dimensional vector field v.

Examples 5.3.2. (a) Consider the gravitational force field

\[
\mathbf{F}(\mathbf{r}) = -\frac{GMm}{r^3}\mathbf{r} = -\frac{GMm}{(x^2 + y^2 + z^2)^{3/2}}(x, y, z).
\]
We have
\[
\frac{\partial F_1}{\partial y} = \frac{3GMmxy}{(x^2 + y^2 + z^2)^{5/2}}, \qquad
\frac{\partial F_1}{\partial z} = \frac{3GMmxz}{(x^2 + y^2 + z^2)^{5/2}},
\]
\[
\frac{\partial F_2}{\partial x} = \frac{3GMmxy}{(x^2 + y^2 + z^2)^{5/2}}, \qquad
\frac{\partial F_2}{\partial z} = \frac{3GMmyz}{(x^2 + y^2 + z^2)^{5/2}},
\]
\[
\frac{\partial F_3}{\partial x} = \frac{3GMmxz}{(x^2 + y^2 + z^2)^{5/2}}, \qquad
\frac{\partial F_3}{\partial y} = \frac{3GMmyz}{(x^2 + y^2 + z^2)^{5/2}}.
\]
Thus ∇ × F = 0.
(b) Consider the vector field

v(x, y, z) = −yi + xj = (−y, x, 0)

of Example 5.1.5 (extended to three dimensions). We have
\[
\frac{\partial v_1}{\partial y} = -1 \quad \text{and} \quad \frac{\partial v_2}{\partial x} = 1.
\]
Therefore
\[
\nabla\times\mathbf{v} = 2\mathbf{k}.
\]
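Both curls can be recomputed mechanically from the component formula for ∇ × F. A sympy sketch (our own addition; the symbol k stands in for the constant GMm):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
k = sp.symbols('k', positive=True)  # stands in for the constant GMm

def curl(F):
    """Componentwise curl of a 3D field given as a list [F1, F2, F3]."""
    F1, F2, F3 = F
    return sp.Matrix([F3.diff(y) - F2.diff(z),
                      F1.diff(z) - F3.diff(x),
                      F2.diff(x) - F1.diff(y)])

r3 = (x**2 + y**2 + z**2)**sp.Rational(3, 2)
F_grav = [-k*x/r3, -k*y/r3, -k*z/r3]   # gravitational field of part (a)
v = [-y, x, sp.Integer(0)]             # the field of part (b)

print(sp.simplify(curl(F_grav)))  # zero vector
print(curl(v))                    # Matrix([[0], [0], [2]]), i.e. 2k
```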

Theorem 5.3.3 (Test for conservative vector fields). (a) Suppose that F(x, y) is a C 2 vec-
tor field in the plane. If F is conservative, then
\[
\frac{\partial F_1}{\partial y} = \frac{\partial F_2}{\partial x}.
\]

(b) Suppose that F(x, y, z) is a C 2 vector field in space. If F is conservative, then

∇ × F = 0.

Note that, using (5.3), both cases of Theorem 5.3.3 involve the condition ∇ × F = 0.
Proof. We give the proof of (b), since the proof of (a) is similar and easier (and can be
found in [FRYe, Th. 2.3.9].) Suppose F is conservative. Then there exists a potential φ with
F = ∇φ. In other words, we have
\[
\frac{\partial\phi}{\partial x} = F_1, \qquad \frac{\partial\phi}{\partial y} = F_2, \qquad \frac{\partial\phi}{\partial z} = F_3.
\]

Using Clairaut’s theorem (Theorem 1.9.6), we have

\[
\frac{\partial F_3}{\partial y} = \frac{\partial^2\phi}{\partial y\,\partial z} = \frac{\partial^2\phi}{\partial z\,\partial y} = \frac{\partial F_2}{\partial z},
\]
\[
\frac{\partial F_1}{\partial z} = \frac{\partial^2\phi}{\partial z\,\partial x} = \frac{\partial^2\phi}{\partial x\,\partial z} = \frac{\partial F_3}{\partial x},
\]
\[
\frac{\partial F_2}{\partial x} = \frac{\partial^2\phi}{\partial x\,\partial y} = \frac{\partial^2\phi}{\partial y\,\partial x} = \frac{\partial F_1}{\partial y}.
\]
Hence ∇ × F = 0.

Example 5.3.4. In Example 5.3.2(b), we saw that the vector field v(x, y, z) = −yi + xj has
\[
\frac{\partial v_1}{\partial y} = -1 \neq 1 = \frac{\partial v_2}{\partial x}.
\]
Thus, by Theorem 5.3.3(a), we can conclude that v is not conservative. We also proved this
directly in Example 5.2.5. △

Warning 5.3.5. The converse of Theorem 5.3.3 is false! If ∇ × F ̸= 0, then you can use
Theorem 5.3.3 to conclude that F is not conservative. However, if ∇ × F = 0, you cannot
conclude that F is conservative without doing more work.

Example 5.3.6. Consider the vector field


 
\[
\mathbf{v}(x, y) = \left(-\frac{y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\right).
\]
Then
\[
\frac{\partial v_1}{\partial y} = \frac{y^2 - x^2}{(x^2 + y^2)^2} = \frac{\partial v_2}{\partial x}.
\]
However, v is not conservative. We will prove this in Example 6.4.4. △

We conclude this section with one final example.

Example 5.3.7 ([FRYe, Example 2.3.13]). Consider the vector field

F(x, y, z) = (y 2 + 2xz 2 − 1)i + (2x + 1)yj + (2x2 z + z 3 )k.

We’d like to determine whether or not this vector field is conservative and, if it is, find a
potential.
We first apply the test for conservative vector fields (Theorem 5.3.3). We compute
 
\[
\nabla\times\mathbf{F} = \det\begin{pmatrix}
  \mathbf{i} & \mathbf{j} & \mathbf{k} \\
  \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\
  y^2 + 2xz^2 - 1 & (2x + 1)y & 2x^2 z + z^3
\end{pmatrix}
= 0\,\mathbf{i} - (4xz - 4xz)\,\mathbf{j} + (2y - 2y)\,\mathbf{k} = \mathbf{0}.
\]

So F passes the test. However, we must remember (Warning 5.3.5) that this, by itself, does
not imply that F is conservative.
Let’s try to find a potential φ. We need
\[
\frac{\partial\phi}{\partial x}(x, y, z) = y^2 + 2xz^2 - 1, \tag{5.4}
\]
\[
\frac{\partial\phi}{\partial y}(x, y, z) = (2x + 1)y, \tag{5.5}
\]
\[
\frac{\partial\phi}{\partial z}(x, y, z) = 2x^2 z + z^3. \tag{5.6}
\]
From (5.4), we have
φ(x, y, z) = xy^2 + x^2 z^2 − x + ψ(y, z).

This satisfies (5.5) if and only if
\[
\frac{\partial}{\partial y}(xy^2 + x^2 z^2 - x + \psi(y, z)) = (2x + 1)y
\iff 2xy + \frac{\partial\psi}{\partial y}(y, z) = (2x + 1)y
\iff \frac{\partial\psi}{\partial y}(y, z) = y
\iff \psi(y, z) = \frac{y^2}{2} + \zeta(z).
\]
Finally, this satisfies (5.6) if and only if
\[
\frac{\partial}{\partial z}\left(xy^2 + x^2 z^2 - x + \frac{y^2}{2} + \zeta(z)\right) = 2x^2 z + z^3
\iff 2x^2 z + \zeta'(z) = 2x^2 z + z^3
\iff \zeta(z) = \frac{z^4}{4} + C,
\]
for some constant C ∈ R. Thus, one possible potential (the one with C = 0) is
\[
\phi(x, y, z) = xy^2 + x^2 z^2 - x + \frac{y^2}{2} + \frac{z^4}{4}.
\]
We can double-check our computations by computing the gradient:
\[
\nabla\phi(x, y, z) = (y^2 + 2xz^2 - 1)\mathbf{i} + (2x + 1)y\,\mathbf{j} + (2x^2 z + z^3)\mathbf{k} = \mathbf{F}(x, y, z). \qquad \triangle
\]
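The same double-check can be automated; this sympy sketch (our own addition) confirms that the gradient of the potential equals F componentwise:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
phi = x*y**2 + x**2*z**2 - x + y**2/2 + z**4/4
F = sp.Matrix([y**2 + 2*x*z**2 - 1, (2*x + 1)*y, 2*x**2*z + z**3])

grad_phi = sp.Matrix([phi.diff(v) for v in (x, y, z)])
print(sp.expand(grad_phi - F))  # zero vector, so grad(phi) = F
```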

Exercises.
Exercises from [FRYd, §2.3] : Q2–Q11.
Chapter 6

Path and line integrals

In this chapter we will see how to integrate along curves. There are two types of such
integrals. We can integrate a scalar field or a vector field. The first gives what is called a
path integral, while the second is called a line integral. A good reference for the material in
this chapter is [FRYe, Ch. 1 and §2.4].

6.1 Path integrals


In this section we discuss the integration of a scalar field (just another name for real-valued
function) along a curve.
Recall from Definition 1.7.1 that a path is a map

r : [a, b] → Rn ,

where a, b ∈ R, a ≤ b. (In Section 1.7, we used the notation c(t). We will now start to
use boldface notation, such as r(t), since we want to emphasize the vector nature of these
paths.) Throughout this chapter, we will assume that paths are continuous. A curve is the
image of a path:
C = {r(t) : a ≤ t ≤ b}.
If r(t) is injective (one-to-one), we say that r(t) is a parameterization of the curve C. We
also allow the possibility that r(a) = r(b), in which case we say that C is a closed curve.

Definition 6.1.1 (Piecewise C k ). We say a function f : [a, b] → Rn is piecewise C k if there


exist real numbers a = t0 < t1 < . . . < tj = b such that the restrictions f |[ti−1 ,ti ] are C k for
i = 1, . . . , j.

Example 6.1.2. The function



\[
f : [0, 3] \to \mathbb{R}^2, \qquad
f(t) = \begin{cases}
  (t, t^2) & 0 \le t \le 1, \\
  (2t - 1, 2 - t) & 1 \le t \le 2, \\
  (5 - t, 3t - 6) & 2 \le t \le 3,
\end{cases}
\]


is piecewise C^1. In fact, it is piecewise C^k for all k (hence piecewise C^∞).

Example 6.1.3. A curve can have many different parameterizations. For instance,
r : [0, 1] → R^3, r(t) = (t^2, 1 + t, 1 − t)
and
s : [0, 1/2] → R^3, s(t) = ((1 − 2t)^2, 2 − 2t, 2t)
both parameterize the same curve.

Suppose that we have a curve C that is parameterized as r(t), a ≤ t ≤ b. Suppose
furthermore that this curve is actually a wire whose density at the point r is ρ(r). Let’s try
to compute the mass of this wire.
We divide the interval [a, b] into n equal subintervals, each of length
\[
\Delta t = \frac{b - a}{n}.
\]
Let
\[
t_i = a + i\,\Delta t
\]
denote the endpoint of the i-th interval. We approximate the length of the part of the curve
between r(ti−1 ) and r(ti ) by ∥r(ti ) − r(ti−1 )∥ and the mass of this part of the curve by
ρ(r(ti ))∥r(ti ) − r(ti−1 )∥.
This gives us an approximate mass for the wire of
\[
\sum_{i=1}^{n} \rho(\mathbf{r}(t_i))\,\|\mathbf{r}(t_i) - \mathbf{r}(t_{i-1})\|
= \sum_{i=1}^{n} \rho(\mathbf{r}(t_i))\,\left\|\frac{\mathbf{r}(t_i) - \mathbf{r}(t_{i-1})}{t_i - t_{i-1}}\right\|\,\Delta t.
\]

We then take the limit as n → ∞. If r(t) is piecewise C^1 and ρ(r) is continuous, we get
\[
\text{Mass of } C = \int_a^b \rho(\mathbf{r}(t)) \left\|\frac{d\mathbf{r}}{dt}(t)\right\| dt.
\]

Definition 6.1.4 (Path integral, integral along a curve). Suppose C is a curve parameterized
by
r : [a, b] → Rn , a ≤ t ≤ b,
and f : A → R is a function, with C ⊆ A ⊆ Rn . Then we define
\[
\int_C f \, ds := \int_a^b f(\mathbf{r}(t))\,\|\mathbf{r}'(t)\|\,dt. \tag{6.1}
\]

In this notation, C specifies the curve, and ds denotes arc length. The integral (6.1) is called
a path integral.
If r is piecewise C^1 and f is piecewise continuous, then the integral (6.1) always exists.
Note that the notation ∫_C f ds only involves the curve C, and not the parameterizing path
r. We will justify this in Section 6.3.
Definition 6.1.5 (Length of a curve). Suppose C is a curve parameterized by a piecewise
C 1 path r : [a, b] → Rn , a ≤ t ≤ b. Then the length of C is
\[
\operatorname{Length}(C) := \int_C ds = \int_a^b \|\mathbf{r}'(t)\|\,dt.
\]

Example 6.1.6. Fix a, b ∈ R. Suppose we have a helical wire parameterized by

r(t) = (x(t), y(t), z(t)) = (a cos t, a sin t, bt), 0 ≤ t ≤ 2π.



Furthermore, suppose the density at the point (x, y, z) of the wire is equal to z. Let’s compute
the length and the mass of the wire.
We have
\[
\mathbf{r}'(t) = (-a\sin t, a\cos t, b),
\]
and so
\[
\|\mathbf{r}'(t)\| = \sqrt{a^2\sin^2 t + a^2\cos^2 t + b^2} = \sqrt{a^2 + b^2}.
\]
Thus, the length of the wire is
\[
\int_C ds = \int_0^{2\pi} \|\mathbf{r}'(t)\|\,dt = \int_0^{2\pi} \sqrt{a^2 + b^2}\,dt = 2\pi\sqrt{a^2 + b^2}.
\]
The mass of the wire is
\[
\int_C z\,ds = \int_0^{2\pi} bt\,\|\mathbf{r}'(t)\|\,dt = b\sqrt{a^2 + b^2}\left[\frac{t^2}{2}\right]_{t=0}^{2\pi} = 2\pi^2 b\sqrt{a^2 + b^2}. \qquad \triangle
\]
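Integrating over the stated parameter range 0 ≤ t ≤ 2π, the length and mass integrals can be confirmed symbolically (a sympy sketch, our own addition):

```python
import sympy as sp

t = sp.symbols('t')
a, b = sp.symbols('a b', positive=True)

r = sp.Matrix([a*sp.cos(t), a*sp.sin(t), b*t])
speed = sp.sqrt(sp.simplify(r.diff(t).dot(r.diff(t))))  # sqrt(a^2 + b^2)

length = sp.integrate(speed, (t, 0, 2*sp.pi))
mass = sp.integrate(b*t*speed, (t, 0, 2*sp.pi))  # density z = bt along the wire
print(length, mass)
```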

Definition 6.1.7 (Average value along a curve). The average value of a function f on a
curve C is
\[
\frac{\int_C f\,ds}{\operatorname{Length}(C)}.
\]
Example 6.1.8. Consider
\[
\mathbf{r}(t) = (\sqrt{3}\,t^2, t^3 - t), \qquad t \in [-1, 1].
\]
Let's find the average x-value on the corresponding curve C. We have
\[
\mathbf{r}'(t) = (2\sqrt{3}\,t, 3t^2 - 1),
\]
\[
\|\mathbf{r}'(t)\|^2 = (2\sqrt{3}\,t)^2 + (3t^2 - 1)^2 = 12t^2 + 9t^4 - 6t^2 + 1 = 9t^4 + 6t^2 + 1 = (3t^2 + 1)^2,
\]
so ∥r′(t)∥ = 3t^2 + 1. Therefore,
\[
\operatorname{Length}(C) = \int_C ds = \int_{-1}^{1} (3t^2 + 1)\,dt = \left[t^3 + t\right]_{t=-1}^{1} = 4
\]
and
\[
\int_C x\,ds = \int_{-1}^{1} \sqrt{3}\,t^2(3t^2 + 1)\,dt = \sqrt{3}\int_{-1}^{1}(3t^4 + t^2)\,dt = \sqrt{3}\left[\frac{3t^5}{5} + \frac{t^3}{3}\right]_{t=-1}^{1}
= \sqrt{3}\left(\frac{6}{5} + \frac{2}{3}\right) = \frac{28\sqrt{3}}{15}.
\]
So the average x-value is 7√3/15. △
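Using the speed ∥r′(t)∥ = 3t^2 + 1 computed in the example, the average value can be verified symbolically (a sympy sketch, our own addition):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.sqrt(3)*t**2          # x-component of r(t)
speed = 3*t**2 + 1           # ||r'(t)||, computed in the example

length = sp.integrate(speed, (t, -1, 1))
avg_x = sp.integrate(x*speed, (t, -1, 1)) / length
print(length, avg_x)  # 4 7*sqrt(3)/15
```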

We conclude this section with another intuitive interpretation of path integrals. Suppose
r : [a, b] → R^2 is a path with image C and f : C → R is a function taking nonnegative values.
So f(r(t)) ≥ 0 for all t ∈ [a, b]. Then the path integral ∫_C f ds is the surface area of a fence
along the curve C whose height at the point r(t) is given by f(r(t)).

Exercises.
Exercises from [FRYd, §1.6] : Q1–Q11.

6.2 Line integrals


In Section 6.1 we saw how to integrate a scalar function along a curve. In this section, we
will see a second type of integral along curves. This time we will integrate a vector -valued
function along the curve.
If r : [a, b] → Rn is a parameterization of a curve C, we say that F is a vector field on the
path r (or on the curve C) if F : A → Rn is a vector field with C ⊆ A ⊆ Rn . In other words,
the domain of F contains the curve C.
Definition 6.2.1 (Line integral). Suppose r : [a, b] → R^n is a piecewise C^1 path, and F is a
piecewise continuous vector field on r. Then we define the line integral
\[
\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s} := \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt.
\]
If r(a) = r(b), then the notation ∮_r F · ds is also used. If n = 3 and F = (F_1, F_2, F_3), we also
write
\[
\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s} = \int_{\mathbf{r}} (F_1\,dx + F_2\,dy + F_3\,dz).
\]
If n = 2 and F = (F_1, F_2), we sometimes write
\[
\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s} = \int_{\mathbf{r}} (F_1\,dx + F_2\,dy).
\]

The notation F · ds in Definition 6.2.1 is meant to remind you of the dot product. If
ds = (dx, dy, dz)
is an infinitesimal displacement, then F · ds = F_1 dx + F_2 dy + F_3 dz.

Example 6.2.2. Suppose
\[
\mathbf{r}(t) = (t, t^2), \quad t \in [0, 1], \qquad \text{and} \qquad \mathbf{F}(x, y) = (-y, x).
\]
Let's compute ∫_r F · ds. We have
\[
\mathbf{r}'(t) = (1, 2t), \qquad \mathbf{F}(\mathbf{r}(t)) = \mathbf{F}(t, t^2) = (-t^2, t).
\]
Therefore,
\[
\mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t) = (-t^2, t)\cdot(1, 2t) = -t^2 + 2t^2 = t^2.
\]
Thus,
\[
\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s} = \int_0^1 \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt = \int_0^1 t^2\,dt = \left[\frac{t^3}{3}\right]_{t=0}^{1} = \frac{1}{3}. \qquad \triangle
\]

Definition 6.2.3 (Work). If F denotes a force moving a particle whose position is given
by r(t), with t indicating time, then ∫_r F · ds is the work done by the force.

Example 6.2.4. Recall from Example 5.1.6 that the gravitational force field for an object of
mass m in the presence of a mass M at the origin is given by
GM m
F(r) = − r,
r3
where r = (x, y, z) is the position of the object of mass m. Suppose this object moves along
a path
r : [a, b] → R3 .
What work is done by gravity? By Definition 6.2.3,
\[
\text{Work} = \int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s} = \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt = -\int_a^b \frac{GMm}{\|\mathbf{r}(t)\|^3}\,\mathbf{r}(t)\cdot\mathbf{r}'(t)\,dt.
\]
Let's make the substitution
\[
u = \|\mathbf{r}(t)\|^2 = x(t)^2 + y(t)^2 + z(t)^2,
\]
so that
\[
du = 2x(t)x'(t)\,dt + 2y(t)y'(t)\,dt + 2z(t)z'(t)\,dt = 2\,\mathbf{r}(t)\cdot\mathbf{r}'(t)\,dt.
\]
Thus
\[
\text{Work} = -\int_{\|\mathbf{r}(a)\|^2}^{\|\mathbf{r}(b)\|^2} \frac{GMm}{2u^{3/2}}\,du = \left[\frac{GMm}{\sqrt{u}}\right]_{u=\|\mathbf{r}(a)\|^2}^{\|\mathbf{r}(b)\|^2} = \frac{GMm}{\|\mathbf{r}(b)\|} - \frac{GMm}{\|\mathbf{r}(a)\|} = \frac{GMm}{r_{\text{end}}} - \frac{GMm}{r_{\text{start}}},
\]

where rstart is the distance from the origin at the start of the path and rend is the distance
from the origin at the end of the path. Note that the total work done only depends on the
endpoints of the path, and not on the route taken in between or the time taken to travel
between the points. We will explain this phenomenon in Example 6.4.2. △

Exercises.
Exercises are deferred until Section 6.5.

6.3 Reparameterization
Our definitions of path integrals (Definition 6.1.4) and line integrals (Definition 6.2.1) both
involve a parameterization r(t) of a curve C. It is natural to ask how these integrals depend
on the parameterization. We now examine this question.
Suppose
r : [a, b] → Rn and r̃ : [c, d] → Rn

are two parameterizations of the same curve. In other words, they have the same image C.
Since parameterizations are invertible, we have the composite function

g := r−1 ◦ r̃ : [c, d] → [a, b]. (6.2)

Thus
r̃(u) = r(g(u)), u ∈ [c, d].
We call r̃ a reparameterization of r.
Since we assume that r and r̃ are continuous, the map g is a continuous bijection. There-
fore it is either increasing or decreasing. If g is increasing, we say that r and r̃ have the same
orientation, and that the reparameterization is orientation-preserving. If g is decreasing, we
say that r and r̃ have opposite orientation, and that the reparameterization is orientation-
reversing.

Theorem 6.3.1 (Reparameterization for path integrals). Under the above hypotheses,
\[
\int_c^d f(\tilde{\mathbf{r}}(u))\,\|\tilde{\mathbf{r}}'(u)\|\,du = \int_a^b f(\mathbf{r}(t))\,\|\mathbf{r}'(t)\|\,dt.
\]

Proof. This follows from the change of variables formula. Setting t = g(u), we have
\[
dt = g'(u)\,du, \qquad \tilde{\mathbf{r}}'(u) = \frac{d}{du}\mathbf{r}(g(u)) = \mathbf{r}'(g(u))\,g'(u).
\]
If g is increasing, we have g'(u) ≥ 0 for u ∈ [c, d], g(c) = a, and g(d) = b. Thus,
\[
\|\tilde{\mathbf{r}}'(u)\| = \|\mathbf{r}'(g(u))\,g'(u)\| = \|\mathbf{r}'(g(u))\|\,g'(u).
\]
Therefore,
\[
\int_a^b f(\mathbf{r}(t))\,\|\mathbf{r}'(t)\|\,dt = \int_c^d f(\mathbf{r}(g(u)))\,\|\mathbf{r}'(g(u))\|\,g'(u)\,du = \int_c^d f(\tilde{\mathbf{r}}(u))\,\|\tilde{\mathbf{r}}'(u)\|\,du.
\]

We leave it as an exercise (Exercise 6.3.1) to handle the case where g is decreasing.


Theorem 6.3.1 shows that, while we need to parameterize a curve in order to integrate
along it, the integral of Definition 6.1.4 is independent of the parameterization. This situation
is slightly more subtle for line integrals (Definition 6.2.1).

Theorem 6.3.2 (Reparameterization for line integrals). Suppose F is a vector field on the
piecewise C 1 path r : [a, b] → Rn , and let r̃ : [c, d] → Rn be a reparameterization of r. If r̃ is
orientation-preserving, then
\[
\int_{\tilde{\mathbf{r}}} \mathbf{F}\cdot d\mathbf{s} = \int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s},
\]
and if r̃ is orientation-reversing, then
\[
\int_{\tilde{\mathbf{r}}} \mathbf{F}\cdot d\mathbf{s} = -\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s}.
\]

Proof. As in the proof of Theorem 6.3.1, we define g as in (6.2) and set t = g(u). Then
\[
dt = g'(u)\,du, \qquad \tilde{\mathbf{r}}'(u) = \frac{d}{du}\mathbf{r}(g(u)) = \mathbf{r}'(g(u))\,g'(u).
\]
Then
\[
\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s}
= \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt
= \int_{g^{-1}(a)}^{g^{-1}(b)} \mathbf{F}(\mathbf{r}(g(u)))\cdot\mathbf{r}'(g(u))\,g'(u)\,du
= \int_{g^{-1}(a)}^{g^{-1}(b)} \mathbf{F}(\tilde{\mathbf{r}}(u))\cdot\tilde{\mathbf{r}}'(u)\,du
\]
\[
= \begin{cases}
  \int_c^d \mathbf{F}(\tilde{\mathbf{r}}(u))\cdot\tilde{\mathbf{r}}'(u)\,du & \text{if } \tilde{\mathbf{r}} \text{ is orientation-preserving,} \\
  \int_d^c \mathbf{F}(\tilde{\mathbf{r}}(u))\cdot\tilde{\mathbf{r}}'(u)\,du & \text{if } \tilde{\mathbf{r}} \text{ is orientation-reversing}
\end{cases}
= \begin{cases}
  \int_{\tilde{\mathbf{r}}} \mathbf{F}\cdot d\mathbf{s} & \text{if } \tilde{\mathbf{r}} \text{ is orientation-preserving,} \\
  -\int_{\tilde{\mathbf{r}}} \mathbf{F}\cdot d\mathbf{s} & \text{if } \tilde{\mathbf{r}} \text{ is orientation-reversing.}
\end{cases}
\]
Theorem 6.3.2 states that line integrals are independent of the parameterization of a curve,
except that reversing orientation introduces a negative sign. It thus makes sense to speak
of an oriented curve, which is, by definition, a curve C together with a choice of start point
P_0 and endpoint P_1. We indicate the orientation of a curve by drawing an arrow along the
curve, in the direction from P_0 to P_1.
A parameterization of such an oriented curve is a parameterization r : [a, b] → R^n such that
r(a) = P_0 and r(b) = P_1. Then, by Theorem 6.3.2, for an oriented curve C, we can define
\[
\int_C \mathbf{F}\cdot d\mathbf{s} = \int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s}, \tag{6.3}
\]
where r is any parameterization of C.

Exercises.
6.3.1. Complete the proof of Theorem 6.3.1 by treating the case where g is decreasing.

Further exercises are deferred until Section 6.5.



6.4 Line integrals of conservative vector fields


It turns out that computing line integrals of conservative vector fields is particularly easy.

Theorem 6.4.1 (Fundamental theorem of line integrals). Suppose F = ∇φ is a conservative


vector field. Then if C is any oriented curve that starts at P0 and ends at P1 , we have
\[
\int_C \mathbf{F}\cdot d\mathbf{s} = \phi(P_1) - \phi(P_0).
\]

Proof. We give the proof for the three-dimensional setting; the proof in two dimensions is
almost identical. Let
r(t) = (x(t), y(t), z(t)), a ≤ t ≤ b,
be a parameterization of C, with r(a) = P_0 and r(b) = P_1. Then,
\[
\int_C \mathbf{F}\cdot d\mathbf{s}
= \int_a^b \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt
= \int_a^b \nabla\phi(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt
\]
\[
= \int_a^b \left( \frac{\partial\phi}{\partial x}(x(t), y(t), z(t))\,\frac{dx}{dt}(t)
+ \frac{\partial\phi}{\partial y}(x(t), y(t), z(t))\,\frac{dy}{dt}(t)
+ \frac{\partial\phi}{\partial z}(x(t), y(t), z(t))\,\frac{dz}{dt}(t) \right) dt
\]
\[
= \int_a^b \frac{d}{dt}\bigl(\phi(x(t), y(t), z(t))\bigr)\,dt \qquad \text{(by the chain rule)}
\]
\[
= \phi(\mathbf{r}(b)) - \phi(\mathbf{r}(a)) \qquad \text{(by the fundamental theorem of calculus)}
\]
\[
= \phi(P_1) - \phi(P_0).
\]

Provided we are integrating a conservative vector field (whose potential we can find),
Theorem 6.4.1 gives us an extremely easy way to compute line integrals. In one dimension,
Theorem 6.4.1 reduces to the usual fundamental theorem of calculus.

Example 6.4.2. Recall from Example 5.2.3 that the gravitational force field F is conservative,
with potential
GM m
φ(r) = , where r = ∥r∥.
r
Thus, the work done by gravity on a particle moving along a curve C, starting at point
(x0 , y0 , z0 ) and ending at point (x1 , y1 , z1 ) is
\[
\int_C \mathbf{F}\cdot d\mathbf{s} = \frac{GMm}{\sqrt{x_1^2 + y_1^2 + z_1^2}} - \frac{GMm}{\sqrt{x_0^2 + y_0^2 + z_0^2}}.
\]

The work does not depend on the curve, only its starting point and endpoint. Furthermore,
it does not depend on how long it takes the particle to travel between these two points. This
agrees with our computation in Example 6.2.4. △

Example 6.4.3. Consider the vector field

F(x, y) = xy i + (y^2 + 1)j.

Consider two curves, both starting at P0 = (0, 0) and ending at P1 = (1, 1).

(a) Let C1 be the straight line from P0 to P1 .


(b) Let C2 be the curve, made from two straight lines, which follows the y-axis from P0 to
(0, 1) and then follows the line y = 1 from (0, 1) to P1.

Let's compute the line integral ∫_{C_i} F · ds for each curve.

(a) We parameterize C1 by

r(t) = ti + tj, t ∈ [0, 1].

Then
\[
\mathbf{F}(\mathbf{r}(t)) = t^2\,\mathbf{i} + (t^2 + 1)\,\mathbf{j} \qquad \text{and} \qquad \mathbf{r}'(t) = \mathbf{i} + \mathbf{j},
\]
and so
\[
\int_{C_1} \mathbf{F}\cdot d\mathbf{s} = \int_0^1 \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt = \int_0^1 \bigl(t^2\,\mathbf{i} + (t^2 + 1)\,\mathbf{j}\bigr)\cdot(\mathbf{i} + \mathbf{j})\,dt
= \int_0^1 (2t^2 + 1)\,dt = \left[\frac{2t^3}{3} + t\right]_{t=0}^{1} = \frac{5}{3}.
\]

(b) We parameterize C2 with a piecewise C 1 path. The curve C2 is the union of the
curves C3 and C4 , where C3 is the straight line from (0, 0) to (0, 1) and C4 is the straight
line from (0, 1) to (1, 1). We parameterize C3 by

r(t) = tj, t ∈ [0, 1],

and C4 by
p(t) = ti + j, t ∈ [0, 1].
Then
\[
\int_{C_2} \mathbf{F}\cdot d\mathbf{s} = \int_{C_3} \mathbf{F}\cdot d\mathbf{s} + \int_{C_4} \mathbf{F}\cdot d\mathbf{s}
= \int_0^1 \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt + \int_0^1 \mathbf{F}(\mathbf{p}(t))\cdot\mathbf{p}'(t)\,dt
\]
\[
= \int_0^1 (t^2 + 1)\,\mathbf{j}\cdot\mathbf{j}\,dt + \int_0^1 (t\,\mathbf{i} + 2\,\mathbf{j})\cdot\mathbf{i}\,dt
= \int_0^1 (t^2 + 1)\,dt + \int_0^1 t\,dt
= \left[\frac{t^3}{3} + t\right]_{t=0}^{1} + \left[\frac{t^2}{2}\right]_{t=0}^{1} = \frac{11}{6}.
\]
We see that the integrals along C1 and C2 are different, even though they have the same
starting point and endpoint. (See [FRYe, Example 2.4.3] for integrals along some additional
curves.) Using Theorem 6.4.1, this implies that F is not a conservative vector field. △
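The two integrals in this example can be reproduced with a short symbolic computation; the helper function below (a sympy sketch, our own addition) evaluates the line integral of F along any path parameterized on [0, 1]:

```python
import sympy as sp

t = sp.symbols('t')

def line_integral(x, y):
    """Integrate F = xy i + (y^2 + 1) j along the path (x(t), y(t)), t in [0, 1]."""
    path = sp.Matrix([x, y])
    F = sp.Matrix([x*y, y**2 + 1])
    return sp.integrate(F.dot(path.diff(t)), (t, 0, 1))

I1 = line_integral(t, t)                                                # C1: straight line
I2 = line_integral(sp.Integer(0), t) + line_integral(t, sp.Integer(1))  # C2 = C3 then C4
print(I1, I2)  # 5/3 11/6
```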

Example 6.4.4. Consider the vector field


 
\[
\mathbf{F}(x, y) = \left(-\frac{y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\right).
\]

We claimed in Example 5.3.6 that this vector field is not conservative, even though it is
curl free (that is, ∇ × F = 0). One way to prove it is not conservative, is to show that the
integrals along two curves with the same starting point and endpoint are different.
Let
\[
P_0 = (-1, 0), \qquad P_1 = (1, 0),
\]
\[
\mathbf{r}(t) = (-\cos t, \sin t), \quad t \in [0, \pi], \qquad
\mathbf{p}(t) = (-\cos t, -\sin t), \quad t \in [0, \pi].
\]

Then we have
\[
\int_{\mathbf{r}} \mathbf{F}\cdot d\mathbf{s} = \int_0^\pi \mathbf{F}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt
= \int_0^\pi (-\sin t, -\cos t)\cdot(\sin t, \cos t)\,dt
= \int_0^\pi (-\sin^2 t - \cos^2 t)\,dt
= \int_0^\pi (-1)\,dt = -\pi
\]

and
\[
\int_{\mathbf{p}} \mathbf{F}\cdot d\mathbf{s} = \int_0^\pi \mathbf{F}(\mathbf{p}(t))\cdot\mathbf{p}'(t)\,dt
= \int_0^\pi (\sin t, -\cos t)\cdot(\sin t, -\cos t)\,dt
= \int_0^\pi (\sin^2 t + \cos^2 t)\,dt
= \int_0^\pi dt = \pi.
\]

Since the two integrals are different, the vector field F cannot be conservative. △
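Both semicircle integrals can be checked symbolically; the sympy sketch below (our own addition) evaluates F · r′ along each path over [0, π]:

```python
import sympy as sp

t = sp.symbols('t')

def line_integral(x, y):
    """Integrate F = (-y, x)/(x^2 + y^2) along (x(t), y(t)), t in [0, pi]."""
    path = sp.Matrix([x, y])
    F = sp.Matrix([-y/(x**2 + y**2), x/(x**2 + y**2)])
    integrand = sp.simplify(F.dot(path.diff(t)))
    return sp.integrate(integrand, (t, 0, sp.pi))

upper = line_integral(-sp.cos(t), sp.sin(t))    # semicircle through the upper half-plane
lower = line_integral(-sp.cos(t), -sp.sin(t))   # semicircle through the lower half-plane
print(upper, lower)  # -pi pi
```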

Remark 6.4.5. Note that the vector field F of Example 6.4.4 is defined on R^2 \ {(0, 0)}, and
that the two paths in that example go on opposite sides of the origin. In a certain sense F is the only
vector field defined on R2 \ {(0, 0)} that has zero curl but is not conservative. More precisely,
any other such vector field is a scalar multiple of F plus a conservative vector field. This is
related to an advanced mathematical theory known as de Rham cohomology.

Exercises.
Exercises are deferred until Section 6.5.

6.5 Path independence


As we saw in Theorem 6.4.1, the line integral of a conservative vector field is path independent,
in the sense that it depends only on the starting point and endpoint of the path. In this
section, we examine this phenomenon in more detail. A good reference for the material in
this section is [FRYe, §2.4.1].
To simplify our discussion, we will restrict our attention to vector fields that are defined
and continuous on all of R^2 or R^3. Remember that a curve is closed if it starts and ends at
the same point.

Theorem 6.5.1. Suppose that F is a vector field that is defined and continuous on all of R2
or R3 . Then the following statements are equivalent:

(a) F is conservative.
(b) For any closed curve C, we have ∮_C F · ds = 0.

(c) The integral ∫ F · ds is path independent. More precisely, for any points P_0, P_1, we
have ∫_{C_1} F · ds = ∫_{C_2} F · ds for all curves C_1, C_2 that start at P_0 and end at P_1.

Proof. (a) =⇒ (b): Suppose (a) is true. Thus, there exists a function φ such that F = ∇φ.
Then, if C is a closed curve that starts and ends at P_0, we have, by Theorem 6.4.1,
\[
\oint_C \mathbf{F}\cdot d\mathbf{s} = \phi(P_0) - \phi(P_0) = 0.
\]

(b) =⇒ (c): Suppose (b) is true. Let C_1, C_2 be paths that start at P_0 and end at P_1. Define
C̃_2 to be the opposite of the path C_2 (that is, we follow C_2 in the opposite direction). Then
let C be the path C_1 followed by the path C̃_2. Thus C is a closed path starting and ending
at P_0. Then, using (b) and Theorem 6.3.2, we have
\[
0 = \oint_C \mathbf{F}\cdot d\mathbf{s} = \int_{C_1} \mathbf{F}\cdot d\mathbf{s} + \int_{\tilde{C}_2} \mathbf{F}\cdot d\mathbf{s} = \int_{C_1} \mathbf{F}\cdot d\mathbf{s} - \int_{C_2} \mathbf{F}\cdot d\mathbf{s}.
\]
Hence ∫_{C_1} F · ds = ∫_{C_2} F · ds.
(c) =⇒ (a): Suppose (c) is true. We want to find a function φ such that F = ∇φ. By
Theorem 6.4.1, such a function would have to satisfy
\[
\int_C \mathbf{F}\cdot d\mathbf{s} = \phi(\mathbf{x}) - \phi(\mathbf{0})
\]
for any path C from 0 to x. Since adding a constant to a potential yields another potential,
we can assume that φ(0) = 0. So let's define φ by
\[
\phi(\mathbf{x}) = \int_C \mathbf{F}\cdot d\mathbf{s},
\]
where C is any path from 0 to x. Since we're assuming (c) is true, this definition of φ(x)
does not depend on the path C we choose.
Now we want to show that F = ∇φ. Fix a point x and any curve Cx that starts at 0
and ends at x. Then, for any vector u, let Du be the curve with parameterization

ru (t) = x + tu, 0 ≤ t ≤ 1.

For s ∈ R, let C_x + D_{su} be the curve that first follows C_x from 0 to x and then follows D_{su}
from x to x + su. Then
\[
\phi(\mathbf{x} + s\mathbf{u}) = \int_{C_{\mathbf{x}} + D_{s\mathbf{u}}} \mathbf{F}\cdot d\mathbf{s}
= \int_{C_{\mathbf{x}}} \mathbf{F}\cdot d\mathbf{s} + \int_{D_{s\mathbf{u}}} \mathbf{F}\cdot d\mathbf{s}
= \int_{C_{\mathbf{x}}} \mathbf{F}\cdot d\mathbf{s} + \int_0^1 \mathbf{F}(\mathbf{x} + ts\mathbf{u})\cdot(s\mathbf{u})\,dt,
\]
since r'_{su} = su. In the second integral, make the change of variables
\[
v = ts, \qquad dv = s\,dt.
\]
Then
\[
\phi(\mathbf{x} + s\mathbf{u}) = \int_{C_{\mathbf{x}}} \mathbf{F}\cdot d\mathbf{s} + \int_0^s \mathbf{F}(\mathbf{x} + v\mathbf{u})\cdot\mathbf{u}\,dv.
\]
Cx 0

The first integral is independent of s. Using the fundamental theorem of calculus for the
second integral, we have
\[
\left.\frac{d}{ds}\phi(\mathbf{x} + s\mathbf{u})\right|_{s=0} = \left.\mathbf{F}(\mathbf{x} + s\mathbf{u})\cdot\mathbf{u}\right|_{s=0} = \mathbf{F}(\mathbf{x})\cdot\mathbf{u}.
\]
Applying this with u = i, j, k gives
\[
\nabla\phi(\mathbf{x}) = \left(\frac{\partial\phi}{\partial x}(\mathbf{x}), \frac{\partial\phi}{\partial y}(\mathbf{x}), \frac{\partial\phi}{\partial z}(\mathbf{x})\right) = (\mathbf{F}(\mathbf{x})\cdot\mathbf{i}, \mathbf{F}(\mathbf{x})\cdot\mathbf{j}, \mathbf{F}(\mathbf{x})\cdot\mathbf{k}) = \mathbf{F}(\mathbf{x}),
\]

as desired.

We can use Theorem 6.5.1 to completely characterize conservative vector fields on R2 and R3 .

Theorem 6.5.2. Let F be a vector field that is defined and C 1 on all of R2 or R3 . Then F
is conservative if and only if ∇ × F = 0 (i.e. F is curl free).

Proof. We’ll give the proof for the R3 case. The proof for the R2 case is very similar and can
be found in [FRYe, Th. 2.4.8]. We’ve already seen in Theorem 5.3.3 that every conservative
vector field is curl free. It remains to show the reverse implication.
Suppose F = (F1 , F2 , F3 ) is a vector field that is defined and C 1 on all of R3 . Furthermore,
suppose that ∇ × F = 0. Thus

∂F3/∂y = ∂F2/∂z, ∂F1/∂z = ∂F3/∂x, ∂F2/∂x = ∂F1/∂y. (6.4)

We want to find a function φ such that F = ∇φ. So we want

∂φ/∂x (x, y, z) = F1 (x, y, z), ∂φ/∂y (x, y, z) = F2 (x, y, z), ∂φ/∂z (x, y, z) = F3 (x, y, z). (6.5)

The function φ obeys the first equation in (6.5) if and only if there is a function ψ(y, z) such that

φ(x, y, z) = ∫_0^x F1 (X, y, z) dX + ψ(y, z).

This will satisfy the second equation in (6.5) if and only if

F2 (x, y, z) = ∂φ/∂y (x, y, z) = ∂/∂y ( ∫_0^x F1 (X, y, z) dX + ψ(y, z) )
= ∫_0^x ∂F1/∂y (X, y, z) dX + ∂ψ/∂y (y, z).
So we have

∂ψ/∂y (y, z) = F2 (x, y, z) − ∫_0^x ∂F1/∂y (X, y, z) dX
= F2 (x, y, z) − ∫_0^x ∂F2/∂x (X, y, z) dX   (by (6.4))
= F2 (x, y, z) − [F2 (X, y, z)]_{X=0}^{x}   (by the fundamental theorem of calculus)
= F2 (0, y, z).
Thus, we can choose

ψ(y, z) = ∫_0^y F2 (0, Y, z) dY + χ(z)

for some function χ depending only on z.
So now we have

φ(x, y, z) = ∫_0^x F1 (X, y, z) dX + ∫_0^y F2 (0, Y, z) dY + χ(z).

This satisfies the last equation in (6.5) if and only if

F3 (x, y, z) = ∂φ/∂z (x, y, z)
= ∂/∂z ( ∫_0^x F1 (X, y, z) dX + ∫_0^y F2 (0, Y, z) dY + χ(z) )
= ∫_0^x ∂F1/∂z (X, y, z) dX + ∫_0^y ∂F2/∂z (0, Y, z) dY + ∂χ/∂z (z)
= ∫_0^x ∂F3/∂x (X, y, z) dX + ∫_0^y ∂F3/∂y (0, Y, z) dY + ∂χ/∂z (z)   (by (6.4))
= [F3 (X, y, z)]_{X=0}^{x} + [F3 (0, Y, z)]_{Y=0}^{y} + ∂χ/∂z (z).
So we have

∂χ/∂z (z) = F3 (x, y, z) − [F3 (X, y, z)]_{X=0}^{x} − [F3 (0, Y, z)]_{Y=0}^{y} = F3 (0, 0, z).

Thus, we can choose

χ(z) = ∫_0^z F3 (0, 0, Z) dZ + C

for some constant C ∈ R.
Summarizing, we have found that

φ(x, y, z) = ∫_0^x F1 (X, y, z) dX + ∫_0^y F2 (0, Y, z) dY + ∫_0^z F3 (0, 0, Z) dZ + C

is a potential for F.
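A reader with SymPy available can sanity-check this potential formula symbolically. The following sketch (the sample field F = (yz, xz, xy) is ours, chosen for illustration; it is curl free) applies the formula and confirms that ∇φ = F.

```python
# A hedged sketch, assuming SymPy: apply the potential formula from the proof
# to the sample curl-free field F = (yz, xz, xy); we expect phi = x*y*z.
import sympy as sp

x, y, z, X, Y, Z = sp.symbols('x y z X Y Z')

F1 = lambda a, b, c: b * c   # F1(x, y, z) = yz
F2 = lambda a, b, c: a * c   # F2(x, y, z) = xz
F3 = lambda a, b, c: a * b   # F3(x, y, z) = xy

# phi = int_0^x F1(X,y,z) dX + int_0^y F2(0,Y,z) dY + int_0^z F3(0,0,Z) dZ
phi = (sp.integrate(F1(X, y, z), (X, 0, x))
       + sp.integrate(F2(0, Y, z), (Y, 0, y))
       + sp.integrate(F3(0, 0, Z), (Z, 0, z)))

grad = (sp.diff(phi, x), sp.diff(phi, y), sp.diff(phi, z))
print(phi)    # x*y*z
print(grad)   # (y*z, x*z, x*y)
```

Note that the second and third integrals vanish here, since F2 and F3 are zero on the respective integration segments.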

Later, when we learn Stokes’ Theorem, we’ll be able to give an alternative proof of
Theorem 6.5.2. See Remark 8.4.9.

Warning 6.5.3. It is crucial in Theorem 6.5.2 that ∇ × F = 0 on all of R2 or R3 . We saw


in Examples 5.3.6 and 6.4.4 an example of a vector field that is defined and curl free on
R2 \ {(0, 0)} but is not conservative.

Remark 6.5.4. In fact, we can generalize Theorem 6.5.2 somewhat. We can replace R2 or
R3 by any region that is simply connected. Intuitively, a region is simply connected if any
closed curve can be contracted to a point. The region R2 \ {(0, 0)} is not simply connected
since loops around the origin cannot be contracted to a point.

Exercises.
Exercises from [FRYd, §2.4] : Q1–Q38.
Chapter 7

Surface integrals

In this chapter we discuss integration of real-valued functions and vector fields over surfaces.
A good reference for the material in this chapter is [FRYe, Ch. 3].

7.1 Parameterized surfaces


In Chapter 6, we saw that, in order to integrate along curves, we needed to parameterize
them. The same is true for surfaces. One type of surface we’ve seen is the graph of a function
f (x, y) of two variables. However, if we only considered graphs, we would not be able to
handle many important surfaces, such as spheres and tori.

Definition 7.1.1 (Parameterized surface). A parameterization of a surface is a function


Φ : D → R3 , D ⊆ R2 .
The surface corresponding to the function Φ is its image S = Φ(D). We can write
Φ(u, v) = (ϕ1 (u, v), ϕ2 (u, v), ϕ3 (u, v)).
If Φ is differentiable or is of class C 1 , we call S a differentiable or a C 1 surface, respectively.
We will typically require that a parameterization is injective except perhaps on a finite
union of curves.

Example 7.1.2 (Graphs). If g : D → R is a function, with D ⊆ R2 , then


Φ(u, v) = (u, v, g(u, v)), (u, v) ∈ D,
is a parameterization of the graph of g. △

127
128 Surface integrals

Example 7.1.3 (Planes). Let P be a plane that is parallel to two vectors a and b and passes
through the point p. Then
Φ(u, v) = p + ua + vb
is a parameterization of P . △


Example 7.1.4 (Sphere). Using spherical coordinates (see Section 4.5), we can parameterize
the sphere of radius R centred at the origin by

Φ(u, v) = (R sin v cos u, R sin v sin u, R cos v), (u, v) ∈ [0, 2π] × [0, π]. △

Example 7.1.5 (Torus). The torus obtained by revolving a circle of radius r in the yz-plane,
centred at (0, R, 0), about the z-axis, can be parameterized by

Φ(θ, ψ) = ((R + r cos θ) cos ψ, (R + r cos θ) sin ψ, r sin θ), 0 ≤ θ, ψ ≤ 2π.

See [FRYe, Example 3.1.5] for details. △

Exercises.
Exercises from [FRYd, §3.1] : Q1–Q7.

7.2 Tangent planes


We discussed tangent planes to graphs in Section 1.4. Let's now discuss the more general
situation of tangent planes to surfaces.
Suppose we have a parameterized surface

Φ = (ϕ1 , ϕ2 , ϕ3 ) : D → R3 .

Define

Φu := (∂ϕ1/∂u, ∂ϕ2/∂u, ∂ϕ3/∂u), Φv := (∂ϕ1/∂v, ∂ϕ2/∂v, ∂ϕ3/∂v).

Then Φu (u0 , v0 ) and Φv (u0 , v0 ) are tangent to the surface at Φ(u0 , v0 ). Thus, if they are
linearly independent, they generate the tangent plane. In other words, the tangent plane to
the surface at the point Φ(u0 , v0 ) is given by

Φ(u0 , v0 ) + sΦu (u0 , v0 ) + tΦv (u0 , v0 ), s, t ∈ R.

Recall that if a = (a1 , a2 , a3 ), b = (b1 , b2 , b3 ) ∈ R3 , then their cross product

a × b := (a2 b3 − a3 b2 )i + (a3 b1 − a1 b3 )j + (a1 b2 − a2 b1 )k = det
[ i   j   k
  a1  a2  a3
  b1  b2  b3 ]

is a vector that is perpendicular to both a and b (with direction given by the right-hand
rule) and with magnitude
∥a × b∥ = ∥a∥ ∥b∥ sin θ,
where θ is the angle between a and b.
It follows that
Φu (u0 , v0 ) × Φv (u0 , v0 )
is a normal vector to the surface parameterized by Φ at the point Φ(u0 , v0 ), that is, it
is orthogonal to the tangent plane at that point. This vector is nonzero precisely when
Φu (u0 , v0 ) and Φv (u0 , v0 ) are linearly independent.
Definition 7.2.1 (Regular point). We say that a surface S parameterized by Φ is regular
at Φ(u0 , v0 ) ∈ S when Φu (u0 , v0 ) × Φv (u0 , v0 ) ̸= 0.
Example 7.2.2. Let’s find the tangent plane to the helicoid parameterized by

Φ(r, θ) = (r cos θ, r sin θ, θ)

at the point Φ(1, π). We have

Φr (r, θ) = (cos θ, sin θ, 0), Φr (1, π) = (−1, 0, 0),


Φθ (r, θ) = (−r sin θ, r cos θ, 1), Φθ (1, π) = (0, −1, 1),

and Φ(1, π) = (−1, 0, π). Therefore, the tangent plane is given by

(−1, 0, π) + s(−1, 0, 0) + t(0, −1, 1) = (−1 − s, −t, π + t), s, t ∈ R.




Note also that

Φr × Φθ = (sin θ, − cos θ, r cos2 θ + r sin2 θ) = (sin θ, − cos θ, r). (7.1)

This vector is never equal to 0 (since sin θ and cos θ are never zero for the same value of θ)
and so the helicoid is regular at all points. △

Example 7.2.3. Consider the cone parameterized by

Φ(r, θ) = (r cos θ, r sin θ, r).



We have
Φr = (cos θ, sin θ, 1), Φθ = (−r sin θ, r cos θ, 0).
Thus
Φr × Φθ = (−r cos θ, −r sin θ, r cos2 θ + r sin2 θ) = (−r cos θ, −r sin θ, r).
Therefore the cone is regular everywhere except at the origin (when r = 0). △

Exercises.
Exercises from [FRYd, §3.2] : Q1–Q17.

7.3 Surface area


The first type of surface integral we consider is the computation of the surface area. Through-
out the remainder of this chapter we make the following assumption.

Assumption 7.3.1. We assume that all surfaces are unions of images of parameterized surfaces
Φi : Di → R3 for which:
(a) Di is an elementary region in the plane,
(b) Φi is of class C 1 and one-to-one, except possibly on the boundary of Di , and
(c) the image of Φi is regular, except possibly at a finite number of points.
In what follows, we will use the fact that ∥a × b∥ is the area of the parallelogram determined
by the vectors a and b:

Area = ∥a∥ ∥b∥ sin θ = ∥a × b∥. (7.2)
Before defining general integrals of scalar functions over surfaces, we start with the defi-
nition of surface area. For simplicity, we assume that D is a rectangle. As in Section 3.3, we
subdivide the rectangle D by subdividing each of its sides into n segments of equal length.
The sub-rectangle with bottom-right corner at (ui , vj ) is sent to a piece of the surface in-
cluding the point Φ(ui , vj ). We approximate this piece of the surface by the parallelogram
determined by the vectors ∆uΦu (ui , vj ) and ∆vΦv (ui , vj ). The area of this parallelogram is
∥∆uΦu (ui , vj ) × ∆vΦv (ui , vj )∥ = ∥Φu (ui , vj ) × Φv (ui , vj )∥∆u∆v. (7.3)

Adding up all the areas of these parallelograms, our approximation for the area of the surface
is
Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} ∥Φu (ui , vj ) × Φv (ui , vj )∥ ∆u∆v. (7.4)

As n → ∞, these sums should converge to the area of the surface. This leads us to the
following definition.

Definition 7.3.2 (Area of a parameterized surface). The surface area of a surface S parameterized by Φ : D → R3 is

Area(S) := ∫∫_D ∥Φu × Φv ∥ dA. (7.5)

If S is a union of surfaces Si , then its area is the sum of the areas of the Si .

Note that our definition of surface area involves the parameterization. We will see in
Theorem 7.7.2 that, in fact, it is independent of the parameterization. We saw such a
phenomenon already for path integrals in Theorem 6.3.1.
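The Riemann sum (7.4) translates directly into a numerical approximation of (7.5). Here is a minimal sketch, assuming NumPy is available; the helper name `surface_area`, the midpoint sampling, and the grid parameters are our own choices, not from the text.

```python
# A minimal numerical sketch of the Riemann sum (7.4), assuming NumPy;
# partial derivatives are approximated by central differences.
import numpy as np

def surface_area(Phi, u0, u1, v0, v1, n=400, h=1e-6):
    """Approximate (7.5) by sampling ||Phi_u x Phi_v|| at grid midpoints."""
    du, dv = (u1 - u0) / n, (v1 - v0) / n
    u = u0 + (np.arange(n) + 0.5) * du          # midpoints in u
    v = v0 + (np.arange(n) + 0.5) * dv          # midpoints in v
    U, V = np.meshgrid(u, v, indexing='ij')
    # central differences approximate the partials Phi_u and Phi_v
    Pu = (Phi(U + h, V) - Phi(U - h, V)) / (2 * h)
    Pv = (Phi(U, V + h) - Phi(U, V - h)) / (2 * h)
    normal = np.cross(Pu, Pv, axis=0)           # Phi_u x Phi_v at each sample
    return np.sum(np.linalg.norm(normal, axis=0)) * du * dv

# Sanity check on the unit sphere of Example 7.1.4 (R = 1), whose area is 4*pi:
sphere = lambda u, v: np.stack(
    [np.sin(v) * np.cos(u), np.sin(v) * np.sin(u), np.cos(v)])
print(surface_area(sphere, 0, 2 * np.pi, 0, np.pi))   # close to 4*pi ≈ 12.566
```

With n = 400 the sum for the unit sphere agrees with the exact area 4π to several decimal places.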

Example 7.3.3 (Area of a cone). Let’s compute the area of the cone parameterized by

Φ(r, θ) = (r cos θ, r sin θ, r), 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π.



As we computed in Example 7.2.3,

Φr × Φθ = (−r cos θ, −r sin θ, r)

and (0, 0, 0) is the only point where the surface is not regular. Furthermore, Φ is one-to-
one on the interior of its domain D = [0, 1] × [0, 2π]. Since D is an elementary region,
Assumption 7.3.1 is satisfied. We have
∥Φr × Φθ ∥ = √(r2 cos2 θ + r2 sin2 θ + r2) = √2 r.

Thus, the area of the cone is

∫∫_D ∥Φr × Φθ ∥ dr dθ = ∫_0^{2π} ∫_0^1 √2 r dr dθ = √2 (2π) [r2/2]_{r=0}^{1} = √2 π. △

Example 7.3.4. Let’s compute the surface area of the helicoid (see Example 7.2.2) parame-
terized by
Φ(r, θ) = (r cos θ, r sin θ, θ), 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π.
We have
Φr (r, θ) = (cos θ, sin θ, 0), Φθ (r, θ) = (−r sin θ, r cos θ, 1),
and so
Φr × Φθ = (sin θ, − cos θ, r cos2 θ + r sin2 θ) = (sin θ, − cos θ, r).
Since Φr × Φθ is never 0 (since sin θ and cos θ are never zero for the same value of θ), the
surface is regular at all points. The domain [0, 1] × [0, 2π] is an elementary region, and Φ is
one-to-one. Thus, Assumption 7.3.1 is satisfied. We have
∥Φr × Φθ ∥ = √(sin2 θ + cos2 θ + r2) = √(r2 + 1).

Thus, with the use of a table of integrals, we can compute that the surface area is
∫∫_D ∥Φr × Φθ ∥ dr dθ = ∫_0^{2π} ∫_0^1 √(r2 + 1) dr dθ = π (√2 + ln(1 + √2)). △
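Since the closed form relies on a table of integrals, a quick numerical cross-check is reassuring. A hedged sketch, assuming NumPy (the grid size is our choice): the integrand is independent of θ, so the θ-integral contributes a factor 2π.

```python
# Hedged numerical check: midpoint Riemann sum of sqrt(r^2 + 1) over
# [0, 1] x [0, 2*pi], assuming NumPy.
import numpy as np

n = 2000
r = (np.arange(n) + 0.5) / n                 # midpoints of [0, 1]
approx = 2 * np.pi * np.sum(np.sqrt(r**2 + 1)) / n
exact = np.pi * (np.sqrt(2) + np.log(1 + np.sqrt(2)))
print(approx, exact)   # the two agree to several decimal places
```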

Example 7.3.5 (Graphs). Recall, from Example 7.1.2, that if g : D → R is a function, with
D ⊆ R2 , then
Φ(x, y) = (x, y, g(x, y)), (x, y) ∈ D,
is a parameterization of the graph of g. Then we have
   
Φx (x, y) = (1, 0, ∂g/∂x (x, y)) and Φy (x, y) = (0, 1, ∂g/∂y (x, y)),

and so

Φx × Φy = (−∂g/∂x, −∂g/∂y, 1),
which is never zero. Thus, provided D is an elementary region and g is C 1 , Assumption 7.3.1
is satisfied. Therefore, the surface area of the graph is
Area(graph of g) = ∫∫_D √( (∂g/∂x)2 + (∂g/∂y)2 + 1 ) dx dy. (7.6)

Exercises.
Exercises from [FRYd, §3.3] : Q1–Q3, Q5–Q10, Q14

7.4 Integrals of scalar functions over surfaces


In Section 7.3 we learned how to compute the area of a surface. Now suppose that the surface
is a thin sheet whose density at point (x, y, z) is given by a continuous function f (x, y, z).
Now we want to compute the mass of the surface. In this case, we should multiply the area
of the parallelogram given in (7.3) by the density f (Φ(u∗i , vj∗ )) for some point (u∗i , vj∗ ) in the
sub-rectangle with bottom-right corner at (ui , vj ). This gives an approximation of the mass
of the image of this rectangle under the parameterization Φ : D → R3 . Adding up these
approximations, our approximation for the mass of the surface is
Σ_{i=0}^{n−1} Σ_{j=0}^{n−1} f (Φ(u∗i , vj∗ )) ∥Φu (ui , vj ) × Φv (ui , vj )∥ ∆u∆v.

Compare to (7.4). As n → ∞, these sums should converge to the mass of the surface. This
leads us to the following definition, which applies more generally (i.e. not only when the
function f (x, y, z) represents a density).

Definition 7.4.1 (Integral of a scalar function over a surface). If f (x, y, z) is a real-valued


continuous function defined on a parameterized surface Φ : D → R3 satisfying Assump-
tion 7.3.1, we define the integral of f over Φ to be
∫∫_Φ f (x, y, z) dS = ∫∫_Φ f dS = ∫∫_D f (Φ(u, v)) ∥Φu × Φv ∥ dA. (7.7)

Note that when f (x, y, z) = 1, (7.7) reduces to the formula (7.5) for surface area, as
expected.

Example 7.4.2. Consider the helicoid S from Example 7.2.2 parameterized by

Φ(r, θ) = (r cos θ, r sin θ, θ), (r, θ) ∈ D = [0, 1] × [0, 2π].



Let’s compute ZZ p
f dA, where f (x, y, z) = x2 + y 2 .
Φ

We computed in (7.1) that


Φr × Φθ = (sin θ, − cos θ, r).
Thus

∥Φr × Φθ ∥ = √(sin2 θ + cos2 θ + r2) = √(1 + r2).
We also have

f (Φ(r, θ)) = √((r cos θ)2 + (r sin θ)2) = √(r2) = r,

since r ≥ 0.

Therefore,

∫∫_Φ √(x2 + y2) dS = ∫∫_D r √(1 + r2) dA
= ∫_0^1 ∫_0^{2π} r √(1 + r2) dθ dr
= ∫_0^1 2πr √(1 + r2) dr.

We now make the substitution

u = 1 + r2 , du = 2r dr .

Then

∫∫_Φ √(x2 + y2) dS = π ∫_1^2 √u du = [ (2π/3) u^{3/2} ]_{u=1}^{2} = (2π/3)(2√2 − 1). △
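A quick numerical cross-check of this value; a hedged sketch assuming NumPy (the grid size is our choice). The integrand f(Φ) ∥Φr × Φθ∥ = r √(1 + r²) does not depend on θ, so the θ-integral contributes a factor 2π.

```python
# Hedged numerical check: midpoint Riemann sum of r*sqrt(1 + r^2) over
# [0, 1] x [0, 2*pi], assuming NumPy.
import numpy as np

n = 2000
r = (np.arange(n) + 0.5) / n                 # midpoints of [0, 1]
approx = 2 * np.pi * np.sum(r * np.sqrt(1 + r**2)) / n
exact = (2 * np.pi / 3) * (2 * np.sqrt(2) - 1)
print(approx, exact)   # the two agree to several decimal places
```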

Example 7.4.3 (Graphs). In Example 7.3.5, we developed a formula for the surface area of
a graph (which is a special type of surface). Repeating the argument there, but for the
integral of a function f (x, y, z) over the graph, we can give a formula for the integral of a
scalar function over a graph. Suppose

g : D → R, D ⊆ R2

is a C 1 function and D is an elementary region. Consider the graph of g, with parameteri-


zation
Φ(x, y) = (x, y, g(x, y)), (x, y) ∈ D.
Then the integral of a continuous function f over Φ is
∫∫_D f (x, y, g(x, y)) √( (∂g/∂x)2 + (∂g/∂y)2 + 1 ) dx dy. (7.8)

Note that, when f (x, y, z) = 1, (7.8) reduces to (7.6). △



Example 7.4.4. Let’s integrate y 3 over the surface defined by

z = x + y3, 0 ≤ x ≤ 3, 0 ≤ y ≤ 1.
Note that this surface is the graph of the function

g(x, y) = x + y 3 .

Thus, (7.8) gives

∫∫_Φ y3 dS = ∫∫_D y3 √( (∂g/∂x)2 + (∂g/∂y)2 + 1 ) dA
= ∫_0^1 ∫_0^3 y3 √(1 + 9y4 + 1) dx dy
= ∫_0^1 3y3 √(9y4 + 2) dy
= (1/12) ∫_2^{11} √u du   (u = 9y4 + 2, du = 36y3 dy)
= [ (1/18) u^{3/2} ]_{u=2}^{11} = (1/18)(11^{3/2} − 2^{3/2}). △
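The substitution above can be checked symbolically; a hedged sketch, assuming SymPy is available:

```python
# Hedged symbolic check: apply formula (7.8) to the graph z = x + y^3;
# the integrand becomes y^3 * sqrt(9*y^4 + 2), assuming SymPy.
import sympy as sp

x, y = sp.symbols('x y')
g = x + y**3
integrand = y**3 * sp.sqrt(sp.diff(g, x)**2 + sp.diff(g, y)**2 + 1)
result = sp.integrate(integrand, (x, 0, 3), (y, 0, 1))
print(result, float(result))   # equals (11**1.5 - 2**1.5)/18, about 1.8697
```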
Note that Definition 7.4.1 depends on a choice of parameterization of the surface. Later,
in Section 7.7, we will see that, in fact, the integral does not depend on the parameterization.

Exercises.
Exercises from [FRYd, §3.3] : Q11, Q15, Q20, Q25, Q26, Q32, Q34

7.5 Centre of mass and moment of inertia


In this section we briefly discuss some applications of integration in higher dimensions in
physics.

Definition 7.5.1 (Centre of mass). Suppose an object occupies a volume V ⊆ R3 , and its
density at point (x, y, z) is given by ρ(x, y, z). The centre of mass of the object is (x0 , y0 , z0 ),
where
x0 = (1/M) ∫∫∫_V x ρ(x, y, z) dV,
y0 = (1/M) ∫∫∫_V y ρ(x, y, z) dV,
z0 = (1/M) ∫∫∫_V z ρ(x, y, z) dV,

and

M = ∫∫∫_V ρ(x, y, z) dV

is the total mass of the object. If the object is one-dimensional or two-dimensional, we


replace the above integrals by path or surface integrals, respectively. If the density of the
object is not given, we assume it has constant density ρ(x, y, z) = 1.

Example 7.5.2. Consider the solid cone


V = {(x, y, z) : √(x2 + y2) ≤ z ≤ 1}

and its surface

S = {(x, y, z) : z = √(x2 + y2) ≤ 1}.


If both have constant density 1, which is more stable? More precisely, which has a lower
centre of mass?
Let’s first consider the solid cone. Using cylindrical coordinates, we compute that its
mass is

M_V = ∫∫∫_V dV = ∫_0^1 ∫_0^z ∫_0^{2π} r dθ dr dz = 2π ∫_0^1 [r2/2]_{r=0}^{z} dz
= 2π ∫_0^1 z2/2 dz = 2π [z3/6]_{z=0}^{1} = π/3.

Next, we compute

∫∫∫_V z dV = ∫_0^1 ∫_0^z ∫_0^{2π} zr dθ dr dz = 2π ∫_0^1 [zr2/2]_{r=0}^{z} dz
= 2π ∫_0^1 z3/2 dz = 2π [z4/8]_{z=0}^{1} = π/4.

Thus

z0 = (π/4) / (π/3) = 3/4.

By symmetry, we have x0 = y0 = 0. Thus, the centre of mass of the solid cone is (0, 0, 3/4).


Now let’s consider the empty cone. Our computation of the surface area of a cone in
Example 7.3.3 gives

M_S = √2 π.
As in Example 7.3.3, we parameterize the cone by

Φ(r, θ) = (r cos θ, r sin θ, r), 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π,

and we have

∥Φr × Φθ ∥ = √2 r.
Thus,

∫∫_S z dS = ∫_0^1 ∫_0^{2π} r · √2 r dθ dr = 2π ∫_0^1 √2 r2 dr = [ (2√2 π/3) r3 ]_{r=0}^{1} = 2√2 π/3.

Therefore,

z0 = (2√2 π/3) / (√2 π) = 2/3.
Again, by symmetry, we have x0 = y0 = 0. Thus, the centre of mass is (0, 0, 2/3). So the
empty cone has a lower centre of mass than the solid cone. △
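The solid-cone half of this example is easy to cross-check numerically; a hedged sketch assuming NumPy (the grid resolution is our choice). In cylindrical coordinates the cone is the region 0 ≤ r ≤ z ≤ 1, and the θ-integral just contributes a factor 2π.

```python
# Hedged numerical check of the solid cone's mass and centre of mass,
# assuming NumPy; midpoint grid in (z, r), restricted to r <= z.
import numpy as np

n = 1000
z = (np.arange(n) + 0.5) / n                 # midpoints in z
r = (np.arange(n) + 0.5) / n                 # midpoints in r
Z, R = np.meshgrid(z, r, indexing='ij')
inside = R <= Z                              # keep only the cone r <= z
dV = 2 * np.pi * R * inside / n**2           # volume element r dr dz, times 2*pi
mass = np.sum(dV)
z0 = np.sum(Z * dV) / mass
print(mass, z0)   # mass close to pi/3 ~ 1.047, z0 close to 3/4
```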

For the next definition, we consider rotating some object about an axis. The axis is a
line in R3 . The moment of inertia about this axis measures the torque needed per unit of
angular acceleration. It is the rotational analogue of usual inertia.

Definition 7.5.3 (Moment of inertia). Suppose an object occupies a volume V ⊆ R3 , and
its density at point (x, y, z) is given by ρ(x, y, z). Fix an axis, and let d(x, y, z) denote the
distance from the point (x, y, z) to the axis. Then the moment of inertia of the object about
the axis is

∫∫∫_V ρ(x, y, z) d(x, y, z)2 dV.

If the object is one-dimensional or two-dimensional, we replace the above integrals by path


or surface integrals, respectively. If the density of the object is not given, we assume it has
constant density ρ(x, y, z) = 1.
Example 7.5.4. Consider the cube
V = [0, 1]3
with density
ρ(x, y, z) = x.
Its moment of inertia about the x-axis is

∫∫∫_V ρ(x, y, z) d(x, y, z)2 dV = ∫_0^1 ∫_0^1 ∫_0^1 x(y2 + z2) dx dy dz
= ∫_0^1 ∫_0^1 [x2/2]_{x=0}^{1} (y2 + z2) dy dz
= (1/2) ∫_0^1 ∫_0^1 (y2 + z2) dy dz
= (1/2) ∫_0^1 [y3/3 + z2 y]_{y=0}^{1} dz
= (1/2) ∫_0^1 (1/3 + z2) dz
= (1/2) [z/3 + z3/3]_{z=0}^{1}
= (1/2)(1/3 + 1/3) = 1/3. △
Example 7.5.5. Consider the half-sphere

S = {(x, y, z) : x2 + y 2 + z 2 = 1, z ≥ 0}

with density
ρ(x, y, z) = z.
Let’s find its moment of inertia about the z-axis. As in Example 7.7.1, we parameterize
this surface by

Φ(θ, φ) = (cos θ sin φ, sin θ sin φ, cos φ), θ ∈ [0, 2π], φ ∈ [0, π/2],

and we have

∥Φθ × Φφ ∥ = ∥(− cos θ sin2 φ, − sin θ sin2 φ, − sin φ cos φ)∥
= √(cos2 θ sin4 φ + sin2 θ sin4 φ + sin2 φ cos2 φ)
= √(sin4 φ + sin2 φ cos2 φ) = √(sin2 φ) = sin φ,

since sin φ ≥ 0 for φ ∈ [0, π/2]. Then the moment of inertia about the z-axis is
∫∫_S z(x2 + y2) dS = ∫_0^{2π} ∫_0^{π/2} cos φ (cos2 θ sin2 φ + sin2 θ sin2 φ) sin φ dφ dθ
= ∫_0^{2π} ∫_0^{π/2} cos φ sin3 φ dφ dθ
= ∫_0^{2π} [ (1/4) sin4 φ ]_{φ=0}^{π/2} dθ = ∫_0^{2π} (1/4) dθ = π/2. △

Exercises.
Exercises from [FRYd, §3.3] : Q4, Q13, Q21, Q22, Q27.

7.6 Surface integrals of vector fields


In Section 7.4, we developed the notion of the integral of a scalar function over a surface.
We now want to define the integral of a vector field over a surface. This is analogous to the
two types of integrals along curves. Definition 6.2.1 was motivated by the concept of work ;
see Definition 6.2.3. Our definition of surface integrals of vector fields will be inspired by the
physical notion of flux. A good reference for the material in this section is [FRYe, §3.4].
If our vector field v(x, y, z) gives the velocity of a fluid, ρ(x, y, z) denotes the density of
the fluid, and we put an imaginary surface in the fluid, we can ask: “What is the rate at
which the fluid is crossing the surface?” Let n be the unit normal vector to a small piece of
the surface, with area ∆S. We draw a cross section as follows:
(Figure: side view of the piece ∆S, showing the unit normal n, the displacement v∆t, and the angle θ between them.)

The green line is a side view of the piece ∆S of the surface. In a small time interval ∆t,

• the red line moves to the green line and


• the green line moves to the blue line, so that
• the fluid filling the dark grey region below the green line crosses through ∆S and moves
to the light grey region above the green line.

If we denote by θ the angle between n and v, the dark grey region has base ∆S and height
∥v∆t∥ cos θ. Thus, its volume is

∥v(x, y, z)∆t∥ cos θ∆S = v(x, y, z) · n(x, y, z)∆t∆S,

because n(x, y, z) has length one. The mass of the fluid that crosses ∆S during the time
interval ∆t is then
ρ(x, y, z)v(x, y, z) · n(x, y, z)∆t∆S
and the rate at which fluid is crossing through ∆S is

ρ(x, y, z)v(x, y, z) · n(x, y, z)∆S.

Adding up these amounts over all pieces ∆S of the surface and then taking the limit as the
size of the pieces ∆S and the time interval ∆t goes to zero, we find that the rate at which
fluid is crossing through the surface S is the flux integral
ZZ
ρ(x, y, z)v(x, y, z) · n(x, y, z) dS .
S

Now, comparing to (7.7), we see that to compute this integral, we should parameterize
the surface by a function Φ : D → R3 . Then dS is replaced by ∥Φu ×Φv ∥ dA, and we integrate
over D. Recall that the point Φ(u0 , v0 ) ∈ S is regular if Φu (u0 , v0 ) × Φv (u0 , v0 ) ̸= 0. Since
Φu and Φv are tangent to the surface, their cross product Φu × Φv is normal to the surface.
Therefore, informally speaking, we have

n dS = Φu × Φv .

This motivates the following definition (letting F = ρv).


Definition 7.6.1 (Integral of a vector field over a surface). Let F be a vector field defined
on the image of a parameterized surface Φ. The surface integral of F over Φ is
∫∫_Φ F · dS := ∫∫_D F(Φ(u, v)) · (Φu × Φv ) dA.

Note that Definition 7.6.1 depends on a choice of parameterization of the surface. Later,
in Section 7.7, we will see that, up to a sign, the integral does not depend on the parame-
terization.

Example 7.6.2. Consider the cylinder S parameterized by

Φ : [0, 2π] × [0, 1] → R3 , Φ(u, v) = (cos u, sin u, v),

and the vector field

F(x, y, z) = (x, 0, z).
We have
Φu × Φv = (− sin u, cos u, 0) × (0, 0, 1) = (cos u, sin u, 0).

Drawing this surface and vector field, we have the following:

(Figure: the cylinder S with the vector field F and the outward normal vectors Φu × Φv .)
We compute
F(Φ(u, v)) · (Φu × Φv ) = (cos u, 0, v) · (cos u, sin u, 0) = cos2 u = (1/2)(1 + cos(2u)).
Thus,
∫∫_Φ F · dS = ∫_0^1 ∫_0^{2π} (1/2)(1 + cos(2u)) du dv = (1/2) ∫_0^{2π} (1 + cos(2u)) du
= (1/2) [u + (1/2) sin(2u)]_{u=0}^{2π} = π. △
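This flux can be cross-checked symbolically; a hedged sketch, assuming SymPy is available (the Matrix-based setup is our own):

```python
# Hedged symbolic check of the flux of F = (x, 0, z) through the cylinder,
# assuming SymPy.
import sympy as sp

u, v = sp.symbols('u v')
Phi = sp.Matrix([sp.cos(u), sp.sin(u), v])
normal = Phi.diff(u).cross(Phi.diff(v))      # (cos u, sin u, 0)
F = sp.Matrix([Phi[0], 0, Phi[2]])           # F(x, y, z) = (x, 0, z) on S
flux = sp.integrate(F.dot(normal), (u, 0, 2 * sp.pi), (v, 0, 1))
print(flux)   # pi
```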

Example 7.6.3. Recall the helicoid S of Examples 7.2.2 and 7.4.2, with parameterization
Φ(r, θ) = (r cos θ, r sin θ, θ), 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π.
Let’s integrate the following vector fields over Φ:
F(x, y, z) = (x, 0, 0), G(x, y, z) = (xz, 0, 0).
So we have the following picture:

(Figure: the helicoid S with the vector fields F and G.)

First we compute

Φr × Φθ = (cos θ, sin θ, 0) × (−r sin θ, r cos θ, 1)
= (sin θ, − cos θ, r cos2 θ + r sin2 θ) = (sin θ, − cos θ, r).

Since

F(Φ(r, θ)) · (Φr × Φθ ) = (r cos θ, 0, 0) · (sin θ, − cos θ, r) = r cos θ sin θ = (r/2) sin(2θ),
we have

∫∫_Φ F · dS = ∫_0^1 ∫_0^{2π} (r/2) sin(2θ) dθ dr = ∫_0^1 [−(r/4) cos(2θ)]_{θ=0}^{2π} dr = ∫_0^1 0 dr = 0.

On the other hand,

G(Φ(r, θ)) · (Φr × Φθ ) = (rθ cos θ, 0, 0) · (sin θ, − cos θ, r) = rθ cos θ sin θ = (rθ/2) sin(2θ),

and so

∫∫_Φ G · dS = ∫_0^{2π} ∫_0^1 (rθ/2) sin(2θ) dr dθ = ∫_0^{2π} [r2 θ/4]_{r=0}^{1} sin(2θ) dθ = (1/4) ∫_0^{2π} θ sin(2θ) dθ.
Φ

We calculate the last integral using integration by parts with

u = θ, dv = sin(2θ) dθ,
du = dθ, v = −(1/2) cos(2θ).

So,

∫ θ sin(2θ) dθ = −(1/2) θ cos(2θ) − ∫ −(1/2) cos(2θ) dθ = −(1/2) θ cos(2θ) + (1/4) sin(2θ).

Therefore,

∫∫_Φ G · dS = (1/4) [−(1/2) θ cos(2θ) + (1/4) sin(2θ)]_{θ=0}^{2π} = (1/4)((−π + 0) − (0 + 0)) = −π/4. △
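Both fluxes can be verified symbolically; a hedged sketch, assuming SymPy is available (t stands for θ, and the normal (sin t, −cos t, r) is the one computed in the example):

```python
# Hedged symbolic check of the two helicoid fluxes, assuming SymPy.
import sympy as sp

r, t = sp.symbols('r t')
normal = sp.Matrix([sp.sin(t), -sp.cos(t), r])
F = sp.Matrix([r * sp.cos(t), 0, 0])         # F(Phi(r, t)) = (r cos t, 0, 0)
G = sp.Matrix([r * t * sp.cos(t), 0, 0])     # G(Phi(r, t)), since z = t
flux_F = sp.integrate(F.dot(normal), (r, 0, 1), (t, 0, 2 * sp.pi))
flux_G = sp.integrate(G.dot(normal), (r, 0, 1), (t, 0, 2 * sp.pi))
print(flux_F, flux_G)   # 0 -pi/4
```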

Example 7.6.4 (Graphs). Recall, from Example 7.3.5, that if g : D → R is a function, with
D ⊆ R2 , then
Φ(x, y) = (x, y, g(x, y)), (x, y) ∈ D,
is a parameterization of the graph of g and we have
 
Φx × Φy = (−∂g/∂x, −∂g/∂y, 1).

Therefore, the integral of a vector field F = (F1 , F2 , F3 ) over the graph is given by

∫∫_Φ F · dS = ∫∫_D ( −F1 ∂g/∂x − F2 ∂g/∂y + F3 ) dx dy. △

Exercises.
Exercises from [FRYd, §3.3] : Q16, Q23, Q24, Q28–Q31, Q33, Q35–Q37

7.7 Reparameterization and orientation


In Section 6.3 we discussed the effect of reparameterization on line integrals. In particular, we
saw that a line integral is independent of parameterization provided that we fix an orientation
of the curve over which we are integrating. This allowed us to define, in (6.3), an integral
over an oriented curve. We would now like to do the same for surface integrals.
Suppose
Φ : D → R3 , Ψ : E → R3
are two parameterizations of the same surface S. In other words, Φ, Ψ are one-to-one C 1
functions with Φ(D) = Ψ(E) = S. Choose a point (x, y, z) ∈ S. Then there exist unique

(u, v) ∈ D such that Φ(u, v) = (x, y, z) and (s, t) ∈ E such that Ψ(s, t) = (x, y, z).

We say that Φ and Ψ have the same orientation at (x, y, z) if

Φu × Φv and Ψs × Ψt

are positive scalar multiples of each other. We say they have opposite orientation
at (x, y, z) if they are negative scalar multiples of each other. (Since Φu × Φv and Ψs × Ψt are
both normal to S, these are the only possibilities, provided both vectors are nonzero.) We
say that Φ and Ψ have the same (respectively, opposite) orientation if they have the same
(respectively, opposite) orientation at all points of S.

Example 7.7.1. Consider the half-sphere

{(x, y, z) : x2 + y 2 + z 2 = 1, z ≥ 0}

with parameterizations

Φ(θ, φ) = (cos θ sin φ, sin θ sin φ, cos φ), θ ∈ [0, 2π], φ ∈ [0, π/2],
Ψ(u, v) = (u, v, √(1 − u2 − v2)), (u, v) ∈ {(x, y) : x2 + y2 ≤ 1}.

(Note that Φ is not actually one-to-one as given. To fix this, we could modify the domain
to be [0, 2π) × (0, π/2] ∪ {(0, 0)}.) We compute

Φθ × Φφ = (− sin θ sin φ, cos θ sin φ, 0) × (cos θ cos φ, sin θ cos φ, − sin φ)
= (− cos θ sin2 φ, − sin θ sin2 φ, − sin φ cos φ),

Ψu × Ψv = (1, 0, −2u/(2√(1 − u2 − v2))) × (0, 1, −2v/(2√(1 − u2 − v2)))
= (u/√(1 − u2 − v2), v/√(1 − u2 − v2), 1).
(Note that Ψu × Ψv is not defined on the boundary u2 + v2 = 1 of its domain.) For φ ∈ (0, π/2],
we have − sin φ cos φ < 0. So the z-coordinate of Φθ × Φφ is negative, while the z-coordinate
of Ψu × Ψv is positive. Thus, Φ and Ψ have opposite orientations.
To further illustrate this, let's consider a specific point, say

Ψ(1/√2, 0) = (1/√2, 0, 1/√2) = Φ(0, π/4).

At this point, the normal vectors coming from the two parameterizations are

Φθ × Φφ (0, π/4) = (−1/2, 0, −1/2) and Ψu × Ψv (1/√2, 0) = (1, 0, 1).

So Ψu × Ψv = −2 Φθ × Φφ at this point. Hence the orientations are opposite. △

We are now ready to state our main theorem of this section.


Theorem 7.7.2. Suppose Φ, Ψ are two one-to-one C 1 parameterizations of the same surface
S. Let F be a vector field on S and let f be a real-valued function on S.
(a) If Φ and Ψ have the same orientation, then

∫∫_Φ F · dS = ∫∫_Ψ F · dS.

(b) If Φ and Ψ have opposite orientations, then

∫∫_Φ F · dS = − ∫∫_Ψ F · dS.

(c) We have

∫∫_Φ f dS = ∫∫_Ψ f dS.

Proof. The proof of this theorem is similar to that of Theorem 6.3.1. It involves using a
change of variables. We omit the details.
Theorem 7.7.2 tells us that integrals of scalar functions over surfaces are independent of
parameterization, while integrals of vector fields over surfaces depend only on the orientation
of the parameterization. This is analogous to Theorems 6.3.1 and 6.3.2, which state that
integrals of scalar functions along paths are independent of parameterization, while integrals
of vector fields along paths depend only on the orientation of the parameterization.
In light of Theorem 7.7.2, if f is a real-valued function defined on a surface S, we can define

∫∫_S f dS := ∫∫_Φ f dS,

where Φ is any regular parameterization of S.


For vector fields, we need to specify an orientation. An oriented surface is a two-sided
surface with one side specified as the outside or positive side. We call the other side the
inside or negative side. We say that Φ(u, v) is a parameterization of the oriented surface S
if Φ(u, v) is a parameterization of S and Φu × Φv points out of the positive side of S for all
(u, v) in the domain of Φ. Then, for a vector field F defined on an oriented surface S, we
can define

∫∫_S F · dS := ∫∫_Φ F · dS,

where Φ is any regular parameterization of the oriented surface S.

Example 7.7.3. Suppose we choose to orient the half-sphere S of Example 7.7.1 upward.
That is, we want our normal vectors to have positive z-component. Consider the vector field
F(x, y, z) = (0, 0, 1). As we noted in Example 7.7.1, the parameterization Ψ gives normal
vectors with positive z-components. So Ψ is a parameterization of the oriented surface S.
Therefore,

∫∫_S F · dS = ∫∫_Ψ F · dS = ∫∫_D (0, 0, 1) · (Ψu × Ψv ) du dv = ∫∫_D 1 du dv = Area(D) = π,

where D = {(x, y) : x2 + y 2 ≤ 1} is the domain of Ψ. On the other hand

∫∫_Φ F · dS = ∫_0^{π/2} ∫_0^{2π} (0, 0, 1) · (Φθ × Φφ ) dθ dφ = ∫_0^{π/2} ∫_0^{2π} (− sin φ cos φ) dθ dφ
= −2π ∫_0^{π/2} sin φ cos φ dφ = −2π [ (1/2) sin2 φ ]_{φ=0}^{π/2} = −π,

which is −∫∫_S F · dS since the orientation of Φ is opposite that of S. △
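The Φ-side of this computation can be checked symbolically; a hedged sketch assuming SymPy, using the normal Φθ × Φφ computed in Example 7.7.1 (t = θ, p = φ):

```python
# Hedged symbolic check: flux of F = (0, 0, 1) through the Phi
# parameterization of the half-sphere, whose normal points downward,
# assuming SymPy.
import sympy as sp

t, p = sp.symbols('t p')                     # t = theta, p = phi
normal = sp.Matrix([-sp.cos(t) * sp.sin(p)**2,
                    -sp.sin(t) * sp.sin(p)**2,
                    -sp.sin(p) * sp.cos(p)])
F = sp.Matrix([0, 0, 1])
flux = sp.integrate(F.dot(normal), (t, 0, 2 * sp.pi), (p, 0, sp.pi / 2))
print(flux)   # -pi
```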

If a vector field F represents the flow of a fluid, the sign of a surface integral indicates
the flow relative to the orientation of the surface. In other words, if ∫∫_S F · dS is positive,
then there is a net flow across S in the direction of the orientation of S. On the other hand,
if ∫∫_S F · dS is negative, then the net flow is in the opposite direction.
Note that some surfaces cannot be oriented. In other words, there are surfaces where it
is not possible to choose one side. The most famous example is the Möbius strip, which has
only one side.

Image source: https://commons.wikimedia.org/wiki/File:Green_Coloured_Mobius_Strip.png (CC BY-SA 3.0)

Integrating a vector field over the Möbius strip is problematic! For further discussion on
orientation see [FRYe, §3.5].
Chapter 8

Integral theorems

In this chapter we discuss three theorems: the divergence theorem, Green’s theorem, and
Stokes’ theorem. They are all generalizations of the fundamental theorem of calculus. Recall
that the fundamental theorem of calculus states that
∫_a^b (df/dt)(t) dt = f (b) − f (a).

Thus, it relates the integral of the derivative of a function f over an interval [a, b] to the
values of that function on the boundary {a, b} of that interval. The divergence theorem,
Green’s theorem, and Stokes’ theorem will also have this form. However, we will now work
in higher dimensions. For the divergence theorem, the integral on the left-hand side is over
a three-dimensional volume, and the right-hand side is an integral over the boundary of the
volume, which is a surface. For Green’s theorem and Stokes’ theorem, the integral on the
left-hand side is over a surface, and the right-hand side is an integral over the boundary of
the surface, which is a curve. A good reference for the material in this chapter is [FRYe,
Ch. 4].

8.1 Gradient, divergence, and curl


We begin with a discussion of gradient, divergence, and curl. We have already seen gradient
and curl, but divergence is new. The best way to remember these operators uses the nabla
symbol ∇. This is a differential operator defined by
∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z.
Throughout this section, F and G denote three-dimensional vector fields and f, g, h denote
scalar-valued functions. We denote the components of a vector field F by F1 , F2 , F3 (and
similarly for G, etc.).
Definition 8.1.1 (Gradient, divergence, curl, Laplacian). (a) The gradient of a scalar-
valued function f (x, y, z) is the vector field
grad f = ∇f = (∂f/∂x) i + (∂f/∂y) j + (∂f/∂z) k.


(b) The divergence of a vector field F(x, y, z) is the scalar-valued function


div F = ∇ · F = ∂F1/∂x + ∂F2/∂y + ∂F3/∂z.
(c) The curl of a vector field F(x, y, z) is the vector field

curl F = ∇ × F = (∂F3/∂y − ∂F2/∂z) i + (∂F1/∂z − ∂F3/∂x) j + (∂F2/∂x − ∂F1/∂y) k.

(d) The Laplacian of a scalar-valued function f (x, y, z) is the scalar-valued function

∆f = ∇²f = ∇ · ∇f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z².
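All of these operators are easy to sanity-check numerically with central differences. The following Python sketch is our own illustration (the function names, step size, and sample fields are arbitrary choices, not from the text); it approximates div and curl and recovers two facts used later in this section: the radial field (x, y, z) has divergence 3, and the rotational field (−y, x, 0) has curl (0, 0, 2).

```python
# Central-difference approximations of div and curl for a vector field
# F : R^3 -> R^3, given as a Python function returning a 3-tuple.

def partial(f, p, i, h=1e-5):
    """Central-difference approximation of the i-th partial derivative
    of a scalar function f at the point p."""
    q = list(p); r = list(p)
    q[i] += h; r[i] -= h
    return (f(tuple(q)) - f(tuple(r))) / (2 * h)

def div(F, p):
    # div F = dF1/dx + dF2/dy + dF3/dz
    return sum(partial(lambda q, i=i: F(q)[i], p, i) for i in range(3))

def curl(F, p):
    # curl F = (dF3/dy - dF2/dz, dF1/dz - dF3/dx, dF2/dx - dF1/dy)
    d = lambda i, j: partial(lambda q: F(q)[i], p, j)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

# The radial field (x, y, z) has divergence 3 everywhere.
radial = lambda p: (p[0], p[1], p[2])
# The rotational field (-y, x, 0) has curl (0, 0, 2) everywhere.
rotate = lambda p: (-p[1], p[0], 0.0)

div_radial = div(radial, (0.3, -0.7, 1.2))    # approximately 3
curl_rotate = curl(rotate, (0.3, -0.7, 1.2))  # approximately (0, 0, 2)
```

Because the sample fields are linear, the central differences here are exact up to floating-point rounding; for general fields the error is O(h²).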

8.1.1 Vector identities


When working with the above operators, it is useful to have some identities at our disposal.
Some of these identities involve the expression (G · ∇)F for vector fields F, G. This is defined
to be

(G · ∇)F := G1 ∂F/∂x + G2 ∂F/∂y + G3 ∂F/∂z.
The proofs of all of these identities involve using rules you’ve learned for differentiation. So,
for each proposition, we will only prove a small selection of the identities, leaving the proof
of the others as an exercise. The proofs can also be found in [FRYe, §4.1.1].
Proposition 8.1.2 (Gradient identities).
(a) ∇(f + g) = ∇f + ∇g.
(b) ∇(cf ) = c∇f for all c ∈ R.
(c) ∇(f g) = (∇f )g + f (∇g).
(d) ∇(f /g) = (g∇f − f ∇g)/g² at points x where g(x) ≠ 0.
(e) ∇(F · G) = F × (∇ × G) − (∇ × F) × G + (G · ∇)F + (F · ∇)G.
Proof. To prove (c), we compute

∇(f g) = i ∂(f g)/∂x + j ∂(f g)/∂y + k ∂(f g)/∂z
= i (∂f/∂x g + f ∂g/∂x) + j (∂f/∂y g + f ∂g/∂y) + k (∂f/∂z g + f ∂g/∂z)    (product rule)
= (∇f )g + f (∇g).

Since ∇c = 0 for a constant c ∈ R, we have

∇(cf ) = (∇c)f + c(∇f ) = c∇f,

proving (b).

Proposition 8.1.3 (Divergence identities).


(a) ∇ · (F + G) = ∇ · F + ∇ · G.
(b) ∇ · (cF) = c∇ · F for all c ∈ R.
(c) ∇ · (f F) = (∇f ) · F + f ∇ · F.
(d) ∇ · (F × G) = (∇ × F) · G − F · (∇ × G).
Proof. To prove (c), we compute

∇ · (f F) = ∂(f F1)/∂x + ∂(f F2)/∂y + ∂(f F3)/∂z
= ∂f/∂x F1 + f ∂F1/∂x + ∂f/∂y F2 + f ∂F2/∂y + ∂f/∂z F3 + f ∂F3/∂z    (product rule)
= (∇f ) · F + f ∇ · F.
Taking f (x, y, z) = c for some constant c ∈ R and using the fact that ∇c = 0 then gives
(b).
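As a concrete illustration of (c), take f = xyz and F = (x², y², z²) (sample choices of ours). Then fF = (x³yz, xy³z, xyz³), and both sides of the identity can be written out by hand and compared at a few points:

```python
# Check div(f F) = (grad f) . F + f div F for f = xyz, F = (x^2, y^2, z^2),
# with every derivative computed by hand.

def lhs(x, y, z):
    # div(fF) = d(x^3 y z)/dx + d(x y^3 z)/dy + d(x y z^3)/dz
    return 3*x**2*y*z + 3*x*y**2*z + 3*x*y*z**2

def rhs(x, y, z):
    grad_f = (y*z, x*z, x*y)            # grad(xyz)
    F = (x**2, y**2, z**2)
    div_F = 2*x + 2*y + 2*z             # div(x^2, y^2, z^2)
    f = x*y*z
    return grad_f[0]*F[0] + grad_f[1]*F[1] + grad_f[2]*F[2] + f*div_F

samples = [(1.0, 2.0, 3.0), (-0.5, 0.25, 4.0), (1.5, -2.5, 0.5)]
results = [(lhs(*p), rhs(*p)) for p in samples]
```

Both sides agree at every sample point, as the identity guarantees.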
Proposition 8.1.4 (Curl identities).
(a) ∇ × (F + G) = ∇ × F + ∇ × G.
(b) ∇ × (cF) = c∇ × F for all c ∈ R.
(c) ∇ × (f F) = (∇f ) × F + f ∇ × F.
(d) ∇ × (F × G) = F(∇ · G) − (∇ · F)G + (G · ∇)F − (F · ∇)G.
Proof. To prove (c), we use the product rule to compute

∂(f F3)/∂y − ∂(f F2)/∂z = ∂f/∂y F3 + f ∂F3/∂y − ∂f/∂z F2 − f ∂F2/∂z,
∂(f F1)/∂z − ∂(f F3)/∂x = ∂f/∂z F1 + f ∂F1/∂z − ∂f/∂x F3 − f ∂F3/∂x,
∂(f F2)/∂x − ∂(f F1)/∂y = ∂f/∂x F2 + f ∂F2/∂x − ∂f/∂y F1 − f ∂F1/∂y.
Thus

∇ × (f F) = (∂f/∂y F3 + f ∂F3/∂y − ∂f/∂z F2 − f ∂F2/∂z) i
  + (∂f/∂z F1 + f ∂F1/∂z − ∂f/∂x F3 − f ∂F3/∂x) j
  + (∂f/∂x F2 + f ∂F2/∂x − ∂f/∂y F1 − f ∂F1/∂y) k
= (∂f/∂y F3 − ∂f/∂z F2) i + (∂f/∂z F1 − ∂f/∂x F3) j + (∂f/∂x F2 − ∂f/∂y F1) k
  + f (∂F3/∂y − ∂F2/∂z) i + f (∂F1/∂z − ∂F3/∂x) j + f (∂F2/∂x − ∂F1/∂y) k
= (∇f ) × F + f ∇ × F.
Taking f (x, y, z) = c for some constant c ∈ R then gives (b).

Proposition 8.1.5 (Laplacian identities).

(a) ∇²(f + g) = ∇²f + ∇²g.
(b) ∇²(cf ) = c∇²f for all c ∈ R.
(c) ∇²(f g) = f ∇²g + 2(∇f ) · (∇g) + g∇²f .

Proof. To prove (c), we compute

∇²(f g) = ∇ · ∇(f g)
= ∇ · ((∇f )g + f (∇g))    (by Proposition 8.1.2(c))
= ∇ · ((∇f )g) + ∇ · (f (∇g))    (by Proposition 8.1.3(a))
= (∇²f )g + (∇f ) · (∇g) + (∇f ) · (∇g) + f ∇²g,    (by Proposition 8.1.3(c))

as desired.

Proposition 8.1.6 (Degree two identities). Suppose F and f, g, h are C².

(a) ∇ · (∇ × F) = 0.
(b) ∇ × (∇f ) = 0.
(c) ∇ · (f (∇g × ∇h)) = ∇f · (∇g × ∇h).
(d) ∇ · (f ∇g − g∇f ) = f ∇²g − g∇²f .

Proof. We proved (b) in Theorem 5.3.3(b).

Example 8.1.7. Let r(x, y, z) = x²i + yj + z³k, and let ψ(x, y, z) be any C² real-valued function.
Let’s prove that
∇ · (r × ∇ψ) = 0.
Using Proposition 8.1.3(d), we have

∇ · (r × ∇ψ) = (∇ × r) · ∇ψ − r · (∇ × (∇ψ)) = (∇ × r) · ∇ψ,

where we used Proposition 8.1.6(b) to conclude that ∇ × (∇ψ) = 0. Next, we compute

∇ × r = (∂(z³)/∂y − ∂y/∂z) i + (∂(x²)/∂z − ∂(z³)/∂x) j + (∂y/∂x − ∂(x²)/∂y) k = 0.

Thus, ∇ · (r × ∇ψ) = 0, as desired. In fact, the same argument works for any curl-free r. △

8.1.2 Physical interpretations


It is helpful to have an intuitive interpretation of gradient, divergence, and curl. Such
intuition helps us understand some of the theorems and applications to come. We already
saw, in Proposition 1.8.6, that the gradient ∇f (x) at a point x points in the direction in
which f increases most rapidly at x. Then Proposition 1.8.4 tells us that its magnitude is
the rate of change in this direction.

Next we discuss divergence. Fix a point (x0 , y0 , z0 ). Then, for ε > 0, let

Sε = {(x, y, z) : (x − x0)² + (y − y0)² + (z − z0)² = ε²}

be the sphere of radius ε centred at (x0 , y0 , z0 ). Then one can show that, for a C 1 vector
field F : R3 → R3 , we have
ZZ
1
∇ · F(x0 , y0 , z0 ) = lim F · dS,
ε→0 Vol(Sϵ )

where Vol(Sϵ ) = 43 πε3 is the volume of Sε . (For a proof of this fact, see
RR[FRYe, Lem. 4.1.20].)
If we interpret F as the velocity of some fluid, then the flux integral F · dS represents the

total flow across the surface. Thus
∇ · F(x)
is the rate at which the fluid is exiting an infinitesimal sphere, centered at x, per unit time,
per unit volume. So it is a measure of the strength of the “source” at x.
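For the radial field F(x, y, z) = (x, y, z), whose divergence is 3 everywhere, this limit can be observed numerically: the flux out of a small sphere divided by the enclosed volume is already very close to 3 for modest ε. The following midpoint-rule sketch is our own illustration (the resolution n, the centre, and ε are arbitrary choices):

```python
from math import sin, cos, pi

def flux_over_volume(F, center, eps, n=300):
    """Midpoint-rule estimate of (flux of F out of the sphere of radius
    eps centred at `center`) divided by the volume (4/3)*pi*eps^3."""
    a, b, c = center
    dth, dph = 2*pi/n, pi/n
    total = 0.0
    for i in range(n):
        th = (i + 0.5) * dth
        for j in range(n):
            ph = (j + 0.5) * dph
            # outward unit normal on the sphere
            u = (cos(th)*sin(ph), sin(th)*sin(ph), cos(ph))
            p = (a + eps*u[0], b + eps*u[1], c + eps*u[2])
            Fp = F(p)
            # F . n times the area element eps^2 sin(ph) dth dph
            total += (Fp[0]*u[0] + Fp[1]*u[1] + Fp[2]*u[2]) * eps**2 * sin(ph) * dth * dph
    return total / ((4.0/3.0) * pi * eps**3)

F = lambda p: (p[0], p[1], p[2])               # div F = 3 everywhere
ratio = flux_over_volume(F, (0.5, -1.0, 2.0), eps=0.1)
```

For this particular F the ratio is 3 for every ε, since the divergence is constant; for a general field it converges to ∇ · F at the centre as ε → 0.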

Example 8.1.8. Consider the vector field

F(x, y, z) = xi + yj + zk.

If this vector field represents the flow of a fluid, then fluid is being created and pushed out through
any sphere centred at the origin. The divergence is

∇ · F = 3 > 0,

consistent with our intuitive interpretation. △

Example 8.1.9. Consider the vector field

F(x, y, z) = −yi + xj.



In this case, the fluid is going in circles. No fluid actually exits the sphere. The divergence
is
∇ · F = 0,
consistent with our interpretation. △

Example 8.1.10. Consider the vector field


F(x, y, z) = j.

In this case, the fluid is moving uniformly upwards. Fluid enters the sphere at the bottom
and leaves through the top at exactly the same rate. So the net rate at which fluid crosses
the sphere is zero. The divergence is
∇ · F = 0,
consistent with our interpretation. △

Finally, we turn our attention to curl. Fix ε > 0, a point a ∈ R³, and a unit vector n ∈ R³. Let Cε be the circle
centred at a, with radius ε, and in the plane through a orthogonal to n. We orient Cε using
the right-hand rule. If the thumb of your right hand points in the direction of n and you
curl your fingers, they point in the orientation of Cε .

Then one can show that

(∇ × F(a)) · n = lim_{ε→0} (1/(πε²)) ∮_{Cε} F · ds,

where πε² is the area of the disk with boundary Cε. (For a proof of this fact, see [FRYe,
Lem. 4.1.25].) Now suppose that F represents the velocity of a fluid and you place an
infinitesimal paddlewheel at a with its axle in direction n. If the paddlewheel is rotating
at a rate of Ω radians per unit time, then the speed of the paddles is Ωε. (Remember that
radians are defined as arc length divided by radius.) This must equal the average speed of
the fluid in the direction of rotation. Since the circumference of Cε is 2πε, we have

Ωε = (1/(2πε)) ∮_{Cε} F · ds  ⟹  Ω = (1/(2πε²)) ∮_{Cε} F · ds → (1/2) (∇ × F(a)) · n as ε → 0.

Thus, our infinitesimal paddlewheel rotates at (1/2)(∇ × F(a)) · n radians per unit time. In
particular, to maximize the rotation, we should orient the paddlewheel so that n is parallel
to ∇ × F(a).
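This limit, too, is easy to watch numerically. For the rotational field F = (−y, x, 0), whose curl is (0, 0, 2) everywhere, the circulation around a small circle normal to k, divided by the area of the disk, should come out to 2. A midpoint-rule sketch (our own illustration; the resolution and sample point are arbitrary):

```python
from math import sin, cos, pi

def circulation_over_area(F, center, eps, n=1000):
    """Midpoint-rule estimate of circulation/(pi*eps^2) for the circle of
    radius eps centred at `center`, in the plane normal to k."""
    a, b, c = center
    dt = 2*pi/n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        p = (a + eps*cos(t), b + eps*sin(t), c)
        rp = (-eps*sin(t), eps*cos(t), 0.0)    # r'(t)
        Fp = F(p)
        total += (Fp[0]*rp[0] + Fp[1]*rp[1] + Fp[2]*rp[2]) * dt
    return total / (pi * eps**2)

F = lambda p: (-p[1], p[0], 0.0)   # curl F = (0, 0, 2) everywhere
ratio = circulation_over_area(F, (0.2, 0.4, 0.0), eps=0.05)
```

So a paddlewheel placed there with axle k would rotate at Ω = (1/2) · 2 = 1 radian per unit time.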
Let’s return to the vector fields of Examples 8.1.8, 8.1.9, and 8.1.10 and consider our
interpretation of their curls.

Example 8.1.11. Consider the vector field

F(x, y, z) = xi + yj + zk.

If this vector field represents the flow of a fluid, then the fluid is moving parallel to the paddles, so the
paddlewheel should not rotate at all. Indeed, we have

∇ × F(0) = det[ i j k ; ∂/∂x ∂/∂y ∂/∂z ; x y z ] = 0  ⟹  (∇ × F(0)) · k = 0,

as expected. △

Example 8.1.12. Consider the vector field

F(x, y, z) = −yi + xj.

In this case, the fluid is rotating counterclockwise. So the paddlewheel should also rotate
counterclockwise. Hence it should have positive angular velocity. Indeed, we have

∇ × F(0) = det[ i j k ; ∂/∂x ∂/∂y ∂/∂z ; −y x 0 ] = 2k  ⟹  (∇ × F(0)) · k = 2k · k = 2 > 0,

as expected. △

Example 8.1.13. Consider the vector field


F(x, y, z) = j.

In this case, the fluid is pushing the right paddle counterclockwise and the left paddle
clockwise, with the same force. So the paddlewheel should not rotate. Indeed, we have

∇ × F(0) = det[ i j k ; ∂/∂x ∂/∂y ∂/∂z ; 0 1 0 ] = 0  ⟹  (∇ × F(0)) · k = 0,
as expected. △

Exercises.
Exercises from [FRYd, §4.1] : Q1–Q13.

8.2 Divergence theorem


The divergence theorem will relate a volume integral over a solid V to a flux integral over
the surface of V. We begin with some definitions about the types of surfaces we consider.
Definition 8.2.1. (a) A surface is smooth if it has a C¹ parameterization Φ(u, v) such
that all points are regular, i.e. Φu × Φv ≠ 0 everywhere.
(b) A surface is piecewise smooth if it is a finite union of smooth surfaces that meet along
curves.
Example 8.2.2. The ice-cream cone shape

is piecewise smooth, but not smooth, since it has a “sharp curve” where the half-sphere (the
ice cream) meets the cone. △

Recall from Section 1.1 that ∂A denotes the boundary of a set A. Thus, if V is a bounded
solid, then ∂V denotes the surface of V .
Theorem 8.2.3 (Divergence theorem). Suppose that V is a bounded solid with piecewise
smooth boundary ∂V, and let F be a C¹ vector field on V. We give ∂V the outward orientation
(i.e. we choose normal vectors pointing away from V ). Then
∫∫_{∂V} F · dS = ∫∫∫_V ∇ · F dV.

Proof. For a proof of this theorem, see [FRYe, Th. 4.2.2]. Given the physical interpretation
of divergence we discussed in Section 8.1.2, the divergence theorem should not be surprising.
The right-hand integral

∫∫∫_V ∇ · F dV

adds up the divergence over all points of the solid V. For points in the interior of V, flow
out of a small sphere is cancelled by the flow into neighbouring spheres. Thus, what we
are left with is the net flow out of the boundary, which is the left-hand integral

∫∫_{∂V} F · dS.

The divergence theorem is sometimes called Gauss’ theorem or Gauss’ divergence theorem.
Like the fundamental theorem of calculus, the divergence theorem states that the integral of
a derivative of a function (here the “derivative” is the divergence) over some region is equal
to the integral of the function over the boundary of that region. When using the divergence
theorem, we will always choose the outward orientation for the surface of a solid region.

Example 8.2.4. Consider the ball


V = {(x, y, z) : x² + y² + z² ≤ 1}
and the vector field
F(x, y, z) = (x, y, 0).
Let’s compute both sides of the divergence theorem and verify that they are equal. The
boundary ∂V of V is the unit sphere, which we can parameterize by
Φ(θ, φ) = (cos θ sin φ, sin θ sin φ, cos φ).
As we computed in Example 7.7.1, we have
Φθ × Φφ = (−cos θ sin²φ, −sin θ sin²φ, −sin φ cos φ).
Also as we noted in Example 7.7.1, this is the inward orientation, so we will need to add a
negative sign, since the divergence theorem assumes the outward orientation. (Alternatively,
we could interchange θ and φ, since Φθ × Φφ = −Φφ × Φθ .) We have
F(Φ(θ, φ)) · (Φθ × Φφ) = (cos θ sin φ, sin θ sin φ, 0) · (−cos θ sin²φ, −sin θ sin²φ, −sin φ cos φ)
= −cos²θ sin³φ − sin²θ sin³φ
= −sin³φ
= (cos²φ − 1) sin φ.

Thus

∫∫_{∂V} F · dS = −∫∫_Φ F · dS    (because of the reversed orientation)
= −∫₀^π ∫₀^{2π} (cos²φ − 1) sin φ dθ dφ
= −2π ∫₀^π (cos²φ − 1) sin φ dφ
= 2π ∫₁^{−1} (u² − 1) du    (u = cos φ, du = −sin φ dφ)
= 2π [u³/3 − u]^{−1}_{u=1}
= 2π(2/3 + 2/3) = 8π/3.

On the other hand,

∇ · F = ∂x/∂x + ∂y/∂y + ∂(0)/∂z = 2,

and so

∫∫∫_V ∇ · F dV = ∫∫∫_V 2 dV = 2 Vol(V) = 2 · (4π/3) = 8π/3. △
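As a sanity check on the flux computation, the outward integral can also be summed numerically; with the reversed-orientation sign absorbed, the outward integrand is sin³φ, which does not depend on θ, so the θ integral contributes a factor of 2π. (This sketch is our own illustration.)

```python
from math import sin, pi

# Midpoint-rule estimate of the outward flux of F = (x, y, 0) through the
# unit sphere.  With the spherical parameterization of the example, the
# outward integrand is -(cos^2(phi) - 1) sin(phi) = sin^3(phi).
n = 2000
dph = pi / n
flux = 0.0
for j in range(n):
    ph = (j + 0.5) * dph
    flux += sin(ph)**3 * dph
flux *= 2*pi   # theta integral of a theta-independent integrand

expected = 8*pi/3   # value given by the divergence theorem
```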

Example 8.2.5. Consider the solid

V = {(x, y, z) : x, y, z ≥ 0, x + y + z ≤ 1}

and the vector field


F(x, y, z) = (xy, yz, xz).
The boundary ∂V has 4 components:

Φ1 (u, v) = (u, v, 0), u, v ≥ 0, u + v ≤ 1,



Φ2 (u, v) = (u, 0, v), u, v ≥ 0, u + v ≤ 1,


Φ3 (u, v) = (0, u, v), u, v ≥ 0, u + v ≤ 1,
Φ4 (u, v) = (u, v, 1 − u − v), u, v ≥ 0, u + v ≤ 1.

These components have normal vectors:

Φ1u × Φ1v = (1, 0, 0) × (0, 1, 0) = (0, 0, 1), (negative orientation)


Φ2u × Φ2v = (1, 0, 0) × (0, 0, 1) = (0, −1, 0), (positive orientation)
Φ3u × Φ3v = (0, 1, 0) × (0, 0, 1) = (1, 0, 0), (negative orientation)
Φ4u × Φ4v = (1, 0, −1) × (0, 1, −1) = (1, 1, 1). (positive orientation)

Thus

F(Φ1(u, v)) · (Φ1u × Φ1v) = (uv, 0, 0) · (0, 0, 1) = 0,
F(Φ2(u, v)) · (Φ2u × Φ2v) = (0, 0, uv) · (0, −1, 0) = 0,
F(Φ3(u, v)) · (Φ3u × Φ3v) = (0, uv, 0) · (1, 0, 0) = 0,
F(Φ4(u, v)) · (Φ4u × Φ4v) = (uv, v(1 − u − v), u(1 − u − v)) · (1, 1, 1) = u + v − uv − u² − v².


and so

∫∫_{∂V} F · dS = −∫∫_{Φ1} F · dS + ∫∫_{Φ2} F · dS − ∫∫_{Φ3} F · dS + ∫∫_{Φ4} F · dS
= ∫∫_{Φ4} F · dS
= ∫₀¹ ∫₀^{1−u} (u + v − uv − u² − v²) dv du
= ∫₀¹ ( u(1 − u) + (1 − u)²/2 − u(1 − u)²/2 − u²(1 − u) − (1 − u)³/3 ) du
= (1/6) ∫₀¹ (5u³ − 9u² + 3u + 1) du
= (1/6)(5/4 − 3 + 3/2 + 1) = (1/6)(3/4) = 1/8.
On the other hand,

∇ · F = ∂(xy)/∂x + ∂(yz)/∂y + ∂(xz)/∂z = x + y + z,

and so

∫∫∫_V ∇ · F dx dy dz = ∫₀¹ ∫₀^{1−x} ∫₀^{1−x−y} (x + y + z) dz dy dx
= ∫₀¹ ∫₀^{1−x} ( (x + y)(1 − x − y) + (1 − x − y)²/2 ) dy dx
= (1/2) ∫₀¹ ∫₀^{1−x} (1 − x² − 2xy − y²) dy dx
= (1/2) ∫₀¹ ( (1 − x²)(1 − x) − x(1 − x)² − (1 − x)³/3 ) dx
= (1/6) ∫₀¹ (x³ − 3x + 2) dx
= (1/6)(1/4 − 3/2 + 2) = (1/6)(3/4) = 1/8. △
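Since the first three flux integrals vanish, the whole flux side reduces to the double integral over Φ4, which can be sanity-checked with a midpoint-rule sum over the triangle (our own sketch; the resolution n is an arbitrary choice):

```python
# Midpoint-rule check of the Phi_4 integral from the example:
#   int_0^1 int_0^{1-u} (u + v - u*v - u^2 - v^2) dv du = 1/8.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    u = (i + 0.5) * h
    hv = (1.0 - u) / n            # v runs over [0, 1 - u]
    for j in range(n):
        v = (j + 0.5) * hv
        total += (u + v - u*v - u*u - v*v) * hv * h
```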
The divergence theorem can also be used to simplify some surface integrals, as the following
example illustrates.

Example 8.2.6. Consider the cone

C = {(x, y, z) : x² + y² = (1 − z)²}

oriented upwards and the vector field

F(x, y, z) = (x², −xy, 1 − xz).

Note that
∇ · F = ∂(x²)/∂x + ∂(−xy)/∂y + ∂(1 − xz)/∂z = 2x − x − x = 0.
Therefore, by the divergence theorem,
∫∫_S F · dS = 0

for any piecewise smooth closed surface S (i.e. any surface S that is the boundary of a
bounded solid). Let’s close off the cone by adding the bottom

B = {(x, y, 0) : x² + y² ≤ 1}

oriented downwards. Then C ∪ B is a closed surface, and so


∫∫_C F · dS + ∫∫_B F · dS = 0  ⟹  ∫∫_C F · dS = −∫∫_B F · dS.

So we have turned our integral over C into an integral over B, which is easier to compute!
We parameterize B by

Φ(u, v) = (u, v, 0), (u, v) ∈ D := {(x, y) : x² + y² ≤ 1}.

We have
Φu × Φv = (0, 0, 1),
which is opposite to the orientation of B. Since

F(Φ(u, v)) · (Φu × Φv) = (u², −uv, 1) · (0, 0, 1) = 1,

we have
∫∫_C F · dS = −∫∫_B F · dS = ∫∫_Φ F · dS = ∫∫_D 1 du dv = Area(D) = π. △
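The value π can also be obtained by integrating over the cone directly, without the divergence-theorem shortcut; this is a useful cross-check. The sketch below is our own: it assumes, as the figure suggests, that the cone runs from the unit circle at z = 0 up to the apex at z = 1, and uses the parameterization Φ(u, t) = (u cos t, u sin t, 1 − u), whose normal Φu × Φt = (u cos t, u sin t, u) points upward.

```python
from math import sin, cos, pi

# Direct midpoint-rule computation of the flux of F = (x^2, -xy, 1 - xz)
# through the cone, using the (assumed) parameterization
#   Phi(u, t) = (u cos t, u sin t, 1 - u),  u in [0, 1], t in [0, 2*pi].
n = 400
du, dt = 1.0/n, 2*pi/n
flux = 0.0
for i in range(n):
    u = (i + 0.5) * du
    for j in range(n):
        t = (j + 0.5) * dt
        x, y, z = u*cos(t), u*sin(t), 1.0 - u
        F = (x*x, -x*y, 1.0 - x*z)
        normal = (u*cos(t), u*sin(t), u)   # Phi_u x Phi_t, pointing upward
        flux += (F[0]*normal[0] + F[1]*normal[1] + F[2]*normal[2]) * du * dt
```

That the direct computation agrees with the disk computation is exactly what the divergence theorem promised.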

Warning 8.2.7. In the divergence theorem, it is crucial that F is C¹ at every point of V. For
example, the divergence theorem fails for

F = r/∥r∥³,  V = {(x, y, z) : x² + y² + z² ≤ 1},

where r(x, y, z) = (x, y, z). Note that F is not defined at the origin, which is contained in
V. See [FRYe, Example 4.2.7] for details.

Exercises.
Exercises from [FRYd, §4.2] : Q1–Q34.

8.3 Green’s theorem


Our next integral theorem is Green’s theorem, which relates the integral of a derivative of
a vector-valued function over a region in the plane to an integral of the function over the
boundary curve. We begin by discussing a few properties of curves, some of which we have
seen before. A good reference for the material in this section is [FRYe, §4.3].

Definition 8.3.1. Let r(t), a ≤ t ≤ b, be a parameterization of a curve C.

(a) The curve C is closed if r(a) = r(b).


(b) The curve C is simple if it does not cross itself. More precisely, C is simple if, for any
a ≤ t1 , t2 ≤ b satisfying r(t1 ) = r(t2 ), we have

t1 = t2 or (t1 = a, t2 = b) or (t1 = b, t2 = a).



(c) The curve C is piecewise smooth if it has a parameterization r(t) that


(i) is continuous and which
(ii) is differentiable except possibly at finitely many points with
(iii) the derivative being continuous and nonzero except possibly at finitely many
points.
Example 8.3.2. Here are some examples of curves:

a simple curve, a simple closed curve, a curve that is not simple, and a piecewise smooth curve.

Theorem 8.3.3 (Green’s theorem). Suppose D is a bounded region in the xy-plane and that
the boundary, C, of D consists of a finite number of piecewise smooth, simple closed curves,
each oriented so that if you walk along in the direction of the orientation, then D is on your
left.

Furthermore, suppose that F = (F1, F2) is a C¹ vector field on D. Then

∮_C F · ds = ∫∫_D (∂F2/∂x − ∂F1/∂y) dx dy.

Remarks 8.3.4. (a) Remember that ∮_C is just an alternate notation for ∫_C that is sometimes
used when C is a closed curve (or a union of closed curves).

(b) Recall that ∂F2/∂x − ∂F1/∂y is the scalar curl of F. It is the z-component of the curl of F if
we extend F to be a vector field in three dimensions with zero z-component. See (5.3).

(c) Sometimes we write

∮_C F1(x, y) dx + F2(x, y) dy  for  ∮_C F · ds.

Proof of Theorem 8.3.3. We will use the divergence theorem (Theorem 8.2.3) to prove Green’s
theorem. Define

V = {(x, y, z) : (x, y) ∈ D, 0 ≤ z ≤ 1},



G(x, y, z) = F2 (x, y)i − F1 (x, y)j.


By the divergence theorem, we have


∫∫_{∂V} G · dS = ∫∫∫_V ∇ · G dV.    (8.1)

The right-hand side of (8.1) is


∫∫∫_V ∇ · G dV = ∫∫∫_V ( ∂F2/∂x (x, y) − ∂F1/∂y (x, y) ) dx dy dz
= ∫∫_D ( ∂F2/∂x (x, y) − ∂F1/∂y (x, y) ) dx dy,

which is the right-hand side of Green’s theorem.


Now we compute the left-hand side of (8.1). The boundary, ∂V , of V is the union of
• the flat top, with normal k,
• the flat bottom, with normal −k,
• the side, traced out by the curve C as one slides it from z = 0 to z = 1.
Since G · k = 0, we have
∫∫_{∂V} G · dS = ∫∫_{top} G · k dS + ∫∫_{bottom} G · (−k) dS + ∫∫_{side} G · dS = ∫∫_{side} G · dS.

To compute the integral over the side, we parameterize it. Suppose that

r(t) = x(t)i + y(t)j, a ≤ t ≤ b,

is a parameterization of C. Then

Φ(t, z) = r(t) + zk = x(t)i + y(t)j + zk, a ≤ t ≤ b, 0 ≤ z ≤ 1,

is a parameterization of the side. We have

Φt × Φz = (x′ (t), y ′ (t), 0) × (0, 0, 1) = r′ (t) × k = y ′ (t)i − x′ (t)j.



Using the right-hand rule, we see that this gives the outward normal.



Therefore,
∫∫_{∂V} G · dS = ∫∫_{side} G · dS
= ∫_a^b ∫₀¹ G(Φ(t, z)) · (Φt × Φz) dz dt
= ∫_a^b ∫₀¹ (F2(x, y), −F1(x, y), 0) · (y′(t), −x′(t), 0) dz dt
= ∫_a^b ∫₀¹ (F2(x, y)y′(t) + F1(x, y)x′(t)) dz dt
= ∫_a^b (F2(x, y)y′(t) + F1(x, y)x′(t)) dt
= ∫_a^b F(r(t)) · r′(t) dt
= ∮_C F · ds.

This is precisely the left-hand side of Green’s theorem.

Example 8.3.5. Consider the vector field

F(x, y) = (−y, x)

and the unit disk

D = {(x, y) : x² + y² ≤ 1}.

The boundary is the unit circle, which we can parameterize by

r(θ) = (cos θ, sin θ), θ ∈ [0, 2π].



Note that this has the correct (that is, counterclockwise) orientation. We have

r′ (θ) = (− sin θ, cos θ).

Thus the left-hand side of Green’s theorem is


∮_C F · ds = ∫₀^{2π} F(r(θ)) · r′(θ) dθ = ∫₀^{2π} (−sin θ, cos θ) · (−sin θ, cos θ) dθ
= ∫₀^{2π} (sin²θ + cos²θ) dθ = ∫₀^{2π} dθ = 2π.

On the other hand, the right-hand side of Green’s theorem is


∫∫_D (∂F2/∂x − ∂F1/∂y) dx dy = ∫∫_D 2 dx dy = 2 Area(D) = 2π.

So we have verified Green’s theorem. △

Example 8.3.6. Let D be the triangle with vertices at (0, 0), (0, 1), and (1, 0). Let’s verify
Green’s theorem on D for
F(x, y) = (x, y).
We parameterize the boundary C of D by

r1(t) = (t, 0),      t ∈ [0, 1],  r1′(t) = (1, 0),
r2(t) = (0, t),      t ∈ [0, 1],  r2′(t) = (0, 1),
r3(t) = (t, 1 − t),  t ∈ [0, 1],  r3′(t) = (1, −1).

Note that the pieces r2 (t) and r3 (t) are oriented in the wrong direction, so we will need to
add a negative sign in front of the corresponding terms. So the left-hand side of Green’s
theorem is
∮_C F · ds = ∫_{r1} F · ds − ∫_{r2} F · ds − ∫_{r3} F · ds
= ∫₀¹ (t, 0) · (1, 0) dt − ∫₀¹ (0, t) · (0, 1) dt − ∫₀¹ (t, 1 − t) · (1, −1) dt
= ∫₀¹ t dt − ∫₀¹ t dt − ∫₀¹ (t − 1 + t) dt
= −[t² − t]¹_{t=0} = 0.


On the other hand, the right-hand side of Green’s theorem is


∫∫_D (∂F2/∂x − ∂F1/∂y) dx dy = ∫∫_D 0 dx dy = 0.

So we have verified Green’s theorem. △



Example 8.3.7. Consider


F(x, y) = (−y e^{x²+y²}, x e^{x²+y²}),
D = {(x, y) : x² + y² ∈ [1, 4]}.

The boundary ∂D consists of two circles:


r1 (θ) = (cos θ, sin θ), θ ∈ [0, 2π], (wrong orientation)
r2 (θ) = (2 cos θ, 2 sin θ), θ ∈ [0, 2π]. (correct orientation)
We have

r1′(θ) = (−sin θ, cos θ),  r2′(θ) = (−2 sin θ, 2 cos θ),

and so

F(r1(θ)) · r1′(θ) = (−e sin θ, e cos θ) · (−sin θ, cos θ) = e,
F(r2(θ)) · r2′(θ) = (−2e⁴ sin θ, 2e⁴ cos θ) · (−2 sin θ, 2 cos θ) = 4e⁴.
Therefore, the left-hand side of Green’s theorem is

∮_{∂D} F · ds = −∫_{r1} F · ds + ∫_{r2} F · ds = −∫₀^{2π} e dθ + ∫₀^{2π} 4e⁴ dθ = 8πe⁴ − 2πe.

Now let’s compute the right-hand side. The scalar curl of F is


∂F2/∂x − ∂F1/∂y = ∂(x e^{x²+y²})/∂x − ∂(−y e^{x²+y²})/∂y
= e^{x²+y²} + 2x² e^{x²+y²} + e^{x²+y²} + 2y² e^{x²+y²} = 2(1 + x² + y²) e^{x²+y²}.
Using polar coordinates, we have
∫∫_D (∂F2/∂x − ∂F1/∂y) dx dy = ∫₀^{2π} ∫₁² 2(1 + r²) e^{r²} r dr dθ
= ∫₀^{2π} ∫₁⁴ (1 + u) e^u du dθ    (u = r², du = 2r dr)
= ∫₀^{2π} [u e^u]⁴_{u=1} dθ
= ∫₀^{2π} (4e⁴ − e) dθ = 2π(4e⁴ − e) = 8πe⁴ − 2πe.
So we have verified Green’s theorem. △
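A numerical evaluation of the polar-coordinate integral is a quick way to catch algebra slips in computations like this one. The following midpoint-rule sketch is our own (the radial resolution is an arbitrary choice):

```python
from math import exp, pi

# Midpoint-rule evaluation of the right-hand side of Green's theorem in
# polar coordinates: the scalar curl 2(1 + r^2) e^{r^2} times the
# Jacobian r, over 1 <= r <= 2; the theta integral contributes 2*pi.
n = 4000
dr = 1.0 / n
total = 0.0
for i in range(n):
    r = 1.0 + (i + 0.5) * dr
    total += 2.0 * (1.0 + r*r) * exp(r*r) * r * dr
total *= 2*pi

expected = 8*pi*exp(4) - 2*pi*exp(1)   # = 8*pi*e^4 - 2*pi*e
```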

Green’s theorem can be used to compute the areas of regions bounded by curves, as the
next example illustrates.

Example 8.3.8. Let R be the region bounded by the curve

r(t) = ((2 + cos 4t) cos t, (2 + cos 4t) sin t),  0 ≤ t ≤ 2π.

If we choose a vector field F with scalar curl

∂F2/∂x − ∂F1/∂y = 1,
then, by Green’s theorem, we have
Area(R) = ∫∫_R 1 dx dy = ∫∫_R (∂F2/∂x − ∂F1/∂y) dx dy = ∮_r F · ds.

Let’s choose F(x, y) = (0, x), which does satisfy ∂F2/∂x − ∂F1/∂y = 1. So

Area(R) = ∫₀^{2π} F(r(t)) · r′(t) dt
= ∫₀^{2π} (2 + cos 4t)(cos t)(−4 sin 4t sin t + (2 + cos 4t) cos t) dt
= 9π/2,

where the last integral was computed using techniques from single-variable calculus. (We’ve
omitted the details of this last computation.) △
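The omitted single-variable integral is easy to confirm numerically: the integrand x(t)y′(t) is a trigonometric polynomial, so a midpoint sum over one full period is essentially exact. (This sketch is our own illustration.)

```python
from math import sin, cos, pi

# Midpoint-rule evaluation of Area(R) = \oint x dy for the curve
#   r(t) = ((2 + cos 4t) cos t, (2 + cos 4t) sin t),  0 <= t <= 2*pi,
# using the hand-computed derivative
#   y'(t) = -4 sin(4t) sin t + (2 + cos 4t) cos t.
n = 20000
dt = 2*pi / n
area = 0.0
for i in range(n):
    t = (i + 0.5) * dt
    x = (2 + cos(4*t)) * cos(t)
    yprime = -4*sin(4*t)*sin(t) + (2 + cos(4*t))*cos(t)
    area += x * yprime * dt

expected = 9*pi/2
```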

Warning 8.3.9. In Green’s theorem (Theorem 8.3.3), it is crucial that F is C¹ on all of D.
For example, with

F(x, y) = (−y/(x² + y²), x/(x² + y²)),  D = {(x, y) : x² + y² ≤ 1},

the equality in Green’s theorem does not hold. Note that F is not defined at the origin,
which is contained in D. For details, see [FRYe, Examples 4.3.7, 4.3.8].

Exercises.
Exercises from [FRYd, §4.3] : Q1–Q25.

8.4 Stokes’ theorem


Our final variant of the fundamental theorem of calculus is Stokes’ theorem, which is a three-
dimensional version of Green’s theorem. It relates an integral over a surface in R³ to an
integral over the curve bounding the surface.
Before stating Stokes’ theorem, we introduce some terminology and conventions. Suppose S is a
piecewise smooth oriented surface in R³. Remember that this means we have “chosen a side”
of S. More precisely, we have chosen a unit normal vector at each point of S in a continuous
way. In Definition 1.1.6, we defined the boundary of S to be the set of all points such that
any ball around the point contains points of S and points not in S. By this definition, the
boundary of most surfaces is the entire surface itself, which is not what we want for Stokes’
theorem. So here we say that a point x ∈ S is a boundary point if, when you zoom in close,
the surface near x looks like a half-plane with x on its edge.
(This definition can be made more precise with additional work, but boundaries will be clear
in all of our examples.) Then the boundary, ∂S, of S is the set of all its boundary points.
Although we have now used the notation ∂S to denote two different types of boundary, the
meaning should be clear from the context. For instance, whenever we speak of the boundary
of a surface in R³, we are using the new definition just introduced here.

Example 8.4.1. For the cone


S = {(x, y, z) : x² + y² = z², z ∈ [0, 1]}

we have

∂S = {(x, y, 1) : x² + y² = 1}. △

Example 8.4.2. For the cylinder


S = {(x, y, z) : x² + y² = 1, z ∈ [0, 1]}

we have

∂S = {(x, y, z) : x² + y² = 1, z ∈ {0, 1}}. △

Example 8.4.3. For the sphere

S = {(x, y, z) : x² + y² + z² = 1}

we have ∂S = ∅. △

We will be most interested in surfaces whose boundaries are finite unions of piecewise
smooth, simple, closed curves. We want a convention for orienting these boundaries. For
Green’s theorem (Theorem 8.3.3), we oriented the boundary such that, when you walk along
the curve following the orientation, the surface is on your left. We essentially want the same
convention for surfaces in R3 , except that now we need to specify where your feet and head
are as you walk!

Convention 8.4.4. If S is a piecewise smooth oriented surface in R³ whose boundary ∂S is a


finite union of piecewise smooth, simple, closed curves, then we orient the boundary so that

• if you walk along ∂S in the direction of the orientation of ∂S,

• with the vector from your feet to your head having the same direction as the chosen
normal n for S (giving its orientation),

• then S is on your left-hand side.




Theorem 8.4.5 (Stokes’ theorem). Suppose that S is a bounded piecewise smooth oriented
surface whose boundary ∂S is a finite union of piecewise smooth, simple, closed curves, with
orientation chosen according to Convention 8.4.4. Furthermore, suppose that F is a C¹ vector
field on S. Then

∮_{∂S} F · ds = ∫∫_S (∇ × F) · dS.

Proof. For a proof of Stokes’ theorem, see [FRYe, Th. 4.4.1]. The proof involves reducing to
Green’s theorem; see Remark 8.4.6.

Remark 8.4.6. Suppose that F has zero z-component, so that F = (F1 , F2 , 0), and that the
surface S lies in the xy-plane, so that we have
S = {(x, y, 0) : (x, y) ∈ D}
for some D ⊆ R². Then, using (5.3), we have

∫∫_S (∇ × F) · dS = ∫∫_S (∇ × F) · k dS = ∫∫_D (∂F2/∂x − ∂F1/∂y) dx dy.

Therefore, in this situation, Stokes’ theorem reduces to Green’s theorem.

Example 8.4.7. Let’s verify Stokes’ theorem for


F(x, y, z) = (xy, xy, 0),  S = {(x, y, z) : x² + y² + z² = 1}.

Since ∂S = ∅, the left-hand side of Stokes’ theorem is ∮_{∂S} F · ds = 0. So we need to check

that the right-hand side is also zero. We parameterize S, which is the unit sphere, using
spherical coordinates:
Φ(θ, φ) = (sin φ cos θ, sin φ sin θ, cos φ), θ ∈ [0, 2π], φ ∈ [0, π].
As we computed in Example 7.7.1, we have
Φθ × Φφ = (−cos θ sin²φ, −sin θ sin²φ, −sin φ cos φ).
We have
 
curl F = (∂(0)/∂y − ∂(xy)/∂z, ∂(xy)/∂z − ∂(0)/∂x, ∂(xy)/∂x − ∂(xy)/∂y) = (0, 0, y − x),

and so
curl F(Φ(θ, φ)) = (0, 0, sin θ sin φ − cos θ sin φ).
Therefore
∫∫_S curl F · dS = ∫₀^{2π} ∫₀^π curl F(Φ(θ, φ)) · (Φθ × Φφ) dφ dθ
= ∫₀^{2π} ∫₀^π sin²φ cos φ (cos θ − sin θ) dφ dθ
= ∫₀^π sin²φ cos φ [sin θ + cos θ]^{2π}_{θ=0} dφ
= 0,
as expected. △

Example 8.4.8. Let’s verify Stokes’ theorem for


F(x, y, z) = (−e^z y, e^z x, 0),  S = {(x, y, z) : x² + y² = 1, z ∈ [0, 1]}.

We parameterize the cylinder S by


Φ(θ, z) = (cos θ, sin θ, z), θ ∈ [0, 2π], z ∈ [0, 1].
We have
 
∇ × F = (∂(0)/∂y − ∂(e^z x)/∂z, ∂(−e^z y)/∂z − ∂(0)/∂x, ∂(e^z x)/∂x − ∂(−e^z y)/∂y)
= (−e^z x, −e^z y, 2e^z)

and

Φθ × Φz = (−sin θ, cos θ, 0) × (0, 0, 1) = (cos θ, sin θ, 0),

which is the outward orientation of the cylinder. Next we compute

(∇ × F)(Φ(θ, z)) · (Φθ × Φz) = (−e^z cos θ, −e^z sin θ, 2e^z) · (cos θ, sin θ, 0)
= −e^z cos²θ − e^z sin²θ = −e^z.
Therefore, the right-hand side of Stokes’ theorem is
∫∫_S (∇ × F) · dS = ∫₀¹ ∫₀^{2π} (−e^z) dθ dz = −2π [e^z]¹_{z=0} = 2π(1 − e).

To compute the left-hand side, we parameterize the two pieces of the boundary of the
cylinder:

r1(t) = (cos t, sin t, 1), t ∈ [0, 2π],  (negative orientation)
r2(t) = (cos t, sin t, 0), t ∈ [0, 2π].  (positive orientation)

Then the left-hand side of Stokes’ theorem is


∮_{∂S} F · ds = −∫_{r1} F · ds + ∫_{r2} F · ds
= −∫₀^{2π} F(r1(t)) · r1′(t) dt + ∫₀^{2π} F(r2(t)) · r2′(t) dt
= −∫₀^{2π} (−e sin t, e cos t, 0) · (−sin t, cos t, 0) dt
  + ∫₀^{2π} (−sin t, cos t, 0) · (−sin t, cos t, 0) dt
= −∫₀^{2π} e dt + ∫₀^{2π} dt
= −2πe + 2π = 2π(1 − e),

as expected. △
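Both sides of this verification can themselves be checked numerically. Here is a midpoint-rule evaluation of the surface-integral side (our own sketch), using the fact computed above that the integrand reduces to −e^z and is independent of θ:

```python
from math import exp, pi

# Midpoint-rule evaluation of the surface-integral side of Stokes' theorem
# for the cylinder: the integrand (curl F) . (Phi_theta x Phi_z) reduces
# to -e^z, integrated over theta in [0, 2*pi] and z in [0, 1].
n = 2000
dz = 1.0 / n
total = 0.0
for j in range(n):
    z = (j + 0.5) * dz
    total += -exp(z) * dz
total *= 2 * pi          # the integrand does not depend on theta

expected = 2 * pi * (1 - exp(1))   # exact value 2*pi*(1 - e)
```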

We conclude with some connections between Stokes’ theorem and other concepts we have
seen in this course. Suppose S1 and S2 are two oriented surfaces having the same boundary
curve C with orientation chosen according to Convention 8.4.4. (Note that we are assuming
that Convention 8.4.4 applied to both S1 and S2 yields the same orientation on C.) Then,
by Stokes’ theorem,

∫∫_{S1} (∇ × F) · dS = ∮_C F · ds = ∫∫_{S2} (∇ × F) · dS.

In other words, the integral of curl F is the same over S1 as it is over S2.


For example, if C is the unit circle

C = {(x, y, 0) : x² + y² = 1}

oriented counterclockwise when viewed from above, then both

S1 = {(x, y, z) : x² + y² ≤ 1, z = 0} and
S2 = {(x, y, z) : x² + y² + z² = 1, z ≥ 0},

with upward pointing orientation, have boundary C.


Then, by Stokes’ theorem, we have


∫∫_{S1} (∇ × F) · dS = ∫∫_{S2} (∇ × F) · dS.    (8.2)

In fact, there is another way to see why (8.2) holds. Let

V = {(x, y, z) : x² + y² + z² ≤ 1, z ≥ 0}

be the solid between S1 and S2 . Then

∂V = S1 ∪ S2 .

Note that the outward-pointing normal to ∂V is n on S2 and −n on S1 . Thus we have


∫∫_{S2} (∇ × F) · dS − ∫∫_{S1} (∇ × F) · dS = ∫∫_{∂V} (∇ × F) · dS
= ∫∫∫_V ∇ · (∇ × F) dV    (by the divergence theorem)
= 0,

where, in the last equality, we used the vector identity ∇ · (∇ × F) = 0 from
Proposition 8.1.6(a).

Remark 8.4.9. If F is a C¹ vector field on all of R³ and satisfies ∇ × F = 0 everywhere, then
Stokes’ theorem implies that ∮_C F · ds = 0 for all closed curves C. Thus, by Theorem 6.5.1,
F is conservative. This gives an alternative proof of Theorem 6.5.2.

Exercises.
Exercises from [FRYd, §4.4] : Q1–Q28.
Bibliography

[FRYa] J. Feldman, A. Rechnitzer, and E. Yeager. CLP-1 Differential Calculus textbook. URL: https://secure.math.ubc.ca/~CLP/CLP1/.

[FRYb] J. Feldman, A. Rechnitzer, and E. Yeager. CLP-3 Multivariable Calculus problem book. URL: https://secure.math.ubc.ca/~CLP/CLP3/.

[FRYc] J. Feldman, A. Rechnitzer, and E. Yeager. CLP-3 Multivariable Calculus textbook. URL: https://secure.math.ubc.ca/~CLP/CLP3/.

[FRYd] J. Feldman, A. Rechnitzer, and E. Yeager. CLP-4 Vector Calculus problem book. URL: https://secure.math.ubc.ca/~CLP/CLP4/.

[FRYe] J. Feldman, A. Rechnitzer, and E. Yeager. CLP-4 Vector Calculus textbook. URL: https://secure.math.ubc.ca/~CLP/CLP4/.

[Gou59] É. Goursat. A course in mathematical analysis: Vol. 1: Derivatives and differentials, definite integrals, expansion in series, applications to geometry. Vol. 2, Part 1: Functions of a complex variable. Vol. 2, Part 2: Differential equations. Dover Publications, Inc., New York, 1959. Translated by E. R. Hedrick (Vol. 1), and E. R. Hedrick and O. Dunkel (Vol. 2).

[MT11] J. E. Marsden and A. Tromba. Vector Calculus. WH Freeman, 6th edition, 2011.

[Nic] W. K. Nicholson. Linear Algebra with Applications. URL: https://lyryx.com/linear-algebra-applications/.
