Calculus 2
25 Lectures for
Undergraduate Calculus II
February 19, 2024
KAIST Department of Mathematical Sciences
To the ones who question
Foreword
On the other hand, lecture books are made to accompany a semester course. They are
not designed for a general readership, but for students who are studying the entire
course together. Therefore, the big difference from academic books is the way they
lead students through the whole course. In particular, effective communication
is important, as if the lecturer and students were talking to each other in class.
Appropriate motivating questions that help students think for themselves can increase
the effectiveness of learning. Sometimes it is more effective to let students find
answers by themselves, with appropriate hints and motivation, than to give specific
explanations. In this way, lectures become not only a transfer of knowledge
but also a transfer of wisdom and a stimulus for the creative mind. A lecture book
may be written to accompany a specific academic book, or it may include all necessary
content so that it is self-contained.
However, most books used in lectures combine aspects of academic books and
lecture books. Some of them are actually inadequate for students who follow the
lecture.
This “25 Lectures series” is designed so that students and instructors communicate
with each other effectively, which keeps the character of a lecture book. Every
lecture starts with questions and problems that motivate students toward the main
concepts of the lecture. In addition, considering that a semester course at a large
number of universities consists of 25 lectures of 75 minutes each, the book is
composed of 25 lectures. With two lectures per week, the course can be completed in
14 weeks. This lecture book is also designed to provide all necessary content, so
that it can be used for lectures independently, without supplementary textbooks.
Questions are the driving force for progress in study and the starting point
for creative thinking. I hope that this lecture book will give students an opportunity
to think about and answer questions and problems, and to ask their own questions.
Contents
2 Polar coordinates in $\mathbb{R}^2$
2.1 Variable change with polar coordinates
2.2 Motion in polar coordinates
2.3 Ellipses in polar coordinates
2.4 Curves in polar coordinates
8 Full Differentials
8.1 Full Differentials
8.2 The Chain Rule; Differential of Compositions
8.3 Graph of Differentials
9 Line Integral
9.1 Line Integral
9.2 Expansion rate of a curve
9.3 Functions on parametrized curves
9.4 Directional derivative and Chain rule
$x \in \mathbb{R}$  Real numbers are denoted by regular characters.
$(a, b) \subset \mathbb{R}$  Open interval for $a, b \in \mathbb{R}$.
$[a, b] \subset \mathbb{R}$  Closed interval for $a, b \in \mathbb{R}$.
$\mathbf{x} \in \mathbb{R}^n$  Column vectors are denoted by bold characters. One unique feature of this lecture note is that we distinguish row vectors and column vectors.
$\mathbf{x}^t$  Row vectors are denoted with the transpose notation.
$N(c, r)$  The neighborhood centered at $c$ with radius $r$.
$\mathbb{R}^{m \times n}$  $m \times n$ matrices, with $m$ rows and $n$ columns.
$a_{ij}$  The entry of a matrix placed at the $i$-th row and the $j$-th column.
$\det(A)$  The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$.
$\operatorname{trace}(A)$  The trace of a square matrix $A \in \mathbb{R}^{n \times n}$.
$\nabla f$  For $f : \mathbb{R}^n \to \mathbb{R}$, the gradient vector $\nabla f$ is a row vector ($1 \times n$ matrix).
$\nabla \mathbf{f}(\mathbf{c})$  For $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, the gradient of the vector field $\mathbf{f}$ (or the differential of $\mathbf{f}$) is denoted by $\nabla \mathbf{f}(\mathbf{c})$, which is an $m \times n$ matrix.
$\nabla$  The notation $\nabla = (D_1, \cdots, D_n)$ is a row-vector producing operator, where the $D_i$ are partial derivatives.
$U^0$  The interior of a set $U \subset \mathbb{R}^n$.
Part I
Kepler and Newton’s Laws of Motion
The astronomer Johannes Kepler analyzed the observations of the Danish astronomer
Tycho Brahe and, in the early 17th century, explained the orbits of the planets
around the sun with three laws published between 1609 and 1619. These laws replaced
the circular orbits in the theory of Nicolaus Copernicus with elliptical orbits and
explained how the speed of planets changes. The three laws are as follows:
1. The orbit of a planet is an ellipse with the sun at one of the two foci.
2. The line segment connecting the planet and the sun sweeps equal areas in equal
time intervals.
3. The square of the period of the planet’s orbit is proportional to the cube of the
semi-major axis length of the orbit.
Isaac Newton, in 1687, demonstrated that Kepler’s laws result from his laws of
motion and universal gravitation. Newton’s laws of motion consist of three parts:
1. Law of Inertia: An object at rest stays at rest, and an object in motion stays in
motion with the same velocity, unless acted upon by an external force.
2. Force Law: Force is the product of mass and acceleration ($\mathbf{F} = m\mathbf{a}$).
3. Action-Reaction Law: For every action, there is an equal and opposite reaction.
Newton’s law of gravitation states that the gravitational force between two
objects is inversely proportional to the square of the distance between them and
directly proportional to the product of their masses. If the masses of the two
objects are $m_1$ and $m_2$, and their positions are $\mathbf{x}_1$ and $\mathbf{x}_2$,
then the gravitational force acting on object $m_1$ is given by:
\[ \mathbf{F}_{m_1} = -G \frac{m_1 m_2}{r^2} \frac{\mathbf{r}}{r}. \]
In the above equation, $G$ is the gravitational constant ($6.674 \times 10^{-11}\ \mathrm{m^3/(kg\,s^2)}$), and
$\mathbf{r}$ and $r$ are defined as follows:
\[ \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad r = \|\mathbf{r}\|. \]
The force acting on object $m_2$ is simply the opposite:
\[ \mathbf{F}_{m_2} = G \frac{m_1 m_2}{r^2} \frac{\mathbf{r}}{r} = -\mathbf{F}_{m_1}. \]
Thus, the pair of forces satisfies the action-reaction law.
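The force law above can be evaluated numerically. The following Python sketch computes the forces on both objects from the formula; the function name `gravitational_force` and the sample masses and positions are illustrative assumptions, not part of the lecture.

```python
import math

G = 6.674e-11  # gravitational constant in m^3 / (kg s^2)

def gravitational_force(m1, m2, x1, x2):
    """Gravitational force on the object of mass m1 at x1, exerted by m2 at x2."""
    r_vec = [a - b for a, b in zip(x1, x2)]      # r = x1 - x2
    r = math.sqrt(sum(c * c for c in r_vec))     # r = ||r||
    scale = -G * m1 * m2 / r**3                  # -G m1 m2 / r^2, times 1/r for the unit vector
    return [scale * c for c in r_vec]

# Illustrative values (assumed): two 1000 kg objects, 10 m apart on the x-axis.
x1, x2 = (10.0, 0.0, 0.0), (0.0, 0.0, 0.0)
force_on_1 = gravitational_force(1000.0, 1000.0, x1, x2)
force_on_2 = gravitational_force(1000.0, 1000.0, x2, x1)

print(force_on_1)  # x-component is negative: object 1 is pulled toward object 2
print(force_on_2)  # same magnitude, opposite direction
```

Swapping the roles of the two objects flips the sign of $\mathbf{r}$, which is how the action-reaction law shows up in the computation.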
In this part, the first goal is to explain Kepler’s laws using Newton’s laws, and in
the process, the second goal is to learn various useful mathematics.
Lecture 1
Rectangular coordinate system and curves in $\mathbb{R}^3$
Space: the final frontier. These are the voyages of the Starship Enterprise. Its
five-year mission: to explore strange new worlds. To seek out new life and new
civilizations. To boldly go where no man has gone before! (From Star Trek)
Now, let’s take the perspective of Newton and try to explain the motion of celestial
bodies using mathematics. To represent the motion of celestial bodies in space with
equations, we first need to establish a coordinate system in space. However, this task
is not as simple as it might seem. While the Earth has served as a reference for us
living on it, there is no such absolute reference in space. The reference frame needs
to be chosen by us.
In the movie Star Trek, the spacecraft Enterprise often moves at high speed
and then comes to a stop. However, there is no meaningful way to distinguish a
spacecraft moving at a constant velocity from a stationary one. Therefore, stating
that an object is moving quickly or is at rest has no absolute meaning. If we want
to reach a certain planet, it is more accurate to say that we match the velocity of
the spacecraft to the velocity of that planet. Velocity is relative, and kinetic
energy is also relative. Only acceleration has an absolute meaning.
Problem 1.1. In the explanation above, positively oriented coordinate systems and
negatively oriented coordinate systems are distinguished based on the choice of k.
What are these cases?
Solution 1.1 (i) Right-hand rule: Curl the fingers of your right hand from $\mathbf{i}$
toward $\mathbf{j}$ in the plane containing the origin, $\mathbf{i}$, and $\mathbf{j}$. The thumb
then points to one side of this plane. If $\mathbf{k}$ points in the direction of the thumb,
the coordinate system is positively oriented. Otherwise, it is negatively oriented.
(ii) Cross product test: If $\mathbf{k} = \mathbf{i} \times \mathbf{j}$, the coordinate system is positively oriented. (In
any case, the cross product itself is explained using the right-hand rule.) □
Remark 1.1. In this lecture, we consider 3-dimensional space, but for spaces with
dimensions two or higher, there exist both positive and negative coordinate systems,
and they can be distinguished. However, two positively oriented coordinate systems
cannot be distinguished from each other. They coincide upon rotation. The choice
Problem 1.2. Two particles move with different velocities without acceleration.
Prove that there exists a plane containing the motion of these two particles in space.
Solution 1.2 As discussed, let’s take one of the particles as the origin. In this
coordinate system the first particle stays at the origin, and the second particle
moves along a straight line, since its relative velocity is constant and nonzero.
A point and a line always lie in a common plane: if the line passes through the
origin, there are many such planes, and if it does not pass through the origin,
there is a unique plane. □
After looking at the solution to the above problem, you may feel a bit deceived,
but I want to emphasize that this is not a trick. Of course, within a coordinate
system with a third party as the origin, there is in general no such plane.
Problem 1.2 illustrates that the coordinate system should be chosen according to
the purpose.
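The argument of Solution 1.2 can be tried on concrete numbers. The Python sketch below places the origin on the first particle and checks that the relative position of the second particle always stays in one plane through the origin. The initial positions and velocities are made-up sample values, and the normal vector is computed with the cross product, which is introduced later in this lecture.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Assumed sample data: initial positions and constant velocities of two particles.
p1, v1 = (1, 2, 0), (1, 0, 0)
p2, v2 = (0, 0, 3), (0, 1, 1)

d0 = sub(p2, p1)       # position of particle 2 relative to particle 1 at t = 0
w = sub(v2, v1)        # relative velocity (constant)
normal = cross(d0, w)  # normal vector of the plane through the origin

def relative_position(t):
    return tuple(d + t * c for d, c in zip(d0, w))

# Every relative position is perpendicular to the normal, so it lies in the plane.
offsets = [dot(normal, relative_position(t)) for t in range(-3, 4)]
print(normal, offsets)
```

In the frame of a third observer the two trajectories in this example are skew lines, which is why the choice of origin matters.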
1.2 Projection
Let $\mathbf{r}$ denote the position of a particle in space. Given a coordinate system, we can
represent the position $\mathbf{r}$ with three numbers. Let’s examine the meaning and
method in detail. First, we project $\mathbf{r}$ onto the $x$-axis, which is the line
connecting the origin $\mathbf{0}$ and the unit vector $\mathbf{i}$. The line passing through
$\mathbf{r}$ and perpendicular to the $x$-axis intersects the $x$-axis at one point, and this
point is the projection of $\mathbf{r}$ onto the $x$-axis. The distance from the origin to the
projection point is the $x$ coordinate of $\mathbf{r}$. If the projection point is on the
opposite side of the origin from $\mathbf{i}$, we assign a negative sign. Similarly, we can
perform this process for $\mathbf{j}$ and $\mathbf{k}$ to find the $y$ and $z$ coordinates. These are
the coordinates of the point $\mathbf{r}$.
Now consider the projection onto the $xy$-plane. Draw a line perpendicular to the
$xy$-plane, passing through $\mathbf{r}$, and find the point where it intersects the
$xy$-plane. This point is the projection. The coordinates of this point on the
$xy$-plane are $(x, y)$.
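We represent $\mathbf{r}$ as a column vector:
\[ \mathbf{r} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}. \]
The origin and the unit vectors are
\[ \mathbf{0} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \]
so that
\[ \mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}. \]
Vectors are denoted in bold, and scalars are denoted in regular font. The magnitude
or norm of the position vector $\mathbf{r}$ is defined and represented as:
\[ \|\mathbf{r}\| = \sqrt{x^2 + y^2 + z^2}. \]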
This represents the distance between r and the origin 0 (Pythagorean theorem). Dif-
ferent coordinate systems can be chosen as needed. In such cases, the essential po-
sition of r remains unchanged, but its representation changes.
Question 1.1. Most calculus textbooks do not distinguish whether vectors are col-
umn vectors or row vectors. However, we fix r as a column vector. What is the
advantage of choosing column vectors over row vectors?
Distinguishing between column vectors and row vectors reduces confusion. One
reason for representing the position vector r as a column vector is matrix multipli-
cation. If A is a 3 × 3 matrix and x is a vector, we typically write the matrix-vector
multiplication as Ax. In this case, x must be a column vector.
However, column vectors have a drawback: they consume more vertical space.
Therefore, we may sometimes write $\mathbf{r} = (1, 3, 2)$ horizontally to save space.
Keep in mind that, depending on the context, this may still represent a
column vector.
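The role of the column shape in the product $A\mathbf{x}$ can be seen in a small Python sketch. Matrices are stored as lists of rows, so a column vector is a $3 \times 1$ list and a row vector is a $1 \times 3$ list. The helper `matmul` and the sample matrix (a rotation by 90 degrees about the $z$-axis) are assumptions made for this illustration.

```python
def matmul(A, B):
    """Multiply an m-by-n matrix A by an n-by-p matrix B (both given as lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("inner dimensions do not agree")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[0, -1, 0],
     [1,  0, 0],
     [0,  0, 1]]               # assumed example: rotation by 90 degrees about the z-axis

r_column = [[1], [3], [2]]     # r = (1, 3, 2) stored as a 3 x 1 column vector
r_row = [[1, 3, 2]]            # the same numbers stored as a 1 x 3 row vector

print(matmul(A, r_column))     # A r is defined and is again a column vector

try:
    matmul(A, r_row)           # A times a row vector is not defined
except ValueError as error:
    print("A r^t fails:", error)

print(matmul(r_row, A))        # a row vector multiplies from the left instead
```

With the column convention, every matrix acts on a vector from the left, which keeps formulas such as $A\mathbf{x}$ unambiguous.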
1.3 Moving particle and trajectory curves in space
Let’s consider a planet moving in space. Let time be represented by $t \in \mathbb{R}$, and let
$\mathbf{r}(t)$ denote the position of the planet or object at time $t$. Then, we can write:
\[ \mathbf{r}(t) = f(t)\mathbf{i} + g(t)\mathbf{j} + h(t)\mathbf{k} = \begin{pmatrix} f(t) \\ g(t) \\ h(t) \end{pmatrix}. \]
Both representations are equivalent, and the meaning is clear. On reflection,
however, what is the reason for introducing the new symbols $f(t)$, $g(t)$, and $h(t)$?
They are the functions giving the $x$, $y$, and $z$ coordinates of the planet, respectively.
But later, one might forget whether $f(t)$ represented the $x$ or the $y$ coordinate. So, it
is better to write:
\[ \mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix}. \]
The trajectory of the planet, denoted as $\{\mathbf{r}(t) : t \in \mathbb{R}\}$, is a curve in 3D space. Thus,
we can consider $\mathbf{r}$ as a vector-valued function of the time variable $t \in \mathbb{R}$. Using either
of the two expressions mentioned earlier, the norm of the position vector $\mathbf{r}(t)$ can be
represented as follows:
\[ \|\mathbf{r}(t)\| = \sqrt{f^2(t) + g^2(t) + h^2(t)} \quad \text{or} \quad \|\mathbf{r}(t)\| = \sqrt{x^2(t) + y^2(t) + z^2(t)}. \]
The second notation makes it clear that this is the distance between the position
$\mathbf{r}(t)$ and the origin. This abuse of notation clarifies the meaning.
Remark 1.2. In this notation, $x(t)$ is a function of the variable $t$ that represents
the $x$ coordinate of the moving particle’s position at time $t$. We refer to this kind of
expression as abuse of notation. Using the same symbol $x$ for both the $x$ coordinate in
the coordinate system and the function representing the position at time $t$ is more
convenient than introducing a new function $f(t)$ with $x = f(t)$. This kind of abuse
of notation, where the same symbol is used for two different entities, is widespread
and is used throughout calculus, including in the chain rule.
We commonly say that a scalar is a quantity with only magnitude, and a vector is a
quantity with both magnitude and direction. However, that statement is not entirely
accurate. A scalar value x ∈ R also has one of two directions, either to the right
or to the left, with a magnitude of |x|. A more precise distinction is that a scalar
is a quantity that arises in a number system like real or complex numbers, while a
vector can be considered as composed of multiple scalars, including the case of a
single-component vector. In other words, a scalar can be called a single-component
vector.
Problem 1.3. Draw the trajectory of the vector function $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j}$ given
by $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^2$. In which direction is it moving?
Problem 1.4. Draw the trajectory of the vector function $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$
given by $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^3$.
Problem 1.5. Find a function $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^3$ whose trajectory is a coil that
winds around the $z$-axis 10 times and whose projection onto the $xy$-plane is a circle of
radius 2.
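Before drawing such trajectories by hand, it can help to tabulate a few points. The Python sketch below samples the curve of Problem 1.4 at a few times inside $(0, 2\pi)$ and prints, for each point, the distance of its $xy$-projection from the origin and the polar angle of that projection. The sample times and the helper name `helix` are assumptions for illustration only.

```python
import math

def helix(t):
    """Position r(t) = cos t i + sin t j + t k from Problem 1.4."""
    return (math.cos(t), math.sin(t), t)

# Assumed sample times strictly inside the interval (0, 2*pi).
times = [2 * math.pi * k / 8 for k in range(1, 8)]
points = [helix(t) for t in times]

# Distance of the projection onto the xy-plane from the origin.
radii = [math.hypot(x, y) for x, y, _ in points]

# Angle of the projection, measured counterclockwise from the x-axis in [0, 2*pi).
angles = [math.atan2(y, x) % (2 * math.pi) for x, y, _ in points]

for (x, y, z), radius, angle in zip(points, radii, angles):
    print(f"z = {z:.3f}   radius = {radius:.3f}   angle = {angle:.3f}")
```

The radius stays equal to 1 while the angle and the height grow together, so the projection runs counterclockwise around the unit circle as the curve climbs.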
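Multiplying a vector by a scalar is given by $c\mathbf{r} = (cx, cy, cz)$. The sum and difference
of two vectors are defined by adding and subtracting each component of the vectors,
respectively. That is,
\[ \mathbf{r}_1 \pm \mathbf{r}_2 = (x_1 \pm x_2,\; y_1 \pm y_2,\; z_1 \pm z_2). \]
The cross product of two vectors $\mathbf{r}_1 = (x_1, y_1, z_1)$ and $\mathbf{r}_2 = (x_2, y_2, z_2)$ is defined by
\[ \mathbf{r}_1 \times \mathbf{r}_2 = (y_1 z_2 - z_1 y_2,\; z_1 x_2 - x_1 z_2,\; x_1 y_2 - y_1 x_2). \]
It is also called the vector product. To make it easier to remember the above formula,
we use the determinant of a $3 \times 3$ matrix:
\[ \mathbf{r}_1 \times \mathbf{r}_2 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \end{vmatrix} = \begin{vmatrix} y_1 & z_1 \\ y_2 & z_2 \end{vmatrix} \mathbf{i} - \begin{vmatrix} x_1 & z_1 \\ x_2 & z_2 \end{vmatrix} \mathbf{j} + \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix} \mathbf{k}. \]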
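The cross product is defined only for 3-dimensional vectors. Geometrically, the
cross product $\mathbf{r}_1 \times \mathbf{r}_2$ is a vector perpendicular to the plane containing the two
vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, with a magnitude given by
\[ \|\mathbf{r}_1 \times \mathbf{r}_2\| = \|\mathbf{r}_1\| \, \|\mathbf{r}_2\| \sin\theta, \qquad (1.1) \]
where $\theta$ is the angle between them. There are two directions perpendicular to this
plane, and the cross product points in the one that satisfies the right-hand rule.
If the two vectors are parallel, i.e., if the angle is $\theta = 0$ or $\theta = \pi$, then $\mathbf{r}_1 \times \mathbf{r}_2 = \mathbf{0}$.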
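A few special cases make these properties concrete. The Python sketch below implements the component formula obtained by expanding the determinant and evaluates it for the basis vectors, for two perpendicular vectors, and for two parallel vectors. The function name `cross` and the sample vectors are illustrative choices.

```python
def cross(a, b):
    """Cross product of two 3-dimensional vectors, from the determinant expansion."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return (y1 * z2 - z1 * y2,
            z1 * x2 - x1 * z2,
            x1 * y2 - y1 * x2)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(cross(i, j))                  # equals k: the right-hand rule for a positive orientation
print(cross(j, i))                  # equals -k: swapping the factors flips the direction
print(cross((2, 0, 0), (0, 3, 0)))  # perpendicular vectors: length 2 * 3 * sin(90 degrees) = 6
print(cross((1, 2, 3), (2, 4, 6)))  # parallel vectors: the zero vector
```

The second line also shows that the cross product is not commutative, so the order of the factors matters whenever it is used.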
Problem 1.6. Let r1 (t) and r2 (t) denote the vectors representing the positions of
two objects at time t. Show that the cross product satisfies the following product
rule:
(r1 (t) × r2 (t))′ = r′1 (t) × r2 (t) + r1 (t) × r′2 (t).
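Solution 1.6 We can use the product rule for derivatives componentwise. For the
first component of the cross product,
\[ (y_1 z_2 - z_1 y_2)' = (y_1' z_2 - z_1' y_2) + (y_1 z_2' - z_1 y_2'), \]
which is the sum of the first components of $\mathbf{r}_1' \times \mathbf{r}_2$ and $\mathbf{r}_1 \times \mathbf{r}_2'$. The other two
components are handled in the same way.
Thus, the product rule is satisfied. (Not all terms are explicitly written, please verify.) □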
Question 1.3. Is there a way to determine if two vectors r1 and r2 are perpendicular?
Is there an easy way to find the angle between them?
Using (1.1), we can find the angle between two vectors. However, an easier way
to determine the angle is through the inner product, also known as the dot product.
The inner product is written in two ways:
\[ \mathbf{r}_1 \cdot \mathbf{r}_2 = \langle \mathbf{r}_1, \mathbf{r}_2 \rangle = x_1 x_2 + y_1 y_2 + z_1 z_2. \]
Problem 1.7. If $\theta$ is the angle between two vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, show that
\[ \cos\theta = \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\|\mathbf{r}_1\| \, \|\mathbf{r}_2\|}. \qquad (1.2) \]
Solution 1.7 Assuming the two vectors meet at the origin, we can choose the
coordinate system so that both lie in the $xy$-plane. Therefore, let’s assume all $z$
components are zero. Then the relationship (1.2) reduces to basic trigonometry
learned in high school. Though the details are not shown here, (1.2) should be
remembered. □
The relationship (1.2) is very important. If the inner product is 0, the vectors are
perpendicular. If the angle is 0, i.e., if the vectors are parallel, then cos 0 = 1, and
the inner product of the two vectors equals the product of their lengths.
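Relationship (1.2) translates directly into a computation. The Python sketch below is a minimal illustration that returns the angle between two vectors; the helper names `dot`, `norm`, and `angle_between` and the sample vectors are assumptions, and the clamping step only guards against rounding errors pushing the cosine slightly outside $[-1, 1]$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def angle_between(a, b):
    """Angle in radians between nonzero vectors a and b, using (1.2)."""
    cosine = dot(a, b) / (norm(a) * norm(b))
    cosine = max(-1.0, min(1.0, cosine))   # guard against rounding just outside [-1, 1]
    return math.acos(cosine)

right_angle = angle_between((1, 0, 0), (0, 1, 0))   # inner product 0: perpendicular
diagonal = angle_between((1, 0, 0), (1, 1, 0))      # 45 degrees
parallel = angle_between((1, 2, 3), (2, 4, 6))      # parallel vectors: angle 0

print(math.degrees(right_angle), math.degrees(diagonal), math.degrees(parallel))
```

Because `dot` sums over all components, the same functions work for vectors of any dimension.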
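Solution 1.8 (Refer to the figure above) Let $\mathbf{x} = (x, y, z)$ represent a point on the
plane. Then, the vector $\mathbf{x} - \mathbf{r} = (x - 1, y - 2, z + 1)$ is perpendicular to $\mathbf{v} = (0, 3, -2)$.
Therefore,
\[ (\mathbf{x} - \mathbf{r}) \cdot \mathbf{v} = 3(y - 2) - 2(z + 1) = 0, \quad \text{i.e.,} \quad 3y - 2z = 8. \]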
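The inner product can be defined not only for 3-dimensional vectors but also
for vectors of any dimension. However, the notation used previously is not suitable
for expressing the inner product of $n$-dimensional vectors. Let’s represent two
$n$-dimensional vectors slightly differently:
\[ \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \]
Their inner product is then
\[ \mathbf{x} \cdot \mathbf{y} = \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i. \qquad (1.3) \]
The inner product of two functions $f$ and $g$ can also be defined by integration:
\[ \langle f, g \rangle = \int f(x) g(x)\,dx. \qquad (1.4) \]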
What is the angle between two vectors in n-dimensional space? What about the
angle between two functions f and g? Although their meanings are different, (1.2)
can be used as a definition for angles.
Question 1.4. What commonality exists between the inner products (1.3) and (1.4),
even though they seem different?
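Problem 1.9. Let $\mathbf{x}(t)$ and $\mathbf{y}(t)$ denote vectors representing the positions of two
objects at time $t$. Show that the derivative of their inner product also satisfies the
following product rule:
\[ (\mathbf{x}(t) \cdot \mathbf{y}(t))' = \mathbf{x}'(t) \cdot \mathbf{y}(t) + \mathbf{x}(t) \cdot \mathbf{y}'(t). \]
Solution 1.9 We can use the product rule for derivatives as follows:
\[ (\mathbf{x}(t) \cdot \mathbf{y}(t))' = \Big( \sum_{i=1}^{n} x_i(t) y_i(t) \Big)' = \sum_{i=1}^{n} (x_i(t) y_i(t))' \]
\[ = \sum_{i=1}^{n} \big( x_i'(t) y_i(t) + x_i(t) y_i'(t) \big) = \mathbf{x}'(t) \cdot \mathbf{y}(t) + \mathbf{x}(t) \cdot \mathbf{y}'(t). \]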
Exercises
6. Find the equation of a plane parallel to the xy-plane passing through the point
r = (2, 1, 4).
7. Find the equation of a plane parallel to the xz-plane passing through the point
r = (2, 1, 4).
8. Find the intersection of the planes 2x + 3y − z = 2 and 3x + y − 2z = 0.
9. Find the equation that represents all points equidistant from the points $\mathbf{r}_1 = (1, 2, 1)$
and $\mathbf{r}_2 = (3, 2, -1)$.
Lecture 2
Polar coordinates in $\mathbb{R}^2$
The planets in the solar system orbit the sun in elliptical paths that are close to
circles. Artificial satellites orbiting the Earth are mainly designed to follow
circular paths, but they can also follow elliptical paths. Each orbit lies in a plane
and can be described with two-dimensional coordinates on that plane. In particular,
polar coordinates are useful for representing circular or elliptical orbits. In this
lecture, we will discuss polar coordinates, which have many practical applications.
Of course, given orthogonal coordinates (x, y), we can find the corresponding polar
Although these two vectors are unit vectors, unlike i and j, they are not constant
vectors. Both vectors depend only on θ and are independent of r. The correspond-
ing basis vectors of the polar coordinate system are er , which becomes (1, 0), and
eθ , which becomes (0, 1). The reason is as follows: as seen in the figure, the vec-
dinate system as it is a vector in the direction of the fixed r. Let’s examine which
point in the orthogonal coordinate corresponds to the given coordinates (r, θ ) in the
polar coordinate plane. Once the angle θ is given, we consider the direction vec-
tor er corresponding to the angle θ . Since the direction vector is a unit vector, the
corresponding vector has a length of r:
\[ \mathbf{r} = r\,\mathbf{e}_r(\theta). \]
This equation is nothing more than rewriting the relationship (2.1) as a vector equa-
tion. If er is the first coordinate axis and eθ is the second coordinate axis, then the
new coordinate system also has a positive orientation.
Now, if the point r = (x, y) on the xy plane is given, let’s find the corresponding
polar coordinates (r, θ ). There is a point to be careful about: since the correspon-
dence (2.1) is not one-to-one, it is not uniquely determined. To establish an inverse
correspondence, we must choose a branch as in defining inverse functions. In polar
coordinates, we choose r ≥ 0 and 0 ≤ θ < 2π as branches. Within this range, we
choose r and θ that satisfy (2.2).
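The choice of branch can be made explicit in a short computation. The Python sketch below converts Cartesian coordinates to polar coordinates with $r \ge 0$ and $0 \le \theta < 2\pi$, and converts back. It assumes that (2.1) is the usual correspondence $x = r\cos\theta$, $y = r\sin\theta$; the function names are illustrative. At the origin the angle is not determined, and the sketch simply returns $\theta = 0$ there.

```python
import math

def to_polar(x, y):
    """Polar coordinates (r, theta) of (x, y) on the branch r >= 0, 0 <= theta < 2*pi."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)          # atan2 returns an angle in [-pi, pi]
    if theta < 0:
        theta += 2 * math.pi          # move negative angles onto the chosen branch
    if theta >= 2 * math.pi:
        theta = 0.0                   # guard against rounding up to exactly 2*pi
    return r, theta

def to_cartesian(r, theta):
    """Assumed form of the correspondence (2.1): x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(0.0, -1.0)
print(r, theta)                       # the point (0, -1) has angle 3*pi/2, not -pi/2

x, y = to_cartesian(*to_polar(-2.0, 3.0))
print(x, y)                           # converting back recovers (-2, 3) up to rounding
```

Choosing a different branch, for example $-\pi < \theta \le \pi$, would change only the two `if` statements.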
This section is essential for deriving the orbit formulas of planets. It requires
mathematical thinking for physical understanding. Assuming that two celestial bodies
(such as the Sun and the Earth) are subject to no forces other than their mutual
gravity, their motion stays in a single plane (this will be confirmed later).
Introducing a polar coordinate system on this plane allows us to represent the
position of an object or a planet using polar coordinates $(r, \theta)$.
We denote the position vector in bold font, $\mathbf{r}$. Its relationship with the polar
coordinate $r$ is
\[ \|\mathbf{r}\| = r. \]
The basis vectors $\mathbf{i}$ and $\mathbf{j}$ of the orthogonal coordinate system form a fixed pair of
perpendicular unit vectors, regardless of the position. In contrast, $\mathbf{e}_r(\theta)$ and
$\mathbf{e}_\theta(\theta)$ form a pair of perpendicular unit vectors that varies with the position.
They are determined by the angle $\theta$ of the given position in the orthogonal
coordinate system and are independent of $r$.
Problem 2.1. Show the following relations:
\[ \frac{d\mathbf{e}_r}{d\theta} = \mathbf{e}_\theta, \qquad \frac{d\mathbf{e}_\theta}{d\theta} = -\mathbf{e}_r. \]
Solution 2.1 These relations can be easily proven using the derivatives of
trigonometric functions. Remembering them is more important. □
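The two relations can also be checked numerically. The Python sketch below compares a central finite difference of the basis vectors with the claimed derivatives, assuming the component forms $\mathbf{e}_r = (\cos\theta, \sin\theta)$ and $\mathbf{e}_\theta = (-\sin\theta, \cos\theta)$. The step size, the sample angles, and the helper names are illustrative choices.

```python
import math

def e_r(theta):
    return (math.cos(theta), math.sin(theta))

def e_theta(theta):
    return (-math.sin(theta), math.cos(theta))

def derivative(vector_function, theta, h=1e-6):
    """Central finite-difference approximation of d(vector_function)/d(theta)."""
    plus, minus = vector_function(theta + h), vector_function(theta - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

max_error = 0.0
for theta in (0.0, 0.7, 2.0, 4.5):   # assumed sample angles
    d_er = derivative(e_r, theta)
    d_etheta = derivative(e_theta, theta)
    # Claimed relations: d e_r / d theta = e_theta and d e_theta / d theta = -e_r.
    errors = [abs(a - b) for a, b in zip(d_er, e_theta(theta))]
    errors += [abs(a + b) for a, b in zip(d_etheta, e_r(theta))]
    max_error = max(max_error, *errors)

print(max_error)   # small, limited only by the finite-difference approximation
```

Geometrically, differentiating with respect to $\theta$ rotates each basis vector by a quarter turn counterclockwise.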
With the new coordinate system, the position of the object is represented as
$\mathbf{r} = r\,\mathbf{e}_r(\theta)$. This notation hides the time variable. As the object moves, the polar
coordinates $r$ and $\theta$ representing the position of the object become functions of the
time variable $t$. The right side of the following figure shows the trajectory of a
particle moving on the $xy$-plane. The corresponding position in the polar coordinate
plane is represented as $\tilde{\mathbf{r}}(t) = (r(t), \theta(t))$. The space where Newton’s laws apply is
not the polar coordinate space but the orthogonal coordinate space. In other words,
Newton’s law of gravitation and laws of motion must be applied to the trajectory
along which the point $\mathbf{r} = r\,\mathbf{e}_r(\theta)$ on the right side of the figure moves. Therefore,
the basis vectors $\mathbf{e}_r$ and $\mathbf{e}_\theta$ become functions of the time variable $t$ through the
angle $\theta = \theta(t)$, and the position of the particle can be written as follows:
\[ \mathbf{r}(t) = r(t)\,\mathbf{e}_r(\theta(t)). \]
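Solution 2.2 To calculate the derivatives with respect to time $\dot{\mathbf{e}}_r$ and $\dot{\mathbf{e}}_\theta$, consider
the angle as a function of time $\theta = \theta(t)$. Using the chain rule and Problem 2.1, we
get
\[ \dot{\mathbf{e}}_r = \frac{d}{dt} \begin{pmatrix} \cos\theta(t) \\ \sin\theta(t) \end{pmatrix} = \begin{pmatrix} \cos'\theta(t)\,\dot\theta \\ \sin'\theta(t)\,\dot\theta \end{pmatrix} = \begin{pmatrix} -\sin\theta(t)\,\dot\theta \\ \cos\theta(t)\,\dot\theta \end{pmatrix} = \frac{d\mathbf{e}_r}{d\theta}\,\dot\theta = \mathbf{e}_\theta\,\dot\theta \]
and similarly
\[ \dot{\mathbf{e}}_\theta = \frac{d\mathbf{e}_\theta}{d\theta}\,\dot\theta = -\mathbf{e}_r\,\dot\theta. \]
□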
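Problem 2.3 (Position, velocity, acceleration using polar coordinates). The position,
velocity, and acceleration of an object are given as follows.
\[ \mathbf{r} = r\,\mathbf{e}_r \qquad (2.5) \]
\[ \mathbf{v} = \dot{r}\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta \qquad (2.6) \]
\[ \mathbf{a} = (\ddot{r} - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot{r}\dot\theta)\,\mathbf{e}_\theta \qquad (2.7) \]
Solution 2.3 The position vector (2.5) has already been explained. Its derivative
using the product rule and (2.4) is as follows:
\[ \mathbf{v} = \dot{\mathbf{r}} = \dot{r}\,\mathbf{e}_r + r\,\dot{\mathbf{e}}_r = \dot{r}\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta. \]
Differentiating once more in the same way gives
\[ \mathbf{a} = \dot{\mathbf{v}} = \ddot{r}\,\mathbf{e}_r + 2\dot{r}\dot\theta\,\mathbf{e}_\theta + r\ddot\theta\,\mathbf{e}_\theta - r\dot\theta^2\,\mathbf{e}_r = (\ddot{r} - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot{r}\dot\theta)\,\mathbf{e}_\theta. \]
□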
Remark 2.1. Remember that using polar coordinates r and θ , it is convenient to use
er and eθ as basis vectors instead of i and j.
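Formula (2.7) can be compared with a direct numerical second derivative of the Cartesian position. The Python sketch below does this for one made-up motion, $r(t) = 1 + 0.5t$ and $\theta(t) = t^2$, whose time derivatives are written out by hand. The motion, the step size, and the helper names are assumptions for illustration, and the basis vectors are taken as $\mathbf{e}_r = (\cos\theta, \sin\theta)$, $\mathbf{e}_\theta = (-\sin\theta, \cos\theta)$.

```python
import math

def r_of(t):
    return 1.0 + 0.5 * t        # assumed radial motion

def theta_of(t):
    return t * t                # assumed angular motion

def position(t):
    """Cartesian position r(t) e_r(theta(t))."""
    r, theta = r_of(t), theta_of(t)
    return (r * math.cos(theta), r * math.sin(theta))

def acceleration_numeric(t, h=1e-4):
    """Second central difference of the Cartesian position."""
    before, now, after = position(t - h), position(t), position(t + h)
    return tuple((a - 2 * n + b) / h**2 for a, n, b in zip(after, now, before))

def acceleration_polar(t):
    """Acceleration from formula (2.7), with derivatives of the assumed motion."""
    r, theta = r_of(t), theta_of(t)
    r_dot, r_ddot = 0.5, 0.0
    theta_dot, theta_ddot = 2.0 * t, 2.0
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-math.sin(theta), math.cos(theta))
    radial = r_ddot - r * theta_dot**2
    angular = r * theta_ddot + 2.0 * r_dot * theta_dot
    return tuple(radial * u + angular * v for u, v in zip(e_r, e_theta))

numeric = acceleration_numeric(1.0)
formula = acceleration_polar(1.0)
print(numeric)
print(formula)   # the two results agree up to the finite-difference error
```

At $t = 1$ the radial coefficient is $-6$ even though $\ddot{r} = 0$, which shows the contribution of the term $-r\dot\theta^2$.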
The equation of an ellipse with its center at the origin and major and minor axes
along the $x$-axis and $y$-axis, respectively, is given by:
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
An overview of the graph is given on the left side of the figure. The values $\pm a$ are
the $x$-intercepts, and $\pm b$ are the $y$-intercepts. If $a = b$, then the above ellipse
becomes a circle. For convenience, we consider the case where $a \ge b$, so the $x$-axis
becomes the major axis. The two foci of the ellipse lie on the major axis.
The distance between the center and each focus is $c = \sqrt{a^2 - b^2}$, i.e., the foci are at
$(\pm c, 0)$. The eccentricity of the ellipse, which indicates how far it deviates from a
circle, is given by:
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 - b^2}{a^2}}. \qquad (2.8) \]
\[ r = e\,PD \qquad (2.9) \]
defines all points $P(x, y)$ that satisfy this equation. Since the length of segment $PD$
is $k - x$, we have:
\[ r = e\,PD \;\Rightarrow\; \sqrt{x^2 + y^2} = e(k - x) \;\Rightarrow\; x^2 + y^2 = e^2(k^2 - 2kx + x^2). \]
Collecting the terms gives
\[ (1 - e^2)x^2 + 2ke^2 x + y^2 = e^2 k^2. \qquad (2.10) \]
Problem 2.4. If $0 < e < 1$, show that (2.10) represents an ellipse with one of its foci
at the origin, where $e$ represents the eccentricity of the ellipse.
Solution 2.4 Set
\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \qquad b^2 = a^2(1 - e^2) = \frac{e^2 k^2}{1 - e^2}, \qquad c = \frac{k e^2}{1 - e^2} > 0. \]
Completing the square in $x$, we can rewrite (2.10) as
\[ \frac{(x + c)^2}{a^2} + \frac{y^2}{b^2} = 1, \]
which is an ellipse centered at $(-c, 0)$. Its eccentricity satisfies
\[ \frac{a^2 - b^2}{a^2} = \frac{a^2 - a^2(1 - e^2)}{a^2} = \frac{1 - (1 - e^2)}{1} = e^2. \qquad (2.11) \]
Thus, the coefficient $e$ in the relationship $r = e\,PD$ is indeed the eccentricity of the
ellipse, so it is reasonable to call the coefficient $e$ from the beginning. The distance
from the center to the focus of the ellipse is $\sqrt{a^2 - b^2}$, and using (2.11), we can
compute:
\[ \sqrt{a^2 - b^2} = \sqrt{e^2 a^2} = \sqrt{\frac{k^2 e^4}{(1 - e^2)^2}} = c. \]
Therefore, since the ellipse is shifted by $c$ units to the left, the origin is a focus. □
We have shown that points satisfying (2.9) form an ellipse with eccentricity $e$
and one focus at the origin. The length of segment $PD$ is $k - r\cos\theta$, so the polar
representation of this ellipse becomes $r = e(k - r\cos\theta)$. Solving for $r$, we get:
\[ r = \frac{L}{1 + e\cos\theta}, \qquad L = ek. \]
This equation represents an ellipse with eccentricity $e$ for $0 < e < 1$. For $e = 1$ it
represents a parabola, and for $e > 1$ a hyperbola (see Appendix B).
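The link between the polar form and the shifted ellipse can be checked numerically. The Python sketch below picks sample values $e = 0.5$ and $k = 3$ (assumptions for illustration), computes $a$, $b$, $c$ from the formulas of Solution 2.4, and tests whether points generated by $r = L/(1 + e\cos\theta)$ satisfy $(x + c)^2/a^2 + y^2/b^2 = 1$.

```python
import math

e, k = 0.5, 3.0                      # assumed sample values with 0 < e < 1
L = e * k
a = e * k / (1 - e**2)               # from a^2 = e^2 k^2 / (1 - e^2)^2
b = a * math.sqrt(1 - e**2)          # from b^2 = a^2 (1 - e^2)
c = k * e**2 / (1 - e**2)

def polar_point(theta):
    """Cartesian point on the curve r = L / (1 + e cos(theta))."""
    r = L / (1 + e * math.cos(theta))
    return r * math.cos(theta), r * math.sin(theta)

residuals = []
for step in range(16):
    x, y = polar_point(2 * math.pi * step / 16)
    residuals.append((x + c)**2 / a**2 + y**2 / b**2 - 1)

print(a, b, c)
print(max(abs(value) for value in residuals))   # zero up to rounding errors
```

For these sample values $a = 2$ and $c = 1$, so $c/a$ reproduces the eccentricity $e = 0.5$.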
Problem 2.5. Convert the following equations given in polar coordinates to Cartesian
coordinates and draw their corresponding graphs.
(1) $r = 1$  (2) $r = \cos\theta$  (3) $r = \cos(2\theta)$  (4) $r = \dfrac{2}{\sin\theta - \cos\theta}$
Solution 2.5 It’s important to distinguish between the graphs in polar coordinates
and their corresponding graphs in Cartesian coordinates, understanding that the
graphs in polar coordinates correspond to the graphs in Cartesian coordinates via
the transformation (2.1). The overview of the graphs is given in the figure.
(1) The equation $r = 1$ in Cartesian coordinates becomes $\sqrt{x^2 + y^2} = 1$, which
is equivalent to $x^2 + y^2 = 1$. We know this represents a circle with its center
at the origin and radius 1. Even without knowing this, if we plot $r = 1$ for various
values of $\theta$ from 0 to $2\pi$, we would observe a circle with radius 1.
(2) Since $\cos\theta$ can take negative values, we need to allow $r$ to be negative when
writing $r = \cos\theta$. Multiplying both sides by $r$, we get $r^2 = r\cos\theta$, which, in
Cartesian coordinates, becomes $x^2 + y^2 = x$. Rewriting this, we get
$(x - 0.5)^2 + y^2 = 0.5^2$. This represents a circle centered at $(0.5, 0)$ with radius 0.5.
In the polar coordinate plane, this graph is the cosine function, which has period
$2\pi$. The circle is already traced once as $\theta$ moves from 0 to $\pi$, so the interval
$[0, 2\pi]$ covers the circle twice. It’s worth understanding why this is so.
(3) Using the double angle formula, we get $r = \cos^2\theta - \sin^2\theta$. Multiplying both
sides by $r^2$, this becomes $(x^2 + y^2)^{3/2} = x^2 - y^2$ in Cartesian coordinates. Squaring
both sides and expanding, we get $x^6 + 3x^4y^2 + 3x^2y^4 + y^6 = x^4 - 2x^2y^2 + y^4$. It’s not
immediately clear what curve this equation represents. However, in the polar
coordinate plane the graph is simply a cosine function. Unlike in (2), the parts of
the curve traced over different $\theta$-intervals do not overlap, so we end up with a
four-leaf clover pattern.
(4) In this case, the graph in polar coordinates might seem more complicated,
but when rewritten in Cartesian coordinates, we get $y = x + 2$, which represents a
straight line. □
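The conversions in parts (2) to (4) can be spot-checked numerically. The Python sketch below generates points from each polar equation, including angles where $r$ is negative, and measures how far they are from satisfying the corresponding Cartesian equation. The sample angles are arbitrary choices that avoid $\sin\theta = \cos\theta$, where the expression in (4) is undefined.

```python
import math

# Assumed sample angles; none of them is close to pi/4 or 5*pi/4.
thetas = [0.2 + 0.5 * step for step in range(12)]

def to_xy(r, theta):
    """Cartesian point for polar coordinates (r, theta); r may be negative."""
    return r * math.cos(theta), r * math.sin(theta)

residuals = {"circle": [], "clover": [], "line": []}
for theta in thetas:
    x, y = to_xy(math.cos(theta), theta)                  # (2) r = cos(theta)
    residuals["circle"].append((x - 0.5)**2 + y**2 - 0.25)
    x, y = to_xy(math.cos(2 * theta), theta)              # (3) r = cos(2 theta)
    residuals["clover"].append((x * x + y * y)**3 - (x * x - y * y)**2)
    x, y = to_xy(2.0 / (math.sin(theta) - math.cos(theta)), theta)   # (4)
    residuals["line"].append(y - (x + 2.0))

for name, values in residuals.items():
    print(name, max(abs(value) for value in values))      # all zero up to rounding
```

A check like this does not replace the algebra, but it quickly exposes sign errors in a conversion.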
Exercises
Lecture 3
Newton’s law on Earth
When an object receives a force F and moves a distance ℓ, the magnitude of work
W is given as follows:
Question 3.1. Why is work defined as the product of force and displacement?
when the parameter for energy calculation is changed from time to distance (arc-
length).
What if the force is not constant but a function? If it is a function of time, then
the acceleration varies with time, and thus the velocity becomes the integral
of the acceleration, i.e., $v(t) = v(0) + \int_0^t a(s)\,ds$. Therefore, the kinetic energy can be
easily obtained. If the force is a function of position, then integration using
Equation (3.3) is necessary. The actual gravity (3.1) is a function of position or
distance, and in this case, Equation (3.3) is more useful than the formula for kinetic
energy. For example, if an object moves along the $x$-axis and the force component in
the $x$-direction is a function of $x$, i.e., $f = f(x)$, then the work done by the force
$f(x)$ between $x = a$ and $x = b$ is given by
\[ W = \int_a^b f(x)\,dx. \]
This is a definite integral: it accumulates the work done by the force $f(x)$ from the
beginning to the end. In other words, the definite integral gives the signed area
between the graph of $f(x)$ and the $x$-axis from $x = a$ to $x = b$.
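When the force depends on position, the work integral can be approximated by summing force times small displacements. The Python sketch below uses the midpoint rule for an inverse-square force $f(s) = C/s^2$ and compares the result with the exact value $C(1/a - 1/b)$. The constant $C$, the interval, and the function name `work` are assumptions for illustration; for gravity, $C$ would play the role of $G m_1 m_2$.

```python
def work(force, a, b, pieces=100_000):
    """Midpoint-rule approximation of the work integral of force(x) from a to b."""
    width = (b - a) / pieces
    return sum(force(a + (index + 0.5) * width) for index in range(pieces)) * width

C = 2.0                                   # assumed constant of the inverse-square force

def inverse_square(s):
    return C / s**2

approximate = work(inverse_square, 1.0, 4.0)
exact = C * (1 / 1.0 - 1 / 4.0)           # antiderivative -C/s evaluated between 1 and 4

print(approximate, exact)

# For a constant force the integral reduces to force times distance.
print(work(lambda x: 5.0, 0.0, 2.0))
```

Increasing `pieces` makes the sum approach the integral, which is the sense in which work accumulates along the path.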
The energy of a moving planet is exchanged between potential and kinetic
energy as the planet alternately accelerates and decelerates. When an object with
mass $m_1$ moves with velocity $\mathbf{v}$, the kinetic energy is given by:
\[ E_k = \frac{1}{2} m_1 \|\mathbf{v}\|^2. \]
The following problem demonstrates that the potential energy due to gravity near the
surface of Earth can also be expressed as a product of the gravitational force and
a distance.
Problem 3.1 (Gravity on the Earth’s surface). The gravitational force exerted on
an object with mass $m_1$ at the Earth’s surface is $-m_1 g \hat{\mathbf{k}}$. Here, $g = 9.8\ \mathrm{m/s^2}$ is
the gravitational acceleration, and $\hat{\mathbf{k}}$ is the unit vector in the vertical direction on
the Earth’s surface. If this object is placed at a height $h > 0$ above the surface, the
object’s potential energy is
\[ E_p = m_1 g h. \qquad (3.4) \]
Show the following:
(1) Confirm the magnitude of the gravitational acceleration $g$ using Equation (3.1).
(2) Explain the concept of potential energy (3.4) using the work concept.
(3) Explain the significance of potential energy (3.4).
Solution 3.1 (1) The mass $m$ corresponds to $m_1$, and the vector $\hat{\mathbf{k}}$ corresponds to
$\mathbf{r}/r$. Therefore, the remaining part corresponds to the constant $g$:
\[ g = \frac{G m_2}{R^2}. \]
Here, $m_2$ is the mass of Earth, and $R$ is the radius of Earth. The value of $g$ can be
verified by looking up these quantities.
(2) Work is a method of calculating potential energy. If the vertical component $f_z$
of the force that lifts the object against gravity is constant, $f_z = m_1 g$, then the
work is given by $f_z h$. Here, $h$ is the (vertical) displacement. Therefore, the
potential energy is $E_p = m_1 g h$.
(3) The energy required to push the object from the Earth’s surface to its current
position is the potential energy. Alternatively, it is the amount of work that
gravity does on the object as it falls to the Earth’s surface from that position.
Problem 3.2. A mass of 2 kg is thrown vertically upward from the ground with a
force of twice the gravity for $t$ seconds. Calculate the kinetic and potential energies
at that moment.
Solution 3.2 If the force is twice the gravity, then it is $2mg = 4g$ kg. The acceleration
is $g$ since we subtract gravity. Therefore, the velocity after $t$ seconds is
$\int_0^t g\,ds = gt$. Therefore, the kinetic energy is $\frac{1}{2}mv^2 = g^2t^2$ kg. The distance traveled
is $\int_0^t gs\,ds = \frac{1}{2}gt^2$, so the potential energy is $E_p = mgh = (2\ \text{kg})\,g\,\frac{1}{2}gt^2 = g^2t^2$ kg. The
total energy is $g^2t^2\ \text{kg} + g^2t^2\ \text{kg} = 2g^2t^2$ kg. Alternatively, the total energy can be
calculated using Equation (3.3) as force times distance: $4g\ \text{kg} \times \frac{1}{2}gt^2 = 2g^2t^2$ kg. If the total energy
after 100 seconds is expressed in units, since $g = 9.8\ \mathrm{m/sec^2}$, the total energy is as
follows:
$$2g^2t^2\ \text{kg} = 2 \times (9.8)^2 \times (100)^2\ \text{kg}\,\mathrm{m^2/sec^2} \approx 1.92 \times 10^6\ \text{J}.$$
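The arithmetic of Solution 3.2 can be checked numerically. The following Python sketch is only an illustration: the function name `energies_after_thrust` and its parameters are made up for this example, and it assumes a constant applied force of $2mg$ acting against gravity $mg$, as in the solution.

```python
G_ACCEL = 9.8  # gravitational acceleration in m/sec^2, as in Problem 3.1


def energies_after_thrust(mass, t, g=G_ACCEL):
    """Return (kinetic, potential, work) in joules after t seconds.

    Hypothetical helper: the applied force is 2*m*g upward, gravity is m*g
    downward, so the net acceleration is g.
    """
    net_acceleration = g
    velocity = net_acceleration * t              # integral of g ds from 0 to t
    height = 0.5 * net_acceleration * t ** 2     # integral of g*s ds from 0 to t
    kinetic = 0.5 * mass * velocity ** 2
    potential = mass * g * height
    work = (2 * mass * g) * height               # applied force times distance
    return kinetic, potential, work


kinetic, potential, work = energies_after_thrust(mass=2.0, t=100.0)
print(kinetic, potential, kinetic + potential, work)
```

The sum of the kinetic and potential energies agrees with the work done by the applied force, which is the point of the two computations in the solution.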
Solution 3.3 First, assume that the object moves up and down along a line through the center of
the Earth. The $\hat{\mathbf{k}}$ component of gravity is given by $f = -Gm_1m_2s^{-2}$. Here, $s$ is the
distance to the center of the Earth. Pushing the object away from the Earth’s
surface requires a force in the opposite direction. Integrating this force from $R$ to $r > R$ yields:
$$\int_R^r Gm_1m_2s^{-2}\,ds = \Big[-Gm_1m_2s^{-1}\Big]_R^r = Gm_1m_2\,(R^{-1} - r^{-1}).$$
This matches (3.5). Let h denote the distance from the surface. Then, r = R + h.
Therefore, the potential energy is:
$$E_p = Gm_1m_2\left(\frac{1}{R} - \frac{1}{R+h}\right) = Gm_1m_2\,\frac{R+h-R}{R(R+h)} = Gm_1m_2\,\frac{h}{R^2}\,\frac{R^2}{R^2+Rh}.$$
If $h$ is much smaller than $R$, then $\frac{R^2}{R^2+Rh} \approx 1$. The potential energy can then be written as:
$$E_p \approx Gm_1m_2\,\frac{h}{R^2} = m_1\,\frac{Gm_2}{R^2}\,h,$$
which is a valid approximation for the potential energy (3.5). (The radius of Earth is
$R = 6371$ km. If $h = 10$ km, then $\frac{R^2}{R^2+Rh} \approx 0.9984$, with a difference of about 0.16%.)
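The quality of this approximation can be examined numerically. The sketch below is a rough check, not part of the original text: the values of `G` and `M_EARTH` are standard reference values assumed here, and the function names are hypothetical.

```python
G = 6.674e-11        # gravitational constant in m^3 kg^-1 s^-2 (assumed value)
M_EARTH = 5.972e24   # mass of Earth in kg (assumed value)
R_EARTH = 6.371e6    # radius of Earth in m (6371 km, as in the text)


def potential_exact(m1, h):
    """Potential energy relative to the surface: G*m1*m2*(1/R - 1/(R+h))."""
    return G * m1 * M_EARTH * (1.0 / R_EARTH - 1.0 / (R_EARTH + h))


def potential_linear(m1, h):
    """Near-surface approximation m1*g*h with g = G*m2/R^2."""
    g_surface = G * M_EARTH / R_EARTH ** 2
    return m1 * g_surface * h


g_surface = G * M_EARTH / R_EARTH ** 2
ratio = potential_exact(1.0, 1.0e4) / potential_linear(1.0, 1.0e4)
print(g_surface, ratio)  # ratio equals R/(R+h) for h = 10 km
```

The printed ratio is the factor $\frac{R^2}{R^2+Rh}$ that the approximation replaces by 1.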
Remark 3.1 (A brief note). Since $h$ is much smaller than $R$, we can say $\frac{h}{R(R+h)} \approx \frac{h}{R^2}$.
However, we left the $h$ in the numerator. We shouldn’t delete everything just because
it’s small. Whether a small term can be dropped depends on what we want to see and on
the terms around it: here $h$ is negligible when added to $R$, but not as a factor that multiplies the whole expression.
Question 3.2. The potential energy (3.5) becomes 0 on the Earth’s surface. This def-
inition represents potential energy with respect to the Earth’s surface. What happens
if we calculate potential energy with respect to the center of the Earth?
When calculating the potential energy from the center of the Earth, it corresponds
to the case where R = 0. In this scenario, the potential energy given by (3.5) diverges,
meaning:
$$\lim_{R\to 0} Gm_1m_2\,(R^{-1} - r^{-1}) = \infty.$$
This implies that the potential energy becomes infinite when measured from the
center of the Earth. Essentially, this suggests that an infinite amount of energy is
required to move away from the center of the Earth. In other words, objects located
at the center of the Earth cannot escape. (Even if an object has a small mass, if it
can be compressed sufficiently, nothing can escape from within. Such objects are
known as micro black holes.)
If potential energy cannot be measured from the center of the Earth, the next
natural choice is to measure it from ∞. Then, when R = ∞, the potential energy is
given by:
$$E_p = -\frac{Gm_1m_2}{r}. \qquad \text{(Potential Energy)}$$
In this case, the drawback is that potential energy is negative. When measured from
infinity, the potential energy is 0 at ∞ and becomes increasingly negative as it ap-
proaches the Earth’s center. But among other choices, this is the best one. When
considering the movement between planets, the reference point for potential energy
is r = ∞, and the potential energy is negative and becomes 0 at r = ∞. When con-
sidering movement due to gravity on the Earth’s surface, the reference point is the
surface of the Earth, and potential energy is positive, reaching a minimum of 0 at
h = 0.
Let’s examine the trajectory of a projectile launched from the ground at an angle
$\varphi \in (0, \frac{\pi}{2})$ with an initial velocity $v_0 > 0$. The objective is to find the projectile’s
trajectory before it touches the ground again, the maximum height reached before it
falls, the distance traveled, and the time it stays in the air. Air resistance is ignored.
Assuming the projectile moves in the xz-plane, let’s find the trajectory r(t).
Let the starting point be the origin, r(0) = 0, and the initial velocity be v(0) =
(v0 cos φ , v0 sin φ ). The acceleration a is given by gravity, so a(t) = (0, −g). The
velocity vector v(t) at time t is obtained by integrating the acceleration with initial
conditions:
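$$\mathbf{v}(t) = \int \mathbf{a}(t)\,dt = \begin{pmatrix} c_1 \\ -gt + c_2 \end{pmatrix}, \qquad \mathbf{v}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} v_0\cos\varphi \\ v_0\sin\varphi \end{pmatrix}.$$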
Thus, v(t) = (v0 cos φ , −gt + v0 sin φ ). Integrating once more to calculate the posi-
tion vector:
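$$\mathbf{r}(t) = \int \mathbf{v}(t)\,dt = \begin{pmatrix} v_0 t\cos\varphi + c_1 \\ -\frac{1}{2}gt^2 + v_0 t\sin\varphi + c_2 \end{pmatrix} \;\Rightarrow\; \mathbf{r}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$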
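The condition $z(t) = 0$ represents the moment when the projectile is on the ground. Therefore,
solving $-\frac{1}{2}gt^2 + v_0 t\sin\varphi = 0$ gives us the moments when it touches the ground.
One solution is the initial time, $t = 0$. The other is
$$T = \frac{2v_0\sin\varphi}{g} \qquad \text{(Time of flight)}$$
when it touches the ground again. The $x$-component $x(T)$ at time $T$ is the distance
traveled:
$$x(T) = v_0 T\cos\varphi = \frac{2v_0^2\sin\varphi\cos\varphi}{g}.$$
The maximum height is reached at the midpoint of the flight, $t = T/2$:
$$H = z(T/2) = \frac{v_0^2\sin^2\varphi}{2g}. \qquad \text{(Maximum height)}$$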
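The three quantities derived above (time of flight, distance traveled, maximum height) can be evaluated and cross-checked against the trajectory $z(t)$. The following Python sketch is only an illustration; the function names `projectile` and `height` and the sample values $v_0 = 30$ m/sec, $\varphi = 60^\circ$ are assumptions made for this example.

```python
import math


def projectile(v0, phi, g=9.8):
    """Return (time of flight, distance traveled, maximum height), no air resistance."""
    flight_time = 2.0 * v0 * math.sin(phi) / g
    distance = v0 * math.cos(phi) * flight_time      # x(T) = v0*cos(phi)*T
    max_height = (v0 * math.sin(phi)) ** 2 / (2.0 * g)
    return flight_time, distance, max_height


def height(v0, phi, t, g=9.8):
    """Vertical position z(t) = -g*t^2/2 + v0*sin(phi)*t."""
    return -0.5 * g * t * t + v0 * math.sin(phi) * t


v0, phi = 30.0, math.radians(60.0)
flight_time, distance, max_height = projectile(v0, phi)
print(flight_time, distance, max_height)
print(height(v0, phi, flight_time))        # back on the ground, up to rounding
print(height(v0, phi, flight_time / 2.0))  # agrees with max_height
```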
Problem 3.4. Explain how the projectile trajectory changes if there is a crosswind
blowing at a speed of v1 .
Solution 3.4 If we ignore air resistance, no matter how strong the crosswind is, it
doesn’t affect the projectile’s trajectory. When considering air resistance, the method
used above is not sufficient.
Problem 3.5. Given a fixed launch velocity, how can you maximize the distance the
projectile travels?
Solution 3.5 If the launch angle $\varphi$ is fixed, the maximum distance and height are
proportional to the square of the velocity, $v_0^2$. The time of flight is proportional to $v_0$.
If the velocity is fixed, you can choose the angle $\varphi$. The range is maximized when
$\sin\varphi\cos\varphi$ reaches its maximum value. To find the maximum, differentiate it and set the
derivative equal to 0:
$$(\sin\varphi\cos\varphi)' = \cos^2\varphi - \sin^2\varphi = 2\cos^2\varphi - 1.$$
Thus, the critical point in $(0, \frac{\pi}{2})$ is where $\cos\varphi = \frac{1}{\sqrt{2}}$, so $\varphi = \frac{\pi}{4}$.
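The conclusion of Solution 3.5 can also be seen by brute force: compute the range for many launch angles and pick the largest. This is a sketch under the same no-air-resistance assumption; the function name `range_for_angle` and the launch speed of 30 m/sec are chosen only for illustration.

```python
import math


def range_for_angle(v0, phi, g=9.8):
    """Distance traveled x(T) = 2*v0^2*sin(phi)*cos(phi)/g."""
    return 2.0 * v0 ** 2 * math.sin(phi) * math.cos(phi) / g


# Scan whole-degree launch angles strictly between 0 and 90 degrees.
angles_deg = range(1, 90)
best_angle = max(angles_deg, key=lambda d: range_for_angle(30.0, math.radians(d)))
print(best_angle, range_for_angle(30.0, math.radians(best_angle)))
```

A scan like this only samples finitely many angles, so it supports the calculus argument but does not replace it.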
Question 3.3. The following text is from a baseball magazine: “We were taught in
school that the angle at which a ball can be thrown the farthest is 45 degrees. But in actual
baseball, the optimal launch angle is close to 30 degrees.” Why is this different? (The
optimal angle for a golf ball is about 17 degrees.)
The reason is air resistance and the spin of the ball. The ball’s spin is due to the
bottom part of the bat hitting the ball. If the launch angle is 45 degrees and the ball
has such spin, the actual trajectory is much higher than the optimal trajectory. The
spin of a golf ball is also caused by hitting the bottom of the ball, making the spin
more pronounced than a baseball and having a greater impact due to the surface of
the ball. Of course, without air resistance, 45 degrees is always the optimal launch
angle.
Exercises
2. Calculate the gravitational force between the Earth and the Sun. Compare it with
the gravitational force between Mars and the Sun. (Necessary data can be found
on the internet, such as the masses of Earth and Mars, and the distances from the
Sun.)
3. Let the mass of Jupiter be $1.899 \times 10^{27}$ kg and its radius be 140,000 km. Calculate
the magnitude of the gravitational force on Jupiter’s surface and compare it
with the gravitational force on the Earth’s surface.
4. A 10 kg piece of iron falls into water with a depth of 10 meters. How much work
does gravity do? (Necessary data can be found on the internet, such as the density
of iron.)
5. Assume the speed of sound is 340 m/s. Calculate the maximum distance traveled
when the projectile’s velocity is equal to the speed of sound. Also, determine the
time of flight and maximum height reached.
6. It is said that the maximum range of a K9 howitzer is 53 km. What is the launch
velocity?
Lecture 4
Multi-variable Vector-valued Functions
In Calculus 2, the primary subject matter dealt with is multivariable functions that
have vector values. Vectors are denoted in boldface and are conceived as column
vectors. Generally, they are represented as follows:
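$$\mathbf{f} : D \subset \mathbb{R}^n \to \mathbb{R}^m, \qquad \mathbf{y} = \mathbf{f}(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^n, \ \mathbf{y} \in \mathbb{R}^m.$$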
A function f assigns a single value for each element in the set D ⊂ Rn , referred
to as the domain. The collection of all function values, denoted as {y ∈ Rm : y =
f(x) for some x ∈ D}, is called the range (or image), and the space in which the
function values belong, Rm , is termed the codomain. We typically denote the di-
mension of the domain as n and the dimension of the codomain as m. We express
this as:
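$$\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n.$$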
Here, fi : Rn → R represents functions with n independent variables that yield scalar
values. Typically, independent variables are denoted as x1 , · · · , xn . However, when
n ≤ 3, they are occasionally represented as x, y, and z. We aim to distinguish between
x ∈ Rn and y ∈ Rm , though in some cases, it may not be possible.
Given a function, often one can determine its maximum domain even without
explicit specification. In such cases, finding the maximum range is also feasible.
Problem 4.1. Determine the maximum domain and range for the following functions.
(i) $f(x, y) = \sqrt{y - x^2}$.
(ii) $f(x, y) = \frac{1}{xy}$.
(iii) $f(x, y, z) = xy\ln z$.
(iv) $f(x, y, z) = \frac{1}{x^2+y^2+z^2}$.
Solution 4.1 □
Given a point $\mathbf{c} \in \mathbb{R}^n$ and a radius $r > 0$, the set of points that lie at a distance
less than $r$ from $\mathbf{c}$, denoted as
$$B(\mathbf{c}, r) := \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x} - \mathbf{c}\| < r\},$$
is called an open ball with radius $r > 0$ and center $\mathbf{c}$. If $\mathbf{c} \in D$ and $B(\mathbf{c}, r) \subset D$ for
some $r > 0$, then $\mathbf{c}$ is termed an interior point of the set $D$. If $\mathbf{c} \notin D$ and $B(\mathbf{c}, r) \cap D = \emptyset$
for some $r > 0$, then $\mathbf{c}$ is termed an exterior point of the set $D$. Otherwise, $\mathbf{c}$ is
termed a boundary point. If every point of a set $D \subset \mathbb{R}^n$ is an interior point, then $D$
is termed open. If $D^c := \{\mathbf{x} \in \mathbb{R}^n : \mathbf{x} \notin D\}$ is an open set, then $D$ is termed closed.
Solution 4.2 ⊔
⊓
Remark 4.1. (i) The concepts of open sets and closed sets are extremely important
and are utilized in proofs in analysis courses. They are merely introduced in calculus
courses. (ii) The definitions provided here may slightly differ from those in other
textbooks, but their equivalence can be verified.
This section delves into the concepts of graphs, images, level sets, contours, and
their interrelations.
Problem 4.3. What distinguishes a function’s graph from its image (or range)? Can
you differentiate between a curve drawn on a plane representing the graph of a
function and one representing its image? (Hint: Use the vertical line test?)
r lies in four-dimensional space, represented as (t, x, y, z) = (t, x(t), y(t), z(t)). How-
ever, since we are accustomed to experiencing three-dimensional space, imagining
objects in four dimensions is challenging. Therefore, let’s start with simpler exam-
ples.
Problem 4.4. Consider $\mathbf{r}(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}$, a function with domain $[0, 2\pi]$ and range in
$\mathbb{R}^2$. Draw the graph and image of this function.
Solution 4.4 (Hint. The graph of this function lies in $\mathbb{R}^3$, whereas its trajectory (or
its range) lies in $\mathbb{R}^2$.) □
Problem 4.5. Let $D = (-1, 1) \times (-1, 1)$ be the domain of the function $f(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$.
Draw the graph and image.
Solution 4.5 In this example, it can be observed that the image is not particularly
meaningful. □
Problem 4.6. (i) Describe the level set and contour map of the function $f(x, y) = \sqrt{1 - x^2 + y^2}$.
(ii) Describe the level set and contour map of the function $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$.
Solution 4.6 □
Problem 4.7. The graph of a function $f : \mathbb{R}^n \to \mathbb{R}$ can be considered as the zero level
set of another function $g : \mathbb{R}^{n+1} \to \mathbb{R}$. (i) Find the function $g$. (ii) Let $f : \mathbb{R}^2 \to \mathbb{R}$ be
defined as $f(x, y) = 2x + 3y$. Find a function $g : \mathbb{R}^3 \to \mathbb{R}$ such that its zero level set
gives the graph of $f$. Find a vector perpendicular to the graph.
Solution 4.7 □
The definitions of limits and continuity provided above are based on a dynamical
argument. To show a limit or continuity, we need to find a suitable δ > 0 such that
the conditions of the definition are satisfied for any given ε > 0. The possibility
of finding such δ > 0 for any ε > 0 signifies a limit or continuity. Directly defining
continuity without going through limits is also possible, with only slight differences.
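Problem 4.8. Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{c} \in \mathbb{R}^n$. Consider the following statement: for any
$\varepsilon > 0$, there exists $\delta > 0$ such that $\|\mathbf{x} - \mathbf{c}\| < \delta$ implies $\|\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{c})\| < \varepsilon$.
Show that this statement is equivalent to saying $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{c})$, i.e., $\mathbf{f}$ is continuous at $\mathbf{c}$.
Solution 4.8 Showing that assertion A is equivalent to assertion B means proving two
things.
(A⇒B)
(A⇐B) □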
Several basic rules apply when computing limits. These rules are similar to those
for scalar-valued functions of a single variable, but they require conditions to hold
for vectors.
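Problem 4.9. Let $\mathbf{f}, \mathbf{g} : \mathbb{R}^n \to \mathbb{R}^m$, with $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$ and $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{M}$. Show the
following.
1. $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x})) = \mathbf{L} + \mathbf{M}$.
Some explanation is needed. The meaning of these relationships is that limits can
be taken separately. For example, the first one means:
$$\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x})) = \lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) + \lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{L} + \mathbf{M}.$$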
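Solution 4.9 Let’s prove the first problem. Given $\varepsilon > 0$, we need to find $\delta > 0$ such
that $0 < \|\mathbf{x} - \mathbf{c}\| < \delta$ implies $\|\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x}) - (\mathbf{L} + \mathbf{M})\| < \varepsilon$. Since $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$, there exists $\delta_1 > 0$ such that
$0 < \|\mathbf{x} - \mathbf{c}\| < \delta_1$ implies $\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| < \frac{\varepsilon}{2}$. Similarly, since $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{M}$, there exists
$\delta_2 > 0$ such that $0 < \|\mathbf{x} - \mathbf{c}\| < \delta_2$ implies $\|\mathbf{g}(\mathbf{x}) - \mathbf{M}\| < \frac{\varepsilon}{2}$. Now, let $\delta$ be the minimum
of these two, $\delta = \min(\delta_1, \delta_2)$. Then, for all $0 < \|\mathbf{x} - \mathbf{c}\| < \delta$, we have
$$\|\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x}) - (\mathbf{L} + \mathbf{M})\| \le \|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| + \|\mathbf{g}(\mathbf{x}) - \mathbf{M}\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Here, the first inequality is due to the triangle inequality.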
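The second problem is similar, and the third problem can also be approached
similarly. This technique is called the “give and take” method. Ultimately, we obtain
the following result:
$$\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| < \frac{\varepsilon}{2\|\mathbf{M}\|} \quad \text{whenever } \|\mathbf{x} - \mathbf{c}\| < \delta_1.$$
(Note that we replace $\varepsilon$ with $\frac{\varepsilon}{2\|\mathbf{M}\|}$ in this expression. Then we get
$\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\|\,\|\mathbf{M}\| \le \frac{\varepsilon}{2}$ for the other half as well.) □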
Solution 4.10 ⊔
⊓
(i) Show that $f(x, y)$ converges to 0 as $(x, y)$ approaches the origin $(0, 0)$ in any
direction. (ii) Show that $f(x, y)$ converges to a non-zero value as $(x, y)$ approaches
the origin $(0, 0)$ along the parabola $y = k\sqrt{|x|}$. (iii) What can be concluded about
the continuity of $f$ at the origin $(0, 0)$?
Solution 4.11 ⊔
⊓
Solution 4.12 It is important to see the similarity between this example and the
previous two examples. ⊔⊓
Problem 4.13. Test the continuity of the following function at $(x, y) = (0, 0)$ with
$\varepsilon > 0$.
$$f(x, y) = \begin{cases} \dfrac{xy^{1+\varepsilon}}{x^2+y^2}, & \text{if } (x, y) \neq (0, 0), \\ 0, & \text{otherwise.} \end{cases}$$
Solution 4.13 ⊔
⊓
This theorem implies that if the outer function $\mathbf{h}$ is continuous at the limit
$\mathbf{L} = \lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x})$, then we can move the outer limit inside the function $\mathbf{h}$.
Solution 4.14 Let ε > 0 be given. Our goal is to find δ > 0 such that
holds. Also, since lim f(x) = L, there exists δ > 0 such that
x→c
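Now let’s consider an important example. The power function $h(y) = y^k$ is continuous
for all $k \ge 0$. If $k < 0$, it is not defined, and hence not continuous, at $y = 0$. Therefore, we
have the following:
1. For all cases where $k \ge 0$, $\lim_{\mathbf{x}\to\mathbf{c}} (f(\mathbf{x}))^k = \left(\lim_{\mathbf{x}\to\mathbf{c}} f(\mathbf{x})\right)^k$.
2. $\lim_{\mathbf{x}\to\mathbf{c}} e^{f(\mathbf{x})} = e^{\lim_{\mathbf{x}\to\mathbf{c}} f(\mathbf{x})}$.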
Part II
Linear Functions and Differentiation
Lecture 5
Linear maps and matrix multiplication
Multivariable functions
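$$\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m, \qquad \mathbf{y} = \mathbf{f}(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^n, \ \mathbf{y} \in \mathbb{R}^m$$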
T : Rn → Rm
and the graph of this linear function and the graph of f(x + c) − f (c) shifted par-
allelly to it meet at the origin (See Lecture 8). For a better understanding of multi-
variable functions, basic knowledge of linear functions and matrix theory is crucial.
This lecture covers the basics of linear functions and matrix multiplication.
We will follow strict rules for notation rather than using it freely. We will dis-
tinguish between vectors and scalars, as well as between row vectors and column
vectors. We will try to distinguish the notation of matrices and the indices represent-
ing rows and columns as much as possible.
This column vector has n rows and 1 column, which can be viewed as an n × 1
matrix. If a row vector is needed, we take the transpose of the vector as follows:
$$\mathbf{x}^t = (x_1, \cdots, x_n).$$
This row vector has 1 row and n columns, thus it can be viewed as a 1 × n matrix.
Also, the following notation is used:
x = (x1 ; x2 ; · · · ; xn ),
where the semicolon ‘;’ is a delimiter for changing rows. The basis unit vectors are
represented as follows:
Solution 5.1 This problem is added as a reminder that $T(\mathbf{0}) = \mathbf{0}$ for all linear functions.
The first “0”, inside $T(\mathbf{0})$, denotes the zero vector in $\mathbb{R}^n$. The second
“0”, on the right-hand side, denotes the zero vector in $\mathbb{R}^m$.
Let’s prove it as follows. For any x ∈ Rn ,
(In other words, the value of the function on a linear combination is equal to the
linear combination of the function values.)
5.2 Matrix multiplication 43
Solution 5.2 We use induction. We already know that the proposition holds for
$N = 2$. Assuming that it holds for $N = \ell - 1$, we will prove that it holds for $N = \ell$.
Then,
$$\begin{aligned}
T\Big(\sum_{k=1}^{\ell} a_k\mathbf{x}_k\Big) &= T\Big(\sum_{k=1}^{\ell-1} a_k\mathbf{x}_k + a_\ell\mathbf{x}_\ell\Big) = T\Big(\sum_{k=1}^{\ell-1} a_k\mathbf{x}_k\Big) + a_\ell T(\mathbf{x}_\ell) \\
&= \sum_{k=1}^{\ell-1} a_k T(\mathbf{x}_k) + a_\ell T(\mathbf{x}_\ell) = \sum_{k=1}^{\ell} a_k T(\mathbf{x}_k),
\end{aligned}$$
where the second equality follows from the definition, the third equality follows
from the assumption that it holds for $N = \ell - 1$. □
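The statement proved by induction can be illustrated on a concrete example. The sketch below assumes the linear function is given by a matrix, $T(\mathbf{x}) = A\mathbf{x}$; the helper names `mat_vec` and `lin_comb` and the sample numbers are made up for this illustration.

```python
def mat_vec(A, x):
    """Apply the linear map T(x) = A x, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def lin_comb(coeffs, vectors):
    """Return the linear combination sum_k coeffs[k] * vectors[k]."""
    size = len(vectors[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(size)]


A = [[1, 2, 0],
     [0, -1, 3]]                           # a 2 x 3 matrix, so T : R^3 -> R^2
xs = [[1, 0, 2], [0, 1, 1], [3, -1, 0]]    # three vectors in R^3
coeffs = [2, -1, 4]

lhs = mat_vec(A, lin_comb(coeffs, xs))                   # T(sum a_k x_k)
rhs = lin_comb(coeffs, [mat_vec(A, x) for x in xs])      # sum a_k T(x_k)
print(lhs, rhs)
```

Both sides come out equal, as the proposition predicts. A single example is of course not a proof.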
x · y = ⟨x, y⟩ := x1 y1 + · · · + xn yn .
A has m rows and n columns. The number of rows is the size of the columns, and
the number of columns is the size of the rows. Each element is a real
Remark 5.1 (Index Selection). i) We denote the $i$-th row vector of the matrix $A$ as $\tilde{\mathbf{a}}_i^t$.
Note that in our notation, $\tilde{\mathbf{a}}_1 \neq \mathbf{a}_1$. The vector $\mathbf{a}_1$ represents the first column of $A$,
hence $\mathbf{a}_1 \in \mathbb{R}^m$. On the other hand, $\tilde{\mathbf{a}}_1$ represents the transpose of the first row of $A$,
thus $\tilde{\mathbf{a}}_1 \in \mathbb{R}^n$. ii) In matrix multiplication $AB$, where $A$ is an $m \times n$ matrix and $B$ is
an $n \times \ell$ matrix, the resulting matrix $C = AB$ is of size $m \times \ell$. We denote $A = (a_{ij})$
and $B = (b_{jk})$. We have chosen these indices because the shared index $j$ will disappear. Choose what
indices to use for $C$. Your choice should be $C = (c_{ik})$. Here, $c_{ik}$ denotes the entry in
the $i$-th row and $k$-th column of $C = AB$.
1 In linear algebra, people usually consider m × n matrix, not n × m matrix. Hence, m is for the
number of rows and n for columns. This is related to the convention that vector-valued functions
are usually denoted as f : Rn → Rm .
Viewpoint #1
Matrix multiplication can be understood from four viewpoints. When we have
$AB = C$, the $ik$-th element of the resulting matrix $C$ is the dot product of the $i$-th row
vector of $A$ and the $k$-th column vector of $B$. In other words,
$$c_{ik} = (a_{i1}, \cdots, a_{in}) \begin{pmatrix} b_{1k} \\ \vdots \\ b_{nk} \end{pmatrix} = \sum_{j=1}^{n} a_{ij}b_{jk}. \qquad (5.1)$$
This first viewpoint might be familiar as it is often used as the definition of matrix
multiplication. The dot product itself can be understood as a matrix multiplication,
where the row vector is a 1 × n matrix and the column vector is an n × 1 matrix,
resulting in a 1 × 1 matrix, i.e., a scalar. The row vector is multiplied on the left, and
the column vector on the right. Written out,
$$\mathbf{x} \cdot \mathbf{y} = \mathbf{x}^t\mathbf{y} = (x_1 \ \cdots \ x_n) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \sum_{i=1}^{n} x_iy_i.$$
Viewpoint #2
Problem 5.3. Verify if the ik element of (5.2) matches the given ik element of (5.1).
Viewpoint #3
The third viewpoint is row operation. This method is often used when solving sys-
tems of equations, especially in Gaussian Elimination. It is the dual viewpoint of the
second viewpoint but is also widely used in practice.
Let $B$ be an $n \times \ell$ matrix. To multiply a vector on the left of $B$, it must be a $1 \times n$
row vector. If $\mathbf{x} \in \mathbb{R}^n$, then the row vector $\mathbf{x}^t$ is a $1 \times n$ matrix. Then, $\mathbf{x}^tB$ gives a $1 \times \ell$
row vector, which is a linear combination of the row vectors of $B$. It can be written
as follows:
$$\mathbf{x}^tB = (x_1, \cdots, x_n) \begin{pmatrix} \tilde{\mathbf{b}}_1^t \\ \vdots \\ \tilde{\mathbf{b}}_n^t \end{pmatrix} = \sum_{i=1}^{n} x_i\tilde{\mathbf{b}}_i^t \in \mathbb{R}^\ell.$$
That is, multiplying an $n \times \ell$ matrix on the left with a $1 \times n$ row vector gives a $1 \times \ell$
row vector, which is a linear combination of the row vectors inside $B$.
Now, let’s multiply $B$ by a matrix $A$ on its left. Then,
$$AB = \begin{pmatrix} \tilde{\mathbf{a}}_1^t \\ \vdots \\ \tilde{\mathbf{a}}_m^t \end{pmatrix} B = \begin{pmatrix} \tilde{\mathbf{a}}_1^tB \\ \vdots \\ \tilde{\mathbf{a}}_m^tB \end{pmatrix}. \qquad (5.3)$$
That is, multiplying a matrix with $m$ rows on the left means obtaining $m$ rows, each
of which is a linear combination of the row vectors of $B$, with coefficients given by
$\tilde{\mathbf{a}}_i$. This third viewpoint is useful from the perspective of row operations or Gaussian
elimination.
Problem 5.4. Verify if the ik element of (5.3) matches the given ik element of (5.1).
Solution 5.5 Since E has 3 rows, EA also has 3 rows. The first row of E creates the
first row of EA, and since the coefficients of the linear combination are (1, 0, 0, 0), it
is identical to the first row of A. The second row of EA is created by the second row
of E, which involves multiplying the first row of A by 2 and subtracting it from the
second row of A. The third row of EA is created by the third row of E, and so on. □
Viewpoint #4
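The fourth viewpoint uses the tensor product (outer product) perspective. Although
not widely used, the tensor product allows multiplication between arbitrary vectors,
even if they have different dimensions. For example, let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. The
tensor product, denoted as $\mathbf{y} \otimes \mathbf{x}$, is defined as follows:
$$\mathbf{y} \otimes \mathbf{x} := \mathbf{y}\mathbf{x}^t = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} (x_1 \ \cdots \ x_n) = \begin{pmatrix} y_1x_1 & \cdots & y_1x_n \\ \vdots & & \vdots \\ y_mx_1 & \cdots & y_mx_n \end{pmatrix}.$$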
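Solution 5.6 Let $B$ be an $n \times \ell$ matrix, and $\tilde{\mathbf{b}}_j^t$ be the row vectors of $B$. Then,
$$AB = (\mathbf{a}_1, \cdots, \mathbf{a}_n) \begin{pmatrix} \tilde{\mathbf{b}}_1^t \\ \vdots \\ \tilde{\mathbf{b}}_n^t \end{pmatrix} = \sum_{j=1}^{n} \mathbf{a}_j\tilde{\mathbf{b}}_j^t = \sum_{j=1}^{n} \mathbf{a}_j \otimes \tilde{\mathbf{b}}_j.$$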
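The four viewpoints can be compared side by side in code. The following Python sketch is only an illustration with made-up function names. In particular, `by_columns` assumes that the second viewpoint is the column version (each column of $AB$ is a linear combination of the columns of $A$), which is the dual of the row viewpoint described above.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def by_entries(A, B):
    """Viewpoint 1: c_ik is the dot product of row i of A and column k of B."""
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def by_columns(A, B):
    """Assumed viewpoint 2: column k of AB combines the columns of A."""
    a_cols = transpose(A)
    result_cols = []
    for b_col in transpose(B):
        col = [0] * len(A)
        for coeff, a_col in zip(b_col, a_cols):
            col = [c + coeff * a for c, a in zip(col, a_col)]
        result_cols.append(col)
    return transpose(result_cols)


def by_rows(A, B):
    """Viewpoint 3: row i of AB combines the rows of B with coefficients a_ij."""
    result = []
    for a_row in A:
        row = [0] * len(B[0])
        for coeff, b_row in zip(a_row, B):
            row = [r + coeff * b for r, b in zip(row, b_row)]
        result.append(row)
    return result


def by_outer_products(A, B):
    """Viewpoint 4: AB is the sum over j of (column j of A) times (row j of B)."""
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for a_col, b_row in zip(transpose(A), B):
        for i, a in enumerate(a_col):
            for k, b in enumerate(b_row):
                C[i][k] += a * b
    return C


A = [[1, 2], [3, 4], [5, 6]]      # 3 x 2
B = [[1, 0, 2], [-1, 3, 1]]       # 2 x 3
results = [f(A, B) for f in (by_entries, by_columns, by_rows, by_outer_products)]
print(results[0])
print(all(r == results[0] for r in results))
```

All four functions perform the same multiplications $a_{ij}b_{jk}$; they only group the sums differently.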
In this section, we demonstrate that linear mappings are equivalent to matrices, and
the composition of linear functions is equivalent to matrix multiplication.
T (x) = Ax.
The last equality follows from the second perspective of matrix multiplication. ⊔
⊓
Problem 6.2 (Matrices are linear mappings). Given a matrix A ∈ Rm×n , define
T : Rn → Rm by T (x) = Ax. Show that T is a linear mapping.
Solution 6.3 (1) is straightforward. (2) demonstrates the relationship between ma-
trix multiplication and composition of functions. Although it is often used without
much thought, verification is necessary. But what needs to be shown? If $C$ is the
$m \times \ell$ matrix given by the matrix multiplication above, we need to show
Cu = A(Bu)
for any u ∈ Rℓ . Here, the left-hand side is precisely the linear function defined by
C, while the right-hand side is the composition function T (S(u)). Thus, showing
(AB)u = A(Bu)
⊔
⊓
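The identity $(AB)\mathbf{u} = A(B\mathbf{u})$ can be tried out on a small example. This is a sketch with hypothetical helper names and made-up numbers; it assumes $T(\mathbf{x}) = A\mathbf{x}$ and $S(\mathbf{u}) = B\mathbf{u}$, so that the right-hand side is the composition $T(S(\mathbf{u}))$.

```python
def mat_vec(A, x):
    """Matrix times column vector, with A stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Matrix product AB using the entrywise formula (5.1)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


A = [[1, 2, 0],
     [0, -1, 3]]      # 2 x 3, so T : R^3 -> R^2
B = [[1, 0],
     [2, 1],
     [0, -1]]         # 3 x 2, so S : R^2 -> R^3
u = [3, -2]

composed = mat_vec(A, mat_vec(B, u))    # T(S(u)) = A(Bu)
via_product = mat_vec(mat_mul(A, B), u) # (AB)u
print(composed, via_product)
```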
$$\Omega_\varepsilon = [0, \varepsilon]^n.$$
We denote the volume of a given set $S$ as $\Delta S$. Then the volume of the above set is
given by:
$$\Delta(\Omega_\varepsilon) = \varepsilon^n.$$
When $\varepsilon = 1$, we have $\Delta(\Omega_1) = 1$.
We consider parallelepipeds with one vertex at the origin. The parallelepiped $\Omega_1$
has all edges of length 1 and is a special parallelepiped where any two edges meeting at a vertex are
perpendicular to each other. Generally, an $n$-dimensional parallelepiped is determined by
$n$ linearly independent vectors. These vectors are the $n$ edges of the parallelepiped
connected to the origin. Their lengths or the angles between two edges do not
necessarily have to be the same. For $n = 2$, the parallelepiped consists of $n + n(n-1) = n^2 = 4$
edges. For $n = 3$, it consists of a total of $n + n(n-1) + \frac{n(n-1)(n-2)}{2} = 12$ edges.
Solution 6.4 Let A be the m × n matrix of the linear transformation T . Then A can
be written as follows.
A = (a1 , a2 , · · · , an ),
where ai are the column vectors of the matrix A. Then the n edges of the cube Ω 1 are
ei , and their images are Aei = ai . In other words, the n edges of the n-dimensional
parallelepiped T (Ω 1 ) connected to the origin are given by the n columns of the ma-
trix A. (Strictly speaking, the condition that ai are linearly independent is necessary.)
⊔
⊓
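$$q = \frac{\Delta T(\Omega_1)}{\Delta \Omega_1}$$
is called the volume expansion rate of the linear function $T$. Due to the properties
of linear functions, it can be shown that for all $\varepsilon > 0$, $q = \frac{\Delta T(\Omega_\varepsilon)}{\Delta \Omega_\varepsilon}$. Furthermore, for
any set $V \subset \mathbb{R}^n$ of non-zero volume, $q = \frac{\Delta T(V)}{\Delta V}$.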
matrix A is denoted as det(A) and is defined only for square matrices. The following
is the method of calculating the determinant.
Determinant
Problem 6.5 (Volume of parallelepiped when m = n). Show that the volume of a
parallelepiped formed by edges a1 , · · · , an is equal to the determinant of the matrix
A formed by a1 , · · · , an .
Solution 6.5 Let’s start with the case when n = 2. By rotating the edges, we can
position a1 along the x-axis. Then c = 0 and det(A) = ad. This matches the area of
the parallelogram. For n = 3, similarly, we rotate the edges so that a1 is aligned with
the x-axis. Then d = g = 0. By rotating the parallelepiped about a1 to position a2
in the xy-plane, we have h = 0, and det(A) becomes det(A) = aei. While this value
may be negative depending on the signs of a, e, and i, its absolute value matches the
volume. For higher dimensions, we simply remember the formula. ⊔ ⊓
Solution 6.6 ⊔
⊓
Problem 6.7 (Area of parallelogram). When $m > 2$, show that the area of the
parallelogram in $\mathbb{R}^m$ formed by two edges $\mathbf{a}_1, \mathbf{a}_2 \in \mathbb{R}^m$ is given by the
following formula:
$$\sqrt{(\|\mathbf{a}_1\|\,\|\mathbf{a}_2\|)^2 - (\mathbf{a}_1 \cdot \mathbf{a}_2)^2}. \qquad (6.1)$$
□
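Problem 6.8 (Volume of parallelepiped when n < m). Let $n < m$. Show that the
$n$-dimensional volume of the parallelepiped in $\mathbb{R}^m$ formed by $n$ edges $\mathbf{a}_1, \cdots, \mathbf{a}_n \in \mathbb{R}^m$ is
given by the following Gram determinant:
$$\|\mathbf{a}_1 \times \mathbf{a}_2 \times \cdots \times \mathbf{a}_n\| = \sqrt{\det(\mathbf{a}_i \cdot \mathbf{a}_j)} = \begin{vmatrix} \mathbf{a}_1 \cdot \mathbf{a}_1 & \cdots & \mathbf{a}_1 \cdot \mathbf{a}_n \\ \vdots & & \vdots \\ \mathbf{a}_n \cdot \mathbf{a}_1 & \cdots & \mathbf{a}_n \cdot \mathbf{a}_n \end{vmatrix}^{1/2}. \qquad (6.2)$$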
Solution 6.8 Let’s not prove it but just remember the formula. ⊔
⊓
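Even without a proof, the Gram determinant formula can be tested on small cases. The sketch below uses made-up helper names and sample vectors. It compares (6.2) with the absolute value of the ordinary determinant when $m = n = 2$, and with the length of the cross product for two edges in $\mathbb{R}^3$.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def gram_volume(edges):
    """Square root of the Gram determinant det(a_i . a_j), as in (6.2)."""
    gram = [[dot(a, b) for b in edges] for a in edges]
    return math.sqrt(det(gram))


# Case m = n = 2: compare with |det A|, where the edges are the columns of A.
# (Storing the edges as rows gives the transpose, which has the same determinant.)
edges_2d = [[2, 1], [1, 3]]
print(gram_volume(edges_2d), abs(det(edges_2d)))

# Case n = 2 < m = 3: compare with the length of the cross product.
a1, a2 = [1, 0, 0], [1, 2, 2]
cross = [a1[1] * a2[2] - a1[2] * a2[1],
         a1[2] * a2[0] - a1[0] * a2[2],
         a1[0] * a2[1] - a1[1] * a2[0]]
print(gram_volume([a1, a2]), math.sqrt(dot(cross, cross)))
```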
Question 6.1. Do the formulas for the area of a parallelogram (6.1) and the Gram
determinant in (6.2) match?
Lecture 7
Directional and Partial Differentials
Quiz: If the inequality “≤” in (7.1) is changed to the inequality “<”, what needs
to be changed accordingly in the definition?1
1 To exclude the case h = 0, we need to use “whenever 0 < |h| < δ ”. The advantage of putting h on
the right rather than writing the derivative in fractional form is that we can include the case h = 0,
which can be handled when using “≤”.
Problem 7.1. Prove the following using the definition of limits and the given definition:
$$\lim_{h\to 0} \frac{f(\mathbf{c} + h\mathbf{u}) - f(\mathbf{c})}{h} = D_{\mathbf{u}}f(\mathbf{c}).$$
Solution 7.1 First, let’s clarify the meaning of the problem. Although not explicitly
stated in the problem, the problem means that if the directional derivative $D_{\mathbf{u}}f(\mathbf{c})$
exists, then the limit on the left-hand side exists and they are equal. This seemingly obvious
problem helps us to view the definition from the perspective of limits. The proof is
simple. Let’s look at it step by step.
Let $v = D_{\mathbf{u}}f(\mathbf{c})$. Then, (7.1) can be written as follows:
$$\left|\frac{f(\mathbf{c} + h\mathbf{u}) - f(\mathbf{c})}{h} - v\right| \le \varepsilon \quad \text{for } 0 < |h| \le \delta.$$
Therefore,
$$\lim_{h\to 0} \left|\frac{f(\mathbf{c} + h\mathbf{u}) - f(\mathbf{c})}{h} - v\right| = 0.$$
As taking absolute value is a continuous function, we can move the limit inside:
$$\left|\lim_{h\to 0} \frac{f(\mathbf{c} + h\mathbf{u}) - f(\mathbf{c})}{h} - v\right| = 0.$$
If the absolute value is 0, then the content inside is also 0, which is what we want to
show. □
7.2 Partial derivative 55
Definition 7.2 (Directional derivatives for f). A vector v ∈ Rm is called the direc-
tional derivative (or directional differential) of a vector-valued function f : D ⊂
Rn → Rm at c ∈ D in the direction u if for any ε > 0, there exists δ > 0 such that
Definition 7.3 (Partial derivatives for f). A vector v ∈ Rm is called the partial
derivative (or partial differential) of a vector-valued function f : D ⊂ Rn → Rm
at c ∈ D with respect to xi for i = 1, · · · , n, if for any ε > 0, there exists δ > 0 and
whenever |h| < δ . In other words, if v = Dei f(c). We denote it by Di f(c) := Dei f(c).
Various notations are used for partial derivatives. If the independent variables
(x, y, z) are used in R3 , partial derivatives can be denoted by Dx f, Dy f, Dz f. If the
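$$\mathbf{f}_x := \frac{\partial \mathbf{f}}{\partial x} := \lim_{h\to 0} \frac{\mathbf{f}(x + h, y, z) - \mathbf{f}(x, y, z)}{h},$$
$$\mathbf{f}_y := \frac{\partial \mathbf{f}}{\partial y} := \lim_{h\to 0} \frac{\mathbf{f}(x, y + h, z) - \mathbf{f}(x, y, z)}{h},$$
$$\mathbf{f}_z := \frac{\partial \mathbf{f}}{\partial z} := \lim_{h\to 0} \frac{\mathbf{f}(x, y, z + h) - \mathbf{f}(x, y, z)}{h}.$$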
Problem 7.3. Find the partial derivatives of the following two-variable functions.
(1) $f(x, y) = x^2 + 3xy + y - 1$. (2) $f(x, y) = y\sin xy$. (3) $f(x, y) = \frac{2y}{y + \cos x}$.
Solution 7.3 Let’s find D1 f (x, y) for (1). In this case, consider x as the only variable
and treat the rest as constants to calculate the derivative of a single-variable function.
Then,
D1 f (x, y) = 2x + 3y
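But is this consistent with the definition? Let’s verify using (7.3). In this case, $\mathbf{u} = \mathbf{e}_1$
and $\mathbf{c} = \mathbf{x} = (x, y)^t$, so
$$\begin{aligned}
\lim_{h\to 0} \frac{f(\mathbf{x} + h\mathbf{e}_1) - f(\mathbf{x})}{h}
&= \lim_{h\to 0} \frac{f(x + h, y) - f(x, y)}{h} \\
&= \lim_{h\to 0} \frac{(x + h)^2 + 3(x + h)y + y - 1 - (x^2 + 3xy + y - 1)}{h} \\
&= \lim_{h\to 0} \frac{(x + h)^2 + 3(x + h)y - (x^2 + 3xy)}{h} \\
&= \lim_{h\to 0} \frac{2hx + h^2 + 3hy}{h} \\
&= \lim_{h\to 0} (2x + h + 3y) = 2x + 3y.
\end{aligned}$$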
Observe why we differentiate only x and treat other variables as constants when
finding D1 f in the calculation above. ⊔
⊓
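The limit computed above can be watched numerically: as $h$ shrinks, the difference quotient approaches $2x + 3y$. The sketch below is only an illustration; the function name `difference_quotient` and the sample point $(1, 2)$ are chosen for this example.

```python
def f(x, y):
    """The function from Problem 7.3 (1)."""
    return x ** 2 + 3 * x * y + y - 1


def difference_quotient(func, x, y, h):
    """(f(x + h, y) - f(x, y)) / h: only x is moved, y is held constant."""
    return (func(x + h, y) - func(x, y)) / h


x0, y0 = 1.0, 2.0
exact = 2 * x0 + 3 * y0          # D_1 f(x, y) = 2x + 3y
for h in (1e-1, 1e-3, 1e-6):
    print(h, difference_quotient(f, x0, y0, h), exact)
```

By the computation in Solution 7.3, the quotient equals $2x + h + 3y$ exactly, so the printed values differ from the exact derivative by about $h$, up to rounding.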
Problem 7.4. Find the partial derivative of the function f = f (x, y) when implicitly
given as follows.
y f − ln f = x + y.
Solution 7.4 ⊔
⊓
Here, we have only computed partial derivatives. Later, we will consider formulas
for computing directional derivatives using partial derivatives.
Question 7.1. Suppose that the function f has directional derivatives in all directions
at c ∈ D. Does this imply that f is continuous at c?
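$$f(\mathbf{x}) = \frac{x_1x_2^2}{x_1^2 + x_2^4} \ \text{ if } \mathbf{x} \neq \mathbf{0}, \quad \text{and } f(\mathbf{0}) = 0.$$
$$\lim_{t\to 0^+} f(t, \sqrt{t}) = \lim_{t\to 0^+} \frac{t^2}{t^2 + t^2} = \frac{1}{2} \neq 0.$$
Hence, $f$ is discontinuous at $\mathbf{0}$. □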
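The behaviour of this counterexample is easy to observe numerically. The sketch below uses the function $f(\mathbf{x}) = x_1x_2^2/(x_1^2 + x_2^4)$ from the text; the helper name `directional_quotient` and the sample direction are made up for this illustration.

```python
import math


def f(x1, x2):
    """f(x) = x1*x2^2 / (x1^2 + x2^4) for x != 0, and f(0) = 0."""
    if x1 == 0 and x2 == 0:
        return 0.0
    return x1 * x2 ** 2 / (x1 ** 2 + x2 ** 4)


def directional_quotient(u1, u2, h):
    """(f(0 + h*u) - f(0)) / h for the direction u = (u1, u2)."""
    return (f(h * u1, h * u2) - f(0.0, 0.0)) / h


# Along a straight line through the origin the quotient settles down ...
u1 = u2 = math.sqrt(0.5)
line_quotient = directional_quotient(u1, u2, 1e-4)
print(line_quotient, u2 ** 2 / u1)   # the limit is u2^2 / u1 when u1 != 0

# ... but along the curve (t, sqrt(t)) the function values stay near 1/2.
curve_value = f(1e-8, math.sqrt(1e-8))
print(curve_value)
```

The function values along the curve do not approach $f(\mathbf{0}) = 0$, even though every directional quotient converges.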
7.3 Gradient
From now on, we will define the gradient of a scalar function ∇ f as a row vector.
This is the only case in this lecture where row vectors are used. Since vector-valued
functions have column vector values, it is natural to apply ∇ to vector-valued func-
tions. (Most textbooks do not make this distinction clear. It is convenient to make
this distinction clear when writing.) In conclusion, we can write as follows.
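$$\mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}_{m \times 1} \implies \nabla\mathbf{f}(\mathbf{c}) := \begin{pmatrix} \nabla f_1(\mathbf{c}) \\ \vdots \\ \nabla f_m(\mathbf{c}) \end{pmatrix}_{m \times n}.$$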
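$$\mathbf{f}_{xx}(\mathbf{c}) = D_x^2\mathbf{f}(\mathbf{c}) = \frac{\partial^2 \mathbf{f}}{\partial x^2}(\mathbf{c}), \qquad \mathbf{f}_{yx}(\mathbf{c}) = D_x(D_y\mathbf{f})(\mathbf{c}) = \frac{\partial^2 \mathbf{f}}{\partial x\,\partial y}(\mathbf{c}),$$
$$\mathbf{f}_{yy}(\mathbf{c}) = D_y^2\mathbf{f}(\mathbf{c}) = \frac{\partial^2 \mathbf{f}}{\partial y^2}(\mathbf{c}), \qquad \mathbf{f}_{xy}(\mathbf{c}) = D_y(D_x\mathbf{f})(\mathbf{c}) = \frac{\partial^2 \mathbf{f}}{\partial y\,\partial x}(\mathbf{c}).$$
In general, $\mathbf{f}_{xy}$ and $\mathbf{f}_{yx}$ need not be equal. However, if these mixed partial derivatives are continuous, they
are equal.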
Problem 7.6. Given $w = x^2 + y^2 + z^2$, find $\frac{\partial w}{\partial x}$.
When people ask such questions, they do not always specify everything. When we
read such problems, we need to understand the meaning in a reasonable way. In this
problem, w is considered as a function of three variables x, y, z. The person asking the
question would generally consider these three variables as independent variables.
Therefore, we should consider $y$ and $z$ as independent variables and compute $\frac{\partial w}{\partial x}$.
Thus, the answer should be as follows.
$$\frac{\partial w}{\partial x} = 2x + 0 + 0 = 2x.$$
If we want to make the meaning of the answer clear, we can represent it as follows:
$$\left(\frac{\partial w}{\partial x}\right)_{x,y,z} = 2x.$$
Here, we temporarily create the notation $\left(\frac{\partial w}{\partial x}\right)_{x,y,z}$, which means the partial derivative
of $w$ with respect to $x$ when $x, y, z$ are considered independent variables. □
⊓
Problem 7.7. Given $w = x^2 + y^2 + z^2$, find $\frac{\partial y}{\partial w}$.
(Huh?) In this problem, we are asked to compute $\frac{\partial y}{\partial w}$. This means that $y$ is
considered as a function (or dependent variable) and $w$ is considered as an independent
variable. Typically, when one equation is given, one of $w, x, y, z$ is considered as the
dependent variable and the rest as independent variables. Therefore, if we consider
$w$, $x$ and $z$ as independent variables, we obtain the following answer.
$$1 = 0 + 2y\frac{\partial y}{\partial w} + 0.$$
Therefore, $\left(\frac{\partial y}{\partial w}\right)_{x,z,w} = \frac{1}{2y}$. □
Next, let’s consider the case where there are two relations and four variables. In
this case, we need to clarify the meaning.
Problem 7.8. Given $w = x^2 + y^2 + z^2$ and $z = x^2 + y^2$, find $\partial w/\partial x$.
In this problem, we are asked to compute $\partial w/\partial x$. This means that w is considered as a function (or dependent variable) and x is considered as an independent variable.
The meaning of this problem is still ambiguous. Since there are two relations among four variables, we can consider two variables as dependent and the other two as independent variables. So, which ones should we choose? Asking to compute $\partial w/\partial x$ implies considering x as an independent variable and w as a dependent variable. The problem should also have specified which of the remaining variables y and z is to be considered independent.
60 7 Directional and Partial Differentials
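Solution 7.8 (i) First, let's find the answer when x and z are considered independent variables. Then, differentiating the two relations with respect to x, we get:
\[
\frac{\partial w}{\partial x} = 2x + 2y\,\frac{\partial y}{\partial x} + 0, \qquad
0 - 2x - 2y\,\frac{\partial y}{\partial x} = 0.
\]
Therefore,
\[
\frac{\partial y}{\partial x} = -\frac{x}{y} \quad\text{and}\quad
\left(\frac{\partial w}{\partial x}\right)_{x,z} = 2x - 2y\,\frac{x}{y} = 0.
\]
(ii) Now, let's consider the case when x and y are independent variables. Differentiating the two relations with respect to x, we get:
\[
\frac{\partial w}{\partial x} = 2x + 0 + 2z\,\frac{\partial z}{\partial x}, \qquad
\frac{\partial z}{\partial x} - 2x - 0 = 0.
\]
Therefore,
\[
\frac{\partial z}{\partial x} = 2x \quad\text{and}\quad
\left(\frac{\partial w}{\partial x}\right)_{x,y} = 2x + 2z\cdot 2x = 2x + 4zx.
\]
□
\[
\left(\frac{\partial w}{\partial x}\right)_{x,y} \neq \left(\frac{\partial w}{\partial x}\right)_{x,z}.
\]
That is, the value of the partial derivative $\partial w/\partial x$ depends on which variables we choose as independent. It is natural that the effect of changing x varies with this choice.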
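The dependence on the choice of independent variables can be checked numerically. The following is a minimal sketch, assuming the second relation is z = x² + y² (the relation that the differentiated equations in Solution 7.8 correspond to) and taking the branch y > 0 in case (i). The function names and the sample point (0.6, 0.8) are chosen here only for illustration.

```python
import math

def w_of(x, y, z):
    return x**2 + y**2 + z**2

def dw_dx_xy(x, y, h=1e-6):
    # Case (ii): x, y independent; z = x^2 + y^2 is dependent.
    f = lambda s: w_of(s, y, s**2 + y**2)
    return (f(x + h) - f(x - h)) / (2 * h)   # central difference

def dw_dx_xz(x, z, h=1e-6):
    # Case (i): x, z independent; y = sqrt(z - x^2) is dependent (branch y > 0).
    f = lambda s: w_of(s, math.sqrt(z - s**2), z)
    return (f(x + h) - f(x - h)) / (2 * h)

x, y = 0.6, 0.8
z = x**2 + y**2
print("x, y independent:", dw_dx_xy(x, y), "formula 2x + 4zx =", 2*x + 4*z*x)
print("x, z independent:", dw_dx_xz(x, z), "formula = 0")
```

The same symbol ∂w/∂x gives two different numbers at the same point, depending on which variable is held fixed.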
Finally, let’s consider the case of three relations and four variables.
Problem 7.9. Given $w = x^2 + y^2 + z^2$, $z - x^2 - y^2 = 0$, and $x^2 + y^2 = 1$, find $\partial w/\partial x$.
Since there are three relations, we can consider three variables as dependent and the remaining one as the independent variable. If we are asked to compute $\partial w/\partial x$, it implies considering x as the independent variable. Then, it would be better to represent it as a total derivative, $dw/dx$. Or, more simply, we can denote it as $w'$. Other derivatives can also be denoted as $y'$ and $z'$. Let's compute it.
Solution 7.9 Implicitly differentiating the three relations with respect to x, we ob-
tain:
\[
w' = 2x + 2yy' + 2zz', \qquad z' - 2x - 2yy' = 0, \qquad 2x + 2yy' = 0.
\]
Therefore, $y' = -\dfrac{x}{y}$, $z' = 2x - 2y\,\dfrac{x}{y} = 0$, and
\[
w' = 2x + 2yy' + 2zz' = 2x - 2y\,\frac{x}{y} + 0 = 0.
\]
We call the linear function T (or the corresponding matrix A) the (full) differential
of f at c.
\[
\mathbf{f}(\mathbf{x}) \cong \mathbf{f}(\mathbf{c}) + T(\mathbf{x} - \mathbf{c}). \tag{8.2}
\]
can be made sufficiently small, becoming less than ε as ∥x−c∥ becomes sufficiently
small.
64 8 Full Differentials
If there exists a full differential, it seems like it implies that all partial differentials
exist. Indeed, directional differentials exist for all directions.
v = Tu
Then, taking x = c + hu, we can satisfy the conditions of the definition of directional
differential. ⊔
⊓
Question 8.1. We said that the differential T is a linear function. What does that
mean? It means that T is given by a matrix A = (ai j ). What is that matrix?
Proof. T (e j ) becomes the j-th column of the matrix. And T (e j ) is the directional
derivative of f in the e j direction at c. Therefore, it becomes D j f(c). ⊔
⊓
T (x − c) = ∇f(c)(x − c).
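To see what the approximation by T(x − c) = ∇f(c)(x − c) means in practice, here is a small numerical sketch. The function f(x, y) = (xy, x² + y), the point c = (1, 2), and the direction (0.6, 0.8) are illustrative choices, not taken from the lecture. The sketch computes the error of the linear approximation divided by ∥x − c∥, which should shrink as x approaches c.

```python
import math

def f(x, y):
    return (x * y, x**2 + y)

def grad_f(x, y):
    # Gradient matrix: row i is the gradient of the i-th component of f.
    return ((y, x), (2 * x, 1.0))

def ratio(c, d):
    """||f(c + d) - f(c) - grad_f(c) d|| / ||d|| for a displacement d."""
    A = grad_f(*c)
    fc = f(*c)
    fx = f(c[0] + d[0], c[1] + d[1])
    lin = tuple(fc[i] + A[i][0] * d[0] + A[i][1] * d[1] for i in range(2))
    err = math.hypot(fx[0] - lin[0], fx[1] - lin[1])
    return err / math.hypot(*d)

c = (1.0, 2.0)
ratios = [ratio(c, (h * 0.6, h * 0.8)) for h in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this particular f the ratio is proportional to h, so each tenfold reduction of ∥x − c∥ reduces the relative error tenfold.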
Question 8.2. Does the converse of the theorem hold? In other words, if f has partial
derivatives for all elements fi , and thus ∇f(c) exists, is f differentiable at c?
The answer is ”no.” In other words, even if the gradient matrix ∇f(c) exists, the
relation (8.1) may not hold. So, can we find a counterexample? And what additional
conditions are needed for the converse of the theorem to hold?
The chain rule is the rule for differentiating composite functions, and the explanation for it is shown in the above figure. Theorem 8.1 is a related theorem. Proving it is important, but understanding each part of the above figure clearly is no less important than proving it. Try to understand what the figure is saying by yourself first, and then compare it with the following explanation. In the figure, a function $g : \mathbb{R}^\ell \to \mathbb{R}^n$ is given. The function g maps its domain $\mathbb{R}^\ell$ into $\mathbb{R}^n$, which is the domain of the function f. Assume that the first function g is differentiable at $c \in \mathbb{R}^\ell$. Then, the derivative $\nabla g(c)$ (or the linear function H) is an n × ℓ matrix and is a linear approximation of g at the point c. Suppose that the second function $f : \mathbb{R}^n \to \mathbb{R}^m$ is also differentiable at the point $g(c) \in \mathbb{R}^n$. Then, the derivative $\nabla f(g(c))$ is an m × n matrix (or the linear function T), and it is a linear approximation of f at the point g(c).
The composite function f ◦ g is defined on $\mathbb{R}^\ell$ and takes values in $\mathbb{R}^m$, and the chain rule states that this composite function is differentiable at c. To prove this, we need to find a linear function approximating the composite function f ◦ g at the point c. What could it be? Obviously, it is the composition T ◦ H of the two linear functions. Expressed as matrices, it is given by the product of the two matrices as in Equation (8.1). The following is the essence of the differentiation formula: the Chain Rule.
Proof (Proof of The Chain Rule). The logic of the proof is similar to proving the
continuity of the composition of two continuous functions (Problem 4.14). We omit
it. ⊔
⊓
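Since the proof is omitted, a numerical check may help build confidence in the statement. The sketch below compares the gradient matrix of f ◦ g with the product ∇f(g(c)) ∇g(c), using central differences. The functions g and f, the point c, and the helper names `jacobian` and `matmul` are hypothetical choices made only for this illustration.

```python
def jacobian(func, p, h=1e-6):
    """Central-difference gradient matrix of func at p (rows = components)."""
    m = len(func(p))
    cols = []
    for j in range(len(p)):
        up = list(p); up[j] += h
        dn = list(p); dn[j] -= h
        fu, fd = func(up), func(dn)
        cols.append([(a - b) / (2 * h) for a, b in zip(fu, fd)])
    # cols[j] holds the j-th column; transpose into rows.
    return [[cols[j][i] for j in range(len(p))] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def g(p):            # g : R^2 -> R^3
    s, t = p
    return [s * t, s + t, s**2]

def f(q):            # f : R^3 -> R^2
    x, y, z = q
    return [x * y + z, y * z]

c = [1.0, 2.0]
lhs = jacobian(lambda p: f(g(p)), c)              # gradient matrix of f o g at c
rhs = matmul(jacobian(f, g(c)), jacobian(g, c))   # (2 x 3) times (3 x 2)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(lhs, rhs, max_diff)
```

Note the order of the product: the m × n matrix of f is multiplied by the n × ℓ matrix of g, which gives an m × ℓ matrix as expected for f ◦ g.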
Theorem 8.3 is the most general form of the chain rule. In the next section, we
consider three cases where the dimension of the starting space Rℓ is ℓ = 1, 2, 3.
The dimension of the intermediate space Rn is not so important, so we fix n = 3. We
consider the case where the destination space Rm is m = 1 so that we can handle each
component fi separately. Although the problem considers vector-valued functions f,
in the explanation, we consider the case where m = 1 for simplicity of notation,
considering only one component of f.
In this section, we will examine the relationship between the graph of a function
f and the graph of its differential T . We can clearly see the graph of a function
f : Rn → Rm only when n + m ≤ 3, so we can explicitly draw figures for the three
cases of (m, n) = (1, 1), (1, 2), and (2, 1). Still, when the dimension is higher, we
simply imagine that they would look similar.
Question 8.3. What does the graph of a linear function T : Rn → Rm look like?
Explaining the answer to the above question is a good exercise. Let’s consider the
cases of (m, n) = (1, 2) and (2, 1) first, and then describe the general case.
Zero-level set
Next, we describe the graphs of multivariable functions and their differentials using
the zero-level set. Let f : R2 → R be differentiable at c ∈ R2 . The graph of the
function f is the set of points where z = f (x, y). Define the scalar function F : R3 →
R as follows:
F(x, y, z) = f (x, y) − z.
Then, the graph of the function f (x, y) becomes the zero-level set of F. Let
c = (c1 ; c2 ), and c̃ := (c; f (c)) = (c1 , c2 , f (c1 , c2 )). Then, the gradient of F at c̃
is ∇F(c̃) = ( fx (c), fy (c), −1).
Solution 8.2 What should we show? We will demonstrate that the tangent vector of every curve on the zero-level set passing through c is orthogonal to ∇G(c). (Is it good?) Let r(t) be a curve on the zero-level set for t ∈ (−ε, ε) with r(0) = c. Then, since G(r(t)) = 0 for all t,
\[
\frac{d}{dt}\,G(r(t)) = \nabla G(r(t)) \cdot r'(t) = 0
\]
(using the chain rule from the next lecture). Thus, ∇G(r(0)) · r′(0) = ∇G(c) · r′(0) = 0, and therefore r′(0) is orthogonal to ∇G(c). □
Problem 8.3. Discuss the relationship between a function f : R → R and its graph
when f is differentiable at c ∈ R.
Considering the general case f : Rn → Rm will be helpful for the following prob-
lem.
Solution 8.5 The graph of T is the set of points (x; y) ∈ Rn+m satisfying y = T x.
Define the function F : Rn+m → Rm as F(x, y) = T x − y. Then, the graph of T is the
zero-level set of F. Let T = (ai j ) (1 ≤ i ≤ m, 1 ≤ j ≤ n). Then, the gradient of F is
given by
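\[
\nabla F = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & -1 & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & a_{2n} & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & 0 & 0 & \cdots & -1
\end{pmatrix}_{m\times(n+m)}.
\]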
Question 8.5. For Problem 8.5 to hold, one or two conditions are necessary for T .
Which of the following conditions are needed?
(i) The rows of the matrix (ai j ) are linearly independent. (ii) There are no zero
rows in (ai j ).
(iii) n ≥ m. (iv) m ≥ n.
For Problem 8.5 to hold, conditions (i) and (ii) are necessary. The graph of T
becomes an n-dimensional plane in Rn+m and m vectors perpendicular to that plane
can be found only when the rows of the matrix (ai j ) are linearly independent and
none of them are zero rows. (But if there are zero rows, they are not linearly inde-
pendent anyway, so only condition (i) is sufficient.)
Problem 8.6. Discuss the relationship between a function f : Rn → Rm and the graph
of its differential at c ∈ Rn when f is differentiable at c.
Lecture 9
Line Integral
In Calculus 1, integration was defined for functions defined on an interval [a, b].
Now, let’s consider a real-valued function f : R3 → R defined in three-dimensional
space. Suppose there is a curve C lying in this space. Even if this curve is curved,
from the perspective of the curve or tiny organisms living on it, there is no distinction
between straight and curved lines. It’s like how we perceive ourselves living on a
spherical Earth but feel like we’re living on a flat plane. In this situation, we can
consider the function f as defined on this curve C as if it were a one-dimensional
function, and integrating in this scenario is what we call a line integral. For example,
if the function is f = 1, then the integral should be the length of the curve C. It’s
very difficult for us, from the outside, to directly integrate this. It’s hard to determine
where the curve passes and what values it takes there. In most cases, as shown in
the figure above, integration is done when the curve is parameterized by r(t). How
is the line integral given when using a parameter?
Question 9.1. The integral $\int_a^b f(r(t))\,dt$ of the composite function f(r(t)) is not the line integral we desire. Why is that? It's easy to see. For example, when f = 1, the result is not the length of the curve C but the length of the interval [a, b].
where ∆ xi is the length of the ith interval of the partition of the curve, and f (xi ) is
the value of the function at the endpoint of that subinterval. When doing Riemann
integration, we can choose any point inside the subinterval.
If the curve C is given by a parameter t ∈ [a, b] using r(t), then we can express
and define the line integral as follows, after taking a partition of the interval [a, b] as
Problem 9.1. Explain the difference between the limit (9.2) and the integral of the
composite function f (r(t)).
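Solution 9.1 The integral $\int_a^b f(r(t))\,dt$ satisfies the following relationship:
\[
\int_a^b f(r(t))\,dt = \lim_{\|\pi\|\to 0} \sum_{i=1}^{N} f(r(t_i))\,\Delta t_i, \qquad \Delta t_i = t_{i+1} - t_i.
\]
The difference from the limit (9.2) is that $\Delta x_i$ and $\Delta t_i$ do not match: $\Delta x_i$ is a length measured along the curve, while $\Delta t_i$ is a length in the parameter interval. □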
9.2 Expansion rate of a curve 71
It’s most convenient to remember that integrating speed gives distance. Thinking
of it as the rubber band expanding rather than the speed also has its usefulness.
Especially when the dimension of the space Rℓ increases, it becomes particularly useful because it's impossible to think in terms of speed in higher dimensions. $\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}$ represents how much the rubber band around the segment [a, b] ⊂ Rℓ expands around r(t) as t moves nearby. Let's calculate how much it expands using the definition of full differential. According to the definition, for any given ε > 0 and sufficiently small h, the following holds:
\[
\left| \frac{\|r(t+h) - r(t)\|}{|h|} - \frac{\|Th\|}{|h|} \right| < \varepsilon.
\]
Here, $\dfrac{\|r(t+h) - r(t)\|}{|h|}$ represents how many times the original rubber band, of length |h|, expands due to the original function r. Of course, the image may not be a straight line, but for sufficiently small h, we can think of it as a straight line. On the other hand, $\dfrac{\|Th\|}{|h|}$ represents how many times it expands due to the linear function or by differentiation. Therefore, the above expression means that the difference between the two is less than any ε > 0. That is, if h is sufficiently small, instead of calculating with r, we can calculate it with the linear function T = ∇r(t). Then we get:
\[
\frac{\|Th\|}{|h|} = \|\nabla r(t)\| = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}.
\]
Problem 9.2. If r(t) is differentiable and v(t) = r′(t), then show that the following holds:
\[
\int_C f = \int_a^b f(r(t))\,\|v(t)\|\,dt. \tag{9.3}
\]
is satisfied. Substituting this into the definition of the line integral (9.2), we get
\[
\int_C f = \lim_{\|\pi\|\to 0} \sum_{i=1}^{N} \int_{t_i}^{t_{i+1}} f(r(t_i))\,\|v(t)\|\,dt = \int_a^b f(r(t))\,\|v(t)\|\,dt.
\]
□
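The role of the factor ∥v(t)∥ in (9.3) can be seen in a short numerical sketch. As an illustrative assumption, take the circle of radius 2 parameterized by r(t) = (2 cos t, 2 sin t, 0) on [0, 2π] and f = 1, so the line integral should be the circumference 4π. The helper `midpoint` is a plain midpoint-rule sum written for this example.

```python
import math

def r(t):
    return (2 * math.cos(t), 2 * math.sin(t), 0.0)

def speed(t):
    # ||v(t)|| = ||r'(t)|| for the circle above
    return math.hypot(-2 * math.sin(t), 2 * math.cos(t))

def f(p):
    return 1.0

def midpoint(func, a, b, n=1000):
    """Midpoint Riemann sum of func over [a, b] with n equal sub-intervals."""
    h = (b - a) / n
    return sum(func(a + (i + 0.5) * h) for i in range(n)) * h

a, b = 0.0, 2 * math.pi
naive = midpoint(lambda t: f(r(t)), a, b)             # integral of f(r(t)) dt
line = midpoint(lambda t: f(r(t)) * speed(t), a, b)   # formula (9.3)
print(naive, line)
```

The first number is the length of the parameter interval, 2π, while the second is the length of the curve, 4π.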
If the curve C is parameterized by r(t) using a parameter, and the function r(t) is
differentiable, then the curve C is called smooth.
Let’s consider one specific case of the Chain Rule Theorem 8.3, which is ℓ = 1.
We fix the intermediate space Rn to n = 3. Thus, g is a vector-valued function with
three components. Consider a scalar-valued function f . The figure corresponding
to this section is as follows. In this section, we use t as the variable of $\mathbb{R}^{\ell=1}$. The independent variables in the space $\mathbb{R}^{n=3}$ are denoted as (x, y, z). Therefore, we express g(t) as $g(t) = (x(t), y(t), z(t))^{T}$.
Problem 9.3. Given $w = xy + z$ and $(x, y, z) = (2\cos t,\ 2\sin t,\ 5\cos^2 t)$, find $\dfrac{dw}{dt}$.
Solution 9.3 First, we need to interpret the problem correctly. It's essential to practice viewing the given problem from the perspective of the chain rule. Here, w is a function of (x, y, z), and (x, y, z) is a function of t. Finding $dw/dt$ means differentiating the composition of w and g(t) = (x(t); y(t); z(t)). Here, we abuse notation again. According to the Chain Rule Theorem 8.3, we have $w = f(x, y, z) = xy + z$ and $g(t) = (2\cos t;\ 2\sin t;\ 5\cos^2 t)$. Therefore,
\[
\nabla f(g(t)) = \nabla f\,\big|_{g(t)} = (y, x, 1)\,\big|_{g(t)} = (2\sin t,\ 2\cos t,\ 1),
\]
9.4 Directional derivative and Chain rule 73
\[
\nabla g(t) = g'(t) = \begin{pmatrix} -2\sin t \\ 2\cos t \\ -10\cos t\,\sin t \end{pmatrix}.
\]
Therefore, $\dfrac{dw}{dt} = \nabla f(g(t)) \cdot \nabla g(t) = -4\sin^2 t + 4\cos^2 t - 10\sin t\,\cos t$. □
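The answer of Solution 9.3 can be compared with a direct numerical derivative of the composite function. This is only a sanity-check sketch: the function names and the sample value t0 = 0.7 are chosen here for illustration.

```python
import math

def w_of_t(t):
    # The composition w(g(t)) with g(t) = (2 cos t, 2 sin t, 5 cos^2 t)
    x, y, z = 2 * math.cos(t), 2 * math.sin(t), 5 * math.cos(t) ** 2
    return x * y + z

def dw_dt_chain(t):
    grad_f = (2 * math.sin(t), 2 * math.cos(t), 1.0)    # (y, x, 1) at g(t)
    g_prime = (-2 * math.sin(t), 2 * math.cos(t),
               -10 * math.cos(t) * math.sin(t))          # g'(t)
    return sum(a * b for a, b in zip(grad_f, g_prime))   # row times column

def dw_dt_numeric(t, h=1e-6):
    return (w_of_t(t + h) - w_of_t(t - h)) / (2 * h)     # central difference

t0 = 0.7
print(dw_dt_chain(t0), dw_dt_numeric(t0))
```

Both values agree up to the accuracy of the finite difference, which supports reading dw/dt as "gradient at g(t) times g′(t)".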
In Problem 9.3, we attempted to use the theorem directly. It's crucial to grasp the meaning of the theorem through repeated practice. However, since that always takes time, we write the equation (8.1) as follows for convenience:
\[
\frac{df}{dt} = \nabla f\,\big|_{g(t)} \cdot \nabla g\,\big|_{t}
= (f_x, f_y, f_z) \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix}
= \frac{\partial f}{\partial x}\,x'(t) + \frac{\partial f}{\partial y}\,y'(t) + \frac{\partial f}{\partial z}\,z'(t).
\]
Here, we again abuse notation. $\dfrac{df}{dt}$ on the left-hand side means that f is thought of as a function of t. In other words, it means the composition of f and g. On the right-hand side, $\dfrac{\partial f}{\partial x}$ means that f is thought of as a function of x, y, z. The following expression is clearer.
\[
\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}. \tag{9.4}
\]
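Question 9.2. What if we use a vector-valued function $\mathbf{f}$? Nothing changes. Just replace f with $\mathbf{f}$ and think of everything as row vectors. The formula is simply as follows.
\[
\frac{d\mathbf{f}}{dt} = \frac{\partial \mathbf{f}}{\partial x}\frac{dx}{dt} + \frac{\partial \mathbf{f}}{\partial y}\frac{dy}{dt} + \frac{\partial \mathbf{f}}{\partial z}\frac{dz}{dt}.
\]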
(The formula for the directional differential is easy to remember. It’s simply ”gradi-
ent matrix times direction.”)
\[
\frac{d}{dt}\,\mathbf{f}(c + tu)\,\Big|_{t=0} = \nabla \mathbf{f}\,\big|_{g(0)=c}\ \nabla g\,\big|_{0} = \nabla \mathbf{f}(c)\,u.
\]
□
Question 9.3. Eqn. (9.5) shows how to compute the directional derivative Du f(c)
using partial derivatives. Does the existence of partial derivatives always imply the
existence of the directional derivative Du f(c)? No, it doesn’t. There’s an important
condition to remember. What is it?
\[
\nabla f(g(0)) = \nabla f(c) = (y, x, 1)\,\big|_{c} = (1, 2, 1), \qquad
\nabla g(0) = u = \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}.
\]
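Problem 9.6. Let $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2$ be given by $\mathbf{f}(x, y, z) = (xy + z,\ yz + x)$, and let the direction vector be $u = \left(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\right)$. Find the directional differential $D_u \mathbf{f}$ at c = (2, 1, 0).
Solution 9.6 Here, even though f is a vector-valued function, we can still calculate it in the same way. That is,
\[
\nabla \mathbf{f}(c) = \begin{pmatrix} y & x & 1 \\ 1 & z & y \end{pmatrix}\bigg|_{c} = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \qquad
D_u \mathbf{f}(c) = \nabla \mathbf{f}(c)\,u = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix} = \begin{pmatrix} 4/\sqrt{3} \\ 2/\sqrt{3} \end{pmatrix}.
\]
□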
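The rule "gradient matrix times direction" from Solution 9.6 can be cross-checked against the definition of the directional differential as a derivative along the line c + tu. The sketch below does both computations; the variable names are illustrative, and the finite-difference step h is an arbitrary small value.

```python
import math

def f(x, y, z):
    return (x * y + z, y * z + x)

def grad_f(x, y, z):
    # Rows are the gradients of the two components of f.
    return ((y, x, 1.0), (1.0, z, y))

c = (2.0, 1.0, 0.0)
u = (1 / math.sqrt(3),) * 3

# Gradient matrix times direction vector.
Du = tuple(sum(row[j] * u[j] for j in range(3)) for row in grad_f(*c))

# Central difference of t -> f(c + t u) at t = 0.
h = 1e-6
fp = f(*(c[j] + h * u[j] for j in range(3)))
fm = f(*(c[j] - h * u[j] for j in range(3)))
Du_numeric = tuple((a - b) / (2 * h) for a, b in zip(fp, fm))

print(Du, Du_numeric)
```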
The directional derivative Du f(c) is for the case where u is a unit vector with
∥u∥ = 1. However, even when u is not a unit vector, we define Du f using (11.1).
But when ∥u∥ ≠ 1, calling Du f the directional derivative is not exactly correct. To
compare with the directional derivative, let’s perform the calculation:
\[
D_{\frac{u}{\|u\|}}\,\mathbf{f}(c) = \frac{1}{\|u\|}\,(u_1 D_1 + \cdots + u_n D_n)\,\mathbf{f}(c) = \frac{1}{\|u\|}\,D_u \mathbf{f}(c)
\]
The purpose of this lecture is to find the maximum and minimum values of a differ-
entiable function f : Rn → R over the entire domain or within a region on a surface
or curve. In the case of surfaces or curves, we consider two cases: when they are
given by parameters or by level sets.
Problem 10.1. Suppose f (x0 ) is a local extremum and the directional derivative
Du f (x0 ) exists in the direction of u. Show that Du f (x0 ) = 0.
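Solution 10.1 Let's only consider the case where f(x0) is a local maximum. In this case, for sufficiently small h > 0,
\[
\frac{f(x_0 + hu) - f(x_0)}{h} \le 0.
\]
Therefore,
\[
D_u f(x_0) = \lim_{h\to 0^+} \frac{f(x_0 + hu) - f(x_0)}{h} \le 0.
\]
Similarly, for h < 0, the difference quotient is nonnegative, so $D_u f(x_0) \ge 0$. Hence $D_u f(x_0) = 0$.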
78 10 Finding Extreme Values
Question 10.1. How would we prove the case where f(x0) is a local minimum? There are two methods. One is to follow the above proof but reverse the direction of the inequalities, and the other is to use the fact that g(x) = −f(x) has a local maximum at x0.
We will learn how to determine and find the maximum and minimum values of
a multivariable function f : Rn → R. Let’s start by reviewing the case of single-
variable functions and think about its significance.
Problem 10.2. Find the critical points of the function f (x) = x2 + 3x − 1 and deter-
mine whether these critical points are local maxima or minima.
This vector consists of first-order partial derivatives. The Hessian is the derivative
of the derivative. More precisely, we first take the transpose of ∇ f and then take the
gradient of the resulting column vector. This gives us a square matrix consisting of
second-order partial derivatives:
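\[
H f(x_0) = \nabla \begin{pmatrix} D_1 f(x) \\ \vdots \\ D_n f(x) \end{pmatrix}\Bigg|_{x = x_0}
= \begin{pmatrix}
D_{11} f(x_0) & \cdots & D_{1n} f(x_0) \\
\vdots & \ddots & \vdots \\
D_{n1} f(x_0) & \cdots & D_{nn} f(x_0)
\end{pmatrix}.
\]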
It's not enough for the second derivatives to exist; we also need the second-order partial derivatives to be continuous. If the function is twice continuously differentiable,
10.2 Criterion for maximum, minimum, and saddle 79
then Di j f = D ji f , so the Hessian matrix is symmetric, meaning the ith row and ith
column are the same.
This is where linear algebra comes in. According to linear algebra, if A is an
n × n symmetric matrix, it has n eigenvalues λi and corresponding eigenvectors xi
for i = 1, · · · , n, satisfying:
Axi = λi xi .
This means that if we only consider the direction of xi , multiplying the matrix A
is the same as multiplying by λi . In other words, this is the case where the second
derivative of the function is λi along this direction. Furthermore, the eigenvectors xi
are orthogonal to each other, meaning that if we rotate the coordinate axes appropri-
ately, we can make them coincide with the basic axis direction ei .
If x0 is a critical point and all eigenvalues of its Hessian are positive, the function f has a local minimum at x0. If they are all negative, it has a local maximum, and if some are positive and some are negative, it is a saddle point. If zero is included among the eigenvalues, this test is inconclusive. Can you explain why this is the case, comparing it to the case of single-variable functions?
Question 10.2. There is one missing condition in the above explanation. What is it?
Now let’s consider the case where f : R2 → R. Then, the Hessian matrix is a 2 × 2
matrix:
\[
H f(x_0) = \begin{pmatrix} f_{xx}(x_0) & f_{xy}(x_0) \\ f_{xy}(x_0) & f_{yy}(x_0) \end{pmatrix}.
\]
For a 2 × 2 matrix, there are two eigenvalues. If both eigenvalues of H f (x0 ) are
positive, then f (x0 ) is a local minimum. If one is positive and the other is negative,
it is a saddle point. If both are negative, f (x0 ) is a local maximum.
If we denote the two eigenvalues as λ1 and λ2 , their product and sum are as
follows:
\[
\lambda_1 \lambda_2 = f_{xx}(x_0)\,f_{yy}(x_0) - f_{xy}^2(x_0), \qquad \lambda_1 + \lambda_2 = f_{xx}(x_0) + f_{yy}(x_0).
\]
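The product and the sum of the eigenvalues are enough to read off their signs, so the 2 × 2 test can be written without computing the eigenvalues themselves. The following is a minimal sketch, assuming x0 is already known to be a critical point and the second-order partial derivatives are continuous there. The function name `classify` and the sample inputs are hypothetical.

```python
def classify(fxx, fxy, fyy):
    """Classify a critical point from the entries of its 2 x 2 Hessian."""
    det = fxx * fyy - fxy**2   # equals lambda1 * lambda2
    tr = fxx + fyy             # equals lambda1 + lambda2
    if det > 0:
        # Same sign: both positive if the sum is positive, else both negative.
        return "local minimum" if tr > 0 else "local maximum"
    if det < 0:
        # Opposite signs.
        return "saddle point"
    return "inconclusive"      # at least one eigenvalue is zero

print(classify(2, 0, 2))     # Hessian of x^2 + y^2 at the origin
print(classify(2, 0, -2))    # Hessian of x^2 - y^2 at the origin
print(classify(-2, 0, -2))   # Hessian of -x^2 - y^2 at the origin
print(classify(0, 0, 0))     # Hessian of x^4 + y^4 at the origin
```

When the product is positive, the sum cannot be zero, so the two branches in the first case cover every possibility.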
Solution 10.3 ⊔
⊓
Solution 10.4 We can find the critical points, but there is currently no way to de-
termine whether these points are maximum, minimum, or saddle points in three-
dimensional space. ⊔⊓
Solution 10.5 To solve the problem, we need something more. Comparing only
critical points is not enough because maximum or minimum values can occur at the
boundary even if they are not critical points. We will cover what is needed in the
next section. ⊔ ⊓
Problem 10.6. Find the point on the 3-dimensional curve r(t) = (cos t, sin t, t) closest to the origin.
Solution 10.6 The function representing the distance between a point x = (x, y, z) and the origin is $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$. If f(x0) is a critical value, then h(x0) is also a critical value of the function $h(x, y, z) = x^2 + y^2 + z^2$, and the computation is simpler than finding critical values of f. Then $h(t) := h(r(t)) = \cos^2 t + \sin^2 t + t^2 = t^2 + 1$, and h′(t) = 2t. Therefore, t = 0 is the only critical point. Since h′′(t) = 2 > 0 for all t, this is a global minimum. In the original variables, (x, y, z) = r(0) = (1, 0, 0) is the closest point to the origin on the curve. □
10.4 Extreme values on level-set; Lagrange multiplier 81
Problem 10.7. Find the point on the plane x + y − z − 1 = 0 closest to the origin.
Solution 10.7 In the above solution, we used the same variables (x, y) for Rℓ space.
If this is inconvenient, we can use g(u, v) = (u, v, u + v − 1). Then the combination
is as follows:
S = {x ∈ Rn : g(x) = 0}.
Here, ∇g(x) is perpendicular to the surface. In this situation, the given problem is
to find the maximum and minimum of the function f : Rn → R on the surface S. It
is important to note that these are extreme values on the surface S, not in the entire
space Rn . In this situation, the Lagrange multiplier method is appropriate. The key
idea of this method is as follows:
“The vector ∇ f (x) represents the direction in which f increases most rapidly.
Similarly, if you move in the direction of −∇ f (x), f decreases most rapidly. If
f has an extreme value at the point x0 on the surface S, then ∇ f (x0 ) must be
perpendicular to the surface. Otherwise, ∇ f (x0 ) will have a tangential compo-
nent to the surface, and the value may increase or decrease as the point moves
along the surface.”
Problem 10.8. Find the maximum and minimum values of the function f(x, y) = xy on the ellipse
\[
\frac{x^2}{8} + \frac{y^2}{2} = 1.
\]
Solution 10.8 The constraint function is $g(x, y) = \dfrac{x^2}{8} + \dfrac{y^2}{2}$. The relation ∇f = λ∇g leads to:
\[
y = \lambda\,\frac{x}{4}, \qquad x = \lambda y.
\]
This results in $4y = \lambda^2 y$. Therefore, either y = 0 or λ = ±2. If y = 0, then x = λ · 0 = 0. Thus, (0, 0) is one possible point, but it does not lie on the ellipse.
Consider the cases λ = ±2. Since x = λy = ±2y, we have:
\[
\frac{4y^2}{8} + \frac{y^2}{2} = 1 \ \Rightarrow\ y = \pm 1.
\]
And since x = λy, the four possible points are (2, 1), (−2, 1), (2, −1), (−2, −1). Therefore, comparing the values at these four points is sufficient: f(2, 1) = f(−2, −1) = 2 is the maximum and f(−2, 1) = f(2, −1) = −2 is the minimum. □
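As a cross-check of Solution 10.8, the ellipse can also be scanned directly. The sketch below uses the parameterization (√8 cos θ, √2 sin θ), which is an assumption made for this check and is not the Lagrange multiplier method itself, and compares the scanned extremes with the values at the four candidate points.

```python
import math

def f(x, y):
    return x * y

def on_ellipse(theta):
    # A point satisfying x^2/8 + y^2/2 = 1
    return (math.sqrt(8) * math.cos(theta), math.sqrt(2) * math.sin(theta))

n = 3600
values = [f(*on_ellipse(2 * math.pi * k / n)) for k in range(n)]
scan_max, scan_min = max(values), min(values)

candidates = [(2, 1), (-2, 1), (2, -1), (-2, -1)]   # from the Lagrange conditions
cand_values = [f(x, y) for x, y in candidates]

print(scan_max, scan_min, cand_values)
```

The scan and the four candidate points give the same maximum and minimum, 2 and −2.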
Problem 10.9. Find the maximum and minimum values of the function f (x, y) =
3x + 4y on the circle x2 + y2 = 1.
Solution 10.9 ⊔
⊓
S = {x ∈ R3 : x + y + z = 1, x2 + y2 = 1}
Solution 10.10 The square of the distance to the origin is calculated as f (x, y, z) =
x2 +y2 +z2 . The distance is minimized when the square of the distance is minimized.
Therefore, the relation ∇ f = α∇g1 + β ∇g2 is as follows:
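\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}.
\]
Then,
\[
\begin{cases} x = \alpha + \beta x \\ y = \alpha + \beta y \\ z = \alpha \end{cases}
\quad\Rightarrow\quad
\begin{cases} (1 - \beta)x = \alpha \\ (1 - \beta)y = \alpha \\ z = \alpha \end{cases}
\]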
The domain D is a convex set. ((k+1)-times differentiable means that it can be differentiated up to k+1 times and these derivatives are all continuous functions. Such a function is also called (k+1)-times continuously differentiable.) Taylor's formula for multi-variable functions is very similar to the case of single-variable functions if you know how to compute directional derivatives. Let's start by considering why the domain D needs to be convex.
First, let’s review the case for single variables. Let (a, b) ⊂ R be an open interval
and let F : (a, b) → R be a function differentiable (k+1) times. Taylor’s theorem
states that there exists a point s between c and x such that the following holds:
There are a total of k + 2 terms in the above expression, and the first k + 1 terms,
86 11 Taylor’s Formula for Multi-Variable Functions
\[
R(x) := \frac{F^{(k+1)}(s)}{(k+1)!}\,(x - c)^{k+1},
\]
represents the difference between F(x) and the kth degree polynomial $p_k(x)$. Although we don't know the exact value of s, this expression lets us bound the error within a certain range. The maximum possible error is given by:
\[
\text{Maximum Error} = \max_{s \in (c, x)} \left| \frac{F^{(k+1)}(s)}{(k+1)!}\,(x - c)^{k+1} \right|.
\]
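A small numerical sketch may make the maximum error concrete. As an illustrative choice, take F = sin, c = 0 and k = 2, so that p_2(x) = x and |F'''(s)| = |cos s| ≤ 1. The sample points in `xs` are arbitrary.

```python
import math

def p2(x):
    # Degree-2 Taylor polynomial of sin at c = 0: sin(0) + cos(0) x - sin(0) x^2 / 2
    return x

def remainder_bound(x):
    # |F'''(s)| = |cos s| <= 1, so |R(x)| <= |x|^3 / 3!
    return abs(x) ** 3 / math.factorial(3)

xs = [0.1, 0.5, 1.0]
rows = [(x, abs(math.sin(x) - p2(x)), remainder_bound(x)) for x in xs]
for x, err, bound in rows:
    print(f"x = {x}: actual error = {err:.6f}, bound = {bound:.6f}")
```

In each row the actual error stays below the bound, even though the point s itself is never computed.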
Solution 11.1 For vector functions, Taylor's polynomial can be written in the same way as for scalar functions, but there cannot be an error term of the same form. Instead, we should consider the directional derivative.
Let g(t) = c + tu, then f(c + tu) = f ◦ g(t) and ∇g(t) = u. Therefore, using the chain rule, we have:
\[
D_u \mathbf{f}(c) = \frac{d}{dt}\,\mathbf{f}(c + tu)\,\Big|_{t=0} = \nabla \mathbf{f}(c + tu)\,u\,\Big|_{t=0} = \nabla \mathbf{f}(c)\,u.
\]
11.2 Higher order directional derivatives 87
More generally, computing the kth order directional derivative in the direction u is
equivalent to applying the operator (u1 D1 + · · · + un Dn )k to the function f:
Problem 11.2. Let f (x, y) = sin x sin y and u = (u1 ; u2 ) be given. Compute the di-
rectional derivatives of f at the origin in the direction u up to the third order.
Solution 11.2 The 0th derivative is f (0, 0) = 0. The 1st derivative is:
Du f (0, 0) = (u1 D1 f (0, 0)+u2 D2 f (0, 0)) = (u1 cos(0) sin(0)+u2 sin(0) cos(0)) = 0.
⊔
⊓
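For this particular f the operator $(u_1 D_1 + u_2 D_2)^k$ can be expanded by the binomial formula, because the mixed partial derivatives of a smooth function commute and $D_1^a D_2^b f(0, 0) = \sin^{(a)}(0)\,\sin^{(b)}(0)$. The sketch below uses this product structure, which is specific to f(x, y) = sin x sin y. The function names are hypothetical.

```python
from math import comb

def sin_deriv_at_0(n):
    # n-th derivative of sin at 0 cycles through 0, 1, 0, -1
    return (0, 1, 0, -1)[n % 4]

def directional_derivative(k, u1, u2):
    """(u1 D1 + u2 D2)^k f(0, 0) for f(x, y) = sin x sin y."""
    return sum(
        comb(k, a) * u1**a * u2**(k - a)
        * sin_deriv_at_0(a) * sin_deriv_at_0(k - a)
        for a in range(k + 1)
    )

u1, u2 = 3, 5   # any direction; integers keep the arithmetic exact
for k in range(4):
    print(k, directional_derivative(k, u1, u2))
```

Only the second-order term is nonzero, and it equals 2 u1 u2. The 0th, 1st and 3rd order terms vanish.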
Solution 11.3 ⊔
⊓
where
\[
p_k(c, x) := \sum_{\ell=0}^{k} \frac{\big((x_1 - c_1)D_1 + \cdots + (x_n - c_n)D_n\big)^{\ell}\,\mathbf{f}(c)}{\ell!},
\]
\[
R_k(c, x) := \frac{\big((x_1 - c_1)D_1 + \cdots + (x_n - c_n)D_n\big)^{k+1}\,\mathbf{f}(s)}{(k+1)!}.
\]
Then, since D is convex and open, there exists a small positive number ε > 0 such
that c + t(x − c) ∈ D for t ∈ (−ε, 1 + ε), and thus F is k + 1 times continuously
differentiable on (−ε, 1 + ε). Therefore, by the 1-variable Taylor theorem, there
exists 0 < s < 1 such that
The derivative $F^{(\ell)}(0)$ is given by the ℓth directional derivative of f at c in the direction (x − c) (more precisely, if ∥x − c∥ ≠ 1, it's not a directional derivative). That is,
11.3 Taylor’s formula for n variable functions 89
Problem 11.5. Find the 2nd order approximation of the function f (x, y) = sin x sin y
at the origin. Estimate the error of the 2nd order approximation when |x| < 0.1 and
|y| < 0.1.
Solution 11.5 Let’s use the calculations from Problem 11.3. In this case, it corre-
sponds to the case (u1 , u2 ) = (x, y). We already saw that the 0th and 1st terms are 0.
The 2nd term is 2xy. Therefore, the 2nd order approximation is
\[
p_2(x, y) = \frac{1}{2!}\,2xy = xy.
\]
The approximation error is given by:
\[
|f(x, y) - xy| = \frac{1}{3!}\,\big| x^3 \cos s \sin t + 3x^2 y \sin s \cos t + 3x y^2 \cos s \sin t + y^3 \sin s \cos t \big|.
\]
Since the sine and cosine factors are at most 1 in absolute value, and |x| and |y| are less than or equal to 0.1, we can obtain the following estimate:
\[
|f(x, y) - xy| \le \frac{1}{3!} \cdot 8\,(0.1)^3 \cdot 1^2 = \frac{0.008}{6}.
\]
This calculation is simple and suitable as an example. Generally, to estimate the error manually, many calculations are required and it takes a long time. □
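The estimate 0.008/6 can be compared with the actual error on a grid. The sketch below samples the square |x| ≤ 0.1, |y| ≤ 0.1 on a 41 × 41 grid, where the grid size is an arbitrary choice, and records the largest value of |sin x sin y − xy|.

```python
import math

bound = 8 * 0.1**3 / math.factorial(3)   # the estimate 0.008 / 6

n = 41
grid = [-0.1 + 0.2 * i / (n - 1) for i in range(n)]
max_err = max(abs(math.sin(x) * math.sin(y) - x * y)
              for x in grid for y in grid)

print("bound:", bound)
print("largest sampled error:", max_err)
```

The sampled error stays below the bound, as it must. It is also noticeably smaller than the bound, which is worth keeping in mind for the next problem.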
Problem 11.6. The answer given in Problem 11.5 matches the answer in Thomas’
14th edition. However, since D3u f (0, 0) = 0, it is not the optimal answer. What is the
optimal answer?
Solution 11.6 Can you understand what I’m saying? If you understand the Taylor
formula, you should be able to understand it. Need more hints? If so, does the 2nd
order approximation xy correspond to p2 or p3 ? ⊔⊓
Part III
Integration of Multi-variable Functions
Lecture 12
Double and Iterated Integrals on Rectangular
Coordinates
\[
\pi = \{x_0, x_1, \cdots, x_p\}
\]
is a partition of the interval [a, b] if the first point is $x_0 = a$, the last point is $x_p = b$, and the points in between satisfy
\[
x_0 < x_1 < \cdots < x_p.
\]
Then, we can consider p sub-intervals. The i-th sub-interval is $[x_{i-1}, x_i]$. The length of a sub-interval is denoted as follows:
\[
\Delta x_i := x_i - x_{i-1}, \qquad i = 1, \cdots, p.
\]
The gauge (or mesh) of the partition is defined and denoted as follows:
\[
\|\pi\| := \max_{1 \le i \le p} \Delta x_i.
\]
This is the size of the largest sub-interval. Therefore, as ∥π∥ → 0, it means that the size of all sub-intervals converges to 0. The Riemann sum of the function f for a given partition π is defined as follows:
\[
S(\pi, \{c_i\}) := \sum_{i=1}^{p} f(c_i)\,\Delta x_i, \qquad c_i \in [x_{i-1}, x_i].
\]
It is determined by the given partition π and the choice of point $c_i$ in each sub-interval. Finally, the Riemann integral of the function f over the interval [a, b] is denoted and defined as follows:
\[
\int_a^b f(x)\,dx := \lim_{\|\pi\|\to 0} S(\pi, \{c_i\}).
\]
If the limit exists regardless of how the choices ci ∈ [xi−1 , xi ] are made, we say that
the function f is Riemann integrable over the interval [a, b]. The following theorem
introduces examples of functions for which Riemann integrals are possible.
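The definition can be tried out directly. The sketch below is an illustration with assumed choices: f(x) = x² on [0, 1], uniform partitions with p = 10, 100, 1000, and a randomly chosen point c_i in each sub-interval. The helper names and the fixed random seed are not from the lecture.

```python
import random

def riemann_sum(f, points, tags):
    """S(pi, {c_i}) = sum of f(c_i) * (x_i - x_{i-1})."""
    return sum(f(c) * (points[i + 1] - points[i]) for i, c in enumerate(tags))

def random_tagged_partition(a, b, p, rng):
    # Uniform partition x_0 < ... < x_p with a random c_i in each sub-interval.
    points = [a + (b - a) * i / p for i in range(p + 1)]
    tags = [rng.uniform(points[i], points[i + 1]) for i in range(p)]
    return points, tags

rng = random.Random(0)      # fixed seed so the run is reproducible
f = lambda x: x**2
approx = [riemann_sum(f, *random_tagged_partition(0.0, 1.0, p, rng))
          for p in (10, 100, 1000)]
print(approx)               # values approach 1/3 as the mesh shrinks
```

Whatever points c_i are drawn, the sums are squeezed toward the same value 1/3, which is what Riemann integrability of x² on [0, 1] means.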
Next, let’s examine some examples of functions for which Riemann integrability
fails.
Problem 12.1. It is possible to create a bounded function that is not Riemann integrable, but such examples are usually constructed in unusual ways. Show that the following bounded function f : [0, 1] → R is not integrable:
\[
f(x) = \begin{cases} 0, & \text{if } x \in \mathbb{Q}, \\ 1, & \text{otherwise.} \end{cases}
\]
Problem 12.2. Among unbounded functions, there are many natural functions that are not Riemann integrable. The integrability depends on how the function behaves near the point where it diverges. Show that the function
\[
f(x) = \begin{cases} x^r, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0 \end{cases}
\]
Solution 12.2 For all r < 0, the function f diverges near x = 0. When r is close to 0, more precisely when −1 < r < 0, the integration is still possible (as an improper integral), but further away, when r ≤ −1, the integral diverges. □
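The threshold r = −1 can be observed from the exact value of the integral of x^r over [ε, 1] as ε → 0. This sketch assumes the interval of integration is [0, 1], which the truncated problem statement does not specify, and the function name `integral_from_eps` is hypothetical.

```python
import math

def integral_from_eps(r, eps):
    """Exact value of the integral of x^r over [eps, 1]."""
    if r == -1:
        return -math.log(eps)
    return (1 - eps ** (r + 1)) / (r + 1)

for r in (-0.5, -1, -2):
    values = [integral_from_eps(r, 10.0 ** (-k)) for k in (2, 4, 8)]
    print(r, values)
```

For r = −0.5 the values settle near 2, while for r = −1 and r = −2 they keep growing as ε shrinks.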
are partitions of [a, b] and [c, d], respectively, and are denoted as follows:
\[
\Delta x_i := x_i - x_{i-1}, \quad i = 1, \ldots, p_1, \qquad
\Delta y_j := y_j - y_{j-1}, \quad j = 1, \ldots, p_2.
\]
We define small rectangles as follows:
\[
A_{ij} := [x_{i-1}, x_i] \times [y_{j-1}, y_j], \qquad \Delta A_{ij} = \Delta x_i\,\Delta y_j.
\]
\[
\|\pi\| = \max_{1 \le i \le p_1,\ 1 \le j \le p_2} \Delta A_{ij}.
\]
Finally, the Riemann integral of the function f over the rectangular region R = [a, b] × [c, d] is defined and denoted as follows:
\[
\iint_R f(x, y)\,dxdy := \lim_{\|\pi\|\to 0} S(\pi, \{c_{ij}\}).
\]
Here, the limit must exist as ∥π∥ approaches 0 for the integral to be defined, and in this case, we say that the function f is Riemann integrable over R = [a, b] × [c, d]. Such integration in two-dimensional space is called a double integral. Triple integrals in three dimensions, and so on, are Riemann integrals defined in the same way.
The following theorem lists the cases where Riemann integrals are possible for functions.
What does it mean to have a finite number of smooth curves? The term "finite number" is clear, and "smooth curve" means the image of a function $r : [a, b] \to \mathbb{R}^2$ that is differentiable. In other words, it refers to parameterized curves whose parameterization is differentiable.
This means that we first perform the inner integral $\int_0^2 (4 - x - y)\,dx$. In this case, other variables like y are treated as constants. By doing so, we obtain the following result:
\[
\int_0^2 (4 - x - y)\,dx = \left[\,4x - \frac{1}{2}x^2 - yx\,\right]_{x=0}^{x=2} = (8 - 2 - 2y) - (0) = 6 - 2y.
\]
Thus, the inner summation corresponds to the inner integral, and the outer sum-
mation corresponds to the outer integral. This is theoretically proven by Fubini’s
Theorem.
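The correspondence between the two summations and the two integrals can be written out as a nested loop. In this sketch the region R = [0, 2] × [0, 1] is an assumption, since only the inner limits 0 and 2 appear in the text above, and midpoints are used as the sample points. The function name `double_midpoint_sum` is hypothetical.

```python
def f(x, y):
    return 4 - x - y

def double_midpoint_sum(f, a, b, c, d, p1, p2):
    """Riemann sum over [a, b] x [c, d] with midpoints as sample points."""
    dx, dy = (b - a) / p1, (d - c) / p2
    total = 0.0
    for j in range(p2):                      # outer summation (over y)
        y = c + (j + 0.5) * dy
        inner = sum(f(a + (i + 0.5) * dx, y) * dx
                    for i in range(p1))      # inner summation (over x)
        total += inner * dy
    return total

# Assumed region R = [0, 2] x [0, 1]
approx = double_midpoint_sum(f, 0.0, 2.0, 0.0, 1.0, 200, 100)

# Iterated integral: the inner integral is 6 - 2y, then integrate over [0, 1].
iterated = 6 * 1 - 1**2

print(approx, iterated)
```

For each fixed y the inner sum approximates 6 − 2y, and the outer sum then approximates the integral of 6 − 2y over the assumed y-range.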
Solution 12.5 Since the function f (x, y) = 10 − 6xy is continuous on R, we can use
Fubini’s Theorem. Therefore,
ZZ
f (x, y) dxdy = · · ·
R
⊔
⊓
Problem 12.6. Find the volume enclosed by the graph of the function $f(x, y) = x^2 + y^2$
over the domain R = [0, 1] × [0, 1].
Solution 12.6 The volume is given by $\iint_R (x^2 + y^2)\,dxdy$. Since the function is con-
Many properties of one-dimensional Riemann integrals also hold for double inte-
grals, as well as integrals of higher dimensions. These are fundamental properties of
integrals that apply to all types of integration methods, including Riemann integrals.
1. Constant multiple rule
\[ \iint_R c f(x, y)\,dxdy = c \iint_R f(x, y)\,dxdy. \]
2. Sum rule
\[ \iint_R \big( f(x, y) \pm g(x, y) \big)\,dxdy = \iint_R f(x, y)\,dxdy \pm \iint_R g(x, y)\,dxdy. \]
3. Comparison principle
\[ \iint_R f(x, y)\,dxdy \ge \iint_R g(x, y)\,dxdy \quad \text{if } f(x, y) \ge g(x, y) \text{ on } R. \]
Problem 12.7.
Solution 12.7 ⊔
⊓
Lecture 13
Double Integration over a General Domain
If D is a bounded closed domain and f is continuous on D, will the double integral
$\iint_R h(x, y)\,dxdy$ be well-defined? Not necessarily. Even if f is continuous on D, h
may not be continuous on R. It can be discontinuous along the boundary ∂D. Therefore,
we cannot be sure if h is integrable on R. If the boundary ∂ D of the domain D
consists of a finite number of smooth curves, then integration is possible by the
previous theorem. However, if D has a complex structure, remember that h may not
be integrable on R. In this lecture, let’s practice integration over bounded closed
domains D whose boundaries consist of a finite number of smooth curves.
Even among general domains, there are cases where iterated integrals are more con-
venient. Let’s explore some of those cases and practice integration. First, consider a
domain enclosed by two lines parallel to the y-axis and n graphs of the form y = g(x).
Solution 13.1 First, D is contained within a rectangle R = [a, b] × [c, d], and its
boundary consists of two lines and two graphs, each of which is a smooth curve.
Thus, the function h(x, y) given in (13.1) is continuous except for a finite number of
smooth curves. Therefore, h(x, y) is Riemann integrable, and
\[ \iint_D f(x, y)\,dxdy = \iint_R h(x, y)\,dxdy \]
is well-defined. Now, let’s compute $\iint_R h(x, y)\,dxdy$. Expressing it as an iterated
integral using Fubini’s Theorem, we have
\[ \iint_R h(x, y)\,dxdy = \int_a^b \int_c^d h(x, y)\,dy\,dx. \]
Since h is zero outside the region between the graphs of $g_1$ and $g_2$, the inner integral becomes
\[ \int_c^d h(x, y)\,dy = \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy. \]
Solution 13.2 This problem is almost identical to the previous one, but let’s practice
it. ⊔
⊓
13.2 Examples
Problem 13.3. Find the volume of a tetrahedron with vertices at the origin and three
vectors i, j, and k using double integration.
Solution 13.3 Let’s first compute it without integration. Since the area of the base
is 0.5 and the height is 1, the formula for the volume of a cone, $\frac{1}{3}$ × base × height,
gives $\frac{1}{6}$. Now, let’s compute it using integration. First, find the equation
of the plane passing through the three vertices i, j, and k. The vector perpendicular
to the plane is (1, 1, 1), and since it passes through the point (0, 0, 1), the equation
of the plane is given by
(x, y, z − 1) · (1, 1, 1) = x + y + z − 1 = 0.
Thus, z = 1 − x − y. The base is defined by the x-axis, the y-axis, and the line y =
1 − x. Hence, in terms of integration, we have
\[ \int_0^1 \int_0^{1-x} (1 - x - y)\,dy\,dx = \int_0^1 \Big[ y - xy - 0.5y^2 \Big]_{y=0}^{1-x} dx = \int_0^1 \big( 0.5x^2 - x + 0.5 \big)\,dx = \frac{1}{6}. \]
⊔
⊓
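The same iterated integral can be checked numerically. This is a minimal sketch, assuming a midpoint rule in both variables; `integrate_between_graphs` is a hypothetical helper for domains bounded by x = a, x = b and two graphs y = g1(x), y = g2(x).

```python
def integrate_between_graphs(f, a, b, g1, g2, n=200):
    """Approximate the iterated integral of f over
    D = {(x, y): a <= x <= b, g1(x) <= y <= g2(x)} with the midpoint rule.
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        lo, hi = g1(x), g2(x)          # inner limits depend on x
        dy = (hi - lo) / n
        inner = sum(f(x, lo + (j + 0.5) * dy) for j in range(n)) * dy
        total += inner * dx
    return total


if __name__ == "__main__":
    height = lambda x, y: 1 - x - y    # the plane z = 1 - x - y
    vol = integrate_between_graphs(height, 0.0, 1.0,
                                   lambda x: 0.0, lambda x: 1 - x)
    print(vol, 1 / 6)                  # the two numbers should be close
```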
Lecture 14
Integration with Variable Changes
In this section, we explain the intuitive concept of the volume expansion rate that
should be considered when performing integration with variable changes.
also be of dimensions lower than n. To define the Riemann integral, we first partition
the domain D into small regions (cells). The collection of these small regions
\[ \pi = \{ A_i,\ i = 1, \dots, N \} \]
is called the partition of the domain D, and the size of the largest divided region
is denoted by ∥π∥. In other words, ∥π∥ → 0 means that the number of divided
regions increases and the size of each region converges to 0. Then, a point xi ∈ Ai is
chosen for each divided region. The Riemann integral is then defined and denoted
as follows:
\[ \int_D f(x)\,dx = \lim_{\|\pi\| \to 0} \sum_{i=1}^{N} f(x_i)\,\Delta A_i. \]
Problem 14.1. The function f and the composite function f ∘ g take the same values
at x and y (of course, when x = g(y)). However, the integrals are different:
\[ \int_D f(x)\,dx \ne \int_G f \circ g(y)\,dy. \]
Explain why they are different and how to modify them to make the equation hold.
Solution 14.1 The divided regions Bi = g−1 (Ai ) represent the inverse images of the
divided regions Ai , and Bi become the divided regions of G, forming the partition of
G. Denoting yi = g−1 (xi ), we have f (g(yi )) = f (xi ). However, since the volumes
of Ai and Bi are different, we have
\[ \sum_{i=1}^{N} f(x_i)\,\Delta A_i \ne \sum_{i=1}^{N} f(g(y_i))\,\Delta B_i. \]
If g preserves the volume, the two expressions are equal, but most variable transfor-
mations we consider do not preserve the volume. Therefore, we need to consider the
change in volume between the divided regions, especially when the divided regions
are very small, i.e., when considering the limit ∥π∥ → 0. To make the equation hold,
we need to multiply the right side by the ratio of how much the divided region Bi
has increased due to g:
\[ \sum_{i=1}^{N} f(x_i)\,\Delta A_i = \sum_{i=1}^{N} f(g(y_i))\,\frac{\Delta A_i}{\Delta B_i}\,\Delta B_i = \sum_{i=1}^{N} f(g(y_i))\,\frac{\Delta g(B_i)}{\Delta B_i}\,\Delta B_i. \]
14.1 Volume Expansion Rate 107
In other words, we need to multiply the right side by the ratio of how much g has
expanded the divided region Bi . ⊔
⊓
Here, q(y) represents the volume expansion rate around the point y, and it is given
as follows when B is a small region containing y:
\[ q(y) = \lim_{\Delta B \to 0} \frac{\Delta T(B)}{\Delta B}, \quad y \in B. \qquad \text{(Volume Expansion Rate)} \]
Problem 14.2 (Review of Linear Approximation). The function g(y) is not a lin-
ear function of the variable y. Its derivative ∇g(y) is also not a linear function of the
variable y. In what sense, then, is ∇g referred to as a linear approximation?
Solution 14.2 It means that for a fixed c, ∇g(c)y is a linear function of the variable
y. The derivative ∇g(c) computes the gradient of g at the given point c and can be
represented as a matrix. Then, the function y → ∇g(c)y is a linear function with
respect to y. Therefore, g(c) + ∇g(c)y is an approximation of g(y). Alternatively,
∇g(c)y can be considered as a linear approximation of g(y) − g(c). If the notation
∇g(c)(Bi ) causes confusion, it can be replaced by ∇g|c (Bi ). ⊔⊓
Solution 14.3 Rather than proving, we present the principle. In the Taylor expan-
sion, the linear approximation ∇g is the first-order term. Therefore, the difference
g(y) − (g(c) + ∇g(y)) is a second-order term. Since the volume of a constant is 0,
we have
\[ \lim_{\Delta B \to 0} \frac{\Delta\big( (\nabla g|_{y_i} - g)(B) \big)}{\Delta(B)} = 0. \]
This is because the numerator is a second-order term and the denominator is a first-
order term. The squared term converges to 0 faster. ⊔⊓
Therefore, instead of taking the limit of $\frac{\Delta g(B)}{\Delta B}$, we take the limit of $\frac{\Delta T(B)}{\Delta B}$ using
the linear approximation. However, linear functions have a very nice property. The
volume expansion rate is the same for all sets B. What does this mean? It means that
for a linear function T, $\frac{\Delta(T(B))}{\Delta B}$ is constant for all sets B. Therefore, there is no need
to take a limit, and B doesn’t even need to contain y. For any set B, we have the
following:
\[ q(y) = \frac{\Delta(T(B))}{\Delta B}, \quad T = \nabla g(y). \qquad \text{(Volume Expansion Rate)} \]
Then, what is the most convenient choice for B to compute the volume expansion
rate q(y)? The easiest choice is to take $[0, 1]^\ell$, where each side length is 1, like a
square or a cube. In that case, T(B) becomes a parallelotope in n-dimensional space.
Therefore, we only need to know how to compute the volume of a parallelotope.
T (x) = Ax, x ∈ Rn .
(In the context of the previous section, it should have been T : Rℓ → Rn ...) Since the
linear transformation T and the matrix A are the same concept, it is not necessary
to use both notations, but sometimes it can be visually pleasing to use both. Let’s
denote an n-dimensional cube with edge length ε > 0 as follows:
\[ \Omega_\varepsilon = [0, \varepsilon]^n. \]
The volume of a given set S is denoted as ΔS. Then, the volume of the above set is
as follows:
\[ \Delta(\Omega_\varepsilon) = \varepsilon^n. \]
When ε = 1, we have $\Delta(\Omega_1) = 1$.
We consider parallelepipeds with one vertex at the origin. The parallelepiped $\Omega_1$
is a special parallelepiped where all edges have a length of 1 and any two edges meeting
at a vertex are perpendicular to each other. Generally, an n-dimensional parallelepiped is
determined by n linearly independent vectors. These vectors are the n edges of the
parallelepiped connected to the origin. Their lengths and the angles between pairs of edges
do not need to be the same. For n = 2, the parallelepiped is composed of $n + n \times (n - 1) = n^2 = 4$
14.2 Linear function and volume of parallelotopes 109
Problem 14.4. Let’s assume m ≥ n. Explain that the image T (Ω 1 ) by a linear trans-
formation is an n-dimensional parallelepiped in the m-dimensional space. What are
the edges of T (Ω 1 ) connected to the origin?
Solution 14.4 Let A be the m × n matrix of the linear transformation T . Then, A can
be written as follows:
A = (a1 , a2 , · · · , an ),
where ai are the column vectors of matrix A. Then, the n edges of the cube Ω 1 are
ei , and their images are Aei = ai . In other words, the n columns of matrix A are the n
edges of the n-dimensional parallelepiped T (Ω 1 ) connected to the origin. (Strictly
speaking, the condition that ai ’s are linearly independent is needed.) ⊔ ⊓
\[ q = \frac{\Delta T(\Omega_1)}{\Delta \Omega_1} \]
is called the volume expansion rate of the linear function T. Due to the linearity of
the function, we can show that $q = \frac{\Delta T(\Omega_\varepsilon)}{\Delta \Omega_\varepsilon}$ for all ε > 0. Moreover, it holds for any
non-zero volume space $V \subset \mathbb{R}^n$ that $q = \frac{\Delta T(V)}{\Delta V}$.
Determinant
Problem 14.5 (Volume of parallelepiped when m = n). Show that the volume of
a parallelepiped composed of edges a1 , · · · , an is equal to the determinant of the
matrix A formed by these vectors.
Solution 14.5 Let’s start with the case n = 2. Rotate the edges so that a1 lies on
the x-axis. Then c = 0, and det(A) = ad. It is known to match the area of the paral-
lelogram. For n = 3, rotate it so that a1 lies on the x-axis. Then d = g = 0. Rotate
the parallelepiped centered at a1 so that a2 lies in the xy-plane. Then h = 0, and
the determinant of A becomes det(A) = aei. This value may be negative
depending on the signs of a, e, and i, but its absolute value matches the volume. For
higher dimensions, let’s just remember the formula. ⊔ ⊓
Solution 14.6 ⊔
⊓
Problem 14.7 (Area of parallelogram). When m > 2, show that the area of a par-
allelogram in Rm spanned by 2 edges a1 , a2 ∈ Rm is given by the following formula:
\[ \sqrt{ (\|a_1\| \, \|a_2\|)^2 - (a_1 \cdot a_2)^2 }. \tag{14.2} \]
⊔
⊓
Problem 14.8 (Volume of parallelotope when n < m). When n < m, show that the
volume of a parallelotope in Rm spanned by n edges a1 , · · · , an ∈ Rm is given by the
following Gram determinant:
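\[ \|a_1 \times a_2 \times \cdots \times a_n\| = \sqrt{\det(a_i \cdot a_j)} = \begin{vmatrix} a_1 \cdot a_1 & \cdots & a_1 \cdot a_n \\ \vdots & & \vdots \\ a_n \cdot a_1 & \cdots & a_n \cdot a_n \end{vmatrix}^{1/2} \tag{14.3} \]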
Solution 14.8 Let’s just remember the formula without proving it. ⊔
⊓
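Although the formula is stated without proof, it can be explored numerically. The sketch below is an illustration under stated assumptions: the helper names (`dot`, `det`, `gram_volume`, `parallelogram_area`) and the sample vectors are hypothetical, and the determinant uses plain Laplace expansion, which is only sensible for small matrices.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def gram_volume(edges):
    """Volume of the parallelotope spanned by `edges` in R^m, as in (14.3)."""
    gram = [[dot(a, b) for b in edges] for a in edges]
    return math.sqrt(det(gram))


def parallelogram_area(a1, a2):
    """Area of the parallelogram spanned by a1 and a2, as in (14.2)."""
    return math.sqrt(dot(a1, a1) * dot(a2, a2) - dot(a1, a2) ** 2)


if __name__ == "__main__":
    a1, a2 = (1.0, 0.0, 2.0, 0.0), (0.0, 1.0, 0.0, 3.0)   # two edges in R^4
    print(gram_volume([a1, a2]), parallelogram_area(a1, a2))
    # Square case m = n = 2: the Gram volume should equal |det A|.
    print(gram_volume([(2.0, 0.0), (1.0, 3.0)]))
```

For two edges the Gram matrix is 2 × 2, and its determinant expands to exactly the expression under the root in (14.2), which is why the two printed values in the first line agree.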
Question 14.1. Do the formulas (14.2) and (14.3) for the area of a parallelogram
coincide with the Gram determinant?
Other things
In conclusion, the volume expansion rate for a linear function T is ΔT(B) when
$B = [0, 1]^\ell$, i.e., the volume of the parallelotope T(B) whose edges are the columns
of ∇g(y). If T is a square matrix, this volume is simply ΔT(B) = |det T|, i.e.,
\[ q(y) = \frac{\Delta(\nabla g|_{y}(B))}{\Delta B} = |\det(\nabla g(y))|. \]
The determinant of the derivative det(∇g(y)) is called the Jacobian. The formula for
variable transformation is as follows:
\[ \int_D f(x)\,dx = \int_G f(g(y))\,q(y)\,dy, \qquad g(G) = D. \]
Remember to put an absolute value on the Jacobian. That is, Equation (2) of
the Thomas textbook in Section 14.8 is incorrect. This is related to the following
question.
Question 14.2. What is the difference between $\int_a^b f(x)\,dx$ and $\int_{[a,b]} f(x)\,dx$ in
notation?
the absolute value is not attached. The values at both ends of the integral and the
sign of g′ (y) offset each other. However,
\[ \int_{[a,b]} f(x)\,dx = \int_{g^{-1}[a,b]} f(g(y))\,|g'(y)|\,dy. \]
So far, integrals have been defined for functions defined on regions D ⊂ Rn inside
the space. The region D could be the entire space Rn or a part of it, but the dimension
of D was fundamentally the same as that of the entire space. In this and the following
lectures, we study integrals on surfaces inside the space, rather than n-dimensional
regions of Rn . Integrals on curves have already been discussed. The dimension of
the entire space n can easily be extended to the general case, but we restrict ourselves
to the case of n = 3. Understanding the case of n = 3 will enable us to handle cases
where n > 3 easily.
In this lecture, we apply the chain rule from the previous lecture to the case of
ℓ = 2. In other words, we consider the case of $g : \mathbb{R}^{\ell=2} \to \mathbb{R}^{n=3}$. First, we denote
the variables of $\mathbb{R}^{\ell=2}$ by (u, v). The independent variables of $\mathbb{R}^{n=3}$ are still
denoted by (x, y, z). These are understood as functions of u and v. That is, g(u, v) =
(x(u, v); y(u, v); z(u, v)). Then, the chain rule is written as follows:
\[ \nabla(f \circ g) = \left( \frac{\partial f}{\partial u}, \frac{\partial f}{\partial v} \right) = (\nabla f)(\nabla g) = (f_x, f_y, f_z) \begin{pmatrix} x_u & x_v \\ y_u & y_v \\ z_u & z_v \end{pmatrix}. \]
114 15 Surface Integral
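\[ \begin{aligned} \frac{\partial f}{\partial u} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial u}, \\ \frac{\partial f}{\partial v} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial f}{\partial z}\frac{\partial z}{\partial v}. \end{aligned} \tag{15.1} \]
In the notation $\frac{\partial f}{\partial u}$, f is considered a function of u and v. In other words, the f
inside the notation is essentially the composition function f ∘ g. Also, since it is a
multivariable function, we use ∂ instead of d as a symbol.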
Problem 15.1. Differentiate w with respect to variables r and s under the following
conditions:
\[ w = x + 2y + z^2, \qquad x = \frac{r}{s}, \qquad y = r^2 + \ln s, \qquad z = 2r. \]
Solution 15.1 (In the equations and figures, (u, v), (x, y, z), and f were used as
variables and functions. However, different people may use different notations. It is
important to be able to handle different notations when they are given.) ⊔
⊓
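One way to work through Problem 15.1 is to assemble the chain rule (15.1) term by term, with (r, s) in the roles of (u, v) and w in the role of f. The sketch below does this and compares the result with a finite-difference check; the function names are hypothetical and the step size h is an assumption.

```python
import math


def w_of_rs(r, s):
    """w written directly in (r, s) by substitution."""
    x, y, z = r / s, r * r + math.log(s), 2 * r
    return x + 2 * y + z * z


def chain_rule_partials(r, s):
    """(dw/dr, dw/ds) assembled term by term as in (15.1)."""
    z = 2 * r
    w_x, w_y, w_z = 1.0, 2.0, 2 * z          # partials of w in x, y, z
    x_r, y_r, z_r = 1 / s, 2 * r, 2.0        # partials of x, y, z in r
    x_s, y_s, z_s = -r / s ** 2, 1 / s, 0.0  # partials of x, y, z in s
    dw_dr = w_x * x_r + w_y * y_r + w_z * z_r
    dw_ds = w_x * x_s + w_y * y_s + w_z * z_s
    return dw_dr, dw_ds


def finite_difference_partials(r, s, h=1e-6):
    """Central differences, used only as an independent check."""
    dw_dr = (w_of_rs(r + h, s) - w_of_rs(r - h, s)) / (2 * h)
    dw_ds = (w_of_rs(r, s + h) - w_of_rs(r, s - h)) / (2 * h)
    return dw_dr, dw_ds


if __name__ == "__main__":
    print(chain_rule_partials(1.0, 2.0))
    print(finite_difference_partials(1.0, 2.0))
```

Expanding the products symbolically gives ∂w/∂r = 1/s + 12r and ∂w/∂s = 2/s − r/s², which is what both functions evaluate at a given point.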
(2) If the dimension is n > 3, then we cannot use the above formula. However, we
can still find the area of the parallelogram. First,
\[ \cos\theta = \frac{a_1 \cdot a_2}{\|a_1\| \, \|a_2\|} \quad \Rightarrow \quad \sin\theta = \sqrt{ 1 - \left( \frac{a_1 \cdot a_2}{\|a_1\| \, \|a_2\|} \right)^2 } \]
15.1 Surface integral 115
Therefore, for the more general case, the area is calculated as follows:
\[ \mathrm{Area}(g(G)) = \int_G \sqrt{ (\|g_u\| \, \|g_v\|)^2 - (g_u \cdot g_v)^2 }\,dudv. \qquad \text{(Area Formula 2)} \]
Question 15.1. What is the relationship between Area Formula 2 and the Gram de-
terminant in (14.3)?
Problem 15.2. Find the area of the graph $z = x^2 + y^2$ for 0 < x, y < 2.
Solution 15.2 In this case, we can use (x, y) instead of (u, v). Then, $g(x, y) =
(x; y; x^2 + y^2)$, $g_x = (1, 0, 2x)$, and $g_y = (0, 1, 2y)$. Therefore,
\[ g_x \times g_y = (-2x; -2y; 1), \qquad \|g_x \times g_y\| = \sqrt{4x^2 + 4y^2 + 1}. \]
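The area element for this graph can be evaluated in two ways: through the cross product, or through the dot-product expression of Area Formula 2. The following sketch, with hypothetical helper names and a midpoint Riemann sum chosen for simplicity, checks that both give the same integrand and then approximates the area over 0 < x, y < 2.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def area_element_cross(x, y):
    """||g_x x g_y|| for g(x, y) = (x, y, x^2 + y^2)."""
    gx, gy = (1.0, 0.0, 2 * x), (0.0, 1.0, 2 * y)
    n = cross(gx, gy)
    return math.sqrt(dot(n, n))


def area_element_gram(x, y):
    """Same quantity via (||g_x|| ||g_y||)^2 - (g_x . g_y)^2."""
    gx, gy = (1.0, 0.0, 2 * x), (0.0, 1.0, 2 * y)
    return math.sqrt(dot(gx, gx) * dot(gy, gy) - dot(gx, gy) ** 2)


def surface_area(n=200):
    """Midpoint Riemann sum of the area element over (0, 2) x (0, 2)."""
    h = 2.0 / n
    return sum(area_element_cross((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h


if __name__ == "__main__":
    print(area_element_cross(0.3, 1.7), area_element_gram(0.3, 1.7))
    print(surface_area())
```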
Let’s compute the area of the parallelogram using two methods. First, computing
the determinant yields:
\[ \|g_r \times g_\theta\| = |\det(\nabla g)| = \left| \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} \right| = r. \]
It is important to remember that the area expansion rate in polar coordinates is given
by r. The integral formula for variable transformation is as follows:
\[ \int_D f(x, y)\,dxdy = \int_G f(r\cos\theta, r\sin\theta)\,r\,drd\theta, \qquad G = g^{-1}(D). \]
\[ f = x^2 + y^2, \qquad x = r\cos\theta, \qquad y = r\sin\theta, \]
Solution 15.3 This problem confirms that the two methods yield the same result.
The function f is chosen to make (i) easy to compute. For a general function f , (ii)
would be easier. ⊔ ⊓
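The agreement between the two methods can also be observed numerically. The sketch below is only an illustration: it assumes that D is the unit disk and that f(x, y) = x² + y², and it compares a Riemann sum in (x, y) with a Riemann sum in (r, θ) that includes the area expansion rate r. The function names are hypothetical.

```python
import math


def cartesian_sum(f, n=400):
    """Riemann sum over [-1, 1]^2, keeping only cell midpoints inside the unit disk."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1 + (i + 0.5) * h
        for j in range(n):
            y = -1 + (j + 0.5) * h
            if x * x + y * y < 1:
                total += f(x, y) * h * h
    return total


def polar_sum(f, n=400):
    """Riemann sum over G = (0, 1) x (0, 2*pi), including the factor r."""
    dr, dth = 1.0 / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += f(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total


if __name__ == "__main__":
    f = lambda x, y: x * x + y * y
    print(cartesian_sum(f), polar_sum(f), math.pi / 2)
```

For this integrand the polar form reduces to the integral of r³ over (0, 1) times 2π, that is π/2. Dropping the factor r from `polar_sum` makes the two results disagree.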
Solution 15.4 ⊔
⊓
Let’s reserve the variables r, θ , φ , and ρ, and use u, v, w, etc., for others.
Problem 15.5. Convert the following iterated integral using the variables $u = \frac{2x - y}{2}$
and $v = \frac{y}{2}$ and then compute it:
\[ \int_0^4 \int_{y/2}^{y/2 + 1} \frac{2x - y}{2}\,dx\,dy \]
Solution 15.5 First, we need to find the function g(u, v) and its derivatives.
\[ \int_0^1 \int_0^{1-x} \sqrt{x + y}\,(y - 2x)^2\,dy\,dx \]
u = x + y, v = y − 2x.
Then,
\[ \nabla g(u, v) = \frac{1}{3} \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}, \qquad \det(\nabla g(u, v)) = \left( \frac{1}{3} \right)^2 (1 - (-2)) = \frac{1}{3}. \]
x = 0 ⇒ v = u, y = 0 ⇒ v = −2u, x + y = 1 ⇒ u = 1.
Solution 15.7 Both x and y are positive within the region. The new variables u and
v are also positive. They can be expressed as follows.
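\[ uv = \sqrt{y^2} = y, \qquad \frac{u}{v} = \sqrt{x^2} = x. \]
\[ g(u, v) = \begin{pmatrix} uv^{-1} \\ uv \end{pmatrix}, \qquad \nabla g(u, v) = \begin{pmatrix} v^{-1} & -uv^{-2} \\ v & u \end{pmatrix}, \qquad \det(\nabla g(u, v)) = \frac{2u}{v}. \]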
The boundaries of regions D and G are as follows:
y = 2 ⇒ uv = 2, y = x ⇒ v = 1, xy = 1 ⇒ u = 1.
120 16 Triple Integrals in Rectangular Coordinates
are the cells forming the partition of D, and their sizes. The total number of cells is
p1 p2 p3 . We denote the partition and size of D by
Now, let’s choose one point from each cell. Then the Riemann sum is as follows:
\[ S(f, \pi) = \sum_{i=1}^{p_1} \sum_{j=1}^{p_2} \sum_{k=1}^{p_3} f(s_{ijk})\,\Delta x_i \Delta y_j \Delta z_k, \qquad s_{ijk} \in C_{ijk}. \]
The average of the function f : D → R is the integral value divided by the volume:
\[ \text{Average} = \frac{1}{\mathrm{Vol}(D)} \int_D f\,dx = \fint_D f\,dx. \]
Problem 16.1. Find the average of the function f(x, y, z) = xyz on the domain $D =
[0, 2]^3$.
Solution 16.1 ⊔
⊓
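Since no solution is written out here, the following sketch shows one way to check an answer numerically. It assumes a uniform partition of the cube with midpoints as sample points, mirroring the Riemann sum S(f, π) above; `average_over_cube` is a hypothetical helper name.

```python
def average_over_cube(f, lo, hi, n=20):
    """Average of f over the cube [lo, hi]^3 using a midpoint Riemann sum."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        for j in range(n):
            y = lo + (j + 0.5) * h
            for k in range(n):
                z = lo + (k + 0.5) * h
                total += f(x, y, z) * h ** 3   # f(s_ijk) * dx * dy * dz
    volume = (hi - lo) ** 3
    return total / volume


if __name__ == "__main__":
    print(average_over_cube(lambda x, y, z: x * y * z, 0.0, 2.0))
```

For f = xyz the iterated integral factors into three copies of the integral of t over [0, 2], so the integral is 2 · 2 · 2 = 8 and the average is 8 / 8 = 1, which the printed value should match up to rounding.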
16.2 Riemann Integral on Non-Hexahedron Domain 121
Problem 16.2. Let D be the interior of a sphere with center (a, b, c) and radius r > 0.
Express the Riemann integral of a continuous and bounded function f over D as an
iterated integral.
Solution 16.2 First, projecting D onto the xy-plane yields a disk with center (a, b)
and radius r. Projecting this disk again onto the x-axis gives the interval (a − r, a + r).
Now, once x ∈ (a − r, a + r) is fixed, the points (x, y) inside the disk are such that
\[ (x - a)^2 + (y - b)^2 < r^2 \quad \Rightarrow \quad y \in \left( b - \sqrt{r^2 - (x - a)^2},\ b + \sqrt{r^2 - (x - a)^2} \right). \]
Furthermore, if (x, y) is fixed inside the disk, the points (x, y, z) inside the sphere
satisfy
\[ (x - a)^2 + (y - b)^2 + (z - c)^2 < r^2, \]
i.e.,
\[ z \in \left( c - \sqrt{r^2 - (x - a)^2 - (y - b)^2},\ c + \sqrt{r^2 - (x - a)^2 - (y - b)^2} \right). \]
Problem 16.3. Suppose there is a sphere with center (0, 0, 1) and radius 5. Let D
be the part inside the sphere where z > 4. Express the volume of D as an iterated
integral.
Solution 16.3 Using the Pythagorean theorem, we find that the base of D is a disk
with radius 4. When projected onto the xy-plane, it becomes a disk with radius 4
and center at (0, 0). That is,
\[ x^2 + y^2 < 16. \]
Problem 16.4 (Plane Passing Through Three Points). Find the equation of a plane
in three-dimensional space including the three points {(0, 0, 0), (1, 1, 0), (0, 1, 1)}.
Solution 16.4 To find the equation, follow these steps: (1) First, find a vector n
perpendicular to the plane. (2) Choose a point a on the plane. (3) For any point
x = (x, y, z) on the plane, the vector x − a lies in the plane and is therefore
perpendicular to n. Thus, the equation of the plane is given by n · (x − a) = 0.
Let’s proceed step by step. (1) To find the perpendicular vector, we need two
vectors in the plane. Given three points a, b, c, the difference between two points
forms a vector in the plane. For this problem, it’s convenient to choose (1, 1, 0) and
(0, 1, 1) since subtracting the origin is straightforward. The cross product gives a
vector perpendicular to both:
\[ (1, 1, 0) \times (0, 1, 1) = (1, -1, 1). \]
(2) Choose the point on the plane as (0, 0, 0) for simplicity. (3) Therefore, the equa-
tion of the plane is:
(1, −1, 1) · (x − 0, y − 0, z − 0) = x − y + z = 0.
⊔
⊓
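The three steps above can be sketched in a few lines. This is a minimal illustration with hypothetical helper names; it returns the normal vector n and the constant d so that the plane reads n · x = d.

```python
def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_through(a, b, c):
    """Return (n, d) such that the plane through a, b, c is n . x = d."""
    n = cross(sub(b, a), sub(c, a))             # step (1): normal vector
    d = sum(ni * ai for ni, ai in zip(n, a))    # steps (2)-(3): n . (x - a) = 0
    return n, d


if __name__ == "__main__":
    print(plane_through((0, 0, 0), (1, 1, 0), (0, 1, 1)))  # ((1, -1, 1), 0)
```

If the three points are collinear, the cross product is the zero vector and no unique plane exists; the sketch does not guard against that case.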
Problem 16.5. Find the volume of the tetrahedron in three-dimensional space with
vertices {(0, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1)}.
⊔
⊓
Problem 16.6. Find the volume of the region enclosed by the surfaces $z =
7 - x^2 - y^2$ and $2x - 2y - z = 0$ in three-dimensional space.
\[ 2x - 2y - 7 + x^2 + y^2 = (x + 1)^2 + (y - 1)^2 - 9 = 0. \]
It’s a circle with center (−1, 1) and radius 3. This circle is the projection of the
intersection curve onto the xy-plane, and to lift it back to the plane, we need to find the
equation of the circle in three dimensions. Projecting this circle onto the x-axis gives the
interval −4 < x < 2. Given x, y ranges over $1 - \sqrt{9 - (x + 1)^2} < y < 1 + \sqrt{9 - (x + 1)^2}$.
Once (x, y) is determined, z ranges over $2x - 2y < z < 7 - x^2 - y^2$. Now, to find the
volume:
\[ \text{Volume} = \int_D 1\,dx = \int_{-4}^{2} \int_{1 - \sqrt{9 - (x+1)^2}}^{1 + \sqrt{9 - (x+1)^2}} \int_{2x - 2y}^{7 - x^2 - y^2} 1\,dz\,dy\,dx. \]
⊔
⊓
The center of mass, or centroid, is obtained by dividing the first moment by the
mass:
\[ \text{Center of mass or centroid} = \frac{\int_D \mathbf{x}\, f\,dx}{\int_D f\,dx}. \]
Problem 16.7. Explain whether the centroid given above is indeed the centroid we
commonly refer to.
Solution 16.7 ⊔
⊓
Problem 16.8. Find the centroid of the triangle with vertices {(0, 0), (0, 3), (3, 0)}.
Verify that it matches the usual centroid of a triangle.
Solution 16.8 In this problem, we assume that the thickness of the triangle is constant.
Then, assuming the density is constant and equal to 1, we can find the centroid.
The mass can be found by integrating 1. Since it’s a right triangle, the area is
simply $\frac{9}{2}$, and the mass is equal to the area: ⊔
⊓
Lecture 17
Coordinate Systems
Polar Coordinates
can represent θ .
However, what’s more important is to find the transformation function g that cal-
culates (x, y) when (r, θ ) is given. First, let’s consider the (r, θ ) space. We define
r as the first coordinate and θ as the second coordinate. The order also matters, as
it relates to orientation. By defining it this way, the orientation is preserved by the
transformation g. Then, g is given by
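\[ g(r, \theta) = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}. \]
We’ve already calculated it before, but if we calculate the volume expansion rate,
it’s as follows:
\[ \det(\nabla g) = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r\cos^2\theta + r\sin^2\theta = r. \]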
In the notation above, strictly speaking, the last f is actually different from the
previous f. It’s an abuse of notation, but such abuse is common and should
be allowed for convenience.
17.1 Coordinate Systems 129
Cylindrical Coordinates
The cylindrical coordinate system simply uses the polar coordinates (r, θ ) from
the polar coordinate system and incorporates the z coordinate from the Cartesian
coordinate system. Let’s confirm this through some calculations. We designate z as
the third coordinate. First, the transformation function g is as follows:
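\[ g(r, \theta, z) = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ z \end{pmatrix} \]
The volume expansion rate is simply the absolute value of the determinant of ∇g.
Thus,
\[ q(r, \theta, z) = |\det(\nabla g)| = \left| \begin{vmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{vmatrix} \right| = r \]
which is the same as in the 2D case of polar coordinates. Therefore, the integral
transformation formula is as follows:
\[ \int_D f\,dx = \int_G f \circ g(r, \theta, z)\,r\,drd\theta dz = \int_G f(r, \theta, z)\,r\,drd\theta dz. \]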
Spherical Coordinates
The spherical coordinate system defines the distance ρ between the point (x, y, z)
and the origin as well as the angle φ formed between the z-axis and the line segment
connecting the origin and the point (x, y, z). The azimuthal angle θ remains the same
as in the polar coordinate system. Therefore, given the point (x, y, z),
\[ \rho = \sqrt{x^2 + y^2 + z^2}, \qquad \varphi = \cos^{-1} \frac{z}{\sqrt{x^2 + y^2 + z^2}} \]
are satisfied. Using (ρ, φ ) to represent r in the polar coordinate system is useful:
r = ρ sin φ .
Let’s choose the order of variables: ρ as the first variable, φ as the second vari-
able, and θ as the third variable. There are other ways to choose, but we’ll go with
this one. This choice simplifies the following calculations slightly and preserves
orientation under transformation. The transformation function g is as follows:
\[ g(\rho, \varphi, \theta) = \begin{pmatrix} \rho\sin\varphi\cos\theta \\ \rho\sin\varphi\sin\theta \\ \rho\cos\varphi \end{pmatrix} \]
The volume expansion rate is simply the absolute value of the determinant of ∇g.
Thus,
\[ q(\rho, \varphi, \theta) = |\det(\nabla g)| = \left| \begin{vmatrix} \sin\varphi\cos\theta & \rho\cos\varphi\cos\theta & -\rho\sin\varphi\sin\theta \\ \sin\varphi\sin\theta & \rho\cos\varphi\sin\theta & \rho\sin\varphi\cos\theta \\ \cos\varphi & -\rho\sin\varphi & 0 \end{vmatrix} \right| = \rho^2 \sin\varphi = r\rho. \]
The angle φ lies between 0 and π, so sin φ is nonnegative. Also, since r = ρ sin φ, it’s
useful to remember that the volume expansion rate is q = rρ. Now, the integral
transformation formula is as follows:
\[ \int_D f\,dx = \int_G f \circ g(\rho, \varphi, \theta)\,\rho^2 \sin\varphi\,d\rho d\varphi d\theta = \int_G f(\rho, \varphi, \theta)\,\rho^2 \sin\varphi\,d\rho d\varphi d\theta. \]
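The volume expansion rate of the spherical transformation can be checked numerically. The sketch below approximates ∇g by central differences and compares |det(∇g)| with ρ² sin φ; the helper names, the step size h, and the sample point are assumptions made for the illustration.

```python
import math


def g(rho, phi, theta):
    """Spherical-to-Cartesian map with variable order (rho, phi, theta)."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def jacobian_matrix(p, h=1e-6):
    """3x3 matrix of central-difference partials; column k is dg/dp_k."""
    cols = []
    for k in range(3):
        plus, minus = list(p), list(p)
        plus[k] += h
        minus[k] -= h
        gp, gm = g(*plus), g(*minus)
        cols.append([(a - b) / (2 * h) for a, b in zip(gp, gm)])
    # Transpose so that rows correspond to (x, y, z).
    return [[cols[k][i] for k in range(3)] for i in range(3)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def expansion_rate(rho, phi, theta):
    return abs(det3(jacobian_matrix((rho, phi, theta))))


if __name__ == "__main__":
    rho, phi, theta = 2.0, 1.0, 0.5
    print(expansion_rate(rho, phi, theta), rho ** 2 * math.sin(phi))
```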
17.2 Examples of Variable Changes 131
Solution 17.1 Since the shape of the region D has boundaries parallel to the z-axis,
it’s advantageous to use cylindrical coordinates. Projecting D onto the xy-plane
yields $x^2 + (y - 1)^2 < 1$. The range of the angle θ is 0 < θ < π. Now, given θ, the
points on the circle are given by
\[ x^2 + (y - 1)^2 = 1 \iff r^2 = 2r\sin\theta \iff r = 2\sin\theta, \]
so the range is 0 < r < 2 sin θ. And once (r, θ) is determined, the range of z is
$0 < z < r^2$, so the integral becomes
\[ \int_D f\,dx = \int_0^{\pi} \int_0^{2\sin\theta} \int_0^{r^2} f(g)\,r\,dz\,dr\,d\theta = \int_0^{\pi} \int_0^{2\sin\theta} \int_0^{r^2} f(r, \theta, z)\,r\,dz\,dr\,d\theta. \]
Problem 17.2. When the center of the disc is moved from (0, 1) to (2, 2), find the
integral.
Solution 17.2 The points on the circle in the xy-plane satisfy $(x - 2)^2 +
(y - 2)^2 - 1 = 0$. Rewriting it,
\[ x^2 + y^2 - 4x - 4y + 7 = r^2 - 4r(\cos\theta + \sin\theta) + 7 = 0. \]
It is divided into two cases: when the quadratic in r has two distinct roots, and when
it has a repeated root or no root. Solving it, the range of the angle θ is the interval
between the angles $\theta_1$ and $\theta_2$ on which the quadratic has two distinct roots, and there
r has two values $r_1(\theta) < r_2(\theta)$. Then, it can be rewritten as follows:
\[ \int_D f\,dx = \int_{\theta_1}^{\theta_2} \int_{r_1(\theta)}^{r_2(\theta)} \int_0^{r^2} f(g)\,r\,dz\,dr\,d\theta. \]
⊔
⊓
The techniques used in Problems 17.1 and 17.2 are sometimes complex and not
particularly helpful for integration. They don’t fully utilize the advantages of using
cylindrical coordinates. Rather than these methods, it’s more meaningful to move
the center of the region to the origin and then use cylindrical coordinates. This means
applying another simple variable change, a translation. Let x̄ = x − 2, ȳ = y − 2 for
the translation, then the equation of the circle becomes $\bar{x}^2 + \bar{y}^2 = 1$.
Then, in the variable change given in the figure, the range of (r, θ) is 0 < θ < 2π,
0 < r < 1. When r and θ are determined, the range of z is $0 < z < x^2 + y^2$. However,
this range is not $0 < z < r^2$ because r and θ are now related to $\bar{x}$ and $\bar{y}$. So, the
calculation gives
Solution 17.3 ⊔
⊓
Part IV
Integration of Vector Fields
Lecture 18
Line Integral for Tangential Component
Let’s start by reviewing the line integral of a scalar-valued function, which we’ve
already studied in Calculus 1. Since we’ve already learned how to perform variable
transformations using the volume expansion rate, let’s summarize it again.
The difference from (9.3) is simply that we use D to represent a general region
instead of C to emphasize that it’s a curve. There’s no real difference. To emphasize
that it’s a curve, we’ll use C in the future.
If the curve C is divided into several segments, Ci , i = 1, · · · , N, then we can
integrate each segment separately and add them together. In other words,
\[ \int_{\cup C_i} f(x)\,dx = \sum_i \int_{C_i} f(x)\,dx = \sum_i \int_{a_i}^{b_i} f(r_i(t))\,\|r_i'(t)\|\,dt. \]
Problem 18.1. If two parameterizations r̄(t) and r(t) trace the same curve but move
in opposite directions, what relationship holds between the line integrals given by
these two parameterizations? Are they equal or opposite in sign?
Solution 18.1 ⊔
⊓
Now let’s consider the line integral of a vector field f, rather than a scalar function
f . When we say a vector field in an n-dimensional space D ⊂ Rn , we simply mean
a function f : D → Rn that takes n-dimensional vectors as values. Of course, we
mainly focus on the case of three-dimensional space n = 3. In a physical sense,
f(x) represents vectors such as force vectors acting at position x. As shown in the
figure on the previous page, you can also visualize it by drawing the vector f(x) at each
position x. When we say “vector field,” remember that f also takes n-dimensional
vectors as values.
Solution 18.2 If a mass m is located at the position $x \in \mathbb{R}^3$, the distance from the sun
is ∥x∥, and the direction vector toward the sun is $-\frac{x}{\|x\|}$. According to Newton’s law
18.2 Line integral for a force field 139
There are various types of vector fields in the world. Examples include gravita-
tional fields caused by celestial bodies, electromagnetic fields caused by charges or
currents, vector fields representing fluid flow, and vector fields caused by wind or
ocean currents. Mathematically, we can consider even more diverse vector fields.
Vector fields that rotate around a point, converge or diverge at a point, or emanate
from or converge to a point can have different properties.
However, this expression is incorrect. Why is that? Force is a vector and distance is
a scalar, so their product should be a vector. However, energy (work) is a scalar. The
above equation is oversimplified and doesn’t make sense upon closer examination.
The correct expression is as follows:
If the vector field f is not a force field, the line integral above may not represent
energy, but it can have other meanings and be useful. Even when a general vector
field f is given, we define the line integral as above. The following is
a commonly used notation for line integrals:
\[ \int_C \mathbf{f}(\mathbf{r}) \cdot d\mathbf{r} = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot d\mathbf{r}(t). \tag{18.2} \]
Problem 18.3. Determine whether the values of line integrals using two parametric
curves r1 (t) and r2 (t) are equal under the following conditions:
(1) When the same curve C is traversed in the same direction but with different
speeds.
(2) When the same curve C is traversed in opposite directions.
(3) When starting and ending points are the same but different paths are taken.
Solution 18.3 (1) They are equal. Different speeds are reflected in the expansion
rate. (2) Only the sign is opposite. (3) They are different in general. ⊔
⊓
Question 18.1. Is the line integral for a scalar function (18.1) fundamentally differ-
ent from that for a vector field (18.2), or do they coincide in certain cases (e.g., in
the case of n = 1)? Find the fundamental difference between them.
There are various ways to express the line integral of f(x) = f1 (x)i + f2 (x)j +
f3 (x)k along a curve r(t) = r1 (t)i + r2 (t)j + r3 (t)k for a < t < b. Let’s find the logic
behind each method.
18.3 Path independence, potential, and conservative fields 141
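\[ \begin{aligned} \int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} &= \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot d\mathbf{r}(t) = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt \\ &= \int_a^b \big( f_1(\mathbf{r}(t))\,r_1'(t) + f_2(\mathbf{r}(t))\,r_2'(t) + f_3(\mathbf{r}(t))\,r_3'(t) \big)\,dt \\ &= \int_a^b f_1(\mathbf{r}(t))\,r_1'(t)\,dt + \int_a^b f_2(\mathbf{r}(t))\,r_2'(t)\,dt + \int_a^b f_3(\mathbf{r}(t))\,r_3'(t)\,dt \end{aligned} \]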
You should be familiar with both notations as both are commonly used.
Let’s practice computing line integrals.
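Problem 18.4. Let $\mathbf{f}(\mathbf{x}) = y\,\mathbf{i} + z\,\mathbf{j} + x\,\mathbf{k}$ and $\mathbf{r}(t) = t\,\mathbf{i} + \sqrt{t}\,\mathbf{j} + t^2\,\mathbf{k}$ for a < t < b.
Compute the line integral.
Solution 18.4 (1) First, $\mathbf{f}(\mathbf{r}(t)) = \sqrt{t}\,\mathbf{i} + t^2\,\mathbf{j} + t\,\mathbf{k}$ and $\mathbf{r}'(t) = \mathbf{i} + 0.5t^{-0.5}\,\mathbf{j} + 2t\,\mathbf{k}$.
Therefore,
\[ \int_C \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t)\,dt = \int_a^b \big( \sqrt{t} + 0.5t^{1.5} + 2t^2 \big)\,dt. \]
(2) Instead of computing the dot product above, let’s use the formula that already
computes the dot product:
\[ \int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = \int_C f_1(\mathbf{x})\,dx + \int_C f_2(\mathbf{x})\,dy + \int_C f_3(\mathbf{x})\,dz \]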
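The computation in part (1) can be sketched numerically. In the code below, the endpoints a = 1 and b = 4 are assumptions chosen only for illustration (the problem leaves a and b general), and `line_integral` and `closed_form` are hypothetical helper names. The closed form comes from integrating the three terms of the integrand separately.

```python
import math


def line_integral(f, r, r_prime, a, b, n=2000):
    """Midpoint approximation of the integral of f(r(t)) . r'(t) over [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + (k + 0.5) * dt
        total += sum(fi * vi for fi, vi in zip(f(r(t)), r_prime(t))) * dt
    return total


f = lambda p: (p[1], p[2], p[0])                    # f = y i + z j + x k
r = lambda t: (t, math.sqrt(t), t * t)              # r(t) = t i + sqrt(t) j + t^2 k
r_prime = lambda t: (1.0, 0.5 / math.sqrt(t), 2 * t)


def closed_form(a, b):
    """Antiderivative of sqrt(t) + 0.5 t^1.5 + 2 t^2, evaluated between a and b."""
    F = lambda t: (2 / 3) * t ** 1.5 + 0.2 * t ** 2.5 + (2 / 3) * t ** 3
    return F(b) - F(a)


if __name__ == "__main__":
    a, b = 1.0, 4.0   # assumed endpoints
    print(line_integral(f, r, r_prime, a, b), closed_form(a, b))
```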
Question 18.2. If two different paths r1 and r2 are taken from the starting point S to
the ending point E, the line integrals for a vector field along these paths are generally
different. However, for vector fields like gravitational fields, the line integral should
be the same if the starting and ending points are the same. Why is this the case? Can
you explain?
Vector fields that represent forces, such as gravitational fields, have line integrals
that compute the work done by the gravitational field. Moreover, according to the
law of conservation of energy, the work done by the gravitational field is equal to
the difference in potential energy between the starting and ending points, regardless
of the path taken. Not all vector fields have this property. In this section, we will
learn about the characteristics of vector fields that exhibit such properties.
Definition 18.1. Let f be a vector field on an open domain $D \subset \mathbb{R}^n$. We say the line
integral $\int_C \mathbf{f} \cdot d\mathbf{r}$ is path independent in D and the vector field f is conservative if
the line integral depends only on the starting and ending points S, E ∈ D of the curve
C.
In the above definition, note that we write $\mathbf{f}(x) = (\nabla p(x))^t$ instead of $\mathbf{f}(x) =
\nabla p(x)$. This is because we have decided to represent the gradient of a scalar
function ∇p(x) as a row vector, and f(x) as a column vector.
Problem 18.5. Find the potential function for the gravitational field obtained in
Problem 18.2.
Solution 18.5 The potential function is not unique. Adding a constant still results
in a potential function, as the constant disappears when computing the gradient.
The most commonly used potential function is as follows:
\[ p(\mathbf{x}) = \frac{GMm}{\|\mathbf{x}\|}. \]
Let’s compute the gradient to confirm. The potential function is inversely proportional
to the distance. Note that the potential is not the same as the potential energy; it
is the potential energy with a negative sign. (What about the potential energy at the
origin?) □
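The gradient check suggested in Solution 18.5 can also be done numerically. The sketch below compares a central-difference gradient of p with the inverse-square field. The form of the gravitational field, $-GMm\,\mathbf{x}/\|\mathbf{x}\|^3$, is an assumption here, since Problem 18.2 is not restated in this section; the value GMm = 1 and the test point are also arbitrary choices.

```python
import math

GMm = 1.0  # assumed value of the constant G*M*m, for illustration only

def p(x, y, z):
    # Potential function p(x) = GMm / ||x||.
    return GMm / math.sqrt(x * x + y * y + z * z)

def grad_p(x, y, z, h=1e-5):
    # Central-difference approximation of the gradient of p.
    return (
        (p(x + h, y, z) - p(x - h, y, z)) / (2 * h),
        (p(x, y + h, z) - p(x, y - h, z)) / (2 * h),
        (p(x, y, z + h) - p(x, y, z - h)) / (2 * h),
    )

def gravity(x, y, z):
    # Assumed form of the field from Problem 18.2: -GMm x / ||x||^3.
    norm = math.sqrt(x * x + y * y + z * z)
    return tuple(-GMm * c / norm**3 for c in (x, y, z))

point = (1.0, 2.0, -2.0)  # arbitrary test point away from the origin
print(grad_p(*point))
print(gravity(*point))
```

If the two printed vectors agree, the gradient of p reproduces the assumed field at that point.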
In the next lecture, we will learn that a vector field being conservative and having
a potential function are equivalent.
Exercises
1. Let $\mathbf{f}$ be a vector field with $p = \frac{1}{\|\mathbf{x}\|}$ as its potential function. If the curve $C$ starts
at $S = (2, 1, 0)$ and ends at $E = (-1, 0, 1)$, what is the amount of work done?
2.
Lecture 19
Line Integral and a Fluid Flow
For a vector field, being conservative and having a potential function are equivalent.
This relationship is quite intuitive. The conservation of energy implies that
even if kinetic energy decreases, it is converted into potential energy, thus conserv-
ing the total energy. In a world where potential energy is not defined, there is no
conservation of energy. On the other hand, one of the key properties of a vector field
representing a fluid flow, rather than a force field, is circulation.
The definitions of a vector field being conservative and having a potential function
were discussed in the previous lecture. Now let’s consider their relationship. Firstly,
if a vector field is given by a potential function, it can be easily shown to be conser-
vative.
Problem 19.1 (potential field ⇒ conservative field). Show that if a vector field f
has a potential function, then it is conservative.
Solution 19.1 To show that the vector field f is conservative, we need to demonstrate
that the line integral is path-independent. Let C be a curve with the starting point S
and the ending point E. Let r(t) be a parametric function representing the curve C,
where r(a) = S and r(b) = E. Then,
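\[ \begin{aligned}
\int_C \mathbf{f}(\mathbf{r})\cdot d\mathbf{r} &= \int_a^b \mathbf{f}(\mathbf{r}(t))\cdot\dot{\mathbf{r}}(t)\,dt \\
&= \int_a^b \nabla p(\mathbf{r}(t))\cdot\dot{\mathbf{r}}(t)\,dt \\
&= \int_a^b \big(p(\mathbf{r}(t))\big)'\,dt \\
&= p(\mathbf{r}(b)) - p(\mathbf{r}(a)) \\
&= p(E) - p(S).
\end{aligned} \]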
In other words, the line integral is determined by the starting and ending points, and
the path does not matter. (Explain the meaning of each equality above.)
If a vector field f has a potential function p, then the line integral from the starting
point S to the ending point E is given by:
\[ \int_C \mathbf{f}(\mathbf{r})\cdot d\mathbf{r} = \int_a^b \nabla p(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt = p(E) - p(S). \tag{19.1} \]
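Note that in the above notation the dot product $\cdot$ may be omitted, writing
$\int_a^b \nabla p(\mathbf{r}(t))\,\mathbf{r}'(t)\,dt$. This is because $\nabla p(\mathbf{r}(t))$ is a row vector, so its product with
the column vector $\mathbf{r}'(t)$ is already a scalar.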
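Formula (19.1) can be illustrated numerically. The sketch below integrates a gradient field along two different paths with the same endpoints and compares both results with p(E) − p(S). The potential p, the endpoints, and the two paths are hypothetical choices made only for this illustration.

```python
import math

def p(x, y, z):
    # Hypothetical potential function, chosen only for illustration.
    return x * y + z * z

def grad_p(x, y, z):
    # Exact gradient of p, used as the vector field f.
    return (y, x, 2 * z)

def line_integral(path, n=20000, h=1e-6):
    # Midpoint rule for the integral of grad p(r(t)) . r'(t) over [0, 1];
    # r'(t) is approximated by a central difference.
    dt = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        velocity = [(u - v) / (2 * h) for u, v in zip(path(t + h), path(t - h))]
        total += sum(c * w for c, w in zip(grad_p(*path(t)), velocity)) * dt
    return total

S, E = (0.0, 0.0, 0.0), (1.0, 2.0, 3.0)

def straight(t):
    # Straight segment from S to E.
    return tuple(s + t * (e - s) for s, e in zip(S, E))

def curved(t):
    # A different path with the same endpoints S and E.
    return (t * t, 2 * math.sin(math.pi * t / 2), 3 * t**3)

print(line_integral(straight), line_integral(curved), p(*E) - p(*S))
```

All three printed values should coincide up to discretization error, regardless of the path.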
Now let’s show that a conservative vector field has a potential function. This proof
is more challenging and requires a well-defined potential function.
Problem 19.2 (Conservative field ⇔ potential field). Show that if the region $D \subset \mathbb{R}^3$
is open and connected, and the vector field $\mathbf{f} : D \to \mathbb{R}^3$ is continuous, then the
property of $\mathbf{f}$ being conservative is equivalent to the existence of a potential function.
Solution 19.2 Let’s show that if f is conservative, then a potential function exists.
The reverse direction has already been demonstrated in Problem 19.1. Let’s define
the potential function p(x, y, z) of the vector field f as follows. First, we choose a
point S ∈ D in the domain D as the reference point and set p(S) = 0. Then, for any
point E = (x, y, z) ∈ D, we determine the value of p at this point. Initially, we choose
a curve starting from S and ending at E, and define p using a line integral as follows:
\[ p(E) = p(x, y, z) = \int_a^b \mathbf{f}(\mathbf{r}(t))\cdot\mathbf{r}'(t)\,dt. \]
We need to verify whether the function is well-defined. That is, we need to confirm
that only one value is assigned to each (x, y, z) according to our definition. Even if
we choose a different curve C1 , the value of p(E) should not change. However, since
f is a conservative field, the line integral will yield the same value for the starting
and ending points, regardless of the curve chosen. This fact is crucially used here.
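Now we need to show $\nabla p(x, y, z) = \mathbf{f}(x, y, z)$. Write
$\nabla p = \frac{\partial p}{\partial x}\mathbf{i} + \frac{\partial p}{\partial y}\mathbf{j} + \frac{\partial p}{\partial z}\mathbf{k}$ and $\mathbf{f} = f_1\mathbf{i} + f_2\mathbf{j} + f_3\mathbf{k}$.
To prove $\frac{\partial p}{\partial x} = f_1$, fix a point $E = (x, y, z)$ in the domain $D$, and let $M = (x_0, y, z)$ be a
point in $D$ close enough to $E$ that the line segment $L$ connecting $M$ to $E$ lies in $D$
(refer to the figure). Then, for a parametric curve $\mathbf{r}(t) = t\mathbf{i} + y\mathbf{j} + z\mathbf{k}$ representing $L$
with $x_0 < t < x$, we have $\mathbf{r}'(t) = (1, 0, 0)$, and
\[ p(x, y, z) = \int_{C_0} \mathbf{f}(\mathbf{r})\cdot d\mathbf{r} + \int_L \mathbf{f}(\mathbf{r})\cdot d\mathbf{r}, \]
where $C_0$ is a curve from $S$ to $M$. Differentiating with respect to $x$, the terms that are
constant with respect to $x$ vanish, yielding:
\[ \frac{\partial p}{\partial x}(E) = \frac{\partial}{\partial x}\int_L \mathbf{f}(\mathbf{r})\cdot d\mathbf{r} = \frac{\partial}{\partial x}\int_{x_0}^{x} f_1(\mathbf{r}(t))\,dt = f_1(x, y, z). \]
To prove $\frac{\partial p}{\partial y} = f_2$ and $\frac{\partial p}{\partial z} = f_3$, we need to choose the curve slightly differently.
Try it yourself. □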
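The construction in Solution 19.2 can be imitated numerically. The sketch below defines p(E) as the line integral from a reference point S to E and then checks that the partial derivative of p in x matches the first component of the field. The field, the reference point, and the use of straight segments as paths are assumptions made for this illustration; straight segments are admissible here only because the chosen domain is all of space.

```python
def f(x, y, z):
    # Hypothetical conservative field; it is the gradient of x*y*z.
    return (y * z, x * z, x * y)

S = (0.0, 0.0, 0.0)  # reference point with p(S) = 0

def potential(E, n=2000):
    # p(E) = line integral of f along the straight segment from S to E,
    # r(t) = S + t (E - S), 0 < t < 1, evaluated with the midpoint rule.
    direction = [e - s for s, e in zip(S, E)]
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        point = [s + t * d for s, d in zip(S, direction)]
        total += sum(c * d for c, d in zip(f(*point), direction)) / n
    return total

def dp_dx(E, h=1e-4):
    # Central-difference approximation of the partial derivative of p in x.
    x, y, z = E
    return (potential((x + h, y, z)) - potential((x - h, y, z))) / (2 * h)

E = (1.0, 2.0, 3.0)
print(potential(E))        # compare with x*y*z at E
print(dp_dx(E), f(*E)[0])  # compare with f1(E) = y*z
```

The same check with differences in y and z would test the other two components.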
A curve with the same starting and ending points is called a loop or a closed curve.
Line integrals along loops will frequently appear.
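Solution 19.3 Let $\mathbf{r}(t)$, $a < t < b$, be the parameterization of $C$. Then $\mathbf{r}(a) = S = E = \mathbf{r}(b)$.
Let $M$ be a point on the curve $C$. Then, $C$ can be divided into two parts $C_1$ and $C_2$, and
we can denote by $\mathbf{r}_1$ and $\mathbf{r}_2$ two parameterizations connecting $S$ and $M$ (see the figure).
Since $\mathbf{f}$ is a conservative vector field, we have $\int \mathbf{f}(\mathbf{r}_1)\cdot d\mathbf{r}_1(t) = \int \mathbf{f}(\mathbf{r}_2)\cdot d\mathbf{r}_2(t)$. By
choosing a path that follows $\mathbf{r}_1$ first and then $\mathbf{r}_2$ in the opposite direction, we obtain:
\[ \int_C \mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = \int_{C_1} \mathbf{f}(\mathbf{x})\cdot d\mathbf{x} - \int_{C_2} \mathbf{f}(\mathbf{x})\cdot d\mathbf{x} = 0. \]
□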
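\[ \frac{\partial f_1}{\partial y} = \frac{\partial f_2}{\partial x}, \quad \frac{\partial f_1}{\partial z} = \frac{\partial f_3}{\partial x}, \quad \frac{\partial f_2}{\partial z} = \frac{\partial f_3}{\partial y}. \tag{19.2} \]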
It is important to note in the above criterion that D must be simply connected, not
just connected. The definition of simply connected is as follows.
Definition 19.2. An open domain D is called simply connected if every closed sim-
ple curve in D can be continuously contracted to a single point in D without leaving
D.
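The role of simple connectedness in criterion (19.2) can be seen in a standard planar example, sketched below in Python. The vortex field used here is a well-known illustration that is not taken from this lecture. It is defined on the plane with the origin removed, which is connected but not simply connected. Its cross partial derivatives agree, as in the two-dimensional version of (19.2), yet its line integral around the unit circle is not zero.

```python
import math

def f(x, y):
    # Vortex field on the punctured plane (the origin is excluded).
    r2 = x * x + y * y
    return (-y / r2, x / r2)

def cross_partials(x, y, h=1e-5):
    # Central differences for d f1 / dy and d f2 / dx.
    df1_dy = (f(x, y + h)[0] - f(x, y - h)[0]) / (2 * h)
    df2_dx = (f(x + h, y)[1] - f(x - h, y)[1]) / (2 * h)
    return df1_dy, df2_dx

def circulation(radius=1.0, n=10000):
    # Line integral of f around the circle of the given radius,
    # traversed counterclockwise, by the midpoint rule.
    total = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        f1, f2 = f(x, y)
        total += (f1 * dx + f2 * dy) * dt
    return total

print(cross_partials(0.7, -1.3))  # the two values agree
print(circulation())              # close to 2*pi, not 0
```

Because the loop encloses the removed point, it cannot be contracted inside the domain, and the criterion alone does not guarantee a conservative field.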
Problem 19.5. Determine whether the regions D depicted in the four figures are
simply connected. (The point marked in the middle indicates that the region has
been removed around that point.)
Solution 19.5 (1) The first figure is a partial disc with a small part removed around
the center, creating a loop that cannot be contracted to a single point. (Can you
identify which loop it is?) Therefore, it is not simply connected. (2) The second
figure is a disc with a small part removed around the center. Every loop can be
contracted to a single point. Therefore, it is simply connected. (3) The third figure
is obtained by drilling a cylinder into a filled sphere. It is not simply connected. (4)
The fourth figure resembles a donut shape, known as a torus. The inside is filled,
which is equivalent to structure (3) in practice, and it is not simply connected. □
Now let’s consider line integrals from the perspective of a velocity field representing
the flow of a fluid, rather than from the viewpoint of a force field like gravity. In this
case, instead of calling the line integral "work," we refer to it as the flow around the
curve $C$:
\[ \text{Flow around } C = \int_C \mathbf{f}(\mathbf{x})\cdot d\mathbf{r}(t). \]
In particular, when the curve C is a closed and simple curve, i.e., a loop, we call it
the circulation around the loop C:
\[ \text{Circulation around } C = \oint_C \mathbf{f}(\mathbf{x})\cdot d\mathbf{r}(t). \]
Of course, when the vector field f is not specified as a force field or a velocity field,
whether we call it work, flow, or circulation, we understand that its meaning simply
denotes a line integral.
If f is a conservative vector field, then the circulation is zero everywhere. This was
already shown in Problem 19.3.
Among the three vector field diagrams following Problem 18.2, one has nonzero
circulation. Which one is it? For the remaining two diagrams, it is difficult to say
definitively from the picture alone that the circulation is zero.
Question 19.2. When computing line integrals, changing the direction of the pa-
rameterization of the curve reverses the sign. This property poses a serious problem
when calculating circulation. What is the issue?
Unless the direction of traversal is fixed, the sign of the circulation around the curve C
is ambiguous. If the fluid exhibits vortex behavior, it’s crucial to determine
whether the circulation is in the clockwise or counterclockwise direction. If this
distinction cannot be made, the concept becomes useless. Therefore, we need to
establish a convention. When computing circulation, we define the direction of the
loop in the counterclockwise direction as shown in the diagram. However, the prob-
lem remains unresolved. For example, in the case of a loop in three-dimensional
space as shown on the right, it’s not clear which direction is counterclockwise. We
need to decide which direction to point as counterclockwise. This is equivalent to
determining the orientation.
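The sign reversal raised in Question 19.2 is easy to observe numerically. The sketch below computes the circulation of a rigid rotation around the unit circle, once counterclockwise and once clockwise. The field and the loop are hypothetical choices for illustration.

```python
import math

def f(x, y):
    # Rigid rotation about the origin, a simple model of a vortex.
    return (-y, x)

def circulation(direction, n=10000):
    # Circulation around the unit circle; direction = +1 traverses it
    # counterclockwise, direction = -1 clockwise.
    total = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = direction * (k + 0.5) * dt
        x, y = math.cos(t), math.sin(t)
        dx, dy = -direction * math.sin(t), direction * math.cos(t)
        f1, f2 = f(x, y)
        total += (f1 * dx + f2 * dy) * dt
    return total

print(circulation(+1), circulation(-1))
```

The two values have the same magnitude and opposite signs, which is why a convention for the direction of the loop is needed.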
Exercises
1.
Lecture 20
Surface Integral for Normal Component
In this lecture, we will learn about surface integrals for the normal component of
a vector field. Since we are integrating the component of the vector field perpendicular
to the surface, it’s also appropriate to call it a normal integral. In three-dimensional
space, a surface has two dimensions, but in general, in n-dimensional space, a
surface has n − 1 dimensions. In two-dimensional space, a one-dimensional curve
serves the role of a surface. In any case, integrating the normal component of a vec-
tor field over a given surface is called a surface integral. Surface integrals play an
important role, such as computing the flux leaving an area through its boundary.
For scalar functions, the value of the surface integral remains the same regardless of
the parametrization used.
\[ \int_R \frac{\|\nabla F\|}{|\partial_z F|}\,dx\,dy. \quad \text{(formula for area of level surface)} \]
Solution 20.1 Let’s write x, y directly instead of using parameters u, v. Then, the
function g can be written as follows:
\[ \mathbf{g}(x, y) = \begin{pmatrix} x \\ y \\ h(x, y) \end{pmatrix} \]
Since g(x, y) lies on the surface, it satisfies F(x, y, h(x, y)) = 1. Since the derivative
of a constant is 0, we have:
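\[ \begin{aligned}
\frac{\partial}{\partial x} F(\mathbf{g}(x, y)) &= \frac{\partial}{\partial x} F(x, y, h(x, y)) = F_x + F_z h_x = 0, \\
\frac{\partial}{\partial y} F(\mathbf{g}(x, y)) &= \frac{\partial}{\partial y} F(x, y, h(x, y)) = F_y + F_z h_y = 0.
\end{aligned} \]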
Therefore, we obtain $h_x = -F_x/F_z$ and $h_y = -F_y/F_z$. Substituting them, the expansion
rate is as follows:
\[ \sqrt{h_x^2 + h_y^2 + 1} = \sqrt{\left(\frac{F_x}{F_z}\right)^2 + \left(\frac{F_y}{F_z}\right)^2 + \left(\frac{F_z}{F_z}\right)^2} = \frac{\|\nabla F\|}{|F_z|}. \]
Since the area is obtained by integrating the constant function $f = 1$, we obtain the
formula for area mentioned above. □
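To see the area formula at work, the sketch below applies it to a cap of the unit sphere, the level surface F = 1 of F(x, y, z) = x² + y² + z², lying over the disc x² + y² < 1/4. The choice of F, the disc, and the polar-coordinate midpoint rule are assumptions made for illustration. The result is compared with the classical area of a spherical cap.

```python
import math

def integrand(x, y):
    # ||grad F|| / |F_z| for F(x, y, z) = x^2 + y^2 + z^2 on the upper
    # part of the level surface F = 1, where z = sqrt(1 - x^2 - y^2).
    z = math.sqrt(1.0 - x * x - y * y)
    grad = (2 * x, 2 * y, 2 * z)
    return math.sqrt(sum(c * c for c in grad)) / abs(grad[2])

def cap_area(rho=0.5, n=400):
    # Midpoint rule in polar coordinates over the disc R of radius rho;
    # the extra factor r is the area element dx dy = r dr dtheta.
    dr, dth = rho / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += integrand(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total

exact = 2 * math.pi * (1 - math.sqrt(1 - 0.25))  # area of the spherical cap
print(cap_area(), exact)
```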
In three-dimensional space, there are many tangent lines touching a point on a given
two-dimensional surface. Moreover, a single tangent line does not characterize the
shape of the surface. The collection of all these tangent lines forms the tangent plane,
as depicted in the figure above. This tangent plane provides information about the
surface near a point. Therefore, instead of individual tangent lines, we need to find
the tangent plane.
Let’s denote the intersection of two line segments parallel to each axis of the
domain G of the given parametric function g(u, v) as (u0 , v0 ). Then, these two line
segments can be represented as (u0 , v) with a < v < b and (u, v0 ) with c < u < d.
The image of these line segments on the surface S becomes a curve. The partial
derivatives gu (u0 , v0 ) and gv (u0 , v0 ) become vectors tangent to these two curves at
the point g(u0 , v0 ). Therefore, the cross product of these two vectors is perpendicular
to the tangent plane. The unit vector in this direction is given by:
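\[ \mathbf{n} = \frac{\mathbf{g}_u \times \mathbf{g}_v}{\|\mathbf{g}_u \times \mathbf{g}_v\|}. \tag{20.2} \]
The tangent plane at $\mathbf{x}_0$ is then given by
\[ (\mathbf{x} - \mathbf{x}_0)\cdot\mathbf{n} = 0, \quad \mathbf{x}_0 = \mathbf{g}(u_0, v_0). \]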
Of course, when obtaining the equation of the tangent plane, it’s not necessary to use
the unit vector n, but any vector perpendicular to the plane can be used. Therefore,
it’s convenient to use
(x − x0 ) · (gu × gv ) = 0
without explicitly using the unit vector.
Problem 20.2. At the point $(0; 3; 4)$ on the surface of a sphere with center at the origin
and radius 5, the direction vector perpendicular to the sphere is simply $\frac{1}{5}(0; 3; 4)$.
Verify this by using the method of (20.2) to find the vector $\mathbf{n}$.
Solution 20.2 The key to this problem is to choose $\mathbf{g}$ and compute the vector yourself.
Let’s set the domain of $\mathbf{g}$ as $G = \{(x, y) : x^2 + y^2 < 25\}$, and define the function
\[ \mathbf{g}(x, y) = \left(x;\, y;\, \sqrt{25 - x^2 - y^2}\right), \quad x^2 + y^2 < 25. \]
It’s easy to see that $\mathbf{g}(x, y)$ lies on the sphere. The point $(0; 3; 4)$ occurs when
$x = 0$, $y = 3$. Therefore,
\[ \begin{aligned}
\mathbf{g}_x(0, 3) &= \left(1;\, 0;\, -\tfrac{1}{2}(25 - x^2 - y^2)^{-0.5}\,2x\right) = (1; 0; 0), \\
\mathbf{g}_y(0, 3) &= \left(0;\, 1;\, -\tfrac{1}{2}(25 - x^2 - y^2)^{-0.5}\,2y\right) = (0; 1; -3/4).
\end{aligned} \]
Now, to use (20.2), compute the cross product and direction vector:
\[ \mathbf{g}_x(0, 3) \times \mathbf{g}_y(0, 3) = \left(0;\, \tfrac{3}{4};\, 1\right), \quad \mathbf{n} = \tfrac{1}{5}(0; 3; 4). \]
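The computation in Solution 20.2 can be cross-checked numerically. The sketch below approximates the partial derivatives of the parametrization by central differences, forms their cross product, and normalizes it as in (20.2). The step size h and the helper names are illustrative assumptions.

```python
import math

def g(x, y):
    # Parametrization of the upper hemisphere of radius 5.
    return (x, y, math.sqrt(25.0 - x * x - y * y))

def partials(x, y, h=1e-6):
    # Central-difference approximations of g_x and g_y.
    gx = tuple((a - b) / (2 * h) for a, b in zip(g(x + h, y), g(x - h, y)))
    gy = tuple((a - b) / (2 * h) for a, b in zip(g(x, y + h), g(x, y - h)))
    return gx, gy

def cross(a, b):
    # Cross product of two vectors in R^3.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

gx, gy = partials(0.0, 3.0)
normal = cross(gx, gy)
length = math.sqrt(sum(c * c for c in normal))
n = tuple(c / length for c in normal)
print(n)  # compare with (0, 3/5, 4/5)
```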
Solution 20.3 Since the radius of the sphere is fixed at ρ = 5, we choose parameters
φ and θ , and let the domain be (φ , θ ) ∈ [0, π] × [0, 2π]. Then, define g as follows:
\[ \mathbf{g}(\varphi, \theta) = \begin{pmatrix} 5\sin\varphi\cos\theta \\ 5\sin\varphi\sin\theta \\ 5\cos\varphi \end{pmatrix}. \]
Here, T is the direction vector of the tangent to the curve. However, there are two
directions, and depending on the choice, the magnitude remains the same but the
sign changes (see the figure). Since there are many tangent vectors on the surface,
such a definition is meaningless. However, there are only two normal directions
on the surface, and they provide information about the surface (see the figure). In
reality, the surface integral is defined as follows:
\[ \int_S \mathbf{f}\cdot\mathbf{n}\,dx. \]
That is, the component of the vector field perpendicular to the surface is scalar, and
integrating it gives the surface integral of the vector field.
It is worth noting that there are two possibilities for the normal vector n. There-
fore, in some cases, it may not matter which direction you choose, but if a direction
is specified in the problem, you should choose the normal vector n accordingly.
Now, using parameterization, the surface integral above can be written as follows.
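\[ \int_S \mathbf{f}\cdot\mathbf{n}\,dx = \int_G \mathbf{f}(\mathbf{g}(u, v))\cdot\frac{\mathbf{g}_u \times \mathbf{g}_v}{\|\mathbf{g}_u \times \mathbf{g}_v\|}\,\|\mathbf{g}_u \times \mathbf{g}_v\|\,du\,dv. \]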
Problem 20.4. Let the vector field $\mathbf{f}(\mathbf{x}) = (y; xz; z)$, and the surface $S$ be defined by
$x = y + z^2$ with $0 < z < 2$ and $0 < y < 3$. Compute the surface integral $\int_S \mathbf{f}\cdot\mathbf{n}\,dx$.
Solution 20.4 In this problem, since the normal vector n is not specified as one
of two possible normal vectors, there may be two possible answers. Let’s compute
them. First, choose parameters (y, z) and let the domain be G = {0 < y < 3, 0 < z <
2}. The mapping g : G → R3 is as follows:
\[ \mathbf{g}(y, z) = \begin{pmatrix} y + z^2 \\ y \\ z \end{pmatrix}. \]
Therefore,
(The cross product $\mathbf{g}_y \times \mathbf{g}_z = (1, -1, -2z)$ is nonzero at every point of $G$. This condition
is necessary for the normal vector $\mathbf{n}$ to be defined by (20.2).) □
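For Problem 20.4, the parametrized form of the surface integral can be evaluated numerically. The sketch below uses the midpoint rule over G with the cross product (1, −1, −2z) stated above. With this normal the integrand is y − yz − z³ − 2z², and a hand computation under the same setup gives −28; the opposite normal gives +28. The grid size and function names are illustrative assumptions.

```python
def f(x, y, z):
    # Vector field f(x) = (y; xz; z) from Problem 20.4.
    return (y, x * z, z)

def g(y, z):
    # Parametrization of the surface x = y + z^2.
    return (y + z * z, y, z)

def normal(y, z):
    # Cross product g_y x g_z = (1, -1, -2z), as stated in the solution.
    return (1.0, -1.0, -2.0 * z)

def surface_integral(n=400):
    # Midpoint rule over G = {0 < y < 3, 0 < z < 2} for
    # f(g(y, z)) . (g_y x g_z).
    dy, dz = 3.0 / n, 2.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * dy
        for j in range(n):
            z = (j + 0.5) * dz
            total += sum(a * b for a, b in zip(f(*g(y, z)), normal(y, z))) * dy * dz
    return total

value = surface_integral()
print(value, -value)  # one value per choice of normal direction
```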
Question 20.1. Surface integrals are not possible for all surfaces, and conditions
on the surface are necessary for the calculations we’ve done in this lecture. What
conditions are necessary?
The surface S must be smooth for us to perform the calculations in this lecture.
Smoothness of the surface means that the parameter function g(u, v) is a differen-
tiable function. Additionally, the partial derivatives gu and gv must not be zero, and
they must not be parallel. This ensures that gu × gv is not the zero vector, allowing
us to compute the normal vector n using (20.2). In other words, ∇g must always be
a rank 2 matrix.
Exercises
1.
Lecture 21
Divergence Theorem #1
Solution 21.1 □
Solution 21.2 (1) The boundary of a disk with radius 1 is a closed curve, which
is a circle. The disk is in 2-dimensional space, and its boundary is a 1-dimensional
curve. The normal vector forms a 1-dimensional line. (2) The boundary of a ball
with radius 1 is a surface called a sphere. This surface has no boundary. The ball
is in 3-dimensional space, and its boundary, the sphere, is a 2-dimensional surface.
The normal vector forms a 1-dimensional line. (3) The boundary of a 4-dimensional
ball with radius 1 is a kind of sphere. This surface has no boundary. The domain
D is in 4-dimensional space, and its boundary is 3-dimensional. The normal vector
forms a 1-dimensional line. □
Although there may be special spaces, as seen above, we can think of the boundary
of a bounded n-dimensional domain $D \subset \mathbb{R}^n$ as an $(n-1)$-dimensional surface.
Additionally, the normal vector always forms a 1-dimensional line. Therefore, when
the boundary of a bounded n-dimensional domain $D \subset \mathbb{R}^n$ forms an $(n-1)$-dimensional
smooth surface, the surface integral of the vector field $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$ is defined in the
same way:
\[ \int_{\partial D} \mathbf{f}\cdot\mathbf{n}\,dx. \]
Now, regarding the direction vector n perpendicular to the surface, there are two
possible cases. For a surface like ∂ D that forms the boundary of a single domain,
the convention is to choose the outward unit normal vector pointing from inside the
space to outside.
Proof. We will only prove the case for n = 2 instead of general dimensions. Al-
though the proof for dimensions 3 and higher is essentially the same, it involves
more cumbersome notation. The proof can also be extended similarly to other di-
mensions. To complete the proof, we should show convergence, but we will omit
that part and conclude by explaining the core mechanism. Additionally, we will
only consider the case when the domain D is a rectangle.
Let $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ be given by $\mathbf{f}(x, y) = f_1(x, y)\mathbf{i} + f_2(x, y)\mathbf{j}$. Consider a partition of
the domain D into small cells, and let one of the cells be denoted as Di . Then, Di
has a boundary consisting of 4 line segments C1 ,C2 ,C3 ,C4 , and the outward unit
normal vector for each boundary is as shown in the figure. (Since we are dealing
with the 2-dimensional case and the line forms the boundary of Di , it should not be
thought of as a line integral. We should consider it as a surface integral because we
are integrating the normal component.) Now, when we perform the surface integral
over the boundary of Di , we have
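\[ \begin{aligned}
\int_{\partial D_i} \mathbf{f}\cdot\mathbf{n}\,dx &= \int_{C_1} \mathbf{f}\cdot\mathbf{n}\,dx + \int_{C_3} \mathbf{f}\cdot\mathbf{n}\,dx + \int_{C_2} \mathbf{f}\cdot\mathbf{n}\,dx + \int_{C_4} \mathbf{f}\cdot\mathbf{n}\,dx \\
&\cong \left( f_1\!\left(x + \tfrac{\Delta x}{2}, y\right) - f_1\!\left(x - \tfrac{\Delta x}{2}, y\right) \right)\Delta y + \left( f_2\!\left(x, y + \tfrac{\Delta y}{2}\right) - f_2\!\left(x, y - \tfrac{\Delta y}{2}\right) \right)\Delta x \\
&= \left( \frac{f_1\!\left(x + \frac{\Delta x}{2}, y\right) - f_1\!\left(x - \frac{\Delta x}{2}, y\right)}{\Delta x} + \frac{f_2\!\left(x, y + \frac{\Delta y}{2}\right) - f_2\!\left(x, y - \frac{\Delta y}{2}\right)}{\Delta y} \right)\Delta x\,\Delta y \\
&\cong \left( \frac{\partial f_1}{\partial x}(x, y) + \frac{\partial f_2}{\partial y}(x, y) \right)\Delta x\,\Delta y = (\nabla\cdot\mathbf{f})\,\Delta x\,\Delta y.
\end{aligned} \]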
Now, let’s consider the surface integral over the entire boundary ∂ D. Each line
segment on the boundary of a cell that lies inside contributes twice to the surface
integral, with opposite directions of the normal vector. On the other hand, the line
segments on the boundary of the entire domain D belong to only one cell. Therefore,
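\[ \int_{\partial D} \mathbf{f}\cdot\mathbf{n}\,dx = \sum_{i=1}^{N} \int_{\partial D_i} \mathbf{f}\cdot\mathbf{n}\,dx \cong \sum_{i=1}^{N} (\nabla\cdot\mathbf{f})\,\Delta x\,\Delta y \cong \int_D \nabla\cdot\mathbf{f}\,dx. \]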
In the above calculation, the first equality holds because the internal boundaries of
the cells always appear twice, canceling each other out. This proof demonstrates
why the surface integral and the volume integral are connected by the divergence
theorem. ⊔ ⊓
Problem 21.3. (1) Prove the divergence theorem in one dimension. (2) Prove the
divergence theorem in three dimensions.
Solution 21.3 (2) Let’s illustrate only a part of the general proof. Similar to the
2-dimensional case, consider a domain D ⊂ R3 in the shape of a cube and divide the
entire domain into small cube cells, denoted as Di . Let (x, y, z) be the coordinates of
the center of a cell. The boundary of this cell consists of six faces, as shown below.
Let’s denote the two faces perpendicular to the x-axis as $S_1$ and $S_2$. The integral
over the surface of this cell, $\int_{\partial D_i} \mathbf{f}\cdot\mathbf{n}$, involves surface integrals over six faces, but
we are interested in the surface integrals over $S_1$ and $S_2$, which can be approximated
as follows:
\[ \begin{aligned}
\int_{S_1} \mathbf{f}\cdot\mathbf{n} + \int_{S_2} \mathbf{f}\cdot\mathbf{n} &\cong \int_{S_1} f_1 - \int_{S_2} f_1 \cong \left( f_1\!\left(x + \tfrac{\Delta x}{2}, y, z\right) - f_1\!\left(x - \tfrac{\Delta x}{2}, y, z\right) \right)\Delta y\,\Delta z \\
&= \frac{f_1\!\left(x + \frac{\Delta x}{2}, y, z\right) - f_1\!\left(x - \frac{\Delta x}{2}, y, z\right)}{\Delta x}\,\Delta x\,\Delta y\,\Delta z \cong \frac{\partial f_1}{\partial x}\,\Delta x\,\Delta y\,\Delta z.
\end{aligned} \]
This calculation demonstrates how the surface integral changes into a volume integral
for divergence. Now, performing similar surface integrals for the remaining
pairs of faces, we obtain partial derivatives with respect to y and z. Summing them
all up, we get:
\[ \int_{\partial D_i} \mathbf{f}\cdot\mathbf{n} \cong \nabla\cdot\mathbf{f}(x, y, z)\,\Delta x\,\Delta y\,\Delta z. \]
Now, summing up the integrals over all cells leads to the divergence theorem, as in
the proof of the theorem. The integral over the normal component applies regardless
of the dimension. □
Problem 21.4. How do we prove the theorem for a general case where the domain
is not a rectangle?
Solution 21.4 Let’s consider the case when the shape of the domain is not a rectan-
gle, as shown below. First, we’ll enclose the domain D within a rectangular area and
divide it into small cells of side length ε, as depicted. Let’s gather all cells inside the
domain D and call it Dε . Then, Dε is not a rectangle, but we can apply the previous
proof directly to obtain:
\[ \int_{D_\varepsilon} \nabla\cdot\mathbf{f} = \int_{\partial D_\varepsilon} \mathbf{n}\cdot\mathbf{f}. \]
but there is uncertainty in this regard. Even though the lengths of ∂ Dε do not con-
verge to the length of ∂ D as ε → 0, we can still observe that the relationship holds.
For instance, in the triangle on the right side of the figure, the red and black line
segments have different lengths, but their line integrals are the same. Calculating,
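\[ \begin{aligned}
\int_{\ell_1} \mathbf{n}\cdot\mathbf{f} &= \int_{\ell_1} f_2 \cong f_2(x, y)\,\Delta x, \qquad \int_{\ell_2} \mathbf{n}\cdot\mathbf{f} = \int_{\ell_2} f_1 \cong f_1(x, y)\,\Delta x, \\
\int_{\ell_3} \mathbf{n}\cdot\mathbf{f} &= \int_{\ell_3} \left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)\cdot\mathbf{f} = \int_{\ell_3} \left(\frac{f_1}{\sqrt{2}} + \frac{f_2}{\sqrt{2}}\right) \cong \left(f_1(x, y) + f_2(x, y)\right)\Delta x.
\end{aligned} \]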
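Now, combining the above relationships, we get:
\[ \int_D \nabla\cdot\mathbf{f}\,dx = \lim_{\varepsilon\to 0} \int_{D_\varepsilon} \nabla\cdot\mathbf{f} = \lim_{\varepsilon\to 0} \int_{\partial D_\varepsilon} \mathbf{n}\cdot\mathbf{f} = \int_{\partial D} \mathbf{f}\cdot\mathbf{n}\,dx. \]
These relationships provide insight into why the theorem holds, although it’s not a
rigorous proof. □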
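The cell argument of Solution 21.4 can be tried out numerically on a domain that is not a rectangle. The sketch below takes D to be the unit disc, sums (∇·f)ε² over the small square cells whose centres lie in D, and compares the result with the flux through the circle. The vector field, the cell size, and the rule of selecting cells by their centres (a simplification of "cells inside D") are all assumptions made for this illustration.

```python
import math

def f(x, y):
    # Hypothetical smooth vector field, chosen only for illustration.
    return (x**3, y)

def div_f(x, y):
    # Divergence of f computed by hand: d(x^3)/dx + d(y)/dy.
    return 3 * x * x + 1

def volume_side(eps=0.005):
    # Sum of (div f) * eps^2 over the cells whose centre lies in the unit
    # disc; this stands in for the integral over D_eps.
    m = int(round(1.0 / eps))
    total = 0.0
    for i in range(-m, m):
        x = (i + 0.5) * eps
        for j in range(-m, m):
            y = (j + 0.5) * eps
            if x * x + y * y < 1.0:
                total += div_f(x, y) * eps * eps
    return total

def boundary_side(n=10000):
    # Flux through the unit circle with outward normal n = (cos t, sin t).
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        nx, ny = math.cos(t), math.sin(t)
        f1, f2 = f(nx, ny)
        total += (f1 * nx + f2 * ny) * dt
    return total

print(volume_side(), boundary_side())
```

The two numbers approach each other as the cell size shrinks, even though the staircase boundary of the cells never becomes a smooth circle.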
This means that regardless of the chosen domain D, the amount entering and leaving
the domain always remains the same, so the total quantity does not change. If it were
compressed, more would enter, and if it were expanded, more would leave, but the
fact that the total quantity remains unchanged implies that there is no compression
or expansion. Now, let’s practice with some examples.
Solution 21.5 □
Solution 21.6 □
Here, the left-hand side should not be referred to as a line integral. We are integrating
the normal component of the vector field f, not the tangential component. Unlike line
integrals, where the value changes depending on the direction of integration, in this
case, it does not. It is more accurate to call it a surface integral in two dimensions. In
two dimensions, since the boundary of the domain is a curve rather than a surface, it
may seem otherwise. However, changing the direction of the normal vector n in the
surface integral changes the sign of the integral value. The right-hand side is more
of a simple two-dimensional double integral than a surface integral. Writing this in
terms of the components of the vector field f = f1 i + f2 j, we have
\[ \oint_C \mathbf{f}\cdot\mathbf{n}\,dx = \int_D (\partial_x f_1 + \partial_y f_2)\,dx\,dy. \tag{21.1} \]
Let’s rewrite this for the curve C using the parameterization r(t) = x(t)i + y(t)j.
There are a few things to note here. Firstly, r′ (t) = x′ (t)i + y′ (t)j is a vector tangent
to the curve. Then, what is the outward unit normal vector n? It is clear that −y′ (t)i+
x′ (t)j and y′ (t)i−x′ (t)j are normal vectors. But which one is outward? It is related to
the rotation direction of r(t). We usually choose curves that rotate counterclockwise,
in which case y′ (t)i − x′ (t)j is the outward normal direction. Therefore, the outward
unit normal n is given by:
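\[ \mathbf{n} = \frac{y'(t)\mathbf{i} - x'(t)\mathbf{j}}{\|\mathbf{r}'(t)\|}. \]
Hence,
\[ \oint_C \mathbf{f}\cdot\mathbf{n}\,dx = \int_a^b \frac{f_1 y'(t) - f_2 x'(t)}{\|\mathbf{r}'(t)\|}\,\|\mathbf{r}'(t)\|\,dt = \int_a^b \left(f_1 y'(t) - f_2 x'(t)\right)dt = \oint_C f_1\,dy - f_2\,dx, \]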
Theorem 21.2 (Green’s Theorem for Flux). Let $S \subset \mathbb{R}^2$ be a bounded smooth domain.
Let $C = \partial S$ be the boundary of $S$ and $\mathbf{n}$ be the outward unit normal vector
on the boundary. If $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ is a continuously differentiable vector field in an
open set $D$ including $S$, then Equation (21.2) is satisfied, where the line integral is in the
counterclockwise direction.
Lecture 22
Divergence Theorem #2
Consider a vector field v representing the velocity of a fluid such as wind or water.
In this case, the divergence theorem leads to a conservation law. Let’s explain this
process.
Let u(x,t) be the density of the fluid at a point x at time t. Suppose we fix the
region D. Then, the integral
\[ \int_D u(\mathbf{x}, t)\,dx \quad \text{(total mass)} \]
represents the total mass of the substance within the region D. If we differentiate
this quantity with respect to time, we obtain the rate of increase of mass within the
region D:
\[ \frac{d}{dt}\int_D u(\mathbf{x}, t)\,dx \quad \text{(rate of increase of mass)} \]
Here, the term "rate of increase" refers to the rate of change with respect to time;
a negative value means that the mass is decreasing. On the other hand, at the boundary
of the region D, there are parts where fluid flows in and parts where it flows out.
The direction of flow is given by the velocity vector field v, and the amount of flow
is given by the product of velocity and density, which is called flux. That is, flux
is given by f = uv. Therefore, integrating this over the boundary yields the amount
leaving the region per unit time:
\[ \int_{\partial D} \mathbf{f}\cdot\mathbf{n}\,dx \quad \text{(rate of mass outflow)} \]
Hence, the rate of increase of mass within the region and the outflow rate have
opposite signs. Therefore,
\[ \frac{d}{dt}\int_D u(\mathbf{x}, t)\,dx = -\int_{\partial D} \mathbf{f}\cdot\mathbf{n}\,dx = -\int_D \nabla\cdot\mathbf{f}\,dx. \]
Here, the second equality follows from the divergence theorem. Now, rearranging
the above equation, we obtain:
\[ \int_D \left(u_t + \nabla\cdot\mathbf{f}\right)dx = 0. \]
This equation, written in integral form, is called the conservation law. Since the
integral must be zero over every region $D$, the integrand itself must be zero.
Therefore, we obtain:
\[ u_t(\mathbf{x}, t) + \nabla\cdot\mathbf{f} = 0 \]
This partial differential equation is commonly known as the mass conservation law.
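The integral form of the conservation law can be illustrated in one space dimension. The sketch below is not a method from this lecture; it is a simple cell-by-cell update, under the assumptions of a constant velocity v > 0, a periodic interval, and an upwind choice of the flux uv at each cell face. Each cell changes only by the flux through its two faces, so the total mass stays the same.

```python
def step(u, v, dx, dt):
    # One conservative update of u_t + (u v)_x = 0 on a periodic grid.
    # flux[i] is the upwind flux u*v through the left face of cell i
    # (v > 0 is assumed, so the upwind value is the cell to the left).
    n = len(u)
    flux = [v * u[i - 1] for i in range(n)]
    return [u[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]

n, v = 100, 1.0          # number of cells and constant velocity (assumed)
dx = 1.0 / n
dt = 0.5 * dx / v        # time step chosen so that v*dt/dx = 0.5
u = [1.0 if 0.2 < (i + 0.5) * dx < 0.4 else 0.0 for i in range(n)]

mass_before = sum(u) * dx
for _ in range(200):
    u = step(u, v, dx, dt)
mass_after = sum(u) * dx
print(mass_before, mass_after)
```

The flux leaving one cell enters its neighbour, which is the discrete counterpart of the cancellation of internal boundaries in the proof of the divergence theorem.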
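In three dimensions, if the velocity is written as $\mathbf{v} = (v_1, v_2, v_3)$, then, since
$\mathbf{f} = u\mathbf{v}$, the equation can be rewritten as follows:
\[ u_t + \partial_x(u v_1) + \partial_y(u v_2) + \partial_z(u v_3) = 0. \]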
Let’s consider a charged particle located at the origin with a charge of q. The electric
field created by this charge is given by:
\[ \mathbf{E}(\mathbf{x}) = \frac{q\,\mathbf{x}}{4\pi\varepsilon_0\|\mathbf{x}\|^3}. \]
Here, ε0 is a physical constant. Despite the difference in coefficients, it has the same
structure as the force field due to gravity. As already calculated in Problem 21.1, it
satisfies the following:
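holds. Thus, the focus is on the integral over regions containing $\mathbf{0}$. If $B_r$ is a ball
with center at the origin and radius $r$, where $r > 0$ is sufficiently small such that
$B_r \subset D$, then
\[ \int_{\partial D} \mathbf{E}\cdot\mathbf{n}\,dx = \int_{\partial B_r} \mathbf{E}\cdot\mathbf{n}\,dx = \int_{\partial B_r} \frac{q\,\mathbf{x}}{4\pi\varepsilon_0 r^3}\cdot\frac{\mathbf{x}}{r}\,dx = \int_{\partial B_r} \frac{q}{4\pi\varepsilon_0 r^2}\,dx = \frac{q}{\varepsilon_0}. \tag{22.1} \]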
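Solution 22.1 Let $D_r$ be the set difference between $D$ and $B_r$, defined as $D_r = D \setminus B_r$.
Then,
\[ \int_{\partial D} \mathbf{E}\cdot\mathbf{n}\,dx = \int_{\partial D_r} \mathbf{E}\cdot\mathbf{n}\,dx + \int_{\partial B_r} \mathbf{E}\cdot\mathbf{n}\,dx \]
holds. This is because the boundary $\partial D_r$ consists of $\partial D$ and $\partial B_r$, and on $\partial B_r$ the outward
normal vector $\mathbf{n}$ from the perspective of $D_r$ is opposite to that from the perspective
of $B_r$, so the two contributions over $\partial B_r$ cancel each other out. Now,
applying the divergence theorem yields
\[ \int_{\partial D_r} \mathbf{E}\cdot\mathbf{n}\,dx = \int_{D_r} \nabla\cdot\mathbf{E}\,dx = 0. \]
Thus, $\int_{\partial D} \mathbf{E}\cdot\mathbf{n}\,dx = \int_{\partial B_r} \mathbf{E}\cdot\mathbf{n}\,dx$ is satisfied. □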
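Relation (22.1) says that the flux of E through the boundary of any region containing the charge equals q/ε0, whatever the shape or size of the region. The sketch below checks this for two cubes centered at the origin. The values q = 1 and ε0 = 1, the cube sizes, and the grid resolution are assumptions made only for illustration.

```python
import math

def E(x, y, z, q=1.0, eps0=1.0):
    # Electric field of a point charge at the origin; q and eps0 are set
    # to 1 purely for illustration.
    r3 = (x * x + y * y + z * z) ** 1.5
    c = q / (4 * math.pi * eps0)
    return (c * x / r3, c * y / r3, c * z / r3)

def flux_through_cube(half=1.0, n=200):
    # Flux of E through the surface of the cube [-half, half]^3 with
    # outward normals, using the midpoint rule on each of the six faces.
    h = 2 * half / n
    total = 0.0
    for i in range(n):
        a = -half + (i + 0.5) * h
        for j in range(n):
            b = -half + (j + 0.5) * h
            total += (E(half, a, b)[0] - E(-half, a, b)[0]) * h * h  # x faces
            total += (E(a, half, b)[1] - E(a, -half, b)[1]) * h * h  # y faces
            total += (E(a, b, half)[2] - E(a, b, -half)[2]) * h * h  # z faces
    return total

print(flux_through_cube(1.0), flux_through_cube(3.0))  # both close to q/eps0 = 1
```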
Exercises
1.
Lecture 23
Stokes’ Theorem #1
Although this definition is easy to forget over time, it’s helpful to remember it as the
cross product of two vectors, denoted as ∇ × f, and to recall it as a determinant of a
3 × 3 matrix. In other words,
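\[ \operatorname{curl}(\mathbf{f}) = \nabla\times\mathbf{f} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ f_1 & f_2 & f_3 \end{vmatrix} = (\partial_y f_3 - \partial_z f_2)\mathbf{i} + (\partial_z f_1 - \partial_x f_3)\mathbf{j} + (\partial_x f_2 - \partial_y f_1)\mathbf{k}. \]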
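The component formula for the curl translates directly into a numerical approximation. The sketch below replaces each partial derivative with a central difference and evaluates the curl of two hypothetical fields, a rigid rotation and a gradient field. The fields, the evaluation point, and the step size are assumptions for illustration and are not the fields of the problems in this lecture.

```python
def curl(f, x, y, z, h=1e-5):
    # Central-difference approximation of
    # (d_y f3 - d_z f2, d_z f1 - d_x f3, d_x f2 - d_y f1).
    def d(i, axis):
        # Partial derivative of component i of f along the given axis.
        p = [x, y, z]
        q = [x, y, z]
        p[axis] += h
        q[axis] -= h
        return (f(*p)[i] - f(*q)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def rotation(x, y, z):
    # Hypothetical field: rigid rotation about the z-axis.
    return (-y, x, 0.0)

def gradient_field(x, y, z):
    # Hypothetical conservative field: the gradient of x*y*z.
    return (y * z, x * z, x * y)

print(curl(rotation, 0.3, -1.2, 2.0))        # close to (0, 0, 2)
print(curl(gradient_field, 0.3, -1.2, 2.0))  # close to (0, 0, 0)
```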
Problem 23.1. Find the curl vector field of the following vector fields.
(1) $\mathbf{f} = (x; y; z)$ (2) $\mathbf{f} = (x; z; y)$ (3) $\mathbf{f} = (y; z; x)$ (4) $\mathbf{f} = (x^2 - y;\, xe^z;\, yz)$
Solution 23.1 □
Problem 23.2 (∇ × (∇p) = 0). Show that the curl of a conservative vector field is
0. What condition is necessary for this?
Solution 23.2 If the vector field f is conservative, then there exists a potential func-
tion p such that f = ∇p. Therefore, the curl of f is as follows:
Problem 23.3 (∇ · (∇ × f) = 0). Compute the divergence of the curl vector field
∇ × f of a vector field f. Show that the divergence of this curl vector field is 0. What
condition is necessary for this?
The derivative matrix of the curl is generally not a zero matrix. The diagonal el-
ements of this derivative matrix are the partial derivatives of the i-th component
function of the vector field with respect to the i-th variable, and their sum gives the
divergence. Therefore,
Before explaining Stokes’ Theorem, let’s summarize what we have learned so far.
The surface integral of a vector field involves integrating the component perpendic-
ular to the surface, but for this to be possible, the given surface must be an oriented
surface so that we can choose the normal vector n throughout the entire surface.
However, even in this case, there are not just one but two perpendicular vectors, and
we choose one of them to represent n, while the other becomes −n. Choosing a
different perpendicular vector would change the sign of the surface integral.
Stokes’ theorem describes the relationship between the surface integral on a given
surface in three-dimensional space and the line integral on the curve that forms the
boundary of that surface. But what is the boundary of a surface? The following
figure shows some examples:
The first example is a hemisphere cut in half. In this case, the cut surface marks
the ending point of the surface, which we call the boundary. The second and third
examples involve surfaces with holes. In this case, the perimeter of the hole also
becomes the boundary.
The line integral of a vector field involves integrating the component tangent to
the curve along the curve, and the direction of integration along the curve also has
two options, with the sign changing depending on the chosen direction. Therefore, it
is important to choose the direction. In Stokes’ theorem, after choosing the normal
vector n, the line integral must be performed counterclockwise with respect to this
vector. This means that, seen from the side toward which the normal vector points,
the curve is traversed opposite to the direction in which the hands of a clock rotate
(refer to the left two figures below). However, if the surface is not simply connected,
such as a surface with a hole, the counterclockwise direction along the boundary
around the hole may look clockwise. It is essential to understand what the
counterclockwise direction means here. It is judged from within the surface: a person
walking along the boundary on the side of the surface toward which n points, facing
in the direction of integration, always has the surface on the left (refer to the
third figure).
where the line integral is in the counterclockwise direction with respect to the nor-
mal vector n.
The left-hand side of (23.1) represents the line integral of the vector field f, while
the right-hand side represents the surface integral of the curl of the vector field f.
The rigorous proof of Stokes’ theorem is beyond the scope and purpose of Cal-
culus II. However, to understand how the line integral is connected to the surface
integral and the meaning of Stokes’ theorem clearly, we will outline a fairly detailed
proof step by step. The actual proof involves approximating the left-hand and right-
hand sides of (23.1) and showing that the approximations converge, but we will only
proceed with the approximation steps.
The first step considers the case where the surface S is flat, meaning it lies on a
single plane. In such cases, it is reasonable to assume that the surface lies on the
xy-plane from the beginning. This is because we can either redefine the coordinate
system or rotate and translate the plane to align it with the xy-plane. Of course, in
this case, the representation of the vector field f must also be expressed in the new
coordinate system.
Now that the surface S lies on the xy-plane, we can treat it as a two-dimensional
problem. First, assuming that the surface S is bounded, we enclose it in a rectangle
and partition the rectangle into small square cells with side length ε > 0. We then
collect those cells that are contained in the surface S to form Sε, as shown in the
figure below. We denote each cell of Sε as Si, where i = 1, · · · , N, and zoom in on
one of them, as shown in the figure. Let (xi, yi) be the midpoint of the cell Si, and
denote its four sides as ℓ1, ℓ2, ℓ3, ℓ4. Then,
\[
\oint_{\partial S_i} \mathbf{f} \cdot d\mathbf{r}
= \int_{\ell_1} \mathbf{f} \cdot d\mathbf{r} + \int_{\ell_2} \mathbf{f} \cdot d\mathbf{r}
+ \int_{\ell_3} \mathbf{f} \cdot d\mathbf{r} + \int_{\ell_4} \mathbf{f} \cdot d\mathbf{r}.
\]
23.2 Stokes’ Theorem 177
The second approximation comes from the definition of Riemann integrals. Here,
ε² represents the area of the small partition cell, and (xi, yi) is the midpoint of
each cell.
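The per-cell approximation behind this step, namely that the circulation around a small cell is close to (∂x f2 − ∂y f1)(xi, yi) ε², can be checked numerically. The following is a minimal sketch under assumptions: the sample field, the cell midpoint (0.3, 0.7), and the helper names are chosen here for illustration and are not from the text.

```python
import math

def f(x, y):
    # Sample smooth field, chosen only for illustration.
    return (x * y**2, math.sin(x) + y)

def curl_z(x, y):
    # d(f2)/dx - d(f1)/dy for the sample field above, computed by hand.
    return math.cos(x) - 2 * x * y

def cell_circulation(xc, yc, eps, n=200):
    """Counterclockwise line integral of f around the square cell centered at (xc, yc)."""
    h = eps / n
    total = 0.0
    for k in range(n):
        s = -eps / 2 + (k + 0.5) * h            # midpoint offset along a side
        total += f(xc + s, yc - eps / 2)[0] * h  # bottom side, traversed in +x direction
        total += f(xc + eps / 2, yc + s)[1] * h  # right side, traversed in +y direction
        total -= f(xc + s, yc + eps / 2)[0] * h  # top side, traversed in -x direction
        total -= f(xc - eps / 2, yc + s)[1] * h  # left side, traversed in -y direction
    return total

xc, yc = 0.3, 0.7
for eps in (0.1, 0.05, 0.025):
    # circulation / area should approach the curl at the midpoint as eps shrinks
    print(eps, cell_circulation(xc, yc, eps) / eps**2, curl_z(xc, yc))

ratio = cell_circulation(xc, yc, 0.025) / 0.025**2
```

Dividing the circulation by the cell area ε² is what makes the comparison with the curl at the midpoint meaningful.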
Meanwhile, the following relation holds:
\[
\oint_{\partial S_\varepsilon} \mathbf{f} \cdot d\mathbf{r}
= \sum_{i=1}^{N} \oint_{\partial S_i} \mathbf{f} \cdot d\mathbf{r}.
\]
The sum of the line integrals on the right-hand side includes many more line inte-
grals over the boundaries of the internal cells of Sε . However, the boundaries within
the surface are integrated twice, once in each direction, and thus cancel each other
out. Therefore, only the boundaries of Sε remain, satisfying the equation above.
Moreover, the remaining line integrals are all counterclockwise. An example illus-
trating this is provided in the figure below. Each boundary of the small cell is inte-
grated counterclockwise, and after excluding the canceled parts, only the boundaries
remain, as shown in the figure.
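The cancellation of interior edges can also be observed numerically. In the sketch below, the unit square is split into an 8 × 8 grid of cells, and the sum of the counterclockwise cell circulations is compared with the circulation around the outer square alone. The field, the grid size, and the function names are assumptions made for this illustration.

```python
import math

def f(x, y):
    # Sample smooth field, chosen only for illustration.
    return (x * y**2, math.sin(x) + y)

def square_circulation(xc, yc, side, n):
    """Counterclockwise line integral of f around the square centered at (xc, yc)."""
    h = side / n
    total = 0.0
    for k in range(n):
        s = -side / 2 + (k + 0.5) * h
        total += f(xc + s, yc - side / 2)[0] * h  # bottom, +x direction
        total += f(xc + side / 2, yc + s)[1] * h  # right, +y direction
        total -= f(xc + s, yc + side / 2)[0] * h  # top, -x direction
        total -= f(xc - side / 2, yc + s)[1] * h  # left, -y direction
    return total

m, n = 8, 50           # m x m cells, n quadrature points per cell side
cell = 1.0 / m
cell_sum = sum(square_circulation((i + 0.5) * cell, (j + 0.5) * cell, cell, n)
               for i in range(m) for j in range(m))
# The outer boundary uses the same quadrature points as the outer cell sides.
outer = square_circulation(0.5, 0.5, 1.0, m * n)
print(cell_sum, outer)  # the two values should agree up to rounding
```

Each interior edge appears in two neighboring cells with opposite directions, so its contributions cancel in the sum and only the outer sides survive.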
Combining these, we obtain:
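\[
\oint_{\partial S_\varepsilon} \mathbf{f} \cdot d\mathbf{r}
\cong \int_{S_\varepsilon} \left( \partial_x f_2 - \partial_y f_1 \right) dx\,dy.
\]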
The next step is to take the limit as ε → 0. When taking the limit, all the approxima-
tions become equalities, Sε converges to S, ∂ Sε converges to ∂ S, and the integrals
converge as well. Thus, the rigorous proof is completed, resulting in the following
equation:
\[
\oint_{\partial S} \mathbf{f} \cdot d\mathbf{r}
= \int_{S} \left( \partial_x f_2 - \partial_y f_1 \right) dx\,dy. \tag{23.2}
\]
Returning to the original position, n is no longer k, and the surface S, although flat,
no longer lies on the xy-plane.
Let’s consider the case of a general oriented surface S. We approximate the given
surface by connecting small flat patches, as shown in the figure below. Although
the flat patches do not exactly match the surface, as we make the flat patches
smaller and smaller, they converge to the original surface. Making each patch
smaller than ε, we define the surface composed of these patches as Sε. Since
each patch is flat, we can apply Stokes’ Theorem to each patch. Now, summing them
up, as in the previous case, the line integrals over segments inside are canceled out,
leaving only the line integrals over the boundary. For this reason, we obtain:
\[
\oint_{\partial S_\varepsilon} \mathbf{f} \cdot d\mathbf{r}
\cong \int_{S_\varepsilon} (\nabla \times \mathbf{f}) \cdot \mathbf{n}.
\]
Among surfaces, there are surfaces that do not have a boundary. Below are some
examples. The first one is a sphere: It has no boundary. In other words, the sphere
is the boundary of a ball, but the boundary of the sphere is the empty set. The
second example is a torus, which is a donut shape, and it has no boundary. The
third example is a cylinder with the top and bottom caps included, and it also has no
boundary (if the caps are removed, the cut-out circle becomes the boundary). Such
surfaces without a boundary are called closed surfaces.
Problem 24.1. Prove that the surface integral of the curl vector field over a sphere
is always zero. Explain which property of the sphere makes the integral zero.
Solution 24.1 The sphere does not have a boundary; that is, the boundary is the
empty set. Therefore,
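\[
\int_{S} (\nabla \times \mathbf{f}) \cdot \mathbf{n}
= \int_{\partial S} \mathbf{f} \cdot d\mathbf{r}
= \int_{\emptyset} \mathbf{f} \cdot d\mathbf{r} = 0.
\]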
This proof may seem awkward because Stokes’ Theorem is a theorem for surfaces
with boundaries. In that case, as shown in the figure below, if we create a hole with
182 24 Stokes’ Theorem #2
a radius of ε > 0 on the surface of the sphere, and denote the sphere with a hole as
Sε and the boundary of the small hole as Cε = ∂ Sε , we obtain the following:
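\[
\int_{S} (\nabla \times \mathbf{f}) \cdot \mathbf{n}
= \lim_{\varepsilon \to 0} \int_{S_\varepsilon} (\nabla \times \mathbf{f}) \cdot \mathbf{n}
= \lim_{\varepsilon \to 0} \oint_{\partial S_\varepsilon} \mathbf{f} \cdot d\mathbf{r} = 0.
\]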
Here, the second equality is by Stokes’ Theorem, and the first and third equalities
hold because f and ∇ × f are bounded and continuous, while the area of the removed
cap and the length of Cε both tend to zero as ε → 0. □
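The limiting argument in Solution 24.1 can be illustrated numerically. The sketch below computes the flux of a curl through the unit sphere with a polar cap of angular radius delta removed, and shows that the flux shrinks as the hole closes. The sample field f = (yz, x², xyz), its hand-computed curl, the choice of a cap at the north pole, and the function names are assumptions made for this illustration.

```python
import math

def curl_f(x, y, z):
    # Curl of the sample field f = (y z, x^2, x y z), computed by hand.
    return (x * z, y - y * z, 2 * x - z)

def curl_flux_through_sphere(delta, n=300):
    """Flux of curl f through the unit sphere with the polar cap phi < delta removed."""
    h_phi = (math.pi - delta) / n
    h_theta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = delta + (i + 0.5) * h_phi
        for j in range(n):
            theta = (j + 0.5) * h_theta
            x = math.sin(phi) * math.cos(theta)
            y = math.sin(phi) * math.sin(theta)
            z = math.cos(phi)
            cx, cy, cz = curl_f(x, y, z)
            # On the unit sphere the outward unit normal is (x, y, z)
            # and the surface element is sin(phi) dphi dtheta.
            total += (cx * x + cy * y + cz * z) * math.sin(phi) * h_phi * h_theta
    return total

fluxes = {delta: curl_flux_through_sphere(delta) for delta in (0.5, 0.1, 0.01, 0.0)}
for delta, value in fluxes.items():
    print(delta, value)  # the flux should shrink toward zero as delta decreases
```

With delta = 0 the surface is the full sphere, and the computed flux is zero up to quadrature error.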
Closed surfaces divide space into an inside region and an outside region, and a
closed surface is the boundary of the inside region. Of course, we can also say that
it is the boundary of the outside region. Moreover, when a hole is made in a closed
surface so that it becomes a surface with a boundary (refer to the figure), the inside
and outside become connected, and the distinction between inside and outside
disappears. Thus, a surface forming the boundary of a bounded region does not have
a boundary itself. This relationship can be expressed as:
\[
\partial(\partial D) = \emptyset, \qquad D \subset \mathbb{R}^n.
\]
Using this relationship, we can connect the Divergence Theorem and Stokes’ Theorem
as follows. Let f be a twice continuously differentiable vector field. Let D ⊂ R³ be
an open set with a smooth boundary, and let n be the outward unit normal vector on
the boundary. Applying the Divergence Theorem to ∇ × f and then applying Stokes’
Theorem, we obtain:
\[
\int_{D} \nabla \cdot (\nabla \times \mathbf{f})
= \int_{\partial D} (\nabla \times \mathbf{f}) \cdot \mathbf{n}
= \int_{\partial(\partial D)} \mathbf{f} \cdot d\mathbf{r}
= \int_{\emptyset} \mathbf{f} \cdot d\mathbf{r} = 0.
\]
For this integral to be zero for all such sets D ⊂ R³, the continuous integrand itself
must be zero, so ∇ · (∇ × f) = 0. Of course, this fact can also be demonstrated
directly by calculating derivatives, as shown in Problem 23.3.
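The direct computation can be sketched as follows, assuming f = (f1, f2, f3) is twice continuously differentiable so that mixed partial derivatives commute. This is the standard calculation and not necessarily the exact presentation used in Problem 23.3.

```latex
\begin{aligned}
\nabla \cdot (\nabla \times \mathbf{f})
&= \partial_x(\partial_y f_3 - \partial_z f_2)
 + \partial_y(\partial_z f_1 - \partial_x f_3)
 + \partial_z(\partial_x f_2 - \partial_y f_1) \\
&= (\partial_x \partial_y f_3 - \partial_y \partial_x f_3)
 + (\partial_y \partial_z f_1 - \partial_z \partial_y f_1)
 + (\partial_z \partial_x f_2 - \partial_x \partial_z f_2) \\
&= 0.
\end{aligned}
```

Each pair in the second line cancels because the order of differentiation can be exchanged.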
24.2 Simply connected domain 183
If a domain D is simply connected, then Stokes’ Theorem can be used more freely.
Let’s see the following theorem.
Proof. We only need to consider the case where C is a closed simple curve. Since
the domain D is simply connected, any closed curve C can be contracted to a single
point within the domain D without leaving it. Thinking of the trace left by this curve
as a surface implies that there exists some surface S within the domain D that has
C as its boundary. Moreover, closed simple curves always have orientable surfaces
associated with them. Now, applying Stokes’ Theorem yields:
\[
\oint_{C} \mathbf{f} \cdot d\mathbf{r} = \int_{S} (\nabla \times \mathbf{f}) \cdot \mathbf{n}.
\]
it is a sphere, or where part of its interior has been removed. In the case of a sphere,
it is easy to construct a surface S, and even if part of the interior is removed, a
surface S can still be constructed by avoiding it. However, in the case where a hole
is made from top to bottom of the sphere, depending on the curve C, it may or may
not be possible to construct a surface S within D that has C as its boundary. If we
consider the boundary curve of a Möbius strip, for example, it is a closed simple
curve in three-dimensional space. Therefore, it is possible to construct an oriented surface
with this curve as its boundary. Thus, remember that surfaces with different
properties can have the same closed curve as their boundary.
Theorem 24.3 (Green’s Theorem for flux). Let S ⊂ R² be a bounded smooth domain.
Let C = ∂S be the boundary of S, and n be the outward unit normal vector on
the boundary. If f : R² → R² is a continuously differentiable vector field in an open
set D including S, then
\[
\oint_{C} \mathbf{f} \cdot \mathbf{n}\, ds
= \oint_{C} f_1\, dy - f_2\, dx
= \int_{S} (\partial_x f_1 + \partial_y f_2)\, dx\,dy,
\]
24.4 Examples
Problem 24.2. Consider a hemisphere as shown in the figure below. Let the radius
of the hemisphere be a > 0 and denote it as S1. Let C be its boundary. Also, consider
a disc with radius a centered at the origin on the xy plane, denoted as S2. The vector
field is given by f = (−y, x, 0).
(1) Given the normal vector n as shown in the figure, calculate the surface integral
$\int_{S_1} (\nabla \times \mathbf{f}) \cdot \mathbf{n}$.
(2) Given the normal vector n = k, calculate the surface integral
$\int_{S_2} (\nabla \times \mathbf{f}) \cdot \mathbf{n}$.
(3) Determine and calculate the line integral $\oint_{C} \mathbf{f} \cdot d\mathbf{r}$
when the direction of circulation is as shown in the figure.
Solution 24.2 First, note that (1) and (3) are equal by Stokes’ Theorem. Simi-
larly, (2) and (3) are equal by Stokes’ Theorem. Recognize that when two different
surfaces have the same boundary and each normal vector gives the same counter-
clockwise direction to the boundary, the curl’s surface integral yields the same value.
Therefore, it is important to know which calculation is easier. More importantly, you
should be able to calculate each based on what we have studied so far. Now, let’s
solve.
(1) At a point (x, y, z) ∈ S1 on the hemisphere, the normal vector is n = (1/a)(x, y, z).
Also, it is easy to compute ∇ × f = (0, 0, 2). Therefore,
\[
\int_{S_1} (\nabla \times \mathbf{f}) \cdot \mathbf{n}
= \int_{S_1} (0, 0, 2) \cdot \frac{1}{a}(x, y, z)
= \int_{S_1} \frac{2z}{a}.
\]
(This integral can be evaluated by hand. Even when a hand calculation is not feasible,
reducing the problem to an explicit formula that a computer can integrate is already
a nearly complete answer.)
(1b) Since the surface is a hemisphere, using spherical coordinates may be more
natural. Spherical coordinates satisfy the following relations:
\[
x = \rho \sin\phi \cos\theta, \qquad y = \rho \sin\phi \sin\theta, \qquad z = \rho \cos\phi.
\]
Since the surface is half of a sphere with radius a, the variable transformation is as
follows.
\[
g(\phi, \theta) = (a \sin\phi \cos\theta,\ a \sin\phi \sin\theta,\ a \cos\phi), \qquad
G = \left\{ 0 < \phi < \frac{\pi}{2},\ 0 < \theta < 2\pi \right\}.
\]
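Using the variable transformation, with z = a cos φ and surface element a² sin φ dφ dθ,
\[
\int_{S_1} \frac{2z}{a}
= \int_{0}^{2\pi} \int_{0}^{\pi/2} 2 \cos\phi \; a^2 \sin\phi \, d\phi \, d\theta
= 4\pi a^2 \int_{0}^{\pi/2} \cos\phi \sin\phi \, d\phi
= 2\pi a^2.
\]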
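The value of the hemisphere integral in (1) can be cross-checked numerically with the parameterization g(φ, θ). The following is a minimal sketch using a midpoint rule; the sample radius a = 1.5 and the function name are assumptions made here. The result should agree with 2πa², the value obtained for the disc in (2).

```python
import math

def hemisphere_integral(a, n=200):
    """Midpoint-rule value of the integral of 2z/a over the upper hemisphere of radius a."""
    h_phi = (math.pi / 2) / n
    h_theta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h_phi
        for _ in range(n):  # the integrand does not depend on theta, but the sum runs over both variables
            z = a * math.cos(phi)
            # Surface element for g(phi, theta): a^2 sin(phi) dphi dtheta.
            total += (2 * z / a) * a**2 * math.sin(phi) * h_phi * h_theta
    return total

a = 1.5  # sample radius, chosen only for illustration
value = hemisphere_integral(a)
print(value, 2 * math.pi * a**2)  # the two numbers should be close
```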
(2) At a point (x, y, 0) ∈ S2 on the disc, the normal vector is n = (0, 0, 1). Therefore,
\[
\int_{S_2} (\nabla \times \mathbf{f}) \cdot \mathbf{n}
= \int_{S_2} (0, 0, 2) \cdot (0, 0, 1)
= \int_{S_2} 2.
\]
The area of the disc is πa², so the integral value is also 2πa².
(3) For the circle of radius a centered at the origin, there are two natural
parameterizations, (a cos t, a sin t, 0) and (a sin t, a cos t, 0). It is necessary to
recognize which direction corresponds to the one shown in the figure. It is the former.
Since it completes one revolution, it has the interval 0 < t < 2π. Therefore, the line
integral is
\[
\oint_{C} \mathbf{f} \cdot d\mathbf{r}
= \int_{0}^{2\pi} (-a \sin t,\ a \cos t,\ 0) \cdot (-a \sin t,\ a \cos t,\ 0)\, dt
= \int_{0}^{2\pi} \left( a^2 \sin^2 t + a^2 \cos^2 t \right) dt
= 2\pi a^2.
\]
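The effect of choosing one parameterization or the other can be seen numerically. This sketch evaluates the line integral of f = (−y, x, 0) for both candidate parameterizations; the sample radius a = 1.5 and the helper name are assumptions for illustration. The first parameterization gives 2πa², while the second traverses the circle in the opposite direction and gives the negative value.

```python
import math

def circulation(r, dr, n=1000):
    """Midpoint-rule line integral of f = (-y, x, 0) along r(t), 0 <= t <= 2*pi."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y, z = r(t)
        fx, fy, fz = (-y, x, 0.0)
        dx, dy, dz = dr(t)
        total += (fx * dx + fy * dy + fz * dz) * h
    return total

a = 1.5  # sample radius, chosen only for illustration

# (a cos t, a sin t, 0): counterclockwise when viewed from above.
ccw = circulation(lambda t: (a * math.cos(t), a * math.sin(t), 0.0),
                  lambda t: (-a * math.sin(t), a * math.cos(t), 0.0))
# (a sin t, a cos t, 0): clockwise when viewed from above.
cw = circulation(lambda t: (a * math.sin(t), a * math.cos(t), 0.0),
                 lambda t: (a * math.cos(t), -a * math.sin(t), 0.0))

print(ccw, cw, 2 * math.pi * a**2)
```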
Problem 24.3. A cone is given as shown in the figure above. The cone is described by
z = r = √(x² + y²), 0 < z < 2, and is denoted as S1. Let C be its boundary.
Also, consider a disc with radius 2 centered on the z-axis in the plane z = 2, denoted
as S2. The vector field is given by f = (x² − y, 4z, x²).
(1) Given the normal vector n as shown in the figure, calculate the surface integral
$\int_{S_1} (\nabla \times \mathbf{f}) \cdot \mathbf{n}$.
(2) Given the normal vector n = k, calculate the surface integral
$\int_{S_2} (\nabla \times \mathbf{f}) \cdot \mathbf{n}$.
(3) Determine and calculate the line integral $\oint_{C} \mathbf{f} \cdot d\mathbf{r}$
such that it matches the value of the surface integral in (1).
Solution 24.3 This problem can be solved almost identically to the previous one.
Note that if you follow the orientation given in the figure, you should choose r(t) =
(2 sin t, 2 cos t, 2). □
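As a numerical companion to Solution 24.3, the sketch below evaluates the line integral of f = (x² − y, 4z, x²) along r(t) = (2 sin t, 2 cos t, 2). The figure is not reproduced here, so only the parameterization stated in the solution is used. The k-component of the curl, ∂x f2 − ∂y f1 = 1, is computed by hand, and the disc S2 is assumed to have radius 2 so that its boundary is the circle C; both are assumptions of this sketch.

```python
import math

def f(x, y, z):
    # Vector field from Problem 24.3.
    return (x**2 - y, 4 * z, x**2)

def circulation(r, dr, n=2000):
    """Midpoint-rule line integral of f along r(t), 0 <= t <= 2*pi."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        fx, fy, fz = f(*r(t))
        dx, dy, dz = dr(t)
        total += (fx * dx + fy * dy + fz * dz) * h
    return total

# Parameterization stated in the solution: clockwise when viewed from above.
value = circulation(lambda t: (2 * math.sin(t), 2 * math.cos(t), 2.0),
                    lambda t: (2 * math.cos(t), -2 * math.sin(t), 0.0))

# (curl f) . k = 1 by hand, so the k-oriented disc integral equals the disc area.
disc_flux_k = 1.0 * math.pi * 2**2

print(value, disc_flux_k)  # the two values should be negatives of each other
```

The line integral comes out as the negative of the k-oriented disc integral, which is consistent with r(t) running clockwise when viewed from above.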