
Yong-Jung Kim

25 Lectures for
Undergraduate Calculus II
February 19, 2024

Department of Mathematical Sciences, KAIST
To the ones who question
Foreword


There is a fundamental difference between an academic textbook for a college subject and a lecture book used to teach that subject. Although an academic textbook may not contain all relevant content, it should cover the core of the subject with accurate explanations. It is also important that readers can access individual topics independently of the course progression. If you have to read the book from the beginning when you need only one part of it, the book cannot be used as a reference.

On the other hand, a lecture book is made for running a semester course. It is designed not for an unspecified reader, but for students who study the entire course together. Therefore, the main difference from an academic textbook is the way it leads students through the whole course. In particular, effective communication is important, as if the lecturer and the students were talking to each other in class. Appropriate motivating questions that help students think for themselves can increase the effectiveness of learning. Sometimes it is more effective to let students find answers by themselves with appropriate hints and motivation rather than with detailed explanations; in this way, lectures become not only a transfer of knowledge but also a transfer of wisdom and a stimulus for creativity. A lecture book may be written to accompany a specific academic textbook, or it may include all necessary content and serve as a self-contained text.

However, most books actually used in lectures have aspects of both an academic textbook and a lecture book, and some of them are inadequate for students who follow the lectures. This “28 Lectures series” is designed to keep the character of a lecture book, in a form in which students and instructors communicate with each other effectively. All lectures start with questions and problems that motivate students toward the main concepts of the lecture. In addition, considering that a semester course at many universities consists of 28 lectures of 75 minutes each, the book is composed of 28 lectures. With two lectures per week, the course can be completed in 14 weeks. This lecture book is also designed to provide all the necessary content, so that it can be used for lectures independently, without supplementary textbooks.

Questions are the driving force of learning and the starting point of creative thinking. I hope that this lecture book will give students an opportunity to think about and answer its questions and problems, and to find new questions of their own.

Daejeon, Korea, March 2021 Yong-Jung Kim


Preface


This “25 Lectures series” is designed to keep the character of a lecture book, in a form in which students and instructors communicate with each other effectively. All lectures start with questions and problems that motivate students toward the main concepts of the lecture. In addition, considering that a semester course at many universities consists of 25 lectures of 75 minutes each, the book is composed of 25 lectures. With two lectures per week, the course can be completed in 14 weeks. This lecture book is also designed to provide all the necessary content, so that it can be used for lectures independently, without supplementary textbooks.

Questions are the driving force of learning and the starting point of creative thinking. I hope that this lecture book will give students an opportunity to think about and answer its questions and problems, and to find new questions of their own.

Daejeon, Korea, March 2021 Yong-Jung Kim

Contents

Part I Kepler and Newton’s Laws of Motion

1 Rectangular coordinate system and curves in $\mathbb{R}^3$
1.1 Coordinate system
1.2 Projection
1.3 Moving particle and trajectory curves in space
1.4 Cross product & inner product

2 Polar coordinates in $\mathbb{R}^2$
2.1 Variable change with polar coordinates
2.2 Motion in polar coordinates
2.3 Ellipses in polar coordinates
2.4 Curves in polar coordinates

3 Newton’s law on Earth
3.1 Newton’s law of motion and gravitation
3.2 Work and energy
3.3 Gravity force and potential energy
3.4 Projectile motion

4 Multi-variable Vector-valued Functions
4.1 Domain, Range, and Codomain
4.2 Graph, Image, Level Set, and Contours
4.3 Limit and Continuity in $\mathbb{R}^n$
4.4 Composition of two functions

Part II Linear Functions and Differentiation

5 Linear maps and matrix multiplication
5.1 Linear functions
5.2 Matrix multiplication

6 Properties of linear mappings
6.1 Matrix multiplication and composition of linear functions
6.2 Linear function and volume of parallelotopes
6.2.1 Volume of parallelepiped when m = n
6.2.2 Volume of parallelepiped when n < m

7 Directional and Partial Differentials
7.1 Directional derivative
7.2 Partial derivative
7.3 Gradient
7.4 Higher order partial differentials
7.5 Partial Derivatives with Constrained Variables

8 Full Differentials
8.1 Full Differentials
8.2 The Chain Rule; Differential of Compositions
8.3 Graph of Differentials

9 Line Integral
9.1 Line Integral
9.2 Expansion rate of a curve
9.3 Functions on parametrized curves
9.4 Directional derivative and Chain rule

10 Finding Extreme Values
10.1 Extreme values in the entire space
10.2 Criterion for maximum, minimum, and saddle
10.3 Parameterized curves and surfaces
10.4 Extreme values on level-set; Lagrange multiplier

11 Taylor’s Formula for Multi-Variable Functions
11.1 Taylor’s formula for 1-variable functions (Review)
11.2 Higher order directional derivatives
11.3 Taylor’s formula for n variable functions

Part III Integration of Multi-variable Functions

12 Double and Iterated Integrals on Rectangular Coordinates
12.1 Riemann integral in $\mathbb{R}^1$ (Review)
12.2 Double integral (or Riemann integral in 2-D)
12.3 Iterated Integrals

13 Double Integration over a General Domain
13.1 2 Types of Domains
13.2 Examples

14 Integration with Variable Changes
14.1 Volume Expansion Rate
14.2 Linear function and volume of parallelotopes
14.2.1 Volume of parallelepiped when m = n
14.2.2 Volume of parallelotope when n < m

15 Surface Integral
15.1 Surface integral
15.2 Polar Coordinates
15.3 Variable Transformation in Multiple Integrals

16 Triple Integrals in Rectangular Coordinates
16.1 Riemann Integral on Hexahedron Domain
16.2 Riemann Integral on Non-Hexahedron Domain
16.3 Moments and Center of Mass

17 Coordinate Systems
17.1 Coordinate Systems
17.2 Examples of Variable Changes

Part IV Integration of Vector Fields

18 Line Integral for Tangential Component
18.1 Line integral for a scalar function
18.2 Line integral for a force field
18.3 Path independence, potential, and conservative fields

19 Line Integral and a Fluid Flow
19.1 Potential field is conservative
19.2 Line integral and closed curves
19.3 Flow and circulation

20 Surface Integral for Normal Component
20.1 Surface integral for a scalar function
20.2 Surface, normal vector, and tangent plane
20.3 Surface integral for a vector field

21 Divergence Theorem #1
21.1 Divergence and boundary of a domain
21.2 Divergence theorem

22 Divergence Theorem #2
22.1 Flux and Conservation Law
22.2 Heat Equation
22.3 Gauss’ Law

23 Stokes’ Theorem #1
23.1 Curl of a Vector Field
23.2 Stokes’ Theorem

24 Stokes’ Theorem #2
24.1 Boundary of Boundary
24.2 Simply connected domain
24.3 Green’s Theorem
24.4 Examples

Acronyms

$x \in \mathbb{R}$  Real numbers are denoted by regular characters.
$(a, b) \subset \mathbb{R}$  Open interval for $a, b \in \mathbb{R}$.
$[a, b] \subset \mathbb{R}$  Closed interval for $a, b \in \mathbb{R}$.
$\mathbf{x} \in \mathbb{R}^n$  Column vectors are denoted by bold characters. One unique feature of these lecture notes is that we distinguish row vectors and column vectors.
$\mathbf{x}^t$  Row vectors are denoted with the transpose notation.
$N(c, r)$  The neighborhood centered at $c$ with radius $r$.
$\mathbb{R}^{m \times n}$  The $m \times n$ matrices, with $m$ rows and $n$ columns.
$a_{ij}$  The entry of a matrix placed at the $i$-th row and the $j$-th column.
$\det(A)$  The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$.
$\operatorname{trace}(A)$  The trace of a square matrix $A \in \mathbb{R}^{n \times n}$.
$\nabla f$  For $f : \mathbb{R}^n \to \mathbb{R}$, the gradient vector $\nabla f$ is a row vector ($1 \times n$ matrix).
$\nabla \mathbf{f}(\mathbf{c})$  For $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, the gradient of the vector field $\mathbf{f}$ (or the differential of $\mathbf{f}$) is denoted by $\nabla \mathbf{f}(\mathbf{c})$, which is an $m \times n$ matrix.
$\nabla$  The notation $\nabla = (D_1, \cdots, D_n)$ is a row-vector-producing operator, where the $D_i$ are partial derivatives.
$U^{\circ}$  The interior of a set $U \subset \mathbb{R}^n$.

Part I
Kepler and Newton’s Laws of Motion
In the early 17th century, the astronomer Johannes Kepler analyzed the observations of the Danish astronomer Tycho Brahe and, between 1609 and 1619, explained the orbits of the planets around the sun with three laws. These laws replaced the circular orbits in the theory of Nicolaus Copernicus with elliptical orbits and explained how the speed of a planet changes. The three laws are as follows:
1. The orbit of a planet is an ellipse with the sun at one of the two foci.
2. The line segment connecting the planet and the sun sweeps equal areas in equal
time intervals.
3. The square of the period of the planet’s orbit is proportional to the cube of the
semi-major axis length of the orbit.
Isaac Newton, in 1687, demonstrated that Kepler’s laws result from his laws of
motion and universal gravitation. Newton’s laws of motion consist of three parts:
1. Law of Inertia: An object at rest stays at rest, and an object in motion stays in motion with the same speed and in the same direction unless acted upon by an external force.
2. Force Law: Force is the product of mass and acceleration (F = ma).
3. Action-Reaction Law: For every action, there is an equal and opposite reaction.
Newton’s law of universal gravitation states that the gravitational force between two objects is inversely proportional to the square of the distance between them and directly proportional to the product of their masses. If the masses of the two objects are $m_1$ and $m_2$, and their positions are $\mathbf{x}_1$ and $\mathbf{x}_2$, then the gravitational force acting on object $m_1$ is given by:

$$\mathbf{F}_{m_1} = -G\,\frac{m_1 m_2}{r^2}\,\frac{\mathbf{r}}{r}.$$

In the above equation, $G$ is the gravitational constant ($6.674 \times 10^{-11}\ \mathrm{m^3/(kg\,s^2)}$), and $\mathbf{r}$ and $r$ are defined as follows:

$$\mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \qquad r = \|\mathbf{r}\|.$$

The force acting on object $m_2$ is simply the opposite:

$$\mathbf{F}_{m_2} = G\,\frac{m_1 m_2}{r^2}\,\frac{\mathbf{r}}{r} = -\mathbf{F}_{m_1}.$$

Thus, the pair of forces satisfies the action-reaction law.
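The force formula above can be tried out numerically. The following is a minimal sketch, not part of the original notes: the function name `gravitational_force` and the rough Earth and Sun values are assumptions chosen only for illustration.

```python
import math

G = 6.674e-11  # gravitational constant in m^3 / (kg s^2)

def gravitational_force(m1, x1, m2, x2):
    """Force acting on the object of mass m1 at x1 due to the mass m2 at x2."""
    r_vec = [a - b for a, b in zip(x1, x2)]      # r = x1 - x2
    r = math.sqrt(sum(c * c for c in r_vec))     # r = ||r||
    scale = -G * m1 * m2 / r**3                  # -G m1 m2 / r^2, times 1/r
    return [scale * c for c in r_vec]

# Rough, illustrative values (assumed, in kg and m)
m_sun, m_earth = 1.989e30, 5.972e24
x_sun, x_earth = [0.0, 0.0, 0.0], [1.496e11, 0.0, 0.0]

f_on_earth = gravitational_force(m_earth, x_earth, m_sun, x_sun)
f_on_sun = gravitational_force(m_sun, x_sun, m_earth, x_earth)

print(f_on_earth)  # points from the Earth toward the Sun (negative x direction)
print(f_on_sun)    # equal magnitude, opposite direction
```

The two printed vectors should be negatives of each other up to rounding, which is the action-reaction law in components.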

In Part I, the first goal is to explain Kepler’s laws using Newton’s laws; the second goal is to learn, in the process, various useful mathematics.
Lecture 1
Rectangular coordinate system and curves in $\mathbb{R}^3$

Space: the final frontier. These are the voyages of the Starship Enterprise. Its
five-year mission: to explore strange new worlds. To seek out new life and new
civilizations. To boldly go where no man has gone before! (From Star Trek)

Now, let’s take the perspective of Newton and try to explain the motion of celestial
bodies using mathematics. To represent the motion of celestial bodies in space with
equations, we first need to establish a coordinate system in space. However, this task
is not as simple as it might seem. While the Earth has served as a reference for us
living on it, there is no such absolute reference in space. The reference frame needs
to be chosen by us.
In the movie Star Trek, the spacecraft Enterprise often moves at high speed and then comes to a stop. However, without a reference frame there is no meaningful way to distinguish a spacecraft moving at a constant velocity from a stationary one, so stating that an object is moving quickly or is at rest has no absolute meaning. If we want to reach a certain planet, it is more accurate to say that we match the velocity of the spacecraft to the velocity of that planet. Velocity is relative, and so is kinetic energy. Only acceleration has absolute meaning.

1.1 Coordinate system

Let there be a particle in space, represented as $\mathbf{r}$ in the figure. Before choosing a reference frame, we cannot say whether this particle is moving or not. What we can take as a reference is an object, or a position, that is not accelerating. We call it the origin, denoted by $\mathbf{0}$. If the particle $\mathbf{r}$ moves with the same velocity as the origin $\mathbf{0}$, we say the particle is not moving. In other words, the velocity of the origin is the zero velocity, denoted by $\mathbf{v} = \mathbf{0}$. We use the same notation $\mathbf{0}$ for the position of the origin and the velocity of the origin, distinguishing them by context. A point that does not move in space, i.e., has zero velocity, is called a position.
To express the position of the particle numerically, we need a coordinate system. A coordinate system in space is determined by three positions satisfying certain conditions with respect to the origin. First, we define the unit of distance. Next, we need a position $\mathbf{i}$, which is one unit of distance away from the origin $\mathbf{0}$; the line through $\mathbf{0}$ and $\mathbf{i}$ is the $x$-axis. Then, we choose a line through the origin perpendicular to the $x$-axis as the $y$-axis. On this line, we choose a point one unit of distance away from the origin and name it $\mathbf{j}$. Finally, we take the line through the origin perpendicular to the plane passing through the origin, $\mathbf{i}$, and $\mathbf{j}$, and a point on this line at unit distance from the origin is selected and named $\mathbf{k}$. This completes the coordinate system.

Problem 1.1. In the explanation above, positively oriented coordinate systems and
negatively oriented coordinate systems are distinguished based on the choice of k.
What are these cases?

Solution 1.1 (i) Right-hand rule: Curl the fingers of your right hand from $\mathbf{i}$ toward $\mathbf{j}$ in the plane containing the origin, $\mathbf{i}$, and $\mathbf{j}$. If $\mathbf{k}$ points in the direction of the thumb, the coordinate system is positively oriented. Otherwise, it is negatively oriented.
(ii) Cross product test: If $\mathbf{k} = \mathbf{i} \times \mathbf{j}$, the coordinate system is positively oriented. (In any case, the cross product is itself explained using the right-hand rule.) □

We choose the positively oriented coordinate system, as is the tradition.
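The cross product test of Solution 1.1 (ii) can be sketched in a few lines, using the component formula for the cross product from Section 1.4. The helper names `cross` and `is_positively_oriented` are hypothetical, and the sketch assumes that the three vectors are already mutually perpendicular unit vectors.

```python
def cross(a, b):
    """Cross product of two 3-dimensional vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def is_positively_oriented(i, j, k):
    """True when k equals i x j (i, j, k assumed orthonormal)."""
    return cross(i, j) == tuple(k)

i, j = (1, 0, 0), (0, 1, 0)
print(is_positively_oriented(i, j, (0, 0, 1)))    # True
print(is_positively_oriented(i, j, (0, 0, -1)))   # False
```

Integer components are used so that the equality test is exact; with floating-point coordinates a tolerance would be needed.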

Remark 1.1. In this lecture, we consider 3-dimensional space, but in any space of dimension two or higher there exist both positively and negatively oriented coordinate systems, and the two kinds can be distinguished. However, two positively oriented coordinate systems cannot be distinguished from each other: they coincide after a rotation. The order in which the coordinate axes are chosen determines the orientation. In 1-dimensional space, there is only one coordinate system. We may regard motion to the right, with increasing $x$, as positive, or motion to the left as positive. However, a rotation swaps left and right, so the two choices are indistinguishable. The actual choice of coordinates determines the orientation.

Problem 1.2. Two particles move with different velocities without acceleration.
Prove that there exists a plane containing the motion of these two particles in space.

Solution 1.2 As discussed, let’s take one of the particles as the origin. In this reference frame the first particle stays at the origin, and the second particle moves along a straight line. Any plane containing this line and the origin contains the motion of both particles. If the line passes through the origin, there are many such planes, and if it does not pass through the origin, there is a unique plane. □

If you feel a bit deceived after looking at the solution to the above problem, I want to emphasize that this is not a trick. Of course, in a coordinate system with a third party as the origin, there is in general no such plane. Problem 1.2 illustrates that the coordinate system should be chosen according to the purpose.
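The argument of Solution 1.2 can be checked on a concrete pair of particles. This is only a sketch under assumed data: the starting points `a1`, `a2` and velocities `v1`, `v2` are made up, and the normal vector of the plane is computed with the cross product formula from Section 1.4.

```python
def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Two particles moving without acceleration (illustrative values)
a1, v1 = (1, 0, 2), (1, 1, 0)
a2, v2 = (0, 3, -1), (2, -1, 1)

def relative_position(t):
    """Position of particle 2 in the frame whose origin is particle 1."""
    p1 = tuple(a + t * v for a, v in zip(a1, v1))
    p2 = tuple(a + t * v for a, v in zip(a2, v2))
    return sub(p2, p1)

# A plane through the origin is described by its normal vector.
normal = cross(relative_position(0), relative_position(1))
checks = [dot(normal, relative_position(t)) for t in range(-3, 4)]
print(normal, checks)  # every entry of checks is 0
```

Because the normal vector is nonzero here, the relative trajectory does not pass through the origin and the plane is unique, matching the second case of the solution.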

1.2 Projection

Let $\mathbf{r}$ denote the position of a particle in space. Given a coordinate system, we can represent the position $\mathbf{r}$ with three numbers. Let’s examine the meaning and the method in detail. First, we project $\mathbf{r}$ onto the $x$-axis, which is the line connecting the origin $\mathbf{0}$ and the point $\mathbf{i}$. The line passing through $\mathbf{r}$ and perpendicular to the $x$-axis intersects the $x$-axis at one point, and this point is the projection of $\mathbf{r}$ onto the $x$-axis. The distance from the origin to the projection point is the $x$ coordinate of $\mathbf{r}$. If the projection point is on the opposite side of the origin from $\mathbf{i}$, we assign a negative sign. Similarly, we perform this process for $\mathbf{j}$ and $\mathbf{k}$ to find the $y$ and $z$ coordinates. These are the coordinates of the point $\mathbf{r}$. Now consider the projection onto the $xy$-plane. Draw the line perpendicular to the $xy$-plane passing through $\mathbf{r}$, and find the point where it intersects the $xy$-plane. This point is the projection, and its coordinates in the $xy$-plane are $(x, y)$.
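We represent $\mathbf{r}$ as a column vector:

$$\mathbf{r} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

The coordinates for $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$, and $\mathbf{0}$ are as follows:

$$\mathbf{0} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.$$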
The point $\mathbf{r}$ can be expressed using $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ as:

$$\mathbf{r} = x\mathbf{i} + y\mathbf{j} + z\mathbf{k}.$$

Vectors are denoted in bold, and scalars are denoted in regular font. The magnitude, or norm, of the position vector $\mathbf{r}$ is defined by:

$$\|\mathbf{r}\| = \sqrt{x^2 + y^2 + z^2}.$$

This is the distance between $\mathbf{r}$ and the origin $\mathbf{0}$ (Pythagorean theorem). Different coordinate systems can be chosen as needed. In such cases, the position $\mathbf{r}$ itself remains unchanged, but its representation changes.
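As a small numerical illustration of coordinates, projections, and the norm, consider the sketch below. The sample point and the helper name `norm` are assumptions made for the example.

```python
import math

def norm(r):
    """Distance from the origin, by the Pythagorean theorem."""
    return math.sqrt(sum(c * c for c in r))

r = (3.0, 4.0, 12.0)          # an illustrative position
x, y, z = r

proj_x_axis = (x, 0.0, 0.0)   # projection of r onto the x-axis
proj_xy_plane = (x, y, 0.0)   # projection of r onto the xy-plane

# Rebuild r from the unit positions i, j, k: r = x i + y j + z k
i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
rebuilt = tuple(x * a + y * b + z * c for a, b, c in zip(i, j, k))

print(norm(proj_xy_plane))  # 5.0
print(norm(r))              # 13.0
```

Applying the Pythagorean theorem twice, first in the $xy$-plane and then with the $z$ coordinate, gives the same norm: $5^2 + 12^2 = 13^2$.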

Question 1.1. Most calculus textbooks do not distinguish whether vectors are col-
umn vectors or row vectors. However, we fix r as a column vector. What is the
advantage of choosing column vectors over row vectors?

Distinguishing between column vectors and row vectors reduces confusion. One reason for representing the position vector $\mathbf{r}$ as a column vector is matrix multiplication. If $A$ is a $3 \times 3$ matrix and $\mathbf{x}$ is a vector, we typically write the matrix-vector multiplication as $A\mathbf{x}$. In this case, $\mathbf{x}$ must be a column vector.
However, column vectors have a drawback: they take up more space on the page. Therefore, we sometimes write $\mathbf{r} = (1, 3, 2)$ to save space. Keep in mind that, depending on the context, this may still represent a column vector.
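The shape requirement in $A\mathbf{x}$ can be made concrete with a small sketch that stores matrices as lists of rows. The function `matmul` and the sample matrix are hypothetical and only illustrate the point; a column vector is stored as a $3 \times 1$ matrix and a row vector as a $1 \times 3$ matrix.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 0, 2],
     [0, 1, 0],
     [3, 0, 1]]
x_col = [[1], [3], [2]]   # column vector, a 3 x 1 matrix
x_row = [[1, 3, 2]]       # row vector, a 1 x 3 matrix

Ax = matmul(A, x_col)
print(Ax)                 # [[5], [3], [5]]

try:
    matmul(A, x_row)      # a 3 x 3 matrix times a 1 x 3 matrix is not defined
except ValueError as err:
    print("not defined:", err)

print(matmul(x_row, A))   # the row vector multiplies from the left: [[7, 3, 4]]
```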

1.3 Moving particle and trajectory curves in space

Let’s consider a planet moving in space. Let time be represented by $t \in \mathbb{R}$, and let $\mathbf{r}(t)$ denote the position of the planet or object at time $t$. Then, we can write:

$$\mathbf{r}(t) = f(t)\mathbf{i} + g(t)\mathbf{j} + h(t)\mathbf{k} = \begin{pmatrix} f(t) \\ g(t) \\ h(t) \end{pmatrix}.$$

Alternatively, we can express it as:

$$x = f(t), \quad y = g(t), \quad z = h(t).$$

Both representations are equivalent, and the meaning is clear. On reflection, however, what is the reason for introducing the new expressions $f(t)$, $g(t)$, and $h(t)$? They are the functions giving the $x$, $y$, and $z$ coordinates of the planet, respectively. But later, one might forget whether $f(t)$ represented the $x$ or the $y$ coordinate. So, it is better to write:

$$\mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix}.$$
The trajectory of the planet, denoted as $\{\mathbf{r}(t) : t \in \mathbb{R}\}$, is a curve in 3D space, and we can regard $\mathbf{r}$ as a vector-valued function of the time variable $t \in \mathbb{R}$. Using either of the two expressions mentioned earlier, the norm of the position vector $\mathbf{r}(t)$ can be written as follows:

$$\|\mathbf{r}(t)\| = \sqrt{f^2(t) + g^2(t) + h^2(t)} \quad \text{or} \quad \|\mathbf{r}(t)\| = \sqrt{x^2(t) + y^2(t) + z^2(t)}.$$

The second notation makes it clear that this is the distance between the position $\mathbf{r}(t)$ and the origin. This abuse of notation clarifies the meaning.

Remark 1.2. In this notation, $x(t)$ is a function of the variable $t$ that gives the $x$ coordinate of the moving particle’s position at time $t$. We refer to this kind of expression as an abuse of notation. Using the same symbol $x$ for both the $x$ coordinate in the coordinate system and the function giving that coordinate at time $t$ is more convenient than introducing a new function $f(t)$ with $x = f(t)$. This kind of abuse of notation, where the same symbol is used for two different entities, is widespread in calculus, for example in the chain rule.
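A trajectory $\mathbf{r}(t)$ and its norm can be sampled numerically. The sketch below uses an assumed sample trajectory, $x(t) = \cos t$, $y(t) = \sin t$, $z(t) = t$, and names the coordinate functions `x`, `y`, `z` in the spirit of the remark above.

```python
import math

# Coordinate functions of a sample trajectory (an assumed example)
def x(t): return math.cos(t)
def y(t): return math.sin(t)
def z(t): return t

def r(t):
    """Position vector r(t) = (x(t), y(t), z(t))."""
    return (x(t), y(t), z(t))

def norm_r(t):
    """Distance between the position r(t) and the origin."""
    return math.sqrt(x(t) ** 2 + y(t) ** 2 + z(t) ** 2)

# Sample a few points of the trajectory
for step in range(5):
    t = step * math.pi / 2
    print(round(t, 3), [round(c, 3) for c in r(t)], round(norm_r(t), 3))
```

For this particular trajectory $x^2(t) + y^2(t) = 1$, so the norm is $\sqrt{1 + t^2}$, which the printed values should reproduce.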

Question 1.2. What is the difference between a vector and a scalar?

We commonly say that a scalar is a quantity with only magnitude, and a vector is a quantity with both magnitude and direction. However, that statement is not entirely accurate. A scalar value $x \in \mathbb{R}$ also has one of two directions, either to the right or to the left, with a magnitude of $|x|$. A more precise distinction is that a scalar is a quantity that belongs to a number system such as the real or complex numbers, while a vector is composed of multiple scalars, including the case of a single-component vector. In other words, a scalar can be regarded as a single-component vector.

Problem 1.3. Draw the trajectory of the vector function $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^2$ given by $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j}$. In which direction is it moving?

Problem 1.4. Draw the trajectory of the vector function $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^3$ given by $\mathbf{r}(t) = \cos t\,\mathbf{i} + \sin t\,\mathbf{j} + t\,\mathbf{k}$.

Problem 1.5. Construct a function $\mathbf{r} : (0, 2\pi) \to \mathbb{R}^3$ whose trajectory is a coil that winds around the $z$-axis 10 times and whose projection onto the $xy$-plane is a circle of radius 2.

Vector sums and differences

Multiplying a vector by a scalar is given by $c\mathbf{r} = (cx, cy, cz)$. The sum and difference of two vectors are defined by adding and subtracting the corresponding components, respectively. That is,

$$\mathbf{r}_1 + \mathbf{r}_2 = (x_1 + x_2,\ y_1 + y_2,\ z_1 + z_2), \qquad \mathbf{r}_1 - \mathbf{r}_2 = (x_1 - x_2,\ y_1 - y_2,\ z_1 - z_2).$$

The geometric interpretation of vector addition is explained using parallelograms. The vector difference $\mathbf{r}_2 - \mathbf{r}_1$ is understood as the vector with $\mathbf{r}_1$ as the initial point and $\mathbf{r}_2$ as the terminal point (refer to the figure above).
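The componentwise definitions translate directly into code. A minimal sketch with made-up vectors follows; the helper names `add`, `sub`, and `scale` are assumptions.

```python
def add(r1, r2):
    """Componentwise sum r1 + r2."""
    return tuple(a + b for a, b in zip(r1, r2))

def sub(r1, r2):
    """Componentwise difference r1 - r2."""
    return tuple(a - b for a, b in zip(r1, r2))

def scale(c, r):
    """Scalar multiple c r."""
    return tuple(c * a for a in r)

r1 = (1, 2, 3)
r2 = (4, 0, -1)

print(add(r1, r2))     # (5, 2, 2)
print(sub(r2, r1))     # (3, -2, -4)
print(scale(2, r1))    # (2, 4, 6)

# Starting at r1 and adding the difference r2 - r1 arrives at r2.
print(add(r1, sub(r2, r1)) == r2)   # True
```

The last line reflects the geometric picture: $\mathbf{r}_2 - \mathbf{r}_1$ is the arrow from the initial point $\mathbf{r}_1$ to the terminal point $\mathbf{r}_2$.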

1.4 Cross product & inner product

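For two vectors,

$$\mathbf{r}_1 = \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix}, \quad \mathbf{r}_2 = \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix},$$

the cross product is denoted and defined as follows:

$$\mathbf{r}_1 \times \mathbf{r}_2 = (y_1 z_2 - z_1 y_2)\mathbf{i} - (x_1 z_2 - z_1 x_2)\mathbf{j} + (x_1 y_2 - y_1 x_2)\mathbf{k}.$$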

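It is also called the vector product. To make it easier to remember the above formula, we use the determinant of a $3 \times 3$ matrix:

$$\mathbf{r}_1 \times \mathbf{r}_2 = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \end{vmatrix} = \begin{vmatrix} y_1 & z_1 \\ y_2 & z_2 \end{vmatrix}\mathbf{i} - \begin{vmatrix} x_1 & z_1 \\ x_2 & z_2 \end{vmatrix}\mathbf{j} + \begin{vmatrix} x_1 & y_1 \\ x_2 & y_2 \end{vmatrix}\mathbf{k}.$$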

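The cross product is defined only for 3-dimensional vectors. Geometrically, the cross product $\mathbf{r}_1 \times \mathbf{r}_2$ is a vector perpendicular to the plane containing the two vectors $\mathbf{r}_1$ and $\mathbf{r}_2$, with a magnitude given by

$$\|\mathbf{r}_1 \times \mathbf{r}_2\| = \|\mathbf{r}_1\|\,\|\mathbf{r}_2\| \sin\theta \tag{1.1}$$

where $\theta$ is the angle between them. There are two such vectors, and $\mathbf{r}_1 \times \mathbf{r}_2$ is the one satisfying the right-hand rule. If the two vectors are parallel, i.e., if the angle is $\theta = 0$ or $\theta = \pi$, then $\mathbf{r}_1 \times \mathbf{r}_2 = \mathbf{0}$.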
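The component formula and the magnitude relation (1.1) can be compared on a simple example. This is a sketch with assumed sample vectors whose angle, $\pi/4$, is known in advance; the helper names are hypothetical.

```python
import math

def cross(r1, r2):
    """Cross product written exactly as in the component formula."""
    x1, y1, z1 = r1
    x2, y2, z2 = r2
    return (y1 * z2 - z1 * y2,
            -(x1 * z2 - z1 * x2),
            x1 * y2 - y1 * x2)

def norm(r):
    return math.sqrt(sum(c * c for c in r))

r1 = (1.0, 0.0, 0.0)
r2 = (1.0, 1.0, 0.0)          # makes an angle of pi/4 with r1
theta = math.pi / 4

c = cross(r1, r2)
print(c)                                       # (0.0, 0.0, 1.0)
print(norm(c))                                 # left side of (1.1)
print(norm(r1) * norm(r2) * math.sin(theta))   # right side of (1.1)

# Parallel vectors give the zero vector.
print(cross((1, 2, 3), (2, 4, 6)))
```

Both sides of (1.1) should agree up to rounding, and the result points along the positive $z$ direction, as the right-hand rule predicts for these two vectors.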

Problem 1.6. Let $\mathbf{r}_1(t)$ and $\mathbf{r}_2(t)$ denote the vectors representing the positions of two objects at time $t$. Show that the cross product satisfies the following product rule:

$$(\mathbf{r}_1(t) \times \mathbf{r}_2(t))' = \mathbf{r}_1'(t) \times \mathbf{r}_2(t) + \mathbf{r}_1(t) \times \mathbf{r}_2'(t).$$

Solution 1.6 We can use the product rule for derivatives as follows:

(r1 (t) × r2 (t))′ = (y1 z2 − z1 y2 )′ i − (x1 z2 − z1 x2 )′ j + (x1 y2 − y1 x2 )′ k


= (y′1 z2 − z′1 y2 )i + (y1 z′2 − z1 y′2 )i + (· · · )j + (· · · )k
= r′1 (t) × r2 (t) + r1 (t) × r′2 (t).

Thus, the product rule is satisfied. (Not all terms are written out; please verify the remaining ones.)


10 1 Rectangular coordinate system and curves in R3

Question 1.3. Is there a way to determine if two vectors r1 and r2 are perpendicular?
Is there an easy way to find the angle between them?

Using (1.1), we can find the angle between two vectors. However, an easier way
to determine the angle is through the inner product, also known as the dot product.
The inner product is written in two ways and is defined by

r1 · r2 = ⟨r1 , r2 ⟩ = x1 x2 + y1 y2 + z1 z2 .

The inner product of two vectors yields a single scalar value.

Problem 1.7. If θ is the angle between two vectors r1 and r2, show that
\[ \cos\theta = \frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\|\mathbf{r}_1\| \, \|\mathbf{r}_2\|}. \tag{1.2} \]

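Solution 1.7 Assuming the two vectors meet at the origin, we can choose the coordinate axes so that both vectors lie in the xy-plane. Therefore, let’s assume all z components are zero. Writing r1 = ∥r1∥(cos α, sin α, 0) and r2 = ∥r2∥(cos β, sin β, 0), we get r1 · r2 = ∥r1∥ ∥r2∥ (cos α cos β + sin α sin β) = ∥r1∥ ∥r2∥ cos(α − β), and the angle between the vectors is θ = |α − β|. So the relationship (1.2) is the angle-difference formula from high school trigonometry. The details are left to the reader, but (1.2) should be remembered. □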

The relationship (1.2) is very important. If the inner product is 0, the vectors are
perpendicular. If the angle is 0, i.e., if the vectors are parallel, then cos 0 = 1, and
the inner product of the two vectors equals the product of their lengths.
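Relation (1.2) translates directly into a small routine for the angle between two vectors. This is a sketch with illustrative input vectors; the clamping step is an added safeguard against rounding and is not part of the formula itself.

```python
import math

def dot(r1, r2):
    return sum(a * b for a, b in zip(r1, r2))

def norm(r):
    return math.sqrt(dot(r, r))

def angle(r1, r2):
    """Angle in radians between two nonzero vectors, from (1.2)."""
    c = dot(r1, r2) / (norm(r1) * norm(r2))
    c = max(-1.0, min(1.0, c))   # guard against rounding slightly outside [-1, 1]
    return math.acos(c)

a = angle((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))    # about pi/4
b = angle((1.0, 2.0, 1.0), (2.0, -1.0, 0.0))   # inner product is 0, so about pi/2
c = angle((1.0, 2.0, 3.0), (2.0, 4.0, 6.0))    # parallel vectors, so about 0
print(a, b, c)
```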

Problem 1.8 (Equation of a plane). Find the equation of a plane perpendicular to the vector v = 2i + 3j + k passing through the point r = (1, 2, −1).

Solution 1.8 (Refer to the figure above) Let x = (x, y, z) represent a point on the plane. Then, the vector x − r = (x − 1, y − 2, z + 1) is perpendicular to v = (2, 3, 1). Therefore,

(x − r) · v = 2(x − 1) + 3(y − 2) + (z + 1) = 2x + 3y + z − 7 = 0.

Thus, the equation of the plane is 2x + 3y + z − 7 = 0. Alternatively, it can be written as 2x + 3y + z = 7. □
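The same recipe works for any normal vector and any point: the coefficients of the plane are the components of the normal, and the right-hand side is the inner product of the normal with the given point. A minimal sketch follows; the function names and the sample normal and point are hypothetical and chosen only for illustration.

```python
def plane_through(normal, point):
    """Return (a, b, c, d) with a*x + b*y + c*z = d for the plane
    through `point` perpendicular to `normal`."""
    a, b, c = normal
    d = sum(n * p for n, p in zip(normal, point))   # d = normal . point
    return a, b, c, d

def on_plane(coeffs, x, tol=1e-12):
    """Check whether the point x satisfies the plane equation."""
    a, b, c, d = coeffs
    return abs(a * x[0] + b * x[1] + c * x[2] - d) < tol

# Hypothetical data: normal (1, -1, 2), point (0, 1, 3)
coeffs = plane_through((1.0, -1.0, 2.0), (0.0, 1.0, 3.0))
print(coeffs)
print(on_plane(coeffs, (0.0, 1.0, 3.0)), on_plane(coeffs, (0.0, 0.0, 0.0)))
```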

The inner product can be defined not only for 3-dimensional vectors but also
for vectors of any dimension. However, the notation used previously is not suitable
for expressing the inner product of n-dimensional vectors. Let’s represent two n-dimensional vectors slightly differently:

\[ \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}. \]

The inner product of these two vectors is defined as follows.


\[ \mathbf{x} \cdot \mathbf{y} = \langle \mathbf{x}, \mathbf{y} \rangle = \sum_{i=1}^{n} x_i y_i. \tag{1.3} \]

The inner product of two functions f and g can also be defined by integration.
\[ \langle f, g \rangle = \int f(x)\, g(x)\, dx. \tag{1.4} \]

What is the angle between two vectors in n-dimensional space? What about the
angle between two functions f and g? Although their meanings are different, (1.2)
can be used as a definition for angles.
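To make the idea of an angle defined through (1.2) concrete, here is a small sketch that applies the same angle formula to both inner products. The integration interval [0, 2π], the midpoint rule, and the sample vectors and functions are assumptions made for this illustration; the text leaves the interval of integration in (1.4) unspecified.

```python
import math

def inner_vec(x, y):
    """Inner product (1.3) of two n-dimensional vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def inner_fun(f, g, a, b, n=2000):
    """Midpoint-rule approximation of (1.4) on [a, b] (the interval is an assumption)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

def angle(ip, u, v):
    """Angle defined through (1.2) for a given inner product `ip`."""
    return math.acos(ip(u, v) / math.sqrt(ip(u, u) * ip(v, v)))

x = (1.0, 0.0, 1.0, 0.0)
y = (1.0, 1.0, 1.0, 1.0)
theta_vec = angle(inner_vec, x, y)            # cos(theta) = 2 / (sqrt(2) * 2)

def ip(f, g):
    return inner_fun(f, g, 0.0, 2.0 * math.pi)

theta_fun = angle(ip, math.sin, math.cos)     # sin and cos are orthogonal on [0, 2*pi]
print(theta_vec, theta_fun)
```

In both cases the only ingredient is a rule that multiplies matching entries (or function values) and adds them up, which is one way to approach Question 1.4.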

Question 1.4. What commonality exists between the inner products (1.3) and (1.4),
even though they seem different?

Problem 1.9. Let x(t) and y(t) denote vectors representing the positions of two
objects at time t. Show that the derivative of their inner product also satisfies the
following product rule:

(x(t) · y(t))′ = x′ (t) · y(t) + x(t) · y′ (t).

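Solution 1.9 We can use the product rule for derivatives as follows:
\[ (\mathbf{x}(t) \cdot \mathbf{y}(t))' = \Big( \sum_{i=1}^{n} x_i(t)\, y_i(t) \Big)' = \sum_{i=1}^{n} \big( x_i(t)\, y_i(t) \big)' = \sum_{i=1}^{n} \big( x_i'(t)\, y_i(t) + x_i(t)\, y_i'(t) \big) = \mathbf{x}'(t) \cdot \mathbf{y}(t) + \mathbf{x}(t) \cdot \mathbf{y}'(t). \]

Thus, the product rule is satisfied. □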


Exercises

1. Find all vectors perpendicular to each of the following vectors.
(1) r = i + 2j + k (2) r = 2i − 3j + 4k (3) r = i − j + k (4) r = 2i + j − 3k
2. Find unit vectors perpendicular to the following pairs of vectors.
(1) r1 = 3j + k, r2 = 2i + j − k (2) r1 = i + 2j + k, r2 = 2i − j
3. Find the equation of a plane perpendicular to the vector v = 2i + 3j + k passing through the point r = (1, 2, −1).
4. Find a vector perpendicular to the plane with the equation 2x + 3y − z = 2.
5. Find the equation of a plane parallel to the xy-plane passing through the point r = (2, 1, 4).
6. Find the equation of a plane parallel to the xz-plane passing through the point r = (2, 1, 4).
7. Find the intersection of the planes 2x + 3y − z = 2 and 3x + y − 2z = 0.
8. Find the equation that represents all points equidistant from the points r1 = (1, 2, 1) and r2 = (3, 2, −1).
Lecture 2
Polar coordinates in R2

The planets in the solar system orbit in elliptical paths close to circles around the
sun. Artificial satellites orbiting around the Earth are mainly designed to orbit in
circular paths, but they can also orbit in elliptical paths. Each orbit can be described
in two-dimensional space coordinates on a plane. Particularly, polar coordinates are
useful for representing circular or elliptical orbits. In this lecture, we will discuss
polar coordinates, which have many practical applications.

2.1 Variable change with polar coordinates

The polar coordinate system in two-dimensional space consists of two numbers: the length r and the angle θ. The orthogonal coordinate system in two dimensions consists of two numbers: the x-coordinate and the y-coordinate. The length of the line segment connecting the origin and the point (x, y) is r, and this line segment makes an angle θ with the x-axis. Given polar coordinates (r, θ), we can calculate orthogonal coordinates using sin θ and cos θ. That is,

x = r cos θ , y = r sin θ . (2.1)

Of course, given orthogonal coordinates (x, y), we can find the corresponding polar coordinates (r, θ). However, it is important to determine the ranges of r and θ. We have

(r, θ) ∈ [0, ∞) × [0, 2π),

so the length r is given by
\[ r = \sqrt{x^2 + y^2}. \]
However, explicitly expressing the angle θ as θ = f(x, y) is difficult, and it is given implicitly as
\[ \cos\theta = \frac{x}{r}, \quad \sin\theta = \frac{y}{r}, \quad r = \sqrt{x^2 + y^2}. \tag{2.2} \]
If r ̸= 0, then there exists a unique θ satisfying (2.2) in the interval 0 ≤ θ < 2π.
Let’s agree to write r before θ , similar to writing x before y.
Depending on the purpose and convenience, one can choose either of the two coordinate systems, and one should understand their relationship well in order to change variables freely. The relations (2.1) and (2.2) between polar coordinates (r, θ) and orthogonal coordinates (x, y) are perhaps the most important example of a multidimensional change of variables. They serve as a first example for understanding changes of variables clearly, which is essential for understanding Newton’s planetary theory.
The orthogonal coordinate system is not only a coordinate system but also the actual world where the motion of planets occurs. The polar coordinate system, on the other hand, is a convenient coordinate system for representing elliptical orbits in the orthogonal coordinate system. To use polar coordinates easily within the orthogonal coordinate system, we introduce new basis vectors in place of the standard basis vectors i and j of the orthogonal coordinate system. These are as follows:
\[ \mathbf{e}_r(\theta) = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \quad \mathbf{e}_\theta(\theta) = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}. \tag{2.3} \]

Although these two vectors are unit vectors, unlike i and j, they are not constant vectors. Both vectors depend only on θ and are independent of r. In the polar coordinate plane, er corresponds to the basis vector (1, 0) and eθ corresponds to (0, 1). The reason is as follows: as seen in the figure, er points in the direction in which r increases while θ is held fixed, so it corresponds to (1, 0) in the polar coordinate system, and eθ points in the direction in which θ increases while r is held fixed, so it corresponds to (0, 1).
Let’s examine which point in the orthogonal coordinate system corresponds to given coordinates (r, θ) in the polar coordinate plane. Once the angle θ is given, we consider the direction vector er corresponding to the angle θ. Since the direction vector is a unit vector, the point at distance r in that direction is

\[ \mathbf{r} = r\,\mathbf{e}_r(\theta). \]

This equation is nothing more than the relationship (2.1) rewritten as a vector equation. If er is the first coordinate axis and eθ is the second coordinate axis, then the new coordinate system also has a positive orientation.
Now, if the point r = (x, y) on the xy plane is given, let’s find the corresponding
polar coordinates (r, θ ). There is a point to be careful about: since the correspon-
dence (2.1) is not one-to-one, it is not uniquely determined. To establish an inverse
correspondence, we must choose a branch as in defining inverse functions. In polar
coordinates, we choose r ≥ 0 and 0 ≤ θ < 2π as branches. Within this range, we
choose r and θ that satisfy (2.2).

Question 2.1. What if we express θ as tan⁻¹(y/x), sin⁻¹(y/r), or cos⁻¹(x/r) instead of using (2.2)?

Indeed, many calculus books use such relationships. However, if we define θ = tan⁻¹(y/x) or θ = sin⁻¹(y/r), these two inverse functions only give angles in the range −π/2 ≤ θ ≤ π/2 according to their definitions. If we use θ = cos⁻¹(x/r), it gives angles only in the range 0 ≤ θ ≤ π according to the definition of cos⁻¹. Therefore, these expressions are not accurate representations. Let’s simply use the solution of (2.2) as a new function θ(x, y). Then, we can cover the range 0 ≤ θ < 2π handled in polar coordinates.
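In a program, one possible way to obtain the function θ(x, y) on the branch 0 ≤ θ < 2π is the two-argument arctangent, which uses the signs of both x and y. The sketch below uses Python's `math.atan2`; the function names `to_polar` and `to_cartesian` are hypothetical, and returning θ = 0 at the origin is a convention of this sketch, since the angle is not determined when r = 0.

```python
import math

TWO_PI = 2.0 * math.pi

def to_polar(x, y):
    """Cartesian (x, y) -> polar (r, theta) with r >= 0 and 0 <= theta < 2*pi."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % TWO_PI   # atan2 returns values in [-pi, pi]
    if theta >= TWO_PI:                 # rounding can push a tiny negative angle up to 2*pi
        theta = 0.0
    return r, theta

def to_cartesian(r, theta):
    """Polar -> Cartesian, relation (2.1)."""
    return r * math.cos(theta), r * math.sin(theta)

r, theta = to_polar(-1.0, -1.0)         # a point in the third quadrant
naive = math.atan(-1.0 / -1.0)          # tan^{-1}(y/x) lands in the first quadrant
x, y = to_cartesian(r, theta)
print(r, theta, naive, x, y)
```

For the point (−1, −1) the branch-aware angle is 5π/4, while tan⁻¹(y/x) returns π/4, which belongs to the opposite point (1, 1).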

2.2 Motion in polar coordinates

This section is essential for deriving the orbit formulas of planets. It requires mathematical thinking for physical understanding. If no forces other than their mutual gravity act on two celestial bodies (such as the Sun and the Earth), their motion stays in a single plane (this will be confirmed later). Introducing a polar coordinate system on this plane allows us to represent the position of an object or a planet using polar coordinates:

\[ \mathbf{r} = x\,\mathbf{i} + y\,\mathbf{j} = r\cos\theta\,\mathbf{i} + r\sin\theta\,\mathbf{j} = r\,\mathbf{e}_r(\theta). \]

We denote the position vector by the bold letter r. Its relationship with the polar coordinate r is
\[ \|\mathbf{r}\| = r. \]

The basis vectors i and j of the orthogonal coordinate system form a fixed pair of perpendicular unit vectors, regardless of the position. In contrast, er(θ) and eθ(θ) form a pair of perpendicular unit vectors that varies with the position. They are determined by the angle θ of the given position in the orthogonal coordinate system and are independent of r.

Problem 2.1. Prove the following derivatives.

\[ \frac{d\mathbf{e}_r}{d\theta} = \mathbf{e}_\theta, \quad \frac{d\mathbf{e}_\theta}{d\theta} = -\mathbf{e}_r. \]

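Solution 2.1 These relations follow directly from (2.3) and the derivatives of the trigonometric functions: differentiating er = (cos θ, sin θ) gives (−sin θ, cos θ) = eθ, and differentiating eθ = (−sin θ, cos θ) gives (−cos θ, −sin θ) = −er. Remembering them is more important. □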

With the new coordinate system, the position of the object is represented as r = r er(θ). This notation hides the time variable. As the object moves, the polar coordinates r and θ representing the position of the object become functions of the time variable t. The right side of the following figure shows the trajectory of a particle moving on the xy plane. The corresponding position in polar coordinates is represented as r̃(t) = (r(t), θ(t)). The space where Newton’s laws apply is not the polar coordinate space but the orthogonal coordinate space. In other words, Newton’s law of gravitation and laws of motion must be applied to the trajectory traced by the point r = r er(θ) on the right side of the figure. The basis vectors er and eθ thus become functions of time through the angle θ(t), and the position of the particle can be written as follows:

\[ \mathbf{r}(t) = r(t)\,\mathbf{e}_r(\theta(t)). \]

Problem 2.2. Prove the following.

ėr = eθ θ̇ , ėθ = −er θ̇ . (2.4)

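Solution 2.2 To calculate the derivatives with respect to time ėr and ėθ, consider the angle as a function of time θ = θ(t). Using the chain rule and Problem 2.1, we get
\[ \dot{\mathbf{e}}_r = \frac{d}{dt} \begin{pmatrix} \cos\theta(t) \\ \sin\theta(t) \end{pmatrix} = \begin{pmatrix} \cos'\theta(t)\,\dot\theta \\ \sin'\theta(t)\,\dot\theta \end{pmatrix} = \begin{pmatrix} -\sin\theta(t)\,\dot\theta \\ \cos\theta(t)\,\dot\theta \end{pmatrix} = \frac{d\mathbf{e}_r}{d\theta}\,\dot\theta = \mathbf{e}_\theta\,\dot\theta \]
and similarly
\[ \dot{\mathbf{e}}_\theta = \frac{d\mathbf{e}_\theta}{d\theta}\,\dot\theta = -\mathbf{e}_r\,\dot\theta. \]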


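Problem 2.3 (Position, velocity, acceleration using polar coordinates). Show that the position, velocity, and acceleration of an object are given as follows.

\[ \mathbf{r} = r\,\mathbf{e}_r \tag{2.5} \]
\[ \mathbf{v} = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta \tag{2.6} \]
\[ \mathbf{a} = (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta \tag{2.7} \]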

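Solution 2.3 The position vector (2.5) has already been explained. Differentiating it with the product rule and (2.4) gives

\[ \mathbf{v} = \dot{\mathbf{r}} = \dot r\,\mathbf{e}_r + r\,\dot{\mathbf{e}}_r = \dot r\,\mathbf{e}_r + r\dot\theta\,\mathbf{e}_\theta, \]

\[ \mathbf{a} = \dot{\mathbf{v}} = \ddot r\,\mathbf{e}_r + \dot r\,\dot{\mathbf{e}}_r + \dot r\dot\theta\,\mathbf{e}_\theta + r\ddot\theta\,\mathbf{e}_\theta + r\dot\theta\,\dot{\mathbf{e}}_\theta = \ddot r\,\mathbf{e}_r + 2\dot r\dot\theta\,\mathbf{e}_\theta + r\ddot\theta\,\mathbf{e}_\theta - r\dot\theta^2\,\mathbf{e}_r = (\ddot r - r\dot\theta^2)\,\mathbf{e}_r + (r\ddot\theta + 2\dot r\dot\theta)\,\mathbf{e}_\theta. \]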
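The acceleration formula (2.7) can be sanity-checked numerically by comparing it with a finite-difference second derivative of the Cartesian position. The motion r(t) = 2 + sin t, θ(t) = t²/2 used below is a hypothetical example chosen only because its derivatives are easy to write by hand; the step size h is likewise an assumption of this sketch.

```python
import math

def r(t):
    return 2.0 + math.sin(t)    # hypothetical radial motion

def theta(t):
    return 0.5 * t * t          # hypothetical angular motion

def position(t):
    """Cartesian position from relation (2.1)."""
    return (r(t) * math.cos(theta(t)), r(t) * math.sin(theta(t)))

def accel_formula(t):
    """Acceleration from (2.7), with derivatives of r and theta computed by hand."""
    rd, rdd = math.cos(t), -math.sin(t)     # r', r''
    thd, thdd = t, 1.0                      # theta', theta''
    er = (math.cos(theta(t)), math.sin(theta(t)))
    eth = (-math.sin(theta(t)), math.cos(theta(t)))
    ar = rdd - r(t) * thd ** 2              # radial component
    ath = r(t) * thdd + 2.0 * rd * thd      # angular component
    return (ar * er[0] + ath * eth[0], ar * er[1] + ath * eth[1])

def accel_numeric(t, h=1e-4):
    """Central second difference of the Cartesian position."""
    p0, p1, p2 = position(t - h), position(t), position(t + h)
    return tuple((a - 2.0 * b + c) / h ** 2 for a, b, c in zip(p0, p1, p2))

a_formula = accel_formula(1.0)
a_numeric = accel_numeric(1.0)
print(a_formula, a_numeric)
```

The two results agree up to the discretization error of the finite difference, which supports the term-by-term derivation above.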

Remark 2.1. Remember that using polar coordinates r and θ , it is convenient to use
er and eθ as basis vectors instead of i and j.

2.3 Ellipses in polar coordinates

The equation of an ellipse with its center at the origin and major and minor axes along the x-axis and y-axis, respectively, is given by:

\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
An overview of the graph is given on the left side of the figure. The x-intercepts are ±a, and the y-intercepts are ±b. If a = b, then the above ellipse becomes a circle. For convenience, we consider the case where a ≥ b, so the x-axis becomes the major axis. The two foci of the ellipse lie on the major axis. The distance between the center and each focus is c = √(a² − b²), i.e., the foci are at (±c, 0). The eccentricity of the ellipse, which indicates how far it deviates from a circle, is given by:
\[ e = \frac{c}{a} = \sqrt{\frac{a^2 - b^2}{a^2}}. \tag{2.8} \]

If e = 0, the shape is a circle. If e = 1, then b = 0, and it is no longer an ellipse. The eccentricity of an ellipse lies between 0 and 1.
Let’s represent the ellipse using polar coordinates. Take a line x = k with k > 0, perpendicular to the x-axis, and use this line as the directrix for constructing the curve. Let P(x, y) have polar coordinates (r, θ), and denote the foot of the perpendicular from P to the directrix as D. For a fixed e > 0, consider all points P(x, y) that satisfy

\[ r = e\,\overline{PD}. \tag{2.9} \]

For points to the left of the directrix, the length of segment PD is k − x, so we have:
\[ r = e\,\overline{PD} \;\Rightarrow\; \sqrt{x^2 + y^2} = e(k - x) \;\Rightarrow\; x^2 + y^2 = e^2 (k^2 - 2kx + x^2). \]

In simplified form, this becomes:

\[ (1 - e^2) x^2 + 2 k e^2 x + y^2 = e^2 k^2. \]

If e ≠ 1, we can rewrite this equation as follows:

\[ \left( x + \frac{k e^2}{1 - e^2} \right)^2 + \frac{y^2}{1 - e^2} = \frac{e^2 k^2}{(1 - e^2)^2}. \tag{2.10} \]

Problem 2.4. If 0 < e < 1, show that (2.10) represents an ellipse with one of its foci
at the origin, where e represents the eccentricity of the ellipse.

Solution 2.4 If 0 < e < 1, then 1 − e² > 0, and we can define:

\[ a^2 = \frac{e^2 k^2}{(1 - e^2)^2}, \quad b^2 = a^2 (1 - e^2) = \frac{e^2 k^2}{1 - e^2}, \quad c = \frac{k e^2}{1 - e^2} > 0. \]

Dividing (2.10) by a², we get:

\[ \frac{(x + c)^2}{a^2} + \frac{y^2}{b^2} = 1, \]
which represents an ellipse. The center of the ellipse is (−c, 0). The eccentricity of the ellipse is defined as √((a² − b²)/a²). Calculating,

\[ \frac{a^2 - b^2}{a^2} = \frac{a^2 - a^2 (1 - e^2)}{a^2} = \frac{1 - (1 - e^2)}{1} = e^2. \tag{2.11} \]
Thus, the coefficient e in the relationship r = e PD is indeed the eccentricity of the ellipse, which is why the coefficient was named e from the beginning. The distance from the center to each focus of the ellipse is √(a² − b²), and using (2.11), we can compute:
\[ \sqrt{a^2 - b^2} = \sqrt{e^2 a^2} = \sqrt{\frac{k^2 e^4}{(1 - e^2)^2}} = c. \]
Therefore, the foci are at (−c ± c, 0): the ellipse is shifted c units to the left, and one of its foci is the origin. □

We have shown that points satisfying (2.9) form an ellipse with eccentricity e and one focus at the origin. The length of segment PD is k − r cos θ, so the polar representation of this ellipse becomes r = e(k − r cos θ). Solving for r, we get:
\[ r = \frac{L}{1 + e\cos\theta}, \quad L = ek. \]
This equation represents an ellipse with eccentricity e for 0 < e < 1. For e = 1 it represents a parabola, and for e > 1 a hyperbola (see Appendix B).
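As a numerical cross-check of this section, the sketch below samples points from the polar equation r = L/(1 + e cos θ) and verifies both the focus-directrix relation (2.9) and the shifted Cartesian ellipse from Solution 2.4. The values e = 0.6 and k = 2 are hypothetical and chosen only for illustration.

```python
import math

e, k = 0.6, 2.0                        # hypothetical eccentricity and directrix x = k
L = e * k
a = e * k / (1.0 - e ** 2)             # a, b, c as defined in Solution 2.4
b = e * k / math.sqrt(1.0 - e ** 2)
c = k * e ** 2 / (1.0 - e ** 2)

def point(theta):
    """Return (r, x, y) for the polar equation r = L / (1 + e cos(theta))."""
    r = L / (1.0 + e * math.cos(theta))
    return r, r * math.cos(theta), r * math.sin(theta)

max_focus_err = 0.0      # largest |r - e * PD| with PD = k - x
max_ellipse_err = 0.0    # largest |(x + c)^2 / a^2 + y^2 / b^2 - 1|
for i in range(360):
    r, x, y = point(math.radians(i))
    max_focus_err = max(max_focus_err, abs(r - e * (k - x)))
    max_ellipse_err = max(max_ellipse_err,
                          abs((x + c) ** 2 / a ** 2 + y ** 2 / b ** 2 - 1.0))

print(a, b, c, max_focus_err, max_ellipse_err)
```

Both errors stay at the level of floating-point rounding, so every sampled point lies on the ellipse centered at (−c, 0) with one focus at the origin.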

2.4 Curves in polar coordinates

When polar coordinates (r, θ) are used in one-to-one correspondence with Cartesian coordinates (x, y), it’s common to restrict the range to r ≥ 0 and 0 ≤ θ < 2π. However, when polar coordinates are simply used to represent curves, they can be used without such restrictions. In this section, we consider equations of curves given in polar coordinates, the corresponding curves in Cartesian coordinates, and their meanings.

Problem 2.5. Convert the following equations given in polar coordinates to Cartesian coordinates and draw their corresponding graphs.
(1) r = 1 (2) r = cos θ (3) r = cos(2θ) (4) r = 2/(sin θ − cos θ)

Solution 2.5 It’s important to distinguish between the graphs in polar coordinates
and their corresponding graphs in Cartesian coordinates, understanding that the
graphs in polar coordinates correspond to the graphs in Cartesian coordinates via
the transformation (2.1). The overview of the graphs is given in the figure.

(1) In Cartesian coordinates, the equation r = 1 becomes √(x² + y²) = 1, that is, x² + y² = 1. We know this represents a circle with its center at the origin and radius 1. Even without knowing this, if we plot r = 1 for various values of θ from 0 to 2π, we would observe a circle with radius 1.
(2) Since cos θ can take negative values, we need to allow r to be negative when writing r = cos θ. Multiplying both sides by r, we get r² = r cos θ, which, in Cartesian coordinates, becomes x² + y² = x. Rewriting this, we get (x − 0.5)² + y² = 0.5². This represents a circle centered at (0.5, 0) with radius 0.5. In the polar coordinate plane, this graph is that of the cosine function, which has period 2π. As θ moves over the interval [0, 2π], the circle is traced twice. It’s worth understanding why the whole circle is already traced once when θ moves from 0 to π.
(3) Using the double angle formula, we get r = cos²θ − sin²θ. Multiplying both sides by r², this becomes (x² + y²)^(3/2) = x² − y² in Cartesian coordinates. Squaring both sides and expanding, we get x⁶ + 3x⁴y² + 3x²y⁴ + y⁶ = x⁴ − 2x²y² + y⁴. It’s not immediately clear what curve this equation represents. However, in polar coordinates, the graph is simply a cosine function, and from that graph we obtain a four-leaf clover pattern, since the four leaves do not overlap.
(4) In this case, the graph in polar coordinates might seem more complicated. Multiplying both sides by sin θ − cos θ gives r sin θ − r cos θ = 2, which in Cartesian coordinates is y − x = 2, that is, y = x + 2. This represents a straight line. □
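The conversions in (2) and (4) can be verified by sampling: take values of θ, compute r from the polar equation, map to (x, y) with (2.1), and test the claimed Cartesian equation. The following is a rough sketch; the one-degree sampling grid and the threshold used to skip angles where r blows up are choices made for this illustration.

```python
import math

def xy(r, theta):
    """Relation (2.1); it is applied here even when r is negative."""
    return r * math.cos(theta), r * math.sin(theta)

circle_err = 0.0   # r = cos(theta)                     ->  (x - 0.5)^2 + y^2 = 0.5^2
line_err = 0.0     # r = 2 / (sin(theta) - cos(theta))  ->  y = x + 2
for i in range(1, 360):
    t = math.radians(i)
    x, y = xy(math.cos(t), t)
    circle_err = max(circle_err, abs((x - 0.5) ** 2 + y ** 2 - 0.25))
    denom = math.sin(t) - math.cos(t)
    if abs(denom) > 1e-3:              # skip angles where r is undefined or huge
        x, y = xy(2.0 / denom, t)
        line_err = max(line_err, abs(y - (x + 2.0)))

print(circle_err, line_err)
```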

Exercises

1. Convert the following points given in Cartesian coordinates to polar coordinates.
(1) r = (1, 1) (2) r = (−1, √3) (3) r = (−2√3, −2) (4) r = (0, −2)
2. Convert the following points given in polar coordinates to Cartesian coordinates.
(1) r̃ = (2, π/2) (2) r̃ = (4, π) (3) r̃ = (2√3, π/6) (4) r̃ = (0, π/4)
3. Sketch the overview of the curves represented by the following polar equations.
(1) r = 1 − cos θ (2) r = 1 − sin θ (3) r² = sin θ (4) r² = 4 cos θ
4. Given below are equations of ellipses. Compute the center, foci, and eccentricity.
(1) 16x² + 25y² = 400 (2) 9x² − 18x + 10y² = 44 (3) 6x² + 9y² − 18y = 45
5. Represent the above ellipses in polar coordinates.
6. Convert the equations of the curves given in polar coordinates to Cartesian coordinates and sketch their overview.
(1) r = 1/(1 + cos θ) (2) r = 20/(10 − 5 cos θ) (3) r = 5/(1 + 2 sin θ)
(4) r = 5/(1 − 0.5 cos θ)
7. Find the equation of the ellipse with a directrix at x = 5, eccentricity e = 0.5, and
the focus at the origin.
8. Use equation (2.6) to find the distance from the center of an artificial satellite with
an orbital period of 24 hours to the center of the Earth. (Refer to the necessary
data such as the gravity formula from the internet, etc.)
Lecture 3
Newton’s law on Earth

3.1 Newton’s law of motion and gravitation

Newton’s three laws of motion are as follows.


1. Law of inertia: An object moves at a constant velocity if no external forces act
on it.
2. Law of force: Force is equal to the product of mass and acceleration (F = ma).
3. Law of action-reaction: For every action, there is an equal and opposite reaction.
The first law is Galileo’s law of inertia, which describes motion at a constant velocity. It is the special case of the second law with F = 0, that is, a = 0.
According to Newton’s law of universal gravitation, the gravitational force between two objects is inversely proportional to the square of the distance between them and directly proportional to the product of their masses. Let m1 and m2 be the masses of two objects, and x1 and x2 be their position vectors. Then, the gravitational force acting on object m1 is given by
\[ \mathbf{F}_{m_1} = -G \frac{m_1 m_2}{r^2} \frac{\mathbf{r}}{r} = -G \frac{m_1 m_2}{r^2} \mathbf{e}_r \tag{3.1} \]
where G ≅ 6.674 × 10⁻¹¹ m³/(kg·s²) is the gravitational constant, r is the distance between the two objects, the bold r is the position difference vector, and er is the unit vector in the direction of r. That is,
\[ \mathbf{r} = \mathbf{x}_1 - \mathbf{x}_2, \quad r = \|\mathbf{r}\|, \quad \mathbf{e}_r = \frac{\mathbf{r}}{r}. \tag{3.2} \]
Here, r is the position vector pointing from object m2 to object m1. Hence, r(t) is the motion of m1 as seen by an observer at m2. We will consider m2 as a large object like the Sun and m1 as a small object like the Earth. In Equation (3.2), the trajectory r(t), with time parameter t ∈ R, is viewed with m2 placed at the origin. In the next chapter, we will see that this trajectory is an elliptical orbit. In reality, however, m2 also moves slightly, so the orbit of m1 deviates from the ellipse by the amount m2 moves. What remains stationary (or moves at a constant velocity) is not the Sun but the center of mass of the two celestial bodies, the Sun and the Earth. The vector r/r is a unit vector pointing from x2 to x1. This corresponds to the unit vector er in Lecture 2 when x2 is taken as the origin in polar coordinates.
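Formula (3.1) can be evaluated directly once positions and masses are given. The sketch below is one way to write it; the function name `gravity_on_m1` is hypothetical, and the Sun and Earth masses and their distance are rounded reference values assumed for this example, not data from the text.

```python
import math

G = 6.674e-11   # gravitational constant in SI units, m^3 / (kg s^2)

def gravity_on_m1(m1, m2, x1, x2):
    """Force (3.1) acting on m1 due to m2; positions are 3-tuples in meters."""
    rvec = tuple(a - b for a, b in zip(x1, x2))    # r = x1 - x2
    r = math.sqrt(sum(c * c for c in rvec))
    er = tuple(c / r for c in rvec)                # unit vector from m2 to m1
    magnitude = G * m1 * m2 / r ** 2
    return tuple(-magnitude * c for c in er)       # points from m1 toward m2

# Rough Sun-Earth values (assumed): masses in kg, distance in m
m_earth, m_sun = 5.972e24, 1.989e30
F = gravity_on_m1(m_earth, m_sun, (1.496e11, 0.0, 0.0), (0.0, 0.0, 0.0))
print(F)
```

With the Earth placed on the positive x-axis, the x-component of the force is negative, which reflects the minus sign in (3.1): the force on m1 points back toward m2.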

3.2 Work and energy

When an object receives a force F and moves a distance ℓ, the magnitude of work
W is given as follows:

W = f ℓ (work = component of force in the direction of motion × displacement). (3.3)
Here, f refers to the component of force F in the direction of motion. If the force F
is perpendicular to the direction of motion, then f = 0, and no work is done by the
force. For instance, if a planet or satellite orbits in a circular orbit, the force acting,
i.e., gravity, is perpendicular to the direction of motion, and thus, the work done is
W = 0.

Question 3.1. Why is work defined as the product of force and displacement?

Is Equation (3.3) a definition of work? If work is energy, then Equation (3.3) should be a formula for calculating energy, not a definition of work. Some books refer to Equation (3.3) as the definition of work. If that is the case, then one must separately demonstrate that work and energy are the same. In any case, what needs to be explained is that calculating with Equation (3.3) yields the correct energy, where “correct” means that the calculated energy does not contradict existing energy concepts. In fact, Equation (3.3) can be understood as a formula for calculating potential energy.
Let’s see through an example how energy and work are connected. Suppose a mass m at rest in one-dimensional space receives a force f = ma for a time t. Then the velocity obtained is v = at. Therefore, the kinetic energy at that moment is E_k = ½ma²t². So what is the distance traveled? The distance traveled is obtained by integrating the velocity. That is,
\[ \ell = \int_0^t a s \, ds = \left[ \tfrac{1}{2} a s^2 \right]_0^t = \tfrac{1}{2} a t^2. \]

Therefore, using Equation (3.3) to calculate work, W = f ℓ = ma × ½at² = ½ma²t², which is equal to the kinetic energy. In other words, energy can also be calculated using Equation (3.3). In effect, Equation (3.3) is a formula for calculating potential energy when the parameter for the energy calculation is changed from time to distance (arc-length).
What if the force is not constant but a function? If it is a function of time, then the acceleration varies with time, and the velocity becomes the integral of the acceleration, i.e., v(t) = v(0) + ∫_0^t a(s) ds. The kinetic energy is then easily obtained. If the force is a function of position, then integration using Equation (3.3) is necessary. The actual gravity (3.1) is a function of position or distance, and in this case, Equation (3.3) is more useful than the formula for kinetic energy. For example, if an object moves along the x-axis and the force component in the x-direction is a function of x, i.e., f = f(x), then the work done by the force f(x) between x = a and x = b is given by
\[ W = \int_a^b f(x)\, dx. \]
This is a definite integral: it accumulates the work done by the force f(x) from the starting point to the end point. In other words, the definite integral gives the signed area under the graph of f(x) from x = a to x = b.
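A short numerical sketch can illustrate both cases. For a constant force the work f ℓ should reproduce the kinetic energy ½mv², and for a position-dependent force the integral has to be evaluated. The midpoint rule, the function name `work`, and all numbers below are assumptions of this example.

```python
def work(f, a, b, n=10000):
    """Midpoint-rule approximation of W = integral of f(x) dx from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Constant force f = m * acc acting for time t over the distance acc * t^2 / 2
m, acc, t = 2.0, 3.0, 4.0                 # hypothetical numbers
dist = 0.5 * acc * t ** 2
W_const = work(lambda x: m * acc, 0.0, dist)
E_kinetic = 0.5 * m * (acc * t) ** 2      # kinetic energy with v = acc * t

# Position-dependent force f(x) = 3 x^2 on [0, 2]; the exact integral is 8
W_var = work(lambda x: 3.0 * x ** 2, 0.0, 2.0)
print(W_const, E_kinetic, W_var)
```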

3.3 Gravity force and potential energy

The energy of a moving planet is exchanged between potential and kinetic energy as the planet alternately speeds up and slows down. When an object with mass m1 moves with velocity v, the kinetic energy is given by:
\[ E_k = \frac{1}{2} m_1 \|\mathbf{v}\|^2. \]
The following problem demonstrates that the potential energy due to gravity on the surface of the Earth can also be expressed as the product of the gravitational force and a distance.

Problem 3.1 (Gravity on the Earth’s surface). The gravitational force exerted on an object with mass m1 at the Earth’s surface is −m1 g k̂. Here, g = 9.8 m/sec² is the gravitational acceleration, and k̂ is the unit vector in the vertical direction on the Earth’s surface. If this object is placed at a height h > 0 above the surface, the object’s potential energy is
\[ E_p = m_1 g h. \tag{3.4} \]
Do the following:
(1) Confirm the magnitude of the gravitational acceleration g using Equation (3.1).
(2) Explain the concept of potential energy (3.4) using the work concept.
(3) Explain the significance of potential energy (3.4).

Solution 3.1 (1) Compare −m1 g k̂ with (3.1): the mass m1 appears in both, and the vector k̂ corresponds to r/r. Therefore, the remaining part corresponds to the constant g:

\[ g = \frac{G m_2}{R^2} \approx 9.8\ \mathrm{m/sec^2}. \]

Here, m2 is the mass of the Earth, and R is the radius of the Earth. The value of g can be verified with data found on the internet.
(2) Work is a method of calculating potential energy. If the force component f in the direction of motion of an object is constant, then the work is given by f h. Here, h is the (vertical) displacement. Lifting the object requires a force of magnitude m1 g against gravity. Therefore, the potential energy is E_p = m1 g h.
(3) The potential energy is the energy required to push the object from the Earth’s surface up to its current position. Equivalently, it is the work done by gravity as the object falls from that position to the Earth’s surface.
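Part (1) can be confirmed with a few lines of arithmetic. The mass of the Earth used below is a rounded reference value assumed for this sketch; the radius is the value R = 6371 km quoted later in this lecture.

```python
G = 6.674e-11        # gravitational constant, m^3 / (kg s^2)
M_EARTH = 5.972e24   # mass of the Earth in kg (assumed reference value)
R_EARTH = 6.371e6    # radius of the Earth in m

g = G * M_EARTH / R_EARTH ** 2   # g = G m2 / R^2

def surface_potential_energy(m1, h):
    """E_p = m1 * g * h, formula (3.4)."""
    return m1 * g * h

print(g, surface_potential_energy(2.0, 10.0))
```

The computed value is close to 9.8 m/sec², and small differences come from the rounding of the input constants.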

Problem 3.2. A mass of 2 kg is thrown vertically upward from the ground with a force of twice the gravity for t seconds. Calculate the kinetic and potential energies at that moment.

Solution 3.2 With m = 2 kg, a force of twice the gravity is 2mg = (4 kg) g. The net acceleration is g, since we subtract gravity. Therefore, the velocity after t seconds is ∫_0^t g ds = gt, and the kinetic energy is ½mv² = g²t² kg. The distance traveled is ∫_0^t gs ds = ½gt², so the potential energy is E_p = mgh = (2 kg) g · ½gt² = g²t² kg. The total energy is g²t² kg + g²t² kg = 2g²t² kg. Alternatively, the total energy can be calculated using Equation (3.3): (4 kg) g × ½gt² = 2g²t² kg. For t = 100 seconds, since g = 9.8 m/sec², the total energy is as follows:

\[ E_{\mathrm{total}} = 2 (9.8)^2\ \mathrm{m^2/sec^4} \times (100)^2\ \mathrm{sec^2} \times 1\ \mathrm{kg} = 1.9208 \times 10^6\ \mathrm{kg\,m^2/sec^2}. \]

Calculating the potential energy or gravity between planets or between a planet and a star requires a different approach. In these cases, gravity cannot be treated as a constant. Gravity becomes a function of distance, requiring integration to calculate energy. However, there is another fundamental problem. Potential energy on the Earth’s surface is defined to be 0, with the Earth’s surface as the reference point. What should be the reference point for potential energy between planets?

Problem 3.3 (Potential energy with Earth’s surface as reference). Gravity is a function of the distance r between two objects, given by Newton’s law of gravitation (3.1). Let’s denote the mass of the Earth as m2. Show that, for an object with mass m1 located at a distance r > R from the center of the Earth (not on the Earth’s surface), the potential energy is given by
\[ E_p = G m_1 m_2 (R^{-1} - r^{-1}), \tag{3.5} \]
where R is the radius of the Earth and r is the distance between the object’s center and the Earth’s center.

Solution 3.3 First, assume that the object moves up and down along a line through the center of the Earth. The k̂ component of gravity is given by f = −G m1 m2 s⁻². Here, s is the distance to the center of the Earth. Pushing the object away from the Earth’s surface requires a force in the opposite direction. Integrating this force from R to r > R yields:
\[ \int_R^r G m_1 m_2 s^{-2}\, ds = \left[ -G m_1 m_2 s^{-1} \right]_R^r = G m_1 m_2 (R^{-1} - r^{-1}). \]

This matches (3.5). Let h denote the height above the surface. Then, r = R + h. Therefore, the potential energy is:
\[ E_p = G m_1 m_2 \left( \frac{1}{R} - \frac{1}{R + h} \right) = G m_1 m_2 \frac{R + h - R}{R (R + h)} = G m_1 m_2 \frac{h}{R^2} \cdot \frac{R^2}{R^2 + R h}. \]

If h is much smaller than R, then R²/(R² + Rh) ≈ 1. The potential energy can then be written as:

\[ E_p \approx G m_1 m_2 \frac{h}{R^2} = m_1 \frac{G m_2}{R^2} h, \]
which is the surface formula (3.4) with g = G m2/R², so (3.4) is a valid approximation of the potential energy (3.5). (The radius of the Earth is R = 6371 km. If h = 10 km, then R²/(R² + Rh) ≈ 0.9984, a difference of about 0.16%.)
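The quality of the approximation can be examined numerically by comparing (3.5) with m1 g h at several heights. The Earth's mass is a rounded reference value assumed for this sketch, and the function names are hypothetical.

```python
G = 6.674e-11        # gravitational constant, m^3 / (kg s^2)
M_EARTH = 5.972e24   # mass of the Earth in kg (assumed reference value)
R_EARTH = 6.371e6    # radius of the Earth in m

def ep_exact(m1, h):
    """Potential energy (3.5) with r = R + h."""
    return G * m1 * M_EARTH * (1.0 / R_EARTH - 1.0 / (R_EARTH + h))

def ep_surface(m1, h):
    """Approximation m1 * g * h with g = G m2 / R^2."""
    return m1 * (G * M_EARTH / R_EARTH ** 2) * h

def ratio(h, m1=1.0):
    """Exact over approximate; algebraically this equals R / (R + h)."""
    return ep_exact(m1, h) / ep_surface(m1, h)

for h in (10.0, 1.0e4, 1.0e6):     # 10 m, 10 km, 1000 km
    print(h, ratio(h))
```

The ratio stays close to 1 near the surface and drops noticeably only when h becomes comparable to R, which is the regime where gravity can no longer be treated as constant.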

Remark 3.1 (A brief note). Since h is much smaller than R, we can say h/(R(R + h)) ≈ h/R². Note, however, that we kept the h in the numerator. We shouldn’t delete a term just because it is small. Which terms can be dropped and which must be kept depends on what we want to see and on the terms around them.

Question 3.2. The potential energy (3.5) becomes 0 on the Earth’s surface. This def-
inition represents potential energy with respect to the Earth’s surface. What happens
if we calculate potential energy with respect to the center of the Earth?

Calculating the potential energy from the center of the Earth corresponds to the case R = 0. In this case, the potential energy given by (3.5) diverges:
\[ \lim_{R \to 0} G m_1 m_2 (R^{-1} - r^{-1}) = \infty. \]

This means that the potential energy becomes infinite when measured from the center of the Earth. In other words, if the entire mass were concentrated at the center, an infinite amount of energy would be required to move away from it, and an object located at the center could not escape. (Even if an object has a small mass, if it can be compressed sufficiently, nothing can escape from within. Such objects are known as micro black holes.)
If potential energy cannot be measured from the center of the Earth, the next natural choice is to measure it from ∞. Letting R → ∞ in (3.5), the potential energy is given by:
\[ E_p = -\frac{G m_1 m_2}{r}. \quad \text{(Potential Energy)} \]

The drawback of this choice is that the potential energy is negative: measured from infinity, it is 0 at r = ∞ and becomes increasingly negative as the object approaches the Earth’s center. Among the available choices, however, this is the best one. To summarize, when considering the motion between planets, the reference point for potential energy is r = ∞, and the potential energy is negative, reaching its maximum of 0 at r = ∞. When considering motion due to gravity near the Earth’s surface, the reference point is the surface of the Earth, and the potential energy is positive, reaching its minimum of 0 at h = 0.

3.4 Projectile motion

Let’s examine the trajectory of a projectile launched from the ground at an angle φ ∈ (0, π/2) with an initial speed v0 > 0. The objective is to find the projectile’s trajectory before it touches the ground again, the maximum height reached before it falls, the distance traveled, and the time it stays in the air. Air resistance is ignored.
Assuming the projectile moves in the xz-plane, let’s find the trajectory r(t).
Let the starting point be the origin, r(0) = 0, and the initial velocity be v(0) =
(v0 cos φ , v0 sin φ ). The acceleration a is given by gravity, so a(t) = (0, −g). The
velocity vector v(t) at time t is obtained by integrating the acceleration with initial
conditions:
     
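\[ \mathbf{v}(t) = \int \mathbf{a}(t)\,dt = \begin{pmatrix} c_1 \\ -gt + c_2 \end{pmatrix}, \qquad \mathbf{v}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} v_0\cos\varphi \\ v_0\sin\varphi \end{pmatrix}. \]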

Thus, v(t) = (v0 cos φ , −gt + v0 sin φ ). Integrating once more to calculate the posi-
tion vector:
     
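\[ \mathbf{r}(t) = \int \mathbf{v}(t)\,dt = \begin{pmatrix} v_0\cos\varphi\, t + c_1 \\ -\frac{1}{2}gt^2 + v_0\sin\varphi\, t + c_2 \end{pmatrix} \quad\Rightarrow\quad \mathbf{r}(0) = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]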

Therefore, the projectile’s trajectory is:


   
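\[ \mathbf{r}(t) = \begin{pmatrix} x(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} v_0\cos\varphi\, t \\ -\frac{1}{2}gt^2 + v_0\sin\varphi\, t \end{pmatrix}. \]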

The condition z(t) = 0 describes the moments when the projectile is on the ground. Therefore,
solving $-\frac{1}{2}gt^2 + v_0\sin\varphi\, t = 0$ gives us the moments when it touches the ground.
One solution is the initial time, t = 0. The other is:
\[ T = \frac{2v_0\sin\varphi}{g} \qquad \text{(Time of flight)} \]

when it touches the ground again. The x-component x(T ) at time T is the distance
traveled:

\[ R = x(T) = \frac{2v_0^2\sin\varphi\cos\varphi}{g} \qquad \text{(Range)} \]
The projectile’s maximum height occurs at half of the total time of flight, so:

\[ H = z(T/2) = \frac{v_0^2\sin^2\varphi}{2g} \qquad \text{(Maximum height)} \]
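
The three formulas can be checked numerically against the trajectory r(t). The following is a minimal Python sketch; the function names, the sample launch speed of 20 m/s, the angle π/4, and the value g = 9.8 m/s² are illustrative assumptions and do not come from the lecture notes.

```python
import math

def position(t, v0, phi, g=9.8):
    """Position (x, z) at time t, from r(t) = (v0 cos(phi) t, -g t^2 / 2 + v0 sin(phi) t)."""
    x = v0 * math.cos(phi) * t
    z = -0.5 * g * t**2 + v0 * math.sin(phi) * t
    return x, z

def flight_summary(v0, phi, g=9.8):
    """Time of flight T, range R and maximum height H from the closed formulas."""
    T = 2 * v0 * math.sin(phi) / g
    R = 2 * v0**2 * math.sin(phi) * math.cos(phi) / g
    H = v0**2 * math.sin(phi)**2 / (2 * g)
    return T, R, H

# Assumed sample data: launch speed 20 m/s at angle pi/4.
v0, phi = 20.0, math.pi / 4
T, R, H = flight_summary(v0, phi)
x_T, z_T = position(T, v0, phi)            # landing point
x_half, z_half = position(T / 2, v0, phi)  # point at half the flight time

print("T =", T, "R =", R, "H =", H)
print("z(T) =", z_T, "x(T) =", x_T, "z(T/2) =", z_half)
```

Evaluating the trajectory at t = T and t = T/2 should reproduce the range and the maximum height up to rounding error.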

Problem 3.4. Explain how the projectile trajectory changes if there is a crosswind
blowing at a speed of v1 .

Solution 3.4 If we ignore air resistance, no matter how strong the crosswind is, it
doesn’t affect the projectile’s trajectory. When considering air resistance, the method
used above is not sufficient.

Problem 3.5. Given a fixed launch velocity, how can you maximize the distance the
projectile travels?

Solution 3.5 If the launch angle φ is fixed, the range and the maximum height are
proportional to the square of the launch speed, $v_0^2$, and the time of flight is proportional to $v_0$.
If the launch speed is fixed, you can choose the angle φ. The range is maximized when
sin φ cos φ reaches its maximum value. To find the maximum, differentiate and set the
derivative equal to 0:
\[ (\sin\varphi\cos\varphi)' = \cos^2\varphi - \sin^2\varphi = 2\cos^2\varphi - 1 = 0. \]
Thus, the critical point in (0, π/2) is where $\cos\varphi = \frac{1}{\sqrt{2}}$, that is, φ = π/4.

Question 3.3. The following text is from a baseball magazine: "We were taught in
school that the angle at which a ball can be thrown the farthest is 45 degrees. But in actual
baseball, the optimal launch angle is close to 30 degrees." Why is this different? (The
optimal angle for a golf ball is about 17 degrees.)

The reason is air resistance and the spin of the ball. The ball’s spin is due to the
bottom part of the bat hitting the ball. If the launch angle is 45 degrees and the ball
has such spin, the actual trajectory is much higher than the optimal trajectory. The
spin of a golf ball is also caused by hitting the bottom of the ball, making the spin
more pronounced than a baseball and having a greater impact due to the surface of
the ball. Of course, without air resistance, 45 degrees is always the optimal launch
angle.

Exercises

1. Calculate the potential energy of a 10 kg object on the Earth’s surface. (Consider
R = ∞ as the reference point.)

2. Calculate the gravitational force between the Earth and the Sun. Compare it with
the gravitational force between Mars and the Sun. (Necessary data can be found
on the internet, such as the masses of Earth and Mars, and the distances from the
Sun.)
3. Let the mass of Jupiter be $1.899 \times 10^{27}$ kg and its radius be 140,000 km. Calculate
the magnitude of the gravitational force on Jupiter’s surface and compare it
with the gravitational force on the Earth’s surface.
4. A 10 kg piece of iron falls into water with a depth of 10 meters. How much work
does gravity do? (Necessary data can be found on the internet, such as the density
of iron.)
5. Assume the speed of sound is 340 m/s. Calculate the maximum range of a projectile
whose launch speed is equal to the speed of sound. Also, determine the
time of flight and maximum height reached.
6. It is said that the maximum range of a K9 howitzer is 53 km. What is the launch
velocity?
Lecture 4
Multi-variable Vector-valued Functions

In Calculus 2, the primary subject matter dealt with is multivariable functions that
have vector values. Vectors are denoted in boldface and are conceived as column
vectors. Generally, they are represented as follows:

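\[ \mathbf{f} : D \subset \mathbb{R}^n \to \mathbb{R}^m, \qquad \mathbf{y} = \mathbf{f}(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^n, \ \mathbf{y} \in \mathbb{R}^m. \]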

Most examples consider n = 2 or 3. However, the same theory applies to higher
dimensions. Applications of the theory, such as those dealing with big data, often
involve very high dimensions.

4.1 Domain, Range, and Codomain

A function f assigns a single value for each element in the set D ⊂ Rn , referred
to as the domain. The collection of all function values, denoted as {y ∈ Rm : y =
f(x) for some x ∈ D}, is called the range (or image), and the space in which the
function values belong, Rm , is termed the codomain. We typically denote the di-
mension of the domain as n and the dimension of the codomain as m. We express
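this as:
\[ \mathbf{f}(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{pmatrix}, \qquad \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n. \]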
Here, fi : Rn → R represents functions with n independent variables that yield scalar
values. Typically, independent variables are denoted as x1 , · · · , xn . However, when
n ≤ 3, they are occasionally represented as x, y, and z. We aim to distinguish between
x ∈ Rn and y ∈ Rm , though in some cases, it may not be possible.
Given a function, often one can determine its maximum domain even without
explicit specification. In such cases, finding the maximum range is also feasible.


Problem 4.1. Determine the maximum domain and range for the following functions.
(i) $f(x, y) = \sqrt{y - x^2}$.
(ii) $f(x, y) = \frac{1}{xy}$.
(iii) $f(x, y, z) = xy \ln z$.
(iv) $f(x, y, z) = \frac{1}{x^2 + y^2 + z^2}$.

Solution 4.1 ⊔

Given a point c ∈ Rn and a radius r > 0, the set of points whose distance
from c is less than r, denoted as

B(c, r) := {x ∈ Rn : ∥x − c∥ < r}

is called an open ball with radius r > 0 and center c. If c ∈ D and B(c, r) ⊂ D for
some r > 0, then c is termed an interior point of the set D. If c ∉ D and B(c, r) ∩ D = ∅
for some r > 0, then c is termed an exterior point of the set D. Otherwise, c is
termed a boundary point. If every point of a set D ⊂ Rn is an interior point, then D
is termed open. If Dc := {x ∈ Rn : x ̸∈ D} is an open set, then D is termed closed.
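
For the simplest case D = B(c, r), the three kinds of points can be told apart by comparing the distance ∥x − c∥ with r. The Python sketch below illustrates this; the function name `classify` and the sample points are assumptions chosen for illustration, and the classification by distance is a stated fact about the open ball rather than a proof of it.

```python
import math

def classify(x, c, r):
    """Classify x relative to the open ball D = B(c, r) by its distance to c."""
    d = math.dist(x, c)  # Euclidean norm of x - c
    if d < r:
        return "interior"   # some small ball around x stays inside D
    if d > r:
        return "exterior"   # some small ball around x misses D
    return "boundary"       # every ball around x meets both D and its complement

# Assumed sample data: the unit disk centered at the origin of R^2.
c, r = (0.0, 0.0), 1.0
for x in [(0.5, 0.0), (1.0, 0.0), (2.0, 0.0)]:
    print(x, classify(x, c, r))
```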

Problem 4.2. It is crucial to thoroughly understand the mathematical concepts
introduced above. Consider the following problem:
1. Prove that the open ball B(c, r) is an open set.
2. Find the boundary points of the deleted open ball, denoted as D = B(c, r) \ {c}.

Solution 4.2 ⊔

Remark 4.1. (i) The concepts of open sets and closed sets are extremely important
and are utilized in proofs in analysis courses. They are merely introduced in calculus
courses. (ii) The definitions provided here may slightly differ from those in other
textbooks, but their equivalence can be verified.

4.2 Graph, Image, Level Set, and Contours

This section delves into the concepts of graphs, images, level sets, contours, and
their interrelations.

Problem 4.3. What distinguishes a function’s graph from its image (or range)? Can
you differentiate between a curve drawn on a plane representing the graph of a
function and one representing its image? (Hint: Use the vertical line test?)

A curve representing the trajectory of a particle in space is the range of a
single-variable function $\mathbf{r} : [0, T_0] \to \mathbb{R}^3$, which takes values in $\mathbb{R}^3$. The graph of this function
r lies in four-dimensional space, represented as (t, x, y, z) = (t, x(t), y(t), z(t)). However,
since we are accustomed to experiencing three-dimensional space, imagining
objects in four dimensions is challenging. Therefore, let’s start with simpler examples.
 
Problem 4.4. Consider $\mathbf{r}(t) = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}$, a function with domain [0, 2π] and range in
$\mathbb{R}^2$. Draw the graph and image of this function.

Solution 4.4 (Hint. The graph of this function lies in R3 , whereas its trajectory (or
its range) lies in R2 .) ⊔
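
The distinction in the hint can be made concrete by sampling the function. In the Python sketch below, the nine sample parameters and the variable names are assumptions chosen for illustration: each image point is a pair (cos t, sin t) in R2, while each graph point is a triple (t, cos t, sin t) in R3.

```python
import math

def r(t):
    """The curve r(t) = (cos t, sin t) from Problem 4.4."""
    return (math.cos(t), math.sin(t))

# Assumed sampling: nine equally spaced parameters in [0, 2*pi].
ts = [2 * math.pi * k / 8 for k in range(9)]

image_points = [r(t) for t in ts]        # points of the image, in R^2
graph_points = [(t, *r(t)) for t in ts]  # points of the graph, in R^3

for g, p in zip(graph_points, image_points):
    print("graph point:", g, " image point:", p)
```

The first and last image points coincide up to rounding (the circle closes up), whereas the corresponding graph points differ in their first coordinate t, so the graph is a helix-like arc rather than a closed curve.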

Let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. Consider a function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$. The graph of f is
the collection of all points $(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^{n+m}$ satisfying y = f(x), or equivalently, the
set of points $(\mathbf{x}; \mathbf{f}(\mathbf{x})) \in \mathbb{R}^{n+m}$ for $\mathbf{x} \in \mathbb{R}^n$. However, a single sheet of paper only
provides two dimensions. Since we have experience with three dimensions, we can
use our imagination to draw pictures of three-dimensional objects on paper. Among
the graphs of multivariable functions, we can only visualize those where n = 2 and
m = 1, that is, when there are two independent variables and the function yields
scalar values.

Problem 4.5. Let D = (−1, 1) × (−1, 1) be the domain of the function $f(x_1, x_2) = \sqrt{x_1^2 + x_2^2}$. Draw the graph and image.

Solution 4.5 In this example, it can be observed that the image is not particularly
meaningful. ⊔⊓

For $f : \mathbb{R}^n \to \mathbb{R}$, that is, a function that has n independent variables and yields
scalar values, a given value c ∈ R defines the level set of the function f as the set of
points $\mathbf{x} \in \mathbb{R}^n$ where f takes the value c, i.e.,

Level set of the function f with level c: {x ∈ Rn : f (x) = c}.

When c = 0, it is called the zero level set. Generally, if the
domain of the function $f : \mathbb{R}^n \to \mathbb{R}$ is n-dimensional, its level set has dimension
n − 1. The Implicit Function Theorem provides information regarding this. However,
if the assumptions of the theorem do not hold, the level set is not necessarily of
dimension n − 1. The collection of level sets of a function is termed the contour
map of the function.

Problem 4.6. (i) Describe the level set and contour map of the function $f(x, y) = \sqrt{1 - x^2 + y^2}$. (ii) Describe the level set and contour map of the function $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$.

Solution 4.6 ⊔


Problem 4.7. The graph of a function f : Rn → R can be considered as the zero level
set of another function g : Rn+1 → R. (i) Find the function g. (ii) Let f : R2 → R be
defined as f (x, y) = 2x + 3y. Find a function g : R3 → R such that its zero level set
gives the graph of f . Find a vector perpendicular to the graph.

Solution 4.7 ⊔

4.3 Limit and Continuity in Rn

We are already familiar with the definition of limits for functions $f : \mathbb{R} \to \mathbb{R}$ or
$\mathbf{r} : \mathbb{R} \to \mathbb{R}^m$. For functions $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$, the definition of limits is similar.

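Definition 4.1. Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{c} \in \mathbb{R}^n$. A vector $\mathbf{L} \in \mathbb{R}^m$ is called the limit of
f as x → c, denoted by $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$, if for any given ε > 0, there exists δ > 0 such that
∥f(x) − L∥ < ε whenever 0 < ∥x − c∥ < δ. We say f is continuous at c if $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{c})$.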

The definitions of limits and continuity provided above are based on a dynamical
argument. To show a limit or continuity, we need to find a suitable δ > 0 such that
the conditions of the definition are satisfied for any given ε > 0. The possibility
of finding such δ > 0 for any ε > 0 signifies a limit or continuity. Directly defining
continuity without going through limits is also possible, with only slight differences.

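Problem 4.8. Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ and $\mathbf{c} \in \mathbb{R}^n$. Consider the statement: for any ε > 0, there exists δ > 0
such that ∥x − c∥ < δ implies ∥f(x) − f(c)∥ < ε. Show that this statement is equivalent to
saying $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{c})$, i.e., f is continuous at c.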

Solution 4.8 Showing that assertion A is equivalent to assertion B means proving two
things.
(A⇒B)
(A⇐B) ⊔

Several basic rules apply when computing limits. These rules are similar to those
for scalar-valued functions of a single variable, but they require conditions to hold
for vectors.

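Problem 4.9. Let $\mathbf{f}, \mathbf{g} : \mathbb{R}^n \to \mathbb{R}^m$, with $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$ and $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{M}$. Show the
following.
1. $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x})) = \mathbf{L} + \mathbf{M}$.
Some explanation is needed. The meaning of these relationships is that limits can
be taken separately. For example, the first one means:
\[ \lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x})) = \lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) + \lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{L} + \mathbf{M}. \]
2. $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) - \mathbf{g}(\mathbf{x})) = \mathbf{L} - \mathbf{M}$.
3. $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x})\mathbf{g}(\mathbf{x})) = \mathbf{L}\mathbf{M}$.
Some explanation is needed. It is not clear what the multiplication between two
vectors means. First, we need to clarify its meaning. The multiplication could
represent an inner product, an outer product, or a cross product, and all three
cases are valid. That is,
3’. $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x})) = \mathbf{L} \cdot \mathbf{M}$, $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) \otimes \mathbf{g}(\mathbf{x})) = \mathbf{L} \otimes \mathbf{M}$, $\lim_{\mathbf{x}\to\mathbf{c}} (\mathbf{f}(\mathbf{x}) \times \mathbf{g}(\mathbf{x})) = \mathbf{L} \times \mathbf{M}$.
4. $\lim_{\mathbf{x}\to\mathbf{c}} \frac{\mathbf{f}(\mathbf{x})}{\mathbf{g}(\mathbf{x})} = \frac{\mathbf{L}}{\mathbf{M}}$.
Division between two vectors is not defined. Division is only defined when g is
scalar-valued. Even if $g : \mathbb{R}^n \to \mathbb{R}$, its limit must not be zero.
4’. $\lim_{\mathbf{x}\to\mathbf{c}} \frac{\mathbf{f}(\mathbf{x})}{g(\mathbf{x})} = \frac{\mathbf{L}}{M}$, where $g : \mathbb{R}^n \to \mathbb{R}$, and M ≠ 0.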

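Solution 4.9 Let’s prove the first problem. Given ε > 0, we need to find δ > 0 such
that ∥f(x) + g(x) − (L + M)∥ < ε whenever 0 < ∥x − c∥ < δ. Since $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$, there exists $\delta_1 > 0$ such that
$0 < \|\mathbf{x} - \mathbf{c}\| < \delta_1$ implies $\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| < \frac{\varepsilon}{2}$. Similarly, since $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{M}$, there exists
$\delta_2 > 0$ such that $0 < \|\mathbf{x} - \mathbf{c}\| < \delta_2$ implies $\|\mathbf{g}(\mathbf{x}) - \mathbf{M}\| < \frac{\varepsilon}{2}$. Now, let δ be the minimum
of these two, $\min(\delta_1, \delta_2)$. Then, for all 0 < ∥x − c∥ < δ, we have
\[ \|\mathbf{f}(\mathbf{x}) + \mathbf{g}(\mathbf{x}) - (\mathbf{L} + \mathbf{M})\| \le \|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| + \|\mathbf{g}(\mathbf{x}) - \mathbf{M}\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Here, the first inequality is due to the triangle inequality.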
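The second problem is similar, and the third problem can also be approached
similarly. This technique is called the "give and take" method. Ultimately, we obtain
the following result:
\[ \begin{aligned} |\mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x}) - \mathbf{L} \cdot \mathbf{M}| &= |\mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x}) - \mathbf{f}(\mathbf{x}) \cdot \mathbf{M} + \mathbf{f}(\mathbf{x}) \cdot \mathbf{M} - \mathbf{L} \cdot \mathbf{M}| \\ &= |\mathbf{f}(\mathbf{x}) \cdot (\mathbf{g}(\mathbf{x}) - \mathbf{M}) + (\mathbf{f}(\mathbf{x}) - \mathbf{L}) \cdot \mathbf{M}| \\ &\le \|\mathbf{f}(\mathbf{x})\| \ast \|\mathbf{g}(\mathbf{x}) - \mathbf{M}\| + \|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| \ast \|\mathbf{M}\| \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \end{aligned} \]
(In the above notation, ∗ denotes multiplication of real numbers. Multiplication is usually written
without any symbol, but it can be slightly inconvenient to write ∥x∥∥y∥ without
anything in between, so ∗ is inserted. The symbols · and × are avoided because they
are already being used for the inner product and the cross product. The first two equalities are trivial.
The first inequality uses the triangle inequality and the property of the inner product
$|\mathbf{x} \cdot \mathbf{y}| = |\cos\theta|\, \|\mathbf{x}\| \ast \|\mathbf{y}\| \le \|\mathbf{x}\| \ast \|\mathbf{y}\|$. We use the assumptions $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$ and $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{g}(\mathbf{x}) = \mathbf{M}$ to
obtain the second inequality, for which we need to perform the same work as we did
in the first problem. That is, considering $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$ and assuming M ≠ 0, there exists $\delta_1 > 0$ such that
\[ \|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| < \frac{\varepsilon}{2\|\mathbf{M}\|} \quad \text{whenever } 0 < \|\mathbf{x} - \mathbf{c}\| < \delta_1. \]
Note that we replace ε with $\frac{\varepsilon}{2\|\mathbf{M}\|}$ in this expression. Then we get $\|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| \ast \|\mathbf{M}\| \le \frac{\varepsilon}{2}$.
The other half is handled in the same way, after noting that ∥f(x)∥ ≤ ∥L∥ + 1 for all x sufficiently close to c.) ⊔⊓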

Discontinuous multi-variable functions

Discontinuities can arise in various situations, and it is helpful to examine some
examples. Let’s consider only scalar-valued functions $f : \mathbb{R}^n \to \mathbb{R}$ to verify
discontinuities.
Starting with functions of one independent variable, f : R → R, possible discon-
tinuities include:
1. Jump discontinuity: limx→c+ f (x) ̸= limx→c− f (x). This is the most common and
meaningful case.
2. Removable discontinuity: limx→c+ f (x) = limx→c− f (x) ̸= f (c). Physically, this
discontinuity is insignificant. It can be made continuous by assigning a new value
to f (c).
3. Oscillation or blow-up: limx→c+ f (x) or limx→c− f (x) may not exist. As x → c±,
the function value may oscillate or blow up to infinity.
In the case of functions with multiple independent variables, there are more dis-
continuity scenarios. The limit of a function in multiple dimensions is approached
simultaneously from all directions. Approaching from different directions may yield
different values, which implies discontinuity. Moreover, approaching along a curve
may result in yet another limit value. Even in the case of two variables, we can easily
provide examples of functions whose formulas are singular at the origin (x, y) = (0, 0) and which
can be either continuous or discontinuous at the origin.

Problem 4.10. Determine whether the following function is continuous at (x, y) = (0, 0).
\[ f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2}, & \text{if } (x, y) \ne (0, 0) \\ 0, & \text{otherwise.} \end{cases} \]

Solution 4.10 ⊔

Problem 4.11. Examine the continuity of the following function:
\[ f(x, y) = \begin{cases} \dfrac{xy^2}{x^2 + y^4}, & \text{if } (x, y) \ne (0, 0) \\ 0, & \text{otherwise.} \end{cases} \]
(i) Show that f(x, y) converges to 0 as (x, y) approaches the origin (0, 0) in any
direction. (ii) Show that f(x, y) converges to a non-zero value as (x, y) approaches
the origin (0, 0) along the parabola $y = k\sqrt{|x|}$. (iii) What can be concluded about
the continuity of f at the origin (0, 0)?

Solution 4.11 ⊔
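
Before attempting a proof, it can help to look at numbers. The Python sketch below evaluates the function of Problem 4.11 along a straight line and along a parabola as x shrinks; the helper names, the slope m = 2, the coefficient k = 1, and the sample values of x are assumptions chosen for illustration, and the printed values only suggest the limits without proving them.

```python
import math

def f(x, y):
    """The function of Problem 4.11: x*y^2 / (x^2 + y^4), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y**2 / (x**2 + y**4)

def along_line(m, x):
    """Value of f on the straight line y = m*x."""
    return f(x, m * x)

def along_parabola(k, x):
    """Value of f on the curve y = k*sqrt(|x|)."""
    return f(x, k * math.sqrt(abs(x)))

# Assumed sample data: slope m = 2, coefficient k = 1, positive x tending to 0.
for x in (1e-1, 1e-3, 1e-5):
    print(x, along_line(2.0, x), along_parabola(1.0, x))
```

Along the line the values shrink with x, while along the parabola with k = 1 and x > 0 the value stays at 1/2.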

Problem 4.12. Determine whether the following function is continuous at (x, y) = (0, 0).
\[ f(x, y) = \begin{cases} \dfrac{x\sqrt{y}}{x^2 + y}, & \text{if } (x, y) \ne (0, 0) \\ 0, & \text{otherwise.} \end{cases} \]

Solution 4.12 It is important to see the similarity between this example and the
previous two examples. ⊔⊓

Problem 4.13. Test the continuity of the following function at (x, y) = (0, 0) with
ε > 0.
\[ f(x, y) = \begin{cases} \dfrac{xy^{1+\varepsilon}}{x^2 + y^2}, & \text{if } (x, y) \ne (0, 0) \\ 0, & \text{otherwise.} \end{cases} \]

Solution 4.13 ⊔

4.4 Composition of two functions

Composite functions appear frequently. Implicit differentiation amounts to regarding
a function as a composite function whenever needed. The chain rule, the flower
of calculus, is the rule for differentiating composite functions.

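Problem 4.14. Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ and assume $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$. Also, let $\mathbf{h} : \mathbb{R}^m \to \mathbb{R}^\ell$ be
continuous at L. Then, the following holds:
\[ \lim_{\mathbf{x}\to\mathbf{c}} \mathbf{h}(\mathbf{f}(\mathbf{x})) = \mathbf{h}\big(\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x})\big) = \mathbf{h}(\mathbf{L}). \]
This theorem implies that if the outer function h is continuous at the limit
$\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$, then we can move the outer limit inside the function h.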

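Solution 4.14 Let ε > 0 be given. Our goal is to find δ > 0 such that
\[ \|\mathbf{h}(\mathbf{f}(\mathbf{x})) - \mathbf{h}(\mathbf{L})\| < \varepsilon \quad \text{whenever } 0 < \|\mathbf{x} - \mathbf{c}\| < \delta. \]
Since h is continuous at y = L, there exists $\delta_1 > 0$ such that
\[ \|\mathbf{h}(\mathbf{y}) - \mathbf{h}(\mathbf{L})\| < \varepsilon \quad \text{whenever } \|\mathbf{y} - \mathbf{L}\| < \delta_1 \]
holds. Also, since $\lim_{\mathbf{x}\to\mathbf{c}} \mathbf{f}(\mathbf{x}) = \mathbf{L}$, there exists δ > 0 such that
\[ \|\mathbf{f}(\mathbf{x}) - \mathbf{L}\| < \delta_1 \quad \text{whenever } 0 < \|\mathbf{x} - \mathbf{c}\| < \delta. \]
Therefore, for this δ, the first claim holds. ⊔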


Now let’s consider an important example. The power function $h(y) = y^k$ is
continuous for all k ≥ 0. If k < 0, it is discontinuous at y = 0. Therefore, we
have the following:
1. For all cases where k ≥ 0, $\lim_{\mathbf{x}\to\mathbf{c}} (f(\mathbf{x}))^k = (\lim_{\mathbf{x}\to\mathbf{c}} f(\mathbf{x}))^k$.
2. $\lim_{\mathbf{x}\to\mathbf{c}} e^{f(\mathbf{x})} = e^{\lim_{\mathbf{x}\to\mathbf{c}} f(\mathbf{x})}$.
Part II
Linear Functions and Differentiation
Lecture 5
Linear maps and matrix multiplication

The differential of a multivariable function
\[ \mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m, \qquad \mathbf{y} = \mathbf{f}(\mathbf{x}), \quad \mathbf{x} \in \mathbb{R}^n, \ \mathbf{y} \in \mathbb{R}^m \]
at a point $\mathbf{c} \in \mathbb{R}^n$ is a linear function
\[ T : \mathbb{R}^n \to \mathbb{R}^m. \]
The graph of this linear function and the graph of the translated function f(x + c) − f(c)
meet at the origin (see Lecture 8). For a better understanding of multivariable
functions, basic knowledge of linear functions and matrix theory is crucial.
This lecture covers the basics of linear functions and matrix multiplication.
We will follow strict rules for notation rather than using it freely. We will dis-
tinguish between vectors and scalars, as well as between row vectors and column
vectors. We will try to distinguish the notation of matrices and the indices represent-
ing rows and columns as much as possible.

5.1 Linear functions

We represent vectors in bold letters such as x or y. We always consider these vectors
as column vectors, and if $\mathbf{x} \in \mathbb{R}^n$, it is represented as follows:
\[ \mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}. \]

This column vector has n rows and 1 column, which can be viewed as an n × 1
matrix. If a row vector is needed, we take the transpose of the vector as follows:


\[ \mathbf{x}^t = (x_1, \cdots, x_n). \]

This row vector has 1 row and n columns, thus it can be viewed as a 1 × n matrix.
Also, the following notation is used:

x = (x1 ; x2 ; · · · ; xn ),

where the semicolon ‘;’ is a delimiter for changing rows. The basis unit vectors are
represented as follows:

\[ \mathbf{e}_1^t = (1, 0, \cdots, 0), \ \cdots, \ \mathbf{e}_n^t = (0, \cdots, 0, 1). \]

These vectors are orthogonal to each other and have a magnitude of 1.


A function T : Rn → Rm is called linear if it satisfies

T (ax + by) = aT (x) + bT (y) for all x, y ∈ Rn , and a, b ∈ R.

Next, let’s consider some useful facts.

Problem 5.1. If T : Rn → Rm is a linear function, then T (0) = 0.

Solution 5.1 This problem is included as a reminder that T(0) = 0 for all linear functions.
The 0 inside T(0) denotes the zero vector in $\mathbb{R}^n$, while the 0 on the
right-hand side denotes the zero vector in $\mathbb{R}^m$.
Let’s prove it as follows. For any $\mathbf{x} \in \mathbb{R}^n$,
\[ T(\mathbf{0}) = T(0\mathbf{x}) = 0\,T(\mathbf{x}) = \mathbf{0}. \]
Explain what each step means. ⊔


Given N vectors $\mathbf{x}_i \in \mathbb{R}^n$ and N numbers $a_i \in \mathbb{R}$, their sum
\[ \sum_{i=1}^{N} a_i \mathbf{x}_i = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_N\mathbf{x}_N \]

is called a linear combination.

Problem 5.2 (Linearity of arbitrary combinations). If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear
function, then the following holds for any linear combination:
\[ T\Big(\sum_{k=1}^{N} a_k \mathbf{x}_k\Big) = \sum_{k=1}^{N} a_k T(\mathbf{x}_k). \]

(In other words, the value of the function on a linear combination is equal to the
linear combination of the function values.)

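Solution 5.2 We use induction. We already know that the proposition holds for
N = 2, since this is the definition of linearity. Assuming that it holds for N = ℓ − 1, we will prove that it holds for N = ℓ.
Then,
\[ \begin{aligned} T\Big(\sum_{k=1}^{\ell} a_k \mathbf{x}_k\Big) &= T\Big(\sum_{k=1}^{\ell-1} a_k \mathbf{x}_k + a_\ell \mathbf{x}_\ell\Big) = T\Big(\sum_{k=1}^{\ell-1} a_k \mathbf{x}_k\Big) + a_\ell T(\mathbf{x}_\ell) \\ &= \sum_{k=1}^{\ell-1} a_k T(\mathbf{x}_k) + a_\ell T(\mathbf{x}_\ell) = \sum_{k=1}^{\ell} a_k T(\mathbf{x}_k), \end{aligned} \]
where the second equality follows from the definition of linearity, and the third equality follows
from the assumption that it holds for N = ℓ − 1. ⊔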

5.2 Matrix multiplication

Two important examples of matrix multiplication are the inner product and the outer
product (tensor product). The inner product is the product of two column vectors
of the same size. The inner product of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ is denoted by x · y or
⟨x, y⟩. The result of the inner product is a scalar (real number), given by

x · y = ⟨x, y⟩ := x1 y1 + · · · + xn yn .

Let A be an m × n matrix.¹ Matrices are written in a rectangular form as follows:
\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}. \]
A has m rows and n columns. The number of rows is the size of the columns, and
the number of columns is the size of the rows. Each element is a real number.

Remark 5.1 (Index Selection). i) We denote the i-th row vector of the matrix A as $\tilde{\mathbf{a}}_i^t$.
Note that in our notation, $\tilde{\mathbf{a}}_1 \ne \mathbf{a}_1$. The vector $\mathbf{a}_1$ represents the first column of A,
hence $\mathbf{a}_1 \in \mathbb{R}^m$. On the other hand, $\tilde{\mathbf{a}}_1$ represents the transpose of the first row of A,
thus $\tilde{\mathbf{a}}_1 \in \mathbb{R}^n$. ii) In matrix multiplication AB, where A is an m × n matrix and B is
an n × ℓ matrix, the resulting matrix C = AB is of size m × ℓ. We denote $A = (a_{ij})$
and $B = (b_{jk})$. These indices are chosen because j disappears in the product. Choose what
indices to use for C. Your choice should be $C = (c_{ik})$. Thus $c_{ik}$ denotes the entry of C = AB in
the i-th row and k-th column.
1 In linear algebra, people usually consider m × n matrix, not n × m matrix. Hence, m is for the
number of rows and n for columns. This is related to the convention that vector-valued functions
are usually denoted as f : Rn → Rm .

Viewpoint #1

Matrix multiplication can be understood from four viewpoints. When we have
AB = C, the ik-th element of the resulting matrix C is the dot product of the i-th row
vector of A and the k-th column vector of B. In other words,
\[ c_{ik} = (a_{i1}, \cdots, a_{in}) \begin{pmatrix} b_{1k} \\ \vdots \\ b_{nk} \end{pmatrix} = \sum_{j=1}^{n} a_{ij} b_{jk}. \tag{5.1} \]

This first viewpoint might be familiar as it is often used as the definition of matrix
multiplication. The dot product itself can be understood as a matrix multiplication,
where the row vector is a 1 × n matrix and the column vector is an n × 1 matrix,
resulting in a 1 × 1 matrix, i.e., a scalar. The row vector is multiplied on the left, and
the column vector on the right. Written out,
\[ \mathbf{x} \cdot \mathbf{y} = \mathbf{x}^t \mathbf{y} = (x_1 \ \cdots \ x_n) \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \sum_{i=1}^{n} x_i y_i. \]
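
Formula (5.1) translates almost literally into code. The following Python sketch stores matrices as lists of rows; the function name `matmul_entrywise` and the sample matrices are assumptions chosen for illustration.

```python
def matmul_entrywise(A, B):
    """Compute C = AB entry by entry: c_ik = sum over j of a_ij * b_jk, as in (5.1)."""
    m, n, l = len(A), len(B), len(B[0])
    # The number of columns of A must equal the number of rows of B.
    assert all(len(row) == n for row in A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(l)]
            for i in range(m)]

# Assumed sample data: A is 2 x 3 and B is 3 x 2, so C = AB is 2 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [2, 2]]
C = matmul_entrywise(A, B)
print(C)  # [[7, 8], [16, 17]]
```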

Viewpoint #2

Now, let’s introduce the second viewpoint of matrix multiplication. It is a
perspective from the column vectors of the matrix A. If A is of size m × n, it has n columns.
When multiplied on the right by a vector $\mathbf{x} \in \mathbb{R}^n$, the result $A\mathbf{x} \in \mathbb{R}^m$ can be
understood as follows:
\[ A\mathbf{x} = (\mathbf{a}_1, \cdots, \mathbf{a}_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \sum_{j=1}^{n} x_j \mathbf{a}_j \in \mathbb{R}^m. \]
In other words, Ax is a linear combination of the column vectors of A, with
coefficients given by the components of the vector $\mathbf{x} \in \mathbb{R}^n$. Since the column vectors
belong to $\mathbb{R}^m$, the result is a vector in $\mathbb{R}^m$.
Now, let’s multiply by another matrix $B = (\mathbf{b}_1, \cdots, \mathbf{b}_\ell)$. If B is an n × ℓ matrix,
then,
\[ AB = A(\mathbf{b}_1, \cdots, \mathbf{b}_\ell) = (A\mathbf{b}_1, \cdots, A\mathbf{b}_\ell). \tag{5.2} \]
That is, multiplying a matrix with ℓ columns on the right means obtaining ℓ columns,
each of which is a linear combination of the columns of A, with coefficients given by
the components of the corresponding column $\mathbf{b}_k$. This second viewpoint is very useful from the perspective of column operations
or linear mapping.

Problem 5.3. Verify that the ik-th element of (5.2) matches the ik-th element given by (5.1).

Viewpoint #3

The third viewpoint is row operation. This method is often used when solving
systems of equations, especially in Gaussian elimination. It is the dual viewpoint of the
second viewpoint but is also widely used in practice.
Let B be an n × ℓ matrix. To multiply a vector on the left of B, it must be a 1 × n
row vector. If $\mathbf{x} \in \mathbb{R}^n$, then the row vector $\mathbf{x}^t$ is a 1 × n matrix. Then, $\mathbf{x}^t B$ gives a 1 × ℓ
row vector, which is a linear combination of the row vectors of B. It can be written
as follows:
\[ \mathbf{x}^t B = (x_1, \cdots, x_n) \begin{pmatrix} \tilde{\mathbf{b}}_1^t \\ \vdots \\ \tilde{\mathbf{b}}_n^t \end{pmatrix} = \sum_{i=1}^{n} x_i \tilde{\mathbf{b}}_i^t. \]
That is, multiplying an n × ℓ matrix on the left with a 1 × n row vector gives a 1 × ℓ
row vector, which is a linear combination of the row vectors inside B.
Now, let’s multiply B by a matrix A on its left. Then,
\[ AB = \begin{pmatrix} \tilde{\mathbf{a}}_1^t \\ \vdots \\ \tilde{\mathbf{a}}_m^t \end{pmatrix} B = \begin{pmatrix} \tilde{\mathbf{a}}_1^t B \\ \vdots \\ \tilde{\mathbf{a}}_m^t B \end{pmatrix}. \tag{5.3} \]
That is, multiplying a matrix with m rows on the left means obtaining m rows, each
of which is a linear combination of the row vectors of B, with coefficients given by
the components of the corresponding row $\tilde{\mathbf{a}}_i$. This third viewpoint is useful from the perspective of row operations or Gaussian
elimination.

Problem 5.4. Verify that the ik-th element of (5.3) matches the ik-th element given by (5.1).

Let’s review row operations with the following problem.

Problem 5.5 (Row Operation). Let A be a 4 × 3 matrix. To multiply A by a row vector,
we need to multiply a 1 × 4 row vector on the left side of A. Thus, we obtain a 1 × 3 row
vector. Let’s denote the matrix
\[ E = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \end{pmatrix} \]
and consider EA. Explain each row of EA using the rows of A.

Solution 5.5 Since E has 3 rows, EA also has 3 rows. The first row of E creates the
first row of EA, and since the coefficients of the linear combination are (1, 0, 0, 0), it
is identical to the first row of A. The second row of EA is created by the second row
of E, which involves multiplying the first row of A by 2 and subtracting it from the
second row of A. The third row of EA is created by the third row of E, and so on.
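
The row-operation reading of EA can be checked on a concrete matrix. In the Python sketch below, the matrix E is the one from Problem 5.5, while the 4 × 3 matrix A and the helper name `row_combination` are assumptions chosen for illustration.

```python
def row_combination(coeffs, rows):
    """Return the linear combination sum_i coeffs[i] * rows[i] of the given row vectors."""
    width = len(rows[0])
    return [sum(c * row[k] for c, row in zip(coeffs, rows)) for k in range(width)]

# E is the 3 x 4 matrix from Problem 5.5.
E = [[1, 0, 0, 0],
     [-2, 1, 0, 0],
     [1, 2, 1, 0]]
# Assumed sample 4 x 3 matrix A.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]]

# Each row of EA is a combination of the rows of A with coefficients from a row of E.
EA = [row_combination(e_row, A) for e_row in E]
for row in EA:
    print(row)
```

With this A, the second row of EA is the second row of A minus twice the first, and the fourth row of A never contributes because the last column of E is zero.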



Viewpoint #4

The fourth viewpoint uses the tensor product (outer product) perspective. Although
not widely used, the tensor product allows multiplication between arbitrary vectors,
even if they have different dimensions. For example, let x ∈ Rn and y ∈ Rm . The
tensor product, denoted as y ⊗ x, is defined as follows:
\[ \mathbf{y} \otimes \mathbf{x} := \mathbf{y}\mathbf{x}^t = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} (x_1 \ \cdots \ x_n) = \begin{pmatrix} y_1 x_1 & \cdots & y_1 x_n \\ \vdots & & \vdots \\ y_m x_1 & \cdots & y_m x_n \end{pmatrix}. \]
In this case, it is the product of an m × 1 matrix and a 1 × n matrix, resulting in an
m × n matrix. (The cross-product, on the other hand, is the multiplication between
three-dimensional vectors, resulting in another three-dimensional vector, which is
entirely unrelated to matrix multiplication.)

Problem 5.6 (Fourth perspective of matrix multiplication). The first perspective
was based on the inner product. Now, the fourth perspective utilizes the tensor product
or outer product. Explain the matrix multiplication AB as a sum of n outer products.

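Solution 5.6 Let B be an n × ℓ matrix, and let $\tilde{\mathbf{b}}_j^t$ be the row vectors of B. Then,
\[ AB = (\mathbf{a}_1, \cdots, \mathbf{a}_n) \begin{pmatrix} \tilde{\mathbf{b}}_1^t \\ \vdots \\ \tilde{\mathbf{b}}_n^t \end{pmatrix} = \sum_{j=1}^{n} \mathbf{a}_j \tilde{\mathbf{b}}_j^t = \sum_{j=1}^{n} \mathbf{a}_j \otimes \tilde{\mathbf{b}}_j, \]
which is a sum of n outer products. By comparing the ik entry of the right-hand
side with (5.1), we can see that they match. ⊔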
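
The sum of outer products can also be carried out numerically. The Python sketch below is one possible illustration; the helper names `outer`, `add`, and `matmul_outer` and the sample matrices are assumptions, not notation from the notes.

```python
def outer(y, x):
    """Outer product y (x) x = y x^t: an m x n matrix with entries y_i * x_j."""
    return [[yi * xj for xj in x] for yi in y]

def add(M, N):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_m, row_n)] for row_m, row_n in zip(M, N)]

def matmul_outer(A, B):
    """Compute AB as the sum over j of (column j of A) outer (row j of B)."""
    m, n, l = len(A), len(B), len(B[0])
    C = [[0] * l for _ in range(m)]
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]  # a_j, a vector in R^m
        row_j = B[j]                            # the j-th row of B, a vector in R^l
        C = add(C, outer(column_j, row_j))
    return C

# Assumed sample data: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [2, 2]]
C_outer = matmul_outer(A, B)
print(C_outer)  # [[7, 8], [16, 17]]
```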

Lecture 6
Properties of linear mappings

6.1 Matrix multiplication and composition of linear functions

In this section, we demonstrate that linear mappings are equivalent to matrices, and
the composition of linear functions is equivalent to matrix multiplication.

Problem 6.1 (Linear mappings are matrices). Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear
mapping. Show that there exists a unique m × n matrix A satisfying:

T (x) = Ax.

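Solution 6.1 Let $\mathbf{a}_j = T(\mathbf{e}_j) \in \mathbb{R}^m$ where j = 1, · · · , n. Denote $A = (\mathbf{a}_1, \cdots, \mathbf{a}_n)$.
Then A is an m × n matrix. Let $\mathbf{x} \in \mathbb{R}^n$. Then $\mathbf{x} = \sum_{j=1}^{n} x_j \mathbf{e}_j = (x_1; \cdots; x_n)$. By the
linearity property of the mapping, we have:
\[ T(\mathbf{x}) = T\Big(\sum_{j=1}^{n} x_j \mathbf{e}_j\Big) = \sum_{j=1}^{n} x_j T(\mathbf{e}_j) = \sum_{j=1}^{n} x_j \mathbf{a}_j = A\mathbf{x}. \]
The last equality follows from the second perspective of matrix multiplication.
For uniqueness, note that any matrix A with T(x) = Ax must satisfy $A\mathbf{e}_j = T(\mathbf{e}_j)$, so its j-th column is forced to be $\mathbf{a}_j$. ⊔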
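
The construction in this solution, taking the images of the basis vectors as columns, can be sketched in Python. The particular map `T` below is a hypothetical example, and the helper names `matrix_of` and `apply` are likewise assumptions chosen for illustration.

```python
def T(x):
    """A hypothetical linear map from R^3 to R^2, used only as an example."""
    x1, x2, x3 = x
    return [2 * x1 - x3, x1 + 3 * x2]

def matrix_of(T, n):
    """Build the matrix whose j-th column is T(e_j), stored as a list of rows."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]  # j-th standard basis vector
        columns.append(T(e_j))
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = matrix_of(T, 3)
x = [1, -2, 5]
print(A)                 # [[2, 0, -1], [1, 3, 0]]
print(apply(A, x), T(x)) # both give [-3, -5]
```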

Problem 6.2 (Matrices are linear mappings). Given a matrix A ∈ Rm×n , define
T : Rn → Rm by T (x) = Ax. Show that T is a linear mapping.

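Solution 6.2 Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and $a, b \in \mathbb{R}$. Let $\mathbf{c} = a\mathbf{x} + b\mathbf{y}$. Then $c_j = ax_j + by_j$ and
\[ \begin{aligned} T(a\mathbf{x} + b\mathbf{y}) = A\mathbf{c} &= \sum_{j=1}^{n} (ax_j + by_j)\mathbf{a}_j = a\sum_{j=1}^{n} x_j \mathbf{a}_j + b\sum_{j=1}^{n} y_j \mathbf{a}_j \\ &= aA\mathbf{x} + bA\mathbf{y} = aT(\mathbf{x}) + bT(\mathbf{y}). \end{aligned} \]
Therefore, T is a linear mapping. ⊔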


Problem 6.3 (Matrix multiplication is composition of linear functions). Let
$T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^\ell \to \mathbb{R}^n$ be linear mappings given by matrices A and B
respectively.
(1) Show that the composition function T ◦ S : Rℓ → Rm is a linear mapping.
(2) Show that the matrix corresponding to the composition function T ◦ S is the
matrix product AB.

Solution 6.3 (1) is straightforward. (2) demonstrates the relationship between
matrix multiplication and composition of functions. Although it is often used without
much thought, verification is necessary. But what needs to be shown? If C = AB is the
m × ℓ matrix given by the matrix multiplication above, we need to show
\[ C\mathbf{u} = A(B\mathbf{u}) \]
for any $\mathbf{u} \in \mathbb{R}^\ell$. Here, the left-hand side is precisely the linear function defined by
C, while the right-hand side is the composition function T(S(u)). Thus, showing
\[ (AB)\mathbf{u} = A(B\mathbf{u}) \]
demonstrates that the two functions are equal.


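(Method 1) Consider only the i-th element of this vector equation. The i-th
element of the left-hand side is $\sum_{k=1}^{\ell} c_{ik} u_k$. The i-th element of the right-hand side
is $\tilde{\mathbf{a}}_i^t (B\mathbf{u}) = \tilde{\mathbf{a}}_i^t \sum_{k=1}^{\ell} \mathbf{b}_k u_k = \sum_{k=1}^{\ell} \tilde{\mathbf{a}}_i^t \mathbf{b}_k u_k$. Since $c_{ik}$ is defined as $\tilde{\mathbf{a}}_i^t \mathbf{b}_k$, the two are
equal. This method uses the first and third perspectives.
(Method 2) The second perspective is used here. Explain the meaning of
each step and why it holds.
\[ \begin{aligned} A(B\mathbf{u}) &= A((\mathbf{b}_1, \cdots, \mathbf{b}_\ell)\mathbf{u}) \\ &= A(u_1\mathbf{b}_1 + \cdots + u_\ell\mathbf{b}_\ell) \\ &= (u_1 A\mathbf{b}_1 + \cdots + u_\ell A\mathbf{b}_\ell) \\ &= (A\mathbf{b}_1, \cdots, A\mathbf{b}_\ell)\mathbf{u} = (AB)\mathbf{u}. \end{aligned} \]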
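
The identity (AB)u = A(Bu) can be spot-checked on concrete data. In the Python sketch below, the helper names `matvec` and `matmul`, the matrices, and the vector u are assumptions chosen for illustration; a single numerical check is of course not a proof.

```python
def matvec(A, x):
    """Matrix-vector product Ax, with A stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix product AB computed entry by entry."""
    n, l = len(B), len(B[0])
    return [[sum(row[j] * B[j][k] for j in range(n)) for k in range(l)] for row in A]

# Assumed sample data: A is 2 x 3 (T maps R^3 to R^2), B is 3 x 2 (S maps R^2 to R^3).
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[1, 0],
     [0, 1],
     [2, 2]]
u = [3, -1]

lhs = matvec(matmul(A, B), u)  # (AB)u: apply the single matrix AB
rhs = matvec(A, matvec(B, u))  # A(Bu): apply S first, then T
print(lhs, rhs)  # [13, 31] [13, 31]
```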


6.2 Linear function and volume of parallelotopes

Let A be the m × n matrix corresponding to a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$,
i.e.,
\[ T(\mathbf{x}) = A\mathbf{x}, \quad \mathbf{x} \in \mathbb{R}^n. \]
Since a linear transformation T and a matrix A represent the same concept, it is not
necessary to use two different notations, but sometimes it may be visually appealing
to use both. Let us denote an n-dimensional cube with side length ε > 0 as follows:

\[ \Omega_\varepsilon = [0, \varepsilon]^n. \]
We denote the volume of a given set S as ∆S. Then the volume of the above set is
given by:
\[ \Delta(\Omega_\varepsilon) = \varepsilon^n. \]
When ε = 1, we have $\Delta(\Omega_1) = 1$.
We consider parallelepipeds with one vertex at the origin. The cube $\Omega_1$
has all edges of length 1 and is a special parallelepiped in which any two edges meeting at a vertex are
perpendicular to each other. Generally, an n-dimensional parallelepiped is determined by
n linearly independent vectors. These vectors are the n edges of the parallelepiped
connected to the origin. Their lengths, and the angles between pairs of edges, do not
necessarily have to be the same. For n = 2, the parallelepiped consists of $n + n(n - 1) = n^2 = 4$ edges.
For n = 3, it consists of a total of $n + n(n-1) + \frac{n(n-1)(n-2)}{2} = 12$ edges.

Problem 6.4. Let m ≥ n. Describe why the image $T(\Omega_1)$ of a linear transformation
T is an n-dimensional parallelepiped in m-dimensional space. What are the edges of
$T(\Omega_1)$ connected to the origin?

Solution 6.4 Let A be the m × n matrix of the linear transformation T . Then A can
be written as follows.
A = (a1 , a2 , · · · , an ),
where ai are the column vectors of the matrix A. Then the n edges of the cube Ω 1 are
ei , and their images are Aei = ai . In other words, the n edges of the n-dimensional
parallelepiped T (Ω 1 ) connected to the origin are given by the n columns of the ma-
trix A. (Strictly speaking, the condition that ai are linearly independent is necessary.)

The volume of $\Omega_1 \subset \mathbb{R}^n$ is 1. Knowing the volume of $T(\Omega_1) \subset \mathbb{R}^m$ is crucial
when performing integrals. The ratio
$$q = \frac{\Delta T(\Omega_1)}{\Delta \Omega_1}$$
is called the volume expansion rate of the linear function $T$. Due to the properties
of linear functions, it can be shown that $q = \frac{\Delta T(\Omega_\varepsilon)}{\Delta \Omega_\varepsilon}$ for all $\varepsilon > 0$. Furthermore,
$q = \frac{\Delta T(V)}{\Delta V}$ for any set $V \subset \mathbb{R}^n$ of non-zero volume.

6.2.1 Volume of parallelepiped when m = n

To obtain the volume of the n-dimensional parallelepiped T (Ω 1 ) ⊂ Rm , we need to


calculate it in each case. When m = n, i.e., when the matrix A is a square matrix,
the volume is given by the absolute value of the determinant. The determinant of a
matrix A is denoted as det(A) and is defined only for square matrices. The following
is the method of calculating the determinant.

Determinant

The determinant is defined recursively as follows.


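1. If $A = (a)$ is a $1 \times 1$ matrix, then $\det(A) = a$.
2. If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a $2 \times 2$ matrix, then $\det(A) = ad - bc$.
3. If $A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$ is a $3 \times 3$ matrix, then
$$\det(A) = a \det\begin{pmatrix} e & f \\ h & i \end{pmatrix} - b \det\begin{pmatrix} d & f \\ g & i \end{pmatrix} + c \det\begin{pmatrix} d & e \\ g & h \end{pmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg).$$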

Problem 6.5 (Volume of parallelepiped when m = n). Show that the volume of a
parallelepiped formed by edges $a_1, \cdots, a_n$ is equal to the absolute value of the
determinant of the matrix $A$ formed by $a_1, \cdots, a_n$.

Solution 6.5 Let's start with the case when n = 2, with the entries of A named as in
the definition of the determinant. Rotating the edges does not change the area, so we
can position a1 along the x-axis. Then c = 0 and det(A) = ad, whose absolute value is
the base |a| times the height |d|. This matches the area of the parallelogram. For
n = 3, similarly, we rotate the edges so that a1 is aligned with the x-axis. Then
d = g = 0. By rotating the parallelepiped about a1 to position a2 in the xy-plane, we
have h = 0, and det(A) becomes det(A) = aei. While this value may be negative
depending on the signs of a, e, and i, its absolute value matches the volume. For
higher dimensions, we simply remember the formula. □

Problem 6.6. Show the following.


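1. $\Delta(T(\Omega_\varepsilon)) = \varepsilon^n \Delta(T(\Omega_1))$.
2. $\dfrac{\Delta(T(\Omega_\varepsilon))}{\Delta(\Omega_\varepsilon)} = \dfrac{\Delta(T(\Omega_1))}{\Delta(\Omega_1)}$.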

Solution 6.6 ⊔

6.2.2 Volume of parallelepiped when n < m

Given a linear function T : Rn → Rm and its corresponding m × n matrix A =


(a1 , · · · , an ), the image T (Ω 1 ) of the unit cube Ω 1 with each edge being a unit
vector ei ∈ Rn is a parallelepiped in Rm with each edge being ai . Therefore, the
volume expansion rate of T is the volume of this parallelepiped. Below, we find the
volume of this parallelepiped.

Problem 6.7 (Area of parallelogram). When m > 2, show that the area of the
parallelogram in $\mathbb{R}^m$ formed by two edges $a_1, a_2 \in \mathbb{R}^m$ is given by the
following formula:
$$\sqrt{(\|a_1\| \, \|a_2\|)^2 - (a_1 \cdot a_2)^2}. \tag{6.1}$$

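Solution 6.7 Let $\theta$ be the angle between $a_1$ and $a_2$. Firstly,
$$\cos\theta = \frac{a_1 \cdot a_2}{\|a_1\| \, \|a_2\|} \quad\Rightarrow\quad \sin\theta = \sqrt{1 - \left(\frac{a_1 \cdot a_2}{\|a_1\| \, \|a_2\|}\right)^2}.$$

Thus, the area of the parallelogram is
$$\|a_1\| \, \|a_2\| \sin\theta = \sqrt{(\|a_1\| \, \|a_2\|)^2 - (a_1 \cdot a_2)^2}. \qquad \square$$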


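Problem 6.8 (Volume of parallelepiped when n < m). Let n < m. Show that the
volume of the parallelepiped in $\mathbb{R}^m$ formed by n edges $a_1, \cdots, a_n \in \mathbb{R}^m$ is
given by the square root of the following Gram determinant:
$$\|a_1 \times a_2 \times \cdots \times a_n\| = \sqrt{\det(a_i \cdot a_j)} = \begin{vmatrix} a_1 \cdot a_1 & \cdots & a_1 \cdot a_n \\ \vdots & & \vdots \\ a_n \cdot a_1 & \cdots & a_n \cdot a_n \end{vmatrix}^{1/2}. \tag{6.2}$$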

Solution 6.8 Let’s not prove it but just remember the formula. ⊔

Question 6.1. Do the formulas for the area of a parallelogram (6.1) and the Gram
determinant in (6.2) match?
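One way to explore Question 6.1 is to evaluate both formulas on a concrete pair of vectors. The sketch below is only a numerical experiment for n = 2 and m = 3; the function names and the sample vectors are illustrative assumptions, and a single example is of course not a proof.

```python
import math

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def area_formula(a1, a2):
    """Formula (6.1): sqrt((|a1| |a2|)^2 - (a1 . a2)^2)."""
    return math.sqrt(dot(a1, a1) * dot(a2, a2) - dot(a1, a2) ** 2)

def gram_volume_2(a1, a2):
    """Formula (6.2) for n = 2: square root of the 2 x 2 Gram determinant."""
    g11, g12 = dot(a1, a1), dot(a1, a2)
    g21, g22 = dot(a2, a1), dot(a2, a2)
    return math.sqrt(g11 * g22 - g12 * g21)

a1 = [1.0, 2.0, 2.0]
a2 = [0.0, 3.0, 4.0]
print(area_formula(a1, a2), gram_volume_2(a1, a2))   # both equal sqrt(29)
```

Since $\|a_i\|^2 = a_i \cdot a_i$, the $2 \times 2$ Gram determinant expands to exactly the expression under the square root in (6.1).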
Lecture 7
Directional and Partial Differentials

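In this lecture, let $D \subset \mathbb{R}^n$ be an open set, $c \in D$, and $\mathbf{f} : D \subset \mathbb{R}^n \to \mathbb{R}^m$. For such a
function, $y = \mathbf{f}(x)$ means $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, which can be expanded as
$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} f_1(x_1, \cdots, x_n) \\ \vdots \\ f_m(x_1, \cdots, x_n) \end{pmatrix}.$$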

Here, fi : D ⊂ Rn → R is a function taking real values for the i-th component. We


first consider the directional differentiability of the function f at c. Partial differential
is a special case of directional differential. Then, we consider the full differential.

7.1 Directional derivative

First, we consider the directional derivative of scalar-valued functions.

Definition 7.1 (Directional derivatives for f ). We call a vector u ∈ Rn a direction if


it is a unit vector. A number (or a scalar) v ∈ R is called the directional derivative
(or directional differential) of a function f : D ⊂ Rn → R at c ∈ D in the direction u
if for any ε > 0, there exists δ > 0 such that

| f (c + hu) − f (c) − hv| ≤ ε|h| (7.1)

whenever |h| < δ . The directional derivative v is denoted by Du f (c) := v.

Quiz: If the inequality “≤” in (7.1) is changed to the inequality “<”, what needs
to be changed accordingly in the definition?1
1 To exclude the case h = 0, we need to use “whenever 0 < |h| < δ ”. The advantage of putting h on
the right rather than writing the derivative in fractional form is that we can include the case h = 0,
which can be handled when using “≤”.


Even if the function f is a multivariable function, if a direction vector u is given


and we only consider the function f along the line in that direction at a given point
c, we are essentially dealing with a single-variable function (see the figure). The
directional derivative is similar to the derivative of this restricted single-variable
function.
The inequality (7.1) is written slightly differently from the definition of derivative
of a single-variable function, but it is almost the same. Each has its own advantages
and disadvantages.

Problem 7.1. Prove the following using the definition of limits and the given
definition:
$$\lim_{h \to 0} \frac{f(c + hu) - f(c)}{h} = D_u f(c).$$

Solution 7.1 First, let's clarify the meaning of the problem. Although not explicitly
stated in the problem, the problem means that if the directional derivative $D_u f(c)$
exists, then the limit on the left-hand side exists and the two are equal. This
seemingly obvious problem helps us to view the definition from the perspective of
limits. The proof is simple. Let's look at it step by step.
Let $v = D_u f(c)$. Dividing (7.1) by $|h|$, we can write it as follows:
$$\left| \frac{f(c + hu) - f(c)}{h} - v \right| \le \varepsilon \quad \text{for } 0 < |h| < \delta.$$
Therefore,
$$\lim_{h \to 0} \left| \frac{f(c + hu) - f(c)}{h} - v \right| = 0.$$
As taking the absolute value is a continuous function, we can move the limit inside:
$$\left| \lim_{h \to 0} \frac{f(c + hu) - f(c)}{h} - v \right| = 0.$$
If the absolute value is 0, then the content inside is also 0, which is what we want to
show. □

The directional derivative of a vector-valued function f : D ⊂ Rn → Rm is the same


as that of a scalar-valued function, with the only difference being that the derivative
is a vector in Rm .

Definition 7.2 (Directional derivatives for f). A vector v ∈ Rm is called the direc-
tional derivative (or directional differential) of a vector-valued function f : D ⊂
Rn → Rm at c ∈ D in the direction u if for any ε > 0, there exists δ > 0 such that

∥f(c + hu) − f(c) − hv∥ ≤ ε|h| (7.2)

whenever |h| < δ . The directional derivative v is denoted by Du f(c) := v.

The directional derivative of scalar-valued functions is a special case when m = 1.


The case of vector-valued functions also allows for taking the directional derivative
without additional difficulty.

Problem 7.2. Show that a vector $v \in \mathbb{R}^m$ is the directional derivative of $\mathbf{f}$ at $c \in \mathbb{R}^n$
in the direction $u \in \mathbb{R}^n$ if and only if
$$\lim_{h \to 0} \frac{\mathbf{f}(c + hu) - \mathbf{f}(c)}{h} = v. \tag{7.3}$$

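Solution 7.2 (⇒) Let $\varepsilon > 0$ be given. We take $\delta$ as in (7.2). Then,
$$\left\| \frac{\mathbf{f}(c + hu) - \mathbf{f}(c)}{h} - v \right\| = \frac{\|\mathbf{f}(c + hu) - \mathbf{f}(c) - hv\|}{|h|} \le \frac{\varepsilon |h|}{|h|} = \varepsilon$$
whenever $0 < |h| < \delta$. Hence, by definition, (7.3) holds.

(⇐) Multiplying the above inequality by $|h|$ gives (7.2) for $0 < |h| < \delta$, and (7.2) holds trivially when $h = 0$. □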

7.2 Partial derivative

The partial derivative is a special case of the directional derivative where u = ei


(i = 1, · · · , n), which is convenient for calculating values.

Definition 7.3 (Partial derivatives for f). A vector $v \in \mathbb{R}^m$ is called the partial
derivative (or partial differential) of a vector-valued function $\mathbf{f} : D \subset \mathbb{R}^n \to \mathbb{R}^m$
at $c \in D$ with respect to $x_i$, for $i = 1, \cdots, n$, if for any $\varepsilon > 0$, there exists $\delta > 0$ such that

$$\|\mathbf{f}(c + he_i) - \mathbf{f}(c) - hv\| \le \varepsilon |h| \tag{7.4}$$

whenever $|h| < \delta$. In other words, $v = D_{e_i} \mathbf{f}(c)$. We denote it by $D_i \mathbf{f}(c) := D_{e_i} \mathbf{f}(c)$.

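Various notations are used for partial derivatives. If the independent variables
$(x, y, z)$ are used in $\mathbb{R}^3$, partial derivatives can be denoted by $D_x \mathbf{f}$, $D_y \mathbf{f}$, $D_z \mathbf{f}$. If the
limit exists, partial derivatives can also be denoted as follows:
$$\begin{aligned}
\mathbf{f}_x &:= \frac{\partial \mathbf{f}}{\partial x} := \lim_{h \to 0} \frac{\mathbf{f}(x + h, y, z) - \mathbf{f}(x, y, z)}{h}, \\
\mathbf{f}_y &:= \frac{\partial \mathbf{f}}{\partial y} := \lim_{h \to 0} \frac{\mathbf{f}(x, y + h, z) - \mathbf{f}(x, y, z)}{h}, \\
\mathbf{f}_z &:= \frac{\partial \mathbf{f}}{\partial z} := \lim_{h \to 0} \frac{\mathbf{f}(x, y, z + h) - \mathbf{f}(x, y, z)}{h}.
\end{aligned}$$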
Problem 7.3. Find the partial derivatives of the following two-variable functions.
(1) $f(x, y) = x^2 + 3xy + y - 1$. (2) $f(x, y) = y \sin xy$. (3) $f(x, y) = \dfrac{2y}{y + \cos x}$.

Solution 7.3 Let's find $D_1 f(x, y)$ for (1). In this case, consider x as the only variable
and treat the rest as constants to calculate the derivative of a single-variable function.
Then,
$$D_1 f(x, y) = 2x + 3y.$$
But is this consistent with the definition? Let's verify using (7.3). In this case, $u = e_1$
and $c = \mathbf{x} = (x, y)^t$, so
$$\begin{aligned}
\lim_{h \to 0} \frac{f(\mathbf{x} + he_1) - f(\mathbf{x})}{h}
&= \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \\
&= \lim_{h \to 0} \frac{(x + h)^2 + 3(x + h)y + y - 1 - (x^2 + 3xy + y - 1)}{h} \\
&= \lim_{h \to 0} \frac{(x + h)^2 + 3(x + h)y - (x^2 + 3xy)}{h} \\
&= \lim_{h \to 0} \frac{2hx + h^2 + 3hy}{h} \\
&= \lim_{h \to 0} (2x + h + 3y) = 2x + 3y.
\end{aligned}$$

Observe why we differentiate only x and treat other variables as constants when
finding $D_1 f$ in the calculation above. □
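The computation in Solution 7.3 can also be watched numerically. The sketch below evaluates the difference quotient for function (1) at a sample point; the helper name `partial_x`, the point, and the step sizes are illustrative choices. For this polynomial the quotient equals $2x + h + 3y$ exactly, so it approaches $2x + 3y$ as $h$ shrinks.

```python
def f(x, y):
    """Function (1) of Problem 7.3."""
    return x**2 + 3*x*y + y - 1

def partial_x(func, x, y, h):
    """Difference quotient (f(x + h, y) - f(x, y)) / h with y held fixed."""
    return (func(x + h, y) - func(x, y)) / h

x, y = 1.5, -2.0
exact = 2*x + 3*y                     # the value 2x + 3y found above
for h in (1e-1, 1e-2, 1e-3):
    print(h, partial_x(f, x, y, h), exact)
```

At the point $(1.5, -2)$ the exact value is $-3$, and each printed quotient differs from it by the step size $h$, up to rounding.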

Implicit differentiation can also be done partially.

Problem 7.4. Find the partial derivative of the function f = f (x, y) when implicitly
given as follows.
y f − ln f = x + y.

Solution 7.4 ⊔

Here, we have only computed partial derivatives. Later, we will consider formulas
for computing directional derivatives using partial derivatives.

Continuity and directional differentiability

Question 7.1. Suppose that the function f has directional derivatives in all directions
at c ∈ D. Does this imply that f is continuous at c?

The answer to this question is "YES" if n = 1, that is, if $\mathbf{f} : \mathbb{R}^1 \to \mathbb{R}^m$. If n = 1,
there is only one direction up to sign, and directional differentiability is the same
as differentiability. However, when n > 1, there are infinitely many directions. Let us
assume that $D_u \mathbf{f}(c)$ exists in all directions u. Then $\mathbf{f}$, restricted to any line passing
through c, is continuous at c. So does this mean that $\mathbf{f}$ is continuous at c? The answer
is "NO". Let's look at the following example:

Problem 7.5. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ defined as follows,
$$f(x) = \frac{x_1 x_2^2}{x_1^2 + x_2^4} \ \text{ if } x \ne 0, \quad \text{and} \quad f(0) = 0.$$

Show that while f has directional derivatives at c = 0 in all directions, it is not continuous there.

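Solution 7.5 Let $u = (u_1, u_2)$ be a direction (i.e., a unit vector). If $u_1 \ne 0$, then
$$D_u f(0) = \lim_{h \to 0} \frac{f(hu) - f(0)}{h} = \lim_{h \to 0} \frac{h^3 u_1 u_2^2}{h^3 (u_1^2 + h^2 u_2^4)} = \frac{u_2^2}{u_1}.$$
If $u_1 = 0$, then $f(hu) = 0$ for all $h \ne 0$, so $D_u f(0) = 0$.
Therefore, f is directionally differentiable in all directions. However,
$$\lim_{t \to 0^+} f(t, \sqrt{t}) = \lim_{t \to 0^+} \frac{t^2}{t^2 + t^2} = \frac{1}{2} \ne 0 = f(0).$$
Hence, f is discontinuous at 0. □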
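The two limits in Solution 7.5 can be observed numerically. This is a small sketch, with an arbitrarily chosen unit vector `u = (0.6, 0.8)` and illustrative step sizes: the difference quotients along the line through the origin settle near $u_2^2/u_1$, while the values along the curve $(t, \sqrt{t})$ stay at $1/2$ instead of approaching $f(0) = 0$.

```python
import math

def f(x1, x2):
    """The function of Problem 7.5."""
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2**2 / (x1**2 + x2**4)

def quotient(u, h):
    """Difference quotient (f(hu) - f(0)) / h in the direction u."""
    return (f(h * u[0], h * u[1]) - f(0.0, 0.0)) / h

u = (0.6, 0.8)                       # a unit vector: 0.36 + 0.64 = 1
for h in (1e-1, 1e-3, 1e-5):
    print("quotient:", quotient(u, h), "limit:", u[1]**2 / u[0])

for t in (1e-2, 1e-4, 1e-6):
    print("f(t, sqrt(t)) =", f(t, math.sqrt(t)))   # stays near 0.5
```

Approaching the origin along straight lines and along the parabola-like curve gives different behavior, which is exactly why directional differentiability does not force continuity here.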

7.3 Gradient

If partial derivatives exist for all component functions $f_i$, $i = 1, \cdots, m$, we can define
and express the gradient of the vector-valued function $\mathbf{f}$ as follows:
$$\nabla \mathbf{f}(c) := (D_1 \mathbf{f}(c), \cdots, D_n \mathbf{f}(c)) = \begin{pmatrix} D_1 f_1(c) & \cdots & D_n f_1(c) \\ \vdots & & \vdots \\ D_1 f_m(c) & \cdots & D_n f_m(c) \end{pmatrix}.$$

Each component function fi is a scalar-valued function, and its gradient is denoted


as a row vector;
∇ fi (c) = (D1 fi (c), · · · , Dn fi (c))1×n .

From now on, we will define the gradient of a scalar function ∇ f as a row vector.
This is the only case in this lecture where row vectors are used. Since vector-valued
functions have column vector values, it is natural to apply ∇ to vector-valued func-
tions. (Most textbooks do not make this distinction clear. It is convenient to make
this distinction clear when writing.) In conclusion, we can write as follows.
   
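$$\mathbf{f}(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix}_{m \times 1} \implies \nabla \mathbf{f}(c) := \begin{pmatrix} \nabla f_1(c) \\ \vdots \\ \nabla f_m(c) \end{pmatrix}_{m \times n}.$$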

Therefore, ∇f(c) is an m × n matrix.
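To make the shape of the gradient matrix concrete, here is a minimal sketch that approximates it column by column with difference quotients. The function name `jacobian`, the sample map from $\mathbb{R}^3$ to $\mathbb{R}^2$, and the step size are assumptions made for illustration only.

```python
import math

def jacobian(func, c, h=1e-6):
    """Approximate the m x n matrix whose j-th column is D_j f(c)."""
    fc = func(c)
    m, n = len(fc), len(c)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        shifted = list(c)
        shifted[j] += h                      # move only along e_j
        fj = func(shifted)
        for i in range(m):
            J[i][j] = (fj[i] - fc[i]) / h    # approximates D_j f_i(c)
    return J

def f(x):
    """A sample map from R^3 to R^2 (illustrative)."""
    return [x[0] * x[1], math.sin(x[0]) + x[2] ** 2]

c = [1.0, 2.0, 3.0]
J = jacobian(f, c)
for row in J:
    print(row)       # row i approximates the row vector (D_1 f_i, D_2 f_i, D_3 f_i)
```

For this sample map the exact matrix at $c$ has rows $(2, 1, 0)$ and $(\cos 1, 0, 6)$, so the result has $m = 2$ rows and $n = 3$ columns.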

7.4 Higher order partial differentials

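If partial derivatives are themselves partially differentiable, we can consider higher-order
partial differentials. For example, $\mathbf{f}_{xx}(c) = D_x^2 \mathbf{f}(c) = \frac{\partial^2 \mathbf{f}}{\partial x^2}(c)$ is the partial differential
of $\mathbf{f}_x$ with respect to x. We express higher-order partial differentials as follows:
$$\begin{aligned}
\mathbf{f}_{xx}(c) &= D_x^2 \mathbf{f}(c) = \frac{\partial^2 \mathbf{f}}{\partial x^2}(c), & \mathbf{f}_{yx}(c) &= D_x(D_y \mathbf{f})(c) = \frac{\partial^2 \mathbf{f}}{\partial x \partial y}(c), \\
\mathbf{f}_{yy}(c) &= D_y^2 \mathbf{f}(c) = \frac{\partial^2 \mathbf{f}}{\partial y^2}(c), & \mathbf{f}_{xy}(c) &= D_y(D_x \mathbf{f})(c) = \frac{\partial^2 \mathbf{f}}{\partial y \partial x}(c).
\end{aligned}$$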

In general, $\mathbf{f}_{xy}$ and $\mathbf{f}_{yx}$ need not be equal. However, if these mixed partial derivatives
are continuous, they are equal.

7.5 Partial Derivatives with Constrained Variables

Communication skills are important everywhere, and of course, in mathematics too.


Often, it is useful to think about the meaning of sentences. Let’s consider some
problems.

Problem 7.6. Given $w = x^2 + y^2 + z^2$, find $\frac{\partial w}{\partial x}$.

When people ask such questions, they do not always specify everything. When we
read such problems, we need to understand the meaning in a reasonable way. In this
problem, w is considered as a function of three variables x, y, z. The person asking the
question would generally consider these three variables as independent variables.
Therefore, we should treat y and z as variables independent of x and compute $\frac{\partial w}{\partial x}$.
Thus, the answer should be as follows.

Solution 7.6 Differentiating with respect to x while holding y and z fixed, we obtain:
$$\frac{\partial w}{\partial x} = 2x + 0 + 0 = 2x.$$
If we want to make the meaning of the answer clear, we can represent it as follows:
$$\left. \frac{\partial w}{\partial x} \right|_{x,y,z} = 2x.$$

Here, we temporarily create the notation $\left. \frac{\partial w}{\partial x} \right|_{x,y,z}$, which means the partial derivative
of w with respect to x when x, y, z are considered independent variables. □

Problem 7.7. Given $w = x^2 + y^2 + z^2$, find $\frac{\partial y}{\partial w}$.

(Huh?) In this problem, we are asked to compute $\frac{\partial y}{\partial w}$. This means that y is considered
as a function (or dependent variable) and w is considered as an independent
variable. Typically, when one equation is given, one of w, x, y, z is considered as the
dependent variable and the rest as independent variables. Therefore, if we consider
w, x and z as independent variables, we obtain the following answer.

Solution 7.7 Differentiating the relation with respect to w, we get:
$$1 = 0 + 2y \frac{\partial y}{\partial w} + 0.$$
Therefore, $\left. \frac{\partial y}{\partial w} \right|_{x,z,w} = \frac{1}{2y}$. □

Next, let’s consider the case where there are two relations and four variables. In
this case, we need to clarify the meaning.

Problem 7.8. Given $w = x^2 + y^2 + z^2$ and $z = x^2 + y^2$, find $\frac{\partial w}{\partial x}$.

The meaning of this problem is confusing. Since there are two relations among the
four variables, we can consider two variables as dependent and the other two as
independent variables. So, which ones should we choose? Asking to compute $\frac{\partial w}{\partial x}$
implies considering x as an independent variable and w as a dependent variable.
The problem should also have specified which of y and z is to be considered as the
second independent variable. Below, we compute the answer for both choices.

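Solution 7.8 (i) First, let's find the answer when x and z are considered independent
variables. Then, differentiating the two relations with respect to x, we get:
$$\frac{\partial w}{\partial x} = 2x + 2y \frac{\partial y}{\partial x} + 0, \qquad 0 - 2x - 2y \frac{\partial y}{\partial x} = 0.$$
Therefore,
$$\frac{\partial y}{\partial x} = -\frac{x}{y} \quad \text{and} \quad \left. \frac{\partial w}{\partial x} \right|_{x,z} = 2x - 2y \frac{x}{y} = 0.$$
(ii) Now, let's consider the case when x and y are independent variables. Differentiating
the two relations with respect to x, we get:
$$\frac{\partial w}{\partial x} = 2x + 0 + 2z \frac{\partial z}{\partial x}, \qquad \frac{\partial z}{\partial x} - 2x - 0 = 0.$$
Therefore,
$$\frac{\partial z}{\partial x} = 2x \quad \text{and} \quad \left. \frac{\partial w}{\partial x} \right|_{x,y} = 2x + 2z \cdot 2x = 2x + 4zx.$$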

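The key point to remember in this example is that
$$\left. \frac{\partial w}{\partial x} \right|_{x,y} \ne \left. \frac{\partial w}{\partial x} \right|_{x,z}.$$

That is, the value of the partial derivative $\frac{\partial w}{\partial x}$ depends on which variables we choose as
the other independent variables. Changing x while holding y fixed is a different
operation from changing x while holding z fixed, so it is natural that the results differ.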
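The dependence on the choice of independent variables can be checked numerically. The sketch below assumes the constraint $z = x^2 + y^2$, which is the relation that the differentiated equations in Solution 7.8 correspond to, and it assumes the branch $y > 0$ when solving for y. The function names, sample point, and step size are illustrative.

```python
import math

def w_from_xz(x, z):
    """w as a function of the independent pair (x, z); y is solved from z = x^2 + y^2."""
    y = math.sqrt(z - x**2)          # assumed branch y > 0
    return x**2 + y**2 + z**2

def w_from_xy(x, y):
    """w as a function of the independent pair (x, y); z is given by z = x^2 + y^2."""
    z = x**2 + y**2
    return x**2 + y**2 + z**2

def central_diff(func, x, other, h=1e-5):
    """Central difference in the first argument, holding the second argument fixed."""
    return (func(x + h, other) - func(x - h, other)) / (2 * h)

x, y = 1.0, 2.0
z = x**2 + y**2                              # z = 5 at this point
dw_dx_xz = central_diff(w_from_xz, x, z)     # case (i): expected 0
dw_dx_xy = central_diff(w_from_xy, x, y)     # case (ii): expected 2x + 4zx = 22
print(dw_dx_xz, dw_dx_xy)
```

With z held fixed, $w = z + z^2$ does not change with x at all, whereas with y held fixed, moving x also moves z, and this accounts for the extra term $4zx$.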
Finally, let’s consider the case of three relations and four variables.

Problem 7.9. Given $w = x^2 + y^2 + z^2$, $z - x^2 - y^2 = 0$, and $x^2 + y^2 = 1$, find $\frac{\partial w}{\partial x}$.

Since there are three relations, we can consider three variables as dependent and
the remaining one as the independent variable. If we are asked to compute $\frac{\partial w}{\partial x}$,
it implies considering x as the independent variable. Then, it would be better to
represent it as a total derivative, $\frac{dw}{dx}$. Or, more simply, we can denote it as $w'$. Other
derivatives can also be denoted as $y'$ and $z'$. Let's compute it.

Solution 7.9 Implicitly differentiating the three relations with respect to x, we obtain:
$$w' = 2x + 2yy' + 2zz', \qquad z' - 2x - 2yy' = 0, \qquad 2x + 2yy' = 0.$$
Therefore, $y' = -\frac{x}{y}$, $z' = 2x - 2y \frac{x}{y} = 0$, and
$$w' = 2x + 2yy' + 2zz' = 2x - 2y \frac{x}{y} + 0 = 0.$$
Thus, w is a constant. (Of course, we can show that $w = 2$ by substituting $x^2 + y^2 = 1$
into the relations, but this problem is meant to practice implicit differentiation.) □

Lecture 8
Full Differentials

8.1 Full Differentials

The (full) differential of a function f : D ⊂ Rn → Rm is its linear approximation


at a given point. More precisely, the (full) differential of a vector-valued function
f : D ⊂ Rn → Rm at a point c ∈ D is a linear function T : Rn → Rm such that if ∥x−c∥
is small, then f(c) + T (x − c) approximates f(x) well up to the order o(∥x − c∥) as
∥x − c∥ → 0. We define it using ε and δ as follows.

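Definition 8.1 (Full Differentials). We say $\mathbf{f} : D \subset \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at
$c \in D$ if there exists a linear function $T : \mathbb{R}^n \to \mathbb{R}^m$ such that for any $\varepsilon > 0$, there
exists $\delta > 0$ such that

$$\|\mathbf{f}(x) - \mathbf{f}(c) - T(x - c)\| \le \varepsilon \|x - c\| \quad \text{whenever } \|x - c\| < \delta. \tag{8.1}$$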

We call the linear function T (or the corresponding matrix A) the (full) differential
of f at c.

In the above definition, if $\|x - c\|$ is sufficiently small, we can observe that $\mathbf{f}(x)$ is
well approximated by $\mathbf{f}(c) + T(x - c)$. That is,
$$\mathbf{f}(x) \cong \mathbf{f}(c) + T(x - c). \tag{8.2}$$

Equation (8.1) indicates that their difference converges to 0 as $x \to c$. In fact, not
only the difference but also the ratio
$$\frac{\|\mathbf{f}(x) - \mathbf{f}(c) - T(x - c)\|}{\|x - c\|} \le \varepsilon$$
can be made sufficiently small, becoming less than $\varepsilon$ as $\|x - c\|$ becomes sufficiently
small.
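The statement that the ratio itself becomes small can be seen on an example. The following sketch uses an illustrative map from $\mathbb{R}^2$ to $\mathbb{R}^2$ whose gradient matrix at $c = (1, 2)$ is computed by hand; the map, the direction `d`, and the step sizes are assumptions made only for this demonstration.

```python
import math

def f(x, y):
    """A sample differentiable map from R^2 to R^2 (illustrative)."""
    return [x * x * y, x + math.sin(y)]

c = (1.0, 2.0)
# Gradient matrix of f at c, computed by hand from the partial derivatives.
A = [[2 * c[0] * c[1], c[0] ** 2],
     [1.0, math.cos(c[1])]]

def ratio(t, d=(0.6, 0.8)):
    """||f(x) - f(c) - A(x - c)|| / ||x - c|| for x = c + t d."""
    dx = (t * d[0], t * d[1])
    fx = f(c[0] + dx[0], c[1] + dx[1])
    fc = f(*c)
    lin = [A[i][0] * dx[0] + A[i][1] * dx[1] for i in range(2)]
    err = [fx[i] - fc[i] - lin[i] for i in range(2)]
    return math.hypot(*err) / math.hypot(*dx)

ratios = [ratio(t) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)     # the ratio shrinks roughly in proportion to ||x - c||
```

The error of the linear approximation is therefore much smaller than the distance $\|x - c\|$ itself, which is what the inequality (8.1) expresses.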


If a full differential exists, it is natural to expect that all partial differentials
exist. Indeed, directional differentials exist in all directions.

Problem 8.1. Show that if f : D ⊂ Rn → Rm is differentiable at c ∈ D, it is direc-


tionally differentiable in any direction u ∈ Rn .

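Solution 8.1 To prove this, we need to choose $v \in \mathbb{R}^m$. As expected, we take

$$v = Tu.$$

Then, taking $x = c + hu$ in (8.1), we have $\|x - c\| = |h|$ and $T(x - c) = hTu = hv$, so
(8.1) becomes the condition (7.2) in the definition of the directional differential. □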

Question 8.1. We said that the differential T is a linear function. What does that
mean? It means that T is given by a matrix A = (ai j ). What is that matrix?

The answer is in the following theorem.

Theorem 8.1. Let D ⊂ Rn be an open set, f : D → Rm be a differentiable function at


c ∈ D, and let A = (ai j ) be the (full) differential. Then, the j-th column of the matrix
A, denoted by a j , is D j f(c), i.e., A = ∇f(c).

Proof. T (e j ) becomes the j-th column of the matrix. And T (e j ) is the directional
derivative of f in the e j direction at c. Therefore, it becomes D j f(c). ⊔

For a linear function T , we write it as follows:

T (x − c) = ∇f(c)(x − c).

Here, x − c is a vector in Rn , and ∇f(c) is an m × n matrix. Therefore, matrix multi-


plication is well-defined, and the result is a vector in Rm .

Question 8.2. Does the converse of the theorem hold? In other words, if all components
$f_i$ of $\mathbf{f}$ have partial derivatives, and thus $\nabla \mathbf{f}(c)$ exists, is $\mathbf{f}$ differentiable at c?

The answer is "no." In other words, even if the gradient matrix $\nabla \mathbf{f}(c)$ exists, the
relation (8.1) may not hold. So, can we find a counterexample? And what additional
conditions are needed for the converse of the theorem to hold?

Theorem 8.2. Let $D \subset \mathbb{R}^n$ be an open set, $c \in D$, and $\mathbf{f} : D \to \mathbb{R}^m$. Suppose that
for all $j = 1, \cdots, n$, the partial derivative $D_j \mathbf{f}(x)$ exists for all x in an open ball
$B(c, r)$ with $r > 0$ and is continuous at c. Then, $\mathbf{f}$ is differentiable at c.

The directional differentiability at a single point c does not imply differentiability


at that point. To obtain differentiability at c ∈ D, partial derivatives must exist at
every point in a neighborhood of c, and they must be continuous at c.

8.2 The Chain Rule; Differential of Compositions

The chain rule is the rule for differentiating composite functions, and the explanation
for it is shown in the above figure. Theorem 8.3 is the related theorem. Proving it is
important, but understanding each part of the above figure clearly is no less
important. Try to understand what the figure is saying by yourself first, and then
compare it with the following explanation. In the figure, a function
$\mathbf{g} : \mathbb{R}^\ell \to \mathbb{R}^n$ is given. The function $\mathbf{g}$ maps its domain $\mathbb{R}^\ell$ into its codomain $\mathbb{R}^n$,
which is the domain of the function $\mathbf{f}$. Assume that the first function $\mathbf{g}$ is differentiable
at $c \in \mathbb{R}^\ell$. Then, the derivative $\nabla \mathbf{g}(c)$ (or the linear function H) is an $n \times \ell$ matrix and
is a linear approximation of $\mathbf{g}$ at the point c. The second function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ is
fortunately also differentiable at the point $\mathbf{g}(c) \in \mathbb{R}^n$. Then, the derivative $\nabla \mathbf{f}(\mathbf{g}(c))$ is
an $m \times n$ matrix (or the linear function T), and it is a linear approximation of $\mathbf{f}$ at
the point $\mathbf{g}(c)$.
The composite function $\mathbf{f} \circ \mathbf{g}$ is defined on $\mathbb{R}^\ell$ and takes values in $\mathbb{R}^m$, and the chain
rule states that this composite function is differentiable at c. To prove this,
we need to find a linear function approximating the composite function $\mathbf{f} \circ \mathbf{g}$ at the
point c. What could it be? Naturally, it is the composition $T \circ H$ of the two linear
functions. Expressed as matrices, it is given by the product of the two matrices as
in Equation (8.3). The following is the essence of the differentiation formula: the
Chain Rule.

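Theorem 8.3 (The Chain Rule). Let $D \subset \mathbb{R}^\ell$ be an open set, $c \in D$, and $\mathbf{g} : D \subset \mathbb{R}^\ell \to \mathbb{R}^n$
be differentiable at c. And let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable at $\mathbf{g}(c) \in \mathbb{R}^n$.
Then, the composite function $\mathbf{f} \circ \mathbf{g}$ is differentiable at c, and its derivative $\nabla(\mathbf{f} \circ \mathbf{g})(c)$
is as follows:
$$\nabla(\mathbf{f} \circ \mathbf{g})(c) = \nabla \mathbf{f}(\mathbf{g}(c)) \, \nabla \mathbf{g}(c) \equiv \nabla \mathbf{f} \big|_{\mathbf{g}(c)} \, \nabla \mathbf{g} \big|_{c}. \tag{8.3}$$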

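The right-hand side of Equation (8.3) is a matrix multiplication of sizes $(m \times n)(n \times \ell)$,
resulting in an $m \times \ell$ matrix. It is clearer to write $\nabla \mathbf{f} \big|_{\mathbf{g}(c)}$ instead of $\nabla \mathbf{f}(\mathbf{g}(c))$.
Since $c \in \mathbb{R}^\ell$, $\nabla \mathbf{f} \big|_{c}$ has no meaning because c does not belong to the domain of $\mathbf{f}$.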

Proof (Proof of The Chain Rule). The logic of the proof is similar to proving the
continuity of the composition of two continuous functions (Problem 4.14). We omit
it. ⊔
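Although the proof is omitted, the chain rule formula can be tested numerically on an example with $\ell = 2$, $n = 3$, $m = 2$. The sketch below is an illustration only: the maps `g` and `f`, their hand-computed gradient matrices, and the helper names are assumptions, and the composite is differentiated by central differences for comparison.

```python
def g(p):
    s, t = p
    return [s * t, s + t, s * s]            # g : R^2 -> R^3

def f(q):
    x, y, z = q
    return [x + y * z, x * x]               # f : R^3 -> R^2

def grad_g(p):
    """3 x 2 matrix of partial derivatives of g, computed by hand."""
    s, t = p
    return [[t, s], [1.0, 1.0], [2 * s, 0.0]]

def grad_f(q):
    """2 x 3 matrix of partial derivatives of f, computed by hand."""
    x, y, z = q
    return [[1.0, z, y], [2 * x, 0.0, 0.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_grad(F, p, h=1e-6):
    """Central-difference approximation of the gradient matrix of F at p."""
    m = len(F(p))
    J = [[0.0] * len(p) for _ in range(m)]
    for j in range(len(p)):
        up, down = list(p), list(p)
        up[j] += h
        down[j] -= h
        Fu, Fd = F(up), F(down)
        for i in range(m):
            J[i][j] = (Fu[i] - Fd[i]) / (2 * h)
    return J

c = [1.0, 2.0]
product = mat_mul(grad_f(g(c)), grad_g(c))       # (m x n)(n x l) product at g(c) and c
direct = numeric_grad(lambda p: f(g(p)), c)      # gradient of the composite, numerically
print(product)
print(direct)
```

Note that `grad_f` is evaluated at `g(c)`, not at `c`, which lies in a different space. Both printed matrices are $2 \times 2$, and the product equals $\begin{pmatrix} 9 & 2 \\ 8 & 4 \end{pmatrix}$.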

Theorem 8.3 is the most general form of the chain rule. In the next section, we
consider three cases where the dimension of the starting space Rℓ is ℓ = 1, 2, 3.
The dimension of the intermediate space Rn is not so important, so we fix n = 3. We
consider the case where the destination space Rm is m = 1 so that we can handle each
component fi separately. Although the problem considers vector-valued functions f,
in the explanation, we consider the case where m = 1 for simplicity of notation,
considering only one component of f.

8.3 Graph of Differentials

In this section, we will examine the relationship between the graph of a function
f and the graph of its differential T . We can clearly see the graph of a function
f : Rn → Rm only when n + m ≤ 3, so we can explicitly draw figures for the three
cases of (m, n) = (1, 1), (1, 2), and (2, 1). Still, when the dimension is higher, we
simply imagine that they would look similar.

Question 8.3. What does the graph of a linear function T : Rn → Rm look like?

Explaining the answer to the above question is a good exercise. Let’s consider the
cases of (m, n) = (1, 2) and (2, 1) first, and then describe the general case.

Question 8.4. If a linear function T is the differential of a function f, what is the


relationship between its graph and the graph of f? Explain using the tangent relation.

Zero-level set

Next, we describe the graphs of multivariable functions and their differentials using
the zero-level set. Let f : R2 → R be differentiable at c ∈ R2 . The graph of the
function f is the set of points where z = f (x, y). Define the scalar function F : R3 →
R as follows:
F(x, y, z) = f (x, y) − z.
Then, the graph of the function f (x, y) becomes the zero-level set of F. Let
c = (c1 ; c2 ), and c̃ := (c; f (c)) = (c1 , c2 , f (c1 , c2 )). Then, the gradient of F at c̃
is ∇F(c̃) = ( fx (c), fy (c), −1).

Problem 8.2. Let G : Rn → R be such that ∇G(c) ̸= 0 for some c ∈ Rn . Describe


the relationship between the zero-level set of G passing through c and ∇G(c).

Solution 8.2 What should we show? We will demonstrate that every curve on the
zero-level set passing through c is orthogonal to $\nabla G(c)$. (Is it good?) Let $r(t)$ be a
differentiable curve on the zero-level set for $t \in (-\varepsilon, \varepsilon)$ with $r(0) = c$. Then, since
$G(r(t)) = 0$ for all t,
$$\frac{dG}{dt} = \nabla G(r(t)) \cdot r'(t) = 0$$
(using the chain rule, Theorem 8.3). Thus, $\nabla G(r(0)) \cdot r'(0) = \nabla G(c) \cdot r'(0) = 0$,
and therefore $r'(0)$ is orthogonal to $\nabla G(c)$. □
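The orthogonality in Solution 8.2 can be checked on a concrete level set. In this sketch the function $G(x, y) = x^2 + 4y^2 - 4$, the parameterized ellipse `r`, and the parameter value `t0` are illustrative assumptions; the gradient and the tangent vector are computed by hand.

```python
import math

def grad_G(x, y):
    """Gradient of G(x, y) = x^2 + 4 y^2 - 4, computed by hand."""
    return (2 * x, 8 * y)

def r(t):
    """A curve lying on the zero-level set of G (an ellipse)."""
    return (2 * math.cos(t), math.sin(t))

def r_prime(t):
    """Tangent vector of the curve r."""
    return (-2 * math.sin(t), math.cos(t))

t0 = 0.7
x, y = r(t0)
G_value = x**2 + 4 * y**2 - 4        # should vanish: the point is on the level set
gx, gy = grad_G(x, y)
vx, vy = r_prime(t0)
dot = gx * vx + gy * vy              # gradient dotted with the tangent vector
print(G_value, dot)                  # both are 0 up to rounding
```

Here the dot product is $-8\cos t \sin t + 8 \sin t \cos t = 0$ for every t, so the tangent of the curve is orthogonal to the gradient along the whole level set.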

The graph of f is the zero-level set of F, so by Problem 8.2, the graph of f is


orthogonal to ∇F. Let c be (c1 ; c2 ) and c̃ be (c; f (c)). Then, the gradient of F at c̃ is
∇F(c̃) = ( fx (c), fy (c), −1).
The differential at c of the function f is a linear function $T : \mathbb{R}^2 \to \mathbb{R}$. This function
is given by the gradient $\nabla f(c)$, i.e., $T = \nabla f(c) = (f_x(c), f_y(c))$, and
$$T\mathbf{x} = (f_x(c), f_y(c)) \begin{pmatrix} x \\ y \end{pmatrix} = f_x(c)x + f_y(c)y.$$

Thus, the graph of the differential is composed of points $(x; y; z)$ satisfying
$z = f_x(c)x + f_y(c)y$. Therefore, every point $(x; y; z)$ on the graph of the differential
satisfies
$$(f_x(c), f_y(c), -1) \cdot (x; y; z) = f_x(c)x + f_y(c)y - z = 0.$$
Therefore, the graph of the differential is a plane passing through the origin orthogonal
to the vector $(f_x(c); f_y(c); -1)$. Thus, the graph of the differential, translated so
that the origin moves to $\tilde{c}$, is the plane tangent to the graph of the function f at the
point $\tilde{c}$.
We may try other cases.

Problem 8.3. Discuss the relationship between the graph of a function $f : \mathbb{R} \to \mathbb{R}$ and
the graph of its differential when f is differentiable at $c \in \mathbb{R}$.

Solution 8.3 This problem is a simple yet good exercise. ⊔


Problem 8.4. Consider a function f : R → R2 that is differentiable at c ∈ R. Discuss


the relationship between the function f and the graph of its differential.

Solution 8.4 This is the simplest case of a vector-valued function. ⊔


Considering the general case f : Rn → Rm will be helpful for the following prob-
lem.

Problem 8.5. Consider a linear function T : Rn → Rm . Find m vectors perpendicular


to the n-dimensional plane of T ’s graph in Rn+m .

Solution 8.5 The graph of T is the set of points $(x; y) \in \mathbb{R}^{n+m}$ satisfying $y = Tx$.
Define the function $F : \mathbb{R}^{n+m} \to \mathbb{R}^m$ as $F(x, y) = Tx - y$. Then, the graph of T is the
zero-level set of F. Let $T = (a_{ij})$ ($1 \le i \le m$, $1 \le j \le n$). Then, the gradient of F is
given by
$$\nabla F = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & -1 & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & a_{2n} & 0 & -1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & 0 & 0 & \cdots & -1
\end{pmatrix}_{m \times (n+m)}.$$

$\nabla F$ has m row vectors, which are perpendicular to the plane in question. □
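The claim that the rows of $\nabla F$ are perpendicular to the graph can be verified directly on an example. The sketch below uses an arbitrary integer matrix with $m = 2$ and $n = 3$ and a few sample points; all names and values are illustrative.

```python
A = [[1, 2, 0],
     [3, -1, 4]]                     # T as an m x n matrix with m = 2, n = 3
m, n = len(A), len(A[0])

# Rows of grad F = (A | -I): the i-th row is (a_i1, ..., a_in) followed by -e_i.
rows = [A[i] + [-1 if k == i else 0 for k in range(m)] for i in range(m)]

def graph_point(x):
    """The point (x; Tx) of R^(n+m) on the graph of T."""
    Tx = [sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]
    return list(x) + Tx

points = [graph_point(x) for x in ([1, 0, 0], [2, -1, 5], [0, 3, 7])]
dots = [[sum(r_k * p_k for r_k, p_k in zip(row, p)) for row in rows] for p in points]
print(dots)   # every entry is 0
```

Each dot product equals $a_i \cdot x - (Tx)_i$, where $a_i$ is the i-th row of the matrix, which vanishes by the definition of the graph.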


Question 8.5. For Problem 8.5 to hold, one or two conditions are necessary for T .
Which of the following conditions are needed?
(i) The rows of the matrix (ai j ) are linearly independent. (ii) There are no zero
rows in (ai j ).
(iii) n ≥ m. (iv) m ≥ n.

For Problem 8.5 to hold, conditions (i) and (ii) are necessary. The graph of T
becomes an n-dimensional plane in Rn+m and m vectors perpendicular to that plane
can be found only when the rows of the matrix (ai j ) are linearly independent and
none of them are zero rows. (But if there are zero rows, they are not linearly inde-
pendent anyway, so only condition (i) is sufficient.)

Problem 8.6. Discuss the relationship between a function f : Rn → Rm and the graph
of its differential at c ∈ Rn when f is differentiable at c.
Lecture 9
Line Integral

In Calculus 1, integration was defined for functions defined on an interval [a, b].
Now, let’s consider a real-valued function f : R3 → R defined in three-dimensional
space. Suppose there is a curve C lying in this space. Even if this curve is curved,
from the perspective of the curve or tiny organisms living on it, there is no distinction
between straight and curved lines. It’s like how we perceive ourselves living on a
spherical Earth but feel like we’re living on a flat plane. In this situation, we can
consider the function f as defined on this curve C as if it were a one-dimensional
function, and integrating in this scenario is what we call a line integral. For example,
if the function is f = 1, then the integral should be the length of the curve C. It’s
very difficult for us, from the outside, to directly integrate this. It’s hard to determine
where the curve passes and what values it takes there. In most cases, as shown in
the figure above, integration is done when the curve is parameterized by r(t). How
is the line integral given when using a parameter?

Question 9.1. The integral $\int_a^b f(r(t))\,dt$ of the composite function $f(r(t))$ is not the
line integral we desire. Why is that? It's easy to know. For example, when f = 1,
the result is not the length of the curve C but the length of the interval [a, b].


9.1 Line Integral

Let f : R3 → R be a function and C be a curve in the space R3 . We are interested in


the values of the function on the curve, and we want to integrate the function f as if
it were defined on this curve like a one-dimensional function. This is what we call
a line integral. For example, although we live on a spherical Earth, we can consider
ourselves living on a two-dimensional plane, and this is how we want to consider
integrating f on the curve C. If we understand the line integral as an integral in one
dimension, then we can define it similarly. First, define the partition of the curve as
follows:
π = {x0 , x1 , · · · , x p }
where x0 is the starting point of the curve, x p is the ending point of the curve, and
the rest xi are points on the curve given in order. In one-dimensional integration,
we could write it more concisely as a = x0 < x1 < · · · < x p = b, but it’s hard to
write it concisely for line integral. Then, the line integral is expressed and defined
as follows:
$$\int_C f = \lim_{\|\pi\| \to 0} \sum_{i=1}^{N} f(x_i) \Delta x_i, \tag{9.1}$$

where $N = p$ is the number of pieces of the partition, $\Delta x_i$ is the length of the i-th
piece of the curve, and $f(x_i)$ is the value of the function at an endpoint of that piece.
As in Riemann integration, we may choose any point of the piece instead.
If the curve C is given by a parameter t ∈ [a, b] using r(t), then we can express
and define the line integral as follows, after taking a partition of the interval [a, b] as

π = {a = t0 < t1 < · · · < t p = b}

and defining $x_i = r(t_i)$:
$$\int_C f = \lim_{\|\pi\| \to 0} \sum_{i=1}^{N} f(r(t_i)) \Delta x_i. \tag{9.2}$$

Problem 9.1. Explain the difference between the limit (9.2) and the integral of the
composite function f (r(t)).
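Solution 9.1 The integral $\int_a^b f(r(t))\,dt$ satisfies the following relationship:
$$\int_a^b f(r(t))\,dt = \lim_{\|\pi\| \to 0} \sum_{i=1}^{N} f(r(t_i)) \Delta t_i, \qquad \Delta t_i = t_{i+1} - t_i.$$

The difference from the limit (9.2) is that this sum is weighted by the parameter
increments $\Delta t_i$ rather than by the curve lengths $\Delta x_i$, and in general these do not match. □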
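The difference between the two sums shows up clearly for $f = 1$. The sketch below uses a helix $r(t) = (\cos t, \sin t, t)$ on $[0, 2\pi]$ as an illustrative curve and approximates each $\Delta x_i$ by the straight-line distance between consecutive points, which is an assumption that becomes accurate as the partition is refined.

```python
import math

def r(t):
    """An illustrative curve: one turn of a helix."""
    return (math.cos(t), math.sin(t), t)

def dist(p, q):
    """Euclidean distance between two points of R^3."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a, b, N = 0.0, 2 * math.pi, 1000
ts = [a + (b - a) * i / N for i in range(N + 1)]   # uniform partition of [a, b]

def f(point):
    return 1.0                                     # constant function f = 1

# Sum weighted by (approximate) curve lengths, as in (9.2).
sum_dx = sum(f(r(ts[i])) * dist(r(ts[i + 1]), r(ts[i])) for i in range(N))
# Sum weighted by parameter increments, as in the integral of f(r(t)).
sum_dt = sum(f(r(ts[i])) * (ts[i + 1] - ts[i]) for i in range(N))

print(sum_dx, 2 * math.pi * math.sqrt(2))   # close to the length of the helix
print(sum_dt, 2 * math.pi)                  # the length of the interval [a, b]
```

The first sum approaches the length of the curve, $2\pi\sqrt{2}$, while the second only measures the parameter interval, which is the point of Question 9.1.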

9.2 Expansion rate of a curve

Let $\mathbf{r} : \mathbb{R}^1 \to \mathbb{R}^3$ be a function that maps the interval $[a, b]$ to a curve in $\mathbb{R}^3$.
The length of this curve may have increased or decreased compared to the length of
$[a, b]$. We already know how to measure its length. That is,
\[ \int_a^b \|\mathbf{r}'(t)\|\,dt = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\,dt. \tag{Length Formula} \]

It’s most convenient to remember that integrating speed gives distance. Thinking of it as
a rubber band expanding, rather than as a speed, also has its usefulness. It becomes
particularly useful when the dimension of the space $\mathbb{R}^\ell$ increases, because it’s
impossible to think in terms of speed in higher dimensions. The quantity
$\sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}$ represents how much a rubber band laid along the segment
$[a, b] \subset \mathbb{R}^\ell$ (here $\ell = 1$) expands around $\mathbf{r}(t)$ as $t$ moves nearby. Let’s calculate
how much it expands using the definition of full differential. According to the
definition, for any given $\varepsilon > 0$ and for sufficiently small $h$, the following holds (the
implication is the reverse triangle inequality):
\[ \|\mathbf{r}(t+h) - \mathbf{r}(t) - Th\| < \varepsilon |h| \;\Rightarrow\; \bigl|\, \|\mathbf{r}(t+h) - \mathbf{r}(t)\| - \|Th\| \,\bigr| < \varepsilon |h|. \]
Dividing by $|h|$, we get:
\[ \left| \frac{\|\mathbf{r}(t+h) - \mathbf{r}(t)\|}{|h|} - \frac{\|Th\|}{|h|} \right| < \varepsilon. \]
Here, $\frac{\|\mathbf{r}(t+h) - \mathbf{r}(t)\|}{|h|}$ represents how many times the original rubber band, of
length $|h|$, expands under the original function $\mathbf{r}$. Of course, the image may not be a
straight line, but for sufficiently small $h$, we can think of it as a straight line. On the
other hand, $\frac{\|Th\|}{|h|}$ represents how many times it expands under the linear function, that
is, under the derivative. Therefore, the above expression means that the difference
between the two is less than any $\varepsilon > 0$. That is, if $h$ is sufficiently small, instead of
calculating with $\mathbf{r}$, we can calculate with the linear function $T = \nabla \mathbf{r}(t)$. Then we get:
\[ \frac{\|Th\|}{|h|} = \|\nabla \mathbf{r}(t)\| = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}. \]
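The expansion rate described above can be checked numerically. The sketch below is only an illustration under assumed choices: the helix $\mathbf{r}(t) = (\cos t, \sin t, t)$, the point $t_0 = 1$, and the step sizes are not from the lecture. For this helix $\|\mathbf{r}'(t)\| = \sqrt{2}$, so the ratio $\|\mathbf{r}(t+h) - \mathbf{r}(t)\| / |h|$ should approach $\sqrt{2}$ as $h$ shrinks.

```python
import math

def r(t):
    # Assumed example curve (a helix); not taken from the lecture.
    return (math.cos(t), math.sin(t), t)

def expansion_ratio(t, h):
    # ||r(t + h) - r(t)|| / |h|: how much a band of length |h| is stretched by r.
    return math.dist(r(t + h), r(t)) / abs(h)

t0 = 1.0
limit_value = math.sqrt(2.0)  # ||r'(t)|| = sqrt(sin^2 t + cos^2 t + 1) for the helix
ratios = [expansion_ratio(t0, 10.0 ** (-k)) for k in range(1, 6)]

for k, ratio in enumerate(ratios, start=1):
    print(f"h = 1e-{k}: ratio = {ratio:.10f}, gap = {abs(ratio - limit_value):.2e}")
```

The printed gap shrinks as $h$ decreases, which is the content of the inequality with $\varepsilon$ above.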

Problem 9.2. If $\mathbf{r}(t)$ is differentiable and $\mathbf{v}(t) = \mathbf{r}'(t)$, then show that the following
holds:
\[ \int_C f = \int_a^b f(\mathbf{r}(t))\,\|\mathbf{v}(t)\|\,dt. \tag{9.3} \]

Solution 9.2 Using the length formula,
\[ \Delta x_i = \int_{t_{i-1}}^{t_i} \|\mathbf{v}(t)\|\,dt \]
is satisfied. Substituting this into the definition of the line integral (9.2), we get
\[ \int_C f = \lim_{\|\pi\| \to 0} \sum_{i=1}^{p} \int_{t_{i-1}}^{t_i} f(\mathbf{r}(t_i))\,\|\mathbf{v}(t)\|\,dt = \int_a^b f(\mathbf{r}(t))\,\|\mathbf{v}(t)\|\,dt. \]
□
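Formula (9.3) can be compared numerically with the defining sum (9.2). The following is a minimal sketch under assumptions that are not from the lecture: the curve is the helix $\mathbf{r}(t) = (\cos t, \sin t, t)$ on $[0, 2\pi]$, the integrand is $f(x, y, z) = z$, and the length $\Delta x_i$ of each piece is approximated by its chord length. For this example the exact value is $2\sqrt{2}\,\pi^2$.

```python
import math

def r(t):
    # Assumed example curve (a helix).
    return (math.cos(t), math.sin(t), t)

def f(x, y, z):
    # Assumed example integrand.
    return z

def speed(t):
    # ||v(t)|| = ||r'(t)|| for the helix.
    return math.sqrt(math.sin(t) ** 2 + math.cos(t) ** 2 + 1.0)

def curve_sum(a, b, p):
    """Sum (9.2): f(r(t_i)) times the chord length of the i-th piece, i = 1..p."""
    total = 0.0
    prev = r(a)
    for i in range(1, p + 1):
        cur = r(a + (b - a) * i / p)
        total += f(*cur) * math.dist(prev, cur)
        prev = cur
    return total

def parameter_integral(a, b, p):
    """Midpoint rule for the right-hand side of (9.3)."""
    h = (b - a) / p
    mids = (a + (i + 0.5) * h for i in range(p))
    return sum(f(*r(t)) * speed(t) * h for t in mids)

a, b = 0.0, 2.0 * math.pi
exact = 2.0 * math.sqrt(2.0) * math.pi ** 2
lhs = curve_sum(a, b, 20000)
rhs = parameter_integral(a, b, 20000)
print(lhs, rhs, exact)
```

Both numbers approach the same value as the partition is refined, while replacing the chord length by $\Delta t_i$ in `curve_sum` would give a different limit, as in Problem 9.1.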

If the curve C is parameterized by r(t) using a parameter, and the function r(t) is
differentiable, then the curve C is called smooth.

9.3 Functions on parametrized curves

Let’s consider one specific case of the Chain Rule Theorem 8.3, namely $\ell = 1$. We fix the
intermediate space $\mathbb{R}^n$ to $n = 3$. Thus, $\mathbf{g}$ is a vector-valued function with three
components. Consider a scalar-valued function $f$. The figure corresponding to this
section is as follows. In this section, we use $t$ as the variable of $\mathbb{R}^{\ell=1}$. The
independent variables in the space $\mathbb{R}^{n=3}$ are denoted as $(x, y, z)$. Therefore, we express
$\mathbf{g}(t)$ as $\mathbf{g}(t) = (x(t), y(t), z(t))^t$.

Problem 9.3. Given $w = xy + z$ and $(x, y, z) = (2\cos t, 2\sin t, 5\cos^2 t)$, find $\frac{dw}{dt}$.
Solution 9.3 First, we need to interpret the problem correctly. It’s essential to practice
viewing the given problem from the perspective of the chain rule. Here, $w$ is a function
of $(x, y, z)$, and $(x, y, z)$ is a function of $t$. Finding $\frac{dw}{dt}$ means differentiating the
composition of $w$ and $\mathbf{g}(t) = (x(t); y(t); z(t))$. Here, we abuse notation again. According
to the Chain Rule Theorem 8.3, we have $w = f(x, y, z) = xy + z$ and
$\mathbf{g}(t) = (2\cos t; 2\sin t; 5\cos^2 t)$. Therefore,
\begin{align*}
\nabla f(\mathbf{g}(t)) &= \nabla f \text{ at } \mathbf{g}(t) \\
 &= (y, x, 1) \text{ at } \mathbf{g}(t) \\
 &= (2\sin t, 2\cos t, 1),
\end{align*}
\[ \nabla \mathbf{g}(t) = \mathbf{g}'(t) = \begin{pmatrix} -2\sin t \\ 2\cos t \\ -10\cos t \sin t \end{pmatrix}. \]
Therefore, $\frac{dw}{dt} = \nabla f(\mathbf{g}(t)) \cdot \nabla \mathbf{g}(t) = -4\sin^2 t + 4\cos^2 t - 10\sin t\cos t$. □
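The result of Solution 9.3 can be cross-checked numerically. The sketch below evaluates the chain-rule product, the closed form obtained in the solution, and a central finite difference of the composite $w(\mathbf{g}(t))$; the function names, the step size, and the sample values of $t$ are illustrative assumptions.

```python
import math

def g(t):
    # (x(t), y(t), z(t)) from Problem 9.3
    return (2.0 * math.cos(t), 2.0 * math.sin(t), 5.0 * math.cos(t) ** 2)

def w(x, y, z):
    return x * y + z

def dw_dt_chain_rule(t):
    # grad f at g(t) is (y, x, 1); g'(t) is the column vector from the solution.
    x, y, _ = g(t)
    grad_f = (y, x, 1.0)
    g_prime = (-2.0 * math.sin(t), 2.0 * math.cos(t), -10.0 * math.cos(t) * math.sin(t))
    return sum(a * b for a, b in zip(grad_f, g_prime))

def dw_dt_closed_form(t):
    s, c = math.sin(t), math.cos(t)
    return -4.0 * s * s + 4.0 * c * c - 10.0 * s * c

def dw_dt_numeric(t, h=1e-6):
    # Central difference of the composite function t -> w(g(t)).
    return (w(*g(t + h)) - w(*g(t - h))) / (2.0 * h)

for t in (0.0, 0.7, 2.0):
    print(t, dw_dt_chain_rule(t), dw_dt_closed_form(t), dw_dt_numeric(t))
```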
In Problem 9.3, we attempted to use the theorem directly. It’s crucial to grasp the
meaning of the theorem through repeated practice. However, since that always takes
time, we write the equation (8.1) as follows for convenience:
\[ \frac{df}{dt} = \nabla f|_{\mathbf{g}(t)} \cdot \nabla \mathbf{g}|_t = (f_x, f_y, f_z) \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix} = \frac{\partial f}{\partial x} x'(t) + \frac{\partial f}{\partial y} y'(t) + \frac{\partial f}{\partial z} z'(t). \]
Here, we again abuse notation. $\frac{df}{dt}$ on the left-hand side means that $f$ is thought of
as a function of $t$. In other words, it means the composition of $f$ and $\mathbf{g}$. On the
right-hand side, $\frac{\partial f}{\partial x}$ means that $f$ is thought of as a function of $x, y, z$. The
following expression is clearer:
\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}. \tag{9.4} \]
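Question 9.2. What if we use a vector-valued function $\mathbf{f}$? Nothing changes. Just
replace $f$ with $\mathbf{f}$ and think of everything as row vectors. The formula is simply as
follows.
\[ \frac{d\mathbf{f}}{dt} = \frac{\partial \mathbf{f}}{\partial x}\frac{dx}{dt} + \frac{\partial \mathbf{f}}{\partial y}\frac{dy}{dt} + \frac{\partial \mathbf{f}}{\partial z}\frac{dz}{dt}. \]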

9.4 Directional derivative and Chain rule

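Problem 9.4 (Directional differential). Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ be differentiable at
$\mathbf{c} \in \mathbb{R}^n$, and let $\mathbf{u} \in \mathbb{R}^n$ be a unit vector. Use the Chain Rule to show that the
directional differential $D_{\mathbf{u}} \mathbf{f}(\mathbf{c})$ is given by:
\[ D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = \nabla \mathbf{f}(\mathbf{c})\,\mathbf{u} = \sum_{i=1}^{n} D_i \mathbf{f}(\mathbf{c})\,u_i. \tag{9.5} \]
(The formula for the directional differential is easy to remember. It’s simply “gradient
matrix times direction.”)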

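Solution 9.4 (Here, $\mathbf{c} \in \mathbb{R}^n$. That is, $\mathbf{c} \notin \mathbb{R}^\ell$.) The directional differential is
given by:
\[ D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = \lim_{h \to 0} \frac{\mathbf{f}(\mathbf{c} + h\mathbf{u}) - \mathbf{f}(\mathbf{c})}{h} = \left. \frac{d}{dt} \mathbf{f}(\mathbf{c} + t\mathbf{u}) \right|_{t=0}. \]
Using the Chain Rule with $\mathbf{g}(t) = \mathbf{c} + t\mathbf{u}$, we get
\[ \left. \frac{d}{dt} \mathbf{f}(\mathbf{c} + t\mathbf{u}) \right|_{t=0} = \nabla \mathbf{f}|_{\mathbf{g}(0) = \mathbf{c}}\, \nabla \mathbf{g}|_0 = \nabla \mathbf{f}(\mathbf{c})\,\mathbf{u}. \]
□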

Question 9.3. Eqn. (9.5) shows how to compute the directional derivative Du f(c)
using partial derivatives. Does the existence of partial derivatives always imply the
existence of the directional derivative Du f(c)? No, it doesn’t. There’s an important
condition to remember. What is it?

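Problem 9.5. Let $\mathbf{u} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right)$ and $f : \mathbb{R}^3 \to \mathbb{R}$ be given by
$f(x, y, z) = xy + z$. Find the directional differential $D_{\mathbf{u}} f$ at $\mathbf{c} = (2, 1, 0)$.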

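Solution 9.5 Let’s calculate it first. Suppose $\mathbf{g}(t) = \mathbf{c} + t\mathbf{u}$, then
\[ \nabla f(\mathbf{g}(0)) = \nabla f(\mathbf{c}) = (y, x, 1)|_{\mathbf{c}} = (1, 2, 1), \qquad \nabla \mathbf{g}(0) = \mathbf{u} = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}. \]
Therefore, $\nabla f(\mathbf{g}(0)) \cdot \nabla \mathbf{g}(0) = \frac{4}{\sqrt{3}}$. □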

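Problem 9.6. Let $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^2$ be given by $\mathbf{f}(x, y, z) = (xy + z, yz + x)$, and let the
direction vector be $\mathbf{u} = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right)$. Find the directional differential
$D_{\mathbf{u}} \mathbf{f}$ at $\mathbf{c} = (2, 1, 0)$.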

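Solution 9.6 Here, even though $\mathbf{f}$ is a vector-valued function, we can still calculate
it in the same way. That is,
\[ \nabla \mathbf{f}(\mathbf{c}) = \left. \begin{pmatrix} y & x & 1 \\ 1 & z & y \end{pmatrix} \right|_{\mathbf{c}} = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix}, \qquad D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = \nabla \mathbf{f}(\mathbf{c})\,\mathbf{u} = \begin{pmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix} = \begin{pmatrix} \frac{4}{\sqrt{3}} \\ \frac{2}{\sqrt{3}} \end{pmatrix}. \]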
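The values found in Solutions 9.5 and 9.6 can be checked with a short computation. This sketch multiplies the gradient matrix by the direction, as in (9.5), and compares the result with a central finite difference of $\mathbf{f}(\mathbf{c} + t\mathbf{u})$ at $t = 0$; the helper names and the step size are illustrative assumptions.

```python
import math

def f(x, y, z):
    # f from Problem 9.6; its first component is the scalar f of Problem 9.5.
    return (x * y + z, y * z + x)

def gradient_matrix(x, y, z):
    # Rows are the gradients of the two components of f.
    return ((y, x, 1.0), (1.0, z, y))

def directional_derivative(point, u):
    # "Gradient matrix times direction", formula (9.5).
    rows = gradient_matrix(*point)
    return tuple(sum(row[i] * u[i] for i in range(3)) for row in rows)

def directional_derivative_numeric(point, u, h=1e-6):
    # Central difference of t -> f(c + t u) at t = 0.
    forward = f(*[p + h * ui for p, ui in zip(point, u)])
    backward = f(*[p - h * ui for p, ui in zip(point, u)])
    return tuple((a - b) / (2.0 * h) for a, b in zip(forward, backward))

c = (2.0, 1.0, 0.0)
u = (1.0 / math.sqrt(3.0),) * 3
exact = directional_derivative(c, u)
approx = directional_derivative_numeric(c, u)
print(exact, approx)
```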


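Relationship between $D_{\mathbf{u}} \mathbf{f}$ and directional derivative when $\|\mathbf{u}\| \neq 1$

The directional derivative $D_{\mathbf{u}} \mathbf{f}(\mathbf{c})$ is for the case where $\mathbf{u}$ is a unit vector with
$\|\mathbf{u}\| = 1$. However, even when $\mathbf{u}$ is not a unit vector, we define $D_{\mathbf{u}} \mathbf{f}$ using (11.1).
But when $\|\mathbf{u}\| \neq 1$, calling $D_{\mathbf{u}} \mathbf{f}$ the directional derivative is not exactly correct. To
compare with the directional derivative, let’s perform the calculation:
\[ D_{\frac{\mathbf{u}}{\|\mathbf{u}\|}} \mathbf{f}(\mathbf{c}) = \frac{1}{\|\mathbf{u}\|} (u_1 D_1 + \cdots + u_n D_n)\mathbf{f}(\mathbf{c}) = \frac{1}{\|\mathbf{u}\|} D_{\mathbf{u}} \mathbf{f}(\mathbf{c}). \]
Therefore, $D_{\mathbf{u}} \mathbf{f}(\mathbf{c})$ is the directional derivative in the $\mathbf{u}$ direction multiplied by
$\|\mathbf{u}\|$. In other words,
\[ D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = \|\mathbf{u}\|\, D_{\frac{\mathbf{u}}{\|\mathbf{u}\|}} \mathbf{f}(\mathbf{c}). \]
Even if someone calls $D_{\mathbf{u}} \mathbf{f}$ the directional derivative, if $\mathbf{u}$ is not a unit vector, it
should be understood as the directional derivative in the $\mathbf{u}$ direction multiplied by
$\|\mathbf{u}\|$.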
Lecture 10
Finding Extreme Values

The purpose of this lecture is to find the maximum and minimum values of a differ-
entiable function f : Rn → R over the entire domain or within a region on a surface
or curve. In the case of surfaces or curves, we consider two cases: when they are
given by parameters or by level sets.

10.1 Extreme values in the entire space

Definition 10.1. Let D ⊂ Rn be an open set, x0 ∈ D, and f : D → R. (1) If there


exists r > 0 such that f (x0 ) ≥ f (x) for all x ∈ B(x0 , r), then f (x0 ) is called a local
maximum. (2) If there exists r > 0 such that f (x0 ) ≤ f (x) for all x ∈ B(x0 , r), then
f (x0 ) is called a local minimum. (3) The combination of these two cases is called
a local extremum.

Definition 10.2. Additionally, suppose that the function f : D → R has a partial


derivative at the point x0 . (4) If ∇ f (x0 ) = 0, then x0 is called a critical point. (5) If
x0 is a critical point but not a local extremum, it is called a saddle point.

Problem 10.1. Suppose f (x0 ) is a local extremum and the directional derivative
Du f (x0 ) exists in the direction of u. Show that Du f (x0 ) = 0.

Solution 10.1 Let’s only consider the case where $f(x_0)$ is a local maximum. In this
case, for sufficiently small $h > 0$,
\[ \frac{f(x_0 + hu) - f(x_0)}{h} \le 0. \]
Therefore,
\[ D_u f(x_0) = \lim_{h \to 0^+} \frac{f(x_0 + hu) - f(x_0)}{h} \le 0. \]
Similarly, for $h < 0$,
\[ D_u f(x_0) = \lim_{h \to 0^-} \frac{f(x_0 + hu) - f(x_0)}{h} \ge 0. \]
The only value that satisfies both conditions is $D_u f(x_0) = 0$. □


Question 10.1. How would we prove the case where $f(x_0)$ is a local minimum?
There are two methods. One is to follow the above proof but reverse the direction
of the inequalities, and the other is to use the fact that if $f$ has a local minimum at
$x_0$, then $g(x) = -f(x)$ has a local maximum at $x_0$.

10.2 Criterion for maximum, minimum, and saddle

We will learn how to determine and find the maximum and minimum values of
a multivariable function f : Rn → R. Let’s start by reviewing the case of single-
variable functions and think about its significance.

Problem 10.2. Find the critical points of the function $f(x) = x^2 + 3x - 1$ and determine
whether these critical points are local maxima or minima.

Solution 10.2 To find the critical points, we solve $f'(x) = 2x + 3 = 0$, which gives
$x = -\frac{3}{2}$. Computing the second derivative, $f''(-\frac{3}{2}) = 2 > 0$, which means that the
derivative $f'$ is increasing at the critical point. Therefore, $f$ has a local minimum at
this point, and in fact, this is the global minimum. □

In the case of single-variable functions, if the second derivative at a critical point


is positive, the point is a local minimum, and if it’s negative, the point is a local
maximum. Then, what about the relationship between the second derivatives and
critical points for multivariable functions?
We represent ∇ f as a row vector:

∇ f (x) = (D1 f (x), · · · , Dn f (x)).

This vector consists of first-order partial derivatives. The Hessian is the derivative
of the derivative. More precisely, we first take the transpose of $\nabla f$ and then take the
gradient of the resulting column vector. This gives us a square matrix consisting of
second-order partial derivatives:
\[ H f(x_0) = \left. \nabla \begin{pmatrix} D_1 f(x) \\ \vdots \\ D_n f(x) \end{pmatrix} \right|_{x = x_0} = \begin{pmatrix} D_{11} f(x_0) & \cdots & D_{1n} f(x_0) \\ \vdots & \ddots & \vdots \\ D_{n1} f(x_0) & \cdots & D_{nn} f(x_0) \end{pmatrix}. \]

It’s not enough for the second derivatives to exist; we also need the second-order
partial derivatives to be continuous. If the function is twice continuously differentiable,
then $D_{ij} f = D_{ji} f$, so the Hessian matrix is symmetric, meaning the $i$th row and $i$th
column are the same.
This is where linear algebra comes in. According to linear algebra, if $A$ is an
$n \times n$ real symmetric matrix, it has $n$ real eigenvalues $\lambda_i$ (counted with multiplicity)
and corresponding eigenvectors $x_i$ for $i = 1, \cdots, n$, satisfying:
\[ A x_i = \lambda_i x_i. \]
This means that if we only consider the direction of $x_i$, multiplying by the matrix $A$
is the same as multiplying by $\lambda_i$. In other words, the second derivative of the function
along this direction is $\lambda_i$. Furthermore, the eigenvectors $x_i$ can be chosen orthogonal to
each other, meaning that if we rotate the coordinate axes appropriately, we can make
them coincide with the standard axis directions $e_i$.
If $x_0$ is a critical point and all eigenvalues of its Hessian are positive, the function
$f$ has a local minimum at $x_0$. If they are all negative, it has a local maximum, and if
some are positive and some are negative, it is a saddle point. If zero is included among
the eigenvalues, the test is inconclusive. Can you explain why this is the case,
comparing it to the case of single-variable functions?

Question 10.2. There is one missing condition in the above explanation. What is it?

Now let’s consider the case where $f : \mathbb{R}^2 \to \mathbb{R}$. Then, the Hessian matrix is a $2 \times 2$
matrix:
\[ H f(x_0) = \begin{pmatrix} f_{xx}(x_0) & f_{xy}(x_0) \\ f_{xy}(x_0) & f_{yy}(x_0) \end{pmatrix}. \]
For a 2 × 2 matrix, there are two eigenvalues. If both eigenvalues of H f (x0 ) are
positive, then f (x0 ) is a local minimum. If one is positive and the other is negative,
it is a saddle point. If both are negative, f (x0 ) is a local maximum.
If we denote the two eigenvalues as $\lambda_1$ and $\lambda_2$, their product and sum are as
follows:
\[ \lambda_1 \lambda_2 = f_{xx}(x_0) f_{yy}(x_0) - f_{xy}^2(x_0), \qquad \lambda_1 + \lambda_2 = f_{xx}(x_0) + f_{yy}(x_0). \]

Using this relationship, we obtain the following theorem.

Theorem 10.1. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function with continuous partial derivatives $f_x$,
$f_y$, $f_{xx}$, $f_{yy}$, $f_{xy}$, and $f_{yx}$, and let $\nabla f(x_0) = 0$. Then,
1. If $f_{xx}(x_0) f_{yy}(x_0) - f_{xy}^2(x_0) < 0$, $x_0$ is a saddle point.
2. If $f_{xx}(x_0) f_{yy}(x_0) - f_{xy}^2(x_0) > 0$ and $f_{xx}(x_0) > 0$, $x_0$ is a local minimum.
3. If $f_{xx}(x_0) f_{yy}(x_0) - f_{xy}^2(x_0) > 0$ and $f_{xx}(x_0) < 0$, $x_0$ is a local maximum.
4. If $f_{xx}(x_0) f_{yy}(x_0) - f_{xy}^2(x_0) = 0$, anything is possible.

Problem 10.3. Classify the critical points of the following functions.

(1) $f = x^2 + y^2 - 4y + 9$ (2) $f = y^2 - x^2$ (3) $f = xy - x^2 - y^2 - 2x - 2y + 4$.

Solution 10.3 □
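One way to check a classification for Problem 10.3 is to encode the test of Theorem 10.1 directly. In the sketch below, the critical points and second derivatives were worked out by hand for this illustration (they are not given in the lecture), and the function name `classify` is hypothetical. All three functions are quadratic, so their second derivatives are constant.

```python
def classify(fxx, fyy, fxy):
    """Second-derivative test of Theorem 10.1 at a critical point."""
    d = fxx * fyy - fxy ** 2
    if d < 0:
        return "saddle point"
    if d > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"

# name: (critical point, f_xx, f_yy, f_xy), computed by hand (assumption of this sketch).
# (1) f_x = 2x, f_y = 2y - 4              -> critical point (0, 2)
# (2) f_x = -2x, f_y = 2y                 -> critical point (0, 0)
# (3) f_x = y - 2x - 2, f_y = x - 2y - 2  -> critical point (-2, -2)
cases = {
    "x^2 + y^2 - 4y + 9": ((0.0, 2.0), 2.0, 2.0, 0.0),
    "y^2 - x^2": ((0.0, 0.0), -2.0, 2.0, 0.0),
    "xy - x^2 - y^2 - 2x - 2y + 4": ((-2.0, -2.0), -2.0, -2.0, 1.0),
}

results = {name: classify(fxx, fyy, fxy) for name, (_, fxx, fyy, fxy) in cases.items()}
for name, (point, *_rest) in cases.items():
    print(f"{name}: critical point {point} is a {results[name]}")
```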

Problem 10.4. Classify the critical points of the function $f = x^2 + y^2 - z^2 + xy - z$.

Solution 10.4 We can find the critical points, but there is currently no way to determine
whether these points are maximum, minimum, or saddle points in three-dimensional
space. □

Problem 10.5 (Finding maximum and minimum on boundaries). Consider the
function $f = 2 + 2x + 4y - x^2 - y^2$. Let $D$ be the triangle with vertices at $(0, 0)$,
$(9, 0)$, and $(0, 9)$. Find the global maximum and minimum of the function $f$ on the
triangle $D$.

Solution 10.5 To solve the problem, we need something more. Comparing only
critical points is not enough because maximum or minimum values can occur at the
boundary even if they are not critical points. We will cover what is needed in the
next section. □

10.3 Parameterized curves and surfaces

When a function f : R3 → R has three variables, we want to find the extreme


values of f on surfaces or curves in R3 . First, we will solve it using parameterization,
and then we will consider f as a function of one or two variables and calculate the
extreme values as in the previous section.

Problem 10.6. Find the point on the 3-dimensional curve $(\cos t, \sin t, t)$ closest to
the origin.

Solution 10.6 The function representing the distance between a point $\mathbf{x} = (x, y, z)$
and the origin is $f(x, y, z) = \sqrt{x^2 + y^2 + z^2}$. If $f(\mathbf{x}_0)$ is a critical value, then $h(\mathbf{x}_0)$ is
also a critical value of the function $h(x, y, z) = x^2 + y^2 + z^2$, and the computation is
simpler than finding critical values of $f$. Then $h(t) := h(\mathbf{r}(t)) = \cos^2 t + \sin^2 t + t^2 =
t^2 + 1$, and $h'(t) = 2t$. Therefore, $t = 0$ is the only critical point. Since $h''(t) = 2 > 0$
for all $t$, this is a global minimum. In the original variables, $(x, y, z) = (1, 0, 0)$ is the
closest point to the origin on the curve. □

Problem 10.7. Find the point on the plane x + y − z − 1 = 0 closest to the origin.

Solution 10.7 The plane $x + y - z - 1 = 0$ can be viewed as a level set or as a
parameterized surface. In this section, we consider it as a parameterized surface. Since
it’s a plane, we use two parameters $(x, y)$. Then $\mathbf{g}(x, y) = (x, y, x + y - 1)$. The
composition with the squared distance is as follows:
\[ h(x, y) = x^2 + y^2 + (x + y - 1)^2 = 2x^2 + 2y^2 + 2xy - 2x - 2y + 1. \]
Then we can find the extreme values of the two-variable function. □


Solution 10.7 In the above solution, we used the same variables $(x, y)$ for the parameter
space $\mathbb{R}^\ell$. If this is inconvenient, we can use $\mathbf{g}(u, v) = (u, v, u + v - 1)$. Then the
composition is as follows:
\[ h(u, v) = u^2 + v^2 + (u + v - 1)^2 = 2u^2 + 2v^2 + 2uv - 2u - 2v + 1. \]
Then we can find the extreme values of the two-variable function. □
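The last step of Problem 10.7 can be carried out as follows. This is a sketch of one possible way to finish, not the lecture's own computation: setting $\nabla h = 0$ gives the linear system $4u + 2v = 2$, $2u + 4v = 2$, which is solved here by Cramer's rule with exact fractions. The helper `solve_2x2` is a hypothetical name.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] (u, v) = (b1, b2) by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

def h(u, v):
    # Squared distance from the origin to the point (u, v, u + v - 1) on the plane.
    return u ** 2 + v ** 2 + (u + v - 1) ** 2

# grad h = (4u + 2v - 2, 2u + 4v - 2) = 0
u, v = solve_2x2(Fraction(4), Fraction(2), Fraction(2), Fraction(4), Fraction(2), Fraction(2))
closest_point = (u, v, u + v - 1)
squared_distance = h(u, v)

# Theorem 10.1: h_uu * h_vv - h_uv^2 = 4 * 4 - 2^2 > 0 and h_uu = 4 > 0, so a local minimum.
discriminant = 4 * 4 - 2 ** 2
print(closest_point, squared_distance, discriminant)
```

Since $h$ is a quadratic whose Hessian is positive definite, this single critical point is also the global minimum.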


10.4 Extreme values on level-set; Lagrange multiplier

In Rn , an (n − 1) dimensional surface is often given by the level-set of a function


g : Rn → R. In other words,

S = {x ∈ Rn : g(x) = 0}.

Here, ∇g(x) is perpendicular to the surface. In this situation, the given problem is
to find the maximum and minimum of the function f : Rn → R on the surface S. It
is important to note that these are extreme values on the surface S, not in the entire
space Rn . In this situation, the Lagrange multiplier method is appropriate. The key
idea of this method is as follows:

“The vector ∇ f (x) represents the direction in which f increases most rapidly.
Similarly, if you move in the direction of −∇ f (x), f decreases most rapidly. If
f has an extreme value at the point x0 on the surface S, then ∇ f (x0 ) must be
perpendicular to the surface. Otherwise, ∇ f (x0 ) will have a tangential compo-
nent to the surface, and the value may increase or decrease as the point moves
along the surface.”

Therefore, if the surface S is a level-set of the function g and x0 is a point on the


surface where the function f has a maximum or minimum, then the following must
hold:
∇ f (x0 ) = λ ∇g(x0 ),
where λ is a real number called the Lagrange multiplier.

Problem 10.8. Find the maximum and minimum values of the function $f(x, y) = xy$
on the ellipse
\[ \frac{x^2}{8} + \frac{y^2}{2} = 1. \]
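Solution 10.8 The constraint function is $g(x, y) = \frac{x^2}{8} + \frac{y^2}{2}$, and the ellipse is the
level set $g = 1$. The relation $\nabla f = \lambda \nabla g$ leads to:
\[ y = \lambda \frac{x}{4}, \qquad x = \lambda y. \]
Substituting the second equation into the first results in $4y = \lambda^2 y$. Therefore, either
$y = 0$ or $\lambda = \pm 2$. If $y = 0$, then $x = \lambda \cdot 0 = 0$. Thus, $(0, 0)$ is one possible point, but it
does not lie on the ellipse.
Consider the cases $\lambda = \pm 2$. Since $x = \lambda y$, we have $x^2 = 4y^2$ and hence:
\[ \frac{4y^2}{8} + \frac{y^2}{2} = 1 \;\Rightarrow\; y = \pm 1. \]
And since $x = \lambda y$, the four possible points are $(2, 1)$, $(-2, 1)$, $(2, -1)$, $(-2, -1)$.
Therefore, comparing the values at these four points is sufficient. □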
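The comparison of the four candidate points in Solution 10.8 can be done in a few lines. As an independent cross-check, this sketch also samples the ellipse through the parameterization $x = \sqrt{8}\cos s$, $y = \sqrt{2}\sin s$; the parameterization and the number of samples are assumptions made for illustration.

```python
import math

def f(x, y):
    return x * y

# Candidate points obtained from the Lagrange multiplier conditions.
candidates = [(2.0, 1.0), (-2.0, 1.0), (2.0, -1.0), (-2.0, -1.0)]
values = {p: f(*p) for p in candidates}
max_value = max(values.values())
min_value = min(values.values())

# Cross-check: sample f along the ellipse x^2/8 + y^2/2 = 1.
n = 8000
samples = []
for k in range(n):
    s = 2.0 * math.pi * k / n
    samples.append(f(math.sqrt(8.0) * math.cos(s), math.sqrt(2.0) * math.sin(s)))

print("candidates:", values)
print("sampled max/min:", max(samples), min(samples))
```

On the ellipse, $f = 4\cos s \sin s = 2\sin 2s$, so the sampled extremes agree with the candidate values $\pm 2$.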

Problem 10.9. Find the maximum and minimum values of the function $f(x, y) =
3x + 4y$ on the circle $x^2 + y^2 = 1$.

Solution 10.9 □

Suppose the functions $g_i : \mathbb{R}^n \to \mathbb{R}$ are differentiable for $i = 1, \cdots, k$, the level sets
$S_i$ are defined as
\[ S_i = \{x \in \mathbb{R}^n : g_i(x) = 0\}, \]
and $S$ is the intersection of these level sets, i.e.,
\[ S = \cap_{i=1}^{k} S_i \quad \text{or} \quad S = \{x \in \mathbb{R}^n : g_1(x) = 0, \cdots, g_k(x) = 0\}. \]
Suppose also that the function $f : \mathbb{R}^n \to \mathbb{R}$ has an extreme value on $S$ at a point $x_0 \in S$.
If $\nabla f(x_0)$ had a component tangent to $S$, the point could not be an extreme point.
Therefore, $\nabla f(x_0)$ must be perpendicular to the surface $S$. It should be noted that, for
all $i = 1, \cdots, k$, $\nabla g_i(x_0)$ is also perpendicular to the surface $S$. Therefore, $\nabla f(x_0)$ will
be a linear combination of the $\nabla g_i(x_0)$, and there exist $\lambda_i$ such that the following holds:
\[ \nabla f(x_0) = \lambda_1 \nabla g_1(x_0) + \cdots + \lambda_k \nabla g_k(x_0). \tag{10.1} \]
To test for extreme values, we consider points where (10.1) is satisfied.

Problem 10.10. Find the point on the curve in 3-dimensional space
\[ S = \{x \in \mathbb{R}^3 : x + y + z = 1,\; x^2 + y^2 = 1\} \]
closest to the origin.



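Solution 10.10 The square of the distance to the origin is calculated as $f(x, y, z) =
x^2 + y^2 + z^2$. The distance is minimized when the square of the distance is minimized.
With $g_1 = x + y + z - 1$ and $g_2 = x^2 + y^2 - 1$, and after absorbing the constant factors
into $\alpha$ and $\beta$, the relation $\nabla f = \alpha \nabla g_1 + \beta \nabla g_2$ is as follows:
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} x \\ y \\ 0 \end{pmatrix}. \]
Then,
\[ \begin{cases} x = \alpha + \beta x \\ y = \alpha + \beta y \\ z = \alpha \end{cases} \;\Rightarrow\; \begin{cases} (1 - \beta)x = \alpha \\ (1 - \beta)y = \alpha \\ z = \alpha \end{cases} \]
If $\beta = 1$, then $\alpha = 0$ and hence $z = 0$. Solving $x + y = 1$ and $x^2 + y^2 = 1$, we obtain the
two points $(1, 0, 0)$ and $(0, 1, 0)$. If $\beta \neq 1$, then $x = y$. And since $x^2 + y^2 = 1$, we have
$x = y = \pm\frac{1}{\sqrt{2}}$. Since $z = 1 - x - y$, there are two additional points:
$\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1 - \sqrt{2}\right)$ and $\left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 1 + \sqrt{2}\right)$. Therefore, we just need to
check these four points. □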
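Checking the four candidate points of Solution 10.10 is a short computation. The sketch below first confirms that each candidate satisfies both constraints and then compares the squared distances; the tolerance and the helper name `on_curve` are illustrative assumptions.

```python
import math

def f(x, y, z):
    # Squared distance to the origin.
    return x * x + y * y + z * z

def on_curve(x, y, z, tol=1e-12):
    # Both constraints of Problem 10.10, up to rounding error.
    return abs(x + y + z - 1.0) < tol and abs(x * x + y * y - 1.0) < tol

s = 1.0 / math.sqrt(2.0)
candidates = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (s, s, 1.0 - math.sqrt(2.0)),
    (-s, -s, 1.0 + math.sqrt(2.0)),
]
squared_distances = [f(*p) for p in candidates]
smallest = min(squared_distances)
closest = [p for p in candidates if math.isclose(f(*p), smallest)]

for p, d in zip(candidates, squared_distances):
    print(p, "on curve:", on_curve(*p), "squared distance:", d)
```

The squared distances are $1$, $1$, $4 - 2\sqrt{2}$, and $4 + 2\sqrt{2}$, so among these candidates the minimum is attained at both $(1, 0, 0)$ and $(0, 1, 0)$.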
Lecture 11
Taylor’s Formula for Multi-Variable Functions

Taylor’s formula is a method for approximating functions. In this lecture, we extend
Taylor’s formula, which we applied to functions of one variable in Calculus 1, to
functions of several variables. Throughout this lecture, let $D \subset \mathbb{R}^n$ be an open convex
set, let $\mathbf{c}, \mathbf{x} \in D$ be elements of that set, and let
\[ f : D \to \mathbb{R} \text{ be a function differentiable } (k+1) \text{ times.} \]
(Here, $(k+1)$-times differentiable means that $f$ can be differentiated up to $k+1$ times
and these derivatives are all continuous functions. Such a function is called $(k+1)$ times
continuously differentiable.) Taylor’s formula for multi-variable functions is very similar
to the case of single-variable functions if you know how to compute directional
derivatives. Let’s start by considering why the domain $D$ needs to be convex.

11.1 Taylor’s formula for 1-variable functions (Review)

First, let’s review the case for single variables. Let $(a, b) \subset \mathbb{R}$ be an open interval
containing $c$ and $x$, and let $F : (a, b) \to \mathbb{R}$ be a function differentiable $(k+1)$ times.
Taylor’s theorem states that there exists a point $s$ between $c$ and $x$ such that the
following holds:
\[ F(x) = \frac{F(c)}{0!}(x - c)^0 + \frac{F'(c)}{1!}(x - c)^1 + \frac{F''(c)}{2!}(x - c)^2 + \cdots + \frac{F^{(k+1)}(s)}{(k+1)!}(x - c)^{k+1}. \]
There are a total of $k + 2$ terms in the above expression, and the first $k + 1$ terms,
\[ p_k(x) := \frac{F(c)}{0!}(x - c)^0 + \frac{F'(c)}{1!}(x - c)^1 + \cdots + \frac{F^{(k)}(c)}{k!}(x - c)^k, \]
form a polynomial of degree $k$. This polynomial is considered as an approximation
of the original function $F(x)$. The last term,
\[ R(x) := \frac{F^{(k+1)}(s)}{(k+1)!}(x - c)^{k+1}, \]
represents the difference between $F(x)$ and the $k$th degree polynomial $p_k(x)$. Although
we don’t know the exact value of $s$, this expression lets us bound the approximation
error. The maximum possible error is given by:
\[ \text{Maximum Error} = \max_{s \in (c, x)} \left| \frac{F^{(k+1)}(s)}{(k+1)!}(x - c)^{k+1} \right|. \]
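The maximum error formula can be tried on a concrete function. The sketch below uses $F(x) = e^x$ around $c = 0$ with $k = 3$ and $x = 0.5$; these choices are assumptions made only for illustration. Every derivative of $e^x$ is $e^x$, and $e^x$ is increasing, so the maximum over $s$ is taken at the larger endpoint.

```python
import math

def taylor_polynomial_exp(x, k, c=0.0):
    """p_k(x) for F = exp around c (every derivative of exp is exp)."""
    return sum(math.exp(c) * (x - c) ** j / math.factorial(j) for j in range(k + 1))

def max_error_bound_exp(x, k, c=0.0):
    """Bound from the remainder term; exp is increasing, so use the larger endpoint."""
    largest_derivative = math.exp(max(c, x))
    return largest_derivative * abs(x - c) ** (k + 1) / math.factorial(k + 1)

x, k = 0.5, 3
approximation = taylor_polynomial_exp(x, k)
actual_error = abs(math.exp(x) - approximation)
bound = max_error_bound_exp(x, k)
print(f"p_{k}({x}) = {approximation:.8f}, actual error = {actual_error:.2e}, bound = {bound:.2e}")
```

The actual error stays below the bound, which is all the remainder term promises without knowing $s$.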

Question 11.1. If the function $F$ is differentiable $(k+1)$ times, we can construct a
degree $k + 1$ approximation polynomial $p_{k+1}$ and forego the error estimation. Which
would be better? Having a polynomial of one degree lower with an estimation of
error, or having a polynomial of the exact degree?

We can write Taylor’s theorem in a simpler form. If $c = 0$ and $x = h > 0$, then we
can write:
\[ F(h) = \frac{F(0)}{0!}h^0 + \frac{F'(0)}{1!}h^1 + \cdots + \frac{F^{(k)}(0)}{k!}h^k + \frac{F^{(k+1)}(s)}{(k+1)!}h^{k+1}, \qquad 0 \le s \le h. \]
In particular, if $c = 0$ and $x = 1$, Taylor’s theorem can be written as follows:
\[ F(1) = \frac{F(0)}{0!} + \frac{F'(0)}{1!} + \cdots + \frac{F^{(k)}(0)}{k!} + \frac{F^{(k+1)}(s)}{(k+1)!}, \qquad 0 \le s \le 1. \]

Problem 11.1. In the above description, we only considered scalar-valued functions
$F : (a, b) \to \mathbb{R}$. Describe Taylor’s theorem for vector-valued functions
$\mathbf{g} : (a, b) \to \mathbb{R}^m$.

Solution 11.1 For vector functions, Taylor’s polynomial can be written in the same
way as for scalar functions, but there cannot be an error term. Instead, we should
consider the directional derivative.

11.2 Higher order directional derivatives

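Let $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^m$ be a function that is $(k + 1)$ times continuously differentiable at
$\mathbf{c} \in \mathbb{R}^n$, and let $\mathbf{u} \in \mathbb{R}^n$ be a unit vector. The first-order directional derivative
$D_{\mathbf{u}} \mathbf{f}(\mathbf{c})$ is given by:
\[ D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = \lim_{h \to 0} \frac{\mathbf{f}(\mathbf{c} + h\mathbf{u}) - \mathbf{f}(\mathbf{c})}{h} = \left. \frac{d}{dt} \mathbf{f}(\mathbf{c} + t\mathbf{u}) \right|_{t=0}. \]
Let $\mathbf{g}(t) = \mathbf{c} + t\mathbf{u}$; then $\mathbf{f}(\mathbf{c} + t\mathbf{u}) = \mathbf{f} \circ \mathbf{g}(t)$ and $\nabla \mathbf{g}(t) = \mathbf{u}$. Therefore, using the
chain rule, we have:
\[ D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = \left. \frac{d}{dt} \mathbf{f}(\mathbf{c} + t\mathbf{u}) \right|_{t=0} = \nabla \mathbf{f}(\mathbf{c} + t\mathbf{u})\,\mathbf{u}\,\big|_{t=0}. \]
To compute higher-order directional derivatives, we can separate terms with and
without $t$. A good way to write this is as follows:
\begin{align*}
D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) &= \nabla \mathbf{f}(\mathbf{c} + t\mathbf{u})\,\mathbf{u}\,\big|_{t=0} \\
 &= \bigl(u_1 D_1 \mathbf{f}(\mathbf{c} + t\mathbf{u}) + \cdots + u_n D_n \mathbf{f}(\mathbf{c} + t\mathbf{u})\bigr)\big|_{t=0}.
\end{align*}
In conclusion, computing the directional derivative $D_{\mathbf{u}} \mathbf{f}(\mathbf{c})$ is equivalent to applying
the operator $(u_1 D_1 + \cdots + u_n D_n)$ to the function $\mathbf{f}$. That is,
\[ D_{\mathbf{u}} \mathbf{f}(\mathbf{c}) = (u_1 D_1 + \cdots + u_n D_n)\mathbf{f}(\mathbf{c}). \tag{11.1} \]
The second-order directional derivative is given by:
\[ D_{\mathbf{u}}(D_{\mathbf{u}} \mathbf{f}(\mathbf{c})) = (u_1 D_1 + \cdots + u_n D_n)(u_1 D_1 + \cdots + u_n D_n)\mathbf{f}(\mathbf{c}). \]
More generally, computing the $k$th order directional derivative in the direction $\mathbf{u}$ is
equivalent to applying the operator $(u_1 D_1 + \cdots + u_n D_n)^k$ to the function $\mathbf{f}$:
\[ D_{\mathbf{u}}^k \mathbf{f}(\mathbf{c}) = (u_1 D_1 + \cdots + u_n D_n)^k \mathbf{f}(\mathbf{c}). \tag{11.2} \]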

Problem 11.2. Let $f(x, y) = \sin x \sin y$ and $\mathbf{u} = (u_1; u_2)$ be given. Compute the
directional derivatives of $f$ at the origin in the direction $\mathbf{u}$ up to the third order.

Solution 11.2 The 0th derivative is $f(0, 0) = 0$. The 1st derivative is:
\[ D_{\mathbf{u}} f(0, 0) = (u_1 D_1 f(0, 0) + u_2 D_2 f(0, 0)) = (u_1 \cos(0)\sin(0) + u_2 \sin(0)\cos(0)) = 0. \]
The 2nd derivative is:
\begin{align*}
D_{\mathbf{u}}^2 f(0, 0) &= (u_1 D_1 + u_2 D_2)^2 f(0, 0) \\
 &= (u_1^2 D_1^2 + 2u_1 u_2 D_1 D_2 + u_2^2 D_2^2) f(0, 0) \\
 &= (-u_1^2 \sin(0)\sin(0) + 2u_1 u_2 \cos(0)\cos(0) - u_2^2 \sin(0)\sin(0)) \\
 &= 2u_1 u_2.
\end{align*}
The 3rd derivative is:
\begin{align*}
D_{\mathbf{u}}^3 f(0, 0) &= (u_1 D_1 + u_2 D_2)^3 f(0, 0) \\
 &= (u_1^3 D_1^3 + 3u_1^2 u_2 D_1^2 D_2 + 3u_1 u_2^2 D_1 D_2^2 + u_2^3 D_2^3) f(0, 0) \\
 &= -u_1^3 \cos(0)\sin(0) - 3u_1^2 u_2 \sin(0)\cos(0) - 3u_1 u_2^2 \cos(0)\sin(0) \\
 &\quad - u_2^3 \sin(0)\cos(0) \\
 &= 0.
\end{align*}
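The derivatives in Solution 11.2 can be compared with finite differences of the one-variable function $F(t) = f(t u_1, t u_2)$, whose $k$th derivative at $t = 0$ is the $k$th order directional derivative. The direction $\mathbf{u} = (0.6, 0.8)$ and the step size below are assumptions for illustration.

```python
import math

def f(x, y):
    return math.sin(x) * math.sin(y)

def directional_derivatives_at_origin(u1, u2):
    """Orders 0 to 3 as obtained in Solution 11.2."""
    return (0.0, 0.0, 2.0 * u1 * u2, 0.0)

def numeric_derivatives_at_origin(u1, u2, h=1e-2):
    """Central finite differences of F(t) = f(t*u1, t*u2) at t = 0."""
    def F(t):
        return f(t * u1, t * u2)
    d1 = (F(h) - F(-h)) / (2.0 * h)
    d2 = (F(h) - 2.0 * F(0.0) + F(-h)) / h ** 2
    d3 = (F(2.0 * h) - 2.0 * F(h) + 2.0 * F(-h) - F(-2.0 * h)) / (2.0 * h ** 3)
    return (F(0.0), d1, d2, d3)

u1, u2 = 0.6, 0.8  # an assumed unit vector
formula = directional_derivatives_at_origin(u1, u2)
numeric = numeric_derivatives_at_origin(u1, u2)
print(formula)
print(numeric)
```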




11.3 Taylor’s formula for n variable functions

The case of two variables or n variables is almost identical. Therefore, we consider


the case of n variables.

Definition 11.1. A set D ⊂ Rn is said to be convex if the line segment connecting


any two points c, x ∈ D is always contained in D.

Problem 11.3. What does it mean for a function f : D ⊂ Rn → R to be (k + 1) times


continuously differentiable?

Solution 11.3 □

Now, we introduce the Taylor theorem for multivariable functions in theorem


form.

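Theorem 11.1 (Taylor’s theorem). Let $\mathbf{f} : D \subset \mathbb{R}^n \to \mathbb{R}^m$ be $(k + 1)$ times
continuously differentiable, and let $D \subset \mathbb{R}^n$ be an open convex set with $\mathbf{c}, \mathbf{x} \in D$. Then,
there exists a point $\mathbf{s}$ between $\mathbf{c}$ and $\mathbf{x}$ such that the following holds:
\[ \mathbf{f}(\mathbf{x}) = p_k(\mathbf{c}, \mathbf{x}) + R_k(\mathbf{c}, \mathbf{x}), \]
where
\begin{align*}
p_k(\mathbf{c}, \mathbf{x}) &:= \sum_{\ell=0}^{k} \frac{((x_1 - c_1)D_1 + \cdots + (x_n - c_n)D_n)^{\ell}\, \mathbf{f}(\mathbf{c})}{\ell!}, \\
R_k(\mathbf{c}, \mathbf{x}) &:= \frac{((x_1 - c_1)D_1 + \cdots + (x_n - c_n)D_n)^{k+1}\, \mathbf{f}(\mathbf{s})}{(k+1)!}.
\end{align*}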

Problem 11.4. Prove Theorem 11.1.

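Solution 11.4 For $0 \le t \le 1$, define
\[ F(t) = \mathbf{f}(\mathbf{c} + t(\mathbf{x} - \mathbf{c})). \]
Then, since $D$ is convex and open, there exists a small positive number $\varepsilon > 0$ such
that $\mathbf{c} + t(\mathbf{x} - \mathbf{c}) \in D$ for $t \in (-\varepsilon, 1 + \varepsilon)$, and thus $F$ is $k + 1$ times continuously
differentiable on $(-\varepsilon, 1 + \varepsilon)$. Therefore, by the 1-variable Taylor theorem, there
exists $s$ with $0 \le s \le 1$ such that
\[ F(1) = \frac{F(0)}{0!} + \frac{F'(0)}{1!} + \cdots + \frac{F^{(k)}(0)}{k!} + \frac{F^{(k+1)}(s)}{(k+1)!}. \]
The derivative $F^{(\ell)}(0)$ is given by the $\ell$th directional derivative of $\mathbf{f}$ at $\mathbf{c}$ in the
direction $(\mathbf{x} - \mathbf{c})$ (more precisely, if $\|\mathbf{x} - \mathbf{c}\| \neq 1$, it’s not a directional derivative). That
is,
\[ F^{(\ell)}(0) = ((x_1 - c_1)D_1 + \cdots + (x_n - c_n)D_n)^{\ell}\, \mathbf{f}(\mathbf{c}). \]
In the same way, $F^{(k+1)}(s) = ((x_1 - c_1)D_1 + \cdots + (x_n - c_n)D_n)^{k+1}\, \mathbf{f}(\mathbf{s})$ at the point
$\mathbf{s} = \mathbf{c} + s(\mathbf{x} - \mathbf{c})$ between $\mathbf{c}$ and $\mathbf{x}$. Since $F(1) = \mathbf{f}(\mathbf{c} + 1(\mathbf{x} - \mathbf{c})) = \mathbf{f}(\mathbf{x})$,
substituting these into the above equation yields Theorem 11.1. □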

Problem 11.5. Find the 2nd order approximation of the function f (x, y) = sin x sin y
at the origin. Estimate the error of the 2nd order approximation when |x| < 0.1 and
|y| < 0.1.

Solution 11.5 Let’s use the calculations from Problem 11.2. In this case, it corresponds
to the case $(u_1, u_2) = (x, y)$. We already saw that the 0th and 1st terms are 0.
The 2nd term is $2xy$. Therefore, the 2nd order approximation is
\[ p_2(x, y) = \frac{1}{2!}\, 2xy = xy. \]
The approximation error is given by:
\[ |f(x, y) - xy| = \frac{1}{3!} \left| x^3 \cos s \sin t + 3x^2 y \sin s \cos t + 3xy^2 \cos s \sin t + y^3 \sin s \cos t \right|, \]
where $(s, t)$ is a point between the origin and $(x, y)$. Since the sines and cosines are
less than or equal to 1 in absolute value, and $|x|$ and $|y|$ are less than or equal to 0.1,
we can obtain the following estimate:
\[ |f(x, y) - xy| \le \frac{1}{3!}\, 8\, (0.1)^3\, 1^2 = \frac{0.008}{6}. \]
This calculation is simple and suitable as an example. Generally, to estimate the
error manually, many calculations are required and it takes a long time. □
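The estimate in Solution 11.5 can be compared with the error actually observed on the square $|x| \le 0.1$, $|y| \le 0.1$. The grid resolution in this sketch is an arbitrary assumption, and a grid search only samples the square, so it illustrates the bound rather than proving it.

```python
import math

def f(x, y):
    return math.sin(x) * math.sin(y)

def p2(x, y):
    # Second-order approximation from Solution 11.5.
    return x * y

bound = 8.0 * 0.1 ** 3 / math.factorial(3)  # 0.008 / 6

def max_error_on_grid(n=200, radius=0.1):
    """Largest |f - p2| over an (n+1) x (n+1) grid on [-radius, radius]^2."""
    worst = 0.0
    for i in range(n + 1):
        x = -radius + 2.0 * radius * i / n
        for j in range(n + 1):
            y = -radius + 2.0 * radius * j / n
            worst = max(worst, abs(f(x, y) - p2(x, y)))
    return worst

observed = max_error_on_grid()
print(f"observed max error = {observed:.3e}, bound = {bound:.3e}")
```

The observed error is much smaller than the bound, which is consistent with the hint in Problem 11.6 that the bound is not optimal.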

Problem 11.6. The answer given in Problem 11.5 matches the answer in Thomas’
14th edition. However, since $D_{\mathbf{u}}^3 f(0, 0) = 0$, it is not the optimal answer. What is the
optimal answer?

Solution 11.6 Can you understand what I’m saying? If you understand the Taylor
formula, you should be able to understand it. Need more hints? If so, does the 2nd
order approximation $xy$ correspond to $p_2$ or $p_3$? □
Part III
Integration of Multi-variable Functions
Lecture 12
Double and Iterated Integrals on Rectangular
Coordinates

In this lecture, we extend the one-dimensional Riemann integral to two-dimensional


space. After this, we can understand how to extend integration to spaces of three
dimensions or more on our own. Through this lecture, it is crucial to clearly un-
derstand the differences between double integrals and iterated integrals. Fubini’s
theorem states that if a function is continuous, then the two integrals have the same
value. Also, in this lecture, let’s pay attention to the indexing method. We need to
develop a notation style and become familiar with using indices.

12.1 Riemann integral in R1 (Review)

Summarizing the one-dimensional Riemann integral from Calculus 1 is helpful
for understanding the two-dimensional integral. In one-dimensional space, the only
bounded and connected region is an interval. As a result, integration in one dimension
becomes very simple. Let $f : [a, b] \to \mathbb{R}$ be a real-valued function defined on the closed
interval $[a, b]$. To define the Riemann integral, we first consider a partition. A set of
grid points
\[ \pi = \{x_0, x_1, \cdots, x_p\} \]
is a partition of the interval $[a, b]$ if the first point is $x_0 = a$, the last point is $x_p = b$,
and the points in between satisfy
\[ x_0 < x_1 < x_2 < \cdots < x_p. \]
Then, we can consider $p$ sub-intervals. The $i$-th sub-interval is $[x_{i-1}, x_i]$. The length
of a sub-interval is denoted as follows:
\[ \Delta x_i := x_i - x_{i-1}, \qquad i = 1, \cdots, p. \]


The gauge (or mesh) of the partition is defined and denoted as follows:
\[ \|\pi\| := \max_{1 \le i \le p} \Delta x_i. \]
This is the size of the largest sub-interval. Therefore, $\|\pi\| \to 0$ means that the
size of all sub-intervals converges to 0. The Riemann sum of the function $f$ for a
given partition $\pi$ is defined as follows:
\[ S(\pi, \{c_i\}) := \sum_{i=1}^{p} f(c_i)\,\Delta x_i, \qquad c_i \in [x_{i-1}, x_i]. \]
It is determined by the given partition $\pi$ and the choice of point $c_i$ in each
sub-interval. Finally, the Riemann integral of the function $f$ over the interval $[a, b]$ is
denoted and defined as follows:
\[ \int_a^b f(x)\,dx := \lim_{\|\pi\| \to 0} S(\pi, \{c_i\}). \]

If the limit exists regardless of how the choices ci ∈ [xi−1 , xi ] are made, we say that
the function f is Riemann integrable over the interval [a, b]. The following theorem
introduces examples of functions for which Riemann integrals are possible.

Theorem 12.1. A function f : [a, b] → R is Riemann integrable in the following


cases:
1. If f is continuous on the closed interval [a, b].
2. If f is bounded on [a, b] and has a finite number of discontinuities.
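The definitions of partition, gauge, and Riemann sum translate directly into code. The following sketch is illustrative only: the integrand $f(x) = x^2$ on $[0, 1]$, the uniform partition, and the helper names are assumptions. It evaluates $S(\pi, \{c_i\})$ for three different choices of the points $c_i$.

```python
import random

def uniform_partition(a, b, p):
    # Grid points x_0 = a < x_1 < ... < x_p = b with equal spacing.
    return [a + (b - a) * i / p for i in range(p + 1)]

def gauge(partition):
    # ||pi||: length of the largest sub-interval.
    return max(partition[i] - partition[i - 1] for i in range(1, len(partition)))

def riemann_sum(f, partition, choose):
    """S(pi, {c_i}) where c_i = choose(x_{i-1}, x_i) lies in the i-th sub-interval."""
    return sum(
        f(choose(partition[i - 1], partition[i])) * (partition[i] - partition[i - 1])
        for i in range(1, len(partition))
    )

def f(x):
    return x * x

partition = uniform_partition(0.0, 1.0, 1000)
rng = random.Random(0)
left = riemann_sum(f, partition, lambda lo, hi: lo)
right = riemann_sum(f, partition, lambda lo, hi: hi)
sampled = riemann_sum(f, partition, lambda lo, hi: rng.uniform(lo, hi))
print(gauge(partition), left, sampled, right)
```

Because this $f$ is continuous, all three sums approach the same value, $\frac{1}{3}$, as the gauge shrinks, regardless of how the $c_i$ are chosen.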

Next, let’s examine some examples of functions for which Riemann integrability
fails.

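Problem 12.1. It is possible to create a bounded function that is not Riemann
integrable, but such functions are usually constructed in strange ways. Show that the
following bounded function $f : [0, 1] \to \mathbb{R}$ is not integrable:
\[ f(x) = \begin{cases} 0, & \text{if } x \in \mathbb{Q}, \\ 1, & \text{otherwise.} \end{cases} \]

Solution 12.1 This function is quite strange. It is 0 at rational numbers and 1 at
irrational numbers. Showing that it is not Riemann integrable is straightforward. We
can demonstrate that the limit varies depending on how the choices $c_i \in [x_{i-1}, x_i]$ are
made: every sub-interval contains both rational and irrational numbers, so choosing
every $c_i$ rational gives the Riemann sum 0, while choosing every $c_i$ irrational gives the
Riemann sum 1. □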

Most naturally occurring bounded functions are integrable on an interval. The


cases where Riemann integrability fails often involve functions that are unbounded
or when the integration interval is infinite.

Problem 12.2. Among unbounded functions, there are many natural functions that
are not Riemann integrable. The integrability depends on how the function behaves
near the point where it diverges. Show that the function
\[ f(x) = \begin{cases} x^r, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0 \end{cases} \]
is integrable on $[-1, 1]$ if $r > -1$, and not integrable if $r \le -1$.

Solution 12.2 For all $r < 0$, the function $f$ diverges near $x = 0$. When $r$ is close to 0
(that is, $-1 < r < 0$), the divergence is mild enough that the integration is possible, but
further away, when $r \le -1$, the integration diverges. □

12.2 Double integral (or Riemann integral in 2-D)

In one-dimensional space, the only bounded and connected region is an interval.


However, in two-dimensional space, there are various possibilities. The simplest
case is a rectangular region. Let’s denote a rectangular region R as follows:

R := [a, b] × [c, d].

The rectangular region R can be easily partitioned. First, let

\[
\pi_1 = \{x_0, x_1, \dots, x_{p_1}\} \quad\text{and}\quad \pi_2 = \{y_0, y_1, \dots, y_{p_2}\}
\]

be partitions of [a, b] and [c, d], respectively, and write

\[
\Delta x_i := x_i - x_{i-1}, \quad i = 1, \dots, p_1, \qquad
\Delta y_j := y_j - y_{j-1}, \quad j = 1, \dots, p_2 .
\]

We define small rectangles as follows:

\[
A_{ij} := [x_{i-1}, x_i] \times [y_{j-1}, y_j],
\]

and their sizes (areas) are defined as follows:

\[
\Delta A_{ij} = \Delta x_i \,\Delta y_j .
\]

Then, the partition of the rectangle R consists of $p_1 \times p_2$ small rectangles. This
partition is denoted as $\pi = \pi_1 \times \pi_2$, and the gauge (size, mesh) is defined as follows:

\[
\|\pi\| = \max_{1\le i\le p_1,\ 1\le j\le p_2} \Delta A_{ij} .
\]

The two-dimensional Riemann sum is defined and denoted as follows:

\[
S(\pi, \{c_{ij}\}) = \sum_{i,j=1}^{p_1,p_2} f(c_{ij})\,\Delta A_{ij}, \qquad c_{ij} \in A_{ij} .
\]

Finally, the Riemann integral of the function f over the rectangular region R =
[a, b] × [c, d] is defined and denoted as follows:

\[
\iint_R f(x, y)\,dx\,dy := \lim_{\|\pi\|\to 0} S(\pi, \{c_{ij}\}) .
\]

Here, the limit must exist as ∥π∥ approaches 0, independently of the choice of the
points $c_{ij}$, for the integral to be defined, and in this case, we say that the function
f is Riemann integrable over R = [a, b] × [c, d]. Such integration in two-dimensional
space is called a double integral. Triple integrals in three dimensions, and their
higher-dimensional analogues, are Riemann integrals defined in the same way.
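
The definition above translates almost literally into a short program. The sketch below is an illustration under stated assumptions, not part of the original notes: `riemann_sum_2d` is a hypothetical helper name, the partition is uniform, and the tag $c_{ij}$ is taken to be the midpoint of each cell (any other point of the cell would be equally admissible). The integrand x² + y² on [0, 1] × [0, 1] is the one from Problem 12.6.

```python
def riemann_sum_2d(f, a, b, c, d, p1, p2):
    """Riemann sum S(pi, {c_ij}) of f over [a, b] x [c, d].

    Assumptions of this sketch: uniform partition with p1 x p2 cells,
    and the tag c_ij is the midpoint of the cell A_ij.
    """
    dx = (b - a) / p1
    dy = (d - c) / p2
    total = 0.0
    for i in range(1, p1 + 1):
        x_mid = a + (i - 0.5) * dx
        for j in range(1, p2 + 1):
            y_mid = c + (j - 0.5) * dy
            total += f(x_mid, y_mid) * dx * dy  # f(c_ij) * Delta A_ij
    return total


def f(x, y):
    return x * x + y * y


coarse = riemann_sum_2d(f, 0.0, 1.0, 0.0, 1.0, 4, 4)
fine = riemann_sum_2d(f, 0.0, 1.0, 0.0, 1.0, 200, 200)
print(coarse, fine)  # the sums approach 2/3 as the mesh shrinks
```

Refining the partition moves the sum toward the value of the double integral, which is the content of the limit ∥π∥ → 0.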
The following theorem lists cases in which a function is Riemann integrable.

Theorem 12.2. A function f : R = [a, b] × [c, d] → ℝ is Riemann integrable if it
satisfies one of the following conditions:
1. f is continuous on R.
2. f is bounded on R and is continuous except on a finite number of smooth
curves.

What does it mean to have a finite number of smooth curves? The term “finite
number” is clear, and “smooth curve” means the image of a differentiable function
r : [a, b] → ℝ^2. In other words, it refers to a parameterized curve whose
parameterization is differentiable.

Problem 12.3. What kinds of functions are not Riemann integrable?

Solution 12.3 Let’s consider some examples. □




12.3 Iterated Integrals

Iterated integrals bear resemblance to partial derivatives. When performing an
iterated integral, one variable is treated as the variable of integration while the
other is treated as a constant.

Problem 12.4. Compute the following iterated integrals.

(a) $\int_0^1 \int_0^2 (4 - x - y)\,dx\,dy$ (b) $\int_0^2 \int_0^1 (4 - x - y)\,dy\,dx$

Solution 12.4 (a) The iterated integral can be expressed as follows:

\[
\int_0^1\!\!\int_0^2 (4 - x - y)\,dx\,dy = \int_0^1 \left( \int_0^2 (4 - x - y)\,dx \right) dy .
\]

This means that we first perform the inner integral $\int_0^2 (4 - x - y)\,dx$. In this case,
other variables like y are treated as constants. By doing so, we obtain the following
result:

\[
\int_0^2 (4 - x - y)\,dx = \Big[\, 4x - \tfrac12 x^2 - yx \Big]_{x=0}^{x=2} = (8 - 2 - 2y) - (0) = 6 - 2y .
\]

Now, we proceed with the outer integral:

\[
\int_0^1 \left( \int_0^2 (4 - x - y)\,dx \right) dy = \int_0^1 (6 - 2y)\,dy = \Big[\, 6y - y^2 \Big]_{y=0}^{y=1} = (6 - 1) - (0) = 5,
\]

which is the final answer.


Part (b) asks us to compute the iterated integral with the order of integration
reversed from (a). Are the two always the same? Let’s calculate. They are not always
equal, but in most cases they are. When are they equal? Fubini’s Theorem provides
the answer. □
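
For comparison with (a), the computation for part (b) can be sketched as follows. This worked calculation is added here for illustration and follows the same steps with the order of integration reversed.

```latex
\[
\int_0^2\!\!\int_0^1 (4-x-y)\,dy\,dx
= \int_0^2 \Big[\,4y - xy - \tfrac12 y^2\Big]_{y=0}^{y=1} dx
= \int_0^2 \Big(\tfrac72 - x\Big)\,dx
= \Big[\tfrac72 x - \tfrac12 x^2\Big]_{x=0}^{x=2}
= 7 - 2 = 5 .
\]
```

Both orders of integration give 5 for this continuous integrand.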

Iterated integrals were defined independently of the double integral discussed in
the previous section, but they are conceptually similar. The first (inner) integral
calculates the area of a cross-section, while the second (outer) integral integrates the
cross-sectional areas to give the volume. Writing the Riemann sum as follows makes
them look more similar:

\[
S(\pi_1 \times \pi_2, \{c_{ij}\}) = \sum_{i,j=1}^{p_1,p_2} f(c_{ij})\,\Delta x_i\,\Delta y_j
= \sum_{j=1}^{p_2} \left( \sum_{i=1}^{p_1} f(c_{ij})\,\Delta x_i \right) \Delta y_j .
\]

Thus, the inner summation corresponds to the inner integral, and the outer
summation corresponds to the outer integral. This correspondence is made rigorous
by Fubini’s Theorem.

Theorem 12.3 (Fubini’s Theorem). If a function f : R ⊂ ℝ^2 → ℝ is Riemann
integrable on a rectangle R = [a, b] × [c, d], then the following equality holds:

\[
\iint_R f\,dx\,dy = \int_a^b\!\!\int_c^d f(x, y)\,dy\,dx = \int_c^d\!\!\int_a^b f(x, y)\,dx\,dy .
\]

The proof of Fubini’s Theorem is covered in analysis courses. When computing by
hand, we do not evaluate double integrals directly from the definition of the Riemann
integral; we use iterated integrals instead. Computers, on the other hand, typically
approximate the integral numerically with sums of Riemann type.

Problem 12.5. Given R = [0, 2] × [−1, 1] and f(x, y) = 10 − 6xy, find $\iint_R f(x, y)\,dx\,dy$.

Solution 12.5 Since the function f (x, y) = 10 − 6xy is continuous on R, we can use
Fubini’s Theorem. Therefore,
\[
\iint_R f(x, y)\,dx\,dy = \cdots
\]
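
One way to carry out the omitted computation is sketched below. The choice to integrate in y first is an assumption of this sketch; by Fubini’s Theorem the other order gives the same value.

```latex
\[
\iint_R (10-6xy)\,dx\,dy
= \int_0^2\!\!\int_{-1}^{1} (10-6xy)\,dy\,dx
= \int_0^2 \Big[\,10y - 3xy^2\Big]_{y=-1}^{y=1} dx
= \int_0^2 20\,dx = 40 .
\]
```

The term −6xy contributes nothing because it is odd in y and the y-interval [−1, 1] is symmetric.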


Problem 12.6. Find the volume of the region under the graph of the function
f(x, y) = x^2 + y^2 over the domain R = [0, 1] × [0, 1].

Solution 12.6 The volume is given by $\iint_R (x^2 + y^2)\,dx\,dy$. Since the function is
continuous on R, we can use Fubini’s Theorem to calculate it. □


Properties of double integrals

Many properties of one-dimensional Riemann integrals also hold for double inte-
grals, as well as integrals of higher dimensions. These are fundamental properties of
integrals that apply to all types of integration methods, including Riemann integrals.
1. Constant multiple rule

\[
\iint_R c\,f(x, y)\,dx\,dy = c \iint_R f(x, y)\,dx\,dy .
\]

2. Sum rule

\[
\iint_R \big( f(x, y) \pm g(x, y) \big)\,dx\,dy = \iint_R f(x, y)\,dx\,dy \pm \iint_R g(x, y)\,dx\,dy .
\]

3. Comparison principle

\[
\iint_R f(x, y)\,dx\,dy \ge \iint_R g(x, y)\,dx\,dy \quad\text{if } f(x, y) \ge g(x, y) \text{ on } R .
\]

4. Domain split. If $R = R_1 \cup R_2$ and $R_1 \cap R_2 = \emptyset$, then

\[
\iint_R f(x, y)\,dx\,dy = \iint_{R_1} f(x, y)\,dx\,dy + \iint_{R_2} f(x, y)\,dx\,dy .
\]

Problem 12.7.

Solution 12.7 □

Lecture 13
Double Integration over a General Domain

When integrating a function defined on a rectangular domain, we can separate the
integration into one-dimensional cases. However, when dealing with domains of
more general shapes rather than rectangular ones, defining the integration is not
straightforward. Let’s first consider when integration is possible. Let D ⊂ ℝ^2 be a
bounded domain. Then, we can find a rectangular region R containing D. We define
a function h(x, y) on R as follows:

\[
h(x, y) = \begin{cases} f(x, y), & \text{if } (x, y) \in D, \\ 0, & \text{otherwise.} \end{cases} \tag{13.1}
\]

Then, we aim to define the integral of the function f : D → ℝ as the integral of
h : R → ℝ:

\[
\iint_D f(x, y)\,dx\,dy = \iint_R h(x, y)\,dx\,dy .
\]

If D is a bounded closed domain and f is continuous on D, will the double integral
$\iint_R h(x, y)\,dx\,dy$ be well-defined? Not necessarily. Even if f is continuous on D, h
may not be continuous on R: it is typically discontinuous along the boundary ∂D.
Therefore, we cannot be sure if h is integrable on R. If the boundary ∂D of the domain
D consists of a finite number of smooth curves, then integration is possible by
Theorem 12.2. However, if D has a complex structure, remember that h may not
be integrable on R. In this lecture, let’s practice integration over bounded closed
domains D whose boundaries consist of a finite number of smooth curves.

13.1 Two Types of Domains

Even among general domains, there are cases where iterated integrals are more
convenient. Let’s explore some of those cases and practice integration. First, consider a
domain enclosed by two lines parallel to the y-axis and two graphs of the form y = g(x).


Problem 13.1. Let $D = \{(x, y) : a \le x \le b,\ g_1(x) \le y \le g_2(x)\}$ and f : D → ℝ be
continuous on D. Show that

\[
\iint_D f(x, y)\,dx\,dy = \int_a^b\!\!\int_{g_1(x)}^{g_2(x)} f(x, y)\,dy\,dx, \tag{13.2}
\]

and explain the meaning of this equality.

Solution 13.1 First, D is contained within a rectangle R = [a, b] × [c, d], where c and
d are chosen so that $c \le g_1(x)$ and $g_2(x) \le d$ on [a, b]. The boundary of D consists of
two vertical line segments and two graphs, each of which is a smooth curve (assuming
$g_1$ and $g_2$ are differentiable). Thus, the function h(x, y) given in (13.1) is bounded and
continuous except on a finite number of smooth curves. Therefore, h(x, y) is Riemann
integrable, and

\[
\iint_D f(x, y)\,dx\,dy = \iint_R h(x, y)\,dx\,dy
\]

is well-defined. Now, let’s compute $\iint_R h(x, y)\,dx\,dy$. Expressing it as an iterated
integral using Fubini’s Theorem, we have

\[
\iint_R h(x, y)\,dx\,dy = \int_a^b\!\!\int_c^d h(x, y)\,dy\,dx .
\]

Since h(x, y) = 0 unless $g_1(x) \le y \le g_2(x)$, and h = f there, the inner integral becomes

\[
\int_c^d h(x, y)\,dy = \int_{g_1(x)}^{g_2(x)} f(x, y)\,dy .
\]

Substituting this into the iterated integral yields the desired result. □


Problem 13.2. Let $D = \{(x, y) : g_1(y) \le x \le g_2(y),\ c \le y \le d\}$ and f : D → ℝ be
continuous on D. Show that

\[
\iint_D f(x, y)\,dx\,dy = \int_c^d\!\!\int_{g_1(y)}^{g_2(y)} f(x, y)\,dx\,dy, \tag{13.3}
\]

and explain the difference from the previous case.

Solution 13.2 This problem is almost identical to the previous one, but let’s practice
it. □

13.2 Examples

To perform actual calculations, we need to decide whether to use (13.2) or (13.3)
depending on the shape of the integration domain D. Both methods may be
applicable in some cases, while only one may be applicable in others. Moreover, we
need to find the functions $g_1$ and $g_2$ that define the boundaries. This takes some practice.

Problem 13.3. Find the volume of the tetrahedron whose vertices are the origin and
the endpoints of the three unit vectors i, j, and k, using double integration.

Solution 13.3 Let’s first compute it without integration. Since the area of the base
is 0.5 and the height is 1, the formula for the volume of a cone,
$\frac13 \times \text{base} \times \text{height}$, gives $\frac16$. Now, let’s compute it using integration. First, find
the equation of the plane passing through the three vertices i, j, and k. The vector
perpendicular to the plane is (1, 1, 1), and since it passes through the point (0, 0, 1),
the equation of the plane is given by

\[
(x, y, z - 1) \cdot (1, 1, 1) = x + y + z - 1 = 0 .
\]

Thus, z = 1 − x − y. The base is bounded by the x-axis, the y-axis, and the line y =
1 − x. Hence, in terms of integration, we have

\[
\int_0^1\!\!\int_0^{1-x} (1 - x - y)\,dy\,dx
= \int_0^1 \Big[\, y - xy - \tfrac12 y^2 \Big]_{y=0}^{y=1-x} dx
= \int_0^1 \Big( \tfrac12 x^2 - x + \tfrac12 \Big)\,dx = \frac16 .
\]
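
Formula (13.2) can also be checked numerically. The following sketch is an illustration only: `integrate_type1` is a hypothetical helper name, and it approximates both the inner and the outer integral with midpoint sums. It is applied to the tetrahedron above, where $g_1(x) = 0$ and $g_2(x) = 1 - x$.

```python
def integrate_type1(f, a, b, g1, g2, n=400):
    """Approximate the iterated integral (13.2) of f over
    D = {a <= x <= b, g1(x) <= y <= g2(x)}.

    Assumption of this sketch: midpoint sums with n points in x and,
    for each x, n points in y between g1(x) and g2(x).
    """
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        lo, hi = g1(x), g2(x)
        dy = (hi - lo) / n
        inner = 0.0
        for j in range(n):
            y = lo + (j + 0.5) * dy
            inner += f(x, y) * dy      # inner integral in y
        total += inner * dx            # outer integral in x
    return total


volume = integrate_type1(lambda x, y: 1 - x - y, 0.0, 1.0,
                         lambda x: 0.0, lambda x: 1.0 - x)
print(volume)  # close to 1/6
```

Note that the limits of the inner sum depend on x, exactly as the limits of the inner integral do in (13.2).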


Lecture 14
Integration with Variable Changes

In this lecture, we study integration techniques using variable changes. We have
already studied variable changes for one-dimensional integration in Lecture 8. Now
that we have learned two-dimensional integration, it’s time to study variable changes
for two-dimensional integration. However, variable changes for all dimensions can
be explained using the same principle. Let’s develop the concept of variable changes
for general dimensions in this lecture. Specific two-dimensional variable changes
will be studied in the next lecture on surface integrals.

14.1 Volume Expansion Rate

In this section, we explain the intuitive concept of the volume expansion rate that
should be considered when performing integration with variable changes.

As illustrated in the figure above, f : D ⊂ ℝ^n → ℝ is a real-valued function on the
domain D. Alternatively, f can be considered as a function defined on ℝ^n, and we
are considering its integral over a part D ⊂ ℝ^n. Here, D can also be of dimension
lower than n. To define the Riemann integral, we first partition the domain D into
small regions (cells). The collection of these small regions

\[
\pi = \{A_i,\ i = 1, \dots, N\}
\]

is called a partition of the domain D, and the size of the largest divided region
is denoted by ∥π∥. In other words, ∥π∥ → 0 means that the number of divided
regions increases and the size of each region converges to 0. Then, a point $x_i \in A_i$ is
chosen for each divided region. The Riemann integral is then defined and denoted
as follows:

\[
\int_D f(x)\,dx = \lim_{\|\pi\|\to 0} \sum_{i=1}^{N} f(x_i)\,\Delta A_i .
\]

If there exists a function g : G → D for some region G ⊂ ℝ^ℓ which is one-to-one,
onto, and differentiable, then a variable transformation can be considered. Denoting
the variable of G as y ∈ G and x = g(y), the integral of the composite function can be
considered:

\[
\int_G f \circ g(y)\,dy .
\]

Problem 14.1. The function f and the composite function f ◦ g take the same values
at x and y (of course, when x = g(y)). However, the integrals are in general different:

\[
\int_D f(x)\,dx \ne \int_G f \circ g(y)\,dy .
\]

Explain why they are different and how to modify them to make the equation hold.

Solution 14.1 The divided regions $B_i = g^{-1}(A_i)$ are the inverse images of the
divided regions $A_i$, and the $B_i$ form a partition of G. Denoting $y_i = g^{-1}(x_i)$, we have
$f(g(y_i)) = f(x_i)$. However, since the volumes of $A_i$ and $B_i$ are different, we have

\[
\sum_{i=1}^{N} f(x_i)\,\Delta A_i \ne \sum_{i=1}^{N} f(g(y_i))\,\Delta B_i .
\]

If g preserves the volume, the two expressions are equal, but most variable
transformations we consider do not preserve the volume. Therefore, we need to consider the
change in volume between the divided regions, especially when the divided regions
are very small, i.e., when considering the limit ∥π∥ → 0. To make the equation hold,
we need to multiply each term on the right side by the ratio by which g has expanded
the divided region $B_i$:

\[
\sum_{i=1}^{N} f(x_i)\,\Delta A_i
= \sum_{i=1}^{N} f(g(y_i))\,\frac{\Delta A_i}{\Delta B_i}\,\Delta B_i
= \sum_{i=1}^{N} f(g(y_i))\,\frac{\Delta g(B_i)}{\Delta B_i}\,\Delta B_i . \qquad □
\]


Here, $\Delta A_i / \Delta B_i$, or equivalently $\Delta g(B_i) / \Delta B_i$, represents the ratio by which the
small divided region $B_i$ has expanded under the transformation g. Let’s call this ratio
the volume expansion rate. Now, the integral expression becomes

\[
\int_D f(x)\,dx = \int_{G = g^{-1}(D)} f \circ g(y)\,q(y)\,dy . \tag{14.1}
\]

Here, q(y) represents the volume expansion rate around the point y, and it is given
as follows when B is a small region containing y:

\[
q(y) = \lim_{\Delta B \to 0} \frac{\Delta g(B)}{\Delta B}, \quad y \in B . \qquad \text{(Volume Expansion Rate)}
\]

The volume expansion rate is computed using differentiation. The derivative
∇g(c) gives the linear approximation of g at y = c.

Problem 14.2 (Review of Linear Approximation). The function g(y) is not a linear
function of the variable y. Its derivative ∇g(y) is also not a linear function of the
variable y. In what sense, then, is ∇g referred to as a linear approximation?

Solution 14.2 It means that for a fixed c, ∇g(c)y is a linear function of the variable
y. The derivative ∇g(c) is evaluated at the given point c and can be represented as a
constant matrix. Then, the function y → ∇g(c)y is a linear function with respect to y.
Therefore, g(c) + ∇g(c)(y − c) is an approximation of g(y) near c. Alternatively,
∇g(c)(y − c) can be considered as a linear approximation of g(y) − g(c). If the notation
$\nabla g(c)(B_i)$ causes confusion, it can be replaced by $\nabla g|_c(B_i)$. □

When we write ∇g(c), it means a matrix, while we used T for the corresponding
linear function in the definition of differentiation. When B is a small region around the
point c, there is a difference in size between g(B) and the region ∇g(c)(B), or T(B),
that approximates it. However, as B becomes smaller, they become increasingly
similar, and the volume ratio converges to 1.

Problem 14.3. Let B be a small region containing $y_i$. Denote the volume of B as ΔB,
and the volumes of the images of B under the function g and under the linear
approximation $\nabla g|_{y_i}$ as $\Delta(g(B))$ and $\Delta(\nabla g|_{y_i}(B))$, respectively. Show the following:

\[
\lim_{\Delta B \to 0} \frac{\Delta(g(B))}{\Delta B} = \lim_{\Delta B \to 0} \frac{\Delta(\nabla g|_{y_i}(B))}{\Delta B} .
\]

Solution 14.3 Rather than proving this, we present the principle. In the Taylor
expansion of g around $c = y_i$, the linear approximation given by ∇g(c) is the first-order
term. Therefore, the difference g(y) − (g(c) + ∇g(c)(y − c)) is a second-order term.
Since adding the constant g(c) only translates a set and does not change its volume,
we have

\[
\lim_{\Delta B \to 0} \frac{\Delta\big((\nabla g|_{y_i} - g)(B)\big)}{\Delta(B)} = 0 .
\]

This is because the numerator comes from a second-order term while the denominator
comes from a first-order term, and the squared term converges to 0 faster. □

Therefore, instead of taking the limit of $\Delta g(B) / \Delta B$, we take the limit of
$\Delta T(B) / \Delta B$ using the linear approximation. However, linear functions have a very
nice property: the volume expansion rate is the same for all sets B. What does this
mean? It means that for a linear function T, $\Delta(T(B)) / \Delta B$ is constant for all sets B of
nonzero volume. Therefore, there is no need to take a limit, and B doesn’t even need
to contain y. For any such set B, we have the following:

\[
q(y) = \frac{\Delta(T(B))}{\Delta B}, \quad T = \nabla g(y) . \qquad \text{(Volume Expansion Rate)}
\]

Then, what is the most convenient choice for B to compute the volume expansion
rate q(y)? The easiest choice is to take $[0, 1]^\ell$, where each side length is 1, like a
square or a cube. In that case, T(B) becomes a parallelotope in n-dimensional space.
Therefore, we only need to know how to compute the volume of a parallelotope.
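
The claim that a linear function expands every set by the same factor can be tried out in the plane. The sketch below is an illustration under stated assumptions: the matrix `A` is a hypothetical example, the sets B are squares of different sizes and positions, and areas are measured with the shoelace formula for polygons.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    s = 0.0
    n = len(points)
    for k in range(n):
        x0, y0 = points[k]
        x1, y1 = points[(k + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


def apply(A, p):
    """Image of the point p under the linear map with 2 x 2 matrix A."""
    (a, b), (c, d) = A
    x, y = p
    return (a * x + b * y, c * x + d * y)


def expansion_rate(A, corner, side):
    """Area of T(B) divided by the area of B, for a square B."""
    x, y = corner
    square = [(x, y), (x + side, y), (x + side, y + side), (x, y + side)]
    image = [apply(A, p) for p in square]
    return polygon_area(image) / polygon_area(square)


A = ((2.0, 1.0), (0.5, 3.0))  # hypothetical matrix with det(A) = 5.5
rates = [
    expansion_rate(A, (0.0, 0.0), 1.0),     # the unit square
    expansion_rate(A, (4.0, -2.0), 0.01),   # a tiny square elsewhere
    expansion_rate(A, (-7.0, 3.0), 250.0),  # a huge square elsewhere
]
print(rates)  # all three ratios agree with |det(A)| up to round-off
```

The ratio does not depend on the size or the location of the square, which is why the unit cube is enough to determine q(y).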

14.2 Linear function and volume of parallelotopes

Let A be the m × n matrix corresponding to a linear transformation T : ℝ^n → ℝ^m, i.e.,

\[
T(x) = Ax, \quad x \in \mathbb{R}^n .
\]

(In the context of the previous section, it should have been T : ℝ^ℓ → ℝ^n.) Since the
linear transformation T and the matrix A are the same concept, it is not necessary
to use both notations, but sometimes using both makes the formulas easier to read.
Let’s denote an n-dimensional cube with edge length ε > 0 as follows:

\[
\Omega^\varepsilon = [0, \varepsilon]^n .
\]

The volume of a given set S is denoted as ΔS. Then, the volume of the above set is
as follows:

\[
\Delta(\Omega^\varepsilon) = \varepsilon^n .
\]

When ε = 1, we have $\Delta(\Omega^1) = 1$.
We consider parallelepipeds with one vertex at the origin. The cube $\Omega^1$ is a special
parallelepiped in which all edges have length 1 and any two edges meeting at a vertex
are perpendicular to each other. Generally, an n-dimensional parallelepiped is
determined by n linearly independent vectors. These vectors are the n edges of the
parallelepiped connected to the origin. Their lengths and the angles between them do
not need to be the same. For n = 2, the parallelepiped is composed of
$n + n \times (n - 1) = n^2 = 4$ edges. For n = 3, the total number of edges completing the
parallelepiped is

\[
n + n(n-1) + \frac{n(n-1)(n-2)}{2} = 12 .
\]

Problem 14.4. Let’s assume m ≥ n. Explain that the image $T(\Omega^1)$ under a linear
transformation is an n-dimensional parallelepiped in the m-dimensional space. What
are the edges of $T(\Omega^1)$ connected to the origin?

Solution 14.4 Let A be the m × n matrix of the linear transformation T. Then, A can
be written as follows:

\[
A = (a_1, a_2, \dots, a_n),
\]

where the $a_i$ are the column vectors of the matrix A. The n edges of the cube $\Omega^1$
connected to the origin are the standard unit vectors $e_i$, and their images are $Ae_i = a_i$.
In other words, the n columns of the matrix A are the n edges of the n-dimensional
parallelepiped $T(\Omega^1)$ connected to the origin. (Strictly speaking, the condition that the
$a_i$ are linearly independent is needed.) □

The volume of $\Omega^1 \subset \mathbb{R}^n$ is 1. In this case, knowing the volume of $T(\Omega^1) \subset \mathbb{R}^m$
is crucial for integration. The ratio

\[
q = \frac{\Delta T(\Omega^1)}{\Delta \Omega^1}
\]

is called the volume expansion rate of the linear function T. Due to the linearity of
the function, we can show that $q = \Delta T(\Omega^\varepsilon) / \Delta \Omega^\varepsilon$ for all ε > 0. Moreover,
$q = \Delta T(V) / \Delta V$ holds for any set V ⊂ ℝ^n of non-zero volume.

14.2.1 Volume of parallelepiped when m = n

To obtain the volume of an n-dimensional parallelepiped $T(\Omega^1) \subset \mathbb{R}^m$, the
calculation depends on the case. However, when m = n, i.e., when the matrix A
is square, the volume is given by |det(A)|, the absolute value of the determinant of the
matrix. The determinant is computed as follows.

Determinant

The determinant is defined recursively as follows:

1. If A = (a) is a 1 × 1 matrix, then det(A) = a.
2. If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a 2 × 2 matrix, then det(A) = ad − bc.
3. If $A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}$ is a 3 × 3 matrix, then

\[
\det(A) = a \det\begin{pmatrix} e & f \\ h & i \end{pmatrix}
- b \det\begin{pmatrix} d & f \\ g & i \end{pmatrix}
+ c \det\begin{pmatrix} d & e \\ g & h \end{pmatrix}
= a(ei - fh) - b(di - fg) + c(dh - eg) .
\]
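
The recursive pattern in items 1 to 3 extends to any size by expanding along the first row. The following is a minimal sketch of that recursion, with `det` as a hypothetical function name and matrices stored as lists of rows. It is meant to mirror the definition, not to be efficient for large matrices.

```python
def det(A):
    """Determinant by cofactor expansion along the first row.

    A is a square matrix given as a list of rows.
    """
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total


print(det([[1, 2], [3, 4]]))                    # 1*4 - 2*3 = -2
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))  # 2 + 4 - 9 = -3
```

For the 3 × 3 case the three terms of the loop are exactly the three terms in item 3, with alternating signs.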

Problem 14.5 (Volume of parallelepiped when m = n). Show that the volume of
a parallelepiped with edges $a_1, \dots, a_n$ is equal to the absolute value of the
determinant of the matrix A formed by these vectors.

Solution 14.5 Let’s start with the case n = 2, with A written as in the definition of
the 2 × 2 determinant. Rotate the edges so that $a_1$ lies on the x-axis. Then c = 0, and
det(A) = ad. Its absolute value is the base |a| times the height |d|, which matches the
area of the parallelogram. For n = 3, rotate it so that $a_1$ lies on the x-axis. Then
d = g = 0. Rotate the parallelepiped about the axis $a_1$ so that $a_2$ lies in the xy-plane.
Then h = 0, and the determinant of A becomes det(A) = aei. This value may be negative
depending on the signs of a, e, and i, but its absolute value matches the volume. For
higher dimensions, let’s just remember the formula. □

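Problem 14.6. Show the following.

1. $\Delta(T(\Omega^\varepsilon)) = \varepsilon^n\,\Delta(T(\Omega^1))$.
2. $\dfrac{\Delta(T(\Omega^\varepsilon))}{\Delta(\Omega^\varepsilon)} = \dfrac{\Delta(T(\Omega^1))}{\Delta(\Omega^1)}$.

Solution 14.6 □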

14.2.2 Volume of parallelotope when n < m

Let T : ℝ^n → ℝ^m be a linear function with corresponding m × n matrix
$A = (a_1, \dots, a_n)$. The image $T(\Omega^1)$ of the unit cube $\Omega^1$, whose edges are the unit
vectors $e_i \in \mathbb{R}^n$, is a parallelotope in ℝ^m whose edges are the $a_i$. Therefore, the
volume expansion rate by T is the volume of this parallelotope. Let’s find the volume
of this parallelotope below.

Problem 14.7 (Area of parallelogram). When m > 2, show that the area of a
parallelogram in ℝ^m spanned by 2 edges $a_1, a_2 \in \mathbb{R}^m$ is given by the following formula:

\[
\sqrt{(\|a_1\|\,\|a_2\|)^2 - (a_1 \cdot a_2)^2} . \tag{14.2}
\]

Solution 14.7 Firstly, with θ the angle between $a_1$ and $a_2$,

\[
\cos\theta = \frac{a_1 \cdot a_2}{\|a_1\|\,\|a_2\|}
\quad\Rightarrow\quad
\sin\theta = \sqrt{1 - \left( \frac{a_1 \cdot a_2}{\|a_1\|\,\|a_2\|} \right)^2} .
\]

Thus, the area of the parallelogram is

\[
\|a_1\|\,\|a_2\| \sin\theta = \sqrt{(\|a_1\|\,\|a_2\|)^2 - (a_1 \cdot a_2)^2} . \qquad □
\]


Problem 14.8 (Volume of parallelotope when n < m). When n < m, show that the
volume of a parallelotope in ℝ^m spanned by n edges $a_1, \dots, a_n \in \mathbb{R}^m$ is given by the
following Gram determinant:

\[
\|a_1 \times a_2 \times \cdots \times a_n\| = \sqrt{\det(a_i \cdot a_j)} =
\begin{vmatrix}
a_1 \cdot a_1 & \cdots & a_1 \cdot a_n \\
\vdots & & \vdots \\
a_n \cdot a_1 & \cdots & a_n \cdot a_n
\end{vmatrix}^{1/2} \tag{14.3}
\]

Solution 14.8 Let’s just remember the formula without proving it. □
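
Although the proof is skipped, formula (14.3) is easy to evaluate. The sketch below is an illustration only: `gram_volume` and `parallelogram_area` are hypothetical helper names, the vectors are made-up examples, and the small recursive `det` is included just to keep the block self-contained. For two edges it compares (14.3) with (14.2).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det(M):
    """Cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def gram_volume(edges):
    """n-dimensional volume of the parallelotope spanned by `edges`, via (14.3)."""
    gram = [[dot(a, b) for b in edges] for a in edges]
    return math.sqrt(max(det(gram), 0.0))  # clamp tiny negative round-off


def parallelogram_area(a1, a2):
    """Formula (14.2); note (||a1|| ||a2||)^2 = (a1.a1)(a2.a2)."""
    return math.sqrt(dot(a1, a1) * dot(a2, a2) - dot(a1, a2) ** 2)


a1 = (1.0, 2.0, 0.0, -1.0)  # two example edges in R^4
a2 = (1.0, 1.0, 3.0, 2.0)
print(gram_volume([a1, a2]), parallelogram_area(a1, a2))  # the two values agree
```

For n = 2 the Gram matrix is 2 × 2, and its determinant is exactly the expression under the square root in (14.2).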

Question 14.1. Does formula (14.2) for the area of a parallelogram coincide with
the Gram determinant formula (14.3) when n = 2?

Other things

In conclusion, the volume expansion rate q(y) is the volume ΔT(B) for the linear
function T = ∇g(y) and $B = [0, 1]^\ell$, i.e., the volume of the parallelotope T(B) whose
edges are the columns of ∇g(y). If T is a square matrix, this volume is simply
ΔT(B) = |det T|, i.e.,

\[
q(y) = \frac{\Delta(\nabla g|_y(B))}{\Delta B} = |\det(\nabla g(y))| .
\]

The determinant of the derivative, det(∇g(y)), is called the Jacobian. The formula for
variable transformation is as follows:

\[
\int_D f(x)\,dx = \int_G f(g(y))\,q(y)\,dy, \qquad g(G) = D .
\]

If ℓ = n, i.e., T is a square matrix, it becomes as follows:

\[
\int_D f(x)\,dx = \int_G f(g(y))\,|\det(\nabla g(y))|\,dy, \qquad g(G) = D .
\]

Remember to put an absolute value on the Jacobian. That is, Equation (2) of
the Thomas textbook in Section 14.8 is incorrect. This is related to the following
question.

Question 14.2. What is the difference between $\int_a^b f(x)\,dx$ and $\int_{[a,b]} f(x)\,dx$ in
notation?

Solution 14.2 The first notation includes direction dependency such as
$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$, but $\int_{[a,b]} f(x)\,dx$ does not include directionality. Therefore,
in the case of one-dimensional transformations written as

\[
\int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))\,g'(y)\,dy,
\]

the absolute value is not attached. The values at both ends of the integral and the
sign of g′(y) offset each other. However, in

\[
\int_{[a,b]} f(x)\,dx = \int_{g^{-1}[a,b]} f(g(y))\,|g'(y)|\,dy,
\]

the absolute value must be attached. □
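
A small example makes the offsetting visible. The substitution g(y) = 1 − y on [0, 1] is a hypothetical choice made for this sketch; it is decreasing, so g′(y) = −1, g⁻¹(0) = 1 and g⁻¹(1) = 0.

```latex
\[
\int_0^1 f(x)\,dx
= \int_{g^{-1}(0)}^{g^{-1}(1)} f(g(y))\,g'(y)\,dy
= \int_1^0 f(1-y)\,(-1)\,dy
= \int_0^1 f(1-y)\,dy
= \int_{[0,1]} f(1-y)\,|{-1}|\,dy .
\]
```

In the directed form the reversed limits absorb the negative sign of g′; in the set form there are no limits to reverse, so the absolute value does that job.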



Lecture 15
Surface Integral

So far, integrals have been defined for functions defined on regions D ⊂ ℝ^n inside
the space. The region D could be the entire space ℝ^n or a part of it, but the dimension
of D was fundamentally the same as that of the entire space. In this and the following
lectures, we study integrals on surfaces inside the space, rather than n-dimensional
regions of ℝ^n. Integrals on curves have already been discussed. The dimension n of
the entire space can easily be extended to the general case, but we restrict ourselves
to the case of n = 3. Understanding the case of n = 3 will enable us to handle cases
where n > 3 easily.

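In this lecture, we apply the chain rule from the previous lecture to the case of
ℓ = 2. In other words, we consider the case of g : ℝ^2 → ℝ^3 (ℓ = 2, n = 3). First, we
denote the variables of ℝ^2 by (u, v). The independent variables of ℝ^3 are still
denoted by (x, y, z), and these are understood as functions of u and v. That is, g(u, v) =
(x(u, v); y(u, v); z(u, v)). Then, the chain rule is written as follows:

\[
\nabla(f \circ g) = \left( \frac{\partial f}{\partial u}, \frac{\partial f}{\partial v} \right)
= (\nabla f)(\nabla g)
= (f_x, f_y, f_z) \begin{pmatrix} x_u & x_v \\ y_u & y_v \\ z_u & z_v \end{pmatrix} .
\]

Writing this out component-wise, we have:

\[
\begin{cases}
\dfrac{\partial f}{\partial u} = \dfrac{\partial f}{\partial x}\dfrac{\partial x}{\partial u} + \dfrac{\partial f}{\partial y}\dfrac{\partial y}{\partial u} + \dfrac{\partial f}{\partial z}\dfrac{\partial z}{\partial u}, \\[3mm]
\dfrac{\partial f}{\partial v} = \dfrac{\partial f}{\partial x}\dfrac{\partial x}{\partial v} + \dfrac{\partial f}{\partial y}\dfrac{\partial y}{\partial v} + \dfrac{\partial f}{\partial z}\dfrac{\partial z}{\partial v}.
\end{cases} \tag{15.1}
\]

In the notation $\frac{\partial f}{\partial u}$, f is considered a function of u and v. In other words, the f
inside the notation is essentially the composition function f ◦ g. Also, since it is a
multivariable function, we use ∂ instead of d as a symbol.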

Problem 15.1. Differentiate w with respect to variables r and s under the following
conditions:

\[
w = x + 2y + z^2, \qquad x = \frac{r}{s}, \qquad y = r^2 + \ln s, \qquad z = 2r .
\]

Solution 15.1 (In the equations and figures, (u, v), (x, y, z), and f were used as
variables and functions. However, different people may use different notations. It is
important to be able to handle different notations when they are given.) □
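
With (r, s) playing the role of (u, v) and w playing the role of f, formula (15.1) can be applied line by line. The following worked computation is a sketch added for illustration.

```latex
\begin{align*}
\frac{\partial w}{\partial r}
&= \frac{\partial w}{\partial x}\frac{\partial x}{\partial r}
 + \frac{\partial w}{\partial y}\frac{\partial y}{\partial r}
 + \frac{\partial w}{\partial z}\frac{\partial z}{\partial r}
 = 1\cdot\frac{1}{s} + 2\cdot 2r + 2z\cdot 2
 = \frac{1}{s} + 4r + 8r = \frac{1}{s} + 12r, \\
\frac{\partial w}{\partial s}
&= \frac{\partial w}{\partial x}\frac{\partial x}{\partial s}
 + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s}
 + \frac{\partial w}{\partial z}\frac{\partial z}{\partial s}
 = 1\cdot\Big(-\frac{r}{s^2}\Big) + 2\cdot\frac{1}{s} + 2z\cdot 0
 = \frac{2}{s} - \frac{r}{s^2}.
\end{align*}
```

In the first line z = 2r was substituted at the end so that the answer is expressed in r and s only.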

Surface expansion rate by g : ℝ^2 → ℝ^3

The function g : ℝ^2 → ℝ^3 maps the domain G ⊂ ℝ^2 to a surface in ℝ^3. The area of
this surface may increase or decrease compared to the area of G. Moreover, some
parts may increase significantly while others may increase slightly or even decrease.
The rate at which the area increases at the point (u, v) ∈ G is determined by the
derivative

\[
\nabla g = (g_u(u, v), g_v(u, v)) =
\begin{pmatrix} x_u(u, v) & x_v(u, v) \\ y_u(u, v) & y_v(u, v) \\ z_u(u, v) & z_v(u, v) \end{pmatrix} .
\]

This rate is given by the area of the parallelogram spanned by the columns $g_u$ and $g_v$
of ∇g. How can we measure this area? In the previous lecture, we learned that if ∇g
is a square matrix, then the absolute value of the determinant |det(∇g)| is the
expansion rate of the area. However, we cannot use this when ∇g is not a square matrix.
There are several methods to compute the area of this parallelogram.
(1) One method is to use the cross product. Since it is a parallelogram in 3D and
the vectors $a_1 = g_u$ and $a_2 = g_v$ forming the sides are given, its area is $\|a_1 \times a_2\|$. Hence

\[
\mathrm{Area}(g(G)) = \int_G \|g_u \times g_v\|\,du\,dv . \qquad \text{(Area Formula 1)}
\]

(2) If the dimension is n > 3, then we cannot use the above formula. However, we
can still find the area of the parallelogram. First,

\[
\cos\theta = \frac{a_1 \cdot a_2}{\|a_1\|\,\|a_2\|}
\quad\Rightarrow\quad
\sin\theta = \sqrt{1 - \left( \frac{a_1 \cdot a_2}{\|a_1\|\,\|a_2\|} \right)^2},
\]

so the area of the parallelogram is

\[
\|a_1\|\,\|a_2\| \sin\theta = \sqrt{(\|a_1\|\,\|a_2\|)^2 - (a_1 \cdot a_2)^2} .
\]

Therefore, for the more general case, the area is calculated as follows:

\[
\mathrm{Area}(g(G)) = \int_G \sqrt{(\|g_u\|\,\|g_v\|)^2 - (g_u \cdot g_v)^2}\,du\,dv . \qquad \text{(Area Formula 2)}
\]

Question 15.1. What is the relationship between Area Formula 2 and the Gram
determinant in (14.3)?

15.1 Surface integral

Let f : ℝ^3 → ℝ be a function and S be a surface inside the space ℝ^3. We want to
integrate the function f as if it were defined on the surface S, as we do with functions
defined on a 2-dimensional space. This is called a surface integral. For example,
we live on the spherical surface called the Earth, but we consider it as if we were
living on a 2-dimensional plane. The definition is similar to that of line integrals and
of integrals in 2 dimensions. First, we define a partition of the surface S:

\[
\pi = \{A_i,\ i = 1, \dots, N\}, \qquad S = \cup A_i,
\]

where the area of S should be equal to the sum of the areas of the $A_i$. We choose points
$x_i \in A_i$ from each subregion, and the surface integral is defined as follows:

\[
\int_S f = \lim_{\|\pi\|\to 0} \sum_{i=1}^{N} f(x_i)\,\Delta A_i, \tag{15.2}
\]

where $\Delta A_i$ is the area of the i-th subregion of the surface.


If the surface S is given by a parametrization g(u, v) with parameters (u, v) ∈ G ⊂ ℝ^2,
then the surface integral is calculated using the above area formulas as follows:

\[
\int_S f = \int_G f(g(u, v))\,\|g_u \times g_v\|\,du\,dv . \tag{15.3}
\]

And if it is a surface in a space of dimension 3 or more, we use the following:

\[
\int_S f = \int_G f(g(u, v)) \sqrt{(\|g_u\|\,\|g_v\|)^2 - (g_u \cdot g_v)^2}\,du\,dv . \tag{15.4}
\]

Problem 15.2. Find the area of the graph z = x^2 + y^2 for 0 < x, y < 2.

Solution 15.2 In this case, we can use (x, y) instead of (u, v). Then, g(x, y) =
(x; y; x^2 + y^2), $g_x = (1, 0, 2x)$, and $g_y = (0, 1, 2y)$. Therefore,

\[
g_x \times g_y = (-2x; -2y; 1), \qquad \|g_x \times g_y\| = \sqrt{4x^2 + 4y^2 + 1} .
\]

So the area is obtained by integrating f = 1:

\[
\int_0^2\!\!\int_0^2 \sqrt{4x^2 + 4y^2 + 1}\,dx\,dy
\]

is the area of the graph. □
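
The integral above has no convenient elementary evaluation, so a numerical approximation is a natural way to get a value. The sketch below is an illustration under stated assumptions: `surface_area` is a hypothetical helper that applies Area Formula 1 with midpoint sums, and it is first checked on the plane z = x + y over [0, 1] × [0, 1], whose area is √3.

```python
import math


def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def surface_area(g_u, g_v, u0, u1, v0, v1, n=200):
    """Midpoint approximation of the integral of ||g_u x g_v|| over [u0,u1] x [v0,v1]."""
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            c = cross(g_u(u, v), g_v(u, v))
            total += math.sqrt(c[0] ** 2 + c[1] ** 2 + c[2] ** 2) * du * dv
    return total


# Sanity check: the plane z = x + y over the unit square has area sqrt(3).
plane = surface_area(lambda x, y: (1.0, 0.0, 1.0),
                     lambda x, y: (0.0, 1.0, 1.0), 0.0, 1.0, 0.0, 1.0)

# The paraboloid z = x^2 + y^2 over [0, 2] x [0, 2] from Problem 15.2.
paraboloid = surface_area(lambda x, y: (1.0, 0.0, 2.0 * x),
                          lambda x, y: (0.0, 1.0, 2.0 * y), 0.0, 2.0, 0.0, 2.0)
print(plane, paraboloid)
```

Only the tangent vectors $g_x$ and $g_y$ are needed, not g itself, because the integrand of Area Formula 1 depends on g only through its derivative.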


15.2 Polar Coordinates

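In the case of polar coordinates, where ℓ = n = 2, ∇g is a 2 × 2 square matrix. g and
its derivatives are as follows:

\[
g(r, \theta) = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}, \quad
\nabla g = (g_r, g_\theta), \quad
g_r = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \quad
g_\theta = \begin{pmatrix} -r\sin\theta \\ r\cos\theta \end{pmatrix} .
\]

Let’s compute the area of the parallelogram using two methods. First, computing
the determinant yields:

\[
\|g_r \times g_\theta\| = |\det(\nabla g)| =
\left| \det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} \right| = r .
\]

Using the second area formula, we have:

\[
\sqrt{(\|g_r\|\,\|g_\theta\|)^2 - (g_r \cdot g_\theta)^2} = \sqrt{(1 \cdot r)^2 - 0} = r .
\]

It is important to remember that the area expansion rate in polar coordinates is given
by r. The integral formula for variable transformation is as follows:

\[
\int_D f(x, y)\,dx\,dy = \int_G f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta, \qquad G = g^{-1}(D) .
\]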

Problem 15.3 (Polar Coordinates). Given the following:

\[
f = x^2 + y^2, \qquad x = r\cos\theta, \qquad y = r\sin\theta,
\]

(i) Rewrite f in terms of r and θ, then compute the derivatives.
(ii) Use the chain rule to compute the derivatives of f with respect to r and θ.

Solution 15.3 This problem confirms that the two methods yield the same result.
The function f is chosen to make (i) easy to compute. For a general function f, (ii)
would be easier. □
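
The two computations can be written out as follows. This is a worked sketch added for illustration; part (ii) uses the two-variable version of the chain rule (15.1).

```latex
\begin{align*}
\text{(i)}\quad & f = r^2\cos^2\theta + r^2\sin^2\theta = r^2,
 \qquad \frac{\partial f}{\partial r} = 2r, \qquad \frac{\partial f}{\partial \theta} = 0; \\
\text{(ii)}\quad & \frac{\partial f}{\partial r}
 = f_x\,x_r + f_y\,y_r = 2x\cos\theta + 2y\sin\theta
 = 2r\cos^2\theta + 2r\sin^2\theta = 2r, \\
 & \frac{\partial f}{\partial \theta}
 = f_x\,x_\theta + f_y\,y_\theta = 2x(-r\sin\theta) + 2y(r\cos\theta)
 = -2r^2\cos\theta\sin\theta + 2r^2\sin\theta\cos\theta = 0 .
\end{align*}
```

Both methods give the same derivatives, as the solution states.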

Problem 15.4. Given the following vector-valued function f:

\[
\mathbf{f}(x, y) = \begin{pmatrix} x^2 + y^2 \\[1mm] \dfrac{x^2}{x^2 + y^2} \end{pmatrix}, \qquad x = r\cos\theta, \qquad y = r\sin\theta .
\]

(i) Compute the derivatives of f with respect to r and θ directly.
(ii) Use the chain rule to compute the derivatives of f with respect to r and θ.

Solution 15.4 □

15.3 Variable Transformation in Multiple Integrals

Let’s reserve the variables r, θ , φ , and ρ, and use u, v, w, etc., for others.
Problem 15.5. Convert the following iterated integral using the variables
$u = \frac{2x - y}{2}$ and $v = \frac{y}{2}$, and then compute it:

\[
\int_0^4\!\!\int_{y/2}^{y/2 + 1} \frac{2x - y}{2}\,dx\,dy
\]

Solution 15.5 First, we need to find the function g(u, v) and its derivatives.

\[
y = 2v, \qquad x = u + v, \qquad g_1(u, v) = u + v, \qquad g_2(u, v) = 2v,
\]

\[
\nabla g(u, v) = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}, \qquad \det(\nabla g(u, v)) = 2 .
\]

The region D in the xy-plane is bounded by y = 0, y = 4, y = 2x, y = 2(x − 1). The
region G using the variables u and v is bounded by v = 0, v = 2, u = 0, u = 1. Since
the integrand is (2x − y)/2 = u and the area expansion rate is 2, the integral with
respect to u and v is as follows:

\[
\int_0^2\!\!\int_0^1 2u\,du\,dv . \qquad □
\]
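
A numerical comparison of the two sides is a quick way to catch a wrong Jacobian or wrong limits. The sketch below is an illustration only: `iterated_midpoint` is a hypothetical helper that approximates an iterated integral with midpoint sums, with the inner limits allowed to depend on the outer variable.

```python
def iterated_midpoint(f, out0, out1, in_lo, in_hi, n=300):
    """Midpoint approximation of
    int_{out0}^{out1} int_{in_lo(t)}^{in_hi(t)} f(s, t) ds dt,
    where s is the inner variable and t is the outer variable."""
    dt = (out1 - out0) / n
    total = 0.0
    for j in range(n):
        t = out0 + (j + 0.5) * dt
        lo, hi = in_lo(t), in_hi(t)
        ds = (hi - lo) / n
        total += sum(f(lo + (i + 0.5) * ds, t) for i in range(n)) * ds * dt
    return total


# Original integral: inner variable x from y/2 to y/2 + 1, outer variable y from 0 to 4.
original = iterated_midpoint(lambda x, y: (2 * x - y) / 2, 0.0, 4.0,
                             lambda y: y / 2, lambda y: y / 2 + 1)

# Transformed integral: integrand u times the expansion rate 2, u in [0, 1], v in [0, 2].
transformed = iterated_midpoint(lambda u, v: 2 * u, 0.0, 2.0,
                                lambda v: 0.0, lambda v: 1.0)
print(original, transformed)  # both are close to 2
```

The two approximations agree, consistent with the exact value $\int_0^2 \int_0^1 2u\,du\,dv = 2$.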

Problem 15.6. Compute the following:

\[
\int_0^1\!\!\int_0^{1-x} \sqrt{x + y}\,(y - 2x)^2\,dy\,dx
\]

using the following variable transformation:

\[
u = x + y, \qquad v = y - 2x .
\]

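Solution 15.6 Since x = (u − v)/3 and y = (2u + v)/3, the transformation g is as
follows:

\[
g(u, v) = \begin{pmatrix} g_1(u, v) \\ g_2(u, v) \end{pmatrix} = \frac13 \begin{pmatrix} u - v \\ 2u + v \end{pmatrix} .
\]

Then,

\[
\nabla g(u, v) = \frac13 \begin{pmatrix} 1 & -1 \\ 2 & 1 \end{pmatrix}, \qquad
\det(\nabla g(u, v)) = \Big(\frac13\Big)^2 (1 - (-2)) = \frac13 .
\]

The boundaries of regions D and G are as follows:

\[
x = 0 \Rightarrow v = u, \qquad y = 0 \Rightarrow v = -2u, \qquad x + y = 1 \Rightarrow u = 1 .
\]

The integral can be expressed as follows:

\[
\int_0^1\!\!\int_0^{1-x} \sqrt{x + y}\,(y - 2x)^2\,dy\,dx
= \int_0^1\!\!\int_{-2u}^{u} \sqrt{u}\,v^2\,\frac13\,dv\,du = \cdots = \frac29
\]

(reference to the figure). □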
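
The intermediate steps leading to 2/9 can be sketched as follows. This computation is added for illustration and integrates in v first, as in the iterated integral above.

```latex
\[
\int_{-2u}^{u} v^2\,dv = \Big[\tfrac13 v^3\Big]_{v=-2u}^{v=u} = \tfrac13\big(u^3 + 8u^3\big) = 3u^3,
\qquad
\int_0^1 \sqrt{u}\cdot\tfrac13\cdot 3u^3\,du = \int_0^1 u^{7/2}\,du = \Big[\tfrac29 u^{9/2}\Big]_0^1 = \frac29 .
\]
```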


Problem 15.7. Compute the following using the variable transformation
$u = \sqrt{xy}$, $v = \sqrt{y/x}$:

\[
\int_1^2\!\!\int_{1/y}^{y} \sqrt{\frac{y}{x}}\;e^{\sqrt{xy}}\,dx\,dy
\]

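Solution 15.7 Both x and y are positive within the region. The new variables u and
v are also positive. They can be expressed as follows:

\[
uv = \sqrt{y^2} = y, \qquad \frac{u}{v} = \sqrt{x^2} = x .
\]

\[
g(u, v) = \begin{pmatrix} u v^{-1} \\ uv \end{pmatrix}, \qquad
\nabla g(u, v) = \begin{pmatrix} v^{-1} & -u v^{-2} \\ v & u \end{pmatrix}, \qquad
\det(\nabla g(u, v)) = \frac{2u}{v} .
\]

The boundaries of regions D and G are as follows:

\[
y = 2 \Rightarrow uv = 2, \qquad y = x \Rightarrow v = 1, \qquad xy = 1 \Rightarrow u = 1 .
\]

The integral can be expressed as follows:

\[
\int_1^2\!\!\int_{1/y}^{y} \sqrt{\frac{y}{x}}\;e^{\sqrt{xy}}\,dx\,dy
= \int_1^2\!\!\int_1^{2/v} v\,e^{u}\,\frac{2u}{v}\,du\,dv . \qquad □
\]
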
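
The transformed integral can be evaluated in closed form. The steps below are a sketch added for illustration; they use the antiderivative $2(u-1)e^u$ of $2ue^u$ and the identity $\frac{d}{dv}\big(v\,e^{2/v}\big) = \big(1 - \tfrac{2}{v}\big)e^{2/v}$.

```latex
\begin{align*}
\int_1^{2/v} 2u\,e^{u}\,du &= \Big[\,2(u-1)e^{u}\Big]_{u=1}^{u=2/v} = 2\Big(\frac{2}{v} - 1\Big)e^{2/v}, \\
\int_1^2 2\Big(\frac{2}{v} - 1\Big)e^{2/v}\,dv &= \Big[-2v\,e^{2/v}\Big]_{v=1}^{v=2} = -4e + 2e^2 = 2e(e-2) .
\end{align*}
```
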
Lecture 16
Triple Integrals in Rectangular Coordinates

Since we have experienced extending the one-dimensional Riemann integral to two
dimensions, extending it to three dimensions should be an easy task. It’s just
repeating the same process. In this lecture, while briefly going through the repetitive
process, we will practice several integrals. Try to anticipate how to extend it and
compare your expectations with the contents of this lecture. Pay attention to how
the notation and indices are chosen. Choosing them carefully can reduce confusion
and make progress easier.

16.1 Riemann Integral on Hexahedron Domain

Let $D \subset \mathbb{R}^3$ be a hexahedron in three-dimensional space given by $D = [a, b] \times [c, d] \times [e, h]$, and let $f : D \subset \mathbb{R}^3 \to \mathbb{R}$ be a function defined on this hexahedron. To define the Riemann integral of this function, let’s first create a partition of D. Let’s denote the partition of the interval $[a, b]$ by $\{x_i : i = 0, \cdots, p_1\}$, the partition of the interval $[c, d]$ by $\{y_j : j = 0, \cdots, p_2\}$, and the partition of the interval $[e, h]$ by $\{z_k : k = 0, \cdots, p_3\}$. Then,

$$C_{ijk} = [x_{i-1}, x_i] \times [y_{j-1}, y_j] \times [z_{k-1}, z_k], \quad \Delta C_{ijk} = \Delta x_i \, \Delta y_j \, \Delta z_k$$

are the cells forming the partition of D, and their sizes. The total number of cells is $p_1 p_2 p_3$. We denote the partition of D and its size by

$$\pi = \{C_{ijk} : i = 1, \cdots, p_1,\ j = 1, \cdots, p_2,\ k = 1, \cdots, p_3\}, \quad \|\pi\| := \max \Delta C_{ijk}.$$

Now, let’s choose one point $s_{ijk}$ from each cell. Then the Riemann sum is as follows:

$$S(f, \pi) = \sum_{i=1}^{p_1} \sum_{j=1}^{p_2} \sum_{k=1}^{p_3} f(s_{ijk}) \, \Delta x_i \, \Delta y_j \, \Delta z_k, \quad s_{ijk} \in C_{ijk}.$$

If the function is continuous, or bounded on D and discontinuous only on a finite number of smooth surfaces, then the limit of the above Riemann sum exists as $\|\pi\| \to 0$. This limit is denoted by $\int_D f \, d\mathbf{x}$ and called the Riemann integral of f on D, and we say that the function f is Riemann integrable. In other words, we write it as follows:

$$\int_D f \, d\mathbf{x} = \lim_{\|\pi\| \to 0} S(f, \pi).$$

As in Fubini’s theorem, if the function is continuous, or bounded on the region D and discontinuous only on a finite number of smooth surfaces, the Riemann integral is equal to the iterated integral, and can be written as follows:

$$\int_D f \, d\mathbf{x} = \int_a^b \int_c^d \int_e^h f(x, y, z) \, dz \, dy \, dx.$$

The order of integration can be changed. In practice, we use iterated integrals to perform the integration.

To find the volume of the region $D \subset \mathbb{R}^3$, we integrate f = 1:

$$\operatorname{Vol}(D) = \int_D 1 \, d\mathbf{x}.$$

The average of the function $f : D \to \mathbb{R}$ is the integral value divided by the volume:

$$\text{Average} = \fint_D f \, d\mathbf{x} = \frac{1}{\operatorname{Vol}(D)} \int_D f \, d\mathbf{x}.$$

Problem 16.1. Find the average of the function f (x, y, z) = xyz on the domain D =
[0, 2]3 .

Solution 16.1 □
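The definitions of this section can be tried out numerically. The following minimal Python sketch evaluates the Riemann sum $S(f, \pi)$ on a box with a uniform partition, taking the midpoint of each cell as the sample point $s_{ijk}$. The helper name `riemann_sum_box`, the uniform partition, and the midpoint choice are illustrative assumptions, not part of the lecture. It is applied to the data of Problem 16.1 so that a hand computation can be compared with it.

```python
def riemann_sum_box(f, box, p):
    """Midpoint Riemann sum of f over box = ((a, b), (c, d), (e, h)).

    p = (p1, p2, p3) is the number of subintervals along each axis.
    """
    (a, b), (c, d), (e, h) = box
    p1, p2, p3 = p
    dx, dy, dz = (b - a) / p1, (d - c) / p2, (h - e) / p3
    total = 0.0
    for i in range(p1):
        x = a + (i + 0.5) * dx      # midpoint of the i-th x-subinterval (0-based)
        for j in range(p2):
            y = c + (j + 0.5) * dy
            for k in range(p3):
                z = e + (k + 0.5) * dz
                total += f(x, y, z) * dx * dy * dz
    return total


box = ((0.0, 2.0), (0.0, 2.0), (0.0, 2.0))
integral = riemann_sum_box(lambda x, y, z: x * y * z, box, (20, 20, 20))
volume = riemann_sum_box(lambda x, y, z: 1.0, box, (20, 20, 20))
average = integral / volume
print(integral, volume, average)
```

Because $xyz$ is linear in each variable separately, the midpoint sum agrees with the exact integral up to rounding, so the printed average can be checked directly against the iterated integral.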


16.2 Riemann Integral on Non-Hexahedron Domain

If the domain $D \subset \mathbb{R}^3$ is not a hexahedron, expressing the Riemann integral as an iterated integral becomes difficult. However, in certain cases, this is possible, and in this section, let’s look at a few examples.

Problem 16.2. Let D be the interior of a sphere with center (a, b, c) and radius r > 0.
Express the Riemann integral of a continuous and bounded function f over D as an
iterated integral.

Solution 16.2 First, projecting D onto the xy-plane yields a disk with center (a, b) and radius r. Projecting this disk again onto the x-axis gives the interval (a − r, a + r). Now, once x ∈ (a − r, a + r) is fixed, the points (x, y) inside the disk satisfy

$$(x - a)^2 + (y - b)^2 < r^2 \ \Rightarrow\ y \in \left( b - \sqrt{r^2 - (x - a)^2},\ b + \sqrt{r^2 - (x - a)^2} \right).$$

Furthermore, if (x, y) is fixed inside the disk, the points (x, y, z) inside the sphere satisfy

$$(x - a)^2 + (y - b)^2 + (z - c)^2 < r^2,$$

i.e.,

$$z \in \left( c - \sqrt{r^2 - (x - a)^2 - (y - b)^2},\ c + \sqrt{r^2 - (x - a)^2 - (y - b)^2} \right).$$

Therefore, the corresponding iterated integral is as follows:

$$\int_D f \, d\mathbf{x} = \int_{a-r}^{a+r} \int_{b - \sqrt{r^2 - (x-a)^2}}^{b + \sqrt{r^2 - (x-a)^2}} \int_{c - \sqrt{r^2 - (x-a)^2 - (y-b)^2}}^{c + \sqrt{r^2 - (x-a)^2 - (y-b)^2}} f(x, y, z) \, dz \, dy \, dx.$$

(Refer to the diagram) □




Problem 16.3. Suppose there is a sphere with center (0, 0, 1) and radius 5. Let D
be the part inside the sphere where z > 4. Express the volume of D as an iterated
integral.

Solution 16.3 Using the Pythagorean theorem, we find that the base of D is a disk with radius 4. When projected onto the xy-plane, it becomes a disk with radius 4 and center at (0, 0). That is,

$$x^2 + y^2 < 16.$$

Projecting this disk onto the x-axis yields the interval (−4, 4). Now, once x is fixed, y ranges from $-\sqrt{16 - x^2}$ to $\sqrt{16 - x^2}$. If (x, y) is a point on the disk, then the z-coordinate of a point inside the region D of the sphere satisfies $4 < z < 1 + \sqrt{25 - x^2 - y^2}$. To find the volume of D, we integrate the constant function f = 1:

$$\text{Volume} = \int_D 1 \, d\mathbf{x} = \int_{-4}^{4} \int_{-\sqrt{16 - x^2}}^{\sqrt{16 - x^2}} \int_{4}^{1 + \sqrt{25 - x^2 - y^2}} 1 \, dz \, dy \, dx.$$

(Refer to the diagram) □
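As a sanity check on the limits of integration in Problem 16.3, the iterated integral can be approximated numerically. The sketch below is only an illustration: the helper names `midpoint` and `cap_volume` are hypothetical, the midpoint rule is a choice made here, and the comparison value uses the standard spherical-cap volume formula $\pi h^2 (3R - h)/3$ with $R = 5$ and $h = 2$, which is not taken from the lecture.

```python
import math


def midpoint(g, lo, hi, n):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def cap_volume(n=400):
    """Iterated integral for the part of the sphere lying above z = 4."""
    def inner_y(x):
        y_max = math.sqrt(16.0 - x * x)
        # The innermost z-integral of 1 equals (upper limit) - (lower limit).
        height = lambda y: (1.0 + math.sqrt(25.0 - x * x - y * y)) - 4.0
        return midpoint(height, -y_max, y_max, n)
    return midpoint(inner_y, -4.0, 4.0, n)


approx = cap_volume()
cap_formula = math.pi * 2.0**2 * (3 * 5.0 - 2.0) / 3.0   # pi * h^2 * (3R - h) / 3
print(approx, cap_formula)
```

The two printed numbers should agree to a few decimal places; a mismatch would indicate an error in one of the three pairs of limits.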


Problem 16.4 (Plane Passing Through Three Points). Find the equation of a plane
in three-dimensional space including the three points {(0, 0, 0), (1, 1, 0), (0, 1, 1)}.

Solution 16.4 To find the equation, follow these steps: (1) First, find a vector $\mathbf{n}$ perpendicular to the plane. (2) Choose a point $\mathbf{a}$ on the plane. (3) For any point $\mathbf{x} = (x, y, z)$ on the plane, the vector $\mathbf{x} - \mathbf{a}$ lies in the plane and is therefore perpendicular to $\mathbf{n}$. Thus, the equation of the plane is given by $\mathbf{n} \cdot (\mathbf{x} - \mathbf{a}) = 0$.
Let’s proceed step by step. (1) To find the perpendicular vector, we need two
vectors in the plane. Given three points a, b, c, the difference between two points
forms a vector in the plane. For this problem, it’s convenient to choose (1, 1, 0) and
(0, 1, 1) since subtracting the origin is straightforward. The cross product gives a
vector perpendicular to both:

(1, 1, 0) × (0, 1, 1) = (1, −1, 1).



(2) Choose the point on the plane as (0, 0, 0) for simplicity. (3) Therefore, the equa-
tion of the plane is:

(1, −1, 1) · (x − 0, y − 0, z − 0) = x − y + z = 0.
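The three steps of Solution 16.4 translate directly into a few lines of code. The following Python sketch is an illustration only; the helper names `cross`, `dot`, `sub`, and `plane_through` are hypothetical. It returns the normal vector $\mathbf{n}$ and the constant $d = \mathbf{n} \cdot \mathbf{a}$, so that the plane is $\mathbf{n} \cdot \mathbf{x} = d$.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def plane_through(a, b, c):
    """Return (n, d) such that the plane through a, b, c is n . x = d."""
    n = cross(sub(b, a), sub(c, a))   # step (1): normal from two in-plane vectors
    return n, dot(n, a)               # steps (2), (3): n . (x - a) = 0


a, b, c = (0, 0, 0), (1, 1, 0), (0, 1, 1)
n, d = plane_through(a, b, c)
print(n, d)
```

Substituting each of the three given points into $\mathbf{n} \cdot \mathbf{x} = d$ is a quick way to confirm the result.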


Problem 16.5. Find the volume of the tetrahedron in three-dimensional space with
vertices {(0, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1)}.

Solution 16.5 Drawing a good diagram is important to understand and communicate the situation in the problem. By referring to the diagram, it’s easy to decide which plane to project onto. First, projecting onto the xy-plane results in a triangle with vertices {(0, 0, 0), (1, 1, 0), (0, 1, 0)}. Projecting this triangle onto the y-axis gives the interval (0, 1). Therefore, 0 < y < 1. Now, once y is fixed, x ranges over 0 < x < y. If (x, y) is a point in the triangle, then the z-coordinate of a point inside the tetrahedron lies between the plane z = 0 and the plane through {(0, 0, 0), (1, 1, 0), (0, 1, 1)}. Hence, we need the equation of that plane. As calculated in the previous problem, it is x − y + z = 0, that is, z = y − x, so z varies over 0 < z < y − x. To find the volume, integrate the constant function f = 1. Therefore, the volume is given by:

$$\text{Volume} = \int_D 1 \, d\mathbf{x} = \int_0^1 \int_0^y \int_0^{y - x} 1 \, dz \, dx \, dy.$$


Problem 16.6. Find the volume of the region enclosed between the surfaces $z = 7 - x^2 - y^2$ and $2x - 2y - z = 0$ in three-dimensional space.

Solution 16.6 The projection onto the xy-plane of the curve formed by the intersection of the plane $2x - 2y - z = 0$ and the paraboloid $z = 7 - x^2 - y^2$ is given by:

$$2x - 2y - 7 + x^2 + y^2 = (x + 1)^2 + (y - 1)^2 - 9 = 0.$$

It’s a circle with center (−1, 1) and radius 3, and the region lies above the disk it bounds. Projecting this circle onto the x-axis gives the interval −4 < x < 2. Given x, y ranges over $1 - \sqrt{9 - (x + 1)^2} < y < 1 + \sqrt{9 - (x + 1)^2}$. Once (x, y) is determined, z ranges over $2x - 2y < z < 7 - x^2 - y^2$. Now, to find the volume:

$$\text{Volume} = \int_D 1 \, d\mathbf{x} = \int_{-4}^{2} \int_{1 - \sqrt{9 - (x+1)^2}}^{1 + \sqrt{9 - (x+1)^2}} \int_{2x - 2y}^{7 - x^2 - y^2} 1 \, dz \, dy \, dx.$$


16.3 Moments and Center of Mass

Let $D \subset \mathbb{R}^n$ be situated in n-dimensional space. The moment of a real function $f : D \to \mathbb{R}$ is defined as follows. First, we call the integral value of f the 0-th order moment, which we refer to as the mass:

$$\text{Mass} = \int_D f \, d\mathbf{x}. \qquad \text{(0th moment)}$$

The first moment is the integral of $\mathbf{x}$ multiplied by f. That is,

$$\int_D \mathbf{x} f \, d\mathbf{x}. \qquad \text{(1st moment)}$$

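Here, $\mathbf{x}$ is a column vector. Therefore, when n = 3, the 1st moment expressed element-wise is as follows:

$$\int_D \mathbf{x} f \, d\mathbf{x} = \int_D \begin{pmatrix} x \\ y \\ z \end{pmatrix} f \, d\mathbf{x} = \begin{pmatrix} \int_D x f \, d\mathbf{x} \\ \int_D y f \, d\mathbf{x} \\ \int_D z f \, d\mathbf{x} \end{pmatrix}.$$

The center of mass, or centroid, is obtained by dividing the first moment by the mass:

$$\text{Center of mass or centroid} = \frac{\int_D \mathbf{x} f \, d\mathbf{x}}{\int_D f \, d\mathbf{x}}.$$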

Problem 16.7. Explain whether the centroid given above is indeed the centroid we
commonly refer to.

Solution 16.7 □

Problem 16.8. Find the centroid of the triangle with vertices {(0, 0), (0, 3), (3, 0)}.
Verify if it matches the existing centroid.

Solution 16.8 In this problem, we assume that the thickness of the triangle is constant and that the density is constant and equal to 1. The mass is then found by integrating 1 over the triangle. Since it’s a right triangle with legs of length 3, the area is simply 9/2, and the mass is equal to the area. □
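The definition of the centroid as first moment divided by mass can be tested numerically on the triangle of Problem 16.8. The sketch below is an illustration under stated assumptions: the density is taken to be 1, the triangle is described as $0 < x < 3$, $0 < y < 3 - x$, and the helper names `midpoint` and `integrate_triangle` are hypothetical. The result is compared with the average of the three vertices, which is the familiar centroid of a triangle.

```python
def midpoint(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def integrate_triangle(f):
    """Integrate f(x, y) over 0 < x < 3, 0 < y < 3 - x as an iterated integral."""
    return midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 3.0 - x), 0.0, 3.0)


density = lambda x, y: 1.0                                    # assumed constant density
mass = integrate_triangle(density)                            # 0th moment
moment_x = integrate_triangle(lambda x, y: x * density(x, y))  # 1st moment, x-part
moment_y = integrate_triangle(lambda x, y: y * density(x, y))  # 1st moment, y-part
centroid = (moment_x / mass, moment_y / mass)
vertex_average = ((0 + 0 + 3) / 3, (0 + 3 + 0) / 3)
print(mass, centroid, vertex_average)
```

Replacing `density` by a non-constant function gives the center of mass of a non-uniform plate with the same code.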

Lecture 17
Coordinate Systems

In this lecture, we introduce two widely used three-dimensional coordinate systems, cylindrical coordinates and spherical coordinates, and use them to perform integrations. The main task is to express an integral over the region D in rectangular coordinates as an integral over the region G in the new coordinates. We use the following integral formula from the previous section:

$$\int_D f \, d\mathbf{x} = \int_G f(g) \, |\det(\nabla g)| \, du \, dv \, dw.$$

In coordinate transformations, since $\nabla g$ is always a square matrix, the volume expansion rate can be calculated using the determinant as above. We practice expressing integrals over G in specific examples.

17.1 Coordinate Systems

In two dimensions, we used the polar coordinate system. In three dimensions, there are two main coordinate systems: cylindrical coordinates and spherical coordinates. Both of them are built on the polar coordinate system, so we start this lecture by reviewing it.


Polar Coordinates

We should be able to convert a point (x, y) given in rectangular coordinates into polar coordinates. First, in the polar coordinate system, r is the distance between the point (x, y) in the xy-plane and the origin, so it satisfies $r = \sqrt{x^2 + y^2}$. Also, the angle θ is the angle formed between the x-axis and the line segment connecting the origin and the given point (x, y). Therefore,

$$\theta = \tan^{-1}\left(\frac{y}{x}\right) \quad \text{or} \quad \theta = \cos^{-1}\left(\frac{x}{\sqrt{x^2 + y^2}}\right)$$

can represent θ (which formula applies depends on the quadrant in which the point lies).
However, what’s more important is to find the transformation function g that calculates (x, y) when (r, θ) is given. First, let’s consider the (r, θ) space. We define r as the first coordinate and θ as the second coordinate. The order also matters, as it relates to orientation. By defining it this way, the orientation is preserved by the transformation g. Then, g is given by

$$g(r, \theta) = \begin{pmatrix} r \cos\theta \\ r \sin\theta \end{pmatrix}.$$

We’ve already calculated it before, but if we calculate the volume expansion rate, it’s as follows:

$$\det(\nabla g) = \begin{vmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{vmatrix} = r \cos^2\theta + r \sin^2\theta = r.$$

Therefore, the integral transformation formula is as follows:

$$\int_D f \, d\mathbf{x} = \int_G f \circ g(r, \theta) \, r \, dr \, d\theta = \int_G f(r, \theta) \, r \, dr \, d\theta.$$

In the notation above, strictly speaking, the last f is actually different from the
previous f . It’s a notation abuse, but such abuse of notation is often used and should
be allowed for convenience.

Cylindrical Coordinates

The cylindrical coordinate system simply uses the coordinates (r, θ) from the polar coordinate system and incorporates the z coordinate from the Cartesian coordinate system. Let’s confirm this through some calculations. We designate z as the third coordinate. First, the transformation function g is as follows:

$$g(r, \theta, z) = \begin{pmatrix} r \cos\theta \\ r \sin\theta \\ z \end{pmatrix}.$$

The volume expansion rate is simply the absolute value of the determinant of $\nabla g$. Thus,

$$q(r, \theta, z) = |\det(\nabla g)| = \left| \det \begin{pmatrix} \cos\theta & -r \sin\theta & 0 \\ \sin\theta & r \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \right| = r,$$

which is the same as in the 2D case of polar coordinates. Therefore, the integral transformation formula is as follows:

$$\int_D f \, d\mathbf{x} = \int_G f \circ g(r, \theta, z) \, r \, dr \, d\theta \, dz = \int_G f(r, \theta, z) \, r \, dr \, d\theta \, dz.$$

This is almost identical to the case of polar coordinates.

Spherical Coordinates

The spherical coordinate system uses the distance ρ between the point (x, y, z) and the origin as well as the angle φ formed between the z-axis and the line segment connecting the origin and the point (x, y, z). The azimuthal angle θ remains the same as in the polar coordinate system. Therefore, given the point (x, y, z),

$$\rho = \sqrt{x^2 + y^2 + z^2}, \quad \varphi = \cos^{-1}\left(\frac{z}{\sqrt{x^2 + y^2 + z^2}}\right)$$

are satisfied. It is useful to express r of the polar coordinate system in terms of (ρ, φ):

$$r = \rho \sin\varphi.$$

Let’s choose the order of variables: ρ as the first variable, φ as the second variable, and θ as the third variable. There are other ways to choose, but we’ll go with this one. This choice simplifies the following calculations slightly and preserves orientation under transformation. The transformation function g is as follows:

$$g(\rho, \varphi, \theta) = \begin{pmatrix} \rho \sin\varphi \cos\theta \\ \rho \sin\varphi \sin\theta \\ \rho \cos\varphi \end{pmatrix}.$$

The volume expansion rate is simply the absolute value of the determinant of $\nabla g$. Thus,

$$q(\rho, \varphi, \theta) = |\det(\nabla g)| = \left| \det \begin{pmatrix} \sin\varphi \cos\theta & \rho \cos\varphi \cos\theta & -\rho \sin\varphi \sin\theta \\ \sin\varphi \sin\theta & \rho \cos\varphi \sin\theta & \rho \sin\varphi \cos\theta \\ \cos\varphi & -\rho \sin\varphi & 0 \end{pmatrix} \right| = \rho^2 \sin\varphi = r\rho.$$

The angle φ lies between 0 and π, so sin φ is nonnegative. Also, since r = ρ sin φ, it’s useful to remember that the volume expansion rate is q = rρ. Now, the integral transformation formula is as follows:

$$\int_D f \, d\mathbf{x} = \int_G f \circ g(\rho, \varphi, \theta) \, \rho^2 \sin\varphi \, d\rho \, d\varphi \, d\theta = \int_G f(\rho, \varphi, \theta) \, \rho^2 \sin\varphi \, d\rho \, d\varphi \, d\theta.$$
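The three volume expansion rates of this section, $r$ for polar and cylindrical coordinates and $\rho^2 \sin\varphi$ for spherical coordinates, can be checked numerically at a sample point. The following Python sketch is only an illustration: the helpers `jacobian` and `det` are hypothetical, the derivative matrix is approximated by central differences rather than computed symbolically, and the sample point is arbitrary.

```python
import math


def jacobian(g, u, eps=1e-6):
    """Central-difference approximation of the derivative matrix of g at u."""
    n = len(u)
    cols = []
    for j in range(n):
        up = list(u); up[j] += eps
        um = list(u); um[j] -= eps
        gp, gm = g(*up), g(*um)
        cols.append([(gp[i] - gm[i]) / (2 * eps) for i in range(len(gp))])
    # cols[j][i] = d g_i / d u_j; transpose so that rows index components of g
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


polar = lambda r, t: (r * math.cos(t), r * math.sin(t))
cylindrical = lambda r, t, z: (r * math.cos(t), r * math.sin(t), z)
spherical = lambda rho, phi, t: (rho * math.sin(phi) * math.cos(t),
                                 rho * math.sin(phi) * math.sin(t),
                                 rho * math.cos(phi))

r, t, z = 1.5, 0.7, -0.3      # arbitrary sample point
rho, phi = 2.0, 1.1
print(det(jacobian(polar, (r, t))), r)
print(det(jacobian(cylindrical, (r, t, z))), r)
print(det(jacobian(spherical, (rho, phi, t))), rho**2 * math.sin(phi))
```

With the variable order (ρ, φ, θ) used here, the spherical determinant comes out positive, which matches the remark that this order preserves orientation.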

17.2 Examples of Variable Changes

Let $D \subset \mathbb{R}^3$ be a region in space, and let $f : D \to \mathbb{R}$ be a given function. What we want to compute is

$$\int_D f \, d\mathbf{x}.$$

However, sometimes it’s better to compute it using cylindrical or spherical coordinates rather than Cartesian coordinates. Let’s practice that.

Problem 17.1. The region $D \subset \mathbb{R}^3$ is described in Cartesian coordinates as lying below the graph $z = x^2 + y^2$ and above the disk in the xy-plane defined by $x^2 + (y - 1)^2 < 1$. Rewrite the integral $\int_D f \, d\mathbf{x}$ using cylindrical or spherical coordinates.

Solution 17.1 Since the shape of the region D has boundaries parallel to the z-axis, it’s advantageous to use cylindrical coordinates. Projecting D onto the xy-plane yields $x^2 + (y - 1)^2 < 1$. The range of the angle θ is 0 < θ < π. Now, given θ, the points on the circle are given by

$$0 = x^2 + (y - 1)^2 - 1 = x^2 + y^2 - 2y = r^2 - 2r \sin\theta \ \Rightarrow\ r = 2 \sin\theta,$$

so the range is 0 < r < 2 sin θ. And once (r, θ) is determined, the range of z is $0 < z < r^2$, so the integral becomes

$$\int_D f \, d\mathbf{x} = \int_0^{\pi} \int_0^{2 \sin\theta} \int_0^{r^2} f(g) \, r \, dz \, dr \, d\theta = \int_0^{\pi} \int_0^{2 \sin\theta} \int_0^{r^2} f(r, \theta, z) \, r \, dz \, dr \, d\theta.$$

The third f is a notation abuse. (Thinking of f as a function of (r, θ, z) instead of (x, y, z) is essentially considering a composite function, and it’s a natural notation abuse.) □
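The limits found in Solution 17.1 can be exercised numerically by taking $f = 1$, in which case the integral is the volume of D. The sketch below is an illustration only: the helper names `midpoint` and `cylindrical_integral` are hypothetical, the midpoint rule is a choice made here, and the comparison value $3\pi/2$ comes from evaluating $\int_0^{\pi} \int_0^{2\sin\theta} r^3 \, dr \, d\theta = \int_0^{\pi} 4 \sin^4\theta \, d\theta$ by hand, not from the lecture.

```python
import math


def midpoint(g, lo, hi, n):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def cylindrical_integral(f, n=60):
    """Integrate f(r, theta, z) * r over 0<theta<pi, 0<r<2 sin(theta), 0<z<r^2."""
    def over_r(theta):
        def over_z(r):
            # The factor r is the volume expansion rate of cylindrical coordinates.
            return midpoint(lambda z: f(r, theta, z) * r, 0.0, r * r, n)
        return midpoint(over_z, 0.0, 2.0 * math.sin(theta), n)
    return midpoint(over_r, 0.0, math.pi, n)


volume = cylindrical_integral(lambda r, theta, z: 1.0)
print(volume, 3 * math.pi / 2)
```

Passing a different `f`, written as a function of (r, θ, z), evaluates the general formula of Solution 17.1 in the same way.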

Problem 17.2. When the center of the disc is moved from (0, 1) to (2, 2), find the
integral.

Solution 17.2 The points on the circle in the xy-plane satisfy $(x - 2)^2 + (y - 2)^2 - 1 = 0$. Rewriting it in polar coordinates,

$$x^2 + y^2 - 4x - 4y + 7 = r^2 - 4r(\cos\theta + \sin\theta) + 7 = 0.$$

For a given θ, this quadratic in r has either two distinct roots, a repeated root, or no real root. The range of θ is the interval between the angles $\theta_1$ and $\theta_2$ on which the quadratic has two distinct roots, and for each such θ, r lies between the two roots $r_1(\theta) < r_2(\theta)$. Then, the integral can be rewritten as follows:

$$\int_D f \, d\mathbf{x} = \int_{\theta_1}^{\theta_2} \int_{r_1(\theta)}^{r_2(\theta)} \int_0^{r^2} f(g) \, r \, dz \, dr \, d\theta.$$


The techniques used in Problems 17.1 and 17.2 are sometimes complex and not particularly helpful for integration. They don’t fully utilize the advantages of using cylindrical coordinates. Rather than these methods, it’s more meaningful to move the center of the region to the origin and then use cylindrical coordinates. This means applying another simple variable change, a translation. Let $\bar{x} = x - 2$, $\bar{y} = y - 2$ for the translation; then the equation of the circle becomes $\bar{x}^2 + \bar{y}^2 = 1$.

Then, in the variable change given in the figure, the range of (r, θ) is 0 < θ < 2π, 0 < r < 1. When r and θ are determined, the range of z is $0 < z < x^2 + y^2$. However, this range is not $0 < z < r^2$ because r and θ are now the polar coordinates of $(\bar{x}, \bar{y})$. So, the calculation gives

$$x^2 + y^2 = \bar{x}^2 + \bar{y}^2 + 4(\bar{x} + \bar{y}) + 8 = r^2 + 4r(\cos\theta + \sin\theta) + 8.$$

And if we denote $h(\bar{x}, \bar{y}, z) = f(x, y, z)$, in other words,

$$h(\bar{x}, \bar{y}, z) = f(\bar{x} + 2, \bar{y} + 2, z),$$

then the integral above becomes:

$$\int_D f \, d\mathbf{x} = \int_0^{2\pi} \int_0^1 \int_0^{r^2 + 4r(\cos\theta + \sin\theta) + 8} h(g) \, r \, dz \, dr \, d\theta.$$

Problem 17.3. Given $f(x, y, z) = x + y + z$ and the region $D \subset \mathbb{R}^3$ as given in Problem 17.2, find $\int_D f \, d\mathbf{x}$ using the variable transformation described above.

Solution 17.3 □

Part IV
Integration of Vector Fields
Lecture 18
Line Integral for Tangential Component

In this lecture, we learn how to integrate tangential components of vector fields using line integrals. Remember that we are integrating only the component of the vector field that is tangent to the given curve, hence the name “tangential component integral.” Especially in the case of vector fields representing force fields, the kinetic energy gained by an object moving along a given curve is obtained by integrating the tangential component. Moreover, the gained kinetic energy is equal to the potential energy lost by the object. Therefore, line integrals are often used in calculating energy. Examples of force fields include gravitational fields, electromagnetic fields, etc. We’ll learn about conservative vector fields and potential functions.

18.1 Line integral for a scalar function

Let’s start by reviewing the line integral of a scalar-valued function, which we’ve
already studied in Calculus 1. Since we’ve already learned how to perform variable
transformations using the volume expansion rate, let’s summarize it again.


In the case of a line integral, the domain $D \subset \mathbb{R}^3$ is a curve. We can think of a curve as the trajectory $\mathbf{r}(t)$ of a particle. In this case, the corresponding variable transformation function g is the trajectory function $\mathbf{r}(t)$, and the new variable space is $\mathbb{R}^{\ell} = \mathbb{R}^1$. Since we’re in one dimension, the volume expansion rate becomes the length expansion rate, which is determined by the derivative:

$$\|\nabla \mathbf{r}(t)\| = \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}.$$

Thus, the integral transformation formula is as follows:

$$\int_D f(\mathbf{x}) \, d\mathbf{x} = \int_a^b f(\mathbf{r}(t)) \, \|\mathbf{r}'(t)\| \, dt. \tag{18.1}$$

The difference from (9.3) is simply that we use D to represent a general region
instead of C to emphasize that it’s a curve. There’s no real difference. To emphasize
that it’s a curve, we’ll use C in the future.
If the curve C is divided into several segments, $C_i$, $i = 1, \cdots, N$, then we can integrate each segment separately and add them together. In other words,

$$\int_{\cup C_i} f(\mathbf{x}) \, d\mathbf{x} = \sum_i \int_{C_i} f(\mathbf{x}) \, d\mathbf{x} = \sum_i \int_{a_i}^{b_i} f(\mathbf{r}_i(t)) \, \|\mathbf{r}_i'(t)\| \, dt.$$

Problem 18.1. If two curves r̄(t) and r(t) pass through the same curve but move
in opposite directions, what relationship holds between the line integrals given by
these two curves? Are they equal or opposite in sign?

Solution 18.1 □

18.2 Line integral for a force field

Now let’s consider the line integral of a vector field $\mathbf{f}$, rather than a scalar function $f$. When we say a vector field on a domain $D \subset \mathbb{R}^n$ in n-dimensional space, we simply mean a function $\mathbf{f} : D \to \mathbb{R}^n$ that takes n-dimensional vectors as values. Of course, we mainly focus on the case of three-dimensional space, n = 3. In a physical sense, $\mathbf{f}(\mathbf{x})$ represents a vector, such as a force vector, acting at position $\mathbf{x}$. As shown in the previous page’s figure, you can also visualize it by drawing the vector $\mathbf{f}(\mathbf{x})$ at each position $\mathbf{x}$. When we say “vector field,” remember that $\mathbf{f}$ takes n-dimensional vectors as values.

Problem 18.2. Explicitly express the function $\mathbf{f} : \mathbb{R}^3 \to \mathbb{R}^3$ representing the gravitational field exerted on a mass m by the sun located at the origin.

Solution 18.2 If a mass m is located at the position $\mathbf{x} \in \mathbb{R}^3$, the distance from the sun is $\|\mathbf{x}\|$, and the direction vector toward the sun is $-\frac{\mathbf{x}}{\|\mathbf{x}\|}$. According to Newton’s law of universal gravitation, the gravitational force is inversely proportional to the square of the distance and directly proportional to the product of the masses. Therefore, the gravitational force is as follows:

$$\text{Gravitational force} = \mathbf{f}(\mathbf{x}) = -\frac{GMm\,\mathbf{x}}{\|\mathbf{x}\|^3}.$$

Here, G is the gravitational constant and M is the mass of the sun. □


There are various types of vector fields in the world. Examples include gravita-
tional fields caused by celestial bodies, electromagnetic fields caused by charges or
currents, vector fields representing fluid flow, and vector fields caused by wind or
ocean currents. Mathematically, we can consider even more diverse vector fields.
Vector fields that rotate around a point, converge or diverge at a point, or emanate
from or converge to a point can have different properties.

Work is another expression of energy. You can think of it as a way to calculate energy. It is related to vector fields that represent forces, such as gravity or electromagnetic fields. Let’s consider calculating work given a vector field representing force. The well-known formula for work is as follows:

$$\text{Work} = \text{Force} \times \text{Distance}.$$

However, this expression is not quite correct. Why is that? Force is a vector and distance is a scalar, so their product should be a vector. However, energy (work) is a scalar. The above equation is oversimplified and doesn’t make sense upon closer examination. The correct expression is as follows:

$$\text{Work} = \text{Tangential component of force} \times \text{Distance}.$$


But what if the path of motion is not a straight line but a curve? And what if the force varies in magnitude and direction depending on the position, like gravity? How do we calculate it when an object moves along a curve $\mathbf{r}(t)$ from $\mathbf{r}(a)$ to $\mathbf{r}(b)$? We can calculate the work done by computing the tangential component of force at each position along the path using line integrals. (Integration is a very powerful tool.) The direction of motion can be determined by calculating velocity, which can be represented in various forms such as $\mathbf{v} = \mathbf{r}'(t) = \dot{\mathbf{r}}(t) = \nabla \mathbf{r}$. The direction vector of motion is the unit vector of velocity, $\frac{\dot{\mathbf{r}}(t)}{\|\dot{\mathbf{r}}(t)\|}$. Therefore, the tangential component of force $\mathbf{f}$ can be calculated using the dot product as $\mathbf{f}(\mathbf{x}) \cdot \frac{\dot{\mathbf{r}}(t)}{\|\dot{\mathbf{r}}(t)\|}$. Now, the work done or energy along curve C is as follows:

$$\text{Work} = \int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \frac{\dot{\mathbf{r}}(t)}{\|\dot{\mathbf{r}}(t)\|} \, \|\dot{\mathbf{r}}(t)\| \, dt = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \dot{\mathbf{r}}(t) \, dt.$$

If the vector field $\mathbf{f}$ is not a force field, the line integral above may not represent energy, but it can have other meanings and be useful. Even when a general vector field $\mathbf{f}$ is given, we define the line integral as above. The following is a commonly used notation for line integrals:

$$\int_C \mathbf{f}(\mathbf{r}) \cdot d\mathbf{r} = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot d\mathbf{r}(t). \tag{18.2}$$

Problem 18.3. Determine whether the values of line integrals using two parametric
curves r1 (t) and r2 (t) are equal under the following conditions:
(1) When the same curve C is traversed in the same direction but with different
speeds.
(2) When the same curve C is traversed in opposite directions.
(3) When starting and ending points are the same but different paths are taken.

Solution 18.3 (1) They are equal. The different speeds are absorbed by the factor $\mathbf{r}'(t)$, that is, by the expansion rate. (2) Only the sign is opposite. (3) They are different in general. □
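The three situations of Problem 18.3 can be explored numerically. The sketch below is an illustration under stated assumptions: the rotating field $(-y, x)$ and the four curves are chosen here and do not come from the lecture, the helper `line_integral` is hypothetical, and the velocity $\mathbf{r}'(t)$ is estimated by a central difference. All four curves join the points (1, 0) and (0, 1).

```python
import math


def line_integral(field, curve, a, b, n=4000, eps=1e-6):
    """Approximate the integral of field(curve(t)) . curve'(t) dt over [a, b].

    The velocity curve'(t) is estimated by a central difference.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        p_plus, p_minus = curve(t + eps), curve(t - eps)
        velocity = [(u - v) / (2 * eps) for u, v in zip(p_plus, p_minus)]
        total += sum(fi * vi for fi, vi in zip(field(*curve(t)), velocity)) * h
    return total


field = lambda x, y: (-y, x)   # illustrative rotating field (an assumption)

quarter = lambda t: (math.cos(t), math.sin(t))            # arc from (1,0) to (0,1)
faster = lambda s: (math.cos(s * s), math.sin(s * s))     # same arc, different speed
backward = lambda t: (math.cos(math.pi / 2 - t),
                      math.sin(math.pi / 2 - t))          # same arc, reversed
segment = lambda t: (1.0 - t, t)                          # straight path, same endpoints

I1 = line_integral(field, quarter, 0.0, math.pi / 2)
I2 = line_integral(field, faster, 0.0, math.sqrt(math.pi / 2))
I3 = line_integral(field, backward, 0.0, math.pi / 2)
I4 = line_integral(field, segment, 0.0, 1.0)
print(I1, I2, I3, I4)
```

Comparing `I1` with `I2`, `I3`, and `I4` in turn illustrates cases (1), (2), and (3); the last comparison also shows that this particular field is not path independent.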

Question 18.1. Is the line integral for a scalar function (18.1) fundamentally differ-
ent from that for a vector field (18.2), or do they coincide in certain cases (e.g., in
the case of n = 1)? Find the fundamental difference between them.

Solution 18.1 They are fundamentally different. □


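There are various ways to express the line integral of $\mathbf{f}(\mathbf{x}) = f_1(\mathbf{x})\mathbf{i} + f_2(\mathbf{x})\mathbf{j} + f_3(\mathbf{x})\mathbf{k}$ along a curve $\mathbf{r}(t) = r_1(t)\mathbf{i} + r_2(t)\mathbf{j} + r_3(t)\mathbf{k}$ for a < t < b. Let’s find the logic behind each method.

$$\begin{aligned}
\int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} &= \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot d\mathbf{r}(t) = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt \\
&= \int_a^b \left( f_1(\mathbf{r}(t)) r_1'(t) + f_2(\mathbf{r}(t)) r_2'(t) + f_3(\mathbf{r}(t)) r_3'(t) \right) dt \\
&= \int_a^b f_1(\mathbf{r}(t)) r_1'(t) \, dt + \int_a^b f_2(\mathbf{r}(t)) r_2'(t) \, dt + \int_a^b f_3(\mathbf{r}(t)) r_3'(t) \, dt
\end{aligned}$$

Next, let’s express it by first computing the dot product.

$$\begin{aligned}
\int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} &= \int_C f_1(\mathbf{x}) \, dx + \int_C f_2(\mathbf{x}) \, dy + \int_C f_3(\mathbf{x}) \, dz \\
&= \int_a^b f_1(\mathbf{r}(t)) \, dr_1(t) + \int_a^b f_2(\mathbf{r}(t)) \, dr_2(t) + \int_a^b f_3(\mathbf{r}(t)) \, dr_3(t) \\
&= \int_a^b f_1(\mathbf{r}(t)) r_1'(t) \, dt + \int_a^b f_2(\mathbf{r}(t)) r_2'(t) \, dt + \int_a^b f_3(\mathbf{r}(t)) r_3'(t) \, dt
\end{aligned}$$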

You should be familiar with both notations as both are commonly used.
Let’s practice computing line integrals.

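Problem 18.4. Let $\mathbf{f}(\mathbf{x}) = y\mathbf{i} + z\mathbf{j} + x\mathbf{k}$ and $\mathbf{r}(t) = t\mathbf{i} + \sqrt{t}\,\mathbf{j} + t^2\mathbf{k}$ for a < t < b. Compute the line integral.

Solution 18.4 (1) First, $\mathbf{f}(\mathbf{r}(t)) = \sqrt{t}\,\mathbf{i} + t^2\mathbf{j} + t\mathbf{k}$ and $\mathbf{r}'(t) = \mathbf{i} + 0.5t^{-0.5}\mathbf{j} + 2t\mathbf{k}$. Therefore,

$$\int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = \int_a^b \left( \sqrt{t} + 0.5t^{1.5} + 2t^2 \right) dt.$$

(2) Instead of computing the dot product above, let’s use the formula that already computes the dot product:

$$\int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = \int_C f_1(\mathbf{x}) \, dx + \int_C f_2(\mathbf{x}) \, dy + \int_C f_3(\mathbf{x}) \, dz$$

Here, we substitute $\mathbf{x} \to \mathbf{r}(t)$, $x \to r_1(t)$, $y \to r_2(t)$, $z \to r_3(t)$. For example,

$$\int_C f_1(\mathbf{x}) \, dx = \int_a^b f_1(\mathbf{r}(t)) \, dr_1(t) = \int_a^b f_1(\mathbf{r}(t)) r_1'(t) \, dt = \int_a^b \sqrt{t} \cdot 1 \, dt.$$

You should be able to compute using both methods. □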
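The computation in Solution 18.4 can be checked numerically. The sketch below is an illustration only. It takes the second component of $\mathbf{r}(t)$ to be $\sqrt{t}$, consistent with the $\mathbf{r}'(t)$ used in the solution, and it picks the endpoints a = 1 and b = 4 as an assumption, since the problem leaves a and b general. The helper names are hypothetical.

```python
import math


def f(x, y, z):
    """Vector field f(x) = y i + z j + x k."""
    return (y, z, x)


def r(t):
    """Curve r(t) = t i + sqrt(t) j + t^2 k."""
    return (t, math.sqrt(t), t * t)


def r_prime(t):
    """Velocity r'(t) = i + 0.5 t^(-0.5) j + 2t k."""
    return (1.0, 0.5 / math.sqrt(t), 2.0 * t)


def line_integral(field, curve, velocity, a, b, n=2000):
    """Midpoint-rule approximation of the integral of field(curve(t)) . velocity(t) dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += sum(fi * vi for fi, vi in zip(field(*curve(t)), velocity(t))) * h
    return total


def antiderivative(t):
    """Antiderivative of sqrt(t) + 0.5 t^1.5 + 2 t^2."""
    return (2.0 / 3.0) * t**1.5 + 0.2 * t**2.5 + (2.0 / 3.0) * t**3


a, b = 1.0, 4.0   # illustrative endpoints (an assumption)
numeric = line_integral(f, r, r_prime, a, b)
closed_form = antiderivative(b) - antiderivative(a)
print(numeric, closed_form)
```

Agreement of the two printed values supports the integrand $\sqrt{t} + 0.5t^{1.5} + 2t^2$ obtained from the dot product.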


18.3 Path independence, potential, and conservative fields

Question 18.2. If two different paths $\mathbf{r}_1$ and $\mathbf{r}_2$ are taken from the starting point S to the ending point E, the line integrals for a vector field along these paths are generally different. However, for vector fields like gravitational fields, the line integral should be the same if the starting and ending points are the same. Why is this the case? Can you explain?

Vector fields that represent forces, such as gravitational fields, have line integrals
that compute the work done by the gravitational field. Moreover, according to the
law of conservation of energy, the work done by the gravitational field is equal to
the difference in potential energy between the starting and ending points, regardless
of the path taken. Not all vector fields have this property. In this section, we will
learn about the characteristics of vector fields that exhibit such properties.

Definition 18.1. Let $\mathbf{f}$ be a vector field on an open domain $D \subset \mathbb{R}^n$. We say the line integral $\int_C \mathbf{f} \cdot d\mathbf{r}$ is path independent in D and the vector field $\mathbf{f}$ is conservative if the line integral depends only on the starting and ending points S, E ∈ D of the curve C.

Definition 18.2. Let $\mathbf{f}$ be a vector field on an open domain $D \subset \mathbb{R}^n$. If there exists a function $p : D \to \mathbb{R}$ such that $\mathbf{f}(\mathbf{x}) = (\nabla p(\mathbf{x}))^t$ for all $\mathbf{x} \in D$, we call p a potential function of $\mathbf{f}$.

In the above definition, note that we write $\mathbf{f}(\mathbf{x}) = (\nabla p(\mathbf{x}))^t$ instead of $\mathbf{f}(\mathbf{x}) = \nabla p(\mathbf{x})$. This is because we have decided to represent the gradient of a scalar function $\nabla p(\mathbf{x})$ as a row vector, and $\mathbf{f}(\mathbf{x})$ as a column vector.

Problem 18.5. Find the potential function for the gravitational field obtained in
Problem 18.2.

Solution 18.5 The potential function is not unique. Adding a constant still results in a potential function, as the constant disappears when computing the gradient. The most commonly used potential function is as follows:

$$p(\mathbf{x}) = \frac{GMm}{\|\mathbf{x}\|}.$$

Let’s compute the gradient to confirm: $\nabla p(\mathbf{x}) = -GMm\,\mathbf{x}^t / \|\mathbf{x}\|^3 = \mathbf{f}(\mathbf{x})^t$. The potential function is inversely proportional to the distance. Note that the potential is not the same as the potential energy; it is the potential energy with a negative sign. (What about the potential energy at the origin?) □

In the next lecture, we will learn that a vector field being conservative and having
a potential function are equivalent.

Exercises

1. Let $\mathbf{f}$ be a vector field with $p = \frac{1}{\|\mathbf{x}\|}$ as its potential function. If the curve C starts at S = (2, 1, 0) and ends at E = (−1, 0, 1), what is the amount of work done?
2.
Lecture 19
Line Integral and a Fluid Flow

A vector field being conservative is equivalent to its having a potential function. This relationship is quite intuitive. The conservation of energy implies that even if kinetic energy decreases, it is converted into potential energy, thus conserving the total energy. In a world where potential energy is not defined, there is no conservation of energy. On the other hand, one of the key properties of a vector field representing a fluid flow, rather than a force field, is circulation.

19.1 Potential field is conservative

The definitions of a vector field being conservative and having a potential function
were discussed in the previous lecture. Now let’s consider their relationship. Firstly,
if a vector field is given by a potential function, it can be easily shown to be conser-
vative.

Problem 19.1 (potential field ⇒ conservative field). Show that if a vector field f
has a potential function, then it is conservative.

Solution 19.1 To show that the vector field $\mathbf{f}$ is conservative, we need to demonstrate that the line integral is path-independent. Let p be a potential function of $\mathbf{f}$. Let C be a curve with the starting point S and the ending point E. Let $\mathbf{r}(t)$ be a parametric function representing the curve C, where $\mathbf{r}(a) = S$ and $\mathbf{r}(b) = E$. Then,

$$\begin{aligned}
\int_C \mathbf{f}(\mathbf{r}) \cdot d\mathbf{r} &= \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \dot{\mathbf{r}}(t) \, dt \\
&= \int_a^b \nabla p(\mathbf{r}(t)) \cdot \dot{\mathbf{r}}(t) \, dt \\
&= \int_a^b \left( p(\mathbf{r}(t)) \right)' dt \\
&= p(\mathbf{r}(b)) - p(\mathbf{r}(a)) \\
&= p(E) - p(S).
\end{aligned}$$

In other words, the line integral is determined by the starting and ending points, and the path does not matter. (Explain the meaning of each equality above.)
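The conclusion of Solution 19.1 can be illustrated numerically. The sketch below is only an example under stated assumptions: it uses the potential $p(\mathbf{x}) = 1/\|\mathbf{x}\|$ (the gravitational potential with all constants set to 1), the points S, M, and E are chosen here, and the helper names are hypothetical. The line integral of $\mathbf{f} = (\nabla p)^t$ is computed along a straight path and along a broken path with the same endpoints, and both are compared with $p(E) - p(S)$.

```python
import math


def potential(x, y, z):
    """Illustrative potential p(x) = 1 / |x| (constants set to 1)."""
    return 1.0 / math.sqrt(x * x + y * y + z * z)


def field(x, y, z):
    """Gradient of the potential: f(x) = -x / |x|^3."""
    d3 = (x * x + y * y + z * z) ** 1.5
    return (-x / d3, -y / d3, -z / d3)


def segment_integral(p0, p1, n=2000):
    """Line integral of field along the straight segment from p0 to p1."""
    velocity = [b - a for a, b in zip(p0, p1)]   # r'(t) for r(t) = p0 + t (p1 - p0)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        point = [a + t * v for a, v in zip(p0, velocity)]
        total += sum(fi * vi for fi, vi in zip(field(*point), velocity)) * h
    return total


S, M, E = (1.0, 0.0, 0.0), (1.0, 2.0, 1.0), (0.0, 2.0, 1.0)
direct = segment_integral(S, E)                             # straight path S -> E
detour = segment_integral(S, M) + segment_integral(M, E)    # broken path S -> M -> E
difference = potential(*E) - potential(*S)
print(direct, detour, difference)
```

Both paths avoid the origin, where the field is undefined; the three printed values should agree up to the discretization error.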

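If a vector field $\mathbf{f}$ has a potential function p, then the line integral from the starting point S to the ending point E is given by:

$$\int_C \mathbf{f}(\mathbf{r}) \cdot d\mathbf{r} = \int_a^b \nabla p(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt = p(E) - p(S). \tag{19.1}$$

Note that the dot may be omitted in this notation, writing $\int_a^b \nabla p(\mathbf{r}(t)) \, \mathbf{r}'(t) \, dt$, because $\nabla p(\mathbf{r}(t))$ is a row vector and its matrix product with the column vector $\mathbf{r}'(t)$ is already a scalar.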
Now let’s show that a conservative vector field has a potential function. This proof
is more challenging and requires a well-defined potential function.

Problem 19.2 (Conservative field ⇔ potential field). Show that if the region D ⊂
R3 is open and connected, and the vector field f : D → R3 is continuous, then the
property of f being conservative is equivalent to the existence of a potential function.

Solution 19.2 Let’s show that if $\mathbf{f}$ is conservative, then a potential function exists. The reverse direction has already been demonstrated in Problem 19.1. Let’s define the potential function p(x, y, z) of the vector field $\mathbf{f}$ as follows. First, we choose a point S ∈ D in the domain D as the reference point and set p(S) = 0. Then, for any point E = (x, y, z) ∈ D, we determine the value of p at this point. We choose a curve $\mathbf{r}(t)$, a ≤ t ≤ b, starting from S and ending at E (such a curve exists because D is connected), and define p using a line integral as follows:

$$p(E) = p(x, y, z) = \int_a^b \mathbf{f}(\mathbf{r}(t)) \cdot \mathbf{r}'(t) \, dt.$$

We need to verify whether the function is well-defined. That is, we need to confirm
that only one value is assigned to each (x, y, z) according to our definition. Even if
we choose a different curve C1 , the value of p(E) should not change. However, since
f is a conservative field, the line integral will yield the same value for the starting
and ending points, regardless of the curve chosen. This fact is crucially used here.
∂p ∂p ∂p
Now we need to show ∇p(x, y, z) = f(x, y, z). Since ∇p = ∂ x i + ∂ y j + ∂ z k, let
∂p
f = f1 i + f2 j + f3 k. To prove = f1 , let’s choose a point E = (x, y, z) sufficiently
∂x
close to it in the domain D, and let M = (x0 , y, z) be a point close to E in D connected
19.2 Line integral and closed curves 147

to E by the line segment L (refer to the figure). Then, for a parametric curve r(t) =
ti + yj + zk representing L with x0 < t < x, r′ (t) = (1, 0, 0), and
Z Z
p(x, y, z) = f(r) · dr + f(r) · dr.
C0 L

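Differentiating with respect to x, the terms that are constant with respect to x vanish,
yielding:
\[
\frac{\partial p}{\partial x}(E)
= \frac{\partial}{\partial x} \int_{L} \mathbf{f}(\mathbf{r}) \cdot d\mathbf{r}
= \frac{\partial}{\partial x} \int_{x_0}^{x} f_1(\mathbf{r}(t))\,dt
= f_1(x, y, z).
\]
To prove ∂p/∂y = f2 and ∂p/∂z = f3, we need to choose the curve slightly differently.
Let’s proceed. □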
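The well-definedness argument above can be checked numerically. The sketch below is only an illustration under stated assumptions: the potential p(x, y, z) = x²y + z, the two curves, and the helper names are hypothetical choices, not taken from the lecture. It integrates f = ∇p along two different curves from S = (0, 0, 0) to E = (1, 2, 3) and compares the results with p(E) − p(S) = 5.

```python
import math

def f(x, y, z):
    # Gradient of the hypothetical potential p(x, y, z) = x^2 * y + z.
    return (2.0 * x * y, x * x, 1.0)

def line_integral(field, r, a, b, n=4000):
    # Approximates the integral of field(r(t)) . r'(t) dt over [a, b]:
    # on each subinterval the field is sampled at the midpoint parameter
    # and r'(t) dt is replaced by the chord r(t1) - r(t0).
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        t0 = a + k * h
        t1 = t0 + h
        p0, p1 = r(t0), r(t1)
        fx, fy, fz = field(*r(0.5 * (t0 + t1)))
        total += fx * (p1[0] - p0[0]) + fy * (p1[1] - p0[1]) + fz * (p1[2] - p0[2])
    return total

def straight(t):
    # Straight segment from S = (0, 0, 0) to E = (1, 2, 3).
    return (t, 2.0 * t, 3.0 * t)

def curved(t):
    # A different curve with the same endpoints S and E.
    return (t * t, 2.0 * t ** 3, 3.0 * math.sin(0.5 * math.pi * t))

p_straight = line_integral(f, straight, 0.0, 1.0)
p_curved = line_integral(f, curved, 0.0, 1.0)
print(p_straight, p_curved)  # both should be close to p(E) - p(S) = 5
```

Both values agree up to discretization error, which is what makes the definition of p(E) independent of the chosen curve.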

19.2 Line integral and closed curves

A curve with the same starting and ending points is called a loop or a closed curve.
Line integrals along loops will frequently appear.

Definition 19.1. Let C be a curve parameterized by r : [a, b] → R³. If the starting
point equals the ending point, i.e., r(a) = r(b), then C is called a closed curve. A
closed curve is also referred to as a loop. If the curve C does not intersect itself,
meaning that r(s) = r(t) for s, t ∈ [a, b] only when s = t or when s and t are the two
endpoints a and b, then the curve is called a simple curve (see the figure).

When C is a closed curve, the line integral is often denoted as
\[
\oint_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x}
\]
to emphasize that this line integral is taken along a loop.

Problem 19.3 (Line integral of a loop). If f is conservative and C is a closed curve,
prove that the line integral is zero.

Solution 19.3 Let r(t), a ≤ t ≤ b, be the parameterization of C. Then r(a) = S = E =
r(b). Let M ≠ S be another point on the curve C. Then C is divided into two parts, and
after reversing the direction of the second part we obtain two curves C1 and C2, with
parameterizations r1 and r2, both connecting S to M (see the figure).
Since f is a conservative vector field, we have ∫ f(r1) · dr1(t) = ∫ f(r2) · dr2(t). The
loop C follows r1 first and then r2 in the opposite direction, so we obtain:
\[
\int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x}
= \int_{C_1} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} - \int_{C_2} \mathbf{f}(\mathbf{x}) \cdot d\mathbf{x} = 0.
\]


However, given a vector field f(x) = f1 i + f2 j + f3 k, can we determine if this
vector field is conservative? We can establish a simple criterion using relationships
like fxy = fyx.

Problem 19.4 (Criterion for conservative fields). Let D ⊂ R³ be an open simply
connected domain, and let f = (f1, f2, f3) be a vector field defined on D that is
continuously differentiable. Then, the existence of a potential function for f and the
following relations are equivalent:
\[
\frac{\partial f_1}{\partial y} = \frac{\partial f_2}{\partial x}, \qquad
\frac{\partial f_1}{\partial z} = \frac{\partial f_3}{\partial x}, \qquad
\frac{\partial f_2}{\partial z} = \frac{\partial f_3}{\partial y}. \tag{19.2}
\]

Solution 19.4 If f = ∇p is a potential field and continuously differentiable, then
(19.2) holds because the mixed second partial derivatives of p are equal; for example,
∂f1/∂y = ∂²p/∂y∂x = ∂²p/∂x∂y = ∂f2/∂x (verify the other two). Conversely, assuming
(19.2) holds, proving that f is a potential field is a bit more difficult and will be
shown later when proving Stokes’ theorem. □

It is important to note in the above criterion that D must be simply connected, not
just connected. The definition of simply connected is as follows.

Definition 19.2. An open domain D is called simply connected if every closed sim-
ple curve in D can be continuously contracted to a single point in D without leaving
D.
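To see why simple connectedness cannot be dropped, here is a hedged numerical sketch using a standard example that is not taken from the lecture: the field f = (−y, x, 0)/(x² + y²), defined on R³ with the z-axis removed. The helper names are hypothetical. The field satisfies (19.2) at every point of its domain, yet its line integral around the unit circle is 2π rather than 0, so it has no potential function on that domain.

```python
import math

def f(x, y, z):
    # Hypothetical example field, defined wherever x^2 + y^2 > 0.
    s = x * x + y * y
    return (-y / s, x / s, 0.0)

def partial(comp, var, point, h=1e-5):
    # Central difference of component `comp` of f in coordinate `var`.
    plus, minus = list(point), list(point)
    plus[var] += h
    minus[var] -= h
    return (f(*plus)[comp] - f(*minus)[comp]) / (2.0 * h)

def criterion_residuals(point):
    # The three differences appearing in (19.2); all should be close to zero.
    return (
        partial(0, 1, point) - partial(1, 0, point),
        partial(0, 2, point) - partial(2, 0, point),
        partial(1, 2, point) - partial(2, 1, point),
    )

def circulation_unit_circle(n=2000):
    # Midpoint rule for the loop integral along r(t) = (cos t, sin t, 0).
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        fx, fy, _ = f(math.cos(t), math.sin(t), 0.0)
        total += (fx * (-math.sin(t)) + fy * math.cos(t)) * h
    return total

residuals = criterion_residuals((0.7, -0.4, 1.2))
loop_value = circulation_unit_circle()
print(residuals, loop_value)  # residuals near 0, loop integral near 2*pi
```

The unit circle surrounds the removed z-axis and cannot be contracted to a point inside the domain, which is exactly the situation excluded by Definition 19.2.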

Understanding the concept of simple connectedness is essential to solving the
following problem.

Problem 19.5. Determine whether the regions D depicted in the four figures are
simply connected. (The point marked in the middle indicates that the region has
been removed around that point.)

Solution 19.5 (1) The first figure is a partial disc with a small part removed around
the center, creating a loop that cannot be contracted to a single point. (Can you
identify which loop it is?) Therefore, it is not simply connected. (2) The second
figure is a solid ball with a small part removed around the center. Because the region
is three-dimensional, every loop can be moved off the hole and contracted to a single
point. Therefore, it is simply connected. (3) The third figure
is obtained by drilling a cylinder through a filled sphere. It is not simply connected. (4)
The fourth figure resembles a donut shape, known as a torus. The inside is filled,
which in practice is equivalent to structure (3), and it is not simply connected. □

19.3 Flow and circulation

Now let’s consider line integrals from the perspective of a velocity field representing
the flow of a fluid, rather than from the viewpoint of a force field like gravity. In this
case, instead of calling the line integral “work,” we refer to it as the flow around the
curve C:
\[
\text{Flow around } C = \int_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{r}(t).
\]
In particular, when the curve C is a closed and simple curve, i.e., a loop, we call it
the circulation around the loop C:
\[
\text{Circulation around } C = \oint_C \mathbf{f}(\mathbf{x}) \cdot d\mathbf{r}(t).
\]

Of course, when the vector field f is not specified as a force field or a velocity field,
whether we call it work, flow, or circulation, we understand that its meaning simply
denotes a line integral.

Question 19.1. The concept of circulation is meaningless for conservative vector


fields. Why is that?

If f is a conservative vector field, then the circulation around every loop is zero. This
was already shown in Problem 19.3.
Among the three vector field diagrams following Problem 18.2, one of them clearly has
nonzero circulation. Which one is it? For the remaining two diagrams, it is difficult to
say definitively from the picture alone whether the circulation is zero.

Question 19.2. When computing line integrals, changing the direction of the pa-
rameterization of the curve reverses the sign. This property poses a serious problem
when calculating circulation. What is the issue?

The sign of the circulation around a curve C depends on the direction in which C is
traversed, so by itself it cannot tell a positive circulation from a negative one. If
the fluid exhibits vortex behavior, it’s crucial to determine
whether the circulation is in the clockwise or counterclockwise direction. If this
distinction cannot be made, the concept becomes useless. Therefore, we need to
establish a convention. When computing circulation, we traverse the
loop in the counterclockwise direction as shown in the diagram. However, the problem
is not yet fully resolved. For example, for a loop in three-dimensional
space as shown on the right, it’s not clear which direction is counterclockwise. We
need to decide which direction counts as counterclockwise. This is equivalent to
choosing an orientation.

Suppose there is a surface S in space with a curve C on it. To define a parameter-


ization r(t) moving counterclockwise along this curve, we first need to establish an
orientation, such as pointing the normal vector n upward. Then, the counterclock-
wise direction becomes clear. To discuss the circulation of a three-dimensional fluid,
we must first establish a direction. We define circulation with respect to this direc-
tion after choosing a vector n as shown in the diagram. When discussing circulation
in ocean currents, for example, we typically choose a vector n pointing outward
from the Earth’s surface and then select the counterclockwise direction based on
that orientation.

Exercises

1.
Lecture 20
Surface Integral for Normal Component

In this lecture, we will learn about surface integrals for the normal component of
a vector field. Since we are integrating the component of the vector field perpendicular
to the surface, it’s also appropriate to call it a normal integral. In three-dimensional
space, a surface has two dimensions, but in general, in n-dimensional space, a
surface has n − 1 dimensions. In two-dimensional space, a one-dimensional curve
serves the role of a surface. In any case, integrating the normal component of a vec-
tor field over a given surface is called a surface integral. Surface integrals play an
important role, such as computing the flux leaving an area through its boundary.

20.1 Surface integral for a scalar function


Let’s start by reviewing surface integrals of scalar-valued functions, which we
have already studied in Lecture 15. In the case of surface integrals, the integration
domain S ⊂ R³ is a surface. The boundary of a three-dimensional region
is a surface. For example, the boundary of a three-dimensional ball is
a spherical surface. However, it’s difficult to directly integrate a function f over a
surface S. Usually, when a parametrization g(u, v) of S using two variables (u, v) ∈ G
is given as shown in the figure, the integral is computed as follows:
\[
\int_S f(\mathbf{x})\,d\mathbf{x} = \int_G f(g(u, v))\,\|g_u \times g_v\|\,du\,dv. \tag{20.1}
\]

For scalar functions, the value of the surface integral remains the same regardless of
the parametrization used.

Problem 20.1. Let F : R³ → R be continuously differentiable, and let S be a part of
the level set F(x, y, z) = 1, lying above the region G in the xy-plane as shown in the
figure. Show that the area of this surface is given by:
\[
\int_G \frac{\|\nabla F\|}{|\partial_z F|}\,dx\,dy. \qquad \text{(formula for area of level surface)}
\]

Solution 20.1 Let’s write x, y directly instead of using parameters u, v, and suppose
the surface is the graph z = h(x, y) over G. Then, the function g can be written as
follows:
\[
g(x, y) = \begin{pmatrix} x \\ y \\ h(x, y) \end{pmatrix}.
\]
Then, the area expansion rate is as follows:
\[
\|g_x \times g_y\| = \|(1; 0; h_x) \times (0; 1; h_y)\| = \sqrt{h_x^2 + h_y^2 + 1}.
\]

Since g(x, y) lies on the surface, it satisfies F(x, y, h(x, y)) = 1. Since the derivative
of a constant is 0, we have:
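\[
\frac{\partial}{\partial x} F(g(x, y)) = \frac{\partial}{\partial x} F(x, y, h(x, y)) = F_x + F_z h_x = 0,
\]
\[
\frac{\partial}{\partial y} F(g(x, y)) = \frac{\partial}{\partial y} F(x, y, h(x, y)) = F_y + F_z h_y = 0.
\]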

Therefore, we obtain hx = −Fx/Fz and hy = −Fy/Fz. Substituting them, the expansion
rate is as follows:
\[
\sqrt{h_x^2 + h_y^2 + 1}
= \sqrt{\Big(\frac{F_x}{F_z}\Big)^2 + \Big(\frac{F_y}{F_z}\Big)^2 + \Big(\frac{F_z}{F_z}\Big)^2}
= \frac{\|\nabla F\|}{|F_z|}.
\]
Since the area is obtained by integrating the constant function f = 1, we obtain the
formula for area mentioned above. □
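As a sanity check of the area formula, the following sketch evaluates it for an example that is an assumption of this illustration, not part of the lecture: F(x, y, z) = x² + y² + z², so the level set F = 1 is the unit sphere, and G is the disk x² + y² < a² with a = 0.6. The exact area of this spherical cap is 2π(1 − √(1 − a²)) = 0.4π.

```python
import math

def grad_F(x, y, z):
    # Gradient of the example level function F(x, y, z) = x^2 + y^2 + z^2.
    return (2.0 * x, 2.0 * y, 2.0 * z)

def h(x, y):
    # Upper part of the level set F = 1 written as a graph z = h(x, y).
    return math.sqrt(1.0 - x * x - y * y)

def cap_area(a, n_rho=400, n_theta=400):
    # Midpoint rule in polar coordinates for the integral of
    # |grad F| / |F_z| over the disk x^2 + y^2 < a^2 (requires a < 1).
    d_rho = a / n_rho
    d_theta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            x, y = rho * math.cos(theta), rho * math.sin(theta)
            gx, gy, gz = grad_F(x, y, h(x, y))
            norm = math.sqrt(gx * gx + gy * gy + gz * gz)
            total += norm / abs(gz) * rho * d_rho * d_theta
    return total

area = cap_area(0.6)
exact = 2.0 * math.pi * (1.0 - math.sqrt(1.0 - 0.6 ** 2))
print(area, exact)
```

The formula requires Fz ≠ 0 on the surface, which is why the disk radius is kept strictly below 1 here.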

20.2 Surface, normal vector, and tangent plane

In three-dimensional space, there are many tangent lines touching a point on a given
two-dimensional surface. Moreover, a single tangent line does not characterize the
shape of the surface. The collection of all these tangent lines forms the tangent plane,
as depicted in the figure above. This tangent plane provides information about the
surface near a point. Therefore, instead of individual tangent lines, we need to find
the tangent plane.

Let’s denote the intersection of two line segments parallel to each axis of the
domain G of the given parametric function g(u, v) as (u0 , v0 ). Then, these two line
segments can be represented as (u0 , v) with a < v < b and (u, v0 ) with c < u < d.
The image of these line segments on the surface S becomes a curve. The partial
derivatives gu (u0 , v0 ) and gv (u0 , v0 ) become vectors tangent to these two curves at
the point g(u0 , v0 ). Therefore, the cross product of these two vectors is perpendicular
to the tangent plane. The unit vector in this direction is given by:
\[
\mathbf{n} = \frac{g_u \times g_v}{\|g_u \times g_v\|}. \tag{20.2}
\]

Thus, the equation of the tangent plane is
\[
(\mathbf{x} - \mathbf{x}_0) \cdot \mathbf{n} = 0, \qquad \mathbf{x}_0 = g(u_0, v_0).
\]
The equation can also be written component-wise as:
\[
(x - x_0) n_1 + (y - y_0) n_2 + (z - z_0) n_3 = 0, \qquad
\mathbf{x}_0 = (x_0, y_0, z_0), \quad \mathbf{n} = (n_1, n_2, n_3).
\]
Of course, when obtaining the equation of the tangent plane, it’s not necessary to use
the unit vector n; any vector perpendicular to the plane can be used. Therefore,
it’s convenient to use
\[
(\mathbf{x} - \mathbf{x}_0) \cdot (g_u \times g_v) = 0
\]
without explicitly using the unit vector.

Problem 20.2. At the point (0; 3; 4) on the surface of a sphere with center at the
origin and radius 5, the direction vector perpendicular to the sphere is simply
(1/5)(0; 3; 4). Verify this by using the method of (20.2) to find the vector n.

Solution 20.2 The key to this problem is to choose g and compute the vector yourself.
Let’s set the domain of g as G = {(x, y) : x² + y² < 25}, and define the function
\[
g(x, y) = \big(x;\ y;\ \sqrt{25 - x^2 - y^2}\big) = \big(x;\ y;\ (25 - x^2 - y^2)^{0.5}\big), \qquad x^2 + y^2 < 25.
\]
It’s easy to see that g(x, y) lies on the sphere. The point (0; 3; 4) occurs when x =
0, y = 3. Therefore,
\[
g_x(0, 3) = \Big(1;\ 0;\ -\tfrac{1}{2}(25 - x^2 - y^2)^{-0.5}\,2x\Big)\Big|_{(0,3)} = (1; 0; 0),
\]
\[
g_y(0, 3) = \Big(0;\ 1;\ -\tfrac{1}{2}(25 - x^2 - y^2)^{-0.5}\,2y\Big)\Big|_{(0,3)} = (0; 1; -3/4).
\]
Now, to use (20.2), compute the cross product and direction vector:
\[
g_x(0, 3) \times g_y(0, 3) = \Big(0;\ \tfrac{3}{4};\ 1\Big), \qquad \mathbf{n} = \tfrac{1}{5}(0; 3; 4).
\]
Note that depending on the choice of parameters, n = −(1/5)(0; 3; 4) could also be
obtained. □

Problem 20.3. Revisit Problem 20.2 using spherical coordinates.

Solution 20.3 Since the radius of the sphere is fixed at ρ = 5, we choose parameters
φ and θ, and let the domain be (φ, θ) ∈ [0, π] × [0, 2π]. Then, define g as follows:
\[
g(\varphi, \theta) = \begin{pmatrix} 5 \sin\varphi \cos\theta \\ 5 \sin\varphi \sin\theta \\ 5 \cos\varphi \end{pmatrix}.
\]
Now, let’s find the normal vector. □
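A numerical cross-check of Problems 20.2 and 20.3 is sketched below. It is only an illustration: the helper names are hypothetical and the partial derivatives are approximated by central differences instead of being computed by hand. It applies formula (20.2) to both the graph parametrization and the spherical parametrization at the point (0; 3; 4), which corresponds to cos φ = 4/5 and θ = π/2.

```python
import math

def cross(a, b):
    # Cross product of two 3-vectors.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(g, u, v, h=1e-6):
    # Formula (20.2), with g_u and g_v approximated by central differences.
    gu = tuple((p - m) / (2.0 * h) for p, m in zip(g(u + h, v), g(u - h, v)))
    gv = tuple((p - m) / (2.0 * h) for p, m in zip(g(u, v + h), g(u, v - h)))
    c = cross(gu, gv)
    norm = math.sqrt(sum(ci * ci for ci in c))
    return tuple(ci / norm for ci in c)

def graph(x, y):
    # Parametrization used in Solution 20.2.
    return (x, y, math.sqrt(25.0 - x * x - y * y))

def spherical(phi, theta):
    # Parametrization used in Solution 20.3.
    return (5.0 * math.sin(phi) * math.cos(theta),
            5.0 * math.sin(phi) * math.sin(theta),
            5.0 * math.cos(phi))

n_graph = unit_normal(graph, 0.0, 3.0)
n_spherical = unit_normal(spherical, math.acos(0.8), 0.5 * math.pi)
print(n_graph, n_spherical)  # both should be close to (0, 0.6, 0.8)
```

With the parameter order (φ, θ) the cross product points outward; swapping the order of the parameters would give the opposite sign, as noted at the end of Solution 20.2.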


20.3 Surface integral for a vector field

Let f : R³ → R³ be a vector field and S ⊂ R³ be a surface. Now, let’s define the
surface integral of the vector field f over the surface S. Previously, we defined the
line integral along a curve C as follows:
\[
\int_C \mathbf{f} \cdot \mathbf{T}\,d\mathbf{x}.
\]
Here, T is the unit tangent vector of the curve. There are two possible
directions, and depending on the choice, the magnitude remains the same but the
sign changes (see the figure). On a surface there are infinitely many tangent directions
at each point, so such a definition is meaningless. However, there are only two normal
directions on the surface, and they provide information about the surface (see the
figure). Therefore, the surface integral is defined as follows:
\[
\int_S \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}.
\]
That is, the component of the vector field perpendicular to the surface is a scalar, and
integrating it gives the surface integral of the vector field.
It is worth noting that there are two possibilities for the normal vector n. There-
fore, in some cases, it may not matter which direction you choose, but if a direction
is specified in the problem, you should choose the normal vector n accordingly.
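Now, using parameterization, the surface integral above can be written as follows.
\[
\int_S \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
= \int_G \mathbf{f}(g(u, v)) \cdot \frac{g_u \times g_v}{\|g_u \times g_v\|}\,\|g_u \times g_v\|\,du\,dv.
\]
The expansion rate ∥gu × gv∥ cancels out and simplifies further:
\[
\int_S \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} = \int_G \mathbf{f}(g(u, v)) \cdot (g_u \times g_v)\,du\,dv.
\]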

Problem 20.4. Let the vector field f(x) = (y; xz; z), and the surface S be defined by
x = y + z² with 0 < z < 2 and 0 < y < 3. Compute the surface integral ∫_S f · n dx.

Solution 20.4 In this problem, since the normal vector n is not specified as one
of two possible normal vectors, there may be two possible answers that differ by sign.
Let’s compute one of them. First, choose parameters (y, z) and let the domain be
G = {0 < y < 3, 0 < z < 2}. The mapping g : G → R³ is as follows:
\[
g(y, z) = \begin{pmatrix} y + z^2 \\ y \\ z \end{pmatrix}.
\]
Therefore,
\[
g_y = (1, 1, 0), \qquad g_z = (2z, 0, 1), \qquad g_y \times g_z = (1, -1, -2z).
\]
The surface integral is then given by:
\[
\int_G \mathbf{f} \cdot (g_y \times g_z)\,dy\,dz = \int_0^2 \!\! \int_0^3 (y - yz - z^3 - 2z^2)\,dy\,dz.
\]
(The cross product gy × gz = (1, −1, −2z) is non-zero at every point of G. This condition
is necessary for the normal vector n to be defined by (20.2).) □
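The remaining double integral can be evaluated by hand or checked numerically. The sketch below (helper names are hypothetical) builds the integrand directly from f, g, and the cross product computed above, and applies a midpoint rule; integrating by hand gives −28 for this choice of normal, and +28 for the opposite one.

```python
def f(x, y, z):
    # Vector field of Problem 20.4.
    return (y, x * z, z)

def g(y, z):
    # Parametrization of the surface x = y + z^2.
    return (y + z * z, y, z)

def normal(y, z):
    # g_y x g_z = (1, 1, 0) x (2z, 0, 1) = (1, -1, -2z), not normalized.
    return (1.0, -1.0, -2.0 * z)

def surface_integral(n_y=100, n_z=400):
    # Midpoint rule for the integral of f(g(y, z)) . (g_y x g_z)
    # over 0 < y < 3, 0 < z < 2.
    dy, dz = 3.0 / n_y, 2.0 / n_z
    total = 0.0
    for i in range(n_y):
        y = (i + 0.5) * dy
        for j in range(n_z):
            z = (j + 0.5) * dz
            fx, fy, fz = f(*g(y, z))
            nx, ny, nz = normal(y, z)
            total += (fx * nx + fy * ny + fz * nz) * dy * dz
    return total

value = surface_integral()
print(value)  # close to -28
```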

Question 20.1. Surface integrals are not possible for all surfaces, and conditions
on the surface are necessary for the calculations we’ve done in this lecture. What
conditions are necessary?

The surface S must be smooth for us to perform the calculations in this lecture.
Smoothness of the surface means that the parameter function g(u, v) is a differen-
tiable function. Additionally, the partial derivatives gu and gv must not be zero, and
they must not be parallel. This ensures that gu × gv is not the zero vector, allowing
us to compute the normal vector n using (20.2). In other words, ∇g must always be
a rank 2 matrix.

Exercises

1.
Lecture 21
Divergence Theorem #1

Let f : Rⁿ → Rⁿ be a differentiable vector function, and let D ⊂ Rⁿ be a domain
whose boundary surface S = ∂D consists of piecewise smooth surfaces. In this setting,
the Divergence Theorem relates the surface integral of the vector field f over S to the
volume integral of the divergence ∇ · f over D. In the
Divergence Theorem, there are no restrictions on the dimension n of space. However,
the upcoming Stokes’ theorem is only meaningful in three-dimensional space,
where n = 3.

21.1 Divergence and boundary of a domain

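A vector function f : Rⁿ → Rⁿ has n components. The divergence of a vector function
is defined as follows:
\[
\operatorname{div}(\mathbf{f}) = \nabla \cdot \mathbf{f}
= \frac{\partial f_1}{\partial x_1} + \cdots + \frac{\partial f_n}{\partial x_n}
= (f_1)_{x_1} + \cdots + (f_n)_{x_n}.
\]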
In words, it is the sum of the partial derivatives of each component with respect to
its corresponding variable. Although we will mainly focus on the case where n = 3,
this theorem is valid regardless of the number of components.

Problem 21.1. Compute the divergence of the following vector functions.

(1) f = xi + yj + zk (2) f = yi + zj + xk (3) f = x/∥x∥³

Solution 21.1 ⊔
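One way to check answers to Problem 21.1 is a finite-difference approximation of the divergence. The sketch below is an illustration with hypothetical helper names and an arbitrarily chosen evaluation point away from the origin. Hand computation gives 3 for (1), 0 for (2), and 0 for (3) whenever x ≠ 0.

```python
def divergence(field, point, h=1e-5):
    # Sum over i of the central difference of component i in variable i.
    total = 0.0
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        total += (field(*plus)[i] - field(*minus)[i]) / (2.0 * h)
    return total

def field_1(x, y, z):
    return (x, y, z)

def field_2(x, y, z):
    return (y, z, x)

def field_3(x, y, z):
    # x / |x|^3, defined away from the origin.
    r3 = (x * x + y * y + z * z) ** 1.5
    return (x / r3, y / r3, z / r3)

point = (0.4, -1.1, 0.8)  # arbitrary test point, not the origin
div_1 = divergence(field_1, point)
div_2 = divergence(field_2, point)
div_3 = divergence(field_3, point)
print(div_1, div_2, div_3)  # close to 3, 0, 0
```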

Next, consider the properties of the boundary surface of a given domain D ⊂ Rⁿ.

Problem 21.2. Describe the properties of the boundary surfaces of the following domains.


(1) D = {x ∈ R2 : ∥x∥ < 1} (2) D = {x ∈ R3 : ∥x∥ < 1} (3) D = {x ∈ R4 : ∥x∥ < 1}


Solution 21.2 (1) The boundary of a disk with radius 1 is a closed curve, namely
a circle. The disk is in 2-dimensional space, and its boundary is a 1-dimensional
curve. At each boundary point, the normal vectors form a 1-dimensional line. (2) The
boundary of a ball with radius 1 is a surface called a sphere. This surface has no
boundary. The ball is in 3-dimensional space, and its boundary, the sphere, is a
2-dimensional surface. The normal vectors form a 1-dimensional line. (3) The boundary
of a 4-dimensional ball with radius 1 is a kind of sphere. This surface has no boundary.
The domain D is in 4-dimensional space, and its boundary is 3-dimensional. The normal
vectors form a 1-dimensional line. □

Although there may be special cases, as seen above, we can think of the boundary
of a bounded n-dimensional domain D ⊂ Rⁿ as an (n − 1)-dimensional surface.
Additionally, the normal vectors always form a 1-dimensional line. Therefore, when
the boundary of a bounded n-dimensional domain D ⊂ Rⁿ forms an (n − 1)-dimensional
smooth surface, the surface integral of the vector field f : Rⁿ → Rⁿ is defined in the
same way:
\[
\int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}.
\]

Now, regarding the direction vector n perpendicular to the surface, there are two
possible cases. For a surface like ∂ D that forms the boundary of a single domain,
the convention is to choose the outward unit normal vector pointing from inside the
space to outside.

21.2 Divergence theorem

Theorem 21.1 (Divergence Theorem). Let f : Rⁿ → Rⁿ be a continuously differentiable
vector field, D ⊂ Rⁿ be a bounded domain with an oriented and piecewise
smooth boundary ∂D, and n be the outward unit normal vector on the boundary
∂D. Then,
\[
\int_D \nabla \cdot \mathbf{f}\,d\mathbf{x} = \int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}.
\]

Proof. We will only prove the case n = 2 instead of general dimensions. The proof
for dimensions 3 and higher is essentially the same but involves
more cumbersome notation. To complete the proof, we should show convergence, but we
will omit that part and conclude by explaining the core mechanism. Additionally, we will
only consider the case when the domain D is a rectangle.
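Let f : R² → R² be given by f(x, y) = f1(x, y)i + f2(x, y)j. Consider a partition of
the domain D into small rectangular cells with side lengths ∆x and ∆y, and let one of
the cells, with center (x, y), be denoted as Di. Then, Di
has a boundary consisting of 4 line segments C1, C2, C3, C4, and the outward unit
normal vector for each boundary is as shown in the figure. (Since we are dealing
with the 2-dimensional case and the line forms the boundary of Di, it should not be
thought of as a line integral. We should consider it as a surface integral because we
are integrating the normal component.) Now, when we perform the surface integral
over the boundary of Di, we have
\[
\begin{aligned}
\int_{\partial D_i} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
&= \int_{C_1} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} + \int_{C_3} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
 + \int_{C_2} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} + \int_{C_4} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} \\
&\cong \Big( f_1\big(x + \tfrac{\Delta x}{2}, y\big) - f_1\big(x - \tfrac{\Delta x}{2}, y\big) \Big) \Delta y
 + \Big( f_2\big(x, y + \tfrac{\Delta y}{2}\big) - f_2\big(x, y - \tfrac{\Delta y}{2}\big) \Big) \Delta x \\
&= \Big( \frac{f_1(x + \frac{\Delta x}{2}, y) - f_1(x - \frac{\Delta x}{2}, y)}{\Delta x}
 + \frac{f_2(x, y + \frac{\Delta y}{2}) - f_2(x, y - \frac{\Delta y}{2})}{\Delta y} \Big) \Delta x \Delta y \\
&\cong \Big( \frac{\partial f_1}{\partial x}(x, y) + \frac{\partial f_2}{\partial y}(x, y) \Big) \Delta x \Delta y
 = (\nabla \cdot \mathbf{f})\,\Delta x \Delta y.
\end{aligned}
\]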

Now, let’s consider the surface integral over the entire boundary ∂ D. Each line
segment on the boundary of a cell that lies inside contributes twice to the surface
integral, with opposite directions of the normal vector. On the other hand, the line
segments on the boundary of the entire domain D belong to only one cell. Therefore,
\[
\int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
= \sum_{i=1}^{N} \int_{\partial D_i} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
\cong \sum_{i=1}^{N} (\nabla \cdot \mathbf{f})\,\Delta x \Delta y
\cong \int_D \nabla \cdot \mathbf{f}\,d\mathbf{x}.
\]
In the above calculation, the first equality holds because the internal boundaries of
the cells always appear twice, canceling each other out. This proof demonstrates
why the surface integral and the volume integral are connected by the divergence
theorem. □
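The mechanism of the proof can be imitated numerically on a rectangle. The following sketch makes its own assumptions: the field f = (x²y, sin y + x) and the domain D = [0, 2] × [0, 1] are hypothetical examples, not from the lecture. It sums (∇ · f)∆x∆y over the cells and compares the result with the integral of the normal component over the four sides; both should be close to 2 + 2 sin 1.

```python
import math

def f(x, y):
    # Hypothetical smooth vector field on the plane.
    return (x * x * y, math.sin(y) + x)

def div_f(x, y):
    # Divergence of f computed by hand: d/dx (x^2 y) + d/dy (sin y + x).
    return 2.0 * x * y + math.cos(y)

def volume_integral(n=200):
    # Sum of (div f) * dx * dy over the cells of D = [0, 2] x [0, 1],
    # with div f evaluated at the center of each cell.
    dx, dy = 2.0 / n, 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += div_f((i + 0.5) * dx, (j + 0.5) * dy) * dx * dy
    return total

def boundary_flux(n=200):
    # Integral of f . n over the four sides with the outward normal:
    # +i on x = 2, -i on x = 0, +j on y = 1, -j on y = 0.
    dx, dy = 2.0 / n, 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * dy
        total += (f(2.0, y)[0] - f(0.0, y)[0]) * dy
    for i in range(n):
        x = (i + 0.5) * dx
        total += (f(x, 1.0)[1] - f(x, 0.0)[1]) * dx
    return total

inside = volume_integral()
outside = boundary_flux()
print(inside, outside)  # both close to 2 + 2*sin(1)
```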

Problem 21.3. (1) Prove the divergence theorem in one dimension. (2) Prove the
divergence theorem in three dimensions.

Solution 21.3 (2) Let’s illustrate only a part of the general proof. Similar to the
2-dimensional case, consider a domain D ⊂ R³ in the shape of a cube and divide the
entire domain into small cube cells, denoted as Di. Let (x, y, z) be the coordinates of
the center of a cell. The boundary of this cell consists of six faces, as shown below.
Let’s denote the two faces perpendicular to the x-axis as S1 and S2. The integral
over the surface of this cell, ∫_{∂Di} f · n, involves surface integrals over six faces, but
we are interested in the surface integrals over S1 and S2, which can be approximated
as follows:
\[
\begin{aligned}
\int_{S_1} \mathbf{f} \cdot \mathbf{n} + \int_{S_2} \mathbf{f} \cdot \mathbf{n}
&\cong \int_{S_1} f_1 - \int_{S_2} f_1
 \cong \Big( f_1\big(x + \tfrac{\Delta x}{2}, y, z\big) - f_1\big(x - \tfrac{\Delta x}{2}, y, z\big) \Big) \Delta y \Delta z \\
&= \frac{f_1(x + \frac{\Delta x}{2}, y, z) - f_1(x - \frac{\Delta x}{2}, y, z)}{\Delta x}\,\Delta x \Delta y \Delta z
 \cong \frac{\partial f_1}{\partial x}\,\Delta x \Delta y \Delta z.
\end{aligned}
\]
This calculation demonstrates how the surface integral changes into a volume integral
for divergence. Now, performing similar surface integrals for the remaining
pairs of faces, we obtain partial derivatives with respect to y and z. Summing them
all up, we get:
\[
\int_{\partial D_i} \mathbf{f} \cdot \mathbf{n} \cong \nabla \cdot \mathbf{f}(x, y, z)\,\Delta x \Delta y \Delta z.
\]
Now, summing up the integrals over all cells leads to the divergence theorem, as in
the proof of the theorem. The integral over the normal component applies regardless
of the dimension. □

Problem 21.4. How do we prove the theorem for a general case where the domain
is not a rectangle?

Solution 21.4 Let’s consider the case when the shape of the domain is not a rectangle,
as shown below. First, we’ll enclose the domain D within a rectangular area and
divide it into small cells of side length ε, as depicted. Let’s gather all cells inside the
domain D and call their union Dε. Then, Dε is not a rectangle, but we can apply the
previous proof directly to obtain:
\[
\int_{D_\varepsilon} \nabla \cdot \mathbf{f} = \int_{\partial D_\varepsilon} \mathbf{n} \cdot \mathbf{f}.
\]

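Since the size of the region D \ Dε, where D and Dε differ, approaches 0 as ε → 0, and
since ∇ · f is a bounded function, we have
\[
\lim_{\varepsilon \to 0} \int_{D_\varepsilon} \nabla \cdot \mathbf{f} = \int_D \nabla \cdot \mathbf{f}.
\]
For the boundary integral, we also want
\[
\lim_{\varepsilon \to 0} \int_{\partial D_\varepsilon} \mathbf{n} \cdot \mathbf{f} = \int_{\partial D} \mathbf{n} \cdot \mathbf{f},
\]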

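but this is not obvious. Even though the length of ∂Dε does not converge
to the length of ∂D as ε → 0, we can still observe that the relationship holds.
For instance, in the triangle on the right side of the figure, the red and black line
segments have different total lengths, but the integrals of n · f over them are the same.
Calculating, with ℓ3 the slanted side of length √2 ∆x,
\[
\int_{\ell_1} \mathbf{n} \cdot \mathbf{f} = \int_{\ell_1} f_2 \cong f_2(x, y)\,\Delta x, \qquad
\int_{\ell_2} \mathbf{n} \cdot \mathbf{f} = \int_{\ell_2} f_1 \cong f_1(x, y)\,\Delta x,
\]
\[
\int_{\ell_3} \mathbf{n} \cdot \mathbf{f}
= \int_{\ell_3} \Big( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \Big) \cdot \mathbf{f}
= \int_{\ell_3} \Big( \frac{f_1}{\sqrt{2}} + \frac{f_2}{\sqrt{2}} \Big)
\cong \big( f_1(x, y) + f_2(x, y) \big) \Delta x.
\]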
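Now, combining the above relationships, we get:
\[
\int_D \nabla \cdot \mathbf{f}\,d\mathbf{x}
= \lim_{\varepsilon \to 0} \int_{D_\varepsilon} \nabla \cdot \mathbf{f}
= \lim_{\varepsilon \to 0} \int_{\partial D_\varepsilon} \mathbf{n} \cdot \mathbf{f}
= \int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}.
\]
These relationships provide insight into why the theorem holds, although it’s not a
rigorous proof. □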

A vector field f satisfying ∇ · f = 0 is said to be incompressible. The reason lies
in the divergence theorem. By the theorem,
\[
\int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} = \int_D \nabla \cdot \mathbf{f}\,d\mathbf{x} = 0.
\]

This means that regardless of the chosen domain D, the amount entering and leaving
the domain always remains the same, so the total quantity does not change. If it were
compressed, more would enter, and if it were expanded, more would leave, but the
fact that the total quantity remains unchanged implies that there is no compression
or expansion. Now, let’s practice with some examples.

Problem 21.5. Let f(x, y, z) = xi + yj + zk and D = {x : x² + y² + z² < r²}. Compute
∫_D ∇ · f dx and ∫_{∂D} f · n dx, and confirm that they are equal.

Solution 21.5 ⊔
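One possible way to carry out this check is sketched below. It is not necessarily the intended solution; it uses only the symmetry of the ball, the volume 4πr³/3 of a ball of radius r, and the area 4πr² of its boundary sphere.

```latex
% Volume side: the divergence is constant.
\nabla \cdot \mathbf{f} = \partial_x x + \partial_y y + \partial_z z = 3,
\qquad
\int_D \nabla \cdot \mathbf{f}\, d\mathbf{x} = 3 \cdot \frac{4}{3}\pi r^3 = 4\pi r^3 .

% Boundary side: on the sphere \|\mathbf{x}\| = r the outward unit normal is \mathbf{x}/r.
\mathbf{f} \cdot \mathbf{n} = \mathbf{x} \cdot \frac{\mathbf{x}}{r} = \frac{\|\mathbf{x}\|^2}{r} = r,
\qquad
\int_{\partial D} \mathbf{f} \cdot \mathbf{n}\, d\mathbf{x} = r \cdot 4\pi r^2 = 4\pi r^3 .
```

Both sides equal 4πr³, in agreement with the divergence theorem.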

Problem 21.6. Let f(x) = −GMm x/∥x∥³ and D = {x : r² < x² + y² + z² < R²}. Compute
∫_D ∇ · f dx and ∫_{∂D} f · n dx, and confirm that they are equal.

Solution 21.6 ⊔

Divergence theorem in 2 dimensions

Let C be a simple closed curve in the two-dimensional plane and
D be the region in two dimensions enclosed by the curve C. Applying the
two-dimensional divergence theorem, we obtain
\[
\oint_C \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} = \int_D (\nabla \cdot \mathbf{f})\,d\mathbf{x}.
\]

Here, the left-hand side should not be referred to as a line integral. We are integrating
the normal component of the vector field f, not the tangential component. Unlike line
integrals, where the value changes depending on the direction of integration, in this
case, it does not. It is more accurate to call it a surface integral in two dimensions. In
two dimensions, since the boundary of the domain is a curve rather than a surface, it
may seem otherwise. However, changing the direction of the normal vector n in the
surface integral changes the sign of the integral value. The right-hand side is more
of a simple two-dimensional double integral than a surface integral. Writing this in
terms of the components of the vector field f = f1 i + f2 j, we have
\[
\oint_C \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} = \int_D (\partial_x f_1 + \partial_y f_2)\,dx\,dy. \tag{21.1}
\]
C D

Let’s rewrite this for the curve C using the parameterization r(t) = x(t)i + y(t)j,
a ≤ t ≤ b. There are a few things to note here. Firstly, r′(t) = x′(t)i + y′(t)j is a
vector tangent to the curve. Then, what is the outward unit normal vector n? It is clear
that −y′(t)i + x′(t)j and y′(t)i − x′(t)j are normal vectors. But which one is outward?
It is related to the rotation direction of r(t). We usually choose curves that rotate
counterclockwise, in which case y′(t)i − x′(t)j is the outward normal direction.
Therefore, the outward unit normal n is given by:
\[
\mathbf{n} = \frac{y'(t)\,\mathbf{i} - x'(t)\,\mathbf{j}}{\|\mathbf{r}'(t)\|}.
\]
Hence, since the length element along the curve is ∥r′(t)∥ dt,
\[
\oint_C \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
= \int_a^b \frac{f_1 y'(t) - f_2 x'(t)}{\|\mathbf{r}'(t)\|}\,\|\mathbf{r}'(t)\|\,dt
= \int_a^b \big( f_1 y'(t) - f_2 x'(t) \big)\,dt
= \oint_C f_1\,dy - f_2\,dx,
\]

which allows us to rewrite (21.1) as follows:
\[
\oint_C \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} = \oint_C f_1\,dy - f_2\,dx = \int_D (\partial_x f_1 + \partial_y f_2)\,dx\,dy. \tag{21.2}
\]

This 2-dimensional divergence theorem concerning flux is known as Green’s theorem.
Remember that the notation in the middle refers to the line integral in the
counterclockwise direction.

Theorem 21.2 (Green’s Theorem for Flux). Let S ⊂ R2 be a bounded smooth do-
main. Let C = ∂ S be the boundary of S and n be the outward unit normal vector
on the boundary. If f : R2 → R2 is a continuously differentiable vector field in an
open set D including S, Equation (21.2) is satisfied, where the line integral is in the
counterclockwise direction.
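Equation (21.2) can be tested on a concrete case. The example below is an assumption of this sketch, not taken from the lecture: f = (x³, y³) on the unit disk, with the boundary traversed counterclockwise by r(t) = (cos t, sin t). Both sides can also be computed by hand and equal 3π/2.

```python
import math

def f(x, y):
    # Hypothetical example field.
    return (x ** 3, y ** 3)

def div_f(x, y):
    # d/dx (x^3) + d/dy (y^3).
    return 3.0 * (x * x + y * y)

def boundary_side(n=2000):
    # Integral of f1 dy - f2 dx along r(t) = (cos t, sin t), 0 <= t <= 2*pi,
    # i.e. of f1 * y'(t) - f2 * x'(t) with x'(t) = -sin t and y'(t) = cos t.
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        f1, f2 = f(math.cos(t), math.sin(t))
        total += (f1 * math.cos(t) - f2 * (-math.sin(t))) * h
    return total

def area_side(n_rho=400, n_theta=400):
    # Integral of div f over the unit disk, midpoint rule in polar coordinates.
    d_rho, d_theta = 1.0 / n_rho, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            x, y = rho * math.cos(theta), rho * math.sin(theta)
            total += div_f(x, y) * rho * d_rho * d_theta
    return total

flux = boundary_side()
volume = area_side()
print(flux, volume)  # both close to 3*pi/2
```

Reversing the parameterization to clockwise would flip the sign of the middle expression in (21.2), which is why the theorem fixes the counterclockwise direction.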
Lecture 22
Divergence Theorem #2

22.1 Flux and Conservation Law

Consider a vector field v representing the velocity of a fluid such as wind or water.
In this case, the divergence theorem leads to a conservation law. Let’s explain this
process.
Let u(x, t) be the density of the fluid at a point x at time t. Suppose we fix the
region D. Then, the integral
\[
\int_D u(\mathbf{x}, t)\,d\mathbf{x} \qquad \text{(total mass)}
\]
represents the total mass of the substance within the region D. If we differentiate
this quantity with respect to time, we obtain the rate of increase of mass within the
region D:
\[
\frac{d}{dt} \int_D u(\mathbf{x}, t)\,d\mathbf{x} \qquad \text{(rate of increase of mass)}
\]
Here, “rate of increase” means the rate of change with respect to time; a negative
value means that the mass is decreasing. On the other hand, at the boundary
of the region D, there are parts where fluid flows in and parts where it flows out.
The direction of flow is given by the velocity vector field v, and the amount of flow
is given by the product of velocity and density, which is called flux. That is, flux
is given by f = uv. Therefore, integrating this over the boundary yields the amount
leaving the region per unit time:
\[
\int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x} \qquad \text{(rate of mass outflow)}
\]

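Hence, the rate of increase of mass within the region and the outflow rate have
opposite signs. Therefore,
\[
\frac{d}{dt} \int_D u(\mathbf{x}, t)\,d\mathbf{x}
= -\int_{\partial D} \mathbf{f} \cdot \mathbf{n}\,d\mathbf{x}
= -\int_D \nabla \cdot \mathbf{f}\,d\mathbf{x}.
\]
Here, the second equality follows from the divergence theorem. Now, moving the time
derivative inside the integral (the region D is fixed) and rearranging, we obtain:
\[
\int_D \big( u_t + \nabla \cdot \mathbf{f} \big)\,d\mathbf{x} = 0.
\]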

This equation, written in integral form, is called the conservation law. Since the
integral over every region D must be zero, the integrand itself must be zero (assuming
it is continuous). Therefore, we obtain:
\[
u_t(\mathbf{x}, t) + \nabla \cdot \mathbf{f} = 0.
\]
This partial differential equation is commonly known as the mass conservation law.
In three dimensions, if the velocity is written as v = (v1, v2, v3), then the equation
can be rewritten as follows:
\[
u_t(\mathbf{x}, t) + (u v_1)_x + (u v_2)_y + (u v_3)_z = 0.
\]
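The balance between mass and flux can be observed in a small discrete model. The sketch below is a hypothetical one-dimensional illustration, not part of the lecture: it discretizes u_t + (uv)_x = 0 with constant velocity v > 0 on a periodic interval using a simple upwind scheme (a technique chosen here for brevity). Each cell loses exactly what its right neighbor gains, so the discrete total mass stays constant up to rounding.

```python
import math

def total_mass(u, dx):
    # Discrete version of the integral of u over the domain.
    return sum(u) * dx

def upwind_step(u, v, dx, dt):
    # One conservative upwind step for u_t + (u v)_x = 0 with constant v > 0
    # on a periodic grid: cell i loses flux[i] through its right edge and
    # gains flux[i - 1] through its left edge (index -1 wraps around).
    flux = [v * ui for ui in u]
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(len(u))]

n = 100
dx = 1.0 / n
v = 1.0
dt = 0.5 * dx / v  # time step chosen small enough for stability
u = [1.0 + 0.5 * math.sin(2.0 * math.pi * (i + 0.5) * dx) for i in range(n)]

mass_before = total_mass(u, dx)
for _ in range(200):
    u = upwind_step(u, v, dx, dt)
mass_after = total_mass(u, dx)
print(mass_before, mass_after)  # equal up to rounding error
```

On a periodic domain there is no boundary through which mass can leave, which corresponds to the boundary integral of f · n being zero.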

22.2 Heat Equation

22.3 Gauss’ Law

Let’s consider a charged particle located at the origin with a charge of q. The electric
field created by this charge is given by:
\[
\mathbf{E}(\mathbf{x}) = \frac{q\,\mathbf{x}}{4\pi\varepsilon_0 \|\mathbf{x}\|^3}.
\]

Here, ε0 is a physical constant. Despite the difference in coefficients, it has the same
structure as the force field due to gravity. As already calculated in Problem 21.1, it
satisfies the following:

\[
\nabla \cdot \mathbf{E}(\mathbf{x}) = 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}.
\]

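Therefore, for all regions D that do not contain 0,
\[
\int_{\partial D} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x} = 0
\]
holds. Thus, the focus is on the integral over regions containing 0. If Br is a ball
with center at the origin and radius r, where r > 0 is sufficiently small such that
Br ⊂ D, then
\[
\int_{\partial D} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x}
= \int_{\partial B_r} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x}
= \int_{\partial B_r} \frac{q\,\mathbf{x}}{4\pi\varepsilon_0 r^3} \cdot \frac{\mathbf{x}}{r}\,d\mathbf{x}
= \int_{\partial B_r} \frac{q}{4\pi\varepsilon_0 r^2}\,d\mathbf{x}
= \frac{q}{\varepsilon_0}. \tag{22.1}
\]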

This result is known as Gauss’ Law.

Problem 22.1. Show that the first equality in (22.1) holds.

Solution 22.1 Let Dr be the set difference between D and Br, defined as Dr = D \ Br.
Then,
\[
\int_{\partial D} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x}
= \int_{\partial D_r} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x} + \int_{\partial B_r} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x}
\]
holds. This is because the boundary ∂Dr consists of ∂D and ∂Br, and on ∂Br the outward
normal vector n from the perspective of Dr is opposite to that from the perspective
of Br, so the two contributions over ∂Br cancel each other out. Now, since ∇ · E = 0
on Dr, applying the divergence theorem yields
\[
\int_{\partial D_r} \mathbf{E} \cdot \mathbf{n}\,d\mathbf{x} = \int_{D_r} \nabla \cdot \mathbf{E}\,d\mathbf{x} = 0.
\]
Thus, ∫_{∂D} E · n dx = ∫_{∂Br} E · n dx is satisfied. □
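Gauss’ Law says the flux does not depend on the shape of the region, only on whether it contains the charge. A hedged numerical illustration, with hypothetical units q = ε0 = 1 and a cube chosen as D for this sketch: the flux of E through the surface of the cube [−1, 1]³ should equal q/ε0 = 1, the same value that (22.1) gives for a ball.

```python
import math

def flux_through_cube(q=1.0, eps0=1.0, half=1.0, n=400):
    # Midpoint rule for the flux of E(x) = q x / (4 pi eps0 |x|^3) through the
    # top face z = half of the cube [-half, half]^3, where n = (0, 0, 1) and
    # E . n = q * half / (4 pi eps0 |x|^3). By symmetry the six faces carry
    # equal flux, so the total is six times this value.
    h = 2.0 * half / n
    face = 0.0
    for i in range(n):
        x = -half + (i + 0.5) * h
        for j in range(n):
            y = -half + (j + 0.5) * h
            r3 = (x * x + y * y + half * half) ** 1.5
            face += q * half / (4.0 * math.pi * eps0 * r3) * h * h
    return 6.0 * face

flux = flux_through_cube()
print(flux)  # close to q / eps0 = 1
```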

Exercises

1.
Lecture 23
Stokes’ Theorem #1

23.1 Curl of a Vector Field

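Let’s denote a differentiable vector field f : R³ → R³ as f = (f1; f2; f3). Then, the
curl of the vector field f is defined as follows:
\[
\operatorname{curl}(\mathbf{f}) = (\partial_y f_3 - \partial_z f_2)\,\mathbf{i} + (\partial_z f_1 - \partial_x f_3)\,\mathbf{j} + (\partial_x f_2 - \partial_y f_1)\,\mathbf{k}.
\]
This definition is easy to forget over time, so it’s helpful to remember it as the
cross product of two vectors, denoted as ∇ × f, and to recall it as a determinant of a
3 × 3 matrix. In other words,
\[
\operatorname{curl}(\mathbf{f}) = \nabla \times \mathbf{f}
= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ f_1 & f_2 & f_3 \end{vmatrix}
= (\partial_y f_3 - \partial_z f_2)\,\mathbf{i} + (\partial_z f_1 - \partial_x f_3)\,\mathbf{j} + (\partial_x f_2 - \partial_y f_1)\,\mathbf{k}.
\]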
Problem 23.1. Find the curl vector field of the following vector fields.
(1) f = (x; y; z) (2) f = (x; z; y) (3) f = (y; z; x) (4) f = (x2 − y; xez ; yz)

Solution 23.1 ⊔
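Answers to Problem 23.1 can be checked with a finite-difference curl. The sketch below uses hypothetical helper names and an arbitrary evaluation point. Computing by hand from the definition gives (0; 0; 0) for (1) and (2), (−1; −1; −1) for (3), and (z − xe^z; 0; e^z + 1) for (4).

```python
import math

def partial(field, comp, var, point, h=1e-5):
    # Central difference of component `comp` of the field in variable `var`.
    plus, minus = list(point), list(point)
    plus[var] += h
    minus[var] -= h
    return (field(*plus)[comp] - field(*minus)[comp]) / (2.0 * h)

def curl(field, point):
    # (d_y f3 - d_z f2, d_z f1 - d_x f3, d_x f2 - d_y f1), indices from 0.
    d = lambda comp, var: partial(field, comp, var, point)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

fields = [
    lambda x, y, z: (x, y, z),
    lambda x, y, z: (x, z, y),
    lambda x, y, z: (y, z, x),
    lambda x, y, z: (x * x - y, x * math.exp(z), y * z),
]

point = (0.3, -0.5, 0.7)  # arbitrary test point
curls = [curl(field, point) for field in fields]
for c in curls:
    print(c)
```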

Problem 23.2 (∇ × (∇p) = 0). Show that the curl of a conservative vector field is
0. What condition is necessary for this?

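Solution 23.2 If the vector field f is conservative, then there exists a potential
function p such that f = ∇p. Therefore, the curl of f is as follows:
\[
\nabla \times \mathbf{f} = (\partial_{yz} p - \partial_{zy} p)\,\mathbf{i} + (\partial_{zx} p - \partial_{xz} p)\,\mathbf{j} + (\partial_{xy} p - \partial_{yx} p)\,\mathbf{k}.
\]
Thus, if f is continuously differentiable, then p is twice continuously differentiable,
its mixed partial derivatives are equal, and ∇ × f = 0. □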


Problem 23.3 (∇ · (∇ × f) = 0). Compute the divergence of the curl vector field
∇ × f of a vector field f. Show that the divergence of this curl vector field is 0. What
condition is necessary for this?


Solution 23.3 The derivative matrix is as follows:


   
∂y f3 − ∂z f2 ∂yx f3 − ∂zx f2 ∂yy f3 − ∂zy f2 ∂yz f3 − ∂zz f2
∇(∇ × f) = ∇ ∂z f1 − ∂x f3  = ∂zx f1 − ∂xx f3 ∂zy f1 − ∂xy f3 ∂zz f1 − ∂xz f3  .
∂x f2 − ∂y f1 ∂xx f2 − ∂yx f1 ∂xy f2 − ∂yy f1 ∂xz f2 − ∂yz f1

The derivative matrix of the curl is generally not a zero matrix. The diagonal el-
ements of this derivative matrix are the partial derivatives of the i-th component
function of the vector field with respect to the i-th variable, and their sum gives the
divergence. Therefore,

∇ · (∇ × f) = ∂yx f3 − ∂zx f2 + ∂zy f1 − ∂xy f3 + ∂xz f2 − ∂yz f1 .

If each component of f is twice continuously differentiable, then the mixed partial derivatives do not depend on the order of differentiation, so the terms on the right-hand side cancel in pairs. Hence, we obtain ∇ · (∇ × f) = 0. □
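The two identities ∇ × (∇p) = 0 and ∇ · (∇ × f) = 0 can also be observed numerically. The following is a minimal sketch using nested central differences; the helper names (`partial`, `grad`, `curl`, `div`), the step size `H`, the sample potential `p`, and the sample point are all assumptions made for illustration.

```python
import math

H = 1e-3  # finite-difference step (assumption)

def partial(g, j, h=H):
    """Return a function approximating the j-th partial derivative of the scalar function g."""
    def dg(x, y, z):
        lo, hi = [x, y, z], [x, y, z]
        lo[j] -= h
        hi[j] += h
        return (g(*hi) - g(*lo)) / (2 * h)
    return dg

def grad(p):
    return tuple(partial(p, j) for j in range(3))

def curl(f):
    f1, f2, f3 = f
    return (
        lambda x, y, z: partial(f3, 1)(x, y, z) - partial(f2, 2)(x, y, z),
        lambda x, y, z: partial(f1, 2)(x, y, z) - partial(f3, 0)(x, y, z),
        lambda x, y, z: partial(f2, 0)(x, y, z) - partial(f1, 1)(x, y, z),
    )

def div(f):
    return lambda x, y, z: sum(partial(f[j], j)(x, y, z) for j in range(3))

# Sample smooth potential and vector field (assumptions)
p = lambda x, y, z: math.sin(x * y) + z * math.exp(x)
f = (
    lambda x, y, z: x**2 - y,
    lambda x, y, z: x * math.exp(z),
    lambda x, y, z: y * z,
)

pt = (0.3, -0.7, 0.5)
curl_grad = tuple(c(*pt) for c in curl(grad(p)))
div_curl = div(curl(f))(*pt)
print(curl_grad, div_curl)  # all values should be close to zero
```

The small residuals come from floating-point rounding only, because the difference quotients in two different variables commute just as the exact mixed partial derivatives do.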

23.2 Stokes’ Theorem

Before explaining Stokes’ Theorem, let’s summarize what we have learned so far.
The surface integral of a vector field involves integrating the component perpendic-
ular to the surface, but for this to be possible, the given surface must be an oriented
surface so that we can choose the normal vector n throughout the entire surface.
However, even in this case, there are not just one but two perpendicular vectors, and
we choose one of them to represent n, while the other becomes −n. Choosing a
different perpendicular vector would change the sign of the surface integral.
Stokes’ theorem describes the relationship between the surface integral on a given surface in three-dimensional space and the line integral on the curve that forms the boundary of that surface. But what is the boundary of a surface? The following figure shows some examples.
The first example is a hemisphere, that is, a sphere cut in half. In this case, the cut edge marks where the surface ends, and we call it the boundary. The second and third examples involve surfaces with holes. In this case, the perimeter of each hole also becomes part of the boundary.
The line integral of a vector field involves integrating the component tangent to
the curve along the curve, and the direction of integration along the curve also has
two options, with the sign changing depending on the chosen direction. Therefore, it is important to choose the direction. In Stokes’ theorem, after choosing the normal vector n, the line integral must be performed counterclockwise with respect to this vector. This means that, taking the normal vector as the axis and looking down from its tip, the curve is traversed in the direction opposite to the hands of a clock (refer to the left two figures below). However, if the surface is not simply connected, such as a surface with a hole, the boundary around the hole may appear to run clockwise rather than counterclockwise. It is essential to understand the meaning of the counterclockwise direction here. The direction is judged from within the surface: a person walking along the boundary on the surface, with head pointing in the direction of n, always keeps the surface on the left. The direction of each boundary curve, including the one around a hole, is the direction in which this person walks (refer to the third figure).

Theorem 23.1 (Stokes’ Theorem). Let $S \subset \mathbb{R}^3$ be a bounded oriented smooth surface. Let n be a unit normal vector field on the surface. Let $C = \partial S$ be the boundary of S. If f is a continuously differentiable vector field in an open set D containing S, then

\[
\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \int_S (\nabla \times \mathbf{f})\cdot\mathbf{n}, \tag{23.1}
\]

where the line integral is in the counterclockwise direction with respect to the normal vector n.

The left-hand side of (23.1) represents the line integral of the vector field f, while
the right-hand side represents the surface integral of the curl of the vector field f.
The rigorous proof of Stokes’ theorem is beyond the scope and purpose of Cal-
culus II. However, to understand how the line integral is connected to the surface
integral and the meaning of Stokes’ theorem clearly, we will outline a fairly detailed
proof step by step. The actual proof involves approximating the left-hand and right-
hand sides of (23.1) and showing that the approximations converge, but we will only
proceed with the approximation steps.
Step 1. Flat Surfaces

The first step considers the case where the surface S is flat, meaning it lies on a
single plane. In such cases, it is reasonable to assume that the surface lies on the
xy-plane from the beginning. This is because we can either redefine the coordinate
system or rotate and translate the plane to align it with the xy-plane. Of course, in
this case, the representation of the vector field f must also be expressed in the new
coordinate system.

Now that the surface S lies on the xy-plane, we can treat the problem as two-dimensional. Since the surface S is bounded, we place it inside a rectangle and partition the rectangle into small square cells with side length ε > 0. We then collect those cells that are contained in the surface S to form $S_\varepsilon$, as shown in the figure below. We denote each cell of $S_\varepsilon$ as $S_i$, where $i = 1, \dots, N$, and zoom in on one of them, as shown in the figure. Let $(x_i, y_i)$ be the midpoint of the cell $S_i$, and denote its four sides as $\ell_1, \ell_2, \ell_3, \ell_4$. Then,

\[
\oint_{\partial S_i} \mathbf{f}\cdot d\mathbf{r} = \int_{\ell_1} \mathbf{f}\cdot d\mathbf{r} + \int_{\ell_2} \mathbf{f}\cdot d\mathbf{r} + \int_{\ell_3} \mathbf{f}\cdot d\mathbf{r} + \int_{\ell_4} \mathbf{f}\cdot d\mathbf{r}.
\]
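Adding the integrals over the right side $\ell_1$ (traversed upward) and the left side $\ell_3$ (traversed downward), we get

\[
\begin{aligned}
\int_{\ell_1} \mathbf{f}\cdot d\mathbf{r} + \int_{\ell_3} \mathbf{f}\cdot d\mathbf{r}
&= \int_{y_i-\varepsilon/2}^{y_i+\varepsilon/2} f_2(x_i + \varepsilon/2, t)\,dt - \int_{y_i-\varepsilon/2}^{y_i+\varepsilon/2} f_2(x_i - \varepsilon/2, t)\,dt \\
&\cong \frac{f_2(x_i + \varepsilon/2, y_i) - f_2(x_i - \varepsilon/2, y_i)}{\varepsilon}\,\varepsilon^2 \cong \partial_x f_2(x_i, y_i)\,\varepsilon^2.
\end{aligned}
\]

Similarly, adding the integrals over $\ell_2$ and $\ell_4$, we obtain

\[
\int_{\ell_2} \mathbf{f}\cdot d\mathbf{r} + \int_{\ell_4} \mathbf{f}\cdot d\mathbf{r} \cong -\partial_y f_1(x_i, y_i)\,\varepsilon^2,
\]

and summing them all up yields

\[
\oint_{\partial S_i} \mathbf{f}\cdot d\mathbf{r} \cong \bigl(\partial_x f_2(x_i, y_i) - \partial_y f_1(x_i, y_i)\bigr)\,\varepsilon^2.
\]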
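The single-cell approximation can be tried out numerically. The sketch below integrates a sample planar field counterclockwise around a square cell and divides by the cell area ε²; as ε shrinks, the quotient should approach ∂x f2 − ∂y f1 at the midpoint. The field, the function names, and the midpoint are assumptions chosen for illustration.

```python
import math

def f(x, y):
    # sample planar field (assumption): f1 = x^2 - y, f2 = x * e^y
    return (x**2 - y, x * math.exp(y))

def curl_z(x, y):
    # dx f2 - dy f1 for the sample field, computed by hand
    return math.exp(y) + 1.0

def cell_circulation(xc, yc, eps, n=200):
    """Counterclockwise line integral of f around the square cell with
    midpoint (xc, yc) and side length eps, using the midpoint rule."""
    dt = eps / n
    total = 0.0
    for k in range(n):
        t = -eps / 2 + (k + 0.5) * dt
        total += f(xc + eps / 2, yc + t)[1] * dt  # right side, upward: +f2 dy
        total -= f(xc - eps / 2, yc + t)[1] * dt  # left side, downward: -f2 dy
        total += f(xc + t, yc - eps / 2)[0] * dt  # bottom side, rightward: +f1 dx
        total -= f(xc + t, yc + eps / 2)[0] * dt  # top side, leftward: -f1 dx
    return total

xc, yc = 0.3, 0.2  # cell midpoint (assumption)
for eps in (0.2, 0.05, 0.01):
    print(eps, cell_circulation(xc, yc, eps) / eps**2, curl_z(xc, yc))
```

The printed quotients move toward the hand-computed value of ∂x f2 − ∂y f1 as ε decreases, which is the content of the approximation above.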

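Now, summing up the line integrals over all cells, we obtain

\[
\sum_{i=1}^{N} \oint_{\partial S_i} \mathbf{f}\cdot d\mathbf{r} \cong \sum_{i=1}^{N} \bigl(\partial_x f_2(x_i, y_i) - \partial_y f_1(x_i, y_i)\bigr)\,\varepsilon^2 \cong \int_{S_\varepsilon} \bigl(\partial_x f_2 - \partial_y f_1\bigr)\,dx\,dy.
\]

The second approximation comes from the definition of Riemann integrals. Here, ε² is the area of each partition cell, and the midpoints $(x_i, y_i)$ serve as the sample points of the cells.
Meanwhile, the following relation holds:

\[
\oint_{\partial S_\varepsilon} \mathbf{f}\cdot d\mathbf{r} = \sum_{i=1}^{N} \oint_{\partial S_i} \mathbf{f}\cdot d\mathbf{r}.
\]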

The sum of the line integrals on the right-hand side includes many more line inte-
grals over the boundaries of the internal cells of Sε . However, the boundaries within
the surface are integrated twice, once in each direction, and thus cancel each other
out. Therefore, only the boundaries of Sε remain, satisfying the equation above.
Moreover, the remaining line integrals are all counterclockwise. An example illus-
trating this is provided in the figure below. Each boundary of the small cell is inte-
grated counterclockwise, and after excluding the canceled parts, only the boundaries
remain, as shown in the figure.
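Combining these, we obtain:

\[
\oint_{\partial S_\varepsilon} \mathbf{f}\cdot d\mathbf{r} \cong \int_{S_\varepsilon} \bigl(\partial_x f_2 - \partial_y f_1\bigr)\,dx\,dy.
\]

The next step is to take the limit as ε → 0. In the limit, all the approximations become equalities: $S_\varepsilon$ converges to S, $\partial S_\varepsilon$ converges to ∂S, and the integrals converge as well. A rigorous proof would have to justify these limits; accepting them, we arrive at the following equation:

\[
\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \int_{S} \bigl(\partial_x f_2 - \partial_y f_1\bigr)\,dx\,dy. \tag{23.2}
\]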

This relation, viewed as a phenomenon occurring in two-dimensional space rather than three-dimensional space, is called Green’s Theorem.

Theorem 23.2 (Green’s Theorem for Circulation). Let $S \subset \mathbb{R}^2$ be a bounded smooth surface. Let $C = \partial S$ be the boundary of S. If $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ is a continuously differentiable vector field in an open set D containing S, then Equation (23.2) is satisfied, where the line integral is taken in the counterclockwise direction.

However, we consider (23.2) in the context of events occurring in three-dimensional space. We have simply placed the surface S on the xy-plane. In this case, the unit normal vector is n = (0; 0; 1) = k, and ∂x f2 − ∂y f1 = (∇ × f) · n holds. Therefore, we obtain Stokes’ Theorem for flat surfaces, namely,

\[
\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \int_S (\nabla \times \mathbf{f})\cdot\mathbf{n}. \tag{23.3}
\]

When the surface is returned to its original position, n is no longer k and the flat surface S no longer lies on the xy-plane, but (23.3) keeps the same form because it is written without reference to a particular coordinate system.

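Problem 23.4. For $\mathbf{f} = f_1\mathbf{i} + f_2\mathbf{j}$, Equation (23.2) is sometimes written as follows:

\[
\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \oint_{\partial S} f_1\,dx + f_2\,dy = \int_S (\partial_x f_2 - \partial_y f_1)\,dx\,dy.
\]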

Explain the meaning of this.


Solution 23.4 In the above expression, $\int_S (\partial_x f_2 - \partial_y f_1)\,dx\,dy$ denotes integrating the real-valued function ∂x f2 − ∂y f1 over the region S, while $\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r}$ represents integrating over the curve ∂S. However, $\oint_{\partial S} f_1\,dx + f_2\,dy$ is somewhat ambiguous. It is also a notation for the line integral. Here, dx and dy have a different meaning than in the integral with dxdy. When given a curve r(t) = (x(t); y(t)), understand dx = x′(t)dt and dy = y′(t)dt. In other words, interpret $f_1\,dx + f_2\,dy$ as $\mathbf{f}\cdot d\mathbf{r}$ with the dot product carried out in advance. □
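To make the reading dx = x′(t)dt, dy = y′(t)dt concrete, the following sketch evaluates the line integral of f1 dx + f2 dy around an ellipse and compares it with the area integral in (23.2). The field, the ellipse, and the function names are assumptions chosen so that the area integral is easy to find by hand.

```python
import math

def f(x, y):
    # sample field (assumption): f1 = -y + x*y, f2 = x^2
    return (-y + x * y, x**2)

def circulation(n=2000):
    """Line integral of f1 dx + f2 dy over the ellipse r(t) = (2 cos t, sin t),
    0 <= t <= 2*pi, traversed counterclockwise (midpoint rule in t)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = 2 * math.cos(t), math.sin(t)
        dx = -2 * math.sin(t) * dt  # dx = x'(t) dt
        dy = math.cos(t) * dt       # dy = y'(t) dt
        f1, f2 = f(x, y)
        total += f1 * dx + f2 * dy
    return total

# Right-hand side of (23.2): dx f2 - dy f1 = 2x - (-1 + x) = x + 1.
# Over the ellipse, x integrates to zero by symmetry, leaving the area 2*pi.
area_side = 2 * math.pi
line_side = circulation()
print(line_side, area_side)
```

Here dx and dy are ordinary numbers built from the parametrization, whereas dxdy in the area integral stands for an area element; the two sides agree as Green’s Theorem predicts.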



Step 2. General Surface

Let’s consider the case of a general oriented surface S. We approximate the given surface by connecting small flat patches, as shown in the figure below. Although the flat patches do not exactly match the surface, they converge to the original surface as we make them smaller and smaller. Making each patch smaller than ε, we denote the surface composed of these patches as $S_\varepsilon$. Since each patch is flat, we can apply Stokes’ Theorem for flat surfaces to each patch. Now, summing them up, as in the previous case, the line integrals over the interior segments cancel out, leaving only the line integrals over the boundary. For this reason, we obtain:

\[
\oint_{\partial S_\varepsilon} \mathbf{f}\cdot d\mathbf{r} \cong \int_{S_\varepsilon} (\nabla \times \mathbf{f})\cdot\mathbf{n}.
\]

Taking the limit as ε → 0, we obtain Stokes’ Theorem for a general surface.


Lecture 24
Stokes’ Theorem #2

24.1 Boundary of Boundary

Among surfaces, there are surfaces that do not have a boundary. Below are some examples. The first one is a sphere: it has no boundary. In other words, the sphere is the boundary of a ball, but the boundary of the sphere is the empty set. The second example is a torus, which is a donut shape, and it has no boundary. The third example is a cylinder with the top and bottom caps included, and it also has no boundary (if the caps are removed, the circles where they were attached become the boundary). Such surfaces without a boundary are called closed surfaces.

Problem 24.1. Prove that the surface integral of the curl vector field over a sphere
is always zero. Explain which property of the sphere makes the integral zero.

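Solution 24.1 The sphere does not have a boundary; that is, the boundary is the empty set. Therefore,

\[
\int_S (\nabla \times \mathbf{f})\cdot\mathbf{n} = \oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \int_{\emptyset} \mathbf{f}\cdot d\mathbf{r} = 0.
\]

This proof may seem awkward because Stokes’ Theorem is a theorem for surfaces with boundaries. In that case, as shown in the figure below, create a hole with a radius of ε > 0 on the surface of the sphere, and denote the sphere with the hole as $S_\varepsilon$ and the boundary of the small hole as $C_\varepsilon = \partial S_\varepsilon$. Then we obtain the following:

\[
\int_S (\nabla \times \mathbf{f})\cdot\mathbf{n} = \lim_{\varepsilon \to 0} \int_{S_\varepsilon} (\nabla \times \mathbf{f})\cdot\mathbf{n} = \lim_{\varepsilon \to 0} \oint_{\partial S_\varepsilon} \mathbf{f}\cdot d\mathbf{r} = 0.
\]

Here, the second equality is by Stokes’ Theorem. The first equality holds because ∇ × f is bounded and the area of the removed hole tends to zero, and the third holds because f is bounded and the length of $C_\varepsilon$ tends to zero. □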
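The statement of Problem 24.1 can be observed numerically. The sketch below integrates (∇ × f) · n over a whole sphere with the midpoint rule in spherical coordinates, using the field from Problem 23.1 (4) with its curl computed by hand. The function names and grid sizes are assumptions made for illustration.

```python
import math

def curl_f(x, y, z):
    # curl of the sample field f = (x^2 - y, x e^z, y z), computed by hand
    return (z - x * math.exp(z), 0.0, math.exp(z) + 1.0)

def flux_through_sphere(a=1.0, n_phi=200, n_theta=200):
    """Midpoint-rule approximation of the integral of (curl f) . n over the
    sphere of radius a centered at the origin, with outward normal n."""
    dphi = math.pi / n_phi
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * dphi
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            # outward unit normal at the point a * n on the sphere
            n = (math.sin(phi) * math.cos(theta),
                 math.sin(phi) * math.sin(theta),
                 math.cos(phi))
            c = curl_f(a * n[0], a * n[1], a * n[2])
            dot = c[0] * n[0] + c[1] * n[1] + c[2] * n[2]
            total += dot * a * a * math.sin(phi) * dphi * dtheta  # dS = a^2 sin(phi) dphi dtheta
    return total

flux = flux_through_sphere()
print(flux)  # close to zero, up to discretization error
```

The value is not exactly zero because of the discretization in φ, but it shrinks as the grid is refined, in line with the solution above.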

Connection of Divergence and Stokes’ Theorems

Closed surfaces divide space into inside and outside regions, and a closed surface is the boundary of the inside region. Of course, we can also say that it is the boundary of the outside region. Moreover, when a hole is made in a closed surface so that it becomes a surface with a boundary (refer to the figure), the inside and outside become connected, and the distinction between inside and outside disappears. Thus, a surface that forms the boundary of a bounded region does not have a boundary itself. This relationship can be expressed as:

\[
\partial(\partial D) = \emptyset, \qquad D \subset \mathbb{R}^n.
\]

Using this relationship, we can connect the Divergence Theorem and Stokes’ Theorem as follows. Let f be a twice continuously differentiable vector field. Let $D \subset \mathbb{R}^3$ be a bounded open set with a smooth boundary, and let n be the outward unit normal vector on the boundary. Applying the Divergence Theorem to ∇ × f and then applying Stokes’ Theorem, we obtain:

\[
\int_D \nabla\cdot(\nabla \times \mathbf{f}) = \int_{\partial D} (\nabla \times \mathbf{f})\cdot\mathbf{n} = \oint_{\partial(\partial D)} \mathbf{f}\cdot d\mathbf{r} = \int_{\emptyset} \mathbf{f}\cdot d\mathbf{r} = 0.
\]

Since this integral is zero for all such sets $D \subset \mathbb{R}^3$ and the integrand is continuous, the integrand itself must be zero, so ∇ · (∇ × f) = 0. Of course, this fact can also be demonstrated directly by calculating derivatives, as shown in Problem 23.3.
24.2 Simply connected domain

If a domain D is simply connected, then Stokes’ Theorem can be used more freely.
Let’s see the following theorem.

Theorem 24.1. Let an open domain $D \subset \mathbb{R}^3$ be simply connected, and ∇ × f = 0 in D. Then, for any closed smooth curve C ⊂ D,

\[
\oint_C \mathbf{f}\cdot d\mathbf{r} = 0.
\]

Proof. We only need to consider the case where C is a simple closed curve. Since the domain D is simply connected, the closed curve C can be contracted to a single point without leaving D. Regarding the trace swept out by the curve during this contraction as a surface, we see that there exists a surface S within D that has C as its boundary. Moreover, a simple closed curve always bounds some orientable surface. Now, applying Stokes’ Theorem yields:

\[
\oint_C \mathbf{f}\cdot d\mathbf{r} = \int_S (\nabla \times \mathbf{f})\cdot\mathbf{n}.
\]

Since ∇ × f is assumed to be identically zero in D, it is zero on the surface S. Therefore, the surface integral above is zero, and the proof is complete. □

It is easy to understand why a domain needs to be simply connected in order to connect line integrals along closed curves with surface integrals over surfaces having those curves as boundaries. Consider a three-dimensional domain that is a ball, or a ball from which part of the interior has been removed. In the case of a ball, it is easy to construct a surface S, and even if part of the interior is removed, a surface S can still be constructed by avoiding the removed part. However, if a hole is drilled through the ball from top to bottom, then, depending on the curve C, it may or may not be possible to construct a surface S within D that has C as its boundary. As another example, the boundary curve of a Möbius strip is a simple closed curve in three-dimensional space. Therefore, it is possible to construct an oriented surface with this curve as its boundary, even though the Möbius strip itself is not orientable. Thus, remember that surfaces with different properties can have the same closed curve as their boundary.
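A standard example, not taken from these notes, shows what can go wrong without simple connectivity. The field f = (−y, x, 0)/(x² + y²) is defined on space with the z-axis removed, which resembles the ball with a hole drilled from top to bottom. Its curl vanishes wherever it is defined, yet its circulation around the unit circle is 2π. The sketch below checks both facts numerically; the helper names, step size, and sample points are assumptions.

```python
import math

def f(x, y, z):
    # vortex field, undefined on the z-axis (x = y = 0)
    r2 = x * x + y * y
    return (-y / r2, x / r2, 0.0)

def curl(f, p, h=1e-5):
    """Approximate (curl f)(p) by central differences."""
    def d(i, j):
        # partial derivative of the component f_i with respect to variable j
        lo, hi = list(p), list(p)
        lo[j] -= h
        hi[j] += h
        return (f(*hi)[i] - f(*lo)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

def circulation_unit_circle(n=2000):
    """Line integral of f along r(t) = (cos t, sin t, 0), 0 <= t <= 2*pi."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        fx, fy, _ = f(math.cos(t), math.sin(t), 0.0)
        total += (fx * (-math.sin(t)) + fy * math.cos(t)) * dt
    return total

samples = [(0.6, -0.4, 0.2), (-1.2, 0.5, -0.3)]  # points off the z-axis (assumption)
for p in samples:
    print(p, curl(f, p))          # components close to zero
circ = circulation_unit_circle()
print(circ, 2 * math.pi)          # circulation is 2*pi, not zero
```

Every surface bounded by the unit circle must cross the z-axis, where f is not defined, so Stokes’ Theorem cannot be applied within the domain, and Theorem 24.1 does not contradict the nonzero circulation.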

24.3 Green’s Theorem

Green’s Theorem is a two-dimensional perspective on the Divergence Theorem and Stokes’ Theorem. In most textbooks, Green’s Theorem is taught first, followed by its extensions, the Divergence Theorem and Stokes’ Theorem. We did not follow that order, and instead regard Green’s Theorem as a special case of the Divergence Theorem and Stokes’ Theorem. Let’s summarize and understand the differences.

Theorem 24.2 (Green’s Theorem for circulation). Let $S \subset \mathbb{R}^2$ be a bounded smooth surface. Let $C = \partial S$ be the boundary of S. If $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ is a continuously differentiable vector field in an open set D containing S, then

\[
\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \oint_{\partial S} f_1\,dx + f_2\,dy = \int_S (\partial_x f_2 - \partial_y f_1)\,dx\,dy,
\]

where the line integral is in the counterclockwise direction.

Strictly speaking, this Green’s Theorem cannot be called Stokes’ Theorem in two dimensions, because Stokes’ Theorem is a statement about surfaces in three dimensions. Rather, it is the special case where the surface S lies on the xy-plane. Therefore, regarding it as Stokes’ Theorem in three dimensions, we can write it as follows:

\[
\oint_{\partial S} \mathbf{f}\cdot d\mathbf{r} = \oint_{\partial S} f_1\,dx + f_2\,dy + f_3\,dz = \int_S (\nabla \times \mathbf{f})\cdot\mathbf{n}.
\]

The curve lies on the xy-plane, so dz = 0, and n = k, which reduces this to the formula in Theorem 24.2.

Theorem 24.3 (Green’s Theorem for flux). Let $S \subset \mathbb{R}^2$ be a bounded smooth domain. Let $C = \partial S$ be the boundary of S, and n be the outward unit normal vector on the boundary. If $\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^2$ is a continuously differentiable vector field in an open set D containing S, then

\[
\oint_C \mathbf{f}\cdot\mathbf{n}\,dx = \oint_C f_1\,dy - f_2\,dx = \int_S (\partial_x f_1 + \partial_y f_2)\,dx\,dy,
\]

where the line integral is in the counterclockwise direction and, in the leftmost integral, dx denotes the arc-length element along C.

This Green’s theorem is the divergence theorem in two dimensions. Remember that the direction of the line integral is counterclockwise. If the curve is traversed in the clockwise direction instead, dx and dy change sign, and the corresponding expression becomes:

\[
\oint_C \mathbf{f}\cdot\mathbf{n}\,dx = \oint_C -f_1\,dy + f_2\,dx = \int_S (\partial_x f_1 + \partial_y f_2)\,dx\,dy.
\]

In the two-dimensional case, the boundary is a curve, so terms like $\oint_C f_1\,dy - f_2\,dx$ appear in the line integral notation, which do not exist in three dimensions.
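The flux form and its sign convention can be checked on the unit disc. The sketch below evaluates the boundary integral for both orientations of the unit circle and compares it with the integral of the divergence. The field and the function name are assumptions chosen so that the divergence integral is easy to find by hand.

```python
import math

def f(x, y):
    # sample field (assumption): f1 = x + y^2, f2 = x*y, so div f = 1 + x
    return (x + y**2, x * y)

def boundary_integral(clockwise=False, n=2000):
    """Integral of f1 dy - f2 dx along the counterclockwise unit circle, or of
    -f1 dy + f2 dx along the clockwise unit circle (midpoint rule in t)."""
    sign = -1.0 if clockwise else 1.0
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = math.cos(t), sign * math.sin(t)  # r(t) = (cos t, +/- sin t)
        dx = -math.sin(t) * dt                  # dx = x'(t) dt
        dy = sign * math.cos(t) * dt            # dy = y'(t) dt
        f1, f2 = f(x, y)
        if clockwise:
            total += -f1 * dy + f2 * dx
        else:
            total += f1 * dy - f2 * dx
    return total

# Integral of div f = 1 + x over the unit disc: x integrates to zero by
# symmetry, leaving the area of the disc, pi.
divergence_side = math.pi
ccw = boundary_integral(clockwise=False)
cw = boundary_integral(clockwise=True)
print(ccw, cw, divergence_side)
```

Both orientations give the same flux once the signs of dx and dy are adjusted, which is what the two formulas above express.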

24.4 Examples

The following problem is a comprehensive summary.

Problem 24.2. Consider a hemisphere as shown in the figure below. Let the radius of the hemisphere be a > 0 and denote it as $S_1$. Let C be its boundary. Also, consider a disc with radius a centered at the origin on the xy-plane, denoted as $S_2$. The vector field is given by f = (−y, x, 0).
(1) Given the normal vector n as shown in the figure, calculate the surface integral $\int_{S_1} (\nabla \times \mathbf{f})\cdot\mathbf{n}$.
(2) Given the normal vector n = k, calculate the surface integral $\int_{S_2} (\nabla \times \mathbf{f})\cdot\mathbf{n}$.
(3) Determine and calculate the line integral $\oint_C \mathbf{f}\cdot d\mathbf{r}$ when the direction of circulation is as shown in the figure.

Solution 24.2 First, note that (1) and (3) are equal by Stokes’ Theorem. Simi-
larly, (2) and (3) are equal by Stokes’ Theorem. Recognize that when two different
surfaces have the same boundary and each normal vector gives the same counter-
clockwise direction to the boundary, the curl’s surface integral yields the same value.
Therefore, it is important to know which calculation is easier. More importantly, you
should be able to calculate each based on what we have studied so far. Now, let’s
solve.
(1) At a point (x, y, z) ∈ S1 on the hemisphere, the normal vector is $\mathbf{n} = \frac{1}{a}(x, y, z)$. Also, it is easy to compute ∇ × f = (0, 0, 2). Therefore,

\[
\int_{S_1} (\nabla \times \mathbf{f})\cdot\mathbf{n} = \frac{1}{a}\int_{S_1} (0, 0, 2)\cdot(x, y, z) = \int_{S_1} \frac{2z}{a}.
\]

To perform the surface integral, we need a change of variables.

(1a) What would be a good change of variables? Consider the following parametrization:

\[
g(u, v) = \bigl(u, v, \sqrt{a^2 - u^2 - v^2}\bigr), \qquad G = \{(u, v) : u^2 + v^2 < a^2\}.
\]

Now, calculate ∥gu × gv∥ to get $\|g_u \times g_v\| = \dfrac{a}{\sqrt{a^2 - u^2 - v^2}}$. Therefore, with this change of variables, and since $z = \sqrt{a^2 - u^2 - v^2}$ on the surface,

\[
\int_{S_1} \frac{2z}{a} = 2\int_G du\,dv = 2\pi a^2.
\]

(This integral can be done by hand. When that is not possible, setting up the formula so that a computer can carry out the integration is nearly as good.)
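(1b) Since the surface is a hemisphere, using spherical coordinates may be more natural. Spherical coordinates satisfy the following relations:

\[
g(\rho, \varphi, \theta) = (\rho\sin\varphi\cos\theta,\ \rho\sin\varphi\sin\theta,\ \rho\cos\varphi).
\]

Since the surface is half of a sphere with radius a, the parametrization is as follows:

\[
g(\varphi, \theta) = (a\sin\varphi\cos\theta,\ a\sin\varphi\sin\theta,\ a\cos\varphi), \qquad G = \{0 < \varphi < \tfrac{\pi}{2},\ 0 < \theta < 2\pi\}.
\]

With this parametrization, z = a cos φ and ∥gφ × gθ∥ = a² sin φ, so

\[
\int_{S_1} \frac{2z}{a} = \int_0^{2\pi}\!\!\int_0^{\pi/2} 2\cos\varphi\; a^2\sin\varphi\,d\varphi\,d\theta = 4\pi a^2\int_0^{\pi/2} \cos\varphi\sin\varphi\,d\varphi.
\]

Using u = sin φ for substitution, du = cos φ dφ is obtained, so

\[
\int_0^{\pi/2} \cos\varphi\sin\varphi\,d\varphi = \int_0^1 u\,du = \frac{1}{2}.
\]

Therefore, we obtain the same value:

\[
\int_{S_1} \frac{2z}{a} = 2\pi a^2.
\]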

(2) At a point (x, y, 0) ∈ S2 on the disc, the normal vector is n = (0, 0, 1). Therefore,

\[
\int_{S_2} (\nabla \times \mathbf{f})\cdot\mathbf{n} = \int_{S_2} (0, 0, 2)\cdot(0, 0, 1) = \int_{S_2} 2.
\]

The area of the disc is πa², so the value of the integral is also 2πa².
(3) A circle of radius a around the origin can be parametrized as (a cos t, a sin t, 0) or as (a sin t, a cos t, 0). It is necessary to recognize which of the two runs in the direction shown in the figure. It is the former. Since it completes one revolution, the parameter interval is 0 < t < 2π. Therefore, the line integral is

\[
\oint_C \mathbf{f}\cdot d\mathbf{r} = \int_0^{2\pi} (-a\sin t,\ a\cos t,\ 0)\cdot(-a\sin t,\ a\cos t,\ 0)\,dt = \int_0^{2\pi} \bigl(a^2\sin^2 t + a^2\cos^2 t\bigr)\,dt.
\]

Since the integrand equals a², the line integral is also 2πa². □
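The three computations in Solution 24.2 can be cross-checked numerically for a sample radius. The sketch below is only a sanity check under stated assumptions: the radius `a = 1.5`, the function names, and the grid sizes are chosen for illustration, and the outward (upward) normal on the hemisphere and the counterclockwise circle are taken as in the solution.

```python
import math

a = 1.5  # sample radius (assumption)

def hemisphere_flux(n=400):
    """Integral of (curl f) . n over the hemisphere with curl f = (0, 0, 2)
    and n = (x, y, z)/a, in spherical coordinates (midpoint rule in phi)."""
    dphi = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * dphi
        # (curl f) . n = 2 cos(phi); surface element a^2 sin(phi) dphi dtheta
        total += 2 * math.cos(phi) * a * a * math.sin(phi) * dphi
    # the integrand does not depend on theta, so the theta-integral gives 2*pi
    return 2 * math.pi * total

def disc_flux():
    # (curl f) . k = 2 on the disc of area pi a^2
    return 2 * math.pi * a * a

def boundary_circulation(n=400):
    """Line integral of f = (-y, x, 0) along r(t) = (a cos t, a sin t, 0)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y = a * math.cos(t), a * math.sin(t)
        total += ((-y) * (-a * math.sin(t)) + x * (a * math.cos(t))) * dt
    return total

expected = 2 * math.pi * a * a
print(hemisphere_flux(), disc_flux(), boundary_circulation(), expected)
```

All three numbers agree with 2πa², illustrating that two surfaces with the same oriented boundary give the same curl integral.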

Problem 24.3. A cone is given as shown in the figure above. The cone is described by $z = r = \sqrt{x^2 + y^2}$, 0 < z < 2, and is denoted as $S_1$. Let C be its boundary. Also, consider the disc of radius 2 on the plane z = 2 centered on the z-axis, denoted as $S_2$. The vector field is given by $\mathbf{f} = (x^2 - y,\ 4z,\ x^2)$.
(1) Given the normal vector n as shown in the figure, calculate the surface integral $\int_{S_1} (\nabla \times \mathbf{f})\cdot\mathbf{n}$.
(2) Given the normal vector n = k, calculate the surface integral $\int_{S_2} (\nabla \times \mathbf{f})\cdot\mathbf{n}$.
(3) Determine and calculate the line integral $\oint_C \mathbf{f}\cdot d\mathbf{r}$ such that it matches the value of the surface integral in (1).

Solution 24.3 This problem can be solved almost identically to the previous one.
Note that if you follow the orientation given in the figure, you should choose r(t) =
(2 sint, 2 cost, 2). ⊔
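Because the figure is not reproduced here, the sketch below does not fix the orientation. It computes the cone integral with the upward-pointing normal and the line integral for both orientations of the circle, so the value matching the figure can be picked by sign. The curl is computed by hand from the definition; the function names and grid sizes are assumptions.

```python
import math

def f(x, y, z):
    return (x**2 - y, 4 * z, x**2)

def curl_f(x, y, z):
    # (dy f3 - dz f2, dz f1 - dx f3, dx f2 - dy f1), computed by hand
    return (-4.0, -2.0 * x, 1.0)

def cone_flux_upward(n_r=200, n_t=200):
    """Integral of (curl f) . n over the cone z = r, 0 < r < 2, parametrized by
    g(r, t) = (r cos t, r sin t, r); g_r x g_t = (-r cos t, -r sin t, r) points upward."""
    dr = 2.0 / n_r
    dth = 2 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dth
            c = curl_f(r * math.cos(t), r * math.sin(t), r)
            nx, ny, nz = -r * math.cos(t), -r * math.sin(t), r
            total += (c[0] * nx + c[1] * ny + c[2] * nz) * dr * dth
    return total

def circulation(path, velocity, n=2000):
    """Line integral of f along path(t), 0 <= t <= 2*pi, with velocity(t) = path'(t)."""
    dt = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        fx, fy, fz = f(*path(t))
        vx, vy, vz = velocity(t)
        total += (fx * vx + fy * vy + fz * vz) * dt
    return total

# counterclockwise seen from above, and the parametrization named in Solution 24.3
ccw = circulation(lambda t: (2 * math.cos(t), 2 * math.sin(t), 2.0),
                  lambda t: (-2 * math.sin(t), 2 * math.cos(t), 0.0))
cw = circulation(lambda t: (2 * math.sin(t), 2 * math.cos(t), 2.0),
                 lambda t: (2 * math.cos(t), -2 * math.sin(t), 0.0))
disc = 1.0 * math.pi * 2**2  # (curl f) . k = 1 on the disc of radius 2
cone_up = cone_flux_upward()
print(cone_up, disc, ccw, cw)
```

With the upward normal, the cone, the disc with n = k, and the counterclockwise circle all give 4π. The parametrization r(t) = (2 sin t, 2 cos t, 2) runs clockwise when seen from above and gives −4π, which pairs with the opposite (downward-pointing) normal on the cone.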
