
MAT1841: REVISION OF SOME TOPICS

Remark. Reading and understanding the theory is very important; however, being able to do
questions efficiently is the key to doing well in the exam! Thus, you should try to attempt the
Applied class problem sets on your own. Refer to the lectures/lecture notes/this document if you
get stuck.

1. Vectors

An n-dimensional vector in R^n is an ordered list (i.e., an n-tuple)

v = (v_1, . . . , v_n)

where v_1, . . . , v_n are all real numbers. We say that v belongs to R^n (i.e., v ∈ R^n). Its length
(also known as its magnitude or norm, although the latter term is usually used in more abstract
mathematics) is given by

|v| = (v_1^2 + · · · + v_n^2)^{1/2} = (Σ_{i=1}^{n} v_i^2)^{1/2}.

A vector possesses only a length and a direction; it does not correspond to any particular position
in R^n.

1.1. Dot product. For two vectors v, w ∈ R^n, their dot product is defined as

v · w := v_1 w_1 + · · · + v_n w_n = Σ_{i=1}^{n} v_i w_i.

We know the following things about the dot product (and probably more!):

(i) v · w = w · v.

(ii) v · w ∈ R. That is, the dot product returns a scalar (not a vector!). This is one reason
why it is often called a scalar product. (In fact, the dot product belongs to a broader class of
operators called inner/scalar products.)

(iii) v · w = |v| |w| cos(θ), where θ ∈ [0, π] is the angle between the vectors v and w. Thus the
dot product gives a measure of 'collinearity' of two vectors.

(iv) v · w = 0 if and only if v and w are perpendicular (also known as orthogonal).

1.2. Scalar and vector projection. The vector projection of a vector v onto w is a vector v_w
which points in the direction of w, its length being the 'shadow' of v cast on w. Mathematically,
the vector projection of v onto w is

v_w = ( (v · w) / |w|^2 ) w.

The scalar projection of v onto w is

v_w = (v · w) / |w|

and is almost like the length of the vector projection, but it can be negative depending on the
angle between v and w.
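The formulas above are easy to check numerically. Below is a small Python sketch (using NumPy; the function names are our own, for illustration only) computing the dot product, the angle between two vectors, and the scalar and vector projections.

    import numpy as np

    def angle_between(v, w):
        # theta in [0, pi] recovered from v . w = |v| |w| cos(theta)
        return np.arccos(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))

    def scalar_projection(v, w):
        # (v . w) / |w| : signed length of the 'shadow' of v on w
        return np.dot(v, w) / np.linalg.norm(w)

    def vector_projection(v, w):
        # ((v . w) / |w|^2) w : a vector pointing along w
        return (np.dot(v, w) / np.dot(w, w)) * w

    v = np.array([3.0, 4.0])
    w = np.array([1.0, 0.0])
    print(np.dot(v, w))              # 3.0
    print(angle_between(v, w))       # about 0.927 radians
    print(scalar_projection(v, w))   # 3.0
    print(vector_projection(v, w))   # [3. 0.]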

1.3. Cross product. For two vectors v, w ∈ R^3, their cross product is defined as

v × w = (v_2 w_3 − v_3 w_2, v_3 w_1 − v_1 w_3, v_1 w_2 − v_2 w_1).
We know the following things about the cross product (and probably more!):

(i) Cross products are only defined for vectors that belong to R^3.

(ii) Let u = v × w. Then u is perpendicular to both v and w, and points in the direction
given by the right-hand rule. In other words, the cross product v × w returns a vector in
R^3 that is perpendicular to both v and w.

(iii) v × w = −(w × v); this is a consequence of the right-hand rule.

(iv) v × w = |v| |w| sin(θ) n̂, where θ ∈ [0, π] is the angle between v and w, and n̂ is a unit vector
in R^3 perpendicular to both v and w, pointing in the direction given by the right-hand rule.

(v) |v × w| = |v| |w| sin(θ); this is the area of the parallelogram spanned by v and w.

(vi) v × w = (0, 0, 0) if and only if the angle between the vectors v and w is either 0 or π (i.e.,
if v and w are collinear).

(vii) v × w can be written as an informal determinant of a 3 × 3 matrix:

            | i    j    k   |
    v × w = | v_1  v_2  v_3 |
            | w_1  w_2  w_3 |

where one implements Laplace expansion along the top row. Note that i = (1, 0, 0),
j = (0, 1, 0), k = (0, 0, 1). Often people prefer remembering this form of the cross product,
as computing determinants via Laplace expansion is easy to remember!
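As a quick numerical check of properties (ii), (iii) and (v), here is a short Python/NumPy sketch (the example vectors are arbitrary, not from the notes).

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])
    w = np.array([4.0, 5.0, 6.0])

    u = np.cross(v, w)   # (v2*w3 - v3*w2, v3*w1 - v1*w3, v1*w2 - v2*w1)
    print(u)             # [-3.  6. -3.]

    # (ii) u is perpendicular to both v and w
    print(np.dot(u, v), np.dot(u, w))   # 0.0 0.0

    # (iii) anti-symmetry: w x v = -(v x w)
    print(np.cross(w, v))               # [ 3. -6.  3.]

    # (v) |v x w| equals the parallelogram area |v| |w| sin(theta)
    theta = np.arccos(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))
    print(np.linalg.norm(u))
    print(np.linalg.norm(v) * np.linalg.norm(w) * np.sin(theta))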

2. Lines, planes and linear systems

Lines and planes can be utilised for various purposes in linear algebra. In particular, they can
provide a geometric picture concerning the solution(s) of linear systems.

2.1. Lines. See notes! Important topics:

(i) Parametric, vector, and symmetric forms of lines in R3 .

(ii) Distance between lines and points, lines and lines.

2.2. Planes. See notes! Important topics:

(i) Cartesian, parametric, and vector forms of planes in R3 .

(ii) Distance between planes and points, planes and planes.

(iii) Point and angle of intersection of lines and planes.

2.3. Linear systems. See notes! Important topics:

(i) Geometric interpretation in terms of lines for systems of linear equations in two variables.

(ii) Geometric interpretation in terms of planes for systems of linear equations in three variables.

2.4. Matrices. See notes! Important topics:

(i) Multiplication of matrices.

(ii) Solving linear systems using Gaussian elimination.

(iii) Row echelon form of a matrix.

(iv) Rank of a matrix.

(v) Consequences of consistency of linear systems for matrices.

(vi) Finding inverses.

(vii) Finding determinants.

3. Derivative and differentiation

See notes! Important topics:

(i) Derivative from first principles.

(ii) Chain, product, quotient rules.

(iii) Critical points (i.e., stationary points and points of singularity), local extrema and absolute
extrema. We're talking about univariate functions here!

(iv) Derivatives of inverse functions, in particular derivatives of inverse circular functions (e.g.,
sin−1 , cos−1 , tan−1 ).

4. Parametric curves

A curve can be expressed in cartesian form if we can find an algebraic relationship between the
x and y coordinates on the curve. There are two ways to do this, either explicitly or implicitly.
For a curve in explicit (cartesian) form, the set of all points on the curve is

C_Ex = {(x, y) ∈ R^2 : y = f(x)}

where f : I → R is a function. This means the y coordinate for each point on the curve can
be expressed as the output of a function of x. Most curves cannot be expressed in explicit
(cartesian) form. However, we may still be able to find an algebraic relationship between the x
and y coordinates. A curve in implicit (cartesian) form can be described as

C_Im = {(x, y) ∈ R^2 : g(x, y) = 0}.

For example, if g(x, y) = x^2 + y^2 − 1, then C_Im describes a circle centred at (0, 0) with radius
1. (In the lectures we did not write the implicit form of a curve this way, because we had not yet
seen multivariable functions. But now you can see that a curve in implicit form is a level curve
of a surface in explicit form!)

A curve in R^2 considered in parametric form is given by the mapping t ↦ (x(t), y(t)).
(Technically the curve is the mapping t ↦ (x(t), y(t)) rather than the set of points C_Par.)


We say that the curve is parameterised by t. The set of points on this curve is

C_Par = {(x, y) : x = x(t), y = y(t), t_1 ≤ t ≤ t_2}.

A curve in R^3 in parametric form is given by the mapping t ↦ (x(t), y(t), z(t)),
in which case the collection of points on this curve is

C_Par = {(x, y, z) : x = x(t), y = y(t), z = z(t), t_1 ≤ t ≤ t_2}.

The parametric form has a number of advantages and disadvantages over the cartesian form.

(i) The parametric form of a curve always exists. That is, a curve is a curve if and only if it
can be expressed in parametric form. A cartesian form for a curve may not exist.

(ii) Finding a typical point on a curve in parametric form is easy: just substitute in a particular
value for the parameter. This is not true in implicit (cartesian) form, where one needs to solve
an algebraic equation to find a typical point on the curve, which may not be easy.

(iii) For a curve in R^2 in parametric form, we have

dy/dx = (dy(t)/dt) / (dx(t)/dt),

which is a function of t. In cartesian form, dy/dx is often hard to find. (See the sketch after
this list.)

(iv) The parametric form of a curve is non-unique, thus it is often difficult to tell what the
curve looks like when presented in parametric form. Often we will need to convert it to
cartesian form, as the cartesian form is more or less unique.
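To illustrate point (iii), here is a small Python (SymPy) sketch that computes dy/dx for a parametric curve at a given parameter value. The curve x(t) = t^2, y(t) = t^3 is only an illustrative choice, not the one in the exercise below.

    import sympy as sp

    t = sp.symbols('t')
    x = t**2      # illustrative parametric curve (not from the notes)
    y = t**3

    dydx = sp.diff(y, t) / sp.diff(x, t)   # dy/dx = (dy/dt)/(dx/dt)
    print(sp.simplify(dydx))               # 3*t/2
    print(dydx.subs(t, 1))                 # slope of the tangent line at t = 1: 3/2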
Exercise 4.0.1. Consider the parametric curve given by x(t) = e^{t^2} and y(t) = e^{2t} − cos(πt).
Find the tangent line to the curve at t = 1.

Solution.

4.1. Power series and Taylor series. A power series is the series

f(x) = Σ_{n=0}^{∞} a_n x^n,

where a_0, a_1, . . . is a sequence of real numbers. (There exist more general notions of power
series, but we do not cover them.) Note that a power series is a function f : I → R.


As f is a series, there is no guarantee that it converges for all x ∈ R. The radius of convergence
of the power series f is the largest positive R such that f converges on the interval (−R, R). The
interval of convergence of the power series f is the interval where it converges, and it is either
(−R, R), [−R, R), (−R, R] or [−R, R]. Basically, one finds the radius of convergence for the
power series, and then tests the end points to determine whether convergence of f occurs there.

We know a power series is a function. We now ask the reverse question: 'If we have an
arbitrary function, can it be expressed as a power series?'. The answer is often yes, and this
object is called the function's Taylor series. The Taylor series of f : I → R centred at/expanded
around a ∈ R is given by

f(x) = Σ_{n=0}^{∞} (f^(n)(a) / n!) (x − a)^n,

where f^(n) means the n-th derivative of f. A Taylor series centred at/expanded around a = 0
is called a Maclaurin series. The Maclaurin series of your favourite functions (e.g., e^x, sin(x))
can be found in your lecture notes.

The truncation of the Taylor series of a function f up to n + 1 terms is called the n-th
degree Taylor polynomial. Specifically,

T_n(x; a) := Σ_{k=0}^{n} (f^(k)(a) / k!) (x − a)^k.

It provides an approximation to the function f, and this approximation is usually good if one
evaluates the Taylor polynomial near the centring point a, and/or a sufficiently high number of
terms (i.e., sufficiently large n) is used in the Taylor polynomial.
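As an illustration, the following Python (SymPy) sketch builds the n-th degree Taylor polynomial symbolically and compares it with the function near the centring point. The choice f(x) = e^x centred at a = 0 is only an example.

    import sympy as sp

    x = sp.symbols('x')

    def taylor_poly(f, a, n):
        # T_n(x; a) = sum_{k=0}^{n} f^(k)(a)/k! * (x - a)^k
        return sum(sp.diff(f, x, k).subs(x, a) / sp.factorial(k) * (x - a)**k
                   for k in range(n + 1))

    f = sp.exp(x)                 # illustrative function
    T3 = taylor_poly(f, 0, 3)
    print(sp.expand(T3))          # x**3/6 + x**2/2 + x + 1
    print(float(f.subs(x, 0.1)), float(T3.subs(x, 0.1)))   # values agree closely near a = 0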

Exercise 4.1.1. Find the first three non-zero terms in the Taylor series for f(x) = (x − π/2) e^{cos(x)}
centred at a = π/2.

Solution.


4.2. Cubic splines. Consider n + 1 data points (x_0, y_0), . . . , (x_n, y_n). A cubic spline is a
piecewise function ỹ(x) defined by

ỹ(x) = ỹ_0(x),       x_0 ≤ x < x_1,
       ỹ_1(x),       x_1 ≤ x < x_2,
       ...
       ỹ_{n−1}(x),   x_{n−1} ≤ x ≤ x_n,

where the pieces are cubic polynomials

ỹ_i(x) = d_i + a_i (x − x_i) + b_i (x − x_i)^2 + c_i (x − x_i)^3

for i = 0, . . . , n − 1 and must satisfy the following properties:

(i) ỹ_i(x_i) = y_i for i = 0, . . . , n − 1. (Spline interpolates data points.) n eqns.

(ii) ỹ_{i−1}(x_i) = y_i for i = 1, . . . , n. (Continuity of the spline.) n eqns.

(iii) ỹ′_{i−1}(x_i) = ỹ′_i(x_i) for i = 1, . . . , n − 1. (Continuity of the first derivative.) n − 1 eqns.

(iv) ỹ′′_{i−1}(x_i) = ỹ′′_i(x_i) for i = 1, . . . , n − 1. (Continuity of the second derivative.) n − 1 eqns.

(v) ỹ′′_0(x_0) = ỹ′′_{n−1}(x_n) = 0. (Endpoints of the spline have second derivative equal to 0.) 2 eqns.

In total there are 4n eqns, and also 4n unknowns. Hence the coefficients a_i, b_i, c_i, d_i can be
found uniquely (see lectures for the formulas!).

The purpose of a cubic spline is to provide a sufficiently smooth curve that interpolates
the given data points, and thus provide an estimate of the data at points between the known samples.

Exercise 4.2.1. Consider the data points

x       −4   −3   1   3
f(x)     2    2   2   4

Let

ỹ(x) = ỹ_0(x),  −4 ≤ x < −3,
       ỹ_1(x),  −3 ≤ x < 1,
       ỹ_2(x),   1 ≤ x ≤ 3,

where the pieces ỹ_0(x), ỹ_1(x), ỹ_2(x) are given by

ỹ_0(x) = 2 + (1/26)(x + 4) − (1/26)(x + 4)^3,
ỹ_1(x) = 2 − (1/13)(x + 3) − (3/26)(x + 3)^2 + (7/208)(x + 3)^3,
ỹ_2(x) = 2 + (8/13)(x − 1) + (15/52)(x − 1)^2 − (5/104)(x − 1)^3.

Show that ỹ(x) is a cubic spline.

Solution. No need to derive the cubic spline yourself. One needs to just verify that the pieces
ỹ0 , ỹ1 , ỹ2 satisfy the properties (i) to (v) of a cubic spline.
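A minimal Python (SymPy) sketch of the sort of checks required, using the pieces and data points from the exercise; it simply evaluates properties (i) to (v) and prints the results.

    import sympy as sp

    x = sp.symbols('x')
    knots = [-4, -3, 1, 3]

    y0 = 2 + sp.Rational(1, 26)*(x + 4) - sp.Rational(1, 26)*(x + 4)**3
    y1 = 2 - sp.Rational(1, 13)*(x + 3) - sp.Rational(3, 26)*(x + 3)**2 + sp.Rational(7, 208)*(x + 3)**3
    y2 = 2 + sp.Rational(8, 13)*(x - 1) + sp.Rational(15, 52)*(x - 1)**2 - sp.Rational(5, 104)*(x - 1)**3
    pieces = [y0, y1, y2]

    # (i), (ii): the spline interpolates the data and is continuous at the interior knots
    print([p.subs(x, xi) for p, xi in zip(pieces, knots[:-1])])        # [2, 2, 2]
    print([pieces[i].subs(x, knots[i + 1]) for i in range(3)])         # [2, 2, 4]

    # (iii), (iv): first and second derivatives match at the interior knots
    for i in (1, 2):
        print(sp.diff(pieces[i-1], x).subs(x, knots[i]) - sp.diff(pieces[i], x).subs(x, knots[i]))        # 0
        print(sp.diff(pieces[i-1], x, 2).subs(x, knots[i]) - sp.diff(pieces[i], x, 2).subs(x, knots[i]))  # 0

    # (v): the second derivative vanishes at the endpoints
    print(sp.diff(y0, x, 2).subs(x, -4), sp.diff(y2, x, 2).subs(x, 3))  # 0 0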

5. Integration

5.1. Definite integral. The Riemann integral or definite integral of a piecewise continuous
and bounded function f : I → R is defined as

∫_a^b f(x) dx := lim_{n→∞} Σ_{i=0}^{n−1} f(x_i)(x_{i+1} − x_i).

This limit is a bit strange and we did not properly study it in this course. However, the
interpretation is that the definite integral yields the signed area bounded by the curve of the
function f , the vertical lines x = a, x = b and the x-axis. It achieves this by essentially fitting
rectangles of infinitesimal width between the curve of f and the x-axis, and ‘summing’ up each
rectangle’s area.
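To make the limit concrete, here is a small Python sketch that approximates a definite integral by a Riemann sum with n rectangles of equal width (the test integral ∫_0^1 x^2 dx = 1/3 is just for illustration).

    def riemann_sum(f, a, b, n):
        # sum of f(x_i) * (x_{i+1} - x_i) with equally spaced points x_i
        dx = (b - a) / n
        return sum(f(a + i * dx) * dx for i in range(n))

    print(riemann_sum(lambda x: x**2, 0.0, 1.0, 10))      # 0.285
    print(riemann_sum(lambda x: x**2, 0.0, 1.0, 10000))   # approaches 1/3 as n grows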

In general, it is not easy to calculate definite integrals from the definition. Instead, one
usually utilises the fundamental theorem of calculus (FTOC), which states that

∫_a^b f(x) dx = F(b) − F(a)

where F′(x) = f(x). Here, F is called an antiderivative of f. Antiderivatives are non-unique;
if F is an antiderivative for f, then so is F̃(x) = F(x) + c where c is a constant. Thus,
the fundamental theorem of calculus yields the notion that 'differentiation and integration are
essentially inverse operations'. (Although, I believe one should appreciate that the ideas of
differentiation and integration are quite separate; one can talk about integration without needing
to appeal to differentiation.)

5.2. Indefinite integrals and properties. The indefinite integral of a function f is given by

∫ f(x) dx = F(x) + c

where F is an antiderivative of f. Since F(x) + c is also an antiderivative of f, one can think of
the indefinite integral as being synonymous with antiderivative, or one can interpret it as the
'set of all antiderivatives of f'. Either interpretation is fine.

Integrals (definite and indefinite) possess a number of properties.

(i) ∫_a^c f(x) dx = ∫_a^b f(x) dx + ∫_b^c f(x) dx; one can split the interval of integration up.

(ii) ∫ [d_1 f(x) + d_2 g(x)] dx = d_1 ∫ f(x) dx + d_2 ∫ g(x) dx. This means indefinite integrals are
linear.

(iii) ∫_a^b [d_1 f(x) + d_2 g(x)] dx = d_1 ∫_a^b f(x) dx + d_2 ∫_a^b g(x) dx. This means definite integrals are
linear.

One can easily find antiderivatives (or indefinite integrals) for various elementary functions
by simply reverse engineering differentiation. E.g., ∫ sin(x) dx = − cos(x) + c. See lecture
notes for a full list! However, we are often interested in integrating functions which are more
complicated than elementary functions. In order to do so, one can reverse engineer the chain rule
and the product rule to obtain integration by substitution and integration by parts respectively.

Integration by substitution is given by

∫ f(g(x)) g′(x) dx = ∫ f(u) du,  where u = g(x),

for the indefinite integral and

∫_a^b f(g(x)) g′(x) dx = ∫_{g(a)}^{g(b)} f(u) du

for the definite integral.

Integration by parts is given by

∫ f(x) g′(x) dx = f(x) g(x) − ∫ f′(x) g(x) dx

for the indefinite integral and

∫_a^b f(x) g′(x) dx = [f(x) g(x)]_a^b − ∫_a^b f′(x) g(x) dx

for the definite integral. Note the definite integral versions aren't really necessary (one can use
the indefinite version and then use the FTOC), but they are sometimes quicker to use (but
sometimes not!).
One small note: if you are asked to do an indefinite integral ∫ f(x) dx and you utilise a
substitution in order to do so, you must convert your answer back in terms of x at the end! It is
always good to verify that your answer is correct by differentiating it and checking it matches
the integrand (i.e., the function inside the integral).
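That verification step is easy to automate. A small Python (SymPy) sketch, where the integrand x cos(x^2) is only an example (a substitution u = x^2 would handle it by hand): integrate, then differentiate the result and check it matches the integrand.

    import sympy as sp

    x = sp.symbols('x')
    integrand = x * sp.cos(x**2)      # example integrand

    F = sp.integrate(integrand, x)    # sin(x**2)/2
    print(F)
    print(sp.simplify(sp.diff(F, x) - integrand))   # 0, so F is indeed an antiderivative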

5.3. Area between curves. Suppose f(x) ≥ g(x) for all x ∈ [a, b]. The area bounded by the
two curves determined by the functions f and g, and the vertical lines x = a and x = b, is given
by the formula

∫_a^b [f(x) − g(x)] dx.

This only works if f(x) ≥ g(x) for all x ∈ [a, b]. If the inequality between f and g changes over
the interval [a, b], you must take this into account; namely, determine the subintervals where f
is bigger/smaller than g, and consider them case by case.

Finding which function f or g is bigger over each potential subinterval can be difficult. In
fact, you sort of don’t need to determine which function is bigger if you consider the following
strategy. Suppose you need to find the area between f and g over the interval [a, b]. Let
h(x) := f (x) − g(x). Then:

(i) Solve h(x) = 0 to get x1 , x2 , . . . . Order them so that xi < xi+1 . These values xi are the
points where f and g intersect.

(ii) Discard any xi that are not in (a, b). Suppose you have x1 , . . . , xn left over.

(iii) If there are no x_i in (a, b), then the area between f and g over the interval [a, b] is

∫_a^b (±?) h(x) dx = | ∫_a^b h(x) dx |

and if there are x_i in (a, b), then the area is

∫_a^{x_1} (±?) h(x) dx + ∫_{x_1}^{x_2} (±?) h(x) dx + · · · + ∫_{x_n}^{b} (±?) h(x) dx

= | ∫_a^{x_1} h(x) dx | + | ∫_{x_1}^{x_2} h(x) dx | + · · · + | ∫_{x_n}^{b} h(x) dx |.

In other words, the total desired area is the sum of all these integrals. Note I have
written a ±? in front of each h(x). This is because you may not know which function f
or g is bigger over each subinterval. However, what you do know is that each of these
integrals corresponds to an (unsigned) area, and thus must be positive! So if you compute
any of these integrals and they end up negative, you chose the wrong sign. But do not
fret: you simply need to remove the minus sign (why?), hence why we put the absolute
value. Note that this ONLY works once you write the desired area as a sum of integrals
like this. And note that in applications, there are probably either 0, 1, or 2 intersection points.

In order to find the area of the region enclosed between two curves (without this vertical
line constraint), you must find all points where they intersect (if they do!). Note that there
could be multiple regions of interest. A slight modification of the previous strategy will then
yield you the desired area.
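The strategy above translates almost line for line into code. A rough Python (SymPy) sketch; the curves f(x) = x, g(x) = x^3 on [0, 2] are illustrative, not taken from the exercise below.

    import sympy as sp

    x = sp.symbols('x')
    f = x          # illustrative pair of curves
    g = x**3
    a, b = 0, 2

    h = f - g
    # intersection points strictly inside (a, b)
    roots = [r for r in sp.solve(sp.Eq(h, 0), x) if r.is_real and a < r < b]
    points = [a] + sorted(roots) + [b]

    # area = sum of |integral of h| over each subinterval between intersections
    area = sum(abs(sp.integrate(h, (x, points[i], points[i + 1])))
               for i in range(len(points) - 1))
    print(area)    # 1/4 + 9/4 = 5/2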

Exercise 5.3.1. Find the area of the region bounded by the curves determined by the functions
f (x) = x2 e−x and g(x) = xe−x and the vertical lines x = 0, x = 1. What about if it were instead
the vertical lines x = 0, x = 2?

Solution.

5.4. Trapezoidal rule. The trapezoidal rule is utilised for numerically approximating definite
integrals by fitting trapeziums between the curve of the function being integrated and the x-axis.
It is superior to fitting rectangles, since rectangles assume the function is constant on intervals,
whereas the trapeziums assume they are linear on intervals. Neither is correct, but linear is
better than constant! The more trapeziums utilised, the better the approximation. One would
use the trapezoidal rule if the function being integrated has no known antiderivative, or the
antiderivative is hard to find. The trapezoidal rule with n trapeziums is
∫_a^b f(x) dx ≈ (1/2) · ((b − a)/n) · ( f(a) + f(b) + 2 Σ_{i=1}^{n−1} f(x_i) ).

Note that the distance between x_i and x_{i+1} is assumed to be uniform, thus Δx = (b − a)/n. This
yields x_i = a + i Δx.
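A direct Python implementation of the formula (the test integral ∫_0^π sin(x) dx = 2 is just an illustration).

    import math

    def trapezoidal(f, a, b, n):
        # (1/2) * (b - a)/n * ( f(a) + f(b) + 2 * sum of interior values f(x_i) )
        dx = (b - a) / n
        interior = sum(f(a + i * dx) for i in range(1, n))
        return 0.5 * dx * (f(a) + f(b) + 2 * interior)

    print(trapezoidal(math.sin, 0.0, math.pi, 10))    # about 1.9835
    print(trapezoidal(math.sin, 0.0, math.pi, 1000))  # about 2.0000, closer with more trapeziums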

6. Multivariable calculus

A function f : D → R where D ⊆ Rn is called a real-valued multivariable function of n variables.


For example, if f denotes the temperature at a point in space at some time, then we would
require a function f (x, y, z, t), where (x, y, z) denotes the point in space and t denotes the time.

We restricted ourselves to functions of two variables in this course (sometimes three).

6.1. Surfaces. The plot of a function of two variables f : D → R is the set of points

S_Ex = {(x, y, z) ∈ R^3 : z = f(x, y)},

which is a surface in R^3. (Usually when we refer to a surface we just say something like 'Consider
the surface z = f(x, y)' rather than writing out the whole set of points.) A surface with this
representation is in explicit (cartesian) form, as the z value corresponds to a function output.

In order to plot the surface of a function of two variables, one considers cross sections of
the surface, and then constructs the surface from them. For example, if z = f(x, y) is a surface
in explicit (cartesian) form, then one considers plotting the curves

C = {(x, y) ∈ R2 : z0 = f (x, y)}

for various z0 all on the same set of axes (x − y plane). This plot is called a contour plot, and
the curves are called contours or level curves/sets. The level curves can be thought of as curves
obtained via intersection (‘slicing’) of the surface z = f (x, y) with the horizontal planes z = z0 .

One can also look at cross sections corresponding to vertical slicing, i.e., freezing one of
the input variables x or y. These curves are thus

C = {(y, z) ∈ R2 : z = f (x0 , y)}

plotted in the y − z plane and

C = {(x, z) ∈ R2 : z = f (x, y0 )}

plotted in the x − z plane. These are called traces of the surface. One can think of these being
obtained as the intersection (‘slicing’) of the surface with the vertical planes x = x0 or y = y0 .
Thus, one can construct a surface by simply investigating its level curves and traces.

Similar to the case of curves, one can write the implicit (cartesian) form of a surface as

SIm = {(x, y, z) ∈ R3 : g(x, y, z) = 0}.

For example, if we let g(x, y, z) = x2 + y 2 − z 2 , then SIm is the surface of a double cone.

Lastly, we also have the parametric form of a surface. The parametric form of a surface is
given by the map (u, v) 7→ (x(u, v), y(u, v), z(u, v)). The set of all points on the surface is given
by

SPar = {(x(u, v), y(u, v), z(u, v)) ∈ R3 : u1 ≤ u ≤ u2 , v1 ≤ v ≤ v2 }.

6.2. Partial derivatives. Let f : D → R be a two variable function. The partial derivative of
f w.r.t. x, denoted by ∂f/∂x, is the instantaneous rate of change of f in the x direction, with y
fixed. Similarly, the partial derivative of f w.r.t. y, denoted by ∂f/∂y, is the instantaneous rate of
change of f in the y direction, with x fixed. Namely,

∂f/∂x := lim_{Δx→0} [f(x + Δx, y) − f(x, y)] / Δx,

∂f/∂y := lim_{Δy→0} [f(x, y + Δy) − f(x, y)] / Δy.

We often use the subscript notation f_x ≡ ∂f/∂x and f_y ≡ ∂f/∂y. If we want to look at the partial
derivative at a specific point (x_0, y_0), we would write f_x(x_0, y_0) ≡ ∂f/∂x |_{(x,y)=(x_0,y_0)} and
f_y(x_0, y_0) ≡ ∂f/∂y |_{(x,y)=(x_0,y_0)}. In this case, the subscript notation is usually preferred!

Calculating partial derivatives is easy! For example, if f(x, y) = x e^{x+y}, then in order to
compute f_x, one 'treats' y as a constant, and then implements the usual differentiation rules
from the univariate setting in x. Using the product rule, one obtains f_x = e^{x+y} + x e^{x+y}. Note
that although we treated y as a constant in order to compute f_x, in the end f_x is a function of
both x and y.
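The worked example above can be checked with SymPy; a quick Python sketch (the printed forms are what SymPy returns, possibly with terms in a different order).

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x * sp.exp(x + y)

    fx = sp.diff(f, x)   # treat y as a constant, differentiate in x
    fy = sp.diff(f, y)
    print(fx)            # x*exp(x + y) + exp(x + y)
    print(fy)            # x*exp(x + y)
    print(fx.subs({x: 1, y: 0}))   # f_x evaluated at the point (1, 0): 2*E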

6.3. Tangent planes. Tangent planes generalise the notion of a tangent line to two variable
functions. A tangent plane can be placed at a point (x_0, y_0, f(x_0, y_0)) on the surface z = f(x, y)
if the surface is sufficiently locally linear. That is, if you zoom in at (x_0, y_0, f(x_0, y_0)) and the
surface looks locally flat, a tangent plane can be fitted.

The equation of the tangent plane to the surface z = f(x, y) is given by

z = z_0 + f_x(x_0, y_0)(x − x_0) + f_y(x_0, y_0)(y − y_0)

where z_0 = f(x_0, y_0). Interestingly, it contains all the possible tangent lines to the surface at
(x_0, y_0), and thus the instantaneous rate of change of f at (x_0, y_0) in any direction. However, in
order to construct the tangent plane, the only information we require from f is its partial derivatives.
This is because the partial derivatives are sufficient in order to obtain any directional derivative
(think about adding non-collinear vectors!). Also, if you look at the RHS of the preceding
equation, it coincides with the linear approximation / first-order Taylor polynomial of f centred
at (x_0, y_0).
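A small Python (SymPy) sketch of the tangent-plane formula. The surface z = x^2 + y^2 at the point (1, 2) is only an illustrative choice, not the exercise below.

    import sympy as sp

    x, y = sp.symbols('x y')

    def tangent_plane(f, x0, y0):
        # z = f(x0, y0) + f_x(x0, y0)(x - x0) + f_y(x0, y0)(y - y0)
        z0 = f.subs({x: x0, y: y0})
        fx0 = sp.diff(f, x).subs({x: x0, y: y0})
        fy0 = sp.diff(f, y).subs({x: x0, y: y0})
        return z0 + fx0 * (x - x0) + fy0 * (y - y0)

    f = x**2 + y**2                              # illustrative surface
    print(sp.expand(tangent_plane(f, 1, 2)))     # 2*x + 4*y - 5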

Exercise 6.3.1. Let f (x, y) = x3 ecos(x+y) . Find the equation of the tangent plane to the surface
z = f (x, y) at the point (π/2, 0).

Solution.

6.4. Chain rule. Let f : D → R be a two variable function. Suppose we have a curve in R^2
specified in parametric form s ↦ (x(s), y(s)). If we apply f to this curve, what we get is a
curve in R^3, namely s ↦ (x(s), y(s), f(x(s), y(s))). It makes sense to talk about the derivative
of the (univariate!) function s ↦ f(x(s), y(s)), since the curve s ↦ (x(s), y(s)) determines
the direction in which you approach any point (x_0, y_0). All these maps can be a little confusing,
but there is a nice geometric picture in the lectures! Anywho, the chain rule states

d/ds f(x(s), y(s)) = (∂f/∂x)(dx/ds) + (∂f/∂y)(dy/ds),

which should be thought of as a function of s.

Suppose now we have a region in R^2 specified in parametric form (u, v) ↦ (x(u, v), y(u, v)).
If we apply f to this region, what we get is a surface in R^3. Now it doesn't make sense to talk
about the derivative of (u, v) ↦ f(x(u, v), y(u, v)), since you can approach any point (x_0, y_0) in
this region from infinitely many directions. However, we can talk about partial derivatives. In
this case the chain rule states

∂/∂u f(x(u, v), y(u, v)) = (∂f/∂x)(∂x/∂u) + (∂f/∂y)(∂y/∂u)

and

∂/∂v f(x(u, v), y(u, v)) = (∂f/∂x)(∂x/∂v) + (∂f/∂y)(∂y/∂v).

Both of these should be thought of as functions of (u, v).
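A quick Python (SymPy) check of the first chain rule above; the choices f(x, y) = x^2 y, x(s) = cos(s), y(s) = sin(s) are illustrative only.

    import sympy as sp

    s = sp.symbols('s')
    x, y = sp.symbols('x y')

    f = x**2 * y                  # illustrative function and curve
    xs, ys = sp.cos(s), sp.sin(s)

    # left-hand side: differentiate the composition directly
    lhs = sp.diff(f.subs({x: xs, y: ys}), s)

    # right-hand side: (df/dx)(dx/ds) + (df/dy)(dy/ds), evaluated along the curve
    rhs = (sp.diff(f, x).subs({x: xs, y: ys}) * sp.diff(xs, s)
           + sp.diff(f, y).subs({x: xs, y: ys}) * sp.diff(ys, s))

    print(sp.simplify(lhs - rhs))   # 0, so the two sides agree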

6.5. Directional derivative. Let f : D → R be a two variable function. The directional
derivative generalises the notion of a partial derivative. Note that the partial derivative f_x
gives the instantaneous rate of change of f in the x direction (with y fixed) whilst f_y gives the
instantaneous rate of change in the y direction (with x fixed).

For a unit vector t = (t_1, t_2), the directional derivative of f in the direction t is defined as

∇_t f := lim_{h→0} [f(x + t_1 h, y + t_2 h) − f(x, y)] / h

and thus gives the instantaneous rate of change of f in the direction t. Notice that if one takes
t = (1, 0) and t = (0, 1), one recovers the definitions of the partial derivatives in x and y respectively.

Rather than compute directional derivatives from the definition, one can show (using the
chain rule!) that

∇_t f = ∇f · t

where ∇f := (f_x, f_y) is the vector of the partial derivatives of f, called the gradient of f (often
pronounced 'grad f'). (We didn't talk much about ∇f, but the interpretation is that it is a vector
that points in the direction of the steepest ascent/descent along the surface z = f(x, y).) Be careful:
when calculating directional derivatives, you must ensure that your direction vector t is a unit
vector. If it is not, you must instead consider a unit vector in the same direction (just normalise
t by dividing it by its length, i.e., consider t̂ = t/|t|).
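The formula ∇_t f = ∇f · t̂ is easy to compute. A Python/NumPy sketch with a finite-difference check against the definition; the function and direction are illustrative, and the gradient is entered by hand.

    import numpy as np

    def f(x, y):
        return x**2 + 3 * x * y               # illustrative function

    def grad_f(x, y):
        return np.array([2 * x + 3 * y, 3 * x])   # (f_x, f_y), computed by hand

    t = np.array([1.0, 1.0])
    t_hat = t / np.linalg.norm(t)             # the direction must be a unit vector

    x0, y0 = 1.0, 2.0
    D = np.dot(grad_f(x0, y0), t_hat)         # grad f . t_hat
    print(D)                                   # about 7.78

    # compare with the limit definition, using a small h
    h = 1e-6
    approx = (f(x0 + t_hat[0] * h, y0 + t_hat[1] * h) - f(x0, y0)) / h
    print(approx)                              # close to D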

6.6. Higher order Taylor polynomials. Let f : D → R be a two variable function. The
first-order Taylor polynomial (i.e., linear approximation) of f centred at (x_0, y_0) is given by

T_1(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x − x_0) + f_y(x_0, y_0)(y − y_0).

This is essentially the equation of the tangent plane. The second-order Taylor polynomial
centred at (x_0, y_0) is similar, one just needs to include second-order terms:

T_2(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x − x_0) + f_y(x_0, y_0)(y − y_0)
            + (1/2) [ f_xx(x_0, y_0)(x − x_0)^2 + f_yy(x_0, y_0)(y − y_0)^2 + 2 f_xy(x_0, y_0)(x − x_0)(y − y_0) ].

That's it! Notice here we assume f_xy = f_yx. This is usually true. Precisely, if f_xy and f_yx exist
and are continuous, then f_xy = f_yx. (A function whose second-order partial derivatives exist and
are themselves continuous is called a C^2 function.)

6.7. Extrema of two variable functions. Let f : D → R be a two variable function. A
stationary point of f is a point (x_0, y_0) such that ∇f(x_0, y_0) = (0, 0), which is the same as saying
f_x(x_0, y_0) = 0 and f_y(x_0, y_0) = 0. A point of singularity is a point (x_0, y_0) where the partial
derivatives do not exist. Critical points refer to either stationary points or points of singularity.
We focused only on stationary points.

A stationary point (x0 , y0 ) corresponds to a:

(i) Local minimum: if f (x0 , y0 ) ≤ f (x, y) for all (x, y) ‘near’ (x0 , y0 ).

(ii) Local maximum: if f (x0 , y0 ) ≥ f (x, y) for all (x, y) ‘near’ (x0 , y0 ).

(iii) Saddle point: if (x0 , y0 ) is not a local minimum or maximum.

Local extrema refer to either local minima or local maxima. Saddle points are technically not
extrema.

One can classify the nature of a stationary point (x_0, y_0) via the following test. (This test
doesn't work for points of singularity; why?) Let

D = f_xx f_yy − (f_xy)^2.

(This is the determinant of the Hessian matrix, i.e., the 2 × 2 matrix of second-order partial
derivatives.) Then:

(i) If D(x_0, y_0) > 0 and f_xx(x_0, y_0) > 0: local minimum.

(ii) If D(x_0, y_0) > 0 and f_xx(x_0, y_0) < 0: local maximum.

(iii) If D(x_0, y_0) < 0: saddle point.

(iv) If D(x_0, y_0) = 0: inconclusive.
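A Python (SymPy) sketch that finds and classifies the stationary points of a function using this test; the function f(x, y) = x^3 − 3x + y^2 is an illustrative choice.

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3 - 3*x + y**2            # illustrative function

    fx, fy = sp.diff(f, x), sp.diff(f, y)
    stationary = sp.solve([fx, fy], [x, y], dict=True)   # points where f_x = f_y = 0

    # D = f_xx f_yy - (f_xy)^2, the determinant of the Hessian
    D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2

    for pt in stationary:
        Dval = D.subs(pt)
        fxx = sp.diff(f, x, 2).subs(pt)
        if Dval > 0 and fxx > 0:
            kind = 'local minimum'
        elif Dval > 0 and fxx < 0:
            kind = 'local maximum'
        elif Dval < 0:
            kind = 'saddle point'
        else:
            kind = 'inconclusive'
        print(pt, kind)
    # expected: (1, 0) is a local minimum, (-1, 0) is a saddle point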
