Topic List for the GRE Math Subject Test
—Bill
(P.S.: I also offer personalized tutoring for the GRE Math Subject Test. Contact me via my website,
https://www.mathsub.com, if you'd like help reviewing these topics as well as learning some of the tips
and tricks I recommend to make them easier!)
Precalculus
Basic Geometry
Parallel and perpendicular lines, transversals
Congruence
Centers of triangles (circumcenter, incenter, orthocenter, centroid)
Triangle inequality
Properties of polygons (angle sum, interior angle measure, etc.)
Similarity and proportionality
Angle bisector theorem for triangles
Pythagorean theorem
Right triangle trigonometry
Circles (arcs, chords, inscribed angles, tangents, secants, power of a point, etc.)
Cyclic quadrilaterals*
AM-GM inequality*
Areas and perimeters of triangles (multiple formulas!), quadrilaterals (multiple
formulas!), circles, sectors, ellipses
Volumes and surface areas of cubes, cylinders/prisms, cones/pyramids, spheres,
ellipsoids
Composite figures and shaded regions
Coordinate geometry (distance, midpoints, etc.)
Basic Algebra
Basics of functions (domain, range, intervals of increase, end behavior, etc.)
Algebra of functions
Inverse functions
Cyclic functions*
Functional equations*
Even and odd functions
Graphs of equations and transformations
Solving equations and inequalities (inverse functions, factoring, completing the square,
looking at the graph, etc.)
Lines and linear functions
Piecewise functions
Absolute value function
Floor and ceiling functions
Max and min functions
Algebraic functions
Quadratic functions, quadratic formula, discriminant
Graphs of polynomials
Binomial theorem and Pascal's triangle
Factoring and zero-finding techniques (grouping, polynomial division, rational root
theorem, Descartes' Rule of Signs, Vieta's formulas, etc.)
Fundamental theorem of algebra
Rational functions (asymptotes, holes, etc.)
Radical functions
Transforming radicals (rationalizing fractions, radical conjugates, nested radicals)
Transcendental functions
Exponential functions and exponent laws
Logarithmic functions and logarithm laws
Exponential and logarithmic applications and models (growth/decay, Gaussian curves,
financial applications, etc.)
Trigonometric and inverse trigonometric functions
Circular and harmonic motion
Laws of sines and cosines
Trigonometric identities (reciprocal, quotient, Pythagorean, sum/difference, double/half
angle, product/sum)
Combined sinusoids*
Hyperbolic functions
Analytic Geometry
Polar coordinates
Graphs of polar equations (rose curves, limaçons, lemniscates, etc.)
Plane curves and parametric equations
Loci in the plane
Conic sections (circles, ellipses, parabolas, hyperbolas) and their anatomy (foci,
eccentricity, etc.)
Transformations of conics (shifting, scaling, rotating)
Polar equations of conic sections
Differentiation
Tangent lines
Concept and limit definition
Differentiability and continuity
Linearity rules
Product and quotient rules
Chain rule
Higher order derivatives
Derivatives of elementary functions
Velocity and acceleration
Implicit differentiation
Relative extrema
Increasing and decreasing functions
Concavity and points of inflection
Curve sketching
Mean Value Theorem
Optimization
Related rates
Logarithmic differentiation
Parametric and polar derivatives
Integration
Antiderivatives
Definite integrals
Average value and the Mean Value Theorem
Fundamental Theorem of Calculus
Leibniz's Rule
𝑢-substitution
Series
Infinite series
Geometric series
Telescoping series
Integral test
Comparison test and limit comparison test
Alternating series and absolute convergence
Ratio test and root test
Power series
Taylor and Maclaurin series
Common power series
Remainders of Taylor series and Lagrange error bound
Multivariable Calculus
Vectors
Vectors in 2D and 3D, rectangular and polar form
Dot and cross product
Lines and planes in space
Quadric surfaces
Vector-valued functions
Vector calculus
The Frenet frame, curvature, and torsion*
Tangent and normal vectors
Parametric surfaces
Multivariable functions
Limits and continuity of multivariable functions
Partial derivatives and differentiability
Tangent planes
Multivariable chain rule
Gradients
Directional derivatives
Classifying critical points
Lagrange multipliers
Double and triple integrals
Fubini's Theorem
Area, volume, and centroids
Cylindrical and spherical coordinates
Jacobians
Vector Calculus
Vector fields
Line integrals
Independence of path and conservative vector fields
Green's Theorem
Curl and divergence
Surface integrals and flux
Divergence theorem
Stokes' Theorem*
Differential Equations
First-Order ODEs
Initial value problems
Slope fields
Autonomous ODEs
Equilibrium solutions
Separable equations
Linear equations and integrating factors
Exact equations
Inexact equations and integrating factors*
Solutions by substitutions
Modeling with first-order ODEs
Higher-Order ODEs
Reduction of order
Homogeneous linear equations with constant coefficients
Undetermined coefficients
Variation of parameters*
Cauchy-Euler equations*
Systems of linear ODEs
Basic modeling with higher-order ODEs
Relationships to linear algebra
Stretch topics*
Laplace transform
Fourier series
Partial differential equations
Linear Algebra
Linear equations and matrices
Systems of linear equations
Systems of inequalities and linear programming
Row reduction and echelon forms
Vector and matrix equations
Solution sets of linear systems
Linear independence
Linear transformations
Matrix algebra
Transformation matrices (rotation, dilation, shear, etc.)
Inverse and invertible matrices
Triangular and diagonal matrices
Partitioned and block matrices*
Matrix factorizations
Determinants
Cramer's Rule
Vector spaces
Definitions
Subspaces
Null space and column space
Bases
Coordinate systems
Dimension
Rank and nullity
Change of basis
Inner product, length, and orthogonality
Cauchy-Schwarz inequality*
Orthogonal projection
Other examples of vector spaces (polynomials, functions, etc.) and their linear operators
Eigenvalues etc.
Eigenvectors and eigenvalues
Trace
The characteristic equation
Cayley-Hamilton Theorem
Diagonalization
Minimal polynomials
Nilpotent and idempotent matrices
Invariant subspaces and direct sums*
Jordan normal form*
Matrix exponentials*
Gram-Schmidt orthogonalization*
Least squares solutions*
Quadratic forms*
Number Theory
Divisibility
Division algorithm
GCD and LCM
Euclidean algorithm
Diophantine equations
Fundamental theorem of arithmetic
Modular arithmetic
Basic properties of congruence
Base 𝑏 representations
Divisibility tricks
Chinese remainder theorem*
Fermat's Little Theorem
Wilson's Theorem
Number-theoretic functions
Sum and number of divisors
Euler's phi function
Euler's theorem
Other topics
Order of an integer mod 𝑚
Primitive roots
Quadratic residues*
Abstract Algebra
Groups
Definitions and properties of groups
Dihedral groups
Cyclic groups
Subgroups
Permutation groups (including permutation notations)
Isomorphisms
Cosets
Lagrange's theorem
Cayley's theorem
Sylow's first theorem*
Direct products
Normal subgroups
Quotient groups
Homomorphisms
First isomorphism theorem
Fundamental theorem of finite abelian groups
Conjugacy classes
Automorphism groups*
Group-like structures: semigroups, monoids*
Rings
Definitions and properties of rings
Examples
Subrings
Integral domains
Fields
Characteristic of a ring
Ideals
Quotient rings
Prime and maximal ideals
Ring homomorphisms
Polynomial rings
Polynomial (ir)reducibility tests*
Other special types of rings (PID, UFD, Boolean, local, Noetherian, Artinian, etc.)*
Fields
Field extensions and their relation to vector spaces
Classifications and structure of finite fields
Splitting fields*
Constructible numbers*
Modules*
(Technically mentioned on the syllabus, but this is definitely a stretch topic all unto itself.)
Discrete Math
Set Theory
Sets and set operations
Venn diagrams
Operations and relations
Equivalence and order relations
Functions: injections, surjections, bijections
Function composition
Images and preimages
Cardinality, cardinal numbers, and countability
Cantor-Schröder-Bernstein theorem
Logic
Propositional logic and truth tables
Propositional equivalences
Predicates and quantifiers
Proof techniques and proof validity
Induction
Algorithms
Basics of pseudocode (input/output, assignment, branching, looping, etc.)
Growth of functions
Runtime complexity
Well-known algorithms
Recursion*
Combinatorics
Basics of counting
Pigeonhole principle
Binomial and multinomial coefficients
Permutations and combinations
Circular permutations
Complements and the Inclusion-Exclusion Principle
Generating functions*
Partitions*
Graph Theory
Graph terminology
Graph isomorphism
Connectivity
Adjacency matrices
Euler and Hamilton paths
Trees
Tree traversals*
Spanning trees
Planar graphs
Graph coloring*
Probability and Statistics
Probability
Basic concepts and properties
Probability and combinatorics
Probability and geometry
Conditional probability
Joint probability
Independent events
Bayes' Theorem
Random Variables
Probability mass/density functions (PMFs/PDFs) and cumulative distribution functions (CDFs)
Expectation
Mean, variance, standard deviation
Moment generating functions (MGFs)
Discrete distributions (uniform, Bernoulli, binomial, geometric, Poisson, etc.)
Continuous distributions (uniform, exponential, normal, 𝑡, 𝜒²)
Normal approximation of binomial distribution
Empirical Rule
Functions of random variables
Statistics
Mean, median, quartiles, range, percentiles
𝑧-scores
Linear regression and correlation
Linearization of nonlinear models
Sampling distributions of sample means and proportions
Point estimation
Biased and unbiased estimators
Confidence intervals*
Hypothesis testing*
Topology
Topological spaces
Point-set topology and open sets
Examples (standard, indiscrete, discrete, lower limit, cofinite, etc.)
Basis for a topology
Closed sets
Interior, exterior, boundary
Limit points, derived set, closure
Subspace topology
Product topology
Quotient topology and gluing
Geometric examples
Real Analysis
Properties of real numbers
Supremum/infimum and the completeness property
Density property
Archimedean property
Sequences
Convergence and limits of sequences
Limit superior and limit inferior
Bolzano-Weierstrass theorem
Cauchy sequences
Functions
Limits of functions
Continuous functions
Sequential limits and continuity
Uniform continuity and continuous extensions
Other types of continuity (Lipschitz, Hölder, absolute)*
Bounded variation*
Differentiability classes
Riemann and Darboux integrability
Sequences of functions
Pointwise and uniform convergence
Interchange of limits
Series of functions
Metric spaces
Metric definitions and examples (Euclidean, taxicab, max, ℓ^𝑝/𝐿^𝑝, etc.)
Complete metrics
Topological concepts in metric spaces: compactness, connectedness, continuity
Heine-Borel Theorem
Lebesgue measure*
(used to be on the test syllabus in 1997, likely a stretch topic now)
Complex Analysis
Basics of complex numbers
Definitions
Complex plane
Complex conjugation
Polar form
Powers and roots
Loci and regions in the complex plane
Complex functions
Functions and mappings
Linear and fractional-linear mappings
Inversive geometry*
Power functions
Limits and continuity
Differentiability and analyticity
Cauchy-Riemann equations
Harmonic functions
Elementary functions extended to the complex plane
Conformal mappings*
The Riemann mapping theorem*
Complex integration
Contour integrals
Cauchy-Goursat theorem
Maximum Modulus Theorem
ML Inequality
Independence of path
Cauchy's integral formula
Laurent series
Residues and the residue theorem
Numerical Analysis
(Again, it's mentioned on the syllabus, so may as well be prepared.)
Approximation of functions
Basic estimation techniques
Tangent line approximation
Linear interpolation
Euler's method
Taylor polynomial approximations
Error
Order of approximations*
Lagrange interpolating polynomials*
Root finding
Bisection method
Fixed points and contraction mapping theorem*
Newton-Raphson method
Secant method
Calculus methods
Finite differences
Numerical differentiation
Riemann sums
Trapezoidal sums
Simpson's rule
Christian Parkinson GRE Prep: Calculus I Notes 1
Week 1: Calculus I
Notes
The most fundamental definition in Calculus is that of the limit.
Definition 1 (Limit). Let $f : \mathbb{R} \to \mathbb{R}$ and let $x_0 \in \mathbb{R}$. We say that the limit of $f(x)$ as $x$ approaches $x_0$ is $L$ if for all $\varepsilon > 0$, there is $\delta > 0$ such that $0 < |x - x_0| < \delta \implies |f(x) - L| < \varepsilon$ [note: there may be no such $L$, in which case we say the limit does not exist]. When we can find such $L$, we write
$$\lim_{x \to x_0} f(x) = L.$$
If instead, we want to take the limit at $\pm\infty$, we need to modify the definition a bit. We say that the limit of $f(x)$ as $x$ approaches $\infty$ is $L$ if for all $\varepsilon > 0$, there is $M > 0$ such that $x > M \implies |f(x) - L| < \varepsilon$ (and likewise for $-\infty$ with some of the signs changed), whence we write
$$\lim_{x \to \infty} f(x) = L.$$
There is one theorem for computing limits that can be helpful on the math subject GRE.

Theorem 2 (Squeeze Theorem). Suppose that $f(x) \le g(x) \le h(x)$ for all $x$ near $x_0$ and that $\lim_{x \to x_0} f(x) = \lim_{x \to x_0} h(x) = L$. Then $\lim_{x \to x_0} g(x) = L$. [The same holds when $x_0$ is replaced with $\pm\infty$.]
This theorem helps us calculate limits like $\lim_{x \to \infty} \frac{\sin(x)}{x}$.
Example 3. Note that we require $|f(x) - L| < \varepsilon$ for $0 < |x - x_0| < \delta$; the lower bound of 0 is important. Consider $f(x) = 1$ for $x \neq 0$ and $f(0) = 2$. By tracing the graph, it is clear that we should have
$$\lim_{x \to 0} f(x) = 1.$$
However, if we remove the lower bound of 0 in the definition, then we cannot prove that the limit is indeed 1.
With these definitions, we can state some properties of limits and continuous functions. Suppose that $f, g : \mathbb{R} \to \mathbb{R}$ are continuous at $x_0$. Then
(a) $f \pm g$ is continuous at $x_0$.
(b) $fg$ is continuous at $x_0$.
(c) $f/g$ is continuous at $x_0$, provided $g(x_0) \neq 0$.
(d) $cf$ is continuous at $x_0$ for any constant $c \in \mathbb{R}$.
This theorem gives us many continuous functions. For example, since the function $f(x) = x$ is continuous, it follows from rules (a), (b), (d) that any polynomial is continuous. As a general rule, any 'common' function is continuous; this includes $\sin$, $\cos$, $\exp$, for example.
There is one very nice theorem regarding continuous functions that comes up fairly often on the math subject GRE.

Theorem (Intermediate Value Theorem). Suppose that $f : [a, b] \to \mathbb{R}$ is continuous and that $y$ lies between $f(a)$ and $f(b)$. Then there is $c \in [a, b]$ such that $f(c) = y$.

This theorem is key in proving that the continuous image of a connected set remains connected; we discuss this later when we address set-theoretic topology. The context in which this theorem typically arises on the GRE is root finding.
Example 8. Given that $p(x) = 2x^3 - 2x + 3$ has one root in $\mathbb{R}$, find the interval $[n, n+1]$ (where $n \in \mathbb{Z}$) containing the root.

Solution. By the intermediate value theorem, it suffices to find $n \in \mathbb{Z}$ such that $p(n) < 0$ and $p(n+1) > 0$, or $p(n) > 0$ and $p(n+1) < 0$. Testing, we see $p(-2) = -9 < 0$ and $p(-1) = 3 > 0$, so the root lies in $[-2, -1]$.
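This kind of sign check is easy to automate; here is a short Python sketch (added here, not part of the original notes) that scans integer endpoints for a sign change of $p$:

```python
# Sketch: locate the unit interval [n, n+1] containing the root of
# p(x) = 2x^3 - 2x + 3 by scanning for a sign change, which the
# intermediate value theorem justifies for continuous p.

def p(x):
    return 2 * x**3 - 2 * x + 3

# Keep the first integer interval where p changes sign.
interval = next((n, n + 1) for n in range(-10, 10) if p(n) * p(n + 1) < 0)
print(interval)  # -> (-2, -1)
```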
Continuity oftentimes is not enough, since a continuous function can still be very jagged. We like our functions to be smooth. This motivates us to define smoother functions; specifically, functions that locally look like straight lines. Accordingly, for a function $f : \mathbb{R} \to \mathbb{R}$ and a point $x \in \mathbb{R}$, you could think of drawing the secant line between the points $x$ and a nearby point $x + h$ (for some small $h \in \mathbb{R}$). The degree to which this line approximates the graph of $f$ is some measure of smoothness of $f$. Letting $h$ become smaller and smaller, the secant line 'should' settle to a line which lies tangent to $f$ at the point $x$; indeed, this occurs if $f$ is differentiable at $x$.
Of course, we may assume knowledge of some derivatives which are not listed here. Otherwise, we have some rules which will tell us how to find the derivatives of other functions.

(b) (Product Rule) $fg$ is differentiable at $x$ with $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$.
Also, as discussed above, we can use the derivative to find tangent lines to a graph.
These simple rules and definitions can be used to solve a surprising number of problems on the math subject GRE.
Example 14. Let $f(0) = 0$ and $f(x) = \frac{(x+1)^2}{x^9}e^{-x}$ for $x \neq 0$. At how many points is the tangent line to $f$ horizontal?

Solution. We could chug away at this using the power rule and product rule and it would get fairly messy. Instead, we can use the chain rule to find the derivative of $\ln(f(x))$. We see
$$\frac{f'(x)}{f(x)} = \frac{d}{dx}\bigl(\ln(f(x))\bigr) = \frac{d}{dx}\bigl(2\ln(x+1) - 9\ln(x) - x\bigr) = \frac{2}{x+1} - \frac{9}{x} - 1.$$
Thus
$$f'(x) = \frac{(x+1)^2}{x^9 e^x}\left(\frac{2}{x+1} - \frac{9}{x} - 1\right).$$
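The logarithmic-differentiation step can be sanity-checked numerically; a sketch (not from the notes, and assuming the function $f(x) = (x+1)^2 e^{-x}/x^9$ as reconstructed above):

```python
# Check that (ln f)'(x) matches 2/(x+1) - 9/x - 1 for
# f(x) = (x+1)^2 e^(-x) / x^9, using a central finite difference.
import math

def f(x):
    return (x + 1) ** 2 * math.exp(-x) / x**9

def log_deriv(x, h=1e-6):
    # central-difference approximation of (d/dx) ln f(x)
    return (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)

x = 2.0
claimed = 2 / (x + 1) - 9 / x - 1
assert abs(log_deriv(x) - claimed) < 1e-5
```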
A natural question: is there a point where the average rate of change of a function over an interval equals the instantaneous rate of change? The answer is yes, and physically this makes sense. For example, if you drove 500 miles in 10 hours, there was a point when you were going exactly 50 miles per hour.

Theorem (Mean Value Theorem). Suppose that $f : [a,b] \to \mathbb{R}$ is continuous on $[a,b]$ and differentiable on $(a,b)$. Then there is $c \in (a,b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$
With "derivative as slope," we see that a differentiable function $f$ is increasing at $x$ if $f'(x) \ge 0$ and decreasing if $f'(x) \le 0$. Thus the extreme points of $f$ must occur at places where $f'(x) = 0$.
Derivatives are also very helpful in finding limits. Many limits result in so-called indeterminate forms. For example, we may want to take the limit
$$\lim_{x \to x_0} \frac{f(x)}{g(x)}$$
where
$$\lim_{x \to x_0} f(x) = \lim_{x \to x_0} g(x) = 0$$
or
$$\lim_{x \to x_0} f(x) = \pm\infty, \qquad \lim_{x \to x_0} g(x) = \pm\infty.$$
In this setting, L'Hôpital's rule applies: if $\lim_{x \to x_0} \frac{f'(x)}{g'(x)}$ exists, we have
$$\lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}.$$
This tells us that if we have an indeterminate form, what really matters is the rate at which the functions approach that indeterminate form individually.

This is an incredibly powerful method for finding limits and it often arises on the math GRE. It also (indirectly) helps us find limits when other indeterminate forms such as $0 \cdot \infty$ or $\infty^0$ arise.
Solution.
(a) $\displaystyle\lim_{x \to 0} \frac{\sin(x)}{x} = \lim_{x \to 0} \frac{\cos(x)}{1} = 1.$

(b) Let $L = \lim_{x \to \infty} \left(1 + \frac{1}{x}\right)^x$. Then by continuity,
$$\ln L = \lim_{x \to \infty} x \ln\left(1 + \frac{1}{x}\right) = \lim_{x \to \infty} \frac{\ln(1 + 1/x)}{1/x} = \lim_{x \to \infty} \frac{\frac{1}{1 + 1/x}\,(-1/x^2)}{-1/x^2} = 1.$$
Thus $L = e$.
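Part (b) is easy to sanity-check numerically; a sketch (added here, not part of the original notes):

```python
# Numeric check: (1 + 1/x)^x approaches e as x grows, matching the
# L'Hopital computation for the indeterminate 1^infinity form.
import math

errors = [abs((1 + 1 / x) ** x - math.e) for x in (10.0, 1e3, 1e6)]
assert errors[0] > errors[1] > errors[2]  # steadily approaching e
assert errors[-1] < 1e-5
```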
(c) We see
From here, we move on to Riemann integration. The Riemann integral of a function between two points $a, b$ can be physically thought of as the area underneath the graph of the function between $a$ and $b$. If we denote by $\int_a^b f(x)\,dx$ the integral of $f$ from $a$ to $b$, then it is easy to see, for example, that
$$\int_0^1 1\,dx = 1, \qquad \int_0^1 x\,dx = 1/2$$
just by calculating the area. But what if $f$ is not so simple? We can calculate the area using limits and approximating with rectangles.
Note that when $f$ is continuous, these limits always exist and are equal. When $f$ is discontinuous, some of them may fail to exist. Thus a bit more care is needed in rigorously defining the Riemann integral for discontinuous functions. We also note that there is no reason the partition $x_i$ needs to contain evenly spaced points, and there's no reason we need to evaluate $f$ at $x_{i-1}$, $x_i$, or the midpoint. We could refine the definition to account for an arbitrary partition with arbitrary tags:
$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^n f(x_i^*)(x_i - x_{i-1}), \quad \text{where } x_i^* \in [x_{i-1}, x_i],$$
for example. In this case, the points $x_i$ need to be strictly increasing, $a = x_0 < x_1 < \cdots < x_n = b$, and satisfy a condition that ensures they don't cluster in some small portion of the interval: specifically, we need $h_n := \max_{1 \le i \le n}(x_i - x_{i-1}) \to 0$ as $n \to \infty$. On the GRE, it is rare to see a Riemann sum with a non-uniform partition.

We also note that each of these limits has a different interpretation which can be seen by looking at a picture.
Example 21. Let $0 = x_0 < x_1 < \ldots < x_n = 1$ be a partition of $[0, 1]$. Which of the following quantities is greatest?
$$A = \int_0^1 x^2\,dx, \qquad B = \sum_{i=1}^n (x_{i-1})^2(x_i - x_{i-1}), \qquad C = \sum_{i=1}^n (x_i)^2(x_i - x_{i-1}), \qquad D = \sum_{i=1}^n \tfrac{1}{4}(x_{i-1} + x_i)^2(x_i - x_{i-1}).$$

Solution. By drawing a picture, it is easy to see that $B < D < C$ and that $A < C$, so $C$ is the largest. [Note: $C$ was largest because $x^2$ is an increasing function; what happens if we change it to a decreasing function? A non-monotone function?]
FToC I: Suppose that $f : [a, b] \to \mathbb{R}$ is continuous and define $F(x) = \int_a^x f(t)\,dt$. Then $F$ is differentiable on $(a, b)$ with $F'(x) = f(x)$ for all $x \in (a, b)$.

FToC II: Suppose that $f : [a, b] \to \mathbb{R}$ is continuous and $F : [a, b] \to \mathbb{R}$ is any function such that $F'(x) = f(x)$ for $x \in (a, b)$. Then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
Thus to calculate area under curves, we simply need to reverse the differentiation process and evaluate at the correct points. Such a function $F$ in the FToC is called an antiderivative of $f$.
Example 24. Evaluate $\int_0^\pi \sin(x)\,dx$.

Solution. Since $\frac{d}{dx}(-\cos(x)) = \sin(x)$, we see
$$\int_0^\pi \sin(x)\,dx = -\cos(\pi) - (-\cos(0)) = 2.$$
The FToC can also be very helpful in evaluating certain limits if we can recognize that they are actually Riemann sums.

Example 25. Evaluate $\lim_{n \to \infty} \sum_{k=1}^n \frac{n}{n^2 + k^2}$.

Solution. Call the limit $S$. When we rewrite this as
$$S = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^n \frac{1}{1 + (k/n)^2}$$
we recognize that it is simply the Riemann sum for $\frac{1}{1+x^2}$ on the interval $[0, 1]$. Thus
$$S = \int_0^1 \frac{dx}{1 + x^2} = \arctan(1) - \arctan(0) = \frac{\pi}{4}.$$
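The Riemann-sum identification can be confirmed numerically; a sketch (not from the original notes):

```python
# Numeric check: the partial sums of sum_{k=1}^n n/(n^2 + k^2)
# approach pi/4, as the Riemann-sum argument predicts.
import math

def S(n):
    return sum(n / (n**2 + k**2) for k in range(1, n + 1))

assert abs(S(10_000) - math.pi / 4) < 1e-3
```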
The Fundamental Theorem of Calculus tells us that to calculate integrals, we simply
need to find antiderivatives. There are a few standard methods which help us with this. We
close the Calculus I notes by covering these methods.
We can use the formula in either direction; one of the above integrals may be very
straightforward while the other is not. We illustrate this with two examples.
Example 27. Compute (a) $\int_0^1 \frac{t\,dt}{1+t^2}$ and (b) $\int_0^1 \frac{dx}{1+x^2}$.

Solution. For (a), note that the numerator is the derivative of the denominator (up to a constant). Thus we should make a substitution for the denominator. Putting $x = 1 + t^2$, we see $dx = 2t\,dt$ and so
$$\int_0^1 \frac{t\,dt}{1+t^2} = \frac{1}{2}\int_1^2 \frac{dx}{x} = \frac{\log(2)}{2}.$$
Here we used the rule from right to left.

For (b), we can set $x = \tan(t)$. Then $dx = \sec^2(t)\,dt$, so we see
$$\int_0^1 \frac{dx}{1+x^2} = \int_0^{\pi/4} \frac{\sec^2(t)\,dt}{1 + \tan^2(t)} = \int_0^{\pi/4} 1\,dt = \frac{\pi}{4}.$$
Example 28. Note that the assumptions on $\varphi$ are necessary. Indeed, removing this assumption and proceeding formally, we can run into some errors. Consider: by (b) above and using evenness, we have
$$\int_{-1}^1 \frac{dx}{1+x^2} = \frac{\pi}{2}.$$
However, using $x = 1/t$ gives
$$\int_{-1}^1 \frac{dx}{1+x^2} = \int_{-1}^1 \frac{(-1/t^2)\,dt}{1 + 1/t^2} = -\int_{-1}^1 \frac{dt}{1 + t^2} = -\frac{\pi}{2}.$$
Thus we've seemingly proved that $\pi = -\pi$, which is absurd. However, this substitution was invalid because $\varphi(t) = 1/t$ is not a differentiable bijection on $[-1, 1]$.
Intuitively, integration by substitution takes advantage of the chain rule for differentiation. A natural question is whether we can exploit other rules for differentiation in a similar way. Integration by parts does exactly this for the product rule, and it is equivalent to the statement above. You can think of integration by parts as "taking a derivative off of one term and applying it to the other." Of course, this process is not free; the cost you incur is the boundary evaluation $u(b)v(b) - u(a)v(a)$. In terms of indefinite integrals, you often see this rule written as
$$\int u\,dv = uv - \int v\,du.$$
For (b), there is only one function present. However, we can take $u = \log(x)$, $dv = dx$ to see that
$$\int_1^2 \log(x)\,dx = x\log(x)\Big|_{x=1}^{x=2} - \int_1^2 x \cdot \frac{1}{x}\,dx = 2\log(2) - 1.$$
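The integration-by-parts result above is easy to verify numerically; a sketch (not from the notes):

```python
# Numeric check: int_1^2 log(x) dx = 2*log(2) - 1, via a midpoint sum.
import math

n = 100_000
h = 1 / n
approx = h * sum(math.log(1 + (i + 0.5) * h) for i in range(n))
assert abs(approx - (2 * math.log(2) - 1)) < 1e-8
```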
A last technique used for integration is partial fractions. This method doesn't correspond to a differentiation rule; rather, it simplifies the integrand by breaking a rational function up into much simpler rational functions. Rather than discuss this in generality, we illustrate the technique with an example: to integrate $\frac{1}{(x-1)(x-3)(x+3)}$, we posit
$$\frac{1}{(x-1)(x-3)(x+3)} = \frac{A}{x-1} + \frac{B}{x-3} + \frac{C}{x+3}, \quad \text{i.e.,} \quad 1 = A(x-3)(x+3) + B(x-1)(x+3) + C(x-1)(x-3).$$
This is a system of three equations in three unknowns (collect terms for $x^0, x^1, x^2$). It is easy to solve by substituting certain values for $x$. Substituting in $x = 1$, $x = 3$, and $x = -3$, we arrive at
$$1 = (-2)(4)A, \qquad 1 = (2)(6)B, \qquad (-4)(-6)C = 1.$$
Thus
$$\frac{1}{(x-1)(x-3)(x+3)} = -\frac{1}{8}\cdot\frac{1}{x-1} + \frac{1}{12}\cdot\frac{1}{x-3} + \frac{1}{24}\cdot\frac{1}{x+3}.$$
It is now easy to see that
$$\int \frac{dx}{(x-1)(x^2-9)} = -\frac{1}{8}\log(x-1) + \frac{1}{12}\log(x-3) + \frac{1}{24}\log(x+3) + \text{constant}.$$
Note that there need to be special considerations when the denominator does not split into linear factors; we omit these cases.
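A partial fraction decomposition can always be double-checked by comparing both sides at a few sample points; a sketch for this example (not from the notes):

```python
# Check the decomposition
# 1/((x-1)(x-3)(x+3)) = -1/(8(x-1)) + 1/(12(x-3)) + 1/(24(x+3))
# numerically at several points away from the poles 1, 3, -3.

def lhs(x):
    return 1 / ((x - 1) * (x - 3) * (x + 3))

def rhs(x):
    return -1 / (8 * (x - 1)) + 1 / (12 * (x - 3)) + 1 / (24 * (x + 3))

for x in (0.0, 0.5, 2.0, 5.0, -1.2):
    assert abs(lhs(x) - rhs(x)) < 1e-12
```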
Christian Parkinson GRE Prep: Calculus II Notes 1
Week 2: Calculus II
Notes
We concluded the Calculus I notes with Riemann integration, the Fundamental Theorem of Calculus, and some helpful integration techniques. However, oftentimes you will be asked to identify whether an integral converges or diverges even when you cannot find its value. This is especially true for improper integrals: those in which the integrand has a singularity or the range goes to infinity. We begin the Calculus II notes by discussing some basic convergence results.
As a general rule, for
$$\int_1^\infty f(x)\,dx$$
to converge, we will need $f(x) \to 0$ faster than $1/x$ as $x \to \infty$; and if $g(x)$ has a singularity at $x = 0$, then for
$$\int_0^1 g(x)\,dx$$
to converge, we need $g(x)$ to blow up slower than $1/x$ as $x \to 0^+$. Note, these "general rules" should not be seen as hard-and-fast truths, but rather heuristics which can often make it easy to verify whether an integral converges or diverges. The following two theorems give more rigorous forms of these general rules.
Theorem 2 (Comparison Test). Suppose that $f, g : [0, \infty) \to [0, \infty)$ are continuous and that $f(x) \le g(x)$ for all $x \in [0, \infty)$. Then
(a) if $\int_0^\infty g(x)\,dx$ converges, then so does $\int_0^\infty f(x)\,dx$;
(b) if $\int_0^\infty f(x)\,dx$ diverges, then so does $\int_0^\infty g(x)\,dx$.
Theorem 3 (Limit Comparison Test). Suppose that $f, g : [0, \infty) \to [0, \infty)$ are continuous and that
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = L \in (0, \infty).$$
Then $\int_0^\infty f(x)\,dx$ and $\int_0^\infty g(x)\,dx$ either both converge or both diverge.
Note, the previous two theorems also hold for improper integrals where the functions have
singularities at the same point but the limit in the second must be taken at the singularity.
This last theorem tells us that the only important thing in determining convergence of an integral with $\infty$ as a limit is the asymptotic behavior of the integrand. We have a special notation for this. If $f, g$ are such that $f/g \to L$ as $x \to \infty$, where $L$ is a finite non-zero number, then we write $f \sim g$ [note: in most contexts, $L$ is required to be 1 in order to say $f \sim g$; for our purposes, it is fine to allow $L$ to be finite and non-zero]. We typically read this as "$f$ is asymptotic to $g$." Note that if $f \sim g$, then there are constants $c, C > 0$ such that for sufficiently large $x$, we have $cg(x) \le f(x) \le Cg(x)$. This shows that we can easily derive the Limit Comparison Test from the ordinary Comparison Test. A relaxed version of this requirement leads to a new definition. If there is $C > 0$ such that $f(x) \le Cg(x)$ for all $x$ sufficiently large, then we write $f = O(g)$ [and say "$f$ is big-oh of $g$"]. In particular, if $f \sim g$, then $f = O(g)$ and $g = O(f)$. Specifically, $f = O(g)$ iff $f/g$ remains bounded as $x \to \infty$. Finally, note that we can define the same notion for $x$ approaching some other point. For example, $\sin(x) \sim x$ as $x \to 0$.
In order to apply these rules, it is nice to know something about the growth rates of different common functions at $\infty$. We discuss a few here. For any $\alpha, \beta, \gamma > 0$:
$$\text{(b)} \quad \lim_{x \to \infty} \frac{\log(x)^\alpha}{x^\beta} = 0, \qquad \text{(c)} \quad \lim_{x \to \infty} \frac{x^\beta}{e^{\gamma x}} = 0.$$
What these tell us is that asymptotically as $x \to \infty$, any power of $x$ is smaller than any exponential, and any power of a logarithm is smaller than any power of $x$. In asymptotic notation, we write $1 \ll \log(x)^\alpha \ll x^\beta \ll e^{\gamma x}$, where $f \ll g$ means that $f/g \to 0$ at $\infty$. We can compound these into new relationships, e.g. $x^\beta \ll x^\beta \log(x)^\alpha$. Many more such relationships can be proved simply by taking the required limit. Note that if $f \ll g$, then $f = O(g)$.
Solution.
(a) Since $\sqrt{x}, \log(x) \ll x$, there are constants $c_1, c_2$ such that for $x$ large enough, $\sqrt{x} \le c_1 x$ and $\log(x) \le c_2 x$. Thus
$$\int_2^\infty \frac{dx}{\sqrt{x} + \log(x) + x} \ge C + \int_M^\infty \frac{dx}{(1 + c_1 + c_2)x}$$
where $C, M$ are some constants. The former diverges since the latter does.

(b) Notice that if we divide the integrand by $1/x^3$ (hence multiply by $x^3$) and take the limit, we see
Next, we discuss applications of the integral. We have already established that it can be
used to calculate area under curves but there are a few other applications that show up often
on the math subject GRE; most prominently arc-length calculations and surfaces/volumes
of revolution. We discuss the formulas for these briefly.
More generally, if $x, y : [a, b] \to \mathbb{R}$ are continuously differentiable, then the length of the curve parameterized by $(x(t), y(t))$ for $t \in [a, b]$ is given by
$$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2}\,dt.$$
These formulas are easy to derive when you have a good picture in your mind; otherwise
you can use them as a black box. Note that they agree with our intuition.
Example 8. What is the length of the graph of $f(x) = \sqrt{1 - x^2}$ for $x \in [-1, 1]$?

Solution. Intuitively, this graph is half of a unit circle, so the length should be $\pi$. Using the formula, we see
$$L = \int_{-1}^1 \sqrt{1 + \frac{x^2}{1 - x^2}}\,dx = \int_{-1}^1 \frac{dx}{\sqrt{1 - x^2}} = \arcsin(1) - \arcsin(-1) = \pi/2 - (-\pi/2) = \pi,$$
where the antiderivative can easily be found using the trig substitution $x = \sin(t)$.
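The arc-length formula can also be validated directly from its definition by summing chord lengths; a sketch (not part of the notes):

```python
# Approximate the length of the graph of sqrt(1 - x^2) on [-1, 1]
# by summing small chord lengths; the total should approach pi.
import math

n = 100_000
xs = [-1 + 2 * i / n for i in range(n + 1)]
# max(0.0, ...) guards against tiny negative values from rounding
ys = [math.sqrt(max(0.0, 1 - x * x)) for x in xs]
length = sum(math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) for i in range(n))
assert abs(length - math.pi) < 1e-3
```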
Example 9. Let $f(x) = x$ for $x \in [0, 1]$. What surface area and volume result from rotating the graph of $f$ about the $x$-axis?

Solution. Intuitively, this will create a right circular cone of "height" 1 and radius 1. You may recall from geometry that the volume is $\frac{\pi}{3}r^2 h = \frac{\pi}{3}$. According to our formula,
$$V = \pi \int_0^1 x^2\,dx = \frac{\pi}{3}.$$
Likewise, the surface area (not including the "bottom" of the cone, i.e., the unit circle which forms the base) will be $\pi r\sqrt{r^2 + h^2} = \pi\sqrt{2}$, and our formula gives
$$A = 2\pi \int_0^1 x\sqrt{1 + 1^2}\,dx = \pi\sqrt{2}.$$
Note: much like the arc-length formula, the surface area and volume formulas can be adjusted to account for parameterized coordinates or to account for rotation around the $y$-axis or any other line $ax + by = c$. Also, you may be asked for the volume created when the area between two graphs $(x, f(x))$ and $(x, g(x))$ is rotated around the $x$-axis. This will be given by
$$\pi \int_a^b \left(f(x)^2 - g(x)^2\right)dx.$$
The reasoning should be somewhat clear from the geometry.
Note that sequences can begin at $n = 0$ or $n = 1$; both are common start points. Sequences themselves (i.e., independent of series) account for many questions on the math subject GRE. A common question will give a recursive formula (i.e., a formula for $a_n$ in terms of the preceding entries $a_{n-1}, a_{n-2}, \ldots$) for a sequence and ask the student to identify the sequence explicitly.

Example 11. Suppose that $a_0 = 0$ and $a_n = a_{n-1} + (2n - 1)$ for all $n \ge 1$. Find a closed-form expression for $a_n$.

Solution. Computing the first few terms, we see
$$a_0 = 0, \quad a_1 = 0 + 1 = 1, \quad a_2 = 1 + 3 = 4, \quad a_3 = 4 + 5 = 9, \ldots.$$
From here it is easy to surmise that the sequence is given by $a_n = n^2$. To prove this, one could use mathematical induction; however, proofs are not necessary on the GRE, so this would be a waste of time. We will talk more about general methods for identifying sequences when we touch on Discrete Math.
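On a computer, generating the first several terms of a recursion and matching them against a guessed closed form is the same workflow; a sketch (not from the notes):

```python
# Generate a_0 = 0, a_n = a_{n-1} + (2n - 1), then check a_n = n^2.

def seq(n_max):
    a = [0]
    for n in range(1, n_max + 1):
        a.append(a[-1] + (2 * n - 1))
    return a

assert seq(10) == [n**2 for n in range(11)]
```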
Note that sometimes we drop the $n \to \infty$ part since it is implied: we may say $\lim a_n = L$ or $a_n \to L$.

The rules for limits of sequences are the same as those for limits of functions. We repeat them here.
Proposition 13 (Rules for limits of sequences). Suppose $(a_n)$ and $(b_n)$ are two convergent sequences with $a_n \to a$ and $b_n \to b$. Then
(a) $a_n \pm b_n \to a \pm b$,
(b) $a_n b_n \to ab$,
(c) $a_n / b_n \to a/b$, provided $b \neq 0$.
We also have analogous theorems like the squeeze theorem which we omit for brevity.
Each sequence also defines a series. A series is a formal infinite sum. Infinite sums are of great interest since, for example, they can be used to approximate functions which cannot be explicitly calculated in a finite number of steps, e.g. $e^x$ or $\cos(x)$. Given a sequence $(a_n)$, we define the partial sums $s_N = \sum_{n=1}^N a_n$, and the sum of the series is the limit
$$s = \lim_{N \to \infty} s_N, \quad \text{i.e.,} \quad \sum_{n=1}^\infty a_n = \lim_{N \to \infty} \sum_{n=1}^N a_n.$$
If the partial sums $s_N$ have a limit, then the series is said to converge; otherwise it is said to diverge.
In some cases, it is easy enough to look at the partial sums and explicitly take the limit.
Example 15. Calculate the infinite sums ∑_{n=1}^∞ a_n and ∑_{n=1}^∞ b_n, where
a_n = 1/2^n and b_n = 1/(n(n + 1)).
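Both sums telescope nicely: s_N = 1 − 1/2^N for the first and s_N = 1 − 1/(N + 1) for the second (since 1/(n(n+1)) = 1/n − 1/(n+1)), so each series converges to 1. A quick numerical check of the partial sums (names are my own):

```python
# Partial sums of a_n = 1/2^n and b_n = 1/(n(n+1)); both series sum to 1.

def partial_sum(term, big_n):
    """Compute s_N = term(1) + ... + term(N)."""
    return sum(term(n) for n in range(1, big_n + 1))

s_geom = partial_sum(lambda n: 1 / 2 ** n, 60)
s_tele = partial_sum(lambda n: 1 / (n * (n + 1)), 10 ** 6)

assert abs(s_geom - 1) < 1e-15
assert abs(s_tele - 1) < 1e-5   # the error is exactly 1/(N+1)
```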
While it is fairly easy to tell if a sequence converges or diverges, it can be difficult to tell
whether a series converges or diverges. For example, it may be unclear initially whether
    ∑_{n=1}^∞ 1/n,   ∑_{n=2}^∞ (−1)^n/log(n),   or   ∑_{n=1}^∞ sin(n)
converge or diverge since we can’t find a nice closed form for the partial sums. To this end,
we have built up several tests for convergence/divergence. The first test tells you that if the
series is to converge, we at least need the summand to approach zero.
Theorem (Divergence Test). If ∑_{n=1}^∞ a_n converges, then a_n → 0. The converse is
false: the classic counterexample is the harmonic series ∑_{n=1}^∞ 1/n, which has 1/n → 0
but nevertheless diverges. [A nice document with roughly twenty proofs of this fact can be
found here.]
You may recognize this as a direct analog to one of the above propositions for convergence
of integrals (as we will soon see, this is no coincidence) and wonder whether the other
integral convergence tests have analogs for series. The answer is yes.
Theorem 19 (Comparison Test for Series). Suppose that (a_n) and (b_n) are sequences
with non-negative terms such that a_n ≤ b_n for all n ∈ N. Then
(a) if ∑_{n=1}^∞ b_n converges then so does ∑_{n=1}^∞ a_n,
(b) if ∑_{n=1}^∞ a_n diverges then so does ∑_{n=1}^∞ b_n.
Theorem 20 (Limit Comparison Test for Series). Suppose that (a_n) and (b_n) are
non-negative sequences such that b_n > 0 for all n sufficiently large. If

    lim_{n→∞} a_n/b_n = L ∈ (0, ∞)

then ∑_{n=1}^∞ a_n and ∑_{n=1}^∞ b_n either both converge or both diverge.
One feature that is more important for series than for integrals is sign changes. The
results above apply for non-negative sequences, but they do not address series like
∑_{n=1}^∞ (−1)^n/√n [such series, with summands of the form (−1)^n a_n where the a_n
are non-negative terms, are called alternating series]. While ∑_{n=1}^∞ 1/√n diverges (by
the p-series test), with the alternating sign there is cancellation occurring, and perhaps
there is enough cancellation that the alternating series converges. This is indeed the case;
we state this as a proposition after a few other related definitions and results.
Definition 21 (Absolute Convergence). The series ∑_{n=1}^∞ a_n is said to converge
absolutely if the series ∑_{n=1}^∞ |a_n| converges.

Theorem 22 (Absolute Convergence Theorem). If ∑_{n=1}^∞ |a_n| converges, then
∑_{n=1}^∞ a_n converges. That is, absolute convergence implies convergence. (This is a
reflection of the fact that R [with the usual norm] is a Banach space.)
Since the converse is not true (lack of absolute convergence does not imply lack of conver-
gence), this still says nothing about ∑_{n=1}^∞ (−1)^n/√n. For this we need another theorem.
The Alternating Series Test is a special case of a more general theorem. By a theorem of
Dirichlet, if (a_n) is a non-negative sequence decreasing to zero, and (b_n) is any sequence
whose partial sums are bounded, then ∑_{n=1}^∞ a_n b_n converges; the alternating series
test is the case b_n = (−1)^n. Note: it is a good exercise to produce an example showing
that the assumption that (a_n) is decreasing is necessary; without it, the alternating series
could diverge even if a_n → 0.
Above we remarked that the rules for infinite sums very closely resemble those for im-
proper integrals. Here is the reason why:

Theorem (Integral Test). Suppose f : [1, ∞) → [0, ∞) is decreasing and a_n = f(n). Then
∑_{n=1}^∞ a_n converges if and only if ∫_1^∞ f(x)dx converges.

That is, to check convergence of the sum, we simply need to check convergence of the
integral. Morally: “sums are integrals and vice versa.”
Finally, there are two more tests for convergence which are very useful.
Theorem 25 (Ratio Test). Suppose that (a_n) is a sequence such that a_n ≠ 0 for all
sufficiently large n and assume that

    lim_{n→∞} |a_{n+1}/a_n| = L ∈ [0, ∞).   (i.e., assume the limit exists)

Then
(a) if L < 1, then ∑_{n=1}^∞ a_n converges,
(b) if L > 1, then ∑_{n=1}^∞ a_n diverges.
Theorem 26 (Root Test). Suppose that (a_n) is a sequence and assume that

    lim_{n→∞} |a_n|^{1/n} = L ∈ [0, ∞).   (i.e., assume the limit exists)

Then
(a) if L < 1, then ∑_{n=1}^∞ a_n converges,
(b) if L > 1, then ∑_{n=1}^∞ a_n diverges.
Note that neither the ratio test nor the root test can address the case that the given limit
L = 1; in this case, these tests are inconclusive and another test must be used. Also, it is not
strictly necessary for the limit to exist. We could replace L with the limit superior (which
always exists as a number in [0, ∞]) in either case, and the theorems continue to hold.
We introduce one more convergence test which is not always covered in the Calculus II
curriculum but can be very useful.

Theorem (Cauchy Condensation Test). Suppose (a_n) is a non-negative, decreasing
sequence. Then ∑_{n=1}^∞ a_n converges if and only if ∑_{n=1}^∞ 2^n a_{2^n} converges.

This test can be useful in series involving logarithms since log(2^n) = n log(2).
Solution. There are several ways to check convergence or divergence. We suggest one way
for each sum. Series (a) diverges by limit comparison with the harmonic series. Series (b)
converges by the alternating series test. Series (c) converges by the ratio or root test. Series
(d) converges by the root test.
Note that series (c) above is a special series. The underlying sequence is a geometric
progression; i.e., a sequence of the form cr^0, cr^1, cr^2, cr^3, . . . (above we have c = 1,
r = 1/π). Such series are so important, they are not only given their own name, they are
evaluated explicitly: for |r| < 1,

    ∑_{n=0}^∞ cr^n = c/(1 − r).

This is proven by writing the partial sums as s_N = c(1 − r^{N+1})/(1 − r) and then taking
the limit (of course, r = 1 needs to be dealt with separately, but this is no problem).
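The closed form for the partial sums and the limiting value c/(1 − r) are easy to corroborate numerically (a small sketch with arbitrarily chosen c and r):

```python
# Geometric series: s_N = c(1 - r^(N+1))/(1 - r)  ->  c/(1 - r) as N -> infinity.

def geometric_partial_sum(c, r, big_n):
    """Directly sum c*r^0 + c*r^1 + ... + c*r^N."""
    return sum(c * r ** n for n in range(big_n + 1))

c, r = 2.0, 0.5
for big_n in (5, 10, 20):
    closed = c * (1 - r ** (big_n + 1)) / (1 - r)
    assert abs(geometric_partial_sum(c, r, big_n) - closed) < 1e-12

# The infinite sum: c/(1 - r) = 4 when c = 2, r = 1/2.
assert abs(geometric_partial_sum(c, r, 60) - c / (1 - r)) < 1e-12
```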
This last proposition is a nice segue into the final topic of Calculus II: series of functions
and specifically, the Taylor series. Note that in the above proposition, the function
f : (−1, 1) → R defined by

    f(x) = 1/(1 − x),   x ∈ (−1, 1)

could just as well be defined by

    f(x) = ∑_{n=0}^∞ x^n,   x ∈ (−1, 1).
A natural question is then: what other functions have such representations as “infinite poly-
nomials”? To understand this question we may think of approximating a function locally by
polynomials and letting the degree of the approximating polynomial tend to ∞. Recall
that differentiable functions look “locally linear”; thus it is natural to think that smooth
functions (i.e., functions that are infinitely many times differentiable) may look locally like
a polynomial of any degree that we choose. While this is not true of all smooth functions, it
is true for most of the functions that one typically encounters in a Calculus course. In what
follows, unless explicitly stated, we assume all functions that we introduce are smooth.
For demonstrative purposes, recall that the tangent line to a function f(x) at a point a
is given by

    p_1(x) = f(a) + f′(a)(x − a).

You may notice that p_1 is the unique first degree polynomial which matches the value of f
at a and the value of f′ at a. If we want to also match the second derivative of f at a, we
will need to up the order of the polynomial, but some work shows that

    p_2(x) = f(a) + f′(a)(x − a) + (f″(a)/2)(x − a)²

will work. Likewise, we can match the first N derivatives of f at a using the polynomial

    p_N(x) = ∑_{n=0}^N (f^(n)(a)/n!)(x − a)^n.
These polynomials pN are sometimes called the N th order Taylor approximations to f (at
the point x = a). To see that these do actually approximate f near x = a, consider the
following theorem.
This shows that locally, p_N approximates f to at least Nth order. There are several ways
to explicitly identify what form the remainder R_N(x) = f(x) − p_N(x) actually takes, but
they are not of great importance for the math subject GRE; error bounds are more practical.
If |f^(N+1)| ≤ M between a and x, then

    |R_N(x)| ≤ M|x − a|^(N+1)/(N + 1)!.

Intuitively this tells you that if f is smooth enough, then |f − p_N| = O(|x − a|^(N+1))
as x → a; that is, the error in the approximation should be on the order of |x − a|^(N+1)
(R_N ∼ |x − a|^(N+1) as x → a). This error bound can be used to calculate certain functions
within a given tolerance.
For example, to approximate e = e^1, note that

    |e − p_N(1)| = |R_N(1)| ≤ M|1 − 0|^(N+1)/(N + 1)!

where M is the maximum of the (N + 1)st order derivative of e^x on [0, 1]. Since
(d^n/dx^n)(e^x) = e^x for any n, we easily find that M = e. To get the first two digits
correct, we need |R_N(1)| ≤ 0.01. Thus we simply choose N to accomplish this. Using the
above bound, N = 5 gives |R_5(1)| ≤ e/6! ≈ 0.0038, which is good enough (while N = 4
only gives the bound e/5! ≈ 0.023, which is not).
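This bound is easy to confirm directly (a sketch; p_N here denotes the Taylor polynomial of e^x at 0, evaluated at x = 1):

```python
import math

# p_N(1) = sum_{n=0}^{N} 1/n!  approximates e with error at most e/(N+1)!.

def taylor_e(big_n):
    """Nth order Taylor approximation of e^x at x = 0, evaluated at x = 1."""
    return sum(1 / math.factorial(n) for n in range(big_n + 1))

for big_n in range(1, 10):
    bound = math.e / math.factorial(big_n + 1)
    assert abs(math.e - taylor_e(big_n)) <= bound  # Lagrange error bound holds

assert math.e / math.factorial(6) < 0.01   # N = 5 guarantees two digits
assert math.e / math.factorial(5) > 0.01   # the bound for N = 4 does not
```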
These propositions don’t quite answer our question. We would like to be able to
write f as an “infinite polynomial,” but these only give finite polynomial approximations.
We pass to the “infinite polynomial” by taking N → ∞. However, some care is required.
Definition (Analytic Function). A smooth function f is said to be analytic at a if there
is an open interval I containing a and coefficients (c_n) such that

    f(x) = ∑_{n=0}^∞ c_n(x − a)^n,   for all x ∈ I.
This gives us the definition that we want but does not tell us what functions are analytic.
For that, we use the above error bound.
Proposition 34. Suppose that f : R → R is smooth and that for some open interval
I ⊆ R, the sequence (M_N) given by

    M_N = max_{x∈I} |f^(N)(x)|

is bounded. Then f is analytic at each point of I; in fact, f equals its Taylor series centered
at any a ∈ I. [Under these hypotheses, we can actually make the stronger claim that
p_N → f uniformly in I.]
This is easily proven using the error bound in Proposition 35. Note, this is not strictly
a necessary condition. For example, ex is not bounded on R (nor are its derivatives), but it
can be verified that ex can be represented by its Taylor series on all of R.
The above proposition shows that we cannot enlarge (−1, 1) at all; i.e., this is the maximal
interval on which we have this equality. This is because the radius of convergence here is

    r = 1/lim sup_{n→∞} |c_n|^{1/n} = 1.

Note, this radius is found using the root test; due to some compatibility between
the root and ratio tests, in most exercises it can be found just as easily using the ratio test,
which is often simpler to apply. Also, in most practical examples, the limit lim_{n→∞} |c_n|^{1/n}
will actually exist, and so we can do away with the lim sup. Finally, note that the proposition
says that f is analytic on (a − r, a + r); it does not say anything about the boundary points.
At these points, f may still be equal to its Taylor series or the series may fail to converge.
They must be checked separately. The set where a series converges is called the interval of
convergence. By the above proposition, I = (a − r, a + r) will be the interior of the interval
of convergence.
Example 37. Find the interval of convergence of the series ∑_{n=1}^∞ x^n/n.
This shows that the series converges on (−1, 1). We can find the same result using the ratio
test. By the ratio test, the series will converge when

    lim_{n→∞} (|x|^(n+1)/(n + 1)) · (n/|x|^n) = |x| lim_{n→∞} n/(n + 1) = |x| < 1,

which again shows convergence on (−1, 1). The endpoints need to be checked separately
(this is where the ratio test and root test will fail). At x = 1, we have the harmonic series
which is divergent. At x = −1, we have the alternating harmonic series which converges by
the alternating series test. Thus the interval of convergence is [−1, 1).
Note, this same convergence test can be performed for series that are not necessarily
power series. For example, we could use the ratio test to find where ∑_{n=0}^∞ x^n/(1 + x^n)
converges.
There are several Taylor series which are ubiquitous enough to merit memorization (alterna-
tively, these can be derived quickly by taking the derivatives and using the formula).

(a) e^x = ∑_{n=0}^∞ x^n/n! for all x ∈ R,

(b) cos(x) = ∑_{n=0}^∞ (−1)^n x^(2n)/(2n)! for all x ∈ R,

(c) sin(x) = ∑_{n=0}^∞ (−1)^n x^(2n+1)/(2n + 1)! for all x ∈ R,

(d) 1/(1 − x) = ∑_{n=0}^∞ x^n for x ∈ (−1, 1).
Note that all these series are centered at x = 0; it is fairly uncommon at this level (or on
the math subject GRE) to come across a Taylor series that is centered somewhere besides
x = 0 though it does occasionally happen. Note that the series for cos(x) and sin(x) can be
derived from the series for ex using Euler’s identity eix = cos(x) + i sin(x) and collecting real
and imaginary parts.
From these, using the next two propositions and some clever functional composition, we
can derive many other series for common functions without having to actually work out the
derivatives.
That is, to find the derivative of a convergent power series, we can differentiate term-by-term.
Note, this proposition states that f′ will be analytic with the same radius of convergence as f.
That is, to find the integral of a series, we can integrate term-by-term. Note that in this
case, the radius of convergence also does not change.
Example 41. Find the Taylor series for cosh(x) and log(1 + x²) centered at x = 0. What
are the intervals of convergence for these series?
Solution. Recall that cosh(x) = (1/2)(e^x + e^(−x)). To find the series for cosh(x), we can
simply add the series for e^x and e^(−x). Since

    e^x = ∑_{n=0}^∞ x^n/n!,   for all x ∈ R,

we see that

    e^(−x) = ∑_{n=0}^∞ (−1)^n x^n/n!,   for all x ∈ R.

When adding these together, the odd terms will cancel and the even terms will be doubled.
Thus we have

    cosh(x) = ∑_{n=0}^∞ x^(2n)/(2n)!.

Since this is just the sum of two series which converge everywhere, this series should also
converge everywhere; this can be checked using the ratio test. Thus the interval of convergence
is R.
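A quick numerical sanity check of this series against the built-in cosh (a sketch; truncating at 30 terms is an arbitrary choice):

```python
import math

def cosh_series(x, terms=30):
    """Partial sum of cosh(x) = sum_{n>=0} x^(2n)/(2n)!."""
    return sum(x ** (2 * n) / math.factorial(2 * n) for n in range(terms))

# The truncated series matches math.cosh at several points, positive and negative.
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(cosh_series(x) - math.cosh(x)) < 1e-12
```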
For log(1 + x²), we note that

    log(1 + x²) = ∫_0^x 2t/(1 + t²) dt.

Now we can expand the integrand in a Taylor series. Using the geometric series, we see that

    1/(1 + t²) = ∑_{n=0}^∞ (−1)^n t^(2n)   for t ∈ (−1, 1).

Multiplying by 2t and integrating term-by-term gives

    log(1 + x²) = ∑_{n=0}^∞ (−1)^n x^(2n+2)/(n + 1)   for x ∈ (−1, 1).

Note that at both x = 1 and x = −1, this series will converge (by the alternating series
test), thus the interval of convergence is [−1, 1].
One large application of Taylor series is evaluation of infinite sums. We can see, for
example, that plugging certain values of x into the series listed in Proposition 42 will give us
the value of certain infinite sums. For example,

    ∑_{n=0}^∞ 1/3^n = 1/(1 − 1/3) = 3/2   or   ∑_{n=0}^∞ (−1)^n/n! = e^(−1).
Taylor series can help with much more involved sums, though this sometimes requires skillful
manipulations and some fortuitous recognition of series. Consider, for instance,

    (a) ∑_{n=1}^∞ n/4^n,   (b) ∑_{k=0}^∞ (−1)^k (2k)π^(2k)/(2k + 2)!,   (c) ∑_{n=0}^∞ (−1)^n/(n + 1).
Notice that we have a geometric term 1/4^n in the summand (though it is accompanied by
another term); this should hint that (a) comes from the geometric series somehow. Indeed,
to make that n appear in the numerator, we can differentiate the geometric series:

    1/(1 − x) = ∑_{n=0}^∞ x^n   =⇒   1/(1 − x)² = ∑_{n=1}^∞ nx^(n−1).
Plugging in x = 1/4 gives

    16/9 = ∑_{n=1}^∞ n/4^(n−1)   =⇒   ∑_{n=1}^∞ n/4^n = (1/4)(16/9) = 4/9.
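The value 4/9 is easy to corroborate with a truncated sum (a minimal check):

```python
# sum_{n=1}^infinity n/4^n = 4/9, checked by truncating at 60 terms
# (the tail beyond n = 60 is far below machine precision).

partial = sum(n / 4 ** n for n in range(1, 61))
assert abs(partial - 4 / 9) < 1e-15
```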
For (b), we notice the factorial in the denominator: this indicates that the sum can likely
be evaluated using the Taylor series for sin(x), cos(x) or e^x. Call the sum S. Writing
2k = (2k + 2) − 2, we split

    S = ∑_{k=0}^∞ (−1)^k (2k + 2)π^(2k)/(2k + 2)! − 2∑_{k=0}^∞ (−1)^k π^(2k)/(2k + 2)! = S₁ + S₂.

We see

    S₁ = ∑_{k=0}^∞ (−1)^k π^(2k)/(2k + 1)! = (1/π)∑_{k=0}^∞ (−1)^k π^(2k+1)/(2k + 1)! = sin(π)/π = 0.

Now

    S₂ = −2∑_{k=0}^∞ (−1)^k π^(2k)/(2k + 2)! = (2/π²)∑_{k=0}^∞ (−1)^(k+1) π^(2k+2)/(2k + 2)!.

This looks exactly like the Taylor series for cos(x) with x = π plugged in, except that k is
replaced in the summand by k + 1. This has the effect of omitting the first term in the Taylor
series, which is cos(0) = 1. Thus

    S₂ = (2/π²)(−1 + ∑_{k=0}^∞ (−1)^k π^(2k)/(2k)!) = (2/π²)(−1 + cos(π)) = −4/π².
Thus

    S = S₁ + S₂ = −4/π².
For (c), we can imagine this may have arisen from plugging x = −1 into a Taylor series.
Indeed, for x ∈ (−1, 1),

    ∑_{n=0}^∞ x^(n+1)/(n + 1) = ∑_{n=0}^∞ ∫_0^x t^n dt = ∫_0^x (∑_{n=0}^∞ t^n) dt = ∫_0^x dt/(1 − t) = −log(1 − x).

Note, at the end here, we sort of cheated: we substituted in x = −1 when the original
manipulation was only valid for x ∈ (−1, 1). This is justified by Abel’s Theorem.
Theorem (Abel’s Theorem). Suppose f(x) = ∑_{n=0}^∞ c_n(x − a)^n for x ∈ (a − r, a + r).
If ∑_{n=0}^∞ c_n r^n converges (resp. ∑_{n=0}^∞ c_n(−r)^n converges), then f extends
continuously from the left to a + r (resp. from the right to a − r) with
f(a + r) = ∑_{n=0}^∞ c_n r^n (resp. f(a − r) = ∑_{n=0}^∞ c_n(−r)^n).
This theorem validates the computation ∑_{n=0}^∞ (−1)^n/(n + 1) = log(2), since
−log(1 − x) remains continuous as x → −1.
Another use for Taylor series is in evaluating limits, since Taylor series capture the local
behavior of a function. For example, we saw previously using l’Hôpital’s rule that

    lim_{x→0} sin(x)/x = 1.
This can easily be computed using Taylor approximations as well. Writing out the first few
terms of the Taylor series, we see that near x = 0,

    sin(x) ≈ x − x³/6.

Thus

    sin(x)/x ≈ 1 − x²/6 → 1   as x → 0.
This can greatly simplify limits which appear tricky at first.

Example 44. Compute lim_{x→0} sin(x − sin(x))/x³.
Solution. This limit can be computed using l’Hôpital’s rule and manipulating the result,
but with knowledge of Taylor series it is very easy. From the above, we see that near x = 0,
we have

    x − sin(x) = x³/6 + O(x⁵),

and hence sin(x − sin(x)) = x³/6 + O(x⁵) as well. In the limit, the higher order terms
disappear, so we see

    lim_{x→0} sin(x − sin(x))/x³ = lim_{x→0} (x³/6)/x³ = 1/6.
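Evaluating the quotient at small x corroborates the answer 1/6 (a quick sketch):

```python
import math

def quotient(x):
    """sin(x - sin(x)) / x^3, which tends to 1/6 as x -> 0."""
    return math.sin(x - math.sin(x)) / x ** 3

# The deviation from 1/6 shrinks like x^2 (next term of the expansion).
for x in (1e-1, 1e-2, 1e-3):
    assert abs(quotient(x) - 1 / 6) < x ** 2
```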
As a final note, in light of our viewing Taylor series as “infinite polynomials,” there should
be some compatibility with finite polynomials. Indeed, there is. Note that there is no issue
checking convergence since p^(n)(x) = 0 for all x when n is larger than the degree of the
polynomial; thus the sum is actually finite. This proposition tells us that any polynomial is
equal to its Taylor series centered at any point. This fact is occasionally useful on the math
subject GRE; for example, one problem on a previous math GRE asks to identify the values
of a₀, a₁, a₂, a₃ so that

    x³ − x + 1 = a₃(x − 2)³ + a₂(x − 2)² + a₁(x − 2) + a₀,

which is very simple if you view the right hand side as the Taylor series for the polynomial
p(x) = x³ − x + 1 centered at x = 2.
Christian Parkinson GRE Prep: Calculus III Notes 1
Given vectors v₁ = (x₁, y₁, z₁) and v₂ = (x₂, y₂, z₂) in R³, addition and scalar multiplication
are defined componentwise:

    v₁ + v₂ = (x₁ + x₂, y₁ + y₂, z₁ + z₂),   αv₁ = (αx₁, αy₁, αz₁)

for any α ∈ R. The norm of a vector v = (x, y, z) is ‖v‖ = √(x² + y² + z²). Intuitively,
this gives the length from the origin (0, 0, 0) to the point (x, y, z).
We use vectors to define planes; these are the higher dimensional analogs to lines in R2 .
whenever the limit exists. Alternately, these are often written as f_x and f_y respectively and,
when it is clear that x is taken to be the first argument of f, we may call f_x the partial
derivative of f with respect to x (and similarly for y).

A function f : R² → R defines a surface (x, y, f(x, y)). Intuitively, these partial deriva-
tives measure the slope of that surface in the x and y directions respectively. From here, all
derivative rules follow from the same rules in the single variable case. One derivative rule
that is worth restating is a multi-dimensional version of the chain rule.
Just as we looked for tangent lines in Calculus I, we can look for tangent planes in multiple
dimensions.
Example 14. Find the tangent plane to the surface x² + log(y) + (z + 1)e^z = 5 at the point
(2, 1, 0).
Solution. Here we cannot solve for z explicitly as a function of x and y, but as above, the
tangent plane should be given by

    z = z₀ + (∂z/∂x)(x₀, y₀, z₀)(x − x₀) + (∂z/∂y)(x₀, y₀, z₀)(y − y₀).

Differentiating the equation implicitly with respect to x, we see

    2x + e^z(∂z/∂x) + (z + 1)e^z(∂z/∂x) = 0.

Plugging in (x, y, z) = (2, 1, 0), we see

    4 + (∂z/∂x) + (∂z/∂x) = 0   =⇒   ∂z/∂x = −2 at (2, 1, 0).

Likewise, differentiating with respect to y gives

    1/y + e^z(∂z/∂y) + (z + 1)e^z(∂z/∂y) = 0,

whence plugging in our point gives

    1 + 2(∂z/∂y) = 0   =⇒   ∂z/∂y = −1/2 at (2, 1, 0).

Thus the tangent plane at that point is given by

    z = −2(x − 2) − (1/2)(y − 1).
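One can sanity-check this numerically: solve the surface equation for z at a point near (2, 1) (by bisection, since (z + 1)e^z is increasing near z = 0) and compare with the plane's prediction. A sketch, with helper names of my own choosing:

```python
import math

def surface(x, y, z):
    """F(x, y, z) = x^2 + log(y) + (z + 1)e^z - 5; the surface is F = 0."""
    return x ** 2 + math.log(y) + (z + 1) * math.exp(z) - 5

def solve_z(x, y, lo=-1.0, hi=1.0):
    """Bisect for the z with surface(x, y, z) = 0 (F is increasing in z here)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if surface(x, y, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x, y = 2.001, 1.002
z_true = solve_z(x, y)
z_plane = -2 * (x - 2) - 0.5 * (y - 1)  # tangent plane prediction
assert abs(z_true - z_plane) < 1e-4     # agreement to second order
```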
While f_x and f_y give information about how f changes in the x or y direction irrespective
of the other, it can be important to determine how f changes in other directions as well. To
this end, we define the gradient of a function f.
The gradient is the vector of partial derivatives, ∇f = (f_x, f_y), and the directional
derivative of f in the direction u is

    D_u f(x, y) = ∇f(x, y) · u.

Here u is assumed to be a direction vector; i.e. ‖u‖ = 1. If the norm of u is not 1, then it
must be normalized.
Note: Proposition 18 (combined with Proposition 7) gives another way of computing the
tangent plane to the surface g(x, y, z) = C.
If ∇f(x₀, y₀) = 0, let D = f_xx f_yy − f_xy² denote the discriminant. Then
(a) if D(x₀, y₀) > 0 and f_xx(x₀, y₀) > 0, then f has a local minimum at (x₀, y₀),
(b) if D(x₀, y₀) > 0 and f_xx(x₀, y₀) < 0, then f has a local maximum at (x₀, y₀).
For constrained optimization (extremizing f subject to g = C), the method of Lagrange
multipliers says that at an extremum,

    ∇f(x₀, y₀) = λ∇g(x₀, y₀)

for some real number λ. Note that this also holds in higher dimensions.
Geometrically, evaluating f along the curve creates a new curve which lies on the surface
(x, y, f (x, y)); this integral gives the area underneath that curve. This definition works just
as well in higher dimensions.
and with the formal notation x′(t)dt = dx and y′(t)dt = dy, we write this as

    ∫_C F · dr = ∫_C M(x, y)dx + N(x, y)dy.
That is, a vector field is said to be conservative if the line integral between any two points
is independent of the path traversed between them. Equivalently, F is conservative if

    ∫_C F · dr = 0

for any closed curve C (i.e., a curve which begins and ends at the same point). When the
curve is closed, we typically draw the integral sign with a circle:

    ∮_C F · dr = 0.
An easy consequence of Theorem 23 is that any vector field F which has a potential
function is conservative. A natural question is to ask whether we can characterize vector
fields with potential functions.
For a smooth vector field F = (M, N) defined on a simply connected domain, the following
are equivalent:
(a) F has a potential function; i.e., F = ∇f for some f,
(b) M_y = N_x,
(c) ∮_C F · dr = 0 for any closed curve C (contained in the domain of F),
(d) ∫_{C₁} F · dr = ∫_{C₂} F · dr for any two curves C₁, C₂ beginning and ending at the same point.
We will never use this definition (or any similar definition). Rather than build area inte-
grals from Riemann sums, we simply note that for nice functions, area integrals can be seen
as iterated one-dimensional integrals (Fubini’s theorem) when, in each iterated integral, we
treat the other variable as a constant. We demonstrate this with examples.
Example 28. Let f(x, y) = xy. Calculate the area integral of f over the regions
(a) D₁ = {(x, y) ∈ R² : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1},
(b) D₂ = {(x, y) ∈ R² : 0 ≤ x ≤ 1, 0 ≤ y ≤ x},
(c) D₃ = {(x, y) ∈ R² : 0 ≤ x ≤ 1, 0 ≤ y ≤ x²}.
Solution. (a) Here

    ∬_{D₁} f(x, y)dA = ∫_0^1 ∫_0^1 xy dxdy = ∫_0^1 y[x²/2]_{x=0}^{x=1} dy
                     = ∫_0^1 (y/2)dy = [y²/4]_{y=0}^{y=1} = 1/4.
(b) Now to cover the region, we need to integrate from y = 0 to y = x, then from x = 0 to
x = 1. Thus

    ∬_{D₂} f(x, y)dA = ∫_0^1 ∫_0^x xy dydx = ∫_0^1 x[y²/2]_{y=0}^{y=x} dx
                     = ∫_0^1 (x³/2)dx = [x⁴/8]_{x=0}^{x=1} = 1/8.

Here the order of integration mattered. We can change the order, but then we must change
the bounds. Indeed, {0 ≤ x ≤ 1, 0 ≤ y ≤ x} is the same as {0 ≤ y ≤ 1, y ≤ x ≤ 1}, whence

    ∬_{D₂} f(x, y)dA = ∫_0^1 ∫_y^1 xy dxdy = ∫_0^1 y(1 − y²)/2 dy = 1/4 − 1/8 = 1/8.
(c) Again, if we want to change the order of integration, we need to be careful to change the
bounds as well. We see {0 ≤ x ≤ 1, 0 ≤ y ≤ x²} is the same as {0 ≤ y ≤ 1, √y ≤ x ≤ 1},
whence

    ∬_{D₃} f(x, y)dA = ∫_0^1 ∫_{√y}^1 xy dxdy = ∫_0^1 y(1 − y)/2 dy = 1/4 − 1/6 = 1/12.
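All three answers can be confirmed by doing the inner integral in closed form (∫_0^{g(x)} xy dy = x g(x)²/2 for upper boundary y = g(x)) and the outer integral numerically with a midpoint rule (a quick sketch; the helper name is my own):

```python
# Verify Example 28: the integral of xy over each region is 1/4, 1/8, 1/12.

def outer_integral(g, n=100_000):
    """Midpoint rule for the x-integral of x*g(x)^2/2 over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x * g(x) ** 2 / 2 * h
    return total

assert abs(outer_integral(lambda x: 1.0) - 1 / 4) < 1e-8      # D1: y up to 1
assert abs(outer_integral(lambda x: x) - 1 / 8) < 1e-8        # D2: y up to x
assert abs(outer_integral(lambda x: x ** 2) - 1 / 12) < 1e-8  # D3: y up to x^2
```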
As the above example demonstrates, we can perform the iterated integration in either
order. In some examples, it will be very difficult (or even impossible) to perform the inte-
gration in one order, but very easy in the other order.

All the same integration techniques from Calculus I can be used when performing the
iterated integrals. However, there are also generalizations of these techniques (substitution,
in particular) that apply to area integrals directly.
Suppose D̃ and D are regions in R² related by a smooth, invertible change of variables

    (u, v) ↦ (x(u, v), y(u, v)),   (u, v) ∈ D̃.

Then

    ∬_D f(x, y)dxdy = ∬_{D̃} f(x(u, v), y(u, v)) |J(u, v)| dudv,

where J(u, v) = x_u y_v − x_v y_u is the Jacobian determinant of the transformation.
One particular transformation is used often enough that it deserves to be singled out: polar
coordinates

    x = r cos θ,   y = r sin θ,

for which |J(r, θ)| = r, so that dxdy = r drdθ.
Solution. (d) For the first integral, it is not too difficult to write the region in terms of x
and y (the boundaries are given by x + y = ±1 and x − y = ±1). However, actually
performing the integral with x and y is somewhat tedious. It is much easier if we use the
substitution u = x + y and v = x − y, since these variables will satisfy −1 ≤ u ≤ 1,
−1 ≤ v ≤ 1. This transformation is given by

    x = (u + v)/2,   y = (u − v)/2,
which has Jacobian determinant J(u, v) = (1/2)(−1/2) − (1/2)(1/2) = −1/2, so |J| = 1/2.
Thus the integral is given by

    ∬_D x² dxdy = (1/2)∫_{−1}^1 ∫_{−1}^1 (u² + 2uv + v²)/4 dudv
                = (1/8)∫_{−1}^1 (2/3 + 2v²)dv = (1/8)(4/3 + 4/3) = 1/3.
(e) Here we use x = r cos(θ), y = r sin(θ) for 1 ≤ r ≤ 2 and π/2 ≤ θ ≤ π. This gives

    ∬_D xy/(x² + y²) dxdy = ∫_{π/2}^π ∫_1^2 (r² cos(θ) sin(θ)/r²) r drdθ
                          = ∫_{π/2}^π ∫_1^2 r cos(θ) sin(θ) drdθ
                          = (3/2)∫_{π/2}^π cos(θ) sin(θ)dθ
                          = (3/4) sin²(θ)|_{θ=π/2}^{θ=π} = −3/4.
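Both change-of-variables answers can be corroborated with a 2D midpoint rule in the new coordinates (a sketch; the helper name is my own):

```python
import math

def midpoint_2d(f, a, b, c, d, n=400):
    """Midpoint rule for the integral of f over the rectangle [a, b] x [c, d]."""
    hu, hv = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * hu
        for j in range(n):
            v = c + (j + 0.5) * hv
            total += f(u, v) * hu * hv
    return total

# (d): integrand x^2 = ((u+v)/2)^2 with |J| = 1/2; answer 1/3.
part_d = midpoint_2d(lambda u, v: ((u + v) / 2) ** 2 * 0.5, -1, 1, -1, 1)
assert abs(part_d - 1 / 3) < 1e-4

# (e): the integrand reduces to r*cos(t)*sin(t) in polar coordinates; answer -3/4.
part_e = midpoint_2d(lambda r, t: r * math.cos(t) * math.sin(t),
                     1, 2, math.pi / 2, math.pi)
assert abs(part_e + 3 / 4) < 1e-4
```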
One of the main applications of the integral is finding area or volume. We note that in single
variable calculus, the integral of 1 over an interval will return the length of that interval.
This remains true in higher dimensions: the integral of 1 over a domain in R² will give the
area of that domain, and the integral of 1 over a domain in R³ will give the volume of that
domain.
Example 32. Find the volume of the region in the first octant bounded above by the surface
z = 1 − x − y.
Theorem 33 (Green’s Theorem). Suppose that F : R² → R², F(x, y) = (M(x, y), N(x, y))
is a smooth vector field. Let C be a piecewise smooth, simple (i.e., non-self-intersecting),
closed curve in R² which encloses a region D. Then

    ∮_C M dx + N dy = ∬_D (∂N/∂x − ∂M/∂y) dA.
We conclude with Stokes’ Theorem and the Divergence Theorem of Gauss. It bears repeating
that for the sake of brevity, I have skipped some material that would lead up to these
theorems. These do not often appear on the math subject GRE.
As a final note, we see that Green’s Theorem, Stokes’ Theorem and the Divergence
Theorem all have a similar flavor (they all relate an integral over an n-dimensional object
to an integral over an (n − 1)-dimensional object). This is no coincidence. All three of
these are special cases of a theorem from differential geometry (which is typically called
Stokes’ Theorem) which says roughly that the integral of a differential (n − 1)-form ω over
the boundary of a smooth, orientable n-manifold M is equal to the integral of the exterior
derivative dω over the whole manifold:

    ∫_{∂M} ω = ∫_M dω.
This version of Stokes’ Theorem also subsumes the fundamental theorem of calculus.
Christian Parkinson GRE Prep: Diff. Eq. & Lin. Alg. Notes 1
Differential Equations

Definition 1 (Differential Equation). An ordinary differential equation is an equation
of the form

    F(x, y, y′, y″, . . . , y^(n)) = 0

for some function F : R^(n+2) → R, where y^(k) denotes the kth derivative of y with respect
to the independent variable x. The order of the differential equation is equal to the highest
order derivative of y which appears; the above equation is nth order (assuming F does
actually depend on its last variable). Note, we often replace y with u, which can be thought
to stand for “unknown.” The goal, then, is to find the unknown function which satisfies the
equation. On the math subject GRE, questions are mostly limited to equations where the
highest order derivative of the unknown function can be isolated; that is, equations of the
form

    y^(n) = f(x, y, y′, . . . , y^(n−1)).
Many differential equations can be solved by simply recalling facts from calculus.

A natural question would be to ask if we have found all solutions to a given equation.
Of course, in the above example, we did not. For (a), any constant times e^x would work
just as well, so a full family of solutions would be u(x) = Ce^x for some arbitrary constant
C ∈ R. For (b), besides simply multiplying by a constant, we see y(x) = cos(2x) works as
well. Indeed, combinations of the two also work, so the full family of solutions is given by
y(x) = A sin(2x) + B cos(2x) for some arbitrary constants A, B ∈ R. Both these equations
have special structure which is defined here.
A differential equation

    F(x, y, y′, . . . , y^(n)) = 0

is said to be linear if the function F is linear in the arguments (y, y′, . . . , y^(n)).
In the terminology of linear algebra, a differential equation is linear if its solution set forms
a (possibly affine) subspace of the vector space of continuous functions on R. Intuitively, a
differential equation is linear if the unknown function and its derivatives do not appear in
any non-linear way. In symbols, a linear nth order differential equation has the form

    a_n(x)y^(n) + a_{n−1}(x)y^(n−1) + · · · + a_1(x)y′ + a_0(x)y = f(x).
Typically, we also require that an (x) 6= 0 for any x, so that we may divide by an (x) and
eliminate the coefficient of y (n) . If an (x) = 0 at some x, this x is called a singular point
of the di↵erential equation. Solving di↵erential equations with singular points can be quite
difficult and is beyond the scope of these notes.
Linear equations are always solvable analytically. We build toward the general solution
of a first-order linear equation in a few steps.
The solution to a separable equation

    y′ = f(x)/g(y)

is given by the pair of integrals

    ∫ g(y)dy = ∫ f(x)dx.
Proposition. The solution to the linear first-order equation y′ + p(x)y = q(x) is

    y(x) = e^(−∫p(x)dx) (∫ e^(∫p(x)dx) q(x)dx + C).

This formula deserves some motivation. In line with separable equations (or even simpler
equations of the form y′(x) = f(x)), we see that solving a differential equation is akin to
“integrating the equation.” With this in mind, it would be ideal if the left hand side of the
above equation was a perfect derivative so we could just integrate it away. In general, there
will not be Y(x) such that

    Y′(x) = y′(x) + p(x)y;

i.e., the left hand side will not be a perfect derivative. However, we can fix that by cleverly
multiplying by some other function µ(x), which we call an integrating factor. Indeed, our
new equation will read

    µ(x)y′(x) + µ(x)p(x)y(x) = µ(x)q(x).

Now, if µ′ = p(x)µ, then the left hand side will read

    µ(x)y′(x) + µ′(x)y(x),

which is the derivative of µ(x)y(x). Thus we simply choose µ satisfying µ′ = p(x)µ. However,
from the above, we know that µ(x) = e^(∫p(x)dx) satisfies that equation. Thus we see

    e^(∫p dx) y′ + p(x)e^(∫p dx) y = e^(∫p dx) q(x)   =⇒   (d/dx)[e^(∫p dx) y(x)] = e^(∫p dx) q(x).
Solution. According to the formula, we should multiply the equation by µ(x) = e^(∫(1/x)dx) =
e^(ln(x)) = x. Doing this gives

    xy′(x) + y(x) = x cos(x²)   =⇒   (d/dx)(xy(x)) = x cos(x²).

Integrating yields

    xy(x) = sin(x²)/2 + C   =⇒   y(x) = sin(x²)/(2x) + C/x

where C is an arbitrary constant.
It may irk you that all of these solutions have this lingering arbitrary constant C. This
has something to do with the fact that differentiation eliminates constants, so when we solve
the equation (think of “integrating the equation”) we need to reintroduce constants that
may have been eliminated. Often times, along with a differential equation, a given value
y(x₀) = y₀ is specified. This value can be used to solve for the constant C. For example,
along with the above equation, the question may have specified that y should satisfy
y(√π) = 1. We found our general solution, so substituting x = √π gives

    y(√π) = sin(π)/(2√π) + C/√π.

We are told this value should be 1, so this gives

    C/√π = 1   =⇒   C = √π.

Thus the function y which solves the equation and satisfies y(√π) = 1 is given by

    y(x) = sin(x²)/(2x) + √π/x.
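A numerical check that this function really satisfies both the equation xy′ + y = x cos(x²) and the condition y(√π) = 1 (a sketch; the derivative is approximated by a central difference):

```python
import math

def y(x):
    """Candidate solution y(x) = sin(x^2)/(2x) + sqrt(pi)/x."""
    return math.sin(x ** 2) / (2 * x) + math.sqrt(math.pi) / x

def dy(x, h=1e-6):
    """Central-difference approximation to y'(x)."""
    return (y(x + h) - y(x - h)) / (2 * h)

# Residual of x*y' + y = x*cos(x^2) at several points.
for x in (0.5, 1.0, 2.0, 3.0):
    residual = x * dy(x) + y(x) - x * math.cos(x ** 2)
    assert abs(residual) < 1e-5

assert abs(y(math.sqrt(math.pi)) - 1) < 1e-12
```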
Such questions which give an equation and specify a value for the unknown function at a
point are called Initial Value Problems or Boundary Value Problems depending on the con-
text (and the way in which the values of the unknown function are specified). We won’t
dwell on the distinction here.
This gives a solution to all linear first order equations. With separation of variables we
can also solve some non-linear equations. Most non-linear equations will not be solvable an-
alytically. However, without solving, we can still determine some of the behavior of solutions
to certain equations. One such class is autonomous equations: equations of the form
y′ = f(y). Values y₀ with f(y₀) = 0 are called equilibrium points, and the constant functions
y(x) = y₀ are equilibrium solutions.

It is not difficult to see that if we specify y(x₀) = y₀ for some x₀ where y₀ is a root of f, then
y(x) = y₀ for all x > x₀; that is, if y hits an equilibrium point, it will stay on that equilibrium
point forever. By a standard uniqueness argument (so long as f is differentiable), this shows
that no solution can cross an equilibrium solution. That is, if y₀ is an equilibrium point
and y(x₀) < y₀, then y(x) < y₀ for all x > x₀. A general rule of thumb for autonomous
equations is that all solutions either (1) blow up to positive or negative infinity as x → ±∞
or (2) tend toward an equilibrium solution as x → ±∞. The function f and a specified
value will tell you which regime your solution falls into. We demonstrate this in an example
before moving on.
Next we consider linear, constant-coefficient, homogeneous equations:

    a_n y^(n) + a_{n−1} y^(n−1) + · · · + a_1 y′ + a_0 y = 0   (1)

where a₀, a₁, . . . , a_n ∈ R, a_n ≠ 0. There is one strategy that always works for such
equations, up to your ability to find roots of polynomials. Intuitively, we could think that we
need to find solutions which don’t change much upon differentiation, since the desired
solution seemingly will only be multiplied by constants when differentiated. Thus we look
for a solution of the form y = e^(rx) where r is a constant. Plugging this in, we arrive at

    a_n r^n + a_{n−1} r^(n−1) + · · · + a_1 r + a_0 = 0.

This is satisfied if r is a root of the above polynomial. The polynomial has n roots up to
multiplicity, and we can then superpose the corresponding solutions to construct the general
solution.
r^2 − 2r − 3 = 0  ⟹  (r + 1)(r − 3) = 0  ⟹  r = −1, 3.

This shows that both e^(−x) and e^(3x) are solutions. Thus by linearity, the general solution is

y(x) = C_1 e^(−x) + C_2 e^(3x).
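The characteristic-root computation can be checked mechanically; here is a small sketch of my own solving r^2 − 2r − 3 = 0 with the quadratic formula and verifying that each root r makes y = e^(rx) satisfy the equation:

```python
import math

# Roots of the characteristic polynomial r^2 - 2r - 3 for y'' - 2y' - 3y = 0.
a, b, c = 1.0, -2.0, -3.0
disc = b * b - 4 * a * c                      # discriminant = 16
roots = sorted([(-b - math.sqrt(disc)) / (2 * a),
                (-b + math.sqrt(disc)) / (2 * a)])

# Each root r gives a solution e^{rx}: the residual y'' - 2y' - 3y should
# vanish, using the exact derivatives of y = e^{rx}.
def residual(r, x):
    y = math.exp(r * x)
    return r * r * y - 2 * r * y - 3 * y
```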
This method raises an immediate red flag: what if the polynomial in r does not have real
roots? This is easily handled using Euler’s identity.
r^2 + 4r + 13 = 0  ⟹  (r + 2)^2 + 9 = 0  ⟹  r = −2 ± 3i.

y(x) = C_1 e^((−2+3i)x) + C_2 e^((−2−3i)x).
You may think the final constants C_1 and C_2 must be allowed to be complex now. This
is sort of true; however, if real initial conditions are specified, the solution will always turn
out to be real. Using Euler's identity e^(iθ) = cos θ + i sin θ, the solution above can be
rewritten in the real form y(x) = e^(−2x)(C_1 cos(3x) + C_2 sin(3x)) with C_1, C_2 ∈ R.
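To see concretely how Euler's identity produces real solutions, a quick numerical check of my own: the real part of e^((−2+3i)x) agrees with e^(−2x) cos(3x) pointwise.

```python
import cmath, math

# Roots of r^2 + 4r + 13 = 0 via the quadratic formula in the complex plane.
disc = 4 ** 2 - 4 * 13                       # -36
r1 = (-4 + cmath.sqrt(disc)) / 2             # -2 + 3i
r2 = (-4 - cmath.sqrt(disc)) / 2             # -2 - 3i

# Euler's identity: e^{(-2+3i)x} = e^{-2x}(cos 3x + i sin 3x),
# so the real part of the complex solution is e^{-2x} cos(3x).
def real_part_solution(x):
    return cmath.exp(r1 * x).real

def real_form(x):
    return math.exp(-2 * x) * math.cos(3 * x)
```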
There is a slightly more subtle hiccup that one can encounter when using this method.
Consider the equation y'' − 2y' + y = 0. Using this method, we would find that r^2 − 2r + 1 =
0 ⟹ (r − 1)^2 = 0 ⟹ r = 1. This yields the solution y(x) = Ce^x. However, you
may have noticed that with all the previous second-order equations, there were two solutions
with two arbitrary constants (i.e., the solution space was a two-dimensional vector space).
This is true generally: the solution space of an nth order linear equation whose right
hand side is zero will form an n-dimensional vector space. So for this latter equation we
are missing a solution. This solution can be recovered using a method called variation of
parameters, but what you will find is that in this case, the general solution is not a constant
multiple of e^x, but rather a linear multiple of e^x:

y(x) = (C_1 + C_2 x) e^x.
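A quick numerical sanity check (my own sketch) that y(x) = (C_1 + C_2 x)e^x really does solve y'' − 2y' + y = 0, approximating the derivatives with central finite differences:

```python
import math

def y(x, c1=1.0, c2=1.0):
    # General solution for the repeated root r = 1: a "linear multiple" of e^x.
    return (c1 + c2 * x) * math.exp(x)

def residual(x, h=1e-4):
    # Central-difference approximations of y' and y''.
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return d2 - 2 * d1 + y(x)

res = residual(0.3)   # should be ~0 up to discretization error
```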
This method only deals with homogeneous equations like (1); equations which have zero
on the right hand side. There are also methods for non-homogeneous equations; those of the
form
a_n y^(n) + a_{n-1} y^(n-1) + · · · + a_1 y' + a_0 y = f(x)

for some given function f. These methods can be very complicated even in relatively simple
cases (low order, “nice” functions). However, if f has a special form then they can be very
easy. We state this in a couple propositions and finish with a brief example.
Proposition. Suppose y_p satisfies

a_n y_p^(n) + a_{n-1} y_p^(n-1) + · · · + a_1 y_p' + a_0 y_p = f(x)

and y_h solves the corresponding homogeneous equation; then y = y_p + y_h also satisfies the
non-homogeneous equation. The function y_p is called the particular solution and the function
y_h is called the homogeneous solution. Thus this proposition tells us that we can always break
a solution into a particular and homogeneous part.
a_n y^(n) + a_{n-1} y^(n-1) + · · · + a_1 y' + a_0 y = f(x).
Suppose that f is not in the span of the homogeneous solutions for the equation. Then
(a) if f(x) = α sin(βx) + γ cos(βx) where β ≠ 0 and at least one of α, γ is non-zero, then
y_p(x) = A sin(βx) + B cos(βx) for some constants A, B,

(b) if f(x) = αe^(βx) for α, β ≠ 0, then y_p(x) = Ae^(βx) for some constant A,

(c) if f(x) is a polynomial of degree k, then y_p(x) is a polynomial of degree k.
This gives us a way of solving non-homogeneous equations if the forcing functions are of
a certain form.
Example. Find the general solution of y'' + 25y = x^2.

Solution. We must find the general homogeneous solution and then find a single particular
solution. For the homogeneous solution, we try y_h = e^(rx). This will be a homogeneous
solution if
r^2 + 25 = 0  ⟹  r = ±5i.
This gives a homogeneous solution: y_h(x) = C_1 cos(5x) + C_2 sin(5x). By the above proposition,
it suffices to look for a particular solution of the form y_p(x) = Ax^2 + Bx + C. Substituting
this into the equation gives

2A + 25Ax^2 + 25Bx + 25C = x^2.

Matching the coefficients on each power of x gives A = 1/25, B = 0, C = −2/625. Thus the
general solution is given by

y(x) = y_p(x) + y_h(x) = x^2/25 − 2/625 + C_1 cos(5x) + C_2 sin(5x).
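The particular solution can be verified directly; a sketch of my own checking that y_p(x) = x^2/25 − 2/625 satisfies y'' + 25y = x^2:

```python
def yp(x):
    # Candidate particular solution from matching coefficients.
    return x ** 2 / 25 - 2 / 625

def yp_second_derivative(x):
    # y_p = x^2/25 - 2/625 has constant second derivative 2/25.
    return 2 / 25

def residual(x):
    # Should vanish identically: y_p'' + 25*y_p - x^2.
    return yp_second_derivative(x) + 25 * yp(x) - x ** 2

checks = [abs(residual(x)) for x in (-1.3, 0.0, 2.7)]
```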
Linear Algebra
The most basic goal of linear algebra is to solve systems of linear equations. Since these
linear systems can be expressed in matrix-vector form, it is worthwhile to build up some
theory surrounding matrices and vectors; we will do that momentarily. First we review how
to solve these systems. The most general technique for solving these equations is known
as Gaussian elimination (also called row reduction) with back substitution, wherein we simply
eliminate variables by tactfully combining the equations and then solve the equations in
reverse order. We demonstrate these sorts of computations in a few examples and then move
on to theoretical aspects of linear algebra.
where it is understood that each column represents a variable and the entries to the right of
the divider represent the values on the right hand side of the equation.
Many questions on the math subject GRE boil down to solving linear systems but there
are some intricacies. Above we found a unique solution to our system of equations. This
may not always happen. We demonstrate this with two more brief examples before moving
to theory.
x + y + z = 5
x + 2y − z = 9
3x + 5y − z = 23

and

x + y + z = 5
x + 2y − z = 9
2x + 5y − 4z = 10
With these three examples we have witnessed three di↵erent scenarios. The first system
admitted a unique solution, the second had infinitely many and the third had zero. It turns
out these are the only three possibilities. We see this shortly.
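These three scenarios can be detected mechanically by comparing the rank of the coefficient matrix with the rank of the augmented matrix. A numpy sketch of my own; the matrices below encode the two systems above, assuming the intended signs are x + 2y − z = 9 in both, with third equations 3x + 5y − z = 23 and 2x + 5y − 4z = 10 respectively:

```python
import numpy as np

def classify(A, b):
    """Classify Av = b by comparing rank(A) with the rank of the augmented matrix."""
    A = np.asarray(A, dtype=float)
    aug = np.hstack([A, np.asarray(b, dtype=float).reshape(-1, 1)])
    rA, rAug = np.linalg.matrix_rank(A), np.linalg.matrix_rank(aug)
    if rA < rAug:
        return "no solutions"          # b is not in the column space of A
    if rA == A.shape[1]:
        return "unique solution"
    return "infinitely many solutions"

# Second system: third row equals (row 1) + 2*(row 2), consistently with b.
sys2 = classify([[1, 1, 1], [1, 2, -1], [3, 5, -1]], [5, 9, 23])
# Third system: left-hand sides are dependent but the right-hand sides disagree.
sys3 = classify([[1, 1, 1], [1, 2, -1], [2, 5, -4]], [5, 9, 10])
```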
Now we move on to some theory. The concepts of greatest concern in undergraduate linear
algebra (matrix algebra and the structure of R^n) are specific cases of more general concepts
which we cover first, though we will soon see that it is enough to study R^n.
Definition 19 (Vector Space). A vector space is a set V paired with a scalar field F and
equipped with two operations

+ : V × V → V,  (x, y) ↦ x + y
· : F × V → V,  (α, x) ↦ α · x (also written αx)

such that (V, +) is an additive group and + and · are compatible.¹ In particular, since (V, +)
is an additive group it must have an additive identity, which we name 0, and each element
x ∈ V must have an additive inverse, which we denote by −x. When referencing a vector
space, we often only refer to the set V, especially when there is no ambiguity about which
underlying field F we are using.
Any non-trivial vector space has smaller vector spaces embedded inside of it. These will
be of interest for various reasons later on so we define the concept now.
That is, a subspace is a vector space which is contained in some larger vector space. The
easiest examples of subspaces are the coordinate axes in R^2. Indeed, consider the subsets of
R^2 given by {(x, 0) : x ∈ R} and {(0, y) : y ∈ R}. These are both subspaces of R^2. Every vector space
has two trivial subspaces: the set {0} and the space itself. Subspaces can be combined to
make other subspaces in various ways. For example, intersecting two subspaces or taking
the direct sum of two subspaces will yield another subspace.
One interesting thing about R^n is that all vectors in R^n can be expressed as linear
combinations of a very small set of vectors. Indeed, if x = (x_1, x_2, . . . , x_n) ∈ R^n then

x = x_1 e_1 + x_2 e_2 + · · · + x_n e_n

where e_k is a coordinate vector: one which has zeros everywhere except a 1 in the k-th entry.
Moreover, this representation is unique. There are no other scalars (α_1, . . . , α_n) such that
x = α_1 e_1 + · · · + α_n e_n. It is reasonable to ask if other vector spaces have a similar property:
can we always find a relatively small set whose linear combinations can form any vector in
the space?

¹ In the sense of several axioms such as distributivity and associativity which I am much too lazy to
reproduce here.
According to this definition, the set of vectors {e_1, . . . , e_n} in R^n is a spanning set for R^n.
One deficiency in this definition is that we could always add more vectors to a spanning set.
Indeed, {0, v, e_1, . . . , e_n} is also a spanning set for R^n (where v ∈ R^n is arbitrary). However,
the zero vector and this arbitrary v are not needed in order to span R^n since they themselves
can be written as linear combinations of the other vectors in the set; including them would
ruin the uniqueness of the representation of any vector x, since x = α·0 + 0·v + x_1 e_1 + · · · + x_n e_n
for any α ∈ R. We would like to eliminate such redundancies.
A set of vectors {v_1, . . . , v_n} is called linearly independent if

α_1 v_1 + · · · + α_n v_n = 0  ⟹  α_1 = · · · = α_n = 0,

and linearly dependent if there are scalars α_1, . . . , α_n, not all zero, such that

α_1 v_1 + · · · + α_n v_n = 0.

Intuitively, linearly independent sets have no redundancies; no one vector in the set can be
written as a linear combination of the others. Thus we have uniqueness of representations of
elements in the span of linearly independent vectors. From this definition, it is easy to see
that the coordinate vectors {e_1, . . . , e_n} in R^n form a linearly independent set which also
spans R^n.
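Linear independence of a finite set in R^n can be tested by stacking the vectors into a matrix and checking whether it has full column rank; a small numpy sketch of my own:

```python
import numpy as np

def independent(vectors):
    """Vectors are linearly independent iff the matrix they form has full column rank."""
    M = np.column_stack([np.asarray(v, dtype=float) for v in vectors])
    return np.linalg.matrix_rank(M) == M.shape[1]

e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
indep = independent([e1, e2, e3])            # coordinate vectors: independent
dep_zero = independent([e1, e2, [0, 0, 0]])  # any set containing 0 is dependent
dep_comb = independent([e1, e2, [1, 1, 0]])  # e1 + e2 is a redundant vector
```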
Definition 23 (Basis). A basis for a vector space is a linearly independent spanning set.
Equivalently, {v_i}_{i∈I} is a basis for a vector space V if every vector v ∈ V can be represented
uniquely as a finite linear combination of vectors in {v_i}_{i∈I}:

v = α_1 v_{i_1} + · · · + α_k v_{i_k}.

Bases are very important because they allow us to write all vectors in a space in terms of
a relatively small set of vectors. It isn't clear a priori that all vector spaces have a basis.
This theorem is deceptive in that it seems very strong and useful, but in practice it is
essentially useless. The theorem is actually equivalent to the axiom of choice and so the
proof is entirely non-constructive. So while this theorem ensures that every vector space has
a basis, it gives no advice on how to find such a basis and indeed, attempting to identify a
basis could be a fruitless endeavor. In one special case it is very easy to find a basis: when
the basis consists of a finite number of vectors. Indeed, in this case, we may use a basis
to somehow define the size of the space. First, it should be noted that bases are far from
unique. For example, both the sets {(1, 0), (0, 1)} and {(1, 1), (1, −1)} are bases for R^2. This could
potentially be troublesome; if we want to use bases to determine the size of a space, we
should be certain that all bases are actually the same size.
Theorem 25 (Invariance of Size of a Basis). If two sets each form a basis for the same
vector space, then the sets have the same cardinality.
Thus for example Rn is n-dimensional since the basis {e1 , . . . , en } has n elements. For
the remainder of these notes, we focus on finite dimensional vector spaces (as is common in
undergraduate linear algebra); for a survey of results in infinite dimensional vector spaces,
one could study linear functional analysis.
In finite dimensional vector spaces, it is very easy to find a basis due to the next theorem.
Theorem 27. In any n-dimensional vector space, any set of n linearly independent vectors
forms a basis and any spanning set consisting of n vectors forms a basis.
This is a great result because it shows that we only need to check one property in the
definition of a basis. The following theorem has a similar flavor.
This theorem makes a lot of sense intuitively: if there are only n dimensions and you have
more than n vectors, then one of them must be a combination of the others. Similarly, if
there are n dimensions then fewer than n vectors cannot span the space since some dimension
will be missing. This theorem tells us that a basis can be viewed as a maximal linearly
independent set or a minimal spanning set.
With all these definitions, we move on to what is the largest topic in linear algebra: maps
between vector spaces. For various reasons, linear transformations between vector spaces are
of special interest.
Definition 29 (Linear Transformation). Suppose that V, W are vector spaces over the
same field F. A linear transformation T : V → W is a map such that
T(αx + βy) = αT(x) + βT(y) for all x, y ∈ V and α, β ∈ F.
We will focus mainly on the case that V = W so the map is a linear transformation
which takes a vector space to itself though for several results this is not necessary. When
V = W , the most obvious example of a linear transformation is the identity transformation
I : V → V for which I(v) = v for all v ∈ V. We typically denote this transformation by
I, sometimes with an indication of the underlying space; e.g., I_V. As another example,
every matrix induces a linear transformation: for A ∈ R^(n×m), the map T_A : R^m → R^n defined by

T_A(x) = Ax,  x ∈ R^m

is linear.
In view of this proposition, we typically blur the lines between a matrix and the linear
transformation that it induces. For example, we often refer to matrices as linear transformations
with the understanding that the transformation we are discussing is given by x ↦ Ax.
Taking a categorical view, we note that for a linear transformation T : V → W, the set
of vectors {T(v) : v ∈ V} ⊂ W is itself a subspace of W; that is, T does actually map a
vector space to a vector space, which shows that linear transformations are the morphisms in
the category of vector spaces. In fact, each linear transformation has two natural subspaces
associated with it.
Definition 31 (Range & Null Space). Suppose V, W are vector spaces and T : V → W
is a linear map. We define the null space of T by

N(T) = {v ∈ V : T(v) = 0}

and the range of T by

R(T) = {T(v) : v ∈ V}.
Thinking more about structure which linear transformations preserve, we can use them
to identify spaces which are essentially the same. We do this in the following way.
Isomorphic spaces share essentially all the same structure. For example, isomorphic
spaces have the same dimension (in fact, an invertible linear transformation will map a basis
of one space to a basis of the other). The next theorem justifies focusing only on R^n rather
than generic vector spaces V: every n-dimensional vector space over R is isomorphic to R^n.
This tells us that essentially any linear algebra property that we can prove for Rn will
hold in any n-dimensional vector space (we can simply translate between the two by finding
an isomorphism). Given this, we drop any sort of generality and focus only on Rn for the
majority of the notes.
Another curious simplification from undergraduate linear algebra is that, rather than deal
with general linear maps T, we only deal with matrices. This is justified by the following:
for every linear transformation T : R^m → R^n, there is a matrix A ∈ R^(n×m) such that

T(x) = Ax,  x ∈ R^m.
This further blurs the lines between matrices and linear transformations by providing a
converse to Proposition 29: not only does every matrix induce a linear transformation, we
also have that every linear transformation is induced by a matrix. Finding the matrix which
induces a given transformation is easy; the columns of the matrix are given by T (ek ) where
ek is a coordinate vector (in view of this, we can think of an invertible matrix as simply
changing the basis for the underlying space; in a different basis, a linear transformation will
have a different matrix associated with it). With this, we drop generality even further and
discuss matrices for a while. Any of the above definitions for linear transforms are adaptable
to matrices; e.g. we will use R(A) to mean the range of the linear transform which A induces.
Given a matrix, we would like to nail down exactly how it transforms vectors by answering
questions like "what does the range of A look like?" The first realization to note is that the
range of A is exactly the span of its columns. Indeed, for A ∈ R^(n×m), any y in the range of
A has the form

y = Ax = x_1 A_1 + x_2 A_2 + · · · + x_m A_m

where x is some vector in R^m and the A_k are the columns of A. For this reason, the range of A
is often called the column space of A.
Definition 36 (Rank). The rank of a matrix is defined to be the dimension of the range
of the matrix. Equivalently, the rank is the number of linearly independent columns in the
matrix. For a matrix A, we write this as rank(A).
The rank of an n × m matrix is at most n because the matrix maps into R^n, which is
n-dimensional, but the rank is also at most m since the matrix has m columns. A matrix
is said to be full rank if the rank is as large as possible; i.e., A ∈ R^(n×m) is full rank if
rank(A) = min{n, m}. The rank of a matrix tells you a lot about how the matrix transforms
R^m. The only rank 0 matrix is the zero matrix; it maps all vectors to the point {0}. A rank
1 n × m matrix maps R^m to a line in R^n. A rank 2 matrix maps R^m to a plane in R^n,
etc. Intuitively, the rank should tell you something about the invertibility of a matrix. A
transformation which maps all of R^m onto a line (where m ≥ 2) will not be invertible since
many points must be mapped to the same point on the line.
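These rank facts are easy to spot-check computationally; a numpy sketch of my own with rank 0, rank 1, and full-rank examples:

```python
import numpy as np

zero = np.zeros((3, 2))                   # the only rank-0 matrix: the zero matrix
outer = np.outer([1, 2, 3], [4, 5])       # rank 1: every column is a multiple of (1,2,3)
eye = np.eye(3)                           # full rank: rank = min(3, 3) = 3

ranks = (np.linalg.matrix_rank(zero),
         np.linalg.matrix_rank(outer),
         np.linalg.matrix_rank(eye))
```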
Definition 37 (Nullity). The nullity of a matrix is defined to be the dimension of the null
space of the matrix. Equivalently, the nullity is the maximum number of elements in a linearly
independent set which is annihilated by the matrix. For a matrix A, we write this as null(A).
As an extension of the above discussion, if the nullity of the matrix is non-zero, then the
matrix maps a non-trivial subspace onto the zero vector, which will mean the transformation
is non-invertible. Thus there seems to be some relationship between rank and nullity;
roughly, higher rank means larger range, which means fewer vectors are mapped to zero, which
means lower nullity. This line of reasoning is made precise in the rank-nullity theorem: for
A ∈ R^(n×m), rank(A) + null(A) = m.
We now define what it means for a matrix to be invertible and relate this to the rank
and nullity of a matrix.
Definition 39 (Identity Matrix & Inverse Matrices). The identity matrix of dimension
n is the matrix I ∈ R^(n×n) with entries I_ij = 1 if i = j and I_ij = 0 for i ≠ j. A matrix
A ∈ R^(n×n) is called invertible if there is a matrix A^(−1) ∈ R^(n×n), called the inverse of A,
such that AA^(−1) = A^(−1)A = I.
The power of the inverse of a matrix is that it gives us a concrete way to solve linear
systems. Indeed, for a system of equations like those in Example 16, we can write the
equations in the form Av = b for some matrix A, vector of unknowns v and a given vector
b. If A is invertible and we know the inverse A^(−1), then we see
Av = b  ⟹  A^(−1)Av = A^(−1)b  ⟹  Iv = A^(−1)b  ⟹  v = A^(−1)b
so we have solved the equation for v. Because of this, it is very helpful to identify exactly
which matrices are invertible and which are not. For example, from the above it seems like
an n ⇥ n matrix can be invertible only if it has full rank; this turns out to be true. Towards
the end of this document, we will give a fairly comprehensive list of concrete conditions
which can be easily checked and are equivalent to invertibility of a matrix.
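Solving Av = b through the inverse can be sketched with numpy (illustrative only; in practice np.linalg.solve is preferred for numerical stability over forming the inverse):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])   # invertible: det = 5
b = np.array([5.0, 10.0])

v_inverse = np.linalg.inv(A) @ b          # v = A^{-1} b, as in the display above
v_solve = np.linalg.solve(A, b)           # the numerically preferred route
```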
The above shows that invertibility is related to solvability of the system Av = b. We saw
in Examples 16, 17 that not all systems are uniquely solvable and some are insolvable. The
previous paragraph seems to show that Av = b is uniquely solvable when A is invertible.
This is a sufficient but not necessary condition for unique solvability. Indeed, the system may
be uniquely solvable even when the matrix A is not n × n. We summarize this in a
proposition.
(i) There is exactly one solution v ∈ R^m. [This will occur when b ∈ R(A) and N(A) is
trivial.]

(ii) There are infinitely many solutions v ∈ R^m. [This will occur when b ∈ R(A) and N(A)
is non-trivial, because then we can find a single solution v and add to it any element
of the null space of A.]

(iii) There are no solutions v ∈ R^m. [This will occur when b ∉ R(A).]
In line with the above discussion, it seems like a geometric way to determine whether
a matrix is invertible is to see whether it maps n-dimensional subspaces to other n-dimensional
subspaces. To motivate this, we could consider volume in R^3. In some rough sense, a shape
which is truly 3-dimensional should have some positive volume while shapes which are 1-
dimensional or 2-dimensional (embedded in R3 ) have zero volume. Thus to decide whether
a matrix is invertible, we could check if it maps sets of positive volume to sets of positive
volume. This is essentially what the determinant of the matrix measures.
det A = Σ_{σ ∈ S_n} sgn(σ) a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)},

where S_n is the group of permutations of {1, . . . , n}. This is also sometimes written as |A|
(though we will stick to the former notation: det A).
At first this formula is somewhat mysterious but we rarely use this definition to calculate
a determinant. To actually calculate a determinant, we induct upwards. By the formula, we
see that

det ( a_11  a_12 ; a_21  a_22 ) = a_11 a_22 − a_12 a_21.
Thus we can take the determinant of a 2 ⇥ 2 matrix. The following proposition allows us to
take the determinant of larger matrices.
det A = Σ_{j=1}^n (−1)^(i+j) a_ij det A_(ij)  for any fixed row i,

where A_(ij) is the matrix obtained by removing the i-th row and j-th column from A. Likewise,
the determinant may be computed by expanding along any fixed column.
The geometric interpretation of the determinant is crucial to seeing how this number
relates to invertibility.
In words, the matrix A will transform the unit cube into some n-dimensional parallelepiped;
the determinant of A gives the volume of that parallelepiped.
By our above reasoning then, if the determinant of A is zero, then A squishes the unit
cube onto some lower dimensional parallelepiped and thus A should fail to be invertible; this
turns out to be true.
(1) det(I) = 1
(2) If two rows of a matrix are interchanged, the determinant changes sign.
(3) If one row of a matrix is multiplied by a constant, the determinant is also multiplied
by that constant. In particular, for A ∈ R^(n×n), det(αA) = α^n det(A) for α ∈ R.
(4) Adding a multiple of one row to another row will not change the determinant.
(5) The determinant is multiplicative; i.e., det(AB) = det(A) det(B) when A, B 2 Rn⇥n .
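The listed properties are easy to spot-check with numpy; a sketch of my own on random 3 × 3 matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# (1) det(I) = 1
p1 = np.linalg.det(np.eye(3))
# (2) swapping two rows flips the sign of the determinant
A_swapped = A[[1, 0, 2], :]
p2 = np.linalg.det(A_swapped)
# (3) det(alpha * A) = alpha^n * det(A) for an n x n matrix
p3 = np.linalg.det(2.0 * A)
# (5) multiplicativity: det(AB) = det(A) det(B)
p5 = np.linalg.det(A @ B)
```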
Having defined the determinant, we are equipped to define eigenvalues and eigenvectors
but there is one more important value to define regarding a matrix which will come back later.
Definition 45 (Trace). The trace of a matrix is the sum of the diagonal elements of the
matrix. In symbols, for a matrix A ∈ R^(n×n), the trace is defined by

tr A = Σ_{i=1}^n a_ii.
The trace doesn’t carry quite the weight that the determinant does (for example, it is
unrelated to invertibility) but it is a linear functional on R^(n×n) (a functional on a vector space
is a map which takes the space to the underlying field) and you will be expected to know
about it for the math subject GRE. We state its properties.
(b) [Cyclic Property.] tr(AB) = tr(BA) and more generally the trace is invariant under
cyclic permutations, so e.g. tr(ABC) = tr(BCA) = tr(CAB).
It is important to note what this last property does not imply: the trace is not invariant
under any permutations. Thus, when taking the trace of a product of several matrices, one
cannot swap the matrices in an arbitrary order with impunity.
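A concrete sketch of my own showing the cyclic property, and the failure of invariance under a non-cyclic swap:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[1, 0], [0, 2]])

t_abc = np.trace(A @ B @ C)   # cyclic rotations of the product share this trace
t_bca = np.trace(B @ C @ A)
t_cab = np.trace(C @ A @ B)
t_acb = np.trace(A @ C @ B)   # a non-cyclic swap: generally a different value
```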
With this, we move on to eigenvalues and eigenvectors of a matrix which carry geometric
information about how a square matrix A transforms Rn .
For A ∈ R^(n×n), a scalar λ is called an eigenvalue of A with corresponding eigenvector
v ≠ 0 if Av = λv. The eigenvectors (if they are real) show the directions in which A simply
stretches or squishes vectors and the amount of stretch is given by λ; if we know these
directions, we may be able to deduce how A deforms other vectors. Imaginary eigenvalues
and eigenvectors correspond somehow to rotation rather than stretching. Practically, we see
that an eigenvalue/eigenvector pair satisfies

(A − λI)v = 0
Making this proposition even more explicit, when A ∈ R^(n×n), n ≥ 2, the characteristic
polynomial of A is given by

p_A(λ) = λ^n − tr(A) λ^(n−1) + · · · + (−1)^n det(A).

That is, the determinant of A is the constant term in its characteristic polynomial (possibly
with a sign change) and the trace is the second coefficient (the coefficient attached to λ^(n−1),
up to sign). Thus, for example, when A is 2 × 2,

p_A(λ) = λ^2 − tr(A) λ + det(A).
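For a 2 × 2 matrix this is easy to verify numerically; a sketch of my own comparing the roots of λ^2 − tr(A)λ + det(A) with the eigenvalues numpy reports:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
tr, det = np.trace(A), np.linalg.det(A)

# Roots of the characteristic polynomial lambda^2 - tr*lambda + det.
char_roots = sorted(np.roots([1.0, -tr, det]).real)
eigvals = sorted(np.linalg.eigvals(A).real)
```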
With all these definitions, we state the largest theorem of undergraduate linear algebra.
This theorem gives several ways to check if a matrix is invertible or singular. We have al-
ready discussed most of these but it is nice to have them listed in one location. Many of the
following conditions are simply re-statements of each other but I list them for completeness.
This theorem displays the many relationships between the preceding topics.
Theorem 49 (Invertible Matrix Theorem). Let A ∈ R^(n×n). The following are equivalent:
(1) A is invertible
(14) det A ≠ 0
Of course, this is not an exhaustive list. There are plenty more equivalent conditions but
these give a solid overview of what it means for a matrix to be invertible.
This concludes a vast majority of the linear algebra material which is included on the
math subject GRE. For completeness, we discuss a few more topics: those related to simi-
larity and diagonalizability.
A matrix B ∈ R^(n×n) is said to be similar to A ∈ R^(n×n) if there is an invertible matrix
P ∈ R^(n×n) such that B = P^(−1)AP. Similarity is an equivalence relation on R^(n×n). At
first glance, this definition may not seem very meaningful, but similar matrices share many
nice properties in common.
Proposition 51. Similar matrices have the same characteristic polynomial; hence the same
eigenvalues, the same determinant and the same trace.
In view of this, similar matrices can be seen as representing the same linear transformation
with respect to different bases for R^n. For a matrix A ∈ R^(n×n), one goal may be to
find a matrix which is similar to A but has a much simpler form; this would help us identify
key features about A (especially regarding how it transforms R^n) without doing much heavy
lifting.
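The shared invariants of similar matrices are easy to confirm numerically; a sketch of my own with B = P^(−1)AP:

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
P = np.array([[1.0, 1.0], [0.0, 1.0]])    # an invertible change of basis
B = np.linalg.inv(P) @ A @ P              # B is similar to A

same_trace = np.isclose(np.trace(A), np.trace(B))
same_det = np.isclose(np.linalg.det(A), np.linalg.det(B))
same_eigs = np.allclose(sorted(np.linalg.eigvals(A).real),
                        sorted(np.linalg.eigvals(B).real))
```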
Definition 52 (Diagonal Matrix). A matrix D ∈ R^(n×n) with entries (d_ij)_{i,j=1}^n is said to
be diagonal if d_ij = 0 when i ≠ j.
Diagonal matrices are incredibly simple and easy to analyze; they transform Rn by
stretching the coordinate axes. Thus each standard basis vector ek is an eigenvector of
a diagonal matrix and the corresponding eigenvalue is the k-th diagonal element of the matrix.
From this, we see that the determinant of a diagonal matrix is the product of the
diagonal elements, and a diagonal matrix is invertible iff it has no zeros on its diagonal.
A matrix is called diagonalizable if it is similar to a diagonal matrix. Since similar matrices
share the same eigenvalues, if a matrix is diagonalizable, then it is similar to the diagonal
matrix with its eigenvalues as the diagonal elements. This reasoning seems, at first glance,
to be reversible. Suppose that A ∈ R^(n×n) has eigenvalues λ_1, . . . , λ_n with corresponding
eigenvectors v_1, . . . , v_n. We can form a matrix P which has columns given by v_1, . . . , v_n,
and we will see

AP = [Av_1 | · · · | Av_n] = [λ_1 v_1 | · · · | λ_n v_n] = PD

where D is the diagonal matrix with diagonal entries λ_1, . . . , λ_n. If P is invertible (i.e., if
the eigenvectors are linearly independent), this gives A = PDP^(−1).
Theorem 54. A real n × n matrix is diagonalizable iff its eigenvectors form a basis for R^n.
In some sense, most matrices are diagonalizable (precisely: the diagonalizable matrices
form a dense subset of R^(n×n) with respect to any norm on R^(n×n)). However, given a matrix,
it can be difficult to decide quickly whether it is diagonalizable. For large n, finding the
eigenvalues and eigenvectors of an n × n matrix is very difficult, so it is useful to characterize
large classes of diagonalizable matrices in other ways. This requires a bit more machinery.
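The construction A = PDP^(−1) can be carried out directly with numpy.linalg.eig; a sketch of my own on a symmetric (hence diagonalizable) matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])     # symmetric, hence diagonalizable

eigvals, P = np.linalg.eig(A)              # columns of P are eigenvectors
D = np.diag(eigvals)

# The eigenvectors form a basis (P is invertible), so A = P D P^{-1}.
reconstructed = P @ D @ np.linalg.inv(P)
```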
(a) (A^t)^t = A,
More generally, for any finite dimensional vector spaces V, W and any linear transformation
T : V → W, we can define an adjoint transformation T* : W* → V*, where V*, W* are
the continuous dual spaces of V, W respectively. In the terminology of category theory, the
map which takes any vector space to its dual and transposes any linear map is a contravari-
ant functor on the category of vector spaces. To gain more context regarding the meaning
of transposition one can read about the dual spaces; this is outside of the scope of these notes.
Symmetric matrices (and their more general analog, self-adjoint operators) have many
desirable properties; these are highlighted when studying inner product spaces. There are
two properties which are pertinent to our discussion.
Diagonalization is useful for solving linear systems of differential equations

y' = Ay.

If A = PDP^(−1) is diagonalizable, the substitution u = P^(−1)y transforms the system into

u' = Du,

which is a very easy system to solve. Reversing the substitution will yield the solution y to
the original equation.
Diagonalization also makes it easy to compute matrix powers:

A^2 = PDP^(−1) PDP^(−1) = PD^2 P^(−1),  A^3 = PD^2 P^(−1) PDP^(−1) = PD^3 P^(−1),  etc.
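A quick numpy check of my own that powers computed through the diagonalization agree with direct matrix powers:

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
eigvals, P = np.linalg.eig(A)
P_inv = np.linalg.inv(P)

# A^k = P D^k P^{-1}: only the diagonal entries need to be raised to the power k.
def power_via_diagonalization(k):
    return P @ np.diag(eigvals ** k) @ P_inv

A5 = power_via_diagonalization(5)
```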
Topics which sometimes appear on the math subject GRE and which were excluded here
for brevity include normed spaces and inner product spaces. The main idea is that we can
add much more structure to a vector space by specifying a way to measure length and angles
between vectors. For a discussion of normed spaces, inner product spaces and much more,
one could look to Peter Petersen’s linear algebra book which is freely available online.
Christian Parkinson GRE Prep: Abs. Alg. & Comp. Analysis Notes 1
Abstract Algebra
Definition 1 (Group). A group is a set G paired with a binary operation ∗ which satisfies
the following axioms:

(a) (closure) x ∗ y ∈ G for all x, y ∈ G,
(b) (associativity) x ∗ (y ∗ z) = (x ∗ y) ∗ z for all x, y, z ∈ G,
(c) (identity) there is an element e ∈ G such that e ∗ x = x ∗ e = x for all x ∈ G,
(d) (inverses) for each x ∈ G, there is an element x^(−1) ∈ G such that x ∗ x^(−1) = x^(−1) ∗ x = e.

We refer to the group as (G, ∗) unless there is no ambiguity about the operation, in which
case we can simply write G. The element e in (c) is called the identity element in the group;
a simple proof shows that it is unique. For each x, the corresponding x^(−1) defined in (d) is
called the inverse of x; this element is also unique. In many circumstances, the operation
is addition (in which case we may call G an additive group and refer to inverses with a minus
sign: −x is the inverse of x) or multiplication (in which case we may call G a multiplicative
group).
Almost all of the sets we’ve defined thus far have group structure with respect to some
operation.
(R, +) - the set of real numbers with addition (more generally, (R^(n×m), +) is a group)

(Z_n^*, ·_n) - the set of integers k ∈ {1, . . . , n − 1} such that gcd(k, n) = 1, with multiplication
modulo n
Dn - the set of reflections and rotations of the regular n-gon with functional composition
K_4 = {00, 01, 10, 11} - the set of length two strings consisting of zeros and ones under
the bitwise XOR operation. One can think of this as the set of configurations of a pair
of light switches: off/off, off/on, on/off, on/on. The operation then corresponds to
flipping the switches.
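The Klein four-group example can be sketched in a few lines (my own illustration), representing the strings 00, 01, 10, 11 as the integers 0 through 3 under bitwise XOR:

```python
from itertools import product

G = [0, 1, 2, 3]          # the strings 00, 01, 10, 11 as integers
op = lambda a, b: a ^ b   # bitwise XOR: "flipping the switches"
identity = 0              # 00: the off/off configuration changes nothing

closed = all(op(a, b) in G for a, b in product(G, G))
self_inverse = all(op(a, a) == identity for a in G)

def order(a):
    """Smallest k > 0 with a combined with itself k times equal to e."""
    x, k = a, 1
    while x != identity:
        x, k = op(x, a), k + 1
    return k

# Non-cyclic: every non-identity element has order 2, so nothing generates all of G.
cyclic = any(order(a) == len(G) for a in G)
```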
While most of the sets we’ve dealt with so far have a natural group structure, it is also
easy to identify sets which fail to be a group.
(R, −) - the set of real numbers with subtraction. This operation is non-associative.
(Z+ , +) - the set of positive integers with addition. This set has no identity element.
(R^(n×n), ·) - the set of n × n matrices with matrix multiplication. Not all elements of
this set have inverses.
Notice that in the definition, we only require the operation in a group to be associative.
However, in many groups (especially additive groups), the operation is also commutative.
This distinction is especially important when discussing inverses of elements. In general, for
x, y in some group, we have (xy)^(−1) = y^(−1) x^(−1). Only if the operation is commutative can we
arrive at the more natural (xy)^(−1) = x^(−1) y^(−1). Commutativity is an additional piece of
structure and thus should be given its own definition.
Definition 4 (Abelian Group). A group is called abelian if, in addition to the other
group axioms, the binary operation is commutative. That is, (G, ∗) is abelian if x ∗ y = y ∗ x
for all x, y ∈ G.
As stated above, just about any additive group will be abelian. Another thing to notice
is that many of the above groups are embedded inside each other; we call these subgroups
(just like many vector spaces can be realized as subspaces of some larger space).
Definition 5 (Subgroup). Suppose that (G, ∗) is a group and that H ⊂ G is such that
(H, ∗) is also a group. Then we call H a subgroup of G and write H ≤ G.
The subgroups tell you something about the overall structure of the group. For example,
it may be reasonable to try to build groups by successively nesting subgroups. To this end,
it is useful to identify the subgroups of a given group. We discuss this after some examples.
Example 6. Every group has two trivial subgroups: the group itself and the set containing
only the identity element.
Recall that we have a chain of subsets Z ⊂ Q ⊂ R ⊂ C. This becomes a chain of
subgroups when each set is given the operation of addition: Z ≤ Q ≤ R ≤ C.
The subset SL(n, R) = {A ∈ R^(n×n) : det A = 1} is a subgroup of GL(n, R) since the
determinant is multiplicative.
The subset An of Sn consisting of even permutations (those which can be arrived at with
an even number of transpositions) is a subgroup.
The subset Z_n = {0, 1, . . . , n − 1} is not considered a subgroup of Z since, in order to
maintain closure, one must change the operation from addition to addition modulo n (in
fact, Z_n is a quotient group of Z; a topic we will not cover).
For any group (G, ) and any x 2 G, the subset
hxi = {xn : n 2 Z}
There is a handy test for deciding whether a subset of a group is indeed a subgroup.
One thing to notice is that this subgroup hxi generated by x may be the entire group.
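To make ⟨x⟩ concrete, here is a small Python sketch (the helper name `cyclic_subgroup` is mine, not from the notes) computing ⟨x⟩ inside the additive group Z_n, where the "power" x^k corresponds to k·x mod n:

```python
def cyclic_subgroup(x, n):
    """Return <x> = {0, x, 2x, ...} inside (Z_n, +) as a sorted list."""
    elems = {0}
    g = x % n
    while g not in elems:
        elems.add(g)
        g = (g + x) % n
    return sorted(elems)

# <2> is a proper subgroup of Z_8, while 3 generates all of Z_8.
print(cyclic_subgroup(2, 8))  # [0, 2, 4, 6]
print(cyclic_subgroup(3, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Whether ⟨x⟩ is the whole group depends on x: here 3 is a generator of Z_8 but 2 is not.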
Again, the sets Z and Z_n are prototypical cyclic groups. Another example of a cyclic
group is U_n, the nth roots of unity. Most of the groups listed above (indeed "most" groups
in general) are not cyclic: none of Q, R, C is cyclic under addition, and S_n and D_n are not
cyclic. Another group which is famously non-cyclic is the Klein-4 group defined above by
K_4 = {00, 01, 10, 11} with bitwise XOR; this is the smallest non-cyclic group. Any cyclic
group is abelian (this follows from associativity; can you see why?), but the converse is not
true.
Definition 9 (Order). Suppose that (G, ∗) is a group. The order of the group G is defined
to be the cardinality of G, written |G|. G is called a finite group if this cardinality is
finite. Let x ∈ G. The order of x is defined to be the smallest n > 0 such that x^n = e, where
e is the identity of the group. If no such n exists, the element is said to have infinite order.
From this definition, we see that the order of an element x ∈ G is the same as the order
of the subgroup ⟨x⟩. Likewise, G is cyclic if and only if there is some x ∈ G such that the order
of x is equal to the order of G. Cyclic groups have an easily identifiable structure detailed
in the following proposition.
Proposition 10. Suppose that G is a cyclic group. Then all subgroups of G are cyclic.
Further, if G has finite order n and x ∈ G is a generator of G (so that x has order n), then
x^m has order n/gcd(m, n). In particular, G has a subgroup of order d for every d which divides
n, and x^m is a generator of G if and only if gcd(m, n) = 1.
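Proposition 10 is easy to spot-check numerically in Z_n written additively, where the element m plays the role of x^m (the helper name `order_in_Zn` is illustrative, not from the notes):

```python
from math import gcd

def order_in_Zn(x, n):
    """Order of x in (Z_n, +): the smallest k > 0 with k*x ≡ 0 (mod n)."""
    k = 1
    while (k * x) % n != 0:
        k += 1
    return k

# Proposition 10 predicts that x^m has order n / gcd(m, n).
n = 12
for m in range(1, n):
    assert order_in_Zn(m, n) == n // gcd(m, n)
print("verified for Z_12")
```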
This proposition classifies all subgroups of a cyclic group (especially finite cyclic groups).
One might ask if we can classify the subgroups of any finite group; while we cannot do this,
we can at least classify the potential sizes of subgroups and state some existence results for
subgroups. We list these in a few theorems.
Theorem 11 (Lagrange's Theorem). Let G be a finite group. The order of any subgroup
of G must divide the order of G. In particular, if x ∈ G has order m and G has order
n, then m | n.
A corollary of Lagrange's Theorem is that every group of prime order is cyclic. Indeed,
if n is prime in the above statement, then every element of G must have order 1 or order n.
Only the identity element can have order 1, and thus every other element has order n and
generates the group. While Lagrange's Theorem tells you the possible orders of subgroups
of a finite group, it does not guarantee existence of subgroups of a given order. For that we
have the following two results.

Theorem 12 (Cauchy's Theorem). Let G be a finite group and let p be a prime number
which divides the order of G. Then G has an element of order p, and hence a subgroup of
order p.
Theorem 13 (Sylow's First Theorem). Let G be a finite group of order n and write
n = p^k · m for some prime p which does not divide m (i.e., p^k is the largest power of p
dividing n). Then G has a subgroup of order p^ℓ for all 1 ≤ ℓ ≤ k.
We can summarize these three theorems with an example. If G has order 54, then by
Lagrange’s Theorem, any subgroup of G has order 1, 2, 3, 6, 9, 18, 27 or 54 (of course, we will
have the trivial subgroups of order 1 and 54). By Cauchy’s Theorem, there is at least one
subgroup of order 2 and at least one subgroup of order 3 since these are the primes which
divide 54. By Sylow’s First Theorem, there is at least one subgroup of order 9 and at least
one subgroup of order 27 since these are the prime powers which divide 54. We are not
guaranteed existence of subgroups of order 6 or order 18.
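The bookkeeping in this example is easy to script. A minimal sketch (the helper names are mine) listing the orders allowed by Lagrange and those guaranteed by Cauchy and Sylow for |G| = 54:

```python
n = 54

# Lagrange: any subgroup order must be one of the divisors of 54.
divisors = [d for d in range(1, n + 1) if n % d == 0]

def is_prime(p):
    return p > 1 and all(p % q for q in range(2, p))

def is_prime_power(d):
    if d < 2:
        return False
    p = next(q for q in range(2, d + 1) if d % q == 0)  # smallest prime factor
    while d % p == 0:
        d //= p
    return d == 1  # True iff d was a power of a single prime

print(divisors)                                    # [1, 2, 3, 6, 9, 18, 27, 54]
print([p for p in divisors if is_prime(p)])        # [2, 3]  (Cauchy)
print([d for d in divisors if is_prime_power(d)])  # [2, 3, 9, 27]  (Sylow I)
```

Orders 6 and 18 appear in the first list but in neither of the others, matching the discussion above.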
Since algebra is about identifying structure, it makes sense that we should identify when
two groups are morally the same. Take for example Z_4, U_4 and K_4 with their respective
operations. Notice that both Z_4 and U_4 are cyclic. Indeed,
1 · 1 = 1, 2 · 1 = 2, 3 · 1 = 3, 4 · 1 = 0 in Z4
and
i¹ = i, i² = −1, i³ = −i, i⁴ = 1 in U_4.
This gives us a natural bijection between the two groups. We could map 1 ↦ i and specify
that the map preserve group operations. However, K_4 is not cyclic since every element has
order 2. Thus, while there are bijections from K_4 to Z_4, none of them will respect the group
structure. In this case, we say that Z_4 and U_4 are isomorphic while Z_4 and K_4 are not. We
define this notion here.
Definition 14 (Isomorphism). Let (G, ∗) and (H, ⋆) be two groups. These groups are
said to be isomorphic if there is a bijective map φ : G → H such that

φ(x ∗ y) = φ(x) ⋆ φ(y)

for all x, y ∈ G. Such a map φ is called a (group) isomorphism. When such a map exists,
we write G ≅ H.
As stated above, groups which are isomorphic can be thought of as morally the same
group up to renaming elements. We may also call isomorphic groups structurally identical,
whereupon we would refer to non-isomorphic groups as structurally distinct. All algebraic
properties of a group are preserved under isomorphism: number of solutions to equations
like x² = y for fixed y; commutativity of the operation; number of subgroups of a given order,
etc. Using this concept, we can classify how many groups there are of a given order up to
isomorphism. In general, this is a humongous undertaking (the classification of finite, simple
groups requires hundreds of pages for example) but there are a few theorems which help us
classify groups with nice structure.
Proposition 15. Any cyclic group is isomorphic to either Z or Zn for some finite n.
This proposition essentially closes the book on cyclic groups: we know essentially everything
we will ever need to know about cyclic groups from an algebraic standpoint since they
are just the integer groups Z and Z_n.
The next goal may be to classify abelian groups. While in general this is still too difficult,
if we add an extra property, we can achieve a full classification. We build toward this.
Definition 16 (Direct Product). Let (G, ∗) and (H, ⋆) be groups. We define the
Cartesian product

G × H = {(g, h) : g ∈ G, h ∈ H}.

The direct product group is the group (G × H, ·) with operation defined by

(g₁, h₁) · (g₂, h₂) = (g₁ ∗ g₂, h₁ ⋆ h₂).

This gives us a way to build larger groups out of smaller groups. Using this, we can see
for example that

R^n ≅ R × R × · · · × R (n times).
What is interesting is how the direct product works with finite groups. For any finite groups
G of order n and H of order m, G × H will have order nm. It is quite easy to see that the
direct product of two abelian groups is again an abelian group. However, the same is not
true of cyclic groups. Indeed, Z₂ is certainly cyclic, but Z₂ × Z₂ is not: every non-identity
element of Z₂ × Z₂ has order 2 (in fact, Z₂ × Z₂ ≅ K₄).
Proposition 17. Suppose that G, H are groups and that x ∈ G has order n and y ∈ H has
order m. The order of (x, y) ∈ G × H is lcm(n, m). In particular, Z_n × Z_m is cyclic if and
only if n and m are relatively prime.
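Proposition 17 can be tested by brute force in additive notation, where the order of (x, y) in Z_n × Z_m is the least k with kx ≡ 0 (mod n) and ky ≡ 0 (mod m). The helper names below are mine:

```python
from math import gcd

def order(elem, mods):
    """Order of elem = (x, y, ...) in Z_{n1} x Z_{n2} x ... (additive)."""
    k = 1
    while any((k * x) % n for x, n in zip(elem, mods)):
        k += 1
    return k

# (1, 1) generates Z_4 x Z_9: its order is lcm(4, 9) = 36 = |group| ...
assert order((1, 1), (4, 9)) == 36
# ... but Z_2 x Z_2 has no element of order 4, so it is not cyclic.
assert all(order((x, y), (2, 2)) <= 2 for x in range(2) for y in range(2))
print("ok")
```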
This proposition continues to hold when we add more factors. For example, Z_n × Z_m × Z_k
is cyclic if and only if n, m, k are pairwise relatively prime. Using these direct products, we
can build all abelian groups which satisfy one more property.
Definition 18 (Finitely Generated Group). Let (G, ∗) be a group and let S ⊆ G. The
subgroup generated by S is defined to be the set of all finite combinations of elements of S
and their inverses. A group is said to be finitely generated if it is equal to the subgroup
generated by a finite set of elements. For example, the subgroup generated by {a, b} contains
the elements

ab, b⁻¹a²b³, a²b⁻⁵ab⁷, etc.
Thus a group is finitely generated if it can be built from just a few elements in the group.
From this definition it is clear that all cyclic groups are finitely generated. It is also
clear that any finite group is finitely generated since the entire group can be taken as the
generating set. The groups Z^n = Z × Z × · · · × Z are prototypical examples of groups
which are infinite, non-cyclic and finitely generated. Indeed, Z^n is generated by the n
elements e₁ = (1, 0, ..., 0), ..., e_n = (0, ..., 0, 1). The finitely generated abelian groups admit
a complete classification.

Theorem 19 (Fundamental Theorem of Finitely Generated Abelian Groups).
Suppose that G is a finitely generated abelian group. Then

G ≅ Z^r × Z_{p₁^k₁} × Z_{p₂^k₂} × · · · × Z_{pℓ^kℓ}

for some integer r ≥ 0, where p₁, ..., pℓ are (not necessarily distinct) prime numbers and
k₁, ..., kℓ are (not necessarily distinct) positive integers. Equivalently, G ≅ Z^r × Z_{m₁} ×
Z_{m₂} × · · · × Z_{m_k} where each mⱼ divides mⱼ₊₁.
This theorem allows us to identify the number of abelian groups of a given finite order.
Example 20. How many structurally distinct abelian groups have order 540?
Solution. We can factor 540 = 2² · 3³ · 5. Now we simply need to find how many ways we
can write 540 = m₁ · · · m_k where each mⱼ divides mⱼ₊₁, or 540 = p₁^k₁ · · · pℓ^kℓ. Personally, I find
the latter easier. Listing the prime-power factorizations, we see

(i) 540 = 2 · 2 · 3 · 3 · 3 · 5
(ii) 540 = 2² · 3 · 3 · 3 · 5
(iii) 540 = 2 · 2 · 3 · 3² · 5
(iv) 540 = 2² · 3 · 3² · 5
(v) 540 = 2 · 2 · 3³ · 5
(vi) 540 = 2² · 3³ · 5

so there are six structurally distinct abelian groups of order 540. The corresponding
invariant-factor decompositions are

(i) 540 = 3 · 6 · 30
(ii) 540 = 3 · 3 · 60
(iii) 540 = 6 · 90
(iv) 540 = 3 · 180
(v) 540 = 2 · 270
(vi) 540 = 540,

and the two lists are matched up by the isomorphisms

Z₂ × Z₂ × Z₃ × Z₃ × Z₃ × Z₅ ≅ Z₃ × Z₆ × Z₃₀
Z₄ × Z₃ × Z₃ × Z₃ × Z₅ ≅ Z₃ × Z₃ × Z₆₀
Z₂ × Z₂ × Z₃ × Z₉ × Z₅ ≅ Z₆ × Z₉₀
Z₄ × Z₃ × Z₉ × Z₅ ≅ Z₃ × Z₁₈₀
Z₂ × Z₂ × Z₉ × Z₅ ≅ Z₂ × Z₂₇₀
Z₄ × Z₂₇ × Z₅ ≅ Z₅₄₀.
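The count in Example 20 generalizes: the number of abelian groups of order n is the product of p(kᵢ) over the factorization n = ∏ pᵢ^kᵢ, where p(k) is the number of partitions of k. A sketch (function names mine, not from the notes):

```python
from functools import lru_cache

@lru_cache(None)
def partitions(k, max_part=None):
    """Number of ways to write k as a sum of positive integers <= max_part."""
    if max_part is None:
        max_part = k
    if k == 0:
        return 1
    return sum(partitions(k - p, p) for p in range(1, min(k, max_part) + 1))

def count_abelian_groups(n):
    """Product of partition counts of the prime exponents of n."""
    total = 1
    p = 2
    while p * p <= n:
        k = 0
        while n % p == 0:
            n //= p
            k += 1
        if k:
            total *= partitions(k)
        p += 1
    if n > 1:
        total *= partitions(1)  # one leftover prime with exponent 1
    return total

print(count_abelian_groups(540))  # 6, matching the example: p(2)*p(3)*p(1)
```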
While isomorphisms preserve all the algebraic properties of a group, requiring the map
to be bijective is very stringent. Indeed, a map could still preserve group structure without
satisfying this requirement.

Definition 21 (Homomorphism). Let (G, ∗) and (H, ⋆) be groups. A map φ : G → H is
called a (group) homomorphism if φ(x ∗ y) = φ(x) ⋆ φ(y) for all x, y ∈ G.

Homomorphisms compress a group down into a smaller group while still maintaining much of
the structure of the original group. It is easy to check that if G is a group and φ : G → H is
a map which respects the operations, then φ(G) ⊆ H is indeed a group (thus homomorphisms
are the morphisms in the category of groups). Note that every isomorphism is a homomorphism
but not conversely. Homomorphisms have many nice properties, some of which we list here.
For example: if G has finite order, then |φ(G)| divides |G|; likewise, if H has finite order,
then |φ(G)| divides |H|. Also, the kernel of φ,

ker φ = φ⁻¹({e′}) = {x ∈ G : φ(x) = e′}

(where e′ is the identity of H), is a subgroup of G.
Finally, until now we have only considered a single binary operation on a set. While
group theory is useful in its own right, most spaces that we are used to dealing with are
naturally endowed with two binary operations. Indeed, all of N, Z, Q, R, C have both addition
and multiplication, as does R^(n×n). To account for such spaces, we need new algebraic
structures.
Definition 23 (Ring). A ring is a set R with two binary operations + and · (which we
call addition and multiplication) such that (R, +) is an abelian group (with identity 0 and
inverse −x for each x) and · is associative. Further, we require + and · to be compatible in
the sense of distributivity:
x · (y + z) = (x · y) + (x · z) and (y + z) · x = (y · x) + (z · x), for all x, y, z ∈ R.
If there is a multiplicative identity, we call this element 1 and we say R is a ring with unity.
If · is commutative, we say that R is a commutative ring.
One thing to notice about this definition is that multiplication need not be a nice operation
in a ring and indeed, it often is not. For example, R^(n×n) forms a ring under matrix
addition and matrix multiplication. However, under multiplication, there is no commutativity
and there are no guaranteed inverse elements. In more extreme examples, there is not even
necessarily an identity element for multiplication. More quotidian examples of rings are
Z and R. Again, in Z there are no multiplicative inverses (besides ±1), but multiplication is
commutative. In R, each element (other than the additive identity) has both an additive inverse
and a multiplicative inverse. Other examples of rings are Z_n under addition and multiplication
modulo n, R^R (the functions from R to R), and Z[i] = {a + bi : a, b ∈ Z}, the Gaussian integers.
As a shorthand, we usually drop the dot and write xy rather than x · y. Also, to avoid
drawing too many parentheses, it is convenient to adopt an order of operations wherein
multiplication has priority over addition; thus xy + z = (x · y) + z, for example.
Many of the definitions for groups have direct analogs for rings. For example a subring
is a subset of a ring which is itself a ring. A ring homomorphism is an operation preserving
map from one ring to another (note, there are two operations to preserve here).
Since ring multiplication can be bad, we typically do not have factorization like we do
in R. For example, in R if xy = 0, then either x = 0 or y = 0. This is not true in Z6
where 2 · 3 = 0 for example. However, if an element has a multiplicative inverse, then we can
perform this sort of cancellation. Thus the non-zero elements x such that xy = 0 for some
other non-zero y can be seen as the bad elements in a ring.
Definition 24 (Units & Zero Divisors). Let (R, +, ·) be a ring with unity 1. An element
x ∈ R is called a unit if there is an element x⁻¹ such that xx⁻¹ = x⁻¹x = 1; in this case x⁻¹
is called the multiplicative inverse of x. A non-zero element x ∈ R is called a zero divisor if
there is a non-zero y ∈ R such that xy = 0 or yx = 0.

From the definitions, it is clear that no zero divisor can be a unit and vice versa. The
matrices R^(n×n) form a ring which has zero divisors. For example, the matrix A = (0 1; 0 0)
satisfies A² = 0 and is thus a zero divisor in R^(2×2). We can use these concepts to further
add structure to the ring.

Definition 25 (Integral Domain). A commutative ring with unity which has no zero
divisors is called an integral domain.
The prototypical integral domain is Z (hence the name integral domain). However, as
stated before, Z lacks one final property that is shared by R and C, for example: multiplicative
inverses.

Definition 26 (Field). An integral domain in which every non-zero element is a unit is
called a field.
Commonly encountered fields include Q, R, C. However, there are finite fields as well.
For example, Zp is a field under addition and multiplication modulo p when p is prime. In
fact, all finite fields have order p^k for some prime p, though the field of order p^k is not Z_{p^k}
unless k = 1. Another common proposition states that all finite integral domains are fields.
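A brute-force check (illustrative helper names, not from the notes) that every non-zero element of Z₇ is a unit while Z₆ has zero divisors:

```python
def units(n):
    """Elements of Z_n with a multiplicative inverse mod n."""
    return [x for x in range(1, n)
            if any((x * y) % n == 1 for y in range(1, n))]

def zero_divisors(n):
    """Non-zero x with x*y = 0 mod n for some non-zero y."""
    return [x for x in range(1, n)
            if any((x * y) % n == 0 for y in range(1, n))]

print(units(7))          # [1, 2, 3, 4, 5, 6]  -> Z_7 is a field
print(zero_divisors(6))  # [2, 3, 4]
print(units(6))          # [1, 5]
```

Notice that in Z₆ the units and the zero divisors together account for all non-zero elements, consistent with the remark that no zero divisor can be a unit.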
Complex Analysis
Complex analysis deals with the extension of calculus to the complex plane C. The com-
plex numbers C are usually defined as the algebraic completion of R; i.e., the smallest field
containing R over which all polynomials can be factored. The motivation is that in the real
numbers, certain polynomials like p(x) = x² + 1 have no roots. To remedy this, we add a
number i, called the imaginary unit, with the property that i² = −1; thus i and −i are
roots of p. Then all roots of all polynomials with real coefficients can be written in the form
a + bi for some real numbers a, b.
One thing we may immediately recognize is that whenever we had a real polynomial, the
imaginary roots came in pairs a ± bi. We call these roots a conjugate pair. Indeed, for any
z = x + yi ∈ C, we can define the complex conjugate of z by z̄ = x − yi. A complex number
is real iff its imaginary part is zero, which is equivalent to z = z̄. It is useful to consider
what is lost when we move from R to C; indeed, we gain algebraic closure but we lose total
ordering. Thus the statement z < w is meaningless for z, w 2 C unless it happens to be the
case that z, w 2 R.
We can think of the real part and the imaginary part of a complex number as being
independent, representing each with an axis. In this way, we can geometrically (and topologically)
associate C with R² under the map (x, y) ∈ R² ↦ x + yi ∈ C. For this reason, we often refer
to C as the complex plane. Then, just as we found polar coordinates useful in R2 , they will
be useful in C.
Definition 28 (Magnitude & Argument). Let z = x + yi ∈ C. We define the magnitude
of z by |z| = √(x² + y²). This can be thought of as the length from the origin to the
point x + yi. We also define an argument of z to be an angle θ which z makes with the
positive real axis. This is not a unique value since we could always add or subtract 2π to get
another argument of z. Thus we define the principal argument of z (denoted Arg z) to be
the argument of z which lies in (−π, π].
From simple plane geometry, we see that |z|² = zz̄ and Arg z = arctan(Im z / Re z),
possibly accounting for a phase shift. Note that the magnitude gives us a sense of distance
in the complex plane: while |z| denotes the length from the origin to z, we also have that
|z − w| denotes the length from z to w. Thus, for example, {z ∈ C : |z − w| < 2} is the set
of all complex numbers whose distance from w is less than 2; we know this to be the open
disk of radius 2 centered at w.
Definition 29 (Polar Form). Let z ∈ C. The polar form of z is given by z = re^{iθ} where
r = |z| and θ = Arg z.
Note that the polar form of a complex number is the same as the polar form of a vector,
(x, y) ↔ (r cos(θ), r sin(θ)), once we recall Euler's identity e^{iθ} = cos(θ) + i sin(θ). This polar
form is useful in computing powers of z. For example, if z = re^{iθ} then z⁵ = r⁵e^{5iθ} is much
easier to compute than z⁵ = (x + yi)⁵, and similarly for z^{1/3} = r^{1/3}e^{iθ/3}. However, in this last
example some care is needed. While most functions translate nicely from the real numbers
to the complex numbers (as we will see shortly), fractional powers do not. For example, in
the real numbers the map x ↦ x^{1/3} is a well-defined bijection, but in the complex numbers
the map z ↦ z^{1/3} is not well-defined. Thus we simply need to be careful manipulating roots
in the complex plane. For example, in general

√(ab) ≠ √a · √b
for complex a, b. We see that while xⁿ = 1 has only 1 or 2 roots for real x, the equation
zⁿ = 1 has n roots for complex z. We can find these using the polar form of z.

Definition 30 (Roots of Unity). The nth roots of unity are the n solutions to zⁿ = 1 in
C. Setting ω_n = e^{2πi/n}, we see that the roots of unity are given by 1, ω_n, ω_n², ..., ω_n^{n−1}.
Visually, these roots of unity are n points which are evenly spaced around the unit circle.
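These roots are easy to generate and check numerically (`roots_of_unity` is my name for the construction above):

```python
import cmath

def roots_of_unity(n):
    """The n solutions of z**n = 1: powers of w = exp(2*pi*i/n)."""
    w = cmath.exp(2j * cmath.pi / n)
    return [w ** k for k in range(n)]

for z in roots_of_unity(6):
    assert abs(z ** 6 - 1) < 1e-9   # each is a sixth root of 1
    assert abs(abs(z) - 1) < 1e-12  # each lies on the unit circle
# For n > 1 the roots sum to zero, since they are evenly spaced on the circle.
assert abs(sum(roots_of_unity(6))) < 1e-9
print("ok")
```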
Next we want to deal with functions of complex variables. Let f : C → C. Then for
every z ∈ C, we have f(z) ∈ C, which means that f(z) = Re(f(z)) + Im(f(z))i; that is, f will
have a real and an imaginary part, which are maps from C → R. With the understanding that
x is the real part of z and y is the imaginary part of z, we often write the real and imaginary
parts of f as u(x, y) and v(x, y) respectively. Thus, somewhat confusingly, we will often write
f(z) = u(x, y) + iv(x, y) with u, v : R² → R. Other times it is more convenient to forgo this
and just work with f(z); however, any function f(z) does have the above decomposition into
real and imaginary parts.
For example, the function f (z) = z 2 can be written in terms of its real and imaginary
part as follows:
f(z) = (x + iy)² = (x² − y²) + i(2xy).

Thus in this example, u(x, y) = x² − y² and v(x, y) = 2xy. It is worthwhile to define a few
commonly used complex functions.
cos²(z) + sin²(z) = 1, cos(z + 2π) = cos(z), and e^{z+w} = e^z e^w for all z, w ∈ C.

These can all be written in terms of real and imaginary parts as well. For example,
f(z) = e^z can be written as

e^z = e^x cos(y) + i e^x sin(y),

so that u(x, y) = e^x cos(y) and v(x, y) = e^x sin(y).
We want to define calculus over C, and in order to do so, we need to define limits. We
can do this in the same way as in R since the magnitude gives us a metric on C: we say
that f(z) → L as z → a if for every ε > 0 there exists δ > 0 such that |f(z) − L| < ε
whenever 0 < |z − a| < δ. In this case, we write lim_{z→a} f(z) = L; if no such L exists, we
say that the limit does not exist.
All of the same limit rules from ordinary calculus hold for the complex limit as well.
Using this definition of the limit, we can define derivatives in the same way that we do in R
as well.
f′(z) = lim_{h→0} (f(z + h) − f(z)) / h
whenever the limit exists. When the limit exists, we say f is (complex) differentiable at z.
If the limit does not exist, then we say that f is not differentiable at the point z.
Note again, the limit is taken in the context of complex numbers so h needs to be allowed
to approach zero from any direction in the complex plane.
We can use this definition to prove all the same derivative rules from ordinary calculus.
For example, if f(z) = z², note that

f′(z) = lim_{h→0} ((z + h)² − z²)/h = lim_{h→0} (2zh + h²)/h = lim_{h→0} (2z + h) = 2z,
just as one would expect. Likewise, if f(z) = e^z then f′(z) = e^z, and if f(z) = sin(z) then
f′(z) = cos(z). To stress that the limit must exist in all directions, we do another example.

Example. Show that f(z) = z̄ is not differentiable at z = 0.

Solution. We note that (f(h) − f(0))/h = h̄/h. Now if we allow h to approach 0 along the real
axis, we see that h̄/h = 1 → 1, while if we let h approach 0 along the imaginary axis, we have
h̄/h = −1 → −1. Since these do not agree, the limit does not exist and thus f(z) = z̄ is not
differentiable at z = 0.
The function f(z) = z̄ seems fairly simple and thus, at first glance, it may seem like it
should be differentiable, but we've proved it isn't (at least at z = 0). A natural question is:
which complex functions f(z) = u(x, y) + iv(x, y) are differentiable? It seems very logical that
if u, v are differentiable then f will be differentiable; however, for z̄ = x − iy the components
are differentiable, so clearly this is not strong enough. We investigate further. Suppose that
f(z) = u(x, y) + iv(x, y) is differentiable at z ∈ C and let h = h_x + ih_y ∈ C. Notice that
lim_{h→0} (f(z + h) − f(z))/h = lim_{h→0} [ (u(x + h_x, y + h_y) − u(x, y))/(h_x + ih_y) + i (v(x + h_x, y + h_y) − v(x, y))/(h_x + ih_y) ].
Since the limit can be taken in any direction, we consider taking the limit along the real and
imaginary axes respectively. Along the real axis we arrive at
f′(z) = lim_{h_x→0} [ (u(x + h_x, y) − u(x, y))/h_x + i (v(x + h_x, y) − v(x, y))/h_x ] = ∂u/∂x (x, y) + i ∂v/∂x (x, y).
However, along the imaginary axis, we see
f′(z) = lim_{h_y→0} [ (u(x, y + h_y) − u(x, y))/(ih_y) + i (v(x, y + h_y) − v(x, y))/(ih_y) ] = −i ∂u/∂y (x, y) + ∂v/∂y (x, y).
Equating real and imaginary parts of these two expressions, we arrive at the Cauchy-Riemann
equations:

∂u/∂x = ∂v/∂y and ∂u/∂y = −∂v/∂x.
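The Cauchy-Riemann equations can be checked numerically with finite differences. Below is a sketch (helper names mine) testing them at a sample point for f(z) = z², which is differentiable, and for f(z) = z̄, which is not:

```python
h = 1e-6  # finite-difference step

def partials(f, x, y):
    """Approximate u_x, u_y, v_x, v_y for f(x + iy) = u + iv."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return fx.real, fy.real, fx.imag, fy.imag

def satisfies_CR(f, x, y, tol=1e-4):
    """Check u_x = v_y and u_y = -v_x at the point x + iy."""
    ux, uy, vx, vy = partials(f, x, y)
    return abs(ux - vy) < tol and abs(uy + vx) < tol

print(satisfies_CR(lambda z: z * z, 1.3, -0.7))          # True
print(satisfies_CR(lambda z: z.conjugate(), 1.3, -0.7))  # False
```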
Again, most nice functions are holomorphic on all of C, or at least on "most" of C. (Recall
that f is holomorphic at a point if it is complex differentiable in a neighborhood of that point;
a function holomorphic on all of C is called entire.) For example, exponentials, polynomials,
sines, cosines and compositions thereof are holomorphic on all of C (i.e., they are entire).
Rational functions (like f(z) = 1/(1 − z)) are holomorphic except at isolated points in the
complex plane (such functions are called meromorphic). As I said, holomorphicity is actually
a phenomenal property. For example, when we discussed Calculus II, we noted that in R it
is not difficult to find smooth (i.e., infinitely differentiable) functions with no Taylor series
representation; this is not remotely possible in the complex plane.
In light of this proposition, many people define analyticity using Definition 35; i.e., they
make no distinction between analyticity and holomorphicity when discussing functions from
C → C. This is convenient in that one can define analyticity without discussing power
series, but I feel as though it suppresses some information regarding just how nice complex
differentiable functions are.
Another interesting break between real and complex differentiable functions is the following.
Theorem 37 (Liouville’s Theorem). The only bounded entire functions are constant.
This is, of course, patently false when considering real functions. For example, sin(x)
and cos(x) are bounded "entire" functions on R. There are also nontrivial smooth functions
with compact support on R; there are no such functions on C. An even stronger statement
is Picard's Little Theorem (this is typically not covered in undergraduate complex analysis
but can be a useful fact).

Theorem 38 (Picard's Little Theorem). Suppose that f : C → C is non-constant and
entire. Then the range of f is the entire complex plane, possibly missing a single point.
From here we move on to complex integration, but the ideas of analyticity and holomorphicity
will arise again. Just as we defined integration in R², the most natural extension of
one-dimensional integration to the complex plane is path integration: given a path C
parameterized by z(t) for t ∈ [a, b], we define

∫_C f(z) dz = ∫_a^b f(z(t)) z′(t) dt.
Example 40. Let f(z) = iz̄ and let g(z) = iz². Let C be the straight-line path from 0 to
1 + i, and let D be the path from 0 to 1 + i along the straight lines from 0 to 1 and then
from 1 to 1 + i. Calculate the line integrals of both f and g on both paths C and D.

Solution. The first path C is parameterized by z₀(t) = t + it, t ∈ [0, 1]. The second path
D is parameterized in two steps by z₁(t) = t for t ∈ [0, 1] and then z₂(t) = 1 + it for t ∈ [0, 1].
Thus the integrals of f are given by

∫_C f(z) dz = ∫₀¹ i(t − it)(1 + i) dt = i(1 − i)(1 + i) ∫₀¹ t dt = 2i · (1/2) = i

and

∫_D f(z) dz = ∫₀¹ it dt + ∫₀¹ i(1 − it) · i dt = i/2 + (−1 + i/2) = −1 + i.

For g, we have

∫_C g(z) dz = ∫₀¹ i(t + it)²(1 + i) dt = −(1 + i) ∫₀¹ 2t² dt = −2(1 + i)/3

and

∫_D g(z) dz = ∫₀¹ it² dt + ∫₀¹ i(1 + it)² · i dt = i/3 + (i(1 + i)³ − i)/3 = −2(1 + i)/3.
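These four integrals are easy to confirm numerically. A sketch using a midpoint-rule helper (the helper and its name are mine):

```python
def path_integral(f, z, dz, N=20000):
    """Midpoint-rule approximation of the integral of f(z(t)) z'(t) over [0, 1]."""
    total = 0
    for k in range(N):
        t = (k + 0.5) / N
        total += f(z(t)) * dz(t)
    return total / N

f = lambda z: 1j * z.conjugate()   # i * conj(z): path-dependent
g = lambda z: 1j * z * z           # i * z^2: entire, path-independent

# C: straight line from 0 to 1+i;  D: 0 -> 1, then 1 -> 1+i.
IC_f = path_integral(f, lambda t: t + 1j * t, lambda t: 1 + 1j)
ID_f = (path_integral(f, lambda t: t, lambda t: 1)
        + path_integral(f, lambda t: 1 + 1j * t, lambda t: 1j))
IC_g = path_integral(g, lambda t: t + 1j * t, lambda t: 1 + 1j)
ID_g = (path_integral(g, lambda t: t, lambda t: 1)
        + path_integral(g, lambda t: 1 + 1j * t, lambda t: 1j))

print(abs(IC_f - 1j) < 1e-6, abs(ID_f - (-1 + 1j)) < 1e-6)  # True True
print(abs(IC_g - ID_g) < 1e-6)                              # True
```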
In this example, we notice that the integral of f depended on the path whereas the integral
of g did not. This should be somewhat reminiscent of path integration in R². We have
a similar theorem here: if f has an antiderivative F (i.e., f = F′ with F holomorphic) on an
open set U, then for any path C in U from z₀ to z₁ we have ∫_C f(z) dz = F(z₁) − F(z₀), so
the integral is path-independent. Note that since differentiability implies analyticity, this
will only apply when f itself is differentiable in U, and we need f as a function of z to use
this. We can state this in another way.
This demonstrates integration for entire functions. A more difficult problem is to integrate
functions around paths in regions where the functions have singularities, i.e., points
where the function is undefined or blows up.
Functions with removable singularities are those that have simply been shifted at a single
point or are undefined there, such as

f(z) = z for z ≠ 0 with f(0) = 1,    or    f(z) = (z + 3i)/(z² + 9).
Functions with poles are those that blow up like a polynomial near certain points; for example,

f(z) = 1/((z − 1)(z − 2 + 3i)³)

has a degree one pole at z = 1 and a degree three pole at z = 2 − 3i. Essential singularities
usually occur when functions with poles are composed with non-polynomial functions; for
example, e^{1/z} has an essential singularity at z = 0.
Knowing the behavior of functions near poles will help us calculate the integral of f
around paths that encircle the poles.

Theorem (Residue Theorem). Suppose f is holomorphic on and inside a positively
oriented¹ simple closed curve C except at finitely many singular points a_k inside C. Then

∫_C f(z) dz = 2πi Σ_k Res(f, a_k),

where the sum ranges over the points a_k which lie inside the curve.
Example 47. Let C be the circle of radius 2 centered at the origin. Let

f(z) = 1/((z − 1)(z + i)(z − 2 − i)).

Find ∫_C f(z) dz.

Solution. Only the poles that are inside the curve contribute to the integral. The pole at
z = 2 + i is outside the contour and thus we can ignore it; we need only calculate the residues
at z = 1 and z = −i. We see

Res(f, 1) = lim_{z→1} (z − 1)/((z − 1)(z + i)(z − 2 − i)) = 1/((1 + i)(−1 − i)) = i/2.

Likewise,

Res(f, −i) = lim_{z→−i} (z + i)/((z − 1)(z + i)(z − 2 − i)) = 1/((−1 − i)(−2 − 2i)) = −i/4.

The integral is then

∫_C f(z) dz = 2πi (i/2 − i/4) = −π/2.
¹Here positive orientation means that the curve is traversed counterclockwise. A slight technicality: we
also need C to be rectifiable, i.e., to have finite length. Any reasonable curve will satisfy this.
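The value −π/2 can be verified by brute force: parameterize |z| = 2 and apply the trapezoid rule, which converges very rapidly for smooth periodic integrands (the helper `contour_integral` is an illustrative name):

```python
import cmath

def contour_integral(f, R=2.0, N=5000):
    """Trapezoid-rule approximation of the integral of f over the circle
    |z| = R, traversed counterclockwise."""
    total = 0
    for k in range(N):
        theta = 2 * cmath.pi * k / N
        z = R * cmath.exp(1j * theta)
        total += f(z) * 1j * z * (2 * cmath.pi / N)  # dz = i z dtheta
    return total

f = lambda z: 1 / ((z - 1) * (z + 1j) * (z - 2 - 1j))
I = contour_integral(f)
print(abs(I - (-cmath.pi / 2)) < 1e-8)  # True: matches 2*pi*i*(i/2 - i/4)
```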
This gives us a way to calculate integrals when the function f has only poles. We do not
yet have a definition for the residue when the function has an essential singularity. To deal
with this, we need to discuss Laurent expansions.

Definition 48 (Laurent Expansion). A Laurent expansion of f about a point z₀ is a
series representation

f(z) = Σ_{n=−∞}^{∞} c_n (z − z₀)ⁿ.

Typically Laurent expansions converge in annuli. If c_n = 0 for n < 0, then this will reduce
to the ordinary Taylor expansion.
Example 49. Find the Laurent expansions of f(z) = 5/((1 − z)(2i − z)) which converge for (a)
|z| < 1, (b) 1 < |z| < 2 and (c) |z| > 2.

Solution. Practically, we can compute Laurent expansions using some clever factoring and
our prior knowledge of Taylor series. For this function, we will first use partial fractions to
see

f(z) = (1 + 2i)/(2i − z) − (1 + 2i)/(1 − z).

From prior knowledge of Taylor series, we know that

1/(1 − z) = Σ_{n=0}^{∞} zⁿ

for |z| < 1, and

1/(2i − z) = (1/(2i)) · 1/(1 − z/(2i)) = (1/(2i)) Σ_{n=0}^{∞} (z/(2i))ⁿ

for |z/(2i)| < 1, or |z| < 2. When |z| < 1, both of these are convergent Taylor series and we
have

f(z) = (1 + 2i) ( (1/(2i)) Σ_{n=0}^{∞} (z/(2i))ⁿ − Σ_{n=0}^{∞} zⁿ )

for |z| < 1.
For 1 < |z| < 2, the latter Taylor series still converges; however, the former diverges, so
we need to do something more clever. We see

1/(1 − z) = −(1/z) · 1/(1 − (1/z)).

Now if |z| > 1, then |1/z| < 1, and so

1/(1 − z) = −(1/z) Σ_{n=0}^{∞} 1/zⁿ = −Σ_{n=1}^{∞} 1/zⁿ.

Replacing the first series with this one gives the Laurent expansion valid for 1 < |z| < 2;
part (c) is handled similarly by also expanding 1/(2i − z) in powers of 1/z.
For some functions, this can be very easy. For example, knowing the Taylor expansion
for f(z) = e^z makes it trivial to calculate the Laurent expansion for g(z) = e^{1/z} at z = 0.
Indeed, we see that

g(z) = e^{1/z} = Σ_{n=0}^{∞} 1/(n! zⁿ),

which converges for all z ≠ 0.
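Consistent with this expansion, the 1/z term forces ∮_{|z|=1} e^{1/z} dz = 2πi, since every other power of z integrates to zero around the circle. A quick numerical check (trapezoid rule on the unit circle, my setup):

```python
import cmath

# Integrate e^(1/z) counterclockwise around the unit circle; the answer
# should be 2*pi*i times the coefficient of 1/z, which is 1.
N = 2000
s = 0
for k in range(N):
    theta = 2 * cmath.pi * k / N
    z = cmath.exp(1j * theta)  # point on the unit circle
    s += cmath.exp(1 / z) * 1j * z * (2 * cmath.pi / N)
print(abs(s - 2j * cmath.pi) < 1e-8)  # True
```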
One main use for a Laurent expansion is that it helps us calculate the residue for a function
with an essential singularity: the residue Res(f, z₀) is precisely the coefficient c₋₁ of
(z − z₀)⁻¹ in the Laurent expansion of f about z₀. This shows, for example, that
Res(e^{1/z}, 0) = 1 and Res(cos(1/z), 0) = 0. Using this, we can apply the residue theorem
to calculate path integrals when the path encircles an essential singularity.
Example. Find ∫_C sin(1/(z² + 1)) dz, where C is a positively oriented contour which
encloses z = i but not z = −i.

Solution. The integrand has essential singularities at z = ±i. Recall, the path integral
is equal to 2πi times the sum of the residues which lie inside the path. The singularity at
z = −i lies outside of the contour so it can be ignored. We need only calculate the
residue at z = i. Here, we see

sin(1/(z² + 1)) = Σ_{n=0}^{∞} (−1)ⁿ / ((2n + 1)! (z² + 1)^{2n+1}).

Now all terms with n ≥ 1 will not contribute a (z − i)⁻¹ term since the power will be too
high. Thus we need to consider the n = 0 term. We have

1/(z² + 1) = 1/((z + i)(z − i)) = (i/2)/(z + i) − (i/2)/(z − i).

Now (z + i)⁻¹ has a Taylor series which converges in a neighborhood of z = i. Thus we have

sin(1/(z² + 1)) = −(i/2)/(z − i) + Σ_{n=0}^{∞} c_n (z − i)ⁿ + Σ_{n=1}^{∞} (−1)ⁿ / ((2n + 1)! (z² + 1)^{2n+1}),

so Res(sin(1/(z² + 1)), i) = −i/2 and the integral is 2πi · (−i/2) = π.
One last application of the residue theorem is evaluating integrals with infinite bounds.
For example, at times evaluating

∫_{−∞}^{∞} f(x) dx

can be very difficult using methods from Calculus II. However, we can envision f as a function
from C → C, and the line R is part of a contour in C. We demonstrate this in a final
example, but first we need a lemma.
Lemma (ML Inequality). Suppose that f is continuous on a curve C of finite length ℓ(C)
and that |f(z)| ≤ M for all z on C. Then |∫_C f(z) dz| ≤ M · ℓ(C).

This is the closest analog in complex analysis to the fact that the Riemann integral
preserves inequalities. With this, we can do one last example.
Example 53. Evaluate the integral ∫_{−∞}^{∞} cos(x)/(1 + x²) dx.
Solution. Using the integral methods from calculus, this looks fairly unapproachable.
However, it is easy to evaluate using the residue theorem. First note that since sin(x)/(1 + x²)
is odd, we have

∫_{−∞}^{∞} cos(x)/(1 + x²) dx = ∫_{−∞}^{∞} e^{ix}/(1 + x²) dx.

Consider a large R > 0 and let C be the contour which travels from −R to R along the real
axis and then from R back to −R along the semicircle C_R parameterized by z = Re^{iθ} for
θ ∈ [0, π].
Inside this contour, f(z) = e^{iz}/(1 + z²) has one pole, at z = i. Thus the value of the integral of
f over the contour is 2πi times the residue at that pole. We see

Res(f, i) = lim_{z→i} (z − i)e^{iz}/(1 + z²) = lim_{z→i} e^{iz}/(z + i) = 1/(2ie), so ∮_C f(z) dz = 2πi · 1/(2ie) = π/e.
On the other hand, we have

∮_C f(z) dz = ∫_{−R}^{R} e^{ix}/(1 + x²) dx + ∫_{C_R} e^{iz}/(1 + z²) dz.

When z is on the arc C_R, notice that |e^{iz}| = |e^{iR(cos(θ)+i sin(θ))}| = e^{−R sin(θ)} ≤ 1, where θ ∈ [0, π]
is the argument of z. Using this and the reverse triangle inequality, we see that for z on the
arc C_R, we have |e^{iz}/(1 + z²)| ≤ 1/|1 + z²| ≤ 1/(R² − 1). Hence, the ML inequality gives

|∫_{C_R} e^{iz}/(1 + z²) dz| ≤ ℓ(C_R)/(R² − 1) = πR/(R² − 1) → 0 as R → ∞.

Letting R → ∞, we conclude that ∫_{−∞}^{∞} cos(x)/(1 + x²) dx = π/e.
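As a sanity check, a direct trapezoid approximation of the real integral over a large window agrees with π/e ≈ 1.1557 (the window size and step below are my choices):

```python
import math

# Trapezoid approximation of the integral of cos(x)/(1+x^2) over [-L, L];
# the neglected tails are small, so a moderate L already matches pi/e.
L, N = 200.0, 400000
h = 2 * L / N
s = 0.0
for k in range(N + 1):
    x = -L + k * h
    w = 0.5 if k in (0, N) else 1.0  # trapezoid endpoint weights
    s += w * math.cos(x) / (1 + x * x)
s *= h
print(abs(s - math.pi / math.e) < 1e-3)  # True
```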
Topology
The field of topology is concerned with the shape of spaces and their behavior under
continuous transformations. Properties regarding shape and continuity are phrased using
the concept of open sets.
Definition 1 (Topology). Let X be a set and let τ be a collection of subsets of X. We
call τ a topology on X if (i) ∅, X ∈ τ; (ii) any union of members of τ is again in τ; and
(iii) any finite intersection of members of τ is again in τ.
In this case, we call the pair (X, τ) a topological space and we call the sets T ∈ τ open sets.
Note, there are two topologies which we can always place on any set X: the trivial topology
τ = {∅, X} and the discrete topology τ = P(X) (the power set of X). Having defined open
sets, we are able to define closed sets.

Definition 2 (Closed Set). Let (X, τ) be a topological space. A set C ⊆ X is called
closed if its complement X \ C is open.
The words open and closed can be a bit confusing here. Oftentimes students mistakenly
assume that a set is either open or closed; that these terms are mutually exclusive and
describe all sets. This is not the case. Indeed, sets can be open, closed, neither open nor
closed, or both open and closed. In any topological space (X, τ), the sets ∅ and X are both
open and closed. By De Morgan's laws, since finite intersections and arbitrary unions of open
sets are open, we see that finite unions and arbitrary intersections of closed sets remain closed.
Christian Parkinson GRE Prep: Topology & Real Analysis Notes

Example 3. The set of real numbers ℝ becomes a topological space with open sets defined as follows. Define ∅ to be open, and define ∅ ≠ T ⊂ ℝ to be open iff for all x ∈ T, there exists ε > 0 such that (x − ε, x + ε) ⊂ T. Prototypical open sets in this topology are the open intervals (a, b) = {x ∈ ℝ : a < x < b}. Indeed, this interval is open because for x ∈ (a, b), we can take ε = min{|x − a|, |x − b|} and we will find that (x − ε, x + ε) ⊂ (a, b).
We can combine open sets via unions or (finite) intersections to make more open sets; for example (0, 1) ∪ (3, 5) is also an open set. Likewise, prototypical closed sets are closed intervals [a, b] = {x ∈ ℝ : a ≤ x ≤ b}, and any intersection or (finite) union of such sets will remain closed. As was observed above, ∅ and ℝ are both open and closed; in fact, in this space, these are the only sets which are both open and closed, though it is easy to construct sets which are neither open nor closed. Consider the set [0, 1) = {x ∈ ℝ : 0 ≤ x < 1}. This set is not open because the point 0 is in the set, but it cannot be surrounded by an interval which remains in the set. The complement of this set is (−∞, 0) ∪ [1, ∞). This set is not open since 1 is in the set but cannot be surrounded by an interval which remains in the set. Since the complement is not open, the set [0, 1) is not closed. Note, this topology is called the standard topology on ℝ.
Example 4. While the above example defines the standard topology on ℝ, it is easy to come up with non-standard topologies as well. Indeed, let us now define T ⊂ ℝ to be open if T can be written as a union of sets of the form [a, b) = {x ∈ ℝ : a ≤ x < b}. These sets comprise a topology on ℝ, in which a prototypical open set is of the form [a, b). What other sets are open in this topology? Notice that

    (a, b) = ∪_{n=1}^{∞} [a + 1/n, b),

which shows that sets of the form (a, b) remain open in this topology. Also notice that since [a, b) is open, its complement

    [a, b)^c = (−∞, a) ∪ [b, ∞)

is closed. However, both (−∞, a) and [b, ∞) are easily seen to be open, so the set (−∞, a) ∪ [b, ∞) is also open as a union of open sets. Since this set is open, its complement [a, b) is closed. Hence in this topology, all sets of the form [a, b) are both open and closed. The intervals [a, b] are closed and not open in this topology. Note, this topology is called the lower limit topology on ℝ.
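The union identity (a, b) = ∪ₙ [a + 1/n, b) can be spot-checked pointwise; in this sketch (endpoints chosen arbitrarily for illustration), each sampled point x of (a, b) is shown to lie in [a + 1/n, b) once 1/n ≤ x − a:

```python
import math

a, b = 0.0, 1.0
for x in [0.001, 0.25, 0.5, 0.999]:
    assert a < x < b                 # x is a point of (a, b)
    n = math.ceil(1.0 / (x - a))     # choose n with 1/n <= x - a
    assert a + 1.0 / n <= x < b      # then x lies in [a + 1/n, b)
print("every sampled point of (a, b) lies in some [a + 1/n, b)")
```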
Notice that in these examples, the lower limit topology contains as open sets all of the sets which are open in the standard topology. In this way, the lower limit topology has "more" open sets, and we can think of the lower limit topology as "containing" the standard topology. We define these notions here.
Definition 5 (Finer & Coarser Topologies). Suppose that X is a set and τ, σ are two topologies on X. If τ ⊂ σ, we say that τ is coarser than σ and that σ is finer than τ.

On any space X, the finest topology is the discrete topology P(X) and the coarsest is the trivial topology {∅, X}. A finer topology is one that can more specifically distinguish between elements.
Definition 6 (Interior & Closure). Let (X, τ) be a topological space and let T ⊂ X. The interior of T is defined to be the largest open set contained in T. The closure of T is defined to be the smallest closed set containing T. We denote these by int(T) and cl(T) respectively. In symbols, we have

    int(T) = ∪_{S∈τ, S⊂T} S   and   cl(T) = ∩_{S^c∈τ, T⊂S} S.

Other common notations are T̊ for the interior of T, and T̄ for the closure of T.
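In a finite topological space, the interior and closure can be computed directly from these formulas. The helpers below are our own illustration, not from the notes:

```python
def interior(X, tau, T):
    out = set()
    for S in tau:                    # union all open sets contained in T
        if set(S) <= set(T):
            out |= set(S)
    return out

def closure(X, tau, T):
    out = set(X)
    for S in tau:                    # intersect all closed sets containing T
        C = set(X) - set(S)          # complements of open sets are closed
        if set(T) <= C:
            out &= C
    return out

X = {1, 2, 3}
tau = [set(), {1}, {1, 2}, {1, 2, 3}]     # a topology on X
print(interior(X, tau, {2, 3}))           # set(): no nonempty open set fits inside
print(closure(X, tau, {2}))               # {2, 3}
```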
In both of the examples above, there was some notion of a "prototypical" open set, from which other open sets can be built. We give this notion a precise meaning here.

Definition 8 (Basis (Base) for a Topology). Let X be a set and let β be a collection of subsets of X such that

    (1) X = ∪_{B∈β} B, and
    (2) for all B₁, B₂ ∈ β and all x ∈ B₁ ∩ B₂, there is B₃ ∈ β such that x ∈ B₃ ⊂ B₁ ∩ B₂.

Then the collection τ of all unions of members of β forms a topology on X. We call this τ the topology generated by β, and we call β a basis for the topology τ.
This is half definition and half theorem: we are defining what it means to be a basis, and asserting that the topology generated by a basis is indeed a topology. If we can identify a basis for a topology, then the basis sets are the "prototypical" open sets, and all other open sets can be built as unions of the basis sets. Morally, basis sets are representatives for the open sets; if you can prove a given property for basis sets, the property will likely hold for all open sets. It is often easiest to define a topology by identifying a basis.
Example 9. Above we defined the standard topology on ℝ by saying that a set T is open if for all x ∈ T, there is ε > 0 such that (x − ε, x + ε) ⊂ T. It is important to see this definition of the topology; however, it is more analytic than topological in flavor. The topological way to define the standard topology on ℝ would be to define it as the topology generated by the sets (a, b) where a, b ∈ ℝ, a < b. Indeed, these two definitions of the standard topology are equivalent, as the following proposition shows.
Proposition 10. Suppose that (X, τ) is a topological space and that β is a basis for the topology τ. Then T ∈ τ iff for all x ∈ T, there is B ∈ β such that x ∈ B and B ⊂ T.

It is important to identify when two bases in a topological space generate the same topology, and this next proposition deals with that question.

Proposition 11. Suppose that X is a set and β₁, β₂ are two bases for topologies τ₁ and τ₂. Then τ₁ ⊂ τ₂ iff for every B₁ ∈ β₁ and for every x ∈ B₁, there is B₂ ∈ β₂ such that x ∈ B₂ and B₂ ⊂ B₁. (Be very careful not to mix up the inclusions in this statement. What this is essentially saying is that β₂ generates a larger (finer) topology iff β₂ has more (smaller) sets.) Informally, the basis β₂ generates a finer topology if we can squeeze basis sets from β₂ inside basis sets from β₁ (and not only that, but we can construct basis sets from β₁ out of basis sets from β₂).
Just as all groups have subgroups and all vector spaces have subspaces, there is a natural way to define subspaces of a topological space.

Definition 12 (Subspace Topology). Let (X, τ) be a topological space and let Y ⊂ X. The collection τ_Y = {T ∩ Y : T ∈ τ} is a topology on Y, called the subspace topology.

Again, this is part definition and part theorem; we are asserting that such a collection does indeed define a topology on Y.

Example 13. Consider [0, 3] ⊂ ℝ with the standard topology on ℝ. Note that the subspace topology on [0, 3] includes standard open sets like (1, 2), since this set is open in ℝ and (1, 2) = (1, 2) ∩ [0, 3]. Now consider the set (1, 3]. This set is not open in ℝ; however, it is open in the subspace topology on [0, 3], because (1, 4) is open in ℝ and (1, 3] = (1, 4) ∩ [0, 3].
Definition 14 (Product Topology). Let (X, τ) and (Y, σ) be topological spaces. Recall the Cartesian product is given by coupling elements of X and Y: X × Y := {(x, y) : x ∈ X, y ∈ Y}. It is tempting to define a topology on X × Y comprised of sets of the form T × S for T ∈ τ, S ∈ σ. However, these do not form a topology on X × Y, since a union of sets of this form will not be of this form anymore. So rather, we let β = {T × S : T ∈ τ, S ∈ σ} form the basis for a topology on X × Y. The topology generated by β is denoted τ × σ, and the space (X × Y, τ × σ) is called the product space.
Example 15. Consider ℝ with the standard topology, which we call τ₁. The product space (ℝ × ℝ, τ₁ × τ₁) can be visualized by drawing the plane with standard open sets as rectangles (a, b) × (c, d) = {(x, y) ∈ ℝ × ℝ : a < x < b and c < y < d}. Alternatively, we can consider the set ℝ² of 2-dimensional vectors. On this space, we consider the topology τ₂ generated by open disks: B_r(v) = {z ∈ ℝ² : ‖v − z‖ < r} for v ∈ ℝ² and r > 0 (indeed, this is called the standard topology on ℝ²). We can identify each vector v = (x, y) ∈ ℝ² with the coordinates (x, y) ∈ ℝ × ℝ. Since this is a bijective map, the sets ℝ × ℝ and ℝ² are really the same. We'd like to know if the topologies τ₁ × τ₁ and τ₂ are the same. To prove they are the same, consider the bases β₁ = {(a, b) × (c, d) : a < b, c < d} and β₂ = {B_r(v) : v ∈ ℝ², r > 0}. For any B₁ = (a, b) × (c, d) ∈ β₁, take any (x, y) ∈ B₁ and let r = min{x − a, b − x, y − c, d − y}. Then for v = (x, y), we will have (x, y) ∈ B_r(v) ⊂ B₁; this shows that for any (x, y) ∈ B₁, we can find a set B₂ ∈ β₂ such that (x, y) ∈ B₂ and B₂ ⊂ B₁. Hence by Proposition 11, τ₁ × τ₁ ⊂ τ₂. Conversely, let v = (x, y) and r > 0, and consider the set B₂ = B_r(v) ∈ β₂. For any u = (z, w) ∈ B_r(v), define r′ = (r − ‖u − v‖)/√2. Then the square B₁ = (z − r′, z + r′) × (w − r′, w + r′) satisfies u ∈ B₁ and B₁ ⊂ B₂. Thus by Proposition 11, we have τ₂ ⊂ τ₁ × τ₁, and we can conclude that τ₂ = τ₁ × τ₁. (This inclusion of basis sets is pictured in Figure 1.) That is, the standard topology on ℝ² is the product of two copies of the standard topology on ℝ. More generally, for n ∈ ℕ, we can define the standard topology on ℝⁿ to be the topology generated by open balls, and we will find that this is the same as the product of n copies of the standard topology on ℝ.
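The choice r = min{x − a, b − x, y − c, d − y} in the first half of the argument can be spot-checked numerically; this sketch (with an arbitrarily chosen rectangle and point, our own illustration) samples the disk B_r(v) and confirms it sits inside the rectangle:

```python
import math
import random

a, b, c, d = 0.0, 3.0, 1.0, 2.0      # the rectangle (a, b) × (c, d)
x, y = 2.5, 1.2                      # an arbitrary point inside it
r = min(x - a, b - x, y - c, d - y)  # = 0.2 here

random.seed(0)
for _ in range(10000):
    t = random.uniform(0.0, 2.0 * math.pi)
    s = r * math.sqrt(random.random())       # radius: sample over the open disk
    px, py = x + s * math.cos(t), y + s * math.sin(t)
    assert a < px < b and c < py < d         # every sample stays in the rectangle
print("B_r(v) is contained in the rectangle (spot-checked)")
```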
Topology gives us the minimum structure required to discuss limits and continuity. Indeed, in calculus we were only able to discuss these things because ℝ is naturally a topological space with the standard topology. We give the topological definitions of limits and continuity here and discuss some of their properties.

Definition 16 (Limit of a Sequence). Let (X, τ) be a topological space and let {x_n}_{n=1}^∞ be a sequence of values in X. We say that {x_n}_{n=1}^∞ converges to a limit x ∈ X if for any open set T ∈ τ such that x ∈ T, there is N ∈ ℕ such that x_n ∈ T for all n ≥ N. We write this as x_n → x or lim_{n→∞} x_n = x.
¹ Note, this is only a good definition for the product topology when we are taking the product of a finite number of spaces. Indeed, if {(X_i, τ_i)}_{i∈I} is an arbitrary collection of topological spaces, it is most natural to define the product topology on X = ∏_{i∈I} X_i to be the coarsest topology so that the projection maps π_i : X → X_i are continuous. The topology generated by sets of the form ∏_{i∈I} U_i where U_i ∈ τ_i is then called the box topology. One can show that for a finite Cartesian product, the product topology and the box topology agree with each other; this is not necessarily true for infinite products. (Another way to "correctly" define the product topology for an infinite product X = ∏_{i∈I} X_i is to let it be generated by sets of the form ∏_{i∈I} U_i where U_i ∈ τ_i and U_i = X_i for all but finitely many i ∈ I.)
Figure 1: A basis set from either topology τ₁ × τ₁ or τ₂ can be fit around any point in a basis set from the other topology.
Note that this is a generalization of the definition we gave for a limit in calculus; in calculus we are always using the standard topology on ℝ. One feature of the limit in calculus is that limits are unique: if x_n → x and x_n → y, then x = y. This is not true in a general topological space.
Example 17. Consider a space X with the trivial topology τ = {∅, X}. Take any sequence {x_n}_{n=1}^∞ in X and apply the definition of the limit. For any x ∈ X and any N ∈ ℕ, we see that if T ∈ τ and x ∈ T, then T = X and x_n ∈ T for all n ≥ N. Thus in this space, every sequence converges to every point.
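For an eventually constant sequence in a finite space, the topological limit definition reduces to a finite check, so the phenomenon above can be demonstrated directly (an illustrative sketch of ours, not from the notes):

```python
def converges(tau, tail, x):
    # For a sequence that is eventually constant equal to `tail`, the limit
    # definition reduces to: every open set containing x also contains `tail`
    # (take N past the point where the constant tail begins).
    return all(tail in T for T in tau if x in T)

X = {1, 2, 3}
trivial = [set(), X]
discrete = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, X]

# The sequence 2, 2, 2, ... converges to EVERY point in the trivial topology,
print(sorted(x for x in X if converges(trivial, 2, x)))   # [1, 2, 3]
# but only to 2 in the discrete topology.
print(sorted(x for x in X if converges(discrete, 2, x)))  # [2]
```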
There are non-trivial topologies where limits are still non-unique, but our intuition tells us that limits should be unique, and we can add a simple property to ensure that they are.

Definition 18 (Hausdorff Space). A topological space (X, τ) is called Hausdorff iff for all x, y ∈ X with x ≠ y, there are open sets U, V ∈ τ such that x ∈ U, y ∈ V and U ∩ V = ∅.

Proposition 19. Limits in Hausdorff spaces are unique. That is, if (X, τ) is a Hausdorff space and {x_n}_{n=1}^∞ is a sequence in X, then {x_n}_{n=1}^∞ can have at most one limit x ∈ X.
Now we would like to discuss maps between spaces. As with linear transformations in linear algebra and homomorphisms in abstract algebra, we restrict our discussion to maps which preserve some of the underlying structure of the space. In topology, these are the continuous functions.

Definition 20 (Continuous Function). Let (X, τ) and (Y, σ) be two topological spaces. A function f : X → Y is said to be continuous iff

    f⁻¹(V) = {x ∈ X : f(x) ∈ V} ∈ τ whenever V ∈ σ.

That is, f is continuous if the preimage of every open set in Y is open in X.
This gives us a way to define continuity without ever considering individual points. Contrast this with the calculus definition of continuity, where we first define what it means for a function to be continuous at a point, and then define continuous functions on a domain to be those which are continuous at each point. Of course, topology still has a notion of what it means to be continuous at a point.

Definition 21 (Pointwise Continuity). Let (X, τ) and (Y, σ) be two topological spaces and let f : X → Y. We say that f is continuous at the point x ∈ X if for all V ∈ σ such that f(x) ∈ V, there is U ∈ τ such that x ∈ U and f(U) ⊂ V.
It is a good exercise to prove that when both X and Y are ℝ with the standard topology, this definition of continuity is equivalent to the ε–δ definition of continuity presented in calculus.
A topological space X is called disconnected if it can be written as X = U ∪ V where U, V are nonempty, disjoint open sets; X is connected if it is not disconnected. One important result involving connectedness helps us classify sets which are both open and closed.

Proposition 24. Suppose that (X, τ) is a topological space. Then X is connected iff the only sets which are both open and closed in X are ∅ and X itself.

We say that a set C ⊂ X is compact iff whenever C ⊂ ∪_{i∈I} U_i with each U_i ∈ τ, there are finitely many indices i₁, …, i_K ∈ I such that C ⊂ U_{i₁} ∪ ⋯ ∪ U_{i_K}. Such a collection {U_i}_{i∈I} is called an open cover of C. Thus, in words, a set C is compact if every open cover of C admits a finite subcover.
Proposition 27. Suppose that (X, τ) and (Y, σ) are topological spaces and that f : X → Y is continuous. Let U ⊂ X and consider the image of U under f defined by f(U) := {f(x) : x ∈ U} ⊂ Y. If U is connected in X, then f(U) is connected in Y. Likewise, if U is compact in X, then f(U) is compact in Y.
This proposition tells us that the properties of connectedness and compactness are invariant under continuous maps. Statements like this help us characterize topological spaces.
Note however that the converse is not true: a continuous map could still map a disconnected set to a connected set (for example, the continuous function f(x) = x² on ℝ maps the disconnected set (−1, 0) ∪ (0, 1) to the connected set (0, 1)), or a non-compact set to a compact set (for example, the continuous map f(x) = sin(x) maps the non-compact set (0, ∞) to the compact set [−1, 1]). If we want a continuous map to not change the structure of a space at all, we need to require something more.
Definition 28 (Homeomorphism). Let (X, τ) and (Y, σ) be topological spaces. A function f : X → Y is called a homeomorphism iff

1. f is one-to-one,
2. f is onto,
3. f is continuous,
4. f⁻¹ is continuous.

If such a function f exists, the topological spaces (X, τ) and (Y, σ) are called homeomorphic and we write X ≅ Y.
Proposition 29. Let (X, τ) and (Y, σ) be topological spaces, let U ⊂ X and let f : X → Y be a homeomorphism. Then

    U is open in X iff f(U) is open in Y,
    U is closed in X iff f(U) is closed in Y,
    U is connected in X iff f(U) is connected in Y,
    U is compact in X iff f(U) is compact in Y,
    X is Hausdorff iff Y is Hausdorff.
Example 30. Consider ℝ with the usual topology. Any open interval (a, b) is homeomorphic to the interval (0, 1) under the map f : (a, b) → (0, 1) defined by

    f(x) = (x − a)/(b − a),   x ∈ (a, b).
Homeomorphism is an equivalence relation (in particular, if two spaces are homeomorphic to the same space, they are homeomorphic to each other); indeed, we can always compose homeomorphisms and obtain a homeomorphism. Thus any interval (a, b) is homeomorphic to ℝ.
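The map f(x) = (x − a)/(b − a) and its inverse g(y) = a + (b − a)y can be spot-checked numerically (our own sketch; the endpoints are chosen arbitrarily):

```python
a, b = -2.0, 5.0
f = lambda x: (x - a) / (b - a)        # (a, b) -> (0, 1)
g = lambda y: a + (b - a) * y          # its inverse (0, 1) -> (a, b)

for x in [-1.999, 0.0, 1.234, 4.999]:
    assert 0.0 < f(x) < 1.0            # f lands in (0, 1)
    assert abs(g(f(x)) - x) < 1e-12    # g undoes f
for y in [0.001, 0.5, 0.999]:
    assert a < g(y) < b                # g lands in (a, b)
    assert abs(f(g(y)) - y) < 1e-12    # f undoes g
print("f is a bijection (a, b) -> (0, 1) with inverse g (spot-checked)")
```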
Real Analysis
Real analysis is concerned with the rigorous underpinnings of calculus. However, when
we teach calculus, we do everything formally and so everything is assumed to be “nice”
(all the common functions from calculus are smooth, for example). Now we make no such
assumptions: analysis is largely about tearing down our intuition from calculus and building
it back up again with rigor. Accordingly, most real analysis courses start with the basic
construction of R; we discuss this formally later, but begin here by establishing properties of
sequences, functions, sets, etc. Much of this will overlap with the preceding topology notes, but often the same concepts are tackled in very different ways. Some of this will also be repeated from the Calculus I & II notes.
We’ll start by discussing general metric spaces, giving several definitions and theorems,
and later specialize the conversation to R.
Definition 31 (Metric Space). Let X be a set and let d : X × X → [0, ∞). We call d a metric on X (and call (X, d) a metric space) if the following three properties hold: (i) d(x, y) = 0 iff x = y; (ii) d(x, y) = d(y, x) for all x, y ∈ X; (iii) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.
Example 32. The prototypical example of a metric space is ℝ with the metric d(x, y) = |x − y|. This can be generalized to ℝⁿ. Indeed, in ℝⁿ, we define the metric

    d(x, y) = ‖x − y‖ := (Σ_{i=1}^{n} (x_i − y_i)²)^{1/2},   x, y ∈ ℝⁿ.

Another example: for any set X, we can define the discrete metric d(x, y) = 0 if x = y and d(x, y) = 1 if x ≠ y.
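The metric axioms for the discrete metric can be verified exhaustively on a small set (an illustrative check of ours, not from the notes):

```python
from itertools import product

def d(x, y):
    """The discrete metric: 0 on the diagonal, 1 off it."""
    return 0 if x == y else 1

X = ["a", "b", "c", "d"]
for x, y, z in product(X, repeat=3):
    assert (d(x, y) == 0) == (x == y)           # d(x,y) = 0 iff x = y
    assert d(x, y) == d(y, x)                   # symmetry
    assert d(x, z) <= d(x, y) + d(y, z)         # triangle inequality
print("the discrete metric satisfies all three axioms on X")
```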
Given a metric space (X, d), define the open ball of radius r > 0 centered at x ∈ X by B_r(x) = {y ∈ X : d(x, y) < r}. Let τ be the topology generated by the basis β = {B_r(x) : x ∈ X, r > 0}. This is called the metric topology on X. In this topology, a set U ⊂ X is open iff for all x ∈ U, there is r > 0 such that B_r(x) ⊂ U. We will also refer to τ as the topology generated by the metric d. Conversely, if we have a topological space (X, τ) and there is a metric d on X that generates τ, then we call τ metrizable.
For some examples: the discrete metric on a set will generate the discrete topology, and the standard metric d(x, y) = |x − y| on ℝ will generate the standard topology on ℝ.
Metric topologies have very nice structure. Most of the topological properties discussed above can be given new definitions using only the metric structure that d lends to X. Thus, the definitions in real analysis and topology may look different at first glance, but they are always compatible. We discuss some properties of metric topologies now.

Proposition 34. Let (X, d) be a metric space. The metric topology on (X, d) is Hausdorff.

This is very easy to prove: if x, y ∈ X and x ≠ y, then d(x, y) > 0. Then B_ε(x) and B_ε(y), where ε = d(x, y)/3, are disjoint open neighborhoods of x and y respectively, proving that the space is Hausdorff.
Definition 35 (Limit of a Sequence). Let (X, d) be a metric space and let {x_n}_{n=1}^∞ be a sequence in X. We say that x ∈ X is the limit of x_n iff for all ε > 0, there is N ∈ ℕ such that d(x, x_n) < ε for all n ≥ N. In this case, we say that x_n converges to x and we write x_n → x or lim_{n→∞} x_n = x.

This definition of the limit is exactly as in calculus but generalized to arbitrary metric spaces. Limits give us a way to characterize closed sets in metric topologies.

Definition 36 (Limit Points). Let (X, d) be a metric space and let U ⊂ X. A point x ∈ X is called a limit point of U if there is a sequence {x_n} in U such that x_n ≠ x for all n ∈ ℕ and x_n → x.
Proposition 37. Let (X, d) be a metric space and let C ⊂ X. Then C is closed in the metric topology iff for all sequences {x_n} in C converging to a limit x ∈ X, we have that x ∈ C. In the terminology of the above definition, a subset of a metric space is closed iff it contains all of its limit points.

Recall, in topology a set is closed iff the complement of the set is open. This theorem gives an alternate definition, and it is often easier to check that a set contains its limit points than to check that its complement is open.
Note, we already defined the closure in topology to be the smallest closed set containing a set. Again, these notions are compatible: the set Ū = U ∪ {limit points of U} is the smallest set which is closed in the metric topology and contains U. In many metric spaces, we can build any point in the space by considering a smaller set and taking limits.
Definition 39 (Dense Set). Let (X, d) be a metric space and let D ⊂ X. We say that D is dense in X iff for all x ∈ X, there is a sequence {x_n} in D such that x_n → x. Equivalently, D is dense in X iff for all x ∈ X and ε > 0, there is y ∈ D such that d(x, y) < ε. Again, equivalently, D is dense in X if D̄ = X. One last equivalent statement: D is dense in X if every nonempty open subset of X contains at least one point of D.
Intuitively, a dense set is tightly packed into X; it may not include all elements, but the gaps between elements are infinitesimally small. In this way it seems like a dense set must contain "most" of the space, but this is a place where intuition fails. Indeed, a dense set can actually be quite small in a few different senses. We define one sense here and discuss it more when we discuss ℝ later.
Definition 41 (Separable Space). Let (X, d) be a metric space (or more generally a topological space). We say that X is separable if there is a countable set D ⊂ X which is dense in X: D̄ = X.
Again, intuitively, it may seem like a separable space needs to be “small” because it has
a “small” dense set, but this intuition is not true in any meaningful sense. There are highly
non-trivial separable spaces.
Definition 42 (Cauchy Sequence). Let (X, d) be a metric space and let {x_n} be a sequence in X. We call {x_n} a Cauchy sequence iff for all ε > 0, there is N ∈ ℕ such that d(x_n, x_m) < ε for all n, m ≥ N.
A Cauchy sequence is one that eventually begins to cluster together. Intuitively, we may
think that if the sequence clusters together, it must cluster around some point and thus it
will converge to that point. However, if the space X is “missing” some points, then the
sequence may cluster around a missing point and thus fail to converge to any member of X.
Thus we use Cauchy sequences to define a notion of a space not having any “missing” points.
Definition 43 (Complete Space). We call a metric space (X, d) complete iff for all Cauchy sequences {x_n} in X, there is x ∈ X such that x_n → x.
Complete spaces are nice because, in general, to prove that a sequence {x_n} has a limit, one first needs to identify a candidate x and then prove that d(x, x_n) becomes small; the candidate x may be difficult or impossible to identify. If the space is complete, one no longer needs a candidate: one can instead prove that {x_n} is a Cauchy sequence and conclude that it converges.
Besides sequences, much of calculus is concerned with functions and their properties like continuity, differentiability and integrability. We can discuss continuity in general metric spaces; the other concepts require some of the structure of ℝ, so we leave them for later.
Definition 44 (Continuity). Let (X, d_X) and (Y, d_Y) be two metric spaces, let f : X → Y and let x ∈ X. We say that f is continuous at x iff for all ε > 0, there is δ = δ(x, ε) > 0 such that for z ∈ X, d_X(x, z) < δ ⟹ d_Y(f(x), f(z)) < ε. We say that f is continuous on X (or merely, f is continuous) iff f is continuous at every point x ∈ X. Equivalently, a function f is continuous on X iff for all x, y ∈ X and all ε > 0, there is δ = δ(x, y, ε) > 0 such that d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε.
This is the exact notion of continuity that we presented in calculus, but generalized to metric spaces. Again, it is a useful exercise to prove that this notion of continuity is equivalent to the topological notion of continuity. Because metric spaces have nice structure, we can also characterize continuity in terms of limits of sequences.
Definition 46 (Uniform Continuity). Let (X, d_X) and (Y, d_Y) be two metric spaces and let f : X → Y. We say that f is uniformly continuous iff for all ε > 0, there is δ = δ(ε) > 0 such that for all x, y ∈ X, d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε.
At first glance this definition looks identical to the definition of continuity, but it is not. The subtle difference is in the order of the quantifiers. In the definition of continuity, the δ is allowed to depend on the particular x and y you are testing; in the definition of uniform continuity, δ cannot depend on x and y: there must be a uniform δ that depends only on ε. In logical notation, this difference is expressed as follows: f is continuous iff

    (∀ε > 0)(∀x, y ∈ X)(∃δ > 0) : d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε,

while f is uniformly continuous iff

    (∀ε > 0)(∃δ > 0)(∀x, y ∈ X) : d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε.
Definition 47 (Lipschitz Continuity). Let (X, d_X) and (Y, d_Y) be two metric spaces and let f : X → Y. We say that f is Lipschitz continuous iff there is a constant L > 0 such that for all x, y ∈ X, d_Y(f(x), f(y)) ≤ L · d_X(x, y). In this case, the smallest such L is called the Lipschitz constant of f.
Thus Lipschitz continuous functions have an explicit bound on the distance between f(x) and f(y) in terms of the distance between x and y. If f is Lipschitz continuous with constant L, then for any ε > 0, we can take δ = ε/L and we will find that f satisfies the definition of uniform continuity; hence Lipschitz continuity is even stronger.
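The choice δ = ε/L can be spot-checked for a concrete Lipschitz function; in this sketch we use f(x) = 3x + 1, which is Lipschitz on ℝ with constant L = 3 (the function and tolerances are our own choices for illustration):

```python
import random

L = 3.0
f = lambda t: 3.0 * t + 1.0     # Lipschitz on R with constant L = 3
eps = 1e-3
delta = eps / L                 # the claimed uniform-continuity delta

random.seed(1)
for _ in range(100000):
    x = random.uniform(-1e6, 1e6)
    y = x + 0.999 * random.uniform(-delta, delta)   # guarantee |x - y| < delta
    assert abs(f(x) - f(y)) < eps                   # |f(x) - f(y)| <= L|x - y| < eps
print("delta = eps / L works at every sampled pair")
```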
With this we drop generality and talk specifically about the analytic and topological structure of ℝ. Again, we will not explicitly construct the real numbers, but we'll present the rough idea, which is to start with the rational numbers and define the real numbers as limit points of Cauchy sequences of rationals.

The rational numbers become a metric space with the metric d(a, b) = |a − b| for a, b ∈ ℚ.

Proposition 49. ℚ is countably infinite. That is, we can enumerate the rationals in a sequence ℚ = {q_n}_{n=1}^∞.
This requires what is called a diagonalization argument. One can list the rationals in
a two-dimensional table where row k corresponds to the rationals whose denominator is k.
Then one can traverse the table down each diagonal, assigning a natural number to each
rational number.
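The diagonal traversal can be written out explicitly; this sketch (our own, not from the notes) enumerates the positive rationals along the diagonals p + q = 2, 3, 4, …, skipping non-reduced fractions so each rational appears exactly once:

```python
from fractions import Fraction
from math import gcd

def rationals():
    """Yield the positive rationals, diagonal by diagonal (p + q constant)."""
    s = 2
    while True:
        for p in range(1, s):
            q = s - p
            if gcd(p, q) == 1:          # skip duplicates like 2/2, 2/4, ...
                yield Fraction(p, q)
        s += 1

gen = rationals()
first = [next(gen) for _ in range(10)]
print([str(f) for f in first])
# ['1', '1/2', '2', '1/3', '3', '1/4', '2/3', '3/2', '4', '1/5']
```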
Proposition 50. ℚ is indiscrete in the sense that between any two rational numbers, one can find another rational number. Indeed, if a, b ∈ ℚ and a < b, then for large enough n ∈ ℕ, we have a < a + 1/n < b, and a + 1/n is still rational.
From this proposition, it seems that the rational numbers do not have any large holes,
and this may lead us to believe that the rational numbers are complete, but this is incorrect.
Indeed, the rational numbers do not form a complete metric space.
Proposition 51. There is no x ∈ ℚ such that x² = 2, but there is a Cauchy sequence {x_n} in ℚ such that x_n² → 2.
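A standard witness for Proposition 51 is the Newton iteration x_{n+1} = (x_n + 2/x_n)/2, which stays in ℚ while x_n² → 2; exact rational arithmetic makes this easy to see (our own sketch, not from the notes):

```python
from fractions import Fraction

x = Fraction(1)
for _ in range(6):
    x = (x + 2 / x) / 2     # Newton step for x^2 - 2 = 0; stays rational

print(x)                    # a rational number with a large denominator
print(float(x * x - 2))     # a tiny positive number: the squares approach 2
```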
The real numbers ℝ are defined to be the numbers which can be realized as limit points of Cauchy sequences of rational numbers. Thus any rational is real, but the numbers x such that x² = 2 are real without being rational. Though this is essentially the definition of ℝ, we state this as a proposition.
Proposition 52. Any real number is a limit of rational numbers. That is, ℚ is dense in ℝ, and thus any open set in ℝ contains rational numbers. Furthermore, ℝ is complete under the metric d(x, y) = |x − y|, x, y ∈ ℝ.

Definition 53 (Irrational Numbers). The irrational numbers are defined to be those numbers which are real but not rational; that is, the irrational numbers are given by ℝ \ ℚ = {x ∈ ℝ : x ∉ ℚ}.
Thus, for example, the solutions to x² = 2 are irrational. Of course, since f : [0, ∞) → [0, ∞) defined by f(x) = x² for x ∈ [0, ∞) is bijective, we can define an inverse map, and once we have done so, we denote the solutions to x² = 2 by √2 and −√2. Other common irrational numbers are π and e.
We asserted before that Q is a countable set. It is reasonable to ask if R is still countable
since all numbers in R are simply limits of numbers in Q.
Proposition 54. The set R of real numbers is not countable, and thus R\Q is not countable
either.
To prove this, one can use another type of diagonalization argument. If we assume that the numbers between 0 and 1 are countable, then we can list their decimal representations, but then it is not difficult to explicitly construct a number between 0 and 1 which is not accounted for in the list. In this sense ℚ is much smaller than ℝ, but it still manages to be dense in ℝ. While we know there are holes in ℚ, we might still wonder about the general structure of ℚ within ℝ.
Thus in between any two rationals, we can find an irrational, and we can use this to show that if a set of rational numbers has two points, then it cannot be connected; that is, ℚ is totally disconnected as a subset of ℝ.

The example √2 ∉ ℚ displays something troubling about the structure of ℚ. Consider the set A = {x ∈ ℚ : x² < 2}. It is easy to see that this set is bounded (the elements of this set do not get arbitrarily large; for example, for x ∈ A, we will certainly have x < 5), but there is no tight upper bound in ℚ. That is, for any rational number q such that x < q for all x ∈ A, one could find a smaller rational number p < q such that x < p for all x ∈ A. In short: there is no least upper bound for this set in ℚ. This is a feature which is fixed by moving to ℝ.
Proposition 59. Every non-empty bounded set in ℝ has a finite supremum and infimum. (Indeed, if we allow the supremum or infimum to take the values ±∞, then every non-empty set in ℝ has a supremum and infimum.)
This is one more way in which we have “filled in the holes” when moving from Q to R.
We want to further study the analytical and topological properties of R. We have already
remarked that R is complete and thus every Cauchy sequence in R has a limit in R. As we
said before, one advantage of this is that when testing if a sequence has a limit, we do not
need to identify a candidate for the limit to prove convergence. We would like other such
tests to characterize when limits in R exist.
Loosely speaking, there are two possible ways for a sequence not to converge. It could either blow up to ±∞, as with the sequence x_n = n², or it could oscillate between certain values, as with the sequence x_n = (−1)ⁿ. In the first case, no matter how we look at it, the sequence will always diverge, but in the latter case, if we look only along the even terms x_{2n} = (−1)^{2n} = 1, then we have a stable sequence which converges. The succeeding definitions and theorems deal with this situation.
that |xn | M for all n 2 N . Then there exists some susbsequence {xnk } which converges.
Another way to state the above theorem is that if the sequence {xn } resides in the
bounded set (a, b), then along a subsequence we can find a limit. If instead we consider the
closed set [a, b], then this set contains its limit points and so the limit must also lie in [a, b].
With this in mind, we state a theorem characterizing compact sets in ℝ.

Theorem (Heine–Borel). A set C ⊂ ℝ is compact iff C is closed and bounded.

Thus compact sets are precisely those which contain all their limit points and are not too large in either direction. With this in mind, the prototypical compact sets in ℝ are the closed intervals [a, b] where a, b ∈ ℝ, a < b. However, this theorem applies more generally in the metric topology on ℝⁿ. In light of the Bolzano–Weierstrass Theorem, we can add another equivalent statement: for C ⊂ ℝ, the following are equivalent:

    C is compact,
    C is closed and bounded,
    every sequence in C has a subsequence which converges to a point of C.

Sets satisfying the third property are called sequentially compact, and this theorem tells us that in ℝ, sets are compact iff they are sequentially compact.
Now a valid question is: why is compactness an important property? The definition of compactness is somewhat opaque, but compactness allows you to narrow your focus from infinitely many things to finitely many things. This especially comes in handy when dealing with functions. Indeed, we will move from here to discussing functions on ℝ.
Proposition 67. Continuous functions from compact sets into ℝ are bounded and achieve maximum and minimum values. Specifically, suppose that C ⊂ ℝ, C is compact and f : C → ℝ is continuous. Then there are x_min, x_max ∈ C such that f(x_min) ≤ f(x) ≤ f(x_max) for all x ∈ C. [Note: this is not only asserting that f remains trapped between two extreme values, it is also asserting the existence of x_min and x_max where f meets those extreme values.]
How does compactness come into play here? Consider, if f is continuous then the sets
Un = {x 2 C : f (x) < n} for n 2 N are open since they are the pre-image of the open
sets ( 1, n). Also since f (x) 2 R for all x 2 C, for each x 2 C we can find n 2 N such
that x 2 Un . This shows that {Un } is an open cover of C. If C is compact, there is a
finite subcover Un1 , · · · , UnK . However, these sets are nested, so this shows that C ⇢ UN
Christian Parkinson GRE Prep: Topology & Real Analysis Notes 19
where N = max{n_1, . . . , n_K}. Then for all x ∈ C, we have f(x) < N, which shows that f
is bounded from above. What has happened here? We began with infinitely many different
bounds f(x) < n for n ∈ ℕ, each of which may have applied at different portions of C.
Using compactness we were able to pare this down to a finite number of bounds, and then
simply pick the largest one. Making a similar argument, we can get a lower bound, and
thus the range {f(x) : x ∈ C} is bounded. Since the set is bounded, it has an infimum
and supremum. The theorem also asserts that f will meet these values. How do we find the
point that meets the supremum? We can construct a maximizing sequence {x_n} such that
f(x_n) → sup_{x∈C} f(x). By sequential compactness, the sequence has a subsequence {x_{n_k}} with a limit x_max ∈ C,
and by continuity, we will have f(x_{n_k}) → f(x_max) and f(x_max) = sup_{x∈C} f(x). Thus while
compactness helps us arrive at the bound on f, sequential compactness helps us actually
find the point where f achieves the bound.
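The maximizing-sequence idea can be mirrored numerically. Below is a small sketch (our own illustration, not from the notes): it approximates the maximizer of the continuous function f(x) = x(1 − x) on the compact set [0, 1] by taking the best point on finer and finer grids; the choice of function and grid sizes is arbitrary.

```python
# Illustration: a maximizing sequence for f(x) = x*(1 - x) on the compact set [0, 1].
# On each grid we record the best point; these best points form a maximizing sequence
# whose limit x_max = 0.5 attains the supremum f(x_max) = 0.25.
def f(x):
    return x * (1.0 - x)

x_max = 0.0
for k in range(1, 15):            # grids with 2^k + 1 points
    n = 2 ** k
    pts = [i / n for i in range(n + 1)]
    x_max = max(pts, key=f)       # best point on this grid

print(x_max, f(x_max))
```

On the finest grid the best point is exactly 0.5 with value 0.25, the maximum guaranteed by Proposition 67.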
Above we defined not only continuity but also uniform continuity and Lipschitz continuity.
We would like easy ways to identify which functions satisfy these stronger properties,
and this is somewhere where compactness can help a bit as well.
Proposition 68. Continuous functions on compact sets are uniformly continuous. That is,
suppose that C ⊂ ℝ and f : C → ℝ is continuous. If C is compact, then f is uniformly
continuous.
Again, we should observe how compactness comes into play. Fix ε > 0. Recall, if f is
continuous at each point x ∈ C, then for each individual point, we can find δ_x > 0 such that
for y ∈ C, |x − y| < δ_x ⟹ |f(x) − f(y)| < ε. Here we have (possibly) infinitely many
different δ_x > 0, but if we want to satisfy the definition of uniform continuity, we need to
have a single δ > 0. If the number δ_x > 0 works in the definition of continuity at x ∈ C, then
any smaller number 0 < δ′ < δ_x will also work. Thus one idea is to take the minimum over
all such δ_x > 0, and this minimal δ will work for all x ∈ C. However, the set {δ_x}_{x∈C} may not
have a minimum and its infimum may be zero, so this doesn't quite work. But we note that
the sets (x − δ_x, x + δ_x) form an open cover of C. If C is compact, we can extract a finite
subcover (x_1 − δ_{x_1}, x_1 + δ_{x_1}), . . . , (x_K − δ_{x_K}, x_K + δ_{x_K}) which still covers all of C. Now there
are only finitely many δ_{x_k} to choose from; choosing the minimum δ = min{δ_{x_1}, . . . , δ_{x_K}} will
provide a δ > 0 which works uniformly over all x ∈ C, proving that f is uniformly continuous.
Again, we started with an infinite collection, and compactness allowed us to pare it
down to a finite collection.
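One can compute the pointwise δ's concretely to see why compactness matters. The sketch below is our own illustration: for f(x) = x², the choice δ_x = min(1, ε/(2|x| + 2)) is one valid pointwise modulus (an assumption of this example, easily verified since |x² − y²| = |x − y||x + y|). The δ's stay bounded away from 0 on the compact set [0, 1] but shrink to 0 along the whole line.

```python
# Pointwise deltas for f(x) = x^2 at tolerance eps: delta_x = min(1, eps/(2|x| + 2)).
# On [0, 1] their infimum is positive (uniform continuity); on R it is 0.
eps = 0.1

def delta(x):
    return min(1.0, eps / (2 * abs(x) + 2))

deltas_compact = [delta(i / 100) for i in range(101)]     # x ranging over [0, 1]
deltas_line = [delta(x) for x in range(0, 10_000, 100)]   # x marching off to infinity

print(min(deltas_compact))   # positive: eps/4 = 0.025
print(min(deltas_line))      # tiny, and -> 0 as the range grows
```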
Lipschitz continuity can also be easier to identify via compactness, but in a slightly more
complicated way. First, recall a few definitions and theorems from calculus (for more exposition
regarding the calculus topics, one can look back to the calculus notes). The Mean Value
Theorem says that if f is continuous on [a, b] and differentiable on (a, b), there is c ∈ (a, b) with

(f(b) − f(a)) / (b − a) = f′(c).

This, in turn, implies that |f(b) − f(a)| = |f′(c)| |b − a|.
Note the similarity between the last statement and the definition of Lipschitz continuity.
There's a very formal similarity that hints at some connection of the form |f′(c)| ∼ L. Indeed,
we can make this precise.
This gives a rough equivalence between Lipschitz continuous functions and continuously
differentiable functions. However, based on this, continuous differentiability still seems a bit
stronger than Lipschitz continuity, and indeed, it is. Take, for example, the function f(x) = |x|
for x ∈ ℝ. This function is Lipschitz continuous with Lipschitz constant 1 because of the
reverse triangle inequality:

| |x| − |y| | ≤ |x − y| for all x, y ∈ ℝ.

However, this function is not differentiable on all of ℝ. [In fact, a famous theorem states
that a function is Lipschitz continuous iff it is differentiable almost everywhere
and the a.e. derivative is essentially bounded.]
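A quick numeric spot-check of that Lipschitz bound (our own illustration): the reverse triangle inequality guarantees |f(a) − f(b)| ≤ 1 · |a − b| for f(x) = |x|, and random sampling never finds a violation.

```python
# Check that f(x) = |x| is Lipschitz with constant 1 on random pairs of points.
import random

random.seed(0)
pairs = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(1000)]
ok = all(abs(abs(a) - abs(b)) <= abs(a - b) for a, b in pairs)
print(ok)  # True
```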
Finally, we discuss sequences of functions and the interplay between sequences of functions
and the Riemann integral. Indeed, one of the reasons that the Lebesgue integral and
the field of measure theory came about was because the Riemann integral does not play nice
with sequences of functions, as we will see shortly.
A sequence of functions f_n : A → ℝ converges pointwise to
f : A → ℝ iff for all x ∈ A, the sequence {f_n(x)} in ℝ converges to f(x). That is, {f_n}
converges pointwise to f iff for every x ∈ A and ε > 0, there is N = N(x, ε) ∈ ℕ such that
|f_n(x) − f(x)| < ε for all n ≥ N.
Note that

f_n(0) = 0,  f_n(±1) = 1,  for all n ∈ ℕ.

If x ∈ [−1, 1] \ {−1, 0, 1}, then log |x| < 0, and so f_n(x) → 0 for these x as well.
Conversely, if a, b ≥ 0 then √(a² + b²) ≤ a + b (one can easily verify this inequality by squaring
both sides), and so

g_n(x) = √(x² + 1/n) ≤ |x| + 1/√n.

Combining the inequalities and subtracting |x|, we see

0 ≤ g_n(x) − |x| ≤ 1/√n,  x ∈ [−1, 1].

Thus for all x ∈ [−1, 1], we have

lim_{n→∞} g_n(x) = |x|.
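The estimate 0 ≤ g_n(x) − |x| ≤ 1/√n is in fact uniform in x, and that is easy to check numerically. A small sketch (our own illustration; it assumes, as in the estimate above, g_n(x) = √(x² + 1/n)):

```python
# The sup over [-1, 1] of g_n(x) - |x| is bounded by 1/sqrt(n), matching the estimate.
import math

def sup_error(n, grid=2001):
    xs = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    return max(math.sqrt(x * x + 1 / n) - abs(x) for x in xs)

for n in [1, 100, 10_000]:
    print(n, sup_error(n), 1 / math.sqrt(n))
```

The worst error occurs at x = 0, where it equals exactly 1/√n; this is the uniform convergence discussed next.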
There are two interesting differences to point out between these examples. In the first
example, we had a sequence of continuous functions which converged pointwise to a discontinuous
function, which is somewhat disconcerting (this is similar to before, when we had
a sequence of rationals converging to an irrational; morally, this indicates that continuous
functions are "incomplete with respect to pointwise limits"). The other difference is that
proving the convergence of f_n required special consideration for different values of x, whereas
proving the convergence for g_n did not. To address both of these, we introduce a stronger
notion of convergence.
Again, at first glance this looks roughly the same as pointwise convergence, but the word
"uniform" indicates that the same N in the definition of convergence works for all x in the
set. Because there are two different modes of convergence, when considering functions the
statement f_n → f is vague, and one should always specify what type of convergence is being
proven/assumed (there are many other types of convergence as well; these are typically
addressed in a first course on measure theory). Uniform convergence is important for the
following reason.
Morally, this proposition states that continuous functions are “complete with respect to
uniform convergence.” We would like to place some topological structure on sets of continu-
ous functions to make this more rigorous.
Definition 78. For f ∈ C(A), let ‖f‖_∞ := sup_{x∈A} |f(x)|. This map defines a norm on C(A). Thus d : C(A) × C(A) → [0, ∞) defined by

d(f, g) = ‖f − g‖_∞

is a metric on C(A).
Using Proposition 76, one can prove that for A ⊂ ℝ compact, the metric space (C(A), d)
with d defined as in Definition 78 is a complete metric space (indeed, convergence in this
metric space is precisely uniform convergence). This fact is very important in differential
equations, where one uses Picard iteration to prove existence and uniqueness of solutions to certain
equations. There are several more theorems regarding the structure of C(A), focusing, for
example, on identifying sets which are compact in the metric topology or identifying the
continuous dual space of C(A). These are beyond the scope of these notes.
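As a side illustration of the Picard iteration just mentioned (our own minimal sketch, not an example from these notes): for y′ = y with y(0) = 1, the iteration y_{k+1}(x) = 1 + ∫₀ˣ y_k(t) dt produces the Taylor polynomials of eˣ, and the iterates converge uniformly on compact intervals; completeness of (C(A), d) is what guarantees the limit is again a continuous function.

```python
# Picard iteration for y' = y, y(0) = 1, on polynomials stored as coefficient lists.
import math

def picard_step(poly):
    # poly = [c0, c1, ...] represents c0 + c1*x + c2*x^2 + ...
    integrated = [0.0] + [c / (i + 1) for i, c in enumerate(poly)]  # term-by-term integral
    integrated[0] = 1.0   # enforce the initial condition y(0) = 1
    return integrated

poly = [1.0]              # y_0(x) = 1
for _ in range(15):
    poly = picard_step(poly)

approx = sum(c * 0.5 ** i for i, c in enumerate(poly))  # evaluate the iterate at x = 0.5
print(approx, math.exp(0.5))
```

After 15 steps the iterate is the degree-15 Taylor polynomial of eˣ, so the two printed values agree to machine precision.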
Lastly, we want to examine the interplay between the Riemann integral and convergence
of sequences of functions. We will not recall the definition of the Riemann integral here (see
the notes on Calculus I), except to remind the reader that it is essentially "area under the
curve" and can be computed using the Fundamental Theorem of Calculus. With this in
mind, it is reasonable to think that if f_n converges to f (in some sense) then the limit of the
Riemann integrals of f_n should converge to the Riemann integral of f. However, this is not
always the case.
[Note: this (and similar examples) is often referred to as "vertical escape to ∞"; the mass
under the curve vanished as n → ∞ because the functions got very large on a very small set.]
For a second example, consider the functions f_n : [0, ∞) → ℝ defined by

f_n(x) = 0    for 0 ≤ x < n,
f_n(x) = 1/n  for n ≤ x ≤ 2n,
f_n(x) = 0    for 2n < x < ∞.
Here we see that 0 ≤ f_n(x) ≤ 1/n for all x ∈ [0, ∞) and all n ∈ ℕ, from which it readily
follows that f_n converges uniformly to the zero function f ≡ 0. However, calculating the
integral, we find that ∫₀^∞ f_n(x) dx = (1/n) · (2n − n) = 1, and so once again, we have

1 = lim_{n→∞} ∫₀^∞ f_n(x) dx ≠ ∫₀^∞ f(x) dx = 0.
[Note: this (and similar examples) is referred to as "horizontal escape to ∞"; the mass under
the curve vanished as n → ∞ because the support of the functions went to ∞.]
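The arithmetic of this example is simple enough to verify mechanically. A small check (our own illustration) of the "horizontal escape" computation:

```python
# f_n equals 1/n on [n, 2n] and 0 elsewhere, so its integral is height * width.
def integral_fn(n):
    return (1 / n) * (2 * n - n)

sups = [1 / n for n in range(1, 6)]                 # sup norms of the f_n, shrinking to 0
integrals = [integral_fn(n) for n in range(1, 6)]   # all equal to 1 (up to floating point)
print(sups)
print(integrals)
```

The sup norms go to 0 (uniform convergence to 0), yet every integral stays at 1, exactly the failure described above.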
Both of these display cases where we may have ∫ₐᵇ f_n(x) dx ↛ ∫ₐᵇ f(x) dx. What was the
key feature of each example? In the first example, the convergence was non-uniform; in
the second example, the domain was non-compact. If both of these are rectified, then we can
guarantee that limits of integrals are integrals of limits.
so in this case uniform convergence and a compact domain were not necessary. This was
one of the motivating factors for developing measure theory and the Lebesgue integral: in
order to find less stringent conditions on the behavior of f_n while still guaranteeing that
∫_X f_n(x) dx → ∫_X f(x) dx when f_n → f. There is one theorem that is particularly helpful for
this. Because we do not have the machinery of measure theory, we cannot state the theorem
in full generality, but we will state a particular case.
This version of the theorem essentially says that so long as there is no possibility of
"vertical/horizontal escape to ∞" as in Example 79, we will have the desired behavior for
the limit of the integrals.
Math 94 Professor: Padraic Bartlett
This is the first week of the Mathematics Subject Test GRE prep course! We start by
reviewing the concept of a limit, with an eye for how it applies to sequences, series and
functions. There are more examples here than we had time for in class, so don’t worry if
you don’t recognize everything here!
Definition. A sequence {a_n}_{n=1}^∞ is called bounded if there is some value B ∈ ℝ such that
|a_n| < B, for every n ∈ ℕ. Similarly, we say that a sequence is bounded above if there is
some value U such that a_n ≤ U, ∀n, and say that a sequence is bounded below if there is
some value L such that a_n ≥ L, ∀n.
For example, the sequence 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, . . . is bounded (with B = 2).
Definition. A sequence {a_n}_{n=1}^∞ converges to some value L if the a_n's "go to L" at infinity.
To put it more formally, lim_{n→∞} a_n = L iff for any distance ε > 0, there is some cutoff point N
such that for any n greater than this cutoff point, a_n must be within ε of our limit L.
In symbols:

∀ε > 0, ∃N ∈ ℕ such that ∀n > N, |a_n − L| < ε.

Convergence is one of the most useful properties of sequences! If you know that a
sequence converges to some value L, you know, in a sense, where the sequence is "going,"
and furthermore know where almost all of its values are going to be (specifically, close to
L).
Because convergence is so useful, we’ve developed a number of tools for determining
where a sequence is converging to:
1.1 Sequences: Convergence Tools
1. The definition of convergence: The simplest way to show that a sequence converges
is sometimes just to use the definition of convergence. In other words, you
want to show that for any distance ε, you can eventually force the a_n's to be within
ε of our limit, for n sufficiently large.
How can we do this? One method I'm fond of is the following approach:
• First, examine the quantity |a_n − L|, and try to come up with a very simple
upper bound that depends on n and goes to zero. Example bounds we'd love to
run into: 1/n, 1/n², 1/log(log(n)).
• Using this simple upper bound, given ε > 0, determine a value of N such that
whenever n > N, our simple bound is less than ε. This is usually pretty easy:
because these simple bounds go to 0 as n gets large, there's always some value
of N such that for any n > N, these simple bounds are as small as we want.
• Combine the two above results to show that for any ε, you can find a cutoff point
N such that for any n > N, |a_n − L| < ε.
That said: if you find yourself needing to resort to the ε–N definition for the limit
of a sequence on the GRE test, something has likely gone wrong. Far more useful are
results like the following:
2. Arithmetic and sequences: These tools let you combine previously-studied results
to get new ones. Specifically, if lim_{n→∞} a_n = a and lim_{n→∞} b_n = b, then

lim_{n→∞} (a_n + b_n) = a + b,  lim_{n→∞} (a_n · b_n) = a · b,  lim_{n→∞} (a_n / b_n) = a/b (if b ≠ 0).
3. Monotone and bounded sequences: if the sequence {a_n}_{n=1}^∞ is bounded above and
nondecreasing, then it converges; similarly, if it is bounded below and nonincreasing,
it also converges. If a sequence is monotone, this is usually the easiest way to prove
that your sequence converges, as both monotone and bounded are “easy” properties
to work with. One interesting facet of this property is that it can tell you that a
sequence converges without necessarily telling you what it converges to! So, it’s often
of particular use in situations where you just want to show something converges, but
don’t actually know where it converges to.
4. Subsequences and divergence: if a sequence has two subsequences converging to
different limits, then the sequence diverges. This is one of the few
tools that you can use to directly show that something diverges, and as such is pretty
useful.
5. Squeeze theorem for sequences: if lim_{n→∞} a_n and lim_{n→∞} b_n both exist and are equal
to some value l, and the sequence {c_n}_{n=1}^∞ is such that a_n ≤ c_n ≤ b_n for all n, then the
limit lim_{n→∞} c_n exists and is also equal to l. This is particularly useful for sequences
with things like sin(horrible things) in them, as it allows you to "ignore" bounded bits
that aren't changing where the sequence goes.
6. Cauchy sequences: We say that a sequence is Cauchy if and only if for every ε > 0
there is a natural number N such that for every m > n ≥ N, we have

|a_m − a_n| < ε.
You can think of this condition as saying that Cauchy sequences “settle down” in the
limit – i.e. that if you look at points far along enough on a Cauchy sequence, they all
get fairly close to each other.
The Cauchy theorem, in this situation, is the following: a sequence is Cauchy if and
only if it converges.
Much like the ε–N definition, if you find yourself showing something is Cauchy
to show it converges, you have probably made a mistake in your choice of methods.
That said, sometimes definition-centric questions will crop up that amount to "do you
remember this concept"; Cauchy is certainly one such concept you could be asked to
recall.
Proof. When we discussed the definition as a convergence tool, we talked about a "blueprint"
for how to go about proving convergence from the definition: (1) start with |a_n − L|, (2)
try to find a simple upper bound on this quantity depending on n, and (3) use this simple
bound to find, for any ε, a value of N such that whenever n > N, we have |a_n − L| < ε.
Let's try this! Specifically, examine the quantity |√(n+1) − √n − 0|:

|√(n+1) − √n − 0| = √(n+1) − √n
                  = (√(n+1) − √n)(√(n+1) + √n) / (√(n+1) + √n)
                  = (n + 1 − n) / (√(n+1) + √n)
                  = 1 / (√(n+1) + √n)
                  < 1/√n.
All we did here was hit our |a_n − L| quantity with a ton of random algebra, and kept trying
things until we got something simple. The specifics aren't as important as the idea here:
just start with the |a_n − L| bit, and try everything until it's bounded by something simple
and small!
In our specific case, we've acquired the upper bound 1/√n, which looks rather simple: so
let's see if we can use it to find a value of N.
Take any ε > 0. If we want to make our simple bound 1/√n < ε, this is equivalent to
making 1/ε < √n, i.e. 1/ε² < n. So, if we pick N > 1/ε², we know that whenever n > N, we have
n > 1/ε², and therefore that our simple bound is < ε. But this is exactly what we wanted!
In specific, for any ε > 0, we've found an N such that for any n > N, we have

|√(n+1) − √n − 0| < 1/√n < 1/√N < ε,

which is the definition of convergence. So we've proven that lim_{n→∞} (√(n+1) − √n) = 0.
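The ε–N bookkeeping in this proof can be checked mechanically. A small sketch (our own illustration): pick ε, compute the cutoff N > 1/ε², and confirm the bound holds past it.

```python
# With N > 1/eps^2, every n > N gives |sqrt(n+1) - sqrt(n)| < eps.
import math

eps = 0.01
N = math.ceil(1 / eps ** 2)   # N = 10000 for eps = 0.01
ok = all(abs(math.sqrt(n + 1) - math.sqrt(n)) < eps for n in range(N + 1, N + 1000))
print(ok)  # True
```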
Claim. The sequence defined by

a_1 = 2,
a_{n+1} = √(3a_n − 1)

converges.
Proof. This is a recursively-defined sequence; that is, the terms of this sequence are not
explicitly stated, but rather defined in terms of earlier terms! This is a bit of a headache
for us in terms of determining where this sequence goes. So: let’s not do that yet! Instead,
let’s just try to determine if it goes anywhere at all first; that is, let’s see if we can determine
whether or not it converges!
If we want to show a sequence converges without knowing where it converges to, there
are relatively few tools we have (basically Monotone+Bounded, or Cauchy.) Cauchy is
. . . not very pleasant-looking, so let’s see if this is monotone and bounded!
From inspection (a_1 = 2, a_2 = √5, a_3 = √(3√5 − 1) > √5, . . .) with a calculator, it seems
like our terms are increasing — that is, a_{n+1} ≥ a_n, for all n! We can prove this formally by
induction:
Base case: a_2 = √(3 · 2 − 1) = √5 > √4 = 2 = a_1.
Inductive step: assume that a_{n+1} ≥ a_n; we will use this assumption to prove that
a_{n+2} ≥ a_{n+1}. To do this, simply look at a_{n+2}. By definition + our inductive hypothesis,
we have

a_{n+2} = √(3a_{n+1} − 1) ≥ √(3a_n − 1) = a_{n+1}.

(A similar induction shows the sequence is bounded above, by 3 for instance: if a_n ≤ 3, then a_{n+1} = √(3a_n − 1) ≤ √8 < 3.)
So: by the monotone+bounded theorem from before, a limit exists! How can we find
it?
Well: let's say that our sequence converges to some value L: then we have

lim_{n→∞} a_n = L.

If we use our definition for a_n, we can see that this limit is also

lim_{n→∞} √(3a_{n−1} − 1) = L.
When presented with a limit like this, our first reaction should probably be "square roots
are irritating." Our response, therefore, should be to get rid of them! In other words, let's
square both sides; i.e. by using arithmetic and limits, we get

lim_{n→∞} (√(3a_{n−1} − 1))² = (lim_{n→∞} √(3a_{n−1} − 1)) · (lim_{n→∞} √(3a_{n−1} − 1)) = L · L.

This is nicer! In particular, after squaring, we can manipulate the LHS to get that

3 lim_{n→∞} a_{n−1} = L² + 1.

Because our limit is taken as n goes to infinity, the a_{n−1}'s are just a_huge for appropriately
huge values of huge; in other words, the limit on the left just goes to the same place that
lim_{n→∞} a_n goes to, i.e. L.
Therefore, we actually have

3L = L² + 1;

in other words, L² − 3L + 1 = 0. This is a quadratic equation; we can solve for its two roots
and get

(3 ± √(9 − 4)) / 2 = (3 ± √5) / 2.
Which of these is the limit? Well: if we look at (3 − √5)/2, this number is strictly less than 3/2,
which is in turn less than 2. But our sequence starts at 2 and increases; so this cannot be
the limit! Therefore, we know that our limit L must be the other value above: namely,

(3 + √5)/2 ≈ 2.62.
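The fixed-point value can be confirmed by just running the recursion. A quick numeric check (our own illustration):

```python
# Iterate a_{n+1} = sqrt(3 a_n - 1) from a_1 = 2; the iterates climb toward (3 + sqrt(5))/2.
import math

a = 2.0
for _ in range(60):
    a = math.sqrt(3 * a - 1)

limit = (3 + math.sqrt(5)) / 2
print(a, limit)
```

After 60 iterations the two values agree to high precision, matching the limit found above.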
−1 ≤ sin(terrible things) ≤ 1,

no matter what terrible things we've put into the sin function. Dividing the left and right
by n, we have that

−1/n ≤ sin(terrible things)/n ≤ 1/n,

for every n. Then, because lim_{n→∞} −1/n = lim_{n→∞} 1/n = 0, the squeeze theorem tells us that

lim_{n→∞} sin(n² · πⁿ − 12ⁿ · n^(n^(e^⋰))) / n = 0

as well.
Definition. A sequence is called summable if the sequence {s_n}_{n=1}^∞ of partial sums

s_n := a_1 + . . . + a_n = ∑_{k=1}^n a_k

converges.
If it does, we then call the limit of this sequence the series associated to {a_n}_{n=1}^∞, and
denote this quantity by writing

∑_{n=1}^∞ a_n.

We say that a series ∑_{n=1}^∞ a_n converges or diverges if the sequence {∑_{k=1}^n a_k}_{n=1}^∞ of
partial sums converges or diverges, respectively.
Just like sequences, we have a collection of various tools we can use to study whether a
given series converges or diverges. Here are a few such tools:
1. Comparison test: If {a_n}_{n=1}^∞, {b_n}_{n=1}^∞ are a pair of sequences such that 0 ≤ a_n ≤ b_n,
then the following statement is true:

(∑_{n=1}^∞ b_n converges) ⟹ (∑_{n=1}^∞ a_n converges).

When to use this test: when you're looking at something fairly complicated that either
(1) you can bound above by something simple that converges, like ∑ 1/n², or (2) that
you can bound below by something simple that diverges, like ∑ 1/n.
2. Ratio test: If {a_n}_{n=1}^∞ is a sequence of positive numbers such that

lim_{n→∞} a_{n+1}/a_n = r,

then ∑_{n=1}^∞ a_n converges if r < 1 and diverges if r > 1 (if r = 1, the test is inconclusive).
3. Integral test: If f is a positive, decreasing function on [1, ∞) with f(n) = a_n, then
∑_{n=1}^∞ a_n converges iff ∫₁^∞ f(x) dx converges.
When to use this test: whenever you have something that looks a lot easier to integrate
than to sum. (In particular, this test instantly proves that ∑_{n=1}^∞ 1/n^c converges for c > 1
and diverges for c ≤ 1. In particular, this test I think answers every problem the "p-series
test" solves, if that is one you remember from your calculus/analysis classes!)
4. Alternating series test: If {a_n}_{n=1}^∞ is a sequence of numbers such that the a_n
alternate in sign, |a_n| is nonincreasing, and a_n → 0, then ∑_{n=1}^∞ a_n converges.
To illustrate how to work with these definitions, we work a collection of examples here:
Look at the series ∑_{n=1}^∞ (a_n + 1/n²). Because both of the series

∑_{n=1}^∞ a_n,  ∑_{n=1}^∞ 1/n²

converge, so does their sum; i.e. ∑_{n=1}^∞ (a_n + 1/n²) converges.
Proof. Motivated by the presence of both an n! and a 2ⁿ, we try the ratio test:

a_n / a_{n−1} = (2ⁿ · n! / nⁿ⁺¹) / (2ⁿ⁻¹ · (n−1)! / (n−1)ⁿ)
             = (2ⁿ · n! · (n−1)ⁿ) / (2ⁿ⁻¹ · (n−1)! · nⁿ⁺¹)
             = (2 · n · (n−1)ⁿ) / nⁿ⁺¹
             = 2 · (n−1)ⁿ / nⁿ
             = 2 · ((n−1)/n)ⁿ
             = 2 · (1 − 1/n)ⁿ.
Here, we need one bit of knowledge that you may not have encountered before: the fact
that the limit

lim_{n→∞} (1 + x/n)ⁿ = eˣ,

and in particular that

lim_{n→∞} (1 − 1/n)ⁿ = 1/e.
(Historically, I'm pretty certain that this is how e was defined; so feel free to take it
as a definition of e itself.)
Applying this tells us that

lim_{n→∞} a_n/a_{n−1} = lim_{n→∞} 2 · (1 − 1/n)ⁿ = 2 · (1/e) = 2/e,

which is less than 1. So the ratio test tells us that this series converges!
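The value 2/e ≈ 0.7358 can be sanity-checked numerically. A small sketch (our own illustration), using the simplified ratio 2 · (1 − 1/n)ⁿ derived above:

```python
# The ratio a_n / a_{n-1} = 2 * (1 - 1/n)^n approaches 2/e < 1.
import math

def ratio(n):
    return 2 * (1 - 1 / n) ** n

print(ratio(10), ratio(10_000), 2 / math.e)
```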
Proof. The terms in this series are alternating in sign; as well, they're bounded above and
below by ±1/n, both of which converge to 0. Therefore, we can apply the alternating series
test to conclude that this series converges.
Proof. We start by looking at the series composed of the absolute values of these terms:

∑_{n=1}^∞ |cosⁿ(nx)| / n!.

Because |cos(x)| ≤ 1 for all x, we can use the comparison test to notice that this series will
converge if the series

∑_{n=1}^∞ 1/n!

converges.
We can study this series with the ratio test:

lim_{n→∞} (1/n!) / (1/(n−1)!) = lim_{n→∞} 1/n = 0,

which is less than 1. Therefore this series converges, and therefore (by the comparison test
+ absolute convergence ⇒ convergence) our original series

∑_{n=1}^∞ cosⁿ(nx) / n!

converges.
Claim. The series

∑_{n=1}^∞ n e^(−n²)

converges.
Proof. Letting f(x) = x e^(−x²), we know that this function is decreasing on all of [1, ∞). As well, it is positive on [1, ∞):
so we can apply the integral test to see that this series converges iff the integral

∫₁^∞ x e^(−x²) dx

converges.
But this is not too hard to show! – by using the u-substitution u = x², we have that

∫₁^∞ x e^(−x²) dx = ∫₁^∞ (e^(−u)/2) du = [−e^(−u)/2]₁^∞ = e^(−1)/2,

which is finite; so the series converges.
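Both pieces of that argument can be spot-checked numerically. A small sketch (our own illustration): the partial sums of the series stabilize very quickly, and the comparison integral evaluates to 1/(2e) as computed above.

```python
# Partial sums of sum n*e^(-n^2) stabilize; the integral test's integral equals 1/(2e).
import math

partial = sum(n * math.exp(-n ** 2) for n in range(1, 50))
integral = math.exp(-1) / 2     # value of the integral from the u-substitution above
print(partial, integral)
```

The terms decay like e^(−n²), so almost all of the mass of the series is in its first few terms.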
lim_{x→a} f(x) = L

if and only if
1. (vague:) as x approaches a, f(x) approaches L.
2. (precise; wordy:) for any distance ε > 0, there is some neighborhood δ > 0 of a such
that whenever x ∈ X is within δ of a, f(x) is within ε of L.
3. (precise; symbols:) ∀ε > 0, ∃δ > 0 such that whenever 0 < |x − a| < δ, we have |f(x) − L| < ε.
Somewhat strange definitions, right? At least, the two "rigorous" definitions are somewhat
strange: how do these epsilons and deltas connect with the rather simple concept of
"as x approaches a, f(x) approaches f(a)"? To see this a bit better, consider the following
image:

[Figure: the graph of a function, with horizontal lines at the heights A − ε, A, A + ε and vertical lines at b − δ, b + δ.]
This graph shows pictorially what’s going on in our “rigorous” definition of limits and
continuity: essentially, to rigorously say that “as x approaches a, f (x) approaches f (a)”,
we are saying that
• for any distance ε around f(a) that we'd like to keep our function,
• we can find a corresponding distance δ around a such that keeping x within δ of a keeps f(x) within ε of f(a).
Basically, what this definition says is that if you pick values of x sufficiently close to a, the
resulting f(x)'s will be as close as you want to f(a); i.e. "as x approaches a,
f(x) approaches f(a)."
This, hopefully, illustrates what our definition is trying to capture – a concrete notion
of something like convergence for functions, instead of sequences.
In practice, on the GRE, you are probably in trouble if you try to use this definition
(gorgeous as it is.) Instead, what you likely want to do is try some of the following tools:
1. Squeeze theorem: If f, g, h are functions defined on some interval I \ {a}¹ such that
f(x) ≤ g(x) ≤ h(x) on I \ {a} and lim_{x→a} f(x) = lim_{x→a} h(x),
then lim_{x→a} g(x) exists, and is equal to the other two limits lim_{x→a} f(x), lim_{x→a} h(x).
¹The set X \ Y is simply the set formed by taking all of the elements in X that are not elements in Y.
The symbol \, in this context, is called "set-minus", and denotes the idea of "taking away" one set from
another.
2. Limits and arithmetic: if f, g are a pair of functions such that lim_{x→a} f(x),
lim_{x→a} g(x) both exist, then we have the following equalities:

lim_{x→a} (αf(x) + g(x)) = α (lim_{x→a} f(x)) + (lim_{x→a} g(x))
lim_{x→a} (f(x) · g(x)) = (lim_{x→a} f(x)) · (lim_{x→a} g(x))
lim_{x→a} (f(x)/g(x)) = (lim_{x→a} f(x)) / (lim_{x→a} g(x)), if lim_{x→a} g(x) ≠ 0.

3. Limits and composition: if lim_{x→a} f(x) = L, lim_{x→b} g(x) = a, and f is continuous at a, then

lim_{x→b} f(g(x)) = L.
4. L'Hôpital's rule: If f(x) and g(x) are a pair of differentiable functions such that
either lim_{x→a} f(x) = lim_{x→a} g(x) = 0, or lim_{x→a} f(x) = ±∞ and lim_{x→a} g(x) = ±∞,
and lim_{x→a} f′(x)/g′(x) exists, then

lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x).

6 Limits: Examples
Example.

lim_{x→0} x² sin(1/x) = 0.

Proof. Because −1 ≤ sin(1/x) ≤ 1 for all x ≠ 0, we have

−x² ≤ x² sin(1/x) ≤ x²;

because lim_{x→0} −x² = lim_{x→0} x² = 0, the squeeze theorem tells us that

lim_{x→0} x² sin(1/x) = 0

as well.
Example.

lim_{x→a} sin(1/x²) = sin(1/a²),

if a ≠ 0.
Proof. By our work earlier in this lecture, 1/x² is continuous at any value of a ≠ 0, and
from class sin(x) is continuous everywhere: thus, their composition, sin(1/x²),
is continuous wherever x ≠ 0. Thus the limit above holds,
as claimed.
Claim. The limit

lim_{x→0} ((1 − x)^x − 1 + x²) / x³

converges to −1/2.
Proof. We bash this limit repeatedly with L'Hôpital's rule. First, before we can apply
L'Hôpital's rule, we must check that its conditions apply. The functions contained in the
numerator and denominator are all infinitely differentiable near 0, so this will never be a
stumbling block: furthermore, because the numerator and denominator are both continuous/defined
at 0, we can evaluate their limits at 0 by just plugging in 0: both are 0.
So we've satisfied the conditions for L'Hôpital's rule, and can apply it to our limit:

lim_{x→0} ((1 − x)^x − 1 + x²) / x³  =_{L'H}  lim_{x→0} (d/dx[(1 − x)^x − 1 + x²]) / (d/dx[x³]).
At this point, we recall how to differentiate functions of the form f(x)^{g(x)}, where f(x) > 0,
by using the identity

f(x)^{g(x)} = e^{g(x) · ln(f(x))}.

In particular, we can rewrite (1 − x)^x as e^{ln(1−x)·x}, which will let us just differentiate
using the chain rule:
lim_{x→0} ((1 − x)^x − 1 + x²) / x³  =_{L'H}  lim_{x→0} (d/dx[(1 − x)^x − 1 + x²]) / (d/dx[x³])

  = lim_{x→0} (d/dx[e^{ln(1−x)·x} − 1 + x²]) / (d/dx[x³])
  = lim_{x→0} ( e^{ln(1−x)·x} · (ln(1−x) + x/(x−1)) + 2x ) / (3x²).
Again, both the numerator and denominator are continuous, and plugging in 0 up top yields
e^{ln(1)·0} · (ln(1) + 0/(0−1)) + 2 · 0 = 0, while on the bottom we also get 0. Therefore, we can apply
L'Hôpital's rule again to get that our limit is just
lim_{x→0} ( d/dx[ e^{ln(1−x)·x} · (ln(1−x) + x/(x−1)) + 2x ] ) / ( d/dx[3x²] )

  = lim_{x→0} ( e^{ln(1−x)·x} · (ln(1−x) + x/(x−1))² + e^{ln(1−x)·x} · ( 1/(x−1) − 1/(x−1)² ) + 2 ) / (6x).
Again, the top and bottom are continuous near 0, and at 0 the top is

e^{ln(1)·0} · (ln(1) + 0/(0−1))² + e^{ln(1)·0} · ( 1/(0−1) − 1/(0−1)² ) + 2 = 0 − 2 + 2 = 0,
while the bottom is also 0. So, we can apply L'Hôpital again! This tells us that our limit
is in fact

lim_{x→0} ( d/dx[ e^{ln(1−x)·x} · (ln(1−x) + x/(x−1))² + e^{ln(1−x)·x} · ( 1/(x−1) − 1/(x−1)² ) + 2 ] ) / ( d/dx[6x] )

  = lim_{x→0} [ e^{ln(1−x)·x} · (ln(1−x) + x/(x−1))³
      + 3 e^{ln(1−x)·x} · (ln(1−x) + x/(x−1)) · ( 1/(x−1) − 1/(x−1)² )
      + e^{ln(1−x)·x} · ( −1/(x−1)² + 2/(x−1)³ ) ] / 6.
Again, the top and bottom are made out of things that are continuous at 0. Plugging in 0
to the top this time gives us 0 + 0 + (−1 − 2) = −3, while the bottom gives us 6: therefore, the limit is just

−3/6 = −1/2.
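A numeric sanity check (our own illustration) confirms the sign and value of this limit: the quotient tends to −1/2 as x → 0.

```python
# The quotient ((1 - x)^x - 1 + x^2) / x^3 approaches -1/2 as x -> 0.
def q(x):
    return ((1 - x) ** x - 1 + x * x) / x ** 3

print(q(0.01), q(0.001))  # both near -0.5
```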
So we’re done! (In class, I did a less-awful L’Hôpital bash than this to save time. I do this
here to illustrate just how many times you may have to apply L’Hôpital to get an answer,
though the average GRE problem will be less messy than this calculation!)
Math 94 Professor: Padraic Bartlett
This is the second week of the Mathematics Subject Test GRE prep course; here, we
review the concepts of derivatives and integrals!
1 Bestiary of Functions
For convenience’s sake, we list the definitions, integrals, derivatives, and key values of several
functions here.
Name        Domain           Derivative        Integral                          Key Values
ln(x)       (0, ∞)           1/x               x·ln(x) − x + C                   ln(1) = 0, ln(e) = 1
e^x         ℝ                e^x               e^x + C                           e^0 = 1, e^1 = e
sin(x)      ℝ                cos(x)            −cos(x) + C                       sin(0) = 0, sin(π/4) = √2/2, sin(π/2) = 1
cos(x)      ℝ                −sin(x)           sin(x) + C                        cos(0) = 1, cos(π/4) = √2/2, cos(π/2) = 0
tan(x)      x ≠ (2k+1)π/2    sec²(x)           ln|sec(x)| + C                    tan(0) = 0, tan(π/4) = 1
sec(x)      x ≠ (2k+1)π/2    sec(x)tan(x)      ln|sec(x) + tan(x)| + C           sec(0) = 1, sec(π/4) = √2
csc(x)      x ≠ kπ           −csc(x)cot(x)     ln|csc(x) − cot(x)| + C           csc(π/4) = √2, csc(π/2) = 1
cot(x)      x ≠ kπ           −csc²(x)          ln|sin(x)| + C                    cot(π/4) = 1, cot(π/2) = 0
arcsin(x)   (−1, 1)          1/√(1 − x²)       x·arcsin(x) + √(1 − x²) + C       arcsin(0) = 0, arcsin(1) = π/2
arccos(x)   (−1, 1)          −1/√(1 − x²)      x·arccos(x) − √(1 − x²) + C       arccos(0) = π/2, arccos(1) = 0
arctan(x)   ℝ                1/(1 + x²)        x·arctan(x) − ln(1 + x²)/2 + C    arctan(0) = 0, arctan(1) = π/4
Memorizing all of these is not necessary to do well on the GRE: as we’ll discuss in class,
you can derive almost all of these identities on the fly by using the product/chain rules
or integration by parts/substitution! However, doing those calculations can take time, and
memorizing these formulas will save you time on the test; consider studying them in the
two weeks before your test with flashcards and the like!
2 The Derivative
As always, we start with the formal definition:
Definition. A function f is differentiable at a iff the limit

lim_{h→0} (f(a + h) − f(a)) / h

exists. If it does, denote this limit as f′(a); we will often call this value the derivative of
f at a.
Again, as before, if you find yourself directly using this definition to solve a GRE problem,
something has likely gone wrong! Instead, you likely want to use one of several rules that
we know for evaluating derivatives:
2.1 Tools
1. Differentiation is linear: For f, g a pair of functions differentiable at a and α, β a
pair of constants,

(αf + βg)′(a) = α f′(a) + β g′(a).

5. Inverse function rule: Suppose that f(x) is a bijective function with inverse f⁻¹(x). Then

(f⁻¹)′(a) = 1 / f′(f⁻¹(a)).
2.2 Theorems and Interpretations
The derivative has a number of useful interpretations and associated theorems. We state a
few here:
2. Tangents: if f(x) = y is a curve, the slope of the tangent to this curve at any point
x₀ is given by f′(x₀).
3. Mean Value Theorem: The Mean Value Theorem (abbreviated MVT) is the following
result. Suppose that f is a continuous function on the interval [a, b] that's
differentiable on (a, b). Then there is some value c such that

f′(c) = (f(b) − f(a)) / (b − a).

In other words, there is some point c between a and b such that the derivative at
that point is equal to the slope of the line segment connecting (a, f(a)) and (b, f(b)).
The following picture illustrates this claim:
4. Classification of extrema: You can use the derivative to find minima and maxima
of functions! Specifically, recall the following two definitions:
Definition. A function f has a critical point at some point x if either of the two
properties holds:
• f is not differentiable at x, or
• f′(x) = 0.
Question. Where does the function

f(x) = xⁿ − nⁿx

take its local and global minima and maxima in the interval [−2n, 2n]?
Proof. First, note that if n = 1 our function is identically 0, and thus its local and global
minima and maxima are uninteresting. We will focus on n > 1 for the rest of the proof.
By the above proposition, we know that f will take on these minima and maxima at
its critical points and endpoints. Because f is differentiable everywhere, f's only critical
points come at places where f′(x) = 0. We examine these points here:

f′(x) = n xⁿ⁻¹ − nⁿ = 0
⟺ n xⁿ⁻¹ = nⁿ
⟺ xⁿ⁻¹ = nⁿ⁻¹.

There are two cases here: if n is odd, its critical points occur at ±n; if n is even, however,
its only critical point is at n. In either situation, we have that f″(x) = n(n − 1)xⁿ⁻²; thus,
we have that x = n is a local minimum regardless of whether n is odd or even, while x = −n
is a local maximum for n odd.
This accomplished, we can then evaluate our function at these points along with the
endpoints, and use this to find its global maxima and minima.
For n odd: if n > 2, we know that 2ⁿ > 2n; consequently, we have that f(−2n) is the global
minimum and f(2n) is the global maximum. Because every odd number other than 1 is > 2,
we've thus resolved our question for n odd.
For n even: for any even value of n, this function has its global maximum at f(−2n) and its global minimum
at f(n). Thus, we've classified f's local and global minima and maxima for any value of n:
so we're done!
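The classification is easy to spot-check for a specific odd n. A small numeric sketch (our own illustration), taking n = 5:

```python
# For n = 5: f(x) = x^5 - 5^5 x on [-10, 10] has critical points at x = +/-5
# (local max at -5, local min at 5), with global extrema at the endpoints,
# since 2^5 > 2*5.
n = 5

def f(x):
    return x ** n - n ** n * x

candidates = [-2 * n, -n, n, 2 * n]   # endpoints and critical points
best = max(candidates, key=f)
worst = min(candidates, key=f)
print(best, worst)  # 10 -10
```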
Question. Let p(t) denote the current location of a particle moving in a one-dimensional
space. Call this particle "nice" if p(0) = 0, p(1) = 1, p′(0) = p′(1) = 0, and p″(t) is
continuous.
What is

inf over "nice" particles p of ( sup_{t∈[0,1]} |p″(t)| ) ?
Proof. To start studying the above claim, let's assume that there is some answer M: in
other words, there is some M such that
1. M ≥ |p″(t)|, for any nice particle p and any t ∈ [0, 1].
2. M is the smallest such number that the above claim holds for.
What can we do from here? Well: we have some boundary conditions (niceness tells us
that p(0) = 0, p(1) = 1, p′(0) = 0, p′(1) = 0) and one global piece of information (|p″(t)| <
M). How can we turn this knowledge of the second derivative into information about the rest
of the function?
5
Well: if we apply the mean value theorem to the function p′(t), what does it say? It
tells us that on any interval [a, b], there is some c ∈ (a, b) such that
(p′(b) − p′(a)) / (b − a) = (p′)′(c) = p″(c).
In other words, it relates the first and second derivatives to each other! So, if we apply our
known bound |p″(t)| ≤ M, ∀t ∈ [0, 1], we've just shown that
|p′(b) − p′(a)| / (b − a) = |p″(c)| ≤ M,
for any a < b ∈ [0, 1]. In particular, if we set a = 0, b = t and remember our boundary
condition p′(0) = 0, we've proven that
|p′(t)| ≤ Mt, for any t ∈ (0, 1].
Excellent! We've turned information about the second derivative into information about
the first derivative.
Pretend, for the moment, that you're back in your high school calculus courses, and
you know how to find antiderivatives. In this situation, you've got a function p(t) with the
following properties:
• p(0) = 0,
• p(1) = 1,
• |p′(t)| ≤ Mt, for all t ∈ [0, 1].
Comparing p(t) with an antiderivative of Mt then gives us the bound p(t) ≤ (M/2)t², for
all t ∈ [0, 1]. Running the same argument from the other endpoint (using p′(1) = 0, so that
|p′(1 − t)| ≤ Mt) gives us a bound involving an antiderivative of −Mt,
which is −(M/2)t² + C. Using our boundary condition p(1) = 1 tells us that we can pick
C = 1 and get
p(1 − t) ≥ −(M/2)t² + 1, ∀t ∈ [0, 1].
But what happens if we plug in t = 1/2? In our first bound, we have p(1/2) ≤ (M/2)(1/2)² = M/8.
Conversely, in our second bound we have p(1/2) ≥ −(M/2)(1/2)² + 1 = 1 − M/8.
In other words, we have 1 − M/8 ≤ p(1/2) ≤ M/8, which forces M ≥ 4. So we know a
lower bound on our answer!
Moreover, it is an attainable bound, whose answer is suggested by our work here: if we
actually set M = 4, we get the piecewise function
p(t) = { 2t²,            t ∈ [0, 1/2]
       { 1 − 2(1 − t)²,  t ∈ [1/2, 1].
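We can sanity-check this optimal particle numerically: it satisfies the boundary conditions of "niceness," and its second derivative has absolute value 4 on both pieces. A minimal sketch (the finite-difference helpers `dp` and `ddp` are mine):

```python
def p(t):
    # the piecewise particle built above, with M = 4
    return 2 * t**2 if t <= 0.5 else 1 - 2 * (1 - t)**2

def dp(t, h=1e-6):
    # central-difference first derivative
    return (p(t + h) - p(t - h)) / (2 * h)

def ddp(t, h=1e-4):
    # central-difference second derivative
    return (p(t + h) - 2 * p(t) + p(t - h)) / h**2

# boundary conditions of "niceness"
assert abs(p(0)) < 1e-9 and abs(p(1) - 1) < 1e-9
assert abs(dp(0)) < 1e-4 and abs(dp(1)) < 1e-4
# |p''| = 4 away from the joint at t = 1/2
assert abs(abs(ddp(0.25)) - 4) < 1e-3
assert abs(abs(ddp(0.75)) - 4) < 1e-3
```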
3 Integration
As before, we start by defining our terms:
Definition. The integral: A function f is integrable¹ on the interval [a, b] if and only
if the following holds: for every ε > 0,
• there is a partition a = t₀ < t₁ < … < t_{n−1} < tₙ = b of the interval [a, b] such that
( Σ_{i=1}ⁿ sup_{x∈(t_{i−1},t_i)} f(x) · length(t_{i−1}, t_i) ) − ( Σ_{i=1}ⁿ inf_{x∈(t_{i−1},t_i)} f(x) · length(t_{i−1}, t_i) ) < ε.
Pictorially, this is just saying that the area of the teal rectangles approaches the area of the
purple rectangles in the picture below:
¹To be specific, Riemann-integrable.
Because of this picture, we often say that the integral of a function on some interval [a, b]
is the area beneath its curve from x = a to x = b.
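The definition can be illustrated numerically: for f(x) = x² on [0, 1], the gap between the upper sum (suprema) and the lower sum (infima) shrinks below any fixed ε as the partition gets finer. A minimal sketch (the helper name `upper_lower_gap` is mine; using the endpoints for sup/inf is valid here because x² is monotone on [0, 1]):

```python
def upper_lower_gap(f, a, b, n):
    # equally spaced partition with n pieces
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    # endpoint max/min are the sup/inf on each piece for a monotone f
    upper = sum(max(f(ts[i]), f(ts[i + 1])) * (ts[i + 1] - ts[i]) for i in range(n))
    lower = sum(min(f(ts[i]), f(ts[i + 1])) * (ts[i + 1] - ts[i]) for i in range(n))
    return upper - lower

gaps = [upper_lower_gap(lambda x: x * x, 0.0, 1.0, n) for n in (10, 100, 1000)]
assert gaps[0] > gaps[1] > gaps[2]   # the gap shrinks as the partition refines
assert gaps[2] < 0.01                # ...and drops below any fixed epsilon
```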
Again, using this theorem directly is usually not the best idea. Instead, we have a
number of tools and theorems that are helpful for calculating integrals:
3.2 Worked Examples
Question. What’s
∫₁² x² eˣ dx ?
Proof. Looking at this problem, it doesn’t seem like a substitution will be terribly useful:
so, let’s try to use integration by parts!
How do these kinds of proofs work? Well: what we want to do is look at the quantity
we're integrating (in this case, x²eˣ) and try to divide it into two parts – an "f(x)"-part and
a "g′(x)"-part – such that when we apply the relation ∫ f(x)g′(x) = f(x)g(x) − ∫ g(x)f′(x),
our expression gets simpler!
To ensure that our expression does in fact get simpler, we want to select our f(x) and
g′(x) such that
1. we can calculate the derivative f′(x) of f(x) and find a primitive g(x) of g′(x), and
2. the resulting integral ∫ g(x)f′(x) dx is simpler than the one we started with.
So: often, this means that you'll want to put quantities like polynomials or ln(x)'s in the
f(x) spot, because taking derivatives of these things generally simplifies them. Conversely,
things like eˣ's or trig functions whose integrals you know are good choices for the g′(x)
spot, as they'll not get much more complex and their derivatives are generally no simpler.
Specifically: what should we choose here? Well, the integral of eˣ is a particularly
easy thing to calculate, as it's just eˣ. As well, x² becomes much simpler after repeated
differentiation: consequently, we want to make the choices
f(x) = x²,   g′(x) = eˣ,
f′(x) = 2x,  g(x) = eˣ,
which give us ∫₁² x²eˣ dx = x²eˣ |₁² − ∫₁² 2xeˣ dx.
Another integral! Motivated by the same reasons as before, we attack this integral with
integration by parts as well, setting
f(x) = 2x,  g′(x) = eˣ,
f′(x) = 2,  g(x) = eˣ.
This then tells us that
∫₁² x²eˣ dx = x²eˣ |₁² − ∫₁² 2xeˣ dx
            = x²eˣ |₁² − ( f(x)g(x) |₁² − ∫₁² f′(x)g(x) dx )
            = x²eˣ |₁² − ( 2xeˣ |₁² − ∫₁² 2eˣ dx )
            = x²eˣ |₁² − ( 2xeˣ |₁² − 2eˣ |₁² )
            = (4e² − e) − (4e² − 2e) + (2e² − 2e)
            = 2e² − e.
So we’re done!
Question. What is
∫₀² x² sin(x³) dx ?
Proof. How do we calculate such an integral? Direct methods seem unpromising, and using
trig identities seems completely insane. What happens if we try substitution?
Well: our first question is the following: what should we pick? This is the only “hard”
part about integration by substitution – making the right choice on what to substitute in.
In most cases, what you want to do is to find the part of the integral that you don’t know
how to deal with – i.e. some sort of “obstruction.” Then, try to make a substitution that
(1) will remove that obstruction, usually such that (2) the derivative of this substitution is
somewhere in your formula.
Here, for example, the term sin(x3 ) is definitely an “obstruction” – we haven’t developed
any techniques for how to directly integrate such things. So, we make a substitution to make
this simpler! In specific: let g(x) = x³. This turns our term sin(x³) into sin(g(x)), which
is much easier to deal with. Also, the derivative g′(x) = 3x² is (up to a constant) being
multiplied by our original formula – so this substitution seems quite promising. In fact, if
we calculate and use our indicated substitution, we have that
∫₀² x² sin(x³) dx = ∫₀² sin(g(x)) · (1/3) · g′(x) dx
                  = (1/3) ∫_{g(0)}^{g(2)} sin(x) dx = (1/3) ∫₀⁸ sin(x) dx
                  = (1/3) ( −cos(8) + cos(0) )
                  = (1 − cos(8)) / 3.
(Note that when we made our substitution, we also changed the bounds from [a, b] to
[g(a), g(b)]! Please, please, always change your bounds when you make a substitution!)
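Again, a numeric midpoint-rule check confirms the substitution result (helper name `midpoint_integral` is mine):

```python
import math

def midpoint_integral(f, a, b, n=200_000):
    # midpoint Riemann sum with n equal subintervals
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

numeric = midpoint_integral(lambda x: x * x * math.sin(x**3), 0.0, 2.0)
exact = (1 - math.cos(8)) / 3
assert abs(numeric - exact) < 1e-6
```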
However: this is not the only way to use integration by substitution! Specifically, it is
possible to use integration by substitution to put a g(x) into an integral, as well! In
other words, if we have an integral of the form
∫ₐᵇ f(x) dx,
we can rewrite it as ∫_{g⁻¹(a)}^{g⁻¹(b)} f(g(x)) · g′(x) dx,
as long as we make sure that g is continuous on this new interval [g⁻¹(a), g⁻¹(b)].
Why would you want to do this? Well: suppose you're working with a function of the
form
1 / (a² + x²).
Question. What is
∫₀¹ (x² + 1)^(−3/2) dx ?
Proof. Looking at this, we see that we have a 1/(1 + x²) term, surrounded by some other bits
and pieces. So: let's try the tangent substitution we talked about earlier! Specifically: let
f(x) = (x² + 1)^(−3/2), g(x) = tan(x), g′(x) = 1/cos²(x).
Because tan²(x) + 1 = 1/cos²(x), we have f(g(x)) = cos³(x), and therefore
∫₀¹ (x² + 1)^(−3/2) dx = ∫_{g⁻¹(0)}^{g⁻¹(1)} f(g(x)) g′(x) dx
                       = ∫_{tan⁻¹(0)}^{tan⁻¹(1)} cos³(x) · (1/cos²(x)) dx
                       = ∫₀^{π/4} cos(x) dx
                       = sin(x) |₀^{π/4}
                       = √2/2.
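A quick numeric check of the tangent-substitution answer (helper name `midpoint_integral` is mine):

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # midpoint Riemann sum with n equal subintervals
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

numeric = midpoint_integral(lambda x: (x * x + 1) ** -1.5, 0.0, 1.0)
assert abs(numeric - math.sqrt(2) / 2) < 1e-6
```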
∫_{√(2²−1)}^{lim_{a→∞} √(a²−1)} 1/(u·√(u²+1)) · u/√(u²+1) du = ∫_{√3}^{∞} 1/(u²+1) du.
Now, you should try a trig substitution! In particular, try u = tan(t), t = arctan(u),
du = (1/cos²(t)) dt:
∫_{√3}^{∞} 1/(u²+1) du = ∫_{arctan(√3)}^{lim_{a→∞} arctan(a)} 1/(1 + tan²(t)) · (1/cos²(t)) dt
                       = ∫_{arctan(√3)}^{lim_{a→∞} arctan(a)} 1 dt
                       = ( lim_{a→∞} arctan(a) ) − arctan(√3).
We know that tangent approaches positive-infinity on (−π/2, π/2) as its argument approaches
π/2: therefore, the limit of arctangent at +∞ is just π/2. Similarly, we
know that tangent is equal to √3 when its argument is equal to π/3; therefore, arctan(√3)
is π/3. Therefore, our integral is just π/2 − π/3 = π/6.
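A numeric check of this improper integral: we truncate at a large cutoff b, since the tail beyond b is less than 1/b (helper name `midpoint_integral` is mine):

```python
import math

def midpoint_integral(f, a, b, n=1_000_000):
    # midpoint Riemann sum with n equal subintervals
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# truncated at b = 1e4; the neglected tail is below 1e-4
numeric = midpoint_integral(lambda u: 1 / (u * u + 1), math.sqrt(3), 1e4)
assert abs(numeric - math.pi / 6) < 1e-3
```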
u = ln(1 + x²),        dv = dx,
du = 2x/(1 + x²) dx,   v = x,
which gives us
∫₀¹ ln(1 + x²) dx = ln(1 + x²) · x |₀¹ − ∫₀¹ 2x²/(1 + x²) dx.
We now look at ∫₂³ 1/(√(x+1) + √(x−1)) dx. Before we can do anything, we have to do some
algebra to clean up this function. Specifically, to simplify this expression, we multiply top and
bottom by √(x+1) − √(x−1), a common algebraic technique used on square-root-involving
expressions to clean things up:
∫₂³ 1/(√(x+1) + √(x−1)) dx = ∫₂³ 1/(√(x+1) + √(x−1)) · (√(x+1) − √(x−1))/(√(x+1) − √(x−1)) dx
                           = ∫₂³ (√(x+1) − √(x−1)) / ((√(x+1))² − (√(x−1))²) dx
                           = ∫₂³ (√(x+1) − √(x−1)) / (x + 1 − x + 1) dx
                           = (1/2) ∫₂³ (√(x+1) − √(x−1)) dx
                           = (1/2) ∫₂³ √(x+1) dx − (1/2) ∫₂³ √(x−1) dx.
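Finishing this computation, the antiderivative is (1/3)[(x+1)^(3/2) − (x−1)^(3/2)], and a numeric check confirms it agrees with the original integrand (helper names `midpoint_integral` and `F` are mine):

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # midpoint Riemann sum with n equal subintervals
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

numeric = midpoint_integral(lambda x: 1 / (math.sqrt(x + 1) + math.sqrt(x - 1)), 2.0, 3.0)

def F(x):
    # antiderivative obtained from the conjugate-simplified form
    return ((x + 1) ** 1.5 - (x - 1) ** 1.5) / 3

assert abs(numeric - (F(3) - F(2))) < 1e-6
```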
where there’s no immediately obvious way to set up the integral. Sometimes, we can be
1
particuarly clever, and notice some algebraic trick: for example, to integrate cos(✓) , we can
14
use partial fractions to see that
1 cos(✓)
=
cos(✓) cos2 (✓)
cos(✓)
=
1 sin2 (✓)
✓ ◆
1 cos(✓) cos(✓)
= + ,
2 1 sin(✓) 1 + sin(✓)
and then integrate each of these two fractions separately with the substitutions u = 1 ±
sin(✓).
Relying on being clever all the time, however, is not a terribly good strategy. It would
be nice if we could come up with some way of methodically studying such integrals above –
specifically, of working with integrals that feature a lot of trigonometric identities! Is there
a way to do this?
As it turns out: yes! Specifically, consider the use of the following function as a substi-
tution:
g(x) = 2 arctan(x),
where arctan(x) is the inverse function to tan(x), and is a function R → (−π/2, π/2). In
class, we showed that such inverse functions of differentiable functions are differentiable
themselves: consequently, we can use the chain rule and the definition of the inverse to see
that
Then, if we remember how the trigonometric functions were defined, we can see that
(via the below triangles)
we have that
(arctan(x))′ = cos²(arctan(x)) = 1/(1 + x²),
and thus that
g′(x) = 2/(1 + x²).
As well: by using the above triangles, notice that
sin(g(x)) = sin(2 arctan(x))
          = 2 sin(arctan(x)) cos(arctan(x))
          = 2 · (x/√(1 + x²)) · (1/√(1 + x²))
          = 2x/(1 + x²),
and
cos(g(x)) = cos(2 arctan(x))
          = 2 cos²(arctan(x)) − 1
          = 2/(1 + x²) − 1
          = (1 − x²)/(1 + x²).
What do we know about the integral on the right? Well: as we've just shown above, the
substitution of g(x) turns all of the sin(x)'s into sin(g(x))'s, which are just quotients of
polynomials; similarly, we've turned all of the cos(x)'s into cos(g(x))'s, which are also made
of polynomials. In other words, this substitution turns a function that's made entirely out of
trig functions into one that's made only out of quotients of polynomials – i.e. it turns trig
functions into rational functions! This is excellent for us, because (as you may have noticed)
it's often far easier to integrate rational functions than trig functions.
This substitution is probably one of those things that’s perhaps clearer in its use than
its explanation. We provide an example here:
Example. Find the integral
∫₀^{π/2} 1/(1 + sin(θ)) dθ.
Proof. So: without thinking, let's just try our substitution θ = g(x), where g(x) = 2 arctan(x):
∫₀^{π/2} 1/(1 + sin(θ)) dθ = ∫_{g⁻¹(0)}^{g⁻¹(π/2)} f(g(x)) g′(x) dx
                           = ∫_{tan(0)}^{tan(π/4)} 1/(1 + 2x/(1+x²)) · 2/(1+x²) dx
                           = ∫₀¹ 2/(1 + x² + 2x) dx
                           = ∫₀¹ 2/(1 + x)² dx
                           = ∫₁² 2/x² dx
                           = −2/x |₁²
                           = 1.
. . . so it works! Without any effort, we were able to just mechanically calculate an integral
that otherwise looked quite impossible. Neat!
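A numeric check of the half-angle-substitution answer (helper name `midpoint_integral` is mine):

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # midpoint Riemann sum with n equal subintervals
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

numeric = midpoint_integral(lambda t: 1 / (1 + math.sin(t)), 0.0, math.pi / 2)
assert abs(numeric - 1.0) < 1e-6
```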
Math 94 Professor: Padraic Bartlett
Lecture 3: Derivatives in Rⁿ
Week 3 UCSB 2015
This is the third week of the Mathematics Subject Test GRE prep course; here, we
review the concepts of derivatives in higher dimensions!
lim_{h→0} ( f(a + h·eᵢ) − f(a) ) / h.
(Here, eᵢ is the i-th basis vector, which has its i-th coördinate equal to 1 and the rest equal
to 0.)
However, this is not necessarily the best way to think about the partial derivative, and
certainly not the easiest way to calculate it! Typically, we think of the i-th partial derivative
of f as the derivative of f when we “hold all of f ’s other variables constant” – i.e. if we
think of f as a single-variable function with variable xi , and treat all of the other xj ’s as
constants. This method is markedly easier to work with, and is how we actually *calculate*
a partial derivative.
We can extend this to higher-order derivatives as follows. Given a function f : Rⁿ → R,
we can define its second-order partial derivatives as the following:
∂²f/(∂xᵢ∂xⱼ) = ∂/∂xᵢ ( ∂f/∂xⱼ ).
In other words, the second-order partial derivatives are simply all of the functions you can
get by taking two consecutive partial derivatives of your function f.
Definition. Often, we want a way to talk about all of the first-order derivatives of a
function at once. The way we do this is with the differential, or total derivative. We
define this as follows: the total derivative of a function f : Rⁿ → Rᵐ at a point a is the
following matrix of partial derivatives:
D(f)|ₐ = [ ∂f₁/∂x₁(a)  ∂f₁/∂x₂(a)  …  ∂f₁/∂xₙ(a)
           ∂f₂/∂x₁(a)  ∂f₂/∂x₂(a)  …  ∂f₂/∂xₙ(a)
               ⋮            ⋮       ⋱       ⋮
           ∂fₘ/∂x₁(a)  ∂fₘ/∂x₂(a)  …  ∂fₘ/∂xₙ(a) ]
For a function f : Rⁿ → R, this has the special name gradient, and is denoted
∇f = ( ∂f/∂x₁, ∂f/∂x₂, …, ∂f/∂xₙ ).
Definition. For functions f : Rⁿ → R, we can also define an object that generalizes the
"second-derivative" from one-dimensional calculus to multidimensional calculus. We do this
with the Hessian, which we define here. The Hessian of a function f : Rⁿ → R at some
point a is the following matrix:
H(f)|ₐ = [ ∂²f/∂x₁∂x₁(a)  …  ∂²f/∂x₁∂xₙ(a)
                ⋮         ⋱        ⋮
           ∂²f/∂xₙ∂x₁(a)  …  ∂²f/∂xₙ∂xₙ(a) ].
Finally: like with the normal second derivative, we can use H(f)|ₐ to create a "second-order"
approximation to f at a, in a similar fashion to how we used the derivative to
create a linear (i.e. first-order) approximation to f. We define this here: if f : Rⁿ → R
is a function with continuous second-order partials, we define the second-order Taylor
approximation to f at a as the function
T₂(f)|ₐ(a + h) = f(a) + (∇f)(a) · h + (1/2) · (h₁, …, hₙ) · H(f)|ₐ · (h₁, …, hₙ)ᵀ.
You can think of f(a) as the constant, or zero-th order part, (∇f)(a) · h as the linear part,
and H(f)|ₐ(h) as the second-order part of this approximation.
Definition. Finally, we have two useful physical phenomena, the divergence and curl,
that have natural interpretations. Given a C¹ vector field F : R³ → R³, we can define the
divergence and curl of F as follows:
div(F) = ∂F₁/∂x + ∂F₂/∂y + ∂F₃/∂z,
curl(F) = ( ∂F₃/∂y − ∂F₂/∂z, ∂F₁/∂z − ∂F₃/∂x, ∂F₂/∂x − ∂F₁/∂y ).
Often, the curl is written as the "determinant" of the following matrix:
det [ i     j     k
      ∂/∂x  ∂/∂y  ∂/∂z
      F₁    F₂    F₃  ].
We also have several theorems that we know about the derivative! We list a few here.
Here’s how we extend the product and chain rules:
Theorem. Suppose that f, g are a pair of functions Rn ! Rm , and we’re looking at the
inner product1 f · g of these two functions. Then, we have that
One interesting/cautionary tale to notice from the above calculations is that the partial
derivative of g∘f with respect to one variable xᵢ can depend on many of the variables and
coördinates in the functions f and g!
I.e. something many first-year calculus students are tempted to do on their problem sets is to
write
∂(g∘f)ᵢ/∂xⱼ |ₐ = ∂gᵢ/∂xⱼ |_{f(a)} · ∂fᵢ/∂xⱼ |ₐ.
DO NOT DO THIS. Do not do this. Do not do this. Ever. Because it is wrong. Indeed,
if you expand how we've stated the chain rule above, you can see that ∂(g∘f)ᵢ/∂xⱼ |ₐ – the
(i, j)-th entry in the matrix D(g∘f) – is actually equal to the i-th row of D(g)|_{f(a)} multiplied by
the j-th column of D(f)|ₐ – i.e. that
∂(g∘f)ᵢ/∂xⱼ |ₐ = [ ∂gᵢ/∂x₁ |_{f(a)}  …  ∂gᵢ/∂xₘ |_{f(a)} ] · [ ∂f₁/∂xⱼ |ₐ
                                                                   ⋮
                                                               ∂fₘ/∂xⱼ |ₐ ].
¹Recall that the inner product of two vectors u, v is just the real number Σ_{i=1}ᵐ uᵢvᵢ.
Notice how this is much more complex! In particular, it means that the partials of g∘f
depend on all sorts of things going on with g and f, and aren't restricted to worrying about
just the one coördinate you're finding partials with respect to.
The moral here is basically if you’re applying the chain rule without doing a *lot* of
derivative calculations, you’ve almost surely messed something up. So, when in doubt, just
find the matrices D(f ) and D(g)!
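The moral above can be sketched with finite differences (all helper names and the sample functions f, g are mine): the (i, j) entry of D(g∘f) really is a full row-times-column product of the two Jacobian matrices.

```python
def jacobian(F, point, h=1e-6):
    """Numeric Jacobian matrix of F at `point` via central differences."""
    n = len(point)
    m = len(F(*point))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        up = list(point); up[j] += h
        dn = list(point); dn[j] -= h
        Fu, Fd = F(*up), F(*dn)
        for i in range(m):
            J[i][j] = (Fu[i] - Fd[i]) / (2 * h)
    return J

f = lambda x, y: (x * y, x + y)     # f : R^2 -> R^2
g = lambda u, v: (u * v,)           # g : R^2 -> R^1
comp = lambda x, y: g(*f(x, y))     # g∘f : R^2 -> R^1

a = [2.0, 3.0]
Df, Dg, Dcomp = jacobian(f, a), jacobian(g, list(f(*a))), jacobian(comp, a)

# every entry of D(g∘f) equals row i of D(g)|_{f(a)} times column j of D(f)|_a
for i in range(len(Dcomp)):
    for j in range(len(Dcomp[0])):
        chain = sum(Dg[i][k] * Df[k][j] for k in range(len(Df)))
        assert abs(Dcomp[i][j] - chain) < 1e-4
```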
Here’s how the derivative interacts with finding maxima and minima:
In the section above, we talked about how to use derivatives to find and classify the
critical points of functions Rⁿ → R. This allows us to find the global minima and maxima
of functions over all of Rⁿ, if we want! Often, however, we won't just be looking to find the
maximum of some function on all of Rⁿ: sometimes, we'll want to maximize a function given
a set of constraints. For example, we might want to maximize the function f(x, y) = x + y
subject to the constraint that we're looking at points where x² + y² = 1. How can we do
this?
Initially, you might be tempted to just try to use our earlier methods: i.e. look for
places where Df is 0, and try to classify these extrema. The problem with this method,
when we have a set of constraints, is that it usually won't find the maxima or minima on
this constraint: because it's only looking for local maxima or minima over all of Rⁿ, it will
ignore points that could be maxima or minima on our constrained surface! I.e. for the f, g
we mentioned above, we know that ∇f = (1, 1), which is never 0; however, we can easily
see by graphing that f(x, y) = x + y should have a maximum value on the set x² + y² = 1,
specifically at x = y = 1/√2.
Theorem. So: how can we find these maxima and minima in general? The answer is the
method of Lagrange multipliers, which we outline here.
²The Hessian H(f)|ₐ is positive-definite if and only if the matrix
[ ∂²f/∂x₁∂x₁(a)  …  ∂²f/∂x₁∂xₙ(a)
       ⋮         ⋱        ⋮
  ∂²f/∂xₙ∂x₁(a)  …  ∂²f/∂xₙ∂xₙ(a) ]
of second partial derivatives is positive-definite: i.e. all of its eigenvalues are strictly positive.
Suppose that f : Rⁿ → R is a function whose extremal values {x} we would like to find,
given the constraints g(x) = c, for some constraining function g(x). Then, we have the
following result: if a is an extremal value of f restricted to the set S = {x : g(x) = c},
then either ∇f|ₐ is 0, ∇f|ₐ doesn't exist, or there is some constant λ such that
∇f|ₐ = λ ∇g|ₐ.
Theorem. We have a pair of rather useful theorems about the divergence and curl of
functions, which we state here: for any C² function f : R³ → R and any C² vector field
F : R³ → R³, we have curl(∇f) = (0, 0, 0) and div(curl(F)) = 0.
2 Worked Examples
Example. (Lagrange multipliers; level curves.) Consider the function
g(x, y) = e^(−x²−y²) − x²y².
(b) Let f(x, y) = x + y, and let S be the constraint set given by the level curve {(x, y) :
g(x, y) = c}. For what values of c does f|_S have a global maximum? For what values
does it fail to have a global maximum: i.e. for what values of c is f unbounded on S?
(c) For c = 1/4, find the global maximum of f on the above constraint set S = {(x, y) :
g(x, y) = c}.
Solution. We graph g(x, y) = z in red, along with three level curves in different shades of
blue, in the following picture.
Roughly speaking, there are three kinds of level curves for our function:
1. Level curves g(x, y) = c, where c is close to 1. For these values of c, our level curves
are small closed curves around the origin, because near (0, 0) the x²y² term is negligible
and our function looks roughly like e^(−x²−y²), which is roughly 1 − x² − y² (via Taylor
series) for small values of (x, y).
2. Level curves g(x, y) = c, where c is greater than 0, but not by much. For these values
of c, we wind up having kind of a "four-armed" shape, with arms stretching out along
the x- and y-axes. This is because when one of our coordinates is nearly zero, the
other can become much larger (because our function is roughly e^(−x²−y²) then), whereas
when the coordinates are roughly the same, the dominant term is now the x²y² term,
and we need to have both x and y be much smaller.
3. Level curves g(x, y) = c, where c is ≤ 0. In these cases, our level curves look like
hyperbola-style curves, one in each quadrant. This is because on each axis, our function
g(x, y) can never be ≤ 0, as the e^(−x²−y²)-part is always positive and the x²y²-part
is zero on the axes.
This graphing and subsequent analysis suggests an answer to part (b), as well:
Claim. Our function f(x, y) has a global maximum on the curve g(x, y) = c if and only if
1 ≥ c > 0.
2 2
Proof. If c > 1, then there are no points (x, y) such that g(x, y) = c, because e^(−x²−y²) is
bounded above by e⁰ = 1, while −x²y² is bounded above by 0.
So: suppose that 1 ≥ c > 0. Then, if (x, y) are such that g(x, y) = c, we know that in
particular
e^(−x²−y²) ≥ c
⇒ −x² − y² ≥ ln(c)
⇒ x² + y² ≤ −ln(c)
⇒ √(x² + y²) ≤ √(−ln(c))
⇒ ||(x, y)|| ≤ √(−ln(c)),
i.e. the point (x, y) can be no further than √(−ln(c)) from the origin. (Because 1 ≥ c > 0, we
know that −∞ < ln(c) ≤ 0, and therefore that this is a well-defined, finite and real-valued
bound on distances.)
Therefore, the set of points such that g(x, y) = c is bounded. We also know that it is
closed, because it is the level curve of a continuous function. Therefore, we know that any
continuous function (in particular, f ) will attain its global maxima and minima on this set,
and do so at the critical points identified by the method of Lagrange multipliers.
Finally, suppose that c ≤ 0. In this case, our claim is that f does not attain its global
maximum on g(x, y) = c. To prove this, pick any value of n: we want to find a point (x, y)
on our curve such that f(x, y) > n.
To do this, we simply use the intermediate value theorem. Pick any n, and choose x
such that −x² < c − 1, and also x > n. Then, we know that
g(x, 0) = e^(−x²−0) − x² · 0 = e^(−x²) > 0 ≥ c,
while
g(x, 1) = e^(−x²−1) − x² · 1 = e^(−x²−1) − x² < e^(−x²−1) + c − 1 < c,
because e^(−x²−1) < 1.
Therefore, because g(x, 0) > c and g(x, 1) < c, by the intermediate value theorem, there
is some value of y between 0 and 1 such that g(x, y) = c. At this point (x, y), we know that
f(x, y) = x + y ≥ n + 0 ≥ n,
which is what we wanted to prove: i.e. we've shown that we can find points on our curve
along which f(x, y) is arbitrarily large, and therefore that there is no global maximum.
Finally, with this theoretical discussion out of the way, we can turn to the calculational
part of (c), which asks us to find the global maximum of our function f on the constraint set
g(x, y) = 1/4. First, note that by our above discussion, we know that a global maximum does
exist, because when 1 ≥ c > 0 we've shown that our constraint set is closed and bounded.
Furthermore, to find this maximum, it suffices to use the method of Lagrange multipliers
to find all of the critical points of our function restricted to this curve, and simply select
the largest value amongst these critical points. (Again, this is because g(x, y) = c is closed
and bounded, which means that our global maximum must occur at a critical point.)
So: we calculate. We are looking for any points (x, y) such that either
• ∇f or ∇g are 0,
• ∇f or ∇g are undefined, or
• there is some nonzero constant λ such that ∇f = λ∇g.
Because
∇f(x, y) = (1, 1),
we can immediately see that ∇f is never undefined or zero.
Similarly, because
∇g = ( −2xe^(−x²−y²) − 2xy², −2ye^(−x²−y²) − 2yx² ),
we can see that the first component of ∇g is zero if and only if
0 = −2xe^(−x²−y²) − 2xy²
⇔ 0 = −2x ( e^(−x²−y²) + y² )
⇔ 0 = x, because e^(−x²−y²) + y² is strictly positive.
Similarly, we can see that the second component of ∇g is zero if and only if
0 = −2ye^(−x²−y²) − 2yx²
⇔ 0 = −2y ( e^(−x²−y²) + x² )
⇔ 0 = y, because e^(−x²−y²) + x² is strictly positive.
So ∇g is always defined and is only zero at (0, 0), which is not a point on our curve
g(x, y) = 1/4. Therefore, the only points we're concerned with are ones at which ∇f =
λ∇g; i.e. points such that
∇f = (1, 1) = λ∇g = λ ( −2xe^(−x²−y²) − 2xy², −2ye^(−x²−y²) − 2yx² )
⇔ −2xe^(−x²−y²) − 2xy² = −2ye^(−x²−y²) − 2yx²,
because the above equation is equivalent to forcing both the left and right coordinates of
∇g to equal the same quantity (namely, 1/λ).
Solving, we can see that this is equivalent to
0 = −2xe^(−x²−y²) − 2xy² + 2ye^(−x²−y²) + 2yx²
⇔ −2(x − y)e^(−x²−y²) + 2xy(x − y) = 0.
If x − y = 0, i.e. x = y, this equation holds. Otherwise, we can divide through by 2(x − y),
and get
e^(−x²−y²) = xy.
Plugging this into our constraint equation g(x, y) = 1/4 gives us
e^(−x²−y²) − (xy)² = 1/4  ⇒  (xy) − (xy)² = 1/4  ⇒  xy = 1/2,
by thinking of "xy" as one term and using the quadratic formula. But, if we think about
what this means for the equation e^(−x²−y²) = xy, and specifically use y = 1/(2x), we have
1/2 = xy = e^(−x²−y²) = e^(−x² − 1/(4x²)).
This is impossible! In specific, by taking a single-variable derivative, you can easily see that
the largest value of −x² − 1/(4x²) happens at x = 1/√2, at which this is −1. This means that
the largest that e^(−x² − 1/(4x²)) gets is e^(−1) = 1/e, which is smaller than 1/2.
Therefore, the only points at which ∇f = λ∇g are those at which x = y. Plugging
this into our constraint g(x, y) = 1/4 yields
e^(−2x²) − x⁴ = 1/4
⇒ x ≈ ±0.65.
The function f(x, y) = x + y is equal to 1.3 at the point (0.65, 0.65) and is equal to −1.3
at (−0.65, −0.65). Therefore, by our discussion earlier about how f must attain its global
minima and maxima at the critical points discovered by the Lagrange multiplier process,
we can safely conclude that (0.65, 0.65) is roughly the point at which f(x, y) attains its
global maximum, which is roughly 1.3.
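The constraint equation along the diagonal x = y can be solved numerically to confirm x ≈ 0.65; a minimal bisection sketch (the helper name `F` is mine):

```python
import math

def F(x):
    # the diagonal constraint equation, rearranged to F(x) = 0
    return math.exp(-2 * x * x) - x**4 - 0.25

# F is strictly decreasing for x > 0, with F(0) = 0.75 > 0 and F(1) < 0,
# so bisection on [0, 1] finds the positive root.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if F(mid) > 0:
        lo = mid
    else:
        hi = mid

x = (lo + hi) / 2
assert abs(x - 0.65) < 0.01        # the root quoted in the notes
assert abs(2 * x - 1.3) < 0.02     # f(x, x) = 2x at the global maximum
```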
Example. (Tangent planes.) Let S be the surface in R3 formed by the collection of all
points (x, y, z) such that exyz = e. Find the tangent plane to S at (1, 1, 1).
Solution. One way to attack this problem is to apply natural logs to both sides, which lets
us write S as the collection of all points (x, y, z) such that xyz = 1; i.e. all points x, y ≠ 0
such that z = 1/(xy). In other words, we can write S as the graph of the function f(x, y) = 1/(xy).
We know that the gradient of f(x, y) is just
∇f = ( −y/(xy)², −x/(xy)² ),
which at (1, 1) is just (−1, −1). Therefore, by using the formula for describing the first-order
Taylor approximation – i.e. tangent plane – of functions of the form f(x, y) = z, we have
that the tangent plane to our surface at (1, 1, 1) is just
(z − 1) = ∇f|₍₁,₁₎ · (x − 1, y − 1) = (−1, −1) · (x − 1, y − 1)
⇒ z − 1 + x − 1 + y − 1 = 0.
Alternately, we also discussed a second formula in class for finding tangent planes to
surfaces of the form g(x, y, z) = C, at some point (a, b, c). Specifically, we observed that
the gradient of g at the point (a, b, c) was orthogonal to the tangent plane to our surface
at this point: in other words, that we could define our tangent plane as just the set of all
vectors orthogonal to the gradient of g through this point. As a formula, this was
0 = ∇g|₍₁,₁,₁₎ · (x − 1, y − 1, z − 1).
Example. (Chain rule.) Let g : R⁴ → R be defined by the equation g(w, x, y, z) = wz − yx,
and h : R² → R⁴ be defined by the equation h(a, b) = (a, a, b, b).
(b) Geometrically, explain why your answer in (a) is “obvious,” in some sense.
Solution. So, we know that both g and h are continuous functions on all of their domains;
therefore, we know that their composition is continuous everywhere. Therefore, we know
that the total derivative of g∘h is just given by the partial derivatives of g∘h: i.e.
T(g∘h) = D(g∘h). Therefore, we can use the chain rule (here D(g) = [z, −y, −x, w],
evaluated at h(a, b) = (a, a, b, b)):
D(g∘h)|₍a,b₎ = D(g)|_{h(a,b)} · D(h)|₍a,b₎
             = [b, −b, −a, a] · [ 1 0
                                  1 0
                                  0 1
                                  0 1 ]
             = [b − b, −a + a]
             = [0, 0].
Notice that this is geometrically somewhat obvious, because g is just the determinant of
the matrix [ w x
             y z ],
while the function h just outputs the rank-1 matrix [ a a
                                                      b b ].
Because the determinant of a rank-1 matrix is 0, we have that g∘h is identically 0, and
therefore also has derivative 0.
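The "geometrically obvious" claim can be spot-checked directly (the function names mirror the example):

```python
def g(w, x, y, z):
    # the determinant of [[w, x], [y, z]], written as in the example
    return w * z - y * x

def h(a, b):
    # produces a rank-1 matrix [[a, a], [b, b]]
    return (a, a, b, b)

# g∘h is identically 0 on a grid of sample points
for a in (-2.0, 0.0, 1.5, 3.0):
    for b in (-1.0, 0.5, 2.0):
        assert g(*h(a, b)) == 0.0
```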
(a) Calculate the directional derivative of g(x, y) at (1, 2) in the direction (3, 4).
we know that the directional derivative at (1, 2) in the direction (3, 4) is just given to us by
the dot product of ∇g(1, 2) with the unit-length vector in the direction (3, 4), given
by (1/||(3, 4)||) · (3, 4) = (1/√(9 + 16)) · (3, 4) = (3/5, 4/5):
∇g(1, 2) · (3/5, 4/5) = (2 cos(1), cos(2)) · (3/5, 4/5) = (6 cos(1) + 4 cos(2)) / 5.
To calculate the Taylor approximation of g at (0, 0), we just need to construct the
following function:
T₂(g)|₍₀,₀₎(h₁, h₂) = g(0, 0) + ∇g|₍₀,₀₎ · (h₁, h₂) + (1/2)(h₁, h₂) · H(g)|₍₀,₀₎ · (h₁, h₂)ᵀ.
To do this, simply note that the quadratic term coming from the Hessian H(g) of g is just
(1/2) [h₁, h₂] · [ −y² sin(xy)            cos(xy) − xy sin(xy)
                   cos(xy) − xy sin(xy)   −x² sin(xy)          ]|₍₀,₀₎ · [h₁
                                                                           h₂]
= (1/2) [h₁, h₂] · [ 0 1
                     1 0 ] · [h₁
                              h₂]
= (1/2) [h₁, h₂] · [h₂
                    h₁]
= (1/2)(h₁h₂ + h₁h₂)
= h₁h₂,
and therefore that
T₂(g)|₍₀,₀₎(h₁, h₂) = g(0, 0) + ∇g|₍₀,₀₎ · (h₁, h₂) + h₁h₂
                    = sin(0) + (0 · cos(0), 0 · cos(0)) · (h₁, h₂) + h₁h₂
                    = h₁h₂.
Therefore, the second-order approximation to sin(xy) at the origin is just T₂(x, y) = xy.
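We can confirm that xy really is a second-order approximation of sin(xy): the error should shrink like the third power of the distance to the origin (the helper name `err` is mine):

```python
import math

def err(h):
    # worst-case |sin(xy) - xy| over a few points at distance ~h from (0, 0)
    pts = [(h, h), (h, -h), (h / 2, h), (-h, h / 2)]
    return max(abs(math.sin(x * y) - x * y) for x, y in pts)

# the error is already tiny at h = 0.1, and halving h cuts it
# by far more than a factor of 2 (third-order behavior)
assert err(0.1) < 1e-4
assert err(0.05) < err(0.1) / 4
```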
Find all of the critical points of f , and classify them as local maxima, minima, or saddle
points.
Roughly speaking, it looks like we have four global maxima, at least four saddle points
between these maxima, and probably a bunch of weird things going on in the interior part
of our function which are hard to determine from our picture. Probably a local minimum
in there.
Picture aside, our task here is pretty immediate:
1. First, we want to calculate r(f ), and find all of the points where it is either undefined
or 0. These are our critical points.
2. We then want to calculate H(f ), the Hessian of f , for each critical point. If the
Hessian is positive-definite3 , then we know that this point is a local minimum;
if it is negative-definite, then it’s a local maximum; if it has both a positive
eigenvalue and a negative eigenvalue, it’s a saddle point; and if it’s anything
else, we have no idea what’s going on, and will need to explore its behavior using
other methods.
Calculating ∇f and setting each component to 0, the x-coordinate of any critical point
must satisfy
0 = −8x⁷ + 24x⁵ − 16x³  ⇔  x = 0, ±√2, ±1,
and
0 = −8y⁷ + 24y⁵ − 16y³  ⇔  y = 0, ±√2, ±1.
So we have twenty-five critical points, consisting of five choices of x and five choices of y. To
classify these points, we look at the matrix of second-order partials formed in the Hessian:
H(f) = [ −56x⁶ + 120x⁴ − 48x²   0
         0                      −56y⁶ + 120y⁴ − 48y² ].
³We say that the Hessian is positive-definite if the associated matrix
[ ∂²f/∂x₁∂x₁(a)  …  ∂²f/∂x₁∂xₙ(a)
       ⋮         ⋱        ⋮
  ∂²f/∂xₙ∂x₁(a)  …  ∂²f/∂xₙ∂xₙ(a) ]
of second partial derivatives is positive-definite: i.e. it has n eigenvalues and they're all strictly positive.
Negative-definite is similar, except we ask that all of the eigenvalues exist and are strictly negative.
When x = ±1, the polynomial −56x⁶ + 120x⁴ − 48x² is 16, which is positive; when x = ±√2,
this polynomial is −64, which is negative; finally, when x = 0 this polynomial is 0. Therefore,
at the points
(±1, ±1)
the Hessian is positive-definite, and therefore our function has a local minimum, while at
the points
(±√2, ±√2)
the Hessian is negative-definite, and therefore our function has a local maximum, while at
(±√2, ±1), (±1, ±√2),
the Hessian has both a negative and a positive eigenvalue (try (1, 0), (0, 1) for two
eigenvectors!), and therefore our function has a saddle point.
This leaves just the points with a zero-coordinate, at which the Hessian is useless to us.
There, we need to analyze how small changes in our function
This is kind of horrible-looking, but we can work with it. In particular, it tells us that at
z = ±√2, we have
g((±√2) + ε) − g((±√2)) ≈ (−8(±√2)⁷ + 24(±√2)⁵ − 16(±√2)³)ε
                          + (1/2)(−56(±√2)⁶ + 120(±√2)⁴ − 48(±√2)²)ε²
                        = 0 + (1/2)(−56 · 8 + 120 · 4 − 48 · 2)ε²
                        = −32ε²,
and at z = ±1 we have
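The one-variable slice used here appears to be g(z) = −z⁸ + 4z⁶ − 4z⁴ (an assumption: its derivative is the −8z⁷ + 24z⁵ − 16z³ quoted earlier). Taylor's theorem predicts the quadratic coefficient (1/2)g″(z₀) at a critical point z₀: that's −32 at z₀ = √2 and +8 at z₀ = 1, which we can confirm with a second difference:

```python
import math

def g(z):
    # reconstructed one-variable slice (assumption; see lead-in)
    return -z**8 + 4 * z**6 - 4 * z**4

def quad_coeff(z0, eps=1e-4):
    # (1/2) g''(z0) via a centered second difference
    return (g(z0 + eps) - 2 * g(z0) + g(z0 - eps)) / eps**2 / 2

assert abs(quad_coeff(math.sqrt(2)) - (-32)) < 1e-2   # local max behavior
assert abs(quad_coeff(1.0) - 8) < 1e-2                # local min behavior
```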
Example. Take the vector field V(x, y) = (x²y², x² + y²). Show that this vector field is
neither the curl nor the gradient of any function.
Proof. This is relatively straightforward. To show that V is not the gradient of any
function, we simply need to calculate the curl of V. If it is nonzero, then we know that it cannot
be a gradient.
Because V is a vector field on R², in order to calculate its curl we treat it like a vector
field on R³ that has a 0 in its third component and does not depend on z. Then,
curl(V) = ( ∂V₃/∂y − ∂V₂/∂z, ∂V₁/∂z − ∂V₃/∂x, ∂V₂/∂x − ∂V₁/∂y )
        = ( 0 − 0, 0 − 0, 2x − 2x²y ),
Math 94 Professor: Padraic Bartlett
Lecture 4: Integrals in Rⁿ
Week 4 UCSB 2015
This is the fourth week of the Mathematics Subject Test GRE prep course; here, we
review the concepts of integrals in higher dimensions!
1. Types of integrals. You’ve (in theory) learned how to take several kinds of integrals
in undergrad:
Part of being able to do these integrals is the ability to describe a region R via
sets of nested parameters. For example, if R is the upper-right quadrant of the
unit disk
R = {(x, y) : x2 + y 2 1, 0 x, 0 y},
you should be able to describe R as the set of all points such that
p
x 2 [0, 1], y 2 [0, 1 x2 ],
for some function f . Be able to do this “nested parameter” thing over most kinds
of regions: usually, the way you do this is by picking one variable, determining its
maximum range, then (for some fixed value of that first variable) pick a second
variable and determine its maximum range depending on the first variable, and
so on/so forth.
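One quick way to check that you’ve set up nested bounds correctly is to integrate the constant 1 over the region and compare against a known area. Here is a small Python sketch of this (our own verification code, not part of the original notes) for the quarter-disk bounds above:

```python
import math

def quarter_disk_area(n=2000):
    """Integrate 1 over the region x in [0, 1], y in [0, sqrt(1 - x^2)]
    by slicing in x (midpoint rule); the inner integral of 1 dy is y_max."""
    total = 0.0
    dx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * dx              # midpoint of the i-th x-slice
        y_max = math.sqrt(1 - x * x)    # the upper y-bound depends on x
        total += y_max * dx
    return total

print(quarter_disk_area())  # close to pi/4
```

If the nested bounds were set up wrong, this number would disagree with the known area $\pi/4$ of the quarter disk.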
• Line integrals. Given a parametrized curve $\gamma : [a, b] \to \mathbb{R}^n$, we can find the integral of either a vector field $F : \mathbb{R}^n \to \mathbb{R}^n$ or a scalar field $f : \mathbb{R}^n \to \mathbb{R}$ along this curve. Specifically, we can express these integrals as the following:
$$\int_\gamma F \cdot d\gamma = \int_a^b F(\gamma(t)) \cdot \gamma'(t)\,dt, \quad \text{and} \quad \int_\gamma f\,d\gamma = \int_a^b f(\gamma(t))\,\|\gamma'(t)\|\,dt.$$
As well, recall that a unit normal vector to our surface, n, can be given by the formula
$$n = \frac{T_u \times T_v}{\|T_u \times T_v\|} \quad \text{or} \quad \frac{T_v \times T_u}{\|T_v \times T_u\|}.$$
The trickiest thing going on here is “how” you choose your parametrization. For
finding a parametrization of a surface S, you can usually do one of the following
two things:
– Often, if you describe your surface S in cylindrical or spherical coördinates, you’ll see that one of the coördinates you’re describing your surface in is constant. For example, a spherical shell of radius 3 can be described in spherical coördinates as the set of all points $(3, \theta, \varphi)$, where $\theta \in [0, 2\pi],\ \varphi \in [0, \pi]$. In this kind of situation, our parametrization is just using this coördinate system with the constant variable treated as a constant: i.e. for the spherical shell of radius 3, our parametrization is just $T(\theta, \varphi) = (3\cos(\theta)\sin(\varphi),\ 3\sin(\theta)\sin(\varphi),\ 3\cos(\varphi))$.
In this case, because z is positive, we can solve for z in terms of the other
variables, and express S as
$$S = \left\{(x, y, z) : z = \sqrt{1 + x^2 + y^2},\ z \in [1, 2]\right\}.$$
You can of course combine these two approaches: for example, if we were to use cylindrical coördinates on our surface S above and replace x with $r\cos(\theta)$, y with $r\sin(\theta)$, we can see that we can easily express T instead as the map
$$T(r, \theta) = \left(r\cos(\theta),\ r\sin(\theta),\ \sqrt{1 + r^2}\right), \quad r \in [0, \sqrt{3}],\ \theta \in [0, 2\pi],$$
• Green’s theorem. Suppose that R is a region in $\mathbb{R}^2$ whose boundary is given by a simple closed curve C, and that $\gamma$ is a traversal of C in the counterclockwise direction. Suppose as well that P and Q are a pair of $C^1$ functions from $\mathbb{R}^2$ to $\mathbb{R}$. Then, we have the following equality:
$$\iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \int_\gamma \left(P\,dx + Q\,dy\right).$$
• Stokes’ theorem. Stokes’ theorem, quite literally, is Green’s theorem for surfaces in $\mathbb{R}^3$ (as opposed to restricting them to lying in the plane $\mathbb{R}^2$). Specifically, it is the following claim: suppose that S is a surface in $\mathbb{R}^3$ with boundary $\partial S$ given by the simple closed curve C, suppose that n is a unit normal vector to S that gives S some sort of orientation, and suppose that $\gamma$ is a traversal of C such that the interior of S always lies on the left of $\gamma$’s forward direction, assuming that we’re viewing the surface such that the normal vector n is pointing towards us. Suppose as well that F is a vector field from $\mathbb{R}^3$ to $\mathbb{R}^3$. Then, we have the following equality:
$$\iint_S (\nabla \times F) \cdot n\; dS = \int_\gamma F \cdot d\gamma.$$
In general, you use Green’s and Stokes’s theorems whenever you have an integral
of a function over an awful curve (and taking derivatives to work with your
function over a region, which is what the curl does, will make things easier), or
you have an integral of a curl-like function over an awful region (and working on
the curve would make things easier.)
• Divergence/Gauss’s theorem. Let W be a region in $\mathbb{R}^3$ with boundary given by some surface S, let n be the outward-pointing (i.e. away from W) unit normal vector to S, and let F be a smooth vector field defined on W. Then
$$\iiint_W (\operatorname{div} F)\, dV = \iint_{\partial W} (F \cdot n)\, dS.$$
Again, use this like you would use Green’s and Stokes’s theorems.
• Change of variables. A common tactic to make integrals easier is to apply the technique of change of variables, which allows us to describe regions in $\mathbb{R}^n$ using coördinate systems other than the standard Euclidean ones. In general, the change-of-variables theorem says the following:
– Suppose that R is an open region in $\mathbb{R}^n$, g is a $C^1$ map $\mathbb{R}^n \to \mathbb{R}^n$ on an open neighborhood of R, and that f is a continuous function on an open neighborhood of the region g(R). Then, we have
$$\int_{g(R)} f(x)\,dV = \int_R f(g(x)) \cdot \left|\det(Dg(x))\right| dV.$$
– Polar coördinates. Suppose that R is a region in $\mathbb{R}^2$ described in polar coördinates: i.e. there is some set $A \subseteq [0, \infty) \times [0, 2\pi)$ such that $\varphi(A) = R$, where $\varphi$ is the polar coördinates map $(r, \theta) \mapsto (r\cos(\theta), r\sin(\theta))$. Then, for any integrable function $f : \mathbb{R}^2 \to \mathbb{R}$, we have
$$\iint_{\varphi(A)} f(x, y)\, dV = \iint_A f(r\cos(\theta), r\sin(\theta)) \cdot r\; dV.$$
To describe the cone, sphere cap, or torus above, cylindrical coördinates are
probably going to lead to the easiest calculations. Why is this? Well, all three of
these shapes have a large degree of symmetry around their z-axis; therefore, we’d
expect it to be relatively easy to describe these shapes as a collection of points
$(r, \theta, z)$. However, these shapes do *not* have a large degree of spherical symmetry: in other words, if we were to attempt to describe them with the coördinates $(r, \theta, \varphi)$, we really wouldn’t know where to begin with the $\varphi$ coördinate.
However, for the ellipsoid and “ice-cream-cone” section of the ellipsoid, spherical
coördinates are much more natural: in these cases, it’s fairly easy to describe
these sets as collections of points of the form $(r, \theta, \varphi)$.
In general, if you’re uncertain which of the two to try, simply pick one and see
how the integral goes! If you chose wisely, it should work out; otherwise, you can
always just go back and try the other coördinate system.
3. Applications of the integral. Finally, it bears noting that we’ve developed a few
applications of the integral to finding volume, surface area, length, and centers of
mass. We review these here:
2 Example Problems
Question 1. Let S denote the cut-off paraboloid surface formed by the equations $z + 1 = x^2 + y^2$, $z \le 0$, oriented so that the z-component of its normal vector is positive. Let F denote the vector field $F(x, y, z) = \left(e^z y,\ e^{z^2} x,\ e^{z^3} z\right)$. Find the integral of $\nabla \times F$ over S.
You could parametrize S and directly integrate this vector field over S. But this looks awful. Instead, what we can do is use Stokes’ theorem! In particular, consider the surface D given by the unit disk $x^2 + y^2 \le 1$, $z = 0$. This surface has the same boundary as our surface S: specifically, $\partial S = \partial D = \{x^2 + y^2 = 1\}$. Suppose we orient the unit disk with the normal
(0, 0, 1), which is normal to the unit disk everywhere. Then these boundaries have the same
orientation, if both boundaries are oriented positively with respect to their corresponding
surfaces.
Therefore, we can use Stokes’s theorem once to see that
$$\iint_S \nabla \times F \cdot dS = \int_{\partial S^+} F \cdot ds,$$
and then use it a second time (applied to D, which shares this boundary) to see that this also equals $\iint_D \nabla \times F \cdot dS$. This last integral vanishes: any parametrization of the disk will have zero z-coördinate, and there our integrand is of the form $(\ast, \ast, 0) \cdot (0, 0, 1) = 0$!
Lots of set-up, but it makes our calculations trivial: we didn’t even have to parametrize
the unit disk! This is one of the cooler applications of Stokes’s theorem: switching between
different surfaces.
You can also use things like Stokes’s and Green’s theorem to switch integrals between
different curves: it’s a little weirder, but sometimes is really useful.
Question 2. Take a pond whose outer perimeter is given by a circle of radius 4 and which contains $16\pi$ cubic centimeters of water. Drop a rock in the center of the pond. Assume that the rock’s edges are roughly vertical, i.e. we can model the boundary of the rock in the pond as some 2-d shape. After doing this, assume the water has height h in centimeters.
Suppose that there is an ant walking around the boundary of the rock. Suppose further that this ant is being blown on by a wind current, which imparts force on the ant corresponding to the vector field $F(x, y) = (-y, x)$. In one walk of the ant around the boundary of the rock, how much energy does the wind impart on the ant? In other words, what is $\int_{\gamma_1} F \cdot ds$?
As labelled above, let $\gamma_1$ denote the perimeter of the rock, and $\gamma_2$ denote the perimeter of the pond. Let R denote the region between the outer curve and the inner curve. We want to calculate
$$\int_{\gamma_1} F \cdot dS.$$
This is. . . hard, because, well, we don’t actually know what $\gamma_1$ is. However, we can get around this with Green’s theorem!
In particular: notice that Green’s theorem says that the integral of $\left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)$ over R is equal to the integral of F over the two boundary components $\gamma_1, \gamma_2$, provided that they’re both oriented (as drawn) so that R is always on the left-hand-side of each curve. In other words,
$$\iint_R \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right) dA = \int_{\gamma_1} F \cdot ds + \int_{\gamma_2} F \cdot ds.$$
So, we can solve for the integral we want to study, in terms of two other integrals:
$$\int_{\gamma_1} F \cdot ds = \iint_R \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right) dA - \int_{\gamma_2} F \cdot ds.$$
Because the pond started with $16\pi$ cubic centimeters of water and had height h after we dropped the rock in, we know that R has surface area $\frac{16\pi}{h}$. Moreover, our integrand here is the constant $\frac{\partial}{\partial x}(x) - \frac{\partial}{\partial y}(-y) = 2$; therefore this double integral is $\frac{32\pi}{h}$.
As well, we can find $\int_{\gamma_2} F \cdot ds$. We parametrize $\gamma_2$ as $\gamma_2(t) = (4\cos(t), 4\sin(t))$:
$$\int_{\gamma_2} F \cdot ds = \int_0^{2\pi} (-4\sin(t), 4\cos(t)) \cdot (-4\sin(t), 4\cos(t))\, dt = \int_0^{2\pi} 16\, dt = 32\pi.$$
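As a sanity check, we can also approximate this line integral numerically; the following Python sketch (our own verification code, with made-up helper names) recovers the same value:

```python
import math

def line_integral(F, gamma, dgamma, t0, t1, n=20000):
    """Midpoint-rule approximation of the line integral of a 2-d vector
    field F along the curve gamma: sum of F(gamma(t)) . gamma'(t) dt."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        x, y = gamma(t)
        fx, fy = F(x, y)
        vx, vy = dgamma(t)
        total += (fx * vx + fy * vy) * dt
    return total

F = lambda x, y: (-y, x)                                 # the wind field
gamma2 = lambda t: (4 * math.cos(t), 4 * math.sin(t))    # pond boundary, radius 4
dgamma2 = lambda t: (-4 * math.sin(t), 4 * math.cos(t))  # derivative of gamma2

print(line_integral(F, gamma2, dgamma2, 0, 2 * math.pi))  # close to 32*pi
```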
Therefore, we can combine these two integrals to calculate $\int_{\gamma_1} F \cdot ds$:
$$\int_{\gamma_1} F \cdot ds = \frac{32\pi}{h} - 32\pi = 32\pi\left(\frac{1}{h} - 1\right).$$
This is pretty cool: we know exactly how much work was done by this wind current,
even though we have no idea what path we integrated over!
Question 3. Let T be a triangle with vertices (1, 0, 0), (0, 2, 0), (0, 0, 3). If this triangle is
made out of some material with uniform density across its surface, what is the x-coördinate
of the center of mass of T ?
Solution. We want to find the x-coördinate of the center of mass of T . This is the “average”
x-coördinate over our entire surface. Recall the following: if we want to find the average value of a function f on a surface T, we want to find the integrals $\iint_T f\,dA$ and $\iint_T 1\,dA$,
and divide the first of these two integrals by the second: this gives us the average value of
f over T .
So. We start by parametrizing our triangle. We do this by considering coördinates
one-by-one. We first look at x: over our entire triangle, x ranges from 0 to 1.
We now look at the possible range of y-values, given x. We do this by projecting our
triangle onto the xy-plane: there, this is the triangle with vertices (0, 0), (1, 0), (0, 2).
Given any fixed value of x, we can see that y ranges from 0 to $2 - 2x$.
Finally, we need to solve for z given x and y. To do this, we just need to find the plane
this triangle lies in: this will give us an equation relating x, y and z. We do this by taking
the generic equation for a plane
ax + by + cz = d
and plugging in the three points (1, 0, 0), (0, 2, 0), (0, 0, 3) into this equation:
$$a = d, \quad b = \frac{d}{2}, \quad c = \frac{d}{3}.$$
This gives us that our plane has the equation
$$dx + \frac{d}{2}y + \frac{d}{3}z = d,$$
which (if we divide by d) becomes
$$x + \frac{y}{2} + \frac{z}{3} = 1.$$
Solving for z gives us
$$z = 3 - 3x - \frac{3y}{2}.$$
So, we can parametrize our triangle via the map $T(x, y) = \left(x,\ y,\ 3 - 3x - \frac{3y}{2}\right)$, where x ranges from 0 to 1 and (given x) y ranges from 0 to $2 - 2x$.
So, if we want to find $\iint_T 1\,dA$, we can just use this parametrization:
$$\begin{aligned}
\iint_T 1\, dA &= \int_0^1 \int_0^{2-2x} \left\|\frac{\partial T}{\partial x} \times \frac{\partial T}{\partial y}\right\| dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \left\|(1, 0, -3) \times \left(0, 1, -\frac{3}{2}\right)\right\| dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \left\|\left(3, \frac{3}{2}, 1\right)\right\| dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \sqrt{9 + \frac{9}{4} + 1}\; dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \frac{7}{2}\; dy\, dx \\
&= \int_0^1 (7 - 7x)\, dx = \frac{7}{2}.
\end{aligned}$$
Similarly, if we want to find $\iint_T x\,dA$, we can do mostly the same thing:
$$\iint_T x\, dA = \int_0^1 \int_0^{2-2x} x\left\|\frac{\partial T}{\partial x} \times \frac{\partial T}{\partial y}\right\| dy\, dx = \int_0^1 \int_0^{2-2x} \frac{7x}{2}\; dy\, dx = \int_0^1 (7x - 7x^2)\, dx = \frac{7}{6}.$$
Therefore, the x-coördinate of the center of mass is just the ratio of these two integrals, i.e. $\frac{7/6}{7/2} = \frac{1}{3}$.
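Since the area element here is the constant 7/2, this answer is easy to double-check numerically; here is a small Python sketch (our own, just for verification):

```python
def centroid_x(n=1000):
    """Approximate (∬_T x dA) / (∬_T 1 dA) over the slanted triangle,
    using the fact that dA = (7/2) dy dx on the parametrized region
    x in [0, 1], y in [0, 2 - 2x]."""
    num = den = 0.0
    dx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * dx
        y_max = 2 - 2 * x               # the y-range depends on x
        den += 3.5 * y_max * dx         # inner integral of (7/2) dy
        num += 3.5 * x * y_max * dx     # inner integral of (7/2) x dy
    return num / den

print(centroid_x())  # close to 1/3
```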
Question 4. Let T be the same triangle as in Question 3. Integrate the vector field
F(x, y, z) = (xy, yz, zx) over the perimeter of this triangle, oriented in the counterclock-
wise direction as viewed from the positive octant.
Solution. We could parametrize the boundary of this triangle, but that seems hard. In-
stead, we will use Stokes’s theorem, which says that
$$\int_{\partial T} F \cdot ds = \iint_T \nabla \times F \cdot dS.$$
Using this, we can instead integrate r ⇥ F over the triangle itself, because we already
parametrized that! Convenient.
We do this here.
$$\begin{aligned}
\iint_T \nabla \times F \cdot dS &= \int_0^1 \int_0^{2-2x} (\nabla \times F) \cdot \left(\frac{\partial T}{\partial x} \times \frac{\partial T}{\partial y}\right) dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right) \cdot \left(\frac{\partial T}{\partial x} \times \frac{\partial T}{\partial y}\right) dy\, dx \\
&= \int_0^1 \int_0^{2-2x} (0 - y,\ 0 - z,\ 0 - x)\Big|_{T(x,y)} \cdot \left(3, \frac{3}{2}, 1\right) dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \left(-3y - \frac{3}{2}\left(3 - 3x - \frac{3y}{2}\right) - x\right) dy\, dx \\
&= \int_0^1 \int_0^{2-2x} \left(-\frac{9}{2} - \frac{3}{4}y + \frac{7}{2}x\right) dy\, dx \\
&= \int_0^1 \left(-9 + 9x - \frac{3}{8}(2-2x)^2 + 7x - 7x^2\right) dx \\
&= \int_0^1 \left(-\frac{17}{2}x^2 + 19x - \frac{21}{2}\right) dx \\
&= -\frac{17}{6} + \frac{19}{2} - \frac{21}{2} = -\frac{23}{6}.
\end{aligned}$$
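Sign errors are easy to make in a chain like this, so it is worth approximating the final double integral directly; a Python sketch (our own verification code):

```python
def stokes_integral(n=300):
    """Approximate ∬ (curl F) . (T_x × T_y) dy dx for F = (xy, yz, zx)
    over x in [0, 1], y in [0, 2 - 2x], where curl F = (-y, -z, -x),
    T_x × T_y = (3, 3/2, 1), and z = 3 - 3x - 3y/2 on the triangle."""
    total = 0.0
    dx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * dx
        y_max = 2 - 2 * x
        dy = y_max / n
        for j in range(n):
            y = (j + 0.5) * dy
            z = 3 - 3 * x - 1.5 * y
            total += (3 * (-y) + 1.5 * (-z) + (-x)) * dy * dx
    return total

print(stokes_integral())  # close to -23/6
```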
Question 5. Directly calculate the integral of $F(x, y, z) = (3x^2y,\ -3xy^2,\ z)$ over the surface of the unit cube, using the orientation depicted below. Then, use the divergence theorem to calculate this in a much faster manner.
Solution. If we want to do this directly, break the unit cube into its six sides, notice that the normals to these sides are precisely the normals $(0, 0, \pm 1), (0, \pm 1, 0), (\pm 1, 0, 0)$ as depicted in the above diagram, and calculate
$$\begin{aligned}
\iint_{\text{surface of cube}} F \cdot dS &= \int_0^1\!\!\int_0^1 F(x,y,0) \cdot (0, 0, -1)\, dx\, dy + \int_0^1\!\!\int_0^1 F(x,y,1) \cdot (0, 0, 1)\, dx\, dy \\
&\quad + \int_0^1\!\!\int_0^1 F(x,0,z) \cdot (0, -1, 0)\, dx\, dz + \int_0^1\!\!\int_0^1 F(x,1,z) \cdot (0, 1, 0)\, dx\, dz \\
&\quad + \int_0^1\!\!\int_0^1 F(0,y,z) \cdot (-1, 0, 0)\, dy\, dz + \int_0^1\!\!\int_0^1 F(1,y,z) \cdot (1, 0, 0)\, dy\, dz \\
&= \int_0^1\!\!\int_0^1 0\, dx\, dy + \int_0^1\!\!\int_0^1 1\, dx\, dy + \int_0^1\!\!\int_0^1 0\, dx\, dz + \int_0^1\!\!\int_0^1 (-3x)\, dx\, dz \\
&\quad + \int_0^1\!\!\int_0^1 0\, dy\, dz + \int_0^1\!\!\int_0^1 3y\, dy\, dz \\
&= 0 + 1 + 0 - \frac{3}{2} + 0 + \frac{3}{2} = 1.
\end{aligned}$$
Alternately, if you use the divergence theorem, we can calculate this in a much faster way:
$$\iint_{\text{surface of cube}} F \cdot dS = \iiint_{\text{cube}} (\operatorname{div} F)\, dV = \int_0^1\!\!\int_0^1\!\!\int_0^1 (6xy - 6xy + 1)\, dx\, dy\, dz = \int_0^1\!\!\int_0^1\!\!\int_0^1 1\, dx\, dy\, dz = 1.$$
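We can also check both sides of the divergence theorem numerically here; a Python sketch (our own verification code):

```python
def flux_through_cube(n=200):
    """Sum F . n over the six faces of the unit cube for F = (3x^2 y, -3x y^2, z),
    with outward normals, sampling midpoints of an n-by-n grid on each face."""
    F = lambda x, y, z: (3 * x * x * y, -3 * x * y * y, z)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * h, (j + 0.5) * h
            total += -F(u, v, 0)[2] * h * h  # bottom face, n = (0, 0, -1)
            total += F(u, v, 1)[2] * h * h   # top face,    n = (0, 0, 1)
            total += -F(u, 0, v)[1] * h * h  # y = 0 face,  n = (0, -1, 0)
            total += F(u, 1, v)[1] * h * h   # y = 1 face,  n = (0, 1, 0)
            total += -F(0, u, v)[0] * h * h  # x = 0 face,  n = (-1, 0, 0)
            total += F(1, u, v)[0] * h * h   # x = 1 face,  n = (1, 0, 0)
    return total

print(flux_through_cube())  # close to 1, the integral of div F = 1 over the cube
```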
Question 6. Let $c(t) = \left(\cos(t) - \frac{\sin^2(t)}{2},\ \cos(t)\sin(t)\right)$ denote the “fish curve” drawn below. Find the area enclosed by this curve.
Solution. This looks like a textbook example of when to use the Green’s theorem formula for the area contained in a curve. Specifically, Green’s theorem, as applied to finding the area contained within a curve, says that if a region R is bounded by some simple closed curve c(t) that is oriented positively (i.e. so that R is on the left as we travel along c(t)), then
$$\operatorname{area}(R) = \iint_R 1\, dx\, dy \overset{\text{Green's theorem}}{=} \frac{1}{2}\int_{c(t)} (-y, x) \cdot dc.$$
If we just plug in our curve, we get that $\frac{1}{2}\int_{c(t)}(-y, x) \cdot dc$ is
$$\begin{aligned}
&\frac{1}{2}\int_0^{2\pi} \left(-\cos(t)\sin(t),\ \cos(t) - \frac{\sin^2(t)}{2}\right) \cdot \left(-\sin(t) - \sin(t)\cos(t),\ \cos^2(t) - \sin^2(t)\right) dt \\
&= \frac{1}{2}\int_0^{2\pi} \left(\cos(t)\sin^2(t) + \cos^2(t)\sin^2(t) + \cos^3(t) - \cos(t)\sin^2(t) - \frac{\cos^2(t)\sin^2(t)}{2} + \frac{\sin^4(t)}{2}\right) dt \\
&= \frac{1}{2}\int_0^{2\pi} \left(\frac{\cos^2(t)\sin^2(t)}{2} + \cos^3(t) + \frac{\sin^4(t)}{2}\right) dt \\
&= \frac{1}{2}\int_0^{2\pi} \left(\frac{\sin^2(2t)}{8} + \cos(t)(1 - \sin^2(t)) + \frac{(1 - \cos(2t))^2}{8}\right) dt \\
&= \frac{1}{2}\int_0^{2\pi} \left(\frac{1 - \cos(4t)}{16} + \cos(t)(1 - \sin^2(t)) + \frac{1 - 2\cos(2t) + \cos^2(2t)}{8}\right) dt \\
&= \frac{1}{2}\int_0^{2\pi} \left(\frac{1 - \cos(4t)}{16} + \cos(t)(1 - \sin^2(t)) + \frac{1 - 2\cos(2t)}{8} + \frac{1 + \cos(4t)}{16}\right) dt \\
&= \frac{1}{2}\int_0^{2\pi} \left(\frac{1}{4} - \frac{\cos(2t)}{4} + \cos(t)(1 - \sin^2(t))\right) dt \\
&= \frac{1}{2}\left(\frac{t}{4} + \sin(t) - \frac{\sin^3(t)}{3} - \frac{\sin(2t)}{8}\right)\Bigg|_0^{2\pi} \\
&= \pi/4.
\end{aligned}$$
But is this plausible? Well: looking at our fish curve, it seems to contain about (in the head-part) the area of an ellipse running from $x = -0.5$ to $x = 1$ with y-height from $-0.5$ to $0.5$, which is about $3\pi/8$. This is much greater than $\pi/4$, the area of a circle with radius 0.5. So: something has gone wrong!
What, specifically? Well, to apply Green’s theorem, we needed a simple closed curve that was positively oriented. Did we have that here? No! In fact, our curve c has a self-intersection: $c(\pi/2) = c(3\pi/2)$, and in fact the tail part of our curve is oriented negatively (i.e. if we travel around our curve from $\pi/2$ to $3\pi/2$, our region is on the right-hand side). In fact, we’ve calculated the area of the head minus the area in the tail!
To calculate what we want, we want to take the integral above evaluated from $-\pi/2$ to $\pi/2$ (the head) and then add the integral from $3\pi/2$ to $\pi/2$ (travelling backwards here makes it so that we get the right orientation on the tail.) Specifically, we have
$$\frac{1}{2}\left(\frac{t}{4} + \sin(t) - \frac{\sin^3(t)}{3} - \frac{\sin(2t)}{8}\right)\Bigg|_{-\pi/2}^{\pi/2} = \frac{1}{2}\left(\left(\frac{\pi}{8} + 1 - \frac{1}{3} - 0\right) - \left(-\frac{\pi}{8} - 1 + \frac{1}{3} - 0\right)\right) = \frac{\pi}{8} + \frac{2}{3},$$
while
$$\frac{1}{2}\left(\frac{t}{4} + \sin(t) - \frac{\sin^3(t)}{3} - \frac{\sin(2t)}{8}\right)\Bigg|_{3\pi/2}^{\pi/2} = \frac{1}{2}\left(\left(\frac{\pi}{8} + \frac{2}{3}\right) - \left(\frac{3\pi}{8} - 1 + \frac{1}{3} - 0\right)\right) = -\frac{\pi}{8} + \frac{2}{3};$$
therefore, our total area is $\left(\frac{\pi}{8} + \frac{2}{3}\right) + \left(-\frac{\pi}{8} + \frac{2}{3}\right) = \frac{4}{3}$.
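Orientation bugs like this one are easy to miss, so it is worth checking all three integrals numerically; in Python (our own sketch):

```python
import math

def green_area(t0, t1, n=50000):
    """Signed area swept by the fish curve c(t) between t0 and t1, via
    (1/2) ∫ (x y' - y x') dt with c(t) = (cos t - sin^2(t)/2, cos t sin t)."""
    dt = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        s, c = math.sin(t), math.cos(t)
        x, y = c - s * s / 2, c * s
        xp, yp = -s - s * c, c * c - s * s   # derivatives of x(t), y(t)
        total += 0.5 * (x * yp - y * xp) * dt
    return total

signed = green_area(0, 2 * math.pi)            # the naive answer, pi/4
head = green_area(-math.pi / 2, math.pi / 2)   # pi/8 + 2/3
tail = green_area(3 * math.pi / 2, math.pi / 2)  # -pi/8 + 2/3 (reversed traversal)
print(signed, head + tail)                     # head + tail is close to 4/3
```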
This is the fifth week of the Mathematics Subject Test GRE prep course; here, we review
the field of linear algebra!
• Eigenspace: For any eigenvalue $\lambda$, we can define the eigenspace $E_\lambda$ associated to $\lambda$ as the space
$$E_\lambda := \{v \in V : Av = \lambda v\}.$$
The first matrix above multiplies a given row by $\lambda$, the second matrix switches two given rows, and the third matrix adds $\lambda$ times one row to another row.
• Adjacency Matrices: For a graph² G on the vertex set $V = \{1, 2, \ldots, n\}$, we can define the adjacency matrix for G as the following $n \times n$ matrix:
$$A_G := (a_{ij}), \quad \text{where } a_{ij} = \begin{cases} 1 & \text{if the edge } (i, j) \text{ is in } E; \\ 0 & \text{otherwise.} \end{cases}$$
It bears noting that we can reverse this process: given an $n \times n$ matrix $A_G$, we can create a graph G by setting $V = \{1, \ldots, n\}$ and $E = \{(i, j) : a_{ij} \ne 0\}$.
Given any permutation $\sigma$ of $(1, \ldots, n)$, the permutation matrix $P_\sigma$ is simply the $n \times n$ matrix whose i-th column is given by $\vec{e}_{\sigma(i)}$. In other words,
$$P_\sigma = \begin{bmatrix} \vdots & \vdots & & \vdots \\ \vec{e}_{\sigma(1)} & \vec{e}_{\sigma(2)} & \cdots & \vec{e}_{\sigma(n)} \\ \vdots & \vdots & & \vdots \end{bmatrix}.$$
– Algebraic multiplicity: The algebraic multiplicity of an eigenvalue $\mu$ is the number of times it shows up as a root of A’s characteristic polynomial. I.e. if $p_A(\lambda) = (\lambda - \pi)^2$, $\pi$ would have algebraic multiplicity 2.
– Geometric multiplicity: The geometric multiplicity of an eigenvalue µ is
the dimension of the eigenspace associated to µ.
– Useful Theorem: The algebraic multiplicity of an eigenvalue is always greater than or equal to the geometric multiplicity of that eigenvalue.
– Useful Theorem: A matrix is diagonalizable i↵ every eigenvalue has its algebraic
multiplicity equal to its geometric multiplicity. (If you want it to be diagonal-
izable via real-valued matrices, you should also insist that the matrix and all of
its eigenvalues are real.)
– Dominant eigenvalue: The dominant eigenvalue is the largest eigenvalue of a matrix.
• Regular: A matrix A is called regular if $a_{ij} > 0$ for every entry $a_{ij}$ in A. We will often write $A > 0$ to denote this.
• Nonnegative: A matrix is called nonnegative if and only if all of its entries are $\ge 0$.
– If $\lambda$ is an eigenvalue of a nonnegative matrix A that corresponds to a nonnegative eigenvector, then $\lambda$ is at least the minimum of the row sums, and at most the maximum of the row sums; similarly, $\lambda$ is at least the minimum of the column sums, and at most the maximum of the column sums.
• Similarity. Two matrices A, B are called similar if there is some matrix U such that $A = UBU^{-1}$. If we want to specify what U is, we can specifically state that A and B are similar via U.
– Suppose that A is diagonalized as $EDE^{-1}$. Then we can write the n-th power of A as $ED^nE^{-1}$. As well, if all of the entries along the diagonal of D have k-th roots, we can give a k-th root of A as the product $ED^{1/k}E^{-1}$.
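For instance (with a made-up $2 \times 2$ example, sketched in Python), the n-th power really can be computed through the diagonalization:

```python
def matmul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# A diagonalization A = E D E^{-1} with eigenvalues 2 and 3:
E = [[1, 1], [0, 1]]
E_inv = [[1, -1], [0, 1]]        # inverse of E
D = [[2, 0], [0, 3]]
A = matmul(matmul(E, D), E_inv)  # A = [[2, 1], [0, 3]]

# The fifth power, two ways: repeated multiplication vs. E D^5 E^{-1}.
A5 = A
for _ in range(4):
    A5 = matmul(A5, A)
D5 = [[2**5, 0], [0, 3**5]]
print(A5, matmul(matmul(E, D5), E_inv))  # the two results agree
```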
– Useful Theorem 3: If we have a probability matrix P representing some finite system with n states $\{1, \ldots, n\}$, then the probability of starting in state j and ending in state i in precisely m steps is the (i, j)-th entry in $P^m$.
$$A^+ \cdot b + (I - A^+A)w,$$
$$A = QR.$$
• Jordan block. A block $B_i$ of some block-diagonal matrix is called a Jordan block if it is in the form
$$\begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{bmatrix}.$$
In other words, there is some value $\lambda$ such that $B_i$ is a matrix with $\lambda$ on its main diagonal, 1’s in the cells directly above this diagonal, and 0’s elsewhere.
• Inner product: For two vectors $x, y \in \mathbb{R}^n$, we define the inner product $\langle x, y\rangle$ of x and y as their dot product, $x \cdot y$.
– Useful Observation: Often, it’s quite handy to work with the transpose of certain vectors. So, remember: when you’re taking the inner product or dot product of two vectors, taking the transpose of either vector doesn’t change the results! I.e. $\langle x, y\rangle = \langle x^T, y\rangle = \langle x, y^T\rangle = \langle x^T, y^T\rangle$. We use this a *lot* in proofs and applications where there are symmetric or orthogonal matrices running about.
• Magnitude: The magnitude of a vector x is the square root of its inner product with itself: $\|x\| = \sqrt{\langle x, x\rangle}$. This denotes the distance of this vector from the origin.
• Projection, onto a vector: For two vectors u, v, we define the projection of v onto u as the following vector:
$$\operatorname{proj}_u(v) := \frac{\langle v, u\rangle}{\langle u, u\rangle} \cdot u.$$
• Projection, onto a subspace: Given a subspace U with an orthogonal basis $\{b_1, \ldots, b_n\}$, we define the projection of x onto U as the following vector in U:
$$\operatorname{proj}_U(x) = \sum_{i=1}^n \operatorname{proj}_{b_i}(x).$$
– Useful Theorem: This vector is the closest vector in U to x.
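A tiny Python sketch of this (our own; note that the basis must be orthogonal for the sum-of-projections formula to be valid, and this particular basis is made up):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj(v, u):
    """Projection of v onto the single vector u: (<v,u>/<u,u>) u."""
    c = dot(v, u) / dot(u, u)
    return [c * ui for ui in u]

# Project x onto the subspace U spanned by the ORTHOGONAL basis b1, b2
# (here U is the xy-plane inside R^3):
b1, b2 = [1, 1, 0], [1, -1, 0]
x = [3, 5, 7]
p = [a + b for a, b in zip(proj(x, b1), proj(x, b2))]
print(p)  # the closest point to x in the xy-plane
```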
• Orthogonal complement: For a subspace S of a vector space V, we define the orthogonal complement $S^\perp$ as the following set:
$$S^\perp = \{v \in V : \langle v, s\rangle = 0,\ \forall s \in S\}.$$
2 Example problems
We work some sample problems here, to illustrate some of the ideas.
Question. Suppose that A is an $n \times n$ matrix such that $A^3$ is the all-zeroes matrix, i.e. the $n \times n$ matrix in which every entry is 0.
2. Can you find an example of such a matrix A, such that A and $A^2$ are not themselves all-zeroes matrices?
If you want an $n \times n$ example of such a matrix, simply add additional rows/columns of zeroes to the left and bottom of A.
In general, we claim that the range of any such matrix A cannot be $\mathbb{R}^n$. To see why, simply notice that if A is a matrix with range equal to its domain, then A must be invertible; consequently, for any natural number k, $A^k$ must also be invertible, with inverse given by $(A^{-1})^k$. Therefore $A^k$ would have range $\mathbb{R}^n$ (as it is invertible, and thus has dim(nullspace) = 0); and therefore in particular we could not have $A^k$ = the all-zeroes matrix for any k, as this has dim(nullspace) = n.
$$\operatorname{nullspc}(M^k) \supseteq \operatorname{nullspc}(M).$$
Proof. The first claim here is not hard to establish. Take any vector $\vec{v} \in \operatorname{nullspc}(M)$. By definition, we know that $M\vec{v} = \vec{0}$; therefore, we can conclude that $M^k\vec{v} = M^{k-1} \cdot M\vec{v} = M^{k-1}\vec{0} = \vec{0}$ as well, and thus that $\vec{v} \in \operatorname{nullspc}(M^k)$.
As a side note, our earlier problem proves that inequality is possible (as the nullspaces of $A, A^3$ were distinct); it is also not hard to see that equality is possible (let M be the all-zeroes matrix!) and thus that this is the strongest statement we can make.
For the second part of our claim: we could simply use the multiplicative property of the determinant (which tells us that $\det(M^k) = \det(M) \cdot \ldots \cdot \det(M) = 0 \cdot \ldots \cdot 0 = 0$), or we could use the first part of this question to note that because
• $\det(M) = 0$ if and only if $\dim(\operatorname{nullspc}(M)) \ne 0$, and
• $\operatorname{nullspc}(M^k) \supseteq \operatorname{nullspc}(M)$, so $\dim(\operatorname{nullspc}(M)) \le \dim(\operatorname{nullspc}(M^k))$,
• then we can conclude that if $\det(M) = 0$ then $\dim(\operatorname{nullspc}(M^k)) \ne 0$, and thus that
• $\det(M^k) = 0$.
Proof. If we ignore our “no zero entries” condition, this is not too hard; the matrix
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
satisfies our eigenvalue properties, as (1, 0, 0), (0, 1, 0), (0, 0, 1) are eigenvectors for these three eigenvalues 1, 2, 3.
Now, we can use the fact that eigenvalues are invariant under similarity; that is, if A is a matrix and B is an invertible matrix, then A and $BAB^{-1}$ have the same eigenvalues! (This is because if $\vec{v}$ is an eigenvector for A, then $B\vec{v}$ is an eigenvector for $BAB^{-1}$, with the same eigenvalue.)
So we can try simply multiplying A on the left and right by appropriate B, $B^{-1}$’s, and hope we get something without zeroes! In particular, let’s use some matrices whose inverses we know: elementary matrices! Recall that
$$B = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \Rightarrow B^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
because the first map (when processed as B · (matrix)) corresponds to the Gaussian elimination move of “add one copy of row two to row one,” and the second is just “add $-1$ copies of row two to row one.”
Therefore
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix};$$
by using more of these elementary matrices, we can actually get
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix};$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 2 \\ -6 & 4 & 4 \\ 0 & 0 & 3 \end{bmatrix};$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} -1 & 1 & 2 \\ -6 & 4 & 4 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ -2 & 4 & 4 \\ 2 & -1 & 1 \end{bmatrix}.$$
This is a matrix that has no zero entries, and by construction is similar to $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$; so we’ve answered our problem!
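We can verify numerically that the final matrix of this computation (entries as reconstructed above, signs included) really does still have eigenvalues 1, 2, 3: each shifted determinant below vanishes. A Python sketch (ours):

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

M = [[1, 1, 2], [-2, 4, 4], [2, -1, 1]]  # the similar matrix built above
for lam in (1, 2, 3):
    shifted = [[M[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    print(lam, det3(shifted))  # each determinant is 0, so lam is an eigenvalue
```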
Question. Suppose that A is an $n \times n$ matrix with the following two properties:
• $A^n$ is the all-zeroes matrix.
• There is exactly one nonzero vector $\vec{v}$, up to scalar multiples, that is an eigenvector of A. (In other words, the only eigenvectors for A are vectors of the form $c \cdot \vec{v}$.)
Find the Jordan normal form of A.
Proof. Take any eigenvector $\vec{v}$ for A, with eigenvalue $\lambda$; then $A\vec{v} = \lambda\vec{v}$. Consequently, $A^n\vec{v} = \lambda A^{n-1}\vec{v} = \lambda^2 A^{n-2}\vec{v} = \ldots = \lambda^n\vec{v}$. Because $A^n$ is the all-zeroes matrix, we can also observe that $A^n\vec{v} = \vec{0}$, for any vector $\vec{v}$; consequently, we have proven that the only possible eigenvalue for A is 0.
Our second bullet point is the claim that the dimension of the eigenspace for this only
eigenvalue is 1. Consequently, if we look at our matrix’s Jordan normal form, we know that
• The diagonals are all zeroes, as 0 is the only eigenvalue, and eigenvalues go on the
diagonal of a Jordan normal form.
• There is only one block, as there is only one dimension of eigenvectors.
Therefore, we have that the Jordan normal form here is just zeroes on the diagonal, ones directly above the diagonal, and zeroes elsewhere: i.e.
$$\begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}.$$
Question. Suppose that A is a real-valued symmetric $n \times n$ matrix with the following two properties:
If this is equal to (10, 10, . . . 10), then there are ten 1’s in each row of A, as claimed.
Furthermore, suppose that we have any eigenvalue $\lambda$ other than 10 for this matrix A. Let $\vec{v}$ be the eigenvector for this eigenvalue, and $v_k$ be the largest component of this eigenvector.
Then, again by definition, we have
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n a_{1i}v_i \\ \vdots \\ \sum_{i=1}^n a_{ni}v_i \end{bmatrix} = \lambda \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix};$$
but if we use the fact that $v_k$ is the “biggest” (i.e. $v_k \ge v_j,\ \forall j$), we can see that
$$\lambda v_k = \sum_{i=1}^n a_{ki}v_i \le \sum_{i=1}^n a_{ki}v_k \le 10v_k,$$
because there are at most ten one-entries in the k-th row (and the rest are zeroes.)
But this means that $\lambda v_k \le 10v_k$; i.e. $\lambda \le 10$, as claimed.
This is the seventh week of the Mathematics Subject Test GRE prep course; here, we
review various techniques used to solve differential equations!
1. Separable differential equations. Suppose that you have a differential equation of the form
$$\frac{dy}{dx} = M(x) \cdot N(y),$$
for two functions M(x), N(y). We can solve this equation by “separating” M(x) from N(y): that is, by dividing both sides by N(y) and “multiplying” by dx to get¹ the following:
$$M(x)\,dx = \frac{1}{N(y)}\,dy.$$
Integrating both sides yields
$$\int M(x)\,dx = \int \frac{1}{N(y)}\,dy,$$
which gives us a relation that can be used to solve for y with algebra/other techniques.
Be aware that this equation above only gives us solutions for which $N(y) \ne 0$. In the event that $N(y)$ is identically 0 (i.e. y is a constant), you would need to check this manually by seeing if a constant value of y can solve our equation.
We calculate an example here:
$$\Rightarrow x^3 + x = y^2 - 3y + c.$$
Because (0, 0) is a point that should be a solution to our equation, we can see that c = 0, and that our equation is (solving for y)
$$\begin{aligned}
x^3 + x &= y^2 - 3y \\
\Rightarrow\ y^2 - 3y - (x^3 + x) &= 0 \\
\Rightarrow\ y &= \frac{3 \pm \sqrt{9 + 4(x^3 + x)}}{2}.
\end{aligned}$$
At x = 0, this expression is $\frac{3 \pm 3}{2}$, which we know should be 0; this tells us that we want the negative branch of this expression, i.e.
$$y = \frac{3 - \sqrt{9 + 4(x^3 + x)}}{2}.$$
¹ Formally speaking, we are doing something more subtle than multiplying through by dx, because what would that even mean? What is a dx, outside of an integral? For rigorous answers to this, take courses on analysis and differential equations! For now, however, just roll with it.
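A quick numeric check (Python, our own) that the chosen branch really passes through (0, 0) and satisfies $x^3 + x = y^2 - 3y$:

```python
import math

def y(x):
    """The solution branch of x^3 + x = y^2 - 3y passing through (0, 0)."""
    return (3 - math.sqrt(9 + 4 * (x**3 + x))) / 2

print(y(0))  # 0.0, matching the initial condition
for x in (0.5, 1.0, 2.0):
    print(x, (x**3 + x) - (y(x)**2 - 3 * y(x)))  # each residual is ~0
```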
2. Homogeneous first-order differential equations. Suppose that you have a differential equation of the form
$$M(x, y) + N(x, y)\frac{dy}{dx} = 0,$$
where M(x, y), N(x, y) are a pair of degree-n homogeneous equations². We can solve this differential equation by defining v = y/x, which lets us make the substitution y = xv, and yields the equation
$$M(x, xv) + N(x, xv)\frac{d}{dx}(xv) = 0.$$
If M, N are homogeneous of degree n, this yields
$$x^n \cdot M(1, v) + x^n \cdot N(1, v) \cdot \left(v + x\frac{dv}{dx}\right) = 0,$$
which we can solve for $\frac{dv}{dx}$ to get
$$\frac{dv}{dx} = \left(-\frac{M(1, v)}{N(1, v)} - v\right) \cdot \frac{1}{x}.$$
This is a separable differential equation, and therefore is solvable by our earlier methods! Use them to solve this differential equation, and then finally substitute v = y/x back to get a solution for our original problem.
We calculate an example here:
² A function f(x, y) of two variables is called homogeneous of degree n if $f(tx, ty) = t^nf(x, y)$ for all t, x, y. For example, $f(x, y) = x^2 + xy + y^2$ is homogeneous of degree 2, as $f(tx, ty) = t^2x^2 + (tx)(ty) + t^2y^2 = t^2f(x, y)$.
Integrating both sides of this separable equation gives us
$$\int \frac{v}{1 + v^2}\,dv = \frac{1}{2}\ln(1 + v^2) = -\int \frac{1}{x}\,dx = -\ln(x) + c.$$
Plugging in v = y/x yields
$$\begin{aligned}
\frac{1}{2}\ln\left(1 + \frac{y^2}{x^2}\right) &= -\ln(x) + c \\
\Rightarrow\ 1 + \frac{y^2}{x^2} &= e^{-2\ln(x) + 2c} = \frac{C}{x^2} \\
\Rightarrow\ y^2 &= C - x^2 \\
\Rightarrow\ y &= \pm\sqrt{C - x^2}.
\end{aligned}$$
Example. Solve the differential equation
$$\frac{dy}{dx} + x^2y = x^5,$$
given the boundary condition that at x = 0 we want y = 0.
To integrate the RHS, we use the substitution $u = x^3/3$, motivated by the fact that e raised to anything that is not just a single variable is a total pain to work with: as $du = x^2\,dx$ and $3u = x^3$, we get
$$\begin{aligned}
ye^{x^3/3} &= \int x^5 e^{x^3/3}\,dx \\
&= \int x^3 e^{x^3/3}\,x^2\,dx \\
&= \int 3u e^u\,du \\
&= 3ue^u - 3e^u + C \\
&= e^{x^3/3}(x^3 - 3) + C \\
\Rightarrow\ y &= x^3 - 3 + \frac{C}{e^{x^3/3}}.
\end{aligned}$$
ex3 /3
Our boundary conditions tell us that we want (0, 0) to be a solution to our equation:
in other words, that 3 = C.
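A numeric check of this solution (Python, our own), using a central difference for y′:

```python
import math

def y(x):
    """Claimed solution of y' + x^2 y = x^5 with y(0) = 0."""
    return x**3 - 3 + 3 * math.exp(-x**3 / 3)

h = 1e-6
for x in (0.5, 1.0, 1.5):
    dydx = (y(x + h) - y(x - h)) / (2 * h)  # central-difference derivative
    print(x, dydx + x**2 * y(x) - x**5)     # the residual is ~0
print(y(0))  # 0.0, matching the boundary condition
```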
3. Exact differential equations. Suppose that you have a differential equation of the form $M(x, y) + N(x, y)\frac{dy}{dx} = 0$, where $\frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}$. We can solve this by finding a function F(x, y) whose partial derivative with respect to x is M(x, y) and whose partial derivative with respect to y is N(x, y): i.e. some function F(x, y) such that
$$F(x, y) = \int M(x, y)\,dx + C_y, \qquad F(x, y) = \int N(x, y)\,dy + C_x.$$
Note that I have written $C_y$, $C_x$ instead of the normal constants C; this is because when we integrate with respect to x, y is held constant (and similarly for y, x). Therefore, terms involving the variable we are not integrating with respect to are “constants” that can show up in our solution! (This is why we need to consider integrating both M(x, y) and N(x, y), and not just one of the two.)
Fun fact we’re not proving here: such a function always exists for exact differential equations, and you can always find it!
When you do, you’ll get that $\frac{\partial}{\partial x}F(x, y) = M(x, y)$, $\frac{\partial}{\partial y}F(x, y) = N(x, y)$. Therefore, we can write our differential equation in the form
$$\frac{\partial}{\partial x}F(x, y) + \frac{\partial}{\partial y}F(x, y)\frac{dy}{dx} = 0.$$
But this is simply the total derivative of the function F(x, y)! Therefore, if we are asking that this total derivative is 0, we are looking for the set of all points (x, y) on which F(x, y) is constant; that is, the set of all level curves of F(x, y), i.e.
$$F(x, y) = c.$$
i.e. $F(x, y) = \sin(xy) + C$. Our solutions are simply the level curves of this function; i.e. the set of all points (x, y) satisfying $\sin(xy) = c$. If we want (1, 0) on this curve, we want $\sin(0) = c$; i.e. c = 0, and therefore our solutions are the set of all points satisfying $\sin(xy) = 0$.
is indeed a function $\xi(x)$ that only depends⁴ on the variable x! Therefore, as suggested above, we can multiply both sides by the integrating factor $e^{\int \xi(x)\,dx} = e^{3x}$, to get
$$(3x^2y + y^3 + 2yx)e^{3x} + (x^2 + y^2)e^{3x}\frac{dy}{dx} = 0.$$
We can see that this equation now is exact, as
$$\frac{\partial}{\partial y}\left[(3x^2y + y^3 + 2yx)e^{3x}\right] = (3x^2 + 3y^2 + 2x)e^{3x} \quad \text{and} \quad \frac{\partial}{\partial x}\left[(x^2 + y^2)e^{3x}\right] = 2xe^{3x} + (x^2 + y^2) \cdot 3e^{3x}$$
are both equal. Therefore, we can find a solution by integrating $(3x^2y + y^3 + 2yx)e^{3x}$ and $(x^2 + y^2)e^{3x}$ appropriately:
$$\begin{aligned}
F(x, y) &= \int (3x^2y + y^3 + 2yx)e^{3x}\,dx \\
&= y\left(x^2e^{3x} - \frac{2}{3}xe^{3x} + \frac{2}{9}e^{3x}\right) + \frac{y^3}{3}e^{3x} + y\left(\frac{2}{3}xe^{3x} - \frac{2}{9}e^{3x}\right) + C_y \\
&= yx^2e^{3x} + \frac{y^3}{3}e^{3x} + C_y; \\
F(x, y) &= \int (x^2 + y^2)e^{3x}\,dy = yx^2e^{3x} + \frac{y^3}{3}e^{3x} + C_x.
\end{aligned}$$
So we have
$$F(x, y) = yx^2e^{3x} + \frac{y^3}{3}e^{3x} + C.$$
Solutions to our differential equation are level curves of this function; i.e. all x, y such that
$$yx^2e^{3x} + \frac{y^3}{3}e^{3x} = C.$$
Asking that (0, 0) is on such a curve is simply the restriction that C = 0; that is, we have
$$yx^2e^{3x} + \frac{y^3}{3}e^{3x} = 0.$$
⁴ Well, really, it doesn’t depend on anything. But that’s OK: constant functions are functions!
2 Example GRE Problems
We work four example problems here, taken from the three GRE exams you’ve completed thus far in this class:
Problem. Suppose that y = f(x) is a solution to the differential equation $(y - xe^x)\,dx + x\,dy = 0$ with f(1) = 0. What is f(2)?
(a) $\frac{1}{2e}$  (b) $\frac{1}{e}$  (c) $\frac{e^2}{2}$  (d) $2e$  (e) $2e^2$
Answer. This equation is exact, with $M(x, y) = y - xe^x$ and $N(x, y) = x$.
Therefore, we can solve this by simply integrating these two functions appropriately:
Z
F (x, y) = (y xex )dx = xy xex + ex + Cy ,
Z
F (x, y) = x dy = xy + Cx .
Combining these results gives us F (x, y) = xy − xe^x + e^x + C, which we want to find level curves for; i.e. our solutions look like xy − xe^x + e^x = C. If we plug in the point (1, 0), we get C = 0. Finally, if we want to find out what happens when we have x = 2, note that
$$2y - 2e^2 + e^2 = 0$$
implies that y = f (2) is just e²/2. In other words, our answer is (c).
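To double-check this, we can solve the level-curve relation xy − xe^x + e^x = 0 for y explicitly, getting y = e^x(x − 1)/x (this last bit of algebra is ours, not shown on the exam), and verify everything numerically:

```python
import math

def y(x):
    # solving x*y - x*e^x + e^x = 0 for y, valid for x != 0
    return math.exp(x) * (x - 1) / x

assert abs(y(1)) < 1e-12                      # the curve passes through (1, 0)
assert abs(y(2) - math.exp(2) / 2) < 1e-12    # y(2) = e^2 / 2, answer (c)

# the curve satisfies the level-curve relation at several sample points
for x in (0.5, 1.0, 2.0, 3.0):
    assert abs(x * y(x) - x * math.exp(x) + math.exp(x)) < 1e-9
```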
Problem. Which of the following five pictures gives the graphs of two functions satisfying the differential equation
$$\left(\frac{dy}{dx}\right)^2 + 2y\frac{dy}{dx} + y^2 = 0?$$
Answer. Factoring our equation yields
$$\left(\frac{dy}{dx} + y\right)^2 = 0,$$
i.e. dy/dx = −y, whose solutions are the exponential curves y = Ce^{−x}. The only answer whose curves have Ce^{−x}-like behavior is (a), so we have answered our question.
Problem. Suppose that we have a tank of water. This tank is a cube with vertical sides,
no top, and side length 10 feet. Let h(t) denote the height of the water level, in feet, above
the floor of the tank at time t.
Suppose that at time t = 0 water begins to pour into the tank at a constant rate of 1 cubic foot per second, and also begins to pour out of the tank at a rate of h(t)/4 cubic feet per second. As t approaches infinity, what is the limit of the volume of the water in the tank?
(a) 400 ft3 (b) 600 ft3 (c) 1000 ft3 (d) The limit DNE.
(e) We do not have enough information to solve this problem.
Answer. We note that on one hand, if we let V denote the volume of water in our tank, we have V = 100h; consequently, dV/dt = 100 · dh/dt. Conversely, we are given dV/dt directly as 1 − h/4; therefore, by combining, we have the differential equation
$$\frac{dh}{dt} + \frac{1}{400}h = \frac{1}{100}.$$
This is linear; therefore, if we multiply both sides by the integrating factor $e^{\int (1/400)\,dt} = e^{t/400}$, we get
$$\frac{dh}{dt}e^{t/400} + \frac{1}{400}e^{t/400}h = \frac{1}{100}e^{t/400}$$
$$\Rightarrow \frac{d}{dt}\left(he^{t/400}\right) = \frac{1}{100}e^{t/400}$$
$$\Rightarrow he^{t/400} = \int \frac{1}{100}e^{t/400}\,dt = 4e^{t/400} + C$$
$$\Rightarrow h = 4 + \frac{C}{e^{t/400}}.$$
As t goes to infinity, this expression converges to 4; therefore the volume, which is 100h,
goes to 400. So our answer is (a).
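If you would rather trust a computer than an integrating factor, here is a small forward-Euler simulation of dh/dt = 1/100 − h/400 (the step size and run length here are my choices, not from the problem) confirming the limit:

```python
# simulate dh/dt = 1/100 - h/400 starting from an empty tank
h, dt, t = 0.0, 0.1, 0.0
while t < 20_000.0:          # 20000 s is 50 time constants of 400 s
    h += dt * (1 / 100 - h / 400)
    t += dt

assert abs(h - 4) < 1e-6          # water level tends to 4 feet
assert abs(100 * h - 400) < 1e-4  # volume tends to 400 cubic feet
```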
Math 94 Professor: Padraic Bartlett
This is the eighth week of the Mathematics Subject Test GRE prep course; here, we run
a very rough-and-tumble review of abstract algebra! As always, this field is much bigger
than one class; accordingly, we focus our attention on key definitions and results.
• Identity: there is a unique identity element e ∈ G such that for any other g ∈ G, we have e · g = g · e = g.
Example. 1. The real numbers with respect to addition, which we denote as ⟨R, +⟩, is a group: it has the identity 0, any element x has an inverse −x, and it satisfies associativity.
2. Conversely, the real numbers with respect to multiplication, which we denote as ⟨R, ·⟩, is not a group: the element 0 ∈ R has no inverse, as there is nothing we can multiply 0 by to get to 1!
3. The nonzero real numbers with respect to multiplication, which we denote as ⟨R^×, ·⟩, is a group! The identity in this group is 1, every element x has an inverse 1/x such that x · (1/x) = 1, and this group satisfies associativity.
5. The integers with respect to multiplication, ⟨Z, ·⟩, do not form a group: for example, there is no integer we can multiply 2 by to get to 1.
6. The natural numbers N are not a group with respect to either addition or multiplication. For example: in addition, there is no element −1 ∈ N that we can add to 1 to get to 0, and in multiplication there is no natural number we can multiply 2 by to get to 1.
7. GL_n(R), the collection of all n × n invertible real-valued matrices, is a group under the operation of matrix multiplication. Notice that this group is an example of a non-abelian group, as there are many matrices for which AB ≠ BA: consider
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \quad\text{versus}\quad \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$
8. SL_n(R), the collection of all n × n invertible real-valued matrices with determinant 1, is also a group under the operation of matrix multiplication; this is because the property of having determinant 1 is preserved under taking inverses and products of matrices.
9. The integers mod n, Z/nZ, form a group with respect to addition! As a reminder, the object ⟨Z/nZ, +, ·⟩ is defined as follows:
Consider the element n. In particular, notice that for any k, we have
kn ≡ x mod p
⇒ kn − x is a multiple of p
⇒ kn − x is a multiple of mn
⇒ kn − x is a multiple of n
⇒ x is a multiple of n.
(If none of the above deductions make sense, reason them out in your head!)
Because of this, we can see that n has no inverse in (Z/pZ)^×, as kn is only congruent to multiples of n, and 1 is not a multiple of n.
• The converse — showing that if p is prime, (Z/pZ)^× has inverses — is a little trickier. We do this as follows: first, we prove the following claim.
Claim. For any a, b ∈ {0, . . . , p − 1}, if a · b ≡ 0 mod p, then at least one of a, b is equal to 0.
Proof. Take any a, b in {0, . . . , p − 1}. If one of a, b is equal to 0, then we know that a · b = 0 in the normal "multiplying integers" world that we've lived in our whole lives. In particular, this means that a · b ≡ 0 mod p as well.
Now, suppose that neither a nor b is equal to 0. Take both a and b. Recall, from grade school, the concept of factorization:
Observation. Take any nonzero natural number n. We can write n as a product of prime numbers n₁ · … · n_k; we think of these prime numbers n₁, … , n_k as the "factors" of n. Furthermore, these factors are unique, up to the order we write them in: i.e. there is only one way to write n as a product of prime numbers, up to the order in which we write those primes. (For example: while you could say that 60 can be factored as both 2 · 2 · 3 · 5 and as 3 · 2 · 5 · 2, those two factorizations are the same if we don't care about the order we write our numbers in.)
In the special case where n = 1, we think of this as already factored into the "trivial" product of no prime numbers.
Take a, and write it as a product of prime numbers a₁ · … · a_k. Do the same for b, and write it as a product of primes b₁ · … · b_m. Notice that because a and b are both numbers that are strictly between 0 and p, p cannot be one of these prime numbers (because positive multiples of p must be greater than p!)
In particular, this tells us that the number a · b on one hand can be written as the product of primes a₁ · … · a_k · b₁ · … · b_m, and on the other hand (because factorizations into primes are unique, up to ordering!) that there is no p in the prime factorization of a · b.
Conversely, for any natural number k, the number k · p must have a factor of p in its prime factorization. This is because if we factor k into prime numbers k₁ · … · k_j, we have k · p = k₁ · … · k_j · p, which is a factorization into prime numbers and therefore (up to the order we write our primes) is unique!
In particular, this tells us that for any k, the quantities a · b and k · p are distinct; one of them has a factor of p, and the other does not. Therefore, we have shown that if both a and b are nonzero, then a · b cannot be equal to a multiple of p — in other words, a · b is not congruent to 0 modulo p! Therefore, the only way to pick two a, b ∈ {0, . . . , p − 1} such that a · b is congruent to 0 modulo p is if at least one of them is equal to 0, as claimed.
• From here, the proof that our group has inverses is pretty straightforward. Take any x ∈ (Z/pZ)^×, and suppose for contradiction that it did not have an inverse. Look at the multiplication table for x in (Z/pZ)^×:

    1   2   3   ...   p − 1
x   ?   ?   ?   ...   ?

If x doesn't have an inverse, then 1 does not show up in the above table! The table has p − 1 slots, and by our claim each entry is nonzero; so if we're trying to fill it without using 1, we only have p − 2 possible values to put in this table. Therefore some value is repeated! In other words, there must be two distinct values k < l with xl ≡ xk mod p.
Consequently, we have x(l − k) ≡ 0 mod p, which by our above observation means that one of x, (l − k) is equal to 0. But x is nonzero, as it's actually in (Z/pZ)^×; therefore, l − k is equal to 0, i.e. l = k. But we said that k, l are distinct; so we have a contradiction! Therefore, every element x has an inverse, as claimed.
11. The symmetric group S_n is the collection of all of the permutations on the set {1, . . . , n}, where our group operation is composition. In case you haven't seen this before, a permutation f can be drawn in arrow notation, with an arrow from each element to its image:

(arrow diagram of a permutation f of {1, 2, 3})

• This, however, is not the most space-friendly way to write out a permutation. A much more condensed way to write down a permutation is using something called cycle notation. In particular: suppose that we want to denote the permutation that sends a₁ → a₂, a₂ → a₃, . . . , a_{n−1} → a_n, a_n → a₁, and does not change any of the other elements (i.e. keeps them all the same.) In this case, we would denote this permutation using cycle notation as the permutation
(a₁ a₂ a₃ . . . a_n).
To illustrate this notation, here are all six possible permutations on {1, 2, 3} in cycle notation: id, (12), (13), (23), (123), and (132).

(arrow diagrams for each of these six permutations)
The symmetric group has several useful properties. One notable one is the following:
Claim. For any finite group G of order n, G is isomorphic to a subgroup of S_n.
12. The dihedral group of order 2n, denoted D_{2n}, is constructed as follows:
Consider a regular n-gon. There are a number of geometric transformations, or similarities, that we can apply that send this n-gon to "itself" without stretching or tearing the shape: i.e. there are several rotations and reflections that, when applied to an n-gon, do not change the n-gon. For example, given a square, we can rotate the plane by 0°, 90°, 180°, or 270°, or flip over the horizontal, vertical, top-left/bottom-right, or top-right/bottom-left axes:
(diagrams of a square with corners labeled a, b, c, d under each of the four rotations and the four flips)
¹A permutation σ ∈ S_n is called a transposition if we can write σ = (ab), for two distinct values a, b ∈ {1, . . . , n}.
Given two such transformations f, g, we can compose them to get a new transformation f ∘ g. Notice that because these two transformations each individually send the n-gon to itself, their composition also sends the n-gon to itself! Therefore composition is a well-defined operation that we can use to combine two transformations.

(diagram: composing two symmetries of the labeled square yields a third symmetry)

This is a group!
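The claims in examples 9 and 10 — that addition mod n always works, but multiplication mod n only has inverses when the modulus is prime — are easy to brute-force. Here's a small check (mine, not from the notes):

```python
def invertible_elements(n):
    # which a in {1, ..., n-1} have some b with a*b = 1 mod n?
    return [a for a in range(1, n)
            if any(a * b % n == 1 for b in range(1, n))]

# for a prime modulus, every nonzero element is invertible...
assert invertible_elements(7) == [1, 2, 3, 4, 5, 6]
# ...but for composite p = mn, the multiples of the factors fail,
# exactly as argued above
assert invertible_elements(6) == [1, 5]
```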
Now that we have some examples of groups down, we list some useful concepts and
definitions for studying groups:
Definition. Take any two groups ⟨G, ·⟩, ⟨H, ⋆⟩, and any map φ : G → H. We say that φ is a group isomorphism if it satisfies the following two properties:
• φ is a bijection.²
• For any g₁, g₂ ∈ G, we have φ(g₁ · g₂) = φ(g₁) ⋆ φ(g₂).
This property "preserves structure" in the following sense: suppose that we have two elements we want to multiply together in H. Because φ is a bijection, we can write these two elements as φ(g₁), φ(g₂). Our property says that φ(g₁ · g₂) = φ(g₁) ⋆ φ(g₂): in other words, if we want to multiply our two elements in H together, we can do so using either the G-operation · by calculating φ(g₁ · g₂), or the H-operation ⋆ by calculating φ(g₁) ⋆ φ(g₂).
Similarly, if we want to multiply any two elements g₁, g₂ in G together, we can see that g₁ · g₂ = φ⁻¹(φ(g₁ · g₂)) = φ⁻¹(φ(g₁) ⋆ φ(g₂)). So, again, we can multiply elements using either G or H's operation! To choose which operation we use, we just need to apply φ or φ⁻¹ as appropriate to get to the desired set, and perform our calculations there.
²Notice that this means that there is an inverse map φ⁻¹ : H → G, defined by φ⁻¹(h) = the unique g ∈ G such that φ(g) = h.
Definition. Take any two groups ⟨G, ·⟩, ⟨H, ⋆⟩, and any map φ : G → H. We say that φ is a group homomorphism if it satisfies the "preserves structure" property above.
Definition. A subgroup H of a group G is called normal if for any g ∈ G, the left and right cosets³ gH, Hg are equal. We write H ⊴ G to denote this property.
Theorem. Suppose G is a group and H is a normal subgroup. Define the set G/H to be the collection of all of the distinct left cosets gH of H in G. This set forms something called the quotient group of G by H, if we define g₁H · g₂H = (g₁g₂)H. This is a useful construction, and comes up all the time: for example, you can think of Z/nZ as a quotient group, where G is Z and H = nZ = {n · k | k ∈ Z}.
Definition. Take any group ⟨G, ·⟩ of order n: that is, any group G consisting of n distinct elements. We can create a group table corresponding to G as follows:
• Take any ordering r₁, . . . , r_n of the n elements of G: we use these elements to label our rows.
• Similarly, take any ordering c₁, . . . , c_n of the n elements of G: we use these elements to label our columns.
• Using these two orderings, we create a n × n array, called the group table of G, as follows: in each cell (i, j), we put the entry r_i · c_j.
Theorem. Two groups ⟨G, ·⟩, ⟨H, ⋆⟩ are isomorphic if and only if there is a bijection φ : G → H such that when we apply φ to a group table of G, we get a group table of H.
Theorem. (Lagrange.) Let ⟨G, ·⟩ be a finite group, and g ∈ G be any element of G. Define the order of g to be the smallest value of n such that gⁿ = id. Then the order of g always divides the total number of elements in G, |G|.
More generally, suppose that H is any subgroup of G. Then |H| divides |G|.
This theorem has a useful special case (Fermat's little theorem) when we consider the group (Z/pZ)^×: for any a not divisible by the prime p,
$$a^{p-1} \equiv 1 \mod p.$$
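This special case is easy to check by machine; here's a one-line verification for a sample prime (my check, not part of the notes):

```python
# Fermat's little theorem: a^(p-1) = 1 mod p for every a in (Z/pZ)^x.
# pow(a, e, p) is Python's built-in modular exponentiation.
p = 11
assert all(pow(a, p - 1, p) == 1 for a in range(1, p))
```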
³The left coset gH of a subgroup H by an element g is the set {g · h | h ∈ H}. Basically, it's H if you "act" on each element by g. Right cosets are the same, but with Hg instead.
Theorem. Any finite abelian group G is isomorphic to a direct sum⁴ of groups of the form Z/p_j^{k_j}Z. In other words, for any finite abelian group G, we can find primes p₁, . . . , p_l and natural numbers k₁, . . . , k_l such that
$$G \cong \mathbb{Z}/p_1^{k_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p_l^{k_l}\mathbb{Z}.$$
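You can sanity-check this classification in a tiny case: Z/6Z ≅ Z/2Z ⊕ Z/3Z, via the map x ↦ (x mod 2, x mod 3). A quick brute-force verification (ours, not from the notes):

```python
# the map x mod 6 -> (x mod 2, x mod 3) is a bijection on 6 elements...
pairs = [(x % 2, x % 3) for x in range(6)]
assert len(set(pairs)) == 6

# ...and it respects addition, so Z/6Z is isomorphic to Z/2Z (+) Z/3Z
for a in range(6):
    for b in range(6):
        s = (a + b) % 6
        assert (s % 2, s % 3) == ((a % 2 + b % 2) % 2, (a % 3 + b % 3) % 3)
```

This is exactly the Chinese remainder theorem in miniature: it works because 2 and 3 are coprime.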
Some people will denote a ring with a multiplicative identity as a "ring with unity." I believe it is slightly more standard to assume that all rings have multiplicative identities, and in the odd instance that you need to refer to a ring without a multiplicative identity, to call it a "rng."
Example. 1. The integers with respect to addition and multiplication form a ring, as do the rational, real, and complex number systems.
2. The Gaussian integers Z[i], consisting of the set of all complex numbers {a + bi | a, b ∈ Z, i = √−1}, form a ring with respect to addition and multiplication.
3. Z/nZ is a ring for any n.
Definition. An integral domain is any ring R where the following property holds: whenever a · b = 0, at least one of a, b is equal to 0. (In other words, R has no zero divisors.)
Example. 1. The integers with respect to addition and multiplication form an integral domain, as do the rational, real, and complex number systems.
⁴A group G is called the direct sum of two groups H₁, H₂ if the following properties hold:
• Both H₁, H₂ are normal subgroups of G.
• H₁ ∩ H₂ is the identity; i.e. these two subgroups only overlap at the identity element.
• Any element in G can be expressed as a finite combination of elements from H₁, H₂.
We then write G = H₁ ⊕ H₂.
2. Z/nZ is not an integral domain for any composite n: if we can write n = ab for two a, b < n, then we have that a · b ≡ 0 mod n, while neither a nor b is a multiple of n (and thus neither is equivalent to 0).
Definition. A field is any commutative ring R where ⟨R^×, ·⟩ is a group. (By R^×, we mean the set of all elements in R other than the additive identity.)
Example. 1. The integers with respect to addition and multiplication are not a field.
2. The rationals, reals, and complex number systems are fields with respect to addition
and multiplication!
There are many, many theorems about rings and fields; however, the GRE will not require you to know most of them. Instead, it mostly wants you to be familiar with what they are, and how they are defined!
To illustrate how the GRE tests you on these concepts, we run a few practice problems
here:
(a) 1 only. (b) 1 and 2 only. (c) 2 and 3 only. (d) 1 and 3 only. (e) 1, 2 and 3.
Answer. We can answer this problem quickly by classifying all possible homomorphisms φ : G → G. We can first notice that we must send 1 to 1 if we are a homomorphism. To see this, notice that φ(1) = φ(1 · 1) = φ(1) · φ(1), and therefore by canceling a φ(1) on both sides we have 1 = φ(1).
Now, consider where to send i. If we have φ(i) = i, then we must have φ(−1) = φ(i²) = φ(i)φ(i) = i · i = −1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = i³ = −i. So we're the identity map z ↦ z.
If we have φ(i) = 1, then we must have φ(−1) = φ(i²) = φ(i)φ(i) = 1 · 1 = 1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = 1³ = 1. So we're the map that sends everything to 1; alternately, we're the map z ↦ z⁴.
Similarly, if we have φ(i) = −1, then we must have φ(−1) = φ(i²) = φ(i)φ(i) = (−1) · (−1) = 1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = (−1)³ = −1. So we're the map z ↦ z².
The last possibility is φ(i) = −i; then we must have φ(−1) = φ(i²) = φ(i)φ(i) = (−i) · (−i) = −1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = (−i)³ = i. So we're the map z ↦ z̄, or alternately the map z ↦ z³.
As a result, the claims 1, 2, 3 are all true; so our answer is (e).
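The case analysis above can be machine-checked by representing the group as the fourth roots of unity in Python's complex numbers (this verification is mine, not from the exam):

```python
# the fourth roots of unity under multiplication: G = {1, i, -1, -i}
G = [1, 1j, -1, -1j]

def is_hom(phi):
    # phi preserves structure iff phi(ab) = phi(a)phi(b) for all a, b
    return all(phi[a * b] == phi[a] * phi[b] for a in G for b in G)

# each power map z -> z^k (k = 1, 2, 3, 4) is a homomorphism...
power_maps = {k: {g: g ** k for g in G} for k in range(1, 5)}
assert all(is_hom(power_maps[k]) for k in range(1, 5))

# ...and these four maps are pairwise distinct, since each sends the
# generator i to a different element
assert {power_maps[k][1j] for k in range(1, 5)} == {1, 1j, -1, -1j}
```

Since a homomorphism of this cyclic group is determined by where it sends i, these four power maps are the full list, matching the classification above.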
Problem. Let R be a ring. Define a right ideal of R as any subset U of R such that
• U is an additive subgroup of R, and
• for any u ∈ U and r ∈ R, we have u · r ∈ U.
Suppose that R only has two distinct right ideals. Which of the following properties must hold for R?
1. R is infinite.
2. R is commutative.
3. R is a division ring; that is, every nonzero element in R has a multiplicative inverse.
(a) 1 only. (b) 2 only. (c) 3 only. (d) 2 and 3 only. (e) 1, 2 and 3.
Answer. We first notice that any ring always has at least two right ideals, namely {0} and R.
The first property is eliminated by noticing that R = Z/2Z is a ring. Its only additive subgroups are {0} and R, so in particular those are its only two right ideals, and this ring is clearly finite.
The second property is eliminated by recalling the quaternions H, which are a noncommutative ring whose only right ideals are {0} and H (because every nonzero quaternion is invertible)!
Finally, we can verify that the third property must hold. To see this, take any nonzero a, and consider the set aR = {ar | r ∈ R}. This is an additive subgroup, and also a right ideal! Therefore it is either the all-zero subgroup (impossible, as a · 1 = a ≠ 0) or all of R. But this means that there is some r ∈ R such that ar = 1; so a has an inverse!
This leaves 3 as the only possibility, so our answer is (c).
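In a small commutative ring like Z/nZ (where left and right ideals coincide), the ideals can simply be brute-forced; this little search (mine, not from the exam) confirms the "field means only two ideals" picture:

```python
from itertools import combinations

def ideals(n):
    # brute-force all subsets of Z/nZ closed under + and under
    # multiplication by arbitrary ring elements
    R = range(n)
    found = []
    for size in range(1, n + 1):
        for subset in combinations(R, size):
            S = set(subset)
            if (0 in S
                    and all((a + b) % n in S for a in S for b in S)
                    and all(a * r % n in S for a in S for r in R)):
                found.append(frozenset(S))
    return found

assert len(ideals(5)) == 2   # Z/5Z is a field: only {0} and R
assert len(ideals(6)) == 4   # Z/6Z also has {0, 2, 4} and {0, 3}
```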
Math 94 Professor: Padraic Bartlett
This is the tenth week of the Mathematics Subject Test GRE prep course; here, we quickly review a handful of useful concepts from the fields of probability, combinatorics, and set theory!
As always, each of these fields is something you could spend years studying; we present here a few very small slices of each topic that are particularly key.
Definition. A set S is simply any collection of elements. We denote a set S by its elements, which we enclose in a set of curly braces. For example, the set of female characters in Measure for Measure is
{Isabella, Mariana, Juliet, Francisca, Mistress Overdone}.
Another way to describe a set is not by listing its elements, but by listing its properties. For example, the even integers greater than π can be described as follows:
{x | x is an even integer, x > π} = {4, 6, 8, 10, . . .}.
Definition. A specific set that we often care about is the empty set ∅; i.e. the set containing no elements. One particular quirk of the empty set is that any statement of the form ∀x ∈ ∅, . . . will always be vacuously true, as it is impossible to disprove (as we disprove ∀ claims by using ∃ quantifiers!) For example, the statement "every element of the empty set is delicious" is true. Dumb, but true!
Some other frequently-occurring sets are the open intervals (a, b) = {x | x ∈ R, a < x < b} and closed intervals [a, b] = {x | x ∈ R, a ≤ x ≤ b}.
Definition. Given two sets A, B, we can form several other particularly useful sets:
• The difference of A and B, denoted A − B or A∖B. This is the set {x | x ∈ A, x ∉ B}.
• The cartesian product of A and B, denoted A × B, is the set of all ordered pairs of the form (a, b); that is, {(a, b) | a ∈ A, b ∈ B}.
• Sometimes, we will have some larger set B (like R) out of which we will be picking some subset A (like [0, 1].) In this case, we can form the complement of A with respect to B, namely Aᶜ = {b ∈ B | b ∉ A}.
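Python's built-in sets make all of these operations concrete; a quick demonstration (my example sets, chosen arbitrarily):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

assert A - B == {1, 2}                      # difference A \ B
assert A & B == {3, 4}                      # intersection
assert A | B == {1, 2, 3, 4, 5}             # union

# cartesian product as a set of ordered pairs
assert {(a, b) for a in {1, 2} for b in {"x"}} == {(1, "x"), (2, "x")}

# complement of A with respect to a larger set U
U = set(range(6))
assert U - A == {0, 5}
```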
Examples.
• The function h depicted below by the three arrows is a function, with domain {1, ∅, φ} and a three-element codomain containing 24 and Batman:

(arrow diagram of h: each of 1, ∅, and φ is sent by an arrow into the codomain, with both ∅ and φ landing on 24)
This may seem like a silly example, but it’s illustrative of one key concept: functions are
just maps between sets! Often, people fall into the trap of assuming that functions
have to have some nice “closed form” like x3 sin(x) or something, but that’s not true!
Often, functions are either defined piecewise, or have special cases, or are generally fairly
ugly/awful things; in these cases, the best way to think of them is just as a collection of
arrows from one set to another, like we just did above.
Definition. We call a function f injective if it never hits the same point twice – i.e. for every b ∈ B, there is at most one a ∈ A such that f (a) = b.
Example. The function h from before is not injective, as it sends both ∅ and φ to 24:

(arrow diagram of h, with two arrows landing on 24)

However, if we add a new element π to our codomain, and make φ map to π, our function is now injective, as no two elements in the domain are sent to the same place:

(arrow diagram of the modified function)
Definition. We call a function f surjective if it hits every single point in its codomain – i.e. if for every b ∈ B, there is at least one a ∈ A such that f (a) = b.
Alternately: define the image of a function as the collection of all points that it maps to. That is, for a function f : A → B, define the image of f , denoted f (A), as the set {b ∈ B | ∃a ∈ A such that f (a) = b}.
Then a surjective function is any map whose image is equal to its codomain: i.e. f : A → B is surjective if and only if f (A) = B.
Example. The function h from before is not surjective, as it doesn't send anything to Batman:

(arrow diagram of h, with nothing landing on Batman)

However, if we add a new element ρ to our domain, and make ρ map to Batman, our function is now surjective, as it hits all of the elements in its codomain:

(arrow diagram of the modified function)
Definition. We say that two sets A, B are the same size (formally, we say that they are of the same cardinality,) and write |A| = |B|, if and only if there is a bijection f : A → B. For a finite set A, |A| is simply the number of total elements in A.
To give an example, consider the following problem:
Problem. Suppose that we have n friends and k different kinds of postcards (with arbitrarily many postcards of each kind.) In how many ways can we mail a postcard to each of our friends?
A valid "way" to mail postcards to friends is some way to assign each friend a kind of postcard, so that each friend is assigned at least one postcard (because we're mailing each of our friends a postcard) and no friend is assigned two different postcards at the same time. In other words, a "way" to mail postcards is just a function from the set¹ [n] = {1, 2, 3, . . . , n} of friends to our set [k] = {1, 2, 3, . . . , k} of postcard kinds!
In other words, we want to find the size of the following set:
A = {all of the functions that map [n] to [k]}.
We can do this! Think about how any function f : [n] → [k] is constructed. For each value in [n] = {1, 2, . . . , n}, we have to pick exactly one value from [k]. Doing this for each value in [n] completely determines our function; furthermore, any two functions f, g are different if and only if there is some value m ∈ [n] at which we made a different choice (i.e. where f (m) ≠ g(m).)
Consequently, we have
$$\underbrace{k \cdot k \cdot \ldots \cdot k}_{n \text{ times}} = k^n$$
total ways in which we can construct distinct functions. This gives us the answer kⁿ to our problem!
• (Summation principle.) Suppose that you have a set A that you can write as the union² of several smaller disjoint³ sets A₁, . . . , A_n. Then the number of elements in A is just the summed number of elements in the A_i sets. If we let |S| denote the number of elements in a set S, then we can express this in a formula:
$$|A| = |A_1| + |A_2| + \ldots + |A_n|.$$
Question 1. Pizzas! Specifically, suppose Pizza My Heart (a local chain/great pizza place) has the following deal on pizzas: for $7, you can get a pizza with any two different vegetable toppings, or any one meat topping. There are m meat choices and v vegetable choices. As well, with any pizza you can pick one of c cheese choices. How many different kinds of pizza are covered by this sale?
Using the summation principle, we can break our pizzas into two types: pizzas with
one meat topping, or pizzas with two vegetable toppings.
For the meat pizzas, we have m · c possible pizzas, by the multiplication principle (we
pick one of m meats and one of c cheeses.)
For the vegetable pizzas, we have $\binom{v}{2} \cdot c$ possible pizzas (we pick two different vegetables out of v vegetable choices, and the order doesn't matter in which we choose them; we also choose one of c cheeses.)
Therefore, in total, we have $c \cdot \left(m + \binom{v}{2}\right)$ possible pizzas!
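This count is easy to verify by exhaustively listing the pizzas for small menu sizes (the particular values of m, v, c below are made up for the check):

```python
from itertools import combinations
from math import comb

m, v, c = 3, 4, 2   # hypothetical menu sizes, just for the check
meat_pizzas = [(mt, ch) for mt in range(m) for ch in range(c)]
veg_pizzas = [(pair, ch)
              for pair in combinations(range(v), 2)
              for ch in range(c)]
total = len(meat_pizzas) + len(veg_pizzas)
assert total == c * (m + comb(v, 2))   # matches the formula above
```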
• (Double-counting principle.) Suppose that you have a set A, and two different expressions that count the number of elements in A. Then those two expressions are equal.
Again, we work a simple example:
Consider an (n + 1) × (n + 1) square grid of dots. How many dots are in this grid? On one hand, the answer is easy to calculate: it's (n + 1) · (n + 1) = n² + 2n + 1.
On the other hand, suppose that we group dots by the following diagonal lines:
The number of dots in the top-left line is just one; the number in the line directly beneath that line is two, the number directly beneath that line is three, and so on/so forth until we get to the line containing the bottom-left and top-right corners, which contains n + 1 dots. From there, as we keep moving right, our lines go down by one in size each time until we get to the line containing only the bottom-right corner, which again has just one point.
So, if we use the summation principle, we have that there are
1 + 2 + 3 + . . . + (n − 1) + n + (n + 1) + n + (n − 1) + . . . + 3 + 2 + 1
points in total.
Therefore, by our double-counting principle, we have just shown that
n² + 2n + 1 = 1 + 2 + 3 + . . . + (n − 1) + n + (n + 1) + n + (n − 1) + . . . + 3 + 2 + 1.
Rearranging the right-hand side using summation notation lets us express this as
$$n^2 + 2n + 1 = (n + 1) + 2\sum_{i=1}^{n} i;$$
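Both counts can be checked directly for a concrete grid size (the value n = 10 below is my choice):

```python
n = 10
# dots counted diagonal by diagonal: 1, 2, ..., n+1, n, ..., 2, 1
diagonal_counts = list(range(1, n + 2)) + list(range(n, 0, -1))
assert sum(diagonal_counts) == (n + 1) ** 2

# the rearranged double-counting identity
assert (n + 1) + 2 * sum(range(1, n + 1)) == n ** 2 + 2 * n + 1
```

Solving the identity for the sum recovers the familiar formula 1 + 2 + ... + n = n(n + 1)/2.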
Then, in any set S of two or more people, there are at least two people with the same number of friends in S.
Let |S| = n. Then every person in S has between 0 and n − 1 friends in S. Also notice that we can never simultaneously have one person with 0 friends and one person with n − 1 friends at the same time, because if someone has n − 1 friends in S, they must be friends with everyone besides themselves.
Therefore, each person has at most n − 1 possible numbers of friends, and there are n people total: by the pigeonhole principle, if we think of people as the "pigeons" and group them by their numbers of friends (i.e. the "pigeonholes" are this grouping by numbers of friends,) there must be some pair of people whose friendship numbers are equal.
• Suppose that we have a set of n objects, and we want to pick k of them without repetition in order. Then there are n · (n − 1) · . . . · (n − k + 1) many ways to choose them: we have n choices for the first, n − 1 for the second, and so on/so forth until our k-th choice (for which we have n − k + 1 choices.) We can alternately express this as n!/(n − k)!; you can see this algebraically by cancelling the common factors of n! and (n − k)!, or conceptually by thinking of our choice process as actually ordering all n elements (the n! in our fraction) and then forgetting about the ordering on all of the elements after the first k, as we didn't pick them (this divides by (n − k)!.)
• Suppose that we have a set of n objects, and we want to pick k of them without repetition and without caring about the order in which we pick these k elements. Then there are n!/(k!(n − k)!) many ways for this to happen. We denote this number as the binomial coefficient $\binom{n}{k}$.
• Finally, suppose that we have a set of n objects, and we want to pick k of them in order, where we can pick an element multiple times (i.e. with repetition.) Then there are n^k many ways to do this, by our multiplication principle from before.
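All three counting formulas can be confirmed by brute-force enumeration for small n and k (the values below are mine, picked arbitrarily):

```python
from itertools import permutations, combinations, product
from math import comb, factorial

n, k = 6, 3
# ordered, without repetition: n!/(n-k)!
assert len(list(permutations(range(n), k))) == factorial(n) // factorial(n - k)
# unordered, without repetition: the binomial coefficient
assert len(list(combinations(range(n), k))) == comb(n, k)
# ordered, with repetition: n^k
assert len(list(product(range(n), repeat=k))) == n ** k
```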
• A finite set Ω.
• A measure P r on Ω, such that P r(Ω) = 1. In case you haven't seen this before, saying that P r is a measure is a way of saying that P r is a function P(Ω) → R⁺ such that the following properties are satisfied:
– P r(∅) = 0.
– For any collection {Xᵢ}ᵢ₌₁^∞ of subsets of Ω, $Pr\left(\bigcup_{i=1}^{\infty} X_i\right) \leq \sum_{i=1}^{\infty} Pr(X_i)$.
– For any collection {Xᵢ}ᵢ₌₁^∞ of disjoint subsets of Ω, $Pr\left(\bigcup_{i=1}^{\infty} X_i\right) = \sum_{i=1}^{\infty} Pr(X_i)$.
For a general probability space, i.e. one that may not be finite, the definition is almost
completely the same: the only di↵erence is that ⌦ is not restricted to be finite, while P r
becomes a function defined only on the “measurable” subsets of ⌦. (For the GRE, you can
probably assume that any set you run into is “measurable.” There are some pathological
constructions in set theory that can be nonmeasurable; talk to me to learn more about
these!)
For example, consider our six-sided die probability space again, and the random variable X defined by X(i) = i (in other words, X is the random variable that outputs the top face of the die when we roll it.) The expected value of X would be
$$\sum_{\omega \in \Omega} Pr(\omega) \cdot X(\omega) = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{21}{6} = \frac{7}{2}.$$
In other words, rolling a fair six-sided die once yields an average face value of 3.5.
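The same computation, done exactly with rational arithmetic (a quick check of my own):

```python
from fractions import Fraction

# E[X] = sum over faces of Pr(face) * face, computed exactly
E = sum(Fraction(1, 6) * face for face in range(1, 7))
assert E == Fraction(7, 2)   # i.e. 3.5
```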
Definition. For any two events A, B that occur with nonzero probability, define P r(A given B), denoted P r(A|B), as the likelihood that A happens given that B happens as well. Mathematically, we define this as follows:
$$Pr(A|B) = \frac{Pr(A \cap B)}{Pr(B)}.$$
In other words, we are taking as our probability space all of the events for which B happens, and measuring how many of them also have A happen.
Definition. Take any two events A, B that occur with nonzero probability. We say that A
and B are independent if knowledge about A is useless in determining knowledge about
B. Mathematically, we can express this as follows:
P r(A) = P r(A|B).
Definition. Take any n events A₁, A₂, . . . , A_n that each occur with nonzero probability. We say that these n events are mutually independent if knowledge about any of these Aᵢ events is useless in determining knowledge about any other A_j. Mathematically, we can express this as follows: for any i₁, . . . , i_k and j ≠ i₁, . . . , i_k, we have
$$Pr(A_j) = Pr(A_j \mid A_{i_1} \cap \ldots \cap A_{i_k}).$$
Example. There are many, many examples. One of the simplest is the following: consider the probability space generated by rolling two fair six-sided dice, where any pair (i, j) of faces comes up with probability 1/36.
Consider the following three events: A, that the first die comes up even; B, that the second die comes up even; and C, that the sum of the two dice is odd.
Each of these events clearly has probability 1/2. Moreover, the probabilities of A ∩ B, A ∩ C and B ∩ C are all clearly 1/4; in the first case we are asking that both dice come up even, in the second we are asking for (even, odd) and in the third asking for (odd, even), all of which happen 1/4 of the time. So these events are pairwise independent, as the probability that any two happen is just the product of their individual probabilities.
However, A ∩ B ∩ C is impossible, as A ∩ B forces the sum of our two dice to be even, while C asks that it be odd! So P r(A ∩ B ∩ C) = 0 ≠ P r(A)P r(B)P r(C) = 1/8, and therefore we are not mutually independent.
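Taking the three events to be "first die even," "second die even," and "sum odd" (matching the intersections described above), a brute-force pass over all 36 outcomes confirms the whole example:

```python
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {o for o in outcomes if o[0] % 2 == 0}           # first die even
B = {o for o in outcomes if o[1] % 2 == 0}           # second die even
C = {o for o in outcomes if (o[0] + o[1]) % 2 == 1}  # sum is odd

def prob(event):
    return Fraction(len(event), 36)

assert prob(A) == prob(B) == prob(C) == Fraction(1, 2)
# pairwise independent: each pairwise intersection has probability 1/4
assert prob(A & B) == prob(A & C) == prob(B & C) == Fraction(1, 4)
# but not mutually independent: the triple intersection is empty
assert prob(A & B & C) == 0
```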
MATH GRE BOOTCAMP: LECTURE NOTES
IAN COLEY
inverse functions and their derivatives, logarithms and exponential functions and
their derivatives.
1.1. Basics.
Definition 1.1. Let f : R → R be a function. We say that $\lim_{x \to a} f(x) = L$ if for every ε > 0, there exists δ > 0 such that whenever 0 < |x − a| < δ, we have |f (x) − L| < ε.
We will worry about limits of sequences on another day.
Definition 1.2. We say that f : R → R is continuous at x = a if $\lim_{x \to a} f(x) = f(a)$.
We also have one-sided limits. This is more easily drawn than defined (and they also have no analogue in multivariable calculus, so we will omit the full definition). If this were a blackboard, there would be a better example here.
When are functions discontinuous? Jump discontinuities (almost always in piecewise functions), infinite discontinuities, and removable discontinuities (holes). One could also consider a function discontinuous at the points where it ceases to exist, e.g. f (x) = √x is discontinuous for x < 0. For examples of continuous functions, think of almost literally any function: polynomials, trigonometric functions, logarithms, exponential functions, etc.
Last Updated: December 5, 2018.
It is likely unimportant for the GRE, but it might be nice to recall the squeeze
theorem just in case:
Theorem 1.4 (Squeeze Theorem). Suppose that f, g, h : R → R are three functions. If f(x) ≤ g(x) ≤ h(x) in a neighbourhood of x = a, then
lim_{x→a} f(x) ≤ lim_{x→a} g(x) ≤ lim_{x→a} h(x).
This is particularly useful when f(x) and h(x) are continuous at x = a (or one is even constant) but g(x) isn't: if the two outer limits agree, they pin down the limit of g.
Problem 1.5. Compute lim_{x→0} x · sin(1/x).
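As a quick sanity check (my addition, not part of the original notes), the squeeze bound −|x| ≤ x·sin(1/x) ≤ |x| forces the limit to be 0, and sympy agrees:

```python
import sympy as sp

x = sp.symbols('x')

# Squeeze: -|x| <= x*sin(1/x) <= |x|, and both bounds tend to 0 as x -> 0.
limit_value = sp.limit(x * sp.sin(1 / x), x, 0)
print(limit_value)  # 0
```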
The second main definition we need for calculus is that of differentiable functions.

Definition 1.6. We say that f : R → R is differentiable at x = a if
lim_{x→a} (f(x) − f(a))/(x − a)
exists. Alternatively, we can ask that lim_{h→0} (f(a + h) − f(a))/h exists. We write f′(a) for this value.
Problem 1.7. Prove that these two definitions agree.

Problem 1.8. Prove that if f(x) is differentiable at x = a, it is continuous at x = a.

When are functions non-differentiable? By the previous exercise, when they're discontinuous. It's important to remember to always check continuity first, as in the following:
Problem 1.9. Describe the set of solutions (a, b, c) ∈ R³ such that the following function is continuous and differentiable (everywhere):
f(x) = { ax² + bx + c   if x ≤ 1
       { x log x        if x > 1
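The conditions amount to matching values and first derivatives at the seam x = 1. As a check (my addition, not in the notes), sympy reduces the solution set to a one-parameter family:

```python
import sympy as sp

a, b, c, x = sp.symbols('a b c x')

left = a*x**2 + b*x + c
right = x * sp.log(x)

# Continuity at x = 1: a + b + c = 1*log(1) = 0.
continuity = sp.Eq(left.subs(x, 1), right.subs(x, 1))
# Differentiability at x = 1: 2a + b = log(1) + 1 = 1.
smoothness = sp.Eq(sp.diff(left, x).subs(x, 1), sp.diff(right, x).subs(x, 1))

solution = sp.solve([continuity, smoothness], [b, c])
print(solution)  # b = 1 - 2*a, c = a - 1, with a free
```

So the solution set is the line {(a, 1 − 2a, a − 1) : a ∈ R} in R³.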
1.2. Derivatives. Of course we all remember the basic rules for doing derivatives, but let's recall them anyway: let f, g : R → R be two functions.
• (f ± g)′(x) = f′(x) ± g′(x)
• (c · f)′(x) = c · f′(x) for all c ∈ R
• d/dx sin(x) = cos(x)
• d/dx cos(x) = −sin(x)
• d/dx sinh(x) = cosh(x)
• d/dx cosh(x) = sinh(x)
• d/dx e^x = e^x
• d/dx log(x) = 1/x
Other trigonometric functions can be computed using the quotient rule.
Perhaps it is also worth remembering the less-used but GRE-noteworthy formula for the second derivative of a function:

Problem 1.12. Prove that
f″(x) = lim_{h→0} (f(x + h) + f(x − h) − 2f(x))/h².
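A quick numerical illustration (my addition, not in the notes): the symmetric difference quotient above, applied to f(x) = e^x at x = 0, should approach f″(0) = 1 as h shrinks.

```python
import math

def symmetric_second_difference(f, x, h):
    """Approximate f''(x) via (f(x+h) + f(x-h) - 2 f(x)) / h^2."""
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

# f(x) = e^x has f''(0) = 1; the quotient should be close to 1 for small h.
approx = symmetric_second_difference(math.exp, 0.0, 1e-4)
print(approx)  # close to 1.0
```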
What other techniques do we use for computing derivatives? Computing the derivatives of inverse functions can be difficult, specifically when we don't have a closed formula for the inverse. What circumstances are those?

Definition 1.13. Let f : R → R be a function, and suppose that X ⊂ R is a set on which f is one-to-one. Then we say that f is invertible on X, and write f⁻¹(y) for the inverse, which is defined by f⁻¹(y) = x if and only if f(x) = y.
Problem 1.14. Suppose that f : R → R is a function and that for all x ∈ X ⊂ R, f′(x) > 0. Then f is invertible on X.

Problem 1.15. Suppose that f : R → R is a differentiable invertible function. Then f⁻¹ : R → R is also differentiable, except at those y = f(x) ∈ R with f′(x) = 0.
How do we compute the derivative of the inverse? Write f⁻¹ = g for simplicity. Then f(g(y)) = y, so taking derivatives and using the chain rule we have
(f ∘ g)′(y) = f′(g(y)) · g′(y) = 1  ⟹  g′(y) = 1/f′(g(y)).
So as long as we can figure out g(y) and f′(x), we can figure out g′(y).
Problem 1.16. Compute the derivative of tan⁻¹(y).

Solution. By the formula, the derivative of tan⁻¹(y) is the reciprocal of the derivative of tan(x) evaluated at x = tan⁻¹(y). The derivative of tan(x) is sec²(x), so we need to figure out sec²(tan⁻¹(y)). This is done with a technique I personally call 'draw the triangle'. We know that tan⁻¹(y) = x for some x, so we need to draw the triangle in which x is one of the angles. We know only that tan(x) = y, so we may draw a right triangle:
[Figure: a right triangle with angle x, adjacent side 1, opposite side y, and hypotenuse √(1 + y²).]
Therefore sec²(x) is the hypotenuse squared over the adjacent side squared, that is, sec²(tan⁻¹(y)) = 1 + y². Therefore the derivative of tan⁻¹(y) is
1/(1 + y²).
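Sympy confirms the triangle computation (my addition, not in the notes):

```python
import sympy as sp

y = sp.symbols('y')

derivative = sp.diff(sp.atan(y), y)
print(derivative)  # 1/(y**2 + 1)
```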
This method can be used to compute the derivatives of the other inverse trigonometric and hyperbolic trigonometric functions on the fly, so you don't necessarily need to memorise all of them. That said, you should definitely memorise the above example.
Logarithmic differentiation is a useful technique, and it also recalls implicit differentiation.
Definition 1.17. Let F : R² → R be a function and let F(x, y) = c implicitly define a function of one variable f : (a, b) → R. Then
dy/dx = −Fx/Fy
where Fx, Fy are the partial derivatives of F.

In practice, life is easier than this. We will illustrate with an example.
Problem 1.18. Find the tangent line to the circle x² + y² = 25 at (3, 4).

Solution. In practice, we just take the derivative of everything with respect to x and recall that y′ = dy/dx. Hence
0 = 2x + 2y · y′  ⟹  y′ = −2x/2y = −x/y.
This is exactly what we get when we use the definition as well. To finish the actual problem, we have
dy/dx at (x, y) = (3, 4) equal to −3/4,
so that the tangent line is y − 4 = −(3/4)(x − 3).
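The formula dy/dx = −Fx/Fy from Definition 1.17 gives the same slope; a sympy check (my addition, not in the notes):

```python
import sympy as sp

x, y = sp.symbols('x y')

F = x**2 + y**2 - 25

# Implicit differentiation: dy/dx = -F_x / F_y.
slope = -sp.diff(F, x) / sp.diff(F, y)
print(slope.subs({x: 3, y: 4}))  # -3/4
```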
1.3. Applications of Derivatives. Related rates problems, which involve parametric functions as discussed above, show up quite often in the single-variable curriculum. Classic examples include balloons filling up or deflating and basins filling up or emptying of water. Ladders sliding down a wall or shadows lengthening are also common.
Problem 1.21. Suppose we have a right conical coffee filter which is 8cm tall with a radius of 4cm. The water drips through at a constant rate of 2 cubic centimetres per second. When there is one eighth of the original water remaining, how fast is the water level dropping?

Solution. We begin by noting that V(t) = (π/3) · h(t) · r(t)². But because our cone is conical, we also know that the height and radius of the cone are in a fixed ratio: h(t) = 2 · r(t). Because we are solving for h′(t), we will substitute in r(t) = h(t)/2. Hence
V(t) = (π/3) · h(t) · h(t)²/4 = (π/12) · h(t)³.
Taking the time derivative,
V′(t) = (π/4) · h(t)² · h′(t).
Letting t = t0 be the time at which we would like to find the change in height, we know that V′(t0) = −2 no matter what (the water is draining). Therefore we need to find h(t0) to complete the problem. We know that V(t0) = V(0)/8, and that V(0) = (π/3) · 8 · 4² = 128π/3. So as V(t0) = (π/12) · h(t0)³ = 16π/3, it's pretty clear that h(t0) = 4. Therefore
−2 = V′(t0) = (π/4) · 4² · h′(t0)  ⟹  h′(t0) = −1/(2π),
so the water level is dropping at 1/(2π) cm per second.
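A symbolic check of the final step (my addition, not in the notes; it assumes the V = (π/12)h³ setup above):

```python
import sympy as sp

t, hp = sp.symbols('t hp')   # hp stands in for h'(t0)
h = sp.Function('h')

V = sp.pi / 12 * h(t)**3     # volume of water in terms of its height
dV = sp.diff(V, t)           # (pi/4) * h(t)**2 * h'(t)

# At the moment in question h = 4, and V' = -2 since water is leaving:
equation = sp.Eq(dV.subs(sp.Derivative(h(t), t), hp).subs(h(t), 4), -2)
rate = sp.solve(equation, hp)[0]
print(rate)  # -1/(2*pi)
```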
We can now turn to optimisation. This is certainly a favourite in the undergraduate curriculum and appears sometimes in the GRE. Why does it work? The answer is the following theorem, credited to Fermat by various sources.

Theorem 1.24 (Fermat's Boring Theorem). Let f : [a, b] → R be a differentiable function (except at the endpoints). Then the maxima and minima of f(x) occur at critical points and endpoints.
There are many different kinds of optimisation problems (as there are related rates problems), but all have the same basic process: we are given both a function to optimise and a constraint. The function to optimise is going to be some F(x, y) in two variables, and the constraint is going to be some equation g(x, y) = c. After substituting, we'll have a one-variable problem. Sometimes we will have endpoints, and sometimes it will be implicit that as our one variable gets too big or small there is no extremum to be found. We then find the critical points and exhaustively check which is the largest and which is the smallest.
Problem 1.25. Suppose we are constructing a window comprised of a semicircle sitting atop a rectangle. Given that the perimeter of the window must be 4 meters, what is the maximum area?

Solution. Let us call x the width and y the height of the rectangle. We know that the area is given by the sum xy + (π/2) · (x/2)², the first term for the rectangle and the second for the semicircle. Our constraint is 2y + πx + x = 4, the bottom three sides of the rectangle and the arc of the semicircle. It looks like making x our sole variable will be the best path to success, so we will substitute y = 2 − ((1 + π)/2) · x. Our function is thus
A(x) = x · (2 − ((1 + π)/2) · x) + (π/2) · (x/2)² = 2x − ((4 + 3π)/8) · x².
Note that this is the equation of a downward-facing parabola, so if x is too big or too small we have A(x) < 0. This is obviously a nonsense answer to the question, so what we're looking for is the critical point giving the vertex of the parabola – its maximum.
We now need to solve A′(x) = 2 − ((4 + 3π)/4) · x = 0, so x = 8/(4 + 3π). We can now compute the maximum area:
A(8/(4 + 3π)) = 8/(4 + 3π) (not a typo).
Problem 1.26. What is the minimum distance between the curve y = 1/x and the origin?

Distance questions seem more difficult, as the optimisation equation we are dealing with is of the form d(x) = √(x² + f(x)²). But an important observation is that the minimum of d(x) is also achieved by d(x)², because the distance function is always positive. Moreover, critical points of d(x) and d(x)² are the same, as
d/dx (d(x)²) = 2 · d(x) · d′(x)
and d(x) > 0 as long as x ≠ 0 or f(x) ≠ 0. The bright side is that d(x)² = x² + f(x)² is much easier to differentiate than d(x).
1.4. Graphical Analysis. We already know that f′(x) > 0 means increasing and f′(x) < 0 means decreasing, but now let's recall what the second derivative means. If f″(x) > 0, the graph y = f(x) is concave up, so that the graph opens towards +∞, and f″(x) < 0 means concave down.

Definition 1.27. If f″(a) = 0 and f″ changes sign at a, we say that x = a is a point of inflection.
In the circumstances of optimisation, we can use the second derivative to test
whether a critical point is a maximum or minimum.
Theorem 1.28 (Second Derivative Test). Suppose that x = a is a critical point of f : R → R, and that f″(x) exists in a neighbourhood of a. If f″(a) < 0, then f(a) is a local maximum. If f″(a) > 0, then f(a) is a local minimum. If f″(a) = 0, the test is inconclusive.
There is also the first derivative test: suppose f′(x) exists and is continuous near x = a. If f′(x) < 0 to the left of a and f′(x) > 0 to the right of a, then f(a) is a local minimum. If f′(x) > 0 to the left of a and f′(x) < 0 to the right of a, then f(a) is a local maximum. This can often be more useful for optimisation problems when the second derivative is too difficult to compute. However, as in the above example of the window, we see that A″(x) = −(4 + 3π)/4 constantly, so any extreme values that exist must be maxima.
Problem 1.29. Suppose that y = f(x) is smooth (i.e. has continuous derivatives of all orders) and that f(1) = 2 is a local maximum. Order the values f(1), f′(1), f″(1).

Solution. We know that f(1) = 2, so that settles that. Because x = 1 is a local extremum, we must have f′(1) = 0. Finally, because this extreme point is a maximum, the graph cannot be concave up there, so f″(1) ≤ 0 (strict inequality can fail, e.g. for f(x) = 2 − (x − 1)⁴). Hence f″(1) ≤ f′(1) < f(1).
1.5. L'Hôpital's Rule. The last topic worth remembering is L'Hôpital's rule, which comes surprisingly in the second quarter of calculus at UCLA but we can recall now.

Theorem 1.30. Let f, g : R → R be two functions and a ∈ R ∪ {±∞}. If lim_{x→a} f(x) = lim_{x→a} g(x) = c where c ∈ {0, ±∞}, then
lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x).

Expressions of the form 0/0 and ±∞/∞ are called indeterminate forms. L'Hôpital's rule can also be used to solve problems which are not immediately in an indeterminate form. The easiest case is expressions of the form f(x) · g(x) which give rise to the indeterminate form 0 · ∞. By rearranging to
f(x)/(1/g(x))  or  g(x)/(1/f(x))
we will obtain an actual indeterminate form. Which one to choose depends on whether 1/f(x) or 1/g(x) is easier to differentiate.
Problem 1.31. Compute the following limit:
lim_{x→2} [ 1/(x − 2) − 2x/(x² − 4) ]

Solution. If we plug in x = 2, we don't obtain an indeterminate form, but we obtain something that looks like ∞ − ∞. This is our clue to combine the fractions into an indeterminate form. With a common denominator,
(x + 2)/(x² − 4) − 2x/(x² − 4) = (x + 2 − 2x)/(x² − 4) = −(x − 2)/((x − 2)(x + 2)) = −1/(x + 2).
In this case we don't even need to use L'Hôpital's rule to finish the problem since we had some nice cancellation: the limit is −1/4. In other cases we might not be so lucky.
Problem 1.32. Compute the following limit:
lim_{x→∞} x^(1/x)

Solution. These problems are also related to L'Hôpital's rule. Plugging in, we obtain ∞⁰. We notice that if we took the log of this expression, log x^(1/x) = (1/x) · log x yields the form 0 · ∞. We can now rearrange it to (log x)/x and finish the problem:
lim_{x→∞} log x^(1/x) = lim_{x→∞} (log x)/x = (by L'H) lim_{x→∞} (1/x)/1 = 0.
But of course this solves the wrong question. If we say L = lim_{x→∞} x^(1/x), then log L = 0 (as log is a continuous function so commutes with limits). Thus L = 1.
This same type of solution works if we have the form 1^∞, as log(1^∞) = ∞ · log 1 yields ∞ · 0. This concludes the differential side of single-variable calculus.
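Both the log-limit and the original limit can be verified symbolically (my addition, not in the notes):

```python
import sympy as sp

x = sp.symbols('x', positive=True)

val = sp.limit(x**(1/x), x, sp.oo)        # the original limit
log_val = sp.limit(sp.log(x) / x, x, sp.oo)  # the limit of the logarithm
print(val, log_val)  # 1 0
```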
2. Day 2: Single variable calculus

Topics covered: the integral, area between curves, volumes of revolution, the fundamental theorem of calculus, u-substitution, integration by parts, trigonometric integration, partial fractions, arc length and surface area, sequences and series, convergence tests, Taylor polynomials and power series, root and ratio tests.
2.1. Integrals. Let us start with the definition of the Riemann integral. To do so, we need to recall the limit of a sequence.

Definition 2.1. Let {xn}_{n∈N} be a sequence. We say that lim_{n→∞} xn = L if for all ε > 0, there exists N ∈ N such that whenever n > N, |xn − L| < ε.

The difference here is that there is no function floating around. We will revisit the intricacies of sequences and series later in this lecture.

Definition 2.2. Let f : R → R be a function. We define the left Riemann integral of f on the interval [a, b] to be the limit
lim_{n→∞} ((b − a)/n) · Σ_{i=0}^{n−1} f(a + ((b − a)/n) · i)
We define the right Riemann integral of f on the interval [a, b] to be the limit
lim_{n→∞} ((b − a)/n) · Σ_{i=1}^{n} f(a + ((b − a)/n) · i)
We say that the Riemann integral of f on the interval [a, b] exists if the above limits exist and agree. In that case, we write
∫_a^b f(x) dx
There is more we could say on this definition, but it's not necessary for the GRE. We also call the individual terms of these limits the left and right Riemann sums, denoted Ln f or Rn f. There are a few other approximations for integrals, including the midpoint and trapezoid approximations. The trapezoid approximation is the average of the left and right, and the midpoint rule uses a + ((b − a)/(2n)) · (2i + 1) in its argument.
We will almost never use the right and left Riemann sums, as these limits are not calculable in practice, but it's important to know a few things. If a function is increasing, then the left Riemann sum will always underestimate the actual value of the integral, and the right Riemann sum will always overestimate it.
The Riemann integral is not guaranteed to exist for an arbitrary function, but it must exist for our favourite functions.

Proposition 2.3. If f : [a, b] → R is bounded and continuous at all but finitely many points, then ∫_a^b f(x) dx exists.

We could assign this as an exercise, but it's a bit difficult and not necessary at all for the GRE. This isn't a necessary and sufficient condition, but it is certainly good enough for almost all purposes.
The integral is linear, just as the derivative was. In particular, this means that
∫_a^b c · f(x) dx = c · ∫_a^b f(x) dx
and
∫_a^b f(x) ± g(x) dx = ∫_a^b f(x) dx ± ∫_a^b g(x) dx
Moreover, recalling the definition via Riemann sums, we can always split an integral into intermediate chunks. For any c ∈ [a, b], we have
∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx
By definition we will say ∫_a^a f(x) dx = 0, so using the additivity above
∫_a^b f(x) dx = −∫_b^a f(x) dx
This way we can make sense of integrals where our interval [a, b] happens to be oriented the wrong way (i.e. a > b).
In practice, we will always compute the integral using the Fundamental Theorem of Calculus.

Theorem 2.4 (FTC I). Assume that f : [a, b] → R is a continuous function. If F : [a, b] → R is a function such that F′(x) = f(x), then
∫_a^b f(x) dx = F(b) − F(a).

Theorem 2.5 (FTC II). Assume that f : [a, b] → R is a continuous function, and define F(x) = ∫_a^x f(t) dt. Then F′(x) = f(x).

Problem 2.6. Prove that
d/dx ∫_{a(x)}^{b(x)} f(t) dt = f(b(x)) · b′(x) − f(a(x)) · a′(x).
One last thing to say at this point is the definition of an improper integral. Suppose that we are trying to integrate f(x) on an unbounded region, say [0, ∞), or over a region [a, b] on which g(x) has an infinite discontinuity (say at x = a). Then we can define the integral (should it exist) as a limit of integrals as defined above, e.g.
∫_0^∞ f(x) dx := lim_{R→∞} ∫_0^R f(x) dx    and    ∫_a^b g(x) dx := lim_{h→0⁺} ∫_{a+h}^b g(x) dx
These limits are not guaranteed to exist. The most common types of integrals we consider in this situation are functions f(x) such that lim_{x→∞} f(x) = 0, so that the integral at least has a chance of converging. We will readdress this issue in the section on series.
2.2. Applications of the integral. But before we get into techniques, why bother computing integrals at all? For one, the integral of f(x) on [a, b] gives the signed area of the region under the graph of f(x). This can even be used in reverse: integrals can be calculated using geometry.

Problem 2.7. Compute the integral
∫_{−2}^{2} √(4 − x²) dx

Solution. The equation y = √(4 − x²) corresponds to x² + y² = 4, i.e. a circle of radius 2. Our graph is the top half of this circle. Therefore the area under the curve is half the area of the circle, 2π. This problem can be solved otherwise, but it's much more annoying.
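The "much more annoying" route (a trigonometric substitution) can be outsourced (my addition, not in the notes):

```python
import sympy as sp

x = sp.symbols('x')

# Half the area of a circle of radius 2:
area = sp.integrate(sp.sqrt(4 - x**2), (x, -2, 2))
print(area)  # 2*pi
```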
In the same way, we can compute the area between curves with integrals. We need only to compute the area under the upper curve and subtract the area under the lower curve. The main question in situations like this is which curve is on top and which is on bottom.

Problem 2.8. Compute the area between the curves y = √x and y = x² in the first quadrant.

Solution. Thinking on the graphs, we can recall that √x is above x² in the region [0, 1] where these curves intersect. Therefore the integral we need to compute is
∫_0^1 (√x − x²) dx.
Problem 2.9. What is the volume of the region created by rotating y = log x around the x-axis between x = 1 and x = e²?

It's harder to compute the volume when we rotate y = f(x) around the y-axis. Rather than use the method of discs, we use the method of cylindrical shells. In this case, we are computing the area of a cylinder of radius r and height h, which is given by 2π · r · h. In our case, the radius is x and the height is f(x), hence
V = 2π ∫_a^b x · f(x) dx
Problem 2.11. Compute the arc length of y = cosh x over the interval [0, 2].

Sometimes we also want to calculate the surface areas of regions of revolution, not just the volumes. In this case, we again have a formula: the infinitesimal amount of surface area is given by the arc length along the surface times the circumference of the disc, which in the case of rotation around the x-axis is 2πf(x). Thus
S = 2π ∫ f(x) √(1 + f′(x)²) dx

Problem 2.12. Compute the surface area of a sphere of radius R, using the curve y = √(R² − x²).
2.3. Integration techniques. Now besides guessing at antiderivatives, what are our other integration techniques? The first is u-substitution, which is our answer to the chain rule.

Theorem 2.13 (u-substitution). Suppose that h(x) is a continuous function and we can write h(x) = f(g(x))g′(x). Then
∫_a^b h(x) dx = ∫_{g(a)}^{g(b)} f(u) du.

Problem 2.14. Evaluate the integral of f(x) = 2x · sin(x²) on [0, √(π/2)].

Solution. We see that g(x) = x² and f(u) = sin u is a good choice. Hence
∫_0^{√(π/2)} 2x · sin(x²) dx = ∫_0^{π/2} sin u du = [−cos u]_0^{π/2} = 1.
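Sympy performs the substitution internally and agrees (my addition, not in the notes):

```python
import sympy as sp

x = sp.symbols('x')

value = sp.integrate(2*x*sp.sin(x**2), (x, 0, sp.sqrt(sp.pi / 2)))
print(value)  # 1
```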
The second is integration by parts, which is our answer to the product rule. It stems from the following observation: if we consider the product rule (f · g)′ = f′ · g + f · g′ and integrate both sides, we obtain f · g = ∫ g df + ∫ f dg.

Theorem 2.15 (Integration by Parts). Suppose that f(x) has the form u(x) · v′(x). Then
∫ f(x) dx = ∫ u(x) · v′(x) dx = u(x) · v(x) − ∫ v(x) · u′(x) dx.
This is useful when your function is a product of a part which is easy to integrate and a part which is difficult to integrate. Another situation is when one factor will eventually differentiate to zero while the other does not get more complicated, as in x · e^x. A general mnemonic for which factor should be chosen as u(x) is LIATE: logarithms, inverse trigonometric functions, algebraic (e.g. polynomials), trigonometric functions, and finally exponential functions. Note that these latter two types in particular are almost never the ideal choice.
Problem 2.16. Compute the integral of f(x) = log x.

Solution. If we follow our mnemonic, we will choose u = log x, leaving dv = 1 dx. Therefore v = x and du = (1/x) dx. Hence:
∫ log x dx = x log x − ∫ x · (1/x) dx = x log x − ∫ 1 dx = x log x − x.
It doesn't seem like it would work, and then it does.
Problem 2.17. Compute the integral of f (x) = x2 · ex .
The next method is the method of trigonometric substitution. There are some integrals that do not lend themselves to either of the above methods, but require a special kind of u-substitution. We recognise this situation in the case that one of the Pythagorean identities holds, usually the following:

Problem 2.18. Compute the integral of f(x) = √(1 + x²).

Solution. The trigonometric identity we are looking for in this situation is that 1 + tan²θ = sec²θ. Therefore if we substitute x = tan θ, we will only need to integrate powers of sec θ, which is much more tractable than the current issue.
However, we have to consider the integration term. If x = tan θ, then dx = sec²θ dθ. Thus
∫ √(1 + x²) dx = ∫ sec³θ dθ.
We leave it in this form because we can now use integration by parts to solve this problem: let dv = sec²θ dθ and u = sec θ. Then v = tan θ and du = sec θ tan θ dθ, so
∫ sec³θ dθ = sec θ · tan θ − ∫ sec θ tan²θ dθ.
Putting this all together completes the problem, sort of. Though we know that
∫ sec³θ dθ = (sec θ tan θ)/2 + (1/2) log(tan θ + sec θ),
we were asked a question about f(x), not f(θ). We know that x = tan θ, so we can make that substitution. In order to determine what sec θ is in terms of x, we draw the triangle as in yesterday's lecture. The Pythagorean theorem will tell us that sec θ = √(1 + x²). If we then put it all together,
∫ √(1 + x²) dx = (x √(1 + x²))/2 + (1/2) log(x + √(1 + x²)).
The final integration technique is the method of partial fractions. We are only capable with our techniques of integrating fairly simple rational functions, so the method of partial fractions allows us to break up complicated expressions into integrable ones. The idea is that any quotient p(x)/q(x) of polynomials decomposes as a sum of simpler fractions whose denominators are the irreducible factors of q(x). We can integrate expressions of the form a/(x − r) or (ax + b)/(x² + r).
Problem 2.19. Compute the integral of 4/(x⁴ − 1).

Solution. We factor the denominator as (x + 1)(x − 1)(x² + 1), and so set up
4/(x⁴ − 1) = A/(x + 1) + B/(x − 1) + (Cx + D)/(x² + 1)
Multiplying through by the denominator,
4 = A(x − 1)(x² + 1) + B(x + 1)(x² + 1) + (Cx + D)(x + 1)(x − 1)
We can plug in particular values for x to compute A and B. If x = −1,
4 = A(−2)(2)  ⟹  A = −1
If x = 1,
4 = B(2)(2)  ⟹  B = 1
For the last two variables, we need to do some brute multiplication. Omitting the details,
4 = Cx³ + (2 + D)x² − Cx + (2 − D)
which implies that C = 0 and D = −2. Hence
4/(x⁴ − 1) = −1/(x + 1) + 1/(x − 1) − 2/(x² + 1)
which we can now integrate:
∫ 4/(x⁴ − 1) dx = −∫ dx/(x + 1) + ∫ dx/(x − 1) − ∫ 2 dx/(x² + 1)
= −log |x + 1| + log |x − 1| − 2 tan⁻¹(x)
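Both the decomposition and the antiderivative can be double-checked symbolically (my addition, not in the notes):

```python
import sympy as sp

x = sp.symbols('x')

# Partial fraction decomposition of 4/(x^4 - 1):
decomposition = sp.apart(4 / (x**4 - 1), x)
expected = -1/(x + 1) + 1/(x - 1) - 2/(x**2 + 1)
print(sp.simplify(decomposition - expected))  # 0

# The antiderivative differentiates back to the integrand:
antiderivative = sp.integrate(4 / (x**4 - 1), x)
print(sp.simplify(sp.diff(antiderivative, x) - 4/(x**4 - 1)))  # 0
```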
There's a bit of a complication if the irreducible factor of q(x) is not exactly of the form x² + r, but completing the square and some further manipulation can always get us to that form.

Problem 2.20. Prove that
∫ dx/(x² + r) = (1/√r) tan⁻¹(x/√r)
2.4. Sequences and series. We gave above the definition of a sequence and the definition of a convergent sequence. We can now recall what a series is.

Definition 2.21. A series is a sequence {Sn}_{n∈N} defined by a sequence {ai}_{i∈N} such that Sn = Σ_{i=0}^{n} ai. A series converges if the sequence {Sn} converges as above. We write Σ_{i=0}^{∞} ai for lim_{n→∞} Sn.

Sometimes we will call {ai} the series and leave the fact that we are taking sums implicit.
The simplest type of sequence/series that we encounter is the geometric series, which is defined by ai = a0 · rⁱ for some r ∈ R. In this circumstance, there is an easy criterion for when the series {ai} converges.

Problem 2.22. Prove that the series {ai = a0 · rⁱ} converges if and only if |r| < 1. In this situation, the infinite sum has the formula a0/(1 − r).
Theorem 2.24 (p-test). Suppose we have the series an = 1/nᵖ for some p ∈ R. The series {an} converges if and only if p > 1.

This situation is not so common, but is a useful tool (as we will soon see).
Theorem 2.25 (Limit Comparison Test). Suppose that {an}, {bn} are nonnegative series. Consider the quantity
lim_{n→∞} an/bn = L
• If L = 0, the series {an} converges if {bn} converges
• If L = ∞, the series {bn} converges if {an} converges
• If L ∈ (0, ∞), the series {an} converges if and only if {bn} converges

It is sometimes useful to bear in mind the contrapositives of these statements. Therefore while many series do not take the form of the p-test, they can be limit-compared to a 'p-series' and thus we can determine their convergence. These two also combine to answer the standing question about improper integrals:
Theorem 2.26 (Integral Test). Suppose that {an} is a positive series such that there exists a continuous function f(x) satisfying f(n) = an. Then the series {an} converges if and only if the improper integral ∫_0^∞ f(x) dx exists (converges).

Therefore the same types of tests that check for the convergence of infinite series can be used to check the convergence of improper integrals when the limits in question are too difficult to compute. Here, however, is one case that can be computed directly.

Problem 2.27. Show that ∫_0^1 (1/xᵖ) dx converges if and only if p < 1.

Problem 2.28. When does the integral ∫_1^∞ (1/xᵖ) dx converge?
Thus far we have spoken of positive series, but there is a specific test for alternating series that is occasionally useful:

Theorem 2.29 (Alternating Series Test). Suppose that {an} is a series satisfying:
• The series is alternating, i.e. ai and ai+1 have different signs for every i ∈ N
• The terms tend to zero (so the divergence test does not apply), i.e. lim_{n→∞} an = 0
• The series is decreasing in magnitude, i.e. |ai| > |ai+1| for every i ∈ N.
Then the series converges.

For example, we saw above that the series {1/n} does not converge (the harmonic series), but the alternating series {(−1)ⁿ/n} does. As a remark, the factor (−1)ⁿ is the easiest way to see that a series is alternating, but cos(π · n) does the job as well.
2.6. Taylor polynomials and series. Before getting into the last two tests, we need to recall the definition of a Taylor polynomial and a Taylor series. To begin, we can define a Taylor polynomial and then explain its utility.

Definition 2.30. Let f : R → R be a function of class Cᵏ. The kth Taylor polynomial of f centred at x = a is defined by
Tk f(x) = f(a) + f′(a)(x − a) + f″(a)(x − a)²/2 + ··· + f⁽ᵏ⁾(a)(x − a)ᵏ/k!

This is the best polynomial approximation to f(x) with the property that the first k derivatives at x = a agree. For example, the first Taylor polynomial is just the tangent line at x = a, i.e. the linear approximation. If a = 0, then usually we use the name Maclaurin instead of Taylor.
The error of a Taylor polynomial can be approximated using what would be the next term. In particular,
|Tk f(b) − f(b)| ≤ M · |b − a|^(k+1)/(k + 1)!
where M is the maximum of |f⁽ᵏ⁺¹⁾(x)| on the interval [a, b] or [b, a] (whichever makes sense). In nice circumstances, |f⁽ᵏ⁺¹⁾| takes its maximum at a or b, and so a more explicit bound can be written down.
In ideal circumstances, we have T(x) = f(x) for all x ∈ R, but this is not guaranteed. This infinite sum may not even converge for some values of x, or for most values of x for that matter. This is where our last two tests come in handy.
Theorem 2.33 (Ratio Test). Consider a series {an} which may or may not be positive, and consider the limit
ρ = lim_{n→∞} |an+1/an|
Then {an} converges absolutely if ρ < 1, diverges if ρ > 1, and the test is inconclusive if ρ = 1.

Recall that a series is said to converge absolutely if {|an|} converges.
In the case that the series depends on a parameter x, this gives us a function ρ(x) that we demand be < 1 to have a chance at convergence. The set of x for which ρ(x) < 1 forms an interval, and its half-length is called the radius of convergence. If we also determine whether the cases ρ(x) = 1 converge (using other techniques), we obtain the interval of convergence.

Problem 2.34. Compute the interval of convergence of
f(x) = Σ_{n=1}^{∞} 2xⁿ/n²

Series whose terms are only polynomial in n (rather than factorial, exponential, or geometric) will yield an inconclusive root/ratio test. One should attempt to apply a p-test to these (perhaps via a limit comparison). Series that include factorial or exponential terms are the target for the root and ratio tests.
We end by recalling some useful Taylor expansions and their radius of convergence.
• e^x = Σ_{n=0}^{∞} xⁿ/n!,  x ∈ R
• 1/(1 − x) = Σ_{n=0}^{∞} xⁿ,  |x| < 1
• sin x = Σ_{n=0}^{∞} (−1)ⁿ x^(2n+1)/(2n + 1)!,  x ∈ R
• cos x = Σ_{n=0}^{∞} (−1)ⁿ x^(2n)/(2n)!,  x ∈ R
We can use these building blocks to construct other Taylor series using substitution, differentiation, and integration.

Problem 2.36. Compute the Taylor series for tan⁻¹(x) centred at a = 0. What is its radius of convergence?

Solution. This looks like a dubious prospect, but we notice that
d/dx tan⁻¹(x) = 1/(1 + x²)
and the righthand side looks a lot like a Taylor series we already know. In particular,
1/(1 − u) = Σ_{n=0}^{∞} uⁿ  ⟹  1/(1 − (−x²)) = Σ_{n=0}^{∞} (−x²)ⁿ = Σ_{n=0}^{∞} (−1)ⁿ x^(2n)
We now need to integrate this Taylor series to obtain the one for tan⁻¹(x). We do so term-by-term,
tan⁻¹(x) = Σ_{n=0}^{∞} ∫ (−1)ⁿ x^(2n) dx = Σ_{n=0}^{∞} (−1)ⁿ x^(2n+1)/(2n + 1)
but need to recall that we might need to add a constant term. The constant term of the Taylor series is tan⁻¹(0) = 0, so we don't need to add anything in this case.
What happens to the radius of convergence? Integrating or differentiating doesn't do anything, but substituting does. We know that the series converges for |u| < 1, but now u = −x², so the series converges for |x| < 1: the radius of convergence is 1.
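The first few terms of the resulting series can be confirmed symbolically (my addition, not in the notes):

```python
import sympy as sp

x = sp.symbols('x')

# Maclaurin expansion of atan(x) up to (but not including) x^8:
expansion = sp.series(sp.atan(x), x, 0, 8)
print(expansion)  # x - x**3/3 + x**5/5 - x**7/7 + O(x**8)
```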
3.1. Vectors in R³. Since multivariable calculus takes place with two variables (at least in general and at most for our purposes), graphs will occur in 2 + 1 = 3 dimensions. Thus we should learn a little bit about R³. In particular, we need to familiarise ourselves with vector operations (that will reappear in generality for our linear algebra section).
A vector in R³ is a triple ⟨x, y, z⟩ which we think of as an arrow from (0, 0, 0) to (x, y, z). Between any two points in R³ we can obtain a vector, namely the arrow
(a1, a2, a3) → (b1, b2, b3) corresponds to ⟨b1 − a1, b2 − a2, b3 − a3⟩
We'll need to recall the two main operations on vectors in R³. The first is common to every vector space, which is the dot product (or scalar product):
⟨a1, a2, a3⟩ · ⟨b1, b2, b3⟩ = a1b1 + a2b2 + a3b3
The dot product is linear in each argument, that is,
(c v) · w = c(v · w) = v · (c w)
and
(u + v) · w = u · w + v · w
It's also commutative, which is pretty obvious from its definition. We also define the norm of a vector using this:
‖v‖² = v · v
We say that v and w are orthogonal if v · w = 0.
The second is the cross product, which only exists on R³ (at least in this form):

                             | î   ĵ   k̂  |
⟨a₁, a₂, a₃⟩ × ⟨b₁, b₂, b₃⟩ = det | a₁  a₂  a₃ |
                             | b₁  b₂  b₃ |

where î = ⟨1, 0, 0⟩, ĵ = ⟨0, 1, 0⟩, and k̂ = ⟨0, 0, 1⟩ are the three unit basis vectors in
R³. The cross product is linear in each variable as well, but it is anticommutative:

v × w = −w × v
The cross product is actually defined by a universal property: it is a bilinear
operation such that v × w is orthogonal to both v and w, that the ordered set
{v, w, v × w} obeys the right-hand rule, and that

‖v × w‖ = ‖v‖ · ‖w‖ · sin θ

where θ is the planar angle between the two vectors. In particular, v × w gives a
normal vector to the plane spanned by v and w. But before we investigate that,
let's look at planes in general:
MATH GRE BOOTCAMP: LECTURE NOTES 23
There are two main ways to define a plane (in R³, at least). A plane is the solution
set of a linear equation in R³, so the equation looks like

ax + by + cz = d

for fixed a, b, c, d ∈ R. The better way to think about it is to consider a plane as the
set of vectors that are orthogonal to the normal vector to the plane:

n · v = 0
But this defines a plane that passes through the origin. To move the plane to a point
elsewhere, we move it by a fixed amount:

n · (v − v₀) = 0

But now n · v₀ = d for some d ∈ R, so we obtain

n · v = d

and letting n = ⟨a, b, c⟩ and v = ⟨x, y, z⟩ recovers the above formula.
The cross product is particularly convenient when solving the following problems:
Problem 3.1. Find the equation of the plane in R3 passing through P = (0, 0, 1),
Q = (1, 0, 0) and R = (1, 1, 1).
Solution. Any three non-collinear points determine a unique plane in R³, and we can
take for granted that these points are non-collinear (or check quickly). Recall above
we said that if we know that a plane is spanned by two vectors v and w, then v × w
is normal to the plane. If we know three points, we can come up with two vectors:

v = PQ = ⟨1, 0, −1⟩,   w = PR = ⟨1, 1, 0⟩
The cross product is

n = v × w = ⟨1, −1, 1⟩

Thus the equation of the plane is x − y + z = d, where d is some constant. We can
compute it by plugging in any point that is already on the plane, say (0, 0, 1). Hence
d = 1, and the plane is x − y + z = 1.
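Problem 3.1 is easy to verify numerically. Here is a hedged pure-Python sketch (the helpers cross and dot are mine, mirroring the definitions above):

```python
def cross(a, b):
    # Cross product in R^3, expanded from the determinant formula.
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

P, Q, R = (0, 0, 1), (1, 0, 0), (1, 1, 1)
v = tuple(q - p for p, q in zip(P, Q))   # PQ = <1, 0, -1>
w = tuple(r - p for p, r in zip(P, R))   # PR = <1, 1, 0>
n = cross(v, w)                          # normal vector to the plane
d = dot(n, P)                            # plane: n . <x, y, z> = d
```

All three points satisfy n · ⟨x, y, z⟩ = d, confirming the plane x − y + z = 1.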
What about arc length? We went over how to do this in R² earlier: the arc length
of the curve y = f(x) is

∫_a^b √(1 + f′(x)²) dx

which we obtained by trying to integrate ds = √((dx)² + (dy)²). Now, we aren't going
to want to integrate with respect to x, because these curves are functions of time t.
Moreover, we have three components, so that

ds = √((dx)² + (dy)² + (dz)²)
This is the length of the diagonal of the infinitesimal cube with sides dx, dy, dz. Thus,
by 'factoring out' a dt from all these terms,

∫ ds = ∫ √((dx)² + (dy)² + (dz)²) = ∫ √((dx/dt)² + (dy/dt)² + (dz/dt)²) dt = ∫ ‖r′(t)‖ dt
No surprise: we are integrating the length of the velocity, i.e. the (directionless) speed
of the curve at every point. This is, in fact, the same formula as we were dealing
with before. The planar curve y = f(x) can be parametrised as ⟨t, f(t)⟩, which has
derivative ⟨1, f′(t)⟩ and thus speed √(1 + f′(t)²). Thus we can forget the old formula
and stick with the new.
Problem 3.3. Compute the arc length of the helix r(t) = ⟨sin t, cos t, t⟩ from t = 0
to t = 2π.
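A quick numerical check of Problem 3.3 (a sketch, assuming the midpoint rule suffices here): since r′(t) = ⟨cos t, −sin t, 1⟩ has constant length √2, the answer should be 2π√2.

```python
import math

def speed(t):
    # r(t) = <sin t, cos t, t>  =>  r'(t) = <cos t, -sin t, 1>
    dx, dy, dz = math.cos(t), -math.sin(t), 1.0
    return math.sqrt(dx*dx + dy*dy + dz*dz)

N = 10_000
a, b = 0.0, 2*math.pi
h = (b - a) / N
# Midpoint rule for the integral of the speed over [0, 2*pi].
length = sum(speed(a + (i + 0.5)*h) for i in range(N)) * h
```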
One may recall studying curvature or other horrible topics, but these don’t seem
to appear on the GRE so we will not revisit them.
3.3. Multivariable functions. We will give everything in terms of two variables
for now, but the same could be done for 3 or n variables without changing definitions
very much.
Suppose now that f : R2 ! R is a function. The definition of a limit is still the
same.
Definition 3.4. We say that lim_{(x,y)→(a,b)} f(x, y) = L if for every ε > 0, there exists
δ > 0 such that whenever ‖(x, y) − (a, b)‖ < δ, we have |f(x, y) − L| < ε.
Let's now talk about how to try to compute limits, and how approaching along
different paths can show that a limit doesn't exist.
Problem 3.5. Prove that if lim_{(x,y)→(a,b)} f(x, y) = L, then for every continuous
γ : [0, 1] → R² such that γ(0) = (a, b) and γ(t) ≠ (a, b) for t > 0, we have
lim_{t→0} f(γ(t)) = L.
Problem 3.6. Prove that lim_{(x,y)→(0,0)} xy/(x² + y²) does not exist. Hint: find two paths
that give different limits.

Problem 3.7. Prove that lim_{(x,y)→(0,0)} xy²/(x² + y²) = 0.
Solution. For this, we will recall polar coordinates, which we will use more tomor-
row. If we convert the point (x, y) into polar coordinates, then we are instead taking
the limit r → 0, and under the substitution x = r cos θ, y = r sin θ, we need to solve

lim_{r→0} (r³ cos θ sin² θ)/r² = lim_{r→0} r (cos θ sin² θ) = 0

since |cos θ sin² θ| ≤ 1, so the factor of r forces the limit to zero regardless of θ.
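Problem 3.6 can be explored numerically before proving anything; a minimal sketch of the two-path idea (names mine):

```python
f = lambda x, y: x*y / (x**2 + y**2)

# Approach (0, 0) along the diagonal y = x: f(t, t) = t^2 / (2 t^2) = 1/2.
along_diagonal = [f(t, t) for t in (0.1, 0.01, 0.001)]

# Approach (0, 0) along the x-axis: f(t, 0) = 0.
along_axis = [f(t, 0.0) for t in (0.1, 0.01, 0.001)]
```

The two paths give the constant values 1/2 and 0, so no single limit can exist.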
derivatives ∂_x f(a, b) and ∂_y f(a, b) exist and f(x, y) is locally linear at (a, b).
The specific definition does not matter too much, but we have a nice property.
Proposition 3.10. If ∂_x f(a, b) and ∂_y f(a, b) exist and are continuous in a neigh-
bourhood of (a, b), then f(x, y) is differentiable at (a, b).

In this case, we can define the tangent plane to f(x, y) at (a, b), and it is actually
the linear approximation: the tangent plane is spanned by the tangent vector
in the x-direction and the one in the y-direction. We didn't go over above how to
parametrise a plane using two variables, but we can do so now:

P(s, t) = s · ⟨1, 0, ∂_x f(a, b)⟩ + t · ⟨0, 1, ∂_y f(a, b)⟩ + ⟨a, b, f(a, b)⟩
But this isn't particularly helpful for us, since we would like a form in terms of
(x, y, z). Instead, we will define the tangent plane using its normal vector. Because
we know two vectors in the plane, which moreover are linearly independent, we take
their cross product:

                                        | î   ĵ   k̂          |
⟨1, 0, ∂_x f(a, b)⟩ × ⟨0, 1, ∂_y f(a, b)⟩ = det | 1   0   ∂_x f(a, b) |
                                        | 0   1   ∂_y f(a, b) |

                                      = ⟨−∂_x f(a, b), −∂_y f(a, b), 1⟩
Hence our equation is ⟨−∂_x f(a, b), −∂_y f(a, b), 1⟩ · ⟨x − a, y − b, z − f(a, b)⟩ = 0. If
we work this out and rearrange it, it becomes

z = ∂_x f(a, b)(x − a) + ∂_y f(a, b)(y − b) + f(a, b)

which looks a lot like the equation for the tangent line, except now there are two
slopes and two variables that need to be taken into account.
Now, what about taking multiple partial derivatives? In principle one can take
both ∂_x∂_y f(x, y) and ∂_y∂_x f(x, y). Are these the same? Are we detecting the same
change in both x and y in both cases? In general, no, we are not, but in every case
that we'll run into on the GRE, yes. The reason is Clairaut's theorem:
Theorem 3.11 (Clairaut’s Theorem). Suppose that f : R2 ! R is a function, and
suppose that the second-order partial derivatives of f exist and are continuous in a
neighbourhood of (a, b). Then @x @y f (a, b) = @y @x f (a, b).
This is convenient and not necessarily expected, but it does make a particular
technique in optimisation a whole lot more convenient later on. This also works
in more than 2 variables when taking partial derivatives with respect to any two
different independent variables.
3.5. Gradient and directional derivatives. Having done partial derivatives, it's
fair to ask if there's anything resembling a 'total derivative' of the function, something
that takes all the variables into account. There is.

Definition 3.12. For f : R² → R, define the gradient of f at (a, b) to be

∇f(a, b) = ⟨∂_x f(a, b), ∂_y f(a, b)⟩.
What good is the gradient for us? It is a vector quantity instead of a scalar
quantity, which is interesting, and it still satisfies some nice properties. Because it
is made of partial derivatives, it is still linear, and it satisfies a product rule:

∇(f(x, y) · g(x, y)) = f(x, y) · ∇g(x, y) + g(x, y) · ∇f(x, y)

where we think of f and g as scalar multiples (though depending on (x, y)). There is
also a chain rule for the types of functions that we can actually compose at this point:
suppose that φ : R → R, so that φ ∘ f : R² → R is still a multivariable function. Then

∇(φ ∘ f) = φ′(f(x, y)) · ∇f(x, y)

where again we think of the function φ′ : R → R as acting by scalar multiplication.
Problem 3.13. Compute the gradient of

g(x, y, z) = (x² + y² + z²)⁸
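Problem 3.13 is a chain-rule exercise; a hedged symbolic check with sympy (assuming it is available):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
g = (x**2 + y**2 + z**2)**8
grad = [sp.diff(g, v) for v in (x, y, z)]

# Chain rule: each component should be 16 (x^2+y^2+z^2)^7 times that variable.
expected = [16*(x**2 + y**2 + z**2)**7 * v for v in (x, y, z)]
```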
We can now talk about directional derivatives in other directions. The partial
derivatives are the derivatives in the directions ⟨1, 0⟩ and ⟨0, 1⟩, but we could have used
any other unit vector.

Definition 3.14. The directional derivative of f : R² → R at (a, b) in the direction
u = ⟨h, k⟩ is the limit

∂_u f(a, b) = lim_{t→0} (f(a + th, b + tk) − f(a, b)) / t
28 IAN COLEY
Note that these may not exist if the function is not differentiable at (a, b). In
particular, the existence of the partial derivatives alone is not sufficient to conclude
these exist. But in the case that f(x, y) is differentiable, we have the following:

∂_u f(a, b) = ∇f(a, b) · u

which means that the above limit only rarely needs to be computed.
Problem 3.15. Prove that the directional derivatives of

f(x, y) = { x y⁴ / (x² + y⁸)   if (x, y) ≠ (0, 0)
          { 0                  if (x, y) = (0, 0)

at (0, 0) exist and depend linearly on the gradient, but that f(x, y) is not differen-
tiable at (0, 0).
Problem 3.16. Prove that there is no function f(x, y) such that ∇f(x, y) = ⟨y², x⟩.
Hint: Clairaut's theorem.
We have another version of the chain rule, where we compose a curve and a
multivariable function to obtain a function R → R.

Theorem 3.17 (Chain Rule II). Let r : R → R² be a differentiable curve and let
f : R² → R be a differentiable function. Then

d/dt f(r(t)) = ∇f(r(t)) · r′(t)

where this is now the dot product of the two vector-valued functions.
Problem 3.18. Prove this theorem from the definition of a single variable derivative.
The gradient points in the direction of greatest change on the graph z = f(x, y). How do
we see this? The directional derivatives of f(x, y) tell us the rate of change in each
direction. The direction u which maximises the quantity ∇f(a, b) · u is the
unit vector in the direction of ∇f(a, b) itself. Similarly, −∇f(a, b) is the direction of
greatest decrease.
For another thing, suppose that we look at the level curves of the graph z = f(x, y).
These are the specific subsets f(x, y) = c for a fixed constant c ∈ R. Let r_c(t)
parametrise the curve, and consider a point (a, b) = r_c(t₀) on this curve. Then the
tangent vector to the curve is r′_c(t₀), and we can examine ∇f(a, b) · r′_c(t₀). By the
chain rule,

∇f(a, b) · r′_c(t₀) = d/dt f(r_c(t)) |_{t=t₀}

But on the curve r_c(t), f is the constant c. Thus the above derivative is zero. This
means that the gradient is orthogonal to the tangent vector to the level curve, i.e. it
is normal to the level curve. This is also true in higher dimensions, though it's a bit
more complicated to prove.
Problem 3.19. What is the greatest rate of change of f(x, y) = x⁴y² at the point
(a, b) = (2, 1)?
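For Problem 3.19 (assuming the function reads f(x, y) = x⁴y²; the printed copy is ambiguous), the greatest rate of change is ‖∇f(2, 1)‖, which sympy can confirm:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**4 * y**2                      # assumed reading of the problem's function
grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])   # <4x^3 y^2, 2x^4 y>
g_at = grad.subs({x: 2, y: 1})       # evaluates to <32, 32>
max_rate = sp.sqrt(g_at.dot(g_at))   # ||grad f(2, 1)|| = 32*sqrt(2)
```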
The last thing to say is on the subject of surfaces in R³ which are defined using
3-variable functions. Consider a function F : R³ → R and consider the set of points
(x, y, z) such that F(x, y, z) = c. Assuming that F is a nice function (say, F is C²
and ∇F is nowhere zero), this defines a surface in R³, but usually
one that isn't the graph of a function. If we consider the easiest example,

x² + y² + z² = 1

then we get a sphere, which we know isn't the graph of a function, but is the graph
of two functions glued together.
Now, how do we find the tangent plane to such a surface? We clearly can't take the
same approach, because we don't have a function f(x, y) = z to deal with. Instead,
we need to figure out how to use F(x, y, z). Suppose that r(t) is a curve on the
surface F(x, y, z) = c. Then by the chain rule,

d/dt F(r(t)) |_{t=t₀} = ∇F(r(t₀)) · r′(t₀).

But F(r(t)) = c is a constant function, so its derivative is zero. Thus the
above dot product is also zero, so that ∇F(r(t₀)) is orthogonal to the curve r(t) at
any point. Thus ∇F is orthogonal to the surface F(x, y, z) = c, and thus we can use
it as the normal vector to the tangent plane. We see now the reason that ∇F should
not be identically zero – it would mean that the 'normal vector' to a tangent plane
is the zero vector, implying something is wrong with the geometry of the situation.
Problem 3.20. What is the tangent plane to the surface x² + y² + z² = 3 at the
point (1, 1, 1)?
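A sketch of Problem 3.20 using the ∇F recipe just described (sympy assumed available); the plane works out to x + y + z = 3:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
F = x**2 + y**2 + z**2 - 3
p = {x: 1, y: 1, z: 1}

# Gradient of F at the point serves as the normal vector: (2, 2, 2).
n = [sp.diff(F, v).subs(p) for v in (x, y, z)]

# Plane: n . <x - 1, y - 1, z - 1> = 0, i.e. 2(x-1) + 2(y-1) + 2(z-1) = 0.
plane = sum(c*(v - p[v]) for c, v in zip(n, (x, y, z)))
```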
3.6. Local extrema. First, let's talk about finding local minima and maxima. Just
as in single-variable calculus, these occur at critical points.

Definition 3.21. Let f : R² → R be a differentiable function. Then we say that
(a, b) ∈ R² is a critical point of f(x, y) if ∇f(a, b) = 0. That is, ∂_x f(a, b) = 0 and
∂_y f(a, b) = 0.
Of course, just like in single-variable calculus, while every extremum occurs at a
critical point, not every critical point gives rise to an extremum. There are two meth-
ods in single-variable calculus to give us more information: the first derivative test
and the second derivative test. Neither has an immediate analogue in multivariable
calculus, but the second derivative test will turn out to be the solution.
But, as discussed above, there are four different 'second derivatives' of a given
function. What we do is assemble them into a matrix called the Hessian of f, as
follows:

Hf = | ∂_x∂_x f   ∂_x∂_y f |
     | ∂_y∂_x f   ∂_y∂_y f |
Then the second derivative test says the following:

Theorem 3.22 (Second Derivative Test). Let f : R² → R be a function of class
C². Suppose that (a, b) is a critical point of f(x, y). Let d = det Hf(a, b) be the
determinant of the Hessian matrix and T = tr Hf(a, b) be its trace. The following
conclusions hold:
• If d < 0, then the point (a, b) is a saddle point.
• If d > 0 and T < 0, then the point (a, b) is a local maximum.
• If d > 0 and T > 0, then the point (a, b) is a local minimum.
• If d = 0, the test is inconclusive.
If it’s hard to remember which condition corresponds to maximum and which to
minimum, then just remember single-variable: if f 00 (x) < 0, we have a local maximum
and if f 00 (x) > 0, we have a local minimum. The trace follows the same convention.
It’s also fun fact that, in the case that d > 0, @x2 f and @y2 f must have the same sign,
In
(In what follows, λ₁ and λ₂ denote the eigenvalues of the Hessian.) This matrix
corresponds to second derivatives in the two essential directions which
describe the behaviour of f(x, y) near (a, b). Thus we would want both direc-
tions to agree on what we're seeing. If λ₁, λ₂ > 0, then both directions think we are
concave up and thus we should be at a minimum. If λ₁, λ₂ < 0, then both directions
think we are concave down and thus we should be at a maximum. However, if λ₁
and λ₂ have different signs, then this means that we are concave up in one direction
and concave down in another – a saddle point. We have a similar problem if λ₁ or
λ₂ is equal to zero.
How does this reasoning apply to the second derivative test? The determinant of
the Hessian is equal to λ₁λ₂. If λ₁ and λ₂ have the same sign, then d > 0. Otherwise,
d ≤ 0. The trace of the Hessian is equal to λ₁ + λ₂, which lets us figure out if both
are positive or both are negative (in the case that d > 0).
The second derivative test for R² takes advantage of a particular quirk: the product
of two numbers is positive if and only if the numbers have the same sign. If we were
to discuss local extrema in R³ or higher, we would end up needing to analyse three
eigenvalues λ₁, λ₂, λ₃. It's impossible to tell if three numbers are all positive just
from their product and sum: the triple (3, −1, −1) has positive determinant and
trace, but corresponds to a saddle point.
Let's have one example before moving on:

Problem 3.24. Find and classify all critical points of f(x, y) = (x² + y²)e⁻ˣ.
Solution. First we need the gradient:

∂_x f(x, y) = (x² + y²)(−e⁻ˣ) + (2x)e⁻ˣ = (2x − x² − y²)e⁻ˣ
∂_y f(x, y) = 2y e⁻ˣ

Starting with the y-derivative, we must have y = 0. Plugging that into the x-
derivative,

∂_x f(x, 0) = (2x − x²)e⁻ˣ = (2 − x) · x · e⁻ˣ

giving us two solutions: (2, 0) and (0, 0), as e⁻ˣ will never equal zero. We now need
to compute the Hessian. It's useful here to take advantage of the fact that Clairaut's theorem
applies, so that ∂_y∂_x f = ∂_x∂_y f:

∂_x² f(x, y) = (2 − 4x + x² + y²)e⁻ˣ
∂_x∂_y f(x, y) = −2y e⁻ˣ
∂_y² f(x, y) = 2e⁻ˣ
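The classification step that follows can be sketched with sympy's hessian; the helper classify is mine and implements the trace/determinant test above:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = (x**2 + y**2) * sp.exp(-x)
H = sp.hessian(f, (x, y))

def classify(a, b):
    # Second derivative test via determinant and trace of the Hessian.
    Hp = H.subs({x: a, y: b})
    d, T = Hp.det(), Hp.trace()
    if d < 0:
        return 'saddle'
    if d > 0:
        return 'min' if T > 0 else 'max'
    return 'inconclusive'
```

At (0, 0) the Hessian is diag(2, 2) (a local minimum); at (2, 0) the determinant is negative (a saddle).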
Optimisation in multivariable calculus works about the same as in single-variable
calculus: first, figure out the region on which you are trying to optimise. Check for
critical points of your function on the inside of your region, then check the boundary.
It's not necessary to classify the critical points because, at the end of the day, we're
just going to write down a list of values and pick the biggest and smallest ones.
In single-variable calculus, the boundary of a compact region (i.e. closed and bounded) is
always a discrete set of points which you can check individually. In multivariable
calculus, regions are two-dimensional, so boundaries are one-dimensional. This means
that the 'check the boundary' step in multivariable calculus is just an ordinary op-
timisation problem in single-variable calculus, which (having gotten this far in the
course) you already know how to do.

Enough talk – let's have an example.
Problem 3.25. Find the maximum of the function f(x, y) = x + y − x² − y² − xy
on [0, 2] × [0, 2] ⊂ R².

Solution. We are working on a compact region so we are guaranteed a maximum,
so that's a relief. First, find the gradient of the function:

∇f(x, y) = ⟨1 − 2x − y, 1 − 2y − x⟩
giving us a critical point (on this line) of t = 1/2, so the point (1/2, 0) all in all.
Noticing that f(x, y) = f(y, x), we will obtain a critical point (0, 1/2) on the left
edge of the square.

Moving to the top edge, we have r₂(t) = ⟨t, 2⟩ for t ∈ [0, 2]. Solving as above,

f₂(t) = f(r₂(t)) = t + 2 − t² − 4 − 2t = −2 − t − t²  ⟹  f₂′(t) = −1 − 2t

giving us a critical point at t = −1/2. This is outside our region, so we ignore it. By
symmetry, we won't get anything on the right edge either.
The last step is to check the boundaries of our boundary edges, which are
the corners of the square: (0, 0), (0, 2), (2, 0), and (2, 2). Having assembled all our
points, we now get a list of values:

f(0, 0) = 0
f(0, 2) = f(2, 0) = −2
f(2, 2) = −8
f(0, 1/2) = f(1/2, 0) = 1/4
f(1/3, 1/3) = 1/3

which proves that, indeed, the maximum was at the critical point (1/3, 1/3) all along.
However, we now know that the minimum of the function occurs at (2, 2).
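The bookkeeping in Problem 3.25 is easy to mirror in pure Python; the candidate list below is taken from the solution above:

```python
f = lambda x, y: x + y - x**2 - y**2 - x*y

# Candidates: interior critical point, edge critical points, and the corners.
candidates = [(1/3, 1/3), (1/2, 0), (0, 1/2),
              (0, 0), (0, 2), (2, 0), (2, 2)]
values = {p: f(*p) for p in candidates}
best = max(values, key=values.get)    # maximum over the candidate list
worst = min(values, key=values.get)   # minimum over the candidate list
```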
Problem 3.26. Find the global extrema of the function f(x, y) = x² − xy on the
ellipse x² + 4y² ≤ 4.
Solution. The first step is to find the critical points of the function on the interior
of the ellipse, which I will leave as an exercise. The problem comes with checking
the boundary – it is certainly one-dimensional, but how do we parametrise it? Here
is a sub-exercise for you to do:

Problem 3.27. Prove that the ellipse x²/a² + y²/b² = 1 is parametrised by r(θ) = ⟨a cos θ, b sin θ⟩
for θ ∈ [0, 2π].
Once you've done this problem, you'll be equipped to parametrise the boundary
and complete the problem. Unlike the case of the square, we do not have a 'boundary
of the boundary' in this case, since the ellipse doesn't have endpoints.

We now turn to the special case of Lagrange multipliers. This applies to opti-
misation of functions g : R³ → R on closed surfaces (i.e. compact surfaces without
boundary) in R³ defined implicitly by F(x, y, z) = 0 (or F(x, y, z) = c for any c, but
by modifying F we can assume c = 0). It can also apply to optimisation on ellipses
or circles in R², but we will only demonstrate in the more difficult case.
Suppose that we are trying to find a maximum of g(x, y, z) on S = F⁻¹(0) for a
C² function F : R³ → R. Let's pick a random point (a, b, c) (where we do not mean
c as above). We can examine the gradient ∇g(a, b, c), which points in the direction
of greatest change of g. We can use this direction to move along S to another point
(a′, b′, c′) near our starting point such that g(a′, b′, c′) > g(a, b, c). There is one
circumstance when that fails – if ∇g is pointing directly away from the surface S,
we cannot travel in that direction at all.

But we already know what direction is directly away from S – it is ∇F(a, b, c).
Thus:
Theorem 3.28 (Lagrange Multipliers). Let F be a C² function so that F(x, y, z) = 0
defines a closed surface S in R³, and let g : R³ → R be a C¹ function. Then g has its
local extrema on S at those points (a, b, c) where ∇g(a, b, c) and ∇F(a, b, c) are parallel,
i.e. there exists λ ∈ R such that ∇g(a, b, c) = λ∇F(a, b, c).

Note that this includes the case ∇g(a, b, c) = 0, which would correspond to a
local maximum or minimum of g(x, y, z) without constraining ourselves to S.
Problem 3.29. Find the point on the plane

x/2 + y/4 + z/4 = 1

closest to the origin in R³, then compute the distance.
Solution. As always, we need a constraint function and a function to optimise. The
function to optimise is d(x, y, z) = √(x² + y² + z²), which is a bit messy. As we argued
before, we may instead optimise its square g(x, y, z) = x² + y² + z², which has the
same extrema. The Lagrange condition ∇g = λ∇F then reads 2x = λ/2, 2y = λ/4,
and 2z = λ/4.
A key point in the theory of Lagrange multipliers is that we never need to compute
λ, but we can use it symbolically to arrange all that we have. Solving each of those
equations for λ tells us that

λ = 4x = 8y = 8z  ⟹  x = 2y = 2z

so we can use the one-variable substitutions y = x/2 and z = x/2 to compute an
actual point on this plane:

x/2 + (x/2)/4 + (x/2)/4 = 1  ⟹  (3/4)x = 1  ⟹  x = 4/3, y = z = 2/3.
Answering the question, we have to plug all this into the original distance function:

d(4/3, 2/3, 2/3) = √(16/9 + 4/9 + 4/9) = 2√6 / 3.
But wait! We never determined that this was a minimum! Fortunately, we can
appeal to our other senses: it's very easy for a point on a plane to get far away from
the origin, but it's difficult for it to be close. We should expect a minimum but no
maximum. As such, any extremum we encounter should be a minimum.

If we're being extra fancy, we can compute the (three-dimensional!) Hessian of
g(x, y, z). Most of the second partial derivatives are zero, and the Hessian is diagonal
with entries (2, 2, 2). Thus we are in a permanent state of concave up, i.e. all local
extrema are minima.
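The Lagrange system of Problem 3.29 can be solved symbolically; a hedged sketch, using the squared distance as the objective (sympy assumed available):

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lam')
g = x**2 + y**2 + z**2            # squared distance to the origin
F = x/2 + y/4 + z/4 - 1           # constraint: the plane

# grad g = lam * grad F, plus the constraint itself.
eqs = [sp.diff(g, v) - lam*sp.diff(F, v) for v in (x, y, z)] + [F]
sols = sp.solve(eqs, [x, y, z, lam], dict=True)
s = sols[0]
dist = sp.sqrt(g.subs(s))         # should be 2*sqrt(6)/3
```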
I leave you with a classical practice problem:

Problem 3.30. What is the maximum of the function g(x, y, z) = xyz on the unit
sphere?
You perhaps know intuitively what the answer should be, but see how the method
of Lagrange multipliers bears out your intuition.
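For Problem 3.30, a brute-force check over a spherical grid (pure Python, names mine) suggests the maximum 1/(3√3), attained at x = y = z = 1/√3:

```python
import math

best = 0.0
N = 400
for i in range(N):
    phi = math.pi * (i + 0.5) / N          # polar angle
    for j in range(N):
        theta = 2*math.pi * (j + 0.5) / N  # azimuthal angle
        x = math.sin(phi) * math.cos(theta)
        y = math.sin(phi) * math.sin(theta)
        z = math.cos(phi)
        best = max(best, x*y*z)

expected = 1 / (3*math.sqrt(3))            # conjectured maximum of xyz
```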
4. Day 4: Multivariable calculus

How do you integrate in two variables? First, learn how to integrate over boxes. You
can do that using Riemann sums, but I really don't want to do that. It's conceptually
important but not worth typing up in the grand scheme of things. Again, these integrals are
linear and you can separate them up and so on. There's a technical definition for
when functions are integrable, but in all the cases we care about it will suffice to know
that continuous functions on bounded domains (with non-ridiculous boundaries) are
integrable.
For boxes, it doesn’t matter whether you integrate over x or y first.
Theorem 4.1 (Fubini's Theorem). Let f : R² → R be continuous and let R = [a, b] × [c, d]. Then

∫∫_R f(x, y) dA = ∫_a^b ∫_c^d f(x, y) dy dx = ∫_c^d ∫_a^b f(x, y) dx dy.
Then, for integrating functions over more complicated regions D which are not
rectangles, you want to parametrise the region in terms of x in some range then y
as a function of x, then integrate y first then x. Or you can do it the other way,
depending on exactly how your region looks.
Problem 4.2. Integrate f (x, y) = xy over the region bounded by y = 4 and y = x2
in the first quadrant.
Solution. We can easily describe this region as x ∈ [a, b] with φ(x) ≤ y ≤ ψ(x).
Since we are in the first quadrant, we must start at x = 0. The end point is where
these two curves intersect, which occurs at (2, 4). Thus we would like to integrate:

∫_0^2 ∫_{x²}^4 xy dy dx
Why have we ordered the y-integral like this? Drawing out the region shows that
y = x² is on the bottom and y = 4 is on top. When performing multivariable integrals,
if we are integrating with respect to y we just pretend that x is a constant (because
it is, for our purposes):

∫_{x²}^4 xy dy = (xy²/2) |_{y=x²}^{y=4} = 8x − x⁵/2.
This shows why we are integrating with respect to y first. If we were to do this
integral second, our final answer would still have variables, which is suboptimal for
a definite integral. But now we integrate with respect to x and all our variables will
vanish:

∫_0^2 (8x − x⁵/2) dx = (4x² − x⁶/12) |_0^2 = 16 − 64/12 = 32/3.
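A symbolic check of the iterated integral just computed (sympy assumed available):

```python
import sympy as sp

x, y = sp.symbols('x y')

# Inner integral in y, then outer integral in x.
inner = sp.integrate(x*y, (y, x**2, 4))    # = 8x - x^5/2
result = sp.integrate(inner, (x, 0, 2))    # = 32/3
```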
If our regions are oriented in the other fashion, we should integrate first with
respect to x, then with respect to y. As an example:

Problem 4.3. Compute the area between the curves x = y² and x = 2y in the first
quadrant.
To find the area of the region, just integrate the function f (x, y) = 1. The hard
part is setting up the bounds, which I leave to you.
When our function is not just f (x, y) = 1, the integral over R is the volume under
the surface z = f (x, y) in R3 which lies over R in the xy-plane. This means that
instead of calculating the area between curves, we can calculate the volume of a
region between surfaces. In particular, when a certain region has a nice boundary
with respect to the xy-plane and its upper and lower boundaries are nice functions
of x, y, we’re in business. We can also do this with triple integrals.
4.1. Triple Integrals. Really, there's not a whole lot different here, except that we
have three variables instead of two. Riemann sums are Riemann sums, except now
they're one dimension more annoying. Supposing that we can parametrise our region
W ⊂ R³ analogously, so that its boundary is of the form z₁(x, y) ≤ z ≤ z₂(x, y) over
a region D in the xy-plane, then

∫∫∫_W f(x, y, z) dV = ∫∫_D ( ∫_{z₁(x,y)}^{z₂(x,y)} f(x, y, z) dz ) dA
and now we just have to integrate over the rectangle, which is pretty straightforward,
thus left as an exercise.
To find the volume, we have two conceptual choices that amount to the same
integral. First, it's the region between two surfaces, so by the brief comment above,
we could solve

∫∫_R (3x + 5y) − (x + y) dA

That is, we want to integrate over the region R the difference in the heights of these
functions. Alternatively, we perform the same integral by plugging in 1 instead of z:

∫∫_R ∫_{x+y}^{3x+5y} 1 dz dA = ∫∫_R (3x + 5y) − (x + y) dA
which amounts to the same thing. This is an even easier computation, so I will not
do it.

This next example is slightly more confusing.
Problem 4.5. Integrate f(x, y, z) = x over the region W bounded above by z =
4 − x² − y² and below by z = x² + 3y² in the first octant.
Solution. In order to parametrise the region in the xy-plane over which W lies, we
need to compute the intersection of the surfaces. This turns out to be an ellipse:

4 − x² − y² = x² + 3y²  ⟹  4 = 2x² + 4y²

Call the quarter of this ellipse we care about E. Our integral is thus

∫∫_E ∫_{x²+3y²}^{4−x²−y²} x dz dA

We need to solve the ellipse in terms of x or y, and we may as well pick x. We have
x = ±√(2 − 2y²). We also know that x ≥ 0 and y ≥ 0 in the part we care about, so
we will pick 0 ≤ x ≤ √(2 − 2y²). The bounds of y are 0 ≤ y ≤ 1. Therefore we can
set up our integral and go:

∫_0^1 ∫_0^{√(2−2y²)} ∫_{x²+3y²}^{4−x²−y²} x dz dx dy.
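This iterated integral is tedious by hand but easy to check; a symbolic evaluation (sympy assumed available) gives 16/15:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# Work from the innermost integral outward, exactly as set up above.
inner = sp.integrate(x, (z, x**2 + 3*y**2, 4 - x**2 - y**2))
middle = sp.integrate(inner, (x, 0, sp.sqrt(2 - 2*y**2)))
result = sp.integrate(middle, (y, 0, 1))
```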
x = r cos θ,  y = r sin θ  ⟺  x² + y² = r²,  tan θ = y/x

converts between the two. But is doing an integral like ∫∫_R f(x, y) dx dy as easy as

∫∫_R f(r, θ) dr dθ?

No, it's not. The problem is that dr dθ is not the same area as dx dy. In fact, we
can draw the usual picture and prove that

dx dy = r dr dθ

Thus swapping your integral into polar coordinates is almost as easy as posited.
Problem 4.6. Compute the area of the unit circle using polar coordinates.

Solution. The unit circle is described as θ ∈ [0, 2π] and r ∈ [0, 1]. Thus its area is

∫_0^{2π} ∫_0^1 r dr dθ = 2π · (r²/2) |_0^1 = π.
Note that if we forget to include that r, we get

∫_0^{2π} ∫_0^1 1 dr dθ = 2π ≠ π

which is a wrong answer.
So that’s for two dimensions; what about three? There’s an analogue of polar
coordinates called cylindrical coordinates, which just adds z as the third variable.
It’s another easy computation that dx dy dz = r dr d✓ dz. These coordinates are best
used with surfaces or regions that have nice symmetry when rotating around the
z-axis but not for any other types of rotation, for example cones, cylinders, and
hyperboloids or paraboloids.
Spherical coordinates are most useful for spheres, and can occasionally be useful
in other situations. The conversions are as follows:

x² + y² + z² = ρ²,  cos φ = z/ρ,  tan θ = y/x.

Conversely (and more usefully),

x = ρ sin φ cos θ,  y = ρ sin φ sin θ,  z = ρ cos φ

A calculation that's possible but a little beyond the scope of this course is that

dx dy dz = ρ² sin φ dρ dφ dθ
Problem 4.7. Compute the volume of the region between the surfaces z = x² + y²
and z = 8 − x² − y².

Solution. The first surface is on the bottom and the second is on top. The intersection
of these surfaces is

x² + y² = 8 − x² − y²  ⟹  x² + y² = 4

which is the circle of radius 2. Thus we can compute this volume as a cylindrical
integral over the disc D given by r ≤ 2. The first thing is rephrasing the integrand
in terms of polar coordinates:

(8 − x² − y²) − (x² + y²) = 8 − 2(x² + y²) = 8 − 2r²
Thus:

∫∫_D (8 − 2r²) dA = ∫_0^{2π} ∫_0^2 (8 − 2r²) r dr dθ
                  = 2π ∫_0^2 (8r − 2r³) dr
                  = 2π · (4r² − r⁴/2) |_0^2 = 2π · (16 − 8) = 16π.
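A quick symbolic check of this volume (sympy assumed available); note that the antiderivative of 2r³ is r⁴/2:

```python
import sympy as sp

r, theta = sp.symbols('r theta', nonnegative=True)
height = 8 - 2*r**2                 # top surface minus bottom, in polar form
V = sp.integrate(height * r, (r, 0, 2), (theta, 0, 2*sp.pi))
```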
4.3. Quadric surfaces. Now would probably be a good time to go over quadric
surfaces, i.e. the basic surfaces we will encounter in R³.

Ellipsoid. These are the analogue of ellipses, and look basically the same: for
positive numbers a, b, c ∈ R, we have

x²/a² + y²/b² + z²/c² = 1

the ellipsoid with radii a, b, c in the x, y, z directions respectively. The volume of such
an ellipsoid is (4/3)πabc, as one might expect.
Elliptic paraboloid. These can be written as the graph of a function, namely for
a, b > 0,

z = ax² + by².
y = x² + z²,  x = 2y² + 3z².
that one is facing up and one is facing down – hence hyperbolic paraboloid.
This type of shape is incredibly difficult to draw, but for one of these the point
(0, 0) is a saddle point. Thus these are the Pringle-shaped graphs that show up when
we learn the second derivative test.
Hyperboloid of one sheet. What if we have an ellipsoid but then flip one of
the signs? Then we can arrange it to obtain

x²/a² + y²/b² = z²/c² + 1

up to permutation of variables. Then if we set x = 0 or y = 0 we obtain again the
formula of a hyperbola, and now the slices at fixed values of z are ellipses. This is
not a combination we have seen before, and we baptise it a hyperboloid. You'll want
to Google what these look like. If z is the isolated variable on the other side of the
equation, then we see that the elliptical slices are again parallel to the xy-plane.
Hyperboloid of two sheets. Suppose that we flip two of the signs on an ellipsoid.
Then up to permuting variables, we obtain

x²/a² + y²/b² = z²/c² − 1

A fair question is how this differs from the last example. It doesn't really – we still
obtain hyperbolas if x = 0 or y = 0, and the horizontal slices are ellipses. But now
what if we plug in z = 0? Then we have to solve

x²/a² + y²/b² = −1

but this has no solutions. In fact, we need |z| ≥ c for there to be any points in x, y
that satisfy the equation. Thus the two halves of the hyperboloid are separated from
each other, i.e. there are two separate 'sheets'.
Cone. A special case is the intermediate point between the two kinds of hyper-
boloids:

x²/a² + y²/b² = z²

where we imagine we have multiplied through by c² and reorganised our constants
a, b. Then when z = 0, there is only one point (0, 0, 0) on the level set. Thus our
two sheets are joined at a single point, and it's not hard to see that we have a cone.
If we set x = 0 or y = 0, we get (for example)

x²/a² = z²  ⟹  ±x/a = z

which is a pair of lines intersecting at the origin. This certainly feels like the slice of
a cone (as it's nice and pointy).
4.4. Vector fields and fancier integration. We now turn to the second kind of
integration in multivariable calculus, namely the kind involving vector-valued functions.

Definition 4.8. A vector field is a function f : Rⁿ → Rⁿ, which we think of as
assigning to each point in Rⁿ a vector in Rⁿ beginning at that point.
At this point, I would draw a picture, or steal one from StackExchange.1 There are
many vector fields in real life; two easily coming to mind are the gravitational vector
field, which expresses the force (and direction) due to gravity on any object in space,
and the vector field of wind – to each point on Earth we can assign the vector of
which direction (and speed) the wind is blowing.
The most boring example is when we are still working with real-valued functions. If
we want to integrate a real-valued function over some curve C in R^3 (or a surface, but
let's stick with a curve), then we think of C as being the image of some γ : [a, b] → R^3
which is continuous except at finitely many points (with technical details omitted).
Then the function we are considering is
f(γ(t)) : [a, b] → R
But this isn’t quite enough, because the parametrisation matters. We need to make
sure that the speed at which we are traversing this curve is taken into account, i.e.
∫_C f(x, y, z) ds = ∫_a^b f(γ(t)) ‖γ′(t)‖ dt
Note that for this equation to be without problems, we want to assume that γ′(t)
and f (x, y, z) are continuous. But this is the boring example.
Now, suppose we are thinking physics, and we want to know something like ‘how
much energy does it take to fight gravity or the wind’ ? This would involve a vector
1https://tex.stackexchange.com/questions/328036/velocity-field-3d-vector-fields-in-tikz-or-pgfplots
42 IAN COLEY
field F : R^3 → R^3. In this case, whenever the curve travels in the same direction
as the vector field, we would like to count that positively (going with the flow), and
negatively when the curve travels against it. Remember that we cannot think of C as
just a 1-dimensional object in R^3; it comes with an orientation – it has a back and
a front – and the function γ : [a, b] → R^3 we use needs to take this into account.
What vector operation determines whether things go in the same direction? The
dot product. What gives the (linear) direction the curve is going? Its tangent vectors
γ′(t). Thus:
Definition 4.9. The line integral of a vector field F along a curve C, parametrised
by γ : [a, b] → R^3, is
∫_C F · dr = ∫_a^b F(γ(t)) · γ′(t) dt
Both of these quantities are vectors, so it makes sense to dot them. Another way
the expression F · dr is sometimes written is F1 dx + F2 dy + F3 dz, where these are
the component functions of F. This will come up later.
Problem 4.10. Compute the line integral of F = ⟨z, y^2, x⟩ along the curve γ(t) =
(t + 1, e^t, t^2) for t ∈ [0, 2].
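Computations like this are quick to check symbolically. Here is a sketch with sympy (the names gamma and F are my own, not from the problem statement):

```python
import sympy as sp

t = sp.symbols('t')
gamma = sp.Matrix([t + 1, sp.exp(t), t**2])        # the curve gamma(t)
F = sp.Matrix([gamma[2], gamma[1]**2, gamma[0]])   # F = <z, y^2, x> along the curve

integrand = F.dot(gamma.diff(t))                   # F(gamma(t)) . gamma'(t)
line_integral = sp.integrate(integrand, (t, 0, 2))
print(sp.simplify(line_integral))                  # equals 35/3 + exp(6)/3
```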
What are some basic properties of the line integral? It is still linear, and
now if one reverses the orientation of the curve C, this is like swapping a, b on the
righthand side of the above equation, hence negates the integral. The last thing is
that stringing together multiple curves end-to-end gives a sum of integrals.
4.5. Conservative vector fields. These are just the best. Our prototype here is
a vector field F that arises as ∇f for some f : R^3 → R. Such a vector field is called
conservative. Then we can use the fundamental theorem of calculus to evaluate
∫_C F · dr = ∫_a^b ∇f(γ(t)) · γ′(t) dt = ∫_a^b d/dt f(γ(t)) dt = f(γ(b)) − f(γ(a))
But γ(b) and γ(a) are just the endpoints of the curve, so the actual curve C
doesn't matter in this case. Any vector field for which this happens is called
path-independent.
Similarly, if C is a closed curve, its endpoints are the same, so
∮_C F · dr = 0
One might ask if there are path-independent vector fields that do not arise as the
gradient of some function. The answer is, essentially, no.
Theorem 4.12. A vector field F on an open, connected domain D is path-independent
if and only if it is conservative.
Now, let us go over some of the other vector derivatives that will become useful
shortly. The first is the divergence of a vector field,
div F(x, y, z) = ∇ · F = ∂_x F1 + ∂_y F2 + ∂_z F3
and the second is the curl of a vector field, which only makes sense in R^3:
curl F(x, y, z) = ∇ × F = det [ î  ĵ  k̂ ; ∂_x  ∂_y  ∂_z ; F1  F2  F3 ]
Problem 4.13. Prove that if F = ∇f is conservative, then curl F = ~0.
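One can at least check this symbolically for any concrete potential; a sympy sketch (the test function f is an arbitrary choice of mine):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2*sp.sin(y) + y*sp.exp(z)          # any smooth function will do

Fg = [sp.diff(f, v) for v in (x, y, z)]   # F = grad f, a conservative field
curl = sp.Matrix([
    sp.diff(Fg[2], y) - sp.diff(Fg[1], z),
    sp.diff(Fg[0], z) - sp.diff(Fg[2], x),
    sp.diff(Fg[1], x) - sp.diff(Fg[0], y),
])
print(curl.T)  # Matrix([[0, 0, 0]]), by equality of mixed partials
```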
The converse to this problem is not always true, but it is true in a great many
cases.
Theorem 4.14. Suppose that D is an open simply-connected domain. A vector
field F on D is conservative if and only if curl F = ~0.
Hence one should beware that the domains they are working on be simply-connected,
which (we remember) means that all loops in D can be contracted to a point. That means
that something like x^2 + y^2 ≤ c is okay but punctured regions like R^2 \ {(a, b)} are
not.
4.6. Surface integrals. In order to integrate using surfaces in R3 , we need to be
able to parametrise them like
G(u, v) = (x(u, v), y(u, v), z(u, v))
where (u, v) are in some region D in R^2. For example, the graph of the function
z = f(x, y) in R^3 is easily parametrised by (x, y, f(x, y)). This is our prototype, but
of course not all surfaces in R^3 are graphs.
Suppose we want to parametrise the cylinder x2 + y 2 = 1 in R3 . This is best done
with cylindrical coordinates (of course), yielding an easy parametrisation (1, θ, z) for
θ ∈ [0, 2π] and z ∈ R. Since the radius is fixed, we only get two variables.
We can parametrise spheres similarly using spherical coordinates. There’s only a
slight problem with this picture, as we get kind of an overlap at θ = 0 and θ = 2π,
but we will not concern ourselves overmuch with this.
Okay, what are we doing with surfaces? Given a vector field, we want to measure
how much the vector field is flowing through the surface. But what does flowing
44 IAN COLEY
through the surface mean? Do surfaces have an up side and a down side? It turns
out, they do. In the parametrisation here, we have two tangent vectors ∂_u G and ∂_v G
which naturally give a direction 'up' for the surface in the form of ∂_u G × ∂_v G. Note
that if the partial derivatives are parallel this breaks, so we want to make sure that
we don’t run into that problem.
Once we have that normal vector, we can begin by finding the tangent plane, which
is moderately useful.
Problem 4.15. Compute the tangent plane to the surface G(θ, z) = (2 cos θ, 2 sin θ, z)
at P = G(π/4, 5).
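A sympy sketch of this computation (variable names are mine): cross the two tangent vectors for a normal, then write N · (X − P) = 0.

```python
import sympy as sp

theta, h, x, y, z = sp.symbols('theta h x y z')
G = sp.Matrix([2*sp.cos(theta), 2*sp.sin(theta), h])   # the cylinder, h = height

P = G.subs({theta: sp.pi/4, h: 5})                     # the point G(pi/4, 5)
N = G.diff(theta).cross(G.diff(h)).subs(theta, sp.pi/4)

plane = N.dot(sp.Matrix([x, y, z]) - P)                # N . (X - P) = 0
print(sp.expand(plane))                                # sqrt(2)*x + sqrt(2)*y - 4
```

So the tangent plane is x + y = 2√2 (the z-direction is free, as one expects for a vertical cylinder).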
Of course, we could easily reverse the orientation on the parametrisation by using
the equation G(z, θ), so that the order in which we take the cross product is swapped.
Hence we're going to want to have some sort of consistency. For the surface given by
the graph of a function z = f(x, y), we will take the upwards direction to be the
canonical orientation for the normal vector, i.e. the one going in the positive
z-direction: ⟨−∂_x f, −∂_y f, 1⟩.
Now, how do we perform scalar surface integrals, i.e. ones that ignore the vector
field for the moment? It turns out that, in the above situation, N = ∂_u G × ∂_v G is
also capturing the infinitesimal area of the parallelogram with sides du, dv. This is
an easy computation using the sin θ interpretation of the cross product. Therefore,
if we want to do our integral, we need to scale by this amount, just like we had to
account for the speed in the case of line integrals.
Definition 4.16. Let G(u, v) be a parametrisation of a surface S ⊂ R^3 with do-
main D. Assume that G is C^1, one-to-one, and regular (i.e. the normal vector is
nondegenerate). Then for a function f : R^3 → R,
∬_S f(x, y, z) dS = ∬_D f(G(u, v)) ‖∂_u G(u, v) × ∂_v G(u, v)‖ du dv
Now, once we bring in the actual flow and a vector field F : R^3 → R^3, we need to
use a dot product:
Theorem 4.17.
∬_S F · dS = ∬_D F(G(u, v)) · N(u, v) du dv
Problem 4.18. Compute the flux through the surface G(u, v) = (u^2, v, u^3 + v^2) over
D = [0, 1]^2 of the vector field F = ⟨0, 0, x⟩.
Solution. We first need to compute the normal vector, and so need ∂_u G and ∂_v G:
∂_u G(u, v) = ⟨2u, 0, 3u^2⟩,  ∂_v G(u, v) = ⟨0, 1, 2v⟩.
The zeroes make the cross product slightly nicer:
N(u, v) = ∂_u G × ∂_v G = ⟨−3u^2, −4uv, 2u⟩.
Is this upward pointing? Looking at the z-coordinate, it’s always positive when
u 2 [0, 1], so we’re in business.
We now need to compute the other part of our integrand:
F(G(u, v)) = ⟨0, 0, u^2⟩ ⟹ F(G(u, v)) · N(u, v) = 2u^3,
and integrating over D = [0, 1]^2 gives flux ∫_0^1 ∫_0^1 2u^3 du dv = 1/2.
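The whole flux computation fits in a few lines of sympy (assuming, as the partial derivatives above indicate, that the surface is G(u, v) = (u^2, v, u^3 + v^2)):

```python
import sympy as sp

u, v = sp.symbols('u v')
G = sp.Matrix([u**2, v, u**3 + v**2])
N = G.diff(u).cross(G.diff(v))          # <-3u**2, -4uv, 2u>, upward since 2u >= 0

F_on_S = sp.Matrix([0, 0, G[0]])        # F = <0, 0, x> evaluated on the surface
flux = sp.integrate(F_on_S.dot(N), (u, 0, 1), (v, 0, 1))
print(flux)                             # 1/2
```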
Theorem 4.20 (Green's Theorem). Let D be a region in R^2 whose boundary ∂D is
oriented counterclockwise, and let F = ⟨F1, F2⟩ be a C^1 vector field. Then
∮_{∂D} F1 dx + F2 dy = ∬_D (∂_x F2 − ∂_y F1) dA
Use this when the line integral of the closed curve would be way too confusing to
compute.
Problem 4.21. Verify Green's Theorem by computing the line integral over the unit
circle C of F(x, y) = ⟨xy^2, x⟩.
Solution. On the one hand, we may parametrise the unit circle by ~γ(t) = ⟨cos t, sin t⟩.
Note that this is the correct counterclockwise orientation. Also note that ~γ′(t) =
⟨−sin t, cos t⟩, so F(~γ(t)) · ~γ′(t) = −cos t sin^3 t + cos^2 t.
The integral of this first term is zero because (roughly) cos t and sin^3 t are zero when
integrated over their entire period. The integral of cos^2 t is not zero, however, and it
can be computed to be π.
Using Green's theorem, we have
∮_C F · dr = ∬_D (1 − 2xy) dA
It would be better to convert to polar for this integral. Giving some of the steps
(and reminding the reader that 2 sin θ cos θ = sin(2θ)),
∬_D (1 − 2xy) dA = ∫_0^{2π} ∫_0^1 (1 − 2(r cos θ)(r sin θ)) r dr dθ
= ∫_0^{2π} ∫_0^1 (r − 2r^3 cos θ sin θ) dr dθ
= ∫_0^{2π} (1/2 − (1/4) sin(2θ)) dθ
= π
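Both sides of this verification can be handed to sympy, which is a good way to double-check the trig integrals:

```python
import sympy as sp

t, x, y, r, th = sp.symbols('t x y r theta')

# Line-integral side: gamma(t) = (cos t, sin t), F = <x*y**2, x>
Fx, Fy = x*y**2, x
gx, gy = sp.cos(t), sp.sin(t)
line = sp.integrate(Fx.subs({x: gx, y: gy})*sp.diff(gx, t)
                    + Fy.subs({x: gx, y: gy})*sp.diff(gy, t), (t, 0, 2*sp.pi))

# Double-integral side: (dF2/dx - dF1/dy) over the unit disc, in polar
integrand = (sp.diff(Fy, x) - sp.diff(Fx, y)).subs({x: r*sp.cos(th), y: r*sp.sin(th)})
double = sp.integrate(integrand*r, (r, 0, 1), (th, 0, 2*sp.pi))

print(line, double)   # both equal pi
```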
In this case, neither integral was particularly nice. However, in the case that the
integrand of the double integral is a constant, life is much better. For example,
consider the vector field F(x, y) = ⟨−y, x⟩. Then
∂_x F2 − ∂_y F1 = 1 − (−1) = 2.
Thus integrating a closed curve along this vector field gives you twice the area it
encloses. Look for phenomena like this when it seems that Green’s theorem might
be in play.
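As one such phenomenon: the formula A = (1/2)∮(−y dx + x dy) recovers the enclosed area, e.g. πab for an ellipse. A sympy sketch (my choice of example):

```python
import sympy as sp

t, a, b = sp.symbols('t a b', positive=True)

# Ellipse boundary, counterclockwise
gx, gy = a*sp.cos(t), b*sp.sin(t)

# (1/2) * closed integral of (-y dx + x dy) recovers the enclosed area
area = sp.Rational(1, 2)*sp.integrate(-gy*sp.diff(gx, t) + gx*sp.diff(gy, t),
                                      (t, 0, 2*sp.pi))
print(sp.simplify(area))   # pi*a*b
```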
One key feature is that, even if your curve is not closed, you can close it up and
appeal to a simpler area calculation, i.e. if you have to compute a line integral over
half a circle, complete it to a whole circle.
Problem 4.22. Compute the line integral over the 'curve' of straight lines connecting
(1, 1) to (0, 1) to (0, 0) to (1, 0) of the vector field F(x, y) = ⟨x^2 − y^2, 2xy⟩.
Solution. Now, this isn't a closed curve, so we can't use Green's theorem. However,
it is oriented counterclockwise and it's just a little bit off being closed. Let's call the
curve in the problem C1 and let C2 denote the straight line between (1, 0) and (1, 1).
If we let C be the closed curve, then we have
∮_C F · dr = ∫_{C1} F · dr + ∫_{C2} F · dr.
But now the lefthand integral can be computed using Green's theorem:
∮_C F · dr = ∬_{[0,1]^2} (2y − (−2y)) dA = ∫_0^1 ∫_0^1 4y dy dx = 2
Hence we can solve the integral we want a little more easily: we parametrise C2 by
~r(t) = ⟨1, t⟩ for t ∈ [0, 1], and so
∫_{C1} F · dr = 2 − ∫_{C2} F · dr = 2 − ∫_0^1 F(~r(t)) · ~r′(t) dt
= 2 − ∫_0^1 ⟨1 − t^2, 2t⟩ · ⟨0, 1⟩ dt = 2 − ∫_0^1 2t dt
= 2 − 1 = 1.
This is much easier than the alternative: breaking the curve C1 into three line seg-
ments and doing three different line integrals.
Next up: Stokes’ Theorem. The previous theorem told us how to compute a line
integral around a closed curve as a double integral. Stokes’ theorem will tell us how
to compute a line integral as a surface integral (and sometimes vice versa).
Theorem 4.23 (Stokes' Theorem). Let S be an oriented surface in R^3 with boundary
oriented so that the surface is always on your left (assuming outward pointing normal
vectors). Assume that F : R^3 → R^3 is a C^1 vector field. Then
∮_{∂S} F · dr = ∬_S curl F · dS
In particular, if ∂S = ∅, then the integral is zero.
It doesn't look like this is a particularly helpful theorem, in that surface integrals
are usually pretty nasty. But again, a vector field F might be nasty but have a curl
which is not – it's hard to tell without computing it, though. Supposing that we
start with the righthand side, something of the form ∬_S G · dS, how can we tell if
G = curl F for some F?
Proposition 4.24. Suppose that G is a vector field on a simply-connected domain.
Then G = curl F if and only if div G = 0.
At least the backwards direction of this proposition is easy, and the forwards
direction is done by actually constructing an F which has curl F = G. I will not be doing
this. Thus I haven't actually told you how to find that F, but we usually don't need
to in practice.
Problem 4.25. Let S be the unit sphere in R^3, and let
G(x, y, z) = ⟨2xyz, 4 − y^2 z, x^3 + y^2⟩.
Compute the flux of G through S.
~ is defined on all of R3 ,
Solution. This looks nigh-impossible. This vector field G
which is simply-connected. Moreover, we see that
s
~ = 2yz
div G 2yz + 0 = 0
s
therefore it’s the curl of some vector field F~ . Thus:
∬_S G · dS = ∮_{∂S} F · dr.
But hey, ∂S = ∅, so we don't even need to know F to conclude that the righthand
integral is zero.
Another use for Stokes' theorem is the observation that many different surfaces S
have the same boundary ∂S. As an illustrating example:
Problem 4.26. Consider the vector field
G = ⟨2ye^x + z, log(x + z) − y^2 e^x, x + y + 1⟩.
Compute the flux of G through the upper hemisphere of the unit sphere S, with
counterclockwise oriented boundary and upward pointing normal vector.
Solution. Again, the brute force method would take ages. But we notice that
div G = 2ye^x − 2ye^x + 0 = 0
So this is the curl of something. But hang on, our surface now has a boundary! It’s
the unit circle in the xy-plane, and doing the line integral of something unknown
over that circle seems really bad.
But let’s consider the unit disc D, which has the same boundary as S but has a
much more straightforward normal vector. Using Stokes' Theorem twice,
∬_S G · dS = ∮_{∂S} ?? · dr = ∬_D G · dS.
Let's now try to find this right-most integral. The normal vector to D is given
everywhere by ⟨0, 0, 1⟩, so we just need to compute the double integral
∬_D G · ⟨0, 0, 1⟩ dA = ∬_D (x + y + 1) dA
By symmetry, x and y integrate to zero over the unit disc, so this is just the area
of D, i.e. π.
The last theorem to discuss is the divergence theorem, which will tell us how to
compute (certain) triple integrals in terms of surface integrals, and more helpfully
vice versa.
Theorem 4.27 (Divergence Theorem). Let S be a closed surface, i.e. one that has
no boundary, enclosing a region W ⊂ R^3. Let S be oriented by outward pointing
normal vectors, and suppose that F is a C^1 vector field defined on an open domain in
R^3 that contains W. Then
∬_S F · dS = ∭_W div F dV
if something goes to zero, then it comes as a result of the previous operation in the
chain.
There's a nice way to discuss this from the point of view of differential topology, but that's
a bit beyond the scope of the Math GRE. Indeed, I didn’t learn any of that until
graduate school, at which point I understood most of this well for the first time.
Let’s see it in action.
Problem 4.29. Let F(x, y, z) = ⟨y, yz, z^2⟩, and let S be the hollow cylinder of radius
2 and height 5 with its base on the xy-plane (with outward pointing normal vector).
Compute the flux of F through S.
Solution. The surface is closed, so the divergence theorem applies: div F = 0 + z + 2z = 3z,
and the flux equals the integral of 3z over the solid cylinder.
Actually doing this triple integral is left as an exercise.
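A sketch of the exercise in sympy, using cylindrical coordinates (Jacobian r):

```python
import sympy as sp

x, y, z, r, th = sp.symbols('x y z r theta')
F = (y, y*z, z**2)
divF = sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))   # 3*z

# Solid cylinder: 0 <= r <= 2, 0 <= theta <= 2*pi, 0 <= z <= 5
flux = sp.integrate(divF*r, (r, 0, 2), (th, 0, 2*sp.pi), (z, 0, 5))
print(flux)   # 150*pi
```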
Thus: never ever compute the flux through a closed surface. You have access to
the divergence theorem, and it’ll almost certainly be a simpler calculation.
Remark 4.30. The divergence theorem has an application to physics which most
people learn in their introductory E&M course. Suppose that we have some collection
of charged particles and a spherical shell enclosing them. How do we compute the
flux of the electrical field through the shell? We just add up the charges of the
particles on the inside of that shell. This is known as Gauss’ Law. The divergence
of the electrical (vector) field is essentially the charge on the particles which produce
it.
5. Day 5: Linear algebra
Right, systems of linear equations. We like solving them, don’t we? Let’s just cut
to the chase. We already know well enough what a vector space is and what an inner
product and norm are. In the case that a vector space V admits an inner product,
we define the norm by ‖~v‖^2 = ⟨~v, ~v⟩. We're only ever going to be working with finite-
dimensional inner product spaces on the GRE, so no need to get too complicated.
We want to open up inner products a bit. Over C, we want to define the inner
product ⟨~v, ~w⟩ to still be the sum ∑_{i=1}^n v_i w̄_i, but now we want it to be sesquilinear:
⟨α · ~v, ~w⟩ = ⟨~v, ᾱ · ~w⟩,  α ∈ C
Inner products and norms satisfy what's called the Cauchy-Schwarz inequality: for
any ~v, ~w ∈ V,
|⟨~v, ~w⟩| ≤ ‖~v‖ · ‖~w‖
and equality is only satisfied in certain cases. We also have the triangle inequality:
‖~v + ~w‖ ≤ ‖~v‖ + ‖~w‖
which we all know from geometry.
Problem 5.1. When are these inequalities equalities? Note: they have different
conditions.
Both of these can be proven using the idea of projections. We define the projection
of ~v onto ~w by
proj_{~w} ~v = (⟨~v, ~w⟩ / ⟨~w, ~w⟩) · ~w.
That's a good formula to remember, and it does things like prove the cos θ formula
for the inner product:
⟨~v, ~w⟩ = ‖~v‖ · ‖~w‖ · cos θ,  θ = the angle between ~v, ~w
One fun formula that you might recall is the polarization identity. In any real
vector space,
⟨~v, ~w⟩ = (1/4)(‖~v + ~w‖^2 − ‖~v − ~w‖^2)
which is an easy proof.
Problem 5.2. Prove it.
Okay, now we need the notion of subspaces. Let's assume for ease of notation that
all our vector spaces are over R, though that might not necessarily be true on the
GRE.
Definition 5.3. A subset W ⊂ V is called a subspace if:
(1) ~0 ∈ W
(2) For any ~w1, ~w2 ∈ W and c ∈ R, ~w1 + c · ~w2 ∈ W
That is, W is a vector space in its own right that sits inside of V .
A set S = {~v1, . . . , ~vn} is linearly independent if the only way to write ~0 = ∑_{i=1}^n a_i ~v_i
is with all coefficients a_i = 0. This also implies that if ~w is in the span of S, then there is a
unique way in which to write ~w as a linear combination.
Problem 5.4. Prove that.
A maximal linearly independent set in V is called a basis. All bases have the
same number of elements, and all (finite dimensional) vector spaces have a basis.
Call that number the dimension of V . Note that ‘infinite dimensional’ vector spaces
don’t have a basis unless you assume the axiom of choice! It also makes sense to talk
about the basis of a subspace W ⇢ V , etc.
Remark 5.5. We’re going to keep writing V, W for arbitrary vector spaces, but on
the GRE we might as well have V = Rn and W = Rm all the time, where n = dim V
and m = dim W .
What’s the best kind of basis? An orthonormal one!
Definition 5.6. We say that ~v and ~w are orthogonal if ⟨~v, ~w⟩ = 0. This is true if
and only if the vectors are perpendicular in the ambient vector space (or at least one
of them is the zero vector).
Definition 5.7. A set S = {~v1, . . . , ~vn} ⊂ V is an orthonormal basis if
(1) ‖~vi‖ = 1 for all i = 1, . . . , n
(2) For any i ≠ j, ⟨~vi, ~vj⟩ = 0
This can be put more smoothly by saying that ⟨~vi, ~vj⟩ = δ_{ij}, where δ_{ij} = 1 if i = j
and is zero otherwise.
The usual basis for V = R^n given by the ~ei is orthonormal. Think how much more
difficult the world would be if the coordinate axes weren't perpendicular to each
other!
Problem 5.8. Suppose that S = {~vi} is an orthonormal basis. Then we know that
any ~w ∈ V has a unique expression as
a1~v1 + · · · + an~vn = ~w.
Prove that we can compute these coefficients: a_i = ⟨~vi, ~w⟩.
That is very useful! But what if our basis isn’t orthonormal? Luckily there’s a
process, called the Gram-Schmidt process, to transform it into an orthonormal one.
The process is inductive, and goes as follows:
• Begin with the first element ~v1 of your basis. This may not be a unit vector,
so let ~u1 := ~v1/‖~v1‖. The vector ~u1 is orthogonal to every other element of our
new basis (because we haven't added any yet).
• Now, take the element ~v2. This is probably not orthogonal to ~u1, so we force
it to be so: define
~w2 = ~v2 − ⟨~u1, ~v2⟩ · ~u1
which we can see is now orthogonal to ~u1. But this is probably not a unit
vector, so define ~u2 = ~w2/‖~w2‖.
• We see how to proceed from here: define
~wj = ~vj − ∑_{i=1}^{j−1} ⟨~ui, ~vj⟩ · ~ui,   ~uj = ~wj/‖~wj‖
and eventually we'll be done!
It’s important to note that at every stage we are not changing the span of our vectors.
The vector ~w2, for instance, is a linear combination of ~u1 and ~v2, and ~u1 was just a
multiple of ~v1 so was in its span. Thus the span of ~w1, ~w2 is the same as that of ~v1, ~v2.
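The process above translates directly into code; a minimal numpy sketch (the function name and sample vectors are my own choices):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors, in order."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(u, v) * u for u in basis)   # subtract projections
        basis.append(w / np.linalg.norm(w))            # normalise
    return basis

vs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(vs)
# Check: the Gram matrix of the output is the identity
Gram = np.array([[np.dot(a, b) for b in us] for a in us])
print(np.allclose(Gram, np.eye(3)))   # True
```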
Problem 5.9. Let {1, x, x^2, x^3} be a basis for P3(R), the vector space of degree at
most 3 polynomials with coefficients in R, endowed with the inner product ⟨f(x), g(x)⟩ =
∫_0^1 f(x)g(x) dx. Convert this to an orthonormal basis.
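A sympy sketch of this problem, running Gram-Schmidt with the given inner product (the helper name ip is mine); the output is, up to sign, the shifted Legendre polynomials normalised on [0, 1]:

```python
import sympy as sp

x = sp.symbols('x')
ip = lambda f, g: sp.integrate(f*g, (x, 0, 1))   # <f, g> on P3(R)

basis = []
for v in [1, x, x**2, x**3]:
    w = v - sum(ip(u, v)*u for u in basis)       # strip components along earlier u's
    basis.append(sp.expand(w / sp.sqrt(ip(w, w))))

print(basis)
```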
5.2. Linear transformations. What are the functions we care about?
Definition 5.10. A linear transformation T : V → W is a function satisfying the
axiom that T(c · ~v + ~w) = c · T(~v) + T(~w) for all c ∈ R and for all ~v, ~w ∈ V.
The definition implies that T (~0) = ~0, which is a nice feature. We have a couple of
associated numbers:
Definition 5.11. The rank of T : V → W is the dimension of its image T(V) ⊂ W
as a subspace of W. The nullity is the dimension of its kernel, that is the subspace
ker T ⊂ V given by T(~v) = ~0.
Theorem 5.12. Rank + nullity = dim V .
Linear transformations are well described by matrices in the case that we identify
V = R^n and W = R^m. A transformation T yields a matrix A ∈ M_{m×n}(R) where
the columns of A are T(~ei) for the basis vectors ~ei of R^n. But normally vector
spaces don't come with automatic bases. For T : V → W, where dim V = n and
dim W = m, we still get a matrix A of the same dimensions, but we have to fix a
basis β = {~bi} for V and thus will denote it [T]_β.
Okay, now let's just fix V = W = R^n. What if we had a special basis β that
we want to change A = [T]_std to? How do we change basis? In order to write the
matrix A in terms of a new basis β, we think of converting from β to standard, doing
the transformation A that we know, then back to β. The 'back to standard' matrix
is P such that its columns are the ~bi. Therefore
A = P [T]_β P^{−1} ⟹ [T]_β = P^{−1} A P
where the matrix P is easy to compute but P^{−1} is usually a little more unpleasant
to compute. GAUSSIAN ELIMINATION START HERE
• reduced row echelon form is the identity
How do you find the inverse of a matrix? In 2 × 2, there's a nice formula, but
otherwise you have to use Gaussian elimination. THEN DO A 3 × 3 EXAMPLE.
Define eigenvalues and eigenvectors of a matrix. Note that the kernel is eigenvec-
tors of eigenvalue 0. Sometimes you get a basis of eigenvectors, but sometimes you
don’t. How often does that occur?
Theorem 5.13 (Spectral Theorem). This is a restricted version, but often good
enough. If A is a real matrix and A = A^T, the transpose of A, then A is diago-
nalizable. If A is a complex matrix, then we need A = A^*, the conjugate transpose.
The general form: A is (unitarily) diagonalizable if and only if AA^* = A^* A.
So how do we go about finding eigenvalues or eigenvectors? Use the characteristic
polynomial or, as came up on the test, just examine the matrix A − λI_n. λ is
an eigenvalue if and only if that matrix is not invertible, i.e. has a kernel, so that
(A − λI_n)~v = ~0 has a nonzero solution and A~v = λ · ~v.
DO AN EXAMPLE OF FINDING EIGENVALUES FOR A SYMMETRIC 3 × 3
matrix.
Note: A and A^T have the same eigenvalues, which is pretty neat.
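The notes leave a placeholder for a worked symmetric 3 × 3 example; here is one possible example in sympy (the matrix is my own choice):

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[2, 1, 0],
               [1, 2, 0],
               [0, 0, 3]])              # symmetric, hence diagonalizable

charpoly = (A - lam*sp.eye(3)).det()    # det(A - lambda*I)
print(sp.factor(charpoly))              # factors as -(lambda - 1)*(lambda - 3)**2
print(A.eigenvals())                    # eigenvalue 1 once, eigenvalue 3 twice
```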
6. Day 6: Differential equations and complex analysis
Theorem 6.1. Suppose that f(t, y) and ∂_y f(t, y) are continuous on a compact subset
K ⊂ R^2. Then for any point (t0, y0) ∈ K, the differential equation y′ = f(t, y),
y(t0) = y0 has a unique solution in some neighbourhood of (t0, y0).
So perhaps it's good to check that ∂_y f(t, y) is continuous before blindly charging
into a problem, but probably not.
The first type of differential equation is the basic separation of variables, like so:
Problem 6.2. Suppose that a colony of bacteria grows at a rate directly proportional
to its size. Initially, the colony has 100 bacteria, and after a week it has 300 bacteria.
Write a formula modelling this situation where the time t is measured in days.
Solution. The situation we have is
dB/dt = k · B ⟹ dB/B = k dt
Then integrating both sides gives us
log(B) = k · t + C ⟹ B(t) = C · e^{kt}.
We have that B(0) = 100, so C = 100. We also know that B(7) = 300, so
e^{7k} = 3 ⟹ k = log(3)/7.
Putting it all together,
B(t) = 100 e^{(log(3)/7) t}
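It's quick to sanity-check this answer symbolically:

```python
import sympy as sp

t = sp.symbols('t')
B = 100*sp.exp(sp.log(3)/7*t)                        # the solution found above

assert B.subs(t, 0) == 100                           # initial population
assert sp.simplify(B.subs(t, 7) - 300) == 0          # 300 after one week
assert sp.simplify(B.diff(t) - sp.log(3)/7*B) == 0   # satisfies B' = kB
print("all checks pass")
```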
There are more sophisticated versions of this problem, and they are usually in the
tune of salty or sugary tanks of water.
Problem 6.3. Suppose that we have a 100L tank with 50L of water and 100g of
salt in it. Suppose the tank drains at a rate of 1L/m and is filled at a rate of 2L/m
with pure water. Assuming instantaneous mixing, when the tank is full, how much
salt is there in the tank?
Solution. The tank gains 2 L/min and loses 1 L/min, so the volume at time t is
50 + t, and the salt satisfies dS/dt = −S/(t + 50).
After integration and rearranging (specifically after pulling the −1 into the log),
we get
S(t) = C/(t + 50)
The amount of salt at the beginning is 100, so we have S(0) = C/50 = 100, so
C = 5000. The tank is full at t = 50, so
S(t) = 5000/(t + 50) ⟹ S(50) = 5000/100 = 50 g
Now, we turn to other types of differential equations. Let's first recall what an
integrating factor is. Suppose our differential equation is of the form
dy/dt + p(t)y = q(t).
Then we consider the integrating factor μ(t) = e^{∫ p(t) dt}. Why does that help? Using
this term,
μ(t) · y′ + μ(t)p(t) · y = d/dt (μ(t) · y) = μ(t) · q(t).
Thus, when we integrate both sides,
μ(t) · y = ∫ μ(t)q(t) dt
Assuming that the righthand side is integrable, we can then solve and divide out by
µ(t).
Problem 6.4. Solve the linear ODE y′ − 2ty = t.
Solution. The process implies that μ(t) = e^{∫ −2t dt} = e^{−t^2}, not something we can
integrate on its own. Luckily, the whole righthand side is integrable:
∫ t e^{−t^2} dt = −(1/2) e^{−t^2} + C
Dividing through now by our integrating factor,
y(t) = C e^{t^2} − 1/2
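sympy's dsolve reproduces this, a reasonable check on the integrating-factor arithmetic:

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

sol = sp.dsolve(sp.Eq(y(t).diff(t) - 2*t*y(t), t), y(t))
print(sol)   # something equivalent to Eq(y(t), C1*exp(t**2) - 1/2)

# Independently: whatever form dsolve returns satisfies the ODE
rhs = sol.rhs
assert sp.simplify(rhs.diff(t) - 2*t*rhs - t) == 0
```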
Again, if we get a linear ODE of this form, this is pretty much the only way to
solve it. Exact ODEs likely won't come up, but there's always that chance. Plus,
it's related to multivariable calculus. Suppose that we have a differential equation
of the form
N(x, y) · y′ + M(x, y) = 0, i.e. N(x, y) dy + M(x, y) dx = 0
where moreover we have ∂_x N(x, y) = ∂_y M(x, y). Then this implies, at least
locally, that this situation is coming from the equality of mixed partials, so we need to
find a function H(x, y) with ∇H = ⟨M, N⟩. The general solution to the differential
equation is H(x, y) = C.
Problem 6.5. Solve (x^2 y + 2y) · y′ + (xy^2 + 2x) = 0.
Solution. This equation is exact (easily verified), so a solution looks something like
H(x, y) = ∫ (xy^2 + 2x) dx = ∫ (x^2 y + 2y) dy
As before, we need to integrate but bear in mind that we might have constants that
depend on one variable or the other. That is,
∫ (xy^2 + 2x) dx = (x^2 y^2)/2 + x^2 + g1(y),   ∫ (x^2 y + 2y) dy = (x^2 y^2)/2 + y^2 + g2(x)
Comparing terms, we need to use g2(x) = x^2 and g1(y) = y^2, so that
H(x, y) = (x^2 y^2)/2 + x^2 + y^2 = C
is our general solution.
6.1. Higher order differential equations. Now, suppose we want to solve partic-
ular higher order differential equations that have little interaction between
the variables. For instance, examine
y″ − 9y = f(t)
The first step is to solve the corresponding homogeneous equation y″ − 9y = 0. We
can solve this by inspection, knowing that y′ = ky is solved by e^{kt}. Hence the solutions
we need are e^{3t} and e^{−3t}. The general solution to the homogeneous equation is therefore
y(t) = c1 e^{3t} + c2 e^{−3t}.
Here's generally how you solve a homogeneous differential equation like this. Con-
sider an equation
a y″ + b y′ + c y = 0
Then solutions to this equation are given by e^{λt}, where λ is a root of the corresponding
characteristic polynomial
a x^2 + b x + c = 0
There are three options here: the polynomial may have two distinct real roots, one
double real root, or two (conjugate) complex roots.
The case of two distinct real roots is the one we examined above: the general
solution is y(t) = c1 e^{λ1 t} + c2 e^{λ2 t}. If there is only one real root λ, we still need a two-
dimensional space of solutions, so the general solution looks like
y(t) = c1 e^{λt} + c2 t e^{λt}. The complex roots possibility is a little more delicate, because
we need to make sure that we come up with a real solution.
Writing the roots as a ± bi, the general complex solution is c1 e^{(a+bi)t} + c2 e^{(a−bi)t}.
Using now that cos(−bt) = cos(bt) and sin(−bt) = −sin(bt),
y(t) = (c1 + c2) e^{at} cos(bt) + (c1 − c2) i · e^{at} sin(bt)
Now we use the fact that (secretly) c1, c2 ∈ C; we need that c1 − c2 ∈ iR and
c1 + c2 ∈ R. Luckily, it is possible to get any number we want using c1 = (c − di)/2 and
c2 = (c + di)/2, so that c1 + c2 = c and c1 − c2 = −di. Putting this all together, the general
solution is
y(t) = c1 e^{at} cos(bt) + c2 e^{at} sin(bt).
Problem 6.6. Solve the following initial value problem: y″ − 4y′ + 9y = 0, with
y(0) = 0 and y′(0) = 8.
Solution. The characteristic equation is x^2 − 4x + 9 = 0, so that we have roots
λ = (4 ± √(16 − 4(9)))/2 = 2 ± √5 i
Hence the general solution is
y(t) = c1 e^{2t} cos(√5 t) + c2 e^{2t} sin(√5 t)
Knowing that y(0) = 0 means that c1 = 0, as everything else cancels out. Therefore
y(t) = c e^{2t} sin(√5 t) and y′(t) = 2c e^{2t} sin(√5 t) + √5 c e^{2t} cos(√5 t)
So y′(0) = √5 · c = 8, thus c = 8/√5. Not the nicest solution, but
y(t) = (8/√5) e^{2t} sin(√5 t)
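And a quick symbolic check that this really solves the initial value problem:

```python
import sympy as sp

t = sp.symbols('t')
y = 8/sp.sqrt(5)*sp.exp(2*t)*sp.sin(sp.sqrt(5)*t)

assert sp.simplify(y.diff(t, 2) - 4*y.diff(t) + 9*y) == 0   # solves the ODE
assert y.subs(t, 0) == 0                                    # y(0) = 0
assert sp.simplify(y.diff(t).subs(t, 0) - 8) == 0           # y'(0) = 8
print("IVP verified")
```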
6.2. Nonhomogeneous differential equations. How do we deal with nonhomo-
geneous differential equations? First, solve the homogeneous one. Then we have to
guess a particular solution.
Problem 6.7. Determine a particular solution to y″ − 4y′ − 12y = 3e^{5t}.
What should yp(t) look like? Probably something of the form yp(t) = A e^{5t}, so we
then need to check which A satisfies the differential equation:
yp′(t) = 5A e^{5t},  yp″(t) = 25A e^{5t} ⟹ 25A e^{5t} − 4 · 5A e^{5t} − 12 · A e^{5t} = 3 e^{5t}
Solving this gives −7A e^{5t} = 3 e^{5t}, so A = −3/7. Putting this all together,
y(t) = c1 e^{6t} + c2 e^{−2t} − (3/7) e^{5t}
Other types of particular solutions require different guesses: sines and cosines
demand sines and cosines. What if particular solutions are polynomials? For the
equation y″ − 4y′ − 12y = t^2 + 3t + 2, the guess yp = at^2 + bt + c leads to
2a − 4(2at + b) − 12(at^2 + bt + c) = t^2 + 3t + 2
6.3. Complex analysis. Let's recall a little about the nice types of complex-valued
functions.
Theorem 6.9. Let f : Ω → C, where Ω ⊂ C is an open subset of the complex
numbers. Then the following are equivalent:
• f is differentiable in an open disc centered at a ∈ Ω (holomorphic)
• f has a convergent power series expansion ∑_{n=0}^∞ c_n (z − a)^n in an open disc
centered at a ∈ Ω (analytic)
This incredible theorem implies that differentiable functions are smooth, which is
one of our introductions to the wild world of complex analysis. There are some nice
corollaries:
Corollary 6.10. Let f, g : Ω → C be two holomorphic functions on an open con-
nected Ω ⊂ C. If f(z) = g(z) on an infinite subset S ⊂ Ω that contains a limit point
of Ω, then f = g on Ω.
Corollary 6.11. A bounded holomorphic function f : C → C must be constant.
Holomorphic functions must satisfy the Cauchy-Riemann equations, and the con-
verse is true as well.
Theorem 6.12. Let f : Ω → C be a function, and write f(x + iy) = u(x + iy) + i ·
v(x + iy). Then f is holomorphic if and only if
∂_x u = ∂_y v and ∂_y u = −∂_x v.
This theorem is pretty useful, as it means that information about the real part of
a holomorphic function can get us the whole function.
Problem 6.13. Suppose that f(x + iy) = u(x, y) + i · v(x, y). If u(x, y) = x^2 − y^2
and v(1, 1) = 2, find v(4, 1).
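One can solve this by integrating the Cauchy-Riemann equations; a sympy sketch:

```python
import sympy as sp

x, y = sp.symbols('x y')
u = x**2 - y**2

# Cauchy-Riemann: v_y = u_x and v_x = -u_y.  Integrate v_y in y first:
v = sp.integrate(sp.diff(u, x), y)            # 2*x*y, up to a function g(x)
assert sp.simplify(sp.diff(v, x) + sp.diff(u, y)) == 0   # so g is constant

C = 2 - v.subs({x: 1, y: 1})                  # pin down the constant from v(1,1) = 2
print((v + C).subs({x: 4, y: 1}))             # 8
```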
6.4. Cauchy integral formula. The last thing we should recall is the Cauchy in-
tegral formula.
Theorem 6.15 (Cauchy integral formula). Suppose that f : Ω → C is a holomorphic
function on an open domain, and let D ⊂ Ω be a closed disc in Ω. Then
f(a) = (1/2πi) ∮_{∂D} f(z)/(z − a) dz
for every a ∈ D.
This yields the residue theorem.
Theorem 6.16 (Residue theorem). Let U ⊂ ℂ be a simply connected open subset and f : U → ℂ a function holomorphic except at a ∈ U. Let γ be a closed curve in U around a, oriented counterclockwise. Then
∮_γ f(z) dz = 2πi · Res(f, a)
where Res(f, a) is the coefficient of the 1/(z − a) term in the Laurent series expansion of f(z) around a. Otherwise put, it is the number R such that
f(z) − R/(z − a)
has an analytic antiderivative in a punctured disc around a.
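Purely for intuition, the residue theorem is easy to check numerically. A minimal sketch, assuming only the standard library: integrate f(z) = e^z/z around the unit circle, where Res(f, 0) = 1, so the answer should be 2πi.

```python
# Numerically approximate a counterclockwise circle integral by a Riemann sum
# over a parametrization z = center + r*e^{it}, dz = i*r*e^{it} dt.
import cmath

def contour_integral(f, center=0j, radius=1.0, n=2000):
    total = 0j
    for k in range(n):
        t = 2 * cmath.pi * k / n
        z = center + radius * cmath.exp(1j * t)
        dz = 1j * radius * cmath.exp(1j * t) * (2 * cmath.pi / n)
        total += f(z) * dz
    return total

integral = contour_integral(lambda z: cmath.exp(z) / z)
expected = 2j * cmath.pi  # 2*pi*i * Res(f, 0), with Res(f, 0) = 1
print(abs(integral - expected))  # essentially zero
```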
I’m not sure that this has much of a place on the GRE, but the Cauchy-Riemann
equations are the jewel in the crown.
7. Day 7: Algebra
Topics covered: groups, rings, and fields.
62 IAN COLEY
7.1. Groups.
Definition 7.1. A group is a set G with a binary operation · : G × G → G that satisfies the following axioms:
• The operation is associative, so (g · h) · k = g · (h · k) for any g, h, k ∈ G
• There exists an element e ∈ G such that e · g = g · e = g for all g ∈ G
• For every element g ∈ G, there exists an element g^{-1} ∈ G so that g · g^{-1} = g^{-1} · g = e
Problem 7.2. Prove that the identity element is unique.
Problem 7.3. Prove that the inverse of an element g is unique.
These are two good exercises to get your hands on. Note that the group operation needn't be commutative! Consider an example you already know: let GL_n(F) be the set of invertible n × n matrices with entries in a field F. Then this is a group under multiplication, with inverse and identity as one would imagine. Note that M_n(F) is not a group under multiplication, as non-invertible matrices do not have inverses (obviously). However, M_n(F) is a group under addition, and it's in fact commutative, i.e. A + B = B + A for all A, B ∈ M_n(F). Commutative groups are also called abelian.
What are the kinds of functions we’re interested in?
Definition 7.4. A group homomorphism is a map of sets φ : G → H such that φ(g_1 · g_2) = φ(g_1) · φ(g_2) for all g_1, g_2 ∈ G.
Problem 7.5. Prove that φ(g^{-1}) = φ(g)^{-1} and φ(e_G) = e_H.
Definition 7.6. An isomorphism of groups is a group homomorphism that is bijec-
tive as a map of sets. In particular, the set-theoretic inverse map is automatically a
group homomorphism.
Problem 7.7. Prove it.
Theorem 7.9 (Lagrange's Theorem). Let G be a finite group and let H < G be a subgroup. Then |H| divides |G|. In particular, |g| divides |G| for every g ∈ G.
Hence when we see GRE questions about the possible orders of elements and
subgroups, this helps a lot.
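As a quick illustration (a sketch, using only the standard library): in the cyclic group ℤ/12ℤ under addition, the order of every element divides |G| = 12.

```python
# Compute the additive order of each element of Z/12Z: repeatedly add g to
# itself until we return to the identity 0, then check Lagrange's theorem.
n = 12
orders = {}
for g in range(n):
    k, x = 1, g
    while x != 0:
        x = (x + g) % n
        k += 1
    orders[g] = k

print(orders)
# Every element order divides the group order.
assert all(n % k == 0 for k in orders.values())
```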
We can talk about the subgroup generated by a number of elements g_1, …, g_m in the obvious way. It's helpful to use the fact that the intersection of subgroups is still a subgroup, and we can define
⟨g_1, …, g_m⟩ = ⋂_{g_1,…,g_m ∈ H} H,
the intersection over all subgroups H < G containing the given elements.
Similarly, we can talk about the subgroup generated by a family of subgroups:
⟨H, K⟩ = ⋂_{H,K ⊂ G'} G',
the intersection over all subgroups G' < G containing both. There is also the subset HK = {hk : h ∈ H, k ∈ K}, which satisfies |HK| = |H| · |K| / |H ∩ K|, so |HK| needs to be a divisor of |H| · |K|. In particular, |HK| = |H| · |K| if and only if H ∩ K = {e_G}.
Given a subgroup H < G, we can form the set of left cosets
G/H = {gH : g ∈ G},
where g_1 H = g_2 H if these subsets contain the same elements, i.e. there exists h ∈ H such that g_1 · h = g_2. In the case that N ◁ G is a normal subgroup, G/N actually admits a group structure: g_1 N · g_2 N = g_1 g_2 N.
There are a variety of simple groups, but the biggest class of examples is C_p for the primes p. Another choice will turn out to be A_n for n ≥ 5, which we will define below.
7.2. Examples of groups. It's probably about time to give some examples of (finite) groups. For every positive integer n ∈ ℕ, consider the set with n elements X_n = {1, …, n}, and consider bijections f : X_n → X_n. We can put a group structure on the set of all such bijections, with the operation being composition. Identity and inverses are obvious. Call this set of maps S_n, the symmetric group on n elements. Then |S_n| = n!, as one can readily check.
We think about elements in S_n using a cycle decomposition. Let n = 5 for simplicity, and consider the following function:
f(1) = 2, f(2) = 3, f(3) = 5, f(4) = 1, f(5) = 4
We write this in the following format: we start by writing (1, we then write the image of 1 to obtain (12, and so on until we get (12354). This is called a 5-cycle as it's written with 5 elements. Consider another function,
g(1) = 2, g(2) = 3, g(3) = 1, g(4) = 5, g(5) = 4
which yields the cycle decomposition (123)(45), which we call a 3-2-cycle. As a final example, consider
h(1) = 2, h(2) = 1, h(3) = 3, h(4) = 4, h(5) = 5
We could write this as (12)(3)(4)(5), but we’d rather write (12) and call it a 2-cycle.
Note that in a cycle decomposition, the elements in the cycles must be disjoint. Every element of S_n has such a cycle decomposition, unique up to reordering and rotating the cycles; it becomes truly unique if we orient each cycle to begin with its lowest number. That is,
(123)(45) = (231)(54) = (312)(45)
but the first choice is canonical.
How do we multiply cycles? Consider (12)(13). Reading left to right, this composition says 1 → 2, 2 → 1 → 3, and 3 → 1. This is the cycle (123). Consider now (13)(12). This says 1 → 3, 3 → 1 → 2, and 2 → 1. Hence this is the cycle (132). These are different! The symmetric group S_n is not commutative. Now, we can address subgroups and orders.
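This left-to-right bookkeeping can be sketched in a few lines of Python, representing permutations of {1, …, 5} as dictionaries (a convention chosen here just for illustration):

```python
# Build single cycles as dicts and compose them left to right.
def cycle(*elts, n=5):
    """The permutation given by one cycle, as a dict on {1, ..., n}."""
    perm = {i: i for i in range(1, n + 1)}
    for a, b in zip(elts, elts[1:] + elts[:1]):
        perm[a] = b
    return perm

def compose(p, q):
    """Left-to-right composition: apply p first, then q."""
    return {i: q[p[i]] for i in p}

p = compose(cycle(1, 2), cycle(1, 3))  # (12)(13)
q = compose(cycle(1, 3), cycle(1, 2))  # (13)(12)
print(p)  # 1->2, 2->3, 3->1: the cycle (123)
print(q)  # 1->3, 3->2, 2->1: the cycle (132)
```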
Another important example is the dihedral group D_2n, the symmetries of a regular n-gon, generated by a rotation σ and a flip τ. For n = 5, the rotation is given by (12345) = σ, and the flip is given by (25)(34) = τ. The group generated by these two elements has 2n elements.
We can talk about groups in terms of generators and relations. For D_2n, we write
D_2n = ⟨σ, τ : σ^n = τ^2 = 1, στστ = 1⟩
There's an explicit embedding D_2n → S_n as we can see above, but we can also think about D_2n in the abstract.
The last type of group to know is the alternating group A_n ⊂ S_n. This contains exactly half the elements, and can be described as the kernel of the map
sgn : S_n → C_2 = {±1}
which sends an m_1-m_2-…-m_k-cycle to (−1)^{m_1+···+m_k−k}. The alternating group consists of the identity element, m-cycles for odd m, 2-2-cycles, etc. There's another description of this that's not worth getting into right now.
7.3. Abelian groups. We now need to state the fundamental theorem on finitely generated abelian groups. Finitely generated is pretty obvious to define, but what's the theorem?
Theorem 7.15 (FTFGAG). Let A be a finitely generated abelian group. Then
A ≅ ℤ^r × ℤ/n_1ℤ × ℤ/n_2ℤ × ··· × ℤ/n_kℤ
where n_1 | n_2 | ··· | n_k. Alternatively,
A ≅ ℤ^r × ℤ/p_1^{α_1}ℤ × ··· × ℤ/p_ℓ^{α_ℓ}ℤ
for primes p_i and powers α_i.
Now, what the hell does any of this mean? ℤ/nℤ is the cyclic group C_n, but we think of it additively and in terms of modular arithmetic. In particular, it's the quotient of ℤ by the normal subgroup nℤ = {n · m : m ∈ ℤ}, and we think of ℤ/nℤ as generated under addition by 1. In an abelian group, all subgroups are normal, so there's no problem there. The product is the same as the product of sets, and the group operation is defined componentwise.
There are again two types of substructures that we need to consider.
Definition 7.19. A subring S ⊂ R is an abelian subgroup that is closed under multiplication. Sometimes we demand that 1 ∈ S, sometimes we don't.
Subrings aren’t even that important.
Definition 7.20. A left ideal I ⊂ R is an abelian subgroup I that is closed under left multiplication: for every a ∈ R and x ∈ I, a · x ∈ I. Similarly, we can define a right ideal and a two-sided ideal.
Ideals are important, subrings aren’t. Here’s another definition and an important
consequence.
Definition 7.21. An element a ∈ R is invertible if there exists b ∈ R such that a · b = b · a = 1.
Problem 7.22. If I ⊂ R is an ideal and a ∈ I is invertible, then I = R.
This means that in a proper ideal (i.e. I ≠ R), we can't have any invertible elements.
We can also talk about the left, right, or two-sided ideal generated by a subset of R. Finally, we can prove that if I ⊂ R is a two-sided ideal, then R/I has the structure of a ring.
Now, what are the functions?
Definition 7.24. A (unital) ring homomorphism φ : R → S is an abelian group homomorphism such that φ(1_R) = 1_S and φ(r_1 · r_2) = φ(r_1) · φ(r_2).
As an example, consider the map φ : ℤ → ℤ such that φ(n) = −n. This is a perfectly good abelian group homomorphism, but it's not a ring homomorphism. In fact, since we demand that φ(1) = 1, there is only ever one ring homomorphism φ : ℤ → R for any ring R, and it's defined by φ(1) = 1_R.
Kernels of ring homomorphisms are two-sided ideals, which is convenient, so that R/ker φ has the structure of a ring.
7.5. Modular arithmetic. Let R = ℤ and let I = nℤ. Then in the ring ℤ/nℤ, we can do mathematics. The key is that we are working with 'remainders after dividing by n'. Let n = 12. Then for two examples,
8 + 7 = 15 ≡ 3,  4 · 5 = 20 ≡ 8
We can identify which elements in ℤ/nℤ are invertible and which are zero divisors. Supposing that d is a divisor of n, we know that d · n/d = n ≡ 0. Even if d merely shares a common factor with n, say (d, n) = α > 1, it is still a zero divisor, because d · n/α ≡ 0 while n/α ≢ 0. On the other hand, if (d, n) = 1, then we know that there's a solution to the equation α · d + β · n = 1 (Bézout's identity), so that α · d ≡ 1 and d is invertible.
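The gcd criterion above is easy to tabulate; a small sketch for n = 12 using only the standard library:

```python
# Classify the nonzero elements of Z/12Z as units or zero divisors by gcd.
from math import gcd

n = 12
units = [d for d in range(1, n) if gcd(d, n) == 1]
zero_divisors = [d for d in range(1, n) if gcd(d, n) > 1]

print(units)          # [1, 5, 7, 11]
print(zero_divisors)  # [2, 3, 4, 6, 8, 9, 10]

# Every unit has a multiplicative inverse mod n...
assert all(any(d * a % n == 1 for a in range(n)) for d in units)
# ...and every zero divisor d has a nonzero a with d * a = 0 mod n.
assert all(any(d * a % n == 0 for a in range(1, n)) for d in zero_divisors)
```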
7.6. Fields. There's a special situation that we can see immediately. Suppose that n = p is a prime. Then every nonzero d ∈ ℤ/pℤ is coprime to p, so that every nonzero element of ℤ/pℤ is invertible. A commutative ring in which every nonzero element is invertible is called a field.
But wait, we already know a lot of fields. We know ℚ, ℝ, ℂ, and others! Well surprise, there are also finite fields. Field homomorphisms are just ring homomorphisms, but with a twist: let's examine φ : F → K for two fields F, K. We know that ker φ ⊂ F is a two-sided ideal. But since every nonzero element in F is invertible, we know that either ker φ = F or ker φ = {0}. Since φ(1_F) = 1_K, we know that ker φ can't be everything. Thus ker φ = {0} and all field homomorphisms are injective.
A special case is that of field automorphisms. It’s pretty hard to find field auto-
morphisms sometimes. This is the realm of Galois theory, which isn’t particularly
covered on the GRE. As a special observation, there are no nontrivial automorphisms of ℚ or F_p (which is ℤ/pℤ when it has its field clothes on).
Problem 7.25. Prove that if φ : ℚ → ℚ is a field homomorphism (so φ(0) = 0 and φ(1) = 1), then φ = id_ℚ.
We can similarly define the infimum of S by inf S = −sup(−S). The supremum is also called the least upper bound and the infimum the greatest lower bound. There's a nice feature of the real numbers that's related to their completeness (see below).
Theorem 8.2. Every nonempty subset of the real numbers that is bounded above has a supremum.
As another random note, here’s a theorem.
Theorem 8.3. Suppose that {x_n} is an increasing sequence, i.e. x_n ≤ x_{n+1} for all n. If the set of values S = {x_n} is bounded above, then
lim_{n→∞} x_n = sup S.
8.2. Metric spaces. Everything we say is going to work in an arbitrary (complete) metric space, so let's go ahead and do that definition first.
Definition 8.4. Let X be a set. A metric on X is a function d : X × X → ℝ_{≥0} such that
• d(x, y) = 0 if and only if x = y
• d(x, y) = d(y, x) for all x, y ∈ X
• d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X (the triangle inequality)
This is strictly stronger than continuity: it says that the same δ can be used at any point in the domain, not just at the particular point a you want. That's why we don't talk about 'f(x) is uniformly continuous at x = a'.
There’s an upgrade to this whose definition bears mentioning.
Definition 8.6. A function f : X → Y is called Lipschitz continuous if there exists a constant K > 0 such that d_Y(f(x_1), f(x_2)) ≤ K · d_X(x_1, x_2) for all x_1, x_2 ∈ X.
In particular, the choice δ = ε/K shows that Lipschitz continuous implies uniformly continuous (implies continuous).
There’s a nice way to conclude a function is uniformly continuous.
Problem 8.7 (Heine-Cantor Theorem). If X is a compact metric space and f : X → Y is continuous, then it is also uniformly continuous.
Solution. Recall that one version of compactness (we'll re-recall it later) is that every open cover of X admits a finite subcover. Since f is continuous, let's define the sets U_x for all x ∈ X as follows:
U_x = {x' ∈ X : d_Y(f(x), f(x')) < ε/2}
In other words, it's the set of points around x that satisfies the conditions of uniform continuity for a slightly smaller epsilon. We can then define B_x to be the biggest open ball B(δ_x, x) ⊂ U_x. For one more refinement, consider B'_x = B(δ_x/2, x), the ball with half the maximal radius. The collection {B'_x} is (obviously) an open cover of X, so there's some finite collection x_1, …, x_n such that the B'_{x_i} = B(δ_i/2, x_i) cover X.
Consider now δ = (1/2) min δ_i. This is a positive number because we are taking a minimum (instead of, say, an infimum). Moreover, take any z_1, z_2 ∈ X with d_X(z_1, z_2) < δ. Without loss of generality, we have z_1 ∈ B'_{x_1}. Then
d(z_2, x_1) ≤ d(z_2, z_1) + d(z_1, x_1) < δ_1/2 + δ_1/2 = δ_1
and this limit diverges. The case of uniform continuity ensures that we don’t run
into this problem.
There’s also absolute continuity, but I don’t think we need to recall that.
8.3. Convergence of functions. We can now begin to talk about convergence of functions. We will restrict our attention to Y = ℝ and X ⊂ ℝ because we will care about completeness. Let's recall that briefly.
Definition 8.9. A sequence {x_n} in a metric space X is called Cauchy if for all ε > 0, there exists N ∈ ℕ such that d_X(x_m, x_n) < ε whenever n, m > N.
The terms of a Cauchy sequence get arbitrarily close together. This is related to the sequence converging to some limit.
Problem 8.10. Prove that if lim_{n→∞} x_n = x exists, then {x_n} is Cauchy.
The converse is not necessarily true. Consider the metric space ℚ and the sequence
1, 1.4, 1.41, 1.414, …
given by the truncations of the infinite decimal √2. The limit is, by design, not in ℚ; however, this sequence is Cauchy, as |x_n − x_{n+1}| < 10^{−n+1} gets arbitrarily small.
We therefore get our definition:
assuming all these limits exist. In this case, we say that {fn } converges to f pointwise.
MATH GRE BOOTCAMP: LECTURE NOTES 71
This presents our conundrum. Each of the functions f_n(x) is continuous, but their pointwise limit is not! We need to introduce a more refined version of convergence that takes into account that we have an entire function, not just a series of points.
N ∈ ℕ such that
|f_n(x) − f(x)| < ε for all x and all n > N.
That is, the pointwise limits are all getting close to the limit function f(x) simultaneously.
We can see how this should be generalised to general metric spaces. Note that since ℝ is complete, we could also demand that the sequence {f_n} is uniformly Cauchy rather than uniformly convergent, which is sometimes easier.
The point is this:
Theorem 8.14. Suppose that {f_n} is a sequence of continuous functions that converge uniformly to f. Then f is also continuous.
This must mean that f_n(x) = x^n does not converge uniformly to the limit function. To see this, fix 1 > ε > 0 and any N ∈ ℕ. We will show that there exists x ∈ [0, 1] such that |f_N(x) − f(x)| > ε. Specifically, we are going to choose an x ∈ (0, 1), so we just need to prove that x^N > ε. But this is easy: take any 1 > δ > ε and let x = δ^{1/N}.
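A numerical sketch of this failure: the choice x = δ^{1/N} keeps f_N a fixed distance above ε no matter how large N is.

```python
# For f_N(x) = x^N with pointwise limit 0 on (0, 1): picking x = delta**(1/N)
# gives x**N = delta > eps for every N, so the sup-distance never shrinks.
eps, delta = 0.5, 0.9
for N in (1, 10, 100, 1000):
    x = delta ** (1.0 / N)
    assert 0 < x < 1 and x**N > eps  # |f_N(x) - f(x)| = x^N > eps
    print(N, x)
```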
Can we upgrade this theorem in the case that we know that {fn } are also uniformly
continuous? Yes. Leave it at that.
8.4. Integrals. Now let's address the issue of integrals. Suppose that we have a sequence of integrable functions f_n : X → ℝ where, again, we will consider X ⊂ ℝ^n (and most likely n = 1). Suppose further that f_n → f pointwise. Does it follow that f is integrable? In particular, do we have an equality
lim_{n→∞} ∫_X f_n(x) dx = ∫_X f(x) dx?
∫_0^1 0 dx ≠ 1
Thus, something has gone wrong. One specific thing to notice: the functions {f_n} are not uniformly bounded by any constant. Consequently, {f_n} does not converge to the zero function uniformly. We have two theorems that give us the means to commute the integral and the limit.
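One standard source of this failure (the specific example used above is not fully visible here, so this is an assumption) is the moving spike f_n = n on (0, 1/n) and 0 elsewhere: f_n → 0 pointwise, yet each integral is 1. A numeric sketch:

```python
# The spike f_n = n * 1_{(0, 1/n)}: pointwise limit 0, but every integral is 1,
# so lim ∫ f_n = 1 while ∫ lim f_n = 0.
def f(n, x):
    return n if 0 < x < 1.0 / n else 0.0

def integral(n, steps=100000):
    """Midpoint rule for the integral of f_n over [0, 1]."""
    h = 1.0 / steps
    return sum(f(n, (k + 0.5) * h) for k in range(steps)) * h

for n in (1, 10, 100):
    print(n, integral(n))  # each approximately 1

# At any fixed x > 0 the pointwise limit is 0 once n > 1/x.
assert f(1000, 0.01) == 0.0
```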
Theorem 8.15 (Uniform convergence theorem). If f_n → f uniformly as functions [a, b] → ℝ and all f_n are integrable, then
lim_{n→∞} ∫_a^b f_n(x) dx = ∫_a^b f(x) dx
I’m trying to avoid measure theory in the description of this theorem, and I think
we don’t need it.
• If U_1, …, U_n is a finite collection of elements in T, then so is ⋂_{i=1}^n U_i
The set T is called a topology on X. The sets U ∈ T are called open sets. A set V such that V^c ∈ T is called closed. Note that you could also define a topology using closed sets and a dual set of axioms.
Every subset S ⊂ X has an interior and a closure. The interior S° is the union of all open sets U ⊂ S, and the closure S̄ is the intersection of all closed sets C with S ⊂ C.
There are always two topologies on any set X, namely the maximal choice T = P(X) called the discrete topology and the minimal choice {∅, X} called the indiscrete topology (which I think is a joke).
Something that we might care about is when two topologies are the same, i.e. when they have exactly the same open sets. Well, usually a topology is defined using a generating set, in the following sense:
Definition 8.18. A subset B ⊂ T is called a base of the topology T on X if:
• ⋃_{U∈B} U = X
• For every U_1, U_2 ∈ B and every x ∈ U_1 ∩ U_2, there exists a U_3 ∈ B with x ∈ U_3 ⊂ U_1 ∩ U_2
Problem 8.19. Suppose that B_1 is a base of T_1 and B_2 a base of T_2 on a set X. Suppose further that for every U_2 ∈ B_2 and every x ∈ U_2, there exists U_1 ∈ B_1 such that x ∈ U_1 ⊂ U_2, and vice versa. Then T_1 = T_2.
If we have that T_1 ⊂ T_2, we say that T_1 is coarser than T_2, or that T_2 is finer than T_1. In linguistic terms, having more open sets makes the topology on X finer, like a finer grain. We can also check this on bases.
8.6. Separation axioms. There is an increasing list of axioms that make topological spaces more and more nice. We'll give a list and examples.
A topological space X is T_0 if for every two distinct points x, y ∈ X, there exists an open set U containing one of them but not the other. That is, all points are topologically distinguishable.
Theorem 8.21 (Urysohn's theorem). Let X be a topological space. Then X is separable and metrizable (i.e. admits a metric that generates its topology) if and only if it is regular, Hausdorff, and second-countable.
We’re missing some of these words. A topological space is separable if it admits
a countable dense subset, i.e. there’s a countable set S ⇢ X such that S = X. A
topological space is second-countable if it admits a countable base.
8.7. Continuity. What are the functions we care about?
Definition 8.22. Let X, Y be two topological spaces. Then a set map f : X → Y is called continuous if f^{-1}(V) is open for any open V ⊂ Y. Equivalently, if f^{-1}(C) is closed for every closed C ⊂ Y.
Problem 8.23. Prove that if X, Y are metric spaces endowed with the metric topol-
ogy, this is the same definition as the usual one.
Definition 8.24. A map f : X ! Y is called a homeomorphism if it is continuous,
bijective, and moreover sends open sets to open sets.
Without this last condition, there’s no guarantee that the set-theoretic inverse is
a continuous function.
Now, what kind of sets are there besides open and closed?
Definition 8.25. A set Z ⊂ X is called disconnected if there exist open sets U, V ⊂ X, each meeting Z, such that Z ⊂ U ∪ V and U ∩ V ∩ Z = ∅. A set that is not disconnected is called connected.
Definition 8.26. A set Z ⊂ X is called path-connected if for every a, b ∈ Z, there exists a continuous function γ : [0, 1] → Z such that γ(0) = a and γ(1) = b.
Problem 8.27. Prove that every path-connected set is connected.
The converse is not true.
Theorem 8.30 (Heine-Borel Theorem). A set K ⊂ ℝ^n is compact if and only if it is closed and bounded.
There are a couple extra things one should prove now.
Problem 8.31. The image of a connected set under a continuous function is con-
nected. The image of a compact set under a continuous function is compact.
I think that’s about everything.
9. Day 9: Miscellaneous
Topics covered: probability and combinatorics, statistics, geometry, set theory,
logic, graph theory, algorithms.
9.2. Probability via area. We can imagine a probability space Ω and events A and B being subsets of Ω, such that the area/volume of Ω is 1 and hence P(A) is given by the volume or area of A. Then the probability P(A and B) is just given by the area of the intersection of these regions, and similarly for P(A or B).
When you’re trying to compare continuous random variables x, y, z 2 [0, 1] (for
instance), the volume approach is very useful, as we’ve seen.
Problem 9.1. If x, y are randomly chosen in [0, 1], what is the probability that x ≥ 2y?
Solution. We can picture this as the double integral where y ∈ [0, 1] and x ∈ [2y, 1]. Except that this doesn't make total sense, because 2y > 1 when y > 1/2, so we really have to integrate y ∈ [0, 1/2].
∫_0^{1/2} ∫_{2y}^1 1 dx dy = ∫_0^{1/2} (1 − 2y) dy = (1/2) − (1/2)^2 = 1/4
We can also do this via drawing the picture and computing the area of the triangle.
The same works in 3d.
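A Monte Carlo sanity check of the answer 1/4 (a sketch; the seed is arbitrary):

```python
# Sample (x, y) uniformly from the unit square and estimate P(x >= 2y).
import random

random.seed(0)
trials = 200000
hits = sum(1 for _ in range(trials)
           if random.random() >= 2 * random.random())
print(hits / trials)  # roughly 0.25
```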
9.3. General probability. Conditional probability:
P(A|B) = P(A and B) / P(B).
Read 'probability of A given B'.
We say that A and B are independent if P(A and B) = P(A) · P(B). Equivalently, P(A|B) = P(A).
There’s a nice way to swap the order of conditional probability, called Bayes’
re
theorem.
Theorem 9.2.
P (B|A)P (A)
P (A|B) =
og
P (B)
In action:
Problem 9.3. Consider drawing two cards from a deck. Compute the probability
of drawing a spade first given that you drew a spade second.
Solution. Let A be the event that the first card is a spade and B the event that the second is. We can work out the probability P(B|A) explicitly: there are 51 cards left and 12 spades, so P(B|A) = 12/51. We can also compute P(A) and P(B): P(A) = P(B) = 1/4. We can see that P(B) = 1/4 by noting that it's just taking a random card from the deck. Thus P(A|B) = P(B|A) · P(A)/P(B) = P(B|A) = 12/51.
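We can verify 12/51 by brute force over all ordered pairs of distinct cards; a sketch where suit 0 plays the role of spades:

```python
# Enumerate all ordered (first, second) pairs of distinct cards and compute
# P(first is a spade | second is a spade) exactly.
from fractions import Fraction
from itertools import permutations

deck = [(suit, rank) for suit in range(4) for rank in range(13)]
pairs = list(permutations(deck, 2))  # 52 * 51 ordered pairs

second_spade = [p for p in pairs if p[1][0] == 0]
both_spades = [p for p in second_spade if p[0][0] == 0]

prob = Fraction(len(both_spades), len(second_spade))
print(prob)  # 12/51, i.e. 4/17 in lowest terms
```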
Now these have been discrete probabilities here, but what about continuous ones?
Definition 9.4. A probability distribution function for a random variable X is a nonnegative integrable function f : ℝ → ℝ such that
∫_{−∞}^{∞} f(x) dx = 1
9.4. Statistics. The expected value of a discrete random variable X (taking values in ℝ) is
E(X) = ∑_{A∈X} P(A) · A
For a continuous random variable with pdf f(x), it is
E(X) = ∫_{−∞}^{∞} x · f(x) dx
The variance can be calculated as E(X^2) − E(X)^2. Specifically, it is
∑_{A∈X} P(A)(A − E(X))^2  or  ∫_{−∞}^{∞} (x − E(X))^2 · f(x) dx
in the discrete and continuous cases respectively.
What do we know about standard deviation? Well, suppose we have a normal distribution. That's basically none of the above examples, but we'll get to that soon. Within ±1 standard deviation of the expected value (or the mean) is 68% of the distribution, within ±2 is 95%, and within ±3 is 99.7%.
When can we expect a variable to be normally distributed? For example, Bernoulli trials. Suppose we have an event with probability p and we perform n trials. Then the expected number of successful trials is n · p. If we repeat this situation a bunch of times, we can look at the number of trials that were actually successful. The variance of this distribution is n · p · (1 − p), and so the standard deviation is the square root of this.
Problem 9.5. Suppose we roll a 20-sided die 400 times. Consider the probability of
rolling a prime number. What is the expected number of successes and what is the
standard deviation?
Solution. The primes on the die are 2, 3, 5, 7, 11, 13, 17, 19, so p = 8/20 = 2/5; go from there.
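Carrying the arithmetic through as a quick sketch:

```python
# 8 of the 20 faces (2, 3, 5, 7, 11, 13, 17, 19) are prime, so p = 8/20.
from math import sqrt

n, p = 400, 8 / 20
expected = n * p                 # 160.0 successes on average
std_dev = sqrt(n * p * (1 - p))  # sqrt(96), a bit under 10
print(expected, std_dev)
```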
9.5. Geometry. Triangles: let's start there. There's the law of sines and the law of cosines, which I can recall, but there's also Heron's formula for the area of a triangle: if the sides are a, b, c, then
A = √(s(s − a)(s − b)(s − c))
where s = (a + b + c)/2 is the semiperimeter.
This is pretty nice when you know that the triangle is equilateral, and then the area becomes x^2√3/4, where x is the side length.
The measure of an interior angle of a regular n-gon is 180(n − 2)/n degrees. That can be useful.
You can also try to find the area of an inscribed polygon or a circumscribed polygon
around a circle. We can do the example of a hexagon and see where it goes from
there.
Problem 9.6. Do that, I think I can wing it. Start with the unit circle and go from
there.
9.6. Set theory. We’ve discussed the issue of cardinality. If you take the power set,
the cardinality strictly goes up. We can do some discussion of countability though.
ℵ₀ is the cardinality of ℕ. We say that a set X is countable if there exists a surjective function ℕ → X or an injective function X → ℕ. A countable union of countable sets is still countable, and a finite product of countable sets is countable, but a countable product of countable sets is definitely not countable anymore.
In particular, consider the set X = {0, …, 9} and take the infinite product ∏_ℤ X. Then if we interpret an element (x_n) of this product as the decimal expansion ∑ x_n · 10^n for n ∈ ℤ, we get (roughly) ℝ, as long as we make sure that the nonzero digits are bounded above, but not necessarily bounded below. We just have to throw away the elements that aren't bounded above, and what remains is still the uncountable set ℝ. Great!
9.7. Graph theory. Graphs are made of edges and vertices. A cycle is a cycle.
Sometimes graphs are directed, sometimes they aren’t. I guess that’s about it.
9.8. Algorithms. Learn some Python? If you don’t know any computer science,
it’s a bit tricky. I guess just try to treat the algorithm like a proof with input.