

Topic List for the GRE Math Subject Test

This is a list of all the topics that I think could conceivably
show up on a GRE Math Subject Test. I compiled this list
based on my own experiences with the test and what kinds
of questions the ETS seems to like to ask (without giving
away question specifics, of course!) I deliberately meant
for it to be somewhat overkill: the more prepared you can be,
the better, I'd say!

Note that none of these topics are guaranteed to be on the
test. (After all, there's only so much you can fit in 66 questions
anyway!) Even so, the idea is that this should give you a
road map for what you might need to brush up on. I've even
deliberately included a number of potential "stretch topics"
just to try to cover as many bases as possible; these are
marked with an asterisk (*).

I hope you find this list useful! Good luck!

—Bill
(P.S.: I also offer personalized tutoring for the GRE Math Subject Test. Contact me via my website,
https://www.mathsub.com, if you'd like help reviewing these topics as well as learning some of the tips
and tricks I recommend to make them easier!)

Last updated: June 14, 2020


Topic List for the GRE Math Subject Test

Precalculus
Basic Geometry
Parallel and perpendicular lines, transversals
Congruence
Centers of triangles (circumcenter, incenter, orthocenter, centroid)
Triangle inequality
Properties of polygons (angle sum, interior angle measure, etc.)
Similarity and proportionality
Angle bisector theorem for triangles
Pythagorean theorem
Right triangle trigonometry
Circles (arcs, chords, inscribed angles, tangents, secants, power of a point, etc.)
Cyclic quadrilaterals*
AM-GM inequality*
Areas and perimeters of triangles (multiple formulas!), quadrilaterals (multiple
formulas!), circles, sectors, ellipses
Volumes and surface areas of cubes, cylinders/prisms, cones/pyramids, spheres,
ellipsoids
Composite figures and shaded regions
Coordinate geometry (distance, midpoints, etc.)

Basic Algebra
Basics of functions (domain, range, intervals of increase, end behavior, etc.)
Algebra of functions
Inverse functions
Cyclic functions*
Functional equations*
Even and odd functions
Graphs of equations and transformations
Solving equations and inequalities (inverse functions, factoring, completing the square,
looking at the graph, etc.)
Lines and linear functions
Piecewise functions
Absolute value function
Floor and ceiling functions
Max and min functions


Algebraic functions
Quadratic functions, quadratic formula, discriminant
Graphs of polynomials
Binomial theorem and Pascal's triangle
Factoring and zero-finding techniques (grouping, polynomial division, rational root
theorem, Descartes' Rule of Signs, Vieta's formulas, etc.)
Fundamental theorem of algebra
Rational functions (asymptotes, holes, etc.)
Radical functions
Transforming radicals (rationalizing fractions, radical conjugates, nested radicals)

Transcendental functions
Exponential functions and exponent laws
Logarithmic functions and logarithm laws
Exponential and logarithmic applications and models (growth/decay, Gaussian curves,
financial applications, etc.)
Trigonometric and inverse trigonometric functions
Circular and harmonic motion
Laws of sines and cosines
Trigonometric identities (reciprocal, quotient, Pythagorean, sum/difference, double/half
angle, product/sum)
Combined sinusoids*
Hyperbolic functions

Analytic Geometry
Polar coordinates
Graphs of polar equations (rose curves, limaçons, lemniscates, etc.)
Plane curves and parametric equations
Loci in the plane
Conic sections (circles, ellipses, parabolas, hyperbolas) and their anatomy (foci,
eccentricity, etc.)
Transformations of conics (shifting, scaling, rotating)
Polar equations of conic sections

Sequences and series


Sequences and summation notation
Explicit and recursive formulas
Factorials
Arithmetic and geometric sequences and summations


Single-Variable Calculus


Limits
Concept and definition
Continuity
Intermediate Value Theorem and Extreme Value Theorem
Involving infinity
L'Hôpital's Rule
Limits of explicit and recursive sequences

Differentiation
Tangent lines
Concept and limit definition
Differentiability and continuity
Linearity rules
Product and quotient rules
Chain rule
Higher order derivatives
Derivatives of elementary functions
Velocity and acceleration
Implicit differentiation
Relative extrema
Increasing and decreasing functions
Concavity and points of inflection
Curve sketching
Mean Value Theorem
Optimization
Related rates
Logarithmic differentiation
Parametric and polar derivatives

Integration
Antiderivatives
Definite integrals
Average value and the Mean Value Theorem
Fundamental Theorem of Calculus
Leibniz's Rule
𝑢-substitution


Area between curves


Volume: slicing, disks, washers, shells
Arc length and surface area
Integration by parts
Trig substitution
Reduction formulas
Partial fractions
Improper integrals
Parametric and polar integrals, area, arc length, etc.

Series
Infinite series
Geometric series
Telescoping series
Integral test
Comparison test and limit comparison test
Alternating series and absolute convergence
Ratio test and root test
Power series
Taylor and Maclaurin series
Common power series
Remainders of Taylor series and Lagrange error bound


Multivariable Calculus
Vectors
Vectors in 2D and 3D, rectangular and polar form
Dot and cross product
Lines and planes in space
Quadric surfaces
Vector-valued functions
Vector calculus
The Frenet frame, curvature, and torsion*
Tangent and normal vectors
Parametric surfaces

Multivariable functions
Limits and continuity of multivariable functions
Partial derivatives and differentiability
Tangent planes
Multivariable chain rule
Gradients
Directional derivatives
Classifying critical points
Lagrange multipliers
Double and triple integrals
Fubini's Theorem
Area, volume, and centroids
Cylindrical and spherical coordinates
Jacobians

Vector Calculus
Vector fields
Line integrals
Independence of path and conservative vector fields
Green's Theorem
Curl and divergence
Surface integrals and flux
Divergence theorem
Stokes' Theorem*


Differential Equations
First-Order ODEs
Initial value problems
Slope fields
Autonomous ODEs
Equilibrium solutions
Separable equations
Linear equations and integrating factors
Exact equations
Inexact equations and integrating factors*
Solutions by substitutions
Modeling with first-order ODEs

Higher-Order ODEs
Reduction of order
Homogeneous linear equations with constant coefficients
Undetermined coefficients
Variation of parameters*
Cauchy-Euler equations*
Systems of linear ODEs
Basic modeling with higher-order ODEs
Relationships to linear algebra

Stretch topics*
Laplace transform
Fourier series
Partial differential equations


Linear Algebra
Linear equations and matrices
Systems of linear equations
Systems of inequalities and linear programming
Row reduction and echelon forms
Vector and matrix equations
Solution sets of linear systems
Linear independence
Linear transformations
Matrix algebra
Transformation matrices (rotation, dilation, shear, etc.)
Inverse and invertible matrices
Triangular and diagonal matrices
Partitioned and block matrices*
Matrix factorizations
Determinants
Cramer's Rule

Vector spaces
Definitions
Subspaces
Null space and column space
Bases
Coordinate systems
Dimension
Rank and nullity
Change of basis
Inner product, length, and orthogonality
Cauchy-Schwarz inequality*
Orthogonal projection
Other examples of vector spaces (polynomials, functions, etc.) and their linear operators

Eigenvalues etc.
Eigenvectors and eigenvalues
Trace
The characteristic equation
Cayley-Hamilton Theorem


Diagonalization
Minimal polynomials
Nilpotent and idempotent matrices
Invariant subspaces and direct sums*
Jordan normal form*
Matrix exponentials*
Gram-Schmidt orthogonalization*
Least squares solutions*
Quadratic forms*


Number Theory
Divisibility
Division algorithm
GCD and LCM
Euclidean algorithm
Diophantine equations
Fundamental theorem of arithmetic
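The division and Euclidean algorithms listed above can be sketched in a few lines of Python (this snippet and its names are ours, added for illustration; it is not part of the original list):

```python
# Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b) until the
# remainder is 0; the last nonzero value is gcd(a, b). The lcm then follows
# from the identity gcd(a, b) * lcm(a, b) = |a * b|.

def gcd(a, b):
    while b:
        a, b = b, a % b   # division algorithm: a = qb + r with 0 <= r < |b|
    return abs(a)

def lcm(a, b):
    return abs(a * b) // gcd(a, b) if a and b else 0

print(gcd(252, 198), lcm(252, 198))  # 18 2772
```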

Modular arithmetic
Basic properties of congruence
Base 𝑏 representations
Divisibility tricks
Chinese remainder theorem*
Fermat's Little Theorem
Wilson's Theorem

Number-theoretic functions
Sum and number of divisors
Euler's phi function
Euler's theorem

Other topics
Order of an integer mod 𝑚
Primitive roots
Quadratic residues*


Abstract Algebra
Groups
Definitions and properties of groups
Dihedral groups
Cyclic groups
Subgroups
Permutation groups (including permutation notations)
Isomorphisms
Cosets
Lagrange's theorem
Cayley's theorem
Sylow's first theorem*
Direct products
Normal subgroups
Quotient groups
Homomorphisms
First isomorphism theorem
Fundamental theorem of finite abelian groups
Conjugacy classes
Automorphism groups*
Group-like structures: semigroups, monoids*

Rings
Definitions and properties of rings
Examples
Subrings
Integral domains
Fields
Characteristic of a ring
Ideals
Quotient rings
Prime and maximal ideals
Ring homomorphisms
Polynomial rings
Polynomial (ir)reducibility tests*
Other special types of rings (PID, UFD, Boolean, local, Noetherian, Artinian, etc.)*


Fields
Field extensions and their relation to vector spaces
Classifications and structure of finite fields
Splitting fields*
Constructible numbers*

Modules*
(Technically mentioned on the syllabus, but this is definitely a stretch topic all unto itself.)

Definitions and basic examples


Spanning sets and linear independence
Free modules
Invariant basis number condition
Quotient modules


Discrete Math
Set Theory
Sets and set operations
Venn diagrams
Operations and relations
Equivalence and order relations
Functions: injections, surjections, bijections
Function composition
Images and preimages
Cardinality, cardinal numbers, and countability
Cantor-Schröder-Bernstein theorem

Logic
Propositional logic and truth tables
Propositional equivalences
Predicates and quantifiers
Proof techniques and proof validity
Induction

Algorithms
Basics of pseudocode (input/output, assignment, branching, looping, etc.)
Growth of functions
Runtime complexity
Well-known algorithms
Recursion*

Combinatorics
Basics of counting
Pigeonhole principle
Binomial and multinomial coefficients
Permutations and combinations
Circular permutations
Complements and the Inclusion-Exclusion Principle
Generating functions*
Partitions*


Graph Theory
Graph terminology
Graph isomorphism
Connectivity
Adjacency matrices
Euler and Hamilton paths
Trees
Tree traversals*
Spanning trees
Planar graphs
Graph coloring*


Probability and Statistics
Probability
Basic concepts and properties
Probability and combinatorics
Probability and geometry
Conditional probability
Joint probability
Independent events
Bayes' Theorem

Random Variables
Probability mass/density functions (PMF/PDF) and cumulative distribution functions
(CDF)
Expectation
Mean, variance, standard deviation
Moment generating functions (MGFs)
Discrete distributions (uniform, Bernoulli, binomial, geometric, Poisson, etc.)
Continuous distributions (uniform, exponential, normal, 𝑡, 𝜒²)
Normal approximation of binomial distribution
Empirical Rule
Functions of random variables

Statistics
Mean, median, quartiles, range, percentiles
𝑧-scores
Linear regression and correlation
Linearization of nonlinear models
Sampling distributions of sample means and proportions
Point estimation
Biased and unbiased estimators
Confidence intervals*
Hypothesis testing*


Topology
Topological spaces
Point-set topology and open sets
Examples (standard, indiscrete, discrete, lower limit, cofinite, etc.)
Basis for a topology
Closed sets
Interior, exterior, boundary
Limit points, derived set, closure
Subspace topology
Product topology
Quotient topology and gluing
Geometric examples

Properties of spaces and functions


Open and closed maps
Continuous functions
Homeomorphisms
Hausdorff property
Connectedness
Path connectedness
Open coverings
Compactness


Real Analysis
Properties of real numbers
Supremum/infimum and the completeness property
Density property
Archimedean property

Sequences
Convergence and limits of sequences
Limit superior and limit inferior
Bolzano-Weierstrass theorem
Cauchy sequences

Functions
Limits of functions
Continuous functions
Sequential limits and continuity
Uniform continuity and continuous extensions
Other types of continuity (Lipschitz, Hölder, absolute)*
Bounded variation*
Differentiability classes
Riemann and Darboux integrability
Sequences of functions
Pointwise and uniform convergence
Interchange of limits
Series of functions

Metric spaces
Metric definitions and examples (Euclidean, taxicab, max, ℓ^p/𝐿^p, etc.)
Complete metrics
Topological concepts in metric spaces: compactness, connectedness, continuity
Heine-Borel Theorem

Lebesgue measure*
(used to be on the test syllabus in 1997, likely a stretch topic now)

Lebesgue measure and measurability


Basic Lebesgue integration


Complex Analysis
Basics of complex numbers
Definitions
Complex plane
Complex conjugation
Polar form
Powers and roots
Loci and regions in the complex plane

Complex functions
Functions and mappings
Linear and fractional-linear mappings
Inversive geometry*
Power functions
Limits and continuity
Differentiability and analyticity
Cauchy-Riemann equations
Harmonic functions
Elementary functions extended to the complex plane
Conformal mappings*
The Riemann mapping theorem*

Complex integration
Contour integrals
Cauchy-Goursat theorem
Maximum Modulus Theorem
ML Inequality
Independence of path
Cauchy's integral formula
Laurent series
Residues and the residue theorem


Numerical Analysis
(Again, it's mentioned on the syllabus, so may as well be prepared.)

Approximation of functions
Basic estimation techniques
Tangent line approximation
Linear interpolation
Euler's method
Taylor polynomial approximations
Error
Order of approximations*
Lagrange interpolating polynomials*

Root finding
Bisection method
Fixed points and contraction mapping theorem*
Newton-Raphson method
Secant method
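Two of the root-finding methods named above, sketched in Python (the implementations and names are ours, included only as a refresher; both drive f(x) = x² − 2 toward the root √2):

```python
# Bisection: repeatedly halve an interval on which f changes sign.
# Newton-Raphson: follow the tangent line at the current guess to zero.

def bisect(f, lo, hi, tol=1e-10):
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) * f(hi) < 0 else (lo, mid)
    return (lo + hi) / 2

def newton(f, fprime, x0, steps=20):
    for _ in range(steps):
        x0 = x0 - f(x0) / fprime(x0)   # tangent-line update
    return x0

def f(x):
    return x * x - 2

print(bisect(f, 1, 2), newton(f, lambda x: 2 * x, 1.0))  # both approach sqrt(2)
```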

Calculus methods
Finite differences
Numerical differentiation
Riemann sums
Trapezoidal sums
Simpson's rule

Christian Parkinson GRE Prep: Calculus I Notes 1

Week 1: Calculus I
Notes
The most fundamental definition in Calculus is that of the limit.

Definition 1 (Limit). Let f : ℝ → ℝ and let x₀ ∈ ℝ. We say that the limit of f(x)
as x approaches x₀ is L if for all ε > 0, there is δ > 0 such that 0 < |x − x₀| < δ ⟹
|f(x) − L| < ε [note: there may be no such L, in which case we say the limit does not exist].
When we can find such L, we write

    lim_{x→x₀} f(x) = L.

If instead we want to take the limit at ±∞, we need to modify the definition a bit. We
say that the limit of f(x) as x approaches ∞ is L if for all ε > 0, there is M > 0 such that
x > M ⟹ |f(x) − L| < ε (and likewise for −∞ with some of the signs changed), whence
we write

    lim_{x→∞} f(x) = L.

From this definition, it is easy to ascertain that

    lim_{x→2} x² = 4,   or   lim_{x→∞} x/(x + 5) = 1.

There is one theorem for computing limits that can be helpful on the math subject GRE.

Theorem 2 (Squeeze Theorem). Suppose that f, g, h : ℝ → ℝ are such that
f(x) ≤ g(x) ≤ h(x) for all x in a neighborhood of some x₀ ∈ ℝ. Further suppose that

    lim_{x→x₀} f(x) = lim_{x→x₀} h(x) = L.

Then lim_{x→x₀} g(x) = L. [The same holds when x₀ is replaced with ±∞.]

This theorem helps us calculate limits like lim_{x→∞} sin(x)/x.
Example 3. Note that we require |f(x) − L| < ε for 0 < |x − x₀| < δ; the lower bound of
0 is important. Consider f(x) = 1 for x ≠ 0 and f(0) = 2. By tracing the graph, it is clear
that we should have

    lim_{x→0} f(x) = 1.

However, if we remove the lower bound of 0 in the definition, then we cannot prove that the
limit is indeed 1.

We can define continuity in a similar way.


Definition 4 (Continuity). We say that a function f : ℝ → ℝ is continuous at x₀ ∈ ℝ iff
for each ε > 0, there is δ > 0 such that |x − x₀| < δ implies |f(x) − f(x₀)| < ε. Informally,
f is continuous at x₀ if, when we move a little bit away from x₀, f(x) doesn't change much
from f(x₀). Equivalently, f is continuous at x₀ iff

    lim_{x→x₀} f(x) = f(x₀).

We say that f is continuous in some domain A ⊆ ℝ iff f is continuous at each point x₀ ∈ A.

With these definitions, we can state some properties of limits and continuous functions.

Proposition 5 (Rules for limits). Suppose that f, g : ℝ → ℝ and x₀ ∈ ℝ is such that
lim_{x→x₀} f(x) and lim_{x→x₀} g(x) exist. Then

(a) lim_{x→x₀} (f(x) ± g(x)) = lim_{x→x₀} f(x) ± lim_{x→x₀} g(x).

(b) lim_{x→x₀} (f(x)g(x)) = (lim_{x→x₀} f(x)) (lim_{x→x₀} g(x)).

(c) lim_{x→x₀} f(x)/g(x) = (lim_{x→x₀} f(x)) / (lim_{x→x₀} g(x)), provided that lim_{x→x₀} g(x) ≠ 0.

(d) lim_{x→x₀} [αf(x)] = α lim_{x→x₀} f(x) for any constant α ∈ ℝ.

(e) If h : ℝ → ℝ is continuous at lim_{x→x₀} f(x), then lim_{x→x₀} h(f(x)) = h(lim_{x→x₀} f(x)).
That is, limits can slide inside continuous functions.

Very similar properties hold for continuous functions.

Proposition 6 (Rules for continuous functions). Suppose that f, g : ℝ → ℝ are
continuous at x₀ ∈ ℝ. Then

(a) f ± g is continuous at x₀.

(b) fg is continuous at x₀.

(c) f/g is continuous at x₀ so long as g(x) ≠ 0 for x sufficiently close to x₀.

(d) αf is continuous at x₀ for each constant α ∈ ℝ.

(e) If h : ℝ → ℝ is continuous at f(x₀), then h ∘ f is continuous at x₀. That is, we can
compose continuous functions and the result will be continuous.

This theorem gives us many continuous functions. For example, since the function
f(x) = x is continuous, it follows from rules (a), (b), (d) that any polynomial is continuous.
As a general rule, any 'common' function is continuous; this includes sin, cos, exp for
example.

There is one very nice theorem regarding continuous functions that comes up fairly often
in the math subject GRE.

Theorem 7 (Intermediate Value Theorem). Suppose that f : [a, b] → ℝ is continuous.
Then for every y between f(a) and f(b), there is x ∈ [a, b] such that f(x) = y. That is, on
any interval, f achieves every value between the values it achieves at the end points.

This theorem is key in proving that the continuous image of a connected set remains
connected; we discuss this later when we address set-theoretic topology. The context in
which this theorem typically arises on the GRE is root finding.

Example 8. Given that p(x) = 2x³ − 2x + 3 has one root in ℝ, find the interval [n, n + 1]
(where n ∈ ℤ) containing the root.

Solution. By the intermediate value theorem, it suffices to find n ∈ ℤ such that p(n) < 0
and p(n + 1) > 0, or p(n) > 0 and p(n + 1) < 0. Testing, we see

    n:      ⋯   −3    −2    −1    0    1    2     3    ⋯
    p(n):   ⋯   −45   −9     3    3    3    15    51   ⋯

Thus the root occurs in the interval [−2, −1].
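The sign test in this solution is easy to automate; here is a minimal Python sketch (the helper name `sign_change_interval` is ours, not from the notes):

```python
# Scan integer intervals [n, n+1] for a sign change of p(x) = 2x^3 - 2x + 3;
# by the intermediate value theorem, a sign flip brackets a root.

def p(x):
    return 2 * x**3 - 2 * x + 3

def sign_change_interval(f, lo=-10, hi=10):
    """Return the first integer interval [n, n+1] on which f changes sign."""
    for n in range(lo, hi):
        if f(n) * f(n + 1) < 0:
            return (n, n + 1)
    return None

print(sign_change_interval(p))  # (-2, -1)
```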

Continuity oftentimes is not enough, since a continuous function can still be very jagged.
We like our functions to be smooth. This motivates us to define smoother functions;
specifically, functions that locally look like straight lines. Accordingly, for a function
f : ℝ → ℝ and a point x ∈ ℝ, you could think of drawing the secant line between the points
x and a nearby point x + h (for some small h ∈ ℝ). The degree to which this line
approximates the graph of f is some measure of the smoothness of f. Letting h become
smaller and smaller, the secant line 'should' settle to a line which lies tangent to f at the
point x; indeed, this occurs if f is differentiable at x.

Definition 9 (Derivative). Let f : ℝ → ℝ and suppose that x ∈ ℝ. We say that f is
differentiable at x iff the limit

    lim_{y→x} (f(y) − f(x))/(y − x)   (or equivalently   lim_{h→0} (f(x + h) − f(x))/h)

exists. In this case, we write

    f′(x) = lim_{y→x} (f(y) − f(x))/(y − x) = lim_{h→0} (f(x + h) − f(x))/h.

We say that f is differentiable in some domain A ⊆ ℝ iff f is differentiable at each x ∈ A.
The value f′(x) is the slope of the graph of f at the point x. In this way, we can think of the
derivative as the 'instantaneous rate of change' of f at x. We sometimes write the derivative
as df/dx (x); we use the notations interchangeably.

The first note we make is that differentiability is stronger than continuity.

Proposition 10. If f : ℝ → ℝ is differentiable at x ∈ ℝ, then f is continuous at x.

The derivative is so ubiquitous that it is worth memorizing the derivatives of several
common functions.

Proposition 11 (Derivatives of Common Functions). The derivative of

(a) f(x) = xᵏ is given by f′(x) = kxᵏ⁻¹ for all x ∈ ℝ and any k ∈ ℝ,

(b) f(x) = eˣ is given by f′(x) = eˣ for all x ∈ ℝ,

(c) f(x) = sin(x) is given by f′(x) = cos(x) for all x ∈ ℝ,

(d) f(x) = cos(x) is given by f′(x) = −sin(x) for all x ∈ ℝ,

(e) f(x) = ln(x) is given by f′(x) = 1/x for all x > 0.

Of course, we may assume knowledge of some derivatives which are not listed here.
Otherwise, we have some rules which will tell us how to find the derivatives of other functions.

Proposition 12 (Rules for derivatives). Suppose that f, g : ℝ → ℝ are differentiable at
x ∈ ℝ. Then

(a) f ± g is differentiable at x with (f ± g)′(x) = f′(x) ± g′(x).

(b) (Product Rule) fg is differentiable at x with (fg)′(x) = f′(x)g(x) + f(x)g′(x).

(c) αf is differentiable at x and (αf)′(x) = αf′(x) for any constant α ∈ ℝ.

(d) (Power rule) if f(x) = xⁿ for some n ∈ ℝ, then f′(x) = nxⁿ⁻¹.

(e) (Chain rule) if h : ℝ → ℝ is differentiable at f(x), then h ∘ f is differentiable at x and
(h ∘ f)′(x) = h′(f(x))f′(x).

(f) (Derivative of inverse) if f is locally injective (so that f⁻¹ exists in a neighborhood of
f(x)), then f⁻¹ is differentiable at f(x) and

    (f⁻¹)′(f(x)) = 1/f′(x)   (equivalently, (f⁻¹)′(y) = 1/f′(f⁻¹(y)), y = f(x)).

Also, as discussed above, we can use the derivative to find tangent lines to a graph.
Definition 13 (Tangent line). Suppose that f : ℝ → ℝ is differentiable at x₀ ∈ ℝ. Then
the line tangent to the graph of y = f(x) at x₀ is given by the equation

    y = f(x₀) + f′(x₀)(x − x₀);

notice that the slope of this line is exactly f′(x₀).

These simple rules and definitions can be used to solve a surprising number of problems
on the math subject GRE.

Example 14. Let f(0) = 0 and f(x) = xe^{−x² − x⁻²} for x ≠ 0. At how many points is the
tangent line to f horizontal?

Solution. The question is asking how often f′(x) = 0. For x ≠ 0, we see

    f′(x) = e^{−x² − x⁻²} + xe^{−x² − x⁻²}(−2x + 2x⁻³) = e^{−x² − x⁻²}(1 − 2x² + 2x⁻²).

This is zero when 1 − 2x² + 2x⁻² = 0, or 2x⁴ − x² − 2 = 0. This is a quadratic in x², yielding
x² = ¼(1 ± √17). If we take the −, we will have no real solutions, so there are two solutions
corresponding to taking the +. We also note that as x → 0, f(x)/x → 0, so indeed f is
differentiable at x = 0 with f′(0) = 0. Thus the answer is 3.
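A numeric sanity check of this count (the script and helper names are ours, and it assumes the reconstructed exponent −x² − x⁻²): the two nonzero critical points should be x = ±√((1 + √17)/4), and the difference quotient f(h)/h should vanish at 0, giving f′(0) = 0.

```python
# Verify the three horizontal tangents of f(x) = x * exp(-x^2 - x^-2)
# numerically: central differences at the two algebraic critical points,
# plus the difference quotient at 0.
import math

def f(x):
    return x * math.exp(-x**2 - x**-2) if x != 0 else 0.0

def fprime(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x_star = math.sqrt((1 + math.sqrt(17)) / 4)
print(fprime(x_star), fprime(-x_star))   # both approximately 0
print(f(1e-4) / 1e-4)                    # difference quotient at 0: ~0
```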
Example 15. What is the derivative of f(x) = (x + 1)²/(x⁹eˣ) for x > 0?

Solution. We could chug away at this using the power rule and product rule and it would
get fairly messy. Instead, we can use the chain rule to find the derivative of ln(f(x)). We
see

    f′(x)/f(x) = d/dx (ln(f(x))) = d/dx (2 ln(x + 1) − 9 ln(x) − x) = 2/(x + 1) − 9/x − 1.

Thus

    f′(x) = (x + 1)²/(x⁹eˣ) · (2/(x + 1) − 9/x − 1).
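The logarithmic-differentiation answer is easy to cross-check with a finite difference (the script and names below are ours, added for verification only):

```python
# Compare the closed-form derivative of f(x) = (x+1)^2 / (x^9 e^x) against
# a central difference at a sample point x = 2.
import math

def f(x):
    return (x + 1)**2 / (x**9 * math.exp(x))

def fprime_formula(x):
    # f'(x) = f(x) * (2/(x+1) - 9/x - 1), from differentiating ln f(x)
    return f(x) * (2 / (x + 1) - 9 / x - 1)

x = 2.0
numeric = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6
print(fprime_formula(x), numeric)  # should agree to several digits
```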
A natural question: is there a point when the average rate of change of a function over
an interval is equal to the instantaneous rate of change? The answer is yes, and physically
this makes sense. For example, if you drove 500 miles in 10 hours, there was a point when
you were going exactly 50 miles per hour.

Theorem 16 (Mean Value Theorem). Suppose that f : [a, b] → ℝ is continuous and
that f is differentiable on (a, b). Then there exists c ∈ (a, b) such that

    f′(c) = (f(b) − f(a))/(b − a).

With "derivative as slope", we see that a differentiable function f is increasing at x if
f′(x) ≥ 0 and decreasing if f′(x) ≤ 0. Thus the extreme points of f must occur at places
where f′(x) = 0.

Theorem 17 (Extreme Value Theorem). Suppose that f : [a, b] → ℝ is continuous.
Then f achieves a maximum (resp. minimum) value on [a, b]. Furthermore, the maximum
(resp. minimum) occurs either at one of the endpoints or at a point x such that f′(x) = 0
or f′(x) does not exist.

Derivatives are also very helpful in finding limits. Many limits result in so-called
indeterminate forms. For example, we may want to take the limit

    lim_{x→∞} f(x)/g(x)

but both f(x) and g(x) approach ∞ as x → ∞. The 'number' ∞/∞ is undefined; it is an
indeterminate form. This can likewise happen when f(x) and g(x) both approach 0 as
x → ∞, as 0/0 is also indeterminate. We have a theorem to handle such limits.

Theorem 18 (l'Hôpital's Rule). Suppose that f, g : ℝ → ℝ are differentiable functions
with g′(x) ≠ 0 in a neighborhood of x₀ ∈ ℝ and assume that

    lim_{x→x₀} f(x) = lim_{x→x₀} g(x) = 0

or

    lim_{x→x₀} f(x) = ±∞,   lim_{x→x₀} g(x) = ±∞.

Then if lim_{x→x₀} f′(x)/g′(x) exists, we have

    lim_{x→x₀} f(x)/g(x) = lim_{x→x₀} f′(x)/g′(x).

This tells us that if we have an indeterminate form, what really matters is the rate at which
the functions approach that indeterminate form individually.

This is an incredibly powerful method for finding limits and it often arises on the math
GRE. It also (indirectly) helps us find limits when other indeterminate forms such as 0 · ∞
or ∞⁰ arise.

Example 19. Compute the limits:

(a) lim_{x→0} sin(x)/x,   (b) lim_{x→∞} (1 + 1/x)ˣ,   (c) lim_{x→0} sin(x − sin(x))/x³,   (d) lim_{x→0} (√(1 + x) − 1)/x.

Solution.

(a) lim_{x→0} sin(x)/x = lim_{x→0} cos(x)/1 = 1.
(b) Let L = lim_{x→∞} (1 + 1/x)ˣ. Then by continuity,

    ln L = lim_{x→∞} x ln(1 + 1/x) = lim_{x→∞} ln(1 + 1/x)/(1/x) = lim_{x→∞} [(1/(1 + 1/x))(−1/x²)]/(−1/x²) = 1.

Thus L = e.

(c) We see

    lim_{x→0} sin(x − sin(x))/x³ = lim_{x→0} cos(x − sin(x))(1 − cos(x))/(3x²) = lim_{x→0} (1 − cos(x))/(3x²).

Repeatedly applying l'Hôpital, we then have

    lim_{x→0} sin(x − sin(x))/x³ = lim_{x→0} sin(x)/(6x) = 1/6.

(d) We recognize this limit as the derivative of f(x) = √x at x = 1. Thus

    lim_{x→0} (√(1 + x) − 1)/x = d/dx(√x)|_{x=1} = 1/(2√1) = 1/2.
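These four answers are easy to probe numerically (a quick check of ours): plug in a small x (or a large exponent for (b)) and compare against the values 1, e, 1/6, and 1/2 found above.

```python
# Numerically probing the four limits of Example 19.
import math

x = 1e-4
a = math.sin(x) / x                      # ~ 1
b = (1 + 1e-6) ** 1e6                    # ~ e
c = math.sin(x - math.sin(x)) / x**3     # ~ 1/6
d = (math.sqrt(1 + x) - 1) / x           # ~ 1/2
print(a, b, c, d)
```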

From here, we move on to Riemann integration. The Riemann integral of a function
between two points a, b can be physically thought of as the area underneath the graph of
the function between a, b. If we denote by ∫_a^b f(x)dx the integral of f from a to b, then it is
easy to see, for example, that

    ∫_0^1 1 dx = 1,   ∫_0^1 x dx = 1/2

just by calculating the area. But what if f is not so simple? We can calculate the area using
limits and approximating with rectangles.

Definition 20 (Riemann Integral [baby definition]). Suppose that f : [a, b] → ℝ is
continuous. For n ∈ ℕ, define the partition a = x₀ < x₁ < … < xₙ = b by xᵢ = a + (b − a)i/n.
Then we define the Riemann integral of f from a to b by one of three equivalent formulas:

    ∫_a^b f(x)dx = lim_{n→∞} ((b − a)/n) Σ_{i=1}^{n} f(xᵢ₋₁)
                 = lim_{n→∞} ((b − a)/n) Σ_{i=1}^{n} f(xᵢ)
                 = lim_{n→∞} ((b − a)/n) Σ_{i=1}^{n} f((xᵢ₋₁ + xᵢ)/2).

Note that when f is continuous, these limits always exist and are equal. When f is
discontinuous, some of them may fail to exist. Thus a bit more care is needed in rigorously
defining the Riemann integral for discontinuous functions. We also note that there is no
reason the partition xᵢ needs to contain evenly spaced points, and there's no reason we need
to evaluate f at xᵢ₋₁, xᵢ, or the midpoint. We could refine the definition to account for an
arbitrary partition with arbitrary tags:

    ∫_a^b f(x)dx = lim_{n→∞} Σ_{i=1}^{n} f(xᵢ*)(xᵢ − xᵢ₋₁),   where xᵢ* ∈ [xᵢ₋₁, xᵢ],

for example. In this case, the points xᵢ need to be strictly increasing, a = x₀ < x₁ < ⋯ < xₙ = b,
and satisfy a condition that ensures that they don't cluster in some small portion of the
interval: specifically, we need hₙ := max_{1≤i≤n}(xᵢ − xᵢ₋₁) → 0 as n → ∞. On the GRE, it is
rare to see a Riemann sum with non-uniform partition.
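The three "baby" formulas transcribe directly into Python (a sketch of ours; the function and names are not from the notes). For continuous f all three sums converge to the same integral; here f(x) = x² on [0, 1], whose integral is 1/3.

```python
# Left-endpoint, right-endpoint, and midpoint Riemann sums on a uniform
# partition of [a, b] with n subintervals.

def riemann_sums(f, a, b, n):
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]
    left = h * sum(f(xs[i - 1]) for i in range(1, n + 1))
    right = h * sum(f(xs[i]) for i in range(1, n + 1))
    mid = h * sum(f((xs[i - 1] + xs[i]) / 2) for i in range(1, n + 1))
    return left, right, mid

print(riemann_sums(lambda x: x * x, 0.0, 1.0, 1000))  # all near 1/3
```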

We also note that each of these limits has a different interpretation, which can be seen by
looking at a picture.

Example 21. Let 0 = x₀ < x₁ < … < xₙ = 1 be a partition of [0, 1]. Which of the
following quantities is greatest?

    A = ∫_0^1 x² dx

    B = Σ_{i=1}^{n} (xᵢ₋₁)²(xᵢ − xᵢ₋₁)

    C = Σ_{i=1}^{n} (xᵢ)²(xᵢ − xᵢ₋₁)

    D = Σ_{i=1}^{n} ¼(xᵢ₋₁ + xᵢ)²(xᵢ − xᵢ₋₁)

Solution. By drawing a picture, it is easy to see that B < D < C and that A < C, so C is
the largest. [Note: C was largest because x² is an increasing function; what happens if we
change it to a decreasing function? A non-monotone function?]
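The ordering can also be checked on any concrete partition (the particular non-uniform partition below is our choice, for illustration):

```python
# Compute A, B, C, D from Example 21 on one increasing partition of [0, 1]
# and confirm the ordering B < D < C and A < C.

xs = [0.0, 0.1, 0.3, 0.35, 0.6, 0.8, 1.0]   # any 0 = x0 < ... < xn = 1 works
A = 1 / 3                                    # integral of x^2 on [0, 1]
B = sum(xs[i - 1]**2 * (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
C = sum(xs[i]**2 * (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
D = sum(0.25 * (xs[i - 1] + xs[i])**2 * (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
print(B < D < C and A < C)  # True
```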

A useful property of integration is that it preserves inequalities.

Proposition 22 (Integration preserves inequalities). Suppose that f, g : [a, b] → ℝ
are such that f(x) ≤ g(x) for all x ∈ [a, b]. Then

    ∫_a^b f(x)dx ≤ ∫_a^b g(x)dx.

This still holds if we replace ≤ with <.

In a standard Calculus curriculum, it is quite an abrupt shift from derivatives to
integrals, which leaves many students wondering if there is some hidden connection. Indeed, the
fundamental theorem of calculus provides this connection.

Theorem 23 (Fundamental Theorem of Calculus). We break this into two parts.

FToC I: Suppose that f : [a, b] → ℝ is continuous. Then we can define F : [a, b] → ℝ by

    F(x) = ∫_a^x f(t)dt,   x ∈ [a, b].

Such F is differentiable on (a, b) with F′(x) = f(x) for all x ∈ (a, b).

FToC II: Suppose that f : [a, b] → ℝ is continuous and F : [a, b] → ℝ is any function
such that F′(x) = f(x) for x ∈ (a, b). Then

    ∫_a^b f(x)dx = F(b) − F(a).

Thus to calculate area under curves, we simply need to reverse the differentiation process
and evaluate at the correct points. Such a function F in the FToC is called an antiderivative
of f.
Example 24. Evaluate ∫_0^π sin(x)dx.

Solution. Since d/dx(−cos(x)) = sin(x), we see

    ∫_0^π sin(x)dx = −cos(π) − (−cos(0)) = 2.

The FToC can also be very helpful in evaluating certain limits, if we can recognize that they
are actually Riemann sums.

Example 25. Evaluate lim_{n→∞} Σ_{k=1}^{n} n/(n² + k²).

Solution. Call the limit S. When we re-write this as

    S = lim_{n→∞} (1/n) Σ_{k=1}^{n} 1/(1 + (k/n)²)

we recognize that it is simply the Riemann sum for 1/(1 + x²) on the interval [0, 1]. Thus

    S = ∫_0^1 dx/(1 + x²) = arctan(1) − arctan(0) = π/4.
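A quick numeric confirmation (script ours): the partial sums Sₙ = Σ n/(n² + k²) should approach π/4 as n grows, from below, since this is a right-endpoint sum of a decreasing function.

```python
# Check that the Riemann sums of Example 25 converge to pi/4.
import math

def S(n):
    return sum(n / (n**2 + k**2) for k in range(1, n + 1))

print(S(10_000), math.pi / 4)  # close
```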
The Fundamental Theorem of Calculus tells us that to calculate integrals, we simply
need to find antiderivatives. There are a few standard methods which help us with this. We
close the Calculus I notes by covering these methods.
Theorem 26 (Integration by Substitution). Suppose that I is some interval, f : I ! R


is continuous and that : [a, b] ! I is a di↵erentiable bijection. Then
Z (b) Z b
f (x)dx = f ( (t)) 0 (t)dt.
(a) a

We can use the formula in either direction; one of the above integrals may be very
straightforward while the other is not. We illustrate this with two examples.
Example 27. Compute (a) $\int_0^1 \frac{t\,dt}{1+t^2}$ and (b) $\int_0^1 \frac{dx}{1+x^2}$.

Solution. For (a), note that the numerator is the derivative of the denominator (up to a constant). Thus we should make a substitution for the denominator. Putting $x = 1+t^2$, we see $dx = 2t\,dt$ and so
$$\int_0^1 \frac{t\,dt}{1+t^2} = \frac{1}{2}\int_1^2 \frac{dx}{x} = \frac{\log(2)}{2}.$$
Here we used the rule from right to left.
For (b), we can set $x = \tan(t)$. Then $dx = \sec^2(t)\,dt$ so we see
$$\int_0^1 \frac{dx}{1+x^2} = \int_0^{\pi/4} \frac{\sec^2(t)\,dt}{1+\tan^2(t)} = \int_0^{\pi/4} 1\,dt = \frac{\pi}{4}.$$

Here we used the rule from left to right.
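Both answers are easy to verify numerically. The following sketch (my own addition, using a basic midpoint-rule integrator rather than anything from the notes) checks them:

```python
import math

def midpoint_integral(f, a, b, n=100000):
    # Simple midpoint-rule quadrature; plenty accurate for smooth integrands.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# (a) integral of t/(1+t^2) on [0,1] should equal log(2)/2
print(midpoint_integral(lambda t: t / (1 + t**2), 0, 1), math.log(2) / 2)

# (b) integral of 1/(1+x^2) on [0,1] should equal pi/4
print(midpoint_integral(lambda x: 1 / (1 + x**2), 0, 1), math.pi / 4)
```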

Example 28. Note that the assumptions on $\varphi$ are necessary. Indeed, removing this assumption and proceeding formally, we can run into some errors. Consider: by (b) above and using evenness, we have
$$\int_{-1}^1 \frac{dx}{1+x^2} = \frac{\pi}{2}.$$
However, using $x = 1/t$ gives
$$\int_{-1}^1 \frac{dx}{1+x^2} = \int_{-1}^1 \frac{(-1/t^2)\,dt}{1+t^{-2}} = -\int_{-1}^1 \frac{dt}{1+t^2} = -\frac{\pi}{2}.$$

Thus we've seemingly proved that $\pi = -\pi$, which is absurd. However, this substitution was invalid because $\varphi(t) = 1/t$ is not a differentiable bijection on $[-1,1]$.

Intuitively, integration by substitution takes advantage of the chain rule for differentiation. A natural question is whether we can exploit other rules for differentiation in a similar way.

Theorem 29 (Integration by Parts). Suppose that $u, v : [a,b] \to \mathbb{R}$ are differentiable. Then
$$\int_a^b u(x)v'(x)\,dx = [u(b)v(b) - u(a)v(a)] - \int_a^b v(x)u'(x)\,dx.$$

This is simply an exploitation of the product rule; consider
$$u(b)v(b) - u(a)v(a) = \int_a^b \frac{d}{dx}\big(u(x)v(x)\big)\,dx = \int_a^b u(x)v'(x)\,dx + \int_a^b v(x)u'(x)\,dx,$$

which is equivalent to the above statement. You can think of integration by parts as "taking a derivative off of one term and applying it to the other." Of course, this process is not free; the cost you incur is the boundary evaluation $u(b)v(b) - u(a)v(a)$. In terms of indefinite integrals, you often see this rule written as
$$\int u\,dv = uv - \int v\,du.$$

I find this to be an oversimplification as it suppresses the relationship to the product rule.


However, it does help in application because it boils the rule down to "choosing" $u$ and $dv$. Practically, we try to choose $u$ which gets "better" upon differentiation (polynomials become lower order, inverse trig functions and logarithms become algebraic) and $dv$ which does not become worse upon anti-differentiation (exponentials, sines and cosines stay roughly the same).
Example 30. Compute (a) $\int_0^\infty x e^{-x}\,dx$ and (b) $\int_1^2 \log(x)\,dx$.

Solution. For (a), we choose $u = x$ and $dv = e^{-x}dx$. Then $du = dx$ and $v = -e^{-x}$. Thus
$$\int_0^\infty x e^{-x}\,dx = \Big[-x e^{-x}\Big]_{x=0}^{x\to\infty} + \int_0^\infty e^{-x}\,dx = \Big[-e^{-x}\Big]_{x=0}^{x\to\infty} = 1.$$

For (b), there is only one function present. However, we can take $u = \log(x)$, $dv = dx$ to see that
$$\int_1^2 \log(x)\,dx = \Big[x\log(x)\Big]_{x=1}^{x=2} - \int_1^2 x \cdot \frac{1}{x}\,dx = 2\log(2) - 1.$$
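Both results can be confirmed numerically; here is a short check (my own sketch, truncating the improper integral at a large cutoff since the tail $\int_{50}^\infty x e^{-x}\,dx$ is negligible):

```python
import math

def midpoint_integral(f, a, b, n=200000):
    # Midpoint-rule quadrature, used only to sanity-check the two answers.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# (a) integral of x*e^{-x} on [0, inf) = 1; truncate at 50 (tail ~ e^{-50})
print(midpoint_integral(lambda x: x * math.exp(-x), 0, 50))      # about 1
# (b) integral of log(x) on [1, 2] = 2*log(2) - 1
print(midpoint_integral(math.log, 1, 2), 2 * math.log(2) - 1)
```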
A last technique used for integration is partial fractions. Rather than corresponding to some differentiation rule, this is a method for simplifying the integrand: it breaks rational functions up into much simpler rational functions. Rather than discuss the general theory, we illustrate the technique with an example.

Example 31. Find an antiderivative of
$$\frac{1}{(x-1)(x^2-9)}.$$

Solution. Note that
$$\frac{1}{(x-1)(x^2-9)} = \frac{1}{(x-1)(x-3)(x+3)}.$$
We attempt to split this into three separate fractions. Specifically, we try to find $A, B, C$ so that
$$\frac{1}{(x-1)(x-3)(x+3)} = \frac{A}{x-1} + \frac{B}{x-3} + \frac{C}{x+3}.$$

Multiplying this equality by $(x-1)(x-3)(x+3)$ gives
$$1 = A(x-3)(x+3) + B(x-1)(x+3) + C(x-1)(x-3).$$

This is a system of three equations in three unknowns (collect terms for $x^0, x^1, x^2$). It is easy to solve by substituting certain values for $x$. Substituting in $x = 1$, $x = 3$ and $x = -3$, we arrive at
$$1 = (-2)(4)A, \qquad 1 = (2)(6)B, \qquad (-4)(-6)C = 1.$$
Thus
$$\frac{1}{(x-1)(x-3)(x+3)} = -\frac{1}{8}\cdot\frac{1}{x-1} + \frac{1}{12}\cdot\frac{1}{x-3} + \frac{1}{24}\cdot\frac{1}{x+3}.$$
It is now easy to see that
$$\int \frac{dx}{(x-1)(x^2-9)} = -\frac{1}{8}\log(x-1) + \frac{1}{12}\log(x-3) + \frac{1}{24}\log(x+3) + \text{constant}.$$

Note that there need to be special considerations when the denominator does not split into
linear factors; we omit these cases.
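A decomposition like this is easy to double-check by evaluating both sides at a few sample points (my own sketch; the coefficients $-1/8$, $1/12$, $1/24$ are the ones found above):

```python
# Check 1/((x-1)(x-3)(x+3))
#   = -1/8 * 1/(x-1) + 1/12 * 1/(x-3) + 1/24 * 1/(x+3)
# at sample points away from the poles x = 1, 3, -3.
def original(x):
    return 1 / ((x - 1) * (x**2 - 9))

def decomposed(x):
    return -1/8 / (x - 1) + 1/12 / (x - 3) + 1/24 / (x + 3)

for x in [0.0, 2.0, 5.0, -7.5]:
    print(x, original(x), decomposed(x))  # the two columns agree
```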
Christian Parkinson GRE Prep: Calculus II Notes 1

Week 2: Calculus II Notes
We concluded the Calculus I notes with Riemann integration, the Fundamental Theorem of Calculus and some helpful integration techniques. However, oftentimes you will be asked to identify whether an integral converges or diverges even when you cannot find its value. This is especially true for improper integrals: those in which the integrand has a singularity or the range of integration goes to infinity. We begin the Calculus II notes by discussing some basic convergence results.

Proposition 1 (p-test). Let $p \in \mathbb{R}$. The integral
$$\int_1^\infty \frac{dx}{x^p}$$
converges if $p > 1$ and diverges otherwise. The integral
$$\int_0^1 \frac{dx}{x^p}$$
converges if $p < 1$ and diverges otherwise.

Note that as a consequence, the integral
$$\int_0^\infty \frac{dx}{x^p}$$
never converges. These can all be verified by a straightforward calculation and hint at some good intuition. The borderline case is $1/x$, where neither integral converges, so as a general rule, for
$$\int_1^\infty f(x)\,dx$$
to converge, we will need $f(x) \to 0$ faster than $1/x$ as $x \to \infty$, and if $g(x)$ has a singularity at $x = 0$, then for
$$\int_0^1 g(x)\,dx$$
to converge, we need $g(x)$ to blow up slower than $1/x$ as $x \to 0^+$. Note, these "general rules" should not be seen as hard-and-fast truths, but rather heuristics which can often make it easy to verify whether an integral converges or diverges. The following two theorems give more rigorous forms of these general rules.

Theorem 2 (Comparison Test). Suppose that $f, g : [0,\infty) \to [0,\infty)$ are continuous and that $f(x) \le g(x)$ for all $x \in [0,\infty)$. Then

(a) if $\int_0^\infty g(x)\,dx$ converges, then so does $\int_0^\infty f(x)\,dx$;

(b) if $\int_0^\infty f(x)\,dx$ diverges, then so does $\int_0^\infty g(x)\,dx$.

Theorem 3 (Limit Comparison Test). Suppose that $f, g : [0,\infty) \to [0,\infty)$ are continuous and that
$$\lim_{x\to\infty} \frac{f(x)}{g(x)}$$
exists as a number in $(0,\infty)$ [excluding the endpoints]. Then
$$\int_0^\infty f(x)\,dx \quad\text{and}\quad \int_0^\infty g(x)\,dx \quad\text{both converge or both diverge.}$$

Note, the previous two theorems also hold for improper integrals where the functions have
singularities at the same point but the limit in the second must be taken at the singularity.

This last theorem tells us that the only important thing in determining convergence of an integral with $\infty$ as a limit is the asymptotic behavior of the integrand. We have a special notation for this. If $f, g$ are such that $f/g \to L$ as $x \to \infty$ where $L$ is a finite non-zero number, then we write $f \sim g$ [note: in most contexts, $L$ is required to be $1$ in order to say $f \sim g$; for our purposes, it is fine to allow $L$ to be finite and non-zero]. We typically read this as "$f$ is asymptotic to $g$." Note that if $f \sim g$, then there are constants $c, C > 0$ such that for sufficiently large $x$, we have $cg(x) \le f(x) \le Cg(x)$. This shows that we can easily derive the Limit Comparison Test from the ordinary Comparison Test. A relaxed version of this requirement leads to a new definition. If there is $C > 0$ such that $f(x) \le Cg(x)$ for all $x$ sufficiently large, then we write $f = O(g)$ [and say "$f$ is big-oh of $g$"]. Thus $f \sim g$ iff $f = O(g)$ and $g = O(f)$. Specifically, $f = O(g)$ iff $f/g$ remains bounded as $x \to \infty$. Finally, note that we can define the same notion for $x$ approaching some other point. For example, $\sin(x) \sim x$ as $x \to 0$.

In order to apply these rules, it is nice to know something about the growth rates of different common functions at $\infty$. We discuss a few here.

Proposition 4 (Growth Rates of $x$, $e^x$, $\log(x)$ at $\infty$). Let $\alpha, \beta, \gamma > 0$. Then

(a) $\lim_{x\to\infty} \dfrac{1}{\log(x)^\alpha} = 0$,

(b) $\lim_{x\to\infty} \dfrac{\log(x)^\alpha}{x^\beta} = 0$,

(c) $\lim_{x\to\infty} \dfrac{x^\beta}{e^{\gamma x}} = 0$.

What these tell us is that asymptotically as $x \to \infty$, any power of $x$ is smaller than any exponential and any power of a logarithm is smaller than any power of $x$. In asymptotic notation, we write $1 \ll \log(x)^\alpha \ll x^\beta \ll e^{\gamma x}$, where $f \ll g$ means that $f/g \to 0$ at $\infty$. We can compound these into new relationships such as $x^\beta \ll x^\beta \log(x)^\alpha$. Many more such relationships can be proved simply by taking the required limit. Note that if $f \ll g$, then $f = O(g)$; the converse need not be true.

Example 5. Determine the convergence/divergence of the following integrals:

(a) $\displaystyle\int_2^\infty \frac{dx}{\sqrt{x} + x + \log(x)}$,

(b) $\displaystyle\int_1^\infty \frac{x^6 + 4x^3 + 3x^2 + 2}{7x^9 + x + 10}\,dx$,

(c) $\displaystyle\int_0^1 \csc(x)^{1/2}\,dx$.

Solution.

(a) Since $\sqrt{x}, \log(x) \ll x$, there are constants $c_1, c_2$ such that for $x$ large enough, $\sqrt{x} \le c_1 x$ and $\log(x) \le c_2 x$. Thus
$$\int_2^\infty \frac{dx}{\sqrt{x} + \log(x) + x} \ge C + \int_M^\infty \frac{dx}{(1 + c_1 + c_2)x}$$
where $C, M$ are some constants. The former diverges since the latter does.

(b) Notice that if we divide the integrand by $1/x^3$ (hence multiply by $x^3$) and take the limit, we see
$$\lim_{x\to\infty} \frac{x^9 + 4x^6 + 3x^5 + 2x^3}{7x^9 + x + 10} = \lim_{x\to\infty} \frac{1 + 4/x^3 + 3/x^4 + 2/x^6}{7 + 1/x^8 + 10/x^9} = \frac{1}{7}.$$
Thus by the Limit Comparison Test, our integral converges iff $\int_1^\infty \frac{dx}{x^3}$ converges. The latter converges, so the integral in (b) converges as well.

(c) Since $\sin(x) \sim x$ as $x \to 0$, we see that $\csc(x)^{1/2} \sim 1/\sqrt{x}$. Thus the given integral converges iff
$$\int_0^1 \frac{dx}{\sqrt{x}}$$
converges, which it does.

Next, we discuss applications of the integral. We have already established that it can be
used to calculate area under curves but there are a few other applications that show up often
on the math subject GRE; most prominently arc-length calculations and surfaces/volumes
of revolution. We discuss the formulas for these briefly.

Proposition 6 (Arc Length). Let $f : [a,b] \to \mathbb{R}$ be continuously differentiable. Then the arc length of the graph $(x, f(x))$ between $a$ and $b$ is given by
$$L = \int_a^b \sqrt{1 + f'(x)^2}\,dx.$$

More generally, if $x, y : [a,b] \to \mathbb{R}$ are continuously differentiable, then the length of the curve parameterized by $(x(t), y(t))$ for $t \in [a,b]$ is given by
$$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2}\,dt.$$

Proposition 7 (Surface / Volume of Revolution). Suppose that $f : [a,b] \to \mathbb{R}$ is continuously differentiable. If we rotate the graph $(x, f(x))$ about the $x$-axis, this will create a surface. The area of this surface is
$$A = 2\pi \int_a^b f(x)\sqrt{1 + f'(x)^2}\,dx$$
and the volume encapsulated by this surface is
$$V = \pi \int_a^b f(x)^2\,dx.$$

These formulas are easy to derive when you have a good picture in your mind; otherwise you can use them as a black box. Note that they agree with our intuition.
p
Example 8. What is the length of the graph of $f(x) = \sqrt{1-x^2}$ for $x \in [-1,1]$?

Solution. Intuitively, this graph is half of a unit circle, so the length should be $\pi$. Using the formula, we see
$$L = \int_{-1}^1 \sqrt{1 + \frac{x^2}{1-x^2}}\,dx = \int_{-1}^1 \frac{dx}{\sqrt{1-x^2}} = \arcsin(1) - \arcsin(-1) = \frac{\pi}{2} - \Big(\!-\frac{\pi}{2}\Big) = \pi,$$
where the antiderivative can easily be found using the trig substitution $x = \sin(t)$.

Example 9. Let $f(x) = x$ for $x \in [0,1]$. What surface area and volume result from rotating the graph of $f$ about the $x$-axis?

Solution. Intuitively, this will create a right circular cone of "height" 1 and radius 1. You may recall from geometry that the volume is $\frac{\pi}{3}r^2 h = \frac{\pi}{3}$. According to our formula,
$$V = \pi \int_0^1 x^2\,dx = \frac{\pi}{3}.$$
Likewise, the surface area (not including the "bottom" of the cone; i.e., the unit circle which forms the base) will be $\pi r\sqrt{r^2+h^2} = \pi\sqrt{2}$, and our formula gives
$$A = 2\pi \int_0^1 x\sqrt{1+1^2}\,dx = \pi\sqrt{2}.$$
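The agreement between the geometry formulas and the integrals can be checked numerically as well (my own sketch, reusing a basic midpoint-rule integrator):

```python
import math

def midpoint_integral(f, a, b, n=100000):
    # Midpoint-rule quadrature for the two revolution integrals.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Rotating f(x) = x, x in [0,1], about the x-axis: a cone with r = h = 1.
V = math.pi * midpoint_integral(lambda x: x**2, 0, 1)                   # pi/3
A = 2 * math.pi * midpoint_integral(lambda x: x * math.sqrt(2), 0, 1)   # pi*sqrt(2)
print(V, math.pi / 3)
print(A, math.pi * math.sqrt(2))
```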

Note: much like the arc-length formula, the surface area and volume formulas can be adjusted to account for parameterized coordinates or for rotation around the $y$-axis or any other line $ax + by = c$. Also, you may be asked for the volume created when the area between two graphs $(x, f(x))$ and $(x, g(x))$ (with $f \ge g \ge 0$) is rotated around the $x$-axis. This will be given by
$$\pi \int_a^b \big(f(x)^2 - g(x)^2\big)\,dx.$$
The reasoning should be somewhat clear from the geometry.

From here we move on to sequences and series.

Definition 10 (Sequence). A sequence is a map $a : \mathbb{N} \to \mathbb{R}$ which assigns a real number to each natural number. The value that $a$ assigns to $n \in \mathbb{N}$ is typically denoted by $a_n$ rather than $a(n)$, though both may be used. In a slight abuse of notation, we then often refer to $a_n$ as a sequence or otherwise denote sequences by $(a_n)$, sometimes accompanied by a range for $n$, e.g., $(a_n)_{n\ge 0}$ or $(a_n)_{n=1}^\infty$.

Note that sequences can begin at $n = 0$ or $n = 1$; both are common start points. Sequences themselves (i.e., independent of series) account for many questions on the math subject GRE. A common question will give a recursive formula for a sequence (i.e., a formula for $a_n$ in terms of the preceding entries $a_{n-1}, a_{n-2}, \ldots$) and ask the student to identify the sequence explicitly.

Example 11. Suppose that $a_0 = 0$ and $a_n = a_{n-1} + (2n-1)$ for all $n \ge 1$. Find a closed-form expression for $a_n$.

Solution. To identify the sequence, we just write out terms:
$$a_0 = 0,\quad a_1 = 0 + 1 = 1,\quad a_2 = 1 + 3 = 4,\quad a_3 = 4 + 5 = 9,\ \ldots.$$

From here it is easy to surmise that the sequence is given by $a_n = n^2$. To prove this, one could use mathematical induction; however, proofs are not necessary on the GRE, so this would be a waste of time. We will talk more about general methods for identifying sequences when we touch on Discrete Math.
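Writing out terms is exactly the kind of pattern-spotting a short script can automate; here is a sketch (my own, not part of the notes) that generates the sequence and compares it with the conjectured closed form:

```python
# Generate the recursively defined sequence a_0 = 0, a_n = a_{n-1} + (2n - 1)
# and compare with the conjectured closed form a_n = n^2.
def sequence(n_max):
    a = [0]
    for n in range(1, n_max + 1):
        a.append(a[-1] + (2 * n - 1))
    return a

print(sequence(6))                 # [0, 1, 4, 9, 16, 25, 36]
print([n**2 for n in range(7)])    # matches
```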

Of particular interest is the behavior of a sequence for large $n$.

Definition 12 (Convergence of Sequences). A sequence $(a_n)$ is said to converge to $L \in \mathbb{R}$ if for all $\varepsilon > 0$, there is $N \in \mathbb{N}$ such that for all $n \ge N$, we have $|a_n - L| < \varepsilon$. If no such $L$ exists, the sequence does not have a limit and is said to diverge. If such $L$ does exist, it is called the limit of the sequence $(a_n)$ and $(a_n)$ is said to be convergent. In this case, we write
$$\lim_{n\to\infty} a_n = L.$$

Note that sometimes we drop the $n \to \infty$ part since it is implied: we may say $\lim a_n = L$ or $a_n \to L$.

The rules for limits of sequences are the same as those for limits of functions. We repeat them here.

Proposition 13 (Rules for limits of sequences). Suppose $(a_n)$ and $(b_n)$ are two convergent sequences. Then

(a) $\lim_{n\to\infty} (a_n + b_n) = \lim_{n\to\infty} a_n + \lim_{n\to\infty} b_n$.

(b) $\lim_{n\to\infty} (a_n b_n) = \big(\lim_{n\to\infty} a_n\big)\big(\lim_{n\to\infty} b_n\big)$.

(c) $\lim_{n\to\infty} \dfrac{a_n}{b_n} = \dfrac{\lim_{n\to\infty} a_n}{\lim_{n\to\infty} b_n}$, provided that $\lim_{n\to\infty} b_n \ne 0$.

(d) $\lim_{n\to\infty} [\alpha a_n] = \alpha \big(\lim_{n\to\infty} a_n\big)$ for any constant $\alpha \in \mathbb{R}$.

(e) If $h : \mathbb{R} \to \mathbb{R}$ is continuous, then $\lim_{n\to\infty} h(a_n) = h\big(\lim_{n\to\infty} a_n\big)$. That is, limits can slide inside continuous functions. [Note: this is actually a perfectly good definition of continuity for functions on $\mathbb{R}$. By the sequential criterion theorem, a function $h$ is continuous iff $\lim_{n\to\infty} h(a_n) = h(\lim_{n\to\infty} a_n)$ for all convergent sequences $(a_n)$.]

We also have analogous theorems like the squeeze theorem which we omit for brevity.

Each sequence also defines a series. A series is a formal infinite sum. Infinite sums are of great interest since, for example, they can be used to approximate functions which cannot be explicitly calculated in a finite number of steps, e.g., $e^x$ or $\cos(x)$.

Definition 14 (Partial Sums & Series). Given a sequence $(a_n)_{n=1}^\infty$, the $N$th partial sum of $(a_n)$ is given by
$$s_N = \sum_{n=1}^N a_n.$$

The series (or infinite sum) of $(a_n)$ is given (formally at least) by
$$s = \lim_{N\to\infty} s_N, \qquad \text{i.e.,} \qquad \sum_{n=1}^\infty a_n = \lim_{N\to\infty} \sum_{n=1}^N a_n.$$

If the partial sums $s_N$ have a limit, then the series is said to converge; otherwise it is said to diverge.

In some cases, it is easy enough to look at the partial sums and explicitly take the limit.

1 1
Example 15. Calculate the infinite sums of an = 2n
and bn = n(n+1)
.

Solution. For $a_n$, we see
$$s_1 = \frac{1}{2},\quad s_2 = \frac{3}{4},\quad s_3 = \frac{7}{8},\quad s_4 = \frac{15}{16},\ \ldots,\quad s_N = \frac{2^N - 1}{2^N}.$$
Thus $s_N \to 1$ and so we say
$$\sum_{n=1}^\infty \frac{1}{2^n} = 1.$$
For $b_n$, we note that by partial fractions,
$$b_n = \frac{1}{n} - \frac{1}{n+1}.$$
Thus writing out the partial sum, we see
$$s_N = \left(1 - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \left(\frac{1}{3} - \frac{1}{4}\right) + \cdots + \left(\frac{1}{N} - \frac{1}{N+1}\right).$$
We notice that the sum "telescopes"; that is, all intermediate terms cancel and we are left with
$$s_N = 1 - \frac{1}{N+1} \to 1.$$
Thus we also have
$$\sum_{n=1}^\infty \frac{1}{n(n+1)} = 1.$$
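As a quick numerical illustration (my own, not part of the notes), the partial sums of both series can be tabulated directly and watched as they approach 1:

```python
# Partial sums of 1/2^n and of 1/(n(n+1)); both approach 1.
def partial_sums(term, N):
    s, out = 0.0, []
    for n in range(1, N + 1):
        s += term(n)
        out.append(s)
    return out

print(partial_sums(lambda n: 1 / 2**n, 5))             # 0.5, 0.75, 0.875, ...
print(partial_sums(lambda n: 1 / (n * (n + 1)), 5))    # 1/2, 2/3, 3/4, 4/5, 5/6
```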

For examples of series that diverge, consider $a_n = n$, whence $s_N = \frac{N(N+1)}{2} \to \infty$, or $a_n = (-1)^{n+1}$, where $s_1 = 1, s_2 = 0, s_3 = 1, s_4 = 0, \ldots$ and the partial sums continue to oscillate and thus do not converge.

While it is fairly easy to tell whether a sequence converges or diverges, it can be difficult to tell whether a series converges or diverges. For example, it may be unclear initially whether
$$\sum_{n=1}^\infty \frac{1}{n}, \qquad \sum_{n=2}^\infty \frac{(-1)^n}{\log(n)}, \qquad \text{or} \qquad \sum_{n=1}^\infty \sin(n)$$
converge or diverge, since we can't find a nice closed form for the partial sums. To this end, we have built up several tests for convergence/divergence. The first test tells you that if the series is to converge, we at least need the summand to approach zero.

Theorem 16 (Divergence Test). Suppose that $(a_n)$ is a sequence such that $a_n \not\to 0$. Then $\sum_{n=1}^\infty a_n$ diverges.
P P1
Theorem 16 tells us, for example, that $\sum_{n=1}^\infty \frac{n}{n+1}$ or $\sum_{n=1}^\infty \sin(n)$ will diverge. However, the converse is not true: having a summand which goes to zero does not guarantee convergence of the series.

Proposition 17 (Harmonic Series). The harmonic series, given by
$$\sum_{n=1}^\infty \frac{1}{n},$$
diverges. [A nice document with roughly twenty proofs of this fact can be found online.]

Indeed, this is part of a larger fact.

Proposition 18 (p-series). Let $p \in \mathbb{R}$. The series
$$\sum_{n=1}^\infty \frac{1}{n^p}$$
converges if $p > 1$ and diverges for $p \le 1$.

You may recognize this as a direct analog of one of the above propositions for convergence of integrals (as we will soon see, this is no coincidence) and wonder whether the other integral convergence tests have analogs for series. The answer is yes.

Theorem 19 (Comparison Test for Series). Suppose that $(a_n)$ and $(b_n)$ are sequences with non-negative terms such that $a_n \le b_n$ for all $n \in \mathbb{N}$. Then

(a) if $\sum_{n=1}^\infty b_n$ converges, then so does $\sum_{n=1}^\infty a_n$;

(b) if $\sum_{n=1}^\infty a_n$ diverges, then so does $\sum_{n=1}^\infty b_n$.

Theorem 20 (Limit Comparison Test for Series). Suppose that $(a_n)$ and $(b_n)$ are non-negative sequences such that $b_n > 0$ for all $n$ sufficiently large. If
$$\lim_{n\to\infty} \frac{a_n}{b_n} = L \in (0,\infty),$$
then $\sum_{n=1}^\infty a_n$ and $\sum_{n=1}^\infty b_n$ either both converge or both diverge.

One feature that is more important for series than for integrals is sign changes. The results above apply for non-negative sequences, but they do not address series like $\sum_{n=1}^\infty \frac{(-1)^n}{\sqrt{n}}$ [such series, with summands of the form $(-1)^n a_n$ where the $a_n$ are non-negative, are called alternating series]. While $\sum_{n=1}^\infty \frac{1}{\sqrt{n}}$ diverges (by the p-series test), with the alternating sign there is cancellation occurring, and perhaps there is enough cancellation that the alternating series converges. This is indeed the case; we state this as a proposition after a few other related definitions and results.

Definition 21 (Absolute Convergence). The series $\sum_{n=1}^\infty a_n$ is said to converge absolutely if the series $\sum_{n=1}^\infty |a_n|$ converges.

Theorem 22 (Absolute Convergence Theorem). If $\sum_{n=1}^\infty |a_n|$ converges, then $\sum_{n=1}^\infty a_n$ converges. That is, absolute convergence implies convergence. (This is a reflection of the fact that $\mathbb{R}$ [with the usual norm] is a Banach space.)

Since the converse is not true (lack of absolute convergence does not imply lack of convergence), this still says nothing about $\sum_{n=1}^\infty \frac{(-1)^n}{\sqrt{n}}$. For this we need another theorem.

Theorem 23 (Alternating Series Test). Suppose that $(a_n)$ is a non-negative, decreasing sequence such that $a_n \to 0$. Then $\sum_{n=1}^\infty (-1)^n a_n$ converges.

The Alternating Series Test is a special case of a more general theorem. By a theorem of Dirichlet, if $(a_n)$ is a non-negative sequence decreasing to zero, and $(b_n)$ is any sequence whose partial sums are bounded, then $\sum_{n=1}^\infty a_n b_n$ converges; the Alternating Series Test is the case $b_n = (-1)^n$. Note: it is a good exercise to prove by example that the assumption that $(a_n)$ is decreasing is necessary; without it, the alternating series could diverge even if $a_n \to 0$.
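To build intuition for why the Alternating Series Test works, it helps to watch the partial sums of $\sum_{n\ge 1} (-1)^n/\sqrt{n}$: consecutive partial sums bracket the limit, and the bracket width $a_{N+1}$ shrinks to zero. A short sketch (my own illustration):

```python
import math

# Partial sums of the alternating series sum_{n>=1} (-1)^n / sqrt(n).
# Consecutive partial sums s(N), s(N+1) bracket the limit; the gap between
# them is a_{N+1} = 1/sqrt(N+1), which shrinks to zero.
def s(N):
    return sum((-1)**n / math.sqrt(n) for n in range(1, N + 1))

for N in [10, 100, 1000]:
    print(N, s(N), s(N + 1), abs(s(N + 1) - s(N)))  # gap = 1/sqrt(N+1)
```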

Above we remarked that the rules for infinite sums very closely resemble those for improper integrals. Here is the reason why:

Theorem 24 (Integral Test). Suppose that $f : [0,\infty) \to \mathbb{R}$ is a continuous, decreasing function such that $\lim_{x\to\infty} f(x) = 0$. Then
$$\int_0^\infty f(x)\,dx \quad\text{converges if and only if}\quad \sum_{n=0}^\infty f(n) \quad\text{converges.}$$

That is, to check convergence of the sum, we simply need to check convergence of the integral. Morally: "sums are integrals and vice versa."

Finally, there are two more tests for convergence which are very useful.

Theorem 25 (Ratio Test). Suppose that $(a_n)$ is a sequence such that $a_n \ne 0$ for all sufficiently large $n$ and assume that
$$\lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| = L \in [0,\infty). \qquad \text{(i.e., assume the limit exists)}$$
Then

(a) if $L < 1$, then $\sum_{n=1}^\infty a_n$ converges,

(b) if $L > 1$, then $\sum_{n=1}^\infty a_n$ diverges.

Theorem 26 (Root Test). Suppose that $(a_n)$ is a sequence and assume that
$$\lim_{n\to\infty} \sqrt[n]{|a_n|} = L \in [0,\infty). \qquad \text{(i.e., assume the limit exists)}$$
Then

(a) if $L < 1$, then $\sum_{n=1}^\infty a_n$ converges,

(b) if $L > 1$, then $\sum_{n=1}^\infty a_n$ diverges.

Note that neither the ratio test nor the root test can address the case where the given limit is $L = 1$; in this case, these tests are inconclusive and another test must be used. Also, it is not strictly necessary for the limit to exist. We could replace $L$ with the limit superior (which always exists as a number in $[0,\infty]$) in either case and the theorems continue to hold.

We introduce one more convergence test which is not always covered in the Calculus II curriculum but can be very useful.

Theorem 27 (Cauchy Condensation Test). Let $(a_n)$ be a non-negative, decreasing sequence such that $a_n \to 0$. Then
$$\sum_{n=1}^\infty a_n \quad\text{converges if and only if}\quad \sum_{n=1}^\infty 2^n a_{2^n} \quad\text{converges.}$$

This test can be useful in series involving logarithms since $\log(2^n) = n\log(2)$.

Example 28. Determine the convergence/divergence of
$$\text{(a)}\ \sum_{n=1}^\infty \frac{n^2+4}{n^3 + n^{3/2}}, \qquad \text{(b)}\ \sum_{n=1}^\infty \frac{(-1)^n}{n + \log(n + \log(n))}, \qquad \text{(c)}\ \sum_{n=0}^\infty \pi^{-n}, \qquad \text{(d)}\ \sum_{n=1}^\infty \frac{1}{\arctan(n)^n}.$$

Solution. There are several ways to check convergence or divergence. We suggest one way for each sum. Series (a) diverges by limit comparison with the harmonic series. Series (b) converges by the alternating series test. Series (c) converges by the ratio or root test. Series (d) converges by the root test.

Note that series (c) above is a special series. The underlying sequence is a geometric progression, i.e., a sequence of the form $cr^0, cr^1, cr^2, cr^3, \ldots$ (above we have $c = 1$, $r = 1/\pi$). Such series are so important that they are not only given their own name, they are evaluated explicitly.

Proposition 29 (Geometric Series). Let $c \in \mathbb{R} \setminus \{0\}$. Then
$$\sum_{n=0}^\infty cr^n = \begin{cases} \dfrac{c}{1-r}, & \text{if } |r| < 1, \\[4pt] \text{divergent}, & \text{if } |r| \ge 1. \end{cases}$$

This is easily proven by showing (using induction) that
$$\sum_{n=0}^N cr^n = \frac{c(1 - r^{N+1})}{1-r}$$
and then taking the limit (of course, $r = 1$ needs to be dealt with separately, but this is no problem).

This last proposition is a nice segue into the final topic of Calculus II: series of functions and, specifically, the Taylor series. Note that in the above proposition, the function $f : (-1,1) \to \mathbb{R}$ defined by
$$f(x) = \frac{1}{1-x}, \qquad x \in (-1,1),$$
could just as well be defined by
$$f(x) = \sum_{n=0}^\infty x^n, \qquad x \in (-1,1).$$

A natural question is then: what other functions have such representations as "infinite polynomials"? To understand this question, we may think of approximating a function locally by polynomials and letting the degree of the approximating polynomial tend to $\infty$. Recall that differentiable functions look "locally linear"; thus it is natural to think that smooth functions (i.e., functions that are infinitely many times differentiable) may look locally like a polynomial of any degree that we choose. While this is not true of all smooth functions, it is true for most of the functions that one typically encounters in a Calculus course. In what follows, unless explicitly stated, we assume all functions that we introduce are smooth.

For demonstrative purposes, recall that the tangent line to a function $f(x)$ at a point $a$ is given by
$$p_1(x) = f(a) + f'(a)(x-a).$$
You may notice that $p_1$ is the unique first-degree polynomial which matches the value of $f$ at $a$ and the value of $f'$ at $a$. If we want to also match the second derivative of $f$ at $a$, we will need to up the order of the polynomial, but some work shows that
$$p_2(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2$$
will work. Likewise, we can match the first $N$ derivatives of $f$ at $a$ using the polynomial
$$p_N(x) = \sum_{n=0}^N \frac{f^{(n)}(a)}{n!}(x-a)^n.$$

These polynomials $p_N$ are sometimes called the $N$th order Taylor approximations to $f$ (at the point $x = a$). To see that these do actually approximate $f$ near $x = a$, consider the following theorem.

Theorem 30 (Taylor's Theorem). Let $f : \mathbb{R} \to \mathbb{R}$ be $N$ times differentiable at $a \in \mathbb{R}$. Then there is a function $R_N : \mathbb{R} \to \mathbb{R}$ such that
$$f(x) = p_N(x) + R_N(x) = \left(\sum_{n=0}^N \frac{f^{(n)}(a)}{n!}(x-a)^n\right) + R_N(x), \qquad x \in \mathbb{R},$$
and $R_N(x) \ll (x-a)^N$ as $x \to a$.

This shows that locally, $p_N$ approximates $f$ to at least $N$th order. There are several ways to explicitly identify what form the function $R_N(x)$ actually takes, but they are not of great importance for the math subject GRE; error bounds are more practical.

Proposition 31 (Error Bound for Taylor's Theorem). Suppose that $f : \mathbb{R} \to \mathbb{R}$ is $N+1$ times differentiable in a neighborhood $(a-\delta, a+\delta)$ of $a \in \mathbb{R}$ and that $|f^{(N+1)}(x)| \le M$ for all $x \in (a-\delta, a+\delta)$. Further, let $R_N : \mathbb{R} \to \mathbb{R}$ be as in Taylor's Theorem. Then for all $x \in (a-\delta, a+\delta)$, we have the bound
$$|R_N(x)| \le \frac{M|x-a|^{N+1}}{(N+1)!}.$$

Intuitively, this tells you that if $f$ is smooth enough, then $|f - p_N| = O(|x-a|^{N+1})$ as $x \to a$; that is, the error in the approximation should be on the order of $|x-a|^{N+1}$. This error bound can be used to calculate certain functions within a given tolerance.

Example 32. Suppose that $p_N$ is the $N$th order Taylor approximation to $e^x$ centered at $x = 0$. How large does $N$ need to be so that $p_N(1)$ approximates the value of $e$ to two decimal places?

Solution. We see that
$$|e - p_N(1)| = |R_N(1)| \le \frac{M|1-0|^{N+1}}{(N+1)!}$$
where $M$ is the maximum of the $(N+1)$st order derivative of $e^x$. Since $\frac{d^n}{dx^n}(e^x) = e^x$ for any $n$, we easily find that $M = e$. To get the first two digits correct, we need $|R_N(1)| \le 0.01$. Thus we simply choose $N$ to accomplish this. We find that using the above bound, $N = 5$ gives $|R_N(1)| \lesssim 0.003$, which is good enough (one can also verify that $N = 4$ is not good enough).
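The bound and the actual error are both easy to tabulate; a short sketch (my own addition, not part of the notes) compares them for a few values of $N$:

```python
import math

# N-th order Taylor approximation to e^x at 0, evaluated at x = 1,
# together with the error bound e/(N+1)! from the proposition above.
def p(N, x=1.0):
    return sum(x**n / math.factorial(n) for n in range(N + 1))

for N in [3, 4, 5]:
    bound = math.e / math.factorial(N + 1)
    print(N, abs(math.e - p(N)), bound)

# The bound with N = 5 is below 0.01, so p_5(1) = 2.71666... gives e to
# two decimal places; the bound with N = 4 is not below 0.01.
```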

These propositions don't quite answer our question. We would like to be able to write $f$ as an "infinite polynomial," but these only give finite polynomial approximations. We pass to the "infinite polynomial" by taking $N \to \infty$. However, some care is required.

Definition 33 (Analyticity). A smooth function $f : \mathbb{R} \to \mathbb{R}$ is said to be analytic at $a \in \mathbb{R}$ if there is some sequence $(c_n)$ and some open neighborhood $I \subseteq \mathbb{R}$ with $a \in I$ such that
$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n, \qquad \text{for all } x \in I.$$

In this case, we will have
$$c_n = \frac{f^{(n)}(a)}{n!}$$
so that the sum above is exactly the limit of the Taylor approximations. We say that $f$ is analytic in an open interval $I \subseteq \mathbb{R}$ if $f$ is analytic at each point $a \in I$. In this case, we call the sum
$$\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n$$
the Taylor series for $f$ centered at $a$.

This gives us the definition that we want but does not tell us what functions are analytic.
For that, we use the above error bound.

Proposition 34. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is smooth and that for some open interval $I \subseteq \mathbb{R}$, the sequence $(M_N)$ given by
$$M_N = \max_{x \in I} \left|f^{(N)}(x)\right|$$
is bounded. Then $f$ is analytic in $I$; that is,
$$f(x) = \lim_{N\to\infty} p_N(x), \qquad x \in I.$$
[Under these hypotheses, we can actually make the stronger claim that $p_N \to f$ uniformly in $I$.]

This is easily proven using the error bound in Proposition 31. Note, this is not strictly a necessary condition. For example, $e^x$ is not bounded on $\mathbb{R}$ (nor are its derivatives), but it can be verified that $e^x$ can be represented by its Taylor series on all of $\mathbb{R}$.

Proposition 35 (Radius of Convergence). Suppose that $f : \mathbb{R} \to \mathbb{R}$ is analytic at $a \in \mathbb{R}$ and that $(c_n)$ is as in the definition of analyticity. Then we can take $I = (a-r, a+r)$ where
$$r = \frac{1}{\limsup_{n\to\infty} |c_n|^{1/n}}$$
(interpreted as $+\infty$ when the limit superior is $0$). That is, if $f$ is analytic at $a$, then it is analytic in an open neighborhood centered at $a$. The value $r$ here is called the radius of convergence of the Taylor series of $f$ centered at $a$ and it can be $+\infty$.

Example 36. Above we showed that
$$\frac{1}{1-x} = \sum_{n=0}^\infty x^n \qquad \text{for } x \in (-1,1).$$

The above proposition shows that we cannot enlarge $(-1,1)$ at all; i.e., this is the maximum open interval on which we have this equality. This is because the radius of convergence here is
$$r = \frac{1}{\limsup_{n\to\infty} |1|^{1/n}} = 1.$$

Note, this radius is found using the root test; due to some compatibility conditions between the root and ratio tests, in most exercises it can be found just as easily using the ratio test, which is often simpler to apply. Also, in most practical examples, the limit $\lim_{n\to\infty} |c_n|^{1/n}$ will actually exist and so we can do away with the $\limsup$. Finally, note that the proposition says that $f$ is analytic on $(a-r, a+r)$; it does not say anything about the boundary points. At these points, $f$ may still be equal to its Taylor series or the series may fail to converge. They must be checked separately. The set where a series converges is called the interval of convergence. By the above proposition, $I = (a-r, a+r)$ will be the interior of the interval of convergence.
Example 37. Find the interval of convergence of the series $\sum_{n=1}^\infty \frac{x^n}{n}$.

Solution. By the strict definition, the radius of convergence is
$$r = \frac{1}{\limsup_{n\to\infty} (1/n)^{1/n}} = 1.$$

This shows that the series converges on $(-1,1)$. We can find the same result using the ratio test. By the ratio test, the series will converge when
$$\lim_{n\to\infty} \frac{n\,|x|^{n+1}}{(n+1)\,|x|^n} = |x| \lim_{n\to\infty} \frac{n}{n+1} = |x| < 1,$$
which again shows convergence on $(-1,1)$. The endpoints need to be checked separately (this is where the ratio test and root test will fail). At $x = 1$, we have the harmonic series, which is divergent. At $x = -1$, we have the alternating harmonic series, which converges by the alternating series test. Thus the interval of convergence is $[-1,1)$.
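The contrast between the two endpoints is easy to see numerically; the following sketch (my own, using the known fact that the alternating harmonic series sums to $-\log(2)$ when written as $\sum (-1)^n/n$) tabulates partial sums at $x = -1$ and $x = 1$:

```python
import math

# Partial sums of sum_{n>=1} x^n / n at the endpoints of (-1, 1).
def s(x, N):
    return sum(x**n / n for n in range(1, N + 1))

# x = -1: alternating harmonic series, converges to -log(2)
print(s(-1.0, 100000), -math.log(2))
# x = 1: harmonic series, partial sums grow like log(N) without bound
print(s(1.0, 1000), s(1.0, 100000))
```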

Note, this same convergence test can be performed for series that are not necessarily power series. For example, we could use the ratio test to find where $\sum_{n=0}^\infty \frac{x^n}{1+x^n}$ converges.

There are several Taylor series which are ubiquitous enough to merit memorization (alternatively, these can be derived by quickly taking the derivatives and using the formula).

Proposition 38 (Specific Taylor Series). We have

(a) $e^x = \displaystyle\sum_{n=0}^\infty \frac{x^n}{n!}$ for all $x \in \mathbb{R}$,

(b) $\cos(x) = \displaystyle\sum_{n=0}^\infty \frac{(-1)^n x^{2n}}{(2n)!}$ for all $x \in \mathbb{R}$,

(c) $\sin(x) = \displaystyle\sum_{n=0}^\infty \frac{(-1)^n x^{2n+1}}{(2n+1)!}$ for all $x \in \mathbb{R}$,

(d) $\dfrac{1}{1-x} = \displaystyle\sum_{n=0}^\infty x^n$ for $x \in (-1,1)$.

Note that all these series are centered at $x = 0$; it is fairly uncommon at this level (or on the math subject GRE) to come across a Taylor series centered somewhere besides $x = 0$, though it does occasionally happen. Note that the series for $\cos(x)$ and $\sin(x)$ can be derived from the series for $e^x$ using Euler's identity $e^{ix} = \cos(x) + i\sin(x)$ and collecting real and imaginary parts.

From these, using the next two propositions and some clever functional composition, we
can derive many other series for common functions without having to actually work out the
derivatives.

Proposition 39 (Term-by-term Differentiation). Suppose that $f : \mathbb{R} \to \mathbb{R}$ is analytic at $a \in \mathbb{R}$ with series
$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$
converging inside the interval of convergence $I \subseteq \mathbb{R}$. Then for all $x \in I$, we have
$$f'(x) = \sum_{n=1}^\infty n c_n (x-a)^{n-1}.$$

That is, to find the derivative of a convergent series, we can differentiate term-by-term. Note, this proposition states that $f'$ will be analytic with the same radius of convergence as $f$.

Proposition 40 (Term-by-term Integration). Suppose that $f : \mathbb{R} \to \mathbb{R}$ is analytic at $a \in \mathbb{R}$ with series
$$f(x) = \sum_{n=0}^\infty c_n (x-a)^n$$
converging inside the interval of convergence $I \subseteq \mathbb{R}$. Let $F : \mathbb{R} \to \mathbb{R}$ be any antiderivative of $f$. Then for all $x \in I$, we have
$$F(x) = F(a) + \sum_{n=0}^\infty c_n \frac{(x-a)^{n+1}}{n+1}.$$

That is, to find the integral of a series, we can integrate term-by-term. Note that in this case, the radius of convergence also does not change.

Example 41. Find the Taylor series for $\cosh(x)$ and $\log(1+x^2)$ centered at $x = 0$. What are the intervals of convergence for these series?

Solution. Recall that $\cosh(x) = \frac12(e^x + e^{-x})$. To find the series for $\cosh(x)$, we can simply add the series for $e^x$ and $e^{-x}$. Since

$e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$ for all $x \in \mathbb{R}$,

we see that

$e^{-x} = \sum_{n=0}^\infty \frac{(-1)^n x^n}{n!}$ for all $x \in \mathbb{R}$.

When adding these together, the odd terms will cancel and the even terms will be doubled. Thus we have

$\cosh(x) = \sum_{n=0}^\infty \frac{x^{2n}}{(2n)!}.$

Since this is just the sum of two series which converge everywhere, this series should also converge everywhere; this can be checked using the ratio test. The interval of convergence is $\mathbb{R}$.
For $\log(1 + x^2)$, we note that

$\log(1 + x^2) = \int_0^x \frac{2t}{1+t^2}\,dt.$

Now we can expand the integrand in a Taylor series. Using the geometric series, we see that

$\frac{1}{1+t^2} = \sum_{n=0}^\infty (-1)^n t^{2n}$ for $t \in (-1, 1)$.

Then for such $t$,

$\frac{2t}{1+t^2} = 2\sum_{n=0}^\infty (-1)^n t^{2n+1}.$

Thus integrating term-by-term gives

$\log(1 + x^2) = \sum_{n=0}^\infty \frac{(-1)^n x^{2n+2}}{n+1}, \qquad x \in (-1, 1).$

Note that at both $x = -1$ and $x = 1$, this series will converge (by the alternating series test), thus the interval of convergence is $[-1, 1]$.

One large application of Taylor series is the evaluation of infinite sums. Plugging particular values of $x$ into the series listed in Proposition 38 gives the values of certain infinite sums. For example,

$\sum_{n=0}^\infty \frac{1}{3^n} = \frac{1}{1 - (1/3)} = \frac{3}{2}$  or  $\sum_{n=0}^\infty \frac{(-1)^n}{n!} = e^{-1}.$

Taylor series can help with much more involved sums, though this sometimes requires skillful manipulations and some fortuitous recognition of series.

Example 42. Evaluate the following infinite sums.


(a) $\frac{1}{4} + \frac{2}{16} + \frac{3}{64} + \frac{4}{256} + \cdots$,  (b) $\sum_{k=0}^\infty \frac{(-1)^k \, 2k \, \pi^{2k}}{(2k+2)!}$,  (c) $\sum_{n=0}^\infty \frac{(-1)^n}{n+1}$.

Solution. For (a), we can write the sum as

$\sum_{n=1}^\infty \frac{n}{4^n}.$

Notice that we have a geometric term $1/4^n$ in the summand (though it is accompanied by the factor $n$); this should hint that (a) comes from the geometric series somehow. Indeed, to make that $n$ appear in the numerator, we can differentiate the geometric series:

$\frac{1}{1-x} = \sum_{n=0}^\infty x^n \implies \frac{1}{(1-x)^2} = \sum_{n=1}^\infty n x^{n-1}.$

Plugging in $x = 1/4$ gives

$\frac{16}{9} = \sum_{n=1}^\infty \frac{n}{4^{n-1}} = 4\sum_{n=1}^\infty \frac{n}{4^n} \implies \sum_{n=1}^\infty \frac{n}{4^n} = \frac{4}{9}.$

For (b), we notice the factorial in the denominator: this indicates that the sum can likely be evaluated using the Taylor series for $\sin(x)$, $\cos(x)$ or $e^x$. Call the sum $S$. Since $2k = (2k+2) - 2$,

$S = \sum_{k=0}^\infty \frac{(-1)^k (2k+2)\pi^{2k}}{(2k+2)!} - 2\sum_{k=0}^\infty \frac{(-1)^k \pi^{2k}}{(2k+2)!} =: S_1 + S_2.$

We see

$S_1 = \sum_{k=0}^\infty \frac{(-1)^k \pi^{2k}}{(2k+1)!} = \frac{1}{\pi}\sum_{k=0}^\infty \frac{(-1)^k \pi^{2k+1}}{(2k+1)!} = \frac{\sin(\pi)}{\pi} = 0.$

Now

$S_2 = -2\sum_{k=0}^\infty \frac{(-1)^k \pi^{2k}}{(2k+2)!} = \frac{2}{\pi^2}\sum_{k=0}^\infty \frac{(-1)^{k+1} \pi^{2k+2}}{(2k+2)!}.$

This looks exactly like the Taylor series for $\cos(x)$ with $x = \pi$ plugged in, except that $k$ is replaced in the summand by $k+1$. This has the effect of omitting the first term in the Taylor series, which is $\cos(0) = 1$. Thus

$S_2 = \frac{2}{\pi^2}\left(-1 + \sum_{k=0}^\infty \frac{(-1)^k \pi^{2k}}{(2k)!}\right) = \frac{2}{\pi^2}(-1 + \cos(\pi)) = -\frac{4}{\pi^2}.$

Thus

$S = S_1 + S_2 = -\frac{4}{\pi^2}.$
For (c), we can imagine this may have arisen from plugging a value of $x$ into a Taylor series. Indeed, for $x \in (-1, 1)$,

$\sum_{n=0}^\infty \frac{x^{n+1}}{n+1} = \sum_{n=0}^\infty \int_0^x t^n\,dt = \int_0^x \left(\sum_{n=0}^\infty t^n\right) dt = \int_0^x \frac{dt}{1-t} = -\log(1-x).$

Now substituting $x = -1$, we see

$\log(2) = \sum_{n=0}^\infty \frac{(-1)^n}{n+1}.$

Note, at the end here, we sort of cheated: we substituted in $x = -1$ when the original manipulation was only valid for $x \in (-1, 1)$. This is justified by Abel's Theorem.

Theorem 43 (Abel’s Theorem). Suppose that f is analytic in (a r, a + r) ✓ R with


series 1
X
f (x) = cn (x a)n .
n=0

If
P1 f is continous from the leftPat a + r (resp. continuous from the rightPat a r), then
n 1 n 1 n
n=0 cn r Pconverges (resp. n=0 cn ( r) converges) and f (a + r) = n=0 cn r (resp.
1
f (a r) = n=0 cn ( r)n ).
This theorem validates that $-\log(1-x) = \sum_{n=0}^\infty \frac{x^{n+1}}{n+1}$ continues to hold at $x = -1$: the series converges there by the alternating series test, so its value is $\lim_{x \to -1^+} \big({-\log(1-x)}\big) = -\log(2)$.
Another use for Taylor series is in evaluating limits, since Taylor series determine the local behavior of a function. For example, we saw previously using l'Hôpital's rule that

$\lim_{x \to 0} \frac{\sin(x)}{x} = 1.$
This can easily be computed using Taylor approximations as well. Writing out the first few
terms of the Taylor series, we see that near x = 0,

$\sin(x) \approx x - \frac{x^3}{6}.$

Thus

$\frac{\sin(x)}{x} \approx 1 - \frac{x^2}{6} \to 1$ as $x \to 0$.
This can greatly simplify limits which appear tricky at first.

Example 44. Compute $\lim_{x \to 0} \frac{\sin(x - \sin(x))}{x^3}$.

Solution. This limit can be performed using l'Hôpital's rule and manipulating the result, but with knowledge of Taylor series it is very easy. From the above, we see that near $x = 0$, we have

$x - \sin(x) = \frac{x^3}{6} + O(x^5).$

In the limit, the higher order terms disappear, so we see

$\lim_{x \to 0} \frac{\sin(x - \sin(x))}{x^3} = \lim_{x \to 0} \frac{\sin(x^3/6)}{x^3},$

and since $\sin(x) \approx x$ for $x$ near $0$, we have $\sin(x^3/6) \approx x^3/6$, so

$\lim_{x \to 0} \frac{\sin(x - \sin(x))}{x^3} = \lim_{x \to 0} \frac{x^3/6}{x^3} = \frac{1}{6}.$
As a final note, in light of our viewing Taylor series as “infinite polynomials,” there should
be some compatibility with finite polynomials. Indeed, there is.

Proposition 45. For any polynomial $p : \mathbb{R} \to \mathbb{R}$ and any $a \in \mathbb{R}$, we see

$p(x) = \sum_{n=0}^\infty \frac{p^{(n)}(a)}{n!}(x-a)^n$, for all $x \in \mathbb{R}$.

Note that there is no issue checking convergence since $p^{(n)}(x) = 0$ for all $x$ when $n$ is larger than the degree of the polynomial; thus the sum is actually finite. This proposition tells us that any polynomial is equal to its Taylor series centered at any point. This fact is occasionally useful on the math subject GRE; for example, one problem on a previous math GRE asks you to identify the values of $a_0, a_1, a_2, a_3$ so that

$x^3 - x + 1 = a_0 + a_1(x-2) + a_2(x-2)^2 + a_3(x-2)^3,$

which is very simple if you view the right-hand side as the Taylor series for the polynomial $p(x) = x^3 - x + 1$ centered at $x = 2$.
Christian Parkinson GRE Prep: Calculus III Notes 1

Week 3: Calculus III


Notes
While my previous notes attempted to give a fairly comprehensive view of Calculus I and Calculus II, it is at this point that I give up on that approach, simply because there would be too much material to cover. Accordingly, these notes contain only the definitions and results relevant for the math subject GRE, with very little discussion and very few examples.

We begin with the definitions necessary for three dimensional geometry.

Definition 1 ($\mathbb{R}^3$ as a vector space). A vector $v \in \mathbb{R}^3$ is an ordered triple of coordinates $v = (x, y, z)$ where $x, y, z \in \mathbb{R}$. Intuitively, drawing three perpendicular axes representing the $x$, $y$ and $z$ dimensions, we can think of $v$ as pointing from the origin to the point $(x, y, z)$ determined by these axes. For two vectors $v_1 = (x_1, y_1, z_1)$ and $v_2 = (x_2, y_2, z_2)$, we define the operation of addition by

$v_1 + v_2 = (x_1 + x_2, y_1 + y_2, z_1 + z_2).$

We also define scalar multiplication by

$\alpha v_1 = (\alpha x_1, \alpha y_1, \alpha z_1)$

for any $\alpha \in \mathbb{R}$.

Definition 2 (Magnitude). For a vector $v = (x, y, z) \in \mathbb{R}^3$, we define the magnitude (also referred to as the "length" or "norm") of $v$ by

$\|v\| = \sqrt{x^2 + y^2 + z^2}.$

Intuitively, this gives the length from the origin $(0, 0, 0)$ to the point $(x, y, z)$.

Definition 3 (Standard Basis Vectors). Define

$\hat\imath = (1, 0, 0), \quad \hat\jmath = (0, 1, 0), \quad \hat k = (0, 0, 1).$

Thus any vector $v = (x, y, z)$ can be written as

$v = x\hat\imath + y\hat\jmath + z\hat k.$

Definition 4 (Dot & Cross Product). For $v_1 = (x_1, y_1, z_1)$ and $v_2 = (x_2, y_2, z_2)$, we define the dot product by

$v_1 \cdot v_2 = x_1 x_2 + y_1 y_2 + z_1 z_2$

and the cross product by

$v_1 \times v_2 = (y_1 z_2 - y_2 z_1)\hat\imath + (x_2 z_1 - x_1 z_2)\hat\jmath + (x_1 y_2 - x_2 y_1)\hat k.$



Formally, we can compute the cross product by taking the determinant

$v_1 \times v_2 = \det \begin{pmatrix} \hat\imath & \hat\jmath & \hat k \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \end{pmatrix}.$
Proposition 5 (Geometric Interpretation of Dot Product). For vectors $v_1, v_2$, we have

$v_1 \cdot v_2 = \|v_1\|\|v_2\|\cos(\theta)$

where $\theta \in [0, \pi]$ is the angle between the vectors. Thus two vectors are perpendicular iff $v_1 \cdot v_2 = 0$; in this case, we call the vectors orthogonal. Two vectors are parallel iff $|v_1 \cdot v_2| = \|v_1\|\|v_2\|$.

Proposition 6 (Geometric Interpretation of Cross Product). For vectors $v_1, v_2$, the cross product $v_1 \times v_2$ is the unique vector (up to scaling) which is perpendicular to both $v_1$ and $v_2$. Also

$\|v_1 \times v_2\| = \|v_1\|\|v_2\|\sin(\theta)$

where $\theta \in [0, \pi]$ is the angle between the vectors.

We use vectors to define planes; these are the higher dimensional analogs to lines in $\mathbb{R}^2$.

Proposition 7 (Point-Normal Form of a Plane). Suppose that $v_0 = (x_0, y_0, z_0)$ is in a plane and the vector $n = (n_x, n_y, n_z)$ is orthogonal to every vector lying in the plane (such a vector $n$ is called a normal vector to the plane). Then the equation of the plane is

$n_x x + n_y y + n_z z = n_x x_0 + n_y y_0 + n_z z_0.$

Equivalently, the plane is given by $v \cdot n = v_0 \cdot n$ or $n \cdot (v - v_0) = 0$ where $v = (x, y, z)$.

Proposition 8 (Plane Containing Three Points). Suppose that v1 , v2 , v3 lie in a


plane. Then the equation for the plane is
$[(v_2 - v_1) \times (v_3 - v_2)] \cdot v = [(v_2 - v_1) \times (v_3 - v_2)] \cdot v_1.$
From here we move on to multi-dimensional calculus. This should all be seen as an easy
generalization of calculus in one variable. It should be said that, while we develop these
results for functions of two or three variables, this can all be generalized further to functions
of n variables.

Definition 9 (Partial Derivatives). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto f(x, y)$. The partial derivative of $f$ with respect to $x$ is given by

$\frac{\partial f}{\partial x}(x, y) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y) - f(x, y)}{\Delta x}$

whenever the limit exists. Likewise, the partial derivative of $f$ with respect to $y$ is given by

$\frac{\partial f}{\partial y}(x, y) = \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}$

whenever the limit exists. Alternately, these are often written as $f_x$ and $f_y$ respectively.

A function $f : \mathbb{R}^2 \to \mathbb{R}$ defines a surface $(x, y, f(x, y))$. Intuitively, these partial derivatives measure the slope of that surface in the $x$ and $y$ directions respectively. From here, all derivative rules follow from the same rules in the single variable case. One derivative rule that is worth restating is a multi-dimensional version of the chain rule.

Proposition 10 (Chain Rule). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ has continuous partial derivatives and that $x, y : \mathbb{R} \to \mathbb{R}$ have continuous derivatives. Then

$\frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt}(t) + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}(t).$
Example 11. Let $z = F(x, y, w)$ where $x = f(u, w)$ and $y = g(u, w)$, and all these functions are differentiable on their respective domains. Find general expressions for $\frac{\partial z}{\partial w}$ and $\frac{\partial z}{\partial u}$.

Solution. With multiple applications of the chain rule, we see that

$\frac{\partial z}{\partial w} = \frac{\partial F}{\partial x}\frac{\partial x}{\partial w} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial w} + \frac{\partial F}{\partial w},$

and

$\frac{\partial z}{\partial u} = \frac{\partial F}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial F}{\partial y}\frac{\partial y}{\partial u}.$
We can compound partial derivatives on top of each other by adding differentials or subscripts to our notation; e.g.

$f_{xx} = \frac{\partial}{\partial x}\frac{\partial f}{\partial x}$ or $\frac{\partial^3 f}{\partial x^2 \partial y} = \frac{\partial}{\partial x}\frac{\partial}{\partial x}\frac{\partial f}{\partial y}.$
Theorem 12 (Equality of Mixed Partials). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is twice differentiable (i.e., $f_{xx}, f_{xy}, f_{yx}, f_{yy}$ all exist). If $f_{xy}$ and $f_{yx}$ are continuous, then $f_{xy} = f_{yx}$. That is, when taking mixed partial derivatives, we can take the partial derivatives in any order and arrive at the same result, so long as $f$ is sufficiently smooth.

Just as we looked for tangent lines in Calculus I, we can look for tangent planes in multiple dimensions.

Proposition 13 (Tangent Plane). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $(x_0, y_0) \in \mathbb{R}^2$. Then the plane tangent to the surface $(x, y, f(x, y))$ at $(x_0, y_0)$ is given by the equation

$z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).$
More generally, a surface can be given by a level set of a function of three variables: $g(x, y, z) = C$ for some differentiable function $g : \mathbb{R}^3 \to \mathbb{R}$. We can use the above formula

along with implicit differentiation to find tangent planes in this case.

Example 14. Find the tangent plane to the surface $x^2 + \log(y) + (z+1)e^z = 5$ at the point $(2, 1, 0)$.

Solution. Here we cannot solve for $z$ explicitly as a function of $x$ and $y$, but as above, the tangent plane should be given by

$z = z_0 + \frac{\partial z}{\partial x}(x_0, y_0, z_0)(x - x_0) + \frac{\partial z}{\partial y}(x_0, y_0, z_0)(y - y_0).$

Differentiating the equation with respect to $x$, we see

$2x + e^z \frac{\partial z}{\partial x} + (z+1)e^z \frac{\partial z}{\partial x} = 0.$

Plugging in $(x, y, z) = (2, 1, 0)$, we see

$4 + \frac{\partial z}{\partial x} + \frac{\partial z}{\partial x} = 0 \implies \frac{\partial z}{\partial x} = -2$ at $(2, 1, 0)$.

Likewise, differentiating with respect to $y$ gives

$\frac{1}{y} + e^z \frac{\partial z}{\partial y} + (z+1)e^z \frac{\partial z}{\partial y} = 0,$

whence plugging in our point gives

$1 + 2\frac{\partial z}{\partial y} = 0 \implies \frac{\partial z}{\partial y} = -\frac{1}{2}$ at $(2, 1, 0)$.

Thus the tangent plane at that point is given by

$z = -2(x - 2) - \frac{1}{2}(y - 1).$
While $f_x$ and $f_y$ give information about how $f$ changes in the $x$ or $y$ direction irrespective of the other, it can be important to determine how $f$ changes in other directions as well. To this end, we define the gradient of a function $f$.

Definition 15 (Gradient). Suppose $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable. We define the gradient of $f$ to be the vector

$\nabla f(x, y) = \left(\frac{\partial f}{\partial x}(x, y), \frac{\partial f}{\partial y}(x, y)\right).$

Likewise, if $g : \mathbb{R}^3 \to \mathbb{R}$ is differentiable, we define the gradient of $g$ to be the vector

$\nabla g(x, y, z) = \left(\frac{\partial g}{\partial x}(x, y, z), \frac{\partial g}{\partial y}(x, y, z), \frac{\partial g}{\partial z}(x, y, z)\right).$

Definition 16 (Directional Derivative). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable. The directional derivative of $f$ at $(x, y)$ in the direction of $u$ is given by

$D_u f(x, y) = \nabla f(x, y) \cdot u.$

Here $u$ is assumed to be a direction vector; i.e. $\|u\| = 1$. If the norm of $u$ is not $1$, then it must be normalized.

Proposition 17 (Intuition Behind Gradient). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable and $(x, y) \in \mathbb{R}^2$. Then $\nabla f(x, y)$ points in the direction in which $f$ increases most rapidly from $(x, y)$, and the rate of this increase is $\|\nabla f(x, y)\|$. Note: this result is true in higher dimensions as well.

Proposition 18 (Geometric Property of Gradient). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable. Then $\nabla f(x_0, y_0)$ is normal to the level curve $f(x, y) = C$ which contains $(x_0, y_0)$. Likewise if $g : \mathbb{R}^3 \to \mathbb{R}$ is differentiable, then $\nabla g(x_0, y_0, z_0)$ is normal to the level surface $g(x, y, z) = C$ containing $(x_0, y_0, z_0)$.

Note: Proposition 18 (combined with Proposition 7) gives another way of computing the
tangent plane to the surface g(x, y, z) = C.

As in Calculus I, another large application of the derivative is the minimization and maximization of functions.

Theorem 19 (Critical Points & Second Derivative Test). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is smooth and that $(x_0, y_0)$ is a local extreme point for $f$. Then $\nabla f(x_0, y_0) = 0$. Conversely, define

$D(x_0, y_0) := f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - [f_{xy}(x_0, y_0)]^2.$

If $\nabla f(x_0, y_0) = 0$, then

(a) if $D(x_0, y_0) > 0$ and $f_{xx}(x_0, y_0) > 0$, then $f$ has a local minimum at $(x_0, y_0)$,

(b) if $D(x_0, y_0) > 0$ and $f_{xx}(x_0, y_0) < 0$, then $f$ has a local maximum at $(x_0, y_0)$,

(c) if $D(x_0, y_0) < 0$, then $f$ has a saddle point at $(x_0, y_0)$.


This is a strong theorem but often times, we would like to maximize a function with
respect to some constraint.

Theorem 20 (Lagrange Multiplier Method). Suppose that $f, g : \mathbb{R}^2 \to \mathbb{R}$ are smooth and that the level sets $g(x, y) = C$ are compact. The maximum value of $f(x, y)$ on the curve $g(x, y) = C$ occurs at a point $(x_0, y_0)$ such that

$\nabla f(x_0, y_0) = \lambda \nabla g(x_0, y_0)$

for some real number $\lambda$. Note that this also holds in higher dimensions.

From here, we move to multi-dimensional integration.

Definition 21 (Line Integral of a Scalar Function). Let $r(t) = (x(t), y(t))$, $a \le t \le b$, be a piecewise smooth parameterization of a curve $C$ in $\mathbb{R}^2$ and let $f : \mathbb{R}^2 \to \mathbb{R}$ be continuous. The line integral of $f$ over the curve $C$ is given by

$\int_C f\,ds = \int_a^b f(r(t))\,\|r'(t)\|\,dt = \int_a^b f(x(t), y(t))\sqrt{x'(t)^2 + y'(t)^2}\,dt.$

Geometrically, evaluating $f$ along the curve creates a new curve which lies on the surface $(x, y, f(x, y))$; this integral gives the area underneath that curve. This definition works just as well in higher dimensions.

Definition 22 (Line Integral of a Vector Field). A vector field is a function $F : \mathbb{R}^2 \to \mathbb{R}^2$. Let $r(t) = (x(t), y(t))$, $a \le t \le b$, be a piecewise smooth parameterization of a curve $C$ in $\mathbb{R}^2$ and let $F : \mathbb{R}^2 \to \mathbb{R}^2$ be a continuous vector field. The line integral of $F$ over the curve $C$ is given by

$\int_C F \cdot dr = \int_a^b F(r(t)) \cdot r'(t)\,dt.$

Note: oftentimes we name the components of $F$ and write this in a different way. If $F(x, y) = (M(x, y), N(x, y))$, then

$\int_C F \cdot dr = \int_a^b [M(x(t), y(t))\,x'(t) + N(x(t), y(t))\,y'(t)]\,dt,$

and with the formal notation $x'(t)\,dt = dx$ and $y'(t)\,dt = dy$, we write this as

$\int_C F \cdot dr = \int_C M(x, y)\,dx + N(x, y)\,dy.$

For the math subject GRE, one should be familiar with both of these notations. Physically, if one imagines that $F$ is a force field, then $\int_C F \cdot dr$ is the amount of work done by this force field on a particle which traverses the path $C$.
In general, the line integral defined in Definition 22 depends not only on the endpoints of the curve, but also on the path taken between them. There is one very important case when this does not happen; i.e., when the integral is independent of path.

Theorem 23 (Fundamental Theorem of Calculus for Line Integrals). Suppose that $C$ is a piecewise smooth curve beginning at point $A$ and ending at point $B$ and that $F : \mathbb{R}^2 \to \mathbb{R}^2$ is continuous. Further, assume there is $f : \mathbb{R}^2 \to \mathbb{R}$ which is continuously differentiable on $\mathbb{R}^2$ and satisfies $\nabla f = F$. Then

$\int_C F \cdot dr = f(B) - f(A).$

Such a function $f$ (if one exists) is called a potential function for $F$.



Definition 24 (Conservative Vector Field). A vector field $F : \mathbb{R}^2 \to \mathbb{R}^2$ is said to be conservative if for any two piecewise smooth curves $C_1, C_2$ with the same start and end point, we have

$\int_{C_1} F \cdot dr = \int_{C_2} F \cdot dr.$

That is, a vector field is said to be conservative if the line integral between any two points is independent of the path traversed between them. Equivalently, $F$ is conservative if

$\int_C F \cdot dr = 0$

for any closed curve $C$ (i.e., a curve which begins and ends at the same point). When the curve is closed, we typically draw the integral sign with a circle:

$\oint_C F \cdot dr = 0.$

An easy consequence of Theorem 23 is that any vector field $F$ which has a potential function is conservative. A natural question is to ask whether we can characterize vector fields with potential functions.

Proposition 25. Suppose that $U \subseteq \mathbb{R}^2$ is open and that $F : U \to \mathbb{R}^2$, $F(x, y) = (M(x, y), N(x, y))$ is a smooth vector field. If $F$ has a potential function, then $M_y = N_x$. Conversely, if $M_y = N_x$ and $U$ is simply connected, then $F$ has a potential function. Thus, when $F$ is smooth and the domain of $F$ is simply connected, the following are equivalent:

(a) $F$ has a potential function,

(b) $M_y = N_x$,

(c) $\oint_C F \cdot dr = 0$ for any closed curve $C$ (contained in the domain of $F$),

(d) $\int_{C_1} F \cdot dr = \int_{C_2} F \cdot dr$ for any two curves $C_1, C_2$ beginning and ending at the same point.

Example 26. Consider the vector field

$F(x, y) = \left(\frac{-y}{x^2 + y^2}, \frac{x}{x^2 + y^2}\right).$

This vector field satisfies $M_y = N_x$. However, if we integrate around the unit circle beginning and ending at the point $(1, 0)$, we find that the path integral of $F$ is non-zero, which shows that $F$ is not conservative. This shows that the assumption that the domain of $F$ is simply connected is necessary in Proposition 25. Indeed, here the domain of $F$ is $\mathbb{R}^2 \setminus \{(0,0)\}$, which is not simply connected.

Thinking geometrically, the most natural generalization of one dimensional integration


(i.e., finding area under a curve) is area integration (i.e., finding volume under a surface).

Definition 27 (Area Integral [non-rigorous definition]). Let $f : \mathbb{R}^2 \to \mathbb{R}$ be continuous and $D \subset \mathbb{R}^2$. The volume underneath the surface $(x, y, f(x, y))$ for $(x, y) \in D$ is given by

$\iint_D f(x, y)\,dA = \lim_{n \to \infty} \sum_{i=1}^n f(x_i, y_i)\,\mathrm{Area}(A_i)$

where $\{A_i\}$ is a discretization of $D$ into small, equally-sized rectangles and $(x_i, y_i) \in A_i$. Rather than write the integral with the infinitesimal $dA$, we sometimes use $dx\,dy$, with the understanding that these notations mean the same thing.

We will never use this definition (or any similar definition). Rather than build area integrals from Riemann sums, we simply note that for nice functions, area integrals can be seen as iterated one-dimensional integrals (Fubini's theorem) when, in each iterated integral, we treat the other variable as a constant. We demonstrate this with examples.

Example 28. Let $f(x, y) = xy$. Calculate the area integral of $f$ over the regions

(a) $D_1 = \{(x, y) \in \mathbb{R}^2 : 0 \le x \le 1,\ 0 \le y \le 1\}$,

(b) $D_2 = \{(x, y) \in \mathbb{R}^2 : 0 \le x \le 1,\ 0 \le y \le x\}$,

(c) $D_3 = \{(x, y) \in \mathbb{R}^2 : 0 \le x \le 1,\ 0 \le y \le x^2\}$.

Solution. (a) To cover this area, we need to integrate from $x = 0$ to $x = 1$ and from $y = 0$ to $y = 1$. Thus

$\iint_{D_1} f(x, y)\,dA = \int_0^1 \int_0^1 xy\,dx\,dy = \int_0^1 y\left[\frac{x^2}{2}\right]_{x=0}^{x=1} dy = \int_0^1 \frac{y}{2}\,dy = \left[\frac{y^2}{4}\right]_{y=0}^{y=1} = \frac{1}{4}.$

Note that we could have equivalently integrated in $y$ first, and then $x$:

$\iint_{D_1} f(x, y)\,dA = \int_0^1 \int_0^1 xy\,dy\,dx = \int_0^1 \frac{x}{2}\,dx = \frac{1}{4}.$

(b) Now to cover the area, we need to integrate from $y = 0$ to $y = x$, then from $x = 0$ to $x = 1$. Thus

$\iint_{D_2} f(x, y)\,dA = \int_0^1 \int_0^x xy\,dy\,dx = \int_0^1 x\left[\frac{y^2}{2}\right]_{y=0}^{y=x} dx = \int_0^1 \frac{x^3}{2}\,dx = \left[\frac{x^4}{8}\right]_{x=0}^{x=1} = \frac{1}{8}.$

Here the bounds depend on the order of integration. We can change the order, but then we must change the bounds. Indeed, $\{0 \le x \le 1, 0 \le y \le x\}$ is the same as $\{0 \le y \le 1, y \le x \le 1\}$, whence

$\iint_{D_2} f(x, y)\,dA = \int_0^1 \int_y^1 xy\,dx\,dy = \int_0^1 y \cdot \frac{1 - y^2}{2}\,dy = \frac{1}{4} - \frac{1}{8} = \frac{1}{8}.$

(c) Here the bounds are $y = 0$ to $y = x^2$ and $x = 0$ to $x = 1$. Thus

$\iint_{D_3} f(x, y)\,dA = \int_0^1 \int_0^{x^2} xy\,dy\,dx = \int_0^1 x\left[\frac{y^2}{2}\right]_{y=0}^{y=x^2} dx = \int_0^1 \frac{x^5}{2}\,dx = \left[\frac{x^6}{12}\right]_{x=0}^{x=1} = \frac{1}{12}.$

Again, if we want to change the order of integration, we need to be careful to change the bounds as well. We see $\{0 \le x \le 1, 0 \le y \le x^2\}$ is the same as $\{0 \le y \le 1, \sqrt{y} \le x \le 1\}$, whence

$\iint_{D_3} f(x, y)\,dA = \int_0^1 \int_{\sqrt{y}}^1 xy\,dx\,dy = \int_0^1 y \cdot \frac{1 - y}{2}\,dy = \frac{1}{4} - \frac{1}{6} = \frac{1}{12}.$
As the above example demonstrated, we can perform the iterated integration in either order. In some examples, it will be very difficult (or even impossible) to perform the integration in one order, but very easy in the other order.

All the same integration techniques from Calculus I can be used when performing the iterated integrals. However, there are also generalizations of those techniques to area integrals themselves, to be applied before setting up the iterated integrals.

Theorem 29 (Integration by Substitution). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is continuous and that $D, \tilde{D} \subseteq \mathbb{R}^2$. Further suppose we are given a smooth, bijective transformation

$\tilde{D} \to D, \qquad (u, v) \mapsto (x(u, v), y(u, v)).$

Then

$\iint_D f(x, y)\,dx\,dy = \iint_{\tilde{D}} f(x(u, v), y(u, v))\,|J(u, v)|\,du\,dv$

where $J(u, v)$ is the Jacobian determinant of the transformation:

$J(u, v) = \det \begin{pmatrix} \frac{\partial x}{\partial u}(u, v) & \frac{\partial x}{\partial v}(u, v) \\ \frac{\partial y}{\partial u}(u, v) & \frac{\partial y}{\partial v}(u, v) \end{pmatrix} = \frac{\partial x}{\partial u}(u, v)\frac{\partial y}{\partial v}(u, v) - \frac{\partial x}{\partial v}(u, v)\frac{\partial y}{\partial u}(u, v).$

One particular transformation is used often enough that it deserves to be singled out.

Proposition 30 (Integration in Polar Coordinates). Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is continuous and that $D \subseteq \mathbb{R}^2$ can be parameterized by

$x = r\cos\theta, \qquad y = r\sin\theta$

for $r_1 \le r \le r_2$ and $\theta_1 \le \theta \le \theta_2$. Then

$\iint_D f(x, y)\,dA = \int_{\theta_1}^{\theta_2} \int_{r_1}^{r_2} f(r\cos\theta, r\sin\theta)\,r\,dr\,d\theta.$

We illustrate each of these with an example.

Example 31. Evaluate the integral of

(d) $f(x, y) = x^2$ over the diamond-shaped region with vertices at $(1, 0), (0, 1), (-1, 0), (0, -1)$,

(e) $g(x, y) = \frac{xy}{x^2 + y^2}$ over the portion of the annulus $1 \le x^2 + y^2 \le 4$ which lies in the second quadrant.

Solution. (d) For the first integral, it is not too difficult to write the region in terms of $x$ and $y$ (the boundaries are given by $x + y = \pm 1$ and $x - y = \pm 1$). However, actually performing the integral with $x$ and $y$ is somewhat tedious. It is much easier if we use the substitution $u = x + y$ and $v = x - y$, since these variables will satisfy $-1 \le u \le 1$, $-1 \le v \le 1$. This transformation is given by

$x = \frac{u + v}{2}, \qquad y = \frac{u - v}{2},$

which has Jacobian determinant $J(u, v) = (1/2)(-1/2) - (1/2)(1/2) = -1/2$. Thus the integral is given by

$\iint_D x^2\,dx\,dy = \frac{1}{2}\int_{-1}^1 \int_{-1}^1 \frac{u^2 + 2uv + v^2}{4}\,du\,dv = \frac{1}{8}\int_{-1}^1 \left(\frac{2}{3} + 2v^2\right)dv = \frac{1}{8}\left(\frac{4}{3} + \frac{4}{3}\right) = \frac{1}{3}.$

(e) Here we use $x = r\cos\theta$, $y = r\sin\theta$ for $1 \le r \le 2$ and $\pi/2 \le \theta \le \pi$. This gives

$\iint_D \frac{xy}{x^2 + y^2}\,dx\,dy = \int_{\pi/2}^{\pi} \int_1^2 \frac{r^2\cos\theta\sin\theta}{r^2}\,r\,dr\,d\theta = \int_{\pi/2}^{\pi} \int_1^2 r\cos\theta\sin\theta\,dr\,d\theta = \frac{3}{2}\int_{\pi/2}^{\pi} \cos\theta\sin\theta\,d\theta = \frac{3}{4}\left[\sin^2\theta\right]_{\theta=\pi/2}^{\theta=\pi} = -\frac{3}{4}.$

One of the main applications of the integral is finding area or volume. We note that in single variable calculus, the integral of 1 over an interval will return the length of that interval. This remains true in higher dimensions: the integral of 1 over a domain in $\mathbb{R}^2$ will give the area of that domain, and the integral of 1 over a domain in $\mathbb{R}^3$ will give the volume of that domain.

Example 32. Find the volume of the region in the first octant bounded above by the surface $z = 1 - x - y$.

Solution. The volume will be given by

$V = \int_0^1 \int_0^{1-x} \int_0^{1-x-y} 1\,dz\,dy\,dx = \int_0^1 \int_0^{1-x} (1 - x - y)\,dy\,dx = \int_0^1 \frac{(1-x)^2}{2}\,dx = \frac{1}{6}.$
Finally, there are three fundamental integration theorems that conclude Calculus III. One of them shows up on the math subject GRE almost every year, so we cover it. The others very seldom appear on the math subject GRE but are nice to know. Accordingly, we will state all three, even though we have not covered all of the material which builds up to the latter two, and so some of the notation and terminology may seem foreign.

Theorem 33 (Green’s Theorem). Suppose that F : R2 ! R2 , F(x, y) = (M (x, y), N (x, y))
is a smooth vector field. Let C be a piecewise smooth, simple (i.e., non-self-intersecting),
closed curve in R2 which encloses a region D. Then
I ZZ ✓ ◆
@N @M
M dx + N dy = dA.
C D @x @y

This appears often enough to merit an example.



Example 34. Calculate the line integral of

$F(x, y) = \left(\sin^5(\log(5 + x)) - 4y,\ \frac{x^2}{2} + e^y \arcsin\!\left(\frac{y^2}{1 + y^2}\right)\right)$

over the curve $C$ which traces along the parabola $y = 1 - x^2$ from $(1, 0)$ to $(-1, 0)$ and then returns to $(1, 0)$ along the $x$-axis.
Solution. Calculating the actual line integral would likely be impossible, but thanks to Green's Theorem, we see

$\oint_C F \cdot dr = \iint_D \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dA = \iint_D (x + 4)\,dA = \int_{-1}^1 \int_0^{1-x^2} (x + 4)\,dy\,dx = \int_{-1}^1 (x + 4 - x^3 - 4x^2)\,dx = 8 - \frac{8}{3} = \frac{16}{3}.$

We conclude with Stokes’ Theorem and the Divergence Theorem of Gauss. It bears re-
peating that for the sake of brevity, I have skipped some material that would lead up to
these theorems. These do not often appear on the math subject GRE.

Theorem 35 (Stokes’ Theorem). Let S be an oriented, smooth surface that is bounded


by a simple, closed, smooth curve C and let F : R3 ! R3 be a smooth vector field. Then
ZZ Z
(r ⇥ F) · dS = F · dr.
S C

Theorem 36 (Divergence Theorem). Let $V$ be a simple, solid region in $\mathbb{R}^3$ with surface $S$ and let $F : \mathbb{R}^3 \to \mathbb{R}^3$ be a smooth vector field. Then

$\iint_S F \cdot dS = \iiint_V \nabla \cdot F\,dV.$

As a final note, we see that Green’s Theorem, Stokes’ Theorem and the Divergence
Theorem all have a similar flavor (they all relate an integral over an n-dimensional object
to an integral over an (n 1)-dimensional object). This is no coincidence. All three of
these are special cases of a theorem from di↵erential geometry (which is typically called
Stokes’ Theorem) which says roughly that the integral of a di↵erential (n 1)-form ! over
the boundary of a smooth, orientable n-manifold M is equal to the integral of the exterior
derivative d! over the whole manifold:
Z Z
!= d!.
@M M

This version of Stokes’ Theorem also subsumes the fundamental theorem of calculus.
Christian Parkinson GRE Prep: Diff. Eq. & Lin. Alg. Notes 1

Week 4: Differential Equations & Linear Algebra


Notes
After calculus, the most common topic on the math subject GRE is linear algebra, followed by differential equations. Since the latter is a more natural continuation of calculus (and since it will take much less time), I cover it first.

Differential Equations
Definition 1 (Differential Equation). An ordinary differential equation is an equation of the form

$F(x, y, y', y'', \ldots, y^{(n)}) = 0$

for some function $F : \mathbb{R}^{n+2} \to \mathbb{R}$, where $y^{(k)}$ denotes the $k$th derivative of $y$ with respect to the independent variable $x$. The order of the differential equation is equal to the highest order derivative of $y$ which appears; the above equation is $n$th order (assuming $F$ does actually depend on its last variable). Note, we often replace $y$ with $u$, which can be thought to stand for "unknown." The goal, then, is to find the unknown function which satisfies the equation.

On the math subject GRE, questions are mostly limited to equations where the highest order derivative of the unknown function can be isolated; that is, equations of the form

$y^{(n)} = f(x, y, y', \ldots, y^{(n-1)}).$
Many differential equations can be solved by simply recalling facts from calculus.

Example 2. Solve the differential equations

(a) $u' = u$,  (b) $y'' = -4y$.
Solution. For (a), we can phrase the differential equation as a question: "What function will remain the same upon differentiating?" We recall from calculus that $u(x) = e^x$ is the function which is invariant under differentiation. Indeed, this is a solution to the equation in (a).

For (b), asking a similar question, we reason that $y$ should be a function which, upon twice differentiating, returns the negative of itself (with a constant multiple). You may recall from calculus that the function $\sin(x)$ returns its negative after twice differentiating, and then to account for the constant multiple, you can see that $y(x) = \sin(2x)$ is a solution.

A natural question would be to ask if we have found all solutions to a given equation. Of course, in the above example, we did not. For (a), any constant times $e^x$ would work just as well, so a full family of solutions would be $u(x) = Ce^x$ for an arbitrary constant $C \in \mathbb{R}$. For (b), besides simply multiplying by a constant, we see $y(x) = \cos(2x)$ works as well. Indeed, combinations of the two also work, so the full family of solutions is given by $y(x) = A\sin(2x) + B\cos(2x)$ for arbitrary constants $A, B \in \mathbb{R}$. Both these equations have special structure, which is defined here.

Definition 3 (Linearity). A differential equation

$F(x, y, y', \ldots, y^{(n)}) = 0$

is said to be linear if the function $F$ is linear in the arguments $(y, y', \ldots, y^{(n)})$; that is,

$F(x, \alpha y + z, \alpha y' + z', \ldots, \alpha y^{(n)} + z^{(n)}) = \alpha F(x, y, y', \ldots, y^{(n)}) + F(x, z, z', \ldots, z^{(n)})$

for all $n$-times differentiable $y, z : \mathbb{R} \to \mathbb{R}$ and all $\alpha \in \mathbb{R}$.

In the terminology of linear algebra, a differential equation is linear if its solution set forms a (possibly affine) subspace of the vector space of continuous functions on $\mathbb{R}$. Intuitively, a differential equation is linear if the unknown function and its derivatives do not appear in any non-linear way. In symbols, a linear $n$th order differential equation has the form

$a_n(x) y^{(n)} + \cdots + a_2(x) y'' + a_1(x) y' + a_0(x) y = f(x)$

for some functions $a_k, f : \mathbb{R} \to \mathbb{R}$ ($k = 0, \ldots, n$).

Typically, we also require that $a_n(x) \ne 0$ for any $x$, so that we may divide by $a_n(x)$ and eliminate the coefficient of $y^{(n)}$. If $a_n(x) = 0$ at some $x$, this $x$ is called a singular point of the differential equation. Solving differential equations with singular points can be quite difficult and is beyond the scope of these notes.

First-order linear equations are always solvable analytically, at least up to evaluating integrals. We build toward the general solution of a first-order linear equation in a few steps.

Definition 4 (Separability). A first-order differential equation is called separable if it is of the form

$y' = \frac{f(x)}{g(y)}.$

That is, separable equations have the variables on the right-hand side separated into two different functions.

Proposition 5 (Solution to Separable Equations). The solution to

    y' = f(x)/g(y)

is given by the pair of integrals

    ∫ g(y) dy = ∫ f(x) dx.

Formally, we can arrive at this by manipulating differentials:

    dy/dx = f(x)/g(y)  ⟹  g(y) dy = f(x) dx  ⟹  ∫ g(y) dy = ∫ f(x) dx.

Of course, this manipulation is rigorously justified by the chain rule.

Example 6. Solve the differential equation

    y' = (1 + y^2)(4x^3 + 2x).

Solution. We see that

    ∫ dy/(1 + y^2) = ∫ (4x^3 + 2x) dx  ⟹  arctan y = x^4 + x^2 + C  ⟹  y = tan(x^4 + x^2 + C).

It can be readily checked that this function does satisfy the differential equation.
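Such a check can even be done numerically. The sketch below (not part of the original notes) compares a central-difference approximation of y' with the right-hand side of Example 6's equation; the constant C = 0.3 and the sample points are arbitrary choices for the test.

```python
import math

# Spot-check Example 6: y(x) = tan(x^4 + x^2 + C) should satisfy
# y' = (1 + y^2)(4x^3 + 2x).  C = 0.3 is an arbitrary choice.
C = 0.3

def y(x):
    return math.tan(x**4 + x**2 + C)

h = 1e-6
for x in [0.1, 0.3, 0.5]:
    lhs = (y(x + h) - y(x - h)) / (2 * h)       # central-difference y'
    rhs = (1 + y(x)**2) * (4 * x**3 + 2 * x)    # right-hand side of the ODE
    assert abs(lhs - rhs) < 1e-4, (x, lhs, rhs)
```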

From this, we can easily solve any equation of the form

    y' = p(x)y.

We see that such an equation has solutions

    y(x) = C e^(∫ p(x) dx).

This observation gives us a method for solving any first-order linear equation.

Proposition 7 (Method of Integrating Factor). The solution to the first-order linear
equation

    y' + p(x)y = q(x)

is given by

    y(x) = e^(−∫ p(x) dx) ( ∫ e^(∫ p(x) dx) q(x) dx ).

This formula deserves some motivation. In line with separable equations (or even simpler
equations of the form y'(x) = f(x)), we see that solving a differential equation is akin to
"integrating the equation." With this in mind, it would be ideal if the left-hand side of the
above equation were a perfect derivative, so we could just integrate it away. In general, there
will not be a Y(x) such that

    Y'(x) = y'(x) + p(x)y;

i.e., the left-hand side will not be a perfect derivative. However, we can fix that by cleverly
multiplying by some other function µ(x), which we call an integrating factor. Indeed, our
new equation will read

    µ(x)y'(x) + µ(x)p(x)y(x) = µ(x)q(x).
Now, if µ' = p(x)µ, then the left-hand side will read

    µ(x)y'(x) + µ'(x)y(x),

which is the derivative of µ(x)y(x). Thus we simply choose µ satisfying µ' = p(x)µ. However,
from the above, we know that µ(x) = e^(∫ p(x) dx) satisfies that equation. Thus we see

    e^(∫ p(x) dx) y' + p(x) e^(∫ p(x) dx) y = e^(∫ p(x) dx) q(x)  ⟹  d/dx [ e^(∫ p(x) dx) y(x) ] = e^(∫ p(x) dx) q(x),

whence integrating gives

    e^(∫ p(x) dx) y(x) = ∫ e^(∫ p(x) dx) q(x) dx  ⟹  y(x) = e^(−∫ p(x) dx) ( ∫ e^(∫ p(x) dx) q(x) dx ),

which is the solution listed above.

Example 8. Solve the equation

    y'(x) + (1/x) y(x) = cos(x^2).

Solution. According to the formula, we should multiply the equation by µ(x) = e^(∫ (1/x) dx) =
e^(ln x) = x. Doing this gives

    x y'(x) + y(x) = x cos(x^2)  ⟹  d/dx (x y(x)) = x cos(x^2).

Integrating yields

    x y(x) = sin(x^2)/2 + C  ⟹  y(x) = sin(x^2)/(2x) + C/x

where C is an arbitrary constant.
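As a sanity check (not part of the original notes), the sketch below verifies numerically that Example 8's solution satisfies the equation; the constant C = 1.5 and the sample points are arbitrary choices.

```python
import math

# Spot-check Example 8: y(x) = sin(x^2)/(2x) + C/x should satisfy
# y' + y/x = cos(x^2).  C = 1.5 is an arbitrary constant.
C = 1.5

def y(x):
    return math.sin(x**2) / (2 * x) + C / x

h = 1e-6
for x in [0.5, 1.0, 2.0]:
    yprime = (y(x + h) - y(x - h)) / (2 * h)    # central-difference y'
    assert abs(yprime + y(x) / x - math.cos(x**2)) < 1e-4
```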

It may irk you that all of these solutions have this lingering arbitrary constant C. This
has something to do with the fact that differentiation eliminates constants, so when we solve
the equation (think of "integrating the equation") we need to reintroduce constants that
may have been eliminated. Oftentimes, along with a differential equation, a given value
y(x_0) = y_0 is specified. This value can be used to solve for the constant C. For example, along
with the above equation, the question may have specified that y should satisfy y(√π) = 1.
We found our general solution, so substituting x = √π gives

    y(√π) = sin(π)/(2√π) + C/√π.

We are told this value should be 1, so this gives

    C/√π = 1  ⟹  C = √π.

Thus the function y which solves the equation and satisfies y(√π) = 1 is given by

    y(x) = sin(x^2)/(2x) + √π/x.
Such questions which give an equation and specify a value for the unknown function at a
point are called Initial Value Problems or Boundary Value Problems depending on the con-
text (and the way in which the values of the unknown function are specified). We won’t

address this point any further.

This gives a solution to all linear first-order equations. With separation of variables we
can also solve some non-linear equations. Most non-linear equations will not be solvable
analytically. However, without solving, we can still determine some of the behavior of solutions
to certain equations.

Definition 9 (Autonomous Equations). A first-order differential equation is called
autonomous if it is of the form

    y' = f(y).

Note that all autonomous equations are separable and thus can, in principle, be solved
by integration. However, that integration could be too difficult to be feasibly performed.
Even so, we can use the equation to decide what potential solutions might look like.

Definition 10 (Equilibrium Solutions). Consider the first-order autonomous equation

    y' = f(y).

Suppose that y_0 ∈ R is such that f(y_0) = 0. Then y(x) = y_0 is an equilibrium solution to
the above equation.

It is not difficult to see that if we specify y(x_0) = y_0 for some x_0 where y_0 is a root of f, then
y(x) = y_0 for all x > x_0; that is, if y hits an equilibrium point, it will stay on that equilibrium
point forever. By a standard uniqueness argument (so long as f is differentiable), this shows
that no solution can cross an equilibrium solution. That is, if y_0 is an equilibrium point
and y(x_0) < y_0, then y(x) < y_0 for all x > x_0. A general rule of thumb for autonomous
equations is that all solutions either (1) blow up to positive or negative infinity as x → ±∞
or (2) tend toward an equilibrium solution as x → ±∞. The function f and a specified
value will tell you which regime your solution falls into. We demonstrate this in an example
before moving on.

Example 11. Draw a solution to y' = (y − 1)(4 − y) such that

(a) y(0) = 1,
(b) y(0) = 3,
(c) y(0) = 5.

Solution. The general procedure is to identify what sign f takes on either side of each
equilibrium point; this will determine whether y increases or decreases in that region.
This will also tell you whether an equilibrium point is stable or unstable. Note that all
solutions tend to an equilibrium point here. The point y_0 = 4 is a stable equilibrium
point while y_0 = 1 is an unstable equilibrium point.

[Figure 1: Example 11 solution curves]
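The qualitative picture can be confirmed numerically. The sketch below (an illustration, not part of the original notes) integrates Example 11's equation with the forward Euler method; the step size and time horizon are arbitrary choices.

```python
# Forward-Euler integration of y' = (y - 1)(4 - y) from Example 11.
# Starting at the unstable equilibrium y(0) = 1 stays put exactly, while
# y(0) = 3 and y(0) = 5 both flow to the stable equilibrium y = 4.
def euler(y0, x_end=20.0, h=1e-3):
    y = y0
    for _ in range(int(x_end / h)):
        y += h * (y - 1) * (4 - y)
    return y

assert abs(euler(1.0) - 1.0) < 1e-9   # equilibrium: stays at 1
assert abs(euler(3.0) - 4.0) < 1e-3   # flows up to the stable point 4
assert abs(euler(5.0) - 4.0) < 1e-3   # flows down to the stable point 4
```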

We move on to higher-order linear, constant-coefficient equations. Note that anything
from this section works just as well for first-order equations, though it is unnecessary since
we have the integrating factor method for such equations.

We first consider equations of the form

    a_n y^(n) + a_{n−1} y^(n−1) + · · · + a_1 y' + a_0 y = 0        (1)

where a_0, a_1, ..., a_n ∈ R and a_n ≠ 0. There is one strategy that always works for such equations,
up to your ability to find roots of polynomials. Intuitively, we need to find
solutions which don't change much upon differentiation, since the desired solution seemingly
will only be multiplied by constants when differentiated. Thus we look for a solution of the
form y = e^(rx) where r is a constant. Plugging this in, we arrive at

    a_n r^n + a_{n−1} r^(n−1) + · · · + a_1 r + a_0 = 0.

This is satisfied if r is a root of the above polynomial. The polynomial has n roots counted
with multiplicity, and we can then superpose these to construct the solution.

Example 12. Find the general solution to the differential equation y'' − 2y' − 3y = 0.

Solution. Substituting in y = e^(rx), we arrive at

    r^2 − 2r − 3 = 0  ⟹  (r + 1)(r − 3) = 0  ⟹  r = −1, 3.

This shows that both e^(−x) and e^(3x) are solutions. Thus by linearity, the general solution is

    y(x) = C_1 e^(−x) + C_2 e^(3x).
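A quick numerical check of Example 12 (not part of the original notes): finite differences applied to the general solution should make the left-hand side of the equation vanish. The constants C_1, C_2 and the sample points are arbitrary choices.

```python
import math

# Verify y(x) = C1 e^{-x} + C2 e^{3x} solves y'' - 2y' - 3y = 0,
# using central differences for y' and y''.
C1, C2 = 2.0, -0.5

def y(x):
    return C1 * math.exp(-x) + C2 * math.exp(3 * x)

h = 1e-4
for x in (0.0, 0.5, 1.0):
    y1 = (y(x + h) - y(x - h)) / (2 * h)              # approximate y'
    y2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2      # approximate y''
    assert abs(y2 - 2 * y1 - 3 * y(x)) < 1e-3
```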

This method raises an immediate red flag: what if the polynomial in r does not have real
roots? This is easily handled using Euler’s identity.

Example 13. Find the general solution of y'' + 4y' + 13y = 0.

Solution. Again substituting y = e^(rx), we arrive at

    r^2 + 4r + 13 = 0  ⟹  (r + 2)^2 + 9 = 0  ⟹  r = −2 ± 3i.

This would give solutions of the form

    y(x) = C_1 e^((−2+3i)x) + C_2 e^((−2−3i)x).

By Euler's identity, we see this is equivalent to

    y(x) = e^(−2x) (C_1 cos(3x) + C_2 sin(3x)).

You may think the final constants C_1 and C_2 must now be allowed to be complex. This
is sort of true; however, if real initial conditions are specified, it will always turn out that
C_1, C_2 ∈ R.
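The equivalence of the two forms in Example 13 can be checked directly. In the sketch below (not from the original notes), the complex constants D_1, D_2 are hypothetical choices constructed to match given real constants C_1, C_2; Euler's identity makes the two expressions agree pointwise.

```python
import cmath
import math

# Euler's identity check for Example 13: with D1 = (C1 - i C2)/2 and
# D2 = conj(D1), the complex form D1 e^{(-2+3i)x} + D2 e^{(-2-3i)x}
# equals the real form e^{-2x}(C1 cos 3x + C2 sin 3x).
C1, C2 = 1.0, -2.0
D1 = complex(C1, -C2) / 2
D2 = D1.conjugate()

for x in (0.0, 0.4, 1.1):
    complex_form = D1 * cmath.exp((-2 + 3j) * x) + D2 * cmath.exp((-2 - 3j) * x)
    real_form = math.exp(-2 * x) * (C1 * math.cos(3 * x) + C2 * math.sin(3 * x))
    assert abs(complex_form - real_form) < 1e-12
```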

There is a slightly more subtle hiccup that one can encounter when using this method.
Consider the equation y'' − 2y' + y = 0. Using this method, we would find that r^2 − 2r + 1 =
0 ⟹ (r − 1)^2 = 0 ⟹ r = 1. This yields the solution y(x) = Ce^x. However, you
may have noticed that with all the previous second-order equations, there were two solutions
with two arbitrary constants (i.e., the solution space was a two-dimensional vector space).
This is true generally: the solution space of an nth-order linear equation whose right-hand
side is zero forms an n-dimensional vector space. So for this latter equation we
are missing a solution. This solution can be recovered using a method called reduction of
order, and what you will find is that in this case the general solution is not a constant
multiple of e^x, but rather a linear multiple of e^x:

    y(x) = (C_1 + C_2 x) e^x.

This generalizes upward for roots of higher multiplicity.
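That the repeated root really does produce this extra solution can be verified numerically. The sketch below (not from the original notes) checks the linear-multiple solution with finite differences; the constants and sample points are arbitrary.

```python
import math

# Verify y(x) = (C1 + C2 x) e^x solves y'' - 2y' + y = 0 (repeated root r = 1).
C1, C2 = 0.7, -1.3

def y(x):
    return (C1 + C2 * x) * math.exp(x)

h = 1e-4
for x in (0.0, 0.5, 1.0):
    y1 = (y(x + h) - y(x - h)) / (2 * h)              # approximate y'
    y2 = (y(x + h) - 2 * y(x) + y(x - h)) / h**2      # approximate y''
    assert abs(y2 - 2 * y1 + y(x)) < 1e-3
```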

This method only deals with homogeneous equations like (1): equations which have zero
on the right-hand side. There are also methods for non-homogeneous equations, those of the
form

    a_n y^(n) + a_{n−1} y^(n−1) + · · · + a_1 y' + a_0 y = f(x)

for some given function f. These methods can be very complicated even in relatively simple
cases (low order, "nice" functions). However, if f has a special form, then they can be very
easy. We state this in a couple of propositions and finish with a brief example.

Proposition 14 (Solutions of Non-Homogeneous Equations). Any solution to the
equation

    a_n y^(n) + a_{n−1} y^(n−1) + · · · + a_1 y' + a_0 y = f(x)

takes the form y = y_p + y_h, where

    a_n y_p^(n) + a_{n−1} y_p^(n−1) + · · · + a_1 y_p' + a_0 y_p = f(x)

and y_h is the general solution of

    a_n y_h^(n) + a_{n−1} y_h^(n−1) + · · · + a_1 y_h' + a_0 y_h = 0.

The function y_p is called the particular solution and the function y_h is called the homogeneous
solution. Thus this proposition tells us that we can always break a solution into a particular
and a homogeneous part.

Proposition 15 (Method of Undetermined Coefficients). Consider the equation

    a_n y^(n) + a_{n−1} y^(n−1) + · · · + a_1 y' + a_0 y = f(x).

Suppose that f is not in the span of the homogeneous solutions for the equation. Then

(a) if f(x) = α sin(βx) + γ cos(βx) where β ≠ 0 and at least one of α, γ is non-zero, then
y_p(x) = A sin(βx) + B cos(βx) for some constants A, B;

(b) if f(x) = α e^(βx) for α, β ≠ 0, then y_p(x) = A e^(βx) for some constant A;

(c) if f(x) = α_m x^m + · · · + α_1 x + α_0 where m ∈ N and α_m ≠ 0, then y_p(x) = A_m x^m + · · · +
A_1 x + A_0 for some constants A_0, A_1, ..., A_m.

This gives us a way of solving non-homogeneous equations when the forcing function is of
a certain form.

Example 16. Find the general solution of the equation y'' + 25y = x^2.

Solution. We must find the general homogeneous solution and then find a single particular
solution. For the homogeneous solution, we try y_h = e^(rx). This will be a homogeneous
solution if

    r^2 + 25 = 0  ⟹  r = ±5i.

This gives the homogeneous solution y_h(x) = C_1 cos(5x) + C_2 sin(5x). By the above proposition,
it suffices to look for a particular solution of the form y_p(x) = Ax^2 + Bx + C. Substituting
this into the equation gives

    2A + 25Ax^2 + 25Bx + 25C = x^2.

Matching the coefficients on each power of x gives A = 1/25, B = 0, C = −2/625. Thus the
general solution is given by

    y(x) = y_p(x) + y_h(x) = x^2/25 − 2/625 + C_1 cos(5x) + C_2 sin(5x).
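The coefficient matching in Example 16 can be confirmed in exact arithmetic. The sketch below (not part of the original notes) checks that y_p'' + 25 y_p = x^2 holds identically for the particular solution, using rational numbers to avoid rounding.

```python
from fractions import Fraction as F

# Exact check of Example 16: y_p(x) = x^2/25 - 2/625 has y_p'' = 2/25,
# so y_p'' + 25 y_p should equal x^2 at every x.
A, C = F(1, 25), F(-2, 625)
for x in (F(0), F(1, 2), F(3)):
    yp = A * x**2 + C
    ypp = 2 * A                  # second derivative of A x^2 + B x + C (B = 0)
    assert ypp + 25 * yp == x**2
```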

Linear Algebra
The most basic goal of linear algebra is to solve systems of linear equations. Since these
linear systems can be expressed in matrix-vector form, it is worthwhile to build up some
theory surrounding matrices and vectors; we will do that momentarily. First we review how
to solve these systems. The most general technique for solving these equations is known
as Gaussian elimination (also called row reduction) with back substitution, wherein we simply
eliminate variables by tactfully combining the equations and then solve the equations in
reverse order. We demonstrate these sorts of computations in a few examples and then move
on to theoretical aspects of linear algebra.

Example 17. Solve for (x, y, z) in the following system of equations:

    x + y + z = 5
    x + 2y − z = 9
    −2x + 4y + 3z = −3

Solution. In the above, we can subtract the first equation from the second equation and
add twice the first equation to the third equation to arrive at

    x + y + z = 5
    y − 2z = 4
    6y + 5z = 7

Now we can subtract 6 times the second equation from the third and we will have

    x + y + z = 5
    y − 2z = 4
    17z = −17

Now the last equation depends only on z and can easily be solved: z = −1. Substituting
this value into the second equation, we find y = 2, and using both these values in the first
equation, we have x = 4. Thus our solution is (x, y, z) = (4, 2, −1).
Of course, we can write such a system in matrix-vector form as

    [ 1  1  1 ] [x]   [ 5]
    [ 1  2 −1 ] [y] = [ 9]
    [−2  4  3 ] [z]   [−3]

or more succinctly in augmented matrix form, where we can perform the same elimination
steps as before. This is given by

    [ 1  1  1 |  5]     [ 1  1  1 |  5]     [ 1  1  1 |   5]
    [ 1  2 −1 |  9]  →  [ 0  1 −2 |  4]  →  [ 0  1 −2 |   4]
    [−2  4  3 | −3]     [ 0  6  5 |  7]     [ 0  0 17 | −17]

where it is understood that each column represents a variable and the entries to the right of
the divider represent the values on the right hand side of the equation.
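The elimination and back-substitution procedure is mechanical enough to code directly. The sketch below (an illustration, not part of the original notes) runs Gaussian elimination with back substitution on Example 17's augmented matrix in exact rational arithmetic; it assumes no pivot is ever zero, which holds for this system.

```python
from fractions import Fraction as F

# Gaussian elimination with back substitution on an augmented matrix,
# assuming nonzero pivots (no row swaps needed for this system).
def solve(aug):
    n = len(aug)
    for i in range(n):                       # forward elimination
        for j in range(i + 1, n):
            m = aug[j][i] / aug[i][i]
            aug[j] = [a - m * b for a, b in zip(aug[j], aug[i])]
    x = [F(0)] * n
    for i in reversed(range(n)):             # back substitution
        s = aug[i][-1] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x

aug = [[F(v) for v in row] for row in
       [[1, 1, 1, 5], [1, 2, -1, 9], [-2, 4, 3, -3]]]
assert solve(aug) == [4, 2, -1]              # matches Example 17
```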

Many questions on the math subject GRE boil down to solving linear systems but there
are some intricacies. Above we found a unique solution to our system of equations. This
may not always happen. We demonstrate this with two more brief examples before moving
to theory.

Example 18. Solve the linear systems

    x + y + z = 5
    x + 2y − z = 9
    3x + 5y − z = 23

and

    x + y + z = 5
    x + 2y − z = 9
    −2x − 5y + 4z = 10

Solution. For the first, we row reduce to see

    [1  1  1 |  5]     [1  1  1 | 5]     [1  1  1 | 5]
    [1  2 −1 |  9]  →  [0  1 −2 | 4]  →  [0  1 −2 | 4]
    [3  5 −1 | 23]     [0  2 −4 | 8]     [0  0  0 | 0]

Re-writing this in equation form, the third equation is 0x + 0y + 0z = 0, which is automatically
satisfied for any (x, y, z). This shows that we really only have two equations while there are
three variables. Thus the best we can do is solve for two variables in terms of the third.
Indeed, letting z = t for some arbitrary t ∈ R, we find that y = 4 + 2t and x = 1 − 3t. Thus
the solutions are given by the set of vectors (x, y, z) = (1 − 3t, 4 + 2t, t) where t ∈ R. In this
case, since z is unconstrained, it is called a free variable.

For the second system above, if we subtract three times the second equation from the
first, we arrive at −2x − 5y + 4z = −22. This directly contradicts the third equation, and so
this system is inconsistent; there are no solutions.
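Both outcomes in Example 18 can be detected mechanically by comparing the rank of the coefficient matrix with the rank of the augmented matrix. The sketch below (an illustration, not part of the original notes; rank is formally defined later in these notes) does this with exact row reduction.

```python
from fractions import Fraction as F

# Classify Av = b by comparing rank(A) with rank([A|b]):
# equal and < #unknowns -> infinitely many solutions; unequal -> inconsistent.
def rank(rows):
    rows = [list(map(F, r)) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                m = rows[i][c] / rows[r][c]
                rows[i] = [a - m * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def aug(A, b):
    return [row + [bi] for row, bi in zip(A, b)]

A1, b1 = [[1, 1, 1], [1, 2, -1], [3, 5, -1]], [5, 9, 23]
A2, b2 = [[1, 1, 1], [1, 2, -1], [-2, -5, 4]], [5, 9, 10]
assert rank(A1) == 2 and rank(aug(A1, b1)) == 2   # infinitely many solutions
assert rank(A2) == 2 and rank(aug(A2, b2)) == 3   # inconsistent: no solution
```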

With these three examples we have witnessed three different scenarios. The first system
admitted a unique solution, the second had infinitely many, and the third had none. It turns
out these are the only three possibilities. We will see this shortly.

Now we move on to some theory. The concepts of greatest concern in undergraduate linear
algebra (matrix algebra and the structure of R^n) are specific cases of more general concepts,
which we cover first, though we will soon see that it is enough to study R^n.

Definition 19 (Vector Space). A vector space is a set V paired with a scalar field F and
equipped with two operations

    + : V × V → V,   (x, y) ↦ x + y
    · : F × V → V,   (α, x) ↦ α · x  (also written αx)

such that (V, +) is an additive group and + and · are compatible.¹ In particular, since (V, +)
is an additive group, it must have an additive identity, which we name 0, and each element
x ∈ V must have an additive inverse, which we denote by −x. When referencing a vector
space, we often refer only to the set V, especially when there is no ambiguity about which
underlying field F we are using.

Typically the field F is either R or C; for most of our discussion we consider it to be
R. The prototypical example of a vector space is R^n, the set of real n-tuples with pointwise
addition and pointwise scalar multiplication. Other examples include the set of matrices
under pointwise addition and scalar multiplication, or the set of continuous functions from a
topological space into C.

Any non-trivial vector space has smaller vector spaces embedded inside of it. These will
be of interest for various reasons later on so we define the concept now.

Definition 20 (Subspace). A subset of a vector space is called a subspace if it contains


the zero vector and is closed under addition and scalar multiplication.

That is, a subspace is a vector space which is contained in some larger vector space. The
easiest examples of subspaces are the coordinate axes in R^2. Indeed, consider the subsets of
R^2 given by {(x, 0) : x ∈ R} and {(0, y) : y ∈ R}. These are both subspaces of R^2. Every vector space
has two trivial subspaces: the set {0} and the space itself. Subspaces can be combined to
make other subspaces in various ways. For example, intersecting two subspaces or taking
the direct sum of two subspaces will yield another subspace.

One interesting thing about R^n is that all vectors in R^n can be expressed as linear
combinations of a very small set of vectors. Indeed, if x = (x_1, x_2, ..., x_n) ∈ R^n, then

    x = x_1 e_1 + x_2 e_2 + · · · + x_n e_n

where e_k is a coordinate vector: one which has zeros everywhere except a 1 in the k-th entry.
Moreover, this representation is unique. There are no other scalars (α_1, ..., α_n) such that
x = α_1 e_1 + · · · + α_n e_n. It is reasonable to ask if other vector spaces have a similar property:
can we always find a relatively small set whose linear combinations can form any vector in
¹ In the sense of several axioms, such as distributivity and associativity, which I am much too lazy to
reproduce here.

the space? Indeed, we can. We build toward this.

Definition 21 (Span). Let V be a vector space and M ⊂ V. The span of M (denoted
by span(M)) is the smallest subspace of V which contains M; that is, it is the intersection
of all subspaces of V which contain M. Equivalently, the span of M is the set of all linear
combinations of elements of M. This alternative definition is especially useful when M =
{v_1, ..., v_n} is a finite subset, where we can write

    span{v_1, ..., v_n} = {α_1 v_1 + · · · + α_n v_n : α_1, ..., α_n ∈ F}.

We say a vector space V is spanned by {v_1, ..., v_n} if v_1, ..., v_n ∈ V and V = span{v_1, ..., v_n}
(alternatively, we call the vectors a spanning set for V).

According to this definition, the set of vectors {e_1, ..., e_n} is a spanning set for R^n.
One deficiency in this definition is that we can always add more vectors to a spanning set.
Indeed, {0, v, e_1, ..., e_n} is also a spanning set for R^n (where v ∈ R^n is arbitrary). However,
the zero vector and this arbitrary v are not needed in order to span R^n, since they themselves
can be written as linear combinations of the other vectors in the set; including them would
ruin the uniqueness of the representation of any vector x, since x = α0 + 0v + x_1 e_1 + · · · + x_n e_n
for any α ∈ R. We would like to eliminate such redundancies.

Definition 22 (Linear Independence). A set of vectors {v_1, ..., v_n} in a vector space
V is said to be linearly independent if for α_1, ..., α_n ∈ F,

    α_1 v_1 + · · · + α_n v_n = 0  ⟹  α_1 = · · · = α_n = 0.

Otherwise, if we can find α_1, ..., α_n ∈ F, not all zero, such that

    α_1 v_1 + · · · + α_n v_n = 0,

then the vectors are said to be linearly dependent.

Intuitively, linearly independent sets have no redundancies; no one vector in the set can be
written as a linear combination of the others. Thus we have uniqueness of representations of
elements in the span of linearly independent vectors. From this definition, it is easy to see
that the coordinate vectors {e_1, ..., e_n} in R^n form a linearly independent set which also
spans R^n.
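Linear independence of concrete vectors can be tested by row reduction: vectors are independent exactly when the matrix having them as rows has as many pivots as there are vectors. The sketch below (an illustration, not part of the original notes; rank appears formally later in these notes) implements this in exact arithmetic.

```python
from fractions import Fraction as F

# Row-reduction rank: vectors are linearly independent iff the rank of
# the matrix with those vectors as rows equals the number of vectors.
def rank(rows):
    rows = [list(map(F, r)) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                m = rows[i][c] / rows[r][c]
                rows[i] = [a - m * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent(vectors):
    return rank(vectors) == len(vectors)

assert independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]])   # coordinate vectors
assert not independent([[1, 2], [2, 4]])                # second = 2 * first
assert not independent([[1, 0], [0, 1], [1, 1]])        # 3 vectors in R^2
```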

Definition 23 (Basis). A basis for a vector space is a linearly independent spanning set.
Equivalently, {v_i}_{i∈I} is a basis for a vector space V if every vector v ∈ V can be represented
uniquely as a finite linear combination of vectors in {v_i}_{i∈I}:

    v = α_{i_1} v_{i_1} + · · · + α_{i_n} v_{i_n},  for some n ∈ N, i_1, ..., i_n ∈ I, α_{i_1}, ..., α_{i_n} ∈ F.

Bases are very important because they allow us to write all vectors in a space in terms of
a relatively small set of vectors. It isn't clear a priori that all vector spaces have a basis.
However, this is indeed a theorem.

Theorem 24. Every vector space has a basis.

This theorem is deceptive in that it seems very strong and useful, but in practice it is
essentially useless. The theorem is actually equivalent to the axiom of choice, and so the
proof is entirely non-constructive. So while this theorem ensures that every vector space has
a basis, it gives no advice on how to find such a basis, and indeed, attempting to identify a
basis could be a fruitless endeavor. In one special case it is very easy to find a basis: when
the basis consists of a finite number of vectors. Indeed, in this case we may use a basis
to somehow define the size of the space. First, it should be noted that bases are far from
unique. For example, both the sets {(1, 0), (0, 1)} and {(1, 1), (1, −1)} are bases for R^2. This could
potentially be troublesome; if we want to use bases to determine the size of a space, we
should be certain that all bases are actually the same size.

Theorem 25 (Invariance of Size of a Basis). If two sets each form a basis for the same
vector space, then the sets have the same cardinality.

Definition 26 (Dimension). The dimension of a vector space is defined to be the number


of elements in any basis for the space. If this is finite, we call the space finite dimensional;
otherwise the space is infinite dimensional. For a vector space V , this is written as dim V .

Thus, for example, R^n is n-dimensional since the basis {e_1, ..., e_n} has n elements. For
the remainder of these notes, we focus on finite-dimensional vector spaces (as is common in
undergraduate linear algebra); for a survey of results in infinite-dimensional vector spaces,
one could study linear functional analysis.

In finite dimensional vector spaces, it is very easy to find a basis due to the next theorem.

Theorem 27. In any n-dimensional vector space, any set of n linearly independent vectors
forms a basis and any spanning set consisting of n vectors forms a basis.

This is a great result because it shows that we only need to check one property in the
definition of a basis. The following theorem has a similar flavor.

Theorem 28 (Size of Spanning/Linearly Independent Sets). In an n-dimensional
vector space, any linearly independent set has at most n elements and any spanning
set has at least n elements. In particular, any spanning set is at least as large as any linearly
independent set.

This theorem makes a lot of sense intuitively: if there are only n dimensions and you have
more than n vectors, then one of them must be a combination of the others. Similarly, if
there are n dimensions, then fewer than n vectors cannot span the space, since some dimension
will be missing. This theorem tells us that a basis can be viewed as a maximal linearly
independent set or a minimal spanning set.

With all these definitions, we move on to what is the largest topic in linear algebra: maps
between vector spaces. For various reasons, linear transformations between vector spaces are
of special interest.

Definition 29 (Linear Transformation). Suppose that V, W are vector spaces over the
same field F. A linear transformation T : V → W is a map such that

    T(αx + y) = αT(x) + T(y),  for all x, y ∈ V, α ∈ F.

We will focus mainly on the case V = W, so the map is a linear transformation
which takes a vector space to itself, though for several results this is not necessary. When
V = W, the most obvious example of a linear transformation is the identity transformation
I : V → V, for which I(v) = v for all v ∈ V. We typically denote this transformation by
I, sometimes with an indication of the underlying space, e.g., I_V. As another example,
every matrix induces a linear transformation.

Proposition 30 (Matrices as Linear Transformations). Every matrix A ∈ R^(n×m)
induces a linear transformation T_A : R^m → R^n by the rule

    T_A(x) = Ax,  x ∈ R^m.

In view of this proposition, we typically blur the lines between a matrix and the linear
transformation that it induces. For example, we often refer to matrices as linear transformations,
with the understanding that the transformation we are discussing is given by x ↦ Ax.

Taking a categorical view, we note that for a linear transformation T : V → W, the set
of vectors {T(v) : v ∈ V} ⊂ W is itself a subspace of W; that is, T does actually map a
vector space to a vector space, which shows that linear transformations are the morphisms in
the category of vector spaces. In fact, each linear transformation has two natural subspaces
associated with it.

Definition 31 (Range & Null Space). Suppose V, W are vector spaces and T : V → W
is a linear map. We define the null space of T by

    N(T) = {v ∈ V : T(v) = 0}

and the range (or image) of T by

    R(T) = {T(v) : v ∈ V}.

Thus the null space of T is a subspace of V and the range of T is a subspace of W.

Thinking more about structure which linear transformations preserve, we can use them
to identify spaces which are essentially the same. We do this in the following way.

Definition 32 (Inverse of a Transformation). Suppose that V, W are vector spaces
and that a linear transformation T : V → W is one-to-one and onto. Then there is a linear
transformation T^(−1) : W → V such that

    T^(−1)(T(v)) = v  and  T(T^(−1)(w)) = w,  for all v ∈ V, w ∈ W.

Such a T^(−1) is called the inverse of the transformation T, and T is called invertible.

Note, this is half definition and half proposition; we are asserting that this inverse map
is actually a linear map.

Definition 33 (Isomorphic Spaces). Two vector spaces are said to be isomorphic if
there is an invertible linear map between them. When vector spaces V, W are isomorphic,
we write V ≅ W.

Isomorphic spaces share essentially all the same structure. For example, isomorphic
spaces have the same dimension (in fact, an invertible linear transformation will map a basis
of one space to a basis of the other). The next theorem justifies focusing only on R^n rather
than on generic vector spaces V.

Theorem 34 (Classification of Finite-Dimensional Vector Spaces). Any two n-
dimensional vector spaces over the same field are isomorphic. In particular, if V is an
n-dimensional vector space over R, then V ≅ R^n.

This tells us that essentially any linear-algebraic property that we can prove for R^n will
hold in any n-dimensional vector space (we can simply translate between the two by finding
an isomorphism). Given this, we drop any sort of generality and focus only on R^n for the
majority of the notes.

Another curious simplification from undergraduate linear algebra is that, rather than deal
with general linear maps T , we only deal with matrices. This is justified by the following.

Theorem 35 (Classification of Linear Transformations). Let T : R^m → R^n be a
linear transformation. Then there is a matrix A ∈ R^(n×m) such that

    T(x) = Ax,  x ∈ R^m.

This further blurs the lines between matrices and linear transformations by providing a
converse to Proposition 30: not only does every matrix induce a linear transformation, but
every linear transformation is also induced by a matrix. Finding the matrix which
induces a given transformation is easy: the columns of the matrix are given by T(e_k), where
e_k is a coordinate vector. (In view of this, we can think of an invertible matrix as simply
changing the basis for the underlying space; in a different basis, a linear transformation will
have a different matrix associated with it.) With this, we drop generality even further and
discuss matrices for a while. Any of the above definitions for linear transformations are adaptable
to matrices; e.g., we will use R(A) to mean the range of the linear transformation which A induces.
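The recipe "column k of A is T(e_k)" is easy to carry out concretely. In the sketch below (an illustration, not part of the original notes), T is a hypothetical linear map from R^2 to R^3 chosen for the example.

```python
# Recover the matrix of a linear transformation T : R^2 -> R^3 from its
# action on the coordinate vectors: column k of A is T(e_k).
def T(v):
    x, y = v
    return (x + 2 * y, 3 * x, x - y)          # linear in (x, y)

e = [(1, 0), (0, 1)]
columns = [T(ek) for ek in e]                 # T(e_1), T(e_2)
A = [[col[i] for col in columns] for i in range(3)]   # assemble rows
assert A == [[1, 2], [3, 0], [1, -1]]

# Check that A x = T(x) on a sample vector.
x = (5, -7)
Ax = tuple(sum(A[i][j] * x[j] for j in range(2)) for i in range(3))
assert Ax == T(x)
```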

Given a matrix, we would like to nail down exactly how it transforms R^n by answering
questions like "what does the range of A look like?" The first thing to note is that the
range of A is exactly the span of its columns. Indeed, for A ∈ R^(n×m), any y in the range of
A has the form

    y = Ax = x_1 A_1 + x_2 A_2 + · · · + x_m A_m

where x is some vector in R^m and the A_k are the columns of A. For this reason, the range of A
is often called the column space of A.

Definition 36 (Rank). The rank of a matrix is defined to be the dimension of the range
of the matrix. Equivalently, the rank is the number of linearly independent columns in the
matrix. For a matrix A, we write this as rank(A).

The rank of an n × m matrix is at most n, because the matrix maps into R^n, which is
n-dimensional; but the rank is also at most m, since the matrix has m columns. A matrix
is said to be full rank if its rank is as large as possible; i.e., A ∈ R^(n×m) is full rank if
rank(A) = min{n, m}. The rank of a matrix tells you a lot about how the matrix transforms
R^m. The only rank-0 matrix is the zero matrix; it maps all vectors to the point {0}. A rank-1
n × m matrix maps R^m to a line in R^n, a rank-2 matrix maps R^m to a plane in R^n,
etc. Intuitively, the rank should tell you something about the invertibility of a matrix. A
transformation which maps all of R^m onto a line (where m ≥ 2) will not be invertible, since
many points must be mapped to the same point on the line.

Definition 37 (Nullity). The nullity of a matrix is defined to be the dimension of the null
space of the matrix. Equivalently, the nullity is the maximum number of elements in a linearly
independent set which is annihilated by the matrix. For a matrix A, we write this as null(A).

As an extension of the above discussion, if the nullity of the matrix is non-zero, then the
matrix maps a non-trivial subspace onto the zero vector, which means the transformation
is non-invertible. Thus there seems to be some relationship between rank and nullity;
roughly, higher rank means a larger range, which means fewer vectors are mapped to zero, which
means lower nullity. This line of reasoning is made precise in the following theorem.

Theorem 38 (Rank-Nullity Theorem). Suppose that A ∈ R^(n×m). Then

    rank(A) + null(A) = m.

Equivalently,

    dim R(A) + dim N(A) = dim R^m.
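The theorem can be verified on a concrete matrix: compute the rank by row reduction and read off the nullity as m minus the rank (equivalently, the number of free variables). The sketch below (an illustration, not part of the original notes) uses a hypothetical 3 × 4 matrix with one dependent row.

```python
from fractions import Fraction as F

# Verify rank-nullity on a 3x4 example: rank(A) + null(A) = m = 4.
def rank(rows):
    rows = [list(map(F, r)) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                m = rows[i][c] / rows[r][c]
                rows[i] = [a - m * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]        # third row = first + second, so rank is 2
m = 4                     # number of columns (dimension of the domain)
nullity = m - rank(A)     # free variables after row reduction
assert rank(A) == 2 and nullity == 2
assert rank(A) + nullity == m
```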

We now define what it means for a matrix to be invertible and relate this to the rank
and nullity of a matrix.

Definition 39 (Identity Matrix & Inverse Matrices). The identity matrix of dimension
n is the matrix I ∈ R^(n×n) with entries I_ij = 1 if i = j and I_ij = 0 for i ≠ j.
This matrix induces the identity transformation on R^n. A matrix A ∈ R^(n×n) is said to be
invertible (or nonsingular) if there is a matrix B ∈ R^(n×n) such that AB = BA = I. In this
case, we call B the inverse matrix of A and we write it as A^(−1). If no such matrix exists,
then A is called singular.

The power of the inverse of a matrix is that it gives us a concrete way to solve linear
systems. Indeed, for a system of equations like those in Example 16, we can write the
equations in the form Av = b for some matrix A, vector of unknowns v and a given vector
b. If A is invertible and we know the inverse A 1 , then we see

Av = b =) A 1 Av = A 1 b =) Iv = A 1 b =) v = A 1 b

so we have solved the equation for v. Because of this, it is very helpful to identify exactly
which matrices are invertible and which are not. For example, from the above it seems like
an n ⇥ n matrix can be invertible only if it has full rank; this turns out to be true. Towards
the end of this document, we will give a fairly comprehensive list of concrete conditions
which can be easily checked and are equivalent to invertibility of a matrix.
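A minimal sketch of this in numpy (the 2 × 2 system is an arbitrary example; in numerical practice one calls `np.linalg.solve` directly rather than forming the inverse):

```python
import numpy as np

# Solve Av = b by forming A^{-1}, mirroring the algebra above.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

v = np.linalg.inv(A) @ b          # v = A^{-1} b

assert np.allclose(A @ v, b)      # v really solves the system
assert np.allclose(v, np.linalg.solve(A, b))
```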

The above shows that invertibility is related to solvability of the system Av = b. We saw
in Examples 16 and 17 that not all systems are uniquely solvable and some are unsolvable. The
previous paragraph seems to show that Av = b is uniquely solvable when A is invertible.
This is a sufficient but not necessary condition for unique solvability. Indeed, the system
may be uniquely solvable even when the matrix A is not square. We summarize this in a
proposition.

Proposition 40 (Solvability of Av = b). Suppose that A ∈ R^(n×m) and that b ∈ R^n is
given. Then exactly one of the following is true regarding the system of equations Av = b.

(i) There is a unique solution v ∈ R^m. [This will occur when b ∈ R(A) and N(A) = {0},
which is always true for an invertible matrix A ∈ R^(n×n).]

(ii) There are infinitely many solutions v ∈ R^m. [This will occur when b ∈ R(A) and N(A)
is non-trivial, because then we can find a single solution v and add to it any element
of the null space of A.]

(iii) There is no solution v ∈ R^m. [This will occur when b ∉ R(A).]


It is useful to phrase the above proposition in terms of the rank and nullity of A; we
leave this as an exercise to the reader. We can conclude from this that when A is invertible,
Ax = 0 has only the trivial solution x = 0. Indeed, this property is actually equivalent to
invertibility when A ∈ R^(n×n).
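The trichotomy in the proposition can be checked numerically by comparing rank(A) with the rank of the augmented matrix [A | b]; the `classify` helper below is my own illustrative name, not standard terminology:

```python
import numpy as np

def classify(A, b):
    """Classify the solution set of Av = b via ranks (Proposition 40)."""
    m = A.shape[1]
    rA = np.linalg.matrix_rank(A)
    rAb = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rAb > rA:
        return "no solution"                  # b is not in R(A)
    return "unique" if rA == m else "infinitely many"

A = np.array([[1.0, 2.0], [3.0, 4.0]])        # invertible
S = np.array([[1.0, 2.0], [2.0, 4.0]])        # rank 1

assert classify(A, np.array([1.0, 1.0])) == "unique"
assert classify(S, np.array([1.0, 2.0])) == "infinitely many"  # b in R(S)
assert classify(S, np.array([1.0, 1.0])) == "no solution"      # b not in R(S)
```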

In line with the above discussion, it seems like a geometric way to determine whether
a matrix is invertible is to check whether it maps n-dimensional sets to other n-dimensional
sets. To motivate this, we could consider volume in R^3. In some rough sense, a shape

which is truly 3-dimensional should have some positive volume while shapes which are 1-
dimensional or 2-dimensional (embedded in R3 ) have zero volume. Thus to decide whether
a matrix is invertible, we could check if it maps sets of positive volume to sets of positive
volume. This is essentially what the determinant of the matrix measures.

Definition 41 (Determinant). Suppose that A ∈ R^(n×n). The determinant of the matrix
A is defined by

    det A = Σ_{σ ∈ S_n} sign(σ) a_{1σ(1)} a_{2σ(2)} ··· a_{nσ(n)}

where S_n is the group of permutations of {1, . . . , n}. This is also sometimes written as |A|
(though we will stick to the former notation: det A).

At first this formula is somewhat mysterious but we rarely use this definition to calculate
a determinant. To actually calculate a determinant, we induct upwards. By the formula, we
see that

    det [ a11  a12 ]
        [ a21  a22 ]  =  a11 a22 − a12 a21.
Thus we can take the determinant of a 2 × 2 matrix. The following proposition allows us to
take the determinant of larger matrices.

Proposition 42 (Cofactor Expansion). Suppose that A ∈ R^(n×n) and fix j ∈ {1, . . . , n}.
Then

    det(A) = Σ_{i=1}^{n} (−1)^{i+j} a_ij det(A^(ij))

where A^(ij) is the matrix obtained by removing the i-th row and j-th column from A. Likewise,
for fixed i ∈ {1, . . . , n},

    det(A) = Σ_{j=1}^{n} (−1)^{i+j} a_ij det(A^(ij)).

These formulas relate the determinant of an n × n matrix to that of a series of (n−1) × (n−1)
matrices.
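Cofactor expansion along the first row translates directly into a short recursive routine; this O(n!) sketch is for checking small examples only (numpy's `det` uses an LU factorization instead):

```python
import numpy as np

# Direct cofactor expansion along the first row, as a sanity check
# against numpy's determinant. Exponentially slow; illustration only.
def det_cofactor(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 0, column j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # (-1)^(1+j) in 1-indexing
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```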

The geometric interpretation of the determinant is crucial to seeing how this number
relates to invertibility.

Proposition 43 (Geometric Interpretation of Determinant). Let [0, 1]^n denote the
unit cube in R^n, let A ∈ R^(n×n) and let A([0, 1]^n) denote the image of the unit cube under the
transformation given by A. Then

    |det A| = Vol A([0, 1]^n).

In words, the matrix A will transform the unit cube into some n-dimensional parallelepiped;
the determinant of A gives the volume of that parallelepiped.

By our above reasoning then, if the determinant of A is zero, then A squishes the unit
cube onto some lower dimensional parallelepiped and thus A should fail to be invertible; this
turns out to be true.

It is worthwhile to memorize some of the useful properties of the determinant; we list


these here.

Proposition 44 (Properties of the Determinant).

(1) det(I) = 1

(2) If two rows of a matrix are interchanged, the determinant changes sign.

(3) If one row of a matrix is multiplied by a constant, the determinant is also multiplied
by that constant. In particular, for A ∈ R^(n×n), det(αA) = α^n det(A) for α ∈ R.

(4) Adding a multiple of one row to another row will not change the determinant.

(5) The determinant is multiplicative; i.e., det(AB) = det(A) det(B) when A, B ∈ R^(n×n).
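These properties are easy to spot-check numerically; the random 3 × 3 matrices below are arbitrary illustrative examples:

```python
import numpy as np

# Numerical spot-checks of the determinant properties above.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

assert np.isclose(np.linalg.det(np.eye(3)), 1.0)                        # (1)
Aswap = A[[1, 0, 2], :]                                                 # swap rows 0 and 1
assert np.isclose(np.linalg.det(Aswap), -np.linalg.det(A))              # (2)
assert np.isclose(np.linalg.det(2.0 * A), 2.0 ** 3 * np.linalg.det(A))  # (3)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))                  # (5)
```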

Having defined the determinant, we are equipped to define eigenvalues and eigenvectors
but there is one more important value to define regarding a matrix which will come back later.

Definition 45 (Trace). The trace of a matrix is the sum of the diagonal elements of the
matrix. In symbols, for a matrix A ∈ R^(n×n), the trace is defined by

    tr A = Σ_{i=1}^{n} a_ii

where a_ij are the elements of the matrix.

The trace doesn’t carry quite the weight that the determinant does (for example, it is
unrelated to invertibility) but it is a linear functional on R^(n×n) (a functional on a vector space
is a map which takes the space to the underlying field) and you will be expected to know
about it for the math subject GRE. We state its properties.

Proposition 46 (Properties of the Trace). Let A, B ∈ R^(n×n) and α ∈ R. Then

(a) [Linearity.] tr(αA + B) = α tr A + tr B

(b) [Cyclic Property.] tr(AB) = tr(BA) and more generally the trace is invariant under
cyclic permutations, so e.g.

    tr(A1 A2 A3 A4) = tr(A2 A3 A4 A1) = tr(A3 A4 A1 A2) = tr(A4 A1 A2 A3)



It is important to note what this last property does not imply: the trace is not invariant
under any permutations. Thus, when taking the trace of a product of several matrices, one
cannot swap the matrices in an arbitrary order with impunity.
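A numerical spot-check, again on arbitrary random matrices; the last assertion shows that a non-cyclic reordering generically changes the trace:

```python
import numpy as np

# The trace is invariant under cyclic permutations of a product,
# but not under arbitrary reorderings.
rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # cyclic
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))    # cyclic
# tr(ABC) and tr(ACB) generically differ:
assert not np.isclose(np.trace(A @ B @ C), np.trace(A @ C @ B))
```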

With this, we move on to eigenvalues and eigenvectors of a matrix, which carry geometric
information about how a square matrix A transforms R^n.

Definition 47 (Eigenvalues & Eigenvectors). Let A ∈ R^(n×n). An eigenvalue of A is a
scalar λ ∈ C such that

    Av = λv

for some nonzero vector v ∈ C^n. In this case, such v is called an eigenvector corresponding
to the eigenvalue λ.

The eigenvectors (if they are real) show the directions in which A simply stretches or
squishes vectors, and the amount of stretch is given by λ; if we know these directions, we
may be able to deduce how A deforms other vectors. Complex eigenvalues and eigenvec-
tors correspond somehow to rotation rather than stretching. Practically, we see that an
eigenvalue/eigenvector pair satisfies

    (A − λI)v = 0

and since v ≠ 0, this shows that λ is an eigenvalue of A iff A − λI is singular. Recalling
that a matrix is singular iff it has zero determinant, we see that the eigenvalues of A are the
roots of the equation

    det(λI − A) = 0.

The function p_A(λ) = det(λI − A) is called the characteristic polynomial of A; it is a
monic, degree n polynomial in λ. This explains why we need to consider λ ∈ C; although
p_A(λ) is a real polynomial, it may have complex roots. Counted with algebraic multiplicity,
an n × n matrix has n eigenvalues. To relate this to invertibility, consider: if zero is an
eigenvalue of A, then there is a non-zero solution to Av = 0 and so A is not invertible.
The following proposition relates the eigenvalues of a matrix to its determinant and trace.

Proposition 48. Let A ∈ R^(n×n) and let λ1, λ2, . . . , λn be the n eigenvalues of A. Then

    det A = λ1 λ2 ··· λn   and   tr A = λ1 + λ2 + ··· + λn.

Making this proposition even more explicit, when A ∈ R^(n×n), n ≥ 2, the characteristic
polynomial of A is given by

    p_A(λ) = λ^n − tr(A) λ^(n−1) + ··· + (−1)^n det(A).

That is, the determinant of A is the constant term in its characteristic polynomial (possibly
with a sign change) and the trace is, up to sign, the second coefficient (the coefficient attached
to λ^(n−1)). Thus, for example, when A is 2 × 2,

    p_A(λ) = λ^2 − tr(A) λ + det(A).
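Checking Proposition 48 on a small symmetric example (whose eigenvalues happen to be 1 and 3):

```python
import numpy as np

# The determinant is the product of the eigenvalues; the trace is their sum.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals = np.linalg.eigvals(A)      # eigenvalues 1 and 3 for this A

assert np.isclose(np.prod(eigvals), np.linalg.det(A))   # product = det
assert np.isclose(np.sum(eigvals), np.trace(A))         # sum = trace
```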

With all these definitions, we state the largest theorem of undergraduate linear algebra.
This theorem gives several ways to check if a matrix is invertible or singular. We have al-
ready discussed most of these but it is nice to have them listed in one location. Many of the
following conditions are simply re-statements of each other but I list them for completeness.
This theorem displays the many relationships between the preceding topics.

Theorem 49 (Invertible Matrix Theorem). Let A ∈ R^(n×n). The following are equiva-
lent:

(1) A is invertible

(2) A is right-invertible (i.e., there is B ∈ R^(n×n) such that AB = I)

(3) A is left-invertible (i.e., there is C ∈ R^(n×n) such that CA = I)

(4) A is full rank (i.e., rank A = n)

(5) The columns of A form a linearly independent set

(6) The columns of A span R^n (i.e. R(A) = R^n)

(7) A has zero nullity (i.e., null A = 0)

(8) A has a trivial null space (i.e. N(A) = {0})

(9) A induces an injective (one-to-one) transformation

(10) A induces a surjective (onto) transformation

(11) A induces a bijective transformation

(12) Av = 0 has only the trivial solution v = 0

(13) Av = b is uniquely solvable for each b ∈ R^n

(14) det A ≠ 0

(15) 0 is not an eigenvalue of A

Of course, this is not an exhaustive list. There are plenty more equivalent conditions but
these give a solid overview of what it means for a matrix to be invertible.

This concludes a vast majority of the linear algebra material which is included on the
math subject GRE. For completeness, we discuss a few more topics: those related to simi-
larity and diagonalizability.

Definition 50 (Similarity). Two matrices A, B ∈ R^(n×n) are called similar if there is an
invertible matrix P ∈ R^(n×n) such that A = P B P^{-1}.

Similarity is an equivalence relation on R^(n×n). At first glance, this definition may not
seem very meaningful but similar matrices share many nice properties in common.

Proposition 51. Similar matrices have the same characteristic polynomial; hence the same
eigenvalues, the same determinant and the same trace.

In view of this, similar matrices can be seen as representing the same linear transforma-
tion with respect to different bases for R^n. For a matrix A ∈ R^(n×n), one goal may be to
find a matrix which is similar to A but has a much simpler form; this would help us identify
key features about A (especially regarding how it transforms Rn ) without doing much heavy
lifting.
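A quick numerical check of Proposition 51 (B and P below are arbitrary random matrices; a random P is invertible with probability one):

```python
import numpy as np

# A = P B P^{-1} shares trace, determinant and eigenvalues with B.
rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
P = rng.standard_normal((3, 3))      # almost surely invertible
A = P @ B @ np.linalg.inv(P)

assert np.isclose(np.trace(A), np.trace(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                   np.sort_complex(np.linalg.eigvals(B)))
```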

Definition 52 (Diagonal Matrix). A matrix D ∈ R^(n×n) with entries (d_ij) is said to
be diagonal if d_ij = 0 when i ≠ j.

Diagonal matrices are incredibly simple and easy to analyze; they transform R^n by
stretching the coordinate axes. Thus each standard basis vector e_k is an eigenvector of
a diagonal matrix and the corresponding eigenvalue is the k-th diagonal element of the ma-
trix. From this, we see that the determinant of a diagonal matrix is the product of the
diagonal elements and a diagonal matrix is invertible iff it has no zeros on its diagonal.

Definition 53 (Diagonalizability). A matrix is called diagonalizable if it is similar to a


diagonal matrix.

Since similar matrices share the same eigenvalues, if a matrix is diagonalizable, then it is
similar to the diagonal matrix with its eigenvalues as the diagonal elements. This reasoning
seems, at first glance, to be reversible. Suppose that A ∈ R^(n×n) has eigenvalues λ1, . . . , λn
with corresponding eigenvectors v1, . . . , vn. We can form a matrix P which has columns given
by v1, . . . , vn and we will see

    AP = [Av1 | ··· | Avn] = [λ1 v1 | ··· | λn vn] = PD

where D is the diagonal matrix with λ1, . . . , λn on the diagonal. This yields A = P D P^{-1},
which shows A is diagonalizable. This “proof” is exactly correct if A has n linearly inde-
pendent eigenvectors. Each distinct eigenvalue has at least one eigenvector and eigenvectors
corresponding to different eigenvalues are linearly independent. However, if a matrix has
a repeated eigenvalue, this eigenvalue may fail to have multiple linearly independent eigen-
vectors and thus A may have fewer than n linearly independent eigenvectors. An example
of such a matrix is

    A = [ 3  1  0 ]
        [ 0  3  1 ]
        [ 0  0  4 ]

where the eigenvalue 3 is repeated but has only one corresponding eigenvector (up to scaling).
Avoiding this peculiarity, the above proof is valid. We restate this fact as a theorem.

Theorem 54. A real n × n matrix is diagonalizable iff its eigenvectors can be chosen to form a basis for R^n.
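The 3 × 3 example with a repeated eigenvalue can be examined numerically: the geometric multiplicity of the eigenvalue 3 is the nullity of A − 3I, which is 1, so there is no basis of eigenvectors:

```python
import numpy as np

# Upper triangular, so the eigenvalues are the diagonal entries 3, 3, 4;
# the eigenvalue 3 has only a one-dimensional eigenspace.
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 4.0]])

eigvals = sorted(np.linalg.eigvals(A).real)
assert np.allclose(eigvals, [3.0, 3.0, 4.0])

# geometric multiplicity of 3 = nullity of (A - 3I) = 3 - rank(A - 3I)
geo_mult = 3 - np.linalg.matrix_rank(A - 3.0 * np.eye(3))
assert geo_mult == 1          # fewer than 2, so A is not diagonalizable
```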

In some sense, most matrices are diagonalizable (precisely: the diagonalizable matrices
form a dense subset of R^(n×n) with respect to any norm on R^(n×n)). However, given a matrix,
it can be difficult to decide quickly whether it is diagonalizable. For large n, finding the
eigenvalues and eigenvectors of an n × n matrix is very difficult, so it is useful to characterize
large classes of diagonalizable matrices in other ways. This requires a bit more machinery.

Definition 55 (Transpose). Let A ∈ R^(n×m) have entries a_ij for 1 ≤ i ≤ n, 1 ≤ j ≤ m.
The transpose of A is the matrix B ∈ R^(m×n) with entries b_ij = a_ji for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
This matrix is denoted A^t.

Thus if A is a linear map from R^m to R^n, then A^t is a map from R^n back to R^m. It is
immediate from the definition that the transpose is additive and will respect scalar multi-
plication, but it is not clear how it will interact with other operations.

Proposition 56 (Properties of the Transpose). Let A, B be real matrices. Then

(a) (A^t)^t = A,

(b) (cA)^t = c A^t for all c ∈ R,

(c) (A + B)^t = A^t + B^t when A, B have the same dimensions,

(d) (AB)^t = B^t A^t when A, B have dimensions compatible with multiplication,

(e) det A = det(A^t) when A is square,

(f) (A^t)^{-1} = (A^{-1})^t when A is square and invertible (in particular, A^t is invertible iff A is
invertible).

More generally, for any finite dimensional vector spaces V, W and any linear transforma-
tion T : V → W, we can define an adjoint transformation T* : W* → V*, where V*, W* are
the dual spaces of V, W respectively. In the terminology of category theory, the
map which takes any vector space to its dual and transposes any linear map is a contravari-
ant functor on the category of vector spaces. To gain more context regarding the meaning
of transposition one can read about dual spaces; this is outside the scope of these notes.

Definition 57 (Symmetric Matrix). A matrix A ∈ R^(n×n) is said to be symmetric if
A = A^t.

Symmetric matrices (and their more general analog, self-adjoint operators) have many
desirable properties; these are highlighted when studying inner product spaces. There are
two properties which are pertinent to our discussion.

Proposition 58. The eigenvalues of a real symmetric matrix are real.



Theorem 59 (Spectral Theorem (Real Symmetric Case)). Let A ∈ R^(n×n). A is
symmetric if and only if there is a diagonal matrix D ∈ R^(n×n) and a matrix U ∈ R^(n×n) such
that U^{-1} = U^t and

    A = U D U^t.

Matrices satisfying U^t = U^{-1} are called orthogonal matrices. These correspond to rota-
tions and reflections of R^n (i.e. linear transformations that preserve angles between vectors
and do not change the length of vectors, though they may change orientation; interestingly,
such matrices form a Lie group). Thus we often say that symmetric matrices are orthogo-
nally diagonalizable.

Finally, we demonstrate two applications of diagonalizability since, as of yet, these defi-


nitions are almost completely unmotivated.

One large application of diagonalizability is in the theory of systems of differential equa-
tions. Indeed, if y = (y1, . . . , yn) are unknown functions satisfying

    y′ = Ay

for some constant matrix A ∈ R^(n×n) and A = P D P^{-1} is a diagonal decomposition of A, then
the functions u = (u1, . . . , un) defined by u = P^{-1} y satisfy

    u′ = Du

which is a very easy system to solve.

which is a very easy system to solve. Reversing the substitution will yield the solution y to
the original equation.
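A sketch of this recipe in numpy, using y(t) = P e^{Dt} P^{-1} y(0). The 2 × 2 system y1′ = y2, y2′ = y1 with y(0) = (1, 0) has exact solution (cosh t, sinh t), which serves as the check:

```python
import numpy as np

# Solve y' = Ay via diagonalization: y(t) = P exp(Dt) P^{-1} y(0).
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
y0 = np.array([1.0, 0.0])
t = 0.7

eigvals, P = np.linalg.eig(A)                 # columns of P are eigenvectors
y = P @ np.diag(np.exp(eigvals * t)) @ np.linalg.inv(P) @ y0

assert np.allclose(y, [np.cosh(t), np.sinh(t)])   # exact solution of the system
```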

Another, simpler application of diagonalizability is finding large powers of matrices. Given
a matrix A, it may be of use in application to find A^k for integer powers k. Performing matrix
multiplication is tedious and can even be computationally infeasible if A is large enough.
However, if A = P D P^{-1} is a diagonal decomposition of A, then we notice that

    A^2 = P D P^{-1} P D P^{-1} = P D^2 P^{-1},   A^3 = P D^2 P^{-1} P D P^{-1} = P D^3 P^{-1},   etc.

In general, we have A^k = P D^k P^{-1} for any integer k ≥ 0 (and any k < 0 if A is invertible).
This greatly simplifies the computation since taking the k-th power of a diagonal matrix is as
easy as raising each diagonal entry to the k-th power.
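A sketch with an arbitrary symmetric (hence diagonalizable) 2 × 2 matrix, checked against repeated multiplication:

```python
import numpy as np

# Compute A^k as P D^k P^{-1} and compare with np.linalg.matrix_power.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])          # symmetric, hence diagonalizable

eigvals, P = np.linalg.eig(A)       # columns of P are eigenvectors
D = np.diag(eigvals)
k = 10
A_k = P @ np.linalg.matrix_power(D, k) @ np.linalg.inv(P)

assert np.allclose(A_k, np.linalg.matrix_power(A, k))
```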

Topics which sometimes appear on the math subject GRE and which were excluded here
for brevity include normed spaces and inner product spaces. The main idea is that we can
add much more structure to a vector space by specifying a way to measure length and angles
between vectors. For a discussion of normed spaces, inner product spaces and much more,
one could look to Peter Petersen’s linear algebra book which is freely available online.
Christian Parkinson GRE Prep: Abs. Alg. & Comp. Analysis Notes 1

Week 5: Abstract Algebra & Complex Analysis


Notes
After Linear Algebra, we move onto miscellaneous topics which show up sporadically on
the math subject GRE. Some additional topics which appear every year (though probably
only in a few questions) are complex analysis and abstract algebra. It bears repeating that
for brevity, many important topics are glossed over very quickly (with no examples) and
others are skipped entirely; these notes are far from exhaustive but give a decent overview
of the material on the math subject GRE.

Abstract Algebra

The purpose of abstract algebra is to recognize structure shared by many mathematical


objects we have dealt with so far and identify which axioms are necessary to maintain this
structure. The most fundamental object in abstract algebra is the group.

Definition 1 (Group). A group is a set G paired with a binary operation ∗ which satisfies
the following axioms:

(a) (Closure) x ∗ y ∈ G whenever x, y ∈ G,

(b) (Associativity) x ∗ (y ∗ z) = (x ∗ y) ∗ z for all x, y, z ∈ G,

(c) (Identity) there exists e ∈ G such that x ∗ e = e ∗ x = x for all x ∈ G,

(d) (Inverses) for each x ∈ G there is an element x^{-1} ∈ G such that x ∗ x^{-1} = x^{-1} ∗ x = e.

We refer to the group as (G, ∗) unless there is no ambiguity about the operation, in which
case we can simply write G. The element e in (c) is called the identity element of the group;
a simple proof shows that it is unique. For each x, the corresponding x^{-1} defined in (d) is
called the inverse of x; this element is also unique. In many circumstances, the operation ∗
is addition (in which case we may call G an additive group and refer to inverses with a minus
sign: −x is the inverse of x) or multiplication (in which case we may call G a multiplicative
group).

Almost all of the sets we’ve defined thus far have group structure with respect to some
operation.

Example 2. The following are examples of groups.

• (R, +) - the set of real numbers with addition (more generally (R^(n×m), +) is a group)

• (Z, +) - the set of integers with addition

• (R^+, ·) - the set of positive real numbers with multiplication

• GL(n, R) - the set of real, invertible n × n matrices with matrix multiplication

• (Z_n, +_n) - the set of integers {0, 1, . . . , n−1} with addition modulo n

• (Z_n^*, ·_n) - the set of integers k ∈ {1, . . . , n−1} such that gcd(k, n) = 1 with multipli-
cation modulo n

• S_n = {f : {1, . . . , n} → {1, . . . , n} : f is bijective} - the set of permutations of an n
point set with functional composition

• U_n = {z ∈ C : z^n = 1} - the n-th roots of unity with multiplication

• D_n - the set of reflections and rotations of the regular n-gon with functional composition

• K_4 = {00, 01, 10, 11} - the set of length two strings consisting of zeros and ones under
the bitwise XOR operation. One can think of this as the set of configurations of a pair
of light switches: off/off, off/on, on/off, on/on. The operations then correspond to
flipping the switches.

While most of the sets we’ve dealt with so far have a natural group structure, it is also
easy to identify sets which fail to be a group.

Example 3. The following fail to be groups.

• D = {(x, y) ∈ R^2 : x^2 + y^2 ≤ 1} - the unit disk in R^2 with addition. This set fails to
be closed under the given operation.

• (R, −) - the set of real numbers with subtraction. This operation is non-associative.

• (Z^+, +) - the set of positive integers with addition. This set has no identity element.

• (R^(n×n), ·) - the set of n × n matrices with matrix multiplication. Not all elements of
this set have inverses.

Notice that in the definition, we only require the operation in a group to be associative.
However, in many groups (especially additive groups), the operation is also commutative.
This distinction is especially important when discussing inverses of elements. In general, for
x, y in some group, we have (xy)^{-1} = y^{-1} x^{-1}. Only if the operation is commutative can we
arrive at the more natural (xy)^{-1} = x^{-1} y^{-1}. Commutativity is an additional piece of structure
and thus should be given its own definition.

Definition 4 (Abelian Group). A group is called abelian if, in addition to the other
group axioms, the binary operation is commutative. That is, (G, ∗) is abelian if x ∗ y = y ∗ x
for all x, y ∈ G.

As stated above, just about any additive group will be abelian. Another thing to notice
is that many of the above groups are embedded inside each other; we call these subgroups
(just like many vector spaces can be realized as subspaces of some larger space).

Definition 5 (Subgroup). Suppose that (G, ∗) is a group and that H ⊂ G is such that
(H, ∗) is also a group. Then we call H a subgroup of G and write H ≤ G.

The subgroups tell you something about the overall structure of the group. For example,
it may be reasonable to try to build groups by successively nesting subgroups. To this end,
it is useful to identify the subgroups of a given group. We discuss this after some examples.

Example 6. Every group has two trivial subgroups: the group itself and the set containing
only the identity element.

Recall that we have a chain of subsets Z ⊂ Q ⊂ R ⊂ C. This becomes a chain of
subgroups when each set is given the operation of addition: Z ≤ Q ≤ R ≤ C.

The subset SL(n, R) = {A ∈ R^(n×n) : det A = 1} is a subgroup of GL(n, R) since the
determinant is multiplicative.

The subset A_n of S_n consisting of even permutations (those which can be arrived at with
an even number of transpositions) is a subgroup.

The subset Z_n = {0, 1, . . . , n−1} is not considered a subgroup of Z since, in order to
maintain closure, one must change the operation from addition to addition modulo n (in
fact, Z_n is a quotient group of Z; a topic we will not cover).

For any group (G, ∗) and any x ∈ G, the subset

    ⟨x⟩ = {x^n : n ∈ Z}

is a subgroup of G (here x^n = x ∗ x ∗ ··· ∗ x when n > 0, x^n = x^{-1} ∗ x^{-1} ∗ ··· ∗ x^{-1} when n < 0,
and x^0 = e). This is called the subgroup of G generated by x, or x is called the generator
of the subgroup. Note that in additive groups, we replace x^n with nx to match the
notation we have used since learning basic algebra.

There is a handy test for deciding whether a subset of a group is indeed a subgroup.

Proposition 7 (Subgroup Test). Let H ⊂ G where G is a group. Then H is a subgroup
if and only if both of the following hold:

(i) e ∈ H (that is, H contains the identity element)

(ii) whenever x, y ∈ H, we have x y^{-1} ∈ H.
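The subgroup test is easy to run by brute force in a small additive group Z_n; the helper below is an illustrative sketch (in additive notation, the condition x y^{-1} ∈ H becomes x − y ∈ H):

```python
# Brute-force subgroup test for subsets of (Z_n, +_n); small examples only.
def is_subgroup(H, n):
    H = set(H)
    if 0 not in H:                       # (i) must contain the identity
        return False
    for x in H:                          # (ii) closed under x * y^{-1}
        for y in H:
            if (x - y) % n not in H:     # y^{-1} is -y in an additive group
                return False
    return True

assert is_subgroup({0, 2, 4}, 6)         # multiples of 2 in Z_6
assert not is_subgroup({0, 1, 2}, 6)     # not closed: 1 - 2 = 5 mod 6
```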

One thing to notice is that the subgroup ⟨x⟩ generated by x may be the entire group.

Definition 8 (Cyclic Group). A group is called cyclic if it is generated by one of its
elements. That is, (G, ∗) is cyclic if

    G = ⟨x⟩

for some x ∈ G.

Again, the sets Z and Z_n are prototypical cyclic groups. Another example of a cyclic
group is U_n, the n-th roots of unity. Most of the groups listed above (indeed “most” groups
in general) are not cyclic: none of Q, R, C are cyclic under addition, and S_n and D_n are not
cyclic. Another group which is famously non-cyclic is the Klein 4-group defined above by
K_4 = {00, 01, 10, 11} with bitwise XOR; this is the smallest non-cyclic group. Any cyclic
group is abelian (this follows from associativity, can you see why?), but the converse is not
true.

Definition 9 (Order). Suppose that (G, ∗) is a group. The order of the group G is defined
to be the cardinality of G, written |G|. G is called a finite group if this cardinality is
finite. Let x ∈ G. The order of x is defined to be the smallest n > 0 such that x^n = e, where
e is the identity of the group. If no such n exists, the element is said to have infinite order.

From this definition, we see that the order of an element x ∈ G is the same as the order
of the subgroup ⟨x⟩. Likewise, G is cyclic if and only if there is some x ∈ G such that the order
of x is equal to the order of G. Cyclic groups have an easily identifiable structure detailed
in the following proposition.

Proposition 10. Suppose that G is a cyclic group. Then all subgroups of G are cyclic.
Further, if G has finite order n and x ∈ G is a generator of G (so that x has order n), then
x^m has order n/gcd(m, n). In particular, G has a subgroup of order d for all d which divide
n, and x^m is a generator of G if and only if gcd(m, n) = 1.
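Proposition 10 can be verified exhaustively in Z_12, where the element m plays the role of x^m for the generator x = 1:

```python
from math import gcd

# Order of m in (Z_n, +_n): the smallest k > 0 with k*m = 0 mod n.
def order_in_Zn(m, n):
    k = 1
    while (k * m) % n != 0:
        k += 1
    return k

n = 12
for m in range(1, n):
    assert order_in_Zn(m, n) == n // gcd(m, n)   # Proposition 10

# generators of Z_12 are exactly the m with gcd(m, 12) = 1
generators = [m for m in range(1, n) if order_in_Zn(m, n) == n]
assert generators == [1, 5, 7, 11]
```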

This proposition classifies all subgroups of a cyclic group (especially finite cyclic groups).
One might ask if we can classify the subgroups of any finite group; while we cannot do this,
we can at least classify the potential sizes of subgroups and state some existence results for
subgroups. We list these in a few theorems.

Theorem 11 (Lagrange’s Theorem). Let G be a finite group. The order of any sub-
group of G must divide the order of G. In particular, if x ∈ G has order m and G has order
n, then m | n.

A corollary of Lagrange’s Theorem is that every group of prime order is cyclic. Indeed,
if n is prime in the above statement, then every element of G must have order 1 or order n.
It is obvious that only the identity element can have order 1 and thus every other element
has order n and generates the group. While this tells you the possible orders of subgroups
of a finite group, it does not guarantee existence of subgroups of a given order.

Theorem 12 (Cauchy’s Theorem). Let G be a finite group of order n and let p be a


prime number dividing n. Then G has a subgroup of order p.

A refinement of this is as follows.

Theorem 13 (Sylow’s First Theorem). Let G be a finite group of order n and let
n = p^k m for some prime p which does not divide m (i.e., p^k is the largest power of p dividing
n). Then G has a subgroup of order p^ℓ for all 1 ≤ ℓ ≤ k.

We can summarize these three theorems with an example. If G has order 54, then by

Lagrange’s Theorem, any subgroup of G has order 1, 2, 3, 6, 9, 18, 27 or 54 (of course, we will
have the trivial subgroups of order 1 and 54). By Cauchy’s Theorem, there is at least one
subgroup of order 2 and at least one subgroup of order 3 since these are the primes which
divide 54. By Sylow’s First Theorem, there is at least one subgroup of order 9 and at least
one subgroup of order 27 since these are the prime powers which divide 54. We are not
guaranteed existence of subgroups of order 6 or order 18.

Since algebra is about identifying structure, it makes sense that we should identify when
two groups are morally the same. Take for example Z_4, U_4 and K_4 with their respective
operations. Notice that both Z_4 and U_4 are cyclic. Indeed,

    1 · 1 = 1,  2 · 1 = 2,  3 · 1 = 3,  4 · 1 = 0  in Z_4

and

    i^1 = i,  i^2 = −1,  i^3 = −i,  i^4 = 1  in U_4.

This gives us a natural bijection between the two groups. We could map 1 ↦ i and specify
that the map preserve group operations. However, K_4 is not cyclic since every non-identity
element has order 2. Thus, while there are bijections from K_4 to Z_4, none of them will
respect the group structure. In this case, we say that Z_4 and U_4 are isomorphic while Z_4
and K_4 are not. We define this notion here.
define this notion here.

Definition 14 (Isomorphism). Let (G, ∗) and (H, ⋆) be two groups. These groups are
said to be isomorphic if there is a bijective map φ : G → H such that

    φ(x ∗ y) = φ(x) ⋆ φ(y)

for all x, y ∈ G. Such a map φ is called a (group) isomorphism. When such a map exists,
we write G ≅ H.

As stated above, groups which are isomorphic can be thought of as morally the same
group up to renaming elements. We may also call isomorphic groups structurally identical,
whereupon we would refer to non-isomorphic groups as structurally distinct. All algebraic
properties of a group are preserved under isomorphism: number of solutions to equations
like x^2 = y for fixed y; commutativity of the operation; number of subgroups of a given order,
etc. Using this concept, we can classify how many groups there are of a given order up to
isomorphism. In general, this is a humongous undertaking (the classification of finite, simple
groups requires hundreds of pages for example) but there are a few theorems which help us
classify groups with nice structure.

Proposition 15. Any cyclic group is isomorphic to either Z or Zn for some finite n.

This proposition essentially closes the book on cyclic groups: we know essentially every-
thing we will ever need to know about cyclic groups from an algebraic standpoint, since they
are just integer groups.

The next goal may be to classify abelian groups. While in general this is still too difficult,
if we add an extra property, we can achieve a full classification. We build toward this.

Definition 16 (Direct Product). Let (G, ∗) and (H, ⋆) be groups. We define the
cartesian product

    G × H = {(g, h) : g ∈ G, h ∈ H}.

The direct product group is the group (G × H, ·) with operation defined by

    (g1, h1) · (g2, h2) = (g1 ∗ g2, h1 ⋆ h2),   (g1, h1), (g2, h2) ∈ G × H.

This gives us a way to build larger groups out of smaller groups. Using this, we can see
for example that

    R^n ≅ R × R × ··· × R   (n times).

What is interesting is how the direct product works with finite groups. For any finite groups
G of order n and H of order m, G × H will have order nm. It is quite easy to see that the
direct product of two abelian groups is again an abelian group. However, the same is not
true of cyclic groups. Indeed, Z_2 is certainly cyclic, but

    Z_2 × Z_2 = {(0, 0), (0, 1), (1, 0), (1, 1)}

with pointwise addition modulo 2 is not cyclic; indeed Z_2 × Z_2 ≅ K_4. However, Z_2 × Z_3 is
cyclic, as can be easily checked. The following proposition explains this somewhat.

Proposition 17. Suppose that G, H are groups and that x ∈ G has order n and y ∈ H has
order m. Then the order of (x, y) ∈ G × H is lcm(n, m). In particular, Z_n × Z_m is cyclic if and
only if n and m are relatively prime.
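A brute-force check of Proposition 17 in Z_4 × Z_6 under componentwise addition (the helper names are my own):

```python
from math import gcd

def order_Zn(a, n):
    """Order of a in (Z_n, +_n)."""
    k = 1
    while (k * a) % n != 0:
        k += 1
    return k

def lcm(a, b):
    return a * b // gcd(a, b)

n, m = 4, 6
for a in range(n):
    for b in range(m):
        # order of (a, b) in Z_n x Z_m under componentwise addition
        k = 1
        while ((k * a) % n, (k * b) % m) != (0, 0):
            k += 1
        assert k == lcm(order_Zn(a, n), order_Zn(b, m))   # Proposition 17

# Z_2 x Z_3 is cyclic (an element of order lcm(2,3) = 6 exists);
# Z_2 x Z_2 is not (every order divides lcm(2,2) = 2).
assert lcm(2, 3) == 6 and lcm(2, 2) == 2
```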

This result continues to hold when we add more factors. For example, Z_n × Z_m × Z_k
is cyclic iff n, m, k are pairwise relatively prime. Using these direct products, we can build
all abelian groups which satisfy one more property.

Definition 18 (Finitely Generated Group). Let (G, ∗) be a group and let S ⊂ G. The
subgroup generated by S is defined to be the set of all finite combinations of elements in S
and their inverses. A group is said to be finitely generated if it is equal to the subgroup
generated by a finite set of elements. For example, the subgroup generated by {a, b} contains
the elements

    ab,  b^{-1} a^2 b^3,  a^2 b^{-5} a b^7,  etc.

Thus a group is finitely generated if it can be built from just a few elements in the group.

From this definition it is clear that all cyclic groups are finitely generated. It is also
clear that any finite group is finitely generated, since the entire group can be taken as the
generating set. The groups Z^n = Z × Z × ··· × Z are prototypical examples of groups
which are infinite, non-cyclic and finitely generated. Indeed, Z^n is generated by the elements
e_k = (0, . . . , 0, 1, 0, . . . , 0) where the 1 appears in the k-th entry.

Theorem 19 (Fundamental Theorem of Finitely Generated Abelian Groups).
Suppose that G is a finitely generated abelian group. Then

G ≅ Z^r × Z_{m_1} × Z_{m_2} × · · · × Z_{m_k}

for some integers m_1 ≤ m_2 ≤ . . . ≤ m_k such that m_1 | m_2, m_2 | m_3, . . . , m_{k-1} | m_k. The
integers m_1, . . . , m_k need not be distinct, but the list m_1, . . . , m_k is unique.
Alternatively,

G ≅ Z^r × Z_{p_1^{k_1}} × Z_{p_2^{k_2}} × · · · × Z_{p_ℓ^{k_ℓ}}

where p_1, . . . , p_ℓ are (not necessarily distinct) prime numbers and k_1, . . . , k_ℓ are (not neces-
sarily distinct) positive integers.

This theorem allows us to identify the number of abelian groups of a given finite order.

Example 20. How many structurally distinct abelian groups have order 540?

Solution. We can factor 540 = 2² · 3³ · 5. Now we simply need to find how many ways we
can write 540 = m_1 · · · m_k where each m_i divides m_{i+1}, or 540 = p_1^{k_1} · · · p_ℓ^{k_ℓ}. Personally, I find
the latter easier. Listing the prime factors, we see

(i) 540 = 2 · 2 · 3 · 3 · 3 · 5

(ii) 540 = 2² · 3 · 3 · 3 · 5

(iii) 540 = 2 · 2 · 3 · 3² · 5

(iv) 540 = 2² · 3 · 3² · 5

(v) 540 = 2 · 2 · 3³ · 5

(vi) 540 = 2² · 3³ · 5

If we wanted to search for m1 , . . . , mk instead, we would see

(i) 540 = 3 · 6 · 30

(ii) 540 = 3 · 3 · 60

(iii) 540 = 6 · 90

(iv) 540 = 3 · 180

(v) 540 = 2 · 270

(vi) 540 = 540



Thus each method gives 6 abelian groups of order 540; we see

Z2 × Z2 × Z3 × Z3 × Z3 × Z5 ≅ Z3 × Z6 × Z30
Z4 × Z3 × Z3 × Z3 × Z5 ≅ Z3 × Z3 × Z60
Z2 × Z2 × Z3 × Z9 × Z5 ≅ Z6 × Z90
Z4 × Z3 × Z9 × Z5 ≅ Z3 × Z180
Z2 × Z2 × Z27 × Z5 ≅ Z2 × Z270
Z4 × Z27 × Z5 ≅ Z540
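The count in Example 20 equals the product, over the primes dividing the order, of the number of partitions of each exponent; a Python sketch (function names are mine) reproduces the answer 6:

```python
from functools import lru_cache

def prime_factorization(n):
    """Return {prime: exponent} for n > 1 by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

@lru_cache(maxsize=None)
def partitions(n, max_part=None):
    """Number of ways to write n as an unordered sum of positive integers."""
    if max_part is None:
        max_part = n
    if n == 0:
        return 1
    return sum(partitions(n - k, min(k, n - k)) for k in range(1, min(max_part, n) + 1))

def num_abelian_groups(n):
    """Number of abelian groups of order n, up to isomorphism."""
    result = 1
    for exponent in prime_factorization(n).values():
        result *= partitions(exponent)
    return result

print(num_abelian_groups(540))  # 540 = 2^2 * 3^3 * 5 -> p(2) * p(3) * p(1) = 2 * 3 * 1 = 6
```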

While isomorphisms preserve all the algebraic properties of a group, requiring the map
to be bijective is very stringent. Indeed, a map could still preserve group structure without
satisfying this requirement.

Definition 21 (Homomorphism). Let (G, ◦) and (H, ∗) be groups. A homomorphism
is a map φ : G → H such that

φ(x ◦ y) = φ(x) ∗ φ(y)  for all x, y ∈ G.

Homomorphisms compress a group down into a smaller group while still maintaining much of
the structure of the original group. It is easy to check that if G is a group and φ : G → H is
a map which respects operations, then φ(G) ⊂ H is indeed a group (thus homomorphisms
are the morphisms in the category of groups). Note that every isomorphism is a homomor-
phism but not conversely. Homomorphisms have many nice properties which we list here.

Proposition 22 (Properties of a Homomorphism). Let G, H be groups and φ : G → H
be a homomorphism. Then

(i) if e is the identity element of G, then φ(e) is the identity element of H,

(ii) if x ∈ G has finite order n, then φ(x) has finite order dividing n in H,

(iii) if x^{-1} is the inverse of x ∈ G, then φ(x^{-1}) is the inverse of φ(x) ∈ H,

(iv) if G′ is a subgroup of G, then φ(G′) is a subgroup of H,

(v) if G has finite order, then |φ(G)| divides |G|. Likewise, if H has finite order then |φ(G)| divides |H|,

(vi) if H′ is a subgroup of H, then

φ^{-1}(H′) = {x ∈ G : φ(x) ∈ H′}

is a subgroup of G. In particular,

ker φ = φ^{-1}({e′}) = {x ∈ G : φ(x) = e′}

is a subgroup of G, where e′ is the identity in H. This subgroup is called the kernel of
φ.

For some examples, the map φ : Z → Zn defined by φ(k) = k (mod n) for k ∈ Z is a
homomorphism of Z down onto Zn. Likewise, the map ψ(x) = e^x for x ∈ R is a homomor-
phism from (R, +) to (R⁺, ·).
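Both examples are easy to verify numerically; a quick Python check (variable names are mine):

```python
import math

# phi(k) = k mod n is a homomorphism (Z, +) -> (Z_n, +):
n = 12
phi = lambda k: k % n
assert all(phi(a + b) == (phi(a) + phi(b)) % n
           for a in range(-40, 40) for b in range(-40, 40))

# Its kernel is the subgroup nZ of multiples of n:
kernel = [k for k in range(-3 * n, 3 * n + 1) if phi(k) == 0]
print(kernel)  # [-36, -24, -12, 0, 12, 24, 36]

# exp is a homomorphism (R, +) -> (R+, *): it turns sums into products.
for x in (0.0, 1.5, -2.25):
    for y in (0.5, -1.75):
        assert math.isclose(math.exp(x + y), math.exp(x) * math.exp(y))
```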

Finally, until now we have only considered a single binary operation on a set. While
group theory is useful in its own right, most spaces that we are used to dealing with are
naturally endowed with two binary operations. Indeed, all of N, Z, Q, R, C have both addition
and multiplication, as does R^{n×n}. To account for such spaces, we need new algebraic
structures.

Definition 23 (Ring). A ring is a set R with two binary operations + and · (which we
call addition and multiplication) such that (R, +) is an abelian group (with identity 0 and
inverse -x for each x) and · is associative. Further, we require + and · to be compatible in
the sense of distributivity:
x · (y + z) = (x · y) + (x · z) and (y + z) · x = (y · x) + (z · x), for all x, y, z ∈ R.
If there is a multiplicative identity, we call this element 1 and we say R is a ring with unity.
If · is commutative, we say that R is a commutative ring.

One thing to notice about this definition is that multiplication need not be a nice op-
eration in a ring and indeed, it often is not. For example, R^{n×n} forms a ring under matrix
addition and matrix multiplication. However, under multiplication, there is no commutativity
and there are no guaranteed inverse elements. In more extreme examples, there is not even
necessarily an identity element for multiplication. More quotidian examples of rings are
Z and R. Again, in Z there are no multiplicative inverses (other than for ±1), but multiplication
is commutative. In R, each element (other than the additive identity) has both an additive inverse and
a multiplicative inverse. Other examples of rings are Zn under addition and multiplication
modulo n, R^R (the functions from R to R) and Z[i] = {a + bi : a, b ∈ Z}, the Gaussian integers.

As a shorthand, we usually drop the dot and write xy rather than x · y. Also, to avoid
drawing too many parentheses, it is convenient to adopt an order of operations wherein
multiplication has priority over addition; thus xy + z = (x · y) + z, for example.

Many of the definitions for groups have direct analogs for rings. For example a subring
is a subset of a ring which is itself a ring. A ring homomorphism is an operation preserving
map from one ring to another (note, there are two operations to preserve here).

Since ring multiplication can be badly behaved, we typically do not have the zero-product
property like we do in R. For example, in R, if xy = 0, then either x = 0 or y = 0. This is not true in Z6,
where 2 · 3 = 0, for example. However, if an element has a multiplicative inverse, then we can
perform this sort of cancellation. Thus the non-zero elements x such that xy = 0 for some
other non-zero y can be seen as the bad elements in a ring.

Definition 24 (Units & Zero Divisors). Let (R, +, ·) be a ring with unity 1. An element
x ∈ R is called a unit if there is an element x^{-1} such that xx^{-1} = x^{-1}x = 1. In this case x^{-1}
is unique and it is called the multiplicative inverse of x. By contrast, x ∈ R is called a zero
divisor if x ≠ 0 and there is some non-zero y ∈ R such that xy = 0.

From the definitions, it is clear that no zero divisor can be a unit and vice versa. The
matrices R^{n×n} form a ring which has zero divisors. For example, A = ( 0 1 ; 0 0 ), the 2 × 2
matrix whose only non-zero entry is a 1 in the upper right, satisfies A² = 0 and is thus a zero
divisor in R^{2×2}. We can use these concepts to further add structure to the ring.

Definition 25 (Integral Domain). An integral domain is a commutative ring with unity
which has no zero divisors.

The prototypical integral domain is Z (hence the name integral domain). However, as
stated before Z lacks one final property that is shared by R and C for example.

Definition 26 (Field). A field is an integral domain in which every non-zero element is
a unit. That is, a field is a set (R, +, ·) in which (R, +) is an additive abelian group,
(R \ {0}, ·) is a multiplicative abelian group, and · distributes over +.

Commonly encountered fields include Q, R, C. However, there are finite fields as well.
For example, Zp is a field under addition and multiplication modulo p when p is prime. In
fact, all finite fields have order p^k for some prime p, though the field of order p^k is not Z_{p^k}
unless k = 1. Another common proposition states that all finite integral domains are fields.

Complex Analysis

Complex analysis deals with the extension of calculus to the complex plane C. The com-
plex numbers C are usually defined as the algebraic completion of R; i.e., the smallest field
containing R over which all polynomials can be factored. The motivation is that in the real
numbers, certain polynomials like p(x) = x² + 1 have no roots. To remedy this, we add a
number i, called the imaginary unit, with the property that i² = -1, and thus i and -i are
roots of p. Then all roots of all polynomials with real coefficients can be written in the form
a + bi for some real numbers a, b.

Definition 27 (Complex Numbers). Define C = {a + bi : a, b ∈ R, i² = -1}. Then C
is a field with the operations of addition and multiplication defined by

(a + bi) + (c + di) = (a + c) + (b + d)i,  (a + bi)(c + di) = (ac - bd) + (ad + bc)i,

for a + bi, c + di ∈ C. For a complex number z ∈ C, if z = a + bi then we call a the real part of z, denoted Re z = a,
and b the imaginary part of z, denoted Im z = b.

One thing we may immediately recognize is that whenever we had a real polynomial, the
imaginary roots came in pairs a ± bi. We call these roots a conjugate pair. Indeed, for any
z = x + yi ∈ C, we can define the complex conjugate of z by z̄ = x - yi. A complex number
is real iff its imaginary part is zero, which is equivalent to z = z̄. It is useful to consider
what is lost when we move from R to C; indeed, we gain algebraic closure but we lose total
ordering. Thus the statement z < w is meaningless for z, w ∈ C unless it happens to be the
case that z, w ∈ R.

We can think of the real part and the imaginary part of a complex number as being inde-
pendent, representing each with an axis. In this way, we can geometrically (and topologically)
associate C with R² under the map (x, y) ∈ R² ↦ x + yi ∈ C. For this reason, we often refer
to C as the complex plane. Then, just as we found polar coordinates useful in R², they will
be useful in C.

Definition 28 (Magnitude & Argument). Let z = x + yi ∈ C. We define the magni-
tude of z by |z| = √(x² + y²). This can be thought of as the length from the origin to the
point x + yi. We also define the argument of z to be an angle θ which z makes with the
positive real axis. This is not a unique value, since we could always add or subtract 2π to get
another argument of z. Thus we define the principal argument of z (denoted Arg z) to be
the argument of z which lies in (-π, π].

From simple plane geometry, we see that |z|² = zz̄ and Arg z = arctan(Im z / Re z), after
possibly accounting for a phase shift. Note that the magnitude gives us a sense of distance
in the complex plane: while |z| denotes the length from the origin to z, we also have that
|z - w| denotes the length from z to w. Thus, for example, {z ∈ C : |z - w| < 2} is all
complex numbers whose distance from w is less than 2; we know this to be the open disk of
radius 2 centered at w.

Definition 29 (Polar Form). Let z ∈ C. The polar form of z is given by z = re^{iθ} where
r = |z| and θ = Arg z.

Note that the polar form of a complex number is the same as the polar form of a vector
(x, y) ↔ (r cos(θ), r sin(θ)) once we recall Euler's identity e^{iθ} = cos(θ) + i sin(θ). This polar
form is useful in computing powers of z. For example, if z = re^{iθ} then z⁵ = r⁵e^{5iθ} is much
easier to compute than z⁵ = (x + yi)⁵, and similarly for z^{1/3} = r^{1/3}e^{iθ/3}. However, in this last
example some care is needed. While most functions translate nicely from the real numbers
to the complex numbers (as we will see shortly), fractional powers do not. For example, in
the real numbers the map x ↦ x^{1/3} is a well-defined bijection, but in the complex numbers,
the map z ↦ z^{1/3} is not well-defined. Thus we simply need to be careful manipulating roots
in the complex plane. For example,

√(ab) ≠ √a √b

in general for complex a, b. We see that while x^n = 1 has only 1 or 2 roots for real x, the equation
z^n = 1 has n roots for complex z. We can find these using the polar form of z.

Definition 30 (Roots of Unity). The n-th roots of unity are the n solutions to z^n = 1 in
C. Setting ω_n = e^{2πi/n}, we see that the roots of unity are given by 1, ω_n, ω_n², . . . , ω_n^{n-1}.

Visually, these roots of unity are n points which are evenly spaced around the unit circle.
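A quick way to see Definition 30 in action is to generate the roots with cmath (function name is mine):

```python
import cmath, math

def roots_of_unity(n):
    """The n solutions of z**n == 1, namely omega_n**k for k = 0, ..., n-1."""
    omega = cmath.exp(2j * math.pi / n)
    return [omega ** k for k in range(n)]

for n in (1, 2, 3, 6, 12):
    for z in roots_of_unity(n):
        assert abs(z ** n - 1) < 1e-9      # each root satisfies z^n = 1
        assert abs(abs(z) - 1) < 1e-12     # each lies on the unit circle

# for n > 1 the roots are evenly spaced, so they sum to 0
assert abs(sum(roots_of_unity(6))) < 1e-9
```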

Next we want to deal with functions of complex variables. Let f : C → C. Then for
every z ∈ C, we have f(z) ∈ C, which means that f(z) = Re(f(z)) + Im(f(z))i; that is, f will
have a real and an imaginary part which are maps from C → R. With the understanding that
x is the real part of z and y is the imaginary part of z, we often write the real and imaginary
parts of f as u(x, y) and v(x, y) respectively. Thus, somewhat confusingly, we will often write

f(z) = u(x, y) + iv(x, y),  where z = x + yi ∈ C

and u, v : R² → R. Other times it is more convenient to forgo this and just work with f(z);
however, any function f(z) does have the above decomposition into real and imaginary parts.

For example, the function f(z) = z² can be written in terms of its real and imaginary
parts as follows:
f(z) = (x + iy)² = (x² - y²) + i(2xy).
Thus in this example, u(x, y) = x² - y² and v(x, y) = 2xy. It is worthwhile to define a few
commonly used complex functions.

Definition 31 (Common Functions). We define the following functions for z ∈ C:

e^z = Σ_{n=0}^∞ z^n / n!,
cos(z) = (e^{iz} + e^{-iz}) / 2,
sin(z) = (e^{iz} - e^{-iz}) / (2i),
cosh(z) = (e^z + e^{-z}) / 2,
sinh(z) = (e^z - e^{-z}) / 2.

These are extensions of the same functions that we are familiar with for real variables and
satisfy all of the same identities. For example,

cos²(z) + sin²(z) = 1,  cos(z + 2π) = cos(z),  and  e^{z+w} = e^z e^w  for all z, w ∈ C.

These can all be written in terms of real and imaginary parts as well. For example,
f(z) = e^z can be written as

f(z) = e^{x+iy} = e^x e^{iy} = e^x cos(y) + i e^x sin(y).
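These identities are easy to spot-check numerically with the cmath module (the helper names are mine):

```python
import cmath

def cos_via_exp(z):
    # cos(z) = (e^{iz} + e^{-iz}) / 2
    return (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2

def sin_via_exp(z):
    # sin(z) = (e^{iz} - e^{-iz}) / (2i)
    return (cmath.exp(1j * z) - cmath.exp(-1j * z)) / (2j)

for z in [0, 1 + 2j, -3 + 0.5j, 2j]:
    assert abs(cos_via_exp(z) - cmath.cos(z)) < 1e-9
    assert abs(sin_via_exp(z) - cmath.sin(z)) < 1e-9
    # the Pythagorean identity survives the extension to C
    assert abs(cmath.cos(z) ** 2 + cmath.sin(z) ** 2 - 1) < 1e-9
    # e^z = e^x (cos y + i sin y)
    x, y = complex(z).real, complex(z).imag
    assert abs(cmath.exp(z) - cmath.exp(x) * (cmath.cos(y) + 1j * cmath.sin(y))) < 1e-9
```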

We want to define calculus over C and in order to do so, we need to define limits. We
can do this in the same way as in R, since the magnitude gives us a metric on C.

Definition 32 (Limit). Let f : C → C and let a ∈ C. We say that the limit as z
approaches a of f(z) is L ∈ C iff for all ε > 0, there is δ > 0 such that

0 < |z - a| < δ  ⟹  |f(z) - L| < ε.

In this case, we write lim_{z→a} f(z) = L; if no such L exists, we say that the limit does not
exist.

Note that, whereas in R, |x - a| < δ defines an open interval, in C, |z - a| < δ defines
an open disk. So in the limit, z is allowed to approach a from any direction.

All of the same limit rules from ordinary calculus hold for the complex limit as well.
Using this definition of the limit, we can define derivatives in the same way that we do in R
as well.

Definition 33 (Derivative). Let f : C → C. For z ∈ C, we define the (complex)
derivative f′(z) by the limit

f′(z) = lim_{h→0} (f(z + h) - f(z)) / h

whenever the limit exists. When the limit exists, we say f is (complex) differentiable at z.
If the limit does not exist, then we say that f is not differentiable at the point z.

Note again, the limit is taken in the context of complex numbers, so h needs to be allowed
to approach zero from any direction in the complex plane.

We can use this definition to prove all the same derivative rules from ordinary calculus.
For example, if f(z) = z², note that

f′(z) = lim_{h→0} ((z + h)² - z²)/h = lim_{h→0} (2zh + h²)/h = lim_{h→0} (2z + h) = 2z,

just as one would expect. Likewise, if f(z) = e^z then f′(z) = e^z, and if f(z) = sin(z) then
f′(z) = cos(z). To stress that the limit must exist in all directions, we do another example.

Example 34. Is the function f(z) = z̄ differentiable at z = 0?

Solution. We note that (f(h) - f(0))/h = h̄/h. Now if we allow h to approach 0 along the real
axis, we see that h̄/h = 1 → 1, while if we let h approach 0 along the imaginary axis, we have
h̄/h = -1 → -1. Since these do not agree, the limit does not exist and thus f(z) = z̄ is not
differentiable at z = 0.
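The direction-dependence is easy to see numerically; a tiny Python sketch of the difference quotient (the diagonal direction is my own addition):

```python
# Difference quotient of f(z) = conj(z) at 0, approached along different directions.
def diff_quotient(h):
    return h.conjugate() / h  # (f(0 + h) - f(0)) / h with f(z) = conj(z)

for t in [1e-1, 1e-4, 1e-8]:
    assert abs(diff_quotient(t + 0j) - 1) < 1e-12        # real axis: quotient is 1
    assert abs(diff_quotient(1j * t) - (-1)) < 1e-12     # imaginary axis: quotient is -1
    # along the diagonal h = t(1 + i), the quotient is conj(1+i)/(1+i) = -i
    assert abs(diff_quotient(t * (1 + 1j)) - (-1j)) < 1e-12
```

No matter how small |h| is, the quotient depends on the direction of h, so no limit exists.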

The function f(z) = z̄ seems fairly simple, and thus at first glance it may seem like it
should be differentiable, but we've proved it isn't (at least at z = 0). A natural question is:
which complex functions f(z) = u(x, y) + iv(x, y) are differentiable? It seems very logical that
if u, v are differentiable then f will be differentiable; however, for z̄ = x - iy, the components
are differentiable, so clearly this is not strong enough. We investigate further. Suppose that
f(z) = u(x, y) + iv(x, y) is differentiable at z ∈ C and let h = h_x + ih_y ∈ C. Notice that

lim_{h→0} (f(z + h) - f(z))/h = lim_{h→0} [ (u(x + h_x, y + h_y) - u(x, y))/(h_x + ih_y) + i (v(x + h_x, y + h_y) - v(x, y))/(h_x + ih_y) ].

Since the limit can be taken in any direction, we consider taking the limit along the real and
imaginary axes respectively. Along the real axis, we arrive at

f′(z) = lim_{h_x→0} [ (u(x + h_x, y) - u(x, y))/h_x + i (v(x + h_x, y) - v(x, y))/h_x ] = ∂u/∂x (x, y) + i ∂v/∂x (x, y).

However, along the imaginary axis, we see

f′(z) = lim_{h_y→0} [ (u(x, y + h_y) - u(x, y))/(ih_y) + i (v(x, y + h_y) - v(x, y))/(ih_y) ] = -i ∂u/∂y (x, y) + ∂v/∂y (x, y).

Since these must be equal, we see that if f is differentiable at z, then

∂u/∂x = ∂v/∂y  and  ∂u/∂y = -∂v/∂x

at z. Conversely, if u and v have continuous partial derivatives near z which satisfy these
equations, then f is differentiable at z. These equations are called the
Cauchy-Riemann equations. Functions u and v satisfying the Cauchy-Riemann equations
are called harmonic conjugates.
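The Cauchy-Riemann equations can be checked numerically for u(x, y) = x² - y² and v(x, y) = 2xy, the components of z² computed earlier; a finite-difference sketch (helper names are mine):

```python
# Numerically check the Cauchy-Riemann equations for f(z) = z^2,
# where u(x, y) = x^2 - y^2 and v(x, y) = 2xy.
def u(x, y): return x * x - y * y
def v(x, y): return 2 * x * y

def partial(g, x, y, wrt, h=1e-6):
    """Central finite difference for dg/dx or dg/dy."""
    if wrt == "x":
        return (g(x + h, y) - g(x - h, y)) / (2 * h)
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

for (x, y) in [(0.0, 0.0), (1.0, 2.0), (-0.5, 3.0)]:
    assert abs(partial(u, x, y, "x") - partial(v, x, y, "y")) < 1e-6   # u_x =  v_y
    assert abs(partial(u, x, y, "y") + partial(v, x, y, "x")) < 1e-6   # u_y = -v_x
```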

Complex differentiability is quite a bit stronger than real differentiability, so we give it
its own name.

Definition 35 (Holomorphicity). A function f : C → C is said to be holomorphic at
a ∈ C if f is complex differentiable in an open neighborhood V ⊂ C containing a. It is
said to be holomorphic in an open set U ⊂ C if f is complex differentiable at each point
in U. Equivalently, f(z) = u(x, y) + iv(x, y) is holomorphic in U if u and v satisfy the
Cauchy-Riemann equations at all points in U. If f is holomorphic on all of C, then f is said
to be entire.

Again, most nice functions are holomorphic on all of C, or at least on "most" of C. For
example, exponentials, polynomials, sines, cosines and compositions thereof are holomorphic
on all of C (i.e., they are entire). Rational functions (like f(z) = 1/(1 - z)) are holomorphic except
at isolated points in the complex plane (such functions are called meromorphic). As I said,
holomorphicity is actually a phenomenal property. For example, when we discussed Calculus
II, we noted that in R it is not difficult to find smooth (i.e., infinitely differentiable) func-
tions with no Taylor series representation; this is not remotely possible in the complex plane.

Proposition 36 (Holomorphicity Implies Analyticity). If a function f : C → C
is holomorphic in an open domain U ⊂ C, then f is analytic in U. That is, if f is once
complex differentiable in U, then f is infinitely differentiable in U and locally representable
by a power series.

In light of this proposition, many people define analyticity using Definition 35; i.e., they
make no distinction between analyticity and holomorphicity when discussing functions from
C → C. This is convenient in that one can define analyticity without discussing power
series, but I feel as though it suppresses some information regarding just how nice complex
differentiable functions are.

Another interesting break between real and complex differentiable functions is the fol-
lowing.

Theorem 37 (Liouville's Theorem). The only bounded entire functions are constant.

This is, of course, patently false when considering real functions. For example, sin(x)
and cos(x) are bounded "entire" functions on R. There are also nontrivial smooth functions
with compact support on R; there are no such functions on C. An even stronger statement
is Picard's Little Theorem (this is typically not covered in undergraduate complex analysis
but can be a useful fact).

Theorem 38 (Picard’s Little Theorem). Suppose that f : C ! C is non-constant and


Christian Parkinson GRE Prep: Abs. Alg. & Comp. Analysis Notes 16

entire. Then the range of f is the entire complex plane possibly missing a single point.

From here we move on to complex integration, but the ideas of analyticity and holomor-
phicity will arise again. Just as we defined integration in R², the most natural extension of
one-dimensional integration to the complex plane is path integration.

Definition 39 (Path Integral). If C is a smooth curve in C parameterized by z(t) =
x(t) + iy(t) for a ≤ t ≤ b, and f : C → C is continuous, then the path integral of f along the
path C is given by

∫_C f(z) dz = ∫_a^b f(z(t)) z′(t) dt.

If the curve C is only piecewise smooth, we can break it into its smooth pieces and integrate
on each one individually.

Example 40. Let f(z) = i z̄ and let g(z) = i z². Let C be the straight-line path from 0 to
1 + i, and let D be the path from 0 to 1 + i along the straight lines from 0 to 1 and then from 1
to 1 + i. Calculate the line integrals of both f and g on both paths C and D.

Solution. The first path C is parameterized by z(t) = t + it, t ∈ [0, 1]. The second path
D is parameterized in two steps by z1(t) = t for t ∈ [0, 1], then z2(t) = 1 + it for t ∈ [0, 1].
Thus the integrals of f are given by

∫_C f(z) dz = ∫_0^1 i(t - it)(1 + i) dt = i(1 - i)(1 + i) ∫_0^1 t dt = i

and

∫_D f(z) dz = ∫_0^1 it dt + ∫_0^1 i(1 - it) i dt = i/2 + (-1 + i/2) = -1 + i.

For g, we have

∫_C g(z) dz = ∫_0^1 i(t + it)²(1 + i) dt = -(1 + i) ∫_0^1 2t² dt = -2(1 + i)/3

and

∫_D g(z) dz = ∫_0^1 it² dt + ∫_0^1 i(1 + it)² i dt = i/3 + i((1 + i)³ - 1)/3 = -2(1 + i)/3.
In this example, we notice that the integral of f depended on the path, whereas the inte-
gral of g did not. This should be somewhat reminiscent of path integration in R². We have
a similar theorem here.

Theorem 41 (Fundamental Theorem of Path Integration). Suppose that F : C → C
is differentiable in an open domain U ⊂ C with derivative f. Then for any smooth path C
contained in U beginning at z1 and ending at z2, we have

∫_C f(z) dz = F(z2) - F(z1).

Note that since differentiability implies analyticity, this will only apply when f itself is
differentiable in U, and we need f expressed as a function of z to use this. We can state this in another
way.

Theorem 42 (Cauchy's Integral Theorem). Let f : C → C be holomorphic. Then for
any curve C starting and ending at the same point, we have

∮_C f(z) dz = 0.

The converse is also true.

Theorem 43 (Morera’s Theorem). Suppose f : C ! C is such that


I
f (z)dz = 0
C

for every closed curve C in C. Then f is holomorphic in C.

This demonstrates integration for entire functions. A more difficult problem is to inte-
grate functions around paths in regions where the functions have singularities; i.e., points
where the function is undefined or blows up.

Definition 44 (Singularities & Poles). Suppose that U ⊂ C is open and that f :
U \ {a} → C is holomorphic. The point a is a removable singularity if there is a holomorphic
function f̃ : U → C such that f(z) = f̃(z) for z ∈ U \ {a} (i.e., f becomes holomorphic after
possibly redefining it at z = a). The point a is a pole of the function f if there is a holo-
morphic function g : U → C and a natural number n, such that g(z) = (z - a)^n f(z) for all
z ∈ U \ {a} (i.e., f becomes holomorphic after multiplying it by some power of (z - a)). The least
such n is called the order of the pole. A pole of order 1 is sometimes called a simple pole.
If a is neither a removable singularity nor a pole, then it is called an essential singularity.
The function f is said to have a removable singularity, pole or essential singularity at ∞ if
the function g(z) = f(1/z) has a removable singularity, pole or essential singularity (respec-
tively) at z = 0.

Functions with removable singularities are those that have simply been shifted at a single
point or are undefined there, such as

f(z) = { z, z ≠ 0; 1, z = 0 }   or   f(z) = (z + 3i)/(z² + 9).

Functions with poles are those that blow up like a polynomial near certain points; for example,

f(z) = 1/((z - 1)(z - 2 + 3i)³)

has an order one pole at z = 1 and an order three pole at z = 2 - 3i. Essential singularities
usually occur when functions with poles are composed with non-polynomial functions. For
example, the function f(z) = cos(1/z) has an essential singularity at z = 0.

Knowing the behavior of functions near poles will help us calculate the integral of f
around paths that encircle the poles.

Definition 45 (Residue). Suppose that U ⊂ C is open, that f : U \ {a} → C is
holomorphic, and that the point a is an n-th order pole of f. Then the residue of f at a is
defined by

Res(f, a) = (1/(n-1)!) lim_{z→a} d^{n-1}/dz^{n-1} [(z - a)^n f(z)].

In particular, if a is a simple pole, then

Res(f, a) = lim_{z→a} [(z - a) f(z)].

The importance of residues is the following.

Theorem 46 (Residue Theorem). Suppose that U ⊂ C is open and that f : U → C
is holomorphic except at isolated points {a1, . . . , an} in U where f has poles. Let C be a
simple, closed curve with positive orientation¹ in U. Then

∮_C f(z) dz = 2πi Σ_k Res(f, a_k)

where the sum ranges over the points a_k which lie inside the curve.

Example 47. Let C be the circle of radius 2 centered at the origin. Let

f(z) = 1/((z - 1)(z + i)(z - 2 - i)).

Find ∮_C f(z) dz.

Solution. Only the poles that are inside the curve contribute to the integral. The pole at
z = 2 + i is outside the contour and thus we can ignore it; we need only calculate the residues
at z = 1 and z = -i. We see

Res(f, 1) = lim_{z→1} (z - 1) f(z) = 1/((1 + i)(-1 - i)) = i/2.

Likewise,

Res(f, -i) = lim_{z→-i} (z + i) f(z) = 1/((-1 - i)(-2 - 2i)) = -i/4.

The integral is then

∮_C f(z) dz = 2πi (i/2 - i/4) = -π/2.
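The contour integral in Example 47 can be approximated directly by discretizing the circle (the helper name is mine):

```python
import cmath, math

def contour_integral(f, radius, center, n=5000):
    """Midpoint-rule approximation of the integral of f around a circle, counterclockwise."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * (k + 0.5) / n
        z = center + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += f(z) * dz
    return total

f = lambda z: 1 / ((z - 1) * (z + 1j) * (z - 2 - 1j))
I = contour_integral(f, 2, 0)
print(I)  # close to -pi/2, i.e. 2*pi*i times the sum of the enclosed residues i/2 - i/4
```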
¹ Here positive orientation means that the curve is traversed counterclockwise. A slight technicality: we
also need C to be rectifiable; i.e., have finite length. Any reasonable curve will satisfy this.

This gives us a way to calculate integrals when the function f has only poles. We do not
yet have a definition for residue when the function has an essential singularity. To deal with
this, we need to discuss Laurent expansions.

Definition 48 (Laurent Expansion (rough idea)). A Laurent expansion for a function
f at a point a is an expansion in positive and negative powers of (z - a):

f(z) = Σ_{n=-∞}^{∞} c_n (z - a)^n.

Typically Laurent expansions converge in annuli. If c_n = 0 for n < 0, then this will reduce
to the ordinary Taylor expansion.

Example 49. Find the Laurent expansions of f(z) = 5/((1 - z)(2i - z)) which converge for (a)
|z| < 1, (b) 1 < |z| < 2 and (c) |z| > 2.

Solution. Practically, we can compute Laurent expansions using some clever factoring and
our prior knowledge of Taylor series. For this function, we will first use partial fractions to
see

f(z) = (1 + 2i)/(2i - z) - (1 + 2i)/(1 - z).

From prior knowledge of Taylor series, we know that

1/(1 - z) = Σ_{n=0}^∞ z^n

for |z| < 1. Likewise, we see

1/(2i - z) = (1/2i) · 1/(1 - (z/2i)) = (1/2i) Σ_{n=0}^∞ (z/2i)^n

for |z/2i| < 1, or |z| < 2. When |z| < 1, both of these are convergent Taylor series and we
have

f(z) = (1 + 2i) ( -Σ_{n=0}^∞ z^n + (1/2i) Σ_{n=0}^∞ (z/2i)^n )

for |z| < 1.
For 1 < |z| < 2, the latter Taylor series still converges; however, the former diverges, so
we need to do something more clever. We see

1/(1 - z) = -(1/z) · 1/(1 - (1/z)).

Now if |z| > 1, then |1/z| < 1 and so

1/(1 - z) = -(1/z) Σ_{n=0}^∞ 1/z^n = -Σ_{n=1}^∞ 1/z^n.

Thus the function has Laurent expansion

f(z) = (1 + 2i) ( Σ_{n=1}^∞ 1/z^n + (1/2i) Σ_{n=0}^∞ (z/2i)^n )

which converges for 1 < |z| < 2.

For |z| > 2, both Taylor series diverge. We have already made the correct adjustment
for the first series; we need to do the same for the second. We see

1/(2i - z) = -(1/z) · 1/(1 - (2i/z))

and |2i/z| < 1 since |z| > 2. Thus

1/(2i - z) = -(1/z) Σ_{n=0}^∞ (2i/z)^n = -Σ_{n=1}^∞ (2i)^{n-1}/z^n.

Thus f has Laurent expansion

f(z) = (1 + 2i) ( Σ_{n=1}^∞ 1/z^n - Σ_{n=1}^∞ (2i)^{n-1}/z^n )

which converges for |z| > 2.

For some functions, this can be very easy. For example, knowing the Taylor expansion
for f(z) = e^z makes it trivial to calculate the Laurent expansion for g(z) = e^{1/z} at z = 0.
Indeed, we see that

g(z) = e^{1/z} = Σ_{n=0}^∞ 1/(n! z^n)

which converges for all z ≠ 0.

One main use for a Laurent expansion is that it helps us calculate the residue for a func-
tion with an essential singularity.

Proposition 50. Suppose that f has a pole or an essential singularity at a. Then
Res(f, a) = c_{-1}, where c_{-1} is the coefficient of (z - a)^{-1} in the Laurent expansion of f about a.

This shows, for example, that Res(e^{1/z}, 0) = 1 and Res(cos(1/z), 0) = 0. Using this, we
can apply the residue theorem to calculate path integrals when the path encircles an essential
singularity.
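These two example residues can be sanity-checked by numerically evaluating (1/2πi) ∮ f(z) dz on a small circle around 0 (the helper name is mine):

```python
import cmath, math

def residue_at_zero(f, radius=1.0, n=20000):
    """Res(f, 0) computed as (1/2*pi*i) times the contour integral on |z| = radius."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * (k + 0.5) / n
        z = radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / n)
        total += f(z) * dz
    return total / (2j * math.pi)

print(residue_at_zero(lambda z: cmath.exp(1 / z)))   # close to 1
print(residue_at_zero(lambda z: cmath.cos(1 / z)))   # close to 0
```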

Example 51. Let C be the circle of radius 3 centered at 2 + 2i. Find

∮_C sin( 1/(z² + 1) ) dz.

Solution. The integrand has essential singularities at z = ±i. Recall, the path integral
is equal to 2πi times the sum of the residues which lie inside the path. The singularity at
z = -i lies outside of the contour so it can be ignored. We need only calculate the
residue at z = i. Here, we see

sin( 1/(z² + 1) ) = Σ_{n=0}^∞ (-1)^n / ((2n + 1)! (z² + 1)^{2n+1}).

Now all terms with n ≥ 1 will not include (z - i)^{-1} since the power will be too high. Thus
we need to consider the n = 0 term. We have

1/(z² + 1) = 1/((z + i)(z - i)) = (i/2)/(z + i) - (i/2)/(z - i).

Now (z + i)^{-1} has a Taylor series which converges in a neighborhood of z = i. Thus we have

sin( 1/(z² + 1) ) = -(i/2)/(z - i) + Σ_{n=0}^∞ c_n (z - i)^n + Σ_{n=1}^∞ (-1)^n / ((2n + 1)! (z² + 1)^{2n+1})

which shows that c_{-1} = -i/2. Thus

∮_C sin( 1/(z² + 1) ) dz = 2πi(-i/2) = π.

One last application of the residue theorem is evaluating integrals with infinite bounds.
For example, sometimes evaluating

∫_{-∞}^{∞} f(x) dx

can be very difficult using methods from Calculus II. However, we can envision f as a func-
tion from C → C and the line R as part of a contour in C. We demonstrate this in a final
example, but first we need a lemma.

Proposition 52 (ML Inequality). Suppose that f : C → C and that C is a curve in C
such that |f(z)| ≤ M for z ∈ C. Then

| ∫_C f(z) dz | ≤ M ℓ(C)

where ℓ(C) is the length of C.

This is the closest analog in complex analysis to the fact that the Riemann integral pre-
serves inequalities. With this, we can do one last example.

Example 53. Evaluate the integral ∫_{-∞}^{∞} cos(x)/(1 + x²) dx.

Solution. Using the integral methods from calculus, this looks fairly unapproachable.
However, it is easy to evaluate using the residue theorem. First note that since sin(x)/(1 + x²) is odd,
we have

∫_{-∞}^{∞} cos(x)/(1 + x²) dx = ∫_{-∞}^{∞} e^{ix}/(1 + x²) dx.

Consider a large R > 0 and let C be the contour which travels from -R to R along the real
axis and then from R back to -R along the semicircle C_R parameterized by z = Re^{iθ} for
θ ∈ [0, π] (pictured below).

Inside this contour, f(z) = e^{iz}/(1 + z²) has one pole, at z = i. Thus the value of the integral of
f over the contour is 2πi times the residue at that pole. We see

Res(f, i) = lim_{z→i} (z - i)e^{iz}/(1 + z²) = lim_{z→i} e^{iz}/(z + i) = 1/(2ie)  ⟹  ∮_C f(z) dz = π/e.

On the other hand, we have

∮_C f(z) dz = ∫_{-R}^{R} e^{ix}/(1 + x²) dx + ∫_{C_R} e^{iz}/(1 + z²) dz.

When z is on the arc C_R, notice that |e^{iz}| = |e^{iR(cos(θ)+i sin(θ))}| = e^{-R sin(θ)} ≤ 1, where θ ∈ [0, π]
is the argument of z. Using this and the reverse triangle inequality, we see that for z on the
arc C_R, we have |e^{iz}/(1 + z²)| ≤ 1/|1 + z²| ≤ 1/(R² - 1). Hence, the ML inequality gives

| ∫_{C_R} e^{iz}/(1 + z²) dz | ≤ ℓ(C_R)/(R² - 1) = πR/(R² - 1) → 0 as R → ∞.

Hence sending R → ∞, we have

∫_{-∞}^{∞} e^{ix}/(1 + x²) dx = ∮_C e^{iz}/(1 + z²) dz = π/e  ⟹  ∫_{-∞}^{∞} cos(x)/(1 + x²) dx = π/e.

Figure 1: Example 53, Contour C
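As a sanity check on Example 53, the exact value π/e can be compared against a direct numerical integration. The sketch below is my own addition, not part of the notes: it uses a composite Simpson's rule, and the truncation window ±200 and step count are arbitrary choices, justified only by the 1/x² decay of the integrand.

```python
import math

def integrand(x):
    # cos(x) / (1 + x^2), the integrand from Example 53
    return math.cos(x) / (1 + x * x)

def simpson(f, a, b, n):
    # Composite Simpson's rule on [a, b] with n subintervals (n must be even).
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

# The integrand decays like 1/x^2, so truncating the infinite bounds at
# +/-200 costs well under 1/200 in absolute error.
approx = simpson(integrand, -200.0, 200.0, 400_000)
print(approx, math.pi / math.e)
```

Both printed values agree to several decimal places, consistent with the residue computation.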


Christian Parkinson GRE Prep: Topology & Real Analysis Notes 1

Week 6: Topology & Real Analysis


Notes
To this point, we have covered Calculus I, Calculus II, Calculus III, Differential Equations,
Linear Algebra, Complex Analysis and Abstract Algebra. These topics probably comprise
more than 90% of the GRE math subject exam. The remainder of the exam consists
of a seemingly random selection of problems from a variety of different fields (topology, real
analysis, probability, combinatorics, discrete math, graph theory, algorithms, etc.). We can't
hope to cover all of this, but we will state some relevant definitions and theorems in Topology
and Real Analysis.

Topology

The field of topology is concerned with the shape of spaces and their behavior under
continuous transformations. Properties regarding shape and continuity are phrased using
the concept of open sets.

Definition 1 (Topology / Open Sets). Let X be a set and τ be a collection of subsets
of X. We say that τ is a topology on X if the following three properties hold:

(i) ∅, X ∈ τ

(ii) If T_1, . . . , T_n is a finite collection of members of τ, then T_1 ∩ · · · ∩ T_n ∈ τ

(iii) If {T_i}_{i∈I} is any collection of members of τ, then ∪_{i∈I} T_i ∈ τ

In this case, we call the pair (X, τ) a topological space and we call the sets T ∈ τ open sets.

Note, there are two topologies which we can always place on any set X: the trivial topol-
ogy τ = {∅, X} and the discrete topology τ = P(X). Having defined open sets, we are able
to define closed sets.
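On a finite set, Definition 1 can be checked mechanically. The sketch below is mine (the name `is_topology` is not from the notes); for a finite family, closure under pairwise unions and intersections suffices, by induction.

```python
from itertools import chain, combinations

def is_topology(X, tau):
    """Check Definition 1 for a finite set X and a finite family tau."""
    X = frozenset(X)
    tau = {frozenset(T) for T in tau}
    # (i) the empty set and X belong to tau
    if frozenset() not in tau or X not in tau:
        return False
    # (ii)/(iii) pairwise intersections and unions suffice for a finite family
    for A in tau:
        for B in tau:
            if A & B not in tau or A | B not in tau:
                return False
    return True

X = {1, 2, 3}
trivial = [set(), X]
discrete = [set(s) for s in chain.from_iterable(combinations(X, r) for r in range(4))]
print(is_topology(X, trivial), is_topology(X, discrete))  # True True
print(is_topology(X, [set(), {1}, {2}, X]))               # False: {1} U {2} missing
```

The last family fails because the union {1, 2} of two of its members is not in the family, violating axiom (iii).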

Definition 2 (Closed Sets). Let (X, τ) be a topological space. A set S ⊂ X is called
closed iff S^c ∈ τ. That is, S is defined to be closed if S^c is open.

The words open and closed can be a bit confusing here. Students often mistak-
enly assume that a set is either open or closed; that these terms are mutually exclusive and
describe all sets. This is not the case. Indeed, sets can be open, closed, neither open nor
closed, or both open and closed. In any topological space (X, τ), the sets ∅ and X are both
open and closed. By De Morgan's laws, since finite intersections and arbitrary unions of open
sets are open, we see that finite unions and arbitrary intersections of closed sets remain closed.

Example 3. The set of real numbers ℝ becomes a topological space with open sets defined
as follows. Define ∅ to be open and define ∅ ≠ T ⊂ ℝ to be open iff for all x ∈ T, there
exists ε > 0 such that (x − ε, x + ε) ⊂ T. Prototypical open sets in this topology are the
open intervals (a, b) = {x ∈ ℝ : a < x < b}. Indeed, this interval is open because for
x ∈ (a, b), we can take ε = min{|x − a|, |x − b|} and we will find that (x − ε, x + ε) ⊂ (a, b).
We can combine open sets via unions or (finite) intersections to make more open sets; for
example (0, 1) ∪ (3, 5) is also an open set. Likewise, prototypical closed sets are closed in-
tervals [a, b] = {x ∈ ℝ : a ≤ x ≤ b}, and any intersection or (finite) union of such sets will
remain closed. As was observed above, ∅ and ℝ are both open and closed; in fact, in this
space, these are the only sets which are both open and closed, though it is easy to construct
sets which are neither open nor closed. Consider the set [0, 1) = {x ∈ ℝ : 0 ≤ x < 1}. This
set is not open because the point 0 is in the set, but it cannot be surrounded by an interval
which remains in the set. The complement of this set is (−∞, 0) ∪ [1, ∞). This set is not
open since 1 is in the set but cannot be surrounded by an interval which remains in the set.
Since the complement is not open, the set [0, 1) is not closed. Note, this topology is called
the standard topology on ℝ.

Example 4. While the above example defines the standard topology on ℝ, it is easy to
come up with non-standard topologies as well. Indeed, let us now define T ⊂ ℝ to be open
if T can be written as a union of sets of the form [a, b) = {x ∈ ℝ : a ≤ x < b}. These open
sets comprise a topology on ℝ. In this topology a prototypical open set is of the form [a, b).
What other sets are open in this topology? Notice that

    (a, b) = ∪_{n=1}^{∞} [a + 1/n, b),

which shows that sets of the form (a, b) remain open in this topology. Also notice that since
[a, b) is open, its complement

    [a, b)^c = (−∞, a) ∪ [b, ∞)

is closed. However, both (−∞, a) and [b, ∞) are easily seen to be open, so the set
(−∞, a) ∪ [b, ∞) is also open as a union of open sets. Since this set is open, its complement
[a, b) is closed. Hence in this topology, all sets of the form [a, b) are both open and closed.
The intervals [a, b] are closed and not open in this topology. Note, this topology is called
the lower limit topology on ℝ.

Notice that in these examples, the lower limit topology contains as open sets all of the sets
which are open in the standard topology. In this way, the lower limit topology has "more"
open sets and we can think of the lower limit topology as "containing" the standard topology.
We define these notions here.

Definition 5 (Finer & Coarser Topologies). Suppose that X is a set and τ, σ are two
topologies on X. If τ ⊂ σ, we say that τ is coarser than σ and that σ is finer than τ.

On any space X, the finest topology is the discrete topology P(X) and the coarsest is
the trivial topology {∅, X}. A finer topology is one that can more specifically distinguish
between elements.

Definition 6 (Interior & Closure). Let (X, τ) be a topological space and let T ⊂ X.
The interior of T is defined to be the largest open set contained in T. The closure of T is
defined to be the smallest closed set containing T. We denote these by int(T) and cl(T)
respectively. In symbols, we have

    int(T) = ∪_{S∈τ, S⊂T} S   and   cl(T) = ∩_{S^c∈τ, T⊂S} S.

Other common notations are T̊ for the interior of T, and T̄ for the closure of T.
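For a finite topological space, the two formulas in Definition 6 can be evaluated directly; closed sets are exactly the complements of the open sets. A small sketch (the function names and the toy topology are my own):

```python
def interior(T, tau):
    # Largest open set contained in T: the union of all open sets inside T.
    result = set()
    for S in tau:
        if S <= T:
            result |= S
    return result

def closure(T, X, tau):
    # Smallest closed set containing T: the intersection of all closed
    # sets (complements of open sets) that contain T.
    result = set(X)
    for S in tau:
        C = X - S
        if T <= C:
            result &= C
    return result

X = {1, 2, 3}
tau = [set(), {1}, {1, 2}, X]    # a topology on X
print(interior({1, 3}, tau))     # {1}
print(closure({2}, X, tau))      # {2, 3}
```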

Example 7. Considering ℝ with the standard topology and a, b ∈ ℝ, a < b, we have

    int([a, b)) = (a, b)   and   cl([a, b)) = [a, b].

In both of the examples above, there was some notion of a “prototypical” open set, from
which other open sets can be built. We give this notion a precise meaning here.

Definition 8 (Basis (Base) for a Topology). Let X be a set and let β be a collection
of subsets of X such that

(1) X = ∪_{B∈β} B,

(2) if B_1, B_2 ∈ β, then for each x ∈ B_1 ∩ B_2, there is B_3 ∈ β such that x ∈ B_3 and
B_3 ⊂ B_1 ∩ B_2.

Then the collection of sets

    τ = {T : T = ∪_{i∈I} B_i for some collection of sets {B_i}_{i∈I} ⊂ β}

forms a topology on X. We call this τ the topology generated by β, and we call β a basis
for the topology τ.
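When the basis is small and finite, the generated topology can be enumerated outright as the set of all unions of subfamilies, directly following the definition. A sketch (names are mine, not from the notes):

```python
from itertools import chain, combinations

def topology_from_basis(basis):
    """All unions of subfamilies of the basis (feasible only for small bases)."""
    basis = [frozenset(B) for B in basis]
    tau = set()
    for subfamily in chain.from_iterable(
            combinations(basis, r) for r in range(len(basis) + 1)):
        tau.add(frozenset().union(*subfamily))
    return tau

# A basis of singletons generates the discrete topology on {1, 2, 3}.
tau = topology_from_basis([{1}, {2}, {3}])
print(len(tau))   # 8, i.e. all 2^3 subsets are open
```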

This is half definition and half theorem: we are defining what it means to be a basis, and
asserting that the topology generated by a basis is indeed a topology. If we can identify a
basis for a topology, then the basis sets are the “prototypical” open sets, and all other open
sets can be built as unions of the basis sets. Morally, basis sets are representatives for the
open sets; if you can prove a given property for basis sets, the property will likely hold for
all open sets. Often, it is easiest to define a topology by identifying a basis.

Example 9. Above we defined the standard topology on ℝ by saying that a set T is open
if for all x ∈ T, there is ε > 0 such that (x − ε, x + ε) ⊂ T. It is important to see this
definition of the topology; however, it is more analytic than topological in flavor.
The topological way to define the standard topology on ℝ is as the topol-
ogy generated by the sets (a, b) where a, b ∈ ℝ, a < b. Indeed, these two definitions of the
standard topology are equivalent, as the following proposition shows.

Proposition 10. Suppose that (X, τ) is a topological space and that β is a basis for the
topology τ. Then T ∈ τ iff for all x ∈ T, there is B ∈ β such that x ∈ B and B ⊂ T.

It is important to identify when two bases generate the same topology, and this next
proposition deals with that question.

Proposition 11. Suppose that X is a set and β_1, β_2 are two bases for topologies τ_1 and τ_2.
Then τ_1 ⊂ τ_2 iff for every B_1 ∈ β_1, and for every x ∈ B_1, there is B_2 ∈ β_2 such that x ∈ B_2
and B_2 ⊂ B_1. (Be very careful not to mix up the inclusions in this statement. What this
is essentially saying is that β_2 generates a larger (finer) topology iff β_2 has more (smaller)
sets.) Informally, the basis β_2 generates a finer topology if we can squeeze basis sets from β_2
inside basis sets from β_1 (and not only that, but we can construct basis sets from β_1 out of
basis sets from β_2).

Just as all groups have subgroups and all vector spaces have subspaces, there is a natural
way to define subspaces of a topological space.

Definition 12 (Subspace Topology). Let (X, τ) be a topological space and let Y ⊂ X.
Then the collection of sets

    σ = {Y ∩ T : T ∈ τ}

forms a topology on Y. This topology is called the subspace topology on Y inherited from
(X, τ).

Again, this is part definition and part theorem; we are asserting that this collection does
indeed define a topology on Y.

Example 13. Consider [0, 3] ⊂ ℝ with the standard topology on ℝ. Note that the subspace
topology on [0, 3] includes standard open sets like (1, 2), since this set is open in ℝ and

    (1, 2) = [0, 3] ∩ (1, 2).

Now consider the set (1, 3]. This set is not open in ℝ; however, it is open in the subspace
topology on [0, 3], because (1, 4) is open in ℝ and

    (1, 3] = [0, 3] ∩ (1, 4).
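Definition 12 is easy to compute with on finite examples. In the sketch below (a toy topology of my own choosing, not from the notes), the set {3} is open in the subspace even though it is not open in the ambient space, mirroring how (1, 3] is open in [0, 3] but not in ℝ:

```python
def subspace_topology(Y, tau):
    # Definition 12: intersect each open set of the ambient space with Y.
    return {frozenset(Y) & frozenset(T) for T in tau}

X = {1, 2, 3, 4}
tau = [set(), {1}, {1, 2}, {3, 4}, {1, 3, 4}, {1, 2, 3, 4}]  # a topology on X
Y = {2, 3}
sigma = subspace_topology(Y, tau)
# {3} = {3, 4} & Y is open in the subspace, though {3} is not open in X.
print(sorted(sorted(S) for S in sigma))
```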

Likewise there is a natural way to combine topological spaces in a Cartesian product.

Definition 14 (Product Topology). Let (X, τ) and (Y, σ) be topological spaces. Recall the Cartesian product is
given by coupling elements of X and Y: X × Y := {(x, y) : x ∈ X, y ∈ Y}. It is tempting to
define a topology on X × Y comprised of sets of the form T × S for T ∈ τ, S ∈ σ. However,
these do not form a topology on X × Y since a union of sets of this form need not be of this
form anymore. So rather, we let β = {T × S : T ∈ τ, S ∈ σ} form the basis for a topology
on X × Y. The topology generated by β is denoted τ × σ and the space (X × Y, τ × σ) is
called the product space of X and Y.¹

Example 15. Consider ℝ with the standard topology, which we call τ_1. The product space
(ℝ × ℝ, τ_1 × τ_1) can be visualized by drawing the plane with standard open sets as rectangles
(a, b) × (c, d) = {(x, y) ∈ ℝ × ℝ : a < x < b and c < y < d}. Alternatively, we can consider
the set ℝ² of 2-dimensional vectors. On this space, we consider the topology τ_2 generated
by open balls: B_r(v) = {z ∈ ℝ² : ‖v − z‖ < r} for v ∈ ℝ² and r > 0 (indeed, this is called the
standard topology on ℝ²). We can identify each vector v = (x, y) ∈ ℝ² with the coordinates
(x, y) ∈ ℝ × ℝ. Since this is a bijective map, the sets ℝ × ℝ and ℝ² are really the same.
We'd like to know if the topologies τ_1 × τ_1 and τ_2 are the same. To prove they are the same,
consider the bases

    β_1 = {(a, b) × (c, d) : a, b, c, d ∈ ℝ, a < b, c < d}   and
    β_2 = {B_r(v) : v ∈ ℝ², r > 0}.

For any B_1 = (a, b) × (c, d) ∈ β_1, take any (x, y) ∈ B_1 and let r = min{x − a, b − x, y − c, d − y}.
Then for v = (x, y), we will have (x, y) ∈ B_r(v) ⊂ B_1; this shows that for any (x, y) ∈ B_1, we can
find a set B_2 ∈ β_2 such that (x, y) ∈ B_2 and B_2 ⊂ B_1. Hence by Proposition 11, τ_1 × τ_1 ⊂ τ_2.
Conversely, let v = (x, y) ∈ ℝ² and r > 0, and consider the set B_2 = B_r(v) ∈ β_2. For any u = (z, w) ∈
B_r(v), define r′ = (r − ‖u − v‖)/√2. Then the square B_1 = (z − r′, z + r′) × (w − r′, w + r′)
satisfies u = (z, w) ∈ B_1 and B_1 ⊂ B_2. Thus by Proposition 11, we have τ_2 ⊂ τ_1 × τ_1,
and we can conclude that τ_2 = τ_1 × τ_1. (Note, this inclusion of basis sets is pictured in
Figure 1.) That is, the standard topology on ℝ² is the product of two copies of the standard
topology on ℝ. More generally, for n ∈ ℕ, we can define the standard topology on ℝⁿ to be
the topology generated by open balls, and we will find that this is the same as the product
of n copies of the standard topology on ℝ.
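The warning in Definition 14, that the rectangles T × S alone do not form a topology, can be seen on a tiny example. Here (my own construction, not from the notes) the union of two rectangles over the Sierpiński-style topology {∅, {1}, {1, 2}} is open but is not itself a rectangle:

```python
from itertools import product

def product_basis(tau_X, tau_Y):
    # Basis of Definition 14: sets T x S with T, S open in the factors.
    return {frozenset(product(T, S)) for T in tau_X for S in tau_Y}

tau = [set(), {1}, {1, 2}]           # a topology on X = {1, 2}
beta = product_basis(tau, tau)

# The union of {1} x {1,2} and {1,2} x {1} is an L-shaped 3-point set:
# it must be open in the product topology, but it is not a rectangle.
U = frozenset(product({1}, {1, 2})) | frozenset(product({1, 2}, {1}))
print(U in beta)   # False
```

This is exactly why the rectangles are taken only as a basis: the product topology consists of all unions of such rectangles.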

Topology gives us the minimum structure required to discuss limits and continuity. Indeed,
in calculus we were only able to discuss these things because ℝ is naturally a topological
space with the standard topology. We give the topological definitions of limits and continuity
here and discuss some of their properties.

Definition 16 (Limit of a Sequence). Let (X, τ) be a topological space and let {x_n}_{n=1}^∞
be a sequence of values in X. We say that {x_n}_{n=1}^∞ converges to a limit x ∈ X if for any
open set T ∈ τ such that x ∈ T, there is N ∈ ℕ such that x_n ∈ T for all n ≥ N. We write
this as x_n → x or lim_{n→∞} x_n = x.

¹ Note, this is only a good definition for the product topology when we are taking the product of a finite
number of spaces. Indeed, if {(X_i, τ_i)}_{i∈I} is an arbitrary collection of topological spaces, it is most natural
to define the product topology on X = ∏_{i∈I} X_i to be the coarsest topology so that the projection maps
π_i : X → X_i are continuous. The topology generated by sets of the form ∏_{i∈I} U_i where U_i ∈ τ_i is
then called the box topology. One can show that for a finite Cartesian product, the product topology and box
topology agree with each other; this is not necessarily true for infinite products. (Another way to "correctly"
define the product topology for an infinite product X = ∏_{i∈I} X_i is to let it be generated by sets of the form
∏_{i∈I} U_i where U_i ∈ τ_i and U_i = X_i for all but finitely many i ∈ I.)

Figure 1: A basis set from either topology τ_1 × τ_1 or τ_2 can be fit around any point in a basis
set from the other topology.

Note that this is a generalization of the definition we gave for a limit in calculus; in calculus
we are always using the standard topology on ℝ. One feature of the limit in calculus
is that limits are unique: if x_n → x and x_n → y, then x = y. This is not true in a general
topological space.

Example 17. Consider a space X with the trivial topology τ = {∅, X}. Take any sequence
{x_n}_{n=1}^∞ in X and apply the definition of the limit. For any x ∈ X and any N ∈ ℕ, we see
that if T ∈ τ and x ∈ T, then T = X and x_n ∈ T for all n ≥ N. Thus in this space, every
sequence converges to every point.
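Example 17 can be checked by brute force on a finite space. For an eventually-constant sequence, a tail lies in an open set T iff the constant tail value does, which reduces Definition 16 to a membership test (the function name and the examples are mine):

```python
def converges_to(tail_value, x, tau):
    """Definition 16 for an eventually-constant sequence: every open set
    containing x must absorb a tail, and a tail of an eventually-constant
    sequence lies in T iff its constant value does."""
    return all(tail_value in T for T in tau if x in T)

X = {1, 2, 3}
# The sequence 1, 2, 3, 1, 1, 1, ... is eventually constant at 1.
trivial = [set(), X]
print([converges_to(1, x, trivial) for x in X])   # [True, True, True]

# In the discrete topology, by contrast, it converges only to 1.
discrete = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, X]
print([converges_to(1, x, discrete) for x in X])  # [True, False, False]
```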

There are non-trivial topologies where limits are still non-unique, but our intuition tells
us that limits should be unique, and we can add a simple property to ensure that they are.

Definition 18 (Hausdorff Space). A topological space (X, τ) is called a Hausdorff space
(or is said to have the Hausdorff property) if for all x, y ∈ X with x ≠ y, there are open
sets T_x, T_y ∈ τ such that x ∈ T_x, y ∈ T_y and T_x ∩ T_y = ∅.

Proposition 19. Limits in Hausdorff spaces are unique. That is, if (X, τ) is a Hausdorff
space and {x_n}_{n=1}^∞ is a sequence in X, then {x_n}_{n=1}^∞ can have at most one limit x ∈ X.

Now we would like to discuss maps between spaces. As with linear transforms in linear
algebra and homomorphisms in abstract algebra, we restrict our discussion to maps which
preserve some of the underlying structure of the space. In topology, these are the continuous
functions.

Definition 20 (Continuous Function). Let (X, τ) and (Y, σ) be two topological spaces.
A function f : X → Y is said to be continuous iff

    f⁻¹(V) = {x ∈ X : f(x) ∈ V} ∈ τ   whenever V ∈ σ.

That is, f is continuous if the preimage of every open set in Y is open in X.
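For maps between finite topological spaces, Definition 20 is directly checkable: compute the preimage of each open set and test whether it is open. A sketch with a hypothetical two-point example of mine:

```python
def is_continuous(f, tau_X, tau_Y):
    # Definition 20: the preimage of every open set in Y must be open in X.
    tau_X = {frozenset(T) for T in tau_X}
    for V in tau_Y:
        preimage = frozenset(x for x in f if f[x] in V)
        if preimage not in tau_X:
            return False
    return True

tau_X = [set(), {1}, {1, 2}]        # Sierpinski-style topology on X = {1, 2}
tau_Y = [set(), {'a'}, {'a', 'b'}]  # and a copy of it on Y = {'a', 'b'}
f = {1: 'a', 2: 'b'}
g = {1: 'b', 2: 'a'}
print(is_continuous(f, tau_X, tau_Y))   # True:  f^{-1}({'a'}) = {1} is open
print(is_continuous(g, tau_X, tau_Y))   # False: g^{-1}({'a'}) = {2} is not
```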

This gives us a way to define continuity without ever considering individual points. Contrast
this with the calculus definition of continuity, where we first define what it means for a
function to be continuous at a point, and then define continuous functions on a domain to
be those which are continuous at each point. Of course, topology still has a notion of what
it means to be continuous at a point.

Definition 21 (Pointwise Continuity). Let (X, τ) and (Y, σ) be two topological spaces
and let f : X → Y. We say that f is continuous at the point x ∈ X if for all V ∈ σ such
that f(x) ∈ V, there is U ∈ τ such that x ∈ U and f(U) ⊂ V.

It is a good exercise to prove that when both X, Y are ℝ with the standard topology, this
definition of continuity is equivalent to the ε-δ definition of continuity presented in calculus.

Continuity is an important property because it preserves certain features of topological
spaces. Indeed, if f : X → Y, we define the image of X under f by f(X) = {f(x) : x ∈ X}.
We define τ_f to be the collection of subsets V ⊂ f(X) such that f⁻¹(V) is open in X. If f
is continuous, this collection will form a topology, and this topology τ_f on f(X) will be the
same as the subspace topology that f(X) inherits from (Y, σ). This shows that continuous
functions respect the structure of the underlying spaces; in other words, continuous functions
are the morphisms in the category of topological spaces.

Above we showed that the spaces (ℝ², τ_2) and (ℝ × ℝ, τ_1 × τ_1) are essentially the same
space. It is important to be able to identify equivalent spaces or distinguish be-
tween distinct topological spaces, and continuity helps us do that. We define a few more
properties of topological spaces here.

Definition 22 (Connectedness). Let (X, τ) be a topological space and let C ⊂ X. We
say that the set C is disconnected iff there exist open sets A, B ∈ τ, each intersecting C,
such that C ⊂ A ∪ B and A ∩ B ∩ C = ∅. We say that C is connected iff it is not disconnected.

Example 23. Intuitively, a set is connected if it is in one whole piece; disconnected sets
have separate pieces broken off from each other. Thus for example, in ℝ with the standard
topology, the set [5, 9) is connected while the set {1} ∪ [5, 9) ∪ (10, ∞) is disconnected.
However note, this is only a heuristic! It works very well in ℝ, but even in ℝ² there are
famous examples that challenge this intuition. Indeed, consider the set C ⊂ ℝ² (pictured in
Figure 2) given by

    C = {(0, y) : y ∈ [−1, 1]} ∪ {(x, sin(1/x)) : x ∈ (0, ∞)}.

This is called the Topologist's Sine Curve, and while it is defined as the disjoint union of
two sets, it is actually connected. Indeed, any open set containing the vertical strip
{(0, y)}_{y∈[−1,1]} will necessarily contain some of the curve. Thus this set cannot be covered by
two disjoint, open sets.

Figure 2: Topologist's Sine Curve

(Actually, a stronger notion of connectedness is path-connectedness.
Roughly speaking, a set is path-connected if one can draw a path between any two points
in the set without leaving the set. At first glance, path-connectedness may seem equivalent
to connectedness, but the topologist's sine curve is connected without being path-connected.)

One important result involving connectedness helps us classify sets which are both open
and closed.

Proposition 24. Suppose that (X, τ) is a topological space. Then X is connected iff the
only sets which are both open and closed in X are ∅ and X itself.

Definition 25 (Compactness). Let (X, τ) be a topological space and let C ⊂ X. We
say that C is compact iff from any collection of open sets {U_i}_{i∈I} such that C ⊂ ∪_{i∈I} U_i, we
can extract a finite collection of sets U_{i_1}, . . . , U_{i_n} such that C ⊂ ∪_{k=1}^n U_{i_k}.

Such a collection {U_i}_{i∈I} in the definition of compactness is called an open cover of C.
Thus, in words, a set C is compact if every open cover of C admits a finite subcover.

Example 26. Compactness is somehow a generalization of closedness and boundedness.
Indeed, we give examples here of a bounded set which is not closed and a closed set which
is not bounded, and prove that neither is compact. Consider ℝ with the standard topol-
ogy and consider the bounded set (0, 1). This set is not closed, and we show that it is not
compact. Consider U_k = (0, 1 − 1/k) for k ∈ ℕ. We see that (0, 1) ⊂ ∪_{k=1}^∞ U_k, so {U_k} forms
an open cover of (0, 1). However, if we take any finite subcollection U_{k_1}, . . . , U_{k_N} of {U_k}
and let K = max{k_1, . . . , k_N}, then (0, 1) ⊄ ∪_{n=1}^N U_{k_n} = (0, 1 − 1/K). Thus there cannot
be a finite subcover for this particular cover, so this cover violates the definition of
compactness and we conclude that (0, 1) is not compact. Similarly, consider the set [0, ∞).
This set is closed, but is not bounded, and we show that it is not compact. Consider the
collection U_k = (−1, k) for k ∈ ℕ. We see [0, ∞) ⊂ ∪_{k=1}^∞ U_k, but again, any finite subcollection
U_{k_1}, . . . , U_{k_N} will satisfy [0, ∞) ⊄ ∪_{n=1}^N U_{k_n} = (−1, K) where K = max{k_1, . . . , k_N}, so
this open cover admits no finite subcover, and hence we conclude that [0, ∞) is not compact.
By contrast, a closed and bounded subset of ℝ like [0, 1] is compact, though this is not at all
trivial to prove. In general topological spaces, it is easier to show that a set isn't compact,
since this only requires exhibiting one example of an open cover that does not admit a
finite subcover. In ℝ (and more generally, in metric topologies), there are nice theorems
which give concrete lists of properties which are equivalent to compactness. We cover some
of these theorems in the Real Analysis portion of these notes.
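The first argument in Example 26 can be made concrete: for any finite choice of indices, the union of the chosen sets U_k = (0, 1 − 1/k) is (0, 1 − 1/K) with K the largest index, and the point 1 − 1/(2K) of (0, 1) escapes it. A small sketch of that witness computation (the function name is mine):

```python
def witness_escapes(ks):
    # Finite subfamily {U_k = (0, 1 - 1/k) : k in ks}; its union is
    # (0, 1 - 1/K) for K = max(ks). The point 1 - 1/(2K) lies in (0, 1)
    # but in none of the chosen U_k, so the subfamily is not a cover.
    K = max(ks)
    witness = 1 - 1 / (2 * K)
    assert 0 < witness < 1
    return all(not (0 < witness < 1 - 1 / k) for k in ks)

print(witness_escapes([2, 5, 100]), witness_escapes([7]))   # True True
```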

Proposition 27. Suppose that (X, τ) and (Y, σ) are topological spaces and that f : X → Y
is continuous. Let U ⊂ X and consider the image of U under f defined by f(U) := {f(x) :
x ∈ U} ⊂ Y. If U is connected in X, then f(U) is connected in Y. Likewise, if U is compact
in X, then f(U) is compact in Y.

This proposition tells us that the properties of connectedness and compactness are in-
variant under continuous maps. Statements like this help us characterize topological spaces.
Note however that the converse is not true: a continuous map could still map a disconnected
set to a connected set (for example, the continuous function f(x) = x² on ℝ maps the dis-
connected set (−1, 0) ∪ (0, 1) to the connected set (0, 1)), or a non-compact set to a compact
set (for example, the continuous map f(x) = sin(x) maps the non-compact set (0, ∞) to the
compact set [−1, 1]). If we want a continuous map to not change the structure of a space at
all, we need to require something more.

Definition 28 (Homeomorphism). Suppose that (X, τ) and (Y, σ) are topological
spaces. A function f : X → Y is called a homeomorphism iff the following four properties
hold:

1. f is one-to-one,

2. f is onto,

3. f is continuous,

4. f⁻¹ is continuous.

If such a function f exists, the topological spaces (X, τ) and (Y, σ) are called homeomorphic
and we write X ≅ Y.

In words, a homeomorphism between two topological spaces is a bicontinuous bijection;
it maps each space bijectively and continuously to the other. Homeomorphic spaces share
essentially all important properties in common, so when two spaces are homeomorphic we
think of them as "morally the same space." For this reason it can be important to identify
whether two spaces are homeomorphic. We make a few final statements about properties that
homeomorphic spaces share, and give one example in conclusion.

Proposition 29. Let (X, τ) and (Y, σ) be topological spaces, let U ⊂ X and let f : X → Y
be a homeomorphism. Then

U is open in X iff f(U) is open in Y,

U is closed in X iff f(U) is closed in Y,

U is connected in X iff f(U) is connected in Y,

U is compact in X iff f(U) is compact in Y,

X is Hausdorff iff Y is Hausdorff.

Example 30. Consider ℝ with the usual topology. Any open interval (a, b) is homeomorphic
to the interval (0, 1) under the map f : (a, b) → (0, 1) defined by

    f(x) = (x − a)/(b − a),   x ∈ (a, b).

Likewise, (0, 1) is homeomorphic to ℝ under the map g : (0, 1) → ℝ defined by

    g(x) = tan(π(x − 1/2)),   x ∈ (0, 1).

Homeomorphism is an equivalence relation (in particular, if two spaces are each homeomorphic
to the same space, they are homeomorphic to each other); indeed, we can always compose
homeomorphisms and retain a homeomorphism. Thus any interval (a, b) is homeomorphic to ℝ.
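The maps in Example 30 can be checked numerically: composing g with the affine map f sends (a, b) onto ℝ, and a round trip through the explicit inverses recovers the starting point. The inverse formulas below are standard, but the endpoints and test points are arbitrary choices of mine:

```python
import math

def f(x, a, b):
    # (a, b) -> (0, 1): affine rescaling, as in Example 30
    return (x - a) / (b - a)

def g(x):
    # (0, 1) -> R, as in Example 30
    return math.tan(math.pi * (x - 0.5))

def g_inv(y):
    # R -> (0, 1): the continuous inverse of g
    return math.atan(y) / math.pi + 0.5

# The composition g o f is then a homeomorphism (a, b) -> R; check round trips.
a, b = -2.0, 5.0
for x in [-1.9, 0.0, 3.25, 4.9]:
    y = g(f(x, a, b))
    assert abs(g_inv(y) * (b - a) + a - x) < 1e-9
print("round trips agree")
```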

Real Analysis

Real analysis is concerned with the rigorous underpinnings of calculus. However, when
we teach calculus, we do everything formally, and so everything is assumed to be "nice"
(all the common functions from calculus are smooth, for example). Now we make no such
assumptions: analysis is largely about tearing down our intuition from calculus and building
it back up again with rigor. Accordingly, most real analysis courses start with the basic
construction of ℝ; we discuss this formally later, but begin here by establishing properties of
sequences, functions, sets, etc. Much of this will overlap with the preceding topology notes,
but often the same concepts are tackled in very different ways. Some of this will also be
repeated from the Calculus I & II notes.

We’ll start by discussing general metric spaces, giving several definitions and theorems,
and later specialize the conversation to R.

Definition 31 (Metric Space). Let X be a set and let d : X × X → [0, ∞). We call d a
metric on X (and call (X, d) a metric space) if the following three properties hold:

1. d(x, x) = 0 for all x ∈ X and d(x, y) > 0 when x, y ∈ X, x ≠ y,

2. d(x, y) = d(y, x) for all x, y ∈ X,

3. d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z ∈ X.

Metrics generalize the notion of distance to non-Euclidean spaces.

Example 32. The prototypical example of a metric space is ℝ with the metric d(x, y) =
|x − y|. This can be generalized to ℝⁿ. Indeed, in ℝⁿ, we define the metric

    d(x, y) = ‖x − y‖ := ( Σ_{i=1}^n (x_i − y_i)² )^{1/2},   x, y ∈ ℝⁿ.

Another example: for any set X, we can define the discrete metric d(x, y) = 0 if x = y and
d(x, y) = 1 if x ≠ y.
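Definition 31 can be spot-checked on a finite sample of points. The sketch below (names and sample points are mine) confirms the axioms for the absolute-value and discrete metrics, and shows that d(x, y) = (x − y)² fails the triangle inequality, so squaring a metric need not give a metric:

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    # Check the three axioms of Definition 31 over all triples from `points`.
    for x, y, z in product(points, repeat=3):
        if x == y and d(x, y) != 0:
            return False
        if x != y and not d(x, y) > 0:
            return False
        if abs(d(x, y) - d(y, x)) > tol:
            return False
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True

euclidean = lambda x, y: abs(x - y)
discrete = lambda x, y: 0 if x == y else 1
squared = lambda x, y: (x - y) ** 2
sample = [-3.0, 0.0, 1.5, 4.0]
print(is_metric(sample, euclidean), is_metric(sample, discrete))  # True True
print(is_metric(sample, squared))                                 # False
```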

Definition 33 (Metric Topology). Let (X, d) be a metric space, x ∈ X and r > 0.
Define the ball centered at x of radius r by

    B_r(x) = {y ∈ X : d(x, y) < r}.

Let τ be the topology generated by the basis β = {B_r(x) : x ∈ X, r > 0}. This is called the
metric topology on X. In this topology, a set U ⊂ X is open iff for all x ∈ U, there is r > 0
such that B_r(x) ⊂ U. We will also refer to τ as the topology generated by the metric d.
Conversely, if we have a topological space (X, τ) and there is a metric d on X that generates
τ, then we call τ metrizable.

For example, the discrete metric on a set will generate the discrete topology, and the
standard metric d(x, y) = |x − y| on ℝ will generate the standard topology on ℝ.
Metric topologies have very nice structure. Most of the topological properties discussed
above can be given new definitions using only the metric structure that d lends to X. Thus,
the definitions in real analysis and topology may look different at first glance, but they are
always compatible. We discuss some properties of metric topologies now.

Proposition 34. Let (X, d) be a metric space. The metric topology on (X, d) is Hausdorff.

This is very easy to prove: if x, y ∈ X and x ≠ y, then d(x, y) > 0. Then B_ε(x) and
B_ε(y), where ε = d(x, y)/3, are disjoint open neighborhoods of x and y respectively,
proving that the space is Hausdorff.

Definition 35 (Limit of a Sequence). Let (X, d) be a metric space and let {x_n}_{n=1}^∞ be
a sequence in X. We say that x ∈ X is the limit of x_n iff for all ε > 0, there is N ∈ ℕ such
that d(x, x_n) < ε for all n ≥ N. In this case, we say that x_n converges to x and we write
x_n → x or lim_{n→∞} x_n = x.

This definition of the limit is exactly as in calculus but generalized to arbitrary metric
spaces. Limits give us a way to characterize closed sets in metric topologies.

Definition 36 (Limit Points). Let (X, d) be a metric space and let U ⊂ X. A point
x ∈ X is called a limit point of U if there is a sequence {x_n} in U such that x_n ≠ x for all
n ∈ ℕ and x_n → x.

Proposition 37. Let (X, d) be a metric space and let C ⊂ X. Then C is closed in the
metric topology iff for all sequences {x_n} in C converging to a limit x ∈ X, we have that
x ∈ C. In the terminology of the above definition, a subset of a metric space is closed iff it
contains all of its limit points.

Recall, in topology a set is closed iff the complement of the set is open. This theorem
gives an alternate characterization, and it is often easier to check that a set contains its limit
points than to check that its complement is open.

Definition 38 (Closure). Let U ⊂ X and let L = {x ∈ X \ U : x is a limit point of U}.
Then the closure of U is defined by Ū = U ∪ L. That is, the closure of U is the set U plus
all of the limit points of U.

Note, we already defined the closure in topology to be the smallest closed set containing a
set. Again, these notions are compatible: Ū defined above is the smallest
set which is closed in the metric topology and contains U. In many metric spaces, we can
build any point in the space by considering a smaller set and taking limits.

Definition 39 (Dense Set). Let (X, d) be a metric space and let D ⊂ X. We say that D
is dense in X iff for all x ∈ X, there is a sequence {x_n} in D such that x_n → x. Equivalently,
D is dense in X iff for all x ∈ X and ε > 0, there is y ∈ D such that d(x, y) < ε. Again
equivalently, D is dense in X iff D̄ = X. One last equivalent statement: D is dense in X iff
every non-empty open subset of X contains at least one point of D.

Intuitively, a dense set is tightly packed into X; it may not include all elements, but the
gaps between elements are infinitesimally small. In this way it seems like a dense set must
contain "most" of the space, but this is a place where intuition fails. Indeed, a dense set
can actually be quite small in a few different senses. We define one sense here and discuss it
more when we discuss ℝ later.
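A classic instance of density: ℚ is dense in ℝ. Given any x and ε > 0, `Fraction.limit_denominator` from the Python standard library produces a rational within ε, since the best rational approximation with denominator at most N lies within 1/N of x. A sketch (the wrapper function is mine):

```python
from fractions import Fraction
import math

def rational_within(x, eps):
    # The best rational approximation with denominator at most N is within
    # 1/N of x, so N = ceil(1/eps) guarantees an eps-close rational.
    N = math.ceil(1 / eps)
    return Fraction(x).limit_denominator(N)

x, eps = math.pi, 1e-6
q = rational_within(x, eps)
print(q, abs(x - float(q)) < eps)
```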

Definition 40 (Countable Set). A set C is countable if there is an injective map
f : C → ℕ. Equivalently, a set C is countable if the elements of C can be listed in a
sequence: C = {c_n}_{n=1}^∞. A set C is countably infinite if there is a bijective map f : C → ℕ.

Definition 41 (Separable Space). Let (X, d) be a metric space (or more generally, a
topological space). We say that X is separable if there is a countable set D ⊂ X which is
dense in X: D̄ = X.

Again, intuitively, it may seem like a separable space needs to be “small” because it has
a “small” dense set, but this intuition is not true in any meaningful sense. There are highly
non-trivial separable spaces.

Definition 42 (Cauchy Sequence). Let (X, d) be a metric space and let {x_n} be a
sequence in X. We call {x_n} a Cauchy sequence iff for all ε > 0, there is N ∈ ℕ such that
d(x_n, x_m) < ε for all n, m ≥ N.

A Cauchy sequence is one that eventually begins to cluster together. Intuitively, we may
think that if the sequence clusters together, it must cluster around some point and thus it
will converge to that point. However, if the space X is “missing” some points, then the
sequence may cluster around a missing point and thus fail to converge to any member of X.
Thus we use Cauchy sequences to define a notion of a space not having any “missing” points.

Definition 43 (Complete Space). We call a metric space (X, d) complete iff for all
Cauchy sequences {x_n} in X, there is x ∈ X such that x_n → x.

Complete spaces are nice because to prove that a sequence {x_n} has a limit, one first needs to
identify a candidate x and then prove that d(x, x_n) becomes small; the candidate
x may be difficult or impossible to identify. If the space is complete, to prove that
a sequence converges, one no longer needs to identify a candidate; rather, one can prove
that {x_n} is a Cauchy sequence and conclude that it converges in that manner.
Besides sequences, much of calculus is concerned with functions and their properties like
continuity, differentiability and integrability. We can discuss continuity in general metric
spaces; the other concepts require some of the structure of ℝ, so we leave them for later.

Definition 44 (Continuity). Let (X, dX ) and (Y, dY ) be two metric spaces, let f : X ! Y
Christian Parkinson GRE Prep: Topology & Real Analysis Notes 14

and let x ∈ X. We say that f is continuous at x iff for all ε > 0, there is δ = δ(x, ε) > 0
such that for all z ∈ X, d_X(x, z) < δ ⟹ d_Y(f(x), f(z)) < ε. We say that f is continuous
on X (or merely, f is continuous) iff f is continuous at every point x ∈ X; note that here δ
is allowed to depend on the point x as well as on ε.

This is the exact notion of continuity that we presented in calculus, but generalized to
metric spaces. Again, it is a useful exercise to prove that this notion of continuity is equiv-
alent to the topological notion of continuity. Because metric spaces have nice structure, we
can also characterize continuity in terms of limits of sequences.

Theorem 45 (Sequential Criterion Theorem). Let (X, dX ) and (Y, dY ) be metric


spaces and let f : X ! Y . Then f is continuous i↵ for all sequences {xn } in X converging
to a point x 2 X, we have that f (xn ) ! f (x) in Y .

A function satisfying the latter condition is said to be sequentially continuous, so this


proposition tells us that a function between metric spaces is continuous i↵ it is sequentially
continuous.
In calculus, this would conclude our discussion of continuity. However, in analysis, we
define more stringent definitions of continuity to more finely di↵erentiate between classes of
functions.

Definition 46 (Uniform Continuity). Let (X, d_X) and (Y, d_Y) be two metric spaces and
let f : X → Y. We say that f is uniformly continuous iff for all ε > 0, there is δ = δ(ε) > 0
such that for all x, y ∈ X, d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε.

At first glance this definition looks identical to the definition of continuity, but it is not.
The subtle difference is in the order of the quantifiers. In the definition of continuity, the δ
is allowed to depend on the particular point x you are testing; in the definition of uniform
continuity, δ cannot depend on the point: there must be a uniform δ that depends only on ε.
In logical notation this difference is expressed as follows: f is continuous iff

(∀ε > 0)(∀x ∈ X)(∃δ > 0)(∀y ∈ X) ; d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε,

whereas f is uniformly continuous iff

(∀ε > 0)(∃δ > 0)(∀x, y ∈ X) ; d_X(x, y) < δ ⟹ d_Y(f(x), f(y)) < ε.

Thus uniform continuity is a stronger condition: if f is uniformly continuous, then f is
continuous, but not vice versa. An even stronger notion of continuity is as follows.

Definition 47 (Lipschitz Continuity). Let (X, dX ) and (Y, dY ) be two metric spaces
and let f : X ! Y . We say that f is Lipschitz continuous i↵ there is a constant L > 0 such
that for all x, y 2 X, dY (f (x), f (y))  L · dX (x, y). In this case, the smallest such L is
called the Lipschitz constant of f .

Thus Lipschitz continuous functions have an explicit bound on the distance between f(x)
and f(y) in terms of the distance between x and y. If f is Lipschitz continuous with constant
L, then for any ε > 0, we can take δ = ε/L and find that f satisfies the definition of
uniform continuity; hence Lipschitz continuity is even stronger.
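To see the gap between continuity and uniform continuity concretely, here is a small numerical sketch (my own illustration, not from the notes): for f(x) = x² and a fixed ε, the largest δ that works at the point x solves (x + δ)² − x² = ε, i.e. δ(x) = √(x² + ε) − x. Since δ(x) → 0 as x grows, no single δ can serve every point, so f is continuous but not uniformly continuous on R.

```python
import math

# Largest workable delta at each point x for f(x) = x^2 and eps = 0.1,
# solved exactly from (x + delta)^2 - x^2 = eps.
eps = 0.1
deltas = [math.sqrt(x * x + eps) - x for x in (1, 10, 100, 1000)]
print(deltas)  # shrinks roughly like eps / (2x)
```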

With this we drop generality and talk specifically about the analytic and topological
structure of R. Again, we will not explicitly construct the real numbers, but we’ll present
the rough idea, which is to start with rational numbers and define real numbers as limit
points of Cauchy sequences of rationals.

Definition 48 (Rational Numbers). Define the set of rational numbers to be those


which can be written as a ratio of integers. That is, Q = {p/q : p, q ∈ Z, q ≠ 0}. With this
definition, the rational numbers form a field: they are an abelian group under addition and,
neglecting the additive identity, they are an abelian group under multiplication.

The rational numbers become a metric space with the metric d(a, b) = |a − b| for a, b ∈ Q.

Proposition 49. Q is countably infinite. That is, we can enumerate the rationals in a
sequence Q = {q_n}_{n=1}^∞.

This requires what is called a diagonalization argument. One can list the rationals in
a two-dimensional table where row k corresponds to the rationals whose denominator is k.
Then one can traverse the table down each diagonal, assigning a natural number to each
rational number.
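The diagonal traversal can be sketched in a few lines of Python (an illustration, not part of the notes; for simplicity it enumerates only the positive rationals, skipping duplicates such as 2/4 = 1/2):

```python
from fractions import Fraction

def rationals(n_terms):
    """List the first n_terms positive rationals by walking the diagonals
    p + q = d of the table whose row q holds fractions with denominator q."""
    seen, out = set(), []
    d = 2  # current diagonal: numerator + denominator = d
    while len(out) < n_terms:
        for p in range(1, d):
            r = Fraction(p, d - p)
            if r not in seen:  # skip duplicates like 2/4 = 1/2
                seen.add(r)
                out.append(r)
        d += 1
    return out[:n_terms]

print(rationals(6))  # 1, 1/2, 2, 1/3, 3, 1/4
```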

Proposition 50. Q is densely ordered, in the sense that between any two rational numbers one
can find another rational number. Indeed, if a, b ∈ Q and a < b, then for large enough
n ∈ N, we have a < a + 1/n < b, and a + 1/n is still rational.

From this proposition, it seems that the rational numbers do not have any large holes,
and this may lead us to believe that the rational numbers are complete, but this is incorrect.
Indeed, the rational numbers do not form a complete metric space.

Proposition 51. There is no x ∈ Q such that x² = 2, but there is a Cauchy sequence {x_n}
in Q such that x_n² → 2.
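A concrete such sequence comes from Newton's method applied to x² − 2, which maps rationals to rationals. The sketch below (exact rational arithmetic via Python's `fractions`; an illustration, not part of the notes) shows the squares rushing toward 2 while every term stays in Q:

```python
from fractions import Fraction

# Newton iteration x -> (x + 2/x)/2 for x^2 = 2 never leaves Q:
# each iterate is rational and x_n^2 -> 2, yet the limit sqrt(2) is not in Q.
x = Fraction(1)
for _ in range(4):
    x = (x + 2 / x) / 2
    print(x, float(x * x))  # squares: 2.25, 2.0069..., 2.000006..., 2.0000000000045...
```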

This proposition ruins completeness. Define f : Q ! Q by f (x) = x2 for x 2 Q. We see


that f (0) = 0 and f (2) = 4, so intuitively because f cannot jump over points, we should be
able to find x 2 Q such that f (x) = 2, but we’ve just asserted that this is impossible. Thus
we conclude that either f is discontinuous because it jumps over a point (but it is easy to
show that f is indeed continuous), or that Q is missing some points. It is the latter that
is true. In the terminology of metric spaces, this shows that Q is not complete. However,
for any metric space, we can define a notion of the completion of the space; that is, we can
define a new metric space by adding some points, which is compatible with the original but
is complete. This is how we define R. That is, the real numbers are defined as those numbers

which can be realized as limit points of Cauchy sequences of rational numbers. Thus any
rational is real, but the numbers x such that x2 = 2 are real without being rational. Though
this is essentially the definition of R, we state this as a proposition.

Proposition 52. Any real number is a limit of rational numbers. That is, Q is dense in
R, and thus any open set in R contains rational numbers. Furthermore R is complete under
the metric d(x, y) = |x − y|, x, y ∈ R.

Definition 53 (Irrational Numbers). The irrational numbers are defined to be those which are real
but not rational; that is, the irrational numbers are given by R \ Q = {x ∈ R : x ∉ Q}.

Thus, for example, the solutions to x² = 2 are irrational, and of course, since f : [0, ∞) →
[0, ∞) defined by f(x) = x² for x ∈ [0, ∞) is bijective, we can define an inverse map; once
we have done so, we denote the solutions to x² = 2 by √2 and −√2. Other common irrational
numbers are π and e.
We asserted before that Q is a countable set. It is reasonable to ask if R is still countable
since all numbers in R are simply limits of numbers in Q.

Proposition 54. The set R of real numbers is not countable, and thus R\Q is not countable
either.

To prove this, one can use another type of diagonalization argument. If we assume that
the numbers between 0 and 1 are countable, then we can list their decimal representations,
but then it is not difficult to explicitly construct a number between 0 and 1 which is not
accounted for in the list. In this sense Q is much smaller than R, but it still manages to be
dense in R. While we know there are holes in Q, we might still wonder about the general
structure of Q within R.
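The diagonal construction itself is mechanical enough to sketch in code (an illustration, not part of the notes): given digit strings for the listed numbers, change the n-th digit of the n-th number, avoiding 0 and 9 to dodge expansions like 0.0999... = 0.1000...

```python
def cantor_diagonal(digit_strings):
    """Return a decimal in (0,1) differing from the n-th listed number
    in its n-th digit, so it cannot appear anywhere in the list."""
    out = []
    for n, s in enumerate(digit_strings):
        out.append('5' if s[n] != '5' else '6')  # always differs from s[n]
    return '0.' + ''.join(out)

sample = ['1415926', '7182818', '4142135']  # digits of three listed numbers
missing = cantor_diagonal(sample)
print(missing)  # differs from each listed number in some digit
```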

Proposition 55. Q is disconnected in R.


Indeed, we see that Q ⊂ (−∞, √2) ∪ (√2, ∞), which shows that Q is contained in two
disjoint open sets. We can make this even stronger.

Proposition 56. The irrational numbers R \ Q are dense in R.

Thus in between any two rationals, we can find an irrational, and we can use this to show
that if a set of rational numbers has two points, then it cannot be connected; that is, Q is
totally disconnected as a subset of R.
The example that √2 ∉ Q displays something troubling about the structure of Q. Con-
sider the set A = {x ∈ Q : x² < 2}. It is easy to see that this set is bounded (the elements of
this set do not get arbitrarily large; for example, for x ∈ A, we certainly have x < 5), but
there is no tight upper bound in Q. That is, for any rational number q such that x < q for all
x ∈ A, one can find a smaller rational number p < q such that x < p for all x ∈ A. In short:
there is no least upper bound for this set in Q. This is a feature which is fixed by moving to R.

Definition 57 (Supremum & Infimum). Suppose that A ⊂ R. The supremum of A
is defined to be the least upper bound of A. That is, the supremum is the number S ∈ R
(if such a number exists) such that x ≤ S for all x ∈ A, and if x ≤ M for all x ∈ A, then
S ≤ M. Likewise, the infimum of the set A is defined to be the greatest lower bound of
A. That is, the infimum is the number I ∈ R (if such a number exists) such that x ≥ I for
all x ∈ A, and if x ≥ m for all x ∈ A, then I ≥ m. When such numbers S, I exist, we write
S = sup(A) and I = inf(A).

Definition 58 (Bounded Set). A set A ⇢ R is called bounded, if there is M > 0 such


that |x|  M for all x 2 A.

Proposition 59. Every non-empty bounded set in R has a finite supremum and infimum.
(Indeed, if we allow the supremum or infimum to take the values ±∞, then every non-empty
set in R has a supremum and infimum.)

This is one more way in which we have “filled in the holes” when moving from Q to R.

We want to further study the analytical and topological properties of R. We have already
remarked that R is complete and thus every Cauchy sequence in R has a limit in R. As we
said before, one advantage of this is that when testing if a sequence has a limit, we do not
need to identify a candidate for the limit to prove convergence. We would like other such
tests to characterize when limits in R exist.

Theorem 60 (Monotone Convergence Theorem). Suppose that {xn } is a sequence


in R which is increasing (that is, xn  xn+1 for all n 2 N) and bounded above (that is, there
exists M > 0 such that xn  M for all n 2 N). Then {xn } converges (and in fact, {xn } will
converge to the least M such that xn  M for all n 2 N). Likewise if {xn } is decreasing and
bounded below then it converges.
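For instance, x_n = (1 + 1/n)ⁿ is increasing and bounded above by 3, so the theorem guarantees a limit (which turns out to be e) without our ever producing the limit in advance. A quick numerical check (an illustration, not part of the notes):

```python
import math

# x_n = (1 + 1/n)^n: increasing and bounded above, hence convergent
# by the Monotone Convergence Theorem (the limit is e).
xs = [(1 + 1 / n) ** n for n in range(1, 2001)]
assert all(a < b for a, b in zip(xs, xs[1:]))  # monotonically increasing
assert all(x < 3 for x in xs)                  # bounded above by 3
print(xs[-1])  # creeps up toward e = 2.71828...
```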

Loosely speaking, there are two possible ways for a sequence not to converge. It could
either blow up to ±∞, as with the sequence x_n = n², or it could oscillate between certain
values, as with the sequence x_n = (−1)ⁿ. In the first case, no matter how we look at it, the
sequence will always diverge, but in the latter case, if we look only along the even terms
x_{2n} = (−1)^{2n} = 1, then we have a stable sequence which converges. The succeeding defini-
tions and theorems deal with this situation.

Definition 61 (Subsequence). Suppose that {x_n}_{n=1}^∞ is a sequence in R. A subsequence
is a sequence {x_{n_k}}_{k=1}^∞ such that n_k < n_{k+1} for all k ∈ N. Thus {x_{n_k}} ⊂ {x_n}.

Theorem 62 (Convergence Along Subsequences). Suppose that {xn } is a sequence


in R such that xn ! x 2 R as n ! 1. Then every subsequence {xnk } will also satisfy
xnk ! x as k ! 1.

Theorem 63 (Bolzano-Weierstrass Theorem). Every bounded sequence has a con-


vergent subsequence. That is, suppose that {x_n} is a sequence in R and there is M > 0 such

that |x_n| ≤ M for all n ∈ N. Then there exists some subsequence {x_{n_k}} which converges.

Another way to state the above theorem is that if the sequence {xn } resides in the
bounded set (a, b), then along a subsequence we can find a limit. If instead we consider the
closed set [a, b], then this set contains its limit points and so the limit must also lie in [a, b].
With this is mind, we state a theorem characterizing compact sets in R.

Theorem 64 (Heine-Borel Theorem). A subset of R is compact (in the sense of open


covers) i↵ it is closed and bounded.

Thus compact sets are precisely those which contain all their limit points and are not
too large in either direction. With this in mind, the prototypical compact sets in R are the
closed intervals [a, b] where a, b 2 R, a < b. However, this theorem applies more generally
in the metric topology on Rn . In light of the Bolzano-Weierstrass Theorem, we can add
another equivalent statement.

Proposition 65. Suppose that C ⊂ R. Then the following are equivalent:

• C is compact,

• C is closed and bounded,

• every sequence in C has a subsequence converging to a point in C.

Sets satisfying the third property are called sequentially compact and this theorem tells
us that in R, sets are compact i↵ they are sequentially compact.
Now a valid question is: why is compactness an important property? The definition of
compactness is somewhat opaque, but compactness allows you to narrow your focus from
infinitely many things to finitely many things. This especially comes in handy when dealing
with functions. Indeed, we will move from here to discussing functions on R.

Definition 66 (Bounded Function). Suppose that f : A ! R where A ⇢ R. We say


that f is bounded iff there is M > 0 such that |f(x)| ≤ M for all x ∈ A. That is, f is
bounded if the image {f (x) : x 2 A} is a bounded set in R.

Proposition 67. Continuous functions from compact sets into R are bounded, and achieve
maximum and minimum values. Specifically, suppose that C ⇢ R, C is compact and
f : C ! R is continuous. Then there are xmin , xmax 2 C such that f (xmin )  f (x)  f (xmax )
for all x 2 C. [Note: this is not only asserting that f remains trapped between two extreme
values, it is also asserting the existence of xmin and xmax where f meets those extreme values.]

How does compactness come into play here? Consider, if f is continuous then the sets
U_n = {x ∈ C : f(x) < n} for n ∈ N are open, since they are the pre-images of the open
sets (−∞, n). Also, since f(x) ∈ R for all x ∈ C, for each x ∈ C we can find n ∈ N such
that x 2 Un . This shows that {Un } is an open cover of C. If C is compact, there is a
finite subcover Un1 , · · · , UnK . However, these sets are nested, so this shows that C ⇢ UN

where N = max{n1 , . . . , nK }. Then for all x 2 C, we have f (x) < N , which shows that f
is bounded from above. What has happened here? We began with infinitely many di↵erent
bounds f (x) < n for n 2 N each of which may have applied at di↵erent portions of C.
Using compactness we were able to pare this down to a finite number of bounds, and then
simply pick the largest one. Making a similar argument, we can get a lower bound, and
thus the range {f (x) : x 2 C} is bounded. Since the set is bounded, it has an infimum
and supremum. The theorem also asserts that f will meet these values. How do we find the
point that meets the supremum? We can construct a maximizing sequence {x_n} such that

f(x_n) → sup_{x∈C} f(x) = sup{f(x) : x ∈ C}.

By sequential compactness, the sequence has a subsequence {xnk } with a limit xmax 2 C
and by continuity, we will have f (xnk ) ! f (xmax ) and f (xmax ) = supx2C f (x). Thus while
compactness helps us arrive at the bound on f , sequential compactness helps us actually
find the point where f achieves the bound.
Above we defined not only continuity but also uniform continuity and Lipschitz conti-
nuity. We would like easy ways to identify which functions satisfy these stronger properties
and this is somewhere where compactness can help a bit as well.

Proposition 68. Continuous functions on compact sets are uniformly continuous. That is,
suppose that C ⇢ R and f : C ! R is continuous. If C is compact, then f is uniformly
continuous.

Again, we should observe how compactness comes into play. Fix ε > 0. Recall, if f is
continuous at each point x ∈ C, then for each individual point we can find δ_x > 0 such that
for y ∈ C, |x − y| < δ_x ⟹ |f(x) − f(y)| < ε. Here we have (possibly) infinitely many
different δ_x > 0, but if we want to satisfy the definition of uniform continuity, we need to
have a single δ > 0. If the number δ_x > 0 works in the definition of continuity at x ∈ C, then
any smaller number 0 < δ′ < δ_x will also work. Thus one idea is to take the minimum over
all such δ_x > 0, and this minimal δ will work for all x ∈ C. However, the set {δ_x}_{x∈C} may not
have a minimum and its infimum may be zero, so this doesn't quite work. But we note that
the sets (x − δ_x, x + δ_x) form an open cover of C. If C is compact, we can extract a finite
subcover (x_1 − δ_{x_1}, x_1 + δ_{x_1}), . . . , (x_K − δ_{x_K}, x_K + δ_{x_K}) which still covers all of C. Now there
are only finitely many δ_{x_k} to choose from; choosing the minimum δ = min{δ_{x_1}, . . . , δ_{x_K}} will
provide a δ > 0 which works uniformly over all x ∈ C, proving that f is uniformly contin-
uous. Again, we started with an infinite collection, and compactness allowed us to pare it
down to a finite collection.

Lipschitz continuity can also be easier to identify via compactness but in a slightly more
complicated way. First, recall a few definitions and theorems from calculus (for more expo-
sition regarding the calculus topic, one can look back to the calculus notes).

Definition 69 (Lipschitz Continuity). Suppose that f : R ! R. We call f Lipschitz


continuous if there is a constant L > 0 such that for all x, y ∈ R, |f(x) − f(y)| ≤ L|x − y|.

Definition 70 (Differentiability). Suppose that f : R → R and x ∈ R. We say that f
is differentiable at x if the limit

lim_{h→0} [f(x + h) − f(x)] / h

exists. In this case we call the limit f′(x). We say that f is differentiable in a set A ⊂ R if
f is differentiable at all x ∈ A.

Theorem 71 (Mean Value Theorem). Suppose that f : R ! R is di↵erentiable in R.


Then for any a, b ∈ R, a < b, there is c ∈ (a, b) such that

[f(b) − f(a)] / (b − a) = f′(c).

This, in turn, implies that |f(b) − f(a)| = |f′(c)| |b − a|.

Note the similarity between the last statement and the definition of Lipschitz continuity.
There is a very formal similarity that hints at some connection of the form |f′(c)| ∼ L. In-
deed, we can make this precise.

Proposition 72. Suppose that f : R ! R is di↵erentiable on R. If the derivative f 0 : R ! R


is continuous, then f is Lipschitz continuous on any compact subset of R. If the derivative
f 0 is bounded, then f is Lipschitz continuous on all of R. If f 0 is unbounded on some subset
of R, then f is not Lipschitz continuous on that subset.
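A quick numerical sanity check (an illustration, not part of the notes): for f = sin, the derivative cos is bounded by 1, so Proposition 72 predicts Lipschitz constant 1, and sampled difference quotients never exceed that bound:

```python
import math

# Estimate the Lipschitz constant of sin on [-3, 3] from difference
# quotients; the Mean Value Theorem caps every quotient by sup|cos| = 1.
pts = [i / 20 for i in range(-60, 61)]
L_est = max(abs(math.sin(a) - math.sin(b)) / abs(a - b)
            for a in pts for b in pts if a != b)
print(L_est)  # just under 1
```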

This gives a rough equivalence between Lipschitz continuous functions and continuously
differentiable functions. However, based on this, continuous differentiability still seems a bit
stronger than Lipschitz continuity and indeed, it is. Take, for example, the function f(x) = |x|
for x ∈ R. This function is Lipschitz continuous with Lipschitz constant 1 because of the
reverse triangle inequality:

|f(x) − f(y)| = ||x| − |y|| ≤ |x − y|.

However, this function is not differentiable on all of R. [In fact, a famous theorem states
that a function is Lipschitz continuous iff it is differentiable almost everywhere²
and the a.e. derivative is essentially bounded.]

Finally, we discuss sequences of functions and the interplay between sequences of func-
tions and the Riemann integral. Indeed, one of the reasons that the Lebesgue integral and
the field of measure theory came about was because the Riemann integral does not play nice
with sequences of functions, as we will see shortly.

Definition 73 (Pointwise Convergence). Let A ⇢ R and let {fn } be a sequence of


functions fn : A ! R. We say that the sequence {fn } converges pointwise to a function
² Here "almost everywhere" has a technical meaning.

f : A → R iff for all x ∈ A, the sequence {f_n(x)} in R converges to f(x). That is, {f_n}
converges pointwise to f iff for every x ∈ A and ε > 0, there is N = N(x, ε) ∈ N such that
|f_n(x) − f(x)| < ε for all n ≥ N.

Example 74. Consider the sequences f_n, g_n : [−1, 1] → R given by

f_n(x) = |x|ⁿ and g_n(x) = √(x² + 1/n) for x ∈ [−1, 1].

Note that

f_n(0) = 0, f_n(±1) = 1, for all n ∈ N.

If x ∈ [−1, 1] \ {−1, 0, 1}, then log|x| < 0 and so

lim_{n→∞} f_n(x) = lim_{n→∞} |x|ⁿ = lim_{n→∞} e^{n log|x|} = 0.

Thus f_n converges pointwise to the function f : [−1, 1] → R defined by

f(x) = 1 for x = −1, 1, and f(x) = 0 for x ∈ (−1, 1).

Next for g_n, note that for all x ∈ [−1, 1],

|x| = √(x²) ≤ √(x² + 1/n) = g_n(x).

Conversely, if a, b ≥ 0 then √(a² + b²) ≤ a + b (one can easily verify this inequality by squaring
both sides), and so

g_n(x) = √(x² + 1/n) ≤ |x| + 1/√n.

Combining the inequalities and subtracting |x|, we see

0 ≤ g_n(x) − |x| ≤ 1/√n, x ∈ [−1, 1].

Thus for all x ∈ [−1, 1], we have

lim_{n→∞} g_n(x) = |x|

and so g_n converges pointwise to the absolute value function on [−1, 1].

There are two interesting differences to point out between these examples. In the first
example, we had a sequence of continuous functions which converged pointwise to a discon-
tinuous function, which is somewhat disconcerting (this is similar to before when we had
a sequence of rationals converging to an irrational; morally, this indicates that continuous
functions are "incomplete with respect to pointwise limits"). The other difference is that
proving the convergence of f_n required special consideration for different values of x, whereas
proving the convergence for g_n did not. To address both of these, we introduce a stronger
notion of convergence.

Definition 75 (Uniform Convergence). Let A ⇢ R and let {fn } be a sequence of


functions f_n : A → R. We say that the sequence {f_n} converges uniformly to a function
f : A → R iff for every ε > 0, there is N = N(ε) ∈ N such that |f_n(x) − f(x)| < ε for all
x ∈ A and n ≥ N.
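The two sequences of Example 74 illustrate the difference. Measuring the worst-case error sup|f_n − f| over a grid (a numeric sketch of my own, not part of the notes): for g_n the sup-error is exactly 1/√n, attained at x = 0, so the convergence is uniform; for f_n the true sup-error over (−1, 1) is 1 for every n, and even on a fixed grid it decays only very slowly.

```python
import math

grid = [i / 1000 for i in range(-1000, 1001)]  # grid on [-1, 1]

def sup_err_f(n):
    # f_n(x) = |x|^n minus its pointwise limit (1 at x = +-1, else 0)
    return max(abs(abs(x) ** n - (1.0 if abs(x) == 1.0 else 0.0)) for x in grid)

def sup_err_g(n):
    # g_n(x) = sqrt(x^2 + 1/n) minus its pointwise limit |x|
    return max(math.sqrt(x * x + 1 / n) - abs(x) for x in grid)

print([sup_err_f(n) for n in (5, 50, 500)])  # still above 0.6 at n = 500
print([sup_err_g(n) for n in (5, 50, 500)])  # 1/sqrt(n): 0.447, 0.141, 0.045
```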

Again, at first glance this looks roughly the same as pointwise convergence, but the word
“uniform” indicates that the same N in the definition of convergence works for all x in the
set. Because there are two di↵erent modes of convergence, when considering functions the
statement fn ! f is vague, and one should always specify what type of convergence is be-
ing proven/assumed (there are many other types of convergence as well; these are typically
addressed in a first course on measure theory). Uniform convergence is important for the
following reason.

Proposition 76. A uniform limit of continuous functions is continuous. That is, if A ⇢ R


and {f_n} is a sequence of continuous functions f_n : A → R such that f_n converges to f : A → R
uniformly, then f is also continuous.

Morally, this proposition states that continuous functions are “complete with respect to
uniform convergence.” We would like to place some topological structure on sets of continu-
ous functions to make this more rigorous.

Definition 77 (Continuous Functions as a Vector Space). Suppose that A ⇢ R.


Define the set C(A) = {f : A → R | f is continuous}. This set becomes a vector space over
R under pointwise addition and pointwise scalar multiplication. That is, if f, g ∈ C(A) and
α ∈ R, define (f + αg) : A → R by

(f + αg)(x) = f(x) + αg(x), x ∈ A.

Then (f + αg) ∈ C(A).

Definition 78 (Norm/Metric on Continuous Functions). Suppose that A ⇢ R is


compact. Define the map ‖·‖ : C(A) → [0, ∞) by

‖f‖ = sup_{x∈A} |f(x)|, f ∈ C(A).

This map defines a norm on C(A). Thus d : C(A) × C(A) → [0, ∞) defined by

d(f, g) = ‖f − g‖, f, g ∈ C(A)

is a metric on C(A).

Using Proposition 76 one can prove that for A ⇢ R compact, the metric space (C(A), d)
with d defined as in Definition 78 is a complete metric space (indeed convergence in this
metric space is precisely uniform convergence). This fact is very important in di↵erential
equations, where one uses Picard iteration to prove existence and uniqueness of solutions to
certain equations. There are several more theorems regarding the structure of C(A), focusing, for

example, on identifying sets which are compact in the metric topology or identifying the
continuous dual space of C(A). These are beyond the scope of these notes.

Lastly, we want to examine the interplay between the Riemann integral and convergence
of sequences of functions. We will not recall the definition of the Riemann integral here (see
the notes on Calculus I), except to remind the reader that it is essentially “area under the
curve” and can be computed using the Fundamental Theorem of Calculus. With this in
mind, it is reasonable to think that if fn converges to f (in some sense) then the limit of the
Riemann integrals of fn should converge to the Riemann integral of f . However, this is not
always the case.

Example 79. We'll do two examples here where we have functions f_n → f (in some sense)
but the limit of the integral is not the integral of the limit. This will show that in many
cases ∫_a^b f_n(x)dx ↛ ∫_a^b f(x)dx even when f_n → f. For the first example, consider
f_n : [0, 1] → R defined by

f_n(x) = 4n²x for 0 ≤ x < 1/(2n),
f_n(x) = 4n²(1/n − x) for 1/(2n) ≤ x < 1/n,
f_n(x) = 0 for 1/n ≤ x ≤ 1.

[Figure 3: Example 79, f_n(x).]

This is graphed in Figure 3. Note that for all n ∈ N, f_n(0) = 0, and for any x ∈ (0, 1],
we have f_n(x) = 0 for all n ≥ ⌈1/x⌉. This shows that f_n(x) → 0 for all x ∈ [0, 1] and
we conclude that f_n converges pointwise to the zero function f ≡ 0. However, calcu-
lating the area under the curve, we see that ∫_0^1 f_n(x)dx = (1/2) · (1/n) · 2n = 1 for all
n ∈ N, and so we have a situation where

1 = lim_{n→∞} ∫_0^1 f_n(x)dx ≠ ∫_0^1 f(x)dx = 0.

[Note: this (and similar examples) is often referred to as "vertical escape to ∞"; the mass
under the curve vanished as n → ∞ because the functions got very large on a very small set.]
For a second example, consider the functions f_n : [0, ∞) → R defined by

f_n(x) = 0 for 0 ≤ x < n,
f_n(x) = 1/n for n ≤ x ≤ 2n,
f_n(x) = 0 for 2n < x < ∞.

Here we see that 0 ≤ f_n(x) ≤ 1/n for all x ∈ [0, ∞) and all n ∈ N, from which it readily
follows that f_n converges uniformly to the zero function f ≡ 0. However, calculating the
integral, we find that ∫_0^∞ f_n(x)dx = (1/n) · (2n − n) = 1 and so once again, we have

1 = lim_{n→∞} ∫_0^∞ f_n(x)dx ≠ ∫_0^∞ f(x)dx = 0.

[Note: this (and similar examples) is referred to as "horizontal escape to ∞"; the mass under
the curve vanished as n → ∞ because the support of the functions went to ∞.]
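The vertical escape can be watched numerically. For the triangular bump of the first example, a midpoint Riemann sum confirms that the area stays pinned at 1 while the function values at any fixed point tend to 0 (a sketch of my own, not part of the notes):

```python
def f(n, x):
    # Triangular bump of Example 79: height 2n on a base of width 1/n.
    if x < 1 / (2 * n):
        return 4 * n * n * x
    if x < 1 / n:
        return 4 * n * n * (1 / n - x)
    return 0.0

def riemann(n, steps=100000):
    # Midpoint Riemann sum of f_n over [0, 1].
    h = 1 / steps
    return sum(f(n, (k + 0.5) * h) for k in range(steps)) * h

print([round(riemann(n), 4) for n in (1, 10, 100)])  # the integral is 1 for every n
print([f(n, 0.3) for n in (1, 10, 100)])             # yet f_n(0.3) -> 0
```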
Both of these display cases where we may have ∫_a^b f_n(x)dx ↛ ∫_a^b f(x)dx. What was the
key feature of each example? In the first example, the convergence was non-uniform; in
the second example, the domain was non-compact. If both of these are rectified, then we can
guarantee that limits of integrals are integrals of limits.

Proposition 80. Suppose that a, b ∈ R and a sequence f_n : [a, b] → R converges uniformly
to f : [a, b] → R. Then

lim_{n→∞} ∫_a^b f_n(x)dx = ∫_a^b f(x)dx.

This proposition shows that the map f ↦ ∫_a^b f(x)dx is a continuous functional on
C([a, b]). One final note: the conditions of Proposition 80 are sufficient but they are
not necessary. Going back to the example f_n(x) = |x|ⁿ for x ∈ (−1, 1) and n ∈ N, we note
that f_n converges pointwise to the zero function f ≡ 0 and indeed we also have

∫_{−1}^{1} |x|ⁿ dx = 2/(n + 1) → 0 = ∫_{−1}^{1} f(x)dx,

so in this case uniform convergence and a compact domain were not necessary. This was
one of the motivating factors for developing measure theory and the Lebesgue integral: in
order to find less stringent conditions on the behavior of f_n while still guaranteeing that
∫_X f_n(x)dx → ∫_X f(x)dx when f_n → f. There is one theorem that is particularly helpful for
this. Because we do not have the machinery of measure theory, we cannot state the theorem
in full generality but we will state a particular case.

Theorem 81 (Lebesgue Dominated Convergence Theorem (Baby Version)). Sup-
pose that f_n : [a, b] → R is a sequence of continuous functions converging pointwise to
a Riemann integrable function f : [a, b] → R. Further suppose that there is a constant
M > 0 such that |f_n(x)| ≤ M for all n ∈ N and all x ∈ [a, b]. Then

lim_{n→∞} ∫_a^b f_n(x)dx = ∫_a^b f(x)dx.

This version of the theorem essentially says that so long as there is no possibility of
"vertical/horizontal escape to ∞" as in Example 79, we will have the desired behavior for
the limit of the integrals.
Math 94 Professor: Padraic Bartlett

Lecture 1: Limits, Sequences and Series


Week 1 UCSB 2015

This is the first week of the Mathematics Subject Test GRE prep course! We start by
reviewing the concept of a limit, with an eye for how it applies to sequences, series and
functions. There are more examples here than we had time for in class, so don’t worry if
you don’t recognize everything here!

1 Sequences: Definitions and Tools


Definition. A sequence of real numbers is a collection of real numbers {a_n}_{n=1}^∞ indexed
by the natural numbers.
by the natural numbers.

Definition. A sequence {a_n}_{n=1}^∞ is called bounded if there is some value B ∈ R such that
|a_n| < B for every n ∈ N. Similarly, we say that a sequence is bounded above if there is
some value U such that a_n ≤ U for all n, and say that a sequence is bounded below if there is
some value L such that a_n ≥ L for all n.

Definition. A sequence {a_n}_{n=1}^∞ is said to be monotonically increasing if a_n ≤ a_{n+1}
for every n ∈ N; conversely, a sequence is called monotonically decreasing if a_n ≥ a_{n+1}
for every n ∈ N.

Definition. Take a sequence {a_n}_{n=1}^∞. A subsequence of {a_n}_{n=1}^∞ is a sequence that we
can create from the {a_n}_{n=1}^∞'s by deleting some elements (making sure to still leave infinitely
many elements left), without changing the order of the remaining elements.
For example, if {a_n}_{n=1}^∞ is the sequence
For example, if {an }1n=1 is the sequence

0, 1, 0, 1, 0, 1, 0, 1, 0, 1, . . . ,

the sequences 0, 0, 0, 0, 0, . . . and 1, 1, 1, 1, 1, . . . are both subsequences of {a_n}_{n=1}^∞, as is
0, 1, 0, 0, 0, 0, . . ., and many others.

Definition. A sequence {a_n}_{n=1}^∞ converges to some value ℓ if the a_n's "go to ℓ" at infinity.
To put it more formally, lim_{n→∞} a_n = ℓ iff for any distance ε, there is some cutoff point N
such that for any n greater than this cutoff point, a_n must be within ε of our limit ℓ.
In symbols:

lim_{n→∞} a_n = ℓ iff (∀ε > 0)(∃N)(∀n > N) |a_n − ℓ| < ε.

Convergence is one of the most useful properties of sequences! If you know that a
sequence converges to some value ℓ, you know, in a sense, where the sequence is "going,"
and furthermore know where almost all of its values are going to be (specifically, close to
ℓ).
Because convergence is so useful, we’ve developed a number of tools for determining
where a sequence is converging to:

1.1 Sequences: Convergence Tools
1. The definition of convergence: The simplest way to show that a sequence con-
verges is sometimes just to use the definition of convergence. In other words, you
want to show that for any distance ε, you can eventually force the a_n's to be within
ε of our limit, for n sufficiently large.
How can we do this? One method I’m fond of is the following approach:

• First, examine the quantity |a_n − L|, and try to come up with a very simple
upper bound that depends on n and goes to zero. Example bounds we'd love to
run into: 1/n, 1/n², 1/log(log(n)).
• Using this simple upper bound, given ε > 0, determine a value of N such that
whenever n > N, our simple bound is less than ε. This is usually pretty easy:
because these simple bounds go to 0 as n gets large, there's always some value
of N such that for any n > N, these simple bounds are as small as we want.
• Combine the two above results to show that for any ε, you can find a cutoff point
N such that for any n > N, |a_n − L| < ε.
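As a worked instance of this recipe (my own illustration, not from the notes): for a_n = n/(n+1) with limit 1, the simple bound is |a_n − 1| = 1/(n+1), and N = ⌈1/ε⌉ works, since n > N forces 1/(n+1) < ε.

```python
import math

# For a_n = n/(n+1): |a_n - 1| = 1/(n+1) < eps whenever n > N = ceil(1/eps).
def N_for(eps):
    return math.ceil(1 / eps)

for eps in (0.1, 0.01, 0.001):
    N = N_for(eps)
    # spot-check the definition for a stretch of n beyond the cutoff
    assert all(abs(n / (n + 1) - 1) < eps for n in range(N + 1, N + 500))
print(N_for(0.01))  # 100
```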

That said: if you find yourself needing to resort to the ε–N definition for the limit
of a sequence on the GRE test, something has likely gone wrong. Far more useful are
results like the following:

2. Arithmetic and sequences: These tools let you combine previously-studied results
to get new ones. Specifically, we have the following results:

• Additivity of sequences: if lim_{n→∞} a_n, lim_{n→∞} b_n both exist, then
  lim_{n→∞} (a_n + b_n) = (lim_{n→∞} a_n) + (lim_{n→∞} b_n).
• Multiplicativity of sequences: if lim_{n→∞} a_n, lim_{n→∞} b_n both exist, then
  lim_{n→∞} (a_n · b_n) = (lim_{n→∞} a_n) · (lim_{n→∞} b_n).
• Quotients of sequences: if lim_{n→∞} a_n, lim_{n→∞} b_n both exist, b_n ≠ 0 for all
  n, and lim_{n→∞} b_n ≠ 0, then lim_{n→∞} (a_n/b_n) = (lim_{n→∞} a_n)/(lim_{n→∞} b_n).

3. Monotone and bounded sequences: if the sequence {a_n}_{n=1}^∞ is bounded above and
nondecreasing, then it converges; similarly, if it is bounded below and nonincreasing,
it also converges. If a sequence is monotone, this is usually the easiest way to prove
that your sequence converges, as both monotone and bounded are “easy” properties
to work with. One interesting facet of this property is that it can tell you that a
sequence converges without necessarily telling you what it converges to! So, it’s often
of particular use in situations where you just want to show something converges, but
don’t actually know where it converges to.

4. Subsequences and convergence: if a sequence {a_n}_{n=1}^∞ converges to some value
L, all of its subsequences must also converge to L.
One particularly useful consequence of this theorem is the following: suppose a se-
quence {a_n}_{n=1}^∞ has two distinct subsequences {b_n}_{n=1}^∞, {c_n}_{n=1}^∞ that converge to
different limits. Then the original sequence cannot converge! This is one of the few

tools that you can use to directly show that something diverges, and as such is pretty
useful.

5. Squeeze theorem for sequences: if lim_{n→∞} a_n, lim_{n→∞} b_n both exist and are equal
to some value l, and the sequence {c_n}_{n=1}^∞ is such that a_n ≤ c_n ≤ b_n for all n, then the
limit limn!1 cn exists and is also equal to l. This is particularly useful for sequences
with things like sin(horrible things) in them, as it allows you to “ignore” bounded bits
that aren’t changing where the sequence goes.

6. Cauchy sequences: We say that a sequence is Cauchy if and only if for every ε > 0
there is a natural number N such that for every m > n ≥ N, we have

|a_m − a_n| < ε.

You can think of this condition as saying that Cauchy sequences “settle down” in the
limit – i.e. that if you look at points far along enough on a Cauchy sequence, they all
get fairly close to each other.
The Cauchy theorem, in this situation, is the following: a sequence of real numbers is
Cauchy if and only if it converges.
Much like the ε–N definition, if you find yourself showing something is Cauchy
to show it converges, you have probably made a mistake in your choice of methods.
That said, sometimes definition-centric questions will crop up that amount to “do you
remember this concept;” Cauchy is certainly one such concept you could be asked to
recall.

2 Sequences: Examples of Convergence Tools


In this section, we work some examples of these tools.

Claim. (Definition of convergence example:)

lim_{n→∞} (√(n+1) − √n) = 0.

Proof. When we discussed the definition as a convergence tool, we talked about a "blueprint"
for how to go about proving convergence from the definition: (1) start with |a_n − L|, (2)
try to find a simple upper bound on this quantity depending on n, and (3) use this simple
bound to find for any ε a value of N such that whenever n > N, we have

|a_n − L| < (simple upper bound) < ε.

Let's try this! Specifically, examine the quantity |√(n+1) − √n − 0|:

|√(n+1) − √n − 0| = √(n+1) − √n
                  = (√(n+1) − √n)(√(n+1) + √n) / (√(n+1) + √n)
                  = ((n+1) − n) / (√(n+1) + √n)
                  = 1 / (√(n+1) + √n)
                  < 1/√n.

All we did here was hit our |a_n − L| quantity with a ton of algebra, and kept trying
things until we got something simple. The specifics aren't as important as the idea here:
just start with the |a_n − L| bit, and try everything until it's bounded by something simple
and small!
In our specific case, we've acquired the upper bound 1/√n, which looks rather simple: so
let's see if we can use it to find a value of N.
Take any ε > 0. If we want to make our simple bound 1/√n < ε, this is equivalent to
making 1/ε < √n, i.e. 1/ε² < n. So, if we pick N > 1/ε², we know that whenever n > N, we have
n > 1/ε², and therefore that our simple bound is < ε. But this is exactly what we wanted!
In particular, for any ε > 0, we've found an N such that for any n > N, we have

|√(n+1) − √n − 0| < 1/√n < 1/√N < ε,

which is the definition of convergence. So we've proven that lim_{n→∞} (√(n+1) − √n) = 0.
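If you want to see this blueprint in action numerically, the following Python sketch (not part of the original notes; the names are mine) checks both the simple bound 1/√n and the cutoff N chosen in the proof:

```python
import math

def a(n):
    # the sequence a_n = sqrt(n+1) - sqrt(n), whose limit is L = 0
    return math.sqrt(n + 1) - math.sqrt(n)

# the simple upper bound 1/sqrt(n) really does dominate |a_n - 0|
for n in range(1, 10**6, 997):
    assert abs(a(n)) < 1 / math.sqrt(n)

# given eps, the cutoff N > 1/eps^2 from the proof works
eps = 1e-3
N = int(1 / eps**2) + 1
for n in range(N + 1, N + 1000):
    assert abs(a(n)) < eps
```

Of course, a finite check is not a proof; it is just a quick way to convince yourself that the bound in the argument is not off by a constant.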

Claim. (Monotone and Bounded example:) The sequence

a_1 = 2,
a_{n+1} = √(3a_n − 1)

converges.

Proof. This is a recursively-defined sequence; that is, the terms of this sequence are not
explicitly stated, but rather defined in terms of earlier terms! This is a bit of a headache
for us in terms of determining where this sequence goes. So: let’s not do that yet! Instead,
let’s just try to determine if it goes anywhere at all first; that is, let’s see if we can determine
whether or not it converges!
If we want to show a sequence converges without knowing where it converges to, there
are relatively few tools we have (basically Monotone+Bounded, or Cauchy.) Cauchy is
. . . not very pleasant-looking, so let’s see if this is monotone and bounded!

From inspection (a_1 = 2, a_2 = √5, a_3 = √(3√5 − 1) > √5, . . .) with a calculator, it seems
like our terms are increasing — that is, a_{n+1} ≥ a_n for all n! We can prove this formally by
induction:
Base case: a_2 = √(3 · 2 − 1) = √5 > √4 = 2 = a_1.
Inductive step: assume that a_{n+1} ≥ a_n; we will use this assumption to prove that
a_{n+2} ≥ a_{n+1}. To do this, simply look at a_{n+2}. By definition + our inductive hypothesis,
we have

a_{n+2} = √(3a_{n+1} − 1) ≥ √(3a_n − 1) = a_{n+1},

and have thus proven our claim!


Bounded is not too hard, as well: if you plug a bunch of values here into a calculator,
you'll see that everything here looks like it is some value less than 3! This suggests that 3
might be an upper bound, which we can again show by induction:
Base case: a_1 = 2 < 3.
Inductive step: assume that a_n < 3. We seek to prove that a_{n+1} < 3. This is not hard;
again, use the definition + the inductive assumption to see that

a_{n+1} = √(3a_n − 1) < √(3 · 3 − 1) = √8 < 3.

So: by the monotone+bounded theorem from before, a limit exists! How can we find
it?
Well: let's say that our sequence converges to some value L: then we have

lim_{n→∞} a_n = L.

If we use our definition for a_n, we can see that this limit is also

lim_{n→∞} √(3a_{n−1} − 1) = L.

When presented with a limit like this, our first reaction should probably be "square roots
are irritating." Our response, therefore, should be to get rid of them! In other words, let's
square both sides; i.e. by using arithmetic and limits, we get

lim_{n→∞} (√(3a_{n−1} − 1))² = (lim_{n→∞} √(3a_{n−1} − 1)) · (lim_{n→∞} √(3a_{n−1} − 1)) = L · L.

This is nicer! In particular, after squaring, we can manipulate the LHS to get that

3 lim_{n→∞} a_{n−1} = L² + 1.

Because our limit is taken as n goes to infinity, the a_{n−1}'s run through the same tail of the
sequence as the a_n's; in other words, the limit on the left just goes to the same place that
lim_{n→∞} a_n goes to, i.e. L.
Therefore, we actually have

3L = L² + 1;

in other words, L² − 3L + 1 = 0. This is a quadratic equation; we can solve for its two roots
and get

(3 ± √(9 − 4))/2 = (3 ± √5)/2.

Which of these is the limit? Well: if we look at (3 − √5)/2, this number is strictly less than 3/2,
which is in turn less than 2. But our sequence starts at 2 and increases; so this cannot be
the limit! Therefore, we know that our limit L must be the other value above: namely,

(3 + √5)/2 ≈ 2.62.
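We can corroborate this numerically: iterating the recursion quickly settles onto (3 + √5)/2. (A quick sketch; this code is mine, not the notes'.)

```python
import math

a = 2.0                          # a_1 = 2
for _ in range(80):
    a = math.sqrt(3 * a - 1)     # a_{n+1} = sqrt(3 a_n - 1)

L = (3 + math.sqrt(5)) / 2       # the root selected by the proof
assert abs(a - L) < 1e-12        # the iterates have settled onto L
```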

Claim. (Squeeze theorem example:)

lim_{n→∞} sin( n² · π^(e^(n^⋰)) − 12n · n^n ) / n = 0.
Proof. The idea of squeeze theorem examples is that they allow you to get rid of awful-
looking things whenever they aren't materially changing where the sequence is actually
going. Specifically, in our example here, the sin(terrible things) part is awful to work with,
but really isn't doing anything to our sequence: the relevant part is the denominator, which
is going to infinity (and therefore forcing our sequence to go to 0).
Rigorously: we have that

−1 ≤ sin(terrible things) ≤ 1,

no matter what terrible things we've put into the sin function. Dividing the left and right
by n, we have that

−1/n ≤ sin(terrible things)/n ≤ 1/n,

for every n. Then, because lim_{n→∞} (−1/n) = lim_{n→∞} 1/n = 0, the squeeze theorem tells us that

lim_{n→∞} sin( n² · π^(e^(n^⋰)) − 12n · n^n ) / n = 0
as well.
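Numerically the squeeze is easy to watch: only the 1/n envelope matters, no matter what bounded junk sits inside the sine. (The "terrible" argument below is a stand-in I made up, since boundedness is the only property the squeeze uses.)

```python
import math

def c(n):
    # any bounded expression inside sin works; this one is a stand-in
    terrible = n**2 * math.pi + 12 * n * math.log(n + 1)
    return math.sin(terrible) / n

for n in [10, 10**3, 10**6]:
    assert -1 / n <= c(n) <= 1 / n   # squeezed between -1/n and 1/n

assert abs(c(10**9)) < 1e-8          # hence c(n) -> 0
```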

3 Series: Definitions and Tools


We define series as follows:

Definition. A sequence is called summable if the sequence {s_n}_{n=1}^∞ of partial sums

s_n := a_1 + . . . + a_n = Σ_{k=1}^n a_k

converges.
If it does, we then call the limit of this sequence the series associated to {a_n}_{n=1}^∞, and
denote this quantity by writing

Σ_{n=1}^∞ a_n.

We say that a series Σ_{n=1}^∞ a_n converges or diverges if the sequence {Σ_{k=1}^n a_k}_{n=1}^∞ of
partial sums converges or diverges, respectively.
Just like sequences, we have a collection of various tools we can use to study whether a
given series converges or diverges. Here are two such tools:
1. Comparison test: If {a_n}_{n=1}^∞, {b_n}_{n=1}^∞ are a pair of sequences such that 0 ≤ a_n ≤ b_n,
then the following statement is true:

(Σ_{n=1}^∞ b_n converges) ⇒ (Σ_{n=1}^∞ a_n converges).

When to use this test: when you're looking at something fairly complicated that either
(1) you can bound above by something simple that converges, like Σ 1/n², or (2) that
you can bound below by something simple that diverges, like Σ 1/n.
2. Ratio test: If {a_n}_{n=1}^∞ is a sequence of positive numbers such that

lim_{n→∞} a_{n+1}/a_n = r,

then we have the following three possibilities:

• If r < 1, then the series Σ_{n=1}^∞ a_n converges.
• If r > 1, then the series Σ_{n=1}^∞ a_n diverges.
• If r = 1, then we have no idea; it could either converge or diverge.

When to use this test: when you have something that is growing kind of like a geo-
metric series: so when you have terms like 2^n or n!.
3. Integral test: If f(x) is a positive and monotonically decreasing function, then

Σ_{n=N}^∞ f(n) converges if and only if ∫_N^∞ f(x) dx converges.

When to use this test: whenever you have something that looks a lot easier to integrate
than to sum. (In particular, this test instantly proves that Σ_{n=1}^∞ 1/n^c converges for c > 1
and diverges for c ≤ 1; it answers every problem the "p-series test" solves, if that is one
you remember from your calculus/analysis classes!)

4. Alternating series test: If {a_n}_{n=1}^∞ is a sequence of numbers such that

• lim_{n→∞} a_n = 0 monotonically (i.e. the |a_n|'s decrease to 0), and
• the a_n's alternate in sign,

then the series Σ_{n=1}^∞ a_n converges.
When to use this test: when you have an alternating series.

5. Absolute convergence ⇒ convergence: Suppose that {a_n}_{n=1}^∞ is a sequence such
that

Σ_{n=1}^∞ |a_n|

converges. Then the series Σ_{n=1}^∞ a_n also converges.
When to use this test: whenever you have a sequence that has positive and negative
terms, that is not alternating. (Pretty much every other test requires that your
sequence is positive, so you'll often apply this test and then apply one of the other
tests to the series Σ_{n=1}^∞ |a_n|.)
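Since a series is by definition the limit of its partial sums, you can explore these definitions directly; this sketch (mine, not the notes') contrasts the convergent Σ 1/n² with the divergent harmonic series:

```python
import math

def partial_sum(term, n):
    # s_n = a_1 + ... + a_n, the sequence whose limit defines the series
    return sum(term(k) for k in range(1, n + 1))

# sum of 1/n^2 converges (to pi^2/6); its partial sums stabilize
s = partial_sum(lambda k: 1 / k**2, 100000)
assert abs(s - math.pi**2 / 6) < 1e-4

# the harmonic series diverges: its partial sums track log(n) upward
h = partial_sum(lambda k: 1 / k, 100000)
assert h > math.log(100000)
```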

To illustrate how to work with these definitions, we work a collection of examples here:

4 Series: Example Calculations


Claim. (Comparison test example): If {a_n}_{n=1}^∞ is a sequence of positive numbers such that
the series Σ_{n=1}^∞ a_n converges, then the series

Σ_{n=1}^∞ √(a_n)/n

must also converge.


Proof. To see why, simply notice that each individual term √(a_n)/n in this series is just the
geometric mean of a_n and 1/n². Because both of the series Σ_{n=1}^∞ a_n and Σ_{n=1}^∞ 1/n² are conver-
gent, we would expect their average (for any reasonable sense of average) to also converge:
in particular, we would expect to be able to bound its individual terms above by some
combination of the original terms a_n and 1/n²!
In fact, we actually can do this, as illustrated below:

0 ≤ (√(a_n) − 1/n)²
⇒ 0 ≤ a_n − 2√(a_n)/n + 1/n²
⇒ √(a_n)/n ≤ (1/2)(a_n + 1/n²) < a_n + 1/n².

Look at the series Σ_{n=1}^∞ (a_n + 1/n²). Because both of the series

Σ_{n=1}^∞ a_n,   Σ_{n=1}^∞ 1/n²

converge, we can write

Σ_{n=1}^∞ (a_n + 1/n²) = Σ_{n=1}^∞ a_n + Σ_{n=1}^∞ 1/n²,

and therefore notice that this series also converges; therefore, by the comparison test, our
original series Σ_{n=1}^∞ √(a_n)/n must also converge.

Claim. (Ratio test example): The series

Σ_{n=1}^∞ 2^n · n! / n^{n+1}

converges.

Proof. Motivated by the presence of both an n! and a 2^n, we try the ratio test:

a_n / a_{n−1} = (2^n · n! / n^{n+1}) / (2^{n−1} · (n−1)! / (n−1)^n)
             = 2^n · n! · (n−1)^n / (2^{n−1} · (n−1)! · n^{n+1})
             = 2 · n · (n−1)^n / n^{n+1}
             = 2 · (n−1)^n / n^n
             = 2 · ((n−1)/n)^n
             = 2 · (1 − 1/n)^n.

Here, we need one bit of knowledge that you may not have encountered before: the fact
that the limit

lim_{n→∞} (1 + x/n)^n = e^x,

and in particular that

lim_{n→∞} (1 − 1/n)^n = 1/e.

(Historically, I'm pretty certain that this is how e was defined; so feel free to take it
as a definition of e itself.)
Applying this tells us that

lim_{n→∞} a_n/a_{n−1} = lim_{n→∞} 2 · (1 − 1/n)^n = 2 · (1/e) = 2/e,

which is less than 1. So the ratio test tells us that this series converges!
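The computation above can be sanity-checked numerically; working in log-space avoids overflowing 2^n · n!. (A sketch, with my own helper names.)

```python
import math

def term(n):
    # a_n = 2^n * n! / n^(n+1), computed via logs to avoid overflow
    return math.exp(n * math.log(2) + math.lgamma(n + 1) - (n + 1) * math.log(n))

# successive ratios approach 2/e ~ 0.7358 < 1 ...
r = term(1001) / term(1000)
assert abs(r - 2 / math.e) < 1e-3

# ... so the terms decay essentially geometrically
assert term(200) < term(100) * 0.75**100
```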

Claim. (Alternating series test): The series

Σ_{n=1}^∞ (−1)^{n+1}/n

converges.

Proof. The terms in this series alternate in sign, and their absolute values 1/n decrease
monotonically to 0. Therefore, we can apply the alternating series test to conclude that
this series converges.
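The alternating series test tells you that this converges, but not to what; numerically the partial sums bracket the limit from both sides (the value happens to be ln 2, though the test itself does not tell you that):

```python
import math

def s(n):
    # partial sum of 1 - 1/2 + 1/3 - ... +- 1/n
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

# odd and even partial sums bracket the limit ln(2) from above and below
assert s(9999) > math.log(2) > s(10000)
# the error is below the first omitted term, 1/(n+1)
assert abs(s(10000) - math.log(2)) < 1 / 10001
```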

Claim. (Absolute convergence ⇒ convergence): The series

Σ_{n=1}^∞ cos^n(nx)/n!

converges.

Proof. We start by looking at the series composed of the absolute values of these terms:

Σ_{n=1}^∞ |cos^n(nx)|/n!.

Because |cos(x)| ≤ 1 for all x, we can use the comparison test to notice that this series will
converge if the series

Σ_{n=1}^∞ 1/n!

converges.
We can study this series with the ratio test:

lim_{n→∞} (1/n!) / (1/(n−1)!) = lim_{n→∞} 1/n = 0,

which is less than 1. Therefore this series converges, and therefore (by the comparison test
+ absolute convergence ⇒ convergence) our original series

Σ_{n=1}^∞ cos^n(nx)/n!

converges.
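For any fixed x, the tail of this series is dominated by the tail of Σ 1/n!, so partial sums settle extremely fast; a quick numeric sketch (the test point x = 0.7 is arbitrary, and the code is mine):

```python
import math

def partial(x, n):
    # partial sum of cos^k(kx)/k! for k = 1..n
    return sum(math.cos(k * x) ** k / math.factorial(k) for k in range(1, n + 1))

x = 0.7                      # arbitrary test point
# the tail past n = 20 is smaller than sum_{k>20} 1/k! < 1/20!
assert abs(partial(x, 30) - partial(x, 20)) < 1 / math.factorial(20)
# and every partial sum is bounded by sum 1/n! = e - 1 in absolute value
assert abs(partial(x, 30)) <= math.e - 1
```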

Claim. The series

Σ_{n=1}^∞ n e^{−n²}

converges.

Proof. First, notice that because

(x e^{−x²})′ = e^{−x²} − 2x² e^{−x²} = (1 − 2x²) e^{−x²} < 0, ∀x ≥ 1,

we know that this function is decreasing on all of [1, ∞). As well, it is positive on [1, ∞):
so we can apply the integral test to see that this series converges iff the integral

∫_1^∞ x e^{−x²} dx

converges.
But this is not too hard to show: by using the u-substitution u = x², we have that

∫_1^∞ x e^{−x²} dx = ∫_1^∞ (e^{−u}/2) du = −e^{−u}/2 |_1^∞ = 1/(2e),

and that in particular this integral converges. Therefore

Σ_{n=1}^∞ n e^{−n²}

must converge as well.
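The integral test's bound can be seen numerically: the tail of the series (beyond n = 1) is at most the integral 1/(2e) just computed. (A sketch; the variable names are mine.)

```python
import math

# partial sums of sum n * exp(-n^2) stabilize almost immediately
s = sum(n * math.exp(-n**2) for n in range(1, 50))

integral = 1 / (2 * math.e)      # int_1^inf x e^{-x^2} dx, from the proof

# since the summand decreases, the tail past n = 1 is bounded by the integral
assert s <= math.exp(-1) + integral
assert s > 0
```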

5 Functions, Continuity, and Limits: Definitions


Definition. If f : X → Y is a function between two subsets X, Y of ℝ, we say that

lim_{x→a} f(x) = L

if and only if

1. (vague:) as x approaches a, f(x) approaches L.

2. (precise; wordy:) for any distance ε > 0, there is some neighborhood δ > 0 of a such
that whenever x ∈ X is within δ of a, f(x) is within ε of L.

3. (precise; symbols:)

∀ε > 0, ∃δ > 0 s.t. ∀x ∈ X, (|x − a| < δ) ⇒ (|f(x) − L| < ε).

Definition. A function f : X → Y is said to be continuous at some point a ∈ X iff

lim_{x→a} f(x) = f(a).

Somewhat strange definitions, right? At least, the two "rigorous" definitions are some-
what strange: how do these epsilons and deltas connect with the rather simple concept of
"as x approaches a, f(x) approaches f(a)"? To see this a bit better, consider the following
image:

[Figure: the graph of a function f, with a horizontal band between A − ε and A + ε around
a value A = f(b), and a vertical window between b − δ and b + δ around the point b.]

This graph shows pictorially what's going on in our "rigorous" definition of limits and
continuity: essentially, to rigorously say that "as x approaches a, f(x) approaches f(a)",
we are saying that

• for any distance ε around f(a) that we'd like to keep our function within,

• there is a neighborhood (a − δ, a + δ) around a such that

• if f is evaluated only at points within this neighborhood (a − δ, a + δ), it stays within ε of f(a).

Basically, what this definition says is that if you pick values of x sufficiently close to a, the
resulting f(x)'s will be as close as you want to be to f(a) – i.e. that "as x approaches a,
f(x) approaches f(a)."
This, hopefully, illustrates what our definition is trying to capture – a concrete notion
of something like convergence for functions, instead of sequences.
In practice, on the GRE, you are probably in trouble if you try to use this definition
(gorgeous as it is.) Instead, what you likely want to do is try some of the following tools:

1. Squeeze theorem: If f, g, h are functions defined on some interval I \ {a}¹ such that

f(x) ≤ g(x) ≤ h(x), ∀x ∈ I \ {a}, and

lim_{x→a} f(x) = lim_{x→a} h(x),

then lim_{x→a} g(x) exists, and is equal to the other two limits lim_{x→a} f(x), lim_{x→a} h(x).

¹The set X \ Y is simply the set formed by taking all of the elements in X that are not elements in Y.
The symbol \, in this context, is called "set-minus", and denotes the idea of "taking away" one set from
another.

2. Limits and arithmetic: if f, g are a pair of functions such that lim_{x→a} f(x),
lim_{x→a} g(x) both exist, then we have the following equalities:

lim_{x→a} (αf(x) + βg(x)) = α(lim_{x→a} f(x)) + β(lim_{x→a} g(x))

lim_{x→a} (f(x) · g(x)) = (lim_{x→a} f(x)) · (lim_{x→a} g(x))

lim_{x→a} (f(x)/g(x)) = (lim_{x→a} f(x)) / (lim_{x→a} g(x)), if lim_{x→a} g(x) ≠ 0.

3. Limits and composition: if f : Y → Z is a function such that lim_{y→a} f(y) = L,
and g : X → Y is a function such that lim_{x→b} g(x) = a, then

lim_{x→b} f(g(x)) = L.

Specifically, if both functions are continuous, their composition is continuous.

4. L'Hôpital's rule: If f(x) and g(x) are a pair of differentiable functions such that
either

• lim_{x→a} f(x) = 0 and lim_{x→a} g(x) = 0, or
• lim_{x→a} f(x) = ±∞ and lim_{x→a} g(x) = ±∞,

then lim_{x→a} f(x)/g(x) = lim_{x→a} f′(x)/g′(x), whenever the second limit exists.

6 Limits: Examples
Example.

lim_{x→0} x² sin(1/x) = 0.

Proof. So: for all x ∈ ℝ, x ≠ 0, we have that

−1 ≤ sin(1/x) ≤ 1
⇒ −x² ≤ x² sin(1/x) ≤ x²;

thus, by the squeeze theorem, as the limit as x → 0 of both −x² and x² is 0,

lim_{x→0} x² sin(1/x) = 0

as well.

Example.

lim_{x→a} sin(1/x²) = sin(1/a²),

if a ≠ 0.

Proof. By our work earlier in this lecture, 1/x² is continuous at any value of a ≠ 0, and
from class sin(x) is continuous everywhere: thus, we have that their composition, sin(1/x²),
is continuous wherever x ≠ 0. Thus,

lim_{x→a} sin(1/x²) = sin(1/a²),

as claimed.

Example. Show that the limit

lim_{x→0} ((1 − x)^x − 1 + x²) / x³

converges to −1/2.

Proof. We bash this limit repeatedly with L'Hôpital's rule. First, before we can apply
L'Hôpital's rule, we must check that its conditions apply. The functions contained in the
numerator and denominator are all infinitely differentiable near 0, so this will never be a
stumbling block: furthermore, because the numerator and denominator are both continu-
ous/defined at 0, we can evaluate their limits at 0 by just plugging in 0: i.e.

lim_{x→0} ((1 − x)^x − 1 + x²) = (1 − 0)^0 − 1 + 0² = 1 − 1 = 0, and
lim_{x→0} x³ = 0³ = 0.

So we've satisfied the conditions for L'Hôpital's rule, and can apply it to our limit:

lim_{x→0} ((1 − x)^x − 1 + x²)/x³ =_{L'H} lim_{x→0} [d/dx((1 − x)^x − 1 + x²)] / [d/dx(x³)].

At this point, we recall how to differentiate functions of the form f(x)^{g(x)}, where f(x) > 0,
by using the identity

(f(x))^{g(x)} = e^{ln(f(x)) · g(x)}
⇒ d/dx (f(x))^{g(x)} = d/dx e^{ln(f(x)) · g(x)}
                     = e^{ln(f(x)) · g(x)} · ( (g(x)/f(x)) · f′(x) + g′(x) · ln(f(x)) ).
In particular, we can rewrite (1 − x)^x as e^{ln(1−x) · x}, which will let us just differentiate
using the chain rule:

lim_{x→0} ((1 − x)^x − 1 + x²)/x³
  =_{L'H} lim_{x→0} [d/dx( e^{ln(1−x)·x} − 1 + x² )] / [d/dx(x³)]
  = lim_{x→0} [ e^{ln(1−x)·x} · (ln(1 − x) + x/(x − 1)) + 2x ] / (3x²).

Again, both the numerator and denominator are continuous, and plugging in 0 up top yields
e^{ln(1)·0} · (ln(1) + 0/(0 − 1)) + 2 · 0 = 0, while on the bottom we also get 0. Therefore, we can apply
L'Hôpital's rule again to get that our limit is just

lim_{x→0} [ d/dx( e^{ln(1−x)·x} · (ln(1 − x) + x/(x − 1)) + 2x ) ] / [ d/dx(3x²) ]
= lim_{x→0} [ e^{ln(1−x)·x} · (ln(1 − x) + x/(x − 1))² + e^{ln(1−x)·x} · (−1/(1 − x) − 1/(x − 1)²) + 2 ] / (6x).

Again, the top and bottom are continuous near 0, and at 0 the top is

e^{ln(1−0)·0} · (ln(1 − 0) + 0/(0 − 1))² + e^{ln(1−0)·0} · (−1/(1 − 0) − 1/(0 − 1)²) + 2 = 0 − 2 + 2 = 0,

while the bottom is also 0. So, we can apply L’Hôpital again! This tells us that our limit
is in fact
lim_{x→0} [ d/dx( e^{ln(1−x)·x} · (ln(1 − x) + x/(x − 1))² + e^{ln(1−x)·x} · (−1/(1 − x) − 1/(x − 1)²) + 2 ) ] / [ d/dx(6x) ]

= lim_{x→0} [ e^{ln(1−x)·x} · (ln(1 − x) + x/(x − 1))³
            + 3 e^{ln(1−x)·x} · (ln(1 − x) + x/(x − 1)) · (−1/(1 − x) − 1/(x − 1)²)
            + e^{ln(1−x)·x} · (−1/(x − 1)² + 2/(x − 1)³) ] / 6.

Again, the top and bottom are made out of things that are continuous at 0. Plugging in 0
to the top this time gives us −3, while the bottom gives us 6: therefore, the limit is just

−3/6 = −1/2.
So we’re done! (In class, I did a less-awful L’Hôpital bash than this to save time. I do this
here to illustrate just how many times you may have to apply L’Hôpital to get an answer,
though the average GRE problem will be less messy than this calculation!)
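A numerical probe is a good way to double-check a long L'Hôpital computation like this one: the quotient should hover near −1/2 for small (but not too small) x. (Sketch; mine, not the notes'.)

```python
def q(x):
    # the quotient ((1 - x)^x - 1 + x^2) / x^3 whose limit we computed
    return ((1 - x) ** x - 1 + x * x) / x**3

# for moderately small x the quotient is already close to -1/2;
# much smaller x would run into floating-point cancellation instead
assert abs(q(1e-3) + 0.5) < 0.01
assert abs(q(-1e-3) + 0.5) < 0.01
```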

Math 94 Professor: Padraic Bartlett

Lecture 2: Derivatives and Integrals


Week 2 UCSB 2015

This is the second week of the Mathematics Subject Test GRE prep course; here, we
review the concepts of derivatives and integrals!

1 Bestiary of Functions
For convenience's sake, we list the domains, derivatives, integrals, and key values of several
functions here.

Name       | Domain          | Derivative      | Integral                        | Key Values
ln(x)      | (0, ∞)          | 1/x             | x·ln(x) − x + C                 | ln(1) = 0, ln(e) = 1
e^x        | ℝ               | e^x             | e^x + C                         | e^0 = 1, e^1 = e
sin(x)     | ℝ               | cos(x)          | −cos(x) + C                     | sin(0) = 0, sin(π/4) = √2/2, sin(π/2) = 1
cos(x)     | ℝ               | −sin(x)         | sin(x) + C                      | cos(0) = 1, cos(π/4) = √2/2, cos(π/2) = 0
tan(x)     | x ≠ (2k+1)π/2   | sec²(x)         | ln|sec(x)| + C                  | tan(0) = 0, tan(π/4) = 1
sec(x)     | x ≠ (2k+1)π/2   | sec(x)tan(x)    | ln|sec(x) + tan(x)| + C         | sec(0) = 1, sec(π/4) = √2
csc(x)     | x ≠ kπ          | −csc(x)cot(x)   | ln|csc(x) − cot(x)| + C         | csc(π/4) = √2, csc(π/2) = 1
cot(x)     | x ≠ kπ          | −csc²(x)        | ln|sin(x)| + C                  | cot(π/4) = 1, cot(π/2) = 0
arcsin(x)  | (−1, 1)         | 1/√(1 − x²)     | x·arcsin(x) + √(1 − x²) + C     | arcsin(0) = 0, arcsin(1) = π/2
arccos(x)  | (−1, 1)         | −1/√(1 − x²)    | x·arccos(x) − √(1 − x²) + C     | arccos(0) = π/2, arccos(1) = 0
arctan(x)  | ℝ               | 1/(1 + x²)      | x·arctan(x) − ln(1 + x²)/2 + C  | arctan(0) = 0, arctan(1) = π/4

Memorizing all of these is not necessary to do well on the GRE: as we'll discuss in class,
you can derive almost all of these identities on the fly by using the product/chain rules
or integration by parts/substitution! However, doing those calculations can take time, and
memorizing these formulas will save you time on the test; consider studying them in the
two weeks before your test with flashcards and the like!

2 The Derivative
As always, we start with the formal definition:

Definition. For a function f defined on some neighborhood (a − δ, a + δ), we say that f is
differentiable at a iff the limit

lim_{h→0} (f(a + h) − f(a)) / ((a + h) − a)

exists. If it does, denote this limit as f′(a); we will often call this value the derivative of
f at a.

Again, as before, if you find yourself directly using this definition to solve a GRE problem,
something has likely gone wrong! Instead, you likely want to use one of several rules that
we know for evaluating derivatives:

2.1 Tools
1. Differentiation is linear: For f, g a pair of functions differentiable at a and α, β a
pair of constants,

(αf(x) + βg(x))′|_a = αf′(a) + βg′(a).

2. Product rule: For f, g a pair of functions differentiable at a,

(f(x) · g(x))′|_a = f′(a) · g(a) + g′(a) · f(a).

3. Quotient rule: For f, g a pair of functions differentiable at a, g(a) ≠ 0, we have

(f(x)/g(x))′|_a = (f′(a) · g(a) − g′(a) · f(a)) / (g(a))².

4. Chain rule: For f a function differentiable at g(a) and g a function differentiable at
a,

(f(g(x)))′|_a = f′(g(a)) · g′(a).

5. Inverse function rule: Suppose that f(x) is a bijective function with inverse f⁻¹(x),
and that f⁻¹(x) is differentiable at some point a. Then we have that

(f⁻¹(x))′|_a = 1 / f′(f⁻¹(a)).
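All of these rules are easy to spot-check with finite differences. Here the inverse function rule is verified for f(x) = e^x, whose inverse is ln, so (f⁻¹)′(a) should equal 1/f′(f⁻¹(a)) = 1/a. (A sketch; the helper name is mine.)

```python
import math

def num_deriv(f, x, h=1e-6):
    # symmetric finite-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

a = 5.0
lhs = num_deriv(math.log, a)                # (f^{-1})'(a) for f = exp
rhs = 1 / num_deriv(math.exp, math.log(a))  # 1 / f'(f^{-1}(a))
assert abs(lhs - rhs) < 1e-6
assert abs(lhs - 1 / a) < 1e-6              # both equal 1/a = 0.2
```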

2.2 Theorems and Interpretations
The derivative has a number of useful interpretations and associated theorems. We state a
few here:

1. Physical phenomena: if f(t) is a function that calculates distance with respect to
some time t, you can think of the derivative f′(t) as denoting the velocity of f at time
t, and f″(t) as denoting the acceleration of f at time t.

2. Tangents: if f (x) = y is a curve, the slope of the tangent to this curve at any point
x0 is given by f 0 (x0 ).

3. Mean Value Theorem: The Mean Value Theorem (abbreviated MVT) is the fol-
lowing result. Suppose that f is a continuous function on the interval [a, b] that's
differentiable on (a, b). Then there is some value c ∈ (a, b) such that

f′(c) = (f(b) − f(a)) / (b − a).

In other words, there is some point c between a and b such that the derivative at
that point is equal to the slope of the line segment connecting (a, f(a)) and (b, f(b)).

4. Classification of extrema: You can use the derivative to find minima and maxima
of functions! Specifically, recall the following two definitions:

Definition. A function f has a critical point at some point x if either of the two
properties hold:

• f is not differentiable at x, or
• f′(x) = 0.

Definition. A function f has a local maxima (resp. local minima) at some point
a iff there is a neighborhood (a − δ, a + δ) around a such that f(a) ≥ f(x) (resp.
f(a) ≤ f(x)) for any x ∈ (a − δ, a + δ).

Derivatives relate to these properties as follows:

Proposition. If f is a function that has a local minima or maxima at some point t, then
t is a critical point of f. As a corollary, if f is a continuous function defined on some
interval [a, b], f adopts its global minima and maxima at either f's critical points in
[a, b], or at the endpoints {a, b} themselves.
Furthermore, if f″(x) is defined and positive at one of these critical points, f adopts a
local minima at x; conversely, if f″(x) exists and is negative, f adopts a local maxima
at x.

To illustrate these ideas, we work some sample problems:

2.3 Worked examples.


Question. Choose n ∈ ℕ. Where does the function

f(x) = x^n − x·n^n

take its local and global minima and maxima in the interval [−2n, 2n]?

Proof. First, note that if n = 1 our function is identically 0, and thus its local and global
minima and maxima are uninteresting. We will focus on n > 1 for the rest of the proof.
By the above proposition, we know that f will take on these minima and maxima at
its critical points and endpoints. Because f is differentiable everywhere, f's only critical
points come at places where f′(x) = 0. We examine these points here:

f′(x) = n x^{n−1} − n^n = 0
⇔ n x^{n−1} = n^n
⇔ x^{n−1} = n^{n−1}.

There are two cases, here: if n is odd, its critical points occur at ±n; if n is even, however,
its only critical point is at n. In either situation, we have that f″(x) = n(n − 1)x^{n−2}; thus,
we have that x = n is a local minima regardless of whether n is odd or even, while x = −n
is a local maxima for n odd.
This accomplished, we can then evaluate our function at these points along with the
endpoints, and use this to find its global maxima and minima:
For n odd:

f(−2n) = (−2n)^n − (−2n) · n^n = n^n (2n − 2^n),
f(−n) = (−n)^n − (−n) · n^n = n^n (n − 1),
f(n) = (n)^n − (n) · n^n = n^n (1 − n),
f(2n) = (2n)^n − (2n) · n^n = n^n (2^n − 2n).

So: if n > 2, we know that 2^n > 2n; consequently, we have that f(−2n) is the global
minima and f(2n) is the global maxima. Because every odd number other than 1 is > 2,
we've thus resolved our question for n odd.
For n even:

f(−2n) = (−2n)^n − (−2n) · n^n = n^n (2^n + 2n),
f(n) = (n)^n − (n) · n^n = n^n (1 − n),
f(2n) = (2n)^n − (2n) · n^n = n^n (2^n − 2n).

For any even value of n, this function has its global maxima at −2n and its global minima
at n. Thus, we've classified f's local and global minima and maxima for any value of n:
so we’re done!
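A grid search confirms this classification for a sample value; for n = 5 (odd and > 2) the global max should sit at x = 2n, the global min at x = −2n, with a critical point at x = n. (A sketch; mine, not the notes'.)

```python
n = 5
f = lambda x: x**n - x * n**n          # f(x) = x^5 - 3125 x

# sample [-2n, 2n] on a fine grid (endpoints land exactly on +-10)
xs = [-2 * n + 4 * n * i / 100000 for i in range(100001)]
ys = [f(x) for x in xs]

assert max(ys) == f(2 * n)             # global max at the right endpoint
assert min(ys) == f(-2 * n)            # global min at the left endpoint
assert n * n ** (n - 1) - n**n == 0    # f'(n) = 0: x = n is critical
```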

Question. Let p(t) denote the current location of a particle moving in a one-dimensional
space. Call this particle "nice" if p(0) = 0, p(1) = 1, p′(0) = p′(1) = 0, and p″(t) is
continuous.
What is

inf_{"nice" particles} ( sup_{t∈[0,1]} |p″(t)| ) ?

Proof. To start studying the above claim, let's assume that there is some answer M: in
other words, there is some M such that

1. M ≥ |p″(t)|, for any nice particle p and any t ∈ [0, 1].

2. M is the smallest such number for which the above claim holds.

What can we do from here? Well: we have some boundary conditions (niceness tells us
that p(0) = 0, p(1) = 1, p′(0) = 0, p′(1) = 0) and one global piece of information (|p″(t)| ≤
M). How can we turn this knowledge of the second derivative into information about the rest
of the function?

Well: if we apply the mean value theorem to the function p′(t), what does it say? It
tells us that on any interval [a, b], there is some c ∈ (a, b) such that

(p′(b) − p′(a)) / (b − a) = p″(c).

In other words, it relates the first and second derivatives to each other! So, if we apply our
known bound |p″(t)| ≤ M, ∀t ∈ [0, 1], we've just shown that

|p′(b) − p′(a)| / (b − a) = |p″(c)| ≤ M,

for any a < b ∈ [0, 1]. In particular, if we set a = 0, b = t and remember our boundary
condition p′(0) = 0, we've proven that

|p′(t) − p′(0)| / t = |p′(t)| / t ≤ M
⇒ |p′(t)| ≤ M t.

Similarly, if we let a = 1 − t and b = 1, we get

|p′(1) − p′(1 − t)| / (1 − (1 − t)) = |p′(1 − t)| / t ≤ M
⇒ |p′(1 − t)| ≤ M t.

Excellent! We've turned information about the second derivative into information about
the first derivative.
Pretend, for the moment, that you're back in your high school calculus courses, and
you know how to find antiderivatives. In this situation, you've got a function p(t) with the
following properties:

• p(0) = 0,

• p(1) = 1,

• |p′(t)| ≤ M t, and

• |p′(1 − t)| ≤ M t.

What do you know about p(t)? Well: if p′(t) ≤ M t, you can integrate to get that p(t) ≤
(M/2) t² + C, for some constant C: using our boundary condition p(0) = 0 tells us that in
specific we can pick C = 0, and we have

p(t) ≤ (M/2) t², ∀t ∈ (0, 1).

Similarly, if we use our observation that p′(1 − t) ≤ M t, we can integrate to get that the
integral of the LHS, ∫ p′(1 − t) dt = −p(1 − t), is bounded above by the integral of the RHS,
which is (M/2) t² + C. Using our boundary condition p(1) = 1 and multiplying both sides by
−1 tells us that we can pick C = −1 and get

p(1 − t) ≥ −(M/2) t² + 1, ∀t ∈ (0, 1).

But what happens if we plug in t = 1/2? In our first bound, we have p(1/2) ≤ (M/2)(1/2)² = M/8.
Conversely, in our second bound we have p(1 − 1/2) ≥ −(M/2)(1/2)² + 1 = 1 − M/8.
In other words, we have 1 − M/8 ≤ p(1/2) ≤ M/8, which forces M ≥ 4. So we know a
lower bound on our answer!
Moreover, it is an attainable bound, whose answer is suggested by our work here: if we
actually set M = 4, we get the piecewise function

p(t) = { 2t²,             t ∈ (−∞, 1/2]
       { 1 − 2(1 − t)²,   t ∈ [1/2, ∞).

This function is continuous, as it is piecewise made up of polynomials (which are continuous)
and at their join we have 2(1/2)² = 1/2 = 1 − 2(1 − 1/2)². As well, p(0) = 0, p(1) = 1,
and the derivative of p(t) is just

p′(t) = { 4t,          t ∈ (−∞, 1/2)
        { 4(1 − t),    t ∈ (1/2, ∞),

which has p′(0) = p′(1) = 0.
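The extremal particle can be checked numerically: it meets every "niceness" condition, and its acceleration is ±4 everywhere it is defined, matching M = 4. (Sketch with my own finite-difference helpers.)

```python
def p(t):
    # the M = 4 particle built in the proof
    return 2 * t**2 if t <= 0.5 else 1 - 2 * (1 - t) ** 2

def dp(t, h=1e-6):     # finite-difference velocity
    return (p(t + h) - p(t - h)) / (2 * h)

def ddp(t, h=1e-4):    # finite-difference acceleration
    return (p(t + h) - 2 * p(t) + p(t - h)) / h**2

assert p(0) == 0 and p(1) == 1                    # boundary positions
assert abs(dp(0)) < 1e-5 and abs(dp(1)) < 1e-5    # boundary velocities
assert abs(ddp(0.25) - 4) < 1e-3                  # p'' = +4 on the first half
assert abs(ddp(0.75) + 4) < 1e-3                  # p'' = -4 on the second half
```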

3 Integration
As before, we start by defining our terms:

Definition. The integral: A function f is integrable¹ on the interval [a, b] if and only
if the following holds:

• For any ε > 0,

• there is a partition a = t₀ < t₁ < . . . < t_{n−1} < t_n = b of the interval [a, b] such that

Σ_{i=1}^n ( sup_{x∈(t_{i−1},t_i)} f(x) ) · length(t_{i−1}, t_i) − Σ_{i=1}^n ( inf_{x∈(t_{i−1},t_i)} f(x) ) · length(t_{i−1}, t_i) < ε.

Pictorially, this is just saying that the area of the teal rectangles approaches the area of the
purple rectangles in the picture below:

[Figure: upper and lower Riemann sums for f over a partition of [a, b].]

¹To be specific, Riemann-integrable.

Because of this picture, we often say that the integral of a function on some interval [a, b]
is the area beneath its curve from x = a to x = b.
Again, using this definition directly is usually not the best idea. Instead, we have a
number of tools and theorems that are helpful for calculating integrals:

3.1 Theorems and Tools


1. The first fundamental theorem of calculus: Let [a, b] be some interval. Suppose
   that f is a bounded and integrable function over the interval [a, x], for any
   x \in [a, b]. Then the function

       A(x) := \int_a^x f(t)\,dt

   exists for all x \in [a, b]. Furthermore, if f(x) is continuous, the derivative of
   this function, A'(x), is equal to f(x).

2. The second fundamental theorem of calculus: Let [a, b] be some interval.
   Suppose that f(x) is a function that has \varphi(x) as its primitive on [a, b]; as
   well, suppose that f(x) is bounded and integrable on [a, b]. Then, we have that

       \int_a^b f(x)\,dx = \varphi(b) - \varphi(a).

3. Integration by parts: If f, g are a pair of C^1 functions on [a, b] – i.e. they
   have continuous derivatives on [a, b] – then we have

       \int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f'(x)g(x)\,dx.

4. Integration by substitution: If f is a continuous function on g([a, b]) and g is
   a C^1 function on [a, b], then we have

       \int_a^b f(g(x))g'(x)\,dx = \int_{g(a)}^{g(b)} f(x)\,dx.

We work some example integrals here:

3.2 Worked Examples
Question. What's

    \int_1^2 x^2 e^x\,dx\,?

Proof. Looking at this problem, it doesn't seem like a substitution will be terribly
useful: so, let's try to use integration by parts!
How do these kinds of proofs work? Well: what we want to do is look at the quantity
we're integrating (in this case, x^2 e^x) and try to divide it into two parts – an
"f(x)" part and a "g'(x)" part – such that when we apply the relation
\int f(x)g'(x) = f(x)g(x) - \int g(x)f'(x), our expression gets simpler!
To ensure that our expression does in fact get simpler, we want to select our f (x) and
g 0 (x) such that

1. we can calculate the derivative f'(x) of f(x) and find a primitive g(x) of g'(x),
   so that either

2. the derivative f'(x) of f(x) is simpler than the expression f(x), or

3. the integral g(x) of g'(x) is simpler than the expression g'(x).

So: often, this means that you'll want to put quantities like polynomials or \ln(x)'s
in the f(x) spot, because taking derivatives of these things generally simplifies them.
Conversely, things like e^x's or trig functions whose integrals you know are good
choices for the g'(x) spot, as their integrals don't get much more complex and their
derivatives are generally no simpler.
Specifically: what should we choose here? Well, the integral of e^x is a particularly
easy thing to calculate, as it's just e^x. As well, x^2 becomes much simpler after
repeated differentiation: consequently, we want to make the choices

    f(x) = x^2        g'(x) = e^x
    f'(x) = 2x        g(x) = e^x,

which then gives us that

    \int_1^2 x^2 e^x\,dx = f(x)g(x)\Big|_1^2 - \int_1^2 f'(x)g(x)\,dx
                         = x^2 e^x\Big|_1^2 - \int_1^2 2x e^x\,dx.

Another integral! Motivated by the same reasons as before, we attack this integral with
integration by parts as well, setting

    f(x) = 2x         g'(x) = e^x
    f'(x) = 2         g(x) = e^x.

This then tells us that

    \int_1^2 x^2 e^x\,dx = x^2 e^x\Big|_1^2 - \int_1^2 2x e^x\,dx
                         = x^2 e^x\Big|_1^2 - \left( f(x)g(x)\Big|_1^2 - \int_1^2 f'(x)g(x)\,dx \right)
                         = x^2 e^x\Big|_1^2 - \left( 2x e^x\Big|_1^2 - \int_1^2 2e^x\,dx \right)
                         = x^2 e^x\Big|_1^2 - 2x e^x\Big|_1^2 + 2e^x\Big|_1^2
                         = (4e^2 - e) - (4e^2 - 2e) + (2e^2 - 2e)
                         = 2e^2 - e.

So we’re done!
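As a quick numerical cross-check (my addition, not part of the notes), a midpoint-rule approximation of the integral agrees with 2e^2 - e:

```python
import math

# Midpoint-rule approximation of the integral of x^2 e^x over [1, 2]; this
# should agree with the integration-by-parts answer 2e^2 - e ≈ 12.06.
def midpoint(f, a, b, n=100000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint(lambda x: x * x * math.exp(x), 1.0, 2.0)
exact = 2 * math.e ** 2 - math.e
print(abs(numeric - exact) < 1e-6)  # True
```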

Question. What is

    \int_0^2 x^2 \sin(x^3)\,dx\,?

Proof. How do we calculate such an integral? Direct methods seem unpromising, and using
trig identities seems completely insane. What happens if we try substitution?
Well: our first question is the following: what should we pick? This is the only “hard”
part about integration by substitution – making the right choice on what to substitute in.
In most cases, what you want to do is to find the part of the integral that you don’t know
how to deal with – i.e. some sort of “obstruction.” Then, try to make a substitution that
(1) will remove that obstruction, usually such that (2) the derivative of this substitution is
somewhere in your formula.
Here, for example, the term sin(x3 ) is definitely an “obstruction” – we haven’t developed
any techniques for how to directly integrate such things. So, we make a substitution to make
this simpler! In specific: let g(x) = x^3. This turns our term \sin(x^3) into
\sin(g(x)), which is much easier to deal with. Also, the derivative g'(x) = 3x^2 is
(up to a constant) being multiplied by our original formula – so this substitution
seems quite promising. In fact, if
we calculate and use our indicated substitution, we have that
    \int_0^2 x^2 \sin(x^3)\,dx = \int_0^2 \sin(g(x)) \cdot \frac{1}{3} \cdot g'(x)\,dx
                               = \frac{1}{3} \int_0^8 \sin(x)\,dx
                               = \frac{-\cos(8) + \cos(0)}{3}
                               = \frac{1 - \cos(8)}{3}.

(Note that when we made our substitution, we also changed the bounds from [a, b] to
[g(a), g(b)]! Please, please, always change your bounds when you make a substitution!)
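Numerically (my addition, not part of the notes), the substitution answer — since the antiderivative of \sin is -\cos, the value is (1 - \cos 8)/3 ≈ 0.382 — checks out against a direct midpoint-rule approximation:

```python
import math

# Direct midpoint-rule approximation of the integral of x^2 sin(x^3) over
# [0, 2]; the substitution u = x^3 gives (cos(0) - cos(8))/3 = (1 - cos(8))/3.
def midpoint(f, a, b, n=200000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint(lambda x: x * x * math.sin(x ** 3), 0.0, 2.0)
exact = (1 - math.cos(8)) / 3
print(abs(numeric - exact) < 1e-6)  # True
```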

3.3 Trigonometric Substitutions


Usually, when people use integration by substitution, they use it to take functions out
rather than to put functions in. I.e. people usually start with integrals of the form
    \int_a^b f(g(x))g'(x)\,dx,

and turn them into integrals of the form


    \int_{g(a)}^{g(b)} f(x)\,dx.

However: this is not the only way to use integration by substitution! Specifically, it is
possible to use integration by substitution to put a g(x) into an integral, as well! In
other words, if we have an integral of the form
    \int_a^b f(x)\,dx,

we can use integration by substitution to turn it into an integral of the form


    \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(x))g'(x)\,dx,

as long as we make sure that g is continuous on this new interval [g^{-1}(a), g^{-1}(b)].
Why would you want to do this? Well: suppose you're working with a function of the
form

    \frac{1}{a^2 + x^2}.

Substituting x = a\tan(\theta) then turns this expression into

    \frac{1}{a^2 + a^2\tan^2(\theta)}
    = \frac{1}{a^2\left(1 + \frac{\sin^2(\theta)}{\cos^2(\theta)}\right)}
    = \frac{\cos^2(\theta)}{a^2\left(\cos^2(\theta) + \sin^2(\theta)\right)}
    = \frac{1}{a^2}\cos^2(\theta),

which is much simpler. As well: if you have a function of the form

    \sqrt{a^2 - x^2},

the substitution x = a\sin(\theta) turns this into

    \sqrt{a^2 - a^2\sin^2(\theta)} = |a| \cdot \sqrt{1 - \sin^2(\theta)} = |a| \cdot \sqrt{\cos^2(\theta)} = |a\cos(\theta)|,

which is again a simpler and easier thing to work with! These substitutions come up
frequently enough that we refer to them as the trigonometric substitutions; they're
pretty useful for dealing with many kinds of integrals.
We illustrate their use in the following example:

Question. What is

    \int_0^1 \left(x^2 + 1\right)^{-3/2}\,dx\,?

Proof. Looking at this, we see that we have a \frac{1}{1+x^2} term, surrounded by some
other bits and pieces. So: let's try the tangent substitution we talked about earlier!
Specifically: let

    f(x) = \left(x^2 + 1\right)^{-3/2}, \quad g(x) = \tan(x), \quad g'(x) = \frac{1}{\cos^2(x)}.

Then, we have that

    \int_0^1 \left(x^2 + 1\right)^{-3/2}\,dx = \int_0^1 f(x)\,dx
                                             = \int_{g^{-1}(0)}^{g^{-1}(1)} f(g(x))g'(x)\,dx
                                             = \int_{\tan^{-1}(0)}^{\tan^{-1}(1)} \cos^3(x) \cdot \frac{1}{\cos^2(x)}\,dx
                                             = \int_0^{\pi/4} \cos(x)\,dx
                                             = \sin(x)\Big|_0^{\pi/4}
                                             = \frac{\sqrt{2}}{2}.

Question. Evaluate the improper integral

    \int_2^{\infty} \frac{1}{x\sqrt{x^2 - 1}}\,dx.

Proof. Try the u-substitution u = \sqrt{x^2 - 1} \Rightarrow x = \sqrt{u^2 + 1}. If
you do this, you get that du = \frac{x}{\sqrt{x^2-1}}\,dx \Rightarrow
\frac{u}{\sqrt{u^2+1}}\,du = dx, and therefore that our original integral is

    \int_{\sqrt{3}}^{\lim_{a\to\infty}\sqrt{a^2-1}} \frac{1}{\sqrt{u^2+1} \cdot u} \cdot \frac{u}{\sqrt{u^2+1}}\,du
    = \int_{\sqrt{3}}^{\infty} \frac{1}{u^2+1}\,du.

Now, you should try a trig substitution! In particular, try u = \tan(t),
t = \arctan(u), du = \frac{1}{\cos^2(t)}\,dt:

    \int_{\sqrt{3}}^{\infty} \frac{1}{u^2+1}\,du
    = \int_{\arctan(\sqrt{3})}^{\lim_{a\to\infty}\arctan(a)} \frac{1}{1+\tan^2(t)} \cdot \frac{1}{\cos^2(t)}\,dt
    = \int_{\arctan(\sqrt{3})}^{\lim_{a\to\infty}\arctan(a)} 1\,dt
    = \left(\lim_{a\to\infty}\arctan(a)\right) - \arctan(\sqrt{3}).

We know that tangent approaches positive infinity on (-\pi/2, \pi/2) as its argument
approaches \pi/2: therefore, the limit of arctangent as its argument approaches
+\infty is just \pi/2. Similarly, we know that tangent is equal to \sqrt{3} when its
argument is equal to \pi/3; therefore, \arctan(\sqrt{3}) is \pi/3. Therefore, our
integral is just \pi/2 - \pi/3 = \pi/6.
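We can sanity-check the answer \pi/6 numerically (my addition, not part of the notes) by truncating the improper integral at a large upper limit; for x \ge 2 the integrand is below a constant times x^{-2}, so the tail beyond b is O(1/b) and b = 10^4 is plenty.

```python
import math

# Truncate the improper integral of 1/(x sqrt(x^2 - 1)) at b = 10^4: the tail
# beyond b is O(1/b), so the truncated value should land very close to pi/6.
def midpoint(f, a, b, n=400000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint(lambda x: 1 / (x * math.sqrt(x * x - 1)), 2.0, 1e4)
exact = math.pi / 6
print(abs(numeric - exact) < 1e-3)  # True
```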

Question. Calculate the following two integrals:

    \int_0^1 \ln(1+x^2)\,dx, \qquad \int_2^3 \frac{1}{\sqrt{x+1} + \sqrt{x-1}}\,dx.
Proof. We begin by studying \int_0^1 \ln(1+x^2)\,dx. Because no substitution looks
very promising (as the 1+x^2 term messes things up), we are motivated to try
integration by parts. In particular, we can remember the trick we used when
integrating \ln(x), and set

    u = \ln(1+x^2)            dv = dx
    du = \frac{2x}{1+x^2}     v = x,

which gives us

    \int_0^1 \ln(1+x^2)\,dx = \ln(1+x^2) \cdot x\Big|_0^1 - \int_0^1 \frac{2x^2}{1+x^2}\,dx.

A bit of algebra allows us to notice that

    \ln(1+x^2) \cdot x\Big|_0^1 - \int_0^1 \frac{2x^2}{1+x^2}\,dx
    = \ln(1+x^2) \cdot x\Big|_0^1 - \left( \int_0^1 2\,dx - 2\int_0^1 \frac{1}{1+x^2}\,dx \right)
    = \ln(1+x^2) \cdot x\Big|_0^1 - 2x\Big|_0^1 + 2\int_0^1 \frac{1}{1+x^2}\,dx.

Now, we remember our inverse trig identities, and specifically remember that
\int \frac{1}{1+x^2}\,dx = \arctan(x); combining, we have

    \int_0^1 \ln(1+x^2)\,dx = \ln(1+x^2) \cdot x\Big|_0^1 - 2x\Big|_0^1 + 2\arctan(x)\Big|_0^1
                            = \ln(2) - 2 + \frac{\pi}{2}.

We now look at \int_2^3 \frac{1}{\sqrt{x+1}+\sqrt{x-1}}\,dx. Before we can do
anything, we have to do some algebra to clean up this function. Specifically, to
simplify this expression, we multiply top and bottom by \sqrt{x+1} - \sqrt{x-1}, a
common algebraic technique used on square-root-involving expressions to clean things
up:

    \int_2^3 \frac{1}{\sqrt{x+1}+\sqrt{x-1}}\,dx
    = \int_2^3 \frac{1}{\sqrt{x+1}+\sqrt{x-1}} \cdot \frac{\sqrt{x+1}-\sqrt{x-1}}{\sqrt{x+1}-\sqrt{x-1}}\,dx
    = \int_2^3 \frac{\sqrt{x+1}-\sqrt{x-1}}{(\sqrt{x+1})^2 - (\sqrt{x-1})^2}\,dx
    = \int_2^3 \frac{\sqrt{x+1}-\sqrt{x-1}}{(x+1) - (x-1)}\,dx
    = \frac{1}{2}\int_2^3 \sqrt{x+1} - \sqrt{x-1}\,dx
    = \frac{1}{2}\int_2^3 \sqrt{x+1}\,dx - \frac{1}{2}\int_2^3 \sqrt{x-1}\,dx.

We now perform a pair of translation-substitutions, setting u = x+1 in the first
integral and u = x-1 in the second integral:

    = \frac{1}{2}\int_3^4 \sqrt{u}\,du - \frac{1}{2}\int_1^2 \sqrt{u}\,du
    = \frac{1}{2}\left( \frac{2u^{3/2}}{3}\Big|_3^4 \right) - \frac{1}{2}\left( \frac{2u^{3/2}}{3}\Big|_1^2 \right)
    = \frac{\sqrt{64} - \sqrt{27} - \sqrt{8} + 1}{3}.

3.4 Undoing Trigonometric Substitutions


So: often, when we're integrating things, we come across expressions like

    \int_0^{\pi} \frac{1}{1+\sin(\theta)}\,d\theta, \quad \text{or} \quad
    \int_{-\pi/4}^{\pi/4} \frac{1}{\cos(\theta)}\,d\theta,

where there's no immediately obvious way to set up the integral. Sometimes, we can be
particularly clever, and notice some algebraic trick: for example, to integrate
\frac{1}{\cos(\theta)}, we can

use partial fractions to see that

    \frac{1}{\cos(\theta)} = \frac{\cos(\theta)}{\cos^2(\theta)}
    = \frac{\cos(\theta)}{1 - \sin^2(\theta)}
    = \frac{1}{2}\left( \frac{\cos(\theta)}{1-\sin(\theta)} + \frac{\cos(\theta)}{1+\sin(\theta)} \right),

and then integrate each of these two fractions separately with the substitutions
u = 1 \pm \sin(\theta).
Relying on being clever all the time, however, is not a terribly good strategy. It would
be nice if we could come up with some way of methodically studying such integrals above –
specifically, of working with integrals that feature a lot of trigonometric identities! Is there
a way to do this?
As it turns out: yes! Specifically, consider the use of the following function as a substi-
tution:

g(x) = 2 arctan(x),

where \arctan(x) is the inverse function to \tan(x), and is a function
\mathbb{R} \to (-\pi/2, \pi/2). In class, we showed that such inverse functions of
differentiable functions are differentiable themselves: consequently, we can use the
chain rule and the definition of the inverse to see that

    (\tan(\arctan(x)))' = (x)' = 1, \quad \text{and}

    (\tan(\arctan(x)))' = \tan'(\arctan(x)) \cdot (\arctan(x))'
                        = \frac{1}{\cos^2(\arctan(x))} \cdot (\arctan(x))'
    \Rightarrow \frac{1}{\cos^2(\arctan(x))} \cdot (\arctan(x))' = 1
    \Rightarrow (\arctan(x))' = \cos^2(\arctan(x)).

Then, if we remember how the trigonometric functions were defined, we can see (via a
reference right triangle with legs 1 and x and hypotenuse \sqrt{1+x^2}, so that
\cos(\arctan(x)) = \frac{1}{\sqrt{1+x^2}} and \sin(\arctan(x)) = \frac{x}{\sqrt{1+x^2}})
that
we have that

    (\arctan(x))' = \cos^2(\arctan(x)) = \frac{1}{1+x^2},

and thus that

    g'(x) = \frac{2}{1+x^2}.
As well: by using the same triangles, notice that

    \sin(g(x)) = \sin(2\arctan(x))
               = 2\sin(\arctan(x))\cos(\arctan(x))
               = 2 \cdot \frac{x}{\sqrt{1+x^2}} \cdot \frac{1}{\sqrt{1+x^2}}
               = \frac{2x}{1+x^2},

and

    \cos(g(x)) = \cos(2\arctan(x))
               = 2\cos^2(\arctan(x)) - 1
               = \frac{2}{1+x^2} - 1
               = \frac{1-x^2}{1+x^2}.

Finally, note that trivially we have that

    g^{-1}(x) = \tan(x/2),

by definition.
What does this all mean? Well: suppose we have some function f(x) where all of its
terms are trig functions – i.e. f(x) = \frac{1}{1+\sin(x)}, or f(x) = \frac{1}{\cos(x)}
– and we make the substitution

    \int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(x))g'(x)\,dx, \quad \text{for } g(x) = 2\arctan(x).

What do we know about the integral on the right? Well: as we've just shown above, the
substitution of g(x) turns all of the \sin(x)'s into \sin(g(x))'s, which are just
ratios of polynomials; similarly, we've turned all of the \cos(x)'s into
\cos(g(x))'s, which are also ratios of polynomials. In other words, this substitution
turns a function that's made entirely out of trig functions into one that's made only
out of rational functions of x! This is excellent for us, because (as you may have
noticed) it's often far easier to integrate rational functions than trig functions.
This substitution is probably one of those things that’s perhaps clearer in its use than
its explanation. We provide an example here:

Example. Find the integral

    \int_0^{\pi/2} \frac{1}{1+\sin(\theta)}\,d\theta.

Proof. So: without thinking, let's just try our substitution \theta = g(x), where
g(x) = 2\arctan(x):

    \int_0^{\pi/2} \frac{1}{1+\sin(\theta)}\,d\theta
    = \int_{g^{-1}(0)}^{g^{-1}(\pi/2)} f(g(x))g'(x)\,dx
    = \int_{\tan(0)}^{\tan(\pi/4)} \frac{1}{1+\frac{2x}{1+x^2}} \cdot \frac{2}{1+x^2}\,dx
    = \int_0^1 \frac{2}{1+x^2+2x}\,dx
    = \int_0^1 \frac{2}{(1+x)^2}\,dx
    = \int_1^2 \frac{2}{x^2}\,dx
    = -\frac{2}{x}\Big|_1^2
    = 1.

. . . so it works! Without any effort, we were able to just mechanically calculate an
integral that otherwise looked quite impossible. Neat!
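As a check (my addition, not part of the notes), note that -2/x evaluated from 1 to 2 gives -1 + 2 = 1, and a direct numerical integration of the original integrand agrees:

```python
import math

# Direct midpoint-rule check of the half-angle-substitution computation: the
# integral of 1/(1 + sin(theta)) over [0, pi/2] should come out to 1.
def midpoint(f, a, b, n=100000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint(lambda t: 1 / (1 + math.sin(t)), 0.0, math.pi / 2)
print(abs(numeric - 1.0) < 1e-8)  # True
```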

Math 94 Professor: Padraic Bartlett

Lecture 3: Derivatives in \mathbb{R}^n
Week 3 UCSB 2015

This is the third week of the Mathematics Subject Test GRE prep course; here, we
review the concepts of derivatives in higher dimensions!

1 Definitions and Concepts


We start by reviewing the definitions we have for the derivative of functions on
\mathbb{R}^n:

Definition. The partial derivative \frac{\partial f}{\partial x_i} of a function
f : \mathbb{R}^n \to \mathbb{R} along its i-th coördinate at some point a, formally
speaking, is the limit

    \lim_{h \to 0} \frac{f(a + h \cdot e_i) - f(a)}{h}.

(Here, e_i is the i-th basis vector, which has its i-th coördinate equal to 1 and the
rest equal to 0.)
However, this is not necessarily the best way to think about the partial derivative, and
certainly not the easiest way to calculate it! Typically, we think of the i-th partial derivative
of f as the derivative of f when we “hold all of f ’s other variables constant” – i.e. if we
think of f as a single-variable function with variable xi , and treat all of the other xj ’s as
constants. This method is markedly easier to work with, and is how we actually *calculate*
a partial derivative.
We can extend this to higher-order derivatives as follows. Given a function
f : \mathbb{R}^n \to \mathbb{R}, we can define its second-order partial derivatives as
the following:

    \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j} \right).

In other words, the second-order partial derivatives are simply all of the functions you can
get by taking two consecutive partial derivatives of your function f .

Definition. Often, we want a way to talk about all of the first-order derivatives of
a function at once. The way we do this is with the differential, or total derivative.
We define this as follows: the total derivative of a function
f : \mathbb{R}^n \to \mathbb{R}^m is the following matrix of partial derivatives:

    D(f)\big|_a = \begin{bmatrix}
      \frac{\partial f_1}{\partial x_1}(a) & \frac{\partial f_1}{\partial x_2}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\
      \frac{\partial f_2}{\partial x_1}(a) & \frac{\partial f_2}{\partial x_2}(a) & \cdots & \frac{\partial f_2}{\partial x_n}(a) \\
      \vdots & \vdots & \ddots & \vdots \\
      \frac{\partial f_m}{\partial x_1}(a) & \frac{\partial f_m}{\partial x_2}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a)
    \end{bmatrix}

For a function f : \mathbb{R}^n \to \mathbb{R}, this has the special name gradient,
and is denoted

    \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right).

For a function f : \mathbb{R}^n \to \mathbb{R}, a point a is called a critical point
if it is a stationary point (i.e. \nabla f(a) = 0), or f is not differentiable in any
neighborhood of a. Similarly, a point a \in \mathbb{R}^n is called a local maximum of
f if there is some small value r such that for any point x within distance r of a, we
have f(x) \le f(a). (A similar definition holds for local minima.)
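The limit definition above is easy to check numerically. The sketch below (my addition; the function here is an arbitrary example, not one from the notes) approximates each partial derivative of f(x, y) = x^2 y + \sin(y) by a central difference quotient and compares with the hand-computed gradient (2xy, x^2 + \cos y):

```python
import math

# Central-difference approximation of the partial derivatives of
# f(x, y) = x^2 y + sin(y) at a = (1, 2); hand computation gives the
# gradient (2xy, x^2 + cos(y)) = (4, 1 + cos(2)).
def partial(f, a, i, h=1e-6):
    hi, lo = list(a), list(a)
    hi[i] += h
    lo[i] -= h
    return (f(hi) - f(lo)) / (2 * h)

f = lambda v: v[0] ** 2 * v[1] + math.sin(v[1])
a = (1.0, 2.0)
numeric = [partial(f, a, 0), partial(f, a, 1)]
exact = [2 * a[0] * a[1], a[0] ** 2 + math.cos(a[1])]
err = max(abs(u - v) for u, v in zip(numeric, exact))
print(err < 1e-6)  # True
```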

Definition. For functions f : \mathbb{R}^n \to \mathbb{R}, we can also define an
object that generalizes the "second derivative" from one-dimensional calculus to
multidimensional calculus. We do this with the Hessian, which we define here. The
Hessian of a function f : \mathbb{R}^n \to \mathbb{R} at some point a is the following
matrix:

    H(f)\big|_a = \begin{bmatrix}
      \frac{\partial^2 f}{\partial x_1 \partial x_1}(a) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\
      \vdots & \ddots & \vdots \\
      \frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n}(a)
    \end{bmatrix}.

Finally: like with the normal second derivative, we can use H(f)\big|_a to create a
"second-order" approximation to f at a, in a similar fashion to how we used the
derivative to create a linear (i.e. first-order) approximation to f. We define this
here: if f : \mathbb{R}^n \to \mathbb{R} is a function with continuous second-order
partials, we define the second-order Taylor approximation to f at a as the function

    T_2(f)\big|_a(a + h) = f(a) + (\nabla f)(a) \cdot h + \frac{1}{2} \cdot (h_1, \ldots, h_n) \cdot H(f)\big|_a \cdot (h_1, \ldots, h_n)^T.

You can think of f(a) as the constant, or zeroth-order part, (\nabla f)(a) \cdot h as
the linear part, and the H(f)\big|_a term as the second-order part of this
approximation.
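Here is a small worked instance (my addition; the function is an arbitrary example): for f(x, y) = e^x \cos y at a = (0, 0), we have f(a) = 1, \nabla f(a) = (1, 0), and H(f)\big|_a = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, so T_2(h_1, h_2) = 1 + h_1 + (h_1^2 - h_2^2)/2. Near a, T_2 tracks f up to third-order error:

```python
import math

# Second-order Taylor approximation of f(x, y) = e^x cos(y) at the origin:
# T2(h1, h2) = f(0,0) + grad f . h + (1/2) h^T H h = 1 + h1 + (h1^2 - h2^2)/2.
f = lambda x, y: math.exp(x) * math.cos(y)
t2 = lambda h1, h2: 1 + h1 + 0.5 * (h1 ** 2 - h2 ** 2)

h1, h2 = 0.01, -0.02
err = abs(f(h1, h2) - t2(h1, h2))
print(err < 1e-5)  # True: the discrepancy is of third order in |h|
```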

Definition. Finally, we have two useful physical phenomena, the divergence and curl,
that have natural interpretations. Given a C^1 vector field
F : \mathbb{R}^3 \to \mathbb{R}^3, we can define the divergence and curl of F as
follows:

• Divergence. The divergence of F, often denoted either as div(F) or \nabla \cdot F,
  is the following function \mathbb{R}^3 \to \mathbb{R}:

      div(F) = \nabla \cdot F = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y} + \frac{\partial F_3}{\partial z}.

• Curl. The curl of F, denoted curl(F) or \nabla \times F, is the following map
  \mathbb{R}^3 \to \mathbb{R}^3:

      curl(F) = \nabla \times F = \left( \frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z},\ \frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x},\ \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right).

  Often, the curl is written as the "determinant" of the following matrix:

      \det \begin{bmatrix} i & j & k \\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z} \\ F_1 & F_2 & F_3 \end{bmatrix}

Given a function F : \mathbb{R}^2 \to \mathbb{R}^2, we can also find its curl by
"extending" it to a function F^\star : \mathbb{R}^3 \to \mathbb{R}^3, where
F_1^\star(x, y, z) = F_1(x, y), F_2^\star(x, y, z) = F_2(x, y), and
F_3^\star(x, y, z) = 0. If someone asks you to find the curl of a function that's
going from \mathbb{R}^2 \to \mathbb{R}^2, this is what they mean.
Also, divergence naturally generalizes to working on any function
\mathbb{R}^n \to \mathbb{R}^n: just take the sum of \frac{\partial F_i}{\partial x_i}
over all of the variables the function depends on.

We also have several theorems that we know about the derivative! We list a few here.
Here's how we extend the product and chain rules:

Theorem. Suppose that f, g are a pair of functions \mathbb{R}^n \to \mathbb{R}^m, and
we're looking at the inner product^1 f \cdot g of these two functions. Then, we have
that

    D(f \cdot g)\big|_a = f(a) \cdot \left( D(g)\big|_a \right) + g(a) \cdot \left( D(f)\big|_a \right).

Theorem. Take any function g : \mathbb{R}^m \to \mathbb{R}^l, and any function
f : \mathbb{R}^n \to \mathbb{R}^m. Then, we have

    D(g \circ f)\big|_a = D(g)\big|_{f(a)} \cdot D(f)\big|_a.

One interesting/cautionary tale to notice from the above calculations is that the
partial derivative of g \circ f with respect to one variable x_j can depend on many of
the variables and coördinates in the functions f and g!
I.e. something many first-year calculus students are tempted to do on their sets is to
write

    \frac{\partial (g \circ f)_i}{\partial x_j}\Big|_a = \frac{\partial g_i}{\partial x_j}\Big|_{f(a)} \cdot \frac{\partial f_i}{\partial x_j}\Big|_a.

DO NOT DO THIS. Do not do this. Do not do this. Ever. Because it is wrong. Indeed, if
you expand how we've stated the chain rule above, you can see that
\frac{\partial (g \circ f)_i}{\partial x_j}\big|_a – the (i, j)-th entry in the matrix
D(g \circ f) – is actually equal to the i-th row of D(g)\big|_{f(a)} multiplied by the
j-th column of D(f)\big|_a – i.e. that

    \frac{\partial (g \circ f)_i}{\partial x_j}\Big|_a
    = \begin{bmatrix} \frac{\partial g_i}{\partial x_1}\big|_{f(a)} & \cdots & \frac{\partial g_i}{\partial x_m}\big|_{f(a)} \end{bmatrix}
      \cdot \begin{bmatrix} \frac{\partial f_1}{\partial x_j}\big|_a \\ \vdots \\ \frac{\partial f_m}{\partial x_j}\big|_a \end{bmatrix}.

^1 Recall that the inner product of two vectors u, v is just the real number
\sum_{i=1}^{m} u_i v_i.
Notice how this is much more complex! In particular, it means that the partials of
g \circ f depend on all sorts of things going on with g and f, and aren't restricted
to worrying about just the one coördinate you're finding partials with respect to.
The moral here is basically: if you're applying the chain rule without doing a *lot*
of derivative calculations, you've almost surely messed something up. So, when in
doubt, just find the matrices D(f) and D(g)!
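A concrete numerical demonstration (my addition; f and g here are hypothetical functions chosen only for illustration) that D(g \circ f) is the matrix product D(g)\big|_{f(a)} \cdot D(f)\big|_a, and not an entrywise product of partials:

```python
# Check the chain rule as a matrix product for f(x, y) = (xy, x + y) and
# g(u, v) = u + 2v. Exact Jacobians at a = (1, 2): D(f) = [[2, 1], [1, 1]],
# D(g) = [[1, 2]], so D(g o f) = [[1*2 + 2*1, 1*1 + 2*1]] = [[4, 3]].
def jacobian(func, a, m, h=1e-6):
    J = []
    for i in range(m):
        row = []
        for j in range(len(a)):
            hi, lo = list(a), list(a)
            hi[j] += h
            lo[j] -= h
            row.append((func(hi)[i] - func(lo)[i]) / (2 * h))
        J.append(row)
    return J

f = lambda v: (v[0] * v[1], v[0] + v[1])
g = lambda v: (v[0] + 2 * v[1],)
a = [1.0, 2.0]

Df = jacobian(f, a, 2)
Dg = jacobian(g, list(f(a)), 1)
product = [[sum(Dg[0][k] * Df[k][j] for k in range(2)) for j in range(2)]]
direct = jacobian(lambda v: g(f(v)), a, 1)   # numerical D(g o f)
err = max(abs(product[0][j] - direct[0][j]) for j in range(2))
print(direct[0], err < 1e-4)
```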
Here's how the derivative interacts with finding maxima and minima:

Theorem. A function f : \mathbb{R}^n \to \mathbb{R} has a local maximum at a critical
point a if all of its second-order partials exist and are continuous in a neighborhood
of a, and the Hessian of f is negative-definite^2 at a. Similarly, it has a local
minimum if the Hessian is positive-definite at a. If the Hessian takes on both
positive and negative values there, it's a saddle point: there are directions you can
travel where your function will increase, and others where it will decrease. Finally,
if the Hessian is identically 0, you have no information as to what your function may
be up to: you could be in any of the three above cases.

In the section above, we talked about how to use derivatives to find and classify the
critical points of functions Rn ! R. This allows us to find the global minima and maxima
of functions over all of Rn , if we want! Often, however, we won’t just be looking to find the
maximum of some function on all of Rn : sometimes, we’ll want to maximize a function given
a set of constraints. For example, we might want to maximize the function f (x, y, z) = x+y
subject to the constraint that we’re looking at points where x2 + y 2 = 1. How can we do
this?
Initially, you might be tempted to just try to use our earlier methods: i.e. look for
places where Df is 0, and try to classify these extrema. The problem with this method,
when we have a set of constraints, is that it usually won’t find the maxima or minima on
this constraint: because it’s only looking for local maxima or minima over all of Rn , it will
ignore points that could be maxima or minima on our constrained surface! I.e. for the f, g
we mentioned above, we know that \nabla(f) = (1, 1), which is never 0; however, we can
easily see by graphing that f(x, y) = x + y should have a maximum value on the set
x^2 + y^2 = 1, specifically at x = y = \frac{1}{\sqrt{2}}.

Theorem. So: how can we find these maxima and minima in general? The answer is the
method of Lagrange multipliers, which we outline here.

Suppose that f : \mathbb{R}^n \to \mathbb{R} is a function whose extremal values we
would like to find, given the constraint g(x) = c, for some constraining function
g(x). Then, we have the following result: if a is an extremal value of f restricted to
the set S = \{x : g(x) = c\}, then either \nabla(f)\big|_a is 0, or it doesn't exist,
or there is some constant \lambda such that

    \nabla(f)\big|_a = \lambda \nabla(g)\big|_a.

^2 Recall from Math 1a that a matrix is positive-definite if and only if all of its
eigenvalues are real and positive; similarly, a matrix is negative-definite if and
only if all of its eigenvalues are real and negative. If some of a matrix's
eigenvalues are 0, or some are negative and others are positive, then the matrix is
neither positive-definite nor negative-definite. Note also that because the Hessian is
symmetric whenever the mixed partials of our function are equal, and symmetric
matrices have only real eigenvalues, you really should never get complex-valued
eigenvalues.

Theorem. We have a pair of rather useful theorems about the divergence and curl of
functions, which we state here:

• For any C^2 function F, div(curl(F)) is always 0.

• For any C^2 function F, curl(grad(F)) is always 0.
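Both identities are easy to spot-check numerically. The sketch below (my addition; the field F and the potential f are arbitrary sample choices) approximates every partial derivative by a central difference:

```python
import math

# Spot-check div(curl(F)) = 0 and curl(grad(f)) = 0 at a sample point, for
# F(x,y,z) = (sin(yz), x z^2, x^2 y) and f(x,y,z) = x y z.
H = 1e-4

def d(func, p, i):
    hi, lo = list(p), list(p)
    hi[i] += H
    lo[i] -= H
    return (func(hi) - func(lo)) / (2 * H)

def curl(F, p):
    return [d(F[2], p, 1) - d(F[1], p, 2),
            d(F[0], p, 2) - d(F[2], p, 0),
            d(F[1], p, 0) - d(F[0], p, 1)]

F = [lambda p: math.sin(p[1] * p[2]),
     lambda p: p[0] * p[2] ** 2,
     lambda p: p[0] ** 2 * p[1]]
f = lambda p: p[0] * p[1] * p[2]
point = [0.3, -0.7, 1.1]

div_curl = sum(d(lambda q, i=i: curl(F, q)[i], point, i) for i in range(3))
grad_f = [lambda q, j=j: d(f, q, j) for j in range(3)]
curl_grad = curl(grad_f, point)

print(abs(div_curl) < 1e-5, max(abs(c) for c in curl_grad) < 1e-5)
```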

2 Worked Examples
Example. (Lagrange multipliers; level curves.) Consider the function

    g(x, y) = e^{-x^2-y^2} - x^2y^2.

(a) Draw several level curves of this function.

(b) Let f(x, y) = x + y, and let S be the constraint set given by the level curve
    \{(x, y) : g(x, y) = c\}. For what values of c does f\big|_S have a global
    maximum? For what values does it fail to have a global maximum: i.e. for what
    values of c is f unbounded on S?

(c) For c = \frac{1}{4}, find the global maximum of f on the above constraint set
    S = \{(x, y) : g(x, y) = c\}.

Solution. We graph g(x, y) = z in red, along with three level curves in different
shades of blue, in the following picture.

Roughly speaking, there are three kinds of level curves for our function:

1. Level curves g(x, y) = c, where c is close to 1. There, because we need g to be
   close to 1, we need to have x and y very small (so that the e^{-x^2-y^2} part is as
   close to 1 as we can get it, and the x^2y^2 part is not too large). In particular,
   this forces us to have a roughly circular shape, as for very small values of (x, y)
   the x^2y^2 part is insignificant and our function looks roughly like e^{-x^2-y^2},
   which is roughly 1 - x^2 - y^2 (via Taylor series) for small values of (x, y).

2. Level curves g(x, y) = c, where c is greater than 0, but not by much. For these
   values of c, we wind up having kind of a "four-armed" shape, with arms stretching
   out along the x- and y-axes. This is because when one of our coordinates is nearly
   zero, the other can become much larger (because our function is roughly
   e^{-x^2-y^2} then), whereas when the coordinates are roughly the same, the dominant
   term is now the x^2y^2 term, and we need to have both x and y be much smaller.

3. Level curves g(x, y) = c, where c \le 0. In these cases, our level curves look
   like hyperbola-style curves, one in each quadrant. This is because on each axis,
   our function g(x, y) can never be 0, as the e^{-x^2-y^2} part is always positive
   and the x^2y^2 part is zero on the axes.
This graphing and subsequent analysis suggests an answer to part (b), as well:

Claim. Our function f(x, y) has a global maximum on the curve g(x, y) = c if and only
if 1 \ge c > 0.

Proof. If c > 1, then there are no points (x, y) such that g(x, y) = c, because
e^{-x^2-y^2} is bounded above by e^0 = 1, while -x^2y^2 is bounded above by 0.
So: suppose that 1 \ge c > 0. Then, if (x, y) are such that g(x, y) = c, we know that
in particular

    e^{-x^2-y^2} \ge c
    \Rightarrow -x^2 - y^2 \ge \ln(c)
    \Rightarrow x^2 + y^2 \le -\ln(c)
    \Rightarrow \sqrt{x^2 + y^2} \le \sqrt{-\ln(c)}
    \Rightarrow \|(x, y)\| \le \sqrt{-\ln(c)},

i.e. the point (x, y) can be no further than \sqrt{-\ln(c)} from the origin. (Because
1 \ge c > 0, we know that 0 \le -\ln(c) < \infty, and therefore that this is a
well-defined, finite, real-valued bound on distances.)
Therefore, the set of points such that g(x, y) = c is bounded. We also know that it is
closed, because it is the level curve of a continuous function. Therefore, we know that any
continuous function (in particular, f ) will attain its global maxima and minima on this set,
and do so at the critical points identified by the method of Lagrange multipliers.
Finally, suppose that c \le 0. In this case, our claim is that f does not attain its
global maximum on g(x, y) = c. To prove this, pick any value of n: we want to find a
point (x, y) on our curve such that f(x, y) > n.
To do this, we simply use the intermediate value theorem. Pick any n, and choose x
such that x^2 > 1 - c, and also x > n. Then, we know that

    g(x, 0) = e^{-x^2-0} - x^2 \cdot 0 = e^{-x^2} > 0 \ge c,

while

    g(x, 1) = e^{-x^2-1} - x^2 \cdot 1 < 1 - x^2 < 1 - (1 - c) = c,

because e^{-x^2-1} < 1.
Therefore, because g(x, 0) > c and g(x, 1) < c, by the intermediate value theorem,
there is some value of y between 0 and 1 such that g(x, y) = c. At this point (x, y),
we know that f(x, y) = x + y \ge n + 0 = n, which is what we wanted to prove: i.e.
we've shown that we can find points on our curve along which f(x, y) is arbitrarily
large, and therefore that there is no global maximum.

Finally, with this theoretical discussion out of the way, we can turn to the
calculational part of (c), which asks us to find the global maximum of our function f
on the constraint set g(x, y) = \frac{1}{4}. First, note that by our above discussion,
we know that a global maximum does exist, because when 1 \ge c > 0 we've shown that
our constraint set is closed and bounded. Furthermore, to find this maximum, it
suffices to use the method of Lagrange multipliers to find all of the critical points
of our function restricted to this curve, and simply select the largest value amongst
these critical points. (Again, this is because g(x, y) = c is closed and bounded,
which means that our global maximum must occur at a critical point.)
So: we calculate. We are looking for any points (x, y) such that either

• \nabla(f) or \nabla(g) are 0,

• \nabla(f) or \nabla(g) are undefined, or

• there is some nonzero constant \lambda such that \nabla(f) = \lambda\nabla(g).
Because

    \nabla(f)(x, y) = (1, 1),

we can immediately see that \nabla(f) is never undefined or zero.
Similarly, because

    \nabla(g) = \left( -2xe^{-x^2-y^2} - 2xy^2,\ -2ye^{-x^2-y^2} - 2yx^2 \right),

we can see that the first component of \nabla(g) is zero if and only if

    0 = -2xe^{-x^2-y^2} - 2xy^2
    \Leftrightarrow 0 = -2x\left( e^{-x^2-y^2} + y^2 \right)
    \Leftrightarrow 0 = x, \quad \text{because } e^{-x^2-y^2} + y^2 \text{ is strictly positive.}

Similarly, we can see that the second component of \nabla(g) is zero if and only if

    0 = -2ye^{-x^2-y^2} - 2yx^2
    \Leftrightarrow 0 = -2y\left( e^{-x^2-y^2} + x^2 \right)
    \Leftrightarrow 0 = y, \quad \text{because } e^{-x^2-y^2} + x^2 \text{ is strictly positive.}

So \nabla(g) is always defined and is only zero at (0, 0), which is not a point on our
curve g(x, y) = \frac{1}{4}. Therefore, the only points we're concerned with are ones
at which \nabla(f) = \lambda\nabla(g); i.e. points such that

    \nabla(f) = (1, 1) = \lambda\nabla(g) = \lambda\left( -2xe^{-x^2-y^2} - 2xy^2,\ -2ye^{-x^2-y^2} - 2yx^2 \right)
    \Leftrightarrow -2xe^{-x^2-y^2} - 2xy^2 = -2ye^{-x^2-y^2} - 2yx^2,

because the above equation is equivalent to forcing both the left and right
coordinates of \lambda\nabla(g) to equal the same quantity (namely, 1).
Solving, we can see that this is equivalent to

    0 = -2xe^{-x^2-y^2} - 2xy^2 + 2ye^{-x^2-y^2} + 2yx^2
    \Leftrightarrow -2(x - y)e^{-x^2-y^2} + 2xy(x - y) = 0.

If x - y = 0, i.e. x = y, this equation holds. Otherwise, we can divide through by
2(x - y), and get

    e^{-x^2-y^2} = xy.

Plugging this into our constraint equation g(x, y) = \frac{1}{4} gives us

    e^{-x^2-y^2} - (xy)^2 = \frac{1}{4} \Rightarrow (xy) - (xy)^2 = \frac{1}{4} \Rightarrow xy = \frac{1}{2},

by thinking of "xy" as one term and using the quadratic formula. But, if we think
about what this means for the equation e^{-x^2-y^2} = xy, and specifically use
y = \frac{1}{2x}, we have

    \frac{1}{2} = xy = e^{-x^2-y^2} = e^{-x^2 - \frac{1}{4x^2}}.

This is impossible! In specific, by taking a single-variable derivative, you can
easily see that the largest value of -x^2 - \frac{1}{4x^2} happens at
x = \frac{1}{\sqrt{2}}, at which this is -1. This means that the largest that
e^{-x^2 - \frac{1}{4x^2}} gets is e^{-1} = \frac{1}{e}, which is smaller than
\frac{1}{2}.
Therefore, the only points at which \nabla(f) = \lambda\nabla(g) are those at which
x = y. Plugging this into our constraint g(x, y) = \frac{1}{4} yields

    e^{-2x^2} - x^4 = \frac{1}{4} \Rightarrow x \approx \pm .65.

The function f(x, y) = x + y is equal to 1.3 at the point (.65, .65) and is equal to
-1.3 at (-.65, -.65). Therefore, by our discussion earlier about how f must attain its
global minima and maxima at the critical points discovered by the Lagrange multiplier
process, we can safely conclude that (.65, .65) is roughly the point at which f(x, y)
attains its global maximum, which is roughly 1.3.
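The numerical value quoted above can be confirmed by bisection (my addition, not part of the notes):

```python
import math

# Solve e^{-2x^2} - x^4 = 1/4 for the positive root by bisection; the residual
# is strictly decreasing in x on [0.5, 1], positive at 0.5 and negative at 1.
def residual(x):
    return math.exp(-2 * x * x) - x ** 4 - 0.25

lo, hi = 0.5, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if residual(mid) > 0:
        lo = mid
    else:
        hi = mid
root = (lo + hi) / 2
print(round(root, 2))  # prints 0.65
```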

Example. (Tangent planes.) Let S be the surface in \mathbb{R}^3 formed by the
collection of all points (x, y, z) such that e^{xyz} = e. Find the tangent plane to S
at (1, 1, 1).
Solution. One way to attack this problem is to apply natural logs to both sides, which
lets us write S as the collection of all points (x, y, z) such that xyz = 1; i.e. all
points x, y \ne 0 such that z = \frac{1}{xy}. In other words, we can write S as the
graph of the function f(x, y) = \frac{1}{xy}.
We know that the gradient of f(x, y) is just

    \left( -\frac{y}{(xy)^2},\ -\frac{x}{(xy)^2} \right),

which at (1, 1) is just (-1, -1). Therefore, by using the formula for describing the
first-order Taylor approximation – i.e. tangent plane – of functions of the form
f(x, y) = z, we have that the tangent plane to our surface at (1, 1, 1) is just

    (z - 1) = \nabla(f)\big|_{(1,1)} \cdot (x - 1, y - 1) = (-1, -1) \cdot (x - 1, y - 1)
    \Rightarrow z - 1 + x - 1 + y - 1 = 0.

Alternately, we also discussed a second formula in class for finding tangent planes to
surfaces of the form g(x, y, z) = C, at some point (a, b, c). Specifically, we
observed that the gradient of g at the point (a, b, c) was orthogonal to the tangent
plane to our surface at this point: in other words, that we could define our tangent
plane as just the set of all vectors orthogonal to the gradient of g through this
point. As a formula, this was

    0 = \nabla(g)\big|_{(1,1,1)} \cdot (x - 1, y - 1, z - 1)
    \Leftrightarrow 0 = \left( yze^{xyz}, xze^{xyz}, xye^{xyz} \right)\big|_{(1,1,1)} \cdot (x - 1, y - 1, z - 1)
    \Leftrightarrow 0 = (e, e, e) \cdot (x - 1, y - 1, z - 1)
    \Leftrightarrow 0 = z - 1 + x - 1 + y - 1.

Reassuringly, we get the same answer no matter which method we pick.
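As a quick check (my addition, not part of the notes), points that lie exactly on S near (1, 1, 1) should nearly satisfy the plane equation x + y + z = 3, with the mismatch growing as we move away from the point of tangency:

```python
# On the surface e^{xyz} = e we have xyz = 1, so z = 1/(xy); measure how far
# such surface points are from the tangent plane x + y + z - 3 = 0.
def plane_residual(x, y):
    z = 1.0 / (x * y)            # this point is exactly on the surface
    return abs(x + y + z - 3.0)

near = plane_residual(1.01, 0.99)
far = plane_residual(1.2, 0.9)
print(near < 1e-3 < far)  # True: the plane hugs the surface only near (1, 1, 1)
```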

Example. (Chain rule.) Let g : R⁴ → R be defined by the equation g(w, x, y, z) = wz - yx, and h : R² → R⁴ be defined by the equation h(a, b) = (a, a, b, b).

(a) Calculate the derivative of g ∘ h using the chain rule.

(b) Geometrically, explain why your answer in (a) is "obvious," in some sense.

Solution. So, we know that both g and h are continuously differentiable on all of their domains; therefore, their composition is continuously differentiable everywhere, and the total derivative of g ∘ h is just given by the matrix of partial derivatives of g ∘ h: i.e. T(g ∘ h) = D(g ∘ h). Therefore, we can use the chain rule:

    D(g ∘ h)|_(a,b) = D(g)|_h(a,b) · D(h)|_(a,b)

                                               [ 1 0 ]
                    = [ z  -y  -x  w ]|       [ 1 0 ]
                                h(a,b)  ·      [ 0 1 ]
                                               [ 0 1 ]

                                               [ 1 0 ]
                    = [ b  -b  -a  a ]  ·      [ 1 0 ]
                                               [ 0 1 ]
                                               [ 0 1 ]

                    = [ b - b,  -a + a ]
                    = [ 0, 0 ].

Notice that this is geometrically somewhat obvious, because g is just the determinant of the matrix

    [ w x ]
    [ y z ],

while the function h just outputs (the entries of) the rank-1 matrix

    [ a a ]
    [ b b ].

Because the determinant of a rank-1 matrix is 0, we have that g ∘ h is identically 0, and therefore also has derivative 0.
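The chain-rule bookkeeping above is exactly a product of Jacobians, so it is easy to check mechanically. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): D(g∘h) should be the 1x2
# zero matrix, computed as the product of the two Jacobians.
import sympy as sp

a, b, w, x, y, z = sp.symbols("a b w x y z")

g = w * z - y * x                      # determinant of [[w, x], [y, z]]
h = sp.Matrix([a, a, b, b])            # h(a, b) = (a, a, b, b)

Dg = sp.Matrix([[sp.diff(g, v) for v in (w, x, y, z)]])   # 1x4 Jacobian of g
Dh = h.jacobian([a, b])                                   # 4x2 Jacobian of h

# Chain rule: evaluate Dg at h(a, b), then multiply.
Dg_at_h = Dg.subs({w: a, x: a, y: b, z: b})
D_comp = sp.simplify(Dg_at_h * Dh)
print(D_comp)   # Matrix([[0, 0]])
```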

Example. (Taylor series; directional derivatives.) Let g(x, y) = sin(xy).

(a) Calculate the directional derivative of g(x, y) at (1, 2) in the direction (3, 4).

(b) Calculate the second-order Taylor approximation of g(x, y) at (0, 0).

Solution. Because the gradient of g is just

    ∇(g) = ( y cos(xy), x cos(xy) ),

we know that the directional derivative at (1, 2) in the direction (3, 4) is just given to us by the dot product of ∇(g)(1, 2) with the unit-length vector in the direction (3, 4), given by (3, 4)/||(3, 4)|| = (3, 4)/√(9 + 16) = (3/5, 4/5):

    ∇(g)(1, 2) · (3/5, 4/5) = ( 2 cos(2), cos(2) ) · (3/5, 4/5) = (6 cos(2) + 4 cos(2))/5 = 2 cos(2).

To calculate the Taylor approximation of g at (0, 0), we just need to construct the following function:

    T₂(g)|_(0,0) (h₁, h₂) = g(0, 0) + ∇(g)|_(0,0) · (h₁, h₂) + H(g)|_(0,0) (h₁, h₂).

To do this, simply note that the Hessian term H(g) of g is just

    H(g)|_(0,0) (h₁, h₂) = (1/2) [h₁ h₂] · [ -y² sin(xy)            cos(xy) - xy sin(xy) ]       · [h₁]
                                           [ cos(xy) - xy sin(xy)   -x² sin(xy)          ]|_(0,0)  [h₂]

                         = (1/2) [h₁ h₂] · [ 0 1 ] · [h₁]
                                           [ 1 0 ]   [h₂]

                         = (1/2) [h₁ h₂] · [h₂]
                                           [h₁]

                         = (1/2)(h₁h₂ + h₂h₁)
                         = h₁h₂,

and therefore that

    T₂(g)|_(0,0) (h₁, h₂) = sin(0) + ( 0·cos(0), 0·cos(0) ) · (h₁, h₂) + h₁h₂ = h₁h₂.

Therefore, the second-order approximation to sin(xy) at the origin is just T₂(x, y) = xy.
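Both parts can be re-derived symbolically. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the directional derivative is
# 2*cos(2) and the second-order Taylor polynomial of sin(xy) at (0, 0) is xy.
import sympy as sp

x, y = sp.symbols("x y")
g = sp.sin(x * y)

# (a) Directional derivative at (1, 2) in the direction (3, 4).
grad = sp.Matrix([sp.diff(g, x), sp.diff(g, y)])
direc = sp.simplify(grad.subs({x: 1, y: 2}).dot(sp.Matrix([3, 4]) / 5))
print(direc)    # 2*cos(2)

# (b) Second-order Taylor polynomial at (0, 0): constant + gradient + Hessian term.
vec = sp.Matrix([x, y])
grad0 = grad.subs({x: 0, y: 0})
H0 = sp.hessian(g, (x, y)).subs({x: 0, y: 0})
T2 = sp.expand(g.subs({x: 0, y: 0}) + grad0.dot(vec)
               + sp.Rational(1, 2) * (vec.T * H0 * vec)[0, 0])
print(T2)       # x*y
```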

Example. (Using derivatives to study local extrema.) Let

    f(x, y) = -(x⁸ + y⁸) + 4(x⁶ + y⁶) - 4(x⁴ + y⁴).

Find all of the critical points of f, and classify them as local maxima, minima, or saddle points.

Solution. We start by graphing our function. Roughly speaking, it looks like we have four global maxima, at least four saddle points between these maxima, and probably a bunch of weird things going on in the interior part of our function that are hard to determine from the picture; probably a local minimum somewhere in there.
Picture aside, our task here is pretty immediate:
Picture aside, our task here is pretty immediate:

1. First, we want to calculate ∇(f), and find all of the points where it is either undefined or 0. These are our critical points.

2. We then want to calculate H(f ), the Hessian of f , for each critical point. If the
Hessian is positive-definite3 , then we know that this point is a local minimum;
if it is negative-definite, then it’s a local maximum; if it has both a positive
eigenvalue and a negative eigenvalue, it’s a saddle point; and if it’s anything
else, we have no idea what’s going on, and will need to explore its behavior using
other methods.

So: by calculating, we can see that

    D(f) = ( -8x⁷ + 24x⁵ - 16x³, -8y⁷ + 24y⁵ - 16y³ ),

and therefore that the first coordinate is equal to 0 whenever

    0 = -8x⁷ + 24x⁵ - 16x³
    ⇔ x = 0, or 0 = -8x⁴ + 24x² - 16
    ⇔ x = 0, or 0 = (x² - 2)(x² - 1)     (dividing by -8)
    ⇔ x = 0, ±√2, ±1,

and similarly the second coordinate is 0 whenever

    0 = -8y⁷ + 24y⁵ - 16y³
    ⇔ y = 0, ±√2, ±1.

So we have twenty-five critical points, consisting of five choices of x and five choices of y. To classify these points, we look at the matrix of second-order partials formed in the Hessian:

    [ ∂²f/∂x∂x   ∂²f/∂x∂y ]   [ -56x⁶ + 120x⁴ - 48x²               0          ]
    [ ∂²f/∂y∂x   ∂²f/∂y∂y ] = [           0             -56y⁶ + 120y⁴ - 48y² ].

³We say that the Hessian is positive-definite if the associated n × n matrix [∂²f/∂xᵢ∂xⱼ(a)] of second partial derivatives is positive-definite: i.e. it has n eigenvalues and they're all strictly positive. Negative-definite is similar, except we ask that all of the eigenvalues exist and are strictly negative.
When x = ±1, the polynomial -56x⁶ + 120x⁴ - 48x² equals 16, which is positive; when x = ±√2, this polynomial equals -64, which is negative; finally, when x = 0 this polynomial is 0. Therefore, at the points

    (±1, ±1)

the Hessian is positive-definite, and therefore our function has a local minimum, while at the points

    (±√2, ±√2)

the Hessian is negative-definite, and therefore our function has a local maximum, while at

    (±√2, ±1), (±1, ±√2),

the Hessian has both a negative and a positive eigenvalue (try (1, 0) and (0, 1) for two eigenvectors!), and therefore our function has a saddle point.
This leaves just the points with a zero coordinate, at which the Hessian is useless to us. There, we need to analyze how small changes in our function

    f(x, y) = -(x⁸ + y⁸) + 4(x⁶ + y⁶) - 4(x⁴ + y⁴)

change its values at such points!

So: for very small values of x, y, we know that x⁴ ≫ x⁶, x⁸ and y⁴ ≫ y⁶, y⁸; therefore, very close to the origin, our function is roughly just -4(x⁴ + y⁴), an upside-down quartic bump with a maximum at the origin. Therefore, we can see that this point is actually a local maximum, because (using our approximation) at all values very close to the origin that are not the origin, our function is roughly -4(x⁴ + y⁴) and therefore quite decidedly < 0, its value at the origin. So (0, 0) is a local maximum!
For the other values, we can do a similar (but more in-depth) analysis. For convenience's sake, let g(z) = -z⁸ + 4z⁶ - 4z⁴; we can then write f(x, y) = g(x) + g(y). By the same logic as above, for arbitrarily small values of z we can write g(z) as approximately -4z⁴, as z⁴ ≫ z⁶, z⁸ and thus the z⁴ term dominates the function g(z).
In general, we can extend our observation above to an approximation of (z + ε)ⁿ for any constant z, power n, and very small ε, by using the binomial theorem:

    (z + ε)ⁿ = Σ_{k=0}^{n} C(n, k) z^{n-k} ε^k
             = zⁿ + nεz^{n-1} + (n(n-1)/2) z^{n-2} ε² + (terms scaled by ε³ and higher)
             ≈ zⁿ + nεz^{n-1} + (n(n-1)/2) z^{n-2} ε²

    ⇒ g(z + ε) = -(z + ε)⁸ + 4(z + ε)⁶ - 4(z + ε)⁴
               ≈ g(z) + (-8z⁷ + 24z⁵ - 16z³) ε + (-28z⁶ + 60z⁴ - 24z²) ε²
    ⇒ g(z + ε) - g(z) ≈ (-8z⁷ + 24z⁵ - 16z³) ε + (-28z⁶ + 60z⁴ - 24z²) ε².

(The ε² coefficient here is (1/2)g″(z), i.e. half of the Hessian entry we computed earlier.)

This is kind of horrible-looking, but we can work with it. In particular, it tells us that at z = ±√2, we have

    g(±√2 + ε) - g(±√2) ≈ ( -8(±√2)⁷ + 24(±√2)⁵ - 16(±√2)³ ) ε + ( -28·8 + 60·4 - 24·2 ) ε²
                        = 0 + (-224 + 240 - 48) ε²
                        = -32 ε²,

and at z = ±1 we have

    g(±1 + ε) - g(±1) ≈ ( -8(±1)⁷ + 24(±1)⁵ - 16(±1)³ ) ε + ( -28 + 60 - 24 ) ε²
                      = 0 + 8 ε².

(Note that we used our earlier observation that ±1, ±√2 are roots of -8z⁷ + 24z⁵ - 16z³ to simplify the first parenthetical expression to 0.) In other words, small changes of g near 0 or ±√2 yield decreases in our function, while small changes near ±1 yield increases!
This lets us classify our remaining points: we can now see that the points (0, ±√2), (±√2, 0) are local maxima, and that the points (0, ±1) and (±1, 0) are saddle points. Success!
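The whole classification can be double-checked by computing eigenvalue signs of the Hessian at each critical point. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): classify critical points of f
# by the signs of the Hessian's eigenvalues.
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = -(x**8 + y**8) + 4*(x**6 + y**6) - 4*(x**4 + y**4)
H = sp.hessian(f, (x, y))

def hessian_signs(px, py):
    """Signs of the Hessian's eigenvalues at (px, py), listed with multiplicity."""
    evs = H.subs({x: px, y: py}).eigenvals()
    return sorted(int(sp.sign(ev)) for ev, mult in evs.items() for _ in range(mult))

r2 = sp.sqrt(2)
print(hessian_signs(1, 1))      # [1, 1]   -> local minimum
print(hessian_signs(r2, r2))    # [-1, -1] -> local maximum
print(hessian_signs(r2, 1))     # [-1, 1]  -> saddle point
print(hessian_signs(0, 0))      # [0, 0]   -> degenerate; the Hessian is useless here
```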

Example. Take the vector field V(x, y) = (x²y², x² + y²). Show that this vector field is neither the curl nor the gradient of any function.

Proof. This is relatively straightforward. To show that V is not the gradient of any function, we simply need to calculate the curl of V. If it is nonzero, then we know that V cannot be a gradient.
Because V is a vector field on R², in order to calculate its curl we treat it like a vector field on R³ that has a 0 in its third component and does not depend on z. Then,

    curl(V) = ( ∂V₃/∂y - ∂V₂/∂z, ∂V₁/∂z - ∂V₃/∂x, ∂V₂/∂x - ∂V₁/∂y )
            = ( 0 - 0, 0 - 0, 2x - 2x²y ),

which is not identically equal to 0.

Similarly, we can show that V is not a curl by calculating its divergence: if this is nonzero, then V cannot be written as the curl of any vector field. We do this here:

    div(V) = ∂V₁/∂x + ∂V₂/∂y = 2xy² + 2y,

which is clearly not identically zero.
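Both obstructions are one-line symbolic computations. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): V = (x^2 y^2, x^2 + y^2) has
# nonzero curl (so it is not a gradient) and nonzero divergence (so it is not a curl).
import sympy as sp

x, y, z = sp.symbols("x y z")
V1, V2, V3 = x**2 * y**2, x**2 + y**2, sp.Integer(0)   # embed V in R^3 with V3 = 0

curl = sp.Matrix([
    sp.diff(V3, y) - sp.diff(V2, z),
    sp.diff(V1, z) - sp.diff(V3, x),
    sp.diff(V2, x) - sp.diff(V1, y),
])
div = sp.diff(V1, x) + sp.diff(V2, y)

print(curl.T)   # only the third component is nonzero: 2x - 2x^2 y
print(div)      # 2xy^2 + 2y
```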

Math 94 Professor: Padraic Bartlett

Lecture 4: Integrals in R^n
Week 4 UCSB 2015

This is the fourth week of the Mathematics Subject Test GRE prep course; here, we review the concepts of integrals in higher dimensions!

1 Definitions and Concepts


We start by reviewing the definitions/theorems we have for the integrals of functions on R^n:

1. Types of integrals. You’ve (in theory) learned how to take several kinds of integrals
in undergrad:

• "Normal" integrals. Given a region R ⊂ R^n, we know how to take the integral of any function F : R^n → R^m over such a region by taking iterated integrals. For example, if R is some sort of n-dimensional box [a₁, b₁] × ... × [aₙ, bₙ], we can write ∫∫_R F dV as the iterated integral

      ∫_{a₁}^{b₁} ... ∫_{aₙ}^{bₙ} F dxₙ ... dx₁.

Part of being able to do these integrals is the ability to describe a region R via sets of nested parameters. For example, if R is the upper-right quadrant of the unit disk

      R = {(x, y) : x² + y² ≤ 1, 0 ≤ x, 0 ≤ y},

you should be able to describe R as the set of all points such that

      x ∈ [0, 1], y ∈ [0, √(1 - x²)],

and therefore notice that we can express

      ∫∫_R f(x, y) dy dx = ∫₀¹ ∫₀^{√(1-x²)} f(x, y) dy dx,

for some function f. Be able to do this "nested parameter" thing over most kinds of regions: usually, the way you do this is by picking one variable and determining its maximum range, then (for some fixed value of that first variable) picking a second variable and determining its maximum range depending on the first variable, and so on and so forth.

• Line integrals. Given a parametrized curve γ : [a, b] → R^n, we can find the integral of either a vector field F : R^n → R^n or a scalar field f : R^n → R along this curve. Specifically, we can express these integrals as the following:

      ∫_γ F · dγ = ∫_a^b F(γ(t)) · γ′(t) dt,   and
      ∫_γ f dγ = ∫_a^b f(γ(t)) ||γ′(t)|| dt.

• Surface integrals. Given a parametrized surface S with parametrization T : R → S, R ⊆ R², we can find the integral of any scalar field f : R³ → R over S, as well as the integral of any vector field F : R³ → R³ over S. Specifically, we can express the integral of f over S as the following two-dimensional integral over R:

      ∫∫_S f dS = ∫∫_R f(T(u, v)) · ||T_u × T_v|| du dv.

  As well, recall that a unit normal vector to our surface, n, can be given by the formula

      n = (T_u × T_v)/||T_u × T_v||   or   (T_v × T_u)/||T_v × T_u||,

  up to the orientation of n: i.e. depending on whether we look at (T_u × T_v) or (T_v × T_u), we will get either n or -n. Choosing an orientation for our surface S is simply choosing which of these two choices of normal vectors we will make for our entire integral: whenever we ask you to integrate a vector field over a surface, we will tell you what orientation you should pick (i.e. by asking you to orient S so that "the normals point away from the origin," or something like that). Once you've fixed an orientation, say the T_u × T_v one, we define the integral of F over S as the following integral:

      ∫∫_S F · dS = ∫∫_S F · n dS = ∫∫_R F(T(u, v)) · (T_u × T_v)/||T_u × T_v|| · ||T_u × T_v|| du dv
                  = ∫∫_R F(T(u, v)) · (T_u × T_v) du dv.

The trickiest thing going on here is "how" you choose your parametrization. For finding a parametrization of a surface S, you can usually do one of the following two things:

  – Often, if you describe your surface S in cylindrical or spherical coördinates, you'll see that one of the coördinates you're describing your surface in is constant. For example, a spherical shell of radius 3 can be described in spherical coördinates as the set of all points (3, θ, φ), where θ ∈ [0, 2π], φ ∈ [0, π]. In this kind of situation, our parametrization is just using this coördinate system with the constant variable treated as a constant: i.e. for the spherical shell of radius 3, our parametrization is just

        T(θ, φ) = (3 cos(θ) sin(φ), 3 sin(θ) sin(φ), 3 cos(φ)),

    where θ ∈ [0, 2π], φ ∈ [0, π].

  – If this doesn't work out, the other tactic that's often useful is finding an equation that describes your surface, and solving for one of the variables in terms of the others. For example, suppose that we're looking at the surface S given by the upper sheet of the hyperboloid of two sheets between heights 1 and 2: i.e.

        S = {(x, y, z) : -x² - y² + z² = 1, z ∈ [1, 2]}.

    In this case, because z is positive, we can solve for z in terms of the other variables, and express S as

        S = {(x, y, z) : z = √(1 + x² + y²), z ∈ [1, 2]}.

    We can then use this to formulate a parametrization of S: simply let x and y range over the possible values that keep z between 1 and 2, and then set z = √(1 + x² + y²):

        T(x, y) = (x, y, √(1 + x² + y²)),   x ∈ [-√3, √3], y ∈ [-√(3 - x²), √(3 - x²)].

    You can of course combine these two approaches: for example, if we were to use cylindrical coördinates on our surface S above and replace x with r cos(θ), y with r sin(θ), we can see that we can easily express T instead as the map

        T(r, θ) = (r cos(θ), r sin(θ), √(1 + r²)),   r ∈ [0, √3], θ ∈ [0, 2π],

    which may be easier to work with.
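A cheap way to catch parametrization mistakes is to verify that sampled points actually satisfy the surface's defining equation. This block is an addition to the notes, a sketch assuming numpy is available:

```python
# Added sanity check (not in the original notes): both parametrizations above
# land on the hyperboloid -x^2 - y^2 + z^2 = 1 with z between 1 and 2.
import numpy as np

def T_graph(u, v):
    """Graph parametrization: (x, y, sqrt(1 + x^2 + y^2))."""
    return np.array([u, v, np.sqrt(1 + u**2 + v**2)])

def T_cyl(r, theta):
    """Cylindrical parametrization: (r cos t, r sin t, sqrt(1 + r^2))."""
    return np.array([r*np.cos(theta), r*np.sin(theta), np.sqrt(1 + r**2)])

rng = np.random.default_rng(0)
for _ in range(100):
    u, v = rng.uniform(-1, 1, size=2)              # stays inside the allowed disk
    r, th = rng.uniform(0, np.sqrt(3)), rng.uniform(0, 2*np.pi)
    for p in (T_graph(u, v), T_cyl(r, th)):
        assert abs(-p[0]**2 - p[1]**2 + p[2]**2 - 1) < 1e-12
        assert 1 <= p[2] <= 2 + 1e-12
print("both parametrizations lie on the surface")
```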

2. Tools for evaluating integrals. Throughout your undergraduate career, you've run into many integrals of the above kinds that were difficult or impossible to directly evaluate. Motivated by these problems, we developed a number of theorems and tools about integration, which we repeat here:

  • Green's theorem. There are a number of forms of Green's theorem; we state the simplest and most commonly used version here. Suppose that R is a region in R² with boundary ∂R given by the simple closed curve C, and suppose that γ is a traversal of C in the counterclockwise direction. Suppose as well that P and Q are a pair of C¹ functions from R² to R. Then, we have the following equality:

        ∫∫_R ( ∂Q/∂x - ∂P/∂y ) dx dy = ∫_γ ( P dx + Q dy ).

  • Stokes' theorem. Stokes' theorem, quite literally, is Green's theorem for surfaces in R³ (as opposed to restricting them to lying in the plane R²). Specifically, it is the following claim: suppose that S is a surface in R³ with boundary ∂S given by the simple closed curve C, suppose that n is a unit normal vector to S that gives S some sort of orientation, and suppose that γ is a traversal of C such that the interior of S always lies on the left of γ's forward direction, assuming that we're viewing the surface such that the normal vector n is pointing towards us. Suppose as well that F is a C¹ vector field from R³ to R³. Then, we have the following equality:

        ∫∫_S (∇ × F) · n dS = ∫_γ F · dγ.

    In general, you use Green's and Stokes's theorems whenever you have an integral of a function over an awful curve (and taking derivatives to work with your function over a region, which is what the curl does, will make things easier), or you have an integral of a curl-like function over an awful region (and working on the curve would make things easier.)
  • Divergence/Gauss's theorem. Let W be a region in R³ with boundary given by some surface S, let n be the outward-pointing (i.e. away from W) unit normal vector to S, and let F be a smooth vector field defined on W. Then

        ∫∫∫_W div(F) dV = ∫∫_{∂W} (F · n) dS.

    Again, use this like you would use Green's and Stokes's theorems.
  • Change of variables. A common tactic to make integrals easier is to apply the technique of change of variables, which allows us to describe regions in R^n using coördinate systems other than the standard Euclidean ones. In general, the change-of-variables theorem says the following:
    – Suppose that R is an open region in R^n, g is an injective C¹ map R^n → R^n on an open neighborhood of R, and that f is a continuous function on an open neighborhood of the region g(R). Then, we have

        ∫_{g(R)} f(x) dV = ∫_R f(g(x)) · |det(D(g(x)))| dV.

    Specifically, the three most common change-of-variable choices are transitions to the polar, cylindrical, and spherical coördinate systems, which we review here:
    – Polar coördinates. Suppose that R is a region in R² described in polar coördinates: i.e. there is some set A ⊆ [0, ∞) × [0, 2π) such that Φ(A) = R, where Φ is the polar coördinates map (r, θ) ↦ (r cos(θ), r sin(θ)). Then, for any integrable function f : R² → R, we have

        ∫∫_{Φ(A)} f(x, y) dA = ∫∫_A f(r cos(θ), r sin(θ)) · r dA.

    – Cylindrical coördinates. Suppose that R is a region in R³ described in cylindrical coördinates: i.e. there is some set A ⊆ [0, ∞) × [0, 2π) × (-∞, ∞) such that Φ(A) = R, where Φ is the cylindrical coördinates map (r, θ, z) ↦ (r cos(θ), r sin(θ), z). Then, for any integrable function f : R³ → R, we have

        ∫∫∫_{Φ(A)} f(x, y, z) dV = ∫∫∫_A f(r cos(θ), r sin(θ), z) · r dV.

    – Spherical coördinates. Suppose that R is a region in R³ described in spherical coördinates: i.e. there is some set A ⊆ [0, ∞) × [0, 2π) × [0, π) such that Φ(A) = R, where Φ is the spherical coördinates map (r, θ, φ) ↦ (r sin(φ) cos(θ), r sin(φ) sin(θ), r cos(φ)). Then, for any integrable function f : R³ → R, we have

        ∫∫∫_{Φ(A)} f(x, y, z) dV = ∫∫∫_A f(r sin(φ) cos(θ), r sin(φ) sin(θ), r cos(φ)) · r² sin(φ) dV.

One of the trickiest things to do with change of variables is deciding which coördinate system to use on a given set. For example, consider the following five shapes: a cone, a sphere cap, a torus, an ellipsoid, and an "ice-cream-cone" section of an ellipsoid.

To describe the cone, sphere cap, or torus, cylindrical coördinates are probably going to lead to the easiest calculations. Why is this? Well, all three of these shapes have a large degree of symmetry around their z-axis; therefore, we'd expect it to be relatively easy to describe these shapes as a collection of points (r, θ, z). However, these shapes do *not* have a large degree of rotational symmetry: in other words, if we were to attempt to describe them with the coördinates (r, θ, φ), we really wouldn't know where to begin with the φ coördinate.
However, for the ellipsoid and "ice-cream-cone" section of the ellipsoid, spherical coördinates are much more natural: in these cases, it's fairly easy to describe these sets as collections of points of the form (r, θ, φ).

In general, if you're uncertain which of the two to try, simply pick one and see how the integral goes! If you chose wisely, it should work out; otherwise, you can always just go back and try the other coördinate system.
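The Jacobian factors r and r² sin(φ) are easy to forget; one way to internalize them is to check that they recover familiar areas and volumes. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the polar factor r recovers
# the area of the unit disk, and the spherical factor r^2 sin(phi) recovers
# the volume of the unit ball.
import sympy as sp

r, theta, phi = sp.symbols("r theta phi", nonnegative=True)

# Area of the unit disk: integrate 1 * r over r in [0, 1], theta in [0, 2*pi].
disk_area = sp.integrate(r, (r, 0, 1), (theta, 0, 2*sp.pi))
print(disk_area)        # pi

# Volume of the unit ball: integrate 1 * r^2 sin(phi).
ball_volume = sp.integrate(r**2 * sp.sin(phi),
                           (r, 0, 1), (theta, 0, 2*sp.pi), (phi, 0, sp.pi))
print(ball_volume)      # 4*pi/3
```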

3. Applications of the integral. Finally, it bears noting that we've developed a few applications of the integral to finding volume, surface area, length, and centers of mass. We review these here:

  • Volume, surface area, and length. If you have a solid V, a surface S, or a curve C, you can find the volume/area/length of your object by integrating the function 1 over that object.
  • Area, via Green's theorem. If you have a region R ⊂ R² with boundary given by the counterclockwise-oriented curve γ, you can use Green's theorem to find its area as a line integral. Specifically, notice that if F(x, y) = (-y/2, x/2), we have ∂F₂/∂x - ∂F₁/∂y = 1, and therefore that Green's theorem says that

        ∫∫_R 1 dA = ∫_γ (-y/2, x/2) · dγ.

  • Center of mass. Suppose that an object A (a curve, surface, or solid) has density function δ(x). Then, the xᵢ-coördinate of its center of mass is given by the ratio

        ( ∫_A xᵢ δ(x) dA ) / ( ∫_A δ(x) dA ).
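As a quick check on the Green's-theorem area formula, one can confirm that it recovers πR² for a circle of radius R. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the line-integral area formula
# (1/2) * integral of (-y, x) . dc recovers pi*R^2 on a circle of radius R.
import sympy as sp

t, R = sp.symbols("t R", positive=True)
cx, cy = R*sp.cos(t), R*sp.sin(t)           # counterclockwise circle of radius R

integrand = sp.Rational(1, 2) * (-cy * sp.diff(cx, t) + cx * sp.diff(cy, t))
area = sp.simplify(sp.integrate(integrand, (t, 0, 2*sp.pi)))
print(area)     # pi * R**2
```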

To illustrate these concepts, we work some examples:

2 Example Problems
Question 1. Let S denote the cut-off paraboloid surface formed by the equations z + 1 = x² + y², z ≤ 0, oriented so that its normal vectors have positive z-component. Let F denote the vector field F(x, y, z) = (e^z y, e^{z²} x, e^{z³} z). Find the integral of ∇ × F over S.

Solution. First, we calculate ∇ × F:

    ∇ × F = ( ∂F₃/∂y - ∂F₂/∂z, ∂F₁/∂z - ∂F₃/∂x, ∂F₂/∂x - ∂F₁/∂y )
          = ( 0 - 2xz e^{z²}, e^z y - 0, e^{z²} - e^z ).

You could parametrize S and directly integrate this vector field over S. But this looks awful. Instead, what we can do is use Stokes' theorem! In particular, consider the surface D given by the unit disk x² + y² ≤ 1, z = 0. This surface has the same boundary as our surface S: specifically, ∂S = ∂D = {(x, y, 0) : x² + y² = 1}. Suppose we orient the unit disk with the normal (0, 0, 1), which is normal to the unit disk everywhere. Then these boundaries have the same orientation, if both boundaries are oriented positively with respect to their corresponding surfaces.
Therefore, we can use Stokes's theorem once to see that

    ∫∫_S (∇ × F) · dS = ∫_{∂S⁺} F · ds,

and we can use it again to see that

    ∫_{∂D⁺} F · ds = ∫∫_D (∇ × F) · dS.

Because ∂S = ∂D, these integrals are all the same! So, to calculate ∫∫_S (∇ × F) · dS, we can calculate ∫∫_D (∇ × F) · dS instead. We do this here. Notice that the unit normal n to the unit disk as a surface in R³ is simply (0, 0, 1); this saves us the effort of having to parametrize the disk, because

    ∫∫_D (∇ × F) · dS = ∫∫_D ( -2xz e^{z²}, e^z y, e^{z²} - e^z ) · (0, 0, 1) dS = 0,

as any parametrization of the disk will have zero z-coördinate, and thus our integrand is e^{z²} - e^z = e⁰ - e⁰ = 0!
Lots of set-up, but it makes our calculations trivial: we didn't even have to parametrize the unit disk! This is one of the cooler applications of Stokes's theorem: switching between different surfaces.

You can also use things like Stokes's and Green's theorem to switch integrals between different curves: it's a little weirder, but sometimes is really useful.
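The key computational fact in Question 1 (the curl's third component vanishes on the plane z = 0) can be verified symbolically. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the flux integrand
# (curl F) . (0, 0, 1) vanishes identically on the disk z = 0.
import sympy as sp

x, y, z = sp.symbols("x y z")
F = [sp.exp(z)*y, sp.exp(z**2)*x, sp.exp(z**3)*z]

curl = sp.Matrix([
    sp.diff(F[2], y) - sp.diff(F[1], z),
    sp.diff(F[0], z) - sp.diff(F[2], x),
    sp.diff(F[1], x) - sp.diff(F[0], y),
])
flux_integrand = curl.dot(sp.Matrix([0, 0, 1])).subs(z, 0)
print(curl.T)           # third entry: exp(z**2) - exp(z)
print(flux_integrand)   # 0
```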

Question 2. Take a pond whose outer perimeter is given by a circle of radius 4 and which contains 16π cubic centimeters of water. Drop a rock in the center of the pond. Assume that the rock's edges are roughly vertical, i.e. that we can model the boundary of the rock in the pond as some 2-d shape. After doing this, assume the water has height h in centimeters.
Suppose that there is an ant walking around the boundary of the rock. Suppose further that this ant is being blown on by a wind current, which imparts force on the ant corresponding to the vector field F(x, y) = (-y, x). In one walk of the ant around the boundary of the rock, how much energy does the wind impart on the ant? In other words, what is ∫_{γ₁} F · ds?

Solution. We draw the situation here: let γ₁ denote the perimeter of the rock, and γ₂ denote the perimeter of the pond. Let R denote the region between the outer curve and the inner curve. We want to calculate

    ∫_{γ₁} F · ds.

This is... hard, because, well, we don't actually know what γ₁ is. However, we can get around this with Green's theorem!
In particular: notice that Green's theorem says that the integral of (∂F₂/∂x - ∂F₁/∂y) over R is equal to the integral of F over the two boundary components γ₁, γ₂, provided that they're both oriented (as drawn) so that R is always on the left-hand side of each curve. In other words,

    ∫∫_R ( ∂F₂/∂x - ∂F₁/∂y ) dA = ∫_{γ₁} F · ds + ∫_{γ₂} F · ds.

So, we can solve for the integral we want to study, in terms of two other integrals:

    ∫_{γ₁} F · ds = ∫∫_R ( ∂F₂/∂x - ∂F₁/∂y ) dA - ∫_{γ₂} F · ds.

These are, surprisingly, things we can calculate. In specific, we have that

    ∫∫_R ( ∂F₂/∂x - ∂F₁/∂y ) dA = ∫∫_R (1 - (-1)) dA = ∫∫_R 2 dA = 2 · (surface area of R).

Because the pond started with 16π cubic centimeters of water and had height h after we dropped the rock in, we know that R has surface area 16π/h, and therefore that this integral is 32π/h.

As well, we can find ∫_{γ₂} F · ds. We parametrize γ₂ as γ₂(t) = (4 cos(t), 4 sin(t)):

    ∫_{γ₂} F · ds = ∫₀^{2π} (-4 sin(t), 4 cos(t)) · (-4 sin(t), 4 cos(t)) dt = ∫₀^{2π} 16 dt = 32π.

Therefore, we can combine these two integrals to calculate ∫_{γ₁} F · ds:

    ∫_{γ₁} F · ds = 32π/h - 32π = 32π ( 1/h - 1 ).

This is pretty cool: we know exactly how much work was done by this wind current, even though we have no idea what path we integrated over!
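The one concrete integral in this problem, the outer-boundary term, can be confirmed symbolically. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the line integral of
# F(x, y) = (-y, x) around the radius-4 circle is 32*pi.
import sympy as sp

t = sp.symbols("t")
gamma2 = sp.Matrix([4*sp.cos(t), 4*sp.sin(t)])    # pond boundary, counterclockwise
F = sp.Matrix([-gamma2[1], gamma2[0]])            # F = (-y, x) along the curve

line_integral = sp.integrate(F.dot(sp.diff(gamma2, t)), (t, 0, 2*sp.pi))
print(line_integral)    # 32*pi
```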

Question 3. Let T be a triangle with vertices (1, 0, 0), (0, 2, 0), (0, 0, 3). If this triangle is made out of some material with uniform density across its surface, what is the x-coördinate of the center of mass of T?

Solution. We want to find the x-coördinate of the center of mass of T. This is the "average" x-coördinate over our entire surface. Recall the following: if we want to find the average value of a function f on a surface T, we want to find the integrals ∫∫_T f dA and ∫∫_T 1 dA, and divide the first of these two integrals by the second: this gives us the average value of f over T.
So. We start by parametrizing our triangle. We do this by considering coördinates one-by-one. We first look at x: over our entire triangle, x ranges from 0 to 1.
We now look at the possible range of y-values, given x. We do this by projecting our triangle onto the xy-plane: there, this is the triangle with vertices (0, 0), (1, 0), (0, 2). Given any fixed value of x, we can see that y ranges from 0 to 2 - 2x.
Finally, we need to solve for z given x and y. To do this, we just need to find the plane this triangle lies in: this will give us an equation relating x, y and z. We do this by taking the generic equation for a plane

    ax + by + cz = d

and plugging the three points (1, 0, 0), (0, 2, 0), (0, 0, 3) into this equation:

    a = d,  b = d/2,  c = d/3.

This gives us that our plane has the equation

    dx + (d/2)y + (d/3)z = d,

which (if we divide by d) becomes

    x + y/2 + z/3 = 1.

Solving for z gives us

    z = 3 - 3x - 3y/2.

So, we can parametrize our triangle via the map T(x, y) = (x, y, 3 - 3x - 3y/2), where x ranges from 0 to 1 and (given x) y ranges from 0 to 2 - 2x.
So, if we want to find ∫∫_T 1 dA, we can just use this parametrization:

    ∫∫_T 1 dA = ∫₀¹ ∫₀^{2-2x} || ∂T/∂x × ∂T/∂y || dy dx
              = ∫₀¹ ∫₀^{2-2x} || (1, 0, -3) × (0, 1, -3/2) || dy dx
              = ∫₀¹ ∫₀^{2-2x} || (3, 3/2, 1) || dy dx
              = ∫₀¹ ∫₀^{2-2x} √(9 + 9/4 + 1) dy dx
              = ∫₀¹ ∫₀^{2-2x} (7/2) dy dx
              = ∫₀¹ (7 - 7x) dx
              = 7/2.

Similarly, if we want to find ∫∫_T x dA, we can do mostly the same thing:

    ∫∫_T x dA = ∫₀¹ ∫₀^{2-2x} x || ∂T/∂x × ∂T/∂y || dy dx
              = ∫₀¹ ∫₀^{2-2x} (7x/2) dy dx
              = ∫₀¹ (7x - 7x²) dx
              = 7/6.

Therefore, the x-coördinate of the center of mass is just the ratio of these two integrals, i.e. (7/6)/(7/2) = 1/3.
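Every step in this computation, including the cross product and both double integrals, is mechanical. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): area element, total "mass,"
# x-moment, and center of mass for the triangle in Question 3.
import sympy as sp

x, y = sp.symbols("x y")
Txy = sp.Matrix([x, y, 3 - 3*x - sp.Rational(3, 2)*y])   # parametrization of T

normal = sp.diff(Txy, x).cross(sp.diff(Txy, y))          # (3, 3/2, 1)
dA = normal.norm()                                       # 7/2

total = sp.integrate(dA, (y, 0, 2 - 2*x), (x, 0, 1))     # 7/2
moment = sp.integrate(x*dA, (y, 0, 2 - 2*x), (x, 0, 1))  # 7/6
print(moment / total)   # 1/3
```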

Question 4. Let T be the same triangle as in Question 3. Integrate the vector field F(x, y, z) = (xy, yz, zx) over the perimeter of this triangle, oriented in the counterclockwise direction as viewed from the positive octant.

Solution. We could parametrize the boundary of this triangle, but that seems hard. Instead, we will use Stokes's theorem, which says that

    ∫_{∂T} F · ds = ∫∫_T (∇ × F) · dS.

Using this, we can instead integrate ∇ × F over the triangle itself, because we already parametrized that! Convenient.
We do this here:

    ∫∫_T (∇ × F) · dS = ∫₀¹ ∫₀^{2-2x} (∇ × F) · ( ∂T/∂x × ∂T/∂y ) dy dx
        = ∫₀¹ ∫₀^{2-2x} ( ∂F₃/∂y - ∂F₂/∂z, ∂F₁/∂z - ∂F₃/∂x, ∂F₂/∂x - ∂F₁/∂y ) · ( ∂T/∂x × ∂T/∂y ) dy dx
        = ∫₀¹ ∫₀^{2-2x} ( 0 - y, 0 - z, 0 - x )|_{T(x,y)} · (3, 3/2, 1) dy dx
        = ∫₀¹ ∫₀^{2-2x} ( -3y - (3/2)(3 - 3x - 3y/2) - x ) dy dx
        = ∫₀¹ ∫₀^{2-2x} ( -9/2 - (3/4)y + (7/2)x ) dy dx
        = ∫₀¹ ( -9 + 9x - (3/8)(2 - 2x)² + 7x - 7x² ) dx
        = ∫₀¹ ( -(17/2)x² + 19x - 21/2 ) dx
        = -17/6 + 19/2 - 21/2 = -23/6.
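The double integral above is easy to get a sign wrong in, so it is worth checking. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the Stokes'-theorem integral
# in Question 4 evaluates to -23/6.
import sympy as sp

x, y = sp.symbols("x y")
z = 3 - 3*x - sp.Rational(3, 2)*y          # the plane containing the triangle

curlF = sp.Matrix([-y, -z, -x])            # curl of (xy, yz, zx), with z substituted
normal = sp.Matrix([3, sp.Rational(3, 2), 1])

integral = sp.integrate(curlF.dot(normal), (y, 0, 2 - 2*x), (x, 0, 1))
print(integral)     # -23/6
```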

Question 5. Directly calculate the integral of F(x, y, z) = (3x²y, -3xy², z) over the surface of the unit cube, using the outward-pointing orientation. Then, use the divergence theorem to calculate this in a much faster manner.

Solution. If we want to do this directly, break the unit cube into its six sides

    [0, 1] × [0, 1] × {0},   [0, 1] × [0, 1] × {1},
    [0, 1] × {0} × [0, 1],   [0, 1] × {1} × [0, 1],
    {0} × [0, 1] × [0, 1],   {1} × [0, 1] × [0, 1],

notice that the outward normals to these sides are precisely the normals (0, 0, ±1), (0, ±1, 0), (±1, 0, 0), and calculate

    ∫∫_{surface of cube} F · dS
        = ∫₀¹∫₀¹ F|_{(x,y,0)} · (0, 0, -1) dx dy + ∫₀¹∫₀¹ F|_{(x,y,1)} · (0, 0, 1) dx dy
          + ∫₀¹∫₀¹ F|_{(x,0,z)} · (0, -1, 0) dx dz + ∫₀¹∫₀¹ F|_{(x,1,z)} · (0, 1, 0) dx dz
          + ∫₀¹∫₀¹ F|_{(0,y,z)} · (-1, 0, 0) dy dz + ∫₀¹∫₀¹ F|_{(1,y,z)} · (1, 0, 0) dy dz
        = ∫₀¹∫₀¹ 0 dx dy + ∫₀¹∫₀¹ 1 dx dy + ∫₀¹∫₀¹ 0 dx dz + ∫₀¹∫₀¹ (-3x) dx dz
          + ∫₀¹∫₀¹ 0 dy dz + ∫₀¹∫₀¹ 3y dy dz
        = 0 + 1 + 0 - 3/2 + 0 + 3/2
        = 1.

Alternately, if you use the divergence theorem, we can calculate this in a much faster way:

    ∫∫_{surface of cube} F · dS = ∫∫∫_{cube} div(F) dV
                                = ∫₀¹∫₀¹∫₀¹ (6xy - 6xy + 1) dx dy dz
                                = ∫₀¹∫₀¹∫₀¹ 1 dx dy dz = 1.
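Both sides of the divergence theorem can be evaluated symbolically here. This block is an addition to the notes, a sketch assuming sympy is available:

```python
# Added sanity check (not in the original notes): the direct face-by-face flux
# and the divergence-theorem computation in Question 5 both give 1.
import sympy as sp

x, y, z = sp.symbols("x y z")
F = [3*x**2*y, -3*x*y**2, z]

# Divergence-theorem side: div F = 6xy - 6xy + 1 = 1, integrated over the cube.
divF = sum(sp.diff(Fi, v) for Fi, v in zip(F, (x, y, z)))
flux = sp.integrate(divF, (x, 0, 1), (y, 0, 1), (z, 0, 1))

# Direct side: only the z = 1, y = 1, and x = 1 faces contribute.
direct = (sp.integrate(F[2].subs(z, 1), (x, 0, 1), (y, 0, 1))       # top: +1
          + sp.integrate(F[1].subs(y, 1), (x, 0, 1), (z, 0, 1))     # y = 1: -3/2
          + sp.integrate(F[0].subs(x, 1), (y, 0, 1), (z, 0, 1)))    # x = 1: +3/2
print(divF, flux, direct)   # 1 1 1
```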

Question 6. Let c(t) = ( cos(t) - sin²(t)/2, cos(t) sin(t) ) denote the "fish curve" drawn below:

Find the area contained within this curve.

Solution. This looks like a textbook example of when to use the Green's theorem formula for the area contained in a curve. Specifically, Green's theorem, as applied to finding the area contained within a curve, says that if a region R is bounded by some simple closed curve c(t) that is oriented positively (i.e. so that R is on the left as we travel along c(t)), then

    area(R) = ∫∫_R 1 dx dy = (by Green's theorem) = (1/2) ∫_{c(t)} (-y, x) · dc.

If we just plug in our curve, we get that (1/2) ∫_{c(t)} (-y, x) · dc is

    (1/2) ∫₀^{2π} ( -cos(t) sin(t), cos(t) - sin²(t)/2 ) · ( -sin(t) - sin(t) cos(t), cos²(t) - sin²(t) ) dt
    = (1/2) ∫₀^{2π} ( cos(t) sin²(t) + cos²(t) sin²(t) + cos³(t) - cos(t) sin²(t) - cos²(t) sin²(t)/2 + sin⁴(t)/2 ) dt
    = (1/2) ∫₀^{2π} ( cos²(t) sin²(t)/2 + cos³(t) + sin⁴(t)/2 ) dt
    = (1/2) ∫₀^{2π} ( sin²(2t)/8 + cos(t)(1 - sin²(t)) + (1 - cos(2t))²/8 ) dt
    = (1/2) ∫₀^{2π} ( (1 - cos(4t))/16 + cos(t)(1 - sin²(t)) + (1 - 2 cos(2t) + cos²(2t))/8 ) dt
    = (1/2) ∫₀^{2π} ( (1 - cos(4t))/16 + cos(t)(1 - sin²(t)) + (1 - 2 cos(2t))/8 + (1 + cos(4t))/16 ) dt
    = (1/2) ∫₀^{2π} ( 1/4 - cos(2t)/4 + cos(t)(1 - sin²(t)) ) dt
    = (1/2) [ t/4 + sin(t) - sin³(t)/3 - sin(2t)/8 ]₀^{2π}
    = π/4.

But is this plausible? Well: looking at our fish curve, the head part alone seems to contain roughly the area of an ellipse running from x = -1/2 to x = 1 and from y = -1/2 to y = 1/2, which is about π · (3/4) · (1/2) = 3π/8 ≈ 1.2. This is noticeably greater than π/4 ≈ 0.8, the area of a circle with radius 1/2. So: something has gone wrong!
What, specifically? Well, to apply Green's theorem, we needed a simple closed curve that was positively oriented. Did we have that here? No! In fact, our curve c has a self-intersection: c(π/2) = c(3π/2), and in fact the tail part of our curve is oriented negatively (i.e. if we travel around our curve from π/2 to 3π/2, our region is on the right-hand side). In fact, we've calculated the area of the head minus the area of the tail!
To calculate what we want, we want to take the integral above evaluated from -π/2 to π/2 (the head) and then add the integral from 3π/2 to π/2 (travelling backwards here makes it so that we get the right orientation on the tail). Specifically, we have

    (1/2) [ t/4 + sin(t) - sin³(t)/3 - sin(2t)/8 ]_{-π/2}^{π/2}
        = (1/2) ( (π/8 + 1 - 1/3 + 0) - (-π/8 - 1 + 1/3 + 0) )
        = π/8 + 2/3,

while

    (1/2) [ t/4 + sin(t) - sin³(t)/3 - sin(2t)/8 ]_{3π/2}^{π/2}
        = (1/2) ( (π/8 + 1 - 1/3 + 0) - (3π/8 - 1 + 1/3 + 0) )
        = -π/8 + 2/3;

therefore, our total area is (π/8 + 2/3) + (-π/8 + 2/3) = 4/3.
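Both the naive answer and the corrected one can be confirmed numerically. This block is an addition to the notes, a sketch assuming mpmath is available:

```python
# Added sanity check (not in the original notes): the naive loop integral over
# the fish curve gives pi/4, while the orientation-corrected head + tail
# integrals give the true area 4/3.
from mpmath import mp, cos, sin, pi, quad

mp.dps = 30

def half_xdy_minus_ydx(t):
    """Green's-theorem area integrand (1/2)(x y' - y x') along the fish curve."""
    xt = cos(t) - sin(t)**2 / 2
    yt = cos(t) * sin(t)
    dx = -sin(t) - sin(t) * cos(t)
    dy = cos(t)**2 - sin(t)**2
    return (xt * dy - yt * dx) / 2

naive = quad(half_xdy_minus_ydx, [0, 2*pi])          # pi/4: head minus tail
head = quad(half_xdy_minus_ydx, [-pi/2, pi/2])       # pi/8 + 2/3
tail = -quad(half_xdy_minus_ydx, [pi/2, 3*pi/2])     # -pi/8 + 2/3 (reversed tail)
print(naive, head + tail)
```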

Math 94 Professor: Padraic Bartlett

Lecture 5: Linear Algebra


Week 5 UCSB 2015

This is the fifth week of the Mathematics Subject Test GRE prep course; here, we review
the field of linear algebra!

1 Definitions and Concepts


Unlike the calculus we’ve been studying earlier, linear algebra is a field much more focused
on its definitions than its applications. Accordingly, most of what you’ll be asked to recall
on a test are concepts and words, rather than any specific processes! We review as many
of these concepts as we can here, though I recommend skimming through a linear algebra
textbook for a more in-depth review.

1.1 Vector Spaces: Some Important Types


• Vector space: A vector space over a field F is a set V and a pair of operations + : V × V → V and · : F × V → V that are in a certain sense "well-behaved:" i.e. the addition operation is associative and commutative, there are additive identities and inverses, the addition and multiplication operations distribute over each other, the scalar multiplication is compatible with multiplication in F, and 1 is the multiplicative identity.¹ F will usually be R on the GRE.
  – Examples: R^n, C^n, Q^n, the collection of all polynomials with coefficients from some field, the collection of all n × n matrices with entries from some field.
• Subspace: A subset S of a vector space V over a field F is called a subspace if it satisfies the following two properties: (1) for any x, y ∈ S and a, b ∈ F, we have that ax + by is also an element of S, and (2) S is nonempty.
• Span: For a set S of vectors inside of some vector space V, the span of S is the subspace formed by taking all of the possible linear combinations of elements of S.
• Row space: For an n × k matrix M, the row space of M is the subspace of F^k spanned by the n rows of M.
• Null space: For an n × k matrix M, the null space of M is the following subspace: {x ∈ F^k : M · x = 0}.
  – Useful Theorem: The orthogonal complement of the row space of a matrix M is the null space of M. Conversely, the orthogonal complement of the null space of a matrix M is the row space of M.
1
See Wikipedia if you want a precise description of these properties.

• Eigenspace: For any eigenvalue λ, we can define the eigenspace E_λ associated to λ
as the space

E_λ := {v ∈ V : Av = λv}.

1.2 Matrices: Some Important Types


• Elementary Matrices: There are three kinds of elementary matrices, illustrated
below in their 3 × 3 versions:

$$\begin{pmatrix} \lambda & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & \lambda \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

The first matrix above multiplies a given row by λ, the second matrix switches two
given rows, and the third matrix adds λ times one row to another row. In general,
each is the identity matrix with (respectively) one diagonal entry replaced by λ, two
rows interchanged, or a single off-diagonal entry set to λ.

• Reflection Matrices: For a subspace U of R^n, we can find a matrix corresponding
to the map Refl_U(x) by simply looking for eigenvectors. Specifically, if {u_1, . . . , u_k}
form a basis for U and {w_1, . . . , w_{n−k}} form a basis for U^⊥, note that the u_i’s are
all eigenvectors with eigenvalue 1, and the w_i’s are all eigenvectors with eigenvalue
−1 (because reflecting through U fixes the elements in U and flips the elements in
U^⊥.) As a result, because we have n linearly independent eigenvectors, we can use
our diagonalization construction (discussed later in these notes) EDE^{-1} to make a
reflection matrix R.

• Adjacency Matrices: For a graph2 G on the vertex set V = {1, 2, . . . , n}, we can
define the adjacency matrix for G as the n × n matrix A_G := (a_ij), where a_ij = 1
if the edge (i, j) is in E, and a_ij = 0 otherwise.

It bears noting that we can reverse this process: given an n × n matrix A_G, we can
create a graph G by setting V = {1, . . . , n} and E = {(i, j) : a_ij ≠ 0}.

– Useful Theorem: In a graph G with adjacency matrix A_G, the number of paths
from i to j of length m is the (i, j)-th entry in (A_G)^m.
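This path-counting theorem is easy to sanity-check numerically. The sketch below builds the adjacency matrix of a small hypothetical 3-vertex graph (my own illustrative choice, not an example from the text) and squares it with plain Python lists:

```python
# Hypothetical directed graph on vertices {0, 1, 2} with edges
# 0->1, 1->2, 0->2, 2->0, chosen purely for illustration.

def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# adjacency matrix: A[i][j] = 1 iff the edge (i, j) is present
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]

# (A^2)[i][j] counts the length-2 paths from i to j; for instance the only
# length-2 path from 0 to 2 is 0 -> 1 -> 2.
A2 = mat_mul(A, A)
```

Repeated multiplication gives (A_G)^m for larger m in the same way.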

• Permutation: A permutation σ of the list (1, . . . , n) is simply some way to reorder
this list into some other (σ(1), . . . , σ(n)). (If you prefer to think about functions, σ is
simply a bijection from {1, . . . , n} to {1, . . . , n}.)
2
A directed graph G = (V, E) consists of a set V, which we call the set of vertices for G, and a set
E ⊂ V^2, made of ordered pairs of vertices, which we call the set of edges for G.

Given any permutation σ of (1, . . . , n), the permutation matrix P_σ is simply the
n × n matrix whose i-th column is given by e_σ(i). In other words,

$$P_\sigma = \begin{pmatrix} \vdots & \vdots & & \vdots \\ \vec{e}_{\sigma(1)} & \vec{e}_{\sigma(2)} & \cdots & \vec{e}_{\sigma(n)} \\ \vdots & \vdots & & \vdots \end{pmatrix}$$

1.3 Various Vector/Vector Space Properties


• Dimension: The dimension of a space V is the number of elements in a basis for
V.

• Rank: The rank of a matrix is the dimension of its row space.

• The rank-nullity theorem. The rank-nullity theorem is the following result:

Theorem. Let U, V be a pair of finite-dimensional vector spaces, and let T : U → V
be a linear map. Then the following equation holds:

dimension(null(T)) + dimension(range(T)) = dimension(U).
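As a quick illustration of the theorem (a sketch only; the matrix below is a made-up example), we can compute the rank of a map from R^4 to R^3 by row reduction over exact rationals and check that rank plus nullity recovers the dimension of the domain:

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, computed by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

# T : R^4 -> R^3 given by a matrix whose third row is the sum of the first two,
# so rank(T) = 2 and the theorem forces nullity(T) = 4 - 2 = 2.
M = [[1, 2, 0, 1],
     [0, 1, 1, 0],
     [1, 3, 1, 1]]
nullity = len(M[0]) - rank(M)  # dim(null) = dim(domain) - rank
```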

• Orthogonality: Two vectors u, v are called orthogonal iff their inner product is 0;
i.e. if ⟨u, v⟩ = 0.

– Useful Theorem: If we have a basis B for some space V , the Gram-Schmidt


process will transform B into an orthogonal basis U for V – i.e. a basis for V
that’s made of vectors that are all orthogonal to each other. See my notes for an
in-depth description of this process.
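If you want to see the process in action, here is a minimal sketch of Gram-Schmidt over real vectors (the starting basis B is an arbitrary choice of mine, and floating-point rounding is the only inexactness assumed):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Turn a linearly independent list of vectors into an orthogonal one
    by subtracting off projections onto the vectors built so far."""
    ortho = []
    for v in basis:
        w = list(v)
        for u in ortho:
            coeff = dot(v, u) / dot(u, u)  # component of v along u
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

B = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
U = gram_schmidt(B)  # pairwise-orthogonal vectors spanning the same space
```

Normalizing each output vector to length 1 would make the basis orthonormal.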

• Linear independence/dependence: A collection v_1, . . . , v_k of vectors is called
linearly dependent iff there are k constants a_1, . . . , a_k, not all identically 0, such that
$\sum_{i=1}^k a_i v_i = 0$. They are called linearly independent if no such collection exists.

– Useful Theorem: A collection of vectors {v_1, . . . , v_n} is linearly dependent iff
the matrix formed by taking the v_i’s as its rows has a zero row in its reduced
row-echelon form.
– Equivalently, a collection of vectors {v_1, . . . , v_n} is linearly dependent iff the de-
terminant of the matrix formed by taking the v_i’s as its rows is zero.

• Basis: A basis for a space V is a collection of vectors B, contained within V, that is
linearly independent and spans the entire space V. A basis is called orthogonal iff
all of its elements are orthogonal to each other; it is called orthonormal iff all of its
elements are orthogonal to each other and furthermore all have length 1.

• Eigenvector/eigenvalue: For a matrix A, vector x, and scalar λ, we say that λ is
an eigenvalue for A and x is an eigenvector for A if and only if Ax = λx.

– Algebraic multiplicity: The algebraic multiplicity of an eigenvalue µ is the
number of times it shows up as a root of A’s characteristic polynomial. I.e. if
p_A(λ) = (λ − π)^2, π would have algebraic multiplicity 2.
– Geometric multiplicity: The geometric multiplicity of an eigenvalue µ is
the dimension of the eigenspace associated to µ.
– Useful Theorem: The algebraic multiplicity of an eigenvalue is always greater
than or equal to the geometric multiplicity of that eigenvalue.
– Useful Theorem: A matrix is diagonalizable iff every eigenvalue has its algebraic
multiplicity equal to its geometric multiplicity. (If you want it to be diagonal-
izable via real-valued matrices, you should also insist that the matrix and all of
its eigenvalues are real.)
– Dominant eigenvalue: The dominant eigenvalue is the largest eigenvalue
of a matrix.

1.4 Various Matrix Properties


• Symmetric: A matrix is called symmetric iff A^T = A.

– Useful Theorem: (AB)^T = B^T A^T.

• Singular/Nonsingular: An n × n matrix is called singular iff it has rank < n, and
is called nonsingular iff it has rank n.

– Useful Theorem: A matrix is nonsingular if and only if it has an inverse.
– Useful Theorem: A matrix is nonsingular if and only if its determinant is nonzero.

• Orthogonal: An n × n matrix U is called orthogonal iff all of its columns are of
length 1 and orthogonal to each other. Equivalently, U is orthogonal iff U^T = U^{-1};
i.e. U^T U = U U^T = I.

– Useful Theorem: Any n × n orthogonal matrix can be written as the product
of no more than n − 1 reflections. (Specifically, no more than n − 1 reflections
through spaces of dimension n − 1, which we call hyperplanes.)

• Regular: A matrix A is called regular if a_ij > 0 for every entry a_ij in A. We will
often write A > 0 to denote this.

• Nonnegative: A matrix is called nonnegative if and only if all of its entries are
≥ 0.

– Useful Theorem: Suppose that A is a nonnegative matrix and λ is the maximum
of the absolute values of A’s eigenvalues. Then λ is itself an eigenvalue, and there
is a vector of nonnegative numbers that is an eigenvector for λ.
– Perron-Frobenius: If A is a nonnegative matrix such that A^m > 0 for some value
of m, then the nonnegative eigenvector above is unique, up to scalar multiplica-
tion.

– If λ is an eigenvalue of a nonnegative matrix A that corresponds to a nonnegative
eigenvector, then λ is at least the minimum of the row sums, and at most the
maximum of the row sums; similarly, λ is at least the minimum of the column
sums, and at most the maximum of the column sums.

• Similarity. Two matrices A, B are called similar if there is some matrix U such
that A = U BU^{-1}. If we want to specify what U is, we can specifically state that A
and B are similar via U.

• Diagonalizable: A diagonalization of a matrix A is an invertible matrix E and
a diagonal matrix D such that A = EDE^{-1}.

– Useful Theorem: A matrix A is diagonalizable if and only if it has n linearly in-
dependent eigenvectors e_1, . . . , e_n. In this case, if λ_1, . . . , λ_n are the corresponding
eigenvalues to the e_i’s, we can actually give the explicit diagonalization of A as

$$\begin{pmatrix} | & | & & | \\ e_1 & e_2 & \cdots & e_n \\ | & | & & | \end{pmatrix} \cdot \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \cdot \begin{pmatrix} | & | & & | \\ e_1 & e_2 & \cdots & e_n \\ | & | & & | \end{pmatrix}^{-1}$$

– Suppose that A is diagonalized as EDE^{-1}. Then we can write the n-th power
of A as ED^n E^{-1}. As well, if all of the entries along the diagonal of D have k-th
roots, we can give a k-th root of A as the product ED^{1/k} E^{-1}.
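A small numerical sketch of this fact (the 2 × 2 matrix below and its eigendecomposition are worked out by hand for illustration; this is not an example from the text):

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# A = [[2, 1], [1, 2]] has eigenvalues 3 and 1 with eigenvectors (1, 1), (1, -1),
# so A = E D E^{-1} with the matrices below (E^{-1} computed by hand).
A = [[2, 1], [1, 2]]
E = [[1, 1], [1, -1]]
E_inv = [[0.5, 0.5], [0.5, -0.5]]

# A^5 via the theorem: just raise each diagonal entry of D to the 5th power.
D5 = [[3**5, 0], [0, 1**5]]
A5_fast = mat_mul(mat_mul(E, D5), E_inv)

# A^5 the slow way, by repeated multiplication, for comparison.
A5_slow = A
for _ in range(4):
    A5_slow = mat_mul(A5_slow, A)
```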

• Positive-definite/positive-semidefinite: A matrix A is called positive-definite
iff for any nonzero vector x, we have x^T · A · x > 0. Similarly, it is called positive-
semidefinite iff for any nonzero vector x, we have x^T · A · x ≥ 0.

– Useful Theorem: A matrix is positive-definite iff all of its eigenvalues are positive;
similarly, a matrix is positive-semidefinite iff all of its eigenvalues are nonnegative.

• Probability: An n × n matrix P is called a probability matrix if and only if the
following two properties are satisfied:

– P ≥ 0; in other words, p_ij ≥ 0 for every entry p_ij of P.
– The column sums of P are all 1; in other words, $\sum_{i=1}^n p_{ij} = 1$, for every j.

– Useful Theorem: Every probability matrix P has a stable vector: a vector v with P v = v.


– Useful Theorem: If P is a probability matrix such that there is a value of m
where P^m > 0, then there is only one stable vector v for P. Furthermore, for
very large values of m, P^m’s columns all converge to v. This theorem also holds
in the case where the graph represented by P is strongly connected3, even if
P^m is never > 0.
3
A graph is strongly connected iff it’s possible to get from any node to any other node via edges in
the graph.

– Useful Theorem 3: If we have a probability matrix P representing some finite
system with n states {1, . . . , n}, then the probability of starting in state j and
ending in state i in precisely m steps is the (i, j)-th entry in P^m.
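The convergence statement above can be sketched numerically; the 2 × 2 column-stochastic matrix below is an arbitrary example of mine, and its stable vector (0.75, 0.25) is found by solving P v = v by hand:

```python
def mat_mul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# A 2-state probability matrix: columns sum to 1 and all entries are positive.
P = [[0.9, 0.3],
     [0.1, 0.7]]

Pm = P
for _ in range(99):   # compute P^100
    Pm = mat_mul(Pm, P)

# The stable vector solves P v = v with entries summing to 1; here that is
# v = (0.75, 0.25), and both columns of P^100 should be numerically equal to it.
```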

• Polar decomposition: For a nonsingular n × n matrix A, a polar decomposition
of A is a pair of matrices Q, S such that Q is an orthogonal matrix, S is a
positive-definite symmetric matrix, and A = QS.

• Singular Value Decomposition: For an m × n matrix A, a singular value decom-
position of A is an n × n orthogonal matrix V, an m × n matrix D such that d_ij ≠ 0
only when i = j, and an m × m orthogonal matrix U such that A = U D V^T.

– Useful Theorem: If A has a singular value decomposition given by U D V^T, then
A’s Moore-Penrose pseudoinverse A^+ is given by the product V D^+ U^T, where
D^+ is the n × m matrix formed by taking D’s transpose and replacing all of its
nonzero entries with their reciprocals.
– Useful Theorem: If A is an n × n matrix with SVD U D V^T, then the minimum
value of ||Ax||/||x|| can be found by plugging in v_i, where v_i is the column of V
corresponding to the smallest value d_ii on D’s diagonal.

• Moore-Penrose pseudoinverse: For a matrix A, we say that A^+ is the pseudoin-
verse of A iff the following four properties hold: (1) AA^+A = A, (2) A^+AA^+ = A^+,
(3) AA^+ is symmetric, and (4) A^+A is also symmetric.

– Useful Theorem: The least-squares best-fit solutions to the system Ax = b are
given by vectors of the form

A^+ · b + (I − A^+A)w,

where we let w be any vector. Furthermore, if there is a solution to Ax = b,
then A^+ · b is a solution of minimum length.

• The spectral theorem. Suppose that A is an n × n real symmetric matrix (i.e.
don’t make any assumptions about what U is, like we did above.) Then in A’s Schur
decomposition U R U^{-1}, R is a diagonal real-valued matrix! Furthermore, we can
ensure in our construction of U that it is a real-valued orthogonal matrix.

• QR-decomposition. A QR-decomposition of an n × n matrix A is an orthogonal
matrix Q and an upper-triangular4 matrix R, such that

A = QR.

Every invertible matrix has a QR-decomposition, where R is invertible.


4
A matrix is called upper-triangular if all of its entries below the main diagonal are 0. For example,
$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 3 & 2 \\ 0 & 0 & 1 \end{pmatrix}$ is upper-triangular.

• Jordan block. A block B_i of some block-diagonal matrix is called a Jordan block
if it is in the form

$$\begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & \cdots & 0 & \lambda \end{bmatrix}$$

In other words, there is some value λ such that B_i is a matrix with λ on its main
diagonal, 1’s in the cells directly above this diagonal, and 0’s elsewhere.

• Jordan canonical/normal form. Suppose that A is similar to an n × n block-
diagonal matrix B in which all of its blocks are Jordan blocks; in other words, that
A = U B U^{-1}, for some invertible U. We say that any such matrix A has been written
in Jordan canonical form.
Any n × n matrix A can be written in Jordan canonical form, provided we allow
complex entries.

1.5 Operations on Vectors and Vector Spaces


• Dot product: For two vectors x, y ∈ R^n, we define the dot product x · y as the
sum $\sum_{i=1}^n x_i y_i$.

• Inner product: For two vectors x, y ∈ R^n, we define the inner product ⟨x, y⟩ of x
and y as their dot product, x · y.

– Useful Observation: Often, it’s quite handy to work with the transpose of certain
vectors. So, remember: when you’re taking the inner product or dot product of
two vectors, taking the transpose of either vector doesn’t change the results!
I.e. ⟨x, y⟩ = ⟨x^T, y⟩ = ⟨x, y^T⟩ = ⟨x^T, y^T⟩. We use this a *lot* in proofs and
applications where there are symmetric or orthogonal matrices running about.

• Magnitude: The magnitude of a vector x is the square root of its inner product with
itself: ||x|| = √⟨x, x⟩. This denotes the distance of this vector from the origin.

• Distance: The distance between two vectors x, y is the square root of the
inner product of the difference of these two vectors: ||x − y|| = √⟨x − y, x − y⟩.

• Projection, onto a vector: For two vectors u, v, we define the projection of v onto
u as the following vector:

$$\mathrm{proj}_u(v) := \frac{\langle v, u\rangle}{\langle u, u\rangle}\cdot u.$$

• Projection, onto a space: Suppose that U is a subspace with orthogonal basis
{b_1, . . . , b_n}, and x is some vector. Then, we can define the orthogonal projection
of x onto U as the following vector in U:

$$\mathrm{proj}_U(x) = \sum_{i=1}^{n} \mathrm{proj}_{b_i}(x).$$
– Useful Theorem: This vector is the closest vector in U to x.
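Here is a minimal sketch of projecting onto a subspace and checking the "closest vector" property via orthogonality of the residual (the basis and the vector x below are deliberately simple, hand-picked examples):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj_onto_vector(v, u):
    """proj_u(v) = (<v, u> / <u, u>) * u."""
    c = dot(v, u) / dot(u, u)
    return [c * ui for ui in u]

# U = the xy-plane in R^3, with orthogonal basis {(1,0,0), (0,1,0)}
# (a deliberately simple choice so the answer is visible by eye).
basis = [[1, 0, 0], [0, 1, 0]]
x = [3, 4, 5]

p = [0.0, 0.0, 0.0]
for b in basis:                      # sum the projections onto each basis vector
    p = [pi + qi for pi, qi in zip(p, proj_onto_vector(x, b))]

# The residual x - p is orthogonal to every basis vector of U, which is
# exactly what makes p the closest point of U to x.
residual = [xi - pi for xi, pi in zip(x, p)]
```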
• Orthogonal complement: For a subspace S of a vector space V, we define the
orthogonal complement S^⊥ as the following set:

S^⊥ = {v ∈ V : ⟨v, s⟩ = 0 for all s ∈ S}.

• Isometry: An isometry is a map f : R^n → R^n that preserves distances: i.e. for any
x, y ∈ R^n, we have

||x − y|| = ||f(x) − f(y)||.

• Reflection: For a subspace U of R^n, we define the reflection map through U as
the function

Refl_U(x) = x − 2 · proj_{U^⊥}(x).

1.6 Operations on Matrices


• Transpose: For an m × n matrix A, the transpose A^T is the n × m matrix defined by
setting its (i, j)-th cell as a_ji, for every cell (i, j).
• Determinant: For an n × n matrix A, we define

$$\det(A) = \sum_{i=1}^{n} (-1)^{i-1}\, a_{1i} \cdot \det(A_{1i}),$$

where A_{1i} is the matrix obtained by deleting the first row and i-th column of A.
– Properties of the Determinant:
⇤ Multiplying one of the rows of a matrix by some constant λ multiplies that
matrix’s determinant by λ; switching two rows in a matrix multiplies that
matrix’s determinant by −1; adding a multiple of one row to another
in a matrix does not change its determinant.
⇤ det(A^T) = det(A).
⇤ det(AB) = det(A) det(B).
⇤ The determinant of the matrix A is the volume of the parallelepiped spanned
by the columns of A (up to a factor of ±1, which is how we determine if the
map is orientation-preserving or -reversing.)
– Useful Theorem: The determinant of a matrix A is nonzero if and only if A is
invertible.
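The cofactor-expansion definition above translates directly into a short recursive function; the sketch below (the two 2 × 2 matrices are arbitrary examples) also checks the multiplicativity and row-swap properties:

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # minor A_{1i}: delete the first row and the i-th column
        minor = [row[:i] + row[i + 1:] for row in A[1:]]
        total += (-1) ** i * A[0][i] * det(minor)
    return total

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [1, 1]]
B = [[1, 3], [2, 5]]
```

(This recursion is fine for the tiny matrices on the GRE, though it is exponential-time in general; row reduction is the practical method for large matrices.)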
• Trace: The trace of an n × n matrix A is the sum of the entries on A’s diagonal.
– Useful Theorem: The trace of a matrix is equal to the sum of its eigenvalues.
• Characteristic polynomial: The characteristic polynomial of a matrix A is the
polynomial p_A(λ) = det(λI − A), where λ is the variable.
– x is a root of p_A(λ) iff x is an eigenvalue for A.

2 Example problems
We work some sample problems here, to illustrate some of the ideas.

Question. Suppose that A is an n × n matrix such that A^3 is the all-zeroes matrix, i.e.
the n × n matrix in which every entry is 0.

1. Think of A as a linear map from R^n to R^n. Can the range of A be equal to R^n?

2. Can you find an example of such a matrix A, such that A and A^2 are not themselves
all-zeroes matrices?

Proof. For an example of such a matrix, consider

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

We can easily check that

$$A^2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$

$$A^3 = A^2 \cdot A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \cdot \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

If you want an n × n example of such a matrix, simply add additional rows/columns of zeroes
to the left and bottom of A.
In general, we claim that the range of any such matrix A cannot be R^n. To see why,
simply notice that if A is a matrix with range equal to its domain, then A must be invertible;
consequently, for any natural number k, A^k must also be invertible, with inverse given by
(A^{-1})^k. Therefore A^k would have range R^n (as it is invertible, and thus has dim(nullspace)
= 0), and therefore in particular we could not have A^k = the all-zeroes matrix for any k,
as this has dim(nullspace) = n.

Question. Take any n × n matrix M.

1. Take any k > 0 ∈ N, and think of M, M^k as a pair of linear maps from R^n to R^n.
Prove that

nullspc(M^k) ⊇ nullspc(M).

2. Suppose that det(M) = 0. Prove that det(M^k) = 0 as well.

Proof. The first claim here is not hard to establish. Take any vector ~v ∈ nullspc(M). By
definition, we know that M~v = ~0; therefore, we can conclude that M^k ~v = M^{k−1} · M~v =
M^{k−1} ~0 = ~0 as well, and thus that ~v ∈ nullspc(M^k).

As a side note, our earlier problem proves that inequality is possible (as the nullspaces
of A, A^3 were distinct); it is also not hard to see that equality is possible (let M be the
all-zeroes matrix!) and thus that this is the strongest statement we can make.
For the second part of our claim: we could simply use the multiplicative property of the
determinant (which tells us that det(M^k) = det(M) · . . . · det(M) = 0 · . . . · 0 = 0), or we
could use the first part of this question to note that because
• det(M) = 0 if and only if dim(nullspc(M)) ≠ 0, and
• nullspc(M^k) ⊇ nullspc(M), then dim(nullspc(M)) ≤ dim(nullspc(M^k)),
• then we can conclude that if det(M) = 0 then dim(nullspc(M^k)) ≠ 0, and thus that
• det(M^k) = 0.

Question. Create a 3 × 3 matrix A with the following properties:

• No entry of A is 0.
• A has 1, 2, and 3 as eigenvalues.

Proof. If we ignore our “no zero entries” condition, this is not too hard; the matrix

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

satisfies our eigenvalue properties, as (1, 0, 0), (0, 1, 0), (0, 0, 1) are eigenvectors for these
three eigenvalues 1, 2, 3.
Now, we can use the fact that eigenvalues are invariant under similarity; that is, if A
is a matrix and B is an invertible matrix, then A and BAB^{-1} have the same eigenvalues!
(This is because if ~v is an eigenvector for A, then B~v is an eigenvector for BAB^{-1}, with the
same eigenvalue.)
So we can try simply multiplying A on the left and right by appropriate B, B^{-1}’s, and
hope we get something without zeroes! In particular, let’s use some matrices whose inverses
we know: elementary matrices! Recall that

$$B = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \Rightarrow B^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

because the first map (when processed as B · (matrix)) corresponds to the Gaussian elim-
ination move of “add one copy of row two to row one,” and the second is just “add −1
copies of row two to row one.”
Therefore

$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix};$$

by using more of these elementary matrices, we can actually get

$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix};$$

$$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 1 & 2 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 1 & 2 \\ -6 & 4 & 4 \\ 0 & 0 & 3 \end{bmatrix};$$

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} -1 & 1 & 2 \\ -6 & 4 & 4 \\ 0 & 0 & 3 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 2 \\ -2 & 4 & 4 \\ 2 & -1 & 1 \end{bmatrix}.$$

This is a matrix that has no zero entries, and by construction is similar to
$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$;
so we’ve answered our problem!
Question. Suppose that A is an n × n matrix with the following two properties:
• A^n is the all-zeroes matrix.
• There is exactly one nonzero vector ~v, up to scalar multiples, that is an eigenvector
of A. (In other words, the only eigenvectors for A are vectors of the form c · ~v.)
Find the Jordan normal form of A.
Proof. Take any eigenvector ~v for A, with eigenvalue λ; then A~v = λ~v. Consequently,
A^n ~v = λ A^{n−1} ~v = λ^2 A^{n−2} ~v = . . . = λ^n ~v. Because A^n is the all-zeroes matrix,
we can also observe that A^n ~v = ~0, for any vector ~v; consequently, we have proven that
the only possible eigenvalue for A is 0.
Our second bullet point is the claim that the dimension of the eigenspace for this only
eigenvalue is 1. Consequently, if we look at our matrix’s Jordan normal form, we know that
• The diagonals are all zeroes, as 0 is the only eigenvalue, and eigenvalues go on the
diagonal of a Jordan normal form.
• There is only one block, as there is only one dimension of eigenvectors.
Therefore, we have that the Jordan normal form here is just zeroes on the diagonal, ones
directly above the diagonal, and zeroes elsewhere: i.e.

$$\begin{bmatrix} 0 & 1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 \end{bmatrix}$$
Question. Suppose that A is a real-valued symmetric n × n matrix with the following two
properties:

• All of A’s entries are either 0 or 1.

• The all-1’s vector is an eigenvector of A, with eigenvalue 10.

1. How many 1’s are there in each row of A?

2. Suppose that λ ≠ 10 is another eigenvalue of A. Prove that λ ≤ 10.

Proof. Let $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$. Notice that by simply multiplying it out, A · (1, 1, . . . , 1)
is the vector

$$\left( \sum_{i=1}^n a_{1i},\ \sum_{i=1}^n a_{2i},\ \ldots,\ \sum_{i=1}^n a_{ni} \right).$$

If this is equal to (10, 10, . . . , 10), then there are ten 1’s in each row of A, as claimed.
Furthermore, suppose that we have any eigenvalue λ other than 10 for this matrix A. Let
~v be the eigenvector for this eigenvalue, and v_k be the largest component of this eigenvector.
Then, again by definition, we have

$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \cdot \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^n a_{1i} v_i \\ \vdots \\ \sum_{i=1}^n a_{ni} v_i \end{bmatrix} = \lambda \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}.$$

In particular, if we look at the v_k coördinate, we have

$$\sum_{i=1}^n a_{ki} v_i = \lambda v_k;$$

but if we use the fact that v_k is the “biggest” (i.e. v_k ≥ v_j, ∀j), we can see that

$$\sum_{i=1}^n a_{ki} v_i \le \sum_{i=1}^n a_{ki} v_k \le 10 v_k,$$

because there are at most ten one-entries in the k-th row (and the rest are zeroes.)
But this means that λ v_k ≤ 10 v_k; i.e. λ ≤ 10, as claimed.

Math 94 Professor: Padraic Bartlett

Lecture 7: Differential Equations


Week 7 UCSB 2015

This is the seventh week of the Mathematics Subject Test GRE prep course; here, we
review various techniques used to solve differential equations!

1 Definitions and Concepts


A differential equation is any mathematical equation that relates some collection of
functions to their derivatives; the mathematical fields of ordinary and partial differential
equations study various methods used to find all of the functions that satisfy such equations.
More so than most other fields of mathematics, the study of ordinary and partial dif-
ferential equations is far more focused on its processes than its theorems; in other words,
mathematicians in these areas are less concerned with recognizing and applying various
large theorems, and are more concerned with the practical nitty-gritty of solving and work-
ing with specific classes of equations. To give an example, think of the difference in feel
between studying limits and series in calculus, versus integrating various functions in cal-
culus. In the first subject, most of your proofs involve sticking together various results (i.e.
the mean value theorem, the squeeze theorem, etc.) to solve problems. The second, however,
feels much more like an art — when integrating functions, you’re mostly looking for the
right clever use of substitution/parts/trig identities/change of variables/etc. to solve your
specific equation!
Accordingly, our review section here is a bit different than other review sections. Here,
we sketch several basic techniques for solving various differential equations; in the second
half of this talk, we use these techniques to solve several GRE-styled problems from the field.
Our listing of techniques is (of course) partial, as mathematics itself only knows techniques
for solving some families of differential equations! However, almost all of the problems you
encounter on the GRE will be approachable with one of these techniques.
Finally, one last note about the specific structure of the GRE itself: due to the nature
of the GRE as a multiple-choice test, it will sometimes be possible to simply “plug in” your
five possible answers into your differential equation, and eliminate answers based on those
that do not solve your question! This may not always be the most time-efficient approach,
as it will require you to take derivatives of between one and five functions; but differentiation
is usually pretty “easy” as far as mathematical operations go. Do not be afraid to “game”
the GRE in this manner; it is a multiple-choice test, and you should exploit this structure
to your advantage!

1. Separable first-order differential equations. Suppose that you have a differential
equation of the form

$$M(x)N(y) = \frac{dy}{dx}$$

for two functions M(x), N(y). We can solve this equation by “separating” M(x) from
N(y): that is, by dividing both sides by N(y) and “multiplying” by dx to get1 the
following:

$$M(x)\,dx = \frac{1}{N(y)}\,dy.$$

Integrating both sides yields

$$\int M(x)\,dx = \int \frac{1}{N(y)}\,dy,$$

which gives us a relation that can be used to solve for y with algebra/other techniques.
Be aware that this equation above only gives us solutions for which N(y) ≠ 0. In the
event that N(y) is identically 0 — i.e. y is a constant — you would need to check this
manually by seeing if a constant value of y can solve our equation.
We calculate an example here:

Example. Solve the differential equation

$$\frac{3x^2+1}{2y-3} = \frac{dy}{dx},$$

with the boundary condition that when x = 0, y = 0 as well.

Proof. We simply separate variables and solve:

$$\frac{3x^2+1}{2y-3} = \frac{dy}{dx}
\;\Rightarrow\; \int (3x^2+1)\,dx = \int (2y-3)\,dy
\;\Rightarrow\; x^3 + x = y^2 - 3y + c.$$

Because (0, 0) is a point that should be a solution to our equation, we can see that
c = 0, and that our equation is (solving for y)

$$x^3 + x = y^2 - 3y
\;\Rightarrow\; y^2 - 3y - (x^3 + x) = 0
\;\Rightarrow\; y = \frac{3 \pm \sqrt{9 + 4(x^3+x)}}{2}.$$

At x = 0, this expression is (3 ± 3)/2, which we know should be 0; this tells us that we
want the negative branch of this expression, i.e.

$$y = \frac{3 - \sqrt{9 + 4(x^3+x)}}{2}.$$

1
Formally speaking, we are doing something more subtle than multiplying through by dx, because what
would that even mean? What is a dx, outside of an integral? For rigorous answers to this, take courses on
analysis and differential equations! For now, however, just roll with it.
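As a sanity check (a sketch; the sample points and finite-difference step h are arbitrary choices of mine), one can confirm numerically that the branch with the minus sign satisfies both the equation and the boundary condition y(0) = 0:

```python
import math

def y(x):
    # the solution branch that satisfies y(0) = 0
    return (3 - math.sqrt(9 + 4 * (x**3 + x))) / 2

h = 1e-5  # finite-difference step (an arbitrary small choice)

def residual(x):
    """How far y is from satisfying (3x^2+1)/(2y-3) = dy/dx at the point x."""
    dydx = (y(x + h) - y(x - h)) / (2 * h)   # central-difference derivative
    return dydx - (3 * x**2 + 1) / (2 * y(x) - 3)

max_err = max(abs(residual(x)) for x in [0.5, 1.0, 2.0])
```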

2. Homogeneous first-order differential equations. Suppose that you have a dif-
ferential equation of the form

$$M(x,y) + N(x,y)\frac{dy}{dx} = 0,$$

where M(x, y), N(x, y) are a pair of degree-n homogeneous functions2. We can solve
this differential equation by defining v = y/x, which lets us make the substitution
y = xv, and yields the equation

$$M(x, xv) + N(x, xv)\,\frac{d}{dx}(xv) = 0.$$

If M, N are homogeneous of degree n, this yields

$$x^n \cdot M(1, v) + x^n \cdot N(1, v) \cdot \left( v + x\frac{dv}{dx} \right) = 0,$$

which we can solve for dv/dx to get

$$\frac{dv}{dx} = \left( -v - \frac{M(1,v)}{N(1,v)} \right) \cdot \frac{1}{x}.$$

This is a separable differential equation, and therefore is solvable by our earlier meth-
ods! Use them to solve this differential equation, and then finally substitute v = y/x
back to get a solution for our original problem.
We calculate an example here:

Example. Solve the differential equation

$$x^2 y + y^2 x\,\frac{dy}{dx} = 0,$$

given the boundary condition that at x = 0 we want y = 1.

Proof. The two functions M(x, y) = x^2 y, N(x, y) = xy^2 are both homogeneous of
degree 3, so we can attempt to proceed as directed above. We start by substituting
in y = xv, and performing various algebraic manipulations:

$$x^3 v + x^3 v^2 \left( v + x\frac{dv}{dx} \right) = 0
\;\Rightarrow\; v + x\frac{dv}{dx} = -\frac{x^3 v}{x^3 v^2}
\;\Rightarrow\; x\frac{dv}{dx} = -\frac{1}{v} - v
\;\Rightarrow\; \frac{1}{\frac{1}{v} + v}\,dv = -\frac{1}{x}\,dx.$$

2
A function f(x, y) of two variables is called homogeneous of degree n if f(tx, ty) = t^n f(x, y) for all
t, x, y. For example, f(x, y) = x^2 + xy + y^2 is homogeneous of degree 2, as f(tx, ty) = t^2 x^2 + (tx)(ty) + t^2 y^2 =
t^2 f(x, y).

Integrating both sides of this separable equation gives us

$$\int \frac{1}{\frac{1}{v}+v}\,dv = \int \frac{v}{1+v^2}\,dv = \frac{1}{2}\ln(1+v^2) = -\int \frac{1}{x}\,dx = -\ln(x) + c.$$

Plugging in v = y/x yields

$$\frac{1}{2}\ln\left(1 + \frac{y^2}{x^2}\right) = -\ln(x) + c
\;\Rightarrow\; 1 + \frac{y^2}{x^2} = e^{-2\ln(x)+c} = c \cdot \left(e^{\ln(x)}\right)^{-2} = \frac{c}{x^2}
\;\Rightarrow\; y^2 = c - x^2
\;\Rightarrow\; y = \pm\sqrt{c - x^2}.$$

At x = 0 we wanted y = 1; so this tells us that c = 1 and our sign is positive, and
therefore that y = √(1 − x^2) is our answer!
Notice how we repeatedly ignore the constants, signs, etc. that c is multiplied by, as
it’s just a constant (and therefore we don’t really care if it’s two times some other
constant.)
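A quick numerical confirmation of this answer (the sample points and step size h are arbitrary choices of mine):

```python
import math

def y(x):
    # the claimed solution, with y(0) = 1
    return math.sqrt(1 - x**2)

h = 1e-5  # finite-difference step

def residual(x):
    """Left-hand side x^2 y + x y^2 dy/dx, which should vanish along a solution."""
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    return x**2 * y(x) + x * y(x)**2 * dydx

max_err = max(abs(residual(x)) for x in [0.1, 0.3, 0.6])
```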

3. Linear first-order differential equations. Suppose that you have a differential
equation of the form

$$\frac{dy}{dx} + P(x)y = Q(x),$$

and we want to solve for y as a function of x. We can solve this equation by multiplying
both sides by the “integrating factor3”

$$\mu(x) = e^{\int P(x)\,dx}.$$

If we do this, then we get

$$\mu(x)\frac{dy}{dx} + P(x)y\mu(x) = Q(x)\mu(x).$$

However, the product rule tells us that the LHS above is just $\frac{d}{dx}(\mu(x)\cdot y)$. Therefore,
our equation is now of the form

$$\frac{d}{dx}(\mu(x)\cdot y) = Q(x)\mu(x).$$

Integrating both sides with respect to x yields

$$y = \frac{1}{\mu(x)}\int Q(x)\mu(x)\,dx.$$

Success!
We calculate an example here:
3
An integrating factor is some cleverly-chosen function that we multiply both sides of a differential
equation by to make it simpler in some appropriate sense.

Example. Solve the differential equation

$$\frac{dy}{dx} + x^2 y = x^5,$$

given the boundary condition that at x = 0 we want y = 0.

Proof. This is a linear differential equation; therefore if we multiply both sides by the
integrating factor $e^{\int x^2\,dx} = e^{x^3/3}$, and go through all of the steps above we get

$$\frac{dy}{dx}e^{x^3/3} + x^2 y e^{x^3/3} = x^5 e^{x^3/3}
\;\Rightarrow\; \frac{\partial}{\partial x}\left( y e^{x^3/3} \right) = x^5 e^{x^3/3}
\;\Rightarrow\; y e^{x^3/3} = \int x^5 e^{x^3/3}\,dx.$$

To integrate the RHS, we use the substitution u = x^3/3, motivated by the fact that
e raised to anything other than a single variable is a total pain to integrate: as
du = x^2 dx and 3u = x^3, we get

$$y e^{x^3/3} = \int x^5 e^{x^3/3}\,dx
= \int x^3 e^{x^3/3}\, x^2\,dx
= \int 3u e^u\,du
= 3ue^u - 3e^u + C
= e^{x^3/3}(x^3 - 3) + C
\;\Rightarrow\; y = x^3 - 3 + \frac{C}{e^{x^3/3}}.$$

Our boundary conditions tell us that we want (0, 0) to be a solution to our equation:
in other words, that 3 = C.
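Again, a quick numerical confirmation that y = x^3 − 3 + 3e^{−x^3/3} solves the equation with y(0) = 0 (sample points and step h are arbitrary choices of mine):

```python
import math

def y(x):
    # x^3 - 3 + C e^{-x^3/3} with C = 3 from the boundary condition y(0) = 0
    return x**3 - 3 + 3 * math.exp(-x**3 / 3)

h = 1e-5  # finite-difference step

def residual(x):
    """dy/dx + x^2 y - x^5, which should vanish along a solution."""
    dydx = (y(x + h) - y(x - h)) / (2 * h)
    return dydx + x**2 * y(x) - x**5

max_err = max(abs(residual(x)) for x in [0.0, 0.5, 1.0, 1.5])
```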

4. Exact first-order differential equations. Suppose that you have a differential
equation of the form

$$M(x,y) + N(x,y)\frac{dy}{dx} = 0,$$

where $\frac{\partial}{\partial y}M(x,y) = \frac{\partial}{\partial x}N(x,y)$. Now, take any function F(x, y) that is the antideriva-
tive of M(x, y) with respect to x, and also the antiderivative of N(x, y) with respect
to y: i.e. some function F(x, y) such that

$$F(x,y) = \int M(x,y)\,dx + C_y, \qquad F(x,y) = \int N(x,y)\,dy + C_x.$$

Note that I have written C_y, C_x instead of the normal constants C; this is because when
we integrate with respect to x, y is held constant (and similarly for y, x.) Therefore,
terms involving the variable we are not integrating with respect to are “constants” that
can show up in our solution! (This is why we need to consider integrating both M(x, y)
and N(x, y), and not just one of the two.)
Fun fact we’re not proving here: such a function always exists for exact differential
equations, and you can always find it!
When you do, you’ll get that $\frac{\partial}{\partial x}F(x,y) = M(x,y)$ and $\frac{\partial}{\partial y}F(x,y) = N(x,y)$. Therefore,
we can write our differential equation in the form

$$\frac{\partial}{\partial x}F(x,y) + \frac{\partial}{\partial y}F(x,y)\,\frac{dy}{dx} = 0.$$

But this is simply the total derivative of the function F(x, y)! Therefore, if we are
asking that this total derivative is 0, we are looking for the set of all points (x, y) on
which F(x, y) is constant; that is, the set of all level curves of F(x, y), i.e.

$$F(x,y) = c.$$

We calculate an example here:

Example. Solve the differential equation
$$y\cos(xy) + x\cos(xy)\frac{dy}{dx} = 0,$$
given the boundary condition that at x = 1 we want y = 0.

Proof. We first notice that because
$$\frac{\partial}{\partial y}\bigl(y\cos(xy)\bigr) = \cos(xy) - xy\sin(xy) = \frac{\partial}{\partial x}\bigl(x\cos(xy)\bigr),$$
this differential equation is indeed exact. Therefore, we are seeking some function
F(x, y) such that
$$F(x, y) = \int M(x, y)\,dx = \int y\cos(xy)\,dx = \sin(xy) + C_y,$$
$$F(x, y) = \int N(x, y)\,dy = \int x\cos(xy)\,dy = \sin(xy) + C_x;$$
i.e. F(x, y) = sin(xy) + C. Our solutions are simply the level curves of this function;
i.e. the set of all points (x, y) satisfying sin(xy) = C. If we want (1, 0) on this curve,
we need sin(1 · 0) = C; i.e. C = 0, and therefore our solutions are the set of all
points satisfying sin(xy) = 0.
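As a sketch (plain Python, standard library only), we can spot-check the exactness condition $M_y = N_x$ for this example with finite differences, and also that F(x, y) = sin(xy) really has $F_x = M$:

```python
import math

M = lambda x, y: y * math.cos(x * y)
N = lambda x, y: x * math.cos(x * y)
F = lambda x, y: math.sin(x * y)

def dM_dy(x, y, h=1e-6):
    return (M(x, y + h) - M(x, y - h)) / (2 * h)

def dN_dx(x, y, h=1e-6):
    return (N(x + h, y) - N(x - h, y)) / (2 * h)

# Exactness: dM/dy = dN/dx, spot-checked on a small grid.
for x in [0.3, 1.0, 2.0]:
    for y in [-1.0, 0.5, 1.5]:
        assert abs(dM_dy(x, y) - dN_dx(x, y)) < 1e-6

# F_x = M at a sample point:
x0, y0, h = 1.0, 0.5, 1e-6
assert abs((F(x0 + h, y0) - F(x0 - h, y0)) / (2 * h) - M(x0, y0)) < 1e-6
```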

5. Nonexact first-order differential equations. Sometimes you have a differential
equation
$$M(x, y) + N(x, y)\frac{dy}{dx} = 0$$
that is not exact, but is not too far from it. In these situations, we can sometimes be
"clever" and multiply both the LHS and RHS by a cleverly-chosen integrating factor
µ(x) or µ(y), so that the resulting equation is exact! In practice, it can be very easy
or very difficult to find an integrating factor. We list a few special cases where such
factors have predictable forms here:

• Suppose that $\dfrac{\frac{\partial}{\partial y}M(x, y) - \frac{\partial}{\partial x}N(x, y)}{N(x, y)}$ is a function $\xi(x)$ that only depends on
the variable x. Then $\mu(x) = e^{\int \xi(x)\,dx}$ is an integrating factor for our differential
equation.

• Suppose that $\dfrac{\frac{\partial}{\partial x}N(x, y) - \frac{\partial}{\partial y}M(x, y)}{M(x, y)}$ is a function $\psi(y)$ that only depends on
the variable y. Then $\mu(y) = e^{\int \psi(y)\,dy}$ is an integrating factor for our differential
equation.

We calculate an example here:

Example. Solve the differential equation
$$(3x^2 y + y^3 + 2yx) + (x^2 + y^2)\frac{dy}{dx} = 0,$$
given the boundary condition that at x = 0 we want y = 0.

Proof. We first notice that because
$$\frac{\partial}{\partial y}(3x^2 y + y^3 + 2yx) = 3x^2 + 3y^2 + 2x, \qquad \frac{\partial}{\partial x}(x^2 + y^2) = 2x,$$
we are sadly not exact. However, we do have that
$$\frac{\frac{\partial}{\partial y}M(x, y) - \frac{\partial}{\partial x}N(x, y)}{N(x, y)} = \frac{3x^2 + 3y^2 + 2x - 2x}{x^2 + y^2} = 3$$
is indeed a function ξ(x) that only depends⁴ on the variable x! Therefore, as suggested
above, we can multiply both sides by the integrating factor $e^{\int \xi(x)\,dx} = e^{3x}$, to get
$$(3x^2 y + y^3 + 2yx)e^{3x} + (x^2 + y^2)e^{3x}\frac{dy}{dx} = 0.$$
We can see that this equation now is exact, as
$$\frac{\partial}{\partial y}\Bigl((3x^2 y + y^3 + 2yx)e^{3x}\Bigr) = (3x^2 + 3y^2 + 2x)e^{3x},$$
$$\frac{\partial}{\partial x}\Bigl((x^2 + y^2)e^{3x}\Bigr) = 2xe^{3x} + (x^2 + y^2)\cdot 3e^{3x} = (3x^2 + 3y^2 + 2x)e^{3x}$$
are both equal. Therefore, we can find a solution by integrating $(3x^2 y + y^3 + 2yx)e^{3x}$ and $(x^2 + y^2)e^{3x}$ appropriately:
$$F(x, y) = \int (3x^2 y + y^3 + 2yx)e^{3x}\,dx = y\left(x^2 e^{3x} - \frac{2}{3}xe^{3x} + \frac{2}{9}e^{3x}\right) + \frac{y^3}{3}e^{3x} + y\left(\frac{2}{3}xe^{3x} - \frac{2}{9}e^{3x}\right) + C_y = yx^2 e^{3x} + \frac{y^3}{3}e^{3x} + C_y,$$
$$F(x, y) = \int (x^2 + y^2)e^{3x}\,dy = yx^2 e^{3x} + \frac{y^3}{3}e^{3x} + C_x.$$

So we have
$$F(x, y) = yx^2 e^{3x} + \frac{y^3}{3}e^{3x} + C.$$
Solutions to our differential equation are level curves of this function; i.e. all x, y such
that
$$yx^2 e^{3x} + \frac{y^3}{3}e^{3x} = C.$$
Asking that (0, 0) is on such a curve is simply the restriction that C = 0; that is, we
have
$$yx^2 e^{3x} + \frac{y^3}{3}e^{3x} = 0.$$

⁴ Well, really, it doesn't depend on anything. But that's OK: constant functions are functions!
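A quick numerical sketch (plain Python, standard library only) of the key step above: the original pair (M, N) fails the exactness test, but after multiplying through by $e^{3x}$ the mixed partials agree:

```python
import math

# Original (non-exact) coefficients:
M = lambda x, y: 3 * x**2 * y + y**3 + 2 * y * x
N = lambda x, y: x**2 + y**2

# After multiplying through by the integrating factor e^{3x}:
Mf = lambda x, y: M(x, y) * math.exp(3 * x)
Nf = lambda x, y: N(x, y) * math.exp(3 * x)

def d_dy(f, x, y, h=1e-6):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def d_dx(f, x, y, h=1e-6):
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

x0, y0 = 0.7, -0.4
assert abs(d_dy(M, x0, y0) - d_dx(N, x0, y0)) > 0.1      # not exact originally
assert abs(d_dy(Mf, x0, y0) - d_dx(Nf, x0, y0)) < 1e-4   # exact after scaling
```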

2 Example GRE Problems

We work four example problems here, taken from the three GRE exams you've completed
thus far in this class:

Problem. Let y = f(x) be a solution of the differential equation
$$x\,dy + (y - xe^x)\,dx = 0,$$
chosen such that y = 0 at x = 1. What is the value of f(2)?

(a) $\frac{1}{2e}$   (b) $\frac{1}{e}$   (c) $\frac{e^2}{2}$   (d) 2e   (e) $2e^2$
Answer. Notice that this equation is exact, as
$$\frac{\partial}{\partial y}(y - xe^x) = 1, \qquad \frac{\partial}{\partial x}(x) = 1.$$
Therefore, we can solve this by simply integrating these two functions appropriately:
$$F(x, y) = \int (y - xe^x)\,dx = xy - xe^x + e^x + C_y, \qquad F(x, y) = \int x\,dy = xy + C_x.$$
Combining these results gives us $F(x, y) = xy - xe^x + e^x + C$, which we want to find level
curves for; i.e. our solutions look like $xy - xe^x + e^x = C$. If we plug in the point (1, 0), we
get C = 0. Finally, if we want to find out what happens when we have x = 2, note that
$$2y - 2e^2 + e^2 = 0$$
implies that y = f(2) is just $\frac{e^2}{2}$. In other words, our answer is (c).
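Solving the level curve for y gives $f(x) = e^x - e^x/x$; the sketch below (plain Python, standard library only) spot-checks the boundary condition, the value f(2), and the ODE itself:

```python
import math

# From the level curve xy - x e^x + e^x = 0, solve for y:
f = lambda x: math.exp(x) - math.exp(x) / x

assert abs(f(1)) < 1e-12                   # boundary condition f(1) = 0
assert abs(f(2) - math.e**2 / 2) < 1e-12   # f(2) = e^2/2, answer (c)

# Spot-check the ODE x y' + (y - x e^x) = 0 with a finite difference:
x0, h = 1.5, 1e-6
dy = (f(x0 + h) - f(x0 - h)) / (2 * h)
assert abs(x0 * dy + f(x0) - x0 * math.exp(x0)) < 1e-4
```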

Problem. Which of the following five pictures gives the graphs of two functions satisfying
the differential equation
$$\left(\frac{dy}{dx}\right)^2 + 2y\frac{dy}{dx} + y^2 = 0?$$

Answer. Factoring our equation yields
$$\left(\frac{dy}{dx} + y\right)^2 = 0,$$
so we can simply take a square root to get the simpler problem
$$\frac{dy}{dx} + y = 0.$$
This is a very simple separable differential equation; we can separate it accordingly to get
$$\frac{1}{y}\,dy = -dx \;\Rightarrow\; \ln(y) = -x + C \;\Rightarrow\; y = Ce^{-x}.$$
The only answer whose curves have $Ce^{-x}$-like behavior is (a), so we have answered our
question.

Problem. Suppose that we have a tank of water. This tank is a cube with vertical sides,
no top, and side length 10 feet. Let h(t) denote the height of the water level, in feet, above
the floor of the tank at time t.
Suppose that at time t = 0 water begins to pour into the tank at a constant rate of 1
cubic foot per second, and also begins to pour out of the tank at a rate of h(t)/4 cubic feet
per second. As t approaches infinity, what is the limit of the volume of the water in the
tank?

(a) 400 ft³   (b) 600 ft³   (c) 1000 ft³   (d) The limit DNE.
(e) We do not have enough information to solve this problem.

Answer. We note that on one hand, if we let V denote the volume of water in our tank, we have
V = 100h; consequently, we have that $\frac{dV}{dt} = 100\frac{dh}{dt}$. Conversely, we are given $\frac{dV}{dt}$ directly as
$1 - h/4$; therefore, by combining, we have the differential equation
$$\frac{dh}{dt} + \frac{1}{400}h = \frac{1}{100}.$$
This is linear; therefore, if we multiply both sides by the integrating factor $e^{\int (1/400)\,dt} = e^{t/400}$, we get
$$\frac{dh}{dt}e^{t/400} + \frac{1}{400}e^{t/400}h = \frac{1}{100}e^{t/400}$$
$$\Rightarrow\; \frac{d}{dt}\left(he^{t/400}\right) = \frac{1}{100}e^{t/400}$$
$$\Rightarrow\; he^{t/400} = \int \frac{1}{100}e^{t/400}\,dt = 4e^{t/400} + C$$
$$\Rightarrow\; h = 4 + \frac{C}{e^{t/400}}.$$
As t goes to infinity, this expression converges to 4; therefore the volume, which is 100h,
goes to 400. So our answer is (a).
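We can also see this limit without solving the ODE at all: a short forward-Euler simulation (a sketch in plain Python, with our own step size) drives the height to its equilibrium:

```python
# Forward-Euler simulation of dh/dt = 1/100 - h/400 with h(0) = 0;
# the height should level off at the equilibrium h = 4 (volume 400 ft^3).
h, dt = 0.0, 1.0
for _ in range(200_000):  # 200,000 seconds, i.e. 500 time constants
    h += dt * (1 / 100 - h / 400)

assert abs(h - 4) < 1e-6   # height converges to 4 feet
print("limiting volume:", round(100 * h))  # prints 400
```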

Math 94 Professor: Padraic Bartlett

Lecture 8: Abstract Algebra


Week 8 UCSB 2015

This is the eighth week of the Mathematics Subject Test GRE prep course; here, we run
a very rough-and-tumble review of abstract algebra! As always, this field is much bigger
than one class; accordingly, we focus our attention on key definitions and results.

1 Groups: Definitions and Theorems


Definition. A group is a set G along with some operation · that takes in two elements
and outputs another element of our group, such that we satisfy the following properties:

• Identity: there is a unique identity element e ∈ G such that for any other g ∈ G, we
have e · g = g · e = g.

• Inverses: for any g ∈ G, there is a unique g⁻¹ such that g · g⁻¹ = g⁻¹ · g = e.

• Associativity: for any three a, b, c ∈ G, a · (b · c) = (a · b) · c.

Definition. A subgroup H of a group ⟨G, ·⟩ is any subset H of G such that H is also a
group with respect to the · operation.

Definition. A group ⟨G, ·⟩ is called abelian, or commutative, if it satisfies the following
additional property:

• Commutativity: for any a, b ∈ G, a · b = b · a.

There are many different examples of groups:

Example. 1. The real numbers with respect to addition, which we denote as ⟨R, +⟩,
form a group: it has the identity 0, any element x has an inverse −x, and it satisfies
associativity.

2. Conversely, the real numbers with respect to multiplication, which we denote as ⟨R, ·⟩,
are not a group: the element 0 ∈ R has no inverse, as there is nothing we can multiply
0 by to get to 1!

3. The nonzero real numbers with respect to multiplication, which we denote as ⟨R^×, ·⟩,
form a group! The identity in this group is 1, every element x has an inverse 1/x such
that x · (1/x) = 1, and this group satisfies associativity.

4. The integers with respect to addition, ⟨Z, +⟩, form a group!

5. The integers with respect to multiplication, ⟨Z, ·⟩, do not form a group: for example,
there is no integer we can multiply 2 by to get to 1.

6. The natural numbers N are not a group with respect to either addition or multiplication.
For example: in addition, there is no element −1 ∈ N that we can add to 1
to get to 0, and in multiplication there is no natural number we can multiply 2 by to
get to 1.

7. GLₙ(R), the collection of all n × n invertible real-valued matrices, is a group under
the operation of matrix multiplication. Notice that this group is an example of a
non-abelian group, as there are many matrices for which AB ≠ BA: consider
$$\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}\cdot\begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 0\end{pmatrix} \quad\text{versus}\quad \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix}\cdot\begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix} = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}.$$
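A quick check of the two products in the example above, sketched in plain Python with matrices as nested lists (not a full matrix library):

```python
def matmul(A, B):
    # 2x2 matrix product, with matrices represented as nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, 1], [0, 0]]
B = [[1, 0], [0, 0]]

assert matmul(A, B) == [[0, 0], [0, 0]]
assert matmul(B, A) == [[0, 1], [0, 0]]
assert matmul(A, B) != matmul(B, A)  # matrix multiplication is not commutative
```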

8. SLₙ(R), the collection of all n × n invertible real-valued matrices with determinant
1, is also a group under the operation of matrix multiplication; this is because the
property of having determinant 1 is preserved under taking inverses and multiplication
for matrices.

9. The integers mod n, Z/nZ, form a group with respect to addition! As a reminder, the
object ⟨Z/nZ, +, ·⟩ is defined as follows:

• Your set is the numbers {0, 1, 2, . . . , n − 1}.

• Your addition operation is the operation "addition mod n," defined as follows:
we say that a + b ≡ c mod n if the two integers a + b and c differ by a multiple
of n.
For example, suppose that n = 3. Then 1 + 1 ≡ 2 mod 3, and 2 + 2 ≡ 1 mod 3.

• Similarly, our multiplication operation is the operation "multiplication mod n,"
written a · b ≡ c mod n, which holds whenever a · b and c differ by a multiple of
n.
For example, if n = 7, then 2 · 3 ≡ 6 mod 7, 4 · 4 ≡ 2 mod 7, and 6 · 4 ≡ 3
mod 7.

10. (Z/pZ)^× = {1, . . . , p − 1} is a commutative group with respect to the operation of
multiplication mod p, if and only if p is a prime.
Seeing this is not too difficult, and is a useful thing to do to remind ourselves about
how modular arithmetic works:

• It is easy to see that (Z/pZ)^× satisfies associativity, identity and commutativity,
simply because these properties are "inherited" from the integers Z: i.e.
if a · b = b · a, then surely a · b ≡ b · a mod p, because equality implies equivalence
mod p!

• Therefore, the only property we need to check is inverses. We first deal with the
case where p is not prime. Write p = mn for two positive integers m, n ≠ 1;
notice that because both of these values must be smaller than p if their product
is p, both m and n live in the set {1, . . . , p − 1}.

Consider the element n. In particular, notice that for any k, we have
$$kn \equiv x \!\mod p \;\Rightarrow\; kn - x \text{ is a multiple of } p \;\Rightarrow\; kn - x \text{ is a multiple of } mn$$
$$\Rightarrow\; kn - x \text{ is a multiple of } n \;\Rightarrow\; x \text{ is a multiple of } n.$$
(If none of the above deductions make sense, reason them out in your head!)
Because of this, we can see that n has no inverse in (Z/pZ)^×, as kn is only
congruent to multiples of n, and 1 is not a multiple of n.
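A brute-force sketch of this dichotomy in plain Python: the factor 2 of the composite 6 has no inverse mod 6, while every nonzero element mod the prime 7 does:

```python
# 6 = 2 * 3 is composite, and indeed 2 has no inverse mod 6:
assert all((2 * k) % 6 != 1 for k in range(6))

# 7 is prime, and every nonzero element mod 7 has an inverse:
for x in range(1, 7):
    assert any((x * k) % 7 == 1 for k in range(1, 7))
```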
• The converse — showing that if p is prime, (Z/pZ)^× has inverses — is a little
trickier. We do this as follows: first, we prove the following claim.
Claim. For any a, b ∈ {0, . . . , p − 1}, if a · b ≡ 0 mod p, then at least one of a, b
is equal to 0.

Proof. Take any a, b in {0, . . . , p − 1}. If one of a, b is equal to 0, then we know
that a · b = 0 in the normal "multiplying integers" world that we've lived in our
whole lives. In particular, this means that a · b ≡ 0 mod p as well.
Now, suppose that neither a nor b is equal to 0. Recall,
from grade school, the concept of factorization:
Observation. Take any nonzero natural number n. We can write n as a product
of prime numbers n₁ · . . . · nₖ; we think of these prime numbers n₁, . . . , nₖ as the
"factors" of n. Furthermore, these factors are unique, up to the order we write
them in: i.e. there is only one way to write n as a product of prime numbers, up
to the order in which we write those primes. (For example: while you could say
that 60 can be factored as both 2 · 2 · 3 · 5 and as 3 · 2 · 5 · 2, those two factorizations
are the same if we don't care about the order we write our numbers in.)
In the special case where n = 1, we think of this as already factored into the
"trivial" product of no prime numbers.
Take a, and write it as a product of prime numbers a₁ · . . . · aₖ. Do the same for
b, and write it as a product of primes b₁ · . . . · bₘ. Notice that because a and b
are both numbers that are strictly between 0 and p, the prime p cannot be one of these
factors (because positive multiples of p are at least as large as p!)
In particular, this tells us that the number a · b on one hand can be written as
the product of primes a₁ · . . . · aₖ · b₁ · . . . · bₘ, and on the other hand (because
factorizations into primes are unique, up to ordering!) that there is no p in the
prime factorization of a · b.
Conversely, for any natural number k, the number k · p must have a factor of
p in its prime factorization. This is because if we factor k into prime numbers
k₁ · . . . · kⱼ, we have k · p = k₁ · . . . · kⱼ · p, which is a factorization into prime
numbers and therefore (up to the order we write our primes) is unique!
In particular, this tells us that for any k, the quantities a · b and k · p are distinct;
one of them has a factor of p, and the other does not. Therefore, we have shown
that if both a and b are nonzero, then a · b cannot be equal to a multiple of p —
in other words, a · b is not congruent to 0 modulo p! Therefore, the only way to
pick a, b ∈ {0, . . . , p − 1} such that a · b is congruent to 0 modulo p is if at
least one of them is equal to 0, as claimed.

• From here, the proof that our group has inverses is pretty straightforward. Take
any x ∈ (Z/pZ)^×, and suppose for contradiction that it did not have an inverse.
Look at the multiplication table for x in (Z/pZ)^×:

        1   2   3   ...   p − 1
    x   ?   ?   ?   ...   ?

If x doesn't have an inverse, then 1 does not show up in the above table! The
above table has p − 1 slots, and by our claim above 0 cannot show up in it either;
so if we're trying to fill it without using 0 or 1, we only have p − 2 values to put
in this table, and therefore some value is repeated! In other
words, there must be two distinct values k < l with xl ≡ xk mod p.
Consequently, we have x(l − k) ≡ 0 mod p, which by our above observation
means that one of x, (l − k) is equal to 0. But x is nonzero, as it's actually
in (Z/pZ)^×: therefore, l − k is equal to 0, i.e. l = k. But we said that k, l are
distinct; so we have a contradiction! Therefore, every element x has an inverse,
as claimed.

11. The symmetric group Sₙ is the collection of all of the permutations on the set
{1, . . . , n}, where our group operation is composition. In case you haven't seen this
before:

• A permutation of a set is just a bijective function on that set. For example,
one bijection on the set {1, 2, 3} could be the map f that sends 1 to 2, 2 to 1,
and 3 to 3.

• One way that people often denote functions and bijections is via "arrow" notation:
i.e. to describe the map f that we gave above, we could write
$$f : 1 \mapsto 2, \quad 2 \mapsto 1, \quad 3 \mapsto 3.$$

• This, however, is not the most space-friendly way to write out a permutation. A
much more condensed way to write down a permutation is using something called
cycle notation. In particular: suppose that we want to denote the permutation
that sends a₁ → a₂, a₂ → a₃, . . . , aₙ₋₁ → aₙ, aₙ → a₁, and does not change
any of the other elements (i.e. keeps them all the same.) In this case, we would
denote this permutation using cycle notation as the permutation
$$(a_1\; a_2\; a_3\; \ldots\; a_n).$$

To illustrate this notation, we describe all of the six possible permutations on {1, 2, 3}
using both the arrow and the cycle notations:

    id : 1 ↦ 1, 2 ↦ 2, 3 ↦ 3       (12) : 1 ↦ 2, 2 ↦ 1, 3 ↦ 3       (13) : 1 ↦ 3, 2 ↦ 2, 3 ↦ 1
    (23) : 1 ↦ 1, 2 ↦ 3, 3 ↦ 2     (123) : 1 ↦ 2, 2 ↦ 3, 3 ↦ 1      (132) : 1 ↦ 3, 2 ↦ 1, 3 ↦ 2
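These permutations are easy to play with concretely; the sketch below (plain Python, with permutations stored as dicts) composes two transpositions and recovers a 3-cycle from the table above:

```python
# Permutations of {1, 2, 3} as dicts; compose(f, g) applies g first, then f.
def compose(f, g):
    return {x: f[g[x]] for x in g}

p12 = {1: 2, 2: 1, 3: 3}  # the transposition (12)
p13 = {1: 3, 2: 2, 3: 1}  # the transposition (13)

# (12) composed with (13) is the 3-cycle (132): 1 -> 3, 3 -> 2, 2 -> 1.
assert compose(p12, p13) == {1: 3, 2: 1, 3: 2}
```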

The symmetric group has several useful properties. Two notable ones are the following:

Claim. We can write any σ ∈ Sₙ as a product of transpositions¹.

Claim. (Cayley.) For any finite group G, there is some n such that G is isomorphic to a subgroup of Sₙ.

12. The dihedral group of order 2n, denoted D₂ₙ, is constructed as follows:
Consider a regular n-gon. There are a number of geometric transformations, or
symmetries, that we can apply that send this n-gon to "itself" without stretching or
tearing the shape: i.e. there are several rotations and reflections that, when applied to
an n-gon, do not change the n-gon. For example, given a square, we can rotate the plane
by 0°, 90°, 180°, or 270°, or flip over one of the horizontal, vertical, top-left/bottom-right,
or top-right/bottom-left axes:
(Figure: the eight symmetries of a square whose corners are labeled a, b on top and d, c
on the bottom. Rotating by 0° leaves the labels as ab/dc; rotating by 90° gives bc/ad;
by 180°, cd/ba; by 270°, da/cb. Flipping over the horizontal axis gives dc/ab; over the
vertical axis, ba/cd; over the UL/DR diagonal, ad/bc; over the UR/DL diagonal, cb/da.)

¹ A permutation σ ∈ Sₙ is called a transposition if we can write σ = (a b), for two distinct values
a, b ∈ {1, . . . , n}.

Given two such transformations f, g, we can compose them to get a new transformation
f ∘ g. Notice that because these two transformations each individually send the n-gon
to itself, their composition also sends the n-gon to itself! Therefore composition is a
well-defined operation that we can use to combine two transformations.

(Figure: rotating the square by 90° sends ab/dc to bc/ad; then flipping over the vertical
axis sends bc/ad to cb/da, which is the same square we would get by flipping ab/dc over
the UR/DL diagonal.)

This is a group!

Now that we have some examples of groups down, we list some useful concepts and
definitions for studying groups:

Definition. Take any two groups ⟨G, ·⟩, ⟨H, ⋆⟩, and any map φ : G → H. We say that φ
is a group isomorphism if it satisfies the following two properties:

1. Preserves size: φ is a bijection².

2. Preserves structure: φ, in a sense, sends · to ⋆. To describe this formally, we say
the following:

∀g₁, g₂ ∈ G, φ(g₁ · g₂) = φ(g₁) ⋆ φ(g₂).

This property "preserves structure" in the following sense: suppose that we have two
elements we want to multiply together in H. Because φ is a bijection, we can write
these two elements as φ(g₁), φ(g₂). Our property says that φ(g₁ · g₂) = φ(g₁) ⋆ φ(g₂):
in other words, if we want to multiply our two elements in H together, we can do so
using either the G-operation · by calculating φ(g₁ · g₂), or by using the H-operation
⋆ by calculating φ(g₁) ⋆ φ(g₂).
Similarly, if we want to multiply any two elements g₁, g₂ in G together, we can see
that g₁ · g₂ = φ⁻¹(φ(g₁ · g₂)) = φ⁻¹(φ(g₁) ⋆ φ(g₂)). So, again, we can multiply elements
using either G or H's operation! To choose which operation we use, we just need to
apply φ or φ⁻¹ as appropriate to get to the desired set, and perform our calculations
there.

² Notice that this means that there is an inverse map φ⁻¹ : H → G, defined by φ⁻¹(h) = the unique
g ∈ G such that φ(g) = h.

Definition. Take any two groups ⟨G, ·⟩, ⟨H, ⋆⟩, and any map φ : G → H. We say that φ
is a group homomorphism if it satisfies the "preserves structure" property above.

Theorem. If G, H are groups and φ : G → H is a homomorphism, then ker(φ) = {g ∈
G | φ(g) = id} is a subgroup of G, and for any subgroup S of G, φ(S) = {φ(s) | s ∈ S} is
a subgroup of H.

Definition. A subgroup H of a group G is called normal if for any g ∈ G, the left and
right cosets³ gH, Hg are equal. We write H ◁ G to denote this property.

Theorem. Suppose G is a group and H is a normal subgroup. Define the set G/H to
be the collection of all of the distinct left cosets gH of H in G. This set forms something
called the quotient group of G by H, if we define g₁H · g₂H = (g₁g₂)H. This is a useful
construction, and comes up all the time: for example, you can think of Z/nZ as a quotient
group, where G is Z and H = nZ = {n · k | k ∈ Z}.

Definition. Take any group ⟨G, ·⟩ of order n: that is, any group G consisting of n distinct
elements. We can create a group table corresponding to G as follows:

• Take any ordering r₁, . . . , rₙ of the n elements of G: we use these elements to label our
rows.

• Take any other ordering c₁, . . . , cₙ of the n elements of G: we use these elements to
label our columns. (This ordering is usually the same as that for the rows, but it does
not have to be.)

• Using these two orderings, we create an n × n array, called the group table of G, as
follows: in each cell (i, j), we put the entry rᵢ · cⱼ.

Theorem. Two groups ⟨G, ·⟩, ⟨H, ⋆⟩ are isomorphic if and only if there is a bijection φ :
G → H such that when we apply φ to a group table of G, we get a group table of H.

Theorem. (Lagrange.) Let ⟨G, ·⟩ be a finite group, and g ∈ G be any element of G. Define
the order of g to be the smallest positive value of n such that gⁿ = id. Then the order of g always
divides the total number of elements in G, |G|.
More generally, suppose that H is any subgroup of G. Then |H| divides |G|.

This theorem has a useful special case when we consider the group (Z/pZ)^×:

Theorem. (Fermat's Little Theorem.) Let p be a prime number. Take any a ≠ 0 in
Z/pZ. Then
$$a^{p-1} \equiv 1 \mod p.$$

³ The left coset gH of a subgroup H by an element g is the set {g · h | h ∈ H}. Basically, it's H if you
"act" on each element by g. Right cosets are the same, but with Hg instead.
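Fermat's Little Theorem is easy to spot-check by brute force; the sketch below (plain Python, using the three-argument form of the built-in `pow` for modular exponentiation) also shows why the primality hypothesis matters:

```python
# Check a^(p-1) ≡ 1 mod p for every nonzero a mod p, for several primes.
for p in [2, 3, 5, 7, 11, 13]:
    for a in range(1, p):
        assert pow(a, p - 1, p) == 1  # three-argument pow = modular power

# Primality matters: 2^(6-1) is 2 mod 6, not 1.
assert pow(2, 5, 6) == 2
```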

Theorem. Any finite abelian group G is isomorphic to a direct sum⁴ of groups of the
form $\mathbb{Z}/p_j^{k_j}\mathbb{Z}$. In other words, for any finite abelian group G, we can find primes $p_1, \ldots, p_l$
and natural numbers $k_1, \ldots, k_l$ such that
$$G \cong \mathbb{Z}/p_1^{k_1}\mathbb{Z} \oplus \cdots \oplus \mathbb{Z}/p_l^{k_l}\mathbb{Z}.$$
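For a concrete instance of this theorem, Z/6Z is isomorphic to Z/2Z ⊕ Z/3Z; the sketch below (plain Python) verifies that the Chinese-remainder-style map x ↦ (x mod 2, x mod 3) is a bijective homomorphism:

```python
# The map phi(x) = (x mod 2, x mod 3) from Z/6Z to Z/2Z x Z/3Z.
phi = {x: (x % 2, x % 3) for x in range(6)}

# It is a bijection: the six images are pairwise distinct...
assert len(set(phi.values())) == 6

# ...and a homomorphism: phi(a + b) = phi(a) + phi(b) componentwise.
for a in range(6):
    for b in range(6):
        expected = ((phi[a][0] + phi[b][0]) % 2, (phi[a][1] + phi[b][1]) % 3)
        assert phi[(a + b) % 6] == expected
```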

Groups are not the only algebraic objects people study:

2 Rings and Fields: Definitions and Theorems


Definition. A ring is a set R along with two operations +, · so that

• ⟨R, +⟩ is an abelian group.

• ⟨R, ·⟩ satisfies the associativity and identity properties.

• R satisfies the distributive property: i.e. for any a, b, c ∈ R, we have a · (b + c) =
a · b + a · c.

• The identity for + is not the identity for ·.

Some people will denote a ring with a multiplicative identity as a "ring with unity." I
believe it is slightly more standard to assume that all rings have multiplicative identities,
and in the odd instance that you need to refer to a ring without a multiplicative identity,
to call it a "rng."

Many of the examples we saw of groups are also rings:

Example. 1. The integers with respect to addition and multiplication form a ring, as
do the rational, real, and complex number systems.

2. The Gaussian integers Z[i], consisting of the set of all complex numbers {a +
bi | a, b ∈ Z, i = √−1}, form a ring with respect to addition and multiplication.

3. Z/nZ is a ring for any n.

Definition. An integral domain is any ring R where the following property holds:

• For any a, b ∈ R, if ab = 0, then at least one of a or b is 0.

Example. 1. The integers with respect to addition and multiplication form an integral
domain, as do the rational, real, and complex number systems.
⁴ A group G is called the direct sum of two groups H₁, H₂ if the following properties hold:
• Both H₁, H₂ are normal subgroups of G.
• H₁ ∩ H₂ is the identity; i.e. these two subgroups only overlap at the identity element.
• Any element in G can be expressed as a finite combination of elements from H₁, H₂.
We then write G = H₁ ⊕ H₂.

2. Z/nZ is not an integral domain for any composite n: if we can write n = ab for two
a, b < n, then we have that a · b ≡ 0 mod n, while neither a nor b is a multiple of n (and
thus neither is equivalent to 0).

Definition. A field is any ring R where ⟨R^×, ·⟩ is a commutative group. (By R^×, we mean the set of
all elements in R other than the additive identity.)

Example. 1. The integers with respect to addition and multiplication are not a field.

2. The rational, real, and complex number systems are fields with respect to addition
and multiplication!

3. Z/pZ is a field with respect to addition and multiplication whenever p is prime.

There are many, many theorems about rings and fields; however, the GRE will not
require you to know most of them. Instead, they mostly want you to be familiar with
what these objects are, and how they are defined!
To illustrate how the GRE tests you on these concepts, we run a few practice problems
here:

3 Sample GRE problems


Problem. Consider the set G made out of the four complex numbers {1, −1, i, −i}. This
is a group under the operation of complex multiplication. Which of the following three
statements are true?

1. The map z ↦ z̄ is a homomorphism.

2. The map z ↦ z² is a homomorphism.

3. Every homomorphism G → G is of the form z ↦ zᵏ, for some k ∈ N.

(a) 1 only. (b) 1 and 2 only. (c) 2 and 3 only. (d) 1 and 2 only. (e) 1, 2 and 3.

Answer. We can answer this problem quickly by classifying all possible homomorphisms
φ : G → G. We can first notice that we must send 1 to 1 if we are a homomorphism. To
see this, notice that φ(1) = φ(1 · 1) = φ(1) · φ(1), and therefore by canceling a φ(1) on both
sides we have 1 = φ(1).
Now, consider where to send i. If we have φ(i) = i, then we must have φ(−1) = φ(i²) =
φ(i)φ(i) = i · i = −1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = i³ = −i. So we're the identity
map z ↦ z (i.e. z ↦ z¹).
If we have φ(i) = 1, then we must have φ(−1) = φ(i²) = φ(i)φ(i) = 1 · 1 = 1, and
φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = 1³ = 1. So we're the map that sends everything to 1;
alternately, we're the map z ↦ z⁴.
Similarly, if we have φ(i) = −1, then we must have φ(−1) = φ(i²) = φ(i)φ(i) =
(−1) · (−1) = 1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = (−1)³ = −1. So we're the map z ↦ z².
The last possibility is φ(i) = −i; then we must have φ(−1) = φ(i²) =
φ(i)φ(i) = (−i) · (−i) = −1, and φ(−i) = φ(i³) = φ(i)φ(i)φ(i) = (−i)³ = i. So we're the map
z ↦ z̄, or alternately the map z ↦ z³.
As a result, claims 1, 2 and 3 are all true; so our answer is (e).
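The case analysis above can be brute-forced; the sketch below (plain Python, using the built-in complex type, with our own helper `hom_from`) confirms that all four candidate maps really are homomorphisms and that each one is some power map z ↦ zᵏ:

```python
# Brute-force all homomorphisms G -> G for G = {1, -1, i, -i} under multiplication.
G = [1, -1, 1j, -1j]

def hom_from(image_of_i):
    # Since i generates G, a homomorphism is determined by where i goes:
    # i^k must map to image_of_i^k.
    return {i_pow: image_of_i ** k for k, i_pow in enumerate([1, 1j, -1, -1j])}

homs = []
for target in G:
    phi = hom_from(target)
    if all(phi[a * b] == phi[a] * phi[b] for a in G for b in G):
        homs.append(phi)

assert len(homs) == 4  # all four candidates really are homomorphisms
# Each one is z -> z^k for some k in {1, 2, 3, 4}:
for phi in homs:
    assert any(all(phi[z] == z ** k for z in G) for k in range(1, 5))
```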

Problem. Let R be a ring. Define a right ideal of R as any subset U of R such that

• U is an additive subgroup of R.

• For any r ∈ R, u ∈ U, we have ur ∈ U.

Suppose that R only has two distinct right ideals. Which of the following properties
must hold for R?

1. R contains infinitely many elements.

2. R is commutative.

3. R is a division ring; that is, every nonzero element in R has a multiplicative inverse.

(a) 1 only. (b) 2 only. (c) 3 only. (d) 2 and 3 only. (e) 1, 2 and 3.

Answer. We first notice that any ring always has two right ideals, namely {0} and R.
The first property is eliminated by noticing that R = Z/2Z is a ring. Its only additive
subgroups are {0} and R, so in particular those are its only two right ideals, and this ring is
clearly finite.
The second property is eliminated by recalling the quaternions H, a noncommutative
ring whose only right ideals are {0} and H!
Finally, we can verify that the third property must hold. To see this, take any a, and
consider the set aR = {ar | r ∈ R}. This is an additive subgroup, and also a right ideal!
Therefore it is either the all-zero subgroup (only if a = 0, as otherwise a · 1 = a ≠ 0) or all
of R. But this means that 1 ∈ aR: there is some r ∈ R such that ar = 1, i.e. r = a⁻¹.
So every nonzero a has an inverse!
This leaves 3 as the only possibility.

Math 94 Professor: Padraic Bartlett

Lecture 10: Everything Else


Week 10 UCSB 2015

This is the tenth week of the Mathematics Subject Test GRE prep course; here, we
quickly review a handful of useful concepts from the fields of probability, combinatorics,
and set theory!
As always, each of these fields is something you could spend years studying; we present
here a few very small slices of each topic that are particularly key.

1 A Crash Course in Set Theory


Sets are, in a sense, what mathematics is founded on. For the GRE, much of the intricate
machinery of set theory (the axioms of set theory, the axiom of choice, etc.) isn't particularly
relevant; instead, most of what we need to review is simply the language and notation
of set theory.

Definition. A set S is simply any collection of elements. We denote a set S by its elements,
which we enclose in a pair of curly braces. For example, the set of female characters in
Measure for Measure is

{Isabella, Mariana, Juliet, Francisca}.

Another way to describe a set is not by listing its elements, but by listing its properties.
For example, the even integers greater than π can be described as follows:

{x | π < x, x ∈ Z, 2 divides x}.

Definition. If we have two sets A, B, we write A ⊆ B if every element of A is an element
of B. For example, Z ⊆ R.

Definition. A specific set that we often care about is the empty set ∅, i.e. the set containing
no elements. One particular quirk of the empty set is that any statement of the
form "∀x ∈ ∅, . . ." will always be vacuously true, as it is impossible to disprove (as we disprove
∀ claims by using ∃ quantifiers!) For example, the statement "every element of the empty
set is delicious" is true. Dumb, but true!
Some other frequently-occurring sets are the open intervals (a, b) = {x | x ∈ R, a < x <
b} and closed intervals [a, b] = {x | x ∈ R, a ≤ x ≤ b}.

Definition. Given two sets A, B, we can form several other particularly useful sets:

• The difference of A and B, denoted A − B or A \ B. This is the set {x | x ∈ A, x ∉ B}.

• The intersection of A and B, denoted A ∩ B, is the set {x | x ∈ A and x ∈ B}.

• The union of A and B, denoted A ∪ B, is the set {x | x ∈ A or x ∈ B, or possibly both}.

• The cartesian product of A and B, denoted A × B, is the set of all ordered pairs
of the form (a, b); that is, {(a, b) | a ∈ A, b ∈ B}.

• Sometimes, we will have some larger set B (like R) out of which we will be picking
some subset A (like [0, 1]). In this case, we can form the complement of A with
respect to B, namely Aᶜ = {b ∈ B | b ∉ A}.

With the concept of a set defined, we can define functions as well:

Definition. A function f with domain A and codomain B, formally speaking, is a collection
of pairs (a, b), with a ∈ A and b ∈ B, such that there is exactly one pair (a, b) for every
a ∈ A. More informally, a function f : A → B is just a map which takes each element in A
to some element of B.

Examples.

• f : Z → N given by f(n) = 2|n| + 1 is a function.

• g : N → N given by g(n) = 2|n| + 1 is also a function. It is in fact a different function
than f, because it has a different domain!

• The function h depicted below by the three arrows is a function, whose three-element
domain contains 1 and φ, and whose three-element codomain contains 24 and Batman:

(Figure: an arrow diagram for h; the element 1 maps to the remaining codomain element,
while the other two domain elements both map to 24. Nothing maps to Batman.)

This may seem like a silly example, but it's illustrative of one key concept: functions are
just maps between sets! Often, people fall into the trap of assuming that functions
have to have some nice "closed form" like x³ sin(x) or something, but that's not true!
Often, functions are either defined piecewise, or have special cases, or are generally fairly
ugly/awful things; in these cases, the best way to think of them is just as a collection of
arrows from one set to another, like we just did above.

Functions have several convenient properties:

Definition. We call a function f injective if it never hits the same point twice – i.e. for
every b ∈ B, there is at most one a ∈ A such that f(a) = b.

Example. The function h from before is not injective, as it sends both φ and the other
unnamed domain element to 24:

(Figure: the arrow diagram for h again, with two arrows landing on 24.)

However, if we add a new element π to our codomain, and make φ map to π, our function
is now injective, as no two elements in the domain are sent to the same place:

(Figure: the modified arrow diagram; φ now points to π, so the three domain elements go
to three distinct outputs.)

Definition. We call a function f surjective if it hits every single point in its codomain –
i.e. if for every b ∈ B, there is at least one a ∈ A such that f(a) = b.
Alternately: define the image of a function as the collection of all points that it maps
to. That is, for a function f : A → B, define the image of f, denoted f(A), as the set
{b ∈ B | ∃a ∈ A such that f(a) = b}.
Then a surjective function is any map whose image is equal to its codomain: i.e. f :
A → B is surjective if and only if f(A) = B.

Example. The function h from before is not surjective, as it doesn't send anything to
Batman:

(Figure: the arrow diagram for h again; no arrow points at Batman.)

However, if we add a new element ρ to our domain, and make ρ map to Batman, our
function is now surjective, as it hits all of the elements in its codomain:

(Figure: the modified arrow diagram; ρ maps to Batman, so every codomain element is hit.)
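These two properties are easy to test mechanically for finite functions; the sketch below (plain Python, with dicts as functions and made-up stand-in labels for the symbols in the diagrams above) checks injectivity and surjectivity:

```python
# Functions on finite sets as dicts; "diamond"/"heart"/"phi" are stand-in
# labels for the symbols in the diagrams above.
def is_injective(f):
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    return set(f.values()) == set(codomain)

h = {1: "diamond", "heart": 24, "phi": 24}  # two elements collide at 24
codomain = {24, "diamond", "Batman"}

assert not is_injective(h)             # 24 is hit twice
assert not is_surjective(h, codomain)  # nothing is sent to Batman
assert is_injective({1: "a", 2: "b"})
```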

Definition. A function is called bijective if it is injective and surjective.

Definition. We say that two sets A, B are the same size (formally, we say that they are of
the same cardinality), and write |A| = |B|, if and only if there is a bijection f : A → B.

Not all sets are the same size:


Observation. If A, B are a pair of finite sets that contain different numbers of elements,
then |A| ≠ |B|.
If A is a finite set and B is infinite, then |A| ≠ |B|.
If A is an infinite set such that there is a bijection A → N, call A countable. If A is
countable and B is a set that has a bijection to R, then |A| ≠ |B|.
We can use this notion of size to make some more definitions:
Definition. We say that |A| ≤ |B| if and only if there is an injection f : A → B. Similarly,
we say that |A| ≥ |B| if and only if there is a surjection f : A → B.
This motivates the following theorem:
Theorem (Cantor–Schröder–Bernstein). Suppose that A, B are two sets such that there
are injective functions f : A → B, g : B → A. Then |A| = |B|; i.e. there is some bijection
h : A → B.

2 A Crash Course in Combinatorics


Combinatorics, very loosely speaking, is the art of how to count things. For the GRE, a
handful of fairly simple techniques will come in handy:
• (Multiplication principle.) Suppose that you have a set A, each element of which
can be broken up into n ordered pieces (a_1, ..., a_n). Suppose furthermore that the i-th
piece has k_i total possible states for each i, and that our choices for the i-th stage do
not interact with our choices for any other stage. Then there are

k_1 · k_2 · ... · k_n = ∏_{i=1}^n k_i

total elements in A.
To give an example, consider the following problem:

Problem. Suppose that we have n friends and k different kinds of postcards (with
arbitrarily many postcards of each kind.) In how many ways can we mail postcards to
all of our friends?

A valid "way" to mail postcards to friends is some way to assign each friend a kind of
postcard, so that each friend is assigned at least one postcard (because
we're mailing each of our friends a postcard) and no friend is assigned two different
kinds at the same time. In other words, a "way" to mail postcards is just a
function from the set¹ [n] = {1, 2, 3, ..., n} of friends to the set [k] = {1, 2, 3, ..., k}
of postcard kinds!
In other words, we want to find the size of the following set:

A = { all of the functions that map [n] to [k] }.

We can do this! Think about how any function f : [n] → [k] is constructed. For each
value in [n] = {1, 2, ..., n}, we have to pick exactly one value from [k]. Doing this for
each value in [n] completely determines our function; furthermore, any two functions
f, g are different if and only if there is some value m ∈ [n] at which we made a different
choice (i.e. where f(m) ≠ g(m).)

k choices · k choices · ... · k choices   (n total slots)

Consequently, we have

k · k · ... · k = k^n

total ways in which we can construct distinct functions. This gives us the answer k^n
to our problem!
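This count is easy to double-check by brute force. A quick Python sketch (the helper name `count_functions` is mine, not from the notes), representing each function as a tuple of n choices from [k]:

```python
from itertools import product

def count_functions(n, k):
    # Enumerate every function [n] -> [k]: each function is a tuple of
    # n choices, one value from {1, ..., k} per input.
    return sum(1 for _ in product(range(1, k + 1), repeat=n))

# Brute force agrees with the multiplication principle's k**n:
assert all(count_functions(n, k) == k ** n
           for n in range(4) for k in range(1, 5))
```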

• (Summation principle.) Suppose that you have a set A that you can write as the
union² of several smaller disjoint³ sets A_1, ..., A_n.
Then the number of elements in A is just the summed number of elements in the Ai
sets. If we let |S| denote the number of elements in a set S, then we can express this
in a formula:

|A| = |A1 | + |A2 | + . . . + |An |.

We work one simple example:


¹ Some useful notation: [n] denotes the collection of all integers from 1 to n, i.e. {1, 2, ..., n}.
² Given two sets A, B, we denote their union, A ∪ B, as the set containing all of the elements in either
A or B, or both. For example, {2} ∪ {lemur} = {2, lemur}, while {1, α} ∪ {α, lemur} = {1, α, lemur}.
³ Sets are called disjoint if they have no elements in common. For example, {2} and {lemur} are disjoint,
while {1, α} and {α, lemur} are not disjoint.

Question 1. Pizzas! Specifically, suppose Pizza My Heart (a local chain / great pizza
place) has the following deal on pizzas: for $7, you can get a pizza with any two
different vegetable toppings, or any one meat topping. There are m meat choices and
v vegetable choices. As well, with any pizza you can pick one of c cheese choices.
How many different kinds of pizza are covered by this sale?

Using the summation principle, we can break our pizzas into two types: pizzas with
one meat topping, or pizzas with two vegetable toppings.
For the meat pizzas, we have m · c possible pizzas, by the multiplication principle (we
pick one of m meats and one of c cheeses.)
For the vegetable pizzas, we have C(v, 2) · c possible pizzas (we pick two different vegetables
out of v vegetable choices, where the order in which we choose them doesn't matter; we
also choose one of c cheeses.)
Therefore, in total, we have c · (m + C(v, 2)) possible pizzas!
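This count can be verified by direct enumeration; a small sketch (the function and variable names are mine):

```python
from itertools import combinations
from math import comb

def count_pizzas(m, v, c):
    # Summation principle: one-meat pizzas plus two-vegetable pizzas,
    # each combined with one of c cheese choices (multiplication principle).
    meat = [(mt, ch) for mt in range(m) for ch in range(c)]
    veg = [(pair, ch) for pair in combinations(range(v), 2)
           for ch in range(c)]
    return len(meat) + len(veg)

# Enumeration agrees with the closed form c * (m + C(v, 2)):
assert all(count_pizzas(m, v, c) == c * (m + comb(v, 2))
           for m in range(4) for v in range(5) for c in range(1, 4))
```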

• (Double-counting principle.) Suppose that you have a set A, and two different
expressions that count the number of elements in A. Then those two expressions are
equal.
Again, we work a simple example:

Question 2. Without using induction, prove the following equality:

Σ_{i=1}^n i = n(n + 1)/2

First, make an (n + 1) × (n + 1) grid of dots:

[figure: a square grid of dots]

How many dots are in this grid? On one hand, the answer is easy to calculate: it's
(n + 1) · (n + 1) = n² + 2n + 1.
On the other hand, suppose that we group dots by the following diagonal lines:

[figure: the same grid, with the dots grouped along anti-diagonals]
The number of dots in the top-left line is just one; the number in the line directly
beneath that line is two, the number directly beneath that line is three, and so on/so
forth until we get to the line containing the bottom-left and top-right corners, which
contains n + 1 dots. From there, as we keep moving right, our lines go down by one in
size each time until we get to the line containing only the bottom-right corner, which
again has just one point.
So, if we use the summation principle, we have that there are
1 + 2 + 3 + ... + (n − 1) + n + (n + 1) + n + (n − 1) + ... + 3 + 2 + 1
points in total.
Therefore, by our double-counting principle, we have just shown that
n² + 2n + 1 = 1 + 2 + 3 + ... + (n − 1) + n + (n + 1) + n + (n − 1) + ... + 3 + 2 + 1.
Rearranging the right-hand side using summation notation lets us express this as

n² + 2n + 1 = (n + 1) + 2 Σ_{i=1}^n i;

subtracting n + 1 from both sides and dividing by 2 gives us finally

(n² + n)/2 = Σ_{i=1}^n i,

which is our claim!
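The diagonal grouping itself can be checked numerically; a small sketch, assuming we index the diagonals by the constant value of i + j (the helper name is mine):

```python
def diagonal_count(n):
    # Count the (n+1) x (n+1) grid of dots by anti-diagonals i + j = d:
    # the diagonal with i + j = d contains min(d, 2n - d) + 1 dots,
    # growing 1, 2, ..., n+1 and then shrinking back down to 1.
    return sum(min(d, 2 * n - d) + 1 for d in range(2 * n + 1))

# Both counts of the grid agree, as the double-counting argument claims:
assert all(diagonal_count(n) == (n + 1) ** 2 for n in range(50))
```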


• (Pigeonhole principle, simple version): Suppose that kn + 1 pigeons are placed into
n pigeonholes. Then some hole has at least k + 1 pigeons in it. (In general, replace
“pigeons” and “pigeonholes” with any collection of objects that you’re placing in
various buckets.)
We look at an example:
Question 3. Suppose that "friendship" is⁴ a symmetric relation: i.e. that whenever
a person A is friends with a person B, B is also friends with A. Also, suppose that
you are never friends with yourself⁵ (i.e. that friendship is antireflexive.)
⁴ Magic!
⁵ Just for this problem. Be friends with yourself in real life.
Then, in any set S of greater than two people, there are at least two people with the
same number of friends in S.

Let |S| = n. Then every person in S has between 0 and n − 1 friends in S. Also notice
that we can never simultaneously have one person with 0 friends and one person with
n − 1 friends at the same time, because if someone has n − 1 friends in S, they must
be friends with everyone besides themselves.
Therefore, there are at most n − 1 possible values for a person's number of friends, and there are n
people total: by the pigeonhole principle, if we think of people as the "pigeons" and
group them by their numbers of friends (i.e. the "pigeonholes" are this grouping by
numbers of friends), there must be some pair of people whose friendship numbers are
equal.
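This is easy to sanity-check empirically on random friendship graphs (the helper `has_repeated_degree` is my own name, and the coin-flip model is just one way to generate examples):

```python
import random
from itertools import combinations

def has_repeated_degree(n, seed):
    # Build a random symmetric, antireflexive "friendship" relation on n
    # people and check whether two people have the same number of friends.
    rng = random.Random(seed)
    degree = [0] * n
    for a, b in combinations(range(n), 2):
        if rng.random() < 0.5:       # flip a coin for each potential friendship
            degree[a] += 1
            degree[b] += 1
    return len(set(degree)) < n      # some friend-count value repeats

# As the pigeonhole argument proves, this always holds (for any n >= 2):
assert all(has_repeated_degree(n, s) for n in range(2, 8) for s in range(30))
```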

Some sorts of sets are very frequently counted:

• If we have a set of n objects, there are n! = n · (n − 1) · ... · 1 many ways to order
this set. For example, the set {a, b, c} has 3! = 6 orderings:

abc, acb, bac, bca, cab, cba.

• Suppose that we have a set of n objects, and we want to pick k of them without
repetition, in order. Then there are n · (n − 1) · ... · (n − (k − 1)) many ways to choose
them: we have n choices for the first, n − 1 for the second, and so on/so forth until our
k-th choice (for which we have n − (k − 1) = n − k + 1 choices.) We can alternately express this as
n!/(n − k)!; you can see this algebraically by dividing n! by (n − k)!, or conceptually by thinking
of our choice process as actually ordering all n elements (the n! in our fraction) and
then forgetting about the ordering on all of the elements after the first k, as we didn't
pick them (this divides by (n − k)!.)

• Suppose that we have a set of n objects, and we want to pick k of them without
repetition and without caring about the order in which we pick these k elements.
Then there are n!/(k!(n − k)!) many ways for this to happen. We denote this number as the
binomial coefficient C(n, k).

• Finally, suppose that we have a set of n objects, and we want to pick k of them in order, where
we can pick an element multiple times (i.e. with repetition.) Then there are n^k many
ways to do this, by our multiplication principle from before.
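Python's standard library exposes these counts directly; a small sketch tying each formula above to a library call, for one sample choice of n and k:

```python
from math import factorial, comb, perm

n, k = 7, 3
# Orderings of an n-element set:
assert factorial(n) == 5040
# Ordered picks without repetition, n!/(n-k)!:
assert perm(n, k) == factorial(n) // factorial(n - k) == 210
# Unordered picks without repetition, the binomial coefficient C(n, k):
assert comb(n, k) == factorial(n) // (factorial(k) * factorial(n - k)) == 35
# Ordered picks with repetition:
assert n ** k == 343
```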

3 A Crash Course in Probability


We give the basics of probability here.

Definition. A finite probability space consists of two things:

• A finite set Ω.

• A measure Pr on Ω, such that Pr(Ω) = 1. In case you haven't seen this before,
saying that Pr is a measure is a way of saying that Pr is a function P(Ω) → [0, ∞), such
that the following properties are satisfied:
– Pr(∅) = 0.
– For any countable collection {X_i}_{i=1}^∞ of subsets of Ω, Pr(⋃_{i=1}^∞ X_i) ≤ Σ_{i=1}^∞ Pr(X_i).
– For any countable collection {X_i}_{i=1}^∞ of pairwise disjoint subsets of Ω, Pr(⋃_{i=1}^∞ X_i) = Σ_{i=1}^∞ Pr(X_i).

For a general probability space, i.e. one that may not be finite, the definition is almost
completely the same: the only difference is that Ω is not restricted to be finite, while Pr
becomes a function defined only on the "measurable" subsets of Ω. (For the GRE, you can
probably assume that any set you run into is "measurable." There are some pathological
constructions in set theory that can be nonmeasurable; talk to me to learn more about
these!)

For example, one probability distribution on Ω = {1, 2, 3, 4, 5, 6} could be the distribution
that believes that Pr({i}) = 1/6 for each individual i, and more generally that
Pr(S) = |S|/6 for any subset S of Ω. In this sense, this probability distribution is capturing
the idea of rolling a fair six-sided die, and seeing what comes up.
This sort of "fair" distribution has a name: namely, the uniform distribution!
Definition. The uniform distribution on a finite space Ω is the probability space that
assigns the measure |S|/|Ω| to every subset S of Ω. In a sense, this measure thinks that
any two elements in Ω are "equally likely;" think about why this is true!
We have some useful notation and language for working with probability spaces:
Definition. An event S is just any subset of a probability space. For example, in the
six-sided-die probability distribution discussed earlier, the set {2, 4, 6} is an event; you can
think of this as the event where our die comes up as an even number. The probability of
an event S occurring is just Pr(S); i.e. the probability that our die when rolled is even is
just Pr({2, 4, 6}) = 3/6 = 1/2, as expected.
Notice that by definition, as Pr is a measure, for any two events A, B, we always have
Pr(A ∪ B) ≤ Pr(A) + Pr(B). In other words, given two events A, B, the probability of
either A or B happening (or both!) is at most the probability that A happens, plus the
probability that B happens.

Definition. A real-valued random variable X on a probability space Ω is simply any
function Ω → R.
Given any random variable X, we can talk about the expected value of X, denoted E(X); that is,
the "average value" of X on Ω, where we use Pr to give ourselves a good notion of what
"average" should mean. Formally, we define this as the following sum:

E(X) = Σ_{ω∈Ω} Pr(ω) · X(ω).

For example, consider our six-sided-die probability space again, and the random variable X
defined by X(i) = i (in other words, X is the random variable that outputs the top face of
the die when we roll it.)
The expected value of X would be

Σ_{ω∈Ω} Pr(ω) · X(ω) = (1/6)·1 + (1/6)·2 + (1/6)·3 + (1/6)·4 + (1/6)·5 + (1/6)·6 = 21/6 = 7/2.

In other words, rolling a fair six-sided die once yields an average face value of 3.5.

Definition. Given a random variable X, if µ = E(X), the variance σ²(X) of X is just
E((X − µ)²). This can also be expressed as E(X²) − (E(X))², via some simple algebraic
manipulations.
The standard deviation of X, σ(X), is just the square root of the variance.
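These formulas are quick to check on the fair-die example, using exact fractions (the variable names below are mine):

```python
from fractions import Fraction
from math import sqrt

omega = range(1, 7)
pr = Fraction(1, 6)                    # uniform measure on the die faces

E = sum(pr * w for w in omega)         # expected value E(X)
E_sq = sum(pr * w * w for w in omega)  # E(X^2)
var = E_sq - E * E                     # variance, E(X^2) - (E(X))^2

assert E == Fraction(7, 2)             # average face value 3.5, as above
assert var == Fraction(35, 12)
sigma = sqrt(var)                      # standard deviation of one die roll
```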

Definition. Given a random variable X : Ω → R on a probability space Ω, we can define
the cumulative distribution function for X, denoted F_X(t), as

F_X(t) = Pr(X(ω) ≤ t).

Given this function, we can define the probability density function f_X(t) as (d/dt) F_X(t).
Notice that for any a, b, we have

Pr(a < X(ω) ≤ b) = ∫_a^b f_X(t) dt.

A random variable has a uniform distribution if its probability density function is a
constant; this expresses the idea that uniformly-distributed things "don't care" about the
differences between elements of our probability space.
A random variable X with standard deviation σ and expectation µ has a normal distribution
if its probability density function has the form

f_X(t) = (1/(σ√(2π))) · e^(−(t − µ)²/(2σ²)).

This generates the standard "bell-curve" picture that you've seen in tons of different
settings. One useful observation about normally-distributed events is that about 68% of the
events occur within one standard deviation of the mean (i.e. Pr(µ − σ < X ≤ µ + σ) ≈ 0.68),
about 95% of events occur within two standard deviations of the mean, and about 99.7%
of events occur within three standard deviations of the mean.
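The 68–95–99.7 figures can be verified by integrating the normal density, which reduces to the error function via the standard identity Pr(|X − µ| ≤ kσ) = erf(k/√2); a quick sketch:

```python
from math import erf, sqrt

def within_k_sigma(k):
    # Probability that a normal variable lands within k standard
    # deviations of its mean, via the error-function identity.
    return erf(k / sqrt(2))

assert abs(within_k_sigma(1) - 0.68) < 0.01
assert abs(within_k_sigma(2) - 0.95) < 0.01
assert abs(within_k_sigma(3) - 0.997) < 0.001
```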

Definition. For any two events A, B that occur with nonzero probability, define Pr(A
given B), denoted Pr(A|B), as the likelihood that A happens given that B happens as well.
Mathematically, we define this as follows:

Pr(A|B) = Pr(A ∩ B) / Pr(B).

In other words, we are taking as our probability space all of the events for which B happens,
and measuring how many of them also have A happen.

Definition. Take any two events A, B that occur with nonzero probability. We say that A
and B are independent if knowledge about A is useless in determining knowledge about
B. Mathematically, we can express this as follows:

Pr(A) = Pr(A|B).

Notice that this is equivalent to asking that

Pr(A) · Pr(B) = Pr(A ∩ B).

Definition. Take any n events A_1, A_2, ..., A_n that each occur with nonzero probability. We
say that these n events are mutually independent if knowledge about any of these
A_i events is useless in determining knowledge about any other A_j. Mathematically, we can
express this as follows: for any i_1, ..., i_k and j ≠ i_1, ..., i_k, we have

Pr(A_j) = Pr(A_j | A_{i_1} ∩ ... ∩ A_{i_k}).

It is not hard to prove the following results:

Theorem. A collection of n events A_1, A_2, ..., A_n is mutually independent if and only if
for any distinct indices i_1, ..., i_k ∈ {1, ..., n}, we have

Pr(A_{i_1} ∩ ... ∩ A_{i_k}) = ∏_{j=1}^k Pr(A_{i_j}).

Theorem. Given any event A in a probability space Ω, let A^c = {ω ∈ Ω | ω ∉ A} denote
the complement of A.
A collection of n events A_1, A_2, ..., A_n is mutually independent if and only if their
complements {A_1^c, ..., A_n^c} are mutually independent.

It is also useful to note the following non-result:

Not-theorem. Pairwise independence does not imply mutual independence! In other words, it is
possible for a collection of events A_1, ..., A_n to all be pairwise independent (i.e. Pr(A_i ∩
A_j) = Pr(A_i) Pr(A_j) for any i ≠ j) but not mutually independent!

Example. There are many, many examples. One of the simplest is the following: consider
the probability space generated by rolling two fair six-sided dice, where any pair (i, j) of
faces comes up with probability 1/36.
Consider the following three events:

• A, the event that the first die comes up even.

• B, the event that the second die comes up even.

• C, the event that the sum of the two dice is odd.

Each of these events clearly has probability 1/2. Moreover, the probabilities of A ∩ B, A ∩ C
and B ∩ C are all clearly 1/4: in the first case we are asking that both dice come up even,
in the second we are asking for (even, odd) and in the third for (odd, even), all of
which happen 1/4 of the time. So these events are pairwise independent, as the probability
that any two happen is just the product of their individual probabilities.
However, A ∩ B ∩ C is impossible, as A ∩ B forces the sum of our two dice to be even!
So Pr(A ∩ B ∩ C) = 0 ≠ Pr(A) Pr(B) Pr(C) = 1/8, and therefore the events are not mutually
independent.
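The sample space is small enough to verify this counterexample by enumerating all 36 outcomes; a sketch (the event encodings below are mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls

def pr(event):
    # Probability of an event under the uniform measure on two dice.
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] % 2 == 0            # first die even
B = lambda o: o[1] % 2 == 0            # second die even
C = lambda o: (o[0] + o[1]) % 2 == 1   # sum of the dice is odd

# Pairwise independent: each pairwise intersection has probability 1/4...
assert pr(lambda o: A(o) and B(o)) == pr(A) * pr(B) == Fraction(1, 4)
assert pr(lambda o: A(o) and C(o)) == pr(A) * pr(C) == Fraction(1, 4)
assert pr(lambda o: B(o) and C(o)) == pr(B) * pr(C) == Fraction(1, 4)
# ...but not mutually independent: the triple intersection is impossible.
assert pr(lambda o: A(o) and B(o) and C(o)) == 0 != pr(A) * pr(B) * pr(C)
```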

MATH GRE BOOTCAMP: LECTURE NOTES

IAN COLEY

1. Day 1: Single variable calculus


Topics covered: limits, derivatives, implicit differentiation, related rates, the intermediate
value theorem, the mean value theorem, optimisation, L'Hôpital's rule,
inverse functions and their derivatives, logarithms and exponential functions and
their derivatives.

1.1. Basics.

Definition 1.1. Let f : R → R be a function. We say that lim_{x→a} f(x) = L if for every
ε > 0, there exists δ > 0 such that whenever 0 < |x − a| < δ, we have |f(x) − L| < ε.
We will worry about limits of sequences on another day.
Definition 1.2. We say that f : R → R is continuous at x = a if lim_{x→a} f(x) = f(a).

Problem 1.3. At what points is the following function continuous?

f(x) = x² if x ∈ Q, and f(x) = x/5 if x ∈ R \ Q.
This sort of problem (coupled with having taken real analysis) should remind you
that the heuristic 'continuous means you don't pick up your pen' doesn't work in
sophisticated situations.
There is also a notion of right continuous and left continuous, where we use only
one-sided limits. This is more easily drawn than defined (and they also have no
analogue in multivariable calculus, so we will omit the full definition). If this were a
blackboard, there would be a better example here.
When are functions discontinuous? Jump discontinuities (almost always in piecewise
functions), infinite discontinuities, and removable discontinuities (holes). One
could also consider a function discontinuous at the points where it ceases to exist, e.g.
f(x) = √x for x < 0. For examples of continuous functions, think
of almost literally any function: polynomials, trigonometric functions, logarithms,
exponential functions, etc.
Last Updated: December 5, 2018.

It is likely unimportant for the GRE, but it might be nice to recall the squeeze
theorem just in case:
Theorem 1.4 (Squeeze Theorem). Suppose that f, g, h : R → R are three functions
with f(x) ≤ g(x) ≤ h(x) in a neighbourhood of x = a. If lim_{x→a} f(x) = lim_{x→a} h(x) = L,
then lim_{x→a} g(x) exists and equals L as well.

This is particularly useful when f(x) and h(x) are continuous at x = a with f(a) = h(a)
(or one is even constant) but g(x) isn't.
Problem 1.5. Compute lim_{x→0} x · sin(1/x).
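A quick numeric illustration of the squeeze bound here, namely |x sin(1/x)| ≤ |x| (a sketch, not part of the original notes):

```python
from math import sin

# The oscillating factor sin(1/x) is bounded by 1, so the product is
# squeezed between -|x| and |x|, forcing the limit at 0 to be 0.
for x in [0.1, 0.01, 0.001, 1e-6]:
    assert abs(x * sin(1 / x)) <= abs(x)
```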

The second main definition we need for calculus is that of differentiable functions.

Definition 1.6. We say that f : R → R is differentiable at x = a if

lim_{x→a} (f(x) − f(a)) / (x − a)

exists. Alternatively, we can ask that lim_{h→0} (f(a + h) − f(a)) / h exists. We write f′(a) for
this value.
Problem 1.7. Prove that these two definitions agree.
Problem 1.8. Prove that if f(x) is differentiable at x = a, it is continuous at x = a.
When are functions non-differentiable? By the previous exercise, when they're
discontinuous. It's important to remember to always check continuity first, as in the
following:
Problem 1.9. Describe the set of solutions (a, b, c) ∈ R³ such that the following
function is continuous and differentiable (everywhere):

f(x) = ax² + bx + c if x ≤ 1, and f(x) = x log x if x > 1.

Solution. The pair (a, b) determines the equality of derivatives at x = 1, and c
determines the continuity. To check continuity, we have that f(1) = a + b + c = 1 ·
log(1), so that a + b + c = 0. Second, by the product rule we have (x log x)′ = 1 + log x,
so for differentiability we need f′(1) = 2a + b = 1 + log(1) = 1, so that 2a + b = 1.
We can solve b = 1 − 2a, so a + b + c = a + (1 − 2a) + c = 0, hence −a + c = −1, so
c = a − 1 and b = 1 − 2a. Thus the set of solutions can be written (a, 1 − 2a, a − 1).
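A finite-difference sanity check of one member of this family (with the minus signs written out, the family is (a, 1 − 2a, a − 1); the helper `make_f` is my own name):

```python
from math import log

def make_f(a):
    # One solution from the family (a, 1 - 2a, a - 1).
    b, c = 1 - 2 * a, a - 1
    return lambda x: a * x * x + b * x + c if x <= 1 else x * log(x)

f = make_f(3.0)
h = 1e-6
# Continuity at x = 1: both one-sided values agree (both are near 0).
assert abs(f(1 - h) - f(1 + h)) < 1e-5
# Differentiability: the one-sided difference quotients agree (both near 1).
left = (f(1) - f(1 - h)) / h
right = (f(1 + h) - f(1)) / h
assert abs(left - right) < 1e-4
```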
There are two basic theorems about continuous and differentiable functions we
will state now, and a third later.

Theorem 1.10 (Intermediate Value Theorem). Let f : [a, b] → R be a continuous
function. Assume that f(a) < f(b). Then for any y ∈ [f(a), f(b)], there exists
c ∈ [a, b] such that f(c) = y.
(One common use: certifying that a continuous function has a zero somewhere in a given interval.)
Theorem 1.11 (Mean Value Theorem). Let f : [a, b] → R be a continuous function
that is differentiable except possibly at the endpoints. Then there exists a point c ∈ (a, b) such that
f′(c) · (b − a) = f(b) − f(a).
I would call these theorems ‘fundamental’ but we all know that’s reserved for a
bit later.

1.2. Derivatives. Of course we all remember the basic rules for doing derivatives,
but let's recall them anyway: let f, g : R → R be two functions.

• (f ± g)′(x) = f′(x) ± g′(x)

• (c · f)′(x) = c · f′(x) for all c ∈ R

• (f · g)′(x) = f(x) · g′(x) + f′(x) · g(x)

• (f/g)′(x) = (g(x) · f′(x) − f(x) · g′(x)) / g(x)² for g(x) ≠ 0

• (f ∘ g)′(x) = f′(g(x)) · g′(x)

There are a few other functions we should memorise the derivative of:

• (d/dx) c = 0 for c ∈ R

• (d/dx) x^n = n · x^(n−1)

• (d/dx) sin(x) = cos(x)

• (d/dx) cos(x) = −sin(x)

• (d/dx) sinh(x) = cosh(x)

• (d/dx) cosh(x) = sinh(x)

• (d/dx) e^x = e^x

• (d/dx) log(x) = 1/x

Other trigonometric functions can be computed using the quotient rule.
Perhaps it is also worth remembering the less-used but GRE-noteworthy formula
for the second derivative of a function:

Problem 1.12. Prove that

f″(x) = lim_{h→0} (f(x + h) + f(x − h) − 2f(x)) / h²

What other techniques do we use for computing derivatives? Computing the
derivatives of inverse functions can be difficult, specifically when we don't have a
closed formula for the inverse. What circumstances are those?

Definition 1.13. Let f : R → R be a function, and suppose that X ⊂ R is a set on
which f is one-to-one. Then we say that f is invertible on X, and write f⁻¹(y) for
the inverse, which is defined by f⁻¹(y) = x if and only if f(x) = y.
Problem 1.14. Suppose that f : R → R is a function and that f′(x) > 0 for all x
in an interval X ⊂ R. Then f is invertible on X.
Problem 1.15. Suppose that f : R → R is a differentiable invertible function. Then
f⁻¹ : R → R is also differentiable, except at those points y = f(x) ∈ R with
f′(x) = 0.
How do we compute the derivative of the inverse? Write f⁻¹ = g for simplicity.
Then f(g(y)) = y, so taking derivatives and using the chain rule we have

(f ∘ g)′(y) = f′(g(y)) · g′(y) = 1  ⟹  g′(y) = 1 / f′(g(y)).

So as long as we can figure out g(y) and f′(x), we can figure out g′(y).
Problem 1.16. Compute the derivative of tan⁻¹(y).

Solution. By the formula, we need the derivative of tan(x) evaluated at x = tan⁻¹(y).
The derivative of tan(x) is sec²(x), so we need to figure out
sec²(tan⁻¹(y)). This is done with a technique I personally call 'draw the triangle'.
We know that tan⁻¹(y) = x for some x, so we need to draw the triangle in which x
is one of the angles. We know only that tan(x) = y, so we may draw a right triangle:

[figure: a right triangle with angle x, opposite side y, adjacent side 1, and hypotenuse √(1 + y²)]

Therefore sec²(x) we can compute as the hypotenuse squared over the adjacent side
squared; that is, sec²(tan⁻¹(y)) = 1 + y². Therefore the derivative of tan⁻¹(y) is
1/(1 + y²).
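As a sanity check, the computed derivative can be compared against a numeric difference quotient (a quick sketch; the helper name is mine):

```python
from math import atan

def numeric_deriv(f, y, h=1e-6):
    # Central difference approximation of f'(y).
    return (f(y + h) - f(y - h)) / (2 * h)

# d/dy arctan(y) = 1 / (1 + y^2), matching the 'draw the triangle' result.
for y in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(numeric_deriv(atan, y) - 1 / (1 + y * y)) < 1e-6
```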
This method can be used to compute the derivatives of the other inverse trigonometric
and hyperbolic trigonometric functions on the fly, so you don't necessarily
need to memorise all of them. That said, you should definitely memorise the above
example.
Logarithmic differentiation is a useful technique, and it also recalls implicit differentiation.
Definition 1.17. Let F : R² → R be a function and let F(x, y) = c implicitly define
a function of one variable f : (a, b) → R. Then

dy/dx = −Fx/Fy,

where Fx, Fy are the partial derivatives of F.
In practice, life is easier than this. We will illustrate with an example.
Problem 1.18. Find the tangent line to the circle x² + y² = 25 at (3, 4).

Solution. In practice, we just take the derivative of everything with respect to x
and recall that y′ = dy/dx. Hence

0 = 2x + 2y · y′  ⟹  y′ = −2x/(2y) = −x/y.

This is exactly what we get when we use the definition as well. To finish the actual
problem, we have

dy/dx at (x, y) = (3, 4) equal to −3/4,

so that the tangent line is y − 4 = −(3/4)(x − 3).
Now, what is logarithmic differentiation? Let y = f(x). Using implicit differentiation
(or the chain rule, depending on your perspective), we have

(log y)′ = y′/y  ⟹  y′ = y · (log y)′.

In the case that log y is easier to differentiate than y, this is a helpful trick. For
example, consider f(x) = x^x or

f(x) = √( (x + 11)²(x − 4) / ((x − 1)²(x + 4)) ).
Problem 1.19. Compute the second derivative of f(x) = x^(x^x).
The last type of basic derivative is for parametric functions. Suppose that we
define a graph using two functions x(t) and y(t) rather than y = f(x). It's still
straightforward to compute the change in y with respect to the change in x:

dy/dx = (dy/dt) / (dx/dt).

This process can be iterated to find the second derivative of y with respect to x, but
it's a bit more difficult.

Problem 1.20. Using the quotient rule, compute the formula for d²y/dx² = (d/dx)(dy/dx).

1.3. Applications of Derivatives. Related rates problems, which involve parametric
functions as discussed above, show up quite often in the single-variable curriculum.
Classic examples include balloons filling up or deflating and basins filling
up or emptying of water. Ladders sliding down a wall or shadows lengthening are
also common.

Problem 1.21. Suppose we have a right conical coffee filter which is 8 cm tall with
a radius of 4 cm. The water drips through at a constant rate of 2 cubic centimetres
per second. When there is one eighth of the original water remaining, how fast is
the water level dropping?

Solution. We begin by noting that V(t) = (π/3) · h(t) · r(t)². But because our filter
is conical, we also know that the height and radius of the water are in a fixed ratio:
h(t) = 2 · r(t). Because we are solving for h′(t), we will substitute in r(t) = h(t)/2.
Hence

V(t) = (π/3) · h(t) · (h(t)²/4) = (π/12) · h(t)³.

Taking the time derivative,

V′(t) = (π/4) · h(t)² · h′(t).

Letting t = t₀ be the time at which we would like to find the change in height, we
know that V′(t₀) = −2 no matter what. Therefore we need to find h(t₀) to complete
the problem. We know that V(t₀) = V(0)/8, and that V(0) = (π/3) · 8 · 4² = 128π/3. So
as V(t₀) = (π/12) · h(t₀)³ = 16π/3, it's pretty clear that h(t₀) = 4. Therefore

−2 = V′(t₀) = (π/4) · 4² · h′(t₀)  ⟹  h′(t₀) = −1/(2π),

i.e. the water level is dropping at 1/(2π) centimetres per second.
We can now turn to optimisation. This is certainly a favourite in the undergraduate
curriculum and appears sometimes on the GRE. Why does it work?

Theorem 1.22 (Extreme Value Theorem). Let f : [a, b] → R be a continuous function.
Then the set f([a, b]) has both a maximum value and a minimum value.

Note that it's necessary that [a, b] be a closed interval. The image of an open
interval under a continuous function may have neither a maximum nor a minimum,
e.g. f(x) = 1/x on the interval (0, 1).
We can even say more:
We can even say more:
Definition 1.23. Let f : R → R be a differentiable function. We say that a ∈ R is
a critical point if f′(a) = 0.
We might also include the case that f : R → R is a function which is differentiable
except at finitely many points; call those points critical points as well. Their utility is the
following theorem, credited to Fermat by various sources.
Theorem 1.24 (Fermat's Boring Theorem). Let f : [a, b] → R be a continuous
function, differentiable except possibly at the endpoints. Then the maxima and minima of f(x) occur at
critical points and endpoints.
There are many different kinds of optimisation problems (as there are related
rates problems), but all have the same basic process: we are given both a function
to optimise and a constraint. The function to optimise is going to be some F(x, y)
in two variables, and the constraint is going to be some equation g(x, y) = c. After
substituting, we'll have a one-variable problem. Sometimes we will have endpoints,
and sometimes it will be implicit that as your one variable gets too big or small,
there is no extremum to be found. We then find the critical points and exhaustively
check which is the largest and which is the smallest.
Problem 1.25. Suppose we are constructing a window comprised of a semicircle
sitting atop a rectangle. Given that the perimeter of the window must be 4 meters,
what is the maximum area?

Solution. Let us call x the width and y the height of the rectangle. We know that
the area is given by the sum xy + (π/2) · (x/2)², the first term for the rectangle and the second
for the semicircle. Our constraint is 2y + x + (π/2)x = 4: the bottom three sides of the
rectangle and the arc of the semicircle (a semicircle of diameter x has arc length πx/2).
It looks like making x our sole variable will
be the best path to success, so we will substitute y = 2 − ((2 + π)/4) · x. Our function is
thus

A(x) = x · (2 − ((2 + π)/4) · x) + (π/2) · (x/2)² = 2x − ((4 + π)/8) · x².

Note that this is the equation of a downward-facing parabola, so if x is too big or
too small we have A(x) < 0. This is obviously a nonsense answer to the question,
so what we're looking for is the critical point giving the vertex of the parabola, its
maximum.

We now need to solve A′(x) = 2 − ((4 + π)/4) · x = 0, so x = 8/(4 + π). We can now
compute the maximum area:

A(8/(4 + π)) = 8/(4 + π) (not a typo).
Problem 1.26. What is the minimum distance between the curve y = 1/x and the
origin?

Distance questions seem more difficult, as the optimisation function we are dealing
with is of the form d(x) = √(x² + f(x)²). But an important observation is that the
minimum of d(x) is also achieved by d(x)², because the distance function is always
nonnegative. Moreover, the critical points of d(x) and d(x)² are the same, as

(d/dx)(d(x)²) = 2 · d(x) · d′(x),

and d(x) > 0 as long as x ≠ 0 or f(x) ≠ 0. The bright side is that d(x)² = x² + f(x)²
is much easier to differentiate than d(x).
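For this particular curve the minimisation can be checked directly: d(x)² = x² + 1/x², whose critical points satisfy x⁴ = 1, giving a minimum distance of √2 at x = ±1. A small numeric sketch (helper name mine):

```python
def d_squared(x):
    # Squared distance from the origin to the point (x, 1/x).
    return x * x + 1 / (x * x)

# Scan a grid around x = 1; the smallest squared distance found is 2,
# achieved exactly at x = 1 (so the minimum distance is sqrt(2)).
best = min(d_squared(1 + k / 1000) for k in range(-900, 2000))
assert abs(best - 2.0) < 1e-5
assert d_squared(1.0) == 2.0
```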
1.4. Graphical Analysis. We already know that f′(x) > 0 means increasing and
f′(x) < 0 means decreasing, but now let's recall what the second derivative means.
If f″(x) > 0, the graph y = f(x) is concave up, so that the graph opens toward +∞,
and f″(x) < 0 means concave down.

Definition 1.27. If f″ changes sign at x = a, we say that x = a is a point of inflection.
(In particular f″(a) = 0 when f″ is continuous; but f″(a) = 0 alone is not enough — e.g. f(x) = x⁴ at a = 0.)
In the circumstances of optimisation, we can use the second derivative to test
whether a critical point is a maximum or minimum.

Theorem 1.28 (Second Derivative Test). Suppose that x = a is a critical point of
f : R → R. Suppose that f″(x) exists in a neighbourhood of a. If f″(a) < 0, then
f(a) is a local maximum. If f″(a) > 0, then f(a) is a local minimum. If f″(a) = 0,
the test is inconclusive.
There is also the first derivative test: suppose f′(x) exists and is continuous near
x = a. If f′(x) < 0 to the left of a and f′(x) > 0 to the right of a, then f(a) is a local
minimum. If f′(x) > 0 to the left of a and f′(x) < 0 to the right of a, then f(a) is a
local maximum. This can often be more useful for optimisation problems when the
second derivative is too difficult to compute. However, as in the above example of
the window, we see that A″(x) is a negative constant, so any extreme value that exists
must be a maximum.

Problem 1.29. Suppose that y = f(x) is smooth (i.e. has continuous derivatives
of all orders) and that f(1) = 2 is a local maximum. Order the values f(1), f′(1),
f″(1).
Solution. We know that f(1) = 2, so that settles that. Because x = 1 is a local
extremum, we must have f′(1) = 0. Finally, because this extreme point is a maximum,
the graph cannot be concave up there, so f″(1) ≤ 0. Hence f″(1) ≤ f′(1) < f(1).
1.5. L'Hôpital's Rule. The last topic worth remembering is L'Hôpital's rule, which
surprisingly does not appear until the second quarter of calculus at UCLA, but which
we can recall now.
Theorem 1.30. Let f, g : R → R be two functions and a ∈ R ∪ {±∞}. If
lim_{x→a} f(x) = lim_{x→a} g(x) = c where c ∈ {0, ±∞}, and the limit on the right
below exists, then

lim_{x→a} f(x)/g(x) = lim_{x→a} f'(x)/g'(x)

Expressions of the form 0/0 and ±∞/∞ are called indeterminate forms. L'Hôpital's
rule can also be used to solve problems which are not immediately in an indeterminate
form. The easiest case is expressions of the form f(x) · g(x), which give rise to the
indeterminate form 0 · ∞. By rearranging to

f(x)/(1/g(x))   or   g(x)/(1/f(x))

we will obtain an actual indeterminate form. Which one to choose depends on
whether 1/f(x) or 1/g(x) is easier to differentiate.
Problem 1.31. Compute the following limit:

lim_{x→2} ( 1/(x − 2) − 2x/(x² − 4) )
Solution. If we plug in x = 2, we don't obtain an indeterminate form, but we
obtain something that looks like ∞ − ∞. This is our clue to combine the fractions
into an indeterminate form. With a common denominator,

(x + 2)/(x² − 4) − 2x/(x² − 4) = (2 − x)/(x² − 4) = −(x − 2)/((x − 2)(x + 2)) = −1/(x + 2).

In this case we don't even need to use L'Hôpital's rule to finish the problem, since we
had some nice cancellation; the limit is −1/4. In other cases we might not be so lucky.
Problem 1.32. Compute the following limit:

lim_{x→∞} x^{1/x}
10 IAN COLEY
Solution. These problems are also related to L'Hôpital's rule. Plugging in, we
obtain the form ∞⁰. We notice that if we take the log of this expression,
log x^{1/x} = (1/x) · log x yields the form 0 · ∞. We can now rearrange it to
(log x)/x and finish the problem:

lim_{x→∞} log x^{1/x} = lim_{x→∞} (log x)/x = lim_{x→∞} (1/x)/1 = 0,

where the middle step is L'Hôpital's rule. But of course this solves the wrong
question. If we say L = lim_{x→∞} x^{1/x}, then log L = 0 (as log is a continuous
function, so it commutes with limits). Thus L = 1. This same type of solution works
if we have the form 1^∞, as log(1^∞) = ∞ · log 1 yields ∞ · 0. This concludes the
differential side of single-variable calculus.
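As a quick numerical sanity check of this limit (a Python sketch, not part of the original notes; the helper name f is mine):

```python
# Check numerically that x^(1/x) -> 1 as x -> infinity, matching the
# L'Hopital computation above.
def f(x):
    return x ** (1.0 / x)

values = [f(10.0 ** k) for k in (2, 4, 6, 8)]
# The values decrease toward 1 as x grows.
```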
2. Day 2: Single variable calculus

Topics covered: the integral, area between curves, volumes of revolution, the
fundamental theorem of calculus, u-substitution, integration by parts, trigonometric
integration, partial fractions, arc length and surface area, sequences and series,
convergence tests, Taylor polynomials and power series, root and ratio tests.
2.1. Integrals. Let us start with the definition of the Riemann integral. To do so,
we need to recall the limit of a sequence.
Definition 2.1. Let {x_n}_{n∈N} be a sequence. We say that lim_{n→∞} x_n = L if for all
ε > 0, there exists N ∈ N such that whenever n > N, |x_n − L| < ε.
The difference here is that there is no function floating around. We will revisit the
intricacies of sequences and series later in this lecture.
Definition 2.2. Let f : R → R be a function. We define the left Riemann integral
of f on the interval [a, b] to be the limit

lim_{n→∞} (b − a)/n · Σ_{i=0}^{n−1} f(a + (b − a)/n · i)

We define the right Riemann integral of f on the interval [a, b] to be the limit

lim_{n→∞} (b − a)/n · Σ_{i=1}^{n} f(a + (b − a)/n · i)

We say that the Riemann integral of f on the interval [a, b] exists if the above limits
exist and agree. In that case, we write

∫_a^b f(x) dx
There is more we could say on this definition, but it’s not necessary for the GRE.
We also call the individual terms of these limits the left and right Riemann sums,
denoted L_n f or R_n f. There are a few other approximations for integrals, including
the midpoint and trapezoid approximations. The trapezoid approximation is the
average of the left and right, and the midpoint rule uses a + (b − a)/(2n) · (2i + 1) in
its argument.
We will almost never use the right and left Riemann sums, as these limits are
not calculable in practice, but it’s important to know a few things. If a function is
increasing, then the left Riemann sum will always underestimate the actual value of
the integral, and the right Riemann sum will always overestimate it.
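To see this concretely, here is a small Python sketch (my own helper, not from the notes) comparing left and right Riemann sums for the increasing function f(x) = x² on [0, 1], whose integral is 1/3:

```python
# Left and right Riemann sums for f on [a, b] with n subintervals.
def riemann(f, a, b, n, side):
    dx = (b - a) / n
    start = 0 if side == "left" else 1
    return dx * sum(f(a + dx * i) for i in range(start, n + start))

f = lambda x: x * x
L = riemann(f, 0.0, 1.0, 1000, "left")
R = riemann(f, 0.0, 1.0, 1000, "right")
# Since f is increasing on [0, 1]: L < 1/3 < R, and both approach 1/3.
```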
The Riemann integral is not guaranteed to exist for an arbitrary function, but it
must exist for our favourite functions.

Proposition 2.3. If f : [a, b] → R is bounded and continuous at all but finitely
many points, then ∫_a^b f(x) dx exists.

We could assign this as an exercise, but it’s a bit difficult and not necessary at all
for the GRE. This isn’t a necessary and sufficient condition, but it is certainly good
og
enough for almost all purposes.
The integral is linear, just as the derivative was. In particular, this means that

∫_a^b c · f(x) dx = c · ∫_a^b f(x) dx

and

∫_a^b f(x) ± g(x) dx = ∫_a^b f(x) dx ± ∫_a^b g(x) dx
Moreover, recalling the definition via Riemann sums, we can always split an integral
into intermediate chunks. For any c ∈ [a, b], we have

∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx

By definition we will say ∫_a^a f(x) dx = 0, so using the additivity above

∫_a^b f(x) dx = −∫_b^a f(x) dx

This way we can make sense of integrals where our interval [a, b] happens to be
oriented the wrong way (i.e. a > b).
In practice, we will always compute the integral using the Fundamental Theorem
of Calculus.
Theorem 2.4 (FTC I). Assume that f : [a, b] → R is a continuous function. If
F : [a, b] → R is a function such that F'(x) = f(x), then

∫_a^b f(x) dx = F(b) − F(a).

Hence the calculation of integrals will amount to the calculation of antiderivatives
(i.e. the function F(x) above).
There's a companion theorem that we state now.

Theorem 2.5 (FTC II). Let f : [a, b] → R be a continuous function, and define
F(x) = ∫_a^x f(t) dt. Then F'(x) = f(x).

We can extend this theorem using the chain rule:
Problem 2.6. Prove that

d/dx ∫_{a(x)}^{b(x)} f(t) dt = f(b(x)) · b'(x) − f(a(x)) · a'(x).
One last thing to say at this point is the definition of an improper integral. Suppose
that we are trying to integrate f(x) on an unbounded region, say [0, ∞), or over a
region [a, b] on which g(x) has an infinite discontinuity (say at x = a). Then we can
define the integral (should it exist) as a limit of integrals as defined above, e.g.

∫_0^∞ f(x) dx := lim_{R→∞} ∫_0^R f(x) dx        ∫_a^b g(x) dx := lim_{h→0⁺} ∫_{a+h}^b g(x) dx

These limits are not guaranteed to exist. The most common types of integrals
we consider in this situation are functions f(x) such that lim_{x→∞} f(x) = 0, so that the
integral at least has a chance of converging. We will readdress this issue in the section
on series.
2.2. Applications of the integral. But before we get into techniques, why bother
computing integrals at all? For one, the integral of f(x) on [a, b] gives the signed area
of the region under the graph of f(x). This can even be used in reverse: integrals
can be calculated using geometry.
Problem 2.7. Compute the integral

∫_{−2}^{2} √(4 − x²) dx

Solution. The equation y = √(4 − x²) corresponds to x² + y² = 4, i.e. a circle of
radius 2. Our graph is the top half of this circle. Therefore the area under the curve
is half the area of the circle, 2π. This problem can be solved otherwise, but it's much
more annoying.
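A numerical check of this geometric argument (a sketch with a homemade midpoint-rule integrator, not from the notes):

```python
import math

# Midpoint-rule approximation of an integral.
def midpoint(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + dx * (i + 0.5)) for i in range(n))

# The area under y = sqrt(4 - x^2) over [-2, 2] is half a radius-2 circle.
approx = midpoint(lambda x: math.sqrt(4.0 - x * x), -2.0, 2.0, 100_000)
# approx is close to 2*pi
```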
In the same way, we can compute the area between curves with integrals. We need
only to compute the area under the upper curve and subtract the area under the
lower curve. The main question in situations like this is which curve is on top and
which is on bottom.

Problem 2.8. Compute the area between the curves y = √x and y = x² in the first
quadrant.

Solution. Thinking on the graphs, we can recall that √x is above x² on the region
[0, 1] where these curves intersect. Therefore the integral we need to compute is

∫_0^1 (√x − x²) dx.

Luckily, we can get antiderivatives of these functions very easily:

∫_0^1 (√x − x²) dx = [2x^{3/2}/3 − x³/3]_0^1 = 2/3 − 1/3 = 1/3.
The other application is to surfaces and volumes of revolution. Supposing that
we rotate a curve y = f(x) around the x-axis, the height of the curve becomes the
radius of a disc, which we then need to integrate along the interval [a, b] in question,
whatever it is:

V = π ∫_a^b f(x)² dx
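As a sanity check on the disc formula (a Python sketch, not from the notes): rotating y = √(R² − x²) about the x-axis sweeps out a sphere, so the formula should reproduce (4/3)πR³.

```python
import math

def midpoint(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + dx * (i + 0.5)) for i in range(n))

R = 3.0
# Disc method: V = pi * integral of f(x)^2, and here f(x)^2 = R^2 - x^2.
V = math.pi * midpoint(lambda x: R * R - x * x, -R, R, 10_000)
# V is close to (4/3) * pi * R^3
```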
Problem 2.9. What is the volume of the region created by rotating y = log x around
the x-axis between x = 1 and x = e²?
In

It’s harder to compute the volume when we rotate y = f (x) around the y-axis.
Rather than use the method of discs, we use the method of cylindrical shells. In this
case, we are computing the area of a cylinder of radius r and height h, which is given
by 2⇡ · r · h. In our case, the radius is x and the height is f (x), hence
Z b
V = 2⇡ x · f (x) dx
a

Problem 2.10. Compute the volume of the region created by rotating y = 1 − 2x +
3x² − 2x³ from [0, 1] around the y-axis.
Arc length of a curve is another application. If we are looking at the infinitesimal
change in the length of a curve, it travels dx in the x direction and dy in the y
direction. Therefore its total length is ds = √((dx)² + (dy)²). In the case that y =
f(x), we have

s = ∫ ds = ∫ √((dx)² + (dy)²) = ∫_a^b √(1 + f'(x)²) dx
Problem 2.11. Compute the arc length of y = cosh x over the interval [0, 2].
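Problem 2.11 has a clean closed form (√(1 + sinh²x) = cosh x, so the length is sinh 2), which makes it a nice target for a numerical check (a sketch, not from the notes):

```python
import math

def midpoint(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + dx * (i + 0.5)) for i in range(n))

# Arc length of y = cosh(x) on [0, 2]: integrand sqrt(1 + sinh(x)^2).
length = midpoint(lambda x: math.sqrt(1.0 + math.sinh(x) ** 2), 0.0, 2.0, 10_000)
# length is close to sinh(2)
```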
Sometimes we also want to calculate the surface areas of regions of revolution, not
just the volumes. In this case, we again have a formula: the infinitesimal amount of
surface area is given by the arc length along the surface times the circumference of
the disc, which in the case of rotation around the x-axis is 2πf(x). Thus

S = 2π ∫ f(x) √(1 + f'(x)²) dx
Problem 2.12. Compute the surface area of a sphere of radius R, using the curve
y = √(R² − x²).
2.3. Integration techniques. Now, besides guessing at antiderivatives, what are
our other integration techniques? The first is u-substitution, which is our answer
to the chain rule.
Theorem 2.13 (u-substitution). Suppose that h(x) is a continuous function and we
can write h(x) = f(g(x))g'(x). Then

∫_a^b h(x) dx = ∫_{g(a)}^{g(b)} f(u) du.

Problem 2.14. Evaluate the integral of f(x) = 2x · sin(x²) on [0, √(π/2)].
Solution. We see that g(x) = x² and f(u) = sin u is a good choice. Hence

∫_0^{√(π/2)} 2x · sin(x²) dx = ∫_0^{π/2} sin u du = [−cos u]_0^{π/2} = 1.
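A quick numerical confirmation of this u-substitution (a sketch, not from the notes):

```python
import math

def midpoint(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + dx * (i + 0.5)) for i in range(n))

# Integral of 2x*sin(x^2) on [0, sqrt(pi/2)] should be 1.
val = midpoint(lambda x: 2.0 * x * math.sin(x * x),
               0.0, math.sqrt(math.pi / 2.0), 10_000)
```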

The second is integration by parts, which is our answer to the product rule. It
0 0 0
steps from the following observation: if we consider
Z Z product (f · g) = f · g + f · g
the
and integrate both sides, we obtain f · g = g df + f dg.

Theorem 2.15 (Integration by Parts). Suppose that f (x) has the form u(x) · v 0 (x).
Then Z Z Z
0
f (x) dx = u(x) · v (x) dx = u(x) · v(x) v(x) · u0 (x) dx.
This is useful when your function is a product of a part which is easy to integrate
and a part which is difficult to integrate. Another situation is when one part of your
function will eventually differentiate to zero and the other part does not get more
complicated, as in x · eˣ. A general mnemonic for which factor should be chosen as
u(x) is LIATE: logarithms, inverse trigonometric functions, algebraic (e.g. polynomials),
trigonometric functions, and finally exponential functions. Note that these latter two
types in particular are almost never the ideal choice of u.
Problem 2.16. Compute the integral of f(x) = log x.

Solution. If we follow our mnemonic, we will choose u = log x, which forces
dv = dx. Therefore v = x and du = (1/x) dx. Hence:

∫ log x dx = x log x − ∫ x · (1/x) dx = x log x − ∫ 1 dx = x log x − x.

It doesn't seem like it would work, and then it does.
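We can check the antiderivative numerically via a difference quotient (a Python sketch, not from the notes):

```python
import math

# F(x) = x*log(x) - x should differentiate back to log(x).
F = lambda x: x * math.log(x) - x

h = 1e-6
for x in (0.5, 1.0, 2.0, 10.0):
    # Symmetric difference quotient approximates F'(x).
    deriv = (F(x + h) - F(x - h)) / (2.0 * h)
    assert abs(deriv - math.log(x)) < 1e-6
```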
Problem 2.17. Compute the integral of f(x) = x² · eˣ.
The next method is the method of trigonometric substitution. There are some
integrals that do not lend themselves to either of the above methods, but require a
special kind of u-substitution. We recognise this situation in the case that one of the
Pythagorean identities applies, usually the following:
Problem 2.18. Compute the integral of f(x) = √(1 + x²).
Solution. The trigonometric identity we are looking for in this situation is that
1 + tan² θ = sec² θ. Therefore if we substitute x = tan θ, we will only need to
integrate powers of sec θ, which is much more tractable than the current issue.

However, we have to consider the integration term. If x = tan θ, then dx =
sec² θ dθ. Thus

∫ √(1 + x²) dx = ∫ sec θ · sec² θ dθ.

We leave it in this form because we can now use integration by parts to solve this
problem: let dv = sec² θ dθ and u = sec θ. Then v = tan θ and du = sec θ tan θ dθ, so

∫ sec³ θ dθ = sec θ · tan θ − ∫ sec θ tan² θ dθ.

We now use that tan² θ = sec² θ − 1, as we did above, so that

∫ sec³ θ dθ = sec θ tan θ − ∫ sec θ (sec² θ − 1) dθ
which rearranged gives us

2 ∫ sec³ θ dθ = sec θ tan θ + ∫ sec θ dθ

Therefore the last thing to compute is ∫ sec θ dθ. This is a hard calculation that one
must memorise. The key is that we can apply u-substitution if we decide (as if by
magic) to multiply the integrand by (sec θ + tan θ)/(sec θ + tan θ). I will leave it to
the reader to verify

∫ sec θ dθ = log(tan θ + sec θ).

Putting this all together completes the problem, sort of. Though we know that

∫ sec³ θ dθ = (sec θ tan θ)/2 + (1/2) log(tan θ + sec θ)

we were asked a question about f(x), not f(θ). We know that x = tan θ, so we can
make that substitution. In order to determine what sec θ is in terms of x, we draw
the triangle as in yesterday's lecture. The Pythagorean theorem will tell us that
sec θ = √(1 + x²). If we then put it all together,

∫ √(1 + x²) dx = (x √(1 + x²))/2 + (1/2) log(x + √(1 + x²)).
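Differentiating the result numerically is a good way to check all those signs (a sketch, not from the notes):

```python
import math

# G(x) = x*sqrt(1+x^2)/2 + log(x + sqrt(1+x^2))/2 should differentiate
# back to sqrt(1 + x^2).
def G(x):
    s = math.sqrt(1.0 + x * x)
    return x * s / 2.0 + math.log(x + s) / 2.0

h = 1e-6
for x in (-1.0, 0.0, 0.5, 3.0):
    deriv = (G(x + h) - G(x - h)) / (2.0 * h)
    assert abs(deriv - math.sqrt(1.0 + x * x)) < 1e-5
```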
The final integration technique is the method of partial fractions. With our
techniques we are only capable of integrating fairly simple rational functions, so the
method of partial fractions allows us to break up complicated expressions into
integrable ones. The idea is that any quotient of polynomials p(x)/q(x) can be
written as a sum of simpler fractions whose denominators are the irreducible factors
of q(x). We can integrate expressions of the form a/(x − r) or (ax + b)/(x² + r).
Problem 2.19. Compute the integral of 4/(x⁴ − 1).

Solution. We factor the denominator as (x + 1)(x − 1)(x² + 1), and so set up

4/(x⁴ − 1) = A/(x + 1) + B/(x − 1) + (Cx + D)/(x² + 1)

Multiplying through by the denominator,

4 = A(x − 1)(x² + 1) + B(x + 1)(x² + 1) + (Cx + D)(x + 1)(x − 1)

We can plug in particular values for x to compute A and B. If x = −1,

4 = A(−2)(2) ⟹ A = −1
If x = 1,

4 = B(2)(2) ⟹ B = 1

For the last two variables, we need to do some brute multiplication. Omitting the
details,

4 = Cx³ + (2 + D)x² − Cx + (2 − D)

which implies that C = 0 and D = −2. Hence

4/(x⁴ − 1) = −1/(x + 1) + 1/(x − 1) − 2/(x² + 1)

which we can now integrate:

∫ 4/(x⁴ − 1) dx = −∫ 1/(x + 1) dx + ∫ 1/(x − 1) dx − ∫ 2/(x² + 1) dx
               = −log |x + 1| + log |x − 1| − 2 tan⁻¹(x)

There's a bit of a complication if the irreducible factor of q(x) is not exactly of the
form x² + r, but completing the square and some further manipulation can always
get us to that form.
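The same derivative check works for the partial-fractions answer (a sketch, not from the notes):

```python
import math

# H(x) = -log|x+1| + log|x-1| - 2*atan(x) should differentiate back
# to 4/(x^4 - 1) wherever everything is defined.
def H(x):
    return -math.log(abs(x + 1.0)) + math.log(abs(x - 1.0)) - 2.0 * math.atan(x)

h = 1e-6
for x in (0.5, 2.0, 3.0, -4.0):
    deriv = (H(x + h) - H(x - h)) / (2.0 * h)
    assert abs(deriv - 4.0 / (x ** 4 - 1.0)) < 1e-4
```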
Problem 2.20. Prove that (for r > 0)

∫ dx/(x² + r) = (1/√r) tan⁻¹(x/√r)
2.4. Sequences and series. We gave above the definition of a sequence and the
definition of a convergent sequence. We can now recall what a series is.

Definition 2.21. A series is a sequence {S_n}_{n∈N} defined by a sequence {a_i}_{i∈N} such
that S_n = Σ_{i=0}^n a_i. A series converges if the sequence {S_n} converges as above. We
write Σ_{i=0}^∞ a_i for lim_{n→∞} S_n.

Sometimes we will call {ai } the series and leave the fact that we are taking sums
implicit.
The simplest type of sequence/series that we encounter is the geometric series,
which is defined by a_i = a_0 · rⁱ for some r ∈ R. In this circumstance, there is an easy
criterion for when the series {a_i} converges.

Problem 2.22. Prove that the series {a_i = a_0 · rⁱ} converges if and only if |r| < 1.
In this situation, the infinite sum has the formula a_0/(1 − r).
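The closed form is easy to watch converge numerically (a sketch, not from the notes; a₀ and r are arbitrary choices):

```python
# Partial sums of a geometric series with a_0 = 5 and r = -0.3.
a0, r = 5.0, -0.3
partial, term = 0.0, a0
for _ in range(100):
    partial += term
    term *= r
# partial is (numerically) a0 / (1 - r)
```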
2.5. Convergence tests. Unfortunately, most series aren't geometric, so we have
convergence tests to decide whether they converge. If a series does not converge, we
say it diverges. The first test is the following easy check:

Theorem 2.23 (Divergence Test). The series {a_n} diverges if lim_{n→∞} a_n ≠ 0.
Theorem 2.24 (p-test). Suppose we have the series a_n = 1/nᵖ for some p ∈ R. The
series {a_n} converges if and only if p > 1.

This situation is not so common, but is a useful tool (as we will soon see).
Theorem 2.25 (Limit Comparison Test). Suppose that {a_n}, {b_n} are nonnegative
series. Consider the quantity

lim_{n→∞} a_n/b_n = L

where L ∈ [0, ∞]. Then we have the following three conclusions:
• If L = 0, the series {a_n} converges if {b_n} converges
• If L = ∞, the series {b_n} converges if {a_n} converges
• If L ∈ (0, ∞), the series {a_n} converges if and only if {b_n} converges
It is sometimes useful to bear in mind the contrapositives of these statements.
Therefore, while many series do not take the form of the p-test, they can be limit-
compared to a 'p-series' and thus we can determine their convergence. These two
also combine to answer the standing question about improper integrals:
Theorem 2.26 (Integral Test). Suppose that {a_n} is a positive series such that
there exists a continuous, decreasing function f(x) satisfying f(n) = a_n. Then the
series {a_n} converges if and only if the improper integral ∫_0^∞ f(x) dx exists
(converges).
Therefore the same types of tests that check for the convergence of infinite series
can be used to check the convergence of improper integrals when the limits in question
are too difficult to compute. Here, however, is one case that can be computed directly.

Problem 2.27. Show that ∫_0^1 (1/xᵖ) dx converges if and only if p < 1.

Problem 2.28. When does the integral ∫_0^∞ (1/xᵖ) dx converge?
Thus far we have spoken of positive series, but there is a specific test for alternating
series that is occasionally useful:
Theorem 2.29 (Alternating Series Test). Suppose that {a_n} is a series satisfying:
• The series is alternating, i.e. a_i and a_{i+1} have different signs for every i ∈ N
• The series does not pass the divergence test, i.e. lim_{n→∞} a_n = 0
• The series is decreasing in magnitude, i.e. |a_i| > |a_{i+1}| for every i ∈ N.
Then the series converges.

For example, we saw above that the series {1/n} does not converge (the harmonic
series), but the alternating series {(−1)ⁿ/n} does. As a remark, the factor (−1)ⁿ is
the easiest way to see that a series is alternating, but cos(π · n) does the job as well.
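Numerically, the alternating harmonic series settles down even though the harmonic series does not; its sum is in fact log 2 (a Python sketch, not from the notes):

```python
import math

# Partial sum of the alternating harmonic series 1 - 1/2 + 1/3 - ...
S = sum((-1) ** (n + 1) / n for n in range(1, 200_001))
# By the alternating series test the error is at most the next term,
# about 1/200001, and S is close to log(2).
```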

2.6. Taylor polynomials and series. Before getting into the last two tests, we

s
need to recall the definition of a Taylor polynomial and a Taylor series. To begin,
we can define a Taylor polynomial and then explain its utility.

s
Definition 2.30. Let f : R ! R be a function of class C k . The kth Taylor polyno-

re
mial of f centred at x = a is defined by
f 00 (a)(x a)2 f (k) (a)(x a)k
Tk f (x) = f (a) + f 0 (a)(x a) + + ··· +
2 k!
og
This is the best polynomial approximation to f(x) with the property that the first
k derivatives at x = a agree. For example, the first Taylor polynomial is just the
tangent line at x = a, i.e. the linear approximation. If a = 0, then usually we use
the name Maclaurin instead of Taylor.
The error of a Taylor polynomial can be estimated using what would be the
next term. In particular,

|T_k f(b) − f(b)| ≤ M · |b − a|^{k+1}/(k + 1)!

where M is the maximum of |f⁽ᵏ⁺¹⁾(x)| on the interval [a, b] or [b, a] (whichever
makes sense). In nice circumstances, |f⁽ᵏ⁺¹⁾| attains its maximum at a or b, and so a
more complicated calculation is not necessary.
Problem 2.31. Compute √101 using the third Taylor polynomial of an appropriate
function. What is the approximate error?
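A worked version of Problem 2.31 in Python (a sketch, not from the notes): take f(x) = √x centred at a = 100, where the derivatives are easy to evaluate.

```python
import math

# Derivatives of sqrt(x) at a = 100:
# f = 10, f' = 1/20, f'' = -1/4000, f''' = 3/800000.
a, x = 100.0, 101.0
T3 = (10.0
      + (1.0 / 20.0) * (x - a)
      + (-1.0 / 4000.0) * (x - a) ** 2 / 2.0
      + (3.0 / 800000.0) * (x - a) ** 3 / 6.0)
# T3 agrees with sqrt(101) to roughly 4e-9, consistent with the error
# bound using the fourth derivative.
```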
Supposing that f is smooth, we can take this definition to infinity.

Definition 2.32. The Taylor series of f(x) centred at x = a is the infinite sum

T(x) = Σ_{k=0}^∞ f⁽ᵏ⁾(a)(x − a)ᵏ/k!
In ideal circumstances, we have T(x) = f(x) for all x ∈ R, but this is not guaranteed.
This infinite sum may not even converge for some values of x, or for most
values of x for that matter. This is where our last two tests come in handy.
Theorem 2.33 (Ratio Test). Consider a series {a_n} which may or may not be
positive, and consider the limit

ρ = lim_{n→∞} |a_{n+1}/a_n|

Then {a_n} converges absolutely if ρ < 1, diverges if ρ > 1, and the test is inconclusive
if ρ = 1.

Recall that a series is said to converge absolutely if {|a_n|} converges.

In the case that the series depends on a parameter x, this gives us a function ρ(x)
that we demand be ≤ 1 to have a chance at convergence. The set of x for which
ρ(x) < 1 determines the radius of convergence. If we also determine whether the
cases ρ(x) = 1 converge (using other techniques), we obtain the interval of
convergence.
Problem 2.34. Compute the interval of convergence of

f(x) = Σ_{n=1}^∞ 2xⁿ/n²
There is another test that is sometimes more helpful (though rarely).

Theorem 2.35 (Root Test). Consider a series {a_n} which may or may not be
positive, and consider the limit

ρ = lim_{n→∞} |a_n|^{1/n}

Then {a_n} converges absolutely if ρ < 1, diverges if ρ > 1, and the test is inconclusive
if ρ = 1.
Both these tests check to what extent the series we are considering is geometric
with common ratio ρ. Hence series that are mostly polynomial (and not
geometric) will yield an inconclusive root/ratio test. One should attempt to apply
a p-test to these (perhaps via a limit comparison). Series that include factorial or
exponential terms are the targets for the root and ratio tests.
We end by recalling some useful Taylor expansions and their radii of convergence.

• eˣ = Σ_{n=0}^∞ xⁿ/n!,  x ∈ R

• 1/(1 − x) = Σ_{n=0}^∞ xⁿ,  |x| < 1

• sin x = Σ_{n=0}^∞ (−1)ⁿ x^{2n+1}/(2n + 1)!,  x ∈ R

• cos x = Σ_{n=0}^∞ (−1)ⁿ x^{2n}/(2n)!,  x ∈ R
We can use these building blocks to construct other Taylor series using substitution,
differentiation, and integration.

Problem 2.36. Compute the Taylor series for tan⁻¹(x) centred at a = 0. What is
its radius of convergence?
Solution. This looks like a dubious prospect, but we notice that

d/dx tan⁻¹(x) = 1/(1 + x²)

and the righthand side looks a lot like a Taylor series we already know. In particular,

1/(1 − u) = Σ_{n=0}^∞ uⁿ  ⟹  1/(1 − (−x²)) = Σ_{n=0}^∞ (−x²)ⁿ = Σ_{n=0}^∞ (−1)ⁿ x^{2n}

We now need to integrate this Taylor series to obtain the one for tan⁻¹(x). We do
so term-by-term,

tan⁻¹(x) = ∫ Σ_{n=0}^∞ (−1)ⁿ x^{2n} dx = Σ_{n=0}^∞ (−1)ⁿ x^{2n+1}/(2n + 1)

but we need to recall that we might need to add a constant term. The constant term
of the Taylor series is tan⁻¹(0) = 0, so we don't need to add anything in this case.

What happens to the radius of convergence? Integrating or differentiating doesn't
change it, but substituting does. We know that the series converges for |u| < 1,
but now u = −x², so

|−x²| < 1 ⟹ |x|² < 1 ⟹ |x| < √1 = 1

In general substitutions will change the radius of convergence, but in this case it
happens to stay the same.
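Checking the series numerically (a sketch, not from the notes):

```python
import math

# Partial sums of sum_{n>=0} (-1)^n x^(2n+1)/(2n+1) approach atan(x)
# for |x| < 1.
def arctan_series(x, terms):
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

approx = arctan_series(0.5, 30)
# approx is very close to atan(0.5)
```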
3. Day 3: Multivariable calculus

Topics covered: basics on vectors in 3 dimensions, planes, parametric equations,
arc length and speed, limits and continuity in multiple variables, partial derivatives,
differentiability and tangent planes, gradient and directional derivatives, multivariable
chain rule, optimisation.
3.1. Vectors in R³. Since multivariable calculus takes place with two variables (at
least in general, and at most for our purposes), graphs will occur in 2 + 1 = 3
dimensions. Thus we should learn a little bit about R³. In particular, we need to
familiarise ourselves with vector operations (which will reappear in generality in our
linear algebra section).

A vector in R³ is a triple ⟨x, y, z⟩ which we think of as an arrow from (0, 0, 0) to
(x, y, z). Between any two points in R³ we can obtain another vector: the vector
from (a₁, a₂, a₃) to (b₁, b₂, b₃) is ⟨b₁ − a₁, b₂ − a₂, b₃ − a₃⟩.
We'll need to recall the two main operations on vectors in R³. The first is common
to every vector space, which is the dot product (or scalar product):

⟨a₁, a₂, a₃⟩ · ⟨b₁, b₂, b₃⟩ = a₁b₁ + a₂b₂ + a₃b₃

The dot product is linear in each argument, that is,

(cv) · w = c(v · w) = v · (cw)

and

(u + v) · w = u · w + v · w

It's also commutative, which is pretty obvious from its definition. We also define the
norm of a vector using it:

‖v‖² = v · v

We say that v and w are orthogonal if v · w = 0.
The second is the cross product, which only exists on R³ (at least in this form):

⟨a₁, a₂, a₃⟩ × ⟨b₁, b₂, b₃⟩ = det | î   ĵ   k̂  |
                                  | a₁  a₂  a₃ |
                                  | b₁  b₂  b₃ |

where î = ⟨1, 0, 0⟩, ĵ = ⟨0, 1, 0⟩, and k̂ = ⟨0, 0, 1⟩ are the three unit basis vectors in
R³. The cross product is linear in each variable as well, but it is anticommutative:

v × w = −w × v

The cross product is actually characterised by the following properties: it is a bilinear
operation such that v × w is orthogonal to both v and w, the ordered set
{v, w, v × w} obeys the right-hand rule, and

‖v × w‖ = ‖v‖ · ‖w‖ · sin θ

where θ is the planar angle between the two vectors. In particular, v × w gives a
normal vector to the plane spanned by v and w. But before we investigate that,
let's look at planes in general:
There are two main ways to define a plane (in R³ at least). A plane is the solution
set of a linear equation in R³, so the equation looks like

ax + by + cz = d

for fixed a, b, c, d ∈ R. The better way to think about it is to consider a plane as the
set of vectors that are orthogonal to the normal vector to the plane:

n · v = 0

But this defines a plane that passes through the origin. To move the plane to a point
elsewhere, we shift it by a fixed amount:

n · (v − v₀) = 0

But now n · v₀ = d for some d ∈ R, so we obtain

n · v = d

and letting n = ⟨a, b, c⟩ and v = ⟨x, y, z⟩ recovers the above formula.
The cross product is particularly convenient when solving the following problems:

Problem 3.1. Find the equation of the plane in R³ passing through P = (0, 0, 1),
Q = (1, 0, 0) and R = (1, 1, 1).

Solution. Any three non-collinear points define a unique plane in R³, and we can
take for granted that these points are non-collinear (or check quickly). Recall above
we said that if we know that a plane is spanned by two vectors v and w, then v × w
is normal to the plane. If we know three points, we can come up with two vectors:

v = PQ = ⟨1, 0, −1⟩,   w = PR = ⟨1, 1, 0⟩

The cross product is

n = v × w = ⟨1, −1, 1⟩

Thus the equation of the plane is x − y + z = d, where d is some constant. We can
compute it by plugging in any point that is already on the plane, say (0, 0, 1). Hence
d = 0 − 0 + 1 = 1, so the plane has the equation x − y + z = 1.
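The computation in Problem 3.1 can be re-run in a few lines (a Python sketch, not from the notes; the helper names are mine):

```python
# Plane through three points via the cross product of two edge vectors.
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

P, Q, R = (0, 0, 1), (1, 0, 0), (1, 1, 1)
n = cross(sub(Q, P), sub(R, P))  # normal vector (1, -1, 1)
d = dot(n, P)                    # constant term: 1
# All three points satisfy n . v = d, i.e. x - y + z = 1.
```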


3.2. Parametric curves. We now need to look at parametrised curves in R³. These
are functions r : R → R³ which we write as r(t) = ⟨x(t), y(t), z(t)⟩. Skipping over
some details, limits, continuity, and differentiability of these parametric functions are
determined precisely by the component functions.

But what is the meaning of the derivative in this case? If it's done componentwise,
then r'(t) = ⟨x'(t), y'(t), z'(t)⟩, so every time we plug in a time t we get a whole vector
in R³ instead of a number. How are we supposed to use this to find the tangent line?
Luckily, we can shed the chains of R² and define lines parametrically instead. The
most appropriate form of the linear approximation to a curve at x = a in R² looks
like

L(x) = f(a) + f'(a)(x − a)

We can do something similar, defining a parametric line in R³ by

ℓ(t) = ⟨x(t₀) + x'(t₀)(t − t₀), y(t₀) + y'(t₀)(t − t₀), z(t₀) + z'(t₀)(t − t₀)⟩
     = r(t₀) + r'(t₀) · (t − t₀)

So what do we have? We have the point r(t₀) at which we are taking this linear
approximation (and this is a line), and we have a slope r'(t₀) which determines the
line's direction.
Problem 3.2. Find the linear approximation to the curve r(t) = ⟨t², t³, 2t − 1⟩ at
time t = 2.

s
What about arc length? We went over how to do this in R2 earlier: the arc length
of the curve y = f (x) is

re
Z bp
1 + f 0 (x)2 dx
a
p
which we obtained by trying to integrate ds = (dx)2 + (dy)2 . Now, we aren’t going
og
to want to integrate with respect to x, because these curves are functions of time t.
Moreover, we have three components, so that
p
ds = (dx)2 + (dy)2 + (dz)2
This is the length of the diagonal of the infinitesimal cube with sides dx, dy, dz. Thus
Pr
by ‘factoring out’ a dt from all these terms,
Z Z p
ds = (dx)2 + (dy)2 + (dz)2
s✓ ◆ ✓ ◆2 ✓ ◆2
Z 2
dx dy dz
= + + dt
dt dt dt
Z
In

k~r 0 (t)k dt

No surprise: we are integrating the length of the velocity, i.e. the (directionless) speed
of the curve at every point. This is, in fact, the same formula as we were dealing
with before. The planar curve y = p f (x) can be parametrised as ht, f (t)i, which has
0
derivative h1, f (t)i and thus speed 1 + f 0 (t)2 . Thus we can forget the old formula
and stick with the new.
Problem 3.3. Compute the arc length of the helix r(t) = ⟨sin t, cos t, t⟩ from t = 0
to t = 2π.
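The helix has constant speed, so the arc length integral is easy to verify numerically (a sketch, not from the notes):

```python
import math

# r'(t) = <cos t, -sin t, 1>, so the speed is sqrt(2) for every t.
def speed(t):
    return math.sqrt(math.cos(t) ** 2 + (-math.sin(t)) ** 2 + 1.0)

n = 10_000
dt = 2.0 * math.pi / n
length = dt * sum(speed((i + 0.5) * dt) for i in range(n))
# length is close to 2*pi*sqrt(2)
```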
One may recall studying curvature or other horrible topics, but these don’t seem
to appear on the GRE so we will not revisit them.
3.3. Multivariable functions. We will give everything in terms of two variables
for now, but the same could be done for 3 or n variables without changing the
definitions very much.

Suppose now that f : R² → R is a function. The definition of a limit is still the
same.

Definition 3.4. We say that lim_{(x,y)→(a,b)} f(x, y) = L if for every ε > 0, there exists
δ > 0 such that whenever ‖(x, y) − (a, b)‖ < δ, we have |f(x, y) − L| < ε.
We now discuss how to compute such limits, and how approaching along different
paths can show that a limit does not exist.
Problem 3.5. Prove that if lim_{(x,y)→(a,b)} f(x, y) = L, then for every continuous
path γ : [0, 1] → R² with γ(0) = (a, b), we have lim_{t→0} f(γ(t)) = L.

Problem 3.6. Prove that lim_{(x,y)→(0,0)} xy/(x² + y²) does not exist. Hint: find two
paths that give different limits.

Problem 3.7. Prove that lim_{(x,y)→(0,0)} xy²/(x² + y²) = 0.
Solution. For this, we will recall polar coordinates, which we will use more tomorrow.
If we convert the point (x, y) into polar coordinates, then we are instead taking
the limit r → 0, and under the substitution x = r cos θ, y = r sin θ, we need to evaluate

lim_{r→0} (r³ cos θ sin² θ)/r² = lim_{r→0} r (cos θ sin² θ)

But now we can apply the squeeze theorem to the problem:

0 ≤ lim_{r→0} |r (cos θ sin² θ)| ≤ lim_{r→0} |r| = 0

and conclude that the middle limit must be zero as well.


In general, these types of limits exist when the numerator has higher degree than
the denominator, and do not otherwise. In the case that we believe the limit to exist,
polar coordinates and the squeeze theorem are usually the way to prove it.

Continuity is defined the same way using these limits. A function is continuous if
the limit at every point of its domain equals the actual value of the function. All
functions from single-variable calculus are still continuous in multivariable calculus,
except that now we allow both x and y to appear.
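The path technique from Problems 3.6 and 3.7 is easy to see numerically (a sketch, not from the notes): along y = mx the first quotient depends on m, while the second goes to 0 regardless.

```python
# Evaluate each quotient very close to the origin along the line y = m*x.
f = lambda x, y: x * y / (x ** 2 + y ** 2)
g = lambda x, y: x * y ** 2 / (x ** 2 + y ** 2)

t = 1e-8
along = lambda h, m: h(t, m * t)

# f sees different values on different lines (0 vs 1/2), so it has no limit.
# g is tiny on every such line, consistent with its limit being 0.
```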
3.4. Partial derivatives. Differentiability is defined slightly differently for
multivariable functions. Instead of having one derivative, we have several.

Definition 3.8. The partial derivative of f : R² → R at (a, b) with respect to x is
(if it exists) the limit

∂_x f(a, b) = lim_{h→0} [f(a + h, b) − f(a, b)]/h

The partial derivative with respect to y is

∂_y f(a, b) = lim_{k→0} [f(a, b + k) − f(a, b)]/k
Definition 3.9. We say that f : R² → R is differentiable at (a, b) if both partial
derivatives ∂_x f(a, b) and ∂_y f(a, b) exist and f(x, y) is locally linear at (a, b).

The specific definition does not matter too much, but we have a nice property.

Proposition 3.10. If ∂_x f(a, b) and ∂_y f(a, b) exist and are continuous in a
neighbourhood of (a, b), then f(x, y) is differentiable at (a, b).
In this case, we can define the tangent plane to f(x, y) at (a, b) and it is actually the linear approximation: the tangent plane is spanned by the partial derivative in the x-direction and the one in the y-direction. We didn't go over above how to parametrise a plane using two variables, but we can do so now:

P(s, t) = s · ⟨1, 0, ∂x f(a, b)⟩ + t · ⟨0, 1, ∂y f(a, b)⟩ + ⟨a, b, f(a, b)⟩
But this isn't particularly helpful for us, since we would like a form in terms of (x, y, z). Instead, we will define the tangent plane using its normal vector. Because we know two vectors in the plane, which moreover are linearly independent, we take their cross product:

⟨1, 0, ∂x f(a, b)⟩ × ⟨0, 1, ∂y f(a, b)⟩ = det [ î  ĵ  k̂ ; 1  0  ∂x f(a, b) ; 0  1  ∂y f(a, b) ]
                                        = ⟨−∂x f(a, b), −∂y f(a, b), 1⟩
Hence our equation is ⟨−∂x f(a, b), −∂y f(a, b), 1⟩ · ⟨x − a, y − b, z − f(a, b)⟩ = 0. If we work this out and rearrange it, it becomes

z = ∂x f(a, b)(x − a) + ∂y f(a, b)(y − b) + f(a, b)
which looks a lot like the equation for the tangent line, except now there are two
slopes and two variables that need to be taken into account.
MATH GRE BOOTCAMP: LECTURE NOTES 27

Now, what about taking multiple partial derivatives? In principle one can take both ∂x∂y f(x, y) and ∂y∂x f(x, y). Are these the same? Are we detecting the same change in both x and y in both cases? In general, no we are not, but in every case that we'll run into during the GRE, yes. The reason is Clairaut's theorem:
Theorem 3.11 (Clairaut's Theorem). Suppose that f : R² → R is a function, and suppose that the second-order partial derivatives of f exist and are continuous in a neighbourhood of (a, b). Then ∂x∂y f(a, b) = ∂y∂x f(a, b).
This is convenient and not necessarily expected, but it does make a particular
technique in optimisation a whole lot more convenient later on. This also works
in more than 2 variables when taking partial derivatives with respect to any two
different independent variables.
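Here is a small numerical illustration of Clairaut's theorem (a sketch of my own; the particular f below is an arbitrary C² example, not one from the notes). Nested central differences approximate the two mixed partials, and they agree:

```python
import math

def f(x, y):
    # any C^2 function will do for illustrating Clairaut's theorem
    return math.sin(x * y) + x**3 * y**2

h = 1e-4
def dx(g, a, b):
    return (g(a + h, b) - g(a - h, b)) / (2 * h)
def dy(g, a, b):
    return (g(a, b + h) - g(a, b - h)) / (2 * h)

# approximate ∂y∂x f and ∂x∂y f at (0.7, 0.3) by nesting the differences
dxy = dy(lambda a, b: dx(f, a, b), 0.7, 0.3)
dyx = dx(lambda a, b: dy(f, a, b), 0.7, 0.3)
assert abs(dxy - dyx) < 1e-4
```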

3.5. Gradient and directional derivatives. Having done partial derivatives, it's fair to ask if there's anything resembling a 'total derivative' of the function, something that takes all the variables into account. There is.

Definition 3.12. For f : R² → R, define the gradient of f at (a, b) to be

∇f(a, b) = ⟨∂x f(a, b), ∂y f(a, b)⟩.
What good is the gradient for us? It is a vector quantity instead of a scalar quantity, which is interesting, and it still satisfies some nice properties. Because it is made of partial derivatives, it is still linear, and it satisfies a product rule:

∇(f(x, y) · g(x, y)) = f(x, y) · ∇g(x, y) + g(x, y) · ∇f(x, y)
where we think of f and g as scalar multiples (though depending on (x, y)). There is also a chain rule for the types of functions that we can actually compose at this point: suppose that φ : R → R, so that φ ∘ f : R² → R is still a multivariable function. Then

∇(φ ∘ f) = φ′(f(x, y)) · ∇f(x, y)

where again we think of the function φ′ : R → R as acting by scalar multiplication.
Problem 3.13. Compute the gradient of

g(x, y, z) = (x² + y² + z²)⁸
We can now talk about directional derivatives in other directions. The partial derivatives are the derivatives in the direction ⟨1, 0⟩ or ⟨0, 1⟩, but we could have used any other unit vector.
Definition 3.14. The directional derivative of f : R² → R at (a, b) in the direction ~u = ⟨h, k⟩ is the limit

∂~u f(a, b) = lim_{t→0} [f(a + th, b + tk) − f(a, b)] / t

Note that these may not exist if the function is not differentiable at (a, b). In particular, the existence of partial derivatives alone is not sufficient to conclude these exist. But in the case f(x, y) is differentiable, we have the following:

∂~u f(a, b) = ∇f(a, b) · ~u

which means that the above limit needs to be computed only rarely.
Problem 3.15. Prove that the directional derivatives of

f(x, y) = { xy⁴/(x² + y⁸)  if (x, y) ≠ ~0;   0  if (x, y) = ~0 }

at (0, 0) exist and depend linearly on the gradient, but that f(x, y) is not differentiable at (0, 0).

s
Problem 3.16. Prove that there is no function f (x, y) such that rf (x, y) = hy 2 , xi.
Hint: Clairaut’s theorem.

re
We have another version of the chain rule, where we compose a curve and a
multivariable function to obtain a function R ! R.
Theorem 3.17 (Chain Rule II). Let ~r : R → R² be a differentiable curve and let f : R² → R be a differentiable function. Then

(d/dt) f(~r(t)) = ∇f(~r(t)) · ~r′(t)

where this is now the dot product of the two vector-valued functions.
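Chain Rule II is easy to spot-check numerically (again a sketch of my own, with an arbitrary smooth f and curve ~r, not ones from the notes):

```python
import math

# arbitrary smooth choices: f(x, y) = x^2 y and r(t) = (cos t, t^2)
def f(x, y):
    return x**2 * y
def r(t):
    return (math.cos(t), t**2)

t0, h = 0.8, 1e-6
# left side: d/dt f(r(t)) at t0, by a central difference
lhs = (f(*r(t0 + h)) - f(*r(t0 - h))) / (2 * h)
# right side: grad f(r(t0)) . r'(t0), with both computed by hand
x0, y0 = r(t0)
grad = (2 * x0 * y0, x0**2)       # ∇f = ⟨2xy, x²⟩
rprime = (-math.sin(t0), 2 * t0)  # r' = ⟨-sin t, 2t⟩
rhs = grad[0] * rprime[0] + grad[1] * rprime[1]
assert abs(lhs - rhs) < 1e-6
```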
Problem 3.18. Prove this theorem from the definition of a single variable derivative.
The gradient is in the direction of greatest change on the graph z = f(x, y). How do we see this? The directional derivatives of f(x, y) tell us the rate of change in each direction. The direction ~u which makes the quantity ∇f(a, b) · ~u the largest is the unit vector in the direction of ∇f(a, b) itself. Similarly, −∇f(a, b) is the direction of greatest decrease.
For another thing, suppose that we look at the level curves of the graph z = f(x, y). These are the specific subsets f(x, y) = c for a fixed constant c ∈ R. Let ~r_c(t) parametrise the curve, and consider a point (a, b) = ~r_c(t₀) on this curve. Then the tangent vector to the curve is ~r′_c(t₀), and we can examine ∇f(a, b) · ~r′_c(t₀). By the chain rule,

∇f(a, b) · ~r′_c(t₀) = (d/dt) f(~r_c(t)) |_{t=t₀}

But on the curve ~r_c(t), f is the constant c. Thus the above derivative is zero. This means that the gradient is orthogonal to the tangent vector to the level curve, i.e. it is normal to the level curve. This is also true in higher dimensions, though it's a bit more complicated to prove.
Problem 3.19. What is the greatest rate of change of f(x, y) = x⁴y² at the point (a, b) = (2, 1)?
The last thing to say is on the subject of surfaces in R³ which are defined using 3-variable functions. Consider a function F : R³ → R and consider the set of points (x, y, z) such that F(x, y, z) = c. Assuming that F is a nice function (say, F is C² and ∇F is nowhere zero in all components), this defines a surface in R³, but usually one that isn't the graph of a function. If we consider the easiest example,

x² + y² + z² = 1

then we get a sphere, which we know isn't the graph of a function, but is the graph of two functions glued together.
Now, how do we find the tangent plane to such a surface? We clearly can't take the same approach because we don't have a function f(x, y) = z to deal with. Instead, we need to figure out how to use F(x, y, z). Suppose that ~r(t) is a curve on the surface F(x, y, z) = c. Then by the chain rule,

(d/dt) F(~r(t)) |_{t=t₀} = ∇F(~r(t₀)) · ~r′(t₀).

But F (~r(t)) = c is a constant function, so its gradient is the zero vector. Thus the
above dot product is also zero, so that rF (~r(t) is orthogonal to to the curve ~r(t) at
any point. Thus rF is orthogonal to the surface F (x, y, z) = c and thus we can use
Pr
it as the normal vector to the tangent plane. We see now the reason that rF should
not be identically zero – it would mean that the ‘normal vector’ to a tangent plane
is the zero vector, implying something is wrong with the geometry of the situation
Problem 3.20. What is the tangent plane to the surface x² + y² + z² = 3 at the point (1, 1, 1)?
Solution. This is defined by F(x, y, z) = x² + y² + z² and c = 3. We also have ∇F(x, y, z) = ⟨2x, 2y, 2z⟩. So at the point in question, we have

∇F(1, 1, 1) = ⟨2, 2, 2⟩
so the tangent plane has the formula
2x + 2y + 2z = d
for some d. To find d, we just plug in a point that we know is on the plane, namely
(1, 1, 1). Once we note that 2 + 2 + 2 = 6, we have
2x + 2y + 2z = 6 or x + y + z = 3.
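A few one-line checks (my own, in Python) confirm this answer: the point lies on the surface and on the plane, and the plane's normal is parallel to ∇F(1, 1, 1).

```python
# F(x, y, z) = x^2 + y^2 + z^2, the defining function from Problem 3.20
def F(x, y, z):
    return x * x + y * y + z * z

def gradF(x, y, z):
    return (2 * x, 2 * y, 2 * z)

p = (1, 1, 1)
assert F(*p) == 3            # the point lies on the surface F = 3
assert sum(p) == 3           # ...and satisfies the plane x + y + z = 3
# the plane's normal (1, 1, 1) is parallel to the gradient (2, 2, 2)
gx, gy, gz = gradF(*p)
assert (gx, gy, gz) == (2, 2, 2)
```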

3.6. Local extrema. First, let’s talk about finding local minima and maxima. Just
as in single-variable calculus, these occur at critical points.
Definition 3.21. Let f : R² → R be a differentiable function. Then we say that (a, b) ∈ R² is a critical point of f(x, y) if ∇f(a, b) = ~0. That is, ∂x f(a, b) = 0 and ∂y f(a, b) = 0.
Of course, just like in single-variable calculus, while every extremum occurs at a
critical point, not every critical point gives rise to an extremum. There are two methods in single-variable calculus to give us more information: the first derivative test
and the second derivative test. Neither has an immediate analogue in multivariable
calculus, but the second derivative test will turn out to be the solution.

s
But, as discussed above, there are four di↵erent ‘second derivatives’ of a given
function. What we do is assemble them into a matrix called the Hessian of f as

s
follows: ✓ ◆
@x @x f @x @y f
Hf = .

re
@y @x f @y @y f
Then the second derivative test says the following:
Theorem 3.22 (Second Derivative Test). Let f : R² → R be a function of class C². Suppose that (a, b) is a critical point of f(x, y). Let d = det Hf(a, b) be the determinant of the Hessian matrix and T = tr Hf(a, b) be its trace. The following conclusions hold:
• If d < 0, then the point (a, b) is a saddle point.
• If d > 0 and T < 0, then the point (a, b) is a local maximum.
• If d > 0 and T > 0, then the point (a, b) is a local minimum.
• If d = 0, the test is inconclusive.
If it's hard to remember which condition corresponds to maximum and which to minimum, then just remember single-variable: if f″(x) < 0, we have a local maximum and if f″(x) > 0, we have a local minimum. The trace follows the same convention. It's also a fun fact that, in the case that d > 0, ∂x²f and ∂y²f must have the same sign, so you can use one of those instead of the trace.


Remark 3.23. Why the second derivative test works requires some linear algebra to understand. Since f is of class C², Clairaut's theorem applies and thus the Hessian Hf is symmetric. A real symmetric matrix is diagonalisable by the spectral theorem (see below), and thus when we plug in a (critical) point (a, b) we have

Hf(a, b) ∼ [ λ₁  0
              0  λ₂ ]



This matrix corresponds to second derivatives in the two essential directions which are describing the behaviour of f(x, y) near (a, b). Thus we would want both directions to agree on what we're seeing. If λ₁, λ₂ > 0, then both directions think we are concave up and thus we should be at a minimum. If λ₁, λ₂ < 0, then both directions think we are concave down and thus we should be at a maximum. However, if λ₁ and λ₂ have different signs, then this means that we are concave up in one direction and concave down in another – a saddle point. We have a similar problem if λ₁ or λ₂ are equal to zero.
How does this reasoning apply to the second derivative test? The determinant of the Hessian is equal to λ₁λ₂. If λ₁ and λ₂ have the same sign, then d > 0. Otherwise, d ≤ 0. The trace of the Hessian is equal to λ₁ + λ₂, which lets us figure out if both are positive or both are negative (in the case that d > 0).
The second derivative test for R² takes advantage of a particular quirk: the product of two numbers is positive if and only if the numbers have the same sign. If we were to discuss local extrema in R³ or higher, we would end up needing to analyse three eigenvalues λ₁, λ₂, λ₃. It's impossible to tell if three numbers are all positive just from their product and sum: the triple (3, −1, −1) has positive determinant and trace, but corresponds to a saddle point.
Let's have one example before moving on:

Problem 3.24. Find and classify all critical points of f(x, y) = (x² + y²)e⁻ˣ.
Solution. First we need the gradient:

∂x f(x, y) = (x² + y²)(−e⁻ˣ) + (2x)e⁻ˣ = (2x − x² − y²)e⁻ˣ
∂y f(x, y) = 2y e⁻ˣ

Starting with the y-derivative, we must have y = 0. Plugging that into the x-derivative,

∂x f(x, 0) = (2x − x²)e⁻ˣ = (2 − x) · x · e⁻ˣ
giving us two solutions: (2, 0) and (0, 0), as e⁻ˣ will never equal zero. We now need to compute the Hessian. It's useful here to take advantage that Clairaut's theorem applies, so that ∂y∂x f = ∂x∂y f:

∂x² f(x, y) = (2x − x² − y²)(−e⁻ˣ) + (2 − 2x)e⁻ˣ = (2 − 4x + x² + y²)e⁻ˣ
∂x∂y f(x, y) = −2y e⁻ˣ
∂y² f(x, y) = 2e⁻ˣ

Thus we can compute some Hessians:

Hf(0, 0) = [ 2  0
             0  2 ]

We're already diagonal, so we can see that this corresponds to a local minimum.

Hf(2, 0) = [ −2e⁻²    0
              0     2e⁻² ]

Now the eigenvalues have opposite signs, so this corresponds to a saddle point.
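If you want to double-check a classification like this without redoing the algebra, finite differences will do it (a numerical sketch of my own, using the d and T criteria from Theorem 3.22):

```python
import math

# f from Problem 3.24; all derivatives approximated by central differences
def f(x, y):
    return (x**2 + y**2) * math.exp(-x)

h = 1e-4
def grad(x, y):
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def hessian(x, y):
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy

for a, b in [(0, 0), (2, 0)]:
    gx, gy = grad(a, b)
    assert abs(gx) < 1e-6 and abs(gy) < 1e-6     # both points are critical

fxx, fxy, fyy = hessian(0, 0)
assert fxx * fyy - fxy**2 > 0 and fxx + fyy > 0  # d > 0, T > 0: local minimum
fxx, fxy, fyy = hessian(2, 0)
assert fxx * fyy - fxy**2 < 0                    # d < 0: saddle point
```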
3.7. Optimisation and Lagrange Multipliers. Let's now turn to global maxima. Optimisation in multivariable calculus works about the same as in single-variable calculus: first, figure out the region on which you are trying to optimise. Check for critical points of your function on the inside of your region, then check the boundary. It's not necessary to classify the critical points because, at the end of the day, we're just going to write down a list of values and pick the biggest and smallest one.

In single-variable, the boundary of a compact region (i.e. closed and bounded) is always a discrete set of points which you can check individually. In multivariable calculus, regions are two-dimensional so boundaries are one-dimensional. This means that the 'check the boundary' step in multivariable calculus is just an ordinary optimisation problem in single-variable calculus, which (having gotten this far in the course) you already know how to do.
Enough talk – let’s have an example.
Problem 3.25. Find the maximum of the function f(x, y) = x + y − x² − y² − xy on [0, 2] × [0, 2] ⊂ R².
Solution. We are working on a compact region so we are guaranteed a maximum, so that's a relief. First, find the gradient of the function:

∇f(x, y) = ⟨1 − 2x − y, 1 − 2y − x⟩
We need to simultaneously solve 1 − 2x − y = 0 and 1 − 2y − x = 0. We've known how to do this since Algebra I, so we skip the step and find the critical point is at (1/3, 1/3). If we were very bold, we would conclude that this must be the maximum because it's the only critical point, but we need to check the boundary.
The boundary of this region is made up of four lines, which we need to parametrise using a single variable. The first edge is the bottom edge (0, 0) → (2, 0), for which we have the parametrisation ~r₁(t) = ⟨t, 0⟩ for t ∈ [0, 2]. This gives us a single-variable problem:

f₁(t) = f(~r₁(t)) = t − t²  ⟹  f₁′(t) = 1 − 2t

giving us a critical point (on this line) of t = 1/2, so the point (1/2, 0) all in all.
Noticing that f (x, y) = f (y, x), we will obtain a critical point (0, 1/2) on the left
edge of the square.
Moving to the top edge, we have ~r₂(t) = ⟨t, 2⟩ for t ∈ [0, 2]. Solving as above,

f₂(t) = f(~r₂(t)) = t + 2 − t² − 4 − 2t = −2 − t − t²  ⟹  f₂′(t) = −1 − 2t

giving us a critical point at t = −1/2. This is outside our region, so we ignore it. By symmetry, we won't get anything on the right edge either.
The last step is to check the boundaries of our boundary edges, which are the corners of the square: (0, 0), (0, 2), (2, 0), and (2, 2). Having assembled all our points, we now get a list of values:

f(0, 0) = 0
f(0, 2) = f(2, 0) = −2
f(2, 2) = −8
f(0, 1/2) = f(1/2, 0) = 1/4
f(1/3, 1/3) = 1/3
which proves that, indeed, the maximum was at the critical point (1/3, 1/3) all along. However, we now know that the minimum of the function occurs at (2, 2).
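A brute-force grid search (my own sanity check, not a proof) agrees with both conclusions:

```python
# Grid check over [0, 2] x [0, 2] for f(x, y) = x + y - x^2 - y^2 - xy
def f(x, y):
    return x + y - x**2 - y**2 - x*y

n = 400
pts = [(2 * i / n, 2 * j / n) for i in range(n + 1) for j in range(n + 1)]
vals = [f(x, y) for (x, y) in pts]
best, worst = max(vals), min(vals)

assert abs(best - 1/3) < 1e-3       # maximum value 1/3, near (1/3, 1/3)
assert worst == f(2, 2) == -8       # minimum value -8, at the corner (2, 2)
```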
Problem 3.26. Find the global extrema of the function f(x, y) = x² − x·y on the ellipse x² + 4y² ≤ 4.
Solution. The first step is to find the critical points of the function on the interior of the ellipse, which I will leave as an exercise. The problem comes with checking the boundary – it is certainly one-dimensional, but how do we parametrise it? Here is a sub-exercise for you to do:
Problem 3.27. Show that the ellipse x²/a² + y²/b² = 1 is parametrised by ~r(θ) = ⟨a cos θ, b sin θ⟩ for θ ∈ [0, 2π].
Once you've done this problem, you'll be equipped to parametrise the boundary and complete the problem. Unlike the case of the square, we do not have a 'boundary of the boundary' in this case since the ellipse doesn't have endpoints.
We now turn to the special case of Lagrange multipliers. This applies to optimisation of functions g : R³ → R on closed surfaces (i.e. compact surfaces without boundary) in R³ defined implicitly by F(x, y, z) = 0 (or F(x, y, z) = c for any c, but by modifying F we can assume c = 0). It can also apply to optimisation on ellipses or circles in R², but we will only demonstrate in the more difficult case.

Suppose that we are trying to find a maximum of g(x, y, z) on S = F⁻¹(0) for a C² function F : R³ → R. Let's pick a random point (a, b, c) (where we do not mean c as above). We can examine the gradient ∇g(a, b, c), which points in the direction
of greatest change of g. We can use this direction to move along S to another point (a′, b′, c′) near to our starting point such that g(a′, b′, c′) > g(a, b, c). There is one circumstance when that fails – if ∇g is pointing directly away from the surface S, we cannot travel in that direction at all.
But we already know what direction is directly away from S – it is ∇F(a, b, c). Thus:

Theorem 3.28 (Lagrange Multipliers). Let F be a C² function so that F(x, y, z) = 0 defines a closed surface S in R³, and let g : R³ → R be a C¹ function. Then g has its local extrema at those points (a, b, c) such that ∇F(a, b, c) and ∇g(a, b, c) are parallel, i.e. there exists λ ∈ R such that ∇g(a, b, c) = λ∇F(a, b, c).

s
Note that this includes the case rg = ~0 identically, which would correspond to a

s
local maximum or minimum of g(x, y, z) without constraining ourselves to S.

re
Problem 3.29. Find the point on the plane
x y z
+ + =1
2 4 4
og
closest to the origin in R3 , then compute the distance.
Solution. As always, we need a constraint function and a function to optimise. The function to optimise is d(x, y, z) = √(x² + y² + z²), which is a bit messy. As we argued above, it suffices to minimise d² = g(x, y, z) = x² + y² + z², which will have a much nicer gradient. Our constraint is F(x, y, z) = x/2 + y/4 + z/4 − 1 = 0. Computing gradients, we have

∇F(x, y, z) = ⟨1/2, 1/4, 1/4⟩,   ∇g(x, y, z) = ⟨2x, 2y, 2z⟩.
Thus we are looking for a simultaneous solution to

λ/2 = 2x,   λ/4 = 2y,   λ/4 = 2z.
A key point to the theory of Lagrange multipliers is that we never need to compute λ, but we can use it symbolically to arrange all that we have. Solving each of those equations for λ tells us that

λ = 4x = 8y = 8z  ⟹  x = 2y = 2z

so we can use the one-variable substitution y = x/2 and z = x/2 to compute an actual point on this plane:

x/2 + (x/2)/4 + (x/2)/4 = 1  ⟹  (3/4)x = 1  ⟹  x = 4/3,  y = z = 2/3.

Answering the question, we have to plug all this in to the original distance function:

d(4/3, 2/3, 2/3) = √(16/9 + 4/9 + 4/9) = 2√6/3.
3
But wait! We never determined that this was a minimum! Fortunately, we can
appeal to our other senses: it’s very easy for a point on a plane to get far away from
the origin, but it’s difficult for it to be close. We should expect a minimum but no
maximum. As such, any extremum we encounter should be a minimum.
If we're being extra fancy, we can compute the (three-dimensional!) Hessian for g(x, y, z). Most of the second partial derivatives are zero, and the Hessian is diagonal with entries (2, 2, 2). Thus we are in a permanent state of concave up, i.e. all local extrema are minima.
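All of the arithmetic above can be verified exactly with rational arithmetic (a check of my own in Python):

```python
import math
from fractions import Fraction as Fr

x, y, z = Fr(4, 3), Fr(2, 3), Fr(2, 3)
# the candidate point lies on the plane x/2 + y/4 + z/4 = 1
assert x/2 + y/4 + z/4 == 1
# ∇g = (2x, 2y, 2z) equals λ·∇F = λ·(1/2, 1/4, 1/4) with λ = 4x = 16/3
lam = Fr(16, 3)
assert (2*x, 2*y, 2*z) == (lam * Fr(1, 2), lam * Fr(1, 4), lam * Fr(1, 4))
# distance to the origin is 2√6/3
d = math.sqrt(x*x + y*y + z*z)
assert abs(d - 2 * math.sqrt(6) / 3) < 1e-12
```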

I leave you with a classical practice problem:

Problem 3.30. What is the maximum of the function g(x, y, z) = xyz on the unit sphere?

You perhaps know intuitively what the answer should be, but see how the method of Lagrange multipliers bears out your intuition.
4. Day 4: Multivariable calculus
How do you integrate in two variables? First, learn how to integrate over boxes. You can do that using Riemann sums, but I really don't want to do that. It's conceptually important but not worth typing up in the grand scheme of things. Again, these integrals are linear and you can separate them up and so on. There's a technical definition for when functions are integrable, but in all cases we care about it will suffice to know that continuous functions on bounded domains (with non-ridiculous boundaries) are integrable.
For boxes, it doesn’t matter whether you integrate over x or y first.
Theorem 4.1 (Fubini's Theorem). Let f : R² → R be continuous and let R = [a, b] × [c, d] be a rectangle. Then

∬_R f(x, y) dA = ∫_a^b ∫_c^d f(x, y) dy dx = ∫_c^d ∫_a^b f(x, y) dx dy

Then, for integrating functions over more complicated regions D which are not
rectangles, you want to parametrise the region in terms of x in some range then y
as a function of x, then integrate y first then x. Or you can do it the other way,
depending on exactly how your region looks.
Problem 4.2. Integrate f(x, y) = xy over the region bounded by y = 4 and y = x² in the first quadrant.

Solution. We can easily describe this region as x ∈ [a, b] with φ(x) ≤ y ≤ ψ(x). Since we are in the first quadrant, we must start at x = 0. The end point is where these two curves intersect, which occurs at (2, 4). Thus we would like to integrate:

∫₀² ∫_{x²}^4 xy dy dx
Why have we ordered the y-integral like this? Drawing out the region shows that y = x² is on bottom and y = 4 is on top. When performing multivariable integrals, if we are integrating with respect to y we just pretend that x is a constant (because it is for our purposes):
∫_{x²}^4 xy dy = [xy²/2]_{y=x²}^{y=4} = 8x − x⁵/2.
This shows why we are integrating with respect to y first. If we were to do this integral second, our final answer would still have variables, which is suboptimal for a definite integral. But now we integrate with respect to x and all our variables will vanish:

∫₀² (8x − x⁵/2) dx = [4x² − x⁶/12]₀² = 16 − 64/12 = 32/3.
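A midpoint Riemann sum (my own numerical check, not part of the notes) agrees with 32/3:

```python
# Midpoint Riemann sum for ∫₀² ∫_{x²}^{4} xy dy dx; expected value 32/3
n = 400
total = 0.0
for i in range(n):
    x = 2 * (i + 0.5) / n          # midpoint in x on [0, 2]
    lo, hi = x * x, 4.0            # y runs from x² up to 4
    for j in range(n):
        y = lo + (hi - lo) * (j + 0.5) / n
        total += x * y * (2 / n) * ((hi - lo) / n)

assert abs(total - 32/3) < 1e-2
```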
If our regions are oriented in the other fashion, we should integrate first with respect to x then with respect to y. As an example,

Problem 4.3. Compute the area between the curves x = y² and x = 2y in the first quadrant.
To find the area of the region, just integrate the function f(x, y) = 1. The hard part is setting up the bounds, which I leave to you.
When our function is not just f(x, y) = 1, the integral over R is the volume under the surface z = f(x, y) in R³ which lies over R in the xy-plane. This means that instead of calculating the area between curves, we can calculate the volume of a region between surfaces. In particular, when a certain region has a nice boundary with respect to the xy-plane and its upper and lower boundaries are nice functions of x, y, we're in business. We can also do this with triple integrals.
4.1. Triple Integrals. Really, there's not a whole lot different here, except that we have three variables instead of two. Riemann sums are Riemann sums, except now they're one dimension more annoying. Supposing that we can parametrise our region W ⊂ R³ analogously, so that its boundary is of the form z₁(x, y) ≤ z ≤ z₂(x, y) over a region D in the xy-plane, then

∭_W f(x, y, z) dV = ∬_D ( ∫_{z₁(x,y)}^{z₂(x,y)} f(x, y, z) dz ) dA

Moreover, if we want the volume of W, we just integrate the function 1.

Problem 4.4. Evaluate ∭_W z dV where W is the region between the planes z = x + y and z = 3x + 5y over the rectangle [0, 3] × [0, 2]. Then compute its volume.
Solution. Because x, y ≥ 0, it's clear that the plane z = 3x + 5y is on top. Thus we have x + y ≤ z ≤ 3x + 5y for our z-boundary. This sets up the triple integral.

∬_R ∫_{x+y}^{3x+5y} z dz dA = ∬_R [(3x + 5y)²/2 − (x + y)²/2] dA
                            = ∬_R (4x² + 14xy + 12y²) dA
and now we just have to integrate over the rectangle, which is pretty straightforward, thus left as an exercise.
To find the volume, we have two conceptual choices that amount to the same integral. First, it's the region between two surfaces, so by the brief comment above, we could solve

∬_R [(3x + 5y) − (x + y)] dA
That is, we want to integrate over the region R the difference in the heights of these functions. Alternatively, we perform the same integral by plugging in 1 instead of z:

∬_R ∫_{x+y}^{3x+5y} 1 dz dA = ∬_R [(3x + 5y) − (x + y)] dA

which amounts to the same thing. This is an even easier computation that I will not do.
This example is slightly more confusing.
Problem 4.5. Integrate f(x, y, z) = x over the region W bounded above by z = 4 − x² − y² and below by z = x² + 3y² in the first octant.
Solution. In order to parametrise the region in the xy-plane over which W lies, we need to compute the intersection of the surfaces. This turns out to be an ellipse:

4 − x² − y² = x² + 3y²  ⟹  4 = 2x² + 4y²

Call the quarter of this ellipse we care about E. Our integral is thus

∬_E ∫_{x²+3y²}^{4−x²−y²} x dz dA

We need
p to solve the ellipse in terms of x or y, and we must as well pick x. We have
x = ± 2 2y 2 . We also know that x 0 and y 0 in the part we care about, so
38 IAN COLEY
p
we will pick 0  x  2 2y 2 . The bounds of y are 0  y  1. Therefore we can
set up our integral and go:
Z 1 Z 2 2y2 Z 4 x2 y2
x dz dx dy.
0 0 x2 +3y 2

The answer is 16/15, and the computation is left as practice.
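If you want to check your work after setting everything up, a midpoint Riemann sum over exactly these bounds (my own sketch) lands on 16/15:

```python
import math

# Midpoint sum for ∫₀¹ ∫₀^√(2-2y²) ∫_{x²+3y²}^{4-x²-y²} x dz dx dy
n = 40
total = 0.0
for i in range(n):
    y = (i + 0.5) / n
    xmax = math.sqrt(2 - 2 * y * y)
    for j in range(n):
        x = xmax * (j + 0.5) / n
        zlo, zhi = x*x + 3*y*y, 4 - x*x - y*y
        for k in range(n):
            # integrand is just x; dV = (1/n)(xmax/n)((zhi - zlo)/n)
            total += x * (1 / n) * (xmax / n) * ((zhi - zlo) / n)

assert abs(total - 16/15) < 1e-2
```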


4.2. Change of coordinates. There are other coordinate systems that we greatly
prefer in the case of roundedness. In two dimensions, we already remembered polar
coordinates to do some limit computations. We even recalled how to parametrise
an ellipse in the last section. Discs, annuli, and their sections are the ‘rectangles’ of
polar coordinates. We can recall that

x = r cos θ,  y = r sin θ  ⟺  x² + y² = r²,  tan θ = y/x

converts between the two. But is doing an integral like ∬_R f(x, y) dx dy as easy as ∬_R f(r, θ) dr dθ?
No, it's not. The problem is that dr dθ is not the same area as dx dy. In fact, we can draw the usual picture and prove that

dx dy = r dr dθ
Thus swapping your integral into polar coordinates is almost as easy as posited.
Problem 4.6. Compute the area of the unit circle using polar coordinates.

Solution. The unit circle is described as θ ∈ [0, 2π] and r ∈ [0, 1]. Thus its area is

∫₀^{2π} ∫₀¹ r dr dθ = 2π · [r²/2]₀¹ = π.
Note that if we forget to include that r, we get

π ≠ ∫₀^{2π} ∫₀¹ 1 dr dθ = 2π

which is a wrong answer.
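Both computations are easy to reproduce numerically (a sketch of my own): the midpoint sums below give π with the r and 2π without it.

```python
import math

# Midpoint sums over r ∈ [0, 1]; the θ integral contributes the factor 2π
n = 2000
with_r = sum((i + 0.5) / n * (1 / n) for i in range(n)) * 2 * math.pi
without_r = sum(1.0 * (1 / n) for i in range(n)) * 2 * math.pi

assert abs(with_r - math.pi) < 1e-6        # correct area of the unit circle
assert abs(without_r - 2 * math.pi) < 1e-6  # the 'forgot the r' wrong answer
```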
So that's for two dimensions; what about three? There's an analogue of polar coordinates called cylindrical coordinates, which just adds z as the third variable. It's another easy computation that dx dy dz = r dr dθ dz. These coordinates are best used with surfaces or regions that have nice symmetry when rotating around the z-axis but not for any other types of rotation, for example cones, cylinders, and hyperboloids or paraboloids.

Spherical coordinates are most useful for spheres, and can occasionally be useful in other situations. The conversions are as follows:

x² + y² + z² = ρ²,  cos φ = z/ρ,  tan θ = y/x.

Conversely (and more usefully),

x = ρ sin φ cos θ,  y = ρ sin φ sin θ,  z = ρ cos φ

A calculation that's possible but a little beyond the scope of this course is that

dx dy dz = ρ² sin φ dρ dφ dθ
Problem 4.7. Compute the volume of the region between the surfaces z = x² + y² and z = 8 − x² − y².

Solution. The first surface is on bottom and the second is on top. The intersection between these surfaces is

x² + y² = 8 − x² − y²  ⟹  x² + y² = 4

which is the circle of radius 2. Thus we can compute this volume as a cylindrical integral over the disc D given by r ≤ 2. The first thing is rephrasing the integrand in terms of polar coordinates.

(8 − x² − y²) − (x² + y²) = 8 − 2(x² + y²) = 8 − 2r²
Thus:

∬_D (8 − 2r²) dA = ∫₀^{2π} ∫₀² (8 − 2r²) r dr dθ
                 = 2π · ∫₀² (8r − 2r³) dr
                 = 2π · [4r² − r⁴/4]₀² = 2π · (16 − 4) = 24π.
4.3. Quadric surfaces. Now would probably be a good time to go over quadric surfaces, i.e. the basic surfaces we will encounter in R³.

Ellipsoid. These are the analogue of ellipses, and look basically the same: for positive numbers a, b, c ∈ R, we have

x²/a² + y²/b² + z²/c² = 1

the ellipsoid with radii a, b, c in the x, y, z directions respectively. The volume of such an ellipsoid is (4/3)πabc, as one might expect.
Elliptic paraboloid. These can be written as the graph of a function, namely for a, b > 0,

z = ax² + by².

At each fixed value z = c, we get an ellipse. If we fix x = 0, we get a parabola z = by², and similarly if we fix y = 0 we get z = ax². This justifies the name. Of course, in this version the elliptical slices are all parallel to the xy-plane and enclose the z-axis. It's also possible to permute the variables for other options, e.g.

y = x² + z²,  x = 2y² + 3z².

Hyperbolic paraboloid. Not covered by the above permutations is what happens if we flip the sign on one of ax² or by². Suppose that we take z = x² − y² as a simple example. Then the slices z = c are of the form c = x² − y², which we can rearrange to obtain y = ±√(x² − c). This is the formula of a hyperbola. However, if we again look at the slices with x = 0 or y = 0, we obtain two parabolas, except that one is facing up and one is facing down – hence hyperbolic paraboloid.
This type of shape is incredibly difficult to draw, but for one of these the point (0, 0) is a saddle point. Thus these are the Pringle-shaped graphs that show up when we learn the second derivative test.
Hyperboloid of one sheet. What if we have an ellipsoid but then flip one of the signs? Then we can arrange it to obtain

x²/a² + y²/b² = z²/c² + 1

up to permutation of variables. Then if we set x = 0 or y = 0 we obtain again the formula of a hyperbola, and now the slices at fixed values of z are ellipses. This is not a combination we have seen before and we baptise it hyperboloid. You'll want to Google what these look like. If z is the isolated variable on the other side of the equation, then we see that the elliptical slices are again parallel to the xy-plane.
Hyperboloid of two sheets. Suppose that we flip two of the signs on an ellipsoid. Then up to permuting variables, we obtain

x²/a² + y²/b² = z²/c² − 1
A fair question is how this differs from the last example. It doesn't really – we still obtain hyperbolas if x = 0 or y = 0 and the horizontal slices are ellipses. But now what if we plug in z = 0? Then we have to solve

x²/a² + y²/b² = −1

but this has no solutions. In fact, we need |z| ≥ c for there to be any points in x, y that satisfy the equation. Thus the two halves of the hyperboloid are separated from each other, i.e. there are two separate 'sheets'.

Cone. A special case is the intermediate point between the two kinds of hyperboloids:

x²/a² + y²/b² = z²

where we imagine we have multiplied through by c² and reorganised our constants a, b. Then when z = 0, there is only one point (0, 0, 0) on the level set. Thus our two sheets are joined at a single point, and it's not hard to see that we have a cone. If we set x = 0 or y = 0, we get (for example)

x²/a² = z²  ⟹  z = ±x/a
which is a pair of lines intersecting at the origin. This certainly feels like the slice of a cone (as it's nice and pointy).
4.4. Vector fields and fancier integration. We now turn to the second kind of integration in multivariable calculus, namely those involving vector-valued functions.

Definition 4.8. A vector field is a function f : Rⁿ → Rⁿ, which we think of as assigning to each point in Rⁿ a vector in Rⁿ beginning at that point.
At this point, I would draw a picture, or steal one from StackExchange.¹ There are many vector fields in real life, two easily coming to mind: the gravitational vector field, which expresses the force (and direction) due to gravity on any object in space, and the vector field of wind on Earth – to each point on Earth we can assign the vector of which direction (and speed) the wind is blowing.
The most boring example is when we are still working with real-valued functions. If we want to integrate a real-valued function over some curve C in R³ (or a surface, but let's stick with curves), then we think of C as being the image of some γ : [a, b] → R³ which is continuous except at finitely many points (with technical details omitted). Then the function we are considering is

f(γ(t)) : R → R
In

But this isn’t quite enough, because the parametrisation matters. We need to make
sure that the speed at which we are traversing this curve is taken into account, i.e.
Z Z b
f (x, y, z) = f ( (t))k 0 (t)k dt
C a
Note that for this equation to be without problems, we want to assume that 0 (t)
and f (x, y, z) are continuous. But this is the boring example.
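As a quick numerical sanity check of this formula (my addition, not the original notes – it assumes SciPy and uses a hypothetical example, f(x, y, z) = x + z over a helix):

```python
import numpy as np
from scipy.integrate import quad

# Scalar line integral of f(x, y, z) = x + z over the helix
# gamma(t) = (cos t, sin t, t), t in [0, 2*pi].
def integrand(t):
    x, y, z = np.cos(t), np.sin(t), t
    speed = np.linalg.norm([-np.sin(t), np.cos(t), 1.0])  # ||gamma'(t)|| = sqrt(2)
    return (x + z) * speed

value, _ = quad(integrand, 0.0, 2.0 * np.pi)
# Closed form: sqrt(2) * (0 + 2*pi^2) = 2*sqrt(2)*pi^2
print(value)
```

The cos-term integrates to zero over a full period, so only the t-term survives.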
Now, suppose we are thinking physics, and we want to know something like ‘how
much energy does it take to fight gravity or the wind’ ? This would involve a vector
1https://tex.stackexchange.com/questions/328036/velocity-field-3d-vector-fields-in-tikz-or-pgfplots
42 IAN COLEY

field F : R3 ! R3 . In this case, whenever the curve travels in the same direction
as the vector field, we would like to value that positively (going with the flow), and
negative when the curve travels against it. Remember that we cannot think of C as
just a 1-dimensional object in R³; it comes with an orientation – it has a back and
a front, and the function γ : [a, b] → R³ we use needs to take this into account.
What vector operation determines whether things go in the same direction? The
dot product. What gives the (linear) direction the curve is going? Its tangent vectors
γ′(t). Thus:
Definition 4.9. The line integral of a vector field F along a curve C, parametrised
by γ : [a, b] → R³, is

∫_C F · dr = ∫ₐᵇ F(γ(t)) · γ′(t) dt

Both of these quantities are vectors, so it makes sense to dot them. Another way
the expression F · dr is sometimes written is F₁ dx + F₂ dy + F₃ dz, where these are
the component functions of F. This will come up later.
Problem 4.10. Compute the line integral of F = ⟨z, y², x⟩ along the curve γ(t) =
(t + 1, eᵗ, t²) for t ∈ [0, 2].
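Problem 4.10 can be checked numerically – this sketch is mine (SciPy assumed), comparing quadrature against the closed form 12 + (e⁶ − 1)/3:

```python
import numpy as np
from scipy.integrate import quad

# Line integral of F = <z, y^2, x> along gamma(t) = (t+1, e^t, t^2), t in [0, 2].
def integrand(t):
    x, y, z = t + 1.0, np.exp(t), t**2       # gamma(t)
    dx, dy, dz = 1.0, np.exp(t), 2.0 * t     # gamma'(t)
    Fx, Fy, Fz = z, y**2, x                  # F evaluated along the curve
    return Fx * dx + Fy * dy + Fz * dz       # = 3t^2 + 2t + e^{3t}

value, _ = quad(integrand, 0.0, 2.0)
exact = 12.0 + (np.exp(6.0) - 1.0) / 3.0     # antiderivative t^3 + t^2 + e^{3t}/3
print(value, exact)
```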
og
What are some basic properties of the line integral? They are still linear, and
now if one reverses the orientation of the curve C, this is like swapping a, b on the
righthand side of the above equation, hence negates the integral. The last thing is
that stringing together multiple curves end-to-end gives a sum of integrals.
4.5. Conservative vector fields. These are just the best. Our prototype here is
a vector field F that arises as ∇f for some f : R³ → R. Such a vector field is called
conservative. Then we can use the fundamental theorem of calculus to evaluate

∫_C F · dr = ∫ₐᵇ ∇f(γ(t)) · γ′(t) dt = ∫ₐᵇ d/dt [f(γ(t))] dt = f(γ(b)) − f(γ(a))
But γ(b) and γ(a) are just the endpoints of the curve, so the actual curve C
doesn't matter in this case. Any vector field for which this happens is called path-
independent.
Similarly, if C is a closed curve, its endpoints are the same, so

∮_C F · dr = 0

Problem 4.11. Verify that F(x, y, z) = ⟨2xy + z, x², x⟩ is the gradient of a function,
then evaluate the line integral over the curve γ(t) = ⟨sin(t) cos(πt), eᵗ, 4t³ − 1⟩ for
t ∈ [0, 1/2].
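For Problem 4.11, a candidate potential is f = x²y + xz (my guess, easily verified). With SymPy one can confirm the gradient and evaluate the integral from the endpoints alone:

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

# Candidate potential (an assumption of this sketch): f = x^2*y + x*z
f = x**2 * y + x * z
F = (2*x*y + z, x**2, x)  # the field from Problem 4.11

# Check that F = grad f, component by component
assert [sp.diff(f, v) for v in (x, y, z)] == list(F)

# Endpoints of gamma(t) = (sin t cos(pi t), e^t, 4t^3 - 1) at t = 0 and t = 1/2
gamma = (sp.sin(t) * sp.cos(sp.pi * t), sp.exp(t), 4 * t**3 - 1)
ends = [tuple(g.subs(t, val) for g in gamma) for val in (0, sp.Rational(1, 2))]
vals = [f.subs(dict(zip((x, y, z), p))) for p in ends]
integral = sp.simplify(vals[1] - vals[0])
print(integral)  # 0: both endpoints have x = 0, so f vanishes at both
```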

One might ask if there are path-independent vector fields that do not arise as the
gradient of some function. The answer is, essentially, no.
Theorem 4.12. A vector field F on an open, connected domain D is path-independent
if and only if it is conservative.
Now, let us go over some of the other vector derivatives that will become useful
shortly. The first is the divergence of a vector field,

div F(x, y, z) = ∇ · F = ∂x F₁ + ∂y F₂ + ∂z F₃
and the second is the curl of a vector field, which only makes sense in R³:

curl F(x, y, z) = ∇ × F = det ⎡ î   ĵ   k̂  ⎤
                              ⎢ ∂x  ∂y  ∂z ⎥
                              ⎣ F₁  F₂  F₃ ⎦

Problem 4.13. Prove that if F = ∇f is conservative, then curl F = 0⃗.
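Problem 4.13 can also be verified symbolically – this is my sketch with SymPy; the whole point is the equality of mixed partials:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)

# F = grad f for a generic smooth f
F = [sp.diff(f, v) for v in (x, y, z)]

# curl F via the determinant formula above
curl = [
    sp.diff(F[2], y) - sp.diff(F[1], z),
    sp.diff(F[0], z) - sp.diff(F[2], x),
    sp.diff(F[1], x) - sp.diff(F[0], y),
]
simplified = [sp.simplify(c) for c in curl]
print(simplified)  # [0, 0, 0] by equality of mixed partials
```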
The converse to this problem is not always true, but it is true in a great many
cases.

Theorem 4.14. Suppose that D is an open simply-connected domain. A vector
field F on D is conservative if and only if curl F = 0⃗.

Hence one should check that the domain they are working on is simply-connected,
which (we remember) means that all loops in D can be contracted to a point. That means
that something like x² + y² ≤ c is okay, but punctured regions like R² \ {(a, b)} are
not.
4.6. Surface integrals. In order to integrate using surfaces in R³, we need to be
able to parametrise them like

G(u, v) = (x(u, v), y(u, v), z(u, v))

where (u, v) are in some region D in R². For example, the graph of the function
z = f(x, y) in R³ is easily parametrised by (x, y, f(x, y)). This is our prototype, but
of course not all surfaces in R³ are graphs.
Suppose we want to parametrise the cylinder x² + y² = 1 in R³. This is best done
with cylindrical coordinates (of course), yielding an easy parametrisation (1, θ, z) for
θ ∈ [0, 2π] and z ∈ R. Since the radius is fixed, we only get two variables.
We can parametrise spheres similarly using spherical coordinates. There's only a
slight problem with this picture, as we get kind of an overlap at θ = 0 and θ = 2π,
but we will not concern ourselves overmuch with this.
Okay, what are we doing with surfaces? Given a vector field, we want to measure
how much the vector field is flowing through the surface. But what does flowing
through the surface mean? Do surfaces have an up side and a down side? It turns
out, they do. In the parametrisation here, we have two tangent vectors ∂ᵤG and ∂ᵥG
which naturally give a direction 'up' for the surface in the form of ∂ᵤG × ∂ᵥG. Note
that if the partial derivatives are parallel this breaks, so we want to make sure that
we don't run into that problem.
Once we have that normal vector, we can begin by finding the tangent plane, which
is moderately useful.
Problem 4.15. Compute the tangent plane to the surface G(θ, z) = (2 cos θ, 2 sin θ, z)
at P = G(π/4, 5).
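Here's a symbolic sketch of Problem 4.15 (mine, assuming SymPy): take the cross product of the tangent vectors to get the normal, then write down the plane.

```python
import sympy as sp

theta, z = sp.symbols('theta z')
G = sp.Matrix([2 * sp.cos(theta), 2 * sp.sin(theta), z])  # cylinder of radius 2

# Tangent vectors and normal N = G_theta x G_z
Gt = G.diff(theta)
Gz = G.diff(z)
N = Gt.cross(Gz)

point = G.subs({theta: sp.pi / 4, z: 5})   # P = (sqrt(2), sqrt(2), 5)
n = N.subs({theta: sp.pi / 4, z: 5})       # normal at P: (sqrt(2), sqrt(2), 0)

# Plane: n . ((X, Y, Z) - P) = 0
X, Y, Z = sp.symbols('X Y Z')
plane = sp.expand(n.dot(sp.Matrix([X, Y, Z]) - point))
print(sp.Eq(plane, 0))  # sqrt(2)*X + sqrt(2)*Y - 4 = 0, i.e. x + y = 2*sqrt(2)
```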
Of course, we could easily reverse the orientation on the parametrisation by using
the equation G(z, θ), so that the order in which we take the cross product is swapped.
Hence we're going to want to have some sort of consistency. For the surface given by
the graph of a function, we will take the upwards direction to be the canonical orienta-
tion for the normal vector, i.e. the one going in the positive z-direction, ⟨−∂x f, −∂y f, 1⟩.
Now, how do we perform scalar surface integrals, i.e. ones that ignore the vector
field for the moment? It turns out that, in the above situation, N = ∂ᵤG × ∂ᵥG
also captures the infinitesimal area of the parallelogram with sides du, dv. This is
an easy computation using the sin θ interpretation of the cross product. Therefore
if we want to do our integral, we need to scale by this amount, just like we had to
account for the speed in the case of line integrals.
Definition 4.16. Let G(u, v) be a parametrisation of a surface S ⊂ R³ with do-
main D. Assume that G is C¹, one-to-one, and regular (i.e. the normal vector is
nondegenerate). Then for a function f : R³ → R,

∫∫_S f(x, y, z) dS = ∫∫_D f(G(u, v)) ‖∂ᵤG(u, v) × ∂ᵥG(u, v)‖ du dv
Now, once we bring in the actual flow and a vector field F : R³ → R³, we need to
use a dot product:

Theorem 4.17.

∫∫_S F · dS = ∫∫_D F(G(u, v)) · N(u, v) du dv

where again N(u, v) = ∂ᵤG(u, v) × ∂ᵥG(u, v).
Note that we usually want to make sure that N (u, v) is the upward-pointing (or
outward-pointing in the case of a closed surface) normal vector, just like in the graph
case. Therefore one might need to shuffle coordinates around if we accidentally have
gotten it wrong.

Problem 4.18. Compute the flux through the surface G(u, v) = (u², v, u³ + v²) over
D = [0, 1]² of the vector field F = ⟨0, 0, x⟩.

Solution. We need to compute first the normal vector, and so need ∂ᵤG and ∂ᵥG:

∂ᵤG(u, v) = ⟨2u, 0, 3u²⟩,  ∂ᵥG(u, v) = ⟨0, 1, 2v⟩.

The zeroes make the cross product slightly nicer (details omitted):

N(u, v) = ⟨2u, 0, 3u²⟩ × ⟨0, 1, 2v⟩ = ⟨−3u², −4uv, 2u⟩.

Is this upward pointing? Looking at the z-coordinate, it's nonnegative when
u ∈ [0, 1], so we're in business.
We now need to compute the other part of our integrand:

F(G(u, v)) = ⟨0, 0, u²⟩ ⟹ F(G(u, v)) · N(u, v) = 2u³.

The final computation is thus

∫₀¹ ∫₀¹ 2u³ du dv = 1 · (u⁴/2)|₀¹ = 1/2.
4.7. Fundamental Theorems of Vector Calculus. These all unify nicely in dif-
ferential topology, but not many of my readers will have that perspective before
graduate school (I certainly didn't). Thus we will proceed one at a time and try to
use whatever intuition is accessible to us. The first is Green's Theorem.

Theorem 4.20 (Green's Theorem). Let D be a closed domain with ∂D a simple
closed curve, oriented counterclockwise. Then

∮_{∂D} F₁ dx + F₂ dy = ∫∫_D (∂x F₂ − ∂y F₁) dA

Use this when the line integral of the closed curve would be way too confusing to
compute.

Problem 4.21. Verify Green's Theorem by computing the line integral over the unit
circle C of F(x, y) = ⟨xy², x⟩.

Solution. On the one hand, we may parametrise the unit circle by γ⃗(t) = ⟨cos t, sin t⟩.
Note that this is the correct counterclockwise orientation. Also note that γ⃗′(t) =
⟨−sin t, cos t⟩. Thus

∮_C F⃗ · dr = ∫₀^{2π} F(γ⃗(t)) · γ⃗′(t) dt
           = ∫₀^{2π} ⟨cos t sin² t, cos t⟩ · ⟨−sin t, cos t⟩ dt
           = ∫₀^{2π} (−cos t sin³ t + cos² t) dt

The integral of the first term is zero because cos t sin³ t = (1/4)(sin⁴ t)′ is the
derivative of a periodic function, so it integrates to zero over a full period. The
integral of cos² t is not zero, however, and it can be computed to be π.
Using Green's theorem, we have

∮_C F⃗ · dr = ∫∫_D (1 − 2xy) dA

It would be better to convert to polar for this integral. Giving some of the steps
(and reminding the reader that 2 sin θ cos θ = sin(2θ)),

∫∫_D (1 − 2xy) dA = ∫₀^{2π} ∫₀¹ (1 − 2(r cos θ)(r sin θ)) r dr dθ
                  = ∫₀^{2π} ∫₀¹ (r − 2r³ cos θ sin θ) dr dθ
                  = ∫₀^{2π} (1/2 − (1/4) sin(2θ)) dθ
                  = π
In this case, neither integral was particularly nice. However, in the case that the
integrand of the double integral is a constant, life is much better. For example,
consider the vector field F⃗(x, y) = ⟨−y, x⟩. Then

∂x F₂ − ∂y F₁ = 1 − (−1) = 2.

Thus integrating a closed curve along this vector field gives you twice the area it
encloses. Look for phenomena like this when it seems that Green's theorem might
be in play.
One key feature is that, even if your curve is not closed, you can close it up and
appeal to a simpler area calculation, i.e. if you have to compute a line integral over
half a circle, complete it to a whole circle.
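The area trick is easy to see numerically (my sketch, SciPy assumed): integrate F⃗ = ⟨−y, x⟩ around the unit circle and you get twice the disc's area.

```python
import numpy as np
from scipy.integrate import quad

# Integrate F = <-y, x> around the unit circle; the result is twice the enclosed area.
def integrand(t):
    x, y = np.cos(t), np.sin(t)       # gamma(t)
    dx, dy = -np.sin(t), np.cos(t)    # gamma'(t)
    return -y * dx + x * dy           # F(gamma) . gamma'

value, _ = quad(integrand, 0.0, 2.0 * np.pi)
print(value)  # 2*pi = twice the area of the unit disc
```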
Problem 4.22. Compute the line integral over the 'curve' of straight lines connecting
(1, 1) to (0, 1) to (0, 0) to (1, 0) of the vector field F⃗(x, y) = ⟨x² − y², 2xy⟩.

Solution. Now, this isn't a closed curve, so we can't use Green's theorem. However,
it is oriented counterclockwise and it's just a little bit off being closed. Let's call the
curve in the problem C₁ and let C₂ denote the straight line between (1, 0) and (1, 1).
If we let C be the closed curve, then we have

∮_C F⃗ · dr = ∫_{C₁} F⃗ · dr + ∫_{C₂} F⃗ · dr.

But now the lefthand integral can be computed using Green's theorem:

∮_C F⃗ · dr = ∫∫_{[0,1]²} (2y − (−2y)) dA = ∫₀¹ ∫₀¹ 4y dy dx = 2y² |₀¹ = 2

Hence we can solve the integral we want a little more easily: we parametrise C₂ by
r⃗(t) = ⟨1, t⟩ for t ∈ [0, 1], and so

∫_{C₁} F⃗ · dr = 2 − ∫_{C₂} F⃗ · dr = 2 − ∫₀¹ F⃗(r⃗(t)) · r⃗′(t) dt
             = 2 − ∫₀¹ ⟨1 − t², 2t⟩ · ⟨0, 1⟩ dt = 2 − ∫₀¹ 2t dt
             = 2 − 1 = 1.

This is much easier than the alternative: breaking the curve C₁ into three line seg-
ments and doing three different line integrals.
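The "harder alternative" makes a nice check – this sketch (mine, SymPy assumed) does the three segment integrals directly and recovers the same answer:

```python
import sympy as sp

t = sp.Symbol('t')

def seg_integral(p, q):
    """Line integral of F = <x^2 - y^2, 2xy> along the segment from p to q."""
    x = p[0] + t * (q[0] - p[0])
    y = p[1] + t * (q[1] - p[1])
    Fx, Fy = x**2 - y**2, 2 * x * y
    return sp.integrate(Fx * sp.diff(x, t) + Fy * sp.diff(y, t), (t, 0, 1))

segments = [((1, 1), (0, 1)), ((0, 1), (0, 0)), ((0, 0), (1, 0))]
total = sum(seg_integral(p, q) for p, q in segments)
print(total)  # 1, matching the Green's theorem shortcut
```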
Next up: Stokes' Theorem. The previous theorem told us how to compute a line
integral around a closed curve as a double integral. Stokes' theorem will tell us how
to compute a line integral as a surface integral (and sometimes vice versa).

Theorem 4.23 (Stokes' Theorem). Let S be an oriented surface in R³ with boundary
oriented so that the surface is always on your left (assuming outward pointing normal
vectors). Assume that F⃗ : R³ → R³ is a C¹ vector field. Then

∮_{∂S} F⃗ · dr = ∫∫_S curl F⃗ · dS

In particular, if ∂S = ∅, then the integral is zero.
It doesn't look like this is a particularly helpful theorem, in that surface integrals
are usually pretty nasty. But again, a vector field F⃗ might be nasty but have a curl
which is not – it's hard to tell without computing it though. Supposing that we
start with the righthand side, something of the form ∫∫_S G⃗ · dS, how can we tell if
G⃗ = curl F⃗ for some F⃗?

Proposition 4.24. Suppose that G⃗ is a vector field in a simply-connected domain.
Then G⃗ = curl F⃗ if and only if div G⃗ = 0.

At least the backwards direction of this proposition is easy, and the forwards
direction is done by actually constructing an F⃗ with curl F⃗ = G⃗. I will not be doing
this. Thus I haven't actually told you how to find that F⃗, but we usually don't need
to in practice.
Problem 4.25. Let S be the unit sphere in R³, and let

G⃗(x, y, z) = ⟨2xyz, 4 − y²z, x³ + y²⟩.

Compute the flux of G⃗ through S.
Solution. This looks nigh-impossible. This vector field G⃗ is defined on all of R³,
which is simply-connected. Moreover, we see that

div G⃗ = 2yz − 2yz + 0 = 0

therefore it's the curl of some vector field F⃗. Thus:

∫∫_S G⃗ · dS = ∮_{∂S} F⃗ · dr.

But hey, ∂S = ∅, so we don't even need to know F⃗ to conclude that the righthand
integral is zero.
Another use for Stokes' theorem is the observation that many different surfaces S
have the same boundary ∂S. As an illustrating example:

Problem 4.26. Consider the vector field

G⃗ = ⟨2yeˣ + z, log(x + z) − y²eˣ, x + y + 1⟩.

Compute the flux of G⃗ through the upper hemisphere S of the unit sphere, with
counterclockwise oriented boundary and upward pointing normal vector.
Solution. Again, the brute force method would take ages. But we notice that

div G⃗ = 2yeˣ − 2yeˣ + 0 = 0

So this is the curl of something. But hang on, our surface now has a boundary! It's
the unit circle in the xy-plane, and doing the line integral of something unknown
over that circle seems really bad.
But let's consider the unit disc D, which has the same boundary as S but has a
much more straightforward normal vector. Using Stokes' Theorem twice,

∫∫_S G⃗ · dS = ∮_{∂S} ?? · dr = ∫∫_D G⃗ · dS.

Let's now try to find this right-most integral. The normal vector to D is given
everywhere by ⟨0, 0, 1⟩, so we just need to compute the double integral

∫∫_D G⃗ · ⟨0, 0, 1⟩ dA = ∫∫_D (x + y + 1) dA

where we think of D in two dimensions as parametrising D in R³ via f(x, y) =
(x, y, 0). But now this integral is easy: the region D is symmetric in both x and y,
so

∫∫_D (x + y) dA = 0 ⟹ ∫∫_D (x + y + 1) dA = area(D) = π.

The last theorem to discuss is the divergence theorem, which will tell us how to
compute (certain) triple integrals in terms of surface integrals, and more helpfully
vice versa.
Theorem 4.27 (Divergence Theorem). Let S be a closed surface, i.e. one that has
no boundary, enclosing a region W ⊂ R³. Let S be oriented by outward pointing
normal vectors, and suppose that F⃗ is a C¹ vector field defined on an open domain in
R³ that contains W. Then

∫∫_S F⃗ · dS = ∫∫∫_W div F⃗ dV

This is related to an observation we made earlier: if F⃗ is a vector field with
div F⃗ = 0 and if ∂S = ∅, then the surface integral on the lefthand side vanishes.
Remark 4.28. We have discussed the following operations in R³:

f : R³ → R  —∇→  F⃗ : R³ → R³  —curl→  G⃗ : R³ → R³  —div→  g : R³ → R

The composition of any two consecutive operations yields the zero function or vector
field. Also, as long as we're defining everything over a simply-connected domain, if
we know that curl F⃗ = 0, then F⃗ = ∇f, and if div G⃗ = 0, then G⃗ = curl F⃗. That is,
if something goes to zero, then it comes as a result of the previous operation in the
chain.
There's a nice way to discuss this from the point of view of differential topology, but
that's a bit beyond the scope of the Math GRE. Indeed, I didn't learn any of that until
graduate school, at which point I understood most of this well for the first time.
Let’s see it in action.
Problem 4.29. Let F⃗(x, y, z) = ⟨y, yz, z²⟩, and let S be the hollow cylinder of radius
2 and height 5 with its base on the xy-plane (with outward pointing normal vector).
Compute the flux of F⃗ through S.

Solution. To actually do this computation, we would need to decompose the cylinder
into its top, bottom, and body. That gives us three different flavors of normal vector,
which we can use to compute the surface integrals.
But let's not. The divergence of this vector field is beautiful:

div F⃗ = 0 + z + 2z = 3z.

So we need to compute the integral of 3z over the solid cylinder. Luckily, since we
have access to cylindrical coordinates, it's very easy to rephrase the triple integral
we need to perform:

∫∫∫_cylinder 3z dV = ∫₀^{2π} ∫₀² ∫₀⁵ 3z · r dz dr dθ = 150π

Actually doing this triple integral is left as an exercise.
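If you want to check the exercise, here's my SymPy version of the triple integral:

```python
import sympy as sp

r, theta, z = sp.symbols('r theta z')

# Integral of div F = 3z over the solid cylinder r <= 2, 0 <= z <= 5,
# in cylindrical coordinates (note the extra factor of r from the Jacobian).
flux = sp.integrate(3 * z * r, (z, 0, 5), (r, 0, 2), (theta, 0, 2 * sp.pi))
print(flux)  # 150*pi
```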

Thus: never ever compute the flux through a closed surface directly. You have access to
the divergence theorem, and it'll almost certainly be a simpler calculation.
Remark 4.30. The divergence theorem has an application to physics which most
people learn in their introductory E&M course. Suppose that we have some collection
of charged particles and a spherical shell enclosing them. How do we compute the
flux of the electric field through the shell? We just add up the charges of the
particles on the inside of that shell. This is known as Gauss' Law. The divergence
of the electric (vector) field is essentially the charge density of the particles which
produce it.
5. Day 5: Linear algebra

Right, systems of linear equations. We like solving them, don't we? Let's just cut
to the chase. We already know well enough what a vector space is and what an inner
product and norm are. In the case that a vector space V admits an inner product,
we define the norm by ‖v⃗‖² = ⟨v⃗, v⃗⟩. We're only ever going to be working with finite
dimensional inner product spaces on the GRE, so no need to get too complicated.
We want to open up inner products a bit. Over C, we want to define the inner
product ⟨v⃗, w⃗⟩ to still be the sum Σᵢ₌₁ⁿ vᵢw̄ᵢ, but now we want it to be sesquilinear:

⟨α · v⃗, w⃗⟩ = ⟨v⃗, ᾱ · w⃗⟩,  α ∈ C
Inner products and norms satisfy what's called the Cauchy-Schwarz inequality: for
any v⃗, w⃗ ∈ V,

|⟨v⃗, w⃗⟩| ≤ ‖v⃗‖ · ‖w⃗‖

and equality is only satisfied in certain cases. We also have the triangle inequality:

‖v⃗ + w⃗‖ ≤ ‖v⃗‖ + ‖w⃗‖

which we all know from geometry.

Problem 5.1. When are these inequalities equalities? Note: they have different
conditions.

Both of these can be proven using the idea of projections. We define the projection
of v⃗ onto w⃗ by

proj_w⃗ v⃗ = (⟨v⃗, w⃗⟩ / ⟨w⃗, w⃗⟩) · w⃗.

That's a good formula to remember, and it does things like prove the cos θ formula
for the inner product:

⟨v⃗, w⃗⟩ = ‖v⃗‖ · ‖w⃗‖ · cos θ,  θ = the angle between v⃗, w⃗
One fun formula that you might recall is the polarization identity. In any real
vector space,

⟨v⃗, w⃗⟩ = (1/4)(‖v⃗ + w⃗‖² − ‖v⃗ − w⃗‖²)

which is an easy proof.

Problem 5.2. Prove it.
Okay, now we need the notion of subspaces. Let's assume for ease of notation that
all our vector spaces are over R, though that might not necessarily be true on the
GRE.

Definition 5.3. A subset W ⊂ V is called a subspace if:
(1) 0⃗ ∈ W
(2) For any w⃗₁, w⃗₂ ∈ W and c ∈ R, w⃗₁ + c · w⃗₂ ∈ W

That is, W is a vector space in its own right that sits inside of V.

5.1. Bases. For a set S = {v⃗₁, . . . , v⃗ₙ}, we define the span of S to be all linear
combinations Σᵢ₌₁ⁿ aᵢv⃗ᵢ for any aᵢ ∈ R, and sometimes we write ⟨S⟩ for this. We say
that S is linearly independent if whenever we have the sum Σᵢ₌₁ⁿ aᵢv⃗ᵢ = 0, then all
coefficients aᵢ = 0. This also implies that if w⃗ is in the span of S, then there is a
unique way in which to write w⃗ as a linear combination.

Problem 5.4. Prove that.

A maximal linearly independent set in V is called a basis. All bases have the
same number of elements, and all (finite dimensional) vector spaces have a basis.
Call that number the dimension of V. Note that 'infinite dimensional' vector spaces
don't have a basis unless you assume the axiom of choice! It also makes sense to talk
about a basis of a subspace W ⊂ V, etc.

Remark 5.5. We're going to keep writing V, W for arbitrary vector spaces, but on
the GRE we might as well have V = Rⁿ and W = Rᵐ all the time, where n = dim V
and m = dim W.

What's the best kind of basis? An orthonormal one!

Definition 5.6. We say that v⃗ and w⃗ are orthogonal if ⟨v⃗, w⃗⟩ = 0. This is true if
and only if the vectors are perpendicular in the ambient vector space (or at least one
of them is the zero vector).

Definition 5.7. A set S = {v⃗₁, . . . , v⃗ₙ} ⊂ V is an orthonormal basis if
(1) ‖v⃗ᵢ‖ = 1 for all i = 1, . . . , n
(2) For any i ≠ j, ⟨v⃗ᵢ, v⃗ⱼ⟩ = 0

This can be put more smoothly by saying that ⟨v⃗ᵢ, v⃗ⱼ⟩ = δᵢⱼ, where δᵢⱼ = 1 if i = j
and is zero otherwise.
The usual basis for V = Rⁿ given by e⃗ᵢ is orthonormal. Think how much more
difficult the world would be if the coordinate axes weren't perpendicular to each
other!

Problem 5.8. Suppose that S = {v⃗ᵢ} is an orthonormal basis. Then we know that
any w⃗ ∈ V has a unique expression as

a₁v⃗₁ + · · · + aₙv⃗ₙ = w⃗.

Prove that we can compute these coefficients: aᵢ = ⟨v⃗ᵢ, w⃗⟩.
That is very useful! But what if our basis isn't orthonormal? Luckily there's a
process, called the Gram-Schmidt process, to transform it into an orthonormal one.
The process is inductive, and goes as follows:
• Begin with the first element v⃗₁ of your basis. This may not be a unit vector,
so let u⃗₁ := v⃗₁/‖v⃗₁‖. The vector u⃗₁ is orthogonal to every other element of our
new basis (because we haven't added any yet).
• Now, take the element v⃗₂. This is probably not orthogonal to u⃗₁, so we force
it to be so: define

w⃗₂ = v⃗₂ − ⟨u⃗₁, v⃗₂⟩ · u⃗₁

which we can see is now orthogonal to u⃗₁. But this is probably not a unit
vector, so define u⃗₂ = w⃗₂/‖w⃗₂‖.
• We see how to proceed from here: define

w⃗ⱼ = v⃗ⱼ − Σᵢ₌₁^{j−1} ⟨u⃗ᵢ, v⃗ⱼ⟩ · u⃗ᵢ,  u⃗ⱼ = w⃗ⱼ/‖w⃗ⱼ‖

and eventually we'll be done!

It's important to note that at every stage we are not changing the span of our vectors.
The vector w⃗₂, for instance, is a linear combination of u⃗₁ and v⃗₂, and u⃗₁ was just a
multiple of v⃗₁ so was in its span. Thus the span of u⃗₁, u⃗₂ is the same as that of v⃗₁, v⃗₂.
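The steps above can be sketched in a few lines of NumPy (my addition, not the notes):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors (the process above)."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(u, v) * u for u in basis)  # subtract projections onto earlier u's
        basis.append(w / np.linalg.norm(w))           # normalise
    return basis

vecs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
U = gram_schmidt(vecs)
G = np.array([[np.dot(a, b) for b in U] for a in U])  # Gram matrix of the output
print(np.round(G, 10))  # the identity matrix: the output is orthonormal
```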
Problem 5.9. Let {1, x, x², x³} be a basis for P₃(R), the vector space of degree at
most 3 polynomials with coefficients in R, endowed with the inner product ⟨f(x), g(x)⟩ =
∫₀¹ f(x)g(x) dx. Convert this to an orthonormal basis.
5.2. Linear transformations. What are the functions we care about?

Definition 5.10. A linear transformation T : V → W is a function satisfying the
axiom that T(c · v⃗ + w⃗) = c · T(v⃗) + T(w⃗) for all c ∈ R and for all v⃗, w⃗ ∈ V.

The definition implies that T(0⃗) = 0⃗, which is a nice feature. We have a couple of
associated numbers:

Definition 5.11. The rank of T : V → W is the dimension of its image T(V) ⊂ W
as a subspace of W. The nullity is the dimension of its kernel, that is, the subspace
ker T ⊂ V of vectors v⃗ with T(v⃗) = 0⃗.

Theorem 5.12. Rank + nullity = dim V.
Linear transformations are well described by matrices in the case that we identify
V = Rⁿ and W = Rᵐ. A transformation T yields a matrix A ∈ M_{m×n}(R) where
the columns of A are T(e⃗ᵢ) for the basis vectors e⃗ᵢ of Rⁿ. But normally vector
spaces don't come with automatic bases. For T : V → W, where dim V = n and
dim W = m, we still get a matrix A of the same dimensions, but we have to fix a
basis β = {b⃗ᵢ} for V and thus will denote it [T]_β.
Okay, now let's just fix V = W = Rⁿ. What if we had a special basis β that
we want to change A = [T]_std to? How do we change basis? In order to write the
matrix A in terms of a new basis, we think of converting from β to standard, doing
the transformation A that we know, then converting back to β. The 'back to standard'
matrix is P such that its columns are the b⃗ᵢ. Therefore
we have a commutative square: P converts β-coordinates into standard coordinates,
A acts in standard coordinates, and [T]_β acts in β-coordinates, so that

A = P [T]_β P⁻¹ ⟹ [T]_β = P⁻¹ A P

where the matrix P is easy to compute but P⁻¹ is usually a little more unpleasant
to compute.

What does it mean for a matrix to be invertible? First, it needs to be n × n, so that
there's a chance that both AB = Iₙ and BA = Iₙ. (A non-square matrix can't have
it both ways – that's the rank-nullity theorem.)
Over R, invertibility means that det A ≠ 0. What's the determinant? Recall the
expansion-by-minors formula. There's an intrinsic definition using fancier math, but
never mind that.
Equivalent formulations of invertibility:
• det A ≠ 0
• the row rank of A is n
• the column rank of A is n
• the linear transformation that A defines is bijective
• the reduced row echelon form of A is the identity
How do you find the inverse of a matrix? In the 2 × 2 case there's a nice formula;
otherwise you row-reduce the augmented matrix [A | Iₙ] to [Iₙ | A⁻¹] using Gaussian
elimination.
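Here's a 3 × 3 example of that procedure in code (my sketch, not the notes' – the matrix is a hypothetical example, and NumPy is only used for array bookkeeping):

```python
import numpy as np

def inverse_by_elimination(A):
    """Row-reduce the augmented matrix [A | I] to [I | A^-1]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
        M[[col, pivot]] = M[[pivot, col]]              # swap rows
        M[col] /= M[col, col]                          # scale pivot row to 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]         # clear the column
    return M[:, n:]

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
Ainv = inverse_by_elimination(A)
print(np.round(A @ Ainv, 10))  # the identity matrix
```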
An eigenvector of a matrix A is a nonzero vector v⃗ with Av⃗ = λv⃗ for some scalar λ,
the corresponding eigenvalue. Note that the kernel consists of the eigenvectors of
eigenvalue 0 (together with 0⃗). Sometimes you get a basis of eigenvectors, but
sometimes you don't. How often does that occur?

Theorem 5.13 (Spectral Theorem). This is a restricted version, but often good
enough. If A is a real matrix and A = Aᵀ, the transpose of A, then A is diago-
nalizable (in fact orthogonally so). If A is a complex matrix, then we need A = A*,
the conjugate transpose. The general condition is that A be normal: AA* = A*A.
So how do we go about finding eigenvalues or eigenvectors? Use the characteristic
polynomial det(A − λIₙ) = 0 or, as came up on the test, just examine the matrix
A − λIₙ directly: λ is an eigenvalue if and only if that matrix is not invertible, i.e.
has a nontrivial kernel, so that (A − λIₙ)v⃗ = 0⃗ has a nonzero solution and Av⃗ = λv⃗.
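As an example with a symmetric 3 × 3 matrix (mine, chosen for clean eigenvalues; SymPy assumed):

```python
import sympy as sp

# A symmetric 3x3 matrix: the all-ones matrix plus the identity
A = sp.Matrix([[2, 1, 1], [1, 2, 1], [1, 1, 2]])
lam = sp.Symbol('lambda')

# Characteristic polynomial det(A - lambda*I) and its roots with multiplicity
charpoly = (A - lam * sp.eye(3)).det()
eigenvalues = sp.roots(sp.Poly(charpoly, lam))
print(eigenvalues)  # {4: 1, 1: 2}
```

The all-ones matrix has eigenvalues 3, 0, 0, so adding the identity shifts them to 4, 1, 1 – consistent with the spectral theorem, we get a full (orthogonal) eigenbasis even with a repeated eigenvalue.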
Note: A and Aᵀ have the same eigenvalues, which is pretty neat.
Random fact: if A is nilpotent, then both Iₙ + A and Iₙ − A are invertible: if
Aᵏ = 0, then (Iₙ − A)⁻¹ = Iₙ + A + A² + · · · + A^{k−1} (a finite geometric series),
and similarly for Iₙ + A with alternating signs.
Simultaneously diagonalizable: suppose that A, B have the same eigenbasis. Then
AB = BA.

6. Day 6: Differential equations and complex analysis

So we begin today by recalling the basic types of differential equations that we need
to solve for the GRE. There aren't that many, but if you're like me, you've definitely
forgotten about them. But before we get there, let's recall the fundamental theorem
that lets us do anything:

Theorem 6.1. Suppose that f(t, y) and ∂y f(t, y) are continuous on a compact subset
K ⊂ R². Then for any point (t₀, y₀) ∈ K, the differential equation y′ = f(t, y),
y(t₀) = y₀ has a unique solution in some neighbourhood of (t₀, y₀).

So perhaps it's good to check that ∂y f(t, y) is continuous before blindly charging
into a problem, but probably not.
The first type of differential equation is the basic separation of variables, like so:

Problem 6.2. Suppose that a colony of bacteria grows at a rate directly proportional
to its size. Initially, the colony has 100 bacteria, and after a week it has 300 bacteria.
Write a formula modelling this situation where the time t is measured in days.
Solution. The situation we have is

dB/dt = k · B ⟹ dB/B = k dt

Then integrating both sides gives us

log(B) = k · t + C ⟹ B(t) = C · e^{kt}.

We have that B(0) = 100, so C = 100. We also know that B(7) = 300, so

e^{7k} = 3 ⟹ k = log(3)/7.

Putting it all together,

B(t) = 100 e^{(log(3)/7)·t}
There are more sophisticated versions of this problem, and they are usually to the
tune of salty or sugary tanks of water.
Problem 6.3. Suppose that we have a 100 L tank with 50 L of water and 100 g of
salt in it. Suppose the tank drains at a rate of 1 L/min and is filled at a rate of 2 L/min
with pure water. Assuming instantaneous mixing, when the tank is full, how much
salt is there in the tank?
Solution. Let's set this up. We have

dS/dt = (rate in) − (rate out)

In this example, there's no salt coming in. For the out, we need to know what the
density of salt in the tank is. The amount of salt is S, but the volume changes: we
have a net +1 L/min, so the volume is 50 + t. Therefore

dS/dt = −S/(50 + t) ⟹ dS/S = −dt/(50 + t) ⟹ log(S) = −log(50 + t) + C
After integrating and rearranging (specifically, after pulling the −1 into the log as an
exponent), we get

S(t) = C/(t + 50)

The amount of salt at the beginning is 100, so we have S(0) = C/50 = 100, so
C = 5000. The tank is full at t = 50, so

S(t) = 5000/(t + 50) ⟹ S(50) = 5000/100 = 50 g
Now, we turn to other types of differential equations. Let's first recall what an
integrating factor is. Suppose our differential equation is of the form

dy/dt + p(t)y = q(t).

Then we consider the integrating factor μ(t) = e^{∫ p(t) dt}. Why does that help? Using
this term,

μ(t) · y′ + μ(t)p(t) · y = d/dt (μ(t) · y) = μ(t) · q(t).

Thus, when we integrate both sides,

μ(t) · y = ∫ μ(t)q(t) dt

Assuming that the righthand side is integrable, we can then solve and divide out by
μ(t).
Problem 6.4. Solve the linear ODE y′ − 2ty = t.

Solution. The process implies that μ(t) = e^{∫ −2t dt} = e^{−t²}, not something we can
integrate on its own. Luckily, the whole righthand side is integrable:

∫ t e^{−t²} dt = −(1/2) e^{−t²} + C

Dividing through now by our integrating factor,

y(t) = C e^{t²} − 1/2
Again, if we get a linear ODE of this form, this is pretty much the only way to
solve it. Exact ODEs likely won't come up, but there's always that chance. Plus,
it's related to multivariable calculus. Suppose that we have a differential equation
of the form

N(x, y) · y′ + M(x, y) = 0, i.e. N(x, y) dy + M(x, y) dx = 0

where moreover we have ∂x N(x, y) = ∂y M(x, y). Then this implies, at least
locally, that this situation is coming from the equality of mixed partials, so we need to
find a function H(x, y) with ∇H = ⟨M, N⟩. The general solution to the differential
equation is H(x, y) = C.
Problem 6.5. Solve (x²y + 2y) · y′ + (xy² + 2x) = 0.

Solution. This equation is exact (easily verified), so a solution looks something like

H(x, y) = ∫ (xy² + 2x) dx = ∫ (x²y + 2y) dy

As before, we need to integrate, but bear in mind that we might have constants that
depend on one variable or the other. That is,

∫ (xy² + 2x) dx = x²y²/2 + x² + g₁(y),  ∫ (x²y + 2y) dy = x²y²/2 + y² + g₂(x)

Comparing terms, we need to use g₂(x) = x² and g₁(y) = y², so that

H(x, y) = x²y²/2 + x² + y² = C

is our general solution.
6.1. Higher order differential equations. Now, suppose we want to solve partic-
ular higher order differential equations that have little interaction between
the variables. For instance, examine

y″ − 9y = f(t)

The first step is to solve the corresponding homogeneous equation y″ − 9y = 0. We
can solve this by inspection, knowing that y′ = ky is solved by e^{kt}. Hence the solutions
we need are e^{3t} and e^{−3t}. The general solution to the homogeneous equation is
therefore y(t) = c₁e^{3t} + c₂e^{−3t}.
Here's generally how you solve a homogeneous differential equation like this. Con-
sider an equation

ay″ + by′ + cy = 0

Then solutions to this equation are given by e^{λt}, where λ is a root of the corresponding
characteristic polynomial

ax² + bx + c = 0

There are three options here: the polynomial may have two distinct real roots, one
double real root, or two (conjugate) complex roots.
The case of two distinct real roots is the one we examined above: the general
solution is y(t) = c₁e^{λ₁t} + c₂e^{λ₂t}. If there is only one real root λ, we still need a two-
dimensional space of solutions, so the general solution looks like
y(t) = c₁e^{λt} + c₂te^{λt}. The complex roots possibility is a little more delicate, because
we need to make sure that we come up with a real solution.
To examine this, let λ = a + bi. Then the general solution becomes

y(t) = c₁e^{(a+bi)t} + c₂e^{(a−bi)t} = c₁e^{at}e^{i·bt} + c₂e^{at}e^{i·(−bt)}

Using the identity e^{iθ} = cos θ + i sin θ, we change the above:

y(t) = c₁e^{at}(cos(bt) + i sin(bt)) + c₂e^{at}(cos(−bt) + i sin(−bt))

Using now that cos(−bt) = cos(bt) and sin(−bt) = −sin(bt),

y(t) = (c₁ + c₂)e^{at} cos(bt) + (c₁ − c₂)i · e^{at} sin(bt)

Now we use the fact that (secretly) c₁, c₂ ∈ C: we need that c₁ − c₂ ∈ iR and
c₁ + c₂ ∈ R. Luckily, it is possible to get any numbers we want using c₁ = (c − di)/2
and c₂ = (c + di)/2, so that c₁ + c₂ = c and (c₁ − c₂)i = d. Putting this all together,
the general solution is

y(t) = c₁e^{at} cos(bt) + c₂e^{at} sin(bt).
Problem 6.6. Solve the following initial value problem: y′′ − 4y′ + 9y = 0, with
y(0) = 0 and y′(0) = 8.

Solution. The characteristic equation is x² − 4x + 9 = 0, so that we have roots
    λ = (4 ± √(16 − 4(9)))/2 = 2 ± √5 i
Hence the general solution is
    y(t) = c₁e^{2t} cos(√5 t) + c₂e^{2t} sin(√5 t)
Knowing that y(0) = 0 means that c₁ = 0, as everything else cancels out. Therefore
    y(t) = ce^{2t} sin(√5 t)  and  y′(t) = 2ce^{2t} sin(√5 t) + √5 ce^{2t} cos(√5 t)
So y′(0) = √5 · c = 8, thus c = 8/√5. Not the nicest solution, but
    y(t) = (8/√5) e^{2t} sin(√5 t)
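The solution to Problem 6.6 can be verified by direct substitution (a sketch using sympy, which is assumed available):

```python
import sympy as sp

t = sp.symbols('t')
y = (8 / sp.sqrt(5)) * sp.exp(2*t) * sp.sin(sp.sqrt(5)*t)

# y solves y'' - 4y' + 9y = 0 ...
assert sp.simplify(sp.diff(y, t, 2) - 4*sp.diff(y, t) + 9*y) == 0
# ... with the initial conditions y(0) = 0 and y'(0) = 8
assert y.subs(t, 0) == 0
assert sp.simplify(sp.diff(y, t).subs(t, 0) - 8) == 0
```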
6.2. Nonhomogeneous differential equations. How do we deal with nonhomogeneous
differential equations? First, solve the homogeneous one. Then we have to
guess a particular solution.

Problem 6.7. Determine a particular solution to y′′ − 4y′ − 12y = 3e^{5t}.
MATH GRE BOOTCAMP: LECTURE NOTES 59

Solution. We need to solve x² − 4x − 12 = 0, which isn't too difficult: it factors as
(x − 6)(x + 2) = 0, so we get
    y(t) = c₁e^{6t} + c₂e^{−2t} + y_p(t)
What should y_p(t) look like? Probably something of the form y_p(t) = Ae^{5t}, so we
then need to check which A satisfies the differential equation:
    y_p′(t) = 5Ae^{5t},  y_p′′(t) = 25Ae^{5t}  ⟹  25Ae^{5t} − 4 · 5Ae^{5t} − 12 · Ae^{5t} = 3e^{5t}
Solving this gives −7Ae^{5t} = 3e^{5t}, so A = −3/7. Putting this all together,
    y(t) = c₁e^{6t} + c₂e^{−2t} − (3/7)e^{5t}
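Again, the coefficient A can be checked mechanically (a sketch using sympy, which is assumed available):

```python
import sympy as sp

t, A = sp.symbols('t A')
yp = A * sp.exp(5*t)

# Substitute y_p into y'' - 4y' - 12y = 3 e^{5t} and solve for A
residual = sp.diff(yp, t, 2) - 4*sp.diff(yp, t) - 12*yp - 3*sp.exp(5*t)
sols = sp.solve(residual, A)
assert sols == [sp.Rational(-3, 7)]

# The homogeneous roots really are 6 and -2
x = sp.symbols('x')
assert set(sp.solve(x**2 - 4*x - 12, x)) == {6, -2}
```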
Other types of particular solutions require different guesses: sines and cosines
demand sines and cosines. What if the particular solution is a polynomial?

Problem 6.8. Determine a particular solution to y′′ − 4y′ − 12y = t² + 3t + 2.

Solution. The particular solution looks like a polynomial of the same degree, so let
y_p(t) = at² + bt + c. Then
    y_p′(t) = 2at + b,  y_p′′(t) = 2a
Putting it all together,
    2a − 4(2at + b) − 12(at² + bt + c) = t² + 3t + 2
We need to separate by degrees:
    −12at² = t²,  (−8a − 12b)t = 3t,  2a − 4b − 12c = 2
The easiest way to solve this is left to right:
    a = −1/12,  −12b = 3 − 8/12 = 7/3  ⟹  b = −7/36
Finally, we can solve that c = −25/216. We can then put it all together as we did
above.
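The three coefficients can be checked by matching powers of t, just as in the solution (a sketch using sympy, which is assumed available):

```python
import sympy as sp

t, a, b, c = sp.symbols('t a b c')
yp = a*t**2 + b*t + c

# Residual of y_p in y'' - 4y' - 12y = t^2 + 3t + 2, expanded by degree
residual = sp.expand(sp.diff(yp, t, 2) - 4*sp.diff(yp, t) - 12*yp
                     - (t**2 + 3*t + 2))
# Each coefficient of t^k must vanish; solve the resulting linear system
sol = sp.solve([residual.coeff(t, k) for k in range(3)], [a, b, c])
assert sol[a] == sp.Rational(-1, 12)
assert sol[b] == sp.Rational(-7, 36)
assert sol[c] == sp.Rational(-25, 216)
```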

Fortunately, this is as far as things need to go in the realm of differential equations.


6.3. Complex analysis. Let's recall a little about the nice types of complex-valued
functions.
Theorem 6.9. Let f : Ω → C, where Ω ⊂ C is an open subset of the complex
numbers. Then the following are equivalent:
• f is differentiable in an open disc centered at a ∈ Ω (holomorphic)
• f has a convergent power series expansion Σ_{n=0}^∞ cₙ(z − a)ⁿ in an open disc
  centered at a ∈ Ω (analytic)
This incredible theorem implies that differentiable functions are smooth, which is
one of our introductions to the wild world of complex analysis. There are some nice
corollaries:

Corollary 6.10. Let f, g : Ω → C be two holomorphic functions on an open connected
Ω ⊂ C. If f(z) = g(z) on an infinite subset S ⊂ Ω that has a limit point
in Ω, then f = g on Ω.

Corollary 6.11. A bounded holomorphic function f : C → C must be constant.
Holomorphic functions must satisfy the Cauchy-Riemann equations, and the converse
is true as well (given continuous partial derivatives).

Theorem 6.12. Let f : Ω → C be a function, and write f(x + iy) = u(x, y) + i · v(x, y).
Then f is holomorphic if and only if
    ∂ₓu = ∂_y v  and  ∂_y u = −∂ₓv.
This theorem is pretty useful, as it means that information about the real part of
a holomorphic function can get us the whole function.
Problem 6.13. Suppose that f(x + iy) = u(x, y) + i · v(x, y) is holomorphic. If
u(x, y) = x² − y² and v(1, 1) = 2, find v(4, 1).

Solution. We can use the fundamental theorem of calculus in this case:
    v(4, 1) − v(1, 1) = ∫₁⁴ ∂ₓv(x, y) dx
The question is, what's ∂ₓv? By the Cauchy-Riemann equations, it's −∂_y u = 2y.
Thus
    v(4, 1) − v(1, 1) = ∫₁⁴ 2y dx = 2xy |_{(1,1)}^{(4,1)} = 6
This implies that v(4, 1) − 2 = 6, so v(4, 1) = 8.
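Here u = x² − y² is the real part of z², whose imaginary part 2xy is a harmonic conjugate, and that candidate matches everything in the problem (a sketch using sympy, which is assumed available):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
u = x**2 - y**2

# u is harmonic, as the real part of a holomorphic function must be
assert sp.diff(u, x, 2) + sp.diff(u, y, 2) == 0

# Candidate harmonic conjugate: v = 2xy (the imaginary part of z^2)
v = 2*x*y
# Cauchy-Riemann: u_x = v_y and u_y = -v_x
assert sp.diff(u, x) == sp.diff(v, y)
assert sp.diff(u, y) == -sp.diff(v, x)
# It matches the given value and reproduces the answer
assert v.subs({x: 1, y: 1}) == 2
assert v.subs({x: 4, y: 1}) == 8
```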
It might be worth remembering the following:
Definition 6.14. A function u(x, y) : Ω → R is harmonic if ∂ₓ²u + ∂_y²u = 0.

Locally, a harmonic function is the real part of a holomorphic function, in the
following way: if u(x, y) is harmonic, then
    f(x + iy) = ∂ₓu(x, y) − i · ∂_y u(x, y)
is holomorphic.

6.4. Cauchy integral formula. The last thing we should recall is the Cauchy integral
formula.

Theorem 6.15 (Cauchy integral formula). Suppose that f : Ω → C is a holomorphic
function on an open domain, and let D ⊂ Ω be a closed disc in Ω. Then
    f(a) = (1/2πi) ∮_{∂D} f(z)/(z − a) dz
for every a in the interior of D.
This yields the residue theorem.

Theorem 6.16 (Residue theorem). Let U ⊂ C be a simply connected open subset
and f : U → C a function holomorphic except at a ∈ U. Let γ be a closed curve in U
around a, oriented counterclockwise. Then
    ∮_γ f(z) dz = 2πi Res(f, a)
where Res(f, a) is the coefficient of the 1/(z − a) term in the Laurent series expansion
of f(z) around a. Otherwise put, it is the number R such that
    f(z) − R/(z − a)
has an analytic antiderivative in a disc around a.
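As a quick illustration (a sketch; sympy is assumed available): e^z/z has residue 1 at its simple pole z = 0, so its integral around the unit circle is 2πi, which we can also see numerically from the parametrization z = e^{iθ}.

```python
import cmath
import sympy as sp

z = sp.symbols('z')
f = sp.exp(z) / z
# Residue of e^z / z at its simple pole z = 0
assert sp.residue(f, z, 0) == 1

# Numerical contour integral over the unit circle z = e^{iθ}:
# ∮ f dz ≈ Σ f(z_k) · i z_k Δθ, which should be very close to 2πi
n = 20000
total = 0j
for k in range(n):
    theta = 2 * cmath.pi * k / n
    zk = cmath.exp(1j * theta)
    total += cmath.exp(zk) / zk * 1j * zk * (2 * cmath.pi / n)
assert abs(total - 2j * cmath.pi) < 1e-6
```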
I’m not sure that this has much of a place on the GRE, but the Cauchy-Riemann
equations are the jewel in the crown.

7. Day 7: Algebra
Topics covered: groups, rings, and fields.
7.1. Groups.

Definition 7.1. A group is a set G with a binary operation · : G × G → G that
satisfies the following axioms:
• The operation is associative, so (g · h) · k = g · (h · k) for any g, h, k ∈ G
• There exists an element e ∈ G such that e · g = g · e = g for all g ∈ G
• For every element g ∈ G, there exists an element g⁻¹ ∈ G so that
  g · g⁻¹ = g⁻¹ · g = e

Problem 7.2. Prove that the identity element is unique.

Problem 7.3. Prove that the inverse of an element g is unique.

These are two good exercises to get your hands on. Note that the group operation
needn't be commutative! Consider an example you already know: let GLₙ(F) be the
subset of invertible n × n matrices with entries in a field F. Then this is a group
under multiplication, with inverse and identity as one would imagine. Note that
Mₙ(F) is not a group under multiplication, as non-invertible matrices do not have
inverses (obviously). However, Mₙ(F) is a group under addition, and it's in fact
commutative, i.e. A + B = B + A for all A, B ∈ Mₙ(F). Commutative groups are
also called abelian.
What are the kinds of functions we're interested in?

Definition 7.4. A group homomorphism is a map of sets φ : G → H such that
φ(g₁ · g₂) = φ(g₁) · φ(g₂) for all g₁, g₂ ∈ G.

Problem 7.5. Prove that φ(g⁻¹) = φ(g)⁻¹ and φ(e_G) = e_H.
Definition 7.6. An isomorphism of groups is a group homomorphism that is bijective
as a map of sets. In particular, the set-theoretic inverse map is automatically a
group homomorphism.

Problem 7.7. Prove it.

Definition 7.8. A subset H ⊂ G is called a subgroup of G if e_G ∈ H and for every
h₁, h₂ ∈ H, h₁h₂⁻¹ ∈ H. That is, H includes the identity element and is closed under
multiplication and inverses. We usually write H < G in this case.

As an example of subgroups, let g ∈ G and consider the set {gⁿ : n ∈ Z}. Then
it's an easy check that this satisfies the subgroup definition, and we write ⟨g⟩ < G.
Such a subgroup is called cyclic. Note that this set may be finite. If it is, we write
|g| = n for the order of g, and it's the minimal n ≥ 1 such that gⁿ = e_G. Otherwise we
say |g| = ∞.
There's a really important theorem on the order of subgroups (and hence of elements).
Theorem 7.9 (Lagrange's Theorem). Let G be a finite group and let H < G be a
subgroup. Then |H| divides |G|. In particular, |g| divides |G| for every g ∈ G.

Hence when we see GRE questions about the possible orders of elements and
subgroups, this helps a lot.
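Lagrange's consequence for element orders is easy to confirm by brute force in a small example (a sketch; the one-line-notation convention here is my own):

```python
from itertools import permutations
from math import factorial

# In G = S_4, the order of every element divides |G| = 4! = 24.
n = 4
identity = tuple(range(n))

def compose(p, q):
    """(p ∘ q)(i) = p[q[i]] for permutations in one-line notation."""
    return tuple(p[q[i]] for i in range(n))

orders = set()
for g in permutations(range(n)):
    power, order = g, 1
    while power != identity:
        power = compose(power, g)
        order += 1
    assert factorial(n) % order == 0
    orders.add(order)

# The orders that actually occur in S_4
assert orders == {1, 2, 3, 4}
```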
We can talk about the subgroup generated by a number of elements g₁, ..., g_m in
the obvious way. It's helpful to use the fact that the intersection of subgroups is still
a subgroup, and we can define
    ⟨g₁, ..., g_m⟩ = ∩_{g₁,...,g_m ∈ H} H.
Similarly, we can talk about the subgroup generated by a pair of subgroups H, K < G:
    ⟨H, K⟩ = ∩_{H,K ⊂ G′} G′,
the intersection over all subgroups G′ < G containing both. A related construction is
the set HK = {hk : h ∈ H, k ∈ K}, which is contained in ⟨H, K⟩. For finite subgroups,
counting gives |HK| = |H| · |K| / |H ∩ K|, so |HK| is a divisor of
|H| · |K|. In particular, |HK| = |H| · |K| if and only if H ∩ K = {e_G}.

Problem 7.10. Prove it.


Definition 7.11. A normal subgroup is a subgroup N < G such that gNg⁻¹ ⊂ N
for all g ∈ G. In this case, we write N ◁ G.

Normal subgroups are very important. In particular, kernels of group homomorphisms
are normal subgroups. Additionally, these are the appropriate objects in
order to define quotients. For any H < G, we define
    G/H = {gH : g ∈ G}
where g₁H = g₂H if these subsets contain the same elements, i.e. there exists h ∈ H
such that g₁ · h = g₂. In the case that N ◁ G is a normal subgroup, G/N actually admits
a group structure: g₁N · g₂N = g₁g₂N.

Problem 7.12. Prove it.

Definition 7.13. A group G is called simple if it has no normal subgroups besides
{e} and G.

There are a variety of simple groups, but the biggest class of examples is C_p for
the primes p. Another choice will turn out to be Aₙ for n ≥ 5, which we will define
below.
7.2. Examples of groups. It's probably about time to give some examples of (finite)
groups. For every positive integer n ∈ N, consider the set with n elements
Xₙ = {1, ..., n}, and consider the bijections f : Xₙ → Xₙ. We can put a group structure
on the set of such bijections, with the operation being composition. Identity and inverses are obvious.
Call the set of these maps Sₙ and call it the symmetric group on n elements. Then
|Sₙ| = n!, as one can readily check.
We think about elements in Sₙ using a cycle decomposition. Let n = 5 for simplicity,
and consider the following function:
    f(1) = 2, f(2) = 3, f(3) = 5, f(4) = 1, f(5) = 4
We write this in the following format: we start by writing (1, we then write
the image of 1 to obtain (1 2, and so on until we get (12354). This is called a
5-cycle as it's written with 5 elements. Consider another function,
    g(1) = 2, g(2) = 3, g(3) = 1, g(4) = 5, g(5) = 4
which yields the cycle decomposition (123)(45), which we call a 3-2-cycle. As a
final example, consider
    h(1) = 2, h(2) = 1, h(3) = 3, h(4) = 4, h(5) = 5
We could write this as (12)(3)(4)(5), but we'd rather write (12) and call it a 2-cycle.
Note that in a cycle decomposition, the cycles must be disjoint.
Every element of Sₙ has such a cycle decomposition, unique up to permuting and
rotating the cycles, and it's truly unique if we orient each cycle to begin with its lowest number. That is,
    (123)(45) = (231)(54) = (312)(45)
but the first choice is canonical.
How do we multiply cycles? Consider (12)(13). This is a composition that says
1 → 2, 2 → 1 → 3, and 3 → 1. This is the cycle (123). Consider now (13)(12). This
says 1 → 3, 3 → 1 → 2, and 2 → 1. Hence this is the cycle (132). These are different!
The symmetric group Sₙ is not commutative. Now, we can address subgroups and
orders.

Problem 7.14. The order of an m₁-m₂-···-m_k-cycle is lcm(m₁, m₂, ..., m_k).

As such, there isn't an obvious formula for the maximal order of an element in Sₙ, but
it is easily computed for a given n.
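Problem 7.14 in action, using the 3-2-cycle g = (123)(45) from above (a sketch; the dict encoding of a permutation is my own convention):

```python
from math import lcm

def cycle_type(perm):
    """Cycle lengths of a permutation given as a dict i -> perm(i) on {1,...,n}."""
    seen, lengths = set(), []
    for start in perm:
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return sorted(lengths)

# g = (123)(45), a 3-2-cycle in S_5: its order is lcm(3, 2) = 6
g = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}
assert cycle_type(g) == [2, 3]
assert lcm(*cycle_type(g)) == 6
```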
There are some interesting subgroups of Sₙ that we can get using both group
theory and geometry. The first geometric subgroup is the cyclic group Cₙ, which
is generated by any n-cycle in Sₙ; all such subgroups are isomorphic (indeed conjugate).
This represents the rigid rotations of the regular n-gon. Of course, there's another symmetry of the
regular n-gon, which is the flip along a vertical axis of symmetry. It's harder to
write down a cycle decomposition in general, but we can do it for n = 5. Let the cycle be
(12345) = σ, and then the flip is given by (25)(34) = τ. The group generated by
these two elements has 2n elements; it is the dihedral group D₂ₙ.
We can talk about groups in terms of generators and relations. For D₂ₙ, we write
    D₂ₙ = ⟨σ, τ : σⁿ = τ² = 1, τστ = σ⁻¹⟩
There's an explicit embedding D₂ₙ → Sₙ as we can see above, but we can also think
about D₂ₙ in the abstract.
The last type of group to know is the alternating group Aₙ ⊂ Sₙ. This contains
exactly half the elements of Sₙ, and can be described as the kernel of the map
    sgn : Sₙ → C₂ = {±1}
which sends an m₁-m₂-···-m_k-cycle to (−1)^{m₁+···+m_k−k}. The alternating group consists
of the identity element, m-cycles for odd m, 2-2-cycles, etc. There's another
description of this that's not worth getting into right now.

7.3. Abelian groups. We now need to state the fundamental theorem on finitely
generated abelian groups. Finitely generated is pretty obvious to define, but what's
the theorem?

Theorem 7.15 (FTFGAG). Let A be a finitely generated abelian group. Then
    A ≅ Zʳ × Z/n₁Z × Z/n₂Z × ··· × Z/n_kZ
where n₁ | n₂ | ··· | n_k. Alternatively,
    A ≅ Zʳ × Z/p₁^{α₁}Z × ··· × Z/p_ℓ^{α_ℓ}Z
for primes pᵢ and powers αᵢ.

Now, what the hell does any of this mean? Z/nZ is the cyclic group Cₙ, but we
think of it additively and in terms of modular arithmetic. In particular, it's the
quotient of Z by the normal subgroup nZ = {n · m : m ∈ Z}, and we think of Z/nZ as
generated under addition by 1. In an abelian group, all subgroups are normal, so
there's no problem there. The product is the same as the product of sets, and the
group operation works in the obvious way.

Problem 7.16. Let m, n ∈ N. Then Z/mZ × Z/nZ ≅ Z/mnZ if and only if
(m, n) = 1.

You can check both directions of that if you want. That means that a product
of cyclic groups is still cyclic if and only if the orders are pairwise coprime.
The two ways we think about the above decomposition depend on how we group up
the prime factors. We can either separate them as much as possible, or we can
group them together. The specific details don't matter too much, but remember
that theorem well.
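Problem 7.16 can be checked exhaustively for small m and n: the product Z/m × Z/n is cyclic exactly when it contains an element of order mn (a brute-force sketch):

```python
from math import gcd

def is_cyclic_product(m, n):
    """Is Z/m x Z/n cyclic, i.e. does some element have order m*n?"""
    def order(a, b):
        k = 1
        x, y = a % m, b % n
        while (x, y) != (0, 0):
            x, y = (x + a) % m, (y + b) % n
            k += 1
        return k
    return any(order(a, b) == m * n for a in range(m) for b in range(n))

# Z/m x Z/n is cyclic exactly when gcd(m, n) = 1
for m in range(1, 8):
    for n in range(1, 8):
        assert is_cyclic_product(m, n) == (gcd(m, n) == 1)
```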
7.4. Rings. Now we can talk about rings.

Definition 7.17. A (unital) ring is a set R such that (R, +) is an abelian group,
with another operation · satisfying:
• Multiplication is associative
• There exists 1 ∈ R such that 1 · a = a · 1 = a for all a ∈ R
• The distributive property holds: a · (b + c) = a · b + a · c and (b + c) · d = b · d + c · d
  for all a, b, c, d ∈ R

The multiplication is not necessarily commutative, as in the ring of matrices
Mₙ(F). Here's something that ought to be true and is:

Problem 7.18. Prove that 0 · a = 0 for all a ∈ R.

There are again two types of substructures that we need to consider.

Definition 7.19. A subring S ⊂ R is an abelian subgroup that is closed under
multiplication. Sometimes we demand that 1 ∈ S, sometimes we don't.

Subrings aren't even that important.

Definition 7.20. A left ideal I ⊂ R is an abelian subgroup I that is closed under
left multiplication: for every a ∈ R and x ∈ I, a · x ∈ I. Similarly, we can define a
right ideal and a two-sided ideal.

Ideals are important, subrings aren't. Here's another definition and an important
consequence.

Definition 7.21. An element a ∈ R is invertible if there exists b ∈ R such that
a · b = b · a = 1.

Problem 7.22. If I ⊂ R is an ideal and a ∈ I is invertible, then I = R.

This means that a proper ideal (i.e. I ≠ R) can't contain any invertible
elements.
There’s one more type of element that needs defining:
There's one more type of element that needs defining:

Definition 7.23. A nonzero element a ∈ R is a zero divisor if there exists a nonzero
b ∈ R such that a · b = 0.

Note that a zero divisor can't be invertible!
Ideals are also closed under intersections and sums, but not quite under products. For
products we also have to take finite sums:
    IJ := { Σ_{k=1}^{n} iₖjₖ : iₖ ∈ I, jₖ ∈ J }
We can also talk about the left, right, or two-sided ideal generated by a subset
of R. Finally, we can prove that if I ⊂ R is a two-sided ideal, then R/I has the
structure of a ring.
Now, what are the functions?

Definition 7.24. A (unital) ring homomorphism φ : R → S is an abelian group
homomorphism such that φ(1_R) = 1_S and φ(r₁ · r₂) = φ(r₁) · φ(r₂).

As an example, consider the map φ : Z → Z such that φ(n) = −n. This is a
perfectly good abelian group homomorphism, but it's not a ring homomorphism. In
fact, since we demand that φ(1) = 1, there is only ever one ring homomorphism φ : Z → R for any
ring R, and it's determined by φ(1) = 1_R.
Kernels of ring homomorphisms are two-sided ideals, which is convenient, so that
R/ker φ has the structure of a ring.

s
7.5. Modular arithmetic. Let R = Z and let I = nZ. Then in the ring Z/nZ, we
can do mathematics. The key is that we are working with 'remainders after dividing
by n'. Let n = 12. Then, for two examples,
    8 + 7 = 15 ≡ 3,    4 · 5 = 20 ≡ 8
We can identify which elements in Z/nZ are invertible and which are zero divisors.
Supposing that d is a divisor of n, we know that d · n/d = n ≡ 0. Even if d is not a
divisor but (d, n) = α > 1, it is still a zero divisor, because d · n/α ≡ 0. On the other
hand, if (d, n) = 1, then we know that there's a solution to the equation
α · d + β · n = 1, so that α · d ≡ 1 and d is invertible.
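The dichotomy for n = 12 can be checked directly; note that Python's built-in `pow(d, -1, n)` computes the modular inverse when it exists (a sketch):

```python
from math import gcd

n = 12
for d in range(1, n):
    if gcd(d, n) == 1:
        # Invertible: pow computes the modular inverse (Python 3.8+)
        inv = pow(d, -1, n)
        assert (d * inv) % n == 1
    else:
        # Zero divisor: d * (n / gcd(d, n)) ≡ 0 (mod n), with both factors nonzero
        b = n // gcd(d, n)
        assert 0 < b < n and (d * b) % n == 0
```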
7.6. Fields. There's a special situation that we can see immediately. Suppose that
n = p is a prime. Then every nonzero d ∈ Z/pZ is coprime to p, so that every nonzero
element of Z/pZ is invertible. A commutative ring in which every nonzero element is
invertible is called a field.
But wait, we already know a lot of fields: Q, R, C, and others! Well,
surprise, there are also finite fields. Field homomorphisms are just ring homomorphisms,
but with a twist: let's examine φ : F → K for two fields F, K. We know
that ker φ ⊂ F is a two-sided ideal. But since every nonzero element in F is invertible, we
know that either ker φ = F or ker φ = {0}. Since φ(1_F) = 1_K, we know that ker φ
can't be everything. Thus ker φ = {0} and all field homomorphisms are injective.
A special case is that of field automorphisms. It's pretty hard to find field automorphisms
sometimes. This is the realm of Galois theory, which isn't particularly
covered on the GRE. As a special observation, there are no nontrivial automorphisms
of Q or F_p (which is Z/pZ when it has its field clothes on).

Problem 7.25. Prove that if φ : Q → Q is a field homomorphism (so φ(0) = 0 and φ(1) = 1), then φ = id_Q.
8. Day 8: Analysis and topology
Topics covered: Lipschitz and uniform continuity of functions, absolute and uniform
convergence of functions, suprema and infima. The main topology topics are:
compactness, connectedness and path connectedness, continuous functions, metrics
and metric spaces, separation axioms (e.g. Hausdorff), base of a topology.

8.1. A few basics. Let's go over some of the fancy analysis words when it comes
to sequences in R.

Definition 8.1. Let S ⊂ R be any subset. Then we define the supremum sup S to
be the number α ∈ R ∪ {∞} such that α ≥ s for all s ∈ S and, if β < α, then there
exists s₀ ∈ S such that β < s₀.

We can similarly define the infimum of S by inf S = −sup(−S). The supremum
is also called the least upper bound and the infimum the greatest lower bound.
There's a nice feature of the real numbers that's related to its completeness
(see below).

Theorem 8.2. Every nonempty subset of the real numbers that is bounded above has a supremum.

As another random note, here's a theorem.

Theorem 8.3. Suppose that {xₙ} is an increasing sequence, i.e. xₙ ≤ xₙ₊₁ for all
n. If the set of values S = {xₙ} is bounded above, then
    lim_{n→∞} xₙ = sup S.
8.2. Metric spaces. Everything we say is going to work in an arbitrary (complete)
metric space, so let's go ahead and do that definition first.

Definition 8.4. Let X be a set. A metric on X is a function d : X × X → R_{≥0} such
that
• d(x, y) = 0 if and only if x = y
• d(x, y) = d(y, x) for all x, y ∈ X
• For all x, y, z ∈ X, we have d(x, y) + d(y, z) ≥ d(x, z) (triangle inequality)

In any metric space, we obtain a topology by defining open sets to be generated
by the balls B(ε, x) for every ε > 0 and x ∈ X. Technically this collection is called a base of
a topology. We'll go more into that later. That said, the definition of continuity
for functions Rⁿ → Rᵐ can be repeated verbatim for functions f : X → Y between
metric spaces, so we'll give that generality.

Definition 8.5. We say a function f : X → Y is uniformly continuous if for all
ε > 0, there exists δ > 0 such that whenever d_X(x₁, x₂) < δ, d_Y(f(x₁), f(x₂)) < ε.
You may fill in X ⊂ R and Y = R if that makes you happier.
This is strictly stronger than continuity: it says that the same δ can be used at
any point in the domain, not just at the particular point a you want. That's why we
don't talk about 'f(x) is uniformly continuous at x = a'.
There's an upgrade to this whose definition bears mentioning.

Definition 8.6. A function f : X → Y is called Lipschitz continuous if there exists
a constant K > 0 such that d_Y(f(x₁), f(x₂)) ≤ K · d_X(x₁, x₂) for all x₁, x₂ ∈ X.

In particular, the choice δ = ε/K shows that Lipschitz continuous implies uniformly
continuous (which implies continuous).
There’s a nice way to conclude a function is uniformly continuous.
Problem 8.7 (Heine-Cantor Theorem). If X is a compact metric space and f : X →
Y is continuous, then it is also uniformly continuous.

Solution. Recall that one version of compactness (we'll re-recall it later) is that every
open cover of X admits a finite subcover. Fix ε > 0. Since f is continuous, let's define the sets
Uₓ for all x ∈ X as follows:
    Uₓ = {x′ ∈ X : d_Y(f(x), f(x′)) < ε/2}
In other words, it's the set of points around x that satisfy the condition of uniform continuity
for a slightly smaller epsilon. We can then define Bₓ to be the biggest open ball
B(δₓ, x) ⊂ Uₓ. For one more refinement, consider Bₓ′ = B(δₓ/2, x), the ball with
half the maximal radius. The collection {Bₓ′} is (obviously) an open cover of X, so
there's some finite collection x₁, ..., xₙ such that the balls Bₓᵢ′ = B(δᵢ/2, xᵢ) cover X.
Consider now δ = (1/2) minᵢ δᵢ. This is a positive number because we are taking
a minimum (instead of, say, an infimum). Moreover, take any z₁, z₂ ∈ X with
d_X(z₁, z₂) < δ. Without loss of generality, we have z₁ ∈ B₁′. Then
    d(z₂, x₁) ≤ d(z₂, z₁) + d(z₁, x₁) < δ + δ₁/2 ≤ δ₁/2 + δ₁/2 = δ₁
which implies that z₁, z₂ ∈ B₁ ⊂ U_{x₁}. Hence
    d_Y(f(z₁), f(z₂)) ≤ d_Y(f(z₁), f(x₁)) + d_Y(f(x₁), f(z₂)) < ε/2 + ε/2 = ε
which proves that f(x) is uniformly continuous.

This proof is a good reminder of the utility of compactness, and that on compact
domains many results are upgradeable from local (i.e. continuity being defined at
points) to global (having a uniform δ for each ε).
Uniformly continuous functions have a nice property.
Theorem 8.8. Suppose that f : Z ⊂ X → Y is uniformly continuous on a subset
Z ⊂ X, with Y complete. Then there is a unique extension f̄ : Z̄ → Y defined on the closure of Z that
is still continuous.

This doesn't work if the function isn't uniformly continuous. For instance, let
f(x) = 1/x be defined on f : (0, 1) → R. Then there is no way to continuously
extend f(x) to [0, 1], as we would need
    f(0) = lim_{x→0} 1/x
and this limit diverges. The case of uniform continuity ensures that we don't run
into this problem.
There's also absolute continuity, but I don't think we need to recall that.

8.3. Convergence of functions. We can now begin to talk about convergence of
functions. We will restrict our attention to Y = R and X ⊂ R because we will care
about completeness. Let's recall that briefly.

Definition 8.9. A sequence {xₙ} in a metric space X is called Cauchy if for all
ε > 0, there exists N ∈ N such that d_X(xₘ, xₙ) < ε whenever n, m > N.

The terms of a Cauchy sequence get arbitrarily close together. This is related to
the sequence converging to some limit.

Problem 8.10. Prove that if lim_{n→∞} xₙ = x exists, then {xₙ} is Cauchy.

The converse is not necessarily true. Consider the metric space Q and the sequence
    1, 1.4, 1.41, 1.414, ...
given by the truncations of the infinite decimal √2. The limit is, by design, not in
Q; however, this sequence is Cauchy, as |xₙ − xₙ₊₁| < 10^{−n+1} gets arbitrarily small.
We therefore get our definition:

Definition 8.11. A metric space X is called complete if every Cauchy sequence
converges to some limit in X.

It'll be convenient to work in complete metric spaces so that sequences are Cauchy
if and only if they are convergent.

Definition 8.12. Consider a sequence of functions fₙ : X → R. Then we can define
a new function f : X → R by
    f(x) = lim_{n→∞} fₙ(x)
assuming all these limits exist. In this case, we say that {fₙ} converges to f pointwise.
This is pretty good. For instance, let fₙ : [0, 1] → R be defined by fₙ(x) = xⁿ.
Then it's pretty clear that the pointwise limit exists and
    f(x) = 0 for x ∈ [0, 1),  f(x) = 1 for x = 1
This presents our conundrum. Each of the functions fₙ(x) is continuous, but their
pointwise limit is not! We need to introduce a more refined version of convergence that
takes into account that we have an entire function, not just a series of points.

Definition 8.13. Let {fₙ : X → R} be a sequence of functions. We say that {fₙ}
converge to f uniformly if they converge pointwise and, for every ε > 0, there exists
N ∈ N such that
    |fₙ(x) − f(x)| < ε for all n > N and all x ∈ X
That is, the pointwise limits are all getting close to the limit function f(x) simultaneously.

We can see how this should be generalised to arbitrary metric spaces. Note that since
R is complete, we could also demand that the sequence {fₙ} is uniformly Cauchy
rather than uniformly convergent, which is sometimes easier.
The point is this:
Theorem 8.14. Suppose that {fₙ} is a sequence of continuous functions that converge
uniformly to f. Then f is also continuous.

This must mean that fₙ(x) = xⁿ does not converge uniformly to the limit function.
To see this, fix 1 > ε > 0 and any N ∈ N. We will show that there exists x ∈ [0, 1]
such that |f_N(x) − f(x)| > ε. Specifically, we are going to choose an x ∈ (0, 1), so
we just need to prove that x^N > ε. But this is easy: take any 1 > δ > ε and let
x = δ^{1/N}.
Can we upgrade this theorem in the case that we know that {fₙ} are also uniformly
continuous? Yes. Leave it at that.
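The failure of uniform convergence for fₙ(x) = xⁿ is visible numerically: the supremum of |fₙ(x) − 0| over [0, 1) stays near 1 no matter how large n gets (a sketch on a finite grid):

```python
# sup over [0,1) of |x^n - 0| stays near 1 for every n, so the convergence
# of f_n(x) = x^n to its (discontinuous) pointwise limit is not uniform.
for n in (1, 5, 50, 500):
    sup_dev = max((k / 10000) ** n for k in range(10000))  # grid on [0, 1)
    assert sup_dev > 0.9  # the deviation does not shrink with n
```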

8.4. Integrals. Now let's address the issue of integrals. Suppose that we have a
sequence of integrable functions fₙ : X → R where, again, we will consider X ⊂ Rⁿ
(and most likely n = 1). Suppose further that fₙ → f pointwise. Does it follow that
f is integrable? In particular, do we have an equality
    lim_{n→∞} ∫_X fₙ(x) dx = ∫_X f(x) dx
The answer is not necessarily. An easy counterexample is the following: let
fₙ : (0, 1] → R be defined by
    fₙ(x) = n for x ∈ (0, 1/n],  fₙ(x) = 0 else
Then each fₙ(x) isn't continuous, but it is bounded with finitely many discontinuities,
and
    ∫₀¹ fₙ(x) dx = 1
for all n ∈ N.
Now, fₙ → 0 pointwise because fₙ(x) = 0 for all n > 1/x. But, as one observes,
    ∫₀¹ 0 dx ≠ 1
Thus, something has gone wrong. As something specific to notice, the functions {fₙ}
are not uniformly bounded by any constant. Consequently, {fₙ} does not converge
to the zero function uniformly. We have two theorems that give us the means to
commute the integral and the limit.

Theorem 8.15 (Uniform convergence theorem). If fₙ → f uniformly as functions
[a, b] → R and all fₙ are integrable, then
    lim_{n→∞} ∫ₐᵇ fₙ(x) dx = ∫ₐᵇ f(x) dx

Theorem 8.16. Suppose fₙ → f pointwise as functions [a, b] → R and all fₙ (and f)
are integrable. Suppose further that for all n ∈ N, |fₙ(x)| ≤ g(x) for an integrable
function g : [a, b] → R. Then
    lim_{n→∞} ∫ₐᵇ fₙ(x) dx = ∫ₐᵇ f(x) dx

I'm trying to avoid measure theory in the description of this theorem, and I think
we don't need it.

8.5. Topology. I think we can now talk about topology in the abstract.

Definition 8.17. A topological space is a set X along with a subset T ⊂ P(X) such
that:
• ∅, X ∈ T
• If {Uᵢ}_{i∈I} is any collection of elements in T, then so is ∪_{i∈I} Uᵢ
• If U₁, ..., Uₙ is a finite collection of elements in T, then so is ∩_{i=1}^{n} Uᵢ

The set T is called a topology on X. The sets U ∈ T are called open sets. A
set V such that V^c ∈ T is called closed. Note that you could also define a topology
using closed sets and a dual set of axioms.
Every subset S ⊂ X has an interior and a closure. The interior S° is the union of
all open sets U ⊂ S, and the closure S̄ is the intersection of all closed sets C ⊃ S.
There are always two topologies on any set X, namely the maximal choice
T = P(X), called the discrete topology, and the minimal choice {∅, X}, called the
indiscrete topology (which I think is a joke).
Something that we might care about is when two topologies are the same, i.e.
when they have exactly the same open sets. Well, usually a topology is defined using
a generating set, in the following sense:

Definition 8.18. A subset B ⊂ T is called a base of the topology T on X if:
• ∪_{U∈B} U = X
• For every U₁, U₂ ∈ B and every x ∈ U₁ ∩ U₂, there exists U₃ ∈ B with x ∈ U₃ ⊂ U₁ ∩ U₂

Problem 8.19. Suppose that B₁ is a base of T₁ and B₂ a base of T₂ on a set X.
Suppose further that for every U₂ ∈ B₂ and every x ∈ U₂, there exists U₁ ∈ B₁ such that
x ∈ U₁ ⊂ U₂, and vice versa. Then T₁ = T₂.

If we have that T₁ ⊂ T₂, we say that T₁ is coarser than T₂, or that T₂ is finer than
T₁. In linguistic terms, having more open sets makes the topology finer. We
can also check this on bases.
8.6. Separation axioms. There is an increasing list of axioms that make topological
spaces more and more nice. We'll give a list and examples.
A topological space X is T0 if for every two distinct points x, y ∈ X, there exists an open
set U containing one of the two points but not the other. That is, all points are topologically distinguishable.
A topological space X is T1 if for every two distinct points x, y ∈ X, there exists an open
set U such that x ∈ U but y ∉ U and an open set V such that y ∈ V and x ∉ V. In
this case, points are closed.
A topological space X is T2 or Hausdorff if for every two distinct points x, y ∈ X, there
exist open sets U, V such that x ∈ U, y ∈ V and U ∩ V = ∅. That is, points are
separated.
A topological space X is regular if for every x ∈ X and closed K ⊂ X such that
x ∉ K, there exist open sets U, V such that x ∈ U, K ⊂ V, and U ∩ V = ∅. That is,
points and closed sets are separated. X is T3 (or regular Hausdorff) if it is regular
and T0.
A topological space X is normal if for every C, K ⊂ X closed with C ∩ K = ∅,
there exist U, V open such that C ⊂ U, K ⊂ V, and U ∩ V = ∅. That is, closed sets
are separated.
On this note,

Theorem 8.20 (Urysohn's lemma). A topological space X is normal if and only if for
any disjoint closed sets C, K ⊂ X, there exists a continuous function f : X → [0, 1]
such that f(C) = 0 and f(K) = 1. That is, C and K are separated by a continuous
function.

A topological space is T4 (or normal Hausdorff) if it is T1 and normal. All metric
spaces are T4. There's another theorem that's worth stating too.

Theorem 8.21 (Urysohn's metrization theorem). Let X be a topological space. Then X is
separable and metrizable (i.e. admits a metric that generates its topology) if and
only if it is regular, Hausdorff, and second-countable.

We're missing some of these words. A topological space is separable if it admits
a countable dense subset, i.e. there's a countable set S ⊂ X such that S̄ = X. A
topological space is second-countable if it admits a countable base.
8.7. Continuity. What are the functions we care about?
Definition 8.22. Let X, Y be two topological spaces. Then a set map f : X → Y is called continuous if f⁻¹(V) is open for every open V ⊂ Y , or equivalently, if f⁻¹(C) is closed for every closed C ⊂ Y .
Problem 8.23. Prove that if X, Y are metric spaces endowed with the metric topol-
ogy, this is the same definition as the usual one.
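For finite spaces, the preimage criterion in Definition 8.22 can be checked by brute force. The two spaces, their topologies, and the maps below are made-up examples for illustration, not from the notes:

```python
# Check continuity of a map between finite topological spaces using the
# preimage criterion: f is continuous iff f^{-1}(V) is open for every open V.

def preimage(f, V):
    """Preimage of the set V under the map f (given as a dict)."""
    return frozenset(x for x in f if f[x] in V)

def is_continuous(f, tau_X, tau_Y):
    """tau_X, tau_Y are the topologies: collections of open frozensets."""
    return all(preimage(f, V) in tau_X for V in tau_Y)

# X = {1, 2, 3} with topology {∅, {1}, {1,2}, X} (a non-Hausdorff example)
tau_X = {frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({1, 2, 3})}
# Y = {'a', 'b'} with the Sierpinski topology {∅, {'a'}, Y}
tau_Y = {frozenset(), frozenset({'a'}), frozenset({'a', 'b'})}

f = {1: 'a', 2: 'a', 3: 'b'}   # preimage of {'a'} is {1, 2}: open
g = {1: 'b', 2: 'a', 3: 'a'}   # preimage of {'a'} is {2, 3}: not open

print(is_continuous(f, tau_X, tau_Y))  # True
print(is_continuous(g, tau_X, tau_Y))  # False
```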
Definition 8.24. A map f : X ! Y is called a homeomorphism if it is continuous,
bijective, and moreover sends open sets to open sets.
Without this last condition, there’s no guarantee that the set-theoretic inverse is
a continuous function.
Now, what kind of sets are there besides open and closed?
Definition 8.25. A set Z ⊂ X is called disconnected if there exist open sets U, V ⊂ X, each meeting Z, such that Z ⊂ U ∪ V and U ∩ V ∩ Z = ∅. A set that is not disconnected is called connected.
Definition 8.26. A set Z ⊂ X is called path-connected if for every a, b ∈ Z, there exists a continuous function γ : [0, 1] → Z such that γ(0) = a and γ(1) = b.
Problem 8.27. Prove that every path-connected set is connected.
The converse is not true.
MATH GRE BOOTCAMP: LECTURE NOTES 75
Problem 8.28. Prove that the graph of the function

    f(x) = sin(1/x)  if x ≠ 0,    f(0) = 0

is connected but not path-connected.
Definition 8.29. A set K ⊂ X is called compact if every open cover of K admits a finite subcover.
In a metric space, compact sets are necessarily closed and bounded. If X is Euclidean space, we get the converse:
Theorem 8.30 (Heine-Borel theorem). A set K ⊂ R^n is compact if and only if it is closed and bounded.
There are a couple extra things one should prove now.
Problem 8.31. The image of a connected set under a continuous function is con-
nected. The image of a compact set under a continuous function is compact.
I think that’s about everything.
9. Day 9: Miscellaneous
Topics covered: probability and combinatorics, statistics, geometry, set theory,
logic, graph theory, algorithms.
9.1. Combinatorics. Pigeonhole principle (go over it again).
Permutations and combinations (and factorials).
Specific case: going in a circle, there are (n − 1)! arrangements instead of n!, because each circular arrangement is counted n times, once per rotation, so you have to divide by n.
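The divide-by-rotations argument can be checked by brute force. This is a small sketch, not from the notes:

```python
# Count distinct circular arrangements of n labeled people by enumerating
# all linear permutations and identifying rotations, then compare with the
# (n-1)! formula.
from itertools import permutations
from math import factorial

def circular_arrangements(n):
    """Arrangements of n people around a round table, up to rotation."""
    seen = set()
    for p in permutations(range(n)):
        i = p.index(0)            # rotate so that person 0 sits first
        seen.add(p[i:] + p[:i])
    return len(seen)

for n in range(1, 7):
    assert circular_arrangements(n) == factorial(n - 1)
print(circular_arrangements(5))  # 24 = 4!
```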
9.2. Probability via area. We can imagine a probability space Ω and events A and B being subsets of Ω, such that the area/volume of Ω is 1, and hence P(A) is given by the volume or area of A. Then the probability P(A and B) is given by the intersection of these areas, and similarly for P(A or B).
When you're trying to compare continuous random variables x, y, z ∈ [0, 1] (for instance), the volume approach is very useful, as we've seen.
Problem 9.1. If x, y are randomly chosen in [0, 1], what is the probability that x ≥ 2y?
Solution. We can picture this as the double integral where y ∈ [0, 1] and x ∈ [2y, 1]. Except that this doesn't make total sense, because 2y > 1 when y > 1/2, so we really have to integrate over y ∈ [0, 1/2].

    ∫_0^{1/2} ∫_{2y}^1 1 dx dy = ∫_0^{1/2} (1 − 2y) dy = 1/2 − (1/2)^2 = 1/4
We can also do this via drawing the picture and computing the area of the triangle.
The same works in 3d.
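A quick Monte Carlo check of the same answer (a sketch, not part of the notes):

```python
# Estimate P(x >= 2y) for x, y uniform on [0, 1]. The area computation
# above gives 1/4 (the triangle with vertices (0,0), (1,0), (1, 1/2)).
import random

random.seed(0)
trials = 200_000
hits = sum(random.random() >= 2 * random.random() for _ in range(trials))
print(hits / trials)  # ≈ 0.25
```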
9.3. General probability. Conditional probability:

    P(A|B) = P(A and B) / P(B)

read 'probability of A given B'.
We say that A and B are independent if P(A and B) = P(A) · P(B). Equivalently, P(A|B) = P(A).
There's a nice way to swap the order of conditional probability, called Bayes' theorem.

Theorem 9.2.

    P(A|B) = P(B|A) · P(A) / P(B)
In action:
Problem 9.3. Consider drawing two cards from a deck. Compute the probability
of drawing a spade first given that you drew a spade second.
Solution. Let A be the event that the first card is a spade and B the event that the second card is a spade. We can work out the probability P(B|A) explicitly: there are 51 cards left and 12 spades, so P(B|A) = 12/51. We can also compute P(A) and P(B): P(A) = P(B) = 1/4. (We can see that P(B) = 1/4 by noting that the second card is just a random card from the deck.) Thus P(A|B) = P(B|A) · P(A)/P(B) = P(B|A) = 12/51.
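A simulation agrees with the Bayes computation (a sketch, not from the notes):

```python
# Draw two cards without replacement and estimate
# P(first is a spade | second is a spade); Bayes' theorem predicts 12/51.
import random

random.seed(1)
deck = ['spade'] * 13 + ['other'] * 39

both = second = 0
for _ in range(200_000):
    a, b = random.sample(deck, 2)   # two cards, no replacement
    if b == 'spade':
        second += 1
        if a == 'spade':
            both += 1
print(both / second)  # ≈ 12/51 ≈ 0.235
```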
Now these have been discrete probabilities here, but what about continuous ones?
Definition 9.4. A probability density function for a continuous random variable X is a nonnegative integrable function f : R → R such that

    ∫_{−∞}^{∞} f(x) dx = 1

It has a corresponding cumulative distribution function defined by

    P(X ≤ a) = F(a) = ∫_{−∞}^{a} f(x) dx
9.4. Statistics. The expected value of a discrete random variable X (taking values in R) is

    E(X) = Σ_a a · P(X = a)

where the sum runs over the values a that X can take. For a continuous random variable with pdf f(x), it is

    E(X) = ∫_{−∞}^{∞} x · f(x) dx
The variance can be calculated as E(X²) − E(X)². Specifically, it equals

    Σ_a P(X = a)(a − E(X))²    or    ∫_{−∞}^{∞} (x − E(X))² · f(x) dx

in the discrete and continuous cases, respectively. The standard deviation is the square root of the variance.
What do we know about standard deviation? Well, suppose we have a normal distribution. (That's basically none of the above examples, but we'll get to that soon.) Within ±1 standard deviation of the expected value (or the mean) lies 68% of the distribution, within ±2 lies 95%, and within ±3 lies 99.7%.
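The 68-95-99.7 figures can be recovered from the error function, since for a standard normal the mass within k standard deviations is erf(k/√2). A quick check, not from the notes:

```python
# Verify the 68-95-99.7 rule: P(|Z| <= k) = erf(k / sqrt(2)) for Z standard
# normal.
from math import erf, sqrt

for k in (1, 2, 3):
    print(k, round(erf(k / sqrt(2)), 4))
# 1 → 0.6827, 2 → 0.9545, 3 → 0.9973
```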
When can we expect a variable to be normally distributed? For example, Bernoulli trials. Suppose we have an event with probability p and we perform n trials. Then the expected number of successful trials is n · p. If we repeat this situation a bunch of times, we can look at the number of trials that were actually successful. The variance of this distribution is n · p · (1 − p), and so the standard deviation is the square root of this.
Problem 9.5. Suppose we roll a 20-sided die 400 times. Consider the probability of
rolling a prime number. What is the expected number of successes and what is the
standard deviation?
Solution. The primes between 1 and 20 are 2, 3, 5, 7, 11, 13, 17, and 19, so p = 8/20 = 2/5. The expected number of successes is 400 · (2/5) = 160, and the standard deviation is √(400 · (2/5) · (3/5)) = √96 ≈ 9.8.
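The same numbers worked out in code (note there are eight primes up to 20: 2, 3, 5, 7, 11, 13, 17, 19):

```python
# Problem 9.5 numerically: 400 rolls of a d20, success = rolling a prime.
# Mean is n*p and standard deviation is sqrt(n*p*(1-p)).
from math import sqrt

primes = [p for p in range(1, 21)
          if p > 1 and all(p % d for d in range(2, p))]
n, p = 400, len(primes) / 20     # 8 primes up to 20, so p = 0.4

print(n * p)                     # 160.0
print(sqrt(n * p * (1 - p)))     # sqrt(96) ≈ 9.8
```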
9.5. Geometry. Triangles: let's start there. There's the law of sines and the law of cosines, which I can recall, but there's also Heron's formula for the area of a triangle: if the sides are a, b, c, then

    A = √(s(s − a)(s − b)(s − c))

where s = (a + b + c)/2 is the semiperimeter.
This is pretty nice when you know that the triangle is equilateral, in which case the area becomes x²√3/4, where x is the side length.
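Heron's formula is easy to sanity-check (the test cases below are my own, not from the notes):

```python
# Heron's formula, checked on a 3-4-5 right triangle (area 6) and against
# the equilateral shortcut x^2 * sqrt(3) / 4.
from math import sqrt, isclose

def heron(a, b, c):
    s = (a + b + c) / 2                              # semiperimeter
    return sqrt(s * (s - a) * (s - b) * (s - c))

print(heron(3, 4, 5))  # 6.0
x = 2.0
print(isclose(heron(x, x, x), x**2 * sqrt(3) / 4))   # True
```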
The measure of an interior angle of a regular n-gon is 180(n − 2)/n degrees. That can be useful.
You can also try to find the area of an inscribed polygon or a circumscribed polygon
around a circle. We can do the example of a hexagon and see where it goes from
there.
Problem 9.6. Do that, I think I can wing it. Start with the unit circle and go from
there.
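One way to work the hexagon example: a regular n-gon inscribed in a circle of radius r has area (1/2) n r² sin(2π/n), and a circumscribed one has area n r² tan(π/n). These standard formulas (not stated in the notes) give, for the unit circle:

```python
# Area of a regular hexagon inscribed in and circumscribed around the unit
# circle, via the standard n-gon area formulas.
from math import sin, tan, pi, sqrt, isclose

n, r = 6, 1.0
inscribed = 0.5 * n * r**2 * sin(2 * pi / n)
circumscribed = n * r**2 * tan(pi / n)

print(inscribed)      # 3*sqrt(3)/2 ≈ 2.598
print(circumscribed)  # 2*sqrt(3) ≈ 3.464
```

Both straddle the circle's area π ≈ 3.1416, as they should.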
9.6. Set theory. We've discussed the issue of cardinality. If you take the power set, the cardinality strictly goes up. We can do some discussion of countability though.
ℵ₀ is the cardinality of N. We say that a set X is countable if there exists a surjective function N → X or an injective function X → N. A countable union of countable sets is still countable, and a finite product of countable sets is countable,
but a countable product of countable sets is definitely not countable anymore.
In particular, consider the set X = {0, . . . , 9} and take the infinite product ∏_{n∈Z} X. If we interpret an element (x_n) of this product as the sum Σ_{n∈Z} x_n · 10^n, then the elements whose nonzero digits are bounded above (though not necessarily below) are exactly the decimal expansions, giving us (roughly) the nonnegative reals. That subset alone is uncountable, so the full product is certainly uncountable. Great!
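The 'finite products stay countable' claim can be made concrete with the Cantor pairing function, a standard explicit bijection N × N → N (not from the notes):

```python
# The Cantor pairing function enumerates N x N, witnessing that a finite
# product of countable sets is countable. We check injectivity on a grid
# and that every small natural number is hit.
def pair(i, j):
    """Cantor pairing: walk the anti-diagonals i + j = 0, 1, 2, ..."""
    return (i + j) * (i + j + 1) // 2 + j

values = {pair(i, j) for i in range(50) for j in range(50)}
assert len(values) == 2500            # no collisions on this grid
assert set(range(100)) <= values      # every small natural is reached
print(pair(0, 0), pair(1, 0), pair(0, 1))  # 0 1 2
```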
9.7. Graph theory. Graphs are made of edges and vertices. A cycle is a path that starts and ends at the same vertex. Sometimes graphs are directed, sometimes they aren't. I guess that's about it.
9.8. Algorithms. Learn some Python? If you don’t know any computer science,
it’s a bit tricky. I guess just try to treat the algorithm like a proof with input.