Quantitative Methods For Finance

Foundations of Calculus
Limits and continuity of a function
In this video …
A definition of function, domain and codomain
Composed functions
Geometric properties of functions
Types of functions
Warm up: a bit of definitions
FUNCTIONS ON $\mathbb{R}$: A function of a real variable x with
domain D is a rule that assigns a unique real number to
each number x in D
We typically use letters like f or g to denote such a rule

For the time being, we will consider $f: D \to \mathbb{R}$, with $D \subseteq \mathbb{R}$, that
is, functions from $\mathbb{R}$ to $\mathbb{R}$
Usually, y = f(x) denotes the value that the function f assigns
to the real number x belonging to its domain, or, in other
words, the “value of f at x”
As an example, f(x) = x+1 is a function that assigns to each x of
the domain, a number that is one unit larger; for instance, f(2)
= 3 is the value of this function at 2
Warm up: a bit of definitions
The domain is the set of numbers x at which f(x) is defined
Typically, when the domain is not specified, we assume that it
includes all the real numbers for which the function takes
meaningful values
For instance, for a function such as $f(x) = \frac{1}{x+3}$ the domain will be
$(-\infty, -3) \cup (-3, +\infty)$, that is, -3 is excluded from the domain
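The range (or image) of a function is instead the set of the
values assumed by the function; it is contained in the co-domain
As an example, the domain of a function such as $f: \mathbb{R} \to \mathbb{R}$, $f(x) = x^2$, is equal to the
entire $\mathbb{R}$ but the range is equal to $[0, +\infty)$
[Figure: domain and range of a function]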
Warm up: composition of functions
Let $f: U \to \mathbb{R}$, $g: X \to \mathbb{R}$, and $f(U) \subseteq X$. Then the function
$g \circ f: U \to \mathbb{R}$ defined by
$(g \circ f)(x) = g(f(x))$
is called the composition of f and g
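A minimal Python sketch of composition, that is, applying f first and then g. The function f(x) = x + 1 is the example used earlier in these slides; g(x) = x² and the helper name `compose` are hypothetical choices made only for illustration.

```python
from typing import Callable

RealFunction = Callable[[float], float]


def compose(g: RealFunction, f: RealFunction) -> RealFunction:
    """Return the function x -> g(f(x))."""
    return lambda x: g(f(x))


def f(x: float) -> float:
    # The example from the slides: f(x) = x + 1
    return x + 1


def g(x: float) -> float:
    # Hypothetical second function, chosen only for illustration
    return x ** 2


g_after_f = compose(g, f)
print(g_after_f(2))      # g(f(2)) = g(3) = 9
print(compose(f, g)(2))  # f(g(2)) = f(4) = 5
```

The two printed values differ, which shows that the order of composition matters.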
Basic geometric properties of a function
The basic geometric properties of a function are whether it is
increasing or decreasing and the location of its local and
global minima and maxima (if any)
A function f is increasing if $x_1 > x_2$ implies $f(x_1) > f(x_2)$,
while a function f is decreasing if $x_1 > x_2$ implies $f(x_1) < f(x_2)$
Figure 1
The function depicted in Figure 1 is
decreasing over $(-\infty, 0)$ and increasing
over $(0, +\infty)$
The point where the function turns
from decreasing to increasing is a
(global) minimum for this function,
in this case zero
Basic geometric properties of a function
Figure 2
The function depicted in Figure 2 is
increasing over $(-\infty, 0)$ and decreasing over $(0, +\infty)$
The point where the function turns from
increasing to decreasing is a (global)
maximum for this function, here zero
If a function f changes from decreasing to
increasing at $x_0$, the point $(x_0, f(x_0))$ is a
local minimum of the function f; if
$f(x) \geq f(x_0)$ for all x then the point is a
global minimum
If a function f changes from increasing to decreasing at $x_0$,
the point $(x_0, f(x_0))$ is a local maximum of the function f; if
$f(x) \leq f(x_0)$ for all x then the point is a global maximum
We will come back to these notions when we speak of
optimization
Different functions
The simplest functions are the monomials, those functions which
can be written as $f(x) = a x^k$ for some number a (coefficient)
and some positive integer k, where k is said to be the degree of
the monomial
For instance, $f(x) = 6x^2$ is a monomial of the second order (degree two)
A polynomial is a function formed by adding up different
monomials; the degree of the polynomial is the highest degree of any
monomial that appears in the function
For instance, $f(x) = 6x^3 + 2x$ is a polynomial of the third order (degree three)
Rational functions are ratios of polynomials, i.e., $f(x) = p(x)/q(x)$ with p and q polynomials
Non-algebraic functions: exponential functions (where x appears
in the exponent), trigonometric functions, logarithmic functions,
etc. (I will focus on exponential and logarithmic functions later on)
Examples of popular functions (1/2)
Type | Description | Example
Constant function | Constant: $y = a$ (polynomial of degree zero) | $y = 5$
Linear function | Straight line: $y = ax + b$ (polynomial of degree one) | $y = 5x + 3$
Quadratic function | Parabola: $y = ax^2 + bx + c$ (polynomial of degree two) | $y = x^2 + x + 2$
Rational function | Hyperbola: $y = a/x$ |
Power function | Monomial of degree k: $y = ax^k$ |
Examples of popular functions (2/2)
Reference
Mathematics for Economists by Blume and Simon (Chapters 2.1 and 2.2)
Limits and continuity of a function
In this video …
Limits: Definition
Rule for Limits
Limits: an example
Continuity: Definition
Limits: Definition
To get a first intuition of the concept of limit, consider a
function f that is defined for all x near a but not necessarily at $x = a$
We say that the function f(x) has the number A as its limit as x
tends to a if f(x) tends to A when x tends to a
We write
$\lim_{x \to a} f(x) = A$
Limits: Definition
Note: we do not need the function to be defined at $x_0$ in order
for the limit $\lim_{x \to x_0} f(x)$ to exist
Example: the function plotted in the figure is not defined at $x_0 = 1$
However, the limit at $x_0 = 1$ exists:
$\lim_{x \to 1} f(x) = 1$
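The idea that a limit can exist at a point where the function is not defined can be checked numerically. The sketch below uses a hypothetical function, f(x) = (x² − x)/(x − 1), which is not the one plotted on the slide; it is undefined at x = 1 but equals x everywhere else, so its values approach 1 from both sides.

```python
def f(x: float) -> float:
    # Hypothetical function: undefined at x = 1 (division by zero),
    # equal to x for every other x
    return (x * x - x) / (x - 1)


# Evaluate f closer and closer to 1 from the left and from the right
for h in (0.1, 0.01, 0.001):
    print(h, f(1 - h), f(1 + h))
```

Calling `f(1.0)` itself raises `ZeroDivisionError`, while the printed values get arbitrarily close to 1.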
Rules for Limits
Continuity: a definition
There are several (equivalent) definitions of continuity. A “naïve”
definition of continuity states that a function is continuous if an
approximate knowledge of x is sufficient to approximate f(x).
In other words, f is continuous at $x_0$ if all the values of f at points
near $x_0$ are close to $f(x_0)$
Formally, we say that a function f is continuous at $x_0$ when for any
$\varepsilon > 0$, there is $\delta > 0$ such that
$|f(x) - f(x_0)| < \varepsilon$
whenever
$|x - x_0| < \delta$
Alternative definitions.
A function f(x) is continuous at $x_0$ if for any sequence $\{x_n\}$ converging to $x_0$, $f(x_n)$
converges to $f(x_0)$.
Using limits, we can give the following definition. We say that a
function f is continuous at $x_0$ if
$\lim_{x \to x_0} f(x) = f(x_0)$
If a function is not continuous at x we say that it is discontinuous at x
We say that a function is continuous if it is continuous at every point
of its domain.
Example of a function
that is NOT continuous
Reference
Mathematics for Economists by Blume and Simon (Chapters 2.5 and 12.1)
Differentiability and rules of derivation
In this video
The slope of functions
Interpretation of derivatives
Rules for computing the derivatives
Differentiability
Higher order derivatives
Start from a geometric
interpretation: when we
study the graph of a
function we would like
to have a measure of the
steepness of the graph at
a point or several points
If the function is linear, this
is easy: the slope is given by
the coefficient that multiplies x
However, this is less trivial for a non-linear function as the
one depicted above
We can define the steepness of a curve at a particular point
as the slope of the straight line that just touches the curve at
that point, i.e., that is tangent to the curve at that point
(point P in the figure)
The slope of the tangent to the graph at point P, with coordinates
(a, f(a)), is called derivative of f at P and is denoted by f’(a)
How can we find the slope of the tangent of f at the point P,
with coordinates (a, f(a))?
Consider a point Q that is also on the graph of f and is close
to P, i.e., suppose that the x-coordinate of Q is a+h where h is
a small number different from zero
Therefore the y-coordinate of Q is f(a+h)
The slope of the secant PQ is:
$m_{PQ} = \frac{f(a+h) - f(a)}{h}$
When Q moves towards P (Q
tends to P) along the graph of f,
the secant PQ tends to the
tangent to the graph at P
The x-coordinate a+h must tend
to a so that h must tend to zero
Therefore, we define the slope of
the tangent to the graph at P as
the number that $m_{PQ}$
approaches when h goes to zero
Hence, the derivative of a function f at a point a of its domain
is
$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$
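The secant-to-tangent argument can be tried numerically: the slope of the secant through (a, f(a)) and (a+h, f(a+h)) settles down as h shrinks. The sketch below assumes a hypothetical function f(x) = x² and the point a = 3, where the slope of the tangent is 2a = 6; the helper name `difference_quotient` is also an illustrative choice.

```python
def difference_quotient(f, a: float, h: float) -> float:
    """Slope of the secant through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def f(x: float) -> float:
    # Hypothetical function; the slope of its tangent at a is 2a
    return x ** 2


# The secant slope approaches 6 as h goes to zero
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(f, 3.0, h))
```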
Interpretation of derivatives
In economics, other (non geometric) interpretations of the
derivative are often more useful
It is a rate of change!
If $C(x)$ is a function that expresses the cost of producing x
units of a good, then we interpret $C'(x)$ as the marginal cost
at x:
$C'(x) = \lim_{h \to 0} \frac{C(x+h) - C(x)}{h}$
that is, the cost that we face when we make a “small” increment to the production level
Note: Do not forget that the slope of a non-linear
function is NOT CONSTANT! The derivative
depends on the point where you compute it
(meaning that if the cost function is non-linear,
increasing the production from 10,000 to 10,001
units is not the same as increasing the
production from 50,000 to 50,001)
Rules for computing derivatives
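Theorem (for a proof see BS, chapter 2):
For any positive integer k, the derivative of $f(x) = x^k$ at $x_0$ is
$f'(x_0) = k x_0^{k-1}$
The derivatives of other commonly used functions are:
The derivative of a constant is zero
The derivative of the logarithm is
$f(x) = \ln x \Rightarrow f'(x) = 1/x$
$f(x) = \log_a x \Rightarrow f'(x) = \frac{1}{x \ln a}$
The derivative of exponential functions is
$f(x) = e^x \Rightarrow f'(x) = e^x$
$f(x) = a^x \Rightarrow f'(x) = a^x \ln a$
Rules for computing derivatives (2/2)
CHAIN RULE: $(g \circ f)'(x) = g'(f(x)) \, f'(x)$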
An example
Take the derivatives of the following functions:
ln
Differentiability
A function f is differentiable at $x_0$ if the limit
$\lim_{n \to \infty} \frac{f(x_0 + h_n) - f(x_0)}{h_n}$
exists and is the same for each sequence $\{h_n\}$ which converges to 0.
If a function is differentiable at every $x_0$ in its domain we say that
the function is differentiable
Example of a function
that is continuous but
NOT differentiable
No derivative
at x0=0
Higher order derivatives
The derivative of a function f is often called the first derivative
of f. If f’ is also differentiable, we can differentiate f’ to get the
second derivative, f’’. The derivative of f’’ (if it exists) is called
the third derivative of f and so on. Typically, for our applications
first and second derivatives are enough, but higher-order
derivatives may be computed
For example, consider the function $f(x) = 2x^3 + 6x^2$
The first derivative is $f'(x) = 6x^2 + 12x$
The second derivative is the derivative of f’, that is
$f''(x) = 12x + 12$
The third derivative is $f'''(x) = 12$
Derivatives from the fourth onwards are equal to zero
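Repeated differentiation of a polynomial can be sketched with a list of coefficients. The example below assumes the function on this slide is f(x) = 2x³ + 6x², which is the reading consistent with the surviving coefficients of its derivatives; the helper `poly_derivative` is a hypothetical name.

```python
def poly_derivative(coeffs: list) -> list:
    """Differentiate a polynomial given as [c0, c1, c2, ...],
    where coeffs[i] is the coefficient of x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]


f = [0, 0, 6, 2]           # 2x^3 + 6x^2 (assumed example)
d1 = poly_derivative(f)    # [0, 12, 6] -> 6x^2 + 12x
d2 = poly_derivative(d1)   # [12, 12]   -> 12x + 12
d3 = poly_derivative(d2)   # [12]       -> 12
d4 = poly_derivative(d3)   # []         -> identically zero
print(d1, d2, d3, d4)
```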

Reference
Mathematics for Economists by Blume and Simon (BS) (Chapters 2.3 – 2.6)
Linear and polynomial
approximations
In this video
Linear approximation and differentials
Polynomial approximations
Linear functions are very easy to manipulate: therefore it is
natural to try to find a “linear approximation” to a given
function
Consider a function f(x) that is differentiable at $x = x_0$
The tangent to the graph at $(x_0, f(x_0))$ follows the equation
$y = f(x_0) + f'(x_0)(x - x_0)$
Therefore, a linear approximation of the function f around $x_0$
is given by
$f(x) \approx f(x_0) + f'(x_0)(x - x_0)$ for x close to $x_0$
or $f(x_0 + \Delta x) \approx f(x_0) + f'(x_0)\Delta x$ provided that $\Delta x$ is small
Differential of the function f at $x_0$: $dy = f'(x_0)\,dx$
The differential is NOT
the actual increment in
y if x is changed to $x_0 + \Delta x$
but rather the
change in y that would
occur if y continued to
change at the fixed rate
$f'(x_0)$ as x changes to $x_0 + \Delta x$
The less the slope of f changes near $x_0$ (the flatter the curve), the more precise is the
approximation; in addition, the larger is $\Delta x$, the less precise
the approximation
We shall see this point in more depth with an example:
consider the function $f(x) = \ln(x)$
Starting from $\ln(10) = 2.3$ compare the actual change in y
given a certain change in x with its linear approximation, that
is, $\Delta y \approx f'(10)\,\Delta x = \Delta x / 10$
Δx                     0.1     0.5     1       5       10      100
Actual change          0.0100  0.0488  0.0953  0.4055  0.6931  2.3979
Linear approximation   0.0100  0.0500  0.1000  0.5000  1.0000  10.0000
Size of the "mistake"  0.0000  0.0012  0.0047  0.0945  0.3069  7.6021
Note: a change $\Delta x = 100$ is quite big if we are at $x_0 = 10$,
leading to a rather imprecise approximation (see previous
slide)
But what if we are at $x_0 = 10000$?
Starting from $\ln(10000) = 9.21$ compare the actual change in
y given a certain change in x with its linear approximation,
that is, $\Delta y \approx \Delta x / 10000$
Now $\Delta x = 100$ can be considered small enough to lead to a
quite accurate approximation
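The comparison in the table can be reproduced with a few lines of Python. This is a minimal sketch for ln(x), whose derivative is 1/x; the function name `linear_vs_actual` is a hypothetical choice.

```python
import math


def linear_vs_actual(x0: float, dx: float):
    """Return (actual change, linear approximation, mistake) for ln at x0."""
    actual = math.log(x0 + dx) - math.log(x0)
    linear = dx / x0  # f'(x0) * dx, with f'(x) = 1/x
    return actual, linear, linear - actual


# Same changes in x as in the table, starting from x0 = 10
for dx in (0.1, 0.5, 1, 5, 10, 100):
    actual, linear, mistake = linear_vs_actual(10, dx)
    print(f"{dx:>6} {actual:.4f} {linear:.4f} {mistake:.4f}")

# The same change of 100 is "small" when starting from x0 = 10000
print(linear_vs_actual(10000, 100))
```

Starting from 10000, the mistake for a change of 100 is below 0.0001, against roughly 7.6 when starting from 10.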
Polynomial Approximations
If approximations provided by linear functions are not
sufficiently accurate it is natural to try quadratic approximations
or approximations by polynomials of higher order
The quadratic approximation to $f(x)$ about $x = x_0$ is
$f(x_0 + \Delta x) \approx f(x_0) + f'(x_0)\Delta x + \frac{1}{2} f''(x_0)(\Delta x)^2$
More generally, we can approximate $f(x)$ about $x = x_0$ by a
polynomial of degree n (the nth-order Taylor polynomial)
Polynomial Approximations
Consider the function $f(x) = x^5$ and see what happens when
we try to approximate it around $x_0 = 2$ with linear vs. quadratic
approximations:
Δx                       0.1      0.2       0.3       0.4       0.5
Actual change            8.84101  19.53632  32.36343  47.62624  65.65625
Linear approximation     8.00000  16.00000  24.00000  32.00000  40.00000
Quadratic approximation  8.80000  19.20000  31.20000  44.80000  60.00000
Other example: $f(x) = \ln(x)$ to be approximated around $x_0 = 1$
(recall that $\ln(1) = 0$)
Δx                       0.1      0.2      0.3      0.4      0.5
Actual change            0.09531  0.18232  0.26236  0.33647  0.40547
Linear approximation     0.10000  0.20000  0.30000  0.40000  0.50000
Quadratic approximation  0.09500  0.18000  0.25500  0.32000  0.37500
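The two tables can be recomputed with a short sketch. It assumes the first function is x⁵ (the reading that matches the numbers in the table, since 2.1⁵ − 2⁵ = 8.84101) and takes the derivatives as explicit inputs; the helper name `approximations` is hypothetical.

```python
import math


def approximations(f, d1, d2, x0: float, dx: float):
    """Return (actual change, linear approx., quadratic approx.) of f at x0."""
    actual = f(x0 + dx) - f(x0)
    linear = d1(x0) * dx
    quadratic = linear + 0.5 * d2(x0) * dx ** 2
    return actual, linear, quadratic


# f(x) = x^5 with its first and second derivatives (assumed example)
power = (lambda x: x ** 5, lambda x: 5 * x ** 4, lambda x: 20 * x ** 3)
# f(x) = ln(x) with its first and second derivatives
log = (math.log, lambda x: 1 / x, lambda x: -1 / x ** 2)

for dx in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(dx, approximations(*power, 2, dx), approximations(*log, 1, dx))
```

In both cases the quadratic approximation is closer to the actual change than the linear one, and both deteriorate as the change in x grows.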
References
Mathematics for Economists by Blume and Simon (BS) (Chapter 2.7)
Logarithmic and exponential functions

In this video…
Natural exponential function
Natural logarithmic function
Natural exponential functions
Consider the following function:
$f(m) = \left(1 + \frac{1}{m}\right)^m$
When m increases towards infinity, f(m) will converge to 2.71828…, the number e:
$e \equiv \lim_{m \to \infty} \left(1 + \frac{1}{m}\right)^m$
m          f(m)
1          2
2          2.25
3          2.37037
4          2.44141
5          2.48832
100        2.70481
1000       2.71692
100000     2.71827
10000000   2.71828
Heuristic proof … (if interested look at Chiang, chapter 10)
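The convergence shown in the table can be reproduced directly. A minimal sketch, using the same values of m:

```python
import math


def f(m: int) -> float:
    """(1 + 1/m)^m, which approaches e as m grows."""
    return (1 + 1 / m) ** m


for m in (1, 2, 3, 4, 5, 100, 1000, 100000, 10000000):
    print(m, round(f(m), 5))

print(math.e)  # reference value from the standard library
```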
Why do we care?
In economics and finance, the number e carries a special
meaning, as it can be interpreted as the result of a special
process of interest compounding, that is, continuous
compounding
Suppose that we invest 1 euro today and we earn an annual
nominal interest of 100% (just for simplicity, we shall
consider something more reasonable in a minute); clearly if
the interest is compounded once a year, at the end of the year
we will get 2 euros:
$V(1) = 1 \cdot \left(1 + \frac{1}{1}\right)^1 = 2$
Alternatively, if the interest is compounded semi-annually, we
will get
$V(2) = (1 + 50\%)(1 + 50\%) = \left(1 + \frac{1}{2}\right)^2$
More generally,
$V(m) = \left(1 + \frac{1}{m}\right)^m$
Therefore, if we increase the frequency of compounding to
infinity, we know from the previous slide that we will earn
euro 1 x e = 2.718
This result can be generalized in three ways:
(1) More years of compounding
(2) Principal different from 1 euro
(3) Nominal interest rate different from 100%
(1) and (2) are trivial to implement
Simply, the amount of money that one will have after t years
with an annual rate of 100% continuously compounded is
equal to $A e^{t}$, where A is the capital invested at the beginning
Suppose now that we want to feature an annual interest rate r
= 5%
We can manipulate the formula to get
$V(m) = A\left(1 + \frac{r}{m}\right)^{mt} = A\left[\left(1 + \frac{r}{m}\right)^{m/r}\right]^{rt}$
With continuous compounding we get
$V = A e^{rt}$
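A small sketch comparing compounding m times per year with continuous compounding. It assumes the standard formulas A(1 + r/m)^(mt) and A·e^(rt); the function names and the figures in the example (100 euros, 5%, 10 years) are illustrative assumptions.

```python
import math


def compound(A: float, r: float, t: float, m: int) -> float:
    """Value of principal A after t years at nominal rate r,
    compounded m times per year."""
    return A * (1 + r / m) ** (m * t)


def compound_continuous(A: float, r: float, t: float) -> float:
    """Value of principal A after t years with continuous compounding."""
    return A * math.exp(r * t)


print(compound(1, 1.0, 1, 1))  # annual compounding at 100%: 2.0
print(compound(1, 1.0, 1, 2))  # semi-annual compounding at 100%: 2.25

# Hypothetical case: 100 euros at 5% for 10 years
for m in (1, 12, 365, 10 ** 6):
    print(m, compound(100, 0.05, 10, m))
print(compound_continuous(100, 0.05, 10))
```

As m grows, the discretely compounded value approaches the continuously compounded one from below.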
Natural logarithmic functions
The inverse of the natural exponential function is the natural
logarithmic function, $y = \ln x$
Natural logarithmic functions
Sometimes economists prefer to represent (and study) a
function $y = f(x)$ in log-log terms
This means that they apply a change to the variables such that
$Y = \ln(y)$ and $X = \ln(x)$; therefore $x = e^X$ and $y = e^Y$
Hence, in XY-coordinates, f becomes $Y = \ln f(e^X) \equiv F(X)$
The slope of the graph in log-log terms is
$\frac{dY}{dX} = \frac{f'(x)\,x}{f(x)}$
that is, the % change of the function f relative to a % change of x (the elasticity)
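The elasticity x·f'(x)/f(x) can be approximated numerically. The sketch below uses a hypothetical power function y = 3x⁻², for which the elasticity equals the exponent, -2, at every x > 0; the helper name `elasticity` and the step size are illustrative choices.

```python
def elasticity(f, x: float, h: float = 1e-6) -> float:
    """Approximate x * f'(x) / f(x) using a central difference for f'(x)."""
    derivative = (f(x + h) - f(x - h)) / (2 * h)
    return x * derivative / f(x)


def demand(x: float) -> float:
    # Hypothetical power function y = 3 * x^(-2)
    return 3 * x ** -2


print(elasticity(demand, 5.0))   # close to -2
print(elasticity(demand, 20.0))  # close to -2 as well
```

This is why a power function plots as a straight line in log-log coordinates: its slope there is constant.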
Reference
Mathematics for Economists by Blume and Simon (Chapter 5)
Fundamental Methods of Mathematical Economics by Chiang (Chapter 10)
Foundations of Linear Algebra
Matrices and Vectors

In this video …
Transpose of a matrix
Important definitions

A matrix is a rectangular array of numbers, parameters or
variables
The members of the array are called elements of the matrix A,
and they are typically denoted as $a_{ij}$ to denote the element in
row i and column j
The dimension of the matrix is $m \times n$ where m is the number of rows
and n is the number of columns
A matrix where $m = n$ is called a square matrix
A vector is a special case of a matrix
An m x 1 matrix is a column vector
A 1 x n matrix is a row vector
Transpose of a matrix
The transpose of a matrix A, denoted by A’ or $A^T$, is obtained
by interchanging the rows and the columns of the matrix, e.g.,
$A = \begin{pmatrix} 6 & 0 & 4 \\ 2 & 9 & 2 \end{pmatrix}$ and $A' = \begin{pmatrix} 6 & 2 \\ 0 & 9 \\ 4 & 2 \end{pmatrix}$
Note that if A is n x m then A’ is m x n
If A=A’ then the matrix is said to be symmetric
An identity matrix (typically denoted by I) is a square matrix
with 1 on its principal diagonal and 0 everywhere else, e.g.,
$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
The identity matrix is the matrix counterpart of the number
1, because AI = A and IA = A, where A is a generic n x m
matrix (and each I has the conformable size)
A null matrix is the counterpart of the number zero; it is a
matrix whose elements are all zeros, e.g.,
$0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

A diagonal matrix is a square matrix in which all non-
diagonal entries are zero, e.g.,
$\begin{pmatrix} 3 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 2 \end{pmatrix}$
An upper (lower) triangular matrix is a square matrix in
which all the entries below (above) the diagonal are zeros,
e.g.,

Euclidean norm: the Euclidean norm of a vector x,
denoted by $\|x\|$, is defined as follows:
$\|x\| = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$
For instance, given the vector
$x = (2, 1, 4, -3)'$
the norm is $\|x\| = \sqrt{2^2 + 1^2 + 4^2 + (-3)^2} = \sqrt{30} \approx 5.5$
The Euclidean norm is the “length” of a vector
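The norm of the example vector can be checked with a couple of lines; the function name `euclidean_norm` is a hypothetical choice.

```python
import math


def euclidean_norm(x: list) -> float:
    """Square root of the sum of the squared elements of x."""
    return math.sqrt(sum(v * v for v in x))


x = [2, 1, 4, -3]
print(euclidean_norm(x))  # square root of 30, about 5.48
```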

Vectors are said to be linearly independent if they cannot be

written as a linear combination of each other
In an n-dimensional space ($\mathbb{R}^n$) there exist a maximum of n
linearly independent vectors
For example, in the case of three-dimensional space, any
other vector can be rewritten as a linear combination of
$e_1 = (1, 0, 0)'$, $e_2 = (0, 1, 0)'$, $e_3 = (0, 0, 1)'$
The vectors that contain one element equal to 1 and the other
elements equal to zero are called unit vectors
The n unit vectors are a basis of $\mathbb{R}^n$
The rank of a matrix tells us the maximum number of linearly
independent rows or columns of the matrix
If the matrix is n x m the maximum rank is min(n, m)

Reference

(Chapter 5)
Matrix operations
In this video …
Matrix operations
The inverse of a matrix
Checking singularity
Properties of the determinant

Matrix Operations
In order to perform algebraic operations, matrices must meet
some requirements about their size
We say that the matrices must be conformable for a given
operation
Multiplication by a scalar can always be performed (the size
of the matrix does not matter)
It consists in multiplying every element of the matrix by a
given scalar
$4 \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = \begin{pmatrix} 8 & 4 \\ 20 & 12 \end{pmatrix}$
Matrix Operations
In order to be conformable for addition two matrices must
have the same size
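ADDITION OF TWO MATRICES
$\begin{pmatrix} a_{1,1} & \dots & a_{1,m} \\ \dots & a_{i,j} & \dots \\ a_{n,1} & \dots & a_{n,m} \end{pmatrix} + \begin{pmatrix} b_{1,1} & \dots & b_{1,m} \\ \dots & b_{i,j} & \dots \\ b_{n,1} & \dots & b_{n,m} \end{pmatrix} = \begin{pmatrix} a_{1,1} + b_{1,1} & \dots & a_{1,m} + b_{1,m} \\ \dots & a_{i,j} + b_{i,j} & \dots \\ a_{n,1} + b_{n,1} & \dots & a_{n,m} + b_{n,m} \end{pmatrix}$
For instance,
$\begin{pmatrix} 3 & 2 \\ 5 & 1 \end{pmatrix} + \begin{pmatrix} 7 & 4 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} 10 & 6 \\ 3 & 0 \end{pmatrix}$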
Matrix Operations
In order to be conformable for subtraction, two matrices must
have the same size
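SUBTRACTION OF TWO MATRICES
$\begin{pmatrix} a_{1,1} & \dots & a_{1,m} \\ \dots & a_{i,j} & \dots \\ a_{n,1} & \dots & a_{n,m} \end{pmatrix} - \begin{pmatrix} b_{1,1} & \dots & b_{1,m} \\ \dots & b_{i,j} & \dots \\ b_{n,1} & \dots & b_{n,m} \end{pmatrix} = \begin{pmatrix} a_{1,1} - b_{1,1} & \dots & a_{1,m} - b_{1,m} \\ \dots & a_{i,j} - b_{i,j} & \dots \\ a_{n,1} - b_{n,1} & \dots & a_{n,m} - b_{n,m} \end{pmatrix}$
For instance
$\begin{pmatrix} 3 & 2 \\ 5 & 1 \end{pmatrix} - \begin{pmatrix} 7 & 4 \\ -2 & -1 \end{pmatrix} = \begin{pmatrix} -4 & -2 \\ 7 & 2 \end{pmatrix}$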
Matrix Operations
In order to be conformable for multiplication, if the size of the
first matrix is n x m, the size of the second matrix must be m x
h; the result of the multiplication has size n x h
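MULTIPLICATION OF MATRICES
$\begin{pmatrix} a_{1,1} & \dots & a_{1,m} \\ \dots & a_{i,j} & \dots \\ a_{n,1} & \dots & a_{n,m} \end{pmatrix} \begin{pmatrix} b_{1,1} & \dots & b_{1,h} \\ \dots & b_{i,j} & \dots \\ b_{m,1} & \dots & b_{m,h} \end{pmatrix} = \begin{pmatrix} c_{1,1} & \dots & c_{1,h} \\ \dots & c_{i,j} & \dots \\ c_{n,1} & \dots & c_{n,h} \end{pmatrix}$
(the first factor is the LEAD MATRIX, the second the LAG MATRIX)
$c_{1,1}$ is the result of the multiplication of the first row of the
lead matrix “with” the first column of the lag matrix: $c_{1,1} = a_{1,1} b_{1,1} + \dots + a_{1,m} b_{m,1}$
…
$c_{n,h}$ is the result of the multiplication of the n-th row of the lead
matrix “with” the h-th column of the lag matrix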
Matrix Operations
As an example of multiplication of matrices try
Matrix Operations
Important: unlike multiplication of scalars, the multiplication
of matrices is not commutative!
Note that, for instance, if A is 2 x 3 and B is 3 x 4, not only are AB
and BA not the same product, but BA is not even conformable
Unlike scalars, matrices cannot be divided one by the other
(although it is possible to divide every element of a matrix by
a scalar)
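Row-by-column multiplication, the conformability check, and the lack of commutativity can be sketched in plain Python with matrices stored as lists of rows. The function name `matmul` and the matrices used are hypothetical, chosen only for illustration.

```python
def matmul(A: list, B: list) -> list:
    """Multiply an n x m matrix A by an m x h matrix B (lists of rows)."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("non-conformable: columns of A must equal rows of B")
    h = len(B[0])
    # Element (i, j) is row i of the lead matrix "with" column j of the lag matrix
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(h)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]

C = [[1, 0, 2], [0, 1, 3]]                    # 2 x 3
D = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]  # 3 x 4
print(matmul(C, D))  # a 2 x 4 matrix; matmul(D, C) would raise ValueError
```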
Division: the inverse of a matrix
The inverse of a matrix A, denoted as $A^{-1}$, exists only if the
matrix is a square matrix
The inverse of matrix A satisfies the conditions $A A^{-1} = I$ and
$A^{-1} A = I$
Not every square matrix has an inverse (being a square
matrix is a necessary but not sufficient condition)
When a matrix does have an inverse it is said to be non-singular,
while if it does not have an inverse it is said to be
singular
An n x n matrix is nonsingular if it has n linearly independent
rows (columns) (we can also say that the matrix has full
rank)
Check singularity of a matrix
In order to check non-singularity of a square matrix (or
alternatively, that the matrix has full rank) one can use the
determinant of the matrix
If the determinant of the square matrix is different from zero then
the matrix is non-singular (thus invertible)
Computing the determinant (with the exception of 2x2 matrices) is
typically tedious and computationally intense
In case of a 2 x 2 matrix, the formula for the determinant is
$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}$
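For a 2 x 2 matrix both the singularity check and the inverse fit in a few lines. This is a sketch using the standard 2 x 2 formulas (determinant a·d − b·c, inverse obtained by swapping the diagonal, changing the sign of the off-diagonal and dividing by the determinant); the function names and the sample matrices are assumptions for illustration.

```python
from fractions import Fraction


def det2(A: list) -> int:
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c


def inverse2(A: list) -> list:
    """Inverse of a 2 x 2 integer matrix, as exact fractions."""
    det = det2(A)
    if det == 0:
        raise ValueError("singular matrix: the determinant is zero")
    (a, b), (c, d) = A
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[2, 1], [5, 3]]
print(det2(A))      # 2*3 - 1*5 = 1, so A is non-singular
print(inverse2(A))  # equals [[3, -1], [-5, 2]]

S = [[1, 2], [2, 4]]  # second row is twice the first: not full rank
print(det2(S))      # 0, so S is singular
```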
The inverse matrix in Excel
We do not want to go into the technicalities of matrix inversion (the
interested reader may read Chiang, Chapter 5)
However, again Excel can help us with the computation
The relevant function is MINVERSE
MINVERSE(C5:E7)
You need to select the result area H5:J7 (the matrix is 3 x 3 so the
result will be 3 x 3 too) and press CTRL + SHIFT + ENTER
Reference

(Chapter 5)
System of equations
In this video …
Systems of linear equations
Solution to a linear system
Other methods to solve systems of linear equations
In general, an equation is said to be linear if it has the form
$a_1 x_1 + a_2 x_2 + \dots + a_n x_n = d$
The coefficients $a_1, a_2, \dots, a_n$ and d are fixed numbers and therefore
they are called parameters
$x_1, x_2, \dots, x_n$ stand for variables
A system of linear equations consists of a set of linear
equations that must be solved simultaneously
For instance, the one below is a system of two linear
equations
$6x_1 + x_2 = 3$
$2x_1 + 5x_2 = 4$
Matrix algebra provides a compact way to write systems of
equations (even large ones)
For instance, the previous system of equations can be written
as follows
$\begin{pmatrix} 6 & 1 \\ 2 & 5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$
It leads to a way of testing the existence of a solution by the
analysis of the determinant of the matrix of coefficients
It gives a method for finding the solution (if it exists)
In compact format we can write the system as $Ax = d$
Solution of a linear system
Consider again the system $Ax = d$
We now know that in order to perform division we have to employ
the concept of inverse
The solution to the system is then simply $x = A^{-1} d$
Therefore, we are left with the possibilities in the scheme below
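As an illustration of solving a system through the inverse, the sketch below handles the two-equation system with coefficient matrix rows (6, 1) and (2, 5) and right-hand side (3, 4) from the earlier slide. It uses the standard 2 x 2 inverse and exact fractions; the function name `solve2` is hypothetical.

```python
from fractions import Fraction


def solve2(A: list, d: list):
    """Solve a 2 x 2 system A x = d through the inverse of A."""
    (a, b), (c, e) = A
    det = a * e - b * c
    if det == 0:
        raise ValueError("singular matrix of coefficients: no unique solution")
    # The inverse of A is (1/det) * [[e, -b], [-c, a]]; multiply it by d
    x1 = Fraction(e * d[0] - b * d[1], det)
    x2 = Fraction(-c * d[0] + a * d[1], det)
    return x1, x2


x1, x2 = solve2([[6, 1], [2, 5]], [3, 4])
print(x1, x2)  # 11/28 9/14

# Check: both equations are satisfied
print(6 * x1 + x2, 2 * x1 + 5 * x2)  # 3 4
```

When the determinant is zero the inverse does not exist and the function refuses to return a solution.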
Other ways to solve system of linear equations
Another way to solve a system of linear equations (without
using matrix algebra) is Gaussian Elimination
Start from a linear system and obtain another linear system that
has the same solution as the original one but that is simpler to
solve
Let us see the following example
$x + 2y + z = 2$
$2x + 6y + z = 7$
$x + y + 4z = 3$
As a first step, we want to eliminate x from the second equation
We multiply the first equation by two and we subtract it from
the second equation
Now we have
$x + 2y + z = 2$
$2y - z = 3$
$x + y + 4z = 3$
Now you eliminate x from the third equation by subtracting the
first equation
$x + 2y + z = 2$
$2y - z = 3$
$-y + 3z = 1$
We can eliminate y from the third equation by adding ½ the
second equation
We obtain
$x + 2y + z = 2$
$2y - z = 3$
$\frac{5}{2} z = \frac{5}{2}$
Now the system is in triangular form and this is easy to solve using
«back substitution», i.e., solve the last equation, then the second, then the first
z = 1
y = 2
x = -3
We can check the solution using the matrix formulation
$\begin{pmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{pmatrix} \begin{pmatrix} -3 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 7 \\ 3 \end{pmatrix}$
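The elimination and back-substitution steps can be sketched in Python. The coefficients below are the ones appearing in the matrix check (rows 1 2 1, 2 6 1, 1 1 4 with right-hand side 2, 7, 3); the function name `gaussian_elimination` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def gaussian_elimination(A: list, b: list) -> list:
    """Solve A x = b by forward elimination and back substitution."""
    n = len(A)
    # Augmented matrix [A | b] with exact arithmetic
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the system has no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the variable from all the equations below
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    # Back substitution: last equation first
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


solution = gaussian_elimination([[1, 2, 1], [2, 6, 1], [1, 1, 4]], [2, 7, 3])
print(solution)  # x = -3, y = 2, z = 1
```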
Reference

(Chapter 5)
A Review of Optimization Methods
Introduction to optimization
In this video …
Statement of the optimization problem
Candidate points: first derivative test
Second derivative test
Optimization: Statement of the Problem
Optimization means maximizing (or minimizing) some objective
function, y = f(x), by picking one or more appropriate values of
the control (aka choice) variable x
The most common criterion of choice among alternatives in
economics (and finance) is the goal of maximizing something
(like the profit of a firm) or minimizing something (like costs
or risks)
For instance, think of a risk-averse investor who wants to
maximize a mean-variance objective by picking an
appropriate set of portfolio weights
Maxima and minima are also called extrema and may be
relative (or local, that is, they represent an extremum in the
neighborhood of the point only) or global
Key assumption: f(x) is n times continuously differentiable
Optimization: Statement of the Problem
In the leftmost graph, optimization is trivial: the function is a
constant and as such all points are at the same time maxima
and minima, in a relative sense
In the second plot, f(x) is monotonically increasing, there is no
finite maximum, if the set of nonnegative real numbers is the
domain (as the picture implies)
The points E and F on the right are examples of relative
(local) extrema
A function may well have several relative extrema, some of
which may be maxima while others are minima
Candidate points: The First-Derivative Test
As a first step we want to identify the “candidate” points to solve
the optimization problem, i.e., all the local extrema
Indeed, global extrema must also be local extrema or end points
of f(x) on its domain
• If we know all the relative maxima, it is necessary only to select the
largest of these and compare it with the end points in order to
determine the absolute maximum
Key Result 1 (First-Derivative Test): If a relative extremum of the
function occurs at x = x0, then either f'(x0) = 0, or f'(x0) does not
exist; this is a necessary condition (but NOT sufficient)
Candidate points: The First-Derivative Test (2/2)
Key Result 1 Qualified: If f’(x0) = 0 then the value of f(x0) will be:
(a) A relative maximum if the derivative f'(x) changes its sign
from > 0 to <0 from the immediate left of the point x0 to its
immediate right
(b) A relative minimum if f'(x) changes its sign from negative
to positive from the immediate left of x0 to its immediate right
(c) Neither a relative maximum nor a relative minimum if f'(x)
has the same sign on both the immediate left and right of point
x0 (inflection point)
NOTE: we are assuming that the function
is continuous and possesses continuous
derivatives => for smooth functions,
relative extreme points can occur only
when the first derivative has a zero value
[Figure: inflection point]
One Example
Concavity, Convexity, and Second-Order Derivatives
A strictly concave (convex)
function is such that if we pick any
pair of points M and N on the
function and join them by a
straight line, the line segment MN
must lie entirely below (above)
the curve, except at points M and
N
[Figure: a concave and a convex portion of a curve, separated by an inflection point]
If the second derivative $f''(x)$ is negative for all x then
the function $f(x)$ is strictly concave
If the second derivative $f''(x)$ is positive for all x then the
function $f(x)$ is strictly convex
The Second-Order Derivative Test
Key Result 2 (Second-Derivative Test): If the first derivative of a
function at x = x0 is f’(x0) = 0 (first-order, necessary condition),
then f(x0 ), will be:
(a) A relative maximum if f"(x0) < 0
(b) A relative minimum if f"(x0) > 0
(second-order, sufficient condition)
This test is in general more convenient to use than the first-derivative
test, because it does not require us to check the
derivative sign to both the left and the right of x0
Drawback: this test is inconclusive in the event that f"(x0) = 0
when the stationary value f(x0) can be either a relative
maximum, or a relative minimum, or even an inflection point
• This is what makes the condition sufficient only
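The second-derivative test can be sketched as a small classifier. The functions used below are hypothetical examples, not the ones from the slides: f(x) = x³ − 3x has stationary points at x = -1 and x = 1 with second derivative 6x, while f(x) = x⁴ has a stationary point at 0 where the second derivative 12x² vanishes.

```python
def second_derivative_test(second_derivative_at_x0: float) -> str:
    """Classify a stationary point x0 (where f'(x0) = 0) from the sign of f''(x0)."""
    if second_derivative_at_x0 < 0:
        return "relative maximum"
    if second_derivative_at_x0 > 0:
        return "relative minimum"
    return "inconclusive"


# Hypothetical example 1: f(x) = x^3 - 3x, f''(x) = 6x
for x0 in (-1.0, 1.0):
    print(x0, second_derivative_test(6 * x0))

# Hypothetical example 2: f(x) = x^4, f''(x) = 12x^2, stationary point at 0
print(0.0, second_derivative_test(12 * 0.0 ** 2))
```

In the last case the test says nothing, although x = 0 is in fact a minimum of x⁴; one has to fall back on the first-derivative test.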
Two Examples

Reference

(Chapter 9)
Optimization with more than one variable
In this video …
Functions of more than one variable
Partial derivatives
Hessian Matrix
Optimization: the case of n-variable functions
Functions with more than one variable
We are now going to generalize the earlier results to
optimization problems for functions of several variables, i.e.,
Functions $f: \mathbb{R}^n \to \mathbb{R}$, i.e., $y = f(x_1, x_2, \dots, x_n)$
In fact, functions from $\mathbb{R}^n$ to $\mathbb{R}$ will be popping up very often
in your future studies
For instance, the return of a portfolio is a linear function of the returns
of the n assets that compose the portfolio:
$R_p = w_1 R_1 + w_2 R_2 + \dots + w_n R_n$
Another example is a utility function $u(x_1, x_2, \dots, x_n)$ of a bundle of
consumption goods
However, we first need to generalize the concept of derivative
to the case of functions of several variables
This leads us to the introduction of partial derivatives and of
Jacobian derivatives
Partial derivatives and the Jacobian
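Definition: Let $f: \mathbb{R}^n \to \mathbb{R}$. Then for each variable $x_i$ at each
point $x^0 = (x_1^0, x_2^0, \dots, x_n^0)$ in the domain of f, the partial
derivative with respect to $x_i$ is
$\frac{\partial f}{\partial x_i}(x^0) = \lim_{h \to 0} \frac{f(x_1^0, \dots, x_i^0 + h, \dots, x_n^0) - f(x_1^0, \dots, x_i^0, \dots, x_n^0)}{h}$
if the limit exists. Only the ith variable changes, while the others
stay constant
The vector (more generally, matrix) that collects all
partial derivatives
$Df(x^0) = \left( \frac{\partial f}{\partial x_1}(x^0), \frac{\partial f}{\partial x_2}(x^0), \dots, \frac{\partial f}{\partial x_n}(x^0) \right)$
is called the Jacobian derivative of f at $x^0$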
Partial Derivatives: One Example
Example: consider the function $f(x, y) = 3x^2y^2 + 4xy^3 + 7y$
Let us compute the partial derivative with respect to x
Simply treat y as if it were a constant and apply the same rules
of one-variable calculus
$\frac{\partial f}{\partial x} = 6xy^2 + 4y^3$
Now let us compute the partial derivative with respect to y
$\frac{\partial f}{\partial y} = 6x^2y + 12xy^2 + 7$
The concept can be easily generalized to a function of more
than two variables
Second Order Derivatives and Hessians
If the n partial derivative functions of f are continuous
functions at the point $x^0$ in $\mathbb{R}^n$ we say that f is continuously
differentiable at $x^0$
If all the n partial derivatives $\partial f / \partial x_i$ are themselves
differentiable we can compute their partial derivatives
$\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)$ is called the $x_i x_j$-second order partial derivative of f
and it is generally denoted as $\frac{\partial^2 f}{\partial x_j \partial x_i}$
When $i \neq j$ then we speak of cross (or mixed) partial
derivatives
A function of n variables has $n^2$ second order partial
derivatives that are usually arranged into an n x n Hessian
matrix
Second Order Derivatives and Hessians
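The Hessian matrix is typically denoted as $D^2 f(x)$ or $D^2 f_x$
and takes the form
$D^2 f(x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_n \partial x_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_n} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}$
Young’s theorem: if f is twice continuously differentiable, the Hessian matrix is a symmetric matrix,
i.e., for each pair of indices i and j
$\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$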
One Example
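Consider the function $f(x, y) = 3x^2y^2 + 4xy^3 + 7y$
Let us compute the Hessian matrix; we already computed
$\frac{\partial f}{\partial x} = 6xy^2 + 4y^3$, $\frac{\partial f}{\partial y} = 6x^2y + 12xy^2 + 7$
Now we need to compute
$\frac{\partial^2 f}{\partial x^2} = 6y^2$; $\frac{\partial^2 f}{\partial y^2} = 6x^2 + 24xy$; $\frac{\partial^2 f}{\partial x \partial y} = 12xy + 12y^2$
Hessian matrix is
$\begin{pmatrix} 6y^2 & 12xy + 12y^2 \\ 12xy + 12y^2 & 6x^2 + 24xy \end{pmatrix}$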
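A Hessian computed by hand can be cross-checked with finite differences. The sketch assumes the example function is 3x²y² + 4xy³ + 7y, the reading consistent with the surviving coefficients of its derivatives; the helper `second_partial`, the step size and the evaluation point (1, 2) are illustrative choices.

```python
def f(x: float, y: float) -> float:
    # Assumed example function: 3x^2 y^2 + 4x y^3 + 7y
    return 3 * x ** 2 * y ** 2 + 4 * x * y ** 3 + 7 * y


def hessian(x: float, y: float) -> list:
    """Analytic Hessian of f, symmetric as Young's theorem predicts."""
    cross = 12 * x * y + 12 * y ** 2
    return [[6 * y ** 2, cross], [cross, 6 * x ** 2 + 24 * x * y]]


def second_partial(func, point, i: int, j: int, h: float = 1e-3) -> float:
    """Central finite-difference estimate of the (i, j) second partial."""
    def shifted(di: float, dj: float) -> float:
        q = list(point)
        q[i] += di
        q[j] += dj
        return func(*q)
    return (shifted(h, h) - shifted(h, -h) - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)


point = (1.0, 2.0)
print(hessian(*point))  # [[24.0, 72.0], [72.0, 54.0]]
print([[second_partial(f, point, i, j) for j in range(2)] for i in range(2)])
```

The two printed matrices agree up to a small numerical error, and the off-diagonal entries coincide.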
Optimization: the case of n-variable functions
Now we are ready to generalize optimization to the case of n-
variable functions
The strategy remains looking for candidate points (relative
extrema) and then trying to isolate global ones among them
$x^*$ is a critical point for f if it fulfills
$Df(x^*) = 0$,
which means that
$\frac{\partial f}{\partial x_i}(x^*) = 0$, for each i
If $x^*$ is an interior point which is a local maximum or minimum
then it is a critical point
However, the reverse is not true, i.e., the condition is necessary
but not sufficient for an interior point to be a local extremum
Checking the sign of the Hessian matrix
As one may guess from the one-variable case, second order
conditions involve checking the sign of the Hessian matrix
We need to add a definition to the matrix algebra review that
we discussed in the last lecture
• A principal minor of a square matrix A is the determinant of a
submatrix obtained by eliminating some rows and the
corresponding columns; the order of a minor is the dimension of the
considered submatrix
• A leading principal minor is a principal minor obtained by
considering the first k rows and columns of the original matrix
• For instance, for a 3x3 matrix A the leading principal minors are
$|A_1| = a_{11}$
$|A_2| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$
$|A_3| = |A|$, the determinant of the
3x3 matrix itself
Checking the sign of the Hessian matrix
A square symmetric matrix A is said to be
• Positive definite: if all its leading principal minors are
strictly positive
• Negative definite: if the first leading principal minor $|A_1| < 0$ and then all its leading
principal minors alternate in sign (but are different from
zero)
• Indefinite: if we have a nonzero leading principal minor and
at least one leading principal minor does not follow the
patterns above
• Positive semidefinite: if every principal minor is
nonnegative
• Negative semidefinite: if every principal minor
of odd order is $\leq 0$ and every principal minor of even order
is $\geq 0$
Sufficient second order conditions
The sufficient second order conditions for a local extremum are
as follows, given that $x^*$ is an interior critical point:
• If $D^2 f(x^*)$ is negative definite => $x^*$ is a local maximum
point
• If $D^2 f(x^*)$ is positive definite => $x^*$ is a local minimum
point
• If $D^2 f(x^*)$ is indefinite => $x^*$ is a saddle point
Semidefinite cases require further investigation, and we shall
skip their discussion
When the sign of the Hessian matrix does not depend on x, the
local extrema are also global because when the Hessian is
positive (negative) definite over the entire domain the function
is strictly convex (concave)
Example of Unconstrained Optimization
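Study the optimization of the following function: $f(x, y) = 3x^4 + 3x^2y - y^3$
Step 1: find the internal critical points
$\frac{\partial f}{\partial x} = 12x^3 + 6xy = 0$
$\frac{\partial f}{\partial y} = 3x^2 - 3y^2 = 0$
Solving that is non-trivial and time consuming
You get three critical points:
A(0, 0); B($-\frac{1}{2}$, $-\frac{1}{2}$); C($\frac{1}{2}$, $-\frac{1}{2}$)
Step 2: compute the Hessian matrix
$D^2 f(x, y) = \begin{pmatrix} 36x^2 + 6y & 6x \\ 6x & -6y \end{pmatrix}$
We need to check it at A, B, and C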
Example of Unconstrained Optimization
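As an example, I will only check the sign of the Hessian at
C($\frac{1}{2}$, $-\frac{1}{2}$)
$D^2 f(C) = \begin{pmatrix} 9 - 3 & 3 \\ 3 & 3 \end{pmatrix} = \begin{pmatrix} 6 & 3 \\ 3 & 3 \end{pmatrix}$
First leading principal minor: $6 > 0$
Second leading principal minor: $6 \cdot 3 - 3 \cdot 3 = 9 > 0$
Then the Hessian matrix is positive definite and the point is a local
minimum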
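The whole check can be sketched in Python. The code assumes the objective is 3x⁴ + 3x²y − y³, the reading consistent with the Hessian entries 9 − 3, 3, 3 shown above at the point (1/2, -1/2); the function names are hypothetical, and the classification rule covers only the 2 x 2 case.

```python
def gradient(x: float, y: float):
    # First-order partial derivatives of the assumed f(x, y) = 3x^4 + 3x^2 y - y^3
    return 12 * x ** 3 + 6 * x * y, 3 * x ** 2 - 3 * y ** 2


def hessian(x: float, y: float) -> list:
    return [[36 * x ** 2 + 6 * y, 6 * x], [6 * x, -6 * y]]


def classify(H: list) -> str:
    """Second-order conditions for a 2 x 2 Hessian via leading principal minors."""
    m1 = H[0][0]
    m2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if m1 > 0 and m2 > 0:
        return "positive definite: local minimum"
    if m1 < 0 and m2 > 0:
        return "negative definite: local maximum"
    if m2 < 0:
        return "indefinite: saddle point"
    return "semidefinite: further investigation needed"


for name, point in (("A", (0.0, 0.0)), ("B", (-0.5, -0.5)), ("C", (0.5, -0.5))):
    print(name, gradient(*point), hessian(*point), classify(hessian(*point)))
```

Under these assumptions B and C both come out as local minima, while at A the Hessian is the null matrix and the test is silent.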
This is an easy problem and yet you see how computationally intense
it is
Sometimes the solution has to be found numerically anyway
Things get even worse when we introduce constraints (equality
constraints, inequality constraints or both)
Reference

(Chapter 11)
Hints of constrained optimization

In this video …
Hints of constrained optimization
Lagrange multiplier method
Hints of Constrained Optimization
Up to this point, all control variables have been independent of each other: the decision made regarding one variable does not impinge upon the choices of the remaining variables
• E.g., a two-product firm can choose any value for Q1 and any Q2 it wishes, without the two choices limiting each other
• If the firm in the example is somehow required to fulfill a restriction (e.g., a production quota) in the form of Q1 + Q2 = k, however, the independence between the choice variables will be lost
• The new optimum satisfying the production quota constitutes a constrained optimum, which, in general, may be expected to differ from the free optimum
Key Result: A constrained maximum can never exceed the free maximum
Hints of Constrained Optimization
In general, a constrained maximum can be expected to achieve a
lower value than the free maximum, although, by coincidence, the
two maxima may happen to have the same value
o If we had added another constraint intersecting the first constraint at a single point in the xy plane, the two constraints together would have restricted the domain to that single point
o Then locating the extremum would become a trivial matter
• In a meaningful problem, the number and the nature of the
constraints should be such as to restrict, but not eliminate, the
possibility of choice
o Generally, the number of constraints should be less than the
number of choice variables
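Under C < N equality constraints, when we can write a sub-set of the choice variables as an explicit function of all others, the former can be substituted out:
$\max_{x_1, x_2, \dots, x_N} f(x_1, x_2, \dots, x_N)$
s.t. $x_1 = \phi_1(x_{C+1}, \dots, x_N), \dots, x_C = \phi_C(x_{C+1}, \dots, x_N)$
becomes:
$\max_{x_{C+1}, \dots, x_N} f(\phi_1(x_{C+1}, \dots, x_N), \dots, \phi_C(x_{C+1}, \dots, x_N), x_{C+1}, \dots, x_N)$,
an unconstrained problem
Hint: Lagrange Multiplier Method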
However, the direct substitution method cannot be applied
when the C constraints do not allow us to re-write the objective
functions in N – C free control variables
• Even if some of the variables become implicit functions of others,
it would be complex to proceed because the objective would
become “highly composite”
In such cases, we often resort to the method of Lagrange
(undetermined) multipliers
• The goal is to convert a constrained extremum problem into a
form such that the first-order condition of the free extremum
problem can still be applied
• For instance, consider an objective function z = f(x,y) subject to the constraint g(x,y) = c where c is a constant
Hint: Lagrange Multiplier Method
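The Lagrangian problem is:
$\max_{x, y, \lambda} Z(x, y, \lambda) = f(x, y) + \lambda [c - g(x, y)]$
The necessary FOC is then:
$Z_\lambda = c - g(x, y) = 0$
$Z_x = f_x - \lambda g_x = 0$
$Z_y = f_y - \lambda g_y = 0$
The stationary values of the Lagrangian function Z will automatically satisfy the constraint
The optimal value $\lambda^*$ provides a measure of the sensitivity of the Lagrangian function to a shift of the constraint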
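A minimal sketch of the method on a toy problem may help fix ideas. It assumes a hypothetical objective f(x, y) = x·y and constraint x + y = c (neither comes from the slides), for which the first-order conditions of Z = x·y + λ(c − x − y) can be solved by hand: x = y = λ = c/2. The code checks the FOC and compares λ* with a finite-difference estimate of how the optimal value reacts to a shift in c.

```python
# Hypothetical problem: max f(x, y) = x*y  subject to  x + y = c
# Lagrangian: Z = x*y + lam*(c - x - y)

def solve_foc(c):
    """Closed-form solution of the FOC: y - lam = 0, x - lam = 0, c - x - y = 0."""
    x = y = lam = c / 2.0
    return x, y, lam

def foc_residuals(x, y, lam, c):
    """Values of (Z_lam, Z_x, Z_y); all should be zero at a stationary point."""
    return (c - x - y, y - lam, x - lam)

def optimal_value(c):
    """Value of the objective at the constrained optimum."""
    x, y, _ = solve_foc(c)
    return x * y

c = 10.0
x, y, lam = solve_foc(c)
eps = 1e-6
# Central finite difference of the optimal value with respect to the constant c
sensitivity = (optimal_value(c + eps) - optimal_value(c - eps)) / (2 * eps)

print("x*, y*, lambda* =", x, y, lam)
print("FOC residuals   =", foc_residuals(x, y, lam, c))
print("d f*/dc (numerical) =", sensitivity)
```

In this toy case the numerical sensitivity agrees with λ*, which is the sense in which the multiplier measures the effect of relaxing the constraint.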
Reference

(Chapter 12)
Elementary choice under uncertainty
- dominance
Prof. Massimo Guidolin
Prep Course in Quantitative Methods for Finance

The Formal Set Up
Most financial assets (securities) are risky, i.e., they can be
characterized as contracts that give different (K) payoffs in different
states of the world to occur at a future point in time
o The assets of interest are said to belong to some asset menu
o Only one state will occur, though investors do not know, at the outset,
which one, i.e., the states are mutually exclusive
o The description of each state is complete and exhaustive
o The set of states, S, is given exogenously and cannot be affected by the
choices of the investors
Standard probability theory is used to capture the uncertainty on
the payoffs of securities, for instance:
Podcast 14: Elementary choice under uncertainty - mean-variance criterion 2


The Formal Set Up
An investor’s task is a complex one and the optimal choice will result
from three distinct sets of (interacting) factors:
1. An investor's aversion toward or tolerance for risk
2. Some measure of the quantity of risk
3. How risk attitudes interact with the subjective uncertainties
associated with available assets to determine an investor's desired
portfolio holdings (demands)
o In the table, it is not evident why a rational investor ought to prefer
security C over security A (if any)
o An investor who pays more for security C than for A may be motivated by a desire to lower the range of variation of the payoffs
o It is unclear how such inclinations against risk may be balanced off in the light of the probability distribution that characterizes different states
The criteria of choice under uncertainty may be complete or
incomplete: a complete criterion is always able to rank all securities
or investment opportunities on the basis of their objective features;
an incomplete criterion is not
Choice under uncertainty: (strong) dominance
Complete criteria form a good basis for portfolio choice
o E.g., an investor can always rank all available assets and invest in some pre-determined fraction starting from the top of the resulting ranking
A starkly incomplete criterion is strong dominance
A security (strongly) dominates another security (on a state-by-state basis), if the former pays at least as much as the latter in all states of nature, and strictly more in at least one state
o All rational individuals would prefer the dominant security to the security that it dominates
o Here rational means that the investor is non-satiated, that is, she always prefers strictly more consumption (hence, monetary outcomes that may be used to finance such consumption) to less consumption
The following example shows that strong dominance often does not allow us to rank assets or portfolios
o For instance, security B does not dominate security C and security A does not dominate security C
o Hence, both securities A and C are not dominated by any other security, while security B is (by security A)
o A rational investor may then decide to select between assets A and C, ignoring B
o However, she cannot find an equivalently strong rule to decide between security A and C, hence the criterion is incomplete
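The state-by-state comparison is easy to mechanize. The sketch below uses hypothetical payoffs for three securities in two states (the figures are invented for illustration and are not the table used in the slides), chosen so that A dominates B while A and C cannot be ranked.

```python
# Hypothetical state-contingent payoffs (state 1, state 2); illustrative only.
payoffs = {"A": (12, 20), "B": (8, 20), "C": (13, 17)}

def dominates(p, q):
    """True if p pays at least as much as q in every state and strictly more in one."""
    at_least_as_much = all(a >= b for a, b in zip(p, q))
    strictly_more_somewhere = any(a > b for a, b in zip(p, q))
    return at_least_as_much and strictly_more_somewhere

def non_dominated(assets):
    """Names of the assets that no other asset strongly dominates."""
    return [name for name, p in assets.items()
            if not any(dominates(q, p)
                       for other, q in assets.items() if other != name)]

print("A dominates B:", dominates(payoffs["A"], payoffs["B"]))
print("A dominates C:", dominates(payoffs["A"], payoffs["C"]))
print("C dominates A:", dominates(payoffs["C"], payoffs["A"]))
print("Non-dominated set:", non_dominated(payoffs))
```

The criterion removes B but is silent on A versus C, which is exactly what incompleteness means here.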

Elementary choice under uncertainty
– mean-variance criterion
Prof. Massimo Guidolin
20550 – Prep Course in Quantitative Methods for Finance

Choice under uncertainty: mean-variance (dominance)
A security MV-dominates another security if it is characterized by a higher expectation and by a lower variance of payoffs than the other one
The problem of dominance is that it escapes a definition of risk
However, in general, a security yields payoffs that in some states are larger and in some other states are smaller than under any other security
When this is the case, the best known approach at this point consists of summarizing the distributions of asset returns through their mean and variance
Under mean-variance (MV), the variance of payoffs measures risk
MV dominance establishes that a security dominates another one in a mean-variance sense, if the former is characterized by a higher expected payoff and by a lower variance of payoffs
o The following example shows how mean and variance are used to rank different securities
o Both securities A and C are more attractive than asset B as they have a higher mean return and a lower variance
Podcast 15: Elementary choice under uncertainty - mean-variance criterion
o However, security A fails to dominate security C (and vice versa) in a mean-variance sense since it carries a higher variance
Similarly to dominance, MV is also an incomplete criterion, i.e., pairs of securities exist that cannot be simply ranked by this criterion
Because of its incompleteness, the MV criterion can at best only isolate a subset of securities that are not dominated by any others
o E.g., security B, being dominated by both securities A and C, can be ruled out from portfolio selection
o However, neither security A nor C can be ruled out because they belong to the set of non-dominated assets
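As a rough illustration of MV dominance, the sketch below computes means and variances for three hypothetical securities with two equally likely states. The payoffs and probabilities are assumptions made for the example, not the table from the slides, and the dominance test is written with weak inequalities and at least one strict one, which is one possible reading of "higher mean and lower variance".

```python
# Hypothetical payoffs in two states and assumed state probabilities.
payoffs = {"A": (12.0, 20.0), "B": (8.0, 20.0), "C": (13.0, 17.0)}
probs = (0.5, 0.5)

def mean(p):
    """Expected payoff across states."""
    return sum(pr * x for pr, x in zip(probs, p))

def variance(p):
    """Probability-weighted squared deviation from the mean."""
    m = mean(p)
    return sum(pr * (x - m) ** 2 for pr, x in zip(probs, p))

def mv_dominates(p, q):
    """p has mean no lower and variance no higher than q, with one strict."""
    mp, mq, vp, vq = mean(p), mean(q), variance(p), variance(q)
    return mp >= mq and vp <= vq and (mp > mq or vp < vq)

for name, p in payoffs.items():
    print(name, "mean =", mean(p), "variance =", variance(p))

print("A MV-dominates B:", mv_dominates(payoffs["A"], payoffs["B"]))
print("C MV-dominates B:", mv_dominates(payoffs["C"], payoffs["B"]))
print("A MV-dominates C:", mv_dominates(payoffs["A"], payoffs["C"]))
print("C MV-dominates A:", mv_dominates(payoffs["C"], payoffs["A"]))
```

With these numbers B is ruled out by both A and C, while A (higher mean, higher variance) and C remain unranked.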
Preference representation theorem
and its meaning
20550 – Prep Course in Quantitative Methods for

Finance
Utility-Based Choice Under Certainty
Rationality means that you can always express a precise preference between any pair of bundles, and that you should not contradict yourself when asked to express preferences over three or more bundles in successive pairs…
Such properties are formally derived from axioms of choice
The first step is that under such axioms, there exists a continuous, time-invariant, real-valued ordinal utility function u(·)
That is, under rationality the ranking of bundles that you may express corresponds to the ranking derived from the utility function u(·)
Podcast 16: Preference representation theorem and its meaning
Modern microeconomic theory describes individual behavior as the
result of a process of optimization under constraints
o The objective is determined by individual preferences
o Constraints depend on an investor’s wealth and on market prices
To develop such a rational theory of choice under certainty, we postulate the existence of a preference relation, represented by the symbol $\succeq$
For two bundles a and b, we can express preferences as: when $a \succeq b$, for the investor in question, bundle a is strictly preferred to bundle b, or she is indifferent between them
Pure indifference is denoted by a ~ b, strict preference by $a \succ b$
In such a framework of choice, rationality derives from a set of axioms
Completeness: Every investor is able to decide whether she prefers a to b, b to a, or both, in which case she is indifferent with respect to the two bundles; for any two bundles a and b, either $a \succeq b$ or $b \succeq a$ or both; if both conditions hold, we say that the investor is indifferent between the bundles
Transitivity: For bundles a, b, and c, if $a \succeq b$ and $b \succeq c$, then $a \succeq c$
Continuity: Let {xn} and {yn} be two sequences of consumption bundles such that $x_n \to x$ and $y_n \to y$ as $n \to \infty$; if $x_n \succeq y_n$ for all n, then the same relationship is preserved in the limit, $x \succeq y$
Completeness, transitivity, and continuity are sufficient to guarantee the existence of a continuous, time-invariant, real-valued ordinal utility function u(·), such that for any two objects of choice a and b, $a \succeq b$ if and only if $u(a) \ge u(b)$
Equivalently, a decision-maker, instead of optimizing by searching and choosing the best possible bundle of goods and services, may simply maximize the utility function u(·) (possibly, subject to constraints)
o Because of the continuity axiom, u(·) is a continuous function
o Because u(·) is an ordinal function, no special meaning may be attached to its values, i.e., to the exact size of the difference u(a) - u(b)
[Figure: indifference curves IC1, IC2, IC3]
Given u(·) and a monotone increasing transformation v(·), the function v(u(·)) represents the same preferences as the original u(·)
o Different investors will be characterized by heterogeneous preferences and as such will express different utility functions, as identified by heterogeneous shapes and features of their u(·) functions
o However, because $a \succeq b$ if and only if $u(a) \ge u(b)$, any monotone increasing transformation v(·) will be such that $v(u(a)) \ge v(u(b))$, or, equivalently, applying a monotone increasing v(·) cannot change the ranking
o E.g., if $u(a) \ge u(b)$, then $(u(a))^3 \ge (u(b))^3$ … guys, any guess?
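The ordinal nature of u(·) can be illustrated with a few lines of code. The sketch assumes three hypothetical bundles of two goods and a hypothetical utility u(x1, x2) = x1·x2; it then checks that cubing or taking logs of u, both monotone increasing transformations on positive values, leaves the ranking of bundles unchanged.

```python
import math

# Hypothetical consumption bundles (quantities of two goods); illustrative only.
bundles = {"a": (4, 1), "b": (2, 3), "c": (1, 1)}

def u(bundle):
    """Assumed ordinal utility: the product of the two quantities."""
    x1, x2 = bundle
    return x1 * x2

def ranking(utility):
    """Bundle names sorted from most to least preferred under `utility`."""
    return sorted(bundles, key=lambda name: utility(bundles[name]), reverse=True)

base = ranking(u)
cubed = ranking(lambda b: u(b) ** 3)          # v(u) = u^3
logged = ranking(lambda b: math.log(u(b)))    # v(u) = ln(u), valid since u > 0

print("ranking under u      :", base)
print("ranking under u^3    :", cubed)
print("ranking under ln(u)  :", logged)
```

The utility numbers change a great deal across the three versions, yet the order of the bundles does not, which is all an ordinal function is meant to convey.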
The expected utility theorem

Finance
Utility-Based Choice Under Uncertainty
These concepts and the use of utility functions can be

generalized to the case of choice under uncertainty concerning
securities and random payoffs
Under certainty, the choice is among consumption baskets

with known characteristics; under uncertainty, the objects of
choice are vectors of state-contingent monetary payoffs
Disentangling preferences from probabilities is a complex

problem that simplifies to a manageable maximization under
assumptions
Podcast 17: Expected utility theorem
o Ranking vectors of monetary payoffs involves more than pure elements of taste or preferences
o E.g., when selecting between some stock A that pays out well during recessions and poorly during expansions and some stock B that pays out according to an opposite pattern, it is essential to forecast the probabilities of recessions and expansions
Disentangling pure preferences from probability assessments is a complex problem that simplifies to a manageable maximization problem only under special assumptions, that is when the expected utility theorem (EUT) applies
Under the EUT, an investor's ranking over assets with uncertain monetary payoffs may be represented by an index combining, in the most elementary way (i.e., linearly):
• a preference ordering on the state-specific payoffs
• the state probabilities associated to these payoffs
The EUT simplifies the complex interaction between probabilities and preferences over payoffs in a linear way, i.e., by a simple sum of products
The Expected Utility Theorem
Under the assumptions of the EUT, one ranks assets/securities on the basis of the expectation of the utility of their payoffs across states
Under the six axioms specified below, there exists a cardinal, continuous, time-invariant, real-valued Von Neumann-Morgenstern (VNM) felicity function of money U(·), such that for any two lotteries/gambles/securities (i.e., probability distributions of monetary payoffs) x and y,
$x \succeq y$ if and only if $E[U(x)] \ge E[U(y)]$
where for a generic lottery z (e.g., one that pays out either x or y), E[U(z)] is the probability-weighted sum of the felicity of its payoffs, e.g., $E[U(z)] = \pi U(x) + (1 - \pi) U(y)$ when z pays x with probability $\pi$ and y with probability $1 - \pi$
The perceived, cardinal happiness of a complex and risky menu of options is given by the weighted average of the satisfaction derived from each such individual option, weighted by the probabilities
o In the following example we use U(z) = ln(z)
o The ranking by the EU criterion differs from MV: while according to the latter only securities B and D are dominated (by A and C), and hence A and C cannot be ranked, according to EU, security A ranks above security C (and B and D)
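To see how the expected utility criterion produces a complete ranking, the sketch below applies U(z) = ln(z) to three hypothetical securities with two equally likely states. The payoffs and probabilities are assumptions for illustration and do not reproduce the table in the slides.

```python
import math

# Hypothetical payoffs in two states and assumed state probabilities.
payoffs = {"A": (12.0, 20.0), "B": (8.0, 20.0), "C": (13.0, 17.0)}
probs = (0.5, 0.5)

def expected_utility(p, U=math.log):
    """Probability-weighted sum of the felicity of each state payoff."""
    return sum(pr * U(x) for pr, x in zip(probs, p))

scores = {name: expected_utility(p) for name, p in payoffs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 4))
print("EU ranking:", ranking)
```

Every security receives a single number, so any pair can be compared, including pairs such as A and C that a mean-variance comparison of these same payoffs would leave unranked.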
The Expected Utility Theorem: Supporting Axioms
o This example shows one fundamental advantage of EUT-based criteria over dominance and MV criteria: their completeness
The axioms supporting the Expected
Utility Theorem

Finance
o What are the axioms supporting the EUT?
o These concern lotteries, written as $(x, y; \pi)$ for a lottery that pays x with probability $\pi$ and y with probability $1 - \pi$
Lottery reduction and consistency: (i) $(x, y; 1) = x$; (ii) $(x, y; \pi) = (y, x; 1 - \pi)$; (iii) $(x, z; \pi) = (x, y; \pi + (1 - \pi)\tau)$ if $z = (x, y; \tau)$
o This axiom means that investors are concerned with the net cumulative probability of each outcome and not with the way in which lotteries are set up
[Figure: two lotteries, called p and q, and the compound lottery of p and q]
The axioms supporting the EUT include completeness, transitivity, and continuity
Completeness: for any two lotteries z and l, an investor either prefers z to l, l to z, or is indifferent between them
Transitivity: For any lotteries z, l, and h, if $z \succeq l$ and $l \succeq h$, then $z \succeq h$
Continuity: the preference relation over lotteries is continuous
o The ranking of payoffs is preserved when a payoff is received under conditions of uncertainty, through a lottery
Let x, y, z be payoffs such that x > y > z, then there exists a probability $\pi$ such that $y \sim (x, z; \pi)$
Uniqueness of EU preferences up to
monotone increasing linear
transformations

Finance
The EUT: Linear Affine Transformations
Any linear affine, monotone increasing transformation of a VNM utility function (V(·) = a + bU(·), b > 0) represents the same preferences
Arbitrary monotone transformations of cardinal utility functions do not preserve ordering over lotteries
Are preferences defined by the EUT unique up to some kind of transformations as standard u(·) functions were?
If U(·) is a VNM felicity function, then V(·) = a + bU(·) with b > 0 is also a VNM felicity function
o This is because $V((x, y; \pi)) = a + bU((x, y; \pi)) = a + b[\pi U(x) + (1 - \pi) U(y)] = \pi [a + bU(x)] + (1 - \pi)[a + bU(y)] = \pi V(x) + (1 - \pi) V(y)$
o E.g., if John’s felicity function is $U_{John}(R_i) = \ln(R_i)$ and Mary’s is $U_{Mary}(R_i) = -2 + 4\ln(R_i)$, John and Mary share the same preferences
o However, when Mary’s felicity function is a decreasing transformation of John’s (one in which $\ln(R_i)$ enters with a negative sign) or a nonlinear one (e.g., a cubic), this will not be the case
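The difference between affine and arbitrary monotone transformations can be checked numerically. The sketch below assumes two hypothetical lotteries, a sure payoff of 10 and a 50/50 gamble over 4 and 20, and compares their ranking under U = ln, under the affine transform V = −2 + 4 ln, and under G = exp(ln(·)), which is a monotone increasing but nonlinear transformation of U.

```python
import math

# Hypothetical lotteries as lists of (probability, payoff); illustrative only.
safe = [(1.0, 10.0)]
risky = [(0.5, 4.0), (0.5, 20.0)]

def expected_utility(lottery, U):
    """Sum of probability times felicity over the lottery's outcomes."""
    return sum(p * U(x) for p, x in lottery)

def prefers_safe(U):
    """True if the sure payoff has higher expected utility than the gamble."""
    return expected_utility(safe, U) > expected_utility(risky, U)

def U(x):
    return math.log(x)            # baseline VNM felicity

def V(x):
    return -2 + 4 * U(x)          # affine, increasing transform of U (b = 4 > 0)

def G(x):
    return math.exp(U(x))         # monotone but nonlinear transform of U

print("prefers safe under U:", prefers_safe(U))
print("prefers safe under V:", prefers_safe(V))
print("prefers safe under G:", prefers_safe(G))
```

U and V agree on the ranking, while G reverses it because exp(ln(x)) = x turns the investor into an expected-payoff maximizer, for whom the gamble (mean 12) beats the sure 10.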
Completeness of EUT-Induced Rankings
Different VNM felicity functions may induce rather different rankings of lotteries/securities/portfolios, but these will always be complete
This example shows that the type of felicity function assumed for an investor may matter a lot
Instead of a log-utility function, assume $U(R_i) = -(R_i)^{-1} = -1/R_i$
o While under a logarithmic utility function, it was security A that ranked on top of all others, now securities A and C are basically on par
o The log and $U(R_i) = -1/R_i$ are related functions but the second implies larger risk aversion
Defining and measuring risk aversion

Measuring Risk Aversion
Given a specification of probabilities, the utility function of monetary wealth, U(·), characterizes an investor's aversion to risk
o While u(x1,x2,…,xM) is defined over bundles of goods, U(·) is defined over wealth, W
We shall always assume non-satiated individuals, $U'(W) > 0$
o Gordon Gekko’s greed, https://www.youtube.com/watch?v=VVxYOQS6ggk
To understand what risk aversion means, consider a bet where the investor either receives an amount h with probability ½ or must pay an amount h with probability ½, so the bet in expectation is fair
The intuitive notion of “being averse to risk” is that for any level of wealth W, an investor would not wish to enter in such a bet:
$U(W) > \tfrac{1}{2} U(W + h) + \tfrac{1}{2} U(W - h) = E[U(W + H)]$
utility of wealth with no gamble exceeds expected utility of wealth+gamble
o H is a zero-mean random variable that takes value h with prob. ½ and –h with prob. ½
Podcast 20: Defining and measuring risk aversion
Defining Risk Aversion
A risk-averse investor is one who always prefers the utility of the expected value of a fair bet to the expectation of the utility of the same bet; her utility function is concave
When the investor is risk averse, her utility function has the form below
[Figure: a concave utility function of wealth]
We say that the utility function is (strictly) concave, i.e., the slope of the function decreases as the investor gets wealthier
The marginal utility $U'(W) = dU(W)/dW$ decreases as W grows larger
If $U'(W)$ decreases, then $U''(W) < 0$
o Positive deviations from a fixed average wealth do not help as much as the negative ones hurt
o The segment connecting W – h and W + h lies below the utility function
Other Risk Preference Types
A risk-loving (risk-neutral) investor has a convex (linear) utility function
We obtain risk-loving behavior when the utility function is convex, i.e., the marginal utility $U'(W) = dU(W)/dW$ increases as W grows larger
If $U'(W)$ increases, then $U''(W) > 0$
o Positive deviations from a fixed average wealth give more happiness than the unhappiness caused by negative deviations
The case of risk neutral investors obtains if $U'(W)$ is constant
o From standard integration of the marginal utility function, it follows that $U'(W) = b$ implies $U(W) = a + bW$, a linear utility function
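The fair-bet definition translates directly into a small check. The sketch below compares U(W) with the expected utility of a 50/50 bet of size h for three assumed utility functions: ln(W) (concave), 2 + 3W (linear) and W² (convex). The wealth level, bet size and functional forms are illustrative choices, and the check is made at a single (W, h) pair rather than for all wealth levels.

```python
import math

def attitude(U, W=100.0, h=10.0, tol=1e-9):
    """Classify risk attitude at wealth W by comparing a sure W with a fair bet of size h."""
    sure = U(W)
    gamble = 0.5 * U(W + h) + 0.5 * U(W - h)
    if sure > gamble + tol:
        return "risk averse"      # utility of the mean exceeds expected utility
    if sure < gamble - tol:
        return "risk loving"      # expected utility exceeds utility of the mean
    return "risk neutral"         # the two coincide

def linear(W):
    return 2 + 3 * W              # U'(W) constant, U''(W) = 0

def convex(W):
    return W ** 2                 # U''(W) = 2 > 0 for W > 0

print("ln(W)   :", attitude(math.log))
print("2 + 3W  :", attitude(linear))
print("W^2     :", attitude(convex))
```

The three outcomes line up with the sign of U''(W): negative for the log, zero for the linear function and positive for the quadratic.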
Absolute and relative risk aversion
coefficients

2
Podcast 21: Absolute and relative risk aversion coefficients
Absolute and Relative Risk Aversion Coefficients
How can we manage to measure risk aversion and compare the risk aversion of different decision makers?
Given that under mild conditions, risk aversion is equivalent to U''(W)<0 for all wealth levels, one simplistic idea is to measure risk aversion through the second derivative of U(·)
o E.g., John is more risk averse than Mary is iff |UJohn''(W)| > |UMary''(W)|
Unfortunately, this approach leads to an inconsistency because when UJohn(W) = a + bUMary(W) with b > 0 and $b \ne 1$, clearly $U''_{John}(W) = b U''_{Mary}(W) \ne U''_{Mary}(W)$
But we know that by construction, John and Mary have the same preferences for risky gambles and therefore that it makes no sense to state that John is more risk averse than Mary
Two famous measures that escape these drawbacks are the coefficients of absolute/relative risk aversion:
$ARA(W) = -\frac{U''(W)}{U'(W)}$, $RRA(W) = -W \frac{U''(W)}{U'(W)} = W \cdot ARA(W)$
o Because MU(W) is a function of wealth, ARA(W) and RRA(W) are too
Both ARA(W) and RRA(W) are invariant to linear monotonic transforms; this occurs because both are “scaled” at the denominator U'(W)
o If nonzero, the reciprocal of the measure of absolute risk aversion, $T(W) = 1/ARA(W)$, can be used as a measure of risk tolerance
o When ARA is constant, RRA(W) must be a linear (increasing) function of wealth; when RRA is constant, then it must be the case that ARA(W) = RRA/W, a simple inverse function of wealth
o ARA and RRA are invariant to linear monotonic transformations; e.g., for V(W) = a + bU(W) with b > 0, $-\frac{V''(W)}{V'(W)} = -\frac{b U''(W)}{b U'(W)} = ARA(W)$
To rank John and Mary’s risk aversion, we need to verify whether ARAJohn(W) > ARAMary(W) (or the opposite) for all wealth levels
o The same applies to their coefficient of relative risk aversion for all wealth levels
o It is possible that for some intervals of wealth it may be (R)ARAJohn(W) > (R)ARAMary(W) but for other levels/intervals the inequality is reversed
Both measures are local as they characterize the behavior of investors only when the risks (lotteries) considered are small
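The invariance of ARA and RRA to affine transformations can be verified numerically. The sketch below approximates U' and U'' by central finite differences (a numerical shortcut chosen for this illustration, with an assumed step size) and evaluates both coefficients for U = ln(W) and for the affine transform −2 + 4 ln(W), for which the exact values are ARA(W) = 1/W and RRA(W) = 1.

```python
import math

def ara(U, W, eps=1e-2):
    """Absolute risk aversion -U''(W)/U'(W), via central finite differences."""
    d1 = (U(W + eps) - U(W - eps)) / (2 * eps)
    d2 = (U(W + eps) - 2 * U(W) + U(W - eps)) / eps ** 2
    return -d2 / d1

def rra(U, W, eps=1e-2):
    """Relative risk aversion, W times the absolute coefficient."""
    return W * ara(U, W, eps)

def john(W):
    return math.log(W)

def mary(W):
    return -2 + 4 * math.log(W)   # affine, increasing transform of John's felicity

for W in (10.0, 50.0, 200.0):
    print(W, round(ara(john, W), 6), round(ara(mary, W), 6),
          round(rra(john, W), 6), round(rra(mary, W), 6))
```

Although Mary's second derivative is four times John's at every wealth level, the two coefficients coincide, and ARA falls with wealth while RRA stays at one, as expected for log utility.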
o Psychological research has documented differences in measured (absolute) risk aversion across sex and age…
o ... and countries
[Figures: measured absolute risk aversion by sex and age, and by country]
o In the first picture above, what is the link between the premium to avoid a lottery and (absolute) risk aversion?
ARA and RRA and the Odds of
accepting a bet

ARA and RRA and the Odds of Accepting a Bet
What is the economic interpretation of ARA and RRA coefficients?
A first interpretation is that ARA and RRA are related to the odds that a risk-averse agent may accept a bet
o Consider an investor with wealth W who is offered, at no charge, a bet involving winning or losing an amount h, with probabilities $\pi$ and $1 - \pi$, respectively
o Any investor will accept such a bet if $\pi$ is high enough (surely if $\pi = 1$) and reject it if $\pi$ is low enough (if $\pi = 0$, the bet amounts to a lump-sum tax of h)
o A risk-averse investor will only accept when $\pi > ½$, i.e., when the odds are clearly tilted in favor of the investor
o An investor’s willingness to accept the bet may depend on her wealth W
Let $\pi(W; h)$ be that probability at which the agent is indifferent between accepting and rejecting the bet, i.e., the sure-thing utility she derives in the absence of the bet equals its expected utility:
$U(W) = \pi(W; h) U(W + h) + [1 - \pi(W; h)] U(W - h)$
As the ARA coefficient of an investor grows, her probability required to enter a bet grows, at least locally (for small bets)
Your textbook (please see it) shows that by applying a Taylor’s expansion to the previous equation, one can show that for a small h there is a relationship between ARA(W) and the minimum odds required to accept the bet:
$\pi(W; h) \cong \frac{1}{2} + \frac{1}{4} h \, ARA(W)$
o $\pi(W; h) - ½$ is the “mark-up” in the odds of the bet that the investor requires to tolerate it
o $\pi(W; h)$ depends on the size of the bet, h, in a very simple (linear) way because we are considering a second-order approximation that applies for small h
o If one accepts a characterization in which John is more risk averse than Mary when $\pi_{John}(W; h) > \pi_{Mary}(W; h)$, we know that as a first approximation this is equivalent to stating that ARAJohn(W) > ARAMary(W) for all wealth levels
o Exploiting ARA(W) = RRA(W)/W, we can re-write the result in terms of the relative size of the bet, h/W:
$\pi(W; h) \cong \frac{1}{2} + \frac{1}{4} \frac{h}{W} RRA(W)$
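The quality of the small-bet approximation can be gauged with a quick computation. The sketch below solves the indifference condition U(W) = π U(W + h) + (1 − π) U(W − h) exactly for π and compares it with ½ + ¼ h ARA(W), assuming log utility (so that ARA(W) = 1/W) and illustrative values W = 100 and h = 1.

```python
import math

def indifference_probability(U, W, h):
    """Exact pi solving U(W) = pi*U(W + h) + (1 - pi)*U(W - h)."""
    return (U(W) - U(W - h)) / (U(W + h) - U(W - h))

def approximate_probability(ara_at_W, h):
    """Small-bet approximation 1/2 + (1/4) * h * ARA(W)."""
    return 0.5 + 0.25 * h * ara_at_W

W, h = 100.0, 1.0                 # assumed wealth and bet size
exact = indifference_probability(math.log, W, h)
approx = approximate_probability(1.0 / W, h)   # ARA(W) = 1/W for log utility

print("exact pi       :", exact)
print("approximate pi :", approx)
print("difference     :", exact - approx)
```

For a bet worth one percent of wealth the two numbers are nearly identical; repeating the exercise with a larger h shows the gap widening, which is why the result is described as local.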
Two Examples
John is characterized by VNM function
Therefore so that
which is clearly constant

As a result, in the face of a two-outcome symmetric bet with size h, we
An increase in either absolute risk aversion and in the size of the bet
have identical effects
W; h) turns out to be independent of wealth
Therefore so that
7
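For a constant-RRA investor, the same indifference calculation should depend only on the relative size of the bet, h/W. The following sketch illustrates this under stated assumptions: a power utility U(W) = W^(1-gamma)/(1-gamma) with a hypothetical gamma = 3 and a bet equal to 5% of wealth. All names and values are illustrative.

```python
def power_utility(wealth, gamma):
    """Power (CRRA) utility, U(W) = W**(1 - gamma) / (1 - gamma), gamma != 1."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def indifference_prob(utility, wealth, h):
    """Solve U(W) = pi * U(W + h) + (1 - pi) * U(W - h) for pi."""
    up = utility(wealth + h)
    down = utility(wealth - h)
    return (utility(wealth) - down) / (up - down)


gamma = 3.0           # assumed RRA coefficient (hypothetical value)
relative_size = 0.05  # assumed bet size as a fraction of wealth, h / W

for wealth in (100.0, 10_000.0):
    h = relative_size * wealth
    exact = indifference_prob(lambda w: power_utility(w, gamma), wealth, h)
    approx = 0.5 + 0.25 * relative_size * gamma
    print(f"W = {wealth:8.0f}   h = {h:6.1f}   exact pi = {exact:.4f}   approx pi = {approx:.4f}")
```

The required probability is the same at both wealth levels because h/W is held fixed, even though the absolute bet is a hundred times larger in the second case.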
Applications to Real-Life Examples
Because casinos are for-profit companies and hence «rig» chance games in their favor, a gambler structurally faces odds $\pi < \frac{1}{2} < \pi(W; h)$ for all h implied by the gambles
Therefore no risk-averse agent should ever walk into a casino, ever!
However, not all agents are risk-averse: for a risk-lover, ARA and RRA are both negative and therefore $\pi(W; h) < \frac{1}{2}$, i.e., she requires odds below fair ones, $\pi(W; h)$, to accept risky, unfair gambles
o [...]-cent gamble, she [...]
o [...]-cent gamble, she will still [...]
In short, constant ARA agents care for the absolute size (h) of gambles, while constant RRA agents care for their relative size (h/W)
In fact, there is empirical and experimental evidence that investors are risk-averse over gains but risk-seeking over losses
This is called prospect theory
With prospect theory, we enter the domain of so-called behavioral finance
The reason is that there are no obvious axioms of rational choice supporting it
ARA and RRA and the Risk Premium

The certainty equivalent of a risky bet is the (maximum) amount of money one is willing to pay for the risky bet; for a risk-averse agent it is less than the bet's expected value
The other interpretation of ARA and RRA is that they relate to the size of the risk premium characterizing a gamble/lottery/security
o This derives from the very definition of risk aversion and it is simply an application of the standard Jensen’s inequality: $U(E[W + H]) > E[U(W + H)]$
o H is a random variable with S possible outcomes $h_s$, $s = 1, \dots, S$
The (maximum) certain sum of money a person is willing to pay to acquire a risky opportunity defines his certainty equivalent (CE):
$$U(CE(W, H)) = E[U(W + H)]$$
or
$$CE(W, H) = U^{-1}\big(E[U(W + H)]\big)$$
Podcast 23: ARA and RRA and the Risk Premium
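The definition of the certainty equivalent, U(CE) = E[U(W + H)], can be made concrete with a short calculation. The sketch below is an illustration under assumptions not taken from the slides: log utility, W = 100, and a fair bet of plus or minus 50. The helper name `certainty_equivalent` is hypothetical.

```python
import math


def certainty_equivalent(utility, inverse_utility, wealth, outcomes, probs):
    """Return CE such that U(CE) = E[U(W + H)] for a discrete gamble H."""
    expected_utility = sum(p * utility(wealth + h) for h, p in zip(outcomes, probs))
    return inverse_utility(expected_utility)


wealth = 100.0            # assumed initial wealth
outcomes = [50.0, -50.0]  # assumed payoffs of the gamble H
probs = [0.5, 0.5]        # assumed probabilities

# Log utility is strictly concave, so Jensen's inequality applies.
ce = certainty_equivalent(math.log, math.exp, wealth, outcomes, probs)
expected_wealth = sum(p * (wealth + h) for h, p in zip(outcomes, probs))

print(f"E[W + H] = {expected_wealth:.2f}")
print(f"CE       = {ce:.2f}")
print(f"E[W + H] - CE = {expected_wealth - ce:.2f}")
```

Because the utility function is concave, the certainty equivalent lies below the expected terminal wealth, and the gap is the amount the agent would give up to avoid the gamble.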
The risk premium measures the difference between the expected value of a bet and the certainty equivalent an investor is willing to pay for it
The difference between the expected value of a risky prospect and its CE is a measure of the risk premium, $\Pi(W, H)$:
$$\Pi(W, H) \equiv E[W + H] - CE(W, H)$$
It represents the maximum amount the agent would be willing to pay to avoid the gamble implied by the risky asset
$\Pi(W, H)$ must be s.t.: $U(E[W + H] - \Pi(W, H)) = E[U(W + H)]$
o (Figure: a concave utility function with two red segments highlighted.)
o If one were to make the utility function “more concave”, the size of both segments would increase
o The same would occur if, for a fixed utility function, one were to increase h
Using Taylor approximations, the textbook derives a small-risk approximation of the risk premium
For small risks, the risk premium is proportional to ARA and RRA, which interact with variance, i.e., the perceived quantity of risk
$$\Pi(W, H) \approx \frac{1}{2}\,ARA(W)\,Var[H]$$
i.e., (Risk aversion) × (Quantity of risk)
o As before, because $ARA(W) = RRA(W)/W$, we can re-write the result as: $\Pi(W, H) \approx \frac{1}{2}\,RRA(W)\,Var[H]/W$
Time for a simple, “visual” numerical example:
o Consider a two-outcome symmetric bet with size h (i.e., the possible outcomes are h and -h, with probabilities 1/2 and 1/2, respectively); we have that $Var[H] = \frac{1}{2}h^2 + \frac{1}{2}(-h)^2 = h^2$
o E.g., if John is characterized by the VNM function $U(W) = -e^{-\theta W}$, then $\Pi_{John}(W; h) \approx \frac{1}{2}\theta h^2$ (independent of wealth)
o If W = 100 and $\theta$ and h are such that $\frac{1}{2}\theta h^2 = 5$, then $\Pi_{John}(W; h) \approx \frac{1}{2}\theta h^2 = 5$ euros, and CE = 95
o Let’s check what the definition yields:
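The comparison between the approximation and the exact definition can be sketched numerically. The parameter values below are assumptions: theta = 0.1, h = 10 and W = 100 are chosen only because they reproduce an approximate premium of 5 euros and CE = 95; the values used in the original example are not recoverable here. The utility is taken to be the negative exponential U(W) = -exp(-theta W).

```python
import math

theta = 0.1     # assumed ARA coefficient (hypothetical value)
h = 10.0        # assumed bet size (hypothetical value)
wealth = 100.0  # assumed initial wealth


def utility(w):
    """Negative exponential (CARA) utility."""
    return -math.exp(-theta * w)


def inverse_utility(u):
    """Inverse of the CARA utility above, defined for u < 0."""
    return -math.log(-u) / theta


# Small-risk approximation: premium ~ (1/2) * ARA * Var[H], with Var[H] = h**2.
variance = 0.5 * h ** 2 + 0.5 * (-h) ** 2
approx_premium = 0.5 * theta * variance
approx_ce = wealth - approx_premium

# Exact values from the definition U(CE) = E[U(W + H)].
expected_utility = 0.5 * utility(wealth + h) + 0.5 * utility(wealth - h)
exact_ce = inverse_utility(expected_utility)
exact_premium = wealth - exact_ce  # E[W + H] = W for a symmetric bet

print(f"approximate premium = {approx_premium:.3f}, approximate CE = {approx_ce:.3f}")
print(f"exact premium       = {exact_premium:.3f}, exact CE       = {exact_ce:.3f}")
```

With these assumed values the bet is not really small (theta times h equals 1), so the exact premium comes out somewhat below the approximate 5 euros, at about 4.3.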
A Different Definition of Risk Premium
It is possible to convert these ideas into the classical definition of a percentage risk premium to be added to asset returns to compensate a decision-maker for the risk she runs
Any risky gamble H can be expressed as a risky rate of return $R_H$ on initial wealth, $W + H = W(1 + R_H)$, so that if CER is the riskless, certainty equivalent rate of return, then:
$$U(W(1 + CER)) = E[U(W(1 + R_H))]$$
o This equation defines CER implicitly
o $E[R_H] - CER$ is often interpreted as a percentage risk premium associated to the risky asset/gamble H
o It is the percentage extra return that an investor requires to accept the risky gamble instead of settling for the riskless CER
o Consider again Mary, characterized by a power utility function of wealth
o Because initial wealth factors out of both sides of the equation under power utility, one can show (see textbook) that CER, and hence the percentage risk premium, does not depend on W
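A certainty equivalent rate of return can be computed directly for a power utility investor. The sketch below assumes that CER is defined by U(W(1 + CER)) = E[U(W(1 + R))], which is one reading of the implicit definition above, and uses hypothetical inputs: gamma = 3 and a two-state return of +20% or -10% with equal probability.

```python
def power_utility(wealth, gamma):
    """Power (CRRA) utility, U(W) = W**(1 - gamma) / (1 - gamma), gamma != 1."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def certainty_equivalent_rate(wealth, returns, probs, gamma):
    """Solve U(W * (1 + CER)) = E[U(W * (1 + R))] for CER under power utility."""
    expected_utility = sum(
        p * power_utility(wealth * (1.0 + r), gamma) for r, p in zip(returns, probs)
    )
    # Invert U: W * (1 + CER) = ((1 - gamma) * E[U]) ** (1 / (1 - gamma)).
    ce_wealth = ((1.0 - gamma) * expected_utility) ** (1.0 / (1.0 - gamma))
    return ce_wealth / wealth - 1.0


gamma = 3.0              # assumed RRA coefficient (hypothetical value)
returns = [0.20, -0.10]  # assumed risky returns
probs = [0.5, 0.5]       # assumed probabilities
expected_return = sum(p * r for r, p in zip(returns, probs))

for wealth in (100.0, 1_000_000.0):
    cer = certainty_equivalent_rate(wealth, returns, probs, gamma)
    print(f"W = {wealth:12.0f}   CER = {cer:.4%}   premium = {expected_return - cer:.4%}")
```

Under these assumptions the percentage premium is the same at both wealth levels, which illustrates why constant RRA is convenient when working with returns rather than with absolute payoffs.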

A Few Common Utility of Wealth Functions

Introducing a Few Common Utility of Wealth Functions
Our earlier examples have featured a few VNM utility functions; here we simply collect ideas on their functional form and properties
Given an initial level of wealth $W_0$, a utility of money function which, relative to the starting point, has the property $U(W)/U(W_0) = h(W - W_0)$, so that utility reacts only to the absolute difference in wealth, is of the absolute risk aversion type
The only (non-satiated) function meeting this requirement is the (negative) exponential, where $\theta > 0$ is constant:
$$U(W) = -e^{-\theta W}$$
o The textbook shows that this implies a constant ARA, and because of that the utility function is also referred to as CARA
o As $ARA(W) = \theta$, $RRA(W) = ARA(W)\,W = \theta W$, a linear function of wealth
o Because RRA(W) depends on the initial wealth level, relative quantities such as the percentage risk premium depend on initial wealth, which is problematic
A power, CRRA utility function is $U(W) = W^{1-\gamma}/(1-\gamma)$, with $\gamma > 0$, $\gamma \neq 1$
o The textbook proves that in this case $RRA(W) = \gamma$
Podcast 24: A Few Common Utility of Wealth Functions
o As $ARA(W) = RRA(W)/W = \gamma/W$ (with $\gamma$ the constant RRA coefficient), ARA is an inverse function of wealth
o The textbook reports numerical examples that emphasize that different utility functions (even within the same power family) imply, for the same bet, rather different estimates of CE and hence risk premia
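A very popular class of utility functions is the quadratic one: $U(W) = W - \frac{\kappa}{2}W^2$, with $\kappa > 0$
Because $U'(W) = 1 - \kappa W$ and $U''(W) = -\kappa$, this implies: $ARA(W) = \frac{\kappa}{1 - \kappa W}$ and $RRA(W) = \frac{\kappa W}{1 - \kappa W}$
o A quadratic utility investor is not always risk averse: ARA(W) and RRA(W) are positive if and only if $\kappa W < 1$, or if $W < W^* = 1/\kappa$, the bliss point
o In fact, $W < W^*$ is also required for the investor to be non-satiated, i.e., for the utility function to be monotone increasing
One final VNM utility function is the linear one: U(W) = a + bW, b > 0
U'(W) = b and U''(W) = 0 imply that ARA(W) = RRA(W) = 0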

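The risk-aversion coefficients of these utility families can be checked numerically from the definitions ARA(W) = -U''(W)/U'(W) and RRA(W) = W ARA(W). The sketch below uses central finite differences. The functional forms and parameter values (theta = 0.5, gamma = 3, kappa = 0.1, and the linear coefficients) are assumptions chosen for illustration.

```python
import math


def ara(utility, w, step=1e-3):
    """Absolute risk aversion, -U''(W) / U'(W), by central finite differences."""
    first = (utility(w + step) - utility(w - step)) / (2.0 * step)
    second = (utility(w + step) - 2.0 * utility(w) + utility(w - step)) / step ** 2
    return -second / first


def rra(utility, w, step=1e-3):
    """Relative risk aversion, W * ARA(W)."""
    return w * ara(utility, w, step)


theta, gamma, kappa = 0.5, 3.0, 0.1  # hypothetical parameter values

families = {
    "negative exponential (CARA)": lambda w: -math.exp(-theta * w),
    "power (CRRA)": lambda w: w ** (1.0 - gamma) / (1.0 - gamma),
    "quadratic": lambda w: w - 0.5 * kappa * w ** 2,
    "linear (risk-neutral)": lambda w: 1.0 + 2.0 * w,
}

for name, u in families.items():
    for w in (1.0, 2.0, 4.0):
        print(f"{name:30s} W = {w:3.1f}   ARA = {ara(u, w):8.4f}   RRA = {rra(u, w):8.4f}")
```

In the output, ARA should stay flat across wealth for the exponential case, RRA should stay flat for the power case, both should rise with wealth for the quadratic case (below the bliss point 1/kappa), and both should be numerically zero for the linear case.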
All these utility functions are strictly increasing and concave, and have risk tolerance T(W) that depends on wealth in a linear affine fashion:
$$T(W) \equiv \frac{1}{ARA(W)} = \alpha + \beta W$$
These functions are called linear risk tolerance (LRT) utility functions (alternatively, HARA utility functions, where HARA stands for hyperbolic absolute risk aversion, since ARA(W) defines a hyperbola)
LRT utility functions have many attractive properties
It is possible to check that $ARA(W) = \frac{1}{\alpha + \beta W}$
o Correspondingly, the risk tolerance function is $T(W) = \alpha + \beta W$
o It is clearly linear affine and, when $\beta > 0$, increasing in wealth
o This nests all cases reported above
Common Utility of Wealth Functions
All functions, apart from the linear, risk-neutral function, are concave
No special meaning (or lack thereof) ought to be attached to the fact that all utility functions are negative for some wealth levels (in fact, a few are always negative for all wealth levels)
Summary: Common Utility of Wealth Functions
Under such a definition, the risk premium is the percentage extra
return that an investor requires to accept the risky gamble
instead of settling for the riskless CER
The four most common VNM felicity functions are
Negative exponential, CARA
Power, CRRA
Quadratic, IARA
Linear, risk-neutral U(W) = a + bW with b > 0
Quadratic utility poses a few problems: e.g., the investor is not nonsatiated for all wealth levels; she is nonsatiated only below the bliss point
These functions are called linear risk tolerance (LRT) utility functions (alternatively, HARA, hyperbolic absolute risk aversion, because their ARA(W) defines a hyperbola)
From the Density of Wealth to the Density of U(W)
Mapping wealth into utility changes the perception of the problem for an investor

On the horizontal axis, where wealth is measured, we plot the density function of portfolio outcomes
This does not have to be, but could be, a symmetric Gaussian density
We map the probability distribution of wealth into a probability density function for the corresponding utility index, f(U(W))
The concavity of the utility function makes for an asymmetric, fat-tailed distribution that certainly deviates from a (Gaussian) benchmark
This may have important effects on investors’ optimal portfolios
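The effect of concavity on the distribution of utility can be illustrated with a small simulation. The sketch below is an assumption-laden example rather than the construction behind the figure: wealth is drawn from a Gaussian with mean 100 and standard deviation 20, utility is the negative exponential U(W) = -exp(-theta W) with a hypothetical theta = 0.02, and the helper names are made up.

```python
import math
import random
import statistics


def sample_skewness(xs):
    """Third standardized moment of a sample."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return sum(((x - mean) / std) ** 3 for x in xs) / len(xs)


def sample_excess_kurtosis(xs):
    """Fourth standardized moment of a sample, minus 3 (the Gaussian value)."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return sum(((x - mean) / std) ** 4 for x in xs) / len(xs) - 3.0


rng = random.Random(0)  # fixed seed so the run is reproducible
theta = 0.02            # assumed ARA coefficient (hypothetical value)

# Symmetric Gaussian density for wealth (assumed mean 100, std 20).
wealth_draws = [rng.gauss(100.0, 20.0) for _ in range(50_000)]
# Map each wealth outcome into its utility index.
utility_draws = [-math.exp(-theta * w) for w in wealth_draws]

for label, xs in (("wealth W", wealth_draws), ("utility U(W)", utility_draws)):
    print(f"{label:13s} skewness = {sample_skewness(xs):7.3f}   "
          f"excess kurtosis = {sample_excess_kurtosis(xs):7.3f}")
```

Under these assumptions the wealth sample is close to symmetric, while the utility sample has a long left tail (negative skewness) and positive excess kurtosis: low wealth outcomes are penalized more than high outcomes are rewarded.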
