Why learn maths?
In order to:

• read economic literature
• be able to critically assess economic theory
• see the logical implications of assumptions
• discover contradictions in assumptions
• organise quantitative data and highlight structure
• isolate a particular factor

But do not:

o Let the mathematics “take over”

o Take assumptions for granted
o Forget that a mathematical model is no better than its assumptions
o Confuse mathematical/statistical relations with causality

How to learn maths?

• Mathematics is a science:
o Learn rules and definitions precisely.
o Remember the logic of the arguments.

• Mathematics is a craft:
o Practise!

• Mathematics is formalised logic:
o It consists of deductions from assumptions, not of statements of fact, and it cannot tell you anything about causality.

Formal Logic:
Mathematics uses implications and equivalences, sufficient conditions and necessary conditions.
Implication: P ⇒ Q
This is read as “P implies Q” or “If P then Q”,
where P and Q are statements.

Ex 1. If (I drop the pen) then (the pen falls to the floor)

This implication is true whenever it is not false.
The implication is false:
• If I drop the pen and it does not fall to the floor.
Therefore the implication is true:
• If I drop the pen and it falls to the floor.
• If I do not drop the pen and it doesn’t fall to the floor.
• If I do not drop the pen and it falls to the floor anyway.

Ex 2. If (it snows) then (it is cold outside): It snows ⇒ it is cold outside.

The implication is true:
• If it is snowing and it is cold.
• If it is not cold and not snowing.
• If it is cold and not snowing.
The implication is false:
• If it is snowing even though it isn’t cold.

P ⇒ Q means that if P is true, then Q must always be true too. It is enough to know that P is the case to be certain that Q is the case too. When P ⇒ Q, P is said to be a sufficient condition for Q. Consequently, the implication is false if and only if P can be true while, at the same time, Q is not.

Example 1 is true also if I don’t drop the pen, and example 2 is true on a day without snow.
Note the difference between saying “the statement P is false” and saying “the implication P ⇒ Q is false”.

But if I drop the pen and it is tied by a string to my finger or if I hold my hand above a desk,
the implication in example 1 is false. If it snows on a warm summer day the implication in
example 2 is false.

P ⇒ Q is the same as (not Q) ⇒ (not P). This means that if P is a sufficient condition for Q, then Q is always a necessary condition for P.

If we know that the implication in example 2 is true, we know that if it snows it is cold, but also that if it is not cold, then it is not snowing. If the temperature is +10 °C, it isn’t necessary to look out of the window to know that it isn’t snowing outside.

If P ⇒ Q and Q ⇒ P, P and Q are said to be equivalent: P ⇔ Q

This is read as “P is equivalent to Q” (or “Q is equivalent to P”), or “P if and only if Q”.

Example: I stand in front of the whiteboard ⇔ the whiteboard is behind me

Note that logical implications are not causal explanations: it snows ⇒ it is cold, but the snow is not the cause of, or the reason for, the cold.
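The truth-value reasoning above can be checked mechanically with a truth table. The Python sketch below is only an illustration, and the helper name `implies` is our own. It encodes P ⇒ Q as “not (P and not Q)”, which is false only when P is true and Q is false. It then prints P ⇒ Q next to (not Q) ⇒ (not P), and the two columns agree in every row.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Truth value of P => Q: false only if P is true and Q is false."""
    return not (p and not q)


print(f"{'P':<6} {'Q':<6} {'P=>Q':<6} notQ=>notP")
for p, q in product([True, False], repeat=2):
    # The contrapositive (not Q) => (not P) has the same truth value as P => Q.
    print(f"{p!s:<6} {q!s:<6} {implies(p, q)!s:<6} {implies(not q, not p)!s}")
```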

Economic models
Elements of a model:
• Exogenous variables – values are given, “drop from the sky”
• Endogenous variables – values are determined by the model
• Parameters – constants (fixed numbers) in the relations that make up the model.
Equations that are part of the model may have different roles. They can be:
• Behavioural equations
• Equilibrium equations (conditions)
• Identities
Static models – a snapshot of something at a given moment
Comparative statics – use a static model to find and compare conditions for different values of one or more exogenous variables (“at different …”)

Dynamic models – show also the process of change from one condition to another

Example: A simple national income model

$Y = C + I$   (1)

$I = I_0$   (2)

$C = a + bY$,   $a > 0$,  $0 < b < 1$   (3)

Real numbers and the number line
The real numbers can be represented on a number line. Every real number corresponds to a
point on the line and every point on the line corresponds to a real number. If the point
corresponding to the number A is to the left of the point corresponding to the number B on
the number line, then A < B. (For example, -1000 < -1.)

[Figure: the real number line, with the points -2, -1 and 1 marked]

The distance from a to the point zero is the absolute value of a, $|a|$.
The distance between a and b is $|a - b| = |b - a|$.
If $a \geq 0$ then $|a - 0| = |a| = a$.
If $a \leq 0$ then $|a| = -a$.
$-(-a) = a$
$|a| = |-a|$

A connected piece of the number line is called an interval. A bounded interval begins and ends at two points on the number line. If we call the points a and b, and assume that a < b, then the closed interval [a, b] is the set of all numbers which are equal to or larger than a and equal to or smaller than b. The open interval (a, b) is the set of all numbers which are strictly larger than a and strictly smaller than b. (You may also see open intervals written as ]a, b[.) a and b are called the end points of the interval. The numbers that are in the interval but are not end points make up the interior of the interval. A closed interval includes its end points; an open interval does not.

An interval which is not bounded goes to plus or minus infinity (or both). Infinity is represented by the symbol ∞. For example, the interval (3, ∞) consists of all real numbers which are strictly larger than 3.

All real numbers can be added, subtracted and multiplied with any real number. All real
numbers can be divided by any real number EXCEPT ZERO.

Brackets (parentheses):
The basic rule for performing several arithmetical operations is that multiplication and
division are performed first, then addition and subtraction. BUT if any part of an arithmetical
expression is written within brackets, those operations take priority and are done first of all.

$2 + 4 \cdot 10 - 7 = 2 + 40 - 7 = 35$   BUT   $2 + 4(10 - 7) = 2 + 4 \cdot 3 = 14$

Multiply an expression within brackets by a number: a(2b + c) = 2ab + ac

3(4 + x) = 12 + 3x
Take a common factor out of the terms and put the rest within brackets: 2ab + ac = a(2b + c)
Multiply expressions within brackets: (2a + 3b - c)(d - 2e) = 2ad - 4ae + 3bd - 6be - cd + 2ce

Important arithmetic rules:
$-(-a) = a$
$a(-b) = (-a)b = -ab$
$(-a)(-b) = ab$
$a(b + c) = ab + ac$   (distributive law)
$(a + b)(a + b) = (a + b)^2 = a^2 + 2ab + b^2$   (rule of squares)
$(a + b)(a - b) = a^2 - b^2$   (rule of conjugates)

$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$   (addition of quotients)
$\frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}$   (multiplication of quotients)
$\frac{a/b}{c/d} = \frac{a}{b} \cdot \frac{d}{c} = \frac{ad}{bc}$   (division of quotients)
Do not divide by zero!
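The quotient rules can be checked with exact rational arithmetic. The Python sketch below uses the standard `fractions.Fraction` type and some arbitrary example values of our own choosing. It compares each computed result with the corresponding rule in the list above. Note that division also requires c ≠ 0, because c ends up in the denominator.

```python
from fractions import Fraction

# Arbitrary example values; b, c and d must be nonzero.
a, b, c, d = 2, 3, 5, 7

lhs_add = Fraction(a, b) + Fraction(c, d)
lhs_mul = Fraction(a, b) * Fraction(c, d)
lhs_div = Fraction(a, b) / Fraction(c, d)

# Each line prints the exact result and whether it matches the rule.
print(lhs_add, lhs_add == Fraction(a * d + b * c, b * d))  # addition of quotients
print(lhs_mul, lhs_mul == Fraction(a * c, b * d))          # multiplication of quotients
print(lhs_div, lhs_div == Fraction(a * d, b * c))          # division of quotients
```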

Example: If 4 kilos of apples cost 60 SEK, what is the price per kilo?
If ¼ of a kilo of cherries costs 16 SEK, what is the price per kilo?
Did you use the same method of calculation in both cases?

How do real wages change if:
a) Prices increase by 5% and nominal wages by 7%?
b) Prices increase by 300% and nominal wages by 200%?
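One way to approach the real-wage question is to note that the real wage is the nominal wage divided by the price level, so the growth factors divide rather than subtract. The Python sketch below assumes exactly that, and the function name is ours. It shows that the shortcut “7% - 5% = 2%” is only an approximation that works for small changes. In case b) the real wage falls by a quarter, not by 100%.

```python
def real_wage_change(price_growth: float, wage_growth: float) -> float:
    """Relative change in the real wage w/P when P grows by price_growth
    and w by wage_growth (both given as fractions, e.g. 0.05 for 5%)."""
    return (1 + wage_growth) / (1 + price_growth) - 1


print(f"a) {real_wage_change(0.05, 0.07):.2%}")  # about +1.90%
print(f"b) {real_wage_change(3.00, 2.00):.2%}")  # -25.00%
```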

Def. A function (mapping) f is a rule which assigns to every element x of a particular set, the
domain of f, one, and only one, value f(x). The set of all such values is called the range of f.

[Figure: the domain D_f and the range V_f, with arrows showing that f maps x and z to y, and w to u]

f(x) = f(z) = y
f(w) = u
“f maps x and z onto y and w onto u”
A function can be seen as a “pairing rule” such that for any element of the domain (any value
of the variable) that we pick, we can unambiguously name one element of the range and say
that this is the value of the function for this value of the variable.

For each x in the domain there is one and only one y in the range such that y = f(x) BUT for
one element, y, in the range there can be several different elements (say, x and z) in the
domain such that y=f(x) AND y=f(z).

Example 1. $g(y) = 3y^3 - 2$:   $g(10) = 3000 - 2 = 2998$

$g(2) = 3 \cdot 2^3 - 2 = 3 \cdot 8 - 2 = 22$,   $g(3y) = 3(3y)^3 - 2 = 81y^3 - 2$

Ex. 2
Function            Domain                   Range
y = x - 1           all real numbers, R      R
y = x^2             R                        y ≥ 0
y = 1/(1 - x)       x ≠ 1                    y ≠ 0

Some functions are such that to each element in the range corresponds only one element in
the domain. In other words, with these functions it cannot happen that f(x)=f(y) unless x=y.
Such functions are said to be one-to-one.

Power functions and power rules

$f(x) = x^a$, where a is a constant, is a power function.

Power rules:
For positive integers m and n and real numbers x:

1. $x^n = x \cdot x \cdot \ldots \cdot x$   (n factors x)
2. $x^n x^m = x^{n+m}$
3. $\frac{x^m}{x^n} = x^{m-n}$   for $x \neq 0$
4. $(xy)^n = x^n y^n$
5. $(x^n)^m = x^{nm}$
6. $x^0 = 1$   for $x \neq 0$
7. $x^{1/m} = \sqrt[m]{x}$   for $x \geq 0$, or for any x if m is odd

For x > 0 the rules are valid for all real values of n and m.

$\sqrt[m]{x}$ is read as “the mth root of x”. $(\sqrt[m]{x})^m = x$. In other words, the mth root of x raised to the power m is x. The second root (square root) $\sqrt[2]{x}$ is usually written simply as $\sqrt{x}$.

Note that from rules 3 and 6 it follows that $x^{-n} = x^{0-n} = \frac{x^0}{x^n} = \frac{1}{x^n}$.

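To see why these rules are reasonable:
2. According to Rule 1, $x^n \cdot x^m = \underbrace{x \cdot x \cdots x}_{n \text{ factors } x} \cdot \underbrace{x \cdot x \cdots x}_{m \text{ factors } x} = \underbrace{x \cdot x \cdots x}_{n+m \text{ factors } x}$.
(As an example, write $2^3 \cdot 2^2 = 2^5$ divided into factors.)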
3. According to Rule 2, $x^m \cdot x^{-n} = x^{m-n}$. But also $x^m \cdot \frac{1}{x^n} = \frac{x^m}{x^n} = x^{m-n}$, since we can cancel, dividing by n factors x in both numerator and denominator. Since multiplying by $1/x^n$ and multiplying by $x^{-n}$ lead to the same result, they have to be equal if we want Rule 2 to apply to negative integers as well as positive ones. (As an example, show that $3^6 \cdot 3^{-4} = 3^2$.)

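4. $(xy)^n = \underbrace{(xy)(xy)\cdots(xy)}_{n \text{ factors}} = \underbrace{x \cdot x \cdots x}_{n \text{ factors}} \cdot \underbrace{y \cdot y \cdots y}_{n \text{ factors}} = x^n y^n$. n factors xy is the same as n factors x and n factors y. Since multiplication is commutative, we can let the factors change places without changing the value of the product. (As an example, calculate $(2 \cdot 3)^4 = 2 \cdot 2 \cdot 2 \cdot 2 \cdot 3 \cdot 3 \cdot 3 \cdot 3 = 2^4 \cdot 3^4$.)

5. $(x^n)^m = \underbrace{x^n \cdot x^n \cdots x^n}_{m \text{ factors}} = x^{n + n + \cdots + n} = x^{mn}$, according to Rules 1 and 2: $(x^n)^m$ means m factors $x^n$ multiplied by each other. (As an example, calculate $(5^2)^3$.)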

6. By Rule 2, $x^n \cdot x^0 = x^{n+0} = x^n$ for all integers n. Thus, multiplying a number by $x^0$ produces the same result as multiplying it by the number 1. In order for Rule 2 to be valid, we must define $x^0$ to be equal to one for all x except the number zero. ($0^0$ is not defined.)

7. Let x be a positive number and m a positive integer. The mth root of x is defined as the number which is equal to x when raised to the power of m. For example, the square root (the second root) of x is the number whose square is equal to x. But according to Rule 5, $(x^{1/m})^m = x^{m/m} = x^1 = x$. Therefore $x^{1/m}$ is the mth root of x. (If m is an odd number, this is true also for negative values of x. If m is an even number and x is negative, the mth root of x is not a real number.)
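The rules can be spot-checked numerically for x > 0. The Python sketch below uses arbitrary example values of our own choosing (x = 2.5, y = 1.7, n = 3, m = 4). It compares the two sides of each rule with `math.isclose`, because floating-point results can differ in the last digits. A check like this illustrates the rules but does not prove them. Note also that for a negative base with a fractional exponent, Python's `**` returns a complex number rather than the real odd root.

```python
import math

# Arbitrary example values (our own choice); x and y are positive.
x, y = 2.5, 1.7
n, m = 3, 4

checks = {
    "rule 2: x^n x^m = x^(n+m)": (x**n * x**m, x**(n + m)),
    "rule 3: x^m / x^n = x^(m-n)": (x**m / x**n, x**(m - n)),
    "rule 4: (xy)^n = x^n y^n": ((x * y)**n, x**n * y**n),
    "rule 5: (x^n)^m = x^(nm)": ((x**n)**m, x**(n * m)),
    "rule 6: x^0 = 1": (x**0, 1.0),
    "rule 7: (x^(1/m))^m = x": ((x**(1 / m))**m, x),
    "x^(-n) = 1/x^n": (x**(-n), 1 / x**n),
}

for name, (lhs, rhs) in checks.items():
    # Compare with a tolerance: floating-point results may differ in the last digits.
    print(f"{name:<30} {math.isclose(lhs, rhs)}")
```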


$p(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \ldots + a_2 x^2 + a_1 x + a_0$

where $a_n, a_{n-1}, \ldots, a_1, a_0$ are constant real numbers and $a_n \neq 0$, is a polynomial of degree n in x.
A zero of the polynomial p is a value of x such that p(x) = 0.
A polynomial of degree n has at most n real zeros.

The factor theorem: The number r is a zero of the polynomial p (i. e. p(r) = 0) if and only if
there is a polynomial q such that p(x) = (x-r)q(x) for all values of x. If p is of degree n, q must
be of degree n-1.

Another way of expressing that there is a polynomial q such that p(x) = (x-r)q(x) is to say that
the polynomial p can be divided by (is divisible by) (x-r) (by the first order polynomial x-r).
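The factor theorem can be made concrete with synthetic division (Horner's scheme), a standard hand method for dividing a polynomial by x - r. In the Python sketch below, the function name is ours and the coefficients are listed from the highest power down. The division returns the quotient q together with a remainder, and that remainder always equals p(r). So p(r) = 0 exactly when x - r divides p.

```python
def divide_by_linear(coeffs, r):
    """Divide p(x) by (x - r) using synthetic division (Horner's scheme).

    coeffs lists the coefficients of p from the highest degree down, e.g.
    [1, -5, 6] for x^2 - 5x + 6. Returns (q_coeffs, remainder) such that
    p(x) = (x - r) q(x) + remainder; the remainder equals p(r).
    """
    acc = 0
    out = []
    for a in coeffs:
        acc = acc * r + a
        out.append(acc)
    remainder = out.pop()  # the last accumulated value is p(r)
    return out, remainder


print(divide_by_linear([1, -5, 6], 2))   # ([1, -3], 0): x^2 - 5x + 6 = (x - 2)(x - 3)
print(divide_by_linear([1, -5, 6], 1))   # ([1, -4], 2): remainder p(1) = 2, so 1 is not a zero
print(divide_by_linear([1, 8, 16], -4))  # ([1, 4], 0): x^2 + 8x + 16 = (x + 4)(x + 4)
```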

Degree    Form             Example
Zero      a                3
One       ax + b           4x - 9.7
Two       ax^2 + bx + c    3Q^2 - 4Q + 10

ax + b has the zero x = -b/a

$ax^2 + bx + c = 0$ has
• no real root if $b^2 < 4ac$
• the two roots $x = -\frac{b}{2a} \pm \sqrt{\frac{b^2 - 4ac}{4a^2}} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ if $b^2 > 4ac$
• the double root $x = -\frac{b}{2a}$ if $b^2 = 4ac$

Alternatively, to get a simpler formula to memorise, divide both sides of $ax^2 + bx + c = 0$ by a and solve $x^2 + px + q = 0$, where p = b/a and q = c/a:

$x^2 + px + q = 0 \iff x = -\frac{p}{2} \pm \frac{\sqrt{p^2 - 4q}}{2}$   if $p^2 \geq 4q$

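*Optional: To prove the rule, complete the square and use the first rule of squares!
$x^2 + px + q = 0 \iff x^2 + 2 \cdot \frac{p}{2} x + \left(\frac{p}{2}\right)^2 - \left(\frac{p}{2}\right)^2 + q = 0$
$\iff \left(x + \frac{p}{2}\right)^2 - \left(\frac{p}{2}\right)^2 + q = 0 \iff \left(x + \frac{p}{2}\right)^2 = \left(\frac{p}{2}\right)^2 - q$
$\iff x = -\frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q} = -\frac{p}{2} \pm \frac{\sqrt{p^2 - 4q}}{2}$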


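Example: $6x^2 + x - 1 = 0$. Dividing both sides by 6:
$x^2 + \frac{1}{6}x - \frac{1}{6} = 0$

Formula solution: p = 1/6 and q = -1/6.

$p^2 = 1/36 > -4/6 = 4q$, so the solution is
$x = -\frac{1/6}{2} \pm \frac{\sqrt{(1/6)^2 - 4 \cdot (-1/6)}}{2} = -\frac{1}{12} \pm \frac{1}{2}\sqrt{\frac{1}{36} + \frac{24}{36}} = -\frac{1}{12} \pm \frac{1}{2}\sqrt{\frac{25}{36}} = -\frac{1}{12} \pm \frac{5}{12}$
$x = 4/12 = 1/3$ or $x = -6/12 = -1/2$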

*Alternative method or check: Complete the square:

Finally, check that these values satisfy the original equation:
$6\left(\frac{1}{3}\right)^2 + \frac{1}{3} - 1 = \frac{6}{9} + \frac{1}{3} - 1 = \frac{2}{3} + \frac{1}{3} - 1 = 0$ and $6\left(-\frac{1}{2}\right)^2 - \frac{1}{2} - 1 = \frac{6}{4} - \frac{1}{2} - 1 = \frac{3}{2} - \frac{1}{2} - 1 = 0$

Exercise: According to the factor theorem, $6x^2 + x - 1 = 6(x - 1/3)(x + 1/2)$. Check this!

Double roots: Take $s(x) = x^2 + 8x + 16$, so p = 8 and q = 16.
$x = -\frac{8}{2} \pm \frac{\sqrt{8^2 - 4 \cdot 16}}{2} = -4 \pm \frac{\sqrt{64 - 64}}{2} = -4 \pm 0$
Thus, the polynomial has the double root x = -4 and, according to the factor theorem, $s(x) = (x - (-4))(x - (-4)) = (x + 4)(x + 4)$. Check this as an exercise!

Example. Take the polynomial $x^2 - 5x + 6$. The formula for solving the equation $x^2 - 5x + 6 = 0$ shows that the roots are x = 2 and x = 3.
Factor theorem: $(x - 2)(x - 3) = x^2 - 3x - 2x + (-2)(-3) = x^2 - 5x + 6$
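The p-q formula translates directly into code. The Python sketch below uses function names of our own. It returns the real roots as a list: empty when p² < 4q, one element for a double root, and two otherwise. With floating-point numbers, the double-root test (disc == 0) only works when the discriminant comes out exactly zero, as it does for the integer example here. This is a limitation of the sketch.

```python
import math


def solve_pq(p: float, q: float) -> list[float]:
    """Real roots of x^2 + px + q = 0, using x = -p/2 ± sqrt(p^2 - 4q)/2."""
    disc = p * p - 4 * q
    if disc < 0:
        return []                      # no real root
    if disc == 0:
        return [-p / 2]                # double root
    half_root = math.sqrt(disc) / 2
    return [-p / 2 + half_root, -p / 2 - half_root]


def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    """Divide ax^2 + bx + c = 0 by a (a != 0) and apply the p-q formula."""
    return solve_pq(b / a, c / a)


print(solve_quadratic(6, 1, -1))  # approximately [1/3, -1/2]
print(solve_pq(8, 16))            # [-4.0]: the double root of x^2 + 8x + 16
print(solve_pq(-5, 6))            # [3.0, 2.0]: the roots of x^2 - 5x + 6
```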

Golden rules for solving equations

• Do the same thing to both sides
• Do not divide by zero
• Check your answer by inserting the values you’ve found into the original equation!!

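Ex.: $5x - 10 = 20 \iff x - 2 = 4 \iff x = 6$

But: $x^2 - 5x = 10x \iff x(x - 5) = 10x \iff x - 5 = 10$ or $x = 0$, i.e. x = 15 or x = 0. (Dividing both sides by x is only allowed if x ≠ 0, so the case x = 0 has to be checked separately. It is also a solution.)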

Linear equations

A linear equation in n variables:

$a_1 x_1 + a_2 x_2 + a_3 x_3 + \ldots + a_{n-1} x_{n-1} + a_n x_n = b$
1. The total revenue (TR) of a firm that sells two products:
$TR = p_1 Q_1 + p_2 Q_2$
2. The budget of a consumer with income m, buying 3 goods:
$p_1 x_1 + p_2 x_2 + p_3 x_3 = m$
3. The budget of a consumer buying n goods:
$p_1 x_1 + p_2 x_2 + p_3 x_3 + \ldots + p_{n-1} x_{n-1} + p_n x_n = m$
Methods of solving systems of (simultaneous) linear equations:
1. Graphically (approximately)
2. Substitution
3. Elimination

Ex. Supply/demand equilibrium


$Q_D = a - bP$,   a, b, d > 0   (1)
$Q_S = c + dP$,   c < 0   (2)
$Q_D = Q_S$   (3)

Exercise: Make sure that you understand the connection between the signs of a, b, c, d and
the diagram!
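Substituting (1) and (2) into (3) gives a - bP = c + dP. So the equilibrium price is P* = (a - c)/(b + d), and the equilibrium quantity is Q* = a - bP*. The Python sketch below just evaluates these expressions. The parameter values are hypothetical, chosen only to respect a, b, d > 0 and c < 0.

```python
def equilibrium(a: float, b: float, c: float, d: float) -> tuple[float, float]:
    """Solve Q_D = a - bP, Q_S = c + dP, Q_D = Q_S for (P*, Q*).

    Setting a - bP = c + dP gives P* = (a - c)/(b + d); under the sign
    assumptions a, b, d > 0 and c < 0, the denominator b + d is positive.
    """
    p_star = (a - c) / (b + d)
    q_star = a - b * p_star
    return p_star, q_star


# Hypothetical numbers chosen only to satisfy the sign restrictions.
print(equilibrium(a=100, b=2, c=-20, d=4))  # (20.0, 60.0)
```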

The system of equations
x + y = 1   (1)
3x + 2y = 3   (2)
is LINEAR, with 2 variables and 2 equations.

1. GRAPHICALLY: [Figure: the lines x + y = 1 and 3x + 2y = 3; their intersection gives an approximate solution]

2. SUBSTITUTION: According to (1), y = 1 - x. Replace (substitute) y by 1 - x in (2):

3x + 2(1 - x) = 3
3x + 2 - 2x = 3
x + 2 = 3
x = 1
Insert this value of x into (1) to get y = 1 - 1 = 0.

3. ELIMINATION: Multiply both sides of (1) by 2:
2x + 2y = 2   (1’)
Subtract the left side of (1’) from the left side of (2) and the right side of (1’) from the right side of (2):
(3x + 2y) - (2x + 2y) = 3 - 2   (3)
3x - 2x + 2y - 2y = 1
x = 1
Insert x = 1 into (1): 1 + y = 1 ⇒ y = 0

We can subtract the same amount from both sides of an equation. To get (3) we subtracted
the left side of equation (1’) from the left side of (2) and the right side of (1’) from the right
side of (2). Since (1’) is an equation, its right side and its left side are equal. Therefore we
have subtracted the same from both sides of (2). This is called piece-wise subtraction.
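The elimination step can be written out for any system of two linear equations in two unknowns. The Python sketch below uses a function name of our own. It multiplies the first equation by the y-coefficient of the second and the second equation by the y-coefficient of the first, then subtracts piece-wise to eliminate y, and finally inserts x back to get y. It assumes integer coefficients and a unique solution. Exact fractions keep the answer free of rounding.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 (integer coefficients).

    b2*(first) - b1*(second) eliminates y: (a1*b2 - a2*b1) x = c1*b2 - c2*b1.
    Assumes a unique solution, i.e. a1*b2 - a2*b1 != 0.
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    # Insert x into an equation whose y-coefficient is nonzero.
    y = (c1 - a1 * x) / b1 if b1 != 0 else (c2 - a2 * x) / b2
    return x, y


print(solve_2x2(1, 1, 1, 3, 2, 3))  # x = 1, y = 0 (as Fractions)
```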

Example: National income model:

$Y = C + I$   (1)
$I = I_0$   (2)
$C = a + bY$,   $a > 0$,  $0 < b < 1$   (3)

With a public sector (government expenditure and tax) included in the model:

$Y = C + I + G$   (1)
$I = I_0$   (2)
$C = a + bY_D$,   $a > 0$,  $0 < b < 1$   (3)
$Y_D = Y - T$   (4)
$T = tY$,   $0 < t < 1$   (5)
$C = a + b(Y - tY) = a + b(1 - t)Y$   (6)
$Y = a + b(1 - t)Y + I_0 + G$   (7)

$Y - b(1 - t)Y = a + I_0 + G$
$(1 - b(1 - t))Y = a + I_0 + G$
$Y = \frac{a + I_0 + G}{1 - b(1 - t)}$

Exercise: Determine C and T
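A numerical check can help with this exercise. The Python sketch below uses hypothetical parameter values of our own choosing: a = 100, b = 4/5, t = 1/4, I0 = 200 and G = 300, which satisfy the restrictions. It uses exact fractions throughout. The sketch computes Y from the formula above, then T and C from equations (5), (4) and (3), and verifies that the equilibrium condition (1) holds. Your symbolic expressions for C and T should reproduce these numbers.

```python
from fractions import Fraction as F


def national_income(a, b, t, I0, G):
    """Equilibrium Y = (a + I0 + G) / (1 - b(1 - t)) for the model
    Y = C + I + G, I = I0, C = a + b*YD, YD = Y - T, T = t*Y."""
    return (a + I0 + G) / (1 - b * (1 - t))


# Hypothetical parameters with a > 0, 0 < b < 1, 0 < t < 1; Fractions keep it exact.
a, b, t, I0, G = F(100), F(4, 5), F(1, 4), F(200), F(300)
Y = national_income(a, b, t, I0, G)
T = t * Y                  # tax revenue, equation (5)
C = a + b * (Y - T)        # consumption, equations (3) and (4)
print(Y, C, T)             # 1500 1000 375
assert Y == C + I0 + G     # the equilibrium condition (1) holds
```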

A general system of m linear equations in n variables can be written as:

$a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \ldots + a_{1n}x_n = b_1$
$a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \ldots + a_{2n}x_n = b_2$
$\vdots$
$a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \ldots + a_{mn}x_n = b_m$

Number of solutions:   0                1                ∞

m = n                  x + y = 1        x + y = 1        x + y = 1
                       3x + 3y = 2      x + 2y = 2       2x + 2y = 2
m < n                  x + y + z = 1    ____             x + y = 1
n < m                  x + y = 1        x + y = 1        x + y = 1
                       x + 2y = 2       x + 2y = 2       2x + 2y = 2
                       x + 3y = 1       2x + 3y = 3      3x + 3y = 3

The examples show that it is not automatically the case that there is a unique solution if and only if there are as many equations as unknown variables. It depends on the relationship between the equations.
In the example x + y = 1, 3x + 3y = 2, the equations contradict each other and no values of x and y can satisfy both simultaneously.
In the example x + y = 1, 2x + 2y = 2, all the information contained in the second equation is already contained in the first one. The two equations are said to be dependent. With two variables and only one independent condition, there are infinitely many values of x and y that satisfy the two equations. In the example at the bottom of the third column there are three equations, but the third can be deduced from the first two. It does not add any new information and does not impose any new restriction on the variables.

Systems of linear equations have either a unique solution, no solution or infinitely many
solutions. With non-linear equations there are more possibilities.
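For linear systems, the three cases can be told apart mechanically by comparing ranks, that is, numbers of independent equations. A standard result, the Rouché-Capelli theorem, covers this. The system has no solution if adding the right-hand sides as an extra column raises the rank. It has a unique solution if the rank equals the number of variables n. Otherwise it has infinitely many solutions. The Python sketch below uses function names of our own. It computes ranks by Gaussian elimination with exact fractions and classifies some of the examples from the table above.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (a list of rows) via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivots (independent rows) found so far
    for c in range(len(m[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row (piece-wise subtraction).
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [u - factor * v for u, v in zip(m[i], m[r])]
        r += 1
    return r


def number_of_solutions(A, b):
    """Classify the system A x = b as having 0, 1 or infinitely many solutions."""
    n = len(A[0])
    rank_A = rank(A)
    rank_Ab = rank([row + [bi] for row, bi in zip(A, b)])
    if rank_A < rank_Ab:
        return "0"
    return "1" if rank_A == n else "infinitely many"


print(number_of_solutions([[1, 1], [3, 3]], [1, 2]))             # 0
print(number_of_solutions([[1, 1], [1, 2]], [1, 2]))             # 1
print(number_of_solutions([[1, 1], [2, 2]], [1, 2]))             # infinitely many
print(number_of_solutions([[1, 1], [1, 2], [1, 3]], [1, 2, 1]))  # 0
print(number_of_solutions([[1, 1], [1, 2], [2, 3]], [1, 2, 3]))  # 1
```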

$x^2 + 1 = 0$   No solution
$x^2 - 1 = 0$   Two solutions
$x = 0$   One solution
$(x - 1)^2 + (y - 3)^2 = 1$   Infinitely many solutions
$(x - 1)^2 + (y - 3)^2 = 0$   One solution

Linear functions:
y = ax + b a, b constants
b intercept
a slope
C = 0.8Y + 10 (aggregate consumption)
Q = 100 – 0.75P (demand for a good)

Difference quotients:
Let $(x_1, y_1)$ and $(x_2, y_2)$ be two points on the straight line y = ax + b.
We can form the difference quotient:
$\frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{(ax_2 + b) - (ax_1 + b)}{x_2 - x_1} = \frac{a(x_2 - x_1)}{x_2 - x_1} = a$
$\frac{y(x + 1) - y(x)}{(x + 1) - x} = \frac{a(x + 1) + b - (ax + b)}{x + 1 - x} = \frac{ax + a + b - ax - b}{1} = a$
The change in the dependent variable y when the independent variable x is increased by one is exactly a. Linear functions have the same slope at each point of the graph.
If a>0 x ↑ ==> y ↑
If a<0 x ↑ ==> y ↓
If a=0 y constant
If $|a_1| < |a_2|$, then $y = a_2 x + b_2$ is steeper than $y = a_1 x + b_1$.

To find the equation of a straight line/linear function:
Assume that $(x_1, y_1)$ and $(x_2, y_2)$ are two points on y = ax + b. There are two methods of finding the equation of the line from knowing only two points.
Method 1. Slope $a = \frac{y_2 - y_1}{x_2 - x_1}$ and intercept $b = y_1 - ax_1$

Method 2. Solve the system of equations $x_1 a + b = y_1$, $x_2 a + b = y_2$ for a and b

Example: Find the straight line through (1, 3) and (3, -5)
1. $a = \frac{-5 - 3}{3 - 1} = \frac{-8}{2} = -4$ and $b = 3 - (-4) \cdot 1 = 3 + 4 = 7$
2. $1 \cdot a + b = 3$
   $3a + b = -5$
Piece-wise subtraction:
$a - 3a = 3 - (-5) = 8 \iff -2a = 8 \iff a = -4$
Insert a = -4 into either equation to get b = 7.
The equation of the line is y = 7 - 4x.
Check: 7 - 4·1 = 3 and 7 - 4·3 = -5.
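Method 1 is easy to automate. The Python sketch below uses a function name of our own and returns the slope a and the intercept b as exact fractions. It assumes integer coordinates with x1 ≠ x2. Two points directly above each other determine a vertical line, which is not of the form y = ax + b.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (a, b) for the line y = ax + b through p1 = (x1, y1) and p2 = (x2, y2).

    Method 1: a = (y2 - y1)/(x2 - x1) and b = y1 - a*x1. Requires x1 != x2.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("x1 == x2: no line of the form y = ax + b")
    a = Fraction(y2 - y1, x2 - x1)
    b = y1 - a * x1
    return a, b


a, b = line_through((1, 3), (3, -5))
print(a, b)  # -4 7, i.e. y = 7 - 4x
```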

The slope of a non-linear function?
How does one determine the slope of a non-linear function? Let’s compare with the linear
function y=ax+b. If we start from one given point (x, y), pick some other point and calculate
the difference quotient we always get the value a, whatever second point we choose. a is the
slope of the function at (x, y). The slope of a straight line is the same at every point so if we
repeat the exercise with any starting point we get the same result. If we do the same with
different points on the graph of a non-linear function we get different values for different
points because the slope of the function varies. The slope of a non-linear function is different
at different points.

Example: Take $y(x) = x^2$

Obviously, the slope of this graph varies. To the left of zero it is negative; to the right of zero it is positive. Far from zero the slope is steep; near zero the graph is almost flat. No single number describes the slope of the whole graph. But can we attach values to the slope at each point? For example, what is the slope of $y(x) = x^2$ at $x_0 = 1$?

$y_0 = y(x_0) = 1^2 = 1$

Difference quotients:
Δx       x_1 = x_0 + Δx    y_1 = x_1^2    Δy         Δy/Δx
1        2                 4              3          3 = 2 + 1
-1       0                 0              -1         1 = 2 - 1
1/2      3/2               9/4            5/4        5/2 = 2 + 1/2
-1/2     1/2               1/4            -3/4       3/2 = 2 - 1/2
1/4      5/4               25/16          9/16       9/4 = 2 + 1/4
-1/4     3/4               9/16           -7/16      7/4 = 2 - 1/4
1/10     11/10             121/100        21/100     21/10 = 2 + 1/10
-1/10    9/10              81/100         -19/100    19/10 = 2 - 1/10
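The table can be reproduced with exact fractions and extended to smaller Δx. The Python sketch below uses function names of our own. Every difference quotient of y = x² at x0 = 1 comes out as exactly 2 + Δx, which anticipates the algebra further on.

```python
from fractions import Fraction as F


def square(x):
    return x * x


def diff_quotient(f, x0, dx):
    """Difference quotient (f(x0 + dx) - f(x0)) / dx, for dx != 0."""
    return (f(x0 + dx) - f(x0)) / dx


x0 = F(1)
for dx in [F(1), F(-1), F(1, 2), F(-1, 2), F(1, 4), F(-1, 4), F(1, 10), F(-1, 10)]:
    q = diff_quotient(square, x0, dx)
    print(f"dx = {str(dx):>5}   quotient = {str(q):>5}   2 + dx = {str(2 + dx):>5}")
```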

Geometric interpretation:
As we choose Δx smaller and smaller in absolute value, the straight lines through (1, 1) and $(1 + \Delta x, (1 + \Delta x)^2)$ “cut off” smaller and smaller pieces of the graph. The smallest piece that can conceivably be cut from a line is a single point.

Two straight lines always intersect, unless they have the same slope (are parallel). If we want
to draw a straight line through (1, 1) which does not intersect the graph we should also be
looking for a line with the same slope as $y = x^2$.

The difference quotients, as Δx approaches zero, seem to approach the number two. The straight line through (1, 1) with slope two is y = 2x - 1. If we draw the line, it passes through (1, 1), which is on the graph of $y = x^2$, but does not intersect (“cut through”) it. This is the red line in the diagram below.
[Figure: the graph of y = x^2 and, in red, the line y = 2x - 1 through (1, 1)]
Definition: A line which includes one point of a curve but does not intersect the curve is called a tangent to the curve (graph).

If a line is tangent to the graph of y = f(x) at the point x, then

the slope of the graph at x = the slope of the tangent.

In the example:

[Figure: the graph of y = x^2 and its tangent y = 2x - 1 at (1, 1)]
The slope of the graph at (1, 1) is equal to the slope of the tangent at that point. In the diagram, the line y = 2x - 1 appears to be the tangent of $y = x^2$ at (1, 1).

Can we use algebra to find support for the geometric interpretation?
$\frac{\Delta y}{\Delta x} = \frac{y_1 - y_0}{x_1 - x_0} = \frac{(1 + \Delta x)^2 - 1^2}{(1 + \Delta x) - 1} = \frac{1^2 + 2 \cdot 1 \cdot \Delta x + (\Delta x)^2 - 1^2}{\Delta x} = \frac{2\Delta x + (\Delta x)^2}{\Delta x} = 2 + \Delta x$
If Δx is near zero (“almost nothing”), the difference quotient must be near two, “almost equal to two”.
For any point x = a (with $f(x) = x^2$):
$\frac{\Delta y}{\Delta x} = \frac{f(a + k) - f(a)}{(a + k) - a} = \frac{(a + k)^2 - a^2}{k} = \frac{a^2 + 2ak + k^2 - a^2}{k} = \frac{2ak + k^2}{k} = 2a + k$

Definition of the derivative

Generally: To find the derivative of a function f at the point a
(VERY informally):
1. Let k be a small number, k ≠ 0, and form the difference quotient $\frac{f(a + k) - f(a)}{k}$.
2. Do this for different numbers k, all different from zero but nearer and nearer to zero.
3. See if you can find some number which the difference quotient approaches.
If there is such a number, it is called the derivative of f at a.

Notation for the derivative of the function y = f(x) at the point x = a:

$f'(a)$,   $Df(a)$,   $\frac{dy}{dx}(a)$,   $\frac{df}{dx}(a)$

Formally, the derivative is the limit of the difference quotient as the distance between the points approaches zero. This is written as:
$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$, if the limit exists.
If it doesn’t, f does not have a derivative at the point a: it is not differentiable at a.

Examples: If f is not continuous at a point a, f is not differentiable at a.

A continuous function can be seen, intuitively, as one whose graph can be drawn “without
lifting the pen from the paper”.

If f has “a sharp corner” at point a, f is not differentiable at a.

Since the slope of a non-linear function is different at different points, the derivative takes different values at different points in the domain of f. This means that if f is a differentiable function of x, the derivative f’ is also a function of x. Note the difference between the derivative (a function) and the derivative at a point (a number).

The derivative, considered as a function, can be written

$f'$,   $f'(x)$,   $Df$,   $\frac{dy}{dx}$,   $\frac{df}{dx}$

Notations for describing change:
(Assume that y = f(x) is a function and a and b are points in the domain of f)

Concept                          Mathematically
1. Change, difference            $\Delta y$;   $f(a + h) - f(a)$;   $\Delta f$
2. Average (relative) change,    $\frac{\Delta y}{\Delta x}$, $\Delta x \neq 0$;   $\frac{f(b) - f(a)}{b - a}$, $a \neq b$;   $\frac{f(a + h) - f(a)}{h}$, $h \neq 0$
   change per unit
3. Instantaneous change          $f'(a)$;   $\frac{dy}{dx}$;   $\lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$
4. Relative rate of change       $\frac{f'(a)}{f(a)}$;   $\frac{dy/dx}{y}$;   $\frac{y'}{y}$
5. Elasticity                    $\frac{dy}{dx} \cdot \frac{x}{y}$ (point elasticity);   $\frac{\Delta y}{\Delta x} \cdot \frac{x}{y}$ (arc elasticity)

A slightly closer look at limits:

x        f(x) = x^2          x        f(x) = (√x - 1)/(x - 1)
1.9      3.61                0        1
1.99     3.9601              0.5      0.5858
1.999    3.996001            0.9      0.5132
2        4                   0.99     0.5013
2.001    4.004001            0.999    0.5001
2.01     4.0401              1        ---
2.1      4.41                1.001    0.4999
                             1.01     0.4988
                             1.1      0.4881
                             1.5      0.4495
                             2        0.4142

In the example to the left we could have found the limit of f as x approaches 2 by calculating $f(2) = 2^2 = 4$. But in the example to the right f(1) does not exist: to calculate it we would have had to divide by 1 - 1 = 0, and that is not possible. It looks as if the limit were the number 0.5. We can check this using the rule of conjugates and the fact that $1 = 1^2$ and $x = (\sqrt{x})^2$:

For x ≠ 1:   $f(x) = \frac{\sqrt{x} - 1}{x - 1} = \frac{\sqrt{x} - 1}{(\sqrt{x})^2 - 1^2} = \frac{\sqrt{x} - 1}{(\sqrt{x} - 1)(\sqrt{x} + 1)} = \frac{1}{\sqrt{x} + 1} \to \frac{1}{1 + 1} = \frac{1}{2}$   as $x \to 1$

“Definition” (intuitive): If, however small a distance from A we choose, we can make sure that f(x) stays within that distance of A by keeping x sufficiently close to a (but not equal to a), then we say that $\lim_{x \to a} f(x) = A$.
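The right-hand table can be reproduced numerically. The Python sketch below uses function names of our own. It evaluates f(x) = (√x - 1)/(x - 1) next to the simplified form 1/(√x + 1) obtained with the rule of conjugates. Only the simplified form can be evaluated at x = 1 itself. Near x = 1 the original form subtracts nearly equal numbers, so the simplified form is also the numerically safer one.

```python
import math


def f(x: float) -> float:
    """(sqrt(x) - 1) / (x - 1); defined for x >= 0 but not for x = 1."""
    return (math.sqrt(x) - 1) / (x - 1)


def f_simplified(x: float) -> float:
    """1 / (sqrt(x) + 1), equal to f(x) for every x != 1 (rule of conjugates)."""
    return 1 / (math.sqrt(x) + 1)


for x in [0.9, 0.99, 0.999, 1.001, 1.01, 1.1]:
    print(f"x = {x:<6} f(x) = {f(x):.4f}   1/(sqrt(x) + 1) = {f_simplified(x):.4f}")

# The simplified expression can be evaluated at x = 1 itself: 1/(1 + 1) = 0.5.
print(f_simplified(1.0))  # 0.5
```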

Rules of differentiation
Assume that f and g are differentiable functions with the same domain, D.
1. If for all x in D h(x) = c∙f(x), where c is a constant, then h’(x) = c∙f’(x)
2. If for all x in D h(x) = f(x) + g(x), then h’(x) = f’(x) + g’(x) (analogously for subtraction)
3. If for all x in D h(x) = f(x)∙g(x), then h’(x) = f’(x)∙g(x) + f(x)∙g’(x) (Product rule)
4. If $h(x) = \frac{f(x)}{g(x)}$ for all x in D such that g(x) ≠ 0, then $h'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{(g(x))^2}$ (Quotient rule)
1. Real and nominal GNP growth
Y(t) = P(t)Q(t)
where t – time, Y – nominal GNP,
P – price level, Q – real production.
By the product rule:
Y’(t) = P’(t)Q(t) + P(t)Q’(t)
where Y’(t) is the nominal growth and Q’(t) the real growth.

$\frac{P'(t)}{P(t)}$   Relative change in prices
$\frac{Q'(t)}{Q(t)}$   Relative real growth
$\frac{Y'(t)}{Y(t)}$   Relative GNP growth

In “dot notation” (which is used mainly in the macroeconomics literature), the derivative of a function is written as the symbol for the function with a dot above it:

$\dot{Y} = \dot{P}Q + P\dot{Q}$, and dividing both sides by $Y = PQ$:   $\frac{\dot{Y}}{Y} = \frac{\dot{P}}{P} + \frac{\dot{Q}}{Q}$

2. Average and marginal cost

Let Q represent the level of production of a firm, C(Q) the total cost of production and AC(Q) the average cost, i.e. $AC(Q) = \frac{C(Q)}{Q}$.
This is a quotient of two functions, so we use the quotient rule.
The derivative of C(Q) is C’(Q), the marginal cost MC. What is the derivative of the denominator, the function Q itself?
It is the same as the derivative of f(x) = x. “How much does production change when production changes by one unit?” By one unit!

$\frac{d(AC(Q))}{dQ} = \frac{C'(Q) \cdot Q - C(Q) \cdot 1}{Q^2} = \frac{C'}{Q} - \frac{C}{Q} \cdot \frac{1}{Q} = \frac{1}{Q}(C' - AC) = \frac{1}{Q}(MC - AC)$
The derivative of AC is positive if and only if MC > AC (for Q > 0).
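The formula can be checked numerically. The Python sketch below uses a hypothetical cost function of our own, C(Q) = 100 + 2Q + Q², so that MC = 2 + 2Q and AC = 100/Q + 2 + Q. It compares a central-difference approximation of AC’(Q) with (MC - AC)/Q. For this cost function, AC falls while MC < AC, is flat at Q = 10 where MC = AC = 22, and rises beyond that.

```python
def cost(Q):
    """Hypothetical total cost: fixed cost 100 plus variable cost 2Q + Q^2."""
    return 100 + 2 * Q + Q ** 2


def marginal_cost(Q):
    return 2 + 2 * Q            # C'(Q)


def average_cost(Q):
    return cost(Q) / Q          # AC(Q) = C(Q)/Q, for Q > 0


def d_average_cost(Q, h=1e-6):
    """Central-difference approximation of d(AC)/dQ."""
    return (average_cost(Q + h) - average_cost(Q - h)) / (2 * h)


for Q in [5, 10, 20]:
    formula = (marginal_cost(Q) - average_cost(Q)) / Q   # (MC - AC)/Q
    print(f"Q = {Q:>2}: numerical AC' = {d_average_cost(Q):+.4f}, (MC - AC)/Q = {formula:+.4f}")
```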

3. The derivative of a polynomial. The polynomial

$p(x) = a_n x^n + a_{n-1} x^{n-1} + a_{n-2} x^{n-2} + \ldots + a_2 x^2 + a_1 x + a_0$

is a sum of terms, each of which is a function of the form $a_k x^k$. According to rule 2, the derivative of the polynomial is the sum of the derivatives of the terms.

Since $a_k$ is a constant, the derivative of $a_k x^k$ is $a_k$ times the derivative of $x^k$, according to rule 1. So, if we know the derivative of $x^k$, we will be able to differentiate any polynomial.

We have found in the examples that
the derivative of $x = x^1$ is $1 = 1 \cdot x^0 = 1 \cdot x^{1-1}$
the derivative of $x^2$ is $2x = 2x^1 = 2x^{2-1}$
What is the derivative of $x^3$? $x^3 = x \cdot x^2$. The product rule tells us that the derivative of this is:
$D[x^3] = D[x] \cdot x^2 + x \cdot D[x^2] = 1 \cdot x^2 + x \cdot 2x = 3x^2 = 3x^{3-1}$
Analogously, $D[x^4] = D[x] \cdot x^3 + x \cdot D[x^3] = 1 \cdot x^3 + x \cdot 3x^2 = 4x^3 = 4x^{4-1}$

Assume that for some arbitrarily chosen positive integer k, the derivative $D[x^k] = kx^{k-1}$.
Then $D[x^{k+1}] = D[x \cdot x^k] = 1 \cdot x^k + x \cdot D[x^k] = x^k + x \cdot kx^{k-1} = x^k + kx^k = (k + 1)x^k$.
Since the rule is true for k = 1, it is true for k = 2; since it is true for k = 2, it is true for k = 3; and so on.
In fact, for any real k ≠ 0 (not only integers), $D[x^k] = kx^{k-1}$ (for x > 0 when k is not an integer).

Exercise: Assume that it is known that the derivative of $f(x) = x^n$ is $f'(x) = nx^{n-1}$ for all positive integers n. Use the quotient rule to find the derivative of $g(x) = x^k$, where k is a negative integer.

Example 4 a). $f(x) = 3x^4 - 5x^2 + 7$

$f'(x) = 4 \cdot 3x^3 - 5 \cdot 2x^1 + 0 = 12x^3 - 10x$
b). $f(x) = (x^2 - 1)(x^4 - 2x^2 + 1)$
$f'(x) = 2x(x^4 - 2x^2 + 1) + (x^2 - 1)(4x^3 - 4x)$
$= 2x^5 - 4x^3 + 2x + 4x^5 - 4x^3 - 4x^3 + 4x$
$= 6x^5 - 12x^3 + 6x = 6x(x^4 - 2x^2 + 1) = 6x(x^2 - 1)^2$
c). $y(x) = (x^3 - x)(5x^4 + x^2)$
$y'(x) = (3x^2 - 1)(5x^4 + x^2) + (x^3 - x)(20x^3 + 2x) = 35x^6 - 20x^4 - 3x^2$
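Rules 1 and 2, together with D[x^k] = kx^(k-1), turn the differentiation of a polynomial into simple arithmetic on its coefficients. In the Python sketch below, the function names are ours and a polynomial is stored as a list of coefficients [a0, a1, ..., an], lowest degree first. The sketch reproduces Examples 4 a) and b) above. For b), it first multiplies the two factors out.

```python
def derivative(coeffs):
    """Differentiate a polynomial given as [a0, a1, ..., an] (lowest degree first),
    using D[a_k x^k] = k a_k x^(k-1) for each term (rules 1 and 2)."""
    return [k * a for k, a in enumerate(coeffs)][1:]


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# a) f(x) = 3x^4 - 5x^2 + 7  ->  f'(x) = 12x^3 - 10x
print(derivative([7, 0, -5, 0, 3]))  # [0, -10, 0, 12]

# b) f(x) = (x^2 - 1)(x^4 - 2x^2 + 1)  ->  f'(x) = 6x^5 - 12x^3 + 6x
print(derivative(poly_mul([-1, 0, 1], [1, 0, -2, 0, 1])))  # [0, 6, 0, -12, 0, 6]
```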

d). $f(x) = \frac{x^2 + 1}{x + 1} = \frac{g(x)}{h(x)}$, where $g(x) = x^2 + 1$ and $h(x) = x + 1$
$g'(x) = 2x$,   $h'(x) = 1$

$f'(x) = \frac{g'(x)h(x) - g(x)h'(x)}{(h(x))^2} = \frac{2x(x + 1) - (x^2 + 1) \cdot 1}{(x + 1)^2} = \frac{2x^2 + 2x - x^2 - 1}{(x + 1)^2} = \frac{x^2 + 2x - 1}{(x + 1)^2}$

e). $y(x) = \frac{3x + 5}{x + 2}$
$y'(x) = \frac{3 \cdot (x + 2) - 1 \cdot (3x + 5)}{(x + 2)^2} = \frac{3x + 6 - 3x - 5}{(x + 2)^2} = \frac{1}{(x + 2)^2}$

5. What is the equation of the tangent to $f(x) = \frac{x^2 + 1}{x + 1}$ at x = 1?

The slope is equal to the derivative of f at this point. Inserting x = 1 into the derivative calculated in 4 d), we get $f'(1) = \frac{1^2 + 2 \cdot 1 - 1}{(1 + 1)^2} = \frac{2}{4} = \frac{1}{2}$

The tangent passes through the point x = 1, y(1) = 1 on the graph:
y = 0.5x + b, where 1 = 0.5 + b. Therefore b = 0.5, and the tangent is y = 0.5x + 0.5.

Composite functions
Let g(y) be a function defined for all y in a set $D_g$, and let y = f(x) be a function defined for all x in a set $D_f$, such that the range of f, $V_f$, is a subset of $D_g$.

The function z = h(x) = g(f(x)) is the composite of g and f. It can be written as h = g◦f.

f is called the inner function.
g is called the outer function.

The chain rule of differentiation

The chain rule: If h(x) = g(f(x)) and g and f are differentiable, then
$h'(x) = g'(y) \cdot f'(x) = g'(f(x)) \cdot f'(x)$, or, with z = g(y) and y = f(x),
$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$

Example 1. a) $y = (x + 2)^2$
The function can be written as $y = z^2$, where z = (x + 2). Applying the chain rule gives us
$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx} = 2z \cdot 1 = 2(x + 2) = 2x + 4$
To check, we can note that $y = x^2 + 4x + 4$. Differentiating this directly, we get
y’(x) = 2x + 4.
b) $y = (2x + 1)^2$
$\frac{dy}{dx} = \frac{dy}{dz} \cdot \frac{dz}{dx} = 2z \cdot 2 = 2 \cdot (2x + 1) \cdot 2 = 8x + 4$
Check: $y(x) = 4x^2 + 4x + 1 \Rightarrow y'(x) = 8x + 4$

Example 2. $y = (5x^3 + 2x - 7)^{17}$.

In Example 1 it was equally easy to apply the chain rule and to differentiate directly. In this case, although both methods work, the chain rule is much easier. According to the chain rule,
$y'(x) = 17(5x^3 + 2x - 7)^{16} \cdot (15x^2 + 2)$
To avoid using the chain rule, one would have to expand $(5x^3 + 2x - 7)^{17}$ as a polynomial of the 51st degree.

Example 3: Marginal Product Value

Assume that a firm faces a demand curve p = 15 - q, where p is the price and q the quantity.
Total revenue: $R = p \cdot q = (15 - q) \cdot q = 15q - q^2$.
Marginal revenue: $R'(q) = 15 - 2q$
Assume that the firm’s production function (with fixed capital equipment) is:
$q = q(L) = 10\sqrt{L} = 10L^{1/2}$
where L = the amount of labour time used.
$MPL = \frac{dq}{dL} = \frac{10}{2\sqrt{L}} = \frac{5}{\sqrt{L}} = 10 \cdot \frac{1}{2} L^{-1/2} = 5L^{-1/2}$

MPL is the marginal productivity of labour, or the rate of increase in production when labour time is increased.
What about R’(L), the rate of increase in revenue when labour time is increased?

Express R as a function of L:
$R(L) = 15q - q^2 = 15 \cdot 10\sqrt{L} - (10\sqrt{L})^2 = 150\sqrt{L} - 100L$
$\frac{dR}{dL} = \frac{150}{2\sqrt{L}} - 100 = \frac{75}{\sqrt{L}} - \frac{100\sqrt{L}}{\sqrt{L}} = \frac{75 - 100\sqrt{L}}{\sqrt{L}}$
$= \frac{5}{\sqrt{L}}(15 - 2 \cdot 10\sqrt{L}) = (15 - 2q) \cdot \frac{5}{\sqrt{L}} = \frac{dR}{dq} \cdot \frac{dq}{dL}$
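The result dR/dL = (dR/dq)(dq/dL) can be checked numerically. The Python sketch below uses function names of our own. It approximates the derivative of the composite R(q(L)) with a central difference at an arbitrary point, L = 4. It then compares that value with MR(q(L))·MPL(L) and with the closed form 75/√L - 100. All three should come out at, or very close to, -62.5.

```python
import math


def output(L):
    """Production function q = 10 * sqrt(L)."""
    return 10 * math.sqrt(L)


def revenue(q):
    """Total revenue R = (15 - q) * q from the demand curve p = 15 - q."""
    return (15 - q) * q


def marginal_revenue(q):
    return 15 - 2 * q               # dR/dq


def mpl(L):
    return 5 / math.sqrt(L)         # dq/dL


def dR_dL_numeric(L, h=1e-6):
    """Central-difference approximation of d/dL R(q(L))."""
    return (revenue(output(L + h)) - revenue(output(L - h))) / (2 * h)


L = 4.0  # arbitrary example amount of labour
print(dR_dL_numeric(L))                      # approximately -62.5
print(marginal_revenue(output(L)) * mpl(L))  # chain rule: (dR/dq)(dq/dL)
print(75 / math.sqrt(L) - 100)               # closed form from the text
```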

Example 4

$z = \sqrt{x^5 + 2x - 1}$

$z = \sqrt{y}$, where $y = x^5 + 2x - 1$
$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = \frac{1}{2\sqrt{y}}(5x^4 + 2) = \frac{5x^4 + 2}{2\sqrt{x^5 + 2x - 1}}$

If you prefer writing square roots as power functions:

$z = (x^5 + 2x - 1)^{1/2} \Rightarrow z = y^{1/2}$, where $y = x^5 + 2x - 1$
$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = \frac{1}{2} y^{-1/2}(5x^4 + 2) = \frac{1}{2}(x^5 + 2x - 1)^{-1/2}(5x^4 + 2) = \frac{5x^4 + 2}{2(x^5 + 2x - 1)^{1/2}}$

Let y = y(x).
The elasticity of y w.r.t. (with respect to) x: “The percentage change in y relative to the percentage change in x”.
$\frac{\Delta Y / Y}{\Delta X / X} = \frac{\Delta Y}{\Delta X} \cdot \frac{X}{Y}$   (preliminary definition)

Problems with this definition:
1. It takes different values for a change from $X_1$ to $X_2$ and for a change from $X_2$ to $X_1$.
2. It takes different values for the same value of X but different ΔX, so we cannot speak of the elasticity at a given point.
(To see that this is true, take a demand function q = 100 - 5p. Calculate q when p = 5, p = 6 and p = 10. Then use the formula to calculate the price elasticity of demand when
a. price increases from 5 to 10
b. price decreases from 10 to 5
c. price increases from 5 to 6.)

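1. To avoid the first problem, modify the definition of arc elasticity to

$E = \frac{\Delta Y}{\Delta X} \cdot \frac{\bar{X}}{\bar{Y}} = \frac{Y_2 - Y_1}{X_2 - X_1} \cdot \frac{X_2 + X_1}{Y_2 + Y_1}$, where $\bar{X} = (X_1 + X_2)/2$ and $\bar{Y} = (Y_1 + Y_2)/2$.

2. To avoid the second problem, use the point elasticity

$\frac{dY}{dX} \cdot \frac{X}{Y}$

$\frac{\Delta Y}{\Delta X} \cdot \frac{X}{Y}$ is a good approximation of it for “small” ΔX.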

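Important elasticities:

$\varepsilon_p = \frac{dQ}{dP} \cdot \frac{P}{Q}$   price elasticity (of demand)
(P, Q: price and quantity of the same good)
$\varepsilon_I = \frac{dQ}{dI} \cdot \frac{I}{Q}$   income elasticity (of demand)
(Q: quantity of the good, I: income)
$\frac{dQ_X}{dP_Y} \cdot \frac{P_Y}{Q_X}$   cross price elasticity
(X, Y are two goods; $Q_X$ is the quantity of X and $P_Y$ the price of Y)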
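The two fixes can be tried on the demand function q = 100 - 5p from the exercise above. The Python sketch below uses function names of our own. It evaluates the modified (midpoint) arc elasticity in both directions between p = 5 and p = 10, and it gives the same value both ways. It also evaluates the point elasticity at p = 5, using dq/dp = -5.

```python
def demand(p):
    """The demand function q = 100 - 5p from the exercise above."""
    return 100 - 5 * p


def arc_elasticity_midpoint(p1, p2):
    """(q2 - q1)/(p2 - p1) * (p2 + p1)/(q2 + q1): symmetric in the two points."""
    q1, q2 = demand(p1), demand(p2)
    return (q2 - q1) / (p2 - p1) * (p2 + p1) / (q2 + q1)


def point_elasticity(p):
    """dq/dp * p/q, using dq/dp = -5 for this linear demand function."""
    return -5 * p / demand(p)


print(arc_elasticity_midpoint(5, 10), arc_elasticity_midpoint(10, 5))  # -0.6 -0.6
print(point_elasticity(5))  # about -0.333
```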

Increasing/decreasing functions
The graph of an increasing function points “upwards to the right”, that of a decreasing function “downwards to the right”.

[Figure: examples of increasing functions and of decreasing functions]

If x < y ⇒ f(x) < f(y), f is strictly increasing.

If x < y ⇒ f(x) > f(y), f is strictly decreasing.

If x < y ⇒ f(x) ≤ f(y), f is increasing.

If x < y ⇒ f(x) ≥ f(y), f is decreasing.
(Each condition must hold for all x and y in the domain, or in the interval considered.)
If a function is decreasing in its whole domain or increasing in its whole domain, it is called monotonic.
Increasing: y = 3x + 5, y = x^2 for x > 0, y = 4x^3
Decreasing: y = -3x + …, y = x^2 for x < 0, y = 1/x for x > 0

If $y = f(x)$ and $z = g(y)$ are two increasing functions, the composite $z = g(f(x))$ is increasing.
Assume that $x_2 > x_1$. Then $f(x_2) \ge f(x_1)$, since f is increasing. But then $g(f(x_2)) \ge g(f(x_1))$, since g is increasing.

”The function f is differentiable in the interval I” means that for every point x in I, the
derivative f’(x) exists.

Let f be a function which is differentiable in the interval I:

1. If f'(x)>0 for all x in I then f is strictly increasing in I.

2. If f'(x)<0 for all x in I then f is strictly decreasing in I.

In other words, a derivative which is strictly positive (in an interval) is a sufficient condition
for the function to be strictly increasing (in that interval), but not a necessary condition
because a function can be strictly increasing even though the derivative is zero at some point.
(For example the function $y = x^3$ at the point $x = 0$.)

A strictly negative derivative is a sufficient condition for the function to be strictly decreasing, but not a necessary condition, because a function can be strictly decreasing even though the derivative is zero at some point (for example the function $y = -x^3$ at the point $x = 0$).

Conversely, if $f'(c) < 0$ at some interior point c of I, then $f(x) < f(c)$ for x slightly larger than c, so f cannot be increasing on I. Hence a non-negative derivative is a necessary condition for f to be increasing:
If f is increasing on the interval I, then $f'(x) \ge 0$ for all x in I.
If f is decreasing on the interval I, then $f'(x) \le 0$ for all x in I.

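Example: The function $f(x) = x^3 - \frac{5}{2}x^2 - 2x + 3$ is

strictly increasing on $(-\infty, -1/3)$ and $(2, \infty)$
increasing on $(-\infty, -1/3]$ and $[2, \infty)$
strictly decreasing on $(-1/3, 2)$
decreasing on $[-1/3, 2]$

since $f'(x) = 3x^2 - 5x - 2 = (3x + 1)(x - 2)$ is positive for $x < -1/3$ and for $x > 2$, and negative in between.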

[Figure: graph of $f(x) = x^3 - \frac{5}{2}x^2 - 2x + 3$]
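Because f' is continuous and is zero only at −1/3 and 2, it keeps the same sign within each of the three intervals, so one test point per interval is enough. The small sketch below (our own illustration) evaluates $f'(x) = 3x^2 - 5x - 2$ at one such point in each interval.

```python
def f_prime(x: float) -> float:
    """Derivative of f(x) = x^3 - (5/2)x^2 - 2x + 3, i.e. 3x^2 - 5x - 2 = (3x + 1)(x - 2)."""
    return 3 * x**2 - 5 * x - 2

# one test point in each interval between the zeros -1/3 and 2
for label, x in [("x < -1/3", -1.0), ("-1/3 < x < 2", 0.0), ("x > 2", 3.0)]:
    sign = "+" if f_prime(x) > 0 else "-"
    print(f"{label}: f'({x}) = {f_prime(x)} ({sign})")
```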
Inverse functions
Example: Assume that a firm which is not a price taker faces a demand function:

$Q = 2000 - 50p$ (1)

Q = Q(p) is the quantity that can be sold at price p.

• How much will be sold if p = 16?

$p = 16 \Rightarrow Q = 2000 - 50 \cdot 16 = 2000 - 800 = 1200$
• What price should the firm set in order to sell 1200 units?
$$Q = 2000 - 50p \iff 50p = 2000 - Q \iff p = \frac{2000 - Q}{50} \iff p = 40 - \frac{Q}{50}$$
$Q = 1200 \Rightarrow p = 40 - 24 = 16$

p(Q) defines price as function of quantity.

Q(p) defines quantity as function of price.

Q(p) is the inverse of p(Q)

For every price $p_0$ and quantity $Q_0$:
$Q(p(Q_0)) = Q_0$ and $p(Q(p_0)) = p_0$, that is, $(Q \circ p)(Q_0) = Q_0$ and $(p \circ Q)(p_0) = p_0$.

In the example: p(Q(16)) = p(1200) = 16

Q(p(1200)) = Q(16) = 1200

The composite of two functions that are inverses of each other is a function that maps each element of its domain to itself.

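Let f be a function. If f is invertible, the inverse (function) of f is written $f^{-1}$.

$(f^{-1} \circ f)(x) = x$ for all x in the domain of f.
$(f \circ f^{-1})(x) = x$ for all x in the domain of $f^{-1}$, where $V_f = D_{f^{-1}}$ (the range of f is the domain of $f^{-1}$).

[Figure: points $x_1$ and $x_2$ in $D_f$ and a value $y_1$]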

A function f is invertible if and only if it is a one-to-one function, that is, if $f(x_1) \ne f(x_2)$ whenever $x_1 \ne x_2$.

1. If $f(x) = x + 1$ and $g(y) = y - 1$, then f and g are inverses (of each other):
$f(g(y)) = f(y - 1) = (y - 1) + 1 = y$ for all values of y.
(Show yourself that $g(f(x)) = x$ for all x.)
2. $f(x) = x^2$, $x \ge 0$, and $g(y) = \sqrt{y}$, $y \ge 0$, are inverses.
3. Find the inverse of f if $f(x) = \frac{x}{x - 1}$, $x \ne 1$.
For $x \ne 1$:
$$y = \frac{x}{x - 1} \iff y(x - 1) = x \iff yx - y = x \iff yx - x = y \iff x(y - 1) = y$$
Since $y = 1$ would give $0 = 1$, we must have $y \ne 1$, and then $x = \frac{y}{y - 1}$.
Thus, the function $g(y) = \frac{y}{y - 1}$, $y \ne 1$, is the inverse of f.
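To check example 3 numerically, one can compose $f(x) = x/(x - 1)$ with $g(y) = y/(y - 1)$ in both orders and confirm that the original value comes back. The sketch below is only such a spot check at a few points, not a proof.

```python
def f(x: float) -> float:
    """f(x) = x / (x - 1), defined for x != 1."""
    return x / (x - 1)

def g(y: float) -> float:
    """Inverse found in example 3: g(y) = y / (y - 1), defined for y != 1."""
    return y / (y - 1)

for x in (-3.0, 0.5, 2.0, 10.0):
    print(x, g(f(x)), f(g(x)))   # both composites give back x (up to rounding)
```

Note that f and g are given by the same rule here: this particular function happens to be its own inverse.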
A continuous function f defined on an interval is invertible if and only if it is strictly monotonic.

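If f is differentiable at a and $f'(a) \neq 0$, then $f^{-1}$ is differentiable at $f(a)$ with derivative

$$(f^{-1})'(f(a)) = \frac{1}{f'(a)}$$

*(Optional): Let $h(x) = (f^{-1} \circ f)(x)$. Since $h(x) = x$ for every x, $h'(x) = 1$.

But according to the chain rule,
$$\frac{dh}{dx}(x) = \frac{df^{-1}}{dy}(y)\cdot\frac{df}{dx}(x), \quad \text{where } y = f(x).$$
Hence $1 = (f^{-1})'(y)\, f'(x)$, so $(f^{-1})'(y) = \frac{1}{f'(x)}$ whenever $f'(x) \ne 0$.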
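The rule $(f^{-1})'(f(a)) = 1/f'(a)$ can be spot-checked with example 2 above, $f(x) = x^2$ on $x \ge 0$ with inverse $\sqrt{y}$. The sketch below (our own illustration) differentiates $\sqrt{y}$ numerically at $y = a^2$ and compares with $1/(2a)$.

```python
import math

def f_prime(a: float) -> float:
    """Derivative of f(x) = x^2 (restricted to x >= 0)."""
    return 2 * a

def inverse_prime_numeric(y: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient of f^{-1}(y) = sqrt(y); needs y > h."""
    return (math.sqrt(y + h) - math.sqrt(y - h)) / (2 * h)

for a in (0.5, 1.0, 3.0):
    # (f^{-1})'(f(a)) should equal 1 / f'(a)
    print(a, inverse_prime_numeric(a**2), 1 / f_prime(a))
```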

Exponential functions and logarithmic functions

The function $f(x) = a^x$ (where a is a constant > 0) is an exponential function (with base a).

Typical example: $K_t = K_0(1 + r)^t$, where $K_0$ is an amount of initial capital, r is the interest rate, and $K_t$ is the capital after t years.

The bigger x is, the bigger is $a^x$ for $a > 1$: if $a > 1$, then $a^x$ is a strictly increasing function of x.
The bigger x is, the smaller is $a^x$ for $0 < a < 1$: if $0 < a < 1$, then $a^x$ is a strictly decreasing function of x.

Example: $2^x$ with x = 1, 2, 3, 4 is equal to 2, 4, 8, 16,

and $(1/2)^x$ with x = 1, 2, 3, 4 is equal to 1/2, 1/4, 1/8, 1/16.

For any $a > 0$, $f(x) = a^x$ passes through the point (0, 1), because $a^0 = 1$.
For all numbers $a > 0$:
If $a > 1$: $x < 0 \Rightarrow a^x < 1$; $x = 0 \Rightarrow a^x = 1$; $x > 0 \Rightarrow a^x > 1$.
If $0 < a < 1$: $x < 0 \Rightarrow a^x > 1$; $x = 0 \Rightarrow a^x = 1$; $x > 0 \Rightarrow a^x < 1$.

For any numbers a and b such that $0 < a < b$:

$a^x < b^x$ if $x > 0$
$a^x = b^x = 1$ if $x = 0$
$b^x < a^x$ if $x < 0$

[Figure: graphs of exponential functions $a^x$ for different bases a]

$x < 0$: $2^x > 3^x$; $x = 0$: $2^x = 3^x$; $x > 0$: $2^x < 3^x$

What is the slope of the graph of an exponential function $f(x) = a^x$?

At x = 0 the slope of $y = 2^x$ is a little less than one and that of $y = 3^x$ a little more than one. There is a number e, 2 < e < 3, such that the slope of $y = e^x$ at x = 0 is exactly one.




[Figure: slope of $y = 2^x$, $y = e^x$, $y = 3^x$ and $y = x + 1$ at $x = 0$]

But at x = 0, $e^x = 1$, so the value of the function equals the value of the derivative. For this particular function, this is true for all values of x.

For the function $f(x) = e^x$, $f(x) = f'(x)$ for every number x!
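The claim about the slopes at x = 0 is easy to check numerically with a symmetric difference quotient. The sketch below (our own illustration) prints values close to 0.693 for base 2, 1 for base e and 1.099 for base 3; these are ln 2 and ln 3, in line with the formula $f'(x) = a^x \ln a$ derived later in these notes.

```python
import math

def slope_at_zero(a: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient of a^x at x = 0."""
    return (a**h - a**(-h)) / (2 * h)

for a in (2.0, math.e, 3.0):
    print(f"a = {a:.5f}: slope of a^x at x = 0 is about {slope_at_zero(a):.5f}")
```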

This function has an inverse: the natural logarithm, $y = \ln x$.

$y = e^x \iff x = \ln y$

"The (natural) logarithm of y is the exponent to which e must be raised in order to attain the value y."

It follows from the definition that

$e^{\ln y} = y$ and $\ln e^x = x$

Example: Assume that inflation is constant at 3 % per year. How should that be
visualised graphically? The diagrams show three different options.

[Figure, three panels: (1) rate of inflation; (2) price level; (3) logarithm of price level, ln P]

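The third graph conveys the visual impression of something that changes (unlike the first) but at an even pace (unlike the second). With constant inflation, $P_t = P_0(1.03)^t$, so $\ln P_t = \ln P_0 + t\ln 1.03$, which is a straight line in t.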
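A minimal sketch of the three panels' logic, assuming a hypothetical initial price level of 100: with constant 3 % inflation, the yearly increase in P grows over time, while the yearly increase in ln P is always ln(1.03).

```python
import math

P0 = 100.0        # hypothetical initial price level (not from the notes)
inflation = 0.03  # constant 3 % per year, as in the example
years = range(20)

levels = [P0 * (1 + inflation) ** t for t in years]
log_levels = [math.log(P) for P in levels]

# year-on-year changes of P and of ln P
level_steps = [b - a for a, b in zip(levels, levels[1:])]
log_steps = [b - a for a, b in zip(log_levels, log_levels[1:])]

print(round(level_steps[0], 3), round(level_steps[-1], 3))  # absolute increase grows over time
print(round(log_steps[0], 6), round(log_steps[-1], 6))      # log increase is the same every year
```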

Properties of the exponential and logarithmic functions

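$$\begin{array}{lll}
 & e^x & \ln y \quad (y, z > 0)\\
1. & D_{e^x} = \;]-\infty; \infty[\,;\ V_{e^x} = \;]0; \infty[ & D_{\ln y} = \;]0; \infty[\,;\ V_{\ln y} = \;]-\infty; \infty[\\
2. & e^0 = 1;\ e^1 = e & \ln 1 = 0;\ \ln e = 1\\
3. & e^x \text{ is strictly increasing} & \ln y \text{ is strictly increasing}\\
4. & e^r = e^s \iff r = s & \ln y = \ln z \iff y = z\\
5. & e^{r+s} = e^r e^s & \ln y + \ln z = \ln yz\\
6. & e^{r-s} = e^r / e^s & \ln y - \ln z = \ln (y/z)\\
7. & (e^r)^s = e^{rs} & \ln y^p = p\ln y\\
8. & D(e^x) = e^x \text{ for all } x & D(\ln y) = \frac{1}{y},\ y > 0
\end{array}$$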


1. Since ln y is the inverse of $e^x$, $D_{\ln y} = V_{e^x}$ and conversely. $e^x$ can take all positive values but cannot be zero or negative, so the range of the exponential function, and hence the domain of the logarithm, is the positive numbers. $e^x$ is defined for all values of x, so the logarithm can take any real number as its value.

2. According to the definition of an inverse function:

$\ln 1 = \ln e^0 = 0$ and $\ln e = \ln e^1 = 1$

3. The inverse of a strictly increasing function is always strictly increasing!

4.Since the ln-function is strictly increasing it is one-to-one. Therefore the same value of the
function (the same logarithm) cannot correspond to two different values of the variable.
(Analogously, of course, for the exponential function.)

5. Take any two numbers x, y >0. By the definition of ln and the usual power rules

$xy = e^{\ln x}e^{\ln y} = e^{\ln x + \ln y}$. But $xy = e^{\ln xy}$ by definition. Therefore, $e^{\ln x + \ln y} = e^{\ln xy}$.

Since the exponential function is one-to-one this means that
ln x + ln y = ln xy.

6. Analogous to the proof of point 5. Try to show it as an exercise!

7. $e^{\ln(x^p)} = x^p = (e^{\ln x})^p = e^{(\ln x)p}$ according to the power rules.
Since $e^x$ is one-to-one, it follows that the two exponents must be equal: $\ln x^p = p\ln x$.

8. According to the rule for the derivatives of inverse functions, with $y = e^x$ (so that $x = \ln y$):

$$\frac{d}{dy}(\ln y) = \frac{1}{\frac{d}{dx}e^x} = \frac{1}{e^x} = \frac{1}{y}$$
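The rules above can be spot-checked numerically with Python's `math.log` and `math.exp`. The sketch below uses arbitrary test values of our own choosing; a few agreeing decimals are of course not a proof, which the notes give above.

```python
import math

y, z, p = 2.5, 7.0, 3.0   # arbitrary positive test values

print(math.log(y) + math.log(z), math.log(y * z))    # property 5: ln y + ln z = ln yz
print(math.log(y) - math.log(z), math.log(y / z))    # property 6: ln y - ln z = ln(y/z)
print(math.log(y ** p), p * math.log(y))             # property 7: ln y^p = p ln y
print(math.exp(math.log(y)), math.log(math.exp(p)))  # e^(ln y) = y and ln e^x = x

h = 1e-6
numeric = (math.log(y + h) - math.log(y - h)) / (2 * h)
print(numeric, 1 / y)                                # property 8: d/dy ln y = 1/y
```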
Using logarithms to simplify.
Taking logarithms can make calculations easier, because exponential relations are replaced by products, and products/quotients by sums/differences.

Example 1. Cobb-Douglas function. (Used both for production and utility functions.)

$$Q = AK^{\alpha}L^{\beta}N^{\gamma} \iff \ln Q = \ln A + \alpha\ln K + \beta\ln L + \gamma\ln N$$
You get a linear relation between ln Q and the variables ln K, ln L and ln N.
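A minimal sketch of the log-linear form, assuming hypothetical parameter values A = 2, α = 0.3, β = 0.5, γ = 0.2 (none of these come from the notes). It shows that the log of the Cobb-Douglas output equals the linear expression, and that doubling L always adds the same constant β ln 2 to ln Q.

```python
import math

A, alpha, beta, gamma = 2.0, 0.3, 0.5, 0.2   # hypothetical parameter values

def Q(K: float, L: float, N: float) -> float:
    """Cobb-Douglas function Q = A K^alpha L^beta N^gamma."""
    return A * K**alpha * L**beta * N**gamma

def ln_Q(K: float, L: float, N: float) -> float:
    """The same relation after taking logs: linear in ln K, ln L and ln N."""
    return math.log(A) + alpha * math.log(K) + beta * math.log(L) + gamma * math.log(N)

K, L, N = 10.0, 4.0, 9.0
print(math.log(Q(K, L, N)), ln_Q(K, L, N))
# doubling L adds the constant beta * ln 2 to ln Q, whatever K, L and N are
print(ln_Q(K, 2 * L, N) - ln_Q(K, L, N), beta * math.log(2))
```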

Example 2. Wage functions. Assume that, on average, the wage of a person increases by p percent with each year of schooling and by r percent with each year of work experience (with p and r written as decimal fractions, e.g. p = 0.05 for 5 %). The wage of someone with s years of schooling and x years of work experience is then expected to be
$w = w_0(1 + p)^s(1 + r)^x$
Empirically it is easier to estimate
$\ln w = \ln w_0 + s\ln(1 + p) + x\ln(1 + r)$, because this is a linear relation in s and x.
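A small sketch of why the log form is linear, assuming hypothetical values $w_0 = 100$, p = 0.08 and r = 0.02: each extra year of schooling adds the same amount, ln(1 + p), to ln w, whatever the starting point.

```python
import math

w0, p, r = 100.0, 0.08, 0.02   # hypothetical: +8 % per year of schooling, +2 % per year of experience

def wage(s: int, x: int) -> float:
    """w = w0 (1 + p)^s (1 + r)^x for s years of schooling and x years of experience."""
    return w0 * (1 + p) ** s * (1 + r) ** x

# In logs the relation is linear: one more year of schooling adds ln(1 + p) to ln w,
# regardless of how many years the person already has.
for s in (8, 12, 16):
    print(s, math.log(wage(s + 1, 5)) - math.log(wage(s, 5)), math.log(1 + p))
```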

The general exponential function $a^x$, $a > 0$
Let $f(t) = a^t$.

$$\frac{f(t + 1) - f(t)}{f(t)} = \frac{a^{t+1} - a^t}{a^t} = \frac{a^t a^1 - a^t \cdot 1}{a^t} = \frac{a^t(a - 1)}{a^t} = a - 1$$
The relative change from time t to t + 1 is independent of t.
f increases at a constant rate of $(a - 1)\cdot 100$ percent per unit of time (a negative rate, i.e. a decrease, if $a < 1$).

$$f(x) = a^x = (e^{\ln a})^x = e^{x\ln a}$$
According to the chain rule,

$$f'(x) = e^{x\ln a}\ln a = a^x\ln a$$
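As a sanity check of $f'(x) = a^x \ln a$, the sketch below compares the formula with a symmetric difference quotient for a few bases (including one below 1, where the derivative is negative) and a few points. The helper name `numeric_derivative` is our own.

```python
import math

def numeric_derivative(f, x: float, h: float = 1e-6) -> float:
    """Symmetric difference quotient (f(x + h) - f(x - h)) / 2h."""
    return (f(x + h) - f(x - h)) / (2 * h)

for a in (0.5, 2.0, 10.0):
    for x in (-1.0, 0.0, 1.5):
        exact = a**x * math.log(a)                      # f'(x) = a^x ln a
        approx = numeric_derivative(lambda t: a**t, x)  # f(t) = a^t
        print(f"a = {a}, x = {x}: {exact:.6f} vs {approx:.6f}")
```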

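Differentiation of composite functions involving exponentials:

$y(x) = 2^{\sqrt{5x}}$

$y = e^{\ln 2 \cdot \sqrt{5x}} = e^{(\ln 2)\sqrt{5x}} = e^z$, where $z = (\ln 2)\,v$, $v = \sqrt{u}$ and $u = 5x$

$$\frac{dy}{dx} = \frac{dy}{dz}\cdot\frac{dz}{dv}\cdot\frac{dv}{du}\cdot\frac{du}{dx} = e^z(\ln 2)\,\frac{1}{2\sqrt{u}}\cdot 5 = \frac{(5\ln 2)\,e^{(\ln 2)\sqrt{5x}}}{2\sqrt{5x}} = \frac{(5\ln 2)\,2^{\sqrt{5x}}}{2\sqrt{5x}}$$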

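The chain-rule result for $y(x) = 2^{\sqrt{5x}}$ can be compared with a numerical derivative at a few points x > 0. The sketch below is only an illustration; at x = 0.2 we have $\sqrt{5x} = 1$, so the formula reduces to $5\ln 2$, which makes a handy hand check.

```python
import math

def y(x: float) -> float:
    """y(x) = 2^(sqrt(5x)), defined for x >= 0."""
    return 2 ** math.sqrt(5 * x)

def dy_dx(x: float) -> float:
    """Chain-rule result (5 ln 2) 2^(sqrt(5x)) / (2 sqrt(5x)), valid for x > 0."""
    return 5 * math.log(2) * 2 ** math.sqrt(5 * x) / (2 * math.sqrt(5 * x))

h = 1e-6
for x in (0.2, 1.0, 3.0):
    numeric = (y(x + h) - y(x - h)) / (2 * h)
    print(f"x = {x}: formula {dy_dx(x):.6f}, numerical {numeric:.6f}")
```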

To optimise: to find the value (or values) of the exogenous variables for which a function takes its largest or smallest value.

Typical microeconomic optimisation problems:

• Which level of production leads to maximum profit for a firm?
• Or which combination of products?
• Which combination of inputs allows the production of a particular quantity of a good at the lowest possible cost?
• What level of savings maximises lifetime utility for a consumer?
• How should a person divide her time between work and leisure in order to maximise her utility?

Assume that there is one variable input F in a production process. (We can assume that there
are other inputs but that they are fixed in the short run.) The graph shows production (Q) as a
function of this input.


[Figure: production Q as a function of the input F, with the maximum at F*]

For example, let Q be the grain produced on a certain area as a function of the amount of fertiliser. (We assume that all other inputs to production, such as land, water and labour, are constant.) Up to a certain level, adding more fertiliser increases production, but to a smaller and smaller extent. At some point, the soil is over-fertilised and the crop actually decreases if more is added. This means that there is an amount of fertiliser that maximises the crop: the maximum output $Q_{max}$ is obtained with F* kilograms of fertiliser.

But inputs usually have a cost. The next graph shows average cost for a firm (still assuming
one variable input). The firm wants to produce as cheaply as possible. At low levels of
production there is usually some slack, so average cost decreases when more is produced
since unused capacity is utilised. There are economies of scale. If production is increased
even more, at some stage there will be bottle-necks, machine capacity, floor space or the time
of workers with special skills become scarce and so on. Average cost increases and we have
the typical U-shaped average cost-curve with a minimum point.


[Figure: U-shaped average cost curve; economies of scale to the left of the minimum, bottlenecks to the right]

In these examples the functions had one maximum or minimum point, but of course that is not always the case.

Extreme points

Definition: Let f be a function and c a point in Df.

If $f(c) \le f(x)$ for all other points x in the domain of f, then c is a (global) minimum point of f and f(c) is the minimum value of f.

If $f(c) \ge f(x)$ for all other points x in the domain of f, then c is a (global) maximum point of f and f(c) is the maximum value of f.

(If ≥ is changed to > and ≤ to <, then c is a strict maximum/minimum point.)

If $f(c) \le f(x)$ for all points x in some interval containing c, then c is a local minimum point of f.
If $f(c) \ge f(x)$ for all points x in some interval containing c, then c is a local maximum point of f.

In both the examples there was a ”peak” or a ”low point” which could easily be seen to be a
global maximum or minimum for the function, whatever the value of the variable. But the
graph of a function may very well have several peaks or low points, where the function takes
different values. Such points are maximum or minimum points for the function within some
part of the domain.

It may be that we are only interested in the largest and smallest values of a function in some
particular interval, for instance when a firm tries to maximize profit by choosing a level of
production between zero and a capacity limit.


This function has three local maximum points, one of which is a global maximum. It has two local minimum points, but neither of them is a global minimum, since the function takes smaller values for very large and very small (large negative) values of the variable. Choose different intervals on the x-axis and find the largest and smallest values of f in each interval. It turns out that they are all either at one of the end points of the interval or at one of the peaks or low points. At a "peak" the slope of the function changes from positive (for smaller values of x) to negative (for larger values of x). At a "low point" the slope of the graph changes from negative (for smaller values of x) to positive (for larger values of x). Thus, the slope, and hence the derivative, changes sign at these points. If the derivative is continuous, it has to be equal to zero at the point where it changes sign.

(Why? Assume that you are supposed to draw the graph of a function in the system of
coordinates below. The function should be continuous – that is to say, you should draw the
graph as one contiguous line. The value of the function should be above zero for all numbers
to the left of c and below zero for all numbers to the right of c. This means that the graph
must be contained in the shaded areas. Is there any way of drawing it so that it goes from the
striped field to the dotted without passing through (c, 0)?)

Definition: If f is differentiable at c and $f'(c) = 0$, then c is called a stationary point or a critical point of f.

Example: $f(x) = \frac{x^5}{5} - \frac{x^4}{2} - \frac{7x^3}{3} + 4x^2 + 12x + 1$





[Figure: graph of f]

If you draw a tangent to this graph at some point where x > 3, it slopes upwards to the right; in other words, the derivative is positive. At points between 2 and 3, the tangent slopes downwards, i.e. the derivative is negative. At x = 3 (and at x = 2, x = −1 and x = −2) the tangent is horizontal: its slope, the derivative, is zero.
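Differentiating the example gives $f'(x) = x^4 - 2x^3 - 7x^2 + 8x + 12$, which factorises (our own step) as $(x + 2)(x + 1)(x - 2)(x - 3)$. The sketch below checks the statements about the tangent by evaluating this derivative; the constant term of f plays no role here.

```python
def f_prime(x: float) -> float:
    """f'(x) = x^4 - 2x^3 - 7x^2 + 8x + 12 for the example function above."""
    return x**4 - 2 * x**3 - 7 * x**2 + 8 * x + 12

# horizontal tangent at x = -2, -1, 2 and 3 ...
print([f_prime(c) for c in (-2, -1, 2, 3)])
# ... f' < 0 between 2 and 3, and f' > 0 for x > 3
print(f_prime(2.5), f_prime(4))
```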

If f takes a largest/smallest value in an interval I, this happens either:

• at an end point of the interval,
• at a point where f is not differentiable, or
• at a stationary point.

Examples:
1. $f(x)$, $x \in [1, 2]$: largest and smallest values at the end points of the interval.

2. $y = |x|$: minimum at x = 0, where f is not differentiable.

3. $f(x) = x^2$, $x \in [-1, 1]$

[Figure: graph of $f(x) = x^2$ on $[-1, 1]$]

Maximum at x = 1 and x = −1 (end points).

Minimum at x = 0 (stationary point).

BUT not all stationary points are extreme points!

Example: $f(x) = x^3$, $f'(x) = 3x^2$
As the graph below shows, $f'(0) = 0$, but f does not have a minimum or maximum at x = 0.

[Figure: graph of $f(x) = x^3$]
Zero is an inflection point of f. (We’ll return to the definition of an inflection point later.)

Assume that c is a stationary point of the function f. Then:

• If the function f is increasing to the left of c and decreasing to the right of c, then c is a local maximum point of f.
• If the function f is decreasing to the left of c and increasing to the right of c, then c is a local minimum point of f.

If f is differentiable, its derivative can be used to determine whether a stationary point is a maximum, a minimum or an inflection point.

Let c be a point such that $f'(c) = 0$. If c is inside an interval (a, b) where f is differentiable and
1. $[a \le x \le c \Rightarrow f'(x) \ge 0]$ and $[c \le x \le b \Rightarrow f'(x) \le 0]$, then c is a local maximum point of f.
2. $[a \le x \le c \Rightarrow f'(x) \le 0]$ and $[c \le x \le b \Rightarrow f'(x) \ge 0]$, then c is a local minimum point of f.
3. If $f'(x) > 0$ for all x with $a \le x \le b$, $x \ne c$ (or $f'(x) < 0$ for all such x), then c is an inflection point of f.
(As an exercise, formulate the conditions in words!)
1. Assume that $f'(x) \ge 0$ for $a \le x \le c$. That means that for all points to the left of c (but to the right of a) the derivative is non-negative (positive or zero), which in turn means that the function is increasing. To the right of c, the derivative is non-positive, so the function is decreasing. Therefore c must be a local maximum.
2. The proof is analogous. Do it yourself as an exercise!
3. The function is strictly increasing for all values of x both smaller and larger than c, so c can be neither a maximum nor a minimum point. Analogously, if f is strictly decreasing for values of x both to the left and to the right of c, c can be neither a maximum nor a minimum point.

Example 1: The average cost curve! (See figure and derivative above!) We found that the derivative of the AC-function is equal to $\frac{MC - AC}{Q}$ (with Q > 0), which is zero where MC = AC, negative to the left of this point, where MC < AC, and positive to the right of it, where MC > AC.

Example 2: Find and examine all stationary points of f if $f(x) = 2x^3 - 27x^2 + 84x - 130$.

$f'(x) = 6x^2 - 54x + 84 = 6(x^2 - 9x + 14)$
$f'(x) = 0 \iff x^2 - 9x + 14 = 0$
$x^2 - 9x + 14 = 0 \iff x = 7$ or $x = 2$

One method of finding the sign of a complicated function is to write it as the product of factors that are as simple as possible. Then make a sign diagram which shows the intervals where each of these factors is positive and negative, respectively. At a point where an odd number of factors are negative (and none is zero), the product is negative. Where an even number of factors are negative, the product is positive. Since 2 and 7 are zeros of f', f'(x) can be written as $f'(x) = 6(x - 7)(x - 2)$ according to the factor theorem.

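$$\begin{array}{l|ccccc}
 & x < 2 & x = 2 & 2 < x < 7 & x = 7 & x > 7\\ \hline
6 & + & + & + & + & +\\
x - 2 & - & 0 & + & + & +\\
x - 7 & - & - & - & 0 & +\\ \hline
f'(x) & + & 0 & - & 0 & +
\end{array}$$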

This function does not have any global maximum or minimum.

It has a local maximum for x=2, a local minimum for x=7

[Figures: graphs of f, of $f'(x) = 6x^2 - 54x + 84$ and of $f''(x) = 12x - 54$]
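The sign diagram for $f(x) = 2x^3 - 27x^2 + 84x - 130$ can be reproduced in a few lines. The sketch below (our own illustration) evaluates $f'(x) = 6(x - 2)(x - 7)$ at the two zeros and at one test point in each interval, then classifies each stationary point from the sign change. The test points are only valid because f' has no other zeros in between.

```python
def f_prime(x: float) -> float:
    """f'(x) for f(x) = 2x^3 - 27x^2 + 84x - 130, in factored form 6(x - 2)(x - 7)."""
    return 6 * (x - 2) * (x - 7)

def sign(v: float) -> str:
    return "+" if v > 0 else ("-" if v < 0 else "0")

# the two zeros of f' and one test point in each of the three intervals they create
points = [0, 2, 4.5, 7, 10]
print([sign(f_prime(x)) for x in points])

def classify(left: float, right: float) -> str:
    """Classify a stationary point from the sign of f' at test points just left and right of it."""
    if f_prime(left) > 0 > f_prime(right):
        return "local maximum"
    if f_prime(left) < 0 < f_prime(right):
        return "local minimum"
    return "neither (no sign change)"

print("x = 2:", classify(0, 4.5), "| x = 7:", classify(4.5, 10))
```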

To find the largest and smallest value of a (continuous) function f in an interval [a, b], calculate the value of f:
• at all points in [a, b] where f' = 0,
• at all points in [a, b] where f' does not exist (where f is not differentiable),
• at the end points a and b.
The biggest/smallest of these is the biggest/smallest value of f on [a, b].

Example 1: Find the smallest and largest value of
$f(x) = 2x^3 - 27x^2 + 84x - 130$ in the interval [1, 5].
1. Stationary points: x = 2 and x = 7, but 7 is not in [1, 5].
$f(2) = 2 \cdot 2^3 - 27 \cdot 2^2 + 84 \cdot 2 - 130 = 16 - 108 + 168 - 130 = -54$
2. Points where f is not differentiable? Polynomials are always differentiable!
3. End points: $f(1) = 2 - 27 + 84 - 130 = -71$
$f(5) = 250 - 675 + 420 - 130 = -135$
Largest value: f(2) = −54; smallest value: f(5) = −135.
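The three-step recipe translates directly into code. The sketch below (our own illustration) collects the stationary points that lie in [1, 5] and the two end points, evaluates f at each candidate and picks the largest and smallest value. There is no step for non-differentiable points because f is a polynomial.

```python
def f(x: float) -> float:
    return 2 * x**3 - 27 * x**2 + 84 * x - 130

a, b = 1, 5
# zeros of f'(x) = 6(x - 2)(x - 7) that lie in [a, b]
stationary = [x for x in (2, 7) if a <= x <= b]
# a polynomial is differentiable everywhere, so only the end points remain to add
candidates = stationary + [a, b]
values = {x: f(x) for x in candidates}
print(values)                                                   # {2: -54, 1: -71, 5: -135}
print(max(values, key=values.get), min(values, key=values.get))  # 2 5
```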

Example 2: Profit maximization.

Assume that the profit of a firm is given by the differentiable function Π(Q):
$\Pi(Q) = R(Q) - C(Q)$, $0 \le Q \le Q_{max}$
Q: quantity produced, R: revenue, C: cost, $Q_{max}$: maximum capacity
$\Pi'(Q) = R'(Q) - C'(Q)$
$\Pi'(Q) = 0 \iff R'(Q) - C'(Q) = 0 \iff R'(Q) = C'(Q)$
If there is a quantity that maximizes profit, it is either:
• zero,
• the maximum capacity, or
• a quantity for which MR = MC (marginal revenue = marginal cost).

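A sketch of the candidate list above on a hypothetical example that is not from the notes: R(Q) = 100Q − Q² and C(Q) = 20Q + 500, so MR = 100 − 2Q and MC = 20 meet at Q = 40. Depending on the capacity, the best candidate is either that interior point or the capacity limit.

```python
def profit(Q: float) -> float:
    """Hypothetical example: R(Q) = 100Q - Q^2 and C(Q) = 20Q + 500."""
    return (100 * Q - Q**2) - (20 * Q + 500)

def best_quantity(Q_max: float) -> float:
    """Compare profit at zero, at capacity, and where MR = MC (if that lies in [0, Q_max])."""
    Q_mr_mc = (100 - 20) / 2   # MR = 100 - 2Q equals MC = 20 at Q = 40
    candidates = [0, Q_max]
    if 0 <= Q_mr_mc <= Q_max:
        candidates.append(Q_mr_mc)
    return max(candidates, key=profit)

print(best_quantity(60))   # the interior point where MR = MC
print(best_quantity(30))   # MR = MC lies beyond capacity, so capacity is best
```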
Higher order derivatives

The derivative f’ of the function f is also a function.
If f’ is differentiable, its derivative is called the second derivative of f and written as f’’.
If the second derivative is differentiable its derivative is called the third derivative and
written as f(3) and so on.

Example: $f(x) = 2x^3 - 27x^2 + 84x - 130$
$f'(x) = 6x^2 - 54x + 84$, $f''(x) = 12x - 54$, $f^{(3)}(x) = 12$
$f^{(4)}(x) = 0$, $f^{(5)}(x) = 0$, $f^{(6)}(x) = 0$*

*As you can see, all the derivatives of order four and higher are zero for every value of x. (They are said to be identically zero.) This means that they are all equal to the constant function y(x) = 0, whose graph is the horizontal axis. It does not mean that these derivatives don't exist! Every polynomial is differentiable any number of times.

Convexity and concavity
The first order derivative (the ordinary derivative) shows how the function changes when the variable changes. The second order derivative shows how the first derivative, the slope of the function, changes. If, for example, the function represents the distance which a vehicle has moved as a function of time, the difference quotient between two points in time is the distance travelled during this time interval divided by the time it has taken; in other words, the average speed of the vehicle. The derivative at a particular time t corresponds to the velocity of the vehicle at that moment. The second derivative represents the change in the derivative, in this example the change in velocity, that is to say the acceleration (or deceleration) of the vehicle.

The second derivative indicates how the slope of the graph changes and therefore tells us something about the shape of the graph.

Let f be twice differentiable in the interval [a, b] (i.e. f''(x) exists for all x in the interval).

If $f''(x) \ge 0$ for all x in [a, b], then f is convex in [a, b].

If $f''(x) \le 0$ for all x in [a, b], then f is concave in [a, b].
(If nothing else is said, convex/concave is understood to mean convex/concave to the origin.)

What if $f''(x) = 0$ for all x in [a, b]? Since f'' is the derivative of f' and f'' = 0 for all x in the interval, f' must be constant. But if f', the slope of f, is constant, then f must be a linear function. Thus, a function is both convex and concave in an interval if and only if it is linear in the interval.

If $f''(x) > 0$ for all x in [a, b], then f is strictly convex in [a, b].

If $f''(x) < 0$ for all x in [a, b], then f is strictly concave in [a, b].

[Figure: a convex function, with some of its tangents]

As x increases from left to right, the graph changes from “sloping steeply downwards to the
right” to “sloping downwards to the right but not so steeply”, to flat at the minimum point, to
“sloping upwards to the right but not steeply”, to “sloping steeply upwards to the right”. The
slope of the graph and of its tangents increases from "large negative" to "small
negative" to "small positive" to "large positive".

[Figure: a concave function, with some of its tangents]

The slope of the tangents (the derivative) is decreasing.

A convex/concave function can be either increasing or decreasing:

[Figure, four panels: convex and increasing; convex and decreasing; concave and increasing; concave and decreasing]

Note: A function can be convex in one interval and concave in another.

Where a function is strictly convex:

• Tangents at all points of the graph are below the graph.
• Every line from one point on the graph to another point on the graph (a secant) lies above the graph between the two points.
• If the function has a local extreme point, it must be a local minimum.

To understand the first, start from a point c on the graph. (For simplicity, assume that the function is increasing; the reasoning is analogous but less simple if it is decreasing*.) Follow the tangent at that point rightwards. The slope of the tangent is f'(c). But f'' > 0 at c and at all points to the right of c. So if you follow the graph instead, the slope starts at f'(c) and gets bigger and bigger. You can think of the slope of the linear tangent as the velocity of a car that is going at a constant speed, and the slope of the graph with a positive second derivative as the velocity of a car that is accelerating. If they have the same speed to begin with, the accelerating car must be getting ahead of the car that keeps going at the constant speed. The graph must be going more steeply upwards than the tangent.

Proofs of the other two are too complicated for this course; those who are interested are referred to a textbook in calculus.

*If the function is decreasing and you want to keep the car metaphor, the car has a negative velocity: you have to think of it as reversing. If the second derivative is positive, the velocity is increasing. But when a negative number increases, it gets "less negative": the "graph car" is reversing, but more and more slowly, while the "tangent car" keeps reversing at a constant speed and gets further behind on the road.

Where a function is strictly concave:
• Tangents at all points of the graph are above the graph.
• Every line from one point on the graph to another point on the graph (a secant) lies below the graph between the two points.
• If the function has a local extreme point, it must be a local maximum.

A point where the function changes from convex to concave (or from concave to convex) is
called an inflection point.
For a twice differentiable function, an inflection point is where f’’ is zero and changes sign.

Example: $f(x) = x^4$ and $g(x) = x^3$

$f'(x) = 4x^3$ and $g'(x) = 3x^2$
$f''(x) = 12x^2$ and $g''(x) = 6x$.

$f''(0) = g''(0) = 0$, but only g'' changes sign. f'' is positive both for small negative and for small positive numbers.

$$\begin{array}{l|ccc}
 & x < 0 & x = 0 & x > 0\\ \hline
g''(x) = 6x & - & 0 & +\\
f''(x) = 12x^2 & + & 0 & +
\end{array}$$

x = 0 is an inflection point of $y = x^3$ but not of $y = x^4$.

x = 0 is also a stationary point of both $y = x^3$ and $y = x^4$. Thus, it is a stationary point as well as an inflection point of $y = x^3$. But an inflection point does not have to be stationary, just as a stationary point does not have to be an inflection point. For example, x = 1 is an inflection point of the function $y = x^3 - 3x^2$, but it is not a stationary point. (Check both statements as an exercise!)

Second order conditions for maximum/minimum:
If f is twice differentiable in an interval I, c is a point inside I (not an end point) and $f'(c) = 0$, then:
$f''(c) < 0 \Rightarrow$ c is a local maximum
$f''(c) > 0 \Rightarrow$ c is a local minimum
If $f''(c) = 0$ too, c may be a local minimum, a local maximum or an inflection point.

The second derivative helps us determine the character of a stationary point, c.

Intuitively: Assume that f is twice differentiable and f’’> 0 on both sides of c.
f''>0 means that f' is strictly increasing.
If f'(c) = 0 and f' is strictly increasing f' must be less than zero (f decreasing) for values of x a
little smaller than c (to the left of c). And f' must be larger than f'(c)=0 to the right of c (f
increasing). But that means that c is a local minimum.
Analogously, if f’’<0 on both sides of c, c is a local maximum.

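$$\begin{array}{l|ccc}
 & x < c & x = c & x > c\\ \hline
f'' & > 0 & & > 0\\
f' \text{ (increasing)} & < 0 & 0 & > 0\\
f & \searrow & \text{min} & \nearrow
\end{array}$$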

Example 1: Let $f(x) = x^2e^x$. Does it have any local maxima or minima?
Let $g(x) = x^2$ and $h(x) = e^x$. Then $f = gh$, $g'(x) = 2x$ and $h'(x) = e^x$. According to the product rule for derivatives:
$$f'(x) = g'(x)h(x) + g(x)h'(x) = 2xe^x + x^2e^x = (2x + x^2)e^x = x(2 + x)e^x$$
$f'(x) = 0 \iff x = 0$ or $2 + x = 0$ or $e^x = 0 \iff x = 0$ or $x = -2$ (since $e^x > 0$ for all x).
$f'(x) = u(x)v(x)$, where $u(x) = 2x + x^2$ and $v(x) = e^x$:
$$f''(x) = u'(x)v(x) + u(x)v'(x) = (2 + 2x)e^x + (2x + x^2)e^x = (2 + 4x + x^2)e^x$$
$f''(0) = (2 + 4 \cdot 0 + 0^2)e^0 = 2 \cdot 1 = 2 > 0$
x = 0 is a local minimum.
$f''(-2) = (2 + 4 \cdot (-2) + (-2)^2)e^{-2} = (2 - 8 + 4)e^{-2} = -2e^{-2} < 0$

x = −2 is a local maximum.
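The second-order test for $f(x) = x^2e^x$ can be written as a tiny function. The sketch below is our own illustration: it evaluates $f''(x) = (2 + 4x + x^2)e^x$ at the two stationary points and reports the verdict; when $f''(c) = 0$ it deliberately gives no answer, as the notes say.

```python
import math

def f_second(x: float) -> float:
    """f''(x) = (2 + 4x + x^2) e^x for f(x) = x^2 e^x."""
    return (2 + 4 * x + x**2) * math.exp(x)

def classify(c: float) -> str:
    """Second-order test at a stationary point c (where f'(c) = 0)."""
    value = f_second(c)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    return "test inconclusive"

for c in (0.0, -2.0):   # the zeros of f'(x) = x(2 + x)e^x
    print(c, f_second(c), classify(c))
```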

As an exercise, show this using a sign diagram!

Example 2: $f(x) = 2x^3 - 27x^2 + 84x - 130$
$f'(x) = 6x^2 - 54x + 84$, $f''(x) = 12x - 54$
We have found that $f'(x) = 0$ implies x = 2 or x = 7.
$f''(2) = 24 - 54 = -30 < 0 \Rightarrow$ x = 2 is a local maximum.
$f''(7) = 84 - 54 = 30 > 0 \Rightarrow$ x = 7 is a local minimum.
Inflection points? $12x - 54 = 0 \iff x = 4.5$
$x < 4.5 \Rightarrow f'' < 0$ and $x > 4.5 \Rightarrow f'' > 0$.
x = 4.5 is an inflection point. (As we see, an inflection point doesn't have to be a stationary point.)

Example 3. Assume that the owner of a forest wants to determine when to cut the wood. If he does it today, he gets K SEK, K > 0. If he waits t years, the value will be $V = Ke^{\sqrt{t}}$. On the other hand, while he waits he loses the interest that he would have earned on the money if the wood had been sold. Discounting to present value, the value of cutting at time t is

$$V(t) = \frac{Ke^{\sqrt{t}}}{(1 + r)^t}$$
Method 1: Differentiate directly, using the quotient rule and the chain rule.
Method 2: Use the logarithm.
We will use Method 2. The logarithm is a strictly increasing function:
$a > b \iff \ln a > \ln b$.
The value of t that maximises ln V(t) also maximises V(t), and vice versa.

$$Y(t) = \ln V = \ln K + \ln e^{\sqrt{t}} - \ln(1 + r)^t = \ln K + \sqrt{t} - t\ln(1 + r)$$

$$Y'(t) = \frac{1}{2\sqrt{t}} - \ln(1 + r)$$

$$Y'(t) = 0 \iff \frac{1}{2\sqrt{t}} = \ln(1 + r) \iff 2\sqrt{t} = \frac{1}{\ln(1 + r)} \iff t = \frac{1}{4(\ln(1 + r))^2}$$

Is it a maximum or a minimum?
$$Y''(t) = -\frac{1}{4t^{3/2}} < 0$$
What we have found is a maximum point.
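A numerical cross-check of the forest example, assuming hypothetical values K = 1 and r = 0.05 (the optimal t does not depend on K). The sketch compares the formula $t^* = 1/(4(\ln(1 + r))^2)$ with a crude search over a grid of 0.1-year steps; with r = 0.05 both come out at roughly 105 years.

```python
import math

K, r = 1.0, 0.05   # hypothetical values; the optimal t does not depend on K

def V(t: float) -> float:
    """Present value of cutting at time t: K e^sqrt(t) / (1 + r)^t."""
    return K * math.exp(math.sqrt(t)) / (1 + r) ** t

t_star = 1 / (4 * math.log(1 + r) ** 2)   # from Y'(t) = 0

# crude check: search a grid of 0.1-year steps from 0.1 to 300 years
grid_best = max((k / 10 for k in range(1, 3001)), key=V)
print(round(t_star, 2), grid_best)
```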

n-dimensional space
In economics, as in all social sciences, practically all phenomena we want to study depend on
many more factors than one. To use functions of one variable in a model, we must be sure
that it is meaningful and reasonable to disregard all exogenous variables except one, or to
keep them constant. But very often this is not the case and we have to model economic
relations as multivariate functions. For instance, consumers with high income may not only buy a different quantity of a particular good at each price than consumers with low income; their demand curve may also have a different slope.

Example: Assume that demand, q, for a good depends both on its price, p and on the
consumers’ income, y. This can be written as
q=q(p, y)
and is read as ”q is a function of the ordered pair (p, y)”.
If demand for the good is a function of its own price $p_1$ and the prices of n − 1 other goods, and we call the prices of the other goods $p_2, p_3, \ldots, p_n$, then q may be written as:
$q = q(p_1, p_2, \ldots, p_n)$
"q is a function of the n variables $p_1, p_2, \ldots, p_n$"
"q is a function of the ordered n-tuple $(p_1, p_2, \ldots, p_n)$"
("q is a function of the (n-dimensional) vector $\mathbf{p} = (p_1, p_2, \ldots, p_n)$")

Level curves
An ordered pair of real numbers can be seen as a point in a two-dimensional system of coordinates.
A function of one variable can be illustrated by a graph (a line) in two dimensions.
A function of two variables can be illustrated by a graph (a surface) in three dimensions but
also by a two-dimensional representation by using level curves.

Definition Let the function f be defined as z=f(x, y).

For each number k in the range of f, the corresponding level curve consists of all the points
(x, y) such that f(x, y) = k.

Examples: altitude on maps, isobars on weather maps, isocost curves, indifference curves.

[Figure: three level curves of the utility function U(x, y), where x and y denote the quantities consumed of two different goods]

1. Let $G(x, y) = x^2y - 2y$.
What is the value of G for x = 2 and y = 4?
Answer: $G(2, 4) = 2^2 \cdot 4 - 2 \cdot 4 = 4 \cdot 4 - 8 = 8$
2. Let utility be $U(x, y) = x^{1/2}y^{1/3}$, where x and y are the quantities consumed of two goods.
Which of the following three "consumption baskets" gives the consumer the highest utility?
x = 25, y = 8
x = 4, y = 125
x = 12.25, y = 27
Answer: $U(25, 8) = 5 \cdot 2 = 10$, $U(4, 125) = 2 \cdot 5 = 10$, $U(12.25, 27) = 3.5 \cdot 3 = 10.5$
The first two bundles are on the same indifference curve, but (12.25, 27) is on a higher one.
(Remember that, for non-negative numbers, $z = x^{1/2} \iff z^2 = x$ and $z = y^{1/3} \iff z^3 = y$.)
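The comparison of the three baskets is easy to reproduce. The sketch below (our own illustration) evaluates $U(x, y) = x^{1/2}y^{1/3}$ for each bundle; floating-point cube roots are not exact, so the values are rounded before printing.

```python
def U(x: float, y: float) -> float:
    """Utility function from the example: U(x, y) = x^(1/2) * y^(1/3)."""
    return x ** (1 / 2) * y ** (1 / 3)

bundles = [(25, 8), (4, 125), (12.25, 27)]
for x, y in bundles:
    print((x, y), round(U(x, y), 6))
best = max(bundles, key=lambda b: U(*b))
print("highest utility:", best)
```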
3. Let $F(x, y) = \frac{2}{3}\ln x + \frac{1}{3}\ln y$.
a) What is the domain of F?
b) Determine and draw the level curve F(x, y) = 0.
Answer: F is defined for all (x, y) such that ln x and ln y exist, so the domain is all (x, y) such that x > 0 and y > 0.
For x > 0 and y > 0:
$$\frac{2}{3}\ln x + \frac{1}{3}\ln y = \ln x^{2/3} + \ln y^{1/3} = \ln\left(x^{2/3}y^{1/3}\right)$$
ln u = 0 if and only if u = 1, so
$$F(x, y) = 0 \iff x^{2/3}y^{1/3} = 1 \iff \left(x^{2/3}y^{1/3}\right)^3 = 1^3 \iff x^2y = 1 \iff y = \frac{1}{x^2}$$
for x > 0.
The level curve F(x, y) = 0 is thus the graph of the function $y = 1/x^2$, $x > 0$.

4. The point (2, 4) is located on the level curve of the function G in example 1 that corresponds to the value 8. What other points are on this level curve? What is the shape of the level curve?
Answer: On the level curve, $x^2y - 2y = 8$. Therefore, $(x^2 - 2)y = 8 \iff y = \frac{8}{x^2 - 2}$ for $x \ne \pm\sqrt{2}$.

[Figure: the level curve G = 8]
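A quick way to convince yourself that $y = 8/(x^2 - 2)$ describes the level curve G = 8 is to plug points from it back into G. The sketch below (our own illustration) does this for a few x values on both sides of $\pm\sqrt{2}$.

```python
def G(x: float, y: float) -> float:
    """G(x, y) = x^2 y - 2y from example 1."""
    return x**2 * y - 2 * y

def level_curve_8(x: float) -> float:
    """y = 8 / (x^2 - 2): the level curve G = 8, defined for x != ±sqrt(2)."""
    return 8 / (x**2 - 2)

for x in (-3.0, 0.0, 1.0, 2.0, 4.0):
    y = level_curve_8(x)
    print(f"({x}, {y:.4f}) gives G = {G(x, y):.4f}")
```

For $|x| < \sqrt{2}$ the y-values are negative (for example (0, −4)), while for $|x| > \sqrt{2}$ they are positive, so the level curve consists of three separate pieces.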

Partial derivatives
Taking partial derivatives
Let f be a function of two variables defined at a point (a, b).

Keep y=b fixed but allow x to vary. Then we can interpret the function as a function of only
x, given that y=b. (y is treated as a parameter.)

Choose a value x = a + h, where h is near zero.

If the difference quotient $\frac{f(a + h, b) - f(a, b)}{h}$ has a limit when h → 0, we say that f is differentiable with respect to x at (a, b), and the limit is called the partial derivative of f with respect to x at (a, b).

Analogously, $\lim_{h \to 0}\frac{f(a, b + h) - f(a, b)}{h}$ is called the partial derivative of f with respect to y at (a, b) (if the limit exists).

Thus, the procedure for taking partial derivatives is the same as for ordinary derivatives – but
when partially differentiating w. r. t. x, one treats y as a constant, and vice versa.

In a 3-dimensional system of coordinates, the graph of the function is a surface, and the partial derivative of f w. r. t. x at (a, b) is the vertical slope (“upwards slope”) of the surface when you move along it from (a, b) parallel to the x-axis (or, equivalently, along the tangent plane of the surface at (a, b)).

NOTE: Like derivatives of functions of one variable, partial derivatives are functions. Both
the partial derivative w. r. t. x at the point (a, b) and the partial derivative w. r. t. y at the point
(a, b) depend on the two values a and b.

Notation: The partial derivative of f w. r. t. x can be written as $\frac{\partial f}{\partial x}$, $f'_x$ or $f'_1$, and the partial derivative of f w. r. t. y as $\frac{\partial f}{\partial y}$, $f'_y$ or $f'_2$.

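Example 1: Let $f(x, y) = 3x + xy + y^2$. Find the partial derivatives with respect to x and y at the points (2, 1), (2, 2) and (1, 2).
$$\frac{\partial f}{\partial x} = 3 + y, \quad \frac{\partial f}{\partial x}(2,1) = 3 + 1 = 4, \quad \frac{\partial f}{\partial x}(2,2) = 3 + 2 = 5, \quad \frac{\partial f}{\partial x}(1,2) = 3 + 2 = 5$$
$$\frac{\partial f}{\partial y} = x + 2y, \quad \frac{\partial f}{\partial y}(2,1) = 2 + 2 \cdot 1 = 4, \quad \frac{\partial f}{\partial y}(2,2) = 2 + 2 \cdot 2 = 6, \quad \frac{\partial f}{\partial y}(1,2) = 1 + 2 \cdot 2 = 5$$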

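The definition can be illustrated numerically: compute the difference quotients for shrinking h and watch them approach the values in Example 1. A minimal sketch (the helper names `quotient_x` and `quotient_y` are ours):

```python
# f(x, y) = 3x + xy + y^2 from Example 1
def f(x: float, y: float) -> float:
    return 3 * x + x * y + y ** 2

def quotient_x(f, a, b, h):
    """[f(a + h, b) - f(a, b)] / h, with y held fixed at b."""
    return (f(a + h, b) - f(a, b)) / h

def quotient_y(f, a, b, h):
    """[f(a, b + h) - f(a, b)] / h, with x held fixed at a."""
    return (f(a, b + h) - f(a, b)) / h

for a, b in [(2, 1), (2, 2), (1, 2)]:
    for h in [0.1, 0.001, 1e-6]:
        fx = quotient_x(f, a, b, h)
        fy = quotient_y(f, a, b, h)
        print(f"({a}, {b}), h = {h:g}: f_x ≈ {fx:.6f}, f_y ≈ {fy:.6f}")
```

For this particular f, the quotient in the y-direction equals x + 2y + h exactly, so its error is exactly h, while the quotient in the x-direction is exact (up to rounding) because f is linear in x.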
If f is a function of more than two variables:

Let $z = f(x_1, x_2, \ldots, x_n)$ be defined at $x_1 = a_1, x_2 = a_2, \ldots, x_n = a_n$. If the limit
$$\lim_{h \to 0} \frac{f(a_1, \ldots, a_{i-1}, a_i + h, a_{i+1}, \ldots, a_n) - f(a_1, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_n)}{h}$$
exists, it is called the partial derivative of f with respect to $x_i$ at $(a_1, a_2, \ldots, a_n)$.

Don’t forget that the partial derivative of f with respect to $x_i$ is a function of all the n variables $x_1, x_2, \ldots, x_n$, so its value at $(a_1, a_2, \ldots, a_n)$ depends on all the n values $a_1, a_2, \ldots, a_n$.

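Example 2: $z = x^t + 3y^2 t$
$$\frac{\partial z}{\partial x} = t x^{t-1}, \qquad \frac{\partial z}{\partial y} = 6ty, \qquad \frac{\partial z}{\partial t} = x^t \ln x + 3y^2$$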

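Example 3: Partial derivatives of the Cobb-Douglas production function $Q = AK^{\alpha}L^{\beta}$:
$$\frac{\partial Q}{\partial K} = \alpha A K^{\alpha - 1} L^{\beta}, \qquad \frac{\partial Q}{\partial L} = \beta A K^{\alpha} L^{\beta - 1}$$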

As with derivatives of functions of one variable, we can differentiate partial derivatives to get
higher order partial derivatives.

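If z = f(x, y), there are four second order partial derivatives:
$$\frac{\partial^2 z}{\partial x^2}, \quad \frac{\partial^2 z}{\partial x \partial y}, \quad \frac{\partial^2 z}{\partial y \partial x}, \quad \frac{\partial^2 z}{\partial y^2}$$
or $f''_{xx}, f''_{xy}, f''_{yx}, f''_{yy}$, or $f''_{11}, f''_{12}, f''_{21}, f''_{22}$.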

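Example 4: Second order derivatives of $Q(K, L) = AK^{\alpha}L^{\beta}$
$$\frac{\partial Q}{\partial K} = \alpha A K^{\alpha-1} L^{\beta}, \qquad \frac{\partial Q}{\partial L} = \beta A K^{\alpha} L^{\beta-1}$$
$$Q''_{LL} = \frac{\partial^2 Q}{\partial L^2} = \frac{\partial}{\partial L}\left(\frac{\partial Q}{\partial L}\right) = \beta(\beta - 1) A K^{\alpha} L^{\beta - 2}$$
$$Q''_{LK} = \frac{\partial^2 Q}{\partial K \partial L} = \frac{\partial}{\partial K}\left(\frac{\partial Q}{\partial L}\right) = \alpha\beta A K^{\alpha - 1} L^{\beta - 1}$$
$$Q''_{KK} = \frac{\partial^2 Q}{\partial K^2} = \frac{\partial}{\partial K}\left(\frac{\partial Q}{\partial K}\right) = \alpha(\alpha - 1) A K^{\alpha - 2} L^{\beta}$$
$$Q''_{KL} = \frac{\partial^2 Q}{\partial L \partial K} = \frac{\partial}{\partial L}\left(\frac{\partial Q}{\partial K}\right) = \alpha\beta A K^{\alpha - 1} L^{\beta - 1}$$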

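The formulas in Example 4 can be checked with finite differences. In this sketch, the values A = 2, α = 0.3, β = 0.6 and the point (K, L) = (4, 9) are hypothetical, chosen only for illustration:

```python
# Hypothetical parameter values and evaluation point (not taken from the text)
A, alpha, beta = 2.0, 0.3, 0.6
K0, L0 = 4.0, 9.0
h = 1e-4  # step length for the finite differences

def Q(K: float, L: float) -> float:
    return A * K ** alpha * L ** beta

# Second order partial derivatives from Example 4
Q_KK = alpha * (alpha - 1) * A * K0 ** (alpha - 2) * L0 ** beta
Q_LL = beta * (beta - 1) * A * K0 ** alpha * L0 ** (beta - 2)
Q_KL = alpha * beta * A * K0 ** (alpha - 1) * L0 ** (beta - 1)

# Central second differences approximating the same derivatives
num_KK = (Q(K0 + h, L0) - 2 * Q(K0, L0) + Q(K0 - h, L0)) / h ** 2
num_LL = (Q(K0, L0 + h) - 2 * Q(K0, L0) + Q(K0, L0 - h)) / h ** 2
num_KL = (Q(K0 + h, L0 + h) - Q(K0 + h, L0 - h)
          - Q(K0 - h, L0 + h) + Q(K0 - h, L0 - h)) / (4 * h ** 2)

for name, formula, estimate in [("Q_KK", Q_KK, num_KK),
                                ("Q_LL", Q_LL, num_LL),
                                ("Q_KL", Q_KL, num_KL)]:
    print(f"{name}: formula = {formula:.6f}, finite difference = {estimate:.6f}")
```

The finite-difference estimate of the mixed derivative uses the same four points whichever variable is taken first, so this check confirms the formulas but does not by itself demonstrate Young’s theorem.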
Note that the two “mixed” derivatives are equal.

That is always the case if they are continuous (according to Young’s theorem).

The chain rule for functions of several variables and total derivatives

Example 1: $z = xy$, where $x = 3t$ and $y = s^2 - s$.
$$\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\frac{dx}{dt} = 3y, \qquad \frac{\partial z}{\partial s} = \frac{\partial z}{\partial y}\frac{dy}{ds} = x(2s - 1)$$
In this example, using the chain rule for a function of several variables is very similar to using it for a function of only one variable. That is because y is independent of t and x is independent of s. A box diagram illustrates this:

[Box diagram: t → x → z and s → y → z]

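A numerical check of the two chain-rule results, as a sketch at a hypothetical point (t, s) = (1.5, 2):

```python
# z = xy with x = 3t and y = s^2 - s, written directly as a function of (t, s)
def z(t: float, s: float) -> float:
    x = 3 * t
    y = s ** 2 - s
    return x * y

t0, s0 = 1.5, 2.0  # hypothetical evaluation point
x0, y0 = 3 * t0, s0 ** 2 - s0
h = 1e-6

# Central differences in t (s fixed) and in s (t fixed)
dz_dt = (z(t0 + h, s0) - z(t0 - h, s0)) / (2 * h)
dz_ds = (z(t0, s0 + h) - z(t0, s0 - h)) / (2 * h)

print(f"dz/dt: numeric {dz_dt:.6f}, chain rule 3y = {3 * y0}")
print(f"dz/ds: numeric {dz_ds:.6f}, chain rule x(2s - 1) = {x0 * (2 * s0 - 1)}")
```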
Example 2: Write GNP as Y = Y(I), where investment is I = I(r) and r is the rate of interest. Y(I(r)) is a composite function, which can be illustrated by a box diagram. (In this model everything else that has an impact on Y is assumed to be constant.)

[Box diagram: r → I → Y]

Chain rule: $\frac{dY}{dr} = \frac{dY}{dI}\frac{dI}{dr}$

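What if we extend the model to allow Y to be determined by both investment and consumption, Y = Y(C, I)?

[Box diagram: r → I → Y ← C]

The derivative of Y w.r.t. (with respect to) r is now a partial derivative:
$$\frac{\partial Y}{\partial r} = \frac{\partial Y}{\partial I}\frac{dI}{dr}$$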

But what if C = C(r) (that is, if consumption is also a function of r, perhaps because of its impact on housing costs)?

[Box diagram: r → I → Y and r → C → Y]

r determines Y “through two channels”, I and C, and the total impact is the sum of the two partial ones:
$$\frac{dY}{dr} = \frac{\partial Y}{\partial C}\frac{dC}{dr} + \frac{\partial Y}{\partial I}\frac{dI}{dr}$$
This is the total derivative of Y w. r. t. r.

More generally: let z = z(x, y), where x = x(t) and y = y(t).

The total derivative of z w. r. t. t is
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$

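Example: Let $F(x, y) = \frac{2}{3}\ln x + \frac{1}{3}\ln y$. Find the total derivative when z = F(x, y), $x = e^{3t}$ and $y = e^{-3t}$.

Solution: The relations can be illustrated with a box diagram:

[Box diagram: t → x → z and t → y → z]

The total derivative is
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$
where
$$\frac{\partial z}{\partial x} = \frac{2}{3}\cdot\frac{1}{x}, \quad \frac{\partial z}{\partial y} = \frac{1}{3}\cdot\frac{1}{y}, \quad \frac{dx}{dt} = 3e^{3t}, \quad \frac{dy}{dt} = -3e^{-3t}$$
so
$$\frac{dz}{dt} = \frac{2}{3}\cdot\frac{1}{x}\cdot 3e^{3t} + \frac{1}{3}\cdot\frac{1}{y}\cdot\left(-3e^{-3t}\right) = \frac{2}{3}\cdot\frac{1}{e^{3t}}\cdot 3e^{3t} - \frac{1}{3}\cdot\frac{1}{e^{-3t}}\cdot 3e^{-3t} = 2 - 1 = 1$$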

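Since $z = \frac{2}{3}\cdot 3t + \frac{1}{3}\cdot(-3t) = t$, the answer dz/dt = 1 can also be checked directly. A minimal sketch comparing the chain-rule formula with a numerical derivative (function names are ours):

```python
import math

# z(t) = F(x(t), y(t)) with F(x, y) = (2/3) ln x + (1/3) ln y, x = e^(3t), y = e^(-3t)
def z(t: float) -> float:
    x, y = math.exp(3 * t), math.exp(-3 * t)
    return (2 / 3) * math.log(x) + (1 / 3) * math.log(y)

def total_derivative(t: float) -> float:
    """dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)."""
    x, y = math.exp(3 * t), math.exp(-3 * t)
    dz_dx, dz_dy = 2 / (3 * x), 1 / (3 * y)
    dx_dt, dy_dt = 3 * math.exp(3 * t), -3 * math.exp(-3 * t)
    return dz_dx * dx_dt + dz_dy * dy_dt

h = 1e-6
for t in [-1.0, 0.0, 0.5, 2.0]:
    numeric = (z(t + h) - z(t - h)) / (2 * h)
    print(f"t = {t:4}: chain rule {total_derivative(t):.6f}, numeric {numeric:.6f}")
```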
A special case of the chain rule
which is important in many economic applications:

Assume that z=f(x, y(x))


The total derivative of z with respect to x is:

dz z z dy
   z x'  z 'y y '  z1'  z 2' y '
dx x y dx

Example: Take a utility function U(x, y) and a level curve U = k (k is some constant number). Suppose that the level curve has the shape that indifference curves are usually taken to have in economics. It is enough to assume that the marginal utility functions are strictly positive and strictly decreasing. (Examples of such utility functions are the Cobb-Douglas functions.) Then for a given value of x, $x_0 > 0$, there cannot be more than one point on the level curve with the x-coordinate $x_0$. Therefore there can only be one value of y, say $y_0$, such that $(x_0, y_0)$ is on the indifference curve. The same reasoning applies to any positive number x: for each x there is only one y such that (x, y) is on the indifference curve, and we can define a function y(x) by saying that “y(x) is the value of y for which U(x, y) = k”.

The graph can be interpreted as “an indifference curve of U(x, y)” but also as “the graph of a function y(x)”.

Thus, we have defined a function y(x) such that U(x, y(x)) = k, and on the indifference curve we can write U as U(x, y(x)). Therefore, on the indifference curve, U can be seen as a constant function which depends only on the variable x.

Take the total derivative of U w.r.t. x on the level curve U(x, y) = k:
$$\frac{dU}{dx} = \frac{\partial U}{\partial x} + \frac{\partial U}{\partial y}\frac{dy}{dx} = U'_x + U'_y y' = U'_1 + U'_2 y' = 0,$$
since U = k is constant for (x, y) on the indifference curve.

$$U'_x + U'_y y' = U'_1 + U'_2 y' = 0$$
This implies that
$$y'(x) = -\frac{U'_1}{U'_2} = -\frac{MU_1}{MU_2},$$
which is the marginal rate of substitution, MRS, between the goods 1 and 2 at this point on the indifference curve.

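Example. Let $U(x, y) = 4x^{1/2} y^{1/4}$. What is the value of U and what is the MRS when $x = \frac{40}{3}$ and $y = \frac{50}{12}$?

Answer:
$$U\left(\frac{40}{3}, \frac{50}{12}\right) = 4\left(\frac{40}{3}\right)^{1/2}\left(\frac{50}{12}\right)^{1/4} = 4\left(\left(\frac{40}{3}\right)^2 \cdot \frac{50}{12}\right)^{1/4} = 4\left(\frac{1600}{9}\cdot\frac{50}{12}\right)^{1/4} = 4\left(\frac{80000}{108}\right)^{1/4} \approx 20.9$$
$$MRS = -\frac{U'_1}{U'_2} = -\frac{4\cdot\frac{1}{2}x^{-1/2}y^{1/4}}{4\cdot\frac{1}{4}x^{1/2}y^{-3/4}} = -2x^{-1}y = -\frac{2y}{x} = -2\cdot\frac{25}{6}\cdot\frac{3}{40} = -\frac{5}{8} = -0.625$$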
4 x2 y 4

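If we solve for y on the indifference curve, we have
$$4y^{1/4}x^{1/2} = U = 4\left(\frac{80000}{108}\right)^{1/4} \iff y^{1/4} = \frac{1}{x^{1/2}}\left(\frac{80000}{108}\right)^{1/4} \iff y = \frac{1}{x^2}\cdot\frac{80000}{108} = \frac{20000}{27x^2}$$
$$y'(x) = -2\cdot\frac{20000}{27x^3} = -\frac{40000}{27x^3}$$
$$y'\left(\frac{40}{3}\right) = -\frac{40000}{27}\cdot\frac{27}{64000} = -\frac{5}{8} = -0.625$$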

which confirms that the slope of the indifference curve is equal to the MRS.

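A sketch that recomputes U and the MRS at (40/3, 50/12) and compares the MRS with a numerical estimate of the slope of the indifference curve (the function names are illustrative):

```python
# U(x, y) = 4 x^(1/2) y^(1/4) and the point (40/3, 50/12) from the example
def U(x: float, y: float) -> float:
    return 4 * x ** 0.5 * y ** 0.25

x0, y0 = 40 / 3, 50 / 12
U0 = U(x0, y0)

# MRS = -U'_1 / U'_2 using the marginal utilities computed above
MU1 = 2 * x0 ** -0.5 * y0 ** 0.25
MU2 = x0 ** 0.5 * y0 ** -0.75
mrs = -MU1 / MU2

def y_on_curve(x: float) -> float:
    """The indifference curve U(x, y) = U0 solved for y: y = (U0/4)^4 / x^2."""
    return (U0 / 4) ** 4 / x ** 2

h = 1e-6
slope = (y_on_curve(x0 + h) - y_on_curve(x0 - h)) / (2 * h)
print(f"U = {U0:.1f}, MRS = {mrs:.4f}, slope of the indifference curve = {slope:.4f}")
```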
Total partial derivatives

Taking total derivatives means adding up all the different ways in which one variable affects a function.

A partial derivative measures the effect of a change in one variable when the value of the function also depends on other variables. The opposite of “a partial derivative” is “an ordinary derivative”, not “a total derivative”.

Total derivatives can be either ordinary or partial.

Example: Y = Y(C, I, G), C = C(r, G) and I = I(r), where

Y = GNP, C = aggregate consumption, I = investment, G = public expenditure and r = the rate of interest.

The total derivative of Y w.r.t. r must include both the effect through I and the effect through C. But it is still a partial derivative, since Y also depends on G, which is not a function of r.

Y Y dI Y C
 
r I dr C r
The derivative of Y w. r. t I is a partial derivative because Y also depends on C and G.
But note that the derivative of I w. r. an ordinary derivative, since I is determined
exclusively by r while the derivative of C w. r. t. r is partial since C depends on both r and G.

Y Y Y C
The total derivative of Y w. r. t. G is  
G G C G
But here the same symbol is used for two things – the total partial derivative of Y with
respect to G on the left side, and the “simple” or “direct” partial derivative of Y with respect
to its second variable, G on the left side. It is probably more helpful for the reader to write:
YG'  Y1'  CG'  Y2'

Optimisation of functions of several variables

Example: Assume that a firm produces two goods.

$Q_1$ = the quantity of good 1
$Q_2$ = the quantity of good 2

$\Pi(Q_1, Q_2)$ is the firm’s profit function.

1. Assume that there is some choice of production, a combination $(Q_1^*, Q_2^*)$, such that $\Pi(Q_1^*, Q_2^*) \geq \Pi(Q_1, Q_2)$ for all combinations $(Q_1, Q_2)$ that it is possible for the firm to produce, and that the firm is not operating at its limit of capacity when producing $(Q_1^*, Q_2^*)$.

2. Assume that both partial derivatives of Π exist at $(Q_1^*, Q_2^*)$.

3. Define a function $P(Q) = \Pi(Q, Q_2^*)$.
In other words, P is equal to the firm’s profit as a function of the production of good 1 when production of good 2 is held constant at the level $Q_2^*$. Since $Q_2$ doesn’t vary, P is a function of one variable only, that is to say, only of the level of production of good 1.

According to the definition of a partial derivative, $P'(Q) = \Pi'_1(Q, Q_2^*)$.

P(Q) must take its maximum value when $Q = Q_1^*$, because if $P(Q) > P(Q_1^*)$ then $\Pi(Q, Q_2^*) > \Pi(Q_1^*, Q_2^*)$, which contradicts the assumption that $(Q_1^*, Q_2^*)$ is the maximum point of Π.

But P is a function of one variable that is differentiable at $Q_1^*$ and takes its maximum value at $Q_1^*$, which is not an end point of the interval where P is defined. Therefore $Q_1^*$ must be a stationary point of P:
$$P'(Q_1^*) = \Pi'_1(Q_1^*, Q_2^*) = 0$$
In the same way, one can show that $\Pi'_2(Q_1^*, Q_2^*) = 0$.





Example: Find the stationary points of f when
$$f(x, y) = x^2 + xy - x + y^2 + 4y - 7$$
$$f'_x(x, y) = 2x + y - 1, \qquad f'_y(x, y) = x + 2y + 4$$
A stationary point is one where:
$$\begin{cases} f'_x(x, y) = 2x + y - 1 = 0 \\ f'_y(x, y) = x + 2y + 4 = 0 \end{cases} \iff \begin{cases} 2x + y = 1 \\ x + 2y = -4 \end{cases} \iff \begin{cases} x = 2 \\ y = -3 \end{cases}$$
(2, −3) is the only stationary point of f.

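The two first order conditions form a linear system, which can be solved exactly, for example by Cramer’s rule (our choice of method here; substitution works just as well). A minimal sketch using Python’s `fractions` module:

```python
from fractions import Fraction

def solve_2x2(a11: int, a12: int, b1: int, a21: int, a22: int, b2: int):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y

# FOC: 2x + y = 1 and x + 2y = -4
x, y = solve_2x2(2, 1, 1, 1, 2, -4)
print(x, y)
```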
If f is a function of n variables with a local extreme point at an interior point $a = (a_1, a_2, \ldots, a_n)$ of its domain, then at the point a either
• at least one partial derivative of f does not exist, or
• all n partial derivatives are zero.
The latter means that $a = (a_1, a_2, \ldots, a_n)$ is a solution to the system of n simultaneous equations
$$\frac{\partial f}{\partial x_1}(a) = 0, \quad \ldots, \quad \frac{\partial f}{\partial x_i}(a) = 0, \quad \ldots, \quad \frac{\partial f}{\partial x_n}(a) = 0$$
These equations are called the first order (necessary) conditions for an extreme point (FOC).

As for functions of one variable, the FOC are necessary conditions for a maximum or minimum but not sufficient. A stationary point can be a local maximum, a local minimum or a saddle point. To know whether a stationary point of a differentiable function of several variables is a maximum, minimum or saddle point, we need:
1. To know that it is a stationary point.
2. To check second order conditions, which involve all the second order derivatives, including the mixed ones.

The second order conditions for functions of n variables are complicated to learn by heart, even for n = 2, and learning them is not required for this course. What is required is to know that such conditions exist, that they involve all $n^2$ second order partial derivatives of the function, and to be able to look them up and apply them to functions of two variables when needed.

Second order conditions for a function of two variables:

Assume that $c = (c_1, c_2)$ is a stationary point of a function of two variables, f(x, y), and that all four second order derivatives exist and are continuous. Then:
$$f''_{11} > 0 \text{ and } f''_{11} f''_{22} - (f''_{12})^2 > 0 \text{ at } c \implies c \text{ is a local minimum}$$
$$f''_{11} < 0 \text{ and } f''_{11} f''_{22} - (f''_{12})^2 > 0 \text{ at } c \implies c \text{ is a local maximum}$$
$$f''_{11} f''_{22} - (f''_{12})^2 < 0 \text{ at } c \implies c \text{ is a saddle point}$$
If $f''_{11} f''_{22} = (f''_{12})^2$ at c, then c can be either an extreme point or a saddle point.

*Footnote: $f''_{11} f''_{22} - (f''_{12})^2 > 0 \implies f''_{11}$ and $f''_{22}$ have the same sign, so we could just as well have written:
$$f''_{22} > 0 \text{ and } f''_{11} f''_{22} - (f''_{12})^2 > 0 \text{ at } c \implies c \text{ is a local minimum}$$
$$f''_{22} < 0 \text{ and } f''_{11} f''_{22} - (f''_{12})^2 > 0 \text{ at } c \implies c \text{ is a local maximum}$$
$$f''_{11} f''_{22} - (f''_{12})^2 < 0 \implies c \text{ is a saddle point}$$

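The conditions above can be written as a small decision function. This is a sketch: it assumes the second order derivatives have already been evaluated at the stationary point, and the test functions in the usage lines are hypothetical examples, not from the text. With floating-point inputs, a value of d that should be exactly zero may come out slightly non-zero, so in practice a tolerance would be needed.

```python
def classify_stationary_point(f11: float, f22: float, f12: float) -> str:
    """Second order test at a stationary point c of f(x, y).

    f11, f22 and f12 are the second order partial derivatives evaluated at c.
    """
    d = f11 * f22 - f12 ** 2
    if d > 0 and f11 > 0:
        return "local minimum"
    if d > 0 and f11 < 0:
        return "local maximum"
    if d < 0:
        return "saddle point"
    return "inconclusive"  # d == 0: c may be an extreme point or a saddle point

# Hypothetical test functions, each with a stationary point at (0, 0)
print(classify_stationary_point(2, 2, 0))    # x^2 + y^2
print(classify_stationary_point(-2, -2, 0))  # -x^2 - y^2
print(classify_stationary_point(2, -2, 0))   # x^2 - y^2
print(classify_stationary_point(0, 0, 0))    # x^4 + y^4
```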
The first order and second order conditions together are sufficient conditions for a local maximum or minimum.

In economic applications it is often reasonable to assume that functions are convex everywhere (“bowl-shaped”) or concave everywhere (shaped like an upside-down bowl). This makes it easier to find the optimum, because it is only necessary to verify the first order conditions.
• If f is convex in its whole domain, any stationary point is a global minimum.
• If f is concave in its whole domain, any stationary point is a global maximum.

Example 1: $f(x, y) = x^2 + xy - x + y^2 + 4y - 7$
$$f'_x(x, y) = 2x + y - 1, \qquad f'_y(x, y) = x + 2y + 4$$
$$f''_{xx} = 2, \qquad f''_{xy} = f''_{yx} = 1, \qquad f''_{yy} = 2$$
2 > 0 and 2 · 2 > 1², so (2, −3) is a minimum point.

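Because the second order derivatives of this f are constant, the conditions hold at every point, so f is convex and (2, −3) should also be the global minimum. A brute-force grid search gives a quick sanity check (not a proof); the grid range and step are arbitrary choices:

```python
# f(x, y) = x^2 + xy - x + y^2 + 4y - 7 from Example 1
def f(x: float, y: float) -> float:
    return x ** 2 + x * y - x + y ** 2 + 4 * y - 7

f_star = f(2, -3)

# Brute-force search on the grid -8, -7.9, ..., 8 in both directions
grid = [i / 10 for i in range(-80, 81)]
best_value, best_point = min((f(x, y), (x, y)) for x in grid for y in grid)
print(f"f(2, -3) = {f_star}; smallest value on the grid: {best_value} at {best_point}")
```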
Example 2: Price discrimination

A firm produces a good which is sold in two separate markets. The firm is not a price taker
and maximizes profit.
X = quantity sold in market 1
Y = quantity sold in market 2
P = price set in market 1
Q = price set in market 2
Demand in market 1 X=600-8P (1)
Demand in market 2 Y=360 - 4Q (2)
Cost of production: $C = 70Z - \frac{1}{24}Z^2 + 200$ (3)
where Z=X+Y
Calculate prices, production and profits assuming:
a) the firm has to set the same price P in both markets
b) the firm is able to price discriminate

a) We get total demand by adding demand in the two markets, where the firm now charges the same price P in market 2 as well. Adding the left side of equation (1) to the left side of equation (2), and the right side of (1) to the right side of (2)¹, we get:
$$X + Y = Z = 960 - 12P \quad (4)$$
$$P = 80 - \frac{Z}{12} \quad (5)$$
Total revenue: $TR = PZ = 80Z - \frac{Z^2}{12}$ (6)
Marginal revenue: $MR = 80 - \frac{Z}{6}$ (7)
Marginal cost: $MC = 70 - \frac{Z}{12}$ (8)
$$MR = MC \iff 80 - \frac{Z}{6} = 70 - \frac{Z}{12} \quad (9)$$
(9) ⇒ Z = 120. Z = 120 and (5) ⇒ P = 70.
$$\Pi(X + Y) = TR(X + Y) - C(X + Y) = 70 \cdot 120 - \left(70 \cdot 120 - \frac{1}{24}\cdot 120 \cdot 120 + 200\right) = 400$$

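b) $C = 70Z - \frac{1}{24}Z^2 + 200$
$$C = 70(X + Y) - \frac{1}{24}(X + Y)^2 + 200 \quad (10)$$
$$MC_1 = \frac{\partial C}{\partial X} = 70 - \frac{1}{24}\cdot 2(X + Y) = 70 - \frac{1}{12}(X + Y)$$
$$MC_2 = \frac{\partial C}{\partial Y} = 70 - \frac{1}{12}(X + Y)$$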

In this case, $MR_1 \neq MR_2$.

(1) ⇔ $8P = 600 - X \iff P = \frac{600 - X}{8}$ and (2) ⇔ $Q = \frac{360 - Y}{4}$.

¹ Footnote: Since the left side of (2) is equal to the right side of (2), we have added the same amount to both sides of equation (1), only expressed in two different ways.

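$$TR_1 = PX = \frac{600 - X}{8}X = 75X - \frac{X^2}{8}$$
$$TR_2 = QY = \frac{360 - Y}{4}Y = 90Y - \frac{Y^2}{4}$$
$$TR = 75X - \frac{X^2}{8} + 90Y - \frac{Y^2}{4}$$
$$MR_1 = \frac{\partial TR}{\partial X} = 75 - \frac{X}{4} \quad \text{and} \quad MR_2 = \frac{\partial TR}{\partial Y} = 90 - \frac{Y}{2}$$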
To maximise profits, the firm must have MR = MC in each market:
$$75 - \frac{X}{4} = 70 - \frac{1}{12}(X + Y)$$
$$90 - \frac{Y}{2} = 70 - \frac{1}{12}(X + Y)$$
X = 60, Y = 60, P = 67.5, Q = 75.
Π = TR(X + Y) − C(X + Y) = 8550 − 8000 = 550

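*To check that this stationary point really is a maximum point, we use the second order sufficient conditions. The partial derivative of Π w.r.t. X is
$$\frac{\partial \Pi}{\partial X} = MR_1 - MC_1 = 75 - \frac{X}{4} - \left(70 - \frac{1}{12}(X + Y)\right) = 5 - \frac{X}{6} + \frac{Y}{12}$$
The partial derivative of Π w.r.t. Y is (after simplification)
$$\frac{\partial \Pi}{\partial Y} = MR_2 - MC_2 = 90 - \frac{Y}{2} - \left(70 - \frac{1}{12}(X + Y)\right) = 20 + \frac{X}{12} - \frac{5Y}{12}$$
$$A = \frac{\partial^2 \Pi}{\partial X^2} = -\frac{1}{6}, \qquad B = \frac{\partial^2 \Pi}{\partial X \partial Y} = \frac{\partial^2 \Pi}{\partial Y \partial X} = \frac{1}{12}, \qquad C = \frac{\partial^2 \Pi}{\partial Y^2} = -\frac{5}{12}$$
A < 0 and
$$AC - B^2 = \left(-\frac{1}{6}\right)\left(-\frac{5}{12}\right) - \left(\frac{1}{12}\right)^2 = \frac{5}{72} - \frac{1}{144} = \frac{9}{144} > 0$$
The sufficient conditions for a maximum are satisfied.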

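Both parts of the example can be recomputed with exact fractions. The sketch below assumes the cost function in (3), $C = 70Z - \frac{1}{24}Z^2 + 200$, which is the one consistent with the marginal costs and profits above; the variable names are ours:

```python
from fractions import Fraction as Fr

def cost(Z):
    """C = 70Z - Z^2/24 + 200, equation (3)."""
    return 70 * Z - Fr(1, 24) * Z ** 2 + 200

# a) Common price: MR = MC means 80 - Z/6 = 70 - Z/12
Z = Fr(80 - 70) / (Fr(1, 6) - Fr(1, 12))
P = 80 - Z / 12                       # equation (5)
profit_a = P * Z - cost(Z)

# b) Price discrimination: multiplying MR1 = MC1 and MR2 = MC2 by 12 gives
#    2X - Y = 60 and -X + 5Y = 240, solved here by Cramer's rule
det = 2 * 5 - (-1) * (-1)
X = Fr(60 * 5 - (-1) * 240, det)
Y = Fr(2 * 240 - 60 * (-1), det)
P1 = (600 - X) / 8                    # price in market 1, from (1)
Q2 = (360 - Y) / 4                    # price in market 2, from (2)
profit_b = P1 * X + Q2 * Y - cost(X + Y)

# Second order conditions with the derivatives A, B and C computed above
d2_XX, d2_XY, d2_YY = Fr(-1, 6), Fr(1, 12), Fr(-5, 12)
is_max = d2_XX < 0 and d2_XX * d2_YY - d2_XY ** 2 > 0

print(f"a) Z = {Z}, P = {P}, profit = {profit_a}")
print(f"b) X = {X}, Y = {Y}, P = {float(P1)}, Q = {Q2}, profit = {profit_b}, maximum: {is_max}")
```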
*Example 3: Least squares regression.

In an earlier example, the wage of each individual was assumed to be a baseline wage, $w_0$, augmented by a certain percentage for each year of schooling and some percentage for each year of experience. This multiplicative model was equivalent to a linear model for the logarithm of the wage. Now, for simplicity, assume that we model the wage as a function of education only. The predicted (log) wage for individual i is given by $\ln w_i = \alpha + \beta s_i$, where $s_i$ is this person’s years of schooling beyond compulsory school, α is the log of the wage that a worker with only compulsory school would get and β-1 is the percentage increase in wage for an additional year of schooling.

The model predicts that all workers with the same length of education would receive the same wage. If this were true and we plotted wage and education for a number of individuals, all observations would show as points on a straight line.

[Figure: the straight line ln w = α + βs.]

In reality, a lot of other factors besides education influence individual wages: firm, industry, experience, tenure, gender, location and different abilities. Even if it were true that, all else equal, a worker earns β-1 percent more if she or he has one year more of schooling, all else isn’t equal when we compare different individuals. The plot won’t be a nice linear graph; it will be a lot of points scattered over the page. But if there is an “all else equal” linear relation, the points will be scattered around the linear graph.

The labour economist normally doesn’t know the true values of α and β. He or she knows the
wages and the level of schooling of a large number of workers and tries to answer the
question: Which linear equation – in other words which values of α and β - is closest to or
most likely to capture the real relation between wages and education?

$(s_i; \ln w_i)$ represents the actual schooling and (log) wage of individual i. $(s_i; \alpha + \beta s_i)$ represents schooling and the wage predicted by the model. One good bet for α and β could be the pair a, b such that, if we calculate the distance $d_i$ between $(s_i; \ln w_i)$ and $(s_i; a + bs_i)$ for each individual and add all those distances, we get the smallest value possible. In other words, we could solve an optimisation problem in two variables, a and b:
$$\text{Minimise } \sum_i d_i = \sum_i \left|\ln w_i - a - bs_i\right|$$

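A better choice, because it has a number of nice statistical properties, is to minimise the sum of the squares of these distances (footnote 2). According to Pythagoras’ theorem, the distance between the points (x1; y1) and (x2; y2) is
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
and the square of the distance is
$$d^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2$$
[Figure: the distance d between the points (x1; y1) and (x2; y2).]

Here both points have the same first coordinate, $s_i$, so $d_i^2 = (\ln w_i - a - bs_i)^2$.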

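This makes our minimisation problem:

Find the values of a and b that minimise the sum
$$S(a, b) = (\ln w_1 - a - bs_1)^2 + (\ln w_2 - a - bs_2)^2 + \ldots + (\ln w_n - a - bs_n)^2$$

As usual, we take partial derivatives to find stationary points:
$$\frac{\partial S}{\partial a} = -2\sum_{i=1}^{n}(\ln w_i - a - bs_i), \qquad \frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} s_i(\ln w_i - a - bs_i)$$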

We then solve for the values of a and b for which these partial derivatives are zero. This is, of course, possible with the algebra available to us in this course, but it is quite tedious and messy. It is much easier with more advanced mathematical notation, vector and matrix algebra, so we will not write the solution explicitly here. (You will find it in textbooks in statistics and econometrics.)

For a proof that, under certain assumptions, the a and b that minimise the sum of squared distances give what is called a Best Linear Unbiased Estimate of α and β, see any econometrics or statistics textbook.

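The text leaves the explicit solution to statistics textbooks. For reference, the standard closed-form solution for this two-parameter case is $b = \sum (s_i - \bar s)(\ln w_i - \overline{\ln w}) / \sum (s_i - \bar s)^2$ and $a = \overline{\ln w} - b\bar s$. The sketch below implements those formulas and checks that both partial derivatives of S vanish at the result. The five data points are made up purely for illustration:

```python
def least_squares(s, lnw):
    """Return (a, b) minimising S(a, b) = sum((lnw_i - a - b*s_i)^2)."""
    n = len(s)
    s_mean = sum(s) / n
    w_mean = sum(lnw) / n
    sxy = sum((si - s_mean) * (wi - w_mean) for si, wi in zip(s, lnw))
    sxx = sum((si - s_mean) ** 2 for si in s)
    b = sxy / sxx
    a = w_mean - b * s_mean
    return a, b

def S(a, b, s, lnw):
    """Sum of squared vertical distances between the data and the line a + b*s."""
    return sum((wi - a - b * si) ** 2 for si, wi in zip(s, lnw))

# Made-up illustration data: years of schooling beyond compulsory school, log wage
s = [0, 1, 2, 3, 4]
lnw = [1.0, 1.2, 1.1, 1.5, 1.6]

a, b = least_squares(s, lnw)
# First order conditions: both partial derivatives of S should vanish at (a, b)
dS_da = -2 * sum(wi - a - b * si for si, wi in zip(s, lnw))
dS_db = -2 * sum(si * (wi - a - b * si) for si, wi in zip(s, lnw))
print(f"a = {a:.4f}, b = {b:.4f}, S(a, b) = {S(a, b, s, lnw):.4f}")
print(f"dS/da = {dS_da:.2e}, dS/db = {dS_db:.2e}")
```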
Optimisation under constraints:

In economics the problem is usually not to find the best solution that can be imagined but to find the optimal choice under certain restrictions or constraints. The consumer who maximises utility is constrained by not having an infinite amount of money; the worker who chooses how many hours per week to work is constrained by not having an infinite amount of time. The firm which maximises profits can be constrained by liquidity, by consumer demand, by limited availability of inputs in the short run, and so on. A typical economic problem very often has the following mathematical form:

Find the largest/smallest value of $f(x_1, x_2, \ldots, x_n)$

under the constraint that (subject to)

$g(x_1, x_2, \ldots, x_n) = c$.

$g(x_1, x_2, \ldots, x_n) = c$ defines a level curve of g.


A typical example could be a consumer who maximises utility under a budget constraint. Let U be a utility function which depends on the quantity consumed of two goods, x and y. Assume that the prices of x and y are, respectively, $p_x$ and $p_y$ and that the consumer’s budget is m kronor. The consumer obtains the greatest attainable utility by consuming quantities x and y that solve the mathematical problem:
Find the largest value of U(x, y)
under the constraint $B(x, y) = p_x x + p_y y = m$

In introductory microeconomics courses this problem is analysed graphically and the optimal
choice for the consumer is found to be the point of tangency, P, between the budget line and
an indifference curve.

[Figure: indifference curves and the budget line, touching at the point of tangency P.]

The slope of the budget line is $-\frac{p_x}{p_y}$, since $p_x x + p_y y = m \iff y = \frac{m}{p_y} - \frac{p_x}{p_y}x$.
When we took the total derivative of U with respect to x on an indifference curve, we found that the slope of the indifference curve at the point P is equal to the marginal rate of substitution: $MRS = -\frac{MU_x}{MU_y}$. Intuitively, this can be explained by saying that the slope shows how much of the two goods the consumer can exchange for each other without experiencing a change in her level of utility. That proportion must be the inverse of the ratio between the marginal utilities: if the marginal utility of a small unit of good x is twice as large as the marginal utility of a small unit of good y, then the consumer will be indifferent to an exchange of one unit of x for two units of y.

y U x U x
  when the changes in x and y approach zero.
x U y U y

The consumer attains the highest utility possible from her money if she consumes quantities such that the MRS, in absolute value, is equal to the relative price. (Remember that marginal utilities, and therefore also the MRS, depend on the quantities consumed.) If one unit of x costs twice as much as one unit of y and the consumer’s marginal utility from x is more than twice as large as her marginal utility from a unit of y, she would gain from buying more x and less y.

If the ratio $MU_x/MU_y$ is less than two, she would increase her utility by giving up some x and buying more y. Therefore the optimal choice must be the point P where the slope of the budget line and the slope of the indifference curve are equal.

U B
or as x  x since the partial
MU x px
This condition can be formalized as 
MU y py U B
y y y y

derivative of B with respect to x is px and the partial derivative of B with respect to y is py.

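The tangency argument can be illustrated by brute force: walk along the budget line, keep the bundle with the highest utility, and compare $MU_x/MU_y$ there with $p_x/p_y$. This sketch borrows the utility function, prices and income from Example 1 of the Lagrange method later in these notes; the grid of 100 000 steps is an arbitrary choice:

```python
def U(x: float, y: float) -> float:
    return 4 * x ** 0.5 * y ** 0.25

# Prices and income borrowed from Example 1 of the Lagrange method
px, py, m = 2.5, 4.0, 50.0

# Walk along the budget line: for each x, the rest of the money is spent on y
steps = 100_000
best_x, best_y, best_u = None, None, float("-inf")
for i in range(1, steps):
    x = (m / px) * i / steps
    y = (m - px * x) / py
    u = U(x, y)
    if u > best_u:
        best_x, best_y, best_u = x, y, u

# MU_x / MU_y at the best bundle found on the grid
ratio = (2 * best_x ** -0.5 * best_y ** 0.25) / (best_x ** 0.5 * best_y ** -0.75)
print(f"best bundle ≈ ({best_x:.4f}, {best_y:.4f}), MU_x/MU_y ≈ {ratio:.4f}, px/py = {px / py}")
```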
Back to the general problem:

Find the largest/smallest value of $f(x_1, x_2, \ldots, x_n)$

under the constraint that (subject to)

$g(x_1, x_2, \ldots, x_n) = c$.

The constraint $g(x_1, x_2, \ldots, x_n) = c$ means that we can choose only among those points that are located on a particular level curve of the function g, the level curve where g is equal to c. At the same time, every point has to be on some level curve of f. The problem is to find the level curve of f which has some point in common with the level curve $g(x_1, x_2, \ldots, x_n) = c$ and at the same time corresponds to the largest (or smallest) value of f.

If f and g are functions of two variables we can illustrate graphically:

[Figure: the constraint curve g(x, y) = c and a level curve f(x, y) = z* of f.]

Suppose that we want to find a level curve of f which is as far away from the origin as possible. By reasoning graphically, analogously to the argument in the utility maximisation case, we can conclude that the optimal point cannot be one where the curve g(x, y) = c crosses a level curve of f, because then it would be possible to reach a point on a level curve of f which is further out. The solution has to be a point where the level curves are tangent to each other. But if they are tangent to each other, they have the same slope. We found the slope of a level curve of the utility function by taking the total derivative. If we follow the same procedure with f and g, we find that

on a level curve of f: $f'_x + f'_y y' = f'_1 + f'_2 y' = 0 \implies y'(x) = -\frac{f'_1}{f'_2}$
on a level curve of g: $g'_x + g'_y y' = g'_1 + g'_2 y' = 0 \implies y'(x) = -\frac{g'_1}{g'_2}$

Since the slopes of the two level curves are the same at the point of tangency, the two functions y(x) defined by the level curves must have the same derivative at that point. Therefore
$$\frac{f'_1}{f'_2} = \frac{g'_1}{g'_2}$$
at the point which solves the constrained optimisation problem.

This is the idea behind the Lagrange method.

The Lagrange method

To solve the problem: find the largest/smallest value of $f(x_1, x_2, \ldots, x_n)$ under the constraint (subject to) $g(x_1, x_2, \ldots, x_n) = c$:

1. Write the Lagrange function
$$L(x_1, x_2, \ldots, x_n, \lambda) = f(x_1, x_2, \ldots, x_n) - \lambda\left(g(x_1, x_2, \ldots, x_n) - c\right)$$

2. Calculate all n + 1 first order partial derivatives.

3. Set up the system of n + 1 equations (the First Order Conditions, FOC)
$$\frac{\partial L}{\partial x_1} = \frac{\partial f}{\partial x_1} - \lambda\frac{\partial g}{\partial x_1} = 0$$
$$\frac{\partial L}{\partial x_2} = \frac{\partial f}{\partial x_2} - \lambda\frac{\partial g}{\partial x_2} = 0$$
$$\vdots$$
$$\frac{\partial L}{\partial x_n} = \frac{\partial f}{\partial x_n} - \lambda\frac{\partial g}{\partial x_n} = 0$$
$$\frac{\partial L}{\partial \lambda} = -\left(g(x_1, x_2, \ldots, x_n) - c\right) = 0$$
and find all solutions, in other words, all the stationary points of L.

4. Check points where the method may not work:
• points where $g(x_1, \ldots, x_n) = c$ and $g'_i = 0$ for all i = 1, 2, …, n;
• points where $g(x_1, \ldots, x_n) = c$ but some partial derivative of g or of f is not defined or not continuous.

If there is a solution to the problem, it will be a point that you find in step 3 or step 4.

Justification for the method in the two-variable case.

Why does the Lagrange method work? Why do we find the constrained optimum of one function by looking for a stationary point of another function? For functions of more than two variables the proof is complicated, but for functions of two variables it is not so hard to see that the Lagrange method is a way of finding exactly those points where $\frac{f'_1}{f'_2} = \frac{g'_1}{g'_2}$ (assuming the derivatives in the denominators are non-zero):
$$L'_x = 0 \text{ and } L'_y = 0 \iff \begin{cases} f'_x(P) - \lambda g'_x(P) = 0 \\ f'_y(P) - \lambda g'_y(P) = 0 \end{cases} \iff \begin{cases} f'_x(P) = \lambda g'_x(P) \\ f'_y(P) = \lambda g'_y(P) \end{cases}$$
$$\implies \frac{f'_x(P)}{g'_x(P)} = \frac{f'_y(P)}{g'_y(P)} \iff \frac{f'_x(P)}{f'_y(P)} = \frac{g'_x(P)}{g'_y(P)} \iff \text{the level curves have the same slope.}$$

Example 1. A consumer has the utility function $U = 4x^{0.5}y^{0.25}$.

The price of x is 2.5 and the price of y is 4. The consumer’s income is 50. What choice of
consumption maximizes utility?

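Solution: The problem can be written as:
$$\text{Maximise } U(x, y) = 4x^{1/2}y^{1/4} \text{ subject to } 2.5x + 4y = 50$$
The Lagrange function is:
$$L(x, y, \lambda) = 4x^{1/2}y^{1/4} - \lambda\left(\frac{5}{2}x + 4y - 50\right)$$
Its partial derivatives and the first order (necessary) conditions:
$$\frac{\partial L}{\partial x} = 4\cdot\frac{1}{2}x^{-1/2}y^{1/4} - \frac{5}{2}\lambda \;\;(1) \qquad 2x^{-1/2}y^{1/4} = \frac{5}{2}\lambda \;\;(4)$$
$$\frac{\partial L}{\partial y} = 4\cdot\frac{1}{4}x^{1/2}y^{-3/4} - 4\lambda \;\;(2) \qquad x^{1/2}y^{-3/4} = 4\lambda \;\;(5)$$
$$\frac{\partial L}{\partial \lambda} = -\left(\frac{5}{2}x + 4y - 50\right) \;\;(3) \qquad \frac{5}{2}x + 4y = 50 \;\;(6)$$
$$(4) \iff \lambda = \frac{4}{5}x^{-1/2}y^{1/4} \quad \text{and} \quad (5) \iff \lambda = \frac{1}{4}x^{1/2}y^{-3/4}$$
$$\frac{4}{5}x^{-1/2}y^{1/4} = \frac{1}{4}x^{1/2}y^{-3/4} \iff \frac{4}{5}y = \frac{1}{4}x \iff y = \frac{5}{16}x$$
Substituting this into (6) (the constraint), we get $\frac{15}{4}x = 50$, so x = 40/3 and y = 25/6.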
To check that the method works, we also calculate the MRS. For any (x, y):
$$\frac{\partial U}{\partial x} = MU_x = 2x^{-1/2}y^{1/4}, \qquad \frac{\partial U}{\partial y} = MU_y = x^{1/2}y^{-3/4}$$
$$MRS = \frac{MU_x}{MU_y} = \frac{2x^{-1/2}y^{1/4}}{x^{1/2}y^{-3/4}} = \frac{2y}{x}$$
At the point which satisfies the FOC:
$$MRS = 2\cdot\frac{50}{12}\cdot\frac{3}{40} = \frac{5}{8} = \frac{p_x}{p_y}$$

(Minus) the MRS = the slope of the indifference curve.

(Minus) the price ratio = the slope of the budget constraint.

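A sketch that redoes the last step of the solution with exact fractions and checks that the two expressions for λ in (4) and (5) agree at the solution (the variable names are ours):

```python
from fractions import Fraction as Fr

px, py, m = Fr(5, 2), Fr(4), Fr(50)

# The FOC gave y = (5/16) x; substituting into px*x + py*y = m gives x
x = m / (px + py * Fr(5, 16))
y = Fr(5, 16) * x
print(x, y)  # 40/3 25/6

# Both expressions for lambda, from (4) and (5), should agree at this point
xf, yf = float(x), float(y)
lam_4 = 2 * xf ** -0.5 * yf ** 0.25 / float(px)
lam_5 = xf ** 0.5 * yf ** -0.75 / float(py)
print(f"lambda from (4): {lam_4:.6f}, lambda from (5): {lam_5:.6f}")
```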
Example 2. Use the Lagrange method to solve the problem:
Find the smallest value of $f(x, y) = x^2 + xy + y^2$ subject to
$g(x, y) = x + 2y = 6$.
Solution: The Lagrange function is $L(x, y, \lambda) = x^2 + xy + y^2 - \lambda(x + 2y - 6)$.

Partial derivative and FOC:

w.r.t. x: 2x + y − λ; FOC: 2x + y = λ (1)
w.r.t. y: x + 2y − 2λ; FOC: x + 2y = 2λ (2)
w.r.t. λ: −x − 2y + 6; FOC: x + 2y = 6 (3)

(1) and (2) imply that 4x + 2y = x + 2y, which in turn implies that x = 0. Inserting this value into (3) shows that y = 3. At this point, f(0, 3) = 9.

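A sketch checking the solution against the first order conditions, plus a cross-check that is specific to this example: because the constraint is linear, x = 6 − 2y can be substituted into f to get a function of y alone (a substitution method, not the Lagrange method):

```python
from fractions import Fraction as Fr

def f(x, y):
    return x ** 2 + x * y + y ** 2

# Solution of the FOC (1)-(3)
x, y = Fr(0), Fr(3)
lam = 2 * x + y                                       # from (1)
foc_ok = (x + 2 * y == 2 * lam) and (x + 2 * y == 6)  # (2) and (3)

# Cross-check by substitution: on the constraint x = 6 - 2y, so
# h(y) = f(6 - 2y, y) = 3y^2 - 18y + 36, and h'(y) = 6y - 18 = 0 gives y = 3
def h(y):
    return f(6 - 2 * y, y)

y_sub = Fr(18, 6)
step = Fr(1, 10)
print(f"FOC satisfied: {foc_ok}, lambda = {lam}, f(0, 3) = {f(x, y)}")
print(f"h({y_sub}) = {h(y_sub)}; neighbours: {h(y_sub - step)}, {h(y_sub + step)}")
```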
Example 3 A firm produces two goods in quantities X and Y. Its cost function is
C(X, Y) = 10X + XY + 10Y
and the prices P and Q it can charge are
P = 50 - X + Y (1)
Q = 30 + 2X - Y (2)
The firm is committed to delivering a total of 15 units. Thus: X + Y = 15 (3)
How much should the firm produce of each good to maximize profits?
(It is enough to find the FOC.)
Maximise Π(X, Y)
s. t. X + Y = 15
Total revenue is $TR = PX + QY = (50 - X + Y)X + (30 + 2X - Y)Y = 50X - X^2 + 30Y + 3XY - Y^2$

Profit is: $\Pi(X, Y) = TR(X, Y) - C(X, Y) = 50X - X^2 + 3XY + 30Y - Y^2 - 10X - XY - 10Y = 40X - X^2 + 2XY + 20Y - Y^2$

The Lagrange function is

$$L(X, Y, \lambda) = 40X - X^2 + 2XY + 20Y - Y^2 - \lambda(X + Y - 15)$$

The FOC are

$$L'_X(X, Y, \lambda) = 40 - 2X + 2Y - \lambda = 0$$
$$L'_Y(X, Y, \lambda) = 20 + 2X - 2Y - \lambda = 0$$
$$L'_\lambda(X, Y, \lambda) = -(X + Y - 15) = 0$$
The only solution is X = 10, Y = 5 (λ = 30).

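The three first order conditions are linear in X, Y and λ, so they can be solved exactly with a short Gauss-Jordan elimination (our choice of method; the helper `solve_linear` is hypothetical, not a library function):

```python
from fractions import Fraction as Fr

def solve_linear(A, b):
    """Gauss-Jordan elimination with exact fractions; assumes a unique solution."""
    n = len(A)
    M = [[Fr(v) for v in row] + [Fr(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Pick a row with a non-zero entry in this column and move it up
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from all other rows
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

# The FOC rearranged as a linear system in (X, Y, lambda):
#   -2X + 2Y - lambda = -40
#    2X - 2Y - lambda = -20
#     X +  Y          =  15
X, Y, lam = solve_linear([[-2, 2, -1], [2, -2, -1], [1, 1, 0]], [-40, -20, 15])
print(f"X = {X}, Y = {Y}, lambda = {lam}")
```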

