Lectures On Optimization
A. Banerji
September 23, 2013
Chapter 1
Introduction
1.1 Some Examples
We briefly introduce our framework for optimization, and then discuss some
preliminary concepts and results that we'll need to analyze specific problems.
Our optimization examples can all be couched in the following general
framework:
Suppose V is a vector space and S ⊆ V. Suppose F : V → R. We wish to
find x* ∈ S s.t. F(x*) ≥ F(x), for all x ∈ S, or x* ∈ S s.t. F(x*) ≤ F(x), for all x ∈ S.

Example 1 Utility maximization. A consumer with utility function U : R^k → R
chooses a bundle x = (x_1, ..., x_k), x_i ≥ 0, subject to the budget constraint
Σ_{i=1}^k p_i x_i = p.x ≤ I, where p is the vector of (positive) prices and I is income.
Here, the objective function is U, and

S = {x ∈ R^k : x_i ≥ 0, i = 1, ..., k, and 0 ≤ p.x ≤ I}.
Example 2 Expenditure minimization. Same setting as above. Minimize
p.x s.t. x_i ≥ 0, i = 1, ..., k and U(x) ≥ Ū, where Ū is a non-negative real
number.
Here the objective function F : R^k → R is F(x) = p.x and

S = {x ∈ R^k : x_i ≥ 0, i = 1, ..., k, and U(x) ≥ Ū}
Example 3 Profit Maximization. Given positive output prices p_1, ..., p_s and
input prices w_1, ..., w_k, and a production function f : R^k_+ → R^s (transforming
k inputs into s products),
Maximize Σ_{j=1}^s p_j f_j(x) − Σ_{i=1}^k w_i x_i, s.t. x_i ≥ 0, i = 1, ..., k. f_j(x) is
the output of product j as a function of a vector x of the k inputs.
Here, the objective function is profits π : R^k_+ → R defined by
π(x) = Σ_{j=1}^s p_j f_j(x) − Σ_{i=1}^k w_i x_i, and

S = {x ∈ R^k : x_i ≥ 0, i = 1, ..., k}
Example 4 Intertemporal utility maximization. A worker with a known life
span T, earning a constant wage w, and receiving interest at rate r on ac-
cumulated savings, or paying the same rate on accumulated debts, wishes to
decide an optimal consumption path c(t), t ∈ [0, T]. Let accumulated assets/debts
at time t be denoted by k(t). His instantaneous utility from consumption is
u(c(t)), with u′ > 0, u″ < 0.

Example 5 Nash equilibrium. A Nash equilibrium (s*_1, ..., s*_n) is a strategy profile such that for each
player i, s*_i solves the following maximization problem:
Maximize u_i(s*_1, .., s*_{i−1}, s_i, s*_{i+1}, .., s*_n) s.t. s_i ∈ S_i.
1.2 Some Concepts and Results
We will now discuss some concepts that we will need, such as the compactness
of the set S above, and the continuity and differentiability of the objective
function F. We will work in normed linear spaces. In the absence of any
other specification, the space we will be in is R^n with the Euclidean norm
||x|| = (Σ_{i=1}^n x_i²)^{1/2}. (There are other norms that would work
equally well.) Recall that a norm in R^n is defined to be a function assigning
to each vector x a non-negative real number ||x||, s.t. (i) for all x, ||x|| ≥ 0,
with ||x|| = 0 iff x = 0; (ii) ||αx|| = |α| ||x|| for every scalar α; and (iii)
||x + y|| ≤ ||x|| + ||y|| (the triangle inequality).

A sequence (x_k)_{k=1}^∞ of points in V converges to x if for every ε > 0 there
exists a positive integer N s.t. k ≥ N implies ||x_k − x|| < ε.
Note that this is the same as saying that for every open ball B(x, ε), we
can find N s.t. for all points x_k following x_N, x_k lies in B(x, ε). This implies
that when x_k converges to x (notation: x_k → x), all but a finite number of
points in (x_k) lie arbitrarily close to x.

Examples. x_k = 1/k, k = 1, 2, ... is a sequence of real numbers converging
to zero. x_k = (1/k, 1/k), k = 1, 2, ... is a sequence of vectors in R² converging
to the origin. More generally, a sequence converges in R^n if and only if all
the coordinate sequences converge, as can be visualized in the example here
using hypotenuses and legs of triangles.
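The ε–N definition can be spot-checked numerically. A small Python sketch (the helper names are ours, purely illustrative) verifies that x_k = (1/k, 1/k) eventually stays inside any ε-ball around the origin:

```python
import math

def euclidean_norm(x):
    # ||x|| = (sum_i x_i^2)^(1/2)
    return math.sqrt(sum(xi * xi for xi in x))

def converges_to(seq, x, eps, N):
    # Spot-check the definition on the first 1000 indices past N:
    # each sampled term must lie within eps of the candidate limit x.
    return all(euclidean_norm([a - b for a, b in zip(seq(k), x)]) < eps
               for k in range(N, N + 1000))

# x_k = (1/k, 1/k) converges to the origin: ||x_k|| = sqrt(2)/k,
# so for a given eps it suffices to take N > sqrt(2)/eps.
eps = 1e-3
N = int(math.sqrt(2) / eps) + 1
print(converges_to(lambda k: (1.0 / k, 1.0 / k), (0.0, 0.0), eps, N))  # True
```

This is of course a finite check, not a proof; the proof is the inequality sqrt(2)/k < ε for k ≥ N.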
Theorem 2 (x_k) → x in R^n iff for every i ∈ {1, . . . , n}, the coordinate
sequence (x_k^i) → x^i.

Proof. Since
(x_k^i − x^i)² ≤ Σ_{j=1}^n (x_k^j − x^j)²,
taking square roots implies |x_k^i − x^i| ≤ ||x_k − x||, so for every k ≥ N s.t.
||x_k − x|| < ε, |x_k^i − x^i| < ε.
Conversely, if all the coordinate sequences converge to the coordinates
of the point x, then there exists a positive integer N s.t. k ≥ N implies
|x_k^i − x^i| < ε/√n for every i; then ||x_k − x|| = (Σ_{j=1}^n (x_k^j − x^j)²)^{1/2} < (n · ε²/n)^{1/2} = ε.

A set S ⊆ R^n is compact iff every sequence in S has a subsequence converging
to a point of S; in R^n this is equivalent to S being closed and bounded.
Suppose first that S is closed and bounded, and let (x_n) be a sequence in S.
Enclose S in a closed rectangle R_0 and bisect repeatedly, at each stage keeping
a sub-rectangle R_i containing infinitely many terms of the sequence; the
intersection ∩_{i=0}^∞ R_i is a single point; call this point x.
Now we can choose points y_i ∈ R_i, i = 1, 2, ... s.t. each y_i is some member
of (x_n); because the R_i's collapse to x, it is easy to show that (y_m) is a
subsequence that converges to x. Moreover, the y_i's lie in S, and S is closed;
so x ∈ S.

Conversely, suppose S is compact.
(i) Then it is bounded. For suppose not. Then we can construct a se-
quence (x_n) in S s.t. for every n = 1, 2, ..., ||x_n|| > n. But then, no subse-
quence of (x_n) can converge to a point in S. Indeed, take any point x ∈ S
and any subsequence (x_{m(n)}) of (x_n). Then
||x_{m(n)}|| = ||x_{m(n)} − x + x|| ≤ ||x_{m(n)} − x|| + ||x||
(the inequality above is due to the triangle inequality).
So,
||x_{m(n)} − x|| ≥ ||x_{m(n)}|| − ||x|| ≥ n − ||x||
and the RHS becomes larger with n. So (x_{m(n)}) does not converge to x.

(ii) S is also closed. Take any sequence (x_n) in S that converges to x.
Then, all subsequences of (x_n) converge to x, and since S is compact, (x_n)
has a subsequence converging to a point in S. So, this point of limit is x,
and x ∈ S. So, S is closed.
Continuity of Functions

Definition 3 A function F : R^n → R^m is continuous at x ∈ R^n, if for
every sequence (x_k) that converges to x in R^n, the image sequence (F(x_k))
converges to F(x) in R^m.

Example of point discontinuity.
Example of continuous function on discrete space.

F is continuous on S ⊆ R^n, if it is continuous at every point x ∈ S.

Examples. The real-valued function F(x) = x is continuous using this
definition, almost trivially, since (x_k) and x are identical to (F(x_k)) and F(x)
respectively.
F(x) = x² is continuous. We want to show that if (x_k) converges to x,
then (F(x_k)) = (x_k²) converges to F(x) = x². This follows from the exercise
above on limits: x_k → x, x_k → x implies x_k · x_k → x · x = x².
By extension, polynomials are continuous functions.
May talk a little about the coordinate functions of F : R^n → R^m:
F(x) = (F_1(x_1, ..., x_n), ..., F_m(x_1, ..., x_n)).
Example: F(x_1, x_2) = (x_1 + x_2, x_1² + x_2²). This is continuous because (i)
F_1 and F_2 are continuous; e.g. let x_k → x. Then the coordinates x_k^1 → x_1
and x_k^2 → x_2. So F_1(x_k) = x_k^1 + x_k^2 → x_1 + x_2 = F_1(x).
(ii) Since the coordinate sequences F_1(x_k) → F_1(x) and F_2(x_k) → F_2(x),
F(x_k) = (F_1(x_k), F_2(x_k)) → F(x) = (F_1(x), F_2(x)).
There is an equivalent, (ε, δ) definition of continuity.

Definition 4 A function F : R^n → R^m is continuous at x ∈ R^n, if for every
ε > 0, there exists δ > 0 s.t. if for any y ∈ R^n we have ||x − y|| < δ, then
||F(x) − F(y)|| < ε.

So if there is a hurdle of size ε around F(x), then, if point y is close
enough to x, F(y) cannot overcome the hurdle.
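The (ε, δ) definition can be illustrated numerically for F(x) = x². Since |y² − x²| = |y − x| · |y + x|, at x = 2 any δ ≤ ε/5 works for small ε. The Python sketch below (names ours; grid sampling, not a proof) spot-checks such a choice of δ:

```python
def check_eps_delta(f, x, eps, delta, trials=10000):
    # Sample points y with |x - y| < delta and verify |f(x) - f(y)| < eps.
    for i in range(1, trials + 1):
        y = x - delta + (2 * delta) * i / (trials + 1)  # grid inside (x-delta, x+delta)
        if abs(f(x) - f(y)) >= eps:
            return False
    return True

f = lambda x: x * x
# At x = 2: |f(y) - f(2)| = |y - 2| * |y + 2| <= delta * (4 + delta),
# so delta = eps/5 suffices for small eps.
eps = 0.1
delta = eps / 5
print(check_eps_delta(f, 2.0, eps, delta))  # True
```

A too-large δ fails the check, mirroring how the definition can be violated.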
Theorem 6 The two definitions above are equivalent.

Proof. Suppose there exists an ε > 0 s.t. for every δ > 0, there exists a y
with ||x − y|| < δ and ||F(x) − F(y)|| ≥ ε. Then for this particular ε, we can
choose a sequence of δ_k = 1/k and x_k with ||x − x_k|| < 1/k. So, (x_k) → x
but (F(x_k)) does not converge to F(x), staying always outside the ε-band of
F(x).

Conversely, suppose there exists a sequence (x_k) that converges to x, but
(F(x_k)) does not converge to F(x). So, there exists ε > 0 s.t. for every
positive integer N, there exists k ≥ N for which ||F(x_k) − F(x)|| ≥ ε. Then,
for this specific ε, there does not exist any δ > 0 s.t. for all y with ||x − y|| < δ
we have ||F(x) − F(y)|| < ε; for we can find, for any such δ, one of the x_k's s.t.
||x_k − x|| < δ, yet ||F(x_k) − F(x)|| ≥ ε.
Here is an immediate upshot of the latter definition. Suppose F : R → R
is continuous at x. If F(x) > 0, then there is an open interval (x − δ, x + δ)
s.t. if y is in this interval, then F(y) > 0. The idea is that we can take
ε = F(x)/2, say, and use the (ε, δ) definition. A similar statement will hold
if F(x) < 0.
We use this fact now in the following result.
Theorem 7 Intermediate Value Theorem
Suppose F : R → R is continuous on an interval [a, b] and F(a) and F(b)
are of opposite signs. Then there exists c ∈ (a, b) s.t. F(c) = 0.

Proof. Suppose WLOG that F(a) > 0, F(b) < 0 (i.e. for the other case
just consider the function −F). Then the set
S = {x ∈ [a, b] | F(x) ≥ 0}
is bounded above. Indeed, b is an upper bound of S since F(b) is not ≥ 0.
By the completeness property of real numbers, S has a supremum, sup S = c,
say.
It can't be that F(c) > 0, for then by continuity, there is an h ∈ S, h > c,
s.t. F(h) > 0, so c is not an upper bound of S. It can't be that F(c) < 0. For,
if c is an upper bound of S with F(c) < 0, then we have for every x ∈ [a, b]
with F(x) ≥ 0, x ≤ c. However, by continuity, there is an interval (c − δ, c]
s.t. every y in this interval satisfies F(y) < 0. But then, every x ∈ S must
be to the left of this interval. But then again, c is not the least upper bound
of S.
So, it must be that F(c) = 0.
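The proof above is non-constructive, but the same sign-change idea yields the bisection algorithm for locating a root c. A minimal Python sketch, assuming only the IVT hypotheses:

```python
def bisect_root(F, a, b, tol=1e-10):
    # Requires F continuous with F(a), F(b) of opposite signs (IVT hypotheses).
    fa, fb = F(a), F(b)
    assert fa * fb < 0, "F(a) and F(b) must have opposite signs"
    while b - a > tol:
        c = (a + b) / 2.0
        fc = F(c)
        if fc == 0:
            return c
        # Keep the half-interval on which the sign change survives.
        if fa * fc < 0:
            b, fb = c, fc
        else:
            a, fa = c, fc
    return (a + b) / 2.0

# F(x) = x^2 - 2 changes sign on [1, 2]; the root is sqrt(2).
root = bisect_root(lambda x: x * x - 2.0, 1.0, 2.0)
print(round(root, 6))  # 1.414214
```

Each iteration halves the interval, so the error after n steps is (b − a)/2^n.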
Chapter 2
Existence of Optima
2.1 Weierstrass Theorem
This theorem of Weierstrass gives a sufficient condition for a maximum and
minimum to exist, for an optimization problem.

Theorem 8 (Weierstrass). Let S ⊆ R^n be compact and let F : S → R be
continuous. Then F has a maximum and minimum on S; i.e., there exist
z_1, z_2 ∈ S s.t. F(z_2) ≤ F(x) ≤ F(z_1), for all x ∈ S.

The idea is that continuity of F preserves compactness; i.e. since S is
compact and F is continuous, the image set F(S) is compact. That holds
irrespective of the space F(S) is in; but since F is real-valued, F(S) is a
compact set of real numbers, and therefore must have a max and a min, by
a result in Chapter 1.
Proof.
Let (y_k) be a sequence in F(S). So, for every k, there is an x_k ∈ S s.t.
y_k = F(x_k). Since (x_k), k = 1, 2, ... is a sequence in the compact set S, it has a
subsequence (x_{m(k)}) that converges to a point x in S. Since F is continuous,
the image sequence (F(x_{m(k)})) converges to F(x), which is obviously in F(S).
So we've found a convergent subsequence (y_{m(k)}) = (F(x_{m(k)})) of (y_k); hence
F(S) is compact. This means the set F(S) of real numbers is closed and
bounded; so, it has at least one maximum and at least one minimum.
Example 6 p_1 = p_2 = 1, I = 10. Maximize U(x_1, x_2) = x_1 x_2 s.t. the budget
constraint. Here, the budget set is compact, since the prices are positive. We
can see that the image of the budget set S under the function U (or the range
of U) is U(S) = [0, 25]. This is compact, and so U attains a max (25) and
a min (0) on S.
The fact that U(S) is in fact an interval has to do with another property of
continuity of the objective: such functions preserve connectedness in addition
to preserving compactness of the set S, and here, the budget set is a connected
set.
Do applications of Weierstrass theorem to utility maximization and
cost minimization.
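Example 6 can be checked by brute force: a grid search over the budget set (a numerical sketch, not how one would prove U(S) = [0, 25]) recovers the max 25, attained at (5, 5), and the min 0:

```python
# Grid search over the budget set {x >= 0 : x1 + x2 <= 10} for U = x1*x2,
# with p1 = p2 = 1 and I = 10 as in Example 6.
best, worst = float("-inf"), float("inf")
n = 400
for i in range(n + 1):
    for j in range(n + 1):
        x1, x2 = 10.0 * i / n, 10.0 * j / n
        if x1 + x2 <= 10.0:          # budget constraint
            u = x1 * x2
            best, worst = max(best, u), min(worst, u)
print(worst, best)   # min 0.0, max 25.0 (attained at (5, 5))
```

The grid is chosen so that (5, 5) is a grid point; Weierstrass guarantees the extrema exist, the search merely locates them.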
Chapter 3
Unconstrained Optima
3.1 Preliminaries
A function f : R → R is defined to be differentiable at x if there exists a ∈ R
s.t.
lim_{y→x} [ (f(y) − f(x)) / (y − x) − a ] = 0    (1)
By limit equal to 0 as y → x, we require that the limit be 0 w.r.t. all
sequences (y_n) s.t. y_n → x. a turns out to be the unique number equal to
the slope of the tangent to the graph of f at the point x. We denote a by
f′(x); near x, f(x + h) is approximated by f(x) + f′(x)h. In
the general case f : R^n → R^m, f(x + h) is approximated by the affine function f(x) + Ah.
It can be shown that (w.r.t. the standard bases in R^n and R^m), the matrix
A equals Df(x), the m×n matrix of partial derivatives of f evaluated at the
point x. To see this, take the slightly less general case of a function f : R^n → R;
then Df(x) = (∂f(x)/∂x_1, . . . , ∂f(x)/∂x_n), the row vector of partial derivatives.

The basic first order necessary condition is then: if f is differentiable at x*, an
interior point of its domain, and x* is a local max or min of f, then
Df(x*) = θ.
Here, θ = (0, ..., 0) is the origin, and Df(x*) = (∂f(x*)/∂x_1, . . . , ∂f(x*)/∂x_n).
Proof. Step 1. Suppose n = 1, and x* is (say) a local interior max. Take
sequences (y_k), y_k < x*, y_k → x*, and (z_k), z_k > x*, z_k → x*. Since x* is a
local max, for all large k, f(z_k) − f(x*) ≤ 0 and f(y_k) − f(x*) ≤ 0, so
(f(z_k) − f(x*)) / (z_k − x*) ≤ 0 and (f(y_k) − f(x*)) / (y_k − x*) ≥ 0.
Taking limits, f′(x*) ≤ 0 and f′(x*) ≥ 0,
so f′(x*) = 0.
Step 2. Suppose n > 1. Take any j-th axis direction, and let g : R → R
be defined by g(t) = f(x* + t e_j). Note that g(0) = f(x*). Now, since x* is a
local max of f, f(x*) ≥ f(x* + t e_j), for |t| smaller than some cutoff value: i.e.,
g(0) ≥ g(t) for |t| smaller than this cutoff value, i.e., g(0) is a local interior
maximum (since t < 0 and t > 0 are both allowed). g is differentiable
at 0 since g(0) = f(h(0)) = f(x*), and f is differentiable at x* and h is
differentiable at t = 0. (Here, h(t) = x* + t e_j, so Dh(t) = e_j, for all t). So, g is
differentiable at 0, and by Step 1,
0 = g′(0) = Df(x*)e_j = ∂f(x*)/∂x_j.
Note that this is necessary but not sufficient for a local max or min, e.g.
f(x) = x³ has a vanishing first derivative at x = 0, which is not a local
optimum.
Second Order Conditions

Definition. x is a strict local maximum of f on S if f(x) > f(y), for all
y ∈ B(x, ε) ∩ S, y ≠ x, for some ε > 0.
We will represent the Hessian or second derivative (matrix) of f by D²f.
Theorem 10 Suppose f : R^n → R is C² on S ⊆ R^n, and x is an interior
point of S.
1. (necessary) If f has a local max (resp. local min) at x, then D²f(x)
is n.s.d. (resp. p.s.d.).
2. (sufficient) If Df(x) = θ and D²f(x) is n.d. (resp. p.d.) at x, then x
is a strict local max (resp. min) of f on S.
The results in the above theorem follow from taking a Taylor series ap-
proximation of order 2 around the local max or local min. For example,
f(x) = f(x*) + Df(x*)(x − x*) + (1/2)(x − x*)ᵀ D²f(x*)(x − x*) + R_2(x − x*)
where R_2(·) is a remainder of order smaller than two. If x* is an interior
local max or min, then Df(x*) = θ, so the sign of f(x) − f(x*) for x near x*
is governed by the quadratic form (x − x*)ᵀ D²f(x*)(x − x*).
Examples to illustrate: (i) SONC are not sufficient: f(x) = x³. (ii) Semi-
definiteness cannot be replaced by definiteness: f(x) = x⁴. (iii) These are
conditions for local, not global optima: f(x) = 2x³ − 3x². (iv) Strategy for
using the conditions to identify global optima: f(x) = 4x³ − 5x² + 2x on
S = [0, 1].
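Strategy (iv), comparing f at interior critical points and at the endpoints, can be sketched in Python for f(x) = 4x³ − 5x² + 2x on [0, 1]; here f′(x) = 12x² − 10x + 2 = 2(3x − 1)(2x − 1), with roots 1/3 and 1/2:

```python
def f(x):
    return 4 * x**3 - 5 * x**2 + 2 * x

def fprime(x):
    return 12 * x**2 - 10 * x + 2    # = 2(3x - 1)(2x - 1), roots 1/3 and 1/2

# Candidates: interior critical points (f' = 0) plus the endpoints of S = [0, 1].
candidates = [0.0, 1/3, 1/2, 1.0]
assert all(abs(fprime(c)) < 1e-12 for c in (1/3, 1/2))
x_max = max(candidates, key=f)
x_min = min(candidates, key=f)
print(x_max, f(x_max))   # global max at x = 1, where f = 1
print(x_min, f(x_min))   # global min at x = 0, where f = 0
```

No second order conditions are needed: a global max and min exist (Weierstrass), and they must be among the listed candidates.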
Chapter 4
Optimization with Equality
Constraints
4.1 Introduction
We are given an objective function f : R^n → R to maximize or minimize,
subject to k constraints. That is, there are k functions, g_1 : R^n → R,
g_2 : R^n → R, ... , g_k : R^n → R, and we wish to
Maximize f(x) over all x ∈ R^n such that g_1(x) = 0, . . . , g_k(x) = 0.
More compactly, collect the constraint functions (looking at them as com-
ponent functions) into one function g : R^n → R^k, where g(x) = (g_1(x), . . . , g_k(x)).
Then what we want is to
Maximize f(x) over all x ∈ R^n such that g(x) = θ_{1×k}.
The Theorem of Lagrange provides necessary conditions for a local opti-
mum x*. To motivate it, consider utility maximization with two goods, and
suppose the optimum satisfies x* >> θ (both quantities strictly positive).
Then reallocating a small amount of income from one good to the other
does not increase utility. Say income dI > 0 is shifted from good 2 to good 1.
So dx_1 = (dI/p_1) > 0 and dx_2 = −(dI/p_2) < 0. Note that this reallocation
satisfies the budget constraint, since
p_1(x_1 + dx_1) + p_2(x_2 + dx_2) = I
The change in utility is dU = U_1 dx_1 + U_2 dx_2 =
[(U_1/p_1) − (U_2/p_2)]dI ≤ 0, since the change in utility cannot be positive
at a maximum. Therefore,
(U_1/p_1) − (U_2/p_2) ≤ 0    (1)
Similarly, dI > 0 shifted from good 1 to good 2 does not increase utility,
so that
[−(U_1/p_1) + (U_2/p_2)]dI ≤ 0, or
−(U_1/p_1) + (U_2/p_2) ≤ 0    (2)
Eq. (1) and (2) imply
(U_1(x*)/p_1) = (U_2(x*)/p_2) = λ    (3)
That is, the marginal utility of the last bit of income, λ = (U_1(x*)/p_1) =
(U_2(x*)/p_2), is equalized across the goods at the optimum. Also, (3) implies
U_1(x*) = λp_1, U_2(x*) = λp_2.
Along with p_1 x_1 + p_2 x_2 = I, these are the FONC of the Lagrangean function
L(x, λ) = U(x_1, x_2) + λ[I − p_1 x_1 − p_2 x_2]
More generally, suppose F : R² → R and G : R² → R, and suppose x*
maximizes F subject to G(x_1, x_2) = c. Shift a small amount dc of the
constrained resource from x_2 to x_1, i.e., dx_1 = dc/G_1 > 0 and dx_2 = −dc/G_2,
so that dG = G_1 dx_1 + G_2 dx_2 = 0 and the constraint stays satisfied. So
dF = F_1 dx_1 + F_2 dx_2 ≤ 0, or [(F_1/G_1) − (F_2/G_2)]dc ≤ 0. The reverse
inequality ≥ 0 can be shown similarly, by shifting in the other direction.
Therefore,
(F_1(x*)/G_1(x*)) = (F_2(x*)/G_2(x*)) = λ    (4)
Caveat: We have assumed that G_1(x*) and G_2(x*) are not zero.
Let's go back to the utility example. At the optimum x*, suppose
you increase income by ΔI. Buying more x_1 implies utility increases by
(U_1(x*)/p_1)ΔI, approximately.
Buying more x_2 implies utility increases by (U_2(x*)/p_2)ΔI.
At the optimum, (U_1(x*)/p_1) = (U_2(x*)/p_2) = λ.
So in either case, utility increases by λΔI: λ is (approximately) the marginal
utility of income. Analogously, if the constraint constant c increases by Δc
and only x_1 is changed, F increases by
dF = F_1 dx_1 = (F_1(x*)/G_1(x*))Δc = λΔc.
If instead x_2 is changed, F increases by dF = F_2 dx_2 = (F_2(x*)/G_2(x*))Δc =
λΔc.
4.2 The Theorem of Lagrange
The set up is the following. f : R^n → R is the objective function, g_i : R^n → R,
i = 1, . . . , k are the constraint functions, and x* is a local Max of f s.t.
g_i(x) = 0, i = 1, . . . , k. Thus x* is
a Max on the set S = U ∩ {x ∈ R^n | g_i(x) = 0, i = 1, . . . , k}.

Theorem 11 (Theorem of Lagrange). Let f : R^n → R and g_i : R^n →
R, i = 1, . . . , k, k < n be C¹ functions. Suppose x* is a Max or a Min of
f on the set S = U ∩ {x ∈ R^n | g_i(x) = 0, i = 1, . . . , k}, for some open set
U ⊆ R^n. Then there exist real numbers μ, λ_1, . . . , λ_k, not all zero, such that
μDf(x*) + Σ_{i=1}^k λ_i Dg_i(x*) = θ_{1×n}.
Moreover, if rank(Dg(x*)) = k, i.e., if the gradients Dg_1(x*), . . . , Dg_k(x*)
are linearly independent (the constraint qualification, CQ), then we may take
μ = 1. In that case,
Df(x*) + Σ_{i=1}^k λ_i Dg_i(x*) = θ_{1×n}.

Remarks. (1) Suppose the CQ holds but μ = 0. Then Σ_{i=1}^k λ_i Dg_i(x*) = θ
with λ_1, . . . , λ_k not all zero, i.e., Dg_1(x*), . . . , Dg_k(x*) are linearly dependent
unless λ_i = 0, i = 1, . . . , k. This cannot be. So if the CQ holds, then μ ≠ 0, so we can
divide through by μ.
(2) In most applications the CQ holds. We usually check first whether it
holds, and then proceed. Suppose it does hold. Note that
Df(x*) + Σ_{i=1}^k λ_i Dg_i(x*) = θ is the same as
(∂f(x*)/∂x_j) + Σ_{i=1}^k λ_i (∂g_i(x*)/∂x_j) = 0, j = 1, . . . , n
Note also that this leads to the usual procedure for finding equality
constrained Max or Min, by setting up a Lagrangean function:
L(x, λ) = f(x) + Σ_{i=1}^k λ_i g_i(x), and solving the FONC
(∂L(x, λ)/∂x_j) = (∂f(x)/∂x_j) + Σ_{i=1}^k λ_i (∂g_i(x)/∂x_j) = 0, j = 1, . . . , n
(∂L(x, λ)/∂λ_i) = g_i(x) = 0, i = 1, . . . , k
which is (n + k) equations in (n + k) variables x_1, . . . , x_n, λ_1, . . . , λ_k.
Why does the above procedure usually work to isolate global
optima?
The FONC that come out of the Lagrangean function are, as seen in the
Theorem of Lagrange, necessary conditions for local optima. However, when
we do equality constrained optimization, (i) usually a global max (or min)
x* is known to exist. (ii) Second, for most problems the CQ is met at all
x ∈ S. Therefore, it is met at the optimum as well. (Note that otherwise,
not knowing the optimum when we start out on a problem, it is not possible
to check whether the CQ holds at that point!)
When (i) and (ii) are met, the solutions to the FONC of the Lagrangean
function will include all local optima, and hence will include the global op-
timum that we want. By comparing the values f(x) for all x that solve the
FONC, we get the point at which f(x) is a max or a min. With this method,
we don't need second order conditions at all, if we just want to find a global
max or a min.
Pathologies
The above procedure may not always work.
Pathology 1. A global optimum may not exist. Then none of the critical
points (solutions to the FONC of the Lagrangean function) is a global op-
timum. Critical points may then be only local optima, or they may not
even be local optima. Indeed, the Theorem of Lagrange gives a necessary
condition; so there could be critical points that are not optima at all. For
instance, maximizing f(x, y) = x³ + y³ s.t. x − y = 0, the FONC give
x* = y* = 0 (with λ = 0) as a solution. But (x*, y*) = (0, 0)
is neither a local max nor a local min. Indeed, f(0, 0) = 0, whereas for
(x, y) = (ε, ε), ε > 0, f(ε, ε) = 2ε³ > 0, and for (x, y) = (ε, ε), ε < 0,
f(ε, ε) = 2ε³ < 0, and all such points satisfy the constraint.
Pathology 2. The CQ is violated at the optimum.
In this case, the FONCs need not be satisfied at the global optimum.
Example. Max f(x, y) = −y s.t. g(x, y) = y³ − x² = 0.
Let us first find the solution using native intelligence. Then we'll show
that the CQ fails at the optimum, and that the usual Lagrangean method
is a disaster. Finally, we'll show that the general form of the equation in the
Theorem of Lagrange, which does NOT assume that the CQ holds at the
optimum, works.
The constraint is y³ = x², and since x² is nonnegative, so must y³ be.
Therefore, y ≥ 0. The maximum of −y s.t. y ≥ 0 implies y = 0 at the max.
So y³ = x² = 0, so x = 0. So f attains its global max at (x, y) = (0, 0).
Dg(x, y) = (−2x, 3y²) = (0, 0) at (x, y) = (0, 0). So rank(Dg(x, y)) =
0 < k = 1 at the optimum; the CQ fails at this point. Using the Lagrangean
method, we get the following FONC:
(∂f/∂x) + λ(∂g/∂x) = 0, that is −2λx = 0    (1)
(∂f/∂y) + λ(∂g/∂y) = 0, that is −1 + 3λy² = 0    (2)
(∂L/∂λ) = 0, that is −x² + y³ = 0    (3)
Eq.(1) implies either λ = 0 or x = 0. x = 0 implies, from Eq.(3), that
y = 0, but then (2) becomes −1 = 0, which is not possible. Similarly, λ = 0
again violates (2).
But the general form of the condition in the Theorem of Lagrange does
not rely on the CQ and works. In this problem, the only equation out of the
above three that changes is Eq. (2), as we see below:
μDf(x, y) + λDg(x, y) = (0, 0), and −x² + y³ = 0, with Df(x, y) = (0, −1),
Dg(x, y) = (−2x, 3y²) yield
μ(∂f/∂x) + λ(∂g/∂x) = 0, that is −2λx = 0    (1)
μ(∂f/∂y) + λ(∂g/∂y) = 0, that is −μ + 3λy² = 0    (2)
−x² + y³ = 0    (3)
Now, Eq.(1) implies λ = 0 or x = 0. If λ = 0, then Eq.(2) implies μ = 0.
But μ = λ = 0 is ruled out by the Theorem of Lagrange. Therefore, here
λ ≠ 0. Hence x = 0. From Eq.(3), we then have y = 0, and so from Eq. (2),
μ = 0. So we get x = y = 0 as a solution (with μ = 0 and λ ≠ 0 arbitrary).
Second-Order Conditions
These conditions are characterized by definiteness or semi-definiteness of
the Hessian of the Lagrangean function, which is the appropriate function
to look at in this constrained optimization problem. Also, we don't have to
check the appropriate inequality for the quadratic form for all x. Now, only
those x are relevant that satisfy the constraints. Second order conditions in
general say something about the curvature of the objective function around
the local max or min, i.e., how the graph curves as we move from x* to
a nearby x. In constrained optimization, we cannot move from x* to any
arbitrary x nearby; the move must be to an x which satisfies the constraints.
That is, such a move must leave all g_i(x) at 0. In other words, dg_i(x) =
Dg_i(x).dx = 0, where dx is the vector of the (small) move.
For the Lagrangean L(x, λ) = f(x) + Σ_{i=1}^k λ_i g_i(x),
D²L(x, λ)_{n×n} = D²f(x)_{n×n} + Σ_{i=1}^k λ_i D²g_i(x)_{n×n},
where
D²f(x) = [ f_11(x) ... f_1n(x) ; ... ; f_n1(x) ... f_nn(x) ]
and
D²g_i(x) = [ g_i11(x) ... g_i1n(x) ; ... ; g_in1(x) ... g_inn(x) ]
So
D²L(x, λ)_{n×n} = [ f_11(x) + Σ_{i=1}^k λ_i g_i11(x) ... f_1n(x) + Σ_{i=1}^k λ_i g_i1n(x) ; ... ;
f_n1(x) + Σ_{i=1}^k λ_i g_in1(x) ... f_nn(x) + Σ_{i=1}^k λ_i g_inn(x) ]
is the second derivative of L w.r.t. the x variables. Note that D²L(x, λ)
is symmetric, so we may work with its quadratic form.
At a given x* ∈ R^n,
Dg(x*)_{k×n} = [ Dg_1(x*) ; ... ; Dg_k(x*) ]
So the set of all vectors x that are orthogonal to all the gradient vectors
of the constraint functions at x* is the null space of Dg(x*),
N(Dg(x*)) = {x ∈ R^n | Dg(x*)x = θ_{k×1}}.
Theorem 12 Suppose there exists (x*_{n×1}, λ*_{k×1}) such that Rank(Dg(x*)) = k
and Df(x*) + Σ_{i=1}^k λ*_i Dg_i(x*) = θ.
(i) (a necessary condition) If f has a local max (resp. local min) on S at
the point x*, then xᵀD²L(x*, λ*)x ≤ 0 (resp. ≥ 0) for all x ∈ N(Dg(x*)).
(ii) (a sufficient condition) If xᵀD²L(x*, λ*)x < 0 (resp. > 0) for all
x ∈ N(Dg(x*)), x ≠ θ, then x* is a strict local max (resp. min) of f on S.

These conditions are checked using the bordered Hessian
BH(L*) = [ 0_{k×k}  Dg(x*)_{k×n} ; [Dg(x*)]ᵀ_{n×k}  D²L(x*, λ*)_{n×n} ]_{(n+k)×(n+k)}
where BH(L*) denotes the bordered Hessian of the Lagrangean evaluated at
(x*, λ*); its lower-left block is [Dg(x*)]ᵀ, which is the transpose of Dg(x*).
Theorem 13 Let BH(L*; n + k − r) denote BH(L*) with its last r rows and
columns deleted.
(1a) xᵀD²L(x*, λ*)x ≤ 0 for all x ∈ N(Dg(x*)) only if
(−1)^{n−r} det(BH(L*; n + k − r)) ≥ 0, r = 0, 1, . . . , n − k − 1.
(1b) xᵀD²L(x*, λ*)x ≥ 0 for all x ∈ N(Dg(x*)) only if
(−1)^k det(BH(L*; k + n − r)) ≥ 0, r = 0, 1, . . . , n − k − 1.
(2a) xᵀD²L(x*, λ*)x < 0 for all x ≠ θ in N(Dg(x*)), iff
(−1)^{n−r} det(BH(L*; n + k − r)) > 0, r = 0, 1, . . . , n − k − 1.
(2b) xᵀD²L(x*, λ*)x > 0 for all x ≠ θ in N(Dg(x*)), iff
(−1)^k det(BH(L*; n + k − r)) > 0, r = 0, 1, . . . , n − k − 1.
Note. (1) For the negative definite or semidefiniteness subject to con-
straints cases, the determinant of the bordered Hessian with last r rows and
columns deleted must be of the same sign as (−1)^{n−r}. The sign of (−1)^{n−r}
switches with each successive increase in r from r = 0 to r = n − k − 1. So the
corresponding bordered Hessians switch signs. In the usual textbook case of
2 variables and one constraint, n − k = 1, so we just need to check
the sign for r = 0, that is, the sign of the determinant of the big bordered
Hessian. You should be clear about what this sign should be if it is to be a
sufficient condition for a strict local max or min. For the necessary condition,
we need to check signs ≥ 0 or ≤ 0, for one permuted matrix as well, in this
case. What is this permuted matrix?
(2) As in the unconstrained case, the sufficiency conditions do not require
checking weak inequalities for permuted matrices.
(3) In the p.s.d. and p.d. cases, the signs of the relevant minors must
be all positive, if the number k of constraints is even, and all negative, if k
is odd.
(4) If we know that a global max or min exists, where the CQ is satisfied,
and we get a unique solution x* ∈ R^n that solves the FONC, then we may
use a second order condition to check whether it is a max or a min. However,
weak inequalities demonstrating n.s.d. or p.s.d. (subject to constraints) of
D²L are only necessary, not sufficient, for an optimum; only the strict
(n.d./p.d.) versions deliver sufficiency.

Example. Maximize U(x_1, x_2) = x_1 x_2 s.t. x_1 ≥ 0, x_2 ≥ 0, and
p_1 x_1 + p_2 x_2 ≤ I, with p_1, p_2, I > 0. If (x*_1, x*_2) solves the above problem, then (i)
(x*_1, x*_2) > (0, 0). If x*_i = 0 for some i, then utility equals zero; clearly, we can
do better by allocating some income to the purchase of each good; and (ii)
the budget constraint binds at (x*_1, x*_2). For if p_1 x*_1 + p_2 x*_2 < I, then we can
allocate some of the remaining income to both goods, and increase utility
further.
We conclude from this that a solution (x*_1, x*_2) will also be a solution to
the problem
Max x_1 x_2 s.t. x_1 > 0, x_2 > 0, and p_1 x_1 + p_2 x_2 = I.
That is, Maximize U(x_1, x_2) = x_1 x_2 over the set S = R²_{++} ∩ {(x_1, x_2) | I −
p_1 x_1 − p_2 x_2 = 0}. Since the budget set in this problem is compact and the
utility function is continuous, U attains a maximum on the budget set (by
Weierstrass Theorem). Moreover, we argued above that at such a maximum
x*, x*_i > 0, i = 1, 2 and the budget constraint binds. So, x* ∈ S.
Furthermore, Dg(x) = (−p_1, −p_2), so Rank(Dg(x)) = 1, at all points in
the budget set. So the CQ is met. Therefore, the global max will be among
the critical points of L(x_1, x_2, λ) = x_1 x_2 + λ(I − p_1 x_1 − p_2 x_2).
FONC: (∂L/∂x_1) = x_2 − λp_1 = 0    (1)
(∂L/∂x_2) = x_1 − λp_2 = 0    (2)
(∂L/∂λ) = I − p_1 x_1 − p_2 x_2 = 0    (3)
λ ≠ 0, (otherwise (1) and (2) imply that x_1 = x_2 = 0, which violates (3)).
Therefore, from (1) and (2), λ = (x_2/p_1) = (x_1/p_2), or p_1 x_1 = p_2 x_2. So (3)
implies I − 2p_1 x_1 = 0, or p_1 x_1 = (I/2), which is the standard Cobb-Douglas
utility result that the budget share of a good is proportional to the exponent
w.r.t. it in the utility function. So we get
x*_i = (I/2p_i), i = 1, 2, and λ* = (I/2p_1 p_2).
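The closed-form solution can be verified against the three FONC directly; a small Python sketch (function name ours, purely illustrative):

```python
def cobb_douglas_demand(p1, p2, I):
    # Critical point of L = x1*x2 + lam*(I - p1*x1 - p2*x2) from the FONC.
    x1, x2 = I / (2 * p1), I / (2 * p2)
    lam = I / (2 * p1 * p2)
    return x1, x2, lam

p1, p2, I = 2.0, 5.0, 40.0
x1, x2, lam = cobb_douglas_demand(p1, p2, I)
# FONC (1): x2 - lam*p1 = 0; (2): x1 - lam*p2 = 0; (3): the budget binds.
assert abs(x2 - lam * p1) < 1e-12
assert abs(x1 - lam * p2) < 1e-12
assert abs(I - p1 * x1 - p2 * x2) < 1e-12
print(x1, x2, lam)   # 10.0 4.0 2.0
```

Note the equal budget shares p_1 x_1 = p_2 x_2 = I/2, as derived above.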
We argued that the global max would be one of the critical points of
L(x, λ) in this example; (note, however, that the global min (which occurs
at (x_1, x_2) = (0, 0)) is not a critical point). Since we have only one critical
point, it follows that this must be the global max! (We know that x_1 = x_2 = 0
is the global min, and not the point that we have located). If we were unsure
whether our point is a max or a min, we could try second order conditions
(unnecessary here) as follows:
Dg(x*) = (−p_1, −p_2)
D²L(x*, λ*) = D²U(x*) + λ* D²g(x*)
= [ U_11(x*) U_12(x*) ; U_21(x*) U_22(x*) ] + λ* [ g_11(x*) g_12(x*) ; g_21(x*) g_22(x*) ]
= [ 0 1 ; 1 0 ] + λ* [ 0 0 ; 0 0 ] = [ 0 1 ; 1 0 ]
Now evaluate the quadratic form zᵀD²L(x*, λ*)z = 2z_1 z_2 at any (z_1, z_2)
that is orthogonal to Dg(x*) = (−p_1, −p_2). So, −p_1 z_1 − p_2 z_2 = 0 or
z_1 = −(p_2/p_1)z_2. For such (z_1, z_2), zᵀD²L(x*, λ*)z = −(2p_2/p_1)z_2² < 0 for z ≠ θ, so
D²L(x*, λ*) is n.d. on N(Dg(x*)). Equivalently, form the bordered Hessian
BH(L*) = [ 0 Dg(x*) ; [Dg(x*)]ᵀ D²L(x*, λ*) ] = [ 0 −p_1 −p_2 ; −p_1 0 1 ; −p_2 1 0 ]
det(BH(L*)) = 2p_1 p_2 > 0. This is the sign of (−1)^n = (−1)². Therefore,
there is a strict local max at x*.
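The determinant computation holds for any positive prices; a Python sketch (3×3 cofactor expansion, helper names ours) checks det(BH(L*)) = 2p_1 p_2:

```python
def det3(M):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    a, b, c = M[0]
    d, e, f = M[1]
    g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def bordered_hessian(p1, p2):
    # BH(L*) for Max x1*x2 s.t. I - p1*x1 - p2*x2 = 0:
    # Dg = (-p1, -p2) and D2L = [[0, 1], [1, 0]].
    return [[0.0, -p1, -p2],
            [-p1, 0.0, 1.0],
            [-p2, 1.0, 0.0]]

p1, p2 = 3.0, 7.0
d = det3(bordered_hessian(p1, p2))
print(d)   # 2*p1*p2 = 42.0 > 0, the sign of (-1)^n with n = 2: strict local max
assert abs(d - 2 * p1 * p2) < 1e-9
```

The sign test is exactly condition (2a) of Theorem 13 with n = 2, k = 1, r = 0.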
Digression on the Chain Rule
We saw an example (in the proof of the 1st order condition) of the Chain
Rule at work; you told me you've seen this. Namely, if h : R → R^n and
f : R^n → R are differentiable at the relevant points, then the composition
g(t) = f(h(t)) is differentiable at t and
g′(t) = Df(h(t))Dh(t) = Σ_{j=1}^n (∂f(h(t))/∂x_j) h′_j(t)
You may have encountered this before in the notation f(h_1(t), . . . , h_n(t)),
with some use of total differentiation or something. Similarly, suppose h :
R^p → R^n and f : R^n → R^m are differentiable at the relevant points, then the
composition g(x) = f(h(x)), g : R^p → R^m, is differentiable at x, and
Dg(x) = Df(h(x))Dh(x).
Here, on the RHS an m×n matrix multiplies an n×p matrix, to result
in the m×p matrix on the LHS. Things are actually quite similar to the
familiar case. The (i, j)th element of the matrix Dg(x) is ∂g_i(x)/∂x_j, where
g_i is the ith component function of g and x_j is the jth variable. Since this is
equal to the dot product of the ith row of Df(h(x)) and the jth column of
Dh(x), we have
∂g_i(x)/∂x_j = Σ_{k=1}^n (∂f_i(h(x))/∂h_k)(∂h_k(x)/∂x_j)
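The matrix form of the chain rule can be verified with finite differences. The Python sketch below (the maps h and f are our illustrative choices; central differences with step 1e-6) compares Dg(x) against the product Df(h(x))Dh(x):

```python
import math

def numeric_jacobian(F, x, m, h=1e-6):
    # Central-difference Jacobian of F : R^p -> R^m at x, returned as m x p rows.
    p = len(x)
    J = [[0.0] * p for _ in range(m)]
    for j in range(p):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# h : R^2 -> R^2, f : R^2 -> R^1, g = f o h (illustrative maps).
h_map = lambda x: [x[0] * x[1], x[0] + x[1]]
f_map = lambda y: [math.sin(y[0]) + y[1] ** 2]
g_map = lambda x: f_map(h_map(x))

x = [0.7, -0.3]
lhs = numeric_jacobian(g_map, x, 1)                     # Dg(x)
rhs = matmul(numeric_jacobian(f_map, h_map(x), 1),      # Df(h(x)) Dh(x)
             numeric_jacobian(h_map, x, 2))
assert all(abs(a - b) < 1e-4 for a, b in zip(lhs[0], rhs[0]))
print("chain rule verified numerically")
```

The two Jacobians agree to the accuracy of the finite-difference scheme.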
Application to the Implicit Function Theorem

Theorem 14 Suppose F : R^{m+n} → R^m is C¹, and suppose, for a given
c ∈ R^m, F(y, x) = c for some y ∈ R^m and some x ∈ R^n. Suppose also
that DF_y(y, x) has rank m. Then there are open sets U containing x and V
containing y and a C¹ function f : U → V s.t.
F(f(x), x) = c for all x ∈ U.
Moreover,
Df(x) = −[DF_y(y, x)]^{−1} DF_x(y, x)
The proof of this theorem starts going deep, so will not be part of this
course. But notice, that applying the Chain Rule to differentiate
F(f(x), x) = c
yields
DF_y(y, x)Df(x) + DF_x(y, x) = 0    (*)
whence the expression for Df(x).
More painfully in terms of compositions, if h(x) = (f(x), x), then
Dh(x) = [ Df(x) ; I ],
whereas DF(·) = (DF_y(·) | DF_x(·)), so matrix multiplication using parti-
tions yields Eq.(*).
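A numerical illustration of the theorem (the choice of F is ours): F(y, x) = y³ + y − x = 0 defines y = f(x) implicitly, with DF_y = 3y² + 1 > 0, so Df(x) = 1/(3y² + 1). The sketch solves for y by Newton's method and compares the formula with a finite difference:

```python
# F(y, x) = y^3 + y - x = 0 defines y = f(x) implicitly (DF_y = 3y^2 + 1 > 0).
def solve_y(x, tol=1e-12):
    # Newton's method on y -> y^3 + y - x (the map is strictly increasing in y).
    y = 0.0
    for _ in range(100):
        step = (y**3 + y - x) / (3 * y**2 + 1)
        y -= step
        if abs(step) < tol:
            break
    return y

x0 = 2.0
y0 = solve_y(x0)              # y0 = 1, since 1 + 1 = 2
# Theorem: Df(x) = -[DF_y]^(-1) DF_x = -(1/(3y^2 + 1)) * (-1) = 1/(3y^2 + 1).
predicted = 1.0 / (3 * y0**2 + 1)
h = 1e-6
numeric = (solve_y(x0 + h) - solve_y(x0 - h)) / (2 * h)
assert abs(predicted - numeric) < 1e-6
print(round(predicted, 6))    # 0.25
```

The implicit derivative matches the finite-difference slope of the solved branch.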
Proof of the Theorem of Lagrange
Before the formal proof, note that we'll use the tangency of the contour
sets of the objective and the constraint approach, which in other words uses
the implicit function theorem. For example, consider maximizing F(x_1, x_2)
s.t. G(x_1, x_2) = 0. If G_1 ≠ 0 (this is the constraint qualification in this
case), we have at a tangency point of contour sets, G_1 f′(x_2) + G_2 = 0 (where
x_1 = f(x_2) is the implicit function that keeps the points (x_1, x_2) on the
constraint); so f′(x_2) = −G_2/G_1.
On the other hand, if we vary x_2 and adjust x_1 to stay on the constraint,
the function value F(x_1, x_2) = F(f(x_2), x_2) does not increase; therefore lo-
cally around the optimum, F_1 f′(x_2) + F_2 = 0. Substituting, −F_1(G_2/G_1) +
F_2 = 0. If we now put
λ = −F_1/G_1,
we have both F_1 + λG_1 = 0 by definition, and λG_2 + F_2 = 0, the two
FONC.
The Proof:
Without loss of generality, let the leading principal k×k minor matrix of
Dg(x*) be nonsingular (relabel the variables if necessary). Write x = (w, z),
where w ∈ R^k collects the first k variables and z ∈ R^{n−k} the rest, so that the
k×k matrix Dg_w(w*, z*) is invertible. Showing that
Df(x*) + λDg(x*) = θ
for some λ = (λ_1, . . . , λ_k)
is the same as showing that the 2 equations below hold for this λ; the
equations are of dimension 1×k and 1×(n−k) respectively:
Df_w(w*, z*) + λDg_w(w*, z*) = θ    (*)
Df_z(w*, z*) + λDg_z(w*, z*) = θ    (**)
Since Dg_w(w*, z*) is invertible, define
λ = −Df_w(w*, z*)[Dg_w(w*, z*)]^{−1}
so that (*) holds by construction.
We show (**) in two steps. First, by the implicit function theorem, near z*
there is a C¹ function h with w = h(z) keeping g(h(z), z) = θ, and
Dh(z*) = −[Dg_w(w*, z*)]^{−1} Dg_z(w*, z*)
Second, define F(z) = f(h(z), z). Since there's a constrained optimum
at (h(z*), z*), z* is an unconstrained local optimum of F. So
DF(z*) = Df_w(w*, z*)Dh(z*) + Df_z(w*, z*) = θ
Substituting for Dh(z*),
−Df_w(·)[Dg_w(·)]^{−1} Dg_z(·) + Df_z(·) = θ
That is,
λDg_z(·) + Df_z(·) = θ
which is (**).
Chapter 5
Optimization with Inequality
Constraints
5.1 Introduction
The problem is to find the Maximum or the Minimum of f : R^n → R on the
set {x ∈ R^n | g_i(x) ≥ 0, i = 1, . . . , k}, where g_i : R^n → R are the k constraint
functions. At the optimum, the constraints are now allowed to be binding
(or tight or effective), i.e. g_i(x) = 0, as before, or slack (or non-binding), i.e.
g_i(x) > 0.
Example: Max U(x_1, x_2) s.t. x_1 ≥ 0, x_2 ≥ 0, I − p_1 x_1 − p_2 x_2 ≥ 0. If we
do not know whether x_i = 0, for some i, at the utility maximum, or whether
x_i > 0, then clearly we cannot use the Theorem of Lagrange. Similarly,
if there is a bliss point, then we do not know in advance whether at
the budget constrained optimum the budget constraint is binding or slack.
Again, we cannot then use the Theorem of Lagrange, to use which we need
to be assured that the constraint is binding.
Note the general nature of a constraint of the form g_i(x) ≥ 0. If we have
a constraint h(x) ≤ 0, this is equivalent to −h(x) ≥ 0. And something like
h(x) ≤ c is equivalent to c − h(x) ≥ 0.
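These normalizations are mechanical; a tiny Python sketch (helper names ours, purely illustrative):

```python
# Normalize constraints to the canonical form g_i(x) >= 0.
def geq_from_leq(h):
    # h(x) <= 0  is equivalent to  g(x) = -h(x) >= 0.
    return lambda x: -h(x)

def geq_from_bound(h, c):
    # h(x) <= c  is equivalent to  g(x) = c - h(x) >= 0.
    return lambda x: c - h(x)

g1 = geq_from_leq(lambda x: x[0] + x[1] - 1)       # x1 + x2 <= 1
g2 = geq_from_bound(lambda x: x[0] ** 2, 4.0)      # x1^2 <= 4
point = (0.25, 0.25)
print(g1(point) >= 0, g2(point) >= 0)  # True True
```

Any mixed system of ≤ and ≥ constraints can thus be rewritten in the g_i(x) ≥ 0 form assumed below.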
We use Kuhn-Tucker Theory to address optimization problems with in-
equality constraints. The main result is a first order necessary condition
that is somewhat different from that of the Theorem of Lagrange; one main
difference is that the first order conditions g_i(x) = 0, i = 1, . . . , k in the The-
orem of Lagrange are replaced by the conditions λ_i g_i(x) = 0, i = 1, . . . , k in
Kuhn-Tucker theory.
In order to motivate this difference, let us do a simple example. Consider the function f : R → R defined by f(x) = 10x − x^2. f is strictly concave, has a unique Max at x = 5, and equals 0 at x = 0 and x = 10. Consider first the problem
Max 10x − x^2 s.t. x ≤ 3, and compare this with the equality constraint x = 3. The constraint function in either case is g(x) = 3 − x. Analytically or from a diagram, we see that the maximum occurs at x = 3 in either case, and the value of the multiplier is λ = 10 − 2(3) = 4 in either case.

Denote the constraint functions binding at a point as (g_i)_{i ∈ B}, where B is the set of indexes of the binding constraints. Let g_B : R^n → R^l be the function whose l components are the constraint functions of the binding constraints. That is,
g_B(x) = (g_i(x))_{i ∈ B}.
Dg_B(x) = [Dg_{i_1}(x); ... ; Dg_{i_l}(x)] (the rows stacked one above the other), where i_1, ..., i_l are the indexes of the binding constraints. So Dg_B(x) is an l × n matrix.
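To illustrate the rank computation (with numbers of my own choosing, anticipating the consumer problem of Example 1 below), one can stack the gradients of the binding constraints and check the rank numerically:

```python
import numpy as np

# Constraints of the consumer problem in Example 1 below: g1 = x1,
# g2 = x2, g3 = I - p1*x1 - p2*x2, with gradients written as row vectors.
p1, p2 = 2.0, 3.0
Dg1 = np.array([1.0, 0.0])
Dg2 = np.array([0.0, 1.0])
Dg3 = np.array([-p1, -p2])

# Suppose g1 and g3 bind (x1 = 0 and the budget is exhausted), so l = 2
# and n = 2.  Stack the binding gradients into the l x n matrix Dg_B(x).
DgB = np.vstack([Dg1, Dg3])
print(DgB.shape, np.linalg.matrix_rank(DgB))  # (2, 2) 2: full rank
```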
We now state the FONC for the problem. The Theorem below is a consolidation of the Fritz John and the Kuhn-Tucker Theorems.

Theorem 15 (The Kuhn-Tucker (KT) Theorem). Let f : R^n → R and g_i : R^n → R, i = 1, ..., k, be C^1 functions. Suppose x* is a Maximum of f on the set S = U ∩ {x ∈ R^n | g_i(x) ≥ 0, i = 1, ..., k}, for some open set U ⊆ R^n. Then there exist real numbers μ, λ_1, ..., λ_k, not all zero, such that

μDf(x*) + Σ_{i=1}^k λ_i Dg_i(x*) = 0_{1×n}.

Moreover, if g_i(x*) > 0, then λ_i = 0.

If, in addition, Rank Dg_B(x*) = l (where B is the set of indexes of the constraints binding at x*), then μ can be taken equal to 1, λ_i ≥ 0, i = 1, ..., k, and λ_i > 0 for some i implies g_i(x*) = 0.
Suppose the constraint qualification, Rank Dg_B(x*) = l, is met at the optimum. Then the KT equations are the following (n + k) equations in the n + k variables x_1, ..., x_n, λ_1, ..., λ_k:

λ_i g_i(x*) = 0, i = 1, ..., k, with λ_i ≥ 0, g_i(x*) ≥ 0 and complementary slackness, (1)

Df(x*) + Σ_{i=1}^k λ_i Dg_i(x*) = 0. (2)

If x* is instead a minimum of f on S, Equation (2) is replaced by

−Df(x*) + Σ_{i=1}^k λ_i Dg_i(x*) = 0. (2′)
Equations (1) and (2) are known as the Kuhn-Tucker conditions.

Note finally that the conditions of the Kuhn-Tucker Theorem are not sufficient conditions for local optima; there may be points that satisfy Equations (1) and (2) (or (2′)) without being local optima. For example, you may check that for the problem
Max f(x) = x^3 s.t. g(x) = x ≥ 0, the values x = λ = 0 satisfy the KT FONC (1) and (2) for a local maximum but do not yield a maximum.
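A quick numerical check of this counterexample (a sketch, assuming the normalized conditions with μ = 1):

```python
# f(x) = x**3 on g(x) = x >= 0: the point x = 0 with lam = 0 satisfies
# the KT first order conditions yet is not a maximum.
f = lambda x: x**3
Df = lambda x: 3 * x**2
g = lambda x: x
Dg = lambda x: 1.0

x, lam = 0.0, 0.0
assert lam * g(x) == 0 and lam >= 0 and g(x) >= 0   # Eq. (1) holds
assert Df(x) + lam * Dg(x) == 0                     # Eq. (2) holds
assert f(1.0) > f(x)                                # yet x = 0 is no max
print("KT conditions hold at x = 0, but f(1) > f(0)")
```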
5.3 Using the Kuhn-Tucker Theorem
We want to maximize f(x) over the set {x ∈ R^n | g(x) ≥ 0_{1×k}}, where g(x) = (g_1(x), ..., g_k(x)).

Set up L(x, λ) = f(x) + Σ_{i=1}^k λ_i g_i(x).

(If we want to minimize f(x), set up L(x, λ) = −f(x) + Σ_{i=1}^k λ_i g_i(x).)
To ensure that the KT FONC will hold at the global max, verify that (1) a global max exists, and (2) the constraint qualification is met at the maximum.

The second check is not possible to carry out directly if we don't know where the maximum is. What we do instead is check whether the CQ holds everywhere in the domain, and if not, note the points where it fails. The CQ in the theorem depends on which constraints are binding at the maximum. Again, since we don't know the maximum, we don't know which constraints bind at it. With k constraint functions, there are 2^k profiles of binding and non-binding constraints possible, each profile implying a different CQ. We either check all of them, or rule out some profiles using clever arguments.
If both checks are fine, then we find all solutions (x*, λ*) to the set of equations:

λ_i(∂L(x, λ)/∂λ_i) = 0, λ_i ≥ 0, (∂L(x, λ)/∂λ_i) ≥ 0, i = 1, ..., k, with CS,
(∂L(x, λ)/∂x_j) = 0, j = 1, ..., n.

From the set of all solutions, we pick the (x*, λ*) for which f(x*) is maximum. Note that this method does not require checking for concavity of objective functions and constraints, and does not require checking any second order condition.
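As an illustrative sketch of this recipe (applied to the earlier toy problem Max 10x − x^2 s.t. 3 − x ≥ 0; the code and variable names are mine, not the text's):

```python
# The recipe applied to Max 10x - x**2 s.t. g(x) = 3 - x >= 0.  With one
# constraint there are 2 binding/slack profiles to check.
f = lambda x: 10 * x - x**2

candidates = []

# Profile 1: g binds, so x = 3.  Eq. (2): (10 - 2x) - lam = 0 gives lam.
x = 3.0
lam = 10 - 2 * x
if lam >= 0:                       # Eq. (1) requires lam >= 0
    candidates.append((x, lam))

# Profile 2: g slack, so lam = 0 and Eq. (2) gives 10 - 2x = 0, x = 5.
x, lam = 5.0, 0.0
if 3 - x >= 0:                     # x = 5 is infeasible: rejected
    candidates.append((x, lam))

# Among the candidates, pick the one with the largest f(x).
best = max(candidates, key=lambda c: f(c[0]))
print(best)  # (3.0, 4.0): the constrained maximum is x = 3 with lam = 4
```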
The method may fail if a global max does not exist or if the CQ fails at the maximum. The example Max f(x) = x^3 s.t. g(x) = x ≥ 0 is one where no global max exists, and we saw earlier that the method fails there.
An example in which the CQ fails: Max f(x) = 2x^3 − 3x^2, s.t. g(x) = (3 − x)^3 ≥ 0.

Suppose the constraint does not bind at the maximum; then we don't have to check a CQ. But suppose it does; that is, suppose the optimum occurs at x = 3. Dg(x) = −3(3 − x)^2 = 0 at x = 3. The CQ fails here. You could check that the KT FONC will not isolate the maximum. In fact, in this baby example, it is easy to see that x = 3 is the max, as (3 − x)^3 ≥ 0 iff 3 − x ≥ 0, so we may work with the latter constraint function, with which the CQ does not fail. It is a good exercise to visualize f(x) and see that x = 3 is the maximum, rather than merely cranking out the algebra.
Alternatively, we may use the more general FONC stated in the theorem:

μDf(x) + λDg(x) = 0, with μ, λ not both zero.

μ(6x^2 − 6x) + λ(−3(3 − x)^2) = 0, and (1)
λ ≥ 0, (3 − x)^3 ≥ 0, with strict inequality implying λ = 0. (2)

If (3 − x)^3 > 0, then λ = 0, which from Eq. (1) implies either μ = 0, which violates the FONC, or 6x^2 − 6x = 0, i.e. x = 0 or x = 1, with f(0) = 0 and f(1) = −1.

On the other hand, if (3 − x)^3 = 0, that is x = 3, then Eq. (1) implies μ = 0, so it must be that λ > 0. At x = 3, f(x) = 27, so x = 3 is the maximum.
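The comparison of candidate values can be checked numerically (a small sketch using the numbers computed above):

```python
# f and the gradient of g(x) = (3 - x)**3 for the example above.
f = lambda x: 2 * x**3 - 3 * x**2
Dg = lambda x: -3 * (3 - x)**2

assert Dg(3.0) == 0.0              # Rank Dg(3) = 0 != 1: the CQ fails
assert (f(0.0), f(1.0), f(3.0)) == (0.0, -1.0, 27.0)
assert f(3.0) > f(0.0) > f(1.0)    # comparing candidates isolates x = 3
print("x = 3 is the maximum")
```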
Two Simple Utility Maximization Problems
Example 1. This is a real baby example meant purely for illustration. No one expects you to use the heavy Kuhn-Tucker machinery for such simple problems. In this example, one expects instead that you would use reasoning about the marginal utility per rupee ratios (U_1/p_1), (U_2/p_2) to solve the problem.
Max U(x_1, x_2) = x_1 + x_2 over the set {x = (x_1, x_2) ∈ R^2 | x_1 ≥ 0, x_2 ≥ 0, I − p_1x_1 − p_2x_2 ≥ 0}, where I > 0, p_1 > 0 and p_2 > 0 are given.
So there are 3 inequality constraints:
g_1(x_1, x_2) = x_1 ≥ 0, g_2(x_1, x_2) = x_2 ≥ 0, and
g_3(x_1, x_2) = I − p_1x_1 − p_2x_2 ≥ 0.
At the maximum x*, the budget constraint must bind (g_3(x*) = 0), since utility is strictly increasing in both goods. Moreover, g_1(x*) = g_2(x*) = 0 is not possible, since consuming 0 of both goods gives utility equal to 0, which is clearly not a maximum.
So we have to check just three possibilities out of the eight:

Case (1): g_1(x*) > 0, g_2(x*) > 0, g_3(x*) = 0.
Case (2): g_1(x*) = 0, g_2(x*) > 0, g_3(x*) = 0.
Case (3): g_1(x*) > 0, g_2(x*) = 0, g_3(x*) = 0.
Before using the KT conditions, we verify that (i) a global max exists (here, because the utility function is continuous and the budget set is compact), and that (ii) the CQ holds at all 3 relevant combinations of binding constraints described above.

Indeed, for Case (1), Dg_B(x) = Dg_3(x) = (−p_1, −p_2), so Rank[Dg_3(x)] = 1, and the CQ holds.
For Case (2), Dg_B(x) = [Dg_1(x); Dg_3(x)] = [1, 0; −p_1, −p_2], so Rank[Dg_B(x)] = 2.

For Case (3), Dg_B(x) = [Dg_2(x); Dg_3(x)] = [0, 1; −p_1, −p_2], so Rank[Dg_B(x)] = 2.
Thus for the maximum x*, there exists a λ* such that (x*, λ*) is a solution to the KT FONCs. Of course, there could be other (x, λ)'s that are solutions as well, but a simple comparison of U(x) for all candidate solutions will isolate for us the maximum.
L(x, λ) = x_1 + x_2 + λ_1x_1 + λ_2x_2 + λ_3(I − p_1x_1 − p_2x_2)
The KT conditions are:

λ_1(∂L/∂λ_1) = λ_1x_1 = 0, λ_1 ≥ 0, x_1 ≥ 0, with CS (1)
λ_2(∂L/∂λ_2) = λ_2x_2 = 0, λ_2 ≥ 0, x_2 ≥ 0, with CS (2)
λ_3(∂L/∂λ_3) = λ_3(I − p_1x_1 − p_2x_2) = 0, λ_3 ≥ 0, I − p_1x_1 − p_2x_2 ≥ 0, with CS (3)
(∂L/∂x_1) = 1 + λ_1 − λ_3p_1 = 0 (4)
(∂L/∂x_2) = 1 + λ_2 − λ_3p_2 = 0 (5)
Since we don't know which of the three cases selects the constraints that bind at the maximum, we must try all three.

Case (1). Since x_1 > 0, x_2 > 0, (1) and (2) imply λ_1 = λ_2 = 0. Plugging these into Eqs. (4) and (5), we have 1 = λ_3p_1 = λ_3p_2. Since utility is strictly increasing, relaxing the budget constraint will increase utility, so the marginal utility of income, λ_3, is strictly positive. Thus λ_3p_1 = λ_3p_2 implies p_1 = p_2.
(We could alternatively have got λ_3 > 0 simply by equation mining, as follows: if λ_3 = 0, then Eqs. (4) and (5) imply λ_1 = λ_2 = −1, which violates the requirement λ_1 ≥ 0, λ_2 ≥ 0 of Eqs. (1) and (2). Thus it must be that λ_3 > 0.)
So if at a local max both x_1 and x_2 are strictly positive, then it must be that their prices are equal. All (x_1, x_2) that solve Eq. (3) are solutions. The utility in any such case equals
x_1 + (I − p_1x_1)/p_2 = I/p, where p = p_1 = p_2. Note that in this case, (U_1/p_1) = (U_2/p_2) = 1/p.
Case (2). x_1 = 0 implies, from Eq. (3), that x_2 = I/p_2. Since this is greater than 0, Eq. (2) implies λ_2 = 0. Hence from Eq. (5), λ_3p_2 = 1. Since λ_1 ≥ 0, Eqs. (4) and (5) imply λ_3p_1 = 1 + λ_1 ≥ 1 = λ_3p_2. Moreover, since λ_3 > 0, this implies p_1 ≥ p_2.
That is, if it is the case that at the maximum x_1 = 0 and x_2 > 0, then it must be that p_1 ≥ p_2. Note that in this case, (U_2/p_2) = (1/p_2) ≥ (U_1/p_1) = (1/p_1).
For completeness' sake, Eq. (5) implies λ_3 = 1/p_2. So from Eq. (4), λ_1 = (p_1/p_2) − 1. So the unique critical point of L(x, λ) is

(x*, λ*) = (x_1, x_2, λ_1, λ_2, λ_3) = (0, I/p_2, (p_1/p_2) − 1, 0, 1/p_2).
Case (3). This case is similar, and we get that x_2 = 0, x_1 > 0 occurs only if p_1 ≤ p_2. We have

(x*, λ*) = (I/p_1, 0, 0, (p_2/p_1) − 1, 1/p_1).
We see that which of the cases applies depends upon the price ratio p_1/p_2. If p_1 = p_2, then all three cases are relevant, and all (x_1, x_2) ∈ R^2_+ such that the budget constraint binds are utility maxima. But if p_1 > p_2, then only Case (2) applies, because if Case (1) had applied, we would have had p_1 = p_2, and if Case (3) had applied, that would have implied p_1 ≤ p_2. The solution to the KT conditions in that case is the utility maximum. Similarly, if p_1 < p_2, only Case (3) applies.
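The case logic above can be collected into a small demand function and checked against a brute-force search over the budget line (a sketch; the function name and the grid are my own):

```python
def linear_demand(p1, p2, I):
    """One utility-maximizing bundle for U = x1 + x2 (hypothetical helper)."""
    if p1 > p2:                      # Case (2): buy only good 2
        return (0.0, I / p2)
    if p1 < p2:                      # Case (3): buy only good 1
        return (I / p1, 0.0)
    return (I / (2 * p1), I / (2 * p1))   # p1 = p2: one of many maxima

# Brute-force check over the budget line x2 = (I - p1*x1)/p2.
p1, p2, I = 3.0, 1.0, 12.0
grid = [(x1, (I - p1 * x1) / p2) for x1 in [i * 0.01 for i in range(401)]]
best = max(grid, key=lambda x: x[0] + x[1])
x = linear_demand(p1, p2, I)
assert x == (0.0, 12.0)
assert x[0] + x[1] >= best[0] + best[1] - 1e-9
print(x)  # (0.0, 12.0)
```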
Example 2. Max U(x_1, x_2) = x_1/(1 + x_1) + x_2/(1 + x_2), s.t. x_1 ≥ 0, x_2 ≥ 0, p_1x_1 + p_2x_2 ≤ I.

Check that the indifference curves are downward sloping, convex, and that they cut the axes (show all this). This last property is due to the additive form of the utility function, and may result in 0 consumption of one of the goods at the utility maximum.

Exactly as in Example 1, we are assured that a global max exists, that the CQ is met at the optimum, and that there are only 3 relevant cases of binding constraints to check.
The Kuhn-Tucker conditions are:

λ_1(∂L/∂λ_1) = λ_1x_1 = 0, λ_1 ≥ 0, x_1 ≥ 0, with CS (1)
λ_2(∂L/∂λ_2) = λ_2x_2 = 0, λ_2 ≥ 0, x_2 ≥ 0, with CS (2)
λ_3(∂L/∂λ_3) = λ_3(I − p_1x_1 − p_2x_2) = 0, λ_3 ≥ 0, I − p_1x_1 − p_2x_2 ≥ 0, with CS (3)
(∂L/∂x_1) = 1/(1 + x_1)^2 + λ_1 − λ_3p_1 = 0 (4)
(∂L/∂x_2) = 1/(1 + x_2)^2 + λ_2 − λ_3p_2 = 0 (5)
Case (1). x_1 > 0, x_2 > 0 implies λ_1 = λ_2 = 0. Eq. (4) implies λ_3 > 0, so that Eqs. (4) and (5) give (1 + x_2)/(1 + x_1) = (p_1/p_2)^{1/2}.
Using Eq. (3), which gives x_2 = (I − p_1x_1)/p_2, in the above, we get
(p_2 + I − p_1x_1)/(p_2(1 + x_1)) = (p_1/p_2)^{1/2}, so simple computations yield

x_1 = (I + p_2 − (p_1p_2)^{1/2})/(p_1 + (p_1p_2)^{1/2}),
x_2 = (I + p_1 − (p_1p_2)^{1/2})/(p_2 + (p_1p_2)^{1/2}),
λ_3 = 1/(p_1(1 + x_1)^2).
x_1 > 0 and x_2 > 0 imply I > (p_1p_2)^{1/2} − p_2 and I > (p_1p_2)^{1/2} − p_1, respectively. If either of these fails, then we are not in the regime of Case (1).
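These formulas can be sanity-checked numerically (a sketch with sample values p_1 = 1, p_2 = 4, I = 10, chosen by me to lie in the Case (1) regime):

```python
import math

p1, p2, I = 1.0, 4.0, 10.0          # sample values in the Case (1) regime
s = math.sqrt(p1 * p2)
assert I > s - p1 and I > s - p2    # regime check

x1 = (I + p2 - s) / (p1 + s)
x2 = (I + p1 - s) / (p2 + s)
l3 = 1 / (p1 * (1 + x1) ** 2)

assert abs(I - p1 * x1 - p2 * x2) < 1e-9            # budget binds, Eq. (3)
assert abs(1 / (1 + x1) ** 2 - l3 * p1) < 1e-9      # Eq. (4)
assert abs(1 / (1 + x2) ** 2 - l3 * p2) < 1e-9      # Eq. (5)
print(x1, x2)  # 4.0 1.5
```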
Case (2). x_1 = 0 with Eq. (3) implies x_2 = I/p_2. Since this is positive, λ_2 = 0, so Eq. (5) implies λ_3 = 1/((1 + (I/p_2))^2 p_2) = p_2/(p_2 + I)^2.
λ_1 = λ_3p_1 − 1 (from x_1 = 0 and Eq. (4)).
So λ_1 = p_1p_2/(p_2 + I)^2 − 1. For this to be ≥ 0, it is required that p_1p_2/(p_2 + I)^2 ≥ 1, that is, I ≤ (p_1p_2)^{1/2} − p_2.
Utility equals x_2/(1 + x_2) = I/(p_2 + I), and

(x_1, x_2, λ_1, λ_2, λ_3) = (0, I/p_2, −1 + p_1p_2/(p_2 + I)^2, 0, p_2/(p_2 + I)^2).
Case (3). By symmetry, the solution is

(x_1, x_2, λ_1, λ_2, λ_3) = (I/p_1, 0, 0, −1 + p_1p_2/(p_1 + I)^2, p_1/(p_1 + I)^2),

and for this case to hold it is necessary that p_1p_2/(p_1 + I)^2 ≥ 1, or I ≤ (p_1p_2)^{1/2} − p_1.
To summarize: suppose p_1 = p_2 = p. Then (p_1p_2)^{1/2} − p_1 = (p_1p_2)^{1/2} − p_2 = 0. So since I > 0, we are in the regime of Case (1), and x_1 = x_2 = I/2p at the maximum.
Suppose on the other hand that p_1 < p_2 (the contrary case can be worked out similarly). Then p_2 > (p_1p_2)^{1/2} > p_1, so that
(p_1p_2)^{1/2} − p_1 > 0 > (p_1p_2)^{1/2} − p_2. Thus either
I > (p_1p_2)^{1/2} − p_1, in which case we use Case (1), or
I ≤ (p_1p_2)^{1/2} − p_1, in which case we use Case (3). Case (2), that in which a positive amount of good 2 and zero of good 1 is consumed at the maximum, does not apply.
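The regime selection can be sketched as a small function and compared with a brute-force search over the budget line (the function name and sample values are my own):

```python
import math

def demand(p1, p2, I):
    """Maximizer of x1/(1+x1) + x2/(1+x2) on the budget set (a sketch)."""
    s = math.sqrt(p1 * p2)
    if I <= s - p1:                  # Case (3): consume only good 1
        return (I / p1, 0.0)
    if I <= s - p2:                  # Case (2): consume only good 2
        return (0.0, I / p2)
    return ((I + p2 - s) / (p1 + s), (I + p1 - s) / (p2 + s))  # Case (1)

U = lambda x: x[0] / (1 + x[0]) + x[1] / (1 + x[1])

# p1 < p2 with small income: sqrt(p1*p2) - p1 = 9 >= I = 5, so Case (3).
p1, p2, I = 1.0, 100.0, 5.0
x = demand(p1, p2, I)
assert x == (5.0, 0.0)

# Brute-force comparison along the budget line confirms the corner.
grid = [(x1, (I - p1 * x1) / p2) for x1 in [i * 0.001 for i in range(5001)]]
assert U(x) >= max(U(pt) for pt in grid) - 1e-9
print(x)  # (5.0, 0.0)
```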
5.4 Miscellaneous
(1) For problems where some constraints are of the form g_i(x) = 0 and others are of the form g_j(x) ≥ 0, only the latter give rise to Kuhn-Tucker-like complementary slackness conditions (λ_j ≥ 0, g_j(x) ≥ 0, λ_jg_j(x) = 0).
(2) If the objective to be maximized, f, and the constraints g_i, i = 1, ..., k (where constraints are of the form g_i(x) ≥ 0) are all concave functions, and if Slater's constraint qualification holds (i.e., there exists some x̄ ∈ R^n s.t. g_i(x̄) > 0, i = 1, ..., k), then the Kuhn-Tucker conditions become both necessary and sufficient for a global max.
(3) Suppose f and all the g_i's are quasiconcave. Then the Kuhn-Tucker conditions are almost sufficient for a global max: an x* satisfying them is a global max provided, in addition, either Df(x*) ≠ 0, or f is concave.
Appendix
Completeness Property of Real Numbers