Notes 02 - Producer Theory


Producer Theory

Jonathan Levin, Paul Milgrom, and Ilya Segal

September 2015

1 Competitive Producer Behavior


Since Marshall, the standard approach to developing a theory of competitive markets
is to attribute demand and supply in competitive markets to two different kinds of
participants. The suppliers are firms, whose behavior is studied by “producer theory,”
and the demanders are consumers, studied under the rubric of “consumer theory.”
Demand and supply are reconciled by prices, which is the notion of market equilibrium.
We adopt that standard for this class. This note studies producer theory and a
separate one studies consumer theory.[1]
The standard model of the firm has the following features. Firms are described
by fixed and exogenously given technologies that allow them to convert inputs
(in simple models, these are land, labor, capital and raw materials) into outputs
(products). “Competitive” firms take both input and output prices as given, and
choose a production plan (a technologically feasible set of inputs and outputs) to
maximize profits.
Before we get into the details, let’s remark on a few key features of the model.

1. Firms are price takers. This “competitive firm” assumption applies to both
input and output markets and makes it reasonable to ask questions about
(1) what happens to the firm’s choices when a price changes and (2) what
can be inferred about a firm’s technology from its choices at various price
levels. For output markets, the assumption fits best when each firm has
many competitors who produce perfectly substitutable products, and a parallel
condition applies to input markets. Of course, even the most casual
empiricism suggests that many firms sell differentiated products and have at
least some flexibility in setting prices, and even small firms may have market
power in buying local inputs, such as hiring workers who live near a mine or
factory, so the results of the theory need to be applied with care. Even so,
the pattern of analysis established in this way is often partially extendable
to situations in which firms are not price takers.

[1] We begin with producer theory because it proves to be mathematically simpler. The
simplicity comes from the fact that the parameters (prices) enter a firm’s objective
function (profits) but not its feasible set (production set). In consumer theory,
conversely, prices determine the consumer’s feasible set (budget set) but not his
objective (utility function). Nevertheless, there is a close connection between
consumer and producer theory, which we will highlight later.

2. Technology is exogenously given. This assumption is sometimes criticized as
too narrow to be useful in a world of technical change, product innovations,
and consumer marketing. Although it is true that some aspects of technical
change are difficult to capture within the standard model, the model is more
flexible and encompassing than many critics acknowledge. The exogenous
technology model formally includes the possibility of investing in technical
change, provided these investments are themselves treated as inputs into a
production process. Similarly, the model can formally include advertising
and branding that alter consumers’ perceptions, provided that we represent
these activities as transforming the output into a different product. It allows
managerial effort and talent to be inputs as well, if they, too, are treated as
simple inputs into production.

3. Firms maximize profits. This assumption can be justified if

(i) the firm is “competitive,” i.e., cannot affect prices for any of its inputs
or outputs (assumption (1) above),
(ii) there is no uncertainty about profits (e.g., the firm can buy all inputs
and sell all outputs before uncertainty is resolved, thus ensuring profits),
and
(iii) the firm’s managers are perfectly controlled by the owners/shareholders.

Under (i)-(iii), all of the firm’s shareholders would agree to maximize profits,
since this would then maximize each shareholder’s income, which he could
spend optimally according to his preferences.
If (i) is violated, an owner who is also an input supplier or an output consumer
would have an interest in raising or lowering the relevant price. (E.g., a
worker-owner in the absence of a perfect labor market would want to deviate
from profit maximization to drive up wages.) If (ii) is violated, profits are
uncertain, and the optimal decisions depend on the owners’ beliefs regarding
possible realizations of uncertainty, or their attitudes towards risk, and these
beliefs and/or risk attitudes may differ across owners. Regarding (iii): Since
the time of Adam Smith, if not earlier, many observers have emphasized
that corporations are characterized by a separation between ownership (the
stockholders) and control (management), and that this separation weakens
the incentives of managers to maximize profits. The problem of motivating
managers to act on behalf of owners has been a main concern for the
economics (and law) of agency theory.

4. The Marshallian approach of separating the household, where consumption
takes place, from the firm, in which all production takes place, is criticized
by some economists. An alternative approach treats households as both
consumers and producers. In this alternative view, an essential feature of
all economic development is the change in household productive behavior
associated with the development of markets. As markets develop, households
move away from the pattern of producing for themselves only the goods
they plan to consume toward a pattern in which each household specializes,
devoting most of its productive efforts to producing one particular good,
which it sells or trades for other goods.
When markets are fully developed, under assumptions (i)-(iii) above, we
could artificially separate each household into a “consumer” part and a
profit-maximizing “producer” part. (This observation is known as “Fisher’s
Separation Theorem.”) However, with imperfect markets (e.g., insurance markets
or household service markets), such separation is impossible.

Students sometimes wonder about the role of assumptions such as these, par-
ticularly when they are contrary to the facts of the situation. Economists have
taken a range of positions concerning how to think about simplifying assumptions,
and there is no consensus about the “correct” view. One extreme position is to
deny the relevance of any inference based on such models, because the premises
of the model are false. At the opposite extreme, some practicing economists seem
willing to accept “standard” or “customary” assumptions uncritically. Both of
these extreme positions are rejected by thoughtful people.
All economic modeling abstracts from reality by making simplifying but untrue
assumptions. Experience in economics and other fields shows that models built on
such assumptions can serve useful purposes. One purpose is to support tractable models
that isolate and highlight important effects for analysis by suppressing other
effects. Another purpose is to serve as a basis for numerical calculations, possibly for
use in estimating magnitudes, deciding economic policies, or designing economic
institutions. For example, one might want to estimate the effect of a tax policy
change on overall investment or hiring. The initial calculations based on a simplified
model might then be adjusted to account for the effects suppressed in the model.
For a model to serve these practical purposes, its relevant predictions must be
reasonably accurate. The accuracy of predictions can sometimes be checked by
testing using data. Sometimes, the “robustness” of predictions can be evaluated
partly by theoretical analyses. In no case, however, should models or assumptions
be regarded as adequate merely because they are “usual” or “standard.” Although
this seems to be an obvious point, it needs to be emphasized because the temptation
to skip the validation step can be a powerful one. Standard assumptions often make
the theory fall into easy, recognizable patterns, while checking the suitability of
the assumptions can be much harder. The validation step is not dispensable.

2 Production Sets, Technology


We start by describing the technological possibilities of the firm. Suppose there are
$n$ commodities in the economy. A production plan is a vector
$y = (y_1, \dots, y_n) \in \mathbb{R}^n$, where an output will have $y_k > 0$ and an
input will have $y_k < 0$. If the firm has nothing to do with good $k$, then $y_k = 0$.
The production possibilities of the firm are described by a set
$Y \subseteq \mathbb{R}^n$, where any $y \in Y$ is a feasible production plan. Figure
1 illustrates a production possibility set with one input and one output as the area
below the curve in the second quadrant, with $y$ labeling a point in the set.

[Figure 1: A Production Possibility Set (axes $x_1$ and $x_2$; $y$ marks a point in the set)]

Throughout our analysis, we will make the innocent technical assumptions that
Y is non-empty (so as to have something to study!) and closed (to help ensure
the existence of optimal production plans). Consider some more interesting and
substantive economic properties production sets might have:

• Free Disposal. The production set $Y$ satisfies free disposal if $y \in Y$ implies
that $y' \in Y$ for any $y' \le y$.

• Shut Down. The production set $Y$ has the shut-down property if $0 \in Y$; that
is, the firm has the option of using no resources and producing nothing.

• Nonincreasing Returns to Scale. The production set $Y$ has nonincreasing
returns to scale (loosely, “decreasing returns to scale”) if $y \in Y$ implies that
$\alpha y \in Y$ for all $0 \le \alpha \le 1$.

• Nondecreasing Returns to Scale. The production set $Y$ has nondecreasing
returns to scale (loosely, “increasing returns to scale”) if $y \in Y$ implies that
$\alpha y \in Y$ for all $\alpha \ge 1$.

• Constant Returns to Scale. The production set $Y$ has constant returns to
scale if $y \in Y$ implies that $\alpha y \in Y$ for all $\alpha \ge 0$.

• Convexity. The production set $Y$ is convex if for all $y, y' \in Y$ and all $t \in (0, 1)$,
$ty + (1 - t)y' \in Y$. This condition incorporates a kind of “nonincreasing
returns to specialization,” meaning that if two “extreme” plans are feasible,
their combination will be as well. In addition, if $0 \in Y$, then convexity
implies nonincreasing returns to scale (take $y' = 0$ in the definition).

• Strict Convexity. $Y$ is strictly convex if for all $y, y' \in Y$ with $y \ne y'$ and all
$t \in (0, 1)$, $ty + (1 - t)y'$ is in the interior of $Y$ (i.e., lies in a ball contained in $Y$).
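The properties above can be spot-checked numerically for a concrete technology. The sketch below is only an illustration under stated assumptions: the set $Y = \{(y_1, y_2) : y_1 \le 0,\ y_2 \le \sqrt{-y_1}\}$ (one input, one output) and the helper names `in_Y`, `scale`, `sample_plan`, and `mix` are hypothetical choices, not part of the notes. Random sampling can refute a property by finding a counterexample, but it cannot prove one.

```python
import math
import random

def in_Y(y, tol=1e-12):
    """Membership test for the assumed set Y = {(y1, y2): y1 <= 0, y2 <= sqrt(-y1)}."""
    y1, y2 = y
    return y1 <= 0 and y2 <= math.sqrt(-y1) + tol

def scale(alpha, y):
    """Scale a production plan by alpha."""
    return (alpha * y[0], alpha * y[1])

def mix(t, y, y_prime):
    """Convex combination t*y + (1 - t)*y'."""
    return tuple(t * a + (1 - t) * b for a, b in zip(y, y_prime))

def sample_plan(rng):
    """Draw a random feasible plan: input y1 <= 0, output at or below the frontier."""
    y1 = -rng.uniform(0.0, 10.0)
    y2 = rng.uniform(-5.0, math.sqrt(-y1))
    return (y1, y2)

rng = random.Random(0)
plans = [sample_plan(rng) for _ in range(2000)]

# Shut down: the zero plan is feasible.
shut_down = in_Y((0.0, 0.0))

# Free disposal: lowering every coordinate of a feasible plan keeps it feasible.
free_disposal = all(
    in_Y((y[0] - rng.uniform(0, 3), y[1] - rng.uniform(0, 3))) for y in plans
)

# Nonincreasing returns to scale: scaling down by alpha in [0, 1] stays feasible.
nonincreasing_rts = all(in_Y(scale(rng.uniform(0, 1), y)) for y in plans)

# Convexity: random convex combinations of feasible plans stay feasible.
convex = all(
    in_Y(mix(rng.random(), rng.choice(plans), rng.choice(plans)))
    for _ in range(2000)
)

# Nondecreasing returns fail: (-1, 1) is feasible, but 4 * (-1, 1) = (-4, 4)
# is not, because 4 > sqrt(4).
violates_nondecreasing_rts = in_Y((-1.0, 1.0)) and not in_Y(scale(4.0, (-1.0, 1.0)))
```

For this concave frontier the sampled checks of shut down, free disposal, nonincreasing returns, and convexity all pass, while a single counterexample rules out nondecreasing (and hence constant) returns to scale.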

One convenient way to represent production possibility sets is using a
transformation function $T : \mathbb{R}^n \to \mathbb{R}$, where $T(y) \le 0$ implies
that $y$ is feasible, and $T(y) > 0$ implies that $y$ is infeasible. This is represented
in Figure 2. The set of boundary points $\{y \in \mathbb{R}^n : T(y) = 0\}$ is called the
transformation frontier.[2]
When the transformation function is differentiable, we can define the marginal
rate of transformation between goods $k$ and $l$ as:

$$ MRT_{k,l}(y) = \frac{\partial T(y)/\partial y_l}{\partial T(y)/\partial y_k}. $$

The marginal rate of transformation measures the extra amount of good $k$ that
can be obtained per unit reduction of good $l$. As Figure 2 shows, it is equal to the
slope of the boundary of the production set at point $y$. (Hence, even though there
are many transformation functions $T$ describing a given production set, at points
on the transformation frontier, MRT does not depend on which one is used.)
[Figure 2: Marginal Rate of Transformation (axes $x_1$ and $x_2$; the frontier separates the region $T(y) < 0$ from the region $T(y) > 0$, and its slope at $y$ equals $MRT(y)$)]

Production sets correspond to a very general model where each good $k$ can be
either an input or an output. That is, a firm may both produce widgets and also
use widgets to make gadgets, with $y_k$ being the net amount of widgets produced.
Often, it is convenient to separate inputs and outputs, letting $q = (q_1, \dots, q_l)$
denote the vector of the firm’s outputs, and $z = (z_1, \dots, z_m)$ the vector of inputs
(where $l + m = n$).

[2] Several interpretations can be offered of the function $T(y)$. As just one example
among many, one might interpret it to define the amount of technical progress required
to make the combination $y$ a feasible one (the currently available technology
corresponding to “zero progress”).
If the firm has only a single output, we can describe the transformation frontier
by writing output as a function of the inputs used, $q = f(z)$. Formally, allowing
for free disposal, the production set can then be described as

$$ Y = \{(q, z) \in \mathbb{R} \times \mathbb{R}^m : q \le f(z)\}. $$

In this case, we refer to $f(\cdot)$ as the firm’s production function. Equivalently, this
production set can be described with the transformation function
$T(q, z) = q - f(z)$. The marginal rate of transformation between inputs $k$ and $l$,
also known as the marginal rate of technical substitution, can then be computed as

$$ MRT_{k,l}(y) = \frac{\partial f(z)/\partial z_l}{\partial f(z)/\partial z_k}. $$

This expression tells us how many units of input $k$ must be used in place of one
unit of input $l$ to maintain the same level of output. It is illustrated in Figure 3.
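As a worked illustration of this ratio, the sketch below evaluates the marginal rate of technical substitution numerically. The Cobb-Douglas production function $f(z) = z_1^{a} z_2^{1-a}$ with $a = 0.5$, and the helper names `partial` and `mrts`, are assumptions made for the example only. For this $f$ the ratio is $\frac{1-a}{a} \cdot \frac{z_1}{z_2}$, so at $z = (4, 1)$ it should be close to 4.

```python
def f(z, a=0.5):
    """Assumed Cobb-Douglas production function f(z) = z1^a * z2^(1 - a)."""
    return z[0] ** a * z[1] ** (1 - a)

def partial(func, z, i, h=1e-6):
    """Central finite-difference approximation of the i-th partial derivative."""
    up, down = list(z), list(z)
    up[i] += h
    down[i] -= h
    return (func(up) - func(down)) / (2 * h)

def mrts(func, z, k, l):
    """MRT_{k,l} = (df/dz_l) / (df/dz_k): units of input k replacing one unit of input l."""
    return partial(func, z, l) / partial(func, z, k)

z = (4.0, 1.0)
mrts_12 = mrts(f, z, 0, 1)  # indices are zero-based: input 1 is index 0

# Sanity check: removing eps units of input 2 and adding mrts_12 * eps units
# of input 1 leaves output unchanged to first order.
eps = 1e-4
output_before = f(z)
output_after = f((z[0] + mrts_12 * eps, z[1] - eps))
```

The last two lines check the interpretation in the text: the substitution keeps the firm on (approximately) the same isoquant.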

[Figure 3: Isoquants and MRTS (axes $z_1$ and $z_2$; the isoquant $\{z : f(z) = q\}$ has slope $MRTS(y)$ at the point $y$)]

3 Profit Maximization and Duality


We now consider the profit maximization problem for the firm from a production
set $Y$ at a price vector $p$. Denote the set of the firm’s optimal production decisions
by $Y^*(p)$ and the resulting profits by $\pi(p)$. Thus,

$$ \pi(p) = \sup_{y \in Y} p \cdot y, \tag{1} $$

$$ Y^*(p) = \arg\max_{y \in Y} p \cdot y = \{y \in Y : p \cdot y = \pi(p)\}. \tag{2} $$

Thus, $Y^* : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is a correspondence (set-valued
function), which we will call the firm’s “optimal supply correspondence.” (Note that
in our general model, it specifies the firm’s inputs as negative numbers along with
outputs as positive numbers.) $\pi(p)$ is called the firm’s “profit function.”

Remark 1 We have not made sufficient assumptions to ensure that a maximum
profit is achieved (i.e., $Y^*(p) \ne \emptyset$), and so the sup cannot always be replaced
with the max. In particular, we allow for the possibility that $\pi(p) = +\infty$, i.e., the
firm achieves unbounded profits at some prices, which may happen when the production
set $Y$ is unbounded even if it is closed. For example, this always happens when
some prices are negative and $Y$ satisfies free disposal. More importantly, this may
happen even when all prices are positive. For example, if $Y$ has nondecreasing
returns to scale and has the shut-down property, then at any $p$, either $\pi(p) = 0$ or
$\pi(p) = +\infty$. (Exercise: show this.)

Suppose that we don’t know the firm’s production set $Y$, but we observe some
of the firm’s supply decisions $y(p) \subseteq Y^*(p)$ for $p \in \mathbb{R}^n$. (This
formulation is quite general: it allows that some prices $p$ may not be observed at all,
and so for them $y(p) = \emptyset$; for other prices we may observe only some but not all
optimal decisions, and so $y(p)$ could be a strict subset of $Y^*(p)$.) We can ask three
questions:

1. What can we infer from the observations about the underlying production
set?

2. Can we recover the entire production set if we have enough data?

3. Which observations are “rationalizable,” i.e., consistent with profit
maximization for some production set?

Note that these questions are parallel to those asked in “revealed preference”
theory, with one difference: In revealed preference theory, we observed the
decision-maker’s choices and the feasible sets and wanted to infer his objective function.
Here we observe the firm’s choices and the objective function (profits) and want
to infer the feasible set (production set). The roles of the objective function and
the feasible set in the two problems are swapped.

Definition 1 Supply correspondence $y : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is
rationalized by production set $Y$ if $y(p) \subseteq Y^*(p)$ for all $p \in \mathbb{R}^n$,
where $Y^*$ is the optimal supply correspondence given by (2). $y$ is rationalizable if
it is rationalized by some production set.

We start by answering Question 1: What can we infer from the observations
about the production set $Y$? Suppose that we observe that at prices $p$ the firm
has chosen production plan $y$. We can make two inferences from this observation:
(1) plan $y$ is feasible, and (2) any production plan that would yield higher profits
than $y$ cannot be feasible. If we have observed many choices by the firm at various
prices, we can use idea (1) to construct a simple “inner bound” on $Y$ that consists
of all choices that the firm actually makes:

$$ Y^I = \bigcup_{p \in \mathbb{R}^n} y(p). $$

Similarly, we can use idea (2) to construct an “outer bound” on $Y$, which includes
only plans that don’t give the firm higher profits at any given price vector $p$
than what it obtained given its observed choices:[3]

$$ Y^O = \{y' \in \mathbb{R}^n : p \cdot y' \le p \cdot y \text{ for all } p \in \mathbb{R}^n,\ y \in y(p)\}. $$

Remark 2 Note the parallel to revealed preference in constructing the outer bound:
there from the fact that an alternative is feasible we inferred that it can’t be better
than the chosen point. Here from the fact that an alternative is better than the
chosen point we infer that it can’t be feasible.

It turns out that $Y^I$ and $Y^O$ summarize all that can be inferred about the
production set:

Proposition 1 Production set $Y$ rationalizes supply correspondence $y$ if and only
if $Y^I \subseteq Y \subseteq Y^O$.

Proof. The “only if” part holds by construction of $Y^I$ and $Y^O$, as argued in the
text. For the “if” part, note that with production set $Y$, for any price vector
$p \in \mathbb{R}^n$ and any $y \in y(p)$, we have $y \in Y^I \subseteq Y$, and also
$p \cdot y \ge p \cdot y'$ for all $y' \in Y \subseteq Y^O$, and so $y \in Y^*(p)$. Q.E.D.
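With finitely many observations, both bounds are easy to compute. The sketch below is a minimal illustration, assuming a hypothetical data set of three (price, plan) pairs generated by the technology $y_2 \le \sqrt{-y_1}$ (input $y_1$, output $y_2$); the names `observations`, `inner_bound`, and `in_outer_bound` are made up for the example.

```python
def dot(p, y):
    """Inner product p . y, i.e. the profit of plan y at prices p."""
    return sum(pi * yi for pi, yi in zip(p, y))

# Hypothetical observations: (price vector, chosen plan).
# Plans are (input, output) with the input recorded as a negative number.
observations = [
    ((1.0, 1.0), (-0.25, 0.5)),
    ((2.0, 1.0), (-1.0 / 16.0, 0.25)),
    ((1.0, 2.0), (-1.0, 1.0)),
]

# Inner bound Y^I: every plan the firm was actually seen to choose.
inner_bound = [y for _, y in observations]

def in_outer_bound(candidate, tol=1e-12):
    """Candidate is in Y^O if it never beats an observed choice at that choice's prices."""
    return all(dot(p, candidate) <= dot(p, y) + tol for p, y in observations)

never_observed_but_possible = (-1.0, 0.9)   # earns less than each observed choice
revealed_infeasible = (-0.25, 2.0)          # would beat the choice made at prices (1, 1)
```

A plan such as `never_observed_but_possible` lies in $Y^O$ but not in $Y^I$, so the data alone cannot settle whether it is feasible; `revealed_infeasible` lies outside $Y^O$ and is ruled out.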

Now we proceed to Question 2: Can we infer $Y$ exactly? Note that as we get
more data (i.e., the set $y(p)$ grows for all $p$), the inner bound $Y^I$ gets bigger
and the outer bound $Y^O$ gets smaller, and so we infer the production set $Y$ more
precisely. Now we ask if we can infer production set $Y$ exactly when our data is in
some sense “complete.” For simplicity we restrict attention to production sets $Y$
satisfying free disposal. For such sets, we can expand the inner bound to the “free
disposal inner bound”:

$$ Y^I_{FD} = \{y \in \mathbb{R}^n : y \le x \text{ for some } x \in Y^I\}. $$

If $Y$ has free disposal, then knowing that $Y^I \subseteq Y$ implies that
$Y^I_{FD} \subseteq Y$. Note also that with free disposal, it does not make sense to face
the firm with negative prices, since whenever $p_l < 0$ the firm can make unbounded
profits by taking $y_l \to -\infty$. Thus, we focus on nonnegative nonzero prices:
$p \in \mathbb{R}^n_+ \setminus \{0\}$.

[3] In general convex analysis terms, this construction of the outer bound is known as
a “Fenchel duality.” For a general and deep treatment of duality, see Rockafellar’s
(1970) Convex Analysis.

It turns out that closed convex production sets with free disposal are fully
inferred if the data is “complete” in the following sense:

Proposition 2 Suppose the production set $Y$ is convex and closed and has free
disposal. Then
(i) if $y(p) \ne \emptyset$ for all $p \in \mathbb{R}^n_+ \setminus \{0\}$ (i.e., we observe
some optimal choice at each price) then $Y^O = Y$.
(ii) if $y(p) = Y^*(p)$ for all $p \in \mathbb{R}^n_+ \setminus \{0\}$ (i.e., we observe all
optimal choices at each price) and $Y \ne \mathbb{R}^n$, then $Y^I_{FD} = Y$.

Proof. (i) Since by construction $Y \subseteq Y^O$, we only need to show that
$Y^O \subseteq Y$. Take any $x \in \mathbb{R}^n \setminus Y$. Then, by the separating
hyperplane theorem, there exists $p \in \mathbb{R}^n \setminus \{0\}$ such that
$p \cdot x > p \cdot y$ for all $y \in Y$, and in particular for $y \in y(p)$.
Furthermore, we must have $p \ge 0$, for if we had $p_l < 0$ for some $l$, letting
$e^l_k = 1$ for $k = l$ and $= 0$ otherwise, we would have
$p \cdot (y - K e^l) = p \cdot y - p_l K > p \cdot x$ for $K > 0$ large enough, and
$y - K e^l \in Y$ by free disposal. So, $p \in \mathbb{R}^n_+ \setminus \{0\}$, and it
follows that $x \notin Y^O$. Therefore, $Y^O \subseteq Y$.
(ii) Since by construction $Y^I_{FD} \subseteq Y$, we only need to show that
$Y \subseteq Y^I_{FD}$. By the supporting hyperplane theorem, for every point $x$ on the
boundary of $Y$ there exists some $p \in \mathbb{R}^n \setminus \{0\}$ such that
$p \cdot x = \max_{y \in Y} p \cdot y$. Just as in (i), by free disposal, $p \ge 0$. Thus,
$x \in Y^*(p)$, and so the boundary of $Y$ is contained in $Y^I$. Furthermore, for any
point $y \in Y$ we can find a point $x \ge y$ on the boundary of $Y$. (For example,
letting $\hat{\lambda} = \sup\{\lambda \ge 0 : y + (\lambda, \dots, \lambda) \in Y\}$, we have
$\hat{\lambda} < +\infty$, for otherwise free disposal would imply $Y = \mathbb{R}^n$, and
since the sup is achieved due to closedness of $Y$, we can take
$x = y + (\hat{\lambda}, \dots, \hat{\lambda})$.) By the preceding argument $x \in Y^I$,
hence $y \in Y^I_{FD}$. This establishes that $Y \subseteq Y^I_{FD}$. Q.E.D.

If $Y$ is not convex, then $Y \ne Y^O$, since the outer bound $Y^O$ is always convex,
being the intersection of convex sets:
$Y^O = \bigcap_{p \in \mathbb{R}^n,\ y \in y(p)} \{y' \in \mathbb{R}^n : p \cdot y' \le p \cdot y\}$.
(Exercise: show that $Y^O$ is the convex hull of $Y$ under the assumptions of part
(i) of the above Proposition.) Thus, we cannot infer a nonconvex production set
completely: There is insufficient information in $y(p)$ to decide whether the points
in the set difference $Y^O \setminus Y^I$ are in the set $Y$.

Remark 3 We can infer from the complete data whether $Y$ is convex: it is convex
if and only if $Y^I_{FD} = Y^O$. However, this inference relies on observing all
profit-maximizing choices $Y^*(p)$ at a given price $p$. If we only observe a subset of
optimal choices, $y(p) \subseteq Y^*(p)$, we may not be able to tell whether some choices
in $Y^O$ that would be optimal for price $p$ (such as point $x$ in Figure 4) are not chosen
because they are unavailable due to nonconvexity (so the production set is $Y$ as
depicted in the Figure), or whether they are available but some other equally profitable
choice $y$ was made instead. Distinguishing between the two cases, however, is
important for many economic issues, e.g., whether a competitive equilibrium with
such production technology exists.

Figure 4: Isoquants and MRTS.

It should be clear that to construct the “outer bound” as in part (i) of the
Proposition above (this is in contrast to the “inner bound”), we actually do not
need to observe any of the firm’s choices, as long as we observe its profit function
$\pi(p) = p \cdot y(p)$ (which must be single-valued). We can describe the outer bound
with the “gain function” $G : \mathbb{R}^n \times Y \to \mathbb{R}$, defined as

$$ G(p, y) = p \cdot y - \pi(p). \tag{3} $$

In words, $G(p, y)$ describes the gain over $\pi(p)$ from choosing production vector $y$
at price $p$. Supposing we observe $\pi(p)$ for all $p \in P$, we can then describe the
resulting outer bound $Y^O$ as the set of production plans at which the gains are
nonpositive at any price:

$$ Y^O = \Big\{ y \in \mathbb{R}^n : \sup_{p \in P} G(p, y) \le 0 \Big\}. \tag{4} $$

In other words, the “outer bound” production set $Y^O$ can be described by means
of the “transformation function” $T(y) = \sup_{p \in P} G(p, y)$.

Example 1 Consider a single-output firm, whose production set can be written
as $Y = \{(q, -z) : (q, z) \in \mathbb{R}^n_+,\ q \le f(z)\}$. Suppose that we observe all
possible nonnegative input prices $w \in \mathbb{R}^{n-1}_+$ while the output price is fixed
at 1, thus $P = \{(1, w) : w \in \mathbb{R}^{n-1}_+\}$. Then the transformation function
describing $Y^O$ can be written as

$$ T(q, z) = \sup_{w \ge 0} \left[ (q - w \cdot z) - \pi(w) \right] = q - \inf_{w \ge 0} \left[ \pi(w) + w \cdot z \right]. $$

Thus, the outer bound $Y^O$ can be described by means of the production function

$$ f^O(z) = \inf_{w \ge 0} \left[ \pi(w) + w \cdot z \right]. $$

Note that observing the set of prices $P$ amounts to having “complete data”: fixing
the output price at 1 is just a normalization, since $\pi$ must be homogeneous
of degree one, so that $\pi(p, w) = p\,\pi(1, w/p)$ (see below). Thus, if the firm’s actual
production function $f$ is concave, then by Proposition 2, we have $f^O = f$. More
generally, we have $f^O \ge f$, and $f^O$ will be the lowest concave function that is
nowhere below $f$.
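The recovery formula $f^O(z) = \inf_{w \ge 0} [\pi(w) + w \cdot z]$ can be checked numerically in a simple case. The sketch below assumes, purely for illustration, a single input and the concave production function $f(z) = \sqrt{z}$, whose profit function at output price 1 is $\pi(w) = 1/(4w)$ (the firm chooses $z = 1/(4w^2)$). The infimum is approximated by a minimum over a finite log-spaced price grid, which is an assumption of the sketch, not something the notes prescribe.

```python
import math

def profit(w):
    """Profit function of the assumed technology f(z) = sqrt(z) at output price 1."""
    return 1.0 / (4.0 * w)

# Log-spaced grid of input prices from 1e-3 to 1e3, standing in for "all w > 0".
PRICE_GRID = [10 ** (-3 + 6 * i / 6000) for i in range(6001)]

def f_outer(z):
    """Approximate f^O(z) = inf_w [pi(w) + w * z] by a minimum over the grid."""
    return min(profit(w) + w * z for w in PRICE_GRID)

# Compare the recovered function with the true production function.
recovered = {z: f_outer(z) for z in (0.25, 1.0, 4.0)}
largest_gap = max(abs(value - math.sqrt(z)) for z, value in recovered.items())
```

Because $f$ is concave here, the recovered values agree with $\sqrt{z}$ up to the grid resolution, in line with the claim $f^O = f$. Every grid value is at least the true infimum, so the approximation can only err upward.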

Now we proceed to Question 3: which observations are rationalizable. Proposition 1
immediately implies

Corollary 3 Supply correspondence $y$ is rationalizable if and only if
$Y^I \subseteq Y^O$, i.e., $p \cdot y' \le p \cdot y$ for all $p, p' \in \mathbb{R}^n$,
$y \in y(p)$, $y' \in y(p')$. (These inequalities are known as the “Weak Axiom of Profit
Maximization,” or WAPM.)
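For a finite data set, WAPM is a direct pairwise check. The following sketch is one way to write it; the function name `satisfies_wapm` and both data sets are hypothetical examples, with plans recorded as (input, output) and the input negative.

```python
def dot(p, y):
    """Profit of plan y at prices p."""
    return sum(pi * yi for pi, yi in zip(p, y))

def satisfies_wapm(observations, tol=1e-12):
    """WAPM: no observed plan earns more at prices p than the plan chosen at p."""
    return all(
        dot(p, other) <= dot(p, chosen) + tol
        for p, chosen in observations
        for _, other in observations
    )

# Data generated by maximizing profit on y2 <= sqrt(-y1): rationalizable.
consistent_data = [
    ((1.0, 1.0), (-0.25, 0.5)),
    ((2.0, 1.0), (-1.0 / 16.0, 0.25)),
    ((1.0, 2.0), (-1.0, 1.0)),
]

# Same plans, but assigned to the wrong prices: at prices (1, 1) the firm picks
# (-1, 1) with profit 0 although (-0.25, 0.5), chosen elsewhere, would earn 0.25.
inconsistent_data = [
    ((1.0, 1.0), (-1.0, 1.0)),
    ((1.0, 2.0), (-0.25, 0.5)),
]

consistent_ok = satisfies_wapm(consistent_data)
inconsistent_ok = satisfies_wapm(inconsistent_data)
```

The check costs one inner product per ordered pair of observations, which is what makes it impractical for very large data sets and motivates the differential characterization later in the notes.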

One simple consequence of this characterization is that when checking
rationalizability we can restrict attention to supply functions rather than
correspondences.

Corollary 4 Let $y : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ and
$P = \{p \in \mathbb{R}^n : y(p) \ne \emptyset\}$. Supply correspondence $y$ is
rationalizable if and only if (i) any selection $\hat{y} : P \to \mathbb{R}^n$ from it is
rationalizable, and (ii) $p \cdot y(p)$ is single-valued at each $p \in P$.

Proof. (i) is equivalent to WAPM applied to $p' \ne p$, while (ii) is equivalent to
WAPM applied to $p' = p$. Q.E.D.

Thus, when given a supply correspondence, we only need to check that (i) each
selection from it is a rationalizable supply function, and (ii) the profit function
$\pi(p) = p \cdot y(p)$ at any given $p \in P$ does not depend on which selection is chosen.
Since checking (ii) is trivial, from now on we focus on checking rationalizability of
a given supply function (rather than correspondence).

4 Rationalizability: Differentiable Case


The WAPM inequalities that characterize rationalizability are hard to check
directly if the set of observations is large. So, now we consider how rationalizability
can be characterized when we actually have a continuum of observations.
Specifically, now we suppose that we observe a supply function $y : P \to \mathbb{R}^n$ on
an open convex set $P \subseteq \mathbb{R}^n$ (e.g., we could take $P$ to be the set of all
strictly positive price vectors). For this case, we derive useful characterizations of
rationalizability using differential conditions.
We begin with some simple necessary conditions for rationalizability:

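Proposition 5 (i) Any profit function $\pi$ given by (1) is convex.

(ii) Any profit function $\pi$ given by (1) is homogeneous of degree one, i.e.,
$\pi(\lambda p) = \lambda \pi(p)$ for all $p \in \mathbb{R}^n$, $\lambda > 0$.

(iii) Any optimal supply correspondence $Y^*$ given by (2) is homogeneous of degree
zero, i.e., $Y^*(\lambda p) = Y^*(p)$ for all $p \in \mathbb{R}^n$, all $\lambda > 0$.

Proof. (i) For any $p', p''$, $t \in [0, 1]$,

$$ \begin{aligned}
\pi(tp' + (1 - t)p'') &= \sup_{y \in Y} \, (tp' + (1 - t)p'') \cdot y \\
&= \sup_{y \in Y} \, (t p' \cdot y + (1 - t) p'' \cdot y) \\
&\le t \sup_{y \in Y} p' \cdot y + (1 - t) \sup_{y \in Y} p'' \cdot y \\
&= t \pi(p') + (1 - t) \pi(p'').
\end{aligned} $$

(ii)
$$ \pi(\lambda p) = \sup_{y \in Y} \lambda p \cdot y = \lambda \sup_{y \in Y} p \cdot y = \lambda \pi(p). $$

(iii) Using (ii),

$$ \begin{aligned}
Y^*(\lambda p) &= \{y \in Y : \lambda p \cdot y = \pi(\lambda p)\} = \{y \in Y : \lambda p \cdot y = \lambda \pi(p)\} \\
&= \{y \in Y : p \cdot y = \pi(p)\} = Y^*(p).
\end{aligned} $$

Q.E.D.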
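Both properties are easy to see numerically when $Y$ is finite, because $\pi$ is then a maximum of finitely many linear functions of $p$. The sketch below assumes a small hypothetical production set of four plans and random positive prices; it samples the inequalities from the proof rather than proving them.

```python
import random

# Hypothetical finite production set: (input, output) plans, input negative.
Y = [(0.0, 0.0), (-1.0, 1.0), (-4.0, 2.0), (-9.0, 3.0)]

def dot(p, y):
    """Profit of plan y at prices p."""
    return sum(pi * yi for pi, yi in zip(p, y))

def profit(p):
    """pi(p) = max over the finite set Y of p . y."""
    return max(dot(p, y) for y in Y)

rng = random.Random(1)

def random_price():
    return (rng.uniform(0.1, 5.0), rng.uniform(0.1, 5.0))

homogeneous = True
convex = True
for _ in range(1000):
    p1, p2 = random_price(), random_price()
    lam = rng.uniform(0.1, 10.0)
    t = rng.random()
    scaled = tuple(lam * a for a in p1)
    mixed = tuple(t * a + (1 - t) * b for a, b in zip(p1, p2))
    # Degree-one homogeneity: pi(lam * p) = lam * pi(p).
    homogeneous = homogeneous and abs(profit(scaled) - lam * profit(p1)) < 1e-9
    # Convexity: pi at a mixture is at most the mixture of the profits.
    convex = convex and profit(mixed) <= t * profit(p1) + (1 - t) * profit(p2) + 1e-9
```

The tolerances only absorb floating-point rounding; the underlying inequalities hold exactly.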

More generally, we say that a function $f : \mathbb{R}^n \to \mathbb{R}^m$ is homogeneous
of degree $k$ if $f(\lambda p) = \lambda^k f(p)$ for all $\lambda > 0$. The following
differentiable characterization of homogeneity will prove useful:

Proposition 6 (Euler’s law) If $f$ is differentiable and homogeneous of degree $k$,
then $Df(p)\, p = k f(p)$.

Proof. Differentiate the identity $f(\lambda p) = \lambda^k f(p)$ with respect to $\lambda$
and set $\lambda = 1$. Q.E.D.

Now we offer a complete characterization of rationalizability in the differentiable
case. Recall that a supply function $y$ is rationalizable if and only if
$Y^I \subseteq Y^O$, i.e., WAPM holds. Writing $Y^O$ in the form (4) above, with the gain
function $G(p, y) = p \cdot y - \pi(p)$ from (3), this means that for all $p \in P$,
$\sup_{p' \in P} G(p', y(p)) \le 0$ (it is now convenient to denote the maximization
variable by $p'$ rather than $p$). Since we also know that $G(p, y(p)) = 0$ by the
definition of $\pi(p)$, this is equivalent to

$$ \max_{p' \in P} G(p', y(p)) = G(p, y(p)) = 0 \quad \text{for all } p \in P. \tag{D} $$

Intuitively, the gain from choosing a production plan $y$ that is optimal for price $p$
when the actual price is $p'$ must be nonpositive, and is exactly zero when $p' = p$.
(D) can be viewed as a dual problem to the profit-maximization problem: its
solution is a price vector $p$ supporting a given production plan $y$ as an optimal
choice.
Since the set $P$ is open, all of its points are interior. At any $p \in P$ at which
$\pi$ is differentiable, so is the objective function $G(\cdot, y(p))$ in (D), and therefore
the following FOC must be satisfied:

$$ \nabla_{p'} G(p', y(p)) \big|_{p' = p} = y(p) - \nabla\pi(p) = 0. $$


This equality is known as Hotelling’s Lemma.
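Hotelling’s Lemma can be verified numerically for a technology with a closed-form solution. The sketch below assumes, for illustration only, one input and one output with $q \le \sqrt{z}$, price vector $(w, p)$ ordered as (input price, output price), and net supply $y(w, p) = (-p^2/(4w^2),\ p/(2w))$, which gives $\pi(w, p) = p^2/(4w)$. The gradient of $\pi$ is approximated by central differences and compared with $y$.

```python
def supply(prices):
    """Net supply for the assumed technology q <= sqrt(z): returns (-z*, q*)."""
    w, p = prices
    return (-(p ** 2) / (4 * w ** 2), p / (2 * w))

def profit(prices):
    """pi(prices) = prices . y(prices)."""
    return sum(a * b for a, b in zip(prices, supply(prices)))

def gradient(func, x, h=1e-6):
    """Central finite-difference gradient of a scalar function."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((func(tuple(up)) - func(tuple(down))) / (2 * h))
    return tuple(grad)

prices = (2.0, 3.0)  # hypothetical input price w = 2 and output price p = 3
grad_pi = gradient(profit, prices)
y_star = supply(prices)
max_error = max(abs(g - y) for g, y in zip(grad_pi, y_star))
```

At these prices the supply is $(-9/16,\ 3/4)$, and the numerical gradient matches it up to discretization error, even though `profit` is differentiated with the optimal plan allowed to vary with prices.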

Remark 4 Hotelling’s Lemma is an example of an “Envelope Theorem,” which
tells us that we can differentiate the value of the firm’s maximization problem with
respect to the parameters (in this case, prices) holding the maximizer (in this case,
the production plan) fixed, and get the same answer as if the maximizer is allowed
to vary. In this case, $\pi(p) = p \cdot y(p)$, so differentiating with respect to $p$ without
varying $y(p)$ leads to $\nabla\pi(p) = y(p)$. Such theorems are called “Envelope Theorems”
because “value functions” can be geometrically represented as the “upper envelope”
of a family of functions of the parameter. In the present example, the firm’s profit
function $\pi(p)$ is by definition the upper envelope of the family of linear functions
$\{p \cdot y\}_{y \in Y}$ of $p$. Traditional derivations of the Envelope Theorem for constrained
maximization make many more assumptions: e.g., they assume that the maximizer $y(p)$
is differentiable in the parameter and that the feasible set $Y$ is convex and described
by differentiable inequality constraints, and then apply the Kuhn-Tucker FOC for
maximization. By considering the dual problem (D), we managed to avoid any of
these assumptions; instead we only had to make the (weaker) assumption that the
value function $\pi(\cdot)$ is differentiable. Later we will dispense with this assumption
as well. For a more general statement of the Envelope Theorem, see Milgrom and
Segal (Econometrica 2002).

Recall, in particular, that the profit function $\pi$ cannot depend on which selection
$y(p) \in Y^*(p)$ is chosen, and so Hotelling’s Lemma implies that the firm could
only have a unique optimal supply decision (i.e., $Y^*(p)$ must be a singleton) at
each price vector $p$ at which $\pi$ is differentiable.
Observe that the convexity of $\pi$ implies the concavity of the objective function
$G(\cdot, y(p)) = (\cdot) \cdot y(p) - \pi(\cdot)$ in (D), and so (along with the convexity of the
feasible set $P$) implies that any price vector satisfying the FOC for problem (D) must
solve this problem. Thus, for the special case where $\pi$ is differentiable everywhere
on $P$, we can state:

Proposition 7 Consider a supply function $y : P \to \mathbb{R}^n$ on an open convex set
$P \subseteq \mathbb{R}^n$ such that $\pi(p) = p \cdot y(p)$ is differentiable. Then $y$ is
rationalizable if and only if
(i) (Hotelling’s Lemma) $\nabla\pi(p) = y(p)$ for all $p \in P$, and
(ii) $\pi$ is convex.

This characterization of rationalizability is easy to remember: (i) describes the
First-Order Condition and (ii) the Second-Order Condition for Problem (D). This
characterization gives immediately all the important properties of supply functions
and profit functions.
This characterization can be restated in a more familiar form in the special
case in which the supply function $y$ is continuously differentiable (this is clearly a
stronger assumption than differentiability of $\pi$). The familiar characterization is
stated in terms of properties of the Jacobian
$Dy(p) = (\partial y_i/\partial p_j)_{i,j=1,\dots,n}$, known as the “substitution matrix”:

Proposition 8 A continuously differentiable supply function $y : P \to \mathbb{R}^n$ on an
open convex set $P \subseteq \mathbb{R}^n$ is rationalizable if and only if its Jacobian
$Dy(p)$ is symmetric, positive semidefinite, and satisfies $Dy(p)\, p = 0$.

Proof. Let $\pi(p) = p \cdot y(p)$, and use the chain rule to write

$$ \nabla\pi(p) = y(p) + p \cdot Dy(p). $$

Thus, Hotelling’s Lemma is equivalent to $p \cdot Dy(p) = 0$. Furthermore,
differentiating Hotelling’s Lemma yields $Dy(p) = D^2\pi(p)$, and $\pi$ must be twice
continuously differentiable, therefore its “Hessian” matrix
$D^2\pi(p) = (\partial^2\pi/\partial p_i \partial p_j)_{i,j=1}^{n}$ is symmetric. Thus,
Hotelling’s Lemma is equivalent to the symmetry of $Dy(p)$ together with
$(p \cdot Dy(p))^T = (Dy(p))^T p = Dy(p)\, p = 0$.[4]
Finally, the function $\pi$ is convex if and only if its Hessian $D^2\pi(p)$ is positive
semidefinite, and under Hotelling’s Lemma this Hessian equals $Dy(p)$.
Hence, the conditions of this proposition are equivalent to those of Proposition
7. Q.E.D.

Note that the condition $Dy(p)\, p = 0$ is nothing but Euler’s Law for the degree-0
homogeneity of the supply function $y$. (Recall that by Hotelling’s Lemma, the
differentiability of $\pi$ implies that $Y^*(p) = \{y(p)\}$, and we know it must be
homogeneous of degree 0.)
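The three conditions on the substitution matrix can be checked numerically for a specific supply function. The sketch below again assumes the illustrative technology $q \le \sqrt{z}$ with prices ordered as $(w, p)$ and net supply $y(w, p) = (-p^2/(4w^2),\ p/(2w))$; the Jacobian is approximated by central differences, and positive semidefiniteness is tested through the smaller eigenvalue of the symmetrized $2 \times 2$ matrix.

```python
import math

def supply(prices):
    """Net supply for the assumed technology q <= sqrt(z): returns (-z*, q*)."""
    w, p = prices
    return (-(p ** 2) / (4 * w ** 2), p / (2 * w))

def jacobian(func, x, h=1e-6):
    """Central finite-difference Jacobian: J[i][j] = d func_i / d x_j."""
    n = len(x)
    columns = []
    for j in range(n):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        f_up, f_down = func(tuple(up)), func(tuple(down))
        columns.append([(a - b) / (2 * h) for a, b in zip(f_up, f_down)])
    return [[columns[j][i] for j in range(n)] for i in range(n)]

prices = (2.0, 3.0)  # hypothetical input price w = 2 and output price p = 3
J = jacobian(supply, prices)

# Symmetry: cross-price effects coincide.
asymmetry = abs(J[0][1] - J[1][0])

# Euler's law for degree-0 homogeneity: Dy(p) p = 0.
euler_residual = [sum(J[i][j] * prices[j] for j in range(2)) for i in range(2)]

# Positive semidefiniteness via the smaller eigenvalue of the symmetrized matrix.
a, d = J[0][0], J[1][1]
b = 0.5 * (J[0][1] + J[1][0])
min_eigenvalue = 0.5 * ((a + d) - math.sqrt((a - d) ** 2 + 4 * b ** 2))
```

Analytically the Jacobian at these prices is $\begin{pmatrix} 9/16 & -3/8 \\ -3/8 & 1/4 \end{pmatrix}$, which is symmetric, has determinant zero (reflecting $Dy(p)\,p = 0$), and has positive trace.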

Remark 5 One can wonder what can we say about rationalizability if we only
observe the profit function (1) but not the supply choices. We already showed that
any rationalizable profit function must be homogeneous of degree 1 and convex.
It turns out that any profit function function ⇡ : P ! R satisfying these two
conditions on an open convex set P ✓ Rn is in fact rationalizable. Exercise: prove
this characterization of rationalizability for di↵erentiable profit functions, using
Proposition 7
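4
Note that

$$p \cdot Dy(p) = \left( \sum_j p_j \frac{\partial y_j(p)}{\partial p_i} \right)_{i=1}^n, \quad \text{while} \quad Dy(p)\,p = \left( \sum_j p_j \frac{\partial y_i(p)}{\partial p_j} \right)_{i=1}^n,$$

so in general they need not coincide, but they do coincide when $Dy(p)$ is a symmetric matrix.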

5 Rationalizability “in the Large”
5.1 Law of Supply
Now we develop “finite-change” analogues of the positive semidefiniteness and symmetry of the substitution matrix. The goal is twofold: (i) the finite-change analogues permit more intuitive interpretations, and (ii) they permit a general characterization of rationalizability that dispenses with any differentiability assumptions.
We begin with positive semidefiniteness of the substitution matrix, which has a simple interpretation. Take a small change $dp$ in the prices. The resulting small change in supply will be $Dy(p)\,dp$. Positive semidefiniteness means that for any $dp$, $dp \cdot Dy(p)\,dp \ge 0$, i.e., the change of supply in the direction of the price change is nonnegative. In particular, if only the price of good $i$ changes, then $\partial y_i(p) / \partial p_i \ge 0$ (so the supply curve of good $i$ is upward-sloping). To obtain a finite-change version of this condition, apply WAPM twice to write

$$(p' - p) \cdot y(p) \le p' \cdot y(p') - p \cdot y(p) \le (p' - p) \cdot y(p'), \qquad (5)$$

then compare the first and last expressions and rearrange terms to get

$$(p' - p) \cdot (y(p') - y(p)) \ge 0,$$

an inequality known as the “Law of Supply”.
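The Law of Supply can be checked numerically on any technology, since it uses nothing beyond profit maximization. The following is a minimal sketch, assuming a hypothetical finite production set of random netput vectors (the set `Y`, the price ranges, and the helper names are illustrative choices, not part of the notes):

```python
import random

def supply(p, Y):
    """Return a profit-maximizing plan from the finite production set Y at prices p."""
    return max(Y, key=lambda y: sum(pi * yi for pi, yi in zip(p, y)))

def law_of_supply_gap(p, p_new, Y):
    """Compute (p' - p) . (y(p') - y(p)); the Law of Supply says this is >= 0."""
    y, y_new = supply(p, Y), supply(p_new, Y)
    return sum((b - a) * (v - u) for a, b, u, v in zip(p, p_new, y, y_new))

random.seed(0)
# Hypothetical technology: 50 arbitrary netput vectors in R^3 (no convexity imposed).
Y = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(50)]

gaps = []
for _ in range(1000):
    p = tuple(random.uniform(0.1, 2) for _ in range(3))
    p_new = tuple(random.uniform(0.1, 2) for _ in range(3))
    gaps.append(law_of_supply_gap(p, p_new, Y))

# Allow a tiny tolerance for floating-point rounding.
print(min(gaps) >= -1e-12)
```

Because the production set here is an arbitrary finite set, the check illustrates that the inequality does not depend on convexity or smoothness of the technology.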

5.2 Producer Surplus Formula


Now we turn to symmetry of the substitution matrix, which is a much more subtle consequence of maximization. The symmetry means that the effect of a small change in the price of good $i$ on the supply of good $j$ must be exactly the same as the effect of the same change in the price of good $j$ on the supply of good $i$. It is very difficult to develop a simple economic intuition for how this conclusion follows from profit maximization. Historically, this conclusion was argued to be important evidence that a mathematical approach to economic theory could lead to new insights and predictions that would be missed by a merely verbal approach.

We will now derive a “large-change” implication of Hotelling’s Lemma (which as we have seen is equivalent to symmetry of the substitution matrix, under the extra assumption of homogeneity of degree zero). Consider a smooth path $\rho$ connecting two price vectors $p', p'' \in \mathbb{R}^n$. Formally, the path is described by a smooth (i.e., continuously differentiable) function $\rho : [0,1] \to \mathbb{R}^n$ such that $\rho(0) = p'$ and $\rho(1) = p''$. Assuming the profit function $\pi$ is differentiable, we can use the Fundamental Theorem of Calculus and the Chain Rule to write

$$\pi(p'') - \pi(p') = \int_0^1 \frac{d}{d\tau} \pi(\rho(\tau))\, d\tau = \int_0^1 \nabla \pi(\rho(\tau)) \cdot \rho'(\tau)\, d\tau = \int \nabla \pi(p) \cdot dp.$$

(The last expression is a “shorthand” for the path integral in the previous expression.) Note in particular that the path integral cannot depend on the smooth path $\rho$ chosen to connect $p'$ and $p''$. By Hotelling’s Lemma, this implies

$$\pi(p'') - \pi(p') = \int y(p) \cdot dp.$$

This expression is known as the “Producer Surplus Formula”. The “path independence” of the path integral is mathematically equivalent to the symmetry of the substitution matrix $(\partial y_i(p) / \partial p_j)_{i,j}$.
For example, consider the special “one-dimensional” case, in which only one price is changing from $p_i = p_i'$ to $p_i = p_i''$ and the other prices $p_{-i}$ are fixed. (Formally, the path is given by $\rho(\tau) = ((1 - \tau) p_i' + \tau p_i'', p_{-i})$.) In this case, the Producer Surplus Formula yields

$$\pi(p_i'', p_{-i}) - \pi(p_i', p_{-i}) = \int_{p_i'}^{p_i''} y_i(p_i, p_{-i})\, dp_i.$$
pi

This one-dimensional PSF simply gives the profit change as the area below the
supply curve for good i. Note that it allows us to calculate how the firm’s profits
change in response to changes in the price of good i knowing only the supply
function for good i, without knowing the prices or supply choices for other goods

(so the profits p · y (p) could not be calculated). This is very useful for empirical
work in “partial equilibrium,” which focuses on some markets and ignores other
markets.
When more than one price changes at the same time, there is no “natural” path to choose, and we have many options for calculating the change in profits. E.g., we could change dimensions one by one, and the result should not depend on the order in which we change prices. Say, with two dimensions we can write

$$\pi(p_1'', p_2'') - \pi(p_1', p_2') = \int_{p_1'}^{p_1''} y_1(p_1, p_2')\, dp_1 + \int_{p_2'}^{p_2''} y_2(p_1'', p_2)\, dp_2 = \int_{p_2'}^{p_2''} y_2(p_1', p_2)\, dp_2 + \int_{p_1'}^{p_1''} y_1(p_1, p_2'')\, dp_1.$$
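The order-independence of the two-step calculation can be illustrated numerically. The sketch below assumes a hypothetical single-input technology $f(z) = \sqrt{z}$ with output price `p` and input price `w`, for which the supply is $(q, -z) = (p/(2w), -p^2/(4w^2))$; the midpoint-rule integrator and the chosen price points are illustrative:

```python
def supply(p, w):
    """Netput supply (output, -input) for the assumed technology f(z) = sqrt(z)."""
    return p / (2 * w), -(p / (2 * w)) ** 2

def profit(p, w):
    """Profit p*q - w*z evaluated at the supply choice."""
    q, neg_z = supply(p, w)
    return p * q + w * neg_z

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

p0, w0 = 1.0, 1.0  # starting prices
p1, w1 = 3.0, 2.0  # ending prices

# Order 1: change the output price first, then the input price.
path_a = (integrate(lambda p: supply(p, w0)[0], p0, p1)
          + integrate(lambda w: supply(p1, w)[1], w0, w1))
# Order 2: change the input price first, then the output price.
path_b = (integrate(lambda w: supply(p0, w)[1], w0, w1)
          + integrate(lambda p: supply(p, w1)[0], p0, p1))
direct = profit(p1, w1) - profit(p0, w0)

print(path_a, path_b, direct)
```

Both orders should agree with the direct profit difference up to integration error, which is what path independence asserts for this two-price example.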

5.3 General Characterization


While we have derived the Producer Surplus Formula from Hotelling’s Lemma, the formula is actually general. To see why we may be interested in more general cases, note that by Hotelling’s Lemma, profits are differentiable at $p$ only if the supply correspondence $Y^*(p)$ is single-valued at $p$. (In fact, the converse can also be proven true if the production set $Y$ is closed.) This is ensured when the production set is strictly convex (see footnote 5), but not otherwise: when $Y$ is nonconvex (as in Figure 4), or even convex but with a flat portion of the boundary, $Y^*(p)$ will be multi-valued for some $p$, and $\pi$ will not be differentiable at $p$. We can derive PSF for such cases as well. Furthermore, we can offer a generalization of Proposition 7 that does not require any differentiability or single-valuedness assumptions:

Proposition 9 A supply function $y : P \to \mathbb{R}^n$ on a convex set $P \subseteq \mathbb{R}^n$ is rationalizable if and only if

(i) (Producer Surplus Formula): $\pi(p) = p \cdot y(p)$ satisfies, for any $p, p' \in P$ and any smooth path $\rho : [0,1] \to P$ such that $\rho(0) = p$ and $\rho(1) = p'$,

$$\pi(p') = \pi(p) + \int_0^1 y(\rho(t)) \cdot \rho'(t)\, dt;$$

(ii) (Law of Supply): For all $p, p' \in P$,

$$(p' - p) \cdot (y(p') - y(p)) \ge 0.$$

5
Indeed, suppose to the contrary that there exist $y, y' \in Y^*(p)$ such that $y \ne y'$. Then we have $y'' = \frac{1}{2} y + \frac{1}{2} y' \in \text{interior}(Y)$ and $p \cdot y'' = \frac{1}{2} p \cdot y + \frac{1}{2} p \cdot y' = \pi(p)$, hence $y'' \in Y^*(p)$. This is impossible, because a non-trivial linear function (one with $p \ne 0$) has no local maximum.

Proof. “Only if”: (ii) obtains, as explained above, by double application of WAPM (5).
To derive (i), take any smooth function $\rho : [0,1] \to P$ such that $\rho(0) = p$ and $\rho(1) = p'$, and let $\phi(t) = \pi(\rho(t))$ and

$$\psi(t', t) = \rho(t') \cdot y(\rho(t)) - \phi(t').$$

Profit-maximization implies that $\psi(\cdot, t)$ achieves its maximum value (zero) at $t' = t$. Thus, at any $t \in (0,1)$ at which $\phi'(t)$ exists, the following FOC must hold:

$$\left. \frac{\partial \psi(t', t)}{\partial t'} \right|_{t'=t} = \rho'(t) \cdot y(\rho(t)) - \phi'(t) = 0.$$

Thus, we must have $\phi'(t) = \rho'(t) \cdot y(\rho(t))$ at each $t$ at which the derivative exists.
Now we observe that

$$|\phi(t) - \phi(t')| \le |\rho(t') - \rho(t)| \cdot \max\{|y(\rho(t'))|, |y(\rho(t))|\} \le |t - t'| \max\{|q \cdot y(p)|, |q \cdot y(p')|\},$$

where the first inequality obtains from (5), while the second inequality obtains (for the linear path $\rho(t) = p + tq$ with $q = p' - p$) because by the Law of Supply, $q \cdot y(p + tq)$ is nondecreasing in $t \in [0,1]$. Hence $\phi$ is Lipschitz continuous on $[0,1]$, which implies that it is absolutely continuous, which in turn implies that it is differentiable almost everywhere and can be represented as the integral of its derivative. Together with the expression derived for $\phi'(t)$ wherever it exists, this gives the Producer Surplus Formula.
“If”: For all $p, p' \in P$, write

$$\begin{aligned}
\pi(p') - p' \cdot y(p) &= [\pi(p') - p' \cdot y(p)] - [\pi(p) - p \cdot y(p)] \\
&= \pi(p') - \pi(p) - (p' - p) \cdot y(p) \\
&= \int_0^1 (p' - p) \cdot y(p + t(p' - p))\, dt - (p' - p) \cdot y(p) \\
&= \int_0^1 (p' - p) \cdot [y(p + t(p' - p)) - y(p)]\, dt \ge 0,
\end{aligned}$$

where the third equality is by the Producer Surplus Formula for the linear path $\rho(t) = p + t(p' - p)$, and the inequality is by the Law of Supply. By Corollary 3, this implies rationalizability of $y$. Q.E.D.

Since the general proof is cumbersome, consider the following simpler proof and some intuition for the special “one-dimensional” case, in which the set $P$ is such that only one price $p_i$ varies and the other prices $p_{-i}$ stay fixed (and so we omit them from the arguments). From the double WAPM inequalities (5), we see that $|\pi(p_i') - \pi(p_i)| \le \max\{|y_i(p_i')|, |y_i(p_i)|\} \cdot |p_i - p_i'|$. By the Law of Supply, $y_i(p_i)$ is nondecreasing in $p_i$, and therefore bounded in absolute value by $\max\{|y_i(a)|, |y_i(b)|\}$ on an interval $[a, b]$. Thus, $\pi$ is Lipschitz continuous on $[a, b]$, which implies that it is differentiable a.e. and can be written as an integral of its derivative. At any $p_i$ at which $\pi'(p_i)$ exists, Hotelling’s Lemma (which is the FOC for problem (D)) yields $\pi'(p_i) = y_i(p_i)$. Thus, we obtain the one-dimensional PSF:

$$\pi(b) - \pi(a) = \int_a^b y_i(p_i)\, dp_i.$$

Remark 6 When multiple optimal supply choices exist, the profit $\pi(p)$ cannot depend on which selection is used, and so the integral in the Producer Surplus Formula cannot depend on it either. This implies that the supply correspondence must be single-valued a.e. on any straight line.

Remark 7 Unlike the traditional derivation in intermediate micro textbooks, this derivation does not rely on convexity of the firm’s technology or on the supply curve coinciding with the marginal cost curve. This derivation is a special case of a general Envelope Theorem of Milgrom and Segal (2002).
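To see that the one-dimensional PSF survives without convexity, one can test it on a technology with a fixed cost, where the supply curve jumps. This is a minimal sketch under assumed primitives: the cost function $c(q) = 0.5 + q^2/2$ for $q > 0$ with $c(0) = 0$ is a hypothetical example, and the integral is approximated by a midpoint rule:

```python
FIXED_COST = 0.5  # assumed fixed cost, paid only if output is positive

def cost(q):
    """Hypothetical nonconvex cost: a fixed cost plus quadratic variable cost."""
    return FIXED_COST + 0.5 * q * q if q > 0 else 0.0

def supply(p):
    """Profit-maximizing output: q = p if that covers the fixed cost, else shut down."""
    return p if 0.5 * p * p >= FIXED_COST else 0.0

def profit(p):
    q = supply(p)
    return p * q - cost(q)

def integrate(g, a, b, n=100000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 2.0
area = integrate(supply, a, b)          # integral of the supply curve over [a, b]
profit_change = profit(b) - profit(a)   # pi(b) - pi(a)
print(area, profit_change)
```

In this example the supply curve is zero below the shutdown price $p = 1$ and equals $p$ above it, so the supply curve does not coincide with the marginal cost curve, yet the two printed numbers should agree up to integration error.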

To show the sufficiency part of the Proposition in the one-dimensional case, by Corollary 3, we verify that PSF and the Law of Supply imply WAPM: for any $p_i, p_i'$,

$$\begin{aligned}
\pi(p_i) - (p_i, p_{-i}) \cdot y(p_i') &= \pi(p_i) - \pi(p_i') + (p_i', p_{-i}) \cdot y(p_i') - (p_i, p_{-i}) \cdot y(p_i') \\
&= -\int_{p_i}^{p_i'} y_i(t)\, dt + (p_i' - p_i)\, y_i(p_i') \\
&= \int_{p_i}^{p_i'} [y_i(p_i') - y_i(t)]\, dt \ge 0,
\end{aligned}$$

where the first equality is by definition of $\pi(p_i')$, the second equality is by PSF, and the inequality is by the Law of Supply ($\text{sign}[y_i(p_i') - y_i(t)] = \text{sign}[p_i' - p_i]$).
The proof in the multidimensional case makes exactly the same one-dimensional arguments along other straight lines that are not necessarily parallel to any of the axes. (They can be interpreted as changing the price of one “good” in a different coordinate system in the commodity space.)

5.4 Summary: Characterizations of Rationalizable Supply Functions

The three characterizations of rationalizable supply functions are summarized in the following table:

Technical assumption    $y(p)$ cont. differentiable              $\pi(p) = p \cdot y(p)$ differentiable    None
FOC                     $Dy(p)$ symmetric, $Dy(p)\,p = 0$        Hotelling’s Lemma                         Producer Surplus Formula
SOC                     $Dy(p)$ p.s.d.                           $\pi(p)$ convex                           Law of Supply

6 The Single-Output Case


For a single-output firm with free disposal, the production set can be described as

$$Y = \{(q, -z) : z \in \mathbb{R}^m_+,\ q \le f(z)\}.$$

With a positive output price $p > 0$, profit-maximization requires choosing $q = f(z)$, and so the profit maximization problem can be written as:

$$\max_{z \in \mathbb{R}^m_+}\; p f(z) - w \cdot z,$$

where $w \in \mathbb{R}^m_+$ is the vector of input prices.
The profit-maximization problem can be separated into two subproblems:

(i) Find a cost-minimizing way to produce a given output $q$, yielding a “cost function” for the firm, and

(ii) find an output level that maximizes the difference between its revenue and its cost function.

(i) is called the cost-minimization problem. Formally, for a fixed $q \ge 0$, let

$$c(q, w) = \inf_{z \in \mathbb{R}^m_+ :\, f(z) \ge q} w \cdot z,$$

$$Z^*(q, w) = \{z \in \mathbb{R}^m_+ : f(z) \ge q,\ w \cdot z = c(q, w)\}.$$

The value function $c(q, w)$ for this problem is called the cost function, and the minimizer set $Z^*(q, w)$ is called the conditional factor demand correspondence (to indicate that it is conditional on a fixed output level $q$).
Once problem (i) is solved, problem (ii) can then be written as $\max_{q \ge 0}\; pq - c(q, w)$.
Note that the cost-minimization problem can be viewed as the profit-maximization problem on the restricted production set $Y_q = \{y = (q, -z) : z \in \mathbb{R}^m_+,\ q \le f(z)\}$ (this is an “upper level set” of the production function). Thus, the properties of the cost function and conditional factor demand as functions of the input prices $w$ exactly mirror those of the profit function and the supply correspondence, respectively, with the obvious sign reversals. For example, Proposition 7 can be restated for this case as

Proposition 10 Consider a conditional factor demand function $z : \mathbb{R} \times W \to \mathbb{R}^m$ for a fixed output $q$ on an open convex set $W \subseteq \mathbb{R}^m$ such that $c(q, w) = w \cdot z(q, w)$ is differentiable in $w$. Then $z$ is rationalizable by some production function if and only if
(i) (Shephard’s Lemma) $\nabla_w c(q, w) = z(q, w)$.
(ii) $c(q, \cdot)$ is concave.

Other properties of the cost function and conditional factor demand as functions of $w$ follow as well from the corresponding properties of profit-maximization, e.g., (a) $c(q, \cdot)$ is homogeneous of degree one in $w$, (b) $Z^*(q, \cdot)$ is homogeneous of degree zero, and (c) if $Z^*(q, \cdot)$ is a differentiable function, then the matrix $D_w Z^*(q, w) = D_w^2 c(q, w)$ is symmetric and negative semidefinite.
What about the properties of the cost function $c(q, w)$ as a function of the output $q$? Under free disposal, it should be nondecreasing in $q$. Additional assumptions on the production function yield additional properties of the cost function, e.g.,

Proposition 11 If the production function has nondecreasing [nonincreasing] returns to scale, then the “average cost function” $c(q, w)/q$ is nonincreasing [nondecreasing] in $q$. If the production function $f$ is concave, then the cost function $c(q, w)$ is convex in $q$.

Proof. Left as exercise.
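As a worked illustration of Propositions 10 and 11, consider the hypothetical two-input technology $f(z_1, z_2) = \sqrt{z_1 z_2}$ with strictly positive input prices (this functional form is an assumption chosen for tractability, not one used in the notes):

```latex
\begin{align*}
c(q, w) &= \min_{z \ge 0} \{ w_1 z_1 + w_2 z_2 : \sqrt{z_1 z_2} \ge q \} = 2q\sqrt{w_1 w_2}, \\
z(q, w) &= \left( q\sqrt{w_2 / w_1},\; q\sqrt{w_1 / w_2} \right), \\
\frac{\partial c(q, w)}{\partial w_1} &= q\sqrt{w_2 / w_1} = z_1(q, w), \qquad
\frac{\partial c(q, w)}{\partial w_2} = q\sqrt{w_1 / w_2} = z_2(q, w), \\
\frac{c(q, w)}{q} &= 2\sqrt{w_1 w_2} \quad \text{(constant in } q\text{)}.
\end{align*}
```

Here Shephard’s Lemma holds by direct differentiation, $c(q, \cdot)$ is concave and homogeneous of degree one in $w$ (it is a multiple of the geometric mean of the input prices), and the constant average cost reflects constant returns to scale.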

7 First-Order Conditions for Profit Maximization

To solve the profit-maximization problem numerically (or analytically, if the technology is given by convenient functional forms), we can use first-order conditions from the Kuhn-Tucker Theorem. Namely, if technology is given by a differentiable transformation function $T$, then the problem is written as

$$\max_{y \in \mathbb{R}^m}\; p \cdot y \quad \text{s.t.} \quad T(y) \le 0.$$

The Lagrangian is $L(y, \lambda) = p \cdot y - \lambda T(y)$, where $\lambda \ge 0$ denotes the dual variable associated with the constraint. By the Kuhn-Tucker Theorem, the following FOC is then necessary for profit-maximization:

$$\lambda \nabla T(y) = p.$$

Geometrically, this means that at the optimal production plan $y$, the price vector is normal to the production possibility frontier (since the gradient of the transformation function is the normal vector to the frontier).
For a single-output firm with $m$ inputs and production function $f$, the problem can be written as

$$\max_{z \in \mathbb{R}^m_+}\; p f(z) - w \cdot z,$$

where $p > 0$ is the price of output and the vector $w \in \mathbb{R}^m$ reflects the input prices. If $f$ is differentiable, an interior optimal vector of factor demands must satisfy the following FOC: for all $i$,

$$p \frac{\partial f(z)}{\partial z_i} \le w_i, \quad \text{with equality if } z_i > 0.$$

Remark 8 Applying the Kuhn-Tucker Theorem formally, we could write the FOC as $p \frac{\partial f(z)}{\partial z_i} + \mu_i = w_i$, where $\mu_i \ge 0$ is the dual variable associated with the constraint $z_i \ge 0$, satisfying the Complementary Slackness Condition $\mu_i z_i = 0$. It is customary to suppress the dual variables associated with the nonnegativity constraints and write the FOC in the above-displayed form.

Finally, let us separate the firm’s problem into cost-minimization and profit-maximization using a cost function. The cost-minimization problem is

$$c(q, w) = \min_{z \in \mathbb{R}^m_+} w \cdot z \quad \text{s.t.} \quad f(z) \ge q.$$

Letting $\lambda \ge 0$ denote the dual variable associated with the production constraint $f(z) \ge q$, the FOC for cost minimization is

$$\lambda \frac{\partial f(z)}{\partial z_i} \le w_i, \quad \text{with equality if } z_i > 0.$$

Thus, with $f$ concave, one can think of profit maximization as the special case of cost minimization in which the shadow price of output is the market price $p$. There is more to this account. From the envelope theorem for parameterized constraints, we have:

$$\frac{\partial c(q, w)}{\partial q} = \lambda.$$

Thus, at the solution to the cost minimization problem, the shadow value of output is exactly the marginal cost of production.
Returning to our characterization of the firm’s problem, suppose the firm solves the cost minimization problem for every $q$, yielding a cost function $c(q, w)$. The profit maximization problem can then be seen as:

$$\max_{q \ge 0}\; pq - c(q, w).$$

This problem gives the famous first-order condition:

$$p \le \frac{\partial c(q, w)}{\partial q}, \quad \text{with equality if } q > 0.$$

Thus, a profit-maximizing price-taking firm should choose output to equalize marginal cost to price (provided that it produces a positive output). So profit maximization implies that the correct shadow price is the market price for output $p$.
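The link between the shadow price, marginal cost, and the output price can be traced through in a simple case. The following worked example assumes a hypothetical single-input technology $f(z) = \sqrt{z}$ with input price $w > 0$ (an illustrative choice, not taken from the notes):

```latex
\begin{align*}
\text{Cost minimization:} \quad & z = q^2, \qquad c(q, w) = w q^2, \\
\text{FOC } \lambda f'(z) = w: \quad & \frac{\lambda}{2\sqrt{z}} = w
  \;\Longrightarrow\; \lambda = 2 w q = \frac{\partial c(q, w)}{\partial q}, \\
\text{Profit maximization:} \quad & p = \frac{\partial c(q, w)}{\partial q} = 2 w q
  \;\Longrightarrow\; q = \frac{p}{2w}, \qquad \pi(p, w) = \frac{p^2}{4w}.
\end{align*}
```

The multiplier on the output constraint equals marginal cost at every $q$, and at the profit-maximizing output it equals the market price $p$.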
When the production set is convex (e.g., described by a concave production
function f ), the FOCs are sufficient as well as necessary for profit-maximization.
The FOC approach is then useful for working out examples and obtaining formulae
that can be used to compute solutions numerically.
The convexity assumption fails in several interesting cases, such as ones where there are fixed costs of production or where the production set exhibits increasing returns.

Example 2 Consider a single-output firm with a cost function $c(q)$ that has a “U-shaped” marginal cost $c'(q)$. (See MWG Figure 5.D.3 on p. 144.) Intuitively, the firm has economies of scale at low production and diseconomies at high production. When $p > \min_q c'(q)$, the FOC for profit-maximization then gives two output levels $q_l < q_h$ satisfying $p = c'(q)$, plus $q = 0$ when $c'(0) > p$. Which of these outputs is profit-maximizing?
The SOC for profit-maximization is $c''(q) \ge 0$, which rules out the lower output $q_l$, at which the marginal cost curve is downward-sloping (and so in fact it is a local profit-minimizer rather than maximizer). It remains to compare $q_h$ to 0. For this comparison, it is enough to compare the average cost at $q_h$ to $p$. The firm will produce $q_h$ if and only if $c(q_h)/q_h \le p$.
We can now construct the firm’s supply curve from its AC and MC curves. Note that

$$AC'(q) = \left( \frac{c(q)}{q} \right)' = \frac{c'(q)\, q - c(q)}{q^2} = \frac{1}{q} \left( MC(q) - AC(q) \right).$$

Thus, AC is downward-sloping where $MC < AC$, upward-sloping where $MC > AC$, and minimized where $MC = AC$. For example, AC could be given by a U-shaped curve, at whose bottom it must intersect the MC curve. Let $q^m$ denote the output that minimizes AC (called the firm’s “most efficient scale”). Then the firm’s supply correspondence is

$$Q^*(p) = \begin{cases} \{0\} & \text{if } p < \min_q AC(q), \\ \{0, q^m\} & \text{if } p = \min_q AC(q), \\ \{\text{the higher solution of } c'(q) = p\} & \text{if } p > \min_q AC(q). \end{cases}$$

A similar supply curve obtains if MC is upward sloping but production involves a positive fixed cost, so the AC curve is still U-shaped (see MWG Figure 5.D.4 on p. 145).
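The construction of the supply correspondence in Example 2 can be sketched in code. The cost function below, $c(q) = q^3 - 2q^2 + 2q$, is a hypothetical choice with U-shaped marginal cost $3q^2 - 4q + 2$ and average cost $q^2 - 2q + 2$, minimized at $q^m = 1$ where $MC = AC = 1$; the brute-force grid search is included only as a cross-check:

```python
import math

def cost(q):
    """Assumed cost with U-shaped MC: MC(q) = 3q^2 - 4q + 2, AC(q) = q^2 - 2q + 2."""
    return q ** 3 - 2 * q ** 2 + 2 * q

MIN_AC = 1.0  # AC is minimized at the efficient scale q_m = 1, where MC = AC = 1

def supply(p):
    """Supply correspondence Q*(p), returned as a list of optimal outputs."""
    if p < MIN_AC:
        return [0.0]
    q_high = (2 + math.sqrt(3 * p - 2)) / 3  # higher root of MC(q) = p
    if p == MIN_AC:
        return [0.0, q_high]
    return [q_high]

def brute_force_supply(p, q_max=5.0, n=50000):
    """Grid search over [0, q_max], used only to cross-check the formula."""
    grid = [q_max * k / n for k in range(n + 1)]
    return max(grid, key=lambda q: p * q - cost(q))

for p in (0.8, 1.5, 2.0):
    print(p, supply(p), brute_force_supply(p))
```

Below the minimum average cost the firm shuts down, and above it the firm jumps to the upward-sloping branch of the marginal cost curve, so the lower root of $p = c'(q)$ is never supplied.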
Thus, with nonconvex technology, the firm should consider “discrete” changes (e.g., whether to shut down), as well as changes “on the margin.” More generally, some or all components of the firm’s decision set may be discrete (e.g., which product to produce), and so convexity and differentiability are inapplicable. This makes the profit-maximization problem much harder to solve, but we can still obtain some of its qualitative properties without solving it.

8 Monopoly
See MWG Section 12.B (see footnote 6).

6
As discussed in the beginning, profit-maximization is harder to justify for a firm that is not a price-taker, because the firm’s owners may be at the same time consuming the firm’s outputs or inputs or other goods whose prices may in general be affected by the firm’s behavior. Justification is possible in a “partial equilibrium” setting where we may assume that the output price set by the firm does not affect prices in other markets and that the firm’s owners consume negligible amounts of its output.

9 Comparative Statics
An important question in economics is the comparative statics question: How do endogenous variables in the economy respond to changes in exogenous variables? For example, in producer theory, exogenous variables could be prices or technological parameters, and the endogenous variables are the firm’s profit-maximizing production choices.
We could ask the comparative statics question in a general maximization problem: Let $F : X \times T \to \mathbb{R}$, where $X, T \subseteq \mathbb{R}$, and consider the problem

$$X^*(t) = \arg\max_{x \in X} F(x, t).$$

The question is how $X^*(t)$ depends on $t$.

9.1 The First-Order Approach

The traditional approach to comparative statics in economics is by applying the Implicit Function Theorem to the first-order conditions. This approach relies on the following assumptions:

• Smoothness: $F$ is twice continuously differentiable.

• Convexity of $X$.

• Strict Concavity: $F_{xx} < 0$. (In particular, together with the previous bullet, this ensures that the maximizer is unique: $X^*(t) = \{x(t)\}$.)

• Interiority: For each $t$, $x(t)$ is in the interior of $X$.

Under these assumptions, the unique maximizer $x(t)$ is the unique solution to the following First-Order Condition:

$$F_x(x(t), t) = 0. \qquad \text{(FOC)}$$
Thus, $x(t)$ is a function given “implicitly” by the FOC. We can now apply the Implicit Function Theorem, which amounts to differentiating (FOC) with respect to $t$, which yields

$$F_{xx}(x(t), t)\, x'(t) + F_{xt}(x(t), t) = 0.$$

This yields

$$x'(t) = -\frac{F_{xt}(x(t), t)}{F_{xx}(x(t), t)}.$$
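As a quick check of this formula, take the hypothetical objective $F(x, t) = tx - x^3/3$ on $X = (0, \infty)$ with $t > 0$ (an assumed example that satisfies smoothness, strict concavity, and interiority):

```latex
\begin{align*}
F_x(x, t) &= t - x^2 = 0 \;\Longrightarrow\; x(t) = \sqrt{t}, \\
F_{xx}(x, t) &= -2x < 0, \qquad F_{xt}(x, t) = 1, \\
x'(t) &= -\frac{F_{xt}(x(t), t)}{F_{xx}(x(t), t)} = \frac{1}{2\sqrt{t}},
\end{align*}
```

which agrees with differentiating the closed-form solution $x(t) = \sqrt{t}$ directly.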
The advantage of this approach is that, if the function $F$ is exactly known and the above conditions are satisfied, then we can calculate the value of $x'(t)$ exactly. However, this approach is not useful in many theoretical studies, because

(a) $F$ is not known exactly; we may only know some of its qualitative properties. We want to have predictions that are robust to the specification of $F$.

(b) $F$ and/or $X$ may not satisfy the assumptions. For example, $F$ may be non-smooth, or non-concave (e.g., a firm with fixed costs), or $X$ may not be convex (e.g., a nonconvex production set).

(c) The theoretical models are in many cases not calibrated to give quantitative predictions. Instead, we are only interested in qualitative predictions: in what direction do endogenous variables respond to changes in exogenous variables? E.g., when can we say that $x'(t) \ge 0$? What qualitative features of $F$ are important for this conclusion?

Obtaining qualitative predictions that are robust to specifications of the model is the purview of the field called Monotone Comparative Statics (MCS).
To begin, note that by the above formula, under the assumed strict concavity ($F_{xx} < 0$), we have $x'(t) \ge 0$ if and only if $F_{xt}(x(t), t) \ge 0$. This finding has a simple economic intuition:

Intuition: When $F_{xt} \ge 0$, an increase in variable $x$ is more valuable when the parameter $t$ is higher. In this sense, $x$ and $t$ are complementary in the objective function. Intuitively, if this is the case, an increase in $t$ results in a higher $x$ being optimal.

At the same time, the above formula relies on the smoothness of F and strict
concavity of F in x. Are these assumptions important?

Example 3 Suppose we want to know how a profit-maximizing firm responds to output price changes. The firm solves $\max_{q \ge 0}\; pq - c(q)$, where $c(\cdot)$ is the firm’s cost function. If the cost function is differentiable and strictly convex, and the solution $q(p)$ is interior, then it is characterized by the FOC $p = c'(q(p))$. Moreover, if $c$ is twice differentiable, then differentiating the FOC (i.e., using the Implicit Function Theorem) yields $q'(p) = 1 / c''(q(p)) > 0$, under the assumed convexity of $c(\cdot)$.
However, we saw in Example 2 above that the firm’s supply curve was upward-sloping even though the cost function was not convex. Recall that upward-sloping supply curves for an arbitrary production technology/cost function follow from the “Law of Supply,” which says that $q(p)$ must be nondecreasing in $p$. Thus, all the assumptions on the cost function are superfluous. The “Law of Supply” is a prototypical MCS result, and all the MCS results to follow can be viewed as its generalizations.

Are concavity and smoothness important for monotone comparative statics? The Intuition above clearly does not rely on these assumptions. We can further convince ourselves that these properties cannot be important by observing that the Monotone Comparative Statics Question is fundamentally ordinal, rather than cardinal, in nature. Simply speaking, the answer to this question should not depend on the scale used to measure $x$. For example, suppose that a new variable $\tilde{x}$ is obtained from the old variable $x$ by a rescaling of real numbers. Specifically, let $x = \phi(\tilde{x})$, where $\phi$ is a strictly increasing function. In that case, the problem

$$\tilde{x}^*(t) \in \arg\max_{\tilde{x} \in \mathbb{R},\ \phi(\tilde{x}) \in X} F(\phi(\tilde{x}), t) \equiv \tilde{F}(\tilde{x}, t)$$

is clearly equivalent to the original problem, and $\tilde{x}^*(t)$ is nondecreasing in $t$ if and only if $x^*(t) = \phi(\tilde{x}^*(t))$ is nondecreasing in $t$. For example, suppose that $x$ is the variance of a certain distribution to be chosen optimally, and $\tilde{x}$ is its standard deviation (and we can write $x = \phi(\tilde{x}) = \tilde{x}^2$). Clearly, the optimal variance should be nondecreasing in the parameter $t$ if and only if the optimal standard deviation is nondecreasing in $t$.

Exercise 1 Show that differentiable monotone rescaling does not preserve concavity of the objective function, but it preserves the complementarity condition $F_{xt} \ge 0$.

Thus, concavity is a cardinal property and is not robust to monotonic transformations of variables, and so it is superfluous for monotone comparative statics. Similarly, we can see that smoothness of $F$ is superfluous: if the scale transformation $\phi(\cdot)$ is not everywhere differentiable, a smooth function $F$ would be turned into a non-smooth function $\tilde{F}$, but the direction of comparative statics should not be affected.

9.2 Univariate Topkis’s Theorem


The apparatus of ordinal comparative statics dispenses with the assumptions of concavity and smoothness. The crucial assumption that remains is that $x$ and $t$ are “complementary” in the objective function $F$. When $F$ is smooth, this amounts to requiring that $F_{xt} \ge 0$. Since we do not want to restrict attention to smooth objective functions, we formulate an equivalent property in terms of differences rather than derivatives:

Definition 2 $F : X \times T \to \mathbb{R}$ with $X, T \subseteq \mathbb{R}$ has Increasing Differences if for all $x, x' \in X$ and $t, t' \in T$ such that $x' > x$ and $t' > t$,

$$F(x', t') - F(x, t') \ge F(x', t) - F(x, t).$$

This definition says that the incremental benefit of increasing $x$, $F(x', t) - F(x, t)$, is increasing in the parameter $t$. (Symmetrically, it can be rewritten to say that the incremental benefit of increasing $t$, $F(x, t') - F(x, t)$, is increasing in $x$.)

Exercise 2 Using the Fundamental Theorem of Calculus, prove that if $X$ and $T$ are intervals and the function $F : X \times T \to \mathbb{R}$ is sufficiently smooth, then $F$ has increasing differences if and only if

(a) $F_x(x, t)$ is nondecreasing in $t$ for all $x$,

(b) $F_t(x, t)$ is nondecreasing in $x$ for all $t$,

(c) $F_{xt}(x, t) \ge 0$ for all $(x, t)$.

Exercise 3 Show that if functions $F, G : X \times T \to \mathbb{R}$ have increasing differences and $\alpha, \beta \ge 0$, then the function $\alpha F + \beta G$ also has increasing differences.

Now we can formulate the simplest monotone comparative statics result, which
is proven using the “Revealed Preference” approach:

Theorem 12 (Univariate Topkis’s Theorem) If $F : X \times T \to \mathbb{R}$ with $X, T \subseteq \mathbb{R}$ has increasing differences, $t, t' \in T$ are such that $t' > t$, $x \in X^*(t)$, and $x' \in X^*(t')$, then $\min\{x, x'\} \in X^*(t)$ and $\max\{x, x'\} \in X^*(t')$.

Proof. If $x \le x'$, then the statement is trivial, so suppose $x > x'$, and so $\min\{x, x'\} = x'$ and $\max\{x, x'\} = x$.
By “revealed preference,” we have

$$F(x, t) \ge F(x', t) \quad \text{because } x \in X^*(t),$$

$$F(x', t') \ge F(x, t') \quad \text{because } x' \in X^*(t').$$

Since $x > x'$, using Increasing Differences, the two inequalities imply, respectively,

$$F(x, t') - F(x', t') \ge F(x, t) - F(x', t) \ge 0,$$

$$F(x, t) - F(x', t) \le F(x, t') - F(x', t') \le 0.$$

These in turn imply $x \in X^*(t')$ and $x' \in X^*(t)$.


When $X^*(t)$ and $X^*(t')$ are both singletons, the theorem simply says that $X^*(t) \le X^*(t')$. When either set is empty, the statement is vacuous. In the general case of multi-element sets, the statement means that a maximizer can go down ($x' < x$) when the parameter goes up ($t' > t$) only when both maximizers are optimal for both parameter values.
Formally, for two sets $A, B \subseteq \mathbb{R}$ we can say that $A \le B$ in the strong set order when for any $a \in A$ and $b \in B$ such that $a \ge b$ we must also have $b \in A$ and $a \in B$. It can be seen that $A \le B$ in the strong set order if and only if the set $A \setminus B$ lies entirely below the set $A \cap B$, which in turn lies entirely below the set $B \setminus A$:

[Figure: points on the real line grouped, from left to right, into $A \setminus B$, $A \cap B$, and $B \setminus A$.]

The above Theorem then simply says that the set of maximizers $X^*(t)$ is nondecreasing in $t$ in the strong set order. This implies, in particular, that the extreme points of the set, $\sup X^*(t)$ and $\inf X^*(t)$, are nondecreasing in $t$. Clearly, all these statements are equivalent when the maximizer is unique.
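The theorem and the strong set order can be checked by brute force on finite choice and parameter sets. The following is a minimal sketch, assuming a hypothetical price-taking firm that chooses an integer output with a non-convex cost schedule (the cost table, grids, and helper names are illustrative):

```python
def argmax_set(F, X, t, tol=1e-12):
    """All maximizers of F(., t) over the finite set X."""
    best = max(F(x, t) for x in X)
    return [x for x in X if F(x, t) >= best - tol]

def has_increasing_differences(F, X, T):
    """Check F(x', t') - F(x, t') >= F(x', t) - F(x, t) for all x' > x, t' > t."""
    return all(
        F(x2, t2) - F(x1, t2) >= F(x2, t1) - F(x1, t1) - 1e-12
        for x1 in X for x2 in X if x2 > x1
        for t1 in T for t2 in T if t2 > t1
    )

def strong_set_leq(A, B):
    """A <= B in the strong set order: a in A, b in B, a >= b imply b in A and a in B."""
    return all(b in A and a in B for a in A for b in B if a >= b)

# Hypothetical objective: revenue p*q minus a non-convex cost of producing q units.
def F(q, p):
    cost = {0: 0.0, 1: 3.0, 2: 4.0, 3: 6.0, 4: 10.0}[q]
    return p * q - cost

X = [0, 1, 2, 3, 4]
T = [1.0, 2.0, 3.0, 4.0]
maximizers = [argmax_set(F, X, t) for t in T]
print(has_increasing_differences(F, X, T))
print(maximizers)
print(all(strong_set_leq(A, B) for A, B in zip(maximizers, maximizers[1:])))
```

With this cost table some prices have several optimal outputs, so the comparison of maximizer sets has to be made in the strong set order rather than pointwise.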
Sometimes we can obtain the stronger result that the maximizer cannot go
down at all when the parameter goes up:

Definition 3 $F : X \times T \to \mathbb{R}$ with $X, T \subseteq \mathbb{R}$ has Strictly Increasing Differences if for all $x, x' \in X$ and $t, t' \in T$ such that $x' > x$ and $t' > t$,

$$F(x', t') - F(x, t') > F(x', t) - F(x, t).$$

When $X, T$ are intervals and $F$ is twice continuously differentiable, a sufficient condition for strictly increasing differences is $F_{xt} > 0$. (However, as Edlin and Shannon (1998) point out, this condition is not necessary for strict ID, in contrast to $F_{xt} \ge 0$ being necessary for “weak” ID.)

Proposition 13 (Monotone Selection Theorem) If $F : X \times T \to \mathbb{R}$ with $X, T \subseteq \mathbb{R}$ has strictly increasing differences, then for all $t, t' \in T$ such that $t' > t$, $x \in X^*(t)$, and $x' \in X^*(t')$, we have $x \le x'$.

Proof: Left as exercise.

This is called the “Monotone Selection Theorem” because it says that any selection from the maximizer correspondence $X^*(t)$ is nondecreasing. Note: strict ID does NOT imply that any selection from $X^*(t)$ is strictly increasing; much stronger assumptions are needed for that (namely, differentiability and interiority; see Edlin-Shannon, “Strict Monotone Comparative Statics”).

Example 4 The objective function $F(q, p) = pq - c(q)$ has increasing differences, since $F_p(q, p) = q$ is increasing in $q$. Thus, Topkis’s Theorem implies that the supply correspondence $Q^*(p) = \arg\max_{q \ge 0} F(q, p)$ is nondecreasing “in the strong set order”. In fact, the objective function has strictly increasing differences, so by the “Monotone Selection Theorem,” any selection from the supply correspondence is nondecreasing (this is what the “Law of Supply” says).

Example 5 Suppose we want to know the effect of a unit tax $t$ on the optimal price set by a monopolist. Here we adopt the convention that the tax is paid by the firm and consider the effect on the “before-tax” price $p$ received by the firm (i.e., the price paid by consumers); in another exercise we will examine the effect on the “after-tax” price $\bar{p} = p - t$. If the monopolist faces a downward-sloping demand curve $D(p)$, his profit is $F(p, t) = (p - t) D(p) - c(D(p))$. This function has strictly increasing differences, since $F_t(p, t) = -D(p)$ is strictly increasing in $p$. Thus, the Monotone Selection Theorem implies that any optimal price selection $p^*(t)$ is nondecreasing in $t$, and therefore the corresponding output $D(p^*(t))$ is nonincreasing in $t$.
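The prediction of Example 5 can be illustrated with a grid search. This is a minimal sketch, assuming a hypothetical linear demand $D(p) = \max(0, 10 - p)$ and a constant unit cost of 2, so that the profit is $(p - t - 2) D(p)$; the price grid and tax values are illustrative:

```python
def optimal_price(t, prices, demand, unit_cost):
    """Profit-maximizing consumer price for a monopolist paying unit tax t (grid search)."""
    return max(prices, key=lambda p: (p - t - unit_cost) * demand(p))

def demand(p):
    """Assumed downward-sloping linear demand."""
    return max(0.0, 10.0 - p)

prices = [k / 100 for k in range(0, 1001)]  # price grid on [0, 10]
taxes = [0.0, 1.0, 2.0, 3.0]

best_prices = [optimal_price(t, prices, demand, unit_cost=2.0) for t in taxes]
outputs = [demand(p) for p in best_prices]
print(best_prices)
print(outputs)
```

For this demand curve the optimal price is $(12 + t)/2$, so the price rises and the output falls as the tax increases, in line with the Monotone Selection Theorem.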

9.3 Robust Monotone Comparative Statics


The assumptions of concavity and smoothness have proven unnecessary for monotone comparative statics. But what about the assumption of increasing differences (ID)? Can it be relaxed in a way that still ensures robust monotone comparative statics? This question becomes important when you find that the objective function in your maximization problem does not have ID. Can you conclude that no robust monotone comparative statics conclusion can be made, or should you continue looking for a weaker condition that might ensure that the maximizer is nondecreasing in the parameter?
To see that ID may not be necessary for monotone comparative statics, note that the property of ID is cardinal rather than ordinal: it considers how much $F$ is increased by increases in $x$, and so it depends on the scale used for the values of the function. For example, it is clear that the problems

$$\max_{x \in X} F(x, t) \quad \text{and} \quad \max_{x \in X} \tilde{F}(x, t) = \psi(F(x, t))$$

are equivalent for any strictly increasing function $\psi : \mathbb{R} \to \mathbb{R}$. However, observe that when all functions are smooth,

$$\tilde{F}_{xt}(x, t) = \frac{\partial}{\partial x} \left[ \psi'(F(x, t)) \cdot F_t(x, t) \right] = \psi''(F(x, t)) \cdot F_t(x, t) \cdot F_x(x, t) + \psi'(F(x, t)) \cdot F_{xt}(x, t).$$

Thus, when $\psi'' \ne 0$, ID of $F$ does not imply ID of $\tilde{F}$. This suggests that the class of functions for which monotone comparative statics obtains includes all functions of the form $\psi \circ F$, but is wider than that.

Example 6 Consider the effect of a tax on the monopolist's output on the “after-tax”
price $\bar p$ received by the monopolist (so the price faced by the consumers is
$p = \bar p + t$). Assume that the firm has a constant marginal cost $c$, and write
its profits as $F(\bar p, t) = (\bar p - c) D(\bar p + t)$. While it is hard to ensure ID for this
function, note that

$$\frac{\partial \log F(\bar p, t)}{\partial t} = \frac{\partial \log D(\bar p + t)}{\partial t} = \frac{D'(\bar p + t)}{D(\bar p + t)} = -\frac{\varepsilon(\bar p + t)}{\bar p + t},$$

where $\varepsilon(p) = -p D'(p)/D(p)$ is the elasticity of demand at price $p$. Thus, when
$\varepsilon(p)/p$ is increasing/decreasing, $\log F(\bar p, t)$ has increasing differences in $(\bar p, -t)$ /
$(\bar p, t)$, and so the price $\bar p$ received by the monopolist is decreasing/increasing in the tax. (For constant
$\varepsilon(p)/p$, which corresponds to demand functions of the form $D(p) = A e^{-Bp}$, the
after-tax price received by the monopolist does not depend on the tax.)
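
The two cases of Example 6 can be checked numerically. The sketch below is only an illustration under assumed numbers: the marginal cost `c`, the price grid, the tax levels, and the two demand functions (`exp_demand` with constant $\varepsilon(p)/p$, `lin_demand` with increasing $\varepsilon(p)/p$) are hypothetical choices, and the maximizer is found by brute force on a grid rather than analytically.

```python
import math

def best_net_price(demand, t, c, grid):
    """Grid maximizer of (p_bar - c) * D(p_bar + t) over candidate net prices."""
    return max(grid, key=lambda p_bar: (p_bar - c) * demand(p_bar + t))

c = 1.0                                         # assumed constant marginal cost
grid = [i / 100 for i in range(101, 901)]       # candidate net prices 1.01 .. 9.00
taxes = [0.0, 1.0, 2.0, 3.0]                    # assumed tax levels

# Assumed demand functions (not from the notes):
exp_demand = lambda p: 5.0 * math.exp(-0.5 * p)  # D'/D constant, so eps(p)/p constant
lin_demand = lambda p: max(10.0 - p, 0.0)        # eps(p)/p = 1/(10 - p), increasing

exp_prices = [best_net_price(exp_demand, t, c, grid) for t in taxes]
lin_prices = [best_net_price(lin_demand, t, c, grid) for t in taxes]

print("exponential demand:", exp_prices)  # should not vary with the tax
print("linear demand:     ", lin_prices)  # should fall as the tax rises
```

With exponential demand the net price stays at $c + 1/B$ for every tax, while with linear demand it falls with the tax, as the example predicts.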

In the example, we guessed a transformation of the objective function to ensure


ID. In general, however, it may be hard to find one, and it need not even exist.
Can we still ensure monotone comparative statics?
Milgrom and Shannon (1994) point out that the “right” property to ensure
monotone comparative statics depends on the kind of “robustness” we require
of our monotone comparative statics result. By “robustness” they mean that
monotone comparative statics should continue holding when some choices are not
available so the feasible set is $S \subseteq X$, and/or when the objective function F is
perturbed in specific ways.
First, we only require robustness to the feasible set. Intuitively, for MCS, it
matters only whether a change in x raises the objective function, reduces it, or
keeps it constant. It cannot matter by how much the objective is changed, which
is what ID concerns itself with. Thus, the following condition emerges:7
7 Strulovici and Quah (Econometrica 2009) relax SCC even further by only requiring robustness
to feasible sets $S \subseteq X$ that are intervals.

Definition 4 Function $F : X \times T \to \mathbb{R}$ with $X, T \subseteq \mathbb{R}$ satisfies the Single-Crossing
Condition (SCC) if for all $x, x' \in X$, $t, t' \in T$ such that $x' > x$ and $t' > t$,

$$F(x',t) \ge F(x,t) \Rightarrow F(x',t') \ge F(x,t'), \text{ and}$$

$$F(x',t) > F(x,t) \Rightarrow F(x',t') > F(x,t').$$

It satisfies the strict SCC if

$$F(x',t) \ge F(x,t) \Rightarrow F(x',t') > F(x,t').$$

SCC can be understood as saying that when $x' > x$, the function $\Delta(t) =
F(x',t) - F(x,t)$ crosses the horizontal axis at most once, and from below (although
the function is allowed to stay zero on an interval). The second implication
in the definition of SCC is sometimes more useful in its contrapositive form,
$F(x',t') \le F(x,t') \Rightarrow F(x',t) \le F(x,t)$. Strict SCC strengthens SCC by requiring
that $\Delta(t)$ cannot be zero at more than one point. Note that (strict) SCC
is a relaxation of (strict) ID, which requires that $\Delta(t)$ be (strictly) increasing. In
contrast to ID, these conditions are purely ordinal: they only make ordinal comparisons
of the values of $F$ at different points, not cardinal comparisons (i.e., they ask
only whether $F$ is increased or decreased, not by how much), and so they are
invariant to strictly increasing transformations of the objective function. Also,
unlike ID, these conditions are not symmetric in $(x,t)$.

Theorem 14 (Milgrom-Shannon) $X^*(t) = \arg\max_{x \in S} F(x,t)$ is nondecreasing
in $t$ in the strong set order for all feasible sets $S \subseteq X$ if and only if $F$ satisfies
SCC. If $F$ satisfies strict SCC, then any selection $x^*(t) \in X^*(t)$ is nondecreasing
in $t$ for all feasible sets $S \subseteq X$.

Proof. We prove the first statement (the proof of the second statement is similar
and left as an exercise).
The “if” part: We want to show that when $t' > t$, $x \in X^*(t)$, and $x' \in X^*(t')$,
then $\min\{x,x'\} \in X^*(t)$ and $\max\{x,x'\} \in X^*(t')$. If $x \le x'$, then the statement
is trivial, so suppose $x > x'$, and so $\min\{x,x'\} = x'$ and $\max\{x,x'\} = x$.
Since $x \in X^*(t)$, we have $F(x,t) \ge F(x',t)$, but then by the first part of SCC
$F(x,t') \ge F(x',t')$, which in conjunction with $x' \in X^*(t')$ implies $x \in X^*(t')$.
Similarly, since $x' \in X^*(t')$, we have $F(x,t') \le F(x',t')$, but then by the
second part of SCC (in its contrapositive form) $F(x,t) \le F(x',t)$, which in conjunction with $x \in X^*(t)$
implies $x' \in X^*(t)$.
The “only if” part: Let $S = \{x,x'\}$ with $x' > x$, and $t' > t$. If $F(x',t) \ge
F(x,t)$, then $x' \in X^*(t)$. Since by assumption $X^*(t') \ge X^*(t)$ in the strong set
order, this implies $x' \in X^*(t')$ (indeed, otherwise $x \in X^*(t')$ and then again
$\max\{x,x'\} = x' \in X^*(t')$), and therefore $F(x',t') \ge F(x,t')$. Similarly, if
$F(x',t') \le F(x,t')$, then $x \in X^*(t')$. Since by assumption $X^*(t) \le X^*(t')$ in
the strong set order, this implies $x \in X^*(t)$ (indeed, otherwise $x' \in X^*(t)$ and
then again $\min\{x,x'\} = x \in X^*(t)$), and therefore $F(x',t) \le F(x,t)$.

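
On finite grids the equivalence in Theorem 14 can be verified by brute force. The following is a minimal sketch, not part of the original notes: the helper names (`satisfies_scc`, `has_id`, `mcs_on_all_subsets`) and the three integer-valued test functions are illustrative assumptions. It also shows that a strictly increasing transformation (here, cubing) can destroy ID while leaving SCC intact.

```python
from itertools import combinations

def satisfies_scc(F, X, T):
    """Single-Crossing Condition on sorted finite grids X and T."""
    for x, x_hi in combinations(X, 2):        # x_hi > x
        for t, t_hi in combinations(T, 2):    # t_hi > t
            d_lo = F(x_hi, t) - F(x, t)
            d_hi = F(x_hi, t_hi) - F(x, t_hi)
            if (d_lo >= 0 and d_hi < 0) or (d_lo > 0 and d_hi <= 0):
                return False
    return True

def has_id(F, X, T):
    """Increasing differences: F(x_hi, .) - F(x, .) is nondecreasing in t."""
    return all(F(xh, th) - F(x, th) >= F(xh, t) - F(x, t)
               for x, xh in combinations(X, 2)
               for t, th in combinations(T, 2))

def argmax_set(F, S, t):
    best = max(F(x, t) for x in S)
    return {x for x in S if F(x, t) == best}

def strong_set_leq(A, B):
    """A <= B in the strong set order (scalar case)."""
    return all(min(a, b) in A and max(a, b) in B for a in A for b in B)

def mcs_on_all_subsets(F, X, T):
    """Is the argmax set nondecreasing in t for every nonempty S within X?"""
    for r in range(1, len(X) + 1):
        for S in combinations(X, r):
            for t, t_hi in combinations(T, 2):
                if not strong_set_leq(argmax_set(F, S, t), argmax_set(F, S, t_hi)):
                    return False
    return True

X = [0, 1, 2, 3]
T = [0, 1, 2]
F_id = lambda x, t: 2 * x * t - x * x       # has ID, hence SCC
F_cubed = lambda x, t: F_id(x, t) ** 3      # increasing transform of F_id
F_bad = lambda x, t: -x * t                 # violates SCC

for name, F in [("F_id", F_id), ("F_cubed", F_cubed), ("F_bad", F_bad)]:
    print(name, has_id(F, X, T), satisfies_scc(F, X, T), mcs_on_all_subsets(F, X, T))
```

Integer-valued functions are used so that ties in the argmax are detected exactly.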
While SCC is the “right” condition for MCS in the sense stated above, it has two
shortcomings: (i) it is difficult to check, since it can’t be verified by checking the
sign of some derivatives, and (ii) it does not ensure robustness to perturbations of
the objective function. Specifically, Milgrom-Shannon consider objective functions
of the form F (x, G (x) , t). For example, G(x) could be the monetary benefit (or
cost) of choosing action x, which is independent of the parameter t. In this setting,
the “right” property of F to ensure “robust” monotone comparative statics is that
F (x, G (x) , t) have SCC for any perturbation G.
With some extra assumptions about the shape of F , the “right” property takes
familiar forms in the following two cases.

9.3.1 Additive Perturbations

Here $F$ takes the form $F(x,y,t) = f(x,t) + y$. This formulation is appropriate
for cases where $G(x)$ is the monetary benefit (or negative cost) of decision $x$ and
the decision maker is either a firm maximizing profits or a consumer maximizing
a quasilinear utility function. For this case, SCC means that when $x' > x$, $\Delta(t) =
f(x',t) - f(x,t) + G(x') - G(x)$ crosses zero at most once, and from below.
Equivalently, $\delta(t) = f(x',t) - f(x,t)$ crosses the horizontal line $G(x) - G(x')$
at most once, and from below. Since this must hold for any value of $G(x) -
G(x')$, $\delta(t) = f(x',t) - f(x,t)$ must be nondecreasing in $t$, i.e., $f$ must have
ID. (Indeed, if $f(x',t') - f(x,t') < f(x',t) - f(x,t)$ for some $t' > t$, then we
violate SCC when $G(x') - G(x) = -\tfrac{1}{2}\left[f(x',t') - f(x,t') + f(x',t) - f(x,t)\right]$.)
Similarly, for strict SCC, we need $f(x',t) - f(x,t)$ to be strictly increasing, i.e., $f$
to have strict ID. Thus, the property of (strict) ID is the right condition to ensure
(strict) monotone comparative statics that is robust to additive perturbations of
the objective function.
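
The parenthetical construction above can be made concrete. In this hedged sketch, `breaking_gap` is a hypothetical helper name and $f(x,t) = -xt$ is an assumed example of a function without ID. The helper returns the value of $G(x') - G(x)$ that places the horizontal line midway between the two differences, so that the perturbed objective crosses zero from above.

```python
def breaking_gap(f, x_lo, x_hi, t_lo, t_hi):
    """Return G(x_hi) - G(x_lo) that violates SCC, or None if ID holds here."""
    a = f(x_hi, t_lo) - f(x_lo, t_lo)   # difference at the low parameter
    b = f(x_hi, t_hi) - f(x_lo, t_hi)   # difference at the high parameter
    if b >= a:
        return None                     # increasing differences: nothing to break
    return -(a + b) / 2                 # midpoint construction from the text

f = lambda x, t: -x * t                 # assumed example without ID
gap = breaking_gap(f, 0, 1, 0, 1)
G = {0: 0.0, 1: gap}                    # additive perturbation on the two points
total = lambda x, t: f(x, t) + G[x]

delta_lo = total(1, 0) - total(0, 0)    # positive: x = 1 preferred at t = 0
delta_hi = total(1, 1) - total(0, 1)    # negative: x = 0 preferred at t = 1
print(gap, delta_lo, delta_hi)
```

The higher action is strictly preferred at the low parameter and strictly worse at the high one, which is exactly the pattern SCC rules out.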

9.3.2 Multiplicative Perturbations

Here $F$ takes the form $F(x,y,t) = f(x,t) \cdot y$, with the restriction $f(x,t), y \ge 0$.
Since SCC is invariant to strictly increasing transformation of values, we can check
that $\log[f(x,t) \cdot G(x)] = \log[f(x,t)] + \log[G(x)]$ has SCC for all nonnegative
functions $G$. But this means that the function $\gamma(x) = \log[G(x)]$ could be an arbitrary
function, and therefore the “right” condition is that the function $\log[f(x,t)]$
has ID. (This property of $f$ is also known as “log-supermodularity”.)

9.3.3 Arbitrary Perturbations of a Smooth Objective

Here we let $X$, $Y$ be intervals, and require $F$ to be twice continuously differentiable
in $(x,y)$, with $F_y \neq 0$ and having a constant sign. (I.e., $y$ is either always a “benefit”
or always a “cost” of choice $x$.) In addition, we assume that “compensation
using $y$ is possible”: for all $t \in T$ and all $x', x'' \in X$, there exist $y', y'' \in Y$ such
that $F(x',y',t) = F(x'',y'',t)$. (This means that any two vertical lines in the
$(x,y)$ space are crossed by a common isoquant. This could be weakened somewhat to the
requirement that the isoquants of $F$ in the $(x,y)$ space are path-connected.) In
this setting, the robust SCC of $F(x,G(x),t)$ turns out to be equivalent to the
famous Spence-Mirrlees condition, which says that the marginal rate of substitution
between $x$ and $y$ (the slope of the isoquants of $F$) is nondecreasing in the
parameter $t$ at any given point $(x,y)$.

Proposition 15 Suppose $X, Y \subseteq \mathbb{R}$ are intervals, $T \subseteq \mathbb{R}$, and $F : X \times Y \times T \to
\mathbb{R}$ is twice continuously differentiable in $(x,y)$, with $F_y \neq 0$, and compensation with
$y$ is possible. Then $F(x,G(x),t)$ satisfies SCC in $(x,t)$ for all $G : X \to Y$ if and
only if $F_x(x,y,t)/|F_y(x,y,t)|$ is nondecreasing in $t$. $F(x,G(x),t)$ satisfies strict
SCC in $(x,t)$ for all $G : X \to Y$ if $F_x(x,y,t)/|F_y(x,y,t)|$ is strictly increasing in
$t$.

Proof. First we show the “if” part of the first statement, letting for definiteness
$F_y > 0$ (if $F_y < 0$ we can replace $y$ with $-y$). Denote by $\hat y(x|t,\alpha)$ the
value of $y \in Y$ satisfying $F(x,y,t) = \alpha$; under our assumption there is at most
one such value. Thus $\hat y(x|t,\alpha)$ describes an isoquant of $F$, and $\hat y'(x|t,\alpha) =
-F_x(x,\hat y(x|t,\alpha),t)/F_y(x,\hat y(x|t,\alpha),t)$. Observe that when $t'' > t'$, writing $\hat y = \hat y(x|t',\alpha)$,

$$\begin{aligned}
\frac{d}{dx} F(x,\hat y,t'') &= F_x(x,\hat y,t'') + F_y(x,\hat y,t'')\,\hat y'(x|t',\alpha) \\
&= F_x(x,\hat y,t'') - F_y(x,\hat y,t'')\,\frac{F_x(x,\hat y,t')}{F_y(x,\hat y,t')} \\
&= F_y(x,\hat y,t'')\left[\frac{F_x(x,\hat y,t'')}{F_y(x,\hat y,t'')} - \frac{F_x(x,\hat y,t')}{F_y(x,\hat y,t')}\right] \\
&\ge 0.
\end{aligned}$$

(In words, increasing $x$ while moving along the isoquant of type $t'$ benefits type
$t''$.)
Now, suppose that $x'' > x'$ and $F(x'',G(x''),t') \ge F(x',G(x'),t')$. Then,
since compensation is possible, there exist $y', y''$ such that $F(x'',y'',t') = F(x',y',t') \equiv
\alpha$, and furthermore since $F_y > 0$ we can choose them so that $y' \ge G(x')$ and $y''
\le G(x'')$. Then, noting that $y'' = \hat y(x''|t',\alpha)$ and $y' = \hat y(x'|t',\alpha)$ and using the
previous display, we have

$$F(x'',G(x''),t'') \ge F(x'',y'',t'') \ge F(x',y',t'') \ge F(x',G(x'),t'').$$

Similarly, starting from the premise $F(x'',G(x''),t') > F(x',G(x'),t')$, some of
the above inequalities become strict to yield the conclusion $F(x'',G(x''),t'') >
F(x',G(x'),t'')$. This establishes that $F$ has SCC. Similarly, the strict Spence-Mirrlees
condition yields strict SCC.
To see the “only if” part, note that if the weak Spence-Mirrlees condition fails,
then for some $t'' > t'$, $F_x(x,y,t'')/|F_y(x,y,t'')| < F_x(x,y,t')/|F_y(x,y,t')|$ at
some point $(x,y)$, and therefore by continuity on some open square $\bar X \times \bar Y \subseteq X \times Y$.
But this implies, by the strict “if” part, that $F(x,G(x),t)$ has strict SCC in $(x,-t)$
on $\bar X \times \{t',t''\}$ for all $G : \bar X \to \bar Y$, which contradicts SCC in $(x,t)$.

Remark 9 The “only if” statement for strict SCC is not true, as shown by
Edlin and Shannon (1998): the strict Spence-Mirrlees condition is not necessary
for $F(x,G(x),t)$ to satisfy strict SCC in $(x,t)$ for all $G : X \to Y$.

Remark 10 Note that the Spence-Mirrlees condition is ordinal, since it depends
only on the isoquants of $F$. I.e., it is invariant to increasing transformations of $F$,
provided that they are smooth (so the slope can be calculated as the ratio of partial
derivatives of $F$).

Remark 11 In the special case where $F$ takes the quasilinear form $F(x,y,t) =
f(x,t) + y$, $F_y \equiv 1$ and so the Spence-Mirrlees condition means that $f_x(x,t)$ is
nondecreasing in $t$, i.e., that $f$ has ID. This is consistent with the previous part
(robustness to additive perturbations) except here it imposes smoothness of the
function.

Remark 12 The proofs of the “only if” results do not use all possible perturbation
functions $G$. Instead, any “sufficiently rich” family of functions $G$ that allows
arbitrary values to be assigned at two given points $x', x''$ will do to obtain the necessity
of ID or of the Spence-Mirrlees condition in the respective setting. In particular,
it suffices to consider affine perturbation functions $G(x) = a + bx$ with arbitrary
parameters $a, b$. Thus, to have monotone comparative statics that is robust to such
perturbations, $F$ must satisfy the appropriate condition, which in turn ensures that
the comparative statics is robust to arbitrary perturbations $G$.

Example 7 We want to see how a monopolist responds to a growing market. Suppose
we have $N$ consumers (or markets) with identical inverse demand $P(q)$. The
monopolist chooses per-consumer output $q$ to solve $\max_{q \ge 0} [P(q) \cdot Nq - c(Nq)]$. It
is difficult to ensure that the objective function has ID. However, we can write the
program as $\max_{q \ge 0} F(q, P(q), N)$, where $F(q,p,N) = Npq - c(Nq)$. To check
SCC, note that $F_p(q,p,N) = Nq \ge 0$ and

$$\frac{F_q(q,p,N)}{F_p(q,p,N)} = \frac{Np - N c'(Nq)}{Nq} = \frac{p - c'(Nq)}{q}$$

is increasing/decreasing in $N$ if $c(\cdot)$ is strictly concave/convex. Intuitively, with
concave/convex costs, an increase in $N$ raises/lowers the marginal rate of substitution between
per-customer output $q$ and price $p$ in the firm's profits. Thus, for any demand function
the firm might face, it would respond to a growing market by raising per-customer
output and reducing price when its cost function is concave, and doing the reverse
when its cost function is convex.
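
A small numerical illustration of Example 7 follows. Everything specific in it is an assumption made for the sketch: the linear inverse demand, the two cost functions (one strictly concave, one strictly convex), the output grid, and the range of market sizes. The optimum is found by grid search, and the grid excludes $q = 0$, where $F_p = Nq$ vanishes.

```python
import math

def best_q(N, cost, inverse_demand, grid):
    """Per-consumer output maximizing P(q) * N * q - c(N * q) on the grid."""
    return max(grid, key=lambda q: inverse_demand(q) * N * q - cost(N * q))

inverse_demand = lambda q: 10.0 - q             # assumed inverse demand P(q)
concave_cost = lambda Q: 2.0 * math.sqrt(Q)     # assumed strictly concave cost
convex_cost = lambda Q: 0.1 * Q * Q             # assumed strictly convex cost

grid = [i / 100 for i in range(10, 991)]        # q from 0.10 to 9.90
market_sizes = range(1, 11)

q_concave = [best_q(N, concave_cost, inverse_demand, grid) for N in market_sizes]
q_convex = [best_q(N, convex_cost, inverse_demand, grid) for N in market_sizes]

print("concave cost:", q_concave)   # per-consumer output should rise with N
print("convex cost: ", q_convex)    # per-consumer output should fall with N
```

Since price is $P(q)$ with $P$ decreasing, rising per-consumer output in the concave case corresponds to a falling price, and conversely in the convex case.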

9.4 Multivariate Topkis’s Theorem


Suppose that the choice variable $x$ is an $n$-dimensional vector, and that we would
like to know whether all components of the maximizer $X^*(t)$ are nondecreasing in
$t$. E.g., with $n = 2$, the maximization problem could be

$$\max_{(x_1,x_2) \in X \subseteq \mathbb{R}^2} F(x_1,x_2,t)$$

Univariate Topkis's Theorem implies that if $F$ has ID in $(x_1,t)$, then the optimal
value of $x_1$ holding $x_2$ fixed is nondecreasing in $t$. Similarly, if $F$ has ID in $(x_2,t)$,
then the optimal value of $x_2$ holding $x_1$ fixed is nondecreasing in $t$. However, now
both variables are chosen simultaneously, and we need to think of indirect effects
(“feedbacks”) arising from the interaction between $x_1$ and $x_2$. For example, how
does the fact that $x_2$ increases in response to an increase in $t$ affect the optimal
value of $x_1$?
Intuitively, if we assume in addition that $F$ has ID in $(x_1,x_2)$, then the indirect
effects will work in the same direction as the direct effects. For example, under
this assumption, the fact that $x_2$ optimally increases in response to an increase in
$t$ further increases our incentive to raise $x_1$ (as in the picture below). So, in the
end, when all the feedbacks play out, both $x_1$ and $x_2$ are increased.

[Diagram: an increase in $t$ directly raises both $x_1$ and $x_2$ (+), and $x_1$ and $x_2$ reinforce each other (+).]

More generally, when $F$ has ID in all pairs of variables, all indirect effects will
reinforce the direct effects and each other. Formally, ID in all pairs of variables is
characterized by a property called “supermodularity,” which we now define.

For $x, y \in \mathbb{R}^n$, define operations meet and join, respectively, as follows:

$$x \wedge y = (\min\{x_1,y_1\}, \ldots, \min\{x_n,y_n\}),$$

$$x \vee y = (\max\{x_1,y_1\}, \ldots, \max\{x_n,y_n\}).$$

(They can also be called “greatest lower bound” and “least upper bound” of $\{x,y\}$,
respectively.) A set $X \subseteq \mathbb{R}^n$ is a sublattice if for all $x, y \in X$, we have $x \wedge y \in X$
and $x \vee y \in X$.
Graphically, when $X \subseteq \mathbb{R}^2$, the sublattice property means that when two
non-ordered corners of a rectangle whose edges are parallel to the axes are in $X$, then
the other two corners are also in $X$. Intuitively, $X$ being a sublattice means
that the feasible set induces a (weak) complementarity in the dimensions of $x$:
if it is possible to increase [reduce] dimension $x_i$ of $x \in X$ (i.e., find $y \in X$
s.t. $y_i > [<]\ x_i$), this can always be done without reducing [increasing] any other
dimension $x_j$, simply by going to $x \vee y$ [$x \wedge y$] (but sometimes this might involve
increasing [reducing] the other dimensions).
Here are some examples of sublattices:

1. Any product set $X = X_1 \times \ldots \times X_n$.

2. Any set described by an inequality $x_j \le g(x_i)$, where $g$ is an increasing
function.

In case (1), increasing one dimension does not affect the feasibility of increasing
another dimension, while in case (2) increasing dimension $i$ helps make an increase
in dimension $j$ feasible (and vice versa). In fact, it has been shown (by Topkis)
that any sublattice of $\mathbb{R}^n$ can be described as an intersection of sets of the form
(1) and (2).
For a set that is NOT a sublattice, take a consumer's budget set: $\{(x_1,x_2) \in \mathbb{R}^2_+ : p_1 x_1 + p_2 x_2 \le w\}$,
where $p_1, p_2 > 0$ are prices of the two goods, and $w > 0$ is the consumer's wealth.
Intuitively, here increasing $x_1$ might necessitate a reduction in $x_2$ so as to preserve
the consumer's budget constraint.
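
The sublattice property is easy to test on a finite grid. The sketch below uses illustrative names (`meet`, `join`, `is_sublattice`) and three assumed integer sets: a product set, a set of the form $x_2 \le g(x_1)$ with $g(x_1) = x_1 + 1$, and a budget set with unit prices and wealth 4.

```python
from itertools import product

def meet(x, y):
    """Componentwise minimum of two vectors."""
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    """Componentwise maximum of two vectors."""
    return tuple(max(a, b) for a, b in zip(x, y))

def is_sublattice(points):
    """True if the finite set is closed under meet and join."""
    pts = set(points)
    return all(meet(x, y) in pts and join(x, y) in pts for x in pts for y in pts)

grid = list(product(range(5), repeat=2))                     # {0,...,4}^2
product_set = [x for x in grid if x[0] in (0, 2, 4) and x[1] in (1, 3)]
under_curve = [x for x in grid if x[1] <= x[0] + 1]          # x2 <= g(x1), g increasing
budget_set = [x for x in grid if x[0] + x[1] <= 4]           # p1 = p2 = 1, w = 4

print(is_sublattice(product_set))   # product set
print(is_sublattice(under_curve))   # inequality with increasing g
print(is_sublattice(budget_set))    # budget set: (4,0) and (0,4) have join (4,4)
```

The budget set fails because the join of the two corner bundles $(4,0)$ and $(0,4)$ is unaffordable.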

Remark 13 We have defined “meet” and “join” operations on $\mathbb{R}^n$, but the theory
of supermodularity applies to any partially ordered set $X$ on which the “meet”
and “join” operations are defined as the infimum (greatest lower bound) and the
supremum (least upper bound), respectively. If these two operations are well-defined
within the set, it is called a “lattice,” and the study of such sets is called “lattice
theory.” We could use different lattices to examine monotone comparative statics
on choice sets other than subsets of $\mathbb{R}^n$ and/or in partial orders that are different
from the vector ordering on $\mathbb{R}^n$. To give one example, if $X$ is the set of all
subsets of some set $Y$, and the partial order on $X$ is given by set inclusion
($\subseteq$), the “meet” and “join” operations become the set intersection $\cap$ and set union
$\cup$ operations, respectively.

Definition 5 A function $F : X \to \mathbb{R}$ on a sublattice $X$ is supermodular if for
all $x, y \in X$,
$$F(x \wedge y) + F(x \vee y) \ge F(x) + F(y).$$

For example, consider the meaning of supermodularity when $X$ is a sublattice
of $\mathbb{R}^2$. The condition has no bite when $x \le y$ or $x \ge y$, since then $\{x,y\} =
\{x \wedge y, x \vee y\}$. Thus, consider the case where (for definiteness) $x_1 > y_1$ but $x_2 <
y_2$. The supermodularity inequality can then be written as

$$F(y_1,x_2) + F(x_1,y_2) \ge F(x_1,x_2) + F(y_1,y_2), \text{ or}$$

$$F(x_1,y_2) - F(y_1,y_2) \ge F(x_1,x_2) - F(y_1,x_2).$$

But this inequality means that the benefit of increasing the first argument from
$y_1$ to $x_1$ can only go up when the second argument increases from $x_2$ to $y_2$. Thus,
supermodularity is here implied by ID. In fact, when $X = X_1 \times X_2 \subseteq \mathbb{R}^2$, then
supermodularity on $X$ also implies ID, by writing, for each $x_1 > y_1$ and $y_2 > x_2$,
the supermodular inequality for $(x_1,x_2)$ and $(y_1,y_2)$. More generally, when $X$ is a
product set in $\mathbb{R}^n$ with $n \ge 2$, supermodularity is characterized by ID in each pair
of variables holding the others fixed:

Lemma 16 $F : X \to \mathbb{R}$ on a product set $X_1 \times \ldots \times X_n \subseteq \mathbb{R}^n$ is supermodular if
and only if it has increasing differences in $(x_i,x_j)$ for all $i \neq j$ holding the other
variables $x_{-ij}$ fixed.

Proof: “Only if”: Take any $x_i, x_i' \in X_i$, $x_j, x_j' \in X_j$ with $x_i' > x_i$ and $x_j' > x_j$,
and $x_{-ij} \in X_{-ij} = \prod_{l \neq i,j} X_l$. Writing the supermodular inequality for $(x_i', x_j, x_{-ij})$
and $(x_i, x_j', x_{-ij})$ yields

$$F(x_i', x_j', x_{-ij}) - F(x_i, x_j', x_{-ij}) \ge F(x_i', x_j, x_{-ij}) - F(x_i, x_j, x_{-ij}).$$

“If”: Let $m = x \wedge y$ and $M = x \vee y$. Then we can write

$$\begin{aligned}
F(M) - F(x) &= \sum_{i=1}^{n} \left[F(M_1,\ldots,M_{i-1},M_i,x_{i+1},\ldots,x_n) - F(M_1,\ldots,M_{i-1},x_i,x_{i+1},\ldots,x_n)\right] \\
&\ge \sum_{i=1}^{n} \left[F(y_1,\ldots,y_{i-1},M_i,m_{i+1},\ldots,m_n) - F(y_1,\ldots,y_{i-1},x_i,m_{i+1},\ldots,m_n)\right] \\
&= \sum_{i=1}^{n} \left[F(y_1,\ldots,y_{i-1},y_i,m_{i+1},\ldots,m_n) - F(y_1,\ldots,y_{i-1},m_i,m_{i+1},\ldots,m_n)\right] \\
&= F(y) - F(m).
\end{aligned}$$

The inequality is by ID (since $M_j \ge y_j$ and $x_j \ge m_j$ for all $j$). For the second
equality, note that for each $i$, either $M_i = y_i$ and $m_i = x_i$, or $M_i = x_i$ and $m_i = y_i$
(and in the latter case both differences are zero). QED
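
Lemma 16 can be sanity-checked on a small product grid by testing both sides of the equivalence directly. This is a sketch under stated assumptions: the grid $\{0,1,2\}^3$, the helper names, and the two integer-valued test functions are all illustrative and not taken from the notes.

```python
from itertools import combinations, product

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def is_supermodular(F, points):
    """F(x ^ y) + F(x v y) >= F(x) + F(y) for all pairs of grid points."""
    return all(F(meet(x, y)) + F(join(x, y)) >= F(x) + F(y)
               for x, y in product(points, repeat=2))

def has_pairwise_id(F, axes):
    """Increasing differences in every pair (x_i, x_j), other coordinates fixed."""
    for i, j in combinations(range(len(axes)), 2):
        for base in product(*axes):
            def at(vi, vj):
                z = list(base)
                z[i], z[j] = vi, vj
                return F(tuple(z))
            for hi_i in axes[i]:
                for hi_j in axes[j]:
                    if hi_i <= base[i] or hi_j <= base[j]:
                        continue
                    gain_hi = at(hi_i, hi_j) - at(base[i], hi_j)
                    gain_lo = at(hi_i, base[j]) - at(base[i], base[j])
                    if gain_hi < gain_lo:
                        return False
    return True

axes = [range(3), range(3), range(3)]
points = list(product(*axes))

# Assumed examples: all cross terms positive vs. one negative cross term.
F_super = lambda z: z[0] * z[1] + z[1] * z[2] + z[0] * z[2] - z[0] ** 2
F_mixed = lambda z: z[0] * z[1] - z[1] * z[2]

for name, F in [("F_super", F_super), ("F_mixed", F_mixed)]:
    print(name, is_supermodular(F, points), has_pairwise_id(F, axes))
```

The own-variable term $-z_1^2$ in `F_super` does not matter, since only cross effects enter either condition.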

Theorem 17 (Topkis) If $X$ is a sublattice, $T$ is fully ordered (e.g., $T \subseteq \mathbb{R}$), and
$F : X \times T \to \mathbb{R}$ is supermodular, then for all $t, t' \in T$ such that $t' \ge t$, and all
$x \in X^*(t)$, $x' \in X^*(t')$,

$$x \wedge x' \in X^*(t) \text{ and } x \vee x' \in X^*(t').$$

Proof: Since $X$ is a sublattice, $x \wedge x' \in X$ and $x \vee x' \in X$. By revealed preference,

$$F(x \wedge x', t) \le F(x,t) \text{ and } F(x \vee x', t') \le F(x',t').$$

On the other hand, by the definition of meet and join and the supermodularity
inequality,

$$F(x \wedge x', t) + F(x \vee x', t') = F((x,t) \wedge (x',t')) + F((x,t) \vee (x',t')) \ge F(x,t) + F(x',t').$$

Hence, $F(x \wedge x', t) = F(x,t)$ and $F(x \vee x', t') = F(x',t')$. QED

In particular, if $X^*(t) = \{x\}$ and $X^*(t') = \{x'\}$, then the statement says
that $x \wedge x' = x$ and $x \vee x' = x'$, which means that $x \le x'$, i.e., the maximizer is
nondecreasing in $t$. In the general case, the conclusion of the theorem is often stated
as “$X^*(t)$ is nondecreasing in $t$ in the strong set order,” with the appropriate
definition of the strong set order: $A \le B$ in the strong set order if for all $a \in A$, $b \in
B$, we have $a \wedge b \in A$ and $a \vee b \in B$.
Note: The theorem also applies to the case $t' = t$, in which it says that the set
of maximizers $X^*(t)$ is a sublattice.
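
The conclusion of Theorem 17 can be illustrated by brute force on a small integer sublattice. In the sketch below the feasible set, the objective, and the parameter range are assumptions chosen so that all cross terms in $(x_1, x_2, t)$ are nonnegative. The code computes each argmax set and checks the strong set order, including the case $t' = t$.

```python
from itertools import product

def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))

def argmax_set(F, X, t):
    best = max(F(x, t) for x in X)
    return {x for x in X if F(x, t) == best}

def strong_set_leq(A, B):
    """A <= B in the strong set order for sets of vectors."""
    return all(meet(a, b) in A and join(a, b) in B for a in A for b in B)

# Assumed feasible set: a non-product sublattice of {0,...,3}^2.
X = [x for x in product(range(4), repeat=2) if x[1] <= x[0] + 1]

def F(x, t):
    # Cross terms x1*x2, t*x1, t*x2 are all positive, so F is supermodular.
    return x[0] * x[1] + t * (x[0] + x[1]) - x[0] ** 2 - x[1] ** 2

T = range(4)
maximizers = {t: argmax_set(F, X, t) for t in T}
for t in T:
    print(t, sorted(maximizers[t]))

ordered = all(strong_set_leq(maximizers[t], maximizers[s])
              for t in T for s in T if s >= t)
print("nondecreasing in the strong set order:", ordered)
```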

10 Application: Complements and Substitutes


Now we consider applications of Topkis's Theorem to characterizing monotone
price effects on the input demands of a single-output firm. Informally, in price
theory two inputs are called substitutes when an increase in the price of one leads
to an increase in input demand for the second and complements when it leads to a
decrease in input demand for the second. (In the differentiable case, complementarity/substitutability
is given by the sign of the entry $\partial z_i/\partial w_j$ in the substitution matrix.)
Several things conspire to complicate this seemingly simple definition.
First, it is perfectly possible that the change in demand in response to a price
increase is not uniform: for example, the demand $z_j(p,w)$ for input $j$ may increase
as the price of input $i$ increases from $w_i$ to $w_i'$ and may then decrease as the input
price increases further to $w_i''$. Second, the response to a price increase can depend
on which profit-maximization problem is used to determine demand. E.g., it could
be the problem where

1. All inputs and output are free to vary, or

2. Some inputs $S \subseteq \{1,\ldots,n\}$ are held fixed at some levels $z_S \in \mathbb{R}^S_+$ (which
could be interpreted as “short-run” optimization), or

3. Output $q$ is held fixed (i.e., the cost-minimization problem).

The first result reported below is for the profit maximization problem with a
single output and all inputs free to vary.

Proposition 18 Restrict attention to the domain of price vectors $(p,w) \in \mathbb{R}^{n+1}_+$
upon which the input demand vector $z(p,w)$ is singleton-valued. If the production
function $f$ is increasing and supermodular, then $z(p,w)$ is nondecreasing in $p$ and
nonincreasing in $w$.

Proof. Since $f(\cdot)$ is increasing and supermodular, the firm's objective function
$p \cdot f(z) - w \cdot z$ is supermodular in $(z,p)$. Also, the choice set $\mathbb{R}^n_+$ is a lattice. So by
Topkis's Monotonicity Theorem, $z(p,w)$ must be nondecreasing in $p$. Similarly, the
firm's objective is also supermodular in $(z,-w)$. So $z(p,w)$ is nonincreasing in $w$.
Q.E.D.

The proposition is easily extended to “short run” profit maximization, in which
some set of inputs $S \subseteq \{1,\ldots,n\}$ is held fixed at some levels $z_S \in \mathbb{R}^S_+$. Indeed,
let $z(p,w,x_S)$ denote the solution to the firm's profit maximization problem given
the extra constraint $z_S = x_S$. This additional constraint defines a sublattice, so the
original proof still applies.
The above result shows that supermodularity of the production function $f$ implies
the price-theoretic notion of input complementarity. In fact, supermodularity
is strictly stronger than price-theoretic complementarity, because it implies the
price theory concept not only for the long-run problem but also for all possible
short-run problems. It is stronger in another way, as well: it characterizes the
behavior of $f$ even around choices $z$ that would never be justified by any price
vector. The next proposition asserts that when $f$ is strictly concave, so each choice
is the unique optimum for some set of prices, then supermodularity is identical to
this notion of long-run and short-run price theory complementarity.

Proposition 19 Suppose that f is increasing and strictly concave. If for all S


and xS , the solution z(p, w, xS ) is nonincreasing in w, then f is supermodular.

Proof. Left as an exercise. (Hint: Suppose that all but 2 inputs are fixed. How
does f vary in the remaining two inputs?) Q.E.D.

By Topkis's Monotonicity Theorem, if $f$ is supermodular, then $z(p,w,x_S)$ is
nonincreasing in $w$, for all $S$, even without the assumptions that $f$ is increasing
and concave. So, the import of the proposition is that for the case of an increasing
concave production function $f$, inputs are complements in the strong sense defined
by the proposition if and only if $f$ is supermodular.
The substitutes case is similar for the two-input case. A function $f$ is called
submodular if $(-f)$ is supermodular.

Proposition 20 Restrict attention to the domain of price vectors $(p,w)$ upon
which $z(p,w)$ is singleton-valued and suppose there are just two inputs. If the
production function $f$ is submodular, then $z_1(p,w)$ is nondecreasing in $w_2$ and
$z_2(p,w)$ is nondecreasing in $w_1$.

Proof. If $f$ is submodular, then changing the variables to $\bar z_1 = -z_1$, the firm's
problem can be written as $\max_{\bar z_1 \le 0,\, z_2 \ge 0} [p f(-\bar z_1, z_2) + w_1 \bar z_1 - w_2 z_2]$, and the
objective function is supermodular in $(\bar z_1, z_2, w_1)$. Topkis's Monotonicity Theorem
implies that $z_2(p,w)$ is nondecreasing in $w_1$. The second statement is symmetric.
Q.E.D.

Note well that this characterization applies only to the two-input case. With
more than two inputs, the following problem arises: submodularity of $f$ does not
ensure that the indirect effects do not counter each other. E.g., suppose $w_1$ goes up,
which makes the firm reduce $z_1$ (by the Law of Supply). By the univariate Topkis's
theorem, this would lead the firm to raise each of $z_2$ and $z_3$ if the other were held
fixed. However, the increase in $z_2$ would then lead the firm to reduce $z_3$:

[Diagram: $w_1 \uparrow$ leads to $z_1 \downarrow$; the fall in $z_1$ pushes both $z_2$ and $z_3$ up (+), while $z_2$ and $z_3$ push against each other.]

Without very restrictive assumptions on $f$, it is hard to ensure that the last
effect does not overpower the previous ones.
Instead of dealing with the production function, one can have a simple charac-
terization of substitutes or complements using the profit function:

Proposition 21 Suppose the profit function $\pi(p,w)$ is differentiable in $w$. For
two inputs $i \neq j$, $z_i(p,w)$ is nonincreasing [nondecreasing] in $w_j$ if and only if
the profit function $\pi(p,w)$ has increasing differences in $(w_i,w_j)$ [$(w_i,-w_j)$].

Proof. By Hotelling's lemma, $z_i(p,w) = -\frac{\partial}{\partial w_i} \pi(p,w)$, which is always
nonincreasing [nondecreasing] in $w_j$ if and only if $\pi$ has increasing differences in
$(w_i,w_j)$ [$(w_i,-w_j)$]. Q.E.D.

This implies, in particular, that all inputs are complements [substitutes] to


each other if and only if the profit function is supermodular [submodular] in
the input prices. Similar characterizations obtain for short-run substitutabil-
ity/complementarity using short-run profit function, and for substitutability/complementarity
holding output fixed using the cost function.

11 The Short Run and the Long Run


While not treating time explicitly, the neoclassical theory of the firm typically
distinguishes between the long-run, a length of time over which the firm has the
opportunity to adjust all factors of production, and the short-run, during which
time some factors may be difficult or impossible to adjust.
In his Foundations of Economic Analysis (1947), Samuelson suggested that a
firm would react more to input price changes in the long-run than in the short-run,
because it has more inputs that it can adjust. This view still persists in
some economics texts.8 Samuelson called this effect the LeChatelier principle and
argued that it also illuminates how war-time rationing makes demand for non-rationed
goods less elastic. Assuming that the optimal production choice $y(p)$ is
differentiable, he proved that the principle holds for sufficiently small price changes
in a neighborhood of the long-run price. The relation between long and short
run effects can be quite important, because data about the short-run effects of
policies are frequently used to forecast their long-run effects, and such forecasts
can influence policymakers.

8 For example, Varian (1992) writes: “It seems plausible that the firm will respond more to
a price change in the long run since, by definition, it has more factors to adjust in the long run
than in the short run. This intuitive proposition can be proved rigorously.”

We begin our analysis with an example to prove that the Samuelson-LeChatelier
principle does not apply to large price changes. Consider a single-output firm with
the production function $f(k,l) = 10$ if either $l \ge 2$ or $k, l \ge 1$, and $0$ otherwise.
Thus, the firm can produce 10 units of output either by using two units of labor,
or by using one unit of each input. Suppose that the output price is 1.
Suppose that initial long-run input prices are given by $w = (3,2)$, where the first
component is the price of capital and the second the price of labor. At the
corresponding initial long-run optimum, the firm achieves its maximum profit by
buying two units of labor: $z^0 = (0,2)$. Suppose that the price of labor rises to
6, so the new price vector is $w' = (3,6)$. If the use of capital is fixed in the
short run at zero, the firm can no longer make a positive profit since $2 \cdot 6 > 10$,
so it shuts down, hence $z^{SR} = (0,0)$. The firm's long run choice at price vector
$w'$ is using both capital and labor: $z^{LR} = (1,1)$ (yielding a profit of 1). In this
example, when the price of labor rises from 2 to 6, the demand for labor changes
in the short-run from $z_l^0 = 2$ to $z_l^{SR} = 0$, but then recovers in the long-run to
$z_l^{LR} = 1$. So, the short-run change is larger than the long-run change, contrary
to the Samuelsonian conclusion. Although the production function may seem
unusual, it can be modified to be concave, smooth, and strictly increasing, yielding
the same input demand functions.
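
The arithmetic of this counterexample can be reproduced by enumerating input bundles. The sketch below assumes, for illustration only, that inputs are chosen from the integer grid $\{0,1,2\}^2$. This is without loss here, since $f$ is a step function and inputs are costly. The function names are hypothetical.

```python
def f(k, l):
    """Production function from the example: 10 units if l >= 2, or k >= 1 and l >= 1."""
    return 10 if (l >= 2 or (k >= 1 and l >= 1)) else 0

def profit(k, l, w, p=1):
    """Profit at output price p and input prices w = (price of capital, price of labor)."""
    return p * f(k, l) - w[0] * k - w[1] * l

def best_bundle(w, k_fixed=None):
    """Profit-maximizing (k, l); capital is held at k_fixed in the short run."""
    capital_choices = [k_fixed] if k_fixed is not None else [0, 1, 2]
    bundles = [(k, l) for k in capital_choices for l in [0, 1, 2]]
    return max(bundles, key=lambda z: profit(z[0], z[1], w))

w_old, w_new = (3, 2), (3, 6)
z0 = best_bundle(w_old)                     # initial long-run optimum
z_sr = best_bundle(w_new, k_fixed=z0[0])    # short run: capital stuck at its old level
z_lr = best_bundle(w_new)                   # new long-run optimum

print("initial:", z0, "short run:", z_sr, "long run:", z_lr)
print("labor change, short run:", z_sr[1] - z0[1], "long run:", z_lr[1] - z0[1])
```

Labor demand falls by 2 in the short run but only by 1 in the long run.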
There is an interesting set of economic models in which it is always true that
long-run responses to price changes are larger than short run responses. Intuitively,
these are models in which a “positive feedbacks” argument applies, as follows.
Consider the profit-maximization problem for a single-output firm with two
inputs, capital and labor, and production function $f(k,l)$ in which the inputs are
substitutes, in the sense of submodularity ($f_{kl} \le 0$). Suppose that capital is fixed
in the short-run. By the law of demand, if the wage increases, the firm will use
less labor both in the short-run and in the long-run. Since the two inputs are
substitutes, the increased wage implies an increased use of capital in the long-run.
Since $f_{kl} \le 0$, the additional capital used in the long-run will reduce the marginal
product of labor, so in the long-run the firm will use still less labor. In summary,
the long-run effect is larger than the short-run effect because, in the short-run the
firm responds only to a higher wage, but in the long-run, it responds both to a
higher wage and to an increased capital stock that reduces the marginal product of
labor. Graphically, the additional effect in this example can be represented by a
positive feedback loop.
Next, suppose that the two inputs, capital and labor, are complements, i.e.,
$f_{kl} \ge 0$. Again, by the law of demand, if the wage increases, the firm will use
less labor input, both in the short-run and in the long-run. Since the inputs are
complements, the increased wage implies a reduced use of capital in the long run.
Since $f_{kl} \ge 0$, the reduced capital used in the long-run will reduce the marginal
product of labor, so in the long-run the firm will use still less labor. Again, we
have a positive feedback loop.
The general positive feedback argument for two inputs (due to Milgrom and
Roberts (1996)) goes as follows. Let $X$ and $Y$ be lattices (for example, let $X =
Y = \mathbb{R}$). Define:
$$x(y,t) = \arg\max_{x \in X} F(x,y,t)$$
and
$$y(t) = \arg\max_{y \in Y} F(x(y,t),y,t).$$

Proposition 22 Suppose that F : X ⇥ Y ⇥ R ! R is supermodular, that t0 t,


and that the maximizers described below are unique for the parameter values t and
t0 . Then:
x(y(t0 ), t0 ) x(y(t), t0 ) x(y(t), t).
and
x(y(t0 ), t0 ) x(y(t0 ), t) x(y(t), t).

Proof. By Topkis’s Theorem applied to $\max_{(x,y) \in X \times Y} F(x, y, t)$, the function $y(t)$ is
nondecreasing. Then, since $t' \ge t$, $y(t') \ge y(t)$. Similarly, by Topkis’s Theorem, the
function $x(y, t)$ is nondecreasing (in both arguments). The claims in the proposition
follow immediately from that and the inequalities $t' \ge t$ and $y(t') \ge y(t)$. Q.E.D.
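
To spell out the last step (a sketch that uses only the two monotonicity facts from the proof), each inequality changes one argument of $x(\cdot, \cdot)$ at a time. For the first chain:

$$x(y(t'), t') \ge x(y(t), t') \quad \text{since } y(t') \ge y(t) \text{ and } x \text{ is nondecreasing in } y,$$

$$x(y(t), t') \ge x(y(t), t) \quad \text{since } t' \ge t \text{ and } x \text{ is nondecreasing in } t.$$

The second chain follows in the same way by changing $t$ first and $y$ second:

$$x(y(t'), t') \ge x(y(t'), t) \ge x(y(t), t).$$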

Now let’s apply the result. If capital and labor are “complements” in the sense
that the production function $f(k, l)$ of the capital input $k$ and labor input $l$ is
supermodular, then we let $x = l$, $y = k$, and $t = -w_l$, where $w_l$ is the price of
labor. The firm’s objective function is

$$F(x, y, t) = p f(y, x) + t x - w_k y,$$

which is supermodular because it is the sum of supermodular functions.
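
As a quick check of this claim, here is a sketch under the added assumptions that $f$ is twice continuously differentiable, $p > 0$, and $X = Y = \mathbb{R}$, so that supermodularity is equivalent to nonnegative pairwise cross-partial derivatives. With $t = -w_l$ and $F(x, y, t) = p f(y, x) + t x - w_k y$:

$$\frac{\partial^2 F}{\partial x \, \partial y} = p f_{kl}(y, x) \ge 0, \qquad \frac{\partial^2 F}{\partial x \, \partial t} = 1 \ge 0, \qquad \frac{\partial^2 F}{\partial y \, \partial t} = 0.$$

All three cross-partials are nonnegative, so $F$ is supermodular in $(x, y, t)$.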


The Proposition can also be applied to the case where capital and labor are
“substitutes,” in the sense that the production function $f(k, l)$ is submodular. In this
case, let $x = l$, $y = -k$, and $t = -w_l$, so that the objective function becomes

$$F(x, y, t) = p f(-y, x) + t x + w_k y,$$

and is again supermodular. We then have the following result.

Corollary 23 (LeChatelier Principle) Suppose production is given by $f(k, l)$,
where either $f_{kl}(k, l) \ge 0$ for all $(k, l)$ or $f_{kl}(k, l) \le 0$ for all $(k, l)$. Then if the wage
$w_l$ increases (decreases), the firm’s labor demand will decrease (increase), and the
decrease (increase) will be larger in the long-run than in the short-run.
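
Two steps behind the corollary can be sketched as follows, under the added assumptions that $f$ is twice continuously differentiable and $p > 0$. First, in the substitutes case, with $y = -k$, $t = -w_l$, and $F(x, y, t) = p f(-y, x) + t x + w_k y$, the chain rule gives

$$\frac{\partial^2 F}{\partial x \, \partial y} = -p f_{kl}(-y, x) \ge 0 \quad \text{whenever } f_{kl} \le 0,$$

while $\partial^2 F / \partial x \, \partial t = 1$ and $\partial^2 F / \partial y \, \partial t = 0$, so $F$ is supermodular in $(x, y, t)$.

Second, because $t = -w_l$, the case $t' \ge t$ is a wage decrease from $-t$ to $-t'$. Labor starts at $x(y(t), t)$. With capital held at its old level, short-run labor demand is $x(y(t), t')$, and once capital adjusts, long-run labor demand is $x(y(t'), t')$. The first chain in Proposition 22 then reads

$$\underbrace{x(y(t'), t')}_{\text{long-run}} \ge \underbrace{x(y(t), t')}_{\text{short-run}} \ge \underbrace{x(y(t), t)}_{\text{initial}}.$$

The second chain covers a wage increase from $-t'$ to $-t$ in the same way, with $x(y(t'), t')$ as the initial level, $x(y(t'), t)$ as the short-run level, and $x(y(t), t)$ as the long-run level.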

In particular, if $f$ is twice continuously differentiable, $f_{kl}$ has a constant sign in
a neighborhood of any point $(k, l)$ at which $f_{kl}(k, l) \ne 0$, and so capital and labor
will be either local substitutes or local complements. This explains Samuelson’s
local version of the LeChatelier principle. On the other hand, globally, capital
and labor need not be either always substitutes or always complements. In the
example above, capital is a complement to labor when labor is scarce but a
substitute for labor when labor is abundant. Namely, when we go from zero units
of labor to one, the marginal product of capital goes up from 0 to 10, but when
we go from one unit of labor to 2, the marginal product of capital goes back down
from 10 to 0. It is this non-uniformity that enables the example to contradict the
conclusion of the LeChatelier principle.
