1 Moral Hazard: Theory
2. Moral Hazard
April 20, 2020
After a contract is signed, an agent can take actions to affect the characteristics. Examples
Model
Payoffs
• The agent receives utility U = u(w) − c(a), where u(·) is increasing and concave, and c(·) is
increasing and convex.
– We will often assume the functions are differentiable, or satisfy convenient boundary
conditions, e.g. c′(0) = 0.
• The agent has outside option Ū. One can think of this as the agent’s utility from another job.
Econ 201c, UCLA Simon Board
Throughout we’ll assume that output q is contractible (i.e. a judge will enforce the contract w(q)).
We differentiate between the following:
• Effort is unobserved. This means the principal does not see the agent’s effort. This is the
“moral hazard” problem. We’ll primarily be interested in this case.
• Effort is contractible or verifiable. This means the principal and a court can see the agent’s
effort, and so the parties can write a contract on it. This is the “first-best” case that we study
next.
• Effort is observed. This means the principal sees the agent’s effort, but an outside court may
not. It turns out that you can implement first-best if you are sufficiently cunning. See HW 1,
Q2. We’ll come back to this case when we look at relational contracts.
The problem
• The principal chooses the action a and the wage w(q) to solve
$$\max_{a,\,w(q)} \ \Pi = E[q - w(q) \mid a]$$
$$\text{s.t.} \quad E[u(w(q)) - c(a) \mid a] \ge \bar{U} \qquad \text{(IR)}$$
• The “IR” stands for “individual rationality”. It means that the agent is willing to sign the
contract.
• If (IR) is slack, then we can lower the wage in each state. For example, given contract w(q)
define a new contract w̃(q) = w(q) − ε for some small ε > 0. This still satisfies (IR) and
strictly raises profits, so (IR) must bind.
• We will actually want a slightly stronger statement: that the Lagrange multiplier on the
constraint is strictly positive.
– This follows since a reduction in Ū enables the firm to lower wages and strictly
increase profits.
– One can also prove this by contradiction. If the multiplier were zero then the optimal
wage would be minus infinity!
The Lagrangian
The principal maximizes
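$$\begin{aligned}
\mathcal{L} &= E[q - w(q) \mid a] + \lambda E[u(w(q)) - c(a) - \bar{U} \mid a] \\
&= \int_q \big[ q - w(q) + \lambda u(w(q)) \big] \, dF(q \mid a) - \lambda c(a) - \lambda \bar{U} \qquad (1)
\end{aligned}$$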
• What wages should the principal choose? Equation (1) is additive over the states, so we can
maximize pointwise. That is,
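$$\max_{w} \ -w + \lambda u(w)$$
So the optimal choice of wage is independent of the output, q. This objective is also concave,
so the optimal wage w∗ is characterized by the first-order condition
$$\frac{1}{u'(w^*)} = \lambda \qquad (2)$$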
• Interpretation: 1/u′ is the cost of delivering a util. To see this let w(u₀) be the wage required
to deliver u₀ utils, u(w(u₀)) = u₀. Differentiating with respect to u₀ and rearranging,
$$w'(u_0) = \frac{1}{u'(w(u_0))}$$
The FOC (2) thus says that the cost of delivering a util is equated across all states and
given by λ.
$$u(w^*) - c(a) = \bar{U}$$
$$\Pi = E[q \mid a] - w^* = E[q \mid a] - u^{-1}(c(a) + \bar{U})$$
• We assume the principal makes a take-it-or-leave-it (TIOLI) offer to the agent. This is actually
without loss. By varying Ū, we can trace out the Pareto frontier Π(Ū).
• We could equivalently give all the bargaining power to the agent and have the agent make an
offer subject to giving the principal profits of Π̄. These problems are duals of each other.
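As a numerical illustration of the first-best profit expression Π = E[q|a] − u⁻¹(c(a) + Ū), the sketch below evaluates it on a grid of actions and traces a few points of the frontier Π(Ū). The functional forms u(w) = √w, c(a) = a²/2 and E[q|a] = a, and all function names, are assumptions chosen for illustration; they do not come from the notes.

```python
import math

def u_inv(v):
    """Inverse of the assumed utility u(w) = sqrt(w)."""
    return v ** 2

def cost(a):
    """Assumed effort cost c(a) = a^2 / 2."""
    return 0.5 * a ** 2

def expected_output(a):
    """Assumed technology: E[q | a] = a."""
    return a

def first_best(u_bar, grid_size=2001, a_max=2.0):
    """Grid-search the action maximizing E[q|a] - u^{-1}(c(a) + u_bar)."""
    best_a, best_profit = 0.0, -math.inf
    for i in range(grid_size):
        a = a_max * i / (grid_size - 1)
        profit = expected_output(a) - u_inv(cost(a) + u_bar)
        if profit > best_profit:
            best_a, best_profit = a, profit
    wage = u_inv(cost(best_a) + u_bar)  # constant wage that makes (IR) bind
    return best_a, wage, best_profit

# A few points on the frontier: profit as a function of the outside option.
frontier = [(u_bar, first_best(u_bar)[2]) for u_bar in (0.0, 0.25, 0.5, 1.0)]
```

Under these assumed forms and Ū = 0, profit is a − a⁴/4, so the grid search should land at a = 1 with a flat wage of 1/4. Profit falls as Ū rises, which is the downward-sloping frontier.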
• The principal chooses a wage w(q) and a recommended action a. The agent must want to
follow this recommendation.
$$\max_{a,\,w(q)} \ \Pi = E[q - w(q) \mid a]$$
• Here, “IC” stands for “Incentive Compatible”. It means that the agent is happy to follow the
principal’s recommendation.
• The idea of having the principal recommend an action is without loss; it’s a version of the
“revelation principle” (Myerson, 1983, JMathE).
• We will consider three situations when the principal can obtain first-best.
• The problem with the first-best wage is that the agent has no incentives. But this is fine as
the agent will choose the first-best action on his own.
• If the agent is risk-neutral, then the principal can “sell the firm”, making the agent the residual
claimant.
Intuitively, the agent makes an up-front payment of k and then gets to keep all the output
that he creates.
$$k = E[q \mid a^*] - c(a^*) - \bar{U}$$
– If a∗ = 10, then output should be in the range q ∈ [10, 11]. If anything else happens, the
principal should punish the agent very heavily.
– The support here is ℝ, so it doesn’t satisfy the shifting support assumption. However, the
support is “almost” shifting, and the first-best contract can “almost” be implemented.
– In practice, the normal-additive model is used a lot (see HW1, Q4). We typically restrict
the principal to linear contracts, w = α + βq. Without this restriction, first-best is
attainable with an extreme contract that punishes agents in the lower tail.
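The shifting-support example with a∗ = 10 can be sketched numerically. The code below assumes output is uniform on [a, a + 1], that the cost is c(a) = a²/2, and that any output outside [a∗, a∗ + 1] is punished with a large utility penalty. The distribution, the cost function, the penalty size and all names are illustrative assumptions.

```python
A_STAR = 10.0      # action the principal wants to implement
U_BAR = 0.0        # assumed outside option
PENALTY = 1000.0   # assumed utility loss when output falls outside [a*, a* + 1]

def cost(a):
    """Assumed effort cost c(a) = a^2 / 2."""
    return 0.5 * a ** 2

def prob_in_range(a):
    """Pr(q in [a*, a* + 1]) when q is Uniform[a, a + 1]."""
    overlap = min(a + 1.0, A_STAR + 1.0) - max(a, A_STAR)
    return max(0.0, overlap)

def agent_utility(a):
    """Expected utility: flat reward inside the range, heavy penalty outside."""
    reward_util = cost(A_STAR) + U_BAR  # makes (IR) bind at a*
    p = prob_in_range(a)
    return p * reward_util - (1.0 - p) * PENALTY - cost(a)

grid = [8.0 + i / 100 for i in range(401)]  # actions 8.00, 8.01, ..., 12.00
best = max(grid, key=agent_utility)
```

With these numbers any deviation from a∗ moves part of the support outside [10, 11], so the penalty outweighs the saved effort cost and the agent's best choice on the grid is a = 10.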
Setup
• Suppose the agent chooses a ∈ {L, H}, and that c(L) < c(H).
• This has the advantage that the (IC) constraint is simple: we need to stop the agent shirking
and choosing the low action.
• If we wish the agent to choose the low action, we need not provide incentives and can thus
pay a constant wage so (IR) binds,
$$\max_{w(q)} \ E[q - w(q) \mid H]$$
CLAIM: The constraints (IR) and (IC) bind. In particular, λ > 0 and µ > 0.
• Suppose (IR) is slack. Then we can lower wages, giving us a contradiction. In particular,
given contract w(q) define a new contract w̃(q) such that u(w̃(q)) = u(w(q)) − ε. By lowering
the utility in each state by the same amount, this leaves (IC) unaffected.
• Suppose (IC) is slack. Then the solution to the problem is a flat wage, w∗ . But then the
optimal action is a = L, giving us a contradiction.
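• The Lagrangian, which the principal maximizes over w(q), is
$$\mathcal{L} = E[q - w(q) \mid H] + \lambda E[u(w(q)) - c(H) - \bar{U} \mid H] + \mu \big[ E[u(w(q)) - c(H) \mid H] - E[u(w(q)) - c(L) \mid L] \big]$$
• Differentiating,
$$\frac{1}{u'(w(q))} = \lambda + \mu \left[ 1 - \frac{f(q \mid L)}{f(q \mid H)} \right] \qquad (3)$$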
• Interpretation: Consider raising the wage in state q by enough to raise the worker’s utility
by one util. The LHS is the cost to the principal of raising the agent’s utility by a util. On
the RHS, λ represents the benefit of loosening the (IR) constraint, while the second term
represents the impact of the wage increase on the (IC) constraint. The wage boost loosens
the (IC) constraint if f (q|H) > f (q|L), meaning that the output is more likely to occur under
the high action.
Figure 1.1: Non-monotone Wages. In the last picture, ŵ is defined by 1/u′(ŵ) = λ.
• Let ℓ(q) = f(q|H)/f(q|L) be the likelihood ratio. This is a sufficient statistic for output.
Thus, we can write
$$\frac{1}{u'(w(\ell))} = \lambda + \mu \left[ 1 - \frac{1}{\ell} \right]$$
• The RHS increases in ℓ. Since the LHS increases in w, this means the wage increases in ℓ.
Interpretation: the more likely the agent took action H, the higher the wage.
• This is like a statistical inference problem. Given output, we try to infer whether the agent
shirked or worked. If the evidence suggests he worked, we pay him more. The weird thing is
that we know that the agent actually worked. The principal punishes him in states that are
indicative of low effort in order to deter deviations.
• This points to a time inconsistency problem. After the agent has chosen the effort, the
principal would like to fully insure him. But if she did this, the agent would exert no effort.
See Fudenberg and Tirole (1990, Ecta).
• Suppose ℓ(q) increases in q; this is called the Monotone Likelihood Ratio Property (MLRP).
Then wages w(q) are increasing in q.
• In general this may not be the case. See Figure 1.1, which is taken from MWG.
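To see condition (3) at work, the sketch below solves a two-action, two-output example in closed form using the binding (IR) and (IC) constraints, then backs out λ and µ from (3). The utility u(w) = √w (so 1/u′(w) = 2√w), the probabilities, the costs and the outside option are all illustrative assumptions.

```python
P_HIGH = {"H": 0.8, "L": 0.4}   # assumed Pr(q = high | action)
COST = {"H": 1.0, "L": 0.0}     # assumed effort costs c(H), c(L)
U_BAR = 2.0                      # assumed outside option

def density(q_is_high, action):
    """f(q | action) for the binary output."""
    p = P_HIGH[action]
    return p if q_is_high else 1.0 - p

# Binding (IC): (p_H - p_L) * (v_high - v_low) = c(H) - c(L), with v = u(w).
spread = (COST["H"] - COST["L"]) / (P_HIGH["H"] - P_HIGH["L"])
# Binding (IR): p_H * v_high + (1 - p_H) * v_low - c(H) = U_BAR.
v_low = U_BAR + COST["H"] - P_HIGH["H"] * spread
v_high = v_low + spread
wages = {"high": v_high ** 2, "low": v_low ** 2}  # invert u(w) = sqrt(w)

# Condition (3) in each state reads 2 * v = lam + mu * (1 - f(q|L) / f(q|H)).
ratio_high = density(True, "L") / density(True, "H")
ratio_low = density(False, "L") / density(False, "H")
mu = (2 * v_high - 2 * v_low) / (ratio_low - ratio_high)
lam = 2 * v_high - mu * (1 - ratio_high)
```

With these numbers the wages are 12.25 after high output and 1 after low output, and both multipliers come out strictly positive (λ = 6, µ = 2), in line with the claim that (IR) and (IC) bind. The expected wage is 10, compared with the flat wage (c(H) + Ū)² = 9 that would deliver the same utility without risk.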
• In principle we can calculate profit from the two actions. However, since we don’t have a
closed form expression for profits from a = H, it’s hard to say much.
• We can say that there is a downward distortion relative to first-best. Profits from a = L are
the same as in first-best, while profits from a = H are lower. Thus we may choose a = L
when a∗ = H.
• In general q could be binary, continuous or even multi-dimensional. The wage at a given signal
is determined by the likelihood ratio, Pr(q|H)/ Pr(q|L).
• Suppose we have two statistics of firm performance, (q, q̃). For example, q could be revenue,
while q̃ could be costs.
• CLAIM: If q is a sufficient statistic for (q, q̃) then we should make the wage dependent on q
alone.
• The general principle is that we should not make the agent’s wage depend on things outside
his control. This introduces risk without introducing incentives.
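The cost of adding uninformative risk can be illustrated with a one-line calculation. The sketch below assumes u(w) = √w and a signal that shifts the agent's utility up or down by the same amount with equal probability, independently of effort; both are assumptions made only for illustration.

```python
def expected_wage_cost(v_mean, noise):
    """Expected wage needed to deliver utility v_mean +/- noise (equal odds)
    when u(w) = sqrt(w), so that w = v^2."""
    return 0.5 * (v_mean + noise) ** 2 + 0.5 * (v_mean - noise) ** 2

# Same expected utility (3 utils) delivered with increasing amounts of noise.
costs = [expected_wage_cost(3.0, d) for d in (0.0, 0.5, 1.0)]
```

Because the noise is unrelated to effort it leaves incentives unchanged, yet the expected wage bill rises from 9 to 9.25 to 10 as the noise grows.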
Exercise: Suppose the principal is also risk averse, obtaining payoffs v(q − w(q)), where v(·) is
strictly concave. How does this affect the first-order condition (3)?
$$\max_{a,\,w(q)} \ E[q - w(q) \mid a]$$
• We’ll replace the agent’s IC constraint with the first-order condition corresponding to his
optimization problem,
$$U'(a) = 0 \iff \int u(w(q)) f_a(q \mid a) \, dq - c'(a) = 0 \qquad \text{(ICFOC)}$$
where λ ≥ 0 and µ ≷ 0.
Optimal wages
• Recall MLRP states that f(q|aH)/f(q|aL) increases in q for aH > aL.
• Hence log[f(q|aH)/f(q|aL)] = log f(q|aH) − log f(q|aL) increases in q.
• Hence
$$\frac{f_a(q \mid a)}{f(q \mid a)} = \frac{d}{da} \log f(q \mid a) = \lim_{\epsilon \to 0} \frac{\log f(q \mid a + \epsilon) - \log f(q \mid a)}{\epsilon}$$
is increasing in q.
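The finite-difference expression above can be checked numerically. The sketch below assumes output is normal with mean a and unit variance, a family chosen only for illustration, for which f_a/f equals q − a.

```python
import math

def log_density(q, a):
    """Log of the N(a, 1) density; the normal family is an assumption."""
    return -0.5 * (q - a) ** 2 - 0.5 * math.log(2 * math.pi)

def score(q, a, eps=1e-6):
    """Finite-difference approximation of f_a(q|a) / f(q|a)."""
    return (log_density(q, a + eps) - log_density(q, a)) / eps

qs = [-2.0, -1.0, 0.0, 1.0, 2.0]
scores = [score(q, a=0.5) for q in qs]  # should be close to q - 0.5
```

The computed ratios rise with q, as MLRP requires.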
L = u(x) + λ[m − p · x]
If λ > 0 then this is the same as solving max_x u(x) s.t. m − p · x ≥ 0. Intuitively, the agent
would like to make m − p · x go negative; to ensure this does not happen, she must pay λ utils
if she exceeds her budget.
L = Π + µU′(a)
If µ > 0 then this is the same as solving max_w Π s.t. U′(a) ≥ 0. If U(a) is concave and
maximized at a0, then we can also write this as max_w Π s.t. a ≤ a0, meaning that if the
principal could choose any action in [0, a0], then she would choose a0. That is, she’d like the
agent to take more effort, and not less.
LEMMA: Consider two functions v(q) and g(q) such that v(q) is decreasing and g(q) is mean zero
and quasi-increasing.¹ Then the correlation is negative,
$$\text{Corr} = \int v(q) g(q) \, dq \le 0$$
• Since g(q) averages to zero we are putting positive weight on “high q” outcomes (where v is
low) and negative weight on “low q” outcomes (where v is high). The average is thus negative.
¹ The function g(q) is quasi-increasing if g(q) < 0 for q < q̂ and g(q) > 0 for q > q̂. The naming parallels
“quasi-convex” since the derivative of a quasi-convex function is quasi-increasing, like the derivative of a convex
function is increasing. Such a function is sometimes called single-crossing.
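• Formally, let the crossing point of g(q) be q̂ (see Figure 1.2). Then
$$\begin{aligned}
\text{Corr} &= \int_{q < \hat{q}} v(q) g(q) \, dq + \int_{q > \hat{q}} v(q) g(q) \, dq \\
&\le \int_{q < \hat{q}} v(\hat{q}) g(q) \, dq + \int_{q > \hat{q}} v(\hat{q}) g(q) \, dq \\
&= v(\hat{q}) \int g(q) \, dq \\
&= 0
\end{aligned}$$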
To understand the inequality, note that: (i) When q < q̂, v(q) ≥ v(q̂) and g(q) ≤ 0, so
v(q)g(q) ≤ v(q̂)g(q), and (ii) When q > q̂, v(q) ≤ v(q̂) and g(q) ≥ 0, so v(q)g(q) ≤ v(q̂)g(q).
• To apply the Lemma, let v(q) = u(w(q)) and g(q) = fa(q|a). By assumption u(w(q)) is
decreasing. By MLRP, fa(q|a) is quasi-increasing. Moreover, it averages to zero, ∫ fa(q|a)dq = 0.
Intuitively, a higher effort will make some output more likely and some less likely; this
cancels out on average. Formally,
$$\int f_a(q \mid a) \, dq = \frac{d}{da} \int f(q \mid a) \, dq = \frac{d}{da} [1] = 0$$
• Inequality (5) follows from the Lemma and the fact that c′(a) > 0.
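The Lemma can be checked numerically on a simple example. The sketch below takes g(q) = fa(q|a) for an assumed N(a, 1) output distribution, which is mean zero and crosses zero once at q = a, and an assumed decreasing function v(q) = exp(−q). Both choices, and the integration routine, are illustrative assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def g(q, a=0.0):
    """f_a(q|a) for the assumed N(a, 1) family: (q - a) * phi(q - a)."""
    return (q - a) * phi(q - a)

def v(q):
    """An assumed decreasing function of output."""
    return math.exp(-q)

def integrate(func, lo=-10.0, hi=10.0, n=20000):
    """Midpoint Riemann sum over [lo, hi]."""
    step = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * step) for k in range(n)) * step

mean_of_g = integrate(g)                       # should be approximately 0
corr = integrate(lambda q: v(q) * g(q))        # should be negative
```

For this particular pair the integral has the closed form −e^{1/2}, so the numerical value should be close to −1.65.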
• It’s very convenient to replace the (IC) condition with (ICFOC), but this is only valid if the
agent’s problem is concave.
• There are a couple of sufficient conditions that guarantee this, but neither is satisfactory.
In practice, researchers calibrating a moral hazard model typically assume the first-order approach
works and check numerically after the fact.
• Suppose A = [0, 1], and there exist FL(q) and FH(q) such that F(q|a) = aFH(q) + (1 − a)FL(q).
Figure 1.2: A Decreasing Function and Quasi-Increasing Function are Negatively Correlated.
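• Then
$$\begin{aligned}
U(a) &= \int u(w(q)) \, dF(q \mid a) - c(a) \\
&= a \int u(w(q)) \, dF_H(q) + (1 - a) \int u(w(q)) \, dF_L(q) - c(a)
\end{aligned}$$
is concave in a, since it is linear in a minus the convex cost c(a).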
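• Integrating by parts,
$$\begin{aligned}
U(a) &= \int u(w(q)) \, dF(q \mid a) - c(a) \\
&= u(w(\bar{q})) - \int u'(w(q)) w'(q) F(q \mid a) \, dq - c(a)
\end{aligned}$$
where q̄ denotes the upper bound of the support. This is concave in a provided F(q|a) is convex
in a. This follows since u′(w) > 0 and w′(q) > 0 by MLRP.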
• We’ll discuss three stochastic orders. For more on this see Shaked and Shanthikumar (2007).
• Suppose X ∼ F and Y ∼ G.
• For the examples, F is the distribution of men’s longevity, and G is the distribution of women’s.
• Interpretation: Given any age, there are more women alive than men.
• Characterization: X ≤st Y iff E[φ(X)] ≤ E[φ(Y)] for any increasing function φ(·).
• Interpretation: At any age, men are more likely to die than women.
• Characterization: X ≤hr Y iff [X|a ≤ X] ≤st [Y|a ≤ Y] for all a. That is, if we truncate
the distribution below and renormalize it, then the distribution of Y dominates X in the usual
stochastic order.
• Hence the hazard rate order implies the usual stochastic order.
• Interpretation: At lower ages, proportionately more men die. While the distributions of men’s
and women’s ages at death probably do satisfy the hazard rate order, they do not satisfy the
likelihood ratio order since f/g is higher at 18 years of age than at 12.
• Hence the likelihood ratio order implies the hazard rate order.
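The three orders and their ranking can be checked on small discrete examples. The sketch below uses made-up probability mass functions on four ages and the discrete versions of the definitions suggested by the interpretations above (survival probabilities, hazard rates, and a decreasing ratio f/g); the numbers and function names are illustrative assumptions.

```python
f = [0.4, 0.3, 0.2, 0.1]       # illustrative pmf of X on ages 0..3
g = [0.1, 0.2, 0.3, 0.4]       # illustrative pmf of Y on ages 0..3
f_alt = [0.2, 0.5, 0.2, 0.1]   # alternative pmf for X with f/g not monotone

def survival(pmf):
    """Pr(age >= t) for each t."""
    return [sum(pmf[t:]) for t in range(len(pmf))]

def hazard(pmf):
    """Pr(age = t | age >= t) for each t."""
    return [p / s for p, s in zip(pmf, survival(pmf))]

def usual_order(f, g):
    """X <=st Y: at every age, fewer X than Y are still alive."""
    return all(sf <= sg + 1e-12 for sf, sg in zip(survival(f), survival(g)))

def hazard_rate_order(f, g):
    """X <=hr Y: at every age, X has the higher hazard rate."""
    return all(hf >= hg - 1e-12 for hf, hg in zip(hazard(f), hazard(g)))

def likelihood_ratio_order(f, g):
    """X <=lr Y: the ratio f/g is decreasing in age."""
    ratios = [pf / pg for pf, pg in zip(f, g)]
    return all(r1 >= r2 - 1e-12 for r1, r2 in zip(ratios, ratios[1:]))
```

The pair (f, g) satisfies all three orders. The pair (f_alt, g) satisfies the hazard rate and usual orders but fails the likelihood ratio order, because f_alt/g rises between the first two ages, mirroring the longevity example.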
• There is a sense that debt contracts motivate managers to work harder. This famously underlay
the leveraged buyout wave of the 1980s (e.g., read “Barbarians at the Gate”).
Model
• A contract r(q) ∈ [0, q] tells us how much of the output q the agent repays.
• Formally, this is a standard moral hazard problem with risk-neutral agent and constraints on
the wage.
The problem
(FE) 0 ≤ r(q) ≤ q
• One may wonder: If the agent is designing the contract, then why do we need (IC)? After all,
isn’t the action chosen to maximize his utility? The issue is that the agent is time inconsistent.
Consider the model from Section 1.3. Without (IC), any Pareto efficient contract fully insures
the agent. However, if the agent is fully insured, then he will not work.
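• Pointwise maximization means we can ignore all the parts that don’t involve r(q),
$$\max_{r} \ r \left[ \lambda - \mu \frac{f_a(q \mid a)}{f(q \mid a)} - 1 \right]$$
• Thus, writing η(q) for the term in square brackets,
$$r(q) = \begin{cases} q & \text{if } \eta(q) > 0 \\ 0 & \text{if } \eta(q) < 0 \end{cases}$$
• Suppose MLRP holds. If µ > 0 then η(q) is decreasing in q.² Thus there exists a q̂ such that
$$r(q) = \begin{cases} q & \text{if } q < \hat{q} \\ 0 & \text{if } q > \hat{q} \end{cases}$$
This “live or die” contract does not look like debt: The agent pays everything if the project
makes less than q̂, but pays nothing if it is successful.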
Obtaining debt
• Innes observed that the above contract has a problem. Suppose the agent produced output
q̂ − ε. Then he could borrow 2ε, lower his payments to zero, and pay back the loan. This
motivated Innes to consider monotone contracts in which r(q) is increasing.
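The borrowing trick against the “live or die” contract can be sketched as follows. The threshold, the debt level and the standard debt form r(q) = min(q, D) are illustrative assumptions, as are the function names.

```python
Q_HAT = 10.0   # assumed threshold of the live-or-die contract
DEBT = 6.0     # assumed face value D of a debt contract

def live_or_die(q):
    """Repay everything below the threshold, nothing at or above it."""
    return q if q < Q_HAT else 0.0

def debt(q):
    """Assumed standard debt contract: repay min(q, D), which is monotone in q."""
    return min(q, DEBT)

def payoff_with_hidden_loan(q, loan, contract):
    """Agent borrows `loan`, shows output q + loan, repays per the contract,
    then pays back the loan."""
    shown = q + loan
    return shown - contract(shown) - loan

eps = 0.1
true_q = Q_HAT - eps
honest = true_q - live_or_die(true_q)                        # agent keeps nothing
gamed = payoff_with_hidden_loan(true_q, 2 * eps, live_or_die)  # agent keeps true_q
debt_honest = true_q - debt(true_q)
debt_gamed = payoff_with_hidden_loan(true_q, 2 * eps, debt)
```

Under the live-or-die contract the hidden loan raises the agent's payoff from zero to his full output. Under the monotone debt contract the same manoeuvre gains nothing.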
• A contract is defined by two numbers: The amount of debt D and the effort level a. There
² When is it the case that µ > 0? Innes (1990) supposes the agent’s utility U(a) is concave and that the agent can
only deviate downwards, so U′(a) ≥ 0 and µ ≥ 0. There are two cases. If µ > 0, we have the analysis in the text. If
µ = 0 then IC is irrelevant and we can obtain first-best.
2.2 Multitasking
• Often agents have more than one action. How does a principal incentivize an agent to do the
right thing?
• Motivation
– A Wells Fargo banker chooses to “help the customer” or “create fake accounts”.
• In terms of modeling style, this is quite different from the generality of Section 1. The aim
here is to design the simplest possible model to get some clean economic insights.
Model
• The principal sees a performance measure m = a1 + ga2 , where the constant g may be positive
or negative.
• The agent is risk-neutral, U = w − c(a1) − c(a2) with outside option Ū. Let c(a) = a²/2.
$$\max_{\alpha,\,\beta,\,a} \ \Pi = a_1 + f a_2 - \alpha - \beta m$$
• We can use the (IR) constraint to eliminate α. The firm thus maximizes welfare,
$$\begin{aligned}
\Pi &= a_1 + f a_2 - c(a_1) - c(a_2) - \bar{U} \\
&= \beta (1 + f g) - \frac{\beta^2}{2} (1 + g^2) - \bar{U}
\end{aligned}$$
where the second line uses (ICFOC) to eliminate (a1, a2).
• Differentiating we obtain
$$\beta = \frac{1 + f g}{1 + g^2}$$
and, substituting back,
$$\Pi = \frac{1}{2} \frac{(1 + f g)^2}{1 + g^2} - \bar{U}$$
• Intuitively, β captures the degree of alignment between output F = (1, f) and monitoring
G = (1, g). More precisely, let θ be the angle between F and G. Then
$$\beta = \frac{F \cdot G}{|G|^2} = \frac{|F|}{|G|} \cos\theta = \frac{1}{|G|} \operatorname{proj}_G F$$
Example: Suppose f = 1, so both actions are equally productive. Unfortunately, action a2 is not
as easy to measure, g ≤ 1. Also assume Ū = 0.
• Motivation: Consider a manager who can spend her time finding new leads or team-building. The
firm cares about both since they affect lifetime profits (q), but team-building doesn’t have as
much impact on short-term sales (m).
• Suppose g = 1/2. Then β = 6/5, a1 = 6/5, a2 = 3/5 and Π = 9/10. Thus the principal over-incentivizes
action a1 in order to get some a2.
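The numbers in this example can be reproduced with exact rational arithmetic. The sketch below assumes the agent's first-order conditions under c(a) = a²/2 and m = a1 + g·a2 give a1 = β and a2 = βg, which is what the profit expression above implies; the function name is illustrative.

```python
from fractions import Fraction

def optimal_contract(f, g, u_bar=Fraction(0)):
    """Optimal slope, induced actions and profit in the linear multitasking model."""
    beta = (1 + f * g) / (1 + g ** 2)
    a1, a2 = beta, beta * g  # agent's first-order conditions with c(a) = a^2 / 2
    profit = a1 + f * a2 - a1 ** 2 / 2 - a2 ** 2 / 2 - u_bar
    return beta, a1, a2, profit

beta, a1, a2, profit = optimal_contract(Fraction(1), Fraction(1, 2))
```

With f = 1 and g = 1/2 this returns β = 6/5, a1 = 6/5, a2 = 3/5 and Π = 9/10.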
Exercise (Harmful actions): What happens if f = −1, so task 2 is harmful to the firm?
• Motivation: The police can catch bad guys (a1 ) or fake crime statistics (a2 ).
• But what if the agent only has so many hours in the day?
3 Multiple Agents
• They interact via the performance scheme (e.g. a tournament) even though their performance
is unrelated. See Section 3.1.
• They work as a team, and impose externalities on each other. See Section 3.2.
• They are subject to common shocks. So making agent 1’s pay depend on agent 2’s performance
can lower their risk. See HW2, Q2 for a study of such “relative performance evaluation”.
3.1 Tournaments
Model
• Each produces output qi = ai + εi, where (εi, εj) are IID with zero mean.
• Notice that there is no inherent relationship between the two agents. They will purely be
linked via the performance scheme.
$$\Pi = E[q \mid a] - c(a) - \bar{U} = a - c(a) - \bar{U}$$
A tournament
• Suppose there are prizes wH and wL for first and second place.
Agent’s problem
• The FOC is
$$(w_H - w_L) \, h(a_i - a_j) = c'(a_i)$$
• There is a pure strategy equilibrium if utility Ui is quasi-concave in ai . This may well not
hold. For example, if there is no noise then there is a mixed NE.
• We have two variables: wH and wL . Intuitively, we can choose the spread to get the right
incentives, and the level to make the agent participate.
• We then need (IR) to bind. In the symmetric equilibrium, both agents win with probability
1/2, so
$$U = \frac{1}{2} (w_H + w_L) - c(a^*) = \bar{U}$$
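The two-step logic (spread for incentives, level for participation) can be sketched numerically. The code below assumes normal shocks with standard deviation σ, so that εj − εi is N(0, 2σ²), interprets h as the density of that difference, and assumes c(a) = a²/2 so that first-best effort solves c′(a∗) = 1. These choices and all names are illustrative assumptions.

```python
import math

SIGMA = 1.0   # assumed standard deviation of each shock
U_BAR = 0.0   # assumed outside option

def cost(a):
    """Assumed effort cost c(a) = a^2 / 2."""
    return 0.5 * a ** 2

def marginal_cost(a):
    """c'(a) for the assumed cost function."""
    return a

def h_at_zero(sigma):
    """Density at 0 of eps_j - eps_i, assumed N(0, 2 * sigma^2)."""
    return 1.0 / (2.0 * sigma * math.sqrt(math.pi))

a_star = 1.0                                        # solves c'(a) = 1
spread = marginal_cost(a_star) / h_at_zero(SIGMA)   # w_H - w_L from the FOC at a_i = a_j
mean_prize = cost(a_star) + U_BAR                   # (w_H + w_L) / 2 from binding (IR)
w_high = mean_prize + spread / 2
w_low = mean_prize - spread / 2
```

The required spread equals 2σ√π, so noisier output calls for a larger gap between the prizes, which matches the sensitivity concern that the principal must know h(0) to set the spread correctly.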
• Risk aversion. We’re making i’s pay depend on j’s performance. Why introduce any relationship
between them?
• Collusion. If the agents interact repeatedly, they might collude on a1 = a2 = 0. See HW2, Q4.
• Sensitivity. The contract requires that the principal knows h(0) exactly. This is unlikely. In
comparison, a piece rate contract is more robust.
• They are useful if the principal cannot be trusted to truthfully reveal output (“private
evaluations”). With a tournament, the principal always pays out the same amount, and so has no
incentive to lie.
Model
• Team output depends on all the actions, q(a), where a = (a1, . . . , aN). Assume q(·) is strictly
increasing in its arguments.
• Throughout, we focus on partial implementation. That is, we look for an equilibrium that
implements the desired outcome rather than insisting that all equilibria implement the desired
outcome. See HW2, Q3 for more on this.
Full-information benchmark
• The FOC is
$$\frac{\partial}{\partial a_i} q(a^*) = c'(a_i^*) \qquad (8)$$
• Assuming full participation is first-best, we can then split the surplus in any way we like so
as to give each agent at least Ū.
Agent’s problem
• Suppose we use an output sharing rule {ti(q)} such that Σi ti(q) = q. Agent i’s FOC is then
$$t_i'(q(a^*)) \, \frac{\partial}{\partial a_i} q(a^*) = c'(a_i^*) \qquad (9)$$
• Comparing (8) and (9), we require that ti′(q(a∗)) = 1, which means that agent i needs to be
the residual claimant.
• However, there is only one surplus, so not everyone can be the residual claimant. Formally,
differentiating Σi ti(q) = q gives us Σi ti′(q) = 1. We cannot give everyone the marginal
dollar.
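The gap between (8) and (9) can be made concrete with a small example. The sketch below assumes additive team output q(a) = a1 + ... + aN and cost c(a) = a²/2, and compares first-best effort with effort under an equal split ti(q) = q/N. The technology, cost function and names are illustrative assumptions.

```python
from fractions import Fraction

def effort_under_share(share_slope):
    """Solve t_i'(q) * dq/da_i = c'(a_i) with dq/da_i = 1 and c'(a) = a (both assumed)."""
    return share_slope

def team_surplus(effort, n):
    """Total output minus total effort cost when all n agents choose `effort`."""
    return n * effort - n * effort ** 2 / 2

N = 4
first_best_effort = effort_under_share(Fraction(1))       # condition (8)
equal_split_effort = effort_under_share(Fraction(1, N))   # condition (9) with t_i(q) = q / N
```

With four agents, each exerts effort 1/4 instead of 1 under the equal split, and team surplus falls from 2 to 7/8.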
• Suppose agent 1 is in charge of marketing, and agent 2 in charge of operations. Then we can
see why the team failed (low sales, or high costs), and punish the appropriate person.
• Formally, if q(ai, a∗−i) ≠ q(aj, a∗−j) for all ai ≠ a∗i and aj ≠ a∗j, then we can spot the deviator
and punish them by giving them no payment.
• Suppose we set
$$t_i(q) = \begin{cases} q(a^*)/N & \text{if } q = q(a^*) \\ 0 & \text{otherwise} \end{cases}$$
That is, we destroy the team’s work if we don’t get first-best output.
• The problem is that we cannot give the marginal dollar to everyone. But, suppose we introduce
a third agent who could chip in the required money.
• Suppose each agent i gets ti(q) = q − F, so that under first-best effort they split the pie,
ti(q(a∗)) = q(a∗)/N. This means F = ((N − 1)/N) q(a∗).
• Agent N + 1 takes the up-front payment of F from the N agents and makes each the residual
claimant,
$$t_{N+1}(q) = q(a) - \sum_i t_i(q(a)) = N F - (N - 1) q(a)$$
• Hence the budget-breaker breaks even in equilibrium, but earns money if the agents shirk. This
acts as a commitment device.
• One odd feature: The budget breaker really wants the team to fail. This is reminiscent of
the scheme in “The Producers” whereby the title characters sell the revenue to a play many times
over, and try to stage the world’s worst play.
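The budget-breaker scheme can be checked on a small example. The sketch below assumes additive output q(a) = a1 + ... + aN and cost c(a) = a²/2, so first-best effort is 1 for every agent; these forms and all names are illustrative assumptions.

```python
from fractions import Fraction

N = 4
A_STAR = Fraction(1)  # first-best effort under the assumed q and c

def output(efforts):
    """Assumed team technology: output is the sum of efforts."""
    return sum(efforts)

Q_STAR = output([A_STAR] * N)
F = Fraction(N - 1, N) * Q_STAR  # up-front payment from each agent

def agent_payment(q):
    """Each agent is a residual claimant: t_i(q) = q - F."""
    return q - F

def breaker_payment(q):
    """Budget breaker's payoff: t_{N+1}(q) = N * F - (N - 1) * q."""
    return N * F - (N - 1) * q

def agent_utility(own, others):
    """Agent's payoff given own effort and the list of the others' efforts."""
    q = output([own] + others)
    return agent_payment(q) - own ** 2 / 2

others = [A_STAR] * (N - 1)
grid = [Fraction(k, 10) for k in range(0, 21)]  # candidate efforts 0, 0.1, ..., 2
best_reply = max(grid, key=lambda a: agent_utility(a, others))
```

Each agent's best reply on the grid is the first-best effort, the breaker's payoff is zero at first-best output, and it turns positive as soon as output falls short, which is why the breaker would like the team to fail.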