Rational Inattention - Lecture 1

Introduction

Tim Willems
University of Oxford

2013



Introduction

Economics is the study of the allocation of scarce resources


A good (though uncommon) example of a scarce resource is attention
("information-processing capacity")
Does this scarcity have economic implications?
Theory of rational inattention enables such a study in a formal way
Introduced by Chris Sims (1998, 2003)
Builds upon information theory
Key ingredients:
All information is available
Attention of agents is limited
Agents can choose what to pay attention to (and how much)



Outline and learning goals

1. Primer in information theory
Understand the underlying intuition
Be able to understand, apply, and work with its core concepts (entropy,
information, etc.)
2. A macroeconomic application: Mackowiak and Wiederholt (2009)



Information theory (i)

Branch of applied mathematics dealing with the quantification of
information
Founded by Claude E. Shannon in his 1948 article
One of the most cited articles ever written (>60,000 citations!)
Deals with information compression and communication
Practical applications: computer science, lossless data compression
(zip), lossy data compression (mp3, jpg), coding
But also relevant to linguistics, biology, economics, ...



Information theory (ii)

Two fundamental questions:


1. What is the ultimate data compression? (Answer: entropy)
2. What is the ultimate rate of communication? (Answer: channel capacity)

Shannon's main insights:
Random processes have a certain irreducible complexity below which
they cannot be compressed (entropy)
It is possible to communicate over a noisy channel with negligible
probability of error
When the entropy of the source < the capacity of the channel, error-free
communication is theoretically possible
The theory does not say how to actually achieve it



Background literature

Thomas M. Cover and Joy A. Thomas, Elements of Information


Theory
Very comprehensive and clear treatment
First part of these notes draws upon it
David J.C. MacKay, Information Theory, Inference, and Learning
Algorithms
Similar to Cover and Thomas
James Gleick, The Information
Popular science book on information theory



Communication channel (i)

A communication channel is a system in which the output depends
probabilistically on its input
Characterized by a probability transition matrix, p(y|x), which
determines the conditional distribution of the output (y) given the
input (x)
Example #1: noiseless binary channel
Both input and output are binary (0 or 1)
The output of a noiseless channel exactly reproduces the input, so
p(0|0) = p(1|1) = 1 and p(1|0) = p(0|1) = 0
Graphically:



Communication channel (ii)

Example #2: noisy binary channel


Both input and output are binary (0 or 1)
Now the channel is noisy and distorts the input with probability 1 − q, so
p(0|0) = p(1|1) = q and p(1|0) = p(0|1) = 1 − q (a simulation sketch
follows below)
Graphically:

Is noiseless communication still possible over such a channel?


Only if the entropy of the source < the capacity of the channel
Entropy???

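To make the transition matrix concrete, here is a minimal simulation sketch of Example #2 (my own illustration, not from the slides; the function name and the value of q are assumptions):

```python
# Minimal sketch of the noisy binary channel: each bit is transmitted
# correctly with probability q and flipped with probability 1 - q.
import random

def noisy_binary_channel(bits, q, rng):
    return [b if rng.random() < q else 1 - b for b in bits]

rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1, 0] * 1000
received = noisy_binary_channel(message, q=0.9, rng=rng)
error_rate = sum(m != r for m, r in zip(message, received)) / len(message)
print(f"empirical flip rate: {error_rate:.3f}")  # close to 1 - q = 0.1
```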


Entropy (i)

Let X be a discrete random variable with alphabet $\mathcal{X}$ and
p.m.f. $p(x) = \Pr[X = x]$
Then, the entropy of X, denoted by H(X), is:

$$H(X) \; (= H(p)) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) = -\mathbb{E}_p \left[ \log p(X) \right]$$

A measure of uncertainty/unpredictability/information in X
How difficult is it to describe X?
Entropy is the average length of the shortest description of the random
variable
Base of the log does not matter
With log = log2, H(X) is expressed in bits (a computational sketch
follows below)
With log = ln, H(X) is expressed in nats (where 1 nat ≈ 1.44 bits,
since $e = 2^b$ with $b = 1/\ln 2 \approx 1.44$)
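A small sketch of the entropy formula (my own illustration, not from the slides), showing the bits/nats conversion:

```python
# H(X) = -sum_x p(x) log p(x); base 2 gives bits, base e gives nats.
import math

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))               # 1.75 bits
print(entropy(p, base=math.e))  # ~1.213 nats (1 nat ~ 1.44 bits)
```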
Entropy (ii)

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$

Shannon showed that H (X ) is the only measure of uncertainty that


satisfies certain intuitive properties, such as:
Additivity over independent sources of uncertainty:
$H(x_i, x_j) = H(x_i) + H(x_j)$ for independent $x_i$ and $x_j$
Explains the log
The less probable an event is a priori (low p(x)), the more information its
occurrence contains
Explains $\log \frac{1}{p(x)} = -\log p(x)$
Information is surprise! (Compare: I am thinking of a word starting
with a... "q". The second letter is a "u".)

It is not possible to describe random variable X perfectly with fewer
than H(X) bits
Entropy (iii)

Example #1 (binary random variable): Let

$$X = \begin{cases} 1 & \text{w.p. } p \\ 0 & \text{w.p. } 1 - p \end{cases}$$

Then:

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

Note that:
H (1/2) = 1 bit

Hence, we need n bits of information to describe the outcome of n
fair coin tosses.
Do we need more or fewer bits to describe the outcome of n coin tosses
when p ≠ 1/2? (A numerical answer follows below.)
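A numerical sketch answering the question (my own, not from the slides): the binary entropy is maximized at p = 1/2, so a biased coin needs fewer bits per toss on average:

```python
import math

def binary_entropy(p):
    # H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.5, 0.7, 0.9, 0.99]:
    print(f"p = {p}: H(p) = {binary_entropy(p):.3f} bits per toss")
# 1.000, 0.881, 0.469, 0.081 -- fewer bits whenever p != 1/2
```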
Entropy (iv)

Example #2 (Gaussian random variable): Let

$$X \sim N(\mu_X, \sigma_X^2)$$

Then:

$$H(X) = \frac{1}{2} \log_2 \left( 2 \pi e \sigma_X^2 \right)$$

For given variance $\sigma_X^2$, the normal distribution maximizes entropy
(illustrated below).

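To illustrate the maximum-entropy property, a quick comparison (my sketch, not from the slides) of a Gaussian against a uniform distribution with the same variance:

```python
import math

def gaussian_entropy_bits(var):
    # H(X) = 0.5 * log2(2 * pi * e * var)
    return 0.5 * math.log2(2 * math.pi * math.e * var)

def uniform_entropy_bits(var):
    # a uniform of width w has variance w^2 / 12 and entropy log2(w)
    return math.log2(math.sqrt(12 * var))

var = 4.0
print(gaussian_entropy_bits(var))  # ~3.05 bits
print(uniform_entropy_bits(var))   # ~2.79 bits -- strictly lower
```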


Aside: general insights (i)

To minimize the probability that your message is distorted, add
redundancy
Rather than sending "1", send "11111111111"
Intuition for why noiseless information transmission is still possible over
a noisy channel: "11101110110" will still be understood as meaning "1"
(see the decoding sketch below)
In language: "queue"
Form the fcat taht you can sitll raed tihs, you may cnolcdue taht trehe
is qitue smoe rdenuadcny in Egnilsh lnaugage.
Frxm thx fxct thxt yxx cxn stxll rxxd thxs, yxx mxy cxnclxdx thxt thxrx
xs qxxtx sxmx rxdxndxncx xn Xnglxsh lxngxxgx.

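The "11101110110" example is a repetition code decoded by majority vote; a minimal runnable sketch (my illustration, not from the slides; the 11-fold repetition and the flip probability are assumptions):

```python
import random

def repeat_encode(bits, n=11):
    return [b for bit in bits for b in [bit] * n]

def majority_decode(received, n=11):
    # each block of n received bits decodes to its majority value
    return [int(sum(received[i:i + n]) > n // 2)
            for i in range(0, len(received), n)]

rng = random.Random(1)
message = [1, 0, 1, 1, 0]
noisy = [b if rng.random() < 0.8 else 1 - b for b in repeat_encode(message)]
print(majority_decode(noisy) == message)  # True with high probability
```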


Aside: general insights (ii)

To minimize the length of your code, remove redundancy
Assign a short code to outcomes that are very likely (downside: we
have to use longer codes to describe outcomes that are less likely;
see the Huffman sketch below)
Language: we use a short "code" for common concepts ("a", "we",
"the"); we use longer "codes" for uncommon concepts (such as the
rare disease "pneumonoultramicroscopicsilicovolcanoconiosis")
Morse code uses "." for the common "e", while using "- . . -" for
the uncommon "x"
Saturday-night texting: "Hi m8, r u still in the q?"

Various tricks to detect errors in codes

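The "short codes for likely outcomes" idea is what Huffman coding formalizes; a compact sketch (my own, not from the slides, with made-up symbol frequencies):

```python
import heapq

def huffman_code(freqs):
    """freqs: symbol -> probability. Returns symbol -> bitstring."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # the two least likely subtrees...
        p2, _, c2 = heapq.heappop(heap)  # ...merge, lengthening their codes
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_code({"e": 0.5, "t": 0.25, "q": 0.15, "x": 0.10}))
# the frequent "e" gets a 1-bit code; the rare "x" gets the longest one
```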


Mutual information

We are interested in learning the realization of X. How informative is
the observation of S?
Answer: this is measured by the reduction in uncertainty (entropy) on X:

$$I(X; S) = H(X) - H(X \mid S)$$

I(X; S) is the mutual information between X and S; knowledge of S
reduces our entropy on X by I(X; S)
Mutual information is symmetric (verify! — a numerical check follows below):

$$I(X; S) = I(S; X)$$

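A discrete sketch (mine, not from the slides; the joint distribution is made up) using the equivalent identity I(X;S) = H(X) + H(S) − H(X,S), which makes the symmetry obvious:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# joint[x][s] = Pr[X = x, S = s], an illustrative distribution
joint = [[0.3, 0.1],
         [0.1, 0.5]]
p_x = [sum(row) for row in joint]
p_s = [sum(col) for col in zip(*joint)]
H_joint = H([p for row in joint for p in row])
# H(X|S) = H(X,S) - H(S), so H(X) - H(X|S) = H(X) + H(S) - H(X,S)
I = H(p_x) + H(p_s) - H_joint
print(f"I(X;S) = {I:.4f} bits")  # ~0.256; symmetric in X and S
```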


Rational inattention
Agents can only absorb a limited amount of information.
Modelled by a constraint on entropy reduction:

$$I(X; S) = H(X) - H(X \mid S) \leq \kappa \qquad (\#)$$

Per time period, agents cannot reduce their entropy on X by more
than κ bits
In the Gaussian example, (#) boils down to (using $\sigma^2_{X|S}$ to denote
the posterior uncertainty on X, i.e. the uncertainty on X that remains
after observing the signal S):

$$\frac{1}{2} \log_2 \left( 2 \pi e \sigma_X^2 \right) - \frac{1}{2} \log_2 \left( 2 \pi e \sigma_{X|S}^2 \right) = \frac{1}{2} \log_2 \left( \frac{\sigma_X^2}{\sigma_{X|S}^2} \right) \leq \kappa$$

This implies that for finite κ, acquiring perfect knowledge of the
realization of X is not possible (i.e. the agent cannot choose to set
$\sigma^2_{X|S} = 0$)
The agent can reduce his uncertainty about the realization of X, but he
cannot eliminate it (see the sketch below)
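The constraint (#) puts a floor under the posterior variance; a one-function sketch (mine, not from the slides):

```python
def min_posterior_variance(prior_var, kappa):
    # from 0.5 * log2(prior_var / post_var) <= kappa:
    # post_var >= prior_var * 2**(-2 * kappa)
    return prior_var / 2 ** (2 * kappa)

for kappa in [0.5, 1.0, 2.0, 5.0]:
    print(kappa, min_posterior_variance(1.0, kappa))
# variance shrinks fast in kappa but never reaches zero for finite kappa
```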
Outline and learning goals

1. Primer in information theory
2. A macroeconomic application: Mackowiak and Wiederholt (2009)
Be able to understand the model, its derivation, and its implications



Mackowiak and Wiederholt (2009)

A price setter who faces an information flow constraint

Motivation: how can monetary policy have real effects? Or: why are
firms so sluggish in responding to monetary shocks?
Traditional explanation: sticky prices
But: prices are very flexible at the micro level
Can we build a model that reconciles individual price flexibility with
aggregate price stickiness?
Mackowiak and Wiederholt: informational frictions
Lucas-island model in which the noise is optimally chosen by
price-setters, subject to their information-processing constraint
Endogenous information structure
Firms allocate little attention to tracking aggregate variables
(idiosyncratic ones are more important/volatile) ⇒ sluggish responses
to aggregate shocks



Link with Lucas island model

Lucas island model works from the assumption that agents can only
observe the current state of monetary policy with a delay
But in reality this information lag is short
Lucas island model has difficulty explaining persistent business
cycle fluctuations
Sims (2003): when agents cannot attend to all information, there is a
difference between available information and the information reflected in
decisions



Model (i)
Firms face idiosyncratic and aggregate shocks ($z_{it}$ and $y_t$)
Under full information, they would set the profit-maximizing price:

$$p_{it}^* = p_t + \alpha_1 y_t + \alpha_2 z_{it}$$

All variables are assumed to be Gaussian, so we can define the
profit-maximizing response to aggregate conditions $\Delta_t \equiv p_t + \alpha_1 y_t$
and write:

$$p_{it}^* = \Delta_t + \alpha_2 z_{it}; \qquad \Delta_t \sim N(0, \sigma_\Delta^2); \quad z_{it} \sim N(0, \sigma_z^2)$$

Due to informational imperfections, firms cannot observe $\Delta_t$ and $z_{it}$
perfectly. Instead, they observe private noisy signals $S_{\Delta it}$ and $S_{zit}$:

$$S_{\Delta it} = \Delta_t + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma_\varepsilon^2) \qquad (1)$$
$$S_{zit} = z_{it} + \psi_{it}, \quad \psi_{it} \sim N(0, \sigma_\psi^2) \qquad (2)$$

The firm chooses its attention allocation over $\Delta_t$ and $z_{it}$, and hence
$\sigma_\varepsilon^2$ and $\sigma_\psi^2$.
Model (ii)

Given the noisy signals, the firm's profit-maximizing price is:

$$p_{it} = \mathbb{E}\{\Delta_t \mid S_{\Delta it}\} + \alpha_2 \mathbb{E}\{z_{it} \mid S_{zit}\}$$

Whenever $p_{it} \neq p_{it}^*$, there is a profit loss which is quadratic in the
distance:

$$\pi(p_{it}^*, \Delta_t, z_{it}) - \pi(p_{it}, \Delta_t, z_{it}) = \frac{\gamma}{2} \left( p_{it}^* - p_{it} \right)^2$$

So profit maximization boils down to:

$$\min_{f(\Delta, S_\Delta), f(z_i, S_z)} \mathbb{E}_{\Delta, z, S_\Delta, S_z} \left[ \frac{\gamma}{2} \left( p_{it}^* - p_{it} \right)^2 \right]$$

Note: minimization takes place by choosing the joint distribution of the
signals and the true state
Why not choose $p_{it}$ directly? Equivalent, since there is a deterministic
mapping from information to actions
Model (iii)

Using that $p_{it} = \mathbb{E}_{\Delta,z}\{p_{it}^* \mid S_{\Delta it}, S_{zit}\}$, this means:

$$\min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathbb{E}_{\Delta, z, S_\Delta, S_z} \left[ \left( p_{it}^* - \mathbb{E}_{\Delta,z}\{p_{it}^* \mid S_{\Delta it}, S_{zit}\} \right)^2 \right]$$

By the Law of Iterated Expectations:

$$\min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathbb{E}_{S_\Delta, S_z} \left[ \mathbb{E}_{\Delta,z} \left\{ \left( p_{it}^* - \mathbb{E}_{\Delta,z}\{p_{it}^* \mid S_{\Delta it}, S_{zit}\} \right)^2 \,\middle|\, S_{\Delta it}, S_{zit} \right\} \right]$$
$$= \min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathbb{E}_{S_\Delta, S_z} \left[ \mathrm{Var}\left( p_{it}^* \mid S_{\Delta it}, S_{zit} \right) \right]$$
$$= \min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathrm{Var}\left( p_{it}^* \mid S_{\Delta it}, S_{zit} \right)$$

(The last step uses that, with Gaussian states and signals, the posterior
variance does not depend on the signal realization. A Monte Carlo check
follows below.)

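A Monte Carlo sketch of the last step (mine, not from the slides; parameter values are purely illustrative): with Gaussian states and signals, the expected squared pricing error equals the posterior variance:

```python
import random, statistics

rng = random.Random(0)
sigma2_D, sigma2_eps = 1.0, 0.5
weight = sigma2_D / (sigma2_D + sigma2_eps)  # Bayesian signal weight

sq_errors = []
for _ in range(200_000):
    Delta = rng.gauss(0.0, sigma2_D ** 0.5)
    S = Delta + rng.gauss(0.0, sigma2_eps ** 0.5)
    price = weight * S                        # E[Delta | S]
    sq_errors.append((Delta - price) ** 2)

posterior_var = sigma2_D * sigma2_eps / (sigma2_D + sigma2_eps)
print(statistics.mean(sq_errors), posterior_var)  # both ~1/3
```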


Model (iv)

Since the aggregate and idiosyncratic state variables are assumed to be
uncorrelated:

$$\mathrm{Var}\left[ p_i^* \mid S_\Delta, S_z \right] = \mathrm{Var}\left[ \Delta \mid S_{\Delta it}, S_{zit} \right] + \mathrm{Var}\left[ \alpha_2 z_i \mid S_{\Delta it}, S_{zit} \right] = \sigma^2_{\Delta | S_{\Delta it}} + \alpha_2^2 \sigma^2_{z | S_{zit}} \qquad (3)$$

Hence, the objective is simply to minimize the posterior uncertainty on
the state variables, weighted by their importance ($\alpha_2$).
Core of RI: this problem is subject to a constraint on attention (so (3)
cannot simply be set to zero)



Model (v)
Entropy constraint:

$$I(\cdot) = H(\Delta_t) - H(\Delta_t \mid S_{\Delta it}) + H(z_{it}) - H(z_{it} \mid S_{zit}) \leq \kappa$$
$$= \frac{1}{2} \log_2 \left( \frac{2 \pi e \sigma_\Delta^2}{2 \pi e \sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{2 \pi e \sigma_z^2}{2 \pi e \sigma^2_{z | S_{zit}}} \right) \leq \kappa$$
$$= \frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2}{\sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{\sigma_z^2}{\sigma^2_{z | S_{zit}}} \right) \leq \kappa$$
$$= \underbrace{\frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2 + \sigma_\varepsilon^2}{\sigma_\varepsilon^2} \right)}_{\kappa_\Delta} + \underbrace{\frac{1}{2} \log_2 \left( \frac{\sigma_z^2 + \sigma_\psi^2}{\sigma_\psi^2} \right)}_{\kappa_z} \leq \kappa$$

Endogenous signal-to-noise ratios (recall equations (1) and (2)),
increasing in κ (a sketch follows below):

$$\frac{\sigma_\Delta^2}{\sigma_\varepsilon^2} = 2^{2 \kappa_\Delta} - 1; \qquad \frac{\sigma_z^2}{\sigma_\psi^2} = 2^{2 \kappa_z} - 1$$
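A sketch (mine, not from the slides; σ²_Δ = 1 is illustrative) of how capacity spent on a state maps into observation noise, inverting the κ_Δ expression above:

```python
def noise_variance(state_var, kappa_component):
    # from kappa_D = 0.5 * log2((state_var + noise_var) / noise_var):
    # state_var / noise_var = 2**(2 * kappa_D) - 1
    return state_var / (2 ** (2 * kappa_component) - 1)

sigma2_Delta = 1.0
for k in [0.1, 0.5, 1.0, 3.0]:
    print(f"kappa_Delta = {k}: sigma2_eps = {noise_variance(sigma2_Delta, k):.3f}")
# more attention -> a cleaner signal -> a more responsive price
```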
Model (vi)
Convenient: when the objective is quadratic and the priors are
Gaussian, the posterior is Gaussian as well
See Sims (2006), Matejka (2011a,b), and Matejka and Sims (2012) for
relaxations of these assumptions
Problem:

$$\min_{\sigma^2_{\Delta | S_{\Delta it}}, \, \sigma^2_{z | S_{zit}}} \; \sigma^2_{\Delta | S_{\Delta it}} + \alpha_2^2 \sigma^2_{z | S_{zit}}$$
$$\text{s.t.} \quad \frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2}{\sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{\sigma_z^2}{\sigma^2_{z | S_{zit}}} \right) \leq \kappa$$

Lagrangean formulation:

$$\mathcal{L} = \sigma^2_{\Delta | S_{\Delta it}} + \alpha_2^2 \sigma^2_{z | S_{zit}} + \lambda \left[ \frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2}{\sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{\sigma_z^2}{\sigma^2_{z | S_{zit}}} \right) - \kappa \right]$$



Model (vii)

Solution (check!):

$$\sigma^2_{\Delta | S_{\Delta it}} = \alpha_2^2 \cdot 2^{-\kappa + \frac{1}{2}\left[ \log_2(\sigma_\Delta^2) + \log_2(\sigma_z^2) - \log_2(\alpha_2^2) \right]}$$
$$\sigma^2_{z | S_{zit}} = 2^{-\kappa + \frac{1}{2}\left[ \log_2(\sigma_\Delta^2) + \log_2(\sigma_z^2) - \log_2(\alpha_2^2) \right]}$$

Or, equivalently:

$$\kappa_\Delta = \frac{\kappa}{2} + \frac{1}{4} \log_2 \left( \frac{\sigma_\Delta^2}{\alpha_2^2 \sigma_z^2} \right)$$

Intuition? (A numerical check follows below.)

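A numerical check of the closed form (mine, not from the slides): search over the attention split κ_Δ ∈ [0, κ] with the constraint binding. Here σ²_z/σ²_Δ = 12 and κ = 3 echo the calibration below, while α₂ = 2 is a purely illustrative value:

```python
import math

sigma2_D, sigma2_z, alpha2, kappa = 1.0, 12.0, 2.0, 3.0

def loss(kappa_D):
    # posterior variances implied by spending kappa_D on Delta and
    # kappa - kappa_D on z (capacity constraint binding)
    post_D = sigma2_D / 2 ** (2 * kappa_D)
    post_z = sigma2_z / 2 ** (2 * (kappa - kappa_D))
    return post_D + alpha2 ** 2 * post_z

grid = [i * kappa / 100_000 for i in range(100_001)]
kappa_D_star = min(grid, key=loss)

closed_form = kappa / 2 + 0.25 * math.log2(sigma2_D / (alpha2 ** 2 * sigma2_z))
print(kappa_D_star, closed_form)  # both ~0.104 (interior solution)
```

With these illustrative numbers, the firm devotes κ_Δ/κ ≈ 3.5% of its capacity to the aggregate state, in the same ballpark as the 4% reported on the results slide.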


Calibration

$\sigma_\Delta^2$ can be estimated from aggregate data
Micro data show large idiosyncratic price changes
Evidence for large idiosyncratic shocks(?)
$\sigma_z^2 / \sigma_\Delta^2 \approx 12$
Hard to calibrate κ
It is set equal to 3, such that profit losses due to RI are "small"



Results (i)

4% of attention is allocated to aggregate conditions; 96% to
idiosyncratic ones ⇒ rapid response to idiosyncratic shocks



Results (ii)

Sluggish response to aggregate shocks



Results (iii)

Profit losses are small (< 1%), as the tracking of important variables is
good



Results (iv)

Sluggish dynamics in the aggregate



Summarizing

Firms pay more attention to idiosyncratic variables than to aggregate
ones ⇒ sluggish response to shocks to the latter
The model can explain the coexistence of large idiosyncratic price
changes with the sluggishness observed in the aggregate price level
Mackowiak and Wiederholt (2013) construct a DSGE model with RI
households and firms
Can match the data as well as conventional DSGEs with multiple
frictions
But: the model has very different predictions in response to policy
experiments
Reason: the allocation of attention is endogenous (see next lecture)

