Rational Inattention - Lecture 1

Introduction

Tim Willems
University of Oxford

2013



Introduction

Economics is the study of the allocation of scarce resources


A good (though uncommon) example of a scarce resource is attention
("information-processing capacity")
Does this scarcity have economic implications?
Theory of rational inattention enables such a study in a formal way
Introduced by Chris Sims (1998, 2003)
Builds upon information theory
Key ingredients:
All information is available
Attention of agents is limited
Agents can choose what to pay attention to (and how much)



Outline and learning goals

1. Primer in information theory
Understand the underlying intuition
Be able to understand, apply, and work with its core concepts (entropy,
information, etc.)
2. A macroeconomic application: Mackowiak and Wiederholt (2009)



Information theory (i)

Branch of applied mathematics dealing with the quantification of
information
Founded by Claude E. Shannon in his 1948 article
One of the most cited articles ever written (>60,000 citations!)
Deals with information compression and communication
Practical applications: computer science, lossless data compression
(zip), lossy data compression (mp3, jpg), coding
But also relevant to linguistics, biology, economics, ...



Information theory (ii)

Two fundamental questions:


1. What is the ultimate data compression? (Answer: entropy)
2. What is the ultimate rate of communication? (Answer: channel capacity)

Shannon's main insights:
Random processes have a certain irreducible complexity below which
they cannot be compressed (entropy)
It is possible to communicate over a noisy channel with negligible
probability of error
When the entropy of the source < the capacity of the channel, error-free
communication is theoretically possible
The theory does not say how to actually achieve it



Background literature

Thomas M. Cover and Joy A. Thomas, Elements of Information


Theory
Very comprehensive and clear treatment
First part of these notes draws upon it
David J.C. MacKay, Information Theory, Inference, and Learning
Algorithms
Similar to Cover and Thomas
James Gleick, The Information
Popular science book on information theory



Communication channel (i)

A communication channel is a system in which the output depends
probabilistically on its input
Characterized by a probability transition matrix, p(y|x), which
determines the conditional distribution of the output (y) given the
input (x)
Example #1: noiseless binary channel
Both input and output are binary (0 or 1)
The output of a noiseless channel exactly reproduces the input, so
p(0|0) = p(1|1) = 1 and p(1|0) = p(0|1) = 0
Graphically:



Communication channel (ii)

Example #2: noisy binary channel


Both input and output are binary (0 or 1)
Now the channel is noisy and distorts the input with probability 1 − q, so
p(0|0) = p(1|1) = q and p(1|0) = p(0|1) = 1 − q (a simulation sketch
follows below)
Graphically:

Is noiseless communication still possible over such a channel?


Only if the entropy of the source < the capacity of the channel
Entropy???

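To make the transition matrix concrete, here is a minimal simulation sketch of Example #2 (my own illustration, not from the slides; the function name and the value of q are assumptions):

```python
# Minimal sketch of the noisy binary channel: each bit is transmitted
# correctly with probability q and flipped with probability 1 - q.
import random

def noisy_binary_channel(bits, q, rng):
    return [b if rng.random() < q else 1 - b for b in bits]

rng = random.Random(0)
message = [1, 0, 1, 1, 0, 0, 1, 0] * 1000
received = noisy_binary_channel(message, q=0.9, rng=rng)
error_rate = sum(m != r for m, r in zip(message, received)) / len(message)
print(f"empirical flip rate: {error_rate:.3f}")  # close to 1 - q = 0.1
```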


Entropy (i)

Let X be a discrete random variable with alphabet $\mathcal{X}$ and
p.m.f. $p(x) = \Pr[X = x]$
Then, the entropy of X, denoted by H(X), is:

$$H(X) \; (= H(p)) = -\sum_{x \in \mathcal{X}} p(x) \log p(x) = -\mathbb{E}_p \left[ \log p(X) \right]$$

A measure of uncertainty/unpredictability/information in X
How difficult is it to describe X?
Entropy is the average length of the shortest description of the random
variable
Base of the log does not matter
With log = log2, H(X) is expressed in bits (a computational sketch
follows below)
With log = ln, H(X) is expressed in nats (where 1 nat ≈ 1.44 bits,
since $e = 2^b$ with $b = 1/\ln 2 \approx 1.44$)
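A small sketch of the entropy formula (my own illustration, not from the slides), showing the bits/nats conversion:

```python
# H(X) = -sum_x p(x) log p(x); base 2 gives bits, base e gives nats.
import math

def entropy(probs, base=2):
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p))               # 1.75 bits
print(entropy(p, base=math.e))  # ~1.213 nats (1 nat ~ 1.44 bits)
```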
Entropy (ii)

$$H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x)$$

Shannon showed that H (X ) is the only measure of uncertainty that


satisfies certain intuitive properties, such as:
Additivity over independent sources of uncertainty:
$H(x_i, x_j) = H(x_i) + H(x_j)$ for independent $x_i$ and $x_j$
Explains the log
The less probable an event is a priori (low p(x)), the more information its
occurrence contains
Explains $\log \frac{1}{p(x)} = -\log p(x)$
Information is surprise! (Compare: I am thinking of a word starting
with a... "q". The second letter is a "u".)

It is not possible to describe random variable X perfectly with fewer
than H(X) bits
Entropy (iii)

Example #1 (binary random variable): Let

$$X = \begin{cases} 1 & \text{w.p. } p \\ 0 & \text{w.p. } 1 - p \end{cases}$$

Then:

$$H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

Note that:
H (1/2) = 1 bit

Hence, we need n bits of information to describe the outcome of n
fair coin tosses.
Do we need more or fewer bits to describe the outcome of n coin tosses
when p ≠ 1/2? (A numerical answer follows below.)
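A numerical sketch answering the question (my own, not from the slides): the binary entropy is maximized at p = 1/2, so a biased coin needs fewer bits per toss on average:

```python
import math

def binary_entropy(p):
    # H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.5, 0.7, 0.9, 0.99]:
    print(f"p = {p}: H(p) = {binary_entropy(p):.3f} bits per toss")
# 1.000, 0.881, 0.469, 0.081 -- fewer bits whenever p != 1/2
```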
Entropy (iv)

Example #2 (Gaussian random variable): Let

$$X \sim N(\mu_X, \sigma_X^2)$$

Then:

$$H(X) = \frac{1}{2} \log_2 \left( 2 \pi e \sigma_X^2 \right)$$

For given variance $\sigma_X^2$, the normal distribution maximizes entropy
(illustrated below).

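To illustrate the maximum-entropy property, a quick comparison (my sketch, not from the slides) of a Gaussian against a uniform distribution with the same variance:

```python
import math

def gaussian_entropy_bits(var):
    # H(X) = 0.5 * log2(2 * pi * e * var)
    return 0.5 * math.log2(2 * math.pi * math.e * var)

def uniform_entropy_bits(var):
    # a uniform of width w has variance w^2 / 12 and entropy log2(w)
    return math.log2(math.sqrt(12 * var))

var = 4.0
print(gaussian_entropy_bits(var))  # ~3.05 bits
print(uniform_entropy_bits(var))   # ~2.79 bits -- strictly lower
```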


Aside: general insights (i)

To minimize the probability that your message is distorted, add
redundancy
Rather than sending "1", send "11111111111"
Intuition for why noiseless information transmission is still possible over
a noisy channel: "11101110110" will still be understood as meaning "1"
(see the decoding sketch below)
In language: "queue"
Form the fcat taht you can sitll raed tihs, you may cnolcdue taht trehe
is qitue smoe rdenuadcny in Egnilsh lnaugage.
Frxm thx fxct thxt yxx cxn stxll rxxd thxs, yxx mxy cxnclxdx thxt thxrx
xs qxxtx sxmx rxdxndxncx xn Xnglxsh lxngxxgx.

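The "11101110110" example is a repetition code decoded by majority vote; a minimal runnable sketch (my illustration, not from the slides; the 11-fold repetition and the flip probability are assumptions):

```python
import random

def repeat_encode(bits, n=11):
    return [b for bit in bits for b in [bit] * n]

def majority_decode(received, n=11):
    # each block of n received bits decodes to its majority value
    return [int(sum(received[i:i + n]) > n // 2)
            for i in range(0, len(received), n)]

rng = random.Random(1)
message = [1, 0, 1, 1, 0]
noisy = [b if rng.random() < 0.8 else 1 - b for b in repeat_encode(message)]
print(majority_decode(noisy) == message)  # True with high probability
```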


Aside: general insights (ii)

To minimize the length of your code, remove redundancy
Assign a short code to outcomes that are very likely (downside: we
have to use longer codes to describe outcomes that are less likely;
see the Huffman sketch below)
Language: we use a short "code" for common concepts ("a", "we",
"the"); we use longer "codes" for uncommon concepts (such as the
rare disease "pneumonoultramicroscopicsilicovolcanoconiosis")
Morse code uses "." for the common "e", while using "- . . -" for
the uncommon "x"
Saturday-night texting: "Hi m8, r u still in the q?"

Various tricks to detect errors in codes

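The "short codes for likely outcomes" idea is what Huffman coding formalizes; a compact sketch (my own, not from the slides, with made-up symbol frequencies):

```python
import heapq

def huffman_code(freqs):
    """freqs: symbol -> probability. Returns symbol -> bitstring."""
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)  # the two least likely subtrees...
        p2, _, c2 = heapq.heappop(heap)  # ...merge, lengthening their codes
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_code({"e": 0.5, "t": 0.25, "q": 0.15, "x": 0.10}))
# the frequent "e" gets a 1-bit code; the rare "x" gets the longest one
```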


Mutual information

We are interested in learning the realization of X. How informative is
the observation of S?
Answer: this is measured by the reduction in uncertainty (entropy) on X:

$$I(X; S) = H(X) - H(X \mid S)$$

I(X; S) is the mutual information between X and S; knowledge of S
reduces our entropy on X by I(X; S)
Mutual information is symmetric (verify! — a numerical check follows below):

$$I(X; S) = I(S; X)$$

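A discrete sketch (mine, not from the slides; the joint distribution is made up) using the equivalent identity I(X;S) = H(X) + H(S) − H(X,S), which makes the symmetry obvious:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# joint[x][s] = Pr[X = x, S = s], an illustrative distribution
joint = [[0.3, 0.1],
         [0.1, 0.5]]
p_x = [sum(row) for row in joint]
p_s = [sum(col) for col in zip(*joint)]
H_joint = H([p for row in joint for p in row])
# H(X|S) = H(X,S) - H(S), so H(X) - H(X|S) = H(X) + H(S) - H(X,S)
I = H(p_x) + H(p_s) - H_joint
print(f"I(X;S) = {I:.4f} bits")  # ~0.256; symmetric in X and S
```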


Rational inattention
Agents can only absorb a limited amount of information.
Modelled by a constraint on entropy reduction:

$$I(X; S) = H(X) - H(X \mid S) \leq \kappa \qquad (\#)$$

Per time period, agents cannot reduce their entropy on X by more
than κ bits
In the Gaussian example, (#) boils down to (using $\sigma^2_{X|S}$ to denote
the posterior uncertainty on X, i.e. the uncertainty on X that remains
after observing the signal S):

$$\frac{1}{2} \log_2 \left( 2 \pi e \sigma_X^2 \right) - \frac{1}{2} \log_2 \left( 2 \pi e \sigma_{X|S}^2 \right) = \frac{1}{2} \log_2 \left( \frac{\sigma_X^2}{\sigma_{X|S}^2} \right) \leq \kappa$$

This implies that for finite κ, acquiring perfect knowledge of the
realization of X is not possible (i.e. the agent cannot choose to set
$\sigma^2_{X|S} = 0$)
The agent can reduce his uncertainty about the realization of X, but he
cannot eliminate it (see the sketch below)
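The constraint (#) puts a floor under the posterior variance; a one-function sketch (mine, not from the slides):

```python
def min_posterior_variance(prior_var, kappa):
    # from 0.5 * log2(prior_var / post_var) <= kappa:
    # post_var >= prior_var * 2**(-2 * kappa)
    return prior_var / 2 ** (2 * kappa)

for kappa in [0.5, 1.0, 2.0, 5.0]:
    print(kappa, min_posterior_variance(1.0, kappa))
# variance shrinks fast in kappa but never reaches zero for finite kappa
```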
Outline and learning goals

1. Primer in information theory
2. A macroeconomic application: Mackowiak and Wiederholt (2009)
Be able to understand the model, its derivation, and its implications



Mackowiak and Wiederholt (2009)

A price setter who faces an information flow constraint

Motivation: how can monetary policy have real effects? Or: why are
firms so sluggish in responding to monetary shocks?
Traditional explanation: sticky prices
But: prices are very flexible at the micro level
Can we build a model that reconciles individual price flexibility with
aggregate price stickiness?
Mackowiak and Wiederholt: informational frictions
Lucas-island model in which the noise is optimally chosen by
price-setters, subject to their information-processing constraint
Endogenous information structure
Firms allocate little attention to tracking aggregate variables
(idiosyncratic ones are more important/volatile) ⇒ sluggish responses
to aggregate shocks



Link with Lucas island model

Lucas island model works from the assumption that agents can only
observe the current state of monetary policy with a delay
But in reality this information lag is short
Lucas island model has difficulty explaining persistent business
cycle fluctuations
Sims (2003): when agents cannot attend to all information, there is a
difference between available information and the information reflected in
decisions



Model (i)
Firms face idiosyncratic and aggregate shocks ($z_{it}$ and $y_t$)
Under full information, they would set the profit-maximizing price:

$$p_{it}^* = p_t + \alpha_1 y_t + \alpha_2 z_{it}$$

All variables are assumed to be Gaussian, so we can define the
profit-maximizing response to aggregate conditions $\Delta_t \equiv p_t + \alpha_1 y_t$
and write:

$$p_{it}^* = \Delta_t + \alpha_2 z_{it}; \qquad \Delta_t \sim N(0, \sigma_\Delta^2); \quad z_{it} \sim N(0, \sigma_z^2)$$

Due to informational imperfections, firms cannot observe $\Delta_t$ and $z_{it}$
perfectly. Instead, they observe private noisy signals $S_{\Delta it}$ and $S_{zit}$:

$$S_{\Delta it} = \Delta_t + \varepsilon_{it}, \quad \varepsilon_{it} \sim N(0, \sigma_\varepsilon^2) \qquad (1)$$
$$S_{zit} = z_{it} + \psi_{it}, \quad \psi_{it} \sim N(0, \sigma_\psi^2) \qquad (2)$$

The firm chooses its attention allocation over $\Delta_t$ and $z_{it}$, and hence
$\sigma_\varepsilon^2$ and $\sigma_\psi^2$.
Model (ii)

Given the noisy signals, the firm's profit-maximizing price is:

$$p_{it} = \mathbb{E}\{\Delta_t \mid S_{\Delta it}\} + \alpha_2 \mathbb{E}\{z_{it} \mid S_{zit}\}$$

Whenever $p_{it} \neq p_{it}^*$, there is a profit loss which is quadratic in the
distance:

$$\pi(p_{it}^*, \Delta_t, z_{it}) - \pi(p_{it}, \Delta_t, z_{it}) = \frac{\gamma}{2} \left( p_{it}^* - p_{it} \right)^2$$

So profit maximization boils down to:

$$\min_{f(\Delta, S_\Delta), f(z_i, S_z)} \mathbb{E}_{\Delta, z, S_\Delta, S_z} \left[ \frac{\gamma}{2} \left( p_{it}^* - p_{it} \right)^2 \right]$$

Note: minimization takes place by choosing the joint distribution of the
signals and the true state
Why not choose $p_{it}$ directly? Equivalent, since there is a deterministic
mapping from information to actions
Model (iii)

Using that $p_{it} = \mathbb{E}_{\Delta,z}\{p_{it}^* \mid S_{\Delta it}, S_{zit}\}$, this means:

$$\min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathbb{E}_{\Delta, z, S_\Delta, S_z} \left[ \left( p_{it}^* - \mathbb{E}_{\Delta,z}\{p_{it}^* \mid S_{\Delta it}, S_{zit}\} \right)^2 \right]$$

By the Law of Iterated Expectations:

$$\min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathbb{E}_{S_\Delta, S_z} \left[ \mathbb{E}_{\Delta,z} \left\{ \left( p_{it}^* - \mathbb{E}_{\Delta,z}\{p_{it}^* \mid S_{\Delta it}, S_{zit}\} \right)^2 \,\middle|\, S_{\Delta it}, S_{zit} \right\} \right]$$
$$= \min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathbb{E}_{S_\Delta, S_z} \left[ \mathrm{Var}\left( p_{it}^* \mid S_{\Delta it}, S_{zit} \right) \right]$$
$$= \min_{f(\Delta, S_\Delta), f(z_i, S_z)} \frac{\gamma}{2} \mathrm{Var}\left( p_{it}^* \mid S_{\Delta it}, S_{zit} \right)$$

(The last step uses that, with Gaussian states and signals, the posterior
variance does not depend on the signal realization. A Monte Carlo check
follows below.)

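A Monte Carlo sketch of the last step (mine, not from the slides; parameter values are purely illustrative): with Gaussian states and signals, the expected squared pricing error equals the posterior variance:

```python
import random, statistics

rng = random.Random(0)
sigma2_D, sigma2_eps = 1.0, 0.5
weight = sigma2_D / (sigma2_D + sigma2_eps)  # Bayesian signal weight

sq_errors = []
for _ in range(200_000):
    Delta = rng.gauss(0.0, sigma2_D ** 0.5)
    S = Delta + rng.gauss(0.0, sigma2_eps ** 0.5)
    price = weight * S                        # E[Delta | S]
    sq_errors.append((Delta - price) ** 2)

posterior_var = sigma2_D * sigma2_eps / (sigma2_D + sigma2_eps)
print(statistics.mean(sq_errors), posterior_var)  # both ~1/3
```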


Model (iv)

Since the aggregate and idiosyncratic state variables are assumed to be
uncorrelated:

$$\mathrm{Var}\left[ p_i^* \mid S_\Delta, S_z \right] = \mathrm{Var}\left[ \Delta \mid S_{\Delta it}, S_{zit} \right] + \mathrm{Var}\left[ \alpha_2 z_i \mid S_{\Delta it}, S_{zit} \right] = \sigma^2_{\Delta | S_{\Delta it}} + \alpha_2^2 \sigma^2_{z | S_{zit}} \qquad (3)$$

Hence, the objective is simply to minimize the posterior uncertainty on
the state variables, weighted by their importance ($\alpha_2$).
Core of RI: this problem is subject to a constraint on attention (so (3)
cannot simply be set to zero)



Model (v)
Entropy constraint:

$$I(\cdot) = H(\Delta_t) - H(\Delta_t \mid S_{\Delta it}) + H(z_{it}) - H(z_{it} \mid S_{zit}) \leq \kappa$$
$$= \frac{1}{2} \log_2 \left( \frac{2 \pi e \sigma_\Delta^2}{2 \pi e \sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{2 \pi e \sigma_z^2}{2 \pi e \sigma^2_{z | S_{zit}}} \right) \leq \kappa$$
$$= \frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2}{\sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{\sigma_z^2}{\sigma^2_{z | S_{zit}}} \right) \leq \kappa$$
$$= \underbrace{\frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2 + \sigma_\varepsilon^2}{\sigma_\varepsilon^2} \right)}_{\kappa_\Delta} + \underbrace{\frac{1}{2} \log_2 \left( \frac{\sigma_z^2 + \sigma_\psi^2}{\sigma_\psi^2} \right)}_{\kappa_z} \leq \kappa$$

Endogenous signal-to-noise ratios (recall equations (1) and (2)),
increasing in κ (a sketch follows below):

$$\frac{\sigma_\Delta^2}{\sigma_\varepsilon^2} = 2^{2 \kappa_\Delta} - 1; \qquad \frac{\sigma_z^2}{\sigma_\psi^2} = 2^{2 \kappa_z} - 1$$
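A sketch (mine, not from the slides; σ²_Δ = 1 is illustrative) of how capacity spent on a state maps into observation noise, inverting the κ_Δ expression above:

```python
def noise_variance(state_var, kappa_component):
    # from kappa_D = 0.5 * log2((state_var + noise_var) / noise_var):
    # state_var / noise_var = 2**(2 * kappa_D) - 1
    return state_var / (2 ** (2 * kappa_component) - 1)

sigma2_Delta = 1.0
for k in [0.1, 0.5, 1.0, 3.0]:
    print(f"kappa_Delta = {k}: sigma2_eps = {noise_variance(sigma2_Delta, k):.3f}")
# more attention -> a cleaner signal -> a more responsive price
```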
Model (vi)
Convenient: when the objective is quadratic and the priors are
Gaussian, the posterior is Gaussian as well
See Sims (2006), Matejka (2011a,b), and Matejka and Sims (2012) for
relaxations of these assumptions
Problem:

$$\min_{\sigma^2_{\Delta | S_{\Delta it}}, \, \sigma^2_{z | S_{zit}}} \; \sigma^2_{\Delta | S_{\Delta it}} + \alpha_2^2 \sigma^2_{z | S_{zit}}$$
$$\text{s.t.} \quad \frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2}{\sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{\sigma_z^2}{\sigma^2_{z | S_{zit}}} \right) \leq \kappa$$

Lagrangean formulation:

$$\mathcal{L} = \sigma^2_{\Delta | S_{\Delta it}} + \alpha_2^2 \sigma^2_{z | S_{zit}} + \lambda \left[ \frac{1}{2} \log_2 \left( \frac{\sigma_\Delta^2}{\sigma^2_{\Delta | S_{\Delta it}}} \right) + \frac{1}{2} \log_2 \left( \frac{\sigma_z^2}{\sigma^2_{z | S_{zit}}} \right) - \kappa \right]$$



Model (vii)

Solution (check!):

$$\sigma^2_{\Delta | S_{\Delta it}} = \alpha_2^2 \cdot 2^{-\kappa + \frac{1}{2}\left[ \log_2(\sigma_\Delta^2) + \log_2(\sigma_z^2) - \log_2(\alpha_2^2) \right]}$$
$$\sigma^2_{z | S_{zit}} = 2^{-\kappa + \frac{1}{2}\left[ \log_2(\sigma_\Delta^2) + \log_2(\sigma_z^2) - \log_2(\alpha_2^2) \right]}$$

Or, equivalently:

$$\kappa_\Delta = \frac{\kappa}{2} + \frac{1}{4} \log_2 \left( \frac{\sigma_\Delta^2}{\alpha_2^2 \sigma_z^2} \right)$$

Intuition? (A numerical check follows below.)

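A numerical check of the closed form (mine, not from the slides): search over the attention split κ_Δ ∈ [0, κ] with the constraint binding. Here σ²_z/σ²_Δ = 12 and κ = 3 echo the calibration below, while α₂ = 2 is a purely illustrative value:

```python
import math

sigma2_D, sigma2_z, alpha2, kappa = 1.0, 12.0, 2.0, 3.0

def loss(kappa_D):
    # posterior variances implied by spending kappa_D on Delta and
    # kappa - kappa_D on z (capacity constraint binding)
    post_D = sigma2_D / 2 ** (2 * kappa_D)
    post_z = sigma2_z / 2 ** (2 * (kappa - kappa_D))
    return post_D + alpha2 ** 2 * post_z

grid = [i * kappa / 100_000 for i in range(100_001)]
kappa_D_star = min(grid, key=loss)

closed_form = kappa / 2 + 0.25 * math.log2(sigma2_D / (alpha2 ** 2 * sigma2_z))
print(kappa_D_star, closed_form)  # both ~0.104 (interior solution)
```

With these illustrative numbers, the firm devotes κ_Δ/κ ≈ 3.5% of its capacity to the aggregate state, in the same ballpark as the 4% reported on the results slide.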


Calibration

$\sigma_\Delta^2$ can be estimated from aggregate data
Micro data show large idiosyncratic price changes
Evidence for large idiosyncratic shocks(?)
$\sigma_z^2 / \sigma_\Delta^2 \approx 12$
Hard to calibrate κ
It is set equal to 3, such that profit losses due to RI are "small"



Results (i)

4% of attention is allocated to aggregate conditions; 96% to
idiosyncratic ones ⇒ rapid response to idiosyncratic shocks



Results (ii)

Sluggish response to aggregate shocks



Results (iii)

Profit losses are small (< 1%), as the tracking of important variables is
good



Results (iv)

Sluggish dynamics in the aggregate



Summarizing

Firms pay more attention to idiosyncratic variables than to aggregate
ones ⇒ sluggish response to shocks to the latter
The model can explain the coexistence of large idiosyncratic price
changes with the sluggishness observed in the aggregate price level
Mackowiak and Wiederholt (2013) construct a DSGE model with RI
households and firms
Can match the data as well as conventional DSGEs with multiple
frictions
But: the model has very different predictions in response to policy
experiments
Reason: the allocation of attention is endogenous (see next lecture)

