Li, Tornell - Exchange Rates Under Robustness, The Forward Premium Puzzle and Delayed Overshooting

Exchange Rates Under Robustness: The Forward
Premium Puzzle and Delayed Overshooting

Ming Li
San Francisco State University
Aaron Tornell
UCLA
First draft: May 2006

This draft: April 2008
Keywords: Carry Trade, Robust Control, Misspecification, Underreaction,
Overreaction, Forecasts.
Abstract
We show that the forward premium puzzle and the delayed overshooting of exchange
rates can be rationalized by a desire for robustness against model misspecification.
Specifically, optimizing agents that are uncertain about their model of the interest
rate differential will make forecasts and portfolio decisions that are less sensitive to
news than standard Bayesian updating. In equilibrium, this underreaction in the bond
market generates the anomalies in the foreign exchange market. To provide an intuitive
interpretation of the biases induced by robustness, we establish a link with the empirical
literature on distorted beliefs via Girsanov-like change of measure results that transform
uncertainty sets into distortions of the parameters of the probability distribution of the
data-generating process. This transformation allows us to derive closed-form solutions
to the agents' problem and the equilibrium exchange rate that have the same form as
a standard rational expectations equilibrium.
We thank Pierre O. Gourinchas, Chris Hellwig, Romain Ranciere and Pierre O. Weil, for helpful
discussions.
1. Introduction
It is a stylized fact that high interest rate currencies tend to appreciate relative to low interest
rate currencies. This is the forward premium puzzle (FPP) as one would expect that investors
would demand higher interest rates on currencies expected to depreciate. Related to this
anomaly, exchange rates often exhibit delayed overshooting, i.e., momentum in response to
interest rate differential shocks. That is, when there is a positive innovation to the interest
rate differential, the exchange rate continues appreciating for several months after the initial
appreciation. Standard models predict an immediate appreciation followed by a depreciating
path. These patterns imply excess returns with a predictable variation over time.
This paper contributes to the international finance literature by showing that a desire
for robustness against certain types of model misspecification can account for the FPP and
delayed overshooting. In order to be robust against observation uncertainty, agents distort
their model to forecast interest rate differentials. This robustness-induced distortion generates an underreaction of interest rate differential forecasts to news, which in equilibrium
gives rise to the anomalies.
As in Hansen et al. (2002), we use robust control to solve the agents' problem and generate
the distortion in beliefs that underlies the anomalies. This forecast-distortion mechanism
has been found in the data by Gourinchas and Tornell (2004). Using survey data on interest
rate forecasts, they find a substantial systemic underreaction of forecasts of interest rate
differentials to news across the G7 countries. Furthermore, they show that the estimates of
this distortion in beliefs can generate the forward premium puzzle and momentum.
What supports the existence of time-varying predictable excess returns? Why does a
desire for robustness make optimizing agents, who hold no misperception, distort the probability distributions in ways that are consistent with the anomalies? To gain intuition for
the results consider a standard setup where the representative agent observes interest rates
and exchange rates, makes her forecasts, and chooses her bond portfolio. The interest rate
differential has two components (à la Muth): an unobservable trend plus observation noise.
Agents have a baseline model of the interest rate differential process, which corresponds to
the data-generating process, but the fear that their model is misspecified leads them to design
robust forecasting and portfolio strategies.
The agents' baseline model can be misspecified in many ways. Our main result establishes that if agents fear that the equation linking the noisy observations and the unobservable
trend is misspecified, then we can account for both the forward premium puzzle and momentum when the trend is persistent. This is because under this observation uncertainty,
underreaction of interest rate differential forecasts to news is the outcome of the agents'
optimization. In equilibrium, this underreaction in the bond market leads to an exchange
rate underreaction, which in turn generates a negative covariance between exchange rate
changes and the interest rate differential, i.e., a negative Fama coefficient, and an average
delayed response of the exchange rate to interest rate shocks, i.e., momentum. In other
words, a desire for robustness against this observation uncertainty underlies the positive expected excess returns, under the data-generating model, that the equilibrium path exhibits.
These excess returns reflect an uncertainty premium distinct from a risk premium.
Why should robustness against observation uncertainty necessarily imply underreaction
to news, not overreaction? There is, what we might term, a robustness principle at
work. In designing her strategy the agent asks: if things were to go wrong, what would
be the costliest direction in which they could go wrong? She then trades off the benefits
of dampening the effects of misspecification in this costliest direction, against the costs of
moving away from optimality under the baseline model. This trade-off is weighted by her
degree of uncertainty aversion, which is distinct from the degree of risk aversion.
When we apply this robustness principle to the case in which agents fear misspecification
in the link between the observations and the unobservable trend of the forward premium, the
costliest misspecification occurs when the agents' model incorrectly assigns a signal content
to the observations that is higher than the signal it actually has. Therefore, robustness entails
distorting upwards the noisiness of the observations, which in turn implies underreacting to
news.
We would like to emphasize that simply invoking robustness against any misspecification
does not generate the anomalies. It is necessary to induce underreaction of forecasts, and for
this the structure of the uncertainty set is crucial and is determined by the events agents want
to guard against. This forecast underreaction requires a more refined description of model
uncertainty than that contained in the so called unstructured uncertainty typically considered
in the literature. This paper contributes to the robust macroeconomics literature by refining
the uncertainty set so as to capture specific ways in which the model can be misspecified,
and by deriving Girsanov-like change of measure lemmas that provide a translation from
probability measure sets into parameter distortions. This translation, in turn, allows for
closed-form solutions to the agents' problem and the equilibrium exchange rate, and allows
us to establish a link with the empirical literature on belief distortions.
To investigate whether the anomalies arise under other types of uncertainty, we examine
two other types of model uncertainty. First, when there is structured uncertainty about the
unobservable trend process, we find that forecasts overreact to news. As a result, we cannot
account for the anomalies; in fact, the Fama regression coefficient is greater than one, and
there is no delayed overshooting. Second, in the so-called unstructured uncertainty case,
under which agents fear misspecification of the entire interest rate differential process, but
cannot pinpoint either its nature or location, the result is surprising. Robust forecasts are
equal to the Bayesian forecasts under the baseline model. Thus, the forecast-underreaction
mechanism underlying the forward premium puzzle and momentum is not operative. The
reason for this result is that the robust problem reduces to a standard Bayesian problem
via the Representation Lemma.
The structure of the paper is as follows. In Section 2 we present an intuitive overview.
In Section 3 we present the model. In Section 4 we derive the equilibrium exchange rate. In
Section 5 we characterize the conditions under which observation uncertainty generates the
anomalies. In Section 6 we consider other uncertainty sets to establish that the structure of
the uncertainty set is key to account for the anomalies. In Sections 7 and 8 we present a
literature review and the conclusions, respectively. Finally, the appendix contains the proofs.
2. Overview
Here, we present a non-technical overview, which can be skipped without loss of continuity.
We consider a representative agent asset pricing model, where in equilibrium the exchange
rate is a function of the interest rate differential, the forecasts of future differentials, as well as
an uncertainty premium. To make things concrete, let the differential between the domestic
and foreign interest rates be the sum of an unobservable trend ($x_t$) and noise ($v_t$):
$$i_t - i_t^f = x_t + v_t, \qquad x_t = a x_{t-1} + w_{t-1}, \qquad a \in (0,1). \tag{2.1}$$
The agent has a baseline model under which the observation shock ($v_t$) and the trend shock
($w_{t-1}$) are white noise processes with mean 0 and variances $\sigma_v^2$ and $\sigma_w^2$, respectively. The
agent, however, fears that model (2.1) is misspecified, and herein lies the key to accounting
for the anomalies.
In forecasting exchange rates the agent needs to come up with an estimate of the unobservable trend ($\hat{x}_t$). As is standard, her estimate is a weighted average of the current interest
rate differential observation and her prior:
$$\hat{x}_t = k\,[i_t - i_t^f] + [1 - k]\,a\,\hat{x}_{t-1}. \tag{2.2}$$
If she fully trusted her model of the interest rate dierential (2.1), she would set the gain
k equal to the Bayesian gain as in Muth (1961). When she fears model misspecification,
however, choosing such a gain is inappropriate. To see why, suppose that the agent fears
that the observation equation (i.e., the first equation in (2.1)) is misspecified. Under this
type of observation uncertainty the agent can make two types of errors: (i) underestimate
the true signal contained in the observations by setting $k$ too low; or (ii) overestimate the
signal content by setting k too high. The key point is that when a robust agent fears
misspecification about the link between the observations and the unobservable trend, she
would rather commit error (i) and err on the side of dismissing part of the signal information
contained in the observations. By acting as if the interest rate differential is noisier
than it actually is, the true mean squared error (MSE) will grow at a lower rate as the
degree of unknown misspecification increases in the worst possible direction. In other words,
the distortion renders the estimator more robust.
The downside of robustness is that if there were no misspecification, the MSE would be
greater than under the optimal Bayesian gain, and the estimation of the trend would be
inefficiently slow. As a result, she might forgo profitable carry trades, but she will dampen
the extent of potential losses. In contrast, under the alternative strategy of using the Bayes
filter, which is optimal under the baseline model, and giving more weight to observations, she
might incorrectly undertake trades that will result in substantial losses. In sum, being robust
against observation uncertainty entails underreacting to interest rate differential news.
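The trade-off described above can be illustrated numerically. The following is a minimal sketch, not the paper's robust solution: it assumes the trend-plus-noise model (2.1), computes the steady-state Bayesian (Kalman) gain by iterating the standard Riccati recursion, and then evaluates the stationary MSE, under the true model, of the fixed-gain estimator (2.2). The parameter values, the factor of 4 used to inflate the perceived noise variance, and the function names `steady_state_gain` and `true_mse` are illustrative assumptions.

```python
def steady_state_gain(a, var_w, var_v, n_iter=1000):
    """Steady-state Kalman gain for y_t = x_t + v_t, x_t = a*x_{t-1} + w_{t-1}."""
    p = var_w  # variance of x_t given observations up to t-1
    for _ in range(n_iter):
        # Riccati recursion: posterior variance p*var_v/(p+var_v), propagated one step
        p = a * a * p * var_v / (p + var_v) + var_w
    return p / (p + var_v)


def true_mse(k, a, var_w, var_v):
    """Stationary MSE, under the true model, of x_hat_t = k*y_t + (1-k)*a*x_hat_{t-1}."""
    # The error obeys e_t = (1-k)*a*e_{t-1} + (1-k)*w_{t-1} - k*v_t
    return ((1 - k) ** 2 * var_w + k ** 2 * var_v) / (1 - (1 - k) ** 2 * a ** 2)


# Illustrative baseline parameters (assumptions, not estimates from the paper)
a, var_w, var_v = 0.9, 1.0, 1.0

k_bayes = steady_state_gain(a, var_w, var_v)
# Hypothetical robust agent: acts as if the observation noise variance were 4x larger
k_robust = steady_state_gain(a, var_w, 4.0 * var_v)

mse_bayes = true_mse(k_bayes, a, var_w, var_v)
mse_robust = true_mse(k_robust, a, var_w, var_v)
print(k_bayes, k_robust, mse_bayes, mse_robust)
```

In this sketch the inflated noise variance lowers the gain (underreaction to news), and the resulting MSE under the undistorted model exceeds the Bayesian one, which is the cost of robustness when the baseline model is in fact correct.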
To link this underreaction of interest rate differential forecasts to the exchange rate
anomalies, consider a representative agent asset pricing model. As is standard, the equilibrium exchange rate is a function of the interest rate differential, its forecasts, as well as an
uncertainty premium. Since the baseline model corresponds to the data generating model,
the robust forecasts underreact on average to news. As a result, future exchange rates will
have to adjust to reflect the future revision of forecasts. If the trend of the interest rate
differential is highly persistent, there will be a negative unconditional correlation between
exchange rate changes and the interest rate differential (i.e., a negative Fama coefficient),
and the equilibrium exchange rate will exhibit a hump-shaped impulse response function
(i.e., delayed overshooting).
But how far should underreaction go? To address this question we need to formalize
the agents' problem. In particular, we need a measure of distance between models, as
well as an objective function that trades off optimality versus robustness against model
misspecification. We accomplish this by representing model uncertainty in terms of sets
of underlying probability measures, and by using the relative entropy between probability
measures to index distance between models. In this setup, we cannot represent preferences
with a standard expected utility function because there is no unique probability measure
under which to compute the expectation. Instead, as in Gilboa and Schmeidler (1989) and
Hansen et al. (2002), the utility function selects in a pessimistic way a robust probability
measure from an infinitely large set of measures.
The agents' utility function trades off robustness against optimality by penalizing deviations from the baseline model. This penalization is a proportion $\theta$ of the relative entropy
between the baseline underlying probability measure and the robust probability measure
chosen by the agent. By varying $\theta$ we can generate the whole gamut of agent types, from a
Bayesian agent who is not uncertainty-averse ($1/\theta = 0$) and has a unique prior, to a paranoid
agent who is infinitely uncertainty-averse ($1/\theta = \infty$) and focuses only on the worst case.
In the rest of the paper, for each type of misspecification, the results are derived in three
steps. First, we define a set of probability measures that captures the type of misspecification (uncertainty in the observation equation, in the trend equation as well as unstructured
uncertainty). Second, for a given set we establish a one-to-one relationship between probability measures and parameter distortions via change-of-measure lemmas, which are analogous
to the Girsanov theorem. These lemmas allow us to convert an optimization problem over
unknown probability measures into an optimization over a parameter, which is much simpler
and allows for closed-form solutions for the forecasts and portfolio strategies. Third, we derive the equilibrium exchange rate and establish the conditions under which the equilibrium
path is consistent with the forward premium puzzle and momentum.
The reader might note a discrepancy, as we have described the intuition via misspecifications to the probability distribution of the disturbances, while the preferences we consider
are defined in terms of underlying probability measures. This former representation is attractive because, arguably, distortions to means and variances of probability distributions
are observable to an econometrician, they are intuitive, and it is the way in which the empirical evidence is presented in the literature. In contrast, underlying probability measures are
abstract unobservable concepts. However, they are conceptually more convenient because
they allow us to treat different types of uncertainty in a unified manner, and define the
agents' objective in a concise manner.
As we have outlined in this section, this paper also contributes to the robust macroeconomics literature by refining the uncertainty set in several dimensions and establishing a
one-to-one link between probability measures and parameter distortions for different types
of misspecification. In defining the uncertainty sets our aim is to capture different types of
misspecification in ways simple enough that will allow for the derivation of tractable solutions, and the characterization of the anomalies. The uncertainty sets, change-of-measure
lemmas, and derivation of equilibria can be applied to asset pricing problems other than the
exchange rate problem considered here.
3. The Model
We present a tractable exchange rate model that allows us to characterize conditions under
which a desire for robustness against different types of model misspecification generates the
forward premium puzzle and delayed overshooting (i.e., momentum).
There are overlapping generations of two-period lived investors that can take any long
and short positions in two one-period bonds: a dollar bond that will pay $\exp(i_t)$ dollars
next period and a euro bond that will pay $\exp(i_t^f)$ euros. Consider a young representative
investor that forms a zero-cost portfolio by taking a long position $b_t$ in dollar bonds and a
short position $b_t / E_t$ in euro bonds, where $E_t$ is the dollar-euro spot exchange rate (i.e., the
number of dollars per euro). Next period she will receive $\exp(i_t) b_t$ dollars and will have to
repay $\frac{E_{t+1}}{E_t} \exp(i_t^f) b_t$ dollars. If we let $e_t := \log(E_t)$, we can express the excess return as
$$W_{t+1} = b_t \left[ (i_t - i_t^f) - (e_{t+1} - e_t) \right] \tag{3.1}$$
plus a second order term.1 Equation (3.1) says that the excess rate of return of the carry
trade equals the dollar-euro interest rate differential minus the dollar's depreciation rate.
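As a quick numerical check of the log-linear excess return, the sketch below compares the exact dollar payoff of the zero-cost position with the first-order expression (3.1). The function names and the interest rate and depreciation values are illustrative assumptions chosen to be of a plausible quarterly magnitude.

```python
import math


def exact_payoff(b, i, i_f, e_now, e_next):
    """Exact dollar payoff: receive exp(i)*b, repay (E_{t+1}/E_t)*exp(i_f)*b."""
    return b * (math.exp(i) - math.exp(e_next - e_now) * math.exp(i_f))


def loglinear_payoff(b, i, i_f, e_now, e_next):
    """First-order approximation, as in (3.1)."""
    return b * ((i - i_f) - (e_next - e_now))


# Illustrative values: 1% and 0.5% interest rates, 0.2% dollar depreciation
b, i, i_f, e_now, e_next = 1.0, 0.01, 0.005, 0.0, 0.002

exact = exact_payoff(b, i, i_f, e_now, e_next)
approx = loglinear_payoff(b, i, i_f, e_now, e_next)
gap = exact - approx  # the second-order remainder
print(exact, approx, gap)
```

For inputs of this size the gap between the two expressions is two orders of magnitude smaller than the payoff itself, consistent with treating the remainder as a second order term.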
In this model, the interest rate differential is a primitive and is the source of uncertainty.
It has two components: a trend ($x_t$) and observation noise ($v_t$). The investor does not observe
them separately, but only observes their sum. The trend follows an autoregressive process
of order one, while the noise has zero mean.
$$i_t - i_t^f \equiv y_t = x_t + v_t, \qquad x_t = a x_{t-1} + w_{t-1}, \qquad x_0 = 0, \quad a \in (0,1) \tag{3.2}$$
In other words, the interest rate differential is hit by both observation shocks ($v$), which are
transitory and die out after one period, and trend shocks ($w$), which die out only gradually.
The investor, however, does not observe the shocks individually. Throughout the paper we
will refer to the interest rate differential as the forward premium.2
Model Uncertainty
We allow for both risk and Knightian uncertainty. In the literature, risk refers to the
unique probability distribution of the relevant random variables {wj } and {vj } in our case,
which agents either know or can learn. Knightian uncertainty refers to a potential misspecification of the model.
As in Hansen et al. (2002), we introduce both dimensions of uncertainty by assuming that
the representative agent is endowed with a baseline model of the forward premium. However,
she fears model misspecification and takes the baseline model simply as an approximation
to the data-generating model. In order to measure the distance between models in a simple
way we represent model uncertainty in terms of sets of underlying probability measures. To
this end we will call "model" a probability measure $\mu$ defined on a measurable space $(\Omega, \mathcal{I})$,
where $\Omega$ is a compact metric space and $\mathcal{I}$ is the $\sigma$-algebra on the space $\Omega$. The agent is
endowed with a baseline model, which we will denote by $\mu^0$, but she takes $\mu^0$ simply as an
approximation and allows for the possibility that the true model lies in an uncertainty set
$\Gamma_j$.
Under the baseline model $\mu^0$ the forward premium is given by (3.2) and the shock processes
are i.i.d. normal random variables.3

Baseline model $\mu^0$:
$$w_t \sim N(0, 1), \qquad v_t \sim N(0, \sigma_v^2) \tag{3.3}$$

1 This second order term is close to zero when $i_t$ and $e_{t+1} - e_t + i_t^f$ are small, which is the case for
quarterly data across G7 currency pairs considered in the literature. The appendix contains the second
order approximation.
2 Notice that the covered interest parity condition implies that the interest rate differential equals the
forward premium: $f_t - e_t = i_t - i_t^f$, where $f_t$ is the log one-period-ahead forward exchange rate.

In order to obtain the standard Bayes estimator in the special case where agents do not care
about robustness, we assume that baseline model $\mu^0$ corresponds to the true data-generating
model.
The Robust Control literature considers two classes of model uncertainty: structured
and unstructured.4 The former specifies the location and nature of the misspecification. In
contrast, under unstructured uncertainty one does not have this refined information. As
we shall see, the distinction between these types of uncertainty is crucial in explaining the
anomalies.
Throughout the main body of the paper we consider a structured uncertainty set that
captures misspecification in the observation equation, while keeping the rest of the forward
premium process (3.2) unchanged.
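$$\Gamma_v = \left\{ \mu \in P(\Omega) : \frac{d\mu}{d\mu^0} = \sqrt{\frac{\sigma_v^2}{\tilde{\sigma}_v^2}} \exp\left( -\frac{1}{2} \left[ \frac{1}{\tilde{\sigma}_v^2} - \frac{1}{\sigma_v^2} \right] (y_t - x_t)^2 \right), \ \tilde{\sigma}_v^2 \in [\epsilon, \infty), \ \epsilon > 0 \right\} \tag{3.4}$$
This is an infinitely large set of probability measures that is generated by letting the parameter
$\tilde{\sigma}_v^2$, in the Radon-Nikodym derivative, take on values on the extended positive real
line.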
Utility Function
The agent is willing to sacrifice profitability under the baseline model in exchange for some
degree of robustness against unknown misspecifications, i.e., dampening of the deleterious
effects of misspecifications on profitability. To formalize this problem we need an objective
function that trades off optimality against robustness, as well as a measure of distance
between models.
Since there is no unique probability measure under which expectations can be computed,
we cannot use a standard expected utility function. Instead, as in Gilboa and Schmeidler
(1989) and Hansen et al. (2002), we consider a utility function that selects in a pessimistic
way a robust probability measure from the uncertainty set: $U = \inf_{\mu \in \Gamma_j} u$. To get closed-form
solutions we specialize the functional $u$ as follows.
$$U = \inf_{\mu \in \Gamma_j} E^{\mu} \left[ -\exp(-\gamma W_{t+1}) + \theta\, \Re(\mu \| \mu^0) \right], \qquad \gamma > 0, \ \theta > 0, \tag{3.5}$$
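where $\Gamma_j$ is a closed and convex set of probability measures and $\Re(\mu \| \mu^0)$ is the relative
entropy of probability measure $\mu$ with respect to the baseline probability measure $\mu^0$:
$$\Re(\mu \| \mu^0) = \begin{cases} \int \log\left( \frac{d\mu}{d\mu^0} \right) d\mu & \text{if } \mu \text{ is absolutely continuous w.r.t. } \mu^0 \\ \infty & \text{otherwise.} \end{cases} \tag{3.6}$$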
The relative entropy can be thought of as the distance between the baseline model $\mu^0$ and
the alternative model $\mu$.5

3 For notational simplicity we normalize the variance $\sigma_w^2$ of $w_t$ to one. The results are not dependent on
the value of $\sigma_w^2$.
4 Alternatively, we can call them specific and global uncertainties, respectively.

As we explain below, utility function (3.5) distinguishes risk aversion, parametrized by
$\gamma$, from uncertainty aversion (or desire for robustness), which is parametrized by $1/\theta$.
Equilibrium Concept
We consider an equilibrium concept analogous to a rational expectations equilibrium,
in which the price reveals all the information available to agents. As is standard in the
asset pricing literature we endow the representative agent with a linear conjecture of the log
exchange rate function.
$$e_t^{conj} = \delta_t - \lambda_1 \hat{x}_t^{\mu_t} - \lambda_2 (i_t - i_t^f), \tag{3.7}$$
where $\{\delta_t, \lambda_1, \lambda_2\}$ are undetermined coefficients and $\hat{x}_t^{\mu_t} \equiv E^{\mu_t}(x_t | I_t)$ is the estimate, under
probability measure $\mu_t$, of the forward premium's unobservable trend, conditional on the
information available at time $t$: $I_t = \{y_1, \ldots, y_t\}$.
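At time $t$, the young $t$-agent inherits the estimate of the mean of the state $\hat{x}_{t-1}^{\mu_{t-1}}$ and of its
variance $\sigma_{t-1}^{2,\mu_{t-1}}$. Given this prior and the observation $(y_t, e_t)$ the agent solves the following
problem
$$\sup_{b_t \in \mathbb{R}} \ \inf_{\mu_t \in \Gamma_j} E^{\mu_t} \left[ -\exp\left( -\gamma W_{t+1}(b_t, e_t, e_{t+1}^{conj}) \right) + \theta\, \Re(\mu_t \| \mu^0) \,\Big|\, I_t \right], \qquad I_t = \{y_1, \ldots, y_t\} \tag{3.8}$$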
A robust linear equilibrium is an exchange rate function of the linear form conjectured in (3.7)
such that, given the conjecture about next period's exchange rate (3.7), the demand for
domestic bonds equals the supply, which is exogenous: $b_t(e_t, I_t) = b_t^s$.
3.1. Discussion of the Setup

To interpret utility function (3.5) we can think of nature as choosing the true probability
measure in a malevolent way, so as to minimize the agent's utility.6 Note, however, that
by increasing the distance between model $\mu$ and the baseline model $\mu^0$, proxied by $\Re(\mu \| \mu^0)$,
nature incurs a cost $\theta\, \Re(\mu \| \mu^0)$. The greater the uncertainty aversion ($1/\theta$), the lower the
cost for deviating from the baseline model. In one extreme, if $1/\theta$ were zero, (3.5) would
reduce to the familiar expected utility function $E^{\mu^0}[-\exp(-\gamma W_{t+1})]$. In this case, the agent
would not allow for model misspecification because choosing any model different from $\mu^0$
would make (3.5) infinite. In the other extreme, if $1/\theta$ were infinity, the agent would choose
the worst-case model among all possible models. Clearly, this worst-case objective is too
paranoid and leads to overly conservative strategies.
Both risk aversion and uncertainty aversion are necessary to account for the anomalies.
As we shall see, if $\gamma = 0$ or $\theta = \infty$, robustness against misspecification generates neither
under- nor over-reaction to news in a linear equilibrium.
Gilboa and Schmeidler (1989) show that in the presence of Knightian uncertainty, preferences can be represented by a utility function of the form $U = \inf_{\mu \in \Gamma} u$, where $\Gamma$ is a closed
and convex set of underlying probability measures and $u$ is an affine function. The objective
function in (3.5) belongs to this class of functions because: (i) the sets $\Gamma_j$ of probability
measures that we consider are closed and convex under the weak topology, and (ii) the
term in brackets is a standard expected utility function, as $E^{\mu}[-\exp(-\gamma W_{t+1})] + \theta\, \Re(\mu \| \mu^0) =
E^{\mu}[-\exp(-\gamma W_{t+1}) + \theta \log(d\mu / d\mu^0)]$. We have considered two-period lived agents. If agents
had infinite horizons, we could not have a constant $\theta$ as this would generate time inconsistency. Epstein and Schneider (2003) extend to a multiperiod setup the framework of Gilboa
and Schmeidler (1989), and discuss the issues involved in considering long-lived agents.

5 Subsection 3.1 contains an interpretation of (3.5).
6 Obviously, nature does not care about our forecasts. This device is simply a useful way to induce
robustness.

The inclusion of the relative entropy in the objective is frequently used in the literature
on the theory of large deviations and information theory. We can think of relative entropy
as a distance between two probability measures on $P(\Omega)$, the set of all probability measures
on $\Omega$. It is always non-negative, and it is zero if and only if $\mu = \mu^0$.7 Moreover, if we consider
a new measure $\bar{\mu} = \alpha \mu + (1 - \alpha) \mu^0$, then $\Re(\bar{\mu} \| \mu^0) \le \Re(\mu \| \mu^0)$.8
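The properties of relative entropy listed above can be checked on a toy example. The sketch below uses discrete distributions on four points rather than the measures considered in the paper; the distributions `base` and `alt`, the mixing weight `alpha`, and the helper `relative_entropy` are illustrative assumptions.

```python
import math


def relative_entropy(p, q):
    """Relative entropy of discrete distribution p with respect to q.

    Assumes q is positive wherever p is (absolute continuity).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


base = [0.25, 0.25, 0.25, 0.25]   # plays the role of the baseline measure
alt = [0.70, 0.10, 0.10, 0.10]    # an alternative measure
alpha = 0.4                       # mixing weight in [0, 1]
mix = [alpha * p + (1 - alpha) * q for p, q in zip(alt, base)]

print(relative_entropy(base, base))  # zero at the baseline
print(relative_entropy(alt, base))   # positive away from the baseline
print(relative_entropy(mix, base))   # no larger than for the unmixed alternative
```

Mixing an alternative with the baseline moves it closer to the baseline in the relative entropy sense, which is the inequality stated in the text for weights between zero and one.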
Simply invoking robustness against any misspecification will not generate the anomalies. It is necessary to induce underreaction of forecasts, and for this the structure of the
uncertainty set is crucial. A key contribution of this paper is to refine the uncertainty sets
and show how structured uncertainty about the link between observations and the unobservable trend can account for the exchange rate anomalies. The set $\Gamma_v$ in (3.4) captures this
type of uncertainty in a simple way that allows for the derivation of tractable solutions. This
set is generated via the Radon-Nikodym derivative in (3.4) by letting the parameter $\tilde{\sigma}_v^2$ range
over the positive real numbers. Lemma 4.1, below, shows that there is a one-to-one relation
between the elements of the set $\Gamma_v$ and perturbations to the variance of the disturbance
$v_t$ in the observation equation $y_t$. The variance distortion is $\tilde{\sigma}_v^2 - \sigma_v^2$. In Section 5 we show
how observation uncertainty generates the anomalies.
There are other types of potential misspecifications one could consider. In Section 6 we
consider other uncertainty sets. First, we consider structured uncertainty sets that allow only
for misspecifications in the trend equation, while keeping the rest of the forward premium
model unchanged. Second, we consider the case of unstructured uncertainty by following the
robust control literature and considering the set of all probability measures. In neither case
is there an underreaction to news.
The reader might ask why have we described uncertainty via sets of underlying probability
measures, rather than via misspecifications of probability distributions. The latter might
seem more attractive because, arguably, distortions to means and variances of probability
distributions are observable to econometricians, they are intuitive, and it is the way in which
the empirical evidence is presented in the literature. In contrast, the former are abstract
unobservable objects. However, they are conceptually more convenient because they allow
us to treat different types of uncertainty in a unified way, and proxy the distance between
models in a concise way via the relative entropy, as in (3.5).
We have defined the utility function in terms of the log-linear excess return $W_{t+1} =
b_t \left[ (i_t - i_t^f) - (e_{t+1} - e_t) \right]$. Notice that for a given domestic interest rate $i_t$, $u_1 = E[-\exp(-\gamma W_{t+1})]$
is a monotonic transformation of $u_2 = E\left[ -\exp\left( -\gamma b_t \left[ \exp(i_t) - \frac{E_{t+1}}{E_t} \exp(i_t^f) \right] \right) \right]$.
Finally, we have allowed for misspecification only in the forward premium process. One
could think of more complicated setups where uncertainty exists in different parts of the
model, or where the structure of the economy is more complicated. Although such extensions
are worth pursuing, we believe that considering a minimal model allows us to better target
misforecasts as a source of the anomalies.

7 Define $\mu^0 = \mu$ if $\mu^0(A) = \mu(A)$ for all $A \in \mathcal{I}$.
8 The relative entropy is not a metric on the space $P(\Omega)$ because it does not satisfy the triangle inequality.

4. Equilibrium Exchange Rate

We derive the equilibrium in two steps. First, we solve the agents' problem (3.8) for a given
exchange rate et . Then, we derive the exchange rate function that equilibrates the market
and that is consistent with the conjecture (3.7).
4.1. The Joint Forecasting-portfolio Problem
In order to solve (3.8) the investor simultaneously solves three problems: (i) selects the
robust probability measure $\mu_t$; (ii) under this robust probability measure, computes her
trend estimates; and (iii) selects the portfolio $b_t$ that generates the highest expected utility
under $\mu_t$. As we shall see, the larger the set of models over which the strategy is robust
against misspecification, the less the optimality under the baseline model $\mu^0$.
Problem (3.8) cannot be solved by applying the well-known separation principle, under
which the forecasts are obtained using Bayes' law independently of the portfolio optimization. Instead, we need to consider a joint forecasting-portfolio problem. We will solve
this problem by considering a zero-sum game between the investor and nature. The investor
chooses $b_t$ taking into account the strategy of nature. Conditioning on the choice of $b_t$, nature
chooses the probability measure $\mu_t$ in a malevolent way.
This problem is rather complicated as the investor must optimize over a set of unknown
probability measures to make her forecasts. We solve this problem by converting it to
a parametric problem that determines the value of a variance distortion rather than the
robust probability measure. We can do so because, for a given random variable, a one-to-one mapping can be established between the probability distribution function and the
underlying probability measure.9 This mapping is given by the next change of measure
lemma, which is analogous to the Girsanov theorem.10
Lemma 4.1 (Change of Measure I: Volatility). If under baseline probability measure
0
0 the random variables in the forward premium process (3.2) are distributed as xt |It1 v
0
0
N (a
xt1 , 1), yt |xt N (xt , 2v ) and xt1 N xt1 , 2t1 , then for any probability measure
v :
The distribution of the observation satisfies yt |xt v N xt ,

2v , while the distributions
of xt |It1 and xt1 are preserved.
9
A random variable $v$ is a measurable function that maps an abstract underlying measurable space $(\Omega, \mathcal{I})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra. A probability measure $\mu$ on $(\Omega, \mathcal{I})$ determines the probability of $v$ belonging to a set $A \in \mathcal{B}(\mathbb{R})$, which is defined by $\Pr(v \in A) \equiv \mu(\{\omega : v(\omega) \in A\})$, with $\{\omega : v(\omega) \in A\} \in \sigma(v)$. There exists a one-to-one correspondence between the probability distribution and the underlying measure on the space we consider for the following reasons. First, the distribution function of random variable $v$ is a non-decreasing function $F : \mathbb{R} \to [0, 1]$ such that $\Pr(v \in A) = \int_A dF(x)$ for every set $A \in \mathcal{B}(\mathbb{R})$. By definition the induced distribution function $F$ is determined by the underlying probability measure almost everywhere. Second, if we know the distribution function of $v$, we can retrieve the underlying probability measure with the formula $\mu(B) = \int_{v(B)} dF(x)$, $B \in \sigma(v)$. This measure is unique except in sets with zero measure.
10
The Girsanov theorem applies to a change of drift. Here we are interested in a change of variance.
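The relative entropy of probability measure $\mu$ with respect to the baseline $\mu^0$ equals
$$R(\mu \| \mu^0) = \frac{1}{2}\left[\frac{\tilde{\sigma}_v^2}{\sigma_v^2} - 1 - \log\frac{\tilde{\sigma}_v^2}{\sigma_v^2}\right] \quad \text{for any } \mu \in \mathcal{M}_v \tag{4.1}$$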
This Lemma allows us to go back and forth between a distortion to the variance parameter of a probability distribution and sets of underlying probability measures. In terms of the linear hidden-state formulation (3.2), Lemma 4.1 says that if under baseline model $\mu^0$ the random variables $v$ and $w$ are normally distributed as in (3.3), then under an alternative model $\mu \in \mathcal{M}_v$ only the variance of the observation equation is altered from $\sigma_v^2$ to $\tilde{\sigma}_v^2$, while the rest of the process remains unchanged:
$$y_t = x_t + v_t, \quad v_t \sim N(0, \tilde{\sigma}_v^2),$$
$$x_t = a x_{t-1} + w_{t-1}, \quad w_{t-1} \sim N(0, 1), \quad x_{t-1} | I_{t-1} \sim N(\hat{x}_{t-1}, \sigma^2_{t-1}) \tag{4.2}$$
Equation (4.1) will prove quite useful because it defines the distance between models only in terms of the variance distortion. It says that the relative entropy between measures $\mu$ and $\mu^0$ is zero when there is no variance distortion, and that it increases at an increasing rate as the distortion grows in either direction.
The robust probability measure $\mu_t$ defines a robust set of models $\mu$ with the property $R(\mu \| \mu^0) < R(\mu_t \| \mu^0)$. This set is convex because of the convexity of the relative entropy, which guarantees a unique robust measure. Given the structure of (3.4), we can also specify the set as the set of all distorted variances in the interval $[\sigma_v^2, \tilde{\sigma}_v^2]$.
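The properties of the relative entropy in (4.1) can be checked numerically. The following is a minimal Python sketch, assuming only the Gaussian variance-distortion form of (4.1); the function name `relative_entropy` and the sample variances are illustrative choices, not part of the paper.

```python
import math


def relative_entropy(var_distorted: float, var_baseline: float) -> float:
    """Relative entropy of N(m, var_distorted) with respect to N(m, var_baseline),
    i.e. the right-hand side of (4.1) for a pure variance distortion."""
    ratio = var_distorted / var_baseline
    return 0.5 * (ratio - 1.0 - math.log(ratio))


if __name__ == "__main__":
    baseline = 1.0  # hypothetical baseline observation variance
    for distorted in (0.5, 1.0, 1.5, 2.0, 2.5):
        value = relative_entropy(distorted, baseline)
        print(f"distorted variance {distorted:3.1f}: relative entropy {value:.4f}")
```

The entropy is zero only when the distorted variance equals the baseline, and successive equal steps away from the baseline raise it by increasing amounts, which is the convexity used in the text.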
Solution to the Investor's Problem
Let $\hat{x}_t^{\mu_t}$ be the estimate of the trend $x_t$ and $\sigma_t^{2,\mu_t}$ be the variance of $x_t$ under probability measure $\mu_t$. At time $t$, the agent inherits $\hat{x}_{t-1}^{\mu_{t-1}}$ and $\sigma_{t-1}^{2,\mu_{t-1}}$ and updates her estimate of $x_t$ and $x_{t+1}$ using the information $I_t$ under the robust probability measure $\mu_t$.
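Notice that for a fixed probability measure $\mu_t$, the investor constructs her estimators of $x_t$ and $x_{t+1}$ using Bayes' law. These estimators are given by the celebrated Kalman filter
$$\hat{x}_{t+1}^{\mu_t} \equiv E^{\mu_t}(x_{t+1} | I_t), \quad \text{with } \hat{x}_t^{\mu_t} = (1 - k_t^{\mu_t})\, a \hat{x}_{t-1}^{\mu_{t-1}} + k_t^{\mu_t} (i_t - i_t^f), \tag{4.3}$$
where the gain of the filter $k_t^{\mu_t}$ and the variance of $x_t$ are
$$k_t^{\mu_t} = \frac{a^2 \sigma_{t-1}^{2,\mu_{t-1}} + 1}{a^2 \sigma_{t-1}^{2,\mu_{t-1}} + 1 + \tilde{\sigma}^2_{v,t}}, \qquad \sigma_t^{2,\mu_t} = var^{\mu_t}(x_t | I_t) = \frac{\tilde{\sigma}^2_{v,t} \left(a^2 \sigma_{t-1}^{2,\mu_{t-1}} + 1\right)}{a^2 \sigma_{t-1}^{2,\mu_{t-1}} + 1 + \tilde{\sigma}^2_{v,t}}.$$
The distorted variance $\tilde{\sigma}^2_{v,t}$ is uniquely determined by $\mu_t$ via Lemma 4.1.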
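One step of the filter recursion can be sketched in Python as follows. This is an illustration under the assumptions stated in the text (unit variance of trend shocks, scalar state); the function name `kalman_step` and the numerical inputs are hypothetical.

```python
def kalman_step(x_prev, var_prev, y, a, var_v):
    """One update of the scalar filter (4.3).

    x_prev, var_prev : previous estimate and variance of the trend
    y                : observed interest rate differential i_t - i_t^f
    a                : persistence of the trend
    var_v            : observation variance used by the agent (baseline or distorted)
    """
    prior_var = a * a * var_prev + 1.0          # trend shock variance is one
    gain = prior_var / (prior_var + var_v)      # weight on the new observation
    x_hat = (1.0 - gain) * a * x_prev + gain * y
    post_var = var_v * prior_var / (prior_var + var_v)
    return x_hat, post_var, gain


if __name__ == "__main__":
    # Same news, two observation variances: baseline (1.0) and an upward distortion (2.0).
    for var_v in (1.0, 2.0):
        x_hat, post_var, gain = kalman_step(0.0, 0.5, 1.0, 0.9, var_v)
        print(f"var_v={var_v}: gain={gain:.4f}, estimate={x_hat:.4f}, variance={post_var:.4f}")
```

With a larger observation variance the gain falls, so the same observation moves the estimate by less.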
To compute the distribution of excess returns under probability measure $\mu_t$, the $t$ agent uses her conjecture (3.7) to forecast next period's exchange rate: $e^{conj}_{t+1} = \Gamma_{t+1} + \Gamma_1 \hat{x}_{t+1}^{\mu_{t+1}} + \Gamma_2 (i_{t+1} - i^f_{t+1})$. To compute $E_t^{\mu_t}(\hat{x}_{t+1}^{\mu_{t+1}}; I_t)$ note that the $t$ agent knows the problem that will be solved next period by $t+1$ agents. Thus, the $t$ agent knows how $t+1$ agents will derive the robust probability measure $\mu_{t+1}$, and that they will make forecasts using Bayes' law under $\mu_{t+1}$. It follows that the $t$ agent knows that $t+1$ agents will use the updating formula $\hat{x}_{t+1}^{\mu_{t+1}} = (1 - k_{t+1})\, a \hat{x}_t^{\mu_t} + k_{t+1} (i_{t+1} - i^f_{t+1})$. Therefore, the $t$ agent sets $E_t^{\mu_t}(\hat{x}_{t+1}^{\mu_{t+1}}; I_t) = \hat{x}_{t+1}^{\mu_t} = a \hat{x}_t^{\mu_t}$.
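Replacing this formula in the conjecture, it follows that under probability measure $\mu_t$, the log excess return $J_{t+1} \equiv (i_t - i_t^f) - (e_{t+1} - e_t)$ is normally distributed with mean and variance
$$E^{\mu_t}(J_{t+1}) = (i_t - i_t^f) - \left[E^{\mu_t}(e_{t+1}) - e_t\right] = -\Gamma_{t+1} - a(\Gamma_1 + \Gamma_2)\hat{x}_t^{\mu_t} + e_t + (i_t - i_t^f) \tag{4.4}$$
$$V^{\mu_t}(J_{t+1}) = (k_{t+1}\Gamma_1 + \Gamma_2)^2 \left(a^2 k_t^{\mu_t} \tilde{\sigma}^2_{v,t} + 1 + \tilde{\sigma}^2_{v,t}\right) \tag{4.5}$$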
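Since for any normally distributed random variable $z$, $E(\exp[z]) = \exp\left[E(z) + \frac{1}{2} var(z)\right]$, problem (3.8) is equivalent to
$$\max_{b_t \in \mathbb{R}} \; \inf_{\tilde{\sigma}^2_{v,t}} \left\{ -\exp\left[-\gamma b_t E^{\mu_t}(J_{t+1}) + \frac{\gamma^2}{2}(b_t)^2 Var^{\mu_t}(J_{t+1})\right] + \frac{\theta}{2}\left[\frac{\tilde{\sigma}^2_{v,t}}{\sigma_v^2} - \log\frac{\tilde{\sigma}^2_{v,t}}{\sigma_v^2} - 1\right] \right\} \tag{4.6}$$
The attractive property of (4.6) is that it has converted an optimization over unknown probability measures to a much simpler parametric optimization over the variance distortion $\tilde{\sigma}^2_{v,t}$. The solution to this problem is given by the following Lemma.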

Lemma 4.2 (Solution to the Portfolio-Forecasting Problem). In the presence of observation uncertainty, i.e., $\mu \in \mathcal{M}_v$, Problem (4.6) has a solution only if the degree of uncertainty aversion $1/\theta$ is lower than a threshold $1/\underline{\theta}$ given by (8.3). In this case:
1. The forecast of the forward premium is given by the Kalman filter (4.3) under the robust probability measure $\mu_t$, and the demand for the domestic bond is
$$b_t(e_t, \tilde{\sigma}^2_{v,t}) = \frac{-\left[\Gamma_{t+1} + a(\Gamma_1 + \Gamma_2)\hat{x}_t^{\mu_t} - e_t - (i_t - i_t^f)\right]}{\gamma\, Var^{\mu_t}(J_{t+1})} = \frac{(i_t - i_t^f) - \left(E^{\mu_t} e_{t+1} - e_t\right)}{\gamma\, Var^{\mu_t}(J_{t+1})} \tag{4.7}$$
2. Under the robust probability measure $\mu_t$ the robust forecast of the forward premium underreacts to news: $k(\tilde{\sigma}^2_{v,t}) < k(\sigma_v^2)$ for any $\theta \in (\underline{\theta}, \infty)$.
3. Beliefs are not distorted if the primitive utility function is risk neutral or there is no aversion to uncertainty: $\tilde{\sigma}^2_{v,t} \to \sigma_v^2$ if $\gamma \to 0$ or $\theta \to \infty$.
4. If the degree of uncertainty aversion is greater than $1/\underline{\theta}$, there is no solution to the portfolio-forecasting problem.
The solution is quite intuitive. The numerator of the demand in (4.7) is the expected rate of return, and the denominator is the risk aversion coefficient times the variance of returns. A non-standard aspect of the demand is that the moments of returns are computed under the robust probability measure $\mu_t$, which need not be the same as the baseline measure $\mu^0$. Part 2 says that under $\mu_t$ the agent updates her forecasts of $E^{\mu_t} e_{t+1}$ using a gain $k$ smaller than the Bayesian gain associated with the baseline model. As we shall see, this underreaction of forecasts to news is the key to accounting for the anomalies. Part 3 says that both risk aversion and uncertainty aversion are necessary for this underreaction. In particular, if agents have a risk neutral primitive utility function, a desire for robustness will not generate the anomalies. Part 4 says that in general the effects of misspecification can be ameliorated but not completely eliminated. This is because for any $\theta < \underline{\theta}$, the game between the agent and nature ceases to be convex.
From now on we will characterize the demand and forecasts simply in terms of the distorted variance $\tilde{\sigma}^2_{v,t}$ with no reference to the abstract concept of robust probability measure $\mu_t$. This is because the expectation and the variance in (4.7) can be computed by simply plugging $\tilde{\sigma}^2_{v,t}$ in (4.3)-(4.5). The reason for this simplicity is that there is a one-to-one map between $\tilde{\sigma}^2_{v,t}$ and $\mu_t$ established by Lemma 4.1. This translation will prove useful because distortions of parameters are intuitive and can be linked to the empirical literature.
4.2. Equilibrium
Since investors have measure one, the equilibrium condition is $b_t(e_t) = b_t^s$. By solving this equation, we obtain the market clearing exchange rate in terms of the undetermined coefficients: $e_t = -(i_t - i_t^f) + a(\Gamma_1 + \Gamma_2)\hat{x}_t^{\mu_t} + \Gamma_{t+1} + \gamma b_t^s Var^{\mu_t}(J_{t+1})$. We then obtain the equilibrium value of these coefficients $(\Gamma_t, \Gamma_1, \Gamma_2)$ by imposing the fixed point condition that the market clearing exchange rate equals the conjecture (3.7).
Proposition 4.3 (Equilibrium). In a robust linear equilibrium, the log exchange rate is
$$e_t = -(i_t - i_t^f) - \frac{a}{1-a}\hat{x}_t^{\mu_t} + \Gamma_t, \qquad \Gamma_t = \Gamma_{t+1} + \gamma b_t^s Var^{\mu_t}(J_{t+1}), \tag{4.8}$$
The robust forecast of the forward premium $\hat{x}_{t+1}^{\mu_t} = a \hat{x}_t^{\mu_t}$ is given by the filter (4.3) and the variance of returns $Var^{\mu_t}(J_{t+1})$ is given by (4.5).
The exchange rate underreacts to news under observation uncertainty because in making forward premium forecasts investors distort downwards the gain of the filter as a means to attain robustness (i.e., $k_t^{\mu_t} < k_t^{\mu^0}$ via (8.1)).

The exchange rate function (4.8) is intuitive. The first and second terms say that the exchange rate appreciates (i.e., there is a fall in the number of Dollars per Euro) if there is an increase in either the interest rate differential $(i_t - i_t^f)$ or the present value of its forecasts ($\sum_{i=1}^{\infty} \hat{x}_{t+i}^{\mu_t} = \sum_{i=1}^{\infty} a^i \hat{x}_t^{\mu_t} = \frac{a}{1-a}\hat{x}_t^{\mu_t}$). The third term $\Gamma_t$ can be interpreted as the long-run exchange rate.11
The non-standard aspect of (4.8) is that the estimate of the unobservable trend ($\hat{x}_t^{\mu_t}$) is derived by distorting upwards the observation variance in the filter (4.3), i.e., by setting $\tilde{\sigma}^2_{v,t} > \sigma^2_v$. Because the magnitude of the reaction of forecasts to news is determined by the gain $k$ in (4.3), which in turn depends on $\tilde{\sigma}^2_{v,t}$, Lemma 4.2 implies that, relative to the baseline Bayes estimator, the robust estimator underreacts if the agent fears misspecification in the equation that links observations with the unobservable trend. Proposition 4.3 states that this forecast underreaction is translated into an exchange rate underreaction to forward premium news. This mechanism will be the key to account for the anomalies in Section 5.
The result $\tilde{\sigma}^2_v > \sigma^2_v$ means that the robust agent considers a PDF for $y_t$ with fatter tails, but the same mean, than the baseline Normal. This result does not depend on the particular value of the baseline variance $\sigma^2_v$.
11
In order for $\Gamma_t$ to be bounded it is necessary that the fiscal stance is balanced over the infinite horizon, so that $\sum_{i=0}^{\infty} b^s_{t+i}$ is bounded.
4.3. Intuition for Underreaction

Why should robustness against observation uncertainty necessarily imply underreaction to news, not overreaction? There is what we might term a "robustness principle" at work. In designing her strategy the agent asks: if things were to go wrong, what would be the costliest direction in which they could go wrong? She then trades off the benefits of dampening the effects of misspecification in this costliest direction against the costs of moving away from "optimality under the baseline model." This trade-off is weighted by the degree of uncertainty aversion $1/\theta$.
To determine the distortion that agents will introduce we need to apply this robustness principle to a specific uncertainty set. If agents fear misspecification in the equation that links observations and the unobservable trend of the forward premium, the costliest misspecification occurs when the agent's model wrongly sets the observation noise lower than what it actually is. Therefore, under observation uncertainty robustness entails distorting upwards the observation noise, which in turn implies underreacting to news.
To illustrate the core of the mechanism that generates underreaction let us decouple the forecast problem from the portfolio problem. Consider a one-period filtering problem where the observation is $y = x + v$, the unobservable trend follows $x = ac + w$, the shocks $w$ and $v$ are independently distributed random variables with variances $d$ and $\sigma$, and $var(c) = 0$. In this example, the agent's primitive objective is to find an estimator $\hat{x}$ to minimize the mean squared error (MSE) of her state estimate $E(\hat{x} - x)^2$. To find an estimator $\hat{x}$ the agent must choose the weight ($k$) on the observation and the weight on her prior ($1 - k$): $\hat{x} = ky + (1 - k)ac$. Given the gain $k$, the MSE equals
$$E(\hat{x} - x)^2 = k^2 \sigma + (1 - k)^2 d + (1 - k)^2 a^2 var(c) \tag{4.9}$$
If the agent fears no model uncertainty, the gain that minimizes the MSE is the celebrated Kalman filter used by Muth (1961), Lucas (1973) and many others12
$$k_b = \frac{d}{d + \sigma}.$$
If the agent fears misspecification of the observation equation, what gain $k$ shall she choose? Since the misspecification is unknown, the agent considers all potential misspecifications within the uncertainty set. This is why the structure of the uncertainty is key. When $\mu \in \mathcal{M}_v$, there are two types of errors she can make. Error I occurs when she sets $\sigma$ too high and so chooses $k$ lower than what it should be under the true model. Error II occurs when she sets $\sigma$ too low. If things were to go wrong, which error is more costly? The key point is that when $\mu \in \mathcal{M}_v$, it is more harmful, in terms of the MSE, to mistakenly believe that $\sigma$ is low than to mistakenly believe $\sigma$ is high. Therefore, the robust agent focuses on the type II error and guards against the possibility of underestimating $\sigma$. This leads her to underreact to news in order to be robust against observation uncertainty.
The above argument explains why there is underreaction. But how far shall a robust agent underreact? The answer depends on the degree of uncertainty aversion $1/\theta$. To see the intuition consider the case in which the agent is infinitely uncertainty averse ($\theta = 0$), so her objective is simply to minimize $\sup_{\mu \in \mathcal{M}_v} E(\hat{x} - x)^2$, and suppose for a moment that $\sigma$ can take values only on $[\underline{\sigma}, \bar{\sigma}]$. In this case the agent would set the gain to $k(\bar{\sigma}) = \frac{d}{d + \bar{\sigma}} < k_b$.
12
In a Gaussian setup Bayes law renders the same result.
In this way she would bound the maximum MSE regardless of the value of $\sigma$, and would actually attain the lowest upper bound on the MSE.13 Notice, however, that the uncertainty set $\mathcal{M}_v$ does not impose a hard upper bound on $\tilde{\sigma}$, but allows it to take any value in the extended positive real line. Thus, the simplistic worst-case analysis would lead the agent to guard against $\tilde{\sigma} = \infty$ and set $k = 0$. This belief is quite paranoid. The role of the cost $\theta R(\mu \| \mu^0)$ in Problem (3.8) is to dampen this paranoia. Instead of imposing hard bounds on $\tilde{\sigma}$, Problem (3.8) penalizes the fictitious malevolent nature for choosing a probability measure different from the baseline $\mu^0$. When $\mu \in \mathcal{M}_v$, this cost grows at an increasing rate as the distorted variance moves away in either direction from the baseline. Since the benefit to nature, in terms of a greater MSE, grows linearly at rate $k^2$, nature faces a concave objective and thus sets $\tilde{\sigma}$ at a bounded level, but greater than its baseline. This level is determined by $\theta$.
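The trade-off in the one-period example can be illustrated numerically. The sketch below is not the paper's Problem (3.8): it assumes, for illustration only, that nature maximizes the MSE in (4.9) minus $\theta$ times the relative entropy in (4.1), applied to the observation variance $\sigma$ with $var(c) = 0$. The function names and the parameter values ($d = \sigma = 1$, $\theta = 4$ or $40$) are hypothetical.

```python
import math


def worst_case_variance(k, sigma, theta):
    """Nature's choice for a given gain k (assumed objective: k^2*s - theta*R(s)).

    First-order condition: k^2 = (theta / 2) * (1 / sigma - 1 / s).
    """
    slack = 1.0 - 2.0 * k * k * sigma / theta
    if slack <= 0.0:
        raise ValueError("theta too small: nature's problem is unbounded")
    return sigma / slack


def penalized_mse(k, d, sigma, theta):
    """MSE in (4.9) under the worst-case variance, net of the entropy penalty."""
    s = worst_case_variance(k, sigma, theta)
    ratio = s / sigma
    entropy = 0.5 * (ratio - 1.0 - math.log(ratio))
    return k * k * s + (1.0 - k) ** 2 * d - theta * entropy


def robust_gain(d, sigma, theta, grid=10001):
    """Gain that minimizes the penalized worst-case MSE (simple grid search on [0, 1])."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates, key=lambda k: penalized_mse(k, d, sigma, theta))


if __name__ == "__main__":
    d, sigma = 1.0, 1.0
    k_b = d / (d + sigma)
    for theta in (4.0, 40.0):
        k_r = robust_gain(d, sigma, theta)
        s_r = worst_case_variance(k_r, sigma, theta)
        print(f"theta={theta}: robust gain {k_r:.4f} (baseline {k_b:.4f}), "
              f"distorted variance {s_r:.4f}")
```

Under these assumptions the distorted variance is bounded but above the baseline, the robust gain lies below $k_b$, and a larger $\theta$ (less uncertainty aversion) moves the gain back toward $k_b$.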
Lemma 4.2 considers a more complex problem where the forecast and portfolio problems must be solved jointly. However, the gist of the argument is the same as the one we have used to illustrate the intuition. Although the formula for the variance distortion $\tilde{\sigma}^2_{v,t}$ is more complex, it has the same two properties as the distortion in the simple example above: $\tilde{\sigma}^2_{v,t}$ is greater than the baseline level and it is inversely related to $\theta$, so it increases with the degree of uncertainty aversion $1/\theta$. In fact, $\tilde{\sigma}^2_{v,t}$ tends to infinity as $\theta$ approaches the lowest permissible bound $\underline{\theta}$. Notice that in general $\underline{\theta}$ cannot be zero and so absolute robustness against observation uncertainty cannot be attained. This is because below the threshold $\underline{\theta}$ the objective function ceases to be convex in $\tilde{\sigma}^2_{v,t}$, and so the problem has no solution.
5. Foreign Exchange Market Anomalies

Here we analyze conditions under which the anomalies are generated by the robust equilibrium of Proposition 4.3. Recall that we have imposed the condition that the baseline model be equal to the data generating model. Thus, we obtain the rational expectations equilibrium when agents use the baseline model. In this case the forecasts are generated by Bayes' law, the Fama coefficient is one and there is no momentum.
To preview, we find that exchange rates should exhibit momentum (i.e., delayed overshooting) and the forward premium puzzle in its most extreme form (i.e., a negative Fama coefficient) if the interest rate differential has a highly persistent trend (high $a$), and the degree of underreaction to news $k^{\mu^0} - k^{\mu}$ is high, but not too high. Further, our analysis indicates that along a robust equilibrium path there are positive predictable excess returns under the baseline (data generating) model. Such predictable excess returns reflect a desire for robustness against model misspecification.
To concentrate on the role of uncertainty about the forward premium process in accounting for the anomalies we consider a deterministic drift $\Gamma_t$ in (4.8). We do so by setting the supply of domestic bonds equal to a constant $b$ up to some time $T$, where $T$ can be very large.
$$b_t^s = \begin{cases} b, & t \le T \\ b \exp(T - t)\left[V^{\mu_t}(J_{t+1} | I_t)\right]^{-1}, & t \in (T, \infty) \end{cases} \tag{5.1}$$
13
To clarify the mechanics suppose for a moment that $\sigma$ is drawn from a pdf $f(\sigma)$ with support $[\underline{\sigma}, \bar{\sigma}]$. The robust agent disregards the information contained in $f(\sigma)$ and sets $k(\bar{\sigma}) = \frac{d}{d + \bar{\sigma}}$. Notice that even if the baseline were $\underline{\sigma}$ and $\bar{\sigma}$ were far away from $\underline{\sigma}$, the agent would still choose $k(\bar{\sigma})$. This is because her objective is to bound the worst-case damage.
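It is well known that the gain $k_t$ of the filter (4.3) converges rather fast if the underlying coefficients $(a, \tilde{\sigma}^2_v)$ are constant. Letting initial time be in the infinite past, we show in the appendix that on $t \in [0, T]$ the gain is constant and given by
$$k^{\mu} = \frac{a^2 \sigma^{2,\mu} + 1}{a^2 \sigma^{2,\mu} + 1 + \tilde{\sigma}^2_v}, \qquad \sigma^{2,\mu} = \frac{-(1 + \tilde{\sigma}^2_v - a^2 \tilde{\sigma}^2_v) + \sqrt{(1 + \tilde{\sigma}^2_v - a^2 \tilde{\sigma}^2_v)^2 + 4 a^2 \tilde{\sigma}^2_v}}{2a^2}. \tag{5.2}$$
The distorted variance $\tilde{\sigma}^2_v$ is also constant (it is given by (8.5)). Furthermore, the drift $\Gamma_t$ is deterministic and given by (8.9).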
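The constant gain in (5.2) can be cross-checked against the recursion in (4.3). The following Python sketch assumes a unit variance for trend shocks, as in the baseline model, and $a \neq 0$; the function names and the values $a = 0.9$ and observation variances $1$ and $2$ are illustrative.

```python
import math


def steady_state_gain(a, var_v):
    """Constant gain from the closed form (5.2), for observation variance var_v."""
    c = 1.0 + var_v - a * a * var_v
    post_var = (-c + math.sqrt(c * c + 4.0 * a * a * var_v)) / (2.0 * a * a)
    return (a * a * post_var + 1.0) / (a * a * post_var + 1.0 + var_v)


def iterated_gain(a, var_v, iterations=200):
    """Gain obtained by iterating the variance recursion of the filter (4.3)."""
    post_var, gain = 0.0, 0.0
    for _ in range(iterations):
        prior_var = a * a * post_var + 1.0
        gain = prior_var / (prior_var + var_v)
        post_var = var_v * gain
    return gain


if __name__ == "__main__":
    for var_v in (1.0, 2.0):
        print(f"var_v={var_v}: closed form {steady_state_gain(0.9, var_v):.6f}, "
              f"iterated {iterated_gain(0.9, var_v):.6f}")
```

The two computations agree, and raising the observation variance lowers the constant gain, which is the channel through which the distortion $\tilde{\sigma}^2_v > \sigma^2_v$ produces underreaction.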
5.1. Momentum (Delayed Overshooting)
As a preliminary step in explaining the forward premium puzzle, we ask when the exchange rate exhibits momentum conditional on the occurrence of a once-and-for-all shock to the forward premium. That is, the initial depreciation (appreciation) tends to be followed by more depreciation (appreciation) for several periods afterwards. This delayed overshooting pattern is consistent with the forward premium puzzle because there is a period during which an exchange rate appreciation coexists with a positive interest rate differential. This pattern stands in contrast to the overshooting pattern of Dornbusch (1976), and has been found in the data by Eichenbaum and Evans (1995).
In order to determine the conditions under which there is momentum we compute the impulse response of the equilibrium exchange rate to a random forward premium shock (i.e., a shock to $y_1 = i_1 - i_1^f$). Suppose that at time $t = 1$ the representative agent observes a forward premium realization $y_1 = 1$ and suppose no shocks occur afterwards. She knows that the forward premium shock is generated by a combination of a transitory shock $v_1 = \nu$ and a persistent shock $w_0 = \omega$ such that $y_1 = \omega + \nu = 1$, but she does not observe the particular values of $\omega$ and $\nu$.
Since the data is generated by baseline model $\mu^0$, we define the average impulse response at time $t$ to an initial $y$-shock as follows
$$e^{av}_t \equiv E^{\mu^0}\left[\left(e_t(\omega, \nu) - e_t(0, 0)\right) \,\middle|\, y_1 = \omega + \nu = 1\right], \tag{5.3}$$
where $e_t(\omega, \nu)$ is the response of the log exchange rate at time $t$ to an initial persistent shock $\omega$ and an initial transitory shock $\nu$. That is, the time $t$ log exchange rate generated by shock sequences $w_s = \{\omega, 0, ..., 0\}$ and $v_s = \{\nu, 0, ..., 0\}$. Notice that it is not appropriate to consider only the response to a persistent shock to account for the momentum anomaly because the econometrician observes the average response. He or she cannot condition on whether shocks are transitory or persistent.
Since in equilibrium the exchange rate and the forecasts are linear in the initial shocks $\omega$ and $\nu$, we can express the average impulse response (5.3) as a weighted average of the responses to a persistent shock $e_t(\omega, 0)$ and to a transitory shock $e_t(0, \nu)$:
$$e^{av}_t = q^0 e_t(\omega, 0) + [1 - q^0]\, e_t(0, \nu), \qquad q^0 \equiv \frac{1}{1 + \frac{\sigma_v^2}{\sigma_w^2}} = \frac{1}{1 + \sigma_v^2} \tag{5.4}$$
The weight $q^0$ is the expected value under the data generating model $\mu^0$ of the trend shock $\omega$ conditional on $y_1 = 1$. The greater $q^0$, the greater the share of trend shocks in the data. As we can see, $q^0$ is decreasing in the noise-to-signal ratio of the data generating model ($\sigma_v^2 / \sigma_w^2 = \sigma_v^2$). Recall that in the baseline model $\sigma_w^2$ is assumed to be one.
The equilibrium filter (4.3) and exchange rate function (4.8) imply that the response to a purely transitory shock is: $e_1(0, \nu) = -1 - \frac{a}{1-a} k^{\mu}$ and
$$e_t(0, \nu) = -\frac{a^t k^{\mu} (1 - k^{\mu})^{t-1}}{1 - a}, \quad t \ge 2 \tag{5.5}$$
If the interest rate differential shock is positive, the exchange rate appreciates at time $t = 1$, and it depreciates starting at $t = 2$. Thus, the impulse response to a transitory shock does not exhibit momentum. The response following a purely persistent shock is
$$e_t(\omega, 0) = -\frac{a^{t-1}}{1 - a}\left[1 - a(1 - k^{\mu})^t\right], \quad t \ge 1 \tag{5.6}$$
This response may exhibit momentum only if the trend is persistent enough (large $a$), and the gain of the filter $k^{\mu}$ is neither too large nor too low. For instance, $e_2(\omega, 0) < e_1(\omega, 0)$ provided $a \ge \frac{3}{4}$ and $k^{\mu} \in \left(\frac{2a - \sqrt{4a - 3} - 1}{2a}, \frac{2a + \sqrt{4a - 3} - 1}{2a}\right)$.
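The impulse responses (5.5) and (5.6) can be checked by running the filter (4.3) on a unit shock and pricing with (4.8), ignoring the deterministic drift. The Python sketch below is illustrative: the function names and the values $a = 0.98$, $k = 0.2$ and $k = 0.99$ are hypothetical choices.

```python
import math


def response_transitory(t, a, k):
    """Response at time t >= 1 to a unit transitory shock, as in (5.5)."""
    if t == 1:
        return -1.0 - a * k / (1.0 - a)
    return -(a ** t) * k * (1.0 - k) ** (t - 1) / (1.0 - a)


def response_persistent(t, a, k):
    """Response at time t >= 1 to a unit persistent shock, as in (5.6)."""
    return -(a ** (t - 1)) * (1.0 - a * (1.0 - k) ** t) / (1.0 - a)


def simulated_response(persistent, horizon, a, k):
    """Exchange rate path from iterating the filter and applying (4.8) without drift."""
    x_hat, path = 0.0, []
    for t in range(1, horizon + 1):
        if persistent:
            y = a ** (t - 1)              # trend shock decays at rate a
        else:
            y = 1.0 if t == 1 else 0.0    # observation shock lasts one period
        x_hat = (1.0 - k) * a * x_hat + k * y
        path.append(-y - a / (1.0 - a) * x_hat)
    return path


def momentum_gain_interval(a):
    """Interval of gains for which e_2 < e_1 after a persistent shock (needs a > 3/4)."""
    root = math.sqrt(4.0 * a - 3.0)
    return (2.0 * a - root - 1.0) / (2.0 * a), (2.0 * a + root - 1.0) / (2.0 * a)


if __name__ == "__main__":
    a, k = 0.98, 0.2
    print("gain interval for momentum:", momentum_gain_interval(a))
    print("persistent:", [round(response_persistent(t, a, k), 3) for t in range(1, 6)])
    print("transitory:", [round(response_transitory(t, a, k), 3) for t in range(1, 6)])
```

With a gain inside the interval the response to a persistent shock keeps falling after the first period, while a gain above the upper bound front-loads the adjustment.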
Since the response to a transitory shock dies out after one period, to establish the existence of momentum in the average exchange rate it is necessary that the response to a persistent shock is hump-shaped and that it is not dominated by the response to a transitory shock. The next Proposition states the conditions under which both requirements hold.
Proposition 5.1 (Delayed Overshooting). Under the data generating model $\mu^0$, the average impulse response of the equilibrium exchange rate to an interest rate differential shock at $t = 1$ satisfies: $e^{av}_1 = -\frac{1 - a(1 - k^{\mu})}{1 - a}$ and for $t \ge 1$:
$$e^{av}_{t+1} - e^{av}_t = \frac{a^{t-1}}{1 - a}\left[\left(k^{\mu} - q^0\right) a \left(1 - a(1 - k^{\mu})\right)\left(1 - k^{\mu}\right)^{t-1} + (1 - a)\, q^0\right] \tag{5.7}$$
i. There is momentum if and only if forecasts underreact sufficiently to news ($k^{\mu} \ll k^{\mu^0}$) so that $q^0 > k^{\mu}$, but $k^{\mu}$ is not too low. This occurs if agents fear misspecification in the observation equation and the aversion to uncertainty $1/\theta$ is high enough.
ii. Momentum occurs up to a time $\tau$, after which mean-reversion takes place. Time $\tau$ is the smallest integer that satisfies
$$\tau \ge 1 + \frac{\log\left\{[1 - a(1 - k^{\mu})][q^0 - k^{\mu}]\right\} - \log\left\{q^0 [a^{-1} - 1]\right\}}{-\log(1 - k^{\mu})}$$
The first part of the proposition says that momentum occurs only if agents act as if shocks are more transitory than what they actually are in the data. Equivalently, the agents' filter must take observation shocks to be more abundant than what they actually are in the data: $k^{\mu} < q^0$. This distortion leads forecasts to underreact on average to news. As a result, forecasts will have to catch up later on. This catch-up will generate momentum ($e^{av}_1 > e^{av}_2 > ...$) provided trend shocks are persistent enough ($a$ is high). The second part of the proposition follows because, since $a < 1$, the effect of the forward premium shock vanishes over time. Therefore, even if the estimate of $x$ underreacts initially to an increase in $y_1$, there is a time $\tau$ after which this estimate will have to start declining. Hence, there is mean reversion ($e^{av}_{\tau} < e^{av}_{\tau+1}$).
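The length of the momentum phase can be computed directly from the average response (5.4) built on (5.5) and (5.6), and compared with the bound in part ii of Proposition 5.1. The Python sketch below is illustrative only; the function names and the parameter values ($a = 0.98$, $\sigma_v^2 = 1$, gains $0.2$ and $0.6138$) are hypothetical.

```python
import math


def average_response(t, a, k, var_v):
    """Average impulse response (5.4), combining (5.5) and (5.6) with weight q."""
    q = 1.0 / (1.0 + var_v)
    u = 1.0 - k
    persistent = -(a ** (t - 1)) * (1.0 - a * u ** t) / (1.0 - a)
    if t == 1:
        transitory = -1.0 - a * k / (1.0 - a)
    else:
        transitory = -(a ** t) * k * u ** (t - 1) / (1.0 - a)
    return q * persistent + (1.0 - q) * transitory


def momentum_end(a, k, var_v, horizon=500):
    """First period t at which the average response starts reverting (e_{t+1} > e_t)."""
    for t in range(1, horizon):
        if average_response(t + 1, a, k, var_v) > average_response(t, a, k, var_v):
            return t
    return None


def tau_bound(a, k, var_v):
    """Right-hand side of the inequality in part ii of Proposition 5.1 (requires k < q)."""
    q = 1.0 / (1.0 + var_v)
    num = math.log((1.0 - a * (1.0 - k)) * (q - k)) - math.log(q * (1.0 / a - 1.0))
    return 1.0 + num / (-math.log(1.0 - k))


if __name__ == "__main__":
    a, var_v = 0.98, 1.0
    print("underreacting gain 0.2 :", momentum_end(a, 0.2, var_v),
          "bound", round(tau_bound(a, 0.2, var_v), 3))
    print("gain above q (0.6138)  :", momentum_end(a, 0.6138, var_v))
```

In this example the gain $0.2$ lies below $q^0 = 0.5$ and the average response keeps appreciating for several periods, whereas a gain above $q^0$ yields mean reversion from the first period.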
To see the intuition notice that the exchange rate is proportional to the estimate of the hidden trend $\hat{x}_t$, and that the response of $\hat{x}_t$ to forward premium news is determined by the gain $k^{\mu}$ because $\hat{x}_t = k^{\mu} y_t + (1 - k^{\mu}) a \hat{x}_{t-1}$. Suppose for a moment that the forward premium shock is positive and purely persistent. Since $k^{\mu} < 1$, the agent's estimate underreacts to the news at $t = 1$. If the trend is highly persistent ($a$ is large enough), next period the agent's estimate will have to be revised upwards and so there will be an appreciation between time $t = 1$ and time $t = 2$. If instead $a$ were small, even trend shocks would disappear fast, so initial underreaction would not lead to momentum. Next, consider the gain $k^{\mu}$. If $k^{\mu}$ were too small, subsequent revisions in the estimate would not be strong enough to counteract the declining effect of the original shock. Likewise if $k^{\mu}$ were too large, most of the updating would take place initially, so subsequent revisions in the estimate would be too small.
Let us now drop the assumption that the shock is purely persistent. In order for the impulse response to the average shock to inherit the humped shape of the response to the persistent shock, it is necessary that the share of persistent shocks in the data ($q^0$) be greater than the share assumed by the agent's robust filter. This is a demanding requirement because the share $q^0$ is necessarily lower than the Bayesian gain $k^{\mu^0}$. That is, $k^{\mu^0} = \frac{a^2 \sigma^{2,\mu^0} + 1}{a^2 \sigma^{2,\mu^0} + 1 + \sigma_v^2} > \frac{1}{1 + \sigma_v^2} = q^0$.
Since the agent's robust model $\mu$ distorts upwards the variance of observation shocks relative to the data generating process (i.e., $\tilde{\sigma}^2_v > \sigma^2_v$), the robust gain $k^{\mu}$ is lower than the Bayesian gain $k^{\mu^0}$. Delayed overshooting occurs if and only if the distortion $\tilde{\sigma}^2_v - \sigma^2_v$ is large enough so that there is enough underreaction and $k^{\mu}$ is lower than $q^0$. The degree of underreaction is in turn determined by the uncertainty aversion coefficient $1/\theta$. We can ensure that $k^{\mu} < q^0$ by letting $\theta$ go towards its lower bound $\underline{\theta}$. Notice that if there is no uncertainty aversion, $1/\theta \to 0$, there is no room for distortion as $\tilde{\sigma}^2_v$ must be equal to $\sigma^2_v$. Therefore, there cannot be momentum as $k^{\mu} = k^{\mu^0} > q^0$.
5.2. The Forward Premium Puzzle (FPP)
Fama (1984) estimated the regression $e_{t+1} - e_t = \alpha + \beta^F (i_t - i_t^f) + u_t$. Under the null of uncovered interest parity and rational expectations the slope coefficient $\beta^F$ is one. Fama (1984) and many empirical studies find that $\beta^F < 1$ (weak FPP) and often $\beta^F < 0$ (strong FPP). The forward premium puzzle in its strong form is linked to the empirical fact that high interest rate currencies tend on average to appreciate with respect to low interest rate currencies. This pattern implies that there are unconditional predictable excess returns and contradicts the risk neutral uncovered interest parity condition.
The next proposition states the conditions under which the robust equilibrium exchange rate (4.8) generates a theoretical $\beta^F < 0$, given that the interest rate differential is generated by baseline model $\mu^0$, but agents use the robust model $\mu$ that solves their portfolio problem.
Proposition 5.2 (Forward Premium Puzzle). In the robust equilibrium, the Fama regression coefficient converges in plim to
$$\beta^F = 1 - \left(k^{\mu^0} - k^{\mu}\right) \frac{a(1 + a)\left[1 - a(1 - k^{\mu})\right]\left[1 - a^2 (1 - k^{\mu^0})(1 - k^{\mu})\right]}{\left[1 - a^2 + a^2 (k^{\mu^0})^2\right]\left[1 - a^2(1 - k^{\mu})\right]} \tag{5.8}$$
1. Weak FPP. $\beta^F$ is less than one if and only if forecasts underreact to news: $k^{\mu} < k^{\mu^0}$. This underreaction occurs if agents fear misspecification in the observation equation that links the interest rate differential and the unobservable trend.
2. Strong FPP. $\beta^F$ is negative if uncertainty aversion is sufficiently large (so that $k^{\mu}$ is small enough) and the trend of the interest rate differential is highly persistent (i.e., the drift $a$ is large enough).
3. $\beta^F$ cannot be negative if the drift $a$ is small.
Part 1 says that if agents underreact to news, the asymptotic value of the Fama coefficient is strictly smaller than one and the forward premium is a biased predictor of the future rate of depreciation. Part 2 states that the strong form of the FPP (negative $\beta^F$) results if both persistence $a$ and underreaction to news $k^{\mu^0} - k^{\mu}$ are large. Part 3 says that $\beta^F$ can be negative only if the trend of the interest rate differential is highly persistent.
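The three parts of Proposition 5.2 can be illustrated with a small computation. The Python sketch below does not use a closed form from the paper; it computes the population Fama slope directly from stationary second moments implied by the filter (4.3) with a constant gain and the exchange rate function (4.8) with a deterministic drift, assuming unit trend-shock variance. The function names and parameter values are hypothetical.

```python
import math


def bayesian_gain(a, var_v):
    """Constant gain of the filter under the data generating model."""
    c = 1.0 + var_v - a * a * var_v
    post_var = (-c + math.sqrt(c * c + 4.0 * a * a * var_v)) / (2.0 * a * a)
    return post_var / var_v


def fama_slope(a, var_v, k):
    """cov(e_{t+1} - e_t, y_t) / var(y_t) when data follow the baseline model
    (observation variance var_v) and agents filter with the constant gain k."""
    one_minus_a2 = 1.0 - a * a
    var_y = 1.0 / one_minus_a2 + var_v
    # cov(x_hat_t, y_t), with x_hat_t = sum_j (a(1-k))^j k y_{t-j}
    cov_xhat_y = k * (var_v + 1.0 / (one_minus_a2 * (1.0 - a * a * (1.0 - k))))
    cov_dy_y = a / one_minus_a2 - var_y                  # cov(y_{t+1} - y_t, y_t)
    cov_dxhat_y = (a * (1.0 - k) - 1.0) * cov_xhat_y + k * a / one_minus_a2
    cov_de_y = -cov_dy_y - a / (1.0 - a) * cov_dxhat_y   # e_t = -y_t - a/(1-a) x_hat_t
    return cov_de_y / var_y


if __name__ == "__main__":
    var_v = 1.0
    for a in (0.3, 0.98):
        k0 = bayesian_gain(a, var_v)
        print(f"a={a}: Bayesian gain {k0:.4f}, slope at Bayesian gain "
              f"{fama_slope(a, var_v, k0):.4f}, slope at k=0.05 "
              f"{fama_slope(a, var_v, 0.05):.4f}")
```

In this sketch the slope equals one at the Bayesian gain, falls below one for a lower gain, and turns negative only in the high-persistence case.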
To see the intuition let us analyze when the regression coefficient $\beta^F = \frac{cov^0(\Delta e_{t+1}, y_t)}{var^0(y_t)}$ is negative. To expand the numerator notice that under the data generating model $\mu^0$ expected depreciation is $E_t^0(\Delta e_{t+1}) \equiv (i_t - i_t^f) - \rho_t$, where $\rho_t$ are the predictable excess returns under the data generating model $\mu^0$:
$$\rho_t := (i_t - i_t^f) - E_t^0(e_{t+1} - e_t) \tag{5.9}$$
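Using (5.11) to substitute for $i_t - i_t^f$, we show in the appendix that in the robust equilibrium
$$\rho_t = \left[E_t^0(x_{t+1}) - E_t^{\mu}(x_{t+1})\right]\left(1 + \frac{a k^{\mu}}{1 - a}\right) + \xi_t \tag{5.10}$$
The second term is given by the first order condition for $b_t$:
$$E^{\mu}(e_{t+1}) - e_t = (i_t - i_t^f) - \xi_t, \qquad \xi_t := \gamma b_t^s \left(\frac{a k^{\mu}}{1 - a} + 1\right)^2 \left(a^2 k^{\mu} \tilde{\sigma}^2_v + 1 + \tilde{\sigma}^2_v\right) \tag{5.11}$$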
In order to link the above equations to the literature we can interpret (5.11) as the robust uncovered interest parity condition. It says that the depreciation rate expected by robust agents equals the forward premium minus a risk premium on domestic assets. Thus, under the agents' robust model $\mu$, predictable excess returns on domestic currency equal simply the risk premium $\xi_t$. However, under the data generating model $\mu^0$ predictable excess returns equal the expectational distortion plus the risk premium, as we can see in (5.10).
Combining (5.9) and (5.10) we obtain:
$$E_t^0 \Delta e_{t+1} = y_t - \left[E_t^0(x_{t+1}) - E_t^{\mu}(x_{t+1})\right]\left(1 + \frac{a k^{\mu}}{1 - a}\right) - \xi_t \tag{5.12}$$
$$= y_t - \left(1 + \frac{a k^{\mu}}{1 - a}\right) a \left(k^{\mu^0} - k^{\mu}\right) y_t - \left(1 + \frac{a k^{\mu}}{1 - a}\right) a \left[(1 - k^{\mu^0}) E_{t-1}^0(x_t) - (1 - k^{\mu}) E_{t-1}^{\mu}(x_t)\right] - \xi_t$$
The first term in (5.12) is the direct effect of the forward premium on average depreciation. The second term is the forecast error effect on average depreciation, and it is the source of the forward premium puzzle. The third term captures the effect of past forecast errors. The last term is the risk premium.14

To see the intuition for Proposition 5.2 consider an increase in $y_t$ and disregard the third term in (5.12) for the moment. Equation (5.12) shows that if forecasts underreact to news (i.e., $k^{\mu} < k^{\mu^0}$), expected depreciation responds by less than the change in the forward premium. This results in a Fama coefficient less than one, as stated in part 1 of Proposition 5.2. To derive part 2 notice that if both persistence $a$ and underreaction to news $k^{\mu^0} - k^{\mu}$ are large, the forecast error effect in (5.12), which is proportional to $\left(1 + \frac{a k^{\mu}}{1 - a}\right)\left(k^{\mu^0} - k^{\mu}\right)$, can dominate the direct effect, which is one. As a result, the initial mispricing of the exchange rate (dollars per euro) is so large that it requires the exchange rate to appreciate further in the future. In the future, as agents update their beliefs, they realize the change in interest rates is more persistent than initially anticipated. When $a$ is large, this future upward revision in beliefs will have a large effect on the exchange rate because agents will expect high domestic interest rates to persist far into the future. This mispricing results in an extreme scenario where a high domestic interest rate coexists with an appreciating currency. Hence, under the data generating model $\mu^0$, the forward premium and the expected depreciation tend, on average, to move in opposite directions. This is what makes $\beta^F$ negative.
Consider now the third term in (5.12). Since past expected forward premia are correlated with the current forward premium, one obtains $\beta^F$ when the correlation is properly taken into account. This term dampens the movements in predictable excess returns and makes it more difficult for $\beta^F$ to turn negative, as revealed by a comparison of the first two terms in (5.12) with the Fama coefficient in (5.8).
To see the intuition for part 3, notice that the coefficient $\beta^F$ is decreasing in $a$ because a more persistent trend implies that any forward premium expectational error will lead to a more severe mispricing of the exchange rate. When $a$ is small, the downward bias in the robust gain ($k^{\mu} < k^{\mu^0}$) does not translate into a significant underreaction of forecasts to interest rate news because persistent trend shocks become very similar to transitory observation shocks.
The dependence of a strong FPP on $k^{\mu}$ is more complex. A low $\beta^F$ requires a low $k^{\mu}$. However, when $k^{\mu} = 0$ the Fama coefficient is positive: $\beta^F = 1 - \frac{a k^{\mu^0}\left[1 - a^2(1 - k^{\mu^0})\right]}{1 - a^2 + a^2 (k^{\mu^0})^2} > 0$. The minimum of $\beta^F$ is attained for small but strictly positive values of $k^{\mu}$. The case $k^{\mu} = 0$ corresponds to an environment where agents behave as if all shocks are purely observation shocks ($\tilde{\sigma}^2_v \to \infty$).
Using interest rate survey data Gourinchas and Tornell (2004) find that across the G7 countries, the drift of the interest rate differential process is quite high and that the gain used by market participants $k^{\mu}$ is small enough relative to the Bayesian gain $k^{\mu^0}$ so as to account for the anomalies. In our model economy, we can generate such a small gain $k^{\mu}$ by letting the uncertainty aversion coefficient $1/\theta$ be large enough.
14
In our robust equilibrium all deviations from uncovered interest parity are generated by forecast errors because the risk premium $\xi_t$ is deterministic ($cov^0(\xi_t, y_t) = 0$). To see this, we decompose the realized log exchange rate change ($\Delta e_{t+1} := e_{t+1} - e_t$) into its robust forecast $E_t^{\mu}(\Delta e_{t+1})$ and a forecast error $\eta_{t+1}$:
$$\Delta e_{t+1} = E_t^{\mu}(\Delta e_{t+1}) + \eta_{t+1}, \qquad E_t^{\mu}(\eta_{t+1}) = 0. \tag{5.13}$$
Clearly, the expected forecast error is zero under the robust measure $\mu$. Using (5.13) we have that
$$\beta^{Fama} = \frac{cov^0(\Delta e_{t+1}, y_t)}{var^0(y_t)} = 1 + \frac{cov^0(\eta_{t+1}, y_t)}{var^0(y_t)} - \frac{cov^0(\xi_t, y_t)}{var^0(y_t)} \tag{5.14}$$
The decomposition in (5.14) says that deviations of $\beta^{Fama}$ from one are due to either forecast errors or time-varying risk premia. Using survey forecast data on exchange rates Frankel and Froot (1989) find evidence for deviations from UIP coming from forecast errors.
6. Other Types of Uncertainty

Proposition 5.2 establishes that structured uncertainty about the link between interest rate differential observations and the unobservable trend can generate both the forward premium puzzle and momentum when the trend is highly persistent. In this section we consider two other types of uncertainty: structured uncertainty about the trend process $x_t$, and unstructured uncertainty under which agents fear misspecification of the entire interest rate differential process but cannot pinpoint either its nature or location. In the first case we find an overreaction of forecasts to news, which leads to a $\beta^{Fama} > 1$ and does not generate momentum. In the second case, the result is surprising: robust forecasts are equal to the Bayesian forecasts under the baseline model. Thus, the forecast-underreaction mechanism underlying the forward premium puzzle and momentum is not operative.
6.1. Structured Uncertainty in the Trend Equation
We consider two types of structured uncertainty in the trend equation (3.2): about the variance of $w_t$ and about the drift $a$. We will show that under these two types of structured uncertainty an agent with baseline model $\mu^0$ will set the gain of her robust filter $k^{\mu}$ greater than $k^{\mu^0}$, the Kalman gain under the baseline model.
In each case, we follow the same steps as in Section 4. First, we define the uncertainty set, and present the change of measure lemma that links the set to a parameter distortion. Then we solve the investor's problem, and derive the equilibrium.
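The following uncertainty set captures variance uncertainty in the trend equation.
$$\mathcal{M}_w = \left\{ \mu \in \mathcal{P}(\Omega) : \frac{d\mu}{d\mu^0} = \sqrt{\frac{\sigma_w^2}{\tilde{\sigma}_w^2}} \exp\left[ -\frac{1}{2} \left( \frac{1}{\tilde{\sigma}_w^2} - \frac{1}{\sigma_w^2} \right) (x_{t+1} - a x_t)^2 \right], \ \tilde{\sigma}_w^2 \in [\underline{\sigma}_w^2, \overline{\sigma}_w^2], \ \underline{\sigma}_w^2 > 0 \right\} \qquad (6.1)$$
The condition $\underline{\sigma}_w^2 > 0$ ensures that the set $\mathcal{M}_w$ is closed and convex. The next Lemma shows that
under any measure in the set $\mathcal{M}_w$, the interest rate differential process is given by baseline
model (3.2), except that the variance of trend shocks has a distorted value $\tilde{\sigma}_w^2$ instead of the
baseline $\sigma_w^2$. The variance of the observation shock remains equal to its baseline value (i.e.,
$\sigma_v^2 = 1$).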
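Lemma 6.1 (Change of Measure (volatility II)). If under baseline probability measure $\mu^0$
the random variables in (3.2) are distributed as $x_t | I_{t-1} \sim N(a\hat{x}_{t-1}, 1)$,
$y_t | x_t \sim N(x_t, 1)$ and $x_{t-1} \sim N(\hat{x}_{t-1}, \sigma_{t-1}^2)$, then:
Under any probability measure $\mu \in \mathcal{M}_w$, $x_t | I_{t-1} \sim N(a\hat{x}_{t-1}, \tilde{\sigma}_w^2)$, while the distributions
of $y_t | x_t$ and $x_{t-1}$ are preserved.
The relative entropy of measure $\mu$ with respect to measure $\mu^0$ is
$$R(\mu \| \mu^0) = \frac{1}{2} \left( \frac{\tilde{\sigma}_w^2}{\sigma_w^2} - \log \frac{\tilde{\sigma}_w^2}{\sigma_w^2} - 1 \right) \quad \text{for any } \mu \in \mathcal{M}_w \qquad (6.2)$$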
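The relative entropy between two normal distributions that share a mean and differ only in variance can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the closed form used is the standard Gaussian result $\frac{1}{2}(r - 1 - \log r)$ with $r$ the ratio of distorted to baseline variance.

```python
import math


def gaussian_relative_entropy(var_distorted: float, var_baseline: float) -> float:
    """Closed form 0.5 * (r - 1 - log r), with r = var_distorted / var_baseline."""
    r = var_distorted / var_baseline
    return 0.5 * (r - 1.0 - math.log(r))


def numeric_relative_entropy(var_distorted: float, var_baseline: float,
                             half_width: float = 12.0, n: int = 40000) -> float:
    """Midpoint-rule integral of p(z) * log(p(z) / q(z)) for zero-mean normals.

    p is the distorted density and q the baseline density.
    """
    def log_pdf(z: float, var: float) -> float:
        return -0.5 * math.log(2.0 * math.pi * var) - z * z / (2.0 * var)

    scale = math.sqrt(max(var_distorted, var_baseline))
    lo, hi = -half_width * scale, half_width * scale
    dz = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dz
        log_p = log_pdf(z, var_distorted)
        total += math.exp(log_p) * (log_p - log_pdf(z, var_baseline)) * dz
    return total
```

The penalty is zero when the distorted variance equals the baseline and grows as the two move apart in either direction, which is what makes large distortions costly for nature.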
As in Section 4, the representative investor solves Problem (3.8). The only difference with
respect to the case of observation uncertainty is that under an alternative model $\mu \in \mathcal{M}_w$ the
variance of returns is not given by (4.5), but by

!
2
2 2
a
w
t1
2
(6.3)
V ar (Jt+1 |It ) = [(kt+1 1 + 2 )] a2 2 2
+1+
2w
a t1 + 1 +
2w
The solution to the agent's problem has the same form as in the case of observation uncertainty. The only difference is that a robust agent distorts upwards the variance of the trend
shock: $\tilde{\sigma}_w^2 > \sigma_w^2$. Therefore, there is an overreaction to news.
Lemma 6.2 (Solution to the Portfolio-Forecasting Problem). In the presence of trend
shock uncertainty, i.e., $\mu \in \mathcal{M}_w$, Problem (3.8) has a solution only if the degree of uncertainty
aversion $1/\theta$ is lower than a threshold $1/\underline{\theta}$ given by (8.3). In this case:
1. The forecast of the forward premium is given by (4.3) and the demand for the domestic
bond is given by (4.7).
2. Under the robust probability measure $\mu_t$ the variance of the trend shock is distorted upwards:
$\tilde{\sigma}_{w,t}^2 > \sigma_w^2$ for any $\theta \in (\underline{\theta}, \infty)$. Thus, the robust forecast of the forward premium
overreacts to news:
$$\tilde{k}_t = \frac{a^2 \sigma_t^2 + \tilde{\sigma}_{w,t}^2}{a^2 \sigma_t^2 + 1 + \tilde{\sigma}_{w,t}^2} > k_t = \frac{a^2 \sigma_t^2 + \sigma_w^2}{a^2 \sigma_t^2 + 1 + \sigma_w^2}, \qquad \tilde{\sigma}_{w,t}^2 > \sigma_w^2$$
3. If the primitive utility function is risk neutral or there is no aversion to model uncertainty, robust beliefs are not distorted: $\tilde{\sigma}_{w,t}^2 \to \sigma_w^2$ if $\gamma \to 0$ or $\theta \to \infty$.
The upward distortion in the variance of the trend shock follows from applying the
robustness principle to uncertainty set $\mathcal{M}_w$. If agents fear misspecification in the equation of
the unobservable trend of the forward premium, the costliest misspecification occurs when
the agent's model wrongly sets the variance of the trend shock lower than what it actually
is. Therefore, under trend equation uncertainty, robustness entails distorting upwards the
variance of trend shocks, which in turn implies overreacting to news. To see the intuition
consider a fictitious game between the agent and nature. The agent tries to minimize
the mean squared error (MSE) of her estimate, while nature acts malevolently and tries to
maximize it. Since the estimator takes the observer form $\hat{x} = (1 - k) a c + k y$, as shown by
Lemma 6.2, the MSE is
$$E^{w} (\hat{x} - x)^2 = a^2 (1 - k)^2 \sigma_c^2 + (1 - k)^2 \tilde{\sigma}_w^2 + k^2$$
We can see that given the estimator (i.e., the gain $k$), nature can increase the MSE by
choosing a larger $\tilde{\sigma}_w^2$. To counteract this potential misspecification the robust estimator sets
the gain $k$ greater than what Bayes' law indicates. The degree of overreaction is determined
by the degree of aversion to model uncertainty $1/\theta$. The higher $1/\theta$, the greater $\tilde{\sigma}_w^2$ and thus
the greater $k$. This is because the higher $1/\theta$ (i.e., the lower $\theta$), the lower the penalization nature suffers when
it sets $\tilde{\sigma}_w^2$ farther away from its baseline level. This penalization is given by
$$\theta R(\mu \| \mu^0) = \frac{\theta}{2} \left( \frac{\tilde{\sigma}_w^2}{\sigma_w^2} - \log \frac{\tilde{\sigma}_w^2}{\sigma_w^2} - 1 \right).$$
The equilibrium exchange rate is given by (4.8). It follows directly from Proposition
5.2 that there is no forward premium puzzle if $\mu \in \mathcal{M}_w$. That is, there is no unconditional
negative link between the forward premium and spot exchange rate changes (i.e., no $\beta^{Fama} < 0$).
In fact, since the robust gain $\tilde{k}$ is strictly greater than the baseline gain $k^0$, the regression
coefficient in the Fama regression, $\beta^{Fama}$, must be greater than one. Similarly, Proposition
5.1 implies that if $\mu \in \mathcal{M}_w$, there cannot be delayed overshooting because the average impulse
response function must depreciate every period $t > 1$. To see this note that (5.7) is strictly
positive because $q^0 < k^0 < \tilde{k}$. That is, there is no delayed overshooting because the agent's
gain treats trend shocks as more abundant than they actually are in the data.
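The overreaction logic of this subsection can be illustrated with a short numerical sketch. It assumes the static setup described above, with observation variance normalized to one by default; the function names and parameter values are hypothetical and chosen only for illustration.

```python
def filter_gain(a: float, prior_var: float, trend_var: float, obs_var: float = 1.0) -> float:
    """Gain (a^2 s^2 + s_w^2) / (a^2 s^2 + s_w^2 + s_v^2) of the linear estimator."""
    predicted_var = a * a * prior_var + trend_var
    return predicted_var / (predicted_var + obs_var)


def mean_squared_error(k: float, a: float, prior_var: float,
                       trend_var: float, obs_var: float = 1.0) -> float:
    """MSE of x_hat = (1 - k) a c + k y: (1-k)^2 (a^2 s_c^2 + s_w^2) + k^2 s_v^2."""
    return (1.0 - k) ** 2 * (a * a * prior_var + trend_var) + k * k * obs_var


# Baseline trend variance 1 versus a distorted (larger) trend variance 2.
k_baseline = filter_gain(a=0.9, prior_var=1.0, trend_var=1.0)
k_robust = filter_gain(a=0.9, prior_var=1.0, trend_var=2.0)

# For a fixed gain, nature raises the MSE by choosing a larger trend variance.
mse_low = mean_squared_error(k_baseline, a=0.9, prior_var=1.0, trend_var=1.0)
mse_high = mean_squared_error(k_baseline, a=0.9, prior_var=1.0, trend_var=2.0)
```

In this sketch a larger trend-shock variance raises both the MSE at a fixed gain and the gain that minimizes it, so an agent who guards against that distortion puts more weight on the observation than the baseline Kalman gain prescribes.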
6.2. Drift uncertainty
Here, we consider uncertainty about the drift of the trend of the forward premium. In the
agent's baseline model the drift in the trend equation is $a$, but she fears that the true drift
is $a + \delta$. We allow the misspecification $\delta$ to take values on $[-a, 1 - a)$ so that the true
drift is positive and preserves mean-reversion: $a + \delta \in [0, 1)$. The following set of probability
measures captures such drift uncertainty, while keeping the variance of observation and trend
shocks equal to their baseline values (i.e., $\sigma_v^2 = \sigma_w^2 = 1$).
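$$\mathcal{M}_a = \left\{ \mu : \frac{d\mu}{d\mu^0} = \exp\left[ \frac{2 \delta x_t (x_{t+1} - a x_t) - \delta^2 (x_t)^2}{2} \right], \ \delta \in [-a, 1 - a), \ a \geq 0 \right\} \qquad (6.4)$$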
Lemma 6.3 below, which is a version of Girsanov's Theorem, establishes a one-to-one relationship between the set $\mathcal{M}_a$ and distortions to the drift.
Lemma 6.3 (Change of Measure (drift)). If under baseline probability measure $\mu^0$ the
random variables in (3.2) are distributed as $x_t | I_{t-1} \sim N(a\hat{x}_{t-1}, \sigma_{t-1}^2)$, $y_t | x_t \sim N(x_t, \sigma_v^2)$ and
$x_{t-1} \sim N(\hat{x}_{t-1}, \sigma_w^2)$, then:
Under any probability measure $\mu \in \mathcal{M}_a$, $x_t | I_{t-1} \sim N((a + \delta)\hat{x}_{t-1}, 1)$, while the distributions of $y_t | x_t$ and $x_{t-1}$ are preserved.
The relative entropy of measure $\mu$ with respect to measure $\mu^0$ is
$$R(\mu_t \| \mu^0) = \frac{1}{2} \delta_t^2 x_t^2 \quad \text{for any } \mu \in \mathcal{M}_a \qquad (6.5)$$
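The link between a unit-variance drift distortion and its relative entropy can be checked by simulation. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the standard Gaussian likelihood ratio for a shift of the conditional mean from $a x_t$ to $(a + \delta) x_t$ with unit shock variance, and all names and parameter values are hypothetical.

```python
import random


def log_likelihood_ratio(x_next: float, x_now: float, a: float, delta: float) -> float:
    """log of N((a + delta) x, 1) density over N(a x, 1) density, evaluated at x_next."""
    return delta * x_now * (x_next - a * x_now) - 0.5 * (delta * x_now) ** 2


def simulated_relative_entropy(x_now: float, a: float, delta: float,
                               n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo mean of the log likelihood ratio under the distorted model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Draw the next state from the distorted model with drift a + delta.
        x_next = (a + delta) * x_now + rng.gauss(0.0, 1.0)
        total += log_likelihood_ratio(x_next, x_now, a, delta)
    return total / n_draws


def closed_form_relative_entropy(x_now: float, delta: float) -> float:
    """0.5 * delta^2 * x^2, the expected log likelihood ratio under the distorted model."""
    return 0.5 * (delta * x_now) ** 2
```

The simulated average should approach the closed form as the number of draws grows, and both vanish when the drift distortion is zero.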
Under any model in the set $\mathcal{M}_a$, the drift of the trend is $a + \delta$. As before, notice
that for a fixed probability measure $\mu_t \in \mathcal{M}_a$ (or equivalently, given a drift distortion $\delta_t$),
the investor generates her estimates of $x_t$ and $x_{t+1}$ using Bayes' law. These estimates are
given by $\hat{x}_{t+1}^{\mu_t} \equiv E^{\mu_t}(x_{t+1} | I_t) = (a + \delta_t) \hat{x}_t^{\mu_t}$. In order to obtain a robust linear equilibrium,
we assume that the representative agent's prior is the baseline estimate of the trend. The
robust linear equilibrium is characterized by the next proposition.
Proposition 6.4 (Equilibrium Under Drift Uncertainty). Under uncertainty set $\mathcal{M}_a$
there exists a robust linear equilibrium if and only if the degree of uncertainty aversion $1/\theta$
is lower than a threshold $1/\theta_a$ given by (8.22). In this equilibrium:
1. The log exchange rate is
et
f
= it it
a + t+1
xt t + t ,
1 (a + t+1 )
t = t+1 + bt V ar (Jt ) ,
(6.6)
where V ar (Jt ) is given by (??).

2. The robust forecast of the interest rate dierential is given by
t
t
t+1
f
t
(6.7)
xt + kt+1 it+1 it+1 ,
xt+1 = 1 kt+1 (a + t+1 )
2
2
a + t+1 2t1 + 2w 2v
a + t+1 2t1 + 2w
t
and 2t =
with kt+1 =
2 2
2
2
2
a + t+1 t1 + w + v
a + t+1 2t1 + 2w + 2v
3. There is overreaction to news: $\delta_{t+1} > 0$.
This proposition says that in the presence of drift uncertainty the exchange rate and the
forecast overreact to news. This is because the gain is increasing in the drift distortion $\delta$,
which is positive in equilibrium. To see the intuition for why the drift distortion must be
positive, consider a static filtering problem in which the agent is confident about her model
linking noisy observations and trend, but is uncertain about the drift of the unobservable
trend component. That is, let $y = x + v$ and $x = (a + \delta) c + w$, where $N((a + \delta) c, \sigma_c^2)$ is the
prior distribution of $x$ and the disturbances follow distributions $w \sim N(0, \sigma_w^2)$ and $v \sim N(0, \sigma_v^2)$.
Then, the estimator of $x$ given $y$ is $\hat{x} = (a + \delta)(1 - k) c + k y$. Hence,
$$\hat{x} - x = (a + \delta)(1 - k) c + k y - x = (a + \delta)(1 - k) c + (k - 1)(a + \delta) c + (k - 1) w + k v.$$
Therefore, under drift uncertainty the MSE is
$$E^{a} (\hat{x} - x)^2 = (a + \delta)^2 (1 - k)^2 \sigma_c^2 + (1 - k)^2 \sigma_w^2 + k^2 \sigma_v^2 \qquad (6.8)$$
Since the MSE is increasing in the drift, and the trend is restricted to be positive and mean
reverting (i.e., $a + \delta \in [0, 1)$), the costliest misspecification occurs when the agent's model
wrongly sets the drift lower than what it actually is. Therefore, if her objective is to bound
the effects of misspecification on the mean square error of her estimator, she should distort
upwards the drift. This distortion implies overreacting to news. In other words, when she
observes a large realization of the interest rate differential she should fear that it is more
likely to come from a change in the trend than what her baseline model implies, and so
her best response is to put more weight on the observations than on her prior. Hence she
overreacts. The degree of overreaction depends on the value of the uncertainty aversion
coefficient $1/\theta$.
From the perspective of the exchange rate anomalies the bad news is that since there is
overreaction of forecasts to news, the equilibrium exchange rate generates neither the forward
premium puzzle nor delayed overshooting.
Notice that the exchange rate path with constant coefficient we derived in Section 5 to
analyze the anomalies does not apply under drift uncertainty. In our setup the estimate $\hat{x}_t$ of
a $t$ agent is the prior mean of a $t + 1$ agent. Since in Section 5 we consider a constant baseline
measure $\mu^0$, we can generate an equilibrium sequence $\{\hat{x}_t\}$. This procedure is appropriate for
both types of variance uncertainty. However, under drift uncertainty we cannot use the
estimate $\hat{x}_t$ of a $t$ agent as the prior mean of a $t + 1$ agent and at the same time assume
that the baseline probability measure $\mu^0$ of $t$ and $t + 1$ agents is the same. This is because,
under drift uncertainty, the robust estimate $\hat{x}_t$ is not an unbiased estimator of $x_t$ under the
baseline probability measure $\mu^0$.
6.3. Unstructured Uncertainty
Under unstructured uncertainty the investor fears model misspecification but does not know
the nature of the misspecification, and does not know whether it is located in the observation
equation, in the trend equation, or in both. Here, we follow the robust control literature
and assume that the uncertainty set is the set of all probability measures on the measurable
space $(\Omega, \mathcal{B}(\Omega))$, where $\mathcal{B}(\Omega)$ is the Borel algebra.15
$$\mathcal{M}_u = \mathcal{P}(\Omega) \equiv \{ \mu : \mathcal{B}(\Omega) \to [0, 1] \text{ and } \mu(\Omega) = 1 \} \qquad (6.9)$$
This set corresponds to a truly worst-case scenario!

Optimizing over the set $\mathcal{M}_u$ seems a daunting task. Fortunately, a result of the theory of
large deviations, the Representation Lemma of Dupuis and Ellis (1997), implies that the
investor's problem simplifies significantly.
Lemma 6.5 (Representation Lemma). Under unstructured uncertainty, the robust forecasting-portfolio problem (3.5) reduces to the following Bayesian problem under the unique baseline
probability measure $\mu^0$.
$$V_t = \max_{b_t} \inf_{\mu \in \mathcal{P}(\Omega)} E_t^{\mu} \left[ u(W_{t+1}) + \theta R(\mu \| \mu^0) \right] \qquad (6.10)$$
$$\phantom{V_t} = \max_{b_t} \; -\theta \log E_t^0 \exp\left( -\frac{1}{\theta} u(W_{t+1}) \right) \qquad (6.11)$$
This result is striking. It says that the agent's problem reduces to an expected utility
maximization problem under a unique probability measure. Moreover, this unique probability measure is the baseline measure $\mu^0$. This is a Bayesian problem, like the ones considered
in rational expectations models. Under the equivalent representation the relative entropy
disappears and $\inf_{\mu \in \mathcal{P}(\Omega)} E_t^{\mu} u(W_{t+1})$ is replaced by the so-called risk-sensitive utility function
$-\theta \log E_t^0 \exp\left( -\frac{1}{\theta} u(W_{t+1}) \right)$. This function keeps the baseline probability measure
unchanged and, because of the exponential function, captures the desire for robustness by
15
The set of all probability measures on $\Omega$ is compact by Alaoglu's theorem. See, for instance, Folland (2001),
page 169.
putting more weight on the tails of the distribution. Hence a risk-sensitive agent is more
concerned about tail events than a typical risk-averse agent.16
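The equivalence between the entropy-penalized worst case and the risk-sensitive objective can be verified on a small discrete example. The sketch below is an illustration, not the authors' code: it uses a three-point baseline distribution with hypothetical payoffs and a hypothetical value of the penalty parameter, and relies on the standard result that the minimizing measure tilts the baseline exponentially toward low payoffs.

```python
import math


def risk_sensitive_value(payoffs, baseline_probs, theta):
    """-theta * log E0[exp(-u / theta)] under the baseline probabilities."""
    expectation = sum(p * math.exp(-u / theta) for u, p in zip(payoffs, baseline_probs))
    return -theta * math.log(expectation)


def worst_case_measure(payoffs, baseline_probs, theta):
    """Exponentially tilted measure, proportional to p0 * exp(-u / theta)."""
    weights = [p * math.exp(-u / theta) for u, p in zip(payoffs, baseline_probs)]
    total = sum(weights)
    return [w / total for w in weights]


def penalized_objective(payoffs, probs, baseline_probs, theta):
    """E_q[u] + theta * relative entropy of q with respect to the baseline."""
    expected = sum(q * u for q, u in zip(probs, payoffs))
    entropy = sum(q * math.log(q / p) for q, p in zip(probs, baseline_probs) if q > 0.0)
    return expected + theta * entropy


payoffs = [-2.0, 0.0, 1.0]        # hypothetical utility levels in three states
baseline = [0.2, 0.5, 0.3]        # hypothetical baseline probabilities
theta = 0.7                       # hypothetical penalty parameter

value = risk_sensitive_value(payoffs, baseline, theta)
tilted = worst_case_measure(payoffs, baseline, theta)
```

Evaluating `penalized_objective` at `tilted` reproduces `value`, while any other measure (including the baseline itself) gives a weakly larger number. The tilted measure also assigns more probability to the worst state than the baseline does, which is the sense in which the risk-sensitive agent overweights the tails.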
From the perspective of accounting for the exchange rate anomalies the bad news is
that in this problem the separation principle applies: expectations can be computed
independently of the portfolio strategy. The investor forms her expectations using Bayes' law
under the baseline probability measure $\mu^0$. Based on these expectations she then chooses her
portfolio. This separation implies that the gain that will appear in the equilibrium exchange
rate function is the Bayesian gain $k^0$. Thus, we will not get the underreaction of forecasts to
news ($\tilde{k} < k^0$) that underpins the explanation for delayed overshooting and for the forward
premium puzzle in Propositions 5.1 and 5.2. In order to see whether these anomalies can
be generated by other mechanisms, such as a time-varying risk premium, let us derive the
equilibrium.
Since u(Wt+1 ) = exp(Wt+1 ), we can express problem t in (6.11) as follows
1
0
t = max log Et exp exp(Wt+1 )
bt
1
0
min Et exp exp(Wt+1 )
bt
1
f
0
min Et exp exp bt et+1 et + it it
bt
We show in the appendix that taking the first order condition for $b_t$, using Stein's Lemma
and the exchange rate conjecture (3.7), it follows that there is an interior solution for $b_t$ only
if returns satisfy the following condition:
$$E_t^0(e_{t+1}) - e_t = i_t - i_t^f + l_t. \qquad (6.12)$$
Condition (6.12) is the well-known uncovered interest parity condition plus a time-varying
risk premium $l_t$, which is given by
V art (et+1 ) Et g 0 (Jt+1 )

f
lt : =
, where Jt+1 it it (et+1 et )(6.13)
0
Et g (Jt+1 )
1
s
s
g(Jt+1 ) : = exp (bt Jt+1 ) exp exp (bt Jt+1 ) ,
1
g 0 (Jt+1 ) = 1 + exp (bt Jt+1 ) bst g (Jt+1 ) .
Notice that when the risk-aversion coefficient $\gamma$ is large, $g'(J_{t+1})$ is negative and thus the
risk premium is large and positive. In contrast, in the risk-neutral case ($\gamma = 0$), the risk premium becomes zero (because $g'(J_{t+1}) = 0$). This result is striking: risk-neutrality combined
with a desire for robustness against unstructured uncertainty yields the same equilibrium
as risk neutrality under no model uncertainty. Finally, when there is no aversion to model
uncertainty (i.e., $\theta$ goes to infinity), the risk premium $l_t$ equals $\gamma b_t^s Var_t^0(e_{t+1})$, which is the
16
The risk sensitive objective function has been introduced by Jacobson (1973).
same as in a standard rational expectations equilibrium.17

From pricing equation (6.12) we can find the equilibrium exchange rate. For the sake of
generality suppose that under the baseline model, instead of $v_t \sim N(0, 1)$ and $w_t \sim N(0, 1)$, we
have $v_t \sim N(0, \sigma_v^2)$ and $w_t \sim N(0, \sigma_w^2)$.
Proposition 6.6 (Equilibrium). Under unstructured uncertainty, in a robust linear equilibrium the log exchange rate is
a
xt + t , t+1 = t + lt ,
et = i ift
1a
where the estimate of the state is given by the standard Kalman filter under the unique
0
baseline probability measure
0
0
xt1 + kt i ift
xt = 1 kt a
2
2 2
2
a
a2 2t + 2w
kt 2 2
, 2t = 2 2 t1 2 w v2
2
2
a t + w + v
a t1 + w + v
a
and lt is given by (6.13) evaluated at 1 = 1a
, 2 = 1 and bt = bst .
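The standard scalar Kalman filter referred to in Proposition 6.6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a trend $x_t = a x_{t-1} + w_t$ observed with noise as $y_t = x_t + v_t$, uses the conventional timing in which the gain at $t$ is built from the posterior variance at $t - 1$, and all function and argument names are hypothetical.

```python
def kalman_step(x_prev: float, var_prev: float, y: float,
                a: float, var_w: float, var_v: float):
    """One update of the scalar Kalman filter.

    Returns the new estimate, its posterior variance, and the gain used.
    """
    predicted_var = a * a * var_prev + var_w           # variance of x_t given past data
    gain = predicted_var / (predicted_var + var_v)     # weight placed on the new observation
    x_new = (1.0 - gain) * a * x_prev + gain * y       # x_hat_t = (1 - k) a x_hat_{t-1} + k y_t
    var_new = predicted_var * var_v / (predicted_var + var_v)
    return x_new, var_new, gain


def steady_state(a: float, var_w: float, var_v: float, n_iter: int = 500):
    """Iterate the variance recursion until it settles; return (variance, gain)."""
    var, gain = var_w, 0.0
    for _ in range(n_iter):
        _, var, gain = kalman_step(0.0, var, 0.0, a, var_w, var_v)
    return var, gain
```

Because the variance recursion does not depend on the data, the gain converges to a constant, which is the constant-coefficient case the anomaly analysis works with.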
Can the equilibrium exchange rate $e_t$ generate the anomalies? Since $e_t$ has the same
linear form as (4.8), we know from (5.7) and (5.8) in Propositions 5.1 and 5.2 that the
first two terms in $e_t$ do not generate momentum and give rise to $\beta^{Fama} = 1$ because the
robust gain equals the Bayesian gain $k_t^0$. This implies that the underreaction-of-forecasts
mechanism is not operational under unstructured uncertainty.18
In principle, the anomalies could be generated if the time-varying risk premium $l_t$ generates
a sequence of accumulated risk-premium terms in $e_t$ that is negatively correlated with the interest rate differential.
This question cannot be answered analytically as $l_t$ is highly nonlinear. We use Monte-Carlo
simulations to derive equilibrium series of $l_t$ and of the accumulated risk-premium term for different values of the drift $a$ and the
cost of robustness $\theta$.19 In each case, we first generate a large number of shocks $(w_t, v_t)$ for
every time period, and use them to simulate a large number of next-period prices $e_{t+1}$ given
the previous-period values of both series. The resulting prices are used to compute the expectation terms in (6.13). We then
estimate the two series by solving the non-linear equation (6.13). Finally, we compute the
Fama coefficients with the estimated parameters. The simulation results show that $\beta^{Fama}$ is
negative only when $a$ and $\theta$ are small. The simulations do not generate any negative $\beta^{Fama}$
when $a$ is close to 1.
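The final step of such an exercise, computing the Fama coefficient from simulated series, is the slope of a regression of exchange rate changes on the forward premium. The sketch below illustrates only that step: the data-generating process is a hypothetical benchmark in which uncovered interest parity holds up to an unforecastable error, not the model simulated in the text, and all names and parameter values are assumptions.

```python
import random


def fama_coefficient(depreciation, forward_premium):
    """OLS slope cov(depreciation, premium) / var(premium)."""
    n = len(forward_premium)
    mean_y = sum(forward_premium) / n
    mean_d = sum(depreciation) / n
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(depreciation, forward_premium))
    var = sum((y - mean_y) ** 2 for y in forward_premium)
    return cov / var


def simulate_uip_benchmark(n_periods: int = 20000, a: float = 0.9, seed: int = 0):
    """AR(1) forward premium; next-period depreciation equals the premium plus noise."""
    rng = random.Random(seed)
    y = 0.0
    premium, depreciation = [], []
    for _ in range(n_periods):
        y = a * y + rng.gauss(0.0, 1.0)
        premium.append(y)
        depreciation.append(y + rng.gauss(0.0, 1.0))   # unforecastable error only
    return depreciation, premium


depreciation, premium = simulate_uip_benchmark()
beta_benchmark = fama_coefficient(depreciation, premium)
```

In this benchmark the estimated slope is close to one. A negative slope would require the risk-premium component to covary negatively and strongly enough with the forward premium.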
7. Literature Review
Following Bilson (1981) and Fama (1984), the forward premium puzzle has been documented
for many currency pairs and time periods. Engel (1996) and Hodrick (1987) survey the
literature. Several mechanisms have been proposed to account for the failure of uncovered interest parity. One group of papers concentrates on a time-varying risk premium
(for instance, Alvarez, Atkeson and Kehoe (2006), Lustig and Verdelhan (2006)). Another
17
This is because lim g(S) = exp (bt S) and lim g 0 (S) = exp (bt S).
18
A similar result obtains in H-infinity filtering (see Basar and Bernhard (1995)).
19
We set $\sigma_v^2 = \sigma_w^2 = 1$ under the baseline model.
group of papers looks at the microstructure of the trading mechanism (for instance, Bacchetta
and Van Wincoop (2006), Burnside, Eichenbaum and Rebelo (2007), and Evans and Lyons
(2002)). A third group of papers concentrates on learning and expectational biases (for instance, Kaminsky (1993), Lewis (1995), Frankel and Froot (1989), Frankel and Rose (1994),
Gourinchas and Tornell (2004)). The forward premium puzzle and delayed overshooting are
linked to the long-horizon predictability of exchange rates that has been found by Mark
(1995) and others.
Using exchange rate survey data, Frankel and Froot (1989) find that expectational biases
can better account for the forward premium puzzle. Using interest rate survey data, Gourinchas and Tornell (2004) find that forecasts of interest rate differentials typically underreact
to news and that this underreaction can account for the exchange rate anomalies. Our paper
characterizes the conditions under which this specific distortion in beliefs regarding second
moments arises, and shows that, if agents want to be robust, this distortion entails neither a
violation of optimality nor incentives to exploit the resulting predictable excess returns under
the data-generating model.
In formulating and solving the agent's problem, like Hansen et al. (2002), we use robust
control, which has been recently developed in the mathematical and engineering literature.
The starting point is the recognition that the models used by real world agents are simply
approximations to the true unobservable model, and there is no reason to maintain that
the resulting misspecification patterns are known to the agents or can be parametrized by
a known probability distribution, as is done in the traditional stochastic approach. Robust
control was developed in reaction to shortcomings of optimal control. In the 1980s it was
widely recognized that, when the state must be estimated, the resulting optimal controllers
do not possess good robustness properties. Although an optimal controller leads to the best
performance in the case of no misspecification, it leads to bad results and even instabilities
in the face of small misspecifications. And when you plan to launch a missile, you would
rather be sure this does not occur.20 Robust control addressed the issue of how accurate the
model and the description of uncertainty should be, and how the performance index should
be defined to guarantee the robustness of the controlled system against misspecifications.
Lars Hansen, Tom Sargent and coauthors have used robust control to analyze several
macroeconomic issues. Hansen et al. (1999) consider robust agents that suspect specification
errors and design strategies that are robust against misspecification. They show that large
market-based measures of risk aversion can emerge from concern about small specification
errors. Hansen et al. (2002) extend the above paper to a hidden state setup and solve a joint
robust filtering and control problem, as we do in this paper. Hansen et al. (2005) study the
connection between alternative robust control problems and the min-max utility of Gilboa
and Schmeidler (1989) and Epstein and Schneider (2003). Kasa (200?) and Onatski and
Stock (2000) have also used robust control to analyze economic problems.
This paper focuses on the robust filtering problem. In the robust control literature
20
This point is nicely made by Peter Huber in an influential book on classical robust statistics. Referring
to the assumptions one makes when using probability distributions he says (Huber (1981), pg. 1): "The
assumptions are not supposed to be exactly true; they are mathematically convenient rationalizations of
an often fuzzy knowledge... one justifies their use by appealing to a vague continuity or stability principle:
a minor error in the mathematical model should cause only a small error in the final conclusions. Unfortunately, this does not always hold. During the past decade one has become increasingly aware that some
of the most common statistical procedures are excessively sensitive to seemingly minor deviations from the
assumptions..."
an important distinction is made between global (unstructured) and specific (structured)
uncertainty. The result that under global uncertainty the robust estimator equals the baseline
Bayes estimator can be found in Basar and Bernhard (1995). A contribution of this paper
is to derive underreaction and overreaction of forecasts and prices from different types of
structured uncertainty.
In deriving overreaction and underreaction from an optimizing setup, this paper is related to Benabou and Tirole (2002), henceforth BT. They consider agents with imperfect
knowledge about their ability and with hyperbolic time discounting. There is a self-deception
game between self-0 and self-1 in which self-0 can repress bad or good news by incurring
a cost. If the repression cost is small and ability and effort are complements (substitutes),
a motivation effect dominates an overconfidence effect and bad (good) news are repressed.
Thus agents might persuade themselves that their skill level is greater (lower) than it actually
is, and motivate themselves to exert more effort. In our setup there is also an information
game: between nature and the agent. BT's repression cost is analogous to our parameter $\theta$
that multiplies the relative entropy between the baseline and alternative probability measures. Since in our setup under- and over-estimates are treated symmetrically, the source of
over (under) confidence derives from the type of uncertainty, not from the fact that ability
and effort are either complements or substitutes as in BT. If we were to consider payoff
functions in which ability and effort are complements or substitutes, effects similar to those
identified by BT would interact with the effects we identify in this paper.
8. Conclusions
This paper contributes to the international finance literature by showing that robustness
against a class of model misspecifications can account for the forward premium puzzle in
its strong form (a negative Fama coefficient) and delayed overshooting (i.e., momentum).
Specifically, a desire for robustness against observation uncertainty leads optimizing agents
that hold no misperception to distort the probability distribution of the data-generating
process and to make forecasts of interest rate differentials that are less sensitive to news
than standard Bayesian forecasts. In equilibrium, this underreaction in the bond market
generates the anomalies in the foreign exchange market.
Simply invoking robustness against model uncertainty is not sufficient to account for the
exchange rate anomalies. The forecast underreaction requires a more refined description of
model uncertainty than that contained in the so-called unstructured uncertainty typically
considered in the literature. This paper contributes to the robust macroeconomics literature
by refining the uncertainty set so as to capture specific ways in which the model can be misspecified, and by deriving Girsanov-like change of measure results that translate probability
measure sets into parameter distortions. This translation allows for closed-form solutions to
the agent's problem and the equilibrium exchange rate, and allows us to establish a link with
the empirical literature on belief distortions. We find that in the class of linear exchange
rate models, structured uncertainty about the link between interest rate differential observations and the underlying unobservable trend process that drives them can generate the
anomalies. This is not the case for uncertainty about the trend process or for unstructured
uncertainty.
The results in this paper have not solved the puzzles, but have simply pushed them one
level deeper. The question remains as to why the representative agent is more fearful about
uncertainty in the observation equation than in the trend equation. This is a difficult question
to answer in a representative agent setup. One could think that such observation uncertainty
could capture more complicated uncertainty patterns that might exist in a heterogeneous
agent setup. For instance, if there is uncertainty about the liquidity or solvency of other
market participants, a big price change might simply reflect the unwinding of positions of
market participants in need of liquidity rather than a persistent shock to fundamentals.
Finally, we would like to note that accounting for the forward premium puzzle can help
us explain the profitability of the so-called carry trade, under which investors borrow in low
interest rate currencies and invest in high interest rate currencies. In this paper all agents
are alike and so do not engage in the carry trade. In a future paper we plan to generate the
carry trade by considering heterogeneous investors with different desires for robustness.
References
[1] Alvarez, Fernando, Andrew Atkeson and Patrick Kehoe, 2007, Time-Varying
Risk, Interest Rates, and Exchange Rates in General Equilibrium, mimeo.
[2] Basar, T. and P. Bernhard, 1995, H∞-Optimal Control and Related Minimax Design
Problems, Birkhäuser.
[3] Burnside, Eichenbaum, Kleshchelski and Rebelo, 2006, The Returns to Currency Speculation, mimeo.
[4] Casella, George and Roger L. Berger, Statistical Inference.
[5] Dupuis, Paul and Richard S. Ellis, 1997, A weak convergence approach to the theory of
large deviations, John Wiley&Sons.
[6] Dornbusch, R., 1976, Expectations and Exchange Rate Dynamics, Journal of Political
Economy, 84, 1161-1176.
[7] Eichenbaum, Martin and Charles Evans, 1995, Some Empirical Evidence on the Effects
of Monetary Policy Shocks on Exchange Rates, Quarterly Journal of Economics.
[8] Epstein, L. and M. Schneider, Recursive Multiple Priors, Journal of Economic Theory,
2003, 1-31.
[9] Evans, Martin and Richard Lyons, 2002, Order Flow and Exchange Rate Dynamics,
Journal of Political Economy, 110, 1.
[10] Fama, Eugene, 1984, Forward and Spot Exchange Rates, Journal of Monetary Economics, 14 (4), 319-338.
[11] Frankel, Jeffrey and Kenneth Froot, 1989, Forward Discount Bias: Is it an Exchange
Risk Premium? The Quarterly Journal of Economics, 104, 1, 139-161.
[12] Folland, G., 2001, Real Analysis: Modern Techniques and Their Applications (Third
Edition).
[13] Gilboa, Itzhak; David Schmeidler, 1989, Maxmin Expected Utility with Non-unique
Prior, Journal of Mathematical Economics, 18 (1989) 141-153.
[14] Gourinchas, Pierre O and Aaron Tornell, 2004, Exchange Rate Dynamics and Misperception, Journal of International Economics.
[15] Hansen, L. P., T. J. Sargent, G. Turmuhambetova, N. Williams, 2005, Robust Control
and Model Misspecification, working paper, Stanford University.
[16] Hansen, L. P., T. J. Sargent, and T. Tallarini, Robust Permanent Income and Pricing,
Review of Economic Studies
[17] Hansen, L. P., T. J. Sargent, N. Wang, Robust Permanent Income and Pricing with
Filtering.
[18] Kaminsky, Graciela, 1993, Is There a Peso Problem? Evidence from the Dollar/Pound
Exchange Rate, American Economic Review, 83, 450-472.
[19] Lewis, Karen, 1989, Changing Beliefs and Systematic Rational Forecast Errors with
Evidence from Foreign Exchange, American Economic Review, 79, 621-636.
[20] Lustig, H. and Verdelhan, A., 2007, The Cross-Section of Foreign Currency Risk Premia
and Consumption Growth Risk, mimeo.
[21] Mark, Nelson 1995, Exchange Rates and Fundamentals: Evidence on Long-Horizon
Predictability, American Economic Review, 85, 201-218.
Appendix
Derivation ofh Equation (3.1). At
i time t + 1, the representative old investor obtains
f
Et+1
a return Wt+1 = exp(i) Et exp(it ) bt . If we take a Taylor expansion around zero and
let et := log(Et ), we get
i
h
Wt+1 = bt exp(i) exp(et+1 et + ift )
h
i
f
= bt (1 + i + o2 (2)) (1 + et+1 et + it + o1 (2))
h
i
= bt (et+1 et ) + i ift + o (2) ,
where o(2) = o1 (2) + o2 (2) and
1
1
o1 (2) = exp ( 1 ) et+1 et + ift , o2 (2) = exp ( 2 ) i2 ,
2
2
f
1 0, et+1 et + it , 2 (0, i)
Clearly, limxi 0 oix(xi i ) = 0, for i = 1, 2 where x1 = et+1 et + ift and x2 = i. Thus the terms
o1 (2) and o2 (2) are approximately zero if et+1 et + ift and i are small.
n
h
io
is a monotonic
exp
ift
Lemma. For a fixed i, u1 = E exp bt exp (i) EEt+1
t
n
h
io
.
transformation of u2 = E exp bt i ift et+1 + et
Proof. Suppose that under utility function u1
bt,1 , Et+1,1 , Et,1 , ift,1 3 bt,2 , Et+1,2 , Et,2 , ift,2

Et+1,1
= E exp bt exp (i)
exp ift,1
Et,1

Et+1,2
exp ift,2
u2 = E exp bt exp (i)
Et,2
u1
for any measure .

Et+1,1
Et+1,2
f
exp (i)
exp it,1 exp (i)
exp ift,2
Et,1
Et,2
dividing both sides by exp (i)

1
Et+1,1
Et+1,2
exp ift,1 i 1
exp ift,2 i
Et,1
Et,2
f
f
exp i it,1 et+1,1 + et,1 exp i it,2 et+1,2 + et,2
i ift,1 et+1,1 + et,1 i ift,2 et+1,2 + et,2
i
h
exp bt i ift,1 et+1,1 + et,1
h
i
exp bt i ift,2 et+1,2 + et,2
n
h
io
u1 E exp bt i ift,1 et+1,1 + et,1
n
h
io
u2 E exp bt i ift,2 et+1,2 + et,2
Therefore, Et+1,1 , Et,1 , ift,1 3 Et+1,2 , Et,2 , ift,2 . QED
To show that if we let go to zero, the primitive utility function becomes risk neutral con0
sider the following monotonic transformation of the primitive utility function: 1 E (exp (Wt+1 ) 1)
). The following Lemma computes the limit.
Lemma. lim0 1 E (exp (Wt+1 ) 1) = E (Wt+1 ).
Proof. LHopital rule implies that
1
0
lim E (exp (Wt+1 ) 1) + <(|| )

0
1
0
= E lim (exp (Wt+1 ) 1) + <(|| )

0
= E lim Wt+1 exp (Wt+1 ) + <(|| )

0
= E [Wt+1 + <(||0 )]
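Proof of Lemma 4.1. Notice that if under the baseline model $\mu^0$, $y_t \sim N(x_t, \sigma_v^2)$, then
under any measure $\mu \in \mathcal{M}_v$, we have that $y_t \sim N(x_t, \tilde{\sigma}_v^2)$. To show this we compute the
cumulative distribution of $y_t$:
$$P^{\mu}(y < t) = \int_{\{y<t\}} d\mu = \int_{\{y<t\}} \frac{d\mu}{d\mu^0} \, d\mu^0$$
$$= \int_{\{y<t\}} \sqrt{\frac{\sigma_v^2}{\tilde{\sigma}_v^2}} \exp\left[ -\frac{1}{2} \left( \frac{1}{\tilde{\sigma}_v^2} - \frac{1}{\sigma_v^2} \right) (y - x_t)^2 \right] d\mu^0$$
$$= \int_{\{y<t\}} \sqrt{\frac{\sigma_v^2}{\tilde{\sigma}_v^2}} \exp\left[ -\frac{1}{2} \left( \frac{1}{\tilde{\sigma}_v^2} - \frac{1}{\sigma_v^2} \right) (y - x_t)^2 \right] \frac{1}{\sqrt{2\pi} \, \sigma_v} \exp\left[ -\frac{(y - x_t)^2}{2 \sigma_v^2} \right] dy$$
$$= \int_{\{y<t\}} \frac{1}{\sqrt{2\pi} \, \tilde{\sigma}_v} \exp\left[ -\frac{(y - x_t)^2}{2 \tilde{\sigma}_v^2} \right] dy$$
which shows that the density function of $y_t$ is $N(x_t, \tilde{\sigma}_v^2)$. The relative entropy is given by
$$R(\mu \| \mu^0) = E^{\mu}\left[ \log \frac{d\mu}{d\mu^0} \right] = E^{\mu}\left[ -\frac{1}{2} \left( \frac{1}{\tilde{\sigma}_v^2} - \frac{1}{\sigma_v^2} \right) (y_t - x_t)^2 + \frac{1}{2} \log \sigma_v^2 - \frac{1}{2} \log \tilde{\sigma}_v^2 \right]$$
$$= -\frac{1}{2} \left( \frac{1}{\tilde{\sigma}_v^2} - \frac{1}{\sigma_v^2} \right) E^{\mu}\left[ v_t^2 \right] + \frac{1}{2} \log \sigma_v^2 - \frac{1}{2} \log \tilde{\sigma}_v^2$$
$$= \frac{1}{2} \left( \frac{\tilde{\sigma}_v^2}{\sigma_v^2} - 1 - \log \frac{\tilde{\sigma}_v^2}{\sigma_v^2} \right)$$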
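Proof of Lemma 4.2. Notice that for a fixed probability measure $\mu_t$ (or given the
distorted variance $\tilde{\sigma}_{v,t}^2$), the investor makes her estimates of $x_t$ and $x_{t+1}$ using Bayes' law.
These estimates are given by the Kalman filter: $\hat{x}_{t+1}^{\mu_t} \equiv E^{\mu_t}(x_{t+1} | I_t) = a \hat{x}_t^{\mu_t}$, with
$$\hat{x}_t = E(x_t | I_t) = (1 - k_t) a \hat{x}_{t-1} + k_t y_t$$
$$x_t | I_t \sim N\left( \hat{x}_t, \ \frac{(a^2 \sigma_{t-1}^2 + 1) \tilde{\sigma}_{v,t}^2}{a^2 \sigma_{t-1}^2 + 1 + \tilde{\sigma}_{v,t}^2} \right), \qquad x_{t+1} | I_t \sim N\left( a \hat{x}_t, \ a^2 \frac{(a^2 \sigma_{t-1}^2 + 1) \tilde{\sigma}_{v,t}^2}{a^2 \sigma_{t-1}^2 + 1 + \tilde{\sigma}_{v,t}^2} + 1 \right)$$
$$k_t = \frac{a^2 \sigma_{t-1}^2 + 1}{a^2 \sigma_{t-1}^2 + 1 + \tilde{\sigma}_{v,t}^2}$$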
34
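The filter recursion above can be iterated to its fixed point. The sketch below assumes, as in the variance formulas of this proof, a state shock with unit variance; it iterates the posterior variance and gain for a given observation variance and compares the limit with the positive root of the implied quadratic $a^2\sigma^4 + (1 + \tilde{\sigma}_v^2 - a^2\tilde{\sigma}_v^2)\sigma^2 - \tilde{\sigma}_v^2 = 0$. Function names and parameter values are hypothetical.

```python
import math

def kalman_step(prev_var, a, obs_var):
    """One update of the gain and posterior variance (state shock variance = 1)."""
    prior_var = a * a * prev_var + 1.0       # variance of x_t given I_{t-1}
    gain = prior_var / (prior_var + obs_var)
    post_var = obs_var * gain                # obs_var * prior_var / (prior_var + obs_var)
    return gain, post_var

def steady_state(a, obs_var, tol=1e-13, max_iter=10000):
    """Iterate the variance recursion until it stops moving."""
    var = 0.0
    for _ in range(max_iter):
        gain, new_var = kalman_step(var, a, obs_var)
        if abs(new_var - var) < tol:
            return gain, new_var
        var = new_var
    raise RuntimeError("variance recursion did not converge")

def steady_state_closed_form(a, obs_var):
    """Positive root of the steady-state quadratic for the posterior variance."""
    c = 1.0 + obs_var - a * a * obs_var
    return (-c + math.sqrt(c * c + 4.0 * a * a * obs_var)) / (2.0 * a * a)

a = 0.9                                       # hypothetical persistence
for obs_var in (1.0, 4.0):                    # baseline vs. inflated observation variance
    gain, var = steady_state(a, obs_var)
    print(obs_var, gain, var, steady_state_closed_form(a, obs_var))
```

With these values the gain falls when the observation variance is inflated, which is the underreaction mechanism the proof relies on.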
Under probability measure t the excess rate of return (denoted by Jt+1 ) is

Jt+1 (i ift ) (et+1 et )
t+1
2 i ift+1 + et + (i ift )
= t+1 1 xt+1
f
t
= t+1 (1 kt+1 ) 1 a
xt (kt+1 1 + 2 ) i it+1 + et + (i ift )
The $t$ agent knows that the $t+1$ agent will (i) use the same method to distort the probability measure $\nu_{t+1}$ as the one used by the $t$ agent, and (ii) make forecasts with Bayes' law under this $\nu_{t+1}$. Therefore, the $t$ agent knows that $\hat{x}_{t+1}^{\nu_{t+1}}$ will be $\hat{x}_{t+1}^{\nu_{t+1}} = (1 - k_{t+1})\, a\, \hat{x}_t^{\nu_t} + k_{t+1}\left(i - i^f_{t+1}\right)$. Equations (4.4) and (4.5) follow directly.

We solve problem (3.5) by considering the investor as a Stackelberg leader, taking into account the strategy of nature: $\tilde{\sigma}_{v,t}^2 = s(b_t, e_t)$. Nature then selects $\tilde{\sigma}_{v,t}^2$ conditioning on the agent's choice of $b_t$. Notice that $\tilde{\sigma}_{v,t}^2$ affects the investor's payoff through its effect on $E(J_t)$ and $Var(J_t)$.$^{21}$ The first order conditions are:
!
(bt )2 V art (Jt+1 )
(bt )2
t
t
V ar (Jt+1 )
=
exp bt E (Jt+1 ) +
2
2
2v,t
2v,t
!
1
1
(8.1)
2
+
2 2v
v,t
= 0
!
2
(b
)
t
V art (Jt+1 )
= E t (Jt+1 ) + 2 bt V art (Jt+1 ) exp bt E t (Jt+1 ) +
bt
2
2v,t
d
+ 2
v,t dbt
i
h h
i
2 bt V art (Jt+1 )
= t+1 + a ( 1 + 2 ) xt t + et + i ift
!
2
)
(b
t
exp bt E t (Jt+1 ) +
V art (Jt+1 )
2
(8.2)
= 0
In the second equality we use the envelope theorem:
for the investors problem is 0 >
2
,
b2t
2v,t
= 0. The second order condition
where
!2
2
2
d
d2
(b)
(b)
2
2
v,t
v,t
v,t (b)
2
2
2
2
2
2
2
2
=
(b,
(b))+
(b,
(b))
(b,
(b))
+
(b,
(b))
+
bb
bv
v,t
v
v
v,t
v,t
v,t
v,t
2
bt
db
db
db2
Notice that the total derivative of natures FOC 2v,t (b,
2
2v,t = 0.
2v,t b db+
2v,t d
v,t (b)) = 0 is
21
We make a conjecture here that et = t + 1 x

t t + 2 i ift will make E (Jt+1 ) not dependent on
2v
, as it is in the equilibrium.
Thus,
2v,t b
d
2
v,t (b)
=
db
2v,t 2v,t
Combining this equation with b2v,t = 2v,t b , the investors SOC becomes
2v,t b (b,
2
v,t (b))

+
= bb (b,
2
2
2v,t (b,
v,t (b)) + b
v,t (b))
2
bt
2v,t 2v,t (b,
2
v,t (b))
2
2
2
(b,
(b))
v,t
= bb (b,
v,t b
2v,t 2v,t (b,
2
2
v,t (b))
v,t (b)) 0
2 2 (b, 2
(b))
v,t
v,t v,t
This condition is unambiguously satisfied because bb 0 :
h
i
2
2
1
2
bb = E (J) + bt V ar (J) + V ar (J) exp (bt ) E (J) + (bt ) V ar (J)

2
Natures second order condition is

2

(bt ) 1
(bt )2
2 2 =
2
2

v,t
V ar (J)
2v,t
!2
V ar (J)
2 2

v,t
!
(bt )2
1
exp bt E (J) +
V ar (J) + 2 0
2
2
2v,t
+
It holds holds if and only if t
!2
!
2
2
)
(J)
V
ar
(J)
)
(b
(b
V
ar
2
t
t
t
v,t bt
+
V ar (J)
2 2 exp bt E (J) +
2
2
2v,t

v,t
(8.3)
The sign of the lower bound t equals the sign of the bracketed term. To derive this sign
2
2
(a2 2t1 +1)
2
2 V ar (J)
2 V ar (J)
1
2
= 2a [kt+1 1 + 2 ] 2 2
0,
note that
2
3 0. Since 2 (bt )
2v,t
(
2v,t )
(a t1 +1+2v,t )
this term has an ambiguous sign. Factoring [kt+1 1 + 2 ]2 out, the sign of the bracketed
term equals the sign of
!2
2
2
2 2
2 2
a t1 + 1
a t1 + 1
(bt )2
2
2
2
[kt+1 1 + 2 ] a
2 + 1 2a 2
3
2
2v,t
2v,t
a2 2t1 + 1 +
a2 t1 + 1 +
It then follows that $\underline{\theta}_t$ is positive if $(\gamma b_t)^2$ is large enough. If $(\gamma b_t)^2$ is small, the restriction on $\theta$ is not binding.
Next we characterize the solution. The first order condition for bt implies that
i
h
t+1 + a ( 1 + 2 ) xt + et + it ift bt V ar (J) = 0
2v
Equation (4.7) for bt in text follows from this condition. The first order condition for
implies that
!
(bt )2 V ar (J)
1
1
1
2
=
2
exp bt E (J) + (bt ) V ar (J) 0
2v
2
v,t
2v,t
(8.4)
Since V ar (Jt+1 |It ) = [(kt+1 1 + 2 )]2 t , we have that

dt
V ar (Jt+1 |It )
= [(kt+1 1 + 2 )]2 2
2
v,t
d
v,t
2
2 2
a t1 + 1
= [(kt+1 1 + 2 )]2 a2
2 + 1
2v,t
a2 2t1 + 1 +
0,
which implies $\tilde{\sigma}_{v,t}^2 \geq \sigma_v^2$. Equation (8.4) also implies that $\tilde{\sigma}_{v,t}^2 = \sigma_v^2$ as $\theta$ goes to infinity or $\gamma = 0$. QED
Proof of Proposition 4.3. The solution to the equilibrium condition $b_t(e_t) = b^s_t$ implies
et = i ift + a ( 1 + 2 ) xt + t+1 + bst V ar (Jt+1 )

Equalizing coefficients with conjecture (3.7) we obtain
2 = 1,
1 =
a
,
1a
t = t+1 + bst V art (Jt+1 )
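Since time starts in the infinite past, $\sigma_t^2$ has already converged to its steady-state value $\sigma^2$ for any time $t \in [0, T]$, where
$$
\sigma^2 = \frac{-\left(1 + \tilde{\sigma}_v^2 - a^2\tilde{\sigma}_v^2\right) + \sqrt{\left(1 + \tilde{\sigma}_v^2 - a^2\tilde{\sigma}_v^2\right)^2 + 4a^2\tilde{\sigma}_v^2}}{2a^2}
$$
solves the steady-state equation of $\sigma_t^2 = \frac{\tilde{\sigma}_v^2\left(a^2\sigma_{t-1}^2 + 1\right)}{a^2\sigma_{t-1}^2 + 1 + \tilde{\sigma}_v^2}$. Since the supply of bonds $b^s_t$ is constant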
Proof of (8.9).
on [0, T ] , in steady-state equilibrium k

system
k t =
t1
and
2
v
are jointly determined by the following
a2 + 1
t
a2 + 1 +
2
v
h
i2
1
1
akt
2 b 1a + 1 kt
2
v t
v
i2
2
2 h
(a2 2t1 +1)

(b) akt
2
2 t 2t
2t
=
a 2 2
exp
+1 a k
v + 1 +
v
2 + 1
2
1a
(a t1 +1+2v,t )
v , with k =
It follows that for t [0, T ] there is a solution k ,
1
1

v
2v 2
i2
h
ak
a2
= b 1a + 1 k
a2 +1
a2 +1+
2
v
and
(8.5)
i2
2 h
a2 + 1
(b) ak
+ 1 exp
+ 1 a2 k
2
v
v +1+
2
1a
2
2
a +1+
v
Therefore, V art (Jt+1 |It ) is also constant for t [0, T ]: V ar (Jt+1 |It ) =
2
2
v
2
2 (a +1)
.
with = a a2 +1+2 + 1 +
v
ak
1a
i2
+ 1
Proof that t is finite for any fixed T. For t [0, T ] ,

1
t = t+1 + bst V art (Jt+1 |It )

2
2
T t
1 X ak
= T +1 +
+ 1
b
2 i=1
1a

2
T t ak
b
+ 1
= T +1 +
2
1a
(8.6)
We will prove that T +1 is finite so that t does not explode.

j
T +1 = T +j
1 X s
+
bT +i V art (JT +i+1 |It )
2 i=1
Letting j go to infinity, we have

T +1
1 X s
= +
b V art (JT +i+1 |It )
2 i=1 T +i
(8.7)
To find note that (8.6) implies

1
= + lim bst V art (Jt+1 |It )

t 2
(8.8)
Note that the bst process in (5.1) implies that for t (T, ]
1
1
lim bst V art (Jt+1 |It ) = lim b exp ( (t T )) t V art (Jt+1 |It )
t 2
t 2
1
= lim b exp ( (t T )) = 0
t 2
Therefore, exists and is equal to a constant = c R. Plugging back into equation
(8.7), we have that
1 X s
b V art (JT +i+1 |It )
T +1 = c +
2 i=1 T +i
b
1 X
1
= c+
b exp (i) = c +
2 i=1
2 1 exp (1)
b
is finite. Hence, for any fixed T and t [0, T ] , t
This proves that T +1 = c + 12 1exp(1)
can be expressed as
t
2
b
T t ak
= c+
+
b
+ 1 < ,
2 1 exp (1)
2
1a
2
2
v
2 (a + 1)
2
and c R.
=
a 2
v
+ 1 +
a +1+
2
v
with
(8.9)
Derivation of (5.10). A $t$-agent knows the conjecture and the robust probability measure that will be used by future agents. Using $\hat{x}_{t+1} = a(1 - k)\hat{x}_t + k y_{t+1}$ and taking expectations we have that
a2
a
Et (et+1 ) = t+1
(1 k) xt k
+ 1 Et (yt+1 )
1a
1a
2
a
a
0
0
0
(1 k ) xt k
+ 1 Et (yt+1 )
Et (et+1 ) = t+1
1a
1a
0
Equation (5.10) follows from the fact that Et (yt+1 ) = Et (xt+1 ) and Et (yt+1 ) = Et (xt+1 )
t = E (et+1 ) et + t E (et+1 et )
0
= [Et (et+1 ) Et (et+1 )] + t
h 0
i
ak
+ t
= Et (xt+1 ) Et (xt+1 ) 1 +
1a
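Proof of Proposition 5.1. Under the data generating model $\nu^0$ the shock sequences are $\{v_t\} \sim N(0, \sigma_v^2)$ and $\{w_t\} \sim N(0, \sigma_w^2)$ for $t = 1, 2, \ldots$. We assume that the initial states are $y_0 = x_0 = \hat{x}_0 = 0$. We first calculate the impulse response of the exchange rate to a transitory shock and to a persistent shock separately.
1) IRF to a transitory shock:
The transitory shock starts at $v_1 = \eta$; all other shocks are zero. It follows that the data generated are $x_t = 0$ for all $t \geq 1$, and $y_1 = \eta$, $y_t = 0$ for all $t \geq 2$. Using these data, we compute the state estimate at time $t$, $\hat{x}_t = (a(1 - k))^{t-1} k \eta$. Therefore the IRF is
$$
e_t(0, \eta) =
\begin{cases}
0 & t = 0 \\
-\left(1 + \frac{ak}{1-a}\right)\eta & t = 1 \\
-\frac{a^t k (1-k)^{t-1}}{1-a}\,\eta & t \geq 2
\end{cases}
$$
2) IRF to a persistent shock:
The persistent shock starts at $w_0 = \varepsilon$; all other shocks are zero. It follows that the data generated are $x_t = a^{t-1}\varepsilon$ and $y_t = a^{t-1}\varepsilon$ for all $t \geq 1$. Using these data, we compute the state estimate at time $t$, $\hat{x}_t = a^{t-1}\left[1 - (1-k)^t\right]\varepsilon$. Plugging $\hat{x}_t$ and $y_t$ into our exchange rate function, we have the IRF
$$
e_t(\varepsilon, 0) = -\frac{a^{t-1}}{1-a}\left[1 - a(1-k)^t\right]\varepsilon, \qquad t \geq 1.
$$
Second, we define our average IRF. Since the econometrician cannot distinguish transitory shocks from state-equation shocks, the price reaction to a shock is an average of the two responses. Hence we define the observed IRF as follows.
Definition 8.1. The impulse response at time $t$ to an initial $y$-shock of size one (i.e., $y_1 = w_0 + v_1 = 1$) is given by
$$
e_t^{av} \equiv E^0\left(e_t(\varepsilon, \eta) - e_t(0, 0) \,\middle|\, y_1 = \varepsilon + \eta = 1\right) \qquad (8.10)
$$
where $e_t(\varepsilon, \eta) = e_t(0, \eta) + e_t(\varepsilon, 0)$ is the exchange rate response at time $t$ to an initial persistent shock $\varepsilon$ and an initial transitory shock $\eta$ (i.e., the time $t$ exchange rate generated by shock sequences $w_s = \{\varepsilon, 0, \ldots, 0\}_{1 \times t}$ and $v_s = \{\eta, 0, \ldots, 0\}_{1 \times t}$).
From the definition, and using the result from Lemma 8.2 that $E^0(\varepsilon | y_1 = 1) = q^0$ and $E^0(\eta | y_1 = 1) = 1 - q^0$, it follows that
For $t = 1$,
$$
\begin{aligned}
e_1^{av} &= -\left(1 + \frac{ak}{1-a}\right)\left(1 - q^0\right) - \frac{1}{1-a}\left(1 - a(1-k)\right) q^0 \\
&= -\left(1 + \frac{ak}{1-a}\right) + \left[\left(1 + \frac{ak}{1-a}\right) - \frac{1}{1-a}\left(1 - a(1-k)\right)\right] q^0 \\
&= -\left(1 + \frac{ak}{1-a}\right).
\end{aligned}
$$
For $t \geq 2$,
$$
\begin{aligned}
e_t^{av} &= -\frac{a^t k (1-k)^{t-1}}{1-a}\left(1 - q^0\right) - \frac{a^{t-1}}{1-a}\left[1 - a(1-k)^t\right] q^0 \\
&= -\frac{a^t k (1-k)^{t-1}}{1-a} + \left[\frac{a^t k (1-k)^{t-1}}{1-a} - \frac{a^{t-1}}{1-a}\left(1 - a(1-k)^t\right)\right] q^0 \\
&= -\frac{a^t k (1-k)^{t-1}}{1-a} - \frac{a^{t-1}}{1-a}\left[1 - a(1-k)^{t-1}\right] q^0 \\
&= -\frac{a^{t-1}}{1-a}\left[q^0 + a(1-k)^{t-1}\left(k - q^0\right)\right].
\end{aligned}
$$
Therefore we have
$$
e_t^{av} = E^0\left(e_t(0, \eta) + e_t(\varepsilon, 0) - e_t(0, 0) \,\middle|\, y_1 = 1\right) =
\begin{cases}
0 & t = 0 \\
-\frac{1 - a(1-k)}{1-a} & t = 1 \\
-\frac{a^{t-1}}{1-a}\left[q^0 + a(1-k)^{t-1}\left(k - q^0\right)\right] & t \geq 2
\end{cases}
$$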
Lastly, we prove i) and ii).

i)The IRF exhibits momentum only if agents fear misspecification in the observation
0
equation so that q = 12 > k.
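We take the difference in exchange rates,
$$
e_{t+1}^{av} - e_t^{av} =
\begin{cases}
-\frac{1 - a(1-k)}{1-a} & t = 0 \\
\frac{a^{t-1}}{1-a}\left[(1-a)\, q^0 - a\left(1 - a(1-k)\right)(1-k)^{t-1}\left(q^0 - k\right)\right] & t \geq 2
\end{cases}
$$
It follows that
$$
e_{t+1}^{av} - e_t^{av} \leq 0 \qquad (8.11)
$$
only if $k \leq q^0$.
ii) There exists a unique time $\bar{\tau}$ such that mean-reversion occurs after time $\bar{\tau}$, i.e.
$$
e_{t+1}^{av} - e_t^{av} \geq 0,\ \forall t \geq \bar{\tau}
\iff [1-k]^{\bar{\tau}-1}\,[1 - a(1-k)]\,[q^0 - k] \leq q^0\left[\frac{1}{a} - 1\right];
$$
$\bar{\tau}$ is the smallest integer greater than or equal to
$$
\tau^* = 1 + \frac{\log\left\{[1 - a(1-k)][q^0 - k]\right\} - \log\left\{q^0\left[\frac{1}{a} - 1\right]\right\}}{\log\frac{1}{1-k}}.
$$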
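The average impulse response and its turning point can be traced out numerically. The sketch below implements the case formula for $e_t^{av}$ and the expression for the smallest integer at or above $\tau^*$; it is an illustration only, with hypothetical function names and parameter values ($a = 0.98$, $k = 0.1$, $q^0 = 0.5$), and the turning-point formula is applied only when $q^0 > k$.

```python
import math

def average_irf(t, a, k, q0):
    """Average response e_t^av to a unit y-shock at t = 1."""
    if t == 0:
        return 0.0
    if t == 1:
        return -(1.0 - a * (1.0 - k)) / (1.0 - a)
    return -(a ** (t - 1) / (1.0 - a)) * (q0 + a * (1.0 - k) ** (t - 1) * (k - q0))

def turning_point(a, k, q0):
    """Smallest integer >= tau*, after which the response mean-reverts."""
    if q0 <= k:
        raise ValueError("the turning point formula requires q0 > k")
    numerator = (math.log((1.0 - a * (1.0 - k)) * (q0 - k))
                 - math.log(q0 * (1.0 / a - 1.0)))
    tau_star = 1.0 + numerator / math.log(1.0 / (1.0 - k))
    return math.ceil(tau_star)

a, k, q0 = 0.98, 0.1, 0.5     # hypothetical values with underreaction (k < q0)
tau_bar = turning_point(a, k, q0)
path = [average_irf(t, a, k, q0) for t in range(0, tau_bar + 5)]
print(tau_bar)
print(path)
```

With these values the exchange rate keeps appreciating for several periods after the shock and starts to revert only once $t$ reaches the turning point, which is the delayed overshooting pattern.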
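Lemma 8.2. Under the data generating model, $E^0(\varepsilon | y_1 = 1) = \frac{1}{1 + \sigma_v^2/\sigma_w^2}$.
Proof. Let $w_0 = \varepsilon$ and $v_1 = \eta$.
Consider $y_1 = \varepsilon + \eta$, with $\varepsilon \sim N(0, \sigma_w^2)$ and $\eta \sim N(0, \sigma_v^2)$, as a realization of a random variable with the following distribution
$$
y_1 | \varepsilon \sim N\left(\varepsilon, \sigma_v^2\right),
$$
and $\varepsilon$ with prior distribution
$$
\varepsilon \sim N(0, \sigma_w^2).
$$
Bayes' law implies that the posterior is
$$
\varepsilon | y_1 \sim N(m, n^2), \quad \text{with} \quad
m = (1 - h)\, m_0 + h\, y_1, \quad n^2 = (1 - h)\, n_0^2, \quad h = \frac{n_0^2}{n_0^2 + \sigma_v^2}.
$$
Since $m_0 = 0$ and $n_0^2 = \sigma_w^2$, it follows that
$$
E^0(\varepsilon | y_1 = 1) = \frac{\sigma_w^2}{\sigma_w^2 + \sigma_v^2}\, y_1 = \frac{1}{1 + \sigma_v^2/\sigma_w^2}.
$$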
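The updating rule used in the proof is the standard normal-normal posterior. A minimal sketch, with a hypothetical function name and hypothetical variances, shows how the posterior weight on the persistent shock is computed:

```python
def normal_posterior(prior_mean, prior_var, noise_var, observation):
    """Posterior mean and variance of a normal signal observed with normal noise."""
    h = prior_var / (prior_var + noise_var)          # weight on the observation
    post_mean = (1.0 - h) * prior_mean + h * observation
    post_var = (1.0 - h) * prior_var
    return post_mean, post_var

sigma_w2, sigma_v2 = 1.0, 3.0                        # hypothetical shock variances
q0, _ = normal_posterior(0.0, sigma_w2, sigma_v2, 1.0)
print(q0, 1.0 / (1.0 + sigma_v2 / sigma_w2))
```

Both printed numbers coincide: the expected persistent component of a unit $y$-shock is the signal-to-total variance ratio.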
0
Proof of Proposition FPP. From (5.13) and robust uncovered interest parity Et (et+1 ) =
yt t it follows that
0
cov (et+1 , yt ) =
=
=
=
cov (Et (et+1 ) + t+1 , yt )

0
cov ((yt t ) + t+1 , yt )
0
0
0
var (yt ) + cov (t+1 , yt ) cov ( t , yt )
0
0
var (yt ) + cov (t+1 , yt )
0
(8.12)
The last step follows because the risk premium t is deterministic, so cov ( t , yt ) = 0. Lets
develop cov ( t+1 , yt ).

0
cov (t+1 , yt ) = cov (et+1 Et (et+1 ), yt )

0
= cov ( t t , yt )
0
= cov (t , yt )
(8.13)
The second line follows from (5.10): t = [Et (et+1 ) Et (et+1 )] + t . The third line
follows because the risk premium t is deterministic. Replacing (8.13) in (8.12) and then in
(5.14) we get
0
cov (t , yt )
F amma
p lim
(8.14)
= 1 lim
0
t var (yt )
0
Lets develop cov (t , yt ). Using (5.12)

0
cov (t , yt ) = cov
(Et
Et (xt+1 ), yt ),
(xt+1 )
0

0
= cov a xt xt , yt
0
= a cov xt xt , yt
0
a
:= k
+1
1a
(8.15)
Here, we use the fact that Et (xt+1 ) = a

xt and Et (xt+1 ) = a
xt because Et (wt ) = 0 and
Et (wt ) = 0.
0
The algebra follows. Lets compute cov (
xt , yt ) .
0
xt , yt ) = cov (
xt , xt + vt ) = cov (
xt , xt ) + k
cov (
(8.16)
Note that
0
xt , xt ) = cov (a (1 k) xt1 + kyt , xt )

cov (
0
0
xt1 , xt1 ) + kV ar (xt )
= a2 (1 k) cov (
0
xt , yt ) and the stationary value of V ar (x) =

Using the stationarity of cov (
that
1
k
0
0
x, y) = lim cov (
xt , yt ) =
+k
cov (
2
2
t
1 a 1 a (1 k)
0
0
1
k0
0
0
+ k0
cov x , y = lim cov xt , yt =
2
2
t
1 a 1 a (1 k0 )
Finally, substituting back in (5.14) we have that

p lim F amma = 1
K1
,
K2
where K2 := 1 +
1
,
1 a2
and
1
1a2
we have
K1 =
=
=
=
=
ak
a
1a
ak
a
1a
ak
a
1a
ak
a
1a
ak
a
1a
1
1
k0
k
0
+1
+ (k k)
1 a2 1 a2 (1 k0 ) 1 a2 1 a2 (1 k)
k0
1
k
0
+1
+ (k k)
1 a2 1 a2 (1 k0 ) 1 a2 (1 k)
1 k0 (1 a2 (1 k)) k (1 a2 (1 k0 ))
0
+1
+ (k k)
1 a2
(1 a2 (1 k 0 )) (1 a2 (1 k))
1 k0 a2 k0 + a2 k0 k k + a2 k a2 kk 0
0
+1
+ (k k)
1 a2 (1 a2 (1 k 0 )) (1 a2 (1 k))
1
+1
+ 1 (k 0 k)
2
0
2
(1 a (1 k )) (1 a (1 k))
It follows that
K1
=
K2
=
ak
1a
1
+ 1 (1a2 (1k0 ))(1a
+
1
(k0 k)
2 (1k))
1+
1
1a2
a (a (1 + a) k + (1 a2 ))
1
(1a2 (1k0 ))(1a2 (1k))
2 a2
+ 1 (k0 k)
Part 1 of the Proposition follows directly from the fact that for any a (0, 1) the sign of
K1 (k ) equals the sign of k 0 k . To prove part 2, we need to show that K1 /K2 > 1 for some
large a and that K1 /K2 < 1 for some low a.
0
1
K1
k
0
2
lim
= 2k k k
+ 1 = 2 1 + kk k 0 ,
a1 K2
kk0
k
2 2
2 2
+1
a +1
where k and k0 denote k and k0 evaluated at a = 1; k0 = a2a2+1+
, where
2 and k = a2 2 +1+
2
v
v
2
2
2
(1+v a )+ (1+2v a2 ) +4a2
2 =
, and
2
v solves the first order condition (8.1). It follows
2a2
2
from (8.1) that lim

v = , where is the lowest admissible bound for , so that the
SOC is satisfied. Since lim k = 0 for all a, we have that
K1
0
0
= 2 1 + 0 k 0 + 0 = 2
lim lim
a1 K2
k
K1
2
Since
2
v and k are continuos functions of , it follows that there is a
v such that lima1 K2 >
K1
1 for any
2
2
v >
v . This proves part 2. To prove part 3 note that K2 |a=0 = 0 and observe
1
that K
is continuous on a (1, 1). Thus, there exists a neighborhood ra = (, ) of
K2
1
< << 1 , whenever a ra , where is an arbitrary small
a around zero, such that K
K2
F
1
number. In other words, when a (0, ) , we always have = 1 K
> 0.
K2
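Proof of Lemma 6.1. First, we find the distribution of the random variable $x$ under any probability measure $\nu$ with distorted variance $\tilde{\sigma}_w^2$, given that under the baseline measure $\nu^0$ the random variable $x$ is normal $N(a x_{t-1}, \sigma_w^2)$ with $\sigma_w^2 = 1$.
$$
\begin{aligned}
P^{\nu}(x < t) &= \int_{\{x<t\}} d\nu = \int_{\{x<t\}} \frac{d\nu}{d\nu^0}\, d\nu^0 \\
&= \int_{\{x<t\}} \sqrt{\frac{\sigma_w^2}{\tilde{\sigma}_w^2}} \exp\left[-\frac{1}{2}\left(\frac{1}{\tilde{\sigma}_w^2} - \frac{1}{\sigma_w^2}\right)(x - a x_{t-1})^2\right] d\nu^0 \\
&= \int_{\{x<t\}} \sqrt{\frac{\sigma_w^2}{\tilde{\sigma}_w^2}} \exp\left[-\frac{1}{2}\left(\frac{1}{\tilde{\sigma}_w^2} - \frac{1}{\sigma_w^2}\right)(x - a x_{t-1})^2\right] \frac{1}{\sqrt{2\pi\sigma_w^2}} \exp\left[-\frac{(x - a x_{t-1})^2}{2\sigma_w^2}\right] dx \\
&= \int_{\{x<t\}} \frac{1}{\sqrt{2\pi\tilde{\sigma}_w^2}} \exp\left[-\frac{1}{2\tilde{\sigma}_w^2}(x - a x_{t-1})^2\right] dx.
\end{aligned}
$$
The integrand in the last equation is the PDF of a Normal distribution $N(a x_{t-1}, \tilde{\sigma}_w^2)$. This shows that $x \sim N(a x_{t-1}, \tilde{\sigma}_w^2)$. The relative entropy is calculated as follows
$$
\begin{aligned}
R(\nu \| \nu^0) = E^{\nu} \log\frac{d\nu}{d\nu^0}
&= E^{\nu}\left[-\frac{1}{2}\left(\frac{1}{\tilde{\sigma}_w^2} - \frac{1}{\sigma_w^2}\right)(x - a x_{t-1})^2 - \frac{1}{2}\log\frac{\tilde{\sigma}_w^2}{\sigma_w^2}\right] \\
&= -\frac{1}{2}\left(\frac{1}{\tilde{\sigma}_w^2} - \frac{1}{\sigma_w^2}\right)\tilde{\sigma}_w^2 - \frac{1}{2}\log\frac{\tilde{\sigma}_w^2}{\sigma_w^2} \\
&= \frac{1}{2}\left[\frac{\tilde{\sigma}_w^2}{\sigma_w^2} - 1 - \log\frac{\tilde{\sigma}_w^2}{\sigma_w^2}\right].
\end{aligned}
$$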
Proof of Proposition 6.2. The only difference with respect to the case of observation uncertainty is that under the alternative model $\nu_w$ the variance of returns is given by (6.3). It follows that the first order condition for $b_t$ is given by (8.2), while that for $\tilde{\sigma}_{w,t}^2$ is
!
V
ar
(J)
1
1
1
1 2
= (bt )2
exp bt E (J) + (bt )2 V ar (J) +
0=
2
2
2
2w,t
2w,t
w,t
where
!
1
V ar (Jt+1 |It )
2
2
= [(kt+1 1 + 2 )] a
2 + 1 > 0
2w,t
2w,t
a2 2t1 + 1 +
Analogous to the argument for $\tilde{\sigma}_{v,t}^2$, the first order conditions imply that a robust agent distorts upwards the variance of the trend shock: $\tilde{\sigma}_{w,t}^2 > \sigma_w^2$.
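Proof of Lemma 6.3. First, we find the distribution of the random variable $x_{t+1}$ under any probability measure $\nu$ with drift distortion $\delta$, given that under the baseline measure $\nu^0$ the random variable $x_{t+1}$ is normal $N(a x_t, 1)$.
$$
\begin{aligned}
P^{\nu}(x < t) &= \int_{\{x<t\}} d\nu = \int_{\{x<t\}} \frac{d\nu}{d\nu^0}\, d\nu^0 \\
&= \int_{\{x<t\}} \exp\left[-\frac{(\delta x_t)^2 - 2\delta x_t (x_{t+1} - a x_t)}{2}\right] d\nu^0 \\
&= \int_{\{x<t\}} \exp\left[-\frac{(\delta x_t)^2 - 2\delta x_t (x_{t+1} - a x_t)}{2}\right] \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x_{t+1} - a x_t)^2}{2}\right] dx_{t+1} \\
&= \int_{\{x<t\}} \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(x_{t+1} - (a + \delta) x_t\right)^2\right] dx_{t+1}.
\end{aligned}
$$
The integrand in the last equation is the PDF of a Normal distribution $N((a + \delta) x_t, 1)$. This shows that $x_{t+1} \sim N((a + \delta) x_t, 1)$. The relative entropy is calculated as follows
$$
\begin{aligned}
R(\nu \| \nu^0) = E^{\nu} \log\frac{d\nu}{d\nu^0}
&= E^{\nu}\left[-\frac{(\delta x_t)^2 - 2\delta x_t (x_{t+1} - a x_t)}{2}\right] \\
&= \frac{(\delta x_t)^2}{2};
\end{aligned}
$$
in the last step we use the fact that $E^{\nu}(x_{t+1}) = (a + \delta) x_t$.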

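Proof of Proposition 6.4. To derive the filter notice that under the baseline model, we have the celebrated Kalman filter for the state estimate $\hat{x}_t^0$,
$$
\hat{x}_t^0 = E^0(x_t | I_t) = \left(1 - k_t^{Kalman}\right) a\, \hat{x}_{t-1}^0 + k_t^{Kalman} y_t \qquad (8.17)
$$
$$
k_t^{Kalman} = \frac{a^2\sigma_{t-1}^2 + \sigma_w^2}{a^2\sigma_{t-1}^2 + \sigma_w^2 + \sigma_v^2}, \qquad
\sigma_t^2 = \frac{\left(a^2\sigma_{t-1}^2 + \sigma_w^2\right)\sigma_v^2}{a^2\sigma_{t-1}^2 + \sigma_w^2 + \sigma_v^2} \qquad (8.18)
$$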
While under the robust model we have that
!
2 2
2
(a
+
)
(a + )2 2t1 + 2w
t1
w
t
(a + ) xt1 +
yt
xt = E (xt |It ) = 1
(a + )2 2t1 + 2w + 2v
(a + )2 2t1 + 2w + 2v
!
!
t
t
(a + )2 2t1 + 2w
(a + )2 2t1 + 2w
2
, xt+1 |It N (a + ) xt , (a + )
xt |It N xt ,
+1
(a + )2 2t1 + 2w + 2v
(a + )2 2t1 + 2w + 2v
(a+)2 2
+2
w
In this case the gain of the filter is kt t = (a+)2 2 t1
2
2 . Note that the gain is increasing
t1 + w + v
in the drift distortion because of the assumption a
2 (a + ) 2t1
dkt
=
2 > 0
d
(a + )2 2t1 + 2
(8.19)
Under probability measure t the excess rate of return (denoted by Jt+1 ) is

Jt+1 (i ift ) (et+1 et )
t+1
2 i ift+1 + et + (i ift )
= t+1 1 xt+1
= t+1 (1 kt+1 ) 1 (a +
) xt t
f
(kt+1 1 + 2 ) i it+1 + et + (i ift )
To compute the distribution of excess returns, under probability measure t , the t agent
t+1
uses
(3.7) to forecast next periods exchange rate: econj
t+1
+
t+1 = t+1 + 1 x
the conjecture
t+1
2 it+1 ift+1 . To obtain Ett (
xt+1
; It ) note that the t agent knows the problem that will
be solved by t + 1 agents. Thus, the t agent knows the method that t + 1 agents will use
to derive the robust probability measure t+1 , and that they will make forecasts using Bayes
law under t+1 . Taking this into account, the t agentknows that t + 1 agents will use the
t+1
updating formula xt+1
= (1 kt+1 ) (a+ t+1 )
xt t +kt+1 it+1 ift+1 . Therefore, the t agent
t+1
sets Ett (
xt+1
; It ) = (a + t+1 )
xt t . Replacing this formula in the conjecture, it follows that
under probability measure t , the log excess return Jt+1 (i ift ) (et+1 et ) is normally
distributed with mean and variance
E t (Jt+1 |It ) = t+1 (1 kt+1 ) 1 (a + ) xt t (kt+1 1 + 2 ) (a + ) xt t + et + (i ift )

(8.20)
= t+1 (1 + 2 ) (a + ) xt t + et + (i ift )
"
#
2 2
(a + ) t1 + 2w
V t (Jt+1 |It ) = (kt+1 1 + 2 )2 (a + )2
+2
(8.21)
(a + )2 2t1 + 2w + 2v
We solve problem (3.5) by considering the investor as a Stackelberg leader that takes into account the strategy of nature: $\delta = s(b_t, e_t)$. Nature then selects $\delta$ conditioning on the agent's choice of $b_t$. Notice that $\delta$ affects the investor's payoff through its effect on $E(J_t)$ and $Var(J_t)$.
The first order condition for $\delta$ is:
"
#
!
2
(b
E t (Jt+1 ) (bt )2 V art (Jt+1 )

)
t
= bt
+
V art (Jt+1 )
2
+ xt t + 1
"
#
2
)
dk
(b
t
t
= bt (1 + 2 ) xt t +
(kt+1 1 + 2 )2 2 (a + ) kt + (a + )2
2
d
!
2
2
)
(b
t
V art (Jt+1 ) + xt t + 1
2
The first order condition for for bt is:
!
2
(b
t
= E t (Jt+1 ) + 2 bt V art (Jt+1 ) exp bt E t (Jt+1 ) +
V art (Jt+1 )
bt
2
d
+
db
i
h h t
i
2 bt V art (Jt+1 )
= t+1 + (a + ) ( 1 + 2 ) xt t + et + i ift
!
2
)
(b
t
V art (Jt+1 )
2
= 0. The second order

In the second equality we have used the envelope theorem:
condition for the investors problem is

2
2
d (b)
d (b)
d2 (b)
0 > 2 = bb (b, (b)) + b (b, (b))

+ (b, (b))
+ (b, (b))
bt
db
db
db2
Notice that the total derivative of natures FOC (b, (b)) = 0 is b db + d = 0. Thus,
d (b)
b
=
db
Combining this equation with b = b , the investors SOC becomes
b (b, (b))
2
+
= bb (b, (b)) + b (b, (b))
b2t
2v 2v (b, 2
v (b))
2
b (b, (b))
= bb (b, (b)) 0
(b, (b))
(b, (b))
This condition is unambiguously satisfied because bb 0 :
h
i
2
2
1
2
bb = E (J) + bt V ar (J) + V ar (J) exp (bt ) E (J) + (bt ) V ar (J)

2
Natures second order condition is

#
"
2
(J)
V
ar
(J)
V
ar
(bt )2 1
(bt )2
=
+
2
2
2
()2
!
2
2
)
(b
t
exp bt E (J) +
V ar (J) + xt t + 1 0
2
It holds holds if and only if at

#
!
"
2
2
)
(bt )2
2 V ar (J)
(b
(bt )2 V ar (J)
t
a
exp bt E (J) +
V ar (J)
t 2
+
2
t
2
xt + 1
(8.22)
Next, we characterize the equilibrium. The first order condition for bt implies that
i
h
f
t+1 + (a + ) ( 1 + 2 ) xt et it it bt V ar (J) = 0
In equilibrium,
2 = 1
(a + ) ( 1 + 2 ) = 1 1 =
a+
1 (a + )
With 2 = 1 in equilibrium, the first order condition for becomes
"
#
dk
(bt )2
t
=
(kt+1 1 + 2 )2 2 (a + ) kt + (a + )2
0 =
2
d
!
2
2
(b
)
t
V art (Jt+1 ) + xt t + 1
2
This condition implies that $\delta > 0$. Since the gain $k_t$ is an increasing function of $\delta$, we conclude that
$$
k_t^{\nu_t} > k_t^{Kalman} \qquad (8.23)
$$
Proof of Proposition 6.5.
We use a representation theorem in Dupuis and Ellis
(1997) to show that if 0 is the baseline model and u , then problem (6.10) is equivalent
to (6.11).
Lemma 8.3 (Variational Representation). Let $(\Omega, I)$ be a measurable space, and let $P(\Omega)$ denote the set of all probability measures defined on it. If $f$ is a bounded measurable function mapping from $\Omega$ into $R$, and $\nu \in P(\Omega)$, then: (a) the following variational formula holds:
$$
-\log \int e^{-f(\omega)}\, \nu(d\omega) = \inf_{\mu \in P(\Omega)} \left[ R(\mu \| \nu) + \int f(\omega)\, \mu(d\omega) \right] \qquad (8.24)
$$
(b) The infimum in (8.24) is uniquely achieved by a probability measure $\mu_0$ which is absolutely continuous with respect to $\nu$ and has the Radon-Nikodym derivative
$$
\frac{d\mu_0}{d\nu} = e^{-f(\omega)} \frac{1}{\int e^{-f(\omega)}\, \nu(d\omega)}.
$$
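The variational formula can be checked on a finite state space, where integrals reduce to sums. The sketch below is only an illustration of the Lemma, not the paper's application: the three-point space, the payoff vector and the baseline probabilities are hypothetical, and the penalty weight on relative entropy is set to one.

```python
import math

def log_integral_side(f, base):
    """Left side of the formula: -log of the baseline expectation of exp(-f)."""
    return -math.log(sum(p * math.exp(-fx) for fx, p in zip(f, base)))

def penalized_objective(f, base, candidate):
    """Right side for one candidate: relative entropy plus expected f."""
    entropy = sum(q * math.log(q / p) for q, p in zip(candidate, base) if q > 0.0)
    expected_f = sum(q * fx for q, fx in zip(candidate, f))
    return entropy + expected_f

def minimizing_measure(f, base):
    """Exponentially tilted measure with density exp(-f) / E[exp(-f)]."""
    weights = [p * math.exp(-fx) for fx, p in zip(f, base)]
    total = sum(weights)
    return [w / total for w in weights]

f = [0.2, 1.0, -0.5]          # hypothetical bounded function on three states
base = [0.5, 0.3, 0.2]        # hypothetical baseline probabilities
worst_case = minimizing_measure(f, base)
print(log_integral_side(f, base), penalized_objective(f, base, worst_case))
print(penalized_objective(f, base, base), penalized_objective(f, base, [0.1, 0.1, 0.8]))
```

The tilted measure attains the left-hand side exactly, while any other candidate, including the baseline itself, gives a weakly larger value.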
Applying the above Lemma directly to our robust utility function, t = maxbt inf P () Et [u (Wt+1 ) +
0
we have that t = maxbt log Et exp 1 u (Wt+1 )
.
Proof of (6.12). From the first order condition for bt we have
0
1
f
f
f
et+1 et + it i exp bt et+1 et + it i exp exp bt et+1 et + it i
E
=
t
Next, we use Stein's Lemma (see for example, Casella and Berger [?] page 187), which states that for a normally distributed random variable $X \sim N(\mu, \sigma^2)$, and a function $g(x)$ with $E|g'(X)| < \infty$, we have
$$
E[g(X)(X - \mu)] = \sigma^2 E g'(X)
$$
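Stein's Lemma is easy to confirm numerically for an exponential function of the kind that appears in the first order condition. The sketch below is an illustration under stated assumptions: it uses $g(x) = \exp(-bx)$ with hypothetical values for the mean, variance and $b$, and approximates both expectations with a midpoint rule.

```python
import math

def normal_expectation(func, mean, var, half_width=12.0, steps=40000):
    """Midpoint-rule approximation of E[func(X)] for X ~ N(mean, var)."""
    sd = math.sqrt(var)
    lower = mean - half_width * sd
    h = 2.0 * half_width * sd / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        density = math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        total += func(x) * density * h
    return total

mean, var, b = 0.3, 0.5, 1.2                   # hypothetical parameter values

def g(x):
    return math.exp(-b * x)

def g_prime(x):
    return -b * math.exp(-b * x)

lhs = normal_expectation(lambda x: g(x) * (x - mean), mean, var)   # E[g(X)(X - mu)]
rhs = var * normal_expectation(g_prime, mean, var)                 # sigma^2 E[g'(X)]
print(lhs, rhs)
```

The two sides agree up to integration error, which is what allows the covariance term in the first order condition to be replaced by a variance times an expected derivative.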
Starting with the first order condition for bt ,
1
0
0 = Et Jt+1 exp (bt (Jt+1 )) exp exp (bt (Jt+1 ))
where Jt+1 i ift (et+1 et )+. The linear conjecture implies that Jt+1 has a conditional
0
f
0
i it (et+1 et ) , V art (et+1 ) . Thus, let
normal distribution N Et
1
g (Jt+1 ) = exp (bt Jt+1 ) exp exp (bt Jt+1 )
The derivative of g (Jt+1 ) is
1
g (Jt+1 ) = 1 + exp (bt Jt+1 ) bt g (Jt+1 )
The right side of the first order condition can be written as
0
0
0
0
Et (Jt+1 g (Jt+1 )) = Et Jt+1 Et (Jt+1 ) + Et (Jt+1 ) g (Jt+1 )
0
0
0
0
= Et Jt+1 Et (Jt+1 ) g (Jt+1 ) Et (Jt+1 ) Et g (Jt+1 )
Applying Steins Lemma to the right side of the first order condition, we have that
0
0 = V art (et+1 ) Eg 0 (Jt+1 ) Et (Jt+1 ) Et g (Jt+1 )

Therefore,
0
Et
Hence
V art (et+1 ) Eg 0 (Jt+1 )

(S) =
0
Et g (Jt+1 )
0
Et (Jt+1 ) lt = 0
0
)Eg 0 (Jt+1 )
,
g
(J
)
=
exp
(b
J
)
exp
exp
(b
J
)
,
where lt V art E(et+1
0
t+1
t
t+1
t
t+1
t g(J1t+1 )
0
g (Jt+1 ) = 1 + exp (bt Jt+1 ) bt g (Jt+1 ) .
Thus,
0
0
1
1
V art (et+1 )Et [(1+
exp(bt (et+1 et +ift i)))bt exp(bt (et+1 et +ift i)) exp(
exp(bt (et+1 et +ift i)))]
.
lt =
0
f
f
1
Et [exp(bt (et+1 et +it i)) exp( exp(bt (et+1 et +it i)))]
Proof of Proposition 6.6. From the linear conjecture (3.7) et = t + 1 xt + 2 i ift

we have
f
0
0
0
Et (et+1 ) = t+1 + 1 Et xt+1 + 2 Et i it+1
0
= t+1 + ( 1 + 2 ) Et xt+1
f
0
= t+1 + ( 1 + 2 ) Et (1 kt ) a
xt + kt i it+1
= t+1 + a ( 1 + 2 ) xt
In equilibrium, bt = b , and the pricing function must satisfy
f
0
Et et+1 et + it i lt = 0
0
Plugging et and Et (et+1 ) into the pricing function, we have
t+1 t lt + [a ( 1 + 2 ) 1 ] xt ( 2 + 1) i ift = 0
The equation holds for any xt and i ift in , so
a ( 1 + 2 ) 1 = 0
2 + 1 = 0
t+1 t + lt = 0
Hence we have the coefficients of the equilibrium exchange rate function
a
, = 1, t+1 = t + lt
1a 2
where lt is lt evaluated at 1 , 2 , b .
1 =