Patton OMI Lec4
Andrew Patton
Department of Economics
Duke University
Andersen, T.G., Bollerslev, T., Christoffersen, P.F., and Diebold, F.X., 2006,
Volatility and Correlation Forecasting, in the Handbook of Economic
Forecasting, G. Elliott, C.W.J. Granger and A. Timmermann (eds.), North
Holland Press, Amsterdam.
Clark, T.E. and McCracken, M.W., 2009, Tests of Equal Predictive Ability
with Real-Time Data, Journal of Business and Economic Statistics, 27, 441-454.
Patton, A.J., 2011, Data-Based Ranking of Realised Volatility Estimators,
Journal of Econometrics, 161(2), 284-303.
Patton, A.J. and Sheppard, K., 2009, Evaluating Volatility and Correlation
Forecasts, in T.G. Andersen, R.A. Davis, J.-P. Kreiss and T. Mikosch (eds.)
Handbook of Financial Time Series, Springer Verlag.
Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 2 / 50
Evaluating volatility forecasts
The Mincer-Zarnowitz regression:

Y_{t+h} = \beta_0 + \beta_1 \hat{Y}_{t+h,t} + u_{t+h}, \quad H_0: \beta_0 = 0 \cap \beta_1 = 1

The Diebold-Mariano-West comparison:

d_{t+h} \equiv L\left(Y_{t+h}, \hat{Y}^a_{t+h,t}\right) - L\left(Y_{t+h}, \hat{Y}^b_{t+h,t}\right), \quad H_0: E[d_t] = 0

Both of these rely on the observability of Y_{t+h}, which does not hold when
the object of interest is \sigma^2_{t+1} \equiv V[r_{t+1} \mid \mathcal{F}_t].
A natural proxy is the squared return, \hat\sigma^2_{t+1} \equiv r^2_{t+1}, which is conditionally unbiased:

E_t\left[\hat\sigma^2_{t+1}\right] = E_t\left[r^2_{t+1}\right] = V_t[r_{t+1}] \equiv \sigma^2_{t+1}
Many previous papers ignored the estimation error in the volatility proxy
Hansen and Lunde (2006) and Patton (2011) derive results that show using a
conditionally unbiased, but noisy, proxy, i.e. one with

E_{t-1}\left[\hat\sigma^2_t\right] = \sigma^2_t \quad \text{but} \quad E_{t-1}\left[\left(\hat\sigma^2_t - \sigma^2_t\right)^2\right] > 0,

may lead to problems, and suggest methods that are robust to noise in the
proxy.
HL set out to obtain conditions which ensure that the ranking of volatility
forecasts is robust to noise:
E\left[L\left(\sigma^2_t, h_{1t}\right)\right] \gtrless E\left[L\left(\sigma^2_t, h_{2t}\right)\right] \iff E\left[L\left(\hat\sigma^2_t, h_{1t}\right)\right] \gtrless E\left[L\left(\hat\sigma^2_t, h_{2t}\right)\right]

where the empirical pre-ordering is based on \frac{1}{T}\sum_{t=1}^{T} L\left(\hat\sigma^2_t, h_t\right).

The main question is: under what conditions are the "true" and
"approximate" pre-orderings equivalent?
Assumption 1(ii+iii): the following limits exist:

\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} E\left[L\left(\sigma^2_t, h_t\right)\right], \quad \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} E\left[L\left(\hat\sigma^2_t, h_t\right)\right],

\text{and} \quad \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} L\left(\hat\sigma^2_t, h_t\right)
Assumption 2(i): Define \eta_t \equiv \hat\sigma^2_t - \sigma^2_t. Let \mathcal{F}_t be some filtration such that
for all h_{it} we have \left(\sigma^2_t, h_{it}\right) \in \mathcal{F}_{t-1}.
Assumption 2(ii): (b) \partial^2 L\left(\sigma^2, h\right)/\partial\left(\sigma^2\right)^2 exists and does not depend on h,
AND \{\eta_t, \mathcal{F}_t\} is a martingale difference sequence, so E[\eta_t \mid \mathcal{F}_{t-1}] = 0.
A second-order expansion of L around \sigma^2_t, with \ddot\sigma^2_t between \hat\sigma^2_t and \sigma^2_t:

L\left(\hat\sigma^2_t, h_t\right) = L\left(\sigma^2_t, h_t\right) + \frac{\partial L\left(\sigma^2_t, h_t\right)}{\partial \sigma^2}\left(\hat\sigma^2_t - \sigma^2_t\right) + \frac{1}{2}\,\frac{\partial^2 L\left(\ddot\sigma^2_t, h_t\right)}{\partial\left(\sigma^2\right)^2}\left(\hat\sigma^2_t - \sigma^2_t\right)^2

Taking conditional expectations, and writing \lambda(\cdot) for the second derivative (which does not depend on h):

E_{t-1}\left[L\left(\hat\sigma^2_t, h_t\right)\right] = E_{t-1}\left[L\left(\sigma^2_t, h_t\right)\right] + \frac{\partial L\left(\sigma^2_t, h_t\right)}{\partial\sigma^2}\,E_{t-1}\left[\hat\sigma^2_t - \sigma^2_t\right] + \frac{1}{2}\,E_{t-1}\left[\lambda\left(\ddot\sigma^2_t\right)\left(\hat\sigma^2_t - \sigma^2_t\right)^2\right]

= E_{t-1}\left[L\left(\sigma^2_t, h_t\right)\right] + \frac{1}{2}\,E_{t-1}\left[\lambda\left(\ddot\sigma^2_t\right)\left(\hat\sigma^2_t - \sigma^2_t\right)^2\right]

so E\left[L\left(\hat\sigma^2_t, h_t\right)\right] = E\left[L\left(\sigma^2_t, h_t\right)\right] + \frac{1}{2}\,E\left[\lambda\left(\ddot\sigma^2_t\right)\left(\hat\sigma^2_t - \sigma^2_t\right)^2\right]

and, since the last term does not depend on the forecast,

E\left[L\left(\hat\sigma^2_t, h_{1t}\right)\right] - E\left[L\left(\hat\sigma^2_t, h_{2t}\right)\right] = E\left[L\left(\sigma^2_t, h_{1t}\right)\right] - E\left[L\left(\sigma^2_t, h_{2t}\right)\right]
Thus we’ve shown the equivalence of the “approximate” and the “true”
pre-orderings (rankings)
For example, under MSE loss:

L\left(\sigma^2_t, h_t\right) = \left(\sigma^2_t - h_t\right)^2 \;\Rightarrow\; \frac{\partial^2 L\left(\sigma^2, h\right)}{\partial\left(\sigma^2\right)^2} = 2
Thus ranking volatility forecasts using MSE and the daily squared returns is
equivalent (asymptotically) to ranking them using the true conditional
variance. Ranking by MSE on log variances is not.
\varphi\left(\hat\sigma^2_t\right) = \beta_0 + \beta_1\,\varphi(h_t) + u_t

Common choices for \varphi are \varphi(x) = x, \varphi(x) = \log x, and \varphi(x) = \sqrt{x}.
The population R^2 from this regression is

R^2 = \frac{\mathrm{Cov}\left[\varphi\left(\hat\sigma^2_t\right), \varphi(h_t)\right]^2}{V[\varphi(h_t)]\; V\left[\varphi\left(\hat\sigma^2_t\right)\right]}
HL show that rankings using this measure are robust to noise in \hat\sigma^2_t if the
following condition is satisfied:

E_{t-1}\left[\left(\hat\sigma^2_t - \sigma^2_t\right)^j \varphi^{(j)}\left(\sigma^2_t\right)\right] = c_j \quad \text{for } j = 1, 2, \ldots\ \forall\, t
Regression-based evaluation, cont’d
Consider the condition below for a few special cases:

E_{t-1}\left[\eta_t^j\, \varphi^{(j)}\left(\sigma^2_t\right)\right] = c_j \quad \text{for } j = 1, 2, \ldots\ \forall\, t
Thus the popular regression using logs is not likely to yield a robust ranking.
3 If \varphi is the square root, then we need E_{t-1}\left[\eta_t^k\right] \propto \sigma_t^{2k-1}\ \forall k, which is also
unlikely to hold.
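To spell out the log case (my own algebra, filling in a step the slides leave implicit): for \varphi(x) = \log x the jth derivative is \varphi^{(j)}(x) = (-1)^{j-1}(j-1)!\,x^{-j}, and since \sigma^2_t \in \mathcal{F}_{t-1} it can be taken outside the conditional expectation, so the condition becomes

```latex
E_{t-1}\!\left[\eta_t^j\right] (-1)^{j-1}(j-1)!\,\sigma_t^{-2j} = c_j
\quad\Longleftrightarrow\quad
E_{t-1}\!\left[\eta_t^j\right] \propto \sigma_t^{2j} \quad \forall\, j
```

i.e. every conditional moment of the proxy noise would have to be proportional to the corresponding power of the conditional variance, which standard proxies are unlikely to satisfy.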
In this paper I extend the work of Hansen and Lunde (2006) in a few useful
directions:
2 I provide a necessary and sufficient condition on the loss function for it to yield
rankings of volatility forecasts that are robust to noise in the volatility proxy
3 I provide some guidance on the choice of loss function for volatility forecast
comparison, through:
Thus, the presence of noise in the proxy makes the choice of loss function
very important.
The problem we face here is unusual: we know what the optimal forecast is
(we want h_t = \sigma^2_t), and we want to find loss functions that ensure this.
It will turn out that by making sure that L generates h^*_t = \sigma^2_t for any \hat\sigma^2_t
s.t. E_{t-1}\left[\hat\sigma^2_t\right] = \sigma^2_t, we also satisfy Hansen and Lunde's condition that
\partial^3 L/\partial\left(\hat\sigma^2\right)^2 \partial h = 0.
Thus a measure of the extent of the problem (if any) with a loss function is
to derive the optimal forecast under it and check how far it is from \sigma^2_t.
1 \hat\sigma^2_t = r_t^2

2 \hat\sigma^2_t = RV_t^{(m)} \equiv \sum_{j=1}^{m} r_{jt}^2

3 \hat\sigma^2_t = RG_t \equiv \left[\frac{1}{2\sqrt{\log 2}}\left(\max_\tau \log P_\tau - \min_\tau \log P_\tau\right)\right]^2, \quad \text{for } t-1 < \tau \le t
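As an illustration (my own sketch, not from the lecture), the three proxies can be computed from one day of intra-daily log prices as follows; the 1/(4 log 2) scaling of the squared range is Parkinson's, which makes the range-based proxy unbiased for \sigma^2 under Brownian motion.

```python
import numpy as np

def squared_return(log_prices):
    """Proxy 1: squared return over the day (first to last log price)."""
    return (log_prices[-1] - log_prices[0]) ** 2

def realized_variance(log_prices):
    """Proxy 2: realised variance, the sum of squared intra-daily returns."""
    return np.sum(np.diff(log_prices) ** 2)

def parkinson_range(log_prices):
    """Proxy 3: squared range of intra-daily log prices, Parkinson-scaled."""
    return (log_prices.max() - log_prices.min()) ** 2 / (4 * np.log(2))

p = np.array([0.0, 0.01, -0.01, 0.02])   # toy log-price path
print(squared_return(p))      # 0.02**2 = 4e-4
print(realized_variance(p))   # 0.01**2 + 0.02**2 + 0.03**2 = 1.4e-3
print(parkinson_range(p))     # 0.03**2 / (4 log 2), roughly 3.25e-4
```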
MSE: L\left(\hat\sigma^2, h\right) = \left(\hat\sigma^2 - h\right)^2

QLIKE: L\left(\hat\sigma^2, h\right) = \frac{\hat\sigma^2}{h} - \log\frac{\hat\sigma^2}{h} - 1

MSE-LOG: L\left(\hat\sigma^2, h\right) = \left(\log\hat\sigma^2 - \log h\right)^2

MSE-SD: L\left(\hat\sigma^2, h\right) = \left(\hat\sigma - \sqrt{h}\right)^2

"HMSE" aka MSE-prop: L\left(\hat\sigma^2, h\right) = \left(\hat\sigma^2/h - 1\right)^2
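These five loss functions are straightforward to code; a quick sanity check (my own, on hypothetical values) is that each attains zero when the forecast equals the proxy.

```python
import numpy as np

# The five loss functions above: s2 is the proxy, h the forecast
def mse(s2, h):      return (s2 - h) ** 2
def qlike(s2, h):    return s2 / h - np.log(s2 / h) - 1
def mse_log(s2, h):  return (np.log(s2) - np.log(h)) ** 2
def mse_sd(s2, h):   return (np.sqrt(s2) - np.sqrt(h)) ** 2
def hmse(s2, h):     return (s2 / h - 1) ** 2

# every loss is zero (its minimum) when h equals the proxy
for loss in (mse, qlike, mse_log, mse_sd, hmse):
    assert abs(loss(1.7, 1.7)) < 1e-12
```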
Consider the choice of the MSE loss function with squared returns as the
volatility proxy. In that case we know:
h^*_t \equiv \arg\min_{h\in\mathcal{H}} E_{t-1}\left[\left(\hat\sigma^2_t - h\right)^2\right]

= E_{t-1}\left[r_t^2\right]

= \sigma^2_t\, E_{t-1}\left[\varepsilon_t^2\right], \quad \text{since } r_t = \sigma_t \varepsilon_t

= \sigma^2_t, \quad \text{since } \varepsilon_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t(0, 1, \nu)
And so under this combination of loss function and proxy we find no bias.
This is consistent with the result from Hansen and Lunde (2006).
So we again find that the optimal forecast is the true conditional variance.
What about some other loss functions?
And so under this combination of loss function and proxy we find that the
optimal forecast is equal to 0.45 times the true conditional variance.
Thus we will generally be led to selecting forecasts that are downward biased.
And so under this combination of loss function and proxy we find that the
optimal forecast is equal to 0.64 times the true conditional variance.
We will again generally be led to selecting forecasts that are downward biased.
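A multiple near 0.64 can be recovered analytically for the MSE-SD loss under a Gaussian assumption on \varepsilon_t (my own check, simpler than the Student's t specification used in the lecture): the first-order condition gives \sqrt{h^*} = E_{t-1}[\hat\sigma_t] = \sigma_t E|\varepsilon_t|, so h^* = (2/\pi)\sigma^2_t \approx 0.64\,\sigma^2_t. A quick Monte Carlo confirms it:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0
r = np.sqrt(sigma2) * rng.standard_normal(1_000_000)

# MSE-SD optimal forecast: minimise E[(|r| - sqrt(h))^2]  =>  sqrt(h*) = E|r|
h_star = np.abs(r).mean() ** 2
print(h_star / sigma2)                   # approximately 2/pi = 0.6366
assert abs(h_star - 2 / np.pi * sigma2) < 0.01
```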
1 For two loss functions (MSE and QLIKE) the optimal forecast is the true
conditional variance. Thus they satisfy at least this necessary condition for
robustness (they also satisfy Hansen and Lunde's sufficient condition).
2 The remaining seven loss functions all generate optimal forecasts that differ
from the true conditional variance.
The fact that the bias can be upwards or downwards explains the conflicting
findings of various authors looking for the best volatility forecast across a
range of (non-robust) loss functions.
so that \{r_{i,m,t}\}_{i=1}^{m} \sim \text{iid } N\!\left(0, \sigma^2_t/m\right)
Although RV theory has been developed for more general DGPs, theory for
the range is mostly based on work by Feller (1951), who considered the
simple case above.
Recent work by Christensen and Podolskij (2006, JoE) provides asymptotic
theory for the “realised range” but I do not consider this.
Feller (1951) provided the density of the range, and Parkinson (1980)
provided a formula for obtaining moments:
f(RG_t; \sigma_t) = \frac{8}{\sigma_t} \sum_{k=1}^{\infty} (-1)^{k-1}\, k^2\, \phi\!\left(\frac{k\, RG_t}{\sigma_t}\right)

E\left[RG_t^p\right] = \frac{4}{\sqrt{\pi}}\, \Gamma\!\left(\frac{p+1}{2}\right) \left(1 - 2^{2-p}\right) \zeta(p-1)\; 2^{p/2}\, \sigma_t^p, \quad \text{for } p \ge 1
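At p = 2 the moment formula is understood as a limit and yields the familiar E[RG_t^2] = 4 \log 2 \cdot \sigma_t^2, the basis of Parkinson's scaling. A simulation sketch (my own; a discretised path slightly under-measures the continuous range, so the match is only approximate):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_paths, sigma = 2000, 10_000, 1.0

range2 = np.empty(n_paths)
for i in range(n_paths):
    # discretised Brownian path on [0,1] with variance sigma^2, starting at 0
    steps = sigma / np.sqrt(m) * rng.standard_normal(m)
    path = np.concatenate(([0.0], np.cumsum(steps)))
    range2[i] = (path.max() - path.min()) ** 2

# E[range^2] = 4 log(2) sigma^2, roughly 2.77; the grid under-measures slightly
print(range2.mean())
assert abs(range2.mean() - 4 * np.log(2) * sigma ** 2) < 0.2
```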
π 2
Both the MSE and QLIKE loss functions yielded the conditional variance as
the optimal forecast.
This led me to the question: Is there a general class of such loss functions?
A3: L is twice continuously differentiable with respect to h and \hat\sigma^2, and has a
unique minimum at \hat\sigma^2 = h.
A4: There exists some h_t \in \mathrm{int}(\mathcal{H}) such that h_t = E_{t-1}\left[\hat\sigma^2_t\right], where \mathcal{H} is
a compact subset of \mathbb{R}_{++}.
A5: L and \mathcal{F}_t are such that: (a) E_{t-1}\left[L\left(\hat\sigma^2_t, h\right)\right] < \infty for some h \in \mathcal{H};
(b) E_{t-1}\left[\partial L\left(\hat\sigma^2_t, \sigma^2_t\right)/\partial h\right] < \infty; and
(c) E_{t-1}\left[\partial^2 L\left(\hat\sigma^2_t, \sigma^2_t\right)/\partial h^2\right] < \infty for all t.
I show that S1 ⇒ S2 ⇒ S3 ⇒ S1.
S1 ⇒ S2 follows from Hansen and Lunde (2006): their Assumption 2 is
satisfied given the assumptions for the proposition.
S2 ⇒ S3 is simple to show, using the fact that rankings using "robust" loss
functions are the same whether \hat\sigma^2_t or \sigma^2_t is used.
Proving S3 ⇒ S1 was harder. It is related to the necessity of the
"linear-exponential" family of densities for quasi-maximum likelihood
estimation (though my case does not fit directly into that framework). See the
paper for details.
The general class of robust loss functions is quite broad: it depends on the
univariate function C, which is only restricted to be a twice continuously
differentiable, strictly decreasing function on \mathcal{H}.
Proposition 3:
(i) The "MSE" loss function is the only robust loss function that depends
solely on the forecast error, \hat\sigma^2 - h.
(ii) The "QLIKE" loss function is the only robust loss function that depends
solely on the standardised forecast error, \hat\sigma^2/h.
[Figure: robust loss functions plotted against the forecast, "hhat (r2=2)", and against the "forecast error (r2=2)"]
It turns out that the sub-set of homogeneous robust loss functions is exactly
the parametric family I derived previously:
L\left(\hat\sigma^2, h; b\right) =
\begin{cases}
\frac{1}{(b+1)(b+2)}\left(\left(\hat\sigma^2\right)^{b+2} - h^{b+2}\right) - \frac{1}{b+1}\, h^{b+1}\left(\hat\sigma^2 - h\right), & \text{for } b \notin \{-1, -2\} \\[4pt]
h - \hat\sigma^2 + \hat\sigma^2 \log\frac{\hat\sigma^2}{h}, & \text{for } b = -1 \\[4pt]
\frac{\hat\sigma^2}{h} - \log\frac{\hat\sigma^2}{h} - 1, & \text{for } b = -2
\end{cases}
One would hope that a simple re-scaling of the data does not change any
conclusions, but this is not the case. Consider the following example:
C'(h) = -\log(1 + h)
Daily and intra-daily data on IBM from January 1993 to December 2003:
2772 observations.
The first 272 observations are used for estimation; the last 2500 observations
are used for forecast comparison.
[Figure: conditional variance of IBM returns, Jan94-Dec03]
[Figure: DMW t-statistics]
Most tests were developed for the case that Y_t \in \mathcal{F}_t, and no such problems
arise in that case.
Hansen and Lunde (2006) provide a sufficient condition on the loss function
for it to yield rankings that are robust to (mean-zero) noise in the volatility
proxy.
Patton (2011) verifies that many commonly-used loss functions lead to severe
biases when used with a noisy proxy.
More accurate volatility proxies were shown to alleviate these problems, but
they do not completely remove them.
Patton (2011) provides a necessary and sufficient condition for the loss
function to be robust, ruling out some previously-used loss functions.