Patton OMI Lec4

Volatility Forecast Evaluation and Comparison
Andrew Patton
Department of Economics
Duke University
OMI-SoFiE Summer School, 2013
Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 1 / 50

Papers to be covered
* Hansen, P. R., and Lunde, A., 2006, Consistent Ranking of Volatility
Models, Journal of Econometrics, 131, 97-121.
* Patton, A.J., 2011, Volatility Forecast Comparison using Imperfect
Volatility Proxies, Journal of Econometrics, 160(1), 246-256.
Andersen, T.G., Bollerselv, T., Christo¤ersen, P.F., and Diebold, F.X., 2006,
Volatility and Correlation Forecasting, in the Handbook of Economic
Forecasting, G. Elliott, C.W.J. Granger and A. Timmermann ed.s, North
Holland Press, Amsterdam.
Clark, T.E. and McCracken, M.W., 2009, Tests of Equal Predictive Ability
with Real-Time Data, J. Business and Economic Statistics, 27, 441-454
Patton, A.J., 2011, Data-Based Ranking of Realised Volatility Estimators,
Journal of Econometrics, 161(2), 284-303.
Patton, A.J. and Sheppard, K., 2009, Evaluating Volatility and Correlation
Forecasts, in T.G. Andersen, R.A. Davis, J.-P. Kreiss and T. Mikosch (eds.)
Handbook of Financial Time Series, Springer Verlag.
Evaluating volatility forecasts
The standard de…nition of an optimal forecast for loss function L is:
Ŷt +h,t arg min E [L (Yt +h , ŷ ) jFt ]

ŷ 2Y
Traditional evaluation and comparison tests are usually based on

Mincer-Zarnowitz type regressions:
Yt +h = β0 + β1 Ŷt +h,t + ut +h
H0 : β0 = 0 \ β1 = 1
And Diebold-Mariano-West tests
dt + h L Yt +h , Ŷta+h,t L Yt +h , Ŷtb+h,t
H0 : E [ dt ] = 0
Both of these rely on the observability of Yt +h , which does not hold when
the object of interest is σ2t +1 V [rt +1 jFt ] .

Latent variables in macroeconomics: GDP growth
Related to Clark and McCracken, 2009, JBES.

Using “volatility proxies” in MZ and DMW tests
The standard method of overcoming the latent nature of σ2t +1 (or IVt +1 or
Et [IVt +1 ] in modern work) is to …nd some volatility proxy.
A common example is the squared daily return, which is conditionally

unbiased for σ2t +1 if the conditional mean is zero:
σ̂2t +1 rt2+1
h i h i
Et σ̂2t +1 = Et rt2+1 = Vt [rt +1 ] σ2t +1
“Realizsed volatility” may also be conditionally unbiased for the expected

integrated variance or the conditional variance (under some conditions)
m
(m )
σ̂2t +1 = RVt +1 ∑ rjt2
j =1
h i
Et σ̂2t +1 = σ2t +1 ?
h i
Et σ̂2t +1 = Et [IVt +1 ] ?

Using “volatility proxies” in MZ and DMW tests
Many previous papers ignored the estimation error in the volatility proxy
In some applications this is OK

In other applications, ignoring the estimation error can lead to undesirable
outcomes.
Hansen and Lunde (2006) and Patton (2011) derive results that show using a
conditionally unbiased, but noisy, proxy
h i
Et 1 σ̂2t = σ2t
2
but Et 1 σ̂2t σ2t > 0
may lead to problems, and suggest methods that are robust to noise in the
proxy.

Hansen and Lunde (2006)
HL set out to obtain conditions which ensure that the ranking of volatility
forecasts is robust to noise:
h i h i h i h i
E L σ2t , h1t R E L σ2t , h2t , E L σ̂2t , h1t R E L σ̂2t , h2t
where
σ2t is the object of interest (latent) 2 Ft 1
σ̂2t is the proxy for σ2t (observable) 2 Ft
hit is the volatility forecast from model i 2 Ft 1

Three “pre-orderings” of volatility models
Hansen and Lunde distinguish three “pre-orderings” (rankings) of volatility
models:
h i
True pre-ordering: E L σ2t ,
h i
Approximate pre-ordering: E L σ̂2t ,
1 T
T t∑
Empirical pre-ordering: L σ̂2t ,
=1
Only the “empirical pre-ordering” is directly observable. Under basic

assumptions the empirical pre-ordering limits to the approximate pre-ordering.
The main question is: under what conditions are the “true” and
“approximate” pre-orderings equivalent?

Key assumptions in the analysis of Hansen and Lunde I
Assumption 1(i): L σ2t , h and L σ̂2t , h are “integrable” 8 t

h i
() E L σ2t , h < ∞ and E L σ̂2t , h < ∞)
Assumption 1(ii+iii):
T h i T h i
lim T 1
T !∞
∑E L σ2t , ht , lim T 1
T !∞
∑E L σ̂2t , ht ,
t =1 t =1
T
and lim T 1
T !∞
∑L σ̂2t , ht
t =1
exist and are …nite

Key assumptions in the analysis of Hansen and Lunde II
Assumption 2(i): De…ne η t σ̂2t σ2t . Let Ft be some …ltration such that
for all hit we have σ2t , hit 2 Ft 1
Assumption 2(ii): (a) ∂L σ2 , h /∂σ2 exists and does not depend on h, OR
2
Assumption 2(ii): (b) ∂2 L σ2 , h /∂ σ2 exists and does not depend on h,
AND fη t , Ft g is a martingale di¤erence sequence) E [η t jFt 1 ] = 0
Note: Assumption 2(ii)(a) is unlikely to hold in practice – 2(ii)(b) is a more

relevant assumption.

Hansen and Lunde - main result
The main result in Hansen and Lunde is Theorem 2: under assumptions 1

and 2 the “true” and “approximate” pre-orderings are equivalent
Proof: take a second-order mean-value expansion of L σ̂2t , ht :
∂L σ2t , ht
L σ̂2t , ht = L σ2t , ht + σ̂2t σ2t
∂σ2
1 ∂2 L σ̈2t , ht 2
+ 2
σ̂2t σ2t
2 ∂ ( σ2 )
where σ̈2t = ασ̂2t + (1 α) σ2t , for some α 2 [0, 1] , and

2 2
∂2 L σ̈2t , ht /∂ σ2 λ σ̈2t , since ∂2 L/∂ σ2 does not depend on h by
assumption 2(ii ) (b ) . Then:

Hansen and Lunde - main result, cont’d
Take unconditional expectation of L σ̂2t , ht :
h i h i ∂L σ2t , ht h i
Et 1 L σ̂2t , ht = Et 1 L σ2t , ht + 2
Et 1 σ̂2t σ2t
∂σ
1 2
+ Et 1 λ σ̈2t σ̂2t σ2t
2
h i 1 2
= Et 1 L σ2t , ht + Et 1 λ σ̈2t σ̂2t σ2t
2
h i h i 1 2
so E L σ̂2t , ht = E L σ2t , ht + E λ σ̈2t σ̂2t σ2t
2
h i h i h i h i
and E L σ̂2t , h1t E L σ̂2t , h2t = E L σ2t , h1t E L σ2t , h2t
Thus we’ve shown the equivalence of the “approximate” and the “true”
pre-orderings (rankings)

Application of the main result
Hansen and Lunde verify that the MSE loss function
2 ∂2 L σ 2 , h
L σ2t , ht = σ2t ht ) 2
=2
∂ ( σ2 )
satis…es their assumption 2 if used in conjunction with a conditionally

unbiased proxy: h i
Et 1 σ̂2t = σ2t
The MSE-log loss function does not:
2 ∂2 L σ 2 , h 1 log σ2t + log ht

L σ2t , ht = log σ2t log ht ) 2
=2
∂ ( σ2 ) σ4t
Thus ranking volatility forecasts using MSE and the daily squared returns is
equivalent (asymptotically) to ranking them using the true conditional
variance. Ranking by MSE on log variances is not.

Regression-based evaluation
Hansen and Lunde also consider ranking forecasts by the R 2 of a
Mincer-Zarnowitz regression of a transformation, ϕ, of the proxy on the same
transformation of the forecast:
ϕ σ̂2t = β0 + β1 ϕ (ht ) + ut
p
Common choices for ϕ are ϕ (x ) = x, ϕ (x ) = log x, and ϕ (x ) = x.
The population R 2 from this regression is
h i2
Cov ϕ σ̂2t , ϕ (ht )
R2 = h i
V [ ϕ (ht )] V ϕ σ̂2t
HL show that rankings using this measure are robust to noise in σ̂2t if the
following condition is satis…ed:
j
Et 1 σ̂2t σ2t ϕ(j ) σ2t = cj for j = 1, 2, ... 8t
Regression-based evaluation, cont’d
Consider the condition below for a few special cases
h i
Et 1 η jt ϕ(j ) σ2t = cj for j = 1, 2, ... 8t
1 If ϕ is a¢ ne, then only requires that Et 1 σ̂2t σ2t = c (possibly di¤erent

from zero)
2 If ϕ is logarithmic, then need
1
Et 1 [η t ] = c1 , OK if Et 1 [η t ] = 0
σ2t
h i 1 h i
Et 1 η 2t = c2 , so need Vt 1 σ̂2t ∝ σ4t , OK?
σ4
h ti
In general, need Et 1 η kt ∝ σ2k
t 8k
Thus the popular regression using logs is not likely to yield a robust ranking.
3 If ϕ is the square-root, then need Et 1 η kt ∝ σ2k
t
1
8k which is also
unlikely to hold.

Empirical application
Hansen and Lunde consider an application to IBM returns over the period
Jan 1995 to Feb 2002. They use the …rst 1250 observations for estimation
and the remaining 545 for out-of-sample forecast evaluation.
They consider 8 variations of ARCH models:

ARCH, GARCH, EGARCH, APARCH, T-GARCH, FIGARCH, FIAPARCH.
They consider 3 volatility proxies:

!
1 T rs2
T s∑
σ̃2[sc .RV ]t RVt
=1 RVs
2
σ̃2[RV +on ]t RVt + log Ptopen log Ptclose
1
2
σ̃2[sq.ret ]t log Ptclose log Ptclose
1
HL note that, under some assumptions:

h i h i h i
V σ̃2[sc .RV ]t σ2t V σ̃2[RV +on ]t σ2t V σ̃2[sq.ret ]t σ2t

Empirical application - Table 1

Empirical application - Table 2

Patton (2011)
In this paper I extend the work of Hansen and Lunde (2006) in a few useful
directions:
1 I derive analytical results on the distortions in rankings of volatility forecast

that occur using some common loss functions and volatility proxies.
2 I provide a necessary and su¢ cient condition on the loss function for it to yield
rankings of volatility forecasts that are robust to noise in the volatility proxy
3 I provide some guidance on the choice of loss function for volatility forecast
comparison, through:
statistical considerations (required number of …nite moments)

economic considerations (homogeneous loss functions are the only sensible
choice in economics)

Framework
As in earlier forecast studies, I start from the de…nition of an optimal forecast:
Ŷt +h,t arg min E [L (Yt +h , ŷ ) jFt ]

ŷ 2Y
If we convert this to volatility forecasting and assume that no proxy is needed

we get:
h i
ht arg min E L σ2t , h jFt 1 = σ2t ,
h 2H
since σ2t 2 Ft 1 and L σ2t , σ2t = 0 8 L
But if we are forced to use a volatility proxy we get

h i
ht arg min E L σ̂2t , h jFt 1
h 2H
= σ2t ?? for all/some L?

Framework, cont’d
It is immediate that the optimal volatility forecast is σ2t when σ2t is
observable (at time t 1).
But if we de…ne an optimal volatility forecast for loss function L as:

h i
ht arg min E L σ̂2t , h jFt 1
h 2H
then it is not clear that for any choice of L we have ht = σ2t .
Thus, the presence of noise in the proxy makes the choice of loss function
very important.
The problem we face here is unusual: we know what the optimal forecast is
(we want ht = σ2t ), and we want to …nd loss functions that ensure this.
2
h generatesiht = σt , we also satisfy
It will turn out that by making sure that L
3 2 2
Hansen and Lunde’s condition that ∂ L/ ∂ σ̂t ∂h = 0.

De…ning a “robust” loss function
A loss function is “robust” if the ranking of any two (possibly misspeci…ed)

forecasts, h1t and h2t , by expected loss is the same whether the ranking is
done using the true conditional variance, σ2t , or some conditionally unbiased
proxy, σ̂2t .
That is, if:

h i h i h i h i
E L σ2t , h1t R E L σ2t , h2t , E L σ̂2t , h1t R E L σ̂2t , h2t
h i
for any σ̂2t s.t. Et 1 σ̂2t = σ2t

Characterising the problems with some loss functions
A necessary condition for a loss function to be robust is that it generates an

optimal forecast that is equal to the true conditional variance, i.e. ht = σ2t .
Thus a measure of the extent of the problem (if any) with a loss function is
to derive the optimal forecast under it and check how far it is from σ2t .
I do this for a collection of 9 loss functions, and three volatility proxies:
1 σ̂2t = rt2
m
∑ rjt2
(m )
2 σ̂2t = RVt +1
j =1
3 σ̂2t = RGt p1
2 log 2
(maxτ log Pτ minτ log Pτ ), for t 1<τ t

Loss functions
2
MSE : L σ̂2 , h = σ̂2 h
σ̂2 σ̂2
QLIKE : L σ̂2 , h = log 1
h h
2
MSE -LOG : L σ̂2 , h = log σ̂2 log h
p 2
MSE -SD : L σ̂2 , h = σ̂ h
2
“HMSE ” aka MSE -prop : L σ̂2 , h = σ̂2 /h 1
MAE : L σ̂2 , h = σ̂2 h
MAE -LOG : L σ̂2 , h = log σ̂2 log h

p
MAE -SD : L σ̂2 , h = σ̂ h
MAE -prop : L σ̂2 , h = σ̂2 /h 1

The MSE loss function using squared returns as the proxy
Consider the choice of the MSE loss function with squared returns as the
volatility proxy. In that case we know:
2
ht arg min Et 1 σ̂2t h
h 2H
h i
= Et 1 rt2
h i
= σ2t Et 1 ε2t , since rt = σt εt
= σ2t , since εt jFt 1 s Student’s t (0, 1, ν)
And so under this combination of loss function and proxy we …nd no bias.
This is consistent with the result from Hansen and Lunde (2006).

The QLIKE loss function using squared returns as the proxy
Another popular choice is the so-called QLIKE loss function. In this case:
" #
σ̂2t σ̂2t
ht arg min Et 1 log 1
h 2H h h
" !#
∂ σ̂2t σ̂2t
FOC 0 = Et 1 log 1
∂h ht ht
" #
σ̂2t 1
= Et 1 +
ht 2 ht
h i
so ht = Et 1 σ̂2t
= σ2t
So we again …nd that the optimal forecast is the true conditional variance.
What about some other loss functions?

The MAE loss function using squared returns as the proxy
Consider the common choice of the MAE loss function with squared returns
as the volatility proxy. In that case we know:
h i
ht arg min Et 1 σ̂2t h
h 2H
h i
= Mediant 1 rt2
h i
= σ2t Mediant 1 ε2t , since rt = σt εt
8 2 ν 2
< σt ν Median [F1,ν ] , if rt jFt 1 s Student’s t 0, σ2t , ν
=
: 2
σt Median χ21 0.45σ2t , if rt jFt 1 s N 0, σ2t
And so under this combination of loss function and proxy we …nd that the
optimal forecast is equal to 0.45 times the true conditional variance
Thus we will generally be lead to selecting forecasts that are downward biased

The MSE-SD loss function using sq rets as the proxy
Another popular choice is the MSE-SD loss function with squared returns as
the volatility proxy. In that case we have:
p 2
ht arg min Et 1 σ̂t h
h 2H
= Et 1 [jrt j]2
= σ2t Et 1 [jεt j]2
8 2
>
< ν π 2 Γ ν 2 1 /Γ ν
σ2t , if rt jFt 1 s Student’s t 0, σ2t , ν ,
2
=
>
: 2 2
π σt 0.64σ2t , if rt jFt 1 s N 0, σ2t
And so under this combination of loss function and proxy we …nd that the
optimal forecast is equal to 0.64 times the true conditional variance
We will again generally be lead to selecting forecasts that are downward biased

Summary of results across all loss functions
Distribution of daily returns
Loss rt jFt 1 s Student 0 s t (ν)

function rt jFt 1 s Ft 0, σ2t ν=6 ν = 10 ν ! ∞
MSE σ2t σ2t σ2t σ2t

QLIKE σ2t σ2t σ2t σ2t
MSE-LOG exp Et 1 log ε2t σ2t 0.22σ2t 0.25σ2t 0.28σ2t
MSE-SD (Et 1 [jεt j])2 σ2t 0.56σ2t 0.60σ2t 0.64σ2t
MSE-prop Kurtosist 1 [rt ] σ2t 6.00σ2t 4.00σ2t 3.00σ2t
MAE Mediant 1 rt2 0.34σ2t 0.39σ2t 0.45σ2t
MAE-LOG Mediant 1 rt2 0.34σ2t 0.39σ2t 0.45σ2t
MAE-SD Mediant 1 rt2 0.34σ2t 0.39σ2t 0.45σ2t
MAE-prop † n/a 2.73σ2t 2.55σ2t 2.36σ2t

Interpreting Patton’s Table 1
Table 1 reveals two important facts:
1 For two loss functions (MSE and QLIKE) the optimal forecast is the true
conditional variance. Thus they satisfy at least this necessary condition for
robustness (they also satisfy Hansen and Lunde’s su¢ cient condition).
2 The remaining seven loss functions all generate optimal forecasts that di¤er
from the true conditional variance.
The bias is worse for larger kurtosis

The bias can be upwards (MSE-prop and MAE-prop) or downwards (the rest)
The fact that the bias can be upwards or downwards explains the con‡icting
…ndings various authors have found when looking for the best volatility
forecast across a range of (non-robust) loss functions.

Using better volatility proxies
I next consider the use of realised volatility and the (adjusted) range, to
see how the bias is a¤ected when the noise in the proxy is reduced.
I do this using a very simple DGP:
rt = σt dWt
στ = σ t 8 τ 2 (t 1, t ]
iZ/m iZ/m
ri ,m,t rτ d τ = σ t dWτ
(i 1 ) /m (i 1 ) /m
σ2t
so fri ,m,t gm
i =1 s iid N 0,
m
Although RV theory has been developed for more general DGPs, theory for
the range is mostly based on work by Feller (1951) who considered on the
above simple case.
Recent work by Christensen and Podolskij (2006, JoE) provides asymptotic
theory for the “realised range” but I do not consider this.

Distributional properties of RV and RG
This simple DGP allows me to obtain analytically moments and quantiles of
RV and RG:
m
(m ) σ2t m 2
RVt ∑ rt,i2 =
m i∑
εt,i
i =1 =1
2 (m )
so mσt RVt s χ2m
Feller (1951) provided the density of the range, and Parkinson (1980)
provided a formula for obtaining moments:
∞
k2 k RGt
f (RGt ; σt ) = 8 ∑( 1)k 1
σt
φ
σt
k =1
4 p+1
E RGtp = p Γ 2p /2 22 p /2 ζ (p 1) σpt , for p 1
π 2
where φ is the standard normal pdf , erfc(x ) 1 erf (x ), erf (x ) is the

p R∞ 2
‘error function’: erf (x ) 2/ π 0 e t dt. ζ is the Riemann zeta
function
The MAE loss function using RV or RG as the proxy
Consider the MAE loss function with RV as the volatility proxy. In that case
we know:
σ2 h i
ht = Mediant 1 [RVt ] = t Median χ2m
m
2 1
1 + σ2t
3m 9m2
8
< 0.44 σ2t for m = 1
= 0.95 σ2t for m = 13
:
0.99 σ2t for m = 78
Using RG as the proxy we obtain:
ht = Mediant 1 [RGt ]
2.2938 2
σ 0.83σ2t
4 log 2 t
using numerical methods to invert the cdf of RGt .
Thus using more accurate volatility proxies does, as expected, reduce the bias
in this case.
Summary of results across all loss functions
Volatility proxy
Loss function Realised volatility

Range m=1 m = 13 m = 78 m!∞
MSE σ2t σ2t σ2t σ2t σ2t

QLIKE σ2t σ2t σ2t σ2t σ2t
MSE-LOG † 0.85σ2t 0.28σ2t 0.91σ2t 0.98σ2t σ2t
MSE-SD 0.92σ2t 0.56σ2t 0.96σ2t 0.99σ2t σ2t
MSE-prop 1.41σ2t 3.00σ2t 1.15σ2t 1.03σ2t σ2t
MAE 0.83σ2t 0.45σ2t 0.95σ2t 0.99σ2t σ2t
MAE-LOG 0.83σ2t 0.45σ2t 0.95σ2t 0.99σ2t σ2t
MAE-SD 0.83σ2t 0.45σ2t 0.95σ2t 0.99σ2t σ2t
MAE-prop † 1.19σ2t 2.36σ2t 1.10σ2t 1.02σ2t σ2t

A class of robust loss functions
Both the MSE and QLIKE loss functions yielded the conditional variance as
the optimal forecast.
This lead me to the question: Is there a general class of such loss functions?
Proposition 2 suggests a class of loss functions, related to the

linear-exponentional family of densities of Gourieroux, et al. (1984), and to
Gourieroux, et al. (1987).

A class of robust loss functions - assumptions
h i
A1: E σ̂2t jFt 1 = σ2t
A2: σ̂2t jFt 1 s Ft 2 F̃ , the set of all absolutely continuous distribution

functions (ie, those that have a pdf ) on R+ .
A3: L is twice continuously di¤erentiable with respect to h and σ̂2 , and has a
unique minimum at σ̂2 = h.
h i
A4: There exists some ht 2 int (H) such that ht = Et 1 σ̂2t , where H is
a compact subset of R++ .
h i
A5: L and Ft are such that: (a) Et 1 L σ̂2t , h < ∞ for some h 2 H;
h i
(b ) Et 1 ∂L σ̂2t , σ2t /∂h < ∞; and
h i
(c ) Et 1 ∂2 L σ̂2t , σ2t /∂h2 < ∞ for all t.

A class of robust loss functions - Prop 2
Let assumptions A1 to A5 hold. Then a loss function L is “robust” if and

only if it takes the following form:
L σ̂2 , h = C̃ (h ) + B σ̂2 + C (h ) σ̂2 h
where B and C are twice continuously di¤erentiable,

R C is a strictly
decreasing function on H, and C̃ (h ) C (h ) dh.
Note that if we normalise L (h, h ) = 0, then the class simpli…es to:
L σ̂2 , h = C̃ (h ) C̃ σ̂2 + C (h ) σ̂2 h

A class of robust loss functions - Prop 2 proof
I proved this proposition by showing the equivalence of the following three

statements:
S 1: The loss function takes the form given the statement of the proposition;
S 2: The loss function is robust in the sense of De…nition 1;
S 3: The optimal forecast under the loss function is the conditional variance.
I show that S 1 ) S 2 ) S 3 ) S 1.
S 1 ) S 2 follows from Hansen and Lunde (2006): their assumption 2 is
satis…ed given the assumptions for the proposition.
S 2 ) S 3 is simple to show using the fact that rankings using “robust” loss
functions are the same whether σ̂2t or σ2t is used.
Proving S 3 ) S 1 was harder. It is related to the necessity of the
“linear-exponential” family of densities for quasi-maximum likelihood
estimation (though my case does not …t directly into that framework). See the
paper for details.

Interesting sub-sets of the robust class of loss functions
The general class of robust loss functions is quite broad: it depends on the
univariate function C , which is only restricted to be twice continuously
di¤erentiable and strictly decreasing function on H.
I was interested in extracting sub-sets of this class that might be easier to

handle for applied users. In doing so, I stumbled onto Prop 3:
Proposition 3:
(i) The “MSE” loss function is the only robust loss function that depends
solely on the forecast error, σ̂2 h.
(ii) The “QLIKE” loss function is the only robust loss function that depends
solely on the standardised forecast error, σ̂2 /h.

Parametric sub-sets of the robust class of loss functions
I next tried to extract a parametric family of robust loss functions, which
nested the MSE and QLIKE cases.
I did this by using the fact that both these loss functions have a FOC of the
form:
2 3
∂L σ̂2t , ht h i
0 = Et 1 4 5 = (ht )b Et 1 σ̂2t ht , b 2 R
∂h
where b = 0 for MSE loss and b = 2 for QLIKE loss.

When we integrate this up, and normalise so that L (h, h ) = 0 we obtain the
following:
8
>
>
1
(b +1 )(b +2 )
(σ̂2b +4 h b +2 ) 1
b +1 h
b +1 σ̂2 h ,
>
<
for b 2
/ f 1, 2g
L σ̂2 , h; b = 2
>
> h σ̂2 + σ̂2 log σ̂h , for b = 1
>
: σ̂2 2
h log σ̂h 1, for b = 2

Robust loss functions for various choices of b
Various robust loss functions

2.5
b=1
b=0.5
b=0 (MSE)
2 b=-0.5
b=-1
b=-2 (QLIKE)
b=-5
1.5
loss
0.5
0
0 0.5 1 1.5 2 2.5 3 3.5 4
hhat (r2=2)

Ratio of losses from neg errors to pos errors, for various choices of b
Ratio of loss from negative forecast errors to positive forecast errors

2.5
b=1
b=0.5
b=0 (MSE)
2 b=-0.5
b=-1
b=-2 (QLIKE)
b=-5
1.5
loss
0.5
0
0 0.5 1 1.5 2
forecast error (r2=2)

The homogeneous sub-set of robust loss functions
Finally I derived the sub-set of robust loss functions that is homogeneous of

order k:
L aσ̂2 , ah = ak L σ̂2 , h , 8 a > 0
It turns out that the sub-set of homogeneous robust loss functions is exactly
the parametric family I derived previously:
8
>
>
1
(b +1 )(b +2 )
(σ̂2b +4 h b +2 ) 1
b +1 h
b +1 σ̂2 h ,
>
<
for b 2
/ f 1, 2g
L σ̂2 , h; b = 2
>
> h σ̂2 + σ̂2 log σ̂h , for b = 1
>
: σ̂2 2
h log σ̂h 1, for b = 2
where the degree of homogeneity, k = b + 2.

Why is homogeneity so important? I
Homogeneity of the loss function is useful in economics because the choice of

units is often arbitrary: decimal vs. percent returns, etc
One would hope that a simple re-scaling of the data does not change any
conclusions, but this is not the case. Consider the following example:

Why is homogeneity so important? II
h i
σ2t = 1 8t, (h1t , h2t ) = (γ1 , γ2 ) 8t, and σ̂2t is s.t. Et 1 σ̂2t = 1 a.s. 8t.
One robust but non-homogeneous loss is given by choosing:
C 0 (h ) = log (1 + h )
Given this set-up, we have

h i 1h i
E L aσ̂2t , ahit = aγi (3aγi + 2) 2 (1 + aγi )2 log (1 + aγi )
4
+a [aγi (1 + aγi ) log (1 + aγi )] (1 γi ) + const
Then de…ne dt (γ1 , γ2 , a) L aσ̂2t , aγ1 L aσ̂2t , aγ2
Then note that E [dt (0.33, 1.5, 1)] = 0.0087 ) h1 h2

but E [dt (0.33, 1.5, 2)] = +0.0061 ) h1 h2

Daily and intra-daily data on IBM from January 1993 to December 2003,
2772 observations
I consider two simple but widely-used volatility models:

1 60
Rolling window : h1t = ∑ r2
60 j =1 t j
RiskMetrics : h2t = λh2t 1 + (1 λ) rt2 1 , λ = 0.94
First 272 observations are used for estimation, last 2500 observations are
used for forecast comparison

Daily conditional variance forecasts for IBM, Jan 1994 – Dec 2003
Conditional variance forecasts

35
60-day rolling window
RiskMetrics
30
25
Conditional variance
20
15
10
0
Jan94 Jan95 Jan96 Jan97 Jan98 Jan99 Jan00 Jan01 Jan02 Jan03 Dec03

DMW forecast comparison tests
t-statistics Volatility proxy

Daily 65-min 15-min 5-min
Loss function squared return realised vol realised vol realised vol
b=1 -1.58 -1.66 -1.30 -1.35
b= 0 (MSE) -0.59 -0.80 -0.03 -0.13
b = -1 1.30 1.04 1.65 -1.55
b = -2 (QLIKE) 1.94 2.21 2.73 2.41
b = -5 -0.17 0.25 1.63 0.65

DMW forecast comparison tests
DMW forecast comparison t-statistics
3
2
DMW t-statistic
-1
proxy = squared returns

proxy = realised volatility
-2
-5 -4 -3 -2 -1 0 1
loss function parameter
Figure: Rolling-window vs. RiskMetrics DMW t-statistics

Summary
Hansen and Lunde (2006) and Patton (2011) reveal that the latent nature of
volatility (amongst other variables) can cause problems in standard tests for
forecast evaluation and comparison.
Most tests were developed for the case that Yt 2 Ft , and no such problems
arise in that case.
Hansen and Lunde (2006) provide a su¢ cient condition on the loss function
for it to yield rankings that are robust to (mean zero) noise in the volatility
proxy.
Patton (2011) veri…es that many commonly-used loss functions lead to severe
biases when used with a noisy proxy.
More accurate volatility proxies were shown to alleviate these problems, but
they do not completely remove them.
Patton (2011) provides a necessary and su¢ cient condition for the loss
function to be robust, ruling out some previously-used loss functions

Patton OMI Lec4

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Patton OMI Lec4

Uploaded by

Copyright:

Available Formats

Volatility Forecast Evaluation and Comparison

OMI-SoFiE Summer School, 2013

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 1 / 50

The standard de…nition of an optimal forecast for loss function L is:

Ŷt +h,t arg min E [L (Yt +h , ŷ ) jFt ]

Traditional evaluation and comparison tests are usually based on

And Diebold-Mariano-West tests

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 3 / 50

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 4 / 50

A common example is the squared daily return, which is conditionally

“Realizsed volatility” may also be conditionally unbiased for the expected

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 5 / 50

In some applications this is OK

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 6 / 50

σ2t is the object of interest (latent) 2 Ft 1

σ̂2t is the proxy for σ2t (observable) 2 Ft

hit is the volatility forecast from model i 2 Ft 1

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 7 / 50

Only the “empirical pre-ordering” is directly observable. Under basic

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 8 / 50

Assumption 1(i): L σ2t , h and L σ̂2t , h are “integrable” 8 t

exist and are …nite

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 9 / 50

Assumption 2(ii): (a) ∂L σ2 , h /∂σ2 exists and does not depend on h, OR

Note: Assumption 2(ii)(a) is unlikely to hold in practice – 2(ii)(b) is a more

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 10 / 50

The main result in Hansen and Lunde is Theorem 2: under assumptions 1

where σ̈2t = ασ̂2t + (1 α) σ2t , for some α 2 [0, 1] , and

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 11 / 50

Take unconditional expectation of L σ̂2t , ht :

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 12 / 50

satis…es their assumption 2 if used in conjunction with a conditionally

The MSE-log loss function does not:

2 ∂2 L σ 2 , h 1 log σ2t + log ht

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 13 / 50

1 If ϕ is a¢ ne, then only requires that Et 1 σ̂2t σ2t = c (possibly di¤erent

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 15 / 50

They consider 8 variations of ARCH models:

They consider 3 volatility proxies:

HL note that, under some assumptions:

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 16 / 50

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 17 / 50

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 18 / 50

1 I derive analytical results on the distortions in rankings of volatility forecast

statistical considerations (required number of …nite moments)

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 19 / 50

As in earlier forecast studies, I start from the de…nition of an optimal forecast:

Ŷt +h,t arg min E [L (Yt +h , ŷ ) jFt ]

If we convert this to volatility forecasting and assume that no proxy is needed

since σ2t 2 Ft 1 and L σ2t , σ2t = 0 8 L

But if we are forced to use a volatility proxy we get

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 20 / 50

But if we de…ne an optimal volatility forecast for loss function L as:

then it is not clear that for any choice of L we have ht = σ2t .

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 21 / 50

A loss function is “robust” if the ranking of any two (possibly misspeci…ed)

That is, if:

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 22 / 50

A necessary condition for a loss function to be robust is that it generates an

I do this for a collection of 9 loss functions, and three volatility proxies:

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 23 / 50

MAE : L σ̂2 , h = σ̂2 h