
Volatility Forecast Evaluation and Comparison

Andrew Patton

Department of Economics
Duke University

OMI-SoFiE Summer School, 2013

Patton (Duke) Volatility Forecast Evaluation OMI-SoFiE Summer School, 2013 1 / 50


Papers to be covered
* Hansen, P. R., and Lunde, A., 2006, Consistent Ranking of Volatility
Models, Journal of Econometrics, 131, 97-121.
* Patton, A.J., 2011, Volatility Forecast Comparison using Imperfect
Volatility Proxies, Journal of Econometrics, 160(1), 246-256.

Andersen, T.G., Bollerslev, T., Christoffersen, P.F., and Diebold, F.X., 2006,
Volatility and Correlation Forecasting, in the Handbook of Economic
Forecasting, G. Elliott, C.W.J. Granger and A. Timmermann (eds.), North
Holland Press, Amsterdam.
Clark, T.E. and McCracken, M.W., 2009, Tests of Equal Predictive Ability
with Real-Time Data, J. Business and Economic Statistics, 27, 441-454
Patton, A.J., 2011, Data-Based Ranking of Realised Volatility Estimators,
Journal of Econometrics, 161(2), 284-303.
Patton, A.J. and Sheppard, K., 2009, Evaluating Volatility and Correlation
Forecasts, in T.G. Andersen, R.A. Davis, J.-P. Kreiss and T. Mikosch (eds.)
Handbook of Financial Time Series, Springer Verlag.
Evaluating volatility forecasts

The standard definition of an optimal forecast for loss function L is:

$$\hat{Y}^*_{t+h,t} \equiv \arg\min_{\hat{y} \in \mathcal{Y}} E\left[ L(Y_{t+h}, \hat{y}) \mid \mathcal{F}_t \right]$$

Traditional evaluation and comparison tests are usually based on Mincer-Zarnowitz type regressions:

$$Y_{t+h} = \beta_0 + \beta_1 \hat{Y}_{t+h,t} + u_{t+h}$$
$$H_0: \beta_0 = 0 \cap \beta_1 = 1$$

And Diebold-Mariano-West tests:

$$d_{t+h} \equiv L(Y_{t+h}, \hat{Y}^a_{t+h,t}) - L(Y_{t+h}, \hat{Y}^b_{t+h,t})$$
$$H_0: E[d_t] = 0$$

Both of these rely on the observability of $Y_{t+h}$, which does not hold when
the object of interest is $\sigma^2_{t+1} \equiv V[r_{t+1} \mid \mathcal{F}_t]$.
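
For an observable target, both tests can be sketched in a few lines. The following is a minimal illustration rather than a reference implementation: the toy data, the function names `mz_regression` and `dmw_tstat`, and the Bartlett-kernel long-run variance are assumptions made here, and the joint Wald test of the MZ null is omitted.

```python
import math
import random


def mz_regression(y, yhat):
    """OLS of y on a constant and yhat; returns (beta0, beta1)."""
    n = len(y)
    mean_f = sum(yhat) / n
    mean_y = sum(y) / n
    sxx = sum((f - mean_f) ** 2 for f in yhat)
    sxy = sum((f - mean_f) * (v - mean_y) for f, v in zip(yhat, y))
    beta1 = sxy / sxx
    return mean_y - beta1 * mean_f, beta1


def dmw_tstat(loss_a, loss_b, lags=0):
    """t-statistic for H0: E[d_t] = 0, where d_t = loss_a - loss_b.

    The long-run variance uses Bartlett (Newey-West) weights with `lags` lags.
    """
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    dbar = sum(d) / n
    e = [x - dbar for x in d]
    lrv = sum(x * x for x in e) / n
    for k in range(1, lags + 1):
        gamma_k = sum(e[t] * e[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    return dbar / math.sqrt(lrv / n)


# Hypothetical toy data: y = x + u, where x is known at forecast time.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [xi + rng.gauss(0.0, 1.0) for xi in x]
forecast_a = x                                # the conditional mean
forecast_b = [0.3 + 0.5 * xi for xi in x]     # biased and too smooth

beta0, beta1 = mz_regression(y, forecast_a)
loss_a = [(yi - f) ** 2 for yi, f in zip(y, forecast_a)]
loss_b = [(yi - f) ** 2 for yi, f in zip(y, forecast_b)]
t_stat = dmw_tstat(loss_a, loss_b)
print(beta0, beta1, t_stat)
```

A negative t-statistic favours forecast a under the chosen loss (squared error here).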



Latent variables in macroeconomics: GDP growth
Related to Clark and McCracken, 2009, JBES.



Using “volatility proxies” in MZ and DMW tests
The standard method of overcoming the latent nature of $\sigma^2_{t+1}$ (or $IV_{t+1}$ or
$E_t[IV_{t+1}]$ in modern work) is to find some volatility proxy.

A common example is the squared daily return, which is conditionally
unbiased for $\sigma^2_{t+1}$ if the conditional mean is zero:

$$\hat{\sigma}^2_{t+1} \equiv r^2_{t+1}$$
$$E_t\left[\hat{\sigma}^2_{t+1}\right] = E_t\left[r^2_{t+1}\right] = V_t[r_{t+1}] \equiv \sigma^2_{t+1}$$

“Realised volatility” may also be conditionally unbiased for the expected
integrated variance or the conditional variance (under some conditions):

$$\hat{\sigma}^2_{t+1} = RV^{(m)}_{t+1} \equiv \sum_{j=1}^{m} r_{jt}^2$$
$$E_t\left[\hat{\sigma}^2_{t+1}\right] = \sigma^2_{t+1} \;?$$
$$E_t\left[\hat{\sigma}^2_{t+1}\right] = E_t[IV_{t+1}] \;?$$



Using “volatility proxies” in MZ and DMW tests

Many previous papers ignored the estimation error in the volatility proxy

In some applications this is OK


In other applications, ignoring the estimation error can lead to undesirable
outcomes.

Hansen and Lunde (2006) and Patton (2011) derive results that show that using a
conditionally unbiased, but noisy, proxy

$$E_{t-1}\left[\hat{\sigma}^2_t\right] = \sigma^2_t \quad \text{but} \quad E_{t-1}\left[\left(\hat{\sigma}^2_t - \sigma^2_t\right)^2\right] > 0$$

may lead to problems, and suggest methods that are robust to noise in the
proxy.



Hansen and Lunde (2006)

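HL set out to obtain conditions which ensure that the ranking of volatility
forecasts is robust to noise:

$$E\left[L(\sigma^2_t, h_{1t})\right] \gtrless E\left[L(\sigma^2_t, h_{2t})\right] \iff E\left[L(\hat{\sigma}^2_t, h_{1t})\right] \gtrless E\left[L(\hat{\sigma}^2_t, h_{2t})\right]$$

where

$\sigma^2_t$ is the object of interest (latent), $\in \mathcal{F}_{t-1}$

$\hat{\sigma}^2_t$ is the proxy for $\sigma^2_t$ (observable), $\in \mathcal{F}_t$

$h_{it}$ is the volatility forecast from model $i$, $\in \mathcal{F}_{t-1}$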



Three “pre-orderings” of volatility models
Hansen and Lunde distinguish three “pre-orderings” (rankings) of volatility
models:
True pre-ordering: $E\left[L(\sigma^2_t, \cdot)\right]$

Approximate pre-ordering: $E\left[L(\hat{\sigma}^2_t, \cdot)\right]$

Empirical pre-ordering: $\frac{1}{T}\sum_{t=1}^{T} L(\hat{\sigma}^2_t, \cdot)$

Only the “empirical pre-ordering” is directly observable. Under basic
assumptions the empirical pre-ordering limits to the approximate pre-ordering.

The main question is: under what conditions are the “true” and
“approximate” pre-orderings equivalent?



Key assumptions in the analysis of Hansen and Lunde I

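Assumption 1(i): $L(\sigma^2_t, h)$ and $L(\hat{\sigma}^2_t, h)$ are “integrable” $\forall t$

($\iff E\left|L(\sigma^2_t, h)\right| < \infty$ and $E\left|L(\hat{\sigma}^2_t, h)\right| < \infty$)

Assumption 1(ii+iii):

$$\lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} E\left[L(\sigma^2_t, h_t)\right], \quad \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} E\left[L(\hat{\sigma}^2_t, h_t)\right], \quad \text{and} \quad \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T} L(\hat{\sigma}^2_t, h_t)$$

exist and are finite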



Key assumptions in the analysis of Hansen and Lunde II

Assumption 2(i): Define $\eta_t \equiv \hat{\sigma}^2_t - \sigma^2_t$. Let $\mathcal{F}_t$ be some filtration such that
for all $h_{it}$ we have $(\sigma^2_t, h_{it}) \in \mathcal{F}_{t-1}$

Assumption 2(ii): (a) $\partial L(\sigma^2, h)/\partial \sigma^2$ exists and does not depend on $h$, OR

Assumption 2(ii): (b) $\partial^2 L(\sigma^2, h)/\partial (\sigma^2)^2$ exists and does not depend on $h$,
AND $\{\eta_t, \mathcal{F}_t\}$ is a martingale difference sequence $\Rightarrow E[\eta_t \mid \mathcal{F}_{t-1}] = 0$

Note: Assumption 2(ii)(a) is unlikely to hold in practice; 2(ii)(b) is a more
relevant assumption.



Hansen and Lunde - main result

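The main result in Hansen and Lunde is Theorem 2: under assumptions 1
and 2 the “true” and “approximate” pre-orderings are equivalent.
Proof: take a second-order mean-value expansion of $L(\hat{\sigma}^2_t, h_t)$:

$$L(\hat{\sigma}^2_t, h_t) = L(\sigma^2_t, h_t) + \frac{\partial L(\sigma^2_t, h_t)}{\partial \sigma^2}\left(\hat{\sigma}^2_t - \sigma^2_t\right) + \frac{1}{2}\frac{\partial^2 L(\ddot{\sigma}^2_t, h_t)}{\partial (\sigma^2)^2}\left(\hat{\sigma}^2_t - \sigma^2_t\right)^2$$

where $\ddot{\sigma}^2_t = \alpha\hat{\sigma}^2_t + (1-\alpha)\sigma^2_t$, for some $\alpha \in [0,1]$, and
$\partial^2 L(\ddot{\sigma}^2_t, h_t)/\partial(\sigma^2)^2 \equiv \lambda(\ddot{\sigma}^2_t)$, since $\partial^2 L/\partial(\sigma^2)^2$ does not depend on $h$ by
assumption 2(ii)(b). Then: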



Hansen and Lunde - main result, cont’d

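Take the conditional, and then the unconditional, expectation of $L(\hat{\sigma}^2_t, h_t)$:

$$E_{t-1}\left[L(\hat{\sigma}^2_t, h_t)\right] = E_{t-1}\left[L(\sigma^2_t, h_t)\right] + \frac{\partial L(\sigma^2_t, h_t)}{\partial \sigma^2}E_{t-1}\left[\hat{\sigma}^2_t - \sigma^2_t\right] + \frac{1}{2}E_{t-1}\left[\lambda(\ddot{\sigma}^2_t)\left(\hat{\sigma}^2_t - \sigma^2_t\right)^2\right]$$
$$= E_{t-1}\left[L(\sigma^2_t, h_t)\right] + \frac{1}{2}E_{t-1}\left[\lambda(\ddot{\sigma}^2_t)\left(\hat{\sigma}^2_t - \sigma^2_t\right)^2\right]$$

so
$$E\left[L(\hat{\sigma}^2_t, h_t)\right] = E\left[L(\sigma^2_t, h_t)\right] + \frac{1}{2}E\left[\lambda(\ddot{\sigma}^2_t)\left(\hat{\sigma}^2_t - \sigma^2_t\right)^2\right]$$
and
$$E\left[L(\hat{\sigma}^2_t, h_{1t})\right] - E\left[L(\hat{\sigma}^2_t, h_{2t})\right] = E\left[L(\sigma^2_t, h_{1t})\right] - E\left[L(\sigma^2_t, h_{2t})\right]$$

The first-order term drops out because $E_{t-1}[\hat{\sigma}^2_t - \sigma^2_t] = 0$, and the second-order term cancels in the difference because it does not depend on $h$.

Thus we’ve shown the equivalence of the “approximate” and the “true”
pre-orderings (rankings)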



Application of the main result
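Hansen and Lunde verify that the MSE loss function

$$L(\sigma^2_t, h_t) = \left(\sigma^2_t - h_t\right)^2 \;\Rightarrow\; \frac{\partial^2 L(\sigma^2, h)}{\partial(\sigma^2)^2} = 2$$

satisfies their assumption 2 if used in conjunction with a conditionally
unbiased proxy: $E_{t-1}\left[\hat{\sigma}^2_t\right] = \sigma^2_t$

The MSE-log loss function does not:

$$L(\sigma^2_t, h_t) = \left(\log\sigma^2_t - \log h_t\right)^2 \;\Rightarrow\; \frac{\partial^2 L(\sigma^2, h)}{\partial(\sigma^2)^2} = 2\,\frac{1 - \log\sigma^2_t + \log h_t}{\sigma^4_t}$$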

Thus ranking volatility forecasts using MSE and the daily squared returns is
equivalent (asymptotically) to ranking them using the true conditional
variance. Ranking by MSE on log variances is not.
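
A small simulation can illustrate the contrast. This is a sketch under assumptions chosen here for illustration: the true variance is fixed at 1, the proxy is a squared standard normal return, and the two competing forecasts are the constants 1.0 (correct) and 0.3 (badly biased).

```python
import math
import random

rng = random.Random(42)
n = 100_000
sigma2 = 1.0                                   # true conditional variance
# Squared daily returns: conditionally unbiased but very noisy proxy.
proxy = [sigma2 * rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]


def mse(s2, h):
    return (s2 - h) ** 2


def mse_log(s2, h):
    return (math.log(s2) - math.log(h)) ** 2


def avg_loss(loss, targets, h):
    """Sample average loss of the constant forecast h."""
    return sum(loss(s2, h) for s2 in targets) / len(targets)


h_good, h_bad = 1.0, 0.3

for name, loss in (("MSE", mse), ("MSE-LOG", mse_log)):
    true_rank = loss(sigma2, h_good) < loss(sigma2, h_bad)
    proxy_rank = avg_loss(loss, proxy, h_good) < avg_loss(loss, proxy, h_bad)
    print(name, "good forecast preferred | true:", true_rank, "| proxy:", proxy_rank)
```

Under MSE the proxy-based ranking agrees with the ranking based on the true variance; under MSE-LOG the biased forecast is preferred once the noisy proxy is used.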



Regression-based evaluation
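Hansen and Lunde also consider ranking forecasts by the $R^2$ of a
Mincer-Zarnowitz regression of a transformation, $\varphi$, of the proxy on the same
transformation of the forecast:

$$\varphi(\hat{\sigma}^2_t) = \beta_0 + \beta_1\varphi(h_t) + u_t$$

Common choices for $\varphi$ are $\varphi(x) = x$, $\varphi(x) = \log x$, and $\varphi(x) = \sqrt{x}$.
The population $R^2$ from this regression is

$$R^2 = \frac{Cov\left[\varphi(\hat{\sigma}^2_t), \varphi(h_t)\right]^2}{V[\varphi(h_t)]\,V\left[\varphi(\hat{\sigma}^2_t)\right]}$$

HL show that rankings using this measure are robust to noise in $\hat{\sigma}^2_t$ if the
following condition is satisfied:

$$E_{t-1}\left[\left(\hat{\sigma}^2_t - \sigma^2_t\right)^j\right]\varphi^{(j)}(\sigma^2_t) = c_j \quad \text{for } j = 1, 2, \ldots \; \forall t$$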
Regression-based evaluation, cont’d
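Consider the condition below for a few special cases

$$E_{t-1}\left[\eta_t^j\right]\varphi^{(j)}(\sigma^2_t) = c_j \quad \text{for } j = 1, 2, \ldots \; \forall t$$

1 If $\varphi$ is affine, then this only requires that $E_{t-1}\left[\hat{\sigma}^2_t - \sigma^2_t\right] = c$ (possibly different
from zero)

2 If $\varphi$ is logarithmic, then we need

$$\frac{1}{\sigma^2_t}E_{t-1}[\eta_t] = c_1, \quad \text{OK if } E_{t-1}[\eta_t] = 0$$
$$-\frac{1}{\sigma^4_t}E_{t-1}\left[\eta_t^2\right] = c_2, \quad \text{so need } V_{t-1}\left[\hat{\sigma}^2_t\right] \propto \sigma^4_t, \text{ OK?}$$

In general, need $E_{t-1}\left[\eta_t^k\right] \propto \sigma_t^{2k} \;\forall k$

Thus the popular regression using logs is not likely to yield a robust ranking.

3 If $\varphi$ is the square-root, then need $E_{t-1}\left[\eta_t^k\right] \propto \sigma_t^{2k-1} \;\forall k$, which is also
unlikely to hold.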



Empirical application
Hansen and Lunde consider an application to IBM returns over the period
Jan 1995 to Feb 2002. They use the first 1250 observations for estimation
and the remaining 545 for out-of-sample forecast evaluation.

They consider 8 variations of ARCH models:
ARCH, GARCH, EGARCH, APARCH, T-GARCH, FIGARCH, FIAPARCH.

They consider 3 volatility proxies:

$$\tilde{\sigma}^2_{[sc.RV]t} \equiv \left(\frac{1}{T}\sum_{s=1}^{T}\frac{r_s^2}{RV_s}\right) RV_t$$
$$\tilde{\sigma}^2_{[RV+on]t} \equiv RV_t + \left(\log P_t^{open} - \log P_{t-1}^{close}\right)^2$$
$$\tilde{\sigma}^2_{[sq.ret]t} \equiv \left(\log P_t^{close} - \log P_{t-1}^{close}\right)^2$$

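HL note that, under some assumptions:

$$V\left[\tilde{\sigma}^2_{[sc.RV]t} - \sigma^2_t\right] \le V\left[\tilde{\sigma}^2_{[RV+on]t} - \sigma^2_t\right] \le V\left[\tilde{\sigma}^2_{[sq.ret]t} - \sigma^2_t\right]$$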



Empirical application - Table 1



Empirical application - Table 2



Patton (2011)

In this paper I extend the work of Hansen and Lunde (2006) in a few useful
directions:

1 I derive analytical results on the distortions in rankings of volatility forecasts
that occur using some common loss functions and volatility proxies.

2 I provide a necessary and sufficient condition on the loss function for it to yield
rankings of volatility forecasts that are robust to noise in the volatility proxy.

3 I provide some guidance on the choice of loss function for volatility forecast
comparison, through:

statistical considerations (required number of finite moments)

economic considerations (homogeneous loss functions are the only sensible
choice in economics)



Framework

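As in earlier forecast studies, I start from the definition of an optimal forecast:

$$\hat{Y}^*_{t+h,t} \equiv \arg\min_{\hat{y} \in \mathcal{Y}} E\left[L(Y_{t+h}, \hat{y}) \mid \mathcal{F}_t\right]$$

If we convert this to volatility forecasting and assume that no proxy is needed
we get:

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E\left[L(\sigma^2_t, h) \mid \mathcal{F}_{t-1}\right] = \sigma^2_t,$$

since $\sigma^2_t \in \mathcal{F}_{t-1}$ and $L(\sigma^2_t, \sigma^2_t) = 0 \;\forall L$

But if we are forced to use a volatility proxy we get

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E\left[L(\hat{\sigma}^2_t, h) \mid \mathcal{F}_{t-1}\right] = \sigma^2_t \;?? \text{ for all/some } L?$$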



Framework, cont’d
It is immediate that the optimal volatility forecast is $\sigma^2_t$ when $\sigma^2_t$ is
observable (at time $t-1$).

But if we define an optimal volatility forecast for loss function L as:

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E\left[L(\hat{\sigma}^2_t, h) \mid \mathcal{F}_{t-1}\right]$$

then it is not clear that for any choice of L we have $h^*_t = \sigma^2_t$.

Thus, the presence of noise in the proxy makes the choice of loss function
very important.

The problem we face here is unusual: we know what the optimal forecast is
(we want $h^*_t = \sigma^2_t$), and we want to find loss functions that ensure this.

It will turn out that by making sure that L generates $h^*_t = \sigma^2_t$, we also satisfy
Hansen and Lunde’s condition that $\partial^3 L/\partial(\hat{\sigma}^2_t)^2\partial h = 0$.



De…ning a “robust” loss function

A loss function is “robust” if the ranking of any two (possibly misspecified)
forecasts, $h_{1t}$ and $h_{2t}$, by expected loss is the same whether the ranking is
done using the true conditional variance, $\sigma^2_t$, or some conditionally unbiased
proxy, $\hat{\sigma}^2_t$.

That is, if:

$$E\left[L(\sigma^2_t, h_{1t})\right] \gtrless E\left[L(\sigma^2_t, h_{2t})\right] \iff E\left[L(\hat{\sigma}^2_t, h_{1t})\right] \gtrless E\left[L(\hat{\sigma}^2_t, h_{2t})\right]$$

for any $\hat{\sigma}^2_t$ s.t. $E_{t-1}\left[\hat{\sigma}^2_t\right] = \sigma^2_t$



Characterising the problems with some loss functions

A necessary condition for a loss function to be robust is that it generates an
optimal forecast that is equal to the true conditional variance, i.e. $h^*_t = \sigma^2_t$.

Thus a measure of the extent of the problem (if any) with a loss function is
to derive the optimal forecast under it and check how far it is from $\sigma^2_t$.

I do this for a collection of 9 loss functions, and three volatility proxies:

1 $\hat{\sigma}^2_t = r_t^2$

2 $\hat{\sigma}^2_t = RV_t^{(m)} \equiv \sum_{j=1}^{m} r_{jt}^2$

3 $\hat{\sigma}^2_t = RG_t^{*2}$, where $RG_t^* \equiv \frac{1}{2\sqrt{\log 2}}\left(\max_\tau \log P_\tau - \min_\tau \log P_\tau\right)$, for $t-1 < \tau \le t$



Loss functions

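MSE: $L(\hat{\sigma}^2, h) = \left(\hat{\sigma}^2 - h\right)^2$

QLIKE: $L(\hat{\sigma}^2, h) = \frac{\hat{\sigma}^2}{h} - \log\frac{\hat{\sigma}^2}{h} - 1$

MSE-LOG: $L(\hat{\sigma}^2, h) = \left(\log\hat{\sigma}^2 - \log h\right)^2$

MSE-SD: $L(\hat{\sigma}^2, h) = \left(\hat{\sigma} - \sqrt{h}\right)^2$

“HMSE” aka MSE-prop: $L(\hat{\sigma}^2, h) = \left(\hat{\sigma}^2/h - 1\right)^2$

MAE: $L(\hat{\sigma}^2, h) = \left|\hat{\sigma}^2 - h\right|$

MAE-LOG: $L(\hat{\sigma}^2, h) = \left|\log\hat{\sigma}^2 - \log h\right|$

MAE-SD: $L(\hat{\sigma}^2, h) = \left|\hat{\sigma} - \sqrt{h}\right|$

MAE-prop: $L(\hat{\sigma}^2, h) = \left|\hat{\sigma}^2/h - 1\right|$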
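
The nine loss functions are straightforward to code. The sketch below takes the proxy `s2` and the forecast `h` as strictly positive floats; the dictionary name `LOSSES` and the argument names are choices made here for illustration.

```python
import math

# Each function takes the volatility proxy s2 (sigma-hat squared) and a
# variance forecast h, both strictly positive.
LOSSES = {
    "MSE":      lambda s2, h: (s2 - h) ** 2,
    "QLIKE":    lambda s2, h: s2 / h - math.log(s2 / h) - 1.0,
    "MSE-LOG":  lambda s2, h: (math.log(s2) - math.log(h)) ** 2,
    "MSE-SD":   lambda s2, h: (math.sqrt(s2) - math.sqrt(h)) ** 2,
    "MSE-prop": lambda s2, h: (s2 / h - 1.0) ** 2,
    "MAE":      lambda s2, h: abs(s2 - h),
    "MAE-LOG":  lambda s2, h: abs(math.log(s2) - math.log(h)),
    "MAE-SD":   lambda s2, h: abs(math.sqrt(s2) - math.sqrt(h)),
    "MAE-prop": lambda s2, h: abs(s2 / h - 1.0),
}

for name, loss in LOSSES.items():
    print(f"{name:9s} L(2.0, 0.5) = {loss(2.0, 0.5):.4f}")
```

All nine are normalised so that the loss is zero when the forecast equals the proxy.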



The MSE loss function using squared returns as the proxy

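Consider the choice of the MSE loss function with squared returns as the
volatility proxy. In that case we know:

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E_{t-1}\left[\left(\hat{\sigma}^2_t - h\right)^2\right]$$
$$= E_{t-1}\left[r_t^2\right]$$
$$= \sigma^2_t E_{t-1}\left[\varepsilon_t^2\right], \quad \text{since } r_t = \sigma_t\varepsilon_t$$
$$= \sigma^2_t, \quad \text{since } \varepsilon_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t(0, 1, \nu)$$

And so under this combination of loss function and proxy we find no bias.

This is consistent with the result from Hansen and Lunde (2006).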



The QLIKE loss function using squared returns as the proxy
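Another popular choice is the so-called QLIKE loss function. In this case:

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E_{t-1}\left[\frac{\hat{\sigma}^2_t}{h} - \log\frac{\hat{\sigma}^2_t}{h} - 1\right]$$

$$\text{FOC:} \quad 0 = \frac{\partial}{\partial h}E_{t-1}\left[\frac{\hat{\sigma}^2_t}{h^*_t} - \log\frac{\hat{\sigma}^2_t}{h^*_t} - 1\right] = E_{t-1}\left[-\frac{\hat{\sigma}^2_t}{h_t^{*2}} + \frac{1}{h^*_t}\right]$$

$$\text{so} \quad h^*_t = E_{t-1}\left[\hat{\sigma}^2_t\right] = \sigma^2_t$$

So we again find that the optimal forecast is the true conditional variance.
What about some other loss functions?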



The MAE loss function using squared returns as the proxy
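Consider the common choice of the MAE loss function with squared returns
as the volatility proxy. In that case we know:

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E_{t-1}\left[\left|\hat{\sigma}^2_t - h\right|\right]$$
$$= \text{Median}_{t-1}\left[r_t^2\right]$$
$$= \sigma^2_t\,\text{Median}_{t-1}\left[\varepsilon_t^2\right], \quad \text{since } r_t = \sigma_t\varepsilon_t$$
$$= \begin{cases} \sigma^2_t\,\frac{\nu-2}{\nu}\,\text{Median}[F_{1,\nu}], & \text{if } r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t(0, \sigma^2_t, \nu) \\ \sigma^2_t\,\text{Median}\left[\chi^2_1\right] \approx 0.45\sigma^2_t, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma^2_t) \end{cases}$$

And so under this combination of loss function and proxy we find that, under
normality, the optimal forecast is equal to 0.45 times the true conditional variance.

Thus we will generally be led to selecting forecasts that are downward biased.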
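
The 0.45 figure for the normal case can be checked without simulation: if Z is standard normal, then P(Z² ≤ q) = 0.5 exactly when √q is the 75th percentile of Z. A short check, using only the standard library (the variable names are chosen here):

```python
from statistics import NormalDist

# Median of a chi-square(1) variable: square of the 0.75 quantile of N(0, 1).
z75 = NormalDist().inv_cdf(0.75)
median_chi2_1 = z75 ** 2

print(round(median_chi2_1, 4))
```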



The MSE-SD loss function using sq rets as the proxy
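Another popular choice is the MSE-SD loss function with squared returns as
the volatility proxy. In that case we have:

$$h^*_t \equiv \arg\min_{h \in \mathcal{H}} E_{t-1}\left[\left(\hat{\sigma}_t - \sqrt{h}\right)^2\right]$$
$$= \left(E_{t-1}[|r_t|]\right)^2$$
$$= \sigma^2_t\left(E_{t-1}[|\varepsilon_t|]\right)^2$$
$$= \begin{cases} \frac{\nu-2}{\pi}\left(\Gamma\left(\frac{\nu-1}{2}\right)\Big/\Gamma\left(\frac{\nu}{2}\right)\right)^2\sigma^2_t, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim \text{Student's } t(0, \sigma^2_t, \nu) \\ \frac{2}{\pi}\sigma^2_t \approx 0.64\sigma^2_t, & \text{if } r_t \mid \mathcal{F}_{t-1} \sim N(0, \sigma^2_t) \end{cases}$$

And so under this combination of loss function and proxy we find that, under
normality, the optimal forecast is equal to 0.64 times the true conditional variance.

We will again generally be led to selecting forecasts that are downward biased.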
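
The scale factors in the MSE-SD case follow directly from the Gamma-function expression. A quick numerical check, written with `math.lgamma` so that large degrees of freedom do not overflow (the function name `msesd_factor` is a label chosen here):

```python
import math


def msesd_factor(nu):
    """(E|eps|)^2 for a unit-variance Student's t with nu > 2 degrees of freedom."""
    log_ratio = math.lgamma((nu - 1) / 2.0) - math.lgamma(nu / 2.0)
    return (nu - 2) / math.pi * math.exp(2.0 * log_ratio)


normal_factor = 2.0 / math.pi   # limit as nu -> infinity

for nu in (6, 10, 1000):
    print(nu, round(msesd_factor(nu), 4))
print("normal", round(normal_factor, 4))
```

The values for ν = 6 and ν = 10 are approximately 0.56 and 0.60, and the factor approaches 2/π ≈ 0.64 as ν grows.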



Summary of results across all loss functions

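Optimal forecast by loss function and distribution of daily returns (the last three columns are for $r_t \mid \mathcal{F}_{t-1} \sim$ Student's $t(\nu)$):

| Loss function | $r_t \mid \mathcal{F}_{t-1} \sim F_t(0, \sigma^2_t)$ | $\nu = 6$ | $\nu = 10$ | $\nu \to \infty$ |
|---|---|---|---|---|
| MSE | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ |
| QLIKE | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ |
| MSE-LOG | $\exp\{E_{t-1}[\log\varepsilon^2_t]\}\,\sigma^2_t$ | $0.22\sigma^2_t$ | $0.25\sigma^2_t$ | $0.28\sigma^2_t$ |
| MSE-SD | $(E_{t-1}[\lvert\varepsilon_t\rvert])^2\,\sigma^2_t$ | $0.56\sigma^2_t$ | $0.60\sigma^2_t$ | $0.64\sigma^2_t$ |
| MSE-prop | $\text{Kurtosis}_{t-1}[r_t]\,\sigma^2_t$ | $6.00\sigma^2_t$ | $4.00\sigma^2_t$ | $3.00\sigma^2_t$ |
| MAE | $\text{Median}_{t-1}[r^2_t]$ | $0.34\sigma^2_t$ | $0.39\sigma^2_t$ | $0.45\sigma^2_t$ |
| MAE-LOG | $\text{Median}_{t-1}[r^2_t]$ | $0.34\sigma^2_t$ | $0.39\sigma^2_t$ | $0.45\sigma^2_t$ |
| MAE-SD | $\text{Median}_{t-1}[r^2_t]$ | $0.34\sigma^2_t$ | $0.39\sigma^2_t$ | $0.45\sigma^2_t$ |
| MAE-prop † | n/a | $2.73\sigma^2_t$ | $2.55\sigma^2_t$ | $2.36\sigma^2_t$ |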



Interpreting Patton’s Table 1

Table 1 reveals two important facts:

1 For two loss functions (MSE and QLIKE) the optimal forecast is the true
conditional variance. Thus they satisfy at least this necessary condition for
robustness (they also satisfy Hansen and Lunde’s sufficient condition).
2 The remaining seven loss functions all generate optimal forecasts that differ
from the true conditional variance.

The bias is worse for larger kurtosis

The bias can be upwards (MSE-prop and MAE-prop) or downwards (the rest)

The fact that the bias can be upwards or downwards explains the conflicting
findings reported by various authors when looking for the best volatility
forecast across a range of (non-robust) loss functions.



Using better volatility proxies
I next consider the use of realised volatility and the (adjusted) range, to
see how the bias is affected when the noise in the proxy is reduced.
I do this using a very simple DGP:

$$r_t = \sigma_t\,dW_t$$
$$\sigma_\tau = \sigma_t \quad \forall\,\tau \in (t-1, t]$$
$$r_{i,m,t} \equiv \int_{(i-1)/m}^{i/m} r_\tau\,d\tau = \sigma_t\int_{(i-1)/m}^{i/m} dW_\tau$$

so $\{r_{i,m,t}\}_{i=1}^{m} \sim iid\; N\left(0, \frac{\sigma^2_t}{m}\right)$

Although RV theory has been developed for more general DGPs, theory for
the range is mostly based on work by Feller (1951), who considered only the
above simple case.
Recent work by Christensen and Podolskij (2006, JoE) provides asymptotic
theory for the “realised range” but I do not consider this.



Distributional properties of RV and RG
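This simple DGP allows me to obtain analytically moments and quantiles of
RV and RG:

$$RV_t^{(m)} \equiv \sum_{i=1}^{m} r_{t,i}^2 = \frac{\sigma^2_t}{m}\sum_{i=1}^{m}\varepsilon_{t,i}^2$$

so $m\,\sigma_t^{-2}\,RV_t^{(m)} \sim \chi^2_m$

Feller (1951) provided the density of the range, and Parkinson (1980)
provided a formula for obtaining moments:

$$f(RG_t; \sigma_t) = 8\sum_{k=1}^{\infty}(-1)^{k-1}\frac{k^2}{\sigma_t}\,\phi\left(\frac{k\,RG_t}{\sigma_t}\right)$$
$$E\left[RG_t^p\right] = \frac{4}{\sqrt{\pi}}\,\Gamma\left(\frac{p+1}{2}\right)\left(2^{p/2} - 2^{2-p/2}\right)\zeta(p-1)\,\sigma_t^p, \quad \text{for } p \ge 1$$

where $\phi$ is the standard normal pdf, $\text{erfc}(x) \equiv 1 - \text{erf}(x)$, $\text{erf}(x)$ is the
‘error function’: $\text{erf}(x) \equiv \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$. $\zeta$ is the Riemann zeta
function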
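
Since $m\,\sigma_t^{-2}RV_t^{(m)} \sim \chi^2_m$ under this DGP, realised variance has mean $\sigma^2_t$ and variance $2\sigma^4_t/m$, so the proxy stays unbiased while its noise shrinks as m grows. The simulation below is a minimal sketch of that point; the values of `sigma2` and `days`, and the helper name `simulate_rv`, are arbitrary choices made here.

```python
import random
import statistics


def simulate_rv(sigma2, m, rng):
    """One day's realised variance from m iid N(0, sigma2/m) intraday returns."""
    sd = (sigma2 / m) ** 0.5
    return sum(rng.gauss(0.0, sd) ** 2 for _ in range(m))


rng = random.Random(1)
sigma2, days = 2.0, 10_000
results = {}
for m in (1, 13, 78):
    rv = [simulate_rv(sigma2, m, rng) for _ in range(days)]
    results[m] = (statistics.fmean(rv), statistics.variance(rv))
    print(m, results[m], "theory:", (sigma2, 2.0 * sigma2 ** 2 / m))
```

With m = 1 the proxy is just the squared daily return.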
The MAE loss function using RV or RG as the proxy
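Consider the MAE loss function with RV as the volatility proxy. In that case
we know:

$$h^*_t = \text{Median}_{t-1}[RV_t] = \frac{\sigma^2_t}{m}\,\text{Median}\left[\chi^2_m\right]$$
$$\approx \left(1 - \frac{2}{3m} + \frac{1}{9m^2}\right)\sigma^2_t$$
$$= \begin{cases} 0.44\,\sigma^2_t & \text{for } m = 1 \\ 0.95\,\sigma^2_t & \text{for } m = 13 \\ 0.99\,\sigma^2_t & \text{for } m = 78 \end{cases}$$

Using RG as the proxy we obtain:

$$h^*_t = \text{Median}_{t-1}\left[RG_t^{*2}\right] \approx \frac{2.2938}{4\log 2}\,\sigma^2_t \approx 0.83\sigma^2_t$$

using numerical methods to invert the cdf of $RG_t$.
Thus using more accurate volatility proxies does, as expected, reduce the bias
in this case.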
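
The RV figures can be cross-checked by simulation. The sketch below draws $\chi^2_m$ variates with the standard library's `random.gammavariate` (a Gamma with shape m/2 and scale 2 is a chi-square with m degrees of freedom) and compares the simulated median of $RV/\sigma^2$ with the approximation $1 - 2/(3m) + 1/(9m^2)$. The sample size and seed are arbitrary choices made here.

```python
import random
import statistics


def median_ratio_sim(m, n=100_000, seed=0):
    """Simulated median of RV / sigma^2, using m * RV / sigma^2 ~ chi-square(m)."""
    rng = random.Random(seed)
    draws = [rng.gammavariate(m / 2.0, 2.0) / m for _ in range(n)]
    return statistics.median(draws)


def median_ratio_approx(m):
    """Approximation 1 - 2/(3m) + 1/(9 m^2)."""
    return 1.0 - 2.0 / (3.0 * m) + 1.0 / (9.0 * m * m)


for m in (1, 13, 78):
    print(m, round(median_ratio_sim(m), 3), round(median_ratio_approx(m), 3))
```

The approximation is least accurate at m = 1, where the exact median ratio is about 0.45 rather than 0.44.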
Summary of results across all loss functions

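Optimal forecast by loss function and volatility proxy (the last four columns are for realised volatility with m intra-daily returns):

| Loss function | Range | $m = 1$ | $m = 13$ | $m = 78$ | $m \to \infty$ |
|---|---|---|---|---|---|
| MSE | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ |
| QLIKE | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ | $\sigma^2_t$ |
| MSE-LOG † | $0.85\sigma^2_t$ | $0.28\sigma^2_t$ | $0.91\sigma^2_t$ | $0.98\sigma^2_t$ | $\sigma^2_t$ |
| MSE-SD | $0.92\sigma^2_t$ | $0.56\sigma^2_t$ | $0.96\sigma^2_t$ | $0.99\sigma^2_t$ | $\sigma^2_t$ |
| MSE-prop | $1.41\sigma^2_t$ | $3.00\sigma^2_t$ | $1.15\sigma^2_t$ | $1.03\sigma^2_t$ | $\sigma^2_t$ |
| MAE | $0.83\sigma^2_t$ | $0.45\sigma^2_t$ | $0.95\sigma^2_t$ | $0.99\sigma^2_t$ | $\sigma^2_t$ |
| MAE-LOG | $0.83\sigma^2_t$ | $0.45\sigma^2_t$ | $0.95\sigma^2_t$ | $0.99\sigma^2_t$ | $\sigma^2_t$ |
| MAE-SD | $0.83\sigma^2_t$ | $0.45\sigma^2_t$ | $0.95\sigma^2_t$ | $0.99\sigma^2_t$ | $\sigma^2_t$ |
| MAE-prop † | $1.19\sigma^2_t$ | $2.36\sigma^2_t$ | $1.10\sigma^2_t$ | $1.02\sigma^2_t$ | $\sigma^2_t$ |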



A class of robust loss functions

Both the MSE and QLIKE loss functions yielded the conditional variance as
the optimal forecast.

This led me to the question: Is there a general class of such loss functions?

Proposition 2 suggests a class of loss functions, related to the
linear-exponential family of densities of Gourieroux, et al. (1984), and to
Gourieroux, et al. (1987).



A class of robust loss functions - assumptions
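A1: $E\left[\hat{\sigma}^2_t \mid \mathcal{F}_{t-1}\right] = \sigma^2_t$

A2: $\hat{\sigma}^2_t \mid \mathcal{F}_{t-1} \sim F_t \in \tilde{\mathcal{F}}$, the set of all absolutely continuous distribution
functions (i.e., those that have a pdf) on $\mathbb{R}_+$.

A3: L is twice continuously differentiable with respect to $h$ and $\hat{\sigma}^2$, and has a
unique minimum at $\hat{\sigma}^2 = h$.

A4: There exists some $h^*_t \in \text{int}(\mathcal{H})$ such that $h^*_t = E_{t-1}\left[\hat{\sigma}^2_t\right]$, where $\mathcal{H}$ is
a compact subset of $\mathbb{R}_{++}$.

A5: L and $F_t$ are such that: (a) $E_{t-1}\left[L(\hat{\sigma}^2_t, h)\right] < \infty$ for some $h \in \mathcal{H}$;
(b) $E_{t-1}\left[\partial L(\hat{\sigma}^2_t, \sigma^2_t)/\partial h\right] < \infty$; and
(c) $E_{t-1}\left[\partial^2 L(\hat{\sigma}^2_t, \sigma^2_t)/\partial h^2\right] < \infty$ for all $t$.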



A class of robust loss functions - Prop 2

Let assumptions A1 to A5 hold. Then a loss function L is “robust” if and
only if it takes the following form:

$$L(\hat{\sigma}^2, h) = \tilde{C}(h) + B(\hat{\sigma}^2) + C(h)\left(\hat{\sigma}^2 - h\right)$$

where B and C are twice continuously differentiable, C is a strictly
decreasing function on $\mathcal{H}$, and $\tilde{C}(h) \equiv \int C(h)\,dh$.

Note that if we normalise $L(h, h) = 0$, then the class simplifies to:

$$L(\hat{\sigma}^2, h) = \tilde{C}(h) - \tilde{C}(\hat{\sigma}^2) + C(h)\left(\hat{\sigma}^2 - h\right)$$
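
MSE and QLIKE can both be recovered from the normalised form by a suitable choice of C. The sketch below builds a loss from a pair (C, C̃) and checks two choices worked out here by direct substitution: C(h) = −2h with C̃(h) = −h² gives MSE, and C(h) = 1/h with C̃(h) = log h gives QLIKE. The helper name `robust_loss` is hypothetical.

```python
import math


def robust_loss(C, C_tilde):
    """Normalised robust loss L(s2, h) = C~(h) - C~(s2) + C(h) * (s2 - h).

    C must be strictly decreasing and C_tilde must be an antiderivative of C.
    """
    def loss(s2, h):
        return C_tilde(h) - C_tilde(s2) + C(h) * (s2 - h)
    return loss


mse = robust_loss(C=lambda h: -2.0 * h, C_tilde=lambda h: -h * h)
qlike = robust_loss(C=lambda h: 1.0 / h, C_tilde=math.log)

s2, h = 2.0, 0.5
print(mse(s2, h), (s2 - h) ** 2)
print(qlike(s2, h), s2 / h - math.log(s2 / h) - 1.0)
```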



A class of robust loss functions - Prop 2 proof

I proved this proposition by showing the equivalence of the following three
statements:
S1: The loss function takes the form given in the statement of the proposition;
S2: The loss function is robust in the sense of Definition 1;
S3: The optimal forecast under the loss function is the conditional variance.

I show that S1 $\Rightarrow$ S2 $\Rightarrow$ S3 $\Rightarrow$ S1.
S1 $\Rightarrow$ S2 follows from Hansen and Lunde (2006): their assumption 2 is
satisfied given the assumptions for the proposition.
S2 $\Rightarrow$ S3 is simple to show using the fact that rankings using “robust” loss
functions are the same whether $\hat{\sigma}^2_t$ or $\sigma^2_t$ is used.
Proving S3 $\Rightarrow$ S1 was harder. It is related to the necessity of the
“linear-exponential” family of densities for quasi-maximum likelihood
estimation (though my case does not fit directly into that framework). See the
paper for details.



Interesting sub-sets of the robust class of loss functions

The general class of robust loss functions is quite broad: it depends on the
univariate function C, which is only restricted to be a twice continuously
differentiable and strictly decreasing function on $\mathcal{H}$.

I was interested in extracting sub-sets of this class that might be easier to
handle for applied users. In doing so, I stumbled onto Prop 3:

Proposition 3:

(i) The “MSE” loss function is the only robust loss function that depends
solely on the forecast error, $\hat{\sigma}^2 - h$.
(ii) The “QLIKE” loss function is the only robust loss function that depends
solely on the standardised forecast error, $\hat{\sigma}^2/h$.



Parametric sub-sets of the robust class of loss functions
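I next tried to extract a parametric family of robust loss functions, which
nested the MSE and QLIKE cases.
I did this by using the fact that both these loss functions have a FOC of the
form:

$$0 = E_{t-1}\left[\frac{\partial L(\hat{\sigma}^2_t, h^*_t)}{\partial h}\right] = -(h^*_t)^b\,E_{t-1}\left[\hat{\sigma}^2_t - h^*_t\right], \quad b \in \mathbb{R}$$

(up to a positive scale factor), where $b = 0$ for MSE loss and $b = -2$ for QLIKE loss.

When we integrate this up, and normalise so that $L(h, h) = 0$ we obtain the
following:

$$L(\hat{\sigma}^2, h; b) = \begin{cases} \dfrac{1}{(b+1)(b+2)}\left(\hat{\sigma}^{2b+4} - h^{b+2}\right) - \dfrac{1}{b+1}h^{b+1}\left(\hat{\sigma}^2 - h\right), & \text{for } b \notin \{-1, -2\} \\[2ex] h - \hat{\sigma}^2 + \hat{\sigma}^2\log\dfrac{\hat{\sigma}^2}{h}, & \text{for } b = -1 \\[2ex] \dfrac{\hat{\sigma}^2}{h} - \log\dfrac{\hat{\sigma}^2}{h} - 1, & \text{for } b = -2 \end{cases}$$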
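
The parametric family translates directly into code. In the sketch below, `s2` stands for the proxy $\hat{\sigma}^2$ (so $\hat{\sigma}^{2b+4}$ is `s2 ** (b + 2)`), and the function name `robust_loss_b` is a label chosen here. With this normalisation the b = 0 member equals one half of the MSE loss, which leaves rankings unchanged.

```python
import math


def robust_loss_b(s2, h, b):
    """Member b of the parametric family of robust losses, with L(h, h; b) = 0."""
    if b == -1:
        return h - s2 + s2 * math.log(s2 / h)
    if b == -2:
        return s2 / h - math.log(s2 / h) - 1.0
    return ((s2 ** (b + 2) - h ** (b + 2)) / ((b + 1) * (b + 2))
            - h ** (b + 1) * (s2 - h) / (b + 1))


s2, h = 2.0, 0.5
for b in (1, 0.5, 0, -0.5, -1, -2, -5):
    print(b, round(robust_loss_b(s2, h, b), 4))
```

The b = −1 and b = −2 branches are the limits of the general expression, which can be checked numerically by evaluating the general branch at a value of b very close to −1.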



Parametric sub-sets of the robust class of loss functions
Robust loss functions for various choices of b

[Figure: “Various robust loss functions”. Loss (vertical axis) plotted against the forecast hhat (horizontal axis, 0 to 4) with r2 = 2, for b = 1, 0.5, 0 (MSE), -0.5, -1, -2 (QLIKE) and -5.]



Parametric sub-sets of the robust class of loss functions
Ratio of losses from neg errors to pos errors, for various choices of b

[Figure: “Ratio of loss from negative forecast errors to positive forecast errors”. Vertical axis labelled loss; horizontal axis is the forecast error (0 to 2) with r2 = 2; curves for b = 1, 0.5, 0 (MSE), -0.5, -1, -2 (QLIKE) and -5.]



The homogeneous sub-set of robust loss functions

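Finally I derived the sub-set of robust loss functions that is homogeneous of
order k:

$$L(a\hat{\sigma}^2, ah) = a^k L(\hat{\sigma}^2, h), \quad \forall\,a > 0$$

It turns out that the sub-set of homogeneous robust loss functions is exactly
the parametric family I derived previously:

$$L(\hat{\sigma}^2, h; b) = \begin{cases} \dfrac{1}{(b+1)(b+2)}\left(\hat{\sigma}^{2b+4} - h^{b+2}\right) - \dfrac{1}{b+1}h^{b+1}\left(\hat{\sigma}^2 - h\right), & \text{for } b \notin \{-1, -2\} \\[2ex] h - \hat{\sigma}^2 + \hat{\sigma}^2\log\dfrac{\hat{\sigma}^2}{h}, & \text{for } b = -1 \\[2ex] \dfrac{\hat{\sigma}^2}{h} - \log\dfrac{\hat{\sigma}^2}{h} - 1, & \text{for } b = -2 \end{cases}$$

where the degree of homogeneity is $k = b + 2$.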



Why is homogeneity so important? I

Homogeneity of the loss function is useful in economics because the choice of


units is often arbitrary: decimal vs. percent returns, etc

One would hope that a simple re-scaling of the data does not change any
conclusions, but this is not the case. Consider the following example:



Why is homogeneity so important? II
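Let $\sigma^2_t = 1 \;\forall t$, $(h_{1t}, h_{2t}) = (\gamma_1, \gamma_2) \;\forall t$, and $\hat{\sigma}^2_t$ is s.t. $E_{t-1}\left[\hat{\sigma}^2_t\right] = 1$ a.s. $\forall t$.

One robust but non-homogeneous loss is given by choosing:

$$C'(h) = -\log(1 + h)$$

Given this set-up, we have

$$E\left[L(a\hat{\sigma}^2_t, ah_{it})\right] = \frac{1}{4}\left[a\gamma_i(3a\gamma_i + 2) - 2(1 + a\gamma_i)^2\log(1 + a\gamma_i)\right] + a\left[a\gamma_i - (1 + a\gamma_i)\log(1 + a\gamma_i)\right](1 - \gamma_i) + \text{const}$$

Then define $d_t(\gamma_1, \gamma_2, a) \equiv L(a\hat{\sigma}^2_t, a\gamma_1) - L(a\hat{\sigma}^2_t, a\gamma_2)$

Then note that $E[d_t(0.33, 1.5, 1)] = -0.0087 \Rightarrow h_1 \succ h_2$

but $E[d_t(0.33, 1.5, 2)] = +0.0061 \Rightarrow h_1 \prec h_2$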
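
The sign reversal can be reproduced from the expected-loss expression. The sketch below assumes $C(h) = h - (1+h)\log(1+h)$ (an antiderivative of $-\log(1+h)$) and drops the constant, which is common to both forecasts and cancels in the difference. The quoted figures are matched when γ₁ is taken to be 1/3, which is treated here as the unrounded value of the 0.33 shown above; that reading is an assumption.

```python
import math


def expected_loss(gamma, a):
    """E[L(a*proxy, a*gamma)] up to a forecast-independent constant,
    for the robust loss with C'(h) = -log(1 + h) and E[proxy] = 1."""
    h = a * gamma
    c_tilde = 0.25 * (h * (3.0 * h + 2.0) - 2.0 * (1.0 + h) ** 2 * math.log(1.0 + h))
    c = h - (1.0 + h) * math.log(1.0 + h)
    return c_tilde + a * c * (1.0 - gamma)


def expected_d(gamma1, gamma2, a):
    """Expected loss differential; negative values favour forecast 1."""
    return expected_loss(gamma1, a) - expected_loss(gamma2, a)


d_a1 = expected_d(1.0 / 3.0, 1.5, 1.0)   # original units
d_a2 = expected_d(1.0 / 3.0, 1.5, 2.0)   # data rescaled by a = 2
print(round(d_a1, 4), round(d_a2, 4))
```

Rescaling the data by a factor of two flips the sign of the expected loss differential, and hence the ranking of the two forecasts.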



Empirical application

Daily and intra-daily data on IBM from January 1993 to December 2003,
2772 observations

I consider two simple but widely-used volatility models:


Rolling window: $h_{1t} = \frac{1}{60}\sum_{j=1}^{60} r^2_{t-j}$

RiskMetrics: $h_{2t} = \lambda h_{2,t-1} + (1 - \lambda)\,r^2_{t-1}, \quad \lambda = 0.94$

First 272 observations are used for estimation, last 2500 observations are
used for forecast comparison
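
Both forecasting rules can be sketched in a few lines. The function names, the simulated returns, and in particular the start-up value `h0` for the RiskMetrics recursion are assumptions made here; the slides do not say how the recursion was initialised.

```python
import random


def rolling_window_forecast(returns, window=60):
    """h_t = average of the previous `window` squared returns (None until available)."""
    sq = [r * r for r in returns]
    out = [None] * len(returns)
    running = sum(sq[:window])
    for t in range(window, len(returns)):
        out[t] = running / window
        running += sq[t] - sq[t - window]   # slide the window forward by one day
    return out


def riskmetrics_forecast(returns, h0, lam=0.94):
    """h_t = lam * h_{t-1} + (1 - lam) * r_{t-1}^2, started at h0 (an assumption)."""
    out = [h0]
    for t in range(1, len(returns)):
        out.append(lam * out[-1] + (1.0 - lam) * returns[t - 1] ** 2)
    return out


# Hypothetical data: iid N(0, 1) returns, so the true variance is 1 throughout.
rng = random.Random(7)
r = [rng.gauss(0.0, 1.0) for _ in range(500)]
h1 = rolling_window_forecast(r)
h2 = riskmetrics_forecast(r, h0=1.0)
print(h1[-1], h2[-1])
```

Each forecast for day t uses only returns up to day t − 1, so both are valid out-of-sample forecasts.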



Empirical application
Daily conditional variance forecasts for IBM, Jan 1994 – Dec 2003

[Figure: “Conditional variance forecasts”. Conditional variance (vertical axis) against time (Jan94 to Dec03) for the 60-day rolling window and RiskMetrics forecasts.]



DMW forecast comparison tests

DMW t-statistics, by loss function and volatility proxy:

| Loss function | Daily squared return | 65-min realised vol | 15-min realised vol | 5-min realised vol |
|---|---|---|---|---|
| b = 1 | -1.58 | -1.66 | -1.30 | -1.35 |
| b = 0 (MSE) | -0.59 | -0.80 | -0.03 | -0.13 |
| b = -1 | 1.30 | 1.04 | 1.65 | -1.55 |
| b = -2 (QLIKE) | 1.94 | 2.21 | 2.73 | 2.41 |
| b = -5 | -0.17 | 0.25 | 1.63 | 0.65 |



DMW forecast comparison tests
[Figure: DMW forecast comparison t-statistics. DMW t-statistic (vertical axis) against the loss function parameter b (horizontal axis, -5 to 1), for proxy = squared returns and proxy = realised volatility.]

Figure: Rolling-window vs. RiskMetrics DMW t-statistics


Summary
Hansen and Lunde (2006) and Patton (2011) reveal that the latent nature of
volatility (amongst other variables) can cause problems in standard tests for
forecast evaluation and comparison.

Most tests were developed for the case that $Y_t \in \mathcal{F}_t$, and no such problems
arise in that case.

Hansen and Lunde (2006) provide a sufficient condition on the loss function
for it to yield rankings that are robust to (mean zero) noise in the volatility
proxy.

Patton (2011) verifies that many commonly-used loss functions lead to severe
biases when used with a noisy proxy.

More accurate volatility proxies were shown to alleviate these problems, but
they do not completely remove them.

Patton (2011) provides a necessary and sufficient condition for the loss
function to be robust, ruling out some previously-used loss functions.
