
VaR model validation

Laura Garcı́a Jorcano

February 2018

Contents

1 Introduction

2 Backtesting
2.1 Unconditional Coverage Test
2.2 Conditional Coverage Test
2.3 Dynamic Quantile Test
2.4 Distribution Forecasts

3 Loss Functions
3.1 Lopez's Loss Functions
3.2 Sarma et al.'s Loss Functions
3.3 Giacomini and Komunjer's Loss Function
1 Introduction
Once a risk model is constructed, it should be carefully validated before being put to
practical use, and its performance should be evaluated regularly thereafter. A key feature
of model validation is backtesting: the application of quantitative methods to determine
whether the forecasts of a VaR forecasting model are consistent with the assumptions on
which the model is based, or to rank a group of such models against each other.

2 Backtesting
2.1 Unconditional Coverage Test
The unconditional coverage test introduced by Kupiec (1995) is based on the number of
violations, i.e. the number of times returns exceed the predicted VaR ($T_1$) over a period of
time ($T$) for a given significance level. If the VaR model is correctly specified, the failure
rate $\hat{\pi} = T_1/T$ should be equal to the pre-specified VaR level ($\alpha$).

Under the null hypothesis, the indicator function ($I_t = 1$ if $R_t < VaR^{\alpha}_{t-1}$ and $I_t = 0$
otherwise) is assumed to follow an i.i.d. Bernoulli process, with constant "success" prob-
ability equal to the significance level of the VaR ($\alpha$).

The null hypothesis $\pi = \alpha$ is evaluated through a likelihood ratio test

$$ LR_{uc} = -2\ln\left(\frac{L(\Pi_\alpha)}{L(\hat{\Pi})}\right) = -2\ln\left(\frac{(1-\alpha)^{T_0}\,\alpha^{T_1}}{(1-\hat{\pi})^{T_0}\,\hat{\pi}^{T_1}}\right) \xrightarrow{\ T\to\infty\ } \chi^2_1 $$

where $T_0 = T - T_1$.

Choosing a significance level of 5% for the test, we will have a critical value of 3.8414
from the $\chi^2_1$ distribution. If the $LR_{uc}$ test value is larger than the critical value, then we
reject the VaR model at the 5% level.

Alternatively, we can calculate the p-value associated with this test statistic,

$$ p\text{-value} = 1 - F_{\chi^2_1}(LR_{uc}) $$

where $F_{\chi^2_1}(\cdot)$ denotes the cumulative distribution function of a $\chi^2$ variable with one degree
of freedom. If the p-value is below the desired significance level, then we reject the null
hypothesis.
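As an illustration, the following is a minimal Python sketch of the Kupiec test described above. The function name and the use of NumPy/SciPy are choices made for this sketch, not part of the original text; degenerate samples (for example, no violations at all) are handled only through `xlogy`.

```python
import numpy as np
from scipy.special import xlogy          # x*log(y), returning 0 when x == 0
from scipy.stats import chi2

def kupiec_uc_test(returns, var_forecasts, alpha):
    """Return the LR_uc statistic and its p-value.

    `var_forecasts` holds the VaR forecasts expressed as return quantiles,
    so a violation occurs when the realized return falls below the forecast.
    """
    returns = np.asarray(returns)
    var_forecasts = np.asarray(var_forecasts)

    hits = returns < var_forecasts        # indicator I_t of violations
    T = len(returns)
    T1 = int(hits.sum())                  # number of violations
    T0 = T - T1
    pi_hat = T1 / T                       # observed failure rate

    # Log-likelihoods under the null (pi = alpha) and under pi_hat
    ll_null = xlogy(T0, 1 - alpha) + xlogy(T1, alpha)
    ll_alt = xlogy(T0, 1 - pi_hat) + xlogy(T1, pi_hat)

    lr_uc = -2.0 * (ll_null - ll_alt)
    p_value = 1.0 - chi2.cdf(lr_uc, df=1)
    return lr_uc, p_value
```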

These basic frequency tests have a simple intuition, are easy to apply and do not
require a great deal of information. However, they lack power (i.e., the ability to identify
bad models) except with very large sample sizes, because they throw potentially valuable
information away. This loss of information is the downside of the otherwise attractive
fact that the binomial test can be implemented using knowledge of only $T$, $\alpha$ and $T_1$.
The information discarded is of two kinds:

• Since they focus exclusively on the frequency of exceedances over the sample pe-
riod, these tests throw away information about the temporal pattern of exceedances.
However, this can be important, because many risk models predict that exceedances
should be independently and identically distributed, that is to say, many models
predict that the probability of a tail loss is constant and independent of whether or
not an exceedance occurred in the previous period.

• Frequency tests throw away (often useful) information on the sizes of tail losses
predicted by risk forecasting models. This has the unfortunate implication that a
"bad" risk model will pass a frequency test if it generates an acceptably accurate
frequency of exceedances, even if its forecasts of losses larger than VaR are very
poor.

2.2 Conditional Coverage Test


Christoffersen (1998) supposes that, under the alternative hypothesis of VaR inefficiency,
the process of violations $I_t(\alpha)$ can be modeled as a Markov chain whose matrix of transi-
tion probabilities is defined by

$$ \Pi_1 = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix} = \begin{pmatrix} 1-\pi_{01} & \pi_{01} \\ 1-\pi_{11} & \pi_{11} \end{pmatrix} $$

where $\pi_{ij} = \Pr[I_t(\alpha) = j \mid I_{t-1}(\alpha) = i]$. This Markov chain postulates the existence
of a memory of order one in the $I_t(\alpha)$ process, i.e., the probability of having a violation
(resp. not having one) in the current period depends on the occurrence or not of a violation
in the previous period. The null hypothesis of conditional coverage is then defined by the
following equality

$$ H_0: \Pi = \Pi_\alpha = \begin{pmatrix} 1-\alpha & \alpha \\ 1-\alpha & \alpha \end{pmatrix} $$

If we accept the null hypothesis, then we accept the unconditional coverage hypothesis:
whatever the state of the system at $t-1$, the probability of having a violation at time $t$
is equal to $\alpha$, the coverage rate, i.e., $\pi_t = \Pr[I_t(\alpha) = 1] = \alpha$. Furthermore, the probability
of a violation at time $t$ is independent of the state at $t-1$.

A simple likelihood ratio statistic, denoted $LR_{cc}$, then allows us to test the null hy-
pothesis of conditional coverage. Under $H_0: \pi_{01} = \pi_{11} = \alpha$, Christoffersen shows that

$$ LR_{cc} = -2\ln\left(\frac{L(\Pi_\alpha)}{L(\hat{\Pi}_1)}\right) \xrightarrow{\ T\to\infty\ } \chi^2_2 $$

where $\hat{\Pi}_1$ is the maximum likelihood estimator of the transition matrix under the al-
ternative hypothesis

$$ \hat{\Pi}_1 = \begin{pmatrix} \dfrac{T_{00}}{T_{00}+T_{01}} & \dfrac{T_{01}}{T_{00}+T_{01}} \\[2ex] \dfrac{T_{10}}{T_{10}+T_{11}} & \dfrac{T_{11}}{T_{10}+T_{11}} \end{pmatrix} $$

where $T_{ij}$ is the number of times we have $I_t(\alpha) = j$ and $I_{t-1}(\alpha) = i$. $L(\hat{\Pi}_1)$ is the
likelihood of the sequence $I_t(\alpha)$ associated with $\hat{\Pi}_1$,

$$ L(\hat{\Pi}_1) = (1-\hat{\pi}_{01})^{T_{00}}\,\hat{\pi}_{01}^{T_{01}}\,(1-\hat{\pi}_{11})^{T_{10}}\,\hat{\pi}_{11}^{T_{11}} $$

Let us note that the likelihood under the null hypothesis is equal to

$$ L(\Pi_\alpha) = (1-\alpha)^{T_0}\,\alpha^{T_1} $$

where $T_0 = T_{00} + T_{10}$ and $T_1 = T_{11} + T_{01}$.

Allowing for dependence in the hit sequence corresponds to allowing $\pi_{01} \neq \pi_{11}$; if,
on the other hand, the hits are independent over time, then the probability of a violation
tomorrow does not depend on whether today is a violation or not, i.e., $\pi_{01} = \pi_{11} = \pi$. Under
independence, the transition matrix is thus

$$ \hat{\Pi} = \begin{pmatrix} 1-\hat{\pi} & \hat{\pi} \\ 1-\hat{\pi} & \hat{\pi} \end{pmatrix} $$

We can test the independence hypothesis that $\pi_{01} = \pi_{11}$ using a likelihood ratio test,

$$ LR_{ind} = -2\ln\left(\frac{L(\hat{\Pi})}{L(\hat{\Pi}_1)}\right) \xrightarrow{\ T\to\infty\ } \chi^2_1 $$

Notice that the $LR_{cc}$ test takes the likelihood from the null hypothesis in the $LR_{uc}$
test and combines it with the likelihood from the alternative hypothesis in the $LR_{ind}$ test.
Therefore,

$$ LR_{cc} = -2\ln\left(\frac{L(\Pi_\alpha)}{L(\hat{\Pi}_1)}\right) = -2\ln\left(\frac{L(\Pi_\alpha)}{L(\hat{\Pi})}\cdot\frac{L(\hat{\Pi})}{L(\hat{\Pi}_1)}\right) = -2\ln\left(\frac{L(\Pi_\alpha)}{L(\hat{\Pi})}\right) - 2\ln\left(\frac{L(\hat{\Pi})}{L(\hat{\Pi}_1)}\right) = LR_{uc} + LR_{ind} $$

so that the joint test of conditional coverage can be calculated by simply summing the
two individual tests for unconditional coverage and independence.
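A minimal sketch of the independence and conditional coverage tests built from a 0/1 hit sequence is given below. The function name is illustrative, transition counts are taken over the $T-1$ consecutive pairs, and degenerate hit sequences (for example, no violations at all) are only partially guarded against through `xlogy`.

```python
import numpy as np
from scipy.special import xlogy          # x*log(y), returning 0 when x == 0
from scipy.stats import chi2

def christoffersen_cc_test(hits, alpha):
    """Return (LR_uc, LR_ind, LR_cc) and the p-value of LR_cc for a hit sequence."""
    hits = np.asarray(hits, dtype=int)

    # Transition counts T_ij: times I_{t-1} = i is followed by I_t = j
    t00 = np.sum((hits[:-1] == 0) & (hits[1:] == 0))
    t01 = np.sum((hits[:-1] == 0) & (hits[1:] == 1))
    t10 = np.sum((hits[:-1] == 1) & (hits[1:] == 0))
    t11 = np.sum((hits[:-1] == 1) & (hits[1:] == 1))

    t0, t1 = t00 + t10, t01 + t11
    pi_hat = t1 / (t0 + t1)
    pi01 = t01 / (t00 + t01) if (t00 + t01) > 0 else 0.0
    pi11 = t11 / (t10 + t11) if (t10 + t11) > 0 else 0.0

    # Log-likelihoods under the null, independence, and Markov alternatives
    ll_null = xlogy(t0, 1 - alpha) + xlogy(t1, alpha)
    ll_indep = xlogy(t0, 1 - pi_hat) + xlogy(t1, pi_hat)
    ll_markov = (xlogy(t00, 1 - pi01) + xlogy(t01, pi01)
                 + xlogy(t10, 1 - pi11) + xlogy(t11, pi11))

    lr_uc = -2.0 * (ll_null - ll_indep)
    lr_ind = -2.0 * (ll_indep - ll_markov)
    lr_cc = lr_uc + lr_ind
    return lr_uc, lr_ind, lr_cc, 1.0 - chi2.cdf(lr_cc, df=2)
```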

While this test is easy to use, it is rather limited for two main reasons:
1. The independence is tested against a very particular form of alternative dependence
structure that does not take into account dependencies of order higher than one.

2. The use of a Markov chain makes it possible only to measure the influence of past
violations It (α) and not the influence of any other exogenous variable.

2.3 Dynamic Quantile Test


The Dynamic Quantile test proposed by Engle and Manganelli (2004) overcomes these
latter two drawbacks of the conditional coverage test. They suggest using a linear regres-
sion model that links current violations to past violations. It tests the null hypothesis
that the sequence of hits ($Hit_t$) is uncorrelated with any variable that belongs to the in-
formation set $\Omega_{t-1}$ available when the VaR was calculated and has a mean value of zero,
which implies that the hits will not be autocorrelated.

Letting $Hit_t(\alpha) = I_t(\alpha) - \alpha$, we have

$$ Hit_t(\alpha) = \begin{cases} 1-\alpha & \text{if } R_t < VaR_{t|t-1}(\alpha) \\ -\alpha & \text{if } R_t \geq VaR_{t|t-1}(\alpha) \end{cases} $$

We now consider the following linear regression model,

$$ Hit_t(\alpha) = \delta_0 + \sum_{i=1}^{p}\delta_i\,Hit_{t-i}(\alpha) + \sum_{j=p+1}^{q}\delta_j X_j + \epsilon_t $$

where the $X_j$ are explanatory variables contained in $\Omega_{t-1}$. Engle and Manganelli (2004)
suggested $X_1 = VaR(\alpha)$. By doing so, we are testing whether the probability of an
exception depends on the level of the VaR.

The Dynamic Quantile test statistic allows us to test the null hypothesis $H_0: \delta_0 =
\delta_i = \delta_j = 0$, $i = 1,\dots,p$, $j = p+1,\dots,q$. The statistic is a Wald statistic, and the asymptotic
distribution of the OLS estimator under the null can be easily established by invoking an
appropriate central limit theorem,

$$ \hat{\delta}_{OLS} = (X'X)^{-1}X'\,Hit \;\overset{a}{\sim}\; N\!\left(0,\ \alpha(1-\alpha)(X'X)^{-1}\right) $$

where $X$ is the matrix of explanatory variables and $Hit$ is the vector of the $Hit_t(\alpha)$.

It is now straightforward to derive the Dynamic Quantile test statistic,

$$ DQ = \frac{\hat{\delta}_{OLS}'\,X'X\,\hat{\delta}_{OLS}}{\alpha(1-\alpha)} \;\overset{a}{\sim}\; \chi^2(p+q+1) $$

Under the null hypothesis, $E[Hit_t(\alpha)] = E(\epsilon_t) = 0$, which implies that $\Pr[I_t(\alpha) =
1] = E[I_t(\alpha)] = \alpha$, i.e., the hits are unbiased and uncorrelated, so that the explanatory
power of the above regression should be zero.
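The sketch below implements the DQ statistic along the lines described above, regressing the hit series on a constant, $p$ of its own lags and the contemporaneous VaR forecast. The function name, the default number of lags and the convention of setting the degrees of freedom equal to the number of estimated coefficients are assumptions of this illustration.

```python
import numpy as np
from scipy.stats import chi2

def dq_test(returns, var_forecasts, alpha, p=4):
    """Dynamic Quantile test with p lagged hits and the VaR forecast as regressors."""
    returns = np.asarray(returns)
    var_forecasts = np.asarray(var_forecasts)

    hits = (returns < var_forecasts).astype(float) - alpha   # Hit_t(alpha)
    T = len(hits)

    # Regressor matrix: constant, Hit_{t-1}, ..., Hit_{t-p}, VaR forecast at t
    rows = [np.concatenate(([1.0], hits[t - p:t][::-1], [var_forecasts[t]]))
            for t in range(p, T)]
    X = np.array(rows)
    y = hits[p:]

    xtx = X.T @ X
    delta_ols = np.linalg.solve(xtx, X.T @ y)                # OLS estimate of delta
    dq = (delta_ols @ xtx @ delta_ols) / (alpha * (1 - alpha))

    dof = X.shape[1]                                         # number of coefficients
    return dq, 1.0 - chi2.cdf(dq, df=dof)
```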

2.4 Distribution Forecasts
The tests considered so far all share one common feature: they focus exclusively on the
frequency of exceedances, and therefore throw away information about their sizes. Yet
information about the sizes of exceedances is potentially useful for assessing model ad-
equacy, and we would expect tests that use such information to be more reliable than
tests that use only frequency information. We also have to wonder whether it is helpful to
throw away the non-tail information, in which case we might wish to consider the whole
return distribution, and not just its tails. This suggests that we should compare realized
distributions against predicted distributions.

A VaR estimate is just one quantile of an entire distribution that is forecast over
an h-day risk horizon. Hence, an assessment of the accuracy of the entire distribution,
instead of just one of its quantiles, is a more extensive test of the risk model. In other
words, we assess the quality of the entire distribution forecast, rather than focusing ex-
clusively on the tails.

Let us denote the forecasted distribution function by $F_t$. The subscript $t$ is there
to remind us that the forecast of the forward-looking $h$-day return is made at time $t$. Set

$$ p_{ht} = F_t(R_{t+h}) $$

where $R_{t+h}$ denotes the realized return on the portfolio between time $t$ and time $t+h$
in the backtest. Assuming that the backtest is based on non-overlapping data, our null
hypothesis is

$$ H_0: p_{ht} \sim \text{i.i.d. } U(0,1) $$

where $U(0,1)$ denotes the standard uniform distribution. In other words, our null hypoth-
esis is that the probabilities $p_{ht}$ should be a sequence of random numbers. Put another
way, our risk model should not be able to predict the probability of the realized return.

Suppose our risk model systematically underestimates the tail risk. Then there will be
more realized returns in the tail than are predicted by the model. As a result, the backtest
will generate too many values for $p_{ht}$ that are near 0 or near 1. Likewise, too many values
will lie near the centre, due to the higher peak of a leptokurtic density. In other
words, the empirical density of the return probabilities would have a “W” shape instead
of being flat, as it should be according to the standard uniform distribution.

A test of the null hypothesis is therefore a test on the proximity of our empirical dis-
tribution to a theoretical distribution, which in our case is the standard uniform. However,
tests on the standard uniform distribution are not as straightforward as tests on the stan-
dard normal distribution, so we transform $p_{ht}$ to a variable that has a standard normal
distribution under the null hypothesis,

$$ Z_{ht} = \Phi^{-1}(p_{ht}) $$

where $\Phi$ denotes the standard normal distribution function. The null hypothesis may now
be written

$$ H_0: Z_{ht} \sim \text{i.i.d. } N(0,1) $$

and a very simple alternative is

$$ H_1: Z_{ht} \sim \text{i.i.d. } N(\mu_h, \sigma_h^2), \quad \mu_h \neq 0,\ \sigma_h \neq 1 $$

A parametric test statistic may now be based on a likelihood ratio statistic of the form

$$ -2\ln LR = -2\ln\left(\frac{L_0}{L_1}\right) \sim \chi^2_2 $$

where the likelihood function under the null hypothesis, $L_0$, is the product of the standard
normal density functions based on the realized returns, and the likelihood function under
the alternative hypothesis, $L_1$, is the product of the normal density functions with mean
$\mu_h$ and standard deviation $\sigma_h$ based on the realized returns. If the backtest sample size is
$T$ then, using the log-likelihood of the normal distribution, it can be shown that

$$ -2\ln LR = \sum_{t=1}^{T} z_{ht}^2 - \sum_{t=1}^{T}\left(\frac{z_{ht}-\hat{\mu}_h}{\hat{\sigma}_h}\right)^2 - T\ln\hat{\sigma}_h^2 $$

where $\hat{\mu}_h$ and $\hat{\sigma}_h$ denote the sample mean and standard deviation of $Z_{ht}$.
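A minimal sketch of this distribution-forecast test is shown below. The function name is illustrative, the variance is the maximum-likelihood estimate, and probabilities of exactly 0 or 1 (which map to infinite $z$-values) are not handled.

```python
import numpy as np
from scipy.stats import chi2, norm

def distribution_lr_test(p_ht):
    """LR test of H0: Phi^{-1}(p_ht) ~ i.i.d. N(0,1) against N(mu, sigma^2)."""
    z = norm.ppf(np.asarray(p_ht))        # z_ht = Phi^{-1}(p_ht)
    T = len(z)

    mu_hat = z.mean()
    sigma2_hat = z.var()                  # ML estimate (divides by T)

    # -2 ln LR = sum z^2 - sum ((z - mu_hat)/sigma_hat)^2 - T ln sigma_hat^2
    lr = (np.sum(z ** 2)
          - np.sum((z - mu_hat) ** 2) / sigma2_hat
          - T * np.log(sigma2_hat))
    return lr, 1.0 - chi2.cdf(lr, df=2)
```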

3 Loss Functions
It is often the case that we are not just interested in how individual models perform, but
also in how different models compare to each other. We can do so using forecast evaluation
methods that give each model a score in terms of some loss function; we then use the loss
scores to rank the models: the lower the loss, the better the model. These approaches are
not statistical tests of model adequacy. Instead, their purpose is to rank models. Because
they are not statistical tests, forecast evaluation approaches do not suffer from the low
power of standard tests such as basic frequency tests; this makes them attractive for back-
testing with the small data sets typically available in real-world applications. In addition,
they also allow us to tailor the loss function to take account of particular concerns: for
example, we might be more concerned about higher losses than lower losses, and might
therefore wish to give higher losses a greater weight in our loss function.

The ranking process has four key ingredients and a single output, a final score for
each model.

The first ingredient is a set of $n$ paired observations: the return observed each period
and its associated VaR forecast.

The second ingredient is a loss function that gives each observation a score depend-
ing on how the observed return compares to the VaR forecasted for that period. Thus,

if $R_t$ is the return realized over period $t$ and $VaR_t(\alpha)$ is the forecasted VaR for that period, our
loss function assigns the following value to the period-$t$ observation,

$$ C_t = \begin{cases} f(R_t, VaR_t(\alpha)) & \text{if } R_t < VaR_t(\alpha) \\ g(R_t, VaR_t(\alpha)) & \text{if } R_t \geq VaR_t(\alpha) \end{cases} $$

where $f(R_t, VaR_t(\alpha)) \geq g(R_t, VaR_t(\alpha))$ to ensure that tail losses do not receive a lower value
than other observations.

The third ingredient is a benchmark, which gives us an idea of the score we could
expect from a “good” model.

The fourth ingredient is a score function, which takes as its inputs our loss function
and benchmark values.
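To make this concrete, the sketch below turns a loss-function pair $(f, g)$ into an average score per model. The function name and the use of the sample average as the score are illustrative choices, and the comparison against a benchmark is left to the caller.

```python
import numpy as np

def score_model(returns, var_forecasts, f, g):
    """Average score C_t, with f applied on violation days and g otherwise.

    f and g must be vectorised functions of (returns, var_forecasts).
    """
    returns = np.asarray(returns)
    var_forecasts = np.asarray(var_forecasts)

    violations = returns < var_forecasts
    c_t = np.where(violations, f(returns, var_forecasts), g(returns, var_forecasts))
    return c_t.mean()
```

For example, Lopez's magnitude loss function of the next subsection corresponds to $f(R_t, VaR_t(\alpha)) = 1 + (R_t - VaR_t(\alpha))^2$ and $g(R_t, VaR_t(\alpha)) = 0$.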

3.1 Lopez’s Loss Functions


Lopez (1998, 1999) introduced loss functions in VaR evaluation. He considers three loss
functions: (i) the binomial loss function, which assigns the value 1 when the VaR estimate
is exceeded by the loss and 0 otherwise; (ii) the zone loss function, based on the adjustments
to the multiplication factor used in the Market Risk Amendment; and (iii) the magnitude loss
function, which assigns a quadratic numerical score when the VaR estimate is exceeded by the
loss and 0 otherwise. In the latter, not only the VaR exception but also the magnitude
of the loss is incorporated.
We focus on the third loss function proposed by Lopez. It pays attention to the
magnitude of the non-covered losses only when they occur. Thus, Lopez's magnitude loss
function has the following quadratic specification,

$$ C_t = \begin{cases} 1 + (R_t - VaR_t(\alpha))^2 & \text{if } R_t < VaR_t(\alpha) \\ 0 & \text{if } R_t \geq VaR_t(\alpha) \end{cases} $$

In this loss function, the quadratic term ensures that large failures are penalized more
than small failures. This function was built mainly for regulatory purposes, for evaluating
banks' internal models.

A VaR model should be preferred to another if it has a lower average value of the loss
function, $\frac{1}{T}\sum_{t=1}^{T} C_t$.
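A minimal sketch of Lopez's magnitude loss function and its average value, with an illustrative function name:

```python
import numpy as np

def lopez_magnitude_loss(returns, var_forecasts):
    """Average of C_t = 1 + (R_t - VaR_t(alpha))^2 on violation days, 0 otherwise."""
    returns = np.asarray(returns)
    var_forecasts = np.asarray(var_forecasts)

    violations = returns < var_forecasts
    c_t = np.where(violations, 1.0 + (returns - var_forecasts) ** 2, 0.0)
    return c_t.mean()
```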

3.2 Sarma et al.’s Loss Functions


Lopez (1998, 1999) proposed three loss functions which might reflect the utility function
of a regulator: the binomial loss function, the magnitude loss function and the zone loss
function. Broadly speaking, the latter two penalise failures more severely than the binomial
loss function. Using that insight, Sarma et al. (2003) introduced the "regulatory loss
function" (RLF), which reflects the regulator's utility function, and the "firm's loss
function" (FLF), which reflects the utility function of a firm. Both use squared dis-
tances between the observed returns and the predicted $VaR(\alpha)$ when a violation occurs,
to ensure a greater penalty on large excesses.

The RLF that they use is similar to the “magnitude loss function” of Lopez. It
penalises failures differently from the binomial loss function, and pays attention to the
magnitude of the failure.
The loss function is defined as

$$ l_t = \begin{cases} (R_t - VaR_t(\alpha))^2 & \text{if } R_t < VaR_t(\alpha) \\ 0 & \text{if } R_t \geq VaR_t(\alpha) \end{cases} $$

A VaR model should be preferred to another if it has a lower average value of the loss
function, $\frac{1}{T}\sum_{t=1}^{T} l_t$.

The FLF captures the idea that firms use VaR in internal risk management. Here
there is a conflict between the goal of safety and the goal of profit maximization: a VaR
estimator which reported "too high" values of VaR would force the firm to hold "too
much" capital, imposing the opportunity cost of capital upon the firm. They propose to
model the firm's loss function by penalising failures but also imposing a penalty reflecting
the cost of capital suffered on other days,

$$ l_t = \begin{cases} (R_t - VaR_t(\alpha))^2 & \text{if } R_t < VaR_t(\alpha) \\ -\alpha\,VaR_t(\alpha) & \text{if } R_t \geq VaR_t(\alpha) \end{cases} $$

Here $\alpha$ measures the opportunity cost of capital.
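The sketch below computes the average RLF and FLF scores. The function name is illustrative, and the opportunity-cost coefficient is passed as `cost_of_capital` to keep it separate from the VaR coverage level.

```python
import numpy as np

def sarma_losses(returns, var_forecasts, cost_of_capital):
    """Average regulatory (RLF) and firm's (FLF) loss over the backtest sample."""
    returns = np.asarray(returns)
    var_forecasts = np.asarray(var_forecasts)

    violations = returns < var_forecasts
    squared_excess = (returns - var_forecasts) ** 2

    rlf = np.where(violations, squared_excess, 0.0)                          # regulator's view
    flf = np.where(violations, squared_excess, -cost_of_capital * var_forecasts)  # firm's view
    return rlf.mean(), flf.mean()
```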

3.3 Giacomini and Komunjer’s Loss Function


Giacomini and Komunjer (2005) developed a loss function called the Asymmetric Linear Tick
Loss Function (AlTick). This loss function takes into account the magnitude of the implicit
cost associated with VaR forecasting errors, i.e. it takes into account not only the returns
that exceed the VaR, but also the opportunity cost that arises from an overestimation of
VaR. When there are no exceptions, the loss function still penalises the excess capital that
must be held. It is defined as

$$ L_\alpha(e_t) = \begin{cases} (\alpha - 1)\,e_t & \text{if } e_t < 0 \\ \alpha\,e_t & \text{if } e_t \geq 0 \end{cases} $$

where $e_t = R_t - VaR_t$. The preferred VaR model is the one with the lowest average value
of the loss function.
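A minimal sketch of the average tick loss, with an illustrative function name:

```python
import numpy as np

def tick_loss(returns, var_forecasts, alpha):
    """Average asymmetric linear (tick) loss L_alpha(e_t) with e_t = R_t - VaR_t."""
    e = np.asarray(returns) - np.asarray(var_forecasts)       # forecast errors
    losses = np.where(e < 0, (alpha - 1.0) * e, alpha * e)    # tick loss per period
    return losses.mean()
```

Since the tick loss is the objective function whose minimiser is the $\alpha$-quantile, it is a natural scoring rule for comparing competing VaR (quantile) forecasts.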
