Lecture 12 - Forecasting Tail Risk, VaR and Expected Shortfall (ES), h1
• However, for h > 1 the distribution of rt+h|Ft is not D, under the volatility
models we have considered.
• However, the mean and variance of rt+h|Ft are given by the formulas we know
(and can derive) for: µt(h), σt2(h)
• For example, if ϵt ∼ iid N (0, 1), it is not hard to show that, for h > 1, the kurtosis of rt+h|Ft exceeds 3, so the distribution of rt+h|Ft is not Gaussian, i.e. it is also not D.
QBUS6830 (S2, 2022); Module 4; 3
• Please note that the Gaussian-based formulas for h > 1 for tail risk in Tsay’s
textbook are thus incorrect.
• Figure 1 shows a plot from Wong and So (2003) illustrating this point.
Figure 1: Kurtosis for rt+h |Ft and rt [h]|Ft for GARCH(1,1) models
• i.e. if µt = µ (constant mean) then $\sigma_t^2[h] = \sum_{k=1}^{h} \sigma_t^2(k)$.
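The variance aggregation above can be sketched in a few lines of Python. This is a minimal illustration that assumes a constant-mean GARCH(1,1) model, for which the standard recursion σt2(k) = ω + (α + β)σt2(k − 1) holds for k ≥ 2. The function names and the parameter values are hypothetical, chosen only for the example.

```python
def garch11_variance_forecasts(omega, alpha, beta, sigma2_next, h):
    """Return [sigma_t^2(1), ..., sigma_t^2(h)] for a GARCH(1,1) model.

    sigma2_next is the 1-step-ahead variance forecast sigma_t^2(1).
    """
    forecasts = [sigma2_next]
    for _ in range(h - 1):
        # k-step forecast from the (k-1)-step forecast
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts


def h_period_variance(omega, alpha, beta, sigma2_next, h):
    """Variance of the h-period return: the sum of the k-step variances."""
    return sum(garch11_variance_forecasts(omega, alpha, beta, sigma2_next, h))


# Hypothetical parameter values, for illustration only
one_step = garch11_variance_forecasts(0.02, 0.08, 0.90, 1.12, 10)
total_variance = h_period_variance(0.02, 0.08, 0.90, 1.12, 10)
print(one_step[0], one_step[-1], total_variance)
```

Note that this gives only the variance of rt[h]|Ft, not its distribution, which is why simulation is still needed for tail risk.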
• To estimate the true distribution of rt[h]|Ft, h > 1, and the associated tail risk measures for the model being fit to the data, Monte Carlo sampling is used to sample implicitly from the distribution of rt[h]|Ft, by repeatedly sampling from the 1-step-ahead distribution, ...
• ... even though we do not know exactly what distribution rt[h]|Ft follows; probability theory lets us chain successive h = 1 forecasts to achieve this. We consider this aspect in more detail below.
• However, since the GARCH is a fully parametric model, we can simulate from
BOTH these distributions implicitly, by repeatedly using the implied 1-step-ahead
distribution rt+1|Ft, using a neat probability trick.
• We obtain many MC samples from the joint distribution for rt+1, rt+2, . . . , rt+h|Ft.
• We then add these realisations of rt+1, rt+2, . . . , rt+h|Ft together to obtain a single simulated sample from rt[h]|Ft (under the model being simulated from).
6. Calculate at+2 = ϵt+2σt+2, then form rt+2 = µt+2 + at+2. This is a sample from
p(rt+2|rt+1, Ft).
7. Repeat steps 4, 5, and 6 using j = 3, 4, . . . , h − 1 instead of j = 2, i.e. for j = 3, 4, . . . , h − 1:
8. Calculate $\sigma_{t+j}^2 | r_{t+j-1}, F_t$ and $\mu_{t+j} | r_{t+j-1}, F_t$, i.e. by using the at+j−1 and rt+j−1 just simulated and the GARCH and AR equations above.
9. Finally, for j = h:
10. Calculate $\mu_{t+h}, \sigma_{t+h}^2 | r_{t+1}, \ldots, r_{t+h-1}, F_t$, using the rt+1, . . . , rt+h−1 simulated in steps 1-7 above.
11. Simulate a single ϵt+h ∼ D(0, 1)
12. Form at+h = ϵt+hσt+h, then form rt+h = µt+h + at+h, which is a sample from
p(rt+h|rt+1, . . . , rt+h−1, Ft).
13. Finally, form rt[h]|Ft by summing up the single realization of the h values rt+1, . . . , rt+h|Ft just simulated, i.e. $r_t[h] | F_t = \sum_{j=1}^{h} r_{t+j} | F_t$.
14. Repeat all steps above many times, forming a MC sample from rt[h]|Ft
• Repeating these steps many times will provide a Monte Carlo sample from the
distribution rt[h]|Ft.
• How does this achieve a single realisation or sample from the joint distribution for rt+1, rt+2, . . . , rt+h|Ft?
• i.e. to simulate from rt[2]|Ft = rt+1+rt+2|Ft requires sampling from p(rt+1, rt+2|Ft)
• but this is equivalent to simulating from p(rt+1|Ft) and then p(rt+2|rt+1, Ft).
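The chaining argument above can be sketched in Python. This is a minimal illustration, not the lecture's code: it assumes a constant-mean GARCH(1,1) model with Gaussian errors (rather than the AR-GJR-GARCH models used in the examples), and the function names and parameter values are hypothetical.

```python
import math
import random


def simulate_h_period_return(mu, omega, alpha, beta, a_t, sigma2_t, h, rng):
    """Draw one sample of r_t[h] = r_{t+1} + ... + r_{t+h} given F_t."""
    a_prev, sigma2_prev = a_t, sigma2_t
    total = 0.0
    for _ in range(h):
        # Variance for the next day, conditional on the path simulated so far
        sigma2 = omega + alpha * a_prev ** 2 + beta * sigma2_prev
        eps = rng.gauss(0.0, 1.0)        # epsilon ~ D(0, 1), here Gaussian
        a = eps * math.sqrt(sigma2)      # shock a = epsilon * sigma
        total += mu + a                  # return r = mu + a, added to the sum
        a_prev, sigma2_prev = a, sigma2  # the next step conditions on this draw
    return total


def simulate_sample(n_sims, h, seed=0, **params):
    """Monte Carlo sample of size n_sims from r_t[h] | F_t."""
    rng = random.Random(seed)
    return [simulate_h_period_return(h=h, rng=rng, **params) for _ in range(n_sims)]


# Hypothetical parameter values and end-of-sample state, for illustration only
params = dict(mu=0.05, omega=0.02, alpha=0.08, beta=0.90, a_t=0.5, sigma2_t=1.2)
sample = simulate_sample(20000, h=10, **params)
print(len(sample), sum(sample) / len(sample))
```

Each pass through the loop samples from p(rt+j|rt+1, . . . , rt+j−1, Ft), so the h draws together form one realisation from the joint distribution, and their sum is one draw from rt[h]|Ft.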
• The h-step-ahead VaR forecast is then the relevant sample quantile of this forecast
sample of h-period returns.
• The h-step-ahead ES is simply the sample mean of the h-period returns beyond
the VaR estimate.
• This is not hard to code (e.g. in Python), but it can take a long time to run.
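As a rough sketch of the last two steps, the VaR and ES can be read off any Monte Carlo sample as follows. The function name is hypothetical, and the quantile convention used here (the k-th smallest value, with k = floor(p × n)) is only one of several reasonable choices. The demonstration sample is a stand-in made of sums of 10 iid N(0, 1) draws, not output from a fitted GARCH model.

```python
import math
import random


def var_es_from_sample(returns, p):
    """Sample VaR (empirical p-quantile) and ES (mean of returns at or below it)."""
    ordered = sorted(returns)
    k = max(1, math.floor(p * len(ordered)))  # number of observations in the tail
    var = ordered[k - 1]                      # k-th smallest simulated return
    es = sum(ordered[:k]) / k                 # average of the k worst returns
    return var, es


# Stand-in sample: sums of 10 iid N(0, 1) draws (an assumption for illustration)
rng = random.Random(1)
demo_sample = [sum(rng.gauss(0.0, 1.0) for _ in range(10)) for _ in range(50000)]
var_025, es_025 = var_es_from_sample(demo_sample, 0.025)
print(var_025, es_025)
```

By construction the ES is never less extreme than the VaR at the same level p.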
Figure 2: Histogram of 100000 Monte Carlo forecasted 10-day returns from AR-GJR-GARCH-N model. T is the end of the full sample.
• The VaRs are the sample percentiles of this Monte Carlo sample, and the ES
estimates are the sample means for those simulated 10 day returns beyond the
VaR limits.
Figure 3: Histogram of 100000 Monte Carlo forecasted 10-day returns from AR-GJR-GARCH-t model. T is the end of the full sample.
• Fatter tails are only slightly more apparent under the Student-t model, though
the very few outliers, compared to a Gaussian, are clear.
• For the Gaussian error model, the sample skewness and kurtosis for rt[10] are
−0.26, 3.77 respectively.
• A Jarque-Bera test confirms these are not consistent with a Gaussian distribution,
with p-val ≈ 0.0.
• For the Student-t GJR model, the sample skewness and kurtosis for rt[10] are
−0.21, 4.06 respectively.
• A Jarque-Bera test again confirms these are both not consistent with a Gaussian
distribution, with p-value 0.0.
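To see why the p-values above are reported as 0.0, the Jarque-Bera statistic can be computed by hand. The sketch below uses the usual form JB = n/6 × (S² + (K − 3)²/4), with a chi-squared(2) reference distribution whose upper tail probability is exp(−JB/2). The function names are hypothetical; the inputs n = 100000, skewness −0.26 and kurtosis 3.77 are the values quoted for the Gaussian error model.

```python
import math


def jb_from_moments(n, skew, kurt):
    """Jarque-Bera statistic and p-value from sample skewness and kurtosis."""
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)  # chi-squared(2) upper tail probability
    return jb, p_value


def jarque_bera(x):
    """Jarque-Bera test computed directly from a sample."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb, p_value = jb_from_moments(n, skew, kurt)
    return skew, kurt, jb, p_value


# Values quoted for the Gaussian error model's simulated 10-day returns
jb_n, p_n = jb_from_moments(100000, -0.26, 3.77)
print(jb_n, p_n)
```

With 100000 simulated returns even modest departures from skewness 0 and kurtosis 3 give a very large statistic, so the p-value underflows to zero.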
• The outliers in the Student-t error based model have led to higher kurtosis for the
10-day returns than for the Gaussian error model.
• The table below shows estimates of VaR10 and ES10 for a range of levels p for the
GJR and GJR-t models.
Table 1: 10 day tail risk VaR (h = 10) forecasts for CBA
• The Gaussian error model gives more extreme 10-day VaR forecasts for p ≥ 0.025,
compared to the Student-t model.
• For p ≤ 0.025, as we move further out into the tails, the Student-t error model
• The Gaussian and Student-t error models give more similar 10-day ES forecasts than they did for 10-day VaR, and the forecasts also become more similar as p gets smaller; overall these differences seem relatively less important (compared to those for h = 1, say).
• The 10-day CBA return distribution, from 04/05/22, has risk levels between 4%
and 13%, depending on p.
• The 10-day 1% VaR is ≈ 9.4%, whilst the 10-day 2.5% ES is ≈ 9.6% and the 10-day 1% ES is ≈ 11.6%.
• Thus the single day forecast variance remains exactly the same for any forecast
horizon h (i.e. no mean reversion).
• RiskMetrics incorrectly assumes that $\sum_{k=1}^{h} r_{t+k} | F_t \sim N(0, h\sigma_t^2(1))$ and for any horizon h > 0 employs:
$$\mathrm{VaR}_p(h) = \Phi^{-1}(p)\,\sqrt{h\,\sigma_t^2(1)}$$
$$\mathrm{ES}_p(h) = -\sqrt{h}\,\sigma_t(1)\,\frac{\phi\left(\Phi^{-1}(p)\right)}{p}$$
which, in truth, are only approximations, and likely risky, anti-conservative ones at that.
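The Gaussian square-root-of-h formulas above are easy to evaluate with Python's standard library. The sketch below is illustrative only: the function name is hypothetical, and the optional mu argument is an addition for the constant-mean case (with mu = 0 it reduces to the RiskMetrics formulas).

```python
import math
from statistics import NormalDist


def gaussian_var_es(sigma1, h, p, mu=0.0):
    """h-period VaR and ES assuming r_t[h] ~ N(h*mu, h*sigma1^2).

    sigma1 is the 1-step-ahead volatility forecast sigma_t(1).
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(p)       # Phi^{-1}(p), negative for small p
    scale = math.sqrt(h) * sigma1   # sqrt(h) * sigma_t(1)
    var = h * mu + z * scale
    es = h * mu - scale * std_normal.pdf(z) / p
    return var, es


# Hypothetical 1-day volatility of 1.5 (in percent), 10-day horizon
var10, es10 = gaussian_var_es(sigma1=1.5, h=10, p=0.025)
print(var10, es10)
```

Because this scales a single Gaussian quantile, it ignores the fatter tails of the true h-period distribution that the simulation approach captures.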
• This approximate method should under-estimate the true VaR for h > 1, by
Wong and So (2003), since they showed fatter tails than Gaussian for h > 1.
• We can, however, also get more "correct" estimates by using MC simulation for the RM model, in the same way as we do for GARCH models.
and
$$\mathrm{VaR}_p(h) = \Phi^{-1}(p)\,\sqrt{h\,\sigma^2} + h\mu$$
$$\mathrm{ES}_p(h) = h\mu - \sqrt{h}\,\sigma\,\frac{\phi\left(\Phi^{-1}(p)\right)}{p}$$
• The series rt[h] is again usually chosen to be non-overlapping, which reduces our sample size to T/h.
• When h = 10, this is a series of 10 day returns and our sample size is divided by 10.
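A minimal sketch of building the non-overlapping h-day series from daily returns follows. It assumes log returns, so that h-day returns are sums of daily returns, and it drops any leftover days at the start of the sample; both choices, and the function name, are assumptions made for illustration.

```python
def non_overlapping_returns(daily_returns, h):
    """Aggregate daily (log) returns into non-overlapping h-day returns."""
    n_blocks = len(daily_returns) // h
    start = len(daily_returns) - n_blocks * h  # drop the oldest leftover days
    return [
        sum(daily_returns[start + i * h : start + (i + 1) * h])
        for i in range(n_blocks)
    ]


# 25 hypothetical daily returns give only 2 non-overlapping 10-day returns
blocks = non_overlapping_returns(list(range(1, 26)), 10)
print(blocks)
```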
• CAViaR can also be employed directly on the h-day returns, but this is not often done since T/h is usually not large enough for accurate estimation of the CAViaR model.
• There is currently little work on using the 1-day CAViaR VaR model to find an
h-day VaR (or ES) forecast.
Section (4d): Example
• Figure 16 shows the non-overlapping 10 day CBA returns.
• Figure 5 shows the non-overlapping 10 day CBA returns in the forecast period
only.
• Note the much "shorter" effect of Covid in March 2020 on 10 day CBA returns, compared to the 1-day returns.
• And the far lower degree of persistence in volatility compared to 1-day returns.
• I now forecast VaR and ES for the non-overlapping 10 day returns for CBA, after
20th Dec, 2007. This gives 375 non-overlapping 10-day returns to forecast.
Figure 5: 10 day non-overlapping returns for CBA, from 8th Dec, 2009
• I use the ARCH(5), GARCH, GJR and EGARCH models, all with both Gaussian and Student-t errors, plus the RiskMetrics model (both by simulation and Gaussian assumption), for daily data and generate 10 day ahead forecasts of rt[10].
• I estimate the models using daily data previous to the 10 day period being
forecast, re-estimating all models for each new 10-day forecast.
• Figure 6 shows the VaR10 and ES10 forecasts at p = 0.025 for the EGARCH-t
and RiskMetrics models (both using simulation).
• The two VaR and ES series are fairly close to each other, though often the EG-t
model gives slightly more extreme forecasts.
• Except for during the high volatility period, where RM gives the most extreme
forecasts, both during and for several 10 day periods after it.
Figure 6: 10 day non-overlapping returns for CBA plus forecasts of VaR10 and ES10 at p = 0.025 for the EGARCH-t and RiskMetrics models.
• Figure 7 shows the VaR10 and ES10 forecasts at p = 0.025 for the RM Gaussian
and HS models.
Figure 7: 10 day non-overlapping returns for CBA plus forecasts of VaR10 and ES10 at p = 0.025 for the informal, adhoc methods. (Series plotted: CBA 10-day returns; 2.5% VaR and ES for HS100, HS-T and RM N; x-axis: 2008 to 2022.)
• Figure 8 shows the VaR10 and ES10 forecasts at p = 0.01 for the GJR-GARCH-t
and ARCH-t models (both using simulation).
Figure 8: 10 day non-overlapping returns for CBA plus forecasts of VaR10 and ES10 at p = 0.01 for GJR-t and ARCH-t models.
• Figure 9 shows the VaR10 and ES10 forecasts at p = 0.01 for the adhoc HS methods, plus the (incorrect) RiskMetrics method assuming 10-day returns are Gaussian.
Figure 9: 10 day non-overlapping returns for CBA plus forecasts of VaR10 and ES10 at p = 0.01 for RiskMetrics (Gaussian), HS-100 and HS-T. (Series plotted: CBA 10-day returns; 1% VaR and ES for HS100, HS-T and RM N; x-axis: 2008 to 2022.)
• The table below shows accuracy measures for p = 0.025 over the 360 10-day
periods for VaR forecasts for CBA.
Table 2: The 10-step-ahead forecast VaR summary for all models for CBA, p = 0.025.
• At p = 0.025, the 10-step-ahead forecasts for VaR are most accurate in VRate for the ARCH-t method: this method also passes most tests but fails the DQ test (p = 0.042).
• The 3 best methods by VRate are the ARCH-t, ARCH-N and HS-100. The
ARCH-t and ARCH-N have the 2 lowest quantile values, whilst the HS-100 has
the worst/highest quantile loss. These models also all fail the DQ test.
• The UC test rejects the GARCH, RM, GARCH-t, GJR-t and HS-T models,
indicating these all have violation rates significantly different to (higher than) the
desired 2.5%.
• The incorrect RM N model (18) has 2 more violations than the simulation-based
RM model (16). Both are rejected by the UC test.
• The DQ test using 4 lags rejects the ARCH, GARCH, ARCH-t, GARCH-t, RM-N, iid N, HS-100 and HS-T models as having inaccurate 2.5% VaR forecasts, likely with VaR violations correlated with the quantile forecasts and/or previous violations.
• Only the GJR-GARCH-N, EGARCH-N, RM, GJR-t and EGARCH-t are not
rejected by the DQ test.
• For models not rejected by any test, the quantile loss function values favour the GJR-N and EGARCH-N models.
• The iid N, HS-100 and HS-T models rank in the 3 last places by the loss function.
• For VaR 10-step forecasts at p = 0.025, the GJR-N model is the best performing
model overall, followed closely by the EGARCH-N model.
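The violation rate and UC test used above can be sketched as follows. This assumes the UC test is Kupiec's likelihood-ratio test of unconditional coverage, with a chi-squared(1) reference distribution; the function names are hypothetical. The worked numbers (16 violations in 360 periods at p = 0.025) are those quoted for the simulation-based RM model.

```python
import math


def violation_rate(returns, var_forecasts):
    """Count VaR violations (return below the VaR forecast) and their rate."""
    hits = [r < v for r, v in zip(returns, var_forecasts)]
    return sum(hits), sum(hits) / len(hits)


def kupiec_uc_test(n_violations, n_obs, p):
    """Likelihood-ratio test that the true violation rate equals p."""
    x, n = n_violations, n_obs

    def loglik(q):
        # Binomial log-likelihood, treating 0 * log(0) as 0
        ll = 0.0
        if x > 0:
            ll += x * math.log(q)
        if n - x > 0:
            ll += (n - x) * math.log(1.0 - q)
        return ll

    lr = max(-2.0 * (loglik(p) - loglik(x / n)), 0.0)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-squared(1) upper tail
    return lr, p_value


# 16 violations in 360 ten-day periods, against an expected 9 at p = 0.025
lr_rm, pval_rm = kupiec_uc_test(16, 360, 0.025)
print(lr_rm, pval_rm)
```

The UC test only checks the number of violations; independence and DQ tests are needed to detect clustering.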
• Figure 10 shows 2.5% VaR violations for the ARCH-N, GJR-GARCH-N, HS-T
and RM methods.
• Clearly the HS-T has correlated and clustered violations, whilst the RM has too
many violations.
• The ARCH-t and HS-100 models are closest to what we expect here, with 5
violations each. But most other models seem quite inadequate by ES violation
rate; e.g. many have ES VRate ≥ 0.025, ≥ 9 violations, except ARCH-N and
HS-T.
Figure 10: 10 day returns for CBA plus 2.5% VaR10 violations for some models. (Series plotted: CBA 10-day returns and 2.5% violations for ARCH-N, GJR, HS-T and RM; x-axis: 2008 to 2022.)
Table 3: The 10-step-ahead forecast ES summary for all models for CBA, p = 0.025.
• The t-test on the standardised ES residuals finds that the mean ES forecast residual is not significantly different from 0 for most of the models, i.e. for these models we cannot reject the hypothesis that the ES forecasts are unbiased at p = 0.025; the models rejected (p-value < 0.05) are GARCH-N, RM, GARCH-t, RM N and iid N.
• The HS-T model’s 2.5% ES residuals and standardised residuals have means closest to 0, with t-statistics closest to 0 and the highest p-values.
• The RM model’s 2.5% ES residuals have the lowest RMSE, while the GJR-N’s
have the lowest MAD; these 2 models have 2.5% ES residuals closest to 0 in
variation.
• The RM-N has the highest number of 2.5% ES violations (13), which is higher than
what the 2.5% VaR violation number should be, and thus is highly problematic.
• The model with VaR and ES forecasts having the lowest joint VaR and ES loss
value is the ARCH-t, closely followed by the ARCH-N model.
• The 3 models with clearly highest joint loss values, as well as clearly highest RMSE
and MAD values, are again: HS-100, iid N and HS-T.
• At p = 0.025, the ARCH-t and ARCH-N models appear to be the most accurate
forecasters of ES and joint forecasters of 10-day VaR and ES levels, for CBA.
• Figures 11 and 12 show the forecast ES10 residuals for some models for their ES
2.5% forecasts.
• Again we see the property of ES residuals that they are negatively skewed, are
mostly small and positive but also tend to have a few larger magnitude, negative
values.
• The models shown have between 11 and 18 VaR 2.5% violations, thus they also
have from 11-18 ES 2.5% residuals.
• The models shown seem to have close to mean 0 ES 2.5% residuals, but differing
variation levels, agreeing with the ES 2.5% table above.
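A minimal sketch of the ES residual calculation follows, assuming the residual is ξt = rt − ESt, computed only for periods in which the return violates the VaR forecast (which matches the count of residuals equalling the count of VaR violations). The function names and the toy numbers are hypothetical. Only the t-statistic is computed; a p-value would need the Student-t distribution, which is not in the standard library.

```python
import math


def es_residuals(returns, var_forecasts, es_forecasts):
    """ES residuals r_t - ES_t for the periods where r_t violates the VaR."""
    return [
        r - es
        for r, v, es in zip(returns, var_forecasts, es_forecasts)
        if r < v
    ]


def t_statistic(x):
    """One-sample t-statistic for the hypothesis that the mean of x is 0."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return mean / (sd / math.sqrt(n))


# Toy example: two of the four returns fall below the VaR forecast of -4
resid = es_residuals([-5.0, -1.0, -8.0, 0.0], [-4.0] * 4, [-6.0] * 4)
print(resid, t_statistic(resid))
```

A violation that lands between the VaR and the ES gives a small positive residual, while a violation beyond the ES gives a negative one, which is the source of the negative skew noted above.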
Figure 11: Forecast residuals for ES10 (p = 0.025): ξt , for ARCH, GJR-GARCH-N, ARCH-t, EGARCH-t, GARCH-t and RiskMetrics.
• The table below shows accuracy measures for p = 0.01 over the 375 10-day periods
for VaR forecasts.
Figure 12: Forecast residuals for ES10 (p = 0.025): ξt , for RM Gaussian, HS-100 and HS-T. (The plot legend also lists GARCH-N.)
• At p = 0.01, the 10-step-ahead forecasts for VaR are most accurate in VRate for the ARCH-t model, with 5 violations and VRate = 0.013.
Table 4: The 10-step-ahead forecast VaR summary for all models for CBA, p = 0.01.
• All other models are rejected by the UC test, with 8-13 violations; except the
ARCH-N and HS-100 models, with 6 and 7 violations respectively.
• The incorrect RM N model (13) has only 1 more violation than the simulation-based RM model (12).
• The ARCH-t and iid N models are rejected by the Independence test for violations.
• However, the DQ test rejects ALL the models as having inaccurate 1% VaR
forecasts.
• The quantile loss function favours the EGARCH-N model, followed by GJR-GARCH-N and EGARCH-t models.
• The HS-T, HS-100 and iid N models again do the worst by the quantile loss
function: these are the least accurate 1% VaR forecasters and most strongly
rejected by the DQ test too.
• The best models via the quantile loss function are ARCH-t, ARCH-N and EGARCH-t models.
• The models with the 3 highest loss values are the iid N, HS-100 and HS-T.
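For reference, the quantile loss used to rank the VaR forecasts can be sketched as below. This assumes the standard quantile (pinball) loss, summed over the forecast period; the lecture's table may report a scaled or averaged version, and the function name is hypothetical.

```python
def quantile_loss(returns, var_forecasts, p):
    """Sum of (p - I(r_t < VaR_t)) * (r_t - VaR_t) over the forecast period."""
    total = 0.0
    for r, q in zip(returns, var_forecasts):
        hit = 1.0 if r < q else 0.0  # violation indicator
        total += (p - hit) * (r - q)
    return total


# Toy example: one non-violation (r = 1) and one violation (r = -3), VaR = -2
loss = quantile_loss([1.0, -3.0], [-2.0, -2.0], 0.025)
print(loss)
```

Each term is non-negative, and violations are weighted by 1 − p rather than p, so a lower total indicates VaR forecasts closer to the true quantile series.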
• For VaR 10-step forecasting at p = 0.01, all models are rejected by the DQ test,
except the GJR-N model, which is rejected by the UC test.
• Figure 13 shows 1% VaR violations for the ARCH-N, ARCH-t, HS-100 and RM
methods.
• The ARCH-N and ARCH-t’s violations seem not to cluster at all. However,
their violations only occur in higher volatility periods, thus may be related to the
quantile forecasts, explaining why they failed the DQ test.
Figure 13: 10 day returns for CBA plus 1% VaR10 violations for some models.
Table 5: The 10-step-ahead forecast ES summary for all models for CBA, p = 0.01.
violations on average..
• The ARCH-N, ARCH-t, GJR-N, GJR-t and HS-T methods are all the closest to that ES violation rate here, with 4 violations; unfortunately this rate is above the 1% expected for 1% VaR forecasts, which is problematic.
• Most models seem inadequate by ES violation rate, all with above 4 (i.e. above
1%) ES violation rates.
• The incorrect RM N model (13) has 2 more violations than the simulation-based
RM model (11). Both are way too large for 1% ES forecast VRates.
• The GJR-t model has ES residuals closest to mean 0 and t-stat closest to 0 with
highest p-value.
• The t-tests on the standardised ES residuals find that the ES forecast residuals are significantly different to a mean of 0 for the models RM, RM N and iid N, i.e. these ES forecast models are significantly biased for p = 0.01.
• The RM has smallest RMSE and MAD for the 1% ES residuals; but this is biased
by it having too many 1% ES residuals; since it has 12 1% VaR violations.
• The GJR-N model’s 1% VaR and ES forecasts have the lowest joint loss.
• The 3 models with clearly highest joint loss values, as well as clearly highest RMSE
and MAD values, are again the HS-100, iid N and HS-T methods.
• These 3 methods are clearly the least accurate 1% VaR and ES forecasters.
• Figures 14 and 15 show the forecasted ES10 residuals for some models for their
ES 1% forecasts.
• The models shown have between 5 and 8 VaR 1% violations, thus they also have
from 5-8 ES 1% residuals.
• The models shown seem to have close to mean 0 ES 1% residuals, but differing
variation levels, agreeing with the 1% ES table above.
Figure 14: Forecast residuals for ES10 (p = 0.01): ξt , for ARCH, GJR-GARCH-N, ARCH-t, EGARCH-t, GARCH-t and RiskMetrics.
• The incorrect RM that assumed a Gaussian distribution for 10-day returns consistently forecast VaR and ES marginally less accurately than the RM method,
Figure 15: Forecast residuals for ES10 (p = 0.01): ξt , for RM Gaussian, HS-100 and HS-T.
• For 10-day VaR and ES forecasting for p = 0.025, 0.01, the GJR-GARCH-N model
was consistently among the most accurate models and often the best and most
accurate model, for CBA.
• The 3 clearly worst methods for 10-day VaR and ES forecasting for p = 0.025, 0.01
were the HS-100, HS-T and iid N methods.
• Very, very long (daily) sample periods are needed to adequately compare and
assess VaR and ES forecasts for h = 10 and h > 1 in general.
Section (4e): Example, WES 10 day tail risk forecasting
• Figure ?? shows the non-overlapping 10 day WES returns.
• The table below shows accuracy measures for p = 0.025 over the 375 10-day
periods for VaR forecasts.
[Figure: WES 10-day non-overlapping returns: 2000-2022]
• For p = 0.025 VaR 10 day ahead forecasting for WES, all models fail at least one
of the UC, ind or DQ tests.
Table 6: The 10-step-ahead forecast VaR summary for all models for WES, p = 0.025.
• All models have higher VRates than 2.5%; the closest in VRate to 2.5% are the HS-T and HS-100, and these are the only 2 models not rejected by the UC test.
• However, both HS-T and HS-100 fail the DQ test and also have the highest/worst quantile loss values, i.e. they are the furthest from the true 2.5% VaR series.
• The models with the best and lowest quantile loss are the GJR-N and RM, though
both are rejected by the UC and DQ tests.
• For models rejected by only 1 test, those with lowest quantile loss are: ARCH-t,
EGARCH-t.
• Figure 17 shows 2.5% VaR violations for the ARCH-t, GJR-N, HS-T and RM-N
methods.
• The GJR-N’s violations may cluster in 2008 and especially 2020, explaining why the GJR-N failed the DQ test. The ARCH-t shares only 3 of the 4 GJR-N violations in 2008 and 1 of the 2 close-together GJR-N violations in 2022; this explains why it doesn’t fail the DQ test.
• Overall, the best model for forecasting 2.5% 10 day WES VaR is the ARCH-t. Though it has too many violations, at least these occur approximately independently.
• The table below shows accuracy measures for p = 0.01 over the 375 10-day periods
for VaR forecasts.
• For p = 0.01 VaR 10 day ahead forecasting for WES, all models fail at least one
of the UC, ind or DQ tests.
• All models have higher VRates than 1%; the closest in VRate to 1% is the HS-T, which is the only model not rejected by the UC test.
Figure 17: 10 day returns for WES plus 2.5% VaR10 violations for some models.
• The HS-T fails the DQ test and also has the 2nd highest/worst quantile loss value,
i.e. it is the 2nd furthest from the true 1% VaR series.
Table 7: The 10-step-ahead forecast VaR summary for all models for WES, p = 0.01.
• The models with the best and lowest quantile loss are the ARCH-t and ARCH-N,
though both are strongly rejected by both the UC and DQ tests.
• Only the HS-T model is rejected by only 1 test, but it is the 2nd worst for quantile loss.
• Figure 18 shows 1% VaR violations for the ARCH-t, GJR-N, HS-T and RM-N
methods.
• The GJR-N’s and ARCH-t’s violations may cluster in early 2020, with 2 violations
in 2 days, explaining why these models failed the DQ test.
Figure 18: 10 day returns for WES plus 1% VaR10 violations for some models.