Econometric Reviews

ISSN: 0747-4938 (Print) 1532-4168 (Online) Journal homepage: http://www.tandfonline.com/loi/lecr20

Lassoing the HAR Model: A Model Selection Perspective on Realized Volatility Dynamics

Francesco Audrino & Simon D. Knaus

To cite this article: Francesco Audrino & Simon D. Knaus (2015): Lassoing the HAR Model:
A Model Selection Perspective on Realized Volatility Dynamics, Econometric Reviews, DOI:
10.1080/07474938.2015.1092801

To link to this article: http://dx.doi.org/10.1080/07474938.2015.1092801

Accepted author version posted online: 13 Oct 2015.
Published online: 13 Oct 2015.

Download by: [La Trobe University] Date: 02 February 2016, At: 22:39
Econometric Reviews, 0(0):1–37, 2016
Copyright © Taylor & Francis Group, LLC
ISSN: 0747-4938 print/1532-4168 online
DOI: 10.1080/07474938.2015.1092801

Lassoing the HAR Model: A Model Selection Perspective on Realized Volatility Dynamics
Francesco Audrino and Simon D. Knaus
University of St. Gallen, St. Gallen, Switzerland

Realized volatility computed from high-frequency data is an important measure for many
applications in finance, and its dynamics have been widely investigated. Recent notable
advances that perform well include the heterogeneous autoregressive (HAR) model which
can approximate long memory, is very parsimonious, is easy to estimate, and features
good out-of-sample performance. We prove that the least absolute shrinkage and selection
operator (Lasso) recovers the lags structure of the HAR model asymptotically if it is the
true model, and we present Monte Carlo evidence in finite samples. The HAR model’s
lags structure is not fully in agreement with the one found using the Lasso on real data.
Moreover, we provide empirical evidence that there are two clear breaks in structure for
most of the assets we consider. These results bring into question the appropriateness of
the HAR model for realized volatility. Finally, in an out-of-sample analysis, we show equal
performance of the HAR model and the Lasso approach.

Keywords Heterogeneous autoregressive model; Lasso; Model selection; Realized volatility.

JEL Classification C58; C63; C49.

1. INTRODUCTION

Volatility of financial assets is of great importance to many applications in finance.
Reliable estimates and forecasts are key for risk management and asset allocation.
As opposed to returns series, financial volatility is predictable and has received great
attention in the financial econometrics research community. The seminal article of
Bollerslev (1986) introducing the generalized autoregressive conditional heteroscedasticity
(GARCH) model for conditional volatility has thus sparked an even greater interest
in volatility modeling. The GARCH model has become extremely popular and despite
various extensions and modifications the basic GARCH(1,1) fares well as a prediction
device for conditional volatility in an out-of-sample forecast comparison (Hansen and
Lunde, 2005). While Bollerslev’s (1986) GARCH model is able to capture stylized facts of
volatility series (e.g., volatility clustering), its estimation still relies on daily observations
and thus potentially discards intraday information. The advent of high-frequency data
(with frequencies as high as tick-by-tick) has ignited a new line of research pioneered by
Andersen et al. (2001) and Barndorff-Nielsen and Shephard (2002), among others.

Address correspondence to Francesco Audrino, University of St. Gallen, Bodanstrasse 6, 9000 St. Gallen,
Switzerland; E-mail: francesco.audrino@unisg.ch
Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/lecr.

Suppose that an asset’s log price obeys the dynamics $dX_t = \mu_t\,dt + \sigma_t\,dW_t$, where $W_t$
is a Brownian motion, $\sigma_t$ the instantaneous volatility, and $\mu_t$ the instantaneous drift
term. One can then show that

$$\operatorname*{plim}_{q\to\infty} \sum_{t_i^q} \left(X_{t_{i+1}^q} - X_{t_i^q}\right)^2 = \int_0^T \sigma_t^2\,dt$$

for any sequence of partitions $t_0^q = 0 < t_1^q < \cdots < t_N^q = T$ with
$\sup_i |t_{i+1}^q - t_i^q| \to 0$ for $q \to \infty$, i.e., the sum of squared returns converges to the
integrated variance (over a day) as the sampling frequency increases.¹ An estimator of
$\int_0^T \sigma_s^2\,ds$ is thus given by $\sum_{i=0}^{N-1} (X_{t_{i+1}} - X_{t_i})^2$, where
$t_0, t_1, \ldots, t_N$ is an appropriate sampling grid, and is denoted $RV_t$, where $t$ refers
to the day. $RV_t$ is called realized variance, and its square root $\sqrt{RV_t}$ is referred to as
realized volatility. An overview of variants of the aforementioned estimator and their
corresponding assumptions is collected in McAleer and Medeiros’s (2008) review on
realized volatility.
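
As a minimal sketch of the naive estimator just described, the snippet below sums squared log returns over one day's sampling grid. The price values are hypothetical and purely illustrative; no correction for microstructure noise or jumps is attempted.

```python
import math


def realized_variance(log_prices):
    """Sum of squared log returns over the given intraday sampling grid."""
    return sum((b - a) ** 2 for a, b in zip(log_prices, log_prices[1:]))


# Hypothetical intraday prices on some sampling grid (illustrative values only).
prices = [100.0, 100.5, 100.2, 100.9, 100.4]
log_prices = [math.log(p) for p in prices]

rv = realized_variance(log_prices)   # realized variance for the day
realized_vol = math.sqrt(rv)         # realized volatility
```

A finer grid adds more terms to the sum; in the idealized diffusion setting above, the sum then approaches the integrated variance.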
Since the goal of this work is to investigate the dynamics of realized variance rather
than its estimation, we take the daily realized variance series as given and turn to the
problem of modeling it.
It has been observed that the time series $(RV_t)_{1 \le t \le T}$ exhibits some distinct features,
the most relevant one being its slowly decaying autocorrelation function which is often
termed long memory: This finding appears to be robust across different asset classes and
evidence has been reported for exchange rates (Andersen et al., 2001), index futures (Areal
and Taylor, 2002; Thomakos and Wang, 2003), as well as for individual stocks (Andersen
et al., 2001).
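
The slowly decaying autocorrelation function can be inspected with a standard sample estimator. The sketch below is one common form (biased, dividing by n at every lag); the function name and the toy series are illustrative assumptions, not taken from the article.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t - k] for t in range(k, n)) / n
        acf.append(ck / c0)
    return acf


# Toy series, for illustration only.
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

Applied to a log realized variance series, long memory would show up as autocorrelations that remain clearly positive at large lags.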
To address these characteristics of the realized variance time series, different
approaches have been put forward, most prominently fractionally integrated ARMA
models (ARFIMA) and the heterogeneous autoregressive (HAR) model introduced by
Corsi (2009). The HAR model can approximate long memory in a very simple way, is
parsimonious, and allows for an easy estimation. For all these reasons it is widely used
within the research community.
The contribution of this article is to shed more light on the underlying dynamics
advocated by Corsi’s (2009) HAR model, which in essence claims tomorrow’s realized
variance to be a sum of daily, weekly, and monthly averages of realized variances that
can each be attributed to specific investment behaviors. The question we are aiming to
answer relates to how much these frequencies (daily, weekly, monthly) are really inherent
to the data and if we can identify them from a model selection perspective.

¹ It is known that this naive estimator of $\int_0^T \sigma_t^2\,dt$ is biased under, e.g., microstructure
noise or if the log price process is a jump-diffusion process.

Model selection plays a crucial role in determining a model for forecasting. Oftentimes
model selection can be extremely costly from a computational perspective and may
already become infeasible within the class of linear models: In fact, an exhaustive search
over p lags already requires $2^p$ comparisons and thus grows exponentially. An important
contribution in terms of model selection within the class of linear models was made in
Tibshirani (1996) where the least absolute shrinkage and selection operator (Lasso) was
introduced. The Lasso, a shrunk regression, performs shrinkage and selection at the same
time and is nevertheless computationally affordable. Although originally the Lasso was
mostly noticed by the computational statistics community, researchers in econometrics are
increasingly using it. Most recently, conditions under which the Lasso and its refined
version, the adaptive Lasso, give consistent results have also been established in time
series econometrics by, for example, Wang et al. (2007), Hsu et al. (2008), Nardi and
Rinaldo (2011), Kock (2012), Kock and Callot (2012), Medeiros and Mendes (2012), and
Audrino and Camponovo (2013), and applications of the Lasso are also found in Park and
Sakaori (2013).
Despite the great popularity and appreciation of the HAR model, there has been little
work investigating the validity of the structure as proposed by the HAR model. Although
most work is done in the direction of extending the HAR model (see the recent review
of Corsi et al., 2012), there are two notable exceptions: Craioveanu and Hillebrand (2010)
investigate the structure of the HAR model and find no benefit in allowing for a more
flexible structure of lag selection. However, their result is based on an exhaustive search
over HAR-like models with varying aggregation frequencies. Wang et al. (2013) investigate
the problem of forecasting a stationary ARFIMA process subject to structural breaks
with unknown break dates. They show that an ARFIMA model subject to a shift in mean
and a persistent change in the memory parameter can be approximated well by an AR
process and point out that this result can be viewed as an econometric explanation for
the empirical success of the HAR model.
It is along these lines that this paper adds to the literature. We present a
methodologically sound way of recovering the lags structure and maximum order of the
HAR model. We show that under the assumption that the HAR model is the true model,
we can apply the Lasso and should recover the lags structure as implied by the HAR
model. To this end we investigate how far Nardi and Rinaldo’s (2011) result can be
extended for the special case of the HAR model. We then investigate if the Lasso can
be used for forecasting realized variance from a purely statistical point of view as well
as measuring outperformance from a more economically relevant point of view via a risk
management application.
In summary, we show that using a theoretically sound model selection device we
cannot fully recover the HAR lags structure on real data. Although Corsi (2009) already
noted in his work that the HAR model is only intended as a simple approximation
consistent with a large number of stylized facts, our result provides a statistical foundation
to this statement and thus casts some doubt on the appropriateness of the basic HAR
as a model for realized variance beyond capturing stylized facts. Moreover, our empirical
results show that there are two clear breaks in structure taking place around the end of
2006 and with the onset of the financial crisis.
Finally, we find no substantial superiority of either the HAR model or the Lasso
when it comes to out-of-sample forecasting. This last result is not surprising given that,
in most cases, the Lasso identifies as relevant for prediction the same information that
the HAR model uses (that is, lags up to order 22). In fact, the Lasso and HAR
approaches can be seen as two methods aiming at parsimony in different ways: the Lasso
reduces the number of predictors, whereas the HAR model imposes strict restrictions on
the coefficients in order to accommodate many regressors.
The rest of the article is structured as follows. Section 2 introduces the HAR model
in more detail, relates it to the autoregressive class of time series models and shows how
the Lasso can be used in this context. Section 3 features an empirical application of the
proposed model selection approach, a Monte Carlo study, as well as an out-of-sample
comparison of the HAR versus the Lasso. Section 4 concludes.

2. THEORETICAL FOUNDATION

2.1. The HAR Model

The HAR model as introduced in Corsi (2009) enjoys great popularity. There are
numerous variants and modifications of the HAR model (Corsi et al., 2012); however,
we restrict our attention to the basic model to keep a clear focus on the actual volatility
dynamics. We thus intentionally ignore other transient effects (such as the leverage effect)
that may be embedded in a HAR framework as well.
To this end, let $RV_t^{(d)}$ be an estimate of daily realized variance. Then, the HAR
model postulates that

$$\log RV_{t+1}^{(d)} = c + \beta^{(d)} \log RV_t^{(d)} + \beta^{(w)} \log RV_t^{(w)} + \beta^{(m)} \log RV_t^{(m)} + \varepsilon_{t+1}, \tag{1}$$

where, with a slight abuse of notation,
$\log RV_t^{(w)} = \frac{1}{5}\sum_{i=1}^{5} \log RV_{t-i+1}^{(d)}$ and
$\log RV_t^{(m)} = \frac{1}{22}\sum_{i=1}^{22} \log RV_{t-i+1}^{(d)}$
are the weekly and monthly averages of daily log realized variances,
and $(\varepsilon_t)_t$ is a zero mean innovation process. Once these average log-variances are known,
the model can be consistently estimated by traditional least squares to obtain estimates
for $c$, $\beta^{(d)}$, $\beta^{(w)}$, and $\beta^{(m)}$. In other words, the conditional expectation of tomorrow’s
log-realized variance is the weighted sum of daily, weekly, and monthly log-realized
variances.²
Clearly, the HAR model is simply a constrained AR(22) model, as already
noted by Corsi (2009), i.e., we can write

$$\log RV_{t+1}^{(d)} = c + \sum_{i=1}^{22} \phi_i^{HAR} \log RV_{t-i+1}^{(d)} + \varepsilon_{t+1}, \tag{2}$$

where the restrictions as imposed by (1) require

$$\phi_i^{HAR} = \begin{cases}
\beta^{(d)} + \frac{1}{5}\beta^{(w)} + \frac{1}{22}\beta^{(m)} & \text{for } i = 1, \\
\frac{1}{5}\beta^{(w)} + \frac{1}{22}\beta^{(m)} & \text{for } i = 2, \ldots, 5, \\
\frac{1}{22}\beta^{(m)} & \text{for } i = 6, \ldots, 22.
\end{cases} \tag{3}$$

² We comment further on the use of log-realized variances in Section 3.1.


An obvious direct specification test is a test of the restrictions collected in (3). Given
the high number of restrictions, a rejection of these is not surprising. Nevertheless, in his
work Corsi argues that this can well be attributed to specific properties of the time series.
Thus, a rejection can be regarded as some preliminary indication that indeed the HAR
model may fail to fully capture the effects present in the data.
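
The equivalence between the HAR form (1) and the restricted AR(22) form (2)–(3) can be checked numerically. The sketch below maps three HAR coefficients to the 22 implied AR coefficients and compares the two one-step forecasts; all parameter values and the input series are hypothetical, chosen only for illustration.

```python
def har_to_ar(beta_d, beta_w, beta_m):
    """AR(22) coefficients implied by restriction (3); element 0 is lag 1."""
    phi = []
    for i in range(1, 23):
        coef = beta_m / 22          # every lag enters the monthly average
        if i <= 5:
            coef += beta_w / 5      # lags 1..5 also enter the weekly average
        if i == 1:
            coef += beta_d          # lag 1 is the daily component
        phi.append(coef)
    return phi


def har_forecast(c, beta_d, beta_w, beta_m, x):
    """One-step forecast from (1); x[-1] is the most recent daily log RV."""
    weekly = sum(x[-5:]) / 5
    monthly = sum(x[-22:]) / 22
    return c + beta_d * x[-1] + beta_w * weekly + beta_m * monthly


def ar_forecast(c, phi, x):
    """One-step forecast from (2); phi[0] multiplies the most recent value."""
    return c + sum(p * x[-(i + 1)] for i, p in enumerate(phi))


# Hypothetical parameters and a toy log RV history (illustrative values only).
c, beta_d, beta_w, beta_m = 0.1, 0.4, 0.35, 0.2
x = [6.0 + 0.1 * (k % 7) for k in range(30)]

phi = har_to_ar(beta_d, beta_w, beta_m)
gap = abs(har_forecast(c, beta_d, beta_w, beta_m, x) - ar_forecast(c, phi, x))
```

The two forecasts agree up to floating-point error, and the implied AR coefficients sum to the sum of the three HAR coefficients.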

2.2. Lasso as model selection device

Lasso was introduced in Tibshirani (1996) and is frequently used in the field of
computational statistics and machine learning. In recent years, the Lasso in general as
well as the Lasso as a model selection device have also appeared in econometrics (Kock,
2012; Leeb and Pötscher, 2005). The Lasso is computationally cheap and renders model
selection with a high number of predictors feasible. As opposed to the $2^p$ comparisons
that are required in an exhaustive search over p predictors, the Lasso employs a highly
efficient algorithm which provides estimates and model selection jointly at affordable
computational costs; see Friedman et al. (2010).
The Lasso works as follows. To ease the notation, let us denote by $x_t$ the daily log realized
variance $\log RV_t^{(d)}$. Let $(x_{t-1}, \ldots, x_{t-p})$ be predictor variables and $x_t$ responses,
$t = p + 1, \ldots, n$. The Lasso estimator of the AR(p) model

$$x_t = c + \sum_{j=1}^{p} \phi_j x_{t-j} + \varepsilon_t, \tag{4}$$

where $(\varepsilon_t)_t$ are independent and identically distributed (iid) innovations with zero mean,
is obtained as

$$(\hat{c}^{Lasso}, \hat{\phi}^{Lasso}) = \arg\min_{c,\phi} \left\{ \sum_{t=p+1}^{n} \Big( x_t - c - \sum_{j=1}^{p} \phi_j x_{t-j} \Big)^2 \right\} \quad \text{subject to} \quad \sum_{j=1}^{p} |\phi_j| \le t, \tag{5}$$

where $t$ is a tuning parameter. As is common practice, we demean the data and thus
drop the parameter $c$ from the minimization. It can be seen (Tibshirani, 1996) that Eq. (5)
is equivalent to the Lagrangian form given as

$$\hat{\phi}^{Lasso} = \arg\min_{\phi} \left\{ \sum_{t=p+1}^{n} \Big( x_t - \sum_{j=1}^{p} \phi_j x_{t-j} \Big)^2 + \lambda \sum_{j=1}^{p} |\phi_j| \right\}, \tag{6}$$

with a one-to-one correspondence between $\lambda$ in Eq. (6) and $t$ in Eq. (5). The powerful
feature of the Lasso is now induced by the $\ell_1$-norm of the penalty. The Lasso solution
will be sparse, since some $\phi_j$ coefficients will be set exactly to zero.
Define $S = \{j : \phi_j \neq 0\} \subset \{1, \ldots, p\}$ the active set, $S^c = \{1, \ldots, p\} \setminus S$ the nonactive set,
and $\Sigma_{XY} = \mathrm{Cov}(X, Y)$ the covariance matrix of vectors $X$ and $Y$. Consequently, $\Sigma_{SS}$ is the
square covariance matrix of the active predictors, and $\Sigma_{S^c S}$ is the covariance matrix of the
predictors in the nonactive set (given as $x_{t-j}$, $j \in S^c$) with the predictors in the active set
(given as $x_{t-j}$, $j \in S$).
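
To make the Lagrangian form (6) concrete, the following is a minimal sketch of a cyclic coordinate descent solver with soft-thresholding, in the spirit of the algorithm family discussed in Friedman et al. (2010). It is not the glmnet implementation; the function names, the fixed number of sweeps, and the toy series are assumptions made for illustration. The input series is assumed to be demeaned.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; return exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_ar(x, p, lam, n_sweeps=200):
    """Minimise sum_t (x_t - sum_j phi_j x_{t-j})^2 + lam * sum_j |phi_j|."""
    n = len(x)
    rows = [[x[t - j] for j in range(1, p + 1)] for t in range(p, n)]
    resid = [x[t] for t in range(p, n)]          # residuals at phi = 0
    col_ss = [sum(r[j] ** 2 for r in rows) for j in range(p)]
    phi = [0.0] * p
    for _ in range(n_sweeps):                    # fixed sweeps, no convergence test
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the partial residual
            # (the residual with lag j's current contribution added back).
            rho = sum(r[j] * (e + phi[j] * r[j]) for r, e in zip(rows, resid))
            new = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new - phi[j]
            if delta != 0.0:
                resid = [e - delta * r[j] for r, e in zip(rows, resid)]
                phi[j] = new
    return phi


# Toy demeaned series (illustrative only): a perfectly alternating sequence.
toy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
ols_like = lasso_ar(toy, p=1, lam=0.0)    # no penalty: least squares
shrunk = lasso_ar(toy, p=1, lam=4.0)      # moderate penalty: shrunk toward zero
zeroed = lasso_ar(toy, p=1, lam=10.0)     # large penalty: coefficient set to zero
```

The threshold is lam / 2 because the squared-error term in (6) carries no factor of one half. Increasing lam moves coefficients from the active set to exactly zero, which is the selection property exploited in what follows.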
A question of utmost importance is how reliable the Lasso is in the sense that it
correctly identifies S and S c , i.e., that it sets the true zero coefficients to zero. Typically,
this is what is captured by model selection consistency. The following definition adopts
the view of Nardi and Rinaldo (2011). For an overview and weaker form of this, the
reader is referred to Bühlmann and Van De Geer (2011).


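Definition 1. Let $x_t = \sum_{j=1}^{p} \phi_j x_{t-j} + \varepsilon_t$, $t = p + 1, \ldots, n$, with true unknown coefficients
denoted by $\phi^0 = [\phi_1^0, \ldots, \phi_p^0]^\top$. Let us define the sign function as $\mathrm{sgn} : \mathbb{R} \to \{-1, 0, 1\}$ with

$$\mathrm{sgn}(x) = \begin{cases} -1 & \text{if } x < 0, \\ 0 & \text{if } x = 0, \\ 1 & \text{if } x > 0, \end{cases}$$

and denote $\mathrm{sgn}(\phi) = (\mathrm{sgn}(\phi_1), \ldots, \mathrm{sgn}(\phi_p))^\top$. Then an estimator $\hat{\phi}_n$ is said to be model
selection consistent if

$$P\big(\mathrm{sgn}(\hat{\phi}_n) = \mathrm{sgn}(\phi^0)\big) \to 1 \quad \text{for } n \to \infty. \tag{7}$$

The above model selection consistency definition meets our requirement that if there
is an estimator producing $\hat{\phi}_n$ which is model selection consistent it will eventually only
retain the true nonzero coefficients of $\phi^0$.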
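
In a simulation, the probability in (7) can be approximated by the share of replications in which the estimated and true sign patterns coincide. The sketch below shows that bookkeeping; the helper names and the coefficient vectors are hypothetical.

```python
def sgn(x):
    """Sign function from Definition 1: returns -1, 0, or 1."""
    return (x > 0) - (x < 0)


def same_sign_pattern(phi_hat, phi_true):
    """True if every estimated coefficient has the sign of its true counterpart."""
    return [sgn(a) for a in phi_hat] == [sgn(b) for b in phi_true]


def selection_rate(estimates, phi_true):
    """Share of replications whose sign pattern matches the truth."""
    hits = sum(same_sign_pattern(est, phi_true) for est in estimates)
    return hits / len(estimates)


# Hypothetical true coefficients and two hypothetical replications.
phi_true = [0.5, 0.0, -0.2]
replications = [[0.3, 0.0, -0.1], [0.3, 0.01, -0.1]]
rate = selection_rate(replications, phi_true)
```

Note that the criterion is strict: a single spuriously nonzero coefficient, however small, counts as a selection error.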
Nardi and Rinaldo (2011) establish that, under some assumptions, the Lasso is model
selection consistent in the sense of Definition 1 for a causal AR(p) process with Gaussian
iid innovations. In the next section, we will generalize this result for more realistic
non-Gaussian AR(p) processes.
2.3. Lassoing the HAR Model

If we assume that $\varepsilon_t$ in (4) is Gaussian, we can readily use Nardi and Rinaldo’s (2011) result
and apply the Lasso to recover the lags structure and the maximum order of the HAR
model embedded in an AR(p) process with $p > 22$. The Lasso should then detect³
$S = \{1, 2, \ldots, 22\}$ and $S^c = \{23, 24, \ldots, p\}$ since any other lagged value should be irrelevant if
the HAR model is the true data generating process (DGP).
The assumption of Gaussianity of the error is rather restrictive and does not
correspond to reality in general, although the HAR model is usually estimated using least
squares. What we are going to show in the next theorem is that, under the assumption that
the HAR model is the true DGP, we precisely know the dynamics and can prove model
selection consistency of the Lasso without relying on Gaussianity. The relaxation on the
distribution of the error term comes at the price of keeping $S$ and $S^c$ fixed; the Lasso
literature generally differentiates between p growing with n and p fixed. Nardi and Rinaldo’s
(2011) result addresses the case where p is allowed to grow; our contribution below,
however, requires p to be fixed. In the context of realized volatility dynamics estimation,
this can be safely assumed.

Theorem 1. Consider the HAR process introduced in (1) and rewritten as a restricted
AR(22) process as

$$\log RV_t^{(d)} = c + \sum_{j=1}^{22} \phi_j \log RV_{t-j}^{(d)} + \varepsilon_t$$

with

$$\phi_j = \begin{cases}
\beta^{(d)} + \frac{1}{5}\beta^{(w)} + \frac{1}{22}\beta^{(m)} & \text{for } j = 1, \\
\frac{1}{5}\beta^{(w)} + \frac{1}{22}\beta^{(m)} & \text{for } j = 2, \ldots, 5, \\
\frac{1}{22}\beta^{(m)} & \text{for } j = 6, \ldots, 22,
\end{cases}$$

where $(\varepsilon_t)_t$ is a zero mean, iid innovation process with distribution having a finite fourth
moment, the process is assumed to be causal, and the parameters $\beta^{(d)}$, $\beta^{(w)}$, $\beta^{(m)}$ are
assumed to be non-negative. Moreover, defining the infinity norm of a matrix by

$$\|A\|_\infty := \max_{\|\alpha\|_\infty = 1} \|A\alpha\|_\infty, \quad \text{with } \|\alpha\|_\infty = \max_{1 \le i \le n} |\alpha_i| \text{ for } \alpha \in \mathbb{R}^n,$$

assume as follows:

(i) There exists a finite positive constant $C_{max}$ such that $\|\Sigma_{SS}^{-1}\|_\infty \le C_{max}$;
(ii) There exists an $\eta \in (0, 1]$ such that $\|\Sigma_{S^c S} \Sigma_{SS}^{-1}\|_\infty \le 1 - \eta$.

Let $S$, $S^c$ be fixed, and let the tuning parameter $\lambda_n$ be chosen such that $\lambda_n / n \to 0$ and
$\lambda_n / n^{\frac{1+c}{2}} \to \infty$ with $0 \le c < 1$. Then the Lasso is model selection consistent in the sense
of Definition 1.

³ In the sense of setting the nonactive coefficients to zero.
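
Condition (ii) can be checked numerically once a covariance matrix of the lagged predictors is available. The sketch below uses the fact that the induced infinity norm of a matrix equals its maximum absolute row sum. The covariance matrix is a toy AR(1)-type example with entries $0.5^{|i-j|}$, chosen because the answer is known in closed form; it is not the covariance implied by the HAR model, and the helper names are assumptions.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def irrepresentable_norm(sigma, active):
    """Infinity norm of Sigma_{S^c S} Sigma_{SS}^{-1} (max absolute row sum)."""
    inactive = [j for j in range(len(sigma)) if j not in active]
    s_ss = [[sigma[i][j] for j in active] for i in active]
    row_sums = []
    for k in inactive:
        # Since Sigma_SS is symmetric, row k of the product solves
        # Sigma_SS z = Sigma_{S,k}.
        z = solve(s_ss, [sigma[i][k] for i in active])
        row_sums.append(sum(abs(v) for v in z))
    return max(row_sums) if row_sums else 0.0


# Toy covariance of (x_{t-1}, x_{t-2}, x_{t-3}); indices 0 and 1 are active.
rho = 0.5
sigma = [[rho ** abs(i - j) for j in range(3)] for i in range(3)]
norm = irrepresentable_norm(sigma, active=[0, 1])
```

For this toy matrix the norm equals rho, so condition (ii) holds with $\eta = 1 - \rho$.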

The complete argument and proof are given in Appendix A.

Remark. The assumptions that the HAR model is causal and that $\beta^{(d)}$, $\beta^{(w)}$, $\beta^{(m)}$
are non-negative are by no means restrictive: In fact, when estimating the HAR model
on empirical data they are (almost) always satisfied. Condition (i) holds trivially in case
that none of the variables is a linear combination of another. Condition (ii) is found
throughout the model consistency literature for the Lasso. Typically this condition is
called the irrepresentable condition as introduced in Zhao and Yu (2006). It has
already been proved that, using a version of the Lasso allowing for a more flexible
penalization called the adaptive Lasso, this assumption can be significantly relaxed; see,
for example, Zou (2006) and Bühlmann and Van De Geer (2011).
Finally, the assumption of iid errors is not restrictive but greatly simplifies all
derivations. As has already been shown by Medeiros and Mendes (2012) and Audrino
and Camponovo (2013), in case of the adaptive Lasso this assumption can be relaxed
in favor of more realistic error terms. Moreover, in our Monte Carlo simulations we
investigate the adequacy of the Lasso as a model selection device in the case of conditional
heteroscedasticity with good results.

Theorem 1 says that when the HAR model is the true DGP, by estimating an
unrestricted AR model using the Lasso we are able to recover the lags structure and
maximum order of the HAR. In other words, if the HAR model is the DGP, applying
the Lasso we should expect to select lags up to order 22 in the active set, whereas
the coefficients of the lags beyond 22 should be set to zero. This is clearly different
from recovering the correct coefficient structure of the HAR model, as in this case the
imposition of the equality restrictions across coefficients belonging to the same frequency
(weekly, monthly) is necessary as well. This goal can be attained in the Lasso context by
considering variants of the original Lasso, such as the adaptive Lasso (Zou, 2006) or the
cluster group Lasso (Bühlmann et al., 2012). A deeper investigation of the appropriateness
of the coefficient restrictions imposed by the aggregation embedded in the HAR model is
left for future research.

3. EMPIRICAL APPLICATION

In this section, we illustrate our approach of identifying the HAR model via the Lasso
using nine assets traded on the New York Stock Exchange. For each of these stocks, we
compute the series of daily realized variance measures using Zhang et al.’s (2005) two-time
scales estimator with frequencies of 2 and 20 ticks.⁴ We then estimate the HAR model
in-sample and contrast it with estimates obtained using the Lasso procedure described
in Section 2. We finally compare the Lasso and HAR forecasting performances
out-of-sample.
Note that we obviously only forecast one day ahead realized variance since our
argument is based on the basic specification of the HAR model. One could of course
address the question whether the Lasso is also well suited to forecast realized variance
at longer horizons (weekly, monthly) given that it was explicitly designed to capture long
memory type dependencies; this, however, would be a purely empirical exercise and is
beyond the scope of this article.
For our analysis we use R, the statistical programming language (Team, 2012) in its
version 2.14.1. The Lasso estimates were obtained using the glmnet package which is
based on Friedman et al. (2010) as well as the lars package (Hastie and Efron, 2011).

3.1. Data Description

We use intraday data of Alcoa, Inc. (AA), Citigroup, Inc. (C), Hasbro, Inc. (HAS), Harley
Davidson, Inc. (HDI), Intel Corporation (INTC), Microsoft Corporation (MSFT), Nike,
Inc. (NKE), Pfizer, Inc. (PFE), and Exxon Mobil Corporation (XOM) from Jan. 2, 2001
to Nov. 15, 2010, for a total of 2,483 daily realized variance observations. Although using
the log to transform the realized variance is standard in the literature, we briefly comment
explicitly on this in Appendix B. In what follows, we always assume the use of log realized
variance when speaking of realized variance unless otherwise stated.
Summary statistics for the unconditional distribution of $\log RV_t$ are reported in
Table 1 and illustrated in Fig. 1.
Consistent with the existing literature we witness slowly decaying autocorrelation
functions in Fig. 1 (a) for all assets. Figure 1 (b) shows a violin plot (Hintze and Nelson,
1998) of the unconditional distributions of $\log RV_t$. A violin plot is similar to a box plot,
except that it also shows the probability density of the data at different values (in the
simplest case this could be a histogram). As can be seen from Table 1 and Fig. 1 (b), all
stocks show similar values of mean and standard deviation, positive skewness, and excess
kurtosis, with the only exception of Citigroup, Inc.

⁴ We adhere to the suggestion put forward in Corsi (2009) and use annualized returns in percentage points.
Clearly, other realized variance estimators like the realized kernel can be used: We chose the two-time scales
estimator for its simplicity and empirical accuracy. Moreover, the theory used in this study for the HAR
process can be applied to any other realized variance estimator.
TABLE 1
Descriptive Statistics of log RV_t Series

               AA     C      HAS    HDI    INTC   MSFT   NKE    PFE    XOM
Mean           6.80   6.96   6.36   6.42   6.90   6.37   6.10   6.30   5.90
SD             0.95   1.98   0.93   0.98   0.78   0.89   0.90   0.85   0.89
Kurtosis       4.06   2.61   3.06   3.32   3.82   4.40   3.06   4.18   6.15
Skewness       0.92   0.78   0.46   0.69   0.58   0.58   0.58   0.83   1.19
Median         6.67   6.43   6.23   6.30   6.81   6.33   5.97   6.18   5.78
25%-quantile   6.07   5.33   5.70   5.65   6.40   5.76   5.41   5.68   5.30
75%-quantile   7.30   8.21   6.99   7.01   7.35   6.90   6.70   6.82   6.35

Summary statistics of the (log) realized variance unconditional distribution for nine assets belonging to the
S&P500 universe: Alcoa, Inc. (AA), Citigroup, Inc. (C), Hasbro Inc. (HAS), Harley Davidson, Inc. (HDI),
Intel Corporation (INTC), Microsoft Corporation (MSFT), Nike Inc. (NKE), Pfizer Inc. (PFE), and Exxon
Mobil Corporation (XOM). The time period goes from Jan. 2, 2001 to Nov. 15, 2010, for a total of 2,483
daily observations.

FIGURE 1 Autocorrelation function for log RV_t series. Panel (a) shows the autocorrelation function for the
nine log RV_t series. Panel (b) shows a violin plot (Hintze and Nelson, 1998) of the unconditional distribution
of log RV_t. A violin plot is a combination of a box plot and a kernel density plot: it starts with
a box plot and adds a rotated kernel density plot to each side of it.

3.2. In-Sample Evaluation

To address the question whether the HAR model’s lags structure is identified by the Lasso
procedure, we define $S^c = \{x_{t-23}, \ldots, x_{t-100}\}$.⁵ Since $\lambda$ in (6) is a tuning parameter and our
theoretical results only hold asymptotically, we proceed as in the previous literature and
choose $\lambda$ according to the Bayesian information criterion (BIC).⁶

⁵ The choice of letting the candidate lags run up to 100 is arbitrary. However, the results are not sensitive
to the choice of the maximal lag, as for instance the results remain almost identical for a maximal lag of 50.

Results obtained when estimating the HAR model as well as the Lasso on the full
sample for the nine stocks under investigation are summarized in Table 2 and plotted
in Fig. 2.
Two important points should be highlighted: First, the Lasso does not select as relevant
all predictors with coefficients implied to be nonzero by the HAR, as can be inferred from
Table 2. Although near lags are identified as relevant for most assets, lags beyond $x_{t-6}$
rarely get selected by the Lasso.⁷ Note at this point that a comparison of coefficients in
magnitude of the Lasso estimates to the HAR estimates cannot be made since the Lasso,
as a penalized estimator, is biased. Second, sometimes lags far beyond $x_{t-22}$ are selected
in the active set, as can be seen in Fig. 2. Clearly, the coefficients of these lags are zero
under the assumption that the HAR model is true.
At this stage, it is already apparent that the Lasso does not fully agree from a model
choice perspective with the HAR model’s lags structure, i.e., $\hat{S} \neq \{1, \ldots, 22\}$. To provide
further evidence supporting this statement, we conduct analyses which attempt to answer
the following two questions: 1. How reliable is the Lasso as a model selection device
in this specific finite sample setting? 2. How stable are these regressors over time? A
thorough answer to these questions is provided in the two subsequent paragraphs.

3.2.1. Monte Carlo Study

We perform a Monte Carlo simulation study to assess the model selection consistency
of the Lasso in the case of the HAR model in finite samples. Since the Lasso’s model
selection results depend on the signal-to-noise ratio (Bühlmann and Van De Geer, 2011),
it is important to have a comparable setting to assess the finite sample performance of
the Lasso as a model selection device.
Standard Setting. We first conduct the Monte Carlo study under the assumption that the
HAR model is true, in order to answer the question how effective the Lasso would be if the
HAR model were true. To this end, we proceed as follows in a parametric bootstrap manner:

1. For asset j = 1,    , 9 estimate the HAR model (1) on the full sample of 2483 data
points, which includes

(a) Obtain ĉ, ˆ (d) , ˆ (m) , and ˆ (w) , and compute Var(
 t ) as well as the derived estimates
ˆ (HAR) ,    , ˆ (HAR) via (2)–(3).
1 22

6
There are several alternatives available for choosing the tuning parameter , such as other information
criteria (i.e., AIC), cross-validation, or by fixing the value of according to some theoretical arguments; see,
for example, Nardi and Rinaldo (2011). As a robustness check, we redid the analysis using cross-validation
log n log p
and fixing = n
. Results remain qualitatively intact.
7
One of the reasons for this result may be the lack of uniformity of the Lasso estimator: Coefficient of
order 1/n are undistinguishable from zero.
Downloaded by [La Trobe University] at 22:39 02 February 2016

TABLE 2
HAR Estimates Versus Lasso Estimates

AA C HAS HDI INTC MSFT NKE PFE XOM


Lag HAR Lasso HAR Lasso HAR Lasso HAR Lasso HAR Lasso HAR Lasso HAR Lasso HAR Lasso HAR Lasso

1 0.470 0.426 0.573 0.531 0.403 0.368 0.413 0.380 0.535 0.509 0.488 0.460 0.456 0.417 0.417 0.387 0.497 0.476
2 0.084 0.178 0.073 0.169 0.084 0.128 0.086 0.144 0.074 0.132 0.086 0.133 0.073 0.122 0.073 0.121 0.094 0.143
3 0.084 0.043 0.073 0.002 0.084 0.079 0.086 0.080 0.074 0.015 0.086 0.048 0.073 0.082 0.073 0.004 0.094 0.043
4 0.084 0.064 0.073 0.065 0.084 0.045 0.086 0.022 0.074 0.071 0.086 0.093 0.073 0.055 0.073 0.074 0.094 0.104
5 0.084 0.031 0.073 0.044 0.084 0.026 0.086 0.054 0.074 0.062 0.086 0.019 0.073 – 0.073 0.045 0.094 0.063

6 0.010 0.014 0.008 0.025 0.012 0.057 0.012 0.054 0.008 – 0.008 0.057 0.013 0.033 0.014 0.046 0.005 –
7 0.010 – 0.008 – 0.012 0.038 0.012 0.015 0.008 – 0.008 0.004 0.013 0.035 0.014 0.030 0.005 –
8 0.010 – 0.008 – 0.012 – 0.012 0.016 0.008 – 0.008 – 0.013 0.008 0.014 0.010 0.005 –
9 0.010 0.049 0.008 0.056 0.012 – 0.012 0.031 0.008 0.062 0.008 0.047 0.013 0.026 0.014 0.041 0.005 0.056

12
10 0.010 0.060 0.008 0.002 0.012 – 0.012 0.015 0.008 0.005 0.008 0.006 0.013 0.006 0.014 0.034 0.005 0.012
11 0.010 – 0.008 – 0.012 0.028 0.012 0.034 0.008 0.007 0.008 0.010 0.013 0.028 0.014 – 0.005 –
12 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 0.006 0.013 – 0.014 – 0.005 –
13 0.010 – 0.008 – 0.012 0.018 0.012 0.008 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
14 0.010 0.012 0.008 0.005 0.012 – 0.012 – 0.008 – 0.008 0.023 0.013 0.015 0.014 0.041 0.005 0.007
15 0.010 – 0.008 – 0.012 – 0.012 0.002 0.008 0.013 0.008 – 0.013 – 0.014 – 0.005 –
16 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
17 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
18 0.010 – 0.008 – 0.012 0.008 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
19 0.010 – 0.008 – 0.012 0.014 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
20 0.010 – 0.008 – 0.012 – 0.012 0.007 0.008 0.017 0.008 – 0.013 – 0.014 – 0.005 –
21 0.010 – 0.008 0.002 0.012 – 0.012 – 0.008 – 0.008 – 0.013 0.017 0.014 – 0.005 –
22 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –

This table reports the HAR coefficients (as implied by (3)) and the Lasso coefficients estimated on the full sample going from Jan. 2, 2001 to Nov.
15, 2010, for the nine stocks belonging to the S&P500 universe under investigation. Coefficients set to 0 by the Lasso procedure are indicated by dashes.
We only report the coefficients up to lag $x_{t-22}$. Lasso coefficient estimates of lags higher than $x_{t-22}$ are reported graphically in Fig. 2.

FIGURE 2 HAR versus Lasso coefficients. Lasso (red) and HAR (blue) estimated coefficients for nine stocks
belonging to the S&P500 universe for the time period going from Jan. 2, 2001 to Nov. 15, 2010.


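(b) Compute the unconditional mean $\hat{\mu}$ (as $\hat{c}/(1 - \sum_{i=1}^{22} \hat{\phi}_i)$) and the unconditional variance $\hat{\sigma}^2$ (as $\mathrm{Var}(\varepsilon_t)/(1 - \sum_{i=1}^{22} \hat{\phi}_i \hat{\rho}_i)$, where $\hat{\rho}_i$ is the autocorrelation at lag $i$; see Brockwell and Davis, 1986).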

2. Resample the HAR model.

(a) Sample $x_1, \dots, x_{22}$ from the stationary distribution $\mathcal{N}(\hat{\mu}, \hat{\sigma}^2)$.

(b) Compute $x_{23}, \dots, x_{2483}$ recursively based on (3).

(c) Apply the Lasso and record the estimates.

Step 2 is repeated 1,000 times, and the results are reported in Table 3.
The results clearly indicate that the HAR structure is well recovered by the Lasso in
this synthetic HAR setting. Although the small monthly coefficients are selected less often,
the daily and weekly coefficients are almost always estimated to be nonzero and thus
considered active.
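
The resampling step lends itself to a short illustration. The following Python sketch, which is not the authors' code, maps HAR coefficients to the implied AR(22) coefficients and generates one synthetic path as in Steps 2(a) and 2(b). The function names, the zero intercept, and the innovation and start-up standard deviations are assumptions made for illustration; the HAR coefficients are backed out from the AA column of Table 2, and Step 2(c), the Lasso fit, is left out.

```python
import random

def har_to_ar(beta_d, beta_w, beta_m):
    """AR(22) coefficients implied by daily, weekly (5-day) and monthly (22-day) HAR terms."""
    return [
        (beta_d if lag == 1 else 0.0)
        + (beta_w / 5 if lag <= 5 else 0.0)
        + beta_m / 22
        for lag in range(1, 23)
    ]

def simulate_har(c, phi, sigma_eps, mu, sigma_x, n=2483, seed=0):
    """Steps 2(a)-(b): Gaussian start-up values, then the AR(22) recursion."""
    rng = random.Random(seed)
    x = [rng.gauss(mu, sigma_x) for _ in range(len(phi))]  # x_1, ..., x_22
    while len(x) < n:
        # x[-1] is lag 1, x[-2] is lag 2, and so on.
        mean = c + sum(p * x[-lag] for lag, p in enumerate(phi, start=1))
        x.append(mean + rng.gauss(0.0, sigma_eps))
    return x

# Illustrative HAR coefficients chosen to reproduce the AA column of Table 2.
phi = har_to_ar(beta_d=0.386, beta_w=0.370, beta_m=0.220)
c = 0.0                       # hypothetical intercept
mu = c / (1.0 - sum(phi))     # unconditional mean as in step 1(b)
path = simulate_har(c, phi, sigma_eps=0.5, mu=mu, sigma_x=1.0)  # hypothetical scales
print(round(phi[0], 3), round(phi[1], 3), round(phi[5], 3), len(path))
```

Repeating the simulation with different seeds and fitting the Lasso to each path would give selection frequencies of the kind reported in Table 3.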
Note at this point that there is indeed some contradiction with what has been reported in Table 2: The percentages of times lags 1, ..., 22 are recovered deviate, in some cases substantially, from those observed in Table 2, and selection of lags beyond 22 is generally moderate, if not rare. For example, we may conclude that the observed event in Table 2 that lag $x_{t-16}$ is nonactive across all nine assets has a probability of occurring of only 0.24% based on the percentages summarized in Table 3 (that is, assuming independence, multiplying one minus the corresponding percentages reported in the nine columns).
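
The 0.24% figure can be reproduced directly from the $x_{t-16}$ row of Table 3. The short Python check below multiplies one minus the selection frequencies across the nine assets, under the same independence assumption.

```python
# Selection percentages of lag x_{t-16} in Table 3
# (AA, C, HAS, HDI, INTC, MSFT, NKE, PFE, XOM).
selected_pct = [49, 42, 54, 56, 42, 43, 56, 60, 31]

prob_all_inactive = 1.0
for pct in selected_pct:
    prob_all_inactive *= 1.0 - pct / 100.0

print(f"{100 * prob_all_inactive:.2f}%")  # prints 0.24%
```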
We thus conclude from this Monte-Carlo application that indeed the Lasso does
recover the lags structure and maximum order implied by the HAR model reasonably
well if it is the true model, i.e., if we simulate from this DGP.
Stochastic Volatility and Jumps. We then evaluate the performance of the Lasso in a more realistic setting that mimics the dynamics of the real data more closely. We thus consider the extended HAR model

$$\log RV^{(d)}_{t+1} = c + \beta^{(d)} \log RV^{(d)}_{t} + \beta^{(w)} \log RV^{(w)}_{t} + \beta^{(m)} \log RV^{(m)}_{t} + \varepsilon_{t+1} + J_{t+1}, \qquad (8)$$

where

$$\varepsilon_{t+1} = h_{t+1}^{1/2} \eta_{t+1}, \qquad h_{t+1} = \alpha_0 + \alpha_1 \varepsilon_t^2 + \beta_1 h_t,$$

$J_t$ (jump process) are independent, identically Poisson distributed random variables with intensity $\lambda = 0.1$, and $\eta_t$ follows a standardized t-distribution with five degrees of freedom. The GARCH parameters are chosen to match the estimated realized volatility dynamics and are set as follows: $\alpha_0 = 0.003$, $\alpha_1 = 0.05$, and $\beta_1 = 0.9$. The HAR parameters are averaged over the estimated ones for the nine series under investigation (for the values see Table 2).
Simulations are done according to the resampling Step 2 of the parametric bootstrap algorithm introduced in the standard setting (with 1,000 replications), applied to the extended model (8) with the parameters specified above. We consider two cases: no jumps (that is, $\lambda = 0$) and the case with $\lambda = 1$. Results are summarized in Table 4, columns 2 and 3.
As Table 4 shows, the Lasso approach is robust against this type of misspecification:
When compared to those summarized in Table 3, results remain qualitatively the same.
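
A data-generating process of this kind could be simulated along the following lines. The Python sketch is only an illustration under stated assumptions: the AR(22) coefficients are taken from the AA column of Table 2 rather than averaged over the nine series, the intercept is set to zero, the jump intensity is passed as an argument, and all function names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def standardized_t(rng, df=5):
    """Student-t draw rescaled to unit variance (requires df > 2)."""
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df) * math.sqrt((df - 2.0) / df)

def simulate_extended_har(phi, c, n, jump_intensity, a0=0.003, a1=0.05, b1=0.9, seed=0):
    """AR(22) form of the HAR recursion with GARCH(1,1)-t errors and Poisson jumps."""
    rng = random.Random(seed)
    h = a0 / (1.0 - a1 - b1)               # start at the unconditional GARCH variance
    eps = 0.0
    x = [c / (1.0 - sum(phi))] * len(phi)  # start at the unconditional mean
    for _ in range(n):
        h = a0 + a1 * eps ** 2 + b1 * h
        eps = math.sqrt(h) * standardized_t(rng)
        jump = poisson(rng, jump_intensity) if jump_intensity > 0 else 0
        mean = c + sum(p * x[-lag] for lag, p in enumerate(phi, start=1))
        x.append(mean + eps + jump)
    return x[len(phi):]

# Illustrative AR(22) coefficients (AA column of Table 2), zero intercept.
phi = [0.470] + [0.084] * 4 + [0.010] * 17
no_jumps = simulate_extended_har(phi, c=0.0, n=500, jump_intensity=0.0)
with_jumps = simulate_extended_har(phi, c=0.0, n=500, jump_intensity=0.1)
print(len(no_jumps), len(with_jumps))
```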

TABLE 3
Percentage of HAR Coefficients Selected

Lag AA C HAS HDI INTC MSFT NKE PFE XOM

xt−1 100 100 100 100 100 100 100 100 100

xt−2 100 100 100 100 100 100 100 97 100


xt−3 100 100 100 100 100 100 100 98 100
xt−4 100 100 100 100 100 100 100 97 100
xt−5 100 100 100 100 100 100 100 100 100

xt−6 58 52 62 64 52 53 64 66 44
xt−7 54 49 60 61 48 49 61 65 41
xt−8 55 47 62 63 48 49 63 67 41

xt−9 50 43 56 57 43 44 58 61 35
xt−10 52 47 58 59 47 48 59 63 38
xt−11 49 44 56 57 44 44 57 62 35
xt−12 52 45 57 58 46 46 59 63 35
xt−13 50 45 57 59 45 46 58 63 32
xt−14 53 48 61 62 47 48 62 66 36
xt−15 48 42 57 58 42 43 58 63 30
xt−16 49 42 54 56 42 43 56 60 31
xt−17 46 39 53 54 39 40 54 59 27
xt−18 47 41 55 57 40 40 57 62 29
xt−19 44 38 51 52 37 38 53 57 27
xt−20 41 34 47 48 34 35 49 54 22
xt−21 39 32 48 48 33 33 50 56 21
xt−22 34 26 43 43 28 29 44 49 20

xt−23 12 13 15 22 14 10 12 14 12
xt−24 7 6 8 8 5 6 8 8 4
xt−25 7 6 7 8 6 6 9 9 5
xt−26 5 4 8 7 5 5 8 9 5
xt−27 3 1 5 4 2 2 5 6 2
xt−28 4 2 5 5 3 4 5 6 3
xt−29 3 2 4 4 2 3 5 5 2
xt−30 2 2 4 4 2 2 4 4 2
xt−31 2 1 3 3 1 2 3 3 2
xt−32 2 1 4 3 2 2 3 3 2
xt−33 2 1 3 2 1 2 3 3 2
xt−34 2 1 5 3 2 2 4 5 3
xt−35 3 1 4 4 3 3 4 5 3
xt−36 2 2 4 3 2 3 4 4 3
xt−37 2 1 4 3 2 2 3 3 2
xt−38 2 1 3 3 2 2 3 3 2
xt−39 2 1 3 2 1 1 3 3 3
xt−40 1 1 2 2 1 1 2 2 2
xt−41 2 1 3 3 2 2 3 3 2
xt−42 1 0 2 1 1 1 1 2 1
xt−43 1 0 2 1 1 1 2 2 1
xt−44 1 1 2 1 1 1 2 2 1
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
xt−100 0 0 2 1 0 0 2 2 1

Percentage of times out of 1,000 replications a lag has been selected (estimated as nonzero) by the Lasso.
The data is generated according to the correctly specified HAR model.

TABLE 4
Percentage of HAR Coefficients Selected: Misspecified Settings

Lag Stochastic volatility Stochastic volatility and jumps

xt−1 100 100


xt−2 100 100
xt−3 100 100
xt−4 99 100
xt−5 99 100
xt−6 54 59
xt−7 52 54
xt−8 47 53
xt−9 49 53

xt−10 48 51
xt−11 49 53
xt−12 48 52
xt−13 49 51
xt−14 46 51
xt−15 45 48
xt−16 46 51
xt−17 47 49
xt−18 44 51
xt−19 44 46
xt−20 41 46
xt−21 39 43
xt−22 37 40
xt−23 13 10
xt−24 7 9
xt−25 7 7
xt−26 6 7
xt−27 6 7
xt−28 6 8
xt−29 5 6
xt−30 4 5
xt−31 5 5
xt−32 2 5
xt−33 4 2
xt−34 3 3
xt−35 2 3
xt−36 1 2
xt−37 1 2
xt−38 1 3
xt−39 2 1
xt−40 1 3
xt−41 1 2
xt−42 1 1
xt−43 2 3
xt−44 1 1
⋮ ⋮ ⋮
xt−100 2 1

Percentage of times out of 1,000 replications a lag has been selected (estimated as nonzero) by the Lasso.
The data is generated according to a misspecified HAR model extended to include (i) stochastic volatility
(column 2) and (ii) stochastic volatility and jumps (column 3).

This finding is not surprising and confirms the recent theoretical and empirical results
provided in Kock (2012), Medeiros and Mendes (2012), and Audrino and Camponovo
(2013).

3.2.2. Rolling Window

To address the question whether regressors estimated to belong to the active set by the
Lasso are stable over time, we apply the Lasso procedure in a rolling window manner.
We stack our data for each asset as follows:

$$X = \begin{bmatrix} x_{101} & x_{100} & \cdots & x_{1} \\ x_{102} & x_{101} & \cdots & x_{2} \\ \vdots & \vdots & & \vdots \\ x_{n} & x_{n-1} & \cdots & x_{n-100} \end{bmatrix}.$$

We then estimate the Lasso on the first 1,000 rows of X and roll this window of length
1,000 down to the last row of X. As an illustration, Fig. 3 contains this analysis for Harley
Davidson, Inc. (HDI). The abscissa reports the last date of the current window: The first
window thus corresponds to May 19, 2005 and continues through Nov. 15, 2010. The
ordinate indicates whether or not a regressor was selected and is shown as a green bar.
Diagonal gray lines have slope 1, i.e., if a regressor moves along these lines, its effect is
lagged by one day as the rolling window proceeds by one row. Dashed lines indicate the
daily, weekly, and monthly lags implied by the HAR model’s lags structure.
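
The stacking and the rolling scheme can be made concrete with a small Python sketch. The helper names are hypothetical, the input is a stand-in series rather than realized variance data, and the Lasso fit on each window is omitted.

```python
def build_lag_matrix(x, max_lag=100):
    """Row for time t is [x_t, x_{t-1}, ..., x_{t-max_lag}], newest value first."""
    return [x[t - max_lag:t + 1][::-1] for t in range(max_lag, len(x))]

def rolling_windows(rows, window=1000):
    """Yield (index of last row, block of rows) as the window moves down one row at a time."""
    for end in range(window, len(rows) + 1):
        yield end - 1, rows[end - window:end]

x = list(range(1, 1301))          # stand-in series with x_1 = 1, ..., x_1300 = 1300
X = build_lag_matrix(x)
windows = list(rolling_windows(X))
print(len(X), len(X[0]), X[0][0], X[0][-1], len(windows))  # prints 1200 101 101 1 201
```

In each window the first column plays the role of the response and the remaining 100 columns are the candidate regressors.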
Groups of regressors moving along the diagonal lines are likely to be noise given that
they are one-off events that move through the sampling window. It is also apparent from
Fig. 3 that there are two clear breaks in structure taking place around the beginning of
2007 and about 6 months after. In fact, for a small period in 2007 no lag is selected by
the Lasso, and thus the realized volatility dynamics follow a (local) constant model.
Figure 4 draws the same picture for the remaining eight assets.
Although there are some differences among assets, we observe a clear pattern of a
dependence breakdown around the end of 2006 and the beginning of 2007, the only
exception being XOM⁸: Very simple AR models are selected by the Lasso in this period.
This result supports previous findings in the GARCH setting described, among others,
by Brownlees et al. (2012). A second break is then found with the onset of the financial
crisis (for some assets already at the beginning of 2008, and for others several months
later). Most assets indeed also have components that can be explained by one-off events.
However, we also identify, for some assets and for particular subperiods depending on
the asset, lags beyond $x_{t-22}$ that constantly get selected and remain in the active set. This may be an indication of longer-range dependence which is not accounted for by the HAR model.

⁸ To test whether this is a real feature of the data, we estimated the HAR model using a similar rolling window strategy. Results on the parameters of the HAR components pointed in the same direction as those of the Lasso, confirming the break in structure.

FIGURE 3 Stability of Lasso selected regressors for Harley Davidson, Inc. The abscissa reports the last date of the current window: The first window thus corresponds to the date of $x_{1000}$, which in this case is May 19, 2005, and continues through Nov. 15, 2010. The ordinate indicates whether or not a regressor was selected and is shown as a green bar. Diagonal gray lines have slope 1, i.e., if a regressor moves along these lines its effect is lagged by one day as the rolling window proceeds by one row. Dashed lines indicate the daily, weekly, and monthly lags used by the HAR model.

To investigate further whether the identified break in the lags structure around 2007 is the consequence of a true regime change or simply driven by some outliers, we plot as an illustrative example the realized variance series of Harley Davidson, Inc. in Fig. 5, left panel. Additionally, we plot in Fig. 5, right panel, the autocorrelation function of the realized variances for three different subperiods: the pre-2005 period, the 2005–2007 period, and the post-2007 period.
From the visual inspection of the realized variance time series, nothing really seems to
change abruptly around the year 2007: As expected, there is a general increase in the level
of realized volatility during the financial crisis. In contrast, the memory of the realized
variance series drops significantly faster during the period 2005–2007 than it did before and than it does after 2007. In agreement with what we found above, this result also points in the direction of a structural break in the memory of the process that takes place around the beginning of 2007.

FIGURE 4 Stability of Lasso selected regressors for the other eight assets we consider. The abscissa reports the last date of the current window: The first window thus corresponds to the date of $x_{1000}$, which in this case is May 19, 2005, and continues through Nov. 15, 2010. The ordinate indicates whether or not a regressor was selected and is shown as a green bar. Diagonal gray lines have slope 1, i.e., if a regressor moves along these lines its effect is lagged by one day as the rolling window proceeds by one row. Dashed lines indicate the daily, weekly, and monthly lags used by the HAR model.

3.3. Out-of-Sample Prediction

So far, we have only considered the Lasso results in-sample. But the HAR has also garnered praise for its out-of-sample prediction. Similarly, simple realized variance AR models, close to those we found in the previous section when applying the Lasso, often perform well in forecasting; for a detailed discussion, see Andersen et al. (2004) and Sizova (2011). As a next step, we thus compare the HAR's and the Lasso's out-of-sample performance. We estimate the HAR model with data up to time $t$ and compute an estimate for $t + 1$, which is labeled $\widehat{\log RV}^{(\mathrm{HAR})}_{t+1|t}$. We do the same for the Lasso to obtain $\widehat{\log RV}^{(\mathrm{Lasso})}_{t+1|t}$.

FIGURE 5 Log realized variance time series (left panel) and autocorrelation function (right panel) for Harley
Davidson, Inc. The autocorrelation function is shown for three subperiods separately: The period going from
the beginning of the sample in January 2001 to 2005 (solid line), the years 2005, 2006, and 2007 (dashed
line), and the period after 2007 until the end of the sample in November 2010 (dotted line).

We proceed again in a rolling window manner but also vary the training window length: We consider windows of length 200, 400, 1,000, and 2,000 daily observations. To render the results comparable, we report the out-of-sample prediction for different training window lengths but the same evaluation window, that is, from May 12, 2009 to Nov. 15, 2010, as implied by the longest training window length. To have an objective comparison, we also include the random walk (RW) in our analysis.
Similarly to the in-sample analysis, we choose the optimal parameter of the Lasso according to the BIC. We measure the out-of-sample performance using the mean squared prediction error (MSPE) computed as $\mathrm{MSPE} = \frac{1}{N} \sum_{t=1}^{N} (\widehat{\log RV}_{t+1|t} - \log RV_{t+1})^2$, where $\widehat{\log RV}_{t+1|t}$ is the prediction obtained by the HAR model, the Lasso, or RW, and $N$ is the total number of out-of-sample predictions. Results are reported in Table 5.
Both the Lasso and the HAR need a certain window length to attain reasonably low
MSPEs. The HAR model is markedly better for small training window sizes, whereas for
longer training windows, although the Lasso and the HAR are almost equal in terms of
MSPE, the Lasso outperforms the HAR in most of the cases (12 out of 18).
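
For concreteness, the MSPE and the random walk benchmark can be written in a few lines of Python. The function names are hypothetical and the numbers are made up for illustration; they are not data from the paper.

```python
def mspe(forecasts, realized):
    """Mean squared prediction error over paired one-step-ahead forecasts."""
    if len(forecasts) != len(realized):
        raise ValueError("forecasts and realized values must have equal length")
    return sum((f - r) ** 2 for f, r in zip(forecasts, realized)) / len(realized)

def random_walk_forecasts(log_rv, start):
    """Random-walk benchmark: the forecast of log RV at t+1 is the value at t."""
    return [log_rv[t - 1] for t in range(start, len(log_rv))]

log_rv = [-9.1, -9.3, -8.8, -9.0, -9.4, -9.2]   # made-up numbers, not data from the paper
start = 2                                        # first out-of-sample index
rw = random_walk_forecasts(log_rv, start)
rw_mspe = mspe(rw, log_rv[start:])
print(rw_mspe)
```

Forecasts from the HAR model or the Lasso would be passed to `mspe` in the same way, which keeps the evaluation window identical across methods.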

TABLE 5
Out-of-sample Comparison

200 400 1,000 2,000


Asset RW HAR Lasso RW HAR Lasso RW HAR Lasso RW HAR Lasso
AA 0.160 0.129 0.145 0.160 0.126 0.125 0.160 0.125 0.121 0.160 0.124 0.121
C 0.132 0.115 0.128 0.132 0.115 0.121 0.132 0.115 0.115 0.132 0.116 0.115
HAS 0.240 0.201 0.233 0.240 0.197 0.207 0.240 0.193 0.197 0.240 0.197 0.198
HDI 0.231 0.184 0.207 0.231 0.181 0.185 0.231 0.179 0.179 0.231 0.179 0.176
INTC 0.113 0.094 0.104 0.113 0.091 0.091 0.113 0.089 0.087 0.113 0.089 0.087
MSFT 0.153 0.128 0.151 0.153 0.125 0.128 0.153 0.123 0.122 0.153 0.123 0.120
NKE 0.176 0.144 0.167 0.176 0.142 0.149 0.176 0.139 0.139 0.176 0.138 0.139
PFE 0.130 0.107 0.116 0.130 0.104 0.105 0.130 0.102 0.100 0.130 0.102 0.098

XOM 0.221 0.182 0.193 0.221 0.179 0.180 0.221 0.178 0.177 0.221 0.176 0.172

MSPE for all nine assets across training window length of 200, 400, 1,000, and 2,000 observations (rolling
window). In addition to the Lasso and the HAR the random walk (RW) is included.

To better understand these results, we further report the evaluation over different
out-of-sample periods: Pre-crisis, post-crisis, and full out-of-sample period. The date for
the beginning of the financial crisis was set to Sep. 1, 2007. For the relevant training
window lengths (i.e., 1,000 days and 2,000 days), we kept the maximal out-of-sample
period which, unlike Table 5, results in evaluation windows of different lengths. The
difference in MSPE is then tested using the Diebold and Mariano (1995) test corrected to
account for autocorrelations and heteroscedasticity. Results are summarized in Table 6.

TABLE 6
Diebold-Mariano Tests of Equal Predictive Ability

AA C HAS HDI INTC MSFT NKE PFE XOM

1,000 Total Mean Diff. –0.008 –0.010 0.013 –0.017 –0.005 –0.006 –0.020 0.022 –0.008
(n = 1,383) p-value 0.067 0.020 0.003 0.000 0.071 0.057 0.000 0.000 0.038
PreCrisis Mean Diff. –0.004 –0.004 –0.005 –0.011 0.001 –0.002 –0.015 0.007 0.001
(n = 575) p-value 0.509 0.431 0.547 0.172 0.357 0.334 0.178 0.024 0.829
PostCrisis Mean Diff. –0.011 –0.014 0.018 –0.021 –0.009 –0.009 –0.024 0.033 –0.014
(n = 808) p-value 0.066 0.023 0.000 0.000 0.033 0.093 0.000 0.000 0.021
2,000 Total Mean Diff. 0.003 0.000 –0.000 0.003 0.002 0.002 –0.001 0.004 0.004
(n = 383) p-value 0.220 0.700 0.837 0.287 0.022 0.296 0.517 0.058 0.048
PreCrisis Mean Diff. — — — — — — — — —
p-value — — — — — — — — —
PostCrisis Mean Diff. 0.003 0.000 –0.000 0.003 0.002 0.002 –0.001 0.004 0.004
(n = 383) p-value 0.220 0.700 0.837 0.287 0.022 0.296 0.517 0.058 0.048

Differences in MSPE ($\mathrm{MSPE}_{\mathrm{HAR}} - \mathrm{MSPE}_{\mathrm{Lasso}}$) are reported together with p-values from the Diebold–Mariano test (Newey–West adjusted). The differences and p-values are reported for different training windows (1,000, 2,000) and before/after the financial crisis. Differences significant at 1% are typeset in boldface.
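
As a rough illustration of the test behind Table 6, the following Python sketch computes a Diebold–Mariano statistic for squared-error loss with a Newey–West (Bartlett kernel) long-run variance and a normal approximation for the p-value. The truncation lag of 5 and the function name are assumptions; the lag choice used for Table 6 is not stated here.

```python
import math

def diebold_mariano(err_a, err_b, lags=5):
    """DM statistic for equal MSPE with a Newey-West (Bartlett) long-run variance."""
    d = [a * a - b * b for a, b in zip(err_a, err_b)]   # loss differential
    n = len(d)
    mean_d = sum(d) / n
    centered = [v - mean_d for v in d]

    def autocov(k):
        return sum(centered[t] * centered[t - k] for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(
        (1.0 - k / (lags + 1.0)) * autocov(k) for k in range(1, lags + 1)
    )
    stat = mean_d / math.sqrt(long_run_var / n)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))      # two-sided, normal approximation
    return mean_d, stat, p_value
```

With `err_a` holding the HAR forecast errors and `err_b` the Lasso forecast errors, the first return value corresponds to the mean difference as defined in the table note.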

Although there is a small number of rejections of the null at the 1% significance level,
we find no consistent pattern, neither in favor of the HAR nor in favor of the Lasso.

Investigating the predictions $\widehat{\log RV}^{(\mathrm{Lasso})}_{t+1|t}$ and $\widehat{\log RV}^{(\mathrm{HAR})}_{t+1|t}$ in the sense of Mincer and Zarnowitz (1969), we also find no evidence of either of the models being more often unbiased. For the sake of brevity, results are not reported but can be obtained from the authors upon request.
The takeaway at this stage is that there is no clear evidence that either of the two models is genuinely better suited to forecast realized variance out-of-sample.
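
A Mincer–Zarnowitz check regresses the realized value on a constant and the forecast; an unbiased forecast corresponds to an intercept of zero and a slope of one. The Python sketch below, with a hypothetical function name, only computes the two OLS coefficients and leaves out the joint significance test.

```python
def mincer_zarnowitz(forecasts, realized):
    """OLS of realized values on a constant and the forecast; returns (intercept, slope)."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_r = sum(realized) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    sxy = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecasts, realized))
    slope = sxy / sxx
    return mean_r - slope * mean_f, slope
```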

3.4. Risk Management Application



To test the predictions obtained from the Lasso and the HAR model from a different angle, we include a risk management application. The value-at-risk of an asset at the level $\alpha$ is given as

$$\mathrm{VaR}^{t}_{\alpha} = -\inf\{x \in \mathbb{R} \mid P(X_t \le x) \ge 1 - \alpha\}, \qquad (9)$$

where $X_t$ is the daily log-return of the asset.⁹ Under the assumption, which also underlies the computation of realized variance, that an asset's return $X_t$ is given as¹⁰

$$X_t = \mu_t + \sigma_t \cdot Z_t$$

and assuming a scale-location family with continuous distribution function, we can readily compute the value-at-risk as

$$\mathrm{VaR}_{\alpha}(X) = \mu_t + \sigma_t q_{1-\alpha}, \qquad (10)$$

where $q_{1-\alpha}$ is the $1 - \alpha$ quantile of the standardized distribution $Z_t$, $\mu_t$ the conditional mean, and $\sigma_t$ the conditional volatility of $X_t$.
We assume that $Z_t$ is either standard normally distributed or we take the empirical distribution after standardizing $X_t$ with $\mu_t$ and $RV_t$ as an accurate estimate of $\sigma_t$. Since we are aiming for a realistic benchmark, we do not employ backtesting for the value-at-risk but conduct an out-of-sample analysis and predict

$$\mathrm{VaR}^{t+1|t}_{\alpha} = \mu_t + \sigma_{t+1|t} q_{1-\alpha}, \qquad (11)$$

⁹ We define the value-at-risk in line with the risk management literature: Instead of working with the usual distribution, we multiply it by −1 such that losses are now positive values, resulting in the mnemonic that a greater VaR means greater risk.
¹⁰ We may also allow for jumps to contribute to the return $X_t$. For reasons of simplicity, we exclude this component.

where $\sigma_{t+1|t}$ is again obtained based on $RV_{t+1|t}$ estimates by either the Lasso or the HAR model.
To do so, we estimate both models on window lengths of size 200, 400, 1,000, and 2,000 observations. To get an optimal forecast (in the sense of Proietti and Lütkepohl, 2013, and Appendix B) of the actual volatility, we compute $\hat{\sigma}_{t+1|t}$ as

$$\hat{\sigma}_{t+1|t} = \exp\left(\widehat{\log RV}_{t+1|t} + \frac{\tilde{\sigma}^2}{2}\right), \qquad (12)$$

where $\tilde{\sigma}^2$ is the variance of $\log RV_{t+1|t}$. The hit ratios are then defined as

$$\mathrm{HR}^{M(D)}_{\alpha} = \frac{\#\{x_{t+1} < -\mathrm{VaR}^{t+1|t}_{\alpha}\}}{n}, \qquad (13)$$

where “M” can either be “HAR” or “Lasso” depending on how $\sigma_{t+1|t}$ in (12) is computed (either by the HAR model or our Lasso approach), and “D” is either “Norm” or “Emp” depending on how $q_{1-\alpha}$ in (11) is computed (quantiles of a $\mathcal{N}(0, 1)$ distribution or quantiles of the standardized empirical distribution). In all cases, we compute the conditional mean as $\mu_t = \frac{1}{n} \sum_{i=1}^{n} x_{t-n+i}$.
To contrast these estimates, we also implement a naive estimator of the value-at-risk by simply taking the empirical $\alpha$-quantile of the distribution of the log-returns, i.e.,

$$\mathrm{HR}^{\mathrm{Emp}}_{\alpha} = \frac{\#\{x_{t+1} < \hat{q}_{1-\alpha}\}}{n},$$

where $\hat{q}_{1-\alpha}$ is the empirical $1 - \alpha$ quantile of $x_{t-n+1}, \dots, x_t$.
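
The hit ratio computation can be sketched in Python as follows. The sketch makes one assumption explicit: in line with footnote 9 and the event in (13), the VaR is coded as a positive loss, that is, as minus the lower-tail return quantile. Function names and the input numbers are hypothetical, and the volatility inputs are taken as given rather than derived from a realized variance forecast.

```python
from statistics import NormalDist

def var_forecast(mu, sigma, alpha=0.99, z_quantile=None):
    """One-day VaR reported as a positive loss: -(mu + sigma * q_{1-alpha})."""
    # Default: quantile of N(0, 1); pass z_quantile for an empirical alternative.
    q = NormalDist().inv_cdf(1.0 - alpha) if z_quantile is None else z_quantile
    return -(mu + sigma * q)

def hit_ratio(next_returns, var_forecasts):
    """Share of days on which the next-day return falls below -VaR."""
    hits = sum(1 for x, v in zip(next_returns, var_forecasts) if x < -v)
    return hits / len(next_returns)

returns = [0.004, -0.031, 0.012, -0.006, -0.045]   # made-up next-day log-returns
sigmas = [0.012, 0.011, 0.015, 0.013, 0.014]       # made-up volatility forecasts
var99 = [var_forecast(0.0, s) for s in sigmas]
print(hit_ratio(returns, var99))
```

A well-calibrated 99% VaR would be violated on roughly 1% of days, so the hit ratio is compared with $1 - \alpha$.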


In Fig. 6, we plot the hit ratios HR at the confidence level $\alpha$ for $\alpha$ equal to 99% (top panels) and 97.5% (bottom panels).
Figure 6 clearly shows that there is again no systematic difference between $\mathrm{HR}^{\mathrm{HAR}(D)}_{\alpha}$ and $\mathrm{HR}^{\mathrm{Lasso}(D)}_{\alpha}$. Both are too aggressive and yield a VaR which is too low and thus is violated more often than theoretically specified when the empirical distribution is used for the standardized innovations, and less so when the $\mathcal{N}(0, 1)$ distribution is used for the standardized innovations. What becomes apparent from Fig. 6 is that the influence of the assumption on the distribution as well as the asset in question is much more crucial than the model used to forecast volatility. There is no apparent outperformance when computing the VaR with volatility forecasts obtained by either the HAR or the Lasso over the simple historical quantiles for short training periods. This holds all the more when looking at the rejections of $H_0$ under Kupiec's (1995) test. In fact, the null hypothesis is less often rejected for the naive estimator than for any realized variance model at both 5% and 10% significance levels.
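
Kupiec's (1995) proportion-of-failures test compares the observed number of VaR violations with the number expected under the nominal level by means of a likelihood ratio that is asymptotically chi-square distributed with one degree of freedom. A minimal Python sketch, with a hypothetical function name, is:

```python
import math

def kupiec_pof(hits, n, alpha=0.99):
    """Kupiec (1995) proportion-of-failures LR test; returns (LR statistic, p-value)."""
    p = 1.0 - alpha                      # violation probability under the null

    def loglik(prob):
        # Binomial log-likelihood, with 0 * log(0) treated as 0.
        out = 0.0
        if hits > 0:
            out += hits * math.log(prob)
        if n - hits > 0:
            out += (n - hits) * math.log(1.0 - prob)
        return out

    lr = max(0.0, -2.0 * (loglik(p) - loglik(hits / n)))
    p_value = math.erfc(math.sqrt(lr / 2.0))   # chi-square(1) survival function
    return lr, p_value
```

For example, `kupiec_pof(30, 1000)` tests whether 30 violations in 1,000 days are compatible with a 99% VaR.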
The particularly poor performance of all VaR forecasts for Citigroup, Inc. is related to the turbulent times the stock went through during the financial crisis resulting in pronounced non-normality of $\log RV_t$ as reported in Fig. 1(b) as well as non-normality of the log-returns reported in Fig. 7.

FIGURE 6 Actual hit ratios. In the columns, we plot the hit ratios HR at the confidence level $\alpha$ for $\alpha$ equal to 99% (top panels) and 97.5% (bottom panels) obtained using the different methods: HAR and Lasso with normal innovations, HAR and Lasso using the empirical distribution of the standardized innovations, and a simple method based on historical empirical quantiles. The horizontal lines show the theoretical levels ($1 - \alpha$) of the VaR. The color indicates the p-value of Kupiec's (1995) test against the theoretical level. Windows of different lengths $n = 200, 400, 1000, 2000$ are considered in the estimation.

3.5. Robustness Check: Adaptive Lasso Estimation

As already discussed in the remarks following Theorem 1, the adaptive Lasso is a natural
alternative to the original Lasso given that it is more tuned towards variable selection.
We therefore redo the whole analysis using the adaptive Lasso as a robustness check for
the results we found using the Lasso.
The adaptive Lasso introduced by Zou (2006) allows for a more flexible penalization

$$\hat{\phi}^{\mathrm{adalasso}} = \arg\min_{\phi} \left\{ \sum_{t=p+1}^{n} \Big( x_t - \sum_{j=1}^{p} \phi_j x_{t-j} \Big)^2 + \lambda \sum_{j=1}^{p} \lambda_j |\phi_j| \right\}, \qquad (14)$$

where $\lambda_j$ are adaptive weights. It can be shown that in fact the adaptive Lasso relaxes the assumptions for the model selection consistency of the Lasso; see, among others, Zou (2006) or Bühlmann and Van De Geer (2011). Following the literature, as a common choice for the adaptive weights, we set $\lambda_j = |1/\hat{\phi}^{\mathrm{Lasso}}_j|$, with the convention that in case a variable is excluded by the Lasso (that is, $\hat{\phi}^{\mathrm{Lasso}}_j = 0$), we also exclude it from the adaptive Lasso estimation.

FIGURE 7 Kernel density estimates of standardized log-returns for pre-crisis (PC) and full sample (FS) against normal distribution. The date for the beginning of the crisis was set to Sep. 1, 2007.

Results for the full in-sample estimation are summarized in Table 7 and Fig. 8.
When comparing the results with those shown in Table 2 and Fig. 2 for the Lasso,
we see that, as expected, the adaptive Lasso is excluding some additional variables that
were considered active by the Lasso (so-called false positives). The general picture is,
however, not changing significantly. Thus, using the adaptive Lasso, we come to the same
conclusions we discussed previously for the Lasso.
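
The two-stage procedure can be illustrated with a small coordinate descent sketch in Python. It is not the authors' implementation: the penalty level is fixed by hand instead of being chosen by the BIC, the same penalty is used in both stages, and the function names and the toy design are assumptions.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t and set it to zero if |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def weighted_lasso(X, y, lam, weights, sweeps=200):
    """Coordinate descent for sum (y - Xb)^2 + lam * sum_j weights[j] * |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            if weights[j] == float("inf") or col_ss[j] == 0.0:
                continue  # excluded regressor stays at zero
            # Inner product of column j with the partial residual that leaves b_j out.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * b[j]
            new_b = soft_threshold(rho, lam * weights[j] / 2.0) / col_ss[j]
            if new_b != b[j]:
                delta = new_b - b[j]
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                b[j] = new_b
    return b

def adaptive_lasso(X, y, lam):
    """Stage 1: plain Lasso. Stage 2: Lasso with weights 1 / |stage-1 coefficient|."""
    first = weighted_lasso(X, y, lam, [1.0] * len(X[0]))
    weights = [1.0 / abs(c) if c != 0.0 else float("inf") for c in first]
    return first, weighted_lasso(X, y, lam, weights)

# Toy orthonormal design, so both stages reduce to soft-thresholding.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 1.0, 0.2]
lasso_coef, adalasso_coef = adaptive_lasso(X, y, lam=1.0)
print(lasso_coef, adalasso_coef)
```

In this toy case the second coefficient survives the plain Lasso but is set to zero in the adaptive stage, while the large first coefficient is shrunk less, which mirrors the behavior described above.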

TABLE 7
HAR Estimates Versus Adaptive Lasso Estimates

AA C HAS HDI INTC MSFT NKE PFE XOM


Lag HAR adalasso HAR adalasso HAR adalasso HAR adalasso HAR adalasso HAR adalasso HAR adalasso HAR adalasso HAR adalasso

1 0.470 0.453 0.573 0.544 0.403 0.373 0.413 0.384 0.535 0.515 0.488 0.466 0.456 0.421 0.417 0.392 0.497 0.484
2 0.084 0.200 0.073 0.178 0.084 0.131 0.086 0.147 0.074 0.140 0.086 0.135 0.073 0.127 0.073 0.125 0.094 0.147
3 0.084 0.025 0.073 – 0.084 0.082 0.086 0.084 0.074 0.004 0.086 0.049 0.073 0.087 0.073 0.000 0.094 0.040
4 0.084 0.083 0.073 0.076 0.084 0.047 0.086 0.019 0.074 0.078 0.086 0.099 0.073 0.061 0.073 0.078 0.094 0.108
5 0.084 0.015 0.073 0.051 0.084 0.027 0.086 0.058 0.074 0.068 0.086 0.011 0.073 – 0.073 0.046 0.094 0.066

6 0.010 – 0.008 – 0.012 0.060 0.012 0.060 0.008 – 0.008 0.065 0.013 0.038 0.014 0.049 0.005 –
7 0.010 – 0.008 – 0.012 0.040 0.012 0.012 0.008 – 0.008 – 0.013 0.041 0.014 0.034 0.005 –
8 0.010 – 0.008 – 0.012 – 0.012 0.012 0.008 – 0.008 – 0.013 – 0.014 0.003 0.005 –
9 0.010 0.057 0.008 0.070 0.012 – 0.012 0.037 0.008 0.074 0.008 0.058 0.013 0.031 0.014 0.045 0.005 0.069

10 0.010 0.075 0.008 – 0.012 – 0.012 0.012 0.008 – 0.008 – 0.013 – 0.014 0.037 0.005 –
11 0.010 – 0.008 – 0.012 0.032 0.012 0.042 0.008 – 0.008 0.002 0.013 0.034 0.014 – 0.005 –
12 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
13 0.010 – 0.008 – 0.012 0.020 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
14 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 0.031 0.013 0.017 0.014 0.045 0.005 –
15 0.010 – 0.008 – 0.012 – 0.012 – 0.008 0.010 0.008 – 0.013 – 0.014 – 0.005 –
16 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
17 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
18 0.010 – 0.008 – 0.012 0.006 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
19 0.010 – 0.008 – 0.012 0.018 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –
20 0.010 – 0.008 – 0.012 – 0.012 – 0.008 0.020 0.008 – 0.013 – 0.014 – 0.005 –
21 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 0.026 0.014 – 0.005 –
22 0.010 – 0.008 – 0.012 – 0.012 – 0.008 – 0.008 – 0.013 – 0.014 – 0.005 –

This table reports the HAR coefficients and the adaptive Lasso coefficients estimated on the full sample going from Jan. 2, 2001 to Nov. 15, 2010, for
the nine stocks belonging to the S&P500 universe under investigation. Coefficients set to zero by the adaptive Lasso procedure are indicated by dashes.
We only report the coefficients up to lag xt−22 . Adaptive Lasso coefficient estimates of lags higher than xt−22 are reported graphically in Fig. 8.

FIGURE 8 HAR versus adaptive Lasso coefficients: Adaptive Lasso (red) and HAR (blue) estimated
coefficients for nine stocks belonging to the S&P500 universe for the time period going from Jan. 2, 2001 to
Nov. 15, 2010.

Similarly, all other results of the analysis are qualitatively the same as those shown for
the Lasso. Not surprisingly, the Lasso and the adaptive Lasso perform equally well in the
out-of-sample analysis. For the sake of brevity, full results are not reported but can be
obtained from the authors upon request.

4. CONCLUSIONS

We provide theoretical results and empirical evidence on synthetic data that, if the data
stem from a HAR DGP, the Lasso should detect the HAR model’s lags structure.

Applying the Lasso to the (log) realized variance series of nine U.S. stocks, we conclude
that this is not the case. Moreover, we find a clear indication of two structural breaks of
the realized variance dynamics that took place around the end of 2006 and with the onset
of the financial crisis. Our results provide a statistical foundation to the common view
of the HAR model being only a simple approximation, but not a deep true model, that
is consistent with a large number of stylized facts. Thus, they cast some doubt on the
appropriateness of the HAR as a global model for realized variance.
In contrast with the above (in-sample) discussion and consistent with the previous
literature on realized volatility, we find that the HAR model is very difficult to
beat in forecasting applications. In fact, the Lasso and the HAR model are almost
indistinguishable from an out-of-sample performance point of view. In particular, when
looking at an economically meaningful comparison using value-at-risk prediction, both models performed equally poorly with no noticeable differences in favor of either of the two.
The arguments above and the selection of only near-lags in the whole sample, and
even more pronouncedly around the time period preceding the financial crisis, lead us to
the hypothesis that in fact the realized variance dynamics are much better explained by
shorter horizon models or models allowing for regime shifts and structural breaks. Our
results are in line with the empirical evidence shown, for example, in Chen et al. (2010).
Although structural breaks seem to occur (almost) simultaneously across the assets,
differences in the lags selected for the stocks considered are clearly visible. Our intuition is
that those differences might be explained by specific characteristics of the stocks like, for
example, the industry sector they belong to, liquidity, or the size of the company. As an
example, the high order lags selected for Citigroup, a company belonging to the financial
industry sector, during the financial crisis seem to be a consequence of particular spikes
in volatility caused by events like the Bear Stearns acquisition by J.P. Morgan in March
2008 or the bankruptcy of Lehman Brothers in September 2008. In contrast, these events
were not so relevant for other companies in our sample belonging to other market sectors
like Intel or Pfizer. This intuition is, however, only based on preliminary investigations
and, at this point, rather speculative. To investigate further interpretations and causes of
the selected lags behavior, we plan to enlarge the dataset cross-sectionally adding several
additional stocks to the analysis. This is left for future research.

APPENDIX A: PROOF OF THEOREM 1

This proof is structured as follows. We first show in Lemma 1 that the irrepresentable condition is satisfied for the HAR model. Based on this, we invoke a theorem of Zhao and Yu (2006) which relaxes the assumptions on the innovation term for the Lasso to be model consistent. Finally, we show that the HAR model satisfies the assumptions of the aforementioned theorem, and we can thus expect the Lasso to be model selection consistent without the assumption of Gaussianity for the error term.

Lemma 1. Under the assumption that a causal HAR model with non-negative
coefficients is the true DGP, conditions (i) and (ii) of Theorem 1 are satisfied.

Lemma 1 states that if the true DGP indeed obeys the law of motion as specified by
the HAR model one can apply the results of Nardi and Rinaldo (2011) who establish that
the Lasso is a valid model selection device for Gaussian AR processes. When embedding
the HAR model in the AR specification, we have that $S$ consists of the lagged values up
to order 22 and $S^c$ is any other lagged value beyond 22. Since (i) holds trivially, as by the
definition of the HAR model none of the variables is a linear combination of another, we
only collect the proof of (ii) below.

Proof. The proof is split into two parts. First we show that the infinity norm of
$\Sigma_{S^c S}\Sigma_{SS}^{-1}$ can be seen as the sum of the absolute values of the regression
coefficients of the usual HAR estimates. Second, we show that it is sufficient to consider
one specific nonactive regressor.
Moreover, consider the following equivalent notations:

$$\mathrm{Cov}(S^c, S)\,\mathrm{Var}(S)^{-1} = \mathrm{Cov}(S^c, S)\,\mathrm{Cov}(S, S)^{-1} = \Sigma_{S^c S}\Sigma_{SS}^{-1}.$$

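To rule out any possible confusion, we restate the definition of the infinity norm of
a matrix. If $\|\alpha\|_\infty$ for $\alpha \in \mathbb{R}^n$ is defined as
$\|\alpha\|_\infty = \max_{1\le i\le n}|\alpha_i|$, then the corresponding matrix norm is given as

$$\|A\|_\infty := \max_{\|\alpha\|_\infty = 1} \|A\alpha\|_\infty,$$

where it can be shown (Lewis, 1991, Proposition 3.4.1) for $A = [a_{ij}]_{1\le i\le n,\,1\le j\le m}$ that

$$\|A\|_\infty = \max_{1\le i\le n} \sum_{j=1}^{m} |a_{ij}|.$$

In what follows we consider a row vector $\alpha = [\alpha_1, \ldots, \alpha_n]$ as a $1 \times n$ matrix such that
$\|\alpha\|_\infty = \sum_{i=1}^{n}|\alpha_i| = \|\alpha\|_1$.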
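The two facts used above (the matrix infinity norm is the maximum absolute row sum, and for a row vector treated as a 1 × n matrix it reduces to the sum of absolute entries) can be checked numerically. The following is a minimal sketch in Python with NumPy; the matrix `A`, the vector `row`, and the helper `matrix_inf_norm` are illustrative assumptions and do not appear in the paper.

```python
import numpy as np

def matrix_inf_norm(a):
    """Maximum absolute row sum; a 1-D input is treated as a 1 x n matrix."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    return np.abs(a).sum(axis=1).max()

# Arbitrary illustrative inputs (not taken from the paper).
A = np.array([[1.0, -2.0, 0.5],
              [0.3, 0.1, -0.2]])
row = np.array([0.4, -0.3, 0.2])

# Agreement with NumPy's built-in induced infinity norm.
print(matrix_inf_norm(A), np.linalg.norm(A, np.inf))
# For a row vector the norm is the sum of absolute entries.
print(matrix_inf_norm(row), np.abs(row).sum())
```

This is why the proof can speak of the "sum of the absolute values of the regression coefficients" whenever the nonactive set contains a single lag.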
Throughout the proof, we assume without loss of generality the HAR model to contain
no intercept. Moreover, for the sake of notational simplicity, we assume the AR process
to be labeled as

$$x_t = \sum_{i=1}^{22} \phi_i x_{t-i} + \varepsilon_t. \tag{15}$$

Assume that $|S^c| = 1$ with $S^c = \{x_{t-23}\}$ (see footnote 11) and that the true model is in fact the HAR
model, i.e., $|S| = 22$ with $S = \{x_{t-1}, x_{t-2}, \ldots, x_{t-22}\}$. In other words, the active set consists
of the first 22 lagged values, and the first nonactive predictor is $x_{t-23}$. We then find that

$$\mathrm{Cov}(x_{t-23}, [x_{t-1}, x_{t-2}, \ldots, x_{t-22}])\,\mathrm{Var}([x_{t-1}, x_{t-2}, \ldots, x_{t-22}])^{-1} = [\tilde\phi_1, \ldots, \tilde\phi_{22}], \tag{16}$$

where $[\tilde\phi_1, \ldots, \tilde\phi_{22}]$ is the usual representation of regression coefficients of $x_{t-23}$ on
$x_{t-1}, x_{t-2}, \ldots, x_{t-22}$ (note that the previously introduced superscript “HAR” is omitted to
alleviate notation).
Since we are only interested in the sum of the absolute values of these regression
coefficients, i.e., $\|[\tilde\phi_1, \ldots, \tilde\phi_{22}]\|_\infty$, we may as well reorder the regressors given that

$$\|[\tilde\phi_1, \ldots, \tilde\phi_{22}]\|_\infty = \|[\tilde\phi_{\pi(1)}, \ldots, \tilde\phi_{\pi(22)}]\|_\infty \tag{17}$$

is true for any permutation $\pi$. With $\pi(i) = 22 - i + 1$, we find that

$$\|[\tilde\phi_{\pi(1)}, \ldots, \tilde\phi_{\pi(22)}]\|_\infty = \left\|\mathrm{Cov}(x_{t-23}, [x_{t-22}, x_{t-21}, \ldots, x_{t-1}])\,\mathrm{Var}([x_{t-22}, x_{t-21}, \ldots, x_{t-1}])^{-1}\right\|_\infty.$$

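Exploiting covariance stationarity and, thus, the fact that the autocovariance is an even
function, we can show that

$$\begin{aligned}
\mathrm{Cov}(x_{t-23}, [x_{t-22}, x_{t-21}, \ldots, x_{t-1}]) &= \left[\mathrm{Cov}(x_{t-23}, x_{t-(23-i)})\right]_{1\le i\le 22} \\
&= \left[\mathrm{Cov}(x_t, x_{t-i})\right]_{1\le i\le 22} \\
&= \mathrm{Cov}(x_t, [x_{t-1}, x_{t-2}, \ldots, x_{t-22}])
\end{aligned}$$

and

$$\mathrm{Var}([x_{t-22}, x_{t-21}, \ldots, x_{t-1}]) = \mathrm{Var}([x_{t-1}, x_{t-2}, \ldots, x_{t-22}]),$$

such that

$$\begin{aligned}
\left[\tilde\phi_{\pi(1)}, \tilde\phi_{\pi(2)}, \ldots, \tilde\phi_{\pi(22)}\right] &= \mathrm{Cov}(x_{t-23}, [x_{t-22}, x_{t-21}, \ldots, x_{t-1}])\,\mathrm{Var}([x_{t-22}, x_{t-21}, \ldots, x_{t-1}])^{-1} \\
&= \mathrm{Cov}(x_t, [x_{t-1}, x_{t-2}, \ldots, x_{t-22}])\,\mathrm{Var}([x_{t-1}, x_{t-2}, \ldots, x_{t-22}])^{-1} \\
&= \left[\phi_1, \phi_2, \ldots, \phi_{22}\right].
\end{aligned} \tag{18}$$

Combining (17) and (18) shows that the infinity norm of (16) is indeed simply the sum of the absolute values
of the coefficients of (15), i.e., we conclude for $S^c = \{x_{t-23}\}$ that we have

$$\left\|\Sigma_{S^c S}\Sigma_{SS}^{-1}\right\|_\infty = \beta^{(d)} + \beta^{(w)} + \beta^{(m)}. \tag{19}$$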

11 Observe that we slightly deviate from the notation used previously, where $S$ denoted a set of indices; here we use $S$ and $S^c$ to
denote the corresponding lag variables rather than their indices.
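The single-lag result in (19) can be illustrated numerically. The sketch below is only a check under stated assumptions, not the authors' code: it assumes the usual HAR aggregation (daily lag, 5-day average, 22-day average), uses arbitrary illustrative coefficients 0.4, 0.3, 0.2, and obtains the population autocorrelations of the implied AR(22) process from the Yule–Walker equations. All function names are hypothetical.

```python
import numpy as np

def har_to_ar(beta_d, beta_w, beta_m):
    """AR(22) coefficients implied by HAR weights (assumed 1/5/22-day averages)."""
    phi = np.full(22, beta_m / 22.0)
    phi[:5] += beta_w / 5.0
    phi[0] += beta_d
    return phi

def autocorrelations(phi, max_lag):
    """Autocorrelations rho(0..max_lag) of a causal AR(p); requires max_lag >= p."""
    p = len(phi)
    A = np.zeros((p, p))
    b = np.zeros(p)
    # Yule-Walker: rho(h) = sum_i phi_i * rho(|h - i|) for h = 1..p.
    for h in range(1, p + 1):
        for i in range(1, p + 1):
            k = abs(h - i)
            if k == 0:
                b[h - 1] += phi[i - 1]       # rho(0) = 1 moves to the right-hand side
            else:
                A[h - 1, k - 1] += phi[i - 1]
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    rho[1:p + 1] = np.linalg.solve(np.eye(p) - A, b)
    # Extend beyond lag p with the AR recursion.
    for h in range(p + 1, max_lag + 1):
        rho[h] = phi @ rho[h - 1:h - p - 1:-1]
    return rho

def irrepresentable_norm(phi, k=1):
    """Infinity norm of Sigma_{S^c S} Sigma_{SS}^{-1} with S = lags 1..p, S^c = lags p+1..p+k."""
    p = len(phi)
    rho = autocorrelations(phi, p + k)
    lags_s = np.arange(1, p + 1)
    lags_sc = np.arange(p + 1, p + k + 1)
    sigma_ss = rho[np.abs(np.subtract.outer(lags_s, lags_s))]
    sigma_scs = rho[np.abs(np.subtract.outer(lags_sc, lags_s))]
    coef = np.linalg.solve(sigma_ss, sigma_scs.T).T   # rows: regression of each nonactive lag on S
    return np.abs(coef).sum(axis=1).max()

phi = har_to_ar(0.4, 0.3, 0.2)   # illustrative values, sum of HAR coefficients = 0.9
print(irrepresentable_norm(phi, k=1), phi.sum())
```

Because the norm is invariant to the scale of the process, autocorrelations can be used in place of autocovariances. Under these assumptions the two printed numbers should agree up to floating-point error.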



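When extending the set of nonactive predictors to $S^c = \{x_{t-(22+i)}\}_{1\le i\le k}$, one can verify that

$$\mathrm{Cov}([x_{t-(22+1)}, \ldots, x_{t-(22+k)}], [x_{t-1}, x_{t-2}, \ldots, x_{t-22}])\,\mathrm{Var}([x_{t-1}, x_{t-2}, \ldots, x_{t-22}])^{-1}
= \begin{bmatrix}
\tilde\phi_1^{(1)} & \tilde\phi_2^{(1)} & \cdots & \tilde\phi_{22}^{(1)} \\
\vdots & \vdots & & \vdots \\
\tilde\phi_1^{(k)} & \tilde\phi_2^{(k)} & \cdots & \tilde\phi_{22}^{(k)}
\end{bmatrix}. \tag{20}$$

Hence,

$$\left\|\mathrm{Cov}(S^c, S)\,\mathrm{Var}(S)^{-1}\right\|_\infty = \max_{1\le j\le k} \sum_{i=1}^{22} \left|\tilde\phi_i^{(j)}\right|.$$

In a next step we show that $\sum_{i=1}^{22} |\tilde\phi_i^{(l)}| < \sum_{i=1}^{22} |\tilde\phi_i^{(k)}|$ for $l > k$ by induction. The
conclusion then follows since it holds for $k = 1$, i.e., for $S^c = \{x_{t-23}\}$, which has already been
proven in (19).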
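Given that reversing the order has no effect on the sum of the coefficients, that is, using
the permutation $\pi(i) = 22 - i + 1$, $i = 1, \ldots, 22$,

$$\left\|[\tilde\phi_1^{(j)}, \ldots, \tilde\phi_{22}^{(j)}]\right\|_\infty = \left\|[\tilde\phi_{\pi(1)}^{(j)}, \ldots, \tilde\phi_{\pi(22)}^{(j)}]\right\|_\infty, \quad j = 1, \ldots, k,$$

we proceed with the proof in the usual AR(22) representation. Simplifying the notation,
similarly to Eq. (15) we define

$$x_{t+j} = \sum_{i=1}^{22} \phi_i^{(j)} x_{t+1-i} + \varepsilon_{t+j},$$

where $\phi_i^{(j)} = \tilde\phi_{\pi(i)}^{(j)}$ for $i = 1, \ldots, 22$ and $j = 1, \ldots, k$.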
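Now, consider the induction basis for $j = 1 \to 2$ as follows:

$$\begin{aligned}
x_{t+1} &= \sum_{i=1}^{22} \phi_i^{(1)} x_{t+1-i} + \varepsilon_{t+1} \\
&= \phi_1^{(1)} \left( \sum_{i=1}^{22} \phi_i^{(1)} x_{t-i} + \varepsilon_t \right) + \sum_{i=2}^{22} \phi_i^{(1)} x_{t+1-i} + \varepsilon_{t+1} \\
&= \sum_{i=1}^{21} \left( \phi_1^{(1)} \phi_i^{(1)} + \phi_{i+1}^{(1)} \right) x_{t-i} + \phi_1^{(1)} \phi_{22}^{(1)} x_{t-22} + \tilde\varepsilon_{t+1} \\
&= \sum_{i=1}^{22} \phi_i^{(2)} x_{t-i} + \tilde\varepsilon_{t+1},
\end{aligned}$$

where $\tilde\varepsilon_{t+1} = \phi_1^{(1)} \varepsilon_t + \varepsilon_{t+1}$ and

$$\phi_i^{(2)} = \phi_1^{(1)} \phi_i^{(1)} + \phi_{i+1}^{(1)} \ \text{ for } i = 1, \ldots, 21 \quad \text{and} \quad \phi_{22}^{(2)} = \phi_1^{(1)} \phi_{22}^{(1)}. \tag{21}$$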

By the assumptions put forward in (1), we have that $\phi_i^{(1)} > 0\ \forall i = 1, \ldots, 22$, and taking
the difference of the sum of absolute values thus yields

$$\sum_{i=1}^{22} \left|\phi_i^{(2)}\right| - \sum_{i=1}^{22} \left|\phi_i^{(1)}\right| = \phi_1^{(1)} \left( \sum_{i=1}^{22} \phi_i^{(1)} - 1 \right) = \phi_1^{(1)} \left( \beta^{(d)} + \beta^{(w)} + \beta^{(m)} - 1 \right).$$

We therefore proved that $\phi_i^{(2)} > 0\ \forall i = 1, \ldots, 22$ and, using the causality assumption (see footnote 12),
that $\sum_{i=1}^{22} \phi_i^{(2)} < \sum_{i=1}^{22} \phi_i^{(1)}$.
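Reapplying the same argument for the induction step $j \to j + 1$ yields

$$\begin{aligned}
x_{t+j} &= \sum_{i=1}^{22} \phi_i^{(j)} x_{t+1-i} + \varepsilon_{t+j} \\
&= \phi_1^{(j)} \left( \sum_{i=1}^{22} \phi_i^{(1)} x_{t-i} + \varepsilon_t \right) + \sum_{i=2}^{22} \phi_i^{(j)} x_{t+1-i} + \varepsilon_{t+j} \\
&= \sum_{i=1}^{21} \left( \phi_1^{(j)} \phi_i^{(1)} + \phi_{i+1}^{(j)} \right) x_{t-i} + \phi_{22}^{(1)} \phi_1^{(j)} x_{t-22} + \tilde\varepsilon_{t+j} \\
&= \sum_{i=1}^{22} \phi_i^{(j+1)} x_{t-i} + \tilde\varepsilon_{t+j},
\end{aligned}$$

where again $\tilde\varepsilon_{t+j} = \phi_1^{(j)} \varepsilon_t + \varepsilon_{t+j}$ and
$\phi_i^{(j+1)} = \phi_1^{(j)} \phi_i^{(1)} + \phi_{i+1}^{(j)}$ for $i = 1, \ldots, 21$ and
$\phi_{22}^{(j+1)} = \phi_{22}^{(1)} \phi_1^{(j)}$.
Taking the difference between the sum of $\phi_i^{(j+1)}$ and the sum of $\phi_i^{(j)}$ yields

$$\sum_{i=1}^{22} \phi_i^{(j+1)} - \sum_{i=1}^{22} \phi_i^{(j)} = \left( \sum_{i=1}^{22} \phi_i^{(1)} - 1 \right) \phi_1^{(j)}.$$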
i=1 i=1 i=1

By the induction hypothesis we have $\phi_i^{(j)} > 0\ \forall i = 1, \ldots, 22$ such that $\phi_i^{(j+1)} > 0\ \forall i = 1, \ldots, 22$
and thus

$$\sum_{i=1}^{22} \left|\phi_i^{(j+1)}\right| - \sum_{i=1}^{22} \left|\phi_i^{(j)}\right| < 0$$

such that the claim

$$\sum_{i=1}^{22} \left|\phi_i^{(j+1)}\right| < \sum_{i=1}^{22} \left|\phi_i^{(j)}\right|$$

follows. Summarizing, we conclude that for the HAR model it holds that $\|\Sigma_{S^c S}\Sigma_{SS}^{-1}\|_\infty \le 1 - \eta$
if $\beta^{(d)} + \beta^{(w)} + \beta^{(m)} \le 1 - \eta$. □

12 Since all roots lie outside the unit circle and $P(z)$, the characteristic polynomial, is continuous on $\mathbb{R}$, it
follows that $P(1) > 0$ and thus that $\beta^{(d)} + \beta^{(w)} + \beta^{(m)} < 1$.
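The induction argument can be traced numerically by iterating the coefficient recursion of (21) and its general-step analogue. The following is a hedged sketch rather than the authors' implementation: the HAR coefficients 0.4, 0.3, 0.2 are arbitrary illustrative values, the 1/5/22-day aggregation is assumed, and the helper names are hypothetical.

```python
import numpy as np

def har_to_ar(beta_d, beta_w, beta_m):
    """AR(22) coefficients implied by HAR weights (assumed 1/5/22-day averages)."""
    phi = np.full(22, beta_m / 22.0)
    phi[:5] += beta_w / 5.0
    phi[0] += beta_d
    return phi

def next_horizon(phi1, phij):
    """Coefficients for step j+1 from step j:
    phi_i^{(j+1)} = phi_1^{(j)} * phi_i^{(1)} + phi_{i+1}^{(j)} for i < 22,
    phi_22^{(j+1)} = phi_1^{(j)} * phi_22^{(1)}."""
    nxt = phij[0] * phi1      # new array holding phi_1^{(j)} * phi_i^{(1)}
    nxt[:-1] += phij[1:]      # add the shifted coefficients for i = 1..21
    return nxt

phi1 = har_to_ar(0.4, 0.3, 0.2)
phij = phi1.copy()
sums, firsts = [], []
for j in range(1, 11):
    sums.append(np.abs(phij).sum())   # sum of absolute coefficients at step j
    firsts.append(phij[0])            # phi_1^{(j)}, used in the difference formula
    phij = next_horizon(phi1, phij)

print(np.round(sums, 4))
```

With positive coefficients summing to less than one, the printed sequence should be strictly decreasing, so the maximum over nonactive lags is attained at the first one, which is the case covered by (19).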

Having proven Lemma 1, we look at a theorem provided by Zhao and Yu (2006) that
shows that the Lasso is model selection consistent under some assumptions. Later we will
prove that these assumptions hold if the HAR model is assumed to be true, and we can
thus safely relax the assumption of normally distributed errors if we are willing to accept
a fixed $S$ and $S^c$ (in contrast with Nardi and Rinaldo's (2011) result, where $p$ is allowed to
grow as the sample size increases).

Theorem A (Zhao and Yu, 2006). Under the assumptions of $S$ and $S^c$ fixed and the
following statements, the Lasso is model selection consistent in the sense of Definition 1
if the innovation term has finite second moment and $\lambda_n$ is chosen such that $\lambda_n / n \to 0$
and $\lambda_n / n^{\frac{1+c}{2}} \to \infty$ with $0 \le c < 1$ (a.s. denoting almost sure convergence):

(A1) $\left|\Sigma_{S^c S}\Sigma_{SS}^{-1}\,\mathrm{sgn}(\mathrm{supp}\,\phi^0)\right| < \mathbf{1}$ a.s., where $\mathbf{1}$ is a vector of ones and the inequality is
understood componentwise.
(A2) $\Sigma^{n}_{(S,S^c),(S,S^c)} \xrightarrow{a.s.} \Sigma_{(S,S^c),(S,S^c)}$, where $\Sigma_{(S,S^c)}$ is the autocovariance matrix and $\Sigma^{n}_{(S,S^c)}$ its
sample analog.
(A3) $\frac{1}{n} \max_{0 \le i \le n-p} \sum_{j=1}^{p} x_{t-i-j}^2 \xrightarrow{a.s.} 0$.
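As a worked illustration of the rate condition on the penalty sequence (written $\lambda_n$ here, which is an assumed symbol), a polynomial choice shows that admissible sequences exist for every $0 \le c < 1$. The exponent $a$ below is a hypothetical tuning constant, not something specified in the paper.

```latex
% Assume \lambda_n = n^{a} with \tfrac{1+c}{2} < a < 1. Then
\frac{\lambda_n}{n} = n^{a-1} \to 0,
\qquad
\frac{\lambda_n}{n^{\frac{1+c}{2}}} = n^{\,a-\frac{1+c}{2}} \to \infty .
% Example: c = 0 and a = 3/4 give \lambda_n/n = n^{-1/4} \to 0
% and \lambda_n/\sqrt{n} = n^{1/4} \to \infty.
```

The interval for $a$ is nonempty because $\frac{1+c}{2} < 1$ whenever $c < 1$.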

Proof of Theorem 1. We prove that the assumptions of Theorem A above are satisfied
if one assumes the dynamics of the HAR model as put forward in (1) to hold as well as
the existence of a finite fourth moment of the innovation term.
(A1) The condition $\left|\Sigma_{S^c S}\Sigma_{SS}^{-1}\,\mathrm{sgn}(\mathrm{supp}\,\phi^0)\right| < \mathbf{1}$ a.s. in (A1) of Theorem A holds since the argument
in the proof of Lemma 1 can be made in terms of sample moments. Knowing that
the least squares estimates converge a.s. to the true values (Brockwell and Davis, 1986,
Theorem 10.8.1), the conclusion follows since $\left|\Sigma_{S^c S}\Sigma_{SS}^{-1}\,\mathrm{sgn}(\mathrm{supp}\,\phi^0)\right| < \mathbf{1}$ a.s. is weaker than
$\|\Sigma_{S^c S}\Sigma_{SS}^{-1}\|_\infty \le 1 - \eta$, as all components of $\mathrm{supp}\,\phi^0$ are greater than zero by (3).
(A2) Under the assumption of a finite fourth moment of the innovations, we
have by a result of Hong-Zhi et al. (1982) the convergence almost surely. The positive
definiteness follows from the fact that $\Sigma_{(S,S^c)}$ is only positive semidefinite, rather than positive
definite, iff a variable is a linear combination of the others, which is ruled out by the
assumption of the HAR model as given in (3) (see footnote 13).

13
It is semidefinite since it is a covariance matrix.

(A3) For Assumption (A3) to be satisfied, we argue that $\max_{0 \le i \le n-p} \sum_{j=1}^{p} x_{t-i-j}^2$
increases at a rate slower than $n$ when $n \to \infty$.

The condition on the innovation follows from Hölder’s inequality since we have that
$L^4 \subset L^2$ such that it suffices to require a finite fourth moment of the error term. □

Summarizing, we have that the Lasso should detect the HAR model if we assume
a finite fourth moment of the innovations distribution. We can therefore relax the
Gaussianity assumption.

FIGURE 9 Coefficient of determination along power transform: $R^2$ for different values of $\lambda_{BC}$ for the HAR
model estimated on $RV_t^{(\lambda_{BC})}$ on the whole sample as described in Section 3.1. The green line indicates the
maximal $R^2$ and the dotted lines indicate common transformations for realized volatilities ($\log RV_t$ with
$\lambda_{BC} = 0$, $\sqrt{RV_t}$ with $\lambda_{BC} = 1/2$, and $RV_t$ with $\lambda_{BC} = 1$).

APPENDIX B: LOG-TRANSFORMED VOLATILITIES

Although it is common to use the log-transform to model realized variance for reasons
of positiveness, lower skewness, and lower kurtosis, the case of the HAR model even
allows for additional arguments to justify the use of log-transformed realized volatilities.
These are not solely related to the realized volatility series as such (as for instance in
Martens et al., 2009, Table 2) but also to how realized volatility is modeled. Extending
the approach of Box and Cox (1964) where only the dependent variable is transformed,
we employ the Box–Cox transform

$$f_{\lambda_{BC}}(x) = x^{(\lambda_{BC})} =
\begin{cases}
\dfrac{x^{\lambda_{BC}} - 1}{\lambda_{BC}} & \text{if } \lambda_{BC} \neq 0 \\[1ex]
\log(x) & \text{otherwise}
\end{cases}$$

to series of realized volatility. Consequently, the Box–Cox transform not only affects the
dependent variable but also predictor variables in the HAR model. As in the original
work of Box and Cox, we then compute the (quasi-)likelihood for each $\lambda_{BC}$. Since the
(quasi-)likelihood is equivalent to the $R^2$, we report the $R^2$ for different values of $\lambda_{BC}$ in
Fig. 9.
Clearly, following again Box and Cox and choosing a rational $\lambda_{BC}$, it follows that
$\lambda_{BC} = 0$ is a sensible choice and thus justifies the use of log-transformed volatilities. A
further argument for using $\lambda_{BC} = 0$ may be found in the fact that for the case of $\lambda_{BC} = 0$
we can construct unbiased estimates (under the assumption of normality of the log-
transformed realized volatilities) explicitly without resorting to the median (Pankratz and
Dudley, 1987; Proietti and Lütkepohl, 2013).
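The procedure described in this appendix (transform the whole series, build the HAR regressors from the transformed series, and record the $R^2$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the HAR regressors are assumed to be the lagged daily value and the lagged 5-day and 22-day averages, the series `rv` is simulated purely for demonstration, and the function names are hypothetical.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (x ** lam - 1.0) / lam

def har_r2(rv, lam):
    """R^2 of an OLS HAR regression fitted to the Box-Cox-transformed series."""
    y = box_cox(rv, lam)
    n = len(y)
    target = y[22:]                                           # y_t for t = 22..n-1
    daily = y[21:n - 1]                                       # y_{t-1}
    weekly = np.array([y[t - 5:t].mean() for t in range(22, n)])    # mean of y_{t-1..t-5}
    monthly = np.array([y[t - 22:t].mean() for t in range(22, n)])  # mean of y_{t-1..t-22}
    X = np.column_stack([np.ones(n - 22), daily, weekly, monthly])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return 1.0 - resid.var() / target.var()

# Simulated positive series for demonstration only (not real data).
rng = np.random.default_rng(0)
rv = np.exp(rng.normal(size=400))
for lam in (0.0, 0.5, 1.0):
    print(lam, har_r2(rv, lam))
```

On real realized-volatility data one would evaluate `har_r2` over a grid of transform parameters and inspect where the $R^2$ peaks, which is what Fig. 9 reports.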

ACKNOWLEDGMENTS

The authors are grateful to Marcelo Medeiros, Matthias Fengler, Tim Bollerslev,
the participants at the 20th International Conference on Computational Statistics
(COMPSTAT, 2012), the participants at the 7th International Conference on
Computational and Financial Econometrics (CFE, 2013), and seminar participants at the
University of St. Gallen for many helpful comments.

REFERENCES

Andersen, T. G., Bollerslev, T., Diebold, F. X., Ebens, H. (2001). The distribution of realized stock return
volatility. Journal of Financial Economics 61(1):43–76.
Andersen, T. G., Bollerslev, T., Diebold, F. X., Labys, P. (2001). The distribution of realized exchange rate
volatility. Journal of the American Statistical Association 96(453):42–55.
Andersen, T. G., Bollerslev, T., Meddahi, N. (2004). Analytical evaluation of volatility forecasts. International
Economic Review 45(4):1079–1110.
Areal, N. M. P. C., Taylor, S. J. (2002). The realized volatility of FTSE-100 futures prices. Journal of
Futures Markets 22(7):627–648.

Audrino, F., Camponovo, L. (2013). Oracle properties and finite sample inference of the adaptive Lasso for
time series regression models, Technical report, St. Gallen University. SEPS Discussion Paper Series.
Barndorff-Nielsen, O., Shephard, N. (2002). Econometric analysis of realized volatility and its use in
estimating stochastic volatility models. Journal of the Royal Statistical Society. Series B, (Statistical
Methodology) 64(2):253–280.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics
31(3):307–327.
Box, G., Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B,
(Statistical Methodology) 26(2):211–252.
Brockwell, P., Davis, R. (1986). Time Series: Theory and Methods. New York: Springer-Verlag Inc.
Brownlees, C., Engle, R., Kelly, B. (2012). A practical guide to volatility forecasting through calm and storm.
Journal of Risk 14(2):3–22.
Bühlmann, P., Rütimann, P., van de Geer, S., Zhang, C. (2012). Correlated variables in regression: Clustering
and sparse estimation, Technical report, ETH Zürich. arXiv:1209.5908.
Bühlmann, P., Van De Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and
Applications. New York: Springer-Verlag Inc.
Chen, Y., Härdle, W., Pigorsch, U. (2010). Localized realized volatility modeling. Journal of the American
Statistical Association 105(492):1376–1393.
Corsi, F. (2009). A simple approximate long-memory model of realized volatility. Journal of Financial
Econometrics 7(2):174–196.
Corsi, F., Audrino, F., Renò, R. (2012). HAR modeling for realized volatility forecasting. In: Bauwens,
L., Hafner, C., Laurent, S., eds. The Handbook of Volatility Models and Their Applications. Wiley
Handbooks in Financial Engineering and Econometrics Series, Hoboken, New Jersey: John Wiley & Sons,
Inc.
Craioveanu, M., Hillebrand, E. (2010). Why it is OK to use the HAR-RV (1,5,21) model, Technical report,
Louisiana State University.
Diebold, F. X., Mariano, R. S. (1995). Comparing predictive accuracy. Journal of Business and Economic
Statistics 13(3):253–263.
Friedman, J., Hastie, T., Tibshirani, R. (2010). Regularization paths for generalized linear models via
coordinate descent. Journal of Statistical Software 33(1):1–22.
Hansen, P., Lunde, A. (2005). A forecast comparison of volatility models: Does anything beat a
GARCH(1,1)? Journal of Applied Econometrics 20(7):873–889.
Hastie, T., Efron, B. (2011). Lars: Least Angle Regression, Lasso and Forward Stagewise. Manual to the
R-package ‘lars’, Stanford University.
Hintze, J., Nelson, R. (1998). Violin plots: A box plot-density trace synergism. American Statistician
52(2):181–184.
Hong-Zhi, A., Zhao-Guo, C., Hannan, E. J. (1982). Autocorrelation, autoregression and autoregressive
approximation. Annals of Statistics 10(3):926–936.
Hsu, N., Hung, H., Chang, Y. (2008). Subset selection for vector autoregressive processes using Lasso.
Computational Statistics and Data Analysis 52:3645–3657.
Kock, A. B. (2012). On the Oracle Property of the Adaptive LASSO in Stationary and Nonstationary
Autoregressions, Technical report, Aarhus University. CREATES Research Paper 2012-05.
Kock, A. B., Callot, L. A. (2012). Oracle Inequalities for High Dimensional Vector Autoregressions, Technical
report, Aarhus University. CREATES Research Paper 2012-16.
Kupiec, P. (1995). Techniques for verifying the accuracy of risk measurement models. Journal of Derivatives
3(2):73–84.
Leeb, H., Pötscher, B. (2005). Model selection and inference: Facts and fiction. Econometric Theory 21(1):
21–59.
Lewis, D. (1991). Matrix Theory. Singapore: World Scientific Publishing Co. Pte. Ltd.
Martens, M., van Dijk, D., de Pooter, M. (2009). Forecasting S&P 500 volatility: Long memory, level
shifts, leverage effects, day-of-the-week seasonality, and macroeconomic announcements. International
Journal of Forecasting 25(2):282–303.
McAleer, M., Medeiros, M. C. (2008). Realized Volatility: A Review. Econometric Reviews 27(1–3):10–45.

Medeiros, M., Mendes, E. (2012). Estimating High-Dimensional Time Series Models, Technical report,
Aarhus University. CREATES Research Paper 2012-37.
Mincer, J., Zarnowitz, V. (1969). The evaluation of economic forecasts. In: Mincer, J., ed., Economic Forecasts
and Expectations: Analysis of Forecasting Behavior and Performance. National Bureau of Economic
Research, chapter The Evaluation of Economic Forecasts, pp. 1–46.
Nardi, Y., Rinaldo, A. (2011). Autoregressive process modeling via the Lasso procedure. Journal of
Multivariate Analysis 102(3):528–549.
Pankratz, A., Dudley, U. (1987). Forecasts of Power-transformed Series. Journal of Forecasting 6(4):239–248.
Park, H., Sakaori, F. (2013). Lag weighted Lasso for time series model. Computational Statistics 28(2):
493–504.
Proietti, T., Lütkepohl, H. (2013). Does the Box–Cox transformation help in forecasting macroeconomic time
series? International Journal of Forecasting 29(1):88–99.
Sizova, N. (2011). Integrated variance forecasting: Model based vs. reduced form. Journal of Econometrics
162(2):294–311.
R Core Team (2012). R: A Language and Environment for Statistical Computing, R Foundation for Statistical
Computing, Vienna, Austria.
Thomakos, D. D., Wang, T. (2003). Realized volatility in the futures markets. Journal of Empirical Finance
10(3):321–353.
Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical
Society. Series B, (Statistical Methodology) 58(1):267–288.
Wang, C.-H., Bauwens, L., Hsiao, C. (2013). Forecasting a long memory process subject to structural breaks.
Journal of Econometrics 177:171–184.
Wang, H., Li, G., Tsai, C. (2007). Regression coefficient and autoregressive order shrinkage and selection via
the Lasso. Journal of the Royal Statistical Society. Series B, (Statistical Methodology) 69:63–78.
Zhang, L., Mykland, P., Aït-Sahalia, Y. (2005). A tale of two time scales. Journal of the American Statistical
Association 100(472):1394–1411.
Zhao, P., Yu, B. (2006). On model selection consistency of Lasso. Journal of Machine Learning Research
7(2):2541–2563.
Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association
101(476):1418–1429.
