
DOUBLE CROSS VALIDATION FOR THE NUMBER OF FACTORS IN

APPROXIMATE FACTOR MODELS

Xianli Zeng^a, Yingcun Xia^b and Linjun Zhang^c


arXiv:1907.01670v1 [stat.ME] 2 Jul 2019

Determining the number of factors is essential to factor analysis. In this paper, we propose an efficient cross-validation (CV) method to determine the number of factors in approximate factor models. The method applies CV twice, first along the direction of observations and then along that of variables, and is hence referred to hereafter as double cross-validation (DCV). Unlike most CV methods, which are prone to overfitting, DCV is statistically consistent in determining the number of factors when both the dimension of variables and the sample size are sufficiently large. Simulation studies show that DCV has outstanding performance in comparison to existing methods in selecting the number of factors, especially when the idiosyncratic error has heteroscedasticity, heavy tails, or relatively large variance.

Keywords: cross validation, eigen-decomposition, factor model, high dimensional data, model selection.

1. INTRODUCTION

Factor models are a special kind of latent-variable model widely used in economics and other research disciplines. Due to the ubiquitous dependence across high-dimensional economic variables, it is appealing to explain the variation of high-dimensional economic measurements by only a small number of latent common factors. Well-known examples include the arbitrage pricing theory (Ross, 1976), the rank of demand systems (Lewbel, 1991), multiple-factor models (Fama and French, 1993), components analysis in economic activities (Gregory and Head, 1999), and the diffusion index (Stock and Watson, 2002). Successful applications rely heavily on the ability to correctly specify the number of factors, which is usually unknown in practice. Therefore, determining the number of factors is an essential step in applying factor models.

a Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546; stazeng@nus.edu.sg
b Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546; staxyc@nus.edu.sg
c Department of Statistics, the Wharton School, University of Pennsylvania, Philadelphia, PA 19104; linjunz@wharton.upenn.edu


One approach to this end utilizes the eigen-structure of the data matrix; see, for example, Fujikoshi (1977), Schott (1994), Ye and Weiss (2003), Onatski (2010) and Luo and Li (2016), amongst others. However, these methods require strong assumptions on the separation of eigenvalues, and thus most of them are applicable only to fixed-dimensional data. More popular approaches are cross-validation (CV) and information criteria (IC). For factor models, several cross-validation methods have been proposed; see, for example, Wold (1978) and Eastment and Krzanowski (1982), amongst others. Bro et al. (2008) comprehensively reviewed these CV-based approaches and concluded that most were not statistically consistent. They instead recommended an alternative method based on the expectation-maximization (EM) algorithm, which turns out to be extremely computationally expensive. As an alternative, ICs are computationally more efficient and their consistency can be guaranteed. As far as we know, Cragg and Donald (1997) was the first paper to use ICs to determine the number of factors. Subsequently, other methods based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) (Stock and Watson, 1998; Forni et al., 2000; Bai and Ng, 2002; Li et al., 2017) were proposed for different settings of the sample size n and the dimension of variables p. When the variance of the idiosyncratic component is relatively small, these methods provide very efficient estimates of the number of factors (Bai and Ng, 2002). However, the IC-based approaches are usually not fully data-driven, since their penalties depend on predetermined tuning parameters. Even though several guidelines have been proposed to mitigate the effect of tuning parameters, the performance of these methods is not stable when the signal-to-noise ratio is relatively small (Onatski, 2010), i.e., when the variation of the idiosyncratic component is relatively large, or when the idiosyncratic component has a heavy-tailed distribution. Unfortunately, both cases are common in financial data.
In this paper, we propose to estimate the number of factors by a computationally efficient CV method. Note that conventional CV methods, e.g., those used in linear regression, only leave one or several observations out. In contrast, factor models involve a matrix, in which observations (rows) and variables (columns) play similar roles mathematically. Thus, we propose a double cross-validation (DCV) method that applies CV first to the observations and then to the variables. Theoretically, we show that the method is consistent for high-dimensional data and allows dependency between the idiosyncratic errors; the method is thus applicable to approximate factor models. Computationally, the second CV can be calculated easily from the linear regression errors using the full data (Shao, 1993), while the first CV can be done in a K-fold manner. Simulation studies show that the proposed approach performs satisfactorily even when the idiosyncratic error has relatively large variance, heteroscedasticity, or heavy tails, which are especially relevant for economic data.
The rest of this paper is organized as follows. Section 2 reviews the basic notation of factor analysis. Section 3 presents the DCV approach. Section 4 establishes the statistical consistency of the approach, with the proofs given in the Appendix. Sections 5 and 6 present a set of simulation studies and an empirical application to assess the finite-sample performance of DCV. Concluding remarks are given in Section 7.

2. FACTOR MODEL AND ITS FACTORS AND LOADINGS

For a p-dimensional random vector x = (X_1, X_2, ..., X_p)^>, the factor model assumes the following bilinear structure

    x = L^0 f^0 + e,

where L^0 ∈ R^{p×d_0} is the loading matrix and f^0 = (f_1, f_2, ..., f_{d_0})^> is the vector of d_0 common factors. The product L^0 f^0 is called the common component of the data, and e = (e_1, e_2, ..., e_p)^> the idiosyncratic component. Let Σ_e = E[ee^>], Σ_{f^0} = E[f^0 f^{0>}], and Σ_x = E[xx^>]. Under this model, the variation of the p-dimensional variable is mainly generated by a small number of common factors. In the statistical literature, the idiosyncratic component is usually assumed to be independent of the common component and to have a diagonal covariance matrix, in which case Σ_x = L^0 Σ_{f^0} L^{0>} + Σ_e. However, such an independence assumption and diagonal structure are usually not realistic in the financial or macroeconomic data to which factor analysis is often applied. In this paper, we consider the approximate factor model, in which the idiosyncratic components can be weakly dependent.
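To fix ideas, the classical decomposition Σ_x = L^0 Σ_{f^0} L^{0>} + Σ_e can be checked on simulated data. The sketch below is a minimal Python/NumPy illustration of the strict-factor case just described (independent components, diagonal Σ_e); all variable names are ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d0, n = 10, 2, 500_000
L0 = rng.standard_normal((p, d0))          # loading matrix L^0
sigma_e = np.linspace(0.5, 1.5, p)         # diagonal idiosyncratic sd's

f = rng.standard_normal((n, d0))           # factors with Sigma_f0 = I
e = rng.standard_normal((n, p)) * sigma_e  # independent errors, diagonal Sigma_e
x = f @ L0.T + e                           # n draws from the model

Sigma_x_hat = x.T @ x / n                  # sample second-moment matrix
Sigma_x = L0 @ L0.T + np.diag(sigma_e**2)  # L^0 Sigma_f0 L^0' + Sigma_e
print(np.max(np.abs(Sigma_x_hat - Sigma_x)))  # small Monte Carlo error
```

Under the approximate factor model studied in this paper, Σ_e would instead be allowed to be non-diagonal with weak off-diagonal dependence.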
Let X = (x_1, x_2, ..., x_n)^> be a random sample of n observations on the p variables. The factor model then has the matrix form

(1)    X = F^0 L^{0>} + E,

with F^0 = (f_1^0, f_2^0, ..., f_n^0)^> and E = (e_1, e_2, ..., e_n)^>. By the eigen-decomposition, X^> X has the expression

    X^> X = ΦΛΦ^>,
4 X. ZENG, Y. XIA AND L. ZHANG

where Λ is a p × p diagonal matrix whose diagonal entries are the eigenvalues of X^> X, and Φ = (φ_1, φ_2, ..., φ_p) ∈ R^{p×p} is the eigenvector matrix of X^> X, with Φ^> Φ = I_p. Then, with working number of factors d, the estimators of the factor loadings and factors are, respectively,

(2)    L̃^d = √p (φ_1, φ_2, ..., φ_d)  and  F̃^d = (1/p) X L̃^d.

Denote by Tr(A) the trace of a matrix A. It is easy to see that

    (F̃^d, L̃^d) = argmin_{F^d ∈ R^{n×d}, L^d ∈ R^{p×d}, L^{d>}L^d/p = I_d} Tr[(X − F^d L^{d>})(X − F^d L^{d>})^>].
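As a concrete illustration, the estimators in (2) take only a few lines of linear algebra. The Python/NumPy sketch below (ours, not the authors' Matlab code) extracts the top-d eigenvectors of X^>X and rescales them as in (2); by construction L̃^{d>}L̃^d/p = I_d, and F̃^d L̃^{d>} is the best rank-d approximation of X.

```python
import numpy as np

def factor_estimates(X, d):
    """Estimators (2): L_tilde = sqrt(p) * (top-d eigenvectors of X'X),
    F_tilde = X L_tilde / p."""
    n, p = X.shape
    # eigh returns eigenvalues in ascending order; flip to take the top d.
    _, eigvecs = np.linalg.eigh(X.T @ X)
    Phi_d = eigvecs[:, ::-1][:, :d]
    L_tilde = np.sqrt(p) * Phi_d          # satisfies L'L / p = I_d
    F_tilde = X @ L_tilde / p
    return F_tilde, L_tilde

# Toy check on data with d0 = 2 true factors and small noise.
rng = np.random.default_rng(0)
n, p, d0 = 200, 50, 2
F0 = rng.standard_normal((n, d0))
L0 = rng.standard_normal((p, d0))
X = F0 @ L0.T + 0.1 * rng.standard_normal((n, p))
F, L = factor_estimates(X, d0)
print(np.allclose(L.T @ L / p, np.eye(d0)))  # True: normalization holds
```

With small idiosyncratic noise, F @ L.T is close to the true common component F0 @ L0.T, even though F and L individually are only identified up to rotation.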

3. DOUBLE CROSS VALIDATION FOR NUMBER OF FACTORS

Our basic idea for specifying the number of factors d_0 is based on the prediction error in a cross-validated manner. Write the model element-wise as

(3)    x_{is} = f_i^{0>} l_s^0 + e_{is},    i = 1, ..., n, s = 1, 2, ..., p.

CV predicts x_{is} based on all the elements except x_{is} itself. Because neither f_i^0 nor l_s^0 in (3) is observable, we implement the prediction by two-stage fitting: leave-observation-out and leave-variable-out.
The first stage estimates l_s^0 from the data by K-fold CV. Divide the rows of X into K folds, 1 < K ≤ n, denoted M_1, ..., M_K. Let n_k = #M_k be the number of elements in M_k, and let X_{−M_k} be the sub-matrix of X with the rows in M_k removed. Apply the eigen-decomposition to X_{−M_k}^> X_{−M_k} to obtain the corresponding matrices F̃^{k,d} and L̃^{k,d} as in (2). Here, the superscript highlights that the estimator is obtained from the data with the rows in M_k removed, with working number of factors d. Similar to Bai and Ng (2002), we rescale the estimated factor loading matrix and let

    L̂^{k,d} = (1/(n − n_k)) (X_{−M_k}^> X_{−M_k}) L̃^{k,d} = (l̂_1^{k,d}, ..., l̂_p^{k,d}).

In the second stage, we replace l_s^0 in (3) by l̂_s^{k,d} and rewrite (3) as the regression model

(4)    x_{is} = f_i^{0>} l̂_s^{k,d} + e'_{is},    i ∈ M_k, s = 1, 2, ..., p,

where e'_{is} is the regression error that replaces e_{is} after the substitution. This time, l̂_s^{k,d}, s = 1, 2, ..., p, are known and treated as the 'regressors', while f_i^0 is treated as the 'regression coefficients' that need to
be estimated. By leaving x_{is} out, we estimate f_i^0 by

    f̂_{i,s}^{k,d} = argmin_f Σ_{t=1, t≠s}^p (x_{it} − f^> l̂_t^{k,d})^2,    i ∈ M_k.

Thus we can predict x_{is} by (f̂_{i,s}^{k,d})^> l̂_s^{k,d}. The average squared prediction error for x_i = (x_{i1}, ..., x_{ip})^> is

    V_i^{k,d} = (1/p) Σ_{s=1}^p [x_{is} − (f̂_{i,s}^{k,d})^> l̂_s^{k,d}]^2,    i ∈ M_k.

The calculation of V_i^{k,d} can be much simplified as shown in Shao (1993), i.e.,

    V_i^{k,d} = (1/p) Σ_{s=1}^p (1 − w_s^{k,d})^{−2} [x_{is} − (f̂_i^{k,d})^> l̂_s^{k,d}]^2,

where f̂_i^{k,d} = (L̂^{k,d>} L̂^{k,d})^{−1} L̂^{k,d>} x_i is the conventional least squares estimator for (4) and w_s^{k,d} is the s-th diagonal element of the projection matrix P_{L̂^{k,d}} = L̂^{k,d} (L̂^{k,d>} L̂^{k,d})^{−1} L̂^{k,d>}. Finally, consider the averaged prediction error over all the elements,

    DCV(d) = (1/n) Σ_{k=1}^K Σ_{i∈M_k} V_i^{k,d}.
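The simplification above is the standard leave-one-out identity for least squares: the deleted residual equals the full-sample residual inflated by (1 − w_s)^{−1}. It can be verified numerically with a small Python sketch (illustrative only, with generic random inputs standing in for l̂_s^{k,d} and a held-out row x_i):

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 30, 3
Lhat = rng.standard_normal((p, d))   # stands in for the loadings l_hat_s
x = rng.standard_normal(p)           # stands in for one held-out row x_i

# Full-sample least squares fit and the leverages w_s.
P = Lhat @ np.linalg.inv(Lhat.T @ Lhat) @ Lhat.T
w = np.diag(P)
resid = x - P @ x                    # full-sample residuals
V_shortcut = np.mean((1 - w) ** -2 * resid ** 2)

# Brute-force leave-one-variable-out prediction errors.
errs = []
for s in range(p):
    mask = np.arange(p) != s
    f_s, *_ = np.linalg.lstsq(Lhat[mask], x[mask], rcond=None)
    errs.append((x[s] - Lhat[s] @ f_s) ** 2)
V_loo = np.mean(errs)

print(np.isclose(V_shortcut, V_loo))  # True: the two agree
```

The shortcut replaces p separate regressions per row with a single one, which is what makes the second-stage CV essentially free.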

Let d_max be a fixed positive integer large enough that p > d_max > d_0. The DCV estimator for the number of factors is given by

(5)    d̂_DCV = argmin_{0 ≤ d ≤ d_max} DCV(d).
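Putting the two stages together, the whole procedure can be sketched as follows. This is our own simplified Python illustration (the authors provide Matlab code, linked in Section 5); the fold assignment, variable names, and the d = 0 convention of predicting every entry by zero are our choices.

```python
import numpy as np

def dcv(X, d_max, K=10, seed=0):
    """Sketch of the DCV estimator: K-fold CV over rows in the first stage,
    Shao's (1993) leverage shortcut for the second-stage leave-variable-out."""
    n, p = X.shape
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    scores = np.zeros(d_max + 1)
    scores[0] = np.sum(X ** 2) / p            # d = 0: predict by zero
    for rows in folds:
        keep = np.setdiff1d(np.arange(n), rows)
        Xk = X[keep]
        _, vecs = np.linalg.eigh(Xk.T @ Xk)   # eigenvectors, ascending order
        vecs = vecs[:, ::-1]
        for d in range(1, d_max + 1):
            L_tilde = np.sqrt(p) * vecs[:, :d]
            L_hat = (Xk.T @ Xk) @ L_tilde / len(keep)       # rescaled loadings
            G = np.linalg.inv(L_hat.T @ L_hat)
            w = np.einsum('sd,de,se->s', L_hat, G, L_hat)   # leverages w_s
            F_hat = X[rows] @ L_hat @ G                     # LS fits, held-out rows
            resid = X[rows] - F_hat @ L_hat.T
            scores[d] += np.sum((1 - w) ** -2 * resid ** 2) / p
    return int(np.argmin(scores / n))         # argmin of DCV(d)

# Toy run: easy data with d0 = 3 factors.
rng = np.random.default_rng(2)
n, p, d0 = 200, 60, 3
X = rng.standard_normal((n, d0)) @ rng.standard_normal((d0, p)) \
    + 0.3 * rng.standard_normal((n, p))
print(dcv(X, d_max=8))
```

On well-separated data like this toy example, the minimizer recovers d0; the theory in Section 4 makes this precise for large n and p.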

The above approach is similar to the vanilla row-wise CV (Bro et al., 2008), which also predicts x_i in two steps. Its first step is exactly the same as ours. In the second step, however, row-wise CV uses the full data instead of cross-validation, which leads to overfitting and inconsistency. To fix this problem, Eastment and Krzanowski (1982) suggested element-wise cross-validation. Although their method solves the overfitting problem, it is computationally very costly.
Our DCV offers a simpler solution. By using CV both in the first stage of leaving "observations" out and in the second regression stage of leaving "variables" out, our method guarantees that the prediction of each element does not use any information from that element itself, and consistency is thus ensured, as shown in the next section. In addition, the implementation of K-fold CV can significantly
reduce the computational complexity, especially when the number of rows is large. Interestingly, the K-fold CV in the first stage also helps simplify the calculation in the second stage. Because of this appealing property, we can further reduce the computation by transposing X when n < p.

4. ASYMPTOTIC CONSISTENCY OF THE ESTIMATION

In this section, we investigate the consistency of the proposed estimator. We start with the following assumptions.

Assumption 1 The eigenvalues of F^{0>} F^0 / n are bounded away from zero and infinity, and E‖f_i^0‖^4 < ∞ for i = 1, ..., n.

Assumption 2 The eigenvalues of L^{0>} L^0 / p are bounded away from zero and infinity, and E‖l_i^0‖^4 < ∞ for i = 1, ..., p.

Assumption 3 There exists a constant M < ∞ such that

1. E e_{it} = 0 and E e_{it}^4 ≤ M for 1 ≤ i ≤ n and 1 ≤ t ≤ p;
2. γ_p(i, j) = E(e_i^> e_j / p) ≤ M and (1/n) Σ_{j=1}^n γ_p^2(i, j) ≤ M for all i;
3. E(e_{is} e_{it}) = τ_{st,i} with |τ_{st,i}| ≤ |τ_{st}| for some τ_{st} and for all i; in addition, (1/p) Σ_{s=1}^p Σ_{t=1}^p |τ_{st}| ≤ M;
4. E(e_{is} e_{jt}) = τ_{ij,st} and (1/(np)) Σ_{i=1}^n Σ_{j=1}^n Σ_{s=1}^p Σ_{t=1}^p |τ_{ij,st}| ≤ M;
5. (1/n) E|Σ_{j=1}^n [e_{jt} e_{js} − E(e_{jt} e_{js})]|^2 ≤ M for all (s, t);
6. E‖(1/√n) Σ_{i=1}^n f_i^0 e_{is}‖^2 ≤ M for 1 ≤ s ≤ p;
7. E‖(1/√p) Σ_{s=1}^p l_s^0 e_{is}‖^2 ≤ M for 1 ≤ i ≤ n.

Assumptions 1–3 are commonly used in the analysis of factor models, for example, in Bai and Ng (2002) and Li et al. (2017). Assumptions 1 and 2 together ensure that each factor plays a nontrivial role in contributing to the variation of X. Unlike the strict factor model, which assumes that all entries of E are i.i.d., Assumption 3 allows weak dependency among the elements of E, making our method applicable to approximate factor models. These assumptions also imply that all the eigenvalues of the covariance matrix of the common components dominate the eigenvalues
corresponding to the idiosyncratic components, which is crucial for the identifiability of approximate factor models.

Assumption 4 sup_k n_k = o(n^{1/3}) and K sup_k n_k / n < ∞, where n_k = #M_k is the number of elements in M_k.

Assumption 4 is a weak condition on the K-fold CV in the first stage. The number of elements in each fold can either be fixed or tend to infinity. In particular, this assumption covers the leave-one-out CV.

Assumption 5.a The idiosyncratic components satisfy

1. E[e_{it}^2 | F_{−M_k}^0 L^{0>}] = σ^2 for i ∈ M_k, where F_{−M_k}^0 is the submatrix of F^0 with the rows in M_k removed;
2. λ_1(E^> E / n) < 2σ^2.

Assumption 5.b Let p/n → ρ ∈ (0, ∞). There exists a p × p positive definite matrix Σ_p such that e_i = Σ_p^{1/2} y_i, where y_i = (y_{i1}, y_{i2}, ..., y_{ip})^> and

1. E y_{it} = 0, E y_{it}^4 < ∞, and the y_{it} are i.i.d. for 1 ≤ i ≤ n, 1 ≤ t ≤ p;
2. y_i is independent of F_{−M_k}^0;
3. the spectral distribution of Σ_p is convergent, i.e., there exists a distribution function H such that (1/p) Σ_{s=1}^p 1(λ_s(Σ_p) ≤ t) → H(t) as p → ∞;
4. 2 diag(Σ_p) − Σ_p is a positive definite matrix, where diag(Σ_p) is the diagonal matrix with the same diagonal elements as Σ_p.

Assumptions 5.a and 5.b are two technical assumptions, either of which ensures the validity of the proposed method. They also allow weak dependence among the idiosyncratic components as well as among the observations, generalizing the i.i.d. assumption required by the strict factor model. In comparison, Bai and Ng (2002) require weaker assumptions on the dependence than ours, possibly because of the relatively large penalty in their method. Of course, this large penalty comes at the cost of selecting too few factors, as shown in our simulation study.

Theorem 1 Suppose Assumptions 1–4 hold. If either Assumption 5.a or 5.b holds, then the DCV estimator (5) is consistent as both n and p tend to infinity, i.e.,

    lim_{n,p→∞} Prob(d̂_DCV = d_0) = 1.

Remark 1 Recent empirical findings suggest that the factor structure of economic data may change with the sample size and the dimension of variables (Onatski, 2010; Jurado et al., 2015; Li et al., 2017). In fact, Theorem 1 can easily be extended to the case where d_0 varies slowly with both n and p. DCV is thus applicable to practical data with a changing factor structure.

5. SIMULATION STUDY

In this section, we conduct simulation studies to compare the finite-sample performance of DCV with that of other methods, including the panel criterion IC1 (Bai and Ng, 2002) and Ladle (Luo and Li, 2016). As shown in Luo and Li (2016), the Ladle estimator outperforms all the other eigen-structure based methods in determining the number of factors, and it is thus used as a representative of those methods. Similarly, for the IC-based methods, Bai and Ng (2002) showed that their criterion outperforms other information criteria such as AIC and BIC. We do not consider other cross-validation methods, as they are either inconsistent or suffer from an excessive computational burden. For our DCV, we consider both leave-one-out CV (DCV1) and 10-fold CV (DCV10) in the first stage. Matlab codes for the calculations are available at https://github.com/XianliZeng/Double-Cross-Validation.
We simulate data from a factor model with d_0 = 5:

(6)    x_{is} = Σ_{j=1}^5 f_{ij} l_{sj} + √θ e_{is};    f_{ij}, l_{sj} ~ N(0, 1),  i = 1, ..., n, s = 1, 2, ..., p.

For the idiosyncratic errors e_{is}, the following settings are considered:
(E1) independent Gaussian: e_{is} ~ N(0, 1);
(E2) independent t-distributed: e_{is} ~ t_3;
(E3) heteroskedastic: e_{is} ~ N(0, δ_s), with δ_s = 1 if s is odd and δ_s = 2 if s is even;
(E4) serially correlated: e_{is} = 0.3 e_{i,s−1} + ν_{is}, ν_{is} ~ N(0, 1);
(E5) cross-sectionally correlated: e_{is} = Σ_{j=−10}^{10} 0.15^{|j|} ν_{i−j,s}, ν_{is} ~ N(0, 1).
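The data-generating process (6) and the error settings (E1)–(E5) can be sketched as follows. This is an illustrative Python version (the paper's experiments use Matlab); the zero initialization of the AR recursion in (E4) and the shift convention in (E5) are our own choices where the text leaves details unspecified.

```python
import numpy as np

def simulate(n, p, theta, error='E1', d0=5, seed=0):
    """Draw X from model (6) under one of the error settings (E1)-(E5)."""
    rng = np.random.default_rng(seed)
    F = rng.standard_normal((n, d0))          # f_ij ~ N(0, 1)
    L = rng.standard_normal((p, d0))          # l_sj ~ N(0, 1)
    if error == 'E1':                         # independent Gaussian
        e = rng.standard_normal((n, p))
    elif error == 'E2':                       # independent t_3
        e = rng.standard_t(3, size=(n, p))
    elif error == 'E3':                       # heteroskedastic: delta_s = 1 or 2
        delta = np.where(np.arange(1, p + 1) % 2 == 1, 1.0, 2.0)
        e = rng.standard_normal((n, p)) * np.sqrt(delta)
    elif error == 'E4':                       # serially correlated along s
        nu = rng.standard_normal((n, p))
        e = np.zeros((n, p))
        for s in range(p):
            prev = e[:, s - 1] if s > 0 else 0.0
            e[:, s] = 0.3 * prev + nu[:, s]
    elif error == 'E5':                       # cross-sectionally correlated
        nu = rng.standard_normal((n + 20, p))
        e = sum(0.15 ** abs(j) * nu[10 + j:10 + j + n]
                for j in range(-10, 11))
    else:
        raise ValueError(error)
    return F @ L.T + np.sqrt(theta) * e

X = simulate(n=160, p=90, theta=1.0, error='E3', seed=1)
print(X.shape)  # (160, 90)
```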
In all simulations, we select d from {1, 2, ..., dmax } where dmax = 8 as in Bai and Ng (2002).
We consider combinations of n and p with n ∈ {40, 160, 640} and p ∈ {30, 90, 270}. To see the effect of the signal-to-noise ratio on the methods, θ varies over a range of values depending on the setting of the idiosyncratic errors and on n and p; these ranges are indicated in each panel of the figures below. All comparisons are based on 1000 replications. Figures 1 to 5 summarize the simulation results, where the curves represent the frequencies of selecting the right number of factors at different noise levels θ.

Figure 1.— The relative frequencies of correctly selecting the number of factors for model (6) with independent Gaussian idiosyncratic errors (E1). In each panel, we use red line for DCV1, black line for DCV10, blue line for IC1 and green line for Ladle. [Plot panels omitted.]
Generally, the simulation results suggest that the performance of all the methods improves as n and p grow, which supports their asymptotic consistency. Ladle usually performs better than IC1, possibly because it utilizes both eigenvalues and eigenvectors. Notably, DCV1 and DCV10 stand out and perform best in most scenarios. They possess a much higher frequency of correct selection when θ is large, implying that they can select the right number of factors even when the factors' contribution to the variation of the variables, i.e., the signal-to-noise ratio, is low, which is particularly the situation in economic data.

Figure 2.— The relative frequencies of correctly selecting the number of factors for model (6) with independent t-distributed idiosyncratic errors (E2). In each panel, we use red line for DCV1, black line for DCV10, blue line for IC1 and green line for Ladle. [Plot panels omitted.]
Figure 3.— The relative frequencies of correctly selecting the number of factors for model (6) with heteroskedastic idiosyncratic errors (E3). In each panel, we use red line for DCV1, black line for DCV10, blue line for IC1 and green line for Ladle. [Plot panels omitted.]

Moreover, DCV1 and DCV10 show almost identical performance across all the models, implying that the performance of DCV is not sensitive to the choice of K.
For the different settings of the idiosyncratic errors, we have some additional observations. Figure 2 shows that IC1 is not robust to heavy-tailed data, especially when the size of the data is small. Figure 3 implies that IC1 is also adversely affected by heteroskedastic errors. Note that both heavy tails and heteroskedasticity are stylized facts of financial data. In contrast, for these two types of data, DCV1 and DCV10 still demonstrate a very stable and efficient ability to select the number of factors. In Figures 4 and 5, where the idiosyncratic errors have serial or cross-sectional correlation, DCV performs worse than IC1 and Ladle when the variance of the idiosyncratic components is small and both n and p are small. However, the DCVs quickly gain the advantage when either n or p increases.

Figure 4.— The relative frequencies of correctly selecting the number of factors for model (6) with serially correlated idiosyncratic errors (E4). In each panel, we use red line for DCV1, black line for DCV10, blue line for IC1 and green line for Ladle. [Plot panels omitted.]
Figure 5.— The relative frequencies of correctly selecting the number of factors for model (6) with cross-sectionally correlated idiosyncratic errors (E5). In each panel, we use red line for DCV1, black line for DCV10, blue line for IC1 and green line for Ladle. [Plot panels omitted.]

6. AN EMPIRICAL APPLICATION: FAMA-FRENCH THREE-FACTOR MODEL

In asset pricing and portfolio management, the so-called three-factor model of Fama and French (1993) is widely applied to describe stock returns. It states that the excess return of a stock or a portfolio (R_{it} − r_{ft}) can be satisfactorily explained by three common factors: (1) the excess return of the market portfolio (R_{Mt} − r_{ft}), (2) the outperformance of small-cap companies versus big-cap
companies (SMB), and (3) the outperformance of companies with a high book-to-market ratio versus those with a low book-to-market ratio (HML), i.e.,

    R_{it} − r_{ft} = β_1 (R_{Mt} − r_{ft}) + β_2 SMB_t + β_3 HML_t + ε_{it},

where r_{ft} is the risk-free return. Next, we use the data provided by Professor Kenneth R. French^1 to mutually validate our method and the three-factor model. Stocks are divided into 6, 25 or 100 groups according to the company's capitalization (market equity) and book-to-market ratio, and portfolios are constructed with value-weighted or equally-weighted daily returns in each group. As those portfolios are compiled from company data at the end of June of every year, we set the first of July of every year as the starting date of a period. Because the portfolios change from year to year, we only consider periods of 1 to 3 years in our calculation.

Figure 6.— The frequencies of the selected numbers of factors for the value-weighted returns of 25 and 100 portfolios. Panels in each column represent one method; panels in the same row are based on the same data over a fixed period of years. [Plot panels omitted.]
Figure 6 presents the frequencies of the numbers of factors detected by DCV1, DCV10, IC1 and Ladle in two of the data sets, with d_max set to 15. In most situations, all the approaches identify three common factors for the portfolio returns, in line with the three-factor model of Fama and French (1993). In general, DCV selects three factors most often amongst the approaches, showing its superior performance in practice.

7. CONCLUDING REMARKS

In the literature, both CV-based and IC-based approaches have been popular for determining the complexity of a model, such as the number of factors. As noted by Bro et al. (2008), most CV approaches for factor models are not consistent because of their insufficient validation, while existing remedies for consistency are computationally very expensive. In contrast, ICs are consistent and easy to implement, but they depend on predetermined penalty functions (Bai and Ng, 2002). Simulation results not reported here show that the penalty function in Bai and Ng (2002) could be modified to be more adaptive to the data. In addition, ICs are unstable when the signal-to-noise ratio is relatively low (Onatski, 2010).

By validating the model twice, the proposed DCV method not only ensures consistency under mild conditions but is also easy to implement. Because DCV is based on prediction error, it automatically selects the number of factors that balances model complexity and stability. Our simulation studies also demonstrate its superior efficiency over existing methods most of the time. The only exception is when there is strong serial dependence in the idiosyncratic errors. However, this deficiency disappears as the sample size and the dimension of the variables increase. The advantages of our method are more pronounced for data with relatively large variation, heteroscedasticity, or heavy tails in the idiosyncratic errors. The method is thus particularly relevant for financial data.

^1 http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

REFERENCES

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica, 70(1):191–221.
Bai, Z. and Zhou, W. (2008). Large sample covariance matrices without independence structures in columns. Statistica Sinica, pages 425–442.
Bao, Z., Pan, G., and Zhou, W. (2015). Universality for the largest eigenvalue of sample covariance matrices with general population. The Annals of Statistics, 43(1):382–421.
Bro, R., Kjeldahl, K., Smilde, A. K., and Kiers, H. A. L. (2008). Cross-validation of component models: a critical look at current methods. Analytical and Bioanalytical Chemistry, 390(5):1241–1251.
Cragg, J. G. and Donald, S. G. (1997). Inferring the rank of a matrix. Journal of Econometrics, 76:223–250.
Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1):1–46.
Eastment, H. T. and Krzanowski, W. J. (1982). Cross-validatory choice of the number of components from a principal component analysis. Technometrics, 24(1):73–77.
Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56.
Forni, M., Hallin, M., Lippi, M., and Reichlin, L. (2000). The generalized dynamic-factor model: Identification and estimation. Review of Economics and Statistics, 82(4):540–554.
Fujikoshi, Y. (1977). Asymptotic expansions for the distributions of some multivariate tests. In Krishnaiah, P. R., ed., Multivariate Analysis, volume IV, pages 55–71. Amsterdam: North-Holland.
Gregory, A. W. and Head, A. C. (1999). Common and country-specific fluctuations in productivity, investment, and the current account. Journal of Monetary Economics, 44(3):423–451.
Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Cambridge University Press.
Jurado, K., Ludvigson, S. C., and Ng, S. (2015). Measuring uncertainty. American Economic Review, 105(3):1177–1216.
Ledoit, O. and Péché, S. (2011). Eigenvectors of some large sample covariance matrix ensembles. Probability Theory and Related Fields, 151(1-2):233–264.
Lewbel, A. (1991). The rank of demand systems: theory and nonparametric estimation. Econometrica, 59:711–730.
Li, H., Li, Q., and Shi, Y. (2017). Determining the number of factors when the number of factors can increase with sample size. Journal of Econometrics, 197(1):76–86.
Luo, W. and Li, B. (2016). Combining eigenvalues and variation of eigenvectors for order determination. Biometrika, 103(4):875–887.
Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics, 92(4):1004–1016.
Ross, S. A. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13:341–360.
Schott, J. R. (1994). Determining the dimensionality in sliced inverse regression. Journal of the American Statistical Association, 89(425):141–148.
Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88(422):486–494.
Shao, J. (2003). Mathematical Statistics. 2nd ed. New York: Springer.
Stock, J. H. and Watson, M. W. (1998). Diffusion indexes. NBER Working Paper.
Stock, J. H. and Watson, M. W. (2002). Macroeconomic forecasting using diffusion indexes. Journal of Business & Economic Statistics, 20(2):147–162.
Wold, S. (1978). Cross-validatory estimation of the number of components in factor and principal components models. Technometrics, 20(4):397–405.
Ye, Z. and Weiss, R. E. (2003). Using the bootstrap to select one of a new class of dimension reduction methods. Journal of the American Statistical Association, 98(464):968–979.
Yu, Y., Wang, T., and Samworth, R. J. (2014). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2):315–323.

APPENDIX A: PROOFS FOR THEOREMS

In this section, we provide detailed proofs for our theoretical results, based mainly on the techniques developed in Bai and Ng (2002) and on random matrix theory (RMT; Bai and Zhou, 2008; Ledoit and Péché, 2011; Bao et al., 2015). We first introduce some notation that appears repeatedly hereafter. For a matrix $A = (a_{ij})_{n\times p}$, $a_i = (a_{i1}, a_{i2}, \cdots, a_{ip})^\top$ is the transpose of the $i$-th row of $A$, $A_{-M_k}$ is the sub-matrix of $A$ with the rows in $M_k$ removed, $\lambda_t(A)$ is the $t$-th largest eigenvalue of $A$, and $\|A\|$ is the Frobenius norm of $A$. Let $\hat\lambda_t$, $t = 1, 2, \ldots, p$, be the eigenvalues of $X^\top X$ in decreasing order and $\hat\phi_t$, $t = 1, 2, \ldots, p$, the corresponding eigenvectors. Let $(\lambda_t, \phi_t,\ t = 1, \ldots, p)$, $(\hat\lambda^k_t, \hat\phi^k_t,\ t = 1, \ldots, p)$ and $(\lambda^k_t, \phi^k_t,\ t = 1, \ldots, p)$ be the corresponding quantities for $L^0 F^{0\top} F^0 L^{0\top}$, $X_{-M_k}^\top X_{-M_k}$ and $L^0 F_{-M_k}^{0\top} F_{-M_k}^0 L^{0\top}$, respectively. We use $M_{np}$ to denote $\max(n, p)$ and $m_{np}$ to denote $\min(n, p)$. In addition, we use $C$ and $C'$ to denote constants that may vary from place to place throughout the paper.


To make the proofs easy to read, we give the proof of our main result in the first subsection and put supporting facts and their proofs in the second subsection of this Appendix.

A.1. Proof of Theorem 1

Theorem 1 follows if we can show that
$$\lim_{n,p\to\infty} \mathrm{Prob}\big(DCV(d) < DCV(d_0)\big) = 0 \quad \text{for } d \neq d_0 \text{ and } 1 \le d \le d_{\max}.$$
Let $w_s^{k,d}$ be the $s$-th diagonal element of $P_{\hat L^{k,d}} = \hat L^{k,d}(\hat L^{k,d\top}\hat L^{k,d})^{-1}\hat L^{k,d\top}$, $W_i(d, L) = \frac{1}{p}x_i^\top(I_p - P_L)x_i$ and $H^{k,d} = \frac{1}{(n-n_k)p}F_{-M_k}^{0\top}X_{-M_k}\tilde L^{k,d}$. For $1 \le d < d_0$, according to Shao (1993),
\begin{align*}
DCV(d) &= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p (1 - w_s^{k,d})^{-2}(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2\\
&= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p \big(1 + 2w_s^{k,d} + O((w_s^{k,d})^2)\big)(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2\\
&= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k} x_i^\top(I_p - P_{\hat L^{k,d}})x_i + o_p(1)\\
&= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d, \hat L^{k,d}) + o_p(1).
\end{align*}
Then,
\begin{align*}
DCV(d) - DCV(d_0) &= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d, \hat L^{k,d}) - \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d_0, \hat L^{k,d_0}) + o_p(1)\\
&= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d, \hat L^{k,d}) - \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d, L^0 H^{k,d})\\
&\quad + \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d, L^0 H^{k,d}) - \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d_0, L^0)\\
&\quad + \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d_0, L^0) - \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d_0, L^0 H^{k,d_0})\\
&\quad + \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d_0, L^0 H^{k,d_0}) - \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k} W_i(d_0, \hat L^{k,d_0}) + o_p(1)\\
&> c_d + o_p(1).
\end{align*}
Here, the last inequality follows from Lemma 3, Lemma 4 and the fact that $P_{L^0} = P_{L^0 H^{k,d_0}}$. Consequently,
$$\lim_{n,p\to\infty}\mathrm{Prob}\big(DCV(d) < DCV(d_0)\big) = 0.$$

Let $Q_{\hat L^{k,d}} = \mathrm{diag}(w_1^{k,d}, w_2^{k,d}, \ldots, w_p^{k,d})$ and $\mathrm{Cov}(e_i) = \Sigma_{p,i}$. For $d_0 < d \le d_{\max}$, according to Lemma 7,
\begin{align*}
DCV(d) &= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p (1 - w_s^{k,d})^{-2}(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2\\
&= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p \big(1 + 2w_s^{k,d} + O((w_s^{k,d})^2)\big)(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2\\
&= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k} x_i^\top(I_p - P_{\hat L^{k,d}})x_i + \frac{2}{np}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p w_s^{k,d}(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2 + o_p\Big(\frac{1}{p}\Big)\\
&= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k} x_i^\top(I_p - P_{\hat L^{k,d}})x_i + \frac{2}{np}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}(Q_{\hat L^{k,d}}\Sigma_{p,i}) + o_p\Big(\frac{1}{p}\Big).
\end{align*}
Therefore,
\begin{align*}
DCV(d) - DCV(d_0) &= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k} x_i^\top(P_{\hat L^{k,d_0}} - P_{\hat L^{k,d}})x_i + \frac{2}{np}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) + o_p\Big(\frac{1}{p}\Big)\\
&= -\frac{1}{np}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 + \frac{2}{np}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) + o_p\Big(\frac{1}{p}\Big).
\end{align*}
According to Lemma 14,
$$\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 < \frac{2}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) + o_p(1).$$
It follows that
$$\lim_{n,p\to\infty}\mathrm{Prob}\big(DCV(d) < DCV(d_0)\big) = 0.$$
We thus complete the proof of Theorem 1.
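The criterion analyzed above can be illustrated with a minimal numerical sketch. The code below is not the paper's implementation; it assumes PCA loading estimates from the held-in rows and applies the leverage correction $(1-w_s^{k,d})^{-2}$ from the displays above, with hypothetical function and variable names.

```python
import numpy as np

def dcv(X, d_max, K=5, seed=None):
    """Minimal sketch of the DCV criterion: cross validation first over
    observations (K folds) and then over variables (leverage correction)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), K)
    scores = np.empty(d_max)
    for d in range(1, d_max + 1):
        press = 0.0
        for held in folds:
            keep = np.setdiff1d(np.arange(n), held)
            # loadings: top-d right singular vectors of the held-in rows
            _, _, Vt = np.linalg.svd(X[keep], full_matrices=False)
            L = Vt[:d].T                          # p x d, orthonormal columns
            P = L @ L.T                           # projection onto span(L)
            w = np.diag(P)                        # leverage of each variable
            resid = X[held] - X[held] @ P         # residuals on held-out rows
            press += np.sum((resid / (1.0 - w)) ** 2)  # (1 - w_s)^{-2} weights
        scores[d - 1] = press / (n * p)
    return int(np.argmin(scores)) + 1, scores
```

In a toy two-factor simulation, the $d = 1$ score is far larger than the $d = 2$ score, since one strong factor is left unexplained, and the leverage correction discourages fitting extra noise directions.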

A.2. Supporting Lemmas and their Proofs

Lemma 1 Suppose Assumptions 1–4 hold. Let $\hat l_s^{k,d}$ and $l_s^0$ be the transpose of the $s$-th row of $\hat L^{k,d}$ and $L^0$, respectively. For any $k$ and $d$, there exists a $(d_0\times d)$ matrix $H^{k,d}$ with $\mathrm{rank}(H^{k,d}) = \min(d_0, d)$ such that, for each $k$ and $s$,
$$\|\hat l_s^{k,d} - H^{k,d\top}l_s^0\|^2 = O_p(m_{np}^{-1}).$$

Proof: Let $H^{k,d} = \frac{1}{(n-n_k)p}F_{-M_k}^{0\top}X_{-M_k}\tilde L^{k,d}$; then
\begin{align*}
\hat L^{k,d} - L^0 H^{k,d} &= \frac{1}{(n-n_k)p}(X_{-M_k}^\top - L^0 F_{-M_k}^{0\top})X_{-M_k}\tilde L^{k,d}\\
&= \frac{1}{(n-n_k)p}E_{-M_k}^\top X_{-M_k}\tilde L^{k,d}\\
&= \frac{1}{(n-n_k)p}E_{-M_k}^\top E_{-M_k}\tilde L^{k,d} + \frac{1}{(n-n_k)p}E_{-M_k}^\top F_{-M_k}^0 L^{0\top}\tilde L^{k,d}.
\end{align*}
Thus,
\begin{align*}
\|\hat l_s^{k,d} - H^{k,d\top}l_s^0\|^2 &= \Big\|\frac{1}{p}\sum_{t=1}^p\tilde l_t^{k,d}\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}e_{it}e_{is}\Big) + \frac{1}{p}\sum_{t=1}^p\tilde l_t^{k,d}l_t^{0\top}\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}f_i^0 e_{is}\Big)\Big\|^2\\
&\le 3\Big\|\frac{1}{p}\sum_{t=1}^p\tilde l_t^{k,d}\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}\big[e_{it}e_{is} - E(e_{it}e_{is})\big]\Big)\Big\|^2\\
&\quad + 3\Big\|\frac{1}{p}\sum_{t=1}^p\tilde l_t^{k,d}\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}E(e_{it}e_{is})\Big)\Big\|^2\\
&\quad + 3\Big\|\frac{1}{p}\sum_{t=1}^p\tilde l_t^{k,d}l_t^{0\top}\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}f_i^0 e_{is}\Big)\Big\|^2\\
&= 3\big[a(k,s) + b(k,s) + c(k,s)\big].
\end{align*}
Next, we investigate the three terms separately as follows.
\begin{align*}
a(k,s) &\le \frac{1}{p^2}\sum_{t=1}^p\|\tilde l_t^{k,d}\|^2\sum_{t=1}^p\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}\big[e_{it}e_{is} - E(e_{it}e_{is})\big]\Big)^2\\
&= \frac{d}{p}\sum_{t=1}^p\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}\big[e_{it}e_{is} - E(e_{it}e_{is})\big]\Big)^2 = O_p(n^{-1}).\\
b(k,s) &\le \frac{1}{p^2}\sum_{t=1}^p\|\tilde l_t^{k,d}\|^2\sum_{t=1}^p\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}E(e_{it}e_{is})\Big)^2\\
&= \frac{d}{p}\sum_{t=1}^p\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}E(e_{it}e_{is})\Big)^2 = O_p(p^{-1}).\\
c(k,s) &\le \frac{1}{p^2}\sum_{t=1}^p\Big\|\tilde l_t^{k,d}l_t^{0\top}\Big(\frac{1}{n-n_k}\sum_{i\notin M_k}f_i^0 e_{is}\Big)\Big\|^2\\
&\le \Big\|\frac{1}{n-n_k}\sum_{i\notin M_k}f_i^0 e_{is}\Big\|^2\Big(\frac{1}{p}\sum_{t=1}^p\|\tilde l_t^{k,d}\|^2\Big)\Big(\frac{1}{p}\sum_{t=1}^p\|l_t^0\|^2\Big) = O_p(n^{-1}).
\end{align*}
In summary,
$$\|\hat l_s^{k,d} - H^{k,d\top}l_s^0\|^2 = O_p(m_{np}^{-1}).$$
Q.E.D.

Lemma 2 Suppose Assumptions 1–4 hold. For any matrix $F$, let $P_F = F(F^\top F)^{-1}F^\top$ be its projection matrix. If the random vector $v$ satisfies $\|v\|^2 = O_p(1)$, then
$$v^\top(P_{L^0 H^{k,d}} - P_{\hat L^{k,d}})v = O_p(m_{np}^{-1/2}).$$

Proof: Let $D_{k,0} = H^{k,d\top}L^{0\top}L^0 H^{k,d}/p$ and $D_{k,d} = \hat L^{k,d\top}\hat L^{k,d}/p$. We have $\|D_{k,0}^{-1}\| = O(1)$ and $\|D_{k,0}^{-1} - D_{k,d}^{-1}\| = O_p(m_{np}^{-1/2})$ (see Lemma 2 of Bai and Ng (2002)). Note that
\begin{align*}
v^\top(P_{L^0 H^{k,d}} - P_{\hat L^{k,d}})v &= \frac{1}{p}v^\top L^0 H^{k,d}(D_{k,0}^{-1} - D_{k,d}^{-1})H^{k,d\top}L^{0\top}v\\
&\quad - \frac{1}{p}v^\top(\hat L^{k,d} - L^0 H^{k,d})D_{k,d}^{-1}(\hat L^{k,d} - L^0 H^{k,d})^\top v\\
&\quad - \frac{1}{p}v^\top(\hat L^{k,d} - L^0 H^{k,d})D_{k,d}^{-1}H^{k,d\top}L^{0\top}v\\
&\quad - \frac{1}{p}v^\top L^0 H^{k,d}D_{k,d}^{-1}(\hat L^{k,d} - L^0 H^{k,d})^\top v\\
&= \mathrm{I} + \mathrm{II} + \mathrm{III} + \mathrm{IV}.
\end{align*}
We consider the four terms in turn.
\begin{align*}
\mathrm{I} &= \frac{1}{p}\sum_{s=1}^p\sum_{t=1}^p v_s l_s^{0\top}H^{k,d}(D_{k,0}^{-1} - D_{k,d}^{-1})H^{k,d\top}l_t^0 v_t\\
&\le \|D_{k,0}^{-1} - D_{k,d}^{-1}\|\cdot\frac{1}{p}\Big\|\sum_{s=1}^p v_s H^{k,d\top}l_s^0\Big\|^2\\
&\le \|D_{k,0}^{-1} - D_{k,d}^{-1}\|\cdot\frac{1}{p}\sum_{s=1}^p\|H^{k,d\top}l_s^0\|^2\cdot\|v\|^2 = O_p(m_{np}^{-1/2});\\
\mathrm{II} &= \frac{1}{p}\sum_{s=1}^p\sum_{t=1}^p v_s(\hat l_s^{k,d} - H^{k,d\top}l_s^0)^\top D_{k,d}^{-1}(\hat l_t^{k,d} - H^{k,d\top}l_t^0)v_t\\
&\le \frac{1}{p}\Big\{\sum_{s=1}^p\sum_{t=1}^p\big[(\hat l_s^{k,d} - H^{k,d\top}l_s^0)^\top D_{k,d}^{-1}(\hat l_t^{k,d} - H^{k,d\top}l_t^0)\big]^2\Big\}^{1/2}\|v\|^2\\
&\le \Big(\frac{1}{p}\sum_{s=1}^p\|\hat l_s^{k,d} - H^{k,d\top}l_s^0\|^2\Big)\|D_{k,d}^{-1}\|\cdot O_p(1) = O_p(m_{np}^{-1});\\
\mathrm{III} &= \frac{1}{p}\sum_{s=1}^p\sum_{t=1}^p v_s(\hat l_s^{k,d} - H^{k,d\top}l_s^0)^\top D_{k,d}^{-1}H^{k,d\top}l_t^0 v_t\\
&\le \frac{1}{p}\Big\{\sum_{s=1}^p\sum_{t=1}^p\big[(\hat l_s^{k,d} - H^{k,d\top}l_s^0)^\top D_{k,d}^{-1}H^{k,d\top}l_t^0\big]^2\Big\}^{1/2}\|v\|^2\\
&\le \Big(\frac{1}{p}\sum_{s=1}^p\|\hat l_s^{k,d} - H^{k,d\top}l_s^0\|^2\Big)^{1/2}\Big(\frac{1}{p}\sum_{s=1}^p\|H^{k,d\top}l_s^0\|^2\Big)^{1/2}\|D_{k,d}^{-1}\|\cdot O_p(1) = O_p(m_{np}^{-1/2}).
\end{align*}
Similarly, $\mathrm{IV} = O_p(m_{np}^{-1/2})$. The proof is then complete. Q.E.D.

Lemma 3 Suppose Assumptions 1–4 hold. For any matrix $F$, denote $W_i(d, F) = \frac{1}{p}x_i^\top(I_p - P_F)x_i$. Then, for $1 \le d \le d_0$,
$$\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}W_i(d, \hat L^{k,d}) - \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}W_i(d, L^0 H^{k,d}) = O_p(m_{np}^{-1/2}).$$

Proof: Note that $E\|x_i\|^2 = \sum_{s=1}^p E(x_{is}^2) \le p\max_{i,s}E(x_{is}^2) = O(p)$. According to Lemma 2,
$$\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\big[W_i(d, \hat L^{k,d}) - W_i(d, L^0 H^{k,d})\big] = \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}x_i^\top(P_{L^0 H^{k,d}} - P_{\hat L^{k,d}})x_i = O_p(m_{np}^{-1/2}).$$
Q.E.D.

Lemma 4 Suppose Assumptions 1–4 hold. For any $d < d_0$, there exists a number $c_d > 0$ such that
$$\lim_{n,p\to\infty}\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\big[W_i(d, L^0 H^{k,d}) - W_i(d_0, L^0)\big] > c_d.$$

Proof:
\begin{align*}
\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\big[W_i(d, L^0 H^{k,d}) - W_i(d_0, L^0)\big] &= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}x_i^\top(P_{L^0} - P_{L^0 H^{k,d}})x_i\\
&= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top(P_{L^0} - P_{L^0 H^{k,d}})e_i\\
&\quad + \frac{2}{np}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top(P_{L^0} - P_{L^0 H^{k,d}})L^0 f_i^0\\
&\quad + \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}f_i^{0\top}L^{0\top}(P_{L^0} - P_{L^0 H^{k,d}})L^0 f_i^0\\
&= \mathrm{I} + \mathrm{II} + \mathrm{III}.
\end{align*}
We have $\mathrm{I}\ge 0$ since, for each $d$, $P_{L^0} - P_{L^0 H^{k,d}}$ is positive semi-definite. For $\mathrm{II}$, note that for each $d$,
$$\frac{2}{p}e_i^\top(P_{L^0} - P_{L^0 H^{k,d}})L^0 f_i^0 = \frac{2}{p}\big[e_i^\top L^0 f_i^0 - e_i^\top P_{L^0 H^{k,d}}L^0 f_i^0\big] = o_p(1),$$
since $\|\frac{1}{\sqrt p}L^{0\top}e_i\| = O_p(1)$.

For $\mathrm{III}$, according to Stock and Watson (1998), $H^{k,d}$ converges to a matrix $H_0^d$ with rank $d$; then
\begin{align*}
\mathrm{III} &= \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}f_i^{0\top}L^{0\top}(P_{L^0} - P_{L^0 H^{k,d}})L^0 f_i^0\\
&\to \frac{1}{np}\sum_{k=1}^K\sum_{i\in M_k}f_i^{0\top}L^{0\top}(P_{L^0} - P_{L^0 H_0^d})L^0 f_i^0\\
&= \mathrm{Tr}\Big(\frac{1}{p}\big[L^{0\top}L^0 - L^{0\top}L^0 H_0^d(H_0^{d\top}L^{0\top}L^0 H_0^d)^{-1}H_0^{d\top}L^{0\top}L^0\big]\cdot\frac{1}{n}F^{0\top}F^0\Big)\\
&> c_d > 0,
\end{align*}
since $H_0^d$ has rank $d < d_0$ and $\lambda_{d_0}(\frac{1}{n}F^{0\top}F^0)$ and $\lambda_{d_0}(\frac{1}{p}L^{0\top}L^0)$ are bounded away from $0$. Q.E.D.

Lemma 5 Suppose Assumptions 1–4 hold. For $1 \le d_1 \le d_0$ and $d_0 < d_2 \le d_{\max}$,
$$\phi_{d_1}^\top\hat\phi_{d_2}^k = O_p(m_{np}^{-1}).$$

Proof: Let $\hat\Phi_{d_0}^k = (\hat\phi_1^k, \hat\phi_2^k, \ldots, \hat\phi_{d_0}^k)$ and $\Phi_{d_0} = (\phi_1, \phi_2, \ldots, \phi_{d_0})$. According to the Davis–Kahan $\sin\Theta$ theorem (Davis and Kahan, 1970; Yu et al., 2014), there exists an orthogonal matrix $\hat O\in\mathbb{R}^{d_0\times d_0}$ such that
$$\|\hat\Phi_{d_0}^k\hat O - \Phi_{d_0}\| \le \frac{d_0\|X^\top X - L^0 F_{-M_k}^{0\top}F_{-M_k}^0 L^{0\top}\|_{op}}{\lambda_{d_0} - \lambda_{d_0+1}},$$
where $\|\cdot\|_{op}$ is the operator norm. Note that
\begin{align*}
\|F_{-M_k}^{0\top}E_{-M_k}\|_{op} &= \sup_{\|\xi\|=1}\|F_{-M_k}^{0\top}E_{-M_k}\xi\|\\
&\le \sup_{\|\xi\|=1}\Big(\sum_{s=1}^p\Big\|\sum_{i\notin M_k}f_i^0 e_{is}\Big\|^2\Big)^{1/2}\cdot\|\xi\| = O_p(\sqrt{np}).
\end{align*}
Then,
\begin{align*}
\|X^\top X - L^0 F_{-M_k}^{0\top}F_{-M_k}^0 L^{0\top}\|_{op} &= \Big\|\sum_{i\in M_k}x_i x_i^\top + L^0 F_{-M_k}^{0\top}E_{-M_k} + E_{-M_k}^\top F_{-M_k}^0 L^{0\top} + E_{-M_k}^\top E_{-M_k}\Big\|_{op}\\
&\le \Big\|\sum_{i\in M_k}x_i x_i^\top\Big\|_{op} + 2\|L^0 F_{-M_k}^{0\top}\|_{op}\cdot\|E_{-M_k}\|_{op} + \|E_{-M_k}^\top E_{-M_k}\|_{op}\\
&= O_p(M_{np}).
\end{align*}
Consequently,
$$\|\hat\Phi_{d_0}^k\hat O - \Phi_{d_0}\| = O_p\Big(\frac{M_{np}}{np}\Big) = O_p(m_{np}^{-1}).$$
Let $\hat o_{d_1}$ be the transpose of the $d_1$-th row of $\hat O$; we then obtain
$$\phi_{d_1}^\top\hat\phi_{d_2}^k = \hat\phi_{d_2}^{k\top}(\phi_{d_1} - \hat\Phi_{d_0}^k\hat o_{d_1}) + \hat\phi_{d_2}^{k\top}\hat\Phi_{d_0}^k\hat o_{d_1} = O_p(m_{np}^{-1}),$$
where the second term vanishes since $\hat\phi_{d_2}^{k\top}\hat\Phi_{d_0}^k = 0$ for $d_2 > d_0$.

Q.E.D.
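The Davis–Kahan bound invoked above is easy to check numerically. The sketch below is an illustration only (not the paper's code); it uses the Frobenius-norm variant of Yu et al. (2014) for the top-$d$ eigenspace of a perturbed symmetric matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 30, 2
# diagonal "population" matrix with a clear eigen-gap after the top d values
A = np.diag(np.concatenate([[10.0, 9.0], np.linspace(1.0, 0.0, p - d)]))
E = 0.05 * rng.standard_normal((p, p))
E = (E + E.T) / 2                                  # symmetric perturbation
_, V = np.linalg.eigh(A)
_, W = np.linalg.eigh(A + E)
V, W = V[:, -d:], W[:, -d:]                        # top-d eigenvectors
# Frobenius sin-theta distance between the two d-dimensional eigenspaces
sin_theta = np.linalg.norm(V @ V.T - W @ W.T, 'fro') / np.sqrt(2)
gap = 9.0 - 1.0                                    # lambda_d - lambda_{d+1}
bound = 2 * min(np.sqrt(d) * np.linalg.norm(E, 2), np.linalg.norm(E, 'fro')) / gap
# the Yu-Wang-Samworth variant guarantees sin_theta <= bound
```

With the small perturbation chosen here, the realized distance sits well inside the bound.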

Lemma 6 Suppose Assumptions 1–4 hold. For $d_0 < d \le d_{\max}$,
$$\|\hat\phi_d^{k\top}L^0 f_i^0\|^2 = O_p(p\,m_{np}^{-2}).$$

Proof: It is easy to see that $L^0 f_i^0\in\mathrm{span}\{\phi_1, \phi_2, \ldots, \phi_{d_0}\}$. Let $L^0 f_i^0 = \sum_{s=1}^{d_0}b_s\phi_s$ with $\sum_{s=1}^{d_0}b_s^2 = \|L^0 f_i^0\|^2$. Note that
$$\|\hat\phi_d^{k\top}L^0 f_i^0\|^2 = \Big\|\sum_{s=1}^{d_0}b_s\hat\phi_d^{k\top}\phi_s\Big\|^2 \le \|L^0 f_i^0\|^2\sum_{s=1}^{d_0}\|\hat\phi_d^{k\top}\phi_s\|^2.$$
Since $\|L^0 f_i^0\|^2 = O_p(p)$, according to Lemma 5 we have
$$\|\hat\phi_d^{k\top}L^0 f_i^0\|^2 = O_p(p\,m_{np}^{-2}).$$
Q.E.D.

Lemma 7 Suppose Assumptions 1–4 hold. Let $\mathrm{Cov}(e_i) = \Sigma_{p,i}$. For $d_0 < d \le d_{\max}$,
$$\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p w_s^{k,d}(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2 = \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}(Q_{\hat L^{k,d}}\Sigma_{p,i}) + o_p(1).$$

Proof: Let $H^{k,d}$ be as defined in Lemma 1 and $H^{k,d-}$ be the generalized inverse of $H^{k,d}$. Moreover, let $Q_{\hat L^{k,d}} = \mathrm{diag}(w_1^{k,d}, w_2^{k,d}, \ldots, w_p^{k,d})$. It is easy to see that $\sup_s w_s^{k,d} = o_p(1)$. Write $x_i = L^0 f_i^0 + e_i = \hat L^{k,d}H^{k,d-}f_i^0 + (L^0 H^{k,d} - \hat L^{k,d})H^{k,d-}f_i^0 + e_i$. Since $\|(L^0 H^{k,d} - \hat L^{k,d})H^{k,d-}f_i^0\|^2 = O_p(1)$,
\begin{align*}
\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\sum_{s=1}^p w_s^{k,d}(x_{is} - \hat f_i^{k,d\top}\hat l_s^{k,d})^2 &= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}(x_i - \hat L^{k,d}\hat f_i^{k,d})^\top Q_{\hat L^{k,d}}(x_i - \hat L^{k,d}\hat f_i^{k,d})\\
&= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top Q_{\hat L^{k,d}}e_i + \frac{2}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top Q_{\hat L^{k,d}}(L^0 H^{k,d} - \hat L^{k,d})H^{k,d-}f_i^0\\
&\quad + \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\big((L^0 H^{k,d} - \hat L^{k,d})H^{k,d-}f_i^0\big)^\top Q_{\hat L^{k,d}}(L^0 H^{k,d} - \hat L^{k,d})H^{k,d-}f_i^0\\
&= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top Q_{\hat L^{k,d}}e_i + o_p(1)\\
&= \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}(Q_{\hat L^{k,d}}\Sigma_{p,i}) + o_p(1).
\end{align*}
Q.E.D.

Lemma 8 Suppose Assumptions 1–4 hold and, moreover, either Assumption 5.a or 5.b holds. For $1 \le d_1 \le d_0$ and $d_0 + 1 \le d_2 \le d_{\max}$,
$$\frac{1}{\lambda_{d_1} - \lambda_{d_2}} = O_{a.s.}\Big(\frac{1}{np}\Big).$$
Here, $\lambda$ can be replaced by $\hat\lambda$, $\lambda^k$, and $\hat\lambda^k$.

Proof: We only prove the result for $\lambda_d$ and $\hat\lambda_d$ here; the result for $\lambda_d^k$ and $\hat\lambda_d^k$ can be verified similarly. For $\lambda_d$, since $\lambda_d = 0$ for $d > d_0$ and $\lambda_d \ge \lambda_{d_0}$ for $d \le d_0$, the result follows if we can show that
$$\lambda_{d_0} > Cnp \quad a.s.,$$
with $C$ a positive constant. For any $\xi\in\mathbb{R}^p$ with $\|\xi\|^2 = 1$, we have
$$\xi^\top L^0 F^{0\top}F^0 L^{0\top}\xi \ge \lambda_{d_0}(F^{0\top}F^0)\cdot\|L^{0\top}\xi\|^2 = \lambda_{d_0}(F^{0\top}F^0)\cdot\xi^\top L^0 L^{0\top}\xi.$$
Then, according to Assumptions 1 and 2 and the Courant–Fischer max-min theorem,
\begin{align*}
\lambda_{d_0} &= \max_{\dim(U)=d_0}\ \min_{\xi\in U,\ \|\xi\|^2=1}\ \xi^\top L^0 F^{0\top}F^0 L^{0\top}\xi\\
&\ge \lambda_{d_0}(F^{0\top}F^0)\cdot\max_{\dim(U)=d_0}\ \min_{\xi\in U,\ \|\xi\|^2=1}\ \xi^\top L^0 L^{0\top}\xi\\
&= \lambda_{d_0}(F^{0\top}F^0)\cdot\lambda_{d_0}(L^0 L^{0\top}) \ge Cnp \quad a.s.
\end{align*}
Here $\lambda_{d_0}(A)$ represents the $d_0$-th largest eigenvalue of $A$.

For $\hat\lambda_d$, note that $X = F^0 L^{0\top} + E$; by Weyl's inequality for singular values (Theorem 3.3.16 of Horn and Johnson (1990)),
$$|\hat\lambda_d^{1/2} - \lambda_d^{1/2}| \le \lambda_1^{1/2}(E^\top E).$$
Under Assumption 5.a, $\lambda_1(\frac{1}{n}E^\top E) < 2\sigma^2$, while according to Bao et al. (2015), under Assumption 5.b, $\lambda_1(\frac{1}{n}E^\top E)$ is bounded. It follows that
$$\hat\lambda_{d_0} > \big(\sqrt{Cnp} - \lambda_1^{1/2}(E^\top E)\big)^2 > C'np \quad a.s.$$
and
$$\hat\lambda_{d_0+1} = O_{a.s.}(\lambda_1(E^\top E)) = o_{a.s.}(np).$$
Then,
$$\frac{1}{\hat\lambda_{d_1} - \hat\lambda_{d_2}} \le \frac{1}{\hat\lambda_{d_0} - \hat\lambda_{d_0+1}} = O_{a.s.}\Big(\frac{1}{np}\Big).$$
Q.E.D.
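Weyl's inequality for singular values, used in the proof above, can be verified numerically; the simulated factor structure below is an arbitrary illustrative choice, not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d0 = 80, 40, 3
F = rng.standard_normal((n, d0))
L = rng.standard_normal((p, d0))
E = rng.standard_normal((n, p))
X = F @ L.T + E
s_signal = np.linalg.svd(F @ L.T, compute_uv=False)
s_full = np.linalg.svd(X, compute_uv=False)
s1_noise = np.linalg.svd(E, compute_uv=False)[0]
# |sigma_i(X) - sigma_i(F L^T)| <= sigma_1(E) for every i, i.e.
# |lambda_i^{1/2}(X^T X) - lambda_i^{1/2}(L F^{0T} F^0 L^T)| <= lambda_1^{1/2}(E^T E)
weyl_holds = np.all(np.abs(s_full - s_signal) <= s1_noise + 1e-8)
```

The inequality holds for every singular value simultaneously, which is exactly what makes the eigen-gap argument of Lemma 8 work.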

Lemma 9 Suppose Assumptions 1–4 hold and, moreover, either Assumption 5.a or 5.b holds. For $d_0 < d \le d_{\max}$,
$$\sum_{l=d_0+1}^d\sum_{k=1}^K(\hat\lambda_l - \hat\lambda_l^k) \le \sum_{l=d_0+1}^d\hat\lambda_l$$
and
$$\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 \le \sum_{l=d_0+1}^d\hat\lambda_l, \quad a.s.$$

Proof: Let $\hat\phi_l^k = \sum_{s=1}^p c_{ls,k}\hat\phi_s$. Note that
$$X^\top X\hat\phi_l^k = \sum_{s=1}^p c_{ls,k}X^\top X\hat\phi_s = \sum_{s=1}^p c_{ls,k}\hat\lambda_s\hat\phi_s,$$
and
\begin{align*}
X^\top X\hat\phi_l^k &= X_{-M_k}^\top X_{-M_k}\hat\phi_l^k + \sum_{i\in M_k}x_i x_i^\top\hat\phi_l^k\\
&= \hat\lambda_l^k\hat\phi_l^k + \sum_{i\in M_k}x_i x_i^\top\hat\phi_l^k\\
&= \sum_{s=1}^p c_{ls,k}\hat\lambda_l^k\hat\phi_s + \sum_{i\in M_k}x_i x_i^\top\hat\phi_l^k.
\end{align*}
We have
$$\sum_{s=1}^p c_{ls,k}(\hat\lambda_s - \hat\lambda_l^k)\hat\phi_s = \sum_{i\in M_k}x_i x_i^\top\hat\phi_l^k.$$
It follows that
$$\sum_{l=d_0+1}^d\sum_{s=1}^p c_{ls,k}^2(\hat\lambda_s - \hat\lambda_l^k)^2 = \sum_{l=d_0+1}^d\Big\|\sum_{i\in M_k}x_i x_i^\top\hat\phi_l^k\Big\|^2.$$
Since, for $1 \le s \le d_0$ and $d_0 < l \le d_{\max}$,
$$\frac{1}{\hat\lambda_s - \hat\lambda_l^k} = O_{a.s.}\Big(\frac{1}{np}\Big) \quad\text{and}\quad \|x_i x_i^\top\hat\phi_l^k\|^2 \le \|x_i\|^2\cdot\|x_i^\top\hat\phi_l^k\|^2,$$
we have
$$\sum_{l=d_0+1}^d\sum_{s=1}^{d_0}c_{ls,k}^2\hat\lambda_s = O_{a.s.}\Big(\sum_{l=d_0+1}^d\sum_{i\in M_k}\frac{\|x_i\|^2}{np}\|x_i^\top\hat\phi_l^k\|^2\Big) = o_{a.s.}\Big(\sum_{l=d_0+1}^d\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2\Big).$$
Let $\tilde L^{d_0} = \sqrt p(\hat\phi_1, \hat\phi_2, \ldots, \hat\phi_{d_0})$; then
\begin{align*}
\sum_{l=d_0+1}^d\hat\lambda_l &\ge \sum_{l=d_0+1}^d\hat\phi_l^{k\top}(I_p - P_{\tilde L^{d_0}})X^\top X(I_p - P_{\tilde L^{d_0}})\hat\phi_l^k\\
&= \sum_{l=d_0+1}^d\hat\phi_l^{k\top}X^\top X\hat\phi_l^k - \sum_{l=d_0+1}^d\hat\phi_l^{k\top}P_{\tilde L^{d_0}}X^\top XP_{\tilde L^{d_0}}\hat\phi_l^k\\
&= \sum_{l=d_0+1}^d\hat\lambda_l^k + \sum_{l=d_0+1}^d\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 - \sum_{l=d_0+1}^d\sum_{s=1}^{d_0}c_{ls,k}^2\hat\lambda_s\\
&= \sum_{l=d_0+1}^d\hat\lambda_l^k + (1 - o_{a.s.}(1))\sum_{l=d_0+1}^d\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2.
\end{align*}
On the other hand, since $(K-1)X^\top X = \sum_{k=1}^K X_{-M_k}^\top X_{-M_k}$, we have
\begin{align*}
(K-1)\sum_{l=d_0+1}^d\hat\lambda_l &= (K-1)\sum_{l=d_0+1}^d\hat\phi_l^\top X^\top X\hat\phi_l = \sum_{k=1}^K\sum_{l=d_0+1}^d\hat\phi_l^\top X_{-M_k}^\top X_{-M_k}\hat\phi_l\\
&= \sum_{k=1}^K\sum_{l=d_0+1}^d\hat\phi_l^\top(I_p - P_{\hat L^{k,d_0}})X_{-M_k}^\top X_{-M_k}(I_p - P_{\hat L^{k,d_0}})\hat\phi_l\\
&\quad + \sum_{k=1}^K\sum_{l=d_0+1}^d\hat\phi_l^\top P_{\hat L^{k,d_0}}X_{-M_k}^\top X_{-M_k}P_{\hat L^{k,d_0}}\hat\phi_l\\
&= \sum_{k=1}^K\sum_{l=d_0+1}^d\hat\lambda_l^k\cdot\|(I_p - P_{\hat L^{k,d_0}})\hat\phi_l\|^2 + \sum_{k=1}^K\sum_{l=d_0+1}^d\sum_{t=1}^{d_0}\hat\lambda_t^k\|\hat\phi_t^{k\top}\hat\phi_l\|^2\\
&\le (1 + o(1))\sum_{k=1}^K\sum_{l=d_0+1}^d\hat\lambda_l^k.
\end{align*}
Consequently,
$$\sum_{l=d_0+1}^d\sum_{k=1}^K(\hat\lambda_l - \hat\lambda_l^k) \le \big(K - (K-1)\big)\sum_{l=d_0+1}^d\hat\lambda_l = \sum_{l=d_0+1}^d\hat\lambda_l$$
and
$$\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 \le \sum_{l=d_0+1}^d\hat\lambda_l, \quad a.s.$$
Q.E.D.
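The identity $(K-1)X^\top X = \sum_{k=1}^K X_{-M_k}^\top X_{-M_k}$ used above holds exactly for any partition of the rows into $K$ disjoint folds, as a small check confirms (illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, K = 12, 5, 4
X = rng.standard_normal((n, p))
folds = np.array_split(np.arange(n), K)   # disjoint folds covering all rows
S = sum(np.delete(X, f, axis=0).T @ np.delete(X, f, axis=0) for f in folds)
# every row is deleted in exactly one fold, hence kept K - 1 times
identity_holds = np.allclose(S, (K - 1) * (X.T @ X))
```

The identity is what lets the proof trade the full-sample eigenvalues $\hat\lambda_l$ against the fold-deleted eigenvalues $\hat\lambda_l^k$.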

Lemma 10 Suppose Assumptions 1–4 and 5.b hold. Then, for $d_0 < d \le d_{\max}$,
$$\frac{1}{K}\sum_{k=1}^K E\big(1 - |\hat\phi_d^\top\hat\phi_d^k|\big)^2 = o(1).$$

Proof: Here we only prove the result for $d = d_0 + 1$; for $d_0 + 1 < d \le d_{\max}$, the result can be verified similarly. As in Lemma 9, set $\hat\phi_d^k = \sum_{s=1}^p c_{ds,k}\hat\phi_s$. Then
$$\sum_{s=1}^p c_{ds,k}^2(\hat\lambda_s - \hat\lambda_d^k)^2 = \Big\|\sum_{i\in M_k}x_i x_i^\top\hat\phi_d^k\Big\|^2
\quad\text{and}\quad
\sum_{s=1}^p c_{ds,k}^2(\hat\lambda_s - \hat\lambda_d^k) = \sum_{i\in M_k}\|x_i^\top\hat\phi_d^k\|^2.$$
Write $\xi_1^{k,d} = \sum_{s=1}^{d_0}c_{ds,k}^2$ and $\xi_2^{k,d} = \sum_{s=1}^{d_0}c_{ds,k}^2(\hat\lambda_s - \hat\lambda_d^k)$. It is easy to see that
$$\frac{1}{K}\sum_{k=1}^K\xi_1^{k,d} = o_{a.s.}(1),$$
and, as shown in Lemma 9,
$$\sum_{k=1}^K\xi_2^{k,d} = o_{a.s.}\Big(\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_d^k\|^2\Big) = o_{a.s.}(\hat\lambda_d).$$
Note that
\begin{align*}
c_{dd,k}^2(\hat\lambda_d - \hat\lambda_d^k) + \xi_2^{k,d} &= \sum_{s=d+1}^p c_{ds,k}^2(\hat\lambda_d^k - \hat\lambda_s) + \sum_{i\in M_k}\|x_i^\top\hat\phi_d^k\|^2\\
&\ge \sum_{s=d+1}^p c_{ds,k}^2(\hat\lambda_d^k - \hat\lambda_{d+1})\\
&= (1 - \xi_1^{k,d} - c_{dd,k}^2)\big[(\hat\lambda_d - \hat\lambda_{d+1}) - (\hat\lambda_d - \hat\lambda_d^k)\big]\\
&= (1 - \xi_1^{k,d} - c_{dd,k}^2)(\hat\lambda_d - \hat\lambda_{d+1}) - (1 - \xi_1^{k,d})(\hat\lambda_d - \hat\lambda_d^k) + c_{dd,k}^2(\hat\lambda_d - \hat\lambda_d^k)\\
&\ge (1 - \xi_1^{k,d} - c_{dd,k}^2)(\hat\lambda_d - \hat\lambda_{d+1}) - (\hat\lambda_d - \hat\lambda_d^k) + c_{dd,k}^2(\hat\lambda_d - \hat\lambda_d^k).
\end{align*}
We obtain
$$\sum_{k=1}^K\big[(1 - \xi_1^{k,d} - c_{dd,k}^2)(\hat\lambda_d - \hat\lambda_{d+1}) - (\hat\lambda_d - \hat\lambda_d^k)\big] \le \sum_{k=1}^K\xi_2^{k,d}.$$
Then,
$$\frac{1}{K}\sum_{k=1}^K(1 - c_{dd,k}^2) \le \frac{\frac{1}{K}\sum_{k=1}^K\big[\xi_2^{k,d} + (\hat\lambda_d - \hat\lambda_d^k)\big]}{\hat\lambda_d - \hat\lambda_{d+1}} + \frac{1}{K}\sum_{k=1}^K\xi_1^{k,d}.$$
According to Lemma 9, we have
$$\sum_{k=1}^K\big[\hat\lambda_d - \hat\lambda_d^k\big] \le \hat\lambda_d.$$
According to Bao et al. (2015), under Assumption 5.b, $\hat\lambda_d/n = O_p(1)$ and $\hat\lambda_d - \hat\lambda_{d+1}\sim O_p(n^{1/3})$. Furthermore, $1/K \le \sup_k n_k/n = o(n^{-2/3})$. Then,
$$\frac{1}{K}\sum_{k=1}^K(1 - c_{dd,k}^2) = O_{a.s.}\Big(\frac{\hat\lambda_d}{K(\hat\lambda_d - \hat\lambda_{d+1})}\Big) = o_p(1).$$
Since $0 \le c_{dd,k}^2 \le 1$, we have $0 \le \frac{1}{K}\sum_{k=1}^K(1 - c_{dd,k}^2) \le 1$. Then,
$$\lim_{t\to\infty}\sup_n E\Big[\frac{1}{K}\sum_{k=1}^K(1 - c_{dd,k}^2)\,\mathbf{1}\Big\{\Big|\frac{1}{K}\sum_{k=1}^K(1 - c_{dd,k}^2)\Big| > t\Big\}\Big] = 0.$$
According to Theorem 1.8 of Shao (2003),
$$E\Big(\frac{1}{K}\sum_{k=1}^K(1 - c_{dd,k}^2)\Big) = o(1).$$
Consequently,
\begin{align*}
\frac{1}{K}\sum_{k=1}^K E\big(1 - |\hat\phi_d^\top\hat\phi_d^k|\big)^2 &= \frac{1}{K}\sum_{k=1}^K E\big(1 - |c_{dd,k}|\big)^2\\
&\le \frac{1}{K}\sum_{k=1}^K E\big(1 - |c_{dd,k}|\big)
\le \frac{1}{K}\sum_{k=1}^K E\big(1 - c_{dd,k}^2\big) = o(1).
\end{align*}
Q.E.D.

Lemma 11 Let $e = (e_1, e_2, \ldots, e_n)^\top$ be a random vector with $\max_i E(e_i^4) < C$ and $a = (a_1, a_2, \ldots, a_n)^\top$ a random vector that is independent of $e$. Then,
$$E(a^\top e)^4 \le C E\|a\|^4.$$

Proof:
\begin{align*}
E(a^\top e)^4 &= E\Big(\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{l=1}^n a_i a_j a_k a_l e_i e_j e_k e_l\Big)\\
&= \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{l=1}^n E(a_i a_j a_k a_l)E(e_i e_j e_k e_l)\\
&\le \sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{l=1}^n E(a_i a_j a_k a_l)E(e_i^4)^{1/4}E(e_j^4)^{1/4}E(e_k^4)^{1/4}E(e_l^4)^{1/4}\\
&\le C\sum_{i=1}^n\sum_{j=1}^n\sum_{k=1}^n\sum_{l=1}^n E(a_i a_j a_k a_l)\\
&= CE\Big[\Big(\sum_{i=1}^n a_i\Big)^4\Big] \le CE\Big[\Big(\sum_{i=1}^n a_i^2\Big)^2\Big] = CE\|a\|^4.
\end{align*}
Q.E.D.
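A quick Monte Carlo sketch (illustration only) of the fourth-moment bound, in the special case where $e$ has i.i.d. standard normal entries so that $E(e_i^4) = 3$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 20, 20000
a = rng.standard_normal((reps, n))   # random vector a, drawn independently of e
e = rng.standard_normal((reps, n))   # standard normal entries: E(e_i^4) = 3
lhs = np.mean(np.sum(a * e, axis=1) ** 4)        # Monte Carlo estimate of E(a^T e)^4
rhs = 3 * np.mean(np.sum(a ** 2, axis=1) ** 2)   # 3 E||a||^4
# conditionally on a, a^T e ~ N(0, ||a||^2), so E(a^T e)^4 = 3 E||a||^4 here
```

In this Gaussian special case the two sides agree in expectation, so the Monte Carlo ratio hovers near one.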

Lemma 12 Suppose Assumptions 1–4 and Assumption 5.b hold. For $d_0 < d \le d_{\max}$, there exists a constant $C_0$ such that
$$\hat\phi_d^{k\top}\Sigma_p\hat\phi_d^k \to C_0 \quad a.s.$$
Proof: For a $p\times p$ matrix $A$, let $F^A(t) = \frac{1}{p}\sum_{s=1}^p\mathbf{1}(\lambda_s(A)\le t)$ be its empirical spectral distribution (ESD). The Stieltjes transform of a nondecreasing function $F$ is defined by $m_F(z) = \int\frac{1}{t-z}\,dF(t)$ for all $z\in\mathbb{C}^+$, where $\mathbb{C}^+ = \{z\in\mathbb{C}: \mathrm{Im}(z) > 0\}$. Let $\lim p/n = \rho$; by Bai and Zhou (2008), we have, for all $k$, $F^{\frac{1}{n-n_k}E_{-M_k}^\top E_{-M_k}}\to F$ a.s., where $m_F(z)$ satisfies
$$m_F(z) = \int\frac{1}{t(1 - \rho - \rho z m_F(z)) - z}\,dH(t).$$
Let $\Sigma_p = \sum_{t=1}^p\tau_t v_t v_t^\top$ and
$$\Phi_p(\lambda, \tau) = \frac{1}{p}\sum_{s=1}^p\sum_{t=1}^p|\hat\phi_s^{k\top}v_t|^2\,\mathbf{1}(\lambda\ge\lambda_s^k)\,\mathbf{1}(\tau\ge\tau_t).$$
By the procedures in Ledoit and Péché (2011), the limit
$$\Phi_p(\lambda, \tau)\to\int_{-\infty}^\lambda\int_{-\infty}^\tau\phi(l, t)\,dH(l)\,dF(t), \quad a.s., \tag{7}$$
where $\phi(l, t)$ is defined in Ledoit and Péché (2011), holds if we can prove that
$$\Big|\mathrm{Tr}\Big(\Sigma_p^s\Big(\tfrac{1}{n-n_k}X_{-M_k}^\top X_{-M_k} - zI_p\Big)^{-1}\Big) - \mathrm{Tr}\Big(\Sigma_p^s\Big(\tfrac{1}{n-n_k}E_{-M_k}^\top E_{-M_k} - zI_p\Big)^{-1}\Big)\Big|\to 0 \quad\text{for } s = 0, 1, 2, \ldots$$
In fact, since $\lambda_1(\Sigma_p^s) = \lambda_1^s(\Sigma_p) < \infty$ and
\begin{align*}
&\Big|\mathrm{Tr}\Big(\Sigma_p^s\Big(\tfrac{1}{n-n_k}X_{-M_k}^\top X_{-M_k} - zI_p\Big)^{-1}\Big) - \mathrm{Tr}\Big(\Sigma_p^s\Big(\tfrac{1}{n-n_k}E_{-M_k}^\top E_{-M_k} - zI_p\Big)^{-1}\Big)\Big|\\
&\qquad\le \lambda_1(\Sigma_p^s)\Big|\mathrm{Tr}\Big(\Big(\tfrac{1}{n-n_k}X_{-M_k}^\top X_{-M_k} - zI_p\Big)^{-1}\Big) - \mathrm{Tr}\Big(\Big(\tfrac{1}{n-n_k}E_{-M_k}^\top E_{-M_k} - zI_p\Big)^{-1}\Big)\Big|,
\end{align*}
we only need to prove that
$$\Big|\mathrm{Tr}\Big(\Big(\tfrac{1}{n-n_k}X_{-M_k}^\top X_{-M_k} - zI_p\Big)^{-1}\Big) - \mathrm{Tr}\Big(\Big(\tfrac{1}{n-n_k}E_{-M_k}^\top E_{-M_k} - zI_p\Big)^{-1}\Big)\Big|\to 0,$$
which follows from the facts that
\begin{align*}
m_{F^{\frac{1}{n-n_k}X_{-M_k}^\top X_{-M_k}}}(z) &= \frac{1}{p}\mathrm{Tr}\Big(\Big(\tfrac{1}{n-n_k}X_{-M_k}^\top X_{-M_k} - zI_p\Big)^{-1}\Big);\\
m_{F^{\frac{1}{n-n_k}E_{-M_k}^\top E_{-M_k}}}(z) &= \frac{1}{p}\mathrm{Tr}\Big(\Big(\tfrac{1}{n-n_k}E_{-M_k}^\top E_{-M_k} - zI_p\Big)^{-1}\Big);\\
\Big\|F^{\frac{1}{n-n_k}X_{-M_k}^\top X_{-M_k}} - F^{\frac{1}{n-n_k}E_{-M_k}^\top E_{-M_k}}\Big\| &\le \frac{1}{p}\mathrm{rank}\Big(\tfrac{1}{n-n_k}(X_{-M_k}^\top X_{-M_k} - E_{-M_k}^\top E_{-M_k})\Big)\to 0.
\end{align*}
According to Weyl's inequality, $\hat\lambda_{d+n_k}\le\hat\lambda_d^k\le\hat\lambda_d$. Moreover, by Proposition 1 of Onatski (2010), there exists a constant $C_\lambda$ such that $\hat\lambda_d\to C_\lambda$ a.s. for $d > d_0$ and $d/n\to 0$. This implies that $\hat\lambda_d^k\to C_\lambda$ a.s. for $d_0 < d \le d_{\max}$. Let $(X_p, Y_p)$ follow $\Phi_p$, $(X, Y)$ follow $\Phi$, and $C_0 = E(Y\,|\,X = C_\lambda)$. Then,
\begin{align*}
\hat\phi_d^{k\top}\Sigma_p\hat\phi_d^k &= \sum_{t=1}^p\tau_t|\hat\phi_d^{k\top}v_t|^2 = \frac{\frac{1}{p}\sum_{t=1}^p\tau_t|\hat\phi_d^{k\top}v_t|^2}{\frac{1}{p}\sum_{t=1}^p|\hat\phi_d^{k\top}v_t|^2}\\
&= \frac{\sum_{t=1}^p\tau_t P(X_p = \lambda_d^k,\, Y_p = \tau_t)}{P(X_p = \lambda_d^k)} = E(Y_p\,|\,X_p = \lambda_d^k)\\
&\to E(Y\,|\,X = C_\lambda) = C_0 \quad a.s.
\end{align*}
Q.E.D.
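The random-matrix facts invoked above (convergence of the noise ESD and boundedness of $\lambda_1(\frac{1}{n}E^\top E)$) can be illustrated with the Marchenko–Pastur law; the sketch below uses i.i.d. standard normal noise and is an illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 400                     # aspect ratio rho = p / n = 0.2
E = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(E.T @ E / n)
rho = p / n
edge_hi = (1 + np.sqrt(rho)) ** 2    # upper Marchenko-Pastur edge
edge_lo = (1 - np.sqrt(rho)) ** 2    # lower Marchenko-Pastur edge
# all eigenvalues of E^T E / n concentrate on the MP support [edge_lo, edge_hi]
inside = (lam[0] > 0.9 * edge_lo) and (lam[-1] < 1.1 * edge_hi)
```

In particular, the largest noise eigenvalue stays bounded near $(1+\sqrt\rho)^2$, which is the separation that lets the factor eigenvalues of order $np$ dominate.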

Lemma 13 Suppose Assumptions 1–4 and Assumption 5.b hold. For $d_0 < d \le d_{\max}$,
$$\mathrm{Var}\Big(\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\Big)\to 0.$$

Proof: Similar to Lemma 10, we only prove the result for $d = d_0 + 1$. Denote by $X_{-M_{k,l}}$ the submatrix of $X$ with the rows in $M_k$ and $M_l$ removed, and by $\hat\lambda_d^{k,l}$ and $\hat\phi_d^{k,l}$ the corresponding eigenvalues and eigenvectors. Note that $e_i$ is independent of $\hat\phi_d^k$ for $i\in M_k$; we have
$$E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\big] = E\big[\hat\phi_d^{k\top}\Sigma_p\hat\phi_d^k\big] = C_0 + o(1).$$
Then, for $i\neq j$ with $i\in M_k$ and $j\in M_l$,
\begin{align*}
&\mathrm{Cov}\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i,\; e_j^\top\hat\phi_d^l\hat\phi_d^{l\top}e_j\big]\\
&\quad= E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\, e_j^\top\hat\phi_d^l\hat\phi_d^{l\top}e_j\big] - E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\big]E\big[e_j^\top\hat\phi_d^l\hat\phi_d^{l\top}e_j\big]\\
&\quad= E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\, e_j^\top\hat\phi_d^{k,l}\hat\phi_d^{k,l\top}e_j\big] + E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\,(e_j^\top\hat\phi_d^l\hat\phi_d^{l\top}e_j - e_j^\top\hat\phi_d^{k,l}\hat\phi_d^{k,l\top}e_j)\big] - C_0^2 + o(1)\\
&\quad= \mathrm{(I)} + \mathrm{(II)} - C_0^2 + o(1).
\end{align*}
For (I), we have
\begin{align*}
\mathrm{(I)} &= E\big\{E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\, e_j^\top\hat\phi_d^{k,l}\hat\phi_d^{k,l\top}e_j\,\big|\,(F^0, L^0, E_{-M_k})\big]\big\}\\
&= E\big\{e_j^\top\hat\phi_d^{k,l}\hat\phi_d^{k,l\top}e_j\, E\big[e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\,\big|\,(F^0, L^0, E_{-M_k})\big]\big\}\\
&= E\big[e_j^\top\hat\phi_d^{k,l}\hat\phi_d^{k,l\top}e_j\,\hat\phi_d^{k\top}\Sigma_p\hat\phi_d^k\big]\\
&= C_0 E\big[\hat\phi_d^{k,l\top}\Sigma_p\hat\phi_d^{k,l}\big] + o(1) = C_0^2 + o(1).
\end{align*}
For (II), without loss of generality, we assume that $0\le\hat\phi_d^{l\top}\hat\phi_d^{k,l}\le 1$; otherwise we replace $\hat\phi_d^{k,l}$ by $-\hat\phi_d^{k,l}$. Then,
$$2\le\|\hat\phi_d^l + \hat\phi_d^{k,l}\|^2\le 4.$$
Note that $e_j$ is independent of $\hat\phi_d^l$ and $\hat\phi_d^{k,l}$; by Lemma 11,
$$E\Big[\big((\hat\phi_d^l + \hat\phi_d^{k,l})^\top e_j\big)^4\Big] \le CE\big[\|\hat\phi_d^l + \hat\phi_d^{k,l}\|^4\big] < \infty,$$
and
$$E\Big[\big(e_j^\top(\hat\phi_d^l - \hat\phi_d^{k,l})\big)^4\Big] \le CE\big[\|\hat\phi_d^l - \hat\phi_d^{k,l}\|^4\big].$$
It follows that
\begin{align*}
\mathrm{(II)} &\le \big[E(e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i)^2\big]^{1/2}\big[E(e_j^\top\hat\phi_d^l\hat\phi_d^{l\top}e_j - e_j^\top\hat\phi_d^{k,l}\hat\phi_d^{k,l\top}e_j)^2\big]^{1/2}\\
&\le \big[E(e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i)^2\big]^{1/2}\Big\{E\big[e_j^\top(\hat\phi_d^l + \hat\phi_d^{k,l})\, e_j^\top(\hat\phi_d^l - \hat\phi_d^{k,l})\big]^2\Big\}^{1/2}\\
&\le \big[E(e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i)^2\big]^{1/2}\big[E\big(e_j^\top(\hat\phi_d^l + \hat\phi_d^{k,l})\big)^4\big]^{1/4}\big[E\big(e_j^\top(\hat\phi_d^l - \hat\phi_d^{k,l})\big)^4\big]^{1/4}\\
&\le C\big[E\|\hat\phi_d^l - \hat\phi_d^{k,l}\|^4\big]^{1/4} = C'\big[E\big(1 - |\hat\phi_d^{l\top}\hat\phi_d^{k,l}|\big)^2\big]^{1/4}.
\end{align*}
Then,
\begin{align*}
\mathrm{Var}\Big(\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\Big) &= \frac{1}{n^2}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Var}\big(e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i\big) + \frac{1}{n^2}\sum_{k=1}^K\sum_{l=1}^K\sum_{i\in M_k}\sum_{\substack{j\neq i\\ j\in M_l}}\mathrm{Cov}\big(e_i^\top\hat\phi_d^k\hat\phi_d^{k\top}e_i,\, e_j^\top\hat\phi_d^l\hat\phi_d^{l\top}e_j\big)\\
&\le \frac{C}{n} + \frac{C_0}{n^2}\sum_{k=1}^K\sum_{l\neq k}\sum_{i\in M_k}\sum_{j\in M_l}\big[E\big(1 - |\hat\phi_d^{l\top}\hat\phi_d^{k,l}|\big)^2\big]^{1/4}\\
&= \frac{C}{n} + \frac{C_0}{n^2}\sum_{k=1}^K\sum_{l\neq k}n_k n_l\big[E\big(1 - |\hat\phi_d^{l\top}\hat\phi_d^{k,l}|\big)^2\big]^{1/4}\\
&\le \frac{C}{n} + \frac{C_0 K\sup_k n_k}{n^2}\sum_{k=1}^K n_k\cdot\frac{1}{K}\sum_{l\neq k}\big[E\big(1 - |\hat\phi_d^{l\top}\hat\phi_d^{k,l}|\big)^2\big]^{1/4}\\
&= \frac{C}{n} + o\big(C_0 K\sup_k n_k/n\big) = o(1).
\end{align*}
Here, the last equality follows from Lemma 10 and Assumption 4. Q.E.D.

Lemma 14 Suppose Assumptions 1–4 hold and, moreover, either Assumption 5.a or 5.b holds. Then, for $d_0 < d \le d_{\max}$,
$$\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 < \frac{2}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) + o_p(1).$$

Proof: On the one hand, when Assumptions 1–3 and Assumption 5.a hold, we have
\begin{align*}
\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) &= \sum_{s=1}^p(w_s^{k,d} - w_s^{k,d_0})\sigma^2\\
&= \sigma^2\big[\mathrm{Tr}(P_{\hat L^{k,d}}) - \mathrm{Tr}(P_{\hat L^{k,d_0}})\big] = (d - d_0)\sigma^2,
\end{align*}
while, according to Lemma 9,
$$\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 \le \frac{1}{n}\sum_{l=d_0+1}^d\hat\lambda_l, \quad a.s.$$
By Weyl's inequality, we have
$$\frac{1}{n}\sum_{l=d_0+1}^d\hat\lambda_l \le \frac{d - d_0}{n}\lambda_1(E^\top E) < 2(d - d_0)\sigma^2 = \frac{2}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) \quad a.s.$$
On the other hand, when Assumptions 1–3 and Assumption 5.b hold, note that
$$\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 = \frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\Big(\|e_i^\top\hat\phi_l^k\|^2 + 2e_i^\top\hat\phi_l^k\hat\phi_l^{k\top}L^0 f_i^0 + \|f_i^{0\top}L^{0\top}\hat\phi_l^k\|^2\Big).$$
Since $E\|e_i^\top\hat\phi_l^k\|^2 = O(1)$ and $E\|f_i^{0\top}L^{0\top}\hat\phi_l^k\|^2 = O(p\,m_{np}^{-2})$, we have
\begin{align*}
\frac{1}{np}\Big|\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}e_i^\top\hat\phi_l^k\hat\phi_l^{k\top}L^0 f_i^0\Big| &\le \frac{1}{np}\sum_{l=d_0+1}^d\Big(\sum_{k=1}^K\sum_{i\in M_k}\|e_i^\top\hat\phi_l^k\|^2\Big)^{1/2}\Big(\sum_{k=1}^K\sum_{i\in M_k}\|f_i^{0\top}L^{0\top}\hat\phi_l^k\|^2\Big)^{1/2}\\
&= O_p(p^{1/2}m_{np}^{-1}) = o_p(1),
\end{align*}
and
$$\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|f_i^{0\top}L^{0\top}\hat\phi_l^k\|^2 = O_p(p\,m_{np}^{-2}) = o_p(1).$$
In addition, according to Lemma 13, for $d_0 + 1 \le l \le d_{\max}$,
$$E\Big(\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top\hat\phi_l^k\hat\phi_l^{k\top}e_i\Big)\to C_0;\qquad
\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\hat\phi_l^{k\top}\Sigma_p\hat\phi_l^k\to C_0 \quad a.s.;$$
and
$$\mathrm{Var}\Big(\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}e_i^\top\hat\phi_l^k\hat\phi_l^{k\top}e_i\Big)\to 0.$$
Thus,
$$\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|e_i^\top\hat\phi_l^k\|^2 = \sum_{l=d_0+1}^d C_0 + o_p(1).$$
Note that $P_{\hat L^{k,d}} = \sum_{l=1}^d\hat\phi_l^k\hat\phi_l^{k\top}$; we have
$$\frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big) = \frac{1}{n}\sum_{k=1}^K\sum_{i\in M_k}\sum_{l=d_0+1}^d\hat\phi_l^{k\top}\mathrm{diag}(\Sigma_p)\hat\phi_l^k.$$
Thus,
\begin{align*}
&\frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\|x_i^\top\hat\phi_l^k\|^2 - \frac{2}{n}\sum_{k=1}^K\sum_{i\in M_k}\mathrm{Tr}\big((Q_{\hat L^{k,d}} - Q_{\hat L^{k,d_0}})\Sigma_{p,i}\big)\\
&\qquad= \frac{1}{n}\sum_{l=d_0+1}^d\sum_{k=1}^K\sum_{i\in M_k}\hat\phi_l^{k\top}\big(\Sigma_p - 2\,\mathrm{diag}(\Sigma_p)\big)\hat\phi_l^k + o_p(1) < 0 + o_p(1),
\end{align*}
since $2\,\mathrm{diag}(\Sigma_p) - \Sigma_p$ is positive definite. Q.E.D.
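The final step requires $2\,\mathrm{diag}(\Sigma_p) - \Sigma_p$ to be positive definite, which holds when cross-sectional correlations are not too strong. A small check under an assumed AR(1) covariance with $\rho = 0.2$ (an illustrative choice; the condition can fail for strongly correlated $\Sigma_p$):

```python
import numpy as np

p, rho = 50, 0.2
idx = np.arange(p)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))   # AR(1) covariance, unit variances
M = 2 * np.diag(np.diag(Sigma)) - Sigma              # here this is 2 I - Sigma
min_eig = np.linalg.eigvalsh(M)[0]
# the AR(1) spectral density is at most (1 + rho)/(1 - rho) = 1.5 < 2,
# so every eigenvalue of Sigma is below 2 and M is positive definite
```

With $\rho = 0.2$ the smallest eigenvalue of $M$ stays bounded away from zero, consistent with the weak cross-sectional dependence assumed for the idiosyncratic errors.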
