
Sankhyā : The Indian Journal of Statistics
2014, Volume 76-A, Part 2, pp. 356-378

© 2014, Indian Statistical Institute
On the Nonparametric Conditional Density and Mode
Estimates in the Single Functional Index Model
with Strongly Mixing Data
Said Attaoui
Université Djillali Liabès, Sidi Bel Abbès, Algerie
Université des Sciences et de la Technologie, Mohamed Boudiaf,
El Mnaouer-Oran, Algerie
Abstract
This study focuses on the nonparametric estimation of the conditional
density of a scalar response variable given a random variable taking values
in a separable Hilbert space. Under general conditions, we establish the
almost complete convergence rates of the conditional density estimator under
α-mixing dependence, based on the single-index structure. We also demonstrate
the impact of this functional parameter on the mode estimation. Finally, the
estimation of the functional index via the pseudo-maximum likelihood method
is discussed but not tackled.
AMS (2000) subject classification. Primary 62G05, Secondary 62G07, 62G08.

Keywords and phrases. Conditional density estimation, Conditional mode
estimation, Functional Hilbert spaces, Single-index model, α-mixing
dependency.
1 Introduction
During the last two decades, functional data analysis (FDA) has become
increasingly popular in modern statistics. Owing to the rapid development of
accurate measuring instruments, measurements can be taken continuously over
a period of time to produce data in functional form (i.e. curves)
(see Ramsay and Dalzell (1991)). Moreover, the modelling of variables
represented by curves has attracted the interest of several authors in the
recent statistical literature, and it has provided a powerful tool for
exploratory functional data analysis. Several variations have been proposed
in Ramsay and Silverman (2002, 2005) for parametric models and in Ferraty and
Vieu (2006) for nonparametric conditional models (mean, mode, ...), and we
refer to Ferraty (2011) for recent advances. The goal of this work is to
contribute to the functional data literature by studying some classes of
semiparametric models. Such models are important in statistical and
econometric modelling: owing to their flexibility for dimension reduction,
they provide new ways to investigate problems in substantive economics
(see Horowitz (2009)). As a particular case, the single-index model has
proven useful as a compromise between nonparametric and parametric models.
Single-index models in which the explanatory variable is an element of a
finite-dimensional space have been studied extensively in both the
statistical and the econometric literature; we quote, for instance, Härdle
et al. (1993), Horowitz (1996), Hristache et al. (2001a, b) and Delecroix et
al. (2003). The main focus of this work is to study the functional
single-index model for dependent data by treating the problem of estimating
the conditional density of a real variable Y given a functional covariate X,
when the explanation of Y given X is done through its projection on one
functional direction.
In the nonparametric regression setting, it is well known that the
conditional density function is very useful for forecasting and statistical
inference. Its nonparametric estimation has been widely studied in the
multivariate case; see Roussas (1968), Rosenblatt (1969) and Youndjé (1993).
It provides one of the best tools to estimate some characteristic features
of the dataset, such as the conditional mode. Indeed, the latter has received
considerable attention; see, for instance, Collomb et al. (1987), Samanta
and Thavaneswaran (1990), Quintela-del-Río and Vieu (1997) and Berlinet et
al. (1998), among others. However, little attention has been devoted to the
case where the explanatory variable takes values in abstract spaces. Starting
with Gasser et al. (1998), who gave an approach to the nonparametric
estimation of the mode when the data are curves, Ferraty et al. (2006) showed
that the conditional mode provides an alternative method for prediction and
gives results slightly better than classical regression. The consistency and
the asymptotic normality of the nonparametric conditional mode estimator were
obtained by Ezzahrioui and Ould Saïd (2008). Quintela-del-Río et al. (2011)
gave some recent results about nonparametric conditional density estimation
with econometric applications.
For functional single-index models, the literature is quite limited,
and only a few theoretical results have been obtained until now. The first
asymptotic properties in the fixed functional single-index model were obtained
by Ferraty et al. (2003). They established the almost complete convergence,
in the i.i.d. case, of the link regression function of this model. Their
results were extended to the dependent case by Aït Saidi et al. (2005). For
the case where the functional single index is unknown, Aït Saidi et al.
(2008) proposed an estimator of this parameter based on the cross-validation
procedure. More recently, Ferraty and Park (2011) proposed a new estimator of
the single index based on the idea of functional derivative estimation.
Attaoui et al. (2011) studied the functional single-index model via its
conditional density kernel estimator, and they established its pointwise and
uniform almost complete convergence (a.co.)¹ rates. Our purpose is to extend
the results obtained by Attaoui et al. (2011) in the i.i.d. case to the
dependent case. We establish the almost complete convergence rate of the
kernel estimator of the conditional density when the data satisfy the
α-mixing hypothesis, based on the single-index structure. Recall that a
process (Xi , Yi )i≥1 is called α-mixing or strongly mixing (see Lin and Lu
(1996) for more details and examples) if
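\[
\sup_{A \in \mathcal{F}_1^k} \ \sup_{B \in \mathcal{F}_{n+k}^{\infty}}
|P(A \cap B) - P(A)P(B)| = \alpha(n) \to 0 \quad \text{as } n \to \infty,
\]
where $\mathcal{F}_i^j$ is the σ-field generated by $X_i, \ldots, X_j$.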

In the context of nonparametric functional estimation, several works rely on
the α-mixing hypothesis; we quote, for instance, Ferraty and Vieu (2006) for
regression and time series prediction. Dabo-Niang and Laksaci (2007)
stated the consistency in Lp norm of the mode estimator. Ezzahrioui and
Ould Saïd (2010) obtained the asymptotic normality of a functional
nonparametric conditional mode estimator. Demongeot et al. (2010) established
the asymptotic behavior (the pointwise a.co. convergence rates) of the
nonparametric local linear estimator of the conditional density, with
application to time series and prediction problems. For the asymptotic
normality of the robust regression and the quantile, see Attouch et al. (2010)
and Laksaci et al. (2011) respectively.
The rest of the paper is organized as follows. We present our model in the
next Section. In Section 3 we introduce notations and assumptions and state
the main results. Section 4 is devoted to the conditional mode estimate; we
show that the properties of the estimator of this parameter depend on those
of the conditional density estimator. Some discussion of the functional single
index estimate is given in Section 5. The proofs of the results are postponed
to the last Section.
¹ We say that the sequence $(W_n)_n$ converges a.co. to zero if and only if,
for all $\eta_0 > 0$, $\sum_{n \ge 1} P(|W_n| > \eta_0) < \infty$. Furthermore,
we say that $W_n = O_{a.co.}(w_n)$ if there exists $\eta > 0$ such that
$\sum_{n \ge 1} P(|W_n| > \eta w_n) < \infty$. Note that this type of
convergence implies both almost-sure convergence and convergence in
probability.
2 Model
Let $\{(X_i, Y_i), 1 \le i \le n\}$ be n random variables, identically
distributed as the random pair (X, Y) with values in
$\mathcal{H} \times \mathbb{R}$, where $\mathcal{H}$ is a separable real
Hilbert space with the norm $\|\cdot\|$ generated by an inner product
$\langle \cdot, \cdot \rangle$. We consider the semi-metric $d_\theta$
associated with the single index $\theta \in \mathcal{H}$, defined for all
$x_1, x_2 \in \mathcal{H}$ by
$d_\theta(x_1, x_2) := |\langle x_1 - x_2, \theta \rangle|$. Under such
topological structure and for a fixed functional θ, we suppose that the
conditional density of Y given X = x, denoted by $f(\cdot|x)$, exists and is
given by
\[
\forall y \in \mathbb{R}, \quad f_\theta(y|x) =: f(y \,|\, \langle x, \theta \rangle). \tag{1}
\]
By considering the same conditions as Ferraty et al. (2003) on the regression
operator, the identifiability of the model is assumed. More precisely, we
suppose that f is differentiable with respect to (w.r.t.) x and θ, and that
$\langle \theta, e_1 \rangle = 1$, where $e_1$ is the first eigenvector of an
orthonormal basis of the space $\mathcal{H}$. Clearly, we have for all
$x \in \mathcal{H}$,
\[
f_1(\cdot \,|\, \langle x, \theta_1 \rangle) = f_2(\cdot \,|\, \langle x, \theta_2 \rangle)
\ \Longrightarrow\ f_1 \equiv f_2 \ \text{and} \ \theta_1 = \theta_2.
\]
In what follows, we denote by $f(\theta, \cdot, x)$ the conditional density of
Y given $\langle x, \theta \rangle$, and we define the kernel estimator
$\widehat{f}(\theta, \cdot, x)$ of $f(\theta, \cdot, x)$ by
\[
\widehat{f}(\theta, y, x) =
\frac{h_H^{-1} \sum_{i=1}^{n} K\big(h_K^{-1} d_\theta(x, X_i)\big)\, H\big(h_H^{-1}(y - Y_i)\big)}
     {\sum_{i=1}^{n} K\big(h_K^{-1} d_\theta(x, X_i)\big)},
\quad \forall y \in \mathbb{R}, \tag{2}
\]
with the convention 0/0 = 0, where K and H are kernel functions and
$h_K := h_{n,K}$ (resp. $h_H := h_{n,H}$) is a sequence of smoothing parameters
decreasing to zero as n goes to infinity.
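As an illustration only, the estimator (2) can be sketched in a few lines of
Python for curves observed on a common, equally spaced grid. Everything beyond
formula (2) is an assumption made for this sketch and is not part of the
paper: the Riemann-sum approximation of the inner product, the particular
kernel K (supported on [0, 1] and bounded away from zero there), the Gaussian
choice for H, and the function names.

```python
import math

def inner(u, v, dt):
    """Riemann-sum approximation of <u, v> for curves sampled with spacing dt
    (an assumption of this sketch; any quadrature rule could be used)."""
    return sum(a * b for a, b in zip(u, v)) * dt

def K(t):
    """Illustrative kernel supported on [0, 1], with 0.5 <= K <= 1.5 there."""
    return 1.5 - t if 0.0 <= t <= 1.0 else 0.0

def H(t):
    """Illustrative kernel for the response: the standard Gaussian density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def cond_density(theta, y, x, X, Y, h_K, h_H, dt):
    """Kernel estimate of the conditional density of Y at y given <x, theta>,
    following formula (2), with the convention 0/0 = 0."""
    num = 0.0
    den = 0.0
    for X_i, Y_i in zip(X, Y):
        diff = [a - b for a, b in zip(x, X_i)]
        d = abs(inner(diff, theta, dt))      # d_theta(x, X_i)
        w = K(d / h_K)
        num += w * H((y - Y_i) / h_H)
        den += w
    if den == 0.0:                           # no curve falls in the ball
        return 0.0
    return num / (h_H * den)
```

Only the curves X_i whose projection on θ lies within h_K of the projection of
x receive a positive weight, which is how the single-index semi-metric enters
the estimate.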
3 Main Results
3.1. Pointwise Almost Complete Convergence. Throughout the paper,
when no confusion is possible, we denote by C and/or C' some strictly
positive generic constants whose values are allowed to change.
We put, for any $x \in \mathcal{H}$ and i = 1, ..., n,
$K_i(\theta, x) := K(h_K^{-1} d_\theta(x, X_i))$, and for all
$y \in \mathbb{R}$, $H_i(y) := H(h_H^{-1}(y - Y_i))$. We denote by
$B_\theta(x, h_K) := \{x_1 \in \mathcal{H} : |\langle x - x_1, \theta \rangle| \le h_K\}$
the ball of center x and radius $h_K$.
Next, to give our main result, we need to introduce the following basic
assumptions
(H1) P(X ∈ Bθ (x, hK )) =: φθ,x (hK ) > 0, φθ,x (hK ) → 0, as hK → 0.

(H2) The conditional density $f(\theta, y, x)$ satisfies the Hölder condition,
that is: $\forall (y_1, y_2) \in \mathcal{C}^2$,
$\forall (x_1, x_2) \in N_x \times N_x$,
\[
|f(\theta, y_1, x_1) - f(\theta, y_2, x_2)| \le
C_{\theta,x} \big( d_\theta(x_1, x_2)^{\beta_1} + |y_1 - y_2|^{\beta_2} \big),
\quad \beta_1 > 0, \ \beta_2 > 0,
\]
where $N_x$ is a fixed neighborhood of x and $\mathcal{C}$ is a fixed compact
subset of $\mathbb{R}$.
(H3) H is a bounded function such that
\[
\int H(t)\,dt = 1, \quad \int |t|^{\beta_2} H(t)\,dt < \infty
\quad \text{and} \quad \int H^2(t)\,dt < \infty.
\]
(H4) K is a bounded continuous function such that
\[
0 < C\, 1_{[0,1]}(t) < K(t) < C'\, 1_{[0,1]}(t) < \infty.
\]
(H5) $(X_i, Y_i)_{i \in \mathbb{N}}$ is a strongly mixing sequence whose mixing
coefficient α(n) satisfies
\[
\exists a > (5 + \sqrt{17})/2, \ \exists C > 0 : \ \forall n \in \mathbb{N}, \
\alpha(n) \le C n^{-a}.
\]

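(H6)
\[
0 < \sup_{i \ne j} P\big( (X_i, X_j) \in B_\theta(x, h_K) \times B_\theta(x, h_K) \big)
= O\left( \frac{\phi_{\theta,x}(h_K)^{(a+1)/a}}{n^{1/a}} \right).
\]
(H7) The bandwidths satisfy:
(H7a) $\lim_{n \to \infty} h_K = 0$ and
$\lim_{n \to \infty} \dfrac{\log n}{n h_H \phi_{\theta,x}(h_K)} = 0$;
(H7b) $\exists \beta > 0$, $C_1, C_2 > 0$ such that
$C_1 n^{\frac{3-a}{a+1} + \beta} \le h_H \phi_{\theta,x}(h_K)$
and $\phi_{\theta,x}(h_K) \le C_2 n^{\frac{1}{1-a}}$.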
3.2. Comments and Remarks.
1. Remarks on the Assumptions: Our conditions are very standard in
this context. Assumption (H1) characterizes the concentration of the
probability measure of the functional variable on small balls; it is the basic
condition used in the majority of the existing works dealing with this
kind of variable. In our work we consider a single-index topological
structure, under which the concentration of measure changes slightly. More
precisely, the quantity in (H1) can be written as
$P(|\langle x - X, \theta \rangle| \le h_K) =: \phi_{\theta,x}(h_K)$.
The regularity of the functional space of our model is controlled by means of
condition (H2), which is needed to evaluate the bias of estimation. In order
to establish the almost complete convergence rate of our model under the
α-mixing hypothesis, we need assumption (H6), which describes the asymptotic
behavior of the joint distribution of the couple $(X_i, X_j)$. Note that the
asymptotic expression imposed in (H6) is obtained, as n goes to infinity,
from the following equivalence:
\[
\frac{\sup_{i \ne j} P\big( (X_i, X_j) \in B_\theta(x, h_K) \times B_\theta(x, h_K) \big)}
     {P\big( X \in B_\theta(x, h_K) \big)}
\sim \left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a}.
\]
Assumption (H6) can also be seen differently, based on the idea of
maximum concentration between the quantities $P(X_i \in B_\theta(x, h_K))$ and
$P(X_j \in B_\theta(x, h_K))$ (see Ferraty et al. (2005)). The remaining
assumptions are technical conditions imposed for brevity of the proofs.
2. Remarks on the single index: It is well known that one of the main
advantages of the single-index model is its ability to deal with the problem
of high-dimensional data. A straightforward example is the optimal
convergence rate of type $O(n^{-2k/(2k+p)})$ for the estimation of a k-times
differentiable regression function: this rate goes to zero dramatically
slowly if k is small compared to the dimension p of the explanatory
variable ($X \in \mathbb{R}^p$). In this regard, Gaïffas and Lecué (2007)
showed that the optimal rate of convergence of the regression function in the
single-index model is of order $O(n^{-2k/(2k+1)})$ (rather than
$O(n^{-2k/(2k+p)})$). The same idea was adopted in abstract metric spaces
through the choice of a semi-metric increasing the concentration of the
probability measure of the explanatory variable in small balls (see Ferraty et
al. (2006), Section 13.2). Among this family of semi-metrics, one can consider
the semi-metric induced by the functional single-index estimate.
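To make the comparison of the two rates concrete, the short Python sketch
below evaluates $n^{-2k/(2k+p)}$ for a chosen sample size, smoothness and
dimension. The function name and the numerical values of n, k and p are
illustrative assumptions, not figures taken from the paper.

```python
def rate(n, k, p):
    """Value of n^(-2k/(2k+p)): the nonparametric rate for a k-times
    differentiable regression function of a p-dimensional covariate."""
    return n ** (-2.0 * k / (2.0 * k + p))

# Hypothetical setting: n = 10000 observations, smoothness k = 2.
full_dimension = rate(10_000, 2, 10)   # covariate in R^10
single_index = rate(10_000, 2, 1)      # single-index structure: p replaced by 1
print(full_dimension, single_index)
```

With these hypothetical values the single-index rate is smaller by roughly two
orders of magnitude, which is the dimension-reduction effect described above.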
Now, we present our main result in the following theorem.
Theorem 1. Under Assumptions (H1)-(H7), and for any fixed y, we
have, as n goes to infinity,
\[
\widehat{f}(\theta, y, x) - f(\theta, y, x) = O(h_K^{\beta_1}) + O(h_H^{\beta_2})
+ O_{a.co.}\left( \sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}} \right). \tag{3}
\]
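Proof of Theorem 1. Similarly to Attaoui et al. (2011), the proof is based
on the following decomposition:
\[
\widehat{f}(\theta, y, x) - f(\theta, y, x) =
\frac{1}{\widehat{f}_D(\theta, x)}
\Big\{ \big( \widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x) \big)
     + \big( E \widehat{f}_N(\theta, y, x) - f(\theta, y, x) \big) \Big\}
- \frac{f(\theta, y, x)}{\widehat{f}_D(\theta, x)}
  \big( \widehat{f}_D(\theta, x) - 1 \big),
\]
where
\[
\widehat{f}_N(\theta, y, x) = \frac{1}{n h_H E[K_1(\theta, x)]}
\sum_{i=1}^{n} H_i(y) K_i(\theta, x),
\qquad
\widehat{f}_D(\theta, x) = \frac{1}{n E[K_1(\theta, x)]}
\sum_{i=1}^{n} K_i(\theta, x),
\]
together with the facts that
$\widehat{f}(\theta, y, x) = \widehat{f}_N(\theta, y, x) / \widehat{f}_D(\theta, x)$
and $E[\widehat{f}_D(\theta, x)] = 1$.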
Finally, the proof of Theorem 1 is a direct consequence of the following
Lemmas.
Lemma 1. (See Attaoui et al. (2011), Lemma 3.3.) Under Assumptions
(H1)-(H3), as n goes to infinity, we have
\[
E \widehat{f}_N(\theta, y, x) - f(\theta, y, x) = O(h_K^{\beta_1}) + O(h_H^{\beta_2}). \tag{4}
\]
Lemma 2. Under Assumptions (H1)-(H7), as n goes to infinity, we have
\[
\widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x)
= O_{a.co.}\left( \sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}} \right). \tag{5}
\]
Lemma 3. Under Assumptions (H1), (H4)-(H7), as n goes to infinity,
we have
\[
\widehat{f}_D(\theta, x) - 1
= O_{a.co.}\left( \sqrt{\frac{\log n}{n \phi_{\theta,x}(h_K)}} \right). \tag{6}
\]
Furthermore, we have
\[
\sum_{n=1}^{\infty} P\big( \widehat{f}_D(\theta, x) \le 1/2 \big) < \infty. \tag{7}
\]
4. On the Conditional Mode Estimate

In this section we consider the problem of estimating the conditional mode
in the functional single-index model, defined as the value which
maximizes the conditional density. We assume that for a given x there
exists some compact subset
$\mathcal{C} = (\mu_\theta(x) - \zeta, \mu_\theta(x) + \zeta)$, ζ > 0, such that
$\mu_\theta(x) = \arg\sup_{y \in \mathcal{C}} f(\theta, y, x)$ (supposed to be
unique). This can be expressed by the following assumption:
(H8) $\exists \zeta > 0$ such that $f(\theta, \cdot, x)$ is increasing on
$(\mu_\theta - \zeta, \mu_\theta)$ and decreasing on
$(\mu_\theta, \mu_\theta + \zeta)$.
For brevity of the proofs, we also need to introduce the following technical
conditions:
(H9) $\forall y_1, y_2 \in \mathbb{R}$, $|H(y_1) - H(y_2)| \le C |y_1 - y_2|$;
moreover, $\lim_{n \to \infty} h_H = 0$ and
$\lim_{n \to \infty} n^{\alpha} h_H = \infty$ for some α > 0.
(H10) The kernel K satisfies (H4) and the following Lipschitz condition:
$\forall t_1, t_2 \in \mathbb{R}$, $|K(t_1) - K(t_2)| \le C |t_1 - t_2|$.
Thus, a natural nonparametric estimator of $\mu_\theta$, denoted
$\widehat{\mu}_\theta$, is given by
\[
\widehat{\mu}_\theta(x) = \arg\sup_{y \in \mathcal{C}} \widehat{f}(\theta, y, x). \tag{8}
\]
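In practice the supremum in (8) has to be approximated numerically. The
following Python sketch does so by a plain grid search over an interval; the
regular grid, its default size and the function name are assumptions of this
illustration and are not prescribed by the paper. The argument `density` stands
for any function y -> estimated conditional density at y, for fixed θ and x.

```python
def conditional_mode(density, y_min, y_max, n_grid=201):
    """Approximate arg sup over [y_min, y_max] of y -> density(y) by
    evaluating the function on a regular grid of n_grid points."""
    step = (y_max - y_min) / (n_grid - 1)
    best_y = y_min
    best_val = density(y_min)
    for k in range(1, n_grid):
        y = y_min + k * step
        val = density(y)
        if val > best_val:        # keep the first maximiser in case of ties
            best_y, best_val = y, val
    return best_y

# Toy usage with a known unimodal function whose maximiser is y = 1.
mode = conditional_mode(lambda y: -(y - 1.0) ** 2, 0.0, 2.0)
```

The accuracy of such a search is limited by the grid spacing, so the grid
should be chosen fine relative to the bandwidth used for the response.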
The j-th order derivative of the conditional density $f(\theta, y, x)$ of the
variable Y given X = x, denoted $f^{(j)}(\theta, y, x)$, satisfies:
(H11) $f^{(j)}(\theta, y, x)$ is j-times continuously differentiable with
respect to y on $(\mu_\theta - \zeta, \mu_\theta + \zeta)$,
(H12) $f^{(l)}(\theta, \mu_\theta, x) = 0$ if $1 \le l < j$, and
$0 < |f^{(j)}(\theta, \mu_\theta, x)| < \infty$.
It is worth noting that the flatness of the function f (θ, y, x) around the
mode μθ (x) controlled by the number of derivatives of f (θ, y, x) vanishing
at the point y = μθ (x) (conditions (H11) and (H12)) has a great impact on
the asymptotic rates of the mode estimate, which is given in the next result.
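Theorem 2. If the conditions of Theorem 1 hold together with (H8)-(H12),
then we have, as n goes to infinity,
\[
\widehat{\mu}_\theta - \mu_\theta
= O\left( h_K^{\beta_1/j} + h_H^{\beta_2/j} \right)
+ O_{a.co.}\left( \left( \frac{\log n}{n h_H \phi_{\theta,x}(h_K)} \right)^{\frac{1}{2j}} \right). \tag{9}
\]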
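Proof of Theorem 2. Under (H11) and (H12), by using the following
Taylor expansion of $f(\theta, y, x)$ in a neighborhood of $\mu_\theta$, we get
\[
f(\theta, \widehat{\mu}_\theta, x) = f(\theta, \mu_\theta, x)
+ \frac{f^{(j)}(\theta, \mu_\theta^*, x)}{j!} (\widehat{\mu}_\theta - \mu_\theta)^j, \tag{10}
\]
for some $\mu_\theta^*$ between $\mu_\theta$ and $\widehat{\mu}_\theta$.
Combining the latter equality with the fact that
\[
|f(\theta, \widehat{\mu}_\theta, x) - f(\theta, \mu_\theta, x)| \le
2 \sup_{\theta \in \Theta_{\mathcal{H}}} \sup_{y \in \mathcal{C}}
|\widehat{f}(\theta, y, x) - f(\theta, y, x)|
\]
allows us to write
\[
|\widehat{\mu}_\theta - \mu_\theta|^j \le
\frac{j!}{f^{(j)}(\theta, \mu_\theta^*, x)}
\sup_{\theta \in \Theta_{\mathcal{H}}} \sup_{y \in \mathcal{C}}
|\widehat{f}(\theta, y, x) - f(\theta, y, x)|.
\]
We need to show that
\[
\exists c > 0 \ \text{such that} \
\sum_{n=1}^{\infty} P\big( f^{(j)}(\theta, \mu_\theta^*, x) < c \big) < \infty. \tag{11}
\]
This result can be obtained directly by using the second part of (H12)
together with the following Lemma.
Lemma 4. Under Assumptions (H1)-(H7) together with (H8)-(H10)
and (H12), we have, as n goes to infinity,
\[
\widehat{\mu}_\theta - \mu_\theta \to 0, \quad a.co. \tag{12}
\]
So, we would have
\[
|\widehat{\mu}_\theta - \mu_\theta|^j = O_{a.co.}\left(
\sup_{\theta \in \Theta_{\mathcal{H}}} \sup_{y \in \mathcal{C}}
|\widehat{f}(\theta, y, x) - f(\theta, y, x)| \right). \tag{13}
\]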
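Finally, Theorem 2 can be deduced from the following Lemmas.

Lemma 5. Under Assumptions (H1)-(H3), we have, as n goes to infinity,
\[
\sup_{\theta \in \Theta_{\mathcal{H}}} \sup_{y \in \mathcal{C}}
\big| E \widehat{f}_N(\theta, y, x) - f(\theta, y, x) \big|
= O(h_K^{\beta_1}) + O(h_H^{\beta_2}). \tag{14}
\]
Lemma 6. Under Assumptions (H1)-(H7), together with (H8)-(H12),
we have, as n goes to infinity,
\[
\sup_{\theta \in \Theta_{\mathcal{H}}} \sup_{y \in \mathcal{C}}
\big| \widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x) \big|
= O_{a.co.}\left( \sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}} \right). \tag{15}
\]
Lemma 7. Under Assumptions (H1), (H2) and (H4)-(H7), together
with (H8)-(H12), we have, as n goes to infinity,
\[
\sup_{\theta \in \Theta_{\mathcal{H}}} \big| \widehat{f}_D(\theta, x) - 1 \big|
= O_{a.co.}\left( \sqrt{\frac{\log n}{n \phi_{\theta,x}(h_K)}} \right). \tag{16}
\]
Corollary 1. Under the Assumptions of Lemma 7, we have
\[
\sum_{n=1}^{\infty} P\left( \inf_{\theta \in \Theta_{\mathcal{H}}}
\widehat{f}_D(\theta, x) \le \frac{1}{2} \right) < \infty.
\]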
5. About the Functional Single Index Estimate

One aim of our study is to show how the conditional density
estimate can be used to derive an estimate of the functional single index
if the latter is unknown. The leave-one-curve-out cross-validation procedure
was adapted by Aït Saidi et al. (2008) to estimate the single index.
More recently, Hall and Müller (2009) proposed a method for estimating
functional derivatives, and Ferraty and Park (2011) adopted this technique to
estimate the parameter θ. Alternatively, this parameter can be estimated via
the pseudo-maximum likelihood method, which is based on the preliminary
estimation of the conditional density of Y given X, by
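\[
\widehat{\theta} = \arg\max_{\theta \in \Theta_{\mathcal{F}}} \widehat{L}(\theta),
\]
where
\[
\widehat{L}(\theta) = \frac{1}{n} \sum_{j=1}^{n} \log \widehat{f}(\theta, Y_j, X_j).
\]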
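A minimal Python sketch of this criterion is given below, under several
assumptions that are not in the paper: the curves are discretized on a common
grid with spacing dt, the maximization is restricted to a finite list of
candidate directions, the kernels K and H are illustrative choices, and a tiny
floor is applied before taking the logarithm to avoid log(0). The function
names are hypothetical.

```python
import math

def K(t):
    """Illustrative kernel supported on [0, 1]."""
    return 1.5 - t if 0.0 <= t <= 1.0 else 0.0

def H(t):
    """Illustrative kernel for the response: the standard Gaussian density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def project(curve, theta, dt):
    """Riemann-sum approximation of <curve, theta>."""
    return sum(a * b for a, b in zip(curve, theta)) * dt

def pseudo_log_likelihood(theta, X, Y, h_K, h_H, dt, floor=1e-300):
    """(1/n) * sum_j log f_hat(theta, Y_j, X_j), with f_hat the kernel
    conditional density estimate; `floor` is a numerical safeguard."""
    proj = [project(X_i, theta, dt) for X_i in X]
    n = len(Y)
    total = 0.0
    for j in range(n):
        num = 0.0
        den = 0.0
        for i in range(n):
            # d_theta(X_j, X_i) = |<X_j - X_i, theta>| by linearity
            w = K(abs(proj[j] - proj[i]) / h_K)
            num += w * H((Y[j] - Y[i]) / h_H)
            den += w
        f_hat = num / (h_H * den) if den > 0.0 else 0.0
        total += math.log(max(f_hat, floor))
    return total / n

def select_index(candidates, X, Y, h_K, h_H, dt):
    """Candidate direction with the largest pseudo-log-likelihood."""
    return max(candidates,
               key=lambda th: pseudo_log_likelihood(th, X, Y, h_K, h_H, dt))
```

Searching over a finite candidate set is only a convenience of the sketch; the
criterion itself is defined over the whole parameter set.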
Note that this method was studied by Delecroix et al. (2003) in the
real case, where they showed that this technique has minimal variance among
all estimators. The asymptotic optimality of this procedure in functional
statistics is an important prospect of the present work.
As an application, this approach can be used to answer the semi-metric
choice question. Indeed, it is well known that, in nonparametric
functional statistics, the projection-type semi-metric is very important for
increasing the concentration property. The functional index model is a
particular case of this family of semi-metrics, because it is based on the
projection on one functional direction. So, the estimation procedures for this
direction permit us to compute adaptive semi-metrics in the general context of
nonparametric functional data analysis. Finally, the theoretical justification
and the practical implementation remain to be established.
Acknowledgements. The author wishes to express his thanks to the
Editor, the Associate Editor and two anonymous referees for their valuable
comments and suggestions which significantly helped to improve the quality
of the paper.
References
Aït Saidi, A., Ferraty, F., Kassa, R. and Vieu, P. (2005). Single functional index model
for a time series. R. Roumaine. Math. Pures et Appl., 50, 321–330.
Aït Saidi, A., Ferraty, F., Kassa, R. and Vieu, P. (2008). Cross-validated estimation in
the single functional index model. Statistics, 42, 475–494.
Attaoui, S., Laksaci, A. and Ould-Saïd, E. (2011). A note on the conditional density
estimate in the single functional index model. Statist. Probab. Lett., 81, 45–53.
Attouch, M., Laksaci, A. and Ould-Saïd, E. (2010). Asymptotic normality of a robust
estimator of the regression function for functional time series data. J. Korean Stat.
Soc., 39, 489–500.
Berlinet, A., Gannoun, A. and Matzner-Lober, E. (1998). Normalité asymptotique
d’estimateurs convergents du mode conditionnel. Canad. J of Statist., 26, 365–380.
Collomb, G., Härdle, W. and Hassani, S. (1987). A note on prediction via conditional
mode estimation. J. Statist. Plann. and Inf., 15, 227–236.
Dabo-Niang, S. and Laksaci, A. (2007). Estimation nonparamétrique du mode conditionnel
pour variable explicative fonctionnelle. Pub. Inst. Stat. Univ., 3, 27–42.
Delecroix, M., Härdle, W. and Hristache, M. (2003). Efficient estimation in conditional
single-index regression. J. Multivariate Anal., 86, 213–226.
Demongeot, J., Laksaci, A., Madani, F. and Mustapha, R. (2010). A Fast Functional
Locally Modeled Conditional Density and Mode for Functional Time-Series. In Recent
advances in functional data analysis and related topics. Contribution to statistics. (F.
Ferraty, ed.). Physica.
Ezzahrioui, M. and Ould Saïd, E. (2008). Asymptotic normality of the kernel estimators
of the conditional mode for functional data. J. Nonparametric Statist., 20, 1–18.
Ezzahrioui, M. and Ould Saïd, E. (2010). Some asymptotic results of a nonparametric
conditional mode estimator for functional time series data. Statist. Neerlandica, 64,
171–201.
Ferraty, F., Peuch, A. and Vieu, P. (2003). Modèle à indice fonctionnel simple. C. R.
Mathématiques Paris, 336, 1025–1028.
Ferraty, F., Laksaci, A. and Vieu, P. (2005). Functional time series prediction via condi-
tional mode. C. R., Math., Acad. Sci. Paris, 340, 5, 389–392.
Ferraty, F. and Vieu, P. (2006). Nonparametric functional data analysis. Theory and
Practice. Springer-Verlag.
Ferraty, F., Laksaci, A. and Vieu, P. (2006). Estimating some characteristics of the condi-
tional distribution in nonparametric functional models. Statist. Inf. for Stoch. Proc.,
9, 47–76.
Ferraty, F. and Vieu, P. (2008). Erratum of: Non-parametric models for functional data,
with application in regression, time-series prediction and curve estimation. Non-
paramet. Statist., 20, 187–189.
Ferraty, F. (2011). Recent advances in functional data analysis and related topics.
Physica-Verlag.
Ferraty, F. and Park, J. (2011). Estimation of a Functional Single Index Model. In Recent
advances in functional data analysis and related topics. Contribution to statistics. (F.
Ferraty, ed.). Physica.
Gaïffas, S. and Lecué, G. (2007). Optimal rates and adaptation in the single-index model
using aggregation. Electr. J of Statist., 1, 538–573.
Gasser, T., Hall, P. and Presnell, B. (1998). Nonparametric estimation of the mode of
a distribution of random curves. J. Roy. Statist. Soc. Ser. B, 60, 681–691.
Hall, P. and Müller, H.G. (2009). Estimation of functional derivatives. Ann. Statist., 37,
3307–3329.
Hristache, M., Juditsky, A. and Spokoiny, V. (2001a). Direct estimation of the index
coefficient in the single-index model. Ann. Statist., 29, 595–623.
Hristache, M., Juditsky, A., Polzehl, J. and Spokoiny, V. (2001b). Structure Adaptive
approach for dimension reduction. Ann. Statist., 29, 1537–1566.
Horowitz, J.L. (1996). Semiparametric estimation of a regression model with an unknown
transformation of the dependent variable. Econometr., 64, 103–137.
Horowitz, J.L. (2009). Semiparametric and Nonparametric Methods in Econometrics.
Springer Series in Statistics.
Laksaci, A., Lemdani, M. and Ould-Saïd, E. (2011). Asymptotic results for an L1-norm
kernel estimator of the conditional quantile for functional dependent data with ap-
plication to climatology. Sankhyā, 73-A, Part 1, 125–141.
Lin, Z. and Lu, C. (1996). Limit theory of mixing dependent random variables. Mathematics
and its applications, Sciences Press, Kluwer Academic Publishers, Beijing.
Quintela-del-Río, A. and Vieu, Ph. (1997). A nonparametric conditional mode estimate.
J. Nonparametr. Stat., 8, 253–268.
Quintela-del-Río, A., Ferraty, F. and Vieu, Ph. (2011). Nonparametric Conditional Den-
sity Estimation for Functional Data. Econometric Applications. In Recent advances
in functional data analysis and related topics. Contribution to statistics. (F. Ferraty,
ed.). Physica.
Ramsay, J.O. and Dalzell, C. (1991). Some tools for functional data analysis. J. R. Statist.
Soc. B, 53, 539–572.
Ramsay, J.O. and Silverman, B.W. (2002). Applied functional data analysis: Methods and
Case Studies. Springer-Verlag, New York.
Ramsay, J.O. and Silverman, B.W. (2005). Functional data analysis, 2nd edn. Springer,
New York.
Rio, E. (2000). Théorie asymptotique des processus aléatoires faiblement dépendants (in
French). Springer, Mathématiques & Applications 31.
Rosenblatt, M. (1969). Conditional probability density and regression estimates. In Multi-
variate Analysis II. (P. R. Krishnaiah, ed.). Academic Press, New York N.Y., pp. 25–
31.
Roussas, G.G. (1968). On some properties of nonparametric estimates of probability den-
sity functions. Bull. Soc. Math. Greece (N.S.) 9, 29–43.
Samanta, M. and Thavaneswaran, A. (1990). Nonparametric estimation of the conditional
mode. Comm. Statist. Theory Methods, 19, 4515–4524.
Youndjé, E. (1993). Estimation nonparamétrique de la densité conditionnelle par la
méthode du noyau (in French). PhD thesis, University of Rouen.
Appendix
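Proof of Lemma 2. By definition of the a.co. convergence, we need to
show that, for some η > 0,
\[
\sum_{n=1}^{\infty} P\left(
\big| \widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x) \big|
> \eta \sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}} \right) < \infty.
\]
To do that, we write
\[
\widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x)
= \frac{1}{n h_H E[K_1(\theta, x)]}
  \sum_{i=1}^{n} \big( H_i(y) K_i(\theta, x) - E[H_i(y) K_i(\theta, x)] \big)
= \frac{1}{n h_H E[K_1(\theta, x)]} \sum_{i=1}^{n} \Delta_i(x),
\]
with
\[
\Delta_i(x) = H_i(y) K_i(\theta, x) - E[H_i(y) K_i(\theta, x)].
\]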
So, the proof of this Lemma is based on the application of the Fuk-Nagaev
exponential-type inequality (see Rio (2000), p. 87). For that, we
need to evaluate the asymptotic behavior of the quantity $s_n^2$ defined as
\[
s_n^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} |\mathrm{Cov}(\Delta_i, \Delta_j)|
= s_n^{2*} + n \mathrm{Var}(\Delta_i).
\]
For the quantity $s_n^{2*}$, we merely split it into
\[
s_n^{2*} = \sum_{i \ne j} |\mathrm{Cov}(\Delta_i, \Delta_j)| = J_{1,n} + J_{2,n},
\]
where $J_{1,n}$ and $J_{2,n}$ are the sums of covariances over the sets $S_1$
and $S_2$ respectively, defined by
\[
J_{1,n} = \sum_{S_1} |\mathrm{Cov}(\Delta_i, \Delta_j)|
\quad \text{and} \quad
J_{2,n} = \sum_{S_2} |\mathrm{Cov}(\Delta_i, \Delta_j)|,
\]
where
\[
S_1 = \{(i, j) : 1 \le |i - j| \le m_n\}
\quad \text{and} \quad
S_2 = \{(i, j) : m_n + 1 \le |i - j| \le n - 1\}.
\]
Here $m_n$ is a sequence of natural numbers going to infinity as n → ∞.
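Since $E[\Delta_i(x)] = 0$ for all i, by the property of conditional
expectation, we get
\[
J_{1,n} = \sum_{S_1} |\mathrm{Cov}(\Delta_i(x), \Delta_j(x))|
\le C \sum_{S_1} |E[\Delta_i(x) \Delta_j(x)]|
\le C \sum_{S_1} |E[H_i(y) K_i(\theta, x) H_j(y) K_j(\theta, x)]|
\le C \sum_{S_1} \big( E[H_1(y) \,|\, \langle \theta, X \rangle] \big)^2
    \, |E[K_i(\theta, x) K_j(\theta, x)]|.
\]
By a change of variable in the following integral and by applying
assumptions (H2) and (H3), one gets
\[
E[H_1(y) \,|\, \langle \theta, X_1 \rangle]
= \int_{\mathbb{R}} H\big(h_H^{-1}(y - z)\big) f(\theta, z, x)\,dz
\le h_H \int_{\mathbb{R}} H(t)\, |f(\theta, y - h_H t, x) - f(\theta, y, x)|\,dt
  + h_H \int_{\mathbb{R}} H(t)\, |f(\theta, y, x)|\,dt
\]
\[
= o(h_H) \int_{\mathbb{R}} |t|^{\beta_2} H(t)\,dt
  + h_H \int_{\mathbb{R}} H(t)\, |f(\theta, y, x)|\,dt
= O(h_H).
\]
Then,
\[
J_{1,n} \le C h_H^2 \sum_{S_1} E[K_i(\theta, x) K_j(\theta, x)].
\]
Because of (H1), (H4) and (H6), we can write
\[
E[K_i(\theta, x) K_j(\theta, x)]
\le C\, P\big( (X_i, X_j) \in B_\theta(x, h_K) \times B_\theta(x, h_K) \big)
\le C\, \phi_{\theta,x}(h_K) \left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a}.
\]
So,
\[
J_{1,n} = O\left( n m_n h_H^2\, \phi_{\theta,x}(h_K)
\left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a} \right).
\]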
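Now, let us focus on the sum over $S_2$. Since the variables
$(\Delta_i)_{1 \le i \le n}$ are bounded (i.e. $\|\Delta_i\|_\infty < \infty$),
we can use the Davydov-Rio inequality. Hence, we have for all $i \ne j$,
\[
|\mathrm{Cov}(\Delta_i, \Delta_j)| \le 4 C_1 C_2\, \alpha(|i - j|) \le C \alpha(|i - j|).
\]
So, by using (H5), we get
\[
J_{2,n} = \sum_{S_2} |\mathrm{Cov}(\Delta_i, \Delta_j)| \le n^2 \alpha(m_n) = O(n^2 m_n^{-a}).
\]
Thus,
\[
s_n^{2*} = J_{1,n} + J_{2,n}
\le C n \left( m_n h_H^2\, \phi_{\theta,x}(h_K)
\left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a} + n m_n^{-a} \right).
\]
By choosing $m_n = h_H^{-1} \left( \dfrac{\phi_{\theta,x}(h_K)}{n} \right)^{-1/a}$,
we obtain
\[
s_n^{2*} = o\big( n h_H \phi_{\theta,x}(h_K) \big).
\]
For the variance of $\Delta_i$ we have, for all i = 1, ..., n,
\[
\mathrm{Var}(\Delta_i) = E(\Delta_i^2) \le C\, \mathrm{Var}\big( H_i(y) K_i(\theta, x) \big).
\]
By (H1) and the same integral calculation as above, we obtain
\[
\mathrm{Var}(\Delta_i) \le C \big( h_H \phi_{\theta,x}(h_K) + (h_H \phi_{\theta,x}(h_K))^2 \big)
= O\big( h_H \phi_{\theta,x}(h_K) \big).
\]
Finally,
\[
s_n^2 = o\big( n h_H \phi_{\theta,x}(h_K) \big) + O\big( n h_H \phi_{\theta,x}(h_K) \big)
= O\big( n h_H \phi_{\theta,x}(h_K) \big).
\]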
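Now, we apply the classical Fuk-Nagaev inequality to get, for all λ > 0 and
all r > 0,
\[
P\big( |\widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x)| > \lambda \big)
\le P\left( \Big| \sum_{i=1}^{n} \Delta_i(x) \Big| > \lambda n h_H E[K_1(\theta, x)] \right)
\le C (A_1 + A_2),
\]
where
\[
A_1 = 4 \left( 1 + \frac{(\lambda n h_H E[K_1(\theta, x)])^2}{r s_n^2} \right)^{-r/2}
\quad \text{and} \quad
A_2 = \frac{4 c n}{r} \left( \frac{r}{\lambda n h_H E[K_1(\theta, x)]} \right)^{a+1}.
\]
Setting
$\lambda = \eta \dfrac{\sqrt{n h_H \phi_{\theta,x}(h_K) \log n}}{n h_H E[K_1(\theta, x)]}$
in the term $A_2$, we get
\[
A_2 = 4 c n r^{a} \big( \lambda n h_H E[K_1(\theta, x)] \big)^{-(a+1)}
= C n r^{a} \big( n h_H \phi_{\theta,x}(h_K) \log n \big)^{-(a+1)/2}.
\]
Taking $r \sim C (\log n)^2$, it follows that
\[
A_2 = C n (\log n)^{2a - (a+1)/2} \big( n h_H \phi_{\theta,x}(h_K) \big)^{-(a+1)/2}
\le C (\log n)^{2a - (a+1)/2}\, n^{1 - (a+1)/2} \big( h_H \phi_{\theta,x}(h_K) \big)^{-(a+1)/2}.
\]
Next, using the left inequality in (H7b), one gets
\[
A_2 \le C (\log n)^{2a - (a+1)/2}\, n^{1 - (a+1)/2}\,
n^{-\left( \frac{3-a}{a+1} + \beta \right)(a+1)/2}
\le C (\log n)^{2a - (a+1)/2}\, n^{(1-a)/2 - (3 - a + \beta(a+1))/2}
\le C (\log n)^{2a - (a+1)/2}\, n^{-1 - \beta \frac{a+1}{2}}.
\]
So, finally, there exists some real number ζ > 0 such that
\[
A_2 \le C n^{-1-\zeta}. \tag{17}
\]
The term $A_1$ can be treated as follows:
\[
A_1 = 4 \left( 1 + \frac{\eta^2 n h_H \phi_{\theta,x}(h_K) \log n}{16 r s_n^2} \right)^{-r/2}.
\]
Because $r \sim C (\log n)^2$ and $s_n^2 = O(n h_H \phi_{\theta,x}(h_K))$, we can
write
\[
A_1 \le C \left( 1 + \frac{\eta^2 \log n}{16 r} \right)^{-r/2}
= \exp\{ -(\eta^2/32) \log n \}. \tag{18}
\]
Finally, by combining results (17) and (18), we get, for η large enough,
\[
P\left( \big| \widehat{f}_N(\theta, y, x) - E \widehat{f}_N(\theta, y, x) \big|
> \eta \sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}} \right) \le C n^{-1-\zeta}.
\]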
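Proof of Lemma 3. The proof of this Lemma is based on arguments similar
to those used in the proof of Lemma 2. Indeed, we define
\[
\widetilde{\Delta}_i(x) = K_i(\theta, x) - E[K_i(\theta, x)].
\]
It yields
\[
\widehat{f}_D(\theta, x) - E \widehat{f}_D(\theta, x)
= \frac{1}{n E[K_1(\theta, x)]} \sum_{i=1}^{n} \widetilde{\Delta}_i(x),
\]
and
\[
\widetilde{s}_n^{\,2} = \sum_{i=1}^{n} \sum_{j=1}^{n}
|\mathrm{Cov}(\widetilde{\Delta}_i, \widetilde{\Delta}_j)|
= \underbrace{\sum_{i \ne j} |\mathrm{Cov}(\widetilde{\Delta}_i, \widetilde{\Delta}_j)|}_{\widetilde{s}_n^{\,2*}}
+ n \mathrm{Var}(\widetilde{\Delta}_i)
\le C \sum_{S_1} \mathrm{Cov}(\widetilde{\Delta}_i, \widetilde{\Delta}_j)
  + C \sum_{S_2} \mathrm{Cov}(\widetilde{\Delta}_i, \widetilde{\Delta}_j)
  + n \mathrm{Var}(\widetilde{\Delta}_i),
\]
where
\[
S_1 = \{(i, j) : 1 \le |i - j| \le m_n\}
\quad \text{and} \quad
S_2 = \{(i, j) : m_n + 1 \le |i - j| \le n - 1\}, \quad m_n \to \infty.
\]
By (H4), as K is bounded, and due to the right inequality in (H7b), we get
\[
\mathrm{Cov}(\widetilde{\Delta}_i, \widetilde{\Delta}_j)
\le C \big[ P\big( (X_i, X_j) \in B_\theta(x, h_K) \times B_\theta(x, h_K) \big)
  + P(X_i \in B_\theta(x, h_K))\, P(X_j \in B_\theta(x, h_K)) \big]
\]
\[
\le C \left( \phi_{\theta,x}(h_K) \left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a}
  + (\phi_{\theta,x}(h_K))^2 \right)
= O\left( \phi_{\theta,x}(h_K) \left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a} \right),
\]
and
\[
\forall i \ne j : \quad
|\mathrm{Cov}(\widetilde{\Delta}_i, \widetilde{\Delta}_j)| \le C \alpha(|i - j|).
\]
So,
\[
\widetilde{s}_n^{\,2*}
\le C \sum_{S_1} \phi_{\theta,x}(h_K) \left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a}
  + C \sum_{S_2} \alpha(|i - j|)
\le C n m_n\, \phi_{\theta,x}(h_K) \left( \frac{\phi_{\theta,x}(h_K)}{n} \right)^{1/a}
  + C n^2 m_n^{-a}.
\]
Taking $m_n = \left( \dfrac{\phi_{\theta,x}(h_K)}{n} \right)^{-1/a}$, we obtain
\[
\widetilde{s}_n^{\,2*} = o\big( n \phi_{\theta,x}(h_K) \big).
\]
For the variance of $\widetilde{\Delta}_i$ we have
\[
\mathrm{Var}(\widetilde{\Delta}_i) = E(\widetilde{\Delta}_i^2)
\le C\, \mathrm{Var}(K_i(\theta, x)).
\]
By assumption (H1), we obtain, for all i = 1, ..., n,
\[
\mathrm{Var}(\widetilde{\Delta}_i) \le C \phi_{\theta,x}(h_K).
\]
Finally,
\[
\widetilde{s}_n^{\,2} = O\big( n \phi_{\theta,x}(h_K) \big).
\]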
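Analogously to Lemma 2, we apply the Fuk-Nagaev inequality to get, for all
λ > 0 and all r > 0,
\[
P\big( |\widehat{f}_D(\theta, x) - E \widehat{f}_D(\theta, x)| > \lambda \big)
\le P\left( \Big| \sum_{i=1}^{n} \widetilde{\Delta}_i(x) \Big| > \lambda n E[K_1(\theta, x)] \right)
\le C \big( \widetilde{A}_1 + \widetilde{A}_2 \big),
\]
where
\[
\widetilde{A}_1 = 4 \left( 1 + \frac{(\lambda n E[K_1(\theta, x)])^2}{r \widetilde{s}_n^{\,2}} \right)^{-r/2}
\quad \text{and} \quad
\widetilde{A}_2 = \frac{4 c n}{r} \left( \frac{r}{\lambda n E[K_1(\theta, x)]} \right)^{a+1}.
\]
Setting
$\lambda = \eta \dfrac{\sqrt{n \phi_{\theta,x}(h_K) \log n}}{n E[K_1(\theta, x)]}$
and choosing again $r \sim C (\log n)^2$, we get for $\widetilde{A}_1$:
\[
\widetilde{A}_1 = 4 \left( 1 + \frac{\eta^2 n \phi_{\theta,x}(h_K) \log n}{16 r \widetilde{s}_n^{\,2}} \right)^{-r/2}
\le C \left( 1 + \frac{\eta^2 \log n}{16 r} \right)^{-r/2}
= C n^{-\eta^2/32}, \tag{19}
\]
and for $\widetilde{A}_2$:
\[
\widetilde{A}_2 = C n r^{a} \big( n \phi_{\theta,x}(h_K) \log n \big)^{-(a+1)/2}
= C n (\log n)^{2a - (a+1)/2} \big( n \phi_{\theta,x}(h_K) \big)^{-(a+1)/2}
\le C (\log n)^{2a - (a+1)/2}\, n^{1 - (a+1)/2} \big( \phi_{\theta,x}(h_K) \big)^{-(a+1)/2}.
\]
Now, again by the left inequality in (H7b), we have
\[
\widetilde{A}_2 \le C (\log n)^{2a - (a+1)/2}\, n^{1 - (a+1)/2}\,
n^{-\left( \frac{3-a}{a+1} + \beta \right)(a+1)/2}
\le C (\log n)^{(3a-1)/2}\, n^{(1-a)/2 - (3 - a + \beta(a+1))/2}
\le C (\log n)^{(3a-1)/2}\, n^{-1 - \beta \frac{a+1}{2}},
\]
so that
\[
\widetilde{A}_2 \le C n^{-1-\zeta}. \tag{20}
\]
Finally, by combining results (19) and (20), we get, for η large enough,
\[
P\left( \big| \widehat{f}_D(\theta, x) - E \widehat{f}_D(\theta, x) \big|
> \eta \sqrt{\frac{\log n}{n \phi_{\theta,x}(h_K)}} \right) \le C n^{-1-\zeta}. \tag{21}
\]
Now, we have
\[
P\big( \widehat{f}_D(\theta, x) \le 1/2 \big)
\le P\big( |\widehat{f}_D(\theta, x) - 1| > 1/2 \big)
\le P\big( |\widehat{f}_D(\theta, x) - E \widehat{f}_D(\theta, x)| > 1/2 \big),
\]
and by applying inequality (21), we can see that
\[
\sum_{n=1}^{\infty} P\big( \widehat{f}_D(\theta, x) \le 1/2 \big) < \infty.
\]
This completes the proof of the Lemma.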

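Proof of Lemma 4. Because of the continuity of the function $f(\theta, y, x)$
(see (H8) and (H11)), we have
\[
\forall \epsilon > 0, \ \exists \delta(\epsilon), \
\forall y \in (\mu_\theta(x) - \zeta, \mu_\theta(x) + \zeta) : \quad
|f(\theta, y, x) - f(\theta, \mu_\theta(x), x)| \le \delta(\epsilon)
\ \Rightarrow\ |y - \mu_\theta(x)| \le \epsilon.
\]
This allows us to write directly, for $y = \widehat{\mu}_\theta(x)$,
\[
\forall \epsilon > 0, \ \exists \delta(\epsilon) : \quad
P\big( |\widehat{\mu}_\theta(x) - \mu_\theta(x)| > \epsilon \big)
\le P\big( |f(\theta, \widehat{\mu}_\theta(x), x) - f(\theta, \mu_\theta(x), x)| > \delta(\epsilon) \big).
\]
Then, according to Theorem 1, $\widehat{\mu}_\theta - \mu_\theta$ goes to 0
a.co. as n goes to infinity.
Finally, the result (11) follows immediately.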
in single index 375
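Proof of Lemma 5. It suffices to apply the result of Lemma 1, with the Hölder condition imposed in (H2) now assumed to hold uniformly in $\theta$ and $y$. So
\[
\sup_{\theta\in\Theta_{\mathcal{H}}}\ \sup_{y\in C} \left|E\widehat{f}_N(\theta,y,x) - f(\theta,y,x)\right| = O\left(h_K^{\beta_1}\right) + O\left(h_H^{\beta_2}\right).
\]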
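Proof of Lemma 6. By the compactness of $C$ and $\Theta_{\mathcal{H}}$ we write
\[
C \subset \bigcup_{k=1}^{z_n} (y_k - l_n,\, y_k + l_n) \tag{22}
\]
and
\[
\Theta_{\mathcal{H}} \subset \bigcup_{j=1}^{d_n} B(t_j, \tau_n) \tag{23}
\]
with $z_n \le n^{(3\alpha+1)/2}$, $z_n l_n \le C$ and $d_n \tau_n \le C$. Now, for all $y \in C$ and all $\theta \in \Theta_{\mathcal{H}}$, we set $k(y) = \arg\min_{k\in\{1,\dots,z_n\}} |y - y_k|$ and $j(\theta) = \arg\min_{j\in\{1,\dots,d_n\}} \|\theta - t_j\|$.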
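As an illustration of the covering (22) and of the index $k(y)$, here is a minimal sketch for a compact interval. It is not the author's construction: the equally spaced centers, the choice $l_n = (\text{length of } C)/z_n$, and the function names are assumptions made for the example.

```python
def interval_cover(lo: float, hi: float, z_n: int):
    """Centers y_1..y_{z_n} and half-width l_n of open intervals covering [lo, hi].

    Centers are equally spaced with step w = (hi - lo) / z_n and the half-width is
    l_n = w, so that z_n * l_n = hi - lo stays bounded (assumed construction).
    """
    w = (hi - lo) / z_n
    centers = [lo + (k + 0.5) * w for k in range(z_n)]
    return centers, w

def nearest_index(y: float, centers) -> int:
    """k(y) = argmin_k |y - y_k| (0-based index)."""
    return min(range(len(centers)), key=lambda k: abs(y - centers[k]))

# Example: C = [0, 1], with z_n of order n^{(3*alpha+1)/2} for illustrative n and alpha.
n, alpha = 50, 0.5
z_n = int(n ** ((3 * alpha + 1) / 2))
centers, l_n = interval_cover(0.0, 1.0, z_n)
y = 0.3141
k = nearest_index(y, centers)
print(z_n, l_n, abs(y - centers[k]) < l_n)
```

With this choice every $y \in C$ satisfies $|y - y_{k(y)}| < l_n$, which is the property the proof uses when bounding the terms that compare $y$ with $y_{k(y)}$.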
Note that the coverings of the compact subsets in (22) and (23) are needed to derive our uniform consistency results. Condition (22) was explained in Ferraty et al. (2006), whereas condition (23) is the key point that ensures the geometric link between the number $d_n$ of balls and the sequence of radii $\tau_n$. In abstract semi-metric spaces, it is usually assumed that $d_n \tau_n$ is bounded; see Ferraty et al. (2008) for more discussion. Consider now the following decomposition:

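\[
\begin{aligned}
\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(\theta,y,x) - E\widehat{f}_N(\theta,y,x)\right|
&\le \underbrace{\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(\theta,y,x) - \widehat{f}_N(\theta,y_{k(y)},x)\right|}_{T_1} \\
&\quad + \underbrace{\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(\theta,y_{k(y)},x) - \widehat{f}_N(t_{j(\theta)},y_{k(y)},x)\right|}_{T_2} \\
&\quad + \underbrace{\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(t_{j(\theta)},y_{k(y)},x) - E\widehat{f}_N(t_{j(\theta)},y_{k(y)},x)\right|}_{T_3} \\
&\quad + \underbrace{\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|E\widehat{f}_N(t_{j(\theta)},y_{k(y)},x) - E\widehat{f}_N(\theta,y_{k(y)},x)\right|}_{T_4} \\
&\quad + \underbrace{\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|E\widehat{f}_N(\theta,y_{k(y)},x) - E\widehat{f}_N(\theta,y,x)\right|}_{T_5}.
\end{aligned}
\]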
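• For $(T_1)$ and $(T_5)$: Since $K$ is bounded and $H$ satisfies the Lipschitz condition imposed in (H9), we get
\[
\begin{aligned}
\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(\theta,y,x) - \widehat{f}_N(\theta,y_{k(y)},x)\right|
&\le \frac{C}{h_H E[K_1(\theta,x)]}\ \sup_{y\in C}\ \frac{1}{n}\sum_{i=1}^{n}\left|H\left(\frac{y - Y_i}{h_H}\right) - H\left(\frac{y_{k(y)} - Y_i}{h_H}\right)\right| \\
&\le C \sup_{y\in C} \frac{|y - y_{k(y)}|}{h_H^2\,\phi_{\theta,x}(h_K)} \\
&\le C\,\frac{l_n}{h_H^2\,\phi_{\theta,x}(h_K)}.
\end{aligned}
\]
Since $l_n = O\left(n^{-(3\alpha+1)/2}\right)$, $\alpha > 0$, and by using the second part of (H9), we get
\[
\frac{l_n}{h_H^2\,\phi_{\theta,x}(h_K)} = o\left(\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right).
\]
Thus, by letting $n \to \infty$, we can deduce
\[
T_5 \le T_1 = O\left(\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right). \tag{24}
\]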
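• For $(T_2)$ and $(T_4)$: Using the Lipschitz property of $K$ and the boundedness of $H$, together with the Cauchy-Schwarz inequality, we can write
\[
\begin{aligned}
\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(t_{j(\theta)},y_{k(y)},x) - \widehat{f}_N(\theta,y_{k(y)},x)\right|
&\le \frac{C}{h_H E[K_1(\theta,x)]}\ \sup_{\theta\in\Theta_{\mathcal{H}}}\ \frac{1}{n}\sum_{i=1}^{n}\left|K_i(t_{j(\theta)},x) - K_i(\theta,x)\right| \\
&\le C \sup_{\theta\in\Theta_{\mathcal{H}}} \frac{\|x - X_i\|\,\|\theta - t_{j(\theta)}\|}{h_H h_K\,\phi_{\theta,x}(h_K)} \\
&\le C\,\frac{\tau_n}{h_H\,\phi_{\theta,x}(h_K)}.
\end{aligned}
\]
So, taking $\tau_n = O\left(\frac{\log n}{n}\right)$, we obtain for $n$ large enough
\[
T_4 \le T_2 = O\left(\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}\right) = o\left(\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right). \tag{25}
\]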
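• For $(T_3)$: It suffices to prove that, for some positive real $\eta$ large enough,
\[
\sum_{n=1}^{\infty} P\left(\sup_{\theta\in\Theta_{\mathcal{H}}}\sup_{y\in C}\left|\widehat{f}_N(t_{j(\theta)},y_{k(y)},x) - E\widehat{f}_N(t_{j(\theta)},y_{k(y)},x)\right| > \eta\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right) < \infty.
\]
Firstly, observe that
\[
\begin{aligned}
P\left(T_3 > \eta\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right)
&= P\left(\max_{1\le j\le d_n}\max_{1\le k\le z_n}\left|\widehat{f}_N(t_j,y_k,x) - E\widehat{f}_N(t_j,y_k,x)\right| > \eta\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right) \\
&\le d_n z_n \max_{1\le j\le d_n}\max_{1\le k\le z_n} P\left(\left|\widehat{f}_N(t_j,y_k,x) - E\widehat{f}_N(t_j,y_k,x)\right| > \eta\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right).
\end{aligned}
\]
Secondly, the probability on the right-hand side of the previous inequality is bounded along the lines of the proof of Lemma 2. Based on this result, and on the fact that $d_n z_n \le C(\tau_n l_n)^{-1} \le n^{\frac{3}{2}(\alpha+1)}$, we finally arrive at
\[
P\left(\max_{1\le j\le d_n}\max_{1\le k\le z_n}\left|\widehat{f}_N(t_j,y_k,x) - E\widehat{f}_N(t_j,y_k,x)\right| > \eta\sqrt{\frac{\log n}{n h_H \phi_{\theta,x}(h_K)}}\right) \le C\, n^{\frac{3}{2}(\alpha+1)-1-\zeta}.
\]
The choice of $\zeta > \frac{3}{2}(\alpha+1)$ makes this bound summable in $n$; combined with (24) and (25), this finishes the proof of Lemma 6.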
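Proof of Lemma 7. The proof of this lemma follows directly from the result of Lemma 6 in the case $H \equiv 1$. Thus, we write
\[
\sum_{n=1}^{\infty} P\left(\sup_{\theta\in\Theta_{\mathcal{H}}}\left|\widehat{f}_D(\theta,x) - E\widehat{f}_D(\theta,x)\right| > \eta\sqrt{\frac{\log n}{n\phi_{\theta,x}(h_K)}}\right)
= \sum_{n=1}^{\infty} P\left(\sup_{\theta\in\Theta_{\mathcal{H}}}\left|1 - \widehat{f}_D(\theta,x)\right| > \eta\sqrt{\frac{\log n}{n\phi_{\theta,x}(h_K)}}\right) < \infty. \tag{26}
\]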
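Proof of Corollary 1. Clearly, we have
\[
\inf_{\theta\in\Theta_{\mathcal{H}}} \widehat{f}_D(\theta,x) \le \frac{1}{2}
\ \Rightarrow\ \sup_{\theta\in\Theta_{\mathcal{H}}}\left(1 - \widehat{f}_D(\theta,x)\right) \ge \frac{1}{2}
\ \Rightarrow\ \sup_{\theta\in\Theta_{\mathcal{H}}}\left|1 - \widehat{f}_D(\theta,x)\right| \ge \frac{1}{2}.
\]
Consequently,
\[
P\left(\inf_{\theta\in\Theta_{\mathcal{H}}} \widehat{f}_D(\theta,x) \le \frac{1}{2}\right) \le P\left(\sup_{\theta\in\Theta_{\mathcal{H}}}\left|1 - \widehat{f}_D(\theta,x)\right| \ge \frac{1}{2}\right).
\]
Since the threshold in (26) tends to 0, inequality (26) of Lemma 7 shows that the right-hand side is summable in $n$, which proves the corollary.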
Said Attaoui

Département de Mathématiques
Université Djillali Liabès, BP 89
22000 Sidi Bel Abbès, Algerie

Département de Mathématiques
Université des Sciences et de la Technologie, Mohamed Boudiaf
BP 1505, 31000 El Mnaouer-Oran, Algerie

E-mail: s attaoui@yahoo.fr
Paper received: 17 October 2012; revised: 24 September 2013.
