Estimation of Integrated Functionals of A Monotone Density

Estimation of Integrated Functionals of a
Monotone Density
Rajarshi Mukherjee and Bodhisattva Sen∗
Department of Biostatistics
arXiv:1808.07915v1 [math.ST] 23 Aug 2018
Harvard University
655 Huntington Avenue
Boston, MA 02115
e-mail: rmukherj@hsph.harvard.edu
Department of Statistics
Columbia University
1255 Amsterdam Avenue
New York, NY 10027
e-mail: bodhi@stat.columbia.edu
Abstract: In this paper we study estimation of integrated functionals of a
monotone nonincreasing density f on [0, ∞). We find the exact asymptotic
distribution of the natural (tuning parameter-free) plug-in estimator, based
on the Grenander
√ estimator. In particular, we show that the simple plug-in
estimator is n-consistent, asymptotically normal and is semiparametric ef-
ficient. Compared to the previous results on this topic (see e.g., Nickl (2008),
Jankowski (2014), and Söhl (2015)) our results holds under minimal assump-
tions on the underlying f — we do not require f to be (i) smooth, (ii) bounded
away from 0, or (iii) compactly supported. Further, when f is the uniform dis-
tribution on a compact interval we explicitly characterize the asymptotic dis-
tribution of the plug-in estimator — which now converges at a non-standard
rate — thereby extending the results in Groeneboom and Pyke (1983) for the
case of the quadratic functional.
Keywords and phrases: efficient estimation, empirical distribution func-
tion, Grenander estimator, Hadamard directionally differentiable, least con-
cave majorant, plug-in estimator, smooth functionals.
1. Introduction
Estimation of functionals of data generating distributions is a fundamental research

problem in nonparametric statistics. Consequently, substantial effort has gone into
understanding estimation of nonparametric functionals (e.g., linear, quadratic, and
other “smooth” functionals) both under density and regression (specifically Gaus-
sian white noise models) settings — a comprehensive snapshot of this huge liter-
ature can be found in Hall and Marron (1987), Bickel and Ritov (1988), Donoho
et al. (1990), Donoho and Nussbaum (1990), Fan (1991), Kerkyacharian and Picard
(1996), Nemirovski (2000), Laurent (1996), Cai and Low (2003, 2004, 2005) and the
references therein. However, most of the above papers have focused on smoothness
∗
Supported by NSF Grants DMS-17-12822 and AST-16-14743.
1
imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018
R. Mukherjee and B. Sen/Estimation of Functionals of a Monotone Density 2
restrictions on the underlying (nonparametric) function class. In contrast, parallel
results for purely shape-restricted function classes (e.g., monotonicity/ convexity/
log-concavity) are limited.
Estimation of a shape-restricted density is somewhat special among nonpara-
metric density estimation problems. In shape-constrained problems classical maxi-
mum likelihood and/or least squares estimation strategies (over the class of all such
shape-restricted densities) lead to consistent (see e.g., Grenander (1956), Groene-
boom et al. (2001), Dümbgen and Rufibach (2009)) and adaptive estimators (see
e.g., Birgé (1987), Kim et al. (2016)) of the underlying density. Consequently, the
resulting procedures are tuning parameter-free; compare this with density estima-
tion techniques under smoothness constraints that usually rely on tuning param-
eters (e.g., bandwidth for kernel based procedures or resolution level for wavelet
projections), the choice of which requires delicate care for the purposes of data
adaptive implementations.
The simplest and the most well-studied shape-restricted density estimation
problem is that of estimating a nonincreasing density on [0, ∞). In this case, the
most natural estimator of the underlying density is the Grenander estimator —
the (nonparametric) maximum likelihood estimator (which also turns out to be
the same as the least squares estimator); see e.g., Grenander (1956), Prakasa Rao
(1969), Groeneboom (1985), Groeneboom and Jongbloed (2014). Even for this
problem only a few papers exist that study nonparametric functional estimation;
see e.g., Nickl (2008), Jankowski (2014), and Söhl (2015). Moreover, these papers
often assume additional constraints such as: (i) smoothness, (ii) compact support,
and (iii) restrictive lower bounds on the underlying true density.
In this paper we study estimation of an integrated functional of a nonincreas-
ing density on [0, ∞) using simple (tuning parameter-free) plug-in estimators. We
derive the asymptotic distribution of the plug-in estimators in this problem, un-
der minimal assumptions. In particular, we show that the limiting distribution is
asymptotically normal with the efficient variance.
1.1. Problem formulation and a summary of our contributions
Suppose that X1 , . . . , Xn are independent and identically distributed (i.i.d.) ran-

dom variables from a nonincreasing density f on [0, ∞). Let P denote the distri-
bution of each Xi and F the corresponding distribution function. The goal is to
study estimation and uncertainty quantification of the functional ν defined as
Z
ν(h, f ) := h(f (x))f (x)dx = P [h ◦ f ], (1)
where h : R → R is a known (smooth) function. Here, and in the rest of R the paper,
we use the operator notation, i.e., for any function g : R → R, P [g] := g(x)dP (x)
denotes the expectation of g(X) under the distribution X ∼ P . Prototypical ex-
amples of such integrated functionals include the (classical) quadratic functional

R
f 2 (x)dx — which in turn is closely related to other inferential problems in non-
parametric statistics (see Giné and Nickl (2016, Chapter 8.3) and the references
therein).
In this paper we study estimation of integrated functionals of the form (1). Let fˆn
be the Grenander estimator in this problem (see Section 2 for its characterization).
Then, a natural estimator of ν(h, f ) is the simple plug-in estimator
Z
ν(h, fn ) := h(fˆn (x))fˆn (x)dx.
ˆ
Another intuitive estimator of ν(h, f ) would be n1 ni=1 h(fˆn (Xi )). It is easy to
P
show that in this case both the above plug-in estimators are exactly the same (see
Lemma 2.1).
We characterize the asymptotic distribution of ν(h, fˆn ) for estimating ν(h, f ),
when h is assumed to be smooth (see√Theorem 2.2). Specifically, we show that
the simple plug-in estimator ν(h, fˆn ) is n-consistent, asymptotically normal with
the (semiparametric) efficient variance (see Corollary 2.5). Further, we can con-
sistently estimate this limiting variance which readily yields asymptotically valid
confidence intervals for ν(h, f ). Compared to the previous results on this topic
(see e.g., Nickl (2007), Nickl (2008), Jankowski (2014), and Söhl (2015)) our re-
sults holds under minimal assumptions on the underlying density (see conditions
(A1)-(A3) in Section 2.2) — in particular, we do not assume that f is smooth,
bounded away from 0, or compactly supported; in fact, our results also hold when
f is piecewise constant.
Finally, in the special case when f is constant (i.e., f is uniform on a com-
pact interval) we explicitly characterize the asymptotic distribution of the plug-in
estimator which now converges at a non-standard rate; see Theorem 2.6. This the-
orem extends the results in Groeneboom and Pyke (1983) beyond the quadratic
functional.
1.2. Comparison with the literature when f belongs to a smoothness

class
Estimation of functionals of the form (1) in the density model, for f belonging
to smoothness classes (e.g., Sobolev and Besov balls), has been studied in detail
in the literature. Historically, the case of “smooth” h is relatively well-understood
and we direct the interested reader to Hall and Marron (1987), Bickel and Ritov
(1988), Donoho et al. (1990), Donoho and Nussbaum (1990), Fan (1991), Birgé and
Massart (1995), Kerkyacharian and Picard (1996), Laurent (1996), Nemirovski
(2000), Cai and Low (2003), Cai and Low (2004), Cai and Low (2005), Tchetgen
et al. (2008) and the references therein.
A general feature of the results for estimating “smooth” nonparametric func-
tionals is an elbow effect in the rate of estimation based on the smoothness of
the underlying
√ function class. For example, when estimating the quadratic func-
tional, n-efficient estimation can be achieved as soon as the underlying density

has Hölder smoothness index β ≥ 1/4, whereas the optimal rate of estimation is
n−4β/(4β+1) (in root mean squared sense) for β < 1/4. Since √ monotone densities
have one weak derivative one can therefore expect to have n-consistent efficient
estimation of ν(h, f ) (under suitable assumptions on h). However, standard of
methods of estimation under smoothness restrictions involve expanding the infi-
nite dimensional function in a suitable orthonormal basis of L2 and estimating an
approximate functional created by truncating the basis expansion at a suitable
point. The point of truncation decides the approximation error of the truncated
functional as a surrogate of the actual functional and depends on the smoothness
of the functional of interest and approximation properties of the orthonormal ba-
sis used. This point of truncation is then balanced with the bias and variance of
the resulting estimator and therefore directly depends on the smoothness of the
function. Consequently, most optimal estimators proposed in the literature depend
explicitly on the knowledge of the smoothness indices. In this regard, our main re-
sult in the paper shows the optimality of tuning parameter-free plug-in procedure
based on the Grenander estimator (under minimal assumptions) for estimating
integral functionals of monotone densities.
Finally, we also mention that the results of this paper are directly motivated
by the general questions tackled in Bickel and Ritov (2003) and Nickl (2007). In
particular, Bickel and Ritov (2003) considered nonparametric function estimators
that satisfy the desirable “plug-in property”. An estimator is said to have the
“plug-in property” if it is simultaneously minimax optimal in a function space
norm (e.g., in l2 -norm) and can also
√ be “plugged-in” to estimate specific classes of
functionals efficiently (and/or at n-rate). Subsequently, Bickel and Ritov (2003)
demonstrated general principles for constructing such nonparametric estimators
pertaining to linear functionals (allowing for certain extensions) with examples
arising in various problems such as nonparametric regression, survival analysis, and
density estimation. Indeed, in this paper we show that the Grenander estimator
satisfies such a “plug-in property” for estimating smooth integrated functionals of
monotone densities.
1.3. Organization of the paper
The rest of the paper is organized as follows. In Section 2.1 we collect a few use-
ful notations, definitions, and discussions which will help the presentation of the
rest of the paper. Subsequently, in Section 2.2, we define the class of integrated
functionals of interest along with our main assumptions. Section 2.3 discusses the
construction of the plug-in estimator, its equivalence with other natural estimators
in this context, as well as the main result of this paper which give an explicit char-
acterization of its asymptotic distribution. Moreover, we show the semiparametric
efficiency of the simple plug-in estimator. Since the case of the uniform density on
a compact interval deserves a finer asymptotic expansion, we devote Section 2.4
to this purpose. Subsequently, in Section 3 we present some numerical results to

validate our theoretical findings. Finally, all the technical proofs are relegated to
Section 4.
2. Estimation of the Integrated Functional ν(h, f )
2.1. Preliminaries
In this subsection we introduce some notation and definitions to be used in the rest
of the paper. Throughout we let R+ denote the (compact) nonnegative real line
[0, ∞]. The underlying probability space on which all random elements are defined
is (Ω, F, P). Suppose that we have nonnegative random variables
i.i.d.
X1 , . . . , X n ∼ P
having a nonincreasing density f and (concave) distribution function F . Let Pn

denote the empirical measure of the Xi ’s and let Fn denote the corresponding
empirical distribution function, i.e., for x ∈ R,
n
1X
Fn (x) := 1(∞,x] (Xi ).
n i=1
For a nonempty set T , we let `∞ (T ) denote the set of uniformly bounded, real-
valued functions on T . Of particular importance is the space `∞ (R+ ), which we
equip with the uniform metric k·k∞ and the ball σ-field; see Pollard (1984, Chapters
IV and V) for background.
Following Beare and Fang (2017), we now define the notion of the least concave
majorant (LCM) operator. Given a nonempty convex set T ⊂ R+ , the LCM over T
is the operator MT : `∞ (R+ ) → `∞ (T ) that maps each ξ ∈ `∞ (R+ ) to the function
MT ξ where
n o
∞
MT ξ(x) := inf η(x) : η ∈ ` (T ), η is concave, and ξ ≤ η on T , x ∈ T.
We write M as shorthand for MR+ and refer to M as the LCM operator. Beare
and Fang (2017, Proposition 2.1) shows that the LCM operator M is Hadamard
directionally differentiable (see e.g., Beare and Fang (2017, Definition 2.2)) at any
concave function g ∈ `∞ (R+ ) tangentially to C(R+ ) (here C(R+ ) denotes the
collection of all continuous real-valued functions on R+ vanishing at infinity). Let
us denote the Hadamard directional derivative of M at the concave function g ∈
`∞ (R+ ) by M0g .
Let F̂n to be the LCM of Fn , i.e., F̂n is the smallest concave function that sits
above Fn . Let B(·) be the Brownian bridge process on [0, 1]. By Donsker’s theorem
(see e.g.,
√ van der Vaart (1998, Theorem 19.3)) we know that the empirical process
∞
Dn := n(Fn − F ) converges in distribution to G ≡ B ◦ F in ` (R+ ), i.e.,
√ d
Dn := n(Fn − F ) → G.

As the delta method is valid for Hadamard directionally differentiable func-
tions (see Shapiro (1991)), as a consequence of the above discussion it follows
that (see Beare and
√ Fang (2017, Theorem 2.1)), for any concave F , the stochastic
processes D̂n := n(F̂n − F ) converges in distribution to Ĝ, i.e.,
√ d
D̂n := n(F̂n − F ) → Ĝ, where Ĝ := M0F G. (2)
We provide further discussions on Ĝ in our context in Section 2.3 after the state-
ment of Theorem 2.2.
2.2. Assumptions
We are interested in estimating the functional ν(h, f ) defined in (1). Many com-
monly studied functionals of interest (see e.g., Bickel and Ritov (1988), Birgé and
Massart (1995)) can indeed be expressed in this form; in particular, the monomial
functionals Z
f p (x)dx for p ≥ 2.
Of particular significance among them is the quadratic functional corresponding
to p = 2 — being related to other inferential problems in density models such
as goodness-of-fit testing and construction of confidence balls (see e.g., Giné and
Nickl (2016) for more details). We now state the assumptions needed for our main
results.
(A1) The true underlying density f : [0, ∞) → [0, ∞) is non-increasing with
f (0+) := lim f (x) < ∞.
x↓0
(A2) We further assume that for some α ∈ (0, 1],

Z
f 1−α (x)dx < ∞.
f <1
(A3) The functional ν(h, f ) is “smooth” in the sense that

h ∈ C 2 (R+ )
where C 2 (R+ ) denotes the class of all twice continuously differentiable func-
tions on [0, ∞).
Let us briefly discuss assumptions (A1)-(A3). Conditions (A1) and (A2) imply
that the true density f is bounded and satisfies a tail decay condition. Note that
(A2) is satisfied if for some m > 1, lim supx→∞ xm f (x) < ∞. In particular, if f is
bounded and compactly supported, conditions (A1) and (A2) immediately hold.
Further, we restrict ourselves to estimating functionals ν(h, f ) where h satisfies
the smoothness condition (A3). This condition can be relaxed to only assuming
that the first derivative of h has a bounded weak derivative on compact intervals.
However, for the sake of simplicity, we simply work with this slightly more strin-
gent assumption on h. Finally, note that (A1) and (A3) together imply that the
functional ν(h, f ) is well-defined (and finite).

2.3. The estimators
A natural estimator of f in this situation is fˆn , the Grenander estimator — the

(nonparametric) maximum likelihood estimator of a non-increasing density on
[0, ∞); see Grenander (1956). The Grenander estimator fˆn is defined as the left-
hand slope of F̂n , the LCM of the empirical distribution function Fn . Note that F̂n
is piecewise affine and a valid concave distribution function by itself. Further, fˆn
is a piecewise constant (non-increasing) density with possible jumps only at the
data points Xi ’s. Moreover, fˆn (x) = 0 for x > max Xi . Consequently, a natural
i=1,...,n
plug-in estimator of ν(h, f ) is
Z
ν(h, fˆn ) := h(fˆn (x))fˆn (x)dx = P̂n [h ◦ fˆn ], (3)
where P̂n denotes the probability measure associated with the Grenander estimator
(i.e., P̂n has distribution function F̂n ).
The main result of this paper gives the exact asymptotic distribution for this
plug-in estimator. However, before going into the statement and discussion of this
result, we first contrast the plug-in estimator with two other natural estimators and
argue that the special structure of the Grenander estimator implies their equiva-
lence with the simple plug-in estimator (3). Observe that another natural estimator
of ν(h, f ) ≡ P [h ◦ f ] would be
n
1X ˆ
Pn [h ◦ fˆn ] := h(fn (Xi )).
n i=1
Indeed, Lemma 2.1 below shows that this natural estimator is exactly the same as
the plug-in estimator P̂n [h ◦ fˆn ].
Lemma 2.1. For any function h : R → R, P̂n [h ◦ fˆn ] = Pn [h ◦ fˆn ].
Remark 2.1 (One-step estimator). There is however another interesting by-product
of Lemma 2.1: The plug-in estimator fˆn is also equivalent to the classically stud-
ied one-step estimator which is traditionally efficient for estimating integrated
functionals such as ν(h, f ) over smoothness classes for f ; see e.g., Bickel et al.
(1993), van der Vaart (1998), van der Vaart (2002). In particular, a first order
influence function of ν(h, f ) at P is P [h(f ) + f h0 (f )] and consequently a one-step
estimator obtained as a bias-corrected version of a f˜-based plug-in estimator (here
f˜ is any estimator of f ) is given by
h i
˜ ˜ ˜ 0 ˜ ˜ ˜ 0 ˜
ν̃ := ν(h, f ) + Pn h(f ) + f h (f ) − P̃ [h(f ) + f h (f )]
where P̃ denotes the probability measure associated with f˜. However, Lemma 2.1
implies that when f˜ = fˆn (the Grenander estimator)
h i
Pn h(f˜) + f˜h0 (f˜) − P̃ [h(f˜) + f˜h0 (f˜)] = 0 ⇒ ν̃ = ν(h, fˆn ).

Consequently, in this case of estimating a monotone density, the first order influ-
ence function based one-step estimator, obtained as a bias-corrected version of a
Grenander-based plug-in estimator, coincides with the simple plug-in estimator (3).
Since all the three intuitive estimators in this problem turn out to be equivalent,
one can expect them to be efficient as well (in a semiparametric sense). Our first
result is a step towards formalizing this heuristic; see Section 4.2 for its proof.
Theorem 2.2. Recall the notation introduced in Section 2.1. Assume conditions
(A1)-(A3). Define the random variable
Z Z
0
Y := − G(x)d(h(f (x))) + Ĝ(x)d(h (f (x))f (x)) (4)
where G and Ĝ are defined in Section 2.1. Then

√
d
n ν(h, fˆn ) − ν(h, f ) → Y. (5)
Before we discuss the many implications of the above result, we provide a very
brief high-level idea of the proof of Theorem 2.2 — mainly to isolate the crucial
components which will be the focus of our simulation studies in Section 3.
Observe that, using Lemma 2.1,
√
n{ν(h, fˆn ) − ν(h, f )}
√ √
= nPn [h(fˆn )] − nP [h(f )]
√ √ √
= n(Pn − P )[h(f )] + n(Pn − P )[h(fˆn ) − h(f )] + n(P [h(fˆn )] − P [h(f )])
= T1 + T2 + T3 , (6)
where we let
√
T1 := n(Pn − P )[h(f )],
√
T2 := n(Pn − P )[h(fˆn ) − h(f )], (7)
√
T3 := n(P [h(fˆn )] − P [h(f )]).
We shall analyze T1 + T3 and T2 separately. In particular, we shall show that T2 =

oP (1) and subsequently demonstrate a distributional limit of T1 + T3 , before finally
concluding using Slutsky’sR theorem. In particular, T1 has a Gaussian limit which
is captured by the term G(x)d(h(f (x)f (x))) in (4), and T3 , being comprised of
fˆn , yields the second part of √Y involving the process Ĝ. In Section 3 we come
back to this decomposition of n{ν(h, fˆn ) − ν(h, f )} and illustrate the asymptotic
behaviors of each term in (7). It is also worth pointing out that assumption (A2)
is only used in the proof of T2 = oP (1) where we control the bracketing entropy of
suitable subclasses of monotone functions on [0, ∞).
Now we give a few remarks about the implications of Theorem 2.2.

Remark 2.2 (Y is well-defined). Implicit in the statement of Theorem 2.2 is the
fact that the distribution of Y is well-defined. Indeed, as will be Rclear from the proof
of Theorem 2.2 that it is sufficient to argue that (G, Ĝ) 7→ G(x)d(h(f (x))) +
Ĝ(x)d(h0 (f (x))f (x)) viewed as a map from (l∞ (R+ ), k · k∞ ) × (l∞ (R+ ), k · k∞ )
R
to R is continuous. The validity of this continuity follows since: (i) G ≡ B ◦ F is

continuous (as the Brownian bridge B is continuous and F is concave) and hence
bounded on compact intervals, and h ◦ f has bounded total variation on R+ ; (ii)
kĜk∞ ≤ kGk∞ i.e., Ĝ ∈ l∞ (R+ ) and h0 ◦ f × f is absolutely continuous.
Remark 2.3 (Assumptions in Theorem 2.2). Note that Theorem 2.2 is valid un-
der much weaker assumptions than previously studied in the literature. Although,
similar results have been derived in Nickl (2008), the derivations relied on further
smoothness, lower bound, and compact support assumptions on f . In contrast, our
only technical assumptions (A1) and (A2) are significantly less restrictive and only
demand f to be bounded and have a polynomial-type tail decay.
Remark 2.4 (Uniform distribution). If f is the uniform distribution on [0, c], for
√
any c > 0, Theorem 2.2 implies that n ν(h, fˆn ) − ν(h, f ) → 0, as the right
P
side of (4) is indeed 0. In the next subsection we deal with this scenario (i.e.,
when f is uniform) and obtain a non-degenerate distributional limit, after proper
normalization.
Remark 2.5 (Connection to estimation in√smoothness classes). For smoothness
classes of functions it is well-known that n-consistent estimation of ν(h, f ) is
possible if f has more than 1/4 derivatives (see e.g., Birgé and Massart (1995))
and a typical construction of such an estimator proceeds via a bias-corrected one-
step estimator; see e.g., Bickel and Ritov (1988), Robins et al. (2008), Robins et al.
(2017). Since (i) monotone densities have one weak derivative, and (ii) the one-
step bias-corrected estimator
√ in this scenario is equivalent to the plug-in estimator,
one can intuitively expect n-consistent estimation of ν(h, f ) without further as-
sumptions. Indeed Theorem 2.2 makes this rigorous.
Remark 2.6 (Tuning parameter-free estimation). Finally, being based on the
Grenander estimator and the plug-in principle, the estimator ν(h, fˆn ) is completely
tuning parameter-free unlike those considered in Robins et al. (2008), Mukherjee
et al. (2016), Mukherjee et al. (2017) for the sake of adaptation over different
smoothness classes.
Although it is clear from Theorem 2.2 that the Grenander estimator based
plug-in
√ principle for estimating integrated functionals
√ of a monotone density is
n-consistent and possesses a weak limit in the n-scale, it is not immediately
obvious whether the limiting distribution is Gaussian with the efficient variance.
The rest of this subsection is devoted to making this connection. We also discuss
some further implications.
In particular, the description of the limiting behavior of the plug-in estimator
(properly normalized) in Theorem 2.2 involves the stochastic processes G and Ĝ.

Recall that G = B ◦ F , where B is the standard Brownian bridge process on [0, 1].
Further, Ĝ = M0F (G) is the Hadamard directional derivative of G at F . The
following result, due to Beare and Fang (2017, Proposition 2.1), characterizes the
Hadamard directional derivative of Mg , for g ∈ `∞ (R+ ), tangentially to C(R+ )
and is crucial to our subsequent analysis.
Proposition 2.3 (Proposition 2.1 of Beare and Fang (2017)). The LCM operator
M : `∞ (R+ ) → `∞ (R+ ) is Hadamard directionally differentiable at any concave g ∈
`∞ (R+ ) tangentially to C(R+ ). Its directional derivative M0g : C(R+ ) → `∞ (R+ )
is uniquely determined as follows: for any ξ ∈ `∞ (R+ ) and x ≥ 0, we have
M0g ξ(x) = MTg,x ξ(x),
where Tg,x = {x} ∪ Ug,x , and Ug,x is the union of all open intervals A ⊂ R+ such
that (i) x ∈ A, and (ii) g is affine over A.
The above proposition will help us demonstrate that indeed Y has a Gaussian
distribution with the desired efficient variance. However, to give more intuition to
the reader, we begin by discussing two simple scenarios.
Case (i): First, let F be strictly concave on R+ . From the above proposition
it follows that M0g is linear if and only if g is strictly concave on R+ (Beare and
Fang (2017)). Thus, M0F ξ = ξ for all ξ ∈ C(R+ ). Thus, if F is strictly concave
then Ĝ = G and consequently
Z
Y = G(x)d h0 (f (x))f (x) + h(f (x)) .

In this form, it is easy to see that Y is a centered Gaussian random variable. The
fact that its variance matches the efficiency bound will be demonstrated in the
proof of Corollary 2.5.
Case (ii): Next consider the other extreme case, i.e., F is piecewise affine. In
this case f is piecewise constant with jumps (say) at 0 < t1 < t2 < . . . < tk < ∞
and values v1 > . . . > vk , for some k ≥ 2 integer, i.e.,
k
X
f (x) = vi 1(ti−1 ,ti ] (x),
i=1
where t0 ≡ 0. As f only takes k + 1 values (namely, {v1 , . . . , vk , 0}) we have that

k
X k
X
Y = ai G(ti ) + bi Ĝ(ti )
i=1 i=1
where
ai := {h(f (ti −)) − h(f (ti +))} and bi := {h0 (f (ti −))f (ti −) − h0 (f (ti +))f (ti +)}.

Further, from Proposition 2.3, we have, for every i = 1, . . . , k, Ĝ(ti ) ≡ M0F G(ti ) =
G(ti ). As G(tk ) = 0 a.s., we have
k−1
X Z
Y = (ai + bi )G(ti ) = G(x)d(h0 (f (x))f (x) + h(f (x)))
i=1
which is the same distribution as in the case when

R F was strictly concave.
0
Since in both the above cases we have Y = G(x)d(h (f (x))f (x) + h(f (x))),
one can conjecture that the result should be generally true for any concave F . This
intuition indeed turns out to be correct as Theorem 2.4 below shows.
Theorem 2.4. Assume conditions (A1)-(A3) and recall the definition of Y from
(4). Then
Z
d
Y = G(x)d(h0 (f (x))f (x) + h(f (x))),
d
where = stands for equality in distribution.
Now, as an immediate corollary to Theorem 2.4 we get the following result
which expresses the right side of (4) in a more familiar form.
Corollary 2.5. Assume that conditions (A1)-(A3) hold. Then,
√
d
n ν(h, fˆn ) − ν(h, f ) → N (0, σeff
2
(f )), (8)

2 0
where, for X ∼ P , σeff (f ) := Var h (f (X))f (X) + h(f (X)) < ∞.
The proof of Corollary 2.5 invokes Theorem 2.2 followed by Theorem 2.4 and
involves some further tedious computations; see Section 4.4 for its proof.
Remark 2.7 (Asymptotic efficiency). Observe that the limiting variance in Corol-
lary 2.5 (see (8)) matches the semiparametric lower bound in this problem (Bickel
et al. (1993), van der Vaart (1998), van der Vaart (2002), Laurent (1996)). This
shows that the plug-in estimator ν(h, fˆn ) is asymptotically efficient when F is
strictly concave.
Remark 2.8 (Monomial functionals). Of special interest is the case when Rh(x) =
xp−1 , where p ≥ 2 is an integer. For example, the quadratic functional f 2 is
obtained when p = 2. In this case, Corollary 2.5 yields an asymptotically efficient
variance of
Z Z 2 !
2 p−1 2 2p−1 p

σeff (f ) = VarP pf (X) =p f (x)dx − f (x)dx , (9)
whenever F is strictly concave.

Remark 2.9 (Construction of confidence intervals). As an important consequence
of Corollary 2.5 one can construct asymptotically valid confidence intervals for the
functional ν(h, f ). In particular, one can estimate the efficient asymptotic variance
2
of σeff as follows. Note that,

2
σeff (f ) = Var h0 (f (X))f (X) + h(f (X)) = ν(h1 , f ) − (ν(h2 , f ))2
where h1 (x) = (h0 (x)x + h(x))2 and h2 (x) = h0 (x)x + h(x). As a result, if h ∈
C 3 (R+ ) then h1 , h2 ∈ C 2 (R+ ) and Corollary 2.5 implies that σ̂n2 := ν(h1 , fˆn ) −
(ν(h2 , fˆn ))2 is a consistent estimator of σeff
2
(f ). Consequently ν(h, fˆn ) ± zα/2 σ̂n
(with zα/2 being the 1 − α/2 quantile of the standard normal distribution) is an
asymptotically valid 100 × (1 − α)% confidence interval for ν(h, f ). Indeed, the
additional assumption of h ∈ C 3 (R+ ) compared to h ∈ C 2 (R+ ) in Corollary 2.5
can be reduced since we only demand consistency of σ̂n , instead of asymptotic
normality. This can be done by suitably modifying the control of T3 in Theorem
2.2. We omit the details for the sake of avoiding repetitions.
2.4. When f is uniformly distributed
As mentioned in Remark 2.4, Theorem 2.2 does not give a non-degenerate limit
when f is the uniform distribution on the interval [0, c], for any c > 0. Indeed,
as we will see, in such a case ν(h, fˆn ) has a faster rate of convergence. In this
subsection we focus our attention to the case when f is the uniform distribution
on [0, 1]; an obvious scaling argument can then be used to generalize the result to
the case when f is uniform on [0, c], for any c > 0.
Suppose that P is the uniform distribution on [0, 1], i.e., f (x) = 1[0,1] (x), x ∈ R.
It has been shown in Groeneboom and Pyke (1983) (also see Groeneboom (1983))
that in this case
n fˆn2 (x) − 1 dx − log n d
R
√ → N (0, 1). (10)
3 log n
The above result yields the Rasymptotic distribution for the plug-in estimator of
the quadratic functional (as f 2 (x)dx = 1), properly normalized. In the following
theorem we extend the above result to general smooth functionals of the Grenander
estimator.
i.i.d.
Theorem 2.6. Suppose that h ∈ C 4 (R+ ) and X1 , . . . , Xn ∼ U [0, 1]. Then

ˆ
n ν(h, fn ) − ν(h, f ) − 21 h00 (1) + 2h0 (1) log n d

q → N (0, 1). (11)
3 00 (1) + 20 h(1) 2 log n

4
h
Remark 2.10. The proof of Theorem 2.6 above showsR that the result remains valid
for estimating a more general functional of the form g(f (x))dx, for g ∈ C 4 (R+ ),

R
instead of functionals of the form ν(h, f ) := h(f (x))f (x)dx. However, for the
sake of consistency, for the rest of the paper we only present the simpler form of
the result here.
The proof of the above result is given in Section 4.5 and follows closely the line
of argument presented in Groeneboom and Pyke (1983). Indeed, as is somewhat
apparent from the statement of the theorem, the proof relies on a Taylor expansion
of h and thereafter controlling the error terms. In particular, we prove that
n (fˆn (x) − 1)k dx
R
√ = oP (1), k = 3, 4. (12)
log n
and then a simple three term Taylor expansion yields the desired result.
3. Simulation study
In this section we illustrate the distributional convergence of our plug-in estimator

ν(h, fˆn ) (see (3)) for estimating ν(h, f ). Let us consider
R the case of estimating the
quadratic functional, i.e., h(x) = x and ν(h, f ) = f 2 (x)dx.
1.4
1.4
Limit Limit
0.5
n = 1000 n = 1000
1.2
1.2
n = 10000 n = 10000
n = 50000 n = 50000
n = 200000 n = 200000
1.0
1.0
0.4
0.8
0.8
0.3
0.6
0.6
0.4
0.4
0.2
0.2
0.2
0.0
0.0
0.1
−1.0 −0.5 0.0 0.5 1.0 5000 10000 50000 2000000 −1.0 −0.5 0.0 0.5 1.0
Sample size
Fig 1: The left (right) plot shows kernel density estimates of the distribution of T1 (T3 )
as sample size n varies, along with the actual limiting normal distribution (in
red). Center: Box plots of the distribution of T2 (excluding outliers) as n varies.
We first consider the case when the true density f is strictly concave and thus,
the asymptotic distribution of ν(h, fˆn ) is given by (8) (in Corollary 2.5) where
2
σeff (f ) is given in (9) (with p = 2). Recall the decomposition given in (6). We
study each of the three terms T1 , T2 and T3 (see (7)) in this decomposition sep-
arately and illustrate their limiting behaviors. For the plots in Figure 1 we took
f to be the exponential density with parameter 1 for which ν(h, f ) = 1/2 and
2
σeff (f ) = 4/12. The plots show the limiting behaviors of T1 , T2 and T3 as the sam-
ple size increases (n = 1000, 10000, 50000, 200000); the estimated distributions are
computed from 1000 independent replications. We see that T1 , being just a simple
average (properly standardized), converges to a limiting normal distribution (in
this scenario it converges to the distribution N (0, 2/12)) by the central limit theo-
rem. The distribution of T2 , as n increases, seems to converge to 0 (in probability),

but even for moderate sample sizes it is non-negligible. We can easily check that in
this scenario (i.e., when f is the exponential distribution) T3 should also converge
to the same limit as T1 . However, there seems to be a slight (vanishing) bias in the
distribution of T3 , even for moderately large sample sizes. Although these findings
broadly corroborate the theoretical results in Theorem 2.2 and Corollary 2.5, it is
indeed true that the asymptotic regime seems to kick in quite slowly (i.e., even for
moderate sample sizes there seem to be non-ignorable biases in the distributions
of T2 and T3 ).
Limit Limit
n = 1000 n = 1000
n = 10000 n = 10000
0.4
0.4
n = 50000 n = 50000
n = 200000 n = 100000
0.8
0.3
0.3
0.6
0.2
0.2
0.4
0.1
0.1
0.2
0.0
0.0
0.0
−2 0 2 5000 10000 50000 2000000 −2 0 2
Sample size
Fig 2: The same plots as in Figure 1 for the case when F is given in (13).
Let us now consider the case when f is piecewise √ constant. To fix ideas, as
in Beare and Fang (2017), let us take (with c := 1 − 1/ 2)
(
√x , x ∈ [0, c]
F (x) := 2−1√ √ (13)
2 − 2 + ( 2 − 1)x, x ∈ [c, 1].
Then, from Beare and Fang (2017, Proposition 2.1), it follows that, for any ξ ∈
C(R+ ), (
M[0,c] ξ(x), x ∈ [0, c]
M0F ξ(x) = (14)
M[c,1] ξ(x), x ∈ [c, 1].
Corollary 2.5 gives the asymptotic distribution of ν(h, fˆn ) in this case as well. Fig-
ure 2 shows the corresponding plots (the limiting behaviors of T1 , T2 and T3 as n in-
2
creases) when F is defined in (13) for which ν(h, f ) ≈ 1.828 and σPWA (f ) ≈ 3.314.
We see that T1 and T3 are virtually indistinguishable in this scenario and both
converge to a limiting normal distribution. In this case, the normal approximation
seems quite good even for moderately small sample sizes. The distribution of T2 ,
as n increases, seems to converge to 0 (in probability).

4. Proofs
4.1. Proof of Lemma 2.1
Let X(1) < X(2) < . . . < X(n) be the ordered data points and let
θ̂i = fˆn (X(i) ), for i = 1, . . . , n,
be the fitted values of the Grenander estimator at the ordered data points. A simple
characterization of θ̂ := (θ̂1 , . . . , θ̂n ) is given by
n 2
X 1
θ̂ = arg min − θi (X(i) − X(i−1) ),
θ∈C i=1
n(X(i) − X(i−1) )
where
C = {θ ∈ Rn : θ1 ≥ · · · ≥ θn },
and X(0) ≡ 0; see e.g., Groeneboom and Jongbloed (2014, Exercise 2.5). From the
characterization of projection onto the closed convex cone C it follows that for any
function h : R → R (see e.g., Robertson et al. (1988, Theorem 1.3.6))
n
X 1
− θ̂i (X(i) − X(i−1) )h(θ̂i ) = 0
i=1
n(X(i) − X(i−1) )
n n
1X X
⇔ h(θ̂i ) = h(θ̂i )θ̂i (X(i) − X(i−1) )
n i=1 i=1
⇔ Pn [h ◦ fˆn ] = P̂n [h ◦ fˆn ].
The last step follows from the fact that

Z n
X
P̂n [h ◦ fˆn ] = h(fˆn (x))fˆn (x)dx = h(θ̂i )θ̂i (X(i) − X(i−1) ),
i=1
as fˆn (·) is constant on the interval (X(i−1) , X(i) ], for every i = 1, . . . , n.
4.2. Proof of Theorem 2.2
Recall the notation in (7) and the terms T1 , T2 and T3 .
4.2.1. Analysis of T2 :
This term is controlled though an asymptotic equicontinuity argument made pre-

cise by the following lemma.

Lemma 4.1. Under the conditions of Theorem 2.2 the following stochastic equicon-
tinuity holds:
P
T2 → 0.
Proof. Fix > 0. For any C, M > 0 define the event
E(C, M ) := {kfˆn − f k2 ≤ Cn−1/3 , kfˆn k∞ ≤ M f (0+)}, (15)

R
where for√any function g : R → R, kgk22 := g 2 (x)dx. Finally, using the notation
of Gn ≡ n(Pn − P ), we have to study the probability

P |Gn (h(fˆn ) − h(f ))| >

= P |Gn (h(fˆn ) − h(f ))| > , E(C, M ) + P |Gn (h(fˆn ) − h(f ))| > , E c (C, M ) .
First we claim that one can choose C, M > 0 large enough (depending on ) such
that for any n large enough (depending on )
P(E c (C, M )) ≤ /2. (16)
To this end note that, by Woodroofe and Sun (1993, Theorem 2 and Remark 3),
ˆ (0+)
the sequence of random variables ffn(0+) is asymptotically P -tight and consequently,
∗
given any > 0 there exists M > 0 (depending on > 0) such that for n large
enough (depending on > 0)

P kfˆn k∞ > M ∗ f (0+) ≤ . (17)
4
Moreover, kfˆn k∞ ≤ M ∗ f (0+) implies that
2
√ ∗ 1
Z q
ˆ ˆ
2 2
p
kfn − f k2 ≤ 2( M + 1) f (0+) fn (x) − f (x) dx
2
√
= 2( M ∗ + 1)2 f (0+)h2 P̂n , P ,
where Z q 2
1
fˆn (x) −
2
p
h P̂n , P := f (x) dx
2
is the Hellinger divergence between P̂n and P . As assumptions (A1)-(A2) are sat-
isfied, by van de Geer (2000, Theorem 7.12), there exists a c > 0 such that for all
C 0 ≥ c,
C 02 n1/3

0 −1/3

P h P̂n , P > C n ≤ c exp − .
c2
Consequently, for an appropriate C ∗ (depending on M ∗ ) and c̃, we have

P kfˆn − f k2 > C ∗ n−1/3 , kfˆn k∞ ≤ M ∗ f (0+) ≤ c exp −c̃n1/3 ≤ , (18)
4
for n large enough (depending on ). Now, (17) and (18) together imply (16) with
(C, M ) = (C ∗ , M ∗ ). The rest of the proof works with this choice of (C, M ). Let
δn := Cn−1/3 ,
and let Pmon denote the set of all nonincreasing densities on R+ . Subsequently,
note that

E kGn kFn (δn )
P |Gn (h(fˆn ) − h(f ))| > , E(C, M ) ≤ P kGn kFn (δn ) > ≤ ,

where
Fn (δn ) := {h(g) − h(f ) : g ∈ Pmon , kg − f k2 ≤ δn , kgk∞ ≤ M f (0+)} ,
and
kGn kFn (δn ) := sup |Gn (g)|.
g∈Fn (δn )
Note that here we do not mention measurability issues since this can be taken
care of in a standard way by working with the outer expectation instead (see e.g.
van der Vaart and Wellner (1996)) Thus, it is enough to show that

E kGn kFn (δn ) → 0 as n → ∞.
Now, note that for any g̃ := h(g) − h(f ) ∈ Fn (δn ), using the fact that h ∈ C 1 (R),
kg̃k∞ ≤ (M + 1)f (0+) sup |h0 (x)| = M 0 (say).
x∈[0,(M +1)f (0+)]
Moreover,
!2
2
|h0 (x)| δn2 = δ̃n2

P g̃ ≤ sup (say).
x∈[0,(M +1)f (0+)]
Hence by van der Vaart and Wellner (1996, Lemma 3.4.2),

!
J˜[ ]
E kGn kFn (δn ) ≤ J˜[ ] 1 + √ M0

, (19)
δ̃n2 n
where
˜ ˜
J[ ] ≡ J[ ] δ̃n , Fn (δn ), L2 (P )
and for any class of functions F ⊆ L2 (P ), and δ > 0
Z δq
˜
J[ ] (δ, F, L2 (P )) = 1 + log N[ ] (u, F, L2 (P ))du,
0
with N[ ] (u, F, L2 (P )) being the u bracketing number of F measured in the L2 (P )-

norm. The next lemma estimates this relevant bracketing entropy and will help us
complete the proof.

Lemma 4.2. Assume that h ∈ C 2 (R+ ). Then there exists a constant B depending
on M such that for large enough n (depending on (C, M ))
q
˜
J[ ] δ̃n , Fn (δn ), L2 (P ) ≤ B δ̃n .
We first finish the proof of Lemma 4.1 assuming the validity of Lemma 4.2.
Plugging in the estimate from Lemma 4.2 into (19) we get
q p ! q !
B δ̃n 0 B
E kGn kFn (δn ) ≤ B δ̃n 1 + √ M = B δ̃n 1 + 3/2 √ M 0

δ̃n2 n δ̃n n
q
≤ B 0 δ̃n ,
where the last inequality follows since δ̃n = O(n−1/3 ) and B 0 depends on M 0 , C, M, f (0+)
and supx∈[0,(M ∗ +1)f (0+)] |h0 (x)| which completes the proof of Lemma 4.1.
4.2.2. Proof of Lemma 4.2:
Note that h is uniformly Lipschitz on the interval [0, (M + 1)f (0+)]. If h were
monotone, the proof would immediately follow from the fact the ε-bracketing en-
tropy of all uniformly bounded monotone functions is of order 1/ε (which would
contain g ∈ Pmon , kgk∞ ≤ M f (0+)); see e.g., van de Geer (2000, Equation (2.5)).
To handle the case of a general h ∈ C 2 (R+ ) ⊂ C 1 (R+ ), we write h = h+ − h−
where h+ and h− are monotone nonincreasing (and Lipschitz) functions. Hence,
one can provide obvious modifications to obtain a bracketing number for Fn (δn )
which is similar in order to the case when h in monotone. This completes the proof
of the lemma.
4.2.3. Analysis of T1 + T3 (as defined in (7)):
We first analyze T3 . We start by noting that, as h ∈ C 2 (R+ ),

Z n o
n −1/2 ˆ
T3 = P [h(fn )] − P [h(f )] = h(fˆn (x)) − h(f (x)) f (x)dx
Z
= (fˆn (x) − f (x))h0 (f (x))f (x)dx + wn ,
where
Z
wn := (fˆn (x) − f (x))2 h00 (ξn (x))f (x)dx,
with |ξn (x) − f (x)| ≤ |fˆn (x) − f (x)|. In order to control wn , fix an > 0 and recall
the choice of (C, M ) which guarantees the validity of (16), (17), and (18). Thus,
with an appropriate choice of (C, M ), we have
P (|wn | > )

 
sup |h00 (x)|kfˆn − f k22 >
2f (0+)
,
≤ P x∈[0,(M +1)f (0+)] 
kfˆn k∞ ≤ M f (0+)

+ P kfˆn k∞ > M f (0+) ≤ + < ,
4 4
for n is sufficiently large. Here we have used the fact that h ∈ C 2 (R+ ) which implies
that h00 is bounded on any compact set, and hence in particular over the interval
[0, (M +1)f (0+)]. Consequently wn = oP (1). Moreover, simple integration by parts
implies the following two identities:
Z Z
(fˆn (x) − f (x))h (f (x))f (x)dx = − (F̂n − F )(x)d(h0 (f (x))f (x)),
0
and
Z
(Pn − P )[h(f )] = − (Fn − F )(x)d(h(f (x))).
Therefore, using the two above displays and (2), we have,

√
Z
T1 + T3 = − n(Fn − F )(x)d(h(f (x)))
√
Z
− n(F̂n − F )(x)d(h0 (f (x))f (x)) + oP (1)
Z Z
= − Dn (x)d(h(f (x))) − D̂n d(h0 (f (x))f (x)) + oP (1) (20)
Z Z
d 0
→− G(x)d(h(f (x))) + Ĝ(x)d(h (f (x))f (x)) (21)
as n → ∞, where the last step follows from a version of the continuous mapping
d
theorem: Observe that (i) (Dn , D̂n ) → (G, Ĝ) in the product space `∞ (R+ ) ×
`∞ (R+ ); (ii) for any g1 , g2 : R+ → R uniformly bounded,
Z Z
(g1 , g2 ) 7→ g1 (x)d(h(f (x))) + g2 (x)d(h0 (f (x))f (x))
is a continuous function. This gives the convergence of (20) to (21) and completes
the proof of the theorem.
Let g(t) = t · h0 (t), for t ∈ R. Let us assume that the function g ◦ f is monotone
(otherwise we can split g ◦ f as the difference of two monotone functions and apply
the following to each part). Note that it is enough to show that for any ξ ∈ C(R+ ),
Z Z
0
MF ξ(x)d[g(f (x))] = ξ(x)d[g(f (x))].

Then we can take ξ to be G ≡ B ◦ F (which is almost surely continuous) and we
will obtain the desired result.
Suppose that
I = {x ∈ (0, ∞) : F is affine in an open interval around x}.
It is easy to show that I is a Borel measurable set. Consider the collection {TF,x :
x ∈ I}, where TF,x is defined in Proposition 2.3 (also see Beare and Fang (2017,
Proposition 2.1)). First note that x ∈ TF,x , for every x ∈ I. We now claim that there
are at most countably many such distinct TF,x ’s, as x varies in I. This follows from
the fact that for any x in the support of F , F has a strictly positive slope on TF,x
(i.e., F cannot be affine with slope 0 on TF,x , as F is concave and nondecreasing)
which implies that we can associate a unique rational number in the range of F
to this interval TF,x . Let {TF,xi }i≥1 (for xi ∈ (0, ∞)) be an enumeration of this
countable collection {TF,x : x ∈ I}. Obviously, I ⊂ ∪∞ i=1 TF,xi .
Let ξ ∈ C(R+ ). Then,
Z Z
0

MF ξ(x)d[g(f (x))] − ξ(x)d[g(f (x))]

Z Z
≤ |M0F ξ(x) − ξ(x)|d[g(f (x))] + |M0F ξ(x) − ξ(x)|d[g(f (x))]
I [0,∞)\I
∞
XZ
≤ |MTF,xi ξ(x) − ξ(x)|d[g(f (x))]
i=1 TF,xi
where the last inequality follows from the fact that for x ∈ [0, ∞) \ I, M0F ξ(x) =
ξ(x) (see Beare and Fang (2017, Remark 2.2)) and the above integrals are viewed
as a Lebesgue-Steiljes. Notice now that on the open interval TF,xi , f is constant
(as F is affine on TF,xi ). Hence each of the integrals in the last display equals 0.
This completes the proof.
4.4. Proof of Corollary 2.5
Recall the definition of Y in (4). Furthermore, note that by Beare and Fang (2017),
when F is strictly concave, we have Ĝ = G. Consequently,
Z
Y = − G(x)d(g(x)),
where
g(x) = h0 (f (x))f (x) + h(f (x)), x ∈ R.
It is easy to see that the limiting Y is a centered Gaussian random variable and
2
therefore we only show that Var(Y ) matches σeff . Further, we only prove the result
for the case when g(∞) := limt→∞ g(t) = 0, i.e., h(0) = 0. The other cases can be
derived by working with h∗ (x) := h(x) − h(0) instead of h(x).

To operationalize our argument, we begin with the following simplification:
Z Z Z Z
d
G(x)d(g(x)) = B ◦ F (x)dg(x) = W ◦ F (x)dg(x) − W(1) F (x)dg(x),
d
where W is the standard Brownian motion on [0, 1] and = stands for equality in
distribution. Therefore,
Z Z
Var(Y ) = Var W ◦ F (x)dg(x) + Var W(1) F (x)dg(x)
Z Z
− 2 Cov W ◦ F (x)dg(x), W(1) F (x)dg(x)
Z 2 Z 2
=E W ◦ F (x)dg(x) + F (x)dg(x)
Z Z
− 2 F (x)dg(x)E W(1) W ◦ F (x)dg(x) .
Now, by simple integration by parts,

Z Z
F (x)dg(x) = − g(x)dF (x),
and consequently it is enough to prove the next lemma to complete the proof of
Corollary 2.5.
Lemma 4.3. The following identities hold under the assumptions of Corollary 2.5:
Z Z
E W(1) W ◦ F (x)dg(x) = F (x)dg(x), and
Z 2 Z
E W ◦ F (x)dg(x) = g 2 (x)dF (x).
Proof. The first identity is obvious since under our assumptions one can inter-
change the integral and expectation and observe that
E(W(1)W(F (x))) = Cov (W(1), W(F (x))) = min{1, F (x)} = F (x).
For the second identity, note that
Z 2 Z Z
0 0
E W ◦ F (x)dg(x) = E W ◦ F (x)W ◦ F (x )dg(x)dg(x )
Z ∞ Z x0
=2 F (x)dg(x)dg(x0 ). (22)
0 0
R x0
Now, let v(x0 ) := 0
F (x)dg(x), for x0 ∈ R+ . Then by repeated integration by
parts
Z ∞ Z x0
F (x)dg(x)dg(x0 )
0 0

Z ∞
= lim v(x0 )g(x0 ) − v(0)g(0) − g(x0 )dv(x0 )
x0 →∞ 0
Z ∞
=− g(x0 )dv(x0 )
0
Z ∞
0 0 0
=− g(x )F (x )dg(x )
0
1 ∞ 2 0
Z
0 1 2 0 1 2 0
= − lim F (x ) g (x ) − F (0) g (0) − g (x )dF (x )
x0 →∞ 2 2 2 0
1 ∞ 2 0
Z
= g (x )dF (x0 ). (23)
2 0
Plugging back (23) into (22) completes the proof of Lemma 4.3.
First let us define g : R+ → R as g(x) := h(x)x, for x ∈ R+ , and consequently

g ∈ C 4 (R+ ), g 00 (x) = h00 (x)x + 2h0 (x), and
Z
ν(h, fn ) − ν(h, f ) = [g(fˆn (x)) − g(f (x))]dx.
ˆ
Note that for some ξn (x) lying between f (x) ≡ 1 and fˆn (x) one has by a four term
Taylor expansion (allowed by the assumption that h ∈ C 4 (R+ ))

n ν(h, fˆn ) − ν(h, f ) − 12 (h00 (1) + 2h0 (1)) log n
q
3
4
(h00 (1) + 2h0 (1))2 log n
n [g(fˆn (x)) − g(1)]dx − 12 g 00 (1) log n
R
= q
3[ 21 g 00 (1)]2 log n
n (fˆn (x) − 1)2 dx − log n
R
= √ (24)
3 log n
1 n g (3) (1)(fˆn (x) − 1)3 dx 1 n g (4) (ξn (x))(fˆn (x) − 1)4 dx
R R
+ q + q .
6 1 00 2
3[ 2 g (1)] log n 24 1 00 2
3[ 2 g (1)] log n
Therefore, if we prove that
n (fˆn (x) − 1)k dx

R
√ = oP (1), for k = 3, 4,
log n
then
R
n g (4) (ξ(x))(fˆ (x) − 1)4 dx n (fˆn (x) − 1)4 dx
R
n (4)
√ ≤ sup |g (z)| √

3 log n z∈[0,kfˆn k∞ ∨1] 3 log n

= oP (1),
since kfˆn k∞ = OP (1) by Woodroofe and Sun (1993) and g (4) is bounded on com-
pact intervals. Therefore, modulo the claim above, one readily obtains the desired
result (11) from (24) and (10).
n (fˆn (x)−1)k dx
R
The next two subsections are devoted towards understanding √
log n
for
k = 3, 4. In the analysis, we crucially use Groeneboom and Pyke (1983, Theorem
2.1) and consequently need the following notation to proceed. Let
n Nj
n X
X X
Tn := jNj , Sn := Sji ,
j=1 j=1 i=1
where {Sji : i ≥ 1, j ≥ 0} are independent random variables with Sji ∼ Gamma(j, 1),
for j ≥ 1, and S0i ∼ Gamma(1, 1), and Nj ∼ Poisson(1/j) for j ≥ 1 and N0 = 1
a.s.; further, {Nj } and {Sji } are independent. Also let,
n Nj
n X
1X −1/2
X
Wn = jNj , Vn = n (Sji − j).
n j=1 j=1 i=1
n (fˆn (x)−1)k dx
R
With this notation we are ready to analyze √
log n
for k = 3, 4.
n (fˆn (x)−1)3 dx
R
4.5.1. Proof of √
log n
= oP (1)
Let us start by considering the distribution of

Z Z Z
3 2
Ln := (fn (x) − 1) dx = fn (x)dx − 3 fˆn (x)dx + 2.
ˆ 3 ˆ
By an argument similar to that in Groeneboom and Pyke (1983, Section 3), it can
be seen that Ln has the same distribution as that of
n Nj n Nj
1 X 3X 1 3 X 2X 1
L∗n := j − j + 2 {Tn = n, Sn = n}.

2
n j=1 i=1 Sji n j=1 i=1 Sji
For constants c0 , cn , γn , we define the sequence of random variables


n Nj n Nj
1 X 3 X 1 1 X
2
X 1 1
Un (c0 , cn , γn ) := j 2
− 2 −3 j −
cn j=1 i=1 Sji j j=1 i=1
Sji j

n XNj
X
+ c0 (Sji − j) − γn  .
j=1 i=1

The particular choice of c0 , cn , γn is crucial in the analysis that follows. In partic-
ular, a specific choice for the constant c0 implies certain fine-tuned cancellations
which are necessary for controlling the asymptotic variance of Un at a desired level.
It is worth noting that such a definition and the eventual choice of c0 = 1 is also
present in Groeneboom and Pyke (1983, Equation (3.2)). Our choice of c0 , cn , γn is
more tailored to the current problem. Finally, conditional on the joint event that
Wn = 1 and Vn = 0 (i.e., Tn = n and Sn = n) one has
d
Un (c0 , cn , γn ){Wn = 1, Vn = 0} = c−1 ∗

n (nLn − γn ) {Tn = n, Sn = n}. (25)
The following lemma, crucial to our proof, is the analogue of Groeneboom and
Pyke (1983, Lemma 3.1).
√ d
Lemma 4.4. Let cn = log n, γn = 0 and c0 = −1. Then (Un , Vn , Wn ) →
(U, V, W ) where U = δ0 , the dirac measure at 0, and (V, W ) has the infinitely

R1 2 2
divisible characteristic function φV,W (s, t) = exp 0 (eitu−s u /2 − 1)u−1 du .
Proof. With the choice of c0 , cn , γn provided in the statement of the theorem,
P
denote Un ≡ Un (c0 , cn , γn ). We only prove the fact that Un → 0. The subsequent
conclusion follows verbatim from the proof of Groeneboom and Pyke (1983, Lemma
Γ(j−k)
−k
3.1). In the following analysis we repeatedly use the fact that E Sji = Γ(j) for
j > k.
First, define Un∗ to be Un without its first 3 summands in j, i.e.,
Un∗
 
n Nj n Nj X n X Nj
1 X X 1 1 X X 1 1
:= j3 2
− 2 −3 j2 − − (Sji − j) .
Sji j j=4 i=1
P P
We will show that Un∗ → 0 and as Un − Un∗ → 0 (since cn → ∞), this will establish
the desired result. Note that,
n n
∗
X
3 Γ(j − 2) 1 X
2 Γ(j − 1) 1
cn E (Un ) = j E(Nj ) − 2 −3 j E(Nj ) −
j=4
Γ(j) j j=4
Γ(j) j
n
X 4
= .
j=4
(j − 1)(j − 2)
As lim supn→∞ | nj=4 (j−1)(j−2)

4
| < ∞, one has E(Un∗ ) → 0 (as cn → ∞). Subse-
P
quently it is enough to prove that Var(Un∗ ) → 0 which in turn is implied by proving

that lim supn→∞ Var(cn Un∗ ) < ∞. First note that
n 2
X 4j
Var(E(cn Un∗ |N1 , . . . , Nn )) = Var(Nj )
j=4
(j − 1)(j − 2)

n 2
X 4j 1
= ,
j=4
(j − 1)(j − 2) j
and consequently, lim supn→∞ Var(E(cn Un∗ |N1 , . . . , Nn )) < ∞. Next, note that
n
∗
X 1 1
Var(cn Un |N1 , . . . , Nn ) = Nj j 6 Var( 2 ) + 9j 4 Var( ) + Var(Sj1 )
j=4
Sj1 Sj1

5 1 1 3 1 1
−6j Cov( 2 , ) − 2j Cov( 2 , Sj1 ) + 6Cov( , Sj1 )
Sj1 Sj1 Sj1 Sj1
n
X 15j 5 + 129j 4 − 404j 3 + 116j 2 + 48j
= Nj 2 (j − 2)2 (j − 3)(j − 4)
.
j=4
(j − 1)
Consequently,
lim sup E(Var(cn Un∗ |N1 , . . . , Nn ))
n→∞
n
X 15j 4 + 129j 3 − 404j 2 + 116j + 48
= lim sup < ∞.
n→∞
j=4
(j − 1)2 (j − 2)2 (j − 3)(j − 4)
This completes the proof of the lemma.

We finally note that Lemma 4.4 immediately implies that Un is asymptotically
P
independent of (Wn , Vn ) and that Un − Um → 0 for any n, m → ∞. Consequently,
the analysis provided in the proof of Groeneboom and Pyke (1983, Theorem 3.1)
goes through verbatim yielding that
d
Un |{Wn = 1, Vn = 0} → 0
d
which in turn proves that (using (25)) c−1 ∗
n (nLn − γn ) {Tn = n, Sn = n} → 0 for

our choice of γn = 0. This yields the desired result.
n (fˆn (x)−1)4 dx
R
4.5.2. Proof of √
log n
= oP (1)
(fˆ (x)−1)3 dx
R
n
The proof technique is similar to that of the case √
log n
albeit with much
more cumbersome details and a different choice of c0 . As before, let us start by
considering the distribution of
Z Z Z Z
4 3 2
Ln := (fˆn (x) − 1) dx = fˆn (x)dx − 4 fˆn (x)dx + 6 fˆn (x)dx − 3.
4
As argued in Groeneboom and Pyke (1983, Section 3), it is equivalent to studying

the (conditional) distribution of
n Nj n Nj n Nj
1 X 4X 1 4 X 3X 1 6 X 2X 1
L∗n := j − j + j − 3 {Tn = n, Sn = n}.

3 2
n j=1 i=1 Sji n j=1 i=1 Sji n j=1 i=1 Sji

We once again introduce a sequence of random variables

n Nj n Nj
1 X 4 X 1 1 X
3
X 1 1
Un (c0 , cn , γn ) := j 3
− 3 −4 j 2
− 2
Sji j

n Nj n X Nj
X X 1 1 X
+6 j2 − + c0 (Sji − j) − γn  .
j=1 i=1
S ji j j=1 i=1
Then, conditional on the joint event that Wn = 1 and Vn = 0 (i.e., Tn = n and

Sn = n) one has
d
Un (c0 , cn , γn ){Wn = 1, Vn = 0} = c−1 ∗

n (nLn − γn )|{Tn = n, Sn = n}.
Once again we have a crucial lemma which is the analogue of (Groeneboom and
Pyke, 1983, Lemma 3.1).
√ d
Lemma 4.5. Let cn = log n, γn = 0 and c0 = 1. Then (Un , Vn , Wn ) → (U, V, W )
where
RU = δ0 and (V, W ) has the infinitely divisible characteristic function φV,W (s, t) =
1 itu−s2 u2 /2
exp 0 (e − 1)u−1 du .
Proof. With the above choice of c0 , cn , γn , denote by Un ≡ Un (c0 , cn , γn ). We only
P
prove the fact that Un → 0. The subsequent conclusion follows verbatim from the
proof of Groeneboom and Pyke (1983, Lemma 3.1), as in the proof of Lemma 4.4.
First, define Un∗ to be Un without it’s first 4 summands in j, i.e.,

n Nj n Nj
∗ −1 
X
4
X 1 1 X
3
X 1 1
Un =cn j 3
− 3 −4 j 2
− 2
j=5 i=1
S ji j j=5 i=1
S ji j

n Nj X n X Nj
X X 1 1
+6 j2 − + (Sji − j) .
j=5 i=1
S ji j j=5 i=1
P P
We show that Un∗ → 0 and indeed by Un − Un∗ → 0 (since cn → ∞) we establish
the desired claim. Note that,
n n
∗
X
4 Γ(j − 3) 1 X
3 Γ(j − 2) 1
cn E (Un ) = j E(Nj ) − 3 −4 j E(Nj ) − 2
j=5
Γ(j) j j=5
Γ(j) j
n
X Γ(j − 1) 1
+6 j 2 E(Nj ) −
j=5
Γ(j) j
n
X 2(j − 9)
= .
j=5
(j − 1)(j − 2)(j − 3)

Pn 2(j−9)
Now lim supn→∞ | j=5 (j−1)(j−2)(j−3) | < ∞ by standard ratio test. Consequently,
∗
using cn → ∞ one has E(Un ) → 0.
Subsequently it is enough to prove that Var(Un∗ ) → 0 which in turn is implied
by proving that lim supn→∞ Var(cn Un∗ ) < ∞. First note that
n 2
X 2j(j − 9)
Var(E(cn Un∗ |N1 , . . . , Nn )) = Var(Nj )
j=5
(j − 1)(j − 2)(j − 3)
n 2
X 2j(j − 9) 1
= ,
j=5
(j − 1)(j − 2)(j − 3) j
and consequently, lim supn→∞ Var(E(cn Un∗ |N1 , . . . , Nn )) < ∞. Next note that
Var(cn Un∗ |N1 , . . . , Nn )

n
X 1 1 1
= Nj j 8 Var( 3 ) + 16j 6 Var( 2 ) + 36j 4 Var( ) + Var(Sj1 )
j=5
Sj1 Sj1 Sj1
1 1 1 1 1
− 8j 7 Cov(
3
, 2 ) + 12j 6 Cov( 3 , ) + 2j 4 Cov( 3 , Sj1 )
Sj1 Sj1 Sj1 Sj1 Sj1

5 1 1 3 1 2 1
− 48j Cov( 2 , ) − 8j Cov( 2 , Sj1 ) + 12j Cov( , Sj1 )
Sj1 Sj1 Sj1 Sj1
n
X 96j 7 + 4297j 6 − 14259j 5 − 28198j 4 + 102180j 3 − 33336j 2 − 4320j
= Nj .
j=5
(j − 1)2 (j − 2)2 (j − 3)2 (j − 4)(j − 5)(j − 6)
Consequently,
lim sup E(Var(cn Un∗ |N1 , . . . , Nn ))

n→∞
n
X 96j 6 + 4297j 5 − 14259j 4 − 28198j 3 + 102180j 2 − 33336j − 4320
= lim sup < ∞.
n→∞ (j − 1)2 (j − 2)2 (j − 3)2 (j − 4)(j − 5)(j − 6)
j=5
This completes the proof of the lemma.

We finally note that as before Lemma 4.4 immediately implies that in the asymp-
P
totic distributional limit, Un is independent of Wn , Vn and that Un −Um → 0 for any
n, m → ∞. Consequently, the analysis provided in proving Groeneboom and Pyke
d
(1983, Theorem 3.1) goes through verbatim yielding that Un |{Wn = 1, Vn = 0} → 0
which in turn proves that
d
c−1 ∗
n (nLn − γn )|{Tn = n, Sn = n} → 0
(fˆn (x)−1)4 dx
R
P
for our choice of γn = 0. Consequently, √
log n
→ 0.

References
Beare, B. K. and Z. Fang (2017). Weak convergence of the least concave majorant
of estimators for a concave distribution function. Electron. J. Stat. 11 (2), 3841–
3870.
Bickel, P. J., C. A. Klaassen, P. J. Bickel, Y. Ritov, J. Klaassen, J. A. Wellner, and
Y. Ritov (1993). Efficient and adaptive estimation for semiparametric models.
Johns Hopkins University Press Baltimore.
Bickel, P. J. and Y. Ritov (1988). Estimating integrated squared density deriva-
tives: sharp best order of convergence estimates. Sankhyā Ser. A 50 (3), 381–393.
Bickel, P. J. and Y. Ritov (2003). Nonparametric estimators which can be
“plugged-in”. Ann. Statist. 31 (4), 1033–1053.
Birgé, L. (1987). On the risk of histograms for estimating decreasing densities.
Ann. Statist. 15 (3), 1013–1022.
Birgé, L. and P. Massart (1995). Estimation of integral functionals of a density.
Ann. Statist. 23 (1), 11–29.
Cai, T. T. and M. G. Low (2003). A note on nonparametric estimation of linear
functionals. Ann. Statist. 31 (4), 1140–1153.
Cai, T. T. and M. G. Low (2004). Minimax estimation of linear functionals over
nonconvex parameter spaces. Ann. Statist. 32 (2), 552–576.
Cai, T. T. and M. G. Low (2005). Nonquadratic estimators of a quadratic func-
tional. Ann. Statist. 33 (6), 2930–2956.
Donoho, D. L., R. C. Liu, and B. MacGibbon (1990). Minimax risk over hyper-
rectangles, and implications. Ann. Statist. 18 (3), 1416–1437.
Donoho, D. L. and M. Nussbaum (1990). Minimax quadratic estimation of a
quadratic functional. Journal of Complexity 6 (3), 290–323.
Dümbgen, L. and K. Rufibach (2009). Maximum likelihood estimation of a log-
concave density and its distribution function: basic properties and uniform con-
sistency. Bernoulli 15 (1), 40–68.
Fan, J. (1991). On the estimation of quadratic functionals. Ann. Statist. 19 (3),
1273–1294.
Giné, E. and R. Nickl (2016). Mathematical foundations of infinite-dimensional
statistical models. Cambridge Series in Statistical and Probabilistic Mathemat-
ics, [40]. Cambridge University Press, New York.
Grenander, U. (1956). On the theory of mortality measurement. II. Skand. Aktu-
arietidskr. 39, 125–153 (1957).
Groeneboom, P. (1983). The concave majorant of Brownian motion. Ann.
Probab. 11 (4), 1016–1027.
Groeneboom, P. (1985). Estimating a monotone density. In Proceedings of the
Berkeley conference in honor of Jerzy Neyman and Jack Kiefer, Vol. II (Berke-
ley, Calif., 1983), Wadsworth Statist./Probab. Ser., Belmont, CA, pp. 539–555.
Wadsworth.
Groeneboom, P. and G. Jongbloed (2014). Nonparametric estimation under shape

constraints, Volume 38 of Cambridge Series in Statistical and Probabilistic Math-
ematics. Cambridge University Press, New York. Estimators, algorithms and
asymptotics.
Groeneboom, P., G. Jongbloed, and J. A. Wellner (2001). Estimation of a convex
function: characterizations and asymptotic theory. Ann. Statist. 29 (6), 1653–
1698.
Groeneboom, P. and R. Pyke (1983). Asymptotic normality of statistics based on
the convex minorants of empirical distribution functions. Ann. Probab. 11 (2),
328–345.
Hall, P. and J. S. Marron (1987). Estimation of integrated squared density deriva-
tives. Statistics & Probability Letters 6 (2), 109–115.
Jankowski, H. (2014). Convergence of linear functionals of the Grenander estimator
under misspecification. Ann. Statist. 42 (2), 625–653.
Kerkyacharian, G. and D. Picard (1996). Estimating nonquadratic functionals of
a density using Haar wavelets. Ann. Statist. 24 (2), 485–507.
Kim, A. K., A. Guntuboyina, and R. J. Samworth (2016). Adaptation in log-
concave density estimation. arXiv preprint arXiv:1609.00861 .
Laurent, B. (1996). Efficient estimation of integral functionals of a density. Ann.
Statist. 24 (2), 659–681.
Mukherjee, R., W. K. Newey, and J. M. Robins (2017). Semiparametric ef-
ficient empirical higher order influence function estimators. arXiv preprint
arXiv:1705.07577 .
Mukherjee, R., E. T. Tchetgen, and J. Robins (2016). On adaptive estimation of
nonparametric functionals. arXiv preprint arXiv:1608.01364 .
Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on prob-
ability theory and statistics (Saint-Flour, 1998), Volume 1738 of Lecture Notes
in Math., pp. 85–277. Springer, Berlin.
Nickl, R. (2007). Donsker-type theorems for nonparametric maximum likelihood
estimators. Probab. Theory Related Fields 138 (3-4), 411–449.
Nickl, R. (2008). Uniform central limit theorems for the grenander estimator.
(unpublished manuscript).
Pollard, D. (1984). Convergence of stochastic processes. Springer Series in Statis-
tics. Springer-Verlag, New York.
Prakasa Rao, B. L. S. (1969). Estimation of a unimodal density. Sankhyā Ser.
A 31, 23–36.
Robertson, T., F. T. Wright, and R. L. Dykstra (1988). Order restricted statistical
inference. Wiley Series in Probability and Mathematical Statistics: Probability
and Mathematical Statistics. Chichester: John Wiley & Sons Ltd.
Robins, J., L. Li, E. Tchetgen, and A. van der Vaart (2008). Higher order influence
functions and minimax estimation of nonlinear functionals. In Probability and
statistics: essays in honor of David A. Freedman, Volume 2 of Inst. Math. Stat.
(IMS) Collect., pp. 335–421. Inst. Math. Statist., Beachwood, OH.
Robins, J. M., L. Li, R. Mukherjee, E. T. Tchetgen, and A. van der Vaart (2017).

Minimax estimation of a functional on a structured high-dimensional model.
Ann. Statist. 45 (5), 1951–1987.
Shapiro, A. (1991). Asymptotic analysis of stochastic programs. Ann. Oper.
Res. 30 (1-4), 169–186. Stochastic programming, Part I (Ann Arbor, MI, 1989).
Söhl, J. (2015). Uniform central limit theorems for the Grenander estimator. Elec-
tron. J. Stat. 9 (1), 1404–1423.
Tchetgen, E., L. Li, J. Robins, and A. van der Vaart (2008). Minimax estimation
of the integral of a power of a density. Statist. Probab. Lett. 78 (18), 3307–3311.
van de Geer, S. A. (2000). Applications of empirical process theory, Volume 6
of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge
University Press, Cambridge.
van der Vaart, A. (2002). Semiparametric statistics. In Lectures on probability
theory and statistics (Saint-Flour, 1999), Volume 1781 of Lecture Notes in Math.,
pp. 331–457. Springer, Berlin.
van der Vaart, A. W. (1998). Asymptotic statistics, Volume 3 of Cambridge Se-
ries in Statistical and Probabilistic Mathematics. Cambridge University Press,
Cambridge.
van der Vaart, A. W. and J. A. Wellner (1996). Weak convergence and empir-
ical processes. Springer Series in Statistics. Springer-Verlag, New York. With
applications to statistics.
Woodroofe, M. and J. Sun (1993). A penalized maximum likelihood estimate of
f (0+) when f is nonincreasing. Statist. Sinica 3 (2), 501–515.

Estimation of Integrated Functionals of A Monotone Density

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Estimation of Integrated Functionals of A Monotone Density

Uploaded by

Copyright:

Available Formats

Estimation of Integrated Functionals of a

Estimation of functionals of data generating distributions is a fundamental research

1.1. Problem formulation and a summary of our contributions

Suppose that X1 , . . . , Xn are independent and identically distributed (i.i.d.) ran-

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

1.2. Comparison with the literature when f belongs to a smoothness

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

1.3. Organization of the paper

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

2. Estimation of the Integrated Functional ν(h, f )

having a nonincreasing density f and (concave) distribution function F . Let Pn

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

(A2) We further assume that for some α ∈ (0, 1],

(A3) The functional ν(h, f ) is “smooth” in the sense that

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

A natural estimator of f in this situation is fˆn , the Grenander estimator — the

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

where G and Ĝ are defined in Section 2.1. Then

We shall analyze T1 + T3 and T2 separately. In particular, we shall show that T2 =

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

to R is continuous. The validity of this continuity follows since: (i) G ≡ B ◦ F is

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

M0g ξ(x) = MTg,x ξ(x),

where t0 ≡ 0. As f only takes k + 1 values (namely, {v1 , . . . , vk , 0}) we have that

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

which is the same distribution as in the case when

whenever F is strictly concave.

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

2.4. When f is uniformly distributed

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

In this section we illustrate the distributional convergence of our plug-in estimator

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

−2 0 2 5000 10000 50000 2000000 −2 0 2

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

4.1. Proof of Lemma 2.1

θ̂i = fˆn (X(i) ), for i = 1, . . . , n,

⇔ Pn [h ◦ fˆn ] = P̂n [h ◦ fˆn ].

The last step follows from the fact that

as fˆn (·) is constant on the interval (X(i−1) , X(i) ], for every i = 1, . . . , n.

4.2. Proof of Theorem 2.2

Recall the notation in (7) and the terms T1 , T2 and T3 .

This term is controlled though an asymptotic equicontinuity argument made pre-

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

E(C, M ) := {kfˆn − f k2 ≤ Cn−1/3 , kfˆn k∞ ≤ M f (0+)}, (15)

P(E c (C, M )) ≤ /2. (16)

Hence by van der Vaart and Wellner (1996, Lemma 3.4.2),

with N[ ] (u, F, L2 (P )) being the u bracketing number of F measured in the L2 (P )-

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

4.2.2. Proof of Lemma 4.2:

4.2.3. Analysis of T1 + T3 (as defined in (7)):

We first analyze T3 . We start by noting that, as h ∈ C 2 (R+ ),

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

Therefore, using the two above displays and (2), we have,

4.3. Proof of Theorem 2.4

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

I = {x ∈ (0, ∞) : F is affine in an open interval around x}.

4.4. Proof of Corollary 2.5

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

Now, by simple integration by parts,

imsart-generic ver. 2014/10/16 file: Draft_5.tex date: August 27, 2018

4.5. Proof of Theorem 2.6

First let us define g : R+ → R as g(x) := h(x)x, for x ∈ R+ , and consequently

P(E c (C, M )) ≤ /2. (16)