Asymptotically Optimal Histogram Density Approximations

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

Asymptotically Optimal Cells for a Historgram

Author(s): Atsuyuki Kogure


Source: The Annals of Statistics, Vol. 15, No. 3 (Sep., 1987), pp. 1023-1030
Published by: Institute of Mathematical Statistics
Stable URL: https://www.jstor.org/stable/2241813
Accessed: 02-06-2023 20:46 +00:00

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Institute of Mathematical Statistics is collaborating with JSTOR to digitize, preserve and


extend access to The Annals of Statistics

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
The Annals of Statistics
1987, Vol. 15, No. 3, 1023-1030

ASYMPTOTICALLY OPTIMAL CELLS FOR A HISTOGRAM

BY ATSUYUKI KOGURE

Fukushima University

The purpose of this paper is to examine the properties of the histogram


when the cells are allowed to be arbitrary. Given a random sample from an
unknown probability density f on I, we wish to construct a histogram. Any
partition of I can be used as cells. The optimal partition minimizes the mean
integrated squared error (MISE) of the histogram from f. An expression is
found for the infimum of MISE over all partitions. It is proved that the
infimum is attained asymptotically by minimizing MISE over a class of
partitions of locally equisized cells.

1. Introduction. Let f be a continuous probability density on an interval I


in the real line. I includes its endpoints if it is finite. Given a random sample
X1, X2,..., X. of size n from f, we wish to construct a histogram. Prior to the
actual drawing of a histogram we need to choose cells into which the Xi's are
grouped. Any partition of I can be used as cells. (We call a collection {C>) of
intervals of I a partition of I if Ci n Cj = 0 whenever i 1 j and if I = UjCj.)
When a partition Q = {Cj> is chosen, the histogram on Cj is

Hn(xIQ) := Fn(Cj)1ICI9

where ICjl denotes the width of cell Cj and


n

Fn(C)= n- lE {XiE C ).
i=l

Considering the histogram an estimate of f, we adopt the


squared error,

(1.1) MISE(n,Q) := E I(Hn(xIQ) -f(x))dXj^

as the risk is using Q. Here E denotes the expectation with


The risk is decomposed into two components as

(1.2) MISE(n, Q) = F(C- )(i F(C)) +-

where fQ(x) = E[Hn(xIQ)]. The first component


the sampling variability and the second the bias.
the bias at the cost of increasing the sampling v
is to minimize MISE by compromising the confli

Received April 1986; revised January 1987.


AMS 1980 subject classifications. Primary 62G05; secondary 62E20.
Key words and phrases. Histogram, cells, partition, mean integrated squared error.
1023

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
1024 A. KOGURE

In this paper we examine the properties of the histogram when the cells are
allowed to be arbitrary. We ask how close, in terms of the MISE, the histogram
comes to f. Several authors-including Tapia and Thompson (1978), Scott (1979)
and Freedman and Diaconis (1981)-have investigated this problem. However,
they deal with the subject under the restrictive setting that all cells in a
partition have a common cell width, i.e., for all j, I (Cl = h for some h > 0.
pursue this problem with no restrictions on cells. We derive an asymptotic
expression for the infimum of MISE over all partitions and show that the
infimum can be achieved by a "partition of locally equisized cells."
We analyze the problem under the following conditions on f:
(C.1) f is twice continuously differentiable;
(C.2) f, f ' and f " all belong to L2;
(C.3) f I(x)f(x)I2/3 dX> 0.
An important class of densities excluded by the conditions is that of step
functions. For any such density, the optimal histogram is the density itself.

2. Asymptotic behaviour of MISE. How small can MISE be made as the


sample size tends to infinity? Let 9 denote the class of all partitions of L We
have

THEOREM 1. Let (C.1) and (C.2) be satisfied. Then

(2.1) inf MISE(n, Q) = [(a)( )1/3 lf f(x)f(x)12/3 dx] n -2/3 + (n - 2/3

Under the restrictive setting that all cells in a partition have a common width
h, Freedman and Diaconis (1981) obtained a relation parallel to (2.1):

(2-2) inf MISE(n,


(.)h>O h) = )(_I) / f t(x))2 dx n-2/3 + o(n-2/3),
[(2i)/((613

where MISE(n, h) is the MISE of the histogram with common cell width h. See
also Scott (1979). The ratio of the leading term of (2.1) to that of (2.2) is
J f '(X)f(X)12/3 dx/{Jt( f '(X))2 dX}l/3, which is seen to be no more than 1 by
H6lder's inequality. It is approximately 0.89 for the normal density and 0.24 for
the Cauchy density.
Theorem 1 will be derived by proving the following two propositions. The first
of these gives an asymptotic lower bound for the MISE.

PROPOSITION 1. Under (C.1) and (C.2) we have

(2.3) liminfn2/3 inf {MISE(n, Q)} 2 (1)6'/ - f1 t '(x)f(x)12/3dx.


n - oo Q 9

All we need then is to find a partition whose MISE behaves like the lower
bound as n increases. Our scheme for getting such a partition is based on three

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
OPTIMAL CELLS FOR A HISTOGRAM 1025

observations:

(i) on a small interval f is well approximated by a linear function;


(ii) the bias term f( fQ(x) - f(x))2 cdx of MISE(n, Q) is minimized for a
partition of equisized cells if f is linear; and
(iii) if f does not change much over the region, then the magnitude of the
sampling variation (1/n)EjF(Cj)(1 - F(Cj))/lCjl is mostly determined by
number of cells, regardless of the cell sizes being equal or unequal.

With these observations we might expect that the desired partition can be
found in a class of partitions of locally equisized cells (POLEC). A POLEC is
constructed by "dividing I twice," a process separated into two steps:
Step 1. Choose a reference point x0 E I and a mesh width w > 0 to obtain an
equally spaced net {A i E Z}, where Z is a set of integers and

Ai:= [x0 + (i - 1)W, xO + iw),

(2.4) u i = I.
ieZ

Step 2. For each i E Z choose a positive integer ki and


cells, allowing different ki's for different Ai's. Put k:=
Q( w, k) denote the resultant POLEC.
Let {w,j and {U,} be sequences of positive numbers such that
(2.5) n -l << wn << n-/ Un << n /, (WnlUn) << n-/
where an << bn means limsupn :(a,,/bn) = 0. Define a class of POLEC

(2.6) Y(Wn) Un) {Q(wn,k)Iki < Un for all i E Z}.


Then we have

PROPOSITION 2. Under conditions (C.1) and (C.2) we have

inf{MISE(n, Q): Q E 9 (wn, Un))

(2.7) = ()6/3 f '(x) f (x) 12/3 n -2/3 + o( n2/3).

Clearly, Propositions 1 and 2 together imply Theorem 1. The proofs of the


propositions occupy the next section.
Another implication of the two propositions is

inf{MISE(n, Q): Q e g(wn, Un)} = inf {MISE(n, Q)} + o(n-2/3).

This means that there is a POLEC whose associated MISE is nearly as small as
that of the optimal partition as the sample size grows large. This may call forth
some interest in developing a data-driven rule for choosing a POLEC. Some
discussion on this can be found in Kogure (1986). For the related subject of the
data-based choice of a cell width, see Scott (1979), Rudemo (1982), Chow, Geman
and Wu (1983) and Stone (1985).

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
1026 A. KOGURE

3. Proofs.

PROOF OF PROPOSITION 1. For each interval C let

y(C) = (3)6-1/3JcIf /(x)f(x)I2/3 dx


and for each Q = {C.} E 9 let a?,(Q) - n2/3MISE(n, Q) - y(I).
Put hn (n2/9logn)-' and define a subclass gn of .2 as

:n = (Q = {C)jl0 < ICIl < hn forall I).


For each interval C, 0 < ICI < so, let

Sn(C) = (.i)(nl/31C1)2 + JC(f'(x))2dx + (n l/31CI)-1F(C)


and for each Q = {Cj} e Yn let f3n(Q) = Ej(Sn(Cj) - y(Cj)). Observe that
inf an(Q) ? inf f3,(Q) + inf (an(Q) -in(Q))
(3.1) ~Qe Q r eQ =g
+ (inf an(Q) - inf an(Q))

Then it suffices to show that each term on the right of (3.1) has a nonnegative
lower limit. First, we will show

(3.2) liminf inf f,1(Q) 2 0.


n --#oo Q g?n

Fix Q= {Cj) E- 'n For each C. E Q


1/3

n( Cj)2 ( 23)6-/3{(f(x))2 dx) (F(Cj)) /

> (3)6-/3 f '(x)f(x) 12/3 dx (by H6lder's inequaity)

Thus, fln(Q) ? 0, which implies (3.2).


Second, we will show

(3.3) liminf inf (an(Q) - f3n(Q)) 2 0.

Fix Q = (Cj) E 96,. Then


an(Q) - n(Q) = n2/3MISE(n, Q) - n(Cj)
I

(3.4) = fl2/3( (fQ(X) - f(X)) dx - (d )2 l9 ( f J (X))2 dx)


-1n/3 E (F(Cj))21,Cj.

It is easy to see that for each j there is a point xj inside cj such that

j( fQ(X) - f(X))2 dC 2 (?)( f '(Xj))21C l.

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
OPTIMAL CELLS FOR A HISTOGRAM 1027

Then, by Lemma 2.22 of Freedman and Diaconis (1981), we

f( '(xj)) 2ICj - | f(X))2 dx < 21c11f I t '(x)f "

Thus,

f( Q(x) - f(x))2 dx - ( EI) lCjq2 ( f t(X))2 dx


(3.5) 1 1

> -2h3 f '(x)f "(x) I dx.


n

Note that

(3.6) =f(I(F(Cj))21Cj, fQ(x))2 dx < f(X))2 dx.


Combining (3.4), (3.5) and (3.6) and recalling that h, = (n2/9log n)1 we have
(3.3).
Lastly, we will show

(3.7) lim inf ( inf an(Q) - inf an(Q)} ? 0.


n oQeg =-g

It would suffice to show that for each Q e 9, there


sufficiently large n

(3.8) an(Q) - an(Qo) 2 rn,


where {rn} is a sequence of real numbers such
then for all sufficiently large n: infQ ef gAf
construct such Q? as follows. Fix C E Q. If I
Q?. If ICI > h , then divide C equally into s
equal to hn if ICI = oo. If ICI < so, then ther
that hn(m - 1) < ICI < hnm. Set h* equal to
subintervals thus made be included in Q?. Repeat this for each C E Q. Then we
have

an(Q) - an(Q?)

-2/3(J1( IQ(x) - f(x))2 _X fQO(X) _ f X))2 d

(3*9) + (1/n)( E F(C)(1 - F(C))IICI


CEQ, CEQO

- E F(C)(1 - F(C))IICl))
C4Q, CEQ0

Note that Q? is finer than Q in the sense that for each C E Q there is C0 E Q?
such that CO c C. Thus

(3.10) j( fQ(x) f(x))2 dx > ( fQo(x) - f(x))2 dx.

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
1028 A. KOGURE

Observe that

(3.11) E F(c)/C < (inf ICI) < Qhn.


CeQ, CE CeQ0

Combining (3.6), (3.9), (3.10) and (3.11) we have

an(Q) - a(Q n) ? 1-1/3(2h-1 + f(f(X))2 dx)

This in turn implies the existence of {rn) in (3.8). The proof of the proposition is
now complete. O

PROOF OF PROPOSITION 2. The mean value theorem implies that for each
i E Z there is xi E Ai such that

(3.12) l I '(x)f(x) 2 dX = W, f (Xi) f (Xi) 12/3,

where A, is defined as at (2.4). For each i E Z define a real-valued function Hi(.)


on (0, oo) as

(3.13) Hi(t) 1= (12)Wn3( f '(x))2/t2 + f (xi)t/n.


Put Z? := {i E Zlf(xi) # 0 and f '(xi) 0). For each i E Z? let

t*= 6- 1/3f{( I 1(Xi))2/f(xi)} /n1/3wl .

t x minimizes Hi(t) and Hi(t*) = (3)6-1/3JAf'(x)f(x)22/3 n-2/3 By (312)


JAIf '(x)f(x)I2/3 dx = 0 for i E Z?. Thus,

(3.14) ? Hi(t") = (1)6-1/3 f '(x) f(x) 12/3 x n-2/3


ieZI

Let Q(wn, k) be a partition in 9(wn, Un), where 9(wn, Un) is defi


It is easy to see that

(3.15) MISE(n, Q(wnjk)) = ? Hi(ki) + o(n 2/3),

where the o(n -2/3) term is bounded by a quantity which converges to ze


than n-2/3, uniformly in both {xi} and k. In the light of (3.14) and (3.1
conclusion follows if we verify the existence of a sequence of positive
{k,?i E Z, k* < U)n such that

(3.16) Hi(k#*) - Hi(t?*) = o(n-2/3).


ieZ ieZ0

If both f '(xi) and f(xi) ar


of f '(xi) and f(x ) is not ze

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
OPTIMAL CELLS FOR A HISTOGRAM 1029

integer such that H.(k9) is closest to Hj(t"). F

if i E Z and kp < U,
Un= -1 if i E Z and k9 > Un,i
g Un if f '(xi) # 0 and f(xi) 0,
1 if f '(xi) = 0 and f (xi) #0.

Now consider the four possible cases and the respective inequalities below. For
the first two cases we utilize the following relation: If i E Z?, then

(3.17) 2( 1 )( f ,(Xi))2W3t-2 > (f(xi)/n)t, iff t' >< t.


Case 1. i E ZO and k9 < Un. By (3.17)
Hi(t?* + 1) < (3)( f (xj)1n)(t?* + 1)
and

Hi(t*) = (3)(f(xi)/n)t*.
Thus,

(3.18) Hi(k?H ) - Hi(t*) < Hi(tVI + 1) - Hj(0)

= (3)( f(xi)/n).
Case 2. ieZOand k9 > Un.Then tV, + 1> Un Thus,
(3.19) Hi(U -1) < (j)( f (X))2Wn3(U -_) 1K.
Case 3. f (xi) # 0 and f(xt) = 0. Then

(3.20) Hi(Un) = ( 11)( f '(x ))2wn,U,-.


Case 4. f '(xi) = 0 and f(xi) # 0. Then
(3.21) Hi(l) = f (xi)/n.
Combining (3.18)-(3.21) we have

Hi (k* )- Hi(ti)
ieZ ieZ0

(3.22) < (E)( Ef(xi))/n + (4)( ( ((Xi))2)W(U- 1)

By Corollary 2.24 of Freedman and Diaconis (1981) we have

(3.23) E f (xi)wn- (x) dx < wnj I '(xf ) I dx


and

(3.24) | (I f(xi))2w,-j( f(x))2 dx < 2wnjI f '(x) f (x) I dx.


iEr=Z

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms
1030 A. KOGURE

Equation (3.22) together with (3.23) and (3.24) implies

H E (k*)- E Hi(t*) = o((nwn) 1 + (wJUn)2)


ieZ iEZ0

= O(n- 2/3)
The last relation holds because of (2.5). This completes the proof of Proposition
2. ri

Acknowledgments. This work is part of the author's Ph.D. thesis at Yale


University, written under the supervision of Professor J. A. Hartigan, whose
guidance and suggestions are gratefully acknowledged and appreciated. Thanks
are also due to Professors K. Takeuchi and Y. Hosoya for helpful comments.

REFERENCES

CHOW, Y.-S., GEMAN, S. and Wu, L. D. (1983). Consistent crossvalidated density estimation. Ann.
Statist. 11 25-38.
FREEDMAN, D. and DIACONIS, P. (1981). On the histogram as a density estimator: L2 theory. Z.
Wahrsch. verw. Gebiete 57 453-476.
KOGURE, A. (1986). Optimal cells for a histogram. Ph.D. thesis, Yale Univ.
RUDEMO, M. (1982). Empirical choice of histograms and kernel density estimators. Scand. J.
Statist. 9 65-78.
ScoTr, D. W. (1979). On optimal and data-based histograms. Biometrika 66 605-610.
STONE, C. (1985). An asymptotically optimal histogram selection rule. In Proc. of the Berkeley
Conference in Honor of Jerzy Neymnan and Jack Kiefer (L. M. Le Cam and R. A. Olshen,
eds.) 2 513-520. Wadsworth, Belmont, Calif.
TAPIA, R. A. and THOMPSON, J. R. (1978). Nonparametric Probability Density Estimation. Johns
Hopkins Univ. Press, Baltimore, Md.

FACULTY OF ECONOMICS
FUKUSHIMA UNIVERSITY
MATSUKAWA-MACHI, FUKUSHIMA 960-12
JAPAN

This content downloaded from 128.197.229.194 on Fri, 02 Jun 2023 20:46:02 +00:00
All use subject to https://about.jstor.org/terms

You might also like