Testing Monotonicity of Regression
A. W. Bowman, M. C. Jones, and I. Gijbels
Key Words: Bootstrap; Critical bandwidth; Local linear fitting; Multimodality testing;
Smoothing.
1. INTRODUCTION
It is often natural to assume that a regression relationship is monotone. For example,
a response might be expected to increase (or decrease) with increasing dose of a drug, at
least over some range of doses of interest; measurements made from physical or chemical
processes which are subject to decay can be expected to exhibit monotonically decreasing
(or increasing) behavior. Accordingly, many popular parametric forms for regression
are monotone (linear, logistic, exponential, power law). In nonparametric regression,
if monotonicity is a valid assumption, incorporation of a monotonicity constraint will
enhance performance (Friedman and Tibshirani 1984; Mukerjee 1988; Ramsay 1988;
Mammen 1991). On the other hand, forcing monotonicity on an estimate when the true
relationship is nonmonotone may lead to erroneous inferences. It can be argued that a
good nonparametric estimator should reflect monotonicity, or its absence, as indicated by the
data, and hence that monotonicity constraints need never be introduced (Breiman 1988;
Hastie and Tibshirani 1988). However, a test of monotonicity of a regression function
would often be a useful adjunct to looking at a smooth estimate, particularly one that gives an
inconclusive indication of a monotone relationship, and especially with a view to
specifying a parametric model. It is the purpose of this article to provide such a test.
A. W. Bowman is Professor, Department of Statistics, University of Glasgow, Glasgow G12 8QQ, Scotland,
U.K. (E-mail: adrian@stats.gla.ac.uk). M. C. Jones is Reader, Department of Statistics, The Open University,
Milton Keynes MK7 6AA, U.K. I. Gijbels is Associate Professor, Institut de Statistique, Université Catholique
de Louvain, 20 Voie du Roman Pays, 1348 Louvain-la-Neuve, Belgium.
© 1998 American Statistical Association, Institute of Mathematical Statistics,
and Interface Foundation of North America
Journal of Computational and Graphical Statistics, Volume 7, Number 4, Pages 489–500

Our test is an analogue of Silverman's (1981) test of multimodality of a probability
density function. For a particular version of kernel density estimation (with the normal
kernel), the number of modes of the density estimate is a monotone decreasing function of
the smoothing parameter, or bandwidth. Silverman's test statistic is the "critical" bandwidth,
the smallest bandwidth at which the estimate has no more modes than the null hypothesis
specifies. The idea is that if the null hypothesis is true, this critical bandwidth will be
relatively small, but that if the true density has more modes, the critical bandwidth will be
rather larger, because more smoothing is needed to reduce the estimate to the null number
of modes. Silverman (1981) used the density estimate computed with the critical bandwidth
as the basis of a natural "smoothed bootstrap" approach to assessing the distribution of the
test statistic under the null hypothesis. See also Wong (1985), Minnotte and Scott (1993),
and Fisher, Mammen, and Marron (1994).
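Silverman's critical bandwidth can be illustrated with a short Python sketch (function names hypothetical): count the modes of a normal-kernel density estimate on a fine evaluation grid, and return the smallest bandwidth on an increasing grid for which the count does not exceed the null number of modes k. The grids and the discrete mode-counting rule are assumptions of this sketch.

```python
import numpy as np

def count_modes(data, h, grid):
    """Number of local maxima of a normal-kernel density estimate evaluated on `grid`."""
    z = (grid[:, None] - data[None, :]) / h
    f = np.exp(-0.5 * z ** 2).sum(axis=1)   # unnormalized estimate: scaling does not move modes
    d = np.diff(f)
    # A mode is an interior grid point where the estimate stops rising and starts falling.
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

def critical_bandwidth_kde(data, k, hs, grid):
    """Smallest h in the ascending sequence hs whose estimate has at most k modes."""
    for h in hs:
        if count_modes(data, h, grid) <= k:
            return float(h)
    return None                              # no bandwidth on the grid is large enough
```

For two tight clusters centered at -2 and 2, the estimate becomes unimodal once h reaches about 2, the point at which an equal mixture of two normals with means ±2 stops being bimodal.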
For suitable nonparametric regression estimators, monotonicity of the estimate behaves
monotonically in the bandwidth: there is a critical bandwidth at which the estimate changes
from the nonmonotonicity exhibited at all smaller bandwidths to monotonicity that persists
for all larger bandwidths. Again, if the null hypothesis of monotonicity is true, the critical
bandwidth should be relatively small, while if monotonicity is false, a rather larger critical
bandwidth is needed to force monotonicity. Also by analogy, a smoothed bootstrap based on
the estimate computed with the critical bandwidth can be used to obtain the null distribution
of the test statistic. This is particularly straightforward in regression models in which the
mean specification also determines the variance structure; see Section 5. In normal-errors
regression, however, this step requires more specification than it does in density estimation.
Details and discussion of our proposal, with emphasis on the homoscedastic case, are given
in Section 2. The methodology is tested in simulations in Section 3 and applied to real
examples in Sections 4 and 5; it appears to work well.
Perhaps surprisingly, we have found no direct competitors to our procedure in the
literature. The closest work is that of Schlee (1982), who laid the theoretical groundwork
for a test based on the greatest discrepancy from zero of an estimate of the derivative of
the regression function. However, that work does not discuss practical implementation.
2. THE METHOD
A first requirement is a good method for nonparametric regression, and we use the
well-known local linear regression estimator (Cleveland 1979; Fan 1992; Hastie and
Loader 1993; Fan and Gijbels 1996). A particular merit of this approach is that as the
bandwidth becomes large, the estimator tends to a least squares straight line fit, and this,
of course, is monotone. We also feel that the method’s good boundary properties (Fan
and Gijbels 1992) will be advantageous. This estimator will be denoted by m̂(x; h),
where h is the bandwidth.
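A minimal sketch of $\hat m(x; h)$, assuming a Gaussian kernel (the kernel is not fixed at this point in the text) and using the hypothetical name `local_linear`: at each point x a straight line is fitted by weighted least squares, with weights K((X_i - x)/h), and its intercept at x is the estimate. As h grows the weights become equal, so the fit approaches the ordinary least squares line noted above.

```python
import numpy as np

def local_linear(x_eval, X, Y, h):
    """Local linear estimate m_hat(x; h) at each point of x_eval (Gaussian kernel assumed)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    d = X[None, :] - x_eval[:, None]              # offsets X_i - x, shape (len(x_eval), n)
    w = np.exp(-0.5 * (d / h) ** 2)               # kernel weights K((X_i - x)/h)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d ** 2).sum(1)
    t0, t1 = (w * Y).sum(1), (w * d * Y).sum(1)
    # Intercept of the weighted least squares line centered at x.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)
```

Because the local fit is a straight line, data lying exactly on a line are reproduced at every bandwidth, and with a very large h the estimate coincides with the global least squares line.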
Using m̂, we now give an outline version of our proposal.
1. Find the critical bandwidth, $h_c$ say, which is the smallest $h$ such that $\hat m(x; h)$ is
monotone.
2. Construct good estimates $\hat\varepsilon_1, \ldots, \hat\varepsilon_n$ of the residual errors.
3. Generate a bootstrap sample $\hat\varepsilon^*_1, \ldots, \hat\varepsilon^*_n$ from $\hat\varepsilon_1, \ldots, \hat\varepsilon_n$ and hence a bootstrap data
set $Y^*_i = \hat m(X_i; h_c) + \hat\varepsilon^*_i$, $i = 1, \ldots, n$.
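The outline above, combined with the p value described later in this section (the proportion of bootstrap samples whose critical bandwidth $h_c^*$ exceeds $h_c$), can be sketched in Python with NumPy. All function names are hypothetical, and several details are assumptions of this sketch rather than the article's choices: a Gaussian kernel, a 101-point evaluation grid for checking monotonicity, and centered residuals from a pilot fit with a user-supplied bandwidth `h_pilot` (the article instead uses the plug-in selector of Ruppert, Sheather, and Wand 1995 for this step). The sketch also assumes that a fit which is monotone at some bandwidth stays monotone at all larger ones, so that $h_c^* > h_c$ exactly when the bootstrap fit at $h_c$ is not monotone; only one fit per bootstrap sample is then needed.

```python
import numpy as np

def local_linear(x_eval, X, Y, h):
    """Local linear estimate m_hat(x; h) with an (assumed) Gaussian kernel."""
    d = X[None, :] - x_eval[:, None]
    w = np.exp(-0.5 * (d / h) ** 2)
    s0, s1, s2 = w.sum(1), (w * d).sum(1), (w * d ** 2).sum(1)
    t0, t1 = (w * Y).sum(1), (w * d * Y).sum(1)
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 ** 2)

def is_monotone(f):
    """True if the fitted values are entirely nondecreasing or entirely nonincreasing."""
    d = np.diff(f)
    return bool(np.all(d >= 0) or np.all(d <= 0))

def critical_bandwidth(X, Y, hs, grid):
    """Step 1: smallest h in the ascending grid hs whose fit is monotone on `grid`."""
    for h in hs:
        if is_monotone(local_linear(grid, X, Y, h)):
            return h
    raise ValueError("no monotone fit on the bandwidth grid")

def monotonicity_test(X, Y, hs, h_pilot, B=500, seed=0):
    """Return (h_c, p), p being the proportion of bootstrap samples with h*_c > h_c."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(X.min(), X.max(), 101)       # hypothetical evaluation grid
    h_c = critical_bandwidth(X, Y, hs, grid)
    fit_c = local_linear(X, X, Y, h_c)              # m_hat(X_i; h_c)
    # Step 2 (assumed form): centered residuals from a pilot fit with bandwidth h_pilot.
    resid = Y - local_linear(X, X, Y, h_pilot)
    resid = resid - resid.mean()
    exceed = 0
    for _ in range(B):
        # Step 3: resample residuals with replacement and add them to the critical fit.
        Y_star = fit_c + rng.choice(resid, size=len(Y), replace=True)
        # Under the persistence assumption, h*_c > h_c iff the fit at h_c is nonmonotone.
        if not is_monotone(local_linear(grid, X, Y_star, h_c)):
            exceed += 1
    return h_c, exceed / B
```

For example, `monotonicity_test(X, Y, hs, h_pilot=0.05)` returns the pair $(h_c, p)$, with `hs` an increasing bandwidth grid such as the one described below.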
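$$\tilde m(x; h) = \sum_{i=1}^{n} \int_{T_{i-1}}^{T_i} K_h(x - y)\,dy\; Y_i,$$

$$\begin{aligned}
\tilde m'(x; h_1) &= \sum_{i=1}^{n} \int_{T_{i-1}}^{T_i} K'_{h_1}(x - y)\,dy\; Y_i \\
&= \sum_{i=1}^{n} \int_{T_{i-1}}^{T_i} (K_{\delta} * K_h)'(x - y)\,dy\; Y_i \\
&= \sum_{i=1}^{n} \int \int_{T_{i-1}}^{T_i} K_{\delta}(x - u)\, K'_h(u - y)\,dy\,du\; Y_i
\end{aligned}$$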
of the regression estimate holds for some bandwidth $h$ but not for some $h_1 > h$) are
extremely unusual, and also short-lived. Out of 90,000 simulations this phenomenon
occurred on only two occasions; see Section 3. This empirical observation provides
practical justification for the proposed methodology.
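The convolution step in the displayed derivation relies on a semigroup property of the kernel. Assuming a Gaussian kernel $K_h(u) = \phi(u/h)/h$ (an assumption of this sketch), one has $K_{h_1} = K_\delta * K_h$ with $\delta^2 = h_1^2 - h^2$; since differentiation commutes with convolution, $\tilde m'(x; h_1) = \int K_\delta(x - u)\,\tilde m'(u; h)\,du$, which is nonnegative everywhere whenever $\tilde m'(\cdot\,; h)$ is. The sketch checks the kernel identity numerically by a Riemann sum; the bandwidths and grid are arbitrary choices.

```python
import numpy as np

def gauss_kernel(u, h):
    """Scaled Gaussian kernel K_h(u) = phi(u / h) / h."""
    return np.exp(-0.5 * (u / h) ** 2) / (h * np.sqrt(2 * np.pi))

h, h1 = 0.5, 0.8                                   # arbitrary bandwidths with h1 > h
delta = np.sqrt(h1 ** 2 - h ** 2)                  # delta^2 = h1^2 - h^2
u = np.linspace(-10.0, 10.0, 4001)                 # symmetric grid of odd length
du = u[1] - u[0]
# Riemann-sum approximation of (K_delta * K_h)(u); mode="same" keeps it aligned with u.
conv = np.convolve(gauss_kernel(u, delta), gauss_kernel(u, h), mode="same") * du
err = float(np.max(np.abs(conv - gauss_kernel(u, h1))))
print(f"max |K_delta * K_h - K_h1| on the grid: {err:.2e}")
```

The same identity does not hold for an arbitrary kernel, which is one reason the monotone-in-bandwidth property can fail occasionally for other estimators.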
One way of dealing with these occasional departures from the desired property is
to give a different definition of $h_c$, and to adopt the analogous definition for $h_c^*$. Strictly
speaking, this would involve abandoning the efficient procedure for the calculation of the
p value described earlier. However, we prefer to maintain computational efficiency by
adopting a slightly conservative approach. To this end, we define $h_c$ to be the smallest
value of $h$ that gives rise to a monotone regression estimate, even if there exist larger $h$
for which the estimate is not monotone. To see that the test using this definition is
conservative, consider a situation where, as $h$ increases, the shape of the estimate for the
bootstrap data set under consideration moves from nonmonotonicity, to monotonicity, to
nonmonotonicity, and back to monotonicity. Suppose that $h_c$ for the original data falls
within the "central" nonmonotone section for the bootstrap data. Since the p value of the
test is the proportion of cases where $h_c^* > h_c$, a case such as this would increase the
p value, since it would suggest, erroneously, that $h_c^*$ is larger than $h_c$ when in fact it
is smaller. This argument extends immediately to any other pattern of monotonicity and
nonmonotonicity one could envisage.
We note too that practical implementation involves discretization, since monotonicity
is examined over a grid of bandwidths. This further alleviates the problem. In the
examples to follow we used an initial grid of 50 values of $h$ between $r/(2n)$ and $r/2$,
where $r$ denotes the range of the $x$-values and $n$ denotes the sample size. (These
end-points were multiplied by two for the binomial response data considered later.) To
ensure that the bootstrap distribution of the critical bandwidth was not concentrated on a
few discrete values, the two grid points that bracket $h_c$ were identified and a second grid
search over 20 values between them was performed to increase the resolution.
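The two-stage search just described can be sketched generically, with the monotonicity check passed in as a function `is_monotone_at(h)` (hypothetical; for example, a local linear fit followed by a check of its increments). Equal spacing of the coarse grid and placing the 20 fine values between the two bracketing coarse points are assumptions of this sketch; the `scale` argument reproduces the doubled end-points used for binomial data.

```python
import numpy as np

def two_stage_critical_bandwidth(is_monotone_at, r, n, n_coarse=50, n_fine=20, scale=1.0):
    """Smallest h found monotone by a coarse search refined between the bracketing points."""
    lo, hi = scale * r / (2 * n), scale * r / 2
    coarse = np.linspace(lo, hi, n_coarse)     # equal spacing is an assumption
    for k, h in enumerate(coarse):
        if is_monotone_at(h):
            break
    else:
        return None                            # no monotone fit anywhere on the coarse grid
    if k == 0:
        return float(coarse[0])
    # Refine between coarse[k - 1] (nonmonotone) and coarse[k] (monotone).
    for h in np.linspace(coarse[k - 1], coarse[k], n_fine)[1:]:
        if is_monotone_at(h):
            return float(h)
    return float(coarse[k])                    # safeguard; the last fine value equals coarse[k]
```

Stopping at the first monotone value matches the definition of $h_c$ as the smallest monotone bandwidth, even if nonmonotone fits reappear at larger $h$.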
minimize $\sum_i \{\hat m(X_i) - m(X_i)\}^2$. We therefore use the local linear estimator $\hat m$ with the
"plug-in" automatic bandwidth selector of Ruppert, Sheather, and Wand (1995), which,
among other things, addresses the particular loss function mentioned previously.
We have no rigorous theoretical arguments to back up our belief that this method of
bootstrapping correctly mimics the distribution of interest, although simulation evidence
is provided in Section 3. However, it is clear heuristically that such a test is consistent.
If the null hypothesis of monotonicity is true, the critical bandwidth will, asymptotically,
be less than or equal to the bandwidth that is optimal for estimating $m$ without regard to
monotonicity, because an estimate using the latter bandwidth will, asymptotically, itself
be monotone. Since that optimal bandwidth tends to zero as $n$ grows, $h_c$ will tend to
zero under $H_0$. When $H_0$ is false, $h_c$ will converge to a nonzero value, since a nonzero
amount of smoothing is necessary to produce a monotone estimate from data generated by
a nonmonotone regression function. In the density estimation case, Silverman (1983) and
Mammen, Marron, and Fisher (1992) provided some theory in connection with the
analogous argument there.
Finally, we note that our method does not immediately cover heteroscedasticity in a
model with normal errors. All that is required for this extension is a mechanism which,
in the notation given previously, allows resampled $\hat\varepsilon_i$'s to be associated with $\hat m(X_j; h_c)$'s
only when $X_j$ is "close" to $X_i$. Other cases, such as Poisson and binomial regression
models in which the variance function is determined by the mean, are readily dealt with,
as in Section 5.
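The text leaves this mechanism open. One simple possibility, offered purely as a hypothetical illustration, is to draw the bootstrap residual attached to $X_j$ only from the residuals at the k design points nearest to $X_j$; the function name and the neighborhood size k are assumptions of this sketch.

```python
import numpy as np

def local_residual_resample(X, resid, k, rng):
    """For each j, draw a residual from the k design points nearest X_j (hypothetical mechanism)."""
    n = len(X)
    idx = np.empty(n, dtype=int)
    for j in range(n):
        # Indices of the k design points closest to X_j (X_j itself included).
        nbrs = np.argsort(np.abs(X - X[j]), kind="stable")[:k]
        idx[j] = rng.choice(nbrs)
    return resid[idx], idx
```

The resampled values would then be added to $\hat m(X_j; h_c)$ as in step 3; small k tracks changes in variance closely, at the cost of drawing from fewer residuals.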
3. SIMULATIONS
In order to examine the performance of the proposed test, a simulation study was
carried out. The regression function used was $1 + x - a\exp\{-\tfrac{1}{2}(x - .5)^2/.1^2\}$ over
the range $(0, 1)$. The parameter $a$ controls the size of a "kink" added to the underlying
linear function. Figure 1 displays the shapes used.
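The boundary between the monotone and nonmonotone cases can be located exactly. Differentiating gives $m'(x) = 1 + 100a(x - .5)\exp\{-50(x - .5)^2\}$, which for $a > 0$ is smallest at $x = .4$, where it equals $1 - 10a e^{-1/2} \approx 1 - 6.07a$. The curve is therefore monotone if and only if $a \le e^{1/2}/10 \approx .165$, consistent with $a = .15$ being only just monotonic. The sketch below encodes the function and a data generator; the midpoint layout $x_i = (i - .5)/n$ for "regularly spaced" design points and the function names are assumptions.

```python
import numpy as np

def m(x, a):
    """Simulation-study regression function 1 + x - a*exp{-(1/2)(x - .5)^2 / .1^2}."""
    return 1 + x - a * np.exp(-0.5 * (x - 0.5) ** 2 / 0.1 ** 2)

def simulate(n, a, sigma, rng):
    """One data set: n regularly spaced points on (0, 1) (midpoint layout assumed), normal errors."""
    x = (np.arange(1, n + 1) - 0.5) / n
    return x, m(x, a) + sigma * rng.standard_normal(n)

def min_slope(a, num=100_001):
    """Smallest finite-difference slope of m on a fine grid over [0, 1]."""
    x = np.linspace(0.0, 1.0, num)
    return float(np.min(np.diff(m(x, a)) / np.diff(x)))
```

`min_slope` gives a quick numerical confirmation of the threshold: it is positive for $a = .15$ and negative for $a = .25$ and $a = .45$.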
[Figure 1: four panels, titled a = 0, a = 0.15, a = 0.25 and a = 0.45, each plotting y against x over (0, 1).]
Figure 1. Regression Functions Used in the Simulation Study.
The value $a = 0$ produces a linear curve, which is therefore strongly monotonic. The
value $a = .15$ produces a curve that is only just monotonic. The values $a = .25$ and
$a = .45$ produce mildly and strongly nonmonotonic shapes, respectively. The design
points were regularly spaced over (0, 1) and the errors were Normally distributed. A
variety of sample sizes, $n = 25, 50, 100$, and error standard deviations, $\sigma = .025, .05, .1$,
were also considered. The data displayed in Figure 1 were simulated with $n = 50$ and
$\sigma = .1$. For each combination of parameters, 500 simulations were carried out, each
using 500 bootstrap samples.
Results are displayed in Table 1, which lists the proportions of the simulations in which
the observed p value fell below 5%. In the strongly monotonic case ($a = 0$) these
proportions are very small, reflecting the fact that a linear regression function is unlikely
to produce data that exhibit substantial nonmonotonicity. In the case where the underlying
curve is only just monotonic ($a = .15$) the proportions are generally within a reasonable
distance of the target value of 5%. Where this is not the case, the observed proportions
are very small. Overall, then, the simulations indicate that where the nominal size of 5%
is not attained, the test generally operates conservatively.
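As a rough yardstick for "a reasonable distance" from 5%, the binomial standard error of a rejection proportion estimated from 500 independent simulations, when the true rejection probability is .05, is given by the standard calculation below (it ignores the additional Monte Carlo error from using 500 bootstrap samples for each p value):

$$\mathrm{se} = \sqrt{\frac{0.05 \times 0.95}{500}} = \sqrt{0.000095} \approx 0.0097, \qquad 0.05 \pm 1.96\,\mathrm{se} \approx (0.031,\ 0.069).$$

Entries for $a = .15$ well outside roughly .03 to .07 therefore differ from 5% by more than Monte Carlo variation alone would typically explain.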
Table 1. Simulated Size and Power for the Test of Monotonicity. The regression function used was
$1 + x - a\exp\{-\tfrac{1}{2}(x - .5)^2/.1^2\}$ over the range (0, 1). The regression curves corresponding
to $a = 0$ and $a = .15$ are monotonic. A variety of sample sizes $n$ and error standard deviations $\sigma$
were also considered. Entries are the proportions of 500 simulations in which the p value fell below 5%.

a      σ       n = 25   n = 50   n = 100
0      .025    .002     .006     .002
       .05     .018     .000     .008
       .10     .022     .008     .004
.15    .025    .058     .000     .080
       .05     .042     .022     .052
       .10     .026     .018     .008
.25    .025    .816     .926     .994
       .05     .310     .482     .748
       .10     .088     .100     .174
.45    .025    1.000    1.000    1.000
       .05     .966     1.000    1.000
       .10     .374     .544     .874
4. AN EXAMPLE
The test was applied to data from radiocarbon dating. Clark (1977) described analysis
of data of this type. Data from samples of known age are subjected to laboratory
analysis and used to calibrate the radiocarbon dating process. Figure 2 displays a subset
of data published by Pearson and Qua (1993), corresponding to true ages of 5,000 to 6,000
years. A local linear nonparametric regression curve, with "plug-in" bandwidth $h = 38$,
is superimposed. Since dating techniques are based on the process of radioactive decay,
it would be natural to expect that older objects will produce older radiocarbon dates.
However, it is known that fluctuations in the natural production of radiocarbon can
produce nonlinear effects (so-called "wiggles") in the calibration curve.

[Figure 2: scatterplot of the calibration data (vertical axis: Radiocarbon age) with the two fitted curves superimposed.]
Figure 2. Nonparametric Regressions With Radiocarbon Data Using "Plug-in" (full line) and Critical (dotted
line) Bandwidths.
In general, where monotonicity can be assumed, greater precision may be achieved
by incorporating this information into the estimation process. Some of the monotonic
estimators mentioned in Section 1 can be employed to do this. However, inappropriate
use of such estimators on curves which are not monotonic may produce bias. In order
to explore the evidence for nonmonotonicity in Figure 2, the bootstrap test was applied.
The critical bandwidth is $h_c = 57$. The corresponding nonparametric regression curve
is indicated in Figure 2 by a dotted line. The test produced a p value of .06. While
not conclusive, this is small enough to lend weight against the adoption of a monotonic
estimator.
5.1 AN EXAMPLE
As an example of data with binary outcomes, we use measurements of systolic
blood pressure and the occurrence of myocardial infarction (MI), reported in Rousseuw
et al. (1983). These data were also analyzed by Hastie and Tibshirani (1990). Figure
3 displays a nonparametric regression curve, produced using a “local logistic” method
described by Fan, Heckman, and Wand (1995). The bandwidth used was h = 40. This
was subjectively chosen, but reflects the general behavior of the curve over a wide range
of bandwidths. The curve shows a relatively simple increasing relationship between blood
pressure and the probability of MI, but a small decrease is also evident at very low blood pressures. A test
of monotonicity was applied in order to identify whether this feature reflects systematic
structure or random variation.
[Figure 3: vertical axis "MI" (0 to 1), showing the data and the two fitted curves.]
Figure 3. Nonparametric Regressions With Myocardial Infarction Data Using Subjective (full line) and Critical
(dotted line) Bandwidths.

The critical value of the bandwidth is $h_c = 64$. This monotonic curve is displayed
in Figure 3 as a dotted line. The bootstrap procedure was applied as described in Section
2.3, except that the calculation of, and sampling from, Normal residuals is replaced by the
simulation of binary data using the fitted curve at each point to define the probability of
success. This produces a p value of .89, showing clearly that the nonmonotonic behavior
at very low blood pressures is quite consistent with sampling variation. We are therefore
prevented from over-interpreting this observed feature.
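For binary responses the resampling step therefore changes as just described: each bootstrap response is a Bernoulli draw whose success probability is the critical-bandwidth fit at that design point. A minimal sketch, with the hypothetical function name `binary_bootstrap` and `p_hat` standing for the fitted probabilities $\hat m(X_i; h_c)$:

```python
import numpy as np

def binary_bootstrap(p_hat, rng):
    """Simulate Y*_i ~ Bernoulli(p_hat_i), one draw per design point."""
    p = np.clip(np.asarray(p_hat, dtype=float), 0.0, 1.0)  # guard against fits outside [0, 1]
    return (rng.random(p.shape) < p).astype(int)
```

The critical bandwidth of each simulated data set would then be compared with $h_c$ exactly as in the normal-errors case.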
ACKNOWLEDGMENTS
The first two authors gratefully acknowledge the support and hospitality of the Institut de Statistique,
Université Catholique de Louvain, where this work was initiated. Irène Gijbels was supported by "Projet
d'Actions de Recherche Concertées" (No. 93/98–164) and research grant (No. 15.001.95F) of the National
Science Foundation (FNRS), Belgium. We are also grateful to Marian Scott and Trevor Hastie for their
assistance in obtaining and commenting on the data for the examples, and to Matt Wand and Dave Signorini
for allowing us access to computing code to implement some of the smoothing procedures. In addition, the
comments of an associate editor and two referees were helpful in improving some aspects of the presentation
of the article.
REFERENCES
Breiman, L. (1988), Comment on "Monotone Regression Splines in Action," by J. O. Ramsay, Statistical
Science, 3, 442–445.
Clark, R. M. (1977), "Calibration, Cross-Validation and Carbon-14. II," Journal of the Royal Statistical Society,
Series A, 143, 177–194.
Cleveland, W. (1979), "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the
American Statistical Association, 74, 829–836.
Fan, J. (1992), "Design-Adaptive Nonparametric Regression," Journal of the American Statistical Association,
87, 998–1004.
Fan, J., and Gijbels, I. (1992), "Variable Bandwidth and Local Linear Regression Smoothers," The Annals of
Statistics, 20, 2008–2036.
Fan, J., and Gijbels, I. (1996), Local Polynomial Modelling and its Applications, London: Chapman and Hall.
Fan, J., Heckman, N. E., and Wand, M. P. (1995), "Local Polynomial Kernel Regression for Generalized Linear
Models and Quasi-likelihood Functions," Journal of the American Statistical Association, 90, 141–150.
Fisher, N. I., Mammen, E., and Marron, J. S. (1994), "Testing for Multimodality," Computational Statistics and
Data Analysis, 18, 499–512.
Friedman, J. H., and Tibshirani, R. J. (1984), "The Monotone Smoothing of Scatterplots," Technometrics, 26,
243–250.
Härdle, W. (1990), Applied Nonparametric Regression, Cambridge: Cambridge University Press.
Hastie, T. J., and Loader, C. R. (1993), "Local Regression: Automatic Kernel Carpentry" (with comments),
Statistical Science, 8, 120–143.
Hastie, T. J., and Tibshirani, R. J. (1988), Comment on "Monotone Regression Splines in Action," by J. O.
Ramsay, Statistical Science, 3, 450–456.
Hastie, T. J., and Tibshirani, R. J. (1990), Generalized Additive Models, London: Chapman and Hall.
Mammen, E. (1991), "Estimating a Smooth Monotone Regression Function," The Annals of Statistics, 19,
724–740.
Mammen, E., Marron, J. S., and Fisher, N. I. (1992), "Some Asymptotics for Multimodality Tests Based on Kernel
Estimates," Probability Theory and Related Fields, 91, 115–132.
McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models (2nd ed.), London: Chapman and Hall.
Minnotte, M. C., and Scott, D. W. (1993), "The Mode Tree: A Tool for Visualization of Nonparametric Density
Estimates," Journal of Computational and Graphical Statistics, 2, 51–68.
Mukerjee, H. (1988), "Monotone Nonparametric Regression," The Annals of Statistics, 16, 741–750.
Pearson, G. W., and Qua, F. (1993), "High Precision ¹⁴C Measurement of Irish Oaks to Show the Natural ¹⁴C
Variations From AD 1840–5000 BC: A Correction," Radiocarbon, 35, 105–123.
Ramsay, J. O. (1988), "Monotone Regression Splines in Action" (with comments), Statistical Science, 3,
425–461.
Rousseuw, J., du Plessis, J., Benade, A., Jordann, P., Kotze, J., Jooste, P., and Ferreira, J. (1983), "Coronary
Risk Factor Screening in Three Rural Communities," South African Medical Journal, 64, 430–436.
Ruppert, D., Sheather, S. J., and Wand, M. P. (1995), "An Effective Bandwidth Selector for Local Least Squares
Regression," Journal of the American Statistical Association, 90, 1257–1270.
Schlee, W. (1982), "Nonparametric Tests of the Monotony and Convexity of Regression," in Nonparametric
Statistical Inference, Vol. II, eds. B. V. Gnedenko, M. L. Puri, and I. Vincze, Amsterdam: North-Holland,
pp. 823–836.
Silverman, B. W. (1981), "Using Kernel Density Estimates to Investigate Multimodality," Journal of the Royal
Statistical Society, Series B, 43, 97–99.
Silverman, B. W. (1983), "Some Properties of a Test for Multimodality Based on Kernel Density Estimates," in
Probability, Statistics and Analysis, eds. J. F. C. Kingman and G. E. H. Reuter, Cambridge: Cambridge
University Press, pp. 248–259.
Wong, M. A. (1985), "A Bootstrap Testing Procedure for Investigating the Number of Subpopulations," Journal
of Statistical Computation and Simulation, 22, 99–112.