Kei Takeuchi

Contributions on Theory of Mathematical Statistics
Kei Takeuchi
Professor Emeritus, The University of Tokyo
Bunkyo-ku, Tokyo, Japan

ISBN 978-4-431-55238-3 ISBN 978-4-431-55239-0 (eBook)


https://doi.org/10.1007/978-4-431-55239-0
© Springer Japan KK, part of Springer Nature 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Japan KK part of Springer Nature.
The registered company address is: Shiroyama Trust Tower, 4-3-1 Toranomon, Minato-ku, Tokyo
105-6005, Japan
Preface

This is a collection of the author’s contributions to various types of problems of the
theory of mathematical statistics.
The original sources of the contents of this book consist of various types.
Some (Chaps. 6, 9, 13–15) are reprints of papers published in English-language
journals. Chapter 9 is co-authored with Professor Kazuo Murota and Chaps. 13–15
are co-authored with Professor Akimichi Takemura. I would like to express my cordial
thanks for their permission to include these joint papers in this volume.
Others (Chaps. 1, 5, 7, 8, 10–12) are reorganizations of my papers or chapters of my
books originally written in Japanese and translated into English.
Still others (Chaps. 2–4) are papers originally written in English and presented
at meetings but not previously published.
The contents are divided into seven parts according to the topics dealt with.
My joint papers with Professor Masafumi Akahira were compiled and published
as “Joint Statistical Papers of Akahira and Takeuchi” (World Scientific, 2003) and
hence are not included in this volume.
I would like to express my special thanks to Dr. M. Kumon, who helped me
edit the papers and typed the manuscripts for printing. I am also indebted to
Mr. Y. Hirachi of Springer for arranging the publication and for patiently waiting
through my long-delayed preparation of the texts.

Kamakura, Japan
April 2019

Kei Takeuchi
Contents

Part I Statistical Prediction


1 Theory of Statistical Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Sufficiency with Respect to Prediction . . . . . . . . . . . . . . . . . . . 5
1.3 Point Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Interval or Region Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5 Non-parametric Prediction Regions . . . . . . . . . . . . . . . . . . . . . 26
1.6 Dichotomous Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
1.7 Multiple Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

Part II Unbiased Estimation


2 Unbiased Estimation in Case of the Class of Distributions
of Finite Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2 Minimum Variance Unbiased Estimators . . . . . . . . . . . . . . . . . 43
2.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.4 Non-regular Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3 Some Theorems on Invariant Estimators of Location . . . . . . . . . . . . . 59
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.2 Estimation of the Location Parameter When the Scale
is Known . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.3 Some Examples: Scale Known . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.4 Estimation of the Location Parameter When the Scale
is Unknown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70


3.5 Some Examples: Scale Unknown . . . . . . . . . . . . . . . . . . . . . . . 76


3.6 Estimation of Linear Regression Coefficients . . . . . . . . . . . . . . 81
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

Part III Robust Estimation


4 Robust Estimation and Robust Parameter . . . . . . . . . . . . . . . . . . . 89
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.2 Definition of Location and Scale Parameters . . . . . . . . . . . . . . . 91
4.3 The Optimum Definition of Location Parameter . . . . . . . . . . . . 93
4.4 Robust Estimation of Location Parameter . . . . . . . . . . . . . . . . . 95
4.5 Definition of the Parameter Depending on Several
Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.6 Construction of Uniformly Efficient Estimator . . . . . . . . . . . . . 100
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5 Robust Estimation of Location in the Case of Measurement
of Physical Quantity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.2 Nature of Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.3 Normative Property of the Normal Distribution . . . . . . . . . . . . 112
5.4 Class of Asymptotically Efficient Estimators . . . . . . . . . . . . . . . 117
5.5 Linear Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.6 Class of M Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.7 Estimators Derived from Non-parametric Tests . . . . . . . . . . . . . 138
5.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
6 A Uniformly Asymptotically Efficient Estimator of a Location
Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
6.2 The Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
6.3 Monte Carlo Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.4 Observations on Monte Carlo Results . . . . . . . . . . . . . . . . . . . 162
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

Part IV Randomization
7 Theory of Randomized Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
7.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
7.3 Testing the Hypothesis in Randomized Design . . . . . . . . . . . . . 188
7.4 Considerations of the Power of the Tests . . . . . . . . . . . . . . . . . 198
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200

8 Some Remarks on General Theory for Unbiased Estimation


of a Real Parameter of a Finite Population . . . . . . . . . . . . . . . . . . . 201
8.1 Formulation of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
8.2 Estimability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.3 X0 -exact Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
8.4 Linear Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
8.5 Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

Part V Tests of Normality


9 The Studentized Empirical Characteristic Function
and Its Application to Test for the Shape of Distribution . . . . . . . . 221
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9.2 Limiting Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.3 Application to Test for Normality . . . . . . . . . . . . . . . . . . . . . . 224
9.4 Asymptotic Consideration on the Power . . . . . . . . . . . . . . . . . . 226
9.4.1 The Power of b2, an(t), ãn(t) . . . . . . . . . . . . . . . . . . . 226
9.4.2 Relative Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
9.5 Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.6 Empirical Study of Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
9.6.1 Null Percentiles of an(t) and ãn(t) . . . . . . . . . . . . . . . . 231
9.6.2 Details of the Simulation . . . . . . . . . . . . . . . . . . . . . . 232
9.6.3 Results and Observations . . . . . . . . . . . . . . . . . . . . . . 233
9.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
10 Tests of Univariate Normality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
10.2 Tests Based on the Chi-Square Goodness of Fit Type . . . . . . . . 239
10.3 Asymptotic Powers of the χ2-type Tests . . . . . . . . . . . . . . . . . . 246
10.4 Tests Based on the Empirical Distribution . . . . . . . . . . . . . . . . 260
10.5 Tests Based on the Transformed Variables . . . . . . . . . . . . . . . . 269
10.6 Tests Based on the Characteristics of the Normal
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
11 The Tests for Multivariate Normality . . . . . . . . . . . . . . . . . . . . . . . . 301
11.1 Basic Properties of the Studentized Multivariate Variables . . . . 301
11.2 Tests of Multivariate Normality . . . . . . . . . . . . . . . . . . . . . . . . 309
11.3 Tests Based on the Third-Order Cumulants . . . . . . . . . . . . . . . 320
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325

Part VI Model Selection


12 On the Problem of Model Selection Based on the Data . . . . . . . . . . 329
12.1 Fisher’s Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
12.2 Search for Appropriate Models . . . . . . . . . . . . . . . . . . . . . . . . 330
12.3 Construction of Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
12.4 Selection of the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
12.5 More General Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
12.6 Derivation of AIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
12.7 Problems of AIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
12.8 Some Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
12.9 Some Additional Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356

Part VII Asymptotic Approximation


13 On Sum of 0–1 Random Variables I. Univariate Case . . . . . . . . . . 359
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
13.2 Notations and Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
13.3 Approximation by Binomial Distribution . . . . . . . . . . . . . . . . . 363
13.4 Convergence to Poisson Distribution . . . . . . . . . . . . . . . . . . . . 366
13.5 Convergence to the Normal Distribution . . . . . . . . . . . . . . . . . 370
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
14 On Sum of 0–1 Random Variables II. Multivariate Case . . . . . . . . 381
14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
14.2 Sum of Vectors of 0–1 Random Variables . . . . . . . . . . . . . . . . 382
14.2.1 Notations and Definitions . . . . . . . . . . . . . . . . . . . . . . 382
14.2.2 Approximation by Binomial Distribution . . . . . . . . . . . 385
14.2.3 Convergence to Poisson Distribution . . . . . . . . . . . . . . 386
14.2.4 Convergence to the Normal Distribution . . . . . . . . . . . 388
14.3 Sum of Multinomial Random Vectors . . . . . . . . . . . . . . . . . . . 389
14.3.1 Notations and Definitions . . . . . . . . . . . . . . . . . . . . . . 390
14.3.2 Generalized Krawtchouk Polynomials and
Approximation by Multinomial Distribution . . . . . . . . . 392
14.3.3 Convergence to Poisson Distribution . . . . . . . . . . . . . . 395
14.3.4 Convergence to the Normal Distribution . . . . . . . . . . . 396
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
15 Algebraic Properties and Validity of Univariate and Multivariate
Cornish–Fisher Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
15.2 Univariate Cornish–Fisher Expansion . . . . . . . . . . . . . . . . . . . . 403

15.3 Multivariate Cornish–Fisher Expansion . . . . . . . . . . . . . . . . . . 414


15.4 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
15.5 Validity of Cornish–Fisher Expansion . . . . . . . . . . . . . . . . . . . 419
15.6 Cornish–Fisher Expansion of Discrete Variables . . . . . . . . . . . . 423
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Part I
Statistical Prediction
Chapter 1
Theory of Statistical Prediction

Abstract The author started studying problems of statistical prediction around
1965 and has presented a series of results on them in talks at academic meetings
and seminars and in published papers. This chapter is a reorganization of the
main results of those studies. The problems of ‘prediction’ for time-series data are
not dealt with in this chapter. We are mainly interested in simpler cases where the
data X_1, . . . , X_n and the value Y to be predicted are jointly distributed real random
variables, in most cases independently distributed or with rather simple structure. The
purpose of our study is to construct a theory of prediction analogous to the theory of
statistical inference on parameters. It has been established that, in correspondence with
the theory of point estimation and of interval estimation, a quite similar theory of point
prediction and interval prediction can be constructed, and, corresponding to the theory
of testing hypotheses and of multiple decisions, a theory of dichotomous or
multiple-choice prediction can be constructed.

1.1 Introduction

It is often argued that the true objective of statistical inference is in prediction. The
author does not necessarily share this opinion but it is certain that prediction must
be a very important part of statistical theory. As it is, however, the theory of statistical
prediction has not received the attention it deserves in the modern theory of
statistics. It is generally treated as a corollary of the theory of statistical inference
or as a kind of statistical decision which does not deserve separate treatment apart
from the general theory of statistical decisions. But it seems to the author that there
are some problems in the theory of statistical prediction which need a specific theory,
and this chapter gives a comprehensive approach to formulating such a theory of
statistical prediction.
We shall begin with the most abstract framework. We suppose that we are to predict,
on the basis of some data X, some value Y which is to be realized in the future. (This
chapter is a reorganization of the main results of Takeuchi (1975), Theories of Statistical
Prediction (Tōkei-teki Yosoku-ron), and of Akahira and Takeuchi (1980), A note on
prediction sufficiency (adequacy) and sufficiency, Austral. J. Statist. 22(3), 332–335.) We
assume that X and Y are jointly distributed random variables with sample space
X × Y with some σ-field. X and Y may be spaces of any kind but throughout
this chapter we assume that they are Euclidean. The joint distribution of X and Y
has some unknown factor, which is designated by a parameter θ which is an element
of some parameter space Θ.
‘Prediction’ may be done in various ways, so in the most general framework
‘prediction’ may be defined as selecting an element f of a space F(C ), which
is a set of statements about the value of Y equipped with a σ-field C and is called the space
of prediction. In the subsequent sections, several cases where F is defined in more
concrete ways are discussed in more detail, but in this section no further restrictions
are imposed on F.
The rule of prediction to be applied is defined as a prediction function π, which
is a measurable transformation from X to F. Generally, it is assumed that the class
of prediction functions is defined including randomized functions which determine
probability distributions over F corresponding to X . The fault or loss due to a predic-
tion is defined in terms of a weight or loss function W , which is a non-negative-valued
measurable function of Y and f and the expectation of W with respect to a prediction
function π under parameter value θ is designated as

r (θ, π) = E θ [W (Y, π(X ))],

and is called the risk function.


This setup is quite analogous to that of the decision functions as formulated
by Wald (1950). Vector-valued weight functions may also be considered, as in Blyth
and Bondar (1987), but in the abstract framework the possibility of vector-valued
weight functions is not investigated.
When X and Y are statistically independent, the problem of prediction can be
reduced to the decision problem in the sense of Wald, for defining

W ∗ (θ, f ) = E θ [W (Y, f )],

we have

r (θ, π) = E θ [W (Y, π(X ))] = E θ [W ∗ (θ, π(X ))],

and W ∗ (θ, f ) may be identified as a weight function of a decision problem, which


was pointed out by Sverdrup (1967).
But when X and Y are not independent, some new aspects will emerge. Also in
certain more specific problems, the prediction problem requires a separate approach
from the usual type of inference. Those problems will be discussed in the subsequent
sections.
For the most part of this chapter, it is assumed that the distribution of X and Y is
absolutely continuous with respect to some σ-finite measure and we shall denote the
density function as f (x, y, θ), etc.

1.2 Sufficiency with Respect to Prediction

First we shall consider the concept of sufficiency with respect to prediction. Let
T = t (X ) be a statistic, i.e., a measurable transformation which maps X into some
measurable space T . Then the condition for T to be sufficient for the prediction of Y
may be defined in several ways. The concept was discussed in Akahira and Takeuchi
(1980; 2003), Takeuchi (1975).

(a) For any prediction function π(X ), there is a T -measurable prediction function
π ∗ (T ) such that for all θ ∈ Θ,

r (θ, π ∗ ) = r (θ, π).

(b) More precisely for any π(X ), there is a π ∗ (T ) such that for all θ ∈ Θ, the joint
distribution of Y and π ∗ is identical to that of Y and π.
(c) More loosely for any π(X ), there is a π ∗ (T ) such that for all θ ∈ Θ,

r (θ, π ∗ ) ≤ r (θ, π).

(d) For any prior distribution ξ over some σ-field of subsets of Θ, the posterior
distribution of Y given X is equal to the posterior distribution of Y given T for
almost all X .
When X and Y are independent, the problem of prediction can be reduced to that
of decision and if T is sufficient for X in the usual sense, it is also sufficient for
prediction.

Theorem 1.1 Assume that X and Y are independent. Let T = t (X ) be a sufficient


statistic, i.e., the conditional distribution of X given T can be determined to be
independent of θ, then T satisfies (a) to (d) above.
Proof For decision problems, sufficiency was discussed by Halmos and Savage
(1949) and Bahadur (1954) and they proved essentially (b). Also it is evident that (b)
implies (a), (a) implies (c). As for (d) let Pξ {·|X } denote the conditional probability
measure given X, then we have

    P_ξ{Y ∈ A|X} = ∫ P_θ{Y ∈ A} dP_ξ{θ|X} = ∫ P_θ{Y ∈ A} dP_ξ{θ|T} = P_ξ{Y ∈ A|T},

so that (d) is satisfied. 

As for the converse of the theorem, some regularity conditions are necessary.

Theorem 1.2 Assume that X and Y are independent and that for any θ_1 ≠ θ_2, there
is a set A such that P_{θ_1}{Y ∈ A} ≠ P_{θ_2}{Y ∈ A}. Then (d) is satisfied only if T is a
sufficient statistic.

Proof Let ξ be a prior distribution which assigns positive probabilities to only two
points θ1 and θ2 . Then the posterior probability of Y ∈ A given X is expressed as

    P_ξ{Y ∈ A|X} = P_{θ_1}{Y ∈ A} P_ξ{θ_1|X} + P_{θ_2}{Y ∈ A} P_ξ{θ_2|X}.

The posterior probability of Y ∈ A given T can be expressed similarly, and when
P_{θ_1}{Y ∈ A} ≠ P_{θ_2}{Y ∈ A}, the two posterior probabilities are equal if and only if
P_ξ{θ_i|X} = P_ξ{θ_i|T}, i = 1, 2, and if this condition is satisfied for almost all X, T is
pairwise sufficient for θ_1 and θ_2. Since θ_1 and θ_2 are arbitrary and pairwise sufficiency
implies sufficiency (Halmos and Savage 1949), the theorem is proved.

When X and Y are not independent, it is intuitively clear that the information
about Y can be obtained through the information about θ contained in X and also
from the correlation between Y and X , so that sufficient statistic with respect to the
prediction of Y have to maintain these two sources of information contained in X .
We shall try to define this notion rigorously. Consider the condition (b).

Lemma 1.1 If the prediction space has at least two distinct elements, (b) implies
that T = t (X ) is a sufficient statistic in the usual sense for X .

Proof It is obvious that (b) implies that π(X ) and π ∗ (T ) have the same distribution
for all θ ∈ Θ. Assume that T is not sufficient, then T is not pairwise sufficient. Hence
for some θ_1 and θ_2, there is a set C ∈ C and a set A ∈ A such that the likelihood ratio

Pθ1 {X ∈ A|T = t (X ) = t}/Pθ2 {X ∈ A|T = t (X ) = t}

is not constant for X ∈ X such that T = t (X ) = t ∈ C. Then there exists a positive


constant k > 0 such that the set

Ak = {A ∈ A |Pθ1 {X ∈ A|T = t (X ) = t}/Pθ2 {X ∈ A|T = t (X ) = t} ≥ k}

cannot be expressed in terms of T = t (X ) and Pθ1 {X ∈ Ak } > 0, Pθ2 {X ∈ Ak } > 0.


Consider the procedure such that π_0(X) = P_1 for X ∈ A_k and π_0(X) = P_2 for X ∉
A_k. Then π_0 is the unique procedure (Lehmann 1959) which maximizes P_{θ_1}{π(X) =
P_1} under the condition that P_{θ_2}{π(X) = P_2} = P_{θ_2}{π_0(X) = P_2}, and there exists
no equivalent procedure which can be expressed as a function of T. This contradicts
(b) and the lemma is proved. 

Lemma 1.2 If the condition of Lemma 1.1 is satisfied, then (b) implies that given T ,
X and Y are conditionally independent for all θ ∈ Θ and for almost all values of T .

Proof Assume the contrary. Then for some θ ∈ Θ, there is a set A ∈ A such that
there is a set C ∈ C for which

    P_θ{Y ∈ A|X} ≠ P_θ{Y ∈ A|t(X) ∈ C},   P_θ{T ∈ C} > 0.



Consider the procedure π ∗ which takes only two prediction points P1 and P2
and which minimizes Pθ {Y ∈ A|π(X ) = P1 } under the condition Pθ {π(X ) = P1 }
is equal to constant. Then π ∗ (X ) = P1 if and only if Pθ {Y ∈ A|X } ≤ k and it is the
unique solution of this problem. More exactly some randomization procedure may
be necessary when Pθ {Y ∈ A|X } = k. Hence for some appropriate k, there is no
π(T ) equivalent to π ∗ (X ). 

Thus we arrived at the following definition.

Definition 1.1 A statistic T = t(X) is called sufficient with respect to the
prediction of Y in the first sense, or for short P-Y sufficient in the first sense, if given
T the conditional distribution of X is independent of θ and of Y. It is also called
adequate.

Theorem 1.3 If the prediction space has at least two distinct points, then the con-
dition (b) is satisfied only if T is P-Y sufficient in the first sense.

Theorem 1.4 T is P-Y sufficient in the first sense if and only if the joint density
function of X and Y can be decomposed as

f (x, y, θ) = g(x)h(t (x), y, θ) a.e. (x, y) ∈ X × Y ,

where g is a function of x only and h is dependent on x only through t (x).

Proof If T is P-Y sufficient, then the conditional density function given T = t is


written as

f ∗ (x, y, θ|T = t) = g ∗ (x|t)h ∗ (y, θ|t).

Multiplying this by the density function of t, we have

f (x, y, θ) = f ∗ (x, y, θ|T = t) f ∗ (t, θ)


= g ∗ (x|t)h ∗ (y, θ|t) f ∗ (t, θ).

Putting g(x) = g ∗ (x|t), h(t, y, θ) = h ∗ (y, θ|t) f ∗ (t, θ), we have the condition of
the theorem. The converse is obvious. 

Theorem 1.5 If T is P-Y sufficient in the first sense, then the conditions (a) to (d)
are satisfied.

Proof For any π(X ) let π ∗ (T ) be a randomized prediction procedure such that for
given T , π ∗ (T ) is distributed according to the conditional distribution of π(X ).
Since the conditional distribution of X given T is independent of Y , π ∗ (T ) can be
determined independently of Y, and the joint distribution of Y and π*(T) is equivalent
to that of Y and π(X). This establishes (b), and (b) implies (a), (a) implies (c). (d) is
obvious from Theorem 1.4 above. 

But as for (c) the above definition of P-Y sufficiency does not express a necessary
condition. For the extreme case, consider that X and Y are independent and the
distribution of Y is independent of θ, while X does depend on θ. Then X does not
possess any information about Y and the best prediction procedure is to take f which
minimizes E[W (Y, f )] irrespective of X . Hence, in this case, any statistic (constant)
may be considered as sufficient. Thus we have the second definition.

Definition 1.2 T = t (X ) is called to be sufficient with respect to the prediction of


Y (P-Y sufficient) in the second sense if given T the conditional distribution of Y is
independent of θ and X for almost all values of T and for all θ ∈ Θ.

Theorem 1.6 T is P-Y sufficient in the second sense, if and only if the joint density
function of X and Y is decomposed as

f (x, y, θ) = g(x, θ)h(y, t (x)).

The proof is similar to that of Theorem 1.4 and is omitted.

Theorem 1.7 If T is P-Y sufficient in the second sense, then there is a π ∗ (T ) which
is uniformly best, i.e., for any π,

r (θ, π ∗ ) ≤ r (θ, π) ∀θ ∈ Θ.

Proof Let π ∗ (T ) be defined so as to minimize

E[W (Y, π)|T ],

which by the definition of P-Y sufficiency in the second sense is independent of θ.


Also it is evident that π ∗ satisfies the condition of the theorem. 

We shall define the third case which is the mixture of the above two.

Definition 1.3 T = t (X ) is P-Y sufficient in the third sense, if for all θ ∈ Θ and
for almost all values of T conditionally given T , X and Y are independent and either
X or Y is independent of θ given T .
Theorem 1.8 T = t (X ) is P-Y sufficient in the third sense, if and only if the joint
density function of X and Y is decomposed as

f (x, y, θ) = g(x)h(t (x), y, θ) or f (x, y, θ) = g ∗ (x, θ)h(t (x), y) a.e. (x, y) ∈ X × Y .

Theorem 1.9 If T = t (X ) is P-Y sufficient in the third sense, then the conditions
(c) and (d) are satisfied.

The proof is straightforward and is omitted.

Theorem 1.10 If the condition (d) is satisfied, then T is P-Y sufficient in the third
sense.

Proof Fix two points θ1 and θ2 in the parameter space Θ. Consider a prior probability
measure which assigns probability ξ1 to θ1 and ξ2 to θ2 , where ξ1 ≥ 0, ξ2 ≥ 0 and
ξ1 + ξ2 = 1. Then the posterior probability of Y ∈ A given X is equal to

Pξ {Y ∈ A|X } = Pθ1 {Y ∈ A|X }Pξ {θ1 |X } + Pθ2 {Y ∈ A|X }Pξ {θ2 |X },

where Pξ {·|X } denotes the posterior probability and

    P_ξ{θ_1|X} = ξ_1 f(X, θ_1) / ( ξ_1 f(X, θ_1) + ξ_2 f(X, θ_2) ),   P_ξ{θ_2|X} = 1 − P_ξ{θ_1|X}.

If the condition (d) is satisfied, then for almost all x1 and x2 such that t (x1 ) = t (x2 )
we have

Pξ {Y ∈ A|X = x1 } = Pξ {Y ∈ A|X = x2 }

for all ξ1 and ξ2 = 1 − ξ1 . Consequently

    ( P_{θ_1}{Y ∈ A|x_1} ξ_1 f(x_1, θ_1) + P_{θ_2}{Y ∈ A|x_1} ξ_2 f(x_1, θ_2) ) / ( ξ_1 f(x_1, θ_1) + ξ_2 f(x_1, θ_2) )
    = ( P_{θ_1}{Y ∈ A|x_2} ξ_1 f(x_2, θ_1) + P_{θ_2}{Y ∈ A|x_2} ξ_2 f(x_2, θ_2) ) / ( ξ_1 f(x_2, θ_1) + ξ_2 f(x_2, θ_2) )

for all ξ1 and ξ2 = 1 − ξ1 . This is satisfied only when

Pθ1 {Y ∈ A|x1 } = Pθ1 {Y ∈ A|x2 }, Pθ2 {Y ∈ A|x1 } = Pθ2 {Y ∈ A|x2 }.

If f (x1 , θ1 ) = 0, then Pθ1 {Y ∈ A|x1 } may be defined to be equal to any value and
we regard that the above equality is automatically satisfied. Hence given T the con-
ditional distribution of Y given X is independent of X thus given T , X and Y are
conditionally independent. Moreover it is derived from above that

    P_{θ_1}{Y ∈ A|x_1} = P_{θ_2}{Y ∈ A|x_2}   or   f(x_1, θ_1)/f(x_1, θ_2) = f(x_2, θ_1)/f(x_2, θ_2).

Also if P_{θ_1}{Y ∈ A|x_1} = P_{θ_2}{Y ∈ A|x_1} for some x_1, then P_{θ_1}{Y ∈ A|x} = P_{θ_2}{Y ∈
A|x} for almost all x such that t(x) = t(x_1), and if P_{θ_1}{Y ∈ A|x_1} ≠ P_{θ_2}{Y ∈ A|x_1}
for some x_1, then f(x, θ_1)/f(x, θ_2) = f(x_1, θ_1)/f(x_1, θ_2) for almost all x such that
t(x) = t(x_1).
Example 1.1 X 1 , . . . , X n are independent and identically distributed according to
the normal distribution N (μ, σ 2 ). Y is distributed normally also with mean μ and
variance σ 2 . The correlation coefficient of Y and X i is known to be equal to γi , μ
and σ 2 are unknown. Then the joint density function is

    f(x_1, . . . , x_n, y) = (2πσ²)^{−(n+1)/2} ( 1 − ∑_{i=1}^n γ_i² )^{−1/2}
        × exp{ − (1/(2σ²)) ∑_{i=1}^n (x_i − μ)² − ( y − ∑_{i=1}^n γ_i(x_i − μ) − μ )² / ( 2σ²( 1 − ∑_{i=1}^n γ_i² ) ) },

from which it follows that the set ( ∑_{i=1}^n γ_i X_i, ∑_{i=1}^n X_i, ∑_{i=1}^n X_i² ) is a P-Y
sufficient statistic in the first sense.

1.3 Point Prediction

In this section, we shall consider the case when the variable Y to be predicted is real
valued and it is required to predict the value directly. Any real-valued function r (X )
of X may be regarded as a predictor of Y and it is called an unbiased predictor if

E θ [r (X )] = E θ [Y ] ∀θ ∈ Θ. (1.1)

An unbiased predictor r ∗ (X ) is called a uniformly minimum variance unbiased


predictor if for any unbiased predictor r (X ) of Y ,

Vθ [r ∗ (X ) − Y ] ≤ Vθ [r (X ) − Y ] ∀θ ∈ Θ. (1.2)

Remark 1.1 The above definitions are quite similar to those for the unbiased esti-
mator of the parameter. Let g(θ) = E θ [Y ], then (1.1) implies E θ [r (X )] = g(θ) thus
r (X ) is also an unbiased estimator of g(θ). Also if Y and X are independent

Vθ [r (X ) − Y ] = Vθ [r (X )] + Vθ [Y ],

and Vθ [Y ] is independent of the prediction procedures hence the problem of unbiased


prediction can completely be reduced to the problem of unbiased estimation.
Theorem 1.11 If Y and X are independent, then an unbiased predictor r (X ) of Y is
of uniformly minimum variance, if and only if it is the uniformly minimum variance
unbiased estimator of E θ [Y ].
Theorem 1.12 r(X) is a uniformly minimum variance unbiased predictor of Y, if and only if
for any unbiased estimator r0 (X ) of zero, i.e., E θ [r0 (X )] = 0 for all θ ∈ Θ,

Covθ [r (X ) − Y, r0 (X )] = 0 ∀θ ∈ Θ.

The proof of this theorem is similar to the corresponding well-known lemma of


unbiased estimation and is omitted.
Theorem 1.13 If E θ [Y |X ] = g(θ) + h(X ), i.e., the conditional expectation of Y
given X is decomposed as a sum of a function of the parameter and a function of
X , then r ∗ (X ) is a uniformly minimum variance unbiased predictor, if and only if
r ∗ (X ) − h(X ) is the uniformly minimum variance unbiased estimator of g(θ).

Proof For any r0 (X ),

Covθ [r (X ) − Y, r0 (X )] = Covθ [r (X ) − h(X ), r0 (X )],

hence the theorem is an immediate consequence of the previous theorem. 


Example 1.2 Let

    X_i = ∑_j c_{ij} θ_j + u_i,   Y = ∑_j a_j θ_j + v,

where θ_j are unknown parameters, c_{ij} and a_j are known coefficients, u_i are distributed
independently normally according to N(0, σ²), v is also normally distributed accord-
ing to N(0, σ²), but u_i and v are not independent and the covariance Cov(u_i, v) = γ_i
is assumed to be known. Then

    E[Y|X_i] = E[Y|u_i] = ∑_j a_j θ_j + ∑_i γ_i u_i = ∑_j ( a_j − ∑_i γ_i c_{ij} ) θ_j + ∑_i γ_i X_i.

Let θ̂_j be the least squares estimator of θ_j; then it is well known that any linear
combination of the θ̂_j is a uniformly minimum variance unbiased estimator, hence

    r*(X) = ∑_j ( a_j − ∑_i γ_i c_{ij} ) θ̂_j + ∑_i γ_i X_i

is a uniformly minimum variance unbiased predictor of Y.
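The following is a small numerical sketch (not part of the original text) of the predictor r*(X) of Example 1.2. The coefficients c_{ij}, a_j and covariances γ_i are illustrative values chosen here, and σ² = 1 is assumed so that E[v|u] = ∑_i γ_i u_i as in the display above.

```python
# Illustrative sketch of the predictor r*(X) in Example 1.2 (assumed example values).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
C = rng.normal(size=(n, p))          # known design coefficients c_ij (illustrative)
a = np.array([1.0, -0.5, 2.0])       # known coefficients a_j (illustrative)
gamma = np.full(n, 0.01)             # known Cov(u_i, v); small so the joint covariance is valid
theta = np.array([0.3, 1.2, -0.7])   # unknown parameters, fixed here for the simulation
sigma2 = 1.0                          # sigma^2 = 1 so the regression coefficient of v on u_i is gamma_i

# Draw (u_1, ..., u_n, v) jointly normal with Var = sigma2 and Cov(u_i, v) = gamma_i
cov = sigma2 * np.eye(n + 1)
cov[:n, n] = cov[n, :n] = gamma
uv = rng.multivariate_normal(np.zeros(n + 1), cov)
u, v = uv[:n], uv[n]

X = C @ theta + u
Y = a @ theta + v                                      # the value to be predicted

theta_hat, *_ = np.linalg.lstsq(C, X, rcond=None)      # least squares estimator of theta
r_star = (a - C.T @ gamma) @ theta_hat + gamma @ X     # unbiased predictor of Y
print(r_star, Y)
```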


Example 1.3 Let X 1 , . . . , X n , X n+1 , . . . , X n+m be distributed independently and
identically and the shape of the distribution is not specified. Assume that X 1 , . . . , X n
are observed values and Y = X 1 + · · · + X n+m is to be predicted. Let E(X i ) = μ
and V (X i ) < ∞. Then

E[Y |X 1 , . . . , X n ] = X 1 + · · · + X n + mμ,

and, as is well known, X̄ = (1/n)∑_{i=1}^n X_i is the uniformly minimum variance unbiased
estimator of μ, so that

Ŷ = (n + m) X̄

is a uniformly minimum variance unbiased predictor of Y .
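As a quick illustration (not part of the original text), the following Python sketch checks by Monte Carlo that the predictor Ŷ = (n + m)X̄ of Example 1.3 has mean prediction error zero; the gamma distribution used for the X_i is an arbitrary choice, since the example does not specify the shape of the distribution.

```python
# Monte Carlo check that E[Yhat - Y] = 0 for Yhat = (n + m) * Xbar (illustration only).
import numpy as np

rng = np.random.default_rng(1)
n, m, reps = 20, 5, 200_000
X = rng.gamma(shape=2.0, scale=1.5, size=(reps, n + m))   # any distribution with finite variance
Y = X.sum(axis=1)                                          # Y = X_1 + ... + X_{n+m}
Yhat = (n + m) * X[:, :n].mean(axis=1)                     # predictor based on the first n observations
print("mean prediction error:", (Yhat - Y).mean())         # close to 0
```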


The following theorem may be regarded as an extension of the Rao–Blackwell
theorem.
Theorem 1.14 Let T be a P-Y sufficient statistic in the first sense, then for any
unbiased predictor r (X ), there is an unbiased predictor r ∗ (T ) which is a function of

T and

    V_θ[r*(T) − Y] ≤ V_θ[r(X) − Y]   ∀θ ∈ Θ.

Proof Let E[r (X )|T ] = r ∗ (T ) which is independent of θ by the definition of the


P-Y sufficiency. Also

    E_θ[r*(T) − Y] = E_θ[r(X) − Y] = 0,
    V_θ[r(X) − Y] = E_θ V_θ[r(X) − Y | T] + E_θ[ r*(T) − E[Y|T] ]²
                  = E_θ V_θ[r(X)|T] + E_θ V_θ[Y|T] + E_θ[ r*(T) − E[Y|T] ]²
                  ≥ E_θ V_θ[Y|T] + E_θ[ r*(T) − E[Y|T] ]²
                  = V_θ[r*(T) − Y],

since X and Y are conditionally independent given T . 

Theorem 1.15 Let T be a P-Y sufficient statistic in the second sense. If there is
an unbiased predictor of Y , then there is a uniformly minimum variance unbiased
predictor r ∗ (T ) of Y .

Proof Let r (X ) be an unbiased predictor of Y and r ∗ (T ) = E[Y |T ]. Then

    E_θ[r(X) − Y]² = E_θ[ E_θ[(r(X) − Y)² | T] ]
                   = E_θ[ E_θ[((r(X) − r*(T)) − (Y − r*(T)))² | T] ]
                   = E_θ[ E_θ[(r(X) − r*(T))² | T] ] + E_θ[ E_θ[(Y − r*(T))² | T] ],

since Y and X are conditionally independent given T hence

E θ [r (X ) − Y ]2 = E θ [r (X ) − r ∗ (T )]2 + E θ [Y − r ∗ (T )]2
≥ E θ [Y − r ∗ (T )]2 . 

The case for sufficiency in the third sense is a little more complicated.

Theorem 1.16 Let T be a P-Y sufficient statistic in the third sense. Then for any
predictor r (X ), there is r ∗ (T ) such that

E θ [r ∗ (T ) − Y ]2 ≤ E θ [r (X ) − Y ]2 ∀θ ∈ Θ.

Proof Provided that r*(T) is defined either by E[Y|T] or by E[r(X)|T] according
to whether T belongs to the first or the second set, the inequality follows immediately.

The only trouble with Theorem 1.16 is that the unbiasedness of r (X ) does not
necessarily imply the unbiasedness of r ∗ (T ) as stated in Theorem 1.15.

Example 1.4 Assume that X 1 and X 2 are distributed independently normally with
common variance 1 and E(X 1 ) = θ, E(X 2 ) = 2θ, where θ is an unknown param-
eter. The predicted value Y is distributed conditionally normally given X 1 and X 2 ,
conditional mean and variance being

    E[Y|X_1, X_2] = X_2 − X_1   if X_1 + X_2 ≥ 0,
    E[Y|X_1, X_2] = θ           if X_1 + X_2 < 0,
    V[Y|X_1, X_2] = 1.

Let T = (T1 , T2 ) be defined as

T1 = sgn (X 1 + X 2 ),

    T_2 = X_2 − X_1     if T_1 ≥ 0,
    T_2 = X_1 + 2X_2    if T_1 < 0.

Then T is shown to be P-Y sufficient in the third sense. Since given T when T1 ≥ 0,
Y is conditionally distributed with mean T2 and independently of θ and when T1 < 0,
conditional distribution of (X 1 , X 2 ) is independent of θ.
Now Ŷ = (X 1 + 2X 2 )/5 which is the uniformly minimum variance unbiased
estimator of θ is an unbiased predictor of Y since E(Y ) = θ. But this is not a function
of the P-Y sufficient statistic T, and such a predictor is obtained by

    r*(T) = X_2 − X_1                          if X_1 + X_2 ≥ 0,
    r*(T) = E[Ŷ|T_1, T_2] = (X_1 + 2X_2)/5     if X_1 + X_2 < 0.

Obviously r ∗ (T ) is improved over Ŷ if X 1 + X 2 ≥ 0 and is equal to Ŷ if X 1 + X 2 < 0


thus it is shown that

E[r ∗ (T ) − Y ]2 < E[Ŷ − Y ]2 .

But r ∗ (T ) is not unbiased since

E[r ∗ (T )] = E[X 2 − X 1 |X 1 + X 2 ≥ 0] Pr{X 1 + X 2 ≥ 0}


+ E[(X 1 + 2X 2 )/5|X 1 + X 2 < 0] Pr{X 1 + X 2 < 0}
= θ Pr{X 1 + X 2 ≥ 0}
+ E[3(X 1 + X 2 )/10 + (X 2 − X 1 )/10|X 1 + X 2 < 0] Pr{X 1 + X 2 < 0}
= θ Pr{X 1 + X 2 ≥ 0}
+ θ/10 Pr{X 1 + X 2 < 0} + E[3(X 1 + X 2 )/10|X 1 + X 2 < 0] Pr{X 1 + X 2 < 0}
< θ Pr{X 1 + X 2 ≥ 0}
+ θ/10 Pr{X 1 + X 2 < 0} + E[3(X 1 + X 2 )/10] Pr{X 1 + X 2 < 0}
= θ.

Theorem 1.14 can be generalized in the following way.



Theorem 1.17 Let T = t (X ) be a sufficient statistic in the usual sense for X and
assume that the conditional expectation of Y given X can be expressed as a function
of t (X ) and θ for almost all values of X and for all θ ∈ Θ, then for any unbiased
predictor r (X ), there is a r ∗ (T ) such that

Vθ [r ∗ (T ) − Y ] ≤ Vθ [r (X ) − Y ] ∀θ ∈ Θ.

Proof For any r (X ),

Vθ [r (X ) − Y ] = E θ Vθ [Y |X ] + Vθ [r (X ) − E θ [Y |X ]],

and let r ∗ (T ) = E[r (X )|T ], then under the assumption of the theorem

Vθ [r (X ) − E θ [Y |X ]] = E θ Vθ [r (X )|T ] + Vθ [r ∗ (T ) − E θ [Y |X ]]
≥ Vθ [r ∗ (T ) − E θ [Y |X ]]. 

Theorem 1.14 may be regarded as a corollary of this theorem, for a P-Y sufficient
statistic T in the first sense satisfies the condition of the theorem. The sufficiency of T
is evident and since X and Y are conditionally independent given T , E θ [Y |X, T ] =
E θ [Y |T ] but E θ [Y |X, T ] is obviously equal to E θ [Y |X ] thus the condition of the
theorem is satisfied.
Now we shall consider the conditions for the locally best unbiased predictor, that
is, a predictor which is unbiased and minimizes

Vθ0 [r (X ) − Y ] at a specific θ = θ0 .

Let the conditional expectation be E θ0 [Y |X ] = g0 (X ) then

Vθ0 [r (X ) − Y ] = Vθ0 [r (X ) − g0 (X )] + Vθ0 [Y |X ].

Hence the problem is reduced to minimizing

Vθ0 [r (X ) − g0 (X )]

under the condition

E θ [r (X )] = g(θ) = E θ (Y ),

or equivalently putting r̃ (X ) = r (X ) − g0 (X ) and g̃(θ) = g(θ) − E θ [g0 (X )], we are


to minimize Vθ0 (r̃ (X )) under the condition E θ [r̃(X )] = g̃(θ).
Thus the problem entirely reduces to the problem of locally best unbiased esti-
mation.

Example 1.5 Suppose that (X 1 , Y1 ), . . . , (X n , Yn ), (X n+1 , Yn+1 ) are independently


distributed according to an identical bivariate normal distribution, all the parameters
being unknown and we are to predict Yn+1 on the basis of (X 1 , Y1 ), . . . , (X n , Yn )
and X n+1 . Then

E[Yn+1 |(X 1 , Y1 ), . . . , (X n , Yn ), X n+1 ] = E[Yn+1 |X n+1 ] = μ2 + β(X n+1 − μ1 ),

where

    μ_2 = E(Y_i),   μ_1 = E(X_i),   β = Cov(X_i, Y_i)/V(X_i).

Fix, for example, β = b. Then

    g_0(X) = bX_{n+1},   E[g_0(X)] = bμ_1,   E(Y_{n+1}) − bμ_1 = μ_2 − bμ_1.

The locally best estimator of μ_2 − bμ_1 is

    Ȳ − b X̄′,   Ȳ = (1/n)∑_{i=1}^n Y_i,   X̄′ = (1/(n+1))∑_{i=1}^{n+1} X_i.

Hence the locally best unbiased predictor of Y_{n+1} is

    Ȳ − b X̄′ + bX_{n+1} = Ȳ + (n/(n+1)) b (X_{n+1} − X̄),   X̄ = (1/n)∑_{i=1}^n X_i.

Remark that the locally best predictor depends on the specific fixed value b; thus no
uniformly best unbiased predictor exists.

The following theorem is an extension of the Cramér–Rao theorem.

Theorem 1.18 Assume that the density function f (x, θ) and the conditional expec-
tation gθ (X ) = E θ [Y |X ] are continuous and differentiable with respect to θ at θ = θ0
for almost all X . Moreover it is assumed that

    E_{θ_0}[ ( (∂f(X, θ)/∂θ)|_{θ=θ_0} / f(X, θ_0) )² ] < ∞,   E_{θ_0}[ ( (∂g_θ(X)/∂θ)|_{θ=θ_0} )² ] < ∞,

    lim_{Δθ→0} E_{θ_0}[ ( (1/Δθ){ f(X, θ_0 + Δθ) − f(X, θ_0) } − (∂f(X, θ)/∂θ)|_{θ=θ_0} )² ] = 0,

    lim_{Δθ→0} E_{θ_0}[ ( (1/Δθ){ g_{θ_0+Δθ}(X) − g_{θ_0}(X) } − (∂g_θ(X)/∂θ)|_{θ=θ_0} )² ] = 0.

Then for any unbiased predictor r(X) of Y,

    V_{θ_0}[r(X) − Y] ≥ ( E_{θ_0}[ (∂g_θ(X)/∂θ)|_{θ=θ_0} ] )² / E_{θ_0}[ ( (∂ log f(X, θ)/∂θ)|_{θ=θ_0} )² ] + V_{θ_0}[Y|X].

Proof Under the conditions of the theorem, the Cramér–Rao theorem holds with
respect to the unbiased estimation of g̃(θ) and
    V_{θ_0}[r(X) − Y] ≥ ( (∂g̃(θ)/∂θ)|_{θ=θ_0} )² / E_{θ_0}[ ( (∂ log f(X, θ)/∂θ)|_{θ=θ_0} )² ],

where

    g̃(θ) = g(θ) − E_θ[g_0(X)]
         = E_θ[ E_θ[Y|X] − E_{θ_0}[Y|X] ]
         = ∫ ( E_θ[Y|x] − E_{θ_0}[Y|x] ) f(x, θ) dμ(x),

and under the conditions of the theorem it holds that

    (∂g̃(θ)/∂θ)|_{θ=θ_0} = ∫ (∂E_θ[Y|x]/∂θ)|_{θ=θ_0} f(x, θ_0) dμ(x)
                           + ∫ ( E_θ[Y|x] − E_{θ_0}[Y|x] ) (∂f(x, θ)/∂θ)|_{θ=θ_0} dμ(x)
                         = ∫ (∂g_θ(x)/∂θ)|_{θ=θ_0} f(x, θ_0) dμ(x)
                         = E_{θ_0}[ (∂g_θ(X)/∂θ)|_{θ=θ_0} ]

(the second integral vanishes since E_θ[Y|x] − E_{θ_0}[Y|x] = 0 at θ = θ_0),

which establishes the theorem. 

1.4 Interval or Region Prediction

Now consider the prediction procedure which assigns a measurable set S(x) ⊂ S to
each point x. We also assume that the set {(x, y)|y ∈ S(x)} is measurable. We say that
S(X ) is a prediction region (or interval) with confidence coefficient 100(1 − α)% or
for short 100(1 − α)% prediction region if

Pθ {Y ∈ S(X )} ≥ 1 − α ∀θ ∈ Θ. (1.3)

Prediction regions may be obtained in many ways thus we should define some
criterion of optimality for prediction regions. We may consider that those predic-
tion regions which have the smallest possible volume (area or length) are optimal.
More generally, we may define a weight function W (y) and the mean volume of the
prediction region as

    E_θ[ ∫_{S(X)} W(y) dμ(y) ] = M(θ, S),                                    (1.4)

where μ(y) is the Lebesgue measure.


Consider a function which is defined as

    σ(X, Y) = 1   if Y ∈ S(X),
    σ(X, Y) = 0   otherwise,                                                 (1.5)

and is called a region prediction function. Conversely if there is a function σ(X, Y )


which takes only two values 0 and 1, then we may define a prediction region by

S(X ) = {Y |σ(X, Y ) = 1 for X }. (1.6)

Consequently prediction region and region prediction function correspond one to one
with each other. More generally consider a randomized prediction region procedure
which gives a probability measure over some σ-field of subsets corresponding to
each point x. Then

σ(X, Y ) = Pr{Y ∈ S(X )|X }, 0 ≤ σ(X, Y ) ≤ 1 (1.7)

is also called a region prediction function. Conversely for a measurable function


σ(X, Y ), 0 ≤ σ(X, Y ) ≤ 1, there corresponds a randomized prediction region which
is defined as

S(X ) = {Y |σ(X, Y ) ≥ U for X },

where U is a random variable distributed uniformly over the interval [0, 1] and
independently of X .
Thus we can reformulate the problem as follows: seek for a function σ(x, y) such
that 0 ≤ σ(x, y) ≤ 1,

    E_θ[σ(X, Y)] ≥ 1 − α,                                                    (1.8)

and minimizes

    E_{θ_0}[ ∫ σ(X, y) W(y) dμ(y) ].

The solution σ*(x, y) of this problem is called the locally best prediction region at
θ = θ_0, and if this is determined independently of θ_0, it is called the uniformly best
prediction region.

Theorem 1.19 If σ*(x, y) is determined so as to minimize

    E_{θ_0}[ ∫ σ(X, y) W(y) dμ(y) ]

under the condition

    ∫ E_θ[σ(X, Y)] dξ(θ) = 1 − α

for some prior probability measure ξ over Θ, and if it satisfies the condition (1.8),
then it gives the locally best prediction region at θ_0. More precisely, if it is determined
as

    σ*(x, y) = 1   if ∫ f(x, y, θ) dξ(θ) > c f(x, θ_0) W(y),
    σ*(x, y) = 0   if ∫ f(x, y, θ) dξ(θ) < c f(x, θ_0) W(y),

and satisfies

    E_θ[σ*(X, Y)] ≥ 1 − α   ∀θ ∈ Θ,

and

    ∫ E_θ[σ*(X, Y)] dξ(θ) = 1 − α,

then σ*(x, y) gives the locally best prediction region.

Proof The former half of the theorem is obvious. The latter half of the theorem is
also easily proved analogously with the Neyman–Pearson lemma. 

Example 1.6 Let X and Y be distributed independently according to an exponential


distribution with mean θ. Consider the case when W (y) = 1. Let the prior measure
be concentrated at one fixed point θ1 . Then σ ∗ (x, y) in the above theorem is obtained
as
    σ*(x, y) = 1   if and only if   exp{ x/θ_0 − (x + y)/θ_1 } ≥ c θ_1²/θ_0.

Putting c = θ_0/θ_1² we have

    σ*(x, y) = 1   if and only if   x/(x + y) ≥ θ_0/θ_1.

Since for all θ ∈ Θ, X/(X + Y) is distributed uniformly over the interval [0, 1], if
we set θ_0/θ_1 = α we have

    E_θ[σ*(X, Y)] = P_θ{X/(X + Y) ≥ α} = 1 − α   ∀θ ∈ Θ.

Thus the prediction region X/(X + Y) ≥ α, or

    Y ≤ ((1 − α)/α) X,
gives a locally best prediction region at θ0 . But since this is independent of θ0 , it is
the uniformly best prediction region or in this case the uniformly shortest prediction
interval.
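A minimal numerical check of Example 1.6 (illustration only, with assumed values of α and θ): since X/(X + Y) is uniform on [0, 1] for every θ, the region Y ≤ X(1 − α)/α has coverage exactly 1 − α whatever the value of θ.

```python
# Coverage of the prediction region Y <= X*(1-alpha)/alpha for exponential X, Y.
import numpy as np

rng = np.random.default_rng(2)
alpha, reps = 0.1, 1_000_000
for theta in (0.5, 1.0, 10.0):                          # assumed example values of theta
    X = rng.exponential(scale=theta, size=reps)
    Y = rng.exponential(scale=theta, size=reps)
    coverage = np.mean(Y <= X * (1 - alpha) / alpha)
    print(theta, coverage)                              # about 0.9 for every theta
```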

In some cases it is more convenient to modify the condition (1.3) or (1.8) to

Pθ {y ∈ S(X )} = 1 − α or E θ [σ(X, Y )] = 1 − α. (1.9)

If (1.9) is satisfied, then the prediction region is called to be similar. Also among
similar prediction regions the best (locally or uniformly) prediction region is sought
for.
Assume that for the joint distribution of X and Y , there is a complete sufficient
statistic W = w(X, Y ), i.e., E θ [φ(W )] = 0 for all θ ∈ Θ implies φ(W ) ≡ 0 a.e..
Then by putting

σ ∗ (W ) = E[σ(X, Y )|W ],

the condition (1.9) implies E θ [σ ∗ (W )] = 1 − α, i.e.,

σ ∗ (W ) = E[σ(X, Y )|W ] = 1 − α a.e. W.

Since

    E_{θ_0}[ ∫ σ(X, y) W(y) dμ(y) ] = E_{θ_0}^{W}[ E_{θ_0}[ ∫ σ(X, y) W(y) dμ(y) | W ] ],

it is sufficient to minimize

    E_{θ_0}[ ∫ σ(X, y) W(y) dμ(y) | W ]

under the condition

E[σ(X, Y )|W ] = 1 − α

for almost all W in order to obtain the locally best similar prediction region.

If by a transformation of variables Y = φ(W, X, V) we have a one-to-one transfor-
mation from (X, Y) to (X, V, W), and if the conditional density function of X and
V given W is expressed as f*(x, v|w), then putting σ(x, y) = σ̃(x, v, w), W(y) =
w̃(x, v, w), we are to minimize

    ∫∫∫ f(x, θ_0) σ̃(x, v, w) w̃(x, v, w) |J| dμ(x) dμ(v) dμ(w)

under the condition

    ∫∫ f*(x, v|w) σ̃(x, v, w) dμ(x) dμ(v) = 1 − α   a.e. w,

where |J| denotes the Jacobian of the transformation.

where |J | denotes the Jacobian of the transformation.

Theorem 1.20 If the above conditions are satisfied, then the locally best prediction
region is given by σ̃ which satisfies

    σ̃ = 1   if f*(x, v|w) > c_w f(x, θ_0) w̃ |J|,
    σ̃ = 0   if f*(x, v|w) < c_w f(x, θ_0) w̃ |J|,

where the constant c_w depends on w.

The proof of this theorem is straightforward.

Example 1.7 X_1, . . . , X_n, Y are independent and identically distributed according
to the normal distribution N(θ, 1). Then it is well known that W = ∑_i X_i + Y is
sufficient for X and Y and is complete. Also by the transformation Y = W − ∑_i X_i,
we have the joint density function of W and X_1, . . . , X_n, and the conditional density
function of X_1, . . . , X_n given W is

    f*(x_1, . . . , x_n | w) = const. × exp{ −(1/2)[ ∑_{i=1}^n ( x_i − w/(n+1) )² + n²( x̄ − w/(n+1) )² ] },
    x̄ = (1/n)∑_{i=1}^n x_i.

Further if we put W(y) = 1 then we have

    f(x, θ) = const. × exp{ −(1/2)∑_{i=1}^n (x_i − θ)² }.

Putting θ = 0, we have the locally best prediction region as σ̃(x, w) = 1 if and only
if

    ∑_{i=1}^n ( x_i − w/(n+1) )² + n²( x̄ − w/(n+1) )² ≤ ∑_{i=1}^n x_i² + c_w,

or equivalently if and only if

    w/n − c_w ≤ x̄ ≤ w/n + c_w,

where the constant c_w is determined so that

    Pr{ w/n − c_w ≤ X̄ ≤ w/n + c_w | W = w } = 1 − α.

The conditional distribution of X̄ given W = w is normal with mean w/(n + 1) and
variance 1/(n(n + 1)); c_w is a function of |w| and is an increasing function of |w|.
Returning back to the space of X and Y, we have the locally best prediction region
(shortest prediction interval) as

    X̄ − Y/n ≤ c̃(X̄ + Y),   c̃(X̄ + Y) = c_w,   w = n X̄ + Y,
which is visualized by the following figure (Fig. 1.1).
It is remarkable that the solution depends on θ and it does not coincide with the
‘usual’ prediction region

| X̄ − Y | ≤ c.
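The constant c_w of Example 1.7 has no closed form, since the interval for X̄ is centred at w/n while its conditional mean is w/(n + 1). The following Python sketch (illustrative values of n, α and w; not from the original text) computes c_w by root-finding and shows that it increases with |w|, as stated above.

```python
# Numerical computation of c_w in Example 1.7 (illustrative sketch).
# Given W = w, Xbar is N(w/(n+1), 1/(n(n+1))); c_w solves
# P(w/n - c <= Xbar <= w/n + c | W = w) = 1 - alpha.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

n, alpha = 10, 0.05                      # assumed example values
sd = 1.0 / np.sqrt(n * (n + 1))          # conditional standard deviation of Xbar given W = w

def c_w(w):
    m = w / (n + 1)                      # conditional mean of Xbar given W = w
    center = w / n                       # centre of the interval for Xbar
    def cover(c):
        return norm.cdf(center + c, m, sd) - norm.cdf(center - c, m, sd) - (1 - alpha)
    return brentq(cover, 1e-8, 10.0)     # root of the coverage equation

for w in (0.0, 5.0, -5.0, 20.0):
    print(w, c_w(w))                     # c_w grows with |w|
```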

Example 1.8 Suppose that X 1 , . . . , X n and Y are i.i.d. real random variables dis-
tributed according to the exponential-type distribution with the density function

Fig. 1.1 Locally best prediction region in Example 1.7 (sketch in the (X, Y) plane)

f (x, θ) = h(x) exp{θx + c(θ)},


where θ is a real parameter. Then T = ∑_{i=1}^n X_i + Y is sufficient for (X_1, . . . , X_n, Y)
and also complete. Its density function is expressed as

f ∗ (t, θ) = gn+1 (t) exp{θt + (n + 1)c(θ)},

and the conditional density function is expressed independently of θ as follows.


The joint density function of Y and W = ∑_{i=1}^n X_i is given as

f (y, z, θ) = h(y)gn (z) exp{θ(y + z) + (n + 1)c(θ)},

from which the joint density function of Y and T is given as

f (y, t, θ) = h(y)gn (t − y) exp{θt + (n + 1)c(θ)},

and the conditional density function of Y given T is

f ∗ (y|t) = h(y)gn (t − y)/gn+1 (t).

When the distribution is continuous, we can define S*(t) such that

    ∫_{S*(t)} f*(y|t) dy = 1 − α,                                            (1.10)

and derive the region S(X) by

    Y ∈ S*( Y + ∑_{i=1}^n X_i )  ⇔  Y ∈ S(X);

then S(X) gives a similar prediction of Y of size 1 − α. Generally there are infinitely
many ways of defining S*(T), so we need some criterion to choose among them.
The expected volume of the prediction region S(X) or φ_S(x, y) is defined as

    V_S(θ) = E_θ[ ∫_{S(X)} dμ_y ] = E_θ[ ∫ φ_S(X, y) dμ_y ].

At a specified value θ_0 of θ we have

    V_S(θ_0) = ∫ ( ∫_{S(x)} dμ_y ) f_0(x) dμ_x = ∫∫_{S(x)} f_0^{-1}(y) f_0(x, y) dμ_x dμ_y
             = ∫ ( ∫_{S*(t)} f_0^{-1}(y) f*(y|t) dy ) f*(t) dt,   f_0(y) = f(y, θ_0).      (1.11)

Then (1.11) is minimized under the condition (1.10) when

    S*(t) = { y | f_0(y) > c(t) },   ∫_{f_0(y)>c(t)} f*(y|t) dy = 1 − α.

Therefore the minimum volume prediction region depends on θ0 and generally there
does not exist any uniformly minimum volume prediction region except when the
region {y| f (y, θ) > c(t)} is independent of θ.
Example 1.9 X 1 , . . . , X n , Y are i.i.d. normally distributed with mean μ and variance
σ 2 . The complete sufficient statistic is given by the pair (T, W )


    T = n X̄ + Y,   W = ∑_{i=1}^n X_i² + Y².


Since R = (Y − X̄)/S, S² = ∑_{i=1}^n (X_i − X̄)²/(n − 1), is distributed independently
of the parameters μ and σ, the completeness of (T, W) implies that R is independent
of (T, W); √(n/(n + 1)) R is shown to be distributed according to the t-distribution
with n − 1 degrees of freedom, therefore conditionally given (T, W)

    Pr{ |R| ≤ √((n + 1)/n) t^{(n−1)}_{α/2} } = 1 − α,

which implies that

    X̄ − √((n + 1)/n) t^{(n−1)}_{α/2} S < Y < X̄ + √((n + 1)/n) t^{(n−1)}_{α/2} S

gives a similar prediction interval of Y , which, however, is not locally best.
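A Monte Carlo sketch (illustrative parameters, not part of the original text) of the similar prediction interval of Example 1.9: the empirical coverage is close to 1 − α for any choice of μ and σ.

```python
# Coverage check of Xbar +/- sqrt((n+1)/n) * t_{alpha/2, n-1} * S (illustration only).
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(3)
n, alpha, reps = 8, 0.1, 200_000
mu, sigma = 3.0, 2.5                                   # assumed example values
tq = t.ppf(1 - alpha / 2, df=n - 1)

X = rng.normal(mu, sigma, size=(reps, n))
Y = rng.normal(mu, sigma, size=reps)
Xbar, S = X.mean(axis=1), X.std(axis=1, ddof=1)
half = np.sqrt((n + 1) / n) * tq * S
print(np.mean(np.abs(Y - Xbar) <= half))               # about 0.9
```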


Example 1.10 X 1 , . . . , X n , Y are i.i.d. uniformly distributed over the interval (θ, τ ).
Then the complete sufficient statistic is given by the pair (U, V )

    U = min{X_1, . . . , X_n, Y} = min{ min_i X_i, Y },
    V = max{X_1, . . . , X_n, Y} = max{ max_i X_i, Y }.

Then the conditional distribution of Y given (U, V) = (u, v) is given by

    Pr{Y = u | u, v} = 1/(n + 1),   Pr{Y = v | u, v} = 1/(n + 1),
    Pr{a < Y < b | u, v; u < a < b < v} = ((n − 1)/(n + 1)) ((b − a)/(v − u)).

Therefore if α > 2/(n + 1), a prediction interval with confidence coefficient 1 − α is given by

    ((1 + β)/2) U + ((1 − β)/2) V < Y < ((1 − β)/2) U + ((1 + β)/2) V,   β = ((n + 1)/(n − 1))(1 − α),

or equivalently

    ((1 + β)/2) min_i{X_i} + ((1 − β)/2) max_i{X_i} < Y < ((1 − β)/2) min_i{X_i} + ((1 + β)/2) max_i{X_i}.

We can also discuss the case when Y is vector valued.

Example 1.11 $X_1, \ldots, X_n$ and $Y$ are i.i.d. p-variate random variables distributed normally with mean vector $\mu$ and known variance–covariance matrix $\Sigma_0$. Then the complete sufficient statistic for $(X_1, \ldots, X_n, Y)$ is given by the vector
$$T = \sum_{i=1}^{n} X_i + Y = n\bar{X} + Y.$$

It can be shown that given $T$, $Y$ is conditionally normally distributed with mean $\frac{1}{n+1}T$ and variance–covariance matrix $\frac{n}{n+1}\Sigma_0$, and $Y - \bar{X}$ is distributed normally with mean vector $0$ and variance–covariance matrix $\frac{n+1}{n}\Sigma_0$. Therefore a prediction region of $Y$ is obtained by
$$\frac{n}{n+1}\,(Y - \bar{X})'\, \Sigma_0^{-1}\, (Y - \bar{X}) \le \chi^2_\alpha(p),$$
where $\chi^2_\alpha(p)$ denotes the upper $\alpha$-point of the chi-square distribution with $p$ degrees of freedom.
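The region of Example 1.11 is easy to evaluate numerically; the following sketch assumes NumPy/SciPy, and the covariance matrix and data are illustrative.

```python
# Sketch of the region check of Example 1.11 (known covariance Sigma_0);
# the covariance matrix and data below are illustrative.
import numpy as np
from scipy import stats

def in_prediction_region(y, x_sample, sigma0, alpha=0.05):
    """True if n/(n+1) * (y - x_bar)' Sigma_0^{-1} (y - x_bar) <= chi2_alpha(p)."""
    x_sample = np.asarray(x_sample, dtype=float)
    n, p = x_sample.shape
    d = np.asarray(y, dtype=float) - x_sample.mean(axis=0)
    stat = n / (n + 1) * d @ np.linalg.solve(sigma0, d)
    return stat <= stats.chi2.ppf(1 - alpha, df=p)   # upper alpha point

rng = np.random.default_rng(2)
sigma0 = np.array([[2.0, 0.5], [0.5, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], sigma0, size=25)
print(in_prediction_region([0.3, -0.2], x, sigma0))
```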

Example 1.12 The joint distribution of $X_i$ and $Y$ is the same as in Example 1.11 above, but the variance–covariance matrix $\Sigma$ is unknown. Then the complete sufficient statistic is given by
$$T = n\bar{X} + Y, \qquad W = \sum_{i=1}^{n} X_i X_i' + YY'.$$

Then a prediction region of $Y$ is defined by
$$\left(Y - \frac{1}{n+1}T\right)' W^{-1} \left(Y - \frac{1}{n+1}T\right) \le C,$$

which is equivalent to
$$\frac{n}{n+1}\,(Y - \bar{X})'\, V^{-1}\, (Y - \bar{X}) \le A, \qquad V = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(X_i - \bar{X})'.$$

Since the left-hand side of the above is shown to be distributed according to the F
distribution with degrees of freedom ( p, n − p − 1), A is equal to Fα ( p, n − p − 1).
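A sketch of the corresponding check for Example 1.12, taking the threshold $A = F_\alpha(p, n-p-1)$ as stated in the text; data and names are illustrative.

```python
# Sketch of the region check of Example 1.12 (unknown covariance), using the
# threshold A = F_alpha(p, n-p-1) as stated in the text; data are illustrative.
import numpy as np
from scipy import stats

def in_prediction_region_unknown_cov(y, x_sample, alpha=0.05):
    x_sample = np.asarray(x_sample, dtype=float)
    n, p = x_sample.shape
    xbar = x_sample.mean(axis=0)
    v = np.cov(x_sample, rowvar=False, ddof=1)     # V = (1/(n-1)) sum (x_i - x_bar)(x_i - x_bar)'
    d = np.asarray(y, dtype=float) - xbar
    stat = n / (n + 1) * d @ np.linalg.solve(v, d)
    return stat <= stats.f.ppf(1 - alpha, p, n - p - 1)

rng = np.random.default_rng(3)
x = rng.multivariate_normal([1.0, 2.0, 0.0], np.eye(3), size=40)
print(in_prediction_region_unknown_cov([1.2, 1.8, 0.1], x))
```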

When the distribution of $X_i$ and $Y$ is discrete, it is necessary to have randomized regions in order to have similar prediction regions.

Example 1.13 X 1 , . . . , X n and Y are i.i.d. according to the Poisson distribution with
the parameter λ. The complete sufficient statistic is given by


$$T = \sum_{i=1}^{n} X_i + Y,$$

and given $T = t$, $Y$ is conditionally distributed according to the binomial distribution $B\bigl(t, \tfrac{1}{n+1}\bigr)$. Then a similar prediction region function is given as


$$\phi^*(y, t) = \begin{cases} 1 & \text{for } u(t) < y < v(t) \\ \delta_t & \text{for } y = u(t),\; 0 < \delta_t < 1 \\ \varepsilon_t & \text{for } y = v(t),\; 0 < \varepsilon_t < 1 \\ 0 & \text{for } y < u(t) \text{ or } y > v(t), \end{cases}$$

where $u(t)$, $v(t)$ and $\delta_t$, $\varepsilon_t$ are defined by
$$\sum_{y} \frac{t!}{y!\,(t-y)!} \left(\frac{1}{n+1}\right)^{y} \left(\frac{n}{n+1}\right)^{t-y} \phi^*(y, t) = 1 - \alpha.$$

Then the interval prediction function $\phi(y, x)$ in terms of $x$ is defined as
$$\phi(y, x) = \begin{cases} 1 & \text{when } u(y+x) < y < v(y+x) \\ \delta_{y+x} & \text{when } u(y+x) = y \\ \varepsilon_{y+x} & \text{when } v(y+x) = y \\ 0 & \text{when } u(y+x) > y \text{ or } v(y+x) < y. \end{cases}$$

It is to be noted that the function $\phi$ does not have the structure
$$\phi(x, y) = \begin{cases} 1 & \text{for } u^*(x) < y < v^*(x) \\ 0 & \text{for } u^*(x) > y \text{ or } v^*(x) < y. \end{cases}$$

When t is large u(t) and v(t) can be approximated by
$$u(t) = \frac{t}{n+1} - u_{\alpha/2}\,\frac{\sqrt{nt}}{n+1}, \qquad v(t) = \frac{t}{n+1} + u_{\alpha/2}\,\frac{\sqrt{nt}}{n+1},$$

and

$$\phi(y, x) = 1 \quad \text{for } \underline{y} < y < \bar{y},$$


where $\underline{y}$ and $\bar{y}$ are obtained from
$$\bar{y}, \underline{y} = \frac{x+y}{n+1} \pm u_{\alpha/2}\,\frac{\sqrt{n(x+y)}}{n+1} \quad \text{or} \quad \bar{y}, \underline{y} = \frac{1}{n}\left\{ x + \frac{u^2_{\alpha/2}}{2} \pm u_{\alpha/2}\sqrt{(n+1)x + \frac{u^2_{\alpha/2}}{4}} \right\}.$$
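A sketch of the closed-form large-t approximation above for Example 1.13; the observed total $x = \sum x_i$ and the sample size used below are illustrative.

```python
# Sketch of the closed-form large-t approximation of Example 1.13;
# x_sum = sum of the observed x_i; the numbers are illustrative.
import numpy as np
from scipy import stats

def poisson_prediction_bounds(x_sum, n, alpha=0.05):
    u = stats.norm.ppf(1 - alpha / 2)                 # u_{alpha/2}
    root = u * np.sqrt((n + 1) * x_sum + u**2 / 4)
    y_low = (x_sum + u**2 / 2 - root) / n
    y_high = (x_sum + u**2 / 2 + root) / n
    return y_low, y_high

print(poisson_prediction_bounds(x_sum=57, n=10, alpha=0.05))
```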

1.5 Non-parametric Prediction Regions

We can also obtain non-parametric prediction regions. Suppose that $X_1, \ldots, X_n$ and $Y$ are i.i.d. continuous random p-vectors. Let $\Pi$ be the set of $n+1$ p-vectors
$$\{x_1, \ldots, x_n, x_{n+1} \mid X_1 = x_1, \ldots, X_n = x_n, Y = x_{n+1}\}$$

irrespective of ordering. Then given $\Pi$ the conditional probability that $X_i = x_{j_i}$, $i = 1, \ldots, n$ and $Y = x_{j_{n+1}}$ for any permutation $(x_{j_1}, \ldots, x_{j_{n+1}})$ of $(x_1, \ldots, x_{n+1})$ is equal to $1/(n+1)!$. Hence $\Pi$ is a sufficient statistic and it is complete if the family of possible distributions is large enough. Then for a similar prediction function

$$E[\phi^*(Y, \Pi)] = 1 - \alpha$$
for all probability distributions of $X_i$ and $Y$, we have
$$E[\phi^*(Y, \Pi) \mid \Pi] = 1 - \alpha,$$
which means that
$$\frac{1}{n+1}\sum_{i=1}^{n+1} \phi^*(x_i, \Pi) = 1 - \alpha.$$

A method to obtain this is to define a continuous function $H(x \mid \Pi)$ of $x$ depending on $\Pi$ and on its coordinates, let $H_{(1)} < H_{(2)} < \cdots < H_{(n+1)}$ be the ordered values of $H(x_i)$, $i = 1, \ldots, n+1$, and define
$$\phi^*(y, \Pi) = \begin{cases} 1 & \text{when } H_{(h)} < H(y) < H_{(k)} \\ \gamma/2 & \text{when } H(y) = H_{(h)} \text{ or } H(y) = H_{(k)} \\ 0 & \text{when } H(y) < H_{(h)} \text{ or } H(y) > H_{(k)}, \end{cases}$$
and $(k - h - 1) + \gamma = (n + 1)(1 - \alpha)$.

Example 1.14 When $X_i$ and $Y$ are i.i.d. continuous real random variables, the set of points is described by the statistic $x^*_{(1)} < x^*_{(2)} < \cdots < x^*_{(n+1)}$; let the order statistic of $x_1, \ldots, x_n$ be denoted as $x_{(1)} < \cdots < x_{(n)}$. In this case, we may put $H(y) = y$, and $y = x^*_{(j)}$ is equivalent to $x_{(j-1)} < y < x_{(j)}$ with $x_{(0)} = -\infty$, $x_{(n+1)} = \infty$.

Then if 2/(n + 1) < α,



$$\phi^*(y, \Pi) = \begin{cases} 1 & \text{when } x_{(j)} < y < x_{(n-j+1)} \\ \gamma/2 & \text{when } x_{(j-1)} < y < x_{(j)} \text{ or } x_{(n-j+1)} < y < x_{(n-j+2)} \\ 0 & \text{when } y < x_{(j-1)} \text{ or } y > x_{(n-j+2)}, \end{cases}$$
$$n - 2j + 1 + \gamma = (n + 1)(1 - \alpha).$$

Correspondingly the randomized prediction interval is derived as

with probability $1 - \gamma$: $x_{(j)} < Y < x_{(n-j+1)}$,
with probability $\gamma/2$: $x_{(j-1)} < Y < x_{(n-j+1)}$,
with probability $\gamma/2$: $x_{(j)} < Y < x_{(n-j+2)}$.
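Omitting the randomization, the interval $(x_{(j)}, x_{(n-j+1)})$ contains a new $Y$ with probability $(n+1-2j)/(n+1)$ for continuous i.i.d. data; the following sketch picks the largest $j$ for which this is at least $1 - \alpha$ (names and data are illustrative).

```python
# Sketch of the non-randomized core of Example 1.14: (x_(j), x_(n-j+1))
# contains a new Y with probability (n+1-2j)/(n+1) for continuous i.i.d. data;
# we take the largest j keeping this at least 1 - alpha.  Names are illustrative.
import numpy as np

def distribution_free_interval(x, alpha=0.2):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = int(np.floor((n + 1) * alpha / 2))   # largest j with (n+1-2j)/(n+1) >= 1-alpha
    if j < 1:
        raise ValueError("alpha too small: the interval would be unbounded")
    return x[j - 1], x[n - j]                # x_(j) and x_(n-j+1), 1-based notation

rng = np.random.default_rng(4)
sample = rng.normal(size=30)
print(distribution_free_interval(sample, alpha=0.2))
```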

When 2/(n + 1) > α, the prediction interval is not bounded.


In this case bounded prediction intervals can be obtained in the following way.
Define

$$H(x \mid \pi) = |x - \bar{x}^*|, \qquad \bar{x}^* = \frac{1}{n+1}\left(\sum_{i=1}^{n} x_i + y\right).$$

Then $H(y \mid \pi) = H_{(j)}$ is equivalent to the statement that the inequality $H(y) < H(x_i)$ holds for not fewer than $j$ of the $x_i$'s. It also implies that
$$|y - \bar{x}^*| < |x_i - \bar{x}^*| \quad \text{or} \quad \frac{n}{n+1}\,|y - \bar{x}| < \left| x_i - \bar{x} - \frac{1}{n+1}(y - \bar{x}) \right|,$$

which can be rewritten as


$$\text{when } x_i - \bar{x} \ge \frac{1}{n+1}(y - \bar{x}): \quad \bar{x} - \frac{n+1}{n-1}(x_i - \bar{x}) < y < x_i,$$
$$\text{when } x_i - \bar{x} < \frac{1}{n+1}(y - \bar{x}): \quad x_i < y < \bar{x} - \frac{n+1}{n-1}(x_i - \bar{x}).$$

Then the prediction interval is given as


with probability $1 - \gamma$, collect the values of $y$ which are in at least $j$ of the intervals defined by
$$\bar{x} - \frac{n+1}{n-1}(x_i - \bar{x}) < y < x_i,$$
and with probability $\gamma$, collect the values of $y$ which are in at least $j + 1$ of the intervals defined by
$$x_i < y < \bar{x} - \frac{n+1}{n-1}(x_i - \bar{x}).$$
If we denote

$$A_i = \begin{cases} x_i & \text{when } x_i \le \bar{x} \\ \bar{x} - \dfrac{n+1}{n-1}(x_i - \bar{x}) & \text{when } x_i > \bar{x}, \end{cases} \qquad B_i = \begin{cases} \bar{x} + \dfrac{n+1}{n-1}(\bar{x} - x_i) & \text{when } x_i \le \bar{x} \\ x_i & \text{when } x_i > \bar{x}, \end{cases}$$
and denote their ordered values as
$$A_{(1)} < \cdots < A_{(n)} < B_{(n)} < \cdots < B_{(1)}.$$

Then the randomized prediction interval is, with probability $1 - \gamma$, $[A_{(j)}, B_{(j)}]$ and, with probability $\gamma$, $[A_{(j+1)}, B_{(j+1)}]$.
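A sketch of the $A_i$, $B_i$ construction above; the (non-randomized) index $j$ is taken as an input, and the sample is illustrative.

```python
# Sketch of the A_i, B_i construction above; the (non-randomized) index j
# is taken as an input, and the sample is illustrative.
import numpy as np

def bounded_nonparametric_interval(x, j):
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    a = np.where(x <= xbar, x, xbar - (n + 1) / (n - 1) * (x - xbar))
    b = np.where(x <= xbar, xbar + (n + 1) / (n - 1) * (xbar - x), x)
    a_sorted = np.sort(a)          # A_(1) < ... < A_(n)
    b_sorted = np.sort(b)[::-1]    # B_(1) > ... > B_(n), so index j-1 is B_(j)
    return a_sorted[j - 1], b_sorted[j - 1]

rng = np.random.default_rng(5)
sample = rng.normal(size=25)
print(bounded_nonparametric_interval(sample, j=3))
```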

Example 1.15 We can also construct locally best non-parametric regions at a specified density function $f(x)$. Let $\tilde{\sigma}(X, Y)$ be denoted as
$$\tilde{\sigma}(X, Y) = \tilde{\sigma}\left(Y = \tilde{x}_{(i)} \mid \tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}\right),$$

since in this case the order statistic is sufficient. Also in this case the order statistic is complete, thus we must have
$$\sum_{i=1}^{n+1} \tilde{\sigma}\left(Y = \tilde{x}_{(i)} \mid \tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}\right) = (n + 1)(1 - \alpha)$$

for almost all $\tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}$. Consider a fixed distribution with density function $f(x)$ and assume that $W(y) = 1$. Then we have
$$E\left[\int \sigma(X, y)\, d\mu(y)\right] = n! \int_{\tilde{x}_0}^{\tilde{x}_1} \cdots \int_{\tilde{x}_n}^{\tilde{x}_{n+1}} \tilde{\sigma}\left(Y = \tilde{x}_{(i)} \mid \tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}\right) f(x_1) \cdots f(x_{n+1})\, dx_1 \cdots dx_{n+1},$$

where $\tilde{x}_0 = -\infty$, $\tilde{x}_{n+1} = \infty$. Thus the locally best prediction region is defined by
$$\tilde{\sigma}\left(Y = \tilde{x}_{(i)} \mid \tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}\right) = \begin{cases} 1 & \text{if } f(\tilde{x}_{(i)}) > c(\tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}) \\ 0 & \text{if } f(\tilde{x}_{(i)}) < c(\tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}). \end{cases}$$
In other words $\tilde{\sigma}(Y = \tilde{x}_{(i)} \mid \tilde{x}_{(1)}, \ldots, \tilde{x}_{(n+1)}) = 1$ if $f(\tilde{x}_{(i)})$ is larger than the $k$th ($k = (n+1)(1-\alpha)$) largest value among $f(\tilde{x}_{(1)}), \ldots, f(\tilde{x}_{(n+1)})$, or equivalently if $f(y)$ is larger than $n - k + 1$ values of $f(x_1), \ldots, f(x_n)$. For example, let $f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ and let $0 < X_{(1)} < \cdots < X_{(n)}$ be the order statistic of the absolute values of $X_1, \ldots, X_n$. Then we have the locally best prediction region as


$$\sigma(Y) = \begin{cases} 1 & \text{if } |Y| < X_{(k)},\ k = [(n+1)(1-\alpha)] \\ (n+1)(1-\alpha) - k & \text{if } X_{(k)} < |Y| < X_{(k+1)} \\ 0 & \text{if } X_{(k+1)} < |Y|. \end{cases}$$
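A sketch of this locally best region at the standard normal density, ignoring the randomized boundary case; names and data are illustrative.

```python
# Sketch of the locally best region at the standard normal density
# (end of Example 1.15), ignoring the randomized boundary case.
import numpy as np

def locally_best_region_contains(y, x, alpha=0.2):
    """Accept y if |y| < X_(k), the k-th smallest |X_i|, k = floor((n+1)(1-alpha))."""
    absx = np.sort(np.abs(np.asarray(x, dtype=float)))
    n = len(absx)
    k = int(np.floor((n + 1) * (1 - alpha)))
    if k > n:
        return True        # degenerate case: the region is the whole line
    return abs(y) < absx[k - 1]

rng = np.random.default_rng(6)
sample = rng.normal(size=30)
print(locally_best_region_contains(0.5, sample), locally_best_region_contains(2.5, sample))
```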

The above method can be applied to multivariate cases.

Example 1.16 $X_1, \ldots, X_n, Y$ are i.i.d. p-variate continuous random variables. Define
$$H(x \mid \Pi) = (x - \bar{x}^*)'(x - \bar{x}^*),$$

then the region $H(y \mid \Pi) < H(x_i \mid \Pi)$ is given by
$$\left(y - \frac{1}{n+1}(n\bar{x} + y)\right)'\left(y - \frac{1}{n+1}(n\bar{x} + y)\right) < \left(x_i - \frac{1}{n+1}(n\bar{x} + y)\right)'\left(x_i - \frac{1}{n+1}(n\bar{x} + y)\right),$$
$$\left(y - \bar{x} + \frac{1}{n-1}(x_i - \bar{x})\right)'\left(y - \bar{x} + \frac{1}{n-1}(x_i - \bar{x})\right) < \frac{n^2}{(n-1)^2}\,(x_i - \bar{x})'(x_i - \bar{x}).$$

For given x i this defines a p-sphere in terms of y . The prediction region consists of
the set of points in at least j (or j + 1) spheres.
Another definition of $H(x \mid \Pi)$ is
$$H(x \mid \Pi) = (x - \bar{x}^*)'\, S^{*-1}\, (x - \bar{x}^*), \qquad S^* = \frac{1}{n}\sum_{i=1}^{n+1}(x_i - \bar{x}^*)(x_i - \bar{x}^*)',$$
where $x_{n+1} = y$.

Then
$$\bar{x}^* = \frac{n}{n+1}\,\bar{x} + \frac{1}{n+1}\,y,$$
$$S^* = \frac{n-1}{n}\,S + \frac{1}{n+1}\,(y - \bar{x})(y - \bar{x})', \qquad S = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})',$$

and we have
$$S^{*-1} = \frac{n}{n-1}\left[S + \frac{n}{n^2-1}(y - \bar{x})(y - \bar{x})'\right]^{-1} = \frac{1}{\Delta}\left[S^{-1} + \frac{n}{n^2-1}\left\{(y - \bar{x})'S^{-1}(y - \bar{x})\, S^{-1} - S^{-1}(y - \bar{x})(y - \bar{x})'S^{-1}\right\}\right],$$
$$\Delta = \frac{n-1}{n}\left\{1 + \frac{n}{n^2-1}\,(y - \bar{x})'S^{-1}(y - \bar{x})\right\}.$$

Then we have
$$H(y \mid \Pi) = \frac{n^2}{(n+1)^2 \Delta}\,(y - \bar{x})'\, S^{-1}\, (y - \bar{x}),$$
$$\begin{aligned} H(x_i \mid \Pi) = \frac{1}{\Delta}\Big[ &(x_i - \bar{x})' S^{-1}(x_i - \bar{x}) \\ &+ \frac{n}{n^2-1}\left\{(x_i - \bar{x})'S^{-1}(x_i - \bar{x})\,(y - \bar{x})'S^{-1}(y - \bar{x}) - \{(x_i - \bar{x})'S^{-1}(y - \bar{x})\}^2\right\} \\ &- \frac{2}{n+1}\,(x_i - \bar{x})'S^{-1}(y - \bar{x}) + \frac{1}{(n+1)^2}\,(y - \bar{x})'S^{-1}(y - \bar{x}) \Big]. \end{aligned}$$

Consequently $H(y \mid \Pi) < H(x_i \mid \Pi)$ is equivalent to
$$(y - \bar{x})'\, \Gamma\, (y - \bar{x}) + \frac{2}{n+1}\,(x_i - \bar{x})'S^{-1}(y - \bar{x}) - (x_i - \bar{x})'S^{-1}(x_i - \bar{x}) < 0,$$
where
$$\Gamma = \left\{\frac{n-1}{n+1} - \frac{n}{n^2-1}\,(x_i - \bar{x})'S^{-1}(x_i - \bar{x})\right\} S^{-1} + \frac{n}{n^2-1}\, S^{-1}(x_i - \bar{x})(x_i - \bar{x})'S^{-1}.$$

The above inequality leads to
$$\left(y - \bar{x} + \frac{1}{n+1}\,\Gamma^{-1}S^{-1}(x_i - \bar{x})\right)' \Gamma \left(y - \bar{x} + \frac{1}{n+1}\,\Gamma^{-1}S^{-1}(x_i - \bar{x})\right) < (x_i - \bar{x})'\left[S^{-1} + \frac{1}{(n+1)^2}\, S^{-1}\Gamma^{-1}S^{-1}\right](x_i - \bar{x}),$$
which implies that $y$ is in a p-ellipsoid centered at $\bar{x}$. The prediction is given as the collection of points in at least $j$ (and $j + 1$) such ellipsoids.
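A sketch of the first (sphere) construction of Example 1.16: for a candidate $y$ it counts in how many of the $n$ spheres $y$ lies; names and data are illustrative.

```python
# Sketch of the first (sphere) construction of Example 1.16: count in how many
# of the n spheres a candidate y lies; y is predicted if it lies in at least j
# of them.  Names and data are illustrative.
import numpy as np

def sphere_count(y, x_sample):
    x = np.asarray(x_sample, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    xbar = x.mean(axis=0)
    centers = xbar - (x - xbar) / (n - 1)                        # centre of the i-th sphere
    radii2 = (n / (n - 1)) ** 2 * ((x - xbar) ** 2).sum(axis=1)  # squared radii
    dist2 = ((y - centers) ** 2).sum(axis=1)
    return int((dist2 < radii2).sum())

rng = np.random.default_rng(7)
x = rng.normal(size=(20, 2))
print(sphere_count([0.1, -0.2], x), sphere_count([3.0, 3.0], x))
```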

1.6 Dichotomous Prediction

In this section we shall consider the case when it is required to know whether Y will
belong to some set A. This is called a dichotomous prediction and the prediction
space consists of two points, i.e., to predict $Y \in A$ and $Y \notin A$. Let $\phi(X)$ be a prediction function which denotes the probability of predicting that $Y$ will belong to $A$ given $X$. Then $0 \le \phi(X) \le 1$ and the errors of two kinds will be given by
$$\alpha(\theta) = E_\theta[1 - \phi(X) \mid Y \in A]\, P_\theta\{Y \in A\}, \qquad \beta(\theta) = E_\theta[\phi(X) \mid Y \notin A]\, P_\theta\{Y \notin A\}.$$

When X and Y are independent

$$\alpha(\theta) = E_\theta[1 - \phi(X)]\, P_\theta\{Y \in A\}, \qquad \beta(\theta) = E_\theta[\phi(X)]\, P_\theta\{Y \notin A\}.$$

Let us denote P(θ) = Pθ {Y ∈ A}.



We shall seek for a procedure which minimizes supθ β(θ) under the condition
supθ α(θ) ≤ α. When X and Y are independent, this is equivalent to minimizing

$$\sup_{\theta \in \Theta} E_\theta[\phi(X)]\,(1 - P(\theta))$$
under the condition
$$E_\theta[\phi(X)] \ge 1 - \frac{\alpha}{P(\theta)} \quad \forall \theta \in \Theta. \qquad (1.12)$$

Let Φ be a set of functions φ which satisfy the condition (1.12). Then we have to
seek for

$$V = \inf_{\phi \in \Phi}\, \sup_{\theta \in \Theta}\, (1 - P(\theta))\, E_\theta[\phi(X)].$$

Let $\xi(\theta)$ be a probability measure over some $\sigma$-field of subsets of $\Theta$ and let $\Xi$ be the set of all such measures. Then
$$V = \inf_{\phi \in \Phi}\, \sup_{\xi \in \Xi} \int (1 - P(\theta))\, E_\theta[\phi(X)]\, d\xi(\theta). \qquad (1.13)$$

It is to be remarked that the set $\Phi$ is closed and convex, hence usually it will hold that
$$V = \sup_{\xi \in \Xi}\, \inf_{\phi \in \Phi} \int (1 - P(\theta))\, E_\theta[\phi(X)]\, d\xi(\theta). \qquad (1.14)$$

The precise condition under which (1.14) holds true is not necessarily known, but if there is a pair $\xi^*$ and $\phi^*$ such that
$$\int (1 - P(\theta))\, E_\theta[\phi^*(X)]\, d\xi^*(\theta) = \inf_{\phi \in \Phi} \int (1 - P(\theta))\, E_\theta[\phi(X)]\, d\xi^*(\theta) = \sup_{\theta \in \Theta}\, (1 - P(\theta))\, E_\theta[\phi^*(X)], \qquad (1.15)$$

then it is guaranteed by the well-known proposition of game theory that (1.14) is true
and φ∗ is the solution of the problem.
Thus we shall first seek for the right-hand side of (1.14) and then check the
condition (1.15) for the corresponding φ∗ and ξ ∗ . Also for this we need to minimize

$$\int (1 - P(\theta))\, E_\theta[\phi(X)]\, d\xi(\theta)$$

under the condition (1.12) for given ξ.


Summarizing such procedures, we have the following theorem.

Theorem 1.21 Let ξ ∗ and λ∗ be a pair of prior probability measures over Θ and
φ∗ be a prediction function. If

$\phi^*(x) = 0$ for almost all $x$ such that
$$\int (1 - P(\theta))\, f(x, \theta)\, d\xi^*(\theta) > c \int f(x, \theta)\, d\lambda^*(\theta),$$
$\phi^*(x) = 1$ for $x$ such that
$$\int (1 - P(\theta))\, f(x, \theta)\, d\xi^*(\theta) < c \int f(x, \theta)\, d\lambda^*(\theta)$$
for some positive constant $c$, and
$$E_\theta[\phi^*(X)] \ge 1 - \frac{\alpha}{P(\theta)} \quad \forall \theta \in \Theta,$$
$$E_\theta[\phi^*(X)] = 1 - \frac{\alpha}{P(\theta)} \quad \text{for } \theta \in C \subset \Theta,\ \lambda^*(C) = 1,$$

and moreover

$$(1 - P(\theta))\, E_\theta[\phi^*(X)] = \sup_{\theta' \in \Theta}\, (1 - P(\theta'))\, E_{\theta'}[\phi^*(X)] \quad \text{for } \theta \in D \subset \Theta,\ \xi^*(D) = 1.$$

Then φ∗ is the prediction function which minimizes supθ β(θ) under the condition
that supθ α(θ) ≤ α.

The proof of this theorem is straightforward.

Corollary 1.1 If for some θ1 , θ2 and positive constant c, φ∗ satisfies the condition
that

$$\phi^*(x) = \begin{cases} 0 & \text{if } f(x, \theta_1) > c f(x, \theta_2) \\ 1 & \text{if } f(x, \theta_1) < c f(x, \theta_2), \end{cases}$$
and that
$$E_\theta[\phi^*(X)] \ge 1 - \frac{\alpha}{P(\theta)} \quad \forall \theta \in \Theta,$$
$$E_{\theta_2}[\phi^*(X)] = 1 - \frac{\alpha}{P(\theta_2)},$$
$$(1 - P(\theta_1))\, E_{\theta_1}[\phi^*(X)] \ge (1 - P(\theta))\, E_\theta[\phi^*(X)] \quad \forall \theta \in \Theta.$$

Then φ∗ is the optimum prediction function in the sense of Theorem 1.21.

Theorem 1.22 Suppose that θ is real and the density function f (x, θ) has the mono-
tone likelihood ratio with respect to a function t (x). If for some constants c and
0 < γ < 1, φ∗ satisfies
$$\phi^*(x) = \begin{cases} 0 & \text{if } t(x) < c \\ \gamma & \text{if } t(x) = c \\ 1 & \text{if } t(x) > c, \end{cases}$$

and

$$\sup_{\theta \in \Theta} P(\theta)\, E_\theta[1 - \phi^*(X)] = \alpha,$$
and there is a pair $\theta_1 > \theta_2$ such that
$$P(\theta_2)\, E_{\theta_2}[1 - \phi^*(X)] = \alpha, \qquad (1 - P(\theta_1))\, E_{\theta_1}[\phi^*(X)] = \sup_{\theta \in \Theta}\, (1 - P(\theta))\, E_\theta[\phi^*(X)].$$

Then φ∗ is optimum.

Proof If $f(x, \theta)$ has the monotone likelihood ratio, then $f(x, \theta_1) \gtrless c f(x, \theta_2)$ is equivalent to $c \gtrless t(x)$; thus the theorem is proved directly from the previous corollary. $\square$

Example 1.17 Suppose that X 1 , . . . , X n and Y are independent and identically dis-
tributed according to the normal distribution N (θ, 1) and we are to predict whether
Y > 0 or Y ≤ 0. The distribution has the monotone likelihood ratio with respect to
$\bar{X} = \sum_{i=1}^{n} X_i / n$. Thus it is natural that we shall predict $Y > 0$ if and only if $\bar{X} > c$ for some constant $c$, where $c$ should be determined so that
$$\sup_{\theta \in \Theta} P_\theta\{\bar{X} < c\}\, P_\theta\{Y > 0\} = \alpha.$$

Assume that α < 1/4, then it is obvious that c < 0. Let

Pθ { X̄ < c} = q(θ), Pθ {Y > 0} = p(θ),

and

$$p(\theta_1)\, q(\theta_1) = \sup_{\theta \in \Theta} p(\theta)\, q(\theta), \qquad (1 - p(\theta_2))(1 - q(\theta_2)) = \sup_{\theta \in \Theta}\, (1 - p(\theta))(1 - q(\theta)).$$

Then

$$p'(\theta_1)\, q(\theta_1) + p(\theta_1)\, q'(\theta_1) = 0, \qquad p'(\theta_2)\, q(\theta_2) + p(\theta_2)\, q'(\theta_2) - p'(\theta_2) - q'(\theta_2) = 0.$$

Since
$$p(\theta) = \Phi(\theta), \qquad q(\theta) = \Phi(\sqrt{n}\,(c - \theta)), \qquad \Phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz,$$
$$p'(\theta) = \Phi'(\theta) = \frac{1}{\sqrt{2\pi}}\, e^{-\theta^2/2},$$
$$q'(\theta) = -\sqrt{n}\,\Phi'(\sqrt{n}\,(c - \theta)) = -\frac{\sqrt{n}}{\sqrt{2\pi}}\, e^{-n(c-\theta)^2/2},$$

we have c < θ1 < 0 and if n ≥ 2, p(θ1 ) < q(θ2 ) then

$$p'(\theta_1) + q'(\theta_1) < 0,$$
$$p'(\theta_1)\, q(\theta_1) + p(\theta_1)\, q'(\theta_1) - p'(\theta_1) - q'(\theta_1) > 0,$$

hence θ1 < θ2 . Thus the condition of the theorem is ascertained. For n = 1,


$\theta_1 = \theta_2 = c/2$ and the method of the above theorem is not directly applicable. But in this case, $p(\theta_1) = q(\theta_2) = \sqrt{\alpha}$ and $(1 - p(\theta_2))(1 - q(\theta_2)) = (1 - \sqrt{\alpha})^2$. Also for any procedure such that $E_\theta[\phi(\bar{X})]\, P_\theta\{Y > 0\} \le \alpha$ for $\theta = \theta_1 = \theta_2 = c/2$ we have
$$(1 - E_\theta[\phi(X)])(1 - P_\theta\{Y > 0\}) \ge (1 - \sqrt{\alpha})^2,$$
hence the optimality of the above procedure is established. Similarly for $\alpha \ge 1/4$, it is ascertained that the procedure is optimum. The value of $c$ is difficult to calculate numerically but when $n$ is large $\theta_1 \approx c$ and $q(\theta_1) \approx 1/2$, thus $p(\theta_1) \approx p(c) \approx 2\alpha$, from which $c$ is approximately obtained.
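A numerical sketch of the determination of $c$ in Example 1.17, solving $\sup_\theta \Phi(\theta)\,\Phi(\sqrt{n}(c-\theta)) = \alpha$; the optimizer choices and the search bounds are illustrative.

```python
# Numerical sketch of the determination of c in Example 1.17:
# choose c so that sup_theta Phi(theta) * Phi(sqrt(n)(c - theta)) = alpha.
import numpy as np
from scipy import stats, optimize

def sup_alpha(c, n):
    """sup over theta of P_theta{Xbar < c} * P_theta{Y > 0}."""
    def neg(theta):
        return -stats.norm.cdf(np.sqrt(n) * (c - theta)) * stats.norm.cdf(theta)
    res = optimize.minimize_scalar(neg, bounds=(c - 10.0, 10.0), method="bounded")
    return -res.fun

def find_c(n, alpha=0.05):
    return optimize.brentq(lambda c: sup_alpha(c, n) - alpha, -10.0, 0.0)

n, alpha = 20, 0.05
c = find_c(n, alpha)
print(c, stats.norm.ppf(2 * alpha))   # compare with the large-n approximation c ~ Phi^{-1}(2 alpha)
```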

When X and Y are not independent let

g(X, θ) = Pθ {Y ∈ A|X },

then it is required to minimize

$$\sup_{\theta \in \Theta} E_\theta[\phi(X)(1 - g(X, \theta))]$$
under the condition that
$$E_\theta[\phi(X)\, g(X, \theta)] \ge P(\theta) - \alpha \quad \forall \theta \in \Theta.$$
Consequently the optimum $\phi^*$ will have the form
$$\phi^*(x) = \begin{cases} 1 & \text{if } \int (1 - g(x, \theta))\, f(x, \theta)\, d\xi(\theta) > c \int g(x, \theta)\, f(x, \theta)\, d\lambda(\theta) \\[4pt] 0 & \text{if } \int (1 - g(x, \theta))\, f(x, \theta)\, d\xi(\theta) < c \int g(x, \theta)\, f(x, \theta)\, d\lambda(\theta) \end{cases}$$

for some measures ξ, λ and a positive constant c.


Other types of optimality criterion for prediction function may be taken into
consideration. For example, we may minimize either supθ max{α(θ), β(θ)} or

supθ {w1 α(θ) + w2 β(θ)}. For the first of these, let φ∗α be the prediction function
which minimizes supθ β(θ) under the condition that supθ α(θ) ≤ α for given α and
for this φ∗α let supθ β(θ) = β(α). It is obvious that β(α) is a monotone decreasing
function of α and let α∗ be the value such that β(α∗ ) = α∗ . Then φ∗α which corre-
sponds to this α∗ is the solution which minimizes supθ max{α(θ), β(θ)} and for φ∗α ,
supθ max{α(θ), β(θ)} = α∗ .
For the second of the above criteria we have

$$w_1\alpha(\theta) + w_2\beta(\theta) = E_\theta[w_1(1 - \phi(X))\, g(X, \theta) + w_2\phi(X)(1 - g(X, \theta))] = w_1 P(\theta) + E_\theta[(w_2 - (w_1 + w_2)\, g(X, \theta))\, \phi(X)].$$

Also, if for some prior measure $\xi$ over $\Theta$, $\phi^*$ is the solution which minimizes
$$\int [w_1\alpha(\theta) + w_2\beta(\theta)]\, d\xi(\theta),$$
and for this $\phi^*$
$$\int [w_1\alpha(\theta) + w_2\beta(\theta)]\, d\xi(\theta) = w^*,$$
then if
$$w_1\alpha(\theta) + w_2\beta(\theta) \le w^* \quad \forall \theta \in \Theta,$$

$\phi^*$ minimizes $\sup_\theta\{w_1\alpha(\theta) + w_2\beta(\theta)\}$. Also such $\phi^*$ is obtained by
$$\phi^*(x) = \begin{cases} 1 & \text{if } \int g(x, \theta)\, f(x, \theta)\, d\xi(\theta) > \dfrac{w_2}{w_1 + w_2} \int f(x, \theta)\, d\xi(\theta) \\[4pt] 0 & \text{if } \int g(x, \theta)\, f(x, \theta)\, d\xi(\theta) < \dfrac{w_2}{w_1 + w_2} \int f(x, \theta)\, d\xi(\theta). \end{cases}$$

1.7 Multiple Prediction

Suppose that the space $\mathcal{Y}$ is partitioned into $k$ ($k \ge 3$) subsets $\mathcal{Y} = \bigcup_{i=1}^{k} A_i$ and we are to predict to which $A_i$ the value of $Y$ belongs. Two types of prediction procedures will be taken into consideration. First, we are to decide, on the basis of observation of $X$, which one of the $A_i$ the value of $Y$ will belong to. Or secondly, we are to choose a collection of $A_i$'s such that we can say with some confidence that $Y$ will belong to one of the selected subsets.
More precisely, considering randomization, we have to decide $k$ prediction functions $\phi_i(X)$, $0 \le \phi_i(X) \le 1$, $i = 1, \ldots, k$, each of which denotes the probability of $A_i$ being chosen given $X$, with either $\sum_{i=1}^{k} \phi_i(X) = 1$ or