Mathematical Statistics
A.A. Borovkov
Institute of Mathematics, Novosibirsk, Russia
Translated from the Russian by A. Moullagaliev

"The book is a useful reference tool for researchers and graduates ... The overall approach is intuitively inspiring and insightful." Dr Stephen Lee, Department of Statistics, University of Hong Kong

This wide-ranging, extensive overview of modern mathematical statistics reflects the current state of the field whilst being succinct and easy to grasp. The mathematical presentation is coherent and rigorous throughout. The author presents classical results and methods which form the basis of modern statistics, and examines the foundations of estimation theory, hypothesis testing theory, and statistical game theory. He goes on to consider statistical problems for two or more samples, and those in which observations are taken from different distributions. Methods of finding optimal and asymptotically optimal statistical procedures are given, along with treatments of homogeneity testing, regression, variance analysis, and pattern recognition. The author also posits a number of methodological improvements which simplify proofs, and brings together a number of new results which have never before been published in a single monograph.

This monograph, by an acknowledged world leader in the field, combines maximum clarity with mathematical rigour. Its breadth of focus will render it invaluable to postgraduate and research students of statistics and working statisticians, whilst the earlier sections will be of use to graduate students wishing to extend their knowledge of basic probability theory.

Related Titles of Interest
Probability Theory, A.A. Borovkov
Probability Theory and Mathematical Statistics, I.A. Ibragimov and A.Yu. Zaitsev

ISBN 90-5699-018-7
Gordon and Breach Science Publishers
Australia, Canada, China, France, Germany, India, Japan, Luxembourg, Malaysia, The Netherlands, Russia, Singapore, Switzerland

Copyright © 1998 OPA (Overseas Publishers Association) N.V. Published by license under the Gordon and Breach Science Publishers imprint.

All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and recording, or by any information storage or retrieval system, without permission in writing from the publisher.

Printed in Singapore.
Amsteldijk 166, 1st Floor, 1079 LH Amsterdam, The Netherlands

British Library Cataloguing in Publication Data
Borovkov, A.A. Mathematical statistics. 1. Mathematical statistics. I. Title. 519.5
ISBN 90-5699-018-7

CONTENTS

Preface
Introduction

Chapter I. A sample. Empirical distribution. Asymptotic properties of statistics
1. The notion of a sample
2. Empirical distribution: The one-dimensional case
3. Sample characteristics. Main types of statistics: 1. Examples of sample characteristics; 2. Two types of statistics; 3. L-statistics; 4. M-statistics; 5. On other statistics
4. Multidimensional samples: 1. Empirical distributions; 2*. More general versions of the Glivenko-Cantelli theorem. The law of the iterated logarithm; 3. Sample characteristics
5. Continuity theorems
6*. Empirical distribution function as a random process. Convergence to the Brownian bridge: 1. Distribution of the process $nF_n^*(t)$;
2. Limit behavior of the process $w_n(t)$
7. Limit distribution for statistics of type I
8. Limit distribution for statistics of type II
9*. Notes on nonparametric statistics
10*. Smoothed empirical distributions. Empirical densities

Chapter II. Estimation of unknown parameters
11. Preliminaries
12. Some parametric families of distributions: 1. Normal distribution on the real line; 2. Multidimensional normal distribution; 3. Gamma distribution; 4. Chi-square distribution $H_k$ with $k$ degrees of freedom; 5. Exponential distribution; 6. Fisher distribution $F_{k_1,k_2}$ with numbers of degrees of freedom equal to $k_1$, $k_2$; 7. Student distribution $T_k$ with $k$ degrees of freedom; 8. Beta distribution; 9. Uniform distribution; 10. Cauchy distribution $K_{a,\sigma}$ with parameters $(a,\sigma)$; 11. Lognormal distribution; 12. Degenerate distribution; 13. Bernoulli distribution $B_p$; 14. Poisson distribution $\Pi_\lambda$; 15. Polynomial distribution
13. Point estimation. The main method of obtaining estimators. Consistency and asymptotic normality: 1. Substitution method. Consistency; 2. Asymptotic normality: The one-dimensional case; 3. Asymptotic normality: The case of a multidimensional parameter
14. Realization of the substitution method in the parametric case. The method of moments. M-estimators: 1. The method of moments: One-dimensional case; 2. The method of moments: Multi-dimensional case; 3. M-estimation as a generalized method of moments; 4*. Consistency of M-estimators; 5. Consistency of M̃-estimators; 6. Asymptotic normality of M-estimators; 7. Some notes on the multidimensional case
15. The minimum-distance method
16. The maximum-likelihood method. Optimality of maximum-likelihood estimators in the class of M-estimators: 1. Definitions, general properties; 2. Asymptotic properties of maximum-likelihood estimators. Consistency; 3. Asymptotic normality of maximum-likelihood estimators. Optimality in the class of M-estimators
17. On comparing estimators: 1. Mean square approach. One-dimensional case; 2. Asymptotic approach. One-dimensional case; 3. Lower bound of dispersion for L-estimators; 4. Mean square and asymptotic approaches in the multidimensional case; 5. Some heuristic approaches to the determination of the variance of estimators. The jackknife and bootstrap approaches
18. Comparing estimators in the parametric case. Efficient estimators: 1. One-dimensional case. The mean-square approach; 2. Asymptotic approach. Asymptotic efficiency in the classes of M-estimators and L-estimators; 3. Multidimensional case
19. Conditional expectations: 1. Definition of a conditional expectation; 2. Properties of conditional expectations
20. Conditional distributions
21. Bayesian and minimax approaches to parameter estimation
22. Sufficient statistics
23*. Minimal sufficient statistics
24. Constructing efficient estimators via sufficient statistics. Complete statistics: 1. One-dimensional case; 2. Multidimensional case; 3. Complete statistics and efficient estimators
25. Exponential family
26. The Rao-Cramer inequality and R-efficient estimators: 1. The Rao-Cramer inequality and its corollaries; 2. R-efficient and asymptotically R-efficient estimators; 3. The Rao-Cramer inequality in the multidimensional case; 4. Some concluding remarks
27*. Properties of the Fisher information: 1. One-dimensional case; 2. Multidimensional case; 3. Fisher matrix and parameter change
28*. Estimators of the shift and scale parameters. Efficient equivariant estimators:
1. Estimators for the shift and scale parameters; 2. Efficient estimator for the shift parameter in the class of equivariant estimators; 3. Pitman estimators are minimax; 4. On optimal estimators for the scale parameter
29*. General problem of equivariant estimation
30. Integral Rao-Cramer type inequality. Criteria for estimators to be asymptotically Bayesian and minimax: 1. Efficient and superefficient estimators; 2. Main inequalities; 3. Inequalities for the case when the function $q(\theta)/I(\theta)$ is not differentiable; 4. Some corollaries. Criteria for estimators to be asymptotically Bayesian or minimax; 5. Multidimensional case
31. Kullback-Leibler, Hellinger, and $\chi^2$ distances and their properties: 1. Definitions and main properties of the distances; 2. Connection between the Hellinger and other distances and the Fisher information; 3. Existence of uniform bounds for $r(\Delta)/\Delta^2$; 4. Multidimensional case; 5*. Connection between the distances in question and estimators
32*. Difference inequality of Rao-Cramer type
33. Auxiliary inequalities for the likelihood ratio. Asymptotic properties of maximum-likelihood estimators: 1. Main inequalities; 2. Estimates for the distribution and for the moments of a maximum-likelihood estimator. Consistency of a maximum-likelihood estimator; 3. Asymptotic normality; 4. Asymptotic efficiency; 5. Maximum-likelihood estimators are asymptotically Bayesian
34. Asymptotic properties of the likelihood ratio. Further optimality properties of maximum-likelihood estimators
35*. Approximate computation of maximum-likelihood estimators
36. The results of Sections 33-35 for the multidimensional case: 1. Inequalities for the likelihood ratio (the results of Section 33); 2. Asymptotic properties of the likelihood ratio (results of Section 34); 3. Properties of maximum-likelihood estimators (the results of Sections 33 and 34); 4. Approximate computation of maximum-likelihood estimators; 5. Properties of maximum-likelihood estimators without regularity conditions (the results of Subsections 14.4 and 16.2)
37. Uniformity in $\theta$ of the asymptotic properties of the likelihood ratio and maximum-likelihood estimators: 1. Uniform law of large numbers and uniform central limit theorem; 2. Uniform versions of the theorems on asymptotic properties of the likelihood ratio and maximum-likelihood estimators; 3. Some corollaries
38*. On statistical problems related to samples of random size. Sequential estimation
39. Interval estimation: 1. Definitions; 2. Construction of confidence intervals in the Bayesian case; 3. Construction of confidence intervals in the general case. Asymptotic confidence intervals; 4. Construction of precise confidence intervals via a given statistic; 5. Other methods for the construction of confidence intervals; 6. Multidimensional case
40. Precise sample distributions and confidence intervals for normal populations: 1. Precise distributions of the statistics $\bar{x}$ and $S_0^2$; 2. Constructing precise confidence intervals for the parameters of the normal distribution

Chapter III. Testing hypotheses
41. Testing finitely many simple hypotheses: 1. Statement of the problem. The notion of a statistical test. Most powerful tests; 2. Bayesian approach; 3. The minimax approach; 4. Most powerful tests
42. Testing two simple hypotheses
43*. Two asymptotic approaches to calculation of tests. Numerical comparison: 1. Preliminary remarks; 2. Fixed hypotheses; 3. Close hypotheses; 4. Comparison of asymptotic approaches. A numerical example; 5. The connection between most powerful tests and asymptotic efficiency of maximum-likelihood estimators
44. Testing composite hypotheses. Classes of optimal tests: 1. Statement of the problem and main notions; 2. Uniformly most powerful tests; 3. Bayes tests; 4. Minimax tests
45. Uniformly most powerful tests: 1. One-sided alternatives. Monotone likelihood ratio; 2. Two-sided null hypothesis. Exponential family; 3. Another approach to the problems under study; 4. The Bayesian approach and the least favorable a priori distributions in the construction of most powerful tests and uniformly most powerful tests
46*. Unbiased tests: 1. Definitions. Unbiased uniformly most powerful tests; 2. Two-sided alternatives. The exponential family
47. Invariant tests
48*. Connection with confidence sets: 1. Connection between statistical tests and confidence sets. Connection between optimality properties; 2. Most precise confidence intervals; 3. Unbiased confidence sets; 4. Invariant confidence sets
49. The Bayesian and minimax approaches to testing composite hypotheses: 1. Bayes and minimax tests; 2. Minimax tests for the parameter $a$ of normal distributions; 3. Degenerate least favorable distributions for one-sided hypotheses
50. Likelihood ratio test
51*. Sequential analysis: 1. Preliminaries; 2. Sequential Bayes test; 3. Sequential test minimizing the average number of experiments; 4. Computing the parameters of the best sequential test
52. Testing composite hypotheses in the general case
53. Asymptotically optimal tests. Likelihood ratio test as an asymptotically Bayes test for testing a simple hypothesis against a composite alternative: 1. Asymptotic properties of likelihood ratio tests and Bayes tests; 2. Conditions for likelihood ratio tests to be asymptotically Bayesian; 3. Asymptotic unbiasedness of likelihood ratio tests
54. Asymptotically optimal tests for testing close composite hypotheses: 1. Statement of the problem and definitions; 2. Main assertions
55. Asymptotic optimality properties of the likelihood ratio test which follow from the limit optimality criterion: 1. Asymptotically uniformly most powerful tests for close hypotheses with one-sided alternatives; 2. Asymptotically uniformly most powerful tests for two-sided alternatives; 3. Asymptotically minimax test for close hypotheses concerning a multidimensional parameter; 4. Asymptotically minimax test for the hypothesis that a sample belongs to a parametric subfamily
56. The $\chi^2$ test. Testing hypotheses on grouped data: 1. The $\chi^2$ test. Properties of asymptotic optimality; 2. Applications of the $\chi^2$ test. Testing hypotheses based on grouped data
57. Testing the hypothesis that a sample belongs to a parametric family: 1. Testing the hypothesis $X \Subset \{F_\theta\}$. Data grouping; 2. General case
58. Robustness of statistical decisions: 1. Problem statement. Qualitative and quantitative characterization of robustness; 2. Estimating the shift parameter; 3. The Student statistic and $S_0^2$; 4. Likelihood ratio test

Chapter IV. Statistical problems for two or more samples
59. Testing complete or partial homogeneity hypotheses in the parametric case: 1. Definition of the class of problems under study; 2. Asymptotically minimax test for testing close hypotheses about ordinary homogeneity; 3. Asymptotically minimax tests for the homogeneity problem with a nuisance parameter; 4. Asymptotically minimax test for the partial homogeneity problem; 5. Some other problems
60. Homogeneity problems in the general case: 1. Statement of the problem; 2. Kolmogorov-Smirnov test; 3. The sign test; 4. The Wilcoxon test; 5. The $\chi^2$ test as an asymptotically optimal test for testing homogeneity based on grouped data
61. Regression problems: 1. Statement of the problem; 2. Estimating the parameters; 3. Testing linear regression hypotheses; 4. Estimation and hypothesis testing in the presence of linear relationships
62. Analysis of variance: 1. Problems of analysis of variance as regression problems. The case of one factor; 2. Effect of two factors. Elementary approach
63. Pattern recognition: 1. Parametric case; 2. General case

Chapter V. Nonidentically distributed observations
64. Preliminary remarks. Examples
65. Basic methods of estimator construction. M-estimators. Consistency and asymptotic normality: 1. Preliminary remarks and definitions; 2. M-estimators; 3*. Consistency of M-estimators; 4. Consistency of M̃-estimators; 5. Asymptotic normality of M-estimators
66. Maximum-likelihood estimators. The main principles of estimator comparison. Optimality of maximum-likelihood estimators in the class of M-estimators: 1. Maximum-likelihood estimators; 2. Asymptotic properties of maximum-likelihood estimators; 3. Main principles of estimator comparison. Asymptotic efficiency of maximum-likelihood estimators in the class of M-estimators
67. Sufficient statistics. Efficient estimators. Exponential families
68. Efficient estimators in the problem of estimating 'tails' of distributions (Example 65.6). Asymptotic properties of estimators: 1. Maximum-likelihood estimators; 2. Asymptotic normality of $\theta^*$ in Problem B; 3*. Asymptotic normality and optimality in Problem A
69. Rao-Cramer inequality
70. Inequalities for the likelihood ratio and asymptotic properties of maximum-likelihood estimators: 1. Inequalities for the likelihood ratio and consistency of maximum-likelihood estimators; 2. Asymptotic normality of maximum-likelihood estimators; 3. Asymptotic efficiency; 4. Maximum-likelihood estimators for a multidimensional parameter
71. Remarks on testing hypotheses based on nonhomogeneous observations

Chapter VI. Game-theoretic approach to problems of mathematical statistics
72. Preliminary remarks
73. Two-person games: Definitions and results: 1. Two-person game; 2. Uniformly optimal strategies in subclasses; 3. Bayesian strategies; 4. Minimax strategies; 5. Complete class of strategies
74. Statistical games: 1. Description of statistical games; 2. Classification of statistical games; 3. Two fundamental theorems of the theory of statistical games
75. Bayes principle. Complete class of decision functions
76. Sufficiency, unbiasedness, invariance: 1. Sufficiency; 2. Unbiasedness; 3. Invariance
77. Asymptotically optimal estimators for an arbitrary loss function
78. Optimal statistical tests for an arbitrary loss function. The likelihood ratio test as an asymptotically Bayesian decision: 1. Optimality properties of statistical tests for an arbitrary loss function;
2. The likelihood ratio test as an asymptotically Bayesian test
79. Asymptotically optimal decisions for an arbitrary loss function in the case of close composite hypotheses

Appendix I. Theorems of Glivenko-Cantelli type
Appendix II. Functional limit theorem for empirical processes
Appendix III. Properties of conditional expectations
Appendix IV. The law of large numbers and the central limit theorem. Uniform versions
Appendix V. Some assertions concerning integrals depending on parameters
Appendix VI. Inequalities for the distribution of the likelihood ratio in the multidimensional case
Appendix VII. Proofs of two fundamental theorems of the theory of statistical games

Tables
Bibliographic comments
References
Notation
Index

PREFACE

The present book is a substantially revised and expanded version of the book Mathematical Statistics, which was originally published in Russian in 1984 and consists of two parts: 'Parameter Estimation and Hypothesis Testing' and 'Complementary Chapters'. The two parts were later translated and published as a single monograph in French (1987) and Spanish (1988). One of the main changes in the present book compared to the original is the addition of a new chapter on the statistics of nonidentically distributed observations.

The book is based on the mathematical statistics course I taught for many years at Novosibirsk University. I modified the material many times in the quest for a version that would reflect the current state of the art in the area whilst being logical and easy to understand. Numerous versions have been tried, from a 'collection of recipes' for the basic types of problems (constructing estimators and tests, and studying their properties), to a course of a general game-theoretic nature, in which the theories of estimation and hypothesis testing were presented as particular cases of a common general approach. The time limitation (it was a one-semester course) did not allow us to unify the two complementary variants, each having distinctive disadvantages when taken separately. In the first case, the collection of specific facts precluded forming a general view of the matter. The second variant lacked simple, concrete results and was overloaded with numerous novel, sophisticated notions that were difficult to understand. In this book, we combined these two approaches, presenting the theories of estimation and hypothesis testing with a consistent emphasis on finding optimal procedures.

The book is based on the accumulated material that was used in different variants of my lecture courses taught in different years. The material was expanded by adding sections whose presence is required by the very logic of the exposition. The main goal was to present the current state of the art in the subject with the maximal possible clarity, mathematical rigour, and integrity.

The book consists of six chapters and seven appendices. Chapter I studies mostly asymptotic properties of empirical distributions, which form the foundation of mathematical statistics. Chapters II and III present the theory of estimation and the theory of statistical hypothesis testing, respectively. The first part of each of these two chapters describes possible approaches to solving problems and finding optimal procedures.
The rest deals with constructing asymptotically optimal procedures. Chapter IV deals with problems for two or more samples. Statistical inference for nonidentically distributed observations, in a more general setup than that of Chapter IV, is discussed in the new Chapter V, which has already been mentioned above. Chapter VI presents a general game-theoretic approach to problems of mathematical statistics and has a similar structure to that of Chapters II and III.

The book also contains seven appendices. They are related to various assertions in the text whose proofs are beyond the scope of the main presentation either because of their nature, or because of their difficulty. We also give bibliographical comments at the end of the book which do not claim to be complete but enable one to trace the origins and development of the main directions of mathematical statistics. Wherever possible, we prefer, for the sake of easier access, to refer to monographs rather than to original papers.

Nowadays there are quite a lot of books on mathematical statistics. We would distinguish the following books which contain a large amount of material reflecting the current state of the subject: H. Cramer [32], E. Lehmann [77, 78], S. Zacks [127], I.A. Ibragimov and R.Z. Khas'minskii [66], L. Le Cam [76], and G.R. Shorack and J.A. Wellner [109]. Of these, the present book has been most influenced by [66] and [78]. Namely, Sections 33, 34, and 36 use some ideas of [66], while Sections 45-48 of Chapter III are close to the corresponding parts of [78]. The rest of our exposition bears little resemblance in its structure to the existing textbooks. There are many other books which also occupy an important place in the literature on statistics (such as the books by Blackwell and Girshick [9], Kendall and Stuart [67], Cox and Hinkley [30], Ferguson [44], Rao [100], and some others; it is impossible to list all of them), but they differ essentially from the present monograph in both their spirit and selection of material.

Along with well-known results and approaches, the book contains several new sections which simplify the exposition, a number of methodological improvements, certain new results, and also results which have not yet been published in monographs.

Below we give a brief description of the methodological structure of the present book (see also the table of contents and the brief prefaces to each of the chapters). Chapter I presents in Sections 1 and 2 the notions of a sample and an empirical distribution, and proves the Glivenko-Cantelli theorem, which is a fundamental fact at the basis of statistical inference.

In Section 3, we define two types of statistics (to be called statistics of type I and II), which cover a vast majority of statistics of practical interest. Statistics of these types are defined to be the values $G(P_n^*)$ of some functionals $G$ (satisfying certain conditions) of the empirical distribution $P_n^*$. The class of statistics under consideration is then extended by including L-, M-, and U-statistics. Later, in Sections 7 and 8, we establish limit theorems for the distributions of such statistics. This makes the subsequent presentation simpler and allows one to avoid the necessity of repeating basically the same argument for each particular statistic, such a repetition being mostly irrelevant to the subject of statistics.

Section 5 gathers auxiliary theorems, called 'continuity theorems' in this book, on the convergence of distributions and their moments.
The purpose is again to make the subsequent exposition lighter.

In Section 6 (optional at the first reading), we prove that the empirical distribution function $F_n^*(t)$ is a conditional Poisson process and state the theorem on the convergence of the process $\sqrt{n}(F_n^*(t) - F(t))$, where $F(t)$ is the corresponding 'theoretical' distribution function, to the Brownian bridge. The proof of the theorem is given in Appendix II.

Section 10 introduces smoothed empirical distributions, which allow one to approximate not only the distribution itself, but also its density.

Chapter II treats estimation of unknown parameters. Section 13 introduces a 'substitution method' as a common general approach to constructing estimators. The idea is that if a parameter $\theta$ can be represented as a functional $\theta = G(P)$ of the distribution $P$ of the sample, one can take the estimator $\theta^*$ for $\theta$ of the form $\theta^* = G(P_n^*)$, where $P_n^*$ is the empirical distribution. Almost all 'reasonable' estimators used in practice are substitution ones. An optimal estimator can then be found by selecting a suitable functional $G$. If the estimator $\theta^* = G(P_n^*)$ is a statistic of type I or II, the theorems of Chapter I establish immediately that the estimator is consistent and asymptotically normal. In Sections 14 and 15, this approach is illustrated by the examples of estimators obtained using the moment method and the minimum distance method. In the same sections, consistency and asymptotic normality of M-estimators are proved. Maximum likelihood estimators are studied in Section 16 from a similar point of view. Moreover, we establish a lower bound for the variance of M-estimators and prove the asymptotic optimality of the maximum likelihood estimators in the class of all M-estimators. A more detailed study of asymptotic properties of the maximum likelihood estimators follows later, in Sections 33 and 34. In Section 17, we establish a lower bound for the variance of L-estimators. This enables us to construct, in explicit form, asymptotically optimal L-estimators for the shift parameter, which are asymptotically equivalent to maximum likelihood estimators.

In Chapter II, two approaches are used to compare estimators. According to the mean square approach, we compare the values of $\mathsf{E}(\theta^* - \theta)^2$. When using the asymptotic approach, we compare, in the class of asymptotically normal estimators, the variances of the limit distributions of $\sqrt{n}(\theta^* - \theta)$. As an illustration, these approaches are used to construct asymptotically optimal estimators in the class of L-estimators. In the parametric case, the mean square approach allows us to single out three types of optimal estimators, namely, efficient estimators in the class $K_b$ of estimators with a fixed bias $b$, Bayes estimators, and minimax estimators. Based on the same principles, we can define the classes of asymptotically optimal estimators in the asymptotic approach. To construct efficient estimators, the following traditional methods are used. The first method is qualitative; it is connected with the sufficiency principle (Sections 22-24). The second one is based on the quantitative relations which follow from the Rao-Cramer inequality (Section 26). The third method uses the invariance argument (Sections 28 and 29), thereby enabling one to reduce the class of estimators under consideration.

In Sections 30-38, we find asymptotically optimal estimators and study asymptotic properties of the likelihood function. Section 30 contains an integral inequality of the Rao-Cramer type.
In particular, this inequality allows one to obtain simple criteria for an estimator to be asymptotically Bayesian or minimax, as well as to justify selecting a certain subclass of estimators $K_0$ to which one could restrict one's attention when looking for asymptotically efficient estimators. This enables us to establish, immediately after studying asymptotic properties of the maximum likelihood estimators in Section 34, that the latter are asymptotically Bayesian and minimax, as well as asymptotically efficient in $K_0$. Sections 31-33 contain auxiliary material. Interval estimation of parameters is covered in Sections 39, 40, and 48.

Chapter III is devoted to hypothesis testing. In Sections 41 and 42, we consider the case of finitely many simple hypotheses. First, as in the theory of estimation, we distinguish three types of optimal tests: the most powerful tests in subclasses, Bayes tests, and minimax tests. Relationships between these tests are established and their explicit form is found. As the basic principle for these studies we take the Bayes principle rather than the Neyman-Pearson lemma; in our opinion, this simplifies the exposition and makes it more accessible. In Section 43, we present and compare two asymptotic approaches to computing the parameters of tests for two simple hypotheses. Section 44 considers the general setting of the problem of testing two composite hypotheses and defines the classes of optimal tests (uniformly most powerful, Bayes, and minimax tests). Section 45 deals with finding uniformly most powerful tests in the cases when it is possible. In Sections 46 and 47, we solve the same problem for the classes of tests which are restricted using the criteria of unbiasedness and invariance. As in Sections 41 and 42, the exposition is again based on the Bayesian approach. In Section 48, we use the results obtained previously to construct the most precise confidence sets. Section 49 deals with Bayes and minimax tests. Sections 50 and 53 are devoted to the likelihood ratio test. It turns out to be uniformly most powerful in many special cases and is asymptotically Bayesian under rather wide assumptions. The investigation of asymptotic optimality properties of the likelihood ratio test is continued in Sections 55-57. In Section 51, we show that this test is optimal in the problems of sequential analysis. Sections 54 and 55 deal with finding asymptotically optimal tests for close hypotheses and present a simple explicit form of these tests for basic statistical problems. Section 58 is devoted to robustness of statistical procedures.

A distinctive feature of the first three chapters is that they deal only with statistical problems for one sample. As we have already mentioned, Chapter IV is devoted to problems for two or more samples. First of all, these are problems related to complete or partial homogeneity (Sections 59 and 60), regression problems (Section 61), and analysis of variance (Section 62). Based on the results of Chapter III, we construct asymptotically optimal tests for homogeneity problems in the parametric case under the assumption that alternative hypotheses are close to the null homogeneity hypothesis. For regression problems (both for linear and arbitrary functional regression), we use the results of Chapters II and III to find efficient estimators of unknown parameters and construct tests for null hypotheses. We also consider the so-called pattern recognition problems (Section 63).
The new Chapter V, 'Statistics of nonidentically distributed observations', resembles Chapter II in structure. The very appearance of the chapter in the present edition is due to the fact that in applications one encounters more and more problems related to nonhomogeneous observations. A typical, though by no means unique, example is nonlinear regression. At the same time, the general methods for solving such problems have been developed only partially, and there is no systematic exposition. While not claiming to fill this gap completely, we aim to transfer the basic results and approaches that were presented in Chapters II and III of the book for statistics of homogeneous observations to the nonhomogeneous case. Section 64 presents several typical problems (including some of independent interest) related to statistics of nonidentically distributed observations. These examples are then used for illustration and are investigated in more detail. In Section 65, we present the basic methods of constructing estimators. These are primarily the methods of M- and M̃-estimation. The main results of Chapter II on consistency and asymptotic normality are extended to these estimators. In Section 66, we use the results of Section 65 to establish asymptotic properties of maximum likelihood estimators (consistency, asymptotic normality, and asymptotic optimality in the class of M-estimators). Section 67 contains some comments on the use of the results related to sufficient statistics and exponential families in the case of nonidentically distributed observations. As an illustration, in Section 68 we study in detail the asymptotic properties of the estimators for the parameters of the distribution 'tails'. Section 69 presents a generalization of the Rao-Cramer inequality and problems related to the case of nonidentically distributed observations. Section 70 is devoted to extending (to the same case) the results of Chapter II on asymptotic properties of maximum likelihood estimators (cf. Sections 33-34).

Extending the majority of the results from Chapter III to the case of nonidentically distributed observations either does not require detailed investigation or is based on the above-mentioned generalizations of the results of Chapter II. For some remarks on this extension, see Section 71.

Chapter VI is devoted to the general game-theoretic approach to statistical problems. It enables one to work out a general view on the subject of mathematical statistics and to generalize many results of Chapters II and III. Section 73 presents the basic notions and results of 'ordinary' game theory (only two-person games are considered). In particular, we establish the relations between the basic types of optimal strategies: Bayesian, minimax, and uniformly best in subclasses. In Section 74, statistical games are studied. In Section 75, we state and prove the so-called Bayes principle, enabling one to reduce the problem of finding a Bayes statistical decision to a much simpler one of constructing a Bayes strategy for an ordinary two-person game. In Section 76, we discuss the principles of sufficiency, unbiasedness, and invariance used to construct decisions which are uniformly optimal in the corresponding subclasses. Sections 77-79 deal with finding asymptotically optimal decisions. In Section 77, we study asymptotically optimal estimators when the loss function is arbitrary (not necessarily quadratic).
In this case, it turns out to be possible to prove some assertions which are close to the results from Chapter II on asymptotic optimality of maximum likelihood estimators. In Sections 78 and 79, we study asymptotically optimal tests in the case of an arbitrary loss function. In Section 78, it is shown that the likelihood ratio test is asymptotically Bayesian. In Section 79, we establish a limit criterion for optimality of the tests for close hypotheses (which extends the results of Sections 54 and 55 of Chapter III to the case of an arbitrary loss function).

Of all the appendices, we mention here Appendix VII, in which two fundamental theorems of statistical game theory are proved. To read it, one requires a more advanced mathematical background.

The present book is a multipurpose one. The whole monograph is certainly closer to a program for postgraduate students of mathematical statistics than to a textbook for undergraduates. However, the exposition is devised so as to make the book readable for 'mathematically minded' undergraduate students as well. More complicated or 'more advanced' sections are marked by an asterisk and may be skipped at first reading, as may the text set in the smaller font. Moreover, the discussion of more technically complicated cases involving multidimensional parameters is almost always given in separate sections and subsections which can also be skipped. Graduate students and instructors who are already familiar with the subject to some extent can select a subset of sections (there are many possible choices) which would constitute a sound one-semester course of mathematical statistics. Here is one possible variant: Sections 1, 3, 5, 12-14, 16-22, 24, 26 (31, 33-34), 39, 40-42, 44, 45, 52 (53, 56). The sections in parentheses deal with asymptotically optimal procedures. Depending on the level of the class, they may be either maximally simplified or even left out.

The reader is assumed to be familiar with probability theory; the author's textbook on the subject [19] is a good basis for the present course (the use of other probability textbooks is of course also possible). Unlike other references, references to [19] appear in the places where the material is assumed to be known to the reader and serve basically as reminders.

The section numbering is common and runs throughout the whole book. Numbering of theorems (lemmas, examples, etc.) is separate within each section. References to theorems, lemmas, examples, equations, etc. depend on the section in which they appear. If we refer to Theorem 1 or inequality (12) of the current section, we do it like this: Theorem 1, inequality (12). A reference to Theorem 1 or inequality (12) from a different section, for example Section 15, looks as follows: Theorem 15.1, inequality (15.12). The same convention is used for subsections. The symbol □ denotes the end of a proof. For the reader's convenience, there is a list of notation and an index at the end of the book.

Preparing and writing this book required a lot of work which was done in several steps. I.S. Borisov provided me with much help in preparing the original lecture notes for publication and in eliminating their shortcomings. The second version of the manuscript was read on my request by K.A. Borovkov. As a result, I received much useful advice and a long list of errors he noticed in the text. He also helped me substantially while 'debugging' the final version of the manuscript. In the search for further fresh criticism, I asked A.I. Sakhanenko to read the manuscript.
He also proposed a long list of remarks and suggestions on how to improve the exposition; I used many of them. The most significant changes were made to the proofs in Sections 26, 31, 33, 37, 43-45, and Appendices II, IV, and VII (see also the bibliographical notes at the end of the book). Many valuable remarks aimed at improving the book were made by D.M. Chibisov. V.V. Yurinskii and A.A. Novikov also made a number of useful remarks upon reading the manuscript. I am sincerely grateful to all my colleagues that I have named here, as well as to all the others who helped me in any way in my work on this book. Their support and assistance are greatly appreciated.

INTRODUCTION

This book presents the basics of a part of mathematics which is called mathematical statistics. For the sake of brevity, mathematical statistics is often called just statistics. One should bear in mind, however, that the abbreviation should only be used when there is a good mutual understanding, because the word 'statistics' quite often has a somewhat different meaning.

What is the subject of mathematical statistics? One could give various descriptive 'definitions' which reflect, to some extent, the contents of this field of mathematics. One of the simplest and crudest definitions is based on a comparison connected with the notion of a sample from a general population and the problem on the hypergeometric distribution, which is usually discussed at the beginning of a probability theory course. Knowing the composition of a general population, one studies the distributions for the composition of a random sample. This is a typical direct problem of probability theory. Very often, however, we need to solve inverse problems, in which we know the composition of the sample and need to find what the general population was. Figuratively speaking, inverse problems of this sort constitute the subject of mathematical statistics.

To make this comparison somewhat more precise, we could say that in probability theory we know the nature of a phenomenon and attempt to find out the behavior (that is, the distribution) of certain characteristics which can be observed in experiments. Conversely, in mathematical statistics we begin with experimental data, which are generally some observed values of random variables, and need to make a judgement or decision concerning the nature of the phenomenon under consideration. Thus, we are dealing with one of the most important aspects of human activity, the process of cognition. The thesis that 'practice is the criterion of truth' is directly related to mathematical statistics, because it is exactly this science which studies, within the framework of precise mathematical models, the methods which allow us to answer the question of whether practice, in the form of the results of an experiment, is adequate to the given hypothetical idea of the nature of the phenomenon.

It should be emphasized that here, as in probability theory, we are interested not in those experiments which allow us to derive unique, deterministic conclusions about the phenomena of nature in question, but in the experiments whose results are random events. As science develops, more and more problems of this kind arise, since the increase in the precision of our experiments does not help us to avoid the random factor related to various interferences and the limits of our measuring and computing capabilities.
Mathematical statistics is a part of probability theory in the sense that each problem of mathematical statistics is essentially a problem (sometimes a very original one) of probability theory. However, mathematical statistics has its own place in the hierarchy of sciences. It can be regarded as a science about the so-called inductive behavior of humans (and not only humans) when they have to make decisions, on the basis of their nondeterministic experience, that lead to minimal losses.¹

¹For more details, see [90].

Mathematical statistics is also called the theory of statistical decisions, because it can be characterized as a science about optimal decisions (the last two words require an explanation) which are based on statistical experimental data. Precise statements of the problems will be given later, in the body of the book. Here we only give three examples of the most simple and typical statistical problems.

Example 1. One of the main quality-related parameters of many products is their life span. As a rule, the life span of a product (say, an electric bulb) is random and cannot be determined in advance. Experience shows that if the manufacturing process is homogeneous in a certain sense, then the life spans $\xi_1, \xi_2, \ldots$ of the first, second, etc. product, respectively, can be regarded as independent identically distributed random variables. It is natural to identify the life span parameter, which we are interested in, with the number $\theta = \mathsf{E}\xi_1$, which is the expectation of $\xi_1$. One of the standard problems is to find $\theta$. To determine this value, one takes $n$ ready items and tests them. Suppose that $x_1, x_2, \ldots, x_n$ are the life spans of these tested items. We know that
$$\frac{1}{n}\sum_{i=1}^{n} x_i \to \theta \quad \text{with probability 1}$$
as $n \to \infty$. Therefore, it is natural to expect that the number $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, for $n$ large enough, will be close to $\theta$ and will allow us to answer our question to a certain extent. It is clear also that we are interested in making the required number of observations as small as possible and the estimate of $\theta$ as accurate as possible (both overstating and understating $\theta$ will lead to material losses).

Example 2. A radar is scanning a given part of the air space at moments $t_1, t_2, \ldots, t_n$, trying to locate a certain object. We denote by $x_1, \ldots, x_n$ the reflected signals registered by the device. If the object that we are interested in is absent from the observed area, the values $x_i$ may be regarded as independent random variables whose distribution coincides with that of a certain random variable $\xi$, which is determined by the nature of the atmospheric noise. And if the object is in the observed area during the entire observation period, then $x_i$ will contain a 'useful' signal $a$, in addition to the noise, and the distribution of $x_i$ will be equal to that of $\xi + a$. Thus, if the distribution function of $x_i$ in the first case is $F(x)$, then in the second case it equals $F(x - a)$. Given a sample $x_1, \ldots, x_n$, we need to determine which of the two cases takes place, that is, find out whether or not the object of interest is in the observed area. In this problem, it is possible to point out an 'optimal decision rule' (in a certain sense), which will solve the posed problem with minimal error.

The statement of the problem can be made more complicated as follows: first the object is absent and then it appears beginning with the observation with some unknown index $\theta$. We need to estimate, as accurately as possible, the time $\theta$ when the object entered the area.
This is the so-called 'change point problem', which has many other interpretations important for applications.

Example 3. A certain experiment is performed $n_1$ times under conditions $A$ and then $n_2$ times under conditions $B$. Let us denote by $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$ the results of these experiments under the conditions $A$ and $B$, respectively. The question is as follows: do the conditions of the experiment have an impact on its results? In other words, if $P_1$ denotes the distribution of $x_i$ and $P_2$ that of $y_j$, do we have $P_1 = P_2$?

CHAPTER I. A SAMPLE. EMPIRICAL DISTRIBUTION. ASYMPTOTIC PROPERTIES OF STATISTICS

1. The notion of a sample

In what follows, we will generally treat only these two cases; that is, $\mathcal{X}$ will be either $\mathbb{R}$ (one-dimensional case) or $\mathbb{R}^m$, $m > 1$ (multidimensional case). The class $\mathfrak{B}_{\mathcal{X}}$ is usually chosen to be the σ-algebra of Borel sets. At the same time, it should be noted that many results of the book, especially those of Chapters II-VI, are not related to the nature of the sample space $\mathcal{X}$ at all, since they are concerned not with the observations themselves, but rather with $\mathbb{R}^m$-valued functions of the observations, $m \geq 1$.

If it is known in advance that $P$ is concentrated in a part $B \in \mathfrak{B}_{\mathcal{X}}$ of the space $\mathcal{X}$, then it may be more convenient to assume that $\mathcal{X}$ refers to $B$ and $\mathfrak{B}_{\mathcal{X}}$ to the trace of the σ-algebra $\mathfrak{B}_{\mathcal{X}}$ on $B$.

Consider $n$ independent repetitions of the experiment $G$ (see [19], Sec. 2.3) and denote by $x_1, \ldots, x_n$ the resulting set of observations. The vector
$$X_n = (x_1, \ldots, x_n)$$
is called a sample of size $n$ from a population with distribution $P$. Sometimes shorter or longer versions of the term are used: a "sample from a distribution $P$" or a "simple sample of size $n$ from a general population with distribution $P$."

To denote the relation "$X_n$ is a sample from a distribution $P$," we use the symbol $\Subset$ as follows:
$$X_n \Subset P. \tag{1}$$
A similar notation is also used for other random variables. For instance, the relation
$$\xi \Subset P \tag{2}$$
means that $\xi$ has distribution $P$. Such a use of the symbol $\Subset$ agrees with (1), since the latter is defined for any $n$, including the case $n = 1$.

If $\xi$ and $\eta$ are two random variables (generally given on different probability spaces) having identical distributions, we denote this by $\xi \stackrel{d}{=} \eta$. Thus, if $X_n$ and $Y_n$ are two samples of the same size from a distribution $P$, we can write $X_n \stackrel{d}{=} Y_n$.

Instead of the distribution $P$, the right-hand side of (1) or (2) may be the distribution function corresponding to $P$. So, if $F(x) = P((-\infty, x))$, then the expression $X_n \Subset F$ is identical to (1).

We encounter the notion of a "sample from a population" also when considering the simplest probability models connected with drawing balls from an urn in the classical definition of probability (see [19], Sec. 1.2). It should be noted that this definition of a sample completely agrees with the definition introduced above and in fact coincides with it. If $x_i$ (or the random variable $\xi$) may take only $s$ values $a_1, \ldots, a_s$, and if the probabilities of these are rational, that is,
$$P(\xi = a_j) = \frac{N_j}{N}, \qquad \sum_{j=1}^{s} N_j = N, \quad j = 1, \ldots, s,$$
then the sample $X_n$ can be interpreted as the result of "sampling with replacement," in the sense of Chapter 1 of [19], from an urn with $N$ balls, of which $N_1$ balls are labeled $a_1$, $N_2$ balls are labeled $a_2$, etc.

As a mathematical object, the sample $X = X_n$, where the subscript $n$ will often be left out, is merely a random vector $(x_1, \ldots, x_n)$ with values in the "n-dimensional" space $\mathcal{X}^n = \mathcal{X} \times \mathcal{X} \times \cdots \times \mathcal{X}$ and with the distribution that is defined for $B = B_1 \times B_2 \times \cdots \times B_n$, $B_i \in \mathfrak{B}_{\mathcal{X}}$, by the equalities
$$P(X \in B) = P(x_1 \in B_1, \ldots, x_n \in B_n) = \prod_{i=1}^{n} P(x_i \in B_i). \tag{3}$$
In other words, the distribution $P$ on $\mathcal{X}^n$ is the direct product of $n$ prescribed "one-dimensional" distributions.

For the notation for the distribution $P$ and other distributions, we use the following conventions, which have already been used in (3) and which will avoid confusion.

1. We use the same symbol, for example, $P$, both for distributions in $(\mathcal{X}, \mathfrak{B}_{\mathcal{X}})$ and for their direct product in $(\mathcal{X}^n, \mathfrak{B}_{\mathcal{X}}^n)$ (see (3)), where $\mathfrak{B}_{\mathcal{X}}^n$ is the σ-algebra of Borel sets in $\mathcal{X}^n$. The two cases are distinguished only by the argument of the measure-function $P$.

2. Sometimes it is more convenient to denote the probability that $X$ is in a set $B$, say, from $\mathfrak{B}_{\mathcal{X}}^n$, by $P(B)$ and sometimes by $P(X \in B)$. The two expressions are equivalent since $\mathcal{X}^n$ is the sample space for $X$.

3. Finally, we use the symbol $\mathsf{P}$ to refer to the general notion of probability (that is, probability defined for some other random variables without specifying any probability space).

In view of (3), we can regard the sample $X$ as an elementary event in the sample probability space $(\mathcal{X}^n, \mathfrak{B}_{\mathcal{X}}^n, P)$ (see [19], Sec. 3.2). Note that we use two interpretations for the symbol $X$ and the corresponding object: as a random variable and as a vector of numerical data which have already been obtained in real experiments. Our experience shows that such a dual interpretation is quite acceptable and does not lead to confusion, even though it allows the simultaneous existence of expressions like $\mathsf{P}(x_1 < t) = F(t)$ and $x_1 = 0.74$, $x_2 = 0.83$, etc. Also, note that the components $x_i$ of the sample $X$ are denoted by roman letters x, while italic letters $x$ are reserved for variables. Vectors $(x_1, \ldots, x_n) \in \mathcal{X}^n$, where $x_i \in \mathcal{X}$, are denoted by bold-faced letters, that is, $\boldsymbol{x} = (x_1, \ldots, x_n)$.

A sample is the basic starting object in problems of mathematical statistics. In practice, however, its elements $x_1, x_2, \ldots$ are not always independent, and we will not exclude such a possibility from our considerations. In order to avoid additional assumptions, we assume in the case of mutually dependent observations that we are dealing with a sample of size $n = 1$ and the observations are components of the vector $x_1$. This is possible because the nature of the space $\mathcal{X}$ is arbitrary.

In what follows, we will often deal with samples $X_n$ of unboundedly growing size $n$. In such cases, it is convenient to assume that we have a sample $X_\infty = (x_1, x_2, \ldots)$ of infinite size, and $X = X_n = [X_\infty]_n$ is the set of its first $n$ coordinates. By a sample of infinite size we mean an element of the sample space $(\mathcal{X}^\infty, \mathfrak{B}_{\mathcal{X}}^\infty, P)$, where $\mathcal{X}^\infty$ is the space of sequences $(x_1, x_2, \ldots)$ and the σ-algebra $\mathfrak{B}_{\mathcal{X}}^\infty$ is generated by the cylinder sets $\{x_1 \in B_1, \ldots, x_k \in B_k\}$, $B_i \in \mathfrak{B}_{\mathcal{X}}$, $k = 1, 2, \ldots$

2. Empirical distribution: The one-dimensional case

Let $X_n = (x_1, \ldots, x_n) \Subset P$. The empirical distribution $P_n^*$ constructed from the sample $X_n$ is the distribution that assigns the probability $1/n$ to each element of the sample:
$$P_n^*(B) = \frac{\nu(B)}{n}, \quad B \in \mathfrak{B}, \tag{1}$$
where $\nu(B)$ is the number of elements of the sample that belong to $B$. Equivalently,
$$P_n^*(B) = \frac{1}{n}\sum_{i=1}^{n} I_{x_i}(B), \tag{2}$$
where $I_{x_i}(B) = 1$ if $x_i \in B$ and $I_{x_i}(B) = 0$ otherwise.

Now let the size $n$ of the sample grow unboundedly, $n \to \infty$. We then obtain a sequence of empirical distributions $P_n^*$. A remarkable fact is that this sequence becomes infinitely close to the original distribution $P$ of the observed random variable. This is of fundamental importance for the presentation that follows, because it shows that the unknown distribution $P$ can be reconstructed as precisely as we like if the size of the sample is large enough.
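Before the formal statements, here is a quick numerical illustration of definition (2); it is ours, not the book's, and uses Python with NumPy purely for demonstration. For a fixed set $B = [-1, 1)$ and a standard normal sample, $P_n^*(B)$ is just the average of the indicator variables $I_{x_i}(B)$, and it settles near $P(B) = 2\Phi(1) - 1 \approx 0.6827$ as $n$ grows:

    import numpy as np

    rng = np.random.default_rng(0)

    def empirical_measure(sample, a, b):
        # P_n*([a, b)) = (1/n) * #{i : a <= x_i < b}, i.e. the mean of the
        # indicator variables I_{x_i}([a, b)) from formula (2).
        return np.mean((sample >= a) & (sample < b))

    true_p = 0.6827  # P(-1 <= xi < 1) for the standard normal law, approximately
    for n in (100, 10_000, 1_000_000):
        x = rng.standard_normal(n)
        print(n, empirical_measure(x, -1.0, 1.0), "vs", true_p)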
Theorem 1. Let $B \in \mathfrak{B}$ and $X_n = [X_\infty]_n \Subset P$. Then $P_n^*(B) \xrightarrow{\text{a.s.}} P(B)$ as $n \to \infty$.

Here convergence with probability 1 is with respect to the distribution $P = P^\infty$ in $(\mathbb{R}^\infty, \mathfrak{B}^\infty, P)$. We need the assumption that $X_n = [X_\infty]_n$ in order to have the random variables $P_n^*(B)$ defined on the same probability space.

Proof. Recall the definition (2) and note that the $I_{x_i}(B)$ are independent, identically distributed random variables with
$$\mathsf{E} I_{x_i}(B) = \mathsf{P}(I_{x_i}(B) = 1) = \mathsf{P}(x_i \in B) = P(B).$$
Since $P_n^*(B)$ is the arithmetic average of these variables, using the strong law of large numbers completes the proof. □

Theorem 1 shows that $P_n^*(B)$ converges to $P(B)$ at each "point" $B$. However, a stronger assertion is valid; namely, such convergence is, in a sense, uniform with respect to $B$. Denote by $\mathfrak{I}$ the collection of sets $B$ which are semi-intervals of the form $[a, b)$, with finite or infinite endpoints, and suppose again that $X_n = [X_\infty]_n$.

Theorem 2 (Glivenko-Cantelli). Let $X_n \Subset P$. Then
$$\sup_{B \in \mathfrak{I}} |P_n^*(B) - P(B)| \xrightarrow{\text{a.s.}} 0.$$

To be precise, a slightly different assertion is usually connected with the names of Glivenko and Cantelli. It refers to the important notion of an empirical distribution function. By definition, the latter is the distribution function corresponding to $P_n^*$. In other words, the empirical distribution function $F_n^*(x)$ is the function
$$F_n^*(x) = P_n^*((-\infty, x)).$$
The value of $nF_n^*(x)$ is the number of elements of the sample $X$ which are less than $x$. In practice, the following procedure is often used to construct $F_n^*(x)$: first, the elements $(x_1, \ldots, x_n)$ of the sample are arranged in ascending order; that is, we construct the sequence
$$x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)},$$
which is called the set of order statistics. Then we can put
$$F_n^*(x) = \frac{k}{n} \quad \text{for } x \in (x_{(k)}, x_{(k+1)}],$$
where $k$ varies from 0 to $n$, $x_{(0)} = -\infty$, and $x_{(n+1)} = \infty$. It is obvious that $F_n^*(x)$ is a step function with jumps $1/n$ at the points $x_i$, provided that all the $x_i$ are distinct.

Let $F(x) = P((-\infty, x))$ be the distribution function of $x_1$ (or, equivalently, of each $x_i$), and let $X_n = [X_\infty]_n$. The Glivenko-Cantelli theorem is the following assertion.

Theorem 2A. $\sup_x |F_n^*(x) - F(x)| \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$.

In what follows, we omit the subscript $n$ in $F_n^*$ and write simply $F^*$.

Proof. For simplicity, we first suppose that the function $F$ is continuous. Let $\varepsilon > 0$ be a fixed arbitrarily small number such that $N = 1/\varepsilon$ is an integer. Since $F$ is continuous, we can point out numbers $z_0 = -\infty, z_1, \ldots, z_{N-1}, z_N = \infty$ such that
$$F(z_0) = 0, \quad F(z_1) = \varepsilon = \frac{1}{N}, \ \ldots, \quad F(z_k) = k\varepsilon = \frac{k}{N}, \ \ldots, \quad F(z_N) = 1.$$
For $z \in [z_k, z_{k+1})$, the following relations hold:
$$F^*(z) - F(z) \leq F^*(z_{k+1}) - F(z_k) = F^*(z_{k+1}) - F(z_{k+1}) + \varepsilon,$$
$$F^*(z) - F(z) \geq F^*(z_k) - F(z_{k+1}) = F^*(z_k) - F(z_k) - \varepsilon. \tag{3}$$
Denote by $A_k$ the set of elementary events $\omega = X_\infty$ for which $F^*(z_k) \to F(z_k)$. By Theorem 1, $P(A_k) = 1$. Hence, for each $\omega \in A = \bigcap_{k=1}^{N} A_k$ there exists an $n(\omega)$ such that for all $n > n(\omega)$ we have
$$|F^*(z_k) - F(z_k)| < \varepsilon \quad \text{for all } k \text{ simultaneously.} \tag{4}$$
Together with (3), this gives $|F^*(z) - F(z)| < 2\varepsilon$ for all $z$, for arbitrary $\varepsilon > 0$, for all $\omega \in A$, and for all $n > n(\omega, \varepsilon)$ large enough. Since $P(A) = 1$, the theorem is proved for the case of a continuous function $F$.

The proof in the case of an arbitrary function $F(x)$ is perfectly similar. We only need to use the fact that for any $F(x)$ there exist finitely many points $-\infty = z_0 < z_1 < \cdots < z_{N-1} < z_N = \infty$ such that
$$F(z_{k+1}) - F(z_k + 0) \leq \varepsilon, \quad k = 0, 1, \ldots, N-1. \tag{6}$$
To be definite, we can assume that the set $\{z_k\}$ contains all points at which the jump of $F$ is greater than, say, $\varepsilon/2$. In a manner very similar to (3), we conclude that for $z \in (z_k, z_{k+1}]$ we have
$$F^*(z) - F(z) \leq F^*(z_{k+1}) - F(z_{k+1}) + \varepsilon,$$
$$F^*(z) - F(z) \geq F^*(z_k + 0) - F(z_k + 0) - \varepsilon. \tag{7}$$
To the sets $A_k$, defined as above, we add the sets $A_k^+$, $k = 0, 1, \ldots, N$, on which $F^*(z_k + 0) \to F(z_k + 0)$. Then $P(A_k) = P(A_k^+) = 1$ by Theorem 1. On the set $A = \bigcap_{k=0}^{N} A_k A_k^+$, for which $P(A) = 1$, inequality (4) holds for $n > n(\omega)$ large enough and, in addition,
$$|F^*(z_k + 0) - F(z_k + 0)| < \varepsilon,$$
which together with (7) completes the proof as before. □

Note that in Theorem 2 the supremum is taken over the class $\mathfrak{I}$ of semi-intervals and not over all Borel sets. Over the whole class $\mathfrak{B}$, uniform convergence, generally speaking, fails: if the distribution $P$ is continuous, then for the sets $B_n = \{x_1, \ldots, x_n\}$ consisting of the sample points we have $P_n^*(B_n) = 1$ and $P(B_n) = 0$, so that $P_n^*(B_n) - P(B_n) = 1$ for all $n$.
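The uniform convergence in Theorem 2A is easy to watch numerically. The sketch below (ours; any distribution with a known $F$ would do) takes a sample from the uniform distribution on $[0, 1]$, for which $F(x) = x$ there, and computes $\sup_x |F_n^*(x) - F(x)|$ exactly from the order statistics, using the fact that $F_n^*$ is a step function with jumps $1/n$ at the points $x_{(k)}$:

    import numpy as np

    rng = np.random.default_rng(1)

    def kolmogorov_distance_uniform(sample):
        # sup_x |F_n*(x) - x| for a sample from U[0, 1]. With the convention
        # F_n*(x) = #{i : x_i < x} / n, F_n* jumps from (k-1)/n to k/n just
        # after the order statistic x_(k), so the supremum is found by
        # comparing both levels against x_(k) for every k.
        xs = np.sort(sample)
        n = len(xs)
        k = np.arange(1, n + 1)
        return max(np.max(np.abs(k / n - xs)), np.max(np.abs((k - 1) / n - xs)))

    for n in (10, 1_000, 100_000):
        u = rng.uniform(size=n)
        print(n, kolmogorov_distance_uniform(u))  # tends to 0 as n grows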
To conclude this section, we note that representation (2) allows us to obtain results on the asymptotic behavior of $P_n^*$ that are more precise than the Glivenko-Cantelli type theorems (such results will be presented in Sections 4 and 6). To illustrate the possibilities existing here, we recall that $\sum_{i=1}^{n} I_{x_i}(B)$ in (2) is the sum of independent, identically distributed random variables in the Bernoulli scheme, and
$$\mathsf{E} I_{x_i}(B) = \mathsf{P}(I_{x_i}(B) = 1) = P(B), \quad \mathsf{E} I_{x_i}^2(B) = P(B), \quad \mathsf{D} I_{x_i}(B) = P(B)(1 - P(B)).$$
The following statement therefore follows immediately from the central limit theorem.

Theorem 3. $P_n^*(B)$ can be represented as follows:
$$P_n^*(B) = P(B) + \frac{\zeta_n(B)}{\sqrt{n}}, \tag{8}$$
where the distribution of $\zeta_n(B) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} (I_{x_i}(B) - P(B))$ converges weakly to the normal distribution with parameters $(0, P(B)(1 - P(B)))$.

Further investigation of $P_n^*(B)$ in this direction will be presented in Section 6. For more precise theorems on convergence with probability 1, see Section 4.

3. Sample characteristics. Main types of statistics

1. Examples of sample characteristics. Sample characteristics are usually defined as measurable functionals of the empirical distribution or, in other words, functions of the sample which are assumed to be measurable. The simplest of them are sample (or empirical) moments. The sample moment of order $k$ is defined by the equality
$$a_k^* = a_k^*(X) = \int x^k \, dF_n^*(x) = \frac{1}{n} \sum_{i=1}^{n} x_i^k.$$
The central sample moment of order $k$ equals
$$a_k^{0*} = a_k^{0*}(X) = \int (x - a_1^*)^k \, dF_n^*(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - a_1^*)^k.$$
The special symbols $\bar{x}$ and $S^2$ are often used in the literature for the sample moments $a_1^*$ and $a_2^{0*}$:
$$\bar{x} = a_1^* = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad S^2 = a_2^{0*} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Here are more examples of sample characteristics that are used in statistical problems. The sample median $\zeta^*$ is the middle order statistic; that is, $\zeta^* = x_{(m)}$ if $n = 2m - 1$ is odd, and $\zeta^* = (x_{(m)} + x_{(m+1)})/2$ if $n = 2m$ is even. We recall also that the median $\zeta$ of a continuous distribution $P$ is any solution of the equation $F(\zeta) = 1/2$.

A more general notion is that of a quantile $\zeta_p$ of order $p$. It is the number for which $F(\zeta_p) = p$. Thus, the median is a quantile of order $1/2$. If $F$ has points of discontinuity (that is, a discrete component), then this definition may become meaningless. In the general case, therefore, we use the following definition: a quantile $\zeta_p$ of order $p$ for the distribution $P$ is the number
$$\zeta_p = \sup\{z : F(z) \leq p\}.$$
As a function of $p$, the quantile $\zeta_p$ is exactly the function $F^{-1}(p)$ inverse to $F(x)$. This definition of $\zeta_p$ (or $F^{-1}(p)$), unlike the preceding one, is meaningful for any $F(x)$.

It is clear that along with the sample median we can consider the sample quantile $\zeta_p^*$ of order $p$, which is by definition equal to $x_{(l)}$, where $l = [np] + 1$ and $x_{(k)}$, $k = 1, \ldots, n$, are the order statistics of the sample $X$. For $p = 1/2$, we keep the definition of $\zeta^* = \zeta_{1/2}^*$ given above (it coincides with the last definition only for odd $n$).
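All of the sample characteristics of this subsection are elementary to compute. The Python sketch below is our illustration, not the book's; it follows the definitions literally, in particular the convention $\zeta_p^* = x_{(l)}$ with $l = [np] + 1$, which differs slightly from the quantile conventions built into numerical libraries:

    import numpy as np

    def sample_moment(x, k):
        # a_k* = (1/n) * sum of x_i^k
        return np.mean(x ** k)

    def central_sample_moment(x, k):
        # a_k^{0*} = (1/n) * sum of (x_i - xbar)^k; for k = 2 this is S^2
        return np.mean((x - x.mean()) ** k)

    def sample_quantile(x, p):
        # zeta*_p = x_(l) with l = [n*p] + 1 (1-based order statistic)
        xs = np.sort(x)
        l = int(np.floor(len(x) * p)) + 1
        return xs[min(l, len(x)) - 1]

    def sample_median(x):
        # x_(m) for n = 2m - 1; (x_(m) + x_(m+1)) / 2 for n = 2m
        xs = np.sort(x)
        n = len(xs)
        m = (n + 1) // 2
        return xs[m - 1] if n % 2 else 0.5 * (xs[m - 1] + xs[m])

    rng = np.random.default_rng(2)
    x = rng.standard_normal(1001)
    print(sample_moment(x, 1), central_sample_moment(x, 2))  # near 0 and 1
    print(sample_median(x), sample_quantile(x, 0.5))         # both near 0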
2. Two types of statistics. Let $S$ be a measurable function of $n$ arguments. A sample characteristic $S(X) = S(x_1, \dots, x_n)$ is often called a statistic. It is clear from the above that any statistic is a random variable. Its distribution is fully determined by the distribution $P(B) = P(x_1 \in B)$ (we recall that $S(X)$ may be regarded as a random variable defined on $(\mathcal{X}^n, \mathfrak{B}_{\mathcal{X}}^n, \mathbf{P})$, where $\mathbf{P}$ is the direct product of $n$ "one-dimensional" distributions of $x_1$). Now we will define two classes of statistics that are used frequently in this book. They will be constructed with the help of the following two types of functionals $G(F)$ of distribution functions $F$.

I. Functionals of the form

$$G(F) = h\left( \int g(x) \, dF(x) \right),$$

where $g$ is a given Borel function and $h$ is a function which is continuous at the point $a = \int g(x) \, dF_0(x)$; here $F_0$ is such that $X \in F_0$.

II. Functionals $G(F)$ that are continuous at the "point" $F_0$ in the uniform metric, that is, such that $G(F^{(n)}) \to G(F_0)$ if $\sup_x |F^{(n)}(x) - F_0(x)| \to 0$ and the supports¹ of the distributions $F^{(n)}$ are contained in the support of $F_0$. Here $F_0$ is again the distribution function of $X$.

¹The support of a distribution $P$ with distribution function $F$ is defined to be any set $N_P$ for which $P(N_P) = 1$.

We define the corresponding classes of statistics by the equality $S(X) = G(F_n^*)$, where $F_n^*$ is the empirical distribution function. Then we obtain the following classes.

I. Statistics of type I. This is the class of all statistics which can be represented as

$$S(X) = h\left( \int g(x) \, dF_n^*(x) \right) = h\left( \frac{1}{n} \sum_{i=1}^{n} g(x_i) \right).$$

It is clear that all sample moments have the form of additive statistics $\frac{1}{n} \sum_{i=1}^{n} g(x_i)$ and so belong to this class.

II. The class of statistics of type II, which is the class of all statistics that are continuous at the point $F_0$. It is clear, for instance, that the sample median is a statistic continuous at a point $F$ if the median $\zeta$, $F(\zeta) = 1/2$, exists, and if $F$ is continuous and strictly increasing at $\zeta$.

These two classes are not, of course, the only alternatives. A functional $G(F)$ may belong to neither class or to both. For instance, if $G$ is a functional of type I, the support of $F$ is contained in the segment $[a, b]$ ($F(a) = 0$ and $F(b) = 1$), and the function $g$ has bounded variation on $[a, b]$, then $G$ is at the same time a functional of type II, because in this case the functional

$$\int g(x) \, dF(x) = g(b)F(b) - g(a)F(a) - \int F(x) \, dg(x)$$

is continuous with respect to $F$ in the uniform metric. The above means, in particular, that the first-type statistics $\bar{x}$ and $S^2$ are also statistics of type II if $X \in P$ and $P$ is concentrated on a finite interval.

Statistics of the form $G(F_n^*)$ are sometimes called statistical functions. Their systematic study was initiated by von Mises (see [45, 116, 117]).

We can now complement Theorems 2.1 and 2.2 by the following assertion about almost sure (a.s.) convergence of sample characteristics.

Theorem 1. Suppose again that $X_n = [X_\infty]_n \in P$. If $S(X)$ is a statistic of type I or II, then

$$G(F_n^*) \xrightarrow{\text{a.s.}} G(F) \quad \text{as } n \to \infty.$$

We assume here, of course, that $G(F)$ exists. Thus, samples of large sizes allow us to estimate not only the distribution $P$ but also functionals of the distribution, at least those which belong to one of the classes named in the theorem.

Proof. The proof of this assertion is almost obvious for both classes. For instance, let $G(F) = h(\int g(x) \, dF(x))$. Then

$$S = \int g(x) \, dF_n^*(x) = \frac{1}{n} \sum_{i=1}^{n} g(x_i)$$

is the arithmetic mean of independent random variables with expectation

$$\mathbf{E} g(x_1) = \int g(x) \, dP(x).$$

Therefore, by the strong law of large numbers, we have $S \to \mathbf{E} g(x_1)$ a.s. Now let $A = \{X_\infty : S \to \mathbf{E} g(x_1)\}$. Then $P(A) = 1$, and if $X_\infty \in A$, then $S \to \mathbf{E} g(x_1)$ and $h(S) \to h(\mathbf{E} g(x_1))$. In other words, on the set $A$ we have $G(F_n^*) \to G(F)$.

The claim of the theorem for functionals of the second type is a direct consequence of the Glivenko-Cantelli theorem. $\square$

From the theorem it follows that absolute and central sample moments converge a.s., as $n \to \infty$, to the corresponding moments of the distribution $P$; for example,

$$a_k^* = a_k^*(X) = \frac{1}{n} \sum_{i=1}^{n} x_i^k \xrightarrow{\text{a.s.}} \mathbf{E} x_1^k,$$
$$a_k^{*0} = a_k^{*0}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^k \xrightarrow{\text{a.s.}} \mathbf{E} (x_1 - \mathbf{E} x_1)^k.$$

In particular,

$$S^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \xrightarrow{\text{a.s.}} \mathbf{D} x_1.$$

Thus, we have established a fact of principal importance to us: as the size of a sample grows, the empirical distribution, as well as a broad class of functionals of it, becomes as close as desired to the corresponding "theoretical values."
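A quick numerical check (ours, not the text's) of this convergence for the additive statistics $\bar{x}$ and $S^2$: an exponential sample with $\mathbf{E} x_1 = \mathbf{D} x_1 = 1$ is just one convenient choice of parent distribution, and both statistics settle near 1 as $n$ grows.

```python
# Sketch (not from the text): a.s. convergence of the type I statistics
# x-bar and S^2 to E x_1 and D x_1 for an Exp(1) sample (both limits equal 1).
import numpy as np

rng = np.random.default_rng(1)

for n in (10**2, 10**4, 10**6):
    x = rng.exponential(scale=1.0, size=n)
    x_bar = x.mean()                  # a_1* = (1/n) sum x_i
    s2 = np.mean((x - x_bar) ** 2)    # a_2*0 = (1/n) sum (x_i - x_bar)^2
    print(n, x_bar, s2)               # both columns approach 1
```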
3. L-statistics. This term refers to statistics which are linear combinations of the order statistics or of functions of them:

$$S_n(X) = \frac{1}{n} \sum_{i=1}^{n} \gamma_{n,i} \, g(x_{(i)}).$$

Here the function $g$ and the coefficients $\gamma_{n,i}$ are fixed. It is evident that all sample quantiles are $L$-statistics. If we put $\gamma_n(t) = \gamma_{n,i}$ for $t \in ((i-1)/n, i/n]$, then the statistic $S_n(X)$ can be represented as

$$S_n(X) = \int_0^1 \gamma_n(t) \, g(F_n^{*-1}(t)) \, dt, \qquad (1)$$

where $F_n^{*-1}$ is the function inverse to the empirical distribution function $F_n^*$. If $\gamma_n(t)$ converges to $\gamma(t)$ as $n \to \infty$ (in a suitable sense), then it is natural to expect that $S_n(X)$ is close to the statistic $G(F_n^*)$, where

$$G(F) = \int_0^1 \gamma(t) \, g(F^{-1}(t)) \, dt = \int \gamma(F(x)) \, g(x) \, dF(x). \qquad (2)$$

It is clear that $G(F_n^*)$ is an $L$-statistic with $\gamma_{n,i} = \gamma((i-1)/n)$. Assuming for the moment that $F_n^*(x)$ is right-continuous (this assumption does not change the following proofs), we obtain the slightly more convenient expression $\gamma_{n,i} = \gamma(i/n)$. Representation (2) provides us with a rather broad "regular" choice of the coefficients $\gamma_{n,i}$ in the definition of $S_n(X)$. Therefore, in order to avoid technical complications, we restrict ourselves to considering $L$-statistics of the form $G(F_n^*)$, constructed from the functional $G$ defined in (2).

Note that the sample quantile of order $p$ and the order statistics $x_{(1)}$ and $x_{(n)}$ are $L$-statistics of type (1) for which the $\gamma_n(t)$ converge, as $n \to \infty$, to Dirac $\delta$-functions. In what follows, we assume, as a rule, that $\gamma(t)$ in (2) is an "ordinary" (that is, not a generalized) function. Since the functions $\gamma$ and $g$ in representation (2) are defined ambiguously (up to constant factors whose product is 1), we may fix the value $\gamma_0 = \int \gamma(t) \, dt$.

Expression (2) shows that statistics of type I with $h(x) = x$ are $L$-statistics, and $L$-statistics with $\gamma \equiv 1$ are statistics of type I.

4. M-statistics. Let $\psi(x, \theta)$ be a function $\mathcal{X} \times \mathbf{R}^k \to \mathbf{R}$. Consider the functional $G(F)$ which is defined as the maximum point, with respect to $\theta$, of the function $\int \psi(x, \theta) \, F(dx)$. An $M$-statistic (or $M$-estimator) is the value $\theta^* = G(F_n^*)$ or, equivalently, the point $\theta^*$ at which

$$\max_\theta \sum_{i=1}^{n} \psi(x_i, \theta) = \max_\theta \, n \int \psi(x, \theta) \, dF_n^*(x) \qquad (3)$$

is attained. If there are several maximum points, then each of them is an $M$-statistic.

A somewhat different definition (which is more regular in a certain sense) is also often used. It defines an $M$-statistic ($M$-estimator) as any solution to the equation

$$\sum_{i=1}^{n} g(x_i, \theta) = 0, \qquad (4)$$

where $g(x, \theta)$ is a given function. In addition, the term $M$-statistic is sometimes used as a general term which covers both of the definitions (3) and (4). If the function $\psi$ in (3) is differentiable with respect to $\theta$, then an $M$-statistic in the sense of (3) is an $M$-statistic in the sense of definition (4) in which $g$ is replaced by the derivative $\psi'$ of $\psi$ with respect to $\theta$. However, definition (4) may add "extraneous" values of $\theta^*$, at which local extrema are attained.

$M$-statistics will be studied in more detail in Chapter II. They play an important role in the theory of asymptotically optimal estimators.
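To make definition (3) concrete, here is a small sketch of ours. For $\psi(x, \theta) = -(x - \theta)^2$ the point maximizing $\sum_i \psi(x_i, \theta)$ is $\bar{x}$, and for $\psi(x, \theta) = -|x - \theta|$ it is the sample median; a crude grid search stands in for a proper optimizer.

```python
# Sketch (not from the text): two classical M-statistics in the sense of (3).
# psi(x, theta) = -(x - theta)^2 yields x-bar; psi = -|x - theta| the median.
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(101)

def m_estimator(psi, sample, grid):
    """argmax over the grid of sum_i psi(x_i, theta) -- definition (3)."""
    scores = np.array([psi(sample, t).sum() for t in grid])
    return grid[np.argmax(scores)]

grid = np.linspace(x.min(), x.max(), 20001)
theta_sq = m_estimator(lambda s, t: -(s - t) ** 2, x, grid)
theta_abs = m_estimator(lambda s, t: -np.abs(s - t), x, grid)

print(theta_sq, x.mean())        # agree up to the grid spacing
print(theta_abs, np.median(x))   # likewise
```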
5. On other statistics. Some other "cumulative" classes of statistics are often used as well. We will mention here only the so-called U-statistics, which are the statistics based on functionals of the form

$$G(F) = \int \cdots \int h(x_1, \dots, x_m) \, P(dx_1) \cdots P(dx_m), \qquad 1 \le m \le n,$$

where $h$ is a symmetric function which is sometimes called the kernel of the functional. A $U$-statistic is the statistic

$$G(F_n^*) = \frac{1}{n^m} \sum h(x_{i_1}, \dots, x_{i_m}),$$

where the sum is taken over all $i_1, \dots, i_m$ from 1 to $n$. All sample moments are obviously $U$-statistics.

Almost all statistics occurring in applications belong to one of the classes listed above. On the other hand, the asymptotic behavior of the distributions of statistics from each of these classes can be studied very comprehensively. Some results in this direction will be presented in Sections 7 and 8, as well as in Chapter II.

4. Multidimensional samples

1. Empirical distributions. In this section, we consider the multidimensional case, in which the observed random variable $\xi$ and, accordingly, the sample values $x_1, \dots, x_n$ are vectors of dimension $m > 1$; in other words, $x_k = (x_{k,1}, \dots, x_{k,m})$. Here $P(B) = P(\xi \in B)$ is a distribution in $\mathcal{X} = \mathbf{R}^m$, and the sample space is $(\mathcal{X}^n, \mathfrak{B}_{\mathcal{X}}^n, \mathbf{P})$, where $\mathbf{P}$ is the direct product of $n$ copies of the distribution $P$ in $(\mathbf{R}^m, \mathfrak{B}_{\mathbf{R}^m})$. The notation $X \in P$ retains its meaning.

The constructions of an empirical distribution and of sample characteristics are quite similar in this case. Given a sample $X$, the empirical distribution $P_n^*$ is constructed as above, that is, as a discrete distribution with masses $1/n$ at the points $x_1, \dots, x_n$, so that

$$P_n^*(B) = \frac{\nu(B)}{n} = \frac{1}{n} \sum_{i=1}^{n} I_{x_i}(B),$$

where $\nu(B)$ is the number of sample points that fall into $B$ and $I_{x_i}$ is the distribution concentrated at the single point $x_i$. The claim of Theorem 1 on the convergence $P_n^*(B) \xrightarrow{\text{a.s.}} P(B)$ obviously remains valid in this case.

The generalization of the Glivenko-Cantelli theorem to the multidimensional case involves a number of qualitatively new issues. One of them is how to generalize the notion of an interval to the multidimensional case. Several such generalizations are possible: for example, rectangles, convex sets, etc.
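A closing sketch of ours: in $\mathbf{R}^2$ the empirical measure of a rectangle is just the fraction of sample points falling into it, and Theorem 1 applies verbatim. The standard normal law and the rectangle $B = [0, 1) \times [0, 1)$ below are arbitrary illustrative choices.

```python
# Sketch (not from the text): the multivariate empirical measure of a
# rectangle B = [0, 1) x [0, 1) for a sample from the standard normal in R^2.
import numpy as np

rng = np.random.default_rng(3)

def empirical_measure(sample, lower, upper):
    """P_n*(B) = nu(B)/n for the rectangle B = [lower, upper)."""
    inside = np.all((sample >= lower) & (sample < upper), axis=1)
    return inside.mean()

for n in (10**3, 10**4, 10**6):
    xy = rng.standard_normal((n, 2))
    print(n, empirical_measure(xy, lower=np.zeros(2), upper=np.ones(2)))
# P(B) = (Phi(1) - Phi(0))^2 ~ 0.1165, and the printed values settle near it.
```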
