
Translations of Mathematical Monographs Volume 20

STATISTICAL PROBLEMS
WITH NUISANCE PARAMETERS

by
Ju. V. Linnik

American Mathematical Society


Providence, Rhode Island
1968
СТАТИСТИЧЕСКИЕ ЗАДАЧИ
С МЕШАЮЩИМИ ПАРАМЕТРАМИ

Ю. В. ЛИННИК

ТЕОРИЯ ВЕРОЯТНОСТЕЙ
И МАТЕМАТИЧЕСКАЯ СТАТИСТИКА

Издательство «Наука»
Главная редакция
физико-математической литературы
Москва 1966

Translated from the Russian by


Scripta Technica

Library of Congress Card Number 67-30101

Copyright© 1968 by the American Mathematical Society

Printed in the United States of America


All Rights Reserved
No portion of this book may be reproduced
without the written permission of the publisher
PREFACE

The present book is devoted to the analytic theory of elimination of nuisance
parameters in the testing of statistical hypotheses and to the theory of unbiased
estimates. Our attention is concentrated on the analytic properties of tests and
estimates and on the mathematical foundations for obtaining a test or unbiased
estimate that is optimal in some sense or other. It does not, however, include
either computational algorithms (which in many cases reduce to certain forms of
linear programming) or tables. Thus the book does not contain individual
statistical recipes for problems with nuisance parameters, but rather attempts to
point out procedures for constructing such recipes.
In the introduction and later, we recall certain standard theorems on
σ-algebras, probability measures, and statistics. In Chapter I, we treat multiple
Laplace transforms and describe the simpler properties and applications of analytic
sheaves along the lines developed by H. Cartan. In later chapters these proper-
ties will be applied to the theory of exponential families. Chapter II gives the
fundamentals of the theory of sufficient statistics for distributions in Euclidean
spaces and exponential families associated with them (for repeated samples).
Chapter III presents some of the problems themselves with nuisance parameters.
Chapter IV treats the theory of similarity following J. Neyman, E. Lehmann, and
H. Scheffe. Chapters V and VII-X discuss the recent researches of statisticians
at Leningrad University in the theory of similar tests and unbiased estimates,
particularly in connection with the Behrens-Fisher problem. In Chapter VI, an
exposition is given of the remarkable method of R. A. Wijsman; however, this
method does not yield all desirable tests. The role of the theory of sheaves of
ideals of functions as an analytic foundation of the theory of similar tests and
unbiased estimates for incomplete exponential families is clarified in Chapters V
and VII. Here exponential families are considered not only for repeated samples
but for other cases as well.
In Chapter XI, an exposition is given of the problem of many small samples,
and, in particular, of the researches of A. A. Petrov.


At the end of the book are several unsolved problems, which constitute only a
small portion of the esthetically pleasing and varied problems that arise in
analytical statistics. The purpose of the present book is to draw the attention of
persons interested in mathematical statistics to its analytical aspects.
A. M. Kagan, I. L. Romanovskaja, and V. N. Sudakov had a share in the
writing of this book. Sections 2 and 3 of Chapter VII were written by the author
in collaboration with A. M. Kagan, and section 4 of Chapter VIII with
I. L. Romanovskaja. Section 2 of Chapter X was written by V. N. Sudakov. A considerable
amount of help in the writing of Chapter I was provided by N. M. Mitrofanova and
V. L. Eĭdlin.
I wish to express my gratitude to O. I. Rumjanceva and S. I. Cirkunova for
their great help in the preparation of the manuscript.

Ju. V. Linnik
PREFACE TO THE AMERICAN EDITION

The American translation of this book takes account of several corrections of
misprints and author's errors that were noticed by readers or by the author. It also
contains a supplement to the book written by A. M. Kagan and V. P. Palamodov,
expounding their important contributions published recently in "Teorija
Verojatnosteĭ i ee Primenenija". The answers to several questions raised at the end of
the book are provided by the supplement. This new material includes a consider-
able advance in the theory of nonsequentially verifiable functions, the construc-
tion of all randomized similar tests for the Behrens-Fisher problem, and important
progress in the estimation theory for incomplete exponential families, based on
the introduction into statistics of the elements of homological algebra (in particu-
lar, flat modules).
The analytical sheaf theorems on which a large part of the book is based are
replaced in the supplement by the Hörmander-Malgrange theory of linear differential
operators with constant coefficients. This theory enables us to solve problems
involving convex supports, rather than merely the polygonal ones discussed in the
book. Thus we can now construct all similar tests for a linear hypothesis with
unknown variances (least square method with unknown observation weights) and
for many other problems of testing hypotheses and unbiased estimation. The opti-
mization problems are thus reduced to purely analytic (variational) ones.
I am very grateful to the American Mathematical Society for publishing a
translation of my book with the supplement. It is my pleasant duty to thank S. H.
Gould and G. L. Walker for their interest in my book.

Ju. V. Linnik

TABLE OF CONTENTS

Preface

Preface to the American Edition

Introduction

Chapter I. The multiple Laplace transformation, functions of several
complex variables, and analytic sheaves
§1. The multiple Laplace transformation
§2. Functions of several complex variables. Theorems yielding bounds
§3. Ideals in rings of holomorphic functions. Analytic sheaves

Chapter II. Sufficient statistics and exponential families
§1. General information on sufficient statistics
§2. Examples of sufficient statistics
§3. Informational properties of sufficient statistics
§4. Sufficient statistics for a repeated sample. Exponential families
§5. Exponential families
§6. Sufficient statistics and unbiased estimates

Chapter III. Nuisance parameters. Tests with invariant power functions
§1. Nuisance parameters
§2. Tests with invariant power functions
§3. Some results dealing with tests with invariant power functions
§4. Stein's test

Chapter IV. Similar tests and statistics
§1. Similarity of tests and of statistics
§2. Neyman structures. Lehmann's and Scheffe's theorems
§3. Some methods of constructing similar zones
§4. Approximately similar zones
§5. Independent statistics
§6. Applications of a theorem of H. Cartan to the study of families
of statistics

Chapter V. Cotest ideals for exponential families
§1. Similar tests and cotest ideals
§2. Statement of the problem for incomplete exponential families
§3. Ideals of precotests
§4. Application of Cartan's theorems
§5. The behavior of smooth precotests
§6. Smoothing of precotests
§7. Formation of smooth precotests from a given one
§8. Formulation of the final results. Examples

Chapter VI. Wijsman's D-method
§1. The D-method and the conditions under which it can be applied
§2. Examples of application of the D-method

Chapter VII. Unbiased estimates
§1. Unbiased estimates for incomplete exponential families depending
on sufficient statistics
§2. On the behavior of the variance of unbiased estimates
§3. A theorem of S. R. Rao on the inadmissibility of certain estimates

Chapter VIII. Analytical methods of studying unrandomized tests.
Application to the Behrens-Fisher problem
§1. Questions of existence of unrandomized similar tests for incomplete
exponential families
§2. Statement of the problem of an unrandomized homogeneous similar
test in the Behrens-Fisher problem
§3. Homogeneous Fisher-Welch-Wald tests
§4. Lemmas on tangency of a test boundary to a critic
§5. Completion of the proof of theorem 8.3.1

Chapter IX. Randomized homogeneous tests in the Behrens-Fisher problem.
Characterization of tests of the Bartlett-Scheffe type
§1. Nonexistence of "null-regular" similar tests
§2. Bartlett-Scheffe tests
§3. A homogeneous randomized test associated with Bartlett's test
§4. Characterization of tests of the Bartlett-Scheffe type

Chapter X. An unrandomized homogeneous similar test in the Behrens-Fisher
problem
§1. Construction of an unrandomized test
§2. The Romanovskiĭ-Sudakov theorem

Chapter XI. The problem of many small samples
§1. Statement of the problem
§2. A. A. Petrov's investigations

Appendix

Supplement. New results in the theory of estimation and testing hypotheses
for problems with nuisance parameters
§1. Invariant verification of functions which are polynomials in a
and 1/a^2 for normal samples
§2. The description of all cotests for a class of exponential families
with polynomial relations
§3. Conditions of optimal unbiased estimation for incomplete exponential
families with polynomial relations
§4. The sample mean as the estimate of scale parameters
§5. Nonparametric approach to the estimation of location parameters
Bibliography to the Appendix

Bibliography


INTRODUCTION

§1. PROBABILITY MEASURES. INTEGRATION. STATISTICS

In the present section we shall review the basic material from measure theory,
which usually constitutes the foundation of the theory of statistical inference.
For the proofs of the corresponding theorems, we shall refer the reader to certain
well-known texts. We shall, however, give the proofs of the less widely used
theorems. We employ the usual set-theoretic notation.
As a rule, we consider a space $\mathcal{X}$ of elementary events. Along with this
space, we also consider the $\sigma$-algebra $\mathcal{A}$ of measurable subsets $A$ of
$\mathcal{X}$. The pair $(\mathcal{X}, \mathcal{A})$ is called a measurable space. Let us consider
a countably additive (though not necessarily finite) nonnegative set function
$\mu = \mu(A)$ defined on the sets $A \in \mathcal{A}$. We take $\mu(\emptyset) = 0$, where
$\emptyset$ denotes the empty set. If $\mu(\mathcal{X}) = 1$, then $\mu$ is a probability
measure and we shall usually denote it by the letter $P$. If $\mathcal{A}$ contains a
countable family of disjoint sets $A_1, A_2, \ldots$ such that $\mu(A_i) < \infty$ (for
$i = 1, 2, \ldots$) and $\bigcup_i A_i = \mathcal{X}$, then the measure $\mu$ is said to be
$\sigma$-finite. We shall usually make a given $\mu$ complete, without changing the
notation for it, by supplementing, if necessary, the $\sigma$-algebra $\mathcal{A}$ with all
subsets of sets of zero measure and assigning to them the measure 0. The most
important particular measures that we shall be using are Lebesgue measure and
"counting measure".
Lebesgue measure is defined for $\mathcal{X} = E_n$ (that is, $n$-dimensional Euclidean
space). Here the $\sigma$-algebra is composed of the Borel sets generated by all the
parallelepipeds $x_i \in (a_i, b_i]$ (for $i = 1, 2, \ldots, n$), where $x_1, \ldots, x_n$
are the coordinates of a point in $E_n$. This is the minimal $\sigma$-algebra containing
all such parallelepipeds. For these sets, the measure is simply their geometric
volume.
Counting measure is defined on a countable set $\mathcal{X}$. The $\sigma$-algebra
$\mathcal{A}$ is the family of all subsets of $\mathcal{X}$. For $A \in \mathcal{A}$, $\mu(A)$ is
defined as the number of elements of $A$, so that $\mu(A) = \infty$ if $A$ is an
infinite set.
Consider a measurable space $(\mathcal{X}, \mathcal{A})$. Let $\mathcal{T}$ denote another space,
and let $T$ denote a mapping of $\mathcal{X}$ into $\mathcal{T}$:

$$T: \mathcal{X} \to \mathcal{T}.$$

Let the space $\mathcal{T}$ also be provided with a $\sigma$-algebra $\mathcal{B}$ of measurable
subsets $B$. The mapping $T$ is said to be measurable if the pre-images under $T$ of
measurable sets are measurable, that is, if $B \in \mathcal{B}$ implies
$T^{-1}(B) \in \mathcal{A}$. On the measurable space $(\mathcal{T}, \mathcal{B})$, we customarily
define a measure $\nu$ by $\nu(B) = \mu(T^{-1}(B))$ for $B \in \mathcal{B}$.
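
As an illustration only, the induced measure $\nu(B) = \mu(T^{-1}(B))$ can be computed directly when the sample space is finite and every subset is measurable. The following minimal sketch assumes a measure given by point masses; the names `pushforward`, `measure`, and the two-dice example are hypothetical and are not taken from the book.

```python
from fractions import Fraction


def pushforward(mu, T):
    """Induced measure nu with nu({t}) = mu(T^{-1}({t})).

    mu is a dict of point masses on a finite sample space (an assumption
    made for this sketch); T is the statistic.
    """
    nu = {}
    for x, mass in mu.items():
        t = T(x)
        nu[t] = nu.get(t, Fraction(0)) + mass
    return nu


def measure(point_masses, subset):
    """Measure of a subset, obtained by summing point masses."""
    return sum((point_masses.get(x, Fraction(0)) for x in subset), Fraction(0))


# Hypothetical example: two fair dice, statistic T = sum of the faces.
mu = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
T = lambda x: x[0] + x[1]
nu = pushforward(mu, T)

B = {2, 3, 4}
preimage = {x for x in mu if T(x) in B}
# nu(B) agrees with mu(T^{-1}(B)) by construction.
print(measure(nu, B), measure(mu, preimage))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to rounding.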
When we interpret the original measurable space $(\mathcal{X}, \mathcal{A})$ as a sample
space, we shall call the measurable mapping $T: \mathcal{X} \to \mathcal{T}$ a statistic. In the
majority of the particular cases that we shall consider, $\mathcal{X}$ and $\mathcal{T}$ will be
Euclidean spaces, while the sample element in $\mathcal{X}$ and the value of the statistic
$T$ will be random vectors. In particular, if $\mathcal{T}$ is the real axis, the mapping
$T$ is a measurable function. Here we may consider ordinary integration with
respect to the measure $\mu$.
If $\mu$ is a measure on the measurable space $(\mathcal{X}, \mathcal{A})$ and $\varphi$ is a
nonnegative measurable function, then the expression

$$\nu(A) = \int_A \varphi \, d\mu$$

for $A \in \mathcal{A}$ defines a new measure $\nu$ on $(\mathcal{X}, \mathcal{A})$. We write
$\varphi = d\nu / d\mu$ and we call $\varphi$ the Radon-Nikodym derivative of the measure
$\nu$ with respect to $\mu$. Here, if $\nu(\mathcal{X}) = 1$, so that $\nu$ is a probability
measure, then $\varphi$ is called the probability density for $\nu$ with respect to
$\mu$. The function $\varphi$ is uniquely determined up to its values on sets of
measure zero.
Conditions for existence of the function $\varphi$ are given by the well-known
Radon-Nikodym theorem:
Theorem 0.1.1. Suppose that $\mu$ and $\nu$ are $\sigma$-finite measures over
$(\mathcal{X}, \mathcal{A})$. A necessary and sufficient condition for existence of the
Radon-Nikodym derivative of the measure $\nu$ with respect to $\mu$ is that $\nu$ be
absolutely continuous with respect to $\mu$, that is, that $\nu(A) = 0$ for all
$A \in \mathcal{A}$ such that $\mu(A) = 0$.
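
In the discrete case the theorem can be checked by hand: the density is the ratio of point masses wherever $\mu$ is positive, and absolute continuity fails exactly when $\nu$ charges a $\mu$-null point. The sketch below assumes finite point-mass measures; the function names `radon_nikodym` and `integrate` are hypothetical.

```python
from fractions import Fraction


def radon_nikodym(nu, mu):
    """Density phi = d(nu)/d(mu) for point-mass measures on a finite set.

    Raises ValueError when nu is not absolutely continuous with respect
    to mu, i.e. when nu charges a point of mu-measure zero.
    """
    phi = {}
    for x in set(mu) | set(nu):
        m = mu.get(x, Fraction(0))
        n = nu.get(x, Fraction(0))
        if m == 0:
            if n != 0:
                raise ValueError(f"nu is not absolutely continuous at {x!r}")
            phi[x] = Fraction(0)  # arbitrary value on a mu-null set
        else:
            phi[x] = n / m
    return phi


def integrate(phi, mu, A):
    """Integral of phi over A with respect to mu (a finite sum here)."""
    return sum((phi[x] * mu.get(x, Fraction(0)) for x in A), Fraction(0))


# Hypothetical example: mu is counting measure on {0, 1, 2, 3} and nu is
# the binomial distribution with n = 3, p = 1/2.
mu = {k: Fraction(1) for k in range(4)}
nu = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
phi = radon_nikodym(nu, mu)
print(integrate(phi, mu, {1, 2}))  # equals nu({1, 2})
```

With counting measure as $\mu$, the density is simply the probability mass function, which is the sense in which discrete distributions "have densities".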
In what follows the concept of a product measure $\rho = \mu \times \nu$ will be useful.
Suppose that $(\mathcal{X}, \mathcal{A}, \mu)$ and $(\mathcal{Y}, \mathcal{B}, \nu)$ are two measurable
spaces equipped with measures $\mu$ and $\nu$ respectively. Consider the Cartesian
product $\mathcal{X} \times \mathcal{Y}$ of $\mathcal{X}$ and $\mathcal{Y}$ and the product
$\mathcal{A} \times \mathcal{B}$ of the $\sigma$-algebras $\mathcal{A}$ and $\mathcal{B}$, this last product
meaning the minimal $\sigma$-algebra containing all sets $A \times B$ for
$A \in \mathcal{A}$ and $B \in \mathcal{B}$. Thus we obtain a measurable space
$(\mathcal{X} \times \mathcal{Y}, \mathcal{A} \times \mathcal{B})$. The product measure for this space is
defined as

$$\rho(A \times B) = \mu(A) \cdot \nu(B).$$

For probability measures this definition corresponds to the concept of independent
events.
In connection with this concept, we shall need the well-known theorem of
Fubini.
Theorem 0.1.2. Suppose that $\mu$ and $\nu$ are $\sigma$-finite measures over
measurable spaces $(\mathcal{X}, \mathcal{A})$ and $(\mathcal{Y}, \mathcal{B})$ respectively. Suppose that
$\rho = \mu \times \nu$. If a function $f(x, y)$ is integrable with respect to the
measure $\rho$, it is integrable with respect to $\mu$ for almost all (in the sense
of the measure $\nu$) values of $y$. Furthermore, the function
$\int_{\mathcal{X}} f(x, y) \, d\mu(x)$ is integrable with respect to $\nu$, and

$$\int_{\mathcal{X} \times \mathcal{Y}} f(x, y) \, d\rho(x, y) = \int_{\mathcal{Y}} d\nu(y) \int_{\mathcal{X}} f(x, y) \, d\mu(x). \tag{0.1.1}$$

Proofs of these two theorems can be found, for example, in the book by
Halmos [69].
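
For finite spaces, formula (0.1.1) reduces to interchanging two finite sums, which makes it easy to verify numerically. The sketch below assumes point-mass measures on small finite sets; `product_measure`, `double_integral`, and `iterated_integral` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product


def product_measure(mu, nu):
    """rho = mu x nu on point masses: rho({(x, y)}) = mu({x}) * nu({y})."""
    return {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}


def double_integral(f, rho):
    """Integral of f over the product space with respect to rho."""
    return sum((f(x, y) * w for (x, y), w in rho.items()), Fraction(0))


def iterated_integral(f, mu, nu):
    """Right-hand side of (0.1.1): integrate in x first, then in y."""
    def inner(y):
        return sum((f(x, y) * mu[x] for x in mu), Fraction(0))
    return sum((inner(y) * nu[y] for y in nu), Fraction(0))


# Hypothetical example with two small probability measures.
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
f = lambda x, y: x * y + y * y

rho = product_measure(mu, nu)
print(double_integral(f, rho), iterated_integral(f, mu, nu))
```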

§2. SOME PROPERTIES OF STATISTICS

Let us examine some properties of statistics that will be needed below. Our
exposition follows to some extent the well-known book by Lehmann [36].
Suppose that a statistic $T$ maps a measurable space $(\mathcal{X}, \mathcal{A})$ into a
measurable space $(\mathcal{T}, \mathcal{B})$. If $B \in \mathcal{B}$, then $T^{-1}(B) \in \mathcal{A}$, but
the family of measurable sets $\{T^{-1}(B)\} = \mathcal{A}_0$, although it is a subset of
the family $\mathcal{A}$, does not necessarily coincide with it. Obviously the family
$\mathcal{A}_0$ constitutes a $\sigma$-subalgebra, known as the $\sigma$-subalgebra induced by
the statistic $T$. Let us consider the measurable space $(\mathcal{X}, \mathcal{A}_0)$ and put
the measurable real functions on it in correspondence with the measurable real
functions on the measurable space $(\mathcal{T}, \mathcal{B})$. This enables us to consider the
$\sigma$-algebras $\mathcal{A}_0$ and $\mathcal{B}$ as in a certain sense equivalent with respect
to the statistic $T$.
Theorem 0.2.1. Suppose that a statistic $T: (\mathcal{X}, \mathcal{A}) \to (\mathcal{T}, \mathcal{B})$
induces a $\sigma$-subalgebra $\mathcal{A}_0$. Let $f$ denote a real $\mathcal{A}$-measurable
function. The function $f$ is $\mathcal{A}_0$-measurable if and only if there exists a
$\mathcal{B}$-measurable function $g$ such that

$$f(x) = g(T(x)) \tag{0.2.1}$$

for all $x$.
For the proof, see the book by Lehmann [36].
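
On a finite sample space in which every subset is measurable, measurability with respect to the induced $\sigma$-subalgebra amounts to being constant on each fibre $\{x : T(x) = t\}$, so the factorization (0.2.1) can be tested mechanically. The following sketch rests on that finiteness assumption; `factor_through` is a hypothetical name.

```python
def factor_through(f, T, points):
    """Try to write f(x) = g(T(x)) on a finite set of points.

    Returns g as a dict keyed by the values of T, or None when f takes
    two different values on some fibre of T (so no such g exists).
    """
    g = {}
    for x in points:
        t = T(x)
        value = f(x)
        if t in g and g[t] != value:
            return None
        g[t] = value
    return g


# Hypothetical example: the statistic T(x) = x^2 on {-3, ..., 3}.
points = range(-3, 4)
T = lambda x: x * x
even = lambda x: x ** 4 + 1   # depends on x only through x^2
odd = lambda x: x ** 3        # distinguishes x from -x

print(factor_through(even, T, points))
print(factor_through(odd, T, points))
```

The first call recovers $g(t) = t^2 + 1$ on the values of $T$; the second returns `None` because $x^3$ separates points that $T$ identifies.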
Another theorem that we shall find useful is

Theorem 0.2.2. Let $T: (\mathcal{X}, \mathcal{A}) \to (\mathcal{T}, \mathcal{B})$ denote a measurable
mapping, let $\mu$ denote a $\sigma$-finite measure over $(\mathcal{X}, \mathcal{A})$, let $g$ denote
a real measurable function of $t \in \mathcal{T}$, and let $\mu^*$ denote the measure
defined over $(\mathcal{T}, \mathcal{B})$ by

$$\mu^*(B) = \mu(T^{-1}(B)) \quad \text{for } B \in \mathcal{B}. \tag{0.2.2}$$

Then for arbitrary $B \in \mathcal{B}$,

$$\int_{T^{-1}(B)} g(T(x)) \, d\mu(x) = \int_B g(t) \, d\mu^*(t); \tag{0.2.3}$$

that is, if one of these integrals exists, so does the other, in which case they
coincide.
For the proof, see [36].

§ 3. ON CONDITIONAL PROBABILITIES

In this section we state without proof the information that we shall need
regarding conditional mathematical expectations and probabilities. Detailed proofs
can be found, for example, in [36] and [8].
Suppose that a measurable space $(\mathcal{X}, \mathcal{A})$ is equipped with a probability
measure $P$ and that a statistic $T$ maps $(\mathcal{X}, \mathcal{A})$ into $(\mathcal{T}, \mathcal{B})$,
inducing a $\sigma$-subalgebra $\mathcal{A}_0$. Let $f$ denote a nonnegative function. Suppose
that $f$ is $\mathcal{A}$-measurable and integrable with respect to the measure $P$. Then
the integrals $\int_A f \, dP$ exist for all $A \in \mathcal{A}$ and a fortiori for all
$A_0 \in \mathcal{A}_0$. It follows from the Radon-Nikodym Theorem (Theorem 0.1.1) that
there exists an $\mathcal{A}_0$-measurable function $f_0$ that satisfies the equation

$$\int_{A_0} f \, dP = \int_{A_0} f_0 \, dP \tag{0.3.1}$$

for all $A_0 \in \mathcal{A}_0$.
In accordance with Theorem 0.2.1, $f_0$ is a measurable function of $T(x)$. This
function has the following two important properties.
1) Equation (0.3.1) holds for an arbitrary set $A_0 \in \mathcal{A}_0$.
2) The function $f_0$ is a measurable function of $T(x)$.
On the basis of Theorem 0.1.1 these properties determine $f_0$ uniquely up to
its values on sets of $P$-measure 0. By definition, the function $f_0$ is taken as
the conditional mathematical expectation of $f(x)$ for a given value of the
statistic $T(x)$:

$$f_0 = E(f(x) \mid T(x)) = E(f(x) \mid T = t) = g(t). \tag{0.3.2}$$


Let $P^*(t)$ denote the measure induced by the statistic $T$ in the measurable
space $(\mathcal{T}, \mathcal{B})$. Then by virtue of Theorem 0.2.2,

$$\int_{T^{-1}(B)} f(x) \, dP(x) = \int_B g(t) \, dP^*(t) \tag{0.3.3}$$

for arbitrary $B \in \mathcal{B}$.
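
For a finite probability space, the function $g(t) = E(f(x) \mid T = t)$ is the $P$-weighted average of $f$ over the fibre $\{x : T(x) = t\}$, and relation (0.3.3) becomes an identity between finite sums. The sketch below assumes point-mass probabilities; `conditional_expectation` and the dice example are hypothetical.

```python
from fractions import Fraction


def conditional_expectation(f, T, P):
    """Return (g, P_star) for a finite probability space.

    g[t] is the conditional expectation of f given T = t, and P_star is
    the measure induced by T. Values t with P_star[t] = 0 are skipped,
    since g is determined only up to sets of measure zero.
    """
    weighted, P_star = {}, {}
    for x, p in P.items():
        t = T(x)
        weighted[t] = weighted.get(t, Fraction(0)) + f(x) * p
        P_star[t] = P_star.get(t, Fraction(0)) + p
    g = {t: weighted[t] / P_star[t] for t in P_star if P_star[t] > 0}
    return g, P_star


# Hypothetical example: two fair dice, f = first face, T = sum of faces.
P = {(i, j): Fraction(1, 36) for i in range(1, 7) for j in range(1, 7)}
T = lambda x: x[0] + x[1]
f = lambda x: x[0]
g, P_star = conditional_expectation(f, T, P)

# Both sides of (0.3.3) for the set B = {2, 7, 12}.
B = {2, 7, 12}
lhs = sum((f(x) * p for x, p in P.items() if T(x) in B), Fraction(0))
rhs = sum((g[t] * P_star[t] for t in B), Fraction(0))
print(g[7], lhs, rhs)
```

By symmetry between the two dice, the conditional expectation of the first face given the sum $t$ is $t/2$, which the computed `g` reproduces.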
To extend the definition of conditional mathematical expectation to functions
$f(x)$ of variable sign, we define

$$f^+(x) = \frac{|f(x)| + f(x)}{2}; \qquad f^-(x) = \frac{|f(x)| - f(x)}{2}.$$

Then $f(x) = f^+(x) - f^-(x)$. We now define

$$E(f(x) \mid t) = E[f^+(x) \mid t] - E[f^-(x) \mid t],$$

provided the two expressions on the right are meaningful and these quantities
exist. The basic properties of conditional mathematical expectations can be found
in the books [36] and [8]. We shall state two theorems that will be needed in our
study of sufficient statistics and exponential families.
In view of the definition given above for conditional mathematical expectation,
we can, in particular, take for the measurable function $f(x)$ the characteristic
function $I_A(x)$ of any set $A \in \mathcal{A}$, that is, the function that is equal to 1
on the set $A$ and equal to 0 on $\mathcal{X} \setminus A$.
The conditional probability of $A$ for a fixed value of the statistic $T = t$ is
defined as

$$P(A \mid t) = E(I_A(x) \mid t). \tag{0.3.4}$$

On the basis of (0.3.3), we have

$$P(A \cap T^{-1}(B)) = \int_{T^{-1}(B)} I_A(x) \, dP(x) = \int_B P(A \mid t) \, dP^*(t),$$

where $B \in \mathcal{B}$ is an arbitrary measurable set and $P^*(t)$ is, just as before,
the measure induced by the statistic $T$ in the measurable space $(\mathcal{T}, \mathcal{B})$.
We now turn to the important special case in which $\mathcal{X} = E_n$ is
$n$-dimensional Euclidean space and $\mathcal{A}$ is the family of Borel subsets of $E_n$.
We note that the more general case in which $\mathcal{X}$ is a Borel subset of $E_n$
reduces to this case since we can extend the probability measure to all of $E_n$ by
taking its value equal to 0 on $E_n \setminus \mathcal{X}$.
We have the following important theorems, for the proof of which see, for
example, [36].
Theorem 0.3.1. If $\mathcal{X}$ is $n$-dimensional Euclidean space, then there exist
definitions of conditional probabilities $P(A \mid t)$ such that $P(A \mid t)$ is a
probability measure on $(\mathcal{X}, \mathcal{A})$ for every $t$.
Let $x$ denote a random vector defined on $\mathcal{X} = E_n$, let
$T: (\mathcal{X}, \mathcal{A}) \to (\mathcal{T}, \mathcal{B})$ denote a statistic, and let $P(x \mid t)$ denote
the conditional distribution (determined by Theorem 0.3.1) of the random vector
$x$. From the preceding theorems we easily obtain
Theorem 0.3.2. If $f(x)$ is a measurable function of the random vector $x$ in
$\mathcal{X} = E_n$ and if $E|f(x)| < \infty$, then

$$E(f(x) \mid t) = \int f(x) \, dP(x \mid t). \tag{0.3.5}$$


CHAPTER I

THE MULTIPLE LAPLACE TRANSFORMATION,
FUNCTIONS OF SEVERAL COMPLEX VARIABLES,
AND ANALYTIC SHEAVES

§ 1. THE MULTIPLE LAPLACE TRANSFORMATION

A considerable portion of the present book is devoted to statistical problems
associated with exponential families of distributions. A natural analytical tool
in the theory of such families is the multiple bilateral Laplace transformation. In
the present section we shall state without proof the theorems in the theory of
such transformations that we shall need. Proofs either can be found in the books
[10] and [16] or can be obtained by obvious modifications in the reasoning in
those books.
Consider the $s$-fold Laplace transform

$$L(m \mid \theta) = \int_{-\infty}^{\infty} dT_1 \cdots \int_{-\infty}^{\infty} dT_s \, m(T_1, \ldots, T_s) \exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)], \tag{1.1.1}$$

where $m(T_1, \ldots, T_s)$ is a complex-valued function that is continuous almost
everywhere with respect to Lebesgue measure and $\theta_1, \ldots, \theta_s$ are complex
parameters such that

$$\theta_j = x_j + i y_j \quad (j = 1, 2, \ldots, s).$$
In many problems that we shall be studying, $m(T_1, \ldots, T_s)$ vanishes for
$T_j \le 0$, where $j$ is one of the numbers $1, 2, \ldots, s_1 \le s$. If $s_1 = s$, what
we have is a unilateral Laplace transformation. In the general case $s_1 < s$. We
shall assume that the function $m(T_1, \ldots, T_s)$ is such that $L(m \mid \theta)$
converges absolutely in the Cartesian product $P$ of the $s_1$ half-planes
$R_j: x_j > 0$ (where $j = 1, 2, \ldots, s_1$) and the $s - s_1$ strips
$S_j: 0 < x_j \le A_j$ (for $j = s_1 + 1, \ldots, s$). (We can reduce arbitrary strips to
this last type by a linear transformation of the parameters and variables.) Let
$m_1$ and $m_2$ denote two functions that have Laplace transforms of the type
described. We have the following important convolution theorem.
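Theorem 1.1.1.

$$L(m_1 * m_2 \mid \theta) = L(m_1 \mid \theta) \, L(m_2 \mid \theta), \tag{1.1.2}$$

where

$$m_1 * m_2 = \int_0^{T_1} d\xi_1 \cdots \int_0^{T_{s_1}} d\xi_{s_1} \int_{-\infty}^{\infty} d\xi_{s_1+1} \cdots \int_{-\infty}^{\infty} d\xi_s \, m_1(\xi_1, \ldots, \xi_s) \, m_2(T_1 - \xi_1, T_2 - \xi_2, \ldots, T_s - \xi_s).$$

Here $L(m_1 * m_2 \mid \theta)$ converges absolutely in a region of the type described above.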
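
The convolution theorem can be checked numerically in the simplest unilateral case $s = s_1 = 1$. The sketch below is only an illustration under stated assumptions: the functions $m_1(T) = e^{-T}$ and $m_2(T) = e^{-2T}$ for $T \ge 0$ are chosen for this example, the integrals are truncated at a finite upper limit, and the helper names `simpson`, `laplace`, and `convolve` are hypothetical.

```python
import math


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def laplace(m, theta, upper=40.0, n=2000):
    """Unilateral transform of m at real theta, truncated at `upper`."""
    return simpson(lambda t: m(t) * math.exp(-theta * t), 0.0, upper, n)


def convolve(m1, m2, n=200):
    """Unilateral convolution: the integral of m1(xi) * m2(T - xi) over [0, T]."""
    def conv(t):
        if t <= 0:
            return 0.0
        return simpson(lambda xi: m1(xi) * m2(t - xi), 0.0, t, n)
    return conv


# Assumed example functions (not from the book), both vanishing for T < 0.
m1 = lambda t: math.exp(-t) if t >= 0 else 0.0
m2 = lambda t: math.exp(-2.0 * t) if t >= 0 else 0.0

theta = 1.5
left = laplace(convolve(m1, m2), theta)
right = laplace(m1, theta) * laplace(m2, theta)
print(left, right)
```

For these two functions the transforms are $1/(\theta+1)$ and $1/(\theta+2)$, so both printed numbers should be close to $1/((\theta+1)(\theta+2))$, up to quadrature and truncation error.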


We have the following theorem on the inverse transformation.
Theorem 1.1.2. If a point $(c_1, \ldots, c_s)$ belongs to $P$ and if the integral

$$\frac{1}{(2\pi i)^s} \int_{c_1 - i\infty}^{c_1 + i\infty} d\theta_1 \cdots \int_{c_s - i\infty}^{c_s + i\infty} d\theta_s \, L(m \mid \theta) \exp(\theta_1 T_1 + \cdots + \theta_s T_s) \tag{1.1.3}$$

over the product of the vertical contours converges absolutely, it is equal to the
function $m(T_1, \ldots, T_s)$ at all its points of continuity.
Since $L(m \mid \theta)$ is holomorphic in the region $P$, every contour in the product
can be deformed in a rather arbitrary manner by replacing it with rectifiable curves
of a type that is convenient in some respect or other. In particular, if
$L(m \mid \theta)$ satisfies the inequality

$$|L(m \mid \theta)| < \frac{C_0}{[(|\theta_1| + 1) \cdots (|\theta_s| + 1)]^r} \tag{1.1.4}$$

for $r > 1$, $\theta \in P$ and $C_0 = \text{const.}$, then by translating the contours for
the integrations with respect to $\theta_1, \ldots, \theta_{s_1}$ to the right, we see that

$$m(T_1, \ldots, T_s) = 0 \quad \text{for } T_1 < 0, \ldots, T_{s_1} < 0.$$

We note also that for positive integers $k_1, \ldots, k_s$ the fraction
$1 / (\theta_1^{k_1} \cdots \theta_s^{k_s})$ is the unilateral Laplace transform of

$$\frac{1}{(k_1 - 1)! \cdots (k_s - 1)!} \, T_1^{k_1 - 1} \cdots T_s^{k_s - 1}.$$
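
Since the $s$-fold unilateral transform of a product of one-variable functions factors into one-dimensional transforms, the correspondence above can be checked one variable at a time: the transform of $T^{k-1}/(k-1)!$ should equal $1/\theta^k$ for $\operatorname{Re} \theta > 0$. The following sketch does this by numerical quadrature; the truncation limit, the step count, and the names `simpson`, `laplace_unilateral`, and `power_kernel` are assumptions of the example.

```python
import cmath
from math import factorial


def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def laplace_unilateral(m, theta, upper=40.0, n=4000):
    """Integral of m(T) * exp(-theta * T) over [0, upper]; theta may be complex."""
    return simpson(lambda t: m(t) * cmath.exp(-theta * t), 0.0, upper, n)


def power_kernel(k):
    """The function T^(k-1) / (k-1)! for a positive integer k."""
    return lambda t: t ** (k - 1) / factorial(k - 1)


# Real and complex parameters with positive real part.
for k, theta in [(1, 2.0), (3, 2.0), (3, 2.0 + 3.0j)]:
    numeric = laplace_unilateral(power_kernel(k), theta)
    exact = 1.0 / theta ** k
    print(k, theta, abs(numeric - exact))
```

The printed differences should be small; they reflect only the Simpson-rule error and the truncation of the integral at `upper`.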

Theorem 1.1.3. Let $E(\theta_1, \ldots, \theta_s)$ denote a function that is holomorphic in
a region $P$. Suppose that for arbitrary $\eta > 0$ and $c_j \ge \eta$ (where
$j = 1, 2, \ldots, s$) the inequality
$$|E(\theta_1, \ldots, \theta_s)| < \frac{K(\eta)}{(|\theta_1| + 1)^r \cdots (|\theta_s| + 1)^r} \quad (r \ge 3) \tag{1.1.5}$$

holds on the vertical contours (mentioned in Theorem 1.1.2) passing through the
region $P$. Then $E(\theta_1, \ldots, \theta_s)$ is the Laplace transform $L(H \mid \theta)$, where
$H = H(T_1, \ldots, T_s)$ vanishes for $T_j < 0$ (for $j = 1, 2, \ldots, s_1$) and has
partial derivatives of the first $r - 2$ orders. Furthermore,
(1.1.6)

where $\epsilon > 0$ may be arbitrarily small. Here and in what follows the $K_i$ are
positive constants.
To prove this, we substitute the function $E(\theta_1, \ldots, \theta_s)$ for
$L(m \mid \theta)$ in formula (1.1.3). By virtue of inequality (1.1.5), the integral
converges absolutely. Denoting it by $H(T_1, \ldots, T_s)$, we obtain a function
satisfying inequality (1.1.6). If one of the variables $T_1, T_2, \ldots, T_{s_1}$ is
negative, we find, by increasing the abscissa $c_j$ of the corresponding contour,
that $H(T_1, \ldots, T_s) = 0$. Specifically, the integrand in (1.1.3) contains the
factor $\exp(\theta_1 T_1 + \cdots + \theta_s T_s)$, which converges uniformly to 0 as the
abscissa $c_j$ increases. Furthermore, by differentiating formula (1.1.3) formally
with respect to $T_1, \ldots, T_s$, we obtain the partial derivatives of orders up to
$r - 2$. By virtue of inequality (1.1.5), the resulting integrals converge
absolutely, so that the corresponding derivatives of $H(T_1, \ldots, T_s)$ exist.
We still need a theorem on the Laplace transforms of functions that vanish
outside a finite interval (that is, functions of compact support).
Theorem 1.1.4. Suppose that the function $\delta(T_1, \ldots, T_s)$ has partial
derivatives of the first $r$ ($\ge 1$) orders almost everywhere and that $\delta$
vanishes outside a finite interval $[0, b]$. Then for arbitrary
$(\theta_1, \ldots, \theta_s)$ ($\operatorname{Re} \theta_j > 0$),

$$L(\delta \mid \theta) = O\left( \frac{1}{(|\theta_1| + 1)^r \cdots (|\theta_s| + 1)^r} \right). \tag{1.1.7}$$


To prove this we note that here the Laplace transform is unilateral.
Integrating by parts, we obtain

$$L(\delta \mid \theta) = \int_0^b dT_1 \cdots \int_0^b dT_s \, \delta(T_1, \ldots, T_s) \exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]$$
$$= \theta_1^{-1} \int_0^b dT_1 \cdots \int_0^b dT_s \, \frac{\partial \delta}{\partial T_1}(T_1, \ldots, T_s) \exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)].$$

Repeating this operation $r$ times with respect to the corresponding variables, we
obtain inequality (1.1.7). We note that for $\theta_j$ close to 0 this operation is not
suitable. It should be applied to those variables $T_j$ for which $|\theta_j| \ge 1$. If
every $|\theta_j| \le 1$, then inequality (1.1.7) is trivial.


We shall say that the function $E(\theta_1, \ldots, \theta_s)$ is holomorphic in a
neighborhood of the point at infinity $(\theta_1 = \infty, \ldots, \theta_s = \infty)$ if it can
be represented as a power series in $p_1 = 1/\theta_1, \ldots, p_s = 1/\theta_s$ that
converges in a neighborhood of $(0, \ldots, 0)$.
Theorem 1.1.5. If the function $E(\theta_1, \ldots, \theta_s)$ is holomorphic in a
neighborhood of the point at infinity $(\theta_1 = \infty, \ldots, \theta_s = \infty)$ and if it
vanishes there, then it is the unilateral Laplace transform
$E(\theta_1, \ldots, \theta_s) = L(H \mid \theta)$, where the function $H = H(T_1, \ldots, T_s)$ is
holomorphic for $T_1 > 0, \ldots, T_s > 0$.
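Suppose that

$$E(\theta_1, \ldots, \theta_s) = \sum_{n_1, \ldots, n_s = 1}^{\infty} a_{n_1, \ldots, n_s} \, \theta_1^{-n_1} \cdots \theta_s^{-n_s}$$

and that the series, regarded as a power series in $p_1, \ldots, p_s$, converges in a
neighborhood of $(0, \ldots, 0)$. By virtue of the correspondence mentioned above,

$$\theta_1^{-n_1} \cdots \theta_s^{-n_s} = L\left( \frac{T_1^{n_1 - 1} \cdots T_s^{n_s - 1}}{(n_1 - 1)! \cdots (n_s - 1)!} \,\Big|\, \theta \right).$$

Multiplying these expressions by $a_{n_1, \ldots, n_s}$ and summing, we obtain an
everywhere-holomorphic function under the symbol $L(H \mid \theta)$. We need to
consider it only for $T_1 > 0, \ldots, T_s > 0$.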

§2. FUNCTIONS OF SEVERAL COMPLEX VARIABLES.
THEOREMS YIELDING BOUNDS

We saw above that functions of the form $L(m \mid \theta)$ are holomorphic functions
of several complex variables. We shall use certain concepts and theorems from
the theory of such functions. These can be found in the monographs [5], [55],
[75], and [76]. In particular, we shall need the concept of a holomorphic function
in a region of superposition.
Let us denote by $C^s$ the Cartesian product of $s$ copies of the Euclidean
complex plane. A domain of holomorphy is defined as an open region $D \subset C^s$ for
which there exists a function $f(\theta_1, \ldots, \theta_s)$ that is holomorphic in $D$ but
does not have an analytic continuation to any open region of which $D$ is a proper
subset. In particular, the polycylinders $Z_1 \times Z_2 \times \cdots \times Z_s$, where the
components $Z_j$ (for $j \le s$) are open simply-connected regions of the complex
variables $x_j + i y_j$, are domains of holomorphy. In what follows, we shall be
concerned primarily with such regions.

Suppose that the Cartesian product $I_1 \times \cdots \times I_s$ of real intervals
$I_1, \ldots, I_s$ of positive length is contained in the interior of the polycylinder
$Z_1 \times \cdots \times Z_s$. Then we have
Theorem 1.2.1. If a function that is holomorphic in $Z_1 \times \cdots \times Z_s$
vanishes at all points of the set $I_1 \times \cdots \times I_s$, then it also vanishes
everywhere in $Z_1 \times \cdots \times Z_s$.
Let $f_1(\theta_1, \dots, \theta_s), \dots, f_r(\theta_1, \dots, \theta_s)$ denote $r$ holomorphic functions
defined in the polycylinder $Z_1 \times \dots \times Z_s$. Consider the system of equations

$$f_1 = 0, \ \dots, \ f_r = 0. \tag{1.2.1}$$

The set of solutions $(\theta_1, \dots, \theta_s)$ of these equations inside $Z_1 \times \dots \times Z_s$ is
called the analytic set generated by the system (1.2.1). Let us denote this set by
$V_{f_1, \dots, f_r}$. Suppose that the functions $f_j(\theta_1, \dots, \theta_s)$ for $j = 1, 2, \dots, r$ have
real values on the real axes. Furthermore, suppose that $r < s$ and that the analytic
set $V_{f_1, \dots, f_r}$ inside the polycylinder $Z_1 \times \dots \times Z_s$ with bounded simply connected
components $Z_j$ (for $j = 1, 2, \dots, s$) can be decomposed into a finite number
of components $V^q_{f_1, \dots, f_r}$, where $q = 1, 2, \dots, M$, that are connected along
strips,1) each of these being of complex dimension $s - r$, and suppose that each
component $V^q_{f_1, \dots, f_r}$ contains a connected set $R^q_{f_1, \dots, f_r}$ of real points of real
dimension $s - r$.
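Theorem 1.2.2. Suppose that each $R^q_{f_1, \dots, f_r}$ has an interior point at which

$$\operatorname{rank} \begin{Vmatrix} \dfrac{\partial f_1}{\partial \theta_1} & \cdots & \dfrac{\partial f_1}{\partial \theta_s} \\ \vdots & & \vdots \\ \dfrac{\partial f_r}{\partial \theta_1} & \cdots & \dfrac{\partial f_r}{\partial \theta_s} \end{Vmatrix} = r. \tag{1.2.2}$$

Then every function $F(\theta_1, \dots, \theta_s)$ that is holomorphic in $Z_1 \times \dots \times Z_s$ and
vanishes on $R^q_{f_1, \dots, f_r}$ (where $q = 1, \dots, M$) also vanishes on the entire analytic
set $V_{f_1, \dots, f_r}$.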
To prove this we note that, on the basis of (1.2.2) and the continuity of the
matrix on the left-hand side of that equation, we can express r of the variables

1) That is, any two points of a component can be connected by an open strip of
maximum dimension in it.

$\theta_1, \dots, \theta_s$ in a neighborhood of the point inside $R^q_{f_1, \dots, f_r}$ in question in terms
of $f_1, \dots, f_r$. Let us suppose that this can be done for the parameters $\theta_1, \dots, \theta_r$.
Then in that neighborhood

$$F(\theta_1, \dots, \theta_s) = F_1(f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s),$$

where $F_1$ is holomorphic in that neighborhood. Furthermore,
$F_1(0, \dots, 0, \theta_{r+1}, \dots, \theta_s) = 0$ for some set of values $\theta_{r+1}, \dots, \theta_s$ of real dimension $s - r$.
Then the same is true of some complex neighborhood of this set and $F(\theta_1, \dots, \theta_s)$
will vanish in that $(s - r)$-dimensional neighborhood. This last follows from
standard theorems in the theory of functions of several complex variables, in
particular from Theorem 1.2.1.
We shall also need certain special theorems providing bounds. We present
these along with their proofs. They depend on other theorems of the same type,
which appear in the literature on the subject. In particular, we shall need the
following theorem proved by Łojasiewicz in 1959 in the article [52].
Theorem 1.2.3. Let $f(x_1, \dots, x_n)$ denote a holomorphic function of $n$ real
variables defined on an open set $U \subset E_n$. Let $X$ denote the set of real zeros of
$f$ and suppose that $X$ is not empty. Let $A$ denote a compact subset of $U$. Then
there exist positive constants $c$ and $q$ such that for all $x = (x_1, \dots, x_n) \in A$,

$$|f(x_1, \dots, x_n)| \ge c\,(\rho(x, X))^q,$$

where $\rho(x, X)$ is the distance from $x$ to $X$ in $E_n$.
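
As an informal numerical illustration of the inequality in Theorem 1.2.3 (not part of the original argument), the following Python sketch takes the hypothetical one-variable example $f(x) = x^2(x - 1)$ on the compact set $A = [-2, 2]$, whose zero set is $X = \{0, 1\}$. Since $|x| \ge \rho(x, X)$ and $|x - 1| \ge \rho(x, X)$, the bound holds here with $c = 1$ and $q = 3$, whereas the exponent $q = 1$ fails near the double zero at $0$. The function names and the grid are assumptions made for this sketch.

```python
def f(x):
    # Hypothetical example function with real zero set X = {0, 1}.
    return x * x * (x - 1.0)

ZEROS = (0.0, 1.0)

def dist_to_zeros(x):
    # rho(x, X): distance from x to the zero set X.
    return min(abs(x - z) for z in ZEROS)

def min_ratio(q, n=4000, lo=-2.0, hi=2.0):
    # Smallest value of |f(x)| / rho(x, X)**q over a uniform grid on
    # A = [lo, hi], skipping grid points that lie exactly on X.
    best = float("inf")
    for k in range(n + 1):
        x = lo + (hi - lo) * k / n
        rho = dist_to_zeros(x)
        if rho > 0.0:
            best = min(best, abs(f(x)) / rho**q)
    return best

if __name__ == "__main__":
    print("q = 3:", min_ratio(3))  # stays bounded away from zero
    print("q = 1:", min_ratio(1))  # becomes small near the double zero x = 0
```

The grid minimum only suggests the behavior; the inequality itself is guaranteed by the theorem, not by the computation.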


From this theorem we can easily derive a corollary on functions of several
complex variables. Let $f(\theta_1, \dots, \theta_s)$ denote a function of $s$ complex variables
$\theta_1, \dots, \theta_s$ that is holomorphic in an open subset $U$ of $E_s$, where $E_s$ is
$s$-dimensional complex space. If we set $\theta_j = x_j + i y_j$, we obtain
$f(\theta_1, \dots, \theta_s) = \phi_1(x, y) + i \phi_2(x, y)$, where $\phi_1$ and $\phi_2$ are real functions of the variables
$x_1, \dots, x_s, y_1, \dots, y_s$, both holomorphic in $U$ (see [5]). If $X$ denotes the set of
complex zeros of $f(\theta_1, \dots, \theta_s)$ in $U$, then $X$ is also the set of real zeros of the
function $\phi = \phi_1^2 + \phi_2^2$, which is a real function in $U$. If $A$ is a compact subset
of $U$, then Theorem 1.2.3 provides the corollary that we need: there exist constants
$c = c(A) > 0$ and $q = q(A) > 0$ such that for all $\theta = (\theta_1, \dots, \theta_s) \in A$,

$$|f(\theta_1, \dots, \theta_s)| \ge c\,(\rho(\theta, X))^q. \tag{1.2.3}$$

Let $Z = Z_1 \times \dots \times Z_s$ denote a polycylinder with open bounded simply-connected
components and let $\bar{Z}$ denote its closure.

If a function $Q(\theta_1, \dots, \theta_s)$ is holomorphic on $\bar{Z}$, then it is holomorphic in
some open set $U$ containing $\bar{Z}$. Suppose that the set $X$ of complex zeros of
$Q(\theta_1, \dots, \theta_s)$ has no points inside $Z$. Then we can apply (1.2.3) with $A = \bar{Z}$,
so that
( 1.2.4)

We turn now to the theorems on bounds that we shall need. The letters $K_0$,
$K_1, \dots$ will denote positive constants. Let $Z$ denote an open polycylinder of the
type described and let $\bar{Z}$ denote its closure. Let $f_1(\theta_1, \dots, \theta_s)$ denote a function
that is holomorphic on $\bar{Z}$ and let $F(\theta_1, \dots, \theta_s)$ denote a function that is
holomorphic in $Z$. Suppose also that, throughout $Z$,

(1.2.5)

Theorem 1.2.4. Suppose that there exists a constant $M$ such that
$|F(\theta_1, \dots, \theta_s)| \le M$ everywhere in $Z$, and that
$\operatorname{rank} \| \partial f_1/\partial \theta_1, \dots, \partial f_1/\partial \theta_s \| = 1$.
Then everywhere inside $Z$,
(1.2.6)

where $\delta$ is the distance from $(\theta_1, \dots, \theta_s)$ to the boundary of $Z$.
Proof. By the hypothesis of the theorem, the polycylinder $Z$ can be covered
by a finite number of open circular polycylinders $U_1, \dots, U_k$ such that in each
of them,

$$F(\theta_1, \dots, \theta_s) = F_1(f_1, \theta_{i_2}, \dots, \theta_{i_s}) = f_1\, G_1(f_1, \theta_{i_2}, \dots, \theta_{i_s}). \tag{1.2.7}$$

Here $(i_2, \dots, i_s)$ is a sample of $s - 1$ numbers of the set $1, 2, \dots, s$;
$G_1 = G_1(\theta_1, \dots, \theta_s)$ for $f_1 = f_1(\theta_1, \dots, \theta_s)$; the functions $F_1$ and $G_1$ are
holomorphic in the images $V_j$ of the polycylinders $U_j$ under the mapping

$$\varphi\colon (\theta_1, \dots, \theta_s) \to (f_1, \theta_{i_2}, \dots, \theta_{i_s}).$$

Consider a point $(\theta_1, \dots, \theta_s) \in U_j \cap Z$ and suppose that the mapping $\varphi$
assigns to it the point $(f_1, \theta_{i_2}, \dots, \theta_{i_s})$. If the distance from the point
$(\theta_1, \dots, \theta_s)$ to the boundary of $Z$ is equal to $\delta$, then it follows from inequality
(1.2.4) and the familiar properties of Jacobians that the distance from
$(f_1, \theta_{i_2}, \dots, \theta_{i_s})$ to the corresponding boundary is $\gamma_j \delta^q$, where $q$ is a positive
constant and $\gamma_j$ is a positive function bounded above and below by constants
depending only on $j$ and, since $j \le k$, by certain absolute constants. If the value
of $f_1$ satisfies the inequality $|f_1| > \gamma_j \delta^q / 10$, we obtain from (1.2.7)

$$|G_1(f_1, \theta_{i_2}, \dots, \theta_{i_s})| < \frac{10 M}{\gamma_j \delta^q} < \frac{K_2 M}{\delta^q}. \tag{1.2.8}$$

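On the other hand if $|f_1| \le \gamma_j \delta^q / 10$, then the circle $C_{\gamma_j}$: $|f_1' - f_1| = \gamma_j \delta^q / 5$ lies
inside the image $\varphi(U_j \cap Z)$ and from Cauchy's theorem we have

$$G_1(f_1, \theta_{i_2}, \dots, \theta_{i_s}) = \frac{1}{2\pi i} \oint_{C_{\gamma_j}} \frac{F_1(f_1', \theta_{i_2}, \dots, \theta_{i_s})}{(f_1' - f_1)\, f_1'}\, df_1',$$

so that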

Combining this inequality with (1.2.8), we obtain the proof of the theorem.
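
The Cauchy-integral step of the proof can be checked numerically in one variable. The sketch below is only an illustration under stated assumptions: it takes the hypothetical choice $f_1 = w$ and $G_1(w) = e^w$, so that $F = w e^w$, and recovers $G_1$ at a point where $|f_1|$ is small from the values of $F$ on a circle around that point. On a circle of radius $\rho$ about $w_0$ with $|w_0| \le \rho/2$, the triangle inequality gives $|w| \ge \rho/2$, which is the mechanism by which a bound on $F$ yields a bound on $G_1$. The names `recover_G` and `F` are invented for the example.

```python
import cmath

def recover_G(F, w0, radius, n=256):
    # Approximates (1 / (2*pi*i)) * integral over |w - w0| = radius of
    # F(w) / ((w - w0) * w) dw by the trapezoidal rule. On the circle,
    # dw / (w - w0) = i dphi, so the integral is the mean of F(w) / w.
    total = 0.0 + 0.0j
    for k in range(n):
        w = w0 + radius * cmath.exp(2j * cmath.pi * k / n)
        total += F(w) / w
    return total / n

def F(w):
    # Hypothetical example: F = f1 * G1 with f1 = w and G1 = exp(w).
    return w * cmath.exp(w)

if __name__ == "__main__":
    w0 = 0.05 + 0.02j          # a point where |f1| is small
    approx = recover_G(F, w0, 0.2)
    print(abs(approx - cmath.exp(w0)))  # discrepancy from the exact G1(w0)
```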
A modification of Theorem 1.2.4 that will be important in what follows is a
theorem dealing with the case in which the polycylinder $Z$ is the Cartesian product
of $s$ open vertical strips $S_1 \times \dots \times S_s$.

Theorem 1.2.5. Suppose that the polycylinder $Z$ is the product of vertical
strips $S_j$: $0 < \operatorname{Re} \theta_j < \eta_j$, $0 < \eta_j \le \infty$, where $j = 1, 2, \dots, s$. Suppose that
equation (1.2.5) holds and that $|F(\theta_1, \dots, \theta_s)| \le M$ inside $Z$. Suppose that
$f_1(\theta_1, \dots, \theta_s)$ is a holomorphic function of the variables $1/(\theta_1 + 1), \dots, 1/(\theta_s + 1)$
on the closure of $Z$. Then

0(01, · .. , 0s)I <: ~~~ (1011 + .. · + 10sl + l)K2, (1.2.9)

where $\delta$ is the distance from $(\theta_1, \dots, \theta_s)$ to the boundary of $Z$.


To prove this we note that the mapping

$$\rho_1 = \frac{1}{\theta_1 + 1}, \ \dots, \ \rho_s = \frac{1}{\theta_s + 1}$$

maps $Z$ into the Cartesian product of the "half-moons" tangent to vertical lines,
so that we obtain an open polycylinder $\Delta$. Here, the points $(\theta_1, \dots, \theta_s)$, where
$\theta_j = \infty$, are mapped into points at which $\rho_j = 0$. The distances from these points to
the new boundary can be expressed in terms of $\delta$ and the quantities $|\theta_j + 1|$, for
$j = 1, 2, \dots, s$. Thus the theorem reduces to the preceding one.
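
The geometry of this change of variable can be made concrete in one coordinate. The following sketch, offered only as an illustration, uses the elementary fact that $w = 1/z$ carries the line $\operatorname{Re} z = a > 0$ onto the circle $|w - 1/(2a)| = 1/(2a)$, so the strip $0 < \operatorname{Re} \theta < \eta$ goes onto the region between two circles that touch the imaginary axis at $0$. The function names and the sample value $\eta = 2$ are assumptions of the example.

```python
def rho(theta):
    # The change of variable used in the proof: rho = 1 / (theta + 1).
    return 1.0 / (theta + 1.0)

def in_half_moon(p, eta):
    # Image of the strip 0 < Re(theta) < eta under rho: inside the circle
    # |p - 1/2| = 1/2 and outside the circle |p - c| = c, c = 1/(2*(1+eta)).
    c = 1.0 / (2.0 * (1.0 + eta))
    return abs(p - 0.5) < 0.5 and abs(p - c) > c

if __name__ == "__main__":
    eta = 2.0
    for theta in (0.5 + 3j, 1.0 - 7j, 2.5 + 1j, -0.5 + 1j):
        # The first two points lie in the strip, the last two do not.
        print(theta, in_half_moon(rho(theta), eta))
```

Points with large $|\theta|$ inside the strip land near $\rho = 0$, the common point of tangency, which is why the distance to the new boundary involves $|\theta_j + 1|$.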
A generalization of equation (1.2.5) is the equation

$$F(\theta_1, \dots, \theta_s) = f_1(\theta_1, \dots, \theta_s)\, G_1(\theta_1, \dots, \theta_s) + \dots + f_r(\theta_1, \dots, \theta_s)\, G_r(\theta_1, \dots, \theta_s), \tag{1.2.10}$$

where (in the previous notation) the functions $f_1, \dots, f_r$ are holomorphic on $\bar{Z}$

and the functions $F, G_1, \dots, G_r$ are holomorphic in $Z$. Here $Z$ is a polycylinder
with open connected bounded components and $2 \le r < s$.
The functions $G_1, \dots, G_r$ are not uniquely defined. To these functions we
can add arbitrary functions $A_1, \dots, A_r$ that are holomorphic in $Z$ and that satisfy
the condition

$$A_1 f_1 + \dots + A_r f_r = 0.$$
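
The non-uniqueness just described is easy to see on a small example. The sketch below is a hedged illustration with hypothetical choices throughout ($r = 2$, $s = 3$, $f_1 = \theta_1$, $f_2 = \theta_2$, and invented $G_1$, $G_2$, $h$): taking $A_1 = f_2 h$ and $A_2 = -f_1 h$ for any holomorphic $h$ gives $A_1 f_1 + A_2 f_2 = 0$, so adding them to $G_1, G_2$ leaves $F$ unchanged.

```python
import cmath

# Hypothetical example with r = 2, s = 3; t = (theta1, theta2, theta3).
def f1(t): return t[0]
def f2(t): return t[1]

def G1(t): return cmath.exp(t[2])
def G2(t): return t[0] * t[2]

# A1*f1 + A2*f2 = 0 identically when A1 = f2*h and A2 = -f1*h.
def h(t): return 1.0 + t[2] ** 2
def A1(t): return f2(t) * h(t)
def A2(t): return -f1(t) * h(t)

def F(t):
    # F written with the original multipliers G1, G2.
    return f1(t) * G1(t) + f2(t) * G2(t)

def F_shifted(t):
    # The same F written with the modified multipliers G1 + A1, G2 + A2.
    return f1(t) * (G1(t) + A1(t)) + f2(t) * (G2(t) + A2(t))

if __name__ == "__main__":
    point = (0.3 + 0.1j, -0.2 + 0.4j, 0.5j)
    print(abs(F(point) - F_shifted(point)))  # zero up to rounding
```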
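To obtain inequalities analogous to (1.2.4) we must make the comparatively
stringent requirement that there exist variables $\theta_{\alpha_1}, \theta_{\alpha_2}, \dots, \theta_{\alpha_r}$ such that

$$\begin{vmatrix} \dfrac{\partial f_1}{\partial \theta_{\alpha_1}} & \cdots & \dfrac{\partial f_r}{\partial \theta_{\alpha_1}} \\ \vdots & & \vdots \\ \dfrac{\partial f_1}{\partial \theta_{\alpha_r}} & \cdots & \dfrac{\partial f_r}{\partial \theta_{\alpha_r}} \end{vmatrix} \ne 0 \tag{1.2.11}$$

in the polycylinder $Z$.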
Theorem 1.2.6. Suppose that the functions $f_1, \dots, f_r$ have a common zero in
the region $Z$, that equation (1.2.10) holds, and that $|F(\theta_1, \dots, \theta_s)| \le M$ in the
region $Z$. Then equation (1.2.10) remains valid if we replace the functions
$G_1, \dots, G_r$ with functions $G_1', \dots, G_r'$ such that

$$|G_i'(\theta_1, \dots, \theta_s)| < \frac{K_4 M}{\delta^{K_5}} \qquad (i = 1, 2, \dots, r),$$

where $\delta$ is the distance from the point $(\theta_1, \dots, \theta_s)$ to the boundary of $Z$.
Proof. We use induction. The theorem is already proved for $r = 1$. Let us
suppose it true for $1, 2, \dots, r - 1$. Let $\Gamma$ denote the analytic set of common
zeros of $f_1, \dots, f_r$ inside $Z$. By hypothesis $\Gamma$ is nonempty.
Let us number the functions $f_1, \dots, f_r$ and the variables $\theta_{\alpha_1}, \dots, \theta_{\alpha_r}$ in
such a way that the corner minors of orders $1, 2, \dots, r$ in the matrix $\| \partial f_i / \partial \theta_j \|$
($i, j = 1, 2, \dots, r$) are nonsingular in $Z$. The variables $\theta_1, \dots, \theta_r$ can be
expressed as functions $\theta_i = \theta_i(f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s)$ that are holomorphic in the
domain of superposition which is obtained by representing $Z$ in terms of the
functions $f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s$, and $\varphi$, where

$$\varphi\colon (\theta_1, \dots, \theta_s) \to (f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s).$$

The functions $G_j(\theta_1, \dots, \theta_s)$ can also be represented in this region as holomorphic
functions $G_j(f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s)$, where $j = 1, 2, \dots, r$. Consider a point
$(\theta_1^{(0)}, \dots, \theta_s^{(0)}) \in \Gamma$. In a neighborhood of the point
$f_1 = f_1(\theta_1^{(0)}, \dots, \theta_s^{(0)}), \dots, f_r = f_r(\theta_1^{(0)}, \dots, \theta_s^{(0)}), \theta_{r+1}^{(0)}, \dots, \theta_s^{(0)}$ we may write

$$G_1(f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s) = G_{11}(f_1, 0, \dots, 0, \theta_{r+1}, \dots, \theta_s) + f_2\, G_{12}(f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s) + \dots + f_r\, G_{1r}(f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s),$$

where $G_{12}, \dots, G_{1r}$ are holomorphic functions of $\theta_1, \dots, \theta_s$. With the aid of
equation (1.2.10) we obtain

$$F(\theta_1, \dots, \theta_s) = f_1\, G_{11}(f_1, 0, \dots, 0, \theta_{r+1}, \dots, \theta_s) + f_2\, G_2'(\theta_1, \dots, \theta_s) + \dots + f_r\, G_r'(\theta_1, \dots, \theta_s), \tag{1.2.12}$$

where $G_j' = G_j + f_1 G_{1j}$ ($j = 2, 3, \dots, r$).
The representation (1.2.12) is valid for all values of $(\theta_1, \dots, \theta_s)$ and the
values of $f_1, \dots, f_r$ corresponding to them. This equation should be understood
to mean that the function $G_{11}$ depends only on $f_1$ and $\theta_{r+1}, \dots, \theta_s$. Therefore we
need only find a bound for it at the values $f_2 = 0, \dots, f_r = 0$. Then

$$F(\theta_1, \dots, \theta_s) = F_1(f_1, 0, \dots, 0, \theta_{r+1}, \dots, \theta_s) = f_1\, G_{11}(f_1, 0, \dots, 0, \theta_{r+1}, \dots, \theta_s),$$

which corresponds to the case considered in the preceding theorem, so that we
obtain an inequality of the form (1.2.8) for $G_{11}$. We can now write the expression
(1.2.12) in the form

$$F - f_1 G_{11} = f_2 G_2' + \dots + f_r G_r'.$$

The conditions of the theorem are satisfied and $|F - f_1 G_{11}| \le K_6 M / \delta^{K_7}$ in $Z$.
The rest follows by the induction hypothesis. This completes the proof.
Theorem 1.2.7. Suppose that a polycylinder $Z$ is the Cartesian product of
vertical strips $S_j$: $0 < \operatorname{Re} \theta_j < \eta_j$, $\eta_j > 0$, for $j = 1, 2, \dots, s$. (The possibility
$\eta_j = \infty$ is admitted.) Suppose that conditions (1.2.11) and (1.2.10) are satisfied,
where $|F(\theta_1, \dots, \theta_s)| \le M$ inside $Z$ and the functions $f_1(\theta_1, \dots, \theta_s), \dots,
f_r(\theta_1, \dots, \theta_s)$ are holomorphic functions of the variables $1/(\theta_1 + 1), \dots,
1/(\theta_s + 1)$ on the closure of $Z$. Then equation (1.2.10) remains valid if we
replace the functions $G_1, \dots, G_r$ with functions $G_1', \dots, G_r'$ such that

I0}(01, ... , Os>l< K~


6 •
(!0d+ .. · +10sl+l)K".
This theorem reduces to the preceding one just as Theorem 1.2.5 reduced to
Theorem 1.2.4.

§ 3. IDEALS IN RINGS OF HOLOMORPHIC FUNCTIONS.
ANALYTIC SHEAVES
The study of similar tests and unbiased estimates for exponential families of
distributions leads us in a natural way to a study of ideals in rings of certain
holomorphic functions. Let $\mathcal{O}$ denote any commutative ring. Then a subring $I$
such that $aI \subset I$ for arbitrary $a \in \mathcal{O}$ is called an ideal of $\mathcal{O}$. Let $B$ denote a
subset of $\mathcal{O}$. Then the set of all finite sums of the form $\sum_{i=1}^{M} a_i b_i$, where $a_i \in \mathcal{O}$ and
$b_i \in B$, constitutes an ideal known as the ideal generated by $B$ in $\mathcal{O}$. In particular,
if $B$ consists of a single element $b$, the ideal $\{ab : a \in \mathcal{O}\}$ generated by it is
called a principal ideal and is denoted by $(b)$. If an arbitrary element of an ideal
$I$ can be represented as a finite sum of the form $\sum_{i=1}^{M} a_i b_i$, where $b_1, \dots, b_M$ are
given elements of $\mathcal{O}$ and $a_i \in \mathcal{O}$, then the ideal $I$ is called an ideal with finite
basis. (To obtain a basis from the elements $b_1, \dots, b_M$ one removes all elements
that can be expressed as linear combinations of the preceding elements of that set.)
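
For readers who want a concrete instance of these definitions, the simplest commutative ring is the ring of integers rather than a ring of holomorphic functions. The sketch below, which is only an illustration and uses invented names, enumerates a finite window of the sums $a_1 b_1 + a_2 b_2$ for the hypothetical generating set $B = \{12, 18\}$ and checks that every such sum is a multiple of $6$ and that $6$ itself occurs, consistent with the ideal generated by $B$ being the principal ideal $(6)$.

```python
from math import gcd
from itertools import product

def generated_sums(generators, coeff_range):
    # All sums a_1*b_1 + ... + a_M*b_M with integer coefficients a_i taken
    # from coeff_range: a finite window into the ideal generated by B.
    sums = set()
    for coeffs in product(coeff_range, repeat=len(generators)):
        sums.add(sum(a * b for a, b in zip(coeffs, generators)))
    return sums

if __name__ == "__main__":
    B = (12, 18)
    window = generated_sums(B, range(-5, 6))
    d = gcd(B[0], B[1])
    print(d, d in window, all(x % d == 0 for x in window))
```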
We shall consider ideals $I$ in the ring $\mathcal{O}$ of functions that are holomorphic in an
open polycylinder $Z = Z_1 \times \dots \times Z_s$, where $Z_1, \dots, Z_s$ are simply-connected
bounded open regions of the variables $\theta_1, \dots, \theta_s$ respectively. The study of
such ideals, in accordance with the ideas of H. Cartan [25]-[28] and K. Oka [59],
is based on the local properties of ideals, from which we derive their global
properties with the aid of the theory of analytic sheaves.
Of special importance in the study of local properties of ideals of functions
is the following theorem, proved by Rückert [65] in 1932.
Theorem 1.3.1. Every ideal of functions that are holomorphic in a given re-
gion of a given point has a finite basis in a sufficiently small neighborhood of
that point.
The following theorem, proved by Cartan [26] in 1940, on the "gluing" of
bases is also extremely important.
Theorem 1.3.2. Let $\Delta'$ and $\Delta''$ denote compact polycylinders all but one of
the components of which are the same. Suppose that $\Delta = \Delta' \cap \Delta''$ is not empty.
Let $I$, $I'$, $I''$ denote ideals in rings of holomorphic functions defined on $\Delta$, $\Delta'$,
and $\Delta''$ respectively. Suppose that $I'$ and $I''$ generate $I$ in the compact set $\Delta$.
Then there exists in the region $\Delta' \cup \Delta''$ an ideal $J^0$ that has a finite basis and
that generates all three of these ideals in their domains.
By a function that is holomorphic on a compact polycylinder we mean a function
that is holomorphic in some neighborhood of that polycylinder.

The general local properties of ideals of functions are formulated with the aid
of the concept of the germ of a function. Let $f$ denote a function that is holomorphic
in a neighborhood of $\theta = (\theta_1, \dots, \theta_s) \in Z$. The germ of $f$ is defined as the
class of functions that are holomorphic in a neighborhood of $\theta$ and that coincide
with $f$ in a neighborhood of $\theta$. This definition can be made more precise with the
aid of the concept of an inductive limit (see Cartan [28]). Consider the open subsets
$U$ of $Z$. In each set $U$ consider the ring $S_U$ of functions $f(\theta)$ that are holomorphic
in $U$. Let $V$ denote an open subset of $U$ and introduce the homomorphism
$r^U_V$ that consists of the restriction to $V$ of functions that are holomorphic
in $U$. For $W \subset V \subset U$ we will have $r^U_W = r^V_W \circ r^U_V$. The inductive limit of the
groups $S_U$ for $\theta \in U$ constitutes a group $S_\theta$. Every element of this group is a
germ of functions $f_\theta$. If $f \in S_U$, $g \in S_V$, and $\theta \in U \cap V$ and if there exists an
open set $W \subset U \cap V$ containing $\theta$ such that $r^U_W f = r^V_W g$, then $f_\theta = g_\theta$.
Addition and multiplication of germs of functions $f_\theta$ are defined in a natural
way. They constitute a ring $\mathcal{O}_\theta$ in which we may consider the ideals $I_\theta$. In the
space of all germs $f_\theta$ we define a topology (of the general type), and for this we
need only exhibit a basis of the family of open sets. Let $U$ denote an open set in $Z$
such that $f \in S_U$. For every point $\theta \in U$ we define a germ $f_\theta \in S_\theta$. We denote
the set of such germs by $f_U$. We form a basis of the family of open sets from the
sets $f_U$ for all $f$ and $U$.
We shall now present briefly the information on sheaves that we shall need.
(For a more detailed exposition see [17] or [55].) Although we shall need only
sheaves of ideals of germs of holomorphic functions over $Z$, it will be useful to
give the general definition of a sheaf of rings. A sheaf $\mathcal{F}$ of rings is determined
when the following four things are defined:
1) a topological space $X$ (the base of the sheaf),
2) a function $x \to F(x)$ that assigns to each $x \in X$ a ring $F_x$,
3) a (general) topology in the union $F$ of all the sets $F_x$,
4) a "projection" $p\colon F \to X$ that assigns to all elements of $F_x$ the element
$x$ and that is a local homeomorphism.
In addition we require continuity of the algebraic operations. More precisely,
the mapping $\alpha \to -\alpha$ ($\alpha \in F$) of the space $F$ onto itself must be continuous;
also, the mappings $(\alpha, \beta) \to \alpha + \beta$ and $(\alpha, \beta) \to \alpha\beta$ of those pairs $(\alpha, \beta) \in
F \times F$ for which $p(\alpha) = p(\beta)$ (so that the operations are defined) must be continuous
(the topology in $F \times F$ is defined as usual).
The rings $F_x$ are called the stems of the sheaf.
Let $U$ denote an open subset of $X$. A continuous mapping $s\colon U \to F$ for
which the mapping $p \circ s$ is the identity mapping is called a section of the sheaf
$\mathcal{F}$ over $U$.
Thus a section $s$ maps $x$ into the stem $F_x$; it has exactly one point in
common with every stem over $U$. Two sections that coincide for a point $x \in X$
also coincide for some neighborhood of $x$.
Furthermore, since $s \cdot x$ belongs to $F_x$ we can define addition and subtraction
of two sections over $U$ by $(s_1 \pm s_2)x = s_1 x \pm s_2 x \in F_x$. In this way we
obtain the group of sections $\Gamma(U, \mathcal{F})$.
The sheaves that we shall use will always have a base $X = Z$, where $Z$ is
the polycylinder introduced earlier. Furthermore they will always be analytic.
Take $X = Z$. A sheaf of ideals $I_z$ in the rings $\mathcal{O}_z$ of germs of functions
that are holomorphic at a point $z \in Z$ is called an analytic sheaf $\mathcal{F}$ over $Z$. We
take the usual topology on $Z$; on $F = \{I_z\}$ we take the topology described above.
Here the mapping $(f, \alpha) \to f\alpha$, where $f \in \mathcal{O}_z$ and $\alpha \in F_z$, is continuous. (In a
more general situation this property is a requirement over a complex analytic
variety.)
In particular, the sheaf $\mathcal{O}$ of germs of all holomorphic functions over $Z$ is
analytic. The sheaf $\mathcal{F}$ mentioned above can be defined as a subsheaf of $\mathcal{O}$. The
section $s$ of the analytic sheaf $\mathcal{F}$ over the set $U$ can be identified in a natural
manner with the function $f \in S_U$ which is holomorphic over $U$.
Very important for what follows is the concept of a coherent analytic sheaf.
An analytic sheaf $\mathcal{F}$ over $Z$ is said to be coherent if for every point $z \in Z$
there exists a finite number of functions (sections of the sheaf $\mathcal{O}$) that are
holomorphic over some open neighborhood $U$ of the point $z$ and have the property
that for an arbitrary point $z' \in Z$ the stem $I_{z'}$ is contained in the ideal generated
by this finite number of functions in the ring of germs $\mathcal{O}_{z'}$.
Now we can state the theorem of H. Cartan that we shall need. In what fol-
lows we shall base our study of a large class of tests and unbiased estimates on
this theorem (Theorem 5 in [28]). We formulate it under more stringent restrictions
than in its original presentation since this formulation will be sufficient for our
purposes.
Theorem 1.3.3. Let Z denote a polycylinder with open simply-connected

bounded components and let $\mathcal{F}$ denote a coherent analytic sheaf over $Z$. Let
$u_1, u_2, \dots, u_M$ denote finitely many sections of $\mathcal{F}$ over $Z$ such that for an
arbitrary point $z \in Z$, the stem $I_z$ is generated by the germs of these functions in $\mathcal{O}_z$.
Then the sections $u_1, \dots, u_M$ in the ring $\mathcal{O}_Z$ of all functions that are holomorphic
over $Z$ generate an ideal containing the entire group of sections $\Gamma(Z, \mathcal{F})$.
From this theorem we derive a corollary that we shall have occasion to use
later.
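Corollary. Let $f_1, \dots, f_r$, where $r < s$, denote functions that are holomorphic
on $Z$. Suppose that the analytic set of points $(\theta_1, \dots, \theta_s)$ defined by the conditions

$$f_1 = 0, \ \dots, \ f_r = 0; \qquad \operatorname{rank} \begin{Vmatrix} \dfrac{\partial f_1}{\partial \theta_1} & \cdots & \dfrac{\partial f_1}{\partial \theta_s} \\ \vdots & & \vdots \\ \dfrac{\partial f_r}{\partial \theta_1} & \cdots & \dfrac{\partial f_r}{\partial \theta_s} \end{Vmatrix} < r, \tag{1.3.1}$$

has no points inside $Z$. Then every function $F$ that is holomorphic in $Z$ and
vanishes on the analytic set $f_1 = 0, \dots, f_r = 0$ can be represented in the form

$$F = f_1 G_1 + \dots + f_r G_r, \tag{1.3.2}$$

where the functions $G_1, \dots, G_r$ are holomorphic in $Z$.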


We note that this theorem deals essentially with the theory of ideals (in general,
reducible) of analytic varieties. However, since the analytic set

$$f_1 = 0, \ \dots, \ f_r = 0, \tag{1.3.3}$$

is not in general closed in $Z$, we cannot apply the corresponding theorems of
Cartan in [28]. Consider a point $\theta \in Z$ and the germs of the functions $f_1, \dots, f_r$
at it. If the functions $f_1, \dots, f_r$ have no common zeros at the point $\theta$, at least
one of them, let us say $f_1$, has no zeros in some neighborhood $U$ of the point $z$.
Setting $G_1 = 1/f_1$, so that $1 = f_1 G_1$ in $U$, we see that the functions $f_1, \dots, f_r$
generate at the point $z$ the ring of all germs $\mathcal{O}_z$, that is, the ideal $I_z = \mathcal{O}_z$. Of
course, the germ of the function $F$ at $z$ belongs to this ideal.
Suppose now that the functions $f_1, \dots, f_r$ have a common zero at the point $\theta$.
Then by the hypotheses of the theorem we have at that point $\operatorname{rank} \| \partial f_i / \partial \theta_j \| = r$.
Without loss of generality we may assume that the left corner minor of order $r$ is
nonzero. Then in a neighborhood of the point $\theta$ we may express $F = F(\theta_1, \dots, \theta_s)$
in the form of a holomorphic function

$$F_1(f_1, f_2, \dots, f_r, \theta_{r+1}, \dots, \theta_s) = F_1(0, \dots, 0, \theta_{r+1}, \dots, \theta_s) + f_1 F_{11} + f_2 F_{12} + \dots + f_r F_{1r},$$

where the $F_{1j}$ ($j = 1, \dots, r$) are holomorphic functions of $f_1, \dots, f_r, \theta_{r+1}, \dots, \theta_s$.
If we set $f_1 = \dots = f_r = 0$, we see that $F_1(0, \dots, 0, \theta_{r+1}, \dots, \theta_s)$ vanishes identically
in that neighborhood; hence in that neighborhood we have $F = f_1 G_1 + \dots + f_r G_r$,
where the $G_j = F_{1j}$ are holomorphic functions of $\theta_1, \dots, \theta_s$.
Consider the sheaf $\mathcal{F}$ of ideals generated on $Z$ by the germs of the functions
$f_1, \dots, f_r$. Obviously it will be coherent. In accordance with Theorem 1.3.3, the
functions $f_1, \dots, f_r$, generating an ideal in the ring of all functions that are holomorphic
on $Z$, will generate all sections of $\mathcal{F}$ on $Z$, including the function $F$. From
this we obtain (1.3.2), where all the $G_j$ ($j = 1, \dots, r$) are holomorphic on $Z$
($G_j \in S_Z$). In particular, if $f_1, \dots, f_r$ have no common zeros in $Z$, then
$1 = f_1 G_1 + \dots + f_r G_r$, where each $G_j \in S_Z$ ($j = 1, \dots, r$) (Cartan's example; see [28]).
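
The local step of this argument, splitting off the value on $f_1 = \dots = f_r = 0$ and dividing the rest by the $f_j$, can be imitated numerically in the simplest coordinates. The sketch below is an illustration under assumptions, not the construction of the text: it takes the hypothetical case $s = 3$, $r = 2$, $f_1 = \theta_1$, $f_2 = \theta_2$ and forms the multipliers by a telescoping difference, which is one convenient way (among many, since the $G_j$ are not unique) to realize the decomposition. The remainder $F(0, 0, \theta_3)$ vanishes exactly when $F$ vanishes on the analytic set.

```python
import cmath

def split(F, t1, t2, t3):
    # Telescoping decomposition with f1 = theta1, f2 = theta2 (hypothetical):
    # F = F(0, 0, t3) + t1 * G1 + t2 * G2, for t1 != 0 and t2 != 0.
    G1 = (F(t1, t2, t3) - F(0, t2, t3)) / t1
    G2 = (F(0, t2, t3) - F(0, 0, t3)) / t2
    remainder = F(0, 0, t3)
    return G1, G2, remainder

def F_vanishing(t1, t2, t3):
    # Vanishes on the analytic set {theta1 = 0, theta2 = 0}.
    return t1 * cmath.exp(t3) + t1 * t2 + t2 * t2 * t3

def F_other(t1, t2, t3):
    # Does not vanish on that set, so a remainder survives.
    return F_vanishing(t1, t2, t3) + cmath.cos(t3)

if __name__ == "__main__":
    p = (0.3 + 0.1j, -0.2 + 0.4j, 0.5j)
    G1, G2, rem = split(F_vanishing, *p)
    print(abs(F_vanishing(*p) - (p[0] * G1 + p[1] * G2)), abs(rem))
    print(abs(split(F_other, *p)[2]))  # nonzero remainder
```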
We now present an important consequence of Theorems 1.3.1-1.3.3, discovered
by Cartan in 1950 (see [27]).
Theorem 1.3.4. Let $A$ denote a compact polycylinder contained in $Z$ and let
$I$ denote an ideal of functions that are holomorphic inside $Z$. Then $I$ has a finite
basis on $A$ in the sense that there exists a set of functions $g_1, \dots, g_s$ that are
holomorphic in $Z$ and an open neighborhood $V$ such that $Z \supset V \supset A$ and for every
function $f \in I$ there exists a representation of the form

$$f|_V = h_1 g_1 + h_2 g_2 + \dots + h_s g_s,$$

where $h_1, \dots, h_s$ are functions that are holomorphic in $V$.1)
Proof. For every point $\theta \in A$, let us consider a polycylindrical neighborhood $U_\theta$
in which the ideal is finitely generated. Theorem 1.3.1 asserts that such a neighborhood
exists. From the open covering of the compact set $A$ that we have obtained,
let us choose a finite covering $\{U_{\theta_1}, \dots, U_{\theta_l}\}$.
We may assume that $\bigcup_{i=1}^{l} U_{\theta_i} = Z'$ is a polycylinder (containing $A$). Consider
the system $\{u_1, \dots, u_s\}$ of generators of the ideals $I_{\theta_i}$ and the ring of functions
1) The assertion that $I$ has a finite basis on $A$ can also be understood in the following
sense. Let us choose elements of $I$ as generators and construct from them an ideal
$I'$ in the ring of functions that are holomorphic in $Z$. Then the ideal $I'$ is finitely
generated. We note that if $I$ has a finite basis on $A$ in the sense indicated in the statement
of the theorem, it will also have a finite basis on $A$ in the sense just explained.

that are holomorphic in $Z'$. Let $\bar{I}$ denote the ideal generated by $I$ in that ring.
From the coherence of the sheaf of ideals $I_\theta$ that are locally generated by the
ideal $I$ on $Z$ and from Theorem 1.3.3, we may assert that the system $\{u_1, \dots, u_s\}$
generates the entire group of sections. This means that for every function $f \in I$
there exists a set $h_1, \dots, h_s$ of functions, holomorphic in $Z'$, such that
$f|_{Z'} = \sum h_i u_i|_V$. The theorem is proved.
CHAPTER II

SUFFICIENT STATISTICS AND EXPONENTIAL FAMILIES

§ 1. GENERAL INFORMATION ON SUFFICIENT STATISTICS

Consider a family of probability measures $P = \{P_\theta, \theta \in \Omega\}$ defined on a
single measurable space $(\mathcal{X}, \mathcal{A})$, the sample space. We shall call the index $\theta$
a parameter and the set $\Omega$ of indices $\theta$ the parameter space.
A statistic $T$ is said to be sufficient for $\theta$ (or for the family $P$) if for
every measurable set $A \in \mathcal{A}$ there exists a definition of conditional probability
$P_\theta(A \mid t)$ that does not depend on the parameter $\theta$.
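
A standard finite example may help fix the definition. The sketch below, with invented function names and exact rational arithmetic, treats a repeated sample of $n$ Bernoulli trials with success probability $\theta$ and the statistic $T(x) = x_1 + \dots + x_n$. It computes the conditional probability of each sample point given $T = t$ for two values of $\theta$; both come out equal to $1/\binom{n}{t}$, which is the sense in which the conditional probability "does not depend on the parameter".

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, theta, t):
    # P_theta(X = x | T = t) for i.i.d. Bernoulli(theta) components,
    # where T(x) = sum(x). Returned as a dict keyed by the sample point x.
    joint = {}
    for x in product((0, 1), repeat=n):
        if sum(x) == t:
            joint[x] = theta**t * (1 - theta)**(n - t)
    total = sum(joint.values())
    return {x: p / total for x, p in joint.items()}

if __name__ == "__main__":
    a = conditional_given_sum(4, Fraction(3, 10), 2)
    b = conditional_given_sum(4, Fraction(7, 10), 2)
    print(a == b, set(a.values()))
```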
In what follows we shall encounter numerous examples of sufficient statis-
tics. At the moment, let us prove the following theorem (see [36]).
Theorem 2.1.1. Suppose that $\mathcal{X}$ is the Euclidean space $E_n$ and let $T$ denote
a statistic that is sufficient for $P$. Then there exists a definition of conditional
probability $P_\theta(A \mid t)$ that does not depend on $\theta$ and that for every fixed
$t$ is a probability measure over $\mathcal{A}$.
This theorem resembles Theorem 0.3.1 in its content, and their proofs
follow the same lines. For simplicity we shall prove it only for a one-dimensional
space.
Let $r_1, r_2, r_3, \dots$ denote the set of rational numbers arranged in some order
and let $P_\theta(A \mid t)$ denote some value of the conditional probability of the set
$A = (-\infty, x)$ for given $t$. For each value of $x = r_i$ (where $i = 1, 2, \dots$), let us
take a definition $P_\theta(r_i \mid t) = F(r_i \mid t)$ that does not depend on $\theta$ (as can be
done, since the statistic $T$ is sufficient). If $r_i < r_j$, then, by the properties
of conditional probabilities (see for example [36]), we have $F(r_i \mid t) \le F(r_j \mid t)$
for all $t$ except possibly for $t \in N_{ij}$, where $N_{ij}$ is a set of measure zero with
respect to the measure induced by the statistic. The function $F(x \mid t)$ is a
nondecreasing function of $x$ for rational values of $x$ and for all $t \notin N' = \bigcup_{i,j} N_{ij}$


(which is also a set of measure 0). Furthermore, by the properties of conditional
probabilities (ibid.), for $t \notin N''$, where $N''$ is a set of measure 0, we have

$$\lim_{n \to \infty} F\!\left(r_i - \tfrac{1}{n} \,\Big|\, t\right) = F(r_i \mid t) \quad (i = 1, 2, \dots);$$

$$\lim_{n \to \infty} F(n \mid t) = 1; \qquad \lim_{n \to \infty} F(-n \mid t) = 0.$$

For $t \notin N' \cup N''$, the function $F(x \mid t)$ behaves like the distribution function
$P(X < x \mid t)$, where $X$ is a random variable. Now suppose that $F_1(x \mid t)$ is continuous
from the left with respect to $x$ and that it coincides with $F(x \mid t)$ at
rational points. It is a distribution function and it defines a probability measure
$P_1(A \mid t)$ that is independent of the parameter $\theta$ for $A \in \mathcal{A}$. Let us show that
$P_1(A \mid t)$ is the conditional probability of the set $A$ for given $t$. Specifically,
let us show that

(2. 1.1)
for all $A \in \mathcal{A}$ and for all $t$ except for values of $t$ in a set of corresponding
measure 0. By what has been said above this is true for sets $A$ that are intervals
with rational or infinite end-points. Hence on the basis of elementary
set-theoretic considerations it is true for the entire Borel $\sigma$-algebra $\mathcal{A}$. On the other
hand, on the set $N' \cup N''$ we can choose the measures $P(A \mid t)$ arbitrarily without
violating the definition of conditional probabilities. This completes the
proof of Theorem 2.1.1.
In what follows, we shall confine ourselves to admissible probability
measures $P = \{P_\theta, \theta \in \Omega\}$, that is, probability measures with probability density
$p_\theta$ with respect to the same $\sigma$-finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$. (Most often $\mathcal{X}$ will
be a Euclidean space and $\mu$ will be a Lebesgue measure.)
We shall need the following theorem from measure theory.
Theorem 2.1.2. The family $P$ of probability measures is dominated by a
$\sigma$-finite measure if and only if $P$ has a countable equivalent family.
By equivalent families of measures, we mean that vanishing of the measures
of one family for any set $A \in \mathcal{A}$ implies vanishing of the measures of the other
family.
For the proof of the theorem see [36] or [18]. As a dominating measure for
$P = \{P_\theta, \theta \in \Omega\}$ we may take $\lambda = \sum_{i=1}^{\infty} c_i P_{\theta_i}$, where each $c_i$ is positive, where
$\sum_{i=1}^{\infty} c_i = 1$, and where the $P_{\theta_i}$ constitute a suitably chosen countable family.

We shall now prove some extremely important factorization theorems that
play a fundamental role in the theory of sufficient statistics.
Consider a family of probability measures $P = \{P_\theta, \theta \in \Omega\}$ dominated by
the measure $\lambda = \sum_{i=1}^{\infty} c_i P_{\theta_i}$, introduced above, which is equivalent to the
family $P$. Suppose that a statistic $T$ maps a sample space $(\mathcal{X}, \mathcal{A})$ onto another
space $(\mathcal{T}, \mathcal{B})$.
Theorem 2.1.3. The statistic $T$ is sufficient for $P$ if and only if there
exists a nonnegative $\mathcal{B}$-measurable function $g(t, \theta)$ such that for all $\theta \in \Omega$,

$$dP_\theta(x) = g(T(x), \theta)\, d\lambda(x). \tag{2.1.2}$$

Proof (cf. [36]). We first prove the necessity. Suppose that $T$ is a sufficient
statistic. Let $\mathcal{A}_0$ denote the $\sigma$-subalgebra of $\mathcal{A}$ induced by $T$. Then for
all $\theta \in \Omega$, $A_0 \in \mathcal{A}_0$, and $A \in \mathcal{A}$ we have

$$\int_{A_0} P(A \mid T(x))\, dP_\theta(x) = P_\theta(A \cap A_0). \tag{2.1.3}$$

Since $\lambda = \sum_{i=1}^{\infty} c_i P_{\theta_i}$, it follows from (2.1.3) that
$\int_{A_0} P(A \mid T(x))\, d\lambda(x) = \lambda(A \cap A_0)$, so that $P(A \mid T(x))$ is also a conditional
probability for the measure $\lambda$.
Suppose that $g(T(x), \theta)$ is the Radon-Nikodym derivative $dP_\theta / d\lambda$ corresponding
to the $\sigma$-algebra $\mathcal{A}_0$ and the measure $\lambda$. To prove (2.1.2) it will be
sufficient to show that it is the derivative $dP_\theta / d\lambda$ for the $\sigma$-algebra $\mathcal{A}$ with
the measure $\lambda$. Setting $A_0 = \mathcal{X}$ in equation (2.1.3), we find that

$$P_\theta(A) = \int_{\mathcal{X}} P(A \mid T(x))\, dP_\theta(x) = \int_{\mathcal{X}} E_\lambda[I_A(x) \mid T(x)]\, dP_\theta(x),$$

because, as we have just shown, $P(A \mid T(x))$ is also a conditional probability
for the measure $\lambda$. Furthermore the function $E_\lambda[I_A(x) \mid T(x)]$ is $\mathcal{A}_0$-measurable
and $dP_\theta / d\lambda = g(T(x), \theta)$ for the $\sigma$-algebra $\mathcal{A}_0$ and the measure $\lambda$. Consequently

$$\int_{\mathcal{X}} E_\lambda[I_A(x) \mid T(x)]\, dP_\theta(x) = \int_{\mathcal{X}} E_\lambda[I_A(x) \mid T(x)]\, g(T(x), \theta)\, d\lambda(x)$$

$$= \int_{\mathcal{X}} E_\lambda[g(T(x), \theta)\, I_A(x) \mid T(x)]\, d\lambda(x) = \int_{\mathcal{X}} g(T(x), \theta)\, I_A(x)\, d\lambda(x) = \int_A g(T(x), \theta)\, d\lambda(x)$$

by virtue of the definition of conditional mathematical expectation. Thus
$g(T(x), \theta)$ is also the Radon-Nikodym density for the $\sigma$-algebra $\mathcal{A}$ and the
measure $\lambda$, which completes the proof of the necessity of equation (2.1.2).
Let us prove its sufficiency. Suppose that equation (2.1.2) holds. We show
that the conditional probability $P_\lambda(A \mid t)$ is a conditional probability for all
measures $P_\theta \in P$. For the $\sigma$-algebra $\mathcal{A}$ we have $g(T(x), \theta) = dP_\theta(x)/d\lambda$. For
given $A \in \mathcal{A}$ and $\theta$ we define a measure $\nu$ on $\mathcal{A}$ by $d\nu = I_A\, dP_\theta$. For the
$\sigma$-algebra $\mathcal{A}_0$ we have $d\nu(x)/dP_\theta(x) = E_\theta[I_A(x) \mid T(x)]$, so that
$d\nu(x)/d\lambda(x) = P_\theta[A \mid T(x)]\, g(T(x), \theta)$ for the $\sigma$-algebra $\mathcal{A}_0$. Furthermore,
$d\nu(x)/d\lambda(x) = I_A(x)\, g(T(x), \theta)$ for the $\sigma$-algebra $\mathcal{A}$, so that

$$\frac{d\nu(x)}{d\lambda(x)} = E_\lambda[I_A(x)\, g(T(x), \theta) \mid T(x)] = P_\lambda[A \mid T(x)]\, g(T(x), \theta)$$

for the $\sigma$-algebra $\mathcal{A}_0$. Therefore

$$P_\lambda(A \mid T(x))\, g(T(x), \theta) = P_\theta(A \mid T(x))\, g(T(x), \theta)$$

for the $\sigma$-algebra $\mathcal{A}_0$ and the measure $\lambda$ and hence for the $\sigma$-algebra $\mathcal{A}_0$ and
the measure $P_\theta$. Thus $P_\lambda(A \mid T(x))$ is one of the values of the conditional
probability $P_\theta(A \mid T(x))$ and the statistic $T(x)$ is sufficient.
Theorem 2.1.3 has an important consequence.
Theorem 2.1.4. Suppose that the probability measures $P_\theta \in P$ have probability
densities $p_\theta = dP_\theta / d\mu$ with respect to the $\sigma$-finite measure $\mu$. Then the
statistic $T(x)$ is sufficient for $P$ if and only if there exists a nonnegative
$\mathcal{B}$-measurable function $g(T(x), \theta)$ and a nonnegative $\mathcal{A}$-measurable function $h$
defined on $\mathcal{X}$ such that

$$p_\theta(x) = g(T(x), \theta)\, h(x) \tag{2.1.4}$$

for almost all $x$ corresponding to the $\sigma$-algebra $\mathcal{A}$ and the measure $\mu$.
To prove this, consider the measure A= I';°= 1 c iPe., which by Theorem
i

2. 1.2 is equivalent to the family P. If the statistic T is sufficient, then


(2. 1.4) follows from (2. 1.2), where we set h = dA/ dµ.. Conversely, if (2. 1.4)
§2. EXAMPLES OF SUFFICIENT STATISTICS 27

holds, then

00

dA. (x) = ~ c;g (T (x), 0i) h (x) dµ (x) = V (T (x)) h (x) dµ (x),
i=l

where V is a measurable function. We define

_ 0)_- g (T (x), 0)
gi-(T(x), V(T(x))' if V (T (x)) > 0,
and
g 1 =0, if V(T(x))=O.

Then

dP0 (x) = g (T (x), 0) h (x) dµ (x) = g 1 (T (x), 0) dA. (x), (2. 1.5)
and thus T(x) is a sufficient statistic.
At this point we make the obvious remark that if a mapping of a sufficient
statistic T defined in a space 3" onto another statistic T' defined in the same
space is one-to-one and measurable in both directions, then such a mapping
yields a sufficient statistic provided the function $g(T(x), \theta)$ in formula (2.1.4)
is a measurable function of $T'$.

§2. EXAMPLES OF SUFFICIENT STATISTICS

We shall now examine a number of examples that illustrate the finding of
sufficient statistics. Some of these examples will be needed later when we
pose statistical problems with nuisance parameters; others are instructive as
regards the general theory.
Example 1. In the Euclidean space $E_n$ consider the family $\mathcal{P}$ of all densities
$f(x_1) \cdots f(x_n)$ that are continuous with respect to Lebesgue measure and
that correspond to a repeated sample $x_1, \ldots, x_n$, that is, to a vector with
independent identically distributed components having the same probability density
$f(x)$. (In what follows we shall use the term "repeated sample" without
explanation.) If we arrange the elements of the sample $x_1, \ldots, x_n$ in order of size
we obtain the variational series

\[ x_1' \le x_2' \le \cdots \le x_n'. \]

The mapping $T: (x_1, \ldots, x_n) \to (x_1', \ldots, x_n')$, which for brevity we shall
identify with the expression $(x_1', \ldots, x_n')$, is a sufficient statistic. Indeed,

\[ f(x_1) \cdots f(x_n) = f(x_1') \cdots f(x_n') \]

for the entire family $\mathcal{P}$. By Theorem 2.1.4, $T$ is a sufficient statistic.

Example 2 (see Dynkin [19]). Suppose that $\mathcal{G}$ is a region in Euclidean
space $E_n$. Let $x$ denote an element of $E_n$ and let $S$ denote a set of values
of the parameter $\theta$. Suppose that for arbitrary $\theta \in S$ the probability density
$p(x \mid \theta)$ is continuous and positive for $x \in \mathcal{G}$. We consider the mapping

\[ x \to g(x, \theta) = \ln p(x \mid \theta) - \ln p(x \mid \theta_0), \]

which assigns to each $x \in \mathcal{G}$ the function $g(x, \theta)$, where $\theta$ ranges over $S$ and
$\theta_0$ is any fixed value of the parameter $\theta$. This mapping is a sufficient statistic,
since

\[ p(x \mid \theta) = \exp(g(x, \theta)) \cdot p(x \mid \theta_0), \]

and this meets the conditions of Theorem 2.1.4.

Example 3. In a Bernoulli scheme (of $n$ repeated independent trials), let
us treat the probability $p$ of success in a single trial as the parameter
generating the family $\mathcal{P}$ of probability measures. The results of the sample can be
coded by the row $x_1, \ldots, x_n$, where $x_i = 1$ if the $i$th trial succeeds and $x_i = 0$
if it fails. For the probability of the row $x_1, \ldots, x_n$ with given value of the
parameter $p$, we have the probability density with respect to the "counting
measure"

\[ p^m (1 - p)^{n - m}, \]

where $m$ is the number of successes in $n$ trials. The number $m$ or the ratio
$m/n$ (the relative frequency) is a sufficient statistic for $\mathcal{P}$.
A more general example is the scheme of independent trials with $k$ outcomes
$E_1, \ldots, E_k$. The family of distributions $\mathcal{P}$ has $k - 1$ parameters $p_1, \ldots, p_{k-1}$
(the probabilities of the outcomes $E_1, \ldots, E_{k-1}$). For $n$ repeated trials, the
absolute frequencies $m_1, \ldots, m_{k-1}$ of occurrence of the outcomes $E_1, \ldots, E_{k-1}$
or the relative frequencies $m_1/n, \ldots, m_{k-1}/n$ will be sufficient statistics.
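The sufficiency of $m$ in the Bernoulli scheme can be checked numerically. The following is only a minimal sketch under stated assumptions: the function names and the values $n = 4$, $m = 2$, $p = 0.3$, $p = 0.8$ are hypothetical and chosen for illustration. It enumerates all rows with a given number of successes and shows that their conditional probabilities, given $m$, do not depend on $p$.

```python
from itertools import product

def row_probability(row, p):
    """Probability p^m (1-p)^(n-m) of one particular 0/1 row."""
    m = sum(row)
    n = len(row)
    return p ** m * (1 - p) ** (n - m)

def conditional_given_m(n, m, p):
    """Conditional probability of every row with m successes, given m."""
    rows = [r for r in product((0, 1), repeat=n) if sum(r) == m]
    total = sum(row_probability(r, p) for r in rows)
    return {r: row_probability(r, p) / total for r in rows}

# Illustrative values only (not taken from the text).
cond_a = conditional_given_m(4, 2, 0.3)
cond_b = conditional_given_m(4, 2, 0.8)

for row in sorted(cond_a):
    print(row, round(cond_a[row], 12), round(cond_b[row], 12))
```

Each of the six rows receives the same conditional probability for both values of $p$, which is what sufficiency of $m$ requires.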

Example 4. Suppose that independent trials with constant probability of
success $p$ are carried out until $m$ successes occur. Let $Z + m$ denote the
number of trials necessary for this result. Then the probability of the value $Z = z$ is

\[ P(Z = z \mid p) = \binom{m + z - 1}{z}\, p^m (1 - p)^z. \]

If the results of $z + m$ trials are written as rows $(x_1, \ldots, x_{z+m})$ (for $z =
0, 1, 2, \ldots$) as described in Example 3, the number $z$ of failures, for which
$m$ out of the $z + m$ trials are successful, is a sufficient statistic for the
parameter $p$.
Example 5. For a repeated normal sample $x_1, \ldots, x_n \in N(a, 1)$, the
probability density with respect to Lebesgue measure is

\[ \frac{1}{(2\pi)^{n/2}} \exp\Bigl( -\frac{1}{2} \sum_{i=1}^{n} (x_i - a)^2 \Bigr)
 = \exp\Bigl[ n \Bigl( \bar{x} a - \frac{a^2}{2} \Bigr) \Bigr] \frac{1}{(2\pi)^{n/2}} \exp\Bigl( -\frac{1}{2} \sum_{i=1}^{n} x_i^2 \Bigr), \]

where as usual $\bar{x}$ denotes the quantity $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$. From this we see
that $\bar{x}$ is a sufficient statistic for the parameter $a$. For a sample from a more
general family $N(a, \sigma^2)$, the probability density can be represented in the form

\[ \frac{1}{(2\pi)^{n/2} \sigma^n} \exp\Bigl[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - a)^2 \Bigr], \]

and $\bar{x}$ and $(1/n) \sum_{i=1}^{n} x_i^2$, or $\bar{x}$ and $s^2 = (1/n) \sum_{i=1}^{n} (x_i - \bar{x})^2$, are
sufficient statistics for the parameter $\theta = (a, \sigma^2)$.
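The factorization for $N(a, 1)$ can be illustrated numerically. The sketch below is an illustration only: the two samples and the grid of values of $a$ are hypothetical. Both samples have the same mean, so the factor depending on $a$ is the same for both, and the logarithm of the ratio of their densities should not change with $a$.

```python
import math

def log_density(sample, a):
    """Log of the joint N(a, 1) density of a repeated sample."""
    n = len(sample)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - a) ** 2 for x in sample)

# Two hypothetical samples with the same mean (2.0) but different spread.
x = [0.0, 1.0, 5.0]
y = [2.0, 2.0, 2.0]

def log_ratio(a):
    """Log of p(x | a) / p(y | a); free of a when the sample means coincide."""
    return log_density(x, a) - log_density(y, a)

ratios = [log_ratio(a) for a in (-3.0, 0.0, 1.5, 10.0)]
print(ratios)
```

The printed values agree up to rounding. Their common value is $-\tfrac{1}{2}(\sum x_i^2 - \sum y_i^2)$, which involves the data only.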
Example 6. Let us consider a sequence of $k$ independent repeated normal
samples

\[ x_{i1}, \ldots, x_{i n_i} \in N(a_i, \sigma_i^2) \qquad (i = 1, \ldots, k). \]

For the parameter $\theta = (a_1, \sigma_1^2; \ldots; a_k, \sigma_k^2)$, the set $(\bar{x}_1, \ldots, \bar{x}_k; s_1^2, \ldots, s_k^2)$,
where

\[ \bar{x}_i = \frac{1}{n_i} \sum_{l=1}^{n_i} x_{il}, \qquad s_i^2 = \frac{1}{n_i} \sum_{l=1}^{n_i} (x_{il} - \bar{x}_i)^2 \qquad (i = 1, \ldots, k), \]

is a sufficient statistic. Here, the number of "scalar" sufficient statistics is
equal to the number of scalar parameters. In particular, for $k = 2$, corresponding
to the two samples $x_1, \ldots, x_{n_1} \in N(a_1, \sigma_1^2)$; $y_1, \ldots, y_{n_2} \in N(a_2, \sigma_2^2)$
and the four parameters $a_1$, $a_2$, $\sigma_1^2$, and $\sigma_2^2$ are the four scalar statistics $\bar{x}$, $\bar{y}$,
$s_1^2$, and $s_2^2$ (in the accepted notation).
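Consider the probability density for a given pair of samples

\[ \frac{1}{(2\pi)^{(n_1 + n_2)/2} \sigma_1^{n_1} \sigma_2^{n_2}} \exp\Bigl[ -\frac{1}{2\sigma_1^2} \sum_{i=1}^{n_1} (x_i - a_1)^2 - \frac{1}{2\sigma_2^2} \sum_{j=1}^{n_2} (y_j - a_2)^2 \Bigr]. \tag{2.2.1} \]

Let us set

\[ a_1 = a_2 = a. \tag{2.2.2} \]

We thus have three independent scalar parameters $a$, $\sigma_1^2$ and $\sigma_2^2$ or the triple
$\theta = (1/\sigma_1^2, a/\sigma_1^2, 1/\sigma_2^2)$, which is equivalent to them (in the sense explained). As
before the sufficient statistics constitute a quadruple $(\bar{x}, \bar{y}, s_1^2, s_2^2)$. Here
there are more scalar sufficient statistics than parameters. (This corresponds to
the "incompleteness" of the system of sufficient statistics. We shall speak in
greater detail of this later.)
If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then corresponding to the triple $\theta = (a_1, a_2, \sigma^2)$ is the
triple of sufficient statistics $\bigl(\bar{x}, \bar{y}, \sum_{i=1}^{n_1} x_i^2 + \sum_{j=1}^{n_2} y_j^2\bigr)$.
Finally, if $a_1 = a_2 = a$ and $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we have a homogeneous repeated
sample of $n_1 + n_2$ normal variables from $N(a, \sigma^2)$. Of course,

\[ \frac{1}{n_1 + n_2} \Bigl( \sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j \Bigr) \quad \text{and} \quad \frac{1}{n_1 + n_2} \Bigl( \sum_{i=1}^{n_1} x_i^2 + \sum_{j=1}^{n_2} y_j^2 \Bigr) \]

will be sufficient statistics.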

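Example 7. Consider a repeated sample $(x_1, y_1), \ldots, (x_n, y_n)$ of values
of the two-dimensional normal vector $(X, Y)$ with probability density

\[ \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\Bigl\{ -\frac{1}{2(1 - \rho^2)} \Bigl[ \frac{(x - a_1)^2}{\sigma_1^2} - \frac{2\rho (x - a_1)(y - a_2)}{\sigma_1 \sigma_2} + \frac{(y - a_2)^2}{\sigma_2^2} \Bigr] \Bigr\}. \]

The parameter $\theta = (a_1, a_2, \sigma_1^2, \sigma_2^2, \rho)$ includes the two mean values $a_1$ and $a_2$,
two values $\sigma_1^2$ and $\sigma_2^2$ of the variances, and the coefficient of correlation $\rho$.
As a sufficient statistic we may take the vector

\[ \Bigl( \bar{x},\ \bar{y},\ s_1^2,\ s_2^2,\ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \Bigr). \]

We can also replace the sample covariance $(1/n) \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ with the
quantity $(1/n) \sum_{i=1}^{n} x_i y_i$ and we can replace the sample variances $s_1^2$ and $s_2^2$
with the quantities $(1/n) \sum_{i=1}^{n} x_i^2$ and $(1/n) \sum_{i=1}^{n} y_i^2$.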
I

Example 8. Consider a repeated sample $x_1, \ldots, x_n$ from a distribution of
the Pearson-III type with probability density

\[ p(x) = \frac{\gamma^m}{\Gamma(m)} (x - a)^{m-1} \exp[-\gamma (x - a)] \quad \text{for } x > a, \]

\[ p(x) = 0 \quad \text{for } x < a; \qquad m > 0,\ \gamma > 0. \]

Here we have three scalar parameters: $a$, $\gamma$, and $m$. If we know the value of
$a$, so that the family of distributions depends only on $\gamma$ and $m$, then we have
the sufficient statistics

\[ \Bigl( \sum_{i=1}^{n} (x_i - a),\ \prod_{i=1}^{n} (x_i - a) \Bigr). \]

Now let us suppose that $a$ is an unknown parameter. For the probability
density of the sample we have the expression $p(x_1, \ldots, x_n) = p(x_1) \cdots p(x_n)$,
or

\[ p(x_1, \ldots, x_n) = \Bigl( \frac{\gamma^m}{\Gamma(m)} \Bigr)^n \prod_{i=1}^{n} (x_i - a)^{m-1} \exp\Bigl[ -\gamma \sum_{i=1}^{n} (x_i - a) \Bigr], \]

if $\min(x_1, \ldots, x_n) \ge a$ and

\[ p(x_1, \ldots, x_n) = 0, \quad \text{if } \min(x_1, \ldots, x_n) < a. \]

If $m \ne 1$, we have only the trivial sufficient statistic $T = (x_1', \ldots, x_n')$
(the variational series; cf. Example 1). However, if $m = 1$, so that we have
the exponential distribution

\[ p(x) = \gamma e^{-\gamma (x - a)} \quad (x \ge a), \qquad p = 0 \quad (x < a), \]

then the pair

\[ \Bigl( \min_i x_i,\ \sum_{i=1}^{n} x_i \Bigr) \]

obviously constitutes a sufficient statistic.

Example 9 (Kolmogorov [32]). Consider the family of uniform distributions
where the carrier of the probability density is $[\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2}]$. Here $\theta$ can be
any real number. For a repeated sample $x_1, \ldots, x_n$ the probability density
$p(x_1, \ldots, x_n)$ is equal to 1 if $\theta - \tfrac{1}{2} \le x_i \le \theta + \tfrac{1}{2}$ (for $i = 1, 2, \ldots, n$) but
equal to 0 otherwise. Thus, $p(x_1, \ldots, x_n) = 1$ if and only if $\min_i (x_i + \tfrac{1}{2}) \ge \theta$
and $\max_i (x_i - \tfrac{1}{2}) \le \theta$ and it is equal to 0 otherwise. Here, the parameter $\theta$
has two sufficient statistics: $\min_i x_i$ and $\max_i x_i$.
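As a small illustration of this example, the density of the sample can be evaluated once from all observations and once from the two extremes alone. This is only a sketch: the sample values and the grid of parameter values are hypothetical.

```python
def likelihood(sample, theta):
    """Joint density of a repeated sample, uniform on [theta - 1/2, theta + 1/2]."""
    return 1.0 if all(theta - 0.5 <= x <= theta + 0.5 for x in sample) else 0.0

def likelihood_from_extremes(x_min, x_max, theta):
    """The same density computed from the smallest and largest observation only."""
    return 1.0 if (theta - 0.5 <= x_min and x_max <= theta + 0.5) else 0.0

# Hypothetical sample and parameter grid, for illustration only.
sample = [3.10, 3.45, 2.95, 3.30]
grid = [2.0 + 0.05 * k for k in range(41)]

agree = all(
    likelihood(sample, t) == likelihood_from_extremes(min(sample), max(sample), t)
    for t in grid
)
print(agree)
```

The two functions coincide for every value on the grid, since the condition on all observations reduces to a condition on the smallest and the largest one.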

Example 10. In the preceding example suppose that the parameter $\theta$ cannot
assume all real values but only the values $2m + \tfrac{1}{2}$, where $m$ ranges over all
integers (positive, negative, and zero). Suppose that we take a sample of two
independent observations $x_1$ and $x_2$. Obviously each observation completely
determines the parameter $\theta$, so that $x_1$ and $x_2$ are each sufficient statistics.
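To make the remark concrete: the carriers $[2m, 2m + 1]$ are disjoint for different $m$, so a single observation identifies $m$ and hence $\theta$. The following sketch assumes this reading; the helper name, the random seed, and the choice $m = -3$ are hypothetical.

```python
import math
import random

def theta_from_observation(x):
    """Recover theta = 2m + 1/2 from one observation x lying in [2m, 2m + 1]."""
    m = math.floor(x / 2)
    return 2 * m + 0.5

random.seed(0)                      # fixed seed, only for reproducibility
theta = 2 * (-3) + 0.5              # m = -3, so the carrier is [-6, -5]
x1 = random.uniform(theta - 0.5, theta + 0.5)
x2 = random.uniform(theta - 0.5, theta + 0.5)

print(theta_from_observation(x1), theta_from_observation(x2))
```

Either observation alone returns the value of $\theta$ that was used to generate the sample.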

§3. INFORMATIONAL PROPERTIES OF SUFFICIENT STATISTICS

The last, quite simple, example (cf. also the examples of this type in
Basu's article [2]) is instructive for a discussion of information-theoretic
properties of sufficient statistics. The relationships between statistics and
information theory are well expounded in the famous book of Kullback [35]. Here
we shall touch on only one point in that field.
Let $\mathcal{G}$ denote a region contained in Euclidean space $E_n$ and let $x$ denote
an element of $\mathcal{G}$. Let $\theta$ denote a parameter assuming values in a region $\mathcal{K}$
contained in $E_m$. Let $p(x \mid \theta)$ denote the probability density with respect to
Lebesgue measure. Suppose that this probability density is continuous and
positive on $\mathcal{G} \times \mathcal{K}$. Let $E_{n_1} \subset E_n$, $n_1 < n$, denote a Euclidean space contained
in $E_n$. Suppose that there exists a sufficient statistic $T \in E_{n_1}$ for the
parameter $\theta$. In the region $\mathcal{G}$ we set up a system of local coordinates $(T, \xi)$, where
$\xi$ is an $(n - n_1)$-dimensional vector. Suppose that the Jacobian $\partial(T, \xi)/\partial(x)$
of the transformation exists, is continuous, and does not vanish for $x \in \mathcal{G}$.
Suppose that the parameter $\theta$ is also a random variable with distribution
density $q(\theta) \ne 0$ in its domain of definition $\mathcal{K}$. Then we can consider the
common distribution of $(x, \theta)$ and we can define a common distribution density
$p(x, \theta)$. If the integral

\[ I(x, \theta) = \iint_{(\mathcal{G}, \mathcal{K})} p(x, \theta) \log_2 \frac{p(x, \theta)}{p(x)\, q(\theta)}\, dx\, d\theta, \tag{2.3.1} \]

where $p(x) = \int_{\mathcal{K}} p(x \mid \theta)\, q(\theta)\, d\theta$, exists, it is called the quantity of information in
the sense of Shannon contained in $x$ with respect to $\theta$ or vice versa (see for
example [35]). Here $dx$ and $d\theta$ are the elements of volume. Let us shift to
the coordinates $T$ and $\xi$. We have

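\[ p(x \mid \theta) = p(T, \xi \mid \theta) \left| \frac{\partial(T, \xi)}{\partial(x)} \right|, \]

where $p(T, \xi \mid \theta)$ is the probability density for $(T, \xi)$ corresponding to a given
value of the parameter $\theta$. From (2.3.1) we obtain

\[ I(x, \theta) = \iint_{(\mathcal{G}, \mathcal{K})} q(\theta)\, p(T, \xi \mid \theta) \left| \frac{\partial(T, \xi)}{\partial(x)} \right|
 \log_2 \frac{p(T, \xi \mid \theta)}{\int_{\mathcal{K}} p(T, \xi \mid \theta)\, q(\theta)\, d\theta} \left| \frac{\partial(x)}{\partial(T, \xi)} \right| dT\, d\xi\, d\theta. \tag{2.3.2} \]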
If $T$ is a sufficient statistic, then by Theorem 2.1.4,

\[ p(T, \xi \mid \theta) = g(T, \theta)\, r(T, \xi) \ne 0, \tag{2.3.3} \]

where $r(T, \xi)$ is independent of $\theta$. In this case (2.3.2) yields

\[ I(x, \theta) = \iint_{(\mathcal{T}, \mathcal{K})} q(\theta)\, p_1(T \mid \theta) \log_2 \frac{g(T, \theta)}{\int_{\mathcal{K}} g(T, \theta)\, q(\theta)\, d\theta}\, dT\, d\theta, \tag{2.3.4} \]

where $\mathcal{T}$ is the range of the statistic $T$ and $p_1(T \mid \theta)$ is the distribution density
of the statistic $T$ for a given value of $\theta$. If we integrate both sides of equation
(2.3.3) with respect to $\xi$, we obtain

\[ p_1(T \mid \theta) = g(T, \theta)\, r_1(T), \tag{2.3.5} \]

where $r_1(T) = \int r(T, \xi)\, d\xi$, this integral being taken over all values of $\xi$.
If we multiply the numerator and denominator of the fraction in the argument
of the logarithm in (2.3.4) by $r_1(T)$ and make the substitution (2.3.5), we obtain
\[ \begin{aligned}
I(x, \theta) &= \iint_{(\mathcal{T}, \mathcal{K})} q(\theta)\, p_1(T \mid \theta) \log_2 \frac{p_1(T \mid \theta)}{p_1(T)}\, dT\, d\theta \\
&= \iint_{(\mathcal{T}, \mathcal{K})} p_1(T, \theta) \log_2 \frac{p_1(T, \theta)}{p_1(T)\, q(\theta)}\, dT\, d\theta = I(T, \theta).
\end{aligned} \tag{2.3.6} \]

Here $p_1(T, \theta)$ denotes the common density of $T$ and $\theta$, and $p_1(T) =
\int_{\mathcal{K}} p_1(T \mid \theta)\, q(\theta)\, d\theta$ denotes the density of $T$. The expression $I(T, \theta)$ is the
amount of information in the statistic $T$ with respect to $\theta$. Thus, under the
conditions listed above regarding $p(x, \theta)$ and the statistic $T$, this statistic
contains all the information concerning the parameter $\theta$ that is to be found in
the observation of $x$. Conversely one can show that if

\[ I(x, \theta) = I(T, \theta) \]

under the preceding assumption regarding $p(x \mid \theta)$ for a sufficiently broad class
of a priori distributions $q(\theta)$ of the parameter and the statistic $T$, then the
statistic $T$ is sufficient. We note that in our derivation we could have used not
the amount of information $I(x, \theta)$ but the functional

\[ \iint p(x, \theta)\, \varphi\Bigl( \frac{p(x, \theta)}{p(x)\, q(\theta)} \Bigr)\, dx\, d\theta, \tag{2.3.7} \]

where $\varphi$ is a rather arbitrary continuous function of a single variable. The
conclusion regarding the sufficient statistics would remain valid. If a statistic
$U$ is stochastically independent of a sufficient statistic $T$, then under rather
general assumptions the distribution of $U$ is independent of the parameter $\theta$ and
contains no information about the parameter. However, this situation does not
always obtain. Interesting "pathological" phenomena in this field were
described by Basu [2].
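The equality $I(x, \theta) = I(T, \theta)$ has a direct discrete analogue that is easy to check numerically. The sketch below is an illustration under stated assumptions, not the continuous setting of this section: it takes a Bernoulli scheme with $n = 3$ trials, a hypothetical three-point a priori distribution for the success probability, and sums in place of integrals. It compares the information carried by the full row, by the number of successes $T$, and by the first trial alone.

```python
import math
from itertools import product

def mutual_information(joint):
    """Shannon information (in bits) of a joint pmf given as {(u, theta): prob}."""
    pu, pt = {}, {}
    for (u, t), p in joint.items():
        pu[u] = pu.get(u, 0.0) + p
        pt[t] = pt.get(t, 0.0) + p
    return sum(p * math.log2(p / (pu[u] * pt[t]))
               for (u, t), p in joint.items() if p > 0)

# Hypothetical a priori distribution q(theta) on three success probabilities.
prior = {0.2: 0.5, 0.5: 0.3, 0.9: 0.2}
n = 3

joint_x, joint_T, joint_first = {}, {}, {}
for theta, q in prior.items():
    for row in product((0, 1), repeat=n):
        m = sum(row)
        p = q * theta ** m * (1 - theta) ** (n - m)
        joint_x[(row, theta)] = p
        joint_T[(m, theta)] = joint_T.get((m, theta), 0.0) + p
        joint_first[(row[0], theta)] = joint_first.get((row[0], theta), 0.0) + p

info_x = mutual_information(joint_x)          # information in the whole row
info_T = mutual_information(joint_T)          # information in the sufficient statistic
info_first = mutual_information(joint_first)  # information in the first trial only

print(info_x, info_T, info_first)
```

The first two quantities coincide up to rounding, while the first trial alone, which is not a sufficient statistic, carries strictly less information.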
Let $(\mathfrak{X}, \mathfrak{A})$ denote a measurable space and let $\mathcal{P} = \{P_\theta\}$ denote a family
of measures defined on it. Then, for a sufficient statistic $T$, for every value $T = t$
and every $A \in \mathfrak{A}$ there exists a definition of the conditional probability $P(A \mid t)$ that does
not depend on the parameter $\theta$. For arbitrary $A \in \mathfrak{A}$ and $B \in \mathfrak{B}$ we have

\[ P_\theta(A \cap T^{-1}(B)) = \int_B P(A \mid t)\, dP_\theta T^{-1}, \tag{2.3.8} \]

where $P_\theta T^{-1}$ is the measure induced on $\mathfrak{B}$ by the statistic $T$.
Now let $A$ denote a set belonging to $\mathfrak{A}$ whose probability is independent
of the values of the statistic $T$ for all values of the parameter $\theta$. Thus

\[ P_\theta(A \cap T^{-1}(B)) = P_\theta(A)\, P_\theta(T^{-1}(B)) \tag{2.3.9} \]

for arbitrary $B \in \mathfrak{B}$. Since

\[ P_\theta(A)\, P_\theta(T^{-1}(B)) = \int_B P_\theta(A)\, dP_\theta T^{-1}, \tag{2.3.10} \]

when we compare this relation with (2.3.8), which is also valid for arbitrary
$B \in \mathfrak{B}$, we conclude that for every value of the parameter $\theta$,

\[ P_\theta(A) = P(A \mid t) \tag{2.3.11} \]

for almost all $t$ with respect to the measure $P_\theta T^{-1}$.
Let $\theta_1$ and $\theta_2$ denote two values of the parameter $\theta$. If there exists a
set $B \in \mathfrak{B}$ such that $P_{\theta_1} T^{-1}(B) > 0$ and $P_{\theta_2} T^{-1}(B) > 0$, then we conclude
from (2.3.11) that

\[ P_{\theta_1}(A) = P_{\theta_2}(A). \]

If any two values of the parameter can be "coupled" in the above manner by
means of the set $B \in \mathfrak{B}$, we see that the value of $P_\theta(A)$ is independent of $\theta$
and that the set $A$ does not contain any information concerning the parameter
$\theta$. However, if the measures $P_{\theta_1} T^{-1}$ and $P_{\theta_2} T^{-1}$ do not "overlap", we
cannot draw any such conclusion. Example 10 shows that in this case the set
$A$, which is stochastically independent of the sufficient statistic, can even
contain "complete information" on the parameter $\theta$.

§4. SUFFICIENT STATISTICS FOR A REPEATED SAMPLE. EXPONENTIAL FAMILIES

Almost all the examples of sufficient statistics that we have given dealt
with a repeated sample of random vectors in a Euclidean space. Such samples
play an extremely important role in mathematical statistics. The simplest case
is a sample of one-dimensional random variables $(x_1, \ldots, x_n)$. A number of
works have been devoted to the study of sufficient statistics for this case. We
mention Darmois [15], Koopman [34], Dynkin [19], Bahadur [1], and Brown [7].
Here we shall discuss certain theorems from [19] and [7].
Consider a region $\mathcal{G}$ contained in $E_m$ and the family of distributions $\mathcal{P} =
\{P_\theta\}$ with Borel $\sigma$-algebra in it. Consider the set $\Omega$ of values of the parameter
as an abstract set.
If two functions $\chi_1(x)$ and $\chi_2(x)$ are defined in $\mathcal{G}$, we shall say that $\chi_1$
is subordinate to $\chi_2$ if for an arbitrary pair of values $x'$ and $x''$ belonging to $\mathcal{G}$
the relation $\chi_2(x') = \chi_2(x'')$ implies the relation $\chi_1(x') = \chi_1(x'')$.
By Theorem 2.1.4, a statistic to which a sufficient statistic is subordinate is
itself sufficient. If each of two statistics $\chi_1$ and $\chi_2$ is subordinate to the other
statistic, we shall say that they are equivalent.
The identity mapping $x \to x$ is always a sufficient statistic. We shall
call a statistic that is equivalent to the identity mapping in a subregion
$\mathcal{G}' \subset \mathcal{G}$ a trivial statistic.
If a statistic $\chi(x)$ is subordinate to every sufficient statistic, we shall call it a
necessary statistic. Of course, it will be of interest to find statistics that are
both necessary and sufficient.
Let us take a repeated sample of size $n$. Here we obtain points $(x_1, \ldots, x_n)$
in

\[ E_{mn} = \underbrace{E_m \times \cdots \times E_m}_{n \text{ times}}. \]

In the region

\[ \mathcal{G}^n = \underbrace{\mathcal{G} \times \cdots \times \mathcal{G}}_{n \text{ times}} \]

the distribution $P^n$ of the repeated sample is induced. We shall call its
statistics the statistics of the repeated sample.
Now suppose that for all values of $\theta \in \Omega$ the distributions $P_\theta$ have a
positive probability density $p(x \mid \theta)$ that is continuous with respect to $x$. In Example
2 of §2 we saw that the function $g(x, \theta) = \ln p(x \mid \theta) - \ln p(x \mid \theta_0)$, where $\theta_0$ is
any value of the parameter $\theta$, is a sufficient statistic for $\mathcal{P}$. This statistic is
also necessary; if $\chi(x)$ is a sufficient statistic, then by (2.1.4)

\[ p(x \mid \theta) = g_1(\chi(x), \theta)\, h(x), \]

where $h(x)$ is independent of $\theta$. Therefore $g(x, \theta) = \ln g_1(\chi(x), \theta) - \ln g_1(\chi(x), \theta_0)$
is, as a function of $\theta \in \Omega$, subordinate to $\chi(x)$.
If we replace $\mathcal{G}$ with $\mathcal{G}^n$ and $\mathcal{P}$ with $\mathcal{P}^n$, then we need to replace $g(x, \theta)$
with $g(x_1, \ldots, x_n, \theta) = g(x_1, \theta) + \cdots + g(x_n, \theta)$.
Now consider the family $\mathcal{P}$ of one-dimensional distributions defined on an
interval $\Delta \subset E_1$. Let us suppose that for arbitrary $\theta \in \Omega$ there exists a
continuous derivative $dp(x \mid \theta)/dx$. Furthermore let us suppose that $p(x \mid \theta) > c(\theta) > 0$
for $x \in \Delta$ and for every value of $\theta$. Consider the minimum linear space (over
the set of real numbers) $L(\mathcal{P}, \Delta)$ of functions defined on $\Delta$ that contains all
constants and all functions $g(x, \theta)$ for all $\theta \in \Omega$. Denote the dimension of $L(\mathcal{P}, \Delta)$
by $r + 1$ (here $r$ may be $\infty$).
Under the assumptions made above on $p(x \mid \theta)$ we have (see [19], [7]) the following theorem.
Theorem 2.4.1. If the functions $1, \varphi_1(x), \ldots, \varphi_r(x)$ constitute a basis in
$L(\mathcal{P}, \Delta)$, then for arbitrary $n \ge r$ the system of functions

\[ \chi_i(x_1, \ldots, x_n) = \sum_{j=1}^{n} \varphi_i(x_j) \qquad (i = 1, \ldots, r) \tag{2.4.1} \]

is functionally independent and constitutes a necessary and sufficient statistic
for a sample of size $n$.
We shall show that the statistic $(\chi_1, \ldots, \chi_r)$ is equivalent to the
necessary and sufficient statistic $g(x_1, \ldots, x_n, \theta)$.
First let us show that the first statistic is subordinate to the second. Let
$\varphi(x)$ denote a function in $L(\mathcal{P}, \Delta)$ and suppose that $\chi(x_1, \ldots, x_n) =
\sum_{i=1}^{n} \varphi(x_i)$. We have $\varphi(x) = c_0 + \sum_q c_q\, g(x, \theta_q)$ (a finite sum), where the
$c_q$ are constants. Therefore $\chi(x_1, \ldots, x_n) = n c_0 + \sum_q c_q\, g(x_1, \ldots, x_n, \theta_q)$, so that
$\chi$ is subordinate to the necessary statistic $g(x_1, \ldots, x_n, \theta)$ and hence is itself
necessary. Furthermore, the converse subordination also holds: $g(x, \theta) = a_0(\theta) +
\sum_{j=1}^{r} a_j(\theta)\, \varphi_j(x)$, where the $a_j(\theta)$ are constants depending only on $\theta$.
Therefore $g(x_1, \ldots, x_n, \theta) = n a_0(\theta) + \sum_{j=1}^{r} a_j(\theta)\, \chi_j(x_1, \ldots, x_n)$, so that
$g(x_1, \ldots, x_n, \theta)$ is subordinate to $\{\chi_1, \ldots, \chi_r\}$. Thus this last statistic is a
sufficient statistic equivalent to $g(x_1, \ldots, x_n, \theta)$.
The functions $1, \varphi_1(x), \ldots, \varphi_r(x)$ constitute a basis in $L(\mathcal{P}, \Delta)$ and
consequently are linearly independent. Let us show now that, if $s$ $(\le r)$
arbitrarily chosen functions $1, \varphi_1(x), \ldots, \varphi_s(x)$ are linearly independent (for
example, if they constitute a part of the basis), then for $n \ge s$ the functions

\[ \chi_i(x_1, \ldots, x_n) = \sum_{j=1}^{n} \varphi_i(x_j) \]

(for $i = 1, \ldots, s$) are functionally independent.
Each of the functions $\varphi_i(x_j)$ is a linear combination of a constant and
functions of the form

\[ g(x_j, \theta_q) = \ln p(x_j \mid \theta_q) - \ln p(x_j \mid \theta_0). \]

By the hypothesis of the theorem this last function is continuously differentiable
with respect to $x_j$ in the interval $\Delta$. The same applies to $\varphi_i(x_j)$. If the
functions $\chi_1, \ldots, \chi_s$ were functionally dependent in $\Delta^{(n)}$, we would have

\[ \begin{vmatrix}
\varphi_1'(x_1) & \varphi_1'(x_2) & \cdots & \varphi_1'(x_s) \\
\varphi_2'(x_1) & \varphi_2'(x_2) & \cdots & \varphi_2'(x_s) \\
\vdots & \vdots & & \vdots \\
\varphi_s'(x_1) & \varphi_s'(x_2) & \cdots & \varphi_s'(x_s)
\end{vmatrix} \equiv 0. \tag{2.4.2} \]

Let us show that this would imply linear dependence of the system $1, \varphi_1, \ldots, \varphi_s$.
We prove this by induction. If $s = 1$, then $\varphi_1'(x_1) \equiv 0$, so that
$\varphi_1(x_1) = \text{const}$ and our assertion is trivial. Suppose now that it is true for
$s - 1$. If $D(\chi_1, \ldots, \chi_{s-1})/D(x_1, \ldots, x_{s-1}) \equiv 0$, our assertion holds.
Consequently we may assume that there exists a point $(x_1^{(0)}, \ldots, x_{s-1}^{(0)})$ at which this
Jacobian is nonzero. Consider a point $(x_1^{(0)}, \ldots, x_{s-1}^{(0)}, x_s)$, where $x_s \in \Delta$
assumes arbitrary values. At such a point let us write the left-hand member of
(2.4.2), expanding the determinant in terms of elements of the last column. We
then obtain

\[ A_1 \varphi_1'(x_s) + A_2 \varphi_2'(x_s) + \cdots + A_s \varphi_s'(x_s) = 0. \tag{2.4.3} \]

Here $A_1, \ldots, A_s$ are constants and $A_s \ne 0$. Thus we see that
$A_1 \varphi_1(x_s) + \cdots + A_s \varphi_s(x_s) = \text{const}$
in the interval $\Delta$, so that we have linear dependence.
Thus the system of functions (2.4.1) is functionally independent and constitutes
a necessary and sufficient statistic for $\mathcal{P}$ with respect to a sample of size $n$.
This completes the proof of the theorem.
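The role of the determinant (2.4.2) can be seen in the smallest nontrivial case $s = n = 2$. The sketch below is an illustration only; the two systems of functions and the evaluation points are hypothetical choices. For $\varphi_1(x) = x$, $\varphi_2(x) = x^2$ (linearly independent together with 1) the determinant equals $2(x_2 - x_1)$ and does not vanish identically, whereas for $\varphi_1(x) = x$, $\varphi_2(x) = 3x + 2$ (linearly dependent together with 1) it vanishes.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def derivative_matrix(dphis, points):
    """Matrix (phi_i'(x_j)) appearing in (2.4.2), built from derivative functions."""
    return [[d(x) for x in points] for d in dphis]

# Derivatives of phi_1(x) = x and phi_2(x) = x**2.
independent = [lambda x: 1.0, lambda x: 2.0 * x]
# Derivatives of phi_1(x) = x and phi_2(x) = 3x + 2.
dependent = [lambda x: 1.0, lambda x: 3.0]

points = (0.5, 2.0)   # hypothetical values of x_1 and x_2
det_independent = det2(derivative_matrix(independent, points))
det_dependent = det2(derivative_matrix(dependent, points))

print(det_independent, det_dependent)
```

In the first case the pair $(\sum x_j, \sum x_j^2)$ is functionally independent away from $x_1 = x_2$; in the second, $\chi_2 = 3\chi_1 + 2n$ is a function of $\chi_1$.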
Theorem 2.4.2. Under the conditions of Theorem 2.4.1, for arbitrary finite
$n \le r$, every sufficient statistic with respect to a sample of size $n$ is trivial.
Suppose that $n \le r$. Then $L(\mathcal{P}, \Delta)$ contains $n$ functions $\varphi_1, \varphi_2, \ldots, \varphi_n$
such that the system $1, \varphi_1, \varphi_2, \ldots, \varphi_n$ is linearly independent. It follows
from the proof of Theorem 2.4.1 that the functions $\chi_1(x_1, \ldots, x_n), \ldots,
\chi_n(x_1, \ldots, x_n)$ of the type (2.4.1) corresponding to them are functionally
independent. Then there exist $x_1^{(0)}, \ldots, x_n^{(0)} \in \Delta$ such that
$D(\chi_1, \ldots, \chi_n)/D(x_1, \ldots, x_n) \ne 0$. In a neighborhood of the point $(x_1^{(0)}, \ldots, x_n^{(0)})$
we have

\[ x_i = x_i(\chi_1, \ldots, \chi_n) \qquad (i = 1, \ldots, n). \]

Thus by the definition given above the trivial sufficient statistic $(x_1, \ldots, x_n)$ is
subordinate to $\chi_1, \ldots, \chi_n$ in some interval $\Delta' \subset \Delta$ and hence is necessary.
Following Dynkin [19] the rank of a system of distributions $\mathcal{P}$ in a region $\mathcal{G}$
is defined as the largest number $R$ such that, for every finite $n \le R$, the family
$\mathcal{P}$ has no nontrivial sufficient statistics with respect to a sample of size $n$ in
the region $\mathcal{G}$.
Suppose now that a system $\mathcal{P} = \{p(x \mid \theta)\}$ of distributions on an interval $\Delta$
is subject to the conditions described above: for every $\theta$, there exists a
continuous derivative $dp(x \mid \theta)/dx$ and we have the inequality $p(x \mid \theta) > c(\theta) > 0$
for $x \in \Delta$.

Theorem 2.4.3. Suppose that the system of distributions $\mathcal{P}$ satisfies the
conditions stated above and has finite rank $R$ in the interval $\Delta$. Then the density
$p(x \mid \theta)$ can be represented in the form

\[ p(x \mid \theta) = \exp\Bigl[ \sum_{i=1}^{R} \varphi_i(x)\, c_i(\theta) + c_0(\theta) + \varphi_0(x) \Bigr]. \tag{2.4.4} \]

Here $x \in \Delta$ and $\theta \in \Omega$; the functions $\varphi_i(x)$ are continuously differentiable in
$\Delta$ $(i = 1, \ldots, R)$, and the systems of functions $(1, \varphi_1, \ldots, \varphi_R)$ and
$(1, c_1(\theta), \ldots, c_R(\theta))$ are linearly independent.
Families of the form (2.4.4) are called exponential families. Let us note,
first, that the dimension $r + 1$ of the space $L(\mathcal{P}, \Delta)$ is equal to $R + 1$. To
see this, note that this dimension is finite because otherwise, by Theorem
2.4.2, the sufficient statistics would be trivial for an arbitrary sample of size
$n$ (including for $n = R + 1$). This shows that $r + 1 \le R + 1$. Furthermore, if
$r + 1$ were less than $R + 1$, then, by taking $n = r + 1 \le R$, we would obtain a
sufficient statistic $(\chi_1, \ldots, \chi_r)$ of the form (2.4.1) that is nontrivial, whereas
the size of the sample does not exceed $R$. Thus $r + 1 = R + 1$, so that $r = R$
and we take a continuously differentiable basis in $\Delta$ for the space $L(\mathcal{P}, \Delta)$
consisting of the $r + 1$ functions $1, \varphi_1(x), \ldots, \varphi_R(x)$. Then

\[ g(x, \theta) = \sum_{i=1}^{R} c_i(\theta)\, \varphi_i(x) + c_0(\theta), \tag{2.4.5} \]

where the $c_i(\theta)$ are constants depending only on $\theta$. Since $g(x, \theta) = \ln p(x \mid \theta) -
\ln p(x \mid \theta_0)$, we obtain formula (2.4.4) with $\varphi_0(x) = \ln p(x \mid \theta_0)$. Let us now show
that the functions $1, c_1(\theta), \ldots, c_R(\theta)$ are linearly independent. Suppose that
this is not the case. Then without loss of generality we may set $c_R(\theta) = b_0 +
b_1 c_1(\theta) + \cdots + b_{R-1} c_{R-1}(\theta)$, where the $b_i$ are constants. Substituting this
expression into (2.4.5) we see that

\[ g(x, \theta) = \sum_{i=1}^{R-1} c_i(\theta)\, \psi_i(x) + c_0(\theta) + b_0 \varphi_R(x), \]

where $\psi_i(x) = \varphi_i(x) + b_i \varphi_R(x)$ (where $i = 1, 2, \ldots, R - 1$). Therefore

\[ g(x, \theta) = g(x, \theta) - g(x, \theta_0) = \sum_{i=1}^{R-1} [c_i(\theta) - c_i(\theta_0)]\, \psi_i(x) + c_0(\theta) - c_0(\theta_0), \]

so that the dimension of $L(\mathcal{P}, \Delta)$ does not exceed $R$, which contradicts the
hypothesis.
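The dimension of $L(\mathcal{P}, \Delta)$ can be estimated numerically for a concrete family. The following sketch is an illustration under stated assumptions and not part of the theory above: it takes the normal family $N(a, \sigma^2)$, a hypothetical grid of points $x$ and of parameter values, tabulates the constant 1 and the functions $g(x, \theta_j)$, and computes the rank of the resulting table with a small hand-written elimination routine (the tolerance is an arbitrary choice).

```python
import math

def log_normal_density(x, a, var):
    """ln p(x | theta) for theta = (a, var) in the family N(a, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - a) ** 2 / (2 * var)

def g(x, theta, theta0=(0.0, 1.0)):
    """g(x, theta) = ln p(x | theta) - ln p(x | theta0)."""
    return log_normal_density(x, *theta) - log_normal_density(x, *theta0)

def numerical_rank(rows, tol=1e-9):
    """Rank by Gaussian elimination with row pivoting and a relative tolerance."""
    a = [list(r) for r in rows]
    scale = max(abs(v) for r in a for v in r) or 1.0
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        pivot = max(range(rank, n_rows), key=lambda i: abs(a[i][col]), default=None)
        if pivot is None or abs(a[pivot][col]) <= tol * scale:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for i in range(rank + 1, n_rows):
            f = a[i][col] / a[rank][col]
            for j in range(col, n_cols):
                a[i][j] -= f * a[rank][j]
        rank += 1
        if rank == n_rows:
            break
    return rank

# Hypothetical evaluation points and parameter values (a, var).
xs = [-2.0, -1.0, -0.5, 0.3, 1.0, 1.7, 2.5, 3.0]
thetas = [(1.0, 1.0), (-1.0, 2.0), (0.5, 0.5), (2.0, 3.0), (0.0, 4.0)]

table = [[1.0] + [g(x, th) for th in thetas] for x in xs]
dimension = numerical_rank(table)
print(dimension)
```

For the normal family every $g(x, \theta)$ is a quadratic polynomial in $x$, so the space is spanned by $1, x, x^2$ and the computed rank is 3, that is, $r + 1 = 3$ and $r = R = 2$, in agreement with the two scalar sufficient statistics of Example 5.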
The following theorem is in a certain sense the converse of Theorem 2.4.3.
Theorem 2.4.4. Suppose that

\[ p(x \mid \theta) = \exp\Bigl( \sum_{i=1}^{R} c_i(\theta)\, \varphi_i(x) + c_0(\theta) + \varphi_0(x) \Bigr), \tag{2.4.6} \]

where the $\varphi_i(x)$ are continuously differentiable functions, $x \in \Delta$, $\theta \in \Omega$, and
the systems $(1, \varphi_1, \ldots, \varphi_R)$ and $(1, c_1(\theta), \ldots, c_R(\theta))$ are linearly independent
systems of functions. Then the rank of the system of distributions $\mathcal{P}$ is
equal to $R$, and for $n \ge R$ the system of functions $\chi_i(x_1, \ldots, x_n)$ $(i = 1, \ldots, R)$
of the form (2.4.1) constitutes a necessary and sufficient statistic for
a sample of size $n$.
We have

\[ g(x, \theta) = \ln p(x \mid \theta) - \ln p(x \mid \theta_0) = \sum_{i=1}^{R} (c_i(\theta) - c_i)\, \varphi_i(x) + c_0(\theta) - c_0; \qquad c_i = c_i(\theta_0). \tag{2.4.7} \]

From this it is clear that the dimension of $L(\mathcal{P}, \Delta)$ does not exceed $R + 1$, and
consequently the rank of $\mathcal{P}$ does not exceed $R$. Furthermore, if the functions
$1, c_1(\theta), \ldots, c_R(\theta)$ are linearly independent, so are the $R$ functions $c_1(\theta) -
c_1, \ldots, c_R(\theta) - c_R$. An elementary argument leads to the existence of $R$
values $\theta_1, \ldots, \theta_R$ in $\Omega$ such that

\[ \varphi_i(x) = b_{i0} + \sum_{j=1}^{R} b_{ij}\, g(x, \theta_j), \tag{2.4.8} \]

where the $b_{ij}$ are constants. Thus the functions $1, \varphi_1, \ldots, \varphi_R$ belong to
$L(\mathcal{P}, \Delta)$ and generate it. Since they are linearly independent by hypothesis, they
constitute a basis in $L(\mathcal{P}, \Delta)$, and the present theorem follows from Theorems 2.4.1 and
2.4.2.
These theorems naturally lead us to consideration of exponential families
of a general form, the study of one aspect of which constitutes the content of
this book.

§5. EXPONENTIAL FAMILIES

The preceding section indicates the importance of the study of families of
distributions of the form (2.4.4), i.e. exponential families. We shall need to
generalize this concept, which was defined in §4 only for an interval $\Delta$ of the
straight line $E_1$ and under special hypotheses. In making this generalization
we shall follow Lehmann [36]. Let $\mathfrak{X}$ denote the Euclidean space $E_k$ with
Borel $\sigma$-algebra $\mathfrak{A}$. Suppose that a $\sigma$-finite measure $\mu$ is defined on the
measurable space $(\mathfrak{X}, \mathfrak{A})$. Consider the family of densities $\{p_\theta(x)\}$ of the form

\[ p_\theta(x) = C(\theta) \exp\Bigl( \sum_{j=1}^{s} Q_j(\theta)\, T_j(x) \Bigr) h(x), \tag{2.5.1} \]

where $\theta$ belongs to an abstract set $\Omega$, where $T_j(x)$ $(j = 1, 2, \ldots, s)$ and
$h(x)$ are measurable functions, and where the $Q_j(\theta)$ are arbitrary functions.
Later we shall give representative examples.
If $x_1, \ldots, x_n$ is a repeated sample of a distribution of the type (2.5.1),
then it follows immediately from the factorization theorem (2.1.4) that the
statistics $\sum_{i=1}^{n} T_j(x_i)$ $(j = 1, 2, \ldots, s)$ constitute a sufficient statistic with
respect to a sample of size $n$.
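This statement can be illustrated with the Pearson-III family of Example 8 when the shift $a$ is known. The sketch below is an assumption-laden illustration: it sets $a = 0$, uses $T_1(x) = x$ and $T_2(x) = \ln x$, and takes two hypothetical samples, $(1, 6, 6)$ and $(2, 2, 9)$, which share the same sum (13) and the same product (36). Their likelihoods should then coincide for every choice of $m$ and $\gamma$.

```python
import math

def log_likelihood(sample, m, gamma, a=0.0):
    """Log-likelihood of a repeated Pearson-III sample with known shift a."""
    n = len(sample)
    return (n * (m * math.log(gamma) - math.lgamma(m))
            + (m - 1) * sum(math.log(x - a) for x in sample)
            - gamma * sum(x - a for x in sample))

# Two different samples with equal sum and equal product.
x = [1.0, 6.0, 6.0]
y = [2.0, 2.0, 9.0]

# Hypothetical parameter values (m, gamma), for illustration only.
params = [(0.5, 0.3), (1.0, 2.0), (3.5, 0.7)]
gaps = [abs(log_likelihood(x, m, g) - log_likelihood(y, m, g)) for m, g in params]
print(gaps)
```

The gaps vanish up to rounding: the likelihood depends on the sample only through $\sum T_1(x_i)$ and $\sum T_2(x_i)$.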
We can simplify the writing of the probability density (2.5.1) if we assign
$h(x)$ to the measure $\mu(x)$ (that is, if we consider the measure $\nu(x)$ with $d\nu(x) =
h(x)\, d\mu(x)$) and set $Q_j(\theta) = \theta_j$ $(j = 1, 2, \ldots, s)$. Here the $\theta_j$ have numerical
values and the point $\theta = (\theta_1, \ldots, \theta_s)$ belongs to the Euclidean space $E_s$. We
obtain the expression

\[ p_\theta(x) = C(\theta) \exp\Bigl( \sum_{j=1}^{s} \theta_j T_j(x) \Bigr), \tag{2.5.2} \]

where $C(\theta)$ is a normalizing constant.
In this notation the parameters $(\theta_1, \ldots, \theta_s)$ in the space $E_s$ may assume
arbitrary sets of values related to one another (for example, they may be
dependent on some base parameters that vary in $E_r$, where $r < s$).
However, we can also consider a space of parameters $\Omega \subset E_s$ that consists
of those values of $\theta = (\theta_1, \ldots, \theta_s)$ for which

\[ \int \cdots \int_{E_k} \exp\Bigl( \sum_{j=1}^{s} \theta_j T_j(x) \Bigr) d\nu(x) < \infty. \tag{2.5.3} \]

Here $\nu(x)$ must be a $\sigma$-finite measure on $\mathfrak{X} = E_k$ that is given in advance and
that is independent of the parameter $\theta$. The space $\Omega$ defined above is called
the natural parameter space. Let us give some theorems of Lehmann [36] on
exponential families under natural parametrization.
Theorem 2.5.1. The natural parameter space $\Omega$ is a convex set.
To prove this, we use Hölder's inequality (see for example [71], pp. 21-26),
which may be written in the form

\[ \int_{E_m} (a(x))^{\alpha} (b(x))^{1-\alpha}\, d\mu(x) \le \Bigl( \int_{E_m} a(x)\, d\mu(x) \Bigr)^{\alpha} \Bigl( \int_{E_m} b(x)\, d\mu(x) \Bigr)^{1-\alpha}, \tag{2.5.4} \]

where $a(x)$ and $b(x)$ are nonnegative measurable functions defined on $E_m$, $\mu$ is a $\sigma$-finite
measure, $\alpha \in (0, 1)$, and all the integrals converge.
Let $\theta = (\theta_1, \ldots, \theta_s)$ and $\theta' = (\theta_1', \ldots, \theta_s')$ denote two points in the natural
parameter space $\Omega$ and let $\alpha$ denote a number in the interval $(0, 1)$. We need
to show that the point $\alpha\theta + (1 - \alpha)\theta'$ belongs to $\Omega$. In formula (2.5.4) we set
$\mu(x) = \nu(x)$, $a(x) = \exp\bigl(\sum_{j=1}^{s} \theta_j T_j(x)\bigr)$, $b(x) = \exp\bigl(\sum_{j=1}^{s} \theta_j' T_j(x)\bigr)$, and $m = k$.
Then, on the basis of (2.5.4) and (2.5.3), the desired result follows for the
values of the parameters $\theta$ and $\theta'$.
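Written out, the substitution gives the following chain. This is only a sketch of the step the proof leaves to the reader; it assumes the integration is over the sample space $E_k$ of (2.5.3), and it uses the fact that $a(x)$ and $b(x)$ are exponentials and hence nonnegative.

```latex
\begin{aligned}
\int_{E_k} \exp\Bigl( \sum_{j=1}^{s} \bigl[ \alpha\theta_j + (1-\alpha)\theta_j' \bigr] T_j(x) \Bigr)\, d\nu(x)
 &= \int_{E_k} \bigl( a(x) \bigr)^{\alpha} \bigl( b(x) \bigr)^{1-\alpha}\, d\nu(x) \\
 &\le \Bigl( \int_{E_k} a(x)\, d\nu(x) \Bigr)^{\alpha} \Bigl( \int_{E_k} b(x)\, d\nu(x) \Bigr)^{1-\alpha} < \infty .
\end{aligned}
```

Both factors on the right are finite because $\theta$ and $\theta'$ satisfy (2.5.3), so the point $\alpha\theta + (1-\alpha)\theta'$ satisfies (2.5.3) as well.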
In studying the family $\{p_\theta(x)\}$ in a natural parametrization, we need an
extension into the complex domain.
Let $\xi = (\xi_1, \ldots, \xi_s)$ denote a point in the natural parameter space $\Omega$, so
that the integral in (2.5.3) converges when $\theta_j = \xi_j$ $(j = 1, 2, \ldots, s)$. Then this
integral will obviously converge also for complex values $\theta_j = \xi_j + i\tau_j$, where
the $\tau_j$ are arbitrary real numbers. Admittedly, the values of $p_\theta(x)$ given by
formula (2.5.2) will not be positive and they will not in general even be real,
so that we do not obtain extensions of the family of distributions. However
such a procedure for extending the values of the parameters to the complex domain
will be useful to us on several accounts in what follows. In many cases it
allows us to apply the theory of functions of a complex variable.
Let $\varphi(x)$ denote a bounded complex-valued measurable function defined on
$E_k$. Consider the integral

\[ \int \varphi(x) \exp\Bigl[ \sum_{j=1}^{s} \theta_j T_j(x) \Bigr] d\nu(x). \tag{2.5.5} \]

Obviously this integral converges at every point $\xi = (\xi_1, \ldots, \xi_s)$ of the natural
parameter space $\Omega$ and also at every point of the cylindrical subset of the
complex space $E^{(2s)}$ consisting of points $\xi_j + i\tau_j$ (for $j = 1, 2, \ldots, s$), where
the $\tau_j$ are arbitrary real numbers.
If the natural space $\Omega$ contains an interior point $(\xi_1^{(0)}, \ldots, \xi_s^{(0)})$, we may
assert that the integral (2.5.5) is an analytic function of the parameters
$\theta_1, \ldots, \theta_s$ at that point. To see this, note that our integral can then be
represented as the limit to which a sequence of finite sums converges uniformly
with respect to the complex parameter $\theta = (\xi_1 + i\tau_1, \ldots, \xi_s + i\tau_s)$, where
$(\xi_1, \ldots, \xi_s)$ lies in a neighborhood of $(\xi_1^{(0)}, \ldots, \xi_s^{(0)})$. By Weierstrass'
theorem on functions of several variables (see for example [5]) we may conclude
that (2.5.5) is analytic at the point $(\xi_1^{(0)}, \ldots, \xi_s^{(0)})$ and at all complex points
with the same abscissa.
The derivatives with respect to the $\theta_j$ can be calculated at these points by
differentiating under the integral sign. This is true because the derivatives of
such analytic functions can be expressed as integrals of the Cauchy type over
small polycylinders around points $\theta$ of the type described, and by virtue of the
absolute and uniform convergence with respect to $\theta$ of the integral (2.5.5) in a
sufficiently small neighborhood of the given point we can transform the integral
with respect to $\theta$ into an integral of the Cauchy type with respect to the
measure $d\nu(x)$.
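These considerations can be applied to the derivation of certain useful
formulas (see [36]). From (2.5.2), we have, at interior points of $\Omega$,

$$\int_{E_n} \exp\Big(\sum_{j=1}^{s} \theta_j T_j(x)\Big)\, d\nu(x) = (C(\theta))^{-1}. \tag{2.5.6}$$

Differentiating under the integral sign with respect to $\theta_j$ and multiplying both
sides of the equation by $C(\theta)$, we obtain

$$E_\theta T_j(x) = -\frac{C_j'(\theta)}{C(\theta)} = -\frac{\partial}{\partial \theta_j} \ln C(\theta). \tag{2.5.7}$$

Analogously,

$$E_\theta\big(T_i(x) T_j(x)\big) = -\frac{C_{ij}''}{C} + \frac{2 C_i' C_j'}{C^2},$$

where the subscripts denote the variables of differentiation. Since
$\partial^2 \ln C / \partial\theta_i \partial\theta_j = C_{ij}''/C - C_i' C_j'/C^2$, it follows from (2.5.7) that

$$\operatorname{cov}(T_i(x), T_j(x)) = E(T_i(x) T_j(x)) - E(T_i(x))\, E(T_j(x)) = -\frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \ln C(\theta). \tag{2.5.8}$$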
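Formulas (2.5.7) and (2.5.8) can be checked numerically on a concrete family. The sketch below is only an illustration under stated assumptions: it takes the Poisson family written in natural form, with $T(x) = x$, the measure $\nu(\{x\}) = 1/x!$ and $C(\theta) = \exp(-e^{\theta})$ (none of which appears in the text above), and compares the mean and variance obtained by direct summation with finite-difference derivatives of $-\ln C(\theta)$. The function names are hypothetical.

```python
import math

def log_C(theta):
    # Assumed example: Poisson family p_theta(x) = C(theta) * exp(theta * x)
    # with respect to nu({x}) = 1/x!, so that C(theta) = exp(-exp(theta)).
    return -math.exp(theta)

def moments_by_summation(theta, terms=200):
    # Direct evaluation of E T and Var T for T(x) = x by summing the series.
    # The probabilities are built iteratively to avoid large factorials.
    lam = math.exp(theta)
    weight = math.exp(-lam)          # probability of x = 0
    mean = 0.0
    second = 0.0
    for x in range(terms):
        mean += x * weight
        second += x * x * weight
        weight *= lam / (x + 1)      # probability of x + 1
    return mean, second - mean * mean

def moments_from_log_C(theta, h=1e-4):
    # Central finite differences for -d/dtheta ln C and -d^2/dtheta^2 ln C,
    # i.e. the right-hand sides of (2.5.7) and (2.5.8) for s = 1.
    d1 = (log_C(theta + h) - log_C(theta - h)) / (2 * h)
    d2 = (log_C(theta + h) - 2 * log_C(theta) + log_C(theta - h)) / (h * h)
    return -d1, -d2

theta0 = 0.7
direct_mean, direct_var = moments_by_summation(theta0)
formula_mean, formula_var = moments_from_log_C(theta0)
```

Both pairs of numbers should agree up to the finite-difference error; for this family each of them is close to $e^{\theta_0}$.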

Examples of exponential families can be obtained by repeated samples from
the distributions exhibited in the different examples of §2 of the present chapter.
A normal distribution is illustrated by Example 5 (for the one-dimensional case)
and Example 7 (for the two-dimensional case). Example 3 illustrates a multinomial
distribution. Normal distributions (of arbitrary dimension $n$) and binomial
and multinomial distributions belong to this class. It will be instructive to consider
Examples 8 and 9 of §2 (the Pearson-III distribution and the uniform distribution
with variable mean). If we consider the number $a$ in Example 8 as given,
then for a repeated sample we obtain an exponential family with parameters
$\theta_1 = \gamma$ and $\theta_2 = m - 1$ and with sufficient statistics

$$T_1 = \sum_{i=1}^{n} (x_i - a); \qquad T_2 = \sum_{i=1}^{n} \ln (x_i - a).$$

On the other hand, if $a$ is also a parameter, we do not obtain an exponential
family. (In this and more general cases, it would be proper to speak of a
generalization of exponential families, namely, "families with movable carriers",
which are not exponential families.)
In Example 9, where the density is of a very simple form, assuming only
the values 1 and 0, we also do not obtain an exponential family.

§6. SUFFICIENT STATISTICS AND UNBIASED ESTIMATES

Let $P = \{P_\theta\}$, $\theta \in \Omega$, denote a family of distributions that is defined on a
measurable space $(\mathcal{X}, \mathcal{A})$ and let $F(P_\theta) = f(\theta)$ denote a functional defined on $P$.
A function $\varphi(x)$ defined on $\mathcal{X}$ is called an unbiased estimate for $f(\theta)$ if
the mathematical expectation $E_\theta \varphi(x) = \int_{\mathcal{X}} \varphi(x)\, dP_\theta(x)$ exists for all $\theta$ and if

$$E_\theta \varphi(x) = f(\theta). \tag{2.6.1}$$


In [32] Kolmogorov introduced upper estimates $\varphi_+(x)$ and lower estimates
$\varphi_-(x)$ for $f(\theta)$, which are characterized by the properties

$$E_\theta \varphi_+(x) \ge f(\theta); \qquad E_\theta \varphi_-(x) \le f(\theta), \tag{2.6.2}$$
so that an estimate which is both upper and lower is unbiased, and conversely.
We shall study below the behavior of unbiased estimates in certain
classes of exponential families. At the moment, however, what we need to do
is establish connections between the theory of unbiased estimates and the
theory of sufficient statistics.
Let $\varphi(x)$ denote an unbiased estimate for $f(\theta)$ and let $\chi(x)$ denote a
sufficient statistic for $\theta$ with an arbitrary set of values. Consider the conditional
mathematical expectations $E_\theta[\varphi(x) \mid \chi(x)]$. Since $\chi(x)$ is a sufficient
statistic, this expression is independent of $\theta$, so we may write $E[\varphi(x) \mid \chi(x)] = E_\chi(\varphi)$.
The function $E_\chi(\varphi)$ is a function only of $\chi(x)$ for given $\varphi(x)$.
Since $E_\theta(E_\chi(\varphi)) = E_\theta \varphi = f(\theta)$, we see that the function $E_\chi(\varphi)$ is an unbiased
estimate of $f(\theta)$ (see Rao [61], Blackwell [4], and Kolmogorov [32]). We also
have the following theorem ([16)., [17], [l S]).

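Theorem 2.6.1. If for some $\theta \in \Omega$ and an unbiased estimate $\varphi$ the variance
$D_\theta(\varphi) = \int_{\mathcal{X}} (\varphi - f(\theta))^2\, dP_\theta$ exists, then $D_\theta(E_\chi(\varphi))$ exists also and

$$D_\theta(E_\chi(\varphi)) \le D_\theta(\varphi). \tag{2.6.3}$$

Proof. We have

$$D_\theta \varphi = E_\theta(\varphi - f(\theta))^2 = E_\theta\big(\varphi - E_\chi(\varphi) + E_\chi(\varphi) - f(\theta)\big)^2$$
$$= E_\theta(\varphi - E_\chi(\varphi))^2 + E_\theta(E_\chi(\varphi) - f(\theta))^2 + 2 E_\theta\big[(\varphi - E_\chi(\varphi))(E_\chi(\varphi) - f(\theta))\big].$$

Furthermore,

$$E_\theta\big[(\varphi - E_\chi(\varphi))\, E_\chi(\varphi)\big] = E_\theta\big[E_\chi(\varphi - E_\chi(\varphi))\, E_\chi(\varphi)\big] = 0; \qquad E_\theta(\varphi - E_\chi(\varphi))\, f(\theta) = 0,$$

so that

$$D_\theta \varphi = D_\theta(E_\chi(\varphi)) + E_\theta(\varphi - E_\chi(\varphi))^2,$$

which proves (2.6.3).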
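The variance decomposition in this proof can be verified exactly on a small discrete model. The following sketch is an illustration only, under assumptions that are not in the text: a Bernoulli sample of size $n$ with success probability $p$, the crude unbiased estimate $\varphi = x_1$ of $f(p) = p$, and the sufficient statistic $\chi = x_1 + \dots + x_n$. All names are hypothetical, and exact rational arithmetic is used so that the identity holds without rounding.

```python
from fractions import Fraction
from itertools import product

def rao_blackwell_check(n, p):
    # Enumerate all 0/1 samples of size n; p must be a Fraction with 0 < p < 1.
    samples = list(product((0, 1), repeat=n))
    prob = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in samples}

    def phi(x):
        return Fraction(x[0])        # crude unbiased estimate of p

    def chi(x):
        return sum(x)                # sufficient statistic

    # Conditional expectation E[phi | chi = t], computed group by group.
    cond = {}
    for t in range(n + 1):
        group = [x for x in samples if chi(x) == t]
        mass = sum(prob[x] for x in group)
        cond[t] = sum(prob[x] * phi(x) for x in group) / mass

    var_phi = sum(prob[x] * (phi(x) - p) ** 2 for x in samples)
    var_proj = sum(prob[x] * (cond[chi(x)] - p) ** 2 for x in samples)
    residual = sum(prob[x] * (phi(x) - cond[chi(x)]) ** 2 for x in samples)
    return var_phi, var_proj, residual

var_phi, var_proj, residual = rao_blackwell_check(4, Fraction(1, 3))
```

Here `var_phi` plays the role of $D_\theta \varphi$, `var_proj` that of $D_\theta(E_\chi(\varphi))$, and `residual` that of $E_\theta(\varphi - E_\chi(\varphi))^2$; the first equals the sum of the other two.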


Theorem 2.6.1 can be generalized if we consider not the variance of $\varphi$ but
"defects" of a more general form. Let us replace $D(\varphi) = E(\varphi - f(\theta))^2$ with
the expression $E g(\varphi - f(\theta))$, where $g$ is an arbitrary continuous convex (downward)
function. We have (see Doob [15])

$$g\big(E(\varphi - f(\theta) \mid \chi)\big) = g\big(E_\chi(\varphi) - f(\theta)\big) \le E\big(g(\varphi - f(\theta)) \mid \chi\big),$$
where we assume the existence of the corresponding integrals.
Again taking the mathematical expectations, we obtain

Theorem 2.6.2.

$$E g\big(E_\chi(\varphi) - f(\theta)\big) \le E g\big(\varphi - f(\theta)\big). \tag{2.6.4}$$

This provides a generalization of Theorem 2.6.1.
Thus the operation of taking the conditional mathematical expectation of an
unbiased statistic $\varphi$ for a given value of a sufficient statistic leads to a new
unbiased statistic the "convex defects" of which do not exceed those of the
preceding one. In what follows, we shall frequently make use of this fact.
Now let us look at some examples of unbiased estimates.
Example 1. Let $x_1, \dots, x_n$ denote a repeated sample from a distribution
with parameters $E x_i = a$ and $D(x_i) = \sigma^2$. Then the quantity $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$ is
an unbiased estimate of the general mean $a$. An arbitrary linear form
$\lambda_1 x_1 + \dots + \lambda_n x_n$, where $\sum_{i=1}^{n} \lambda_i = 1$, is such an estimate. The quantity

$$\bar{x}^2 - \frac{1}{n(n-1)} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

can serve as an unbiased estimate of the functional $a^2$, as can numerous other
estimates.
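The unbiasedness claims of Example 1 hold for any distribution with finite variance, so they can be checked exactly on a small discrete distribution. The sketch below is an illustration under assumptions of its own: the three-point distribution, the sample size 3 and the weights of the linear form are arbitrary choices, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expected_value(statistic, values, probs, n):
    # Exact expectation of statistic(x_1, ..., x_n) for a repeated sample
    # from the discrete distribution P(x = values[k]) = probs[k].
    total = Fraction(0)
    for idx in product(range(len(values)), repeat=n):
        weight = Fraction(1)
        for k in idx:
            weight *= probs[k]
        total += weight * statistic([values[k] for k in idx])
    return total

def mean_square_estimate(x):
    # xbar^2 - (1 / (n (n - 1))) * sum (x_i - xbar)^2
    n = len(x)
    xbar = sum(x) / n
    return xbar ** 2 - sum((xi - xbar) ** 2 for xi in x) / (n * (n - 1))

def linear_form(x):
    # Weights 1/2, 1/4, 1/4 sum to 1 (an arbitrary illustrative choice).
    return Fraction(1, 2) * x[0] + Fraction(1, 4) * x[1] + Fraction(1, 4) * x[2]

values = [Fraction(0), Fraction(1), Fraction(3)]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
a = sum(v * q for v, q in zip(values, probs))          # general mean
mean_of_linear_form = expected_value(linear_form, values, probs, 3)
mean_of_square_estimate = expected_value(mean_square_estimate, values, probs, 3)
```

The first expectation should coincide with $a$ and the second with $a^2$.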

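Example 2. Suppose that the distribution is normal, so that $x_i \in N(a, \sigma_0^2)$,
where $\sigma_0$ is known. Then we may consider the integral

$$\frac{1}{\sqrt{2\pi}\, \sigma_0} \int_A \exp\Big[-\frac{(x - a)^2}{2\sigma_0^2}\Big]\, dx = f_A(a)$$

over an arbitrary Borel set $A$ on the real line. Suppose, for example, that
$x_i$ ($i = 1, \dots, n$) denotes the dimensions of a manufactured object and
that $A$ constitutes the zone lying outside the tolerated range $[a_0 - 3\sigma_0, a_0 + 3\sigma_0]$.
Then $f_A(a)$ denotes the proportion of rejects for a general mean $a$. An unbiased
estimate of $f_A(a)$ is the statistic (see [32])

$$\varphi_A(\bar{x}) = \frac{1}{\sqrt{2\pi}\, \sigma_0'} \int_A \exp\Big[-\frac{(x - \bar{x})^2}{2\sigma_0'^2}\Big]\, dx, \qquad \sigma_0'^2 = \Big(1 - \frac{1}{n}\Big) \sigma_0^2,$$

where $\sigma_0'^2$ is the conditional variance of a single observation for given $\bar{x}$.
Analogous estimates are made in [32] when $a$ and $\sigma$ are unknown.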

Example 3. Let $x_1, \dots, x_n \in N(a, \sigma^2)$ denote a repeated sample from a
normal population with unknown variance $\sigma^2$. Let us suppose that $n \ge 2$. We know
that under these conditions $\bar{x}$ and $s^2$, where $s^2 = (1/n) \sum_{i=1}^{n} (x_i - \bar{x})^2$, are
stochastically independent and $\bar{x} \in N(a, \sigma^2/n)$; $n s^2/\sigma^2 = \chi^2_{n-1}$, where $\chi^2_{n-1}$
is a $\chi^2$-distribution with $(n - 1)$ degrees of freedom. (See Cramér [33]. See
Chapter IV regarding the independence of the statistics in connection with the
theory of Neyman structures.) If we set $n s^2/\sigma^2 = z$ we have, for arbitrary
$p > 0$,

$$E z^p = \frac{1}{2^{(n-1)/2}\, \Gamma\big(\frac{n-1}{2}\big)} \int_0^\infty x^{\frac{n-3}{2} + p} e^{-x/2}\, dx = 2^p\, \frac{\Gamma\big(\frac{n-1}{2} + p\big)}{\Gamma\big(\frac{n-1}{2}\big)}.$$

Thus for arbitrary positive $p$ the statistic

$$\Big(\frac{n}{2}\Big)^p\, \frac{\Gamma\big(\frac{n-1}{2}\big)}{\Gamma\big(\frac{n-1}{2} + p\big)}\, s^{2p}$$

is an unbiased estimate for the parameter $\sigma^{2p}$.
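The normalizing factor of Example 3 is easy to evaluate and to test numerically. The sketch below is an illustration only: it computes the factor through `math.lgamma` and approximates $E z^p$ for $z \sim \chi^2_{n-1}$ by a midpoint rule. The function names, the step count and the integration limit are assumptions made for this example.

```python
import math

def unbiasing_factor(n, p):
    # (n/2)^p * Gamma((n-1)/2) / Gamma((n-1)/2 + p), computed on the log scale.
    k = (n - 1) / 2
    return math.exp(p * math.log(n / 2) + math.lgamma(k) - math.lgamma(k + p))

def chi_square_moment(df, p, steps=20000, upper=200.0):
    # Midpoint-rule approximation of E z^p for z chi-square distributed with
    # df degrees of freedom; the tail beyond `upper` is neglected.
    h = upper / steps
    log_norm = (df / 2) * math.log(2) + math.lgamma(df / 2)
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += math.exp((p + df / 2 - 1) * math.log(z) - z / 2 - log_norm)
    return total * h

n, p = 10, 1.5
# With sigma = 1 we have s^(2p) = (z / n)^p, so the product below should be
# close to sigma^(2p) = 1 if the factor removes the bias.
check = unbiasing_factor(n, p) * chi_square_moment(n - 1, p) / n ** p
```

For $p = 1$ the factor reduces to $n/(n-1)$, the familiar correction that turns $s^2$ into the unbiased variance estimate.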

Example 4 ([30] and [72]). Let $(x_1, y_1), \dots, (x_n, y_n)$ denote a two-dimensional
normal sample (see Example 7 of §2) with coefficient of correlation $\rho$ and
let

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2\, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

denote the sample coefficient of correlation. Then

$$E \arcsin r = \arcsin \rho, \tag{2.6.5}$$

so that (unexpectedly) $\arcsin \rho$ has the unbiased estimate $\arcsin r$.

Example 5. Let $x_1, \dots, x_n$ denote a repeated sample from the family of
all distributions on $E_1$ with continuous density $f(x)$. Let $T(x_1, \dots, x_n)$ denote
a continuous bounded statistic. This statistic is an unbiased estimate of
the functional $T_f = E_f T(x_1, \dots, x_n)$ for given $f$. Since it is bounded, it has
a variance $D_f(T) = E_f(T - T_f)^2$. A sufficient statistic of this family is the
variational series $x_1' \le \dots \le x_n'$. For a fixed sufficient statistic, the conditional
distribution of the sample consists of $n!$ points constituting the $n!$ permutations
of the variational series. Thus for a given value of a sufficient statistic the
conditional mathematical expectation of $T$ is equal to

$$\frac{1}{n!} \sum_{G} G T(x_1, \dots, x_n) = \bar{T},$$

where the sum extends over the set of all $n!$ permutations $G$ of $x_1, \dots, x_n$ and
$G T(x_1, \dots, x_n)$ denotes the value of $T$ at the sample permuted by $G$. By
Theorem 2.6.1 the variance of the symmetrized statistic $\bar{T}$ does not exceed the
variance of $T$.
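The symmetrization of Example 5 is simply an average over all orderings of the sample. A minimal sketch follows; the statistic `first_squared_minus_last` and the sample values are hypothetical choices made only to show the operation, and exact rational arithmetic is used.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def symmetrize(T):
    # Return the statistic x -> (1/n!) * sum of T over all permutations of x.
    def T_bar(x):
        n = len(x)
        return sum(Fraction(T(perm)) for perm in permutations(x)) / factorial(n)
    return T_bar

def first_squared_minus_last(x):
    # An asymmetric statistic, used only for illustration.
    return x[0] ** 2 - x[-1]

def first_observation(x):
    return x[0]

sample = (1, 2, 6)
mean_like = symmetrize(first_observation)(sample)          # sample mean
symmetric_value = symmetrize(first_squared_minus_last)(sample)
reordered_value = symmetrize(first_squared_minus_last)((6, 1, 2))
```

Symmetrizing the first observation gives the sample mean, and the symmetrized statistic takes the same value on every reordering of the sample, that is, it depends only on the variational series.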
CHAPTER III

NUISANCE PARAMETERS.
TESTS WITH INVARIANT POWER FUNCTIONS

§1. NUISANCE PARAMETERS
Suppose that on a measurable space $(\mathcal{X}, \mathcal{A})$ there is defined a family of measures
$\{P_\theta\}$, for $\theta \in \Omega$, that are dominated by a $\sigma$-finite measure $\mu$ with respect
to which there is a probability density $p_\theta(x)$. We shall assume that the values of
the parameter $\theta = (\theta_1, \dots, \theta_s)$ lie in a Borel set $\Omega$ (usually a parallelepiped)
contained in the Euclidean space $E_s$. Furthermore, we shall assume that for
every value of $x$ the density $p_\theta(x) = p(x; \theta_1, \dots, \theta_s)$ is a sufficiently smooth
function of the parameters $\theta_1, \dots, \theta_s$. The degree of smoothness required will
vary from problem to problem and will be specified in the individual cases.
Instead of the given parameters $\theta_1, \dots, \theta_s$, we may introduce other parameters
$\gamma_1(\theta_1, \dots, \theta_s), \dots, \gamma_s(\theta_1, \dots, \theta_s)$. For suitable choice of these functions
$\gamma_1, \dots, \gamma_s$, the property of smoothness of $p_\theta(x)$ for given $x$ is retained for
the new parameters $\gamma_1, \dots, \gamma_s$. Let us now suppose that we subject a hypothesis
$H_0$ regarding the parameters $\theta_1, \dots, \theta_q$ (where $q < s$) to statistical verification.
The hypothesis may be of the form $(\theta_1, \dots, \theta_q) \in \omega$, where $\omega$ is a subset of the
space of the parameters $(\theta_1, \dots, \theta_q)$. We make no hypothesis of any sort regarding
the remaining parameters $\theta_{q+1}, \dots, \theta_s$. However, since these parameters
appear in the expression for the probability density $p(x; \theta_1, \dots, \theta_s)$, they may
play a part in all calculations of the distributions of the statistics that we may
make in verifying the hypothesis $H_0$; for this reason we have to keep them in mind.
Following H. Hotelling, we call such parameters $\theta_{q+1}, \dots, \theta_s$ nuisance parameters
of the given problem.
Frequently, the hypothesis $H_0$ deals with the behavior of certain functions
$\gamma_1(\theta_1, \dots, \theta_s), \dots, \gamma_q(\theta_1, \dots, \theta_s)$ ($q < s$) of the given parameters and is of the
form $(\gamma_1, \dots, \gamma_q) \in \omega_q$, where $\omega_q$ is a subset of $E_q$. Let us assume that the
functions $\gamma_1(\theta_1, \dots, \theta_s), \dots, \gamma_q(\theta_1, \dots, \theta_s)$ are functionally independent and
sufficiently smooth and that there exist $s - q$ additional functions
$\gamma_{q+1}(\theta_1, \dots, \theta_s), \dots, \gamma_s(\theta_1, \dots, \theta_s)$ such that the parameters
$\gamma_1(\theta_1, \dots, \theta_s), \dots, \gamma_s(\theta_1, \dots, \theta_s)$
constitute a new system of parameters with the same properties as the original
system. Then it is natural to consider $\gamma_{q+1}(\theta_1, \dots, \theta_s), \dots, \gamma_s(\theta_1, \dots, \theta_s)$ as
the nuisance parameters of the hypothesis $H_0$.
Let us look at some examples of the verification of a hypothesis with nuisance
parameters.

Example 1. Verification of a linear hypothesis in the one-dimensional case
(see for example [36]). Let $x_1, \dots, x_n$ denote independent normal variables with
means $\xi_1, \dots, \xi_n$ and common variance $\sigma^2$. We know that the vector of the means
$(\xi_1, \dots, \xi_n)$ lies in the $t$-dimensional linear subspace $\Pi_\Omega \subset E_n$. Suppose that we
wish to verify the hypothesis $H_0$ that this vector lies in the subspace $\Pi_\omega$ of
dimension $t - r < t$.
Here the coordinates in the $t$-dimensional space $\Pi_\Omega$ and the standard deviation
$\sigma$ serve as the parameters $(\theta_1, \dots, \theta_s)$, where $s = t + 1$. Let us assume that
the hypothesis imposes $r$ linear relationships among the first $t$ parameters, so that
there will be $t - r + 1$ nuisance parameters in the problem.
Suppose, in particular, that we have a linear regression scheme: $\xi_j = \alpha + \beta t_j$
(for $j = 1, 2, \dots, n$), where the $t_j$ are known. We wish to verify the hypothesis
$H_0$: $\beta = 0$. Here the space $\Pi_\Omega$ consists of vectors of the form $\alpha(1, \dots, 1) +
\beta(t_1, \dots, t_n)$, so that we have the three parameters $\alpha$, $\beta$, and $\sigma$ for $t = 2$. With
the hypothesis $H_0$: $\beta = 0$, we have two nuisance parameters $\alpha$ and $\sigma$.
The hypothesis $H_0'$ that $\alpha = \alpha_0$ and $\beta = \beta_0$, which fixes the value of the
regression parameters, reduces to a linear hypothesis in the sense of the above
definition if we take the new variables $y_j = x_j - \alpha_0 - \beta_0 t_j$ (for $j = 1, 2, \dots, n$).
For the new variables, we have the hypothesis $H_0'$: $\alpha = 0$, $\beta = 0$ and a single
nuisance parameter $\sigma$.
Example 2. An important special case of a linear hypothesis in the somewhat
more general sense described at the end of the preceding example is Student's
problem: For a repeated normal sample $x_1, \dots, x_n \in N(a, \sigma^2)$, verify the
hypothesis $H_0$: $a = a_0$ regarding a given value of the general mean $a$. The nuisance
parameter is $\sigma$.
Example 3. For the same sample, verify the hypothesis $H_0$: $a/\sigma = \gamma_0$. Making
a suitable change of parameters $\gamma_0 = \gamma_0(a, \sigma) = a/\sigma$ and $\gamma_1 = \gamma_1(a, \sigma)$, we can
treat $\gamma_1$ as a nuisance parameter. In particular, we may set $\gamma_1 = a$ or $\gamma_1 = \sigma$.

Example 4 (the Behrens-Fisher problem). Let $x_1, \dots, x_{n_1}$ and $y_1, \dots, y_{n_2}$
denote two repeated samples taken respectively from the normal populations $N(a_1, \sigma_1^2)$
and $N(a_2, \sigma_2^2)$. Verify the hypothesis $H_0$: $a_1 - a_2 = 0$. We may treat $a_1$, $\sigma_1$,
and $\sigma_2$ as nuisance parameters. Basically, this problem is a special case of a
generalized linear hypothesis when not only the mean values but also the variances
are assumed variable. The present problem is one of the best-known problems
with nuisance parameters and it will be treated in detail at the end of the book.

Example 5. Under the conditions of Example 4, verify the hypothesis
$H_0$: $\gamma = \sigma_1^2/\sigma_2^2 = \gamma_0$. Here the nuisance parameters are, for example, $a_1$, $a_2$, and $\sigma_2$.

§ 2. TESTS WITH INVARIANT POWER FUNCTIONS


Let us again look at the measurable space $(\mathcal{X}, \mathcal{A})$ and the family of measures
with probability density $p(x; \theta_1, \dots, \theta_s)$ introduced in the preceding section
defined on it. Let us consider a null statistical hypothesis $H_0$ of the form
$\theta_1 = \theta_1^{(0)}, \dots, \theta_q = \theta_q^{(0)}$, where $q < s$. As has already been noted, the hypothesis $H_0'$
that there are $q$ relationships of the form $\gamma_1(\theta_1, \dots, \theta_s) = 0, \dots, \gamma_q(\theta_1, \dots, \theta_s) = 0$
among the parameters can be reduced to such a form when certain analytic properties
are imposed on these relations. An alternative to $H_0$ is the hypothesis that
the parameters fall in the complement (with respect to $\Omega$) of the set defined by
the hypothesis $H_0$. We denote the hypothesis alternative to $H_0$ by $H_1$.
Simple hypotheses $H_0'$ that the parameters $\theta_1, \dots, \theta_s$ assume given values
satisfying the relationships indicated will be called "special cases of the
hypothesis $H_0$."
The hypothesis $H_0$ is a composite statistical hypothesis since it does not
fix the values of the nuisance parameters $\theta_{q+1}, \dots, \theta_s$ and hence does not give
the probability density $p(x; \theta_1, \dots, \theta_s)$.
Let us assume that verification of the hypothesis $H_0$ is done with the aid of
a critical function $\Phi(x)$, which we shall also call a test. If $\Phi(x)$ assumes only
the values 0 and 1, we shall say that the test is unrandomized and we shall call
the zone of values of $x$ such that $\Phi(x) = 1$ the critical zone $Z$, so that $\Phi(x)$ is
the characteristic function of the critical zone. If $x \in Z$, then $H_0$ is rejected
and we accept $H_1$. If $\Phi(x)$ assumes other values in the interval $[0, 1]$ we say that
the test is randomized. If an observation yields the value $x$, we need to have a
"play-off" with two outcomes: $H_0$ with probability $1 - \Phi(x)$ and $H_1$ with
probability $\Phi(x)$, and we should accept or reject $H_0$ in accordance with these outcomes.
We know that, even in the simplest case of testing a simple hypothesis $H_0$
against a simple alternative $H_1$, to obtain the most powerful test we need to
introduce randomization (see for example [36]).
Let us denote by $E_\theta \Phi(x)$ the mathematical expectation of $\Phi(x)$ for given
$\theta = (\theta_1, \dots, \theta_s)$. Those values $(\theta_1, \dots, \theta_s)$ for which the hypothesis
$H_0$: $\theta_1 = \theta_1^{(0)}, \dots, \theta_q = \theta_q^{(0)}$ is satisfied correspond to the value of the level of the test
$\Phi(x)$:
(3.2 .1)

and they characterize the probability of rejecting $H_0$ when it is valid. If $\theta$
assumes values corresponding to $H_1$, then

$$E_\theta \Phi(x) = \beta(\theta_1, \dots, \theta_s) \tag{3.2.2}$$

gives the power of the test $\Phi$. We shall refer to both the values of (3.2.1) and
the values of (3.2.2) as the values of the power function $\varphi(\theta_1, \dots, \theta_s)$ of the test
$\Phi(x)$, keeping in mind their different statistical connotations depending on whether
they correspond to values of the parameter under $H_0$ or under $H_1$. Thus we
have

$$\varphi(\theta_1, \dots, \theta_s) = E_\theta \Phi(x). \tag{3.2.3}$$

To a significant degree, the power function $\varphi$ characterizes the properties of a
test although there are other characteristics that are important for many measurable
spaces, for example those spaces encountered in sequential analysis.
To calculate the power function $\varphi(\theta_1, \dots, \theta_s)$ we need to know the values
of the parameters. If we wish to determine the behavior of a test for the simple
particular cases $H_0$: $(\theta_1^{(0)}, \dots, \theta_q^{(0)}, \theta_{q+1}', \dots, \theta_s')$ and the alternative
$H_1$: $(\theta_1^{(1)}, \dots, \theta_q^{(1)}, \theta_{q+1}'', \dots, \theta_s'')$, then we need to know not only the values of the
"basic" parameters $\theta_1, \dots, \theta_q$ but also the values of the nuisance parameters
$\theta_{q+1}, \dots, \theta_s$. The question naturally arises: is it possible to construct tests
for which the power function $\varphi(\theta_1, \dots, \theta_s)$ is independent of the nuisance
parameters $\theta_{q+1}, \dots, \theta_s$ and is determined only by the values of the "basic"
parameters $\theta_1, \dots, \theta_q$ with which the hypotheses $H_0$ and $H_1$ deal? We shall call
tests of such a kind tests with invariant power function (with respect to the
nuisance parameters).
There exist trivial examples of such tests: The test $\Phi = \alpha$, where $\alpha$ is a
constant belonging to the interval $(0, 1)$, is obviously a test with invariant power
function. However, one can easily see that this test is completely useless for
verifying the hypothesis $H_0$.
In general, if a test $\Phi$ has an invariant power function $\varphi(\theta_1, \dots, \theta_s)$ that is
independent not only of the nuisance parameters $\theta_{q+1}, \dots, \theta_s$ but also of the
basic parameters $\theta_1, \dots, \theta_q$, that test will be useless for verifying $H_0$. In what
follows we shall be concerned with tests $\Phi(x)$ with invariant power function that
are not useless. Such tests might be of particular interest if, for a sufficiently
broad class of problems, we could obtain sufficiently broad classes of such tests
and examine their properties, seeking tests that are optimum in some respect or
other. Their advantage over other tests would consist in the fact that these other
(noninvariant) tests do not in general admit calculation of the power function
$\varphi(\theta_1, \dots, \theta_s)$ for particular cases of $H_0$ and $H_1$ and hence their qualities are to
a considerable degree unclear.
At the present time, however, we have extremely few results in this direction,
and the results that we do have deal primarily with particular cases of testing
hypotheses (of the type of Examples 1-5 in §1). Since these results are interesting,
however, we shall present them in the following section.

§3. SOME RESULTS DEALING WITH TESTS
WITH INVARIANT POWER FUNCTIONS
The question of unrandomized tests with invariant power function for Student's
problem (Example 2 of § 1) was first studied by Dantzig [14] in 1940. Dantzig
showed that all such tests are useless. Here we shall look at a somewhat more
general statement of this problem.
Let $x_1, \dots, x_n \in N(a, \sigma^2)$ denote a repeated normal sample. Sufficient
statistics for the parameters $a$ and $\sigma^2$ are $\bar{x}$ and $s^2 = (1/n) \sum_{i=1}^{n} (x_i - \bar{x})^2$. Let
$\gamma(a, \sigma)$ denote a piecewise continuous function of the parameters $a$ and $\sigma$.
Suppose that we are verifying a hypothesis $H_0$: $\gamma(a, \sigma) = \gamma_0$ (where $\gamma_0$ is one of
the possible values of $\gamma$). To verify it we take a randomized test $\Phi(x)$. Consider
the conditional mathematical expectation

$$E(\Phi(x) \mid \bar{x}, s^2) = \Phi_1(\bar{x}, s^2). \tag{3.3.1}$$

Since $\bar{x}$ and $s^2$ are sufficient statistics, the expression $\Phi_1(\bar{x}, s^2)$ will not depend
on the parameters and will also be a test, because with a suitable definition of
conditional mathematical expectation we have $0 \le \Phi_1 \le 1$. Furthermore, Theorems
2.6.1 and 2.6.2 enable us to assume that the properties of the new test $\Phi_1(\bar{x}, s^2)$,
which depends only on sufficient statistics, will be no worse in several respects
than the properties of the original test. In all cases, for arbitrary values of $a$
and $\sigma$,

$$E_{a,\sigma} \Phi(x) = E_{a,\sigma} \Phi_1(\bar{x}, s^2), \tag{3.3.2}$$

that is, the power function of the test $\Phi(x)$ coincides with the power function of
the test $\Phi_1(\bar{x}, s^2)$, so that investigation of the tests with invariant power function
can be confined to the class of tests that depend only on the
sufficient statistics.
In what follows we shall frequently use an operation of the type (3.3.1), which
we shall call the operation of projection onto the $\sigma$-algebra of sufficient statistics.
For now, let us consider the question of verifying the hypothesis $H_0$: $\gamma(a, \sigma) = \gamma_0$
by using tests with invariant power function. It turns out that the existence of
such tests that are not useless depends in a very real way on the form of the
function $\gamma(a, \sigma)$. For example, let us take $\gamma(a, \sigma) = \sigma$, so that the hypothesis $H_0$ is
of the form $\sigma = \sigma_0$. The unrandomized test $\Phi$ with critical zone $s^2 > C$ (for any
$C > 0$) will obviously possess a power function that is independent of $a$ but
dependent on $\sigma$, so that it has an invariant power function but still is not useless.
A similar situation is true of a test with critical zone $f(x_i - x_j) > C$, where $f$ is
an arbitrary continuous function of the differences in observations. Its power
function is independent of $a$ but (in general) dependent on $\sigma$. We can express
this situation very concisely by saying that the hypothesis $H_0$: $\sigma = \sigma_0$ admits an
invariant verification. However, as we shall see later, the hypothesis $H_0$: $a = a_0$
does not have an invariant verification. We can prove this and a more general
theorem.
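The invariance of the test with critical zone $s^2 > C$ can be made concrete. Since $n s^2/\sigma^2$ has a $\chi^2$-distribution with $n - 1$ degrees of freedom, the power equals $P\{\chi^2_{n-1} > nC/\sigma^2\}$, in which the mean $a$ does not appear. The sketch below is an illustration under one simplifying assumption: $n$ is taken odd, so that $n - 1$ is even and the $\chi^2$ tail has an elementary closed form. The function names are hypothetical.

```python
import math

def chi2_survival_even(df, u):
    # P(chi-square with an even number df of degrees of freedom exceeds u)
    # = exp(-u/2) * sum_{j < df/2} (u/2)^j / j!
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = u / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def power_of_variance_test(n, C, sigma):
    # Power of the unrandomized test with critical zone s^2 > C, where
    # s^2 = (1/n) * sum (x_i - xbar)^2.  The mean a is not an argument:
    # n * s^2 / sigma^2 is chi-square with n - 1 degrees of freedom.
    return chi2_survival_even(n - 1, n * C / sigma ** 2)

power_small_sigma = power_of_variance_test(5, 1.0, 1.0)
power_large_sigma = power_of_variance_test(5, 1.0, 2.0)
```

The power grows with $\sigma$ and does not involve $a$, which is exactly the behavior described for the hypothesis $\sigma = \sigma_0$.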
Theorem 3.3.1. The hypothesis $H_0$: $a/\sigma^p = \gamma_0$ does not admit an invariant
verification for $p < 1$.
To prove this theorem we need only consider randomized and unrandomized
tests that depend only on sufficient statistics. By virtue of (3.3.1) and (3.3.2),
the remaining tests reduce to these. Let us set $\bar{x} = X$ and $s^2 = V$. As we know,
$\bar{x}$ and $s^2$ are stochastically independent (for phenomena of this sort see Chapter
IV).
Furthermore (see for example [33]) the statistic $X$ has probability density

$$p_1(x) = \Big(\frac{n}{2\pi\sigma^2}\Big)^{1/2} \exp\Big[-\frac{n}{2\sigma^2} (x - a)^2\Big]. \tag{3.3.3}$$

The statistic $V$ has probability density

$$p_2(v) = \Big(\frac{n}{2}\Big)^{\frac{n-1}{2}} \Big[\Gamma\Big(\frac{n-1}{2}\Big)\Big]^{-1} \sigma^{-n+1}\, v^{\frac{n-3}{2}} \exp\Big(-\frac{nv}{2\sigma^2}\Big). \tag{3.3.4}$$

Thus the joint probability density is

$$p(x, v) = C_n\, \sigma^{-n}\, v^{\frac{n-3}{2}} \exp\Big\{-\frac{n}{2\sigma^2}\big[v + (x - a)^2\big]\Big\}, \qquad C_n = \frac{n^{n/2}}{\sqrt{2\pi}\; 2^{\frac{n-1}{2}}\, \Gamma\big(\frac{n-1}{2}\big)}. \tag{3.3.5}$$

Let us also set $n/2\sigma^2 = \lambda$ and $na/\sigma^2 = \mu$ (this corresponds to a natural
parametrization of our exponential family). From (3.3.5) we obtain

$$p(x, v) = C_n'\, \lambda^{n/2} \exp\Big(-\frac{\mu^2}{4\lambda}\Big)\, v^{\frac{n-3}{2}} \exp\big[-\lambda(v + x^2) + \mu x\big], \tag{3.3.6}$$

where $C_n' > 0$ is a new constant.
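The passage from the parameters $(a, \sigma)$ to the natural parameters $(\lambda, \mu)$ is a purely algebraic identity, and it can be checked numerically. The sketch below is an illustration only. It assumes that the joint density is the product of (3.3.3) and (3.3.4) and that the new constant is $C_n' = C_n (2/n)^{n/2}$, which is what the substitution $\sigma^{-n} = (2\lambda/n)^{n/2}$ gives. The function names and the numerical values are hypothetical.

```python
import math

def log_C_n(n):
    # ln of C_n = n^(n/2) / (sqrt(2 pi) * 2^((n-1)/2) * Gamma((n-1)/2)),
    # the constant of the product of (3.3.3) and (3.3.4).
    return ((n / 2) * math.log(n) - 0.5 * math.log(2 * math.pi)
            - ((n - 1) / 2) * math.log(2) - math.lgamma((n - 1) / 2))

def density_original(x, v, a, sigma, n):
    # Joint density of (xbar, s^2) written in terms of a and sigma.
    log_p = (log_C_n(n) - n * math.log(sigma) + ((n - 3) / 2) * math.log(v)
             - n * (v + (x - a) ** 2) / (2 * sigma ** 2))
    return math.exp(log_p)

def density_natural(x, v, lam, mu, n):
    # The same density in the form (3.3.6), with the assumed constant
    # C'_n = C_n * (2/n)^(n/2).
    log_C_prime = log_C_n(n) + (n / 2) * math.log(2 / n)
    log_p = (log_C_prime + (n / 2) * math.log(lam) - mu ** 2 / (4 * lam)
             + ((n - 3) / 2) * math.log(v) - lam * (v + x ** 2) + mu * x)
    return math.exp(log_p)

n, a, sigma = 6, 1.3, 0.8
lam, mu = n / (2 * sigma ** 2), n * a / sigma ** 2     # natural parameters
p_original = density_original(0.9, 0.5, a, sigma, n)
p_natural = density_natural(0.9, 0.5, lam, mu, n)
```

The two values agree up to rounding, which confirms that $2\lambda a = \mu$ and $\lambda a^2 = \mu^2/4\lambda$ are the only substitutions involved.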


Let $\Phi_1(X, V)$ denote a test for the hypothesis $H_0$: $a/\sigma^p = \gamma_0$ that depends
only on the sufficient statistics $X$ and $V$. In accordance with (3.3.6), we have
an expression for its power function:

$$\varphi(\lambda, \mu) = E_{\lambda,\mu} \Phi_1(X, V) = C_n'\, \lambda^{n/2} \exp\Big(-\frac{\mu^2}{4\lambda}\Big) \int_{-\infty}^{\infty} dx \int_0^{\infty} dv\, \Phi_1(x, v)\, v^{\frac{n-3}{2}} \exp\big[-\lambda(v + x^2) + \mu x\big]. \tag{3.3.7}$$

Thus

$$\int_{-\infty}^{\infty} dx \int_0^{\infty} dv\, \Phi_1(x, v)\, v^{\frac{n-3}{2}} \exp\big[-\lambda(v + x^2) + \mu x\big] = C_n''\, \lambda^{-n/2} \exp\Big(\frac{\mu^2}{4\lambda}\Big)\, \varphi(\lambda, \mu), \tag{3.3.8}$$

where $C_n'' = 1/C_n'$. Since $\Phi_1(x, v)$ is a Lebesgue-measurable function such that
$0 \le \Phi_1(x, v) \le 1$, we see from (3.3.7) that $\varphi(\lambda, \mu)$ is an analytic function of
$\lambda$ and $\mu$ for $\lambda > 0$, $-\infty < \mu < \infty$.

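The hypothesis $H_0$: $a/\sigma^p = \gamma_0$ can be represented in terms of the parameters
$\lambda$ and $\mu$ as follows:

$$\frac{\mu}{\lambda^{1-p/2}} = 2 \Big(\frac{n}{2}\Big)^{p/2} \gamma_0 = \gamma_1. \tag{3.3.9}$$

Thus, it is a question of the existence of a test $\Phi_1(X, V)$ such that in formula
(3.3.8),

$$\varphi(\lambda, \mu) = \psi\Big(\frac{\mu}{\lambda^{1-p/2}}\Big) \tag{3.3.10}$$

for $\lambda > 0$ and $-\infty < \mu < \infty$. Here the function $\psi = \psi(t)$ must not be a constant if
the test is to be of any use. Because of the analyticity of $\varphi(\lambda, \mu)$ in the region
defined above, the function $\psi(t)$ is differentiable for $t > 0$. Let us make use of
this fact. If $\psi(t)$ is not a constant, there exists a point $t = t_0$ where $\psi'(t_0) \ne 0$.
In equation (3.3.8) we set $\varphi(\lambda, \mu) = \psi(\mu/\lambda^{1-p/2})$ and differentiate both sides
with respect to $\mu$. Since $\lambda > 0$, we can differentiate under the integral signs on
the left-hand side of the equation. We obtain

$$\int_{-\infty}^{\infty} \int_0^{\infty} dx\, dv\, \Phi_1(x, v)\, v^{\frac{n-3}{2}}\, x \exp\big[-\lambda(v + x^2) + \mu x\big] = C_n''\, \lambda^{-n/2} \exp\Big(\frac{\mu^2}{4\lambda}\Big) \Big(\frac{\mu}{2\lambda}\, \psi\Big(\frac{\mu}{\lambda^{1-p/2}}\Big) + \psi'\Big(\frac{\mu}{\lambda^{1-p/2}}\Big) \frac{1}{\lambda^{1-p/2}}\Big). \tag{3.3.11}$$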

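Now, let us set $\mu = t_0 \lambda^{1-p/2}$ and consider the behavior of the two sides of
equation (3.3.11) for small positive values of $\lambda$. Let us find an upper bound for the
left-hand side of that equation by replacing the factor $x$ with $|x|$ and the function
$\Phi_1(x, v)$ with its maximum value, namely 1. For the left-hand side of equation
(3.3.11) we obtain the upper bound

$$2 \int_0^{\infty} v^{\frac{n-3}{2}} \exp(-\lambda v)\, dv \int_0^{\infty} x \exp(-\lambda x^2 + |\mu| x)\, dx = 2\, \Gamma\Big(\frac{n-1}{2}\Big)\, \lambda^{-\frac{n-1}{2}} \int_0^{\infty} x \exp(-\lambda x^2 + |\mu| x)\, dx. \tag{3.3.12}$$

To find a bound for the remaining integral, we remember that $\mu = t_0 \lambda^{1-p/2}$, where
$p < 1$. Let us break this integral into two integrals: $\int_0^{\Lambda_1} (\cdot)\, dx + \int_{\Lambda_1}^{\infty} (\cdot)\, dx$, where
$\Lambda_1 = \ln(1/\lambda)/\sqrt{\lambda}$. For the values of $\mu$ indicated, we see that

$$\int_{\Lambda_1}^{\infty} (\cdot)\, dx = O\Big(\exp\Big\{-\frac{1}{2} \Big(\ln \frac{1}{\lambda}\Big)^2\Big\}\Big).$$

A trivial bound for $\int_0^{\Lambda_1} (\cdot)\, dx$ is $\Lambda_1^2 = O\big((\ln(1/\lambda))^2/\lambda\big)$. By substituting this
value into (3.3.12) we obtain a bound for (3.3.12) as $\lambda \downarrow 0$:

$$O\Big(\lambda^{-\frac{n}{2} - \frac{1}{2}} \Big(\ln \frac{1}{\lambda}\Big)^2\Big). \tag{3.3.13}$$

Here the right-hand side of equation (3.3.11) is of the form

$$C_n''\, \lambda^{-n/2} \exp\Big(\frac{t_0^2}{4} \lambda^{1-p}\Big) \Big(\frac{t_0}{2}\, \lambda^{-p/2}\, \psi(t_0) + \psi'(t_0)\, \lambda^{\frac{p}{2} - 1}\Big).$$

Since $p < 1$, we have $p/2 - 1 < -1/2$. Since $\psi'(t_0) \ne 0$, the right-hand side grows
like $\lambda^{-n/2 + p/2 - 1}$, that is, faster than the bound (3.3.13), and we obtain a contradiction
by letting $\lambda$ approach 0 from above; thus the proof of Theorem 3.3.1 is complete.
In particular, we see that the hypothesis $H_0$: $a = a_0$ does not admit an invariant
verification.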
We note now that for $p = 1$ an invariant verification of the hypothesis $H_0$:
$\gamma = a/\sigma = \gamma_0$ is possible. To carry out this verification we may take, for example,
the statistic = + ••. + x~, where
u x;Jxi
X=xVn+l ,Xi=(I- n~it'''(xi-x)(j=l,2, .. .,n).
The distribution of this statistic depends only on $a/\sigma$, so that the unrandomized
test with critical zone $|U| \ge C$ will have an invariant power function.
A result analogous to the last one can be formulated for the general case of
a linear hypothesis (Example 1 of §1). A very simple case of a more general
hypothesis admitting inequality of the variances of the elements of the sample is
the Behrens-Fisher problem (Example 4 of §1). In the notation of that example,
suppose that we are verifying the hypothesis $H_0$: $a_1 - a_2 = 0$. We shall take
$a_1$, $\sigma_1$, and $\sigma_2$ as the nuisance parameters.
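Theorem 3.3.2. The hypothesis of equality of means in the Behrens-Fisher
problem does not admit an invariant verification.
Proof. Corresponding to the four parameters $a_1$, $a_2$, $\sigma_1$, and $\sigma_2$ in the
problem are the four sufficient statistics $X_1 = \bar{x}$, $X_2 = \bar{y}$, $V_1 = s_1^2$, and $V_2 = s_2^2$.
Remembering the laws of distribution of the form (3.3.3) and (3.3.4) for two samples
of sizes $n_1$ and $n_2$, we obtain an expression for the joint probability density

$$p(x_1, x_2, v_1, v_2) = C_{n_1} C_{n_2}\, \sigma_1^{-n_1} \sigma_2^{-n_2}\, v_1^{\frac{n_1-3}{2}} v_2^{\frac{n_2-3}{2}} \exp\Big\{-\frac{n_1}{2\sigma_1^2}\big[v_1 + (x_1 - a_1)^2\big] - \frac{n_2}{2\sigma_2^2}\big[v_2 + (x_2 - a_2)^2\big]\Big\}, \tag{3.3.14}$$

where $C_n$ is the constant appearing in (3.3.5).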
Just as in the preceding derivation, we introduce the natural parameters

$$\frac{n_1}{2\sigma_1^2} = \lambda_1; \qquad \frac{n_2}{2\sigma_2^2} = \lambda_2; \qquad \frac{n_1 a_1}{\sigma_1^2} = \mu_1; \qquad \frac{n_2 a_2}{\sigma_2^2} = \mu_2.$$

From these equations we get $2a_1 = \mu_1/\lambda_1$ and $2a_2 = \mu_2/\lambda_2$, so that the null
hypothesis takes the form $H_0$: $\mu_1/\lambda_1 - \mu_2/\lambda_2 = 0$. The more general hypothesis
$H_0$: $a_1 - a_2 = \delta$, where $\delta$ is given, reduces to this one when we substitute, for
example, $x_i + \delta/2$ for $x_i$ (where $i = 1, 2, \dots, n_1$) and $y_j - \delta/2$ for $y_j$ (where
$j = 1, 2, \dots, n_2$). Thus, as in the preceding derivation, the question of an invariant
verification of $H_0$ reduces to the question of the validity of the equation

$$\int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \int_0^{\infty} dv_1 \int_0^{\infty} dv_2\, \Phi_1(x_1, x_2, v_1, v_2)\, v_1^{\frac{n_1-3}{2}} v_2^{\frac{n_2-3}{2}} \exp\big[-\lambda_1(v_1 + x_1^2) - \lambda_2(v_2 + x_2^2) + \mu_1 x_1 + \mu_2 x_2\big]$$
$$= C_n'\, \lambda_1^{-n_1/2} \lambda_2^{-n_2/2} \exp\Big(\frac{\mu_1^2}{4\lambda_1} + \frac{\mu_2^2}{4\lambda_2}\Big)\, \psi\Big(\frac{\mu_1}{\lambda_1} - \frac{\mu_2}{\lambda_2}\Big). \tag{3.3.15}$$

Here $\Phi_1$ is a test depending only on the sufficient statistics, $C_n' > 0$, and $\psi$ is a
power function not depending on the nuisance parameters.
The left-hand member of the equation is an analytic function of $\lambda_1$, $\lambda_2$, $\mu_1$, and
$\mu_2$ for positive $\lambda_1$ and $\lambda_2$ and arbitrary $\mu_1$ and $\mu_2$. Therefore $\psi = \psi(t)$ is a twice
differentiable function. Let us show that $\psi''(t) \equiv 0$. Let us suppose, to the contrary,
that there exists a point $t_0$ such that $\psi''(t_0) \ne 0$. We multiply both sides of
equation (3.3.15) by the quantity $\exp[-(\mu_1^2/4\lambda_1 + \mu_2^2/4\lambda_2)]$ and apply the differential
operator $\partial^2/\partial\mu_1 \partial\mu_2$ to both sides of the resulting equation. Then we set
$\mu_1/\lambda_1 = 2t_0$ and $\mu_2/\lambda_2 = t_0$. On the right-hand side of the equation obtained by
multiplying (3.3.15) by the exponential expression indicated above, we obtain the
expression

$$-C_n'\, \lambda_1^{-n_1/2} \lambda_2^{-n_2/2}\, \psi''(t_0) \cdot \frac{1}{\lambda_1 \lambda_2}. \tag{3.3.16}$$
On the left side we obtain an expression that by virtue of (3.3.13) takes the form

(3.3.17)
as $\lambda_1 \downarrow 0$ and $\lambda_2 \downarrow 0$. If $\psi''(t_0) \ne 0$, we obtain a contradiction between (3.3.16)
and (3.3.17). Thus $\psi''(t) \equiv 0$ and $\psi(t) = \alpha + \beta t$, where $\alpha$ and $\beta$ are constants.
Furthermore, the argument $t = \mu_1/\lambda_1 - \mu_2/\lambda_2$ can assume arbitrary values and
$\psi(t)$, as power function, satisfies the condition $0 \le \psi(t) \le 1$. Therefore $\beta = 0$
and $\psi(t) = \alpha$, so that the test is useless. This completes the proof of the theorem.
However, the methods that we have expounded do not enable us to answer
the general question of which hypotheses of the form $H_0$: $\gamma(a, \sigma) = \gamma_0$ admit an
invariant verification in a problem of the type of Student's problem, to say nothing
of more general problems. To solve such problems, we need finer analytical
methods.

§4. STEIN'S TEST


The above derivations had to do with samples of constant size, normal dis-
tributions defined on a Euclidean space. In a well-known article, Stein [67] showed
in 1945, in particular, that the situation as regards Student's problem (Example 2
of§ 1) changes if, instead of the tests based on samples of constant size, we use
two-sample tests with samples of random size, i.e. if we take the point of view of
sequential analysis. It then turns out that the hypothesis H0 : a= a 0 in Student's
problem will admit an invariant verification. The present section is devoted to an
exposition of this result of Stein.
Suppose that independent random variables $x_1, x_2, \dots$ are distributed according
to a normal law $N(a, \sigma^2)$. We wish to verify the hypothesis $H_0$: $a = a_0$ with
the aid of a test whose power function depends only on $a - a_0$ and is independent
of $\sigma^2$. Suppose that we have a sample of size $n_0$: $x_1, \dots, x_{n_0}$. Let

$$s^2 = \frac{1}{n_0 - 1} \Big[\sum_{i=1}^{n_0} x_i^2 - \frac{1}{n_0} \Big(\sum_{i=1}^{n_0} x_i\Big)^2\Big] \tag{3.4.1}$$

denote the unbiased estimate of the variance. Furthermore, let us define the integer $n$
by

$$n = \max\Big\{\Big[\frac{s^2}{z}\Big] + 1,\; n_0 + 1\Big\}, \tag{3.4.2}$$

where $z > 0$ is a constant fixed in advance and $[q]$ denotes the greatest integer not
exceeding $q$. If we know the number $s^2$, we can determine the real numbers $b_i$
($i = 1, \dots, n$) such that

$$\sum_{i=1}^{n} b_i = 1; \qquad b_1 = b_2 = \dots = b_{n_0}; \qquad s^2 \sum_{i=1}^{n} b_i^2 = z. \tag{3.4.3}$$

Such a choice is possible since, when $\sum_{i=1}^{n} b_i = 1$ and $b_1 = b_2 = \dots = b_{n_0}$, the
minimum of $\sum_{i=1}^{n} b_i^2$ is attained when $b_i = 1/n$ ($i = 1, 2, \dots, n$) and its value is
$1/n$. By virtue of (3.4.2), we have $n \ge s^2/z$ and $s^2/n \le z$. Now we define a
random variable $t'$ by

$$t' = \frac{\sum_{i=1}^{n} b_i x_i - a_0}{\sqrt{z}} = \frac{\sum_{i=1}^{n} b_i (x_i - a)}{\sqrt{z}} + \frac{a - a_0}{\sqrt{z}} = u + \frac{a - a_0}{\sqrt{z}}, \tag{3.4.4}$$

where

$$u = \frac{\sum_{i=1}^{n} b_i (x_i - a)}{\sqrt{z}}. \tag{3.4.5}$$
Let us show that u has a Student's distribution tno-l with n 0 - 1 degrees of
freedom (see for example [33]} •. Specifically, by (3.4.3),
no n
b1 ~ (x1-a)+ ~ bj(x1-a)
l=l j=-n 0 +1

We note that the expression $b_1 \sum_{i=1}^{n_0} (x_i - a)$ is stochastically
independent of $s$ by the independence of $\bar{x}$ and $s^2$ in a normal sample. The
expression $\sum_{j=n_0+1}^{n} b_j (x_j - a)$ is stochastically independent of $s$ and
of $b_1 \sum_{i=1}^{n_0} (x_i - a)$ since it contains only the observations $x_j$ with
$j > n_0$, whereas the preceding expressions contain only the observations $x_i$ with
$i \le n_0$. Thus for given $s$ and given constants $b_i$, the numerator of the
fraction (3.4.5) is distributed independently of $s$ according to a normal law
$N(0, \sigma^2 \sum_{i=1}^{n} b_i^2)$. Furthermore, the quantity
$(n_0 - 1)s^2/\sigma^2$ has the distribution $\chi^2_{n_0-1}$, so that $s$ is
distributed like $\sigma\sqrt{\chi^2_{n_0-1}/(n_0 - 1)}$. Thus the fraction $u$ is
distributed like $X/\sqrt{\chi^2_{n_0-1}/(n_0 - 1)}$, where $X \in N(0, 1)$ is
independent of $\chi^2_{n_0-1}$. Therefore $u = t_{n_0-1}$ is distributed according to
Student's law with $n_0 - 1$ degrees of freedom. The statistic $u$ can be used to
construct an unrandomized test of the hypothesis $H_0: a = a_0$ with power function
independent of $\sigma$. Let $\alpha$ denote the desired level (the probability of
falling into the critical zone for $H_0$). Let $\xi_{\alpha/2}$ denote the abscissa at
which

$$P\{t_{n_0-1} > \xi_{\alpha/2}\} = \frac{\alpha}{2}. \tag{3.4.6}$$

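We determine the critical zone to be the zone

$$\left|\frac{\sum_{i=1}^{n} b_i x_i - a_0}{\sqrt{z}}\right| > \xi_{\alpha/2}. \tag{3.4.7}$$

Then we obtain a test for $H_0$ with power function

$$P\left\{\left|t_{n_0-1} + \frac{a - a_0}{\sqrt{z}}\right| > \xi_{\alpha/2}\right\}. \tag{3.4.8}$$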

Here $z$ is a constant chosen in advance, and therefore the power function $\varphi(a)$
depends only on $a$ and is independent of $\sigma$.
We note also that a Student's distribution (for $t_{n_0-1}$) is even and unimodal
and that it has a single peak at the coordinate origin. For given $a_0$ and
$\xi_{\alpha/2}$ the probability (3.4.8) attains a minimum at $a = a_0$ and its value
at $a_0$ is equal to the level $\alpha$. For $a \ne a_0$ it is higher than that level,
so that the probability of rejecting $H_0$ when it is not valid is greater than when
it is valid, i.e. this test is unbiased.
To test $H_0: a = a_0$ against a one-sided alternative $a > a_0$, we may follow a
similar procedure. The critical zone of the level $\alpha$ can be defined as

$$\frac{\sum_{i=1}^{n} b_i x_i - a_0}{\sqrt{z}} > \xi_{\alpha} \tag{3.4.9}$$

with power function

$$\varphi(a) = P\left\{t_{n_0-1} > \xi_{\alpha} + \frac{a_0 - a}{\sqrt{z}}\right\}.$$


Of course one can also construct other critical zones with invariant power
function by means of the statistic u.
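
As a rough illustration of the construction (3.4.1)-(3.4.4), the following Python sketch computes the sample size n, one admissible set of weights b_i, and the statistic t'. The function names, the callable `draw_next` that supplies further observations, and the choice of equal weights for all second-stage observations are assumptions made for this illustration only; the text requires nothing more than the relations (3.4.3).

```python
import math

def stein_sample_size(s2, z, n0):
    """Total sample size n = max([s2/z] + 1, n0 + 1)."""
    return max(math.floor(s2 / z) + 1, n0 + 1)

def stein_weights(s2, z, n0, n):
    """One choice of weights b_1, ..., b_n satisfying (3.4.3).

    Assumption of this sketch: the n - n0 second-stage weights are all
    equal to c, while b_1 = ... = b_n0 = b as required by (3.4.3).
    """
    q = z / s2        # required value of sum(b_i ** 2)
    m = n - n0        # number of second-stage observations (m >= 1)
    # Solving n0*b + m*c = 1 and n0*b**2 + m*c**2 = q for b gives
    # b = (1 + sqrt(m*(n*q - 1)/n0)) / n; here n*q >= 1 because n > s2/z.
    root = math.sqrt(max(m * (n * q - 1.0) / n0, 0.0))
    b = (1.0 + root) / n
    c = (1.0 - n0 * b) / m
    return [b] * n0 + [c] * m

def stein_statistic(first_stage, draw_next, a0, z):
    """Return (t', n) for H0: a = a0; draw_next() yields one new observation."""
    n0 = len(first_stage)
    mean0 = sum(first_stage) / n0
    # Unbiased variance estimate from the first stage, cf. (3.4.1).
    s2 = sum((x - mean0) ** 2 for x in first_stage) / (n0 - 1)
    n = stein_sample_size(s2, z, n0)
    xs = list(first_stage) + [draw_next() for _ in range(n - n0)]
    b = stein_weights(s2, z, n0, n)
    # Statistic t' of (3.4.4).
    t = (sum(bi * xi for bi, xi in zip(b, xs)) - a0) / math.sqrt(z)
    return t, n
```

The two-sided test then rejects when |t'| exceeds the Student quantile of (3.4.6), taken from a table for n_0 - 1 degrees of freedom. The sketch assumes s2 > 0, which holds with probability 1 for a normal sample.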
Let us also stop to look at the distribution of the random variable $n$, i.e.
the number of observations necessary for obtaining a solution. We have

$$n = \max\{[s^2/z] + 1,\ n_0 + 1\}.$$

Thus

$$P\{n = n_0 + 1\} = P\left\{\frac{s^2}{z} < n_0 + 1\right\} = P\left\{\frac{(n_0 - 1)s^2}{\sigma^2} < \frac{(n_0 + 1)(n_0 - 1)z}{\sigma^2}\right\} = P\{\chi^2_{n_0-1} < y\}, \tag{3.4.10}$$

where $y = (n_0 + 1)(n_0 - 1)z/\sigma^2$. For $v > n_0 + 1$ we have

$$P\{n = v\} = P\left\{v \le \frac{s^2}{z} + 1 < v + 1\right\} = P\left\{\frac{(v - 1)(n_0 - 1)z}{\sigma^2} \le \chi^2_{n_0-1} < \frac{v(n_0 - 1)z}{\sigma^2}\right\}, \tag{3.4.11}$$

and the necessary values of the probabilities $P\{n = v\}$ can be found from a table
of the $\chi^2$ distribution.
From this result, we can find the mathematical expectation
$E(n)$ of the necessary number of observations. We note that these last numbers
depend on the nuisance parameter $\sigma$. It should be noted that our wish to make the
power function strictly invariant with respect to the nuisance parameter can lead
to a certain loss of information. Therefore it is expedient to use certain
variations of this test that ensure weak dependence of the power function on the
nuisance parameter but that raise the lower limit of the power (cf. Stein [67]). We
shall not pursue these considerations since the primary aim of the present
monograph is to give as complete a description as possible of certain forms of
invariant tests, so as to make it possible to form an opinion of their qualities in
comparison with other tests and of the loss of information in comparison with those
situations in which the values of the nuisance parameters are known. For further
information on tests of the Stein type as applied to various hypotheses associated
with a normal law see the article by Chatterjee [78].
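
The distribution of n discussed above can also be tabulated numerically instead of being read from a table. The sketch below works directly from the definition n = max{[s^2/z] + 1, n_0 + 1}: for v >= n_0 + 1 the event n <= v is the event s^2/z < v, that is, chi^2_{n_0-1} < v(n_0 - 1)z/sigma^2. To stay within the standard library it assumes that n_0 - 1 is even, in which case the chi-square distribution function has a finite closed form; this restriction and the function names are assumptions of the illustration, not part of the text.

```python
import math

def chi2_cdf_even(x, df):
    """Distribution function of chi-square with an even number df of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 0.0
    h = x / 2.0
    term, total = 1.0, 1.0
    # 1 - exp(-h) * sum_{j < df/2} h**j / j!
    for j in range(1, df // 2):
        term *= h / j
        total += term
    return 1.0 - math.exp(-h) * total

def stein_n_distribution(n0, z, sigma2, v_max):
    """P{n = v} for v = n0 + 1, ..., v_max (the tail beyond v_max is omitted)."""
    df = n0 - 1
    k = (n0 - 1) * z / sigma2
    probs, prev = {}, 0.0
    for v in range(n0 + 1, v_max + 1):
        cum = chi2_cdf_even(v * k, df)   # P{n <= v} = P{chi2 < v * k}
        probs[v] = cum - prev
        prev = cum
    return probs

def expected_n(probs):
    """Approximate E(n) from a table that carries almost all of the mass."""
    return sum(v * p for v, p in probs.items())
```

The dependence of the table on sigma2 makes visible the remark that these numbers depend on the nuisance parameter.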
CHAPTER IV

SIMILAR TESTS AND STATISTICS

§1. SIMILARITY OF TESTS AND OF STATISTICS


In the preceding chapter we considered several unbiased tests. The principle
of unbiasedness is often put forward as a desirable property of tests.
Suppose that we have a measurable space $(\mathcal{X}, \mathcal{A})$ with a family of
probabilistic measures $\{P_\theta\}$ (for $\theta \in \Omega$) that are dominated by a
$\sigma$-finite measure $\mu$ and two (in general, composite) hypotheses
$H_0: \theta \in \omega \subset \Omega$ and $H_1: \theta \in \Omega \setminus \omega$.
Suppose that we are verifying the hypothesis $H_0$ with the aid of a test $\Phi$ (in
general randomized) with power function $\varphi_\Phi(\theta) = E_\theta(\Phi(x))$. The
principle of unbiasedness of a test $\Phi$ consists in the requirement that the
probability of rejecting the hypothesis $H_0$ when it is not valid be no less than
when it is valid. Thus we must have

$$\varphi_\Phi(\theta) \le \alpha \ \text{ for } \theta \in \omega, \qquad \varphi_\Phi(\theta) \ge \alpha \ \text{ for } \theta \in \Omega \setminus \omega. \tag{4.1.1}$$

A test $\Phi$ of such a form is said to be unbiased. Of course, this property in itself
does not guarantee high quality of the test. For example the trivial test
$\Phi \equiv \alpha$ is obviously unbiased. However, it is a rather natural requirement.
Following Sverdrup [68], we note that if the parameter space $\Omega$ is equipped
with a topology according to which $\varphi_\Phi(\theta)$ is a continuous function of
$\theta$ and the common boundary $\partial\omega$ of the regions $\omega$ and
$\Omega \setminus \omega$ is nonempty, it follows from (4.1.1) that
$\varphi_\Phi(\theta) = \alpha$ for $\theta \in \partial\omega$. Thus
$\varphi_\Phi(\theta)$ has a constant value $\alpha$ for $\theta \in \partial\omega$. If
we compare this behavior of $\varphi_\Phi(\theta)$ with the behavior of the trivial
test $\Phi_1 \equiv 1$ for which $\varphi_{\Phi_1}(\theta) = E_\theta \Phi_1 = 1$ for
arbitrary $\theta \in \Omega$, we see that the critical function $\Phi$ is similar in a
certain sense to the entire space $\mathcal{X}$ (the critical zone of the trivial test
$\Phi_1$). Therefore, the test $\Phi$ is said to be similar on $\partial\omega$.
These considerations indicate the procedure for studying tests $\Phi(x)$ for
which

$$E_\theta \Phi(x) = \alpha \tag{4.1.2}$$

for all values of $\theta$ in a set $\Omega_0 \subset \Omega$. Such tests are said to be
similar with respect to the family of distributions $\{P_\theta\}$ (for
$\theta \in \Omega_0$) or, more briefly, with respect to the parameter set $\Omega_0$.
The concept of similarity was first introduced by Neyman [56]. Let us pause to look
at the properties of such tests for the scheme of families of distributions with
nuisance parameters (cf. §1 of the preceding chapter).
Suppose that we are testing the null hypothesis
$H_0: \gamma_1(\theta_1, \ldots, \theta_s) = 0, \ldots, \gamma_q(\theta_1, \ldots, \theta_s) = 0$,
where $q < s$ and $\gamma_1, \ldots, \gamma_q$ are sufficiently smooth functions, for a
family of densities $\{p(x; \theta_1, \ldots, \theta_s)\}$. Suppose that the $s$ smooth
functions $\gamma_1(\theta_1, \ldots, \theta_s), \ldots, \gamma_q(\theta_1, \ldots, \theta_s)$,
$\gamma_{q+1}(\theta_1, \ldots, \theta_s), \ldots, \gamma_s(\theta_1, \ldots, \theta_s)$
constitute a new parametrization and that $\gamma_{q+1}, \ldots, \gamma_s$ are the
nuisance parameters. If we were to succeed in constructing a test $\Phi$ with invariant
power function for testing $H_0$, its power function
$\varphi_\Phi(\gamma_1, \ldots, \gamma_q)$ would depend only on the parameters
$\gamma_1, \ldots, \gamma_q$ and would not contain nuisance parameters. However, we
know from the preceding chapter that, in the simplest classical cases of testing
a linear hypothesis and its generalizations, such tests do not exist for repeated
samples of constant size. Therefore it is natural to weaken the requirement of
invariance of the power function with respect to the nuisance parameters. We can
require that the power function $\varphi_\Phi$ be independent of the nuisance
parameters at least when the null hypothesis
$H_0: \gamma_1 = 0, \ldots, \gamma_q = 0$ is realized. Then the values of
$\varphi_\Phi$ will constitute a level $\alpha$ (the probability of rejecting $H_0$
when it is valid), and we arrive at the similarity condition

$$\varphi_\Phi(\theta) = E_\theta \Phi(x) = \alpha, \tag{4.1.3}$$

provided that

$$\gamma_1(\theta_1, \ldots, \theta_s) = 0, \ \ldots, \ \gamma_q(\theta_1, \ldots, \theta_s) = 0. \tag{4.1.4}$$

The points $(\theta_1, \ldots, \theta_s)$ of the set (4.1.4) are limiting values of
points that violate these relations. Therefore if $\varphi_\Phi(\theta)$ is a
continuous function of $\theta$, then, in accordance with what was said above, the
similarity condition (4.1.3) of a test with hypothesis $H_0$ follows from the
requirement that the test be unbiased.
The general definition of similarity of a test for $\theta \in \Omega_0 \subset \Omega$
can be regarded as a special case of unbiasedness of statistics (cf. Chapter II, §6).
Specifically, in the present case, we are considering tests, i.e. statistics
$\Phi(x)$ satisfying the inequalities $0 \le \Phi(x) \le 1$ for
$\theta \in \Omega_0 \subset \Omega$. Here the functional $f(\theta)$ (cf. (2.6.1))
is equal to the constant $\alpha$. From an analytical point of view, the condition
$0 \le \Phi(x) \le 1$ is equivalent to consideration of bounded statistics $\xi(x)$:
if $|\xi(x)| < K$, where $K$ is a constant, then the linear transformation
$\xi(x) \to 1/2 + \xi(x)/2K$ makes the statistic $\xi(x)$ a test. In what follows,
some of the results that we obtain for similar tests will be carried over to a
certain degree to unbiased estimates.
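If we require, in addition to the conditions for similarity of a test (4.1.3) and
(4.1.4), that the test $\Phi$ with critical zone $Z$ be unrandomized, then conditions
(4.1.3) and (4.1.4) take the form

$$P_\theta(Z) = \alpha \quad \text{for } \gamma_1(\theta) = \cdots = \gamma_q(\theta) = 0, \quad q < s. \tag{4.1.5}$$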

Thus the conditional probability of falling in the critical zone $Z$ under $H_0$ must
be a constant and equal to the level $\alpha$ for each simple particular case of $H_0$.
Study of similar unrandomized tests is quite difficult. The zone $Z$ of a test of the
form indicated is called the similarity zone. In the general case, in analogy with
definition (4.1.2) we shall call the scalar statistic $T(x)$ similar if, for arbitrary
$\theta \in \Omega_0 \subset \Omega$ and an arbitrary value of $\xi$, the probability
$P_\theta(T(x) < \xi) = F_T(\xi)$ is independent of $\theta$. An analogous definition
can be made for the most general case of a statistic $T$. Suppose that the statistic
$T$ maps the measurable space $(\mathcal{X}, \mathcal{A})$ into the measurable space
$(\mathcal{T}, \mathcal{B})$ and that a family of measures $\{P_\theta\}$ (for
$\theta \in \Omega$) is defined on $\mathcal{X}$. The statistic $T$ is similar for
$\theta \in \Omega_0 \subset \Omega$ if $P_\theta(T^{-1}B)$ is independent of $\theta$
for arbitrary $\theta \in \Omega_0$ and $B \in \mathcal{B}$.
If $T(x)$ is a scalar similar statistic, then zones of the form
$Z_{\xi_1 \xi_2}: \xi_1 \le T(x) < \xi_2$ are similar for arbitrary $\xi_1$ and
$\xi_2$ (with $\xi_1 < \xi_2$) and they can be used to construct unrandomized similar
tests. As an approximation to (scalar) similar statistics one can consider statistics
$T(x)$ with mean value $E_\theta T(x)$ and variance $D_\theta(T(x))$ that are
independent of $\theta$ for $\theta \in \Omega_0 \subset \Omega$. However, their study
involves great difficulties.
Together with similar zones, we sometimes consider bisimilar zones. These
were also introduced by Neyman [56]. Suppose that $k$ measurable real functions
$f_1(x), \ldots, f_k(x)$ are defined on the space $(\mathcal{X}, \mathcal{A})$ with the
system of measures $\{P_\theta\}$ (for $\theta \in \Omega$). Suppose also that

$$\int_{\mathcal{X}} f_i(x)\, dP_\theta(x) = 0 \quad \text{for } \theta \in \Omega_0 \subset \Omega, \quad i = 1, \ldots, k.$$

Then the similar zone $Z \subset \mathcal{X}$ is said to be bisimilar if it satisfies
the additional conditions

$$\int_{Z} f_i(x)\, dP_\theta(x) = 0 \quad (i = 1, 2, \ldots, k). \tag{4.1.6}$$

Bisimilar zones are useful for the construction of similar tests possessing
additional desirable properties. It is natural to consider the bisimilar statistics
$T(x)$ (including randomized tests $\Phi(x)$) defined by the equations

$$E_\theta T(x) = \alpha; \quad E_\theta f_i(x) T(x) = 0; \quad i = 1, 2, \ldots, k; \quad \theta \in \Omega_0 \subset \Omega, \tag{4.1.7}$$

where the $f_i(x)$ ($i = 1, 2, \ldots, k$) are the functions referred to above.

§2. NEYMAN STRUCTURES.
LEHMANN'S AND SCHEFFÉ'S THEOREMS

Let $\Phi(x)$ denote a similar test of level $\alpha$ for
$\theta \in \Omega_0 \subset \Omega$,

$$E_\theta \Phi(x) = \alpha; \quad \theta \in \Omega_0 \subset \Omega. \tag{4.2.1}$$

Suppose that a family of probabilistic measures $\{P_\theta\}$, $\theta \in \Omega$,
has a sufficient statistic $T$ with respect to $\theta \in \Omega_0$. Then for an
arbitrary value $T = t$ the conditional distribution of $\Phi(x)$ is, for given $t$,
independent of the parameter $\theta \in \Omega_0$. Consider the "projection" of the
test $\Phi(x)$ onto the sufficient statistic $T$ (cf. Chapter III, §3)

$$E_\theta(\Phi(x) \mid t) = \Phi_1(t). \tag{4.2.2}$$

The statistic $\Phi_1(t)$ is independent of the parameter $\theta \in \Omega_0$. If it
is constant and equal to the level $\alpha$ for almost all values of $t$ (with respect
to all measures induced by $T$ for $\theta \in \Omega_0$), a condition which we shall
write in the form

$$E_\theta(\Phi(x) \mid t) = \Phi_1(t) = \alpha \quad \text{(for almost all } t\text{)}, \tag{4.2.3}$$

then

$$E_\theta \Phi(x) = E_\theta \Phi_1(t) = \alpha, \quad \theta \in \Omega_0, \tag{4.2.4}$$

i.e. the test $\Phi$ is similar. A test $\Phi$ satisfying condition (4.2.3) is called a
Neyman structure for the statistic $T$ (cf. Neyman [58]). An analogous concept
can be defined for the statistic $\xi(x)$.
An easily-grasped interpretation of Neyman structures is as follows. If $T$
is a sufficient statistic and $\xi(x)$ is another statistic, then the conditional
distribution of $\xi(x)$ for a given value $T = t$ is independent of the parameter
$\theta \in \Omega_0$. If $\mathcal{X}$ and the values of the statistic $T$ are
contained in Euclidean spaces, then under definite (and not excessively cumbersome)
analytical conditions we may speak of the conditional distribution of the statistic
$\xi(x)$ on the surface $T = t$ (see for example Cramér [33]). This distribution is
independent of $\theta$. Under the same conditions on the surface $T = t$, we can "cut
out" a zone $Z_t$ such that the conditional probability
$P_\theta(\xi(x) \in Z_t \mid t)$ is equal to $\alpha$, that is, equal to the level for
$\theta \in \Omega_0$. If we succeed in "gluing" these zones $Z_t$ "continuously" for
all values of $t$ into a single zone $Z$, this zone will obviously be a similar zone
of level $\alpha$. Of course, rigorous treatment of such considerations is rather
laborious but it can be done without difficulty for many important classical tests.
Lehmann and Scheffé [37] have characterized an important class of families
of measures $P$ for which all similar tests are Neyman structures. This
characterization involves the concept of completeness of a family of distributions.
A family of probabilistic measures $\{P_\theta\}$, $\theta \in \Omega$, is said to be
complete if an arbitrary measurable function $f(x)$ such that

$$E_\theta f(x) = 0, \quad \theta \in \Omega, \tag{4.2.5}$$

is equal to 0 almost everywhere. A family $\{P_\theta\}$ is said to be boundedly
complete if an arbitrary bounded measurable function $f(x)$ satisfying the condition
(4.2.5) is equal to 0 almost everywhere. We have (see Lehmann and Scheffé [37] and
Lehmann [36])
Theorem 4.2.1. Let $\{P_\theta\}$, $\theta \in \Omega$, denote a family of
distributions and let $T$ denote a sufficient statistic for
$\theta \in \Omega_0 \subset \Omega$. For all similar tests $\Phi(x)$ for the
hypothesis $H_0: \theta \in \Omega_0$ to be Neyman structures with respect to $T$, it
is necessary and sufficient that the family of measures induced by $T$ be boundedly
complete.
Proof of sufficiency. Let $\{P_\theta^T\}$ denote the boundedly complete family of
distributions induced by the statistic $T$. Consider a similar test $\Phi(x)$ of level
$\alpha$. We have $E_\theta(\Phi(x) - \alpha) = 0$ for $\theta \in \Omega_0$. Consider
the "projection" $E(\Phi(x) \mid t) = \Phi_1(t)$. Then $E(\Phi_1(t) - \alpha) = 0$ for
all measures $P_\theta^T$ with $\theta \in \Omega_0$. Since the family is boundedly
complete, $\Phi_1(t) = \alpha$ with probability 1, and hence the test $\Phi(x)$ is
a Neyman structure with respect to $T$.
Proof of necessity. Let us suppose that the family $\{P_\theta^T\}$,
$\theta \in \Omega_0$, is not boundedly complete. Then there exists a function $f$ such
that $f(T) \ne 0$ with positive probability, $|f(T)| \le M < \infty$ (where $M$ is some
constant), and $E f(T) = 0$ for all $P_\theta^T$ with $\theta \in \Omega_0$. Suppose
that $c = (1/M) \min(\alpha, 1 - \alpha)$ and $\Phi(t) = \alpha + c f(t)$. Then
$0 \le \Phi(t) \le 1$ and $E_\theta \Phi(t) = \alpha$ for $\theta \in \Omega_0$, so
that $\Phi(t)$ is a similar test that is not a Neyman structure. This completes the
proof of the theorem.
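
The construction used in the proof of necessity can be checked on a small example. The sketch below assumes a toy family that is not taken from the text: the observation T itself takes the values -1, 0, 1 with probabilities theta/2, 1 - theta, theta/2, so that the bounded function f(t) = t has zero expectation for every theta although it is not zero, and the family is therefore not boundedly complete. Exact rational arithmetic is used so that the equalities are exact.

```python
from fractions import Fraction

def pmf(theta):
    """Assumed toy family on {-1, 0, 1}; symmetric, hence not boundedly complete."""
    return {-1: theta / 2, 0: 1 - theta, 1: theta / 2}

def f(t):
    """Bounded function (|f| <= M = 1) with E f(T) = 0 for every theta."""
    return t

def make_test(alpha, M=1):
    """Phi(t) = alpha + c f(t) with c = (1/M) min(alpha, 1 - alpha)."""
    c = Fraction(1, M) * min(alpha, 1 - alpha)
    return lambda t: alpha + c * f(t)

def expectation(func, theta):
    """E_theta func(T) computed exactly from the probability function."""
    return sum(func(t) * p for t, p in pmf(theta).items())

alpha = Fraction(1, 20)
phi = make_test(alpha)
thetas = [Fraction(k, 10) for k in range(1, 10)]
levels = [expectation(phi, th) for th in thetas]   # equal to alpha for each theta
```

Every entry of `levels` equals alpha, so the test is similar, while phi(1) differs from alpha, so it is not a Neyman structure with respect to T.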
Exponential families (see Chapter II, §5) constitute an important special case
of complete families. Consider the exponential family (2.5.2)

(4.2.6)

where $\mu(x)$ is the $\sigma$-finite measure with respect to which the probability
densities in (2.5.2) are taken. In this connection we have (see Lehmann [36])
Theorem 4.2.2. Suppose that a parameter set $\Omega_0$ contains an $s$-dimensional
parallelepiped. Then the family of distributions induced by the sufficient statistic
$T = (T_1(x), \ldots, T_s(x))$ for $\theta \in \Omega_0$ is complete.
Proof. By means of a translation of the statistic $T$ by a constant (vector)
amount we may assume that the $s$-dimensional parallelepiped contained in $\Omega_0$
is of the form

$$I: \ -a \le \theta_j \le a, \quad a > 0, \quad j = 1, 2, \ldots, s.$$

Furthermore, suppose that $f(T)$ is a measurable function such that
$E_\theta f(T) = 0$ for all $\theta \in \Omega_0$. Let $\nu(T)$ denote the measure
induced by $\mu(x)$ on the space $\mathcal{T}$ of values of $T$. We have

$$\int_{\mathcal{T}} \exp(\theta_1 T_1 + \cdots + \theta_s T_s) f(T)\, d\nu(T) = 0 \tag{4.2.7}$$

for $(\theta_1, \ldots, \theta_s) \in I$. The integral (4.2.7) is assumed to converge
absolutely. It is a multiple Laplace-Stieltjes integral. It can be extended to the
complex "polystrip" $\Pi$: $\theta_j = \xi_j + i\eta_j$, where $-a < \xi_j < a$ and
$-\infty < \eta_j < \infty$ for $j = 1, 2, \ldots, s$, in which it is an analytic
function. By virtue of (4.2.7) it vanishes on the entire polystrip $\Pi$ and hence
(see Theorem 1.2.1) $f(T) = 0$ almost everywhere (with respect to the measure $\nu$).
The theorem is proved.
We make a slight addition to this theorem that will be useful in the theory
of unbiased statistics for exponential families.
Suppose that an unbiased statistic $\chi(T)$ in such a family depends only on
sufficient statistics and that it estimates the function
$p(\theta_1, \ldots, \theta_s)$. Then it is the only unbiased estimate depending solely
on sufficient statistics. To see this, suppose that $\chi_1(T)$ is another unbiased
estimate for $p(\theta_1, \ldots, \theta_s)$. Then
$E_\theta(\chi(T) - \chi_1(T)) = 0$ for $\theta \in I$, so that by Theorem 4.2.2 we
have $\chi(T) = \chi_1(T)$ with probability 1.
Theorems 2.6.1 and 2.6.2 enable us to amplify this remark somewhat. Let
$\xi(x)$ denote an arbitrary unbiased estimate of the function
$p(\theta_1, \ldots, \theta_s)$ with variance $D(\xi(x))$ or, in general, the loss

$$E\, g(\xi(x) - p(\theta)), \quad \text{where } p(\theta) = p(\theta_1, \ldots, \theta_s),$$

and let $g$ denote a given convex (downward) function. The conditional mathematical
expectation $E(\xi(x) \mid T) = \chi(T)$ is also an unbiased estimate of $p(\theta)$.
Also, it depends only on the sufficient statistic and from what was said above it is
the unique unbiased estimate of this form. Thus the "projections" of arbitrary
unbiased estimates onto sufficient statistics coincide. In accordance with Theorem
2.6.2, for an arbitrary unbiased estimate $\xi(x)$,

$$E\, g(\xi(x) - p(\theta)) \ge E\, g(\chi(T) - p(\theta)), \tag{4.2.8}$$

so that the statistic $\chi(T)$ yields the minimum of the loss.
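
The statement that the "projections" of arbitrary unbiased estimates onto sufficient statistics coincide can be verified by direct enumeration in a simple case. The sketch below assumes a repeated Bernoulli sample of size n (an exponential family with the sufficient statistic T = x_1 + ... + x_n); this example and the function name are illustrative assumptions and do not come from the text. Two different unbiased estimates of the success probability are projected onto T.

```python
from fractions import Fraction
from itertools import product

def project_on_sum(estimator, n):
    """E(estimator(x) | T = t) for T = x_1 + ... + x_n in a Bernoulli sample.

    Given T = t, all arrangements of t ones among n places are equally
    likely whatever the success probability, so a plain average suffices.
    """
    sums, counts = {}, {}
    for x in product((0, 1), repeat=n):
        t = sum(x)
        sums[t] = sums.get(t, 0) + Fraction(estimator(x))
        counts[t] = counts.get(t, 0) + 1
    return {t: sums[t] / counts[t] for t in sums}

n = 4
# Two unbiased estimates of the success probability.
proj_first = project_on_sum(lambda x: x[0], n)
proj_pair = project_on_sum(lambda x: Fraction(x[0] + x[1], 2), n)
```

Both projections are the same function of t, namely t/n, in agreement with the uniqueness argument above.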
Whereas the various forms of exponential families provide interesting examples
of completeness, consideration of examples of incomplete families of distributions
is also instructive. Let us look again at the Behrens-Fisher problem (Chapter
III, §3). In accordance with formula (3.3.14) the corresponding distribution has
four natural parameters (denoted in the text following formula (3.3.14) by
$\lambda_1$, $\lambda_2$, $\mu_1$, and $\mu_2$). If no relationships are imposed on
these parameters, the set of their values constitutes a four-dimensional
parallelepiped and the family defined by (3.3.14) is complete. However, in the
Behrens-Fisher problem we further impose the condition $a_1 = a_2$, that is,
$\lambda_1 \mu_2 - \lambda_2 \mu_1 = 0$, so that we do not have a parallelepiped of the
type indicated. In formula (3.3.14) let us set $a_1 = a_2 = a$. Let
$\psi(\bar{x} - \bar{y}) = \psi(\bar{X}_1 - \bar{X}_2)$ denote an arbitrary continuous
odd function of $\bar{X}_1 - \bar{X}_2$ for which there exists a mathematical
expectation when the probability density is that of (3.3.14). The quantity
$\bar{X}_1 - \bar{X}_2$ is a normal variable with zero mean, so that the mathematical
expectation of the odd function $\psi(\bar{X}_1 - \bar{X}_2)$ vanishes.
This proves the incompleteness of the family (3.3.14) for $a_1 = a_2$. The majority
of the questions considered below deal with the behavior of exponential families
in which the conditions imposed on the natural parameters produce incomplete-
ness. The description of similar tests (especially unrandomized tests) and of
unbiased estimates under such conditions is an interesting problem of analytical
statistics.

§3. SOME METHODS OF CONSTRUCTING SIMILAR ZONES

Let us consider the usual situation in which we have a measurable space
$(\mathcal{X}, \mathcal{A})$ equipped with a family of measures $\{P_\theta\}$,
$\theta \in \Omega$. Let $\Omega_0$ denote a subset of $\Omega$. (This corresponds to
the null hypothesis $H_0: \theta \in \Omega_0$.) A zone $A$ that is similar with
respect to $\Omega_0$ has a constant probability $P_\theta(A) = \alpha$ for
$\theta \in \Omega_0$. Let $\Phi(x)$ denote the characteristic function of the set $A$
(that is, $\Phi(x) = 1$ for $x \in A$ and $\Phi(x) = 0$ for $x \notin A$). We then have
an unrandomized test for the hypothesis $H_0$.
Now let $\Phi_1(x)$ denote a randomized test for $H_0$. We thus have the similarity
condition $E_\theta \Phi_1(x) = \alpha$ for $\theta \in \Omega_0$. If the similar test
$\Phi_1(x)$ were unrandomized, not only the mean value $E_\theta \Phi_1(x)$ but also
the distribution of $\Phi_1(x)$ would be independent of the parameter
$\theta \in \Omega_0$, since a variable taking only the values 0 and 1 is determined
in distribution by its mean. This is the basic difference between unrandomized and
randomized tests. However, this difference is maintained only for the basic
sample space $\mathcal{X}$ and it can be removed in a certain sense by a suitable
broadening of that space.
Let $U$ denote the interval $[0, 1]$ with Borel $\sigma$-algebra and uniform
distribution. Consider the Cartesian product $\mathcal{X} \times U$ with the
corresponding product of the $\sigma$-algebras and the product measure. Let
$\Phi_1 = \Phi_1(x)$ denote a randomized test on the space
$(\mathcal{X}, \mathcal{A})$. Consider the statistic $(\Phi_1, U)$, where $U$ is
independent of $x \in \mathcal{X}$ and is uniformly distributed on $[0, 1]$. With the
aid of this statistic let us define a new test $\Phi^*$ on $\mathcal{X} \times U$ by

$$\Phi^* = \begin{cases} 1 & \text{for } U - \Phi_1(x) < 0, \\ 0 & \text{for } U - \Phi_1(x) \ge 0. \end{cases} \tag{4.3.1}$$

We see that the test $\Phi^*$ is unrandomized. Furthermore, for $\theta \in \Omega_0$,

$$E_\theta \Phi^* = E_\theta(E(\Phi^* \mid \Phi_1(x))) = E_\theta \Phi_1(x) = \alpha, \tag{4.3.2}$$

so that the test $\Phi^*$ is similar of level $\alpha$ for $\theta \in \Omega_0$. Of
course, construction of $\Phi^*$ is equivalent to the process of randomization as a
supplementary observation of the statistic $U$.
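
The passage from a randomized test to an unrandomized test on the broadened space is easy to imitate numerically. In the sketch below the function names, the sampler `sample_x` and the Monte Carlo check are assumptions introduced for illustration; only the rule "reject when U is less than the value of the randomized test" comes from (4.3.1).

```python
import random

def phi_star(phi1_value, u):
    """Unrandomized test on the enlarged space: 1 if u - phi1_value < 0, else 0."""
    return 1 if u - phi1_value < 0 else 0

def estimate_level(phi1, sample_x, n_draws, rng):
    """Monte Carlo estimate of E(phi_star); sample_x(rng) draws one sample point x."""
    hits = 0
    for _ in range(n_draws):
        x = sample_x(rng)
        u = rng.random()      # auxiliary observation, independent of x
        hits += phi_star(phi1(x), u)
    return hits / n_draws
```

For example, with the trivial randomized test equal to 0.25 everywhere, `estimate_level(lambda x: 0.25, lambda r: r.random(), 20000, random.Random(0))` should be close to 0.25, in agreement with (4.3.2).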
Sometimes the procedure described can be applied without broadening the
basic sample space X.
Suppose that $\mathcal{X}$ is $m$-dimensional Euclidean space $E_m$ and that we have
the sufficient statistic $(T_1, \ldots, T_k)$ with range in the Euclidean space $E_k$
($k < m$) for $\theta \in \Omega_0$. Let
$(T_1, \ldots, T_k; \xi_1, \ldots, \xi_{m-k})$ denote a system of local coordinates in
$\mathcal{X}$. Here the $\sigma$-algebras are, as usual, assumed to consist of Borel
sets, and we may speak of the conditional distribution
$P_\theta(\xi_1, \ldots, \xi_{m-k} \mid T_1, T_2, \ldots, T_k)$ for
$\theta \in \Omega$ (see §3 of the Introduction).
Now suppose that we have constructed a randomized test $\Phi_1(T_1, \ldots, T_k)$
for testing a hypothesis $H_0: \theta \in \Omega_0$ that depends only on sufficient
statistics. Let us take an arbitrary measurable scalar function
$V(T_1, \ldots, T_k; \xi_1, \ldots, \xi_{m-k})$ and construct the conditional
distribution

$$F(y; T_1, \ldots, T_k) = P_\theta\{V < y \mid T_1, \ldots, T_k\} \tag{4.3.3}$$

for an arbitrary real value of $y$. For $\theta \in \Omega_0$ this distribution does
not depend on $\theta$. Let us suppose that for almost all values of
$(T_1, \ldots, T_k)$, the function $F(y; T_1, \ldots, T_k)$ is a strictly monotonic
function of $y$. Then, as is well known (see [33]), the transformation
$U = F(V(T_1, \ldots, T_k; \xi_1, \ldots, \xi_{m-k}); T_1, \ldots, T_k)$ yields a
new function the conditional distribution of which is uniform on the interval
$[0, 1]$ for almost all values of $T_1, \ldots, T_k$. We can now define an
unrandomized test $\Phi^*$ by formula (4.3.1), where we replace $\Phi_1(x)$ with
$\Phi_1(T_1, \ldots, T_k)$. This test $\Phi^*$ depends only on $x \in \mathcal{X}$.
Furthermore, for arbitrary $\theta \in \Omega$,

$$E_\theta \Phi^* = E_\theta(E(\Phi^* \mid T_1, \ldots, T_k)) = E_\theta \Phi_1(T_1, \ldots, T_k), \tag{4.3.4}$$

so that the power functions of the tests $\Phi^*$ and $\Phi_1$ coincide. In
particular, if the test $\Phi_1(T_1, \ldots, T_k)$ is similar for
$\theta \in \Omega_0$, the same will be true of the test $\Phi^*$.
Its critical zone will be a similar zone.
When we have constructed a randomized similar test $\Phi$ that depends only on
sufficient statistics, we can, generally speaking, use it to construct (choosing
different functions $V$) various unrandomized tests of the same power and the
corresponding similar zones. In Chapter V we shall be able, in the case of certain
broad subclasses of exponential families, to give a rather complete description of
randomized similar tests that depend only on sufficient statistics. Our
discussion shows that these tests correspond, in general, to unrandomized tests
with the same power function that depend on the sample point $x \in \mathcal{X}$ and
not on sufficient statistics. Construction of nontrivial unrandomized similar tests
that depend only on sufficient statistics is not always possible and is a different
problem. We shall speak further about this and shall give some important special
cases (particularly as regards the Behrens-Fisher problem in Chapters VIII
and IX).
Neyman structures (see §2) are a classical means for constructing similar
zones (see Neyman [58]). Suppose that we have the situation just described:
$\mathcal{X} = E_m$ and $T_1, \ldots, T_k$ are sufficient statistics for
$\theta \in \Omega_0 \subset \Omega \subset E_n$, where $n < m$. Suppose that the
family of probability densities $\{p_\theta\}$, $\theta \in \Omega_0$, and the
sufficient statistic $T = (T_1, \ldots, T_k)$ satisfy the condition formulated in §3
of Chapter II regarding the possibility of "coupling" any two values of the parameter
$\theta \in \Omega_0$. Then we will not have the pathological situations described in
that section. If $\chi(x)$ is a statistic independent of the sufficient statistic $T$,
its distribution will not depend on the parameter $\theta$ if $\theta \in \Omega_0$.
Consequently $\chi(x)$ is a similar statistic for $\theta \in \Omega_0$. From a
geometrical point of view, construction of the similar statistic $\chi(x)$ that
assumes only the values 0 and 1 (the characteristic function of the similar zone)
amounts to delineating zones of given probability $\alpha$ on the level surfaces of
the sufficient statistic $T = (T_1, \ldots, T_k)$ and "gluing" them as described in
§2. The similar zones constructed as Neyman structures are quite different from those
constructed above in accordance with formulas (4.3.1)-(4.3.3): the characteristic
functions of the latter are conditional distributions depending on sufficient
statistics.
We note that direct application of Neyman structures to the construction of
similar tests of the hypothesis $H_0: \theta \in \Omega_0$ obviously leads to useless
tests if the sufficient statistic $T = (T_1, \ldots, T_s)$ that is applied is
sufficient for all $\theta \in \Omega$ and not merely for $\theta \in \Omega_0$, since
the power of such a test is then equal to $\alpha$ for every $\theta \in \Omega$. Here
replacement of the basic sample space $\mathcal{X}$ with the space of values of a
statistic $Y$, constructed in such a way that the new sample space admits statistics
that are sufficient for $\theta \in \Omega_0$ but not for $\theta \in \Omega$, may be
of help. Such constructions will be made in the following chapters.
Let us now look at other ways of constructing similar zones. It might be
noted that nontrivial similar zones do not exist for all families. In fact,
nontrivial randomized similar tests may fail to exist. Consider for example the
family of probability densities (with respect to Lebesgue measure) of the form

$$p_\theta(x) = c_1(\theta)(1 + \cos 2\pi\theta x) \quad \text{and} \quad p_\theta(x) = c_2(\theta)(1 + \sin 2\pi\theta x),$$

where $x \in [0, 1]$ and the $c_i(\theta)$ ($i = 1, 2$) are normalizing constants. Let
$\Omega$ denote the entire real line $-\infty < \theta < \infty$ and let $\Omega_0$
consist of all integers $\theta = 0, \pm 1, \pm 2, \ldots$. If $\Phi(x)$ is a similar
randomized test for the hypothesis $H_0: \theta \in \Omega_0$ of level $\alpha$, then

$$\int_0^1 (\Phi(x) - \alpha) \cos 2\pi n x\, dx = 0; \quad \int_0^1 (\Phi(x) - \alpha) \sin 2\pi n x\, dx = 0,$$

where $n = 0, 1, 2, \ldots$. (Note that $\int_0^1 (\Phi(x) - \alpha)\, dx = 0$ also by
virtue of the similarity condition for $\theta = 0$.) Therefore $\Phi(x) = \alpha$
with probability 1, so that the test is trivial.
An even simpler example is provided by the family of distributions that is
defined on $\mathcal{X} = [0, 1]$ and that has probability density

$$p_\theta(x) = (\theta + 1)x^\theta, \quad \theta \in \Omega_0 = \{0, 1, 2, \ldots\}.$$

If $\Phi(x)$ is a similar test of the level $\alpha$ it follows from the equation

$$\int_0^1 (\Phi(x) - \alpha)\, x^\theta\, dx = 0,$$

$\theta \in \Omega_0$, that $\Phi(x)$ is equal to $\alpha$ almost everywhere.
Analogous examples are easily obtained from Theorem 4.2.2 for exponential
families. Thus we see that construction of similar zones is possible only for
special families of distributions. Here we shall exhibit some of these families.
Finite families of distributions (families with a finite set of parameters
$\Omega_0$) were investigated in this connection by Ljapunov [54] in 1940 and later by
Neyman [57]. Ljapunov's article contains a number of important results regarding
completely additive vector-valued set functions. We shall need only a few relatively
weak results that can be proven more easily than can Ljapunov's results.
Theorem 4.3.1.1) Suppose that $n$ continuous probability measures
$\mu_1, \ldots, \mu_n$ are defined on a set $\mathcal{X}$ with $\sigma$-algebra
$\Sigma$ of subsets. Then for arbitrary $\lambda$ in $[0, 1]$ there exists a set
$A_\lambda \in \Sigma$ such that
$\mu_1(A_\lambda) = \mu_2(A_\lambda) = \cdots = \mu_n(A_\lambda) = \lambda$.
Proof. Without loss of generality we may assume that the measures are
absolutely continuous with respect to one of them since we can define a new measure
in terms of the $n$ measures by $\mu_{n+1} = (\mu_1 + \cdots + \mu_n)/n$ and require
that $\mu_{n+1}(A_\lambda) = \lambda$, which changes nothing. All the other measures
are absolutely continuous with respect to this measure.
Furthermore, it will be sufficient to prove the theorem for $\lambda = 1/2$, since
repeated application of the theorem enables us to construct sets $A_\lambda$, where
$\lambda$ is an arbitrary dyadic fraction (i.e. a fraction of the form $a/2^n$ with
$a$ and $n$ integers), by carrying out the constructions in the measurable space
$\mathcal{X} - A_\lambda$ with $\mu(\mathcal{X} - A_\lambda) = 1 - \lambda$. Then for
arbitrary $\lambda$ we may consider a sequence of these dyadic fractions that
approaches $\lambda$.
Suppose that the theorem is true for all $n \le n_0$ and suppose that
$\mu_1, \ldots, \mu_{n_0+1}$ satisfy the conditions of the theorem. From what was said
above, to every $\lambda$ in $[0, 1]$ is assigned a set $A_\lambda \in \Sigma$ such
that $\mu_k(A_\lambda) = \lambda$ for $k = 1, \ldots, n_0$. Also,
$A_{\lambda_1} \subset A_{\lambda_2}$ for $\lambda_1 < \lambda_2$.
Consider the Borel $\sigma$-algebra $\mathcal{B}$ constructed on the sets
$A_\lambda$. Here $\mu_1(B) = \mu_2(B) = \cdots = \mu_{n_0}(B)$ for arbitrary
$B \in \mathcal{B}$. The measures $\mu_1(B)$ and $\mu_{n_0+1}(B)$ satisfy the
conditions of the theorem. There are two of them, so that it remains for us to prove
the theorem for $n = 2$.

1) Proof of this theorem, which is a consequence of Ljapunov's results, was
communicated to the author by V. N. Sudakov.

For every $\lambda \in [0, 1]$, let us construct a set $A_\lambda$ such that
$\mu_1(A_\lambda) = \lambda$ (which is possible by virtue of the continuity of the
measure $\mu_1$). Let $\mathcal{B}$ denote the Borel $\sigma$-algebra on the sets
$A_\lambda$. Let $\{A_\lambda\}$ denote the family, constructed for the measure
$\mu_1$, of sets in $\Sigma$ with the properties described above. We may interpret
$\lambda$ as a point on a circle of unit circumference. To every Borel subset of the
circle, in particular to every semicircle, there corresponds a set in $\mathcal{B}$.
Let us now define a continuous function on the circle as follows: to every $\lambda$
we assign the value $\mu_\lambda$, the $\mu_2$-measure of the set corresponding to the
semicircle beginning at $\lambda$. Obviously, if this function is not a constant it
assumes values both greater and less than 1/2, since its values at diametrically
opposite points add up to 1, and hence it must assume the value 1/2 somewhere on the
circle. This completes the proof of Theorem 4.3.1.
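
The semicircle argument for two measures can be turned into a small numerical procedure. The sketch below makes simplifying assumptions that are not in the text: the first measure is Lebesgue measure on [0, 1) (that is, points are already labelled by the parameter lambda), the second measure is given by a continuous distribution function `cdf2` on [0, 1] with cdf2(0) = 0 and cdf2(1) = 1, and the half-way point is located by bisection. The function names are illustrative.

```python
def semicircle_mass(cdf2, lam):
    """Second-measure mass of the arc [lam, lam + 1/2) on a circle of circumference 1."""
    end = lam + 0.5
    if end <= 1.0:
        return cdf2(end) - cdf2(lam)
    # The arc wraps around the point 1 = 0 of the circle.
    return (cdf2(1.0) - cdf2(lam)) + cdf2(end - 1.0)

def halving_arc(cdf2, tol=1e-12):
    """Find lam in [0, 1/2] whose semicircle has second-measure mass 1/2."""
    def g(lam):
        return semicircle_mass(cdf2, lam) - 0.5
    lo, hi = 0.0, 0.5
    if g(lo) == 0.0:
        return lo
    # g(0) = -g(1/2): the two complementary semicircles have total mass 1,
    # so g changes sign on [0, 1/2] and bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

For the distribution function x^2 (density 2x) the procedure returns a value close to 0.25: the arc [0.25, 0.75) has mass 1/2 under both measures.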
Theorem 4.3.1, which, as stated above, is a particular case of Ljapunov's
results [54], can be applied to the construction of certain classes of families with
nontrivial similar zones.
Let $P_\theta$, $\theta \in \Omega_0$, denote a family of measures defined on
$(\mathcal{X}, \mathcal{A})$ that is dominated by a $\sigma$-finite measure $\mu$.
Then, in accordance with Theorem 2.1.4, for a sufficient statistic $T(x)$ we have the
decomposition (2.1.4)

$$p_\theta(x) = g(T(x), \theta)\, h(x),$$

where $g$ is measurable in the space of values of $T(x)$ and where $h(x)$ is a
nonnegative $\mathcal{A}$-measurable function. Conversely, (2.1.4) is a sufficient
condition for sufficiency of $T(x)$.
In the articles [40] and [41] by Linnik, which are generalized in the article
by Kagan and Linnik [23], there is introduced a class of distributions generalizing
(2.1.4) for nontrivial $T(x)$. If $\mathcal{X}$ is the Euclidean space $E_m$ and if
$T(x) \in E_k$ ($k < m$) is a nontrivial sufficient statistic, then under rather
general conditions it is possible to construct similar zones in the form of Neyman
structures, as described at the beginning of this section. Let us now consider the
family of probability densities of the form

$$p_\theta(x) = R_1(T(x), \theta)\, r_1(x) + \cdots + R_q(T(x), \theta)\, r_q(x). \tag{4.3.5}$$

Here x € '.X =Em and T(x) € Ek (k < m) is an arbitrary statistic. The density is
with respect to a a-finite dominating measure µ, and ()€no is an abstract para-
meter. Correspondingly, for given (), the functions Ri(T(x), ()) (i = 1, 2, • • •, m)
§3. CONSTRUCTING SIMILAR ZONES 75

and the functions r/x) are assumed measurable. We see that the family (4.3.5)
generalizes the family (2.1.4) for nontrivial T(x) €Ek and '.X =Em. Let us now
show that under rather general assumptions this family has similar zones (obviously
nontrivial) for an arbitrary level a E (0, 1). First of all, let us put the probability
densities (4.3.5) in the form of finite sums of sign-constant terms. We set
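$$R_i^{+}(T, \theta) = \max\{R_i(T, \theta),\, 0\}; \quad r_i^{+}(x) = \max\{r_i(x),\, 0\},$$
$$R_i^{-}(T, \theta) = \min\{R_i(T, \theta),\, 0\}; \quad r_i^{-}(x) = \min\{r_i(x),\, 0\}.$$
Then we have
$$R_i(T, \theta) = R_i^{+}(T, \theta) + R_i^{-}(T, \theta),$$
$$r_i(x) = r_i^{+}(x) + r_i^{-}(x).$$
Substituting these expressions into (4.3.5), denoting by $\tilde{R}_i(T, \theta)$ either $R_i^{+}(T, \theta)$ or
$R_i^{-}(T, \theta)$ and by $\tilde{r}_i(x)$ either $r_i^{+}(x)$ or $r_i^{-}(x)$, we obtain from (4.3.5)
$$p_\theta(x) = \tilde{R}_1(T, \theta)\, \tilde{r}_1(x) + \dots + \tilde{R}_n(T, \theta)\, \tilde{r}_n(x), \tag{4.3.6}$$
where $\tilde{R}_i(T, \theta)$ and $\tilde{r}_i(x)$ have corresponding measurability and $q \le n \le 4q$. Now
let us impose the condition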
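$$\pi_{i,\theta}(x) = \frac{1}{C_i(\theta)}\, \bigl|\tilde{R}_i(T, \theta)\, \tilde{r}_i(x)\bigr|, \quad i = 1, 2, \dots, n \tag{4.3.7}$$
for all $\theta \in \Omega_0$ and $i = 1, 2, \dots, n$. We may assume without loss of generality
that $C_i(\theta) > 0$ for $\theta \in \Omega_0$. We define
$$C_i(\theta) = \int_{\mathcal{X}} \bigl|\tilde{R}_i(T(x), \theta)\, \tilde{r}_i(x)\bigr|\, d\mu(x) < \infty. \tag{4.3.8}$$
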
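Then we set
$$p_\theta(x) = \sum_{i=1}^{n} \varepsilon_i\, C_i(\theta)\, \pi_{i,\theta}(x). \tag{4.3.9}$$
Here the $\pi_{i,\theta}(x)$ are normalized probability densities with respect to the measure
$\mu$, the quantity $\varepsilon_i = \pm 1$, and
$$\sum_{i=1}^{n} \varepsilon_i\, C_i(\theta) = 1, \quad \theta \in \Omega_0. \tag{4.3.10}$$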
Also, we denote by $\Pi_{i,\theta}$ the probability measures corresponding to the densities
$\pi_{i,\theta}(x)$. By virtue of our assumptions (that the space $\mathcal{X}$ and the range of $T(x)$
are Euclidean spaces and that we are using Borel $\sigma$-algebras), we obtain for every
value of $T(x)$ the conditional distributions of the probabilities $\Pi_{i,\theta}(A \mid T)$ (cf.
Theorem 2.1.1). Since $T$ is a sufficient statistic we may assume for $\theta \in \Omega_0$ that
$$\Pi_{i,\theta}(A \mid T) = \Pi_{i,\theta_0}(A \mid T), \quad i = 1, 2, \dots, n, \tag{4.3.11}$$
for any fixed value $\theta_0 \in \Omega_0$. In accordance with Theorem 4.3.1, for given
$\alpha \in (0, 1)$ and a given value of $T$ there exists a set $A_T \in \mathfrak{A}$ such that
$$\Pi_{1,\theta_0}(A_T \mid T) = \Pi_{2,\theta_0}(A_T \mid T) = \dots = \Pi_{n,\theta_0}(A_T \mid T) = \alpha,$$
so that, by virtue of (4.3.11),
$$\Pi_{1,\theta}(A_T \mid T) = \Pi_{2,\theta}(A_T \mid T) = \dots = \Pi_{n,\theta}(A_T \mid T) = \alpha \tag{4.3.12}$$
for all $\theta \in \Omega_0$.
Suppose that for almost all $T$ the level surfaces of the statistic $T$ in the
Euclidean space $E_m$ are piecewise smooth and admit introduction of local
coordinates $(T, \xi)$ in the space $\mathcal{X} = E_m$. Then (see Cramér [33]) for almost all $T$ it
is possible to assign to the sets $A_T \in \mathfrak{A}$ the sets $C_T$ on the surfaces with given
values of $T$ (the traces of the $A_T$) such that
$$\Pi_{i,\theta}(A_T \mid T) = \Pi_{i,\theta}(C_T \mid T), \quad \theta \in \Omega_0, \quad i = 1, 2, \dots, n.$$
Let us suppose that "gluing" with respect to the values of $T$ is possible,
i.e. that the set $A = \bigcup_T C_T$ is measurable. Then, from (4.3.11),
$$\Pi_{i,\theta}(A) = \alpha, \quad \theta \in \Omega_0, \quad i = 1, 2, \dots, n.$$
Thus $\int_A \pi_{i,\theta}(x)\, d\mu(x) = \alpha$ for the same values of $\theta$ and $i$. Let us integrate both
sides of equation (4.3.9) over the set $A$ keeping equation (4.3.10) and the equation
just derived in mind. We obtain
$$\int_A p_\theta(x)\, d\mu(x) = \alpha, \quad \theta \in \Omega_0. \tag{4.3.13}$$
We have thus obtained a similar zone of level $\alpha$. The construction just described
generalizes Neyman structures. In the article [23] this construction is expounded in
a more general form: the sample space and the range of the statistic are not
assumed to be Euclidean spaces.
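
The last step of the construction, integrating the signed mixture (4.3.9) over the zone, can be checked on a toy discrete example. The following is only a sketch under invented assumptions: the four-point sample space, the two component densities `pi_1` and `pi_2`, the signed weights and the helper `mass` are hypothetical and do not come from the text. Exact rational arithmetic is used so that the equalities hold exactly.

```python
from fractions import Fraction as F

# Hypothetical toy data (not from the text): sample space {0, 1, 2, 3},
# with counting measure playing the role of the dominating measure.
pi_1 = [F(1, 4), F(1, 4), F(1, 4), F(1, 4)]   # normalized component density
pi_2 = [F(1, 2), F(0), F(1, 2), F(0)]         # normalized component density
signed_weights = [F(3, 2), F(-1, 2)]          # signed coefficients, summing to 1 as in (4.3.10)

# Signed mixture of the components, as in (4.3.9).
p = [signed_weights[0] * u + signed_weights[1] * v for u, v in zip(pi_1, pi_2)]

def mass(density, zone):
    """Total mass that `density` assigns to the points listed in `zone`."""
    return sum(density[x] for x in zone)

A = [0, 1]               # a zone with the same mass under every component
alpha = mass(pi_1, A)

print(p, mass(pi_2, A), mass(p, A))
```

Because the zone has the same mass under each component and the signed coefficients sum to one, the mixture assigns the zone that same mass, which is the content of (4.3.13) in this toy setting.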
Let us give some examples of families of the form (4.3.5).
Example 1. Suppose that we have a probability density with respect to a
dominating $\sigma$-finite measure $\mu$ on a space $(\mathcal{X}, \mathfrak{A})$ of the form
$$p_\theta(x) = \exp\Bigl\{\sum_{l=1}^{N} \varphi_l(x)\, \psi_l(\theta)\Bigr\} \sum_{l=1}^{M} q_l(x)\, s_l(\theta) \tag{4.3.14}$$
with corresponding conditions of measurability. Consider a repeated sample of
size $n > N$ for the sample space $\mathcal{X}$. The corresponding probability density with
respect to the product measure of $\mu$ is of the form
$$p_\theta(x_1, \dots, x_n) = \exp\Bigl\{\sum_{l=1}^{N} T_l(x)\, \psi_l(\theta)\Bigr\} \times \sum_{i_1, i_2, \dots, i_n = 1}^{M} q_{i_1}(x_1) \cdots q_{i_n}(x_n)\, s_{i_1}(\theta) \cdots s_{i_n}(\theta), \tag{4.3.15}$$
where $T_l(x) = \sum_{j=1}^{n} \varphi_l(x_j)$. If the second sum in the product (4.3.14) degenerated
to unity, then (4.3.15) would yield an exponential family with nontrivial sufficient
statistics. The family (4.3.15) is of the form (4.3.5) and, in accordance with what
was said above, it admits nontrivial similar zones. It can be regarded as a
"degenerate exponential family".
Example 2 (cf. Example 8, Chapter II, §2). Consider a distribution of the
Pearson-III type:
$$p(x; a, \gamma) = \frac{\gamma^{m}}{\Gamma(m)}\, (x - a)^{m-1} \exp\{-\gamma (x - a)\} \quad (x > a),$$
where $p(x; a, \gamma) = 0$ for $x < a$. Here $m > 0$ and $\gamma > 0$. Let $m$ denote a positive
integer ascertained from observation. Suppose that we wish to verify the
hypothesis $H_0: \gamma = \gamma_0$ with the aid of a test that is similar with respect to the values
of $a$ from a repeated sample of size $n$.
We know (see Chapter II, §2) that for $m > 1$ there will be only trivial
sufficient statistics and we cannot construct Neyman structures. However we can
use the device described above. The probability density of the repeated sample
$(x_1, \dots, x_n)$ is equal to
$$p(x_1, \dots, x_n; a, \gamma) = \Bigl(\frac{\gamma^{m}}{\Gamma(m)}\Bigr)^{n} \prod_{i=1}^{n} (x_i - a)^{m-1} \exp\Bigl[-\gamma \sum_{i=1}^{n} (x_i - a)\Bigr]$$
for $\min_i x_i \ge a$. On the other hand,
$$p(x_1, \dots, x_n; a, \gamma) = 0$$
for $\min_i x_i < a$.
Since $m - 1$ is a nonnegative integer, we obtain representations of the form
(4.3.5), where $T(x) = (\sum_{i=1}^{n} x_i,\ \min_i x_i)$, for $p(x_1, \dots, x_n; a, \gamma)$, and the
remaining functions are easily found. Consequently, for $\gamma = \gamma_0$ and an arbitrary level
$\alpha \in (0, 1)$, we can construct zones that are similar with respect to $a$.
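
The reason an integer value of m matters is that the product over the sample then becomes a polynomial in the shift parameter whose coefficients depend on the sample alone. A minimal sketch of this for the case m = 2, where the product of the differences equals a finite sum of powers of the shift parameter multiplied by elementary symmetric polynomials of the sample. The function names and the numerical values are illustrative assumptions; the exponential factor and the restriction on the smallest observation are left out.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def elementary_symmetric(xs, j):
    """e_j(xs): sum of the products of all j-element subsets of xs (e_0 = 1)."""
    return sum(prod(c) for c in combinations(xs, j))

def product_direct(xs, a):
    """prod_i (x_i - a): the polynomial factor of the sample density when m = 2."""
    return prod(x - a for x in xs)

def product_as_sum(xs, a):
    """The same quantity as a finite sum of (function of a) * (function of xs)."""
    n = len(xs)
    return sum((-a) ** k * elementary_symmetric(xs, n - k) for k in range(n + 1))

xs = [Fraction(5, 2), Fraction(3), Fraction(7, 2)]   # hypothetical sample
a = Fraction(1, 3)                                   # hypothetical shift parameter
print(product_direct(xs, a) == product_as_sum(xs, a))
```

Each term of the sum is a product of a factor depending only on the parameter and a factor depending only on the sample, which is the shape required of the terms in (4.3.5).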

§4. APPROXIMATELY SIMILAR ZONES

We have seen that in general there are no similar zones or even randomized
similar tests for infinite families of distributions. However, as Besicovitch dis-
covered in 1961 (see [3]), there are approximately similar zones under extremely
general conditions. Here we shall expound a portion of his results. To make the
exposition simpler we have strengthened somewhat the hypotheses in his theorems.
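Consider the family of distributions $\{P_\theta\}$, $\theta \in \Omega_0$, on a measurable space
$(\mathcal{X}, \mathfrak{A})$ admitting a scalar statistic $T(x)$ that has a probability density $p_\theta(x)$
with respect to Lebesgue measure for all values of $\theta$. Also let us assume that
$p_\theta(x)$ is an absolutely continuous function whose derivative exists everywhere
and has an absolutely bounded majorant $M(x)$:
$$|p_\theta'(x)| < M(x), \quad M(x) \le M_0. \tag{4.4.1}$$
Under these conditions we have

Theorem 4.4.1. For $\alpha \in (0, 1)$ and $\varepsilon > 0$ there exists a set $A_\varepsilon \in \mathfrak{A}$ such that
$$|P_\theta(A_\varepsilon) - \alpha| < \varepsilon \quad \text{for } \theta \in \Omega_0. \tag{4.4.2}$$
To prove this it will be sufficient to construct a Lebesgue-measurable set
$E_\varepsilon$ on the real axis such that
$$\Bigl|\int_{E_\varepsilon} p_\theta(x)\, dx - \alpha\Bigr| < \varepsilon \quad \text{for } \theta \in \Omega_0. \tag{4.4.3}$$
Then we can set $A_\varepsilon = T^{-1}(E_\varepsilon)$.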


To construct $E_\varepsilon$ we take an interval $I_N = [-N, N]$ of the real line such that
$$\int_{I_N} p_\theta(x)\, dx > 1 - \frac{\varepsilon}{2}. \tag{4.4.4}$$
Then we partition $I_N$ by means of the points $\dots, x_{-1}, x_0, x_1, x_2, \dots$ into
subintervals $I_{n-1,1} = [x_{n-1}, x_n]$ such that
(4.4.5)

where $\eta$ is a small number to be chosen later.

Now, in each interval $I_{n-1,1} = [x_{n-1}, x_n]$, let us consider the subinterval
$I_{n-1,\alpha} = [x_{n-1},\ (1 - \alpha)x_{n-1} + \alpha x_n]$. We now have, for $x \in I_{n-1,1}$,
$$p_\theta(x) = p_\theta(x_{n-1}) + (x - x_{n-1})\, \gamma(x, n),$$
where

Thus

(4.4.6)
rn-1
where
(4.4.6a)
Therefore

I J
11.-1, a
( 4.4. 7)

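$$\int_{I_{n-1,1}} p_\theta(x)\, dx = (x_n - x_{n-1})\, p_\theta(x_{n-1}) + \frac{(x_n - x_{n-1})^2}{2}\, \gamma_1(x, n). \tag{4.4.8}$$
Furthermore, by inequalities (4.4.5) and (4.4.6a) we have
$$\sum_{(n)} \frac{(x_n - x_{n-1})^2}{2}\, |\gamma_1(x, n)| < \eta \sum_{(n)} M_{n-1}\, (x_n - x_{n-1}) < 2 N M_0 \eta. \tag{4.4.9}$$
Now, by choosing $\eta$ sufficiently small, we can get $2 N M_0 \eta \le \varepsilon/4$. Then from
(4.4.8) we have
$$\int_{-N}^{N} p_\theta(x)\, dx = \sum_{(n)} (x_n - x_{n-1})\, p_\theta(x_{n-1}) + \gamma\, \frac{\varepsilon}{4},$$
where $|\gamma| \le 1$. Thus
$$\sum_{(n)} (x_n - x_{n-1})\, p_\theta(x_{n-1}) = \int_{-N}^{N} p_\theta(x)\, dx - \gamma\, \frac{\varepsilon}{4}. \tag{4.4.10}$$
Let us set $E_\varepsilon = \bigcup_n I_{n-1,\alpha}$. From (4.4.7) we obtain
$$\int_{E_\varepsilon} p_\theta(x)\, dx = \alpha \sum_{(n)} (x_n - x_{n-1})\, p_\theta(x_{n-1}) + \gamma_1\, \frac{\varepsilon}{4}, \tag{4.4.11}$$
where $|\gamma_1| \le 1$. From (4.4.10) we obtain
$$\int_{E_\varepsilon} p_\theta(x)\, dx = \alpha \int_{-N}^{N} p_\theta(x)\, dx - \alpha \gamma\, \frac{\varepsilon}{4} + \gamma_1\, \frac{\varepsilon}{4} = \alpha \int_{-N}^{N} p_\theta(x)\, dx + \gamma_2\, \frac{\varepsilon}{2},$$
where $|\gamma_2| \le 1$. Then, keeping (4.4.4) in mind, we obtain (4.4.3).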


In [3] this theorem is proven under the less restrictive hypothesis that a
majorant $M(x)$ exists and is bounded at all points except points of a closed set
of measure 0. The proof under this hypothesis becomes somewhat more
complicated.
Consider a family of probability densities
$$\{p_\theta(x)\}; \quad p_\theta(x) = (\theta + 1)\, x^{\theta}; \quad \theta = 0, 1, 2, \dots, \quad x \in [0, 1]$$
(see §3) for which there are not even any randomized similar tests. By Theorem
4.4.1, there are still approximately similar zones $A_\varepsilon$ for this family such that
$$\int_{A_\varepsilon} p_\theta(x)\, dx = \alpha + \nu \varepsilon, \quad |\nu| \le 1.$$
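
The layered set used in the proof can be tried out numerically on this family. The sketch below is an illustration under assumptions of its own: it uses a uniform partition of [0, 1], a fixed number of layers, and only the parameter values 0 to 20, none of which is prescribed by the text. The restriction to a finite range matters, because the derivative of these densities near x = 1 is not bounded uniformly in the parameter, so one fixed uniform grid should not be expected to serve all parameter values at once.

```python
def laminated_zone_mass(theta, alpha, layers):
    """Mass that p_theta(x) = (theta + 1) * x**theta on [0, 1] gives to the union of
    the subintervals [k/layers, (k + alpha)/layers], k = 0, ..., layers - 1."""
    def cdf(x):
        # distribution function of p_theta on [0, 1]
        return x ** (theta + 1)
    return sum(cdf((k + alpha) / layers) - cdf(k / layers) for k in range(layers))

alpha, layers = 0.05, 2000            # hypothetical level and number of layers
deviations = [abs(laminated_zone_mass(theta, alpha, layers) - alpha)
              for theta in range(21)]  # theta = 0, 1, ..., 20 only
print(max(deviations))
```

In this sketch the deviation from the nominal level stays small over the chosen range, and for a fixed number of layers it grows with the parameter, which is why finer layers are needed as the approximation is tightened.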
The approximately similar zones constructed in accordance with Theorem 4.4.1
have the characteristic structure of a "laminated threshold". If we decrease the
number $\varepsilon$ characterizing the approximation, the layers become ever finer and the
number of them increases in a bounded region of space. From a statistical
standpoint the use of approximately similar critical zones of this kind is inexpedient.
Of course, in evaluating the expediency of such a method we must begin with a
definite criterion, for example the behavior of the power function. However, we
can indicate beforehand why it is not in general statistically expedient to use
approximately similar zones constructed as indicated in the proof of Theorem 4.4.1.
For an approximately similar zone $A_\varepsilon \subset \mathcal{X}$ to exist it
is sufficient that the probability density $p_\theta(x)$ of a statistic $\Phi(x)$ be smooth and
that it have a bounded derivative. If the space $\Omega$ of the parameters $(\theta_1, \dots, \theta_s) \in E_s$
is a compact set, the case in which $|p_\theta'(x)|$ is bounded for $\theta \in \Omega_0 \subset \Omega$ but
not bounded for $\theta \in \Omega \setminus \Omega_0$ is likely to be a pathological rather than a natural
situation. The natural cases are those in which this quantity, if bounded for $\theta \in \Omega_0$,
is also bounded for $\theta \in \Omega$. Then the approximately similar zones $A_\varepsilon$ constructed
for testing the hypothesis $H_0: \theta \in \Omega_0$ will be approximately similar with the
same level for all $\theta \in \Omega$, and a test based on them is from a practical point of
view useless, since its power is then approximately equal to its level.

§5. INDEPENDENT STATISTICS

For the construction of many tests, in particular in problems dealing with a
normal law, it is helpful to use independent statistics. Thus we have several
times used the independence of the statistics $\bar{x}$ and $s^2$ in a repeated normal
sample. In many cases such properties of independence are connected with the
properties of completeness of exponential families and with the Lehmann-Scheffé
theorems. These relationships were studied in the articles by Hogg and Craig
[73] and Feigel'son [74]. We shall use the notation of Theorem 4.2.1 and shall
assume the completeness conditions mentioned in that theorem.
Let $\chi(x)$ denote a similar statistic for $\theta \in \Omega_0$. Consider zones of the form
$\{\chi(x) < \xi\}$. They will be similar for arbitrary $\xi$. Suppose that $\Phi_\xi(x)$ is a test
for the hypothesis $H_0: \theta \in \Omega_0$ with critical zone $\{\chi(x) < \xi\}$ of level $\alpha$. By
Theorem 4.2.1 the test $\Phi_\xi(x)$ is a Neyman structure. Thus for almost all values
of $T = t$ we have $E(\Phi_\xi(x) \mid t) = \alpha$ when $\theta \in \Omega_0$. This means that
$$P\{\chi(x) < \xi \mid t\} = P(\chi(x) < \xi)$$
for almost all values of $t$.
Obviously, similar reasoning can be applied to vector-valued similar
statistics.
Thus the statistic $\chi(x)$ is independent of the sufficient statistic $T(x)$.
This circumstance enables us to prove the independence of rather complicated
statistics immediately, without calculations that are often extremely laborious.
Let us look at some appropriate examples.
Example 1. In a repeated normal sample $x_1, \dots, x_n \in N(a, \sigma_0^2)$, where $\sigma_0$
is known, a sufficient statistic for $a$ is $\bar{x}$ (see Example 5, §2, Chapter II). For
$a \in (-\infty, \infty)$, the family is exponential and complete (in accordance with Theorem
4.2.2). Let $g(x_i - x_j)$ denote a Lebesgue-measurable vector-valued function
depending only on the observed differences $x_i - x_j$. Obviously this function is a
similar statistic with respect to $a$. On the basis of what we showed above,
$g(x_i - x_j)$ is stochastically independent of $\bar{x}$. In particular, $s^2$ is independent
of $\bar{x}$.
Example 2. In the same normal sample $x_1, \dots, x_n \in N(a, \sigma^2)$, let us
suppose that the two parameters are independent. The statistics $\bar{x}$ and $s^2$ are
sufficient statistics for $(a, \sigma)$. Let $g(x_i - x_j)$ denote a Lebesgue-measurable
vector-valued function of the observed differences. Suppose also that $g(x_i - x_j)$
is a homogeneous function of zero dimension with respect to its arguments, so
that
$$g(\sigma (x_i - x_j)) = g(x_i - x_j)$$
for arbitrary $\sigma > 0$.
Then obviously $g(x_i - x_j)$ is similar with respect to $(a, \sigma)$. Since the exponential
family in question is complete, the statistic $g(x_i - x_j)$ is independent of the pair
$(\bar{x}, s^2)$. In particular, statistics of the form
$$\frac{m_\nu}{s^{\nu}} = \frac{1}{n} \sum_{i=1}^{n} \Bigl(\frac{x_i - \bar{x}}{s}\Bigr)^{\nu},$$
where the notation is the conventional one and $\nu \ge 2$ is an integer, are independent
of the pair $(\bar{x}, s^2)$. (This property was derived in Cramér's book [33] by means of
a rather complicated calculation.)
Example 3. Let $x_1, \dots, x_n$ denote a repeated sample from a distribution of
the Pearson-III type (see Example 8, §2, Chapter II) where $a = 0$. Thus
$$p(x; a, \gamma) = p(x; 0, \gamma) = \frac{\gamma^{m}}{\Gamma(m)}\, x^{m-1} \exp(-\gamma x) \quad (x > 0),$$
$$p(x; a, \gamma) = 0 \quad (x < 0).$$
Here the parameter $\gamma \in (0, \infty)$. A sufficient statistic is $\bar{x}$. Let $g(x_1, \dots, x_n)$
denote a measurable vector-valued homogeneous function of zero dimension with respect to its
arguments. This function is independent of $\bar{x}$. In particular, the random vector
$$\Bigl(\frac{x_1}{x_1 + \dots + x_n},\ \frac{x_2}{x_1 + \dots + x_n},\ \dots,\ \frac{x_n}{x_1 + \dots + x_n}\Bigr)$$
is independent of $x_1 + \dots + x_n$.
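
A quick simulation can make this independence plausible. The sketch below draws repeated samples from a gamma distribution and computes the sample correlation between the total and the first share. The seed, the parameter values, the sample size and the helper `correlation` are assumptions chosen for illustration. A correlation near zero is only a necessary consequence of independence, not a proof of it.

```python
import random

def correlation(us, vs):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    var_u = sum((u - mean_u) ** 2 for u in us)
    var_v = sum((v - mean_v) ** 2 for v in vs)
    return cov / (var_u * var_v) ** 0.5

rng = random.Random(0)                           # fixed seed for reproducibility
m, gamma, n, repetitions = 2.0, 1.5, 5, 20000    # hypothetical values

totals, first_shares = [], []
for _ in range(repetitions):
    # gammavariate takes (shape, scale); the scale is 1/gamma for rate gamma
    sample = [rng.gammavariate(m, 1.0 / gamma) for _ in range(n)]
    total = sum(sample)
    totals.append(total)
    first_shares.append(sample[0] / total)

print(correlation(totals, first_shares))
```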
Example 4. Consider a repeated sample from a two-dimensional normal population with
independent components, i.e. with probability density (in the usual notation)
$$p_\theta(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2} \exp\Bigl\{-\frac{1}{2}\Bigl[\frac{(x - a_1)^2}{\sigma_1^2} + \frac{(y - a_2)^2}{\sigma_2^2}\Bigr]\Bigr\}, \quad \theta = (a_1, a_2, \sigma_1, \sigma_2).$$
Here $a_i \in (-\infty, \infty)$ and $\sigma_i \in (0, \infty)$ ($i = 1, 2$), and we have a complete
exponential family.
Now consider a measurable vector-valued function $g(x_1, \dots, x_n; y_1, \dots, y_n)$
that is invariant under the transformation $x_i \to \alpha x_i + \beta$, $y_i \to \gamma y_i + \delta$, where
$\alpha$, $\beta$, $\gamma$, and $\delta$ are constants with $\alpha$ and $\gamma$ nonzero. This transformation makes
$p_\theta(x, y)$ a probability density, where $a_1 = a_2 = 0$ and $\sigma_1 = \sigma_2 = 1$. In view of
this, the distribution of the function $g(x_1, \dots, x_n; y_1, \dots, y_n)$ is independent of the
parameters $a_1$, $a_2$, $\sigma_1$, and $\sigma_2$, and hence $g$ is a similar statistic that is
independent of the sufficient statistics of the family:
$$\bar{x}, \quad \bar{y}, \quad s_1^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2; \quad s_2^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$
In particular, the sample coefficient of correlation
$$r = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{s_1 s_2}$$
is a function of the type indicated. Thus the sample coefficient of correlation $r$
is independent of the quadruple of sufficient statistics $(\bar{x}, \bar{y}, s_1^2, s_2^2)$. This fact
will be useful to us in Chapter IX. For other examples see [74].
The properties of independence of statistics can in turn be used to
characterize the distributions and to construct certain nonparametric tests. Numerous
works, for example [28], [38] and [53], are devoted to problems of
characterization in this sense.

§6. APPLICATIONS OF A THEOREM OF H. CARTAN TO THE STUDY OF FAMILIES OF STATISTICS

An important corollary of the theorems of H. Cartan given in Chapter I is
Theorem 1.3.2, which was proved there under the assumption that, in the ring
$O_Z$ of complex functions $f(z_1, \dots, z_k)$ that are holomorphic on a compact
simply-connected polycylinder, all the ideals are finitely generated on that polycylinder.
Following the article [48] we shall apply this result to the study of families
of independent identically distributed statistics. Here we shall see the utility of
considering ideals of functions for studying the behavior of a statistic. In
Chapter V we shall use ideals of functions to construct (in a certain sense) complete
families of similar tests. Let us begin by considering families of identically
distributed statistics, keeping in mind the fact that the occurrence of identical
distribution of statistics is closely related to their independence (see [38]).
Specifically, if $(X, Y)$ is a two-dimensional random vector, then a necessary and
sufficient condition for its components $X$ and $Y$ to be independent is that
$$E \exp i(t_1 X + t_2 Y) = E \exp(i t_1 X)\, E \exp(i t_2 Y) \tag{4.6.1}$$
for arbitrary real $t_1$ and $t_2$. Let $(X', Y')$ denote a random vector with
components that are independent and distributed like $X$ and $Y$ respectively. We denote
this fact by writing
$$X \cong X', \quad Y \cong Y'.$$
Then equation (4.6.1) can be written in the form
$$E \exp i(t_1 X + t_2 Y) = E \exp i(t_1 X' + t_2 Y')$$
for arbitrary real $t_1$ and $t_2$. Thus the independence of $X$ and $Y$ is equivalent to
identical distribution of $t_1 X + t_2 Y$ and $t_1 X' + t_2 Y'$ for arbitrary real values of
$t_1$ and $t_2$.
Let $(\mathcal{X}, \mathfrak{A})$ denote a measurable space equipped with a probability measure
$\mu$. Let $Q(X; \alpha_1, \dots, \alpha_s)$ denote a family of statistics that are dependent on the
real parameters $\alpha_1, \dots, \alpha_s$, which vary in some closed bounded parallelepiped
$\Omega$. To shorten our notation we set
$$(\alpha_1, \dots, \alpha_s) = \alpha, \quad Q(X; \alpha_1, \dots, \alpha_s) = Q(X, \alpha).$$
If two statistics $Q(X, \alpha)$ and $Q(X, \alpha')$ of the family are identically
distributed, i.e. if $Q(X, \alpha) \cong Q(X, \alpha')$, then existence of the moments $E(Q(X, \alpha))^r$ and
$E(Q(X, \alpha'))^r$, $r = 0, 1, 2, \dots$, would mean that the moments are equal:
$$E(Q(X, \alpha))^r = E(Q(X, \alpha'))^r, \quad r = 0, 1, 2, \dots \tag{4.6.2}$$
Conversely, if the problem of moments were determinate in the present case, then
the countable set of equations (4.6.2) would imply identical distribution of the
statistics $Q(X, \alpha)$ and $Q(X, \alpha')$. The conditions under which the problem of
moments is determinate are well-known:
$$\sum_{r=1}^{\infty} \frac{1}{\sqrt[2r]{a_{2r}}} = \infty,$$
where $a_{2r}$ is the $2r$th moment (see [80]).
Although these conditions are only sufficient, they are almost necessary.
In this connection, we note that these conditions are not satisfied even for
polynomial families of statistics under a normal distribution (see [39]). Therefore we shall consider
not the moments (4.6.2) but their generalizations which we shall call the
$\Phi$-moments. Specifically, suppose that we have a family of continuous functions
$\{\Phi_r(U)\}$ ($r = 1, 2, \dots$) of a real argument $U$. Suppose that the family $\{\Phi_r(U)\}$ has
property (D):
(D) Determinacy of the analogue of the moment problem: The mathematical
expectations ($\Phi$-moments)
$$E \Phi_r(Q(X, \alpha)), \tag{4.6.3}$$
exist, and their values determine the distribution of $Q(X, \alpha)$ up to $\mu$-measure zero.
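
The behavior of the classical moment condition quoted above can be explored numerically. The sketch below compares partial sums of the series of reciprocal 2r-th roots of the even moments for a standard normal variable and for its cube. The choice of the cube as the polynomial statistic, and the function names, are assumptions made for illustration. Bounded partial sums only suggest that this sufficient condition fails; by themselves they prove nothing about determinacy.

```python
from math import exp, lgamma, log

def log_double_factorial_odd(k):
    """log((2k - 1)!!), using (2k - 1)!! = (2k)! / (2**k * k!)."""
    return lgamma(2 * k + 1) - k * log(2) - lgamma(k + 1)

def carleman_partial_sum(degree, terms):
    """Partial sum over r = 1..terms of (2r-th moment of Q)**(-1/(2r)) for
    Q = X**degree with X standard normal, where the 2r-th moment of Q is
    E X**(2 * degree * r) = (2 * degree * r - 1)!!."""
    return sum(exp(-log_double_factorial_odd(degree * r) / (2 * r))
               for r in range(1, terms + 1))

for terms in (10, 1000, 10000):
    print(terms, carleman_partial_sum(1, terms), carleman_partial_sum(3, terms))
```

In this sketch the partial sums for the normal variable itself keep growing, while those for its cube level off quickly.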
Furthermore, let us suppose that, relative to the family $\{\Phi_r(U)\}$, the
statistics $Q(X, \alpha)$ possess property ($\Gamma$):
($\Gamma$) Holomorphic connectedness. Let us define
$$E(\Phi_r(Q(X, \alpha))) = \varphi_r(\alpha_1, \dots, \alpha_s). \tag{4.6.4}$$
Then the region of the $s$ complex variables $\alpha_1, \dots, \alpha_s$ contains a
simply-connected compact polycylinder $\Omega_1 \supset \Omega$ on which all the functions (4.6.4), $r = 1,
2, \dots$, exist and are holomorphic. (This applies to boundary as well as interior
points.)
By assumption, giving (4.6.3) determines the distribution of $Q(X, \alpha)$.
Therefore if
$$E \Phi_r(Q(X, \alpha)) = E \Phi_r(Q(X, \alpha')), \quad r = 1, 2, \dots, \tag{4.6.5}$$
for two values of the parameter $\alpha, \alpha' \in \Omega$, then $Q(X, \alpha)$ and $Q(X, \alpha')$ are
identically distributed statistics. The converse is obvious: if $Q(X, \alpha) \cong Q(X, \alpha')$,
then (4.6.5) holds for $r = 1, 2, \dots$.
It turns out that condition ($\Gamma$) leads to the fact that the property of identical
distribution of any two statistics of the family follows from a finite number of
relations of the form (4.6.5).
Theorem 4.6.1. For given $(\mathcal{X}, \mathfrak{A}, \mu)$, and given families $\{Q(X, \alpha)\}$ and $\{\Phi_r\}$,
if the condition ($\Gamma$) of holomorphic connectedness is satisfied, then there exists
a finite constant $R_0$ such that the relations
$$E \Phi_r(Q(X, \alpha)) = E \Phi_r(Q(X, \alpha')), \quad \alpha, \alpha' \in \Omega, \tag{4.6.6}$$
where $r \le R_0$, imply that
$$Q(X, \alpha) \cong Q(X, \alpha'). \tag{4.6.7}$$
To prove this we note that (4.6.7) follows from a countable set of relations of
the form (4.6.5). These may be put in the form
$$\varphi_r(\alpha_1, \dots, \alpha_s) = \varphi_r(\alpha_1', \dots, \alpha_s'), \quad r = 1, 2, \dots \tag{4.6.8}$$
Consider the polycylinder $\Omega_1 \times \Omega_1$ of points of the form $(\alpha, \alpha')$. Let us construct
an ideal $I$ generated by the differences
$$\varphi_r(\alpha_1, \dots, \alpha_s) - \varphi_r(\alpha_1', \dots, \alpha_s') \quad (r = 1, 2, \dots).$$
On the basis of the corollary of Cartan's theorem mentioned above, the ideal
$I$ must have a finite basis and must be generated by the differences
$$\varphi_r(\alpha_1, \dots, \alpha_s) - \varphi_r(\alpha_1', \dots, \alpha_s')$$
for $r \le R_0$. Each difference with $r > R_0$ is then a combination, with holomorphic
coefficients, of the differences with $r \le R_0$, and so vanishes wherever they all vanish.
Thus, if equations (4.6.8) are satisfied for $r \le R_0$, where $\alpha =
(\alpha_1, \dots, \alpha_s)$ and $\alpha' = (\alpha_1', \dots, \alpha_s')$ are two points belonging to $\Omega$, then they are
also satisfied for all $r > R_0$, so that equations (4.6.5) are satisfied for all $r$ and
hence (4.6.7) holds.
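
The mechanism of the proof, that finitely many generators control all the others, can be seen in a toy polynomial analogue. The sketch below takes a single real parameter and the hypothetical choice of the r-th power as the r-th generating function; it does not reproduce the holomorphic setting of the theorem. In this toy case every difference is a multiple of the first one, so the first relation alone forces all the others.

```python
def difference(r, a, b):
    """The generator phi_r(a) - phi_r(b) for the toy choice phi_r(a) = a**r."""
    return a ** r - b ** r

def cofactor(r, a, b):
    """Polynomial c_r(a, b) with a**r - b**r = c_r(a, b) * (a - b)."""
    return sum(a ** (r - 1 - k) * b ** k for k in range(r))

# Every generator is a multiple of the first one, so here one relation suffices.
checks = [difference(r, a, b) == cofactor(r, a, b) * difference(1, a, b)
          for r in range(1, 8) for a in range(-3, 4) for b in range(-3, 4)]
print(all(checks))
```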
An analogous assertion can be made for conditions of independence of two
statistics of a given family. Consider the same measurable space $(\mathcal{X}, \mathfrak{A})$ with
measure $\mu$ and two families of statistics $\{Q(X, \alpha)\}$, for $\alpha \in \Omega$, and $\{T(X, \alpha')\}$,
for $\alpha' \in \Omega'$. Furthermore, let $\{\Phi_r(U)\}$ and $\{\Psi_t(U)\}$ ($r, t = 1, 2, \dots$) denote two
families of continuous functions such that the functions
$$E[\Phi_r(Q(X, \alpha))\, \Psi_t(T(X, \alpha'))] = \varphi_{r,t}(\alpha, \alpha'), \quad r, t = 1, 2, \dots, \tag{4.6.9}$$
and the functions
$$E \Phi_r(Q(X, \alpha)) = \chi_r(\alpha), \quad E \Psi_t(T(X, \alpha')) = \tau_t(\alpha'), \quad r, t = 1, 2, \dots, \tag{4.6.10}$$
are both defined in the polycylinder $\Omega \times \Omega'$ of values of $(\alpha, \alpha')$.
Furthermore let us suppose that the countable set of equations of the form
$$\varphi_{r,t}(\alpha, \alpha') = \chi_r(\alpha)\, \tau_t(\alpha'), \quad r, t = 1, 2, \dots, \tag{4.6.11}$$
for points $\alpha \in \Omega$ and $\alpha' \in \Omega'$ implies independence of the statistics $Q(X, \alpha)$ and
$T(X, \alpha')$. In analogy with condition ($\Gamma$) above, let us formulate a new condition
($\Gamma'$) of "holomorphic connectedness" for this situation:
($\Gamma'$). There exist simply-connected compact polycylinders $\Omega_1 \supset \Omega$ and $\Omega_1' \supset
\Omega'$ such that the functions $\chi_r(\alpha)$, $r = 1, 2, \dots$, are holomorphic for $\alpha \in \Omega_1$, the
functions $\tau_t(\alpha')$, $t = 1, 2, \dots$, are holomorphic for $\alpha' \in \Omega_1'$,
and the functions
$\varphi_{r,t}(\alpha, \alpha')$ ($r, t = 1, 2, \dots$) are holomorphic for $(\alpha, \alpha') \in \Omega_1 \times \Omega_1'$.
(Again, this
applies to boundary as well as interior points.)
Under these conditions we have
Theorem 4.6.2. For given families $\{Q(X, \alpha)\}$, $\{T(X, \alpha')\}$, $\{\Phi_r\}$, and $\{\Psi_t\}$,
there exists a constant $R_1$ such that finitely many uncorrelatedness relations
$$E[\Phi_r(Q(X, \alpha))\, \Psi_t(T(X, \alpha'))] = E \Phi_r(Q(X, \alpha))\, E \Psi_t(T(X, \alpha')) \tag{4.6.12}$$
for $r \le R_1$ and $t \le R_1$ and for arbitrary given values of $\alpha$ and $\alpha'$, imply
independence of the statistics $Q(X, \alpha)$ and $T(X, \alpha')$.
The proof is analogous to the preceding proof. On the polycylinder $\Omega_1 \times \Omega_1'$
we consider the ideal $I$ generated by the functions
$$\varphi_{r,t}(\alpha, \alpha') - \chi_r(\alpha)\, \tau_t(\alpha'). \tag{4.6.13}$$
By virtue of the compactness and simple connectedness of the polycylinder
$\Omega_1 \times \Omega_1'$, the ideal $I$ has a finite basis, by the above-mentioned corollary of
Cartan's theorem, so that it is generated by functions of the form (4.6.13) for
$r \le R_1$ and $t \le R_1$. If equations (4.6.12) are satisfied for $r \le R_1$ and $t \le R_1$ and
for any particular values $\alpha \in \Omega$ and $\alpha' \in \Omega'$, then, by virtue of what has been
said, they are satisfied for all values of $r$ and $t$. Hence the statistics $Q(X, \alpha)$
and $T(X, \alpha')$ are then independent for given values of $\alpha$ and $\alpha'$.
Thus we see that independence of two statistics in the families of statistics
described above is equivalent to the existence of a finite number of
uncorrelatedness relations of the form (4.6.12).
Let us give some examples of the application of Theorems 4.6.1 and 4.6.2.
Example 1. Let $\{Q(X, \alpha)\}$ denote a family of positive polynomials, for
example the squares of other polynomials. Suppose that $\alpha \in \Pi$, where $\Pi$ is a
compact parallelepiped in $E_s$. Let us take for the family $\{\Phi_r(U)\}$ the set of functions
$$\Phi_r(U) = U^{r-1} \exp(-G(U)), \quad r = 1, 2, \dots, \tag{4.6.14}$$
where $G(U)$ is a nonnegative polynomial of degree $\ge 2$ for $U > 0$. Any
compact polycylinder $Z$ containing $\Pi$ may be used, and the family (4.6.14) gives a basis for
using Theorem 4.6.1. Identical distribution of the statistics $Q(X, \alpha)$ and
$Q(X, \alpha')$ for $\alpha \in \Pi$ and $\alpha' \in \Pi$ is equivalent to coincidence of their $\Phi$-moments
(of which there are $r \le R_0$) constructed with the aid of (4.6.14).
Example 2. To apply Theorem 4.6.2 we may take two families of nonnegative
polynomials $\{Q(X, \alpha)\}$ and $\{T(X, \alpha')\}$ for $\alpha \in \Pi$ and $\alpha' \in \Pi$ and, in addition to
the family (4.6.14), an analogous family of functions with the same or a different
polynomial $G(U)$.
For the families $\{Q(X, \alpha)\}$ and $\{T(X, \alpha')\}$ we may take the set of rational
functions and many other sets of functions.
CHAPTER V

COTEST IDEALS FOR EXPONENTIAL FAMILIES

§ 1. SIMILAR TESTS AND COTEST IDEALS

In the present chapter we shall examine the construction of classes, complete
in a certain sense, of similar tests for exponential families under certain
restrictions. Such a construction is equivalent to finding a basis of an ideal. In this
section we consider the question from a general point of view, without involving
exponential families.
Suppose that we have a measurable space $(\mathcal{X}, \mathfrak{A})$. Let $\{P_\theta\}$, $\theta \in \Omega$, denote
a family of measures defined on it that are dominated by a $\sigma$-finite measure $\mu$.
Suppose that we are testing the null hypothesis $H_0: \theta \in \Omega_0 \subset \Omega$. To simplify the
statement of the problem, let us consider a Bayes situation as an alternative: a
probability measure $B(\theta)$ is defined on the set $\Omega \setminus \Omega_0$; if we know $B(\theta)$ then the
alternative becomes a simple hypothesis. To convert the null hypothesis $H_0$
into a simple hypothesis let us consider randomized tests that are similar with
respect to $\Omega_0$ for testing it. Such a situation may seem somewhat artificial, but it
is possibly the least complicated one for an initial description of similar tests
and the choosing of tests that are optimum in some sense or other.
Let $\Phi = \Phi(x)$ denote a randomized similar test of level $\alpha \in (0, 1)$ for testing
the null hypothesis $H_0: \theta \in \Omega_0$, so that
$$E(\Phi \mid \theta) = \alpha \quad \text{for } \theta \in \Omega_0. \tag{5.1.1}$$
Under such conditions we shall refer to an arbitrary statistic of the form
$\psi = A(\Phi - \alpha)$, where $A$ is a constant, as a cotest. Obviously
$$E(\psi \mid \theta) = 0 \quad \text{for } \theta \in \Omega_0. \tag{5.1.2}$$
For every cotest $\psi = \psi(x)$ there exists a constant $C$ such that
$$-\alpha \le C \psi(x) \le 1 - \alpha \tag{5.1.3}$$
for all $x$. The number $\alpha$ is called the level of the cotest. A precotest of level
$\alpha$ is defined to be any statistic $g(x)$ for which $E(g(x) \mid \theta)$ exists whenever $\theta \in \Omega$ and
$$E(g(x) \mid \theta) = 0 \quad \text{for } \theta \in \Omega_0. \tag{5.1.4}$$
To describe similar tests of level $\alpha$ we need only know how to describe the
precotests. If we have a description of the precotests, for example a parametric
description with arbitrary functions belonging to some class serving as parameters,
we need only choose from them the precotests such that $-\alpha \le g(x) \le 1 - \alpha$ for
all $x$. Similar tests are of the form
$$\Phi(x) = g(x) + \alpha.$$
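
The passage from a precotest to a similar test can be illustrated on a small discrete example. The sketch below is hypothetical throughout: the four-point sample space, the two null densities, the statistic `g` and the helper names are invented for illustration. It checks the two requirements stated above, zero expectation under the null hypothesis and the bounds in terms of the level, and then forms the test by adding the level.

```python
from fractions import Fraction as F

# Hypothetical null family on the sample space {0, 1, 2, 3} (not from the text).
null_family = [
    [F(1, 4), F(1, 4), F(1, 4), F(1, 4)],
    [F(1, 2), F(1, 4), F(1, 8), F(1, 8)],
]

def expectation(statistic, density):
    """Expected value of a statistic under a discrete density."""
    return sum(s * p for s, p in zip(statistic, density))

def similar_test_from_precotest(g, alpha):
    """Return Phi = g + alpha if g is a precotest with -alpha <= g <= 1 - alpha,
    otherwise raise ValueError."""
    if any(expectation(g, p) != 0 for p in null_family):
        raise ValueError("g is not a precotest: nonzero expectation under H0")
    if any(not (-alpha <= v <= 1 - alpha) for v in g):
        raise ValueError("g violates -alpha <= g(x) <= 1 - alpha")
    return [v + alpha for v in g]

alpha = F(3, 10)
g = [F(1, 10), F(-3, 10), F(1, 10), F(1, 10)]   # zero mean under both null densities
phi = similar_test_from_precotest(g, alpha)
print(phi, [expectation(phi, p) for p in null_family])
```

The resulting values lie between 0 and 1, so they can be read as rejection probabilities, and the expectation equals the level under each null density.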
Let us now look at the less general scheme introduced in §1 of Chapter II.
The parameter $\theta = (\theta_1, \dots, \theta_s)$ lies in some compact Borel subset $\Omega$ (usually a
parallelepiped) of the Euclidean space $E_s$. The probability density
$p(x; \theta_1, \dots, \theta_s)$ with respect to the dominating measure $\mu$ is assumed to be a
continuous function of the parameter $\theta \in \Omega$ for almost all values of $x$.
Consider the precotests belonging to the class $K$ of complex-valued
statistics $f(x)$ satisfying the following condition: the space $E_{2s}$ of complex $s$-tuples
$\{\theta_1, \dots, \theta_s\}$ contains a simply-connected compact polycylinder $Z \supset \Omega$ such that
the mathematical expectations
$$E(f(x) \mid \theta) = \varphi(\theta) \tag{5.1.5}$$
exist and are holomorphic on $Z$. The "images" $\varphi(\theta)$ of the statistics $f(x)$
constitute a ring $K$ over the field of complex numbers.
Of course the class $K$ of statistics of this sort may be trivial and reduce to
the statistic $f(x) = 0$. Even if it is not trivial it may prove too small to contain
the cotests that are optimum in some sense or other. However, for the case of
exponential families the class $K$ is sufficiently broad. Furthermore, finding the
preimages $f(x)$ from their images $\varphi(\theta)$ is comparatively simple for these families.
If $f(x) \in K$ is a precotest, the image $\varphi(\theta)$ vanishes for $\theta \in \Omega_0$. The
converse assertion is also obvious. The set of functions $\varphi(\theta) \in K$ that vanish for
$\theta \in \Omega_0$ constitutes an ideal of the ring $K$. We shall call this ideal the cotest
ideal of the ring $K$ and denote it by $I_{H_0}$. (It would be more accurate to call it the
"precotest ideal" but such a term would be too cumbersome.) Thus description of
the cotests in the class $K$ reduces to description of the ideals of functions in the
ring $K$ that are generated by functions that vanish on $\Omega_0 \subset Z$ (see Chapter I). It
should be noted, however, that the theory of ideals in rings of holomorphic functions
was developed (by Cartan and others in the publications cited in §3 of Chapter I)
primarily for rings of all functions that are holomorphic on a compact polycylinder
and not on their subrings. However, even the information that we already have is
quite useful for investigating similar tests in certain families of distributions.
Under the conditions of a given Bayes measure $B(\theta)$ on simple alternatives
to $H_0$, it is natural to treat as the optimum similar test out of all similar tests
that one $\Phi$ that maximizes the integral $\int_{\Omega \setminus \Omega_0} E(\Phi \mid \theta)\, dB(\theta)$. However, if such
a test exists it is usually hard to find. Therefore it may prove expedient to find
an $\varepsilon$-optimum similar test $\Phi_\varepsilon$ satisfying the condition
$$W(\Phi_\varepsilon \mid B) = \int_{\Omega \setminus \Omega_0} E(\Phi_\varepsilon \mid \theta)\, dB(\theta) \ge \sup_{\Phi} \int_{\Omega \setminus \Omega_0} E(\Phi \mid \theta)\, dB(\theta) - \varepsilon, \tag{5.1.6}$$
where the supremum is over all similar tests $\Phi$ of given level $\alpha$ and where $\varepsilon$ is
a positive number. Instead of an $\varepsilon$-optimum test, we need only find an $\varepsilon$-optimum
cotest $\psi_\varepsilon$ satisfying condition (5.1.6), with $\Phi_\varepsilon$ replaced by $\psi_\varepsilon$, and the condition
$$-\alpha \le \psi_\varepsilon(T) \le 1 - \alpha. \tag{5.1.7}$$

For families (see above) with densities of the form $\{p(x; \theta_1, \dots, \theta_s)\}$ which
for all $x$ are smooth functions of the parameter $\theta = (\theta_1, \dots, \theta_s)$, we may consider
optimization of the test from the standpoint of the local properties of its power
function (see [36]). For example, if a hypothesis $H_0$ is of the form $H_0$:
$\gamma(\theta_1, \dots, \theta_s) = \gamma_0$, where $\gamma$ is a smooth function, and if $\gamma_1 = \gamma(\theta_1, \dots, \theta_s), \dots,
\gamma_s = \gamma_s(\theta_1, \dots, \theta_s)$ is a local coordinate system, we may examine the
behavior of the power function
$$E(\Phi \mid \gamma, \gamma_2, \dots, \gamma_s). \tag{5.1.8}$$
For $\gamma = \gamma_0$ and arbitrary admissible values of $\gamma_2, \dots, \gamma_s$, the function (5.1.8)
assumes a constant value $\alpha$. If this function is twice continuously differentiable
with respect to $\gamma$ for arbitrary $\gamma_2, \gamma_3, \dots, \gamma_s$, we may require that "the tests be
unbiased in a neighborhood of $H_0$"; i.e. we may require that
$(\partial/\partial\gamma)\, E(\Phi \mid \gamma, \gamma_2, \dots, \gamma_s) = 0$ for $\gamma = \gamma_0$ and arbitrary values of $\gamma_2, \dots, \gamma_s$. It
is desirable to have the value of the second derivative $(\partial^2/\partial\gamma^2)\, E(\Phi \mid \gamma, \gamma_2, \dots, \gamma_s)$
as great as possible at $\gamma = \gamma_0$ for all values of $\gamma_2, \dots, \gamma_s$. Of course this
cannot in general be done simultaneously for all points $\gamma_2, \dots, \gamma_s$.
If the family of distributions $\{P_\theta\}$ that we are studying has nontrivial
sufficient statistics, then by what was said in §3 of Chapter III it is natural for us to
confine ourselves to similar tests that depend only on sufficient statistics.
Characteristics of the type (5.1.6) will be no worse for these.

§2. STATEMENT OF THE PROBLEM FOR INCOMPLETE EXPONENTIAL FAMILIES

Let us again look at exponential families of distributions. To make the
exposition as simple as possible we assume that the sample space is the Euclidean
space $E_n$ and that the dominating measure is Lebesgue measure.
As we know (see Chapter II), the joint distribution of sufficient
statistics $(T_1, \dots, T_s)$ in the space $E_s$ also constitutes an exponential family with
probability density which it will be convenient for us to write in the form
$$p_\theta(T_1, \dots, T_s) = C(\theta) \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\, h(T_1, \dots, T_s). \tag{5.2.1}$$
Let us assume that the set of points of discontinuity of $h(T_1, \dots, T_s)$
has Lebesgue measure 0.
Many problems with tests of the "even type" involve only sufficient
statistics $T_1, \dots, T_s$ that retain sign. For example, such a problem is the problem of
constructing similar tests for the Behrens-Fisher problem that depend only on the
statistics $(\bar{x} - \bar{y})^2$, $s_1^2$, and $s_2^2$, where the notation is that of Chapter III, §3. We
shall look at this problem again in Chapter VIII, §2. Cases of the type
$$T_i \ge c_i \quad \text{or} \quad T_i \le c_i \quad (i = 1, 2, \dots, s), \tag{5.2.2}$$
where the $c_i$ are constants, reduce to this case if we replace the $T_i$ with
sufficient statistics $T_i - c_i$ ($i = 1, 2, \dots, s$). We shall call exponential families with
sufficient statistics of this kind one-sided exponential families. Without loss of
generality we can write (making a transformation of parameters if necessary) the
probability densities for the sufficient statistics of a one-sided family in the form
$$p_\theta(T_1, \dots, T_s) = C(\theta) \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\, h(T_1, \dots, T_s), \tag{5.2.3}$$
where, in the case of a one-sided family, we assume $T_i \ge 0$ for $i = 1, 2, \dots, s$.
In a more general situation we assume that h (T 1, • • ·, Ts) behaves like the func-
tion m (Tl' · · • , Ts) in § 1 of Chapter I, specifically that h (Tl' • .. , Ts) vanishes
for Ti < 0, where j = 1, 2, · · ·, s 1 and s 1 =5 s. The integral (5.2.3) converges
absolutely in the Cartesian product P of the s 1 half-planes R.: Re(). > O
I I
§L INCOMPLETE EXPONENTIAL FAMILIES 93

(j=l,2,···,s 1) andthe s-s 1 strips Sj:O<Reej~Aj,wherethe Aj are


finite numbers. (By a linear transformation of the parameters and the variables we
can reduce arbitrary strips to this type.) The region P described above is the
appropriate region of variation for natural complex parameters. In posing problems
of testing statistical hypotheses we shall assume that the parameters ej
(j = 1, 2, ... , s) are real. As a rule, the region U of variation of these para-
meters will be a closed bounded parallelepiped in the Euclidean space E5 • Let
us assume that the null hypothesis H 0 in this region is of the form
II1 (01 ....• 0,) = o..... II, (01, ...• es)= o. r < S, (5.2.4)

where the functions $\Pi_j(\theta_1, \ldots, \theta_s)$ ($j = 1, 2, \ldots, r$) are holomorphic and real on
U. Furthermore let us assume that the functions $\Pi_1, \ldots, \Pi_r$ satisfy condition (r).
(r) The holomorphy condition. There exists an integer $N \ge 0$ such that the
functions
(5.2.5)

are holomorphic on the entire region $P$, including its boundary.


Here the behavior of the $\Pi_j$ at infinitely distant points ($\theta_i = \infty$), $i = 1, 2, \ldots, s$,
is significant. Condition (r) is not very restrictive and is satisfied for polynomials
and other functions of similar asymptotic behavior. Thus the null hypothesis
$H_0$ imposes $r$ ($< s$) holomorphic relationships on the $s$ natural parameters
$\theta_1, \ldots, \theta_s$. From an analytical point of view these relationships generate a real
analytic set (see Chapter I) contained in U. Let us denote it by $\Omega_0$. By what
was said at the end of the preceding section, we can confine ourselves to tests
that depend only on sufficient statistics: $\Phi(T_1, \ldots, T_s)$. Assuming that an alternative
$H_1$ to $H_0$ is given in the Bayes formulation (as mentioned in the preceding
section), we seek $\varepsilon$-optimum similar tests for the hypothesis $H_0$ that are defined
in accordance with formula (5.1.6). Here we need to impose a number of conditions
on the relations (5.2.4) determined by the null hypothesis $H_0$ and on the function
$h(T_1, \ldots, T_s)$ defined in the region $\mathcal{T}$: $T_j > 0$ for $j = 1, 2, \ldots, s_1$. We define
Condition (A). The function $h(T_1, \ldots, T_s)$ is positive almost everywhere
in $\mathcal{T}$.
Although this condition is satisfied in many cases, for example in many prob-
lems on the statistical analysis of normal vectors, it is still rather restrictive;
but if it is not satisfied our investigations become quite complicated. We shall
not go into this matter in the present book except in Chapter VII, which deals with
unbiased statistics. One can easily see that the case in which the function
$h(T_1, \ldots, T_s)$ vanishes outside an $s_1$-faced cone in $s$-dimensional Euclidean
space but is nonzero almost everywhere inside that cone reduces to condition (A).
By applying a suitable nonsingular linear transformation to the sufficient
statistics, we can then apply a linear transformation to the parameters, replacing
them with new parameters, in such a way that condition (A) is satisfied.

§3. IDEALS OF PRECOTESTS

In line with the definition given in §1, we now define a precotest
$\xi(T_1, \ldots, T_s)$ as any Lebesgue-measurable statistic such that the quantity
$$C(\theta)\int\cdots\int_{\mathcal{T}} \xi(T_1, \ldots, T_s)\exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]\,h(T_1, \ldots, T_s)\,dT_1\cdots dT_s \tag{5.3.1}$$
exists and vanishes for

(5.3.2)

We note that $C(\theta) \ne 0$ for $\theta \in \Omega$, since the quantity (5.3.1) must be equal to
unity if $\xi(T_1, \ldots, T_s) \equiv 1$. Consider the set of all statistics $m(T_1, \ldots, T_s)$ for
which the $s$-fold Laplace transform
$$L(m \mid \theta) = \int\cdots\int_{\mathcal{T}} m(T_1, \ldots, T_s)\exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]\,dT_1\cdots dT_s \tag{5.3.3}$$
converges absolutely in our basic region $P$. If we set $u(T_1, \ldots, T_s) =
m(T_1, \ldots, T_s)/h(T_1, \ldots, T_s)$ for $(T_1, \ldots, T_s) \in \mathcal{T}$, which we can do by virtue
of condition (A), we find that
$$C(\theta)\,L(m \mid \theta) = E(u \mid \theta). \tag{5.3.4}$$
To find the precotests, we can find the statistics $m$ for which
$$L(m \mid \theta) = 0 \quad\text{for } \theta \in \Omega_0. \tag{5.3.5}$$
Then $\xi = m/h$ is a precotest. Since $C(\theta) \ne 0$ for $\theta \in \Omega$, this procedure provides
us with all the precotests.
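The recipe (5.3.5) can be checked numerically on a toy family. The sketch below is only an illustration under assumptions that are not in the text: $s = s_1 = 2$, $h \equiv 1$ on $T_1, T_2 > 0$ (two independent exponential variables, so that $C(\theta) = \theta_1\theta_2$), the single relationship $\Pi_1 = \theta_1 - \theta_2 = 0$, and the statistic $m(T_1, T_2) = (T_2 - T_1)e^{-(T_1 + T_2)}$, whose Laplace transform is $(\theta_1 - \theta_2)/[(\theta_1 + 1)^2(\theta_2 + 1)^2]$. The function names are hypothetical.

```python
import math


def integrate(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0


def moment(theta, power, upper=60.0):
    """Integral over [0, upper] of T**power * exp(-(1 + theta) * T) dT."""
    return integrate(lambda t: t ** power * math.exp(-(1.0 + theta) * t), 0.0, upper)


def expectation_xi(theta1, theta2):
    """E(xi | theta) for xi = m / h with h = 1 and m = (T2 - T1) * exp(-(T1 + T2)).

    Assumed toy density: theta1 * theta2 * exp(-(theta1*T1 + theta2*T2)) on T1, T2 > 0,
    so C(theta) = theta1 * theta2 and the double integral factorises.
    """
    laplace_m = (moment(theta1, 0) * moment(theta2, 1)
                 - moment(theta1, 1) * moment(theta2, 0))
    return theta1 * theta2 * laplace_m


def closed_form(theta1, theta2):
    """C(theta) * (theta1 - theta2) / ((theta1 + 1)^2 * (theta2 + 1)^2)."""
    return theta1 * theta2 * (theta1 - theta2) / ((theta1 + 1) ** 2 * (theta2 + 1) ** 2)


if __name__ == "__main__":
    print(expectation_xi(0.7, 0.7))   # on the null set theta1 = theta2
    print(expectation_xi(0.5, 2.0), closed_form(0.5, 2.0))
```

In this toy case $\xi = m/h$ has zero expectation exactly on the set $\theta_1 = \theta_2$, which is what the definition of a precotest requires; it is not bounded between $-\alpha$ and $1 - \alpha$ in any particular way, so it is a precotest rather than a cotest of a given level.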
The statistics $m(T_1, \ldots, T_s)$ for which the integral (5.3.3) converges absolutely
constitute a linear space $\mathfrak{M}$. Here the domain of scalar multipliers can be
the set of all real numbers, or the set of all complex numbers if we are admitting
complex-valued statistics. If $m_1, m_2 \in \mathfrak{M}$ then the convolution (cf. (1.1.2))
$$m_1 * m_2 = \int_0^{T_1} d\xi_1 \cdots \int_0^{T_{s_1}} d\xi_{s_1} \int_{-\infty}^{\infty} d\xi_{s_1+1} \cdots \int_{-\infty}^{\infty} d\xi_s\; m_1(T_1 - \xi_1, \ldots, T_s - \xi_s)\, m_2(\xi_1, \ldots, \xi_s) \tag{5.3.6}$$
belongs to $\mathfrak{M}$ and
$$L(m_1 * m_2 \mid \theta) = L(m_1 \mid \theta)\, L(m_2 \mid \theta) \tag{5.3.7}$$
for $\theta \in R_1 \times \cdots \times R_s$ (see [16]).
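The multiplicative property (5.3.7) can be illustrated numerically in the simplest one-sided case. The following is a minimal sketch under assumptions chosen only for illustration: $s = s_1 = 1$, $m_1(T) = e^{-T}$, $m_2(T) = e^{-2T}$, a truncated grid on $[0, 30]$ and the trapezoidal rule; none of these choices comes from the text.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def convolve_one_sided(m1, m2, grid, step):
    """(m1 * m2)(T) = integral over [0, T] of m1(T - x) * m2(x) dx, at every grid point T."""
    result = []
    for i, t in enumerate(grid):
        samples = [m1(t - grid[k]) * m2(grid[k]) for k in range(i + 1)]
        result.append(trapezoid(samples, step))
    return result


def laplace(values, grid, step, theta):
    """One-sided Laplace transform of sampled values, truncated to the grid."""
    return trapezoid([v * math.exp(-theta * t) for v, t in zip(values, grid)], step)


def check_convolution_theorem(theta, step=0.02, points=1501):
    """Return (L(m1 * m2 | theta), L(m1 | theta) * L(m2 | theta)) for the assumed m1, m2."""
    grid = [k * step for k in range(points)]
    m1 = lambda t: math.exp(-t)
    m2 = lambda t: math.exp(-2.0 * t)
    conv = convolve_one_sided(m1, m2, grid, step)
    lhs = laplace(conv, grid, step, theta)
    rhs = (laplace([m1(t) for t in grid], grid, step, theta)
           * laplace([m2(t) for t in grid], grid, step, theta))
    return lhs, rhs


if __name__ == "__main__":
    print(check_convolution_theorem(0.5))
```

For these two functions both sides equal $1/[(\theta + 1)(\theta + 2)]$ exactly, so the two numbers returned should agree up to the quadrature error.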


Thus the elements of the space $\mathfrak{M}$ constitute a ring, with the convolution
defined by formula (5.3.6) serving as the ring multiplication. The images $L(m \mid \theta)$
of the statistics $m$ constitute a ring with respect to ordinary multiplication. We
denote this ring by $L(\mathfrak{M})$.
If $\xi$ is a precotest then the function $\xi \cdot h$ belongs to $\mathfrak{M}$ and
$$E(\xi \mid \theta) = L(\xi h \mid \theta) = 0 \quad\text{for } \theta \in \Omega_0. \tag{5.3.8}$$
Let $m$ denote an arbitrary function belonging to $\mathfrak{M}$. From (5.3.8) and (5.3.7)
we have
$$L(m * (\xi h) \mid \theta) = L(m \mid \theta)\, L(\xi h \mid \theta) = 0 \quad\text{for } \theta \in \Omega_0.$$
Thus if $\xi$ is a precotest, then an arbitrary function of the form $[m * (\xi h)]/h$
is also a precotest and functions of the form $\xi h$ constitute an ideal $I_{H_0} \subset \mathfrak{M}$. We
shall call this ideal the cotest ideal of the null hypothesis or, more briefly, the
cotest ideal. If $\xi h \in I_{H_0}$, then the expressions $E(\xi \mid \theta) = L(\xi h \mid \theta)$ constitute
an ideal in the ring $L(\mathfrak{M})$. We shall call this ideal the image of the cotest ideal
$I_{H_0}$ and we shall denote it by $L(I_{H_0})$.
The investigation of similar tests under our assumptions reduces to study
of the image of the cotest ideal $L(I_{H_0})$. If we construct a basis for the ideal
$L(I_{H_0})$, we shall also have a basis for the cotest ideal and a description of all
precotests $\xi$. Of these we then need to choose those cotests for which
$$-\alpha \le \xi \le 1 - \alpha. \tag{5.3.9}$$
To choose an $\varepsilon$-optimum cotest $\psi_\varepsilon$, we impose, following (5.1.6), the condition
$$W(\psi_\varepsilon \mid B) = \int_{\Omega \setminus \Omega_0} E(\psi_\varepsilon \mid \theta)\,dB(\theta) \ge \sup \int_{\Omega \setminus \Omega_0} E(\psi \mid \theta)\,dB(\theta) - \varepsilon, \tag{5.3.10}$$
where the supremum is over all cotests $\psi$ of level $\alpha$ subject to the condition
(5.3.9), in which we need to set $\xi = \psi$. The $\varepsilon$-optimum similar test is $\Phi_\varepsilon = \psi_\varepsilon + \alpha$.
However, construction of a basis for the ideal $L(I_{H_0})$ in the ring $L(\mathfrak{M})$ is a difficult
matter. Investigation of the ideals in rings of holomorphic functions has been
carried out primarily for rings of ideals of all functions that are holomorphic on
some compact polycylinder (see the articles of Cartan cited above, and also of
K. Oka and other authors). In a recent article [29] Cartan investigated ideals in
rings of all functions that are holomorphic in the region of real values of the arguments.
We know virtually nothing about other types of rings of holomorphic functions,
in particular about our basic ring $L(\mathfrak{M})$. Therefore, having in mind the application
of Cartan's results (expounded in Chapter I, §3) to an investigation of the
images of cotest ideals, let us look at an extension $\tilde{L}(I_{H_0})$ of the ideal $L(I_{H_0})$
for the ring $\tilde{L}(\mathfrak{M})$ of all functions that are holomorphic on $\tilde{\Omega}$ (see Chapter I). We can
describe a basis for this extension $\tilde{L}(I_{H_0})$ under certain conditions in the same
way as for an ideal in $L(\mathfrak{M})$. By imposing further conditions on the function
$h(T_1, \ldots, T_s)$, we can use our knowledge of the basis for the ideal $\tilde{L}(I_{H_0})$ to
find the $\varepsilon$-optimum cotests. The values assumed by the parameters $\theta_1, \ldots, \theta_s$ lie
on the parallelepiped $\Omega \subset P$. To simplify the remainder of our discussion let us
suppose that the parallelepiped $\Omega$ is of the form
$$\theta_j \ge \varepsilon_j,\quad j = 1, 2, \ldots, s_1; \qquad \varepsilon_j \le \theta_j \le E_j,\quad j = s_1 + 1, \ldots, s, \tag{5.3.11}$$
where the $\varepsilon_j$ are small positive numbers and the $E_j$ are given positive numbers
less than the corresponding $A_j$.
The boundary of the region $P$ contains points with infinite coordinates, in
particular the point $(\infty, \ldots, \infty)$. We denote by $\partial_0 P$ the set of such boundary points
of the region $P$. Functions of the form (5.3.3), i.e. elements of the ring $L(\mathfrak{M})$, will
not in general be analytic at the points of $\partial_0 P$. It turns out that for $s = 1$ analyticity
of these elements at the points of $\partial_0 P$ is equivalent to entireness of a certain
order (see [10]) of the function $m(T_1)$ of the complex variable $T_1$.
The requirement that the precotests be holomorphic may restrict our choice of
similar tests, and we shall not make this requirement. In such a case we need to
reckon with the difficulty arising from the fact that we are unable to consider functions
that are holomorphic on all $P$ and must assume that they are holomorphic in
the interior but not on the boundary of $P$. One encounters considerable difficulties
in studying ideals of functions that are holomorphic in an open region. Up to the
present time such ideals have not been completely described even in the simplest
cases (see Gleason [13]). What we have to do is impose stringent requirements in
order to obtain a sufficiently complete description of the corresponding similar
tests. With a view to applying the corollary to Theorem 1.3.1 (Cartan's theorem),
we shall study first not the cotest ideal $L(I_{H_0})$ but its extension $\tilde{L}(I_{H_0})$ in the
ring of all functions that are holomorphic inside $P$.

§4. APPLICATION OF CARTAN'S THEOREMS

The precotest ideal $L(I_{H_0})$ consists of all holomorphic functions in $L(\mathfrak{M})$
that vanish at all real points $(\theta_1, \ldots, \theta_s)$ of the region $\Omega$ satisfying (5.2.4). Our
plan is to apply the theorems of Chapter I, §3 dealing with holomorphic functions
in $L(\mathfrak{M})$ that vanish at all complex points of the analytic set (5.2.4) belonging to $P$.
Obviously the ideal that they form is only a portion of the ideal $L(I_{H_0})$ in the
general case. In view of this we need to admit a "complexification" of conditions
(5.2.4) (see Cartan [29]). We shall do this in the elementary form indicated by
Theorem 1.2.2.
Let us look at the relationships between the parameters in (5.2.4) when the
parameters $(\theta_1, \ldots, \theta_s)$ are complex members of $P$. We make an assumption completely
analogous to the hypotheses of Theorem 1.2.2. We denote the analytic set
(5.2.4) by $V_{\Pi_1 \cdots \Pi_r}$.
Condition (B). Inside $P$ the analytic set $V_{\Pi_1 \cdots \Pi_r}$ can be decomposed into
finitely many connected components $V^q_{\Pi_1 \cdots \Pi_r}$, each of which is of complex
dimension $s - r$ and contains the connected set $R^q_{\Pi_1 \cdots \Pi_r} \subset \Omega_0$ of real points of
real dimension $s - r$.
When the set $V_{\Pi_1 \cdots \Pi_r}$ has such a structure, the functions that are holomorphic
in $P$ and vanish in $\Omega_0$ also vanish at all complex points of $V_{\Pi_1 \cdots \Pi_r}$ inside $P$ (by
Theorem 1.2.2).
Thus the ideal of the functions in $\tilde{L}(\mathfrak{M})$ that vanish at all complex points of the
analytic set (5.2.4) coincides with the original ideal $\tilde{L}(I_{H_0})$ when condition (B)
is satisfied.
It would also be desirable if the functions $\Pi_1, \ldots, \Pi_r$ defining the conditions
connecting the parameters (5.2.4), which are holomorphic on the closed region $P$,
generate the ideal $\tilde{L}(I_{H_0})$ locally. To arrange for this, we impose on the relationships
(5.2.4) the following condition.
Condition (C). The analytic set $\mathfrak{S}$ of points in the region $P$ such that
$$\operatorname{rank}\begin{pmatrix} \dfrac{\partial \Pi_1}{\partial \theta_1} & \cdots & \dfrac{\partial \Pi_1}{\partial \theta_s} \\ \vdots & & \vdots \\ \dfrac{\partial \Pi_r}{\partial \theta_1} & \cdots & \dfrac{\partial \Pi_r}{\partial \theta_s} \end{pmatrix} < r \tag{5.4.1}$$
has no points inside $P$.
We note that condition (C) is not especially restrictive. The inequality regarding
the rank of the matrix yields, in general, an analytic set of complex dimension
not exceeding $s - r - 1$. In general the first $r$ of conditions (5.4.1) make the
set $\mathfrak{S}$ empty and not merely finite. Also, it is easy to exhibit many cases in
which condition (C) is not satisfied. Thus, if the entire functions $\Pi_j(\theta_1, \ldots, \theta_s)$
are squares of other entire functions $\Pi_j^*$, then the set $\mathfrak{S}$ has, in general, complex
dimension no less than $s - r$. Of course, in our present example it is not expedient
to use relationships of the form (5.2.4). Without changing the problem we can replace
them with the conditions
$$\Pi_j^* = 0 \qquad (j = 1, 2, \ldots, r).$$
Now we can prove
Now we can prove
Theorem 5.4.1. If conditions (B) and (C) are satisfied, the ideal $\tilde{L}(I_{H_0})$ inside
$P$ is generated by the functions $\Pi_1, \ldots, \Pi_r$. Thus if $F \in \tilde{L}(I_{H_0})$, then
there exist functions $G_1, \ldots, G_r$, holomorphic inside $P$, such that
$$F = \Pi_1 G_1 + \cdots + \Pi_r G_r. \tag{5.4.2}$$
To prove this we note that, by virtue of condition (B), the ideal $\tilde{L}(I_{H_0})$ coincides
with the ideal of all functions that are holomorphic inside $P$ and that vanish
at all complex points in the analytic set $V_{\Pi_1 \cdots \Pi_r}$ inside $P$. The representation
(5.4.2) then follows from the corollary to Cartan's theorem (Theorem 1.3.2).

§5. THE BEHAVIOR OF SMOOTH PRECOTESTS


The results of the preceding section dealt with the ideal $\tilde{L}(I_{H_0})$, which is an
extension of the basic precotest ideal $L(I_{H_0})$. Theorem 5.4.1 yields a basis
$\Pi_1, \ldots, \Pi_r$ for it inside $P$. However this gives no immediate information regarding
a basis for the ideal $L(I_{H_0})$ in the ring $L(\mathfrak{M})$; in fact it gives no information
as to whether such a basis is finite or not. In this connection a simple example 1)
from the theory of rings is instructive.

1) This example was communicated to the author by B. Z. Moroz.

Let $K[x, y, z]$ denote the ring of all polynomials
in the three variables $x, y, z$ over the field of real numbers. Let
$K_1[(x), y, z]$ denote the ring of all polynomials in the variables $y$ and $z$ whose
coefficients are rational functions of $x$ over the same field. Then
$$K[x, y, z] \subset K_1[(x), y, z].$$
In the ring $K[x, y, z]$, consider the ideal $I_0$ of all polynomials without a constant
term. This ideal has the three-member basis $\{x, y, z\}$. Its extension $I$ in the ring
$K_1[(x), y, z]$ is obviously the ring itself, which has the one-member basis $\{1\}$.
Thus the basis of the original ideal is quite different from the basis of its extension.
However, since we are interested not in the most general case of ideals in
rings but in the examination of the behavior of the precotest ideals $L(I_{H_0})$, under
certain conditions the basis of $\tilde{L}(I_{H_0})$ gives us a great deal of information regarding
the ideal $L(I_{H_0})$.
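The step behind "obviously" in the ring example can be written out in one line. This is only a worked restatement of the example, and the one fact it uses is that $1/x$ is an admissible coefficient in $K_1[(x), y, z]$:

```latex
x \in I_0, \qquad \frac{1}{x} \in K_1[(x), y, z]
\quad\Longrightarrow\quad
1 = \frac{1}{x}\cdot x \in I ,
\qquad\text{hence}\qquad I = K_1[(x), y, z].
```

The generator $x$ of $I_0$ becomes a unit after the extension, which is why the three-member basis collapses to $\{1\}$.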
Let us look at smooth precotests $\xi(T_1, \ldots, T_s)$ in the precotest ideal $I_{H_0}$,
the preimage of $L(I_{H_0})$. We shall say that a precotest $\xi$ is smooth and that it belongs
to the class $L^{(m)}$ (where $m$ is a positive integer) if

(5.5.1)

for the function $\xi(T_1, \ldots, T_s)\,h(T_1, \ldots, T_s)$ whenever at least one of the numbers
$\theta_j \to \infty$. By Theorem 1.1.3 a smooth precotest $\xi(T_1, \ldots, T_s)$ has partial
derivatives of the first $m$ orders inclusive.
We now impose on the relationships (5.2.4) defined by the null hypothesis $H_0$
other conditions in the region $P$ that we shall need.
Condition (C'). Inside $P$,
$$\left|\frac{\partial \Pi_i}{\partial \theta_{\alpha_j}}\right| \ne 0 \qquad (i, j = 1, 2, \ldots, r) \tag{5.5.2}$$
for suitably chosen indices $\alpha_1, \ldots, \alpha_r$.


We recall that the functions $\Pi_j(\theta_1, \ldots, \theta_s)$, $j = 1, 2, \ldots, r$, are holomorphic
on $P$. Suppose now that the function $F$ belongs to the extended ideal $\tilde{L}(I_{H_0})$ of
the null hypothesis $H_0$. By Theorem 5.4.1 we have inside $P_{\eta_0}$ the representation
$$F = \Pi_1 G_1 + \cdots + \Pi_r G_r. \tag{5.5.3}$$
Keeping (5.2.5) in mind, we multiply both sides of this last equation by
$[(\theta_1 + 1) \cdots (\theta_s + 1)]^{-N}$. We obtain
$$F_1 = F\,[(\theta_1 + 1) \cdots (\theta_s + 1)]^{-N} = \Pi_1 G'_1 + \cdots + \Pi_r G'_r, \tag{5.5.4}$$
where the functions $G'_j = [(\theta_1 + 1) \cdots (\theta_s + 1)]^{-N} G_j$ are holomorphic inside $P_{\eta_0}$.
Suppose that $F$ is bounded inside $P$ by a constant $M$: $|F| \le M$. If we define
$M_1 = \sup_P |(\theta_1 + 1) \cdots (\theta_s + 1)|^{-N}$ we obtain $|F_1| \le M M_1 = M_0$ on $P$, where $M_0$ is
a new constant. By virtue of condition (5.5.2), Theorem 1.2.7 is applicable, so
that inside $P$ we have the following bound for the functions $G'_j$ ($j = 1, 2, \ldots, r$):
$$|G'_j(\theta_1, \ldots, \theta_s)| \le \frac{K_1}{\delta^{K_2}}\,(|\theta_1| + |\theta_2| + \cdots + |\theta_s| + 1)^{K_2},$$
where $K_1$ and $K_2$ are constants and $\delta$ is the distance from the point $(\theta_1, \ldots, \theta_s)$
to the boundary of the region $P_{\eta_0}$. Since $G'_j = [(\theta_1 + 1) \cdots (\theta_s + 1)]^{-N} G_j$, this
inequality gives us the following bound for $G_j$:
$$|G_j(\theta_1, \ldots, \theta_s)| \le \frac{K_1}{\delta^{K_2}}\,(|\theta_1| + \cdots + |\theta_s| + 1)^{K_3}, \tag{5.5.5}$$
where $K_3$ is a new constant. Here we remember that
$$|(\theta_1 + 1) \cdots (\theta_s + 1)| \ge 1 \quad\text{in } P.$$
Let us now suppose that $\xi(T_1, \ldots, T_s)$ is a smooth precotest of the class $L^{(m)}$,
where $m \ge 1$. Then $F(\theta) = E(\xi \mid \theta)$ satisfies the relation (5.5.1).
If the function $F$ belongs to the ideal $L(I_{H_0})$, then so does the function
$F_2 = [(\theta_1 + 1) \cdots (\theta_s + 1)]^m F(\theta_1, \ldots, \theta_s) \in \tilde{L}(I_{H_0})$, which is holomorphic inside $P$. Also,
$|F_2(\theta_1, \ldots, \theta_s)| < M$, where $M$ is a constant. Therefore $F_2$ has a representation
of the form (5.5.3):

(5.5.6)

where the $G_j$ satisfy an inequality of the form (5.5.5). If we multiply equation
(5.5.6) by $[(\theta_1 + 1) \cdots (\theta_s + 1)]^{-m}$ and set
$$G''_j = [(\theta_1 + 1) \cdots (\theta_s + 1)]^{-m} G_j \qquad (j = 1, \ldots, r),$$
we obtain, by virtue of (5.5.5),

(5.5.7)

In the region $P$ consider the Cartesian product $L_1 \times \cdots \times L_s$, where $L_j$ is
the vertical contour $[-i\infty + \eta/2,\ i\infty + \eta/2]$ with $0 < \eta < \eta_0$, where $\eta_0$ is a small
number. Then on this Cartesian product we obtain the following bounds for the $G''_j$:
$$|G''_j(\theta_1, \ldots, \theta_s)| \le \frac{K_\eta M}{[(|\theta_1| + 1) \cdots (|\theta_s| + 1)]^m},$$
where $K_\eta$ is a constant depending on $\eta$.
Now let $\xi$ denote a precotest of the smoothness class $L^{(m)}$, so that (5.5.1)
holds inside $P$. In view of this we obtain from (5.5.7)
$$E(\xi \mid \theta) = F(\theta) = \Pi_1 G''_1 + \cdots + \Pi_r G''_r. \tag{5.5.8}$$
If we now remember that (5.2.5) is holomorphic and if we multiply and divide every
term on the right-hand side of (5.5.8) by $[(\theta_1 + 1) \cdots (\theta_s + 1)]^{N+2}$, we obtain
inside $P$

(5.5.9)

where

(5.5.10)

If $m \ge N + 3$, then by Theorem 1.1.3 the functions $A_j(\theta_1, \ldots, \theta_s)$ will be unilateral
Laplace transforms, $A_j(\theta_1, \ldots, \theta_s) = L(H_j \mid \theta)$, where $H_j = H_j(T_1, \ldots, T_s)$ is a
function that vanishes for $T_1 < 0, \ldots, T_{s_1} < 0$ and satisfies the relation

(5.5.11)

for arbitrary $\eta > 0$.
Furthermore, $H_j$ has partial derivatives of at least the first $m - N$ orders. By
Theorem 1.1.1 (the convolution theorem), if the precotest $\xi$ is sufficiently smooth
we have the representation

(5.5.12)

where $R_j$ and $H_j$ ($j = 1, 2, \ldots, r$) are the functions described above. Furthermore,
we shall prove that every cotest can be replaced by a cotest of any desired degree
of smoothness in such a way that the basic statistical properties change by an
arbitrarily small amount.

§6. SMOOTHING OF PRECOTESTS

Let us generalize the concept of the gain function $W(\Phi \mid B)$ (see §1); in particular
let us generalize formula (5.1.6) to precotests (including cotests) by setting
$$W(\xi \mid B) = \int_{\Omega \setminus \Omega_0} E(\xi \mid \theta)\,dB(\theta) \tag{5.6.1}$$
for a precotest $\xi$ and a given Bayes distribution $B(\theta)$ on $\Omega \setminus \Omega_0$.
Let us look at the question of "smoothing" the precotests by replacing them
with smooth precotests for which the gain function $W(\xi \mid B)$ will differ only slightly
from the original one.
Let $\delta$ denote a number in the interval $(0, 1/2)$. Let $d_{\delta r}(x)$ denote a smooth
function of a single variable that imitates the behavior of Dirac's function, has
the form of a "smooth peak" of area 1 and has a given number $h \ge 1$ of derivatives.
The function $d_{\delta r}(x)$ must vanish for $|x - \delta/2| \ge \delta^r$. Here $r \ge 2$ is an integer that
will be fixed in what follows. Thus we have
$$\int_0^\delta d_{\delta r}(x)\,dx = 1; \qquad d_{\delta r}(x) = 0 \ \text{ for } x \le 0 \text{ or } x \ge \delta.$$
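A concrete peak of this kind is easy to construct and inspect numerically. The sketch below is one possible choice, not the one used in the text: it assumes the standard infinitely differentiable bump $\exp[-1/(1 - u^2)]$ with $u = (x - \delta/2)/\delta^r$, normalized numerically to area 1, so it has more derivatives than the required number. The helper names are hypothetical.

```python
import math


def make_peak(delta, r=2, n=2000):
    """Build an assumed smooth peak d(x) supported in |x - delta/2| < delta**r.

    Returns (d, laplace, area), where laplace(theta) is the integral of
    d(x) * exp(-theta * x) and area() is the integral of d(x), both computed
    with the midpoint rule on the support.
    """
    if not (0.0 < delta < 0.5) or r < 2:
        raise ValueError("expected 0 < delta < 1/2 and r >= 2")
    center, half_width = delta / 2.0, delta ** r

    def raw(x):
        u = (x - center) / half_width
        if abs(u) >= 1.0:
            return 0.0
        return math.exp(-1.0 / (1.0 - u * u))

    def midpoint_integral(f):
        step = 2.0 * half_width / n
        return step * sum(f(center - half_width + (k + 0.5) * step) for k in range(n))

    norm = midpoint_integral(raw)

    def d(x):
        return raw(x) / norm

    def laplace(theta):
        return midpoint_integral(lambda x: d(x) * math.exp(-theta * x))

    def area():
        return midpoint_integral(d)

    return d, laplace, area


if __name__ == "__main__":
    for delta in (0.1, 0.01):
        d, laplace, area = make_peak(delta)
        print(delta, area(), laplace(5.0))
```

Because the peak is concentrated near $x = \delta/2$, its Laplace transform lies between $e^{-\theta(\delta/2 + \delta^r)}$ and $e^{-\theta(\delta/2 - \delta^r)}$ for real $\theta > 0$, so it tends to 1 as $\delta \to 0$ uniformly on any bounded set of $\theta$. That is the property of the smoothing kernel on which the comparison of gains in this section rests.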

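Let us now define the function

(5.6.2)

Let $\xi$ denote a given precotest and define $p = \xi h$. Consider the convolution $p_\delta = D_\delta * p$.
We have
$$p_\delta = D_\delta * p = \int_0^{T_1} d\xi_1 \cdots \int_0^{T_{s_1}} d\xi_{s_1} \int_{-\infty}^{\infty} d\xi_{s_1+1} \cdots \int_{-\infty}^{\infty} d\xi_s\; D_\delta(\xi_1, \ldots, \xi_s)\,p(T_1 - \xi_1, \ldots, T_s - \xi_s). \tag{5.6.3}$$
We shall call the function $\xi_\delta$ defined by
$$\xi_\delta(T_1, \ldots, T_s) = h^{-1}(D_\delta * p) = h^{-1}(D_\delta * \xi h) = h^{-1} p_\delta \tag{5.6.4}$$
the smoothed precotest of $\xi$.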
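Theorem 5.6.1. For given $\varepsilon$ and sufficiently small $\delta > 0$,
$$|W(\xi_\delta \mid B) - W(\xi \mid B)| \le \varepsilon. \tag{5.6.5}$$
Proof. For $m \in \mathfrak{M}$ and $\theta \in P$ we have
$$E(h^{-1} m \mid \theta) = L(m \mid \theta).$$
Therefore
$$E(\xi_\delta \mid \theta) = L(D_\delta * p \mid \theta) = L(D_\delta \mid \theta)\,L(p \mid \theta).$$
By the construction of the function $D_\delta$ and a well-known property of the Laplace
transform for $\theta \in \Omega \setminus \Omega_0$ and given positive $\varepsilon$, we have

(5.6.6)

if $\delta > 0$ is sufficiently small (cf. condition (5.3.11)). Therefore
$$\begin{aligned}
W(\xi_\delta \mid B) - W(\xi \mid B) &= W(\xi_\delta \mid B) - W(h^{-1} p \mid B) \\
&= \int_{\Omega \setminus \Omega_0} dB(\theta)\,\bigl(E(\xi_\delta \mid \theta) - E(h^{-1} p \mid \theta)\bigr) \\
&= \int_{\Omega \setminus \Omega_0} dB(\theta)\,\bigl(L(D_\delta * p \mid \theta) - L(p \mid \theta)\bigr) \\
&= \int_{\Omega \setminus \Omega_0} L(p \mid \theta)\,\bigl(L(D_\delta \mid \theta) - 1\bigr)\,dB(\theta).
\end{aligned} \tag{5.6.7}$$
Since $\Omega \setminus \Omega_0$ is contained in the compact set $\Omega$, in which all $\theta_j \ge \varepsilon_j > 0$ (cf.
(5.3.11)), we have
$$|L(p \mid \theta)| < C \quad\text{for } \theta \in \Omega \setminus \Omega_0.$$
From this and (5.6.7) the conclusion of the theorem follows by virtue of (5.6.6).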
Smoothing of the precotest $\xi$, i.e. its replacement with the precotest $\xi_\delta$ defined
by formula (5.6.4), leads to improvement in the behavior of the function
$E(\xi_\delta \mid \theta)$ introduced in §3 with increase in $|\theta_j|$, where $j$ is one of the numbers
$1, 2, \ldots, s$, in the region $\tilde{\Omega}$.
Theorem 5.6.2. If the function $d_{\delta r}(x)$ has $h \ge 1$ continuous derivatives, then
in the region $\tilde{\Omega}$

(5.6.8)

To prove this we note that, by Theorems 1.1.4 and 5.6.1, we have, for $\theta \in \tilde{\Omega}$,
$$E(\xi_\delta \mid \theta) = L(D_\delta \mid \theta)\,L(p \mid \theta) = O\!\left(\frac{1}{(|\theta_1| + 1)^h \cdots (|\theta_s| + 1)^h}\right) \cdot O(1),$$
which proves (5.6.8).
For a sufficiently large value of $h$ the precotest $\xi_\delta$ has, by virtue of what
was said in §5, a representation of the form (5.5.12).
§7. FORMATION OF SMOOTH PRECOTESTS FROM A GIVEN ONE

We saw above that, under certain conditions of a rather general type, a smooth
precotest $\xi$ can be represented as a sum of convolutions

(5.7.1)

where $A_1, \ldots, A_r$ are given functions corresponding to the connection conditions
and $H_1, \ldots, H_r$ are smooth functions of a fairly arbitrary type. Conversely, every
expression of the form (5.7.1) is a smooth precotest. For a precotest $\xi$ to be a
cotest of level $\alpha$, it must satisfy the condition $-\alpha \le \xi \le 1 - \alpha$.
If we began with the cotest $\psi$ of level $\alpha$ and then smoothed it in accordance
with formula (5.6.4), we would obtain the smoothed precotest
$$\psi_\delta = h^{-1}(D_\delta * \psi h). \tag{5.7.2}$$
For this to be a cotest of level $\alpha$ it is necessary and sufficient that
$$-\alpha h \le D_\delta * \psi h \le (1 - \alpha) h. \tag{5.7.3}$$
Let us assume that the level $\alpha$ is neither 0 nor 1. In such a case, if we found a
smoothing function $D_\delta$ that would lead not to the inequalities (5.7.3) but to the
inequalities

(5.7.4)

where $\eta > 0$ is an arbitrarily small fixed number, then, by replacing $D_\delta$ with
$(1 - \eta') D_\delta$, where $\eta'$ is sufficiently small, we would obtain a smoothed cotest
$\psi'_\delta$ of level $\alpha$. By formula (5.6.7) the Bayes gain $W(\psi'_\delta \mid B)$ would be arbitrarily
close to $W(\psi \mid B)$. Such a situation is satisfactory, so that we can make not the
requirement (5.7.3) but the requirement (5.7.4), which is equivalent to the inequalities

(5.7.5)

We see that the zeros of the density function $h = h(T_1, \ldots, T_s)$ play an extremely
important role here. To obtain a satisfactory description of $\varepsilon$-complete cotests of
a given level we need to impose rather stringent restrictions on the behavior of
the density $h$. However these restrictions are satisfied for an extremely large
number of cases that we encounter in statistics. In the class of problems that we
are considering, the function $h(T_1, \ldots, T_s)$ vanishes for $T_j < 0$, where $j$ is
one of the numbers $1, 2, \ldots, s_1$, so that we considered it in the region $\mathcal{T}$:
$T_1 > 0, \ldots, T_{s_1} > 0$; $-\infty < T_j < \infty$ (for $j = s_1 + 1, \ldots, s$). Let us impose on the
function $h$
Condition (N). The function $h(T_1, \ldots, T_s)$ does not vanish in the region $\mathcal{T}$.
In that region it has continuous first partial derivatives. Furthermore, for every
$\varepsilon > 0$ the inequality
$$\left|\frac{\partial \ln h}{\partial T_1}\right| + \cdots + \left|\frac{\partial \ln h}{\partial T_s}\right| \le \frac{A_1}{\varepsilon^{a}} + A_2, \tag{5.7.6}$$
where $A_1$, $A_2$, and $a$ ($< 1$) are positive constants, holds in the region $\mathcal{T}_\varepsilon$:
$T_1 \ge \varepsilon,\ T_2 \ge \varepsilon,\ \ldots,\ T_{s_1} \ge \varepsilon$.
This condition makes it possible to ensure that the inequalities (5.7.5) will
be satisfied.
Theorem 5.7.1. Suppose that condition (5.7.6) is satisfied. Then from a given
cotest $\psi$ of level $\alpha$ it is possible to construct an arbitrarily smooth cotest $\psi'_\delta$
for which the Bayes gain $W(\psi'_\delta \mid B)$ is arbitrarily close to the initial gain $W(\psi \mid B)$.
To prove this theorem let us choose a smoothing function in accordance with
formula (5.6.2). First we set
$$r = 2. \tag{5.7.7}$$
The parameter $\delta > 0$ will be determined later. By formula (5.7.2) we obtain, for a
given cotest $\psi$ of level $\alpha$,
$$\begin{aligned}
h^{-1}(D_\delta * \psi h) = {} & \int_0^{T_1} d\xi_1 \cdots \int_0^{T_{s_1}} d\xi_{s_1} \int_{-\infty}^{\infty} d\xi_{s_1+1} \cdots \int_{-\infty}^{\infty} d\xi_s \\
& \times d_{\delta r}(\xi_1) \cdots d_{\delta r}(\xi_s)\, \frac{h(T_1 - \xi_1, \ldots, T_s - \xi_s)}{h(T_1, \ldots, T_s)} \\
& \times \psi(T_1 - \xi_1, \ldots, T_s - \xi_s).
\end{aligned} \tag{5.7.8}$$
By virtue of the construction of the function $d_{\delta r}(x)$, the integrand in (5.7.8) vanishes
for $T_j \le \delta/4$ (where $j$ is any one of the numbers $1, 2, \ldots, s_1$) because
the inequalities $T_j \le \delta/4$ and $\xi_j \le T_j$ imply that $d_{\delta r}(\xi_j) = 0$ and the inequality
$\xi_j > T_j$ implies that $h = 0$. Thus the integral (5.7.8) is over the region $T_j > \delta/4$
($j = 1, 2, \ldots, s_1$).
By (5.7.6), in this case we have, in accordance with Lagrange's theorem,

(5.7.9)

where the notation $\widetilde{(\partial \ln h/\partial T_j)}$ means that the derivatives are evaluated at an
intermediate point $(T_1 - \theta\xi_1, \ldots, T_s - \theta\xi_s)$, where $|\theta| \le 1$. Furthermore, by
virtue of the definition of the function $d_{\delta r}(x)$ and equation (5.7.7), $|\xi_j| \ge \delta/4$.
Using (5.7.9) we get
$$|\ln h(T_1, \ldots, T_s) - \ln h(T_1 - \xi_1, \ldots, T_s - \xi_s)| \le \delta\left(\frac{4^{a} A_1}{\delta^{a}} + A_2\right) \le (4^{a} A_1 + A_2)\,\delta^{1-a}.$$
For sufficiently small $\delta$ we then see that for all points of the region of integration
$$\frac{h(T_1 - \xi_1, \ldots, T_s - \xi_s)}{h(T_1, \ldots, T_s)} = 1 + \zeta_\delta, \tag{5.7.10}$$
where $\zeta_\delta \to 0$ as $\delta \to 0$. Furthermore, for the cotest $\psi$ we have the usual inequalities:
$-\alpha \le \psi \le 1 - \alpha$. If we substitute (5.7.10) into (5.7.8) we obtain

(5.7.11)

where $\eta_\delta \to 0$ as $\delta \to 0$. If we set $\psi_\delta = h^{-1}(D_\delta * \psi h)$ and choose $\delta$ sufficiently
small, we obtain a cotest $\psi'_\delta$ of level $\alpha$ with the properties required by Theorem
5.7.1.
We note now that in many important cases, for example in problems of multivariate
analysis, inequality (5.7.6) holds for $a = 1$. A theorem of the type of 5.7.1
holds in this case too, although the proof of it is different. We shall examine
those cases in §8, where examples are given of testing of a linear hypothesis.
§8. FORMULATION OF THE FINAL RESULTS. EXAMPLES

We can now formulate the final results of this chapter.
Consider an exponential family defined by probability densities with respect
to Lebesgue measure:
$$p_\theta(T_1, \ldots, T_s) = C(\theta)\exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]\,h(T_1, \ldots, T_s). \tag{5.8.1}$$
We impose the following conditions on $h(T_1, \ldots, T_s)$.
I. There exists a number $s_1$ in the interval $[0, s]$ such that $h(T_1, \ldots, T_s) = 0$
if at least one of the variables $T_j$ for $j = 1, 2, \ldots, s_1$ is negative. This determines
the carrier $\mathcal{T}$ of the function $h(T_1, \ldots, T_s)$.
II. The function $h(T_1, \ldots, T_s)$ does not vanish at interior points of the region
$\mathcal{T}$ and it has continuous first derivatives at such points. Furthermore, in the
region $\mathcal{T}_\varepsilon \subset \mathcal{T}$ defined by the inequalities
$$T_1 \ge \varepsilon,\ \ldots,\ T_{s_1} \ge \varepsilon \tag{5.8.2}$$
for arbitrary $\varepsilon > 0$, we have
$$\left|\frac{\partial \ln h}{\partial T_1}\right| + \cdots + \left|\frac{\partial \ln h}{\partial T_s}\right| = O\!\left(\frac{1}{\varepsilon^{a}} + 1\right), \tag{5.8.3}$$
where $a < 1$ is a constant. In the case $a = 1$ a special examination is necessary
(see the examples given below).
III. The integral $\int_{\mathcal{T}} \cdots \int p_\theta(T_1, \ldots, T_s)\,dT_1 \cdots dT_s$ converges absolutely
for $\theta = (\theta_1, \ldots, \theta_s) \in P$, where $P = R_1 \times \cdots \times R_{s_1} \times S_{s_1+1} \times \cdots \times S_s$ is the
Cartesian product of the $s_1$ right half-planes $\operatorname{Re}\theta_j > 0$ and the $(s - s_1)$ strips
$0 < \operatorname{Re}\theta_j < A_j$.
The null hypothesis ($H_0$). For real points $\theta = (\theta_1, \ldots, \theta_s) \in P$, we have $r$
($< s$) relationships

(5.8.4)

Here the functions $\Pi_1, \ldots, \Pi_r$ must be real for real $\theta_1, \ldots, \theta_s$. When we
multiply by $1/[(\theta_1 + 1) \cdots (\theta_s + 1)]^N$, where $N$ is a suitable integer, the functions
$\Pi_1, \ldots, \Pi_r$ become holomorphic functions of $1/(\theta_1 + 1), \ldots, 1/(\theta_s + 1)$ on
$P$. (The point $\theta_j = \infty$ is included.) The null hypothesis consists in satisfaction
of conditions (5.8.4) on the compact set $\Omega$ of real points $(\theta_1, \ldots, \theta_s)$ defined by
the inequalities

($j = 1, 2, \ldots, s$). (5.8.5)

We denote the set of corresponding points by $\Omega_0$. Alternatives to $H_0$ are
$(\theta_1, \ldots, \theta_s) \in \Omega \setminus \Omega_0$. The alternatives are equipped with the Bayes probability
measure $B(\theta)$ defined on $\Omega \setminus \Omega_0$, which reduces them to a simple hypothesis.
For a test $\Phi$ that is similar with respect to $H_0$ we introduce the Bayes gain
$W(\Phi \mid B) = \int_{\Omega \setminus \Omega_0} E(\Phi \mid \theta)\,dB(\theta)$. An analogous gain is introduced for the cotest.
Conditions on the relationships (complexification of $H_0$):
(Y I) Equations (5.8.4) considered in a complex region $P$ generate in that region
an analytic set of points $V_{\Pi_1, \ldots, \Pi_r}$ that can be decomposed into a finite
number of components (see Chapter I, §2) connected "along the strips"
$V^q_{\Pi_1, \ldots, \Pi_r}$, each of which is of complex dimension $s - r$ and contains a connected
set $R^q_{\Pi_1, \ldots, \Pi_r}$ of real points that is contained in $\Omega_0$ and that has real dimension
$s - r$.
(Y II) In the region $P$ the relationships (5.8.4) admit choice of variables
$\theta_{\alpha_1}, \ldots, \theta_{\alpha_r}$ such that the determinant $|\partial \Pi_i/\partial \theta_{\alpha_j}|$, $i, j = 1, \ldots, r$, has no zeros
inside $P$.
Condition (Y II) is somewhat restrictive. It follows from what was said in §3
of Chapter I that in the case $r = 1$ (hypothesis $H_0$ with one relationship) this
condition can be replaced with the weaker condition (Y' II):
$$\operatorname{rank}\left\|\frac{\partial \Pi_i}{\partial \theta_j}\right\| = r \ \text{ inside } \Omega \qquad (i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, s).$$
It turns out that condition (Y II) can be replaced with the condition (Y' II) even
when $r > 1$. However the proof of this is extremely complicated and we shall not
stop to go through it.
Under the conditions indicated, it is possible to give a description of an
$\varepsilon$-complete family of tests $\Phi$ that are similar with respect to the hypothesis $H_0$.
Theorem 5.8.1. For given $\varepsilon > 0$ and a given similar test $\Phi$ of level $\alpha$ there
exists a cotest $\psi_\varepsilon = \psi_\varepsilon(T_1, \ldots, T_s)$ of level $\alpha$ such that $|W(\psi_\varepsilon \mid B) - W(\psi \mid B)| \le \varepsilon$,
where $\psi = \Phi - \alpha$. Here $\psi_\varepsilon(T_1, \ldots, T_s)$ has the prescribed number of partial derivatives
and can be represented in the form
$$\psi_\varepsilon(T_1, \ldots, T_s) = \frac{1}{h}\,(A_1 * H_1 + \cdots + A_r * H_r). \tag{5.8.6}$$
In this equation, the $A_j$ are the preimages of the functions $\Pi_j/[(\theta_1 + 1) \cdots (\theta_s + 1)]^{N+1}$
under the unilateral Laplace transformation
$$L(A_j \mid \theta) = \frac{\Pi_j}{[(\theta_1 + 1) \cdots (\theta_s + 1)]^{N+1}},$$
and $H_1, \ldots, H_r$ are functions possessing the prescribed number of partial derivatives
and satisfying the relation
$$H_j(T_1, \ldots, T_s) = O\bigl(\exp \zeta(|T_1| + \cdots + |T_s|)\bigr), \qquad j = 1, 2, \ldots, r, \tag{5.8.7}$$
where $\zeta$ is a sufficiently small positive number. For $T_j < 0$, where $j$ is one of
the numbers $1, 2, \ldots, s_1$, the function $H_j(T_1, \ldots, T_s)$ vanishes.
Conversely, every expression of the form $\xi = (1/h)(A_1 * H_1 + \cdots + A_r * H_r)$,
where the $H_j$ are continuous, vanish for $T_j < 0$ ($j = 1, 2, \ldots, s_1$) and
satisfy condition (5.8.7), is a precotest. If $H_1, H_2, \ldots, H_r$ are chosen in such
a way that
$$-\alpha \le \xi \le 1 - \alpha,$$
then $\xi$ is a cotest of level $\alpha$.
In view of this the search for $\varepsilon$-optimal similar tests in the problem posed
leads to a variational problem with constraints: Find continuous functions
$H_1, \ldots, H_r$ satisfying conditions (5.8.7) such that setting
$$\xi = \frac{1}{h}\,(A_1 * H_1 + \cdots + A_r * H_r) \tag{5.8.8}$$
maximizes the Bayes gain $W(\xi \mid B)$, where $\xi$ satisfies the inequalities
$$-\alpha \le \xi \le 1 - \alpha. \tag{5.8.9}$$
In this form the question can be regarded as a problem of linear programming.
There is a definite computational algorithm regarding this, but we will not stop
for it here.
If $r = 1$ (the case of the single relationship $\Pi_1 = 0$) then (5.8.8) has only the
single term

$$g = \frac{1}{h}(A_1 * H_1). \tag{5.8.10}$$

This corresponds to the principal ideal $I_{\Omega_0} = \{\Pi_1 G_1\}$ (cf. § 3, Chapter I). Of all
the conditions listed above the most restrictive is condition II that the function
$h(T_1, \dots, T_s)$ not vanish in the region $\mathcal{T}$. If this condition is not satisfied, all
that follows from the above considerations is that every sufficiently smooth
cotest $\psi$ is of the form (5.8.6):

$$\psi = \frac{1}{h}(A_1 * H_1 + \dots + A_r * H_r),$$

where $H_1, \dots, H_r$ are such that the expression in the parentheses vanishes
wherever $h$ vanishes. However we still do not know whether such a family of
smooth cotests is $\varepsilon$-complete in the sense of the definition given above.
In the study of the local properties of similar tests we can pose the problems
considered at the end of § 1. In particular we can study tests that are unbiased
with respect to the null hypothesis $H_0$. Here we treat as alternatives for $H_0$:
$\Pi_1 = 0, \dots, \Pi_r = 0$ the hypotheses $H_1$: $\Pi_1 = \delta_1, \dots, \Pi_r = \delta_r$, where not all the
$\delta_i$ vanish. It should be borne in mind that the relationships $\Pi_1 = 0, \dots, \Pi_r = 0$
have many equivalents and that, by shifting from a basis for the ideal $I_{\Omega_0}$ of
given $\Pi_1, \dots, \Pi_r$ to another basis, we can express $H_0$ in terms of the new basis;
when we do this the alternatives $H_1$ obviously change.
Let us suppose that the alternatives are defined by means of the relationships
(5.8.4) and that the conditions listed at the beginning of this section are
satisfied. Then for sufficiently smooth cotests $\psi$ we have, in accordance with
what was shown above, the representation

(5.8.11)
The assumption that the cotest is unbiased with respect to $H_0$ leads to the following
equation on $\Omega_0$:

$$\frac{\partial}{\partial \Pi_j} E(\psi \mid \theta) = 0 \quad (j = 1, 2, \dots, r).$$

From (5.8.11) we see that the functions $G_1, \dots, G_r$ must vanish for $\Pi_1 = \dots = \Pi_r = 0$;
in other words they must belong to the ideal $I_{\Omega_0}$. The functions $G_j$
are holomorphic in $P$. By what was shown above,

$$G_j = \sum_{l=1}^{r} G_{jl}\Pi_l \quad (j = 1, 2, \dots, r),$$

where the $G_{jl}$ ($l = 1, 2, \dots, r$) are functions of the same type as the $G_j$. Thus
we obtain

$$E(\psi \mid \theta) = \sum_{i,j=1}^{r} G_{ij}\Pi_i\Pi_j. \tag{5.8.12}$$
From this we get
Theorem 5.8.2. Under the conditions listed at the beginning of this section,
every sufficiently smooth unbiased cotest $\psi$ can be represented in the form

$$\psi = \frac{1}{h}\sum_{i,j=1}^{r} H_{ij} * A_i * A_j, \tag{5.8.13}$$

where the $A_i$ are the same functions as in the hypotheses of Theorem 5.8.1 and
the $H_{ij}$ behave like the functions $H_j$, $j = 1, 2, \dots, r$, defined in that theorem.
Here smoothness of a cotest is to be understood in the sense of the definition
given in §5, specifically in the sense of satisfaction of condition (5.5.1).
Let us turn to individual examples of the application of the results that we
have obtained.
Let us first consider some examples in which the family of sufficient statistics
is a unilateral exponential family. Such a situation is encountered in the
problem of testing a linear hypothesis (see Example 1, § 1, Chapter II). Let us
consider the following special case, which is a generalization of the Behrens-Fisher
problem. Suppose that we have a set of $k$ normal samples $x_{j1}, x_{j2}, \dots, x_{jn_j} \in N(a_j, \sigma_j^2)$,
of sizes $n_1, n_2, \dots, n_k$. Here sufficient statistics are

$$\bar{x}_j = \frac{1}{n_j}\sum_{l=1}^{n_j} x_{jl} \quad \text{and} \quad s_j^2 = \frac{1}{n_j}\sum_{l=1}^{n_j}(x_{jl} - \bar{x}_j)^2.$$
By making translations if necessary, we may assume that the parameters vary on
finite intervals:

$$\alpha_{1j} < a_j < \alpha_{2j}; \quad \beta_{1j} < \sigma_j^2 < \beta_{2j}; \quad \alpha_{1j} > 0, \ \beta_{1j} > 0.$$

The hypothesis $H_0$ that

$$\lambda_1 a_1 + \dots + \lambda_k a_k = 0,$$

where the $\lambda_j$ are known quantities not all equal to 0, is being verified. To verify
$H_0$, we use the statistic $X = \lambda_1\bar{x}_1 + \dots + \lambda_k\bar{x}_k$ and the statistics $V_1 = s_1^2, \dots, V_k = s_k^2$,
so that we use only the subalgebra of the $\sigma$-algebra of sufficient statistics.
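The statistics just introduced can be computed directly from the raw samples. The following is only a minimal sketch: the function name `sufficient_statistics` and the sample values are illustrative assumptions, not part of the text. Note that $s_j^2$ is taken with the divisor $n_j$, as in the formula above.

```python
def sufficient_statistics(samples, lam):
    """Return (X, [V_1, ..., V_k]) for k samples and known weights lam.

    X   = lam_1 * mean_1 + ... + lam_k * mean_k
    V_j = s_j^2 = (1 / n_j) * sum_l (x_jl - mean_j)^2   (divisor n_j, not n_j - 1)
    """
    means = [sum(s) / len(s) for s in samples]
    variances = [
        sum((x - m) ** 2 for x in s) / len(s)
        for s, m in zip(samples, means)
    ]
    X = sum(l * m for l, m in zip(lam, means))
    return X, variances


# Hypothetical data: two samples and the weights (1, -1),
# which is the Behrens-Fisher case a_1 - a_2 = 0.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0]]
lam = [1.0, -1.0]
X, V = sufficient_statistics(samples, lam)
```

With the weights $(1, -1)$ the statistic $X$ is simply the difference of the two sample means.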
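The statistics $V_1, V_2, \dots, V_k$ are independent of $X$. Here

$$X \in N(0, \sigma^2), \quad \text{where} \quad \sigma^2 = \frac{\lambda_1^2\sigma_1^2}{n_1} + \dots + \frac{\lambda_k^2\sigma_k^2}{n_k}.$$

Furthermore, the statistics $V_1, \dots, V_k$ have the distribution (see formula (3.3.4))

$$p_k(v_k) = \left(\frac{n_k}{2}\right)^{\frac{n_k - 1}{2}} \frac{1}{\Gamma\left(\frac{n_k - 1}{2}\right)}\, \sigma_k^{-n_k + 1}\, v_k^{\frac{n_k - 3}{2}} \exp\left(-\frac{n_k v_k}{2\sigma_k^2}\right).$$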
From this we see that the family of distributions defined by the random vector
$(X, V_1, \dots, V_k)$ has sufficient statistics $(X^2, V_1, \dots, V_k)$ and that it is a
unilateral exponential family. Let us set

$$\theta_1 = \frac{1}{2\sigma^2} = \frac{1}{2\left(\dfrac{\lambda_1^2\sigma_1^2}{n_1} + \dots + \dfrac{\lambda_k^2\sigma_k^2}{n_k}\right)}, \quad \theta_2 = \frac{1}{2\sigma_1^2}, \ \dots, \ \theta_{k+1} = \frac{1}{2\sigma_k^2};$$

$$\rho_j = \frac{1}{\theta_j} \quad (j = 1, 2, \dots, k + 1).$$

If this family is defined in terms of the natural parameters, the null hypothesis H0
consists in

Here the conditions listed above are easily verified.



On the basis of what was said above, given a Bayes distribution $B(\theta)$
on the alternative hypothesis $\Omega \setminus \Omega_0$, we can construct an $\varepsilon$-optimal sufficiently
smooth cotest $\psi_\varepsilon(x, v_1, \dots, v_k)$, where $x = X^2$ and $v_1, \dots, v_k$ are the values of
the statistics $V_1, \dots, V_k$, by using the expression

$$\psi_\varepsilon(x, v_1, \dots, v_k) = [F_0(x, v_1, \dots, v_k) * H(x, v_1, \dots, v_k)]\, x^{1/2} v_1^{-m_1} \cdots v_k^{-m_k}.$$

Here $F_0 = x - \lambda_1^2 v_1/n_1 - \dots - \lambda_k^2 v_k/n_k$ and $H(x, v_1, \dots, v_k)$ is a smooth function
such that

$$H(x, v_1, \dots, v_k) = O(\exp \varepsilon(x + v_1 + \dots + v_k))$$

for arbitrary $\varepsilon > 0$.
The cotest $\psi_\varepsilon$ must satisfy the conditions

$$-\alpha \le x^{1/2} v_1^{-m_1} \cdots v_k^{-m_k}\, (F_0 * H) \le 1 - \alpha,$$

where $\alpha$ is a level. Under such restrictions, we pose for $H$ the variational problem
of maximizing
(5.8.14)
In the class of smooth functions $H$ this problem can be solved "with accuracy up
to $\varepsilon$"; i.e., for arbitrary $\varepsilon$ one can find $H = H_\varepsilon$ for which the cotest $\psi_\varepsilon$ is $\varepsilon$-optimal
in the sense indicated by inequality (5.1.6).
It is natural to ask whether it is possible to construct from a given cotest $\psi$
of level $\alpha$ a sufficiently smooth cotest $\psi_\delta$ for which the Bayes gain $W(\psi_\delta \mid B)$ is
arbitrarily close to $W(\psi \mid B)$, so that to find an $\varepsilon$-optimal cotest we need only
consider smooth cotests. The answer to this question is affirmative. To show that
this is the case we need to modify somewhat the proof of Theorem 5.7.1.
We set

$$T_1 = V_1, \ T_2 = V_2, \ \dots, \ T_k = V_k, \ T_{k+1} = x, \quad s = k + 1;$$

$$h(T_1, \dots, T_s) = T_1^{m_1} \cdots T_k^{m_k}\, T_{k+1}^{-1/2}.$$
Let us set up a formula of the type (5.7.8) taking, for small given $\delta$, the functions
$d_\delta^{(j)}(\xi_j)$, $j = 1, 2, \dots, k$, in the form of "smooth peaks" of area 1 with
carriers $[\delta/2 - \delta^2, \delta/2 + \delta^2]$. For $j = k + 1$, we put the function $d_\delta^{(k+1)}(\xi_{k+1})$ in the
form of a "smooth peak" of area 1 with carrier $[-\delta/2 - \delta^2, -\delta/2 + \delta^2]$. Define
$D_\delta = d_\delta^{(1)} \cdots d_\delta^{(k+1)}$. A formula of the form (5.7.8) is written in the following form:

$$h^{-1}(D_\delta * \psi h) = \int_0^{T_1} d\xi_1 \cdots \int_0^{T_k} d\xi_k \int_{-\infty}^{+\infty} d\xi_{k+1}\; d_\delta^{(1)}(\xi_1) \cdots d_\delta^{(k)}(\xi_k)\, d_\delta^{(k+1)}(\xi_{k+1})$$
$$\times \left(\frac{T_1 - \xi_1}{T_1}\right)^{m_1} \cdots \left(\frac{T_k - \xi_k}{T_k}\right)^{m_k} \left(\frac{T_{k+1} - \xi_{k+1}}{T_{k+1}}\right)^{-1/2} \psi(T_1 - \xi_1, \dots, T_s - \xi_s).$$

By the construction of the functions $d_\delta^{(1)}(\xi_1), \dots, d_\delta^{(k+1)}(\xi_{k+1})$ the product

$$P = \left(\frac{T_1 - \xi_1}{T_1}\right)^{m_1} \cdots \left(\frac{T_k - \xi_k}{T_k}\right)^{m_k} \left(\frac{T_{k+1} - \xi_{k+1}}{T_{k+1}}\right)^{-1/2}$$

satisfies the inequality $0 \le P \le 1$ in our region of integration, so that the expression
$\psi_\delta = h^{-1}(D_\delta * \psi h)$ provides a cotest if $\psi$ is a cotest. The level of the cotest
$\psi_\delta$ is equal to the level of the cotest $\psi$. Their powers differ by an arbitrarily
small amount for sufficiently small $\delta$. This answers our question.
If we set $k = 2$ in the above statement of the problem of testing a linear
hypothesis, then without loss of generality we can take the constraint
in the form $a_1 - a_2 = 0$. Thus we obtain the Behrens-Fisher problem. Here
$F_0 = x - v_1/n_1 - v_2/n_2$, so that the formulas have a simple form. (In [56] the
letters $x$, $u$, and $v$ denote variables proportional to the quantities for which we
have been using them, so that the linear form $F_0$ has a slightly different appearance.)
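Let us calculate the Bayes gain (5.8.14). We set $x = X^2$, $\psi_\varepsilon(x, v_1, v_2) = \psi_\varepsilon(X^2, v_1, v_2)$;
$\Phi_\varepsilon = \psi_\varepsilon + \alpha$. The joint distribution of $X$, $v_1$, and $v_2$ has probability
density

$$p(X, v_1, v_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \prod_{i=1}^{2}\left[\left(\frac{n_i}{2}\right)^{\frac{n_i - 1}{2}} \frac{1}{\Gamma\left(\frac{n_i - 1}{2}\right)}\, \sigma_i^{-n_i + 1}\, v_i^{\frac{n_i - 3}{2}}\right]$$
$$\times \exp\left[-\frac{1}{2}\left(\frac{(X - \delta)^2}{\sigma^2} + \frac{n_1 v_1}{\sigma_1^2} + \frac{n_2 v_2}{\sigma_2^2}\right)\right]; \quad \sigma^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

For given $\sigma_1$, $\sigma_2$, $a_1 - a_2 = \delta$, the power function is

$$E(\Phi_\varepsilon(X^2, v_1, v_2) \mid \sigma_1, \sigma_2, \delta) = \int_{-\infty}^{\infty} dX \int_0^{\infty} dv_1 \int_0^{\infty} dv_2\; \Phi_\varepsilon(X^2, v_1, v_2)\, p(X, v_1, v_2). \tag{5.8.15}$$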

By virtue of the evenness of our similar test with respect to $X$, we see that
the test is unbiased: for arbitrary $\sigma_1$ and $\sigma_2$ we have

$$\frac{\partial}{\partial\delta} E(\Phi_\varepsilon(X^2, v_1, v_2) \mid \sigma_1, \sigma_2, \delta) = 0 \quad \text{for } \delta = 0,$$
$$\dots \quad \text{for } \delta = 0. \tag{5.8.16}$$

These relationships mean that the similar test $\Phi_\varepsilon$ is unbiased. If the Bayes
distribution $B(\theta)$ on the alternative $\Omega \setminus \Omega_0$ is concentrated on intervals of a straight
line $\sigma_1 = \sigma_1^{(0)}$, $\sigma_2 = \sigma_2^{(0)}$, $-\infty < \delta < \infty$, $\delta \ne 0$, then we obtain the locally most
powerful similar test if we maximize the functional (5.8.16) (see § 1). Thus for
the similar test $\Phi_\varepsilon$ we have a variational problem, that of maximizing the left-hand
member of (5.8.16) with the constraints
(5.8.17)
This problem is amenable to numerical solution by the methods of linear program-
ming.
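The role of evenness in $X$ can be illustrated on a one-dimensional toy case. The sketch below is an assumption-laden simplification: it drops the nuisance statistics $v_1, v_2$ altogether and takes the hypothetical even test $\Phi(X^2) = 1$ for $X^2 > c$ with $X \in N(\delta, \sigma^2)$. It checks numerically that the power is an even function of $\delta$, so that its derivative at $\delta = 0$ vanishes.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def power(delta, sigma=1.0, c=1.5):
    """P(X^2 > c) for X ~ N(delta, sigma^2): power of the even test 1{X^2 > c}."""
    root = math.sqrt(c)
    accept = normal_cdf((root - delta) / sigma) - normal_cdf((-root - delta) / sigma)
    return 1.0 - accept


# Central difference for the derivative of the power at delta = 0.
step = 1e-4
slope_at_zero = (power(step) - power(-step)) / (2 * step)
```

Because `power(delta)` equals `power(-delta)`, the central difference is zero up to rounding, which is the one-dimensional analogue of the first relation in (5.8.16).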
Let us turn to the general problem of testing a linear hypothesis with unknown
weights.
Let $X_1, \dots, X_n$ denote independently distributed normal variables such that
$X_i \in N(a_i, \gamma_i^2)$. Here, the $a_i$ are a priori subject to $n - s$ "initial" linear
relationships. Specifically, if

then there exists a constant matrix $C = C_{n-s,n}$ such that $Ca = 0$ and $\operatorname{rank} C = n - s$.
Thus $a_1, \dots, a_n$ can be expressed as linear combinations of the parameters
$b_1, \dots, b_s$.
Suppose we are verifying the hypothesis $H_0$ that, in addition to these relationships,
there are still $r$ ($< s$) relationships (which may be thought of as imposed
on $b_1, \dots, b_s$). Specifically, we introduce a matrix $F = F_{rs}$ of rank $r < s$
such that

$$Fb = 0.$$

One can show that the theory expounded earlier, except possibly for the conditions
on $h(T_1, \dots, T_s)$, is applicable here.
Let us look at an individual case of a linear hypothesis, the general Behrens-Fisher
problem. Here $n = n_1 + n_2$ is partitioned into two samples $x_1, \dots, x_{n_1}$
and $y_1, \dots, y_{n_2}$ of sizes $n_1$ and $n_2$. The initial relationships are of the forms
$a_1 = a_2 = \dots = a_{n_1} = a_1$ and $a_{n_1+1} = \dots = a_n = a_2$. We also introduce the
condition $\gamma_1 = \gamma_2 = \dots = \gamma_{n_1} = \sigma_1$; $\gamma_{n_1+1} = \dots = \gamma_n = \sigma_2$. Sufficient statistics
are $X_1 = \bar{x}$, $X_2 = \bar{y}$, $V_1 = s_1^2$, $V_2 = s_2^2$. By formula (3.3.14) the probability density
is

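Just as in § 3 of Chapter III, we introduce the natural parameters

$$\frac{n_1}{2\sigma_1^2} = \lambda_1, \quad \frac{n_2}{2\sigma_2^2} = \lambda_2, \quad \frac{n_1 a_1}{\sigma_1^2} = \mu_1, \quad \frac{n_2 a_2}{\sigma_2^2} = \mu_2.$$

The hypothesis $H_0$ imposes one relationship (see Chapter III):

$$\frac{\mu_1}{\lambda_1} - \frac{\mu_2}{\lambda_2} = 0. \tag{5.8.18}$$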

Let us consider $H_0$ in the region $\Omega_0$: $\varepsilon_j \le \theta_j \le E_j$, $j = 1, 2, 3, 4$, where $\theta_j$ is
one of the numbers $\lambda_1, \lambda_2, \mu_1, \mu_2$ (if necessary the parameters can be displaced).
In the present situation all the conditions of the theory expounded above are
satisfied except for the conditions on the density $h$.
If we divide equation (5.8.18) by $\lambda_1\lambda_2\mu_1^2\mu_2^2$ we obtain the equivalent
relationship in $\Omega_0$:
(5.8.19)

If we now set $X_1 = T_1$, $X_2 = T_2$, $V_1 + X_1^2 = T_3$, and $V_2 + X_2^2 = T_4$, we obtain,
corresponding to the left-hand member of (5.8.19), its preimage under the unilateral
Laplace transformation (see § 1 of Chapter I):

$$A(T) = \begin{cases} T_1 T_4 - T_2 T_3 & \text{if } T_i > 0 \text{ for } i = 1, 2, 3, 4, \\ 0 & \text{otherwise.} \end{cases}$$
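As a sanity check on the preimage, the unilateral Laplace transform of $A(T)$ can be computed numerically and compared with the closed form obtained from $\int_0^\infty e^{-\theta t}\,dt = 1/\theta$ and $\int_0^\infty t e^{-\theta t}\,dt = 1/\theta^2$. The sketch below assumes generic positive arguments $\theta_1, \dots, \theta_4$ paired with $T_1, \dots, T_4$; how these are identified with $\lambda_1, \lambda_2, \mu_1, \mu_2$ (including signs) is deliberately left aside, and the helper names are hypothetical.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def moment(order, theta):
    """Numerical integral of t**order * exp(-theta * t) over t > 0."""
    upper = 60.0 / theta  # the tail beyond this point is negligible
    return simpson(lambda t: t ** order * math.exp(-theta * t), 0.0, upper)


def laplace_of_A(theta):
    """Transform of T1*T4 - T2*T3 over the positive orthant (product structure)."""
    t1, t2, t3, t4 = theta
    first = moment(1, t1) * moment(0, t2) * moment(0, t3) * moment(1, t4)
    second = moment(0, t1) * moment(1, t2) * moment(1, t3) * moment(0, t4)
    return first - second


theta = (1.0, 2.0, 0.5, 1.5)  # arbitrary positive test point
numeric = laplace_of_A(theta)
t1, t2, t3, t4 = theta
closed_form = 1.0 / (t1**2 * t2 * t3 * t4**2) - 1.0 / (t1 * t2**2 * t3**2 * t4)
```

The closed form is a difference of two reciprocal monomials, which is the shape one expects after dividing a bilinear relation between the parameters by a suitable monomial.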

The general form of sufficiently smooth cotests $\psi(T)$ is 1)

$$\psi(T) = \frac{1}{h}(A * H), \tag{5.8.20}$$

where $H$ is a function of the type described in the preceding section, and

$$h(T) = V_1^{\frac{n_1 - 3}{2}}\, V_2^{\frac{n_2 - 3}{2}} = (T_3 - T_1^2)^{\frac{n_1 - 3}{2}} (T_4 - T_2^2)^{\frac{n_2 - 3}{2}}$$

for $T_1 \le \sqrt{T_3}$ and $T_2 \le \sqrt{T_4}$; $h(T) = 0$ otherwise. For the expression (5.8.20)
to be a cotest it is necessary that $H$ be chosen in such a way that $A * H$ vanishes
for $T_1 > \sqrt{T_3}$ or $T_2 > \sqrt{T_4}$. The fact that such a choice of the functions $H$ is
possible will be shown in § 3 of Chapter IX. In view of the complexity of the question
of constructing a sufficiently broad family of such functions, we shall not
stop to do this here.

1) Added in proof. V. P. Palamodov has shown that all cotests have an analogous
form: Testing of a multidimensional polynomial hypothesis, Dokl. Akad. Nauk SSSR 172
(1967), 291-293 = Soviet Math. Dokl. 8 (1967), 95-97. Editor's note. See also the Supplement
to the present book, by Palamodov and Kagan.
CHAPTER VI

WIJSMAN'S D-METHOD

§ 1. THE D-METHOD AND THE CONDITIONS UNDER
WHICH IT CAN BE APPLIED

Let us again look at exponential families, where the probability density with
respect to Lebesgue measure can be expressed by formula (5.2.3) without the
corresponding conditions on the $T_j$ ($j = 1, \dots, s$). As was stated in §§2 and 8 of
Chapter V, we can under rather general conditions construct a precotest ideal and
similar tests if we can exhibit a cone in the space $E_s$ of sufficient statistics
$T_1, \dots, T_s$ inside which $h(T_1, \dots, T_s)$ does not vanish and, after a suitable
linear transformation, satisfies certain conditions. The relationships between the
parameters $\theta_1, \dots, \theta_s$ defining the null hypothesis $H_0$ are assumed to be holomorphic
of the type (5.2.4). Of course this does not give a sufficiently complete
description of the similar tests.
If we cannot exhibit a cone of the type indicated we cannot construct similar
tests by this method.
In 1958, in an interesting article [11], Wijsman presented a method enabling us to construct
similar tests under weaker assumptions regarding $h(T_1, \dots, T_s)$ even when the null
hypothesis $H_0$ reduces to a single polynomial relationship among $\theta_1, \dots, \theta_s$.
However only a part of the entire family of similar tests can be obtained in this
way, and it is only in rare cases that we can determine their optimality or $\varepsilon$-optimality.
Nonetheless, Wijsman's method is extremely elegant and it discloses
interesting relationships between statistics and the theory of partial differential
equations. Therefore we shall stop to examine this method in the present chapter.
As stated above, suppose that an exponential family is defined by the probability
density with respect to Lebesgue measure:

and suppose that $h(T_1, \dots, T_s)$ satisfies the following condition.

Condition (W). There exists a closed $s$-dimensional cube $C$ on which

$$\inf_{T \in C} h(T) > 0. \tag{6.1.2}$$
The null hypothesis $H_0$ is assumed to be defined by a single polynomial
relationship

$$H_0: P(\theta_1, \dots, \theta_s) = 0, \tag{6.1.3}$$

where $P$ is a polynomial of degree $d \ge 1$.
To construct a similar test $\Phi = \Phi(T_1, \dots, T_s)$ of level $\alpha$, let us construct
the corresponding cotest $\psi = \psi(T_1, \dots, T_s)$. To do this we construct a bounded
function $F(T_1, \dots, T_s) = F(T)$ which vanishes outside the cube $C$ and for which
the Laplace transform $L(F \mid \theta)$ vanishes when condition (6.1.3) is satisfied. By
virtue of condition (6.1.2) and the boundedness of $F$, the function $\psi(T)$ defined
by

$$\psi(T) = \varepsilon\,\frac{F(T)}{h(T)} \ \text{ for } T \in C, \qquad \psi(T) = 0 \ \text{ for } T \notin C, \tag{6.1.4}$$

is a cotest for sufficiently small $\varepsilon$. Thus the problem reduces to construction of
this function $F(T)$. To construct this function we take an entire function $G(T) = G(T_1, \dots, T_s)$
which possesses all partial derivatives of the first $d$ orders inside
$C$, which vanishes outside $C$, and for which all the derivatives of the first $(d - 1)$
orders are continuous on the boundary of $C$. An example of such a function is the
one defined as follows. Suppose that the cube $C$ is defined by the inequalities

$$a_j \le T_j \le a_j + 1, \quad j = 1, 2, \dots, s.$$

Then define

$$G(T) = \prod_{j=1}^{s} (T_j - a_j)^d (a_j + 1 - T_j)^d$$

for $T \in C$ and $G(T) = 0$ for $T \notin C$. The function $G(T)$ is the desired one.
Consider the differential operator

$$D = P\left(\frac{\partial}{\partial T_1}, \dots, \frac{\partial}{\partial T_s}\right). \tag{6.1.5}$$

By virtue of the properties of $G(T)$ (see § 1 of Chapter I) we have

$$L(DG \mid \theta) = P(\theta)\, L(G \mid \theta). \tag{6.1.6}$$

From this it is obvious that the function $g(T) = DG(T)/h(T)$ is a precotest.
Since $g(T)$ is bounded, $\varepsilon g(T)$ is a cotest for sufficiently small $\varepsilon$. If the space of
sufficient statistics $T_1, \dots, T_s$ also contains several cubes or parallelepipeds
in which $\inf h(T)$ is positive, we can construct cotests for them by the method
described and set up linear combinations of them from which we can also obtain
cotests. Following Wijsman we shall call this method of constructing similar
tests the "D-method".
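The identity (6.1.6) and the scaling step can be checked numerically in the simplest possible setting. The sketch below is a one-variable toy under stated assumptions: $s = 1$, $P(\theta) = \theta^2$ (so $d = 2$ and $D = d^2/dT^2$), the cube is $[a, a+1]$, and $h$ is taken to be identically 1 on the cube. None of these choices come from the text; they only illustrate why $G$ must vanish on the boundary together with its derivatives of the first $d - 1$ orders.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


a, theta = 0.3, 1.7  # hypothetical cube corner and Laplace argument


def G(u):
    """G on the cube, written in the shifted variable u = T - a, 0 <= u <= 1."""
    return u**2 * (1.0 - u) ** 2


def DG(u):
    """Second derivative of G: the operator D for P(theta) = theta^2."""
    return 2.0 - 12.0 * u + 12.0 * u**2


# Both sides of L(DG | theta) = P(theta) * L(G | theta), integrated over the cube.
lhs = simpson(lambda u: DG(u) * math.exp(-theta * (a + u)), 0.0, 1.0)
rhs = theta**2 * simpson(lambda u: G(u) * math.exp(-theta * (a + u)), 0.0, 1.0)

# Largest eps with -alpha <= eps * DG / h <= 1 - alpha on the cube (h = 1 assumed).
alpha = 0.05
values = [DG(i / 1000.0) for i in range(1001)]
eps = min((1.0 - alpha) / max(values), alpha / -min(values))
```

The two integrals agree because $G$ and $G'$ vanish at both ends of the cube, so the boundary terms in the two integrations by parts drop out; $DG$ itself does not vanish there, which is harmless since only boundedness of $DG/h$ is needed.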

§2. EXAMPLES OF APPLICATION OF THE D-METHOD

Example 1. Construction of certain similar tests for the Behrens-Fisher
problem.
Here, in the notation of Chapter III, §3, we obtain the equation

for defining the null hypothesis $H_0$, so that here we need to set

$$D = \frac{\partial}{\partial T_2}\frac{\partial}{\partial T_3} - \frac{\partial}{\partial T_1}\frac{\partial}{\partial T_4}.$$
For the cube C we may take any cube in which the function h ( T) defined by
formula (3.3.14) is bounded below by a positive number.
Of course the cotests constructed in this way constitute only a part of the
entire family of cotests.
Example 2. Verification of a hypothesis regarding a standardized mean.
Here we succeed, to a sufficient degree, in getting a complete description of
the similar tests. Suppose that we make a repeated sample from a normal set
$N(a, \sigma^2)$. Let us verify the hypothesis $H_0$: $a/\sigma = \gamma_0$ that the standardized mean
has a given value $\gamma_0$. We wish to get as complete as possible a description of
the similar tests for $H_0$. Let us make an orthogonal transformation of the chosen
variables, taking the new variables $X_0, X_1, \dots, X_n$ in accordance with the
formulas (see Chapter III, §3)

$$X_0 = \bar{x}\sqrt{n + 1}; \quad X_1 = \left(1 - \frac{1}{n+1}\right)^{-1/2}(x_1 - \bar{x}), \ \dots, \ X_n = \left(1 - \frac{1}{n+1}\right)^{-1/2}(x_n - \bar{x}).$$

These variables are normal and independent, and

$$X_j \in N(0, \sigma^2), \quad j = 1, 2, \dots, n.$$

If we set $a\sqrt{n+1} = \mu$ and $r_0 = \gamma_0\sqrt{n + 1}$, we reduce the problem to verification
of the hypothesis $H_0$: $\mu/\sigma = r_0$ for the sample $(X_0, X_1, \dots, X_n)$. For this sample
the probability density has the form
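
$$\frac{1}{(2\pi)^{\frac{n+1}{2}}\sigma^{n+1}} \exp\left\{-\frac{1}{2\sigma^2}\left[(X_0 - \mu)^2 + \sum_{j=1}^{n} X_j^2\right]\right\} = C(\mu, \sigma)\exp\left[-\frac{1}{2\sigma^2}\sum_{j=0}^{n} X_j^2 + \frac{r}{\sigma}X_0\right], \tag{6.2.1}$$

where $r = \mu/\sigma$. Sufficient statistics are $T_1(x) = \sum_{j=0}^{n} X_j^2$ and $T_2(x) = X_0$. Furthermore, we set

$$\theta_1 = \frac{1}{2\sigma^2}, \quad \theta_2 = -\frac{r}{\sigma}.$$

Under the hypothesis $H_0$ we have $\theta_2^2 - 2r_0^2\theta_1 = 0$, so that

$$P(\theta_1, \theta_2) = \theta_2^2 - 2r_0^2\theta_1 \tag{6.2.2}$$

and

$$D = \frac{\partial^2}{\partial T_2^2} - 2r_0^2\frac{\partial}{\partial T_1}. \tag{6.2.3}$$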
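The relation between the natural parameters and the polynomial (6.2.2) is easy to check numerically. The following sketch assumes $\theta_1 = 1/(2\sigma^2)$ and $\theta_2 = -r/\sigma$ with $r = \mu/\sigma$; the sign of $\theta_2$ is immaterial here because it enters (6.2.2) only through its square. The function names are illustrative.

```python
def natural_parameters(mu, sigma):
    """Return (theta_1, theta_2) for the sample (X_0, ..., X_n); r = mu / sigma."""
    r = mu / sigma
    return 1.0 / (2.0 * sigma**2), -r / sigma


def P(theta1, theta2, r0):
    """The polynomial (6.2.2) defining the null hypothesis mu / sigma = r0."""
    return theta2**2 - 2.0 * r0**2 * theta1


r0 = 0.8
# On the null hypothesis mu = r0 * sigma, P vanishes whatever the nuisance parameter sigma.
on_null = [P(*natural_parameters(r0 * s, s), r0) for s in (0.5, 1.0, 3.0)]
# Off the null (r = 1.2, sigma = 2), P equals (r^2 - r0^2) / sigma^2.
off_null = P(*natural_parameters(1.2 * 2.0, 2.0), r0)
```

Off the null hypothesis $P$ takes the value $(r^2 - r_0^2)/\sigma^2$, which is the factor that multiplies the Laplace transform of $G$ when the operator $D$ is applied.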
We return to the function $h(T_1, T_2)$. The statistics $T_2$ and $V = T_1 - T_2^2$ are
independent. Also, $T_2 \in N(\mu, \sigma^2)$ and $V = \sigma^2\chi_n^2$, so that the corresponding
probability densities $p(T_2)$ and $q(V)$ are equal to

$$p(T_2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(T_2 - \mu)^2}{2\sigma^2}\right)$$

and

$$q(V) = \frac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)\sigma^n}\, V^{\frac{n}{2} - 1}\exp\left(-\frac{V}{2\sigma^2}\right).$$

Furthermore, for $T_1 \ge T_2^2$,

$$\frac{\partial(T_2, V)}{\partial(T_2, T_1)} = \begin{vmatrix} 1 & 0 \\ -2T_2 & 1 \end{vmatrix} = 1,$$

so that the joint probability density for $T_1$ and $T_2$ is of the form

$$p(T_1, T_2) = C_1(\mu, \sigma)\exp(-(\theta_1 T_1 + \theta_2 T_2))\, h(T_1, T_2),$$

where $C_1(\mu, \sigma)$ depends only on the parameters $\mu$ and $\sigma$ and where

$$h(T_1, T_2) = \begin{cases} (T_1 - T_2^2)^{\frac{n}{2} - 1} & \text{for } T_1 \ge T_2^2, \\ 0 & \text{for } T_1 < T_2^2. \end{cases}$$

Thus for suitably chosen $G(T_1, T_2)$ we can, on the basis of what has been
said above, obtain the cotest

(6.2.4)
if
(6.2.5)
If inequality (6.2.5) is not satisfied, then $\psi(T_1, T_2) = 0$. We note that, by Chapter III,
§3, the statistic $T_2/\sqrt{T_1}$ has a distribution depending only on $a/\sigma$, so
that tests of the form $\Phi(T_2/\sqrt{T_1})$, where $\Phi \in (0, 1)$ is an arbitrary measurable
function, are similar. From these tests we can construct the cotests $\psi(T_2/\sqrt{T_1})$.
For a suitable choice of $G(T_1, T_2)$ formula (6.2.4) yields cotests that do not reduce to
the type indicated.
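The reason the statistic $T_2/\sqrt{T_1}$ depends only on $a/\sigma$ is that it is unchanged when every observation is multiplied by the same positive constant. The sketch below illustrates this under an assumption taken from the orthogonality of the transformation above: $T_1$ equals the sum of squares of the original observations, and $T_2 = \sqrt{m}\,\bar{x}$ for a sample of size $m$. The function name and the sample are hypothetical.

```python
import math
import random


def standardized_statistic(sample):
    """T2 / sqrt(T1), with T2 = sqrt(m) * mean and T1 = sum of squares (m = sample size)."""
    m = len(sample)
    t2 = math.sqrt(m) * sum(sample) / m
    t1 = sum(x * x for x in sample)
    return t2 / math.sqrt(t1)


rng = random.Random(0)
ratio = 0.7  # the standardized mean a / sigma, held fixed
z = [rng.gauss(ratio, 1.0) for _ in range(25)]

# The same standardized sample rescaled by three different values of sigma.
values = [standardized_statistic([sigma * x for x in z]) for sigma in (0.1, 1.0, 40.0)]
```

All three values coincide up to rounding, so any measurable function of this statistic has a distribution free of the nuisance parameter $\sigma$ once $a/\sigma$ is fixed.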
Let us now discuss Wijsman's reasoning with regard to the construction of
similar tests of this problem by the method described. We need several tedious
analytical lemmas, which we use without proof, referring the reader to Wijsman's
article [11].
We can extend somewhat the general idea of the D-method. We shall show
that for a given level a every cotest can be expressed by formula (6.2.4) for
suitable G( T 1, T 2) satisfying certain additional conditions. We do not require
that G( T 1, T 2) vanish whenever h ( T 1, T 2) vanishes. It is important only that
equation (6.1.6) be satisfied, and in fact even this requirement can be somewhat
weakened.
Equation (6.2.4) can be written in the form

$$\left(\frac{\partial}{\partial T_1} - \frac{1}{2r_0^2}\frac{\partial^2}{\partial T_2^2}\right)G = \frac{\sqrt{2\pi}}{r_0}\,\varphi, \tag{6.2.6}$$

where

$$\varphi(T) = \frac{1}{\sqrt{8\pi}\, r_0}\, h(T)\,\psi(T). \tag{6.2.7}$$

Here $\psi(T)$ is a cotest of level $\alpha$.
Equation (6.2.6) can be regarded as the one-dimensional heat-flow equation.
Here $T_1$ should be interpreted as time and $T_2$ as the space coordinate, $G$ as
the temperature, and $(\sqrt{2\pi}/r_0)\,\varphi$ as a heat source (either positive or negative)
that is variable in space and time. If we were dealing with an ordinary heat-flow
problem we could write its solution with the aid of the corresponding Green's
function in the form

$$G(T_1, T_2) = \int_{-\infty}^{\infty} dT_2' \int_0^{T_1} dT_1'\; \varphi(T_1', T_2')\,(T_1 - T_1')^{-1/2} \exp\left[-\frac{r_0^2}{2}\,\frac{(T_2 - T_2')^2}{T_1 - T_1'}\right]. \tag{6.2.8}$$

Since $\varphi(T') = 0$ whenever $T_2'^2 > T_1'$, we can integrate over the region $T_2'^2 \le T_1' \le T_1$.
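The kernel appearing in (6.2.8) can be checked against the homogeneous form of the heat-flow equation. The sketch below is a finite-difference check, with hypothetical helper names, that $K(t, x) = t^{-1/2}\exp(-r_0^2 x^2/(2t))$ satisfies $\partial K/\partial t = (1/(2r_0^2))\,\partial^2 K/\partial x^2$ for $t > 0$, where $t$ stands for $T_1 - T_1'$ and $x$ for $T_2 - T_2'$.

```python
import math


def kernel(t, x, r0):
    """Heat kernel of (6.2.8): t^(-1/2) * exp(-r0^2 * x^2 / (2 t)), for t > 0."""
    return t ** -0.5 * math.exp(-0.5 * r0**2 * x**2 / t)


def residual(t, x, r0, step=1e-3):
    """Finite-difference value of K_t - K_xx / (2 r0^2); should be close to zero."""
    k_t = (kernel(t + step, x, r0) - kernel(t - step, x, r0)) / (2 * step)
    k_xx = (
        kernel(t, x + step, r0) - 2 * kernel(t, x, r0) + kernel(t, x - step, r0)
    ) / step**2
    return k_t - k_xx / (2.0 * r0**2)


# Largest residual over a few sample points (r0 = 1.3 is an arbitrary choice).
worst = max(
    abs(residual(t, x, 1.3))
    for t in (0.5, 1.0, 2.0)
    for x in (-1.0, 0.0, 0.7)
)
```

The residual is of the order of the squared step, which is consistent with the diffusion coefficient $1/(2r_0^2)$ read off from the left-hand side of (6.2.6).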
We need to investigate the question as to whether the formal solution (6.2.8)
of equation (6.2.6) actually yields the cotest that we need. To do this, let us set
$\Phi(T) = \psi(T) + \alpha$ and examine the power function

$$\beta(r, \sigma) = E(\Phi(T_1, T_2) \mid r, \sigma).$$

Suppose that $\psi$ satisfies (6.2.4). Then

$$\beta(r, \sigma) = \alpha + C(r, \sigma)\int\!\!\int \exp\left(-\frac{T_1}{2\sigma^2} + \frac{r}{\sigma}T_2\right)\left(\frac{\partial^2}{\partial T_2^2} - 2r_0^2\frac{\partial}{\partial T_1}\right)G(T_1, T_2)\, dT_1\, dT_2, \tag{6.2.9}$$

where the integration is over the region $0 \le T_1 < \infty$, $-\infty < T_2 < \infty$ and where
$C(r, \sigma)$ is a normalizing constant.
A detailed investigation of this integration leads to the formula (see [11],
pp. 1037-1038 and 1042-1045)

$$\beta(r, \sigma) = \alpha + C(r, \sigma)\,\frac{r^2 - r_0^2}{\sigma^2}\lim_{A \to \infty}\lim_{B \to \infty}\int_0^A dT_1 \int_{-\infty}^{B} G(T_1, T_2)\exp\left[-\frac{T_1}{2\sigma^2} + \frac{r}{\sigma}T_2\right] dT_2. \tag{6.2.10}$$

Here we take the limit first as $B \to \infty$, then as $A \to \infty$. From equation (6.2.10)
we see that under the hypothesis $H_0$: $r = r_0$ the power function is independent
of the nuisance parameter $\sigma$, which means that the test $\Phi$ is similar.
To describe all similar tests let us consider the class $L$ of functions
$G(T_1, T_2)$ defined in the right $(T_1, T_2)$-half-plane ($T_1 > 0$) that satisfy the following
four conditions.

I. $DG(T_1, T_2) = 0$ for $T_2^2 > T_1$.

II. $-\alpha < (T_1 - T_2^2)^{1 - \frac{n}{2}}\, DG(T_1, T_2) < 1 - \alpha$ if $T_2^2 < T_1$.

III. $G(0, T_2) = 0$; $G(T_1, T_2) \to 0$ as $T_2 \to -\infty$ for arbitrary prescribed
values of $T_1$.

IV. The integrals

$$\int_{-\infty}^{B} G(A, T_2)\exp\left[-\frac{A}{2\sigma^2} + \frac{r}{\sigma}T_2\right] dT_2,$$

$$\int_0^A G(T_1, B)\exp\left[-\frac{T_1}{2\sigma^2} + \frac{r}{\sigma}B\right] dT_1,$$

$$\int_0^A \frac{\partial G(T_1, B)}{\partial T_2}\exp\left[-\frac{T_1}{2\sigma^2} + \frac{r}{\sigma}B\right] dT_1$$

converge to 0 as first $B$ and then $A$ approaches $\infty$. Then (see [11], pp. 1037-1038
and 1042-1045) there is a one-to-one correspondence between these functions
$G(T_1, T_2) \in L$ and the cotests $\psi$ for the level $\alpha$ in accordance with formulas
(6.2.6) and (6.2.8). Here conditions IV are difficult to verify. They are
always satisfied, however, if

$$G(T_1, T_2) \equiv 0 \quad \text{for } T_2 > \sqrt{T_1}.$$
CHAPTER VII

UNBIASED ESTIMATES

§ 1. UNBIASED ESTIMATES FOR INCOMPLETE EXPONENTIAL
FAMILIES DEPENDING ON SUFFICIENT STATISTICS

The question of the relationship between the theory of sufficient statistics
and unbiased estimates was considered in Chapter II, §6. Theorems 2.6.1 and
2.6.2 tell us that for a broad class of questions we can confine ourselves to
unbiased estimates depending on sufficient statistics.
Let us consider an exponential family of the form (5.2.3) under the assumptions
listed in § 2 of Chapter V. Specifically, we assume that the function
$h(T_1, \dots, T_s)$ does not vanish in the region $\mathcal{T}$: $T_j > 0$, $j = 1, 2, \dots, s_1$, that
$s_1 \le s$, and that the integral of the function (5.2.3) converges absolutely in the
Cartesian product $P$ of the $s_1$ half-planes $R_j$: $\operatorname{Re}\theta_j > 0$ ($j = 1, 2, \dots, s_1$) and
the $s - s_1$ strips $S_j$: $0 < \operatorname{Re}\theta_j < A_j$. We denote the region of real values
$(\theta_1, \dots, \theta_s) \in P$ by $P_R$. If we impose no additional constraints on the parameters
$\theta_1, \dots, \theta_s$, then the family of sufficient statistics $T_1, \dots, T_s$ is complete
(see §2, Chapter IV). Let $\xi(T_1, \dots, T_s)$ denote a statistic satisfying the
condition

(7.1.1)

for every $\varepsilon > 0$. Then for all real values of $\theta = (\theta_1, \dots, \theta_s) \in P_R$ the mathematical
expectation

and the variance

(7.1.3)
exist.

The statistic $\xi$ is an unbiased estimate of the function $f(\theta_1, \dots, \theta_s)$ in the
region $P_R$. If we impose no additional constraints on the parameters $(\theta_1, \dots, \theta_s)$,
then by Theorem 4.2.2 the statistic $\xi$ is the unique unbiased estimate of
$f(\theta_1, \dots, \theta_s)$. Every other unbiased estimate of $f(\theta_1, \dots, \theta_s)$ coincides with it
with probability 1. On the other hand, if there are additional constraints on the
parameters then equation (7.1.2) may hold only under the assumption of these
relationships. We then do not have completeness and there may be many unbiased
estimates.
Let $\xi_1$ and $\xi_2$ denote two unbiased estimates of $f(\theta_1, \dots, \theta_s)$ for
$(\theta_1, \dots, \theta_s) \in \Omega_0$, where $\Omega_0 \subset P_R$. Then $\chi = \xi_1 - \xi_2$ is an unbiased estimate of
zero. We shall denote this concept simply by UEZ. If $\xi$ is an unbiased estimate
of $f(\theta_1, \dots, \theta_s)$, then all other unbiased estimates of $f$ are of the form $\xi + \chi$,
where $\chi$ is a UEZ. To describe all unbiased estimates of $f(\theta_1, \dots, \theta_s)$, it will
be sufficient to find one of them and all the UEZ's $\chi$.
If $f(\theta_1, \dots, \theta_s)$ is a function defined on $P_R$, then the question of existence
of an unbiased estimate of it in $P_R$, that is, of an equation of the form (7.1.2), is,
for the given families, a special question in the theory of the multiple Laplace
transformation. The function $f(\theta_1, \dots, \theta_s)$ must always be holomorphic in $P$. It
is sufficient that it be holomorphic in a neighborhood of the point
$(\theta_1 = \infty, \dots, \theta_s = \infty)$. Then there exists a smooth function $p = p(T_1, \dots, T_s)$
that vanishes if one of the numbers $T_j < 0$ ($j = 1, 2, \dots, s$) and that satisfies
the equation $L(p \mid \theta) = f(\theta_1, \dots, \theta_s)$. Since by hypothesis $h(T_1, \dots, T_s)$ does
not vanish in $\mathcal{T}$, it follows that $\xi = p/h$ is an unbiased estimate of $f(\theta_1, \dots, \theta_s)$
in $P_R$. Furthermore, it is the only unbiased estimate since there are no relationships
between the parameters.
Let us now describe the set of UEZ's for incomplete exponential families,
i.e. when we have relationships of the form

(7.1.4)

We impose on these relationships the same conditions as in §8 of Chapter V.
The functions $\Pi_1, \dots, \Pi_r$ must be real for real $(\theta_1, \dots, \theta_s)$ and there must
exist an integer $N \ge 0$ such that the functions $\Pi_j/[(\theta_1 + 1) \cdots (\theta_s + 1)]^N$ are
holomorphic functions of $1/(\theta_1 + 1), \dots, 1/(\theta_s + 1)$ on $P$ for $j = 1, \dots, r$.
Furthermore, just as in §8 of Chapter V, the complexification conditions $(Y_{\mathrm{I}})$ and
the conditions $(Y_{\mathrm{II}})$ are imposed on them.
So far the only condition that we have imposed on the function $h(T_1, \dots, T_s)$
is condition III of §8, Chapter V, i.e. the requirement that the integral of (5.2.3)
be absolutely convergent in $P$.
Let us now investigate the UEZ $\xi$ under the assumption that (7.1.1) holds.
Under this assumption $E(\xi \mid \theta)$ exists and is a holomorphic function of $(\theta_1, \dots, \theta_s)$
in $P$. Furthermore, under relations (7.1.4) $E(\xi \mid \theta) = 0$, so that $\xi$ plays the role of
a precotest and the reasoning of Chapter V can be applied to $E(\xi \mid \theta)$. If $\xi$ is a
sufficiently smooth unbiased estimate, we obtain the representation
(7.1.5)

where the functions $A_j$ and $H_j$ ($j = 1, 2, \dots, r$) are the same as those introduced
in §8 of Chapter V. The degree of smoothness required for a representation of the
form (7.1.5) to exist is also indicated by the reasoning of Chapter V.
Conversely, if $H_1, \dots, H_r$ are such functions and if the sum of the convolutions
of the right-hand side of (7.1.5) vanishes whenever $h$ vanishes, then

(7 .1.6)

will be a UEZ in $P_R$.
In particular, if the function $h$ does not vanish in $\mathcal{T}$, then formula (7.1.5)
yields a UEZ in $P_R$ for arbitrary $H_1, \dots, H_r$ subject to the conditions listed in
§ 8 of Chapter V.
§ 2. ON THE BEHAVIOR OF THE VARIANCE
OF UNBIASED ESTIMATES 1)

1) This section was written by the author in collaboration with A. M. Kagan.

In this section we consider the same exponential families and the same unbiased
estimates of zero as in the preceding one. Let $\xi = \xi(T_1, \dots, T_s)$ denote
an unbiased estimate of the function $f(\theta_1, \dots, \theta_s)$. Suppose that $\xi$ possesses a
variance $D(\xi \mid \theta)$ in the region $P_R$. Let $\Omega_0$ denote a compact set of real values
of the parameters in $P_R$. We shall say that the unbiased estimate $\xi$ is admissible
on the compact set $\Omega_0$ if there does not exist a UEZ $\chi$ such that $D(\xi + \chi \mid \theta) \le D(\xi \mid \theta)$
for all $\theta \in \Omega_0$ with strict inequality holding for at least one value of $\theta$. Otherwise
$\xi$ is said to be inadmissible on $\Omega_0$. An estimate $\xi$ is said to be the best
estimate on $\Omega_0$ if for an arbitrary UEZ $\chi$ we have $D(\xi + \chi \mid \theta) \ge D(\xi \mid \theta)$. Furthermore, if $\chi$
is a UEZ, then for an arbitrary constant $\gamma$ the statistic $\gamma\chi$ is a UEZ and

$$D(\xi + \gamma\chi \mid \theta) = E(\xi - f(\theta) + \gamma\chi)^2 = E(\xi - f(\theta))^2 + 2\gamma E(\xi\chi) + \gamma^2 E\chi^2.$$

Or, singling out the dependence on $\theta$,

$$D(\xi + \gamma\chi \mid \theta) = D(\xi \mid \theta) + 2\gamma E(\xi\chi \mid \theta) + \gamma^2 E(\chi^2 \mid \theta). \tag{7.2.1}$$

Here the behavior of the covariance $E(\xi\chi \mid \theta)$ is extremely important. If

$$E(\xi\chi \mid \theta) \ne 0 \tag{7.2.2}$$

at the point $\theta$, then the third term on the right-hand side of equation (7.2.1) will
in many cases be less than the second term for sufficiently small $\gamma$ and the estimate
$\xi$ will not be the best. If $E(\xi\chi \mid \theta)$ retains its sign for all $\theta \in \Omega_0$, then $\xi$
will be an inadmissible estimate on $\Omega_0$. On the other hand, if inequality (7.2.2)
is satisfied at a given point $\theta \in \Omega_0$ then the unbiased estimate cannot be the best.
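The effect of a nonzero covariance in (7.2.1) can be seen on a small discrete example at one fixed value of $\theta$. The sketch below uses a hypothetical distribution on four equally likely outcomes, with made-up values for an estimate $\xi$ and for a statistic $\chi$ of mean zero; it is not an exponential family and serves only to illustrate the quadratic dependence on $\gamma$.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


# Hypothetical values on four equally likely outcomes.
xi = [1.0, 2.0, 4.0, 5.0]      # an estimate
chi = [-1.0, 1.0, -2.0, 2.0]   # a statistic with mean zero (plays the role of a UEZ)

cov = mean([a * b for a, b in zip(xi, chi)])   # E(xi * chi), since E(chi) = 0
second = mean([c * c for c in chi])            # E(chi^2)
gamma = -cov / second                          # minimizer of the quadratic in gamma

# Variance of xi + gamma * chi, computed directly and through formula (7.2.1).
direct = variance([a + gamma * c for a, c in zip(xi, chi)])
by_formula = variance(xi) + 2 * gamma * cov + gamma**2 * second
```

With $\gamma = -E(\xi\chi)/E(\chi^2)$ the variance drops by $E(\xi\chi)^2/E(\chi^2)$, which is strictly positive exactly when (7.2.2) holds.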
If $\chi$ is a UEZ satisfying the conditions of the preceding section, then we
have the representation (7.1.6). If $h$ does not vanish in $\mathcal{T}$ we can choose
rather freely the functions $H_1, \dots, H_r$. In this case

(7.2.3)
where $G_1, \dots, G_r$ are functions that are holomorphic in $P$ and that are subject to
the conditions described earlier. Let us write equation (7.2.3) in greater detail:

$$\int \cdots \int_{\mathcal{T}} \chi h \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\, dT_1 \cdots dT_s = \frac{1}{C(\theta)}(\Pi_1 G_1 + \dots + \Pi_r G_r).$$

Here, $1/C(\theta) = \int \cdots \int_{\mathcal{T}} h \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\, dT_1 \cdots dT_s$ is a
function that is holomorphic in $P$ and positive for $\theta \in P_R$. Thus

$$\int \cdots \int_{\mathcal{T}} \chi h \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\, dT_1 \cdots dT_s \in I_{\Omega_0}. \tag{7.2.4}$$
Let $L = P(\partial/\partial\theta_1, \dots, \partial/\partial\theta_s)$ denote a linear differential operator that is
a polynomial in $\partial/\partial\theta_1, \dots, \partial/\partial\theta_s$ with constant coefficients. If we apply this
operator to the left-hand member of (7.2.4), we get the expression

$$\int \cdots \int_{\mathcal{T}} \chi Q h \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\, dT_1 \cdots dT_s, \tag{7.2.5}$$

where $Q$ is a polynomial that is easily determined from the polynomial $P$. Conversely,
for each polynomial $Q$ we can choose the corresponding polynomial $P$.
If we treat $Q = Q(T_1, \dots, T_s)$ as an unbiased estimate of $E(Q \mid \theta)$ for $\theta \in \Omega_0$,
we can use (7.2.5) to judge the quality of that estimate.
In particular, the question as to whether the estimate $Q$ is admissible or not
reduces to the question of whether the ideal $I_{\Omega_0}$ contains a function $f \in I_{\Omega_0}$
such that $Lf$ is negative for $\theta \in \Omega_0$. Here $L$ is an operator such that $P(L)$
corresponds to the polynomial $Q$.
To find the polynomial statistic $Q(T_1, \dots, T_s)$, which would be the best
unbiased estimate of its mathematical expectation on the compact set $\Omega_0$, we
need to find differential operators $L$ such that

(7.2.6)

Obviously such operators constitute a ring admitting multiplication by arbitrary
complex numbers.
Let us give some examples of the application of the methods described to
determine the inadmissibility of certain polynomial estimates, for example the
method of sample moments. 1) Consider a one-dimensional exponential family with
a single parameter and with density (with respect to Lebesgue measure) of the
form

$$p(x \mid \theta) = C(\theta)\exp(\theta_1\varphi_1(x) + \dots + \theta_s\varphi_s(x))\exp l_0(x), \tag{7.2.7}$$

where $\varphi_j(x)$ and $l_0(x)$ are continuous functions and $x \in [a, b]$ for $a = -\infty$ or
$b = +\infty$. We have $l_0(x) \to -\infty$ as $x \to -\infty$; $l_0(x) \to -\infty$ as $x \to \infty$; $|\varphi_j(x)/l_0(x)| \to 0$ as
$x \to \pm\infty$; and $\int_{-\infty}^{\infty}\exp((1 - \varepsilon)\, l_0(x))\, dx < \infty$ for some $\varepsilon > 0$.
Consider a repeated sample $(x_1, \dots, x_n)$ out of a set with probability density
(7.2.7). This sample corresponds to an exponential family with probability
density

$$p(x_1, \dots, x_n \mid \theta) = C_1(\theta) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, h(T_1, \dots, T_s). \tag{7.2.8}$$

(Earlier we used notation with the minus sign in the argument of the exponential.)
Here $T_j = \sum_{i=1}^{n} \varphi_j(x_i)$ $(j = 1, 2, \dots, s)$; $h = h(T_1, \dots, T_s)$ is the corresponding

1) These examples were given by A. M. Kagan.



probability density. The parameters $\theta_1, \dots, \theta_s$ are related by

$$\Pi_1 = \theta_2 - \theta_1^2 = 0, \quad \Pi_2 = \theta_3 - \theta_1^3 = 0, \quad \dots, \quad \Pi_{s-1} = \theta_s - \theta_1^s = 0, \tag{7.2.9}$$

which define an analytic set $\Pi$. Here $r = s - 1$. The statistics $T_1, \dots, T_s$ are
sufficient statistics. Let us show that an arbitrary polynomial $Q(T_2, \dots, T_s) \not\equiv$
const is an inadmissible estimate of the function $E(Q \mid \theta)$ on an arbitrary compact
set $\tilde\Theta$ of values of the parameter $\theta$.
In particular, if $\varphi_j(x) = x^j$ $(j = 1, 2, \dots, s)$, we may assert that the sample
moments $a_k = (1/n) \sum_{i=1}^{n} x_i^k$ for $k = 2, 3, \dots, s$ are inadmissible estimates of the
general initial moments $\alpha_k$ (for $k = 2, \dots, s$) on an arbitrary compact set $\tilde\Theta$.
As a second example consider a set with densities $p(x \mid \theta) =
C_0 \exp[-(x - \theta)^{2k}]$, where $k$ is an integer $\ge 2$. For a repeated sample
$(x_1, \dots, x_n)$ the sums $T_1 = \sum_{i=1}^{n} x_i^{2k-1}$, $T_2 = \sum_{i=1}^{n} x_i^{2k-2}$, $\dots$,
$T_{2k-1} = \sum_{i=1}^{n} x_i = n\bar x$ are sufficient statistics. We may assert, in particular, that the
sample moments $\bar x, a_2, \dots, a_{2k-2}$ are inadmissible estimates of the corresponding
general moments.

By virtue of the conditions imposed on the functions $\varphi_j(x)$, the quantity
$E(Q \mid \theta)$ exists for an arbitrary polynomial $Q(T)$ and for arbitrary values of
$\theta_1, \dots, \theta_s$. Suppose that a polynomial $Q = Q(T_2, \dots, T_s)$ depends only on the
$s - 1$ sufficient statistics indicated. This polynomial is an unbiased estimate of
the function $E(Q \mid \theta) = f(\theta)$. We shall be interested in the properties of this
estimate under conditions (7.2.9) on an arbitrary compact set of values of $\theta_1$. We
have

$$C_1(\theta) \int \cdots \int_{\mathcal{T}} Q(T_2, \dots, T_s) \, h(T) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, dT_1 \cdots dT_s = f(\theta), \tag{7.2.10}$$

where $\mathcal{T}$ is the space of values of $(T_1, \dots, T_s)$. By virtue of the conditions on
$h$ in the space $\mathcal{T}$, we can choose a bounded closed cube $R$: $T_{j1} \le T_j \le T_{j2}$
$(j = 1, 2, \dots, s)$ in which $h(T) \ge \varepsilon_0 > 0$.
Let $m \ge 1$ denote the degree of the polynomial $Q$ and let $a_{k_2 \dots k_s} T_2^{k_2} \cdots T_s^{k_s}$
denote one of its $m$th degree terms ($a_{k_2 \dots k_s} \ne 0$ and $k_2 + \dots + k_s = m$). We
shall now apply considerations of the nature of Wijsman's D-method (see Chapter
VI).
Let $\psi_0(T_1, \dots, T_s)$ denote a function satisfying the following conditions.
(1) $\psi_0(T_1, \dots, T_s) > 0$ for $(T_1, \dots, T_s)$ inside $R$.
(2) $\psi_0(T_1, \dots, T_s) = 0$ for $T \notin R$.
(3) $\psi_0(T_1, \dots, T_s)$ is everywhere continuous and has no fewer than
$2k_2 + \dots + sk_s$ partial derivatives with respect to each argument. Here all
derivatives vanish on the boundary of $R$.
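To see that functions satisfying conditions (1)-(3) exist, one may take a product of powers of $(t - a)(b - t)$ over the edges of the cube. The following is only a minimal sketch under that assumption; the name `psi0` and the smoothness parameter `p` are illustrative and do not come from the text.

```python
def psi0(T, lo, hi, p):
    """Candidate for psi_0 on the cube lo[j] <= T[j] <= hi[j].

    Inside the cube the value is the product of ((t - a) * (b - t)) ** (p + 1)
    over the coordinates; outside (and on the boundary) it is 0. Each factor
    has continuous derivatives up to order p that vanish at t = a and t = b.
    """
    value = 1.0
    for t, a, b in zip(T, lo, hi):
        if t <= a or t >= b:
            return 0.0
        value *= ((t - a) * (b - t)) ** (p + 1)
    return value
```

Here `p` would be chosen no smaller than $2k_2 + \dots + sk_s$, so that the differential operator applied to $\psi_0$ later in the argument is applicable.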
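We set

$$\int \cdots \int \psi_0(T_1, \dots, T_s) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, dT_1 \cdots dT_s = V(\theta).$$

Consider the polynomial

$$w(\theta) = (\theta_2 - \theta_1^2)^{k_2} \cdots (\theta_s - \theta_1^s)^{k_s} = \sum w_{l_1 \dots l_s} \theta_1^{l_1} \cdots \theta_s^{l_s}$$

of degree $(2k_2 + \dots + sk_s)$.
We set

$$\psi(T_1, \dots, T_s) = W \psi_0(T_1, \dots, T_s),$$

where the operator $W$ is given by

$$W = \sum (-1)^{l_1 + \dots + l_s} w_{l_1 \dots l_s} \frac{\partial^{l_1 + \dots + l_s}}{\partial T_1^{l_1} \cdots \partial T_s^{l_s}}.$$

By virtue of condition (3), $W$ is applicable to the function $\psi_0(T_1, \dots, T_s)$.
A familiar property of the Laplace transformation (see § 1, Chapter I) leads
to the relation

$$\int \cdots \int \psi(T_1, \dots, T_s) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, dT_1 \cdots dT_s = w(\theta) V(\theta) = (\theta_2 - \theta_1^2)^{k_2} \cdots (\theta_s - \theta_1^s)^{k_s} V(\theta).$$

Now define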
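
$$\chi(T_1, \dots, T_s) = \begin{cases} 0, & T \notin R, \\ \dfrac{\psi(T_1, \dots, T_s)}{h(T_1, \dots, T_s)}, & T \in R. \end{cases}$$

We have

$$E(\chi \mid \theta) = C_1(\theta) \int \cdots \int \psi(T_1, \dots, T_s) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, dT_1 \cdots dT_s = C_1(\theta) (\theta_2 - \theta_1^2)^{k_2} \cdots (\theta_s - \theta_1^s)^{k_s} V(\theta).$$

Therefore $\chi$ is a UEZ under the relations (7.2.9), i.e. for $\theta \in \Pi$.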
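Furthermore,

$$E(\chi Q \mid \theta) = C_1(\theta) \int \cdots \int \psi(T_1, \dots, T_s) \, Q(T_2, \dots, T_s) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, dT_1 \cdots dT_s = C_1(\theta) \, L\{(\theta_2 - \theta_1^2)^{k_2} \cdots (\theta_s - \theta_1^s)^{k_s} V(\theta)\},$$

where

$$L = Q\left(\frac{\partial}{\partial\theta_2}, \dots, \frac{\partial}{\partial\theta_s}\right).$$

Obviously, under conditions (7.2.9),

$$L\{(\theta_2 - \theta_1^2)^{k_2} \cdots (\theta_s - \theta_1^s)^{k_s} V(\theta)\} = a_{k_2, \dots, k_s} \, k_2! \cdots k_s! \, V(\theta), \quad \theta \in \Pi. \tag{7.2.11}$$

If we assume that $a_{k_2, \dots, k_s} > 0$, this equation yields

$$E(\chi Q \mid \theta) > 0, \quad \theta \in \Pi. \tag{7.2.12}$$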


Now, let $\tilde\Theta$ denote an arbitrary compact set of values of $\theta_1$ and let $\tilde\Pi$ denote the
corresponding compact subset of $\Pi$; i.e.

$$\tilde\Pi = \{\theta : \theta \in \Pi, \ \theta_1 \in \tilde\Theta\}.$$

Obviously, for $\theta \in \tilde\Pi$,

$$E(\chi Q \mid \theta) \ge c_0 > 0 \tag{7.2.13}$$

and

$$E(\chi^2 \mid \theta) \le C_0 < \infty. \tag{7.2.14}$$

Consider the statistic

$$\tilde Q = Q - \gamma\chi. \tag{7.2.15}$$

We have

$$E(\tilde Q \mid \theta) = C_1(\theta) \int \cdots \int_{\mathcal{T}} Q(T) \, h(T) \exp(\theta_1 T_1 + \dots + \theta_s T_s) \, dT_1 \cdots dT_s$$

and

$$E(\tilde Q^2 \mid \theta) = E(Q^2 \mid \theta) - 2\gamma E(\chi Q \mid \theta) + \gamma^2 E(\chi^2 \mid \theta). \tag{7.2.16}$$

It is obvious from (7.2.13), (7.2.14), and (7.2.16) that $\gamma$ can be chosen in such
a way that, for $\theta \in \tilde\Theta$,

$$-2\gamma E(\chi Q \mid \theta) + \gamma^2 E(\chi^2 \mid \theta) < 0, \tag{7.2.17}$$

which proves the inadmissibility of the estimate $Q(T_2, \dots, T_s)$ on an arbitrary
compact set $\tilde\Theta$. If we consider the polynomials depending on all sufficient
statistics as estimates, this inadmissibility may obviously fail to occur.
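The choice of $\gamma$ can be made explicit. The sketch below assumes only the two uniform bounds, $E(\chi Q \mid \theta) \ge c_0 > 0$ and $E(\chi^2 \mid \theta) \le C_0$; the function names are hypothetical. With $\gamma = c_0/C_0$ the change in the second moment is at most $-c_0^2/C_0 < 0$ at every point of the compact set.

```python
def second_moment_change(gamma, e_chi_q, e_chi_sq):
    """Return -2*gamma*E(chi Q) + gamma**2 * E(chi**2), the change in E(Q**2)
    when Q is replaced by Q - gamma * chi."""
    return -2.0 * gamma * e_chi_q + gamma ** 2 * e_chi_sq


def uniform_gamma(c0, C0):
    """A gamma that works for every theta with E(chi Q) >= c0 and E(chi**2) <= C0."""
    return c0 / C0
```

For any admissible pair of moments, `second_moment_change(uniform_gamma(c0, C0), e_chi_q, e_chi_sq)` is bounded above by `-c0 ** 2 / C0`, because the expression is decreasing in `e_chi_q` and increasing in `e_chi_sq` when `gamma > 0`.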

§ 3. A THEOREM OF S. R. RAO ON THE INADMISSIBILITY OF
CERTAIN ESTIMATES 1)
Densities with "shifting parameter" of the form

$$p(x \mid \theta) = C_0 \exp P(x - \theta), \tag{7.3.1}$$

where $P(x)$ is a polynomial of even degree with negative leading coefficient,
constitute a special case of densities of the form (7.2.7). When a repeated sample
is made the sample mean $\bar x$ is obviously an unbiased estimate of $\theta$. From
Kagan's example at the end of the preceding section one can easily see that $\bar x$ is
an inadmissible estimate of $\theta$ if the degree of the polynomial exceeds 2. (If the
degree is equal to 2, i.e. if the sample is a normal one, $\bar x$ is, as we know, the
best estimate of $\theta$ in the sense described earlier.)
However, inasmuch as this result has to do with families with "shifting
parameter," it follows from a more general result proved by Rao [62] in 1952

1) This section was written by the author in collaboration with A. M. Kagan.



dealing with such families.


Consider a family of distributions on $E_n$ with probability density (with
respect to Lebesgue measure)
(7.3.2)

and a class $\mathfrak{A}$ of unbiased estimates $A$ of the parameter $\theta$ that satisfy the
condition

(7.3.3)

Obviously the class $\mathfrak{A}$ coincides with the set of statistics of the form
$\bar x + f(x_i - x_j)$, where $f(x_i - x_j)$ is a statistic depending only on the observed
differences and satisfying the equation $E(f \mid \theta) = 0$.
Theorem 7.3.1. Every estimate $A \in \mathfrak{A}$ is inadmissible in the class of unbiased
estimates of $\theta$ for the region of all values of $\theta$ except when
(7.3.4)
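To prove this we note that

$$E(A(x_1, \dots, x_n) \mid x_2 - x_1, \dots, x_n - x_1, \theta) = E(A(x_1 + \theta, \dots, x_n + \theta) \mid x_2 - x_1, \dots, x_n - x_1, 0) = \theta + E(A(x_1, \dots, x_n) \mid x_2 - x_1, \dots, x_n - x_1, 0). \tag{7.3.5}$$

We set

(7.3.6)
Obviously $\chi$ is a UEZ. Let us now consider the estimate $B(x_1, \dots, x_n) =
A - \chi$. We have

$$E(B \mid \theta) = E(A \mid \theta) - E(\chi \mid \theta) = \theta.$$

Furthermore,

$$D(A \mid \theta) = E[(A - \theta)^2 \mid \theta] = E[(B - \theta + \chi)^2 \mid \theta] = E[(B - \theta)^2 \mid \theta] + E(\chi^2 \mid \theta) + 2E(\chi(B - \theta) \mid \theta).$$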

The statistic $\chi$ depends by definition only on $x_2 - x_1, \dots, x_n - x_1$. Because of
this and equations (7.3.5) and (7.3.6),

$$E(\chi(B - \theta) \mid x_2 - x_1, \dots, x_n - x_1, \theta) = \chi \, E[(B - \theta) \mid x_2 - x_1, \dots, x_n - x_1, \theta] = 0.$$

Therefore $E[\chi(B - \theta) \mid \theta] = 0$ and $D(A \mid \theta) = D(B) + E(\chi^2 \mid \theta) > D(B)$ except when
$\chi = 0$ with probability 1.
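Let us look at those cases when this theorem of Rao gives inadmissibility of
a classical estimate $\bar x$ for the parameter $\theta$. For the statistic $A = \bar x$, let us find
the "correction" $\chi$ defined by (7.3.6). To do this we need to calculate the
conditional probability density for $\bar x$ with fixed $x_2 - x_1, \dots, x_n - x_1$, and $\theta = 0$. We
make the change of variables

$$\xi_1 = \bar x = \frac{1}{n}(x_1 + \dots + x_n), \quad \xi_2 = x_2 - x_1, \quad \dots, \quad \xi_n = x_n - x_1.$$

The determinant of the transformation is

$$\begin{vmatrix} \frac{1}{n} & \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n} \\ -1 & 1 & 0 & \cdots & 0 \\ -1 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -1 & 0 & 0 & \cdots & 1 \end{vmatrix} = 1,$$

which is easily verified by adding all columns of the determinant to the first
column. If we set $\bar\xi = (\xi_2 + \dots + \xi_n)/n$, we find, after some simple calculation,
that

$$\chi = E(\bar x \mid x_2 - x_1, \dots, x_n - x_1, 0) = E(\xi_1 \mid \xi_2, \dots, \xi_n, 0) = \bar\xi + E(x_1 \mid \xi_2, \dots, \xi_n, 0) = \bar x - \frac{\int_{-\infty}^{\infty} \xi f(x_1 - \xi, \dots, x_n - \xi) \, d\xi}{\int_{-\infty}^{\infty} f(x_1 - \xi, \dots, x_n - \xi) \, d\xi}. \tag{7.3.7}$$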
Let us now consider the question as to when the "correction" $\chi$ defined by
formula (7.3.7) is trivial, i.e. when $\chi = E(\bar x \mid x_2 - x_1, \dots, x_n - x_1, 0) = 0$. We
confine ourselves to the case of a repeated sample of size $n$, that is, to the case
in which the density (7.3.2) is a product of $n$ identical one-dimensional densities.

Theorem 7.3.2. 1) For the case of a repeated sample of size $n \ge 3$, if $E x_j$
exists, the equation
(7.3.8)

is satisfied only for a normal sample. If $n = 2$ this equation will be satisfied if
the distribution of $x_j$ is symmetric about 0.
To prove this we note that (7.3.8) implies immediately that for $n \ge 3$,
(7.3.9)

We introduce the characteristic function $\varphi(t) = E e^{itx_j}$, which is independent of $j$
since the sample is repeated. For arbitrary real $t_1$ and $t_2$ it follows from (7.3.9)
that

(7.3.10)

Since $E x_j$ exists, we have $\varphi'(t) = iE(x_j e^{itx_j})$ (see, for example, [33]). Therefore
we obtain from (7.3.7) that

$$E(x_1 + x_2 + x_3) \exp i[(t_1 + t_2) x_1 - t_1 x_2 - t_2 x_3] = -i\varphi'(t_1 + t_2)\varphi(-t_1)\varphi(-t_2) - i\varphi(t_1 + t_2)\varphi'(-t_1)\varphi(-t_2) - i\varphi(t_1 + t_2)\varphi'(-t_2)\varphi(-t_1) = 0.$$

In a sufficiently small neighborhood of the point $(0, 0)$ we have $|t_1| \le \varepsilon$ and
$|t_2| \le \varepsilon$, and $\varphi(t) \ne 0$. We may write

$$\frac{\varphi'(t_1 + t_2)}{\varphi(t_1 + t_2)} + \frac{\varphi'(-t_1)}{\varphi(-t_1)} + \frac{\varphi'(-t_2)}{\varphi(-t_2)} = 0. \tag{7.3.11}$$

In such a neighborhood of 0 we set $\varphi'(t)/\varphi(t) = \psi(t)$. Let us show that

1) See the article by A. M. Kagan, Ju. V. Linnik, and S. R. Rao, On a characterization
of the normal law based on a property of the sample average, Sankhyā, Ser. A,
27 (1965), 405-406.

$\psi(t) = at$. From (7.3.11) we have $\psi(t_1 + t_2) = -\psi(-t_1) - \psi(-t_2)$. Furthermore,
$\psi(0) = 0$. If we set $t_1 = -t_2$ we find that $\psi(t_1) = -\psi(-t_1)$; that is, the function
$\psi(t)$ is odd. Thus $\psi(t_1 + t_2) = \psi(t_1) + \psi(t_2)$ in this neighborhood of zero.
It then follows from the continuity of $\psi(t)$ that $\psi(t)$ is a linear function, so
that $\varphi(t) = \exp(at^2/2 + c)$. Since $\varphi(t)$ is the characteristic function, we have
$c = 0$ and $a \le 0$, so that $x_j$ is normal.
If $n = 2$ then

$$E[(x_1 + x_2) \exp\{it(x_1 - x_2)\}] = 0,$$

and hence $\varphi'(t)/\varphi(t) = -\varphi'(-t)/\varphi(-t)$ in some neighborhood of 0, and $\varphi(t)$ is an
even function in that neighborhood. On the other hand, if $\varphi(t)$ is everywhere an
even function, the equation is satisfied. Thus for $n = 2$ condition (7.3.8) is
satisfied for all symmetric $x_j$.
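The role of the functional equation (7.3.11) can be illustrated numerically. The sketch below evaluates its left side for two logarithmic derivatives $\psi = \varphi'/\varphi$: the normal law with mean 0, for which $\psi$ is linear, and, as an assumed comparison case not taken from the text, the symmetric Laplace law with characteristic function $1/(1 + t^2)$.

```python
def psi_normal(t, sigma2=1.0):
    # phi(t) = exp(-sigma2 * t**2 / 2), so phi'(t) / phi(t) = -sigma2 * t
    return -sigma2 * t


def psi_laplace(t):
    # phi(t) = 1 / (1 + t**2), so phi'(t) / phi(t) = -2 * t / (1 + t**2)
    return -2.0 * t / (1.0 + t * t)


def residual(psi, t1, t2):
    """Left side of (7.3.11): psi(t1 + t2) + psi(-t1) + psi(-t2)."""
    return psi(t1 + t2) + psi(-t1) + psi(-t2)
```

The residual vanishes for `psi_normal` at every pair `(t1, t2)`, while for `psi_laplace` it does not, although the Laplace law is symmetric; this matches the statement that symmetry suffices only for $n = 2$.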
CHAPTER VIII

ANALYTICAL METHODS OF STUDYING UNRANDOMIZED TESTS.
APPLICATION TO THE BEHRENS-FISHER PROBLEM

§ 1. QUESTIONS OF EXISTENCE OF UNRANDOMIZED SIMILAR TESTS
FOR INCOMPLETE EXPONENTIAL FAMILIES
In this chapter we shall look at the construction of $\varepsilon$-complete families of
randomized similar tests for exponential families under holomorphic relationships
of the form (5.2.4). These tests depend only on sufficient statistics, which, by
what was said in § 1 of Chapter V, does not violate their $\varepsilon$-completeness. Furthermore,
as was shown in the same section, when we have a randomized test depending
only on sufficient statistics, we can, generally speaking, construct from it an
unrandomized test that is measurable with respect to the entire $\sigma$-algebra of the
sample but not with respect to the $\sigma$-subalgebra of sufficient statistics. This test
possesses the same power function as the original one (and thus is equivalent to
it in this respect).
Thus by leaving the $\sigma$-algebra of sufficient statistics we are enabled to construct
an $\varepsilon$-complete family of unrandomized tests for the cases considered above.
In particular, on the basis of the discussion in Chapter V, this can be done for the
Behrens-Fisher problem.
However, if we wish to investigate unrandomized similar tests for exponential
families that depend only on sufficient statistics under null hypotheses of the form
(5.2.4), we encounter considerable difficulties even in such a simple case of an
exponential family as that to which the Behrens-Fisher problem leads. Below, we
shall investigate the Behrens-Fisher problem in this respect.
Consider an exponential family of the form

$$P_\theta(T_1, \dots, T_s) = C(\theta) \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)] \, h(T_1, \dots, T_s). \tag{8.1.1}$$

Following [42], let us consider unrandomized similar tests $\Phi(T_1, \dots, T_s)$ for
the hypothesis $H_0$ defined with the aid of $r$ $(< s)$ polynomial relations of the form

(8.1.2)
where the polynomials $P_j(\theta)$, $j = 1, 2, \dots, r$, are assumed to be homogeneous of
degree $\ge 1$.
The test $\Phi(T_1, \dots, T_s)$ is characterized by its critical zone, for which it is
the indicator function. Of course, we need to require of this test only that it be
measurable with respect to the $\sigma$-algebra of sufficient statistics. Furthermore, we
must require satisfactory behavior of its power function. Finally, to apply the test,
it is desirable that the boundary of the critical zone have as simple a form as
possible, for example that it consist of finitely many smooth pieces.
The problem of describing unrandomized similar tests for exponential families
that depend only on sufficient statistics is still far from solved. However,
investigations that have been pursued in this direction (which will be discussed to some
degree in what follows) show that such tests, if they exist, apparently have critical
zones whose boundaries are extremely complicated. In general the function
itself or its derivatives of low orders have discontinuities on such boundaries.
Roughly speaking, we should not expect the existence of tests of such a form with
sufficiently smooth boundaries of the critical zone. Fairly detailed investigations
in this connection have been made only for the Behrens-Fisher problem by the
method of analytic continuation with respect to the parameters.
Let us look briefly at the application of such a method to exponential families
(8.1.1) and the null hypothesis of the form (8.1.2). Let $\mathcal{T} = (T_1, \dots, T_s)$ denote
the space of values of the sufficient statistics. This space is contained in $E_s$.
Let us require that the integral

$$\int \cdots \int_{\mathcal{T}} \bigl|\exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\bigr| \, h(T_1, \dots, T_s) \, dT_1 \cdots dT_s$$

converge in the Cartesian product of the half-spaces $\operatorname{Re} \theta_j > 0$ $(j = 1, 2, \dots, s)$.
If $\Phi(T_1, \dots, T_s)$ is an unrandomized similar test of level $\alpha$ with similar zone $Z$,
then

$$\int \cdots \int_{Z} \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)] \, h(T_1, \dots, T_s) \, dT_1 \cdots dT_s = \frac{C_0(\alpha)}{C(\theta)} \tag{8.1.3}$$

for all real values of $\theta_1, \dots, \theta_s$ that satisfy conditions (8.1.2). Here $C_0(\alpha)$ is a
positive constant depending only on the level $\alpha$ and the critical zone $Z$. The
relationships (8.1.2) constitute an algebraic set in the projective space. Let $(\theta_1, \dots, \theta_s)$

denote a point in this algebraic set. For arbitrary $\omega > 0$ the point $(\theta_1/\omega, \dots, \theta_s/\omega)$
also belongs to that set.
We set $\theta_j = \omega\vartheta_j$ for $j = 1, 2, \dots, s$. Let us multiply both sides of equation
(8.1.3) by $\omega^b e^{-a\omega}$, where $b$ is an integer $\ge 1$ and $a$ is a positive number. If we
formally integrate both sides of the equation with respect to $\omega$ from 0 to $\infty$, we
obtain the formal relationship

$$\int \cdots \int_{Z} \frac{h(T_1, \dots, T_s) \, dT_1 \cdots dT_s}{(a + \vartheta_1 T_1 + \dots + \vartheta_s T_s)^{b+1}} = C_0(\alpha) \, f(\vartheta, a, b). \tag{8.1.4}$$

Here $f(\vartheta, a, b)$ is the result of integrating the quantity

$$\frac{\omega^b e^{-a\omega}}{C(\theta)} = \frac{\omega^b e^{-a\omega}}{C(\vartheta\omega)}$$

with respect to $\omega$ from 0 to $\infty$.
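The denominator in (8.1.4) comes from the elementary integral $\int_0^\infty \omega^b e^{-(a+S)\omega} \, d\omega = b!/(a+S)^{b+1}$ with $S = \vartheta_1 T_1 + \dots + \vartheta_s T_s$; the factor $b!$ does not depend on $T$ and so behaves as a constant in the relationship. The sketch below checks this identity numerically for real $a + S > 0$. The truncation point and step count are illustrative assumptions.

```python
import math


def omega_integral(a, S, b, upper=60.0, steps=60000):
    """Trapezoid approximation of the integral of w**b * exp(-(a + S) * w)
    over 0 <= w <= upper (the tail beyond `upper` is neglected)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        w = i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += weight * w ** b * math.exp(-(a + S) * w)
    return total * h


def closed_form(a, S, b):
    """b! / (a + S) ** (b + 1), valid for a + S > 0 and integer b >= 0."""
    return math.factorial(b) / (a + S) ** (b + 1)
```

When $a + S$ approaches 0 the closed form blows up, which is the mechanism behind the special role of the zeros of the denominator discussed in the text.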

Let us suppose now that both sides of (8.1.4) are meaningful for real $\vartheta_j > 0$
$(j = 1, 2, \dots, s)$. A sufficient condition for this is absolute convergence of the
integral on the left side of (8.1.4) for $\vartheta_j > 0$. Then the left-hand member of (8.1.4)
must coincide with the right-hand member if the relations

(8.1.5)
are satisfied. For real parameters $\vartheta$ these relations can be extended to a complex
region. Let us suppose that such an extension is possible on both sides of
equation (8.1.4). Then this equation itself will remain valid for the corresponding
complex values of $\vartheta_1, \dots, \vartheta_s$ in the corresponding region $\Omega$. Here the form of the
function $f(\vartheta, a, b)$ is determined by the exponential family (8.1.1), and investigation
of the possible similar tests leads to investigation of the zones $Z$ satisfying
the identity (8.1.4) when the relations (8.1.5) are satisfied. For $Z = \mathcal{T}$ the identity
(8.1.4) obviously holds with $C_0(\alpha) = C_0(1)$; this is a trivial unrandomized similar
test. For other $Z$ the zone has a nontrivial boundary.
For a given $a$ consider the expression in the denominator of the left-hand
member of (8.1.4)

$$D(a, \vartheta, T) = a + \vartheta_1 T_1 + \dots + \vartheta_s T_s \tag{8.1.6}$$

with relations (8.1.5) holding.
In the study of equations of the type (8.1.4), a special role is played by the
real zeros of the expression (8.1.6) when the relationships mentioned hold, i.e.
the geometric loci in the space $E_s$ that are defined by the equations

$$D(a, \vartheta, T) = 0; \quad P_1(\vartheta) = 0, \ \dots, \ P_r(\vartheta) = 0. \tag{8.1.7}$$

Here we admit arbitrary complex values for $\vartheta$ that yield real values for $T_1, \dots, T_s$.
We shall call such real zeros "critical surfaces" or "critics" of the exponential
family (they depend on a number $a$ fixed in advance).
The values $\vartheta^{(0)} = (\vartheta_1^{(0)}, \dots, \vartheta_s^{(0)})$ generated by the critics lie outside the
region $\Omega$ in which equation (8.1.4) is valid, since the left-hand member of (8.1.4)
becomes meaningless for such values. However in many cases it is possible to
approach such values without leaving the region $\Omega$. Here the denominator of
the fraction on the left side of (8.1.4) approaches 0. In studying the asymptotic
behavior of the left and right sides of this equation as we let $\vartheta \in \Omega$ approach $\vartheta^{(0)}$
we can frequently obtain somewhat remarkable information regarding the behavior
of the similar zones $Z$. Here the structure of their boundaries is of particular
importance. Under certain conditions one can use this procedure to show that
there are no similar critical zones with sufficiently smooth boundaries.
We note also that to carry out an investigation of this type it is essential to
have a relation of the form (8.1.4), where the denominator in the integrand vanishes
at the critics of the exponential family. The origin of such a relationship is of no
significance. We obtained it by an integral transformation, but it can also be
obtained by other approaches, especially when we are considering tests that are
measurable with respect to some (incomplete) subalgebra of the algebra of sufficient
statistics. We shall encounter such examples when we study the Behrens-Fisher
problem.

§ 2. STATEMENT OF THE PROBLEM OF AN UNRANDOMIZED
HOMOGENEOUS SIMILAR TEST IN THE BEHRENS-FISHER PROBLEM
For the Behrens-Fisher problem (in the usual notation) we have four sufficient
statistics: $\bar x$, $\bar y$, $s_1^2$, and $s_2^2$. We define a homogeneous similar unrandomized test
as a similar test whose critical zone is of the form

$$G\left(\frac{\bar x - \bar y}{s_2}, \frac{s_1}{s_2}\right) \ge 0,$$

where $G$ is a Lebesgue-measurable function. We shall see in § 1 of Chapter X
that such a test always exists for an arbitrary level $\alpha$ in the case of samples of
sizes $n_1$ and $n_2$ of unlike parity.
From an analytical point of view, however, the boundary of this critical zone
is extremely complicated. We shall see that the problem centers around this
difficulty.
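
Since we shall be dealing with the random vector $((\bar x - \bar y)/s_2, \ s_1/s_2)$, we need
to derive its distribution. Let us set $\xi = (\bar x - \bar y)/s_2$ and $\eta = s_1/s_2$. Let us assume
also that $n_1 \ge 2$ and $n_2 \ge 2$. We define

$$X_1 = (\bar x - \bar y) \left( \frac{n_1 \sigma_2^2 + n_2 \sigma_1^2}{n_1 n_2} \right)^{-1/2}.$$

This quantity is independent of $s_1^2$ and $s_2^2$ (see Chapter IV). Here $E(X_1) = 0$ and
$D(X_1) = 1$, so that $X_1 \in N(0, 1)$.
Let us also define $u = n_1 s_1^2/\sigma_1^2$ and $v = n_2 s_2^2/\sigma_2^2$. These quantities are
independent and they have $\chi^2$ distributions with $(n_1 - 1)$ and $(n_2 - 1)$ degrees
of freedom respectively. Finally, we define $u_1 = \sqrt{u}$ and $v_1 = \sqrt{v}$. The quantities
$X_1$, $u_1$ and $v_1$ are independent and have probability densities

$$p(X_1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{X_1^2}{2}\right),$$

$$p(u_1) = \frac{2^{(3 - n_1)/2}}{\Gamma\left(\frac{n_1 - 1}{2}\right)} u_1^{n_1 - 2} \exp\left(-\frac{u_1^2}{2}\right), \quad u_1 > 0,$$

$$p(v_1) = \frac{2^{(3 - n_2)/2}}{\Gamma\left(\frac{n_2 - 1}{2}\right)} v_1^{n_2 - 2} \exp\left(-\frac{v_1^2}{2}\right), \quad v_1 > 0.$$

Define

$$\tilde x = \frac{X_1}{v_1}, \quad \tilde y = \frac{u_1}{v_1}, \quad \tilde z = v_1.$$

Then $X_1 = \tilde x \tilde z$, $u_1 = \tilde y \tilde z$, $v_1 = \tilde z$, and

$$\begin{vmatrix} \tilde z & 0 & 0 \\ 0 & \tilde z & 0 \\ \tilde x & \tilde y & 1 \end{vmatrix} = \tilde z^2.$$

Thus for arbitrary $\tilde x$ and arbitrary positive $\tilde y$ the common density is

$$p(\tilde x, \tilde y, \tilde z) = K_{n_1 n_2} \, \tilde y^{n_1 - 2} \, \tilde z^{n_1 + n_2 - 2} \exp\left[-\frac{\tilde z^2}{2}(1 + \tilde x^2 + \tilde y^2)\right],$$

where

$$K_{n_1 n_2} = \frac{2^{(5 - n_1 - n_2)/2}}{\sqrt{\pi} \, \Gamma\left(\frac{n_1 - 1}{2}\right) \Gamma\left(\frac{n_2 - 1}{2}\right)}.$$

Integrating with respect to $\tilde z$, we find that

$$\int_0^\infty \tilde z^{n_1 + n_2 - 2} \exp\left[-\frac{\tilde z^2}{2}(1 + \tilde x^2 + \tilde y^2)\right] d\tilde z = K'_{n_1 n_2} (1 + \tilde x^2 + \tilde y^2)^{-\frac{n_1 + n_2 - 1}{2}},$$

where

$$K'_{n_1 n_2} = 2^{(n_1 + n_2 - 3)/2} \, \Gamma\left(\frac{n_1 + n_2 - 1}{2}\right).$$

From this we get

$$p(\tilde x, \tilde y) = C_{n_1 n_2} \frac{\tilde y^{n_1 - 2}}{(\tilde x^2 + \tilde y^2 + 1)^{\frac{n_1 + n_2 - 1}{2}}}, \tag{8.2.1}$$

where

$$C_{n_1 n_2} = K_{n_1 n_2} K'_{n_1 n_2} = \frac{2 \, \Gamma\left(\frac{n_1 + n_2 - 1}{2}\right)}{\sqrt{\pi} \, \Gamma\left(\frac{n_1 - 1}{2}\right) \Gamma\left(\frac{n_2 - 1}{2}\right)}.$$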
Let us now define $\theta = n_2 \sigma_1^2 / n_1 \sigma_2^2$. After some simple manipulations we obtain

$$\tilde x = \frac{\xi}{\sqrt{1 + \theta}}, \quad \tilde y = \frac{\eta}{\sqrt{\theta}}.$$

Let $\Phi(\xi, \eta)$ denote the indicator function of the critical zone $G(\xi, \eta) \ge 0$. Then

$$E(\Phi \mid \theta) = \iint_{\Omega} \Phi(\tilde x \sqrt{1 + \theta}, \ \tilde y \sqrt{\theta}) \, p(\tilde x, \tilde y) \, d\tilde x \, d\tilde y = \alpha, \tag{8.2.2}$$

where $\alpha$ is the level of the test and $\Omega$ is the upper half-plane
$-\infty < \tilde x < \infty$, $\tilde y \ge 0$.
Define

$$N = \frac{n_1 + n_2 - 1}{2}.$$

Let us return to the variables $\xi$ and $\eta$. Then (8.2.2) may be written in the
form

$$\iint_{\Omega} \frac{\Phi(\xi, \eta) \, \eta^{n_1 - 2} \, d\xi \, d\eta}{(\theta\xi^2 + (1 + \theta)\eta^2 + \theta + \theta^2)^N} = C(n_1, n_2) \, \alpha \, \theta^{-\frac{n_2}{2}} (1 + \theta)^{-N + \frac{1}{2}}, \tag{8.2.3}$$

where $C(n_1, n_2) > 0$. This is the basic integral relationship for a homogeneous
unrandomized similar test.
Following the exposition of § 1, let us investigate the critical curves for the
family of measures appearing in formula (8.2.3). We shall simply refer to these as
"critics". They are geometric loci obtained by setting the expression

$$\theta^2 + \theta(1 + \xi^2 + \eta^2) + \eta^2$$

equal to 0, i.e. they correspond to values of $\theta = -D \le 0$. For $D = 0$ this is the
abscissa $\eta = 0$. For $0 < D < 1$, we obtain the family of confocal hyperbolas

$$\frac{\eta^2}{D} - \frac{\xi^2}{1 - D} = 1 \tag{8.2.4}$$

with common focus $(0, 1)$, the limiting position of these hyperbolas being the line
$\eta = 0$.
= 0, i.e. die equation for die ordinate. k is natural to
consider that portion of the ordisate correspoediog to 71·~1 (twice covered) as
belonging to our family of hyperbolas. Thus we obtain the family of confocal
hyperbolas
A(s. ri)=D, (8.2.5)

where 0 ~ D .:::; 1. For D = O and D = 1 the hyperbolas degenerate in to segments


of straight lines. For D > 1, instead of hyperbolas, we have ellipses
52 112 '
D-1 + o = l (8.2.5 ')

with common focus (0, 1). For D = 1, they degenerate into the twice covered
segment e= 0, 0 .~ 71 .:::; 1. We write the family of ellipses in the form

8(6. ri)=D, D~L (8.2.6)


When we study the distribution of the critics with respect to the boundary of the
test zone and apply the principle of analytic continuation with respect to $\theta$, we
discover a number of properties of the unrandomized test $\Phi(\xi, \eta)$ (see the articles
by Linnik [42], [44] and Šalaevskiĭ [77]). (Šalaevskiĭ applied the Laplace
transformation to get these results. He obtained simpler and more general results by
this procedure for continuous families of tests.)
We shall say that a continuous family of unrandomized similar tests with
critical zones $Z_C = \{G(\xi, \eta) \ge C\}$ is defined if a continuous function $G(\xi, \eta)$ is
constructed such that, for arbitrary given $C$ and an arbitrary value of the parameter
$\theta$, the quantity $P_C = P[(\xi, \eta) \in Z_C]$ is independent of $\theta$, so that the distribution
of $G(\xi, \eta)$ is independent of $\theta$. Trivial continuous families are generated by the
functions $G(\xi, \eta) = $ const.
Theorem 8.2.1 (Šalaevskiĭ). There are no nontrivial continuous families of
unrandomized similar tests for the Behrens-Fisher problem.
The proof of this interesting theorem is rather laborious, and we omit it.
However, if we drop the requirement that the function $G(\xi, \eta)$ be continuous
this theorem is no longer valid, as we shall show in Chapter X.
§ 3. HOMOGENEOUS FISHER-WELCH-WALD TESTS

In a well-known article [9] Wald examined unrandomized tests that satisfy
four axioms regarding the critical region $Z$ and that are quite natural from a
statistical point of view. These axioms are as follows:
(I) The critical region $Z$ lies in the space of the sufficient statistics $\bar x, \bar y, s_1^2, s_2^2$;
that is, if a sample $(x_1, \dots, x_{n_1}; y_1, \dots, y_{n_2})$ belongs to $Z$, then the sample
$(x'_1, \dots, x'_{n_1}; y'_1, \dots, y'_{n_2})$ also belongs to $Z$ if $\bar x' = \bar x$, $\bar y' = \bar y$, $s_1'^2 = s_1^2$, and $s_2'^2 = s_2^2$.
(II) If the sample $(x_1, \dots, x_{n_1}; y_1, \dots, y_{n_2})$ belongs to $Z$, then for an
arbitrary number $c$ the sample $(x_1 + c, \dots, x_{n_1} + c; y_1 + c, \dots, y_{n_2} + c)$ also
belongs to $Z$.
(III) If the sample $(x_1, x_2, \dots, x_{n_1}; y_1, \dots, y_{n_2})$ belongs to $Z$ then for arbitrary
nonzero $k$ the sample $(kx_1, \dots, kx_{n_1}; ky_1, \dots, ky_{n_2})$ also belongs to $Z$.
(IV) Let $(x_1, \dots, x_{n_1}; y_1, \dots, y_{n_2})$ denote a sample belonging to $Z$. Let
$(x'_1, \dots, x'_{n_1}; y'_1, \dots, y'_{n_2})$ denote another sample such that $|\bar y' - \bar x'| > |\bar y - \bar x|$,
$s_1'^2 = s_1^2$, and $s_2'^2 = s_2^2$. Then the sample $(x'_1, \dots, x'_{n_1}; y'_1, \dots, y'_{n_2})$ also belongs
to $Z$.
It follows from these four axioms that the critical region of a test must be of
the form

$$\frac{|\bar x - \bar y|}{s_2} \ge \varphi\left(\frac{s_1}{s_2}\right), \tag{8.3.1}$$

where $\varphi$ is a single-valued measurable function.


Proof. It follows from axiom (I) that the region $Z$ is determined only by $\bar x$, $\bar y$, $s_1^2$,
and $s_2^2$. It follows from (II) that if the point $(\bar x, \bar y, s_1^2, s_2^2)$ belongs to $Z$, then the
point $(\bar x - \bar y, 0, s_1^2, s_2^2)$ also belongs to $Z$, so that $Z$ is determined by the variables
$\bar x - \bar y$, $s_1^2$, and $s_2^2$. From (III) we conclude, by setting $k = -1$, that $Z$ is symmetric
in the sense that it is determined only by the values of $|\bar x - \bar y|$, $s_1$, and
$s_2$.
Let us set $k = 1/s_2$. We see that, to determine $Z$, all we need are the values
of $|\bar x - \bar y|/s_2$ and $s_1/s_2$, so that the zone $Z$ can be represented in the half-plane
$\xi = (\bar x - \bar y)/s_2$, $\eta = s_1/s_2$, $\eta \ge 0$, and so that it is symmetric about the $\eta$-axis. We
can consider only the half of $Z$ lying in the quadrant $\Omega_1$: $\xi \ge 0$, $\eta \ge 0$.
Let us show now that every straight line $\eta = \eta_0$ $(\eta_0 > 0)$ intersects the
boundary of $Z$ in the quadrant $\Omega_1$ at no more than one point. Suppose that this is not
the case, i.e. that there are two points $(\xi_1, \eta_0)$ and $(\xi_2, \eta_0)$ belonging to $Z$ such
that $\xi_2 > \xi_1 \ge 0$. Then there exist two points $(\xi_1, \eta_0)$ and $(\xi_3, \eta_0)$ such that
$\xi_1 < \xi_3 < \xi_2$ and $(\xi_3, \eta_0) \notin Z$. Suppose that corresponding to the points $(\xi_1, \eta_0)$
and $(\xi_3, \eta_0)$ are the sufficient statistics $\bar x, \bar y, s_1, s_2$ and $\bar x', \bar y', s'_1$, and $s'_2$. Here
$s_1/s_2 = s'_1/s'_2$ and $(\bar x' - \bar y')/s'_2 > (\bar x - \bar y)/s_2$. Let us set $k = s'_1 s_1^{-1}$ and take a point to
which the sufficient statistics $k\bar x$, $k\bar y$, $ks_1$ and $ks_2$ correspond. This point also
belongs to $Z$. Furthermore $ks_2 = s'_2$ and $(\bar x' - \bar y')/s'_2 > (k\bar x - k\bar y)/ks_2$. According to
axiom (IV) the point $(\xi_3, \eta_0)$ must also belong to $Z$, which is impossible.
Thus the critical zone $Z$ must be of the form

$$\frac{|\bar x - \bar y|}{s_2} \ge \varphi\left(\frac{s_1}{s_2}\right),$$

where $\varphi$ is a measurable single-valued function, which however may assume
infinite values. If we exclude this possibility and suppose that the function $\varphi$ in
(8.3.1) assumes only finite values, we obtain a test, which we may call the
Fisher-Welch-Wald test in honor of the authors who first investigated it.
Sometimes this test is written in the form

$$\frac{|\bar x - \bar y|}{\sqrt{s_1^2 + s_2^2}} \ge \varphi_1\left(\frac{s_1}{s_2}\right),$$

which we obtain from the preceding form by setting $\varphi_1 = \varphi \cdot (1 + s_1^2/s_2^2)^{-1/2}$. A fairly
detailed bibliography regarding this test and others like it that were discovered up
to 1955 can be found in the survey by Breny [6].
In his article [9] Wald considered approximate similar tests of the form (8.3.1),
where $\varphi$ is a rational function, and he raised the question of constructing an exactly
similar test for which the function $\varphi$ is analytic for $\eta = s_1/s_2 \ge 0$. We shall
see later that such a construction is impossible, that is, a similar test for which the
function $\varphi$ is analytic does not exist. We shall also prove a stronger assertion
of this type.
Let us return to the coordinates $\xi$ and $\eta$. In the quadrant $\Omega_1$: $\xi \ge 0$, $\eta \ge 0$,
the test (8.3.1) becomes

$$\xi \ge \varphi(\eta), \tag{8.3.2}$$

so that the critical zone $Z$ is a symmetric doubly-connected region $\xi \ge \varphi(\eta)$ or
$\xi \le -\varphi(\eta)$. Following articles 1) by the author [45] and I. L. Romanovskiĭ [63],
1) These two articles contain an omission, which is corrected in [ 49].



we now prove
Theorem 8.3.1. Suppose that $\varphi(\eta)$ is continuous for $0 \le \eta \le 1$ and that it has
a finite derivative for $0 \le \eta \le 1$, where $\varphi'(0)$ means the right-hand derivative.
Suppose that $\varphi(\eta)$ satisfies a Lipschitz condition with exponent 1 for $1 \le \eta \le
2(M + 1)$, where $M = \sup_{0 \le \eta \le 1} \varphi(\eta)$. Then the test (8.3.2) cannot be similar if
the sample size $n_2 \ge 4$.
The proof is based on the method of analytic continuation with respect to a
parameter, as described in § 1. Several lemmas are necessary for it. In what
follows, the symbols $l_i$, $c_i$, and $K_i$ denote positive constants and the symbols $\varepsilon_i$,
$\eta_i$, and $\xi_i$ denote small positive constants.
Lemma 8.3.1. Suppose that $\xi_0 = \varphi(0)$. Then

$$c_2 \alpha < \xi_0 < c_1 \alpha. \tag{8.3.3}$$
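To prove this we use formula (8.2.3). By this formula we have

$$\iint_{Z} \frac{\eta^{n_1 - 2} \, d\xi \, d\eta}{(\theta\xi^2 + (1 + \theta)\eta^2 + \theta + \theta^2)^N} = C(n_1, n_2) \, \alpha \, \theta^{-\frac{n_2}{2}} (1 + \theta)^{-N + \frac{1}{2}}. \tag{8.3.4}$$

By the conditions of the theorem $\varphi(\eta)$ is continuous in a neighborhood of 0. Hence
for given small $\varepsilon > 0$ there exists a $\delta > 0$ such that the double inequality $0 \le
\eta \le \delta$ implies the inequality $|\varphi(\eta) - \xi_0| \le \varepsilon$. Let us suppose that $\varepsilon$ and $\delta$ are
both less than 1.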
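Consider the following regions in the quadrant $\Omega_1$:

$$Z_\delta: \ 0 \le \xi \le \varphi(\eta), \ 0 \le \eta \le \delta; \qquad Z'_\delta: \ 0 \le \xi < \infty, \ \delta \le \eta < \infty.$$

Let us find an estimate for

$$\iint_{Z'_\delta} \frac{\eta^{n_1 - 2} \, d\xi \, d\eta}{(\theta\xi^2 + (1 + \theta)\eta^2 + \theta + \theta^2)^N} = I'_\delta. \tag{8.3.5}$$

We define $\rho(\eta, \theta) = (1 + \theta)\eta^2 + \theta + \theta^2$. We find that

$$\int_0^\infty \frac{d\xi}{(\theta\xi^2 + \rho)^N} = C_N \, \theta^{-\frac{1}{2}} \rho^{-N + \frac{1}{2}},$$

where $C_N > 0$ is a constant. Furthermore, since $n_2 \ge 2$,

$$I'_\delta = C_N \, \theta^{-\frac{1}{2}} \int_\delta^\infty \eta^{n_1 - 2} \rho^{-N + \frac{1}{2}} \, d\eta \le C_N (1 + \theta)^{-N + \frac{1}{2}} \theta^{-\frac{n_2}{2}} \int_{\delta/\sqrt{\theta}}^\infty \frac{du}{(u^2 + 1)^{\frac{n_2}{2}}}. \tag{8.3.6}$$

If $\theta$ is chosen in such a way that $\delta/\sqrt{\theta} = K$, where $K$ is a sufficiently large
constant, then (8.3.6) has the estimate

$$(1 + \theta)^{-N + \frac{1}{2}} \theta^{-\frac{n_2}{2}} \varepsilon_K O(1),$$

where $\varepsilon_K \to 0$ as $K \to \infty$. Thus

$$I'_\delta = (1 + \theta)^{-N + \frac{1}{2}} \theta^{-\frac{n_2}{2}} \varepsilon_K O(1) \quad \text{for } \theta = \frac{\delta^2}{K^2}. \tag{8.3.7}$$

Now let us look at the integral $\iint_{Z_\delta} = I_\delta = I - I'_\delta$. Obviously

$$I_\delta \le (\xi_0 + \varepsilon) \theta^{-N} \delta^{n_1 - 1} = (\xi_0 + \varepsilon) \theta^{-\frac{n_2}{2}} K^{n_1 - 1}.$$

For sufficiently large $K$ we obtain from (8.3.7) and (8.3.4)

$$K^{n_1 - 1} \theta^{-\frac{n_2}{2}} (\xi_0 + \varepsilon) \ge \tfrac{1}{2} C(n_1, n_2) \, \alpha \, \theta^{-\frac{n_2}{2}}$$

and hence

$$\xi_0 + \varepsilon \ge c_3 \alpha.$$

If we take $\varepsilon \le c_3 \alpha/2$, we get

$$\xi_0 \ge c_2 \alpha. \tag{8.3.8}$$

To get an upper bound for $\xi_0$ let us find a lower bound for $I_\delta$. We have

$$I_\delta \ge K_1(K) (\xi_0 - \varepsilon) \theta^{-\frac{n_2}{2}},$$

where $K_1(K)$ depends only on $K$. For sufficiently large $K$,

$$2C(n_1, n_2) \, \alpha \, \theta^{-\frac{n_2}{2}} \ge K_1(K) (\xi_0 - \varepsilon) \theta^{-\frac{n_2}{2}},$$

so that

$$(\xi_0 - \varepsilon) < K_2 \alpha, \quad \xi_0 < K_2 \alpha + \varepsilon < (K_2 + 1)\alpha,$$

if $\varepsilon$ is chosen less than $\alpha$. This completes the proof of Lemma 8.3.1.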
Let us now look at the operation of "upsetting" tests. Suppose that we are
given the test $|\xi| \ge \varphi(\eta)$. We define

$$\varphi_1(\eta) = \frac{\varphi(\eta)}{\sqrt{1 + \eta^2}},$$

so that

$$\varphi(\eta) = \varphi_1(\eta) \sqrt{1 + \eta^2}.$$

Then the test takes the form

$$|\xi| \ge \varphi_1(\eta) \sqrt{1 + \eta^2},$$

or

$$\frac{|\bar x - \bar y|}{s_2} \ge \varphi_1\left(\frac{s_1}{s_2}\right) \sqrt{1 + \frac{s_1^2}{s_2^2}},$$

which may be written in the form

$$\frac{|\bar x - \bar y|}{\sqrt{s_1^2 + s_2^2}} \ge \varphi_1\left(\frac{s_1}{s_2}\right),$$

or in the form

$$\frac{|\bar y - \bar x|}{\sqrt{s_2^2 + s_1^2}} \ge \varphi_1\left(\left(\frac{s_2}{s_1}\right)^{-1}\right),$$

or

$$\frac{|\bar y - \bar x|}{s_1} \ge \varphi_1\left(\left(\frac{s_2}{s_1}\right)^{-1}\right) \sqrt{1 + \frac{s_2^2}{s_1^2}}.$$

This last expression yields the critical zone of the test in which the role of the
sample $(x_1, \dots, x_{n_1})$ is played by $(y_1, \dots, y_{n_2})$ and vice versa. Thus,
corresponding to the test $|\xi| \ge \varphi_1(\eta) \sqrt{1 + \eta^2}$ is the test $|\xi| \ge \varphi_1(1/\eta) \sqrt{1 + \eta^2}$ of the
same level. Here we need to switch $n_1$ and $n_2$ in formula (8.3.4).
Consider the projective transformation of the plane $-\infty < \xi < \infty$, $-\infty < \eta < \infty$
defined by $\xi' = \xi/\eta$, $\eta' = 1/\eta$. This is an involution of the plane with fixed axis
$\eta = 1$ and fixed center $(0, -1)$.
Each quadrant of the upper half-plane is mapped by this transformation into
itself. In the upper right quadrant the test boundary $\xi = \varphi_1(\eta) \sqrt{1 + \eta^2}$ is mapped
into $\xi = \varphi_1(1/\eta) \sqrt{1 + \eta^2}$. From what was said above, this yields a test boundary
of a similar test of the same level. In formula (8.3.4) we need to replace $n_1$
with $n_2$, $n_2$ with $n_1$, and $\varphi(\eta)$ with $\varphi_1(1/\eta) \sqrt{1 + \eta^2}$.
Let $D$ denote a number in the interval $(0, 1)$ and let the expression $A(\xi, \eta) = D$
define a nondegenerate critic-hyperbola. The equation for this hyperbola is

$$\frac{\eta^2}{D} - \frac{\xi^2}{1-D} = 1.$$

The involution described above maps the hyperbola into the ellipse

$$\frac{\xi^2}{D^{-1}-1} + \frac{\eta^2}{D^{-1}} = 1.$$

Here $D^{-1} > 1$ and this ellipse is the critic $B(\xi, \eta) = D^{-1}$. This means that every
nondegenerate critic-hyperbola is mapped into a critic-ellipse. Since our transformation
is an involution, the converse is also true. Furthermore, if the test boundary
$\xi = \varphi(\eta)$ is tangent at any point to a nondegenerate critic-hyperbola (resp.
nondegenerate critic-ellipse), then the "upset" similar test with test boundary

$$\xi = \sqrt{1+\eta^2}\;\varphi_1\!\left(\frac{1}{\eta}\right) \qquad \left(\text{where } \varphi_1(\eta) = \frac{\varphi(\eta)}{\sqrt{1+\eta^2}}\right)$$

is tangent to a nondegenerate critic-ellipse (resp. nondegenerate critic-hyperbola)
and conversely. If a point lies inside a quadrant (i.e. if it has positive abscissa
and ordinate) then its image under our involution lies inside a quadrant.
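
The correspondence between critic-hyperbolas and critic-ellipses can be checked numerically. The sketch below is not part of the original text. It assumes that $A \leq B$ are the two roots of $t^2 - (1+\xi^2+\eta^2)t + \eta^2 = 0$ (the factorization of the denominator used in §4), and the helper names `critics`, `upset` and `hyperbola_point` are illustrative only.

```python
import math

def critics(xi, eta):
    """Roots A <= B of t^2 - (1 + xi^2 + eta^2) t + eta^2 = 0 (assumed definition)."""
    s = 1.0 + xi * xi + eta * eta
    root = math.sqrt(s * s - 4.0 * eta * eta)
    return (s - root) / 2.0, (s + root) / 2.0

def upset(xi, eta):
    """The involution xi' = xi / eta, eta' = 1 / eta, for eta > 0."""
    return xi / eta, 1.0 / eta

def hyperbola_point(D, xi):
    """Point (xi, eta), eta > 0, on the critic-hyperbola eta^2/D - xi^2/(1 - D) = 1."""
    return xi, math.sqrt(D * (1.0 + xi * xi / (1.0 - D)))

D = 0.3
involution_error = 0.0  # distance between upset(upset(p)) and p
ellipse_error = 0.0     # defect in xi^2/(1/D - 1) + eta^2/(1/D) = 1 at the image point
critic_error = 0.0      # distance between B(image point) and 1/D
for xi in (0.0, 0.5, 1.0, 2.5):
    p = hyperbola_point(D, xi)
    q = upset(*p)
    back = upset(*q)
    involution_error = max(involution_error, abs(back[0] - p[0]), abs(back[1] - p[1]))
    lhs = q[0] ** 2 / (1.0 / D - 1.0) + q[1] ** 2 / (1.0 / D)
    ellipse_error = max(ellipse_error, abs(lhs - 1.0))
    critic_error = max(critic_error, abs(critics(*q)[1] - 1.0 / D))

print(involution_error, ellipse_error, critic_error)
```

All three printed quantities should be at the level of rounding error: points of the hyperbola $A = D$ land on the ellipse $B = D^{-1}$, and applying the map twice returns the starting point.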

§4. LEMMAS ON TANGENCY OF A TEST BOUNDARY TO A CRITIC 1)

Lemma 8.4.1. Suppose that a test boundary $\xi = \varphi(\eta)$ is tangent to a nondegenerate
critic at a point $(\xi_0, \eta_0)$, so that $\varphi$ has a finite derivative $\varphi'(\eta_0)$ at
$\eta = \eta_0$ and the tangent line to the boundary coincides with the tangent line to the
critic. Suppose that the critic intersects the $(\xi = 0)$-axis at a point $(0, \eta'_0)$ and
that $\varphi(\eta)$ satisfies a Lipschitz condition with exponent 1 in a neighborhood of the
point $\eta'_0$. Then the test $|\xi| \geq \varphi(\eta)$ cannot be a similar test.
The proof of this lemma is rather complicated. We introduce the family of
critics $A(\xi, \eta)$ and $B(\xi, \eta)$ and we write the basic expression (8.2.3) in the form

$$\iint_{Z_1} \frac{\eta^{n_1-2}\,d\xi\,d\eta}{(\theta^2 + \theta(1+\xi^2+\eta^2) + \eta^2)^N} = C_{n_1 n_2}\,\frac{\alpha}{2}\,\theta^{-\frac{n_2}{2}}\,(1+\theta)^{-N+\frac12}. \tag{8.4.1}$$

The basic equation (8.4.1) must be satisfied for all $\theta > 0$.


Let us construct the analytic continuation of the left and right sides of equation
(8.4.1) onto the plane $\Lambda$ of complex values of the parameter $\theta = \tau + i\zeta$. On
the right side of (8.4.1) the points $\theta = 0$ and $\theta = -1$ are singular points. Since
the right-hand member is a function of $\theta$, it can be extended to the plane $\Lambda$ with
a cut along the negative half of the real axis from $0$ to $-\infty$. The integral on the
left side of (8.4.1) converges absolutely and uniformly in every closed bounded
region of the plane $\Lambda$ with a cut along the negative half of the real axis. (All the
zeros of the denominator in the integrand are real and negative.) The left-hand
member of (8.4.1) is a single-valued analytic function of $\theta$ in the region $\Lambda$ and it
too can be extended to the region of complex values of the parameter.

1) This section was written by the author in collaboration with I. L. Romanovskiĭ.

Since the left and right sides of (8.4.1) must coincide for positive values of
$\theta$, they must coincide for the entire region $\Lambda$. Therefore a test that is similar for
positive values of $\theta$ is also similar for all values of $\theta \in \Lambda$.
By choosing the appropriate branches of the factors and using the concept of
critics that we have introduced, one can represent the denominator in the left-hand
member of (8.4.1) as the product

$$(\theta + A(\xi, \eta))^N\,(\theta + B(\xi, \eta))^N.$$

Thus the fundamental equation takes the form

$$\iint_{Z_1} \frac{\eta^{n_1-2}\,d\xi\,d\eta}{(\theta + A(\xi, \eta))^N\,(\theta + B(\xi, \eta))^N} = C_{n_1 n_2}\,\frac{\alpha}{2}\,\theta^{-\frac{n_2}{2}}\,(1+\theta)^{-N+\frac12}. \tag{8.4.2}$$

By considering the values of $\theta$ of the form $\theta = -D_0 + i\zeta$, where $\zeta$ is a small real
number, we shall obtain a contradiction, thus proving Lemma 8.4.1.
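
The passage from (8.4.1) to (8.4.2) rests on factoring the quadratic $\theta^2 + \theta(1+\xi^2+\eta^2) + \eta^2$ in $\theta$. A minimal numerical sketch of that step follows. It assumes that $A \leq B$ are the roots of $t^2 - (1+\xi^2+\eta^2)t + \eta^2 = 0$, and the function names and sample points are illustrative, not taken from the book.

```python
import math

def critics(xi, eta):
    """Roots A <= B of t^2 - (1 + xi^2 + eta^2) t + eta^2 = 0 (assumed definition)."""
    s = 1.0 + xi * xi + eta * eta
    root = math.sqrt(s * s - 4.0 * eta * eta)
    return (s - root) / 2.0, (s + root) / 2.0

def denominator(theta, xi, eta):
    """The base of the N-th power in the integrand of (8.4.1)."""
    return theta * theta + theta * (1.0 + xi * xi + eta * eta) + eta * eta

theta = complex(-0.4, 1e-3)  # theta = -D0 + i*zeta with D0 = 0.4, zeta = 0.001
samples = [(0.2, 0.5), (1.0, 2.0), (3.0, 0.1), (0.0, 1.0)]

factorization_error = 0.0
bounds_hold = True
for xi, eta in samples:
    A, B = critics(xi, eta)
    d = denominator(theta, xi, eta)
    # Relative defect of d = (theta + A)(theta + B)
    factorization_error = max(factorization_error,
                              abs(d - (theta + A) * (theta + B)) / abs(d))
    # The critics satisfy 0 <= A <= 1 <= B
    bounds_hold = bounds_hold and (0.0 <= A <= 1.0 <= B)

print(factorization_error, bounds_hold)
```

Since $A$ and $B$ are positive, the zeros $\theta = -A$ and $\theta = -B$ of the denominator are real and negative, in agreement with the remark made before (8.4.2).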
We may assume that the test boundary is tangent to the critic-hyperbola
$A(\xi, \eta) = D_0$, $0 < D_0 < 1$, because if it is tangent to the critic-ellipse
$B(\xi, \eta) = D_1$, $D_1 > 1$, it will be tangent to the critic-hyperbola for the "upset"
test and the analytical properties of the lemma are maintained.
Let us divide the region of integration $Z_1$ into several parts. First of all,
we denote by $\Pi_{\varepsilon_0}$ the infinite half-horseshoe-shaped region containing the critic
$A(\xi, \eta) = D_0$ bounded by the ordinate and the two confocal hyperbolas
$A(\xi, \eta) = D_0 + \varepsilon_0$ and $A(\xi, \eta) = D_0 - \varepsilon_0$
(where $\varepsilon_0$ is a sufficiently small positive number such that $D_0 + \varepsilon_0 < 1$ and
$D_0 - \varepsilon_0 > 0$). We then denote by $\Pi^{(0)}_{\varepsilon_0}$ the finite subregion of this region
bounded by the ordinate and the line $\eta = \eta_1$, where $\eta_1$ is such that the distance from
the ordinate to the point of tangency $(\xi_0, \eta_0)$ is, for example, several times as great as
the distance along this line from the ordinate to the half-horseshoe. Thus the
region $\Pi_{\varepsilon_0} \setminus \Pi^{(0)}_{\varepsilon_0}$ contains the point of tangency to the critic and the test
boundary. It is for just this region that we shall obtain a contradiction in the behavior
of the left and right sides of the basic equation as $\zeta \to 0$. The integral in the
left-hand member of the basic equation (8.4.2) taken over this region approaches
$-\infty$, whereas the right-hand member remains bounded.
In what follows we shall take $\theta = -D_0 + i\zeta$, where $\zeta < \zeta_1 \ll \varepsilon_0$. We denote
by $\tilde\Theta$ that region of values of $\theta$ that is defined by the conditions listed.
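In the region $\Omega_1 \setminus \Pi_{\varepsilon_0}$, for $\psi(\xi, \eta) \neq 0$,

$$|\theta + A(\xi, \eta)| \geq \varepsilon_0$$

and

Therefore the denominator of the integrand in (8.4.2) differs from 0 by some
constant and its behavior plays no significant role.
In the region $\Pi_{\varepsilon_0}$, for $\theta \in \tilde\Theta$,

$$|\theta + A(\xi, \eta)| \leq |-D_0 + i\zeta + D_0 + \varepsilon_0| \leq \sqrt{\varepsilon_0^2 + \zeta^2} < 2\varepsilon_0 \tag{8.4.3}$$

and
(8.4.4)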
Thus what is important is the behavior of the basic equation in the region
$\Pi_{\varepsilon_0}$. Let us denote by $I(\theta)$ the integral in the left-hand member of the basic
equation and let us denote the same integral over the subregion $\Pi_{\varepsilon_0}$ by $I_0(\theta)$. Let us
study the integral $I_0(\theta)$ in greater detail. The quantity $(\theta + B)^{-N}$ can be rewritten

$$(B-A)^{-N}\left(1 + \frac{\theta + A}{B - A}\right)^{-N}. \tag{8.4.5}$$

Let us choose $\varepsilon_0$ in such a way that the series that we shall be considering
converge; for example let us take $\varepsilon_0 < \varepsilon/10$. Since inequality (8.4.4) holds and
since for $\theta > 0$ we are considering that branch of the function $(\theta + B)^{-N}$ with
positive values, we can expand (8.4.5) according to the binomial formula.
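
The rewriting (8.4.5) and its binomial expansion can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the values of $N$, $A$, $B$ and $\theta$ are arbitrary sample numbers with $|\theta + A|$ small compared with $B - A$, and the coefficients are taken to be the ordinary generalized binomial coefficients of $(1+z)^{-N}$.

```python
def binomial_coefficients(N, count):
    """E_0, ..., E_{count-1}, where E_k = (-N)(-N-1)...(-N-k+1)/k! and E_0 = 1."""
    coeffs = [1.0]
    for k in range(1, count):
        coeffs.append(coeffs[-1] * (-N - k + 1) / k)
    return coeffs

N = 3.5                      # e.g. n1 = n2 = 4 gives N = (n1 + n2 - 1)/2 = 3.5
A, B = 0.41, 1.7             # sample values with 0 < A < 1 < B
theta = complex(-0.4, 0.02)  # theta = -D0 + i*zeta, close to -A

z = (theta + A) / (B - A)    # small expansion variable
direct = (theta + B) ** (-N)
factored = (B - A) ** (-N) * (1 + z) ** (-N)
series = (B - A) ** (-N) * sum(
    E * z ** k for k, E in enumerate(binomial_coefficients(N, 40))
)

identity_error = abs(direct - factored)  # checks the rewriting (8.4.5)
series_error = abs(direct - series)      # checks the binomial expansion
print(identity_error, series_error)
```

Because $B - A$ is real and positive, the principal branches on both sides agree, which matches the choice of the branch that is positive for $\theta > 0$.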
Let us set

$$M = \begin{cases} N + \tfrac12 & \text{if } n_1 \text{ and } n_2 \text{ are of like parity},\\ N & \text{if } n_1 \text{ and } n_2 \text{ are of opposite parity}. \end{cases}$$

Let $\psi(\xi, \eta)$ denote the indicator function of the zone $Z_1$. Then

$$I_0(\theta) = \sum_{k=0}^{M} E_k \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N+k}\,(\theta+A)^{N-k}} + \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\eta^{n_1-2}\,E_{M+1}\!\left(\frac{\theta+A}{B-A}\right) d\xi\,d\eta,$$

where the $E_k$ are the coefficients in the expansion, $E_0 = 1$, the function
$E_{M+1}((\theta + A)/(B - A))$ is a regular function of $\xi$ and $\eta$ in the region $\Pi_{\varepsilon_0}$ and the
last term is a bounded analytic function. As $B \to \infty$ in $\Pi_{\varepsilon_0}$, this function has by
(8.4.5) the estimate $O(B^{-N})$, and all the integrals converge absolutely in $\Pi_{\varepsilon_0}$.
This expansion shows that quantities of different orders are "mixed" in the
integral $I_0(\theta)$. Since $|\theta + A| < 2\varepsilon_0$ in the region $\Pi_{\varepsilon_0}$ for $\theta \in \tilde\Theta$, the maximum
order will be provided by the integral with the maximum degree of $(\theta + A)$ in the
denominator; that is, the fundamental term is the integral

$$\iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N}\,(\theta+A)^{N}}. \tag{8.4.6}$$

To single out this fundamental quantity, we apply the following integro-differential
operator to the integral (8.4.6).
Let $L_\theta^{(-k)}$ denote the operator for $k$-fold integration along the line segment
connecting the points $\theta$ and $\theta_1$ (where $\theta \in \tilde\Theta$ and $\theta_1 = 1 + i$). This path lies
entirely in the region $\Lambda$. Let us apply the operator $L_\theta^{(-M_1)}$ to the integral $I(\theta)$,
where

$$M_1 = \begin{cases} N - \tfrac12 & \text{if } n_1 \text{ and } n_2 \text{ are of like parity},\\ N - 2 & \text{if } n_1 \text{ and } n_2 \text{ are of opposite parity}. \end{cases}$$
If $n_1$ and $n_2$ are of opposite parity, application of the operator $L_\theta^{(-M_1)}$ yields

$$L_\theta^{(-M_1)}(I_0(\theta)) = \frac{1}{(-N+1)(-N+2)\cdots(-N+M_1)} \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N}(\theta+A)^{2}}$$
$$+ \frac{1}{(-N+2)\cdots(-N+1+M_1)} \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N+1}(\theta+A)}$$
$$+ \frac{1}{(-N+3)\cdots(-N+2+M_1)} \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N+2}} + u_1(\theta),$$

where $u_1(\theta)$ is a function that is analytic and bounded and has analytic bounded
first and second derivatives for $\theta \in \tilde\Theta$. If we apply this same operator to the
integral $I_1(\theta)$, we again obtain an analytic bounded function with analytic bounded
first and second derivatives. Application of the operator $L_\theta^{(-M_1)}$ in the case of
samples of a single parity yields

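$$L_\theta^{(-M_1)}(I(\theta)) = c'_1 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N}(\theta+A)^{1/2}} + c'_2 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}(\theta+A)^{1/2}\,d\xi\,d\eta}{(B-A)^{N+1}}$$
$$+ c'_3 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}(\theta+A)^{3/2}\,d\xi\,d\eta}{(B-A)^{N+2}} + v'(\theta). \tag{8.4.7}$$

All these integrals converge absolutely. The function $v'(\theta)$ is also an analytic
bounded function with analytic bounded first and second derivatives.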
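In the case of samples of like parity, to avoid fractional exponents we can
apply a different operator to (8.4.7). Let us multiply this equation by $(\theta - \theta')^{-1/2}\,d\theta$,
where $\theta$ varies over the line segment from $\theta = \theta' \in \tilde\Theta$ to $\theta = \theta_1 = 1 + i$, and
then let us integrate the result over that line segment. We obtain

$$\int_{\theta'}^{\theta_1} \frac{d\theta}{(\theta+A)^{1/2}(\theta-\theta')^{1/2}} = \ln\left[2(\theta_1+A)^{1/2}(\theta_1-\theta')^{1/2} + 2\theta_1 + (A-\theta')\right] - \ln(A+\theta'),$$

$$\int_{\theta'}^{\theta_1} \frac{(\theta+A)^{1/2}\,d\theta}{(\theta-\theta')^{1/2}} = (\theta_1-\theta')^{1/2}(\theta_1+A)^{1/2} + (A+\theta')\ln\left[(\theta_1-\theta')^{1/2} + (\theta_1+A)^{1/2}\right] - (A+\theta')\ln(A+\theta')^{1/2},$$

$$\int_{\theta'}^{\theta_1} \frac{(\theta+A)^{3/2}\,d\theta}{(\theta-\theta')^{1/2}} = \left[\frac{\theta_1+A}{4} + \frac{3}{8}(\theta'+A)\right]\cdot 2(\theta_1-\theta')^{1/2}(\theta_1+A)^{1/2}$$
$$+ \frac34 (A+\theta')^2 \ln\left[2(\theta_1+A)^{1/2} + 2(\theta_1-\theta')^{1/2}\right] - \frac34 (A+\theta')^2 \ln\left[2(A+\theta')^{1/2}\right].$$

Now let us twice differentiate the results obtained with respect to $\theta'$ for $\theta' \in \tilde\Theta$.
We denote by $\tilde L_\theta$ the operator producing the transformation that we have just
performed. Then we can easily show that

$$\tilde L_\theta\!\left(L_\theta^{(-M_1)}(I(\theta))\right) = c''_1 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N}(\theta+A)^{2}} + c''_2 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N+1}(\theta+A)}$$
$$+ c''_3 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\ln(\theta+A)\,d\xi\,d\eta}{(B-A)^{N+2}} + v(\theta), \tag{8.4.8}$$

where $c''_1$, $c''_2$, and $c''_3$ are real constants, $c''_1 \neq 0$, and the function $v(\theta)$ is
analytic and bounded for $\theta \in \tilde\Theta$. We choose that branch of $\ln(\theta + A)$ that has
real values for $\theta > 0$. Again, all the integrals converge absolutely.
The expressions (8.4.7) and (8.4.8) differ only in the third term. Therefore
we may write (combining the two cases)
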
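$$L_\theta(I(\theta)) = C_1 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N}(\theta+A)^{2}} + C_2 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N+1}(\theta+A)}$$
$$+ C_3 \iint_{\Pi_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,C(\theta)\,d\xi\,d\eta}{(B-A)^{N+2}} + u(\theta), \tag{8.4.9}$$

where $L_\theta$ is either $L_\theta^{(-M_1)}$ or $\tilde L_\theta(L_\theta^{(-M_1)})$ and

$$C(\theta) = \begin{cases} 1 & \text{if } n_1 \text{ and } n_2 \text{ are of like parity},\\ \ln(A + \theta) & \text{if } n_1 \text{ and } n_2 \text{ are of opposite parity}. \end{cases}$$

For the coefficients and the last terms we have used the conventional notation.
Let us now show that the imaginary part of the first integral in (8.4.9) approaches
$\infty$ as $\zeta \to 0$. We denote by $\Pi^{(1)}_{\varepsilon_0} = \Pi_{\varepsilon_0} \setminus \Pi^{(0)}_{\varepsilon_0}$ that portion of the half-horseshoe
bounded below by the horizontal straight line $\eta = \eta_1$. This region contains the
point of tangency of the critic and the test boundary. Consider the integral over
this region

$$\iint_{\Pi^{(1)}_{\varepsilon_0}} \psi(\xi, \eta)\,\frac{\eta^{n_1-2}\,d\xi\,d\eta}{(B-A)^{N}(\theta+A)^{2}}, \tag{8.4.10}$$

where $\theta \in \tilde\Theta$; that is, $\theta = -D_0 + i\zeta$. In the region $\Pi^{(1)}_{\varepsilon_0}$ let us replace the
variables $\xi$ and $\eta$ with $a$ and $\eta$ just as in [45]. We set $a = A - D_0$. Here the
Jacobian $\partial(a, \eta)/\partial(\xi, \eta) < 0$ in the region $\Pi^{(1)}_{\varepsilon_0}$. This Jacobian is an analytic
function which, when multiplied by $(B - A)^{-N}$, decreases at least as fast as $1/\eta^2$
in the region $\Pi^{(1)}_{\varepsilon_0}$ as $\eta \to \infty$.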

Using the new variables, we rewrite the integral (8.4.10) in the form

$$-\iint_{\Pi^{(1)}_{\varepsilon_0}} \psi(a, \eta)\,\frac{F(a, \eta)}{(a + i\zeta)^2}\,da\,d\eta, \tag{8.4.11}$$

where $F(a, \eta) > 0$. Since

$$\frac{\eta^2}{D_0 + a} - \frac{\xi^2}{1 - (D_0 + a)} = 1,$$

it follows that $a$ is a single-valued analytic function of $\xi$ and $\eta$ for given $D_0$.
Here $a(\xi, \eta)$ is a strictly decreasing function of $\xi$ for fixed $\eta$ and a strictly
increasing function of $\eta$ for fixed $\xi$. For the new variables, instead of the
characteristic function $\psi(\xi, \eta)$ we write the corresponding limits of integration. For
given $\eta$ we define

$$s_0(\eta) = \min(\varepsilon_0,\ a(\varphi(\eta), \eta))$$

and

$$F(a, \eta) = F_0(0, \eta) + aF_1(a, \eta).$$

Then the integral in (8.4.11) is equal to

$$-\int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{F_0(0, \eta) + aF_1(a, \eta)}{(a + i\zeta)^2}\,da.$$

Here $F_0(0, \eta) = O(1/\eta^2)$ and $aF_1(a, \eta) = O(1/\eta^2)$ as $\eta \to \infty$.
We estimate the imaginary part of this integral:

$$-\operatorname{Im} \int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{F_0(0, \eta) + aF_1(a, \eta)}{(a + i\zeta)^2}\,da$$
$$= -\int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{2a^2\zeta F_1(a, \eta)}{(a^2 + \zeta^2)^2}\,da + \int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{2a\zeta F_0(0, \eta)}{(a^2 + \zeta^2)^2}\,da.$$

As $\zeta \to 0$ the integral

$$\int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{2a^2\zeta F_1(a, \eta)}{(a^2 + \zeta^2)^2}\,da$$

behaves like $O(1)$. The imaginary part of the second integral is equal to

$$\int_{\eta_1}^{\infty} \frac{2\zeta F_0(0, \eta)}{(s_0(\eta))^2 + \zeta^2}\,d\eta + H, \tag{8.4.12}$$

where $H$ is a bounded number. Here we should keep in mind the fact that
$F_0(0, \eta) > \varepsilon_3 > 0$
for all $\eta$ in a neighborhood of $\eta_0$.
The denominator of the integrand approaches 0 as $\zeta \to 0$ when $s_0(\eta) = 0$, as
is the case at common points of the critic $A(\xi, \eta) = D_0$ and the boundary of the
test. These common points may be points of either intersection or tangency. In
either case the half-horseshoe $\Pi^{(1)}_{\varepsilon_0}$ has at least one point of tangency, namely
the point $(\xi_0, \eta_0)$.
Consider the region $\Pi^{(1)}_{\varepsilon_0}$ containing the point of tangency. As $\eta \to \eta_0$,

$$s_0(\eta) = o(|\eta - \eta_0|).$$

Let $K$ denote a large given number. For sufficiently small $\zeta$ and for $|\eta - \eta_0| \leq K\zeta$
we have the inequality $|s_0(\eta)| \leq \zeta$, thanks to the existence (assumed in the
lemma) of a finite derivative of $\varphi(\eta)$, and we can estimate the integral in (8.4.12)
as follows:

$$2\zeta \int_{\eta_1}^{\infty} \frac{F_0(0, \eta)\,d\eta}{(s_0(\eta))^2 + \zeta^2} > 2\zeta \int_{\eta_0 - K\zeta}^{\eta_0 + K\zeta} \frac{\varepsilon_3}{2\zeta^2}\,d\eta = 2\varepsilon_3 K,$$

where $\varepsilon_3 > 0$. For sufficiently large $K$, we see that the integral (8.4.12) can be
made arbitrarily large. Thus

$$\operatorname{Im} \int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{F(a, \eta)}{(a + i\zeta)^2}\,da \to \infty \quad \text{as } \zeta \to 0.$$
Now let us look at the other two integrals in (8.4.9). We can get bounds for
their imaginary parts as $\zeta \to 0$ just as we did for the first integral, this time
replacing $(a + i\zeta)^{-2}$ with $(a + i\zeta)^{-1}$ and with $\ln(a + i\zeta)$ or 1, depending on whether
the difference $n_1 - n_2$ is odd or even. Obviously, $\operatorname{Im} 1/(a + i\zeta) = -\zeta/(a^2 + \zeta^2)$ and
$\operatorname{Im} \ln(a + i\zeta) = O(1)$. As $\zeta \to 0$, we see that the imaginary parts of the second and
third integrals in (8.4.9) are bounded. The function $u(\theta)$ and its imaginary part are also
bounded for $\theta \in \tilde\Theta$ as $\zeta \to 0$. We now need to investigate the integral over the region $\Pi^{(0)}_{\varepsilon_0}$.
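
The elementary facts behind the boundedness of the second and third integrals can be checked numerically. The following sketch is illustrative only: the sample values of $a$, $\zeta$ and $\varepsilon_0$ are arbitrary, and the closed form $2\arctan(\varepsilon_0/\zeta)$ for the integral of $\zeta/(a^2+\zeta^2)$ over $-\varepsilon_0 \leq a \leq \varepsilon_0$ is a standard calculus fact, not a formula quoted from the book.

```python
import cmath
import math

def relative_gap(x, y):
    """Difference of x and y, scaled so that large values are compared relatively."""
    return abs(x - y) / max(1.0, abs(y))

def kernel_gap(a, zeta):
    """Defect in Im 1/(a+i*zeta) = -zeta/(a^2+zeta^2) and in the analogous
    identity Im 1/(a+i*zeta)^2 = -2*a*zeta/(a^2+zeta^2)^2."""
    w = complex(a, zeta)
    gap1 = relative_gap((1 / w).imag, -zeta / (a * a + zeta * zeta))
    gap2 = relative_gap((1 / w ** 2).imag,
                        -2 * a * zeta / (a * a + zeta * zeta) ** 2)
    return max(gap1, gap2)

def second_kernel_integral(eps0, zeta):
    """Integral of zeta / (a^2 + zeta^2) over -eps0 <= a <= eps0 (closed form)."""
    return 2.0 * math.atan(eps0 / zeta)

points = [(a, z) for a in (-0.3, -0.01, 0.02, 0.5) for z in (1e-1, 1e-3, 1e-6)]
kernel_error = max(kernel_gap(a, z) for a, z in points)

# The integral of the second kernel stays below pi however small zeta is.
integral_bound_ok = all(second_kernel_integral(0.2, z) < math.pi
                        for z in (1e-1, 1e-3, 1e-6))

# Im ln(a + i*zeta) is an argument, so it lies between 0 and pi for zeta > 0.
log_bound_ok = all(0.0 < cmath.log(complex(a, z)).imag < math.pi for a, z in points)

print(kernel_error, integral_bound_ok, log_bound_ok)
```

The contrast with the first integral is that the kernel $(a+i\zeta)^{-2}$ is not integrable uniformly in $\zeta$ near $a = 0$, whereas the two kernels checked here are.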
Let us consider that portion of the half-horseshoe $\Pi^{(0)}_{\varepsilon_0}$ adjacent to the ordinate and
bounded by the line $\xi = \varepsilon_4$. The behavior of the integral in the left-hand member of (8.4.2)
over the region $\Pi^{(0)}_{\varepsilon_0}$ depends on the behavior of the test boundary $\xi = \varphi(\eta)$ at the point
at which the critic $A(\xi, \eta) = D_0$ leaves the ordinate, i.e. at the point
$(0, \sqrt{D_0})$. Two cases are possible.

Case 1: $\varphi(\sqrt{D_0}) \neq 0$. We denote by $\Pi^{(2)}_{\varepsilon_0}$ the intersection of $\Pi^{(0)}_{\varepsilon_0}$ and the strip
$0 \leq \xi \leq \varepsilon_4$. Here the indicator function $\psi(\xi, \eta) = 0$ and the integration may be
taken over the region $\Pi^{(0)}_{\varepsilon_0} \setminus \Pi^{(2)}_{\varepsilon_0}$. In this case $\partial(a, \eta)/\partial(\xi, \eta)$ neither vanishes nor
becomes infinite and it is an analytic function. The integration is carried out just as
in the region $\Pi^{(1)}_{\varepsilon_0}$, but $a$ varies from $-\varepsilon_0$ to

$$s_0(\eta) = \min(\varepsilon_0,\ a(\varphi(\eta), \eta),\ a(\varepsilon_4, \eta)).$$
Case 2: $\varphi(\sqrt{D_0}) = 0$. Then by the hypothesis of the lemma the function $\varphi(\eta)$
satisfies a Lipschitz condition. Let us fix $\xi$ in $(0, \varepsilon_4)$ and consider the set of
roots of the equation $\xi = \varphi(\eta)$. Because of the continuity of the function $\varphi(\eta)$,
the set of these roots is closed. Among them there are two roots $s_+(\eta)$ and $s_-(\eta)$
such that $s_+(\eta) > \sqrt{D_0}$ and $s_-(\eta) < \sqrt{D_0}$ which out of all such roots are closest to
the number $\sqrt{D_0}$. Since by the hypothesis of the lemma the function $\varphi(\eta)$ satisfies
a Lipschitz condition, we have

$$|s_+(\eta) - \sqrt{D_0}| \geq \varepsilon_5\xi, \qquad |s_-(\eta) - \sqrt{D_0}| \geq \varepsilon_5\xi,$$

where $\varepsilon_5$ is a positive constant. Integration over the region $\Pi^{(0)}_{\varepsilon_0} \setminus \Pi^{(2)}_{\varepsilon_0}$ leads to
the same results as before. In the region $\Pi^{(2)}_{\varepsilon_0}$ the function $\psi(\xi, \eta)$ is nonzero
and the Jacobian $\partial(a, \eta)/\partial(\xi, \eta)$ becomes infinite (the tangents at the vertices of
the critic-hyperbolas are parallel to the $\eta$-axis). Therefore we must use a different
approach. Let us replace the variables $(\xi, \eta)$ with the variables $(\xi, a)$, where
$a = A - D_0$. Here the Jacobian $\partial(\xi, a)/\partial(\xi, \eta) > 0$ is an analytic function that
neither vanishes nor becomes infinite. Just as in the region $\Pi^{(1)}_{\varepsilon_0}$ we consider the
integral

where $\nu(\xi, a)$ is a regular function in the region $\Pi^{(2)}_{\varepsilon_0}$ and $\nu(\xi, a) > 0$. For fixed
$\xi$ the limits of integration with respect to $a$ depend on the roots of the equation
$\xi = \varphi(\eta)$. Just as in the region $\Pi^{(1)}_{\varepsilon_0}$, we may write

$$\nu(\xi, a) = \nu_0(\xi, 0) + a\nu_1(\xi, a),$$

where $\nu_1(\xi, a)$ is a function that is regular in $\Pi^{(2)}_{\varepsilon_0}$. What is significant is the
integral

$$\iint_{\Pi^{(2)}_{\varepsilon_0}} \frac{\nu_0(\xi, 0)\,da\,d\xi}{(a + i\zeta)^2}.$$

The imaginary part of this integral is of the form

$$-\int_0^{\varepsilon_4} 2\zeta\,\nu_0(\xi, 0) \left\{ \sum_{m=1}^{T_1} \left[ \frac{1}{(a_{1m}(\xi))^2 + \zeta^2} - \frac{1}{(a_{2m}(\xi))^2 + \zeta^2} \right] - \sum_{m=1}^{T_2} \left[ \frac{1}{(a'_{1m}(\xi))^2 + \zeta^2} - \frac{1}{(a'_{2m}(\xi))^2 + \zeta^2} \right] \right\} d\xi, \tag{8.4.13}$$

where $a_{1m}(\xi)$ and $a_{2m}(\xi)$ correspond to those roots of the equation $\varphi(\eta) = \xi$ that
exceed $\sqrt{D_0}$, and $a'_{1m}(\xi)$ and $a'_{2m}(\xi)$ correspond to those roots that are less than
$\sqrt{D_0}$. The upper limits of summation $T_1$ and $T_2$ may be infinite. Also, $a_{1m}(\xi) < a_{2m}(\xi)$
and $a'_{1m}(\xi) < a'_{2m}(\xi)$. Since the function $\varphi(\eta)$ satisfies a Lipschitz condition
in the interval in question, we have

$$|a_{1m}(\xi)| \geq \varepsilon_6\xi, \qquad |a'_{1m}(\xi)| \geq \varepsilon_6\xi.$$

Each of the summations in (8.4.13) consists of positive quantities. These summations
do not exceed

$$\frac{1}{(a_{1m}(\xi))^2 + \zeta^2} \quad \text{and} \quad \frac{1}{(a'_{1m}(\xi))^2 + \zeta^2},$$

respectively, where $|a_{1m}|$ and $|a'_{1m}|$ are minimal. The absolute value of the
integrand in (8.4.13) does not exceed

$$2\zeta\,\nu_0(\xi, 0)\,\frac{2\,d\xi}{\varepsilon_6^2\xi^2 + \zeta^2}. \tag{8.4.14}$$

If we integrate (8.4.14) we obtain a quantity that remains bounded as $\zeta \to 0$.
The remaining integrals, analogous to those in (8.4.10) that are taken over the
region $\Pi^{(2)}_{\varepsilon_0}$, are investigated in the same way and they have a bounded imaginary
part as $\zeta \to 0$.
Thus for $\theta = -D_0 + i\zeta$, where $\zeta > 0$, we have $L_\theta(I_0(\theta)) \to \infty$ as $\zeta \to 0$,
where $I_0(\theta)$ is the basic integral and $\infty$ is the infinite point of the complex plane.
Let us return to equation (8.4.2). If we consider the expression

$$L_\theta\!\left(C_{n_1 n_2}\,\frac{\alpha}{2}\,\theta^{-\frac{n_2}{2}}\,(1+\theta)^{-N+\frac12}\right),$$

we see that it is bounded for $\theta = -D_0 + i\zeta$, which leads to a contradiction.
Thus, the test curve and the critic-hyperbola $A(\xi, \eta) = D$, $0 < D < 1$, cannot
be tangent at a finite distance.
It follows that the test boundary cannot under such conditions be tangent to
the critic-ellipse $B(\xi, \eta) = D$ for $D > 1$.
To see this note that, in accordance with what was said in §3, such tangency would
lead to tangency of the test boundary of the "upset" test with the critic-hyperbola
$A(\xi, \eta) = D$, and this, as we have shown, is impossible. This completes the
proof of the lemma.
Lemma 8.4.2. If the test boundary $\xi = \varphi(\eta)$ is continuous on $[0, 1]$, then it
has a zero on $(0, 1]$.
In fact, this follows at once from the theorem on "null-regular tests" which
is proved in §1 of Chapter IX; the proof of this theorem is completely independent
of §§3 and 4 of Chapter VIII. By Lemma 8.3.1 we know moreover that
$\xi_0 = \varphi(0) \neq 0$.
Lemma 8.4.3. 1) Let $\xi = \varphi(\eta)$ denote a function that is continuous on $[0, 1]$
and that has a finite derivative everywhere on $[0, 1)$, with the right-hand derivative
meant at $\eta = 0$. Suppose that the size of the sample is $n_2 \geq 4$. Then the
curve $\xi = \varphi(\eta)$ is tangent to a nondegenerate critic-ellipse at a point $(\xi_0, \eta_0)$,
$0 < \eta_0 < 1$.
To prove this lemma we need only show that the (one-sided) derivative $\varphi'(0)$
is positive at the point $\eta = 0$. This is true because $\varphi(0) = \xi_0 > 0$ in accordance
with Lemma 8.3.1 and, if $\varphi'(0)$ is greater than 0, the curve $\xi = \varphi(\eta)$ issuing
from the point $(\xi_0, 0)$ has a right-hand tangent.
By Lemma 8.4.2 the curve $\xi = \varphi(\eta)$ then reaches the point $(0, \gamma)$, where
$\gamma \in (0, 1]$ is a zero of the function $\varphi(\eta)$; i.e. it must "turn to the left". It is
obvious from geometric considerations that it is tangent to a suitably chosen
critic-ellipse $B(\xi, \eta) = D$, $D > 1$, that encircles the curve $\xi = \varphi(\eta)$ for $0 \leq \eta \leq 1$.
If $M = \sup_{0 \leq \eta \leq 1} \varphi(\eta)$ we can find an upper bound for the ordinate of the point of
intersection $(0, \eta'_0)$ of the critic $B(\xi, \eta) = D$ with the $\eta$-axis.
Indeed, the equation of our critic is

$$\frac{\xi^2}{D-1} + \frac{\eta^2}{D} = 1.$$

1) This lemma was proved under somewhat more stringent hypotheses by Salaevskiĭ
(see [49]).

From this one can easily see that

$$\eta'_0 \leq \tfrac32 (M+1). \tag{8.4.15}$$

It remains to show that

$$\xi'_0 = \varphi'(0) > 0. \tag{8.4.16}$$

The derivation of this inequality is rather complicated.
We write the zone of acceptance of the null hypothesis $H_0$: $a_1 = a_2$ (complement
to the critical zone) in the form

or

$$\frac{|\bar x - \bar y|}{s_2} < \varphi\!\left(\frac{s_1^2}{s_2^2}\right).$$

In accordance with what was shown in §3, we can write this inequality in the
form

$$|X_1| < \sqrt{\frac{\chi^2_{n_2-1}}{1+\theta}}\;\varphi\!\left(\theta\,\frac{\chi^2_{n_1-1}}{\chi^2_{n_2-1}}\right), \tag{8.4.17}$$

where $X_1 \in N(0, 1)$, $\chi^2_{n_1-1}$ and $\chi^2_{n_2-1}$ are independent random variables, two of
which have a $\chi^2$ distribution with the number of degrees of freedom indicated in
the subscripts, and $\theta = n_2\sigma_1^2/n_1\sigma_2^2$.


We define

$$G(u) = \frac{2}{\sqrt{2\pi}} \int_0^u e^{-\frac{t^2}{2}}\,dt = P(|X_1| < u), \quad \text{and} \quad g(u) = \frac{2}{\sqrt{2\pi}}\,e^{-\frac{u^2}{2}}.$$
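
The functions $G$ and $g$ are easy to evaluate. The sketch below is not part of the original text: it uses the standard identity $G(u) = \operatorname{erf}(u/\sqrt{2})$ for a standard normal $X_1$ and checks it against a direct quadrature of the defining integral, together with the relation $g = G'$. The step counts and sample points are arbitrary choices.

```python
import math

def G(u):
    """G(u) = P(|X_1| < u) for X_1 ~ N(0, 1), written with the error function."""
    return math.erf(u / math.sqrt(2.0))

def g(u):
    """g(u) = (2 / sqrt(2*pi)) * exp(-u^2 / 2), the derivative of G."""
    return 2.0 / math.sqrt(2.0 * math.pi) * math.exp(-u * u / 2.0)

def G_by_quadrature(u, steps=20000):
    """Midpoint rule for (2 / sqrt(2*pi)) * integral of exp(-t^2/2) over [0, u]."""
    h = u / steps
    return sum(g((i + 0.5) * h) for i in range(steps)) * h

quadrature_error = max(abs(G(u) - G_by_quadrature(u)) for u in (0.5, 1.0, 1.96, 3.0))

h = 1e-5
derivative_error = max(abs((G(u + h) - G(u - h)) / (2 * h) - g(u))
                       for u in (0.0, 0.7, 2.0))

print(G(1.96), quadrature_error, derivative_error)
```

For example, `G(1.96)` is close to 0.95, the familiar two-sided normal probability.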
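If $\alpha \in (0, 1)$ is the level of our similar test, then

$$E_\theta\,G\!\left[\sqrt{\frac{\chi^2_{n_2-1}}{1+\theta}}\;\varphi\!\left(\theta\,\frac{\chi^2_{n_1-1}}{\chi^2_{n_2-1}}\right)\right] = 1 - \alpha, \qquad 0 < \theta < \infty.$$

If we set $\chi^2_{n_1-1} = X$ and $\chi^2_{n_2-1} = Y$, we have

$$E_\theta\,G\!\left[\sqrt{\frac{Y}{1+\theta}}\;\varphi\!\left(\theta\,\frac{X}{Y}\right)\right] = 1 - \alpha, \qquad 0 < \theta < \infty. \tag{8.4.18}$$

So far we have made no assumptions regarding the differentiability of $\varphi(x)$. Let
us formally differentiate equation (8.4.18) with respect to $\theta$ and then set (formally)
$\theta = 0$. We obtain

$$E\,g\!\left[\varphi(0)\sqrt{Y}\right]\left[\sqrt{Y}\,\frac{X}{Y}\,\varphi'(0) - \frac12\sqrt{Y}\,\varphi(0)\right] = 0. \tag{8.4.19}$$

If this formally constructed relationship were valid, (8.4.16) would follow easily
from it. To see this, suppose that $\varphi'(0) \leq 0$. Since $\varphi(0) = \xi_0 > 0$, the left-hand
member of (8.4.19) would be negative, which would contradict (8.4.19).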
Thus we need to show that equation (8.4.19) is valid. Let us assume that
for $x \geq 0$ the function $\varphi(x)$ is measurable and has a finite one-sided derivative
$\xi'_0$ at $x = 0$. Taking into consideration the probability densities for $X$ and $Y$
(see §2) and denoting the left-hand member of (8.4.18) by $E(\theta)$, we obtain, for
$\theta > 0$,

$$E(\theta) = \int_0^\infty \int_0^\infty G\!\left[\sqrt{\frac{Y}{1+\theta}}\;\varphi\!\left(\theta\,\frac{X}{Y}\right)\right] p_1(X)\,p_2(Y)\,dX\,dY = 1 - \alpha, \tag{8.4.20}$$

where

$$p_1(X) = C_1 X^{\frac{n_1-3}{2}} \exp\!\left(-\frac{X}{2}\right), \qquad p_2(Y) = C_2 Y^{\frac{n_2-3}{2}} \exp\!\left(-\frac{Y}{2}\right).$$

We note that $G(t) = O(1)$ for arbitrary real $t$. Thus $E(\theta)$ exists even at $\theta = 0$.
We need to show that $E(\theta)$ is continuous, i.e. that $\lim_{\theta \to 0} E(\theta) = E(0)$, from which
it will follow that $E(0) = 1 - \alpha$.
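Let us make the substitutions $X/Y = U$ and $Y = V$. Then $|\partial(X, Y)/\partial(U, V)| = V$
and for $\theta > 0$ we have, by virtue of (8.4.20),

$$E(\theta) = \int_0^\infty dU \int_0^\infty dV\; V\,G\!\left[\sqrt{\frac{V}{1+\theta}}\;\varphi(\theta U)\right] p_1(UV)\,p_2(V) = 1 - \alpha. \tag{8.4.21}$$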

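Let us set $D(\theta) = E(\theta) - E(0)$ and represent this function as an integral of the
corresponding difference; $D(\theta) = 1 - \alpha - E(0) = \text{const}$. Let us show that $D(\theta) \to 0$
as $\theta \to 0$, so that $D(\theta) = 0$ for $\theta \geq 0$. We break the integral with respect to $U$
into two integrals by setting

$$D_0(\theta) = \int_0^{\theta^{-1+\varepsilon}} dU \int_0^\infty dV\,(\ ), \qquad D_1(\theta) = \int_{\theta^{-1+\varepsilon}}^\infty dU \int_0^\infty dV\,(\ ).$$

Here $\varepsilon$ is a small positive number.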


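As $\theta \to 0$ we have

$$D_1(\theta) = O(1) \int_{\theta^{-1+\varepsilon}}^\infty \frac{U^{\frac{n_1-3}{2}}\,dU}{(1+U)^{\frac{n_1+n_2}{2}-1}} = O\!\left(\theta^{(1-\varepsilon)\frac{n_2-1}{2}}\right).$$

Let us require that $(n_2 - 1)/2 > 1$. Then $n_2 \geq 4$ and

$$D_1(\theta) = O\!\left(\theta^{\frac32 - 2\varepsilon}\right). \tag{8.4.22}$$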


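Now note that

$$D_0(\theta) = \int_0^{\theta^{-1+\varepsilon}} dU \int_0^\infty dV\; V\,p_1(UV)\,p_2(V) \left\{ G\!\left[\sqrt{\frac{V}{1+\theta}}\;\varphi(\theta U)\right] - G\!\left[\sqrt{V}\,\varphi(0)\right] \right\}. \tag{8.4.23}$$

Here $0 \leq \theta U \leq \theta^{\varepsilon}$. Since $\varphi(x)$ is assumed to be differentiable from the right at
$x = 0$, we have

$$\varphi(\theta U) = \varphi(0) + \theta U(\varphi'(0) + \eta),$$

where $\eta \to 0$ as $\theta \to 0$ (and hence as $\theta U \to 0$). Therefore

$$G\!\left[\sqrt{\frac{V}{1+\theta}}\;\varphi(\theta U)\right] - G\!\left[\sqrt{V}\,\varphi(0)\right] = \frac{2}{\sqrt{2\pi}} \int_{\sqrt{V}\varphi(0)}^{\sqrt{\frac{V}{1+\theta}}\,\varphi(\theta U)} \exp\!\left[-\frac{\tau^2}{2}\right] d\tau$$
$$= \frac{2}{\sqrt{2\pi}} \int_{\sqrt{V}\varphi(0)}^{\sqrt{\frac{V}{1+\theta}}\,(\varphi(0) + \theta U(\varphi'(0) + \eta))} \exp\!\left[-\frac{\tau^2}{2}\right] d\tau. \tag{8.4.24}$$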

By virtue of the mean-value theorem, we have the following estimate for this
integral when $\theta$ is sufficiently small:
0(1) OU exp (- · v:o) yv,where~0 =c:p(O) > 0.
If we substitute this into (8.4.23), we obtain

n 1-1
D 0 (6) = 0 (1) 6 U-2-dUX

X Jv~·+:,-a
(j
exp [ - ~ ( 1 + U + ~)] V dV = o (6)

where $n_2 \geq 4$. Thus $\lim_{\theta \to 0} D_0(\theta) = 0$ and hence $D(\theta) = 0$ for all $\theta > 0$.
Therefore by (8.4.22),

$$D_0(\theta) = O\!\left(\theta^{\frac32 - 2\varepsilon}\right). \tag{8.4.25}$$

We can extend this reasoning to (8.4.24). For arbitrary $V > 0$,

$$\sqrt{\frac{V}{1+\theta}}\,\bigl(\varphi(0) + \theta U(\varphi'(0) + \eta)\bigr) = \sqrt{V}\left(\varphi(0) - \frac{\theta}{2}\varphi(0) + \theta U\varphi'(0)\right) + \eta\theta(U + \sqrt{V}), \tag{8.4.26}$$

where $\eta \to 0$ as $\theta \to 0$. Proceeding as we did in deriving the above estimate, we
see that if we drop the term $\eta\theta(U + \sqrt{V})$ in (8.4.26), we obtain an error $O(\eta\theta)$ in
(8.4.23). In other words, if we set

$$\varphi(0) - \frac{\theta}{2}\varphi(0) + \theta U\varphi'(0) = f(\theta, U),$$

then the expression
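
$$\frac{2}{\sqrt{2\pi}} \int_0^{\theta^{-1+\varepsilon}} dU \int_0^\infty dV\; V\,p_1(UV)\,p_2(V) \int_{\sqrt{V}\varphi(0)}^{\sqrt{V} f(\theta, U)} \exp\!\left[-\frac{\tau^2}{2}\right] d\tau \tag{8.4.27}$$

will differ from (8.4.23) by no more than $O(\eta\theta)$. If we extend the integral with
respect to $U$ in (8.4.27) from 0 to $\infty$, i.e. if we add to (8.4.27) the term

$$\frac{2}{\sqrt{2\pi}} \int_{\theta^{-1+\varepsilon}}^\infty dU \int_0^\infty dV\,(\ ),$$

where the parentheses represent the same integrand, we obtain the error
$O(\theta^{3/2 - 2\varepsilon})$, just as in the derivation of (8.4.22). From this we finally obtain

$$D_0(\theta) = \frac{2}{\sqrt{2\pi}} \int_0^\infty dU \int_0^\infty dV\; V\,p_1(UV)\,p_2(V) \int_{\sqrt{V}\varphi(0)}^{\sqrt{V} f(\theta, U)} \exp\!\left[-\frac{\tau^2}{2}\right] d\tau + O(\eta\theta). \tag{8.4.28}$$
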

Furthermore, in accordance with (8.4.25),

$$D_0(\theta) - D_0(0) = O\!\left(\theta^{\frac32 - 2\varepsilon}\right).$$

Let us divide both sides of this equation by $\theta$ and then let $\theta$ approach 0. This
gives us the result that the quantity $(d/d\theta) \int_0^\infty dU \int_0^\infty dV\,(\ )$ vanishes at $\theta = 0$.
Here the quantity $\int_0^\infty dU \int_0^\infty dV\,(\ )$ is the integral in the right-hand member of
(8.4.28) with $f(\theta, U)$ a linear function of $\theta$ and $U$. Therefore it is possible to
differentiate under the integral sign in (8.4.28). When we do this we obtain

$$\frac{2}{\sqrt{2\pi}} \int_0^\infty dU \int_0^\infty dV\; V\,p_1(UV)\,p_2(V)\,\exp\!\left[-\frac{(\sqrt{V}\varphi(0))^2}{2}\right] \sqrt{V} \left(-\frac12 \varphi(0) + U\varphi'(0)\right) = 0. \tag{8.4.29}$$

This coincides with equation (8.4.19) and completes the proof of Lemma 8.4.3.

§5. COMPLETION OF THE PROOF OF THEOREM 8.3.1

It is now easy for us to prove Theorem 8.3.1. Suppose that $n_2 \geq 4$. Then by
Lemma 8.4.3 there will be tangency, under the hypothesis of Theorem 8.3.1, to
the nondegenerate ellipse-critic $B(\xi, \eta) = D$, where $D > 1$. By (8.4.15) this
ellipse will intersect the $(\xi = 0)$-axis at a point $(0, \eta'_0)$, where
$1 < \eta'_0 \leq (3/2)(M + 1)$. By the hypothesis of Theorem 8.3.1, $\varphi(\eta)$ satisfies, for
$1 < \eta < 2(M + 1)$, a Lipschitz condition with exponent 1. By Lemma 8.4.1 the test
$|\xi| \geq \varphi(\eta)$ cannot be similar. This completes the proof of Theorem 8.3.1.
We note that this theorem can be strengthened by requiring merely that the
test be similar in a countable set of pairs of values $(\sigma_1, \sigma_2)$ such that the ratios
$\sigma_1/\sigma_2$ are distinct and their values are bounded.
In fact, the principle of analytic continuation as applied to formula (8.4.1) for
the points $\theta = n_2\sigma_1^2/n_1\sigma_2^2$ shows that this formula is valid for all $\theta > 0$.
CHAPTER IX

RANDOMIZED HOMOGENEOUS TESTS IN THE
BEHRENS-FISHER PROBLEM. CHARACTERIZATION OF
TESTS OF THE BARTLETT-SCHEFFÉ TYPE

§1. NONEXISTENCE OF "NULL-REGULAR" SIMILAR TESTS


We retain the previous notation for the different statistics and parameters
used in the Behrens-Fisher problem. Homogeneous tests $\varphi(\xi, \eta)$ will depend
only on the two statistics $\xi = (\bar x - \bar y)/s_2$ and $\eta = s_1/s_2$. It is natural for us to
consider the statistic

(9.1.1)

which, as is known (see for example [33]), is a fundamental instrument in the
construction of similar tests for Student's problem (see Example 2, §1, Chapter
III). For any homogeneous test $\varphi(\xi, \eta)$ it is natural to require that it accept the
null hypothesis with probability 1 if the fraction $t$ is sufficiently small. To reject
the null hypothesis $H_0$: $a_1 = a_2$ if the standardized difference of the means
is sufficiently small is foolish. It is natural to consider tests $\varphi(\xi, \eta)$ that lead
to such solutions as being "irregular" in some sense or other. Let us make more
precise what we mean by defining the property of null-regularity of a test $\varphi(\xi, \eta)$
(see [44]).
The property of null-regularity of a test $\varphi(\xi, \eta)$. Suppose that there exists a
number $\eta_0 = 1 + \eta_1 > 1$ such that it is possible to draw a circle around every
point on the segment $\xi = 0$, $0 \leq \eta \leq \eta_0$ with the property that, when the point
$(\xi, \eta)$ falls in that circle, the conditional probability of rejecting the null
hypothesis $H_0$ on the basis of the test $\varphi(\xi, \eta)$ is zero.
If the test is unrandomized, i.e. if the function $\varphi(\xi, \eta)$ is always equal to 0
or 1 and if the boundary of the critical zone (the zone in which $\varphi(\xi, \eta) = 1$)
consists of finitely many continuous curves, then the property of null-regularity of
the test consists simply in the fact that it does not reject the null hypothesis if
$\bar x - \bar y = 0$ and $s_1^2 + s_2^2 > 0$, which is an extremely simple and natural condition.
This natural requirement on the test is, as it turns out, incompatible with the
requirement of similarity of the test and a fortiori with the requirement of
unbiasedness. If $\max(n_1, n_2) \geq 3$ we have
Theorem 9.1.1. There are no randomized homogeneous similar (or a fortiori
unbiased) tests for the Behrens-Fisher problem that possess the property of
null-regularity.
Let us make a few remarks concerning this theorem.
In a certain sense, it does not admit an improvement since the conclusion
ceases to hold if we discard the requirement of null-regularity: similar randomized
tests without this property do exist. In Chapter V we constructed $\varepsilon$-complete
systems of such tests.
Let us give a simple example of the violation of the property of null-regularity.
It might be noted that the distribution density of the statistics $(\xi, \eta)$ that we shall
write out in what follows is an even function of $\xi$ for all $\eta$ under the null
hypothesis $H_0$. Let $\varphi_1(\xi, \eta)$ denote a continuous function of $\xi$ and $\eta$ in the
half-plane $\Omega$: $-\infty < \xi < \infty$, $0 \leq \eta < \infty$, which for arbitrary values of $\eta$ is an odd
function of $\xi$ and satisfies the condition

Let us set

$$\varphi(\xi, \eta) = \frac12 + \varphi_1(\xi, \eta).$$

Then by what was said above,

$$E\varphi(\xi, \eta) = \frac12$$

under the hypothesis $H_0$, so that $\varphi(\xi, \eta)$ is a randomized similar test. However,
the property of null-regularity is obviously violated.
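
The symmetry argument in this example can be imitated on a grid. The sketch below is an illustration under explicit assumptions, not the book's construction: the weight `kernel` is taken to be proportional to the density of $(\xi, \eta)$ (it is the integrand of the basic relationship, and is even in $\xi$), the particular odd function `phi1` is an arbitrary choice, and the bound $|\varphi_1| \leq 1/2$ is assumed so that $\varphi$ takes values in $[0, 1]$.

```python
import math

def kernel(xi, eta, theta, n1, n2):
    """Assumed weight, even in xi:
    eta^(n1-2) / (theta^2 + theta*(1 + xi^2 + eta^2) + eta^2)^N."""
    N = (n1 + n2 - 1) / 2.0
    base = theta * theta + theta * (1.0 + xi * xi + eta * eta) + eta * eta
    return eta ** (n1 - 2) / base ** N

def phi1(xi, eta):
    """Continuous, odd in xi, with |phi1| <= 1/2 (illustrative choice)."""
    return 0.5 * math.tanh(xi) / (1.0 + eta)

def phi(xi, eta):
    """Randomized test phi = 1/2 + phi1."""
    return 0.5 + phi1(xi, eta)

def expected_phi(theta, n1=5, n2=6, half_width=30.0, steps=300):
    """Weighted mean of phi over a grid that is symmetric in xi."""
    h = half_width / steps
    num = den = 0.0
    for j in range(steps):
        eta = (j + 0.5) * h
        for i in range(steps):
            xi = (i + 0.5) * h
            w = kernel(xi, eta, theta, n1, n2)
            # Pair the point (xi, eta) with its mirror image (-xi, eta).
            num += w * (phi(xi, eta) + phi(-xi, eta))
            den += 2.0 * w
    return num / den

levels = [expected_phi(theta) for theta in (0.2, 1.0, 5.0)]
print(levels, phi(0.0, 1.0))
```

The weighted mean equals 1/2 for every $\theta$ because the odd part cancels in mirror pairs, while `phi(0.0, 1.0)` is 1/2 rather than 0, so the test rejects with positive probability on the segment $\xi = 0$.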
We note also that from an analytical point of view this theorem can be
strengthened. Instead of similarity of the test $\varphi(\xi, \eta)$ with respect to all values
of $\sigma_1$ and $\sigma_2$, we need only require its similarity with respect to an arbitrary
countable set of distinct values contained in some finite segment.

Proof of Theorem 9.1.1. Let us apply the basic integral relationship of the
type (8.2.3). For randomized tests this relationship is obviously written the same
as for unrandomized ones, i.e. in the form

(9.1.2)

where $N = (n_1 + n_2 - 1)/2$, $\alpha\ (> 0)$ is the level, $C_{n_1 n_2} > 0$ and $\Omega$ is the
half-plane $-\infty < \xi < \infty$, $0 < \eta < \infty$. Equation (9.1.2) holds for all positive values of $\theta$.
Just as we did above, we shall carry out an analytic continuation with respect
to $\theta = \sigma + i\tau$ onto the region $\Lambda$, which is the complex plane with a cut along the
negative axis $\tau = 0$, $-\infty < \sigma < 0$. The left-hand member can also be extended to
this region. The integral in (9.1.2) converges absolutely and uniformly in every
closed subregion contained in $\Lambda$.
Thus equation (9.1.2) holds in the entire region $\Lambda$ of values of the parameter $\theta$.
We note that the same conclusion follows from the weaker assumption that the
test $\varphi(\xi, \eta)$ is similar for an arbitrary countable bounded set of values of $\sigma_1/\sigma_2$
and hence for values of $\theta$. Elementary application of the principle of analytic
continuation allows us in this case also to assert that equation (9.1.2) is valid
for the entire region $\Lambda$ of values of $\theta$.
If we set

we can always assume, as was shown in Chapter VIII, that

$$0 \le A(\xi, \eta) \le 1, \qquad B(\xi, \eta) \ge 1.$$


We shall call the geometric loci defined by the equations

$$A(\xi, \eta) = D, \qquad B(\xi, \eta) = D,$$

the critical curves (or simply critics) of the family of measures that is generated
by our probability densities and the parameter $\theta$. For $D < 1$ the equation
$A(\xi, \eta) = D$ yields the family of hyperbolas

$$\frac{\eta^2}{D} - \frac{\xi^2}{1 - D} = 1. \tag{9.1.3}$$

For $D = 1$ the hyperbola degenerates into the twice-covered ray $\xi = 0$, $1 \le \eta < \infty$.
For $D = 0$ it degenerates into the ($\eta = 0$)-axis.

For $D > 1$ the equation $B(\xi, \eta) = D$ yields the family of confocal semi-ellipses

$$\frac{\xi^2}{D - 1} + \frac{\eta^2}{D} = 1. \tag{9.1.4}$$

For $D = 1$ these semi-ellipses degenerate into the twice-covered segment $\xi = 0$,
$0 \le \eta \le 1$.
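The two families of critical curves can be checked numerically. The sketch below is only an illustration under an assumption: it takes $A$ and $B$ to be the two roots $D$ of the quadratic $D^2 - (1 + \xi^2 + \eta^2) D + \eta^2 = 0$, which is what both (9.1.3) and (9.1.4) reduce to after clearing denominators. The function names are hypothetical and do not come from the text.

```python
import math

def critical_values(xi, eta):
    """Return (A, B): the two roots D of D^2 - (1 + xi^2 + eta^2) D + eta^2 = 0.

    Assumption: A and B are exactly these roots, i.e. the labels of the
    hyperbola (9.1.3) and the semi-ellipse (9.1.4) through the point (xi, eta).
    """
    s = 1.0 + xi * xi + eta * eta
    disc = math.sqrt(s * s - 4.0 * eta * eta)  # nonnegative since s >= 2*eta
    return (s - disc) / 2.0, (s + disc) / 2.0

def hyperbola_lhs(xi, eta, d):
    # Left-hand side of (9.1.3); meaningful for 0 < d < 1.
    return eta ** 2 / d - xi ** 2 / (1.0 - d)

def ellipse_lhs(xi, eta, d):
    # Left-hand side of (9.1.4); meaningful for d > 1.
    return xi ** 2 / (d - 1.0) + eta ** 2 / d

A, B = critical_values(0.7, 1.3)
print(A, B)
print(hyperbola_lhs(0.7, 1.3, A), ellipse_lhs(0.7, 1.3, B))
```

Under this assumption the smaller root always lies in $[0, 1]$ and the larger one in $[1, \infty)$, in agreement with the bounds on $A$ and $B$ stated above, and the product of the roots equals $\eta^2$.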
We can rewrite the basic equation (9.1.2) in the form

$$\iint_{\Omega} \frac{\varphi(\xi, \eta)\, \eta^{n_1 - 2}\, d\xi\, d\eta}{[(\theta + A(\xi, \eta))(\theta + B(\xi, \eta))]^{N}} \tag{9.1.5}$$

By the null-regularity of the test $\varphi(\xi, \eta)$ there exists a number $\eta_0 = 1 + \eta_1 > 1$
such that it is possible to draw a circle around every point of the segment $\xi = 0$,
$0 \le \eta \le \eta_0$ in which $\varphi(\xi, \eta) = 0$ almost everywhere in the sense of Lebesgue
measure. By the Heine-Borel theorem there exists a finite covering consisting of
certain of these circles, so that we obtain an open region $\Gamma$ containing our
segment such that $\varphi(\xi, \eta) = 0$ for almost all points $(\xi, \eta) \in \Gamma$.
Let us now look at the family of confocal semi-ellipses (9.1.4). Let us
choose $D > 1$ so close to 1 that the corresponding semi-ellipse $B(\xi, \eta) = D$ lies
entirely in the region $\Gamma$ (the same will of course be true for smaller values of $D$).
Now let us consider larger values of $D$. Let $D_0$ ($> 1$) denote the least upper bound
of the set of values of $D$ for which $\varphi(\xi, \eta) = 0$ almost everywhere inside the
ellipse (9.1.4).
If $D_0 = \infty$ then $\varphi(\xi, \eta) = 0$ almost everywhere on $\Omega$. The test is then a
trivial similar test. We disregard this case and assume that $D_0$ is a finite number.
The notation that we shall need is as follows. We denote by $H$ a quantity
(not always the same one) that remains bounded as the different parameters of the
problem in question vary. We denote by $\zeta_0, \zeta_1, \ldots$ small positive constants
each dependent on the preceding ones. We denote by $L_\theta^{(k)}$, where $k$ is a
nonnegative integer, the operator for $k$ differentiations with respect to $\theta$ at the point
$\theta \in \Lambda$.
We have $D_0 > 1$. We set

Do-1 =~o• 1"


..,1=mm
• ( ~o
2· ~)·
Suppose that $\theta$ varies in a rectangle $Q \subset \Lambda$:

(9.1.6)

Our remaining considerations are based on the application of the operator $L_\theta^{(k)}$
to both sides of equation (9.1.5) with $\theta \in Q$. We shall see that, for suitable choice
of $\theta \in Q$ and sufficiently large $k$, the order of growth with respect to $k$ is not the
same on the two sides of equation (9.1.5), which provides the desired contradiction.
From the definition of $D_0$, the function $\varphi(\xi, \eta)$ is zero almost everywhere
inside the semi-ellipse $B(\xi, \eta) < D_0$. We denote by $\Omega_0$ the half-plane $\Omega$ from
which we have removed the semi-ellipse referred to. We denote by $I_0(\theta)$ the
corresponding integral, which coincides with the left-hand member of (9.1.5). We
have

Note also that

(9.1.8)

so that for $\theta \in Q$,
(9.1.9)

by virtue of the definition of $\zeta_1$.

We consider first the case in which the sizes $n_1$ and $n_2$ of the samples are
of opposite parity. Then the number $N = (n_1 + n_2 - 1)/2$ is an integer, and the
integrand in (9.1.7) is rational and has no branches.
From the integral $I_0(\theta)$, we take out the integral

$$I_1(\theta) = \iint_{B(\xi, \eta) \ge D_0 + 1} \tag{9.1.10}$$

and find an upper bound for the quantity

$$L_\theta^{(k)} I_1(\theta) \tag{9.1.11}$$

for $\theta \in Q$ and large values of $k$. By virtue of (9.1.9), we obtain

L(ml 1 =H f(N+m>. (9.1.12)


e ce+A>N 'f
(The meaning of the symbol H was explained above. In particular, H • H = H and
ct= H..)
Now let $B = B(\xi, \eta)$ vary in the region
(9.1.13)

where $D \ge D_0 + 1$. Then


(9.1.14)
Consequently
$$L_\theta^{(m)} \frac{1}{(\theta + B)^N} = H\, \frac{\Gamma(N + m)}{(D - D_0)^{m + N}}. \tag{9.1.15}$$

Now let us find a bound for the quantity

$$L_\theta^{(k)} \frac{1}{(\theta + A)^N (\theta + B)^N} \tag{9.1.16}$$

where $\theta \in Q$ and $B = B(\xi, \eta)$ belongs to the region (9.1.14). To do this we use
(9.1.12), (9.1.15) and the familiar formula for differentiating the product of two
functions. We find that

(9.1.17)

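Since $C_k^m = k!/m!\,(k - m)!$, we find, after some simple calculation, that

$$L_\theta^{(k)} \frac{1}{(\theta + A)^N (\theta + B)^N} = H\, \Gamma(k + 1)\,(D - D_0)^{-N}\, \zeta_1^{-k}\, k^{2N} = H\, \Gamma(k + 1)\,(D - D_0)^{-N}\, \zeta_2^{-k}, \tag{9.1.18}$$

where $\zeta_2 = \zeta_1/2$.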
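The "familiar formula for differentiating the product of two functions" used here is the Leibniz rule. As a hedged illustration (integer $N$ only, exact rational arithmetic, hypothetical function names), the sketch below evaluates the $k$-th derivative of $1/((\theta + A)^N (\theta + B)^N)$ by the Leibniz rule and, for $N = 1$, compares it with the closed form obtained from partial fractions. The numerical values of $\theta$, $A$, $B$ are arbitrary test values, not quantities from the proof.

```python
from fractions import Fraction
from math import comb, factorial

def rising(n, m):
    """Rising factorial n (n+1) ... (n+m-1), equal to Gamma(n+m)/Gamma(n)."""
    out = Fraction(1)
    for j in range(m):
        out *= n + j
    return out

def deriv_power(theta, shift, n, m):
    """m-th derivative in theta of (theta + shift)**(-n), for integer n >= 1."""
    return (-1) ** m * rising(n, m) / Fraction(theta + shift) ** (n + m)

def deriv_product(theta, a, b, n, k):
    """k-th derivative of 1 / ((theta + a)**n * (theta + b)**n) by the Leibniz rule."""
    return sum(
        comb(k, m) * deriv_power(theta, a, n, m) * deriv_power(theta, b, n, k - m)
        for m in range(k + 1)
    )

def deriv_partial_fractions(theta, a, b, k):
    """Closed form for n = 1, from 1/((t+a)(t+b)) = (1/(t+a) - 1/(t+b)) / (b - a)."""
    t = Fraction(theta)
    return (-1) ** k * factorial(k) * ((t + a) ** (-k - 1) - (t + b) ** (-k - 1)) / (b - a)

theta, a, b = Fraction(1, 3), Fraction(1, 2), Fraction(5, 2)
print(deriv_product(theta, a, b, 1, 6) == deriv_partial_fractions(theta, a, b, 6))
```

Each term of the Leibniz sum carries a factor $\Gamma(N + m)\,\Gamma(N + k - m)$ up to constants, which is where the factor $\Gamma(k + 1)\, k^{2N}$ in the estimate comes from.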
From $I_1(\theta)$ (see (9.1.10)) let us take out the integral over the region defined
by (9.1.13); we denote this integral by $I_{1D}$. In this region (which is of the form
of a horseshoe between two semi-ellipses), we have

$$\eta = H (D - D_0)^{1/2}.$$

The area of this horseshoe is $H (D - D_0)^{1/2}$. Remembering (9.1.18), we see that
for $\theta \in Q$,

$$L_\theta^{(k)} I_{1D}(\theta) = H\, \Gamma(k + 1)\, \zeta_2^{-k}\, (D - D_0)^{\frac{n_1}{2} - N - \frac{1}{2}}. \tag{9.1.19}$$

By the assumption that $\max(n_1, n_2) \ge 3$, we may assume that

$$n_1 \le n_2, \qquad \frac{n_1}{2} - N - \frac{1}{2} = -\frac{n_2}{2} \le -\frac{3}{2}.$$

In this case we may sum the estimates (9.1.19) for the values $D = D_0 + 1$,
$D_0 + 2, \ldots$. We obtain
(9.1.20)
Let us set

$$I_{00}(\theta) = I_0(\theta) - I_1(\theta) = \iint_{D_0 \le B(\xi, \eta) < D_0 + 1} \tag{9.1.21}$$

Let $R > 100$ denote a large number (with value to be fixed later) such that
$M = R\zeta_2^{-1}$ is an integer. Define

$$\Delta = \zeta_2 R^{-1}, \qquad F_{0,r}(\theta) = \iint_{D_0 + r\Delta \le B(\xi, \eta) < D_0 + (r + 1)\Delta} \tag{9.1.22}$$

so that
(9.1.23)

We need to study the behavior of

$$L_\theta^{(k)} F_{0,r}(\theta) \tag{9.1.24}$$

in the region

$$D_0 + r\Delta < B(\xi, \eta) < D_0 + (r + 1)\Delta \tag{9.1.25}$$

with $\theta \in Q$. We have
(9.1.26)

Let us define $\delta = \Delta/2$. Thus the value of the number $\theta = -D_0 + \delta + i\zeta$ depends
on the number $R$ and the number $\zeta$, which we shall identify later. From (9.1.26)
we obtain
1

10+ ~I= ((r + })2 ll2 +c f:.


2

Let us assume also that $0 < \zeta \le \Delta$. Then in the region (9.1.25) and for the values
of $\theta$ indicated we have
1

(r+~)A~JO+BJ~A((r+ ~r + i):r. (9.1.27)

Let us return now to (9.1.24). On the basis of (9.1. 27) the expression (9.1.26)
is replaced with

(9.1.28)

and the estimate (9.1.12) remains valid. Furthermore, for k >0 ,

where

(9.1.30)

if (r + 1/2)A < C:/2. On the other hand, if (r + l/2)A ~ C:/2 we conclude from
(9.1.29) and the second of the three expressions (9.1.30) that

L~11, 1 =H r (N + k) (9.1.31)
(0+B)N (0+A)N ~~ '

where C:3 = C:/2.


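If we now use (9.1.29) and (9.1.27), we obtain

$$\left| L_\theta^{(k)} \frac{1}{(\theta + A)^N (\theta + B)^N} \right| > c_0\, \frac{\Gamma(N + k)}{|\theta + B|^{N + k}} \tag{9.1.32}$$

for

$$\left(r + \tfrac{1}{2}\right)\Delta < \zeta_4 \zeta_1, \tag{9.1.33}$$

where $c_0$ is a positive constant.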


Furthermore, if the number $\zeta$ (see (9.1.6)) is sufficiently small, for example
if $0 < \zeta < \Delta/10k^2$, then by (9.1.8) we have, for even $k + N$,

$$\operatorname{Re} L_\theta^{(k)} \frac{1}{(\theta + A)^N (\theta + B)^N} > \frac{1}{2}\, c_0\, \frac{\Gamma(N + k)}{|\theta + B|^{N + k}}. \tag{9.1.34}$$

If $(r + 1/2)\Delta \ge \zeta_4 \zeta_1$, we obtain the expression

$$L_\theta^{(k)} \frac{1}{(\theta + A)^N (\theta + B)^N} = H\, \frac{\Gamma(N + k)}{(\zeta_4 \zeta_1)^k} \tag{9.1.35}$$

which is analogous to (9.1.31).


Let us now return to the quantities (9.1.22). By definition of the number $D_0$
we must have

$$\iint_{D_0 \le B(\xi, \eta) \le D_0 + \Delta} \eta^{n_1 - 2}\, \varphi(\xi, \eta)\, d\xi\, d\eta > \beta_0(\Delta) > 0, \tag{9.1.36}$$

where $\beta_0(\Delta)$ is a positive number depending on $\Delta$. Let us choose the number $R$
sufficiently great and hence $\Delta$ sufficiently small that $\Delta/2 < \zeta_4 \zeta_1$, so that
condition (9.1.33) is satisfied when $r = 0$. Then from (9.1.32), (9.1.34), and (9.1.36)
we obtain

r (N +k)
(9.1. 37)

Let r 0 denote the largest integer satisfying condition (9.1. 33). Then from
(9.1.37), (9.1.34), and (9.1.33) we obtain

L (k) F 8
Ree(o3() + F01()+
8 ... +Fo,,0 (8))> Co~o2(~)

X r(N+k> (9.1.38)
(~ v-~ t+k ·
By virtue of (9.1.35) we obtain
ReL~k)(Fo, r 0 +1(8)+Fo, r 0 +2(8)+ · · · +Fo,M-1(6))
=H f(N+kk). (9.1.39)
(~4~1)
If we keep (9.1.20) in mind and compare (9.1.29), (9.1.20), and (9.1.38), we
conclude that, if $\Delta$ is chosen sufficiently small in comparison with $\zeta_4 \zeta_1$ and
$\zeta_2$ (which leads to choice of a sufficiently large number $R$), for example if we
give $\Delta$ a value $\le (1/1{,}000) \min(\zeta_4 \zeta_1, \zeta_2)$ and then choose $k$ sufficiently large
and of the same parity as $N$, we obtain

1 f(N+k)
R.e La Io (0) > 4 co~o (A)
(kl
_ N+k (9.1.40)
(~ v~)
for $\theta = -D_0 + \Delta/2 + i\Delta/10k^2$, but this inequality, as we shall see below, leads
to a contradiction.
Let us now consider the case in which the numbers $n_1$ and $n_2$ are of like
parity and let us show that an inequality analogous to (9.1.40) holds. In this case
the number $N = (n_1 + n_2 - 1)/2$ is half of an odd number, and the expression

$$[(\theta + A)(\theta + B)]^N$$

in (9.1.7) has a branch point.


Keeping the previous notation, let us look at a sequence of large integers
$\{k\}$, and for each $k$ let us choose, just as before,

$$\theta = -D_0 + \frac{\Delta}{2} + \frac{i\Delta}{10k^2}.$$
For () E A, if we assume that arc () = 0 for () 1> 0 we find, by virtue of (9.1. 8)
and (9.1.26),
1 = tN 1 •
((0 +A) (0 + B)]N ((- 0 - A) (0 + B))N ,.

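Here, for values of $\theta$ defined by (9.1.42),

$$\operatorname{arc}(-\theta - A) = O\!\left(\frac{1}{k^2}\right), \tag{9.1.43}$$

and for values of $B = B(\xi, \eta)$ with $(\xi, \eta) \in \Omega_0$,

$$\operatorname{arc}(\theta + B) = O\!\left(\frac{1}{k^2}\right). \tag{9.1.44}$$

This means that for sufficiently large $k$,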

1 - tN 1 1
[(0+A)(O+B))N - (-0-A)N (0+B>N'

and we can apply our previous reasoning with minor modifications: instead of
(9.1.12) we write the estimate

Ljt> 1 =H rcN+m> (9.1-46)


(-0-A)N 6i '
which follows from the familiar formula

$$\frac{\Gamma(N + m)}{\Gamma(N)} = N(N + 1) \cdots (N + m - 1).$$
Formula (9.1.15) remains unchanged. We replace $\theta + A$ everywhere with
$-\theta - A$ for the remainder of the discussion. Instead of formula (9.1.29), we
obtain
rNrtk) 1
LJtf (-0-A)N ce+B>N -
_ (-l)kN(N+l) ... <N+k-1) + (9.1-47)
- c0+ 8 >N+k <- 0 -A)N Yk

with the same estimate (9.1-30) for Yk. Consequently,

(-l)krNRe 1 >~ rcN+k> (9.1.48)


(-0-A)N(0+B)N 2 10+B1N+k

for (r + 1/2)~ < ( 4 ( 1.


Otherwise the reasoning proceeds as before except that, in inequalities
(9.1.37), (9.1.38), and (9.1.40), we need to replace $\operatorname{Re} L_\theta^{(k)}(\cdots)$ with
$\operatorname{Re}\,(-1)^k i^{-N} L_\theta^{(k)}(\cdots)$. As a result, we again arrive at a formula of the type
(9.1.40):

k .-N Re L(k)l (0) > Co~o (Ll) r (N + k)


(
- l) L 0 0 4 (ll-V~r+k
(9.1.49)

for

$$\theta = -D_0 + \frac{\Delta}{2} + \frac{i\Delta}{10k^2}. \tag{9.1.50}$$

For values of $\theta$ satisfying equation (9.1.50), we obtain from (9.1.40) and (9.1.49)
the inequality
'I 4ik)I (0) I > Co~o (Ll) r + k)
ll 1-~ r+k
(N (9.1-51)
0 4 (

this time for arbitrary values of $n_1$ and $n_2$ except that $\max(n_1, n_2) \ge 3$.


Let us look at the application of the operator $L_\theta^{(k)}$ to the right-hand member
of (9.1.7); in other words, let us consider the expression

$$L_\theta^{(k)}\, \alpha\, \theta^{-\frac{n_2}{2}} (1 + \theta)^{-N + \frac{1}{2}}.$$

Using the considerations discussed at the beginning of this section, we see that

$$L_\theta^{(k)}\, \alpha\, \theta^{-\frac{n_2}{2}} (1 + \theta)^{-N + \frac{1}{2}} = H\, \frac{\Gamma(N + k)}{\left(D_0 - \frac{\Delta}{2} - 1\right)^{N + k}}. \tag{9.1.52}$$

If $\Delta$ is chosen sufficiently small and then $k$ is chosen sufficiently large, then
(9.1.51) will contradict (9.1.52), which completes the proof of our theorem.
We note that, when the sizes of the samples are of opposite parity and N is
an integer, one can considerably simplify and shorten the proof of the theorem.

§2. BARTLETT-SCHEFFÉ TESTS


In Chapter V we saw how one can construct for the Behrens-Fisher problem a
family of similar tests that is $\varepsilon$-complete in a certain sense. These tests were
randomized and they depended only on sufficient statistics. By what was said in
Chapter IV one can use these tests to construct unrandomized tests by leaving
the $\sigma$-algebra of sufficient statistics. The family of such tests can also be made
$\varepsilon$-complete.
However, certain tests of this type that are convenient and simple in application
have long been known to statisticians. These are the Bartlett-Scheffé tests
(see Neyman [58] and Scheffé [79]). Let us look at these tests.
Bartlett's test deals with the case when two samples $x_1, \ldots, x_{n_1} \in N(a_1, \sigma_1^2)$
and $y_1, \ldots, y_{n_2} \in N(a_2, \sigma_2^2)$ are of the same size $n = n_1 = n_2$. The
critical zones for Bartlett's test are of the form

(9.2.1)

for arbitrary $C > 0$.


The test depends on the linear forms $X = \bar x - \bar y$ and $l_i = (x_i - \bar x) - (y_i - \bar y)$,
$i = 1, 2, \ldots, n$. These last forms are linearly dependent: $\sum_{i=1}^{n} l_i = 0$. The forms
$X$ and $l_1, \ldots, l_n$ are normal variables whose means are zero and whose variances
are respectively $D(X) = (\sigma_1^2 + \sigma_2^2)/n$ and $D(l_i) = (1 - 1/n)(\sigma_1^2 + \sigma_2^2)$. If we divide
both the numerator and the denominator in (9.2.1) by $(\sigma_1^2 + \sigma_2^2)^{1/2}$, we see that
our test is similar. Below we shall show that the left-hand member of (9.2.1) is
distributed like Student's fraction $t_{n_1 - 1}$ with $n_1 - 1$ degrees of freedom.
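A small numerical sketch may make the structure of Bartlett's statistic concrete. It assumes that the left-hand member of (9.2.1) has the form written out later in (9.4.1), namely $(n(n-1))^{1/2} |\bar x - \bar y| / (\sum_i [(x_i - \bar x) - (y_i - \bar y)]^2)^{1/2}$; under that assumption it coincides with the absolute value of the one-sample Student statistic computed from the paired differences $x_i - y_i$. The data and function names are made up for illustration.

```python
import math
import statistics

def bartlett_statistic(x, y):
    """(n(n-1))^(1/2) |xbar - ybar| / (sum_i [(x_i - xbar) - (y_i - ybar)]^2)^(1/2).

    Assumption: this is the left-hand member of (9.2.1), as written in (9.4.1).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have the same size n >= 2")
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    q = sum(((xi - xbar) - (yi - ybar)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(n * (n - 1)) * abs(xbar - ybar) / math.sqrt(q)

def paired_t(x, y):
    """|t| of the one-sample Student statistic on the differences d_i = x_i - y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return abs(statistics.fmean(d)) / (statistics.stdev(d) / math.sqrt(len(d)))

x = [4.1, 5.3, 3.8, 6.0, 5.1]   # illustrative data only
y = [2.9, 7.4, 1.2, 3.3, 4.8]
print(bartlett_statistic(x, y), paired_t(x, y))
```

The statistic is unchanged when both samples are multiplied by a common positive factor, which is the numerical counterpart of dividing numerator and denominator by $(\sigma_1^2 + \sigma_2^2)^{1/2}$.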

Consider the case $n_1 \ne n_2$ (we assume without loss of generality that $n_1 < n_2$).
Here we could use only $n_1$ observations $y_1, \ldots, y_{n_1}$ from the second sample
and construct a similar test of the form (9.2.1). If we did this however, there would
be a definite loss of information. Therefore Scheffé [79] proposed in 1943 a new
variant of a test of the form (9.2.1) for this case.
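We introduce the linear forms

$$l_i = x_i - \sum_{j=1}^{n_2} c_{ij} y_j \quad (i = 1, 2, \ldots, n_1). \tag{9.2.2}$$

These forms $l_i$ constitute a normal vector, which we shall write as a column
matrix

$$l = (l_1, \ldots, l_{n_1})^T = x - Cy,$$

where

$$x = (x_1, \ldots, x_{n_1})^T, \qquad y = (y_1, \ldots, y_{n_2})^T.$$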
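Suppose that $a_1 - a_2 = \delta$. For the equations

$$E\,l = (\delta, \ldots, \delta)^T = \boldsymbol{\delta}, \qquad E\,(l - \boldsymbol{\delta})(l - \boldsymbol{\delta})^T = \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I_{n_1 n_1}$$

to hold for $\sigma^2 > 0$ (for calculation with random matrices, the reader is referred,
for example, to [36]), it is necessary and sufficient that

$$a_1 - a_2 \sum_{j=1}^{n_2} c_{ij} = a_1 - a_2 = \delta, \tag{9.2.3}$$

so that

$$\sum_{j=1}^{n_2} c_{ij} = 1 \quad (i = 1, 2, \ldots, n_1), \tag{9.2.4}$$

$$E[(x - Cy - \boldsymbol{\delta})(x - Cy - \boldsymbol{\delta})^T] = \sigma^2 I_{n_1 n_1}. \tag{9.2.5}$$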

Define

s= ri=

Then if (9.2.3) is satisfied we may write (9.2.5) in the form

$$E[(\xi - C\eta)(\xi - C\eta)^T] = E\xi\xi^T - C\,E\eta\xi^T - E\xi\eta^T C^T + C\,E\eta\eta^T C^T = \sigma_1^2 I_{n_1 n_1} + \sigma_2^2\, C C^T \tag{9.2.6}$$

by virtue of the uncorrelatedness of $x_i - a_1$ and $y_j - a_2$. Thus, in addition to
(9.2.4) we have the relation $CC^T = c^2 I_{n_1 n_1}$, where $c$ is a nonnegative number.
Therefore

$$\sum_{k=1}^{n_2} c_{ik} c_{jk} = c^2 \delta_{ij},$$

where $\delta_{ij}$ is the familiar Kronecker delta. Furthermore,

$$D(l_i) = \sigma_1^2 + c^2 \sigma_2^2 = \sigma^2. \tag{9.2.7}$$

Under the null hypothesis $H_0$: $a_1 = a_2$ we have $E(l_i) = 0$. We define
$L = \bar l = (l_1 + \cdots + l_{n_1})/n_1$ and $Q = \sum_{i=1}^{n_1} (l_i - L)^2$. Then (see [33])

$$\frac{Q}{\sigma^2} = \chi^2_{n_1 - 1}, \qquad \frac{\sqrt{n_1}\, L}{\sigma} \in N(0, 1),$$

and the quantity $\sqrt{n_1}\, L / (Q/(n_1 - 1))^{1/2} = t_{n_1 - 1}$ is Student's fraction with $n_1 - 1$
degrees of freedom. Therefore we can construct a similar test with critical zone

$$\frac{\sqrt{n_1}\, |L|}{(Q/(n_1 - 1))^{1/2}} > C. \tag{9.2.8}$$

If the hypothesis $H_0$ is not valid and $a_1 - a_2 = \delta \ne 0$, the random variable
$(L - \delta)/(Q/n_1(n_1 - 1))^{1/2} = t_{n_1 - 1}$ is still Student's fraction. If for a given reliability
$\gamma$ we find a number $t_\gamma$ such that $P\{|t_{n_1 - 1}| < t_\gamma\} = \gamma$ and if we construct the
confidence interval

(9.2.9)

we can see that the mean length of such a confidence interval is minimized when
$\sigma^2$ is minimized. By virtue of the correspondence between the confidence
intervals and the tests of the hypotheses that is described in [36], this property
of the confidence interval will be a useful property of a similar test (9.2.8). We
note on the basis of (9.2.7) that it is expedient to find matrices $C$ that minimize

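$c^2$. This minimization problem was studied in detail by Scheffé in [79]. He
obtained the following solution for the matrix $C = \|c_{ij}\|$:

$$c_{ij} = \delta_{ij} \left(\frac{n_1}{n_2}\right)^{1/2} - (n_1 n_2)^{-1/2} + \frac{1}{n_2} \quad (j \le n_1), \qquad c_{ij} = \frac{1}{n_2} \quad (j > n_1).$$

It then follows that

$$l_i = x_i - \sum_{j=1}^{n_2} c_{ij} y_j = x_i - \left(\frac{n_1}{n_2}\right)^{1/2} y_i + (n_1 n_2)^{-1/2} \sum_{j=1}^{n_1} y_j - \bar y, \qquad L = \bar x - \bar y.$$

For $n_1 = n_2$ we obtain Bartlett's solution (the test (9.2.1)): $c_{ij} = \delta_{ij}$,
$l_i = x_i - y_i$, $L = \bar x - \bar y$. Here $c^2 = 1$.
Thus, as stated above, the left-hand member of (9.2.1) is distributed like
Student's fraction $t_{n_1 - 1}$.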
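The two requirements placed on the matrix $C$, namely the row sums (9.2.4) and the relation $CC^T = c^2 I$, can be verified numerically for Scheffé's solution. The sketch below assumes the coefficients obtained by reading off the coefficient of each $y_j$ in the expression for $l_i$ above: $c_{ij} = \delta_{ij}(n_1/n_2)^{1/2} - (n_1 n_2)^{-1/2} + 1/n_2$ for $j \le n_1$ and $c_{ij} = 1/n_2$ for $j > n_1$. The function names are hypothetical.

```python
import math

def scheffe_matrix(n1, n2):
    """n1 x n2 matrix C with c_ij = delta_ij (n1/n2)^(1/2) - (n1 n2)^(-1/2) + 1/n2
    for j <= n1 and c_ij = 1/n2 for j > n1 (requires n1 <= n2).

    Assumption: coefficients read off from the displayed formula for l_i.
    """
    if not 1 <= n1 <= n2:
        raise ValueError("need 1 <= n1 <= n2")
    a = math.sqrt(n1 / n2)
    b = 1.0 / n2 - 1.0 / math.sqrt(n1 * n2)
    return [
        [((a if i == j else 0.0) + b) if j < n1 else 1.0 / n2 for j in range(n2)]
        for i in range(n1)
    ]

def gram(c):
    """C C^T as a list of lists."""
    return [[sum(p * q for p, q in zip(ri, rk)) for rk in c] for ri in c]

C = scheffe_matrix(3, 5)
print([round(sum(row), 12) for row in C])                 # row sums, cf. (9.2.4)
print([[round(v, 12) for v in row] for row in gram(C)])   # compare with (n1/n2) I
```

With these coefficients $CC^T = (n_1/n_2) I$, so that $c^2 = n_1/n_2$, which reduces to $c^2 = 1$ in Bartlett's case $n_1 = n_2$.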

§3. A HOMOGENEOUS RANDOMIZED TEST ASSOCIATED WITH BARTLETT'S TEST

If we "project" Bartlett's test with critical zone (9.2.1) onto the $\sigma$-algebra
of sufficient statistics, we obtain a randomized test with the same power function.
It turns out that this test has an extremely simple and characteristic form.
Following the article [43], let us go through the corresponding calculations.
Let us assume that $n_1 = n_2 = n$. Just as above, we take $\xi = (\bar x - \bar y)/s_2$ and
$\eta = s_1/s_2$ and introduce the sample coefficient of correlation

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\left(\sum_{i=1}^{n} (x_i - \bar x)^2 \sum_{i=1}^{n} (y_i - \bar y)^2\right)^{1/2}}.$$

This puts the test (9.2.1) in the form

(9.3.1)

Since the normal samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are independent, it follows
(see Example 4, §5, Chapter IV) that the random variable $r$ is stochastically
independent of the random vector $(\bar x, \bar y, s_1, s_2)$ and a fortiori of the vector
$(\xi, \eta)$. Furthermore (see [33]), for the same reason, $r$ has probability density

$$f_n(r) = \frac{\Gamma\!\left(\frac{n - 1}{2}\right)}{\sqrt{\pi}\, \Gamma\!\left(\frac{n - 2}{2}\right)}\, (1 - r^2)^{\frac{n - 4}{2}}. \tag{9.3.2}$$
2

It follows from (9.3.1) that

$$\frac{\xi^2}{C_1^2} > \eta^2 - 2r\eta + 1 > \eta^2 - 2\eta + 1 = (\eta - 1)^2. \tag{9.3.3}$$

Furthermore, from (9.3.1) we see that, when (9.3.3) is satisfied,

$$1 > r > \max(\tau(\xi, \eta), -1), \tag{9.3.4}$$

where

(9.3.5)

Since $r$ is stochastically independent of $\xi$ and $\eta$, we obtain from what was said
above a randomized test $\varphi(\xi, \eta)$ defined (for given $C_1$) by

$$\varphi(\xi, \eta) = \begin{cases} 0 & \text{for } |\xi| \le C_1 |\eta - 1|, \\ \displaystyle\int_{\max(\tau(\xi, \eta),\, -1)}^{1} f_n(r)\, dr & \text{for } |\xi| > C_1 |\eta - 1|, \end{cases} \tag{9.3.6}$$

that is, under condition (9.3.3). We note that under this last condition $\tau(\xi, \eta) \le 1$.
The condition $|\xi| = C_1 |\eta - 1|$ defines the boundary of the "randomized critical
zone." Inside the quadrant $\xi \ge 0$, $\eta \ge 0$ this zone has the form of a right angle
with vertex at the point $(0, 1)$ and bisector parallel to the $\xi$-axis. In the quadrant
$\xi \le 0$, $\eta \ge 0$ the boundary of the zone is the mirror image of this angle about the
$\eta$-axis.
Returning to the sufficient statistics, we note that the boundary can be
represented in the very simple form

$$\left| \frac{\bar x - \bar y}{s_1 - s_2} \right| = C_1. \tag{9.3.7}$$

If the sizes of the samples are all $n = 4$ then we see from (9.3.2) that $f_n(r)$ is
constant and equal to $\Gamma(3/2)/\sqrt{\pi}\,\Gamma(1) = 1/2$. Thus in this case the form of the test
$\varphi(\xi, \eta)$ is simplified.
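The density (9.3.2) and the special case $n = 4$ can be checked numerically. The sketch below assumes the density in the form $f_n(r) = \Gamma((n-1)/2) / (\sqrt{\pi}\, \Gamma((n-2)/2)) \cdot (1 - r^2)^{(n-4)/2}$, which is consistent with the constant $\Gamma(3/2)/\sqrt{\pi}\,\Gamma(1) = 1/2$ quoted for $n = 4$. The function names and the simple midpoint quadrature are illustrative choices only.

```python
import math

def corr_density(n, r):
    """f_n(r) in the assumed form of (9.3.2), for |r| < 1 and n >= 3."""
    const = math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    return const * (1.0 - r * r) ** ((n - 4) / 2)

def total_mass(n, steps=200_000):
    """Midpoint-rule approximation of the integral of f_n over (-1, 1), for n >= 4."""
    h = 2.0 / steps
    return h * sum(corr_density(n, -1.0 + (j + 0.5) * h) for j in range(steps))

print(corr_density(4, -0.9), corr_density(4, 0.0), corr_density(4, 0.5))
print(total_mass(5), total_mass(8))
```

For $n = 4$ the density does not depend on $r$, so the integral in (9.3.6) reduces to half the length of the interval of integration.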
The fact that the randomized critical zone extends to the point $(0, 1)$ is quite
characteristic. It can be shown that this is necessary for a randomized homogeneous
test. (In this connection see §3 of Chapter VIII, where a somewhat
weaker assertion is proved in detail.)

§4. CHARACTERIZATION OF TESTS OF THE BARTLETT-SCHEFFÉ TYPE


In Chapter VIII, we used the method of analytic continuation with respect to
a parameter to prove the possibility of various similar tests for the Behrens-Fisher
problem. The same method can be applied to achieve purposes that in a
way are just the opposite. One may seek to characterize the known similar tests
by exhibiting certain simple properties of a qualitative nature that distinguish
them from other tests. Apparently this can be done for many familiar tests in
multivariate analysis and Fisher's analysis of variance. In this section we shall,
following [47], do this for the classical Bartlett-Scheffé test described in the
preceding section. In particular, for identical sizes of the samples $n_1 = n_2 = n$ we
have Bartlett's unrandomized, mixed similar tests with critical zone $Z_C$:

$$\frac{(n(n - 1))^{1/2}\, |\bar x - \bar y|}{\left(\sum_{i=1}^{n} [(x_i - \bar x) - (y_i - \bar y)]^2\right)^{1/2}} > C, \tag{9.4.1}$$

where $C$ is a constant.
The expression (9.4.1) can be represented in the form

(9.4.2)

where $X = \bar x - \bar y$, $l_i = (x_i - \bar x) - (y_i - \bar y)$ ($i = 1, 2, \ldots, \mu$), $\mu = n - 1$, and
$c_1 = C^2/n(n - 1)$.
We note that the linear form $X$ and the linear forms $l_1, \ldots, l_\mu$ of the
observations $x_i$ and $y_i$ are linearly and stochastically independent. Furthermore,
under the hypothesis $H_0$: $a_1 = a_2$ we have

$$E(X \mid H_0) = 0, \qquad E(l_i \mid H_0) = 0, \tag{9.4.3}$$

$$D(X) = \frac{\sigma^2}{n}, \qquad D(l_i) = \frac{n - 1}{n}\, \sigma^2, \quad i = 1, 2, \ldots, \mu, \tag{9.4.4}$$

where $\sigma^2 = \sigma_1^2 + \sigma_2^2$.

Thus the variances of $X$ and of the linear forms $l_i$ are also proportional for arbitrary
values of $\sigma_1$ and $\sigma_2$. The test (9.4.1) is defined on the sample space $W$
determined by the linear statistics $X, l_1, \ldots, l_\mu$. Using the results of §2, one can
easily write the corresponding probability density.
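The proportionality of the variances in (9.4.4) can be illustrated by computing the variance of each linear form directly from its coefficients. The sketch below uses only the facts stated above ($D(x_i) = \sigma_1^2$, $D(y_j) = \sigma_2^2$, independent observations); the helper names and the numerical values of the variances are hypothetical.

```python
def variance(coef_x, coef_y, var1, var2):
    """Variance of sum_i coef_x[i] x_i + sum_j coef_y[j] y_j for independent
    observations with D(x_i) = var1 and D(y_j) = var2."""
    return var1 * sum(c * c for c in coef_x) + var2 * sum(c * c for c in coef_y)

def bartlett_forms(n):
    """Coefficient vectors of X = xbar - ybar and l_i = (x_i - xbar) - (y_i - ybar)."""
    leader = ([1.0 / n] * n, [-1.0 / n] * n)
    forms = []
    for i in range(n):
        cx = [(1.0 if j == i else 0.0) - 1.0 / n for j in range(n)]
        forms.append((cx, [-c for c in cx]))
    return leader, forms

n = 5
leader, forms = bartlett_forms(n)
for var1, var2 in [(1.0, 1.0), (0.2, 7.0), (9.0, 0.5)]:
    d_x = variance(*leader, var1, var2)
    ratios = [variance(cx, cy, var1, var2) / d_x for cx, cy in forms]
    print(var1, var2, ratios)   # each ratio should be close to n - 1
```

The ratio $D(l_i)/D(X)$ equals $n - 1$ whatever the two variances are, which is the property that the theorem of this section shows to be forced on any test with a leader and a standardizer.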
Here we have a one-parameter exponential family with parameter $n/2\sigma^2 =
n/2(\sigma_1^2 + \sigma_2^2)$ and with a single sufficient statistic

(9.4.5)
which is a quadratic form in its variables. By virtue of Theorem 4.2.2 we see that
the family of distributions generated by the sufficient statistic $Q_1$ is boundedly
complete and that all similar zones are determined by the Neyman zones. Thus the
critical zone $Z_C$ is a Neyman zone, which one can also verify directly by noting
that the left side of (9.4.1) is stochastically independent of $Q_1$.
We also mention a characteristic property of the test (9.4.1): there exists a
positive constant $\varepsilon$ (an arbitrary number less than $C$) such that the inequality

implies that the test assumes the null hypothesis $H_0$. Thus the test that we are
studying has the following two properties.
I. It is defined on the sample space defined by the forms $X, l_1, \ldots, l_\mu$.
II. It assumes the null hypothesis $H_0$ when inequality (9.4.6) is satisfied.
Here our test is a Neyman structure defined by $X, l_1, \ldots, l_\mu$. Therefore a
complete description of tests of the Bartlett-Scheffé type reduces to construction
of Neyman structures (see §2, Chapter IV).
The purpose of what follows is to prove that properties I and II already
characterize the corresponding tests since they are Neyman structures for suitable
exponential families.
Suppose that we have a randomized or unrandomized test
$\varphi = \varphi(x_1, \ldots, x_{n_1}; y_1, \ldots, y_{n_2})$ for the Behrens-Fisher problem. Here, of
course,

$$x_1, \ldots, x_{n_1} \in N(a_1, \sigma_1^2); \qquad y_1, \ldots, y_{n_2} \in N(a_2, \sigma_2^2).$$
Suppose that the test $\varphi$ is defined in the sample space generated by $\mu + 1$ linear
forms $X, l_1, \ldots, l_\mu$, where $\mu \le n_1 + n_2 - 1$, that are linearly and stochastically
independent. Suppose that under the hypothesis $H_0$: $a_1 = a_2$ we have

$$E(X \mid H_0) = 0, \qquad E(l_i \mid H_0) = 0, \quad i = 1, 2, \ldots, \mu. \tag{9.4.7}$$

Here the variances of the statistics $X, l_1, \ldots, l_\mu$ can be various binary
quadratic forms of the unknown parameters $\sigma_1$ and $\sigma_2$.
The first of these forms, $X$, will play a special role. Let us call this form
the leader of the test. We now introduce the concept of standardizer of a test.
By this we mean a continuous function $T(l_1, \ldots, l_\mu)$ that is homogeneous of
degree $\nu > 0$ and that satisfies the conditions $T(l_1, \ldots, l_\mu) > 0$ for
$(l_1, \ldots, l_\mu) \ne (0, \ldots, 0)$, with $T(l_1, \ldots, l_\mu) > c_0 > 0$, if at least one of the
variables is equal to 1.
For arbitrary $s > 0$,

(9.4.8)

In addition the standardizer $T$ must be related to the leader $X$ and the test $\varphi$
as follows: there exists a positive number $\varepsilon_0$ such that the inequality

(9.4.9)

implies the test $\varphi$ assumes the null hypothesis $H_0$ with probability 1. Of course
the test may assume this hypothesis even though inequality (9.4.9) is not
satisfied.
Now we can formulate the fundamental theorem.
Theorem 9.4.1. Suppose that a randomized similar test $\varphi =
\varphi(x_1, \ldots, x_{n_1}; y_1, \ldots, y_{n_2})$ is defined on the sample space of linear forms $X$,
$l_1, \ldots, l_\mu$ (satisfying the conditions listed above). Suppose that it has a leader
$X$ and a standardizer $T$ such that it assumes the null hypothesis $H_0$ at least when
condition (9.4.9) is satisfied. Then

$$\frac{D(l_i)}{D(X)} = a_i \quad (i = 1, 2, \ldots, \mu), \tag{9.4.10}$$

where the $a_i$ are positive constants independent of $\sigma_1$ and $\sigma_2$ and our test is a
Neyman structure for the exponential family generated by $X, l_1, \ldots, l_\mu$.
We define $D(X) = \sigma^2 = f(\sigma_1, \sigma_2)$, where $f$ is a binary quadratic form. Then
$D(l_i) = a_i \sigma^2$ and the exponential family generated by $X, l_1, \ldots, l_\mu$ is a
one-parameter family (the parameter being $\sigma$) and has the sufficient statistic

$$Q_0 = X^2 + \sum_{i=1}^{\mu} \frac{l_i^2}{a_i}.$$

Corollary. Under the conditions of the theorem the test $\varphi$ is a Neyman
structure for the statistic $Q_0 = X^2 + \sum_{i=1}^{\mu} l_i^2/a_i$. If it is unrandomized, it is obtained by
singling out the regions of given conditional probability on the surfaces of level
$Q_0$ and "gluing" them together.
In particular, for tests of the Bartlett-Scheffé type, the quantity $X = \bar x - \bar y$
serves as a leader and $T = \sum_{i=1}^{n} [(x_i - \bar x) - (y_i - \bar y)]^2 = T(l_1, \ldots, l_{n-1})$, where
$l_i = (x_i - \bar x) - (y_i - \bar y)$ (for $i = 1, 2, \ldots, n - 1$) serves as standardizer. Therefore
the test is a Neyman structure of the special form (9.4.1).
We note also that if $T = T(l_1, \ldots, l_\mu)$ is a standardizer corresponding to a
positive number $\varepsilon_0$, it can always be replaced with the standardizer $T_1 =
l_1^2 + \cdots + l_\mu^2$ by replacing the number $\varepsilon_0$ with $\varepsilon_1 < \varepsilon_0$. To see this, note that if a
test $\varphi$ assumes $H_0$ under the condition (9.4.9), it also assumes it under the
condition

for sufficiently small $\varepsilon_1 > 0$.

Let us now turn to the proof of our theorem. We have

$$D\{x_i\} = \sigma_1^2 \quad (i = 1, \ldots, n_1); \qquad D\{y_i\} = \sigma_2^2 \quad (i = 1, 2, \ldots, n_2).$$

Suppose that $X, l_1, \ldots, l_\mu$ (where $\mu \le n_1 + n_2 - 2$) are linear forms in
$x_1, \ldots, x_{n_1}$; $y_1, \ldots, y_{n_2}$ that satisfy the hypothesis of the theorem. Then obviously

(Here and in what follows the letters $b$, $c$, $d$ and $e$ with subscripts are positive
constants.) We set

(9.4.12)

Thus

(9.4.13)

where the constants $c_{ij}$ are reindexed.
Now let us consider a test $\varphi = \varphi(x_1, \ldots, x_{n_1}; y_1, \ldots, y_{n_2})$ that satisfies
the hypotheses of the theorem. Then $\varphi = \varphi_1(X, l_1, \ldots, l_\mu)$, where $X$ is the
leader of the test. Our test must be similar with respect to the parameters $\sigma_1$
and $\sigma_2$ when the hypothesis $H_0$: $a_1 = a_2$ is satisfied. Here $E\{X \mid H_0\} = E\{l_i \mid H_0\} =
0$ ($i = 1, 2, \ldots, \mu$). For a given level of the test $\alpha > 0$, the assumption that the
test is similar leads, by what was said above, to the integral relationship
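$$\int \cdots \int_{\mathfrak{X}} \varphi_1(X, l_1, \ldots, l_\mu) \exp\left[-\frac{1}{2}\left(\frac{X^2}{\vartheta_1 + \vartheta_2} + \sum_{i=1}^{\mu} \frac{l_i^2}{c_{i1}\vartheta_1 + c_{i2}\vartheta_2}\right)\right] dX\, dl_1 \cdots dl_\mu = A_0 (\vartheta_1 + \vartheta_2)^{1/2} \prod_{i=1}^{\mu} (c_{i1}\vartheta_1 + c_{i2}\vartheta_2)^{1/2}, \tag{9.4.14}$$

which is identically satisfied for $\vartheta_1, \vartheta_2 > 0$. Here $A_0 > 0$ is an absolute constant
and $\mathfrak{X}$ is the space of the variables $(X, l_1, \ldots, l_\mu)$. Let us set

$$\vartheta_1 = \theta_1 \omega^{-1}, \qquad \vartheta_2 = \theta_2 \omega^{-1},$$

where $\theta_1$, $\theta_2$, and $\omega$ are new positive parameters.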

I µ I

= A0 (01+02) 2 II (cn01+ ci20 2) 2 ,


i=I
which holds for arbitrary positive values of $\theta_1$, $\theta_2$ and $\omega$. We multiply both
sides of this identity by the expression $\omega^a e^{-\sigma\omega/2}$, where $\sigma > 0$ and $a > -1$. We
obtain

(9.4.17)

For arbitrary $\eta > 0$ the integral of this function with respect to $\omega$ from $\eta$ to $\infty$
converges absolutely. Furthermore

$$\int_0^\infty \omega^a e^{-\frac{\sigma\omega}{2}}\, d\omega = \left(\frac{\sigma}{2}\right)^{-a-1} \Gamma(1 + a) = \sigma^{-a-1} G(a), \tag{9.4.18}$$

where $G(a)$ is a function that is regular in the half-plane $\operatorname{Re} a > -1$. Let us
suppose that $\theta_1$ and $\theta_2$ are positive numbers. If we integrate both sides of
(9.4.17) with respect to $\omega$ from $\eta$ to $\infty$ and then let $\eta \downarrow 0$ we obtain, by virtue of
the boundedness and measurability of $\varphi_1$,

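$$\int \cdots \int_{\mathfrak{X}} \varphi_1(X, l_1, \ldots, l_\mu) \left(\frac{X^2}{\theta_1 + \theta_2} + \sum_{i=1}^{\mu} \frac{l_i^2}{c_{i1}\theta_1 + c_{i2}\theta_2} + \sigma\right)^{-a-1-\frac{\mu+1}{2}} dX\, dl_1 \cdots dl_\mu = A_0\, G(a)\, \sigma^{-a-1} (\theta_1 + \theta_2)^{1/2} \prod_{i=1}^{\mu} (c_{i1}\theta_1 + c_{i2}\theta_2)^{1/2}. \tag{9.4.19}$$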
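The one-dimensional integral (9.4.18) that drives this step is a standard gamma integral, and it can be checked numerically. The sketch below assumes the reading $\int_0^\infty \omega^a e^{-\sigma\omega/2}\, d\omega = (\sigma/2)^{-a-1}\Gamma(1 + a)$ and uses a plain midpoint rule on a truncated interval; the truncation point, step count and function names are arbitrary illustrative choices.

```python
import math

def laplace_power_integral(a, sigma, upper=80.0, steps=400_000):
    """Midpoint-rule value of the integral of w**a * exp(-sigma*w/2) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        w = (j + 0.5) * h
        total += w ** a * math.exp(-0.5 * sigma * w)
    return h * total

def closed_form(a, sigma):
    """(sigma/2)**(-a-1) * Gamma(1+a), the middle member of (9.4.18) as read here."""
    return (sigma / 2.0) ** (-a - 1.0) * math.gamma(1.0 + a)

for a, sigma in [(2.0, 3.0), (0.5, 2.0)]:
    print(a, sigma, laplace_power_integral(a, sigma), closed_form(a, sigma))
```

In the notation of the text this corresponds to $G(a) = 2^{a+1}\Gamma(1 + a)$, which is regular for $\operatorname{Re} a > -1$.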

In this equation we set $\theta_1 = \theta$, $\theta_2 = 1$ and $a = -1 + \tau_0$, where $\tau_0$ is a small
positive number whose precise value we shall determine later. Now multiply both
sides of (9.4.14) by $(1 + \theta)^{-\tau_0 - (\mu + 1)/2}$. We get
x(x +<1+0) ~( cll0+ci2


2 1~
~

(9.4.20)

Here $\sigma$ and $\theta$ are arbitrary positive numbers and $G_1(\tau_0)$ is a function that is
regular for $\operatorname{Re} \tau_0 > 0$.
Now consider the fractions

$$r_i = \frac{1 + \theta}{c_{i1}\theta + c_{i2}}, \quad i = 1, 2, \ldots, \mu. \tag{9.4.21}$$

These fractions may include some in which $c_{i1} = c_{i2}$, so that $r_i = 1/c_{i1} = 1/c_{i2}$
is independent of $\theta$. This means that $D\{l_i\}/D\{X\}$ is independent of $\theta_1$ and $\theta_2$.
If this property is satisfied for all values of $i = 1, 2, \ldots, \mu$, it will follow
immediately that our theorem is valid. It is just this property that we wish to prove
in what follows.

Let us suppose that this property is not satisfied for all indices $i = 1, 2, \ldots, \mu$.
Suppose that it is satisfied for $i = 1, 2, \ldots, p$, where $0 \le p < \mu$, and that
it is not satisfied for $i = p + 1, \ldots, \mu$. (Of course this reindexing of the $l_i$ does
not restrict the generality.)
For $i = 1, 2, \ldots, p$, we write $b_i = 1/c_{i1} = 1/c_{i2}$. From (9.4.20), we obtain

f f <p 1 (X. 11••••• lµ)

dX dl 1 ••• dlµ
x

µ
(9. 4. 22)
JI (cile+ ci2)'1•.
i=p+I
Here A 1 =A(b 1 , ... ,bp)-Yi>o. Also µ-p?.1 since p<µ.
The expression (9.4.22) is an analytic regular function of $\theta$ for $\operatorname{Re} \theta > 0$.
Obviously it can be extended as a regular function to the complex $\theta$-plane cut
along the negative axis $-\infty < \operatorname{Re} \theta \le 0$, $\operatorname{Im} \theta = 0$. We define its values as we
approach the upper and lower edges of the cut in such a way as to preserve its
continuity with the aid of rectifiable paths extending from points of the positive
axis $\operatorname{Re} \theta > 0$, $\operatorname{Im} \theta = 0$. Then the two sides of (9.4.22) coincide. We define

QI --x2+l~+
I . . . + LP.z
·~ l~
Qze= ~ ci1e+ci2 +a. (9.4.23)
i=p+l
Let us set

(9.4.24)

where $\zeta$ is a small positive number, which we shall make arbitrarily small in
what follows.
We partition the region of integration $\mathfrak{X}$:

$$-\infty < X < \infty, \qquad -\infty < l_i < +\infty \quad (i = 1, 2, \ldots, \mu)$$

into "layers:"

$$|X| < \zeta \tag{9.4.25}$$

and

$$|X| \in [2^{m-1}\zeta,\ 2^m\zeta] \quad (m = 1, 2, 3, \ldots). \tag{9.4.26}$$

Let us find an upper bound for the absolute value of the expression on the
left-hand side of (9.4.22), where $\theta$ is the value indicated in (9.4.24). Consider
the integral over the "layer" (9.4.25). On the basis of condition (9.4.9) regarding
the leader and the standardizer of the test we havecf>= O for T ~ JXI 0• By v; ,
the homogeneity of the standardizer T and its properties we then see that for
¢-f, 0
(9.4-27)

where c 1 is a positive constant.


Now consider the integral (9.4.22) over the layer (9.4.25). By (9.4.27) we
have in this case

(9-4-28)

Furthermore,
(9.4.29)

By (9.4.28) and (9.4.29), we have in the layer (9.4.25)

(1 + 0) Q2, 9 = B~X2 t i~2. (9-4-30)

where $B$ is a bounded function. (In what follows the letter $B$ will not always
denote the same bounded function.) From this we see that for sufficiently small $\zeta$,

I Ql +<1 +e)Q2.a I> 1x 2 +s~x 2 + i~2 1> I{ x 2+ i~2 I> ~2.


(9.4. 31)
everywhere in the layer (9.4.25). By virtue of condition (9.4.28) the integral over
the layer (9.4.25) is bounded by

$$\frac{B\zeta^{\mu + 1}}{(\zeta^2)^{\frac{\mu + 1}{2} + \tau_0}} = B\zeta^{-2\tau_0}. \tag{9.4.32}$$

Now let us take the layer $L_m$ defined by (9.4.26) for some value of $m = 1,
2, \ldots$. For the "layer" $L_m$ we obtain on the basis of (9.4.27)
ILi I-< C12m~.
IQl + (1 -1- 0) Q2, a I> I x2 + B~X2-t i~2 I>-{ x2 >- 2m-2~2 (9-4-33)

(assuming $\zeta$ sufficiently small). Hence the portion of the integral (9.4.22) over
the layer $L_m$ is bounded by

$$\frac{B(2^m\zeta)^{\mu + 1}}{(2^{2m - 2}\zeta^2)^{\frac{\mu + 1}{2} + \tau_0}} = B\zeta^{-2\tau_0}\, 2^{-2m\tau_0}. \tag{9.4.34}$$

In this expression the number $B$ is bounded by an absolute constant for all
values of $m = 1, 2, \ldots$, and $\zeta > 0$. Now the number $\tau_0$ can be chosen arbitrarily
in the interval $0 < \tau_0 \le 0.01$. Then if we sum the estimates (9.4.34) with respect
to $m = 1, 2, \ldots$ and then the estimate (9.4.32), we obtain the following upper
bound for the left-hand member of (9.4.22):
(9.4.35)
Let us now look at the behavior of the right-hand member of (9.4.22). As was
stated above, here we have $\mu - p \ge 1$. Furthermore, for $i = p + 1, \ldots, \mu$ we have
$c_{i1} \ne c_{i2}$, so that

for $\theta = -1 + i\zeta$ and sufficiently small $\zeta > 0$. Thus the right-hand member of
(9.4.22) has the following bound from below in absolute value:

$$c_3\, \zeta^{-2\tau_0 - \frac{\mu - p}{2}} \ge c_3\, \zeta^{-\frac{1}{2} - 2\tau_0}. \tag{9.4.36}$$

Comparing this inequality with (9.4.35), we obtain a contradiction for $\zeta > 0$
sufficiently small.
Thus the inequality $\mu - p > 0$ is impossible and we have $p = \mu$. Hence the
ratios

\[ \frac{D\{l_i\}}{D\{X\}} = a_i \]

are independent of the values of $\theta_1$ and $\theta_2$. This establishes one of the
assertions of the theorem.
In the space of the linear forms $(X, l_1, \dots, l_\mu)$ the element of probability is
therefore

\[ c_0\, \sigma^{-(\mu+1)} \exp\left[ -\frac{1}{2\sigma^2} \left( X^2 + \sum_{i=1}^{\mu} \frac{l_i^2}{a_i} \right) \right] dX\, dl_1 \cdots dl_\mu, \]

where c 0 > O is a constant and a 2 = '6- 1 + tl- 2 •


Thus we have an exponential family with a single parameter $\sigma^2$ and a single
sufficient statistic

\[ Q_0 = X^2 + \sum_{i=1}^{\mu} \frac{l_i^2}{a_i}, \]

so that in this case all similar tests are Neyman structures with probability

\[ E\{\varphi \mid Q_0\} = \alpha \]

for all values of $Q_0$. This completes the proof of the theorem. Its corollary
regarding the structure of unrandomized similar tests can be proven directly.
The Bartlett-Scheffé test is the special form of Neyman structure defined by
the left-hand member of formula (9.2.5).
Special forms of linear forms $l_i$ are chosen in such a way that property
(9.4.10), which we have proven, is satisfied. Just which of the various Neyman
structures are preferred in this sense depends on the requirements imposed on the
power properties of the test.
We note that the method that we have just expounded is applicable to the
study of the possible construction of many tests in multivariate analysis.
CHAPTER X

AN UNRANDOMIZED HOMOGENEOUS SIMILAR TEST
IN THE BEHRENS-FISHER PROBLEM

§ 1. CONSTRUCTION OF AN UNRANDOMIZED TEST

In the preceding chapter we discussed certain results dealing with unrandomized
homogeneous similar tests in the Behrens-Fisher problem that have critical
zone $Z$ of the form $G(|\bar{x} - \bar{y}|/s_2,\ s_1/s_2) \ge 0$. These results are negative in
nature and they point out the "poor" analytical properties of the boundary of the
critical zone of such a test if such a test exists. However, if we require only that
the boundary of a test be measurable, a test will in many cases still exist, as
was shown in 1964 in the simultaneous articles [24] and [50]. Following the
procedure used in [50], we shall now prove two theorems.
Theorem 10.1.1. For an arbitrary level $\alpha \in (0, 1)$ and a pair of samples of sizes
$n_1$, $n_2$ of opposite parity, there exists an unrandomized similar test for the
Behrens-Fisher problem with critical zone defined by the values of
$|\bar{x} - \bar{y}|/s_2$ and $s_1/s_2$.
Theorem 10.1.1 can be strengthened somewhat.
Theorem 10.1.2. Suppose that we have a finite number $K$ of pairs of samples
of sizes $n_{1i}$ and $n_{2i}$ ($i = 1, \dots, K$) of opposite parity. Then, there exists an
unrandomized homogeneous test with critical zone

\[ G\left( \frac{|\bar{x}_i - \bar{y}_i|}{s_{2i}},\ \frac{s_{1i}}{s_{2i}} \right) \ge 0 \]

(where $i = 1, 2, \dots, K$ and the function $G$ is the same for all values of $i$) that
is similar for all pairs of samples and that has a prescribed level $\alpha \in (0, 1)$.
Proof of Theorem 10.1.1. To construct a test $\Phi = \Phi(|\bar{x} - \bar{y}|/s_2,\ s_1/s_2)$
that is of level $\alpha \in (0, 1)$ and that assumes only the values 0 and 1, it will be
sufficient to construct a cotest $\Psi = \Phi - \alpha$. The function $\Psi$ must assume only the
values $-\alpha$ and $1 - \alpha$. Define $\xi = |\bar{x} - \bar{y}|/s_2$, $\eta = s_1/s_2$. Consider the critics
$A(\xi, \eta)$ and $B(\xi, \eta)$. Let $\Phi(\xi, \eta)$ denote the characteristic function of the
critical zone of the test sought and let $O_1$ denote the quadrant $\xi \ge 0$, $\eta \ge 0$.
Then to find $\Phi(\xi, \eta)$ we have (see (8.4.2))

(10.1.1)

for arbitrary $\theta > 0$. Here $N = (n_1 + n_2 - 1)/2$. For the cotest $\Psi(\xi, \eta)$ we have
the identity

(10.1.2)

for all $\theta > 0$. If the function $\Psi(\xi, \eta)$ assumes only the values $-\alpha$ and $1 - \alpha$ and
if it satisfies (10.1.2), it will be the desired cotest.
By hypothesis the numbers $n_1$ and $n_2$ are of opposite parity. It follows that
the number $N$ is an integer. Let us assume that $n_1 \ge 2$ and $n_2 \ge 2$. Since these
numbers are of opposite parity at least one of them is 3 or greater and $N \ge 2$.
In the quadrant $O_1$: $\xi \ge 0$, $\eta \ge 0$ the critics $A(\xi, \eta)$ and $B(\xi, \eta)$ constitute
a coordinate system. Through each point of this quadrant passes one and only
one critic of each of the two types. In examining the integral (10.1.2) we shall
find it convenient to change to this coordinate system. To do this we need to
find the Jacobian $\partial(\xi, \eta)/\partial(A, B)$. Since
$(\theta + A)(\theta + B) = \theta^2 + \theta(1 + \xi^2 + \eta^2) + \eta^2$,
we have $A + B = 1 + \xi^2 + \eta^2$ and $AB = \eta^2$. Thus $\xi = [(B - 1)(1 - A)]^{1/2}$ and
$\eta = (AB)^{1/2}$. Therefore

\[ \frac{\partial(\xi, \eta)}{\partial(A, B)} = -\frac{1}{4}\, \frac{B - A}{(AB)^{1/2} (B - 1)^{1/2} (1 - A)^{1/2}}. \tag{10.1.3} \]
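
The change of variables and the Jacobian (10.1.3) can be checked numerically. The following is only a sketch: the function names are hypothetical, the sample points are arbitrary, and the derivative is approximated by central differences rather than computed symbolically.

```python
import math

def xi_eta(A, B):
    """Map the critic coordinates (A, B), 0 < A < 1 < B, back to (xi, eta)."""
    xi = math.sqrt((B - 1.0) * (1.0 - A))
    eta = math.sqrt(A * B)
    return xi, eta

def jacobian_numeric(A, B, h=1e-6):
    """Central-difference estimate of the determinant d(xi, eta)/d(A, B)."""
    xi_pa, eta_pa = xi_eta(A + h, B)
    xi_ma, eta_ma = xi_eta(A - h, B)
    xi_pb, eta_pb = xi_eta(A, B + h)
    xi_mb, eta_mb = xi_eta(A, B - h)
    d_xi_dA = (xi_pa - xi_ma) / (2 * h)
    d_eta_dA = (eta_pa - eta_ma) / (2 * h)
    d_xi_dB = (xi_pb - xi_mb) / (2 * h)
    d_eta_dB = (eta_pb - eta_mb) / (2 * h)
    return d_xi_dA * d_eta_dB - d_xi_dB * d_eta_dA

def jacobian_formula(A, B):
    """Right-hand member of (10.1.3)."""
    denom = math.sqrt(A * B) * math.sqrt(B - 1.0) * math.sqrt(1.0 - A)
    return -0.25 * (B - A) / denom

# Arbitrary test point inside the half-strip 0 < A < 1 < B.
A0, B0 = 0.3, 2.5
xi0, eta0 = xi_eta(A0, B0)
# The symmetric functions of A and B used in the text.
sum_gap = abs((A0 + B0) - (1.0 + xi0 ** 2 + eta0 ** 2))
prod_gap = abs(A0 * B0 - eta0 ** 2)
```

Both `sum_gap` and `prod_gap` should be at rounding level, and `jacobian_numeric` should agree with `jacobian_formula` up to the finite-difference error.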

Inserting this expression into (10.1.2) and making the change of coordinates,
we obtain an equation for finding the cotest $\Psi(\xi, \eta) = \Psi_1(A, B)$:

\[ J(\theta) = \iint_{\Pi} \frac{\Psi_1(A, B)\, (AB)^{\frac{n_1 - 3}{2}} (B - A)}{\sqrt{1 - A}\, \sqrt{B - 1}\, (\theta + A)^N (\theta + B)^N}\, dA\, dB = 0. \tag{10.1.4} \]

Here $\Pi$ is the half-strip $0 \le A \le 1$, $B \ge 1$ into which the quadrant $O_1$ is mapped.
Let us partition the half-strip $\Pi$ into a sequence of disjoint half-strips:

\[ \Pi = \bigcup_{k=0}^{\infty} \Pi_k. \]

Let us construct the function $\Psi_1(A, B)$ in such a way that for every $k = 0, 1, 2, \dots$
we have

\[ J_k(\theta) = \iint_{\Pi_k} \frac{\Psi_1(A, B)\, (AB)^{\frac{n_1 - 3}{2}} (B - A)}{\sqrt{1 - A}\, \sqrt{B - 1}\, (\theta + A)^N (\theta + B)^N}\, dA\, dB = 0. \tag{10.1.5} \]
Without loss of generality we can assume that $n_1 \ge 3$.
The function $(B - A)/(\sqrt{1 - A}\, \sqrt{B - 1})$ has only an integrable singularity at the
point (1, 1). Since $(n_1 - 3)/2 \ge 0$ it follows from equations (10.1.5) ($k = 0, 1, \dots$)
that $J(\theta) = 0$. Since $N$ is an integer and since $A \ne B$ in the half-strip $\Pi_k$, we
have the decomposition into partial fractions

\[ \frac{1}{(\theta + A)^N (\theta + B)^N} = \sum_{m=1}^{N} D_m \frac{(B - A)^{m-1}}{(B - A)^{2N-1} (\theta + A)^m} + \sum_{m=1}^{N} E_m \frac{(B - A)^{m-1}}{(B - A)^{2N-1} (\theta + B)^m}, \tag{10.1.6} \]

where $D_i$ and $E_i$ ($i = 1, 2, \dots, N$) are constants. If we substitute (10.1.6) into
the left-hand member of (10.1.5) we find that

\[ J_k(\theta) = \sum_{m=1}^{N} D_m \int_{1 - 2^{-k}}^{1 - 2^{-k-1}} dA \int_{1}^{\infty} dB\, \frac{\Psi_1(A, B)\, (AB)^{\frac{n_1 - 3}{2}} (B - A)^m}{\sqrt{1 - A}\, \sqrt{B - 1}\, (B - A)^{2N-1} (\theta + A)^m} \]
\[ \qquad + \sum_{m=1}^{N} E_m \int_{1}^{\infty} dB \int_{1 - 2^{-k}}^{1 - 2^{-k-1}} dA\, \frac{\Psi_1(A, B)\, (AB)^{\frac{n_1 - 3}{2}} (B - A)^m}{\sqrt{1 - A}\, \sqrt{B - 1}\, (B - A)^{2N-1} (\theta + B)^m}. \tag{10.1.7} \]

Now let us partition the strip $\Pi_k$ into rectangles:

Define

\[ P_m(A, B) = \frac{(AB)^{\frac{n_1 - 3}{2}} (B - A)^m}{\sqrt{1 - A}\, \sqrt{B - 1}\, (B - A)^{2N-1}}, \qquad m = 1, 2, \dots, N. \]

On each rectangle $Q_{ks}$ we choose a constant $C_{ks} > 0$ such that the functions

\[ P_{mks}(A, B) = \frac{1}{C_{ks}}\, P_m(A, B), \qquad m = 1, 2, \dots, N, \]

will be probability densities with respect to Lebesgue measure on that rectangle.
Obviously this is possible since $B > A$, so that $P_m(A, B) \ge 0$. Furthermore the
$P_{mks}(A, B)$ are continuous functions on the closure $\bar{Q}_{ks}$ of the rectangle $Q_{ks}$
for $k = 0, 1, 2, \dots$.
At this point we apply the Romanovskiĭ-Sudakov theorem (which we shall
prove in the next section) to the probability densities $P_{mks}(A, B)$ considered on
$\bar{Q}_{ks}$ for each $k$, $s$, and $m$. For given $k$ and $s$ suppose that $Q_{ks}$ is of the form
$a \le A \le b$, $c \le B \le d$. By Theorem 10.2.1 we can construct a measurable dense set
$U = U_{ks}$ such that, for $m = 1, 2, \dots, N$,

\[ \int_a^b I(A, B)\, P_{mks}(A, B)\, dA = \alpha \int_a^b P_{mks}(A, B)\, dA \quad \text{for almost all } B \in [c, d], \]

\[ \int_c^d I(A, B)\, P_{mks}(A, B)\, dB = \alpha \int_c^d P_{mks}(A, B)\, dB \quad \text{for almost all } A \in [a, b], \]
where $I(A, B) = I_{ks}(A, B)$ is the characteristic function of the set $U = U_{ks}$ and
$\alpha \in (0, 1)$ is the level of the test. The function $I(A, B)$ assumes only the values 0 and 1.
Let us define $\Psi_1(A, B) = I(A, B) - \alpha$. Then $\Psi_1(A, B)$ assumes only the values
$-\alpha$ and $1 - \alpha$. Obviously, for $m = 1, 2, \dots, N$,

\[ \int_a^b \Psi_1(A, B)\, P_{mks}(A, B)\, dA = 0 \quad \text{for almost all } B \in [c, d], \]

\[ \int_c^d \Psi_1(A, B)\, P_{mks}(A, B)\, dB = 0 \quad \text{for almost all } A \in [a, b]. \]

Since the function $P_{mks}$ differs from $P_m(A, B)$ only by a constant factor, if we
extend the domain of definition of $\Psi_1(A, B)$ to the entire interior of the strip $\Pi$
by setting it equal to the functions $\Psi_1$ constructed on the individual $Q_{ks}$
($k = 0, 1, 2, \dots$; $s = 1, 2, \dots$), we obtain
$J_k(\theta) = 0$ ($k = 0, 1, 2, \dots$) and $J(\theta) = 0$ for all $\theta > 0$.
This completes the proof of Theorem 10.1.1. To prove Theorem 10.1.2 we consider
not the $N$ functions $P_m(A, B)$ but the finitely many functions $P_{mi}(A, B)$,
$i = 1, 2, \dots, K$. For each $i$ we have $m = 1, 2, \dots, N_i$ with $N_i = (n_{1i} + n_{2i} - 1)/2$,
where $n_{1i}$ and $n_{2i}$ are the sizes of the samples and

\[ P_{mi}(A, B) = \frac{(AB)^{\frac{n_{1i} - 3}{2}} (B - A)^m}{\sqrt{1 - A}\, \sqrt{B - 1}\, (B - A)^{2N_i - 1}}. \]

The corresponding integrals of the type $J(\theta)$ vanish simultaneously. The variables
$A$ and $B$ are expressed in terms of $\bar{x}_i$, $\bar{y}_i$, $s_{1i}$, and $s_{2i}$.

§2. THE ROMANOVSKIĬ-SUDAKOV THEOREM 1)

To give a firm foundation to the construction of an unrandomized similar test
that we made in §1 we need to prove the following theorem, which we applied in
that section and which was first published by Romanovskiĭ and Sudakov in [50]
and [64].
Theorem 10.2.1. Suppose that $n$ probability densities $p_s(x, y)$, $s = 1, \dots, n$,
are defined on a rectangle $M$: $a \le x \le b$, $c \le y \le d$. Then, for given
$\alpha \in [0, 1]$, one can construct a set $A = A_\alpha \subset M$ such that

\[ \int_a^b I_A(x, y)\, p_s(x, y)\, dx = \alpha \int_a^b p_s(x, y)\, dx \quad \text{for almost all } y \in [c, d] \]

and

\[ \int_c^d I_A(x, y)\, p_s(x, y)\, dy = \alpha \int_c^d p_s(x, y)\, dy \quad \text{for almost all } x \in [a, b] \tag{10.2.1} \]

for $s = 1, \dots, n$, where $I_A(x, y)$ is the characteristic function of the set $A$.
Without loss of generality we may assume that $a = c = 0$ and $b = d = 1$. The
general case reduces to this one by a linear transformation of coordinates.

1) This section was written by V. N. Sudakov.


Here we shall prove the general theorem on the values that the integrals in
the left-hand members of equations (10.2.1) can both assume. Theorem 10.2.1
then follows easily.

Theorem 10.2.2. Let $p_s(x, y)$, $s = 1, \dots, n$, denote probability densities
defined on the unit square $M$ and let $A$ denote a measurable subset of $M$. The
collection, as $A$ ranges over the class of all measurable subsets of $M$, of all
sets each consisting of the $2n$ functions

\[ \left\{ \int_0^1 I_A(x, y)\, p_s(x, y)\, dx,\quad \int_0^1 I_A(x, y)\, p_s(x, y)\, dy,\quad s = 1, \dots, n \right\} \]

coincides with the collection, as $f(x, y)$ ranges over the class of all functions
that are measurable on the unit square and that satisfy the inequality $0 \le f(x, y) \le 1$,
of the sets of $2n$ functions

\[ \left\{ \int_0^1 f(x, y)\, p_s(x, y)\, dx,\quad \int_0^1 f(x, y)\, p_s(x, y)\, dy,\quad s = 1, \dots, n \right\}. \]
To prove this we need to construct, for each function that is measurable on
the unit square and that is bounded by zero and unity, the corresponding subset $A$
of that square. For such a function $f(x, y)$ let $D_f$ denote the set of measurable
functions $g(x, y)$ (defined only up to the values they assume on a set of measure 0)
that satisfy the conditions

\[ \int_0^1 g(x, y)\, p_s(x, y)\, dx = \int_0^1 f(x, y)\, p_s(x, y)\, dx, \quad s = 1, \dots, n, \]

\[ \int_0^1 g(x, y)\, p_s(x, y)\, dy = \int_0^1 f(x, y)\, p_s(x, y)\, dy, \quad s = 1, \dots, n, \tag{10.2.2} \]

\[ 0 \le g(x, y) \le 1. \]
Let us study the set $D_f$ as a subset of the space $L^\infty(M)$ of bounded measurable
functions defined on the square $M$. We know that the space $L^\infty(M)$ is dual
to the space $L^1(M)$ of functions that are Lebesgue-summable on $M$. On the space
$L^\infty(M)$ we introduce the weak topology 1) of the dual space $\sigma(L^\infty, L^1)$. This
means that $g_k(x, y) \to g(x, y)$ if and only if for an arbitrary function
$q(x, y) \in L^1(M)$

\[ \iint_M q(x, y)\, g_k(x, y)\, dx\, dy \to \iint_M q(x, y)\, g(x, y)\, dx\, dy. \]

In the dual space every closed convex bounded subset is compact in the weak
topology (see [8]).
Let us show that $D_f$ is convex, closed and bounded and hence that it is
compact in the topology of $\sigma(L^\infty, L^1)$. That $D_f$ is bounded follows from the third of
conditions (10.2.2). Each of conditions (10.2.2) defines a closed convex subset
of $L^\infty$. Proof of the convexity presents no difficulties. The fact that the
set defined by the third of conditions (10.2.2) is closed is also obvious. It
remains to show that the first two of conditions (10.2.2) define closed subsets. The
first of these conditions is equivalent to the condition that for an arbitrary
bounded measurable function $a(y)$, defined on [0, 1],

\[ \int_0^1 a(y)\, dy \int_0^1 g(x, y)\, p_s(x, y)\, dx = \int_0^1 a(y)\, dy \int_0^1 f(x, y)\, p_s(x, y)\, dx \]

or

\[ \iint_M g(x, y)\, a(y)\, p_s(x, y)\, dx\, dy = \text{const}. \]

1) A neighborhood of an element $g(x, y)$ of $L^\infty(M)$ in the topology of $\sigma(L^\infty, L^1)$ is any
set containing a set $V = V(g; q_1, \dots, q_k, \varepsilon)$ consisting of all elements $\tilde{g}(x, y)$ of
$L^\infty(M)$ such that

\[ \left| \iint_M (\tilde{g}(x, y) - g(x, y))\, q_i(x, y)\, dx\, dy \right| < \varepsilon, \quad i = 1, \dots, k \]

($q_i(x, y) \in L^1(M)$, $\varepsilon > 0$).
We recall that a topological vector space is said to be locally convex if every
neighborhood of every point in it contains a convex neighborhood of that point. Since the
sets $V$ are convex the space $L^\infty(M)$ is locally convex in the topology of $\sigma(L^\infty, L^1)$.

Since $a(y)\, p_s(x, y) \in L^1(M)$, the set defined by the last equation is a closed
hyperplane; hence the intersection of all such hyperplanes as $a(y)$ ranges over
the same set of all bounded measurable functions is also closed. The situation is
analogous with the second of conditions (10.2.2). The set $D_f$, being the
intersection of the sets examined above, is closed, convex and bounded; hence it is
compact.
The proof of Theorem 10.2.2 will be complete if we can show that $D_f$
contains functions $I(x, y)$ that assume only the values 1 and 0. Actually, not only
does $D_f$ contain such functions but it contains enough of them in the following
sense.
A point $a$ in a convex set $D$ is called an extremal point of that set if the
relations $b \in D$, $c \in D$ and $a = (b + c)/2$ imply that $a = b = c$. It is easy to see
that each function $I(x, y) \in D_f$ that assumes only the values 0 and 1 is an
extremal point of $D_f$. On the other hand, from the well-known Krein-Milman theorem
[8], every convex compact set in a locally convex Hausdorff space is the closed
convex hull of the set of its extremal points. In particular this applies to the
compact set $D_f$ in the space $L^\infty(M)$ equipped with the weak topology of the dual
space. Since the set $D_f$ is nonempty, the set of its extremal points is also
nonempty and even "sufficiently rich". It remains to show that the role of extremal
points of $D_f$ can be played only by functions that assume only the two values 0
and 1. To do this we construct, for every function $g(x, y) \in D_f$ for which the
measure of the set $\{(x, y) \in M;\ 0 < g(x, y) < 1\}$ is positive, a nonzero function
$h(x, y)$ such that $g + h \in D_f$ and $g - h \in D_f$. This will show that $g$ is not an
extremal point of $D_f$.
We begin our construction of such a function $h(x, y)$ by choosing a set $B \subset M$
of positive measure on which the function $g(x, y)$ is separated from both zero and
unity by a positive amount:

\[ B = \{ (x, y) :\ 0 < \beta < g(x, y) < 1 - \beta \}. \]

Let us choose natural numbers $k$ and $l$ such that $kl > n(k + l)$.
We know (see [66], p. 129) that almost all points of a plane measurable set
$B$ are points of density of that set. From this it follows that there exists a set
of distinct numbers $\varepsilon_0 = 0, \varepsilon_1, \dots, \varepsilon_{k-1}$; $\delta_0 = 0, \delta_1, \dots, \delta_{l-1}$ such that

\[ \operatorname{mes} \bigcap_{i=0,\ j=0}^{i=k-1,\ j=l-1} \left( B - \{(\varepsilon_i, \delta_j)\} \right) > 0. \]

(Here $B - \{(\xi, \eta)\} = \{(x - \xi, y - \eta);\ (x, y) \in B\}$.) In the set

\[ \bigcap_{i=0,\ j=0}^{i=k-1,\ j=l-1} \left( B - \{(\varepsilon_i, \delta_j)\} \right), \]

let us choose a subset $C$ of positive measure with diameter not exceeding the
smallest of the differences $|\varepsilon_p - \varepsilon_q|$, $|\delta_p - \delta_q|$ for $p \ne q$. Obviously
$(C + \{(\varepsilon_{i_1}, \delta_{j_1})\}) \cap (C + \{(\varepsilon_{i_2}, \delta_{j_2})\}) = \varnothing$ for $(i_1, j_1) \ne (i_2, j_2)$
and $C + \{(\varepsilon_i, \delta_j)\} \subset B$ for arbitrary $i$, $j$.
For an arbitrary fixed point $(x, y) \in C$ let us consider the homogeneous
linear system of equations in the unknowns $h_{ij}$:

\[ \sum_{i=0}^{k-1} p_s(t_{ij})\, h_{ij} = 0, \quad s = 1, \dots, n,\ \ j = 0, \dots, l - 1, \]

\[ \sum_{j=0}^{l-1} p_s(t_{ij})\, h_{ij} = 0, \quad s = 1, \dots, n,\ \ i = 0, \dots, k - 1, \tag{10.2.3} \]

where $t_{ij} = t_{ij}(x, y) = (x, y) + (\varepsilon_i, \delta_j)$. Obviously the rank of the matrix of
coefficients of this system does not exceed $n(k + l - 1)$, which is less than the
number $kl$ of unknowns.
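
The counting argument behind the system (10.2.3) can be illustrated on a small instance. The sketch below is an assumption-laden illustration, not part of the proof: the values $p_s(t_{ij})$ are replaced by arbitrary positive rational numbers drawn from a seeded generator, the function names are hypothetical, and the rank is computed by exact Gaussian elimination over the rationals. The bound $n(k + l - 1)$ reflects the fact that, for each $s$, the sum of the equations of the first group coincides with the sum of the equations of the second group.

```python
from fractions import Fraction
import random

def build_system(p, n, k, l):
    """Rows of (10.2.3); unknown h_ij is stored in column i * l + j.

    p[s][i][j] stands for the value p_s(t_ij) (arbitrary numbers here).
    """
    rows = []
    for s in range(n):
        for j in range(l):  # first group: sum over i for fixed j
            row = [Fraction(0)] * (k * l)
            for i in range(k):
                row[i * l + j] = p[s][i][j]
            rows.append(row)
        for i in range(k):  # second group: sum over j for fixed i
            row = [Fraction(0)] * (k * l)
            for j in range(l):
                row[i * l + j] = p[s][i][j]
            rows.append(row)
    return rows

def rank(rows):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    m = [r[:] for r in rows]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            if m[r][c] != 0:
                factor = m[r][c] / m[rk][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

def system_rank(n, k, l, seed=0):
    """Rank of (10.2.3) for arbitrary positive stand-ins for p_s(t_ij)."""
    rng = random.Random(seed)
    p = [[[Fraction(rng.randint(1, 9)) for _ in range(l)] for _ in range(k)]
         for _ in range(n)]
    return rank(build_system(p, n, k, l))
```

For example, with $n = 1$ and $k = l = 3$ there are $kl = 9$ unknowns and $n(k + l) = 6$ equations, so a nonzero solution $h_{ij}$ exists.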
Let us now reindex the unknowns $h_{ij}$ in a single sequence by assigning to
each a single index number: $h_1, h_2, \dots, h_{kl}$ (where $kl$ represents a single
integer). To every point $(x, y) \in C$ let us assign the number $N(x, y)$ such that the
rank of the submatrix of coefficients of the system (10.2.3) consisting of the
coefficients of $h_1, \dots, h_N$ is equal to $N$ and the rank of the submatrix of
coefficients of the system (10.2.3) consisting of the coefficients of $h_1, \dots, h_{N+1}$ is
also equal to $N$. If all the coefficients of $h_1$ are equal to 0 we take $N = 0$.
Now consider a subset $C_1$ of $C$ that is of positive measure and on which
the function $N(x, y)$ assumes a constant value. By the choice of $N$ the system
(10.2.3) will be satisfied if we assign to the unknown $h_{N+1}$ an arbitrary value
$h_{N+1} = \varepsilon$, set $h_{N+2} = \dots = h_{kl} = 0$, and then determine $h_1, \dots, h_N$
unambiguously from the system (10.2.3). Here the functions $h_1 = h_1(x, y, \varepsilon), \dots,
h_N = h_N(x, y, \varepsilon)$ are obviously measurable with respect to all the arguments and they
depend linearly on $\varepsilon$. Let us suppose that a single $\varepsilon$ is chosen for all
$(x, y) \in C_1$.
Now on the set

\[ C_2 = \bigcup_{i,\, j} \left( C_1 + \{(\varepsilon_i, \delta_j)\} \right) \]

we consider the measurable function $\tilde{h}(x, y)$ obtained by setting
$\tilde{h}(x + \varepsilon_i, y + \delta_j) = h_p(x, y, \varepsilon)$, where $h_p(x, y, \varepsilon)$ is the value of the
unknown $h_{ij}$ constructed as described above for the system (10.2.3) after its
coefficients are evaluated at the point $(x, y)$ and where the unknown $h_{N+1}$ is
assigned the value $\varepsilon$. Since the function $\tilde{h}$ is almost everywhere finite, it is
bounded everywhere on $C_2$ except on a subset of arbitrarily small measure. Suppose
that $|\tilde{h}(x, y)| \le Q$ everywhere except on a set $C_3$ such that
$\operatorname{mes} C_3 < \operatorname{mes} C_1$. Then

\[ \operatorname{mes} \left( C_1 \setminus \bigcup_{i=0,\ j=0}^{i=k-1,\ j=l-1} \left( C_3 - \{(\varepsilon_i, \delta_j)\} \right) \right) > 0. \]

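Finally, we define the sets

\[ C_4 = C_1 \setminus \bigcup_{i=0,\ j=0}^{i=k-1,\ j=l-1} \left( C_3 - \{(\varepsilon_i, \delta_j)\} \right), \]

\[ C_5 = \bigcup_{i=0,\ j=0}^{i=k-1,\ j=l-1} \left( C_4 + \{(\varepsilon_i, \delta_j)\} \right), \]

and we define on the square $M$ the function

\[ \tilde{\tilde{h}}(x, y) = \begin{cases} \tilde{h}(x, y), & (x, y) \in C_5, \\ 0, & (x, y) \in M \setminus C_5. \end{cases} \]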

This function $\tilde{\tilde{h}}(x, y)$ is now defined on the entire square, and as one can
easily see $|\tilde{\tilde{h}}(x, y)| \le Q$. Let us show that the function

\[ h(x, y) = \frac{\beta}{Q}\, \tilde{\tilde{h}}(x, y) \]

is the function we have been seeking. Let $l$ denote the straight line $x = x_0$. If
$l$ intersects the set $C_4 + \{(\varepsilon_{i_0}, 0)\}$, then

\[ \int_0^1 \tilde{\tilde{h}}(x_0, y)\, p_s(x_0, y)\, dy = \sum_{j=0}^{l-1} \int_{(C_4 + \{(\varepsilon_{i_0}, 0)\}) \cap l} \tilde{h}(x_0, y + \delta_j)\, p_s(x_0, y + \delta_j)\, dy. \]

This last expression is equal to

\[ \int_{(C_4 + \{(\varepsilon_{i_0}, 0)\}) \cap l}\ \sum_{j=0}^{l-1} p_s(t_{i_0 j})\, h_{i_0 j}\, dy = 0, \]

since the integrand is identically equal to 0 by virtue of the second group of
equations (10.2.3). We treat analogously the integrals over straight lines of the
form $y = y_0$ that intersect the set $C_5$. In the case of lines that do not intersect
the set $C_5$, the function $\tilde{\tilde{h}}$ is identically equal to 0, and on these, of course,

\[ \int_0^1 \tilde{\tilde{h}}(x_0, y)\, p_s(x_0, y)\, dy = \int_0^1 \tilde{\tilde{h}}(x, y_0)\, p_s(x, y_0)\, dx = 0. \]

Similar equations hold for the function $h(x, y)$, which differs from $\tilde{\tilde{h}}(x, y)$ only
by a constant factor. Consequently if the function $g(x, y) \in D_f$ is different from
both zero and unity on a set of positive measure, then $g(x, y) + h(x, y) \in D_f$ and
$g(x, y) - h(x, y) \in D_f$, so that $g(x, y)$ is not an extremal point of the convex
compact set $D_f$.
This completes the proof of the theorem.
CHAPTER XI

THE PROBLEM OF MANY SMALL SAMPLES

§1. STATEMENT OF THE PROBLEM

By the problem of many small samples we mean the problem of verifying a
hypothesis $H_0$ regarding a class of distributions defined on a measurable space
$(\mathcal{X}, \mathcal{F})$ with measure $\mu(x)$ in the form of a family of densities

\[ f_\theta(x) = f(x, \theta_1, \dots, \theta_s), \]

where $\theta_1, \dots, \theta_s$ are parameters that vary in $E_s$. They are not assumed to be
interdependent. The hypothesis $H_0$ is the hypothesis that the samples belong to
such a class under suitable choice of the parameters $\theta_1, \dots, \theta_s$. This hypothesis
is verified on the basis of $q$ independent samples $O_1, \dots, O_q$ of sizes $n_1, \dots, n_q$.
The parameters $\theta_1, \dots, \theta_s$ are assumed to remain constant within each of
the samples but they vary from sample to sample. The sizes of the samples
$n_1, \dots, n_q$ are in general assumed to be small, so that we cannot use them to
estimate the parameters $\theta_1, \dots, \theta_s$ with the necessary accuracy.
Such a problem is encountered naturally in problems of continuous control of
industrial production when it is desirable to use many small samples to verify the
normality of a random deviation of some aspect of the production from the
technological standard, in order to be able afterwards to establish some plan of
continuous control on the basis of a normal law.
To verify the hypothesis $H_0$ it is natural to use the above-described methods
of eliminating parameters and constructing similar statistics. However an important
role is played here by the question as to what alternatives the hypothesis
$H_0$ has. The choice of such statistics depends on this.
Let us consider the case in which the observations have an exponential-type
distribution of the form

\[ P_\theta(x) = C(\theta) \exp[-(\theta_1 T_1(x) + \dots + \theta_s T_s(x))]\, p(x) \tag{11.1.1} \]

on a measurable space $(\mathcal{X}, \mathcal{F})$, where $p(x)$ is the density with respect to the
measure $\mu(x)$ on $(\mathcal{X}, \mathcal{F})$. Here the parameters are not assumed to be interdependent.
If no other choice of alternative is prescribed by the statement of the problem
itself, we can take as an alternative a family with probability density of the form

\[ P_\theta(x, \varepsilon, V) = C(\theta, \varepsilon) \exp[-(\theta_1 T_1(x) + \dots + \theta_s T_s(x) + \varepsilon V(x))]\, p(x), \tag{11.1.2} \]

where $V(x)$ is a statistic and $\varepsilon$ is a small number. Here we assume that
$P_\theta(x, \varepsilon, V)$ is a probability density for small values of $\varepsilon$. For $\varepsilon = 0$ we obtain the
hypothesis $H_0$. We denote the alternative for a given value of $\varepsilon$ by $H_\varepsilon$. For an
arbitrary admissible value of $\varepsilon$ the function $T = (T_1(x), \dots, T_s(x))$ is a sufficient
statistic for $\theta = (\theta_1, \dots, \theta_s)$. Since the parameters are assumed to be independent
for each admissible value of $\varepsilon$, the family of sufficient statistics $T$ is complete
(see §2, Chapter IV).
If our independent samples $O_1, \dots, O_q$ are of the same size $n$, we can consider
the conditional distributions of the samples for fixed sufficient statistics
in each sample. Such conditional distributions depend only on $\varepsilon$, $V$, and the
values of the fixed sufficient statistics, and we can construct a criterion for
distinguishing the hypotheses $H_0$ and $H_\varepsilon$ of a given level that has minimal
conditional probabilities of errors of the second kind.
For series of repeated samples $O_1, \dots, O_q$ from one-dimensional distributions
containing only parameters of displacement and dilation (these distributions are
not necessarily assumed to be exponential), the construction of similar statistics
can be achieved by constructing quite simple affine invariants of the sample space.
Some interesting studies have been made by Petrov [60] in this direction. We
shall expound these results in the following section.

§2. A. A. PETROV'S INVESTIGATIONS

Suppose that a repeated sample $x_1, \dots, x_n$ is taken from a distribution with
probability density $f(x)$ with respect to Lebesgue measure. We define

\[ S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad y_i = \frac{x_i - \bar{x}}{S}. \]

Then

\[ \sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} y_i^2 = 1. \tag{11.2.1} \]
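
The relations (11.2.1) and the invariance of the vector $(y_1, \dots, y_n)$ under displacement and positive dilation can be checked on a small sample. The following is a minimal sketch with a hypothetical function name and arbitrary sample values; $\bar{x}$ is taken to be the arithmetic mean of the sample.

```python
import math

def studentize(xs):
    """Return (y_1, ..., y_n) with y_i = (x_i - mean) / S, S^2 = sum (x_i - mean)^2."""
    n = len(xs)
    mean = sum(xs) / n
    S = math.sqrt(sum((x - mean) ** 2 for x in xs))
    return [(x - mean) / S for x in xs]

# Arbitrary sample and an affine image of it (displacement 7, dilation 3 > 0).
sample = [2.0, 3.5, -1.0, 4.2]
ys = studentize(sample)
zs = studentize([3.0 * x + 7.0 for x in sample])

sum_y = sum(ys)                    # should vanish, first relation of (11.2.1)
sum_y2 = sum(y * y for y in ys)    # should equal 1, second relation of (11.2.1)
max_shift = max(abs(y - z) for y, z in zip(ys, zs))  # invariance check
```

A negative dilation factor would reverse the sign of every $y_i$, so the check above is restricted to positive dilations.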

Thus the distribution of the random vector $\eta = (y_1, \dots, y_n)$ is concentrated on the
$(n - 2)$-dimensional sphere $\mathfrak{S}$ defined by (11.2.1). Let us derive the distribution
of $\eta$ on the sphere $\mathfrak{S}$. We introduce the new coordinate system
$\bar{x}, S, \varphi_1, \dots, \varphi_{n-2}$, where $\varphi_1, \dots, \varphi_{n-2}$ are the usual spherical
coordinates for $\mathfrak{S}$. In the new coordinate system the element of probability is of the form

\[ f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n = \prod_{i=1}^{n} f(\bar{x} + S y_i) \left| \frac{\partial(x_1, \dots, x_n)}{\partial(\bar{x}, S, \varphi_1, \dots, \varphi_{n-2})} \right| d\bar{x}\, dS\, d\varphi_1 \cdots d\varphi_{n-2}. \]

The probability density of $\eta$ for a point $\varphi_1, \dots, \varphi_{n-2}$ of the sphere $\mathfrak{S}$ is

\[ p(\eta) = \int_0^{\infty} dS \int_{-\infty}^{\infty} \prod_{i=1}^{n} f(\bar{x} + S y_i) \left| \frac{\partial(x_1, \dots, x_n)}{\partial(\bar{x}, S, \varphi_1, \dots, \varphi_{n-2})} \right| d\bar{x}. \tag{11.2.2} \]
w co n
p (11) = Vn f dS f sn- II I (x
2 +Sy,) dx. ( 11.2.3)
0 -co l=l
In particular, for a normal sample
I [ (x-a) 2 ]
f (x; a, a)= V2na exp - 2<1 2 '

the corresponding density p 0(71) is of the form

Po (11) Yn J Jsn-
00
dS
00
2 exp (
nx2+s2)
2
- r (n 21)
dx = --n--1::'""'"'"'-
(Jl2i'"t 0 -co 2:it_2_

Thus for a normal sample, the density TJ is constant.
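
The constant value of $p_0(\eta)$ can be checked numerically. The sketch below assumes $a = 0$, $\sigma = 1$ (which is legitimate by affine invariance), evaluates the double integral by a plain trapezoidal rule with a hypothetical cutoff, and compares the result with the closed form and with the reciprocal of the surface area of the unit $(n-2)$-dimensional sphere. The function names are illustrative only.

```python
import math

def trapezoid(f, lo, hi, steps):
    """Composite trapezoidal rule for f on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

def p0_numeric(n, cutoff=12.0, steps=20000):
    """sqrt(n) / (2 pi)^(n/2) times the double integral for p_0.

    The integrand factorises into a function of S and a function of x-bar,
    so the double integral is a product of two one-dimensional integrals.
    """
    s_part = trapezoid(lambda s: s ** (n - 2) * math.exp(-s * s / 2.0),
                       0.0, cutoff, steps)
    x_part = trapezoid(lambda x: math.exp(-n * x * x / 2.0),
                       -cutoff, cutoff, steps)
    return math.sqrt(n) / (2.0 * math.pi) ** (n / 2.0) * s_part * x_part

def p0_closed_form(n):
    """Gamma((n-1)/2) / (2 pi^((n-1)/2))."""
    return math.gamma((n - 1) / 2.0) / (2.0 * math.pi ** ((n - 1) / 2.0))

def sphere_area(n):
    """Surface area of the unit (n-2)-dimensional sphere."""
    return 2.0 * math.pi ** ((n - 1) / 2.0) / math.gamma((n - 1) / 2.0)
```

The product `p0_closed_form(n) * sphere_area(n)` equals 1, which expresses the fact that the constant density is the uniform distribution on the sphere.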


The question naturally arises as to whether this property determines the
normality of the original sample. For the case of a continuous original density
$f(x)$ and sample size $n \ge 6$, a surprising answer has been obtained by Zinger
[21]. A strengthened form of this assertion can be found in [22].
Since the vector $(y_1, \dots, y_n)$ is an affine invariant, instead of $f(x; a, \sigma)$
we may consider $f_0(x) = f(x; 0, 1)$. As an alternative, let us take the type of
densities that is represented by a density of the form

\[ f_\varepsilon(x) = f_0(x)(1 + \varepsilon H(x) + R(x, \varepsilon)), \tag{11.2.4} \]

where $\varepsilon$ is a small number, $H(x)$ is a polynomial, and the function $R(x, \varepsilon)$
satisfies, for all values of $\varepsilon$, the inequality

\[ |R(x, \varepsilon)| \le \varepsilon^2 R(x), \tag{11.2.5} \]

where $R(x)$ is a polynomial that assumes nonnegative values for all real $x$. Let
us show that for the corresponding probability density $p_\varepsilon(\eta)$ of the vector $\eta$ on
the sphere $\mathfrak{S}$,

\[ p_\varepsilon(\eta) = p_0(\eta) + \varepsilon \sqrt{n} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \sum_{i=1}^{n} H(\bar{x} + S y_i) \prod_{j=1}^{n} f_0(\bar{x} + S y_j)\, d\bar{x} + O(\varepsilon^2) \tag{11.2.6} \]

uniformly with respect to $\eta$.

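The expression (11.2.3) for $p_\varepsilon(\eta)$ contains the product

\[ \prod_{i=1}^{n} f_\varepsilon(\bar{x} + S y_i) = \Pi_1 \Pi_2, \]

where

\[ \Pi_1 = \prod_{i=1}^{n} f_0(\bar{x} + S y_i), \qquad \Pi_2 = \prod_{i=1}^{n} (1 + \varepsilon H(\bar{x} + S y_i) + R(\bar{x} + S y_i, \varepsilon)). \]

For brevity we write

\[ H_i = H(\bar{x} + S y_i), \qquad R_i = R(\bar{x} + S y_i, \varepsilon) \qquad (i = 1, 2, \dots, n). \]

Then

\[ \Pi_2 = \prod_{i=1}^{n} (1 + \varepsilon H_i + R_i) = 1 + \varepsilon \sum_{i=1}^{n} H_i + \tilde{R}. \]

Here the quantity $\tilde{R}$ is the sum of finitely many terms of the form
$\varepsilon^p H_{i_1} \cdots H_{i_p} R_{j_1} \cdots R_{j_q}$, where $p = 0, 1, \dots, n$; $q = 0, 1, \dots, n$;
$i_1, \dots, i_p, j_1, \dots, j_q$ are distinct, $p + q \le n$ and $p + 2q \ge 2$. Returning to
(11.2.3) we now find that

\[ p_\varepsilon(\eta) = \sqrt{n} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \left( 1 + \varepsilon \sum_{i=1}^{n} H_i + \tilde{R} \right) \Pi_1\, d\bar{x} \]
\[ = p_0(\eta) + \varepsilon \sqrt{n} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \sum_{i=1}^{n} H(\bar{x} + S y_i) \prod_{j=1}^{n} f_0(\bar{x} + S y_j)\, d\bar{x} + \sqrt{n} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \tilde{R}\, \Pi_1\, d\bar{x}. \]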

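To prove (11.2.6) it only remains to show that

\[ \bar{R} = \sqrt{n} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \tilde{R}\, \Pi_1\, d\bar{x} = \frac{\sqrt{n}}{(2\pi)^{n/2}} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \tilde{R} \exp\left( -\frac{n \bar{x}^2 + S^2}{2} \right) d\bar{x} = O(\varepsilon^2) \tag{11.2.7} \]

uniformly with respect to $\eta$ (for fixed sample size $n$). It follows from what we
said above that $\bar{R}$ is the sum of finitely many terms of the form

\[ I = C \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \varepsilon^p H_{i_1} \cdots H_{i_p} R_{j_1} \cdots R_{j_q} \exp\left( -\frac{n \bar{x}^2 + S^2}{2} \right) d\bar{x}, \]

where $C > 0$ depends only on $n$. Let us show that each of these integrals is
$O(\varepsilon^2)$.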
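By virtue of (11.2.5),

\[ |I| = O(1)\, |\varepsilon|^{p + 2q} \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \prod_{k=1}^{p} |H(\bar{x} + S y_{i_k})| \prod_{k=1}^{q} R(\bar{x} + S y_{j_k}) \exp\left( -\frac{n \bar{x}^2 + S^2}{2} \right) d\bar{x}. \]

Let us denote by $\bar{I}$ the double integral in this expression. Since $p + 2q \ge 2$, we
need only show that

\[ \bar{I} = \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \prod_{k=1}^{p} |H(\bar{x} + S y_{i_k})| \prod_{k=1}^{q} R(\bar{x} + S y_{j_k}) \exp\left( -\frac{n \bar{x}^2 + S^2}{2} \right) d\bar{x} \le K, \]

where $K$ is independent of $\eta$. To do this we choose a polynomial $P(x)$ such
that for all $x$

\[ P(x) \ge 1, \qquad P(x) \ge R(x), \qquad P(x) \ge |H(x)|. \]

For such a polynomial we may choose, for example, the polynomial

\[ P(x) = R(x) + (H(x))^2 + 1. \]

Then

\[ \bar{I} \le \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \prod_{i=1}^{n} \left[ P(\bar{x} + S y_i) \exp\left( -\frac{(\bar{x} + S y_i)^2}{2} \right) \right] d\bar{x}. \]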

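Let us choose the constant $K_1$ in such a way that

\[ P(x) \exp\left( -\frac{x^2}{4} \right) \le K_1 \quad \text{for all real } x. \]

Then

\[ \bar{I} \le K_1^n \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \exp\left[ -\frac{S^2}{4} \right] \exp\left[ -\frac{n \bar{x}^2}{4} \right] d\bar{x} = K_1^n\, 2^{n-1} \sqrt{\frac{\pi}{n}}\ \Gamma\left( \frac{n-1}{2} \right) = K, \]

which completes the proof of (11.2.7).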
Let $\eta_1, \dots, \eta_q$ denote $q$ independent observations on the vector $\eta$. Suppose
that we are verifying the hypothesis $H_0$: $p(\eta) = p_0(\eta)$ as opposed to the
alternative $H_\varepsilon$: $p(\eta) = p_\varepsilon(\eta)$ for given $\varepsilon \ne 0$. According to the familiar
Neyman-Pearson criterion, the test that is optimal in a certain sense is based on the statistic

\[ \gamma(\eta_1, \dots, \eta_q) = \ln \frac{p_\varepsilon(\eta_1) \cdots p_\varepsilon(\eta_q)}{p_0(\eta_1) \cdots p_0(\eta_q)}. \tag{11.2.8} \]

Here it should be noted that, by what was shown above, $p_\varepsilon(\eta_i) \ne 0$ for $\eta_i \in \mathfrak{S}$
and sufficiently small $\varepsilon$ ($i = 1, 2, \dots, q$). The test will have a critical zone
$\gamma > c$ with level

\[ \alpha = P(\gamma > c \mid H_0) \]

and with probability of error of the second kind

\[ \beta = P(\gamma \le c \mid H_\varepsilon). \]
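We consider the behavior of (11.2.8) as $q \to \infty$ with fixed $n$. We define
$\gamma_k = \ln(p_\varepsilon(\eta_k)/p_0(\eta_k))$, so that

\[ \gamma(\eta_1, \dots, \eta_q) = \sum_{k=1}^{q} \gamma_k. \]

The quantities $\gamma_k$ ($k = 1, 2, \dots, q$) are independent and identically distributed.
We shall show that they have a finite third moment both under the hypothesis $H_0$
and under the hypothesis $H_\varepsilon$. Then the central limit theorem (see [33]) can be
applied to the random variable $\gamma$. We have

\[ P\left( \frac{\gamma - q a_0}{\sigma_0 \sqrt{q}} < t \,\Big|\, H_0 \right) = G(t) + \frac{\theta_0 \rho_0}{\sqrt{q}}, \tag{11.2.9} \]

\[ P\left( \frac{\gamma - q a_\varepsilon}{\sigma_\varepsilon \sqrt{q}} < t \,\Big|\, H_\varepsilon \right) = G(t) + \frac{\theta_1 \rho_\varepsilon}{\sqrt{q}}, \tag{11.2.10} \]

where

\[ G(t) = \int_{-\infty}^{t} f_0(x)\, dx, \qquad a_0 = E(\gamma_k \mid H_0), \qquad a_\varepsilon = E(\gamma_k \mid H_\varepsilon), \]

\[ \sigma_0^2 = D(\gamma_k \mid H_0), \qquad \sigma_\varepsilon^2 = D(\gamma_k \mid H_\varepsilon), \]

\[ \rho_0 = \frac{1}{\sigma_0^3} E(|\gamma_k - a_0|^3 \mid H_0), \qquad \rho_\varepsilon = \frac{1}{\sigma_\varepsilon^3} E(|\gamma_k - a_\varepsilon|^3 \mid H_\varepsilon), \]

\[ |\theta_0| < 2, \qquad |\theta_1| < 2. \]

From this we obtain the expression for the level of the test $\alpha$ and the probability
$\beta$ of error of the second kind.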
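\[ \alpha = 1 - G\left( \frac{c - q a_0}{\sigma_0 \sqrt{q}} \right) - \frac{\theta_0 \rho_0}{\sqrt{q}}, \qquad \beta = G\left( \frac{c - q a_\varepsilon}{\sigma_\varepsilon \sqrt{q}} \right) + \frac{\theta_1 \rho_\varepsilon}{\sqrt{q}}. \]

We denote by $t_z$ the number satisfying the equation

\[ 1 - G(t_z) = z. \]

Suppose that $\alpha$, $\beta$ and $\varepsilon$ are given. Using the usual symbol for asymptotic
equality, we obtain, as $q \to \infty$,

\[ t_\alpha \sim \frac{c - q a_0}{\sigma_0 \sqrt{q}}, \qquad t_{1-\beta} = -t_\beta \sim \frac{c - q a_\varepsilon}{\sigma_\varepsilon \sqrt{q}}, \]

\[ c \sim q a_0 + t_\alpha \sigma_0 \sqrt{q}, \qquad c \sim q a_\varepsilon - t_\beta \sigma_\varepsilon \sqrt{q}. \]

Therefore $t_\alpha \sigma_0 + t_\beta \sigma_\varepsilon \sim \sqrt{q}\, (a_\varepsilon - a_0)$ and

\[ q \sim \frac{(t_\alpha \sigma_0 + t_\beta \sigma_\varepsilon)^2}{(a_\varepsilon - a_0)^2}. \tag{11.2.11} \]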
If $\alpha$ and $\varepsilon$ are fixed, the probability of error of the second kind
$\beta(q, \alpha, \varepsilon) = P(\gamma \le c \mid H_\varepsilon) \to 0$ as $q \to \infty$. For given $\beta$ let
$q = q(\alpha, \beta, \varepsilon)$ denote the smallest integer such that $\beta(q, \alpha, \varepsilon) \le \beta$.
Then $q$ is the number of observations necessary to distinguish the hypotheses $H_0$
and $H_\varepsilon$ on the basis of a sample of constant size $q$ with probabilities $\alpha$ and
$\beta$ of errors of the first and second kinds respectively. We define

\[ q^* = q^*(\alpha, \beta, \varepsilon) = \frac{(t_\alpha \sigma_0 + t_\beta \sigma_\varepsilon)^2}{(a_\varepsilon - a_0)^2}. \]
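
The quantity $q^*$ is easy to evaluate once the moments are known. The sketch below is only an illustration: it assumes that $G$ is the standard normal distribution function (as follows from $f_0(x) = f(x; 0, 1)$), uses `statistics.NormalDist` from the Python standard library for the quantiles $t_z$, and takes the moments $a_0$, $a_\varepsilon$, $\sigma_0$, $\sigma_\varepsilon$ as given inputs. The function names and the numerical inputs are hypothetical.

```python
from statistics import NormalDist

def upper_quantile(z):
    """Return t_z defined by 1 - G(t_z) = z, with G the standard normal law."""
    return NormalDist().inv_cdf(1.0 - z)

def q_star(alpha, beta, sigma_0, sigma_eps, a_0, a_eps):
    """Approximate number of observations q* for error levels alpha and beta."""
    t_alpha = upper_quantile(alpha)
    t_beta = upper_quantile(beta)
    return (t_alpha * sigma_0 + t_beta * sigma_eps) ** 2 / (a_eps - a_0) ** 2

# Hypothetical inputs: equal variances and a unit separation of the means.
example = q_star(alpha=0.05, beta=0.05, sigma_0=1.0, sigma_eps=1.0,
                 a_0=0.0, a_eps=1.0)
```

Lowering either error probability, or bringing $a_\varepsilon$ closer to $a_0$, increases the value returned by `q_star`.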

To compare $q^*$ and $q$ we use the following fact. Suppose that $q^*(\alpha, \beta, \varepsilon) \to \infty$
as $\varepsilon \to 0$ for fixed $\alpha$ and $\beta$ and that the quantities $\rho_0$ and $\rho_\varepsilon$ remain bounded.
Then

\[ \frac{q(\alpha, \beta, \varepsilon)}{q^*(\alpha, \beta, \varepsilon)} \to 1. \tag{11.2.12} \]

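To prove this we note that by virtue of the expressions for $\alpha$ and $\beta$ we have
$c = q a_0 + t_\alpha \sigma_0 \sqrt{q}\, (1 + \lambda)$, where $\lambda = \lambda(\varepsilon, q) \to 0$ as $\varepsilon \to 0$ and
$q \to \infty$; also

\[ \beta(q, \alpha, \varepsilon) = G\left( \sqrt{q}\, \frac{a_0 - a_\varepsilon}{\sigma_\varepsilon} + t_\alpha \frac{\sigma_0}{\sigma_\varepsilon} (1 + \lambda) \right) + \frac{\theta_1 \rho_\varepsilon}{\sqrt{q}}. \]

Let $\beta$ denote a given admissible probability of error of the second kind. Let us
fix $\delta > 0$ arbitrarily small and set

\[ q' = \frac{(t_\alpha \sigma_0 (1 - \delta) + t_{\beta+\delta}\, \sigma_\varepsilon)^2}{(a_\varepsilon - a_0)^2}, \qquad q_2 = [q'], \]

\[ q'' = \frac{(t_\alpha \sigma_0 (1 + \delta) + t_{\beta-\delta}\, \sigma_\varepsilon)^2}{(a_\varepsilon - a_0)^2}, \qquad q_1 = [q''] + 1, \]

where the brackets denote the integral part of the number contained in them.
The quantities $q_1$ and $q_2$ approach $\infty$ as $\varepsilon \to 0$; the quantities $\lambda(\varepsilon, q_1)$
and $\lambda(\varepsilon, q_2)$ depend only on $\varepsilon$ and they approach 0 as $\varepsilon$ approaches 0. When
$q_1$ observations are made we have

\[ \beta(q_1, \alpha, \varepsilon) \le G\left( \sqrt{q_1}\, \frac{a_0 - a_\varepsilon}{\sigma_\varepsilon} + t_\alpha \frac{\sigma_0}{\sigma_\varepsilon} (1 + \delta) \right) + \frac{\theta_1 \rho_\varepsilon}{\sqrt{q_1}} \le \beta - \delta + \frac{\theta_1 \rho_\varepsilon}{\sqrt{q_1}} < \beta \]

for sufficiently small $\varepsilon$. This means that $q(\alpha, \beta, \varepsilon) \le q_1$ for sufficiently small $\varepsilon$.
Analogously, one can show that under the same condition

\[ q(\alpha, \beta, \varepsilon) \ge q_2. \]

Hence

\[ \min\left\{ (1 - \delta)^2,\ \frac{t_{\beta+\delta}^2}{t_\beta^2} \right\} - \frac{1}{q^*} \le \frac{q_2}{q^*} \le \frac{q(\alpha, \beta, \varepsilon)}{q^*(\alpha, \beta, \varepsilon)} \le \frac{q_1}{q^*} \le \max\left\{ (1 + \delta)^2,\ \frac{t_{\beta-\delta}^2}{t_\beta^2} \right\} + \frac{1}{q^*} \]

and

\[ \min\left\{ (1 - \delta)^2,\ \frac{t_{\beta+\delta}^2}{t_\beta^2} \right\} \le \liminf_{\varepsilon \to 0} \frac{q}{q^*} \le \limsup_{\varepsilon \to 0} \frac{q}{q^*} \le \max\left\{ (1 + \delta)^2,\ \frac{t_{\beta-\delta}^2}{t_\beta^2} \right\}. \]

Since $\delta$ is arbitrarily small, the relation (11.2.12) follows.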
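The above reasoning is valid for rather arbitrary random objects $\eta$. Let us
apply it to a particular given vector $\eta = (y_1, \dots, y_n)$. Here by (11.2.7) we have

\[ p_\varepsilon(\eta) = p_0(\eta)\, [1 + \varepsilon u(\eta) + O(\varepsilon^2)], \]

where

\[ u(\eta) = b \int_0^{\infty} dS \int_{-\infty}^{\infty} S^{n-2} \exp\left( -\frac{n \bar{x}^2 + S^2}{2} \right) \sum_{i=1}^{n} H(\bar{x} + S y_i)\, d\bar{x}; \qquad b = \frac{2 \sqrt{n}}{2^{n/2} \sqrt{\pi}\ \Gamma\left( \frac{n-1}{2} \right)}. \]

The function $u(\eta)$ is bounded by the quantity $G$. Thus for sufficiently small $\varepsilon$

\[ \gamma = \ln \frac{p_\varepsilon(\eta)}{p_0(\eta)} = \ln[1 + \varepsilon u + O(\varepsilon^2)] = \varepsilon u + O(\varepsilon^2). \]

If we now set $\mu^2 = E(u^2 \mid H_0)$, we obtain

\[ a_\varepsilon - a_0 = \varepsilon^2 \mu^2 + O(\varepsilon^3). \]

It is easy to see that $E(u \mid H_0) = 0$:

\[ E(1 \mid H_\varepsilon) = E(1 \mid H_0) + \varepsilon E(u \mid H_0) + O(\varepsilon^2), \]

or

\[ 1 = 1 + \varepsilon E(u \mid H_0) + O(\varepsilon^2), \qquad E(u \mid H_0) = O(\varepsilon), \]

that is, $E(u \mid H_0) = 0$. Therefore $a_0 = E(\varepsilon u + O(\varepsilon^2) \mid H_0) = O(\varepsilon^2)$ and
$a_0^2 = O(\varepsilon^4)$; analogously $a_\varepsilon^2 = O(\varepsilon^4)$. It is then easy to show that
$\sigma_0^2 = \varepsilon^2 \mu^2 + O(\varepsilon^3)$ and $\sigma_\varepsilon^2 = \varepsilon^2 \mu^2 + O(\varepsilon^3)$, so that, from what was said above,

\[ q \sim \frac{(t_\alpha + t_\beta)^2}{\varepsilon^2 \mu^2}. \tag{11.2.13} \]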
If we set en= n/µ. 2 we have

(11.2.14)

Thus to get an idea how many independent samples of size n should be
made to verify $H_0$ as opposed to $H_\varepsilon$, we need to calculate $e_n$ in the two cases.
Various representative examples of such calculations are presented by Petrov
[60]. Other interesting calculations are made by Volodin [12].
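
As a rough numerical illustration of (11.2.13), the following sketch evaluates the leading term $(t_\alpha + t_\beta)^2 / (\varepsilon^2 \mu^2)$. It assumes that $t_\alpha$ and $t_\beta$ denote upper quantiles of the standard normal distribution (their definition lies outside this passage) and that $\mu^2 = E(u^2 \mid H_0)$ has already been computed; the function name and its arguments are hypothetical.

```python
from statistics import NormalDist


def approx_num_samples(alpha: float, beta: float, eps: float, mu_sq: float) -> float:
    """Leading term (t_alpha + t_beta)^2 / (eps^2 * mu^2) of relation (11.2.13)."""
    # Assumption: t_p is the upper p-quantile of the standard normal law.
    t_alpha = NormalDist().inv_cdf(1.0 - alpha)
    t_beta = NormalDist().inv_cdf(1.0 - beta)
    return (t_alpha + t_beta) ** 2 / (eps ** 2 * mu_sq)


if __name__ == "__main__":
    q = approx_num_samples(alpha=0.05, beta=0.05, eps=0.1, mu_sq=1.0)
    print(f"approximately {q:.0f} samples")
```

The quantity grows like $\varepsilon^{-2}$: halving $\varepsilon$ multiplies the required number of samples by four.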
APPENDIX

UNSOLVED QUESTIONS

In the analytical formulation of certain problems presented in this book dealing
with the elimination of nuisance parameters in statistical problems, certain
questions and unsolved problems arise rather naturally. Here we arrange these
according to the chapters dealing with the topic in question.

Chapter II. 1. Continuation of the work of Koopman, Dynkin and Brown on
the classification of distributions admitting sufficient statistics of finite rank
for the case in which the distribution densities can vanish. Weakening of the
condition for existence of exponential families in the case when the densities do
not vanish on the sample space.
2. Investigation of exponential families with "sliding carrier", i.e. families
whose carrier depends on the parameters.

Chapter III. 1. Consider a repeated normal sample $x_1, \ldots, x_n$, where $x_i \in N(a, \sigma^2)$.
Let us partition all almost-everywhere-continuous functions $\gamma(a, \sigma)$
into two classes:
I. "Verifiable functions" for which the hypothesis $H_0$: $\gamma(a, \sigma) = \gamma_0$ admits
an invariant verification.
II. The remaining functions.
How can one describe the class I?
2. The same question for an infinite sample $x_1, x_2, \ldots$, and the application
of sequential analysis, where the mean number of steps is bounded up to the final
solution.
3. Generalization of question 1. Suppose that we have an exponential family
of the form (5.2.3). Let us partition the collection of functions $\gamma(\theta_1, \ldots, \theta_s)$
into "verifiable" and "unverifiable" classes, just as in question 1. How can
we describe the class I?


Chapter IV. 1. How can we describe all the similar zones of distribution
families of the form (4.3.5)?
2. Let $x_1, \ldots, x_n$ ($x_i \in N(0, 1)$) denote a repeated normal sample. Let
$P_1(x_1, \ldots, x_n)$ and $P_2(x_1, \ldots, x_n)$ denote two independent polynomial statistics.
Is it always possible to "uncouple" them by means of an orthogonal transformation,
i.e. to reduce them to two statistics depending on entirely different
variables? (For a given sample size n and given degrees $m_1$ and $m_2$
of the polynomials the question can be solved in a finite number of steps for all
such polynomials (see [41]).)

Chapter V. 1. Is it possible to weaken the condition for complexification of


the relations (condition ( Y1) in §B)?
2. Is it possible to weaken the condition on the Jacobians (Y 11)?
3. How can one describe in terms of generalized Laplace transformations
(in the sense of the theory of generalized functions) the construction of nonsmooth
cotests?
These questions are related to the following question in the theory of ana-
lytic functions.
4. Let Z denote a simply-connected complex polycylinder. Let O' denote
a ring of functions that are holomorphic and that have a holomorphic continuation
from the polycylinder to the Cartesian product of the half-planes containing Z.
How can we describe the structure of the ideals of the ring 0'? Under what con-
ditions will the basis of the ideals be finite?
5. How can we construct cotests for the case in which the function h van-
ishes in a given region?
Formula (5.8.6) leads to the following analytic problem: Let $A_j(T_1, \ldots, T_s)$,
$j = 1, 2, \ldots, r$; $r < s$, denote sufficiently smooth functions defined in the Euclidean
space $E_s$ of the arguments $T_1, \ldots, T_s$. Let U denote a simply-connected region
contained in $E_s$ and bounded by smooth surfaces. How can we describe the
system of all functions $H_1, \ldots, H_r$ for which the sum of the convolutions
$$ A_1 * H_1 + \cdots + A_r * H_r $$
exists and vanishes on U?
6. Is it possible to describe smooth cotests for exponential families with
"sliding carrier" (cf. question 2 to Chapter II)?
7. The development of computational methods, in particular linear programming
methods, for finding optimal similar tests.

Chapter VI. 1. Generalization of Wijsman's results on the construction of all
similar tests of the hypothesis $H_0$: $a/\sigma = \gamma_0$ for a repeated normal sample
$x_1, \ldots, x_n$ where $x_i \in N(a, \sigma^2)$. In what problems associated with tests of hypotheses
on normal samples do parabolic differential equations appear? What can we do in
other cases?

Chapter VII. 1. Questions of unbiased estimates of zero (UEZ's) analogous
to the questions in Chapter V on cotests, i.e. questions 1-5 of Chapter V with
the word "cotest" replaced by the expression "unbiased estimate of zero".
2. Generalization of the results of §2 dealing with inadmissible unbiased
estimates. Ways of obtaining more general forms of inadmissible unbiased esti-
mates for incomplete exponential families on a compact set of values of the para-
meters. Cases of inadmissibility for noncompact sets of values of the parameters.
3. Replacement of the variance as a function of the loss of an unbiased esti-
mate with other sufficiently smooth functions. Conditions for inadmissibility of
unbiased estimates.
4. Classification of the best unbiased estimates for incomplete exponential
families from a standpoint of variance. From an analytic point of view this problem
is connected with the following question from functional algebra. Let O
denote the ring of all holomorphic functions defined on a compact simply-connected
polycylinder and let I denote an ideal contained in O. Find the ring K of all
linear differential operators L with constant coefficients such that $LI \subset I$.
5. Problems analogous to the preceding ones for the case in which we are
using not the variance but other sufficiently smooth functions of the loss.
6. Construction of an analogue to Theorem 7.3.1 dealing with inadmissibility
of a sample mean for scale parameters. Extension of Theorem 7.3.1 to observa-
tions connected with a homogeneous Markov chain.

Chapter VIII. 1. Investigation of the question of nonexistence, in general,
for incomplete exponential families of similar zones that depend on sufficient
statistics with sufficiently smooth boundaries.
2. Weakening of the conditions of Theorem 8.3.1. Does there exist a similar
Fisher-Welch-Wald test if we require only continuity of the test boundary or only
satisfaction of a Lipschitz condition for it?

Chapter IX. 1. Can we construct a test analogous to the simple randomized
test in §3 for the case of nonidentical sample sizes?
2. Generalization of Theorem 9.4.1 on the characterization of the Bartlett-
Scheffe test. Weakening of the conditions on the basic space of linear forms.
Construction of an analogue of this theorem to characterize the simplest tests of
the method of least squares with unknown weights.

Chapter X. 1. How can one construct a homogeneous unrandomized similar


test for the Behrens-Fisher problem in the case of like parity of the sizes of the
samples?

Chapter XI. 1. Investigations analogous to those made by Petrov for the


scheme of many small samples from complete exponential families referred to
in §1.
SUPPLEMENT

NEW RESULTS IN THE THEORY OF ESTIMATION AND
TESTING HYPOTHESES FOR PROBLEMS
WITH NUISANCE PARAMETERS

A. M. KAGAN AND V. P. PALAMODOV 1)

The results expounded in this Supplement are mostly answers to questions
put at the end of the book (see queries: 1 to Chapter III, 3 and 5 to Chapter V,
4 and 6 to Chapter VII). But §5 is an exception; there the problem of estimating
a location parameter is considered when the "nuisance parameter" is the form
of the function $F(x - \theta)$ itself, for which only several first distribution moments
for $\theta = 0$ are known.
The results of §§1 and 2 are due to V. P. Palamodov [6, 7], 2) of §3 to
A. M. Kagan and V. P. Palamodov [2, 3], of §4 to A. M. Kagan [4, 12], of §5 to
A. M. Kagan and A. L. Ruhin [5].

§1. INVARIANT VERIFICATION OF FUNCTIONS
WHICH ARE POLYNOMIALS IN $a$ AND $1/\sigma^2$
FOR NORMAL SAMPLES

This section gives the description of all polynomials $P(a, 1/\sigma^2)$ admitting
invariant verification in the sense of §2, Chapter III (strictly speaking, that
sense will be modified a little), on the evidence of a repeated sample $(x_1, \ldots, x_n)$
from a normal population $N(a, \sigma^2)$. The method of this section is purely analytical,
in contrast, for instance, to that of E. Lehmann [13] establishing the
nonverifiability of certain functions as a consequence of the indistinguishability of
the corresponding families of distributions.

1) Editor's note. The translation of the Supplement was provided by the authors.
2) Authors' note. These refer to the Bibliography at the end of this Supplement, not
to the main Bibliography.


Let $x_1, \ldots, x_n \in N(a, \sigma^2)$ be independent normal variables. In studying
the question of the invariant verification we can restrict ourselves, without loss
of generality, to those tests $\varphi(\bar x, s)$ depending only upon the sufficient statistics

We have
$$ E_{a,\sigma}\, \varphi(\bar x, s) = \tilde\varphi(a, \sigma) = \tilde\varphi(a, \xi)
 = C_0 \int_{-\infty}^{\infty} \int_0^{\infty} \xi^{n/2} \exp[-\xi(s + (\bar x - a)^2)]\, s^{(n-3)/2} \varphi(\bar x, s)\, d\bar x\, ds. \tag{S.1.1} $$

Here we denoted $\xi = 1/\sigma^2$; $C_0$ is a constant.


Note that the integral (S.1.1) can be continued as an analytic function of
two complex variables to the product of the complex plane C of the values of a
and the complex halfplane $C_+$ of the values of $\xi$ with $\operatorname{Re} \xi > 0$. This enables
us to introduce the following definition.
Definition. A function $f(a, \xi)$ defined in $C \times C_+$ is called C-verifiable if
there exists a test $\varphi$ such that
$$ \tilde\varphi(a, \xi) = \psi(f(a, \xi)); \quad a \in C,\ \xi \in C_+ $$
for some $\psi \not\equiv$ const.
Any real C-verifiable function is obviously verifiable in the sense of §2,
Chapter III. The converse is not true; in [6] we give sufficient conditions for
the C-verifiability of functions which are verifiable in the sense of §2, Chapter
III.
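Lemma S.1.1. For any test $\varphi(\bar x, s)$

$$ |\tilde\varphi(a, \xi)| \le C_0 \left| \frac{\xi}{\operatorname{Re} \xi} \right|^{n/2} \exp\left[ \frac{|\xi|^2}{\operatorname{Re} \xi}\, |\operatorname{Im} a|^2 \right] \tag{S.1.2} $$

for $a \in C$, $\xi \in C_+$ ($C_0, C_1, \ldots$ in what follows are positive constants).

Proof. By (S.1.1),

$$ |\tilde\varphi(a, \xi)| \le C_0 |\xi|^{n/2} \exp(-\operatorname{Re}(\xi a^2)) \int_{-\infty}^{\infty} \int_0^{\infty} \exp[-\operatorname{Re} \xi\, (s + \bar x^2) + 2 \operatorname{Re}(\xi a)\, \bar x]\, s^{(n-3)/2}\, ds\, d\bar x \tag{S.1.3} $$

because $0 \le \varphi(\bar x, s) \le 1$. The quantity

$$ C_0 (\operatorname{Re} \xi)^{n/2} \exp\left[ -\frac{(\operatorname{Re}(\xi a))^2}{\operatorname{Re} \xi} \right] \int_{-\infty}^{\infty} \int_0^{\infty} \exp[-\operatorname{Re} \xi\, (s + \bar x^2) + 2 \operatorname{Re}(\xi a)\, \bar x]\, s^{(n-3)/2}\, ds\, d\bar x $$

is the power function of the trivial test $\varphi \equiv 1$ at the point $(\operatorname{Re} \xi, \operatorname{Re}(\xi a)/\operatorname{Re} \xi)$.
Taking this into account, we can write the right side of (S.1.3) in the form

$$ C_0 \left| \frac{\xi}{\operatorname{Re} \xi} \right|^{n/2} \exp\left[ \frac{(\operatorname{Re}(\xi a))^2}{\operatorname{Re} \xi} - \operatorname{Re}(\xi a^2) \right]. $$

Putting $a = \alpha + i\beta$, $\xi = \zeta + i\eta$ we get

$$ \frac{(\operatorname{Re}(\xi a))^2}{\operatorname{Re} \xi} - \operatorname{Re}(\xi a^2)
 = \frac{1}{\zeta}\left[ (\alpha\zeta - \beta\eta)^2 - \zeta(\zeta(\alpha^2 - \beta^2) - 2\eta\alpha\beta) \right]
 = \frac{1}{\zeta}(\zeta^2 + \eta^2)\beta^2 = \frac{|\xi|^2}{\operatorname{Re} \xi}\, |\operatorname{Im} a|^2, $$

which leads to (S.1.2).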
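
The algebraic identity used in the last step of the proof can be spot-checked numerically. The sketch below is only an illustration with hypothetical function names; it compares $(\operatorname{Re}(\xi a))^2/\operatorname{Re}\xi - \operatorname{Re}(\xi a^2)$ with $|\xi|^2 |\operatorname{Im} a|^2/\operatorname{Re}\xi$ for complex $a$ and $\xi$ with $\operatorname{Re}\xi > 0$.

```python
def exponent_lhs(a: complex, xi: complex) -> float:
    """(Re(xi*a))^2 / Re(xi) - Re(xi*a^2), the exponent obtained from (S.1.3)."""
    return (xi * a).real ** 2 / xi.real - (xi * a * a).real


def exponent_rhs(a: complex, xi: complex) -> float:
    """|xi|^2 / Re(xi) * |Im a|^2, the exponent appearing in (S.1.2)."""
    return abs(xi) ** 2 / xi.real * a.imag ** 2


if __name__ == "__main__":
    # Arbitrary test points with Re(xi) > 0.
    for a, xi in [(1 + 2j, 3 + 4j), (-0.5 + 0.25j, 0.1 - 7j), (2.0 + 0j, 1 + 1j)]:
        print(a, xi, exponent_lhs(a, xi), exponent_rhs(a, xi))
```

For real $a$ both sides vanish, which reflects the fact that the bound (S.1.2) reduces to a constant multiple of $|\xi/\operatorname{Re}\xi|^{n/2}$ on the real axis.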
We shall say that $r(a, \xi) \in R$ if

$$ r(a, \xi) = \frac{p(a, \xi)}{q(a, \xi)} = \frac{p_m(\xi) a^m + \cdots + p_0(\xi)}{q_k(\xi) a^k + \cdots + q_0(\xi)} \tag{S.1.4} $$

where $p_i$, $q_j$ ($i = 0, 1, \ldots, m$; $j = 0, \ldots, k$) belong to the ring A of the functions
holomorphic in $C_+$ and $p_m q_k \not\equiv 0$. Without loss of generality we suppose
that either $m > k$ or $m = k$ but $p_m(\xi)/q_k(\xi) \not\equiv$ const. Moreover, we can
suppose that the number m is the least possible and the functions $p_i$ and $q_j$
do not vanish simultaneously.
Theorem S.1.1. 1°. If a function $r \in R$ with $m > 0$ is C-verifiable, then
$k = 0$ in (S.1.4), and the function $q_0(\xi) \ne 0$ on $C_+$. Moreover, the power
function of the test $\psi(r(a, \xi))$ is an entire function of order not exceeding $2/m$.
2°. If, moreover, the functions $q_0, p_0, \ldots, p_m$ and $p_0/q_0$ are analytic in
the vicinity of zero, the function is of the form
$$ r(a, \xi) = r_2(\xi) a^2 + r_1(\xi) a + r_0(\xi); \quad r_i = \frac{p_i}{q_0} \in A $$
times a constant; also, $r_2(0) = r_1(0) = 0$; $r_2'(0) = 1$ and the image of the mapping
$r_2: C_+ \to C$ belongs to $C_+$. The function $\psi$ is bounded in any closed angle
belonging to $C_+$.
In what follows we shall denote by $C^*$ the complex plane compactified in
the usual way.
Lemma S.1.2. Let $p: \omega \to C^*$ be a function meromorphic in a domain
$\omega \subset C$ and not a constant; let $\Omega$ be the range of its values. If $\psi(p(\zeta))$,
$\zeta \in \omega$, is analytic in $\omega$, then $\psi$ is analytic in $\Omega$.
Proof. Let $z_0 \in \Omega$ be an arbitrary point distinct from $\infty$, and let $\zeta_0 \in \omega$
be a point at which $p(\zeta_0) = z_0$. If $p'(\zeta_0) \ne 0$, then in the neighborhood of the
point $z_0$ we have $z = p(w(z))$ where $w(z)$ is holomorphic in $z_0$. Hence
$\psi(z) = \psi(p(w(z)))$ which implies that $\psi$ is holomorphic in $z_0$.
Since $p \not\equiv$ const, the zeros of its derivative are isolated. Let $\zeta_0$ be one
of the zeros of $p'$ and $U \subset \omega$ a closed bounded neighborhood of the point $\zeta_0$,
containing no other zeros of $p'$ and no poles of $p$. Its image $V = p(U)$ is a
bounded neighborhood of the point $z_0 = p(\zeta_0)$.
In the domain $V \setminus \{z_0\}$ the function $\psi$ is bounded and was proved to be
analytic. Hence it is analytic at the point $z_0$ also.
Now let $\zeta_0$ be a pole of the function $p$. Choose a bounded closed neighborhood
$U \subset \omega$ of the point $\zeta_0$. Its image V is a neighborhood of $\infty$, and in the
domain $V \setminus \{z_0\}$ the function $\psi$ is bounded and analytic. Hence it is analytic
at the point $z_0$ also. The lemma is proved.
We pass now to the proof of Theorem S.1.1. We suppose that $\psi \not\equiv$ const.
It is easy to see that the range of values of the function $r: C \times C_+ \to C^*$
always contains C; hence by Lemma S.1.2 the function $\psi$ is entire.
If $k > 0$, then for a suitable $\xi_0$ the image of the mapping $r(a, \xi_0): C \to C^*$
contains the point $\infty$. Hence by Lemma S.1.2 the function $\psi$ is analytic at that
point. Since it is entire it must be a constant, which contradicts our assumptions.
Hence $k = 0$. In a similar way we prove that $q_0$ has no zeros in $C_+$. Hence
we get
$$ r(a, \xi) = r_m(\xi) a^m + \cdots + r_0(\xi); \quad r_i = \frac{p_i}{q_0} \in A. $$

We fix the point $\xi$ so that $r_m(\xi) \ne 0$. Then $r(a, \xi) \sim r_m(\xi) a^m$ for $a \to \infty$. The
inequality (S.1.2) implies for certain values of the constants A and B that

$$ |\psi(r(a, \xi))| \le C_0 \exp(A |\operatorname{Im} a|^2) \le C_0 \exp(B |r(a, \xi)|^{2/m}); $$

hence we deduce that the order of the entire function $\psi$ does not exceed $2/m$.
The first assertion of Theorem S.1.1 is proved.
We pass to the proof of the second assertion. The functions $r_0, p_1, \ldots, p_m, q_0$
are analytic in the vicinity of zero. Hence the functions
$r_i = p_i/q_0$ are analytic in the vicinity of zero, except perhaps at zero itself,
where the only possible singularity is a pole. We shall show that in fact the
coefficients $r_i$ ($i = 1, \ldots, m$) are analytic in the vicinity of zero and $r_i(0) = 0$.
For each $i = 1, \ldots, m$ we have in the vicinity of zero
$$ r_i(\xi) = \rho_i \xi^{\alpha_i} + \rho_i' \xi^{\alpha_i + 1} + \cdots $$
with certain $\rho_i \ne 0$ and $\alpha_i$. If all $\alpha_i > 0$ our assertion is proved. Suppose
that $\alpha_i < 0$ for a certain i; consider the quantity

$$ \alpha = \max_{i \ge 1} \left[ -\frac{\alpha_i}{i} \right]. $$
Let k be the largest number for which $-\alpha_k/k = \alpha$. For $\lambda \in C$ put

g > o,

where the branch of the root can be chosen arbitrarily. Since $\alpha_k < 0$, the quantity
$a_\lambda(\xi)$ is bounded for $\xi \to 0$ for any $\lambda \in C$. Since $\xi > 0$, we get from
(S.1.2):

(S.1.5)

On the other hand,

$$ r(a_\lambda(\xi), \xi) = \sum_{j > k} r_j(\xi) a_\lambda^j(\xi) + \lambda + \sum_{j < k} r_j(\xi) a_\lambda^j(\xi). \tag{S.1.6} $$

In the first sum on the right-hand side of (S.1.6)

$$ r_j(\xi) a_\lambda^j(\xi) \sim C \xi^{j(\alpha_j/j - \alpha_k/k)} \to 0 $$

for $\xi \to 0$, in view of the choice of k. In the second sum

$$ |r_j(\xi) a_\lambda^j(\xi)| \le C \xi^{j(\alpha_j/j - \alpha_k/k)} |\lambda|^{j/k} \le C(|\lambda|^{(k-1)/k} + 1) $$
for $0 < \xi < 1$, since $\alpha_j/j \ge \alpha_k/k$ for $j < k$ and the function $r_0(\xi)$ is bounded
for $\xi \to 0$. We choose $\xi$ so small that the first sum in (S.1.6) is smaller than 1.
Then
Then

(S. 1. 7)

We take now a sufficiently large R. If the point $\lambda$ runs over the circumference
with the radius R, (S.1.7) implies that the point $r(a_\lambda(\xi), \xi)$, which depends
continuously on $\lambda$, runs over a curve homeomorphic to that circumference and
containing the circle

$$ |z| \le R' = R - C(R^{(k-1)/k} + 1). $$

On the other hand, from (S.1.5) it follows that on this curve the function $\psi$ is
bounded by the constant $C \exp(B(\lambda)\xi)$. Since $\xi$ can be chosen as small as we
please, $\psi$ is bounded by the constant C, which does not depend upon $\lambda$, on the
curve described above and hence in the circle of radius $R'$. As $R \to \infty$ so
does $R'$; hence $\psi$ is bounded on the whole plane; hence $\psi =$ const. Since we
assumed that $\psi \not\equiv$ const we see that all $\alpha_i > 0$.
Lemma S.1.3. For each $\xi \in C_+$ there exists an $\varepsilon > 0$ such that the function
$\psi$ is bounded inside the angle
$$ |\operatorname{arc} z - \operatorname{arc} r_m(\xi)| \le \varepsilon $$

and, for odd m, also inside the angle

$$ |\operatorname{arc} z - \operatorname{arc}(-r_m(\xi))| \le \varepsilon. \tag{S.1.8} $$

Proof. Take an arbitrary $\xi \in C_+$. Since $r_m(0) = 0$, it follows that $r_m(\xi) \not\equiv$ const.
Therefore in the vicinity of the point $\xi$ we can find points $\xi_+$ and $\xi_-$
such that

$$ \operatorname{arc} r_m(\xi_+) > \operatorname{arc} r_m(\xi) > \operatorname{arc} r_m(\xi_-), \quad
 \operatorname{arc} r_m(\xi_+) - \operatorname{arc} r_m(\xi_-) = \delta < \pi/2. \tag{S.1.9} $$

We shall show now that the function $\psi$ is bounded inside the angle

$$ \operatorname{arc} r_m(\xi_+) - \delta/3 \ge \operatorname{arc} z \ge \operatorname{arc} r_m(\xi_-) + \delta/3. \tag{S.1.10} $$

Consider the curves

(S.1.11)

We have

This implies that the domain between the curves (S.1.11), taken together with a
sufficiently large circle, contains the angle (S.1.10). By the inequality (S.1.2)
the function $\psi(r(a, \xi_\pm))$ is bounded for real values of a. Hence the function $\psi$
is bounded on the curves (S.1.11). But by (S.1.9) these curves are contained
inside an angle less than $\pi/2$, while $\psi$ was shown above to be an entire function
of order not exceeding $2/m \le 2$. Hence by the Phragmén-Lindelöf principle, the
function $\psi$ is bounded in the domain lying between the curves (S.1.11), as was
to be proved. The case of odd m is dealt with by analogy with the preceding
one.* To complete the proof of Theorem S.1.1, consider the set

(S. 1.12)

Since

for sufficiently small $\xi$, we have

(S.1.13)

Therefore if $\alpha_m > 1$, the set (S.1.12) contains an interval of length $2\pi$. By
Lemma S.1.3 it then follows that the function $\psi$ is bounded inside the angle
$2\pi - \varepsilon$ for any $\varepsilon > 0$. As $\psi$ is an entire function of a finite order, we have by
the Phragmén-Lindelöf principle: $\psi =$ const. Hence $\alpha_m = 1$. Multiplying
$r(a, \xi)$ by $1/\rho_m$, we get $r_m'(0) = 1$.

*With .; replaced by - .; •

In this case it follows from (S.1.13) that the set (S.1.12) contains the interval
$(-\pi/2, \pi/2)$. Then by Lemma S.1.3 the function $\psi$ is bounded inside the angle
$|\operatorname{arc} z| \le \pi/2 - \varepsilon$ for any $\varepsilon > 0$. Hence $\psi$ is a function of order at least 1 and of
finite type. On the other hand, we have proved that its order does not exceed
$2/m$. Hence $m \le 2$. The case $m = 1$ is excluded because for $m = 1$ Lemma
S.1.3 would imply the boundedness of the function $\psi$ also in the angle
$|\operatorname{arc} z - \pi| \le \pi/2 - \varepsilon$ for any $\varepsilon > 0$, which is impossible for nonconstant functions
of finite order.
It remains only to verify that the image of the mapping $r_2: C_+ \to C$ belongs
to $C_+$. Suppose it is not so, and that for a certain $\xi \in C_+$, $|\operatorname{arc} r_2(\xi)| > \pi/2$.
By Lemma S.1.3 the function $\psi$ is bounded inside the angle $|\operatorname{arc} z - \operatorname{arc} r_2(\xi)| \le \varepsilon$.
Since this function is of the first order and bounded inside any angle of the form
$|\operatorname{arc} z| \le \pi/2 - \varepsilon$, again the Phragmén-Lindelöf principle implies that it vanishes
identically. Theorem S.1.1 is proved.
Theorem S.1.2. Let $p(a, \xi)$ be a polynomial which is not a function of $\xi$ only.
For $p(a, \xi)$ to be C-verifiable, it is necessary and sufficient that it be
representable as a linear form of

$$ \xi(a^2 + Aa + B) \tag{S.1.14} $$

where A and B are real and $A^2 - 4B \le 0$.

Proof. Necessity. Let $p(a, \xi)$ be a C-verifiable polynomial, and let the
test $\varphi$ be such that $\tilde\varphi(a, \xi) = \psi(p(a, \xi))$ where $\psi \not\equiv$ const. By Theorem S.1.1
the function $\psi$ is an entire one and $p(a, \xi)$, after multiplication by a suitable
constant, is of the form
$$ p(a, \xi) = p_2(\xi) a^2 + p_1(\xi) a + p_0(\xi), $$
where
$$ p_2(0) = p_1(0) = 0; \quad p_2'(0) = 1. \tag{S.1.15} $$

By the conditions of the theorem $p_2(\xi)$, $p_1(\xi)$, $p_0(\xi)$ are polynomials. We
shall show that their order does not exceed 1. Suppose this is not true; then
for a certain $a = a_0$ the polynomial $p(a_0, \xi)$ is of order $k > 1$ as a polynomial
in $\xi$. Hence
(S.1.16)

Let the point $\xi$ move inside the angle $|\operatorname{arc} \xi| \le \pi/2 - \varepsilon$. Then from (S.1.2) it
follows that $\psi(p(a_0, \xi))$ is bounded. On the other hand, the first term in
(S.1.16) runs over all the values of the angle $|\operatorname{arc} \xi| \le \pi - 2\varepsilon$. Hence the function
$p(a_0, \xi)$ runs over all the values of the complex plane, except perhaps in a
certain circle and in the angle $|\operatorname{arc} z - \pi| \le 3\varepsilon$. Hence the entire function $\psi$ is
bounded on the whole plane except perhaps in an angle $|\operatorname{arc} z - \pi| \le 3\varepsilon$, which
is as small as we please. Since this function is of finite order, the Phragmén-Lindelöf
principle implies that it is bounded on the whole plane and therefore is
a constant. This contradiction proves that $p_2(\xi)$, $p_1(\xi)$ and $p_0(\xi)$ are linear
functions of $\xi$.
From (S.1.15) we get that $p_2(\xi) = \xi$; $p_1(\xi) = A\xi$, where A is a constant.
Subtracting a suitable constant from $p(a, \xi)$ we can also annul the constant term
of $p_0(\xi)$. Then $p(a, \xi)$ takes the form (S.1.14).
Consider now the function $\operatorname{arc}(a^2 + Aa + B)$ for real values of a. Suppose
it to be nonconstant and let $\varphi_1$ and $\varphi_2$ be two distinct values of it. In that
case, if $\xi$ runs over the angle $|\operatorname{arc} \xi| \le \pi/2 - \varepsilon$ and the point a runs over
the real axis, the point $\xi(a^2 + Aa + B)$ will take on all the values inside the
angles

From the inequality (S.1.2) it follows that $\psi$ is bounded inside these
angles for any $\varepsilon > 0$.
Since $\varphi_1 \ne \varphi_2$ and $\psi$ is an entire function of the first
order, this is impossible in view of the Phragmén-Lindelöf principle. Hence
$\operatorname{arc}(a^2 + Aa + B) =$ const. On the other hand, $\operatorname{arc}(a^2 + Aa + B) \to 0$ for $a \to \infty$.
Hence $\operatorname{arc}(a^2 + Aa + B) = 0$, A and B are real numbers and $a^2 + Aa + B$ is
nonnegative for all real a, so that $A^2 - 4B \le 0$.
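
The last step rests on the elementary fact that, for real A and B, the quadratic $a^2 + Aa + B$ is nonnegative on the whole real axis exactly when $A^2 - 4B \le 0$. A minimal sketch with hypothetical helper names makes the equivalence concrete by comparing the discriminant condition with the minimum value $B - A^2/4$, attained at $a = -A/2$.

```python
def is_nonnegative_quadratic(A: float, B: float) -> bool:
    """Discriminant test: a^2 + A*a + B >= 0 for every real a iff A^2 - 4B <= 0."""
    return A * A - 4 * B <= 0


def quadratic_minimum(A: float, B: float) -> float:
    """Minimum of a^2 + A*a + B over real a, attained at a = -A/2."""
    return B - A * A / 4


if __name__ == "__main__":
    for A, B in [(2, 1), (0, 3), (3, 1), (-1, 0)]:
        print(A, B, is_nonnegative_quadratic(A, B), quadratic_minimum(A, B))
```

The two functions always agree in the sense that the discriminant test succeeds precisely when the minimum is nonnegative.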
Sufficiency. Consider the test

s ?_tx 2

s <tx 2

for a given t > O. It can be shown that for a certain pair of numbers (y, s 0 )
with s 0 > 0 we have

Ea, § ¢ (x - y, s - s 0) = C0 exp ( - r g2 (a2 + Aa + B))


where $\tau \in (0, 1)$, so that $\xi(a^2 + Aa + B)$ is verifiable. We omit the details;
they can be found in [6].

§2. THE DESCRIPTION OF ALL COTESTS FOR A CLASS
OF EXPONENTIAL FAMILIES WITH POLYNOMIAL RELATIONS

This is closely related to §5, Chapter V, and we use here the notation and
terminology of that chapter.
Consider the exponential family of densities with respect to Lebesgue
measure

(S. 2.1)

with $\theta, T \in R^s$. Before stating the conditions imposed upon $h(T)$ and the
parametric set, we introduce the set $\omega \subset R^s$ determined in the following way.
Construct the set of $\theta \in R^s$ for which for a certain $C = C(\theta)$ the condition

$$ \{T: (\theta, T) < C\} \supset \operatorname{Supp} h = \mathcal{T} $$

holds.
This set is a cone whose interior we denote by $\omega$. It is important to remark
that if $\theta \in \omega$ we can choose a constant B such that for a suitable $\varepsilon > 0$
$$ (\theta, T) \le -\varepsilon |T| + B; \quad T \in \mathcal{T}. \tag{S.2.2} $$

We shall suppose that the family (S.2.1) satisfies the following requirements.
1.1. $\mathcal{T} = \operatorname{Supp} h$ is a convex set.
1.2. $\omega$ is nonvoid.
1.3. For any $\varepsilon > 0$
$$ \| h(T) \exp(-\varepsilon |T|) \|_{L_2} < \infty. \tag{S.2.3} $$

(Note that from (S.2.2) and (S.2.3) it follows that for any $\theta \in \omega$, $p(T, \theta)$ can be
considered as a probability density.)
2.1. The parameter $\theta$ takes on values in an everywhere dense subvariety
of a real algebraic variety $\Pi \cap \omega$.
2.2. The polynomial ideal I formed by the real polynomials of $\theta$ vanishing
on $\Pi \cap \omega$ is a principal one, i.e. it consists of all polynomials of the form
$$ G(\theta) P(\theta) $$
for a suitable $P(\theta)$.
Denote by N the variety of complex roots of $P(\theta)$ and by $\omega \times R^s_y$ the set
$\{\theta \in C^s;\ \operatorname{Re} \theta \in \omega\}$.
3.1. Each connected component of the intersection $N \cap (\omega \times R^s_y)$ has at
least one real point where $\operatorname{grad} P(\theta) \ne 0$.
Theorem S.2.1. Under Conditions 1.1-3.1 each cotest can be represented
uniquely by the formula

$$ \varphi(T) = \frac{1}{h(T)} P(-\mathcal{D}) \Psi(T); \quad \mathcal{D} = \left[ \frac{\partial}{\partial T_1}, \ldots, \frac{\partial}{\partial T_s} \right] \tag{S.2.4} $$

where $\Psi(T)$ is an (ordinary) function such that
$$ \operatorname{Supp} \Psi \subset \operatorname{Supp} h, \qquad \| \Psi \exp(-\varepsilon |T|) \|_{L_2} < \infty \ \text{for any } \varepsilon > 0. \tag{S.2.5} $$

The function $\Psi$ is uniquely determined.
Conversely, each function of the type (S.2.4) with $\Psi(T)$ satisfying the
requirements (S.2.5) and lying between $-\alpha$ and $1 - \alpha$ for a value of $\alpha \in (0, 1)$, is
a cotest.
Proof. 1. Take a point $a \in \omega$. Then for certain constants $C > 0$ and B
the condition
$$ (a, T) \le -C |T| + B \tag{S.2.6} $$

is fulfilled on the set Supp h.
2. Set

By a theorem of Hörmander [11], for any differential operator with constant
coefficients there is a fundamental solution (in general, a generalized function)
$E(T)$ satisfying the condition

$$ \| E * \Phi \|_{-\varepsilon} \le C \| \Phi \|_{\varepsilon} \tag{S.2.7} $$

for any $\Phi(T)$ for which the right-hand side of the formula is finite. Let $E(T)$ be
such a solution for $P(-\mathcal{D} + a)$.
Let $\varphi(T)$ be a given cotest, put $\check\varphi = \varphi h$ and

$$ \Psi(\tau) = E(T) \exp[-(a, T)] * \check\varphi(T)
 = \int E(T) \exp[-(a, T)]\, \check\varphi(\tau - T)\, dT
 = \exp[-(a, \tau)] \int E(T) \exp(a, \tau - T)\, \check\varphi(\tau - T)\, dT. $$

In view of the inequality (S.2.6), the function $\exp(a, T)\, \check\varphi(T)$ belongs to the
space $L_2$ with the weight $\exp(\varepsilon |T|)$ for a certain $\varepsilon > 0$. Therefore the property
(S.2.7) of the function E implies the inequality

$$ \| \exp(a, \tau) \Psi(\tau) \|_{-\varepsilon} = \| E * \exp(a, T)\, \check\varphi(T) \|_{-\varepsilon} \le C \| \exp(a, T)\, \check\varphi(T) \|_{\varepsilon} \le C \| \check\varphi \|_{-\varepsilon} \quad \text{for all small } \varepsilon > 0. \tag{S.2.8} $$

Let us check the equality (S.2.4). To this end we shall establish first that
the function $E(T) \exp[-(a, T)]$ is the fundamental solution for the operator
$P(-\mathcal{D})$. By the Leibniz differentiation formula we have

$$ P(-\mathcal{D}) [E(T) \exp(-(a, T))] = \exp(-(a, T)) \sum_i \frac{a^i}{i!} P^{(i)}(-\mathcal{D}) E
 = \exp(-(a, T))\, P(-\mathcal{D} + a) E = \exp(-(a, T)) \cdot \delta = \delta, $$
from which we easily obtain
$$ P(-\mathcal{D}) \Psi = P(-\mathcal{D}) (E(T) \exp(-(a, T)) * \check\varphi) = \delta * \check\varphi = \check\varphi. $$
3. We must prove now that $\operatorname{Supp} \Psi \subset \operatorname{Supp} h$. Let $\tau \notin \operatorname{Supp} h$; then from the
convexity of Supp h it follows that for a certain $\lambda$ we have $(\lambda, T) > (\lambda, \tau)$
for $T \in \operatorname{Supp} h$. The construction of $E(T)$ leads to the relation
$$ P(\mathcal{D}_T + a) E(\tau - T) = 0 \ \text{in the half-space } (\lambda, \tau - T) < 0. $$

By a theorem of V. P. Palamodov [8, 9] we have the representation

$$ E(\tau - T) = \int_{P(\theta + a) = 0,\ |\operatorname{Re} \theta| < 2\varepsilon} \exp(\theta, T)\, \mu(d\theta), \tag{S.2.9} $$

where the condition $|\operatorname{Re} \theta| < 2\varepsilon$ is secured by the special choice of the fundamental
solution $E(T)$ for which $\| E * \varphi \|_{-\varepsilon} \le C \| \varphi \|_{\varepsilon}$ (the property (S.2.7)). The
measure $\mu$ in (S.2.9) is such that the integral (S.2.9) converges absolutely
as a generalized function in the half-space $(\lambda, \tau - T) < 0$.
We now have
$$ \Psi(\tau) = \int E(\tau - T) \exp(-(a, \tau - T))\, \check\varphi(T)\, dT
 = \int_{\operatorname{Supp} h = \mathcal{T}} E(\tau - T) \exp(-(a, \tau - T))\, \check\varphi(T)\, dT
 = \exp(-(a, \tau)) \int_{P(\theta + a) = 0,\ |\operatorname{Re} \theta| < 2\varepsilon} (\exp(\theta + a, T), \check\varphi(T))\, \mu(d\theta). \tag{S.2.10} $$
Here we have set

$$ (\exp(\theta + a, T), \check\varphi(T)) = \int_{\mathcal{T}} \exp(\theta + a, T)\, \check\varphi(T)\, dT. $$

The interchanging of the order of integration is permissible because for a
sufficiently small $\gamma > 0$, in view of $|\operatorname{Re} \theta| < 2\varepsilon$, we have:

$$ \exp(\theta + a, T) = O(\exp(-\gamma |T|)) \ \text{for } T \in \mathcal{T}. $$

We shall now show that

$$ (\exp(\theta_0 + a, T), \check\varphi(T)) = 0 \ \text{for } P(\theta_0 + a) = 0,\ |\operatorname{Re} \theta_0| < 2\varepsilon. $$
For sufficiently small $\varepsilon$ we have $\theta_0 + a \in \omega \times R^s_y$. Let $N'$ be a connected
component of the intersection $N \cap (\omega \times R^s_y)$, containing the point $\theta_0 + a$. By
the conditions of the theorem $N'$ has at least one real point $\zeta$ at which
$\operatorname{grad} P(\zeta) \ne 0$. Since $P(\theta)$ is a polynomial with real coefficients, the condition
$\operatorname{grad} P(\zeta) \ne 0$ implies that the set $N'$ contains an open $(s - 1)$-dimensional
part $\nu$ of the set of the real zeros of $P(\theta)$. Now the cotest condition

implies that

$$ (\exp(\theta, T), \check\varphi(T)) = 0 \ \text{for } \theta \in \nu. \tag{S.2.11} $$

Since $(\exp(\theta, T), \check\varphi(T))$ is analytic in $\omega \times R^s_y$, the relation (S.2.11) holds by
the principle of analytic continuation for the whole $N'$, i.e.

$$ (\exp(\theta_0 + a, T), \check\varphi(T)) = 0 $$

for $P(\theta_0 + a) = 0$. Hence

$$ \operatorname{Supp} \Psi \subset \mathcal{T}. $$

4. We shall show that for any £ >0

(S. 2.8) implies that

First we shall prove that 'I' does not depend upon the choice of the point a€ cu.
Take

We have $\| \Psi \exp(a, T) \|_{-\varepsilon} < \infty$ and $\| \Psi' \exp(a', T) \|_{-\varepsilon} < \infty$. From this and from
the convexity of the cone $\omega$ we deduce that the integrals

$$ \tilde\Psi = \int \Psi \exp(\theta, T)\, dT \quad \text{and} \quad \tilde\Psi' = \int \Psi' \exp(\theta, T)\, dT $$
converge absolutely for $\theta = b + a + a'$; $b \in \omega$. But

$$ P(-\mathcal{D}) \Psi = P(-\mathcal{D}) \Psi' $$

implies that
$$ P(\theta)(\tilde\Psi - \tilde\Psi') = 0 \ \text{for } \theta \in \omega + a + a'. $$
Hence $\tilde\Psi = \tilde\Psi'$ and $\Psi = \Psi'$, and therefore
$$ \| \Psi \exp(a, T) \|_{-\varepsilon} < \infty \ \text{for any } a \in \omega. $$
Since $|a|$ in $\omega$ can be made as small as we need, we have

$$ \| \Psi \|_{-\varepsilon} < \infty. $$

The argument of this section also shows that the function $\Psi$ in (S.2.4) is
uniquely determined by the conditions (S.2.5).
5. Application to the Behrens-Fisher problem. For the Behrens-Fisher
problem we have

$$ p(T, \theta) = C(\theta)\, h(T) \exp(\theta_1 T_1 + \cdots + \theta_4 T_4); $$

$$ \omega = \{\theta_2 < 0,\ \theta_4 < 0\}; $$

$$ h(T) = (T_2 - T_1^2)^{(n-3)/2} (T_4 - T_3^2)^{(m-3)/2}. $$

The corresponding ideal is a principal one and is generated by the polynomial

$$ P(\theta) = \theta_1 \theta_4 - \theta_2 \theta_3. $$
Thus the Conditions 1.1-2.2 obviously hold. Let us check the requirement
3.1. We take an arbitrary point $\theta \in N \cap (\omega \times R^s_y)$ and let it move into the real
subspace while remaining during the whole motion in one and the same connected
component.
1) Suppose that $\theta_3 \ne 0$. Consider the path formed by the points:

et= <e 1 , e 2 , ce3' ce 4); c E [ 1, :~ J.


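Clearly all points $\theta_t$ belong to the same component of N. Consider now the
path

$$ \theta_t' = (\operatorname{Re} \theta_1 + it \operatorname{Im} \theta_1,\ \operatorname{Re} \theta_2 + it \operatorname{Im} \theta_2,\ \operatorname{Re} \theta_1 + it \operatorname{Im} \theta_1,\ \operatorname{Re} \theta_2 + it \operatorname{Im} \theta_2); \quad t \in (0, 1). $$
This path takes $\theta_t$ into the point $(\operatorname{Re} \theta_1, \operatorname{Re} \theta_2, \operatorname{Re} \theta_3, \operatorname{Re} \theta_4)$ belonging to
the real subspace.
2) If $\theta_3 = 0$ but $\theta_4 \ne 0$ the argument must be changed in an obvious way.
3) If $\theta_3 = 0$, $\theta_4 = 0$, the path

$$ \theta_t = (\operatorname{Re} \theta_1 + it \operatorname{Im} \theta_1,\ \operatorname{Re} \theta_2 + it \operatorname{Im} \theta_2,\ 0,\ 0); \quad t \in (1, 0) $$

takes the point $(\theta_1, \theta_2, 0, 0)$ into $(\operatorname{Re} \theta_1, \operatorname{Re} \theta_2, 0, 0)$. It remains only to
remark that for $P(\theta) = \theta_1 \theta_4 - \theta_2 \theta_3$, we have $\operatorname{grad} P(\theta) = 0$ only at the origin,
which does not belong to $\omega$.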
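
The remark about the gradient is easy to verify directly: for $P(\theta) = \theta_1\theta_4 - \theta_2\theta_3$ the gradient is $(\theta_4, -\theta_3, -\theta_2, \theta_1)$, a signed permutation of the coordinates, so it vanishes only when all four coordinates do. The following sketch, with hypothetical function names, evaluates P and its gradient at a real point.

```python
from typing import Sequence, Tuple


def behrens_fisher_P(theta: Sequence[float]) -> float:
    """P(theta) = theta_1 * theta_4 - theta_2 * theta_3 (generator of the ideal)."""
    t1, t2, t3, t4 = theta
    return t1 * t4 - t2 * t3


def grad_P(theta: Sequence[float]) -> Tuple[float, float, float, float]:
    """Gradient of P: (dP/dtheta_1, ..., dP/dtheta_4) = (theta_4, -theta_3, -theta_2, theta_1)."""
    t1, t2, t3, t4 = theta
    return (t4, -t3, -t2, t1)


if __name__ == "__main__":
    # A real zero of P lying in omega = {theta_2 < 0, theta_4 < 0}.
    point = (1.0, -2.0, 3.0, -6.0)
    print(behrens_fisher_P(point), grad_P(point))
```

At the sample point P vanishes while the gradient does not, which is the kind of real point that requirement 3.1 asks for.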
Hence for the Behrens-Fisher problem all cotests are of the form

$$ \varphi = \frac{1}{h} \left[ \frac{\partial^2}{\partial T_1\, \partial T_4} - \frac{\partial^2}{\partial T_2\, \partial T_3} \right] \Psi. $$

6. We can describe all the cotests for exponential families with an arbitrary
number of polynomial relations. In that case the ideal I is not necessarily
a principal one. Each cotest can be written in the form

$$ \varphi = \frac{1}{h} \sum_j P_j(-\mathcal{D}) \Psi_j, $$

where $P_j(\theta)$ are the generators of the ideal I and $\Psi_j(T)$ are, in general,
generalized functions with supports in $\mathcal{T}$.

§3. CONDITIONS OF OPTIMAL UNBIASED ESTIMATION
FOR INCOMPLETE EXPONENTIAL FAMILIES
WITH POLYNOMIAL RELATIONS

We consider the problem of unbiased estimation of parametric functions on the
basis of n independent observations of a random variable with the density with respect
to Lebesgue measure:

(S. 3.1)

here $s \le n$; the abstract parameter $\alpha \in A$. Introduce the natural parameters

$$ \theta_i = c_i(\alpha) $$

and suppose that for $\alpha \in A$ the point $\theta = (\theta_1, \ldots, \theta_s)$ runs over an everywhere
dense subset of an algebraic subvariety of the domain $\Omega \subset R^s$. This subvariety
we shall write in the form $\Omega \cap \Pi$, where $\Pi$ is an algebraic variety in $R^s$ given
by the polynomial relations

(S.3.2)

with $r < s$.
The distribution of the repeated sample $(x_1, \ldots, x_n) = \mathbf{x}$ from the set
(S.3.1) is given in $R^n$ by the density with respect to Lebesgue measure

$$ f^{(n)}(\mathbf{x}; \theta) = (C(\theta))^n \exp\left\{ \sum_{i=1}^n t_0(x_i) + \theta_1 \sum_{i=1}^n t_1(x_i) + \cdots + \theta_s \sum_{i=1}^n t_s(x_i) \right\} \tag{S.3.3} $$

where $C(\theta)$ is determined by the norming condition. The sufficient statistics
for the family (S.3.3) are
$$ T_1 = \sum_{i=1}^n t_1(x_i);\ \ldots;\ T_s = \sum_{i=1}^n t_s(x_i). $$
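
To make the definition of the sufficient statistics concrete, the sketch below computes $T_j = \sum_i t_j(x_i)$ for a given sample. The particular choice $t_1(x) = x$, $t_2(x) = x^2$ (which corresponds to a normal sample) is only an assumed example, and the function name is hypothetical.

```python
from typing import Callable, List, Sequence


def sufficient_statistics(
    sample: Sequence[float],
    stat_functions: Sequence[Callable[[float], float]],
) -> List[float]:
    """Return [T_1, ..., T_s] with T_j = sum over the sample of t_j(x_i)."""
    return [sum(t(x) for x in sample) for t in stat_functions]


if __name__ == "__main__":
    # Assumed example: t_1(x) = x and t_2(x) = x^2, as for a normal family.
    t_functions = [lambda x: x, lambda x: x * x]
    print(sufficient_statistics([1.0, 2.0, 3.0], t_functions))
```

Any estimate that depends on the sample only through the vector T can then be written as a function $g(T_1, \ldots, T_s)$ of this output.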

We shall suppose that $T_1, \ldots, T_s$ are functionally independent; then the
distribution of the vector $T = (T_1, \ldots, T_s)$ is given in $R^s$ by the density with
respect to Lebesgue measure

(S.3.4)

where $h(T) \ge 0$; $\theta \in \Omega \cap \Pi$. We denote by Supp h the support of the function
$h(T)$, i.e. $\operatorname{Supp} h = \{T: h(T) > 0\}$; let $\mathcal{T} = \operatorname{int} \operatorname{Supp} h$. Our condition for $h(T)$
consists in the following requirements:
$$ h(T) \ \text{is infinitely differentiable on } \mathcal{T}; \qquad \operatorname{mes} \operatorname{Supp} h = \operatorname{mes} \mathcal{T}. \tag{S.3.5} $$

Let $g(T)$ be an unbiased estimate for a certain function $\gamma(\theta)$ depending only
upon the vector of sufficient statistics T and having a finite variance for all
values of $\theta \in \Omega \cap \Pi$. We shall investigate the conditions which are implied for
the variety $\Pi$ and the estimate itself by the optimality property of the estimate
for all $\theta \in \Omega \cap \Pi$ in the class of unbiased estimates of the function $\gamma(\theta)$ with
finite variances. Throughout this section we take the variance for the quality
measure of the estimate. It is clear that the behavior of the function $g(T)$ outside
$\mathcal{T}$ is of no importance for its properties as an estimate of $\gamma(\theta)$; therefore
we shall suppose $g(T)$ to be defined only on $\mathcal{T}$. Denote by N the least
(complex) algebraic variety in $C^s$ containing $\Pi \cap \Omega$. Since $\Pi$ is itself an
algebraic variety, $\Pi \cap \Omega = N \cap \Omega$.
Theorem S. 3.1. In order for the function $g(T)$, $T \in \mathcal{T}$, for which
$E_\theta g^2 < \infty$, $\theta \in \Omega \cap \Pi$, to be the best unbiased estimate of the function
$E_\theta g = \gamma(\theta)$ for the exponential family (S. 3.4), $\theta \in \Omega \cap \Pi$, under condition
(S. 3.5) it is necessary and sufficient that:
1) In the space $C^s$ of variables $\theta_1, \dots, \theta_s$ there exists a linear system of
coordinates $\theta'_1, \dots, \theta'_s$ in which $N$ is a cylinder of type $L \times \nu$, where $L$ is
the coordinate space $\theta'_1 = \dots = \theta'_m = 0$ and $\nu$ is a certain set in the space
$\theta'_{m+1} = \dots = \theta'_s = 0$; $0 \le m \le s$.

2) In the corresponding system of coordinates $T'_1, \dots, T'_s$ in the space
$R^s$ of values $(T_1, \dots, T_s)$ the function $g(T)$ depends only on $T'_{m+1}, \dots, T'_s$.
Proof. Let $\chi(T)$ be an arbitrary unbiased estimate of zero with a finite
variance, i.e.

$E_\theta(\chi) = 0; \quad E_\theta \chi^2 < \infty; \quad \theta \in \Omega \cap \Pi.$

Lemma S. 3.1. For $g(T)$ to be the optimal unbiased estimate of $\gamma(\theta)$,
$\theta \in \Omega \cap \Pi$, it is necessary and sufficient that for each unbiased estimate $\chi$ of
zero with a finite variance

$E_\theta(g\chi) = 0; \quad \theta \in \Omega \cap \Pi.$

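In fact, let $g(T)$ be the best unbiased estimate of $\gamma(\theta)$ and $\chi(T)$ an arbitrary
unbiased estimate of zero. Then for an arbitrary constant $c$ (here $D_\theta$ denotes
the variance under $\theta$)

$D_\theta(g + c\chi) = D_\theta(g) + 2c\,E_\theta(g\chi) + c^2 E_\theta(\chi^2).$ (S. 3.6)

Since for all values of $c$

$D_\theta(g + c\chi) \ge D_\theta(g),$

it is easy to deduce from (S. 3.6) that $E_\theta(g\chi) = 0$, $\theta \in \Omega \cap \Pi$. Conversely,
let $E_\theta(g\chi) = 0$, $\theta \in \Omega \cap \Pi$, for any unbiased estimate of zero $\chi(T)$. If the
estimate $g_1(T)$ is such that

$E_\theta g_1 = \gamma(\theta); \quad E_\theta g_1^2 < \infty,$

then $\chi = g_1 - g$ will be an unbiased estimate of zero. We have:

$D_\theta(g_1) = D_\theta(g + \chi) = D_\theta(g) + D_\theta(\chi) \ge D_\theta(g),$

which proves Lemma S. 3.1.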
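
The variance expansion behind Lemma S. 3.1 can be checked on a toy example. The sketch below is only an illustration under stated assumptions: it uses a single hypothetical three-point distribution (not a parametric family), and the names `probs`, `g`, `chi` and `var_shifted` are invented for this example. It shows that when $E(g\chi) \ne 0$ the variance of $g + c\chi$ can be pushed below that of $g$.

```python
from fractions import Fraction

def expectation(values, probs):
    """Expectation of a function given by its values on a finite sample space."""
    return sum(v * p for v, p in zip(values, probs))

def variance(values, probs):
    """Variance of a function given by its values on a finite sample space."""
    m = expectation(values, probs)
    return expectation([(v - m) ** 2 for v in values], probs)

# Hypothetical toy distribution on three points (an assumption of this sketch).
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
g = [Fraction(0), Fraction(1), Fraction(4)]      # an estimate g
chi = [Fraction(1), Fraction(-1), Fraction(1)]   # E chi = 1/4 - 1/2 + 1/4 = 0

def var_shifted(c):
    """Variance of g + c * chi."""
    return variance([gi + c * xi for gi, xi in zip(g, chi)], probs)

e_g_chi = expectation([a * b for a, b in zip(g, chi)], probs)
e_chi2 = expectation([x * x for x in chi], probs)

# The quadratic in c is minimized at c = -E(g chi) / E(chi^2).
c_best = -e_g_chi / e_chi2
```

Here `e_g_chi` is nonzero, so `var_shifted(c_best)` is strictly smaller than `variance(g, probs)`; this is the mechanism the lemma excludes for an optimal estimate.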


We can now prove the sufficiency of the conditions of the theorem. Without
loss of generality, we can assume that in the initial system itself the subspace
$L$ is given by the equations $\theta_1 = \dots = \theta_m = 0$, that $\nu$ is a subset of the subspace
$\theta_{m+1} = \dots = \theta_s = 0$, and that the function $g(T)$ depends only upon
$T_{m+1}, \dots, T_s$. Let $\chi(T)$ be an arbitrary unbiased estimate of zero with
$E_\theta(\chi^2) < \infty$, $\theta \in N \cap \Omega$. Take any point $\theta_0 = (\theta_{10}, \dots, \theta_{s0}) \in N \cap \Omega$, denote
by $\omega$ the section of $\Omega$ by the surface $\theta_1 = \theta_{10}, \dots, \theta_m = \theta_{m0}$ and put

$\Psi(T) = \int \chi(T) h(T) \exp(\theta_{10} T_1 + \dots + \theta_{m0} T_m)\, dT_1 \cdots dT_m.$

The relation $N = L \times \nu$ implies that, together with the point $\theta_0$, the set $N$
contains the points $(\theta_{10}, \dots, \theta_{m0}, \theta_{m+1}, \dots, \theta_s)$ for all values of $\theta_{m+1}, \dots, \theta_s$.
From this and from the condition $E_\theta(\chi) = 0$, $\theta \in \Omega \cap N$, we deduce that

$\int \Psi(T) \exp(\theta_{m+1} T_{m+1} + \dots + \theta_s T_s)\, dT_{m+1} \cdots dT_s = 0$

for all $(\theta_{m+1}, \dots, \theta_s) \in \omega$. By the uniqueness theorem for Laplace transforms,
$\Psi(T) = 0$. We have further for $\theta_0 \in \Omega \cap N$:
$E_{\theta_0}(g\chi) = C(\theta_0) \int g(T_{m+1}, \dots, T_s)\, dT_{m+1} \cdots dT_s \int \chi(T) h(T) \exp(\theta_{10} T_1 + \dots + \theta_{s0} T_s)\, dT_1 \cdots dT_m$

$= C(\theta_0) \int g(T_{m+1}, \dots, T_s) \exp(\theta_{m+1,0} T_{m+1} + \dots + \theta_{s0} T_s)\, dT_{m+1} \cdots dT_s \int \chi(T) h(T) \exp(\theta_{10} T_1 + \dots + \theta_{m0} T_m)\, dT_1 \cdots dT_m$

$= C(\theta_0) \int g(T) \Psi(T) \exp(\theta_{m+1,0} T_{m+1} + \dots + \theta_{s0} T_s)\, dT_{m+1} \cdots dT_s = 0.$

By Lemma S. 3.1, $g(T)$ is the best unbiased estimate of the function $\gamma(\theta)$.
The proof of the necessity of the conditions of Theorem S. 3.1 is more complicated
and the following lemmas are required. Denote by $\mathcal{D} = \mathcal{D}(\mathcal{T})$ the space
of infinitely differentiable functions of $T$ with compact supports lying in $\mathcal{T}$.
For any function $\varphi(T) \in \mathcal{D}$ the Laplace transform

$\hat{\varphi}(\theta) = \int \exp(\theta, T)\, \varphi(T)\, dT$

is an entire function in $C^s$.


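Directly from Lemma S. 3.1 it follows that if $g(T)$ is the best unbiased
estimate of the function $\gamma(\theta)$, $\theta \in \Omega \cap \Pi$, and $\chi(T) \in \mathcal{D}$ is such that
$\hat{\chi}(\theta) = 0$, $\theta \in \Omega \cap \Pi$,
then

$\widehat{g\chi}(\theta) = 0, \quad \theta \in \Omega \cap \Pi.$ (S. 3.7)

Lemma S. 3.2. If an entire function $\varphi(\theta)$ is equal to zero on $\Omega \cap \Pi$, then
it is equal to zero on the whole set $N$.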
Proof. Let $M$ be an irreducible algebraic variety in $C^s$. By a theorem of
Whitney [17] there exists a proper subvariety $M^*$ with the following property:
the set $(M \setminus M^*) \cap R^s$ in the vicinity of each of its points $\theta$ is a real analytic
subvariety of dimension $d$, and the vectors $\operatorname{grad} f(\theta)$ (where $f(\theta)$ is an
arbitrary real polynomial vanishing on $M$) form an $(s - d)$-dimensional space.
Now represent the variety $N$ as the union of irreducible subvarieties

$N = N_1 \cup \dots \cup N_l.$

Applying to each $N_\lambda$ the theorem of Whitney, we separate in it the exceptional
subvariety $N_\lambda^*$. We remark that each of the sets $N_\lambda \setminus N_\lambda^*$ has a nonvoid intersection
with $\Omega$, since otherwise $\Omega \cap \Pi$ would be a part of $\bigcup_{\mu \ne \lambda} N_\mu \cup N_\lambda^*$, and $N$
would not be the least algebraic variety containing $\Omega \cap \Pi$. We now fix $\lambda$ and
choose a point $\theta \in (N_\lambda \setminus N_\lambda^*) \cap \Omega$. By the theorem of Whitney, in the vicinity of
the point $\theta$ the set $N_\lambda \cap \Omega$ is a real analytic variety of dimension $d$; moreover
there exist real polynomials $f_1, \dots, f_{s-d}$, vanishing on $N_\lambda \cap \Omega$, whose
gradients are linearly independent at the point $\theta$. The polynomials $f_1, \dots, f_{s-d}$
vanish on $N_\lambda$, since otherwise $N$ would not be the least variety
containing $\Omega \cap \Pi$. Hence there exists a complex neighborhood $U$ of the point $\theta$
such that $N_\lambda \cap U$ is a complex analytic variety of dimension $d$. Since $\varphi(\theta) = 0$
for $\theta \in \Omega \cap N_\lambda$, it follows that $\varphi(\theta) = 0$ for $\theta \in U \cap N_\lambda$. But any complex
irreducible variety is a connected analytic variety with the exception of a
nowhere dense set. Hence $\varphi(\theta) = 0$, $\theta \in N_\lambda$. Since $\lambda$ is any of the numbers
$1, \dots, l$, $\varphi = 0$ on $N$. Lemma S. 3.2 is proved.
From (S. 3.7) and Lemma S. 3.2 it follows that for any $\chi \in \mathcal{D}$, if $\hat{\chi}(\theta) = 0$,
$\theta \in \Omega \cap \Pi$, then $\widehat{g\chi}(\theta) = 0$, $\theta \in N$.
Let $A$ be the ring of all polynomials of $\theta \in C^s$ with complex coefficients.
If $\mathcal{J}$ is an ideal in $A$, and $\zeta \in C^s$, we shall denote by $\mathcal{J}_\zeta$ the ideal formed by
the polynomials $p(\theta + \zeta)$, where $p(\theta) \in \mathcal{J}$. Consider the set $\mathcal{U} \subset A$ of
polynomials $p(\theta)$ for which $g(T)$, the best unbiased estimate of the function
$\gamma(\theta)$, is also a generalized solution in $\mathcal{T}$ of the equation

$p(D) g = 0, \quad D = \left[ \dfrac{\partial}{\partial T_1}, \dots, \dfrac{\partial}{\partial T_s} \right].$

It is easy to see that $\mathcal{U}$ is an ideal. Denote by $I$ the ideal in $A$ formed by all
polynomials vanishing on $N$.
Lemma S. 3.3. If $\zeta \in N$, then $I_\zeta \subset \mathcal{U}$.
Proof. Let $p(\theta) \in I_\zeta$, i.e. $p(\theta - \zeta) \in I$. Considering $g(T)$ as a generalized
function in $\mathcal{T}$, we have for an arbitrary function $\varphi \in \mathcal{D}$:

$(p(D) g, \varphi \exp(\zeta, T)) = (g, p(-D) \varphi \exp(\zeta, T)) = (g, \exp(\zeta, T)\, p(-D - \zeta) \varphi) = \int g(T) \exp(\zeta, T)\, p(-D - \zeta) \varphi(T)\, dT.$

Since $p(\theta - \zeta) \in I$, the Laplace transform of $p(-D - \zeta)\varphi$, which equals
$p(\theta - \zeta)\, \hat{\varphi}(\theta)$, vanishes on $\Omega \cap \Pi$. But then we have

$\int g(T) \exp(\zeta, T)\, p(-D - \zeta) \varphi(T)\, dT = 0, \quad \zeta \in N,$

because the expression on the left is equal to $\widehat{g\chi}(\zeta)$, where $\chi = p(-D - \zeta)\varphi$
satisfies the condition $\hat{\chi}(\theta) = 0$, $\theta \in \Omega \cap \Pi$. Now from (S. 3.8) we deduce that the
generalized function $p(D) g$ vanishes on the functions of the form
$\varphi \exp(\zeta, T)$.
As any function from $\mathcal{D}$ can be represented in this form, $p(D) g = 0$ in $\mathcal{T}$, i.e.
$p(\theta) \in \mathcal{U}$. Lemma S. 3.3 is thus proved.
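Consider the ideal $\mathcal{J} = \sum_{\zeta \in N} I_\zeta$. By Lemma S. 3.3, each summand $I_\zeta$ belongs
to $\mathcal{U}$; hence $\mathcal{J} \subset \mathcal{U}$. Moreover, each polynomial from $\mathcal{J}$ vanishes on the set
$L = \bigcap_{\zeta \in N} (N - \zeta)$. But if $\theta \notin L$, then for a certain $\zeta \in N$, $\theta \notin (N - \zeta)$. Hence
we can find a polynomial $f \in I_\zeta \subset \mathcal{J}$ such that $f(\theta) \ne 0$. Thus $L$ is the set of
common roots of polynomials from $\mathcal{J}$ and therefore is an algebraic variety.
Lemma S. 3.4. The set $L = \bigcap_{\zeta \in N} (N - \zeta)$ is a linear subspace, and for
each point $\zeta \in L$

$\mathcal{J}_\zeta = \mathcal{J}.$ (S. 3.9)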

Proof. Let $\zeta \in L$; if $\theta \in N$, then $\theta + \zeta \in N$ because $\zeta \in N - \theta$. Hence

$\mathcal{J}_\zeta = \sum_{\theta \in N} I_{\theta + \zeta} \subset \sum_{\theta \in N} I_\theta = \mathcal{J},$ (S. 3.10)

i.e. $\mathcal{J}_\zeta \subset \mathcal{J}$. But the set of all roots of the polynomials from $\mathcal{J}_\zeta$ is $L - \zeta$, and
so from (S. 3.10) it follows that $L - \zeta \supset L$, i.e. $L \supset L + \zeta$ for all $\zeta \in L$. Hence
$L$ is a semigroup with respect to the operation of vector addition in $C^s$. We
shall now show that $L$ is a linear subspace in $C^s$. Let $\Lambda$ be the largest
linear subspace contained in $L$ (we remark that $L$ contains at least the origin
of coordinates). If $\Lambda \ne L$, there exists a point $\zeta \in L \setminus \Lambda$. Now take an arbitrary
point $\theta \in \Lambda$. Since $L$ is a semigroup, it must contain the points $\theta + k\zeta$;
$k = 1, 2, \dots$.
Let $p(\theta)$ be an arbitrary polynomial vanishing on $L$; since $p(\theta + k\zeta) = 0$,
$k = 1, 2, \dots$, we have $p(\theta + \lambda\zeta) = 0$ for all $\lambda \in C^1$. The set of all straight lines
$\theta + \lambda\zeta$, $\lambda \in C^1$, forms a linear subspace $\Lambda'$ spanned by $\Lambda$ and $\zeta$. Since $L$ is an
algebraic variety, $\Lambda' \subset L$; i.e. $\Lambda$ is not the largest linear subspace contained
in $L$. This contradiction proves that $L = \Lambda$.
Since $L$ is a linear subspace, it follows from $\zeta \in L$ that $-\zeta \in L$. Hence
for any points $\theta \in N$ and $\zeta \in L$, we have $\theta - \zeta \in N$. Hence in (S. 3.10) we
have the inverse inclusion as well, and (S. 3.9) is proved.

Lemma S. 3.5. The set of vectors

$\operatorname{grad} p(0), \quad p \in \mathcal{J}$ (S. 3.11)

coincides with the space $L^\perp$ orthogonal to $L$.
Proof. It is obvious that all the vectors (S. 3.11) belong to $L^\perp$ and that they
form a linear space. We shall show that this linear space contains all the vectors
from $L^\perp$. Suppose this is not so; then there exists a vector $a \in L^\perp$ which is
orthogonal to all the vectors (S. 3.11). Let $q$ be an arbitrary polynomial from $I$.
For each point $\zeta \in N$ the polynomial $q(\theta + \zeta)$ belongs to $\mathcal{J}$ and so the vector
$\operatorname{grad} q(\zeta)$ is orthogonal to $a$. Hence the derivative $q'_a$ of $q$ in the direction $a$
satisfies $q'_a(\zeta) = 0$ on $N$, i.e. $q'_a \in I$. Applying
this argument to the polynomial $q'_a$ we see that $q''_a = 0$ on $N$, and so on.
Hence at each point $\zeta \in N$ all derivatives of the polynomial $q$ in the
direction of the vector $a$ are equal to zero. This means that $q$ vanishes on the
whole line $\zeta + \lambda a$, $\lambda \in C^1$. On the other hand, $N$ is the set of the roots of all
polynomials from $I$. Hence the set $N$, together with each of its points $\zeta$,
contains the whole straight line $\{\zeta + \lambda a\}$. Hence the set $L = \bigcap_{\zeta \in N} (N - \zeta)$
contains the straight line $\{\lambda a\}$, in contradiction to the orthogonality of $a$ to $L$.
This contradiction proves the lemma.
Lemma S. 3.6. Each polynomial $p(\theta)$ vanishing on $L$ belongs to $\mathcal{J}$.
Proof. Select the polynomials $p_1, \dots, p_m \in \mathcal{J}$ in such a way that the
vectors $\operatorname{grad} p_i(0)$, $i = 1, \dots, m$, form a basis in the space $L^\perp$. Construct a
regular system of coordinates $(w_1, \dots, w_m)$ in the vicinity of the origin so that
$w_1 = p_1, \dots, w_m = p_m$. Since the polynomial $p$ vanishes on the coordinate
subspace $w_1 = \dots = w_m = 0$ (i.e. on $L$), it can be written in the form

$p = \sum_{i=1}^{m} w_i f_i = \sum_{i=1}^{m} p_i f_i,$

where $f_i$ are certain functions holomorphic at the origin. By (S. 3.9) we can
also obtain the representation

(S. 3.12)

where $f_i$ are certain functions holomorphic at an arbitrary point $\zeta$ of $L$. Such
a representation can obviously be obtained for each point $\zeta \notin L$, since we can
always find a polynomial $p' \in \mathcal{J}$ that does not vanish at $\zeta$.

We fix now an arbitrary point $\zeta \in C^s$. Denote by $R_\zeta$ the ring of all rational
functions in $C^s$ whose denominators do not vanish at the point $\zeta$; by $\mathcal{H}_\zeta$ we
denote the ring of all functions holomorphic at $\zeta$. As is well known, the pair
$(R_\zeta, \mathcal{H}_\zeta)$ is flat (see [15]). This means that $\mathcal{H}_\zeta$ is a flat $R_\zeta$-module [15] and
that for each $R_\zeta$-module $E$ the natural mapping $E \to E \otimes_{R_\zeta} \mathcal{H}_\zeta$ is injective.
We now take for $E$ the factor-ring $R_\zeta / \mathcal{J} R_\zeta$, where $\mathcal{J} R_\zeta$ is the ideal in $R_\zeta$ formed
by all the functions of the form

$\sum q_i r_i; \quad q_i \in \mathcal{J}; \quad r_i \in R_\zeta.$

Since $\mathcal{H}_\zeta$ is a flat $R_\zeta$-module, we have

$E \otimes_{R_\zeta} \mathcal{H}_\zeta \cong \mathcal{H}_\zeta / \mathcal{J} \mathcal{H}_\zeta,$

where $\mathcal{J} \mathcal{H}_\zeta$ is an analogous ideal in $\mathcal{H}_\zeta$. Hence we can assert that the natural
mapping

$R_\zeta / \mathcal{J} R_\zeta \to \mathcal{H}_\zeta / \mathcal{J} \mathcal{H}_\zeta$

is injective. By (S. 3.12) the polynomial $p$ belongs to $\mathcal{J} \mathcal{H}_\zeta$ and hence belongs to
$\mathcal{J} R_\zeta$, in view of the injectivity of this mapping. Hence

(S. 3.13)

where $q_\zeta$ is a certain polynomial distinct from zero at the point $\zeta$. Consider
the ideal in $A$ generated by the polynomials $q_\zeta$. By the Hilbert Nullstellensatz
(see for instance [1]) the unity of the ring $A$ belongs to that ideal, i.e.
$\sum_{i=1}^{k} h_i q_{\zeta_i} = 1$ with certain $h_i \in A$. Putting $\zeta = \zeta_i$ in (S. 3.13), multiplying by
$h_i$ and adding, we finally obtain $p \in \mathcal{J}$, which proves the lemma.
We can now complete the proof of the theorem. In the space $C^s$ choose a
rectangular system of coordinates $\theta'_1, \dots, \theta'_s$ in such a way as to make $L$
coincide with the coordinate subspace in which $\theta'_1 = \dots = \theta'_m = 0$. In that case
$\theta'_1, \dots, \theta'_m$ vanish on $L$ and by Lemma S. 3.6 belong to the ideal $\mathcal{J}$. Since
$\mathcal{J} \subset \mathcal{U}$, in the corresponding system of coordinates in $R^s$ we have in the domain
$\mathcal{T}$ the equations

$\dfrac{\partial g}{\partial T'_1} = \dots = \dfrac{\partial g}{\partial T'_m} = 0.$

These equations in the space of generalized functions mean that the function $g$,
up to its values on a set of measure zero, is constant with respect to the variables
$T'_1, \dots, T'_m$ in the domain $\mathcal{T}$. The representation $N = L \times \nu$ is obvious,
because it follows from $L \subset N - \zeta$ that together with the point $\zeta$ the set $N$
contains the whole variety $L + \zeta$.
Theorem S. 3.1 is proved.
It is interesting to compare the situation with respect to optimal unbiased
estimation of parametric functions for complete and incomplete exponential families.
For the former, by the Rao-Blackwell-Kolmogorov theorem, every statistic
$g(T)$ depending only upon sufficient statistics is an optimal estimate of its
mathematical expectation $E_\theta g$. For the incomplete exponential families, in view
of Theorem S. 3.1, the optimality of $g(T)$ as an estimate of $E_\theta g$ means,
roughly speaking, a kind of quasi-completeness with respect to some of the parameters
(the representation $N = L \times \nu$ is analogous to completeness). As regards
the optimal estimate itself, it depends on the sufficient statistics having the same
indices as the parameters with the "quasi-completeness" property.

§4. THE SAMPLE MEAN AS THE ESTIMATE OF SCALE PARAMETERS

In this section we study the families of distribution functions of the form
$F(x/\sigma)$ on the half-line $(0, \infty)$ depending upon the scale parameter $\sigma \in (0, \infty)$.
Our purpose is to study the unbiased estimation of $\sigma$ on the evidence of the
sample $(x_1, \dots, x_n)$ from the population characterized by $F(x/\sigma)$. As usual,
we take the quadratic loss function.
Suppose the condition

$\int_0^\infty x^2\, dF(x) = \alpha_2 < \infty$ (S. 4.1)

holds. If we set

$\alpha_1 = \int_0^\infty x\, dF(x),$

the statistic $\alpha_1^{-1}\bar{x}$ will be an unbiased estimate of the parameter $\sigma$ with a finite
variance by (S. 4.1).
We remark first of all that in the class of all (not only unbiased) estimates
of $\sigma$, the estimates $\alpha_1^{-1}\bar{x}$ are always inadmissible except in the case of a degenerate
distribution $F(x)$. In fact, we have

$E_\sigma(c_1\bar{x} - \sigma)^2 = \sigma^2 E_1(c_1\bar{x} - 1)^2$

and $\min_{c_1} E_1(c_1\bar{x} - 1)^2$ is attained, as is easily verified, for

$c_1 = \dfrac{\alpha_1}{\alpha_1^2 + (\alpha_2 - \alpha_1^2)/n}.$

Since $\alpha_2 \ge \alpha_1^2$ and, moreover, the equality sign holds only for degenerate
distributions, $\alpha_1^{-1}\bar{x}$ is always inadmissible in the class of all estimates
(except in the degenerate case).
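
The minimizing multiplier above can be verified by direct arithmetic. The following is a minimal sketch, assuming only the two moments $\alpha_1$, $\alpha_2$ and the sample size $n$ are given; the function names `scaled_mean_risk` and `best_multiplier` are invented for this illustration and use $E_1 \bar{x}^2 = \alpha_1^2 + (\alpha_2 - \alpha_1^2)/n$.

```python
from fractions import Fraction

def scaled_mean_risk(c, alpha1, alpha2, n):
    """E_1 (c * xbar - 1)^2 expressed through the first two moments of F."""
    second_moment_of_mean = alpha1 ** 2 + (alpha2 - alpha1 ** 2) / n
    return c * c * second_moment_of_mean - 2 * c * alpha1 + 1

def best_multiplier(alpha1, alpha2, n):
    """The c_1 that minimizes scaled_mean_risk (formula from the text)."""
    return alpha1 / (alpha1 ** 2 + (alpha2 - alpha1 ** 2) / n)

# Example (an assumption of this sketch): the standard exponential law,
# for which alpha1 = 1 and alpha2 = 2, with n = 5 observations.
a1, a2, n = Fraction(1), Fraction(2), 5
c_star = best_multiplier(a1, a2, n)
risk_unbiased = scaled_mean_risk(1 / a1, a1, a2, n)
risk_best = scaled_mean_risk(c_star, a1, a2, n)
```

For this example the unbiased multiplier $\alpha_1^{-1}$ gives a strictly larger risk than `c_star`, which is the inadmissibility statement in the nondegenerate case.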
When is $\alpha_1^{-1}\bar{x}$ admissible in the class of unbiased estimates of the scale
parameter $\sigma$? To answer this question we shall first prove Lemma S. 4.1,
where we use the notation $y = (x_2/x_1, \dots, x_n/x_1)$.

Lemma S. 4.1. Let

$s_n = c_n \bar{x}\, \dfrac{E_1(\bar{x} \mid y)}{E_1(\bar{x}^2 \mid y)}$ (S. 4.2)

where the constant $c_n$ is determined from the condition

$c_n E_1\left\{ \dfrac{[E_1(\bar{x} \mid y)]^2}{E_1(\bar{x}^2 \mid y)} \right\} = 1.$ (S. 4.3)

Then

$E_\sigma s_n = \sigma; \quad E_\sigma(s_n - \sigma)^2 \le E_\sigma(\alpha_1^{-1}\bar{x} - \sigma)^2$ (S. 4.4)

and equality holds in (S. 4.4) if and only if, with probability 1,

$\dfrac{E_1(\bar{x} \mid y)}{E_1(\bar{x}^2 \mid y)} = b_n; \quad b_n = \alpha_1^{-1} c_n^{-1}.$
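Proof. Put $\gamma = \gamma(y) = E_1(\bar{x} \mid y) / E_1(\bar{x}^2 \mid y)$. We have

$E_\sigma(s_n) = c_n E_\sigma(\bar{x}\gamma) = c_n \sigma E_1(\gamma E_1(\bar{x} \mid y)) = \sigma c_n E_1\left[ \dfrac{[E_1(\bar{x} \mid y)]^2}{E_1(\bar{x}^2 \mid y)} \right] = \sigma$

by (S. 4.3);

$E_\sigma(\alpha_1^{-1}\bar{x} - \sigma)^2 = E_\sigma(\alpha_1^{-1}\bar{x} - s_n)^2 + E_\sigma(s_n - \sigma)^2 + 2 E_\sigma\{(\alpha_1^{-1}\bar{x} - s_n)(s_n - \sigma)\}.$

Further,

$E_\sigma\{(\alpha_1^{-1}\bar{x} - s_n)(s_n - \sigma)\} = 0$

by (S. 4.3). Thus

$E_\sigma(\alpha_1^{-1}\bar{x} - \sigma)^2 = E_\sigma(\alpha_1^{-1}\bar{x} - s_n)^2 + E_\sigma(s_n - \sigma)^2 \ge E_\sigma(s_n - \sigma)^2$ (S. 4.5)

and the equality sign in (S. 4.5) holds for all $\sigma \in (0, \infty)$ simultaneously if and
only if with probability 1

$\alpha_1^{-1}\bar{x} = s_n,$

i.e. if

$\gamma(y) = b_n.$

This proves Lemma S. 4.1.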
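
The equality case of Lemma S. 4.1 can be illustrated for a gamma law. The sketch below rests on an assumption not stated in this section: for a gamma sample the mean $\bar{x}$ is independent of the ratio vector $y$, so the conditional moments in (S. 4.2) reduce to unconditional ones. The function name `gamma_case` and the choice of rate 1 are hypothetical choices for this example.

```python
from fractions import Fraction

def gamma_case(m, n):
    """Constants of Lemma S. 4.1 for F = Gamma(shape m, rate 1), sample size n.

    Assumes xbar is independent of y, so E_1(xbar | y) = E_1 xbar and
    E_1(xbar^2 | y) = E_1 xbar^2.
    """
    alpha1 = m                           # E_1 x for Gamma(m, 1)
    mean_sq = m * m + m / n              # E_1 xbar^2 = alpha1^2 + Var(x) / n
    ratio = alpha1 / mean_sq             # gamma(y), constant in this case
    c_n = 1 / (alpha1 * ratio)           # from (S. 4.3)
    b_n = 1 / (alpha1 * c_n)             # b_n = alpha1^{-1} c_n^{-1}
    return ratio, c_n, b_n

ratio, c_n, b_n = gamma_case(Fraction(2), 4)
# The multiplier of xbar in s_n is c_n * ratio; for the gamma law it
# collapses to 1 / alpha1, i.e. s_n coincides with alpha1^{-1} * xbar.
multiplier = c_n * ratio
```

Under the stated independence assumption, `ratio == b_n`, which is exactly the equality condition of the lemma.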


From now on we shall suppose that all the moments of $F(x)$

$\int_0^\infty x^k\, dF(x), \quad k = 1, 2, \dots,$ (S. 4.6)

are finite.
Theorem S. 4.1. Let the function $F(x)$ satisfy the condition (S. 4.6). In
order for the statistic $\alpha_1^{-1}\bar{x}$ to be an admissible estimate of $\sigma$ in the class of
the unbiased estimates for the sample sizes $n = n_1$, $n = n_2$ ($n_1$, $n_2$ are any
numbers with $n_2 > n_1 \ge 3$) from the population given by $F(x/\sigma)$, it is necessary
and sufficient for $F(x)$ to be either degenerate:

$F(x) = \begin{cases} 0, & x < x_0 \\ 1, & x_0 \le x < \infty \end{cases}$ for some $x_0 > 0$,

or a gamma-distribution:

$F(x) = \begin{cases} 0, & x \le 0 \\ (\gamma^m / \Gamma(m)) \int_0^x u^{m-1} e^{-\gamma u}\, du, & 0 < x < \infty \end{cases}$ for some $m > 0$, $\gamma > 0$.

Theorem S. 4.2. If $F(x)$ satisfies the condition (S. 4.6) and $\alpha_1^{-1}\bar{x}$ is optimal
in the class of the unbiased estimates of the parameter $\sigma$ for a sample size
$n \ge 3$, then $F(x)$ is either degenerate or a gamma-distribution.
We omit the proofs, which proceed by means of functional equations. They
are given in detail in [5].

§5. NONPARAMETRIC APPROACH TO THE ESTIMATION OF LOCATION PARAMETERS

Let $x_1, \dots, x_n$ be a repeated sample from the population with the distribution
function $F(x - \theta)$ satisfying the conditions

$\int x\, dF = 0,$ (S. 5.1)

In this case the parameter $\theta \in R^1$ to be estimated on the evidence of the sample
$(x_1, \dots, x_n)$ is the mathematical expectation. For a known form of the
distribution function $F(x)$, E. Pitman [14] introduced, as early as 1938, the
following estimate for $\theta$:

(S. 5.2)

For an absolutely continuous $F(x)$, $F(x) = \int_{-\infty}^{x} f(u)\, du$, the Pitman estimate,
as mentioned in §3, Chapter VII, can be written in the form

$\dfrac{\int_{-\infty}^{\infty} \xi \prod_{i=1}^{n} f(x_i - \xi)\, d\xi}{\int_{-\infty}^{\infty} \prod_{i=1}^{n} f(x_i - \xi)\, d\xi}$ (S. 5.3)

C. Stein proved [16] that under the condition

$\int |x|^3\, dF(x) < \infty$

the Pitman estimate is absolutely admissible.
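
The ratio of integrals in (S. 5.3) can be approximated numerically. The following is a rough sketch, not a production routine: it replaces both integrals by Riemann sums on a finite grid, and the names `pitman_estimate`, `half_width` and `steps`, as well as the two example densities, are assumptions made for illustration.

```python
import math

def pitman_estimate(xs, density, half_width=10.0, steps=40001):
    """Approximate the ratio (S. 5.3) by Riemann sums over a uniform grid.

    The grid covers [min(xs) - half_width, max(xs) + half_width]; the common
    step cancels in the ratio, so it is not multiplied in.
    """
    lo = min(xs) - half_width
    hi = max(xs) + half_width
    h = (hi - lo) / (steps - 1)
    num = 0.0
    den = 0.0
    for j in range(steps):
        xi = lo + j * h
        weight = 1.0
        for x in xs:
            weight *= density(x - xi)
        num += xi * weight
        den += weight
    return num / den

def normal_density(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def uniform_density(t):
    """Density of the uniform law on [-1/2, 1/2]."""
    return 1.0 if -0.5 <= t <= 0.5 else 0.0
```

For the normal density the ratio reduces to the sample mean, and for the uniform density on $[-1/2, 1/2]$ it reduces to the midrange $(\min x_i + \max x_i)/2$, which gives two simple sanity checks of the sketch.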
The loss function is assumed to be quadratic throughout this section. We
shall now consider in detail the situation when the form of the function $F(x)$ is
unknown. If we suppose that $F(x)$ is allowed to be arbitrary, satisfying only
the condition (S. 5.1), it is easy to see that there is no better estimate for $\theta$ than
$\bar{x}$. However, in the case when for some integer $k > 1$ the first $2k$ moments of
the distribution function $F(x)$ are known:

$\mu_l = \int x^l\, dF(x), \quad l = 1, 2, \dots, 2k$ (S. 5.4)

and

$\mu_{2k} = \int x^{2k}\, dF < \infty$ (S. 5.5)

the information about $F(x)$ contained in the moments (S. 5.4) can be used more
efficiently than $\bar{x}$ for construction of the estimates of the parameter $\theta$.

Under condition (S. 5.5) the set of all polynomials $Q(x_1, \dots, x_n)$ in
$x_1, \dots, x_n$ of degree not exceeding $k$ forms a Hilbert space $L_2^{(k)}$ if the scalar
product of the elements $Q_1$ and $Q_2$ is defined by:

$(Q_1, Q_2) = E_0(Q_1 Q_2).$

The subspace formed by the polynomials $Q \in L_2^{(k)}$ of the form

$Q = Q(x_2 - x_1, \dots, x_n - x_1)$

will be denoted by $\Lambda_k$.

Consider the estimate

$t_n^{(k)} = \bar{x} - \hat{E}_0(\bar{x} \mid \Lambda_k),$ (S. 5.6)

where $\hat{E}_0(\cdot \mid \Lambda_k)$ is the operator of projection on the subspace $\Lambda_k$. Note that
for the construction of the estimate (S. 5.6) we need to know only the first $2k$
moments of the distribution function, not the whole function.

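Theorem S. 5.1. For all $\theta \in R^1$

$E_\theta(t_n^{(k)} - \theta)^2 \le E_\theta(\bar{x} - \theta)^2,$ (S. 5.7)

where the equality or the inequality in (S. 5.7) is realized simultaneously for
all $\theta \in R^1$ and the equality holds if and only if $\hat{E}_0(\bar{x} \mid \Lambda_k) = 0$.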
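Proof. Since $Q \equiv 1 \in \Lambda_k$, we have

$(\bar{x} - \hat{E}_0(\bar{x} \mid \Lambda_k), 1) = 0.$

Hence

$E_\theta t_n^{(k)} = \theta.$

Moreover,

$E_\theta(\bar{x} - \theta)^2 = E_\theta(\bar{x} - \hat{E}_0(\bar{x} \mid \Lambda_k) - \theta + \hat{E}_0(\bar{x} \mid \Lambda_k))^2$

$= E_\theta(t_n^{(k)} - \theta)^2 + 2 E_\theta((t_n^{(k)} - \theta) \hat{E}_0(\bar{x} \mid \Lambda_k)) + E_\theta(\hat{E}_0(\bar{x} \mid \Lambda_k))^2.$

But

$E_\theta((t_n^{(k)} - \theta) \hat{E}_0(\bar{x} \mid \Lambda_k)) = E_0(t_n^{(k)} \hat{E}_0(\bar{x} \mid \Lambda_k)) = 0$

as $t_n^{(k)} = \bar{x} - \hat{E}_0(\bar{x} \mid \Lambda_k)$ is orthogonal to each function from $\Lambda_k$. Hence

$E_\theta(\bar{x} - \theta)^2 = E_\theta(t_n^{(k)} - \theta)^2 + E_0(\hat{E}_0(\bar{x} \mid \Lambda_k))^2 \ge E_\theta(t_n^{(k)} - \theta)^2$

and the equality sign (for all $\theta \in R^1$ simultaneously) holds under the condition

$\hat{E}_0(\bar{x} \mid \Lambda_k) = 0.$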
In connection with Theorem S. 5.1 it is natural to raise the question: for
what functions $F(x)$ is the estimate $t_n^{(k)}$ better than $\bar{x}$? In other words, when
does the knowledge of the first $2k$ moments of $F(x)$ enable us to improve upon
the standard estimate $\bar{x}$?
Theorem S. 5.2. If $F(x)$ satisfies condition (S. 5.5), then for $n \ge 3$ the
estimate $t_n^{(k)}$ will be better than the sample mean $\bar{x}$ as an estimate of the location
parameter in all cases except when the first $(k + 1)$ moments $\mu_l$ of the distribution
function $F(x)$ coincide with the corresponding moments of a normal
law, so that

$\mu_l = \begin{cases} 0, & l \text{ odd} \\ (l - 1)!!\, \sigma^l, & l \text{ even} \end{cases}$

for a certain $\sigma^2$.
Theorem S. 5.2 is an obvious consequence of Theorem S. 5.1 and the
following lemma.
Lemma S. 5.1. If $n \ge 3$ then $\hat{E}_0(\bar{x} \mid \Lambda_k) = 0$ if and only if the first $(k + 1)$
moments of $F(x)$ coincide with the corresponding moments of a normal law.


Proof of Lemma S. 5.1. We proceed by induction. For $k = 1$ the lemma
holds trivially. Since $\Lambda_k \supset \Lambda_{k-1}$, it follows from $\hat{E}_0(\bar{x} \mid \Lambda_k) = 0$ that
$\hat{E}_0(\bar{x} \mid \Lambda_{k-1}) = 0$. We can assume now that the first $k$ moments of $F(x)$ coincide
with the corresponding normal moments and we shall prove that then the moment
$\mu_{k+1}$ coincides with the $(k + 1)$st normal moment.
The condition

$\hat{E}_0(\bar{x} \mid \Lambda_k) = 0$ (S. 5.8)

is equivalent to the set of conditions

$\hat{E}_0(\bar{x} \mid \Lambda_{k-1}) = 0$ (S. 5.9)

$E_0(\bar{x}(x_{j_1} - x_1) \cdots (x_{j_k} - x_1)) = 0,$ (S. 5.10)

where (S. 5.10) must hold for all the sets $(j_1, \dots, j_k)$ of integers $2, \dots, n$.
The equivalence of the conditions (S. 5.8) and (S. 5.9)-(S. 5.10) follows from
the fact that $\Lambda_{k-1}$ and the functions $(x_{j_1} - x_1) \cdots (x_{j_k} - x_1)$ generate the
whole $\Lambda_k$.
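
Condition (S. 5.10) can be evaluated exactly for small cases when the moments are given. The sketch below is an illustration under stated assumptions: polynomials are stored as dictionaries from exponent tuples to coefficients, variables are indexed from 0 (so `x_0` plays the role of $x_1$ in the text), and the helper names `poly_mul`, `expect`, `unit` and `cross_moment` are invented here. The moment lists are examples: the standard normal law and the centred exponential law with $\mu_3 = 2$, $\mu_4 = 9$.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(a + b for a, b in zip(ea, eb))
            out[e] = out.get(e, Fraction(0)) + ca * cb
    return out

def expect(p, mu):
    """E_0 of a polynomial in i.i.d. variables; mu[l] is the l-th moment, mu[0] = 1."""
    total = Fraction(0)
    for exps, coeff in p.items():
        term = coeff
        for a in exps:
            term *= mu[a]
        total += term
    return total

def unit(i, n):
    """Exponent tuple of the single variable x_i among n variables."""
    return tuple(1 if j == i else 0 for j in range(n))

def cross_moment(js, n, mu):
    """E_0( xbar * prod over j in js of (x_j - x_0) ), with every j >= 1."""
    p = {unit(i, n): Fraction(1, n) for i in range(n)}   # the sample mean
    for j in js:
        p = poly_mul(p, {unit(j, n): Fraction(1), unit(0, n): Fraction(-1)})
    return expect(p, mu)

mu_normal = [1, 0, 1, 0, 3]   # moments 0..4 of the standard normal law
mu_skew = [1, 0, 1, 2, 9]     # moments 0..4 of the centred exponential law
```

For $k = 2$ and $n = 3$ the cross moment with two distinct indices equals $\mu_3 / n$, so (S. 5.10) holds exactly when $\mu_3 = 0$; for $k = 3$ with a single repeated index it vanishes for any moments, in line with the odd-$k$ case treated later in the proof.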
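Let

$(x_{j_1} - x_1) \cdots (x_{j_k} - x_1) = (x_{i_1} - x_1)^{\alpha_1} \cdots (x_{i_s} - x_1)^{\alpha_s},$ (S. 5.11)

where $i_1, \dots, i_s$ are mutually distinct and $\alpha_1 + \dots + \alpha_s = k$. We have

$(x_{i_1} - x_1)^{\alpha_1} \cdots (x_{i_s} - x_1)^{\alpha_s}$

(S. 5.12)

then from (S. 5.12) we get

$E_0\big(\bar{x}(x_{i_1} - x_1)^{\alpha_1} \cdots (x_{i_s} - x_1)^{\alpha_s}\big) = \dfrac{1}{n} \sum_{0 \le l_q \le \alpha_q} (-1)^l C_{\alpha_1}^{l_1} \cdots C_{\alpha_s}^{l_s}\, \mu_{l_1} \cdots \mu_{l_s}\, \mu_{k+1-l} + \dfrac{1}{n} \sum_{q=1}^{s} \sum_{0 \le l_q \le \alpha_q} (-1)^l C_{\alpha_1}^{l_1} \cdots C_{\alpha_s}^{l_s}\, \mu_{l_1} \cdots \mu_{l_{q-1}} \mu_{l_q+1} \mu_{l_{q+1}} \cdots \mu_{l_s}\, \mu_{k-l}, \quad l = l_1 + \dots + l_s.$

(S. 5.13)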

We shall consider the cases of odd and even values of $(k - 1)$ separately.

1) $k - 1$ odd. By the induction assumption, all odd moments up to the order
$(k - 1)$ are equal to zero. Hence from (S. 5.13) we get

$n E_0\big(\bar{x}(x_{i_1} - x_1)^{\alpha_1} \cdots (x_{i_s} - x_1)^{\alpha_s}\big) = {\sum}^{*} (-1)^l C_{\alpha_1}^{l_1} \cdots C_{\alpha_s}^{l_s}\, \mu_{l_1} \cdots \mu_{l_s}\, \mu_{k+1-l} + \sum_{q=1}^{s} {\sum}_q^{*} (-1)^l C_{\alpha_1}^{l_1} \cdots C_{\alpha_s}^{l_s}\, \mu_{l_1} \cdots \mu_{l_{q-1}} \mu_{l_q+1} \mu_{l_{q+1}} \cdots \mu_{l_s}\, \mu_{k-l},$ (S. 5.14)

where the summation in $\sum^{*}$ is taken over all even $l_1, \dots, l_s$ and in $\sum_q^{*}$ over
even $l_1, \dots, l_{q-1}, l_{q+1}, \dots, l_s$ and odd $l_q$, the limits being indicated by
(S. 5.13). In the sum $\sum^{*}$ the number $l$ is always even and therefore $(k + 1 - l)$ is
odd. Moreover, if $k + 1 - l \le k - 1$, then $\mu_{k+1-l} = 0$ by the induction assumption.
In the sum $\sum_q^{*}$ the number $l$ is always odd; hence $(k - l)$ is also odd;
since $k - l \le k - 1$ we have $\mu_{k-l} = 0$. Hence the condition (S. 5.10) is
equivalent to the relation

$\mu_{k+1} = 0.$ (S. 5.15)

2) $(k - 1)$ is even. Consider again the relation (S. 5.14). We break $\sum^{*}$ and
$\sum_{q=1}^{s} \sum_q^{*}$ into subsums

$\sum^{*} = S_0 + S_2 + \dots + S_{k-1},$

where $S_{2m}$ is the part of the sum $\sum^{*}$ corresponding to all the values
$l_1, \dots, l_s$, $0 \le l_1 \le \alpha_1, \dots, 0 \le l_s \le \alpha_s$, for which $l_1 + \dots + l_s = 2m$, and

$\sum_{q=1}^{s} {\sum}_q^{*} = S_1 + S_3 + \dots + S_k,$

where $S_{2m+1}$ is the part of the double sum corresponding to the values
$l_1, \dots, l_s$ for which $l_1 + \dots + l_s = 2m + 1$.
Consider first the conditions

$0 < \alpha_1 < k, \; \dots, \; 0 < \alpha_s < k.$ (S. 5.16)

We shall show that $S_{2m} + S_{2m+1} = 0$ for $0 < 2m \le k - 1$. Note that in view of
the condition (S. 5.16) in the sum $\sum_{q=1}^{s} \sum_q^{*}$ we have $l_q + 1 \le k$; hence by the
induction assumption

(S. 5.17)

for a certain $\sigma^2 > 0$. Then


$S_{2m+1} = - \sum_{q=1}^{s} {\sum_{l_1 + \dots + l_s = 2m+1}}^{\!\!\!*} C_{\alpha_1}^{l_1} \cdots C_{\alpha_s}^{l_s}\, \mu_{l_1} \cdots \mu_{l_{q-1}} \mu_{l_q+1} \mu_{l_{q+1}} \cdots \mu_{l_s}\, \mu_{k-1-2m}.$ (S. 5.18)

But

$\dfrac{C_{\alpha_1}^{l_1+1}}{C_{\alpha_1}^{l_1}} (l_1 + 1) + \dots + \dfrac{C_{\alpha_s}^{l_s+1}}{C_{\alpha_s}^{l_s}} (l_s + 1) = \sum_{q=1}^{s} (\alpha_q - l_q) = k - 2m.$

Since

$\mu_{k-1-2m}\, \sigma^2 (k - 2m) = \mu_{k+1-2m}$

(recall that $m > 0$), we get from (S. 5.18)

$S_{2m} + S_{2m+1} = 0.$ (S. 5.19)

In view of (S. 5.19) the condition (S. 5.10) is equivalent to

$S_0 + S_1 = 0.$ (S. 5.20)

But

$S_0 = \mu_{k+1},$

$S_1 = -(\alpha_1 + \dots + \alpha_s)\, \mu_2\, \mu_{k-1} = -k \sigma^2 \mu_{k-1}.$

Hence (S. 5.10) is equivalent to the equality

$\mu_{k+1} = k \sigma^2 \mu_{k-1}.$ (S. 5.21)

The induction assumption together with (S. 5.21) gives

$\mu_{k+1} = k!!\, \sigma^{k+1}.$
Now let one of the numbers $\alpha_1, \dots, \alpha_s$ be equal to $k$; then all the other
numbers vanish. Without loss of generality we can assume that

$\alpha_{i_1} = k, \quad \alpha_{i_2} = \dots = \alpha_{i_s} = 0.$ (S. 5.22)

In this case (S. 5.14) reduces to

$n E_0(\bar{x}(x_{i_1} - x_1)^k) = \sum_{l=0,\ l \text{ even}}^{k-1} C_k^l\, \mu_l\, \mu_{k+1-l} - \sum_{l=1,\ l \text{ odd}}^{k} C_k^l\, \mu_{l+1}\, \mu_{k-l}.$

In the second sum we put $l' = k - l$; then

$\sum_{l=1,\ l \text{ odd}}^{k} C_k^l\, \mu_{l+1}\, \mu_{k-l} = \sum_{l'=0,\ l' \text{ even}}^{k-1} C_k^{l'}\, \mu_{l'}\, \mu_{k+1-l'},$

so that under condition (S. 5.22), the condition (S. 5.10) is always satisfied.
We remark that for $n \ge 3$ the subspace $\Lambda_k$ always contains the function
$(x_{i_1} - x_1)^{\alpha_1} \cdots (x_{i_s} - x_1)^{\alpha_s}$ under condition (S. 5.16).
Hence we have established that for $n \ge 3$ the condition (S. 5.8) is equivalent
to the coincidence of the first $(k + 1)$ moments of the distribution function
$F(x)$ with the moments of a normal law. This completes the proof of Lemma
S. 5.1 and Theorem S. 5.2.

BIBLIOGRAPHY

[1] B. L. van der Waerden, Moderne Algebra. Vol. 2, Springer, Berlin, 1931;
4th ed., 1959; Russian transl., GITTL, Moscow, 1947. MR 2, 120;
MR 31 #1292.
[ 2] A. M. Kagan and V. P. Palamodov, Conditions of optimal unbiased esti-
mation of parametric functions for incomplete exponential families with
polynomial ties, Dokl. Akad. Nauk SSSR 1967. (Russian)
[3] ---, Incomplete exponential families and variance unbiased minimum
estimates. I, Teor. Verojatnost. i Primenen. 12 (1967), 34-49. (Russian)
[4] A. M. Kagan, Sample mean as an estimate of the shift parameters, Dokl.
Akad. Nauk SSSR 169(1966), 1006-1008 =Soviet Math. Dokl. 7(1966),
1041-1043. MR 33#6747.
[5] A. M. Kagan and A. L. Ruhin, On the theory of the estimation of a scale
parameter, Teor. Verojatnost. i Primenen. 12 (1967). (Russian)
[6] V. P. Palamodov, On verifiable functions, Teor. Verojatnost. i Primenen.
12 (1967). (Russian)
[7] ---, Testing multidimensional polynomial hypotheses, Dokl. Akad.
Nauk SSSR 172 (1966), 291-293 = Soviet Math. Dokl. 7 (1966), 95-97.
[8] - - - , On systems of differential equations with constant coefficients,
Dokl. Akad. Nauk SSSR 148(1963), 523-526 =Soviet Math. Dokl. 4(1963),
133-136. MR 29 # 1442.
[9] - - - , Ph.D. Thesis, Moscow State University, Moscow, 1965. (Russian)
[10] G. M. Fihtengol'c, A course on differential and integral calculus. Vol. II,
"Nauka", Moscow, 1966; German transl., Hochschulbücher für Mathematik,
Band 62, 2nd ed., VEB Deutscher Verlag, Berlin, 1966.
[11] L. Hörmander, Linear partial differential operators, Die Grundlehren der
math. Wissenschaften, Band 116, Academic Press, New York and
Springer-Verlag, Berlin, 1963. MR 28 #4221.
[12] A. M. Kagan, On the estimation theory of location parameter, Sankhyā (1966).
[13] E. Lehmann, On the non-verifiability of certain parametric functions, Teor.
Verojatnost. i Primenen. 10 (1965), 758-760. (Russian summary) MR 32 #8445.

[14] E. J. G. Pitman, The estimation of the location and scale parameters of a
continuous population of any given form, Biometrika 30 (1938), 391-421.
[15] J.-P. Serre, Géométrie algébrique et géométrie analytique, Ann. Inst.
Fourier, Grenoble 6 (1955-56), 1-42. MR 18, 511.
[16] C. Stein, The admissibility of Pitman's estimator of a single location
parameter, Ann. Math. Statist. 30 (1959), 970-979. MR 22 # 278.
[17] H. Whitney, Elementary structure of real algebraic varieties, Ann. of Math.
(2) 66 (1957), 545-556. MR 20 # 2342.
[18] E. Lehmann, Testing statistical hypotheses, Wiley, New York,and Chapman
& Hall, London, 1959. MR 21 #6654.
BIBLIOGRAPHY
BAHADUR, R. R.
1. Sufficiency and statistical decision functions, Ann. Math. Statistics 25
(1954), 423-462. MR 16, 154.
BASU, D.
2. On statistics independent of sufficient statistics, Sankhya 20 (1958),
223-226. MR 21 #4494.
BESICOVITCH, A. S.
3. On diagonal values of probability vectors of infinitely many components,
Proc. Cambridge Philos. Soc. 57 (1961), 759-766. MR 23 #A4154.
BLACKWELL, D.
4. Conditional expectation and unbiased sequential estimation, Ann. Math.
Statistics 18 (1947), 105-110. MR 8, 478.
BOCHNER, C. and MARTIN, W. T.
5. Several complex variables, Princeton Univ. Press, Princeton, N. J .,
1948; Russian transl., IL, Moscow, 1951. MR 10, 366.
BRENY, H.
6. L'état actuel du problème de Behrens-Fisher, Trabajos Estadíst. 6
(1955), 111-113. MR p, 868.
BROWN, L.
7. Sufficient statistics in the case of independent random variables, Ann.
Math. Statistics 35 (1964), 1456-1474.
BOURBAKI, N.
8. Espaces vectoriels topologiques, Elements de Mathematiques, no. 1189.
1229, Hermann, Paris, 1953, 1955; Russian transl., IL, Moscow, 1959.
MR 14, 880; MR 17, 1109.
WALD, A.
9. Testing the difference between the means of two normal populations with
unknown standard deviations, Selected Papers in Probability and Statis-
tics, McGraw-Hill, New York, 1955, pp. 669-695. MR 16, 435.
WIDDER, D.
10. The Laplace transform, Princeton Math. Series, vol. 6, Princeton Univ.
Press, Princeton, N. J., 1946.
WIJSMAN, R. A.
11. Incomplete sufficient statistics and similar tests, Ann. Math. Statistics
29 (1958), 1028-1045. MR 21 #5256.


VOLODIN, I. N.
12. On the distinction between the Poisson and P6lya distributions when a
large number of small samples is available, Teor. Verojatnost. i Primenen.
10 ( 1965), 364-367. (Russian) MR 31 #6299.
GLEASON, A. M.
13. Finitely generated ideals in Banach algebras, J. Math. Mech. 13 ( 1964),
125-132. MR 28 #2458.
DANTZIG, G.
14. On the non-existence of tests of "Student's" hypothesis having power func-
tion independent of a, Ann. Math. Statistics 11 (1940), 186-191. MR 1, 348.
DARMOIS, G.
15. Sur les lois de probabilité à estimation exhaustive, C.R. Acad. Sci.
Paris 200 (1935), 1265-1266.
DOETSCH, G.
16. Handbuch der Laplace-Trans formation Vols. I, II, Birkhauser, Basel,
1950, 1955. MR 13, 230; MR 18, 35.
DOWKER, C.H.
17. Lectures on sheaf theory, Tata Institute, Bombay, 1956, 1962. MR19, 301.
DOOB, J. L.
18. Stochastic processes, Wiley, New York, 1953; Russian transl., IL, Mos-
cow, 1956. MR 15, 445; MR 19, 71.
DYNKIN, E. B.
19. Necessary and sufficient statistics for a family of probability distribu-
tions, Uspehi Mat. Nauk 6 (1951), no. 1(41), 68-90; English transl.,
Selected Transl. Math. Stat. and Prob., vol. 1, Amer. Math. Soc., Provi-
dence, R. I., 1961, pp. 17-40. MR 12, 839.
ZINGER, A. A.
20. Independence of quasi-polynomial statistics and analytical properties of
distributions, Teor. Verojatnost. i Primenen. 3 (1958), 265-284.
(Russian) MR 21 #941.
21. On a problem of A. N. Kolmogorov, Vestnik Leningrad. Univ. 11 (1956),
no. 1, 53-56. (Russian) MR 17, 863.
ZINGER, A. A. and LINNIK, Ju. V.
22. Characterization of the normal distribution, Teor. Verojatnost. i
Primenen. 9 (1964), 692-695. (Russian) MR 30 #607.

KAGAN, A. M. and LINNIK, Ju. V.


23. A class of families admitting similar zones, Vestnik Leningrad. Univ. Ser.
Mat. Meh. Astronom. 19 (1964), 25-36. (Russian)
KAGAN, A. M. and SALAEVSKII, O. V.
24. The Behrens-Fisher problem for the existence of similar regions in an
algebra of sufficient statistics, Dokl. Akad. Nauk SSSR 155 (1964),
1250-1252 = Soviet Math. Dokl. 5 (1964), 556-558. MR 28 #4627a.
CARTAN, H.
25. ldeaux des fonctions analytiques de n variables complexes, Ann. Sci.
Ecole Norm. Sup. (3) 61 (1944), 149-197. MR 7, 290.
26. Sur les matrices holomorphes de n variables complexes, J. Math. Pures
Appl. 19 (1940), 1-26. MR 1, 312.
27. ldeaux et modules de fonctions analytiques de variables complexes, Bull.
Soc. Math. France 78 (1950), 29-64. MR 12, 172.
28. Varietes analytiques complexes et cohomologie, Colloque sur les Fonc-
tions de Plusieurs Variables, Bruxelles 1953, Georges Thone, Liege and
Masson & Cie, Paris, 1953, pp. 41-55. MR 16, 235.
29. Varietes analytiques reelles et varietes analytiques complexes, Bull.
Soc. Math. France 85 (1957), 77-99. MR 20 #1339.
KENDALL, M. G.
30. The evergreen correlation coefficient, Contributions to Probability and
Statistics, Stanford Univ. Press, Stanford, Calif., 1960, pp. 274-277.
MR 22 #11457.
KOLMOGOROV,A.N.
31. Izv. Akad. Nauk SSSR Ser. Mat. 6 (1942), 3-32. MR 4, 221.
32. Unbiased estimates, Izv. Akad. Nauk SSSR Ser. Mat. 14 (1950), 303-326;
English transl., Amer. Math. Soc. Transl. (1) 11 (1962), 144-170.
MR 12, 116.
CRAMÉR, H.
33. Mathematical methods of statistics, Princeton Math. Series, vol. 9,
Princeton Univ. Press, Princeton, N. J., 1946; Russian transl., IL,
Moscow, 1948. MR 8, 39.
KOOPMAN, B. O.
34. On distributions admitting a sufficient statistic, Trans. Amer. Math. Soc.
39 (1936), 399-409.
KULLBACK, S.
35. Information theory and statistics, Wiley, New York and Chapman & Hall,
London, 1959. MR 21 #2325.
LEHMANN, E.
36. Testing statistical hypotheses, Wiley, New York and Chapman & Hall,
London, 1959; Russian transl., IL, Moscow, 1963 and "Nauka", Moscow,
1964. MR 21 #6654.
LEHMANN, E. and SCHEFFÉ, H.
37. Completeness, similar regions and unbiased estimation. I, Sankhyā 10
(1950), 305-340. MR 12, 511.
LINNIK, Ju. V.
38. On polynomial statistics in connection with the analytical theory of dif-
ferential equations, Vestnik Leningrad. Univ. 11 (1956), no. 1, 35-48;
English transl., Selected Transl. Math. Stat. and Prob., vol. 1, Amer.
Math. Soc., Providence, R. I., 1961, pp. 171-206. MR 17, 983.
39. Polynomial statistics and polynomial ideals, Calcutta Math. Soc. Golden
Jubilee Commemoration Volume (1958-1959), Part I, Calcutta Math. Soc.,
Calcutta, 1963, pp. 95-98. MR 29 #2830.
40. On the theory of statistically similar regions, Dokl. Akad. Nauk SSSR
146 (1962), 300-302 = Soviet Math. Dokl. 3 (1962), 1297-1299.
MR 25 #3584.
41. Sur certaines questions de la statistique analytique, Ann. Fac. Sci. Univ.
Clermont Math. No. 8 (1962), 53-61.
42. Complex variables in problems with nuisance parameters and finite rank
sufficient statistics, Dokl. Akad. Nauk SSSR 149 (1963), 1026-1028 =
Soviet Math. Dokl. 4 (1963), 512-513. MR 27 #3041.
43. Remarks on the Fisher-Welch-Wald test, Dokl. Akad. Nauk SSSR 154
(1964), 514-516 =Soviet Math. Dokl. 5 (1964), 118-120.
44. Randomized homogeneous tests for the Behrens-Fisher problem, Izv.
Akad. Nauk SSSR Ser. Mat. 28 (1964), 249-260; English transl., Selected
Transl. Math. Stat. and Prob., vol. 6, Amer. Math. Soc., Providence, R. I.,
1966, pp. 207-217. MR 28 #5521.
45. On the construction of optimal similar solutions of the Behrens-Fisher
problem, Trudy Mat. Inst. Steklov. 79 (1965), 40-53 = Proc. Steklov
Inst. Math. no. 79 (1965), 41-56.
46. On A. Wald's test for the comparing of two normal samples, Teor.
Verojatnost. i Primenen. 9 (1964), 16-30. (Russian) MR 28 #5520.
47. Characterization of tests of the Bartlett-Scheffé type, Trudy Mat. Inst.
Steklov. 79 (1965), 32-39 = Proc. Steklov Inst. Math. no. 79 (1965), 32-40.
48. An application of a theorem of H. Cartan in mathematical statistics, Dokl.
Akad. Nauk SSSR 160 (1965), 1248-1249 = Soviet Math. Dokl. 6 (1965),
291-293. MR 31 #6298.
LINNIK, Ju. V., ROMANOVSKAJA, J. L. and ŠALAEVSKIĬ, O. V.
49. Remarks on the theory of the Fisher-Welch-Wald test, Teor. Verojatnost.
i Primenen. 10 (1965), 727-730. (Russian) MR 32 #8446.
LINNIK, Ju. V., ROMANOVSKII, J. V. and SUDAKOV, V. N.
50. A non-randomized homogeneous test in the Behrens-Fisher problem,
Dokl. Akad. Nauk SSSR 155 (1964), 1262-1264 = Soviet Math. Dokl. 5
(1964), 570-572. MR 28 #4627b.
LINNIK, Ju. V. and ŠALAEVSKIĬ, O. V.
51. On the analytic theory of tests for the Behrens-Fisher problem, Dokl.
Akad. Nauk SSSR 150 (1963), 26-27 = Soviet Math. Dokl. 4 (1963),
580-582. MR 27 #3042.
ŁOJASIEWICZ, S.
52. Sur le problème de division, Studia Mathematica 18 (1959), 87-136.
MR 21 #5893.
LUKACS, E.
53. Characterization of populations by properties of suitable statistics,
Proc. Third Berkeley Sympos. Math. Stat. and Prob. 1954-1955, vol. 2,
University of California Press, Berkeley, 1956, pp. 195-214. MR 18, 942.
LJAPUNOV, A. A.
54. On completely additive vector-valued functions, Izv. Akad. Nauk SSSR
Ser. Mat. 4 (1940), 465-468. (Russian) MR 2, 315.
MALGRANGE, B.
55. Lectures on the theory of functions of several complex variables, Tata
Institute, Bombay, 1962.
NEYMAN, J.
56. Sur la vérification des hypothèses statistiques composées, Bull. Soc.
Math. France 63 (1935), 346-366.
57. Un théorème d'existence, C. R. Acad. Sci. Paris 222 (1946), 843-845.
MR 7, 457.
58. Current problems of mathematical statistics, Proc. Internat. Congress
Math. 1954, vol. 1, Noordhoff, Groningen and North-Holland, Amsterdam,
1957, pp. 349-370. MR 20 #1374.
OKA, K.
59. Sur les fonctions analytiques de plusieurs variables. VII: Sur quelques
notions arithmétiques, Bull. Soc. Math. France 78 (1950), 1-27. MR 12, 18.
PETROV, A. A.
60. Verification of statistical hypotheses on the type of distribution based
on small samples, Teor. Verojatnost. i Primenen. 1 (1956), 248-271.
(Russian) MR 19, 76.
RAO, C. R.
61. Information and the accuracy attainable in the estimation of statistical
parameters, Bull. Calcutta Math. Soc. 37 (1945), 81-91. MR 7, 464.
62. Some theorems on minimum variance estimates, Sankhyā 12 (1952),
27-42. MR 14, 1103.
ROMANOVSKAJA, J. L.
63. On the Fisher-Welch-Wald test, Sibirsk. Mat. Ž. 5 (1964), 1343-1359.
(Russian) MR 30 #661.
ROMANOVSKII, I. V. and SUDAKOV, V. N.
64. On the existence of independent partitions, Trudy Mat. Inst. Steklov.
79 (1965), 5-10 = Proc. Steklov Inst. Math. no. 79 (1965), 1-7.
RÜCKERT, W.
65. Zum Eliminationsproblem der Potenzreihenideale, Math. Ann. 107 (1932),
259-281.
SAKS, S.
66. Théorie de l'intégrale, Warsaw, 1933; English transl., rev. ed., Dover,
New York, 1964; Russian transl., IL, Moscow, 1949.
STEIN, Ch.
67. A two-sample test for a linear hypothesis whose power is independent of
the variance, Ann. Math. Statistics 16 (1945), 243-258. MR 7, 213.
SVERDRUP, E.
68. Similarity, unbiasedness, minimaxibility and admissibility of statistical
test procedures, Skand. Aktuarietidskr. 36 (1953), 64-86. MR 15, 453.
HALMOS, P. R.
69. Measure theory, Van Nostrand, Princeton, N. J., 1950; Russian transl.,
IL, Moscow, 1953. MR 11, 504; MR 16, 22.
HALMOS, P. R. and SAVAGE, L. J.
70. Application of the Radon-Nikodym theorem to the theory of sufficient
statistics, Ann. Math. Statistics 20 (1949), 225-241. MR 11, 42.
HARDY, G. H., LITTLEWOOD, J. E. and PÓLYA, G.
71. Inequalities, 2nd ed., Cambridge Univ. Press, New York, 1952; Russian
transl., IL, Moscow, 1948. MR 13, 727; MR 18, 722.
HARDY, B.
72. Some properties of an angular transformation of the correlation coefficient,
Biometrika 43 (1956), 219-224. MR 17, 981.
HOGG, R. and CRAIG, A.
73. Sufficient statistics in elementary distribution theory, Sankhya 17 (1956),
209-216. MR 19, 188.
FEIGEL'SON, T. S.
74. On a simple method of establishing independence of statistics, Vestnik
Leningrad. Univ. Ser. Mat. Astronom. 19 (1964), no. 3, 157-158.
(Russian) MR 29 #4131.
FUKS, B. A.
75. Introduction to the theory of analytic functions of several complex
variables, Fizmatgiz, Moscow, 1962; English transl., Transl. Math.
Monographs, vol. 8, Amer. Math. Soc., Providence, R. I., 1963; reprint 1965.
MR 27 #4945; MR 29 #6049.
76. Special chapters in the theory of analytic functions of several complex
variables, Fizmatgiz, Moscow, 1963; English transl., Transl. Math.
Monographs, vol. 14, Amer. Math. Soc., Providence, R. I., 1965.
MR 30 #4979.
ŠALAEVSKIĬ, O. V.
77. On the non-existence of regularly varying tests for the Behrens-Fisher
problem, Dokl. Akad. Nauk SSSR 151 (1963), 509-510 = Soviet Math.
Dokl. 4 (1963), 1043-1045. MR 27 #2037.
CHATTERJEE, S. K.
78. On an extension of Stein's two sample procedure to the multi-normal
problem, Calcutta Statist. Assoc. Bull. 8 (1959), 121-148. MR 21 #4501.
SCHEFFÉ, H.
79. On solutions of the Behrens-Fisher problem based on the t-distribution,
Ann. Math. Statistics 14 (1943), 35-44. MR 4, 221.
SHOHAT, J. A. and TAMARKIN, J. D.
80. The problem of moments, Math. Surveys, vol. 1, Amer. Math. Soc.,
Providence, R. I., 1943; rev. ed., 1947. MR 5, 5.
