Linnik - Statistical Problems With Nuisance Parameters-Amer Mathematical Society (1968)
STATISTICAL PROBLEMS
WITH NUISANCE PARAMETERS
by
Ju. V. Linnik
Ju. V. Linnik
PROBABILITY THEORY
AND MATHEMATICAL STATISTICS
Nauka Publishing House
Main Editorial Office
for Physical and Mathematical Literature
Moscow 1966
PREFACE
At the end of the book are several unsolved problems, which constitute only a
small portion of the esthetically pleasing and varied problems that arise in
analytical statistics. The purpose of the present book is to draw the attention of
persons interested in mathematical statistics to its analytical aspects.
A. M. Kagan, I. L. Romanovskaja, and V. N. Sudakov had a share in the
writing of this book. Sections 2 and 3 of Chapter VII were written by the author
in collaboration with A. M. Kagan, and section 4 of Chapter VIII with
I. L. Romanovskaja. Section 2 of Chapter X was written by V. N. Sudakov. A considerable
amount of help in the writing of Chapter I was provided by N. M. Mitrofanova and
V. L. Eldlin.
I wish to express my gratitude to O. I. Rumjanceva and S. I. Cirkunova for
their great help in the preparation of the manuscript.
Ju. V. Linnik
PREFACE TO THE AMERICAN EDITION
Ju. V. Linnik
TABLE OF CONTENTS
Preface
Chapter III. Nuisance parameters. Tests with invariant power functions
§1. Nuisance parameters
§2. Tests with invariant power functions
§3. Some results dealing with tests with invariant power functions
§4. Stein's test
§6. Applications of a theorem of H. Cartan to the study of families of statistics
In the present section we shall review the basic material from measure theory,
which usually constitutes the foundation of the theory of statistical inferences.
For the proofs of the corresponding theorems, we shall refer the reader to certain
well-known texts. We shall, however, give the proofs of the less widely used
theorems. We employ the usual set-theory notation.
As a rule, we consider a space $\mathcal{X}$ of elementary events. Along with this
space, we also consider the $\sigma$-algebra $\mathcal{A}$ of measurable subsets $A$ of
$\mathcal{X}$. The pair $(\mathcal{X}, \mathcal{A})$ is called a measurable space. Let us
consider a countably additive (though not necessarily finite) nonnegative set function
$\mu = \mu(A)$ defined on the sets $A \in \mathcal{A}$. We take $\mu(\varnothing) = 0$,
where $\varnothing$ denotes the empty set. If $\mu(\mathcal{X}) = 1$, then $\mu$ is a
probability measure and we shall usually denote it by the letter $P$. If $\mathcal{A}$
contains a countable family of disjoint sets $A_1, A_2, \ldots$ such that
$\mu(A_i) < \infty$ (for $i = 1, 2, \ldots$) and $\bigcup_i A_i = \mathcal{X}$, then the
measure $\mu$ is said to be $\sigma$-finite. We shall usually make a given $\mu$ complete,
without changing the notation for it, by supplementing, if necessary, the
$\sigma$-algebra $\mathcal{A}$ with all subsets of sets of measure zero and assigning to
them the measure 0. The most important particular measures that we shall be using are
Lebesgue measure and "counting measure".
Lebesgue measure is defined for $\mathcal{X} = E_n$ (that is, n-dimensional Euclidean
space). Here the $\sigma$-algebra is composed of the Borel sets generated by all the
parallelepipeds $x_i \in (a_i, b_i]$ (for $i = 1, 2, \ldots, n$), where $x_1, \ldots, x_n$
are the coordinates of a point in $E_n$. This is the minimal $\sigma$-algebra containing
all such parallelepipeds. For these sets, the measure is simply their geometric volume.
Counting measure is defined on a countable set $\mathcal{X}$. The $\sigma$-algebra
$\mathcal{A}$ is the family of all subsets of $\mathcal{X}$. For $A \in \mathcal{A}$,
$\mu(A)$ is defined as the number of elements of $A$, so that $\mu(A) = \infty$ if $A$ is
an infinite set.
Consider a measurable space $(\mathcal{X}, \mathcal{A})$. Let $\mathcal{T}$ denote a
space other than $\mathcal{X}$. Let $T$ denote a mapping defined on $\mathcal{X}$ into
$\mathcal{T}$:
$$T: \mathcal{X} \to \mathcal{T}.$$
2 INTRODUCTION
Let the space $\mathcal{T}$ also be provided with a $\sigma$-algebra $\mathcal{B}$ of
measurable subsets $B$. The mapping $T$ is said to be measurable if the pre-images under
$T$ of measurable sets are measurable, that is, if $B \in \mathcal{B}$ implies
$T^{-1}(B) \in \mathcal{A}$. In the case of the measurable space
$(\mathcal{T}, \mathcal{B})$, we customarily define a measure $\nu$ by
$\nu(B) = \mu(T^{-1}(B))$ for $B \in \mathcal{B}$.
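The induced measure, whose value on a set is the measure of the pre-image of that set under T, can be illustrated on a finite sample space. The following is only a sketch and is not taken from the text: the two-coin sample space, the statistic `T` (number of heads), and the helper names `pushforward` and `measure` are assumptions chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(mu, T):
    """Return the measure induced by T: the mass at t is mu of the pre-image of {t}."""
    nu = defaultdict(Fraction)  # Fraction() is 0, so missing points start at mass 0
    for x, mass in mu.items():
        nu[T(x)] += mass
    return dict(nu)


def measure(m, subset):
    """Measure of a finite subset under a measure m given by point masses."""
    return sum((m.get(p, Fraction(0)) for p in subset), Fraction(0))


# Hypothetical example: two fair coin tosses, T = number of heads.
sample_space = [(a, b) for a in (0, 1) for b in (0, 1)]
mu = {x: Fraction(1, 4) for x in sample_space}
T = lambda x: x[0] + x[1]

nu = pushforward(mu, T)
B = {1, 2}
preimage = [x for x in sample_space if T(x) in B]
```

Evaluating `measure(nu, B)` and `measure(mu, preimage)` gives the same number, which is the defining property of the induced measure.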
When we interpret the original measurable space $(\mathcal{X}, \mathcal{A})$ as a sample
space, we shall call the measurable mapping $T: \mathcal{X} \to \mathcal{T}$ a statistic.
In the majority of the particular cases that we shall consider, $\mathcal{X}$ and
$\mathcal{T}$ will be Euclidean spaces, while the sample element in $\mathcal{X}$ and the
value of the statistic $T$ will be random vectors. In particular, if $\mathcal{T}$ is the
real axis, the mapping $T$ is a measurable function. Here we may consider ordinary
integration with respect to the measure $\mu$.
If $\mu$ is the measure corresponding to the measurable space $(\mathcal{X}, \mathcal{A})$
and $\varphi$ is a nonnegative measurable function, then the expression
$$\nu(A) = \int_A \varphi \, d\mu$$
for $A \in \mathcal{A}$, defines a new measure $\nu$ on $(\mathcal{X}, \mathcal{A})$. We
write $\varphi = d\nu/d\mu$ and we call $\varphi$ the Radon-Nikodym derivative of the
measure $\nu$ with respect to $\mu$. Here, if $\nu(\mathcal{X}) = 1$ and $\nu$ is a
probability measure, then $\varphi$ is called the probability density for $\nu$ with
respect to $\mu$. The function $\varphi$ is uniquely determined up to its values
on sets of measure zero.
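On a finite space the integral defining the new measure reduces to a weighted sum, and the density can be read back as a ratio of point masses. The sketch below is illustrative only: the three-point space, the point masses `mu`, and the density `phi` are assumed values, not data from the text.

```python
from fractions import Fraction


def integrate(phi, mu, A):
    """Integral of phi over A with respect to a measure mu given by point masses."""
    return sum((phi[x] * mu[x] for x in A), Fraction(0))


X = ["a", "b", "c"]
mu = {"a": Fraction(1), "b": Fraction(2), "c": Fraction(1)}             # base measure
phi = {"a": Fraction(1, 2), "b": Fraction(1, 8), "c": Fraction(1, 4)}   # assumed density


def nu(A):
    """The measure defined by integrating phi against mu."""
    return integrate(phi, mu, A)


total = nu(X)                                  # total mass of the new measure
ratio = {x: nu([x]) / mu[x] for x in X}        # recovers the density pointwise
```

Here `total` equals 1, so `phi` is a probability density with respect to `mu`, and `ratio` coincides with `phi` because every point has positive `mu`-mass.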
Conditions for existence of the function $\varphi$ are given by the well-known
Radon-Nikodym theorem:
Theorem 0.1.1. Suppose that $\mu$ and $\nu$ are $\sigma$-finite measures over
$(\mathcal{X}, \mathcal{A})$. A necessary and sufficient condition for existence of the
Radon-Nikodym derivative of the measure $\nu$ with respect to $\mu$ is that the measure
$\nu$ be absolutely continuous with respect to $\mu$, that is, that $\nu(A) = 0$ for all
$A \in \mathcal{A}$ such that $\mu(A) = 0$.
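For measures given by point masses on a finite set, the condition of the theorem can be checked directly. The following sketch uses assumed names (`is_absolutely_continuous`, `density`) and assumed numerical values; it sets the density to 0 on points of base measure zero, which is one admissible choice since the density is determined only up to such points.

```python
from fractions import Fraction


def is_absolutely_continuous(nu, mu):
    """True if every point of mu-measure zero also has nu-measure zero."""
    return all(nu[x] == 0 for x in mu if mu[x] == 0)


def density(nu, mu):
    """Return a density of nu with respect to mu, or None if none exists."""
    if not is_absolutely_continuous(nu, mu):
        return None
    return {x: (nu[x] / mu[x] if mu[x] != 0 else Fraction(0)) for x in mu}


mu = {"a": Fraction(1), "b": Fraction(0), "c": Fraction(2)}
nu_ok = {"a": Fraction(1, 2), "b": Fraction(0), "c": Fraction(1, 2)}
nu_bad = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}

phi_ok = density(nu_ok, mu)
phi_bad = density(nu_bad, mu)
```

The measure `nu_bad` charges the point `"b"`, which has `mu`-measure zero, so no density exists for it.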
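In what follows the concept of a product measure $\rho = \mu \times \nu$ will be useful.
Suppose that $(\mathcal{X}, \mathcal{A}, \mu)$ and $(\mathcal{Y}, \mathcal{B}, \nu)$ are
two measurable spaces equipped with measures $\mu$ and $\nu$ respectively. Consider the
Cartesian product $\mathcal{X} \times \mathcal{Y}$ of $\mathcal{X}$ and $\mathcal{Y}$ and
the Cartesian product $\mathcal{A} \times \mathcal{B}$ of the $\sigma$-algebras
$\mathcal{A}$ and $\mathcal{B}$, this last product meaning the minimal $\sigma$-algebra
containing all sets $A \times B$ for $A \in \mathcal{A}$ and $B \in \mathcal{B}$.
Thus we obtain a measurable space
$(\mathcal{X} \times \mathcal{Y}, \mathcal{A} \times \mathcal{B})$. The product measure
for this space is defined as
$$\rho(A \times B) = \mu(A)\,\nu(B) \quad \text{for } A \in \mathcal{A},\ B \in \mathcal{B}.$$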
Proofs of these two theorems can be found, for example, in the book by
Halmos [69].
Let us examine some properties of statistics that will be needed below. Our
exposition follows to some extent the well-known book by Lehmann [36].
Suppose that a statistic $T$ maps a measurable space $(\mathcal{X}, \mathcal{A})$ into a
measurable space $(\mathcal{T}, \mathcal{B})$. If $B \in \mathcal{B}$, then
$T^{-1}(B) \in \mathcal{A}$, but the family of measurable sets
$\{T^{-1}(B)\} = \mathcal{A}_0$, although it is a subset of the family $\mathcal{A}$, does
not necessarily coincide with it. Obviously the family $\mathcal{A}_0$ constitutes a
$\sigma$-subalgebra, known as the $\sigma$-subalgebra induced by the statistic $T$. Let us
consider the measurable space $(\mathcal{X}, \mathcal{A}_0)$ and assign the measurable
real functions in it to the same functions for the measurable space
$(\mathcal{T}, \mathcal{B})$. This enables us to consider the $\sigma$-algebras
$\mathcal{A}_0$ and $\mathcal{B}$ in a certain sense equivalent with respect to the
statistic $T$.
Theorem 0.2.1. Suppose that a statistic
$T: (\mathcal{X}, \mathcal{A}) \to (\mathcal{T}, \mathcal{B})$ induces a
$\sigma$-subalgebra $\mathcal{A}_0$. Let $f$ denote a real $\mathcal{A}$-measurable
function. The function $f$ is $\mathcal{A}_0$-measurable if and only if there exists a
$\mathcal{B}$-measurable function $g$ such that
$$f(x) = g(T(x)) \tag{0.2.1}$$
for all $x$.
For the proof, see the book by Lehmann [36].
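On a finite sample space the content of Theorem 0.2.1 is that a function is measurable with respect to the induced sub-algebra exactly when it is constant on the level sets of T. The sketch below checks this by trying to build g; the sample space, the statistic `T`, the test functions, and the helper name `factor_through` are all assumptions made for illustration.

```python
def factor_through(f, X, T):
    """Return g as a dict with f(x) == g[T(x)] for all x in X, or None if impossible."""
    g = {}
    for x in X:
        t = T(x)
        if t in g and g[t] != f(x):
            return None          # f takes two values on one level set of T
        g[t] = f(x)
    return g


X = range(-3, 4)
T = lambda x: x * x              # statistic: forgets the sign of x
f_ok = lambda x: abs(x) + 1      # constant on the level sets of T
f_bad = lambda x: x              # distinguishes x from -x

g = factor_through(f_ok, X, T)
g_bad = factor_through(f_bad, X, T)
```

The function `f_ok` factors through `T`, while `f_bad` does not, since it separates points that `T` identifies.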
Another theorem that we shall find useful is
Theorem 0.2.2. Let $T: (\mathcal{X}, \mathcal{A}) \to (\mathcal{T}, \mathcal{B})$ denote a
measurable mapping, let $\mu$
that is, if one of these integrals exists, so does the other, in which case they
coincide.
For proof see [36].
§ 3. ON CONDITIONAL PROBABILITIES
In this section we state without proof the information that we shall need re-
garding conditional mathematical expectations and probabilities. Detailed proofs
can be found, for example, in [36] and [8].
Suppose that a measurable space $(\mathcal{X}, \mathcal{A})$ is equipped with a
probability measure $P$ and that a statistic $T$ maps $(\mathcal{X}, \mathcal{A})$ into
$(\mathcal{T}, \mathcal{B})$, inducing a $\sigma$-subalgebra $\mathcal{A}_0$. Let $f$
denote a nonnegative function. Suppose that $f$ is $\mathcal{A}$-measurable and
integrable with respect to the measure $P$. Then the integrals $\int_A f \, dP$ exist for
all $A \in \mathcal{A}$ and a fortiori for all $A_0 \in \mathcal{A}_0$. It follows from
the Radon-Nikodym Theorem (Theorem 0.1.1) that there exists a function $f_0$ that is
$\mathcal{A}_0$-measurable with respect to the measure $P$ and that satisfies the equation
$$\int_{A_0} f \, dP = \int_{A_0} f_0 \, dP \tag{0.3.1}$$
for all $A_0 \in \mathcal{A}_0$.
In accordance with Theorem 0.2.1, $f_0$ is a measurable function of $T(x)$. This
function has the following two important properties.
1) Equation (0.3.1) holds for an arbitrary set $A_0 \in \mathcal{A}_0$.
2) The function $f_0$ is a measurable function of $T(x)$.
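The two properties just listed can be verified by hand on a finite probability space, where the conditional expectation is an average over each level set of the statistic. The following sketch assumes a fair die, the statistic `T` (parity), and the function `f` (square of the outcome); none of these come from the text, and the construction presupposes that every value of `T` has positive probability.

```python
from fractions import Fraction


def conditional_expectation(f, P, T):
    """Return f0 with f0(x) = average of f over the level set {T = T(x)} under P."""
    num, den = {}, {}
    for x, p in P.items():
        t = T(x)
        num[t] = num.get(t, Fraction(0)) + f(x) * p
        den[t] = den.get(t, Fraction(0)) + p
    return lambda x: num[T(x)] / den[T(x)]


def integral(h, A, P):
    """Integral of h over the finite set A with respect to P."""
    return sum((h(x) * P[x] for x in A), Fraction(0))


P = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die (assumed example)
T = lambda x: x % 2                            # parity of the outcome
f = lambda x: Fraction(x * x)

f0 = conditional_expectation(f, P, T)
odd, even = [1, 3, 5], [2, 4, 6]               # the level sets of T
```

By construction `f0` depends on `x` only through `T(x)`, and integrating `f` or `f0` over either level set gives the same value, which is equation (0.3.1) for the sets generated by `T`.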
On the basis of Theorem 0.1.1 these properties determine $f_0$ uniquely up to
its values on sets of $P$-measure 0. By definition, the function $f_0$ is taken as
the conditional mathematical expectation of $f(x)$ for a given value of the statistic
$T(x)$:
Let $P^*(t)$ denote the measure induced by the statistic $T$ in the measurable
space $(\mathcal{T}, \mathcal{B})$. Then by virtue of Theorem 0.2.2,
for arbitrary $B \in \mathcal{B}$.
To extend the definition of conditional mathematical expectation to sign-variable
functions $f(x)$, we define
$$f^+(x) = \frac{f(x) + |f(x)|}{2}; \qquad f^-(x) = \frac{|f(x)| - f(x)}{2}.$$
where $B \in \mathcal{B}$ is an arbitrary measurable set and $P^*(t)$ is, just as before,
the measure induced by the statistic $T$ in the measurable space
$(\mathcal{T}, \mathcal{B})$.
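The decomposition of a sign-variable function into its positive and negative parts, used above to extend the definition of conditional expectation, is easy to check numerically. This is a minimal sketch; the function names and the sample values are assumptions for illustration.

```python
def positive_part(v):
    """(v + |v|) / 2: equals v when v >= 0 and 0 otherwise."""
    return (v + abs(v)) / 2


def negative_part(v):
    """(|v| - v) / 2: equals -v when v <= 0 and 0 otherwise."""
    return (abs(v) - v) / 2


values = [-2.5, -1.0, 0.0, 0.5, 3.0]
decomposition = [(positive_part(v), negative_part(v)) for v in values]
```

Both parts are nonnegative, their difference recovers the original value, and their sum is its absolute value, so the nonnegative case applies to each part separately.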
We now turn to the important special case in which $\mathcal{X} = E_n$ is n-dimensional
Euclidean space and $\mathcal{A}$ is the family of Borel subsets of $E_n$. We note that
the more general case in which $\mathcal{X}$ is a Borel subset of $E_n$ reduces to this
case since we can extend the probability measure to all $E_n$ by taking its value equal
to 0 on $E_n \setminus \mathcal{X}$.
We have the following important theorems, for proof of which see for example
[36].
8 I. INTRODUCTORY MATERIAL
Laplace transforms of the type described. We have the following important convo-
lution theorem.
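Theorem 1.1.1.
(1.1.2)
where
$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} d\xi_1 \cdots d\xi_s \; m_1(\xi_1, \ldots, \xi_s)\, m_2(T_1 - \xi_1, T_2 - \xi_2, \ldots, T_s - \xi_s).$$
$$\frac{1}{(2\pi i)^s} \int_{c_1 - i\infty}^{c_1 + i\infty} d\theta_1 \cdots \int_{c_s - i\infty}^{c_s + i\infty} d\theta_s \; L(m \mid \theta) \exp(\theta_1 T_1 + \cdots + \theta_s T_s). \tag{1.1.3}$$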
over the product of the vertical contours converges absolutely, it is equal to the
function $m(T_1, \ldots, T_s)$ at all its points of continuity.
Since $L(m \mid \theta)$ is holomorphic in the region P, every contour in the product
can be deformed in a rather arbitrary manner by replacing it with rectifiable curves
of a type that is convenient in some respect or other. In particular, if
$L(m \mid \theta)$ satisfies the inequality
for $r > 1$, $\theta \in P$ and $C_0 = \text{const}$, then by translating the contours for
the integrations with respect to $\theta_1, \ldots, \theta_s$ to the right, we see that
$$m(T_1, \ldots, T_s) = 0 \quad \text{for } T_1 < 0, \ldots, T_s < 0.$$
We note also that for positive integers $k_1, \ldots, k_s$ the fraction
$1/(\theta_1^{k_1} \cdots \theta_s^{k_s})$ is the unilateral Laplace transform of
$$\frac{T_1^{k_1 - 1} \cdots T_s^{k_s - 1}}{(k_1 - 1)! \cdots (k_s - 1)!}.$$
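The last remark factorizes over the variables, so it suffices to check the one-dimensional statement that a power of T divided by the corresponding factorial has a reciprocal power of the transform variable as its unilateral Laplace transform. The sketch below does this numerically with a composite Simpson rule. The helper names, the truncation point `upper`, the step count, and the test values of `theta` and `k` are assumptions; the transform is taken with the kernel `exp(-theta * T)`, which is also an assumption about the sign convention.

```python
import math


def laplace_transform(m, theta, upper=60.0, steps=60000):
    """Approximate the integral of exp(-theta*T) * m(T) over [0, upper] (Simpson's rule)."""
    h = upper / steps                       # steps must be even for Simpson's rule
    total = m(0.0) + math.exp(-theta * upper) * m(upper)
    for i in range(1, steps):
        T = i * h
        weight = 4 if i % 2 else 2
        total += weight * math.exp(-theta * T) * m(T)
    return total * h / 3


def monomial(k):
    """T -> T^(k-1) / (k-1)!, the candidate original of 1 / theta^k."""
    return lambda T: T ** (k - 1) / math.factorial(k - 1)


theta, k = 1.5, 3
approx = laplace_transform(monomial(k), theta)
exact = 1 / theta ** k
```

For real positive `theta` the truncation error is negligible at the chosen cutoff, and `approx` agrees with `exact` to within the quadrature error.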
holds on the vertical contours (mentioned in Theorem 1.1.2) passing through the
region P. Then $E(\theta_1, \ldots, \theta_s)$ is the Laplace transform
$L(H \mid \theta)$, where $H = H(T_1, \ldots, T_s)$ vanishes for $T_j < 0$
(for $j = 1, 2, \ldots, s$) and has partial derivatives of the first $r - 2$ orders.
Furthermore,
(1.1.6)
where $\epsilon > 0$ may be arbitrarily small. Here and in what follows the $K_i$ are
positive constants.
To prove this, we substitute the function $E(\theta_1, \ldots, \theta_s)$ for
$L(m \mid \theta)$ in formula (1.1.4). By virtue of inequality (1.1.5), the integral
converges absolutely. Denoting it by $H(T_1, \ldots, T_s)$, we obtain a function
satisfying inequality (1.1.6). If one of the variables $T_1, T_2, \ldots, T_s$ is
negative, we find, by increasing the abscissa $c_i$ of the corresponding contour, that
$H(T_1, \ldots, T_s) = 0$. Specifically, the integrand in (1.1.3) contains the factor
$\exp(\theta_1 T_1 + \cdots + \theta_s T_s)$, which converges uniformly to 0.
Furthermore, by differentiating formula (1.1.3) formally with respect to
$\theta_1, \ldots, \theta_s$, we obtain a partial derivative of order $r - 2$. By
virtue of inequality (1.1.5), this integral converges absolutely, so that the
corresponding derivatives of $H(T_1, \ldots, T_s)$ exist.
We still need a theorem on the Laplace transforms of functions that vanish
outside a finite interval (that is, functions of compact support).
Theorem 1.1.4. Suppose that the function $B(T_1, \ldots, T_s)$ has partial derivatives
of the first $r$ ($\ge 1$) orders almost everywhere and that $B$ vanishes outside a
finite interval $[0, b]$. Then for arbitrary $(\theta_1, \ldots, \theta_s)$
($\operatorname{Re} \theta > 0$),
1 •••
b
and that the series converges in a neighborhood of $(0, \ldots, 0)$. By virtue of the
correspondence mentioned above,
We saw above that functions of the form $L(m \mid \theta)$ are holomorphic functions
of several complex variables. We shall use certain concepts and theorems from
the theory of such functions. These can be found in the monographs [5], [55],
[75], and [76]. In particular, we shall need the concept of a holomorphic function
in a region of superposition.
Let us denote by $C^s$ the Cartesian product of $s$ copies of the Euclidean
complex plane. A domain of holomorphy is defined as an open region $D \subset C^s$ for
which there exists a function $f(\theta_1, \ldots, \theta_s)$ that is holomorphic in $D$
but does not have an analytic continuation to any open region of which $D$ is a proper
subset. In particular, the polycylinders $Z_1 \times Z_2 \times \cdots \times Z_s$, where
the components $Z_j$ (for $j \le s$) are open simply-connected regions of the complex
variables $x_j + i\xi_j$, are domains of holomorphy. In what follows, we shall be
concerned primarily with such regions.
§ 2. FUNCTIONS OF SEVERAL COMPLEX VARIABLES
$$f_1 = 0, \ldots, f_r = 0. \tag{1.2.1}$$
The set of solutions $(\theta_1, \ldots, \theta_s)$ of these equations inside
$Z_1 \times \cdots \times Z_s$ is called the analytic set generated by the system
(1.2.1). Let us denote this set by $V_{f_1, \ldots, f_r}$. Suppose that the functions
$f_j(\theta_1, \ldots, \theta_s)$ for $j = 1, 2, \ldots, r$ have real values on the real
axes. Furthermore, suppose that $r < s$ and that the analytic set $V_{f_1, \ldots, f_r}$
inside the polycylinder $Z_1 \times \cdots \times Z_s$ with bounded simply connected
components $Z_j$ (for $j = 1, 2, \ldots, s$) can be decomposed into a finite num-
1) That is, any two points of a component can be connected by an open strip of
maximum dimension in it.
We turn now to the theorems on bounds that we shall need. The letters $K_0, K_1, \ldots$
will denote positive constants. Let $Z$ denote an open polycylinder of the
type described and let $\bar{Z}$ denote its closure. Let $f_1(\theta_1, \ldots, \theta_s)$
denote a function that is holomorphic on $\bar{Z}$ and let
$F(\theta_1, \ldots, \theta_s)$ denote a function that is holomorphic in $Z$. Suppose
also that, throughout $Z$,
(1.2.5)
where $\delta$ is the distance from $(\theta_1, \ldots, \theta_s)$ to the boundary of $Z$.
Proof. By the hypothesis of the theorem, the polycylinder $Z$ can be covered
by a finite number of open circular polycylinders $U_1, \ldots, U_k$ such that in each
of them,
IOM K M
101 (11· 0i •... , 0,
2 s
)I<-<--·
'Y/'q {JQ
2
(I.2.8)
On the other hand if $|f_1| \le \eta_j \delta^q / 10$, then the circle $C_{\eta_j}$:
$|f_1' - f_1| = \eta_j \delta^q / 5$ lies inside the image $\varphi(U_j \cap Z)$ and from
Cauchy's theorem we have
1 ~ F 1 (!~, 0 1 ~, •• • , 0iJ ,
0 1 (/1, 0, 2, ... , 0is) = -. ( , ) , dfl'
2m c /1 -/1 /1
Yj
so that
Combining this inequality with (1.2.8), we obtain the proof of the theorem.
A modification of Theorem 1.2.4 that will be important in what follows is a
theorem dealing with the case in which the polycylinder $Z$ is the Cartesian product
of $s$ open vertical strips $S_1 \times \cdots \times S_s$.
$$\varphi: (\theta_1, \ldots, \theta_s) \to (f_1, \ldots, f_r, \theta_{r+1}, \ldots, \theta_s).$$
The functions $G_j(\theta_1, \ldots, \theta_s)$ can also be represented in this region as
holomorphic functions $G_j(f_1, \ldots, f_r, \theta_{r+1}, \ldots, \theta_s)$, where
$j = 1, 2, \ldots, r$. Consider a point
( (0)
el,···, s €r.Inane1ghbor_hoodofthepo1nt f1=f1<e1(0) , ... ,es(0) ), •••
()(0)) D • •
The general local properties of ideals of functions are formulated with the aid
of the concept of the germ of a function. Let $f$ denote a function that is holomorphic
in a neighborhood of $\theta = (\theta_1, \ldots, \theta_s) \in Z$. The germ of $f$ is
defined as the class of functions that are holomorphic in a neighborhood of $\theta$ and
that coincide with $f$ in a neighborhood of $\theta$. This definition can be made more
precise with the aid of the concept of an inductive limit (see Cartan [28]). Consider the
open subsets $U$ of $Z$. In each set $U$ consider the ring $S_U$ of functions
$f(\theta)$ that are holomorphic in $U$. Let $V$ denote an open subset of $U$ and
introduce the homomorphism $r^U_V$ that consists of the restriction to $V$ of functions
that are holomorphic in $U$. For $W \subset V \subset U$ we will have
$r^U_W = r^V_W \circ r^U_V$. The inductive limit of the groups $S_U$ for $\theta \in U$
constitutes a group $S_\theta$. Every element of this group is a germ of functions
$f_\theta$. If $f \in S_U$, $g \in S_V$, and $\theta \in U \cap V$ and if there exists an
open set $W \subset U \cap V$ containing $\theta$ such that $r^U_W f = r^V_W g$, then
$f_\theta = g_\theta$.
Addition and multiplication of germs of functions $f_\theta$ are defined in a natural
way. They constitute a ring $\mathcal{O}_\theta$ in which we may consider the ideals
$I_\theta$. In the space of all germs $f_\theta$ we define a topology (of the general
type), and for this we need only exhibit a basis of the family of open sets. Let $U$
denote an open subset of $Z$ and let $f \in S_U$. For every point $\theta \in U$, $f$
defines a germ $f_\theta \in S_\theta$. We denote the set of such germs by $f_U$. We form
a basis of the family of open sets from the sets $f_U$ for all $f$ and $U$.
We shall now present briefly the information on sheaves that we shall need.
(For a more detailed exposition see [17] or [55].) Although we shall need only
sheaves of ideals of germs of holomorphic functions over $Z$, it will be useful to
give the general definition of a sheaf of rings. A sheaf $\mathcal{F}$ of rings is
determined when the following four things are defined:
1) a topological space $X$ (the base of the sheaf),
2) a function $x \to F_x$ that assigns to each $x \in X$ a ring $F_x$,
3) a (general) topology in the union $F$ of all the sets $F_x$,
4) a "projection" $p: F \to X$ that assigns to all elements of $F_x$ the element
$x$ and that is a local homeomorphism.
In addition we require continuity of the algebraic operations. More precisely,
the mapping $\alpha \to -\alpha$ ($\alpha \in F$) of the space $F$ onto itself must be
continuous; also, the mappings $(\alpha, \beta) \to \alpha + \beta$ and
$(\alpha, \beta) \to \alpha\beta$ of those pairs $(\alpha, \beta) \in F \times F$ for
which $p(\alpha) = p(\beta)$ (so that the operations are defined) must be continuous
(the topology in $F \times F$ is defined as usual).
§3. ANALYTIC SHEAVES
The rings $F_x$ are called the stems of the sheaf.
Let $U$ denote an open subset of $X$. A continuous mapping $s: U \to F$ for
which the mapping $p \circ s$ is the identity mapping is called a section of the sheaf
$\mathcal{F}$ over $U$.
Thus a section $s$ maps $x$ into the stem $F_x$; it has exactly one point in
common with every stem over $U$. Two sections that coincide for a point $x \in X$
also coincide for some neighborhood of $x$.
Furthermore, since $sx$ belongs to $F_x$ we can define addition and subtraction of two
sections over $U$ by $(s_1 \pm s_2)x = s_1 x \pm s_2 x \in F_x$. In this way we
obtain the group of sections $\Gamma(U, \mathcal{F})$.
The sheaves that we shall use will always have a base $X = Z$, where $Z$ is
the polycylinder introduced earlier. Furthermore they will always be analytic.
Take $X = Z$. A sheaf of ideals $I_z$ in the rings $\mathcal{O}_z$ of germs of functions
that are holomorphic at a point $z \in Z$ is called an analytic sheaf $\mathcal{F}$ over
$Z$. We take the usual topology on $Z$; on $F = \{I_z\}$ we take the topology described
above. Here the mapping $(f, \alpha) \to f\alpha$, where $f \in \mathcal{O}_z$ and
$\alpha \in F_x$, is continuous. (In a more general situation this property is a
requirement over a complex analytic variety.)
In particular, the sheaf $\mathcal{O}$ of germs of all holomorphic functions over $Z$ is
analytic. The sheaf $\mathcal{F}$ mentioned above can be defined as a subsheaf of
$\mathcal{O}$. The section $s$ of the analytic sheaf $\mathcal{F}$ over the set $U$ can
be identified in a natural manner with the function $f \in S_U$ which is holomorphic
over $U$.
Very important for what follows is the concept of a coherent analytic sheaf.
An analytic sheaf $\mathcal{F}$ over $Z$ is said to be coherent if for every point
$z \in Z$ there exists a finite number of functions (sections of the sheaf
$\mathcal{O}$) that are holomorphic over some open neighborhood $U$ of the point $z$ and
have the property that for an arbitrary point $z' \in Z$ the stem $I_{z'}$ is contained
in the ideal generated by this finite number of functions in the ring of germs
$\mathcal{O}_{z'}$.
Now we can state the theorem of H. Cartan that we shall need. In what fol-
lows we shall base our study of a large class of tests and unbiased estimates on
this theorem (Theorem 5 in [28]). We formulate it under more stringent restrictions
than in its original presentation since this formulation will be sufficient for our
purposes.
Theorem 1.3.3. Let $Z$ denote a polycylinder with open simply-connected
bounded components and let $\mathcal{F}$ denote a coherent analytic sheaf over $Z$. Let
$u_1, u_2, \ldots, u_M$ denote finitely many sections of $\mathcal{F}$ over $Z$ such that
for an arbitrary point $z \in Z$, the stem $I_z$ is generated by the germs of these
functions in $\mathcal{O}_z$. Then the sections $u_1, \ldots, u_M$ in the ring
$\mathcal{O}_Z$ of all functions that are holomorphic over $Z$ generate an ideal
containing the entire group of sections $\Gamma(Z, \mathcal{F})$.
From this theorem we derive a corollary that we shall have occasion to use
later.
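Corollary. Let $f_1, \ldots, f_r$, where $r < s$, denote functions that are holomorphic
on $Z$. Suppose that the analytic set of points $(\theta_1, \ldots, \theta_s)$ defined by
the conditions
$$f_1 = 0, \ldots, f_r = 0; \qquad \operatorname{rank} \begin{pmatrix} \dfrac{\partial f_1}{\partial \theta_1} & \cdots & \dfrac{\partial f_1}{\partial \theta_s} \\ \vdots & & \vdots \\ \dfrac{\partial f_r}{\partial \theta_1} & \cdots & \dfrac{\partial f_r}{\partial \theta_s} \end{pmatrix} < r, \tag{1.3.1}$$
has no points inside $Z$. Then every function $F$ that is holomorphic in $Z$ and
vanishes on the analytic set $f_1 = 0, \ldots, f_r = 0$ can be represented in the form
$$F = f_1 G_1 + \cdots + f_r G_r, \tag{1.3.2}$$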
$$f_1 = 0, \ldots, f_r = 0, \tag{1.3.3}$$
$$f|_V = h_1 g_1 + h_2 g_2 + \cdots + h_s g_s,$$
where $h_1, \ldots, h_s$ are functions that are holomorphic in $V$. 1)
Proof. For every point $\theta \in A$, let us consider a polycylindrical neighborhood
$U_\theta$ in which the ideal is finitely generated. Theorem 1.3.1 asserts that such a
neighborhood exists. From the open covering of the compact set $A$ that we have
obtained, let us choose a finite covering $\{U_{\theta_1}, \ldots, U_{\theta_l}\}$.
We may assume that $\bigcup_{i=1}^{l} U_{\theta_i} = Z'$ is a polycylinder (containing
$A$). Consider the system of generators $\{u_1, \ldots, u_s\}$ of the ideals
$I_{\theta_i}$ and the ring of functions
1) The assertion that $I$ has a finite basis on $A$ can also be understood in the
following sense. Let us choose elements of $I$ as generators and construct from them an
ideal $I'$ in the ring of functions that are holomorphic in $Z$. Then the ideal $I'$ is
finitely generated. We note that if $I$ has a finite basis on $A$ in the sense indicated
in the statement of the theorem, it will also have a finite basis on $A$ in the sense
just explained.
that are holomorphic in Z'. Let I denote the ideal generated by I in that ring.
From the coherence of the sheaf of ideals $I_\theta$ that are locally generated by the
ideal $I$ on $Z$ and from Theorem 1.3.3, we may assert that the system
$\{u_1, \ldots, u_s\}$ generates the entire group of sections. This means that for every
function $f \in I$ there exists a set $h_1, \ldots, h_s$ of functions, holomorphic in
$Z'$, such that $f|_{Z'} = \sum_i h_i u_i|_{Z'}$. The theorem is proved.
CHAPTER II
24 II. SUFFICIENT STATISTICS AND EXPONENTIAL FAMILIES
For $t \notin N' \cup N''$, the function $F(x \mid t)$ behaves like the distribution
function $P(X < x \mid t)$, where $X$ is a random variable. Now suppose that
$F_1(x \mid t)$ is continuous from the left with respect to $x$ and that it coincides
with $F(x \mid t)$ at rational points. It is a distribution function and it defines a
probability measure $P_1(A \mid t)$ that is independent of the parameter $\theta$ for
$A \in \mathcal{A}$. Let us show that $P_1(A \mid t)$ is the conditional probability of
the set $A$ for given $t$. Specifically, let us show that
(2.1.1)
for all $A \in \mathcal{A}$ and for all $t$ except for values of $t$ in a set of
corresponding measure 0. By what has been said above this is true for sets $A$ that are
intervals with rational or infinite end-points. Hence on the basis of elementary
set-theoretic considerations it is true for the entire Borel $\sigma$-algebra
$\mathcal{A}$. On the other hand, on the set $N' \cup N''$ we can choose the measures
$P(A \mid t)$ arbitrarily without violating the definition of conditional probabilities.
This completes the proof of Theorem 2.1.1.
In what follows, we shall confine ourselves to admissible probability
measures $\mathcal{P} = \{P_\theta, \theta \in \Omega\}$, that is, probability measures
with probability density $p_\theta$ with respect to the same $\sigma$-finite measure
$\mu$ on $(\mathcal{X}, \mathcal{A})$. (Most often $\mathcal{X}$ will be a Euclidean
space and $\mu$ will be Lebesgue measure.)
We shall need the following theorem from measure theory.
Theorem 2.1.2. The family $\mathcal{P}$ of probability measures is dominated by a
$\sigma$-finite measure if and only if $\mathcal{P}$ has a countable equivalent family.
By equivalent families of measures, we mean that vanishing of the measures
of one family for any set $A \in \mathcal{A}$ implies vanishing of the measures of the
other family.
For the proof of the theorem see [36] or [18]. As a dominating measure for
$\mathcal{P} = \{P_\theta, \theta \in \Omega\}$ we may take
$\lambda = \sum_{i=1}^{\infty} c_i P_{\theta_i}$, where each $c_i$ is positive, where
$\sum_{i=1}^{\infty} c_i = 1$, and where the $P_{\theta_i}$ constitute a suitably chosen
countable family.
§ 1. SUFFICIENT STATISTICS
Proof (cf. [36]). We first prove the necessity. Suppose that $T$ is a sufficient
statistic. Let $\mathcal{A}_0$ denote the $\sigma$-subalgebra of $\mathcal{A}$ induced by
$T$. Then for all $\theta \in \Omega$, $A_0 \in \mathcal{A}_0$, and $A \in \mathcal{A}$
we have
$$P_\theta(A) = \int_{\mathcal{X}} P(A \mid T(x)) \, dP_\theta(x) = \int_{\mathcal{X}} E_\lambda[I_A(x) \mid T(x)] \, dP_\theta(x),$$
because, as we have just shown, $P(A \mid T(x))$ is also a conditional probability
for the measure $\lambda$. Furthermore the function $E_\lambda[I_A(x) \mid T(x)]$ is
$\mathcal{A}_0$-measurable and $dP_\theta / d\lambda = g(T(x), \theta)$ for the
$\sigma$-algebra $\mathcal{A}_0$ and the measure $\lambda$. Consequently
f EJ..il
:!;
A (x) IT (x)) dP0 (x)
~~ ~ = Ei., [I A (x) g (T (x), 0)[T (x)) =Pi., [A[ T (x)) g(T (x), 0)
holds, then
$$d\lambda(x) = \sum_{i=1}^{\infty} c_i \, g(T(x), \theta_i) \, h(x) \, d\mu(x) = V(T(x)) \, h(x) \, d\mu(x),$$
$$g_1(T(x), \theta) = \frac{g(T(x), \theta)}{V(T(x))}, \quad \text{if } V(T(x)) > 0,$$
and
$$g_1 = 0, \quad \text{if } V(T(x)) = 0.$$
Then
$$dP_\theta(x) = g(T(x), \theta) \, h(x) \, d\mu(x) = g_1(T(x), \theta) \, d\lambda(x), \tag{2.1.5}$$
and thus $T(x)$ is a sufficient statistic.
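A factorization of the density into a factor depending on the sample only through the statistic and a factor free of the parameter can be checked exhaustively for a small discrete family. The sketch below is an illustration under assumptions not made in the text: independent Bernoulli observations, counting measure as the dominating measure, the statistic `T` equal to the number of successes, and the helper names `joint_density`, `g`, and `h`.

```python
from fractions import Fraction
from itertools import product


def joint_density(x, theta):
    """Density of independent Bernoulli(theta) observations x = (x_1, ..., x_n)."""
    p = Fraction(1)
    for xi in x:
        p *= theta if xi == 1 else 1 - theta
    return p


def g(t, theta, n):
    """Factor that depends on the sample only through t = T(x)."""
    return theta ** t * (1 - theta) ** (n - t)


def h(x):
    """Factor that does not depend on theta (identically 1 for this family)."""
    return Fraction(1)


n = 4
thetas = [Fraction(1, 3), Fraction(1, 2), Fraction(4, 5)]
T = sum  # number of successes

factorizes = all(
    joint_density(x, th) == g(T(x), th, n) * h(x)
    for x in product((0, 1), repeat=n)
    for th in thetas
)
```

The flag `factorizes` is true for every sample and every listed parameter value, so two samples with the same number of successes have identical densities under each `theta`.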
At this point we make the obvious remark that if a mapping of a sufficient
statistic T defined in a space 3" onto another statistic T' defined in the same
space is one-to-one and measurable in both directions, then such a mapping
yields a sufficient statistic provided the function g (T(x), 0) in formula (2.1.4)
is a measurable function of T'.
The mapping T: (xi' • • ·, xn)-+ (xi, · · ·, x:), which for brevity we shall identi-
fy with the expression (xi, · • •, x:), is a sufficient statistic. Indeed,
2 exp ( -
-(2-:rt;-nr.- ~ ±(xi- a)
l=l
2)
and the four parameters $a_1$, $a_2$, $\sigma_1^2$, and $\sigma_2^2$ are the four scalar
statistics $\bar{x}$, $\bar{y}$, $s_1^2$, and $s_2^2$ (in the accepted notation).
Consider the probability density for a given pair of samples
(2. 2.1)
Let us set
(2.2.2)
.,!.,@x,+ ~ Y1)aodn,~,(~x~+~/) )·
will be sufficient statistics.
§2. EXAMPLES OF SUFFICIENT STATISTICS
The parameter $\theta = (a_1, a_2, \sigma_1^2, \sigma_2^2, \rho)$ includes the two mean
values $a_1$ and $a_2$, two values $\sigma_1^2$ and $\sigma_2^2$ of the variances, and
the coefficient of correlation $\rho$. As a sufficient statistic we may take the vector
$$\left( \bar{x}, \; \bar{y}, \; s_1^2, \; s_2^2, \; \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \right).$$
We can also replace the sample covariance
$(1/n) \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$ with the quantity
$(1/n) \sum_{i=1}^{n} x_i y_i$ and we can replace the sample variances $s_1^2$ and
$s_2^2$ with the quantities $(1/n) \sum_{i=1}^{n} x_i^2$ and
$(1/n) \sum_{i=1}^{n} y_i^2$.
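The replacement of centered moments by raw moments is legitimate because, once the two sample means are included, each vector can be computed from the other. The sketch below checks one direction of this map with exact arithmetic. It assumes that the sample variances and covariance are normalized by 1/n, which matches the raw-moment expressions above but is not stated explicitly in the surviving text; the function names and data are hypothetical.

```python
from fractions import Fraction


def centered_statistic(xs, ys):
    """(mean x, mean y, variance x, variance y, covariance), all normalized by 1/n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    s1 = sum((x - mx) ** 2 for x in xs) / n
    s2 = sum((y - my) ** 2 for y in ys) / n
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return (mx, my, s1, s2, c)


def raw_statistic(xs, ys):
    """(mean x, mean y, mean of x^2, mean of y^2, mean of x*y)."""
    n = len(xs)
    return (
        sum(xs) / n,
        sum(ys) / n,
        sum(x * x for x in xs) / n,
        sum(y * y for y in ys) / n,
        sum(x * y for x, y in zip(xs, ys)) / n,
    )


def raw_to_centered(raw):
    """Recover the centered statistic from the raw one."""
    mx, my, mxx, myy, mxy = raw
    return (mx, my, mxx - mx * mx, myy - my * my, mxy - mx * my)


xs = [Fraction(v) for v in (1, 2, 4, 7)]
ys = [Fraction(v) for v in (2, 1, 5, 4)]
recovered = raw_to_centered(raw_statistic(xs, ys))
```

Since the means are carried along unchanged, the map is invertible, so the two vectors carry the same information about the sample.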
Example 10. In the preceding example suppose that the parameter $\theta$ cannot
assume all real values but only the values $2m + \tfrac{1}{2}$, where $m$ ranges over all
integers (positive, negative, and zero). Suppose that we take a sample of two
independent observations $x_1$ and $x_2$. Obviously each observation completely
determines the parameter $\theta$, so that $x_1$ and $x_2$ are each sufficient statistics.
The last, quite simple, example (cf. also the examples of this type in
Basu's article [2]) is instructive for a discussion of information-theoretic
properties of sufficient statistics. The relationships between statistics and
information theory are well expounded in the famous book of Kullback [35]. Here
we shall touch on only one point in that field.
Let $\mathcal{X}$ denote a region contained in Euclidean space $E_n$ and let $x$ denote
an element of $\mathcal{X}$. Let $\theta$ denote a parameter assuming values in a region
$\mathcal{K}$ contained in $E_m$. Let $p(x \mid \theta)$ denote the probability density
with respect to Lebesgue measure. Suppose that this probability density is continuous and
positive on $\mathcal{X} \times \mathcal{K}$. Let $E_{n_1} \subset E_n$, $n_1 < n$,
denote a Euclidean space contained
$$p(x \mid \theta) = p(T, \xi \mid \theta) \left| \frac{\partial (T, \xi)}{\partial (x)} \right|,$$
where $p(T, \xi \mid \theta)$ is the probability density for $(T, \xi)$ corresponding to
a given value of the parameter $\theta$. From (2.3.1) we obtain
I (x, 0) = I
(T, :JC)
q (0) P1 (TI 0)log2 Igg (T, 0>
(T, 0) q (0) d
0 dT d0, (2. 3.4)
:JC
I (x, 0) = f
IT, :JC)
q (0) p 1 (T j 0}log 2 p~~~g) dT df
- f(T, :JC)
P1 (T, 0) log 2 ;Cg•q ~~) dT d0 =I (T, 0). (2. 3.6)
$$I(x, \theta) = I(T, \theta)$$
under the preceding assumption regarding $p(x \mid \theta)$ for a sufficiently broad
class of a priori distributions $q(\theta)$ of the parameter and the statistic $T$, then
the statistic $T$ is sufficient. We note that in our derivation we could have used not
the amount of information $I(x, \theta)$ but the functional
n times
the distribution of the repeated sample $P^n$ is induced. We shall call its statistics
the statistics of the repeated sample.
Now suppose that for all values of $\theta \in \Omega$ the distributions $P_\theta$ have
a positive probability density $p(x \mid \theta)$ that is continuous with respect to $x$.
In Example 2 of §2 we saw that the function
$g(x, \theta) = \ln p(x \mid \theta) - \ln p(x \mid \theta_0)$, where $\theta_0$ is
any value of the parameter $\theta$, is a sufficient statistic for $\mathcal{P}$. This
statistic is also necessary; if $\chi(x)$ is a sufficient statistic, then by (2.1.4)
$$p(x \mid \theta) = g_1(\chi(x), \theta) \, h(x),$$
where $h(x)$ is independent of $\theta$. Therefore
$g(x, \theta) = \ln g_1(\chi(x), \theta) - \ln g_1(\chi(x), \theta_0)$
is, as a function of $\theta \in \Omega$, subordinate to $\chi(x)$.
If we replace $\mathcal{X}$ with $\mathcal{X}^n$ and $\mathcal{P}$ with $\mathcal{P}^n$,
then we need to replace $g(x, \theta)$ with
$g(x_1, \ldots, x_n, \theta) = g(x_1, \theta) + \cdots + g(x_n, \theta)$.
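For a concrete family the additive statistic for a repeated sample can be evaluated directly. The sketch below assumes the normal family with parameter (mean, standard deviation) and the reference value (0, 1); these choices, the function names, and the three samples are illustrative assumptions and do not come from the text. The first two samples share the same sum and the same sum of squares, while the third shares only the sum.

```python
import math


def log_density(x, theta):
    """Logarithm of the normal density with theta = (mean, standard deviation)."""
    a, sigma = theta
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - a) ** 2 / (2 * sigma ** 2)


def g(x, theta, theta0=(0.0, 1.0)):
    """Log-likelihood ratio of a single observation against the reference theta0."""
    return log_density(x, theta) - log_density(x, theta0)


def g_sample(xs, theta, theta0=(0.0, 1.0)):
    """The statistic for a repeated sample: the sum of g over the observations."""
    return sum(g(x, theta, theta0) for x in xs)


sample_a = (0.0, 3.0, 3.0)   # sum 6, sum of squares 18
sample_b = (1.0, 1.0, 4.0)   # sum 6, sum of squares 18
sample_c = (0.0, 0.0, 6.0)   # sum 6, sum of squares 36
thetas = [(1.0, 2.0), (-0.5, 0.7)]
```

For each listed parameter value, `g_sample` agrees on `sample_a` and `sample_b` and differs on `sample_c`, which reflects the fact that in this family the statistic depends on the sample only through the sum and the sum of squares.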
§4. A REPEATED SAMPLE
(2.4.1)
is functionally independent and constitutes a necessary and sufficient statistic
for a sample of size $n$.
We shall show that the statistic $(\chi_1, \ldots, \chi_r)$ is equivalent to the
necessary and sufficient statistic $g(x_1, \ldots, x_n, \theta)$.
First let us show that the first statistic is subordinate to the second. Let
$\varphi(x)$ denote a function in $L(P, \Delta)$ and suppose that
$\chi(x_1, \ldots, x_n) =$
Therefore $\chi(x_1, \ldots, x_n) = n c_0 + \sum_{q} c_q \, g(x_1, \ldots, x_n, \theta_q)$,
so that $\chi$ is subordinate to the necessary statistic $g(x_1, \ldots, x_n, \theta)$
and hence is itself necessary. Furthermore, the converse subordination also holds:
$g(x, \theta) = a_0(\theta) + \sum_{j=1}^{r} a_j(\theta) \varphi_j(x)$, where the
$a_j(\theta)$ are constants depending only on $\theta$. Therefore
$g(x_1, \ldots, x_n, \theta) = n a_0(\theta) + \sum_{j=1}^{r} a_j(\theta) \chi_j(x_1, \ldots, x_n)$,
so that $g(x_1, \ldots, x_n, \theta)$ is subordinate to $\{\chi_1, \ldots, \chi_r\}$.
Thus this last statistic is a sufficient statistic equivalent to
$g(x_1, \ldots, x_n, \theta)$.
The functions $1, \varphi_1(x), \ldots, \varphi_r(x)$ constitute a basis in
$L(P, \Delta)$ and consequently are linearly independent. Let us show now that, if
$s$ ($\le r$) arbitrarily chosen functions $1, \varphi_1(x), \ldots, \varphi_s(x)$ are
linearly independent (for example, if they constitute a part of the basis), then for
$n \ge s$ the system of functions
$$\chi_i(x_1, \ldots, x_n) = \sum_{j=1}^{n} \varphi_i(x_j)$$
Let us show that this would imply linear dependence of the system
$1, \varphi_1, \ldots, \varphi_s$. We prove this by induction. If $s = 1$, then
$\varphi_1'(x_1) = 0$, so that $\varphi_1(x_1) = \text{const}$ and our assertion is
trivial. Suppose now that it is true for $s - 1$. If
$D(\chi_1, \ldots, \chi_{s-1}) / D(x_1, \ldots, x_{s-1}) \equiv 0$, our assertion holds.
Consequently we may assume that there exists a point
$(x_1^{(0)}, \ldots, x_{s-1}^{(0)})$ at which this Jacobian is nonzero. Consider a point
$(x_1^{(0)}, \ldots, x_{s-1}^{(0)}, x_s)$, where $x_s \in \Delta$ assumes arbitrary
values. At such a point let us write the left-hand member of (2.4.2), expanding the
determinant in terms of elements of the last column. We then obtain
(2.4.3)
Here $A_1, \ldots, A_s$ are constants and $A_s \ne 0$. Thus we see that
$\varphi(x_s) = \text{const}$ in the interval $\Delta$, so that we have linear dependence.
Thus the system of functions (2.4.1) is linearly independent and constitutes
a necessary and sufficient statistic for $\mathcal{P}$ with respect to a sample of size
$n$. This completes the proof of the theorem.
Theorem 2.4.2. Under the conditions of Theorem 2.4.1, for arbitrary finite
$n \le r$, every sufficient statistic with respect to a sample of size $n$ is trivial.
Suppose that $n \le r$. Then $L(P, \Delta)$ contains $n$ functions $\varphi_1, \varphi_2, \ldots, \varphi_n$
such that the system $1, \varphi_1, \varphi_2, \ldots, \varphi_n$ is linearly independent. It follows
from the proof of Theorem 2.4.1 that the functions $\chi_1(x_1, \ldots, x_n), \ldots$
$$g(x, \theta) = \sum_{i=1}^{R} c_i(\theta)\, \varphi_i(x) + c_0(\theta), \tag{2.4.5}$$
where the $c_i(\theta)$ are constants depending only on $\theta$. Since $g(x, \theta) = \ln p(x \mid \theta) - \ln p(x \mid \theta_0)$,
we obtain formula (2.4.4) with $\varphi_0(x) = \ln p(x \mid \theta_0)$. Let us now show
that the functions $1, c_1(\theta), \ldots, c_R(\theta)$ are linearly independent. Suppose that
this is not the case. Then without loss of generality we may set
$c_R(\theta) = b_0 + b_1 c_1(\theta) + \cdots + b_{R-1} c_{R-1}(\theta)$, where the $b_i$ are constants. Substituting this
expression into (2.4.5) we see that
$$g(x, \theta) = \sum_{i=1}^{R-1} c_i(\theta)\,\bigl(\varphi_i(x) + b_i \varphi_R(x)\bigr) + c_0(\theta) + b_0 \varphi_R(x),$$
so that the dimension of $L(P, \Delta)$ does not exceed $R$, which contradicts the
hypothesis.
The following theorem is in a certain sense the converse of Theorem 2.4.3.
Theorem 2.4.4. Suppose that
From this it is clear that the dimension of $L(P, \Delta)$ does not exceed $R + 1$, and
consequently the rank of $P$ does not exceed $R$. Furthermore, if the functions
$1, c_1(\theta), \ldots, c_R(\theta)$ are linearly independent, so are the $R$ functions
$c_1(\theta) - c_1, \ldots, c_R(\theta) - c_R$. An elementary argument leads to the existence of
$R$ numbers $\theta_1, \ldots, \theta_R$ in $\Omega$ such that
$$\varphi_i(x) = b_{i0} + \sum_{j=1}^{R} b_{ij}\, g(x, \theta_j), \tag{2.4.8}$$
where the $b_{ij}$ are constants. Thus the functions $1, \varphi_1, \ldots, \varphi_R$ belong to
$L(P, \Delta)$ and generate it. If they are linearly independent they constitute a
basis in $L(P, \Delta)$, and the present theorem follows from Theorems 2.4.1 and
2.4.2.
These theorems naturally lead us to consideration of exponential families
of a general form, the study of one aspect of which constitutes the content of
this book.
$h(x)$ to the measure $\mu(x)$ (that is, if we consider the measure $\nu(x)$ with $d\nu(x) = h(x)\,d\mu(x)$)
and set $Q_j(\theta) = \theta_j$ ($j = 1, 2, \ldots, s$). Here the $\theta_j$ have numerical
values and the point $\theta = (\theta_1, \ldots, \theta_s)$ belongs to the Euclidean space $E_s$. We
obtain the expression
(2.5.2)
$$\int \cdots \int_{E_k} \exp\Bigl(\sum_{j=1}^{s} \theta_j T_j(x)\Bigr)\, d\nu(x) < \infty. \tag{2.5.3}$$
Here $\nu(x)$ must be a $\sigma$-finite measure on $\mathfrak{X} = E_k$ that is given in advance and
that is independent of the parameter $\theta$. The space $\Omega$ defined above is called
the natural parameter space. Let us give some theorems of Lehmann [36] on
exponential families under natural parametrization.
Theorem 2.5.1. The natural parameter space $\Omega$ is a convex set.
To prove this, we use Hölder's inequality (see for example [71], pp. 21-26),
which may be written in the form
$$\int_{E_m} a(x)^{\alpha}\, b(x)^{1-\alpha}\, d\mu(x) \le \Bigl(\int_{E_m} a(x)\, d\mu(x)\Bigr)^{\alpha} \Bigl(\int_{E_m} b(x)\, d\mu(x)\Bigr)^{1-\alpha}, \tag{2.5.4}$$
where $a(x)$ and $b(x)$ are nonnegative measurable functions defined on $E_m$, $\mu(x)$ is a $\sigma$-finite
measure, $\alpha \in (0, 1)$, and all the integrals converge.
Let $\theta = (\theta_1, \ldots, \theta_s)$ and $\theta' = (\theta'_1, \ldots, \theta'_s)$ denote two points in the natural
parameter space $\Omega$ and let $\alpha$ denote a number in the interval $(0, 1)$. We need
to show that the point $\alpha\theta + (1 - \alpha)\theta'$ belongs to $\Omega$. In formula (2.5.4) we set
§s. EXPONENTIAL FAMILIES 43
$\mu(x) = \nu(x)$, $a(x) = \exp\bigl(\sum_{j=1}^{s} \theta_j T_j(x)\bigr)$, $b(x) = \exp\bigl(\sum_{j=1}^{s} \theta'_j T_j(x)\bigr)$, and $m = n$.
Then, on the basis of (2.5.4) and (2.5.3), the desired result follows for the
values of the parameters $\theta$ and $\theta'$.
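The inequality behind Theorem 2.5.1 can be checked numerically in a toy case. The following is only a sketch under stated assumptions: the measure is a hypothetical finite set of point masses on the real line, $s = 1$, and $T(x) = x$; the names `POINTS`, `WEIGHTS`, `normalizer`, and `holder_gap` are illustrative and do not come from the text. With a finite measure every real $\theta$ lies in the natural parameter space, so the sketch illustrates the Hölder bound itself rather than a nontrivial boundary.

```python
import math

# Hypothetical discrete measure nu: point masses on a few support points (assumption).
POINTS = [-2.0, -0.5, 0.0, 1.0, 3.0]
WEIGHTS = [0.5, 1.0, 2.0, 1.5, 0.25]


def normalizer(theta):
    """Integral of exp(theta * T(x)) d nu(x) with T(x) = x, cf. (2.5.3)."""
    return sum(w * math.exp(theta * x) for x, w in zip(POINTS, WEIGHTS))


def holder_gap(theta, theta_prime, alpha):
    """Hoelder bound minus the integral at the mixed point; nonnegative if the bound holds."""
    mixed = alpha * theta + (1.0 - alpha) * theta_prime
    bound = normalizer(theta) ** alpha * normalizer(theta_prime) ** (1.0 - alpha)
    return bound - normalizer(mixed)


# Check the bound along the segment joining theta = -1 and theta' = 2.
gaps = [holder_gap(-1.0, 2.0, k / 10.0) for k in range(1, 10)]
```

Each entry of `gaps` should be nonnegative, which is the statement that the integral at $\alpha\theta + (1-\alpha)\theta'$ is finite whenever the integrals at $\theta$ and $\theta'$ are.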
In studying the family $\{p_\theta(x)\}$ in a natural parametrization, we need an
extension into the complex domain.
Let $\xi = (\xi_1, \ldots, \xi_s)$ denote a point in the natural parameter space $\Omega$, so
that the integral in (2.5.3) converges when $\theta_j = \xi_j$ ($j = 1, 2, \ldots, s$). Then this
integral will obviously converge also for complex values $\theta_j = \xi_j + i\tau_j$, where
the $\tau_j$ are arbitrary real numbers. Admittedly, the values of $p_\theta(x)$ given by
formula (2.5.2) will not be positive and they will not in general even be real,
so that we do not obtain extensions of the family of distributions. However,
such a procedure for extending the values of the parameters to the complex domain
will be useful to us on several accounts in what follows. In many cases it
allows us to apply the theory of functions of a complex variable.
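The convergence for complex parameter values rests on the fact that the imaginary parts contribute only a factor of modulus one. Assuming, as in the real case, that the $T_j(x)$ are real-valued, a one-line check is:

```latex
\Bigl|\exp\Bigl(\sum_{j=1}^{s}(\xi_j + i\tau_j)\,T_j(x)\Bigr)\Bigr|
  = \exp\Bigl(\sum_{j=1}^{s}\xi_j T_j(x)\Bigr),
\qquad\text{so that}\qquad
\int \Bigl|\exp\Bigl(\sum_{j=1}^{s}(\xi_j + i\tau_j)\,T_j(x)\Bigr)\Bigr|\,d\nu(x)
  = \int \exp\Bigl(\sum_{j=1}^{s}\xi_j T_j(x)\Bigr)\,d\nu(x) < \infty .
```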
Let $\varphi(x)$ denote a bounded complex-valued measurable function defined on
$E_n$. Consider the integral
Obviously this integral converges at every point $\xi = (\xi_1, \ldots, \xi_s)$ of the natural
parameter space $\Omega$ and also at every point of the cylindrical subset of the complex
space $E^{(2s)}$ consisting of points $\xi_j + i\tau_j$ (for $j = 1, 2, \ldots, s$), where
the $\tau_j$ are arbitrary real numbers.
If the natural space $\Omega$ contains an interior point $(\xi_1^{(0)}, \ldots, \xi_s^{(0)})$, we may
assert that the integral (2.5.5) is an analytic function of the parameters
$\theta_1, \ldots, \theta_s$ at that point. To see this, note that our integral can then be represented
as the limit to which a sequence of finite sums converges uniformly
with respect to the complex parameter $\theta = (\xi_1 + i\tau_1, \ldots, \xi_s + i\tau_s)$, where
$(\xi_1, \ldots, \xi_s)$ lies in a neighborhood of $(\xi_1^{(0)}, \ldots, \xi_s^{(0)})$. By Weierstrass'
theorem on functions of several variables (see for example [5]) we may conclude
that (2.5.5) is analytic at the point $(\xi_1^{(0)}, \ldots, \xi_s^{(0)})$ and at all complex points
with the same abscissa.
The derivatives with respect to the $\theta_j$ can be calculated at these points by
differentiating under the integral sign. This is true because the derivatives of
such analytic functions can be expressed as integrals of the Cauchy type over
small polycylinders around points $\theta$ of the type described, and by virtue of the
absolute and uniform convergence with respect to $\theta$ of the integral (2.5.5) in a
sufficiently small neighborhood of the given point we can transform the integral
with respect to $\theta$ into an integral of the Cauchy type with respect to the
measure $d\nu(x)$.
These considerations can be applied to the derivation of certain useful
formulas (see [36]). From (2.5.2), we have, at interior points of $\Omega$,
(2. 5.7)
Analogously,
$$\operatorname{cov}(T_i(x), T_j(x)) = E(T_i(x)\,T_j(x)) - E(T_i(x))\,E(T_j(x)) = -\frac{\partial^2}{\partial\theta_i\,\partial\theta_j} \ln C(\theta). \tag{2.5.8}$$
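A small numerical check of formulas of this type may be helpful. The sketch below assumes the normalization $p_\theta(x) = C(\theta)\exp\bigl(\sum_j \theta_j T_j(x)\bigr)$ with respect to $\nu$, so that $1/C(\theta)$ is the integral in (2.5.3); it further assumes a hypothetical finite measure with $T_1(x) = x$ and $T_2(x) = x^2$. All names are illustrative. The mean of $T_1$ and the covariance of $T_1, T_2$ are computed directly and compared with finite-difference derivatives of $\ln C(\theta)$.

```python
import math

# Hypothetical nu: point masses on a finite support (assumption for illustration).
POINTS = [-1.0, 0.0, 1.0, 2.0]
WEIGHTS = [1.0, 2.0, 1.0, 0.5]


def log_C(t1, t2):
    """ln C(theta) for p_theta(x) = C(theta) * exp(t1*x + t2*x^2) with respect to nu."""
    total = sum(w * math.exp(t1 * x + t2 * x * x) for x, w in zip(POINTS, WEIGHTS))
    return -math.log(total)


def exact_moments(t1, t2):
    """Return E(T1), E(T2), cov(T1, T2) computed directly from the probabilities."""
    c = math.exp(log_C(t1, t2))
    probs = [c * w * math.exp(t1 * x + t2 * x * x) for x, w in zip(POINTS, WEIGHTS)]
    e1 = sum(p * x for p, x in zip(probs, POINTS))
    e2 = sum(p * x * x for p, x in zip(probs, POINTS))
    e12 = sum(p * x ** 3 for p, x in zip(probs, POINTS))
    return e1, e2, e12 - e1 * e2


def d_log_C_dt1(t1, t2, h=1e-5):
    """Central difference for the first derivative of ln C with respect to theta_1."""
    return (log_C(t1 + h, t2) - log_C(t1 - h, t2)) / (2.0 * h)


def d2_log_C_dt1dt2(t1, t2, h=1e-4):
    """Central difference for the mixed second derivative of ln C."""
    return (log_C(t1 + h, t2 + h) - log_C(t1 + h, t2 - h)
            - log_C(t1 - h, t2 + h) + log_C(t1 - h, t2 - h)) / (4.0 * h * h)


e1, e2, cov12 = exact_moments(0.3, -0.2)
```

Up to discretization error, `e1` should agree with `-d_log_C_dt1(0.3, -0.2)` and `cov12` with `-d2_log_C_dt1dt2(0.3, -0.2)`.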
$$E_\theta \varphi_+(x) \ge f(\theta); \qquad E_\theta \varphi_-(x) \le f(\theta), \tag{2.6.2}$$
so that an estimate which is both upper and lower is unbiased, and conversely.
We shall study below the behavior of unbiased estimates in certain
classes of exponential families. At the moment, however, what we need to do
is establish connections between the theory of unbiased estimates and the
theory of sufficient statistics.
Let $\varphi(x)$ denote an unbiased estimate for $f(\theta)$ and let $\chi(x)$ denote a
sufficient statistic for $\theta$ with an arbitrary set of values. Consider the conditional
mathematical expectations $E_\theta[\varphi(x) \mid \chi(x)]$. Since $\chi(x)$ is a sufficient
statistic, this expression is independent of $\theta$, so we may write $E_\theta[\varphi(x) \mid \chi(x)] = E_\chi(\varphi)$.
The function $E_\chi(\varphi)$ is a function only of $\chi(x)$ for given $\varphi(x)$.
Since $E(E_\chi(\varphi)) = E_\theta \varphi = f(\theta)$, we see that the function $E_\chi(\varphi)$ is an unbiased
estimate of $f(\theta)$ (see Rao [61], Blackwell [4], and Kolmogorov [32]). We also
have the following theorem ([16], [17], [18]).
(2. 6.3)
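Proof. We have
$$D_\theta \varphi = E_\theta(\varphi - f(\theta))^2 = E_\theta\bigl(\varphi - E_\chi(\varphi) + E_\chi(\varphi) - f(\theta)\bigr)^2$$
$$= E_\theta(\varphi - E_\chi(\varphi))^2 + E\bigl(E_\chi(\varphi) - f(\theta)\bigr)^2 + 2E\bigl[(\varphi - E_\chi(\varphi))(E_\chi(\varphi) - f(\theta))\bigr].$$
Furthermore,
$$g\bigl(E(\varphi - f(\theta) \mid \chi)\bigr) = g\bigl(E_\chi(\varphi) - f(\theta)\bigr) \le E\bigl(g(\varphi - f(\theta)) \mid \chi\bigr),$$
where we assume the existence of the corresponding integrals.
Again taking the mathematical expectations, we obtain
Theorem 2.6.2.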
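The effect of replacing an unbiased estimate $\varphi$ by its conditional expectation $E_\chi(\varphi)$ can be seen exactly in a small case. The following sketch is an illustration under assumptions not taken from the text: $n$ independent Bernoulli observations with success probability $p$, the estimate $\varphi(x) = x_1$ of $f(p) = p$, and the sufficient statistic $\chi(x) = x_1 + \cdots + x_n$. The conditional expectation is computed by enumeration rather than from a closed form.

```python
from itertools import product


def rao_blackwell_demo(n, p):
    """Return (mean, variance) of phi and of its projection onto chi, by exact enumeration."""
    outcomes = list(product([0, 1], repeat=n))
    prob = {x: p ** sum(x) * (1.0 - p) ** (n - sum(x)) for x in outcomes}

    # Conditional expectation of phi(x) = x[0] given chi(x) = sum(x) = t.
    cond = {}
    for t in range(n + 1):
        group = [x for x in outcomes if sum(x) == t]
        mass = sum(prob[x] for x in group)
        cond[t] = sum(prob[x] * x[0] for x in group) / mass

    mean_phi = sum(prob[x] * x[0] for x in outcomes)
    var_phi = sum(prob[x] * (x[0] - mean_phi) ** 2 for x in outcomes)
    mean_proj = sum(prob[x] * cond[sum(x)] for x in outcomes)
    var_proj = sum(prob[x] * (cond[sum(x)] - mean_proj) ** 2 for x in outcomes)
    return (mean_phi, var_phi), (mean_proj, var_proj)


(mean_phi, var_phi), (mean_proj, var_proj) = rao_blackwell_demo(4, 0.3)
```

In this case both estimates have mean $p$, while the variance drops from $p(1-p)$ to $p(1-p)/n$, consistent with the vanishing of the cross term in the decomposition above.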
Example 2. Suppose that the distribution is normal, so that $x_i \in N(a, \sigma_0^2)$,
where $\sigma_0$ is known. Then we may consider the integral
over an arbitrary Borel set $A$ on the real line. Suppose, for example, that
$x_i$ ($i = 1, \ldots, n$) denotes the dimensions of an arbitrary manufactured object and
that $A$ constitutes a zone lying outside the tolerated range $[a_0 - 3\sigma_0, a_0 + 3\sigma_0]$.
Then $f_A(a)$ denotes the proportion of rejects for a general mean $a$. An unbiased
estimate of $f_A(a)$ is the statistic (see [32])
-
<l'A (x) = 1
,r-
y 2na0
Jexp [-
A
(x - x)2 ] dx.
2a0
2
EzP = 1 J oo n 3
y-2+1> -2
x
_ p
r {~
2
+ P)
n .!.=.!.. (n-l) x e dx-2 (n-l
2 2 r_2_0 •
r--
2
)
Thus for arbitrary positive $p$ the statistic
$$\Bigl(\frac{n}{2}\Bigr)^{p}\, \frac{\Gamma\bigl(\frac{n-1}{2}\bigr)}{\Gamma\bigl(\frac{n-1}{2} + p\bigr)}\, s^{2p}$$
is an unbiased estimate for the parameter $\sigma^{2p}$.
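The unbiasedness of this statistic can be checked numerically. The sketch below assumes that $n s^2/\sigma^2$ has the $\chi^2$ distribution with $n - 1$ degrees of freedom (that is, $s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ for a normal sample); this normalization is inferred from the factor $(n/2)^p$ and is an assumption here. The moment of the $\chi^2$ law is obtained by a plain midpoint rule rather than from the closed-form Gamma expression, so the check is not circular. Function names are illustrative.

```python
import math


def chi2_moment(k, p, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of E[(chi^2_k)^p] from the chi-square density."""
    h = upper / steps
    norm = 2.0 ** (k / 2.0) * math.gamma(k / 2.0)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (p + k / 2.0 - 1.0) * math.exp(-x / 2.0)
    return total * h / norm


def expected_estimate(n, p, sigma):
    """Expectation of (n/2)^p * Gamma((n-1)/2) / Gamma((n-1)/2 + p) * s^(2p)."""
    k = n - 1
    factor = (n / 2.0) ** p * math.gamma(k / 2.0) / math.gamma(k / 2.0 + p)
    # Under the stated assumption, s^2 is distributed like sigma^2 * chi^2_k / n.
    return factor * (sigma ** 2 / n) ** p * chi2_moment(k, p)


value = expected_estimate(n=5, p=1.5, sigma=2.0)
```

For these inputs `value` should be close to $\sigma^{2p} = 2^{3} = 8$.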
1r
n. ~ OT(X1, .. . , Xn)=T,
~
0
§ 1. NUISANCE PARAMETERS
Suppose that on a measurable space $(\mathfrak{X}, \mathfrak{A})$ there is defined a family of measures
$\{P_\theta\}$, for $\theta \in \Omega$, that are dominated by a $\sigma$-finite measure $\mu$ with respect
to which there is a probability density $p_\theta(x)$. We shall assume that the values of
the parameter $\theta = (\theta_1, \ldots, \theta_s)$ lie in a Borel set $\Pi$ (usually a parallelepiped)
contained in the Euclidean space $E_s$. Furthermore, we shall assume that for
every value of $x$ the density $p_\theta(x) = p(x; \theta_1, \ldots, \theta_s)$ is a sufficiently smooth
function of the parameters $\theta_1, \ldots, \theta_s$. The degree of smoothness required will
vary from problem to problem and will be specified in the individual cases.
Instead of the given parameters $\theta_1, \ldots, \theta_s$, we may introduce other parameters
$\gamma_1(\theta_1, \ldots, \theta_s), \ldots, \gamma_s(\theta_1, \ldots, \theta_s)$. For suitable choice of these functions
$\gamma_1, \ldots, \gamma_s$, the property of smoothness of $p_\theta(x)$ for given $x$ is retained for
the new parameters $\gamma_1, \ldots, \gamma_s$. Let us now suppose that we subject a hypothesis
$H_0$ regarding the parameters $\theta_1, \ldots, \theta_q$ (where $q < s$) to statistical verification.
The hypothesis may be of the form $(\theta_1, \ldots, \theta_q) \in \omega$, where $\omega$ is a subset of the
space of the parameters $(\theta_1, \ldots, \theta_q)$. We make no hypothesis of any sort regarding
the remaining parameters $\theta_{q+1}, \ldots, \theta_s$. However, since these parameters
appear in the expression for the probability density $p(x; \theta_1, \ldots, \theta_s)$, they may
play a part in all calculations of the distributions of the statistics that we may
make in verifying the hypothesis $H_0$; for this reason we have to keep them in mind.
Following H. Hotelling, we call such parameters $\theta_{q+1}, \ldots, \theta_s$ nuisance parameters
of the given problem.
Frequently, the hypothesis $H_0$ deals with the behavior of certain functions
$\gamma_1(\theta_1, \ldots, \theta_s), \ldots, \gamma_q(\theta_1, \ldots, \theta_s)$ ($q < s$) of the given parameters and is of the
form $(\gamma_1, \ldots, \gamma_q) \in \omega_q$, where $\omega_q$ is a subset of $E_q$. Let us assume that the
50 III. INVARIANT POWER FUNCTIONS
$\sigma_1^2/\sigma_2^2 = \gamma_1$. Here the nuisance parameters are, for example, $a_1$, $a_2$, and $\sigma_2$.
"play-off" with two outcomes: H 0 with probability I - '<ll(x) and H 1 with proba-
bility ·<Ii(x), and we should accept or reject H0 in accordance with these outcanes.
We know that, even in the simplest case of testing a simple hypothesis $H_0$
against a simple alternative $H_1$, to obtain the most powerful test we need to
introduce randomization (see for example [36]).
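The need for randomization is easiest to see for a discrete observation, where an unrandomized critical zone can attain only a few levels. The following sketch uses a standard binomial illustration that is not taken from the text: the test function $\Phi$ equals 1 above a cutoff `c`, equals `gamma` at the cutoff, and 0 below it, with `gamma` chosen so that the level is exactly $\alpha$. The function names and the numerical values are assumptions for illustration.

```python
from math import comb


def binomial_pmf(n, p):
    """Probabilities P(X = k), k = 0..n, for a binomial(n, p) observation."""
    return [comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def randomized_binomial_test(n, p0, alpha):
    """Return (c, gamma): reject H0 if X > c, reject with probability gamma if X == c."""
    pmf = binomial_pmf(n, p0)
    tail = 0.0  # probability of the part of the critical zone lying strictly above c
    for c in range(n, -1, -1):
        if tail + pmf[c] > alpha:
            return c, (alpha - tail) / pmf[c]
        tail += pmf[c]
    return -1, 0.0  # alpha >= 1: every outcome is rejected


def rejection_probability(n, p, c, gamma):
    """E_p Phi(X) for the test defined by (c, gamma)."""
    pmf = binomial_pmf(n, p)
    boundary = pmf[c] if c >= 0 else 0.0
    return sum(pmf[k] for k in range(c + 1, n + 1)) + gamma * boundary


c, gamma = randomized_binomial_test(n=5, p0=0.5, alpha=0.05)
level = rejection_probability(5, 0.5, c, gamma)
```

Here the unrandomized zones $\{X = 5\}$ and $\{X \ge 4\}$ have levels $1/32$ and $6/32$, so the level $0.05$ is reachable only by randomizing at $X = 4$.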
Let us denote by $E_\theta \Phi(x)$ the mathematical expectation of $\Phi(x)$ for given
$\theta = (\theta_1, \ldots, \theta_s)$. Those values $(\theta_1, \ldots, \theta_s)$ for which the hypothesis
$H_0$: $\theta_1 = \theta_1^{(0)}, \ldots, \theta_q = \theta_q^{(0)}$ is satisfied correspond to the value of the level of the test
$\Phi(x)$:
(3.2.1)
gives the power of the test $\Phi$. We shall refer to both the values of (3.2.1) and
the values of (3.2.2) as the values of the power function $\varphi(\theta_1, \ldots, \theta_s)$ of the test
$\Phi(x)$, keeping in mind their different statistical connotations depending on whether
they correspond to parameter values under $H_0$ or under $H_1$. Thus we
have
(3.2.3)
To a significant degree, the power function $\varphi$ characterizes the properties of a
test, although there are other characteristics that are important for many measurable
spaces, for example those spaces encountered in sequential analysis.
To calculate the power function $\varphi(\theta_1, \ldots, \theta_s)$ we need to know the values
of the parameters. If we wish to determine the behavior of a test for simple particular
cases $H_0$: $(\theta_1^{(0)}, \ldots, \theta_q^{(0)}, \theta'_{q+1}, \ldots, \theta'_s)$ and the alternative
$H_1$: $(\theta_1^{(1)}, \ldots, \theta_q^{(1)}, \theta''_{q+1}, \ldots, \theta''_s)$, then we need to know not only the values of the
"basic" parameters $\theta_1, \ldots, \theta_q$ but also the values of the nuisance parameters
$\theta_{q+1}, \ldots, \theta_s$. The question naturally arises: is it possible to construct tests
for which the power function $\varphi(\theta_1, \ldots, \theta_s)$ is independent of the nuisance parameters
$\theta_{q+1}, \ldots, \theta_s$ and is determined only by the values of the "basic" parameters
$\theta_1, \ldots, \theta_q$ with which the hypotheses $H_0$ and $H_1$ deal? We shall call
tests of such a kind tests with invariant power function (with respect to the nuisance
parameters).
There exist trivial examples of such tests: the test $\Phi \equiv \alpha$, where $\alpha$ is a
§ 3. SOME RESULTS 53
constant belonging to the interval $(0, 1)$, is obviously a test with invariant power
function. However, one can easily see that this test is completely useless for
verifying the hypothesis $H_0$.
In general, if a test $\Phi$ has an invariant power function $\varphi(\theta_1, \ldots, \theta_s)$ that is
independent not only of the nuisance parameters $\theta_{q+1}, \ldots, \theta_s$ but also of the
basic parameters $\theta_1, \ldots, \theta_q$, that test will be useless for verifying $H_0$. In what
follows we shall be concerned with tests $\Phi(x)$ with invariant power function that
are not useless. Such tests might be of particular interest if, for a sufficiently
broad class of problems, we could obtain sufficiently broad classes of such tests
and examine their properties, seeking tests that are optimum in some respect or
other. Their advantage over other tests would consist in the fact that these other
(noninvariant) tests do not in general admit calculation of the power function
$\varphi(\theta_1, \ldots, \theta_s)$ for particular cases of $H_0$ and $H_1$ and hence their qualities are to
a considerable degree unclear.
At the present time, however, we have extremely few results in this direction,
and the results that we do have deal primarily with particular cases of testing
hypotheses (of the type of Examples 1-5 in § 1). Since these results are interesting,
however, we shall present them in the following section.
2.6.1 and 2.6.2 enable us to assume that the properties of the new test $\Phi_1(\bar{x}, s^2)$,
which depends only on sufficient statistics, will be no worse in several respects
than the properties of the original test. In all cases, for arbitrary values of $a$
and $\sigma$,
(3.3.2)
that is, the power function of the test $\Phi(x)$ coincides with the power function of
the test $\Phi_1(\bar{x}, s^2)$, so that investigation of the tests with invariant power function
can be restricted to the class of tests that depend only on the
sufficient statistics.
In what follows we shall frequently use an operation of the type (3.3.1), which
we shall call the operation of projection onto the $\sigma$-algebra of sufficient statistics.
For now, let us consider the question of verifying the hypothesis $H_0$: $\gamma(a, \sigma) = \gamma_0$
by using tests with invariant power function. It turns out that the existence of
such tests that are not useless depends in a very real way on the form of the
function $\gamma(a, \sigma)$. For example, let us take $\gamma(a, \sigma) = \sigma$, so that the hypothesis $H_0$ is
of the form $\sigma = \sigma_0$. The unrandomized test $\Phi$ with critical zone $s^2 > C$ (for any
$C > 0$) will obviously possess a power function that is independent of $a$ but
dependent on $\sigma$, so that it has an invariant power function but still is not useless.
A similar situation is true of a test with critical zone $f(x_i - x_j) > C$, where $f$ is
an arbitrary continuous function of the differences in observations. Its power
function is independent of $a$ but (in general) dependent on $\sigma$. We can express
this situation very concisely by saying that the hypothesis $H_0$: $\sigma = \sigma_0$ admits an
invariant verification. However, as we shall see later, the hypothesis $H_0$: $a = a_0$
does not have an invariant verification. We can prove this and a more general
theorem.
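The reason the power function of such tests cannot depend on $a$ is that a statistic built from differences of observations does not change when every observation is shifted by the same constant, whereas it does change under a change of scale. A minimal sketch, with hypothetical data and with $s^2$ taken as the mean squared deviation (the normalization is an assumption and does not affect the point):

```python
def sample_variance(xs):
    """Mean squared deviation from the sample mean; a function of the differences x_i - x_j."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def max_abs_difference(xs):
    """Another statistic of the form f(x_i - x_j): the largest absolute pairwise difference."""
    return max(abs(x - y) for x in xs for y in xs)


sample = [0.3, -1.2, 2.5, 0.9, -0.4]           # hypothetical observations
shifted = [x + 7.5 for x in sample]            # same sample with the mean a moved by 7.5
scaled = [3.0 * x for x in sample]             # same sample with sigma multiplied by 3

shift_change = abs(sample_variance(shifted) - sample_variance(sample))
scale_ratio = sample_variance(scaled) / sample_variance(sample)
```

The shift leaves both statistics unchanged up to rounding, while the change of scale multiplies $s^2$ by 9, which is why the critical zone $s^2 > C$ has a power function depending on $\sigma$ alone.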
Theorem 3.3.1. The hypothesis $H_0$: $a/\sigma^p = \gamma_0$
does not admit an invariant verification for $p < 1$.
To prove this theorem we need only consider randomized and unrandomized
tests that depend only on sufficient statistics. By virtue of (3.3.1) and (3.3.2),
the remaining tests reduce to these. Let us set $\bar{x} = X$ and $s^2 = V$. As we know,
$\bar{x}$ and $s^2$ are stochastically independent (for phenomena of this sort see Chapter
IV).
Furthermore (see for example [33]) the statistic $X$ has probability density
$$p_1(X) = \Bigl(\frac{n}{2\pi\sigma^2}\Bigr)^{1/2} \exp\Bigl[-\frac{n}{2\sigma^2}(X - a)^2\Bigr]. \tag{3.3.3}$$
The statistic $V$ has probability density
nn/2
r.!l.e en =---n-_...,.l____ (3.3.5)
V2n 2-2-r ( n 2 1 )
f J
oo oo n-3
f f
oo
dx
oo n-3
dv<I> 1 (x, v)v-2-exp[-A.(v+x2)+µxl
-oo 0
=
If _!!.._ (µ2)
CnA. 2 exp 41.. cp (A., µ). (3.3.8)
Since $\Phi_1(x, v)$ is a Lebesgue-measurable function such that $0 \le \Phi_1(x, v) \le 1$, we
see from (3.3.7) that $\varphi(\lambda, \mu)$ is an analytic function of $\lambda$ and $\mu$ for $\lambda > 0$,
$-\infty < \mu < \infty$.
The hypothesis $H_0$: $a/\sigma^p = \gamma_0$
(3.3.10)
for $\lambda > 0$ and $-\infty < \mu < \infty$. Here the function $\psi = \psi(t)$ must not be a constant if
the test is to be of any use. Because of the analyticity of $\varphi(\lambda, \mu)$ in the region
defined above, the function $\psi(t)$ is differentiable for $t > 0$. Let us make use of
this fact. If $\psi(t)$ is not a constant, there exists a point $t = t_0$ where $\psi'(t_0) \ne 0$.
In equation (3.3.8) we set $\varphi(\lambda, \mu) = \psi(\mu/\lambda^{1-p/2})$ and differentiate both sides
with respect to $\mu$. Since $\lambda > 0$, we can differentiate under the integral signs on
the left-hand side of the equation. We obtain
Now, let us set $\mu = t_0 \lambda^{1-p/2}$ and consider the behavior of the two sides of equation
(3.3.11) for small positive values of $\lambda$. Let us find an upper bound for the
left-hand side of that equation by replacing the factor $x$ with $|x|$ and the function
$\Phi_1(x, v)$ with its maximum value, namely 1. For the left-hand side of equation
(3.3.11) we obtain the upper bound
f f
0) 3 0)
f ( )dx=O(exp-{-(1n !)2).
00
A,
Since $p < 1$, we have $p/2 - 1 < -1/2$. Since $\psi'(t_0) \ne 0$, we obtain a contradiction
by letting $\lambda$ approach 0 from above; thus the proof of Theorem 3.3.1 is complete.
In particular, we see that the hypothesis $H_0$: $a = a_0$ does not admit an invariant
verification.
We note now that for $p = 1$ an invariant verification of the hypothesis $H_0$:
$\gamma = a/\sigma = \gamma_0$ is possible. To carry out this verification we may take, for example,
the statistic = + ••. + x~, where
u x;Jxi
X=xVn+l ,Xi=(I- n~it'''(xi-x)(j=l,2, .. .,n).
The distribution of this statistic depends only on $a/\sigma$, so that the unrandomized
test with critical zone $|U| \ge C$ will have an invariant power function.
A result analogous to the last one can be formulated for the general case of
a linear hypothesis (Example 1 of § 1). A very simple case of a more general
hypothesis admitting inequality of the variances of the elements of the sample is
the Behrens-Fisher problem (Example 4 of § 1). In the notation of that example,
suppose that we are verifying the hypothesis $H_0$: $a_1 - a_2 = 0$. We shall take
$a_1$, $\sigma_1$, and $\sigma_2$ as the nuisance parameters.
Theorem 3.3.2. The hypothesis of equality of means in the Behrens-Fisher
problem does not admit an invariant verification.
Proof. Corresponding to the four parameters $a_1$, $a_2$, $\sigma_1$, and $\sigma_2$ in the
problem are the four sufficient statistics $X_1 = \bar{x}$, $X_2 = \bar{y}$, $V_1 = s_1^2$, and $V_2 = s_2^2$.
Remembering the laws of distribution of the form (3.3.3) and (3.3.4) for two samples
of sizes $n_1$ and $n_2$, we obtain an expression for the joint probability density
(3.3.14)
Just as in the preceding derivation, we introduce the natural parameters
$$\frac{n_1}{2\sigma_1^2} = \lambda_1, \qquad \frac{n_2}{2\sigma_2^2} = \lambda_2, \qquad \frac{n_1 a_1}{\sigma_1^2} = \mu_1, \qquad \frac{n_2 a_2}{\sigma_2^2} = \mu_2.$$
From these equations we get $2a_1 = \mu_1/\lambda_1$ and $2a_2 = \mu_2/\lambda_2$, so that the null
hypothesis takes the form $H_0$: $\mu_1/\lambda_1 - \mu_2/\lambda_2 = 0$. The more general hypothesis
$H_0$: $a_1 - a_2 = \delta$, where $\delta$ is given, reduces to this one when we substitute, for
example, $x_i + \delta/2$ for $x_i$ (where $i = 1, 2, \ldots, n_1$) and $y_j - \delta/2$ for $y_j$ (where
$j = 1, 2, \ldots, n_2$). Thus, as in the preceding derivation, the question of an invariant
verification of $H_0$ reduces to the question of existence of the equation
"°
s f f s
-co
dx 1
~
-co
dX2
0
co
dV1
0
co n 1 -3
dV2'U1-2-V2_2_
n,-3
(3.3.17)
as $\lambda_1 \downarrow 0$ and $\lambda_2 \downarrow 0$. If $\psi''(t_0) \ne 0$, we obtain a contradiction between (3.3.16)
and (3.3.17). Thus $\psi''(t) \equiv 0$ and $\psi(t) = \alpha + \beta t$, where $\alpha$ and $\beta$ are constants.
Furthermore, the argument $t = \mu_1/\lambda_1 - \mu_2/\lambda_2$ can assume arbitrary values and
$\psi(t)$, as power function, satisfies the condition $0 \le \psi(t) \le 1$. Therefore $\beta = 0$
and $\psi(t) = \alpha$, so that the test is useless. This completes the proof of the theorem.
However, the methods that we have expounded do not enable us to answer
the general question of which hypotheses of the form $H_0$: $\gamma(a, \sigma) = \gamma_0$ admit an
invariant verification in a problem of the type of Student's problem, to say nothing
of more general problems. To solve such problems, we need finer analytical
methods.
$$s^2 = \frac{1}{n_0 - 1}\Bigl[\sum_{i=1}^{n_0} x_i^2 - \frac{1}{n_0}\Bigl(\sum_{i=1}^{n_0} x_i\Bigr)^2\Bigr] \tag{3.4.1}$$
where $z > 0$ is a preassigned constant and $[q]$ denotes the greatest integer not
exceeding $q$. If we know the number $s^2$, we can determine the real numbers $b_i$
($i = 1, \ldots, n$) such that
$$\sum_{i=1}^{n} b_i = 1; \qquad b_1 = b_2 = \cdots = b_{n_0}; \qquad s^2 \sum_{i=1}^{n} b_i^2 = z. \tag{3.4.3}$$
by the independence of $\bar{x}$ and $s^2$ in a normal sample. The expression
$\sum_{i=n_0+1}^{n} b_i(x_i - a)$ is stochastically independent of $s$ and of $b_1 \sum_{i=1}^{n_0}(x_i - a)$ since it
contains only the observations $x_i$ with $i > n_0$, whereas the preceding expressions
contain only the observations $x_i$ with $i \le n_0$. Thus for given $s$ and given constants
$b_i$, the numerator of the fraction (3.4.5) is distributed independently of $s$
according to a normal law $N(0, \sigma^2 \sum_{i=1}^{n} b_i^2)$. Furthermore, the quantity $(n_0 - 1)s^2/\sigma^2$
has the distribution $\chi^2_{n_0-1}$, so that $s$ is distributed like $\sigma\sqrt{\chi^2_{n_0-1}/(n_0 - 1)}$.
Thus the fraction $u$ is distributed like $X/\sqrt{\chi^2_{n_0-1}/(n_0 - 1)}$, where $X \in N(0, 1)$ is
independent of $\chi^2_{n_0-1}$. Therefore $u = t_{n_0-1}$ is distributed according to Student's
law with $n_0 - 1$ degrees of freedom. The statistic $u$ can be used to construct
an unrandomized test of the hypothesis $H_0$: $a = a_0$ with power function independent
of $\sigma$. Let $\alpha$ denote the desired level (the probability of falling into the critical
zone under $H_0$). Let $\xi_{\alpha/2}$ denote the abscissa at which
$$P\{t_{n_0-1} > \xi_{\alpha/2}\} = \frac{\alpha}{2}. \tag{3.4.6}$$
§ 4. STEIN'S TEST 61
Here $z$ is a constant chosen in advance, and therefore the power function $\varphi(a)$
depends only on $a$ and is independent of $\sigma$.
We note also that a Student's distribution (for $t_{n_0-1}$) is even and unimodal
and that it has a single vertex at the coordinate origin. For given $a_0$ and $\xi_{\alpha/2}$
the probability (3.4.8) attains a minimum at $a = a_0$ and its value at $a_0$ is equal
to the level $\alpha$. For $a \ne a_0$ it is higher than that level, so that the probability
of rejecting $H_0$ when it is not valid is greater than when it is valid, i.e. this test
is unbiased.
To test $H_0$: $a = a_0$ against a one-sided alternative $a > a_0$, we may follow a
similar procedure. The critical zone of the level $\alpha$ can be defined as
$$\frac{\sum_{i=1}^{n} b_i x_i - a_0}{\sqrt{z}} > \xi_\alpha \tag{3.4.9}$$
with power function
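$$n = \max\Bigl\{\Bigl[\frac{s^2}{z}\Bigr] + 1,\; n_0 + 1\Bigr\}.$$
Thus
$$P\{n = n_0 + 1\} = P\Bigl\{\frac{s^2}{z} < n_0 + 1\Bigr\}$$
and, for $\nu > n_0 + 1$,
$$P\{n = \nu\} = P\Bigl\{\nu \le \frac{s^2}{z} + 1 < \nu + 1\Bigr\}
= P\Bigl\{\frac{(\nu - 1)(n_0 - 1)z}{\sigma^2} \le \chi^2_{n_0-1} < \frac{\nu(n_0 - 1)z}{\sigma^2}\Bigr\}, \tag{3.4.11}$$
and the necessary values of the probabilities $P\{n = \nu\}$ can be found from a table
of the $\chi^2$ distribution.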
From this result, we can find the mathematical expectation
$E(n)$ of the necessary number of observations. We note that these last numbers
depend on the nuisance parameter $\sigma$. It should be noted that our wish to make the
power function strictly invariant with respect to the nuisance parameter can lead
to a certain loss of information. Therefore it is expedient to use certain variations
of this test that ensure weak dependence of the power function on the nuisance
parameter but that raise the lower limit of the power (cf. Stein [67]). We
shall not pursue these considerations since the primary aim of the present monograph
is to give as complete a description as possible of certain forms of invariant
tests, so as to make it possible to form an opinion of their qualities in comparison
with other tests and on the loss of information in comparison with those situations
in which the values of the nuisance parameters are known. For further information
on tests of the Stein type as applied to various hypotheses associated with a normal
law see the article by Chatterjee [78].
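The two-stage rule of this section can be sketched in a few lines. The code below is only an illustration under assumptions: the first-stage data and the constant $z$ are hypothetical, and among the many weight vectors satisfying (3.4.3) it picks the one in which the last $n - n_0$ weights are also equal to one another, which reduces (3.4.3) to a quadratic equation for the common first-stage weight. That particular choice, and all function names, are not taken from the text.

```python
import math


def first_stage_variance(xs):
    """s^2 as in (3.4.1): computed from the first n0 observations only."""
    n0 = len(xs)
    total = sum(xs)
    return (sum(x * x for x in xs) - total * total / n0) / (n0 - 1)


def total_sample_size(s2, n0, z):
    """n = max([s^2/z] + 1, n0 + 1), where [q] is the greatest integer not exceeding q."""
    return max(math.floor(s2 / z) + 1, n0 + 1)


def stein_weights(s2, n0, n, z):
    """Weights with b_1 = ... = b_n0 = b and (assumed) equal remaining weights c,
    chosen so that sum(b_i) = 1 and s2 * sum(b_i^2) = z, cf. (3.4.3)."""
    m = n - n0
    q = z / s2
    # b solves n0*n*b^2 - 2*n0*b + (1 - q*m) = 0; the discriminant is
    # nonnegative because n > s2/z by construction.
    disc = max(n0 * m * (n * q - 1.0), 0.0)
    b = (n0 + math.sqrt(disc)) / (n0 * n)
    c = (1.0 - n0 * b) / m
    return [b] * n0 + [c] * m


def stein_statistic(xs, weights, a0, z):
    """u = (sum b_i x_i - a0) / sqrt(z) for the full sample xs of size n."""
    return (sum(b * x for b, x in zip(weights, xs)) - a0) / math.sqrt(z)


first_stage = [1.0, 2.0, 4.0, 5.0]   # hypothetical first-stage observations
z = 0.5                              # hypothetical preassigned constant
s2 = first_stage_variance(first_stage)
n = total_sample_size(s2, len(first_stage), z)
weights = stein_weights(s2, len(first_stage), n, z)
```

Once the remaining $n - n_0$ observations have been collected, `stein_statistic` gives the quantity to be compared with the Student quantile for $n_0 - 1$ degrees of freedom.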
CHAPTER IV
SIMILAR TESTS AND STATISTICS
$$E_\theta \Phi(x) = \alpha \tag{4.1.2}$$
for all values of $\theta$ in a set $\Omega_0 \subset \Omega$. Such tests are said to be similar with respect
to the family of distributions $\{P_\theta\}$ (for $\theta \in \Omega_0$) or, more briefly, with respect to
the parameter set $\Omega_0$. The concept of similarity was first introduced by Neyman
[56]. Let us pause to look at the properties of such tests for the scheme of families
of distributions with nuisance parameters (cf. § 1 of the preceding chapter).
Suppose that we are testing the null hypothesis $H_0$: $\gamma_1(\theta_1, \ldots, \theta_s) = 0, \ldots,
\gamma_q(\theta_1, \ldots, \theta_s) = 0$, where $q < s$ and $\gamma_1, \ldots, \gamma_q$ are sufficiently smooth
functions, for a family of densities $\{p(x; \theta_1, \ldots, \theta_s)\}$. Suppose that the $s$ smooth
functions $\gamma_1(\theta_1, \ldots, \theta_s), \ldots, \gamma_q(\theta_1, \ldots, \theta_s), \gamma_{q+1}(\theta_1, \ldots, \theta_s), \ldots, \gamma_s(\theta_1, \ldots, \theta_s)$
constitute a new parametrization and that $\gamma_{q+1}, \ldots, \gamma_s$ are the nuisance parameters.
If we were to succeed in constructing a test $\Phi$ with invariant power function
for testing $H_0$, its power function $\varphi_\Phi(\gamma_1, \ldots, \gamma_q)$ would depend only on the
parameters $\gamma_1, \ldots, \gamma_q$ and would not contain nuisance parameters. However, we
know from the preceding chapter that, in the simplest classical cases of testing
a linear hypothesis and its generalizations, such tests do not exist for repeated
samples of constant size. Therefore it is natural to weaken the requirement of
invariance of the power function with respect to the nuisance parameters. We can
require that the power function $\varphi_\Phi$ be independent of the nuisance parameters at
least when the null hypothesis $H_0$: $\gamma_1 = 0, \ldots, \gamma_q = 0$ is realized. Then the
values of $\varphi_\Phi$ will constitute a level $\alpha$ (the probability of rejecting $H_0$ when it
is valid), and we arrive at the similarity condition
$$\varphi_\Phi(\theta) = E_\theta \Phi(x) = \alpha, \tag{4.1.3}$$
provided that
(4.1.4)
The points $(\theta_1, \ldots, \theta_s)$ of the set (4.1.4) are limiting values of points that
violate these relations. Therefore if $\varphi_\Phi(\theta)$ is a continuous function of $\theta$, then,
in accordance with what was said above, the similarity condition (4.1.3) of a
test with hypothesis $H_0$ follows from the requirement that the test be unbiased.
The general definition of similarity of a test for $\theta \in \Omega_0 \subset \Omega$ can be regarded
as a special case of unbiasedness of statistics (cf. Chapter II, § 6). Specifically,
in the present case, we are considering tests, i.e. statistics $\Phi(x)$ satisfying the
inequalities $0 \le \Phi(x) \le 1$ for $\theta \in \Omega_0 \subset \Omega$. Here the functional $f(\theta)$ (cf. (2.6.1))
is equal to the constant $\alpha$. From an analytical point of view, the condition
$0 \le \Phi(x) \le 1$ is equivalent to consideration of bounded statistics $\xi(x)$: if $|\xi(x)| < K$,
§1. SIMILARITY 65
where $K$ is a constant, then the linear transformation $\xi(x) \to 1/2 + \xi(x)/2K$
makes the statistic $\xi(x)$ a test. In what follows, some of the results that we obtain
for similar tests will be carried over to a certain degree to unbiased estimates.
If we require, in addition to the conditions for similarity of a test (4.1.3) and
(4.1.4), that the test $\Phi$ with critical zone $Z$ be unrandomized, then conditions
(4.1.3) and (4.1.4) take the form
$q < s$. (4.1.5)
Thus the conditional probability of falling in the critical zone $Z$ under $H_0$ must
be a constant and equal to the level $\alpha$ for each simple particular case of $H_0$. Study
of similar unrandomized tests is quite difficult. The zone $Z$ of a test of the form
indicated is called the similarity zone. In the general case, in analogy with
definition (4.1.2) we shall call the scalar statistic $T(x)$ similar if, for arbitrary
$\theta \in \Omega_0 \subset \Omega$ and an arbitrary value of $\xi$, the probability $P_\theta(T(x) < \xi) = F_T(\xi)$ is
independent of $\theta$. An analogous definition can be made for the most general case
of a statistic $T$. Suppose that the statistic $T$ maps the measurable space $(\mathfrak{X}, \mathfrak{A})$
into the measurable space $(\mathfrak{T}, \mathfrak{B})$ and that a family of measures $\{P_\theta\}$ (for $\theta \in \Omega$) is
defined on $\mathfrak{X}$. The statistic $T$ is similar for $\theta \in \Omega_0 \subset \Omega$ if $P_\theta(T^{-1}B)$ is
independent of $\theta$ for arbitrary $\theta \in \Omega_0$ and $B \in \mathfrak{B}$.
If $T(x)$ is a scalar similar statistic, then zones of the form $Z_{\xi_1 \xi_2}$: $\xi_1 \le T(x) < \xi_2$
are similar for arbitrary $\xi_1$ and $\xi_2$ (with $\xi_1 < \xi_2$) and they can be
used to construct unrandomized similar tests. As an approximation to (scalar) similar
statistics one can consider statistics $T(x)$ with mean value $E_\theta T(x)$ and variance
$D_\theta(T(x))$ that are independent of $\theta$ for $\theta \in \Omega_0 \subset \Omega$. However, their study involves
great difficulties.
Together with similar zones, we sometimes consider bisimilar zones. These
were also introduced by Neyman [56]. Suppose that $k$ measurable real functions
$f_1(x), \ldots, f_k(x)$ are defined on the space $(\mathfrak{X}, \mathfrak{A})$ with the system of measures
$\{P_\theta\}$ (for $\theta \in \Omega$). Suppose also that
Bisimilar zones are useful for the construction of similar tests possessing additional
desirable properties. It is natural to consider the bisimilar statistics $T(x)$
(including randomized tests $\Phi(x)$) defined by the equations
§ 2. NEYMAN STRUCTURES. LEHMANN'S AND SCHEFFÉ'S THEOREMS
then
(4.2.4)
i.e. the test $\Phi$ is similar. A test $\Phi$ satisfying condition (4.2.3) is called a
Neyman structure for the statistic $T$ (cf. Neyman [58]). An analogous concept
can be defined for the statistic $\xi(x)$.
An easily-grasped interpretation of Neyman structures is as follows. If $T$
is a sufficient statistic and $\xi(x)$ is another statistic, then the conditional distribution
of $\xi(x)$ for a given value $T = t$ is independent of the parameter $\theta \in \Omega_0$.
If $\mathfrak{X}$ and the values of the statistic $T$ are contained in Euclidean spaces, then
under definite (and not excessively cumbersome) analytical conditions we may speak
of the conditional distribution of the statistic $\xi(x)$ on the surface $T = t$ (see for
example Cramér [33]). This distribution is independent of $\theta$. Under the same conditions
on the surface $T = t$, we can "cut out" a zone $Z_t$ such that the conditional
probability $P_\theta(\xi(x) \in Z_t \mid t)$ is equal to $\alpha$, that is, equal to the level for $\theta \in \Omega_0$.
If we succeed in "gluing" these zones $Z_t$ "continuously" for all values of $t$ into
a single zone $Z$, this zone will obviously be a similar zone of level $\alpha$.
Of course, rigorous treatment of such considerations is rather laborious but it can
be done without difficulty for many important classical tests.
Lehmann and Scheffé [37] have characterized an important class of families of measures $\mathcal{P}$ for which all similar tests are Neyman structures. This characterization involves the concept of completeness of a family of distributions. A family of probability measures $\{P_\theta\}$, $\theta \in \Omega$, is said to be complete if an arbitrary measurable function $f(x)$ such that
\[ E_\theta f(x) = 0, \quad \theta \in \Omega, \tag{4.2.5} \]
is equal to 0 almost everywhere. A family $\{P_\theta\}$ is said to be boundedly complete if an arbitrary bounded measurable function $f(x)$ satisfying the condition (4.2.5) is equal to 0 almost everywhere. We have (see Lehmann and Scheffé [37] and Lehmann [36])
Theorem 4.2.1. Let $\{P_\theta\}$, $\theta \in \Omega$, denote a family of distributions and let $T$ denote a sufficient statistic for $\theta \in \Omega_0 \subset \Omega$. For all similar tests $\Phi(x)$ for the hypothesis $H_0$: $\theta \in \Omega_0$ to be Neyman structures with respect to $T$, it is necessary and sufficient that the family of measures induced by $T$ be boundedly complete.
Proof of sufficiency. Let $\{P_\theta^T\}$ denote the boundedly complete family of distributions induced by the statistic $T$. Consider a similar test $\Phi(x)$ of level $\alpha$. We have $E_\theta(\Phi(x) - \alpha) = 0$ for $\theta \in \Omega_0$. Consider the "projection" $E(\Phi(x) \mid t) = \Phi_1(t)$. Then $E(\Phi_1(t) - \alpha) = 0$ for all measures $P_\theta^T$ with $\theta \in \Omega_0$. Since $\Phi_1$ is bounded and the family is boundedly complete, $\Phi_1(t) = \alpha$ with probability 1, and hence the test $\Phi(x)$ is a Neyman structure with respect to $T$.
Proof of necessity. Let us suppose that the family $\{P_\theta^T\}$, $\theta \in \Omega_0$, is not boundedly complete. Then there exists a function $f$ such that $f(T) \ne 0$ with positive probability, $|f(T)| \le M < \infty$ (where $M$ is some constant), and $E f(T) = 0$ for all $P_\theta^T$ with $\theta \in \Omega_0$. Suppose that $c = (1/M)\min(\alpha, 1 - \alpha)$ and $\Phi(t) = \alpha + c f(t)$. Then $0 \le \Phi(t) \le 1$ and $E_\theta \Phi(t) = \alpha$ for $\theta \in \Omega_0$, so that $\Phi(t)$ is a similar test that is not a Neyman structure. This completes the proof of the theorem.
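The construction used in the proof of necessity can be checked on a toy family. The sketch below is an assumption made purely for illustration (it does not come from the text): $T$ takes the values $-1, 0, 1$ with probabilities $\theta/2$, $1 - \theta$, $\theta/2$, so that $f(t) = t$ is a bounded function with zero expectation for every $\theta$ and the family is not boundedly complete. The helper names are hypothetical.

```python
from fractions import Fraction


def law_of_T(theta):
    """Hypothetical family: P(T = -1) = P(T = 1) = theta / 2, P(T = 0) = 1 - theta."""
    return {-1: theta / 2, 0: 1 - theta, 1: theta / 2}


def expectation(g, theta):
    return sum(p * g(t) for t, p in law_of_T(theta).items())


def perturbed_test(alpha, f, bound):
    """Phi(t) = alpha + c f(t) with c = min(alpha, 1 - alpha) / bound, as in the proof."""
    c = min(alpha, 1 - alpha) / bound
    return lambda t: alpha + c * f(t)


alpha = Fraction(1, 20)


def f(t):
    # Bounded by M = 1 and E_theta f(T) = 0 for every theta (by symmetry).
    return Fraction(t)


phi = perturbed_test(alpha, f, bound=Fraction(1))

if __name__ == "__main__":
    for theta in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
        print(theta, expectation(phi, theta), [phi(t) for t in (-1, 0, 1)])
```

Since $\Phi$ here is a function of $T$ alone, $E(\Phi \mid T = t) = \Phi(t)$, which differs from $\alpha$ at $t = \pm 1$; the test is similar without being a Neyman structure. Exact rational arithmetic is used so that the equalities hold without rounding.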
Exponential families (see Chapter II, §5) constitute an important special case
of complete families. Consider the exponential family (2.5.2)
(4.2.6)
where $\mu(x)$ is the $\sigma$-finite measure with respect to which the probability densities in (2.5.2) are taken. In this connection we have (see Lehmann [36])
Theorem 4.2.2. Suppose that the parameter set $\Omega_0$ contains an $s$-dimensional parallelepiped. Then the family of distributions induced by the sufficient statistic $T = (T_1(x), \dots, T_s(x))$ for $\theta \in \Omega_0$ is complete.
Proof. By means of a translation of the parameter $\theta$ by a constant (vector) amount we may assume that the $s$-dimensional parallelepiped contained in $\Omega_0$ is of the form
\[ I:\quad -a \le \theta_j \le a, \quad a > 0, \quad j = 1, 2, \dots, s. \]
Furthermore, suppose that $f(T)$ is a measurable function such that $E_\theta f(T) = 0$ for all $\theta \in \Omega_0$. Let $\nu(T)$ denote the measure induced by $\mu(x)$ on the space $\mathcal{T}$ of values of $T$. We have
\[ \int_{\mathcal{T}} \exp(\theta_1 T_1 + \dots + \theta_s T_s)\, f(T)\, d\nu(T) = 0 \tag{4.2.7} \]
for $(\theta_1, \dots, \theta_s) \in I$. The integral (4.2.7) is assumed to converge absolutely. It is a multiple Laplace-Stieltjes integral. It can be extended to the complex "polystrip" $\Pi$: $\theta_j = \xi_j + i\tau_j$, where $-a \le \xi_j \le a$ and $-\infty < \tau_j < \infty$ for $j = 1, 2, \dots, s$, in which it is an analytic function. By virtue of (4.2.7) it vanishes on the entire polystrip $\Pi$ and hence (see Theorem 1.2.1) $f(T) = 0$ almost everywhere (with respect to the measure $\nu$). The theorem is proved.
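A finite analogue of this argument may help fix ideas. The following sketch is an illustration under assumptions of our own, not part of the text: for the binomial family with $n$ trials (an exponential family with $s = 1$ and $T = x$), $E_p f(x)$ is a polynomial of degree at most $n$ in $p$, so vanishing at $n + 1$ distinct points of an interval already forces $f = 0$. The code verifies this by checking exactly that the matrix of probabilities is nonsingular; `expectation_matrix` and `determinant` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb


def expectation_matrix(n, ps):
    """Row i holds the Binomial(n, ps[i]) probabilities of x = 0, ..., n."""
    return [[comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
            for p in ps]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


n = 4
ps = [Fraction(k, 10) for k in range(3, 8)]   # n + 1 points of the interval [0.3, 0.7]
det = determinant(expectation_matrix(n, ps))

if __name__ == "__main__":
    # A nonzero determinant means E_p f(x) = 0 at these points only for f = 0.
    print(det)
```

The determinant factors into binomial coefficients, powers of $1 - p_i$ and a Vandermonde determinant in the odds $p_i/(1 - p_i)$, which is why it cannot vanish for distinct $p_i$. In the general theorem the role of this polynomial identity is played by analytic continuation of the Laplace-Stieltjes integral.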
We make a slight addition to this theorem that will be useful in the theory
of unbiased statistics for exponential families.
Suppose that an unbiased statistic $\chi(T)$ in such a family depends only on sufficient statistics and that it estimates the function $p(\theta_1, \dots, \theta_s)$. Then it is the only unbiased estimate depending solely on sufficient statistics. To see this, suppose that $\chi_1(T)$ is another unbiased estimate for $p(\theta_1, \dots, \theta_s)$. Then $E_\theta(\chi(T) - \chi_1(T)) = 0$ for $\theta \in I$, so that by Theorem 4.2.2 we have $\chi(T) = \chi_1(T)$ with probability 1.
Theorems 2.6.1 and 2.6.2 enable us to amplify this remark somewhat. Let $\xi(x)$ denote an arbitrary unbiased estimate of the function $p(\theta_1, \dots, \theta_s)$ with variance $D(\xi(x))$ or, in general, the loss
\[ E\,g(\xi(x) - p(\theta)), \quad \text{where } p(\theta) = p(\theta_1, \dots, \theta_s), \]
and let $g$ denote a given convex (downward) function. The conditional mathematical expectation $E(\xi(x) \mid T) = \chi(T)$ is also an unbiased estimate of $p(\theta)$. Also, it depends only on the sufficient statistic, and from what was said above it is the unique unbiased estimate of this form. Thus the "projections" of arbitrary unbiased estimates onto sufficient statistics coincide. In accordance with Theorem 2.6.2, for an arbitrary unbiased estimate $\xi(x)$,
\[ E\,g(\xi(x) - p(\theta)) \ge E\,g(\chi(T) - p(\theta)), \tag{4.2.8} \]
so that the statistic $\chi(T)$ yields the minimum of the loss.
Whereas the various forms of exponential families provide interesting examples of completeness, consideration of examples of incomplete families of distributions is also instructive. Let us look again at the Behrens-Fisher problem (Chapter III, §3). In accordance with formula (3.3.14) the corresponding distribution has four natural parameters (denoted in the text following formula (3.3.14) by $\lambda_1$, $\lambda_2$, $\mu_1$, and $\mu_2$). If no relationships are imposed on these parameters, the set of their values constitutes a four-dimensional parallelepiped and the family defined by (3.3.14) is complete. However, in the Behrens-Fisher problem we further impose the condition $a_1 = a_2$, that is, $\lambda_1\mu_2 - \lambda_2\mu_1 = 0$, so that we do not have a parallelepiped of the type indicated. In formula (3.3.14) let us set $a_1 = a_2 = a$. Let $\psi(\bar{x} - \bar{y}) = \psi(X_1 - X_2)$ denote an arbitrary continuous odd function of $X_1 - X_2$ for which there exists a mathematical expectation when the probability density is that of (3.3.14). The quantity $X_1 - X_2$ is a normal variable with zero mean, so that the mathematical expectation of the odd function $\psi(X_1 - X_2)$ vanishes. This proves the incompleteness of the family (3.3.14) for $a_1 = a_2$. The majority of the questions considered below deal with the behavior of exponential families in which the conditions imposed on the natural parameters produce incompleteness. The description of similar tests (especially unrandomized tests) and of unbiased estimates under such conditions is an interesting problem of analytical statistics.
§3. CONSTRUCTING SIMILAR ZONES
hypothesis $H_0$.
Now let $\Phi_1(x)$ denote a randomized test for $H_0$. We thus have the similarity condition $E_\theta \Phi_1(x) = \alpha$ for $\theta \in \Omega_0$. If the similar test $\Phi_1(x)$ were unrandomized, it would take only the values 0 and 1, so that not only the mean value $E_\theta \Phi_1(x)$ but also the distribution of $\Phi_1(x)$ would be independent of the parameter $\theta \in \Omega_0$. This is the basic difference between unrandomized and randomized tests. However, this difference is maintained only for the basic sample space $\mathcal{X}$, and it can be removed in a certain sense by a suitable broadening of that space.
Let $\mathcal{U}$ denote the interval $[0, 1]$ with the Borel $\sigma$-algebra and the uniform distribution. Consider the Cartesian product $\mathcal{X} \times \mathcal{U}$ with the corresponding product of the $\sigma$-algebras and the product measure. Let $\Phi_1 = \Phi_1(x)$ denote a randomized test on the space $(\mathcal{X}, \mathcal{A})$. Consider the statistic $(\Phi_1, U)$, where $U$ is independent of $x \in \mathcal{X}$ and is uniformly distributed on $\mathcal{U}$. With the aid of this statistic let us define a new test $\Phi^*$ on $\mathcal{X} \times \mathcal{U}$ by
\[ \Phi^* = \begin{cases} 1 & \text{for } U - \Phi_1(x) < 0, \\ 0 & \text{for } U - \Phi_1(x) \ge 0. \end{cases} \tag{4.3.1} \]
We see that the test $\Phi^*$ is unrandomized. Furthermore, for $\theta \in \Omega_0$,
\[ E_\theta \Phi^* = E_\theta\bigl(E(\Phi^* \mid \Phi_1(x))\bigr) = E_\theta \Phi_1(x) = \alpha, \tag{4.3.2} \]
so that the test $\Phi^*$ is similar of level $\alpha$ for $\theta \in \Omega_0$. Of course, construction of $\Phi^*$ is equivalent to the process of randomization, regarded as a supplementary observation of the statistic $U$.
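The passage from $\Phi_1$ to $\Phi^*$ is easy to imitate numerically. The sketch below is an illustration only: it assumes, for concreteness, a Binomial(10, 1/2) observation under $H_0$ and a one-sided randomized test of level 0.05, neither of which appears in the text, and it estimates $E\,\Phi^*$ by simulation on the product space. The names `phi1`, `phi_star` and `rejection_rate` are hypothetical.

```python
import random
from math import comb

N, ALPHA = 10, 0.05
PMF = [comb(N, k) / 2**N for k in range(N + 1)]   # Binomial(10, 1/2) under H0


def phi1(x):
    """Randomized test: reject for large x, randomize at the boundary point."""
    tail = 0.0
    for k in range(N, -1, -1):
        if tail + PMF[k] > ALPHA:          # k is the boundary point
            if x > k:
                return 1.0
            if x == k:
                return (ALPHA - tail) / PMF[k]
            return 0.0
        tail += PMF[k]
    return 1.0


def phi_star(x, u):
    """Unrandomized test on the product space, following formula (4.3.1)."""
    return 1 if u - phi1(x) < 0 else 0


def rejection_rate(trials=100_000, seed=0):
    """Monte Carlo estimate of E Phi* with x ~ Binomial(10, 1/2), U ~ Uniform[0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < 0.5 for _ in range(N))
        hits += phi_star(x, rng.random())
    return hits / trials


if __name__ == "__main__":
    exact = sum(PMF[x] * phi1(x) for x in range(N + 1))
    print(exact, rejection_rate())
```

The test $\Phi^*$ takes only the values 0 and 1, yet its rejection frequency agrees with the mean of $\Phi_1$ up to simulation error, which is the content of (4.3.2).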
Sometimes the procedure described can be applied without broadening the basic sample space $\mathcal{X}$.
Suppose that $\mathcal{X}$ is the $m$-dimensional Euclidean space $E_m$ and that we have the sufficient statistic $(T_1, \dots, T_k)$ with range in the Euclidean space $E_k$ ($k < m$) for $\theta \in \Omega_0$. Let $(T_1, \dots, T_k; \xi_1, \dots, \xi_{m-k})$ denote a system of local coordinates in $\mathcal{X}$. Here the $\sigma$-algebras are, as usual, assumed to consist of Borel sets, and we may speak of the conditional distribution $P_\theta(\xi_1, \dots, \xi_{m-k} \mid T_1, T_2, \dots, T_k)$ for $\theta \in \Omega$ (see §3 of the Introduction).
Now suppose that we have constructed a randomized test $\Phi_1(T_1, \dots, T_k)$ for testing a hypothesis $H_0$: $\theta \in \Omega_0$ that depends only on sufficient statistics. Let us take an arbitrary measurable scalar function $V(T_1, \dots, T_k; \xi_1, \dots, \xi_{m-k})$ and construct the conditional distribution
(4.3.3)
for an arbitrary real value of $y$. For $\theta \in \Omega_0$ this distribution does not depend on $\theta$. Let us suppose that for almost all values of $(T_1, \dots, T_k)$, the function $F(y; T_1, \dots, T_k)$ is a strictly monotonic function of $y$. Then, as is well known (see [33]), the transformation $U = F(V(T_1, \dots, T_k; \xi_1, \dots, \xi_{m-k}); T_1, \dots, T_k)$ yields a new function the conditional distribution of which is uniform on the interval $[0, 1]$ for almost all values of $T_1, \dots, T_k$. We can now define an unrandomized test $\Phi^*$ by formula (4.3.1), where we replace $\Phi_1(x)$ with $\Phi_1(T_1, \dots, T_k)$. This test $\Phi^*$ depends only on $x \in \mathcal{X}$. Furthermore, for arbitrary $\theta \in \Omega$,
(4.3.4)
so that the power functions of the tests $\Phi^*$ and $\Phi_1$ coincide. In particular, if the test $\Phi_1(T_1, \dots, T_k)$ is similar for $\theta \in \Omega_0$, the same will be true of the test $\Phi^*$. Its critical zone will be a similar zone.
When we have constructed a randomized similar test $\Phi$ that depends only on sufficient statistics, we can, generally speaking, use it to construct (choosing different functions $V$) various unrandomized tests of the same power and the corresponding similar zones. In Chapter V we shall be able, in the case of certain broad subclasses of exponential families, to give a rather complete description of randomized similar tests that depend only on sufficient statistics. Our discussion shows that these tests correspond, in general, to unrandomized tests with the same power function that depend on the sample point $x \in \mathcal{X}$ and not on sufficient statistics. Construction of nontrivial unrandomized similar tests that depend only on sufficient statistics is not always possible and is a different problem. We shall speak further about this and shall give some important special cases (particularly as regards the Behrens-Fisher problem in Chapters VIII and IX).
Neyman structures (see §2) are a classical means for constructing similar zones (see Neyman [58]). Suppose that we have the situation just described: $\mathcal{X} = E_m$ and $T_1, \dots, T_k$ are sufficient statistics for $\theta \in \Omega_0 \subset \Omega \subset E_n$, where $n < m$. Suppose that the family of probability densities $\{P_\theta\}$, $\theta \in \Omega_0$, and the sufficient statistic $T = (T_1, \dots, T_k)$ satisfy the condition formulated in §3 of Chapter II regarding the possibility of "coupling" any two values of the parameter $\theta \in \Omega_0$. Then we will not have the pathological situations described in that section. If $\chi(x)$ is a statistic independent of the sufficient statistic $T$, its distribution will not depend on the parameter $\theta$ if $\theta \in \Omega_0$. Consequently $\chi(x)$ is a
Here $x \in \mathcal{X} = E_m$ and $T(x) \in E_k$ ($k < m$) is an arbitrary statistic. The density is taken with respect to a $\sigma$-finite dominating measure $\mu$, and $\theta \in \Omega_0$ is an abstract parameter. Correspondingly, for given $\theta$, the functions $R_i(T(x), \theta)$ ($i = 1, 2, \dots, m$) and the functions $r_i(x)$ are assumed measurable. We see that the family (4.3.5) generalizes the family (2.1.4) for nontrivial $T(x) \in E_k$ and $\mathcal{X} = E_m$. Let us now show that under rather general assumptions this family has similar zones (obviously nontrivial) for an arbitrary level $\alpha \in (0, 1)$. First of all, let us put the probability densities (4.3.5) in the form of finite sums of sign-constant terms. We set
\[ R_i^{+}(T, \theta) = \max\{R_i(T, \theta), 0\}; \quad r_i^{+}(x) = \max\{r_i(x), 0\}, \]
\[ R_i^{-}(T, \theta) = \min\{R_i(T, \theta), 0\}; \quad r_i^{-}(x) = \min\{r_i(x), 0\}. \]
Then we have
\[ R_i(T, \theta) = R_i^{+}(T, \theta) + R_i^{-}(T, \theta), \quad r_i(x) = r_i^{+}(x) + r_i^{-}(x). \]
Substituting these expressions into (4.3.5), denoting by $\tilde{R}_i(T, \theta)$ either $R_i^{+}(T, \theta)$ or $R_i^{-}(T, \theta)$ and by $\tilde{r}_i(x)$ either $r_i^{+}(x)$ or $r_i^{-}(x)$, we obtain from (4.3.5)
\[ p_\theta(x) = \tilde{R}_1(T, \theta)\,\tilde{r}_1(x) + \dots + \tilde{R}_n(T, \theta)\,\tilde{r}_n(x). \tag{4.3.6} \]
Then we set
\[ p_\theta(x) = \sum_{i=1}^{n} \varepsilon_i P_i(\theta)\,\pi_{i,\theta}(x). \tag{4.3.9} \]
Here the $\pi_{i,\theta}(x)$ are normalized probability densities with respect to the measure $\mu$, the quantity $\varepsilon_i = \pm 1$, and
\[ \sum_{i=1}^{n} \varepsilon_i P_i(\theta) = 1, \quad \theta \in \Omega_0. \tag{4.3.10} \]
Also, we denote by $\Pi_{i,\theta}$ the probability measures corresponding to the densities $\pi_{i,\theta}(x)$. By virtue of our assumptions (that the space $\mathcal{X}$ and the range of $T(x)$ are Euclidean spaces and that we are using Borel $\sigma$-algebras), we obtain for every value of $T(x)$ the conditional probability distributions $\Pi_{i,\theta}(A \mid T)$ (cf. Theorem 2.1.1). Since $T$ is a sufficient statistic we may assume for $\theta \in \Omega_0$ that
(4.3.11)
for any fixed value $\theta_0 \in \Omega_0$. In accordance with Theorem 4.3.1, for given $\alpha \in (0, 1)$ and a given value of $T$ there exists a set $A_T \in \mathcal{A}$ such that
statistics. The family (4.3.15) is of the form (4.3.5) and, in accordance with what
was said above, it admits nontrivial similar zones. It can be regarded as a "de-
generate exponential family".
Example 2 (cf. Example 8, Chapter II, §2). Consider a distribution of the Pearson-III type:
where $p(x; a, \gamma) = 0$ for $x < a$. Here $m > 0$ and $\gamma > 0$. Let $m$ denote a positive integer ascertained from observation. Suppose that we wish to verify the hypothesis $H_0$: $\gamma = \gamma_0$ with the aid of a test that is similar with respect to the values of $a$ from a repeated sample of size $n$.
We know (see Chapter II, §i) that for $m > 1$ there will be only trivial sufficient statistics and we cannot construct Neyman structures. However, we can use the device described above. The probability density of the repeated sample $(x_1, \dots, x_n)$ is equal to
p (x 1, ••• , xn; a, y)
for minixi ~
= (r
We have seen that in general there are no similar zones or even randomized similar tests for infinite families of distributions. However, as Besicovitch discovered in 1961 (see [3]), there are approximately similar zones under extremely general conditions. Here we shall expound a portion of his results. To make the exposition simpler we have strengthened somewhat the hypotheses in his theorems.
Consider the family of distributions $\{P_\theta\}$, $\theta \in \Omega_0$, on a measurable space $(\mathcal{X}, \mathcal{A})$ admitting a scalar statistic $T(x)$ that has a probability density $p_\theta(x)$ with respect to Lebesgue measure for all values of $\theta$. Also let us assume that $p_\theta(x)$ is an absolutely continuous function whose derivative exists everywhere and has a bounded majorant $M(x)$:
\[ |p_\theta'(x)| < M(x), \quad M(x) \le M_0. \tag{4.4.1} \]
\[ \int_{I_N} p_\theta(x)\,dx > 1 - \varepsilon. \tag{4.4.4} \]
Then we partition $I_N$ by means of the points $\dots, x_{-1}, x_0, x_1, x_2, \dots$ into subintervals $I_{n-1,1} = [x_{n-1}, x_n]$ such that
( 4.4.5)
Thus
(4.4.6)
rn-1
where
(4.4.6a)
Therefore
(4.4.7)
\[ \int_{I_{n-1,1}} p_\theta(x)\,dx = (x_n - x_{n-1})\,p_\theta(x_{n-1}) + \frac{(x_n - x_{n-1})^2}{2}\,\gamma_1(x, n). \tag{4.4.8} \]
Furthermore, by inequalities ( 4.4. 5) and ( 4.4.6a) we have
J Pe(x)dx= ~(xn-Xn-1)Pe(Xn-1)+Y:,
-N (n)
J
Ee
Pe(x)dx =a J
-N
Pe(x)dx+ '':e +vi :
N
=a
-N
JPe(x)dx+v2i.
\[ \int_{A_\varepsilon} p_\theta(x)\,dx = \alpha + \nu\varepsilon, \quad |\nu| \le 1. \]
The approximately similar zones constructed in accordance with Theorem 4.4.1 have the characteristic structure of a "laminated threshold". If we decrease the number $\varepsilon$ characterizing the approximation, the layers become ever finer and the number of them increases in a bounded region of space. From a statistical standpoint the use of approximately similar critical zones of this kind is inexpedient. Of course, in evaluating the expediency of such a method we must begin with a definite criterion, for example the behavior of the power function. However, we can indicate beforehand why it is not in general statistically expedient to use approximately similar zones constructed as indicated in the proof of Theorem 4.4.1.
For an approximately similar zone $A_\varepsilon$ to exist in the sample space $\mathcal{X}$, it is sufficient that the probability density $p_\theta(x)$ of a statistic be smooth and that it have a bounded derivative. If the space $\Omega$ of the parameters $(\theta_1, \dots, \theta_s) \in E_s$ is a compact set, the case in which $|p_\theta'(x)|$ is bounded for $\theta \in \Omega_0 \subset \Omega$ but not bounded for $\theta \in \Omega \setminus \Omega_0$ is likely to be a pathological rather than a natural situation. The natural cases are those in which this quantity, if bounded for $\theta \in \Omega_0$, is also bounded for $\theta \in \Omega$. Then the approximately similar zones $A_\varepsilon$ constructed for testing the hypothesis $H_0$: $\theta \in \Omega_0$ will be approximately similar with the same level for all $\theta \in \Omega$, and a test based on them is from a practical point of view useless.
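The uselessness of a laminated zone can be seen numerically. The sketch below is a simplified illustration rather than the construction of Theorem 4.4.1 itself: it assumes a normal family $N(\theta, 1)$, a partition of $[-N, N]$ into equal subintervals of length $h$, and a zone formed by the first $\alpha$-fraction of each subinterval. The function names and the values of `h` and `big_n` are hypothetical choices.

```python
from math import erf, sqrt


def normal_cdf(x, theta):
    """Distribution function of N(theta, 1)."""
    return 0.5 * (1.0 + erf((x - theta) / sqrt(2.0)))


def laminated_zone_probability(theta, alpha=0.05, h=0.01, big_n=12.0):
    """P_theta(A) for A = union of the layers [x_n, x_n + alpha * h).

    The points x_n = -big_n + n * h partition [-big_n, big_n]; each layer
    occupies the fraction alpha of its subinterval.
    """
    steps = int(round(2 * big_n / h))
    total = 0.0
    for n in range(steps):
        left = -big_n + n * h
        total += normal_cdf(left + alpha * h, theta) - normal_cdf(left, theta)
    return total


if __name__ == "__main__":
    for theta in (-2.0, 0.0, 0.5, 3.0):
        print(theta, laminated_zone_probability(theta))
```

For every value of $\theta$ tried, whether or not it belongs to the hypothesis, the probability of the zone stays close to $\alpha$. The power function of the corresponding test is therefore nearly constant, which is exactly the objection raised above.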
sample. In many cases such properties of independence are connected with the properties of completeness of exponential families and with the Lehmann-Scheffé theorems. These relationships were studied in the articles by Hogg and Craig [73] and Feigel'son [74]. We shall use the notation of Theorem 4.2.1 and shall assume the completeness conditions mentioned in that theorem.
Let $\chi(x)$ denote a similar statistic for $\theta \in \Omega_0$. Consider zones of the form $\{\chi(x) < \xi\}$. They will be similar for arbitrary $\xi$. Suppose that $\Phi_\xi(x)$ is a test for the hypothesis $H_0$: $\theta \in \Omega_0$ with critical zone $\{\chi(x) < \xi\}$ of level $\alpha$. By Theorem 4.2.1 the test $\Phi_\xi(x)$ is a Neyman structure. Thus for almost all values of $T = t$ we have $E(\Phi_\xi(x) \mid t) = \alpha$ when $\theta \in \Omega_0$. This means that
Then obviously $g(x_i - x_j)$ is similar with respect to $(a, \sigma)$. Since the exponential family in question is complete, the statistic $g(x_i - x_j)$ is independent of the pair $(\bar{x}, s^2)$. In particular, statistics of the form
\[ \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{\nu}, \]
where the notation is the conventional one and $\nu \ge 2$ is an integer, are independent of the pair $(\bar{x}, s^2)$. (This property was derived in Cramér's book [33] by means of a rather complicated calculation.)
Example 3. Let $x_1, \dots, x_n$ denote a repeated sample from a distribution of the Pearson-III type (see Example 8, §2, Chapter II) where $a = 0$. Thus
is independent of x 1 + • • • + xn •
Example 4. Consider a repeated sample from a 2-dimensional normal population with independent components, i.e. with probability density (in the usual notation)
\[ p_\theta(x, y) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left\{ -\frac{1}{2}\left[ \frac{(x - a_1)^2}{\sigma_1^2} + \frac{(y - a_2)^2}{\sigma_2^2} \right] \right\}, \quad \theta = (a_1, a_2, \sigma_1, \sigma_2). \]
Here $a_i \in (-\infty, \infty)$ and $\sigma_i \in (0, \infty)$ ($i = 1, 2$), and we have a complete exponential family.
Now consider a measurable vector-valued function $g(x_1, \dots, x_n; y_1, \dots, y_n)$ that is invariant under the transformation $x_i \to \alpha x_i + \beta$, $y_i \to \gamma y_i + \delta$, where $\alpha$, $\beta$, $\gamma$, and $\delta$ are constants with $\alpha$ and $\gamma$ nonzero. A suitable transformation of this kind reduces $p_\theta(x, y)$ to the probability density with $a_1 = a_2 = 0$ and $\sigma_1 = \sigma_2 = 1$. Consequently the distribution of $g(x_1, \dots, x_n; y_1, \dots, y_n)$ does not depend on the parameters $a_1$, $a_2$, $\sigma_1$, and $\sigma_2$, and hence $g$ is a similar statistic that is independent of the sufficient statistics of the family:
\[ \bar{x}, \quad \bar{y}, \quad s_1^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \quad s_2^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2. \]
\[ r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{s_1 s_2} \]
§6. A THEOREM OF H. CARTAN
If two statistics $Q(X, a)$ and $Q(X, a')$ of the family are identically distributed, i.e. if $Q(X, a) \sim Q(X, a')$, then existence of the moments $E(Q(X, a))^r$ and $E(Q(X, a'))^r$, $r = 0, 1, 2, \dots$, would mean that the moments are equal:
\[ E(Q(X, a))^r = E(Q(X, a'))^r, \quad r = 0, 1, 2, \dots. \tag{4.6.2} \]
Conversely, if the problem of moments were determinate in the present case, then the countable set of equations (4.6.2) would imply identical distribution of the statistics $Q(X, a)$ and $Q(X, a')$. The conditions under which the problem of moments is determinate are well known:
\[ \sum_{r=1}^{\infty} \alpha_{2r}^{-1/(2r)} = \infty, \]
where $\alpha_{2r}$ is the $2r$th moment (see [80]).
Although these conditions are only sufficient, they are almost necessary.
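The behavior of the series in the determinacy condition can be examined numerically for powers of a standard normal variable. The sketch below is an illustration under assumptions of our own: it takes $Q = X^d$ with $X$ standard normal, uses the known formula $E X^{2k} = (2k - 1)!!$, and compares partial sums for $d = 1$ and $d = 3$. The function names are hypothetical.

```python
from math import exp, lgamma, log


def log_double_factorial_odd(k):
    """log((2k - 1)!!), computed as log((2k)! / (2**k * k!))."""
    return lgamma(2 * k + 1) - k * log(2.0) - lgamma(k + 1)


def carleman_partial_sum(degree, terms):
    """Partial sum of alpha_{2r}**(-1 / (2r)) for Q = X**degree, X standard normal.

    The 2r-th moment of Q is E X**(2 * r * degree) = (2 * r * degree - 1)!!.
    """
    total = 0.0
    for r in range(1, terms + 1):
        log_moment = log_double_factorial_odd(r * degree)
        total += exp(-log_moment / (2 * r))
    return total


if __name__ == "__main__":
    for degree in (1, 3):
        print(degree,
              carleman_partial_sum(degree, 100),
              carleman_partial_sum(degree, 10_000))
```

For $d = 1$ the terms decay like $r^{-1/2}$ and the partial sums keep growing, whereas for $d = 3$ they decay like $r^{-3/2}$ and the partial sums level off, so the sufficient condition fails already for a cubic polynomial of a normal variable. The computation says nothing by itself about whether the corresponding moment problem is determinate.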
Moreover, these conditions are not satisfied even for polynomial families of statistics of normally distributed observations (see [39]). Therefore we shall consider not the moments (4.6.2) but their generalizations, which we shall call the $\Phi$-moments. Specifically, suppose that we have a family of continuous functions $\{\Phi_r(U)\}$ ($r = 1, 2, \dots$) of a real argument $U$. Suppose that the family $\{\Phi_r(U)\}$ has property (D):
(D) Determinacy of the analogue of the moment problem: The mathematical expectations ($\Phi$-moments)
\[ E\,\Phi_r(Q(X, a)), \tag{4.6.3} \]
exist, and their values determine the distribution of $Q(X, a)$ up to $\mu$-measure zero.
Furthermore, let us suppose that, relative to the family $\{\Phi_r(U)\}$, the statistics $Q(X, a)$ possess property ($\Gamma$):
($\Gamma$) Holomorphic connectedness. Let us define
\[ E(\Phi_r(Q(X, a))) = \varphi_r(a_1, \dots, a_s). \tag{4.6.4} \]
On the basis of the corollary of Cartan's theorem mentioned above, the ideal $I$ must have a finite basis and must be generated by the differences
\[ \varphi_r(a_1, \dots, a_s) - \varphi_r(a_1', \dots, a_s') \]
for points $a \in \Omega$ and $a' \in \Omega'$ imply independence of the statistics $Q(X, a)$ and $T(X, a')$. In analogy with condition ($\Gamma$) above, let us formulate a new condition ($\Gamma'$) of "holomorphic connectedness" for this situation:
($\Gamma'$). There exist simply-connected compact polycylinders $O_1 \supset \Omega$ and $O_1' \supset \Omega'$ such that the functions $\chi_r(a)$, $r = 1, 2, \dots$, are holomorphic for $a \in O_1$, the functions $\tau_t(a')$, $t = 1, 2, \dots$, are holomorphic for $a' \in O_1'$, and the functions $\varphi_{r,t}(a, a')$ ($r, t = 1, 2, \dots$) are holomorphic for $(a, a') \in O_1 \times O_1'$. (Again, this applies to boundary as well as interior points.)
Under these conditions we have
Theorem 4.6.2. For given families $\{Q(X, a)\}$, $\{T(X, a')\}$, $\{\Phi_r\}$, and $\{\Psi_t\}$, there exists a constant $R_1$ such that finitely many uncorrelatedness relations
\[ E[\Phi_r(Q(X, a))\,\Psi_t(T(X, a'))] = E\,\Phi_r(Q(X, a))\; E\,\Psi_t(T(X, a')) \tag{4.6.12} \]
for $r \le R_1$ and $t \le R_1$ and for arbitrary given values of $a$ and $a'$, imply independence of the statistics $Q(X, a)$ and $T(X, a')$.
The proof is analogous to the preceding proof. On the polycylinder $O_1 \times O_1'$ we consider the ideal $I$ generated by the functions
\[ \varphi_{r,t}(a, a') - \chi_r(a)\,\tau_t(a'). \tag{4.6.13} \]
V. COTEST IDEALS FOR EXPONENTIAL FAMILIES
§1. SIMILAR TESTS AND COTEST IDEALS
For every cotest $\psi = \psi(x)$ there exists a constant $C$ such that
\[ -\alpha \le C\psi(x) \le 1 - \alpha \tag{5.1.3} \]
for all $x$. The number $\alpha$ is called the level of the cotest. A precotest of level $\alpha$ is defined to be any statistic $g(x)$ for which $E(g(x) \mid \theta)$ exists whenever $\theta \in \Omega$ and
\[ E(g(x) \mid \theta) = 0 \quad \text{for } \theta \in \Omega_0. \tag{5.1.4} \]
To describe similar tests of level $\alpha$ we need only know how to describe the precotests. If we have a description of the precotests, for example a parametric description with arbitrary functions belonging to some class serving as parameters, we need only choose from them the precotests such that $-\alpha \le g(x) \le 1 - \alpha$ for all $x$. Similar tests are of the form
\[ \Phi(x) = g(x) + \alpha. \]
Let us now look at the less general scheme introduced in §1 of Chapter II. The parameter $\theta = (\theta_1, \dots, \theta_s)$ lies in some compact Borel subset $\Omega$ (usually a parallelepiped) of the Euclidean space $E_s$. The probability density $p(x; \theta_1, \dots, \theta_s)$ with respect to the dominating measure $\mu$ is assumed to be a continuous function of the parameter $\theta \in \Omega$ for almost all values of $x$.
Consider the precotests belonging to the class $K$ of complex-valued statistics $f(x)$ satisfying the following condition: the space $E_{2s}$ of complex $s$-tuples $(\theta_1, \dots, \theta_s)$ contains a simply-connected compact polycylinder $Z \supset \Omega$ such that the mathematical expectations
\[ E(f(x) \mid \theta) = \varphi(\theta) \tag{5.1.5} \]
exist and are holomorphic on $Z$. The "images" $\varphi(\theta)$ of the statistics $f(x)$ constitute a ring $\mathcal{K}$ over the field of complex numbers.
Of course the class $K$ of statistics of this sort may be trivial and reduce to the statistic $f(x) \equiv 0$. Even if it is not trivial it may prove too small to contain the cotests that are optimum in some sense or other. However, for the case of exponential families the class $K$ is sufficiently broad. Furthermore, finding the preimages $f(x)$ from their images $\varphi(\theta)$ is comparatively simple for these families.
If $f(x) \in K$ is a precotest, the image $\varphi(\theta)$ vanishes for $\theta \in \Omega_0$. The converse assertion is also obvious. The set of functions $\varphi(\theta) \in \mathcal{K}$ that vanish for $\theta \in \Omega_0$ constitutes an ideal of the ring $\mathcal{K}$. We shall call this ideal the cotest ideal of the ring $\mathcal{K}$ and denote it by $I_{H_0}$. (It would be more accurate to call it the "precotest ideal" but such a term would be too cumbersome.) Thus description of the cotests in the class $K$ reduces to description of the ideals of functions in the ring $\mathcal{K}$ that are generated by functions that vanish on $\Omega_0 \subset Z$ (see Chapter I). It should be noted, however, that the theory of ideals in rings of holomorphic functions was developed (by Cartan and others in the publications cited in §3 of Chapter I) primarily for rings of all functions that are holomorphic on a compact polycylinder and not for their subrings. However, even the information that we already have is quite useful for investigating similar tests in certain families of distributions.
Under the conditions of a given Bayes measure $B(\theta)$ on simple alternatives to $H_0$, it is natural to treat as the optimum similar test, out of all similar tests, that test $\Phi$ which maximizes the integral $\int_{\Omega \setminus \Omega_0} E(\Phi \mid \theta)\,dB(\theta)$. However, if such a test exists it is usually hard to find. Therefore it may prove expedient to find an $\varepsilon$-optimum similar test $\Phi_\varepsilon$ satisfying the condition
where the supremum is over all similar tests $\Phi$ of given level $\alpha$ and where $\varepsilon$ is a positive number. Instead of an $\varepsilon$-optimum test, we need only find an $\varepsilon$-optimum cotest $\psi_\varepsilon$ satisfying condition (5.1.6), with $\Phi_\varepsilon$ replaced by $\psi_\varepsilon$, and the condition
For families (see above) with densities of the form $\{p(x; \theta_1, \dots, \theta_s)\}$ which for all $x$ are smooth functions of the parameter $\theta = (\theta_1, \dots, \theta_s)$, we may consider optimization of the test from the standpoint of the local properties of its power function (see [36]). For example, if a hypothesis $H_0$ is of the form $H_0$: $\gamma(\theta_1, \dots, \theta_s) = \gamma_0$, where $\gamma$ is a smooth function, and if $\gamma_1 = \gamma(\theta_1, \dots, \theta_s), \dots, \gamma_s = \gamma_s(\theta_1, \dots, \theta_s)$ is a local coordinate system, we may examine the behavior of the power function
(5.1.8)
confine ourselves to similar tests that depend only on sufficient statistics. Characteristics of the type (5.1.6) will be no worse for these.
§2. INCOMPLETE EXPONENTIAL FAMILIES
where the $c_i$ are constants, reduce to this case if we replace the $T_i$ with the sufficient statistics $T_i - c_i$ ($i = 1, 2, \dots, s$). We shall call exponential families with sufficient statistics of this kind one-sided exponential families. Without loss of generality we can write (making a transformation of parameters if necessary) the probability densities for the sufficient statistics of a one-sided family in the form
(5.2.3)
where, in the case of a one-sided family, we assume $T_i \ge 0$ for $i = 1, 2, \dots, s$. In a more general situation we assume that $h(T_1, \dots, T_s)$ behaves like the function $m(T_1, \dots, T_s)$ in §1 of Chapter I, specifically that $h(T_1, \dots, T_s)$ vanishes for $T_j < 0$, where $j = 1, 2, \dots, s_1$ and $s_1 \le s$. The integral (5.2.3) converges absolutely in the Cartesian product $P$ of the $s_1$ half-planes $R_j$: $\operatorname{Re}\theta_j > 0$
unbiased statistics. One can easily see that the case in which the function $h(T_1, \dots, T_s)$ vanishes outside an $s_1$-faced cone in $s$-dimensional Euclidean space but is nonzero almost everywhere inside that cone reduces to condition (A).
By applying a suitable nonsingular linear transformation to the sufficient
statistics, we can then apply a linear transformation to the parameters, replacing
them with new parameters, in such a way that condition (A) is satisfied.
\[ \times \exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\,h(T_1, \dots, T_s)\,dT_1 \cdots dT_s \tag{5.3.1} \]
(5.3.2)
We note that $C(\theta) \ne 0$ for $\theta \in \Omega$, since the quantity (5.3.1) must be equal to unity if $\xi(T_1, \dots, T_s) \equiv 1$. Consider the set of all statistics $m(T_1, \dots, T_s)$ for which the $s$-fold Laplace transform
\[ L(m \mid \theta) = \int \cdots \int m(T_1, \dots, T_s)\,\exp[-(\theta_1 T_1 + \dots + \theta_s T_s)]\,dT_1 \cdots dT_s \tag{5.3.3} \]
Tl Ts1 00
0 0 -oo
(5.3.6)
belongs to $\mathfrak{M}$ and
\[ L(m_1 * m_2 \mid \theta) = L(m_1 \mid \theta)\,L(m_2 \mid \theta). \tag{5.3.7} \]
Let $m$ denote an arbitrary function belonging to $\mathfrak{M}$. From (5.3.8) and (5.3.7) we have
\[ -\alpha \le g \le 1 - \alpha. \tag{5.3.9} \]
To choose an $\varepsilon$-optimum cotest $\psi_\varepsilon$, we impose, following (5.1.6), the condition
where the supremum is over all cotests $\psi_\varepsilon$ of level $\alpha$ subject to the condition (5.3.9), in which we need to set $g = \psi_\varepsilon$. The $\varepsilon$-optimum similar test is $\Phi_\varepsilon = \psi_\varepsilon + \alpha$.
However, construction of a basis for the ideal $L(I_{H_0})$ in the ring $L(\mathfrak{M})$ is a difficult matter. Investigation of the ideals in rings of holomorphic functions has been carried out primarily for rings of all functions that are holomorphic on some compact polycylinder (see the articles of Cartan cited above, and also of K. Oka and other authors). In a recent article [29] Cartan investigated ideals in rings of all functions that are holomorphic in the region of real values of the arguments. We know virtually nothing about other types of rings of holomorphic functions, in particular about our basic ring $L(\mathfrak{M})$. Therefore, having in mind the application of Cartan's results (expounded in Chapter I, §3) to an investigation of the images of cotest ideals, let us look at the extension $\overline{L(I_{H_0})}$ of the ideal $L(I_{H_0})$ to the ring $\overline{L(\mathfrak{M})}$ of all functions that are holomorphic on $\tilde{\Omega}$ (see Chapter I). We can describe a basis for this extension $\overline{L(I_{H_0})}$ under certain conditions in the same way as for an ideal in $L(\mathfrak{M})$. By imposing further conditions on the function $h(T_1, \dots, T_s)$, we can use our knowledge of the basis for the extension $\overline{L(I_{H_0})}$ to find the $\varepsilon$-optimum cotests. The values assumed by the parameters $\theta_1, \dots, \theta_s$ lie on the parallelepiped $\Omega \subset A$. To simplify the remainder of our discussion let us suppose that the parallelepiped $\Omega$ is of the form
\[ \theta_j \ge \varepsilon_j, \quad j = 1, 2, \dots, s_1; \qquad \varepsilon_j \le \theta_j \le E_j, \quad j = s_1 + 1, \dots, s, \tag{5.3.11} \]
where the $\varepsilon_j$ are small positive numbers and the $E_j$ are given positive numbers less than the corresponding $A_j$.
The boundary of the region $P$ contains points with infinite coordinates, in particular the point $(\infty, \dots, \infty)$. We denote by $\partial_0 P$ the set of such boundary points of the region $P$. Functions of the form (5.3.3), i.e. elements of the ring $L(\mathfrak{M})$, will not in general be analytic at the points of $\partial_0 P$. It turns out that for $s = 1$ analyticity of these elements at the points of $\partial_0 P$ is equivalent to entireness of a certain order (see [10]) of the function $m(T_1)$ of the complex variable $T_1$.
The requirement that the precotests be holomorphic may restrict our choice of similar tests, and we shall not make this requirement. In such a case we need to reckon with the difficulty arising from the fact that we are unable to consider functions that are holomorphic on all of $P$ and must assume that they are holomorphic in the interior but not on the boundary of $P$. One encounters considerable difficulties in studying ideals of functions that are holomorphic in an open region. Up to the present time such ideals have not been completely described even in the simplest cases (see Gleason [13]). What we have to do is impose stringent requirements in order to obtain a sufficiently complete description of the corresponding similar tests. With a view to applying the corollary to Theorem 1.3.1 (Cartan's theorem), we shall study first not the cotest ideal $L(I_{H_0})$ but its extension $\overline{L(I_{H_0})}$ in the ring of all functions that are holomorphic inside $P$.
§4. APPLICATION OF CARTAN'S THEOREMS
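\[ \operatorname{rank} \begin{pmatrix} \dfrac{\partial \Pi_1}{\partial \theta_1} & \cdots & \dfrac{\partial \Pi_1}{\partial \theta_s} \\ \vdots & & \vdots \\ \dfrac{\partial \Pi_r}{\partial \theta_1} & \cdots & \dfrac{\partial \Pi_r}{\partial \theta_s} \end{pmatrix} < r \tag{5.4.1} \]
has no points inside $P$.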
We note that condition (C) is not especially restrictive. The inequality regarding the rank of the matrix yields, in general, an analytic set of complex dimension not exceeding $s - r - 1$. In general the first $r$ of conditions (5.4.1) make the set in question empty and not merely finite. Also, it is easy to exhibit many cases in which condition (C) is not satisfied. Thus, if the entire functions $\Pi_j(\theta_1, \dots, \theta_s)$ are squares of other entire functions, then the set in question has, in general, complex dimension no less than $s - r$. Of course, in our present example it is not expedient to use relationships of the form (5.2.4). Without changing the problem we can replace them with the conditions
n)• = o U= 1, 2 •... , r).
Now we can prove
Theorem 5.4.1. If conditions (B) and (C) are satisfied, the extended ideal $\overline{L(I_{H_0})}$ inside $P$ is generated by the functions $\Pi_1, \dots, \Pi_r$. Thus if $F \in \overline{L(I_{H_0})}$, then there exist functions $G_1, \dots, G_r$, holomorphic inside $P$, such that
\[ F = \Pi_1 G_1 + \dots + \Pi_r G_r. \tag{5.4.2} \]
To prove this we note that, by virtue of condition (B), the extended ideal $\overline{L(I_{H_0})}$ coincides with the ideal of all functions that are holomorphic inside $P$ and that vanish at all complex points in the analytic set $V_{\Pi_1 \dots \Pi_r}$ inside $P$. The representation (5.4.2) then follows from the corollary to Cartan's theorem (Theorem 1.3.2).
An example from the theory of rings is instructive. Let $K[x, y, z]$ denote the ring of all polynomials in the three variables $x$, $y$, $z$ over the field of real numbers. Let $K_1[(x), y, z]$ denote the ring of all polynomials in the variables $y$ and $z$ whose coefficients are rational functions of $x$ over the same field. Then
$$K[x, y, z] \subset K_1[(x), y, z].$$
In the ring $K[x, y, z]$, consider the ideal $I_0$ of all polynomials without a constant term. This ideal has the three-member basis $\{x, y, z\}$. Its extension $I$ in the ring $K_1[(x), y, z]$ is obviously the ring itself, which has the one-member basis $\{1\}$, since $x$ is invertible there. Thus the basis of the original ideal is quite different from the basis of its extension. However, since we are interested not in the most general case of ideals in rings but in the examination of the behavior of the precotest ideals $L(I_{H_0})$, under certain conditions the basis of the extension of $L(I_{H_0})$ gives us a great deal of information regarding the ideal $L(I_{H_0})$ itself.
Let us look at smooth precotests $g(T_1, \ldots, T_s)$ in the precotest ideal $I_{H_0}$, the preimage of $L(I_{H_0})$. We shall say that a precotest $g$ is smooth and that it belongs to the class $L^{(m)}$ (where $m$ is a positive integer) if
(5.5.1)
$$\left| \frac{\partial \Pi_i}{\partial \theta_{\alpha_j}} \right| \neq 0 \qquad (i, j = 1, 2, \ldots, r). \tag{5.5.2}$$
where the functions $G_j' = [(\theta_1 + 1) \cdots (\theta_s + 1)]^{-N} G_j$ are holomorphic inside $P_{\eta_0}$. Suppose that $F$ is bounded inside $P$ by a constant $M$: $|F| \le M$. If we define $M_1 = \sup_P |(\theta_1 + 1) \cdots (\theta_s + 1)|^{-N}$, we obtain $|F_1| \le M M_1 = M_0$ on $P$, where $M_0$ is a new constant. By virtue of condition (5.5.2), Theorem 1.2.7 is applicable, so that inside $P$ we have the following bound for the functions $G_j'$ ($j = 1, 2, \ldots, r$):
$$|(\theta_1 + 1) \cdots (\theta_s + 1)| \ge 1 \quad \text{in } P.$$
Let us now suppose that $g(T_1, \ldots, T_s)$ is a smooth precotest of the class $L^{(m)}$, where $m \ge 1$. Then $F(\theta) = E(g \mid \theta)$ satisfies the relation (5.5.1). If the function $F$ belongs to the ideal $L(I_{H_0})$, then so does the function $F_2 = [F(\theta_1, \ldots, \theta_s)]^m \in L(I_{H_0})$, which is holomorphic inside $P$. Also, $|F(\theta_1, \ldots, \theta_s)|^m < M$, where $M$ is a constant. Therefore $F_2$ has a representation of the form (5.5.3):
(5.5.6)
(5.5.7)
If we now remember that (5.2.5) is holomorphic and if we multiply and divide every term on the right-hand side of (5.5.8) by $[(\theta_1 + 1) \cdots (\theta_s + 1)]^{N+2}$, we obtain inside $P$
(5.5.9)
where
(5.5.10)
If $m \ge N + 3$, then by Theorem 1.1.3 the functions $A_j(\theta_1, \ldots, \theta_s)$ will be unilateral Laplace transforms $A_j(\theta_1, \ldots, \theta_s) = L(H_j \mid \theta)$, where $H_j = H_j(T_1, \ldots, T_s)$ is a function that vanishes for $T_1 < 0, \ldots, T_{s_1} < 0$ and satisfies the relation
(5.5.11)
for arbitrary $\eta > 0$.
Furthermore, $H_j$ has partial derivatives of at least the first $m - N$ orders. By Theorem 1.1.1 (the convolution theorem), if the precotest $g$ is sufficiently smooth we have the representation
(5.5.12)
where $R_j$ and $H_j$ ($j = 1, 2, \ldots, r$) are the functions described above. Furthermore,
we shall prove that every cotest can be replaced by a cotest of any desired degree
of smoothness in such a way that the basic statistical properties change by an
arbitrarily small amount.
Let us generalize the concept of the gain function $W(\Phi \mid B)$ (see §1); in particular let us generalize formula (5.1.6) to precotests (including cotests) by setting
We have
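$$p_\delta = D_\delta * p = \int_0^{T_1} d\xi_1 \cdots \int_0^{T_{s_1}} d\xi_{s_1} \int_{-\infty}^{\infty} d\xi_{s_1 + 1} \cdots$$
To prove this we note that, by Theorems 1.1.4 and 5.6.1, we have, for $\theta \in \Omega$,
$$E(g_\delta \mid \theta) = L(D_\delta \mid \theta)\, L(p \mid \theta)$$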
We saw above that, under certain conditions of a rather general type, a smooth precotest $g$ can be represented as a sum of convolutions
(5.7.1)
where $A_1, \ldots, A_r$ are given functions corresponding to the connection conditions and $H_1, \ldots, H_r$ are smooth functions of a fairly arbitrary type. Conversely, every expression of the form (5.7.1) is a smooth precotest. For a precotest $g$ to be a cotest of level $\alpha$, it must satisfy the condition $-\alpha \le g \le 1 - \alpha$.
If we began with the cotest $\psi$ of level $\alpha$ and then smoothed it in accordance with formula (5.6.4), we would obtain the smoothed precotest
$$\psi_\delta = h^{-1}(D_\delta * \psi h). \tag{5.7.2}$$
For this to be a cotest of level $\alpha$ it is necessary and sufficient that
$$-\alpha h \le D_\delta * \psi h \le (1 - \alpha) h. \tag{5.7.3}$$
Let us assume that the level $\alpha$ is neither 0 nor 1. In such a case, if we found a smoothing function $D_\delta$ that would lead not to the inequalities (5.7.3) but to the inequalities
(5.7.4)
where $\eta > 0$ is an arbitrarily small fixed number, then, by replacing $D_\delta$ with $(1 - \eta') D_\delta$, where $\eta'$ is sufficiently small, we would obtain a smoothed cotest $\psi_\delta$ of level $\alpha$. By formula (5.6.7) the Bayes gain $W(\psi_\delta \mid B)$ would be arbitrarily close to $W(\psi \mid B)$. Such a situation is satisfactory, so that we can make not the requirement (5.7.3) but the requirement (5.7.4), which is equivalent to the inequalities
(5.7.5)
We see that the zeros of the density function $h = h(T_1, \ldots, T_s)$ play an extremely important role here. To obtain a satisfactory description of $\varepsilon$-complete cotests of a given level we need to impose rather stringent restrictions on the behavior of the density $h$. However these restrictions are satisfied for an extremely large number of cases that we encounter in statistics. In the class of problems that we are considering, the function $h(T_1, \ldots, T_s)$ vanished for $T_j < 0$, where $j$ is one of the numbers $1, 2, \ldots, s_1$, so that we considered it in the region $\mathcal{T}$: $T_1 > 0, \ldots, T_{s_1} > 0$; $-\infty < T_j < \infty$ (for $j = s_1 + 1, \ldots, s$). Let us impose on the function $h$
Condition (N). The function $h(T_1, \ldots, T_s)$ does not vanish in the region $\mathcal{T}$. In that region it has continuous first partial derivatives. Furthermore, for every $\varepsilon > 0$ the inequality
where $A_1$, $A_2$, and $a$ ($< 1$) are positive constants, holds in the region $\mathcal{T}_\varepsilon$: $T_1 \ge \varepsilon$, $T_2 \ge \varepsilon$, $\ldots$, $T_{s_1} \ge \varepsilon$.
This condition makes it possible to ensure that the inequalities (5.7.5) will
be satisfied.
Theorem 5.7.1. Suppose that condition (5.7.6) is satisfied. Then from a given cotest $\psi$ of level $\alpha$ it is possible to construct an arbitrarily smooth cotest $\psi_\delta'$ for which the Bayes gain $W(\psi_\delta' \mid B)$ is arbitrarily close to the initial gain $W(\psi \mid B)$.
To prove this theorem let us choose a smoothing function in accordance with formula (5.6.2). First we set
$$r = 2. \tag{5.7.7}$$
The parameter $\delta > 0$ will be determined later. By formula (5.7.2) we obtain, for a given cotest $\psi$ of level $\alpha$,
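$$\psi_\delta(T_1, \ldots, T_s) = \int_0^{T_1} \cdots \int_0^{T_{s_1}} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} d_\delta^{(r)}(\xi_1) \cdots d_\delta^{(r)}(\xi_s)\, \frac{h(T_1 - \xi_1, \ldots, T_s - \xi_s)}{h(T_1, \ldots, T_s)}\, \psi(T_1 - \xi_1, \ldots, T_s - \xi_s)\, d\xi_1 \cdots d\xi_s$$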
(5.7.9)
where the notation $(\partial \ln h / \partial T_j)$ means that the derivatives are evaluated at an intermediary point $(T_1 - \vartheta \xi_1, \ldots, T_s - \vartheta \xi_s)$, where $|\vartheta| \le 1$. Furthermore, by virtue of the definition of the function $d_\delta^{(r)}(x)$ and equation (5.7.7), $|\xi_j| \le \delta/4$.
Using (5.7.9) we get
For sufficiently small $\delta$ we then see that for all points of the region of integration
$$\frac{h(T_1 - \xi_1, \ldots, T_s - \xi_s)}{h(T_1, \ldots, T_s)} = 1 + \xi_\delta, \tag{5.7.10}$$
where $\xi_\delta \to 0$ as $\delta \to 0$. Furthermore, for the cotest $\psi$ we have the usual inequalities: $-\alpha \le \psi \le 1 - \alpha$. If we substitute (5.7.10) into (5.7.8) we obtain
(5.7.11)
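$$\left| \frac{\partial \ln h}{\partial T_1} \right| + \cdots + \left| \frac{\partial \ln h}{\partial T_s} \right| = O\!\left( \frac{1}{\varepsilon^{p}} + 1 \right), \tag{5.8.3}$$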
Here the functions $\Pi_1, \ldots, \Pi_r$ must be real for real $\theta_1, \ldots, \theta_s$. When we multiply by $1/[(\theta_1 + 1) \cdots (\theta_s + 1)]^N$, where $N$ is a suitable integer, the functions $\Pi_1, \ldots, \Pi_r$ become holomorphic functions of $1/(\theta_1 + 1), \ldots, 1/(\theta_s + 1)$ on $P$. (The point $\theta_j = \infty$ is included.) The null hypothesis consists in satisfaction of conditions (5.8.4) on the compact set $\Omega$ of real points $(\theta_1, \ldots, \theta_s)$ defined by the inequalities
($j = 1, 2, \ldots, s$). (5.8.5)
ted set $R_{\Pi_1, \ldots, \Pi_r}$ of real points that is contained in $\Omega_0$ and that has real dimension $s - r$.
(Y II) In the region $P$ the relationships (5.8.4) admit choice of variables $\theta_{\alpha_1}, \ldots, \theta_{\alpha_r}$ such that the determinant $|\partial \Pi_i / \partial \theta_{\alpha_j}|$, $i, j = 1, \ldots, r$, has no zeros inside $P$.
Condition (Y II) is somewhat restrictive. It follows from what was said in §3 of Chapter I that in the case $r = 1$ (hypothesis $H_0$ with one relationship) this condition can be replaced with the weaker condition
complete family of tests $\Phi$ that are similar with respect to the hypothesis $H_0$.
Theorem 5.8.1. For given $\varepsilon > 0$ and a given similar test $\Phi$ of level $\alpha$ there exists a cotest $\psi_\varepsilon = \psi_\varepsilon(T_1, \ldots, T_s)$ of level $\alpha$ such that $|W(\psi_\varepsilon \mid B) - W(\psi \mid B)| \le \varepsilon$, where $\psi = \Phi - \alpha$. Here $\psi_\varepsilon(T_1, \ldots, T_s)$ has the prescribed number of partial derivatives and can be represented in the form
$$\psi_\varepsilon(T_1, \ldots, T_s) = \frac{1}{h}\,(A_1 * H_1 + \cdots + A_r * H_r). \tag{5.8.6}$$
In this equation, the $A_j$ are the preimages of the functions $\Pi_j / (\theta_1 \cdots \theta_s)^{N+1}$ under the unilateral Laplace transformation
$$L(A_j \mid \theta) = \frac{\Pi_j}{(\theta_1 \cdots \theta_s)^{N+1}},$$
and $H_1, \ldots, H_r$ are functions possessing the prescribed number of partial derivatives and satisfying the relation
$$H_j(T_1, \ldots, T_s) = O\bigl(\exp \xi\,(|T_1| + \cdots + |T_s|)\bigr), \qquad j = 1, 2, \ldots, r, \tag{5.8.7}$$
where $\xi$ is a sufficiently small positive number. For $T_i < 0$, where $i$ is one of the numbers $1, 2, \ldots, s_1$, the function $H_j(T_1, \ldots, T_s)$ vanishes.
Conversely, every expression of the form $g = \frac{1}{h}(A_1 * H_1 + \cdots + A_r * H_r)$, where the $H_j$ are continuous, vanish for $T_i < 0$ ($i = 1, 2, \ldots, s_1$), and satisfy condition (5.8.7), is a precotest. If $H_1, H_2, \ldots, H_r$ are chosen in such a way that
$$-\alpha \le g \le 1 - \alpha,$$
then $g$ is a cotest of level $\alpha$.
§8. THE FINAL RESULTS
In view of this the search for $\varepsilon$-optimal similar tests in the problem posed leads to a variational problem with constraints: Find continuous functions $H_1, \ldots, H_r$ satisfying conditions (5.8.7) such that setting
$$g = \frac{1}{h}\,(A_1 * H_1 + \cdots + A_r * H_r), \tag{5.8.8}$$
(5.8.11)
The assumption that the cotest is unbiased with respect to $H_0$ leads to the following equation on $\Omega_0$:
$$\frac{\partial}{\partial \Pi_j} E(\psi \mid \theta) = 0 \qquad (j = 1, 2, \ldots, r).$$
From (5.8.11) we see that the functions $G_1, \ldots, G_r$ must vanish for $\Pi_1 = \cdots = \Pi_r = 0$; in other words they must belong to the ideal $\tilde{I}_{H_0}$. The functions $G_j$ are holomorphic in $P$. By what was shown above,
$$G_j = \sum_{l=1}^{r} G_{jl} \Pi_l \qquad (j = 1, 2, \ldots, r),$$
where the $G_{jl}$ ($l = 1, 2, \ldots, r$) are functions of the same type as the $G_j$. Thus we obtain
$$E(\psi \mid \theta) = \sum_{i, j = 1}^{r} G_{ij} \Pi_i \Pi_j. \tag{5.8.12}$$
From this we get
Theorem 5.8.2. Under the conditions listed at the beginning of this section, every sufficiently smooth unbiased cotest $\psi$ can be represented in the form
T
From this we see that the family of distributions defined by the random vector $(X, V_1, \ldots, V_k)$ has sufficient statistics $(X^2, V_1, \ldots, V_k)$ and that it is a unilateral exponential family. Let us set
$$\theta_1 = \frac{1}{2\sigma^2} = \frac{1}{2\left( \dfrac{\lambda_1^2 \sigma_1^2}{n_1} + \cdots + \dfrac{\lambda_k^2 \sigma_k^2}{n_k} \right)}, \qquad \theta_2 = \frac{1}{2\sigma_1^2}, \quad \ldots, \quad \theta_{k+1} = \frac{1}{2\sigma_k^2};$$
On the basis of what was said above, from a given Bayes distribution $B(\theta)$ on an alternative hypothesis $\Omega \setminus \Omega_0$, for the construction of an $\varepsilon$-optimal sufficiently smooth cotest $\psi_\varepsilon(x, v_1, \ldots, v_k)$, where $x = X^2$ and $v_1, \ldots, v_k$ are the values of the statistics $V_1, \ldots, V_k$, we can use the expression
$$\psi_\varepsilon(x, v_1, \ldots, v_k) = [F_0(x, v_1, \ldots, v_k) * H(x, v_1, \ldots, v_k)]\, x^{1/2} v_1^{-m_1} \cdots v_k^{-m_k}.$$
Here $F_0 = x - \lambda_1^2 v_1/n_1 - \cdots - \lambda_k^2 v_k/n_k$ and $H(x, v_1, \ldots, v_k)$ is a smooth function such that
$$H(x, v_1, \ldots, v_k) = O(\exp \varepsilon (x + v_1 + \cdots + v_k))$$
for arbitrary $\varepsilon > 0$.
The cotest $\psi_\varepsilon$ must satisfy the conditions
$$-\alpha \le x^{1/2} v_1^{-m_1} \cdots v_k^{-m_k}\, (F_0 * H) \le 1 - \alpha,$$
where $\alpha$ is a level. Under such restrictions, we pose for $H$ the variational problem of maximizing
(5.8.14)
In the class of smooth functions $H$ this problem can be solved "with accuracy up to $\varepsilon$"; i.e., for arbitrary $\varepsilon$ one can find $H = H_\varepsilon$ for which the cotest $\psi_\varepsilon$ is $\varepsilon$-optimal in the sense indicated by inequality (5.1.6).
It is natural to ask whether it is possible to construct from a given cotest $\psi$ of level $\alpha$ a sufficiently smooth cotest $\psi_\delta$ for which the Bayes gain $W(\psi_\delta \mid B)$ is arbitrarily close to $W(\psi \mid B)$, so that to find an $\varepsilon$-optimal cotest we need only consider smooth cotests. The answer to this question is affirmative. To show that this is the case we need to modify somewhat the proof of Theorem 5.7.1.
We set
$$T_1 = V_1, \quad T_2 = V_2, \quad \ldots, \quad T_k = V_k, \quad T_{k+1} = x, \quad s = k + 1;$$
$$h(T_1, \ldots, T_s) = T_1^{m_1} \cdots T_k^{m_k}\, T_{k+1}^{-1/2}.$$
Let us set up a formula of the type (5.7.8) taking, for small given $\delta$, the functions $d_\delta^{(j)}(\xi_j)$, $j = 1, 2, \ldots, k$, in the form of "smooth peaks" of area 1 with carriers $[\delta/2 - \delta^2, \delta/2 + \delta^2]$. For $j = k + 1$, we put the function $d_\delta^{(k+1)}(\xi_{k+1})$ in the form of a "smooth peak" of area 1 with carrier $[-\delta/2 - \delta^2, -\delta/2 + \delta^2]$. Define $D_\delta = d_\delta^{(1)} \cdots d_\delta^{(k+1)}$. A formula of the form (5.7.8) is written in the following form:
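$$\psi_\delta(T_1, \ldots, T_s) = \int_0^{T_1} \cdots \int_0^{T_k} \int_{-\infty}^{+\infty} d_\delta^{(1)}(\xi_1) \cdots d_\delta^{(k)}(\xi_k)\, d_\delta^{(k+1)}(\xi_{k+1}) \left( \frac{T_1 - \xi_1}{T_1} \right)^{m_1} \cdots \left( \frac{T_k - \xi_k}{T_k} \right)^{m_k} \left( \frac{T_{k+1} - \xi_{k+1}}{T_{k+1}} \right)^{-1/2} \psi(T_1 - \xi_1, \ldots, T_s - \xi_s)\, d\xi_1 \cdots d\xi_s.$$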
By the construction of the functions $d_\delta^{(1)}(\xi_1), \ldots, d_\delta^{(k+1)}(\xi_{k+1})$ the product
$$P = \left( \frac{T_1 - \xi_1}{T_1} \right)^{m_1} \cdots \left( \frac{T_k - \xi_k}{T_k} \right)^{m_k} \left( \frac{T_{k+1} - \xi_{k+1}}{T_{k+1}} \right)^{-1/2}$$
satisfies the inequality $0 \le P \le 1$ in our region of integration, so that the expression $\psi_\delta = h^{-1}(D_\delta * \psi h)$ provides a cotest if $\psi$ is a cotest. The level of the cotest $\psi_\delta$ is equal to the level of the cotest $\psi$. Their powers differ by an arbitrarily small amount for sufficiently small $\delta$. This answers our question.
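The bound $0 \le P \le 1$ and its consequence can be checked numerically in one dimension. The following is a minimal sketch and is not taken from the text: it assumes $s = 1$, the density $h(T) = T^m$ for $T > 0$, a particular bump function as the "smooth peak", $0 < \delta < 1/2$, and a midpoint quadrature rule. The names `bump` and `smoothed_cotest` are hypothetical.

```python
import math


def bump(u):
    """Unnormalised smooth peak supported on (-1, 1); an assumed choice of shape."""
    d = 1.0 - u * u
    return math.exp(-1.0 / d) if d > 0.0 else 0.0


def smoothed_cotest(psi, T, m, delta, n=2000):
    """Approximate psi_delta(T) = h(T)^(-1) (D_delta * psi h)(T) for h(T) = T**m, T > 0.

    The peak has carrier [delta/2 - delta**2, delta/2 + delta**2] and is
    normalised so that its discrete (midpoint-rule) area equals 1.
    Assumes 0 < delta < 1/2, so that the carrier lies on the positive half-line.
    """
    lo = delta / 2 - delta ** 2
    hi = delta / 2 + delta ** 2
    step = (hi - lo) / n
    nodes = [lo + (i + 0.5) * step for i in range(n)]
    weights = [bump((x - delta / 2) / delta ** 2) for x in nodes]
    total = sum(weights)
    value = 0.0
    for x, w in zip(nodes, weights):
        if x < T:  # psi * h vanishes for negative arguments
            ratio = ((T - x) / T) ** m  # the factor P, which lies in [0, 1]
            value += (w / total) * ratio * psi(T - x)
    return value
```

Because the weights are nonnegative, sum to at most 1, and are multiplied by a factor in $[0, 1]$, a function $\psi$ with values in $[-\alpha, 1 - \alpha]$ is mapped to a value in the same interval, which is the one-dimensional analogue of the statement above.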
If we set $k = 2$ in the above statement of the problem of testing a linear hypothesis, then without loss of generality we can take the connection condition in the form $a_1 - a_2 = 0$. Thus we obtain the Behrens-Fisher problem. Here $F_0 = x - v_1/n_1 - v_2/n_2$, so that the formulas have a simple form. (In [56] the letters $x$, $u$, and $v$ denote variables proportional to the quantities for which we have been using them, so that the linear form $F_0$ has a slightly different appearance.)
Let us calculate the Bayes gain (5.8.14). We set $x = X^2$, $\psi_\varepsilon(x, v_1, v_2) = \psi_\varepsilon(X^2, v_1, v_2)$; $\Phi_\varepsilon = \psi_\varepsilon + \alpha$. The joint distribution of $X$, $v_1$, and $v_2$ has probability density
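$$p(X, v_1, v_2) = \frac{1}{\sqrt{2\pi}\,\sigma} \prod_{i=1}^{2} \left[ \left( \frac{n_i}{2} \right)^{\frac{n_i - 1}{2}} \frac{v_i^{\frac{n_i - 3}{2}}}{\Gamma\!\left( \frac{n_i - 1}{2} \right) \sigma_i^{\,n_i - 1}} \right] \exp\left[ -\frac{1}{2} \left( \frac{(X - \delta)^2}{\sigma^2} + \frac{n_1 v_1}{\sigma_1^2} + \frac{n_2 v_2}{\sigma_2^2} \right) \right]; \qquad \sigma^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$
For given $\sigma_1$, $\sigma_2$, $a_1 - a_2 = \delta$, the power function is
$$E(\Phi_\varepsilon \mid \sigma_1, \sigma_2, \delta) = \int_{-\infty}^{\infty} dX \int_0^{\infty} dv_1 \int_0^{\infty} dv_2\; \Phi_\varepsilon(X^2, v_1, v_2)\, p(X, v_1, v_2). \tag{5.8.15}$$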
By virtue of the evenness of our similar test with respect to $X$, we see that the test is unbiased: for arbitrary $\sigma_1$ and $\sigma_2$ we have
:ti E (<1> (X
8
2, v 1, v 2) I o 1, o2, b) = U for b=O,
for b= 0.
I (5.8.16)
These relationships mean that the similar test $\Phi_\varepsilon$ is unbiased. If the Bayes distribution $B(\theta)$ on the alternative $\Omega \setminus \Omega_0$ is concentrated on intervals of a straight line $\sigma_1 = \sigma_1^{(0)}$, $\sigma_2 = \sigma_2^{(0)}$, $-\infty < \delta < \infty$, $\delta \ne 0$, then we obtain the locally most powerful similar test if we maximize the functional (5.8.16) (see §1). Thus for the similar test $\Phi_\varepsilon$ we have a variational problem, that of maximizing the left-hand member of (5.8.16) with the constraints
(5.8.17)
This problem is amenable to numerical solution by the methods of linear programming.
Let us turn to the general problem of testing a linear hypothesis with unknown
weights.
Let $X_1, \ldots, X_n$ denote independently distributed normal variables such that $X_i \in N(a_i, \gamma_i^2)$. Here, the $a_i$ are a priori subject to $n - s$ "initial" linear relationships. Specifically, if
then there exists a constant matrix $C = C_{n-s,n}$ such that $Ca = 0$ and $\operatorname{rank} C = n - s$. Thus $a_1, \ldots, a_n$ can be expressed as linear combinations of the parameters $b_1, \ldots, b_s$.
Suppose we are verifying the hypothesis $H_0$ that, in addition to these relationships, there are still $r$ ($< s$) relationships (which may be thought of as imposed on $b_1, \ldots, b_s$). Specifically, we introduce a matrix $F = F_{rs}$ of rank $r < s$ such that
$$Fb = 0,$$
One can show that the theory expounded earlier, except possibly for the conditions on $h(T_1, \ldots, T_s)$, is applicable here.
Let us look at an individual case of a linear hypothesis, the general Behrens-Fisher problem. Here $n = n_1 + n_2$ is partitioned into two samples $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$ of sizes $n_1$ and $n_2$. The initial relationships are of the forms $a_1 = a_2 = \cdots = a_{n_1} = a_1$ and $a_{n_1 + 1} = \cdots = a_n = a_2$. We also introduce the condition $\gamma_1 = \gamma_2 = \cdots = \gamma_{n_1} = \sigma_1$; $\gamma_{n_1 + 1} = \cdots = \gamma_n = \sigma_2$. Sufficient statistics are $X_1 = \bar{x}$, $X_2 = \bar{y}$, $V_1 = s_1^2$, $V_2 = s_2^2$. By formula (3.3.14) the probability density is
~: - ~: = 0. (5.8.18)
$$A(T) = \begin{cases} T_1 T_4 - T_2 T_3 & \text{if } T_i > 0 \text{ for } i = 1, 2, 3, 4, \\ 0 & \text{otherwise.} \end{cases}$$
for $T_1 \le \sqrt{T_3}$ and $T_2 \le \sqrt{T_4}$; $h(T) = 0$ otherwise. For the expression (5.8.20) to be a cotest it is necessary that $H$ be chosen in such a way that $A * H$ vanishes for $T_1 > \sqrt{T_3}$ or $T_2 > \sqrt{T_4}$. The fact that such a choice of the functions $H$ is possible will be shown in §3 of Chapter IX. In view of the complexity of the question of constructing a sufficiently broad family of such functions, we shall not stop to do this here.
1) Added in proof. V. P. Palamodov has shown that all cotests have an analogous form: Testing of a multidimensional polynomial hypothesis, Dokl. Akad. Nauk SSSR 172 (1967), 291-293 = Soviet Math. Dokl. 8 (1967), 95-97. Editor's note. See also the Supplement to the present book, by Palamodov and Kagan.
CHAPTER VI
WIJSMAN'S D-METHOD
Let us again look at exponential families, where the probability density with respect to Lebesgue measure can be expressed by formula (5.2.3) without the corresponding conditions on the $T_j$ ($j = 1, \ldots, s$). As was stated in §§2 and 8 of Chapter V, we can under rather general conditions construct a precotest ideal and similar tests if we can exhibit a cone in the space $E_s$ of sufficient statistics $T_1, \ldots, T_s$ inside which $h(T_1, \ldots, T_s)$ does not vanish and, after a suitable linear transformation, satisfies certain conditions. The relationships between the parameters $\theta_1, \ldots, \theta_s$ defining the null hypothesis $H_0$ are assumed to be holomorphic of the type (5.2.4). Of course this does not give a sufficiently complete description of the similar tests.
If we cannot exhibit a cone of the type indicated we cannot construct similar tests by this method.
In 1958, in an interesting article [11], Wijsman presented a method enabling us to construct similar tests under weaker assumptions regarding $h(T_1, \ldots, T_s)$ even when the null hypothesis $H_0$ reduces to a single polynomial relationship among $\theta_1, \ldots, \theta_s$. However only a part of the entire family of similar tests can be obtained in this way, and it is only in rare cases that we can determine their optimality or $\varepsilon$-
(6.1.2)
The null hypothesis $H_0$ is assumed to be defined by a single polynomial relationship
for $T \in C$ and $G(T) = 0$ for $T \notin C$. The function $G(T)$ is the desired one.
Consider the differential operator
in which $\inf h(T)$ is positive, we can construct cotests for them by the method described and set up linear combinations of them from which we can also obtain cotests. Following Wijsman we shall call this method of constructing similar tests the "D-method".
X0 1
1 )-2 (x -x).
=x Vn+ 1; X = (1 - n+l - 1
1
(2n)
.!!..:!:.!.
2 an
exp}f _ 2~2·[cx0 -µ2)+ ~xJ]l 1= 1
= C(µ, cr)exp [ - ±
2~2 J=O X~+ r: x 0 ]. (6.2.1)
22 r·(-;.)an
Furthermore, for T 1 ;?:: T~,
a (T2. V)
iJ (T,. T1)
=I I \=I,
-T 2
01
(6.2.4)
if
(6.2.5)
If inequality (6.2.5) is not satisfied, then $\psi(T_1, T_2) = 0$. We note that, by Chapter III, §3, the statistic $T_2/\sqrt{T_1}$ has a distribution depending only on $a/\sigma$, so that tests of the form $\Phi(T_2/\sqrt{T_1})$, where $\Phi \in (0, 1)$ is an arbitrary measurable function, are similar. From these tests we can construct the cotests $\psi(T_2/\sqrt{T_1})$. For a suitable choice of $G(T_1, T_2)$ formula (6.2.4) yields cotests that do not reduce to the type indicated.
Let us now discuss Wijsman's reasoning with regard to the construction of similar tests for this problem by the method described. We need several tedious analytical lemmas, which we use without proof, referring the reader to Wijsman's article [11].
We can extend somewhat the general idea of the D-method. We shall show
that for a given level a every cotest can be expressed by formula (6.2.4) for
suitable G( T 1, T 2) satisfying certain additional conditions. We do not require
that G( T 1, T 2) vanish whenever h ( T 1, T 2) vanishes. It is important only that
equation (6.1.6) be satisfied, and in fact even this requirement can be somewhat
weakened.
Equation (6.2.4) can be written in the form
co T1 1
r~ (T 2 - T~)2
X exp [ - - ,
J· (6.2.8)
2 T1 -T 1
Since $\Phi(T') = 0$ whenever $T_2'^{\,2} > T_1'$, we can integrate over the interval $T_2'^{\,2} \le T_1' \le T_1$.
We need to investigate the question as to whether the formal solution (6.2.8) of equation (6.2.6) actually yields the cotest that we need. To do this, let us set $\Phi(T) = \psi(T) + \alpha$ and examine the power function
$$\beta(r, \sigma) = E(\Phi(T_1, T_2) \mid r, \sigma).$$
Suppose that $\psi$ satisfies (6.2.4). Then
X( 02
2 - 2r6 ~) 0 (T 1, T 2)dT 1 dT 2 , (6.2.9)
oT2 oT 1
where the integration is over the region $0 \le T_1 < \infty$, $-\infty < T_2 < \infty$ and where $C(r, \sigma)$ is a normalizing constant.
A detailed investigation of this integration leads to the formula (see [11], pp. 1037-1038 and 1042-1045)
p(r, .cr)
sa
-oo
(A, T2) exp [- 2~2 +: T2] dT2,
A
f
0
O(T 1• B)exp[- ~~ +~B]dT 1 ,
A
f iJO (Ti. B)
iJT2 exp
[-_!_J_
202
+!...BJ
a
dT
t
0
converge to 0 as first $B$ and then $A$ approaches $\infty$. Then (see [11], pp. 1037-
UNBIASED ESTIMATES
(7.1.1)
for every $\varepsilon > 0$. Then for all real values of $\theta = (\theta_1, \ldots, \theta_s) \in P_R$ the mathematical expectation
(7.1.3)
exists.
The statistic $\xi$ is an unbiased estimate of the function $f(\theta_1, \ldots, \theta_s)$ in the region $P_R$. If we impose no additional constraints on the parameters $(\theta_1, \ldots, \theta_s)$, then by Theorem 4.2.2 the statistic $\xi$ is the unique unbiased estimate of $f(\theta_1, \ldots, \theta_s)$. Every other unbiased estimate of $f(\theta_1, \ldots, \theta_s)$ coincides with it with probability 1. On the other hand, if there are additional constraints on the parameters then equation (7.1.2) may hold only under the assumption of these relationships. We then do not have completeness and there may be many unbiased estimates.
Let $\xi_1$ and $\xi_2$ denote two unbiased estimates of $f(\theta_1, \ldots, \theta_s)$ for $(\theta_1, \ldots, \theta_s) \in \Omega_0$, where $\Omega_0 \subset P_R$. Then $\chi = \xi_1 - \xi_2$ is an unbiased estimate of zero. We shall denote this concept simply by UEZ. If $\xi$ is an unbiased estimate of $f(\theta_1, \ldots, \theta_s)$, then all other unbiased estimates of $f$ are of the form $\xi + \chi$, where $\chi$ is a UEZ. To describe all unbiased estimates of $f(\theta_1, \ldots, \theta_s)$, it will be sufficient to find one of them and all the UEZ's $\chi$.
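The algebra behind the UEZ concept can be illustrated on a toy model. The following sketch is not from the text: it assumes two independent Bernoulli observations with success probability $p$ (rather than the exponential families with constraints considered here), and the names `expectation`, `xi_1`, `xi_2`, and `chi` are hypothetical. It only shows that the difference of two unbiased estimates has expectation zero for every parameter value.

```python
from fractions import Fraction
from itertools import product


def expectation(stat, p):
    """Exact expectation of stat(x1, x2) for two independent Bernoulli(p) observations."""
    total = Fraction(0)
    for x1, x2 in product((0, 1), repeat=2):
        prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)
        total += prob * stat(x1, x2)
    return total


def xi_1(x1, x2):
    # First unbiased estimate of p: the first observation alone.
    return Fraction(x1)


def xi_2(x1, x2):
    # Second unbiased estimate of p: the sample mean.
    return Fraction(x1 + x2, 2)


def chi(x1, x2):
    # Difference of two unbiased estimates of p: an unbiased estimate of zero.
    return xi_1(x1, x2) - xi_2(x1, x2)
```

Evaluating `expectation(chi, p)` for any rational `p` in $[0, 1]$ returns exactly zero, while `xi_1` and `xi_2` both return `p`.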
If $f(\theta_1, \ldots, \theta_s)$ is a function defined on $P_R$, then the question of existence of an unbiased estimate of it in $P_R$, that is, of an equation of the form (7.1.2), is, for the given families, a special question in the theory of the multiple Laplace transformation. The function $f(\theta_1, \ldots, \theta_s)$ must always be holomorphic in $P$. It is sufficient that it be holomorphic in a neighborhood of the point $(\theta_1 = \infty, \ldots, \theta_s = \infty)$. Then there exists a smooth function $p = p(T_1, \ldots, T_s)$ that vanishes if one of the numbers $T_j < 0$ ($j = 1, 2, \ldots, s$) and that satisfies the equation $L(p \mid \theta) = f(\theta_1, \ldots, \theta_s)$. Since by hypothesis $h(T_1, \ldots, T_s)$ does not vanish in $\mathcal{T}$, it follows that $\xi = p/h$ is an unbiased estimate of $f(\theta_1, \ldots, \theta_s)$ in $P_R$. Furthermore, it is the only unbiased estimate since there are no relationships between the parameters.
Let us now describe the set of UEZ's for incomplete exponential families,
i.e. when we have relationships of the form
(7.1.4)
(7 .1.6)
In this section we consider the same exponential families and the same unbiased estimates of zero as in the preceding one. Let $\xi = \xi(T_1, \ldots, T_s)$ denote an unbiased estimate of the function $f(\theta_1, \ldots, \theta_s)$. Suppose that $\xi$ possesses a variance $D(\xi \mid \theta)$ in the region $P_R$. Let $\Omega_0$ denote a compact set of real values of the parameters in $P_R$. We shall say that the unbiased estimate $\xi$ is admissible on the compact set $\Omega_0$ if there does not exist a UEZ $\chi$ such that $D(\xi + \chi \mid \theta) \le D(\xi \mid \theta)$ for all $\theta \in \Omega_0$, with strict inequality holding for at least one value of $\theta$. Otherwise $\xi$ is said to be inadmissible on $\Omega_0$. An estimate $\xi$ is said to be the best
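$$D(\xi + \gamma\chi \mid \theta) = E(\xi - f(\theta) + \gamma\chi)^2 = E(\xi - f(\theta))^2 + 2\gamma E(\xi\chi) + \gamma^2 E\chi^2.$$
Or, singling out the dependence on $\theta$,
$$D(\xi + \gamma\chi \mid \theta) = D(\xi \mid \theta) + 2\gamma E(\xi\chi \mid \theta) + \gamma^2 E(\chi^2 \mid \theta). \tag{7.2.1}$$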
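Identity (7.2.1) is a quadratic in $\gamma$, so the best multiple of a given UEZ is found by minimizing it. The following sketch checks the identity exactly on a small finite distribution; the distribution, the values of $\xi$ and $\chi$, and the function names are illustrative assumptions and do not come from the text. The only property used is $E\chi = 0$.

```python
from fractions import Fraction


def mean(values, probs):
    """Exact expectation of a statistic given by its values on a finite sample space."""
    return sum(v * q for v, q in zip(values, probs))


def variance_of_shifted(xi, chi, probs, gamma):
    """Variance of xi + gamma * chi computed directly from the definition."""
    shifted = [a + gamma * b for a, b in zip(xi, chi)]
    m = mean(shifted, probs)
    return mean([(v - m) ** 2 for v in shifted], probs)


def variance_by_identity(xi, chi, probs, gamma):
    """Right-hand side of (7.2.1); valid only when E(chi) = 0."""
    m = mean(xi, probs)
    d_xi = mean([(a - m) ** 2 for a in xi], probs)
    e_xi_chi = mean([a * b for a, b in zip(xi, chi)], probs)
    e_chi2 = mean([b * b for b in chi], probs)
    return d_xi + 2 * gamma * e_xi_chi + gamma ** 2 * e_chi2


def best_gamma(xi, chi, probs):
    """Minimizer of the quadratic in gamma, assuming E(chi^2) > 0."""
    e_xi_chi = mean([a * b for a, b in zip(xi, chi)], probs)
    e_chi2 = mean([b * b for b in chi], probs)
    return -e_xi_chi / e_chi2
```

For instance, with probabilities $(1/4, 1/2, 1/4)$, $\xi = (0, 1, 3)$, and $\chi = (2, -1, 0)$ (so that $E\chi = 0$), both functions agree for every rational $\gamma$, and the variance at `best_gamma` is strictly smaller than at $\gamma = 0$ because $E(\xi\chi) \ne 0$.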
(7.2-3)
where $a_1, \ldots, a_r$ are functions that are holomorphic in $P$ and that are subject to the conditions described earlier. Let us write equation (7.2.3) in greater detail:
$$\int \cdots \int_{\mathcal{T}} \chi h \exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]\, dT_1 \cdots dT_s \in I_{H_0}. \tag{7.2.4}$$
Let $L = P(\partial/\partial\theta_1, \ldots, \partial/\partial\theta_s)$ denote a linear differential operator that is a polynomial in $\partial/\partial\theta_1, \ldots, \partial/\partial\theta_s$ with constant coefficients. If we apply this operator to the left-hand member of (7.2.4), we get the expression
$$\int \cdots \int_{\mathcal{T}} \chi Q h \exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]\, dT_1 \cdots dT_s, \tag{7.2.5}$$
( 7. 2.6)
where $\varphi_j(x)$ and $l_0(x)$ are continuous functions and $x \in [a, b]$ for $a = -\infty$ or $b = +\infty$. We have $l_0(x) \to -\infty$ as $x \to -\infty$; $l_0(x) \to -\infty$ as $x \to \infty$; $|\varphi_j(x)/l_0(x)| \to 0$ as $x \to \pm\infty$; and $\int_{-\infty}^{\infty} \exp((1 - \varepsilon) l_0(x))\, dx < \infty$ for some $\varepsilon > 0$.
Consider a repeated sample $(x_1, \ldots, x_n)$ out of a set with probability density (7.2.7). This sample corresponds to an exponential family with probability density
$$p(x_1, \ldots, x_n \mid \theta) = C_1(\theta) \exp(\theta_1 T_1 + \cdots + \theta_s T_s)\, R(T_1, \ldots, T_s). \tag{7.2.8}$$
(Earlier we used notation with the minus sign in the argument of the exponential.) Here $T_j = \sum_{i=1}^{n} \varphi_j(x_i)$ ($j = 1, 2, \ldots, s$); $h = h(T_1, \ldots, T_s)$ is the corresponding
J'
X exp (01T1 + ... + 0 T dT1 ... dT 5 5) 5 = f (0), (7.2.10)
VI).
Let $\psi_0(T_1, \ldots, T_s)$ denote a function satisfying the following conditions.
(1) $\psi_0(T_1, \ldots, T_s) > 0$ for $(T_1, \ldots, T_s)$ inside $R$.
(2) $\psi_0(T_1, \ldots, T_s) = 0$ for $T \notin R$.
(3) $\psi_0(T_1, \ldots, T_s)$ is everywhere continuous and has no fewer than $2k_2 + \cdots + sk_s$ partial derivatives with respect to each argument. Here all derivatives vanish on the boundary of $R$.
We set
. . . dT3 = V(0).
I o. Tf/.. R..
X(T1,. · ·• Ts)=t k~~:: :::: ~~, TER.
We have
E (X 10) = C1 (0) J .· ·
J 'ljl(T 1, •••• T 5 )exp(0 1T 1 + ... +0,T )dT 5 1 ••• dT 5
=C1(0)(02-0'fl2 -(05 -0ft5 V(0).
Therefore X is a UEZ under the relations (7.2.9), i.e. for (:) E Il.
Furthermore,
Now, let 8 denote an arbitrary compact subset of 0 0 and let Il denote the
corresponding compact subset of Il; i.e.
and
E <xQ I 0) >- c > o
0 (7.2.13)
(7.2.15)
c1 (0) I ... I Q
We have
E (Qj 0) = (T) h (T)
:r
X exp (0T 1 + ... + 0sT 5) dT 1 ••• dTs
and
E(Q2 10) =E(Q2 l 0)-2yE CxQ I 0)+v2E(x2 I 0).' (7.2.16)
It is obvious from (7.2.13), (7.2.14), and (7.2.16) that y can be chosen in such
""
a way that, for () € e,
and a class $\mathfrak{U}$ of unbiased estimates $A$ of the parameter $\theta$ that satisfy the condition
(7.3.3)
Obviously the class $\mathfrak{U}$ coincides with the set of statistics of the form $\bar{x} + f(x_i - x_j)$, where $f(x_i - x_j)$ is a statistic depending only on the observed differences and satisfying the equation $E(f \mid \theta) = 0$.
Theorem 7.3.1. Every estimate $A \in \mathfrak{U}$ is inadmissible in the class of unbiased estimates of $\theta$ for the region of all values of $\theta$ except when
(7.3.4)
To prove this we note that
We set
(7.3.6)
Obviously $\chi$ is a UEZ. Let us now consider the estimate $B(x_1, \ldots, x_n) = A - \chi$. We have
$$E(B \mid \theta) = E(A \mid \theta) - E(\chi \mid \theta) = \theta.$$
Furthermore,
$$E(\chi (B - \theta) \mid x_2 - x_1, \ldots, x_n - x_1, \theta) = \chi\, E[(B - \theta) \mid x_2 - x_1, \ldots, x_n - x_1, \theta] = 0.$$
Therefore $E[\chi (B - \theta) \mid \theta] = 0$ and $D(A \mid \theta) = D(B \mid \theta) + E(\chi^2 \mid \theta) > D(B \mid \theta)$ except when $\chi = 0$ with probability 1.
Let us look at those cases when this theorem of Rao gives inadmissibility of a classical estimate $\bar{x}$ for the parameter $\theta$. For the statistic $A = \bar{x}$, let us find the "correction" $\chi$ defined by (7.3.6). To do this we need to calculate the conditional probability density for $\bar{x}$ with fixed $x_2 - x_1, \ldots, x_n - x_1$, and $\theta = 0$. We make the change of variables
make the change of variables
- 1
s1=X=n(x1+ ••• +xn),
s2= X2-X1,
1 1 1
n n n n
-1 1 o ... o
=l,
-1 0 1 ... 0
- 1 0 0 ... 1
which is easily verified by adding all columns of the determinant to the first
column. H we set ~ = ((2 + • • • + (n)/n, we find, after some simple calculation,
that
-oo.
J f(x1-6 • ... , xn-s)ds
(7.3.10)
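The value of the Jacobian determinant of the change of variables above can be confirmed by exact arithmetic for small $n$. The following sketch is an illustration only; the function names are hypothetical, and the elimination routine is a generic one rather than the column operation described in the text.

```python
from fractions import Fraction


def jacobian_matrix(n):
    """Matrix of the linear map (x_1, ..., x_n) -> (mean, x_2 - x_1, ..., x_n - x_1)."""
    rows = [[Fraction(1, n)] * n]
    for i in range(1, n):
        row = [Fraction(0)] * n
        row[0] = Fraction(-1)
        row[i] = Fraction(1)
        rows.append(row)
    return rows


def determinant(matrix):
    """Exact determinant by Gaussian elimination with row swaps."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det
```

Calling `determinant(jacobian_matrix(n))` for small `n` gives exactly 1, in agreement with the column-addition argument: after adding all columns to the first, the first column becomes $(1, 0, \ldots, 0)$ and the remaining minor is the identity.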
Since $E x_j = 0$ exists, we have $\varphi'(t) = iE(x_j e^{itx_j})$ (see, for example, [33]). Therefore we obtain from (7.3.7) that
$$E(x_1 + x_2 + x_3) \exp i[(t_1 + t_2) x_1 - t_1 x_2 - t_2 x_3] = -i\varphi'(t_1 + t_2)\varphi(-t_1)\varphi(-t_2) - i\varphi(t_1 + t_2)\varphi'(-t_1)\varphi(-t_2) - i\varphi(t_1 + t_2)\varphi'(-t_2)\varphi(-t_1) = 0.$$
In a sufficiently small neighborhood of the point $(0, 0)$ we have $|t_1| \le \varepsilon$ and $|t_2| \le \varepsilon$, and $\varphi(t) \ne 0$. We may write
$$\frac{\varphi'(t_1 + t_2)}{\varphi(t_1 + t_2)} + \frac{\varphi'(-t_1)}{\varphi(-t_1)} + \frac{\varphi'(-t_2)}{\varphi(-t_2)} = 0. \tag{7.3.11}$$
In such a neighborhood of 0 we set $\varphi'(t)/\varphi(t) = \psi(t)$. Let us show that
it in this respect).
Thus by leaving the $\sigma$-algebra of sufficient statistics we are enabled to construct an $\varepsilon$-complete family of unrandomized tests for the cases considered above. In particular, on the basis of the discussion in Chapter V, this can be done for the Behrens-Fisher problem.
However, if we wish to investigate unrandomized similar tests for exponential families that depend only on sufficient statistics under null hypotheses of the form (5.2.4), we encounter considerable difficulties even in such a simple case of an exponential family as that to which the Behrens-Fisher problem leads. Below, we shall investigate the Behrens-Fisher problem in this respect.
Consider an exponential family of the form
$$P_\theta(T_1, \ldots, T_s) = C(\theta) \exp[-(\theta_1 T_1 + \cdots + \theta_s T_s)]\, h(T_1, \ldots, T_s). \tag{8.1.1}$$
Following [42], let us consider unrandomized similar tests $\varphi(T_1, \ldots, T_s)$ for the hypothesis $H_0$ defined with the aid of $r$ ($< s$) polynomial relations of the form
(8.1.2)
where the polynomials $P_j(\theta)$, $j = 1, 2, \ldots, r$, are assumed to be homogeneous of degree $\ge 1$.
The test $\varphi(T_1, \ldots, T_s)$ is characterized by its critical zone, for which it is the indicator function. Of course, we need to require of this test only that it be measurable with respect to the $\sigma$-algebra of sufficient statistics. Furthermore, we must require satisfactory behavior of its power function. Finally, to apply the test, it is desirable that the boundary of the critical zone have as simple a form as possible, for example that it consist of finitely many smooth pieces.
The problem of describing unrandomized similar tests for exponential families
that depend only on sufficient statistics is still far from solved. However, inves-
tigations that have been pursued in this direction (which will be discussed to some
degree in what follows) show that such tests, if they exist, apparently have criti-
cal zones whose boundaries are extremely complicated. In general the function
itself or its derivatives of low orders have discontinuities on such boundaries.
Roughly speaking, we should not expect the existence of tests of such a form with sufficiently smooth boundaries of the critical zone. Fairly detailed investigations in this connection have been made only for the Behrens-Fisher problem by the method of analytic continuation with respect to the parameters.
Let us look briefly at the application of such a method to exponential families (8.1.1) and the null hypothesis of the form (8.1.2). Let $\mathcal{T} = (T_1, \ldots, T_s)$ denote the space of values of the sufficient statistics. This space is contained in $E_s$. Let us require that the integral
for all real values of $\theta_1, \ldots, \theta_s$ that satisfy conditions (8.1.2). Here $C_0(\alpha)$ is a
positive constant depending only on the level $\alpha$ and the critical zone $Z$. The relationships
(8.1.2) constitute an algebraic set in the projective space. Let $(\theta_1, \ldots, \theta_s)$
§ 1. QUESTIONS OF EXISTENCE 141
denote a point in this algebraic set. For arbitrary $\omega > 0$ the point $(\theta_1/\omega, \ldots, \theta_s/\omega)$
also belongs to that set.
We set $\theta_j = \omega\vartheta_j$ for $j = 1, 2, \ldots, s$. Let us multiply both sides of equation
(8.1.3) by $\omega^b e^{-a\omega}$, where $b$ is an integer $\geq 1$ and $a$ is a positive number. If we
formally integrate both sides of the equation with respect to $\omega$ from $0$ to $\infty$, we
obtain the formal relationship
$$\int\!\cdots\!\int_Z \frac{h(T_1, \ldots, T_s)\,dT_1 \cdots dT_s}{(a + \vartheta_1 T_1 + \cdots + \vartheta_s T_s)^{b+1}} = C_0(\alpha)\,f(\vartheta, a, b). \qquad (8.1.4)$$
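The denominator in (8.1.4) comes from the elementary identity $\int_0^\infty \omega^b e^{-(a+D)\omega}\,d\omega = b!/(a+D)^{b+1}$, valid for $a + D > 0$, where $D$ stands for $\vartheta_1 T_1 + \cdots + \vartheta_s T_s$. The following is only a numerical sketch of that identity; the function names, the truncation point and the step count are illustrative choices and do not come from the text.

```python
import math

def laplace_power_integral(a, d, b, upper=None, steps=200_000):
    """Composite Simpson approximation of the integral of
    w**b * exp(-(a + d) * w) over [0, upper]."""
    rate = a + d
    if rate <= 0:
        raise ValueError("a + d must be positive for the integral to converge")
    if upper is None:
        upper = 60.0 / rate          # the integrand is negligible beyond this point
    if steps % 2:
        steps += 1                   # Simpson's rule needs an even number of steps
    h = upper / steps

    def f(w):
        return w ** b * math.exp(-rate * w)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def closed_form(a, d, b):
    """b! / (a + d)**(b + 1), the value used when passing to (8.1.4)."""
    return math.factorial(b) / (a + d) ** (b + 1)
```

For instance, `laplace_power_integral(1.5, 2.0, 3)` and `closed_form(1.5, 2.0, 3)` should agree to many digits.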
Let us suppose now that both sides of (8.1.4) are meaningful for real $\vartheta_j > 0$
($j = 1, 2, \ldots, s$). A sufficient condition for this is absolute convergence of the
integral on the left side of (8.1.4) for $\vartheta_j > 0$. Then the left-hand member of (8.1.4)
must coincide with the right-hand member if the relations
(8.1.5)
are satisfied. For real parameters $\vartheta$ these relations can be extended to a complex
region. Let us suppose that such an extension is possible on both sides of equation
(8.1.4). Then this equation itself will remain valid for the corresponding complex
values of $\vartheta_1, \ldots, \vartheta_s$ in the corresponding region $\Omega$. Here the form of the
function $f(\vartheta, a, b)$ is determined by the exponential family (8.1.1), and investigation
of the possible similar tests leads to investigation of the zones $Z$ satisfying
the identity (8.1.4) when the relations (8.1.5) are satisfied. For $Z = \mathcal{T}$ the identity
(8.1.4) obviously holds with $C_0(\alpha) = C_0(1)$; this is a trivial unrandomized similar
test. For other $Z$ the zone has a nontrivial boundary.
For a given $a$ consider the expression in the denominator of the left-hand member
of (8.1.4),
$$D(a, \vartheta, T) = a + \vartheta_1 T_1 + \cdots + \vartheta_s T_s \qquad (8.1.6)$$
with relations (8.1.5) holding.
In the study of equations of the type (8.1.4), a special role is played by the
real zeros of the expression (8.1.6) when the relationships mentioned hold, i.e.
the geometric loci in the space $E_s$ that are defined by the equations
$$D(a, \vartheta, T) = 0; \qquad P_1(\vartheta) = 0, \ \ldots, \ P_r(\vartheta) = 0. \qquad (8.1.7)$$
Here we admit arbitrary complex values for $\vartheta$ that yield real values for $T_1, \ldots, T_s$.
We shall call such real zeros "critical surfaces" or "critics" of the exponential
family (they depend on a number $a$ fixed in advance).
The values $\vartheta^{(0)} = (\vartheta_1^{(0)}, \ldots, \vartheta_s^{(0)})$ generated by the critics lie outside the
region $\Omega$ in which equation (8.1.4) is valid, since the left-hand member of (8.1.4)
becomes meaningless for such values. However in many cases it is possible to
approximate such values without leaving the region $\Omega$. Here the denominator of
the fraction on the left side of (8.1.4) approaches 0. In studying the asymptotic
behavior of the left and right sides of this equation as we let $\vartheta \in \Omega$ approach $\vartheta^{(0)}$
we can frequently obtain somewhat remarkable information regarding the behavior
of the similar zones $Z$. Here the structure of their boundaries is of particular
importance. Under certain conditions one can use this procedure to show that
there are no similar critical zones with sufficiently smooth boundaries.
We note also that to carry out an investigation of this type it is essential to
have a relation of the form (8.1.4), where the denominator in the integrand vanishes
at the critics of the exponential family. The origin of such a relationship is of no
significance. We obtained it by an integral transformation, but it can also be
obtained by other approaches, especially when we are considering tests that are
measurable with respect to some (incomplete) subalgebra of the algebra of sufficient
statistics. We shall encounter such examples when we study the Behrens-Fisher
problem.
Since we shall be dealing with the random vector $((\bar{x} - \bar{y})/s_2,\ s_1/s_2)$, we need
to derive its distribution. Let us set $\xi = (\bar{x} - \bar{y})/s_2$ and $\eta = s_1/s_2$. Let us assume
also that $n_1 \geq 2$ and $n_2 \geq 2$. We define
This quantity is independent of $s_1^2$ and $s_2^2$ (see Chapter IV). Here $E(X_1) = 0$ and
$D(X_1) = 1$, so that $X_1 \in N(0, 1)$.
Let us also define $u = n_1 s_1^2/\sigma_1^2$ and $v = n_2 s_2^2/\sigma_2^2$. These quantities are
independent and they have $\chi^2$ distributions with $(n_1 - 1)$ and $(n_2 - 1)$ degrees
of freedom respectively. Finally, we define $u_1 = \sqrt{u}$ and $v_1 = \sqrt{v}$. The quantities
$X_1$, $u_1$ and $v_1$ are independent and have probability densities
1
p(X 1 )=~exp(- ~·. ,
, 2)
'Vt> 0.
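Define
$$\tilde{x} = \frac{X_1}{v_1}, \qquad \tilde{y} = \frac{u_1}{v_1}, \qquad \tilde{z} = v_1.$$
Then $X_1 = \tilde{x}\tilde{z}$, $u_1 = \tilde{y}\tilde{z}$, $v_1 = \tilde{z}$, and
$$\frac{\partial(X_1, u_1, v_1)}{\partial(\tilde{x}, \tilde{y}, \tilde{z})} = \begin{vmatrix} \tilde{z} & 0 & \tilde{x} \\ 0 & \tilde{z} & \tilde{y} \\ 0 & 0 & 1 \end{vmatrix} = \tilde{z}^2.$$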
Thus for arbitrary x 0 and arbitrary positive y 0 the common density is
where
00
where
2r(n1+n2-l)
- I 2
C n,n, = Kn,n,K n,n, = r ( n1 2 t ) r ( n2 2 1 ) •
Let us now define $\theta = n_2\sigma_1^2/(n_1\sigma_2^2)$. After some simple manipulations we obtain
$$\tilde{x} = \frac{\xi}{\sqrt{1+\theta}}, \qquad \tilde{y} = \frac{\eta}{\sqrt{\theta}}.$$
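The "simple manipulations" can be checked numerically. The sketch below assumes that $X_1 = (\bar{x} - \bar{y})/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ (the defining display for $X_1$ is not legible in this copy, so this normalisation is an assumption), that $u_1 = \sqrt{n_1}\,s_1/\sigma_1$ and $v_1 = \sqrt{n_2}\,s_2/\sigma_2$, and that $\theta = n_2\sigma_1^2/(n_1\sigma_2^2)$. The function name is hypothetical.

```python
import math

def check_change_of_variables(xbar, ybar, s1, s2, n1, n2, sigma1, sigma2):
    """Return (x_tilde, xi / sqrt(1 + theta), y_tilde, eta / sqrt(theta)).

    The first two entries should coincide, and so should the last two.
    """
    # Assumed normalisation of X_1 (not legible in the source).
    X1 = (xbar - ybar) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    u1 = math.sqrt(n1) * s1 / sigma1      # square root of u = n1 * s1^2 / sigma1^2
    v1 = math.sqrt(n2) * s2 / sigma2      # square root of v = n2 * s2^2 / sigma2^2
    x_tilde = X1 / v1
    y_tilde = u1 / v1
    theta = n2 * sigma1 ** 2 / (n1 * sigma2 ** 2)
    xi = (xbar - ybar) / s2
    eta = s1 / s2
    return x_tilde, xi / math.sqrt(1.0 + theta), y_tilde, eta / math.sqrt(theta)
```

Any admissible input (positive $s_1$, $s_2$, $\sigma_1$, $\sigma_2$) should give two matching pairs, which is the content of the displayed substitution.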
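Let $\phi(\xi, \eta)$ denote the indicator function of the critical zone $G(\xi, \eta) \geq 0$. Then
$$E(\phi \mid \theta) = \iint_Q \phi\left(\tilde{x}\sqrt{1+\theta},\ \tilde{y}\sqrt{\theta}\right) p(\tilde{x}, \tilde{y})\,d\tilde{x}\,d\tilde{y} = \alpha. \qquad (8.2.2)$$
(8.2.3)
where $C(n_1, n_2) > 0$. This is the basic integral relationship for a homogeneous
unrandomized similar test.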
Following the exposition of § 1, let us investigate the critical curves for the
family of measures appearing in formula (8.2.3). We shall simply refer to these as
§ 2. STATEMENT OF THE PROBLEM 145
with common focus $(0, 1)$. For $D = 1$, they degenerate into the twice covered
segment $\xi = 0$, $0 \leq \eta \leq 1$. We write the family of ellipses in the form
However, if we drop the requirement that the function $G(\xi, \eta)$ be continuous
this theorem is no longer valid, as we shall show in Chapter X.
ary of $Z$ in the quadrant $Q_1$ at no more than one point. Suppose that this is not
the case, i.e. that there are two points $(\xi_1, \eta_0)$ and $(\xi_2, \eta_0)$ belonging to $Z$ such
that $\xi_2 > \xi_1 \geq 0$. Then there exist two points $(\xi_1, \eta_0)$ and $(\xi_3, \eta_0)$ such that
$\xi_1 < \xi_3 < \xi_2$ and $(\xi_3, \eta_0) \notin Z$. Suppose that corresponding to the points $(\xi_1, \eta_0)$
and $(\xi_2, \eta_0)$ are the sufficient statistics $\bar{x}, \bar{y}, s_1, s_2$ and $\bar{x}', \bar{y}', s'_1, s'_2$ respectively.
Here $s_1/s_2 = s'_1/s'_2$ and $(\bar{x}' - \bar{y}')/s'_2 > (\bar{x} - \bar{y})/s_2$. Let us set $k = s'_1 s_1^{-1}$ and take a point to
which the sufficient statistics $k\bar{x}$, $k\bar{y}$, $ks_1$ and $ks_2$ correspond. This point also
belongs to $Z$. Furthermore $ks_2 = s'_2$ and $(\bar{x}' - \bar{y}')/s'_2 > (k\bar{x} - k\bar{y})/ks_2$. According to
axiom (IV) the point $(\xi_3, \eta_0)$ must also belong to $Z$, which is impossible.
Thus the critical zone $Z$ must be of the form
$$\frac{|\bar{x} - \bar{y}|}{s_2} \geq \phi\left(\frac{s_1}{s_2}\right).$$
which we obtain from the preceding form by setting $\phi_1 = \phi \cdot (1 + s_1^2/s_2^2)^{-1/2}$. A fairly
detailed bibliography regarding this test and others like it that were discovered up
to 1955 can be found in the survey by Breny [6].
In his article [9] Wald considered approximate similar tests of the form (8.3.1),
where $\phi$ is a rational function, and he raised the question of constructing a precisely
similar test for which the function $\phi$ is analytic for $\eta = s_1/s_2 \geq 0$. We shall
see later that such a construction is impossible: a similar test for which the
function $\phi$ is analytic does not exist. We shall also prove a stronger assertion
of this type.
Let us return to the coordinates $\xi$ and $\eta$. In the quadrant $Q_1$: $\xi \geq 0$, $\eta \geq 0$,
the test (8.3.1) becomes
$$\xi \geq \phi(\eta), \qquad (8.3.2)$$
so that the critical zone $Z$ is a symmetric doubly-connected region $\xi \geq \phi(\eta)$ or
$\xi \leq -\phi(\eta)$. Following articles 1) by the author [45] and I. L. Romanovskaja [63],
we now prove
Theorem 8.3.1. Suppose that $\phi(\eta)$ is continuous for $0 \leq \eta \leq 1$ and that it has
a finite derivative for $0 \leq \eta \leq 1$, where $\phi'(0)$ means the right-hand derivative.
Suppose that $\phi(\eta)$ satisfies a Lipschitz condition with exponent 1 for $1 \leq \eta \leq 2(M + 1)$,
where $M = \sup_{0 \leq \eta \leq 1} \phi(\eta)$. Then the test (8.3.2) cannot be similar if
the size of the sample $n_2 \geq 4$.
The proof is based on the method of analytic continuation with respect to a
parameter, as described in § 1. Several lemmas are necessary for it. In what follows,
the symbols $l_i$, $c_i$, and $K_i$ denote positive constants and the symbols $\varepsilon_i$,
$\eta_i$, and $\xi_i$ denote small positive constants.
Lemma 8.3.1. Suppose that $\xi_0 = \phi(0)$. Then
c2a <So< c1a. (8.3.3)
To prove this we use formula (8.2.3). By this formula we have
ffz' 11 11 ·- 2 ~d1J
(0s2+ (1+0) 112+9 +02)N =f11.
I
(8.3.5)
6
We define $p(\eta, \theta) = (1 + \theta)\eta^2 + \theta + \theta^2$. We find that
1 n
=(l +0)-N+20-2
2 s du n, (8.3.6)
_6_ (u2+1)2
va
If $\theta$ is chosen in such a way that $\delta/\sqrt{\theta} = K$, where $K$ is a sufficiently large
constant, then (8.3.6) has the estimate
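or
$$\frac{|\bar{x} - \bar{y}|}{s_2} \geq \phi_1\left(\frac{s_1}{s_2}\right)\sqrt{1 + \frac{s_1^2}{s_2^2}},$$
which may be written in the form
$$\frac{|\bar{x} - \bar{y}|}{\sqrt{s_1^2 + s_2^2}} \geq \phi_1\left(\frac{s_1}{s_2}\right),$$
or in the form
$$\frac{|\bar{y} - \bar{x}|}{\sqrt{s_2^2 + s_1^2}} \geq \phi_1\left(\left(\frac{s_2}{s_1}\right)^{-1}\right),$$
or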
This last expression yields the critical zone of the test in which the role of the
sample $(x_1, \ldots, x_{n_1})$ is played by $(y_1, \ldots, y_{n_2})$ and vice versa. Thus, corresponding
to the test $|\xi| \geq \phi_1(\eta)\sqrt{1 + \eta^2}$ is the test $|\xi| \geq \phi_1(1/\eta)\sqrt{1 + \eta^2}$ of the
same level. Here we need to switch $n_1$ and $n_2$ in formula (8.3.4).
Consider the projective transformation of the plane $-\infty < \xi < \infty$, $-\infty < \eta < \infty$
defined by $\xi' = \xi/\eta$, $\eta' = 1/\eta$. This is an involution of the plane with fixed axis
$\eta = 1$ and fixed center $(0, -1)$.
Each quadrant of the upper half-plane is mapped by this transformation into
itself. In the upper right quadrant the test boundary $\xi = \phi_1(\eta)\sqrt{1 + \eta^2}$ is mapped
into $\xi = \phi_1(1/\eta)\sqrt{1 + \eta^2}$. From what was said above, this yields a test boundary
of a similar test of the same level. In formula (8.3.4) we need to replace $n_1$
with $n_2$, $n_2$ with $n_1$, and $\phi(\eta)$ with $\phi_1(1/\eta)\sqrt{1 + \eta^2}$.
Let $D$ denote a number in the interval $(0, 1)$ and let the expression $A(\xi, \eta) = D$
define a nondegenerate critic-hyperbola. The equation for this hyperbola is
$$\frac{\eta^2}{D} - \frac{\xi^2}{1 - D} = 1.$$
The involution described above maps the hyperbola into the ellipse
§ 4. TANGENCY OF A TEST BOUNDARY TO A CRITIC 151
$$\frac{\xi^2}{D^{-1} - 1} + \frac{\eta^2}{D^{-1}} = 1.$$
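As a numerical sanity check, one can verify that the map $\xi' = \xi/\eta$, $\eta' = 1/\eta$ is its own inverse and carries points of the hyperbola $\eta^2/D - \xi^2/(1-D) = 1$ with $0 < D < 1$ onto the ellipse $\xi^2/(D^{-1}-1) + \eta^2/D^{-1} = 1$. The sketch assumes exactly these two forms of the critics; the helper names are hypothetical.

```python
import math

def involution(xi, eta):
    """The projective map (xi, eta) -> (xi / eta, 1 / eta), defined for eta != 0."""
    return xi / eta, 1.0 / eta

def hyperbola_point(D, eta):
    """Point with xi >= 0 on eta^2/D - xi^2/(1 - D) = 1; needs 0 < D < 1, eta >= sqrt(D)."""
    xi = math.sqrt((1.0 - D) * (eta ** 2 / D - 1.0))
    return xi, eta

def ellipse_residual(D1, xi, eta):
    """Left side minus right side of xi^2/(D1 - 1) + eta^2/D1 = 1, for D1 > 1."""
    return xi ** 2 / (D1 - 1.0) + eta ** 2 / D1 - 1.0
```

With `D = 0.4`, every image of a hyperbola point should give `ellipse_residual(1 / D, ...)` equal to zero up to rounding.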
Here $D^{-1} > 1$ and this ellipse is the critic $B(\xi, \eta) = D^{-1}$. This means that every
nondegenerate critic-hyperbola is mapped into a critic-ellipse. Since our transformation
is an involution, the converse is also true. Furthermore, if the test boundary
$\xi = \phi(\eta)$ is tangent at any point to a nondegenerate critic-hyperbola (resp.
nondegenerate critic-ellipse), then the "upset" similar test with test boundary
$$\iint_{Z_1} \frac{\eta^{n_1 - 2}\,d\xi\,d\eta}{(\theta + A(\xi, \eta))^N (\theta + B(\xi, \eta))^N} = C_{n_1 n_2}\,\frac{\alpha}{2}\,\theta^{-n_2/2}(1 + \theta)^{-N + 1/2}. \qquad (8.4.2)$$
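The two factors in the denominator of (8.4.2) can be related to the polynomial $\theta\xi^2 + (1+\theta)\eta^2 + \theta + \theta^2$ that appears in § 3. The sketch below assumes that $A \leq B$ are the two roots of $t^2 - (\xi^2 + \eta^2 + 1)t + \eta^2 = 0$; this reading is inferred from the form of the critics and is not stated explicitly in the surviving text. Under it, the polynomial factors as $(\theta + A)(\theta + B)$, $A \leq 1 \leq B$, and $A = D$ is the hyperbola $\eta^2/D - \xi^2/(1-D) = 1$.

```python
import math

def critic_roots(xi, eta):
    """Roots A <= B of t^2 - (xi^2 + eta^2 + 1) t + eta^2 = 0 (assumed definition)."""
    s = xi ** 2 + eta ** 2 + 1.0
    disc = math.sqrt(s * s - 4.0 * eta ** 2)   # non-negative for all real xi, eta
    return (s - disc) / 2.0, (s + disc) / 2.0

def denominator_two_ways(theta, xi, eta):
    """Return the quadratic in theta and its factored form (theta may be complex)."""
    A, B = critic_roots(xi, eta)
    expanded = theta * xi ** 2 + (1 + theta) * eta ** 2 + theta + theta ** 2
    factored = (theta + A) * (theta + B)
    return expanded, factored
```

Evaluating `denominator_two_ways(-0.3 + 0.01j, 0.8, 1.7)` should give two equal complex numbers, which also illustrates why the integrand becomes singular as $\theta \to -A$.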
By considering the values of $\theta$ of the form $\theta = -D_0 + i\zeta$, where $\zeta$ is a small real
number, we shall obtain a contradiction, thus proving Lemma 8.4.1.
We may assume that the test boundary is tangent to the critic-hyperbola
$A(\xi, \eta) = D_0$, $0 < D_0 < 1$, because if it is tangent to the critic-ellipse $B(\xi, \eta) = D_1$,
$D_1 > 1$, it will be tangent to the critic-hyperbola for the "upset" test and the
analytical properties of the lemma are maintained.
Let us divide the region of integration $Z_1$ into several parts. First of all,
we denote by $\Pi_{\varepsilon_0}$ the infinite half-horseshoe-shaped region containing the critic
$A(\xi, \eta) = D_0$ bounded by the ordinate and the two confocal hyperbolas
$$A(\xi, \eta) = D_0 + \varepsilon_0 \quad\text{and}\quad A(\xi, \eta) = D_0 - \varepsilon_0$$
(where $\varepsilon_0$ is a sufficiently small positive number such that $D_0 + \varepsilon_0 < 1$ and
$D_0 - \varepsilon_0 > 0$). We then denote by $\Pi^{(0)}_{\varepsilon_0}$ the finite subregion of this region bounded by the
ordinate and the line $\eta = \eta_1$, where $\eta_1$ is such that the distance from the ordinate
to the point of tangency $(\xi_0, \eta_0)$ is, for example, several times as great as
the distance along this line from the ordinate to the half-horseshoe. Thus the
region $\Pi_{\varepsilon_0} \setminus \Pi^{(0)}_{\varepsilon_0}$ contains the point of tangency of the critic and the test boundary.
It is for just this region that we shall obtain a contradiction in the behavior
of the left and right sides of the basic equation as $\zeta \to 0$. The integral in the
left-hand member of the basic equation (8.4.2) taken over this region approaches
$-\infty$, whereas the right-hand member remains bounded.
In what follows we shall take $\theta = -D_0 + i\zeta$, where $\zeta < \zeta_1 \ll \varepsilon_0$. We denote
by $\Theta$ that region of values of $\theta$ that is defined by the conditions listed.
In the region $Z_1 \setminus \Pi_{\varepsilon_0}$ for $\psi(\xi, \eta) \neq 0$,
$$|\theta + A(\xi, \eta)| \geq \varepsilon_0$$
and
Therefore the denominator of the integrand in (8.4.2) differs from 0 by some constant
and its behavior plays no significant role.
In the region $\Pi_{\varepsilon_0}$, for $\theta \in \Theta$,
$$|\theta + A(\xi, \eta)| \leq |-D_0 + i\zeta + D_0 + \varepsilon_0| \leq \sqrt{\varepsilon_0^2 + \zeta^2} < 2\varepsilon_0 \qquad (8.4.3)$$
and
(8.4.4)
Thus what is important is the behavior of the basic equation in the region
$\Pi_{\varepsilon_0}$. Let us denote by $I(\theta)$ the integral in the left-hand member of the basic equation
and let us denote the same integral over the subregion $\Pi_{\varepsilon_0}$ by $I_0(\theta)$. Let us
study the integral $I_0(\theta)$ in greater detail. The quantity $(\theta + B)^{-N}$ can be rewritten
$$(B - A)^{-N}\left(1 + \frac{\theta + A}{B - A}\right)^{-N}. \qquad (8.4.5)$$
Let us choose $\varepsilon_0$ in such a way that the series that we shall be considering converge;
for example let us take $\varepsilon_0 < \varepsilon/10$. Since inequality (8.4.4) holds and
since for $\theta > 0$ we are considering that branch of the function $(\theta + B)^{-N}$ with positive
values, we can expand (8.4.5) according to the binomial formula.
Let us set
$$M = \begin{cases} N + \tfrac{1}{2} & \text{if } n_1 \text{ and } n_2 \text{ are of like parity,} \\ N & \text{if } n_1 \text{ and } n_2 \text{ are of opposite parity.} \end{cases}$$
Let $\psi(\xi, \eta)$ denote the indicator function of the zone $Z_1$. Then
lo(0)
"'\, f f 'i'(s.
M
f f 'iJ
IIeo
(s, f)) (B - A)N (0
'l'Jn,-2 ds d'l'J
+ A)N • (8.4.6)
+
1
(-N+2) ... (-N+l+M 1)
SS ¢(5. fJ) ~~~~
(B-A)N+l (0+A) +
Ile,,
s
e1
d0
(0 + A) 1/ 2 (0 - 0') 1/ 2
+ c" f f ¢(6
3 '
'11) TJn,-2In (8 +A) ds d1J
(B-A)N+2
+ v (0) ' (8.4.8)
Ile.
where $c''_1$, $c''_2$, and $c''_3$ are real constants, $c''_1 \neq 0$, and the function $v(\theta)$ is
analytic and bounded for $\theta \in \Theta$.
We choose that branch of $\ln(\theta + A)$ that has
real values for $\theta > 0$. Again, all the integrals converge absolutely.
The expressions (8.4.7) and (8.4.8) differ only in the third term. Therefore
we may write (combining the two cases)
+c 3
f f lJl (6. ri) TJn,-2 c (8) ds dTJ
(B-A)N+2
+ u (0)' (8.4.9)
Ile.
f f ¢<5·
rr!IJ
(8.4.10)
"' eo
where $\theta \in \Theta$; that is, $\theta = -D_0 + i\zeta$. In the region $\Pi^{(1)}_{\varepsilon_0}$ let us replace the variables
$\xi$ and $\eta$ with $a$ and $\eta$ just as in [45]. We set $a = A - D_0$. Here the
Jacobian $\partial(a, \eta)/\partial(\xi, \eta) < 0$ in the region $\Pi^{(1)}_{\varepsilon_0}$. This Jacobian is an analytic
function which, when multiplied by $(B - A)^{-N}$, decreases at least as fast as $1/\eta^2$
in the region $\Pi^{(1)}_{\varepsilon_0}$ as $\eta \to \infty$.
Using the new variables, we rewrite the integral (8.4.10) in the form
$$-\iint_{\Pi^{(1)}_{\varepsilon_0}} \psi(a, \eta)\,\frac{F(a, \eta)}{(a + i\zeta)^2}\,da\,d\eta, \qquad (8.4.11)$$
where
$$F(a, \eta) > 0.$$
Since
$$\frac{\eta^2}{D_0 + a} - \frac{\xi^2}{1 - (D_0 + a)} = 1,$$
it follows that $a$ is a single-valued analytic function of $\xi$ and $\eta$ for given $D_0$.
Here $a(\xi, \eta)$ is a strictly decreasing function of $\xi$ for fixed $\eta$ and a strictly
increasing function of $\eta$ for fixed $\xi$. For the new variables, instead of the characteristic
function $\psi(\xi, \eta)$ we write the corresponding limits of integration. For
given $\eta$ we define
$$s_0(\eta) = \min(\varepsilon_0,\ a(\phi(\eta), \eta))$$
and
$$F(a, \eta) = F_0(0, \eta) + aF_1(a, \eta).$$
Then the integral in (8.4.11) is equal to
$$-\int_{\eta_1}^{\infty} d\eta \int_{-\varepsilon_0}^{s_0(\eta)} \frac{F_0(0, \eta) + aF_1(a, \eta)}{(a + i\zeta)^2}\,da.$$
Here $F_0(0, \eta) = O(1/\eta^2)$ and $aF_1(a, \eta) = O(1/\eta^2)$ as $\eta \to \infty$.
We estimate the imaginary part of this integral:
('!I)
00
Sdri S
So
p 0 (0, T)) +
aPi(a, "I) da
- Im (a+ i~)2
'Iii -e,, oo So('ll)
2a 2 ~F 1 (a, Tt) da
=- S 'l'J S d
- (a2+~2)2
'111 -e,,
00 So ('!I)
+ Sdri S
2a~P0(0, T)) da
(a2+~2)2
'lit -e,,
f
'111
d'l'j
-e,,
J 2a2~p1 (a,
(a2 +~2)2
'IJ) da
behaves like $O(1)$. The imaginary part of the second integral is equal to
00
where $H$ is a bounded number. Here we should keep in mind the fact that
$$F_0(0, \eta) > \varepsilon_3 > 0$$
for all $\eta$ in a neighborhood of $\eta_0$.
The denominator of the integrand approaches 0 as $\zeta \to 0$ when $s_0(\eta) = 0$, as
is the case at common points of the critic $A(\xi, \eta) = D_0$ and the boundary of the
test. These common points may be points of either intersection or tangency. In
either case the half-horseshoe $\Pi^{(1)}_{\varepsilon_0}$ has at least one point of tangency, namely
the point $(\xi_0, \eta_0)$.
Consider the region $\Pi^{(1)}_{\varepsilon_0}$ containing the point of tangency. As $\eta \to \eta_0$,
$$s_0(\eta) = o(|\eta - \eta_0|).$$
Let $K$ denote a large given number. For sufficiently small $\zeta$ and for $|\eta - \eta_0| \leq K\zeta$
we have the inequality $|s_0(\eta)| \leq \zeta$, thanks to the existence (assumed in the
lemma) of a finite derivative of $\phi(\eta)$, and we can estimate the integral in (8.4.12)
as follows:
co +Kt;
2~ f rs: f~) )1J>~~2 > 2~ f
TJ1 -KC
2~2 dri = 2e3K.
where $\varepsilon_3 > 0$. For sufficiently large $K$, we see that the integral (8.4.12) can be
made arbitrarily large. Thus
Tl' (TJ)
I f
So
F(a, 11)
Im d'l'J (a+ i~) 2 d at;-:0- oo.
TJ~ -e.
Now let us look at the other two integrals in (8.4.9). We can get bounds for
their imaginary parts as $\zeta \to 0$ just as we did for the first integral, this time replacing
$(a + i\zeta)^{-2}$ with $(a + i\zeta)^{-1}$ and $\ln(a + i\zeta)$ or 1 depending on whether
the difference $n_1 - n_2$ is odd or even. Obviously, $\operatorname{Im} 1/(a + i\zeta) = -\zeta/(a^2 + \zeta^2)$ and
$\operatorname{Im} \ln(a + i\zeta) = O(1)$. As $\zeta \to 0$, we see that the imaginary parts of the second and
third integrals in (8.4.9) are bounded. The function $u(\theta)$ and its imaginary part are also
bounded for $\theta \in \Theta$ as $\zeta \to 0$. We now need to investigate the integral over the region $\Pi^{(0)}_{\varepsilon_0}$.
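The elementary facts used here, namely $\operatorname{Im} 1/(a + i\zeta) = -\zeta/(a^2 + \zeta^2)$, $\operatorname{Im} 1/(a + i\zeta)^2 = -2a\zeta/(a^2 + \zeta^2)^2$, and the boundedness of $\operatorname{Im} \ln(a + i\zeta)$ on the branch that is real for positive arguments, can be checked directly. This is a small illustrative sketch with a hypothetical function name; it uses the principal branch of the logarithm supplied by the standard library.

```python
import cmath

def imaginary_parts(a, zeta):
    """Imaginary parts of 1/(a + i*zeta), 1/(a + i*zeta)**2 and log(a + i*zeta)."""
    z = complex(a, zeta)
    return (1 / z).imag, (1 / z ** 2).imag, cmath.log(z).imag
```

For example, `imaginary_parts(0.3, 0.01)` should match the closed forms above, and the last entry is an argument that never exceeds $\pi$ in absolute value however small $\zeta$ becomes.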
Let us consider that portion of the half-horseshoe $\Pi^{(0)}_{\varepsilon_0}$ adjacent to the ordinate and
bounded by the line $\xi = \varepsilon_4$. The behavior of the integral in the left-hand member of (8.4.2)
over the region $\Pi^{(0)}_{\varepsilon_0}$ depends on the behavior of the test boundary $\xi = \phi(\eta)$ at the point
at which the critic $A(\xi, \eta) = D_0$ leaves the ordinate, i.e. at the point
where $v(\xi, a)$ is a regular function in the region $\Pi^{(0)}_{\varepsilon_0}$ and $v(\xi, a) > 0$. For fixed
$\xi$ the limits of integration with respect to $a$ depend on the roots of the equation
$\xi = \phi(\eta)$. Just as in the region $\Pi^{(1)}_{\varepsilon_0}$, we may write
$$v(\xi, a) = v_0(\xi, 0) + a\,v_1(\xi, a),$$
where $v_1(\xi, a)$ is a function that is regular in $\Pi^{(0)}_{\varepsilon_0}$. What is significant is the
integral
ff!o>
Vo (s, a)·d~ ds •
(a+ Zb)
neo
J- 2~vo (s.
84
0)
l
The imaginary part of this integral is of the form
~l
r,
[ (aim
1
m) + b 2 2 - (a2m (s) ) + ~ 2
I
2
]
r. [
- ~
m=l ·(a1m<S>)2+b2
I
- I
(a2m<S>)2+b2
]lI ~ (8.4.13)
where $a_{1m}(\xi)$ and $a_{2m}(\xi)$ correspond to those roots of the equation $\phi(\eta) = \xi$ that
exceed $\sqrt{D_0}$, and $a'_{1m}(\xi)$ and $a'_{2m}(\xi)$ correspond to those roots that are less than
$\sqrt{D_0}$. The upper limits of summation $T_1$ and $T_2$ may be infinite. Also, $a_{1m}(\xi) < a_{2m}(\xi)$
and $a'_{1m}(\xi) < a'_{2m}(\xi)$. Since the function $\phi(\eta)$ satisfies a Lipschitz condition
in the interval in question, we have
$$|a_{1m}(\xi)| \geq \varepsilon_5\xi, \qquad |a'_{1m}(\xi)| \geq \varepsilon_5\xi.$$
Each of the summations in (8.4.13) consists of positive quantities. These summations
do not exceed
$$\frac{1}{(a_{1m}(\xi))^2 + \zeta^2} \quad\text{and}\quad \frac{1}{(a'_{1m}(\xi))^2 + \zeta^2},$$
respectively, where $|a_{1m}|$ and $|a'_{1m}|$ are minimal. The absolute value of the integrand
in (8.4.13) does not exceed
2ds
0 (6. 0)
1'
2.,,v 2 2 2 (8.4.14)
E5S +~
If we integrate (8.4.14) we obtain a quantity that remains bounded as $\zeta \to 0$.
The remaining integrals, analogous to those in (8.4.10), that are taken over the
region $\Pi^{(0)}_{\varepsilon_0}$ are investigated in the same way and they have a bounded imaginary
part as $\zeta \to 0$.
Thus for $\theta = -D_0 + i\zeta$, where $\zeta > 0$, we have $I_0(\theta) \to \infty$ as $\zeta \to 0$,
where $I_0(\theta)$ is the basic integral and $\infty$ is the infinite point of the complex plane.
Let us return to equation (8.4.2). If we consider the expression
Thus, the test curve and the critic-hyperbola $A(\xi, \eta) = D$, $0 < D < 1$, cannot
be tangent at a finite distance.
It follows that the test boundary cannot under such conditions be tangent to
the critic-ellipse
$$B(\xi, \eta) = D \quad\text{for } D > 1.$$
To see this note that, in accordance with what was said in § 3, such tangency would
lead to tangency of the test boundary of the "upset" test with the critic-hyperbola
$A(\xi, \eta) = D$, and this, as we have shown, is impossible. This completes the
proof of the lemma.
Lemma 8.4.2. If the test boundary $\xi = \phi(\eta)$ is continuous on $[0, 1]$, then it
has a zero on $(0, 1]$.
In fact, this follows at once from the theorem on "null-regular tests" which
is proved in § 1 of Chapter IX; the proof of this theorem is completely independent
of §§ 3 and 4 of Chapter VIII. By Lemma 8.3.1 we know moreover that
$\xi_0 = \phi(0) \neq 0$.
Lemma 8.4.3. 1) Let $\xi = \phi(\eta)$ denote a function that is continuous on $[0, 1]$
and that has a finite derivative everywhere on $[0, 1)$, with the right-hand derivative
meant at $\eta = 0$. Suppose that the size of the sample is $n_2 \geq 4$. Then the
curve $\xi = \phi(\eta)$ is tangent to a nondegenerate critic-ellipse at a point $(\xi_0, \eta_0)$,
$0 < \eta_0 < 1$.
To prove this lemma we need only show that the (one-sided) derivative $\phi'(0)$
is positive at the point $\eta = 0$. This is true because $\phi(0) = \xi_0 > 0$ in accordance
with Lemma 8.3.1 and, if $\phi'(0)$ is greater than 0, the curve $\xi = \phi(\eta)$ issuing
from the point $(\xi_0, 0)$ has a right-hand tangent.
By Lemma 8.4.2 the curve $\xi = \phi(\eta)$ then reaches the point $(0, \gamma)$, where
$\gamma \in (0, 1]$ is a zero of the function $\phi(\eta)$; i.e. it must "turn to the left". It is
obvious from geometric considerations that it is tangent to a suitably chosen
critic-ellipse $B(\xi, \eta) = D$, $D > 1$, that encircles the curve $\xi = \phi(\eta)$ for $0 \leq \eta \leq 1$.
If $M = \sup_{0 \leq \eta \leq 1} \phi(\eta)$ we can find an upper bound for the ordinate of the point of
intersection $(0, \eta'_0)$ of the critic $B(\xi, \eta) = D$ with the $\eta$-axis.
Indeed, the equation of our critic is
$$\frac{\xi^2}{D - 1} + \frac{\eta^2}{D} = 1.$$
1) This lemma was proved under somewhat more stringent hypotheses by Šalaevskiĭ
(see [49]).
In accordance with what was shown in § 3, we can write this inequality in the
form
$$G(u) = \frac{2}{\sqrt{2\pi}} \int_0^u e^{-t^2/2}\,dt = P(|X_1| < u), \quad\text{and}\quad g(u) = \frac{2}{\sqrt{2\pi}}\,e^{-u^2/2}.$$
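For a standard normal $X_1$, the function $G(u) = P(|X_1| < u)$ equals $\operatorname{erf}(u/\sqrt{2})$ and $g$ is its derivative. The short sketch below, which assumes only that reading of $G$ and $g$, can be used to evaluate them; the function names simply mirror the symbols in the text.

```python
import math

def G(u):
    """P(|X_1| < u) for a standard normal X_1, written through the error function."""
    return math.erf(u / math.sqrt(2.0))

def g(u):
    """Derivative of G: (2 / sqrt(2*pi)) * exp(-u^2 / 2)."""
    return 2.0 / math.sqrt(2.0 * math.pi) * math.exp(-u * u / 2.0)
```

A central difference of `G` around any point should reproduce `g` there, and `G(1.96)` is close to the familiar two-sided 95 percent value.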
If $\alpha \in (0, 1)$ is the level of our similar test, then
If this formally constructed relationship were valid, (8.4.16) would follow easily
from it. To see this, suppose that $\phi'(0) \leq 0$. Since $\phi(0) = \xi_0 > 0$, the left-hand
member of (8.4.19) would be negative, which would contradict (8.4.19).
Thus we need to show that equation (8.4.19) is valid. Let us assume that
for $x \geq 0$ the function $\phi(x)$ is measurable and has a finite one-sided derivative
$\xi'_0$ at $x = 0$. Taking into consideration the probability densities for $X$ and $Y$
(see § 2) and denoting the left-hand member of (8.4.18) by $E(\theta)$, we obtain, for
$\theta > 0$,
(8.4.20)
where
$$p_1(X) = C_1 X^{(n_1 - 3)/2} \exp\left(-\frac{X}{2}\right), \qquad p_2(Y) = C_2 Y^{(n_2 - 3)/2} \exp\left(-\frac{Y}{2}\right).$$
We note that $G(t) = O(1)$ for arbitrary real $t$. Thus $E(\theta)$ exists even at $\theta = 0$.
We need to show that $E(\theta)$ is continuous, i.e. that $\lim_{\theta \to 0} E(\theta) = E(0)$, from which
it will follow that $E(0) = 1 - \alpha$.
Let us make the substitutions $X/Y = U$ and $Y = V$. Then $|\partial(X, Y)/\partial(U, V)| = V$
and for $\theta > 0$ we have, by virtue of (8.4.20),
$$E(\theta) = \int_0^\infty dU \int_0^\infty dV\; V\, G\left[\sqrt{\frac{V}{1 + \theta}}\;\phi(\theta U)\right] p_1(UV)\,p_2(V) = 1 - \alpha. \qquad (8.4.21)$$
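Let us set $D(\theta) = E(\theta) - E(0)$ and represent this function as an integral of the
corresponding difference; $D(\theta) = 1 - \alpha - E(0) = \text{const}$. Let us show that $D(\theta) \to 0$
as $\theta \to 0$, so that $D(\theta) = 0$ for $\theta \geq 0$. We break the integral with respect to $U$
into two integrals by setting
$$D_0(\theta) = \int_0^{\theta^{-1+\varepsilon}} dU \int_0^\infty dV\,(\ ), \qquad D_1(\theta) = \int_{\theta^{-1+\varepsilon}}^\infty dU \int_0^\infty dV\,(\ ).$$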
$$D_1(\theta) = O(1) \int_{\theta^{-1+\varepsilon}}^{\infty} \frac{U^{(n_1 - 3)/2}\,dU}{(1 + U)^{(n_1 + n_2)/2 - 1}} = O\left(\theta^{(1 - \varepsilon)(n_2 - 1)/2}\right).$$
8-1 +f aa
(8.4.23)
a[v;_:e c:p(0V)]-a[yi7c:p(O)j'
JIV
, / _ cp(0U)
rl+0
= 1- f
Jf2it JI·-v cp (0)
exp[-~]d't
2
vv
, / - (cp(0)+0U(cp' (0)+1J))
, 1+ e
=y2:n;
1
j exp [ - ~2 ] di: .. (8.4.24)
YV cp(O)
By virtue of the mean-value theorem, we have the following estimate for this inte-
gral when e is sufficiently small:
0(1) OU exp (- · v:o) yv,where~0 =c:p(O) > 0.
If we substitute this into (8.4.23), we obtain
n 1-1
D 0 (6) = 0 (1) 6 U-2-dUX
X Jv~·+:,-a
(j
exp [ - ~ ( 1 + U + ~)] V dV = o (6)
where $n_2 \geq 4$. Thus $\lim_{\theta \to 0} D_0(\theta) = 0$ and hence $D(\theta) = 0$ for all $\theta > 0$. Therefore
by (8.4.22),
We can extend this reasoning to (8.4.24). For arbitrary $V > 0$,
1+0
(q> (0) + au (qi' (0) + ..,)).
= VV(qi<O>- ~ cp'(O)+eucp'(O) )+Tie(u+ yV). (8.4.26)
1
-V2:rt o
f dU f
o
dVVp 1 (UV)p 2 (V)
fli' I (0, U)
X
vv
J cp (0)
exp [ - ~2 ] d-c (8.4.27)
will differ from (8.4.23) by no more than $O(\eta\theta)$. If we extend the integral with
respect to $U$ in (8.4.27) from 0 to $\infty$, i.e. if we add to (8.4.27) the term
$$\frac{1}{\sqrt{2\pi}} \int_{\theta^{-1+\varepsilon}}^{\infty} dU \int_0^\infty dV\,(\ ),$$
where the parentheses represent the same integrand, we obtain the error
$O(\theta^{3/2 - 2\varepsilon})$, just as in the derivation of (8.4.22). From this we finally obtain
e-l+E oo
D0 (0)=:,~
" 2Jt
r dU j dVVp 1 (UV)p 2 (V).
0.. 0•
VV /(0, UJ
,r 1
r 2n
f
0
dU j dV Vp
~
0
1 (UV) p 2 (V)
This coincides with equation (8.4.19) and completes the proof of Lemma 8.4.3.
(9.1.1)
168 IX. RANDOMIZED HOMOGENEOUS TESTS
Let us set
$$\phi(\xi, \eta) = \frac{1}{2} + \phi_1(\xi, \eta).$$
Proof of Theorem 9.1.1. Let us apply the basic integral relationship of the
type (8.2.3). For randomized tests this relationship is obviously written the same
as for unrandomized ones, i.e. in the form
(9.1.2)
the critical curves (or simply critics) of the family of measures that is generated
by our probability densities and the parameter $\theta$. For $D < 1$ the equation
$A(\xi, \eta) = D$ yields the family of hyperbolas
$$\frac{\eta^2}{D} - \frac{\xi^2}{1 - D} = 1. \qquad (9.1.3)$$
For $D = 1$ the hyperbola degenerates into the twice-covered ray $\xi = 0$, $1 \leq \eta < \infty$.
$$\frac{\xi^2}{D - 1} + \frac{\eta^2}{D} = 1. \qquad (9.1.4)$$
For $D = 1$ these semi-ellipses degenerate into the twice-covered segment $\xi = 0$,
$0 \leq \eta \leq 1$.
We can rewrite the basic equation (9.1.2) in the form
(9.1.5)
By the null-regularity of the test $\phi(\xi, \eta)$ there exists a number $\eta_0 = 1 + \eta_1 > 1$
such that it is possible to draw a circle around every point of the segment $\xi = 0$,
$0 \leq \eta \leq \eta_0$ in which $\phi(\xi, \eta) = 0$ almost everywhere in the sense of Lebesgue
measure. By the Heine-Borel theorem there exists a finite covering consisting of
certain of these circles, so that we obtain an open region $\Gamma$ containing our
segment such that $\phi(\xi, \eta) = 0$ for almost all points $(\xi, \eta) \in \Gamma$.
Let us now look at the family of confocal semi-ellipses (9.1.4). Let us
choose $D > 1$ so close to 1 that the corresponding semi-ellipse $B(\xi, \eta) = D$ lies
entirely in the region $\Gamma$ (the same will of course be true for smaller values of $D$).
Now let us consider larger values of $D$. Let $D_0$ ($> 1$) denote the least upper bound
of the set of values of $D$ for which $\phi(\xi, \eta) = 0$ almost everywhere inside the
ellipse (9.1.4).
If $D_0 = \infty$ then $\phi(\xi, \eta) = 0$ almost everywhere on $Q$. The test is then a
trivial similar test. We disregard this case and assume that $D_0$ is a finite number.
The notation that we shall need is as follows. We denote by $H$ a quantity
(not always the same one) that remains bounded as the different parameters of the
problem in question vary. We denote by $\zeta_0, \zeta_1, \ldots$ small positive constants
each dependent on the preceding ones. We denote by $L^{(k)}_\theta$, where $k$ is a nonnegative
integer, the operator for $k$ differentiations with respect to $\theta$ at the point
$\theta \in \Lambda$.
We have $D_0 > 1$. We set
§ 1. "NULL-REGULAR" SIMlLAR TESTS 171
(9.1.6)
(9.1.8)
so that for () E Q,
(9.1.9)
'1(6)= II
B (,, 11);;. Do+ 1
(9.1.10)
(9.1.17)
Since $C_k^m = k!/m!\,(k - m)!$, we find, after some simple calculation, that
l(k) l =Hf(k+l)(D-D)-N~-kk2 N
e ce+A>N ce+B>N o i
. Hf(k+l)(D-D0)-N~ik. (9.1.18)
where , 2 = 'if 2.
From $I_1(\theta)$ (see (9.1.10)) let us take out the integral over the region defined
by (9.1.13); we denote this integral by $I_{1D}$. In this region (which is of the form
of a horseshoe between two semi-ellipses), we have
$$\eta = H(D - D_0)^{1/2}.$$
A.=C2R- 1•
with () E Q. We have
(9.1.26)
we obtain
1
Let us assume also that 0 < C:::; /),,. Then in the region (9.1.25) and for the values
of 0 indicated we have
1
Let us return now to (9.1.24). On the basis of (9.1. 27) the expression (9.1.26)
is replaced with
(9.1.28)
where
(9.1.30)
if (r + 1/2)A < C:/2. On the other hand, if (r + l/2)A ~ C:/2 we conclude from
(9. L 29) and the second of the three expressions (9. I. 30) that
L~11, 1 =H r (N + k) (9.1.31)
(0+B)N (0+A)N ~~ '
I L~k> 1
(0 + A)N (0 + B)N
I> r (N + k)
Co (0 + B)N+k
(9.1.32)
for
Do<;;;;B(~,
JJ
'l'))<;;;;D,+d
11n 1- 2
1P (s. 11) ds d11 > ~o (/'),) > o, (9.1. 36)
where $\beta_0(\Delta)$ is a positive number depending on $\Delta$. Let us choose the number $R$
sufficiently great and hence $\Delta$ sufficiently small that $\Delta/2 < \zeta_4\zeta_1$, so that condition
(9.1.33) is satisfied when $r = 0$. Then from (9.1.32), (9.1.34), and (9.1.36)
we obtain
r (N +k)
(9.1. 37)
Let r 0 denote the largest integer satisfying condition (9.1. 33). Then from
(9.1.37), (9.1.34), and (9.1.33) we obtain
L (k) F 8
Ree(o3() + F01()+
8 ... +Fo,,0 (8))> Co~o2(~)
X r(N+k> (9.1.38)
(~ v-~ t+k ·
By virtue of (9.1.35) we obtain
ReL~k)(Fo, r 0 +1(8)+Fo, r 0 +2(8)+ · · · +Fo,M-1(6))
=H f(N+kk). (9.1.39)
(~4~1)
If we keep (9.1.20) in mind and compare (9.1.29), (9.1.20), and (9.1.38), we
conclude that, if $\Delta$ is chosen sufficiently small in comparison with $\zeta_4\zeta_1$ and
$\zeta_2$ (which leads to choice of a sufficiently large number $R$), for example if we
1 f(N+k)
R.e La Io (0) > 4 co~o (A)
(kl
_ N+k (9.1.40)
(~ v~)
for $\theta = -D_0 + \Delta/2 + i\Delta/(10k^2)$, but this inequality, as we shall see below, leads
to a contradiction.
Let us now consider the case in which the numbers $n_1$ and $n_2$ are of like
parity and let us show that an inequality analogous to (9.1.40) holds. In this case
the number $N = (n_1 + n_2 - 1)/2$ is half of an odd number, and the expression
$$[(\theta + A)(\theta + B)]^N, \qquad \theta = -D_0 + \frac{\Delta}{2} + i\,\frac{\Delta}{10k^2}.$$
For $\theta \in \Lambda$, if we assume that $\operatorname{arc}\theta = 0$ for $\theta > 0$ we find, by virtue of (9.1.8)
and (9.1.26),
and (9.1.26),
1 = tN 1 •
((0 +A) (0 + B)]N ((- 0 - A) (0 + B))N ,.
arc ( - 0 - A) = 0 ( (i 2) ) , (9.1.43)
arc (0 + B) = 0 ( ~2 ) • (9.1.44)
1 - tN 1 1
[(0+A)(O+B))N - (-0-A)N (0+B>N'
and we can apply our previous reasoning with minor modifications: instead of
(9.1.12) we write the estimate
for
$$\theta = -D_0 + \frac{\Delta}{2} + i\,\frac{\Delta}{10k^2}. \qquad (9.1.50)$$
For values of $\theta$ satisfying equation (9.1.50), we obtain from (9.1.40) and (9.1.49)
the inequality
'I 4ik)I (0) I > Co~o (Ll) r + k)
ll 1-~ r+k
(N (9.1-51)
0 4 (
$$L^{(k)}_\theta\,\alpha\,\theta^{-n_2/2}(1 + \theta)^{-N + 1/2}.$$
Using the considerations discussed at the beginning of this section, we see that
$$L^{(k)}_\theta\,\alpha\,\theta^{-n_2/2}(1 + \theta)^{-N + 1/2} = H\,\frac{\Gamma(N + k)}{\left(D_0 - \frac{\Delta}{2} - 1\right)^{N + k}}. \qquad (9.1.52)$$
If $\Delta$ is chosen sufficiently small and then $k$ is chosen sufficiently large, then
(9.1.51) will contradict (9.1.52), which completes the proof of our theorem.
We note that, when the sizes of the samples are of opposite parity and N is
an integer, one can considerably simplify and shorten the proof of the theorem.
(9.2.1)
Consider the case $n_1 \neq n_2$ (we assume without loss of generality that $n_1 < n_2$).
Here we could use only $n_1$ observations $y_1, \ldots, y_{n_1}$ from the second sample
and construct a similar test of the form (9.2.1). If we did this however, there would
be a definite loss of information. Therefore Scheffé [79] proposed in 1943 a new
variant of a test of the form (9.2.1) for this case.
We introduce the linear forms
$$l_i = x_i - \sum_{j=1}^{n_2} c_{ij}\,y_j \qquad (i = 1, 2, \ldots, n_1). \qquad (9.2.2)$$
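These forms $l_i$ constitute a normal vector, which we shall write as a column
matrix
$$l = (l_1, \ldots, l_{n_1})^T = x - Cy,$$
where
$$x = (x_1, \ldots, x_{n_1})^T, \qquad y = (y_1, \ldots, y_{n_2})^T.$$
Suppose that $a_1 - a_2 = \delta$. For the equations
$$El = (\delta, \ldots, \delta)^T = \boldsymbol{\delta}, \qquad E(l - \boldsymbol{\delta})(l - \boldsymbol{\delta})^T = \sigma^2 I_{n_1 n_1}$$
to hold for $\sigma^2 > 0$ (for calculation with random matrices, the reader is referred,
for example, to [36]), it is necessary and sufficient that
$$a_1 - a_2 \sum_{j=1}^{n_2} c_{ij} = a_1 - a_2 = \delta, \qquad (9.2.3)$$
so that
$$\sum_{j=1}^{n_2} c_{ij} = 1 \qquad (i = 1, 2, \ldots, n_1), \qquad (9.2.4)$$
$$E[(x - Cy - \boldsymbol{\delta})(x - Cy - \boldsymbol{\delta})^T] = \sigma^2 I_{n_1 n_1}. \qquad (9.2.5)$$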
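One matrix that meets the row-sum condition and makes the covariance of $l = x - Cy$ a multiple of the identity is the classical choice usually attributed to Scheffé: $c_{ij} = \delta_{ij}\sqrt{n_1/n_2} - 1/\sqrt{n_1 n_2} + 1/n_2$ for $j \leq n_1$ and $c_{ij} = 1/n_2$ for $j > n_1$. The sketch below builds it and computes the covariance of $l$ for independent samples with variances $\sigma_1^2$ and $\sigma_2^2$. It is offered only as an illustration under that assumption; whether this is the matrix selected later in the section cannot be confirmed from the surviving text, and the function names are hypothetical.

```python
import math

def scheffe_matrix(n1, n2):
    """n1 x n2 matrix with unit row sums and C C^T = (n1 / n2) * I (needs n1 <= n2)."""
    if not 0 < n1 <= n2:
        raise ValueError("requires 0 < n1 <= n2")
    r = math.sqrt(n1 / n2)
    b = 1.0 / n2 - 1.0 / math.sqrt(n1 * n2)
    rows = []
    for i in range(n1):
        row = []
        for j in range(n2):
            if j < n1:
                row.append(b + (r if i == j else 0.0))
            else:
                row.append(1.0 / n2)
        rows.append(row)
    return rows

def covariance_of_forms(C, sigma1_sq, sigma2_sq):
    """Cov(x - C y) = sigma1^2 * I + sigma2^2 * C C^T for independent samples."""
    n1 = len(C)
    return [
        [
            (sigma1_sq if i == k else 0.0)
            + sigma2_sq * sum(p * q for p, q in zip(C[i], C[k]))
            for k in range(n1)
        ]
        for i in range(n1)
    ]
```

With `n1 = 3` and `n2 = 5`, the rows of `scheffe_matrix(3, 5)` each sum to 1 and the covariance is diagonal with entries $\sigma_1^2 + (n_1/n_2)\sigma_2^2$.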
Define
s= ri=
(9.2.9)
we can see that the mean length of such a confidence interval is minimized when
$\sigma^2$ is minimized. By virtue of the correspondence between the confidence
intervals and the tests of the hypotheses that is described in [36], this property
of the confidence interval will be a useful property of a similar test (9.2.8). We
note on the basis of (9.2.7) that it is expedient to find matrices $C$ that minimize
§3. BARTLETT'S TEST. 181
- !_=l
n,
r= (;~;::-X)' iicy,-YJ•)"'.
This puts the test (9. 2.1) in the form
(9.3.1)
$$f_n(r) = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-2}{2}\right)}\,(1 - r^2)^{\frac{n-4}{2}}. \tag{9.3.2}$$
(9.3. 5)
Since r is stochastically independent of $\xi$ and $\eta$, we obtain from what was said
above a randomized test $\phi(\xi, \eta)$ defined (for given $C_1$) by
$\phi(\xi, \eta) = 0$
that is, under condition (9.3.3). We note that under this last condition $r(\xi, \eta) \le 1$.
The condition $|\xi| = C_1 |\eta - 1|$ defines the boundary of the "randomized critical
zone." Inside the quadrant $\xi \ge 0$, $\eta \ge 0$ this zone has the form of a right angle
with vertex at the point (0, 1) and bisector parallel to the $\xi$-axis. In the quadrant
$\xi \le 0$, $\eta \ge 0$ the boundary of the zone is the mirror image of this angle about the
$\eta$-axis.
Returning to the sufficient statistics, we note that the boundary can be represented
in the very simple form
$$\frac{|\bar{x} - \bar{y}|}{|s_1 - s_2|} = C_1. \tag{9.3.7}$$
If the sizes of the samples are all n = 4 then we see from (9.3.2) that $f_n(r)$ is
constant and equal to $\Gamma(3/2)/(\sqrt{\pi}\,\Gamma(1)) = 1/2$. Thus in this case the form of the test
$\phi(\xi, \eta)$ is simplified.
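The constancy of $f_n(r)$ for n = 4 can be confirmed numerically. The following is a minimal sketch, assuming that (9.3.2) is the usual density of the sample correlation coefficient, $f_n(r) = \Gamma((n-1)/2)\,(1-r^2)^{(n-4)/2} / (\sqrt{\pi}\,\Gamma((n-2)/2))$; the helper names are hypothetical and the integration is a plain midpoint rule.

```python
import math

def corr_density(r, n):
    """Assumed form of f_n(r) from (9.3.2), for |r| < 1 and n >= 4."""
    const = math.gamma((n - 1) / 2) / (math.sqrt(math.pi) * math.gamma((n - 2) / 2))
    return const * (1.0 - r * r) ** ((n - 4) / 2)

def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# For n = 4 the exponent vanishes, so the density is flat on (-1, 1).
flat_value = corr_density(0.3, 4)

# For another sample size (n = 6) the density should still integrate to 1.
total_mass_n6 = integrate(lambda r: corr_density(r, 6), -1.0, 1.0)
```

A flat density of height 1/2 on (-1, 1) means that r is uniformly distributed there, which is what simplifies the randomized test in this case.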
The fact that the randomized critical zone extends to the point (0,1) is quite
§4- TESTS OF BARTLETT·SCHEFFE TYPE 183
geneous test. (In this connection see § 3 of Chapter VIII, where a somewhat
weaker assertion is proved in detail.)
Fisher problem. The same method can be applied to achieve purposes that in a
way are just the opposite. One may seek to characterize the known similar tests
by exhibiting certain simple properties of a qualitative nature that distinguish
them from other tests. Apparently this can be done for many familiar tests in
multivariate analysis and Fisher's analysis of variance. In this section we shall,
following [47], do this for the classical Bartlett-Scheffé test described in the
where C is a constant.
The expression (9.4.1) can be represented in the form
(9.4.2)
where $\sigma^2 = \sigma_1^2 + \sigma_2^2$.
Thus the variances of X and of the linear forms $l_i$ are also proportional for arbitrary
values of $\sigma_1$ and $\sigma_2$. The test (9.4.1) is defined on the sample space W
determined by the linear statistics $X, l_1, \ldots, l_\mu$. Using the results of § 2, one can
easily write the corresponding probability density.
Here we have a one-parameter exponential family with parameter $n/2\sigma^2 =
n/2(\sigma_1^2 + \sigma_2^2)$ and with a single sufficient statistic
(9.4-5)
which is a quadratic form in its variables. By virtue of Theorem 4.2.2 we see that
the family of distributions generated by the sufficient statistic Q1 is boundedly
complete and that all similar zones are determined by the Neyman zones. Thus the
critical zone $Z_C$ is a Neyman zone, which one can also verify directly by noting
that the left side of (9.4.1) is stochastically independent of $Q_1$.
We also mention a characteristic property of the test (9.4.1): there exists a
positive constant $\epsilon$ (an arbitrary number less than C) such that the inequality
implies that the test assumes the null hypothesis $H_0$. Thus the test that we are
studying has the following two properties.
I. It is defined on the sample space defined by the forms $X, l_1, \ldots, l_\mu$.
II. It assumes the null hypothesis $H_0$ when inequality (9.4.6) is satisfied.
Here our test is a Neyman structure defined by $X, l_1, \ldots, l_\mu$. Therefore a
complete description of tests of the Bartlett-Scheffé type reduces to construction
of Neyman structures (see § 2, Chapter IV).
The purpose of what follows is to prove that properties I and II already
characterize the corresponding tests since they are Neyman structures for suitable
exponential families.
Suppose that we have a randomized or unrandomized test
$\phi = \phi(x_1, \ldots, x_{n_1}; y_1, \ldots, y_{n_2})$ for the Behrens-Fisher problem. Here, of
course,
Suppose that the test $\phi$ is defined in the sample space generated by $\mu + 1$ linear
forms $X, l_1, \ldots, l_\mu$, where $\mu \le n_1 + n_2 - 1$, that are linearly and stochastically
independent. Suppose that under the hypothesis $H_0$: $a_1 = a_2$ we have
(9.4-8)
In addition the standardizer T must be related to the leader X and the test $\phi$
as follows: there exists a positive number $\epsilon_0$ such that the inequality
(9.4.9)
implies the test $\phi$ assumes the null hypothesis $H_0$ with probability 1. Of course
the test may assume this hypothesis even though inequality (9.4.9) is not
satisfied.
Now we can formulate the fundamental theorem.
Theorem 9.4.1. Suppose that a randomized similar test $\phi =
\phi(x_1, \ldots, x_{n_1}; y_1, \ldots, y_{n_2})$ is defined on the sample space of linear forms
$X, l_1, \ldots, l_\mu$ (satisfying the conditions listed above). Suppose that it has a leader
X and a standardizer T such that it assumes the null hypothesis $H_0$ at least when
condition (9.4.9) is satisfied. Then
$$\frac{D(l_i)}{D(X)} = a_i \qquad (i = 1, 2, \ldots, \mu), \tag{9.4.10}$$
where the $a_i$ are positive constants independent of $\sigma_1$ and $\sigma_2$ and our test is a
Neyman structure for the exponential family generated by $X, l_1, \ldots, l_\mu$.
We define $D(X) = \sigma^2 = f(\sigma_1, \sigma_2)$, where f is a binary quadratic form. Then
$D(l_i) = a_i \sigma^2$ and the exponential family generated by $X, l_1, \ldots, l_\mu$ is a one-parameter
family (the parameter being $\sigma$) and has the sufficient statistic
$$Q_0 = X^2 + \sum_{i=1}^{\mu} \frac{l_i^2}{a_i}.$$
Corollary. Under the conditions of the theorem the test $\phi$ is a Neyman structure
for the statistic $Q_0 = X^2 + \sum_{i=1}^{\mu} l_i^2/a_i$. If it is unrandomized, it is obtained by
singling out the regions of given conditional probability on the surfaces of level
$Q_0$ and "gluing" them together.
In particular, for tests of the Bartlett-Scheffé type, the quantity $X = \bar{x} - \bar{y}$
serves as a leader and $T = \sum_{i=1}^{n} [(x_i - \bar{x}) - (y_i - \bar{y})]^2 = T(l_1, \ldots, l_{n-1})$, where
$l_i = (x_i - \bar{x}) - (y_i - \bar{y})$ (for $i = 1, 2, \ldots, n - 1$), serves as standardizer. Therefore
the test is a Neyman structure of the special form (9.4.1).
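For samples of equal size the leader and the standardizer of the Bartlett-Scheffé type are easy to compute. The sketch below assumes the reading $X = \bar{x} - \bar{y}$ and $T = \sum_i [(x_i - \bar{x}) - (y_i - \bar{y})]^2$; the function name and the sample values are hypothetical. Note that $l_i$ is simply the centered paired difference $d_i - \bar{d}$ with $d_i = x_i - y_i$.

```python
def leader_and_standardizer(x, y):
    """Return (X, T, forms) for two samples of equal size n >= 2.

    X     = mean(x) - mean(y)                       (the leader)
    forms = (x_i - mean(x)) - (y_i - mean(y))       (the linear forms l_i)
    T     = sum of squares of the forms             (the standardizer)
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have the same size n >= 2")
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    forms = [(xi - xbar) - (yi - ybar) for xi, yi in zip(x, y)]
    X = xbar - ybar
    T = sum(l * l for l in forms)
    return X, T, forms

# Illustrative data only.
X, T, forms = leader_and_standardizer([1.0, 2.0, 4.0], [0.5, 2.5, 3.0])
```

The forms sum to zero, which is why only $l_1, \ldots, l_{n-1}$ enter as independent arguments of T.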
We note also that if $T = T(l_1, \ldots, l_\mu)$ is a standardizer corresponding to a
positive number $\epsilon_0$, it can always be replaced with the standardizer $T_1 =
l_1^2 + \cdots + l_\mu^2$ by replacing the number $\epsilon_0$ with $\epsilon_1 < \epsilon_0$. To see this, note that if a
test $\phi$ assumes $H_0$ under the condition (9.4.9), it also assumes it under the
condition
(Here and in what follows the letters b, c, d and e with subscripts are positive
constants.) We set
(9.4.12)
Thus
(9-4-13)
J... f
a;
cpl (X, Li, .. ., lµ)
x exp [ - ~ (
\
±
~. :~2 + Cj1~1 ~C/2~2
l=l
I µ
)] dX dl1 ••• diµ
I
= A 0 (tf 1 +tr2 )~ II (cn-6- + c '6' )2,
1 12 2 (9."4.14)
l=l
which is identically satisfied for 'l'l- 1, -6- 2 > O. Here A 0 > O is an absolute constant
and X is the space of the variables (X, l 1, • · • , l )· Let us set
I µ I
(9."4.17)
For arbitrary T/ > O the integral of this function with respect to cu from T/ to °"
converges absolutely. Furthermore
00
f
0
<iPe --f-oo d©= ( ;-ra-1 f (l +a)= a-a-IQ(a). (9.4.18)
where G (a) is a function that is regular in the half plane Re a> - 1. Let us 0
s... J
x
ci>1 (X, 11, ••• ' lµ)
-a-1-µ+l
t7 )
X (
, x2
01+02
+~
~
.0 + .0 +a
';
C11 I C12 2
2
dX dl 1 ••• dlµ
i=I
µ
(9.4.20)
Here a and e are arbitrary positive numbers and G1 (T0 ) is a function that is
regular for Re T0 > O.
Now consider the fractions
r = 10-:+.-0
Ct1 Ct2
• i= 1, 2, ...• µ. (9.4. 21)
Let us suppose that this property is not satisfied for all indices i = 1, 2, · · ·
• • • , µ· Suppose that it is satisfied for i = 1, 2, · · · , p, where O ::; p < µ, and that
it is not satisfied for i = p + 1, · · · , µ· (Of course this reindexing of the l i does
not restrict the generality.)
For i = 1, 2, · · ·, p, we write bi= 1/cil = 1/c iZ" From (9.4.20), we obtain
dX dl 1 ••• dlµ
x
µ
(9. 4. 22)
JI (cile+ ci2)'1•.
i=p+I
Here A 1 =A(b 1 , ... ,bp)-Yi>o. Also µ-p?.1 since p<µ.
The expression (9.4.22) is an analytic regular function of e for Re e > O.
Obviously it can be extended as a regular function to the complex &-plane cut
e e
along the negative axis - oo < Re :S 0, Im = O. We define its values as we
approach the upper and lower edges of the cut in such a way as to preserve its
continuity with the aid of rectifiable paths extending from points of the positive
axis Ree> 0, Im e = O. Then the two sides of (9.4.22) coincide. We define
QI --x2+l~+
I . . . + LP.z
·~ l~
Qze= ~ ci1e+ci2 +a. (9.4.23)
i=p+l
Let us set
(9.4.24)
where ' i s a small positive number, which we shall make arbitrarily small in
what follows.
We partition the region of integration ~:
-oo < X <cc, -cc< li < +oo (i= 1, 2, ... , µ)
into "layers:"
IXI<~ (9.4.25)
and
IX IE (2m-l?",
.,, 2m1"]
.,, (m=l,2,3, ... ). (9.4.26)
Let us find an upper bound for the absolute value of the expression on the
left-hand side of (9.4.22), where () is the value indicated in (9.4.24). Consider
the integral over the "layer" (9.4.25), On the basis of condition (9.4-9) regarding
the leader and the standardizer of the test we havecf>= O for T ~ JXI 0• By v; ,
the homogeneity of the standardizer T and its properties we then see that for
¢-f, 0
(9.4-27)
(9-4-28)
Furthermore,
(9.4.29)
where B is a bounded function. (In what follows the letter B will not always de·
note the same bounded function.) From this we see that for sufficiently small. ,,
(assuming 'sufficiently small). Hence the portion of the integral (9.4-22) over
the layer Lm is bounded by
__ B_(_2_m_~_)µ_+_i__
_µ_+_l +To
= B' -2T,2m"t,.
(9.4-34)
( 2 m-2~2) 2
sufficient statistic
$$Q_0 = X^2 + \sum_{i=1}^{\mu} \frac{l_i^2}{a_i},$$
so that in this case all similar tests are Neyman structures with probability
$$E\{\phi \mid Q_0\} = \alpha$$
for all values of $Q_0$. This completes the proof of the theorem. Its corollary
regarding the structure of unrandomized similar tests can be proven directly.
The Bartlett·Scheffe test is the special form of Neyman structure defined by
the left-hand member of formula (9.2.S).
Special forms of linear forms $l_i$ are chosen in such a way that property
(9.4.10), which we have proven, is satisfied. Just which of the various Neyman
structures are preferred in this sense depends on the requirements imposed on the
power properties of the test.
We note that the method that we have just expounded is applicable to the
study of the possible construction of many tests in multivariate analysis.
CHAPTER X
a(I X,--Y, I·
S21
~)~o
S21
(where i = 1, 2, • • ·, K and the function G is the same for all values of i) that
is similar for all pairs of samples and that has a prescribed level a E (0, 1).
Proof of Theorem 10.1.1. To construct a test $\Phi = \Phi(|\bar{x} - \bar{y}|/s_2, s_1/s_2)$
that is of level $\alpha \in (0, 1)$ and that assumes only the values 0 and 1, it will be
sufficient to construct a cotest $\Psi = \Phi - \alpha$. The function $\Psi$ must assume only the
values $-\alpha$ and $1 - \alpha$. Define $\xi = (\bar{x} - \bar{y})/s_2$, $\eta = s_1/s_2$. Consider the critics
$A(\xi, \eta)$ and $B(\xi, \eta)$. Let $\Phi(\xi, \eta)$ denote the characteristic function of the
193
194 X. UNRANDOMIZED HOMOGENEOUS SIMILAR TEST
critical zone of the test sought and let $\Omega_1$ denote the quadrant $\xi \ge 0$, $\eta \ge 0$.
Then to find $\Phi(\xi, \eta)$ we have (see (8.4.2))
(10.1.1)
for arbitrary $\theta > 0$. Here $N = (n_1 + n_2 - 1)/2$. For the cotest $\Psi(\xi, \eta)$ we have
the identity
(10.1.2)
for all $\theta > 0$. If the function $\Psi(\xi, \eta)$ assumes only the values $-\alpha$ and $1 - \alpha$ and
if it satisfies (10.1.2), it will be the desired cotest.
By hypothesis the numbers $n_1$ and $n_2$ are of opposite parity. It follows that
the number N is an integer. Let us assume that $n_1 \ge 2$ and $n_2 \ge 2$. Since these
numbers are of opposite parity at least one of them is 3 or greater and $N \ge 2$.
In the quadrant $\Omega_1$: $\xi \ge 0$, $\eta \ge 0$ the critics $A(\xi, \eta)$ and $B(\xi, \eta)$ constitute
a coordinate system. Through each point of this quadrant passes one and only
one critic of each of the two types. In examining the integral (10.1.2) we shall
find it convenient to change to this coordinate system. To do this we need to
find the Jacobian $\partial(\xi, \eta)/\partial(A, B)$. Since $(\theta + A)(\theta + B) = \theta^2 + \theta(1 + \xi^2 + \eta^2) + \eta^2$,
we have $A + B = 1 + \xi^2 + \eta^2$ and $AB = \eta^2$. Thus $\xi = [(B - 1)(1 - A)]^{1/2}$ and
$\eta = (AB)^{1/2}$. Therefore
$$\frac{\partial(\xi, \eta)}{\partial(A, B)} = -\frac{1}{4}\,\frac{B - A}{(AB)^{1/2}\,(B - 1)^{1/2}\,(1 - A)^{1/2}}. \tag{10.1.3}$$
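The change of coordinates and the Jacobian (10.1.3) can be checked numerically. This is a sketch under the relations stated above, $A + B = 1 + \xi^2 + \eta^2$ and $AB = \eta^2$ with $A \le 1 \le B$; the function names are hypothetical, and the derivative is approximated by central differences rather than computed symbolically.

```python
import math

def to_AB(xi, eta):
    """Roots A <= B of t^2 - (1 + xi^2 + eta^2) t + eta^2 = 0."""
    s = 1.0 + xi * xi + eta * eta
    p = eta * eta
    disc = math.sqrt(s * s - 4.0 * p)
    return (s - disc) / 2.0, (s + disc) / 2.0

def from_AB(A, B):
    """Inverse map: xi = sqrt((B - 1)(1 - A)), eta = sqrt(A B)."""
    return math.sqrt((B - 1.0) * (1.0 - A)), math.sqrt(A * B)

def jacobian_formula(A, B):
    """Closed form corresponding to (10.1.3)."""
    return -0.25 * (B - A) / (math.sqrt(A * B) * math.sqrt(B - 1.0) * math.sqrt(1.0 - A))

def jacobian_numeric(A, B, h=1e-6):
    """Central-difference approximation of d(xi, eta)/d(A, B)."""
    xi_A = (from_AB(A + h, B)[0] - from_AB(A - h, B)[0]) / (2 * h)
    xi_B = (from_AB(A, B + h)[0] - from_AB(A, B - h)[0]) / (2 * h)
    eta_A = (from_AB(A + h, B)[1] - from_AB(A - h, B)[1]) / (2 * h)
    eta_B = (from_AB(A, B + h)[1] - from_AB(A, B - h)[1]) / (2 * h)
    return xi_A * eta_B - xi_B * eta_A

A, B = to_AB(0.7, 1.3)            # an arbitrary interior point of the quadrant
xi_back, eta_back = from_AB(A, B)
```

The identity $(1 - A)(B - 1) = (A + B) - AB - 1 = \xi^2$ is what makes the inverse map explicit.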
Inserting this expression into (10.1.2) and making the change of coordinates,
we obtain an equation for finding the cotest $\Psi(\xi, \eta) = \Psi_1(A, B)$:
=f f Y
n 1-3
co
II= UIIk.
k"'O
Let us construct the function '11 (A, B) in such a way that for every k = O, 1, 2, · • •
we have
ffV
n 1 -3
N 1-2-k-l
Jk(6)= ~Dm
m=l
f
1-2-k
dA
co n,-3
+~Emf dB f dA
m=l 1 1-2-k
n 1 -3
'l'1(A, B) (AB)-2 -(B-A)m (10.1.7)
X Vl A YB-l(B-A) 2N- 1 (e+B>m
Define
$$p_m(A, B) = \frac{(AB)^{\frac{n_1 - 3}{2}}\,(B - A)^m}{\sqrt{1 - A}\,\sqrt{B - 1}\,(B - A)^{2N - 1}}, \qquad m = 1, 2, \ldots, N.$$
On each rectangle $Q_{ks}$ we choose a constant $C_{ks} > 0$ such that the functions
$$p_{mks}(A, B) = \frac{1}{C_{ks}}\,p_m(A, B), \qquad m = 1, 2, \ldots, N,$$
will be probability densities with respect to Lebesgue measure on that rectangle.
Obviously this is possible since $B > A$, so that $p_m(A, B) \ge 0$. Furthermore the
$p_{mks}(A, B)$ are continuous functions on the closure $\bar{Q}_{ks}$ of the rectangle $Q_{ks}$
for $k = 0, 1, 2, \ldots$.
At this point we apply the Romanovskil'-Sudakov theorem (which we shall
prove
_ k (A,
in the next section) to the probability densities p ms _ B) considered on
Qks for each k, s, and N. For given k and s suppose that Qks is of the form
a ::S A $ b, c $ B $ d. By Theorem 10.2.1 we can construct a measurable dense ·set
U = Uks such that, for m = 1, 2, . • • , N,
b b
c c
for almost all XE (a, b]
for s = 1, • · ·, n, where I A (x, y) is the characteristic function of the set A.
Without loss of generality we may assume that a = c = 0 and b = d = 1. The
lJ
I 1
f (x,
y) Ps (x, y) dx,
In the dual space every closed convex bounded subset is compact in the weak
topology (see [ 8] ).
Let us show that $D_f$ is convex, closed and bounded and hence that it is compact
in the topology of $\sigma(L^\infty, L^1)$. That $D_f$
is bounded follows from the third of
conditions (10.2.2). Each of conditions (10.2.2) defines a closed convex subset
of $L^\infty$. Proof of the convexity presents no difficulties. The fact that the
set defined by the third of conditions (10.2.2) is closed is also obvious. It remains
to show that the first two of conditions (10.2.2) define closed subsets. The
first of these conditions is equivalent to the condition that for an arbitrary
bounded measurable function $a(y)$, defined on [0, 1],
l l
ff g(x,
M
y)a (y) Ps (x, y) dx dy = const.
1) A neighborhood of an element g(x, y) of L00 (M} in the topology of CT(L00 , LI) is any
set containing a set V = V(g; qI, • • •, qk, f) consisting of all elements g(x, y) of L 00(M)
such that
Since $a(y)\,p_s(x, y) \in L^1(M)$, the set defined by the last equation is a closed
hyperplane; hence the intersection of all such hyperplanes as $a(y)$ ranges over
the same set of all bounded measurable functions is also closed. The situation is
analogous with the second of conditions (10.2.2). The set $D_f$, being the intersection
of the sets examined above, is closed, convex and bounded; hence it is compact.
The proof of Theorem 10.2.2 will be complete if we can show that $D_f$ contains
functions $l(x, y)$ that assume only the values 1 and 0. Actually, not only
does $D_f$ contain such functions but it contains enough of them in the following
sense.
A point a in a convex set D is called an extremal point of that set if the relations
$b \in D$, $c \in D$ and $a = (b + c)/2$ imply that $a = b = c$. It is easy to see
that each function $l(x, y) \in D_f$ that assumes only the values 0 and 1 is an extremal
point of $D_f$. On the other hand, from the well-known Krein-Milman theorem
[8], every convex compact set in a locally convex Hausdorff space is the closed
convex hull of the set of its extremal points. In particular this applies to the compact
set $D_f$ in the space $L^\infty(M)$ equipped with the weak topology of the dual
space. Since the set $D_f$ is nonempty, the set of its extremal points is also nonempty
and even "sufficiently rich". It remains to show that the role of extremal
points of $D_f$ can be played only by functions that assume only the two values 0
and 1. To do this we construct, for every function $g(x, y) \in D_f$ for which the
measure of the set $\{(x, y) \in M;\ 0 < g(x, y) < 1\}$ is positive, a nonzero function
$h(x, y)$ such that $g + h \in D_f$ and $g - h \in D_f$. This will show that g is not an extremal
point of $D_f$.
We begin our construction of such a function h(x, y) by choosing a set B CM
of positive measure on which the function g(x, y) is separated from both zero and
unity by a positive amount:
mes n
i=l-1
l, J=O
(B- {(e,, 6i)}) > O.
§z. ROMANOVSKII·SUDAKOV THEOREM 201
l=k-1
n
J•l-1
l, J=O
(B-{(ei, l>i)}),
let us choose a subset C of positive measure with diameter not exceeding the
smallest of the differences $|\epsilon_p - \epsilon_q|$, $|\delta_p - \delta_q|$ for $p \ne q$. Obviously
$(C + \{(\epsilon_{i_1}, \delta_{j_1})\}) \cap (C + \{(\epsilon_{i_2}, \delta_{j_2})\}) = \emptyset$ for $(i_1, j_1) \ne (i_2, j_2)$ and $C + \{(\epsilon_i, \delta_j)\} \subset B$
for arbitrary i, j.
For an arbitrary fixed point $(x, y) \in C$ let us consider the homogeneous
linear system of equations in the unknowns $h_{ij}$:
$$\sum_{i=0}^{k-1} p_s(t_{ij})\,h_{ij} = 0, \qquad s = 1, \ldots, n, \quad j = 0, \ldots, l - 1,$$
$$\sum_{j=0}^{l-1} p_s(t_{ij})\,h_{ij} = 0, \qquad s = 1, \ldots, n, \quad i = 0, \ldots, k - 1, \tag{10.2.3}$$
where $t_{ij} = t_{ij}(x, y) = (x, y) + (\epsilon_i, \delta_j)$. Obviously the rank of the matrix of coefficients
of this system does not exceed k(k + l - 1).
Let us now reindex the unknowns $h_{ij}$ in a single sequence by assigning to
each a single index number: $h_1, h_2, \ldots, h_{kl}$ (where kl represents a single integer).
To every point $(x, y) \in C$ let us assign the number $N(x, y)$ such that the
rank of the submatrix of coefficients of the system (10.2.3) consisting of the coefficients
of $h_1, \ldots, h_N$ is equal to N and the rank of the submatrix of coefficients
of the system (10.2.3) consisting of the coefficients of $h_1, \ldots, h_{N+1}$ is
also equal to N. If all the coefficients of $h_1$ are equal to 0 we take N = 0.
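The number N defined above is the count of leading columns of the coefficient matrix that are linearly independent before the first dependent column appears. A minimal sketch of that computation follows, assuming exact rational coefficients for illustration (in the text the coefficients are real values of densities); the function name and the sample matrices are hypothetical.

```python
from fractions import Fraction

def leading_independent_columns(rows):
    """Return N: the first N columns are independent and column N+1 (if any)
    depends on them. Returns 0 when the first column is entirely zero, and the
    number of columns when all columns are independent."""
    ncols = len(rows[0])
    basis = []  # list of (pivot_row, normalized reduced column)
    for c in range(ncols):
        v = [Fraction(r[c]) for r in rows]
        # Eliminate the components along the columns accepted so far.
        for pivot, b in basis:
            coef = v[pivot]
            if coef != 0:
                v = [vi - coef * bi for vi, bi in zip(v, b)]
        pivot = next((i for i, vi in enumerate(v) if vi != 0), None)
        if pivot is None:
            return c  # column c depends on the previous c columns
        lead = v[pivot]
        basis.append((pivot, [vi / lead for vi in v]))
    return ncols

N_example = leading_independent_columns([[1, 0, 1, 2],
                                         [0, 1, 1, 0]])
N_zero_first = leading_independent_columns([[0, 1],
                                            [0, 2]])
```

Once N is known, the unknown $h_{N+1}$ can be given an arbitrary value, the later unknowns set to zero, and $h_1, \ldots, h_N$ solved for uniquely, as the text goes on to do.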
Now consider a subset C 1 of C that is of positive measure and on which
the function N(x, y) assumes a constant value. By the choice of N the system
(10.2.3) will be satisfied if we assign to the unknown hN+l an arbitrary value
hN + 1 =f by setting hN +Z = · • • = hkl = O and then define unambiguously h 1, · • •
• • •, hN in the system (10.2.3). Here the functions h 1 = h 1 (x, y, f ), • • •, hN =
hN (x, y, f) are obviously measurable with respect to all the arguments and they
are linearly independent with 1. Let us suppose that a single f is chosen for all
(x, y) E C1 .
Now on the set
C2 = U<C
Ii
+ {(ei, l>)})
1
i=k-1
j=l-1
mes C1 ' \ LJ (C3 - {(e,. bi)})> 0.
i, j =0
~
This function h(x, y) is now defined on the entire square, and as one can
~
easily see lh(x, r)I s Q. Let us show that the function
B::::
h(x, y) = Q h(x, y)
is the function we have been seeking. Let l denote the straight line x = x 0 • If
l intersects the set C4 + fo, then
1
Jh(x 0, Y)Ps(Xo, y)dy
Q l-1
I
(C 4 + { (elo• O)}) nl
Ps (t1 01) h101dg=0,
204
§2. A. A. PETROV'S INVESTIGATIONS 205
on a measurable space ex, ff), where p(x) is the density with respect to the
measure µ.(x) on (~, ff). Here the parameters are not assumed to be interdepen-
dent.
If no other choice of alternative is prescribed by the statement of the problem
itself, we can take as an alternative a family with probability density of the form
$$p_\theta(x, \varepsilon, V) = C(\theta, \varepsilon)\,\exp[-(\theta_1 T_1(x) + \cdots + \theta_s T_s(x) + \varepsilon V(x))]\,p(x), \tag{11.1.2}$$
where $V(x)$ is a statistic and $\varepsilon$ is a small number. Here we assume that
$p_\theta(x, \varepsilon, V)$ is a probability density for small values of $\varepsilon$. For $\varepsilon = 0$ we obtain the
hypothesis $H_0$. We denote the alternative for a given value of $\varepsilon$ by $H_\varepsilon$. For an
arbitrary admissible value of $\varepsilon$ the function $T = (T_1(x), \ldots, T_s(x))$ is a sufficient
statistic for $\theta = (\theta_1, \ldots, \theta_s)$. Since the parameters are assumed to be independent
for each admissible value of $\varepsilon$, the family of sufficient statistics T is complete
(see §2, Chapter IV).
If our independent samples 0 1, • • ·, 0 q are of the same size n, we can con-
sider the conditional distributions of the samples for fixed sufficient statistics
in each sample. Such conditional distributions depend only on E, V, and the
values of the fixed sufficient statistics, and we can construct a criterion for dis-
tinguishing the hypotheses H0 and HE of a given level that has minimal condi-
tional probabilities of errors of the second kind.
For series of repeated samples 0 1, • · ·, 0 q from one-dimensional distributions
containing only parameters of displacement and dilation (these distributions are
not necessarily assumed to be exponential), the construction of similar statistics
can be achieved by constructing quite simple affine invariants of the sample space.
Some interesting studies have been made by Petrov [60] in this direction. We
shall expound these results in the following section.
"
$$S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad y_i = \frac{x_i - \bar{x}}{S}.$$
Then
$$\sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} y_i^2 = 1. \tag{11.2.1}$$
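The relations (11.2.1) and the invariance under displacement and dilation can be illustrated numerically. The sketch below assumes the reading $y_i = (x_i - \bar{x})/S$ with $S^2 = \sum_i (x_i - \bar{x})^2$; the function name and the sample values are hypothetical.

```python
import math

def standardize(sample):
    """Map a sample to y_i = (x_i - mean) / S, where S^2 = sum (x_i - mean)^2."""
    n = len(sample)
    mean = sum(sample) / n
    S = math.sqrt(sum((x - mean) ** 2 for x in sample))
    if S == 0.0:
        raise ValueError("all observations coincide; y is undefined")
    return [(x - mean) / S for x in sample]

x = [2.0, 3.0, 7.0]
y = standardize(x)

# A displacement by 5 and a (positive) dilation by 3 leave y unchanged.
y_affine = standardize([5.0 + 3.0 * xi for xi in x])
```

Because y does not change under such transformations, its distribution is free of the displacement and dilation parameters, which is the property used to build similar statistics from many small samples.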
206 XI. MANY SMALL SAMPLES
= 11/(
l =1
x-+sYt >I iJ(x., ••• ,xn)
iJ (x, S, ci>1o • • • • ci>n-2)
id-dSd
x <Jl1· · · d'Pn-2·
Po (11) Yn J Jsn-
00
dS
00
2 exp (
nx2+s2)
2
- r (n 21)
dx = --n--1::'""'"'"'-
(Jl2i'"t 0 -co 2:it_2_
where f is a small number, H (x) is a polynomial, and the function R (x, E) sat-
isfies, for all values of a, the inequality
IR (x, e) I~e2R (x). (11.2.5)
where R(x) is a polynomial that assumes nonnegative values for all real x. Let
us show that for the corresponding probability density pf(71) of the vector 71 on
the sphere @,
co co n
Ps('l'J)=Po('l'J)+eVn
0
I dS I sn- -co
2 ~(x+Syi)
l=l
n
X Il1 c:X+ Sy,)dx+O(e2)
l =1
0 (11.2.6)
where
n
II2 = fi (1 +
i=l
eH (x + Syi) + R (x + Syi• e)).
For brevity we write
H,=H(x+Sy,), Rn=R(x+SYi• e) (l= l, 2, ... , n).
Then
n n
II2'-= n (1 +eH;-t- R;)= 1 +c ~ H; =I= R.
l=l l=l
Here the quantity Ris the sum of finitely many terms of the form iPHi • • • Hi
1 p 1
Rj • ••
l =1 0 -oo
R=Vn f dS f RIT,dx.
0 -oo
sn- 2
Vnn f dS • f Rexp(- nx t
00 00
= sn- 2
dx+ 0(e2) cp.2.1) 2 8 a)
(2n) 2 o -oo
uniformly with respect to 71 (for fixed sample size n). It follows from what we
~
said above that R is the sum of finitely many terms of the form
co
f dS f
00
-
Let us denote by I the double integral in this expression. Since p + 2q ?:. 2, we
need only show that
00 co
T- I dS f sn-ZlllH(x+sY,k)I
0 -co k=l
Then
f sn- :iI [P<x+sy,) exp (- <x+:/Y1) )]~x'.·
00 00 -
T<;I dS 2 2
0 -oo l m1
Then
T<Kf j j s11 - 2
0
dS
-oo
exp [ - ~2 ] exp [ - n;2] dx
11 11-2.,_r:;;;r(n-1)
=K12 v 1tn - 2- = -K ,
which completes the proof of (11.2.7).
Let $\eta_1, \ldots, \eta_q$ denote q independent observations on the vector $\eta$. Suppose
that we are verifying the hypothesis $H_0$: $p(\eta) = p_0(\eta)$ as opposed to the alternative
$H_\varepsilon$: $p(\eta) = p_\varepsilon(\eta)$ for given $\varepsilon \ne 0$. According to the familiar Neyman-Pearson
criterion, the test that is optimal in a certain sense is based on the statistic
$$\gamma(\eta_1, \ldots, \eta_q) = \ln \frac{p_\varepsilon(\eta_1) \cdots p_\varepsilon(\eta_q)}{p_0(\eta_1) \cdots p_0(\eta_q)}. \tag{11.2.8}$$
Here it should be noted that, by what was shown above, $p_\varepsilon(\eta_i) \ne 0$ for $\eta_i$ on the sphere
and sufficiently small $\varepsilon$ ($i = 1, 2, \ldots, q$). The test will have a critical zone
$\gamma > c$ with level
$$\alpha = P(\gamma > c \mid H_0)$$
and with probability of error of the second kind
$$\beta = P(\gamma \le c \mid H_\varepsilon).$$
We consider the behavior of (11.2.8) as $q \to \infty$ with fixed n. We define
$\gamma_k = \ln(p_\varepsilon(\eta_k)/p_0(\eta_k))$, so that
$$\gamma(\eta_1, \ldots, \eta_q) = \sum_{k=1}^{q} \gamma_k.$$
The quantities $\gamma_k$ ($k = 1, 2, \ldots, q$) are independent and identically distributed.
We shall show that they have a finite third moment both under the hypothesis $H_0$
and under the hypothesis $H_\varepsilon$. Then the central limit theorem (see [33]) can be
applied to the random variable $\gamma$. We have
where
t
O(t)= J/ (x)dx,
0 a0 =E(Vk IH0 ); a 2 =E(Vk IH2 ),
-co
ta,_, c - .. r-::
qa0 t t c - qa0
• 1-a=- a,..., ..r:: •
ao r q Ge.f'·..'f:
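$$c \sim q a_0 + t_\alpha \sigma_0 \sqrt{q}, \qquad c \sim q a_\varepsilon - t_\beta \sigma_\varepsilon \sqrt{q}.$$
Therefore $t_\alpha \sigma_0 + t_\beta \sigma_\varepsilon \sim \sqrt{q}\,(a_\varepsilon - a_0)$ and
$$q \sim \frac{(t_\alpha \sigma_0 + t_\beta \sigma_\varepsilon)^2}{(a_\varepsilon - a_0)^2}. \tag{11.2.11}$$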
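The approximation (11.2.11) is straightforward to evaluate. The sketch below is illustrative only and rests on assumptions not confirmed by the garbled formulas above: $t_\alpha$ and $t_\beta$ are taken to be the upper $\alpha$- and $\beta$-quantiles of the standard normal law, $a_0$, $a_\varepsilon$ the means and $\sigma_0$, $\sigma_\varepsilon$ the standard deviations of $\gamma_k$ under $H_0$ and $H_\varepsilon$. The function name and the numerical inputs are hypothetical.

```python
from statistics import NormalDist

def approx_sample_count(alpha, beta, a0, a_eps, sigma0, sigma_eps):
    """Evaluate (t_alpha*sigma0 + t_beta*sigma_eps)^2 / (a_eps - a0)^2.

    t_alpha and t_beta are assumed to be upper quantiles of N(0, 1).
    """
    std_normal = NormalDist()
    t_alpha = std_normal.inv_cdf(1.0 - alpha)
    t_beta = std_normal.inv_cdf(1.0 - beta)
    return (t_alpha * sigma0 + t_beta * sigma_eps) ** 2 / (a_eps - a0) ** 2

# Illustrative inputs: unit variances, unit separation of the means.
q_star = approx_sample_count(0.025, 0.025, 0.0, 1.0, 1.0, 1.0)
```

Halving the separation $a_\varepsilon - a_0$ multiplies the required number of samples by four, which reflects the $\varepsilon^{-2}$ growth of q as $\varepsilon \to 0$.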
If $\alpha$ and $\varepsilon$ are fixed, the probability of error of the second kind $\beta(q, \alpha, \varepsilon) =
P(\gamma \le c \mid H_\varepsilon) \to 0$ as $q \to \infty$. For given $\beta$ let $q = q(\alpha, \beta, \varepsilon)$ denote the smallest
integer such that $\beta(q, \alpha, \varepsilon) \le \beta$. Then q is the number of observations necessary
to distinguish the hypotheses $H_0$ and $H_\varepsilon$ on the basis of a sample of
constant size q with probabilities $\alpha$ and $\beta$ of errors of the first and second kinds
respectively. We define
$$q^* = q^*(\alpha, \beta, \varepsilon) = \frac{(t_\alpha \sigma_0 + t_\beta \sigma_\varepsilon)^2}{(a_\varepsilon - a_0)^2}.$$
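To compare $q^*$ and q we use the following fact. Suppose that $q^*(\alpha, \beta, \varepsilon) \to \infty$
as $\varepsilon \to 0$ for fixed $\alpha$ and $\beta$ and that the quantities $\rho_0$ and $\rho_\varepsilon$ remain bounded.
Then
$$\frac{q(\alpha, \beta, \varepsilon)}{q^*(\alpha, \beta, \varepsilon)} \to 1. \tag{11.2.12}$$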
To prove this we note that by virtue of the expressions for a and {3 we have
c = qa 0 + ta.a 0(1 + A), where A= A( f., q)-+ 0 as f.-+ 0 and q-+ oo; also
. 2 tll+6
2 } 1 q2 -q (a, p, e)
mm { (l -6), - t 2- --q* <,-q* <, q,. (a, ~ , e)
13
qi 2
<,-.<,max { (1+6)2, -
q
2-
tll-6
t13
}
+--.
1
q
and
where
a(ri)= b
0
j j dS
-oo
sn- 2 exp ( - nx ts 2 2
rt
l=l
H (x+ Syi) dx;
b= 2
Vn •
2 n; ynr ( n 2 1 )
The function u (11) is bounded by the quantity G. Thus for sufficiently small E
- (ta+t13) 2
q ,.._, i;2µ2 ( 11.2.13)
If we set en= n/µ. 2 we have
(11.2.14)
UNSOLVED QUESTIONS
Chapter IV. 1. How can we describe all the similar zones of distribution
families of the form (4.3. 5)?
2. Let $x_1, \ldots, x_n$ ($x_i \in N(0, 1)$) denote a repeated normal sample. Let
$P_1(x_1, \ldots, x_n)$ and $P_2(x_1, \ldots, x_n)$ denote two independent polynomial statistics.
Is it always possible to "uncouple" them by means of an orthogonal transformation,
i.e. to reduce them to two statistics depending on the completeness of the
different variables? (For a given sample size n and given degrees $m_1$ and $m_2$
of the polynomials the question can be solved in a finite number of steps for all
such polynomials (see [41]).)
This section gives the description of all polynomials P(a, 1/a2) admitting
invariant verification in the sense of §2, Chapter III (strictly speaking, that
sense will be modified a little), on the evidence of a repeated sample(xp • • ·, xn)
from a normal population N(a, a 2). The method of this section is purely analytical,
in contrast, for instance, to that of E. Lehmann [13] establishing the non-
verifiability of certain functions as a consequence of the indistinguishability of
the corresponding families of distributions.
l)Editor's note. The translation of the Supplement was provided by the authors.
2) Authors' note. These refer to the Bibliography at the end of this Supplement, not
to the main Bibliography.
217
218 SUPPLEMENT
We have
- 2 ~
E 0 a-<f>(x, s) = <f>(a, a)= <f>(a, g)
'
= c0 j j gn 12 exp[- e<s + <-x -aY)Js<n-3> 12 ¢<-x, s)dx ds. (S.1.1)
-oo 0
1¢<a, ~co e
g)I ·IR!
2
ln/ exp [~:1; IIm a 12 ] (S.1.2)
Proof.
ESTIMATION AND TESTING HYPOTHESES 219
C0 !Re gjn/ 2
•expf__Re(ga) 2
[ Reg
1-ooo
j j exp[-Reg(s + :x 2 ) + 2 Re(ga):X]sCn-3)/2 ds dx
is the power function of the trivial test ¢ =1 at the point (Reg, Re (ga)/Re g).
Taking this into account, we can write the right side of (S. 1.3) in the form
C Ig
0 Re g
ln/2
exp
[Re (ga)2
Re g -
R
e
(ga 2)] .
Re (ga) - Re ga2
Ree
times a constant; also, r 2(0) = r 1(0) = O; r;(o) = 1 and the image of the mapping
r 2 : c+ - C belongs to c+. The function i/J is bounded in any closed angle
belonging to C +.
In what follows we shall denote by C * the complex plane compactified in
the usual way.
Lemma S.1.2. Let $p: \omega \to C^*$ be a function meromorphic in a domain
$\omega \subset C$ and not a constant; let $\Omega$ be the range of its values. If $\psi(p(\zeta))$,
$\zeta \in \omega$, is analytic in $\omega$, then $\psi$ is analytic in $\Omega$.
Proof. Let $z_0 \in \Omega$ be an arbitrary point distinct from $\infty$, and let $\zeta_0 \in \omega$
be a point at which $p(\zeta_0) = z_0$. If $p'(\zeta_0) \ne 0$, then in the neighborhood of the
point $z_0$ we have $z = p(w(z))$ where $w(z)$ is holomorphic at $z_0$. Hence
$\psi(z) = \psi(p(w(z)))$, which implies that $\psi$ is holomorphic at $z_0$.
Since $p \not\equiv$ const, the zeros of its derivative are isolated. Let $\zeta_0$ be one
of the zeros of $p'$ and $U \subset \omega$ a closed bounded neighborhood of the point $\zeta_0$,
containing no other zeros of $p'$ and no poles of p. Its image $V = p(U)$ is a
bounded neighborhood of the point $z_0 = p(\zeta_0)$.
In the domain $V \setminus \{z_0\}$ the function $\psi$ is bounded and was proved to be
analytic. Hence it is analytic at the point $z_0$ also.
Now let (,0 be a pole of the function P· Choose a bounded closed neighbor-
hood U C w of the point (,0 • Its image V is a neighborhood of oo, and in the
we get
Pi
r(a, ()=rm(() am+···+ r 0 (,f); r. = - €A.
' qo
We fix the point ( so that rm(() f. 0. Then r (a, () "-' rm (()am for a -> oo. The
inequality (S.1.2) implies for certain values of the constants A and B that
hence we deduce that the order of the entire function 1/1 does not exceed 2/m.
The first assertion of Theorem S. 1.1 is proved.
We pass to the proof of the second assertion. The functions r 0, p 1 , •••
• • • , Pm• q 0 are analytic in the vicinity of zero. Hence the functions
r i = pi/q 0 are analytic in the vicinity of zero, except perhaps at zero itself,
where the only possible singularity is a pole. We shall show that in fact the
coefficients r i (i = 1, · · · , m) are analytic in the vicinity of zero and r i (O) = 0.
For each i = 1, · · ·, m we have in the vicinity of zero
a. a. +1
r/() =Pi ( '+ P; ( ' + · "" '
with certain Pi f. 0 and ai. If all ai >0 our assertion is proved. Suppose
that ai <0 for a certain i; consider the quantity
a= max r_ ai ] .
i> 1 ti
Let k be the largest number for which - ak/k =a. For A€ C put
g > o,
where the branch of the root can be chosen arbitrarily. Since ak < 0, the quan-
tity a A(() is bounded for (-> 0 for any A€ C. Since ( > 0, we get from
(S.1. 2):
(S.1.5)
(S. 1. 7)
We take now a sufficiently large R. If the point ,.\ runs over the circumference
with the radius R, (S.1.7) implies that the point r(a,.\(g), g), which depends
continuously on ,.\, runs over a curve homeomorphic to that circumference and
containing the circle
On the other hand, from (S.1.5) it follows that on this curve the function t/J is
bounded by the constant C exp (8(,.\) g). Since g can be chosen as small as we
please, t/J is bounded by the constant C, which does not depend upon ,.\ on the
curve described above and hence in the circle of radius R '. As R -+ oo so
does R'; hence t/J is bounded on the whole plane; hence t/J =const. Since we
assumed that t/J ~ const we see that all a.i > 0.
Lemma S.1. 3. For each g € c+ there exists an E > 0 such that the func-
tion t/J is bounded inside the angle
!arc z - arc r m(g)I Sf·
We shall show now that the function ifJ is bounded inside the angle
(S.1.11)
We have
This implies that the domain between the curves g±(a), taken together with a
sufficiently large circle, contains the angle (S.1.10). By the inequality (S.1.2)
the function ifJ (r(a, .;±)) is bounded for real values of a. Hence the function ifJ
is bounded on the curves (S.1.11). But by (S.1.9) these curves are contained
inside an angle less than 77/2, while ifJ was shown above to be an entire function
of order not exceeding 2/m ~ 2. Hence by the Phragmen-Lindelof principle, the
function I/I is bounded in the domain lying between the curves (S. l.11), as was
to be proved. The case of odd m is dealt with by analogy with the preceding
one. * To complete the proof of Theorem S.1.1, consider the set
(S. 1.12)
Since
(S.1.13)
*With .; replaced by - .; •
In this case it follows from (S.1.13) that the set (S.1.12) contains the interval
(−π/2, π/2). Then by Lemma S.1.3 the function ψ is bounded inside the angle
|arc z| ≤ π/2 − ε for any ε > 0. Hence ψ is a function of order at least 1 and of
finite type. On the other hand, we have proved that its order does not exceed
2/m. Hence m ≤ 2. The case m = 1 is excluded because for m = 1 Lemma
S.1.3 would imply the boundedness of the function ψ also in the angle
|arc z − π| ≤ π/2 − ε for any ε > 0, which is impossible for nonconstant functions
of finite order.
It remains only to verify that the image of the mapping r_2 : C⁺ → C belongs
to C⁺. Suppose it is not so, and that for a certain g ∈ C⁺, |arc r_2(g)| > π/2.
By Lemma S.1.3 the function ψ is bounded inside the angle
|arc z − arc r_2(g)| ≤ ε.
Since this function is of the first order and bounded inside any angle of the form
|arc z| ≤ π/2 − ε, again the Phragmén-Lindelöf principle implies that it vanishes
identically. Theorem S.1.1 is proved.
Theorem S.1.2. Let p(a, g) be a polynomial which is not a function of g only.
For p(a, g) to be C-verifiable, it is necessary and sufficient that it be
representable as a linear form of
where
(S. 1.15)
Let the point g move inside the angle |arc g| ≤ π/2 − ε. Then from (S.1.2) it
follows that ψ(p(a_0, g)) is bounded. On the other hand, the first term in
(S.1.16) runs over all the values of the angle |arc g| ≤ π − 2ε. Hence the function
p(a_0, g) runs over all the values of the complex plane, except perhaps in a
certain circle and in the angle |arc z − π| ≤ 3ε. Hence the entire function ψ is
bounded on the whole plane except perhaps in an angle |arc z − π| ≤ 3ε, which
is as small as we please. Since this function is of finite order, the
Phragmén-Lindelöf principle implies that it is bounded on the whole plane and
therefore is a constant. This contradiction proves that p_2(g), p_1(g) and p_0(g)
are linear functions of g.
From (S.1.15) we get that p_2(g) = g; p_1(g) = Ag, where A is a constant.
Subtracting a suitable constant from p(a, g) we can also annul the constant term
of p_0(g). Then p(a, g) takes the form (S.1.14).
Consider now the function arc(a² + Aa + B) for real values of a. Suppose
it to be nonconstant and let φ_1 and φ_2 be two distinct values of it. In that
case, if g runs over the angle |arc g| ≤ π/2 − ε and the point a runs over
the real axis, the point g(a² + Aa + B) will take on all the values inside the
angles
From the inequality (S.1.2) it follows that ψ is bounded inside these
angles for any ε > 0.
Since φ_1 ≠ φ_2 and ψ is an entire function of the first
order, this is impossible in view of the Phragmén-Lindelöf principle. Hence
arc(a² + Aa + B) = const. On the other hand, arc(a² + Aa + B) = 0 for a → ∞.
Hence arc(a² + Aa + B) = 0, A and B are real numbers and a² + Aa + B is
nonnegative for all real a, so that A² − 4B ≤ 0.
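The last step of the necessity argument rests on an elementary fact: a monic real quadratic a² + Aa + B is nonnegative on the whole real axis exactly when A² − 4B ≤ 0. The following minimal sketch checks this with exact rational arithmetic; the function names and the sample coefficient pairs are hypothetical and serve only as an illustration.

```python
from fractions import Fraction


def is_nonnegative_on_reals(A, B):
    """Return True when a^2 + A*a + B >= 0 for every real a.

    The leading coefficient is 1, so the minimum is attained at a = -A/2
    and equals B - A^2/4; it is nonnegative exactly when A^2 - 4B <= 0.
    """
    return A * A - 4 * B <= 0


def minimum_value(A, B):
    """Exact minimum of a^2 + A*a + B over real a (attained at a = -A/2)."""
    A, B = Fraction(A), Fraction(B)
    return B - A * A / 4


# Hypothetical coefficient pairs (A, B), chosen only for illustration.
examples = [(2, 1), (0, 3), (3, 1), (-4, 4)]
for A, B in examples:
    print(A, B, is_nonnegative_on_reals(A, B), minimum_value(A, B))
```

The discriminant test and the sign of the exact minimum agree on every pair, which is the equivalence used in the text.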
Sufficiency. Consider the test
s ≥ t x²
s < t x²
for a given t > 0. It can be shown that for a certain pair of numbers (y, s_0)
with s_0 > 0 we have
This is closely related to §s, Chapter V, and we use here the notation and
terminology of that chapter.
Consider the exponential family of densities with respect to Lebesgue
measure
(S. 2.1)
with θ, T ∈ R^s. Before stating the conditions imposed upon h(T) and the parametric
set, we introduce the set ω ⊂ R^s determined in the following way. Construct
the set of θ ∈ R^s for which for a certain C = C(θ) the condition
We shall suppose that the family (S. 2.1) satisfies the following requirements.
1.1. 𝒯 = Supp h is a convex set.
1.2. ω is nonvoid.
1.3. For any ε > 0
‖h(T) exp(−ε|T|)‖_{L_2} < ∞. (S.2.3)
(Note that from (S.2.2) and (S.2.3) it follows that for any θ ∈ ω, p(T, θ) can be
considered as a probability density.)
2.1. The parameter θ takes on values in an everywhere dense subvariety
of a real algebraic variety Π ∩ ω.
for any φ(T) for which the right-hand side of the formula is finite. Let E(T) be
such a solution for P(−D + a).
Let φ(T) be a given cotest, put φ̃ = φh and
In view of the inequality (S.2.6), the function exp(a, T) φ(T) belongs to the
space L_2 with the weight exp(ε|T|) for a certain ε > 0. Therefore the property
(S.2.7) of the function E implies the inequality
Let us check the equality (S.2.1). To this end we shall establish first that
the function E(T) exp[−(a, T)] is the fundamental solution for the operator
P(−D). By the Leibniz differentiation formula we have
(exp(0 0 +a, T), ~(T)) = 0 for p(0 0 +a)= 0, \Re0 0 \ < 2f.
For sufficiently small a we have 0 0 +a€ <i.> x R;. Let N' be a connected
component of the intersection N n (w x R;), containing the point 00 + a. By
the conditions of the theorem N' has at least one real point ζ at which
grad P(ζ) ≠ 0. Since P(θ) is a polynomial with real coefficients, the condition
grad P(ζ) ≠ 0 implies that the set N' contains an open (s − 1)-dimensional
part ν of the set of the real zeros of P(θ). Now the cotest condition
implies that
Since (exp(O, T), ~(T)) is analytic in w x R~, the relation (S.2.11) holds by
the principle of analytic continuation for the whole N', i.e.
First we shall prove that Ψ does not depend upon the choice of the point a ∈ ω.
Take
We have ‖Ψ exp(a, T)‖_{−ε} < ∞ and ‖Ψ' exp(a', T)‖_{−ε} < ∞. From this and from
the convexity of the cone ω we deduce that the integrals
Ψ̂ = ∫ Ψ exp(θ, T) dT and Ψ̂' = ∫ Ψ' exp(θ, T) dT
converge absolutely for θ = b + c + a'; b ∈ ω. But
Thus the Conditions 1. 1-2. 2 obviously hold. Let us check the requirement
n
3. 1. We take an arbitrary point e € N (w x R;) and let it move into the real
subspace while remaining during the whole motion in one and the same connected
component.
1) Suppose that θ_3 ≠ 0. Consider the path formed by the points:
t ∈ (0, 1).
This path takes θ into the point (Re θ_1, Re θ_2, Re θ_3, Re θ_4) belonging to
the real subspace.
2) If θ_3 = 0 but θ_4 ≠ 0 the argument must be changed in an obvious way.
3) If θ_3 = 0, θ_4 = 0, the path
$$ \varphi = \frac{1}{h}\left[\frac{\partial^2}{\partial T_1\,\partial T_4} - \frac{\partial^2}{\partial T_1\,\partial T_3}\right]\Psi. $$
6. We can describe all the cotests for exponential families with an arbitrary
number of polynomial relations. In that case the ideal I is not necessarily
a principal one. Each cotest can be written in the form
$$ \varphi = \frac{1}{h}\sum_j P_j(-D)\,\Psi_j, $$
where P_j(θ) are the generators of the ideal I and Ψ_j(T) are, in general,
generalized functions with supports in 𝒯.
(S. 3.1)
θ_i = c_i(a)
and suppose that for a ∈ A the point θ = (θ_1, ..., θ_s) runs over an everywhere
dense subset of an algebraic subvariety of the domain Θ ⊂ R^s. This subvariety
we shall write in the form Θ ∩ Π, where Π is an algebraic variety in R^s given
by the polynomial relations
(S. 3.2)
with r < s.
The distribution of the repeated sample (xl' • • ·, xn) = ~ from the set
(S. 3.1) is given in Rn by the density with respect to Lebesgue measure
n n n }
t<n>(~; ()) = (C(()))n exp { fto(xi) + ()l ft1(xi)+·· ·+()sfts(xi) (S. 3.3)
(S. 3.4)
h(T), i.e. Supp h = {T : h(T) > 0}; let 𝒯 = int Supp h. Our condition for h(T)
consists in the following requirements:
h(T) is infinitely differentiable on 𝒯,
mes Supp h = mes 𝒯. (S.3.5)
Let g(T) be an unbiased estimate for a certain function y(θ) depending only
upon the vector of sufficient statistics T and having a finite variance for all
values of θ ∈ Θ ∩ Π. We shall investigate the conditions which are implied for
the variety Π and the estimate itself by the optimality property of the estimate
for all θ ∈ Θ ∩ Π in the class of unbiased estimates of the function y(θ) with
finite variances. Throughout this section we take the variance as the quality
measure of the estimate. It is clear that the behavior of the function g(T) outside
𝒯 is of no importance for its properties as an estimate of y(θ); therefore
we shall suppose g(T) to be defined only on 𝒯. Denote by N the least
(complex) algebraic variety in C^s containing Π ∩ Θ. Since Π is itself an
algebraic variety, Π ∩ Θ = N ∩ Θ.
Theorem S.3.1. In order for the function g(T), T ∈ 𝒯, for which E_θ g² < ∞,
θ ∈ Θ ∩ Π, to be the best unbiased estimate of the function E_θ g = y(θ) for the
exponential family (S.3.5), θ ∈ Θ ∩ Π, under condition (S.3.5) it is necessary
and sufficient that:
1) In the space C^s of variables θ_1, ..., θ_s there exists a linear system of
coordinates θ'_1, ..., θ'_s in which N is a cylinder of type L × ν, where L is
the coordinate space θ'_1 = ... = θ'_m = 0 and ν is a certain set in the space
θ'_{m+1} = ... = θ'_s = 0; 0 ≤ m ≤ s.
In fact, let g(T) be the best unbiased estimate of y(θ) and χ(T) an arbitrary
unbiased estimate of zero. Then for an arbitrary constant c
(S.3.6)
The relation N = L × ν implies that, together with the point θ_0, the set N contains
the points (θ_10, ..., θ_m0, θ_{m+1}, ..., θ_s) for all values of θ_{m+1}, ..., θ_s.
From this and from the condition E_θ(χ) = 0, θ ∈ Θ ∩ N, we deduce that
By Lemma S. 3.1, g(T) is the best unbiased estimate of the function y(O).
The proof of the necessity of the conditions of Theorem S.3.1 is more complicated
and the following lemmas are required. Denote by 𝒟 = 𝒟(𝒯) the space
of infinitely differentiable functions of T with compact supports lying in 𝒯.
For any function φ(T) ∈ 𝒟 the Laplace transform
(S. 3. 7)
because the expression to the left is equal to g x(O), where x = p(- ~- /;,)¢ sat-
isfies the condition x(e) = 0, 0 € 0 n TI. Now from (S. 3.8) we deduce that the
generalized function p(~) g vanishes on the functions of the form
¢ exp (/;,, T).
As any function from ~ can be represented in this form, p(~) g = 0 10 j° i.e.
p(O) € ~. Lemma S. 3.3 is thus proved.
Consider the ideal g= {/ 6! 6€N"
By Lemma S. 3.3, each summand I 6 belongs
to U; hence g C U. Moreover, each polynomial from g vanishes on the set
L = n
6 € N(N - (,). But if 0 € L, then for a certain (, € N, € (N - (,). Hence e
we can find a polynomial f €I~ Cg such that f(e) ,j 0. Thus L is the set of
common roots of polynomials from g and therefore is an algebraic variety.
Lemma S. 3.4. The set L = n 6 € N (N - (,) is a linear subspace, and for
each point (, € L
g6 = g_ (S. 3.9)
where f_i are certain functions holomorphic at the origin. By (S.3.9) we can
also obtain the representation
where f_i are certain functions holomorphic at an arbitrary point ζ of L. Such
a representation can obviously be obtained for each point ζ ∉ L, since we can
always find a polynomial p' ∈ g that does not vanish at ζ.
We fix now an arbitrary point ζ ∈ C^s. Denote by R_ζ the ring of all rational
functions in C^s whose denominators do not vanish at the point ζ; by ℋ_ζ we
denote the ring of all functions holomorphic at ζ. As is well known, the pair
(R_ζ, ℋ_ζ) is flat (see [15]). This means that ℋ_ζ is a flat R_ζ-module [15] and
that for each R_ζ-module E the natural operation E → E ⊗_{R_ζ} ℋ_ζ is injective.
We now take for E the factor-ring Rs \fas where gR s is the ideal in Rs formed
Ex J(~~J(~\gJ(
~
where gℋ_ζ is an analogous ideal in ℋ_ζ. Hence we can assert that the natural
mapping
R_ζ / gR_ζ → ℋ_ζ / gℋ_ζ
is injective. By (S.3.12) the polynomial p belongs to gℋ_ζ and hence belongs to
gR_ζ, in view of the injectivity of this mapping. Hence
(S. 3-13)
$$ \frac{\partial g}{\partial T'_1} = \dots = \frac{\partial g}{\partial T'_m} = 0. $$
These equations in the space of generalized functions mean that the function g,
up to its values on a set of measure zero, is constant with respect to the variables
T'_1, ..., T'_m in the domain 𝒯. The representation N = L × ν is obvious, because
it follows from L ⊂ N − ζ that together with the point ζ the set N contains
the whole variety L + ζ.
Theorem S. 3.1 is proved.
It is interesting to compare the situation with respect to optimal unbiased
estimation of parametric functions for complete and incomplete exponential families.
For the former, by the Rao-Blackwell-Kolmogorov theorem, every statistic
g(T) depending only upon sufficient statistics is an optimal estimate of its
mathematical expectation E_θ g. For the incomplete exponential families, in view
of Theorem S.3.1, the optimality of g(T) as an estimate of E_θ g means,
roughly speaking, a kind of quasi-completeness with respect to some of the parameters
(the representation N = L × ν is analogous to the completeness). As regards
the optimal estimate itself, it depends on sufficient statistics having the same
indices as the parameters with the "quasi-completeness" property.
holds. If we set
$$ a_1 = \int_0^\infty x\,dF(x), \qquad c_n = \frac{a_1}{a_1^2 + (a_2 - a_1^2)/n}. $$
Since a_2 ≥ a_1²
and, moreover, the equality sign holds only for the degenerate
ones in the class of all estimates (except in the degenerate case), a_1^{-1} x̄ is
always inadmissible.
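The inadmissibility claim can be illustrated numerically. The sketch below assumes a scale family (each observation is σ times a variable with distribution F), quadratic loss, and that a_2 denotes the second moment of F; this last reading is an assumption, since the defining formula did not survive in the text. Under these assumptions the risk of c·x̄ divided by σ² is c²(a_1² + (a_2 − a_1²)/n) − 2c·a_1 + 1, and it is minimized at c = a_1 / (a_1² + (a_2 − a_1²)/n). The function names and the sample moments are hypothetical.

```python
from fractions import Fraction


def second_moment_of_mean(a1, a2, n):
    """E(xbar^2) at sigma = 1 for n independent observations from F."""
    return a1 * a1 + (a2 - a1 * a1) / n


def relative_mse(c, a1, a2, n):
    """E(c * xbar - sigma)^2 / sigma^2, which does not depend on sigma."""
    return c * c * second_moment_of_mean(a1, a2, n) - 2 * c * a1 + 1


def optimal_multiplier(a1, a2, n):
    """Multiplier c minimizing the quadratic risk of c * xbar."""
    return a1 / second_moment_of_mean(a1, a2, n)


# Illustrative moments: a unit-rate exponential law has a1 = 1, a2 = 2.
a1, a2, n = Fraction(1), Fraction(2), 5
c_unbiased = 1 / a1
c_best = optimal_multiplier(a1, a2, n)
print("unbiased multiplier:", c_unbiased, "risk:", relative_mse(c_unbiased, a1, a2, n))
print("optimal multiplier: ", c_best, "risk:", relative_mse(c_best, a1, a2, n))
```

Whenever a_2 > a_1², the optimal multiplier is strictly smaller than a_1^{-1} and gives a strictly smaller risk for every σ, which is the sense in which the unbiased estimate is dominated.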
When is a_1^{-1} x̄ admissible in the class of unbiased estimates of the scale
parameter σ? To answer this question we shall first prove Lemma S.4.1,
where we use the notation y = (x_2/x_1, ..., x_n/x_1).
Then
and equality holds in (S. 4.4) if and only if, with probability 1,
E i (x IY) -1 -1
El(x2jy) = bn; bn =al en
E1(xjy)2]
E (s )=c E (xy)=c aE 1(yE 1(xjy))=ac E 1 [ 2 =a
o- n n o- n n E I (x I y)
by (S. 4.3);
1 1
E o- (a-Ix-a)2=Eo-(a
1 1- x - s n ) 2 +E o- (s n -a) 2 +2E o- l(a-1 x-s n Hsn-a)}.
Further,
and the equality sign in (S.4.5) holds for all σ ∈ (0, ∞) simultaneously if and
only if with probability 1
i.e. if
or a gamma-distribution:
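$$ F(x) = \begin{cases} 0, & x \le 0, \\ \dfrac{\gamma^m}{\Gamma(m)} \displaystyle\int_0^x u^{m-1} e^{-\gamma u}\,du, & 0 < x < \infty, \end{cases} $$
for some m > 0, γ > 0.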
Theorem S.4.2. If F(x) satisfies the condition (S.4.6) and a_1^{-1} x̄ is optimal
in the class of the unbiased estimates of the parameter σ for a sample size
n ≥ 3, then F(x) is either degenerate or a gamma-distribution.
We omit the proofs, which proceed by means of functional equations. They
are given in detail in [5].
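As a consistency check in the converse direction (not the functional-equation proof referred to above), one can verify that for a gamma law the variance of a_1^{-1} x̄ equals the Cramér-Rao bound for the scale parameter σ. The sketch assumes a gamma distribution with shape m and rate γ, for which a_1 = m/γ, a_2 = m(m + 1)/γ², and the Fisher information about σ in one observation is m/σ². All function names and the sample parameter values are hypothetical.

```python
from fractions import Fraction


def gamma_moments(m, rate):
    """First two moments of a gamma law with shape m and the given rate."""
    a1 = m / rate
    a2 = m * (m + 1) / (rate * rate)
    return a1, a2


def relative_variance_of_scaled_mean(m, rate, n):
    """Var(xbar / a1) / sigma^2 for n observations from the scale family."""
    a1, a2 = gamma_moments(m, rate)
    return (a2 - a1 * a1) / (n * a1 * a1)


def relative_cramer_rao_bound(m, n):
    """Cramer-Rao bound for sigma divided by sigma^2: 1 / (n * m)."""
    return Fraction(1) / (n * m)


# Hypothetical parameter values, chosen only for illustration.
m, rate, n = Fraction(3, 2), Fraction(2), 4
variance = relative_variance_of_scaled_mean(m, rate, n)
bound = relative_cramer_rao_bound(m, n)
print("variance / sigma^2:", variance)
print("bound / sigma^2:   ", bound)
```

The two quantities coincide for every shape, rate, and sample size, so within the gamma family no unbiased estimate of σ can have smaller variance than a_1^{-1} x̄.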
Let x_1, ..., x_n be a repeated sample from the population with the distribution
function F(x − θ) satisfying the conditions
(S.5.2)
∫ |x|³ dF(x) < ∞
the Pitman estimate is absolutely admissible.
The loss function is assumed to be quadratic throughout this section. We
shall now consider in detail the situation when the form of the function F(x) is
unknown. If we suppose that F(x) is allowed to be arbitrary, satisfying only
the condition (S.5.1), it is easy to see that there is no better estimate for θ than
x̄. However, in the case when for some integer k > 1 the first 2k moments of
the distribution function F(x) are known:
the information about F(x) contained in the moments (S.5.4) can be used more
efficiently than x̄ for construction of the estimates of the parameter θ.
Under condition (S. 5.5) the set of all polynomials Q(x 1 , • • ·, xn) of xl' • • •
• • • , x n of degree not exceeding k forms a Hilbert space L 2 > if the scalar1
product of the elements Q1 and Q2 is defined by:
(S. 5.6)
(S. 5. 7)
where the equality or the inequality in (S.5.7) is realized simultaneously for
all θ ∈ R^1 and the equality holds if and only if E_0(x̄ | A_k) = 0.
Proof. Since Q ≡ 1 ∈ A_k, we have
Hence
Moreover,
E_θ(x̄ − θ)² = E_θ(x̄ − E_0(x̄ | A_k) − θ + E_0(x̄ | A_k))²
and the equality sign (for all θ ∈ R^1 simultaneously) holds under the condition
(S. 5.8)
E_0(x̄ (x_{j_1} − x_1) ... (x_{j_k} − x_1)) = 0, (S.5.10)
where (S.5.10) must hold for all the sets (j_1, ..., j_k) of integers 2, ..., n.
The equivalence of the conditions (S.5.8) and (S.5.9)-(S.5.10) follows from
the fact that A_{k−1} and the functions (x_{j_1} − x_1) ... (x_{j_k} − x_1) generate the
whole A_k.
Let
(x_{j_1} − x_1) ... (x_{j_k} − x_1) = (x_{i_1} − x_1)^{α_1} ... (x_{i_s} − x_1)^{α_s}, (S.5.11)
(S.5.12)
a
s l z l
1
=-
n
I (- 1) cal1 ... cass /ll ••• /ll /lk +I - l
I s
(S. 5.13)
~· l l1 ls
~ (-1) Cal ... Casµll ••• µlsµk +1-l
s ~· l ll ls
+ :I ~ (-1) ca ••• ca µz ···µz µz •• •µz µk-l'
q=l q 1 s 1 q-1 q+I s
(S. 5.14)
where the summation in Σ* is taken over all even l_1, ..., l_s and in Σ*_q over
even l_1, ..., l_{q−1}, l_{q+1}, ..., l_s and odd l_q, the limits being indicated by
(S.5.13). In the sum Σ* the number l is always even and therefore (k + 1 − l) is
odd. Moreover, if k + 1 − l ≤ k − 1, then μ_{k+1−l} = 0 by the induction assumption.
In the sum Σ*_q the number l is always odd; hence (k − l) is also odd;
since k − l ≤ k − 1 we have μ_{k−l} = 0. Hence the condition (S.5.10) is
equivalent to the relation
μ_{k+1} = 0. (S.5.15)
the condition (S. 5.16) in the sum I.;=l I; we have lq + 1 :Sh; hence by the
induction assumption
(S. 5.17)
(S. 5.18)
But
Ca.
l 1 +1
csla. +1 s
- -1- - (l 1 + 1) + .•• + __
s_ (ls + 1) = I. (a - l ) = k - 2m.
ll els q =1 q q
ca. a.
1 s
Since
μ_{k−1−2m} σ² (k − 2m) = μ_{k+1−2m}
(recall that m > 0), we get from (S.5.18)
S_0 + S_1 = 0. (S.5.20)
But
S_1 = −(α_1 + ... + α_s) μ_2 μ_{k−1} = −k σ² μ_{k−1}. (S.5.21)
Now let one of the numbers α_1, ..., α_s be equal to k; then all the other numbers
vanish. Without loss of generality we can assume that
BIBLIOGRAPHY
VOLODIN, I. N.
12. On the distinction between the Poisson and Pólya distributions when a
large number of small samples is available, Teor. Verojatnost. i Primenen.
10 (1965), 364-367. (Russian) MR 31 #6299.
GLEASON, A. M.
13. Finitely generated ideals in Banach algebras, J. Math. Mech. 13 ( 1964),
125-132. MR 28 #2458.
DANTZIG, G.
14. On the non-existence of tests of "Student's" hypothesis having power func-
tion independent of a, Ann. Math. Statistics 11 (1940), 186-191. MR 1, 348.
DARMOIS, G.
15. Sur les lois de probabilité à estimation exhaustive, C. R. Acad. Sci.
Paris 200 (1935), 1265-1266.
DOETSCH, G.
16. Handbuch der Laplace-Transformation, Vols. I, II, Birkhäuser, Basel,
1950, 1955. MR 13, 230; MR 18, 35.
DOWKER, C.H.
17. Lectures on sheaf theory, Tata Institute, Bombay, 1956, 1962. MR19, 301.
DOOB, J. L.
18. Stochastic processes, Wiley, New York, 1953; Russian transl., IL, Mos-
cow, 1956. MR 15, 445; MR 19, 71.
DYNKIN, E. B.
19. Necessary and sufficient statistics for a family of probability distribu-
tions, Uspehi Mat. Nauk 6 (1951), no. 1(41), 68-90; English transl.,
Selected Transl. Math. Stat. and Prob., vol. 1, Amer. Math. Soc., Provi-
dence, R. I., 1961, pp. 17-40. MR 12, 839.
ZINGER, A. A.
20. Independence of quasi-polynomial statistics and analytical properties of
distributions, Teor. Verojatnost. i Primenen. 3 (1958), 265-284.
(Russian) MR 21 #941.
21. On a problem of A. N. Kolmogorov, Vestnik Leningrad. Univ. 11 (1956),
no. 1, 53-56. (Russian) MR 17, 863.
ZINGER, A. A. and LINNIK, Ju. V.
22. Characterization of the normal distribution, Teor. Verojatnost. i
Primenen. 9 (1964), 692-695. (Russian) MR 30 #607.
KULLBACK, S.
35. Information theory and statistics, Wiley, New York and Chapman & Hall,
London, 1959. MR 21 #2325.
LEHMANN, E.
36. Testing statistical hypotheses, Wiley, New York and Chapman & Hall,
London, 1959; Russian transl., IL, Moscow, 1963 and "Nauka", Moscow,
1964. MR 21 #6654.
LEHMANN, E. and SCHEFFÉ, H.
37. Completeness, similar regions and unbiased estimation. I, Sankhyā 10
(1950), 305-340. MR 12, 511.
LINNIK, Ju. V.
38. On polynomial statistics in connection with the analytical theory of dif-
ferential equations, Vestnik Leningrad. Univ. 11 (1956), no. 1, 35-48;
English transl., Selected Transl. Math. Stat. and Prob., vol. 1, Amer.
Math. Soc., Providence, R. I., 1961, pp. 171-206. MR 17, 983.
39. Polynomial statistics and polynomial ideals, Calcutta Math. Soc. Golden
Jubilee Commemoration Volume (1958-1959), Part I, Calcutta Math. Soc.,
Calcutta, 1963, pp. 95-98. MR 29 #2830.
40. On the theory of statistically similar regions, Dokl. Akad. Nauk SSSR
146 (1962), 300-302 =Soviet Math. Dokl. 3 (1962), 1297-1299.
MR 25 #3584.
41. Sur certaines questions de la statistique analytique, Ann. Fac. Sci. Univ.
Clermont Math. No. 8 (1962), 53-61.
42. Complex variables in problems with nuisance parameters and finite rank
sufficient statistics, Dokl. Akad. Nauk SSSR 149 (1963), 1026-1028 =
Soviet Math. Dokl. 4 (1963), 512-513. MR 27 #3041.
43. Remarks on the Fisher-Welch-Wald test, Dokl. Akad. Nauk SSSR 154
(1964), 514-516 =Soviet Math. Dokl. 5 (1964), 118-120.
44. Randomized homogeneous tests for the Behrens-Fisher problem, Izv.
Akad. Nauk SSSR Ser. Mat. 28 (1964), 249-260; English transl., Selected
Transl. Math. Stat. and Prob., vol. 6, Amer. Math. Soc., Providence, R. I.,
1966, pp. 207-217. MR 28 #5521.
45. On the construction of optimal similar solutions of the Behrens-Fisher
problem, Trudy Mat. Inst. Steklov. 79 ( 1965), 40-53 =Proc. Steklov
Inst. Math. no. 79 (1965), 41-56.
46. On A. Wald's test for the comparing of two normal samples, Teor. Verojatnost.
i Primenen. 9 (1964), 16-30. (Russian) MR 28 #5520.
47. Characterization of tests of the Bartlett-Scheffé type, Trudy Mat. Inst.
Steklov. 79 (1965), 32-39 = Proc. Steklov Inst. Math. no. 79 (1965), 32-40.
48. An application of a theorem of H. Cartan in mathematical statistics, Dokl.
Akad. Nauk SSSR 160 (1965), 1248-1249 = Soviet Math. Dokl. 6 (1965),
291-293. MR 31 #6298.
LINNIK, Ju. V., ROMANOVSKAJA, I. L. and ŠALAEVSKIĬ, O. V.
49. Remarks on the theory of the Fisher-Welch-Wald test, Teor. Verojatnost.
i Primenen. 10 (1965), 727-730. (Russian) MR 32 #8446.
LINNIK, Ju. V., ROMANOVSKII, I. V. and SUDAKOV, V. N.
50. A non-randomized homogeneous test in the Behrens-Fisher problem,
Dokl. Akad. Nauk SSSR 155 (1964), 1262-1264 = Soviet Math. Dokl. 5
(1964), 570-572. MR 28 #4627b.
LINNIK, Ju. V. and ŠALAEVSKIĬ, O. V.
51. On the analytic theory of tests for the Behrens-Fisher problem, Dokl.
Akad. Nauk SSSR 150 (1963), 26-27 = Soviet Math. Dokl. 4 (1963),
580-582. MR 27 #3042.
ŁOJASIEWICZ, S.
52. Sur le problème de division, Studia Mathematica 18 (1959), 87-136.
MR 21 #5893.
LUKACS, E.
53. Characterization of populations by properties of suitable statistics,
Proc. Third Berkeley Sympos. Math. Stat. and Prob. 1954-1955, vol. 2,
University of California Press, Berkeley, 1956, pp. 195-214. MR 18, 942.
LJAPUNOV, A. A.
54. On completely additive vector-valued functions, Izv. Akad. Nauk SSSR
Ser. Mat. 4 (1940), 465-468. (Russian) MR 2, 315.
MALGRANGE, B.
55. Lectures on the theory of functions of several complex variables, Tata
Institute, Bombay, 1962.
NEYMAN, J.
56. Sur la vérification des hypothèses statistiques composées, Bull. Soc.
Math. France 63 (1935), 346-366.
HALMOS, P. R.
69. Measure theory, Van Nostrand, Princeton, N. J ., 1950; Russian transl.,
IL, Moscow, 1953. MR 11, 504; MR 16, 22.
HALMOS, P.R. and SAVAGE, L. J.
70. Application of the Radon-Nikodym theorem to the theory of sufficient
statistics, Ann. Math. Statistics 20 (1949), 225-241. MR 11, 42.
HARDY, G. H., LITTLEWOOD, J. E. and PÓLYA, G.
71. Inequalities, 2nd ed., Cambridge Univ. Press, New York, 1952; Russian
transl., IL, Moscow, 1948. MR 13, 727; MR 18, 722.
HARLEY, B.
72. Some properties of an angular transformation of the correlation coefficient,
Biometrika 43 (1956), 219-224. MR 17, 981.
HOGG, R. and CRAIG, A.
73. Sufficient statistics in elementary distribution theory, Sankhyā 17 (1956),
209-216. MR 19, 188.
FEIGEL'SON, T. S.
74. On a simple method of establishing independence of statistics, Vestnik
Leningrad Univ. Ser. Mat. Astronom. 19 (1964), no. 3, 157-158.
(Russian) MR 29 #4131.
FUKS, B. A.
75. Introduction to the theory of analytic functions of several complex vari-
ables, Fizmatgiz, Moscow, 1962; English transl., Transl. Math. Mono-
graphs, vol. 8, Amer. Math. Soc., Providence, R. I. 1963; reprint 1965.
MR 27 #4945; MR 29 #6049.
76. Special chapters in the theory of analytic functions of several complex
variables, Fizmatgiz, Moscow, 1963; English transl., Transl. Math.
Monographs, vol. 14, Amer. Math. Soc., Providence, R. I., 1965.
MR 30 #4979.
ŠALAEVSKIĬ, O. V.
77. On the non-existence of regularly varying tests for the Behrens-Fisher
problem, Dokl. Akad. Nauk SSSR 151 (1963), 509-510 =Soviet Math.
Dokl. 4 (1963), 1043-1045. MR 27 #2037.
CHATTERJEE, S. K.
78. On an extension of Stein's two sample procedure to the multi-normal
problem, Calcutta Statist. Assoc. Bull. 8 (1959), 121-148. MR 21 #4501.
SCHEFFÉ, H.
79. On solutions of the Behrens-Fisher problem based on the t-distribution,
Ann. Math. Statistics 14 (1943), 35-44. MR 4, 221.
SHOHAT, J. A. and TAMARKIN, J. D.
80. The problem of moments, Math. Surveys, vol. 1, Amer. Math. Soc.,
Providence, R. I., 1943; rev. ed., 1947. MR 5, 5.