RÉNYI
probability
theory
PROBABILITY THEORY
AKADÉMIAI KIADÓ
PUBLISHING HOUSE OF THE HUNGARIAN
ACADEMY OF SCIENCES
BUDAPEST
PROBABILITY
THEORY
by
A. RÉNYI
Member of the Hungarian Academy of Sciences
Professor of Mathematics at the
Eötvös Loránd University, Budapest
Director of the Mathematical Institute of the
Hungarian Academy of Sciences, Budapest
WAHRSCHEINLICHKEITSRECHNUNG
and
VALÓSZÍNŰSÉGSZÁMÍTÁS
Tankönyvkiadó, Budapest 1966
English translation by
DR. LÁSZLÓ VEKERDI
© AKADÉMIAI KIADÓ, BUDAPEST 1970
JOINT EDITION PUBLISHED BY
AKADÉMIAI KIADÓ
PUBLISHING HOUSE OF THE HUNGARIAN ACADEMY OF SCIENCES
AND
NORTH-HOLLAND PUBLISHING COMPANY · AMSTERDAM · LONDON
PRINTED IN HUNGARY
PREFACE
One of the latest works of Alfréd Rényi is presented to the reader in this
volume. Before his sudden death on the 1st of February, 1970, he corrected
the first proof of the book, but he no longer had time for the final proofreading*
or for writing the preface he had planned.
This preface is, therefore, a brief memorial to a great mathematician,
mentioning a few features of Alfréd Rényi’s professional career.
Professor Rényi lectured on probability theory at various universities
throughout an uninterrupted series of years, from 1948 till his untimely
death. His academic career started at the University of Debrecen and was
continued at the University of Budapest where he was professor of the
Chair of Theory of Probability. In the meantime he was invited lecturer for
shorter or longer terms in several scientific centres of the world. Thus he
was visiting professor at Stanford University, Michigan State University,
the University of Erlangen, and the University of North Carolina.
Besides his teaching activities, Professor Rényi was director of the Mathematical
Institute of the Hungarian Academy of Sciences for one and a half
decades. Under his direction the Institute developed into an important research
centre of the science of mathematics.
He participated in the editorial work of a number of journals. He was
the editor of Studia Scientiarum Mathematicarum Hungarica and a
member of the Editorial Board of: Acta Mathematica, Annales Sci. Math.,
Publicationes Math., Matematikai Lapok, Zeitschrift für Wahrscheinlichkeitstheorie,
Journal of Applied Probability, Journal of Combinatorial
Analysis, Information and Control.
The careful reader will certainly note how the long teaching experience
and keen interest in research are amalgamated in the present book. The
material of Professor Rényi's courses on probability theory was first published
in the form of lecture notes. It appeared as a book in Hungarian in
1954, and in a completely revised German translation in 1962. The latter
book was the basis of a new Hungarian edition in 1965 and the French
* This was done by Mr. P. Bártfai, Mrs. A. Földes and Mrs. L. Rejtő.
CONTENTS
§ 1. Fundamental relations
§ 2. Some further operations and relations
§ 3. Axiomatical development of the algebra of events
§ 4. On the structure of finite algebras of events
§ 5. Representation of algebras of events by algebras of sets
§ 6. Exercises
TABLES
REMARKS AND BIBLIOGRAPHICAL NOTES
REFERENCES
AUTHOR AND SUBJECT INDEX
CHAPTER I
A L G E B R A S OF E V E N T S
§ 1. Fundamental relations
$\bar{\bar{A}} = A.$ (1)
B the one on the right side of it. In this case the statement "A and B both
occurred" means that the hit lies in the right upper quadrant of the target
(Fig. 1).
[Fig. 1: the events A, B, and AB on the target]
AB = BA. (2)
Also obviously,
AA = A, (3)
i.e. every event A is idempotent with respect to multiplication. The definition
of the product of events may be extended to more than two factors. A(BC)
occurs, by definition, if and only if the events A and BC occur; that is, if the
events A, B, and C all occur. Evidently, (AB)C has the same meaning. Thus
we have the associative law for multiplication:
A(BC) = (AB)C. (4)
Instead of A(BC) therefore we can write simply ABC. Clearly, the event
AB can occur only if A and В do not exclude each other. If A and В are
mutually exclusive, AB is an impossible event. It is useful to consider the
impossible event as an event too. It will be denoted by O. The fact, that A
and В are mutually exclusive, is thus expressed by AB = О. Since an event
and the complementary event obviously exclude each other, we have
AĀ = O. (5)
If A and В are two events of an algebra of events, one may ask whether
at least one of the events A and B did occur. Let A denote the event that the
hit lies in the upper half of the target and B the event that it lies in the right
half; the statement, that at least one of the events A and В occurred, means
then that the hit does not lie in the left lower quadrant of the target (Fig. 2).
The event occurring exactly when at least one of the events A and В occurs,
is said to be the sum of A and В and is denoted by A + B. It is easy to see
[Fig. 2: the events A, B, and A + B on the target]
that
A + В = В + A (6)
(commutative law of addition) and also that
A + (В + С) = (A + В) + C (7)
(associative law of addition). The definition of the sum is readily extended to
the case of more than two events.
The event A + B thus occurs precisely if A or B occurs; the word "or",
however, does not mean in this connection that A and B exclude one another.
Thus for instance, in our repeatedly considered example the meaning
of A + B is the statement that the hit lies either in the upper half of the target
(this is now the event A) or in the right lower quadrant (the event ĀB).
Therefore we have the relation
A + В = A + ÄB, (8)
where the two terms on the right hand side are now mutually exclusive.
By applying relation (8), every sum of events can be transformed in such
a way that the terms of the sum become pairwise mutually exclusive.
Clearly the formula
A + A = A (9)
is valid. Further we see that the event A + Ā certainly occurs; thus by
introducing the notation I for the "sure event" we have
A + Ā = I. (10)
We agree further that
Ī = O,  Ō = I, (11)
i.e. that the event complementary to the sure event is the impossible event
О and conversely.
Evidently, the following relations are also valid:
AO = O, (12)
A + О = A, (13)
AI = A, (14)
A + I = I. (15)
A + AB = A; (17)
since from (14), (16) and (15) we have
A + AB = AI + AB = A(I + B) = AI = A.
Clearly, rule (17) can be verified directly as well; the direct verification is,
however, clumsy for some complicated relations, while by applying the formal
rules of operation one can readily get a formal proof. This is the reason
why the algebra of events is useful; therefore it is advisable to obtain a
certain practice in such formal proofs.
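Such formal proofs can also be checked mechanically. The following sketch (ours, not part of the original text) models events as subsets of a small sample space, with sum as union and product as intersection, and verifies rules (17) and (18) by brute force over all events:

```python
from itertools import combinations

# A sketch: events as subsets of a small sample space H;
# sum = union, product = intersection.
H = frozenset(range(4))

def subsets(s):
    """All subsets of s, as frozensets."""
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(sorted(s), r)]

EVENTS = subsets(H)

def check_absorption():
    # Rule (17): A + AB = A.
    return all((A | (A & B)) == A for A in EVENTS for B in EVENTS)

def check_distributive():
    # Rule (18): A + BC = (A + B)(A + C).
    return all((A | (B & C)) == ((A | B) & (A | C))
               for A in EVENTS for B in EVENTS for C in EVENTS)

print(check_absorption(), check_distributive())  # → True True
```

The brute-force check over all 16 events of this algebra is exhaustive, so it constitutes a proof for this particular sample space.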
The distributive laws can be extended (just like in ordinary algebra) to
more than two terms. In the algebra of events there exists, however, still
another distributive law:
A + BC = (A + B)(A + C). (18)
Indeed, by the distributive law (16) and rule (17),
(A + B)(A + C) = A + AB + AC + BC = A + BC,
§ 2. Some further operations and relations
$\overline{AB} = \bar{A} + \bar{B},$ (19)
$\overline{A + B} = \bar{A}\,\bar{B}.$ (20)
The event $\overline{AB}$ occurs exactly if AB does not occur, hence if the events
A and B do not both occur; $\bar{A} + \bar{B}$ occurs exactly if A or B (or both) do
not occur. These two propositions evidently state the same thing; thus (19)
is valid. Formula (20) can be proved in the same way.
As to the rules of operation valid for the addition and multiplication
of events, we see that both have the same properties (commutativity, associativity,
idempotency of every element) and that the relations between the
two kinds of rules of operation are symmetrical. Formulas (16) and (18)
are obtained from each other — by interchanging everywhere the signs of
multiplication and addition. Such formulas are called dual to one another.
Thus for instance the relations
A + AB = A and A(A + B) = A
are dual to one another. Clearly, there exist relations which are, because of
their symmetry, self-dual; e.g. the relation
(A + B)(A + C)(B + C) = AB + AC + BC.
For the sake of brevity we sometimes write $\prod_{k=1}^{n} A_k$ instead of $A_1 A_2 \ldots A_n$
and $\sum_{k=1}^{n} A_k$ instead of $A_1 + A_2 + \ldots + A_n$.
B − A = BĀ. (1)
they are the two distributive laws of subtraction. Using the subtraction,
the complementary event may be written in the form
A = I — A. (3)
[Fig. 4]
The subtraction does not satisfy all the rules of operation known from
ordinary algebra. Thus for instance (A — В) + В is in general not equal
to A ; further A + {В — C) is not always identical to (A + В) — C. Hence,
if in relations between events there figures the sign of subtraction too, the
brackets are not to be omitted without any consideration. There are, however,
cases when this omission is allowed, e.g.
A − (B + C) = (A − B) − C. (4)
The event A — В occurs exactly if A does and В does not occur; in the
same way, В — A occurs if В does but A does not occur. The meaning of the
expression (A — В) + (B — A) is therefore not O, but the event which
consists of the occurrence of one and only one of the events A and B. It is
reasonable to introduce for this event a new symbol. We put
(A − B) + (B − A) = A Δ B. (5)
B = BI = B(A + Ā) = BA + BĀ = A + BĀ.
From this it follows that for the validity of the relation A ⊆ B the validity
of one of the relations A = AB and B = A + BĀ is necessary and sufficient.
The latter relation can be stated in the following form: for the validity of
A ⊆ B a necessary and sufficient condition is the existence of a C such that
AC = O and B = A + C; indeed, from this it follows directly that C = BĀ.
§ 3. Axiomatical development of the algebra of events
$A_j A_k = O$ for $j \ne k$ and $A_1 + A_2 + \ldots + A_n = I$
are valid. For instance {A, Ā} is a complete system of events, provided that
A ≠ O and A ≠ I.
AA = A (1.1)
AB = BA (1.2)
A{BC) = (AB)C (1.3)
A + A = A (2.1)
A + В = В + A (2.2)
A + (В + C) = (A + В) + C (2.3)
A{B + C) = AB + AC (3.1)
A + BC = (A+ B) (A + C) (3.2)
AÄ = О (4.1)
A + Ä = I (4.2)
AI = A (5.1)
A + О = A (5.2)
AO = О (5.3)
A + I = I (5.4)
¹ The notations A ∩ B and A ∪ B are often used instead of AB and A + B,
respectively.
It is to be noted that these axioms are not all mutually independent; thus
for instance (3.2) can be deduced from the others. It is, however, not our
aim to examine here which axioms could be omitted from the system.
The totality of the outcomes of an experiment forms a Boolean algebra,
if we understand by the product AB of two events A, В the joint occurrence
of both events and by the sum A + В of two events the occurrence of at
least one of the two events; further, if we denote by Ā the event complementary
to A and by O and I the impossible and the sure events, respectively.
Indeed, the above 14 axioms are fulfilled in this case. More generally, every
subset of the set of the outcomes of an experiment is a Boolean algebra if it
contains the sure event, further for every event A its complementary event
Ā and for every A and B the events AB and A + B.
Clearly, one can find other Boolean algebras as well. Thus for instance,
the totality of the subsets of a set H is also a Boolean algebra. We define the
sum of two sets as the union of the two sets and their product as the intersection
of the two sets. Let I mean the set H itself and O the empty set,
further Ā the set complementary to A with respect to H, and thus B − A
the set complementary to A with respect to B. A direct verification of each
axiom shows that this system is indeed a Boolean algebra.
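The direct verification just mentioned can be carried out by a short program. The following sketch (the names H, comp, events are ours) checks all fourteen axioms for the algebra of all subsets of a three-element set:

```python
from itertools import combinations

# Boolean algebra of all subsets of H: sum = union, product = intersection,
# complement taken with respect to H.
H = frozenset({1, 2, 3})
events = [frozenset(c) for r in range(4) for c in combinations(sorted(H), r)]
comp = lambda A: H - A
O, I = frozenset(), H

ok = all(
    A & A == A and A | A == A                       # (1.1), (2.1)
    and A & B == B & A and A | B == B | A           # (1.2), (2.2)
    and A & (B & C) == (A & B) & C                  # (1.3)
    and A | (B | C) == (A | B) | C                  # (2.3)
    and A & (B | C) == (A & B) | (A & C)            # (3.1)
    and A | (B & C) == (A | B) & (A | C)            # (3.2)
    and A & comp(A) == O and A | comp(A) == I       # (4.1), (4.2)
    and A & I == A and A | O == A                   # (5.1), (5.2)
    and A & O == O and A | I == I                   # (5.3), (5.4)
    for A in events for B in events for C in events)
print(ok)  # → True
```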
There exists a close connection between Boolean algebras of events and
algebras of sets. In our example of the target this connection is clearly vis
ible. This analogy between a Boolean algebra of sets and an algebra of
events has an important role in the calculus of probability.
In order to obtain a Boolean algebra, it is not necessary to consider all
subsets of a set. A collection T of the subsets of a set H is said to be an algebra
of sets, if the addition can always be carried out in it, if H itself belongs to
T, and if for a set A its complementary set Ā = H − A belongs to T as well;
i.e. if the following conditions are satisfied:1
1. H ∈ T.
2. A ∈ T, B ∈ T implies A + B ∈ T.
3. A ∈ T implies Ā ∈ T.
¹ The notation a ∈ M means here and in the following that a belongs to the set
M; a ∉ M means that a does not belong to the set M.
§ 4. On the structure of finite algebras of events
A = B + C,  B ≠ A,  C ≠ A.
ments of the whole algebra of events is $\sum_{r=0}^{n} \binom{n}{r}$, which is equal to $2^n$.
It follows from Theorem 1 that the sure event / can be represented as the
sum of all elementary events
$I = A_1 + A_2 + \ldots + A_n.$
Thus always one and only one of the elementary events A₁, A₂, …, Aₙ occurs.
The elementary events form a complete system of events.
Consider now as an example the algebra of events which consists of the
possible outcomes of a game with two dice. Clearly, the number of elementary
events is 36; let us denote them by A_ij (i, j = 1, 2, …, 6), where A_ij
means that the result for the first die is i and that for the second is j. According
to Theorem 2 the number of events of this algebra of events is
2³⁶ = 68 719 476 736. It would thus be impossible to discuss all cases.
We choose therefore another example, namely the tossing of a coin twice.
The possible elementary events are: 1. first head, second head as well (let
A₁₁ denote this case); 2. first head, next tail, denoted by A₁₂; 3. first tail,
next head, denoted by A₂₁; 4. first tail, second also tail, notation: A₂₂. The
number of all possible events is 2⁴ = 16. These are: I, O, the four elementary
events, further A₁₁ + A₁₂, A₁₁ + A₂₁, A₁₁ + A₂₂, A₁₂ + A₂₁, A₁₂ + A₂₂,
A₂₁ + A₂₂, and besides these the four events Ā₁₁, Ā₁₂, Ā₂₁, Ā₂₂ complementary
to the four elementary events.
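This enumeration can be reproduced directly, since every event of a finite algebra is a set of elementary events. A sketch (the labels follow the book's A₁₁, A₁₂, A₂₁, A₂₂):

```python
from itertools import combinations

# Every event of the two-toss algebra is a set of elementary events
# (its canonical representation).
elementary = ["A11", "A12", "A21", "A22"]
events = [frozenset(c) for r in range(5) for c in combinations(elementary, r)]

print(len(events))  # → 16, in agreement with 2^4
# The complements of the four elementary events are exactly the
# three-element sets:
print(sum(1 for e in events if len(e) == 3))  # → 4
```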
By the canonical representation we have thus obtained a complete description
of this finite algebra of events. Now also the rules of operation
obtain a new sense. Theorem 1 namely points to a connection which shall
lead us to Kolmogorov’s theory of probability. A compound event, which
is the sum of elementary events, can be characterized uniquely by the set
of terms of this sum. In this way one can assign to every event a set, namely
the set of the elementary events whose sum is the canonical representation
of the event. Let A' denote the collection of elementary events which form
the event A and similarly let B' denote the collection of the elementary
events from which event В is composed. One can show that the collection
of elementary events from which the event AB is composed is the intersection
A'B' of A' and B'; further that the collection of elementary events
from which the event A + В is composed is equal to the union A' + B'
of the sets A' and B '. In this assignment of events to sets, the elementary
events themselves correspond to the sets having only one element. Obviously
the empty set corresponds to the impossible event. To the sure event corresponds
the set of all possible elementary events (with respect to the same
experiment); this set will be denoted by H and will be called the sample
space. Further, it is easy to show that to the complementary event Ā corresponds
the complementary set of A' with respect to H.
In the following paragraph we shall show that to every algebra of events
corresponds an algebra of the subsets of a set H such that there corresponds
to the event A + В the union of the sets belonging to A and В and to the
product AB the intersection of the sets belonging to A and B, and finally to the
complementary event Ā the complementary set with respect to H of the
set belonging to A. In other words, one can find to every algebra o f events
an algebra o f sets which is isomorphic to it.
The proof of this theorem, due to Stone [1], is not at all simple; it will
be accomplished in the next paragraph; on first reading it can be omitted,
since Stone’s theorem will not be used in what follows. We give the proof
only in order to show that the basic assumption of Kolmogorov’s theory,
i.e. that events can always be represented by sets, does not restrict the generality
in any way.
In the case of a finite algebra of events this fact was already established
by means of Theorem 1. Here we even have a uniquely determined event
corresponding to every subset of the sample space.
The theory of Boolean algebras is a particular case of the theory of more
general structures called lattices (cf. e.g. G. Birkhoff [1]).
a) I ∈ α.
b) A ∈ α implies Ā ∉ α, and conversely.
c) If A + B ∈ α, then A or B belongs to α.
¹ In lattice theory such systems are called ultrafilters; ultrafilters are commonly
characterized as sets complementary to prime ideals. A nonempty subset β of a Boolean
algebra 𝒜 is called a prime ideal if the following conditions are fulfilled: 1. A ∈ β
and B ∈ β imply A + B ∈ β. 2. A ∈ β and B ∈ 𝒜 imply AB ∈ β. 3. If AB ∈ β, then
A ∈ β or B ∈ β (or both). Cf. e.g. G. Aumann [1].
Let us return to the proof of Lemma 2. Let 𝒮 denote the set of those systems
of events β in the algebra of events 𝒜 which fulfil conditions 1 and 2
of the crowds of events. If β < γ means that β is a proper subset of γ, 𝒮 is
a partially ordered set. If A ≠ O, the set β = (A) consisting only of the
element A evidently fulfils conditions 1 and 2. According to Lemma 3 there
exists a maximal chain containing β = (A) as a subset. Let α denote the
union of the subsets γ belonging to this chain. Clearly, α is a crowd of events,
since it is the union of sets β fulfilling the rules 1 and 2 defining the crowds
of events. Therefore no element of the chain contains the event O, and thus
α does not contain O either. Further, if B₁ and B₂ belong to α, they belong to
a subset β₁ respectively a subset β₂ of α. Since either β₁ ≤ β₂ or the contrary
must hold, B₁ and B₂ both belong to β₁ or to β₂, and the same holds for B₁B₂.
Thus B₁B₂ belongs to α as well. Further, we see that α cannot be extended.
This is a consequence of the requirement that the chain be a maximal chain.
Lemma 2 is thus proved.
Now we can construct to every algebra of events a field of sets isomorphic
to it. Let ℬ be the set of all crowds of events α of the algebra of events 𝒜.
We assign to every event A of 𝒜 the subset ℬ_A of ℬ consisting of all crowds
of events α containing the event A. The set ℬ_A will be called the representative
set of A.
ℬ_A ℬ_B = ℬ_{AB}, (1)
ℬ_Ā = ℬ − ℬ_A, (2)
ℬ_{A+B} = ℬ_A + ℬ_B. (3)
Thus we have proved that the sets ℬ_A form an algebra of sets. In order to
show that this algebra of sets is isomorphic to the algebra of events 𝒜, it still
remains to prove that the correspondence A → ℬ_A is one-to-one. If A ≠ B,
we have A Δ B ≠ O. Hence at least one of the relations ĀB ≠ O and AB̄ ≠ O
is valid as well. Suppose that ĀB ≠ O. Because of (1), ℬ_{ĀB} = ℬ_Ā ℬ_B.
Hence every crowd of events belonging to ℬ_{ĀB} belongs to ℬ_Ā and also to
ℬ_B; hence it belongs to ℬ_B and does not belong to ℬ_A. Thus we have proved
the existence of crowds of events which belong to ℬ_B but do not belong to
ℬ_A. Hence ℬ_B and ℬ_A are different.
§ 6. Exercises
1. Prove
a) $\overline{AB + CD} = (\bar{A} + \bar{B})(\bar{C} + \bar{D})$,
b) $(A + B)(\bar{A} + B) + (A + \bar{B})(\bar{A} + \bar{B}) = I$,
c) $(A + B)(\bar{A} + B)(A + \bar{B})(\bar{A} + \bar{B}) = O$,
d) (A + B)(A + C)(B + C) = AB + AC + BC,
e) A − BC = (A − B) + (A − C),
f) A − (B + C) = (A − B) − C,
g) (A − B) + C = [(A + C) − B] + BC,
h) (A − B) − (C − D) = [A − (B + C)] + (AD − B),
i) A − {A − [B − (B − C)]} = ABC,
j) ABC + ABD + ACD + BCD =
= (A + B)(A + C)(A + D)(B + C)(B + D)(C + D),
k) A + B + C = (A − B) + (B − C) + (C − A) + ABC,
l) A Δ (B Δ C) = (A Δ B) Δ C,
m) $(A + \bar{B})\; \Delta\; (\bar{A} + B) = A \,\Delta\, B$,
n) $\bar{A}B\; \Delta\; A\bar{B} = A \,\Delta\, B$.
o) Prove the relations enumerated in § 2, (6), for the symmetric difference.
p) The relation (A + B) − B = A does not hold in general. Under what conditions
is it valid?
q) Prove that A Δ B = C Δ D implies A Δ C = B Δ D.
2. The elements of a Boolean algebra form a ring with respect to the operations
of symmetric difference and multiplication. The zero element is O, the unit element is I.
3. In a finite algebra of events containing n elementary events one can give several
complete systems of events. Complete systems of events differing only in the order
of the terms are to be considered as identical. Let Tₙ denote the number of the different
complete systems of events.
a) Prove that T₁ = 1, T₂ = 2, T₃ = 5, T₄ = 15, T₅ = 52, T₆ = 203.
b) Prove the recursion formula
$$T_{n+1} = \sum_{k=0}^{n} \binom{n}{k} T_k \qquad (T_0 = 1)$$
and show that T₁₀ = 115 975.
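The numbers Tₙ are the Bell numbers, and the recursion in b) can be checked numerically; a sketch (ours, with the convention T₀ = 1):

```python
from math import comb

# Bell numbers via the recursion T_{n+1} = sum_k C(n, k) * T_k.
def bell(n):
    T = [1]  # T_0 = 1
    for m in range(n):
        T.append(sum(comb(m, k) * T[k] for k in range(m + 1)))
    return T[n]

print([bell(n) for n in range(1, 7)])  # → [1, 2, 5, 15, 52, 203]
print(bell(10))                        # → 115975
```

The output reproduces the values of part a) and the value T₁₀ = 115 975 of part b).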
c) Prove […]
5. Prove the relation […]
7. We can construct from the events A, B, and C by repeated addition and multiplication
eighteen, in general different, events, namely A, B, C, AB, AC, BC, A + B,
B + C, C + A, A + BC, B + AC, C + AB, AB + AC, AB + BC, AC + BC, ABC,
A + B + C, AB + AC + BC. (The phrase "in general different" means here that
no two of these events are identical for all possible choices of the events A, B, C.)
Prove that from 4 events one can construct 166, from 5 events 7579 and from 6 events
7 828 352 events in this way. (No general formula is known for the number of events
which can be formed from n events.)
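The counts in Exercise 7 can be verified by computer for small n. The sketch below (our own illustration) realizes n "generic" events as subsets of a sample space with one atom per occurrence pattern, and closes the family under sum (union) and product (intersection):

```python
# Close a family of n generic events under union and intersection and
# count the distinct events obtained.
def closure_size(n):
    atoms = range(2 ** n)
    # Generator i occurs exactly on the atoms whose i-th bit is set.
    gens = [frozenset(a for a in atoms if a >> i & 1) for i in range(n)]
    events = set(gens)
    changed = True
    while changed:
        changed = False
        for X in list(events):
            for Y in list(events):
                for Z in (X | Y, X & Y):
                    if Z not in events:
                        events.add(Z)
                        changed = True
    return len(events)

print(closure_size(3))  # → 18, as stated in the exercise
```

Running `closure_size(4)` likewise yields 166; the counts are the numbers of nonconstant monotone Boolean functions of n variables.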
8. The divisors of an arbitrary square-free¹ number N form a Boolean algebra
if the operations are defined as follows: we understand by the "sum" of two divisors
of N their least common multiple, by their "product" their greatest common divisor;
d being a divisor of N, we understand by d̄ the number N/d; the number 1 serves as
O and the number N as I.
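A sketch (ours) spot-checking this structure for N = 210 = 2 · 3 · 5 · 7, with sum = lcm, product = gcd, and complement d → N/d:

```python
from math import gcd

# Divisors of a square-free N under lcm/gcd, complement d -> N/d,
# O = 1, I = N.
N = 210
divisors = [d for d in range(1, N + 1) if N % d == 0]
lcm = lambda a, b: a * b // gcd(a, b)
comp = lambda d: N // d

# Complement axioms (4.1) and (4.2): gcd(d, N/d) = 1 and lcm(d, N/d) = N.
ok = all(gcd(d, comp(d)) == 1 and lcm(d, comp(d)) == N for d in divisors)
print(len(divisors), ok)  # → 16 True
```

Square-freeness is essential: for N = 12, say, gcd(2, 12/2) = 2 ≠ 1, so axiom (4.1) fails.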
9. Verify that for the example of Exercise 8, our Theorem 1 is the same as the well-
known theorem on the unique representability of (square-free) integers as a product
of prime numbers.
10. The numbers 0, 1, …, 2ⁿ − 1 form a Boolean algebra if the rules of operation
are defined as follows: represent these numbers in the binary system. We understand
by the "product" of two numbers the number obtained by multiplying the corresponding
digits of both numbers place by place, and by the "sum" the number obtained
by adding the digits place by place and by replacing everywhere the digit 2 obtained
in the course of addition by 1.
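In machine terms the operations of Exercise 10 are the bitwise AND and OR, with the complement given by bitwise NOT within n bits. A sketch (ours) verifying the distributive and complement axioms for n = 4:

```python
# Boolean algebra on 0..2^n - 1: product = bitwise AND, sum = bitwise OR,
# complement = XOR with the all-ones word I.
n = 4
I, O = 2 ** n - 1, 0
nums = range(2 ** n)

ok = all(a & (b | c) == (a & b) | (a & c)           # (3.1)
         and a | (b & c) == (a | b) & (a | c)       # (3.2)
         and a & (I ^ a) == O and a | (I ^ a) == I  # (4.1), (4.2)
         for a in nums for b in nums for c in nums)
print(ok)  # → True
```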
11. Let A, B, C denote electric relays or networks of relays. Any two of these
may be connected in series or in parallel. Two such networks which are either both
closed (allowing current to pass) or both open (not allowing current to pass) are considered
as equivalent. Let A + B denote that A and B are coupled in parallel, AB
that they are coupled in series. Let Ā denote a network always closed if A is open
and conversely. Let O denote a network allowing no current to pass and I a network
always closed. Prove that all axioms of Boolean algebras are fulfilled.¹
Hint. Relation (A + B)C = AC + BC has for instance the meaning that it comes
to the same thing to connect first A and В in parallel and couple the network so obtained
with C in series or to couple first A and C in series, then В and C in series and then
the two systems so obtained in parallel. Both systems are equivalent to each other in
the sense that they either both allow the current to pass or both do not. A similar
consideration holds for the other distributive law. Both distributive laws are illustrated
in Fig. 5.
[Fig. 5: relay networks illustrating the distributive laws, e.g. (A + B)C = AC + BC]
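The equivalence illustrated in Fig. 5 can be checked by modeling a network as a boolean function of the relay states (a sketch of Exercise 11, with True meaning "closed, current passes"):

```python
from itertools import product

# Parallel coupling is OR (A + B), series coupling is AND (AB).
parallel = lambda f, g: (lambda s: f(s) or g(s))
series = lambda f, g: (lambda s: f(s) and g(s))

A = lambda s: s[0]
B = lambda s: s[1]
C = lambda s: s[2]

# The two networks of Fig. 5, (A + B)C and AC + BC, are closed for
# exactly the same relay states.
left = series(parallel(A, B), C)
right = parallel(series(A, C), series(B, C))
ok = all(left(s) == right(s) for s in product([False, True], repeat=3))
print(ok)  # → True
```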
12. A domain of the plane is said to be convex if it contains for any two of its
points the segment connecting these points as well. We understand by the “sum”
of two convex domains the least convex domain containing both, by their “product”
their intersection which, evidently, is convex as well. Let further I denote the entire
plane and О the empty set. The addition and multiplication fulfil axioms (1.1)—(2.3);
the distributive laws are, however, not valid and the complement Ā is not defined.
13. Let us understand by a linear form a point, a line or a plane of the 3-dimensional
affine space; further, let the empty set and the entire 3-dimensional space be called
linear forms too. We define as the sum of a finite number of linear forms the least linear
form containing their set theoretical union; let their product be their (set theoretical)
intersection, which is evidently a linear form too. Prove the same propositions as in
Exercise 12.
¹ This example shows how Boolean algebra can be applied in the theory of networks
and why it is of great importance in communication theory and in the construction
of computers (cf. e.g. M. A. Gavrilov [1]).
$$S_k = P_{n-k+1} \qquad (k = 1, 2, \ldots, n)$$
in a formal way by applying the rules of operation of Boolean algebras, and verify
it directly too (cf. the generalization of Exercise 1d).
Hint. Sₖ means that among the events A₁, A₂, …, Aₙ there are at least
k which occur, and P_{n−k+1} means that among these same events there are
no n − k + 1 which do not occur; these two statements are equivalent.
16. Show that condition b) of the preceding exercise cannot be replaced by b')
“whenever A and В belong to T, A A В belongs to T as well”.
Hint. Let H be a finite set of the elements a, b, c, d and let T consist of the following
subsets of H: {a, b}; {c, d}; {a, c}; {b, d}; O; H.
20. We call a nonempty system 𝓡 of subsets of a set H that contains with two sets A and
B also A + B and A − B a ring of sets. A ring of sets 𝓡 is thus an algebra of sets
if and only if H belongs to 𝓡. Prove that a nonempty system of sets containing with
A and B also AB and A − B is not necessarily a ring of sets. Show that the condition
"with two sets A and B, A + B and AB belong as well to 𝓢" is not sufficient for 𝓢
to be a ring of sets either.
C H A P T E R II
PROBABILITY
with the same accuracy as most of the “deterministic” laws of nature. The
radioactive disintegration is thus a mass phenomenon described, as to its
regularity, by the theory of probability.
As seen in the above example, phenomena described by a stochastic
scheme are also subject to natural laws. But in these cases the complex of
the circumstances considered does not determine the exact course of the
events; it determines a probability law, giving a bird's-eye view of the outcome.
Probability theory aims at the study of random mass-phenomena; this
explains its great practical importance. Indeed, we encounter random mass-phenomena
in nearly all fields of science, industry, and everyday life.
Almost every "deterministic" scheme of the sciences turns out to be stochastic
on closer examination. The laws of Boyle, Mariotte, and Gay-Lussac,
for instance, are usually considered to be deterministic laws. But the
pressure of the gas is caused by the impacts of the molecules of the gas on
the walls of the container. The mean pressure of the gas is determined by
the number and the velocity of the molecules hitting the wall of the container
per time unit. In fact, the pressure of the gas shows small fluctuations,
which may, however, be neglected in the case of larger gas masses. As another
example consider the chemical reaction of two substances A and B in an
aqueous solution. As is well known, the velocity of the reaction is at every
instant proportional to the product of the concentrations of A and B. This
law is commonly considered as a causal one, but in reality the situation is
as follows. The atoms (respectively the ions) of the two substances move
freely in the solution. The average number of the “encounters” of an ion
of substance A with an ion of substance В is proportional to the product of
their concentrations; hence this law turns out to be essentially a stochastic
one too.
The development of modern science often makes it necessary to examine
small fluctuations in phenomena dealt with earlier only in their outlines
and considered at that level as causal. In the following, we shall find several
occasions to illustrate these principal questions with concrete examples.
At the beginning of our century K. Pearson obtained from 24 000 tossings the value
0.5005 for the relative frequency.
There are thus random events showing a certain stability of the relative
frequency, i.e. the latter fluctuates about a well-determined value, and the
more trials are performed, the smaller, generally, are the fluctuations. The
number about which the relative frequency of an event fluctuates is called
the probability of the event in question. Thus, for instance, the probability
of "heads" (supposing the coin is regular) is equal to 1/2.
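The stability of the relative frequency is easy to observe in a simulation. The following sketch (the data are simulated, not Pearson's) tosses a fair coin 24 000 times:

```python
import random

# 24 000 simulated tosses of a fair coin; the relative frequency of
# heads fluctuates about 1/2.
random.seed(1)
n = 24_000
heads = sum(random.random() < 0.5 for _ in range(n))
freq = heads / n
print(abs(freq - 0.5) < 0.02)  # the frequency stays close to 1/2
```

Repeating the experiment with different seeds, the frequency varies from run to run but remains near 1/2, with typical deviations of the order of a few thousandths.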
Consider now another example. In throws of a regular die of homogeneous
substance the relative frequency of any one of the faces 1, 2, 3, 4, 5, 6 fluctuates
about 1/6, i.e. the probability of each number is equal to 1/6. If,
however, the die is deformed, e.g. by curtailing one of its faces, these
probabilities will be different.
A further example is the following: there is a well-determined probability
that a certain atom of a radioactive substance disintegrates during a given
time interval t. That is, in repeated observations of atoms during a time interval
t we find that the number of atoms decaying during this time interval
fluctuates about a well-determined value. This value is, according to the
observations, $1 - e^{-\lambda t}$, where λ is a positive constant depending on the radioactive
substance. (E.g. in the case of radium, if the time is measured in seconds,
λ = 1.38 · 10⁻¹¹.) Later on we shall give a theoretical foundation of this
law, i.e. we shall deduce it from simple assumptions.
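With the constant quoted in the text one can compute concrete decay probabilities; a sketch (ours, using the book's λ for radium):

```python
import math

# Probability that a given atom decays within time t is 1 - exp(-lambda*t).
lam = 1.38e-11                    # disintegration constant, 1/s
year = 365.25 * 24 * 3600         # seconds in a year

p_year = 1 - math.exp(-lam * year)       # decay probability within one year
half_life = math.log(2) / lam / year     # years until this probability is 1/2

print(round(p_year, 7))   # → 0.0004354
print(round(half_life))   # → 1592
```

The resulting half-life of roughly 1600 years agrees with the known value for radium-226, so the quoted λ is consistent.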
§ 3. Probability algebras
The relative frequency of an event is a number lying between zero and one. Evidently, the probability of an event
must therefore lie between zero and one as well. It is further clear that the
relative frequency of the sure event is equal to one and that of the impossible
event is equal to zero. Hence also the probability of the “sure” event must
be equal to one and that of the “impossible” event equal to zero. If A and В
are two possible outcomes of the same experiment which mutually exclude
each other, and if in n performances of the same experiment the event A
occurred k_A times and the event B occurred k_B times, then clearly the event
A + B occurred k_A + k_B times. Hence, denoting by $f_A$, $f_B$ and $f_{A+B}$ the relative
frequencies of A, B, and A + B, respectively, we have:
$f_{A+B} = f_A + f_B;$
in other words, the relative frequency of the sum of two mutually exclusive
events is always equal to the sum of the relative frequencies of these events.
Hence also the probability of the sum of two mutually exclusive events must
be equal to the sum of the probabilities of the events. We therefore take the
following axioms:
α) To each element A of an algebra of events a non-negative real number
P(A) is assigned, called the probability of the event A.
β) The probability of the sure event is equal to one, that is P(I) = 1.
γ) If AB = O, then P(A + B) = P(A) + P(B).
$f_A + f_{\bar{A}} = 1.$
Axiom γ) states that the probability of the sum of two mutually exclusive
events is equal to the sum of the probabilities of the two events. This leads
immediately to
Theorem 3. If the events A₁, A₂, …, Aₙ are pairwise mutually exclusive, i.e. if
$A_iA_j = O$ for $i \ne j$, then
$$P(A_1 + A_2 + \ldots + A_n) = P(A_1) + P(A_2) + \ldots + P(A_n).$$
Indeed, by assumption,
$A_1 + A_2 + \ldots + A_n = I$ and $A_iA_j = O$
Theorem 6. If A ⊆ B, then
P(B − A) = P(B) − P(A).
Proof. We have, by assumption, B = A + (B − A) and A(B − A) = O;
hence Theorem 6 follows from Axiom γ).
Clearly the relation P(B - A) = P(B) - P(A) does not hold in general.
It is, however, easy to obtain
and
(A − B)(B − A) = O
we find
P(A Δ B) = P(A − B) + P(B − A),
and Theorem 8 follows from Theorem 7.
We proved Theorem 3 by repeated application of Axiom y). In the same
manner, we can obtain by repeated application of Theorem 5 a formula
for the probability of the sum of an arbitrary number of events. In particular,
we have
P(A + B + C) = P(A) + P(B) + P(C) - P(AB) - P(BC) - P(CA) + P(ABC).
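The three-event formula can be checked mechanically on a finite sample space. The sketch below uses the classical space of two dice; the events A, B, C are illustrative choices of the editor, not taken from the text.

```python
# Brute-force check of three-event inclusion-exclusion on two dice
# (36 equally likely outcomes).
from itertools import product

omega = list(product(range(1, 7), repeat=2))      # 36 elementary events
P = lambda E: len(E) / len(omega)                 # classical probability

A = {w for w in omega if w[0] == 6}               # first die shows 6
B = {w for w in omega if w[1] == 6}               # second die shows 6
C = {w for w in omega if w[0] + w[1] >= 10}       # sum at least 10

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(C & A) + P(A & B & C))
assert abs(lhs - rhs) < 1e-12
```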
More generally we have
Theorem 10 (Ch. Jordan). Let P_r^{(n)} denote the probability of the occurrence
of exactly r among the events A₁, A₂, ..., A_n. Then we have

P_r^{(n)} = Σ_{l=r}^{n} (-1)^{l-r} \binom{l}{r} S_l^{(n)},

where S₀^{(n)} = 1 and

S_l^{(n)} = Σ_{1≤i₁<i₂<...<i_l≤n} P(A_{i₁} A_{i₂} ... A_{i_l})   for l = 1, 2, ...   (4)
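Theorem 10 can likewise be verified by exhaustion on a small finite space. The sketch below, with three illustrative events on two dice (the editor's choice, not the book's), compares the Jordan sum with a direct count of outcomes belonging to exactly r of the events.

```python
# Check of Jordan's formula P_r = sum_{l=r}^{n} (-1)^(l-r) C(l,r) S_l
# on a small example, using exact rational arithmetic.
from itertools import combinations, product
from fractions import Fraction
from math import comb

omega = list(product(range(1, 7), repeat=2))      # two dice, uniform
P = lambda E: Fraction(len(E), len(omega))

events = [
    {w for w in omega if w[0] % 2 == 0},          # illustrative event A1
    {w for w in omega if w[1] <= 2},              # illustrative event A2
    {w for w in omega if w[0] + w[1] == 7},       # illustrative event A3
]
n = len(events)

def S(l):
    # S_l: sum of P(A_{i1}...A_{il}) over all l-element index sets
    if l == 0:
        return Fraction(1)
    return sum(P(set.intersection(*(events[i] for i in idx)))
               for idx in combinations(range(n), l))

def exactly(r):
    # direct probability that exactly r of the events occur
    return Fraction(sum(1 for w in omega
                        if sum(w in E for E in events) == r), len(omega))

jordan = [sum((-1) ** (l - r) * comb(l, r) * S(l) for l in range(r, n + 1))
          for r in range(n + 1)]
direct = [exactly(r) for r in range(n + 1)]
assert jordan == direct
```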
where the real numbers λ_ω depend only on the numbers c_k and on the functional
dependence of the events B_k on the events A_j, but do not depend on
the numerical values P(A_j). The summation is over all the 2ⁿ elementary¹
events ω:

λ_ω = Σ_{ω ⊆ B_k} c_k,

where the summation is over such values of k for which ω figures in the
representation of B_k. Since the nonnegative numbers P(ω) are subject
to the only condition Σ P(ω) = 1, (6) holds, in general, if and only if all
numbers λ_ω are nonnegative. But when the sequence of numbers P(A_j)
(j = 1, 2, ..., n) consists of nothing but zeros and ones, one and only one
of the elementary events ω has probability 1 and all the others have probability
0. Thus the proposition that λ_ω ≥ 0 for all ω is equivalent to the proposition
that (6) is valid whenever the sequence of numbers P(A_k) consists
of zeros and ones only. Theorem 11 is now proved.
From Theorem 11 it follows immediately that the relation

Σ_{k=1}^{m} c_k P(B_k) = 0    (7)

holds in every probability algebra, if and only if it holds in all cases when the
sequence of numbers P(A_k) consists of zeros and ones only.
Σ_{k≥0} (-1)^k \binom{r+k}{r} \binom{l}{r+k} = { 1 if l = r, 0 if l ≠ r }.    (8)

For l < r all terms of the left-hand side of (8) are equal to 0; for l = r only
the term k = 0 is distinct from zero, namely 1; and for l > r the sum can be
transformed as follows:

Σ_{k=0}^{l-r} (-1)^k \binom{r+k}{r} \binom{l}{r+k} = \binom{l}{r} Σ_{k=0}^{l-r} (-1)^k \binom{l-r}{k} = \binom{l}{r} (1 - 1)^{l-r} = 0.
S_{r+1} / \binom{n-1}{r} ≤ S_r / \binom{n-1}{r-1}    (r = 1, 2, ..., n-1).    (10)

Theorem 13 reduces to the trivial inequality

\binom{l}{r+1} / \binom{n}{r+1} ≤ \binom{l}{r} / \binom{n}{r},

and Theorem 14 to the likewise trivial inequality

\binom{l}{r+1} / \binom{n-1}{r} ≤ \binom{l}{r} / \binom{n-1}{r-1}.
The probability P(A) of any event A is then uniquely determined by the values
of P for the sets which consist of exactly one element. Let {ω_i} be the set
consisting of ω_i only, and let further be P({ω_i}) = p_i (i = 1, 2, ..., N).
Then we have for each event A

P(A) = Σ_{ω_i ∈ A} p_i.

Since P(Ω) = 1, the (nonnegative) numbers p_i must obey the condition

Σ_{i=1}^{N} p_i = 1.
An important particular case is obtained if all the numbers p_i are equal
to each other, that is, if they are equal to 1/N. These special probability algebras
are called classical probability algebras, since the classical calculus of
probability dealt exclusively with these algebras.
At the early stages of development of probability theory one wished to
reduce the solution of any kind of problem to this case. But this often turned
out to be either too artificial or unnecessarily involved. Since, however, in the
games of chance (tossing of a coin, games of dice, roulette, card-games,
etc.) the probabilities can be determined in this manner indeed, and since
many problems of science and technology may be reduced to the study of
classical probability algebras, it is worthwhile to deal with them separately.
In the case of classical probability algebras we have

P(A) = N_A / N,

where N_A denotes the number of elementary events belonging to A.
Example la. A person having N keys in his pocket wishes to open his
apartment. He takes one key after the other from his pocket at random and
tries to open the door. What is the probability that he finds the right key
at the k-th trial? Suppose that the N! possible sequences of the keys all have
the same probability. In this case the answer is very simple indeed: N elements
have (N - 1)! permutations with a fixed element occupying the
k-th place. The probability in question is therefore (N - 1)!/N! = 1/N; that
is, the probability of finding the right key at the first, second, ..., N-th trial,
respectively, is always 1/N. If the keys are on the same key-ring and if the
same key may be tried more than once, the answer is different and will be
dealt with later (cf. Ch. III, § 3, 7.).
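Example 1a can be checked exhaustively for a small hypothetical N; the value N = 5 below is an assumption of the sketch, not taken from the text.

```python
# Among the N! equally likely orderings of the keys, the right key stands
# at the k-th place in exactly (N-1)! of them, so the probability is 1/N.
from itertools import permutations
from math import factorial

N = 5                            # illustrative number of keys
right = 0                        # label of the right key
probs = []
for k in range(1, N + 1):
    hits = sum(1 for p in permutations(range(N)) if p[k - 1] == right)
    assert hits == factorial(N - 1)
    probs.append(hits / factorial(N))
assert all(pr == 1 / N for pr in probs)
```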
Example lb. An urn contains M red and N — M white balls. Balls are
drawn from the urn one after the other without replacement. What is the
probability of obtaining the first red ball at the k-th drawing? In order to
answer the question we have to determine the total number of all permu
tations of M red and N - M white balls having white balls in the first k - 1
places and a red ball in the k-th place. The first k - 1 balls may be chosen in
\binom{N-M}{k-1} (different) ways from the N - M white balls; furthermore,
these can be arranged in (k - 1)! different orders. The red ball on the k-th place
can be chosen in M different ways and the remaining places can be filled up
in (N - k)! different ways. Hence the probability in question - provided that
all the N! permutations are equally probable - is given by

P_k = \binom{N-M}{k-1} (k - 1)! M (N - k)! / N!.

If N and M are large in comparison to k, and M/N is denoted by p (0 < p < 1),
then we have approximately P_k ≈ (1 - p)^{k-1} p = π_k. Indeed, if N
and M tend to infinity while p = M/N remains constant, then P_k tends to the
expression π_k.
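A sketch of the formula for P_k, written as a product of conditional probabilities (the factors (N-M-i)/(N-i) for the white draws and M/(N-k+1) for the red one); the sizes N, M are illustrative assumptions.

```python
# Exact P_k for "first red ball at the k-th drawing", plus the
# geometric approximation (1-p)^(k-1) p for large N, M.
from fractions import Fraction

def P_first_red(N, M, k):
    # first k-1 draws give white balls, the k-th draw gives a red one
    prob = Fraction(1)
    for i in range(k - 1):
        prob *= Fraction(N - M - i, N - i)      # another white ball
    return prob * Fraction(M, N - k + 1)        # red ball at the k-th draw

# the first red ball must appear at some k = 1, ..., N-M+1
N, M = 10, 4                                    # illustrative sizes
assert sum(P_first_red(N, M, k) for k in range(1, N - M + 2)) == 1

# for large N, M with p = M/N fixed, P_k approaches (1-p)**(k-1) * p
N, M, k = 10000, 3000, 3
p = M / N
assert abs(float(P_first_red(N, M, k)) - (1 - p) ** (k - 1) * p) < 1e-3
```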
Solution. A sample of n elements may be chosen from N elements in \binom{N}{n}
different ways. Suppose that every such combination is equally probable.
Then the probability of every combination is \binom{N}{n}^{-1}. Therefore we have only
to count how many combinations contain just k of the rejects. There can
be chosen k elements from M in \binom{M}{k} different ways and n - k elements
from N - M in \binom{N-M}{n-k} different ways. Therefore the probability in question
is:

P_k = \binom{M}{k} \binom{N-M}{n-k} / \binom{N}{n}.
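The hypergeometric probabilities just derived can be computed directly; the sample sizes below are illustrative assumptions of the sketch.

```python
# Hypergeometric probability C(M,k) C(N-M,n-k) / C(N,n), checked to be
# a probability distribution in k (this is Vandermonde's identity).
from fractions import Fraction
from math import comb

def hypergeom(N, M, n, k):
    return Fraction(comb(M, k) * comb(N - M, n - k), comb(N, n))

N, M, n = 50, 8, 10              # illustrative: 8 rejects among 50 items
probs = [hypergeom(N, M, n, k) for k in range(n + 1)]
assert sum(probs) == 1
```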
Fig. 8
\binom{N+n-k-2}{n-k} / \binom{N+n-1}{n}.
The theory discussed up to now can only deal with the most elementary
problems of probabilities; those involving an infinite number of possible
events are not covered by it. To deal with them we need Kolmogorov’s
theory, which will now be discussed.
In Kolmogorov’s probability theory we assume that there is given an
algebra of sets, isomorphic to the algebra of events dealt with. This assumption,
as we have seen, does not restrict the generality. We assume further
that this algebra of sets contains not only the sum of any two sets belonging
to it but also the sum of denumerably many sets belonging to the algebra
of sets. Algebras of sets with this property are called σ-algebras or Borel
algebras.
In Kolmogorov’s theory we therefore assume the following axioms:
I. Let there be given a nonempty set Ω. The elements of Ω are said to be
elementary events and are denoted by ω.
II. Let there be specified an algebra of sets 𝒜 of subsets of Ω; the sets A of
𝒜 are called events.
III. 𝒜 is a σ-algebra, that is¹

A_k ∈ 𝒜 (k = 1, 2, ...) ⇒ Σ_{k=1}^{∞} A_k ∈ 𝒜.
From the Axioms I-III it follows immediately that if A_k ∈ 𝒜 (k = 1, 2, ...),
then also

Π_{k=1}^{∞} A_k ∈ 𝒜.
The following axioms prescribe the properties of probabilities:
IV. To each element A of 𝒜 is assigned a nonnegative real number P(A),
called the probability of the event A.
1 Here and in what follows the sign => stands for the (logical) implication.
V. P(Ω) = 1.
VI. If A₁, A₂, ..., A_n, ... is a finite or a denumerably infinite sequence
of pairwise disjoint sets belonging to 𝒜, then

P(A₁ + A₂ + ... + A_n + ...) = P(A₁) + P(A₂) + ... + P(A_n) + ....
Requirement VI is called the σ-additivity (or complete additivity) of the
set function P(A).
A σ-algebra 𝒜 of subsets of a set Ω on which a set function P(A) is defined
such that Axioms I-VI are fulfilled will be called a probability space in the
sense of Kolmogorov and will be denoted by [Ω, 𝒜, P].
Theorems proved in the previous paragraph hold clearly for Kolmogorov
probability spaces too, as the Axioms α), β), and γ) correspond to Axioms
IV, V, and VI respectively. Axiom VI, however, requires more than the
Axiom γ) of probability algebras, since it assumes the additivity of P(A)
not only for finitely many, but also for denumerably many pairwise disjoint
sets belonging to the σ-algebra 𝒜.
Every finite probability algebra is a Kolmogorov probability space, since
an additive set function on a finite algebra of sets is trivially σ-additive.
The empty set is denoted by O. Obviously, we have always P(O) = 0
(cf. the note to Theorem 2 of § 3).
Apart from finite probability algebras the simplest probability fields
are those in which the space Ω consists of denumerably many elements.
Let Ω be a denumerable set, with elements ω₁, ω₂, ..., ω_n, ...; let 𝒜 consist
of all subsets of Ω. Let the set containing the only element ω_n be denoted
by {ω_n}; let further be P({ω_n}) = p_n (n = 1, 2, ...). In order that [Ω, 𝒜, P]
be a Kolmogorov probability space the conditions p_n ≥ 0 and

Σ_{n=1}^{∞} p_n = 1

must, according to Axioms IV-VI, be satisfied. Further, if A is an arbitrary
subset of Ω, then by Axiom VI we have

P(A) = Σ_{ω_k ∈ A} p_k.
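A small numerical sketch of such a denumerable space; the particular weights p_n = 2^{-n} are an illustrative assumption, not taken from the text.

```python
# Denumerable space with p_n = 2^(-n) (n = 1, 2, ...), which sums to 1.
from fractions import Fraction

p = lambda n: Fraction(1, 2 ** n)

partial = sum(p(n) for n in range(1, 60))
assert 1 - partial < Fraction(1, 2 ** 50)       # partial sums tend to 1

# By Axiom VI, P("the index n is even") is a geometric series with sum 1/3.
even = sum(p(n) for n in range(2, 60, 2))
assert abs(even - Fraction(1, 3)) < Fraction(1, 2 ** 50)
```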
In this paragraph we shall discuss some results of set theory and measure
theory, used in probability theory. We shall not aim at completeness. We
assume that the reader is familiar with the fundamentals of measure theory
and of the theory of functions of a real variable. Accordingly, proofs are
merely sketched or even omitted, especially if dealing with often-used
considerations of the theory of functions of a real variable.¹
We have seen already in Chapter I that every algebra of events is isomorphic
to an algebra of sets. It is always assumed in Kolmogorov's theory that
the sets assigned to the elements of the algebra of events form a σ-algebra.
Hence the algebra of sets constructed to the algebra of events must be
extended into a σ-algebra, if it is not already a σ-algebra itself. This extension
is always possible, even in the case of a ring of sets.
A system ℛ of subsets of a set Ω is called a ring of sets if

A ∈ ℛ and B ∈ ℛ  ⇒  A - B ∈ ℛ and A + B ∈ ℛ.
The ring of sets ℛ is an algebra of sets iff² the set Ω belongs to ℛ. In fact,
an algebra of sets 𝒜 can be characterized as a system of subsets of a set
Ω having the following properties:
I. A ∈ 𝒜 and B ∈ 𝒜 ⇒ A - B ∈ 𝒜;
II. A ∈ 𝒜 and B ∈ 𝒜 ⇒ A + B ∈ 𝒜;
III. Ω ∈ 𝒜.
This is obvious, as I and III imply that whenever A belongs to 𝒜 so does
Ω - A = Ā, and thus conditions 1, 2, and 3 of § 3 Chapter I are fulfilled. Conversely,
it follows from these conditions that whenever A ∈ 𝒜 and B ∈ 𝒜
hold, so do Ā ∈ 𝒜 and A - B = \overline{Ā + B} ∈ 𝒜 as well. Hence the conditions
I, II, III are equivalent to the conditions 1, 2, 3 of Chapter I. We have now
the following theorem:
Theorem 1. Let Ω be any set and ℛ a ring consisting of subsets of Ω. There
exists then a uniquely determined σ-ring (or Borel ring) 𝔅(ℛ) with the following
properties: 𝔅(ℛ) contains ℛ, and an arbitrary σ-ring ℛ′ containing ℛ
contains 𝔅(ℛ) as well. In other words, 𝔅(ℛ) is the least σ-ring containing ℛ.
Proof. Obviously, there exists a σ-ring 𝒮 containing ℛ. Such is, for instance,
the collection of all subsets of Ω. Let now 𝔅(ℛ) be the intersection
measure μ(A_n) < +∞ and A ⊆ Σ_{n=1}^{∞} A_n. The following theorem can now be
asserted:
Theorem 2. If μ(A) is a σ-finite measure defined over a ring of sets ℛ, there
exists a uniquely determined σ-finite measure μ̄(A) defined over the extended
ring 𝔅(ℛ) such that for every A ∈ ℛ one has μ̄(A) = μ(A).
Namely, let μ̄(A) be the infimum of all sums Σ_{n=1}^{∞} μ(A_n), where the A_n
belong to ℛ and their union contains A; that is,

μ̄(A) = inf { Σ_{n=1}^{∞} μ(A_n) : A_n ∈ ℛ, A ⊆ Σ_{n=1}^{∞} A_n }.
Let the ring ℛ be the collection of all sets consisting of a finite number of
intervals closed to the left and open to the right. μ(A) will be defined as
follows: If A consists of the half-open disjoint intervals [a_k, b_k), a₁ < b₁ <
a₂ < b₂ < ... < a_r < b_r, let then be

μ(A) = Σ_{k=1}^{r} (F(b_k) - F(a_k)).
To prove this we first prove another important general theorem:
Theorem 3. A nonnegative additive set function μ(A) defined on a ring of
sets ℛ is a measure on ℛ iff for every sequence of sets B_n ∈ ℛ such that
B_{n+1} ⊆ B_n, μ(B_n) < +∞ (n = 1, 2, ...) and Π_{n=1}^{∞} B_n = O (i.e. for every
decreasing sequence of sets B_n having the empty set as their intersection) the relation

lim_{n→∞} μ(B_n) = 0    (2)

holds.
The proof is simple. Indeed, if (2) is fulfilled and

A_n ∈ ℛ,  Σ_{n=1}^{∞} A_n = A ∈ ℛ,

while A_nA_m = O for n ≠ m, then we have for every n

μ(Σ_{k=1}^{∞} A_k) = Σ_{k=1}^{n-1} μ(A_k) + μ(Σ_{k=n}^{∞} A_k);

thus from the fact that the sets B_n = Σ_{k=n}^{∞} A_k satisfy the conditions

B_n ⊇ B_{n+1},  Π_{n=1}^{∞} B_n = O,

it follows that lim μ(B_n) = 0; thus

μ(Σ_{k=1}^{∞} A_k) = Σ_{k=1}^{∞} μ(A_k),
i.e. μ is completely additive. Conversely, if μ is completely additive, then
whenever B_n ⊇ B_{n+1} ∈ ℛ, Π_{n=1}^{∞} B_n = O hold, one has B₁ = Σ_{n=1}^{∞} (B_n - B_{n+1}),
where B_n - B_{n+1} ∈ ℛ and (B_n - B_{n+1})(B_m - B_{m+1}) = O (n ≠ m).
Therefore we have

lim_{n→∞} μ(B_n) = lim_{n→∞} Σ_{k=n}^{∞} μ(B_k - B_{k+1}) = 0.
Now we shall prove that the set function μ defined by (1) satisfies the conditions
of Theorem 3. Let therefore be B_n ∈ ℛ, B_{n+1} ⊆ B_n (n = 1, 2, ...)
and Π_{n=1}^{∞} B_n = O. Then 0 ≤ μ(B_{n+1}) ≤ μ(B_n) (n = 1, 2, ...), thus
lim_{n→∞} μ(B_n) = c does exist and μ(B_n) ≥ c (n = 1, 2, ...). We shall show
that the assumption c > 0 leads to a contradiction.
the relation Π_{n=1}^{∞} B′_n ≠ O would follow). But this contradicts d), and thus
we proved our statement that the set function μ defined by (1) is a measure
on ℛ.
According to Theorem 2 the definition of the measure μ can be extended
to all Borel subsets of the real axis. Thus we obtained on these sets a measure
μ̄ such that for A = [a, b) the relation μ̄(A) = F(b) - F(a) is valid.
Especially, if

F(x) = 0 for x ≤ 0,  F(x) = x for 0 < x ≤ 1,  F(x) = 1 for 1 < x,
then the above procedure assigns to every subinterval [a, b) of the interval
[0, 1] the value b - a.
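A minimal sketch of this interval measure for the particular F above; the helper names F and mu are assumptions of the sketch.

```python
# mu([a, b)) = F(b) - F(a), extended additively to finite unions of
# disjoint half-open intervals; with this F it reproduces length on [0, 1].
def F(x):
    return 0.0 if x <= 0 else (x if x <= 1 else 1.0)

def mu(intervals):
    return sum(F(b) - F(a) for a, b in intervals)

assert mu([(0.25, 0.75)]) == 0.5
assert mu([(0.0, 0.25), (0.5, 1.0)]) == 0.75
assert mu([(-3.0, 0.0)]) == 0.0      # no mass outside [0, 1]
```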
We have seen that μ̄ is a complete measure determined on a σ-ring ℛ*,
which contains the σ-ring 𝔅(ℛ). If F(x) has the special form mentioned
above, this measure is just the ordinary Lebesgue measure defined on the
interval [0, 1]. Any measure μ̄ constructed by means of a function F(x)
satisfying the above conditions is called a Lebesgue-Stieltjes measure
defined on the real axis.
The same construction can be applied in cases of more than one dimension.
Let F(x₁, x₂, ..., x_n) be a function of the real variables x₁, x₂, ..., x_n
having the following properties:
1. F(x₁, x₂, ..., x_n) is in any one of its variables a non-decreasing function
continuous from the left.
2. lim_{x_k → -∞} F(x₁, x₂, ..., x_n) = 0 (k = 1, 2, ..., n) and
lim_{x₁, ..., x_n → +∞} F(x₁, x₂, ..., x_n) = 1.
μ(A) = Σ_{k=1}^{r} μ(A_k),    (4)
then the extension of the set function μ(A) defined above leads to the ordinary
n-dimensional Lebesgue measure defined on the n-dimensional cube
0 ≤ x_k ≤ 1 (k = 1, 2, ..., n).
§ 8. Conditional probabilities
f_{A|B} = f_{AB} / f_B.
Since f_{AB} fluctuates around P(AB) and f_B around P(B), the conditional
relative frequency f_{A|B} will fluctuate for P(B) > 0 around P(AB)/P(B). This
number shall be called the conditional probability of the event A with
respect to the condition B; it is assumed that P(B) > 0. The notation for
the conditional probability is P(A | B); thus we put

P(A | B) = P(AB) / P(B).    (1)
By means of formula (1) the conditional probability of any event A of a
probability algebra with respect to any condition B can be calculated, provided
that P(B) > 0. If P(B) = 0, formula (1) has no sense; the conditional
probability P(A | B) is thus defined only for P(B) > 0.¹ Formula (1) may be
expressed in words by saying that the conditional probability of an event A
with respect to the condition В is nothing else than the ratio of the probability
of the joint occurrence of A and В and the probability of B.
Equality (1) is (in contradiction to the standpoint of many older textbooks)
neither a theorem nor an axiom; it is the definition of conditional
probability.² But this definition is not arbitrary; it is a logical consequence
of the concept of probability as the number about which the value of the
relative frequency fluctuates.
In the older literature of probability theory as well as in some vulgarizations
of modern physics one often finds the misleading formulation that the
probability of an event A changes because of the observation of the occurrence
of an event B. It is, however, obvious that P(A | B) and P(A) do not
differ because the occurrence of the event B was observed, but because of
the adjunction of the occurrence of event B to the originally given complex
of conditions.
Let us now state some examples.
Example 1. In the task of pebble-screening one may ask what part of the
pebbles is small enough to pass through a sieve S_A, i.e. what is the probability
that a pebble chosen at random passes through the sieve S_A. Let this event
be denoted by A. Assume now that the pebble was already sieved through
another sieve S_B, and the pebbles which did not pass through the sieve
S_B were separated. What is the probability that a pebble chosen at random
from those sieved through the sieve S_B will pass through the sieve S_A as
well? Let B denote the event that a pebble passes through S_B; the probability
of this event let be denoted by P(B). Let further AB denote the event that
a pebble passes through both S_B and S_A, and P(AB) the corresponding
probability. Then the probability that a pebble chosen at random from
those which passed S_B will pass S_A as well is, according to the above, equal
to P(AB)/P(B) = P(A | B).
Example 2. Two dice are thrown, a red one and a white one. What is the
probability of obtaining two sixes, provided that the white die showed a six?
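The two-dice question can be settled by enumeration with the definition (1); the answer (1/36)/(1/6) = 1/6 falls out of the sketch below.

```python
# Conditional probability by counting: P(A | B) = P(AB)/P(B) on the
# 36 equally likely outcomes (red, white).
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))        # (red, white)
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w == (6, 6)}               # two sixes
B = {w for w in omega if w[1] == 6}                 # white die shows a six

answer = P(A & B) / P(B)
assert answer == Fraction(1, 6)
```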
P(B | A) = P(AB) / P(A);    (2)

hence P(B | A) can be expressed by means of P(A | B), P(A), and P(B). One
can write (2) in the following form, equivalent to it:

P(B | A) = P(A | B) P(B) / P(A).    (3)

Formula (1) can be generalized as follows: If A₁, A₂, ..., A_n are arbitrary
events such that P(A₁A₂ ... A_{n-1}) > 0, we have

P(A₁A₂ ... A_n) = P(A₁) P(A₂ | A₁) P(A₃ | A₁A₂) ... P(A_n | A₁A₂ ... A_{n-1}).    (4)
§ 9. The independence of events
Let A and В be two events of a probability algebra; assume that P(A) > 0,
and P(B) > 0. In the preceding paragraph the conditional probability
P(A I B) was defined. Generally it is different from P(A). If, however, it is
not, i.e. if
P(A | B) = P(A),    (1)
then we say that A is independent o f B. If A is independent of B, then В is
independent of A as well; indeed, by Formulas (2) and (3) of the preceding
paragraph
P(B | A) = P(B).    (1′)
It is therefore permissible to say that A and B are independent of each
other. From Formula (1) of § 8 there follows readily a definition of independence
of two events that is symmetrical in A and B. Indeed, because of the
independence just defined we have

P(AB) = P(A) P(B).    (2)

If A and B are independent, (2) is valid; conversely, if (2) holds and P(A),
P(B) are both positive, then (1) and (1′) hold as well, thus A and B are independent.
Hence (2) is a necessary and sufficient condition of independence,
and it may serve as a definition as well. Old textbooks of probability
theory used to call relation (2) the product rule of probabilities. However,
according to the interpretation followed in this book, (2) is not a theorem
but the definition of independence. (Since we take Formula (2) as the definition
of independence, any event A with P(A) = 0 or P(A) = 1 is independent
of every event B.)
If A and B are independent, A and B̄ are independent as well. Namely,
from (2) it follows that

P(AB̄) = P(A) - P(AB) = P(A) - P(A) P(B) = P(A) P(B̄).

Therefore the independence of A and B implies the independence of A
and B̄ and, similarly, that of Ā and B, further that of Ā and B̄.
The independence of two complete systems of events is defined in the
following manner: The complete systems of events (A₁, A₂, ..., A_m) and
(B₁, B₂, ..., B_n) are said to be independent, if the relations

P(A_j B_k) = P(A_j) P(B_k)  (j = 1, 2, ..., m; k = 1, 2, ..., n)    (3)

are valid for them. It is easy to see that from the mn conditions figuring in
(3) every one containing A_m or B_n can be omitted. If the remaining
mn - (m + n - 1) = (m - 1)(n - 1) conditions are fulfilled, the omitted
ones are necessarily fulfilled too, as is seen from the relations
Σ_{k=1}^{n} P(A_j B_k) = P(A_j)  (j = 1, 2, ..., m)    (4)

and

Σ_{j=1}^{m} P(A_j B_k) = P(B_k)  (k = 1, 2, ..., n).    (5)
are superfluous, since the validity of one implies necessarily the validity of
the remaining three. Thus the independence of the events A and В is equi
valent to the independence of the complete systems of events (A, A) and
(В, В). This follows also from the relation
1/2 · 1/2 = 1/4.
Let us now extend the concept of independence to more than two events.
If A, B, and C are pairwise independent (i.e. A and B, A and C, B and C
are independent) events of the same probability algebra, it does not follow
that there is no dependence at all between the events A, B, and C. This
may be seen from the following example.
Let us throw two dice; let A denote the event of obtaining an even number
with the first die, В the event of throwing an odd number with the second,
finally C the event of throwing either both even or both odd numbers. Then
P(ABC) = 0,

thus

P((AB)C) ≠ P(AB) · P(C),
P(AB) = P(A) P(B),
P(AC) = P(A) P(C),
P(BC) = P(B) P(C),
P(ABC) = P(A) P(B) P(C)
are valid. The first three of these relations express the pairwise independence
of A, B, and C, the fourth the fact that each of the events is independent
of the product of the remaining two. Indeed, from the first three
conditions we have:
is valid for any combination (i₁, i₂, ..., i_k) from the numbers 1, 2, ..., n.
Since from n objects one can choose k objects in \binom{n}{k} ways, (7) consists of
2ⁿ - n - 1 conditions. In what follows, by saying for more than two events
that they are independent we shall mean that they are completely independent
in the sense just defined. If only pairwise independence is meant, this
will be stated explicitly. The independence of more than two complete systems
of events can be defined in a similar manner.
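The two-dice counterexample given earlier in this paragraph can be checked by enumeration: the sketch below confirms that A, B, C are pairwise independent while P(ABC) = 0 differs from P(A)P(B)P(C) = 1/8.

```python
# Pairwise independence without complete independence.
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w[0] % 2 == 0}            # first die even
B = {w for w in omega if w[1] % 2 == 1}            # second die odd
C = {w for w in omega if w[0] % 2 == w[1] % 2}     # same parity

assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
assert P(A & B & C) == 0 != P(A) * P(B) * P(C)
```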
Combinatorial methods for the calculation of probabilities have already
been mentioned. They rested upon the assumption of the equiprobability
of certain events. By means of the concept of independence, however, this
assumption may often be reduced to simpler assumptions. Besides the
simplification of the assumptions, this reduction has the advantage that
checking the practical validity of our assumptions sometimes becomes
easier.
Example 2. Sampling without replacement. An urn contains n different
objects, numbered somehow from 1 to n. We draw one after the other k
items without replacement. What is the probability that we obtain a given
combination of k elements? Clearly the number of possible combinations
is \binom{n}{k}. It was supposed that all combinations are equally probable; the
probability looked for is thus \binom{n}{k}^{-1}.
This result may also be obtained from the following simpler assumption:
at every drawing the conditional probability of drawing any object still in the
urn is the same. Here the probability that a given combination occurs in a
given order is (1/n) · (1/(n-1)) ··· (1/(n-k+1)). Namely, at the first drawing there
are in the urn n objects, the probability of choosing any one is 1/n; at the
second drawing the conditional probability of choosing any one of the n - 1
objects which are still in the urn is 1/(n-1), etc. Since the elements of the combination
in question may be chosen from the urn in k! different orders,
the obtained result must be multiplied by k! and thus we get that the probability
of drawing a combination of k arbitrary elements is

k! / (n(n-1) ... (n-k+1)) = \binom{n}{k}^{-1}.
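The identity just used can be checked both algebraically and by direct enumeration of ordered draws; the sizes n = 6, k = 3 below are illustrative.

```python
# k! / (n(n-1)...(n-k+1)) = 1 / C(n, k), plus a direct enumeration:
# each fixed k-element combination is drawn with probability 1/C(n, k).
from itertools import permutations
from fractions import Fraction
from math import comb, factorial

n, k = 6, 3                                  # illustrative sizes
falling = 1
for i in range(k):
    falling *= n - i                         # n(n-1)...(n-k+1)
assert Fraction(factorial(k), falling) == Fraction(1, comb(n, k))

target = {0, 1, 2}                           # one fixed combination
ordered = [p for p in permutations(range(n), k) if set(p) == target]
assert Fraction(len(ordered), falling) == Fraction(1, comb(n, k))
```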
Indeed, let A_i denote the event of choosing a red ball at the i-th drawing
(i = 1, 2, ..., n). These events are, because of the replacement, independent
of each other. The probability that at the i₁-th, i₂-th, ..., i_k-th drawing a
red and at all the other (j₁-th, j₂-th, ..., j_{n-k}-th) drawings a white ball will
be chosen is nothing else than the probability of the event

A_{i₁} A_{i₂} ... A_{i_k} Ā_{j₁} Ā_{j₂} ... Ā_{j_{n-k}}.

As the events A_{i₁}, A_{i₂}, ..., A_{i_k}, Ā_{j₁}, Ā_{j₂}, ..., Ā_{j_{n-k}} are completely independent
and P(A_i) = p, P(Ā_j) = 1 - p, we get

P(A_{i₁} ... A_{i_k} Ā_{j₁} ... Ā_{j_{n-k}}) = p^k (1 - p)^{n-k}.

Since the order is irrelevant and only the number of red balls drawn is of
interest, the value so obtained must still be multiplied by the number of
the possible orderings, i.e. by \binom{n}{k}. Thus we obtain (8).
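The argument above can be replayed exactly in code: summing p^k (1-p)^(n-k) over all orderings with k red draws reproduces the binomial probability referred to as (8). The values n = 6, p = 1/3 are illustrative.

```python
# Binomial probability C(n, k) p^k (1-p)^(n-k), checked against full
# enumeration of all 2^n red/white sequences with exact fractions.
from itertools import product
from fractions import Fraction
from math import comb

n = 6
p = Fraction(1, 3)                                 # illustrative value

def W(k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

for k in range(n + 1):
    direct = sum(p ** sum(seq) * (1 - p) ** (n - sum(seq))
                 for seq in product((0, 1), repeat=n) if sum(seq) == k)
    assert W(k) == direct
assert sum(W(k) for k in range(n + 1)) == 1
```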
This result can immediately be generalized for experiments with more
than two possible outcomes. Let the possible outcomes in every experiment
be A⁽¹⁾, A⁽²⁾, ..., A⁽ʳ⁾; let their probabilities be denoted by P(A⁽ʰ⁾) = p_h
(h = 1, 2, ..., r). Of course we have Σ_{h=1}^{r} p_h = 1. Assume that in repeated
performances of the experiment the outcomes of the individual experiments are
independent of each other. Then the probability that in n repetitions of the
experiment event A⁽¹⁾ occurs k₁ times, event A⁽²⁾ k₂ times, ..., event A⁽ʳ⁾ k_r
times, is

(n! / (k₁! k₂! ... k_r!)) p₁^{k₁} p₂^{k₂} ... p_r^{k_r},    (9)

where Σ_{h=1}^{r} k_h = n. For r = 2 Formula (9) reduces to (8).
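A sketch of Formula (9) with illustrative probabilities p_h; it checks that the multinomial probabilities sum to 1 and that the case r = 2 agrees with the binomial formula.

```python
# Multinomial probability n!/(k_1!...k_r!) p_1^{k_1}...p_r^{k_r}.
from itertools import product
from fractions import Fraction
from math import factorial, comb

def multinomial(ks, ps):
    prob = Fraction(factorial(sum(ks)))
    for k, p in zip(ks, ps):
        prob = prob * p ** k / factorial(k)
    return prob

ps = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # illustrative p_h
n = 5
total = sum(multinomial(ks, ps)
            for ks in product(range(n + 1), repeat=3) if sum(ks) == n)
assert total == 1                                       # (9) sums to 1

p = Fraction(1, 4)                                      # r = 2 reduces to (8)
assert multinomial((2, 3), (p, 1 - p)) == comb(5, 2) * p ** 2 * (1 - p) ** 3
```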
§ 10. “Geometric” probabilities
It is easy to see from the results of § 7 that [Ω, 𝒜, P] is a Kolmogorov probability
space. In this probability space probabilities may be obtained by
geometric determination of measures. Probabilities were thus calculated
already in the eighteenth century.¹
Some simple examples will be presented here.
Example 1. In shooting at a square target we assume that every shot hits
the target (i.e. we consider only shots with this property). Let the probability
that the bullet hits a given part of the target be proportional to the area of
the part in question. What is the probability that the hit lies in the part A ?
Clearly we only have to determine the factor of proportionality. If Q denotes
the entire target, the probability belonging to it must be equal to 1.
Hence

P(A) = μ(A) / μ(Q),    (1)

where μ(Q) denotes the area of the entire target and μ(A) that of A. Thus
for instance the probability of hitting the left lower quadrant of the target is
equal to 1/4.
As seen from this example, not every subset of the sample space can be
considered as an event. Indeed, one cannot assign an event to every subset
of the target, since the “area”, as it is well known, cannot be defined for
every subset such that it is completely additive and that the areas of con
gruent figures are equal.
In general, the distribution of probability is said to be uniform if the
probability that an object situated at random lies in a subset can be obtained
according to the definition (1) from a geometric measure μ invariant under
displacement (e.g. volume, area, length of arc, etc.).
Example 2. A man forgot to wind up his watch and thus it stopped. What
is the probability that the minute hand stopped between 3 and 6 ? Suppose
1 Of course instead of Lebesgue measure the notion of the area (and volume) of
elementary geometry was applied.
the probability that the minute hand stops on a given arc of the circumference
of the face of the watch is proportional to the length of the arc in
question. Then the probability asked for will be equal to the quotient of
the length of the arc in question, and the whole circumference of the face;
i.e. in our case to 1/4.
In the above two examples the determination of the probabilities was
reduced to the determination of the area or of the length of the arc
in certain geometric configurations. Though this method is intuitively
very convincing it is nevertheless a very special method. Before applying
it to further examples, let us see its relation to the already described combinatorial
method. This relation is most evident in Example 2. If we neglect
the fractions of the minutes and are looking for the probability that the
minute hand stops between the zeroth and the first, the first and second,. . . ,
the k-th and (k+1)-th minute (k = 0, 1, ..., 59), then we have a sample
space consisting of 60 elementary events; the probability of every event is
the same, viz. 1/60. In the case of the example of the target let us assume,
for sake of simplicity, that the sides of the square target are 1 m long. Let
us subdivide the target into n² congruent little squares with sides parallel
to the sides of the target. The probability that a hit lies in a set which can be
obtained as the union of a certain number of the little squares is obtained
by dividing the number of the little squares in question by n². Thus we
see that geometric probabilities can be approximately determined by a
combinatorial method. We must not, however, restrict ourselves to some
fixed n in the subdivision, for then we could not obtain the probability of a
hit lying in a domain limited by a general curve. If the mentioned subdivision
is performed for every n, however large, then the probability of measurable
sets, or to be more precise, of every domain having an area in the sense
of Jordan, can be calculated by means of limits. For this calculation we
have to consider the quotient k/n², where k means the number of small
squares lying in the domain if the large square is subdivided into n² congruent
small squares, and we have to determine the limit of k/n² for
n → ∞.
Probabilities obtained in a combinatorial way (without passing to the
limit) are always rational numbers; geometric probabilities, however, may
assume any value between 0 and 1. Thus for instance the probability that
the hit lies in the circle inscribed into the square target is equal to π/4.
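The inscribed-circle probability π/4 can be approximated by the very subdivision idea just described, or more simply by random hits; the Monte Carlo sketch below (sample size and seed are arbitrary choices) counts uniform hits on the unit square that fall inside the inscribed circle.

```python
# Monte Carlo estimate of the geometric probability pi/4.
import math
import random

random.seed(1)
n = 200_000
hits = sum(1 for _ in range(n)
           if (random.random() - 0.5) ** 2
            + (random.random() - 0.5) ** 2 <= 0.25)
estimate = hits / n
assert abs(estimate - math.pi / 4) < 0.01
```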
Fig. 9
metry of the circle we may assume that the midpoint of the chord lies on a
fixed radius of the circle and choose the midpoint of the chord so that the
probability that it lies in a given segment of this fixed radius is assumed to
be proportional to the length of this segment. The chord will be longer
than the side of the inscribed regular triangle, if its midpoint has a distance
less than r/2 from the centre of the circle; the answer is thus 1/2 (cf. Fig. 10).
Interpretation 3. Because of the symmetry of the circle one of the endpoints
of the chord may be fixed, for instance in the point P₀; the other endpoint
point can be chosen on the circle at random. Let the probability that this
other endpoint P lies on an arbitrary arc of the circle be proportional to the
length of this arc. The regular triangle inscribed into the circle having for
one of its vertices the fixed point P0 divides the circumference into three
equal parts. A chord drawn from the point P0 will be longer than the side
of the triangle, if its other endpoint lies on that one-third part of the circumference
which is opposite to point P₀. Since the length of this latter is one
third of the circumference, the answer is, according to this interpretation,
equal to 1/3.
From a well-known theorem of elementary geometry concerning the
central and peripheral angles it follows that the third interpretation is equivalent
to the statement that the probability distribution of the intersection
point of the chord and the semicircle of centre P₀ is uniform on this semicircle
(Fig. 11).
Obviously, all interpretations discussed above can be realized in physical
experiments. The example once seemed a paradox because one did not
pay attention to the fact that the three interpretations correspond to different
experimental conditions concerning the random choice of the chord.
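The differing experimental conditions can be made concrete by simulation. The sketch below (radius r = 1, seeded sampling, both assumptions of the editor) realizes interpretations 2 and 3 and recovers the answers 1/2 and 1/3.

```python
# Bertrand's chord problem: interpretations 2 and 3 by simulation.
# The inscribed regular triangle of a unit circle has side sqrt(3).
import math
import random

random.seed(2)
n = 200_000
side = math.sqrt(3.0)

# Interpretation 2: midpoint uniform on a fixed radius; chord 2*sqrt(1-d^2).
long2 = sum(1 for _ in range(n)
            if 2.0 * math.sqrt(1.0 - random.random() ** 2) > side)
# Interpretation 3: second endpoint uniform on the circumference;
# chord from P0 at half-angle u*pi has length 2*sin(u*pi).
long3 = sum(1 for _ in range(n)
            if 2.0 * math.sin(random.random() * math.pi) > side)

assert abs(long2 / n - 0.5) < 0.01
assert abs(long3 / n - 1 / 3) < 0.01
```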
case we only need to compute the area of the domain determined by the
inequalities (Fig. 12)
0 < x < 1/2 < y < 1 and y - x < 1/2,

or

0 < y < 1/2 < x < 1 and x - y < 1/2.
Fig. 12
The method just applied is often used, for instance in statistical physics.
Here, to every state of the physical system a point of the “phase space”
may be assigned, having for its coordinates the data characterizing the
state in question. Accordingly, the phase space has as many dimensions
as the state of the system has data to characterize it (the so-called degrees
of freedom of the system). In our example we assigned a point of the phase
space to a decomposition of the (0, 1) interval by two points; the degree of
freedom of the “system” is here equal to 2. The analogy can be made still
more obvious by assigning to the decomposition of the (0, 1) interval a
physical system: two mass points moving in the interval (0, 1).
Clearly the phase space may be chosen in many ways; by solving problems
of probability in this way, however, one must not forget to verify in every
given case separately the assumption that the probabilities belonging to
the subdomains of the phase space are proportional to the area (volume).
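The phase-space computation can be checked by sampling the two division points directly; the following sketch (an illustration, not from the text) estimates the area of the domain described by the inequalities above:

```python
import random

random.seed(2)

def in_domain(x, y):
    # the two symmetric cases: one point on each side of 1/2,
    # with the two points less than 1/2 apart
    return (x < 0.5 < y and y - x < 0.5) or (y < 0.5 < x and x - y < 0.5)

N = 200_000
p = sum(in_domain(random.random(), random.random()) for _ in range(N)) / N
print(p)  # estimates the area of the domain, 1/4
```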
Finally we shall discuss here a classical example, Buffon's needle problem
(1777).
determine the value of π with any prescribed precision. Of course this would
have no practical importance, since there are more straightforward and
reliable methods to compute the value of π. Still the question is of great
interest, since it shows that certain mathematical problems can be solved
approximately by performing experiments of a probabilistic nature.
Nowadays difficult differential equations and other problems of numerical
analysis are treated in this manner (this is the so-called Monte Carlo method).
Questions dealt with in this paragraph are closely connected to integral
geometry.
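A sketch of such a probabilistic computation of π for Buffon's needle, assuming the classical setup (a needle of length l dropped on parallel lines a distance d ≥ l apart, where the probability of a hit is 2l/(πd)):

```python
import math
import random

random.seed(3)
l, d = 1.0, 2.0  # needle length and line spacing (l <= d)

def hit():
    x = random.uniform(0.0, d / 2.0)          # distance of the needle's midpoint to the nearest line
    theta = random.uniform(0.0, math.pi / 2)  # acute angle between needle and lines
    return x <= (l / 2.0) * math.sin(theta)

N = 500_000
p = sum(hit() for _ in range(N)) / N
pi_estimate = 2.0 * l / (p * d)
print(pi_estimate)  # fluctuates around pi
```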
P( Σ_{n=1}^∞ An | B ) = Σ_{n=1}^∞ P(An | B).

P(A | B) = P(AB | C) / P(B | C).
If the Axioms a), b), and c) are satisfied, we shall call the system
[Ω, 𝒜, ℬ, P(A | B)] a conditional probability space.
If P*(A) is a measure defined on 𝒜 and P*(Ω) = 1 (that is, if [Ω, 𝒜, P*]
is a Kolmogorov probability field), and if further ℬ* denotes the collection of
all sets B ∈ 𝒜 such that P*(B) > 0, then, as is easy to see, the system
[Ω, 𝒜, ℬ*, P*(A | B)] is a conditional probability space, provided P*(A | B)
is defined by

P*(A | B) = P*(AB) / P*(B)   (A ∈ 𝒜, B ∈ ℬ*).
Theorem 6. If for fixed C ∈ ℬ we put P*_C(A) = P(A | C), the system
[Ω, 𝒜, P*_C] is a Kolmogorov probability space. If B is an element of ℬ such
that P(B | C) > 0, we have

P*_C(A | B) = P(A | BC).
Proof. The first statement of the theorem is evident, since P*_C is a measure
on 𝒜 and P*_C(Ω) = 1. The second statement follows from Axiom c); indeed,
we have by Theorem 1

P*_C(A | B) = P*_C(AB) / P*_C(B) = P(AB | C) / P(B | C) = P(ABC | C) / P(BC | C) = P(A | BC).
P*(A | B) = P*(AB) / P*(B).
Remark. ℬ may contain sets B such that P*(B) = 0. On the other hand, sets
B for which P*(B) > 0 may not belong to ℬ. Hence [Ω, 𝒜, ℬ, P(A | B)]
is not necessarily identical to the Kolmogorov probability space
[Ω, 𝒜, P(A | Ω)], not even in the case Ω ∈ ℬ.
From the theorems proved above one readily sees how the generalized
theory of probability can be deduced from our axioms.
Let us mention here some further examples.
Example 1. Let Ω be the n-dimensional Euclidean space; let the points
of Ω be denoted by ω = (ω1, ω2, …, ωn). Let 𝒜 denote the class of all
measurable subsets of Ω, let further f(ω) be a nonnegative, measurable function
defined on Ω and ℬ the set of all measurable sets B such that ∫_B f(ω) dω is
finite and positive. Put

P(A | B) = ∫_{AB} f(ω) dω / ∫_B f(ω) dω.

[Ω, 𝒜, ℬ, P(A | B)] is then a conditional probability space. If ∫_Ω f(ω) dω <
< +∞, a conditional probability space generated by a Kolmogorov
probability space is obtained; if, however, ∫_Ω f(ω) dω = +∞, this is not the case.
Especially when f(ω) ≡ 1, we obtain the uniform probability distribution
in the whole n-dimensional space. In this case P(A | B) is the ratio of the
Lebesgue measures of the sets AB and B.
Clearly [Ω, 𝒜, ℬ, P(A | B)] is a conditional probability space. It is
generated by a Kolmogorov probability space if and only if the series Σ_{n=1}^∞ pn
is convergent.

Especially when pn = 1 (n = 1, 2, …),

P(A | B) = Σ_{n ∈ AB} 1 / Σ_{n ∈ B} 1

is equal to the ratio of the numbers of elements of the set AB and the set B.¹
Evidently the question arises how conditional probabilities are connected
with relative frequencies, i.e. whether the generalized theory does have a
frequency-interpretation too.
The answer is affirmative and even very simple. The conditional proba
bility P(A I B) can be interpreted in the generalized theory (as well as in the
theory of Kolmogorov) as the number about which the relative frequency of
A with respect to the condition В fluctuates. Thus the generalized theory
has the same relation to the empirical world as Kolmogorov’s theory.
¹ In both cases, P(A | B) could have been represented as the ratio μ(AB)/μ(B), where
μ is an unbounded measure. (With respect to the conditions for the existence of such
measures cf. Á. Császár [1] and A. Rényi [18].)
§ 12. Exercises
1. Let p1, p2, p12 be given real numbers. Prove that the validity of the four
inequalities below is necessary and sufficient for the existence of two events A and B
such that P(A) = p1, P(B) = p2, P(AB) = p12:

1 − p1 − p2 + p12 ≥ 0, (1)
p1 − p12 ≥ 0, (2)
p2 − p12 ≥ 0, (3)
p12 ≥ 0. (4)
Hint. The left hand sides of the inequalities (1)–(4) are the probabilities
of ĀB̄, AB̄, ĀB, and AB, respectively. Of course these must be nonnegative, thus the
conditions are necessary. Their sufficiency can be shown as follows: from (1)–(4) it is
clear that

0 ≤ p12 ≤ p1 ≤ p1 + p2 − p12 ≤ 1

and similarly

0 ≤ p12 ≤ p2 ≤ p1 + p2 − p12 ≤ 1.
8. What is the probability that in n throws of a die the sum of the numbers obtained
is equal to k?

Hint. Determine the coefficient of x^k in the expansion of the generating function
(x + x² + x³ + x⁴ + x⁵ + x⁶)ⁿ.
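The hint can be carried out mechanically: multiply out the generating polynomial factor by factor and read off the coefficient of x^k. A sketch (the helper name is illustrative):

```python
def dice_sum_distribution(n):
    # expand (x + x^2 + ... + x^6)^n; counts[s] = number of ways to throw sum s
    counts = {0: 1}
    for _ in range(n):
        new = {}
        for s, c in counts.items():
            for face in range(1, 7):
                new[s + face] = new.get(s + face, 0) + c
        counts = new
    total = 6 ** n
    return {s: c / total for s, c in counts.items()}

probs = dice_sum_distribution(2)
print(probs[7])  # 6/36 for two dice
```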
9. What is the probability that the sum of the numbers thrown is larger than 10
in a throw with three dice?
Remark. This was the condition of gain in the “passe-dix” game which was current
in the Seventeenth Century.
10. What is more probable: to get at least one six with four dice or at least one
double six in 24 throws of two dice? (Chevalier de Méré’s problem.)
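For Exercise 10 the two probabilities can be computed directly; a sketch:

```python
# at least one six in 4 throws of one die
p_one_six = 1 - (5 / 6) ** 4
# at least one double six in 24 throws of two dice
p_double_six = 1 - (35 / 36) ** 24
print(p_one_six, p_double_six)  # about 0.518 versus 0.491
```

The first bet is (slightly) favourable, the second is not, which is the historical resolution of de Méré's puzzle.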
11. In a party of n married couples everybody dances. Every gentleman dances
with every one of the ladies with the same probability. What is the probability that
nobody dances with his own wife? Find the limit of this probability for n → ∞.
12. An urn contains n white and m red balls, n ≠ m; balls are drawn from the urn
at random without replacement. What is the probability that at some instant the
numbers of white and red balls drawn are equal?
13. There is a queue of 100 men before the box-office of an exhibition. One ticket
costs 1 shilling. 60 of the men in the queue have only 1 shilling coins, 40 only 2 shilling
coins. The cash contains no money at the start. What is the probability that tickets
can be sold without any trouble (i.e. that no man having only 2 shilling coins
arrives at the cash desk at a moment when it contains no 1 shilling coin)?
14. A particle moves along the x-axis with unit velocity. If it reaches a point with
integer abscissa it has one of two equiprobable possibilities: either it continues to
proceed or it turns back. Suppose that at the moment t = 0 the particle was at the
point x = 0. Find the probability that at a time t the particle is at distance
x from the origin (t is a positive integer, x an arbitrary integer).
15. Let the conditions of Exercise 14 be completed by the following: at the point
with abscissa x0 (a positive integer) there is an absorbing wall; if the particle arrives
at the point of abscissa x0 it will be absorbed and does not continue its movement.
Answer the question of the preceding exercise for x ≤ x0.
16. A box contains M red and N white balls which are drawn one after the other
without replacement. Let Pk denote the probability that the first red ball is drawn
at the k-th drawing. Since there are N white balls, clearly k ≤ N + 1 and thus
P1 + P2 + … + PN+1 = 1. By substituting the explicit expression of Pk we obtain
an identity. How can this identity be proved directly, without using probability theory?
17. Let us place eight rooks at random on a chessboard. What is the probability
that no rook can take another?
Hint. One has to count the number of ways in which 8 rooks can be placed on a
chessboard so that in every row and in every column there is exactly one rook.
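Following the hint, there are 8! favourable placements (one rook in each row, the columns forming a permutation) out of C(64, 8) placements of 8 indistinguishable rooks; a quick check:

```python
from math import comb, factorial

favourable = factorial(8)   # choose the column permutation, one rook per row
total = comb(64, 8)         # all placements of 8 indistinguishable rooks
p = favourable / total
print(p)  # roughly 9.1e-6
```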
18. Put

Pk(M, N) = C(M, k) C(N − M, n − k) / C(N, n)

and

Wk = C(n, k) p^k (1 − p)^{n−k}   (k = 0, 1, …, n),

where p = M/N. Prove that if M and N tend to infinity so that M/N = p remains
constant, then Pk(M, N) tends to Wk.
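The convergence asserted in Exercise 18 can be observed numerically; a sketch (the parameter values are illustrative):

```python
from math import comb

def hyper_pmf(M, N, n, k):
    # hypergeometric probability Pk(M, N)
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def binom_pmf(n, k, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, k = 10, 4
errors = [abs(hyper_pmf(N // 2, N, n, k) - binom_pmf(n, k, 0.5))
          for N in (20, 200, 2000)]
print(errors)  # shrinks as N grows with M/N = 1/2 fixed
```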
19. Put … ; prove that if M and N tend to infinity so that … remains
constant, then Qk(M, N) tends to Vk (cf. § 5, Example 1b). Estimate the error
| Qk(M, N) − Vk |.
20. How many raisins are to be put into 20 ozs of dough in order that the
probability be at least 0.99 that a cake of 1 oz contains at least one raisin?
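Exercise 20 reduces to the inequality (1 − 1/20)^n ≤ 0.01, since each raisin, independently, lands in a fixed 1 oz cake with probability 1/20; a sketch:

```python
p_miss = 19 / 20           # one raisin misses a fixed 1 oz cake
n = 1
while p_miss ** n > 0.01:  # P(the cake gets no raisin) must be at most 0.01
    n += 1
print(n)  # 90
```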
Compute the probability that at the time of the stopping only the wall of the elevator
shaft can be seen from the elevator.
26. What conditions must the numbers p, q, r, s satisfy in order that there exist
events A and В such that
Sr = Ur + C(r + 1, r) U_{r+1} + … + C(n, r) Un.
replacement. Let Ak denote the event that the k-th drawing yields a red ball. Prove
now the following statements:
a) The events A1, A2, …, An are exchangeable.
b) The events Ak are, generally, not even pairwise independent.
c) Let Wk denote the probability that from the n drawings exactly k yield red balls.
Compute the value of Wk.
d) Let πk denote the probability that the first red ball was drawn at the k-th
drawing; compute the value of πk.
41. Let Ak denote the event that given the conditions of Exercise 37 the k-th person
draws his own visiting card. Prove that the events Ak are exchangeable.
42. Let N balls be distributed among n urns such that each ball can fall with the
same probability into any one of the urns. Compute
a) the probability P0(n, N) that at least one ball falls into every urn;
b) the probability Pk(n, N) that exactly k (k = 1, 2, …, n − 1) of the urns remain
empty.
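For part a) the standard inclusion–exclusion answer, summing over the urns forced to remain empty, can be evaluated directly; the closed form used below is a statement of that standard answer, not quoted from the text:

```python
from math import comb

def P0(n, N):
    # inclusion-exclusion: j counts the urns forced to stay empty
    return sum((-1) ** j * comb(n, j) * (1 - j / n) ** N for j in range(n + 1))

p = P0(2, 2)
print(p)  # 0.5: with 2 balls and 2 urns, the balls must land in different urns
```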
43. Let Ak denote in the preceding exercise the event that the k-th urn does not
remain empty; show that the events Ak are exchangeable and
46. Let an urn contain M red and N − M white balls. Draw all balls from the
urn in turn without replacement and note the serial numbers of the drawings yielding
red balls. Let these serial numbers be k1, k2, …, kM, and put χ = k1 + k2 + … + kM. Let
Pn(M, N) denote the probability that χ = n (A ≤ n ≤ B), where

A = M(M + 1)/2 and B = A + M(N − M).
Put

F(M, N, x) = Σ_{n=A}^{B} Pn(M, N) xⁿ.

Determine the polynomial F(M, N, x) and thence the probabilities Pn(M, N).
Prove that

P_{B−n}(M, N) = P_{A+n}(M, N).
47. Prove by means of probability theory that if φ(n) denotes the number of the
positive integers less than n and relatively prime to n (n = 1, 2, …), then²

φ(n) = n Π_{p|n} (1 − 1/p).

… on the other hand we have P(Ap) = 1/p, hence, because of the independence of the events Ap,

φ(n)/n = P( Π_{p|n} Āp ) = Π_{p|n} P(Āp) = Π_{p|n} (1 − 1/p).
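The product formula obtained from the probabilistic argument can be checked against a direct count; a sketch:

```python
from math import gcd

def phi_count(n):
    # direct count of integers in (0, n) relatively prime to n
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

def phi_product(n):
    # evaluate n * prod over prime divisors p of n of (1 - 1/p), in integers
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p   # multiply result by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        result -= result // m       # m is the last remaining prime factor
    return result

print(phi_count(12), phi_product(12))  # both 4
```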
48. a) Let Ω be a countably infinite set, and let its elements be ω1, ω2, …, ωn, ….
Let 𝒜 consist of all subsets of Ω and let the probability measure P be defined in the
following manner: P({ωn}) = pn, where pn ≥ pn+1 > 0 (n = 1, 2, …) and Σ_{n=1}^∞ pn = 1.
Prove that the set of those numbers x for which an A ∈ 𝒜 can be found such that
P(A) = x is a perfect set.

b) Prove that, given the conditions of Exercise 48 a), the range of the set function
P(A) is identical to the interval [0, 1], if and only if

pn ≤ Σ_{k=n+1}^∞ pk   (n = 1, 2, …).
c) Given the conditions of Exercise 48 a), prove that to every r-tuple of numbers
x1, x2, …, xr with

Σ_{i=1}^r xi = 1,  xi ≥ 0  (i = 1, 2, …, r),

a complete system of events

A1, A2, …, Ar with P(Ai) = xi (i = 1, 2, …, r)

can be found, provided that

pn ≤ (1/r) Σ_{k=n}^∞ pk   (n = 1, 2, …).
Hint. a) A number x is said to be representable if there exists an event A ∈ 𝒜
such that P(A) = x, i.e. if x can be represented in the form x = Σ_{n∈A} pn. If xn
(n = 1, 2, …) is representable and lim_{n→∞} xn = x ≠ 0, then it is readily seen that x is
representable too. Indeed we can select from the sequence xn an infinite subsequence
x_{n_k} (k = 1, 2, …) such that in the representation of each x_{n_k} the greatest member
is p_{i_1}. Take now from this sequence an infinite subsequence having in its representation
for second greatest member p_{i_2}. By progressing in this manner we obtain a sequence
p_{i_s} (s = 1, 2, …) and it is easy to verify that Σ_{s=1}^∞ p_{i_s} = x. The range of the function
P(A) is thus a closed set. Furthermore, if x is a number which can be represented as
a sum of a finite number of the pn's, e.g. x = Σ_{l=1}^N p_{i_l}, then

x = lim_{n→∞} ( Σ_{l=1}^N p_{i_l} + pn ).

If x = Σ_{l=1}^∞ p_{i_l}, then

x = lim_{n→∞} Σ_{l=1}^n p_{i_l}.
Σ_{n=1}^∞ pn = Σ_{i=1}^r xi = 1.
Hint. Prove first that for any ε > 0, Ω can be decomposed into a finite number
of disjoint subsets Aj (Aj ∈ 𝒜; j = 1, 2, …, m) such that P(Aj) < ε. This can be
seen as follows. If A ∈ 𝒜, P(A) > 0, then A contains a subset B ⊆ A such that
0 < P(B) < ε. Indeed, if P(A) < ε, we can choose B = A. If P(A) ≥ ε, then (since
P is non-atomic) a B ⊂ A can be found such that B ∈ 𝒜 and 0 < P(B) < P(A);
here either P(B) or P(A − B) is not greater than P(A)/2. If P(A)/2 < ε, we have
completed the proof; if P(A)/2 ≥ ε, the procedure is continued. Since for large enough
r we have P(A)/2^r < ε, there can be found in a finite number of steps a set B as
required. Put

με(A) = sup { P(B) : B ⊆ A, B ∈ 𝒜, P(B) ≤ ε }   for A ∈ 𝒜.
According to what was said above, με(A) > 0 for P(A) > 0. Choose a set A1 ∈ 𝒜
for which 0 < P(A1) < ε, further a set A2 ⊆ Ω − A1 for which

ε > P(A2) ≥ (1/2) με(Ω − A1),

and then a set A3 ⊆ Ω − (A1 + A2) for which ε > P(A3) ≥ (1/2) με(Ω − (A1 + A2)); generally, if
the sets A1, A2, …, An are already chosen, we choose a set A_{n+1} such that the
conditions

A_{n+1} ⊆ Ω − (A1 + A2 + … + An)

and

ε > P(A_{n+1}) ≥ (1/2) με(Ω − (A1 + A2 + … + An))

are satisfied. Then A1, A2, …, An, … are disjoint sets, hence Σ_{n=1}^∞ P(An) ≤ 1,
thus lim_{n→∞} P(An) = 0 and at the same time

lim_{n→∞} με(Ω − (A1 + A2 + … + An)) = 0.
Choose now N so large that Σ_{n≥N} P(An) < ε. Then the sets A1, A2, …, A_{N−1} and
A′_N = Σ_{n=N}^∞ An possess the required properties. Now we can construct for an arbitrary
number x (0 ≤ x ≤ 1) an A ∈ 𝒜 such that P(A) = x in the following manner: Ω is
decomposed first into a number N1 of disjoint subsets such that …
Then x lies in one of the intervals [xj, x_{j+1}); e.g. x ∈ [x_{r1}, x_{r1+1}). By
continuing this procedure we obtain a set A, a countable union of the subsets so chosen,
for which P(A) = x.
50. Prove for an arbitrary probability space that the range of P(A) is a closed set.
Hint. A set A ∈ 𝒜 will be called an atom (with respect to P), if P(A) > 0 and if
B ∈ 𝒜, B ⊆ A imply either P(B) = 0 or P(B) = P(A). Two atoms A and A′ are, a
set of zero measure excepted, either identical or disjoint. From this it follows that
there can always be found either a finite or a countably infinite number of disjoint
atoms An (n = 1, 2, …) such that the set Ω − Σ_{n=1}^∞ An contains no further atoms. Put

B = Σ_{n=1}^∞ An,  μ1(A) = P(AB),  μ2(A) = P(A(Ω − B)).

Then P(A) = μ1(A) + μ2(A). Here μ1(A) can be considered as a measure on the
class of all subsets of the set Ω′ having for its elements the sets An, and μ2(A) is non-
atomic. Hence the statement of Exercise 50 is reduced to Exercises 48 a) and 49.
CHAPTER III

DISCRETE RANDOM VARIABLES
Let B1, B2, …, Bn, … be a complete system of events and let P(Bi) > 0
(i = 1, 2, …). Then an arbitrary event A ∈ 𝒜 can be decomposed according to

A = Σ_{n=1}^∞ ABn.

Since BiBj = ∅ holds for i ≠ j, we obtain

P(A) = Σ_{n=1}^∞ P(ABn). (1)
P(Ak) = M/N (k = 2, 3, …, N). According to the theorem of total
probability

P(A2) = P(A2 | A1) P(A1) + P(A2 | Ā1) P(Ā1),

hence

P(A2) = ((M − 1)/(N − 1)) (M/N) + (M/(N − 1)) ((N − M)/N) = M/N.

Similarly we obtain that P(Ak) = M/N if k = 3, 4, … (cf. Exercise 39a).
The name “polynomial distribution” comes from the fact that the terms
P(B_{k1, k2, …, kr}) can be obtained by expanding (p1 + p2 + … + pr)ⁿ
according to the polynomial theorem. If r = 3, we call the distribution
the trinomial distribution.¹
where pi is the probability that A occurred at the i-th experiment. The
summation is to be taken over all combinations (i1, i2, …, ik) of the k-th
order of the elements (1, 2, …, n), and j1, j2, …, j_{n−k} denote those numbers
of the sequence 1, 2, …, n which do not occur among i1, i2, …, ik. The
numbers P(Bk) form a probability distribution. If for instance all probabilities
pi are equal to each other, we obtain as a particular case the binomial
distribution (1).
The distribution (3) occurs for instance in the following practical problem:
In a factory there are n machines which do not work all the time. They are
switched on and switched off independently of each other. Let pi denote
the probability that the i-th machine is working at a given moment and let
P(Bk) be the probability that at this instant exactly k machines are working;
then P(Bk) is given by Formula (3). The fact that Σ_{k=0}^n P(Bk) = 1 can
be seen directly in the following manner: a simple calculation gives that

Σ_{k=0}^n P(Bk) x^k = Π_{i=1}^n (1 − pi + pi x);

by substituting x = 1 we obtain Σ_{k=0}^n P(Bk) = 1.
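The generating-function identity gives a practical way to compute the numbers P(Bk): multiply out the polynomial Π (1 − pi + pi x) factor by factor. A sketch with illustrative probabilities:

```python
def machine_distribution(probs):
    # coefficients of prod_i (1 - p_i + p_i * x);
    # coeffs[k] = P(exactly k machines are working)
    coeffs = [1.0]
    for p in probs:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k] += (1 - p) * c    # machine i idle
            new[k + 1] += p * c      # machine i working
        coeffs = new
    return coeffs

dist = machine_distribution([0.2, 0.5, 0.9])
print(dist)  # dist[0] = 0.8*0.5*0.1 = 0.04, and the coefficients sum to 1
```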
4. The following problem was discussed in the preceding Chapter. An
urn contains M red and N − M white balls (M < N). Draw n times one
ball from the urn without replacement (n ≤ N). What is the probability
that there are k red balls among the n balls drawn? Denote this event by
Ck. Then the events Ck [max(0, n − (N − M)) ≤ k ≤ min(n, M)] form
a complete system of events. The corresponding probabilities are, as we have
already shown:

P(Ck) = C(M, k) C(N − M, n − k) / C(N, n).

This distribution is called the hypergeometric distribution.
P(C_{k1, k2, …, kr}) = C(N1, k1) C(N2, k2) … C(Nr, kr) / C(N, n). (5)
Distribution (5) is called the polyhypergeometric distribution. It is, for
example, applied in statistical quality control, when the commodities are
classified into several categories. (Such categories are for instance: a) faultless;
b) faulty but still serviceable; c) completely faulty.)

The events

C_{k1, k2, …, kr}   (0 ≤ ki ≤ min(n, Ni); Σ_{i=1}^r ki = n)

form a complete system of events, thus

Σ P(C_{k1, k2, …, kr}) = 1.

This can be seen directly, if we compare the coefficient of xⁿ on both sides
of the identity

Π_{i=1}^r (1 + x)^{Ni} = (1 + x)^N.
6. Let an urn contain M red and N − M white balls. Let Ak (k = 0, 1, …,
N − M) denote the event that at consecutive drawings without replacement
we obtain the first red ball at the (k + 1)-st drawing. As was proved in
§ 5 of the preceding Chapter, we have

P(A0) = M/N,

P(Ak) = (M/(N − k)) Π_{j=0}^{k−1} (1 − M/(N − j))   (k = 1, 2, …, N − M). (6)

Since the events Ak form a complete system, it follows that

M/N + Σ_{k=1}^{N−M} (M/(N − k)) Π_{j=0}^{k−1} (1 − M/(N − j)) = 1.

This identity also has a direct proof, but it is not quite simple. It happens
often that certain identities, for which a mathematical proof may be rather
elaborate, are readily obtained by means of the calculus of probability.
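The identity can be verified for concrete M and N with exact rational arithmetic; a sketch:

```python
from fractions import Fraction

def first_red_probs(M, N):
    # P(A_k): the first red ball appears at the (k+1)-st drawing, formula (6)
    probs = []
    for k in range(N - M + 1):
        p = Fraction(M, N - k)
        for j in range(k):
            p *= 1 - Fraction(M, N - j)
        probs.append(p)
    return probs

total = sum(first_red_probs(3, 10))
print(total)  # exactly 1
```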
7. Let the preceding exercise be modified in the following manner: Let
an urn again contain M red and N − M white balls, but the drawn balls
should now always be replaced. Let Ak denote the event that we obtain the
first red ball at the (k + 1)-st drawing. The most marked difference between
this problem and that of the drawing without replacement dealt with above
is that the number k was there bounded (k ≤ N − M). Now, however, k
can be arbitrarily large; in principle, it is even possible that we always draw
a white ball. Hence it can be questioned whether the events Ak (k = 0, 1, …)
do form a complete system of events. Clearly, the events Ak mutually exclude
each other; the only thing we have to examine is whether it is certain that
one of the events occurs. By introducing the notation

Ω′ = Σ_{k=0}^∞ Ak

we have Ω′ ≠ Ω.
We shall prove, however, that the possibility of always drawing a white ball
in an infinite repetition of drawings has probability 0; thus in practice it
does not count at all, i.e. the system of events {Ak} is complete in a wider
sense.
First of all we compute the probabilities P(Ak). Put M/N = p, 1 − p = q.
The probability that we obtain white balls at each of the first k drawings and a
red one at the (k + 1)-st drawing is p q^k; hence P(Ak) = p q^k and

Σ_{k=0}^∞ P(Ak) = p Σ_{k=0}^∞ q^k = p/(1 − q) = 1.
Hence the probability of Ω′ is 1 and thus P(Ω − Ω′) = 0. Though it is in
principle possible that Ω − Ω′ occurs, this possibility can be neglected in
practice. Hence the system {Ak} of events is, in a wider sense of the word,
complete.

The distribution p q^k (k = 0, 1, …) is often called the geometric
distribution, since the sequence of its members p q^k forms a geometric series. We shall see
where the meaning of En can only be A or Ā, the number x having the binary
expansion x = 0.ε1 ε2 … εn …, where

εn = 1 if En = A,  εn = 0 if En = Ā.

P(C) = p^k q^l.
From this we can compute P(C) for every C ∈ 𝒜0. Clearly [Ω, 𝒜0, P] is a
probability algebra, but 𝒜0 is not a σ-algebra. But if we consider the least
σ-algebra 𝒜 containing 𝒜0 and extend the set function P(C) defined over 𝒜0
(readily seen to be a measure on 𝒜0) to 𝒜, then we obtain the Kolmogorov
probability space sought for (cf. Ch. II, § 6). In order to prove that P(C)
is a measure on 𝒜0, let us consider the above mapping of the sample space
onto the interval [0, 1]; let the interval [0, 1] be denoted by Ω*. There
corresponds to the algebra of sets 𝒜0 the class 𝒜0* of the subsets of Ω* consisting
of a finite number of pairwise disjoint intervals with binary rational
endpoints. Just as in Chapter II, § 7, there can be given a function F(x) so that
the probability belonging to the interval [a, b) is equal to F(b) − F(a);
indeed, if the interval [a, b) is of the form [m/2ⁿ, (m + 1)/2ⁿ) (m being odd) and

m/2ⁿ = 1/2^{k1} + 1/2^{k2} + … + 1/2^{kj}   (k1 < k2 < … < kj = n),

then we put

F(b) − F(a) = p^j q^{n−j}.

From this F(x) can be determined at every binary rational point x. Thus
for instance

F(0) = 0,  F(1) = 1,  F(1/2) = q,  F(1/4) = q²,

F(5/8) = q + pq²,  F(7/8) = q + pq + p²q,  etc.
In general, if
q = 1 − p. Let A_k^{(r)} denote the event that during independent repetitions
of the experiment the event A occurred for the r-th time (r ≥ 1) at the
(r + k)-th experiment. We obtain by a simple combinatorial consideration
that

P(A_k^{(r)}) = C(k + r − 1, r − 1) p^r q^k   (k = 0, 1, …). (8)

Σ_{k=0}^∞ P(A_k^{(r)}) = Σ_{k=0}^∞ C(k + r − 1, r − 1) p^r q^k = ( p/(1 − q) )^r = 1.
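That (8) defines a probability distribution can also be checked numerically, truncating the infinite sum; a sketch:

```python
from math import comb

def pmf(r, p, k):
    # formula (8): the r-th occurrence of A happens at trial r + k
    return comb(k + r - 1, r - 1) * p ** r * (1 - p) ** k

r, p = 3, 0.4
partial = sum(pmf(r, p, k) for k in range(2000))
print(partial)  # the partial sums approach 1
```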
P(C) = Σ_{n=0}^∞ P(Cn) = 0.

Thus the event A occurs infinitely many times with the probability 1.¹
9. Consider the following problem: let an urn contain M red and N − M
white balls. Draw a ball at random, replace the drawn ball and at the same
time place into the urn R extra balls of the same colour as the one drawn.²
Then we draw again a ball, and so on. What is the probability of the event
that in n drawings we obtain exactly k times a red ball? Let this event be
denoted by Ak. Of course we assume that at every drawing each ball of the

¹ Later on we shall prove more: let kn denote the number of occurrences of A in
the first n experiments; then not only lim_{n→∞} kn = +∞ with probability 1, but more
precisely lim_{n→∞} kn/n = p with probability 1.

² R can be negative as well. In the case of negative R we remove from the urn |R| balls
of the same colour as the one drawn.
urn is selected with the same probability. We compute first the probability
that we obtain at each of the first k drawings a red ball, and white balls at
the remaining n − k drawings. Clearly this probability is

Π_{j=0}^{k−1} (M + jR) · Π_{h=0}^{n−k−1} (N − M + hR) / Π_{l=0}^{n−1} (N + lR). (9)
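It can be shown that in the Pólya urn every order of k red and n − k white draws has the same probability (9); the probability of exactly k red balls in n drawings is then C(n, k) times (9), and these numbers must sum to 1. This can be checked with exact rational arithmetic (the parameters below are illustrative):

```python
from fractions import Fraction
from math import comb

def sequence_prob(M, N, R, n, k):
    # formula (9): k red draws followed by n - k white draws
    num = Fraction(1)
    for j in range(k):
        num *= M + j * R
    for h in range(n - k):
        num *= N - M + h * R
    den = Fraction(1)
    for l in range(n):
        den *= N + l * R
    return num / den

M, N, R, n = 2, 5, 3, 4
total = sum(comb(n, k) * sequence_prob(M, N, R, n, k) for k in range(n + 1))
print(total)  # exactly 1
```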
So far we have only considered whether a random event does or does not
occur. Qualitative statements like this are often insufficient and quantitative
investigations are necessary. In other words, for the description of random
mass phenomena one needs numerical data. These numerical data are not
constant; they show random fluctuations. Thus for instance the result of
a throw of a die is such a random number. Another example is the number
of calls arriving at a telephone exchange during a given time interval, or
the number of disintegrating atoms of a radioactive substance during a
given time interval.
In order to characterize a random quantity we have to know its possible
values and the probabilities of these values. Such random quantities are
called random variables. In the present Chapter we shall discuss only random
variables having a countable set of values; these are called discrete random
variables. The random variables figuring in the above examples were all
of the discrete type. The lifetime of a radioactive atom, for instance, is also
a random variable, but it is not a discrete one. General (not discrete) random
variables will be dealt with in the following Chapters. In what follows
random variables will be denoted by letters of the Greek alphabet.
Let A be an arbitrary event. Let the random variable ξA be defined in
the following way:

ξA = 1 if A occurs, 0 otherwise (i.e. if Ā occurs).
ξ = n if An occurs (n = 1, 2, …).

The value n may be replaced by f(n), where f(n) is any function defined for
the positive integers for which f(n) ≠ f(m) if n ≠ m. Thus we can see that
a complete system of events can be assigned to every discrete random
variable in a unique manner, while there can be assigned infinitely many
different random variables to a complete system of events.
We shall deal in this Chapter with random variables assuming only real
values. It must be said that probability theory also deals with random
variables whose range does not consist of real numbers but, for instance, of
n-dimensional vectors. There are also random variables whose values are
not vectors of finite dimension but infinite sequences of numbers, or
functions, etc. Later on, we shall also examine such cases.
Now let us see how the notion of a random variable is dealt with in the
general theory of probability.
In Chapter II we were made familiar with Kolmogorov's foundation of
probability theory. We started from a set Ω, the set of elementary events,
and a σ-algebra 𝒜 consisting of subsets of Ω. Here 𝒜 consists of all events
coming into our considerations. Further there was given a nonnegative
σ-additive set function P defined on 𝒜 such that P(Ω) = 1. The value
P(A) of this function for the set A defines the probability of the event A.
Naturally, we understand by a random variable a quantity depending on which
one of the elementary events in question occurs. A random variable is
therefore a function ξ = ξ(ω) assigning to every element ω of the set Ω (i.e., to
every elementary event) a numerical value.
What kind of restrictions are to be prescribed for such a function? If we
have a probability field where every subset of Ω corresponds to an event,
no restriction is necessary at all. But if this is not the case, then the definition
of a random variable calls for certain restrictions.
Since we consider in this Chapter discrete random variables only, we
confine ourselves (for the present) to the following definition:
Let [Ω, 𝒜, P] be a Kolmogorov probability space. A function ξ = ξ(ω)
defined on Ω with a countable set of values is said to be a discrete random
variable, if the set on which ξ(ω) takes on a fixed value x belongs to 𝒜 for
every choice of this fixed value x.
Let x1, x2, … denote the different possible values of the random variable
ξ = ξ(ω) and An the set of the elementary events ω ∈ Ω for which ξ(ω) = xn;
then An must belong to the algebra of sets 𝒜 for every n. Only in this case
is the probability

P(ξ = xn) = P(An)

defined.
A complete system of events associated with a discrete random variable
thus consists of those subsets of the space of events for which the random
variable takes on the same value. Especially, if ξA = ξA(ω) is the indicator
of the event A, then ξA(ω) is a random variable having the value 1 or 0
according as ω does or does not belong to the set A.
The sequence of probabilities of a complete system of events is said to be
a probability distribution. Now that we have introduced the concept of
random variable this probability distribution can be considered as the set
of all probabilities corresponding to the different values taken on by a
random variable. If for instance an experiment having the possible outcomes
A and Ā is independently repeated n times, then the number ξ of the
experiments showing the occurrence of the event A is a random variable with the
binomial distribution, i.e.

where the sum is to be extended over all values of k such that xk < x.
¹ Called also cumulative distribution function. The definition F(x) = P(ξ ≤ x) is
also customary; this induces only minor modifications in its properties, e.g. this
function is continuous from the right, while P(ξ < x) is continuous from the left.
put

B(α, β, x) = ∫_0^x t^{α−1} (1 − t)^{β−1} dt. (2)

B(α, β, x) is called Euler's incomplete beta integral of order (α, β). It is well
known that

B(α, β) = B(α, β, 1) = Γ(α) Γ(β) / Γ(α + β), (3)

where

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx   (α > 0)

is the so-called gamma function. B(α, β) is called Euler's complete beta
integral of order (α, β). It is readily verified through integration by parts that

Σ_{k=r+1}^n C(n, k) p^k (1 − p)^{n−k} = B(r + 1, n − r, p) / B(r + 1, n − r), (4)

hence

F(x) = 1 − B(r + 1, n − r, p) / B(r + 1, n − r)

if

r < x < r + 1   (r = 0, 1, …, n − 1). (5)
for every n and m. Hence in the case of two independent random variables the
joint distribution of ξ and η is, according to (1′), determined by the
distributions of ξ and η.

This definition can be generalized to the case of several random variables.
The discrete random variables ξ1, ξ2, …, ξr are said to be (completely)
independent, if for every system of values x_{k1}, x_{k2}, …, x_{kr} the relation
Proof. The proof will be given in detail only for r = 2; for r > 2 the
procedure is essentially the same.

Let {x_{jk}} be the sequence of the possible values of the random variable
ξj (j = 1, 2) and {A_{jk}} the complete system of events belonging to the
random variable ξj; A_{jk} is thus the set of those elementary events ω ∈ Ω for
which ξj(ω) = x_{jk}.

If y_{jl} is one of the possible values of the random variable ηj = gj(ξj),
then the set B_{jl} defined by gj(ξj) = y_{jl} can obviously be obtained as the
union of finitely or denumerably many sets A_{jk}; B_{jl} is equal to the union of
the sets A_{jk} whose indices satisfy the equation gj(x_{jk}) = y_{jl}.

Since the complete systems of events {A_{1k}} and {A_{2k}} are independent,
the sum of an arbitrary subsequence of the sets A_{1k} is independent of the
sum of an arbitrary subsequence of the sets A_{2k}. From this our assertion
follows.
We give a reformulation of the above theorem which we shall need later
on. Let ξ(ω) be a discrete random variable with possible values x1, x2, …,
xn, … and let An denote the set of those elementary events ω for which
ξ(ω) = xn. Let further 𝒜_ξ be the least σ-algebra containing the sets An;
𝒜_ξ is called the σ-algebra generated by ξ. Clearly, 𝒜_ξ consists of the sets
obtained as the union of finitely or denumerably many of the sets An.
Obviously 𝒜_ξ ⊆ 𝒜. If ξ1, ξ2, …, ξr are independent random variables,
𝒜_{ξ1}, 𝒜_{ξ2}, …, 𝒜_{ξr} are the σ-algebras generated by ξ1, ξ2, …, ξr, and Bj
is an arbitrary element of 𝒜_{ξj} (j = 1, 2, …, r), then the events B1, B2, …, Br
are independent.
Let ξ and η be two random variables with possible values x_n and y_m (n, m = 1, 2, ...), respectively, and let the distributions of ξ and η be P(ξ = x_n) = p_n and P(η = y_m) = q_m. If g(x, y) is any real valued function of two real variables, then, as mentioned above, ζ = g(ξ, η) is a random variable.
III, § 6] CONVOLUTIONS OF DISCRETE RANDOM VARIABLES 101
Put
$$P(\zeta = z) = \sum_{g(x_n, y_m) = z} P(\xi = x_n,\ \eta = y_m). \tag{1}$$
The sum is here extended over those pairs (n, m) for which g(x_n, y_m) = z. If such pairs do not exist, the sum on the right hand side of (1) is zero. In order to compute P(ζ = z) we have to know, therefore, in general the joint distribution of ξ and η. If ξ and η are independent, then P(ξ = x_n, η = y_m) = P(ξ = x_n)P(η = y_m) and thus, for ζ = ξ + η with ξ and η binomial of orders n₁ and n₂ and the same parameter p, the identity
$$\sum_{j=0}^{k}\binom{n_1}{j}\binom{n_2}{k-j} = \binom{n_1+n_2}{k}$$
gives
$$P(\zeta = k) = \binom{n}{k} p^k q^{n-k} \qquad (k = 0, 1, \ldots, n), \tag{5}$$
where n = n₁ + n₂.
Hence the random variable ζ has a binomial distribution too. This result can also be obtained without any computation as follows: Consider an experiment with the possible outcomes A and A̅; let P(A) = p. In the above example ξ resp. η is equal to the number of occurrences of A in the course of n₁ resp. n₂ independent repetitions of the experiment. The assertion that ξ and η are independent means that we have two independent sequences of events. Perform a total number of n = n₁ + n₂ independent experiments; then ζ = ξ + η is the number of occurrences of A in this sequence of experiments. Hence ζ is a random variable having a binomial distribution of order n and parameter p; that is, Formula (5) is valid.
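The convolution result (5) can be checked numerically. In the following sketch the orders n1, n2 and the parameter p are illustrative choices, not taken from the text:

```python
from math import comb

def binom_pmf(n, p, k):
    # P(xi = k) for a binomial distribution of order n and parameter p
    return comb(n, k) * p**k * (1 - p)**(n - k)

n1, n2, p = 3, 4, 0.3          # illustrative parameters
n = n1 + n2

# convolution: P(zeta = k) = sum_j P(xi = j) * P(eta = k - j)
conv = [sum(binom_pmf(n1, p, j) * binom_pmf(n2, p, k - j)
            for j in range(max(0, k - n2), min(n1, k) + 1))
        for k in range(n + 1)]

# direct binomial distribution of order n = n1 + n2
direct = [binom_pmf(n, p, k) for k in range(n + 1)]

assert all(abs(a - b) < 1e-12 for a, b in zip(conv, direct))
```

The term-by-term agreement of the two lists is exactly the Vandermonde identity used above.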
We encounter a practical application of this result when estimating the percentage of defective items. Consider a sampling with replacement from the population investigated. According to the above, this can also be done by subdividing the whole population into two parts having the same percentage of defective items and selecting from one part a sample of n₁ elements and from the other a sample of n₂ elements. This estimating procedure is equivalent to that which consists of the choice of a sample of n = n₁ + n₂ elements from the whole population.
It is to be noted here that the distribution of the sum of two independent random variables with hypergeometric distributions does not have a hypergeometric distribution. Hence the former assertion is not valid if the sampling is done without replacement. The difference is, however, negligible in practice, if the number of elements of the population is large with respect to that of the sample.
one of such data is the expectation defined below (first for discrete distributions only).
Let the possible values of the random variable ξ be x₁, x₂, ... with corresponding probabilities p_n = P(ξ = x_n) (n = 1, 2, ...). Perform N independent observations of ξ; if N is a large number, then, according to the meaning of probability, at approximately Np₁ occasions we shall have ξ = x₁, at approximately Np₂ occasions ξ = x₂, and so on. Taking the arithmetic mean of the ξ-values obtained at the N observations, we obtain approximately the value
$$\frac{Np_1 x_1 + Np_2 x_2 + \cdots}{N} = \sum_k p_k x_k;$$
this is the value about which the arithmetic mean of the observed values of ξ fluctuates. Hence we define the expectation E(ξ) of the discrete random variable ξ by the formula
$$E(\xi) = \sum_k p_k x_k. \tag{1}$$
Obviously, E(ξ) is the weighted arithmetic mean of the values x_k with weights
p_k.¹ In order that the definition be meaningful we have to assume the absolute convergence of the series figuring on the right side of (1); otherwise, namely, a rearrangement of the x_k values would give different values for the expectation.
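The heuristic reasoning behind definition (1) — that the arithmetic mean of many observations fluctuates about Σ p_k x_k — can be illustrated by a small simulation; the values, probabilities, and sample size below are illustrative choices, not from the text:

```python
import random
random.seed(1)

xs = [0, 1, 5]           # possible values (illustrative)
ps = [0.5, 0.3, 0.2]     # their probabilities

# the expectation E(xi) = sum_k p_k x_k
expectation = sum(p * x for p, x in zip(ps, xs))

# arithmetic mean of N independent observations of xi
N = 100_000
sample = random.choices(xs, weights=ps, k=N)
mean = sum(sample) / N

# the observed mean lies close to the expectation
assert abs(mean - expectation) < 0.05
```

The precise sense in which the mean approaches E(ξ) is given by the laws of large numbers discussed later in the book.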
If ξ can take on infinitely many values, then E(ξ) does not always exist. E.g. if x_k = k and p_k = 6/(π²k²) (k = 1, 2, ...) — a concrete choice supplied here for illustration — then the series
$$\sum_k p_k x_k$$
is divergent. Clearly, the expectation of discrete and bounded random variables always exists.
Sometimes, instead of “expectation”, the expressions “mean value” or “average” are used. But they may lead to confusion with the average of the observed values. In order to discriminate the observed mean from the number about which the observed mean fluctuates, we always call the latter “expectation”.
Obviously, the expectation E(ξ) depends only on the distribution of ξ; hence if ξ₁ and ξ₂ are two discrete random variables having the same distribution, then E(ξ₁) = E(ξ₂). Therefore E(ξ) can also be called the expectation of the distribution of ξ. The fluctuation about E(ξ) of the averages¹
¹ Hence E(ξ) lies always between the lower and the upper limit of the possible values of ξ.
104 DISCRETE RANDOM VARIABLES [III, § 7
formed from the observed values of ξ is described more precisely by the laws of large numbers, which we shall discuss later on. Here we mention only that the average of the observed values of ξ and the expectation E(ξ) are essentially in the same relationship as the relative frequency and the probability of an event. This will be readily seen if we consider the indicator ξ_A of an event A having the probability p; indeed, E(ξ_A) = p · 1 + (1 − p) · 0 = p, and the average of the observed values of ξ_A is equal to the relative frequency of the event A.
Next we compute the expectations of some important distributions.
1. The expectation of the binomial distribution. The random variable ξ has a binomial distribution if it assumes the values k = 0, 1, ..., n with probabilities
$$P(\xi = k) = \binom{n}{k} p^k q^{n-k};$$
in this case a direct computation gives E(ξ) = np.
2. The expectation of the negative binomial distribution. Here
$$E(\xi) = \sum_{k=0}^{\infty} (r+k)\binom{k+r-1}{r-1} p^r q^k = \frac{r}{p}.$$
Example. In shooting at a target, suppose that every shot hits the target with the probability p and the outcomes of the shots are independent of each other. How many shots are necessary to hit the target r times?
The mathematical wording of the problem is as follows: Let the experiments of a sequence be independent of each other. Let each experiment have only two outcomes: A (the shot hits the target) and A̅ (it does not). Let ξ denote the serial number of the experiment at which A occurs for the r-th time. As noted in § 3 of this Chapter, the probability that the event A occurs in the (k + r)-th experiment for the r-th time is
$$\binom{k+r-1}{r-1} p^r q^k;$$
hence ξ has a negative binomial distribution of order r. Thus on the average we need to fire r/p shots in order to get r hits.
Theorem 1. If E(ξ) and E(η) exist, then E(ξ + η) exists too and
$$E(\xi + \eta) = E(\xi) + E(\eta).$$
On the other hand, the possible values of ξ + η are the numbers z representable as x_j + y_k. It may happen that a number z can be represented in more than one way in the form z = x_j + y_k; in this case the corresponding probabilities are to be added. Since the sum of two absolutely convergent series is itself absolutely convergent, we obtain that
$$E\Big(\sum_{k=1}^{n} c_k \xi_k\Big) = \sum_{k=1}^{n} c_k E(\xi_k).$$
Applying this to η = ξ − E(ξ) we get
$$E(\eta) = E(\xi) - E(E(\xi)).$$
Since the expectation of a constant is obviously the constant itself, we have E(E(ξ)) = E(ξ), and our statement follows.
Theorem 5. If ξ and η are discrete random variables such that the expectations E(ξ²) and E(η²) exist, then E(ξη) exists as well and
$$|E(\xi\eta)| \le \sqrt{E(\xi^2)E(\eta^2)}. \tag{1}$$
Proof. Put
$$\zeta_\lambda = (\xi - \lambda\eta)^2,$$
where λ is a real parameter. Since 0 ≤ ζ_λ ≤ 2ξ² + 2λ²η², E(ζ_λ) exists. Because of Theorem 3 we have
$$E(\zeta_\lambda) = E(\xi^2) - 2\lambda E(\xi\eta) + \lambda^2 E(\eta^2). \tag{2}$$
Since ζ_λ ≥ 0 we have E(ζ_λ) ≥ 0 for every real λ, therefore the polynomial (2) in λ of degree 2 is nonnegative. But, as is well known, this is only possible if (1) holds, which is what we wished to prove.
Let ξ be a discrete random variable and A an event having positive probability. The conditional expectation of ξ with respect to the condition A is defined by the formula
$$E(\xi \mid A) = \sum_k P(\xi = x_k \mid A)\,x_k, \tag{3}$$
provided that the series on the right side is absolutely convergent (which is always fulfilled if E(ξ) exists), where x_k (k = 1, 2, ...) denote the possible values of ξ. E(ξ | A) is therefore the expectation of the conditional distribution of ξ with respect to the condition A. If the events A_n (n = 1, 2, ...) form a complete system of events, then in view of the theorem of total probability
$$E(\xi) = \sum_k P(\xi = x_k)x_k = \sum_k \sum_n P(\xi = x_k \mid A_n)P(A_n)x_k = \sum_n P(A_n)E(\xi \mid A_n),$$
that is,
$$E(\xi) = \sum_n P(A_n)\,E(\xi \mid A_n). \tag{4}$$
If in particular A_n denotes the event η = y_n for a discrete random variable η, (4) can be written in the form
$$E(E(\xi \mid \eta)) = E(\xi). \tag{5}$$
This relation will be used later on.
Example. Formula (5) can also be used to compute the expectation of the sum of a random number of random variables. Let ξ₁, ξ₂, ... be independent random variables and let ν be a random variable independent of ξ_n (n = 1, 2, ...) and taking on the values 1, 2, ... with probabilities q₁, q₂, .... Consider the random variable
$$\zeta = \xi_1 + \xi_2 + \cdots + \xi_\nu.$$
If E_n = E(ξ_n), then by (5)
$$E(\zeta) = \sum_{n=1}^{\infty} q_n (E_1 + E_2 + \cdots + E_n).$$
In the special case where the expectations of the random variables ξ_k are all equal, i.e. E_n = E, we get
$$E(\zeta) = E\sum_{n=1}^{\infty} n q_n = E \cdot E(\nu). \tag{6}$$
Proof. Let A_{jk} denote the event ξ = x_j, η = y_k (j, k = 1, 2, ...). Clearly, the possible values of ξη are the numbers which can be represented in the form z = x_j y_k. Further
$$zP(\xi\eta = z) = z\sum_{x_j y_k = z} P(A_{jk}) = \sum_{x_j y_k = z} x_j y_k P(A_{jk}),$$
hence
$$E(\xi\eta) = \sum_j \sum_k x_j y_k P(A_{jk}). \tag{8}$$
§ 9. The variance
The expectation of a random variable is the value about which the random variable fluctuates; but it does not give any information about the magnitude of this fluctuation. If we compute the expectation of the difference between a random variable and its expectation we obtain, as we have already seen, always zero. This is so because the positive and negative deviations from the expectation cancel each other. Thus it seems natural to consider the quantity
$$d(\xi) = E(|\xi - E(\xi)|) \tag{1}$$
as a measure of the fluctuations. Since, however, this expression is difficult to handle, it is the positive square root of the expectation of the random variable (ξ − E(ξ))² which is most frequently used as a measure of the magnitude of fluctuation. This quantity, called the standard deviation of ξ, is thus defined by the expression
$$D(\xi) = +\sqrt{E\big((\xi - E(\xi))^2\big)} \tag{2}$$
(provided that this value is finite), and D²(ξ) is called the variance of ξ.
The choice of D(ξ) for measuring the fluctuations is advantageous from a mathematical point of view, as it makes computations easier. The real importance of the concept of variance is shown, however, by some basic theorems of probability theory discussed in the following Chapters, e.g. the central limit theorem.
$$D^2(\xi) = \sum_n p_n \big(x_n - E(\xi)\big)^2 \tag{3}$$
and, according to Theorem 1,
$$D^2(\xi) = E(\xi^2) - E^2(\xi).$$
Theorems 2 and 3 are similar (from a formal point of view even equal) to the well-known Steiner theorem in mechanics, which states that the moment of inertia of a linear mass-distribution about an axis perpendicular to this line is equal to the sum of the moment of inertia about the axis through the center of gravity and the square of the distance of the axis from the center of gravity, provided that the total mass is unity; consequently, the moment of inertia has its minimal value if the axis passes through the center of gravity.
Theorem 3 exhibits an important relation between the expectation and
the variance.
Theorem 2 is mostly used if the values of ξ lie near to some simple number A while the expectation is not exactly equal to this value. For computational reasons it is then more convenient to calculate the value of E((ξ − A)²).
We have
$$d(\xi) \le D(\xi).$$
Proof. According to Theorem 5 of § 8,
$$d^2(\xi) = E^2\big(|\xi - E(\xi)|\cdot 1\big) \le D^2(\xi).$$
Equality can occur in other cases besides the trivial case when ξ is with probability 1 a constant, thus e.g. if ξ takes on the values +1 and −1 with the same probability ½.
If η = aξ + b, then
$$D(\eta) = |a| \cdot D(\xi).$$
Proof. Since E(η) = aE(ξ) + b, we obtain that
$$D^2(\eta) = E\big(a^2(\xi - E(\xi))^2\big) = a^2 D^2(\xi).$$
In particular, the standard deviation does not change if we add a constant to the random variable ξ or multiply it by −1.
It is seen from (3) that the variance of a random variable depends on its distribution only. Hence we can speak about the variance of a distribution. We shall now compute the variances of certain discrete distributions and, for the sake of comparison, we determine the values of d(ξ) as well.
1. The variance of the binomial distribution. Let the distribution of the random variable ξ be a binomial distribution of order n:
$$P(\xi = k) = \binom{n}{k}p^k q^{n-k} \qquad (k = 0, 1, \ldots, n);$$
a direct computation gives D²(ξ) = npq.
The value of d(ξ), for the sake of simplicity, will only be determined for a binomial distribution of order n = 2N with p = q = ½.
III, § 9] THE VARIANCE 113
In this case one finds
$$d(\xi) = \binom{2N}{N}\frac{N}{2^{2N}} \sim \sqrt{\frac{N}{\pi}},$$
where a_N ∼ b_N means that
$$\lim_{N\to\infty}\frac{a_N}{b_N} = 1.$$
Since
$$D(\xi) = \sqrt{\frac{N}{2}},$$
it follows that
$$d(\xi) \approx \sqrt{\frac{2}{\pi}}\,D(\xi).$$
Thus the quotient d(ξ)/D(ξ) tends, as we shall see, for N → ∞ to the limit √(2/π).
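For the symmetric binomial distribution the ratio of the mean absolute deviation d(ξ) to the standard deviation D(ξ) can be computed exactly and compared with the limit √(2/π); the value of N below is an illustrative choice:

```python
from math import comb, sqrt, pi

def mean_abs_dev(n, p):
    # d(xi) = E(|xi - E(xi)|) for a binomial distribution of order n
    m = n * p
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * abs(k - m)
               for k in range(n + 1))

N = 200                      # illustrative; binomial of order 2N with p = 1/2
n = 2 * N
d = mean_abs_dev(n, 0.5)
D = sqrt(n * 0.25)           # D(xi) = sqrt(npq) = sqrt(N/2)

# the quotient d/D is already close to sqrt(2/pi) ~ 0.7979
assert abs(d / D - sqrt(2 / pi)) < 0.01
```

Increasing N drives the quotient still closer to √(2/π), in accordance with the asymptotic relation above.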
2. The variance of the negative binomial distribution of the first order. If
$$P(\xi = k + 1) = pq^k \qquad (k = 0, 1, \ldots),$$
then
$$D^2(\xi) = p\sum_{k=0}^{\infty}(k+1)^2 q^k - \frac{1}{p^2} = \frac{q}{p^2},$$
and therefore
$$D(\xi) = \frac{\sqrt{q}}{p}.$$
3. The variance of the hypergeometric distribution. Let
$$P(\xi = k) = \frac{\dbinom{M}{k}\dbinom{N-M}{n-k}}{\dbinom{N}{n}} \qquad (k = 0, 1, \ldots, n).$$
As in the two preceding examples we obtain
$$D^2(\xi) = n\,\frac{M}{N}\left(1 - \frac{M}{N}\right)\left(1 - \frac{n-1}{N-1}\right).$$
Let us introduce the notations M/N = p, q = 1 − p; then
$$D(\xi) = \sqrt{npq\left(1 - \frac{n-1}{N-1}\right)}.$$
For pairwise independent random variables ξ₁, ξ₂, ..., ξ_n we have
$$D^2\Big(\sum_{k=1}^{n}\xi_k\Big) = \sum_{k=1}^{n} D^2(\xi_k),$$
and, more generally, for real constants c₁, ..., c_n,
$$D^2\Big(\sum_{k=1}^{n} c_k\xi_k\Big) = \sum_{k=1}^{n} c_k^2 D^2(\xi_k).$$
If the ξ_k have the common expectation E and common standard deviation D, and ζ_n = ξ₁ + ... + ξ_n, then D(ζ_n) = D√n and
$$E(\zeta_n) = nE.$$
Hence the ratio
$$\frac{D(\zeta_n)}{E(\zeta_n)} = \frac{D}{E\sqrt{n}}$$
tends to zero for n → ∞, provided that E is distinct from zero. Consequences of this are dealt with in Chapter VII. If ξ is a positive random variable, the quotient D(ξ)/E(ξ) is called the coefficient of variation of ξ.
As an interesting consequence of Theorem 2 we mention that if ξ and η are independent, then
$$D^2(\xi - \eta) = D^2(\xi) + D^2(\eta).$$
1. The variance of the binomial distribution. If ξ₁, ξ₂, ..., ξ_n are independent random variables, each taking the value 1 with probability p and the value 0 with probability q = 1 − p, then
$$\zeta_n = \xi_1 + \xi_2 + \cdots + \xi_n$$
is a random variable having a binomial distribution of order n. Since D²(ξ_k) = pq, it follows from Theorem 1 that
$$D^2(\zeta_n) = npq.$$
Thus by applying Theorem 1 we can avoid the calculation used in § 9 for the determination of the variance of the binomial distribution.
2. The variance of the negative binomial distribution. In the former paragraph the variance of the negative binomial distribution of the first order was determined. If the independent random variables ξ₁, ξ₂, ..., ξ_r have a negative binomial distribution of the first order, i.e. if
$$P(\xi_j = k + 1) = pq^k \qquad (k = 0, 1, \ldots;\ j = 1, 2, \ldots, r),$$
then ζ = ξ₁ + ... + ξ_r has a negative binomial distribution of order r, and by Theorem 1, D²(ζ) = rq/p².
§ 11. The correlation coefficient
The standardized random variable
$$\xi^* = \frac{\xi - E(\xi)}{D(\xi)} \tag{2}$$
satisfies
$$E(\xi^*) = 0 \quad \text{and} \quad D(\xi^*) = 1.$$
The correlation coefficient of ξ and η is defined by
$$R(\xi, \eta) = E(\xi^*\eta^*) = \frac{E\big([\xi - E(\xi)][\eta - E(\eta)]\big)}{D(\xi)D(\eta)}. \tag{3}$$
Theorem 1. We have
$$|R(\xi, \eta)| \le 1.$$
Proof. It follows from the linearity of the operator E and from Theorem 5 of § 8 that
$$\big|E\big([\xi - E(\xi)][\eta - E(\eta)]\big)\big| \le D(\xi)D(\eta).$$
Theorem 2 cannot be further sharpened, since
$$R(\xi, \xi) = +1$$
and
$$R(\xi, -\xi) = -1.$$
Theorem 3. If ξ and η are independent, then
$$R(\xi, \eta) = 0.$$
The converse is not true, as the following example shows. Let
$$P(\xi = 1, \eta = 1) = P(\xi = -1, \eta = 1) = P(\xi = 1, \eta = -1) = P(\xi = -1, \eta = -1) = \frac{p}{4},$$
$$P(\xi = 0, \eta = 1) = P(\xi = 0, \eta = -1) = P(\xi = 1, \eta = 0) = P(\xi = -1, \eta = 0) = \frac{1-p}{4},$$
where 0 < p < 1. Then E(ξ) = E(η) = E(ξη) = 0, hence ξ and η are uncorrelated. Since, however, P(ξ = 0, η = 0) = 0 ≠ P(ξ = 0)P(η = 0) = (1 − p)²/4, they are not independent.
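The example of an uncorrelated but dependent pair can be verified by direct computation over the nine possible value pairs; the value of p below is an illustrative choice:

```python
p = 0.4  # illustrative parameter, 0 < p < 1

# joint distribution of the example: the four "corner" pairs get p/4,
# the four "edge" pairs get (1-p)/4, and (0, 0) has probability 0
joint = {(1, 1): p/4, (-1, 1): p/4, (1, -1): p/4, (-1, -1): p/4,
         (0, 1): (1-p)/4, (0, -1): (1-p)/4,
         (1, 0): (1-p)/4, (-1, 0): (1-p)/4,
         (0, 0): 0.0}

assert abs(sum(joint.values()) - 1.0) < 1e-12   # a probability distribution

E_xy = sum(x * y * pr for (x, y), pr in joint.items())    # E(xi * eta)
P_x0 = sum(pr for (x, y), pr in joint.items() if x == 0)  # P(xi = 0)
P_y0 = sum(pr for (x, y), pr in joint.items() if y == 0)  # P(eta = 0)

assert abs(E_xy) < 1e-12                        # uncorrelated
assert abs(joint[(0, 0)] - P_x0 * P_y0) > 0.01  # yet not independent
```

Both marginals are symmetric about 0, which forces all the covariance terms to cancel while the joint probability at (0, 0) still differs from the product of the marginals.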
III, § 11] THE CORRELATION COEFFICIENT 119
R(ξ, η) = ±1 holds if and only if
$$\eta = a\xi + b \tag{5}$$
with probability 1, where a and b are real constants and a ≠ 0; in this case R(ξ, η) = +1 or −1 according as a > 0 or a < 0.
Proof. Let E(ξ) = m. If the relation (5) holds between ξ and η, we have E(η) = am + b, D(η) = |a|D(ξ), and hence R(ξ, η) = sgn a.
Suppose conversely, for instance, R(ξ, η) = +1. (The case R(ξ, η) = −1 can be dealt with in the same manner.) Put
$$\xi' = \frac{\xi - m}{D(\xi)}, \qquad \eta' = \frac{\eta - E(\eta)}{D(\eta)};$$
then by (3)
$$E(\xi'\eta') = 1,$$
hence
$$E\big((\xi' - \eta')^2\big) = 2 - 2 = 0.$$
From this it follows that
$$P(\xi' = \eta') = 1,$$
that is,
$$\eta = E(\eta) + \frac{D(\eta)}{D(\xi)}(\xi - m)$$
with probability 1. Here sgn denotes the sign function:
$$\operatorname{sgn} x = \begin{cases} 1 & \text{if } x > 0,\\ 0 & \text{if } x = 0,\\ -1 & \text{if } x < 0.\end{cases}$$
Theorem 6. Let ξ and η be discrete random variables assuming only a finite number of values. Let the possible different values of ξ be x_i (i = 1, 2, ..., m) and those of η y_j (j = 1, 2, ..., n). If ξ^h and η^k are for h = 1, 2, ..., m − 1 and k = 1, 2, ..., n − 1 uncorrelated, i.e.
$$E(\xi^h\eta^k) = E(\xi^h)E(\eta^k) \qquad (h = 1, \ldots, m-1;\ k = 1, \ldots, n-1), \tag{6}$$
then ξ and η are independent.
Proof. Let r_{ij} = P(ξ = x_i, η = y_j), p_i = P(ξ = x_i), q_j = P(η = y_j). Then (6) can be written in the form
$$\sum_{i=1}^{m}\sum_{j=1}^{n} r_{ij}\,x_i^h y_j^k = \Big(\sum_{i=1}^{m} p_i x_i^h\Big)\Big(\sum_{j=1}^{n} q_j y_j^k\Big). \tag{7}$$
Putting
$$d_{ik} = \sum_{j=1}^{n} (r_{ij} - p_i q_j)\,y_j^k, \tag{8}$$
we have for the unknowns d_{ik} (i = 1, 2, ..., m) the system of linear equations
$$\sum_{i=1}^{m} d_{ik}\,x_i^h = 0 \qquad (h = 0, 1, \ldots, m-1). \tag{9}$$
The determinant of this system is a Vandermonde determinant, which is different from zero since the x_i are all distinct; hence
$$d_{ik} = 0 \qquad (i = 1, 2, \ldots, m).$$
The same can be shown for every k = 0, 1, ..., n − 1. From this it follows, again by a Vandermonde argument, that
$$r_{ij} = p_i q_j,$$
thus ξ and η are independent.
Remark. The random variables ξ and η must fulfil (m − 1)(n − 1) conditions in this theorem; as was seen in Chapter II, § 9, the same number of conditions is necessary to ensure the independence of two complete systems of events consisting of m and n events.
Finally, we give an example in which the correlation coefficients are effectively computed.
Let the r-dimensional distribution of the random variables ξ₁, ξ₂, ..., ξ_r be a polynomial distribution; then
$$E(\xi_i\xi_j) = n(n-1)\,p_i p_j.$$
$$W_k = \binom{n}{k}\left(\frac{1}{N}\right)^k\left(1 - \frac{1}{N}\right)^{n-k},$$
hence from (2)
$$\lim_{n\to\infty} W_k = \frac{\lambda^k}{k!}e^{-\lambda} \qquad (k = 0, 1, \ldots). \tag{4}$$
Let
$$w_k = \frac{\lambda^k}{k!}e^{-\lambda} \qquad (k = 0, 1, \ldots). \tag{5}$$
Then
$$\sum_{k=0}^{\infty} w_k = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = 1. \tag{6}$$
Thus the probabilities defined by (5) are the terms of a probability distribution, called the Poisson distribution with parameter λ; the meaning of λ in the above example is the average number of balls in one urn. It can be shown by direct calculation that λ is the expectation of the Poisson distribution (5). Namely, from the relation
$$P(\xi = k) = \frac{\lambda^k}{k!}e^{-\lambda} \qquad (k = 0, 1, \ldots)$$
we have
$$E(\xi) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k}{k!}e^{-\lambda} = \lambda\Big(\sum_{k=1}^{\infty}\frac{\lambda^{k-1}}{(k-1)!}\Big)e^{-\lambda} = \lambda e^{\lambda}e^{-\lambda} = \lambda. \tag{7}$$
Thus the expectation of the Poisson distribution (5) is λ; hence the distribution (5) can be called the Poisson distribution with expectation λ. The variance of the Poisson distribution can easily be calculated:
$$E(\xi^2) = \sum_{k=0}^{\infty} k^2\frac{\lambda^k}{k!}e^{-\lambda} = \sum_{k=2}^{\infty} k(k-1)\frac{\lambda^k}{k!}e^{-\lambda} + \lambda = \lambda^2 + \lambda,$$
hence
$$D^2(\xi) = \lambda^2 + \lambda - \lambda^2 = \lambda;$$
that is, the standard deviation of the Poisson distribution (5) is D(ξ) = √λ. Thus the variance of a Poisson distribution is equal to the expectation.
In the passage to the limit in (4) no use was made of the property that the probability for a ball to enter a certain urn is 1/N with a natural number N. Therefore our result can also be stated in the following form: The k-th term
$$W_k = \binom{n}{k} p^k q^{n-k} \tag{8}$$
of the binomial distribution tends to the k-th term of the Poisson distribution, i.e. to the limit
$$w_k = \frac{\lambda^k}{k!}e^{-\lambda}, \tag{9}$$
provided that n → ∞ and p → 0 so that np = λ remains fixed.
Let
$$\Gamma(z, \lambda) = \int_0^{\lambda} t^{z-1}e^{-t}\,dt$$
for λ > 0, z > 0 denote the incomplete gamma function of Euler and Γ(z) the complete gamma function of Euler. Partial integration yields the formula
$$\sum_{k=0}^{r}\frac{\lambda^k}{k!}e^{-\lambda} = 1 - \frac{\Gamma(r+1, \lambda)}{\Gamma(r+1)}. \tag{12}$$
Let us now return to our practical problem. Because of the relation between relative frequency and probability, the ratio of defective bottles to produced bottles is approximately equal to the probability of a bottle being defective, provided the number of manufactured bottles is sufficiently large. This probability, however, is 1 − W₀, hence approximately 1 − e^{−λ}. Since λ = x/100, the percentage of defective items is 100[1 − exp(−x/100)]. If x is very small, this is in fact nearly equal to x; in the case of large x, however, it is not. In the extreme case, when x = 100, the fraction of defective bottles is not 100 per cent, as it would follow from the consideration mentioned at the beginning of this paragraph, but only 100(1 − e^{−1}) = 63.21%.
III, § 13] APPLICATIONS OF THE POISSON DISTRIBUTION 125
Of course such a large fraction of defective items will not occur. If for instance x = 30, the fraction of defective items is 100(1 − e^{−0.3}) ≈ 25.92% instead of 30%.
Clearly, if the number of stones is large, it is more economical to produce small bottles, provided of course that there is no way of clearing the liquid glass. Using 0.25 kg of glass per bottle instead of 1 kg, the fraction of defective items decreases for x = 30 from 25.92% to 7.22%. As is seen from this example, probability theory can give useful hints for practical problems of production.
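The percentages quoted in the bottle example follow directly from the formula 100[1 − exp(−x/100)]; a small sketch reproducing them:

```python
from math import exp

def defective_percent(x):
    # x = number of stones per 100 kg of glass, one bottle per kg,
    # so lambda = x/100 stones per bottle on the average
    return 100 * (1 - exp(-x / 100))

# the extreme case x = 100 gives about 63.21%, not 100%
assert abs(defective_percent(100) - 63.21) < 0.01

# x = 30 gives about 25.92% instead of 30%
assert abs(defective_percent(30) - 25.92) < 0.01

# quarter-size bottles: the same stones spread over 4 times as many bottles
assert abs(100 * (1 - exp(-30 / 400)) - 7.22) < 0.01
```

The last line reproduces the drop from 25.92% to about 7.22% obtained by using 0.25 kg of glass per bottle.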
$$\left[G\!\left(\frac{1}{n}\right)\right]^n = G(1). \tag{7}$$
From (6) and (7) we obtain
$$G\!\left(\frac{m}{n}\right) = [G(1)]^{\frac{m}{n}},$$
hence for every positive rational number r
$$G(r) = [G(1)]^r. \tag{8}$$
Since 0 < G(1) < 1, G(1) can be written in the form G(1) = e^{−λ}. Thus we obtain from (8) that for every rational t
$$G(t) = e^{-\lambda t}. \tag{9}$$
However, because of the monotonicity of G(t), (9) holds for every t. Therefore
$$F(t) = 1 - G(t) = 1 - e^{-\lambda t}. \tag{10}$$
Let us now examine the physical meaning of the constant λ. By expanding the function F(Δt) = 1 − e^{−λΔt} in powers of Δt we obtain the equality
$$F(\Delta t) = \lambda\,\Delta t + O\big((\Delta t)^2\big). \tag{11}$$
The left side of (11) is the probability that an atom which did not disintegrate until the moment t will disintegrate before the moment t + Δt. λ has
thus the following physical meaning: the probability that an atom disintegrates during the time interval between t and t + Δt is (up to higher powers of Δt) equal to λΔt. The constant λ is called the constant of disintegration; it characterizes the radioactive element in question and may serve for its identification. It is attractive to give another interpretation of the number λ, which enables us to measure it. The time during which approximately half of the mass of the radioactive substance disintegrates is said to be the half-life period. More exactly, this is the time interval such that during it each of the atoms of the substance has probability ½ of disintegrating. Consider a given mass of a radioactive element of disintegration constant λ. Since every atom disintegrates during the half-life period T with the probability ½, we have F(T) = ½. However, G(T) = 1 − F(T) = e^{−λT}, thus e^{−λT} = ½ and
$$\lambda = \frac{\ln 2}{T}. \tag{12}$$
$$P_k^*(t) = \frac{(N\lambda t)^k e^{-N\lambda t}}{k!}. \tag{15}$$
The half-life period of radium is 1580 years. Taking a year for unit we obtain λ = 0.000439. If t is less than a minute, λt is of the order 10⁻⁹. For 1 g of uranium mineral, containing approximately 10¹⁵ radium atoms, the relative errors committed in replacing P_k(t) by P_k*(t) are of the order 10⁻³. If we restrict ourselves to the case where t is small with respect to the half-life period, we can choose the model so that the Poisson distribution represents the exact distribution of the number of radioactive disintegrations.
Consider a certain mass of radioactive substance and assume the following:
1. If t₁ < t₂ < t₃ and A_k(t₁, t₂) denotes the event that “during the time interval (t₁, t₂) k disintegrations occur”, then the events A_k(t₁, t₂) and A_l(t₂, t₃) are independent for all nonnegative integer values of k and l.
2. The events A_k(t₁, t₂), k = 0, 1, ... form a complete system. If k is given, P[A_k(t₁, t₂)] depends only on the difference t₂ − t₁. In other words, the process of radioactive disintegration is homogeneous with respect to time. Let W_k(t) denote the probability of k disintegrations during a time interval of length t (t₂ − t₁ = t).
3. If t is small enough, the probability that during a time interval t more than one disintegration occurs is negligibly small compared to the probability that exactly one occurs. That is,
$$\lim_{t\to 0}\frac{1 - W_0(t) - W_1(t)}{W_1(t)} = 0, \tag{16}$$
or equivalently
$$\lim_{t\to 0}\frac{1 - W_0(t)}{W_1(t)} = 1. \tag{17}$$
In words: the probability that there occurs at least one disintegration is, in the limit, equal to the probability that there occurs exactly one.
$$W_0(t) = e^{-\mu t}, \qquad W_1(t) = \mu t\,e^{-\mu t}, \qquad W_2(t) = \frac{(\mu t)^2}{2!}e^{-\mu t},$$
and, in general,
$$W_k(t) = \frac{(\mu t)^k e^{-\mu t}}{k!} \qquad (k = 0, 1, \ldots).$$
The distribution of the stars thus follows the same law as radioactive disintegration; the only difference is that here the volume plays the role of time.
III, § 14] THE ALGEBRA OF PROBABILITY DISTRIBUTIONS 131
The same reasoning holds for particular kinds of stars as well, e.g. for double stars. In the same manner the distribution of red and white cells in the blood can be determined. Let A_k denote the event that there are exactly k cells to be seen in the visual field of the microscope; then we have
$$P(A_k) = \frac{(\lambda T)^k e^{-\lambda T}}{k!} \qquad (k = 0, 1, \ldots), \tag{27}$$
where T is the area of the visual field and λ is the average number of cells per unit area.
$$r_k = \sum_{n=0}^{\infty}\alpha_n p_{nk}. \tag{1}$$
Then
$$\sum_{k=0}^{\infty} r_k = \sum_{n=0}^{\infty}\alpha_n\sum_{k=0}^{\infty} p_{nk} = \sum_{n=0}^{\infty}\alpha_n = 1. \tag{2}$$
We write
$$\mathcal{P} = \sum_{n=0}^{\infty}\alpha_n\mathcal{P}_n;$$
𝒫 will be called the mixture of the probability distributions 𝒫_n taken with the weights α_n.
Example. The mixture of the binomial distributions
$$\mathcal{B}_n(p) = \left\{\binom{n}{k}p^k q^{n-k}\right\}$$
taken with the weights
$$\alpha_n = \frac{\lambda^n e^{-\lambda}}{n!}$$
is a Poisson distribution. In fact,
$$\sum_{n=k}^{\infty}\frac{\lambda^n e^{-\lambda}}{n!}\binom{n}{k}p^k q^{n-k} = \frac{(\lambda p)^k e^{-\lambda p}}{k!}.$$
Similarly, one can form the mixture of the hypergeometric distributions
$$\left\{\frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}\right\}$$
with weights α_n = \binom{N}{n}p^n q^{N-n}. This leads to the binomial distribution ℬ_M(p), as is seen from the relation
$$\sum_{n}\binom{N}{n}p^n q^{N-n}\,\frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}} = \binom{M}{k}p^k q^{M-k}.$$
Geometrically, mixtures of distributions can be represented in the following way: Two distributions 𝒫₁ = {p_{1k}} and 𝒫₂ = {p_{2k}} can be considered as two points in an infinite dimensional space having the coordinates p_{1k} and p_{2k}, respectively. The mixture
$$\alpha\mathcal{P}_1 + \beta\mathcal{P}_2 = \{\alpha p_{1k} + \beta p_{2k}\} \qquad (0 < \alpha < 1,\ \beta = 1 - \alpha)$$
$$\sum_{k=0}^{\infty} r_k = \sum_{j=0}^{\infty} p_j \sum_{h=0}^{\infty} q_h = 1. \tag{6}$$
$$\sum_{i+j+h=k} p_{1i}\,p_{2j}\,q_h.$$
$$\mathcal{B}_n(p) = \big(\mathcal{B}_1(p)\big)^n. \tag{9}$$
In fact,
$$\sum_{j}\binom{m}{j}p^j q^{m-j}\binom{n}{k-j}p^{k-j}q^{n-k+j} = \binom{m+n}{k}p^k q^{m+n-k},$$
hence
$$\mathcal{B}_m(p) * \mathcal{B}_n(p) = \mathcal{B}_{m+n}(p). \tag{10}$$
Relation (9) can be obtained from (10) by mathematical induction.
Similarly, it can be shown that for the negative binomial distribution 𝒢_r(p) = [𝒢₁(p)]^r, where 𝒢_r(p) = {p_k^{(r)}} with p_k^{(r)} = 0 for k < r and
$$p_k^{(r)} = \binom{k-1}{r-1}p^r q^{k-r} \qquad \text{for } k \ge r.$$
It can be shown finally that the convolution of two Poisson distributions is again a Poisson distribution. If
$$\mathcal{Q}(\lambda) = \left\{\frac{\lambda^k e^{-\lambda}}{k!}\right\},$$
then
$$\mathcal{Q}(\lambda) * \mathcal{Q}(\mu) = \mathcal{Q}(\lambda + \mu) \tag{11}$$
since
$$\sum_{j=0}^{k}\frac{\lambda^j e^{-\lambda}}{j!}\cdot\frac{\mu^{k-j}e^{-\mu}}{(k-j)!} = \frac{(\lambda+\mu)^k e^{-(\lambda+\mu)}}{k!};$$
i.e. the distribution obtained as the convolution “product” of two Poisson distributions has for its parameter the sum of the parameters of the two “factors”.
Let us now introduce the degenerate distribution 𝒟₀. It is defined by
$$\mathcal{D}_0 = \{1, 0, 0, \ldots, 0, \ldots\}.$$
Clearly
$$\mathcal{P} * \mathcal{D}_0 = \mathcal{P}. \tag{12}$$
Thus the distribution 𝒟₀ plays the role of the unit element with respect to the convolution operation.¹ The distributions 𝒟_n, defined by p_n = 1, p_m = 0 for m ≠ n, are also degenerate distributions. It is easy to show that
$$\mathcal{D}_m * \mathcal{D}_n = \mathcal{D}_{m+n}. \tag{13}$$
Mixture and convolution are connected by the distributive law
$$\Big(\sum_{n=0}^{\infty}\alpha_n\mathcal{P}_n\Big) * \mathcal{Q} = \sum_{n=0}^{\infty}\alpha_n\big(\mathcal{P}_n * \mathcal{Q}\big). \tag{14}$$
By means of the operations mixture and convolution, functions of probability distributions can be defined in the following manner: Let g(z) = Σ_{n=0}^∞ w_n z^n, where the coefficients w_n form a probability distribution; put
$$g(\mathcal{P}) = \sum_{n=0}^{\infty} w_n\,\mathcal{P}^n \qquad (\mathcal{P}^0 = \mathcal{D}_0), \tag{15}$$
where 𝒫^n denotes the n-fold convolution of 𝒫 with itself. If for instance 𝒫 is the degenerate distribution 𝒟₁ defined above and g(z) = (pz + q)^n (0 < p < 1), then, because of (13), we have g(𝒟₁) = ℬ_n(p); similarly, for
$$g(z) = e^{\lambda(z-1)} = \sum_{k=0}^{\infty}\frac{\lambda^k e^{-\lambda}}{k!}z^k$$
we obtain for g(𝒟₁) the Poisson distribution with parameter λ.
$$G_\xi(z) = \sum_{k=0}^{\infty} p_k z^k, \tag{1}$$
where z is a complex variable. The power series (1) is certainly convergent for |z| ≤ 1, since
$$\sum_{k=0}^{\infty} p_k = 1, \tag{2}$$
and represents an analytic function which is regular in the open unit disk. The introduction of the generating function makes it possible to treat some problems of probability theory by the methods of the theory of functions of a complex variable. The probabilities p_k can be recovered from the generating function by
$$p_k = \frac{G_\xi^{(k)}(0)}{k!},$$
where G_ξ^{(k)}(z) is the k-th derivative of G_ξ(z). The series (1) may converge in a circle larger than |z| ≤ 1, or even in the entire plane.
Examples
1. Generating function of the binomial distribution. Let ξ be a random variable having a binomial distribution of order n; then
$$G_\xi(z) = \big(1 + p(z-1)\big)^n = (pz + q)^n.$$
2. Generating function of the Poisson distribution. Let ξ be a random variable having a Poisson distribution with expectation λ; then
$$G_\xi(z) = e^{\lambda(z-1)}.$$
(Compare these with the corresponding Formulas (16) and (17) of the preceding paragraph.)
3. Generating function of the negative binomial distribution. Let ξ be a random variable of negative binomial distribution with expectation r/p; then we have
$$G_\xi(z) = \left(\frac{pz}{1 - qz}\right)^r.$$
From the generating function of a distribution one can obviously get all characteristics (expectation, variance, etc.) of the distribution. We shall now show that these quantities can indeed all be expressed by the derivatives of the generating function at the point z = 1. Since the generating function is, in general, defined only for |z| ≤ 1, we understand by the “derivative at the point z = 1” always the left side derivative (provided it exists).
If the derivatives G_ξ^{(r)}(z) of G_ξ(z) exist at z = 1, we have the following relations:
$$G_\xi'(1) = \sum_{k=1}^{\infty} k\,p_k,$$
$$G_\xi''(1) = \sum_{k=2}^{\infty} k(k-1)\,p_k,$$
III, § 15] GENERATING FUNCTIONS 137
and, in general,
$$G_\xi^{(r)}(1) = \sum_{k=r}^{\infty} k(k-1)\cdots(k-r+1)\,p_k \qquad (r = 1, 2, \ldots), \tag{4}$$
provided that the series on the right is convergent. Conversely, it is easy to show that if the series in (4) converges, the derivative G_ξ^{(r)}(1) exists and Formula (4) is valid. The number
$$M_s = E(\xi^s) = \sum_{k=1}^{\infty} k^s p_k \qquad (s = 1, 2, \ldots) \tag{5}$$
is called the moment of order s of ξ (hence M₁ is the expectation). Thus we have
$$G_\xi(1) = M_0 = 1, \qquad G_\xi'(1) = M_1, \qquad G_\xi''(1) = M_2 - M_1,$$
and, in general,
$$G_\xi^{(r)}(1) = \sum_{j=1}^{r} S_j^{(r)} M_j \qquad (r = 1, 2, \ldots), \tag{6}$$
where the S_j^{(r)} are Stirling numbers of the first kind defined by the relation
$$x(x-1)\cdots(x-r+1) = \sum_{j=1}^{r} S_j^{(r)} x^j.$$
Equations (6), if solved with respect to the M_j, give
$$M_1 = G'(1), \qquad M_2 = G'(1) + G''(1),$$
and, in general,
$$M_s = \sum_{j=1}^{s} \sigma_j^{(s)}\,G^{(j)}(1), \tag{7}$$
where the σ_j^{(s)} are Stirling numbers of the second kind, defined by
$$x^s = \sum_{j=1}^{s}\sigma_j^{(s)}\,x(x-1)\cdots(x-j+1).$$
Equations (7) allow the calculation of the central moments of ξ, i.e. the moments of ξ − E(ξ):
$$m_s = E\big([\xi - E(\xi)]^s\big) \qquad (s = 2, 3, \ldots). \tag{8}$$
In fact
$$m_s = \sum_{r=0}^{s}\binom{s}{r}(-1)^r M_1^r M_{s-r}. \tag{9}$$
For s = 2 we obtain the often used formula
$$m_2 = D^2(\xi) = M_2 - M_1^2.$$
Consider now the function
$$H_\xi(w) = E(e^{\xi w}) = \sum_{k=0}^{\infty} p_k\sum_{s=0}^{\infty}\frac{(kw)^s}{s!} = \sum_{s=0}^{\infty}\frac{w^s}{s!}\sum_{k=0}^{\infty} p_k k^s, \tag{11}$$
or
$$H_\xi(w) = G_\xi(e^w) = \sum_{s=0}^{\infty}\frac{M_s w^s}{s!}. \tag{12}$$
The function H_ξ(w) is called the moment generating function of the random variable ξ. In order to calculate central moments we put
$$h_\xi(w) = E\big(e^{(\xi - M_1)w}\big) = e^{-M_1 w}H_\xi(w). \tag{13}$$
A simple computation furnishes
$$h_\xi(w) = 1 + \sum_{s=2}^{\infty}\frac{m_s w^s}{s!}. \tag{14}$$
This condition is always fulfilled for bounded random variables and also
in case of certain unbounded distributions, e.g. the Poisson distribution and
the negative binomial distribution.
If H_ξ(w) exists, then h_ξ(0) = 1, since G_ξ(1) = 1. But then there can be found a circle |w| < r in which h_ξ(w) ≠ 0, hence ln h_ξ(w) is regular there. Put K_ξ(w) = ln h_ξ(w). Since K_ξ(0) = 0 and K_ξ'(0) = m₁ = 0, we have for |w| < r
$$K_\xi(w) = \sum_{l=2}^{\infty}\frac{k_l\,w^l}{l!},$$
with suitable coefficients k_l; in particular,
$$k_2 = m_2 = D^2(\xi), \qquad k_3 = m_3, \tag{16}$$
$$k_4 = m_4 - 3m_2^2.$$
For the Poisson distribution with parameter λ,
$$G_\xi(z) = e^{\lambda(z-1)}, \qquad H_\xi(w) = e^{\lambda(e^w - 1)}, \qquad h_\xi(w) = e^{\lambda(e^w - 1 - w)},$$
hence k_l = λ for l = 2, 3, .... If ξ and η are independent, then h_{ξ+η}(w) = h_ξ(w)h_η(w), and, consequently,
$$K_{\xi+\eta}(w) = K_\xi(w) + K_\eta(w), \tag{19}$$
$$k_l(\xi + \eta) = k_l(\xi) + k_l(\eta) \qquad (l = 2, 3, \ldots). \tag{20}$$
Remark. For l = 2 relation (20) is already well known to us: the variance of the sum of independent random variables is equal to the sum of the variances. For l = 3, relation (20) shows that this holds for the third central moments, too.
Proof. The probability that the quantity η is equal to the random variable ξ_n is, by assumption, equal to α_n. Thus, if q_k = P(η = k) and p_{nk} = P(ξ_n = k), we have
$$q_k = \sum_{n=0}^{\infty}\alpha_n p_{nk}. \tag{22}$$
Consequently
$$G_\eta(z) = \sum_{k=0}^{\infty} q_k z^k = \sum_{n=0}^{\infty}\alpha_n\sum_{k=0}^{\infty} p_{nk} z^k, \tag{23}$$
where the order of the summations may be interchanged because of the absolute convergence of the double series. Relation (21) is herewith proved.
The generating function of the sum
$$\eta = \xi_1 + \xi_2 + \cdots + \xi_\nu \tag{24}$$
of a random number of random variables is equal to G_ν[G(z)]. In fact, if α_n = P(ν = n), then by (21)
$$G_\eta(z) = \sum_{n=1}^{\infty}\alpha_n\,[G(z)]^n.$$
But, by definition, G_ν(z) = Σ_{n=1}^∞ α_n z^n; hence
$$G_\eta(z) = G_\nu\big(G(z)\big).$$
But from this it follows, because of the regularity of g(z), that g(z) = e^{Nz}. Hence A(z) = e^{N(z−1)}; that is, ν has a Poisson distribution.
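The identity G_η(z) = G_ν(G(z)) for a random sum can be checked on a small example with Bernoulli terms and a three-valued ν; all parameters below are illustrative:

```python
from math import comb

p = 0.3
alpha = {1: 0.5, 2: 0.3, 3: 0.2}     # distribution of nu (illustrative)

def G_xi(z):
    # generating function of one Bernoulli term
    return (1 - p) + p * z

def G_nu(z):
    # generating function of nu
    return sum(a * z**n for n, a in alpha.items())

def P_eta(k):
    # direct distribution of eta = xi_1 + ... + xi_nu:
    # given nu = n, eta is binomial of order n
    return sum(a * (comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0)
               for n, a in alpha.items())

for z in (0.0, 0.25, 0.5, 0.9, 1.0):
    direct = sum(P_eta(k) * z**k for k in range(4))
    assert abs(direct - G_nu(G_xi(z))) < 1e-12
```

The generating function of η computed from its distribution coincides with G_ν(G_ξ(z)) at every sample point, as the formula asserts.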
Now we shall prove the following theorem:
Theorem 4. If the distributions {p_{nk}} converge to a probability distribution, i.e. if
$$\lim_{n\to\infty} p_{nk} = p_k \qquad (k = 0, 1, \ldots) \tag{36}$$
and
$$\sum_{k=0}^{\infty} p_k = 1 \tag{37}$$
are valid for
$$p_{nk} = P(\xi_n = k) \qquad (k = 0, 1, \ldots), \tag{38}$$
then the generating functions of the ξ_n converge, in the closed unit circle, to the generating function of the distribution {p_k}. Hence we have
$$\lim_{n\to\infty} G_n(z) = G(z) \qquad \text{for } |z| \le 1, \tag{39}$$
where
$$G_n(z) = \sum_{k=0}^{\infty} p_{nk} z^k \tag{40}$$
and
$$G(z) = \sum_{k=0}^{\infty} p_k z^k. \tag{41}$$
Conversely, if the sequence G_n(z) tends to a limit G(z) for every z with |z| ≤ 1, then (36) and (37) are valid, i.e. G(z) is the generating function of a distribution {p_k} and the distributions {p_{nk}} converge to this distribution {p_k}.
Remark. If (36) does hold while (37) does not, then (39) is valid only in the interior of the unit circle. This can be seen from the following example. Let ξ_n = n, hence
$$p_{nk} = \begin{cases} 1 & \text{for } k = n,\\ 0 & \text{otherwise};\end{cases}$$
consequently,
$$\lim_{n\to\infty} p_{nk} = 0 \qquad (k = 0, 1, \ldots),$$
but
$$\lim_{n\to\infty} G_n(z) = \lim_{n\to\infty} z^n = \begin{cases} 0 & \text{for } |z| < 1,\\ 1 & \text{for } z = 1.\end{cases}$$
Proof of Theorem 4. First we show that (39) follows from (36) and (37). Choose ε > 0 and a number N so large that

Σ_{k=N}^{∞} p_k < ε/4,   (42)

where p_k has the sense given in (37); this will always be possible because of (37). Choose next a number n so large that

|p_k − p_{nk}| < ε/(4N)   (k = 0, 1, …, N − 1)   (43)

and

Σ_{k=N}^{∞} p_{nk} < ε/2.   (44)

In fact,

Σ_{k=N}^{∞} p_{nk} = 1 − Σ_{k=0}^{N−1} p_{nk} ≤ 1 − Σ_{k=0}^{N−1} p_k + ε/4 = Σ_{k=N}^{∞} p_k + ε/4 < ε/2.

It follows from relations (42), (43) and (44) that for |z| ≤ 1 and for sufficiently large n

|G(z) − G_n(z)| ≤ Σ_{k=0}^{N−1} |p_k − p_{nk}| + Σ_{k=N}^{∞} p_k + Σ_{k=N}^{∞} p_{nk} < ε,

which proves (39).
For the converse, it follows according to the known theorem of Vitali that G(z) is regular for |z| < 1 and that G_n(z) converges uniformly to G(z) in every circle |z| ≤ r < 1. Putting

G(z) = Σ_{k=0}^{∞} p_k z^k

and denoting by C_r the circle |z| = r < 1, we obtain that

p_k = (1/(2πi)) ∫_{C_r} G(z) z^{−k−1} dz = lim_{n→∞} (1/(2πi)) ∫_{C_r} G_n(z) z^{−k−1} dz = lim_{n→∞} p_{nk}.
From this (36) follows. Since G(1) = lim_{n→∞} G_n(1) = 1, we get (37).
Example. By means of Theorem 4 another proof can be given of the fact that the binomial distribution converges to the Poisson distribution. Let G_n(z) be the generating function of the binomial distribution of order n with p = λ/n; then

G_n(z) = (1 + λ(z − 1)/n)^n.

Clearly

lim_{n→∞} G_n(z) = e^{λ(z−1)},

which is the generating function of the Poisson distribution. Similarly, if ξ_r has the negative binomial distribution

P(ξ_r = k) = (r + k − 1 choose k) p^r q^k   (k = 0, 1, …),

where p = 1 − λ/r and q = 1 − p = λ/r, then the distribution of ξ_r converges to the Poisson distribution P(λ). Since the generating function G_r(z) of this distribution is given by

G_r(z) = ((1 − λ/r)/(1 − λz/r))^r

and

lim_{r→∞} ((1 − λ/r)/(1 − λz/r))^r = e^{λ(z−1)},

our statement follows from Theorem 4.
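The convergence of the generating functions themselves can be observed at a concrete point of the unit interval; the short sketch below (not from the original text, the function names are illustrative) evaluates the binomial generating function against its Poisson limit.

```python
from math import exp

lam = 2.0

def G_binomial(z, n, lam):
    # generating function of the binomial distribution of order n with p = lam/n
    return (1.0 + lam * (z - 1.0) / n) ** n

def G_poisson(z, lam):
    # generating function of the Poisson distribution with parameter lam
    return exp(lam * (z - 1.0))

z = 0.7
for n in (10, 100, 10000):
    # the gap shrinks roughly like 1/n
    print(n, abs(G_binomial(z, n, lam) - G_poisson(z, lam)))
```

The errors decrease as n grows, as Theorem 4 and the example above lead one to expect.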
The reader may have noticed that the present and the preceding paragraph deal substantially with the same problems. The only difference is that instead of the algebraic point of view the analytical viewpoint is favored here. Obviously, it means the same to say that the distribution 𝒫 can be exhibited in the form 𝒫 = G(𝒟_1), where G(z) is a power series with nonnegative coefficients such that G(1) = 1 and 𝒟_1 denotes the distribution {0, 1, 0, …, 0, …}, or to say that the distribution 𝒫 has the generating function G(z). In dealing with algebraic relations between distributions, the first point of view is entirely sufficient and the analytic point of view is superfluous. If, however, theorems of convergence are considered, the analytic point of view is preferable.
As an example of the application of generating functions, let us consider now a problem taken from the theory of chain reactions. Consider the chain reaction occurring in an electron multiplier. This instrument consists of so-called "screens". If an electron hits a screen, secondary electrons are generated, whose number is a random variable. These electrons hit a second screen, making free new electrons from it, whose number is again a random variable, etc. Suppose that the distribution of the secondary electrons produced by one primary electron is the same for each screen. Calculate the probability that exactly k electrons are produced from the n-th screen. Let ξ_{nr} (r = 1, 2, …) be the number of secondary electrons produced from the n-th screen by the r-th electron; assume that the ξ_{n1}, ξ_{n2}, … are independent random variables with the same distribution which take on nonnegative integer values only. Let p_k denote the probability p_k = P(ξ_{nr} = k) (k = 0, 1, …). Let further η_n denote the number of electrons issued from the n-th screen. We have then

η_n = ξ_{n1} + ξ_{n2} + … + ξ_{n,η_{n−1}};   (45)

in fact, the number of electrons emerging from the n-th screen is the sum of the electrons liberated by those emerging from the (n − 1)-th screen. Thus the random variable η_n is exhibited as the sum of independent random variables, the number of terms of the sum being equal to the random variable η_{n−1}. Put

G(z) = Σ_{k=0}^{∞} p_k z^k   (46)

and let G_n(z) be the generating function of η_n. We have G_1(z) = G(z) and it follows from Theorem 3 that

G_n(z) = G_{n−1}(G(z))   (n = 2, 3, …).   (47)

The generating function G_n(z) is thus the n-th iterate of G(z). Sometimes it is convenient to employ the recursive formula

G_n(z) = G(G_{n−1}(z))   (n = 2, 3, …).   (48)
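The iteration (48) can be carried out numerically on the level of coefficients: composing probability generating polynomials yields the distribution of η_n directly. The sketch below (an illustration under assumed data; the offspring distribution p is made up, not taken from the text) composes truncated polynomials by Horner's scheme.

```python
def convolve(a, b, kmax):
    # product of two polynomials, truncated to the first kmax coefficients
    out = [0.0] * kmax
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < kmax:
                out[i + j] += x * y
    return out

def compose(outer, inner, kmax):
    # coefficients of outer(inner(z)) by Horner's scheme, truncated to kmax terms
    result = [0.0] * kmax
    for c in reversed(outer):
        result = convolve(result, inner, kmax)
        result[0] += c
    return result

# hypothetical offspring distribution of one electron: P(0), P(1), P(2) secondaries
p = [0.2, 0.5, 0.3]
kmax = 16
g = p[:]                     # G_1(z) = G(z)
for _ in range(2):           # two more screens: coefficients of G(G(G(z)))
    g = compose(p, g, kmax)
print(round(sum(g), 6))      # still a probability distribution
```

The coefficients of g give P(η_3 = k); their sum is 1, and the mean works out to M^3 with M = G′(1) = 1.1 here.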
In general, we have

G_{n+m}(z) = G_n(G_m(z)).   (49)

Let M = G′(1) denote the expectation of the number of secondary electrons produced by one electron; differentiating (48) at z = 1 gives E(η_n) = M^n. The expectation of the number of electrons emitted from the n-th screen is thus the n-th power of the expectation of the number of electrons emitted from the first screen. For M > 1 this expectation increases beyond every bound as n → ∞; for M < 1 it tends to 0. In the latter case the process stops sooner or later. Let us see now what is the probability of this. Let P_{nk} be the probability that k electrons are emitted from the n-th screen; in particular, we have

P_{n,0} = G_n(0).   (53)

It can be supposed that G(0) = p_0 is positive, since if G(0) = 0, obviously P_{n,0} = 0 for n = 1, 2, … .
The sequence P_{n,0} (n = 1, 2, …) is monotone increasing. This can be seen immediately: in fact, if no electron is emitted from the n-th screen, the same will hold for the (n + 1)-st screen too; the converse, however, is not true. The limit P = lim_{n→∞} P_{n,0} therefore exists; according to (53) and the recursion (48), letting n → ∞ in P_{n,0} = G(P_{n−1,0}) we obtain

P = G(P).   (57)

Since G(1) = 1, 1 is also a root of this equation. We shall show that for M ≤ 1 there exist no other real roots; in this case therefore the probability that no electrons are emitted from the n-th screen tends to 1 as n → ∞. To prove this draw the curve y = G(x). Since G(x) is a power series with nonnegative coefficients, the same holds for all its derivatives; G(x) is therefore monotone increasing in the interval 0 ≤ x ≤ 1 and is also convex. The equation P = G(P) means that P is the abscissa of an intersection of the curve y = G(x) and the line y = x. Since G(0) = p_0 > 0, G(x) − x is positive for x = 0. Now if G′(1) = M > 1, G(x) − x is, because of G(1) = 1, negative in an appropriate left hand side neighbourhood of the point x = 1 (see Fig. 14). As G(x) is continuous, there exists a value P (0 < P < 1) satisfying (57). Because of the convexity of G(x) there can exist no further points of intersection.

It can be proved in the same manner that for M ≤ 1 Equation (57) has no real roots other than P = 1. (There can of course exist complex roots of (57).)

It is yet to be shown that for M > 1 the sequence P_{n,0} (n = 1, 2, …) converges to the smaller of the two roots of Equation (57). This can be seen immediately from Fig. 15, using the recursion (48).
Fig. 15
Thus the probability that from the n-th screen there are exactly k ≥ 1 electrons issued tends to 0 as n → ∞ for each fixed value of k. From

lim_{n→∞} P_{n,0} = P < 1   and   Σ_{k=0}^{∞} P_{n,k} = 1

it follows that for large enough n the number of the emitted electrons (provided that the process did not stop) will be arbitrarily large with a (conditional) probability near to 1. This is in accordance with experience.
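The fixed-point characterization (57) suggests a simple numerical sketch: iterating P ↦ G(P) from 0 reproduces the monotone sequence P_{n,0} = G_n(0). The offspring distribution below is hypothetical, chosen so that the equation P = G(P) has the two roots 1/2 and 1; nothing in it comes from the original text.

```python
def G(x, p):
    # probability generating function of the offspring distribution p at the point x
    return sum(c * x**k for k, c in enumerate(p))

p = [0.25, 0.25, 0.5]   # P(0), P(1), P(2) secondaries; mean M = 1.25 > 1
P = 0.0
for _ in range(200):    # P_{n,0} = G(P_{n-1,0}) is monotone increasing
    P = G(P, p)
print(round(P, 6))      # → 0.5, the smaller root of P = G(P)
```

Here P = G(P) reads 0.5 P^2 − 0.75 P + 0.25 = 0 with roots 1/2 and 1, and the iteration converges to the smaller one, as stated above for M > 1.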
ρ(x) = x − [x] − 1/2,   (2)

where [x] denotes as usual the integral part of x, i.e. [x] = k for k ≤ x < k + 1. Then we have

Σ_{a<k≤b} f(k) = ∫_a^b f(x) dx − [ρ(b)f(b) − ρ(a)f(a)] + ∫_a^b ρ(x) f′(x) dx.   (3)
k = np + z   and   n − k = nq − z.   (6)
Evaluating asymptotically the binomial coefficient (n choose k) figuring in (5) by Stirling's formula, a simple calculation gives

W_k = (1/√(2πnpq)) (np/k)^{k+1/2} (nq/(n−k))^{n−k+1/2} exp(θ_n/(12n) − θ_k/(12k) − θ_{n−k}/(12(n − k))),   (7)

where θ_n is defined by (1). We assume that the quantity

x = z/√(npq)   (9)
¹ The proof of this formula can be found e.g. in K. Knopp [1].
1/(np + z) + 1/(nq − z) = O(1/n)   (12)

for |x| ≤ A, and

δ = O(1/n),   (13)

and the constant figuring in the residual term O(1/n) in Equations (11), (12) and (13) depends on A only; thus we obtain from the relations (7), (11), (12), and (13) the following theorem:
Theorem 1. If

W_k = (n choose k) p^k q^{n−k}   (k = 0, 1, …, n),   (14)

further if x_k = (k − np)/√(npq) remains bounded, then

W_k = (e^{−x_k²/2}/√(2πnpq)) [1 + ((q − p)/(6√(npq))) (x_k³ − 3x_k) + O(1/n)].   (16)
¹ Here, as well as in what follows, the notation a_N = O(b_N) is employed. If a_N and b_N (N = 1, 2, …) are sequences of numbers such that b_N ≠ 0 and there exists a constant C > 0 for which |a_N| ≤ C |b_N|, this fact will be denoted by a_N = O(b_N). (Read: "a_N is of order not exceeding that of b_N".) If, however, lim_{N→∞} a_N/b_N = 0, this will be denoted by a_N = o(b_N). (Read: "a_N is of smaller order than b_N".)
W_k = (e^{−x_k²/2}/√(2πnpq)) [1 + O(1/√n)]   (17)

of Theorem 1 suffices. Thus the probabilities (n choose k) p^k q^{n−k} are approximated by values of the function

f(x) = (1/(√(2π) σ)) exp[−(x − m)²/(2σ²)].   (18)
Fig. 16
line either to the left or to the right, with the same probability 1/2. Under the last line of nails there follows a line of n + 1 boxes in which the balls are accumulated. In order to fall into the k-th box (numbered from the left, k = 0, 1, …, n) a ball has to be diverted k times to the right and n − k
Fig. 17
times to the left. If the directions taken at each of the lines are independent, the probability of this event will be (n choose k) 2^{−n}. By letting a large enough number of balls roll down Galton's board, their distribution in the boxes exhibits quite neatly a curve similar to the Laplace–Gauss curve. Theorem 1 states that the limit relation
lim_{n→∞} [(n choose k) p^k q^{n−k}] / [(1/√(2πnpq)) exp(−(k − np)²/(2npq))] = 1   (19)

holds, if with n also k tends to infinity so that

|k − np| / √(npq)

remains bounded; with these conditions the convergence is even uniform. Formula (19) is the so-called de Moivre–Laplace theorem.
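As an illustration (not from the original text), the local relation (19) can be evaluated at a concrete n and k; logarithms of Gamma functions are used below to avoid overflow of the binomial coefficient, and the chosen k keeps (k − np)/√(npq) bounded.

```python
from math import lgamma, log, exp, pi

def log_binom_pmf(n, k, p):
    # log of (n choose k) p^k q^(n-k), via log-Gamma to stay in floating range
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

n, p = 10000, 0.3
q = 1 - p
k = int(n * p) + 50                     # (k - np)/sqrt(npq) stays bounded
log_local = -(k - n*p)**2 / (2*n*p*q) - 0.5 * log(2 * pi * n * p * q)
ratio = exp(log_binom_pmf(n, k, p) - log_local)
print(ratio)   # close to 1, as (19) asserts
```

The residual deviation from 1 is of the order of the correction term in (16).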
W^{(n)}(a, b) = Σ_{a≤x_k<b} W_k =
= (1/√(2πnpq)) Σ_{a≤x_k<b} e^{−x_k²/2} [1 + ((q − p)/(6√(npq))) (x_k³ − 3x_k)] + O(1/n),   (21)

where it may be assumed that np + a√(npq) and np + b√(npq) are integers; this can always be done without changing the value of W^{(n)}(a, b). It follows then from (4) that
Theorem 2. For any given pair (α, β) of real numbers (α < β),

lim_{n→∞} Σ_{α ≤ (k−np)/√(npq) < β} (n choose k) p^k q^{n−k} = (1/√(2π)) ∫_α^β e^{−x²/2} dx   (24a)

holds; it suffices in fact to replace α by a_n and β by b_n, where a_n is the least number such that np + a_n √(npq) is an integer and np + a_n √(npq) ≥ np + α√(npq), and b_n is defined analogously. Obviously, a_n → α and b_n → β; hence

lim_{n→∞} ∫_{a_n}^{b_n} e^{−x²/2} dx = ∫_α^β e^{−x²/2} dx.
Thus the right hand side of (25) gives an approximate value for the probability that the number of occurrences of an event A (having the probability P(A) = p) in an experiment consisting of n independent trials lies between the limits np + α√(npq) and np + β√(npq). To use this result we must have the values of the integral

(1/√(2π)) ∫_α^β e^{−x²/2} dx

for every pair (α, β). The integral ∫ e^{−x²/2} dx cannot be expressed by elementary functions; the function

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt
is tabulated with great precision and a table of its values is given at the end of this volume (cf. Table 6). The curve y = Φ(x) is shown in Fig. 18.
Φ²(+∞) = (1/2π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} e^{−(x²+y²)/2} dx dy = ∫_0^{∞} r e^{−r²/2} dr = 1.
Proof. We have

P(|f_A(n) − P(A)| < ε) = Σ_{|k−np|<nε} (n choose k) p^k q^{n−k}.

Choose a number T such that

Φ(T) − Φ(−T) > 1 − δ.   (2)
§ 18. Exercises
1. Suppose a calculator is so good that he does not make more than three errors on the average in doing 1000 additions. Suppose he checks his additions by testing the addition modulo 9 and corrects the errors thus discovered. There can, however, still remain undetected errors: in fact, it may occur that the erroneous result differs from the exact sum by a multiple of 9. How many errors remain on the average among his additions?
Hint. It can be assumed that, if the sum is erroneous, the error lies with equal probability 1/9 in any of the residue classes 0, 1, 2, 3, 4, 5, 6, 7, 8 mod 9. Let A denote the event "the sum is erroneous", B the event "the error could be detected by testing the sum modulo 9". The probability sought is the conditional probability P(A | B̄); according to Bayes' rule it has the value 1/2992.
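The Bayes computation of this hint can be checked numerically. The sketch below (illustrative only) uses the numbers assumed in the hint: three errors per 1000 additions, and nine equally likely residue classes modulo 9 for an erroneous sum.

```python
# Bayes computation for Exercise 1, with the hint's assumptions
p_error = 3 / 1000                 # an addition is erroneous
p_undetected_if_error = 1 / 9      # error divisible by 9, so the mod-9 check misses it

# P(B-bar): the check reports nothing (either no error, or an undetectable one)
p_undetected = p_undetected_if_error * p_error + 1.0 * (1 - p_error)
p_error_given_undetected = (p_undetected_if_error * p_error) / p_undetected
print(p_error_given_undetected)    # 1/2992, roughly 0.000334
```

Per 1000 additions this leaves about 1/3 of an undetected error, i.e. one in roughly every 3000 additions.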
2. A missing letter is to be found with the probability p in one of the eight drawers of a secretary. Suppose that seven drawers were already tried in vain. What is the probability of finding the letter in the last drawer?

Consider the terms

[(N_1 choose k_1)(N_2 choose k_2) … (N_r choose k_r)] / (N choose n)

of the polyhypergeometric distribution, where

k_1 + k_2 + … + k_r = n,   N_1 + N_2 + … + N_r = N.

Prove that if the numbers N_j (j = 1, 2, …, r) tend to infinity so that

lim N_j/N = p_j > 0   (j = 1, 2, …, r),

then these terms tend to

(n!/(k_1! k_2! … k_r!)) p_1^{k_1} p_2^{k_2} … p_r^{k_r}.

Thus under the above conditions the terms of the polyhypergeometric distribution converge to the corresponding terms of the multinomial distribution.
5. If

lim_{n→+∞} n p_j = λ_j > 0   (j = 1, 2, …, r − 1),

the multinomial distribution

(n!/(k_1! k_2! … k_r!)) p_1^{k_1} p_2^{k_2} … p_r^{k_r}

tends to an (r − 1)-dimensional Poisson distribution. That is, for fixed k_1, k_2, …, k_{r−1} (k_r = n − k_1 − k_2 − … − k_{r−1}) we have for n → +∞

lim (n!/(k_1! k_2! … k_r!)) p_1^{k_1} … p_r^{k_r} = (λ_1^{k_1} … λ_{r−1}^{k_{r−1}} / (k_1! … k_{r−1}!)) e^{−(λ_1 + … + λ_{r−1})}.
6. Deduce Formula (4) of §4 from Formula (12) of § 12, using the convergence of
the binomial distribution to the Poisson distribution.
7. Determine the maximal term of the Poisson distribution λ^k e^{−λ}/k!   (k = 0, 1, …; λ > 0).
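A short brute-force sketch (illustrative only, not part of the text) confirms that the maximal term is attained at k = [λ], using the ratio of consecutive terms p_k/p_{k−1} = λ/k.

```python
from math import exp

def poisson_mode_term(lam, kmax=1000):
    # brute-force search for the index of the maximal Poisson term (Exercise 7)
    best_k, best_p, p = 0, exp(-lam), exp(-lam)
    for k in range(1, kmax):
        p *= lam / k            # ratio p_k / p_{k-1} = lam / k
        if p > best_p:
            best_k, best_p = k, p
    return best_k

print(poisson_mode_term(3.7))   # → 3, i.e. k = [λ]
```

Since p_k/p_{k−1} = λ/k ≥ 1 exactly when k ≤ λ, the terms increase up to k = [λ] and decrease afterwards; for integer λ the terms at λ − 1 and λ are equal.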
8. If λ is constant and N = n ln n + λn, there exists a limit of the probabilities P_k(n, N) (cf. Ch. II, § 12, Exercise 42.b) for n → ∞: we have for any fixed real value of λ and any fixed nonnegative integer k

lim_{n→∞} P_k(n, N) = (e^{−λk}/k!) exp(−e^{−λ}).
9. If M/N = p, R/N = r, and n → ∞ so that

lim np = λ > 0   and   lim nr = μ > 0,

then

lim_{n→∞} (n choose k) [∏_{j=0}^{k−1}(p + jr) ∏_{j=0}^{n−k−1}(q + jr)] / ∏_{j=0}^{n−1}(1 + jr) = (∏_{j=0}^{k−1}(λ + jμ) / k!) (1 + μ)^{−(k + λ/μ)}.

Thus under the above conditions, the Pólya distribution tends to a negative binomial distribution. If μ = 0, the above limit becomes λ^k e^{−λ}/k!; the limit distribution is then a Poisson distribution.
10. A roll of a certain fabric contains on the average five faults per 100 yards. The cloth will be cut into pieces of 3 yards. How many faultless pieces does one expect to find?

Hint. It can be supposed that the number of faults has a Poisson distribution. The probability of finding k faults in an x yards long piece is therefore equal to

((x/20)^k e^{−x/20}) / k!   (k = 0, 1, …).
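Under the hint's Poisson assumption the computation is short; the sketch below (illustrative, with a roll length of 100 yards assumed for concreteness) evaluates the probability that a 3-yard piece is faultless and the expected number of such pieces.

```python
from math import exp

# Exercise 10, under the hint's Poisson assumption: 5 faults per 100 yards
rate_per_yard = 5 / 100
piece_len = 3
p_faultless = exp(-rate_per_yard * piece_len)   # P(0 faults in 3 yards)
pieces_per_100yd = 100 // piece_len             # 33 whole pieces per 100 yards
print(round(p_faultless, 4), round(pieces_per_100yd * p_faultless, 1))
```

About 86% of the pieces are faultless, roughly 28.4 of the 33 pieces cut from 100 yards.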
11. In a forest there are on the average ten trees per 100 m². For the sake of simplicity suppose that all trees have a circular cross-section with a diameter of 20 cm. A gun is fired in a direction in which the edge of the forest is 100 m away. What is the probability that the shot will hit a trunk?

Hint. It can be assumed that the trees have a Poisson distribution; the probability that on a surface area of T m² there are k trees is equal to

((T/10)^k e^{−T/10}) / k!   (k = 0, 1, …).
13. At a certain post office 1017 letters without address were posted during one
year. Estimate the number of days on which more than two letters without address
were posted.
14. Let ℬ(p) = {p, 1 − p} be a binomial distribution of order 1; let

g(z) = (1 − α)/(1 − αz).

Determine the distribution g[ℬ(p)].
Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du.
for M = 1, 2, … .

20. Applying the result of Exercise 19 prove that

21. The function of two variables Ψ(x, y) = Φ(x/√y) fulfils the partial differential equation of heat conduction

∂Ψ/∂y = (1/2) ∂²Ψ/∂x²;
the function U(x, n)

fulfils the difference equation

Δ_n U = (1/2) Δ_x² U,

where

Δ_n U = U(x, n + 1) − U(x, n),
Δ_x² U = U(x + 1, n) − 2U(x, n) + U(x − 1, n).
22. Prove the asymptotic relation

(M choose k)(N − M choose n − k) / (N choose n) ~ (1/√(2πnpq)) exp(−(k − np)²/(2npq)),

where

N → ∞,  M = pN,  0 < p < 1,  q = 1 − p,  n → ∞,  n = o(√N),

and

|k − np| = O(√(npq)).

Thus also the hypergeometric distribution can be approximated by the normal distribution.
24. Establish the following power series expansion:

Φ(x) = 1/2 + (1/√(2π)) [x − x³/(1!·2·3) + x⁵/(2!·2²·5) − x⁷/(3!·2³·7) + … + ((−1)^k x^{2k+1})/(k!·2^k·(2k+1)) + …].

How many terms are to be taken to calculate Φ(2) with an accuracy of four decimal digits?
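The question can be answered experimentally; the sketch below (not from the original text) sums the series and compares it with Φ(2) computed from the error function, counting how many terms are needed for four-decimal accuracy.

```python
from math import factorial, sqrt, pi, erf

def Phi_series(x, terms):
    # partial sum of the power series of Exercise 24
    s = sum((-1)**k * x**(2*k + 1) / (factorial(k) * 2**k * (2*k + 1))
            for k in range(terms))
    return 0.5 + s / sqrt(2 * pi)

exact = 0.5 * (1 + erf(2 / sqrt(2)))   # Phi(2) via the error function
k = 1
while abs(Phi_series(2.0, k) - exact) >= 0.5e-4:
    k += 1
print(k)   # number of series terms needed for four-decimal accuracy at x = 2
```

Since the series is alternating with decreasing terms for this x, the error is bounded by the first omitted term, which puts the answer at around ten terms.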
25. If x is positive, the difference

1 − Φ(x) − (e^{−x²/2}/√(2π)) [1/x − 1/x³ + (1·3)/x⁵ − (1·3·5)/x⁷ + … + ((−1)^k · 1·3 … (2k − 1))/x^{2k+1}]

is in absolute value less than the first omitted term. How many terms are there to be taken to calculate Φ(4) with an accuracy of 10⁻⁴?
26. Let the relation

lim_{n→∞} g_n(a + t/(nb)) = A

be uniformly fulfilled for t in every finite interval. Show that for every x > 0, y > 0 we have

lim_{n→∞} ∫ g_n(t) f(t) dt = ∫_{−y}^{x} e^{−u} du.
lim_{λ→∞} Σ_{k < λ + x√λ} (λ^k e^{−λ})/k! = Φ(x).

lim_{n→∞} Σ_{(k−np)/√(npq) < x} (n choose k) p^k q^{n−k} = Φ(x).
30. Prove the following strong form of Stirling’s formula
Using the estimate

Σ_{|k−np|>nε} (n choose k) p^k q^{n−k} ≤ 1/(4nε²),

deduced in the course of the proof of Bernoulli's law of large numbers, show that for any function f(x), continuous in the closed interval [0, 1], the so-called Bernstein polynomials of f(x),

B_n(x) = Σ_{k=0}^{n} f(k/n) (n choose k) x^k (1 − x)^{n−k},

converge uniformly to f(x) on [0, 1].
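Bernstein polynomials are easy to evaluate directly; the sketch below (illustrative only) checks them against the known closed form for f(x) = x², for which the n-th Bernstein polynomial is exactly x² + x(1 − x)/n.

```python
from math import comb

def bernstein(f, n, x):
    # n-th Bernstein polynomial of f at the point x
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda t: t * t
x, n = 0.3, 100
# for f(x) = x^2 the Bernstein polynomial equals x^2 + x(1-x)/n exactly
print(abs(bernstein(f, n, x) - (x*x + x*(1 - x)/n)) < 1e-12)
```

The deviation x(1 − x)/n is just the variance of the relative frequency, which is the probabilistic heart of Bernstein's proof of the Weierstrass approximation theorem.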
Σ_{k=1}^{n} N_k E_k = E.   (3)

According to the definition of the conditional probability, probabilities (1) fulfilling (3) are simply multiplied by a constant factor. Find the values of N_1, N_2, …, N_n fulfilling (2) and (3) for which the expression (1) takes on its maximal value.
ln Γ(N + 1) = (N + 1/2) ln N − N + ln √(2π) − ∫_0^{∞} ρ(x) dx/(x + N),

Γ′(N + 1)/Γ(N + 1) = ln N + 1/(2N) + ∫_0^{∞} ρ(x) dx/(x + N)².
By Lagrange's method of multipliers, the conditional extremum of (1) under the conditions (2) and (3) can be found. Thus it follows that

N_k ≈ N (W_k e^{−βE_k}) / (Σ_{l=1}^{n} W_l e^{−βE_l}),

where the constant β must be chosen so that (3) is satisfied. This is Boltzmann's energy distribution.
35. Let 0 < p < 1/2, q = 1 − p, and let n be an integer such that np = m is also an integer. Let A be an event with probability p. Show that during the course of n independent trials

Σ_{k=0}^{m} (n choose k) p^k q^{n−k} > Σ_{k=m+1}^{n} (n choose k) p^k q^{n−k}.

Hint. By putting

B_r = (n choose m − r) p^{m−r} q^{n−m+r}   (r = 0, 1, …, m),   (1)
36. Prove the following asymptotic relation for the terms of the multinomial distribution: for

n → +∞,   Σ_{i=1}^{r} k_i = n,   |k_i − np_i| = O(√n),

we have

(n!/(k_1! k_2! … k_r!)) p_1^{k_1} p_2^{k_2} … p_r^{k_r} ~ exp(−(1/2) Σ_{i=1}^{r} (k_i − np_i)²/(np_i)) / √((2πn)^{r−1} p_1 p_2 … p_r).
Let

R_{jk} = P(ξ = j, η = k) = C (u^j v^k ν^{jk})/(j! k!)   (j, k = 0, 1, …),   where   1/C = Σ_{j=0}^{∞} Σ_{k=0}^{∞} (u^j v^k ν^{jk})/(j! k!).

For the independence of ξ and η it is necessary and sufficient that ν = 1 should hold. The distribution R_{jk} is therefore a generalization of the Poisson distribution for two dimensions. (Distribution of N. G. Obreskov.)
39. Let ξ and η be two independent random variables both having a Poisson distribution with expectation λ. Determine the distribution of ξ − η.
40. Each of two urns contains N — 1 white balls and one red ball. Draw from
both urns n balls (n < N) without replacement. Put now all 2 N balls into one and
the same urn and draw 2n balls without replacement. In which one of the two cases
is it more probable to obtain at least one red ball?
41. Let λ be the disintegration constant of a radioactive material. Let the probability of observing the disintegration of any one of the atoms be denoted by c (c is proportional to the solid angle under which the counter is seen from the point from where the radiation starts). Let N denote the number of the atoms at the time t = 0, and ν_t the number of disintegrations observed in the time interval (0, t). Prove by applying the theorem of total probability that ν_t has a binomial distribution.

Hint. The probability that exactly n atoms disintegrate during the interval (0, t) is

(N choose n) (1 − e^{−λt})^n e^{−λt(N−n)};

the probability that among them k disintegrations are observed is (n choose k) c^k (1 − c)^{n−k}. Hence

P(ν_t = k) = (N choose k) (c(1 − e^{−λt}))^k (1 − c(1 − e^{−λt}))^{N−k}.

Note that because of c(1 − e^{−λt}) < 1 − e^{−cλt} somewhat fewer disintegrations are observed than when the value of the disintegration constant would be λc and all disintegrations would be visible. But this difference is only important for large values of t.
42. Let ξ_1, ξ_2, … be independent random variables with the same negative binomial distribution of order 1:

P(ξ_k = n) = (1 − p) p^{n−1}   (n = 1, 2, …; 0 < p < 1).

Let ν be a random variable, independent from the ξ_k, with a Poisson distribution and expectation λ. Determine the distribution of

ζ = ξ_1 + ξ_2 + … + ξ_{ν+1}.

Hint. By using the notations of § 14, the distribution of ξ_1 + ξ_2 + … + ξ_{k+1} is the negative binomial distribution of order k + 1; the generating function of ζ is therefore

Σ_{k=0}^{∞} (λ^k e^{−λ}/k!) ((1 − p)z/(1 − pz))^{k+1} = ((1 − p)z/(1 − pz)) exp(λ((1 − p)z/(1 − pz) − 1)) = Σ_k c_k z^k,

where the coefficients c_k give the distribution of ζ.
43. Calculate the expectation of the number of marked fishes at the second capture
(cf. Ch. II, § 12, Exercise 21), if there are 10 000 fishes in the lake and if at the first
capture 100 fishes are marked.
44. Calculate the expectation of the number of matches in one of the boxes in
Banach’s pockets at the moment when he found the other box empty for the first
time (cf. Ch. II, § 12, Exercise 14).
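For Exercise 44 the classical solution (not derived in the text here) gives P(k matches remain in the other box) = (2n − k choose n) 2^{k−2n}; the sketch below (illustrative only) sums the corresponding expectation and checks it against the closed form (2n + 1)(2n choose n)/4^n − 1.

```python
from math import comb

def banach_expected_remaining(n):
    # E[matches left in the other box when one box is first found empty],
    # using the classical distribution P(k left) = comb(2n-k, n) * 2**(k-2n)
    return sum(k * comb(2*n - k, n) * 2.0**(k - 2*n) for k in range(n + 1))

n = 50
closed = (2*n + 1) * comb(2*n, n) / 4.0**n - 1   # closed form for comparison
print(abs(banach_expected_remaining(n) - closed) < 1e-9)
```

For n = 50 matches per box the expectation is about 11.5 matches, roughly √(πn) − 1.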
lCf. e.g. G. Pólya and G. Szegő [1].
45. Calculate the expectation of the sum defined in Chapter II, § 12, Exercise 46: X = k_1 + k_2 + … + k_M.

Hint. Let ℱ_k be the distribution of a random variable which assumes the values 0, 1, 2, …, k − 1 with the probabilities 1/k:

ℱ_k = {1/k, 1/k, …, 1/k}.

Show that the distribution of X − M(M + 1)/2 can be written in the form

ℱ_N ℱ_{N−1} … ℱ_{N−M+1}.

From this it follows that E(X) = M(N + 1)/2.
46. Suppose that a player gambles according to the following strategy at a play of coin tossing: he bets always on "tail"; if "head" occurs, he doubles his stake in the next tossing. He plays until tail occurs for the first time. What is the expectation of his gain?

Hint. If the tail occurs at the n-th toss for the first time (the probability of this event is 1/2^n), the gain of the player, if his bet at the first toss was 1 shilling, will be 1 shilling, since

2^n − Σ_{k=0}^{n−1} 2^k = 1.

The expectation of the gain is thus

Σ_{n=1}^{∞} (1/2^n) · 1 = 1.

It seems that with this strategy the player could ensure for himself a gain. This, however, would be true only if he disposed over an infinite sum of money. His fortune being limited, it is easy to show by a simple calculation that the expectation of his gain is 0 even if he doubles the stake always when a head appears.
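The "simple calculation" for a limited fortune can be sketched concretely. Assume (as an illustration, not from the text) that the player owns 2^m − 1 shillings, so he can cover at most m consecutive losses with stakes 1, 2, …, 2^{m−1}.

```python
# Exercise 46 with a finite fortune of 2**m - 1 shillings
def expected_gain(m):
    p_win = 1 - 2.0**(-m)        # tail within the first m tosses: gain +1
    p_ruin = 2.0**(-m)           # m heads in a row: lose 1 + 2 + ... + 2^(m-1)
    return p_win * 1 - p_ruin * (2**m - 1)

print([expected_gain(m) for m in (1, 5, 20)])   # → [0.0, 0.0, 0.0]
```

The small probability of ruin exactly balances the almost certain gain of 1 shilling, so the expectation is 0 for every m, in accordance with the statement above.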
47. Calculate the expectation of the Pólya distribution.
48. The chevalier de Méré asked Pascal the following. Two gamblers play a game where the chances are equal. They deposit at the beginning of the game the same amount of money. They agree that he who is the first to have won N games gets the whole deposit. They are, however, obliged to interrupt the game at a moment when the one player has won N − n times and the other N − m times (1 ≤ n ≤ N; 1 ≤ m ≤ N). How is the deposited money to be distributed? Calculate this proportion for n = 2 and m = 3.

Hint. The distribution of the deposited money is said to be "fair" if the money is distributed in the proportion p_n : p_m, p_n denoting the probability that the first gambler would win and p_m the probability that the second. Thus each gambler receives a sum equal to his expectation. The problem is thus to calculate the probability that the first (or the second) wins, under the condition that he already won N − n (i.e. N − m) games.
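The classical computation for this "problem of points" can be sketched as follows (illustrative only): imagine the remaining games always played out, so that with n and m wins still needed at most n + m − 1 further games decide the match.

```python
from math import comb

def p_first_wins(n, m):
    # first player needs n more wins, second needs m; fair coin each game;
    # imagine n + m - 1 further games are always played out
    total = n + m - 1
    return sum(comb(total, k) for k in range(n, total + 1)) / 2.0**total

p = p_first_wins(2, 3)
print(p)   # → 0.6875, i.e. the stakes are divided 11 : 5
```

Playing out all n + m − 1 games changes nothing in who wins, but makes the count a simple binomial sum; for n = 2, m = 3 the first player wins in 11 of the 16 equally likely outcomes.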
49. In playing bridge, 52 cards are distributed among four players. The values of the cards distributed are measured by the number of "tricks" in the following manner: if a player has the ace and the king of the same suit, this amounts to 2 tricks; ace and queen of the same suit without the king to 1 1/2; king and queen without ace to 1; ace alone to 1; king alone to 1/2 trick. What is the expectation of the total number of tricks in the hand of a player?

Hint. Obviously, the expectation of the number of tricks is the same for all players and in each of the suits. Hence the expectation of the total number of tricks for a player in all four suits is equal to the expectation of the sum of tricks for the four players in one suit. Thus it suffices to consider one suit only, e.g. spades. The expectation of the tricks in the hand of one player is equal to the sum of the expectations of all tricks present in the spades. However, this sum is equal to 2, except in the case when the ace, the king, and the queen of spades are in the hands of different players; in this case the sum of tricks is 3/2. Hence the expectation looked for is 1.801.
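The final number can be reproduced in two lines (an illustration of the hint, not part of the original text): the sum of spade tricks is 2 unless the ace, king and queen sit in three different hands, when it is 3/2.

```python
# Exercise 49: P(A, K, Q of spades are in three different hands)
p_diff = (39 / 51) * (26 / 50)      # K avoids A's hand, then Q avoids both
expectation = 2 * (1 - p_diff) + 1.5 * p_diff
print(round(expectation, 3))        # → 1.801
```

Each of the remaining 51 cards is equally likely to sit anywhere, so the king avoids the ace's hand with probability 39/51 and the queen avoids both hands with probability 26/50.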
50. a) There are M red and N − M white balls in an urn. We put M/N = p. Draw n balls without replacement from the urn and let the random variables ξ_k (k = 1, 2, …, n) be defined as follows:

ξ_k = 1 if at the k-th drawing a red ball is drawn, and ξ_k = 0 otherwise.

Calculate R(ξ_j, ξ_k) (1 ≤ j < k ≤ n).

b) If

ζ_n = ξ_1 + ξ_2 + … + ξ_n,

prove that

D²(ζ_n) = np(1 − p)(1 − (n − 1)/(N − 1)).
CHAPTER IV

GENERAL THEORY OF RANDOM VARIABLES
ξ⁻¹(Σ_n A_n) = Σ_n ξ⁻¹(A_n)   (1a)

and

ξ⁻¹(A − B) = ξ⁻¹(A) − ξ⁻¹(B).   (1b)

Let I_x denote the interval (−∞, x) and I_{a,b} the half-open interval [a, b). By assumption, ξ⁻¹(I_x) = A_x ∈ 𝒜. Hence, according to (1b), ξ⁻¹(I_{a,b}) ∈ 𝒜 for every pair of real numbers (a, b), a < b. Since 𝒜 is a σ-algebra, it follows from (1a) and (1b) that ξ⁻¹(A) ∈ 𝒜 for every Borel set A of the real line. Theorem 1 follows immediately.
it follows that

lim_{n→+∞} F(x_n) = 1.

F(x) = 0 for x ≤ c, and F(x) = 1 otherwise.
the inequality

Σ_k (p_k − a_k) < 0

implies
Example. We have already seen (Ch. III, § 13) that the function defined by

F(x) = 1 − e^{−λx} for x > 0, and F(x) = 0 otherwise,

is a distribution function; its density function is

f(x) = F′(x) = λe^{−λx} for x > 0, and f(x) = 0 for x < 0.
f(x) = (1/√(2π)) e^{−x²/2},
where p_1, p_2, p_3 are nonnegative numbers having sum 1 and F_i(x) (i = 1, 2, 3) are the three distribution functions such that F_1(x) is the distribution function of a discrete random variable, F_2(x) is an absolutely continuous distribution function and F_3(x) is a singular distribution function. This decomposition is evidently unique.
The probability figuring on the right side of (1) is always defined; in fact, if A_k denotes the set of all ω such that ξ_k(ω) < x_k (k = 1, 2, …, n), then A_k ∈ 𝒜 and
F(x_1, x_2, …, x_n) = P(∏_{k=1}^{n} A_k).

Putting

A = Σ_{k=1}^{n} A_k,   B = ∏_{k=1}^{n} B_k,
we find that

P(a_k ≤ ξ_k < b_k; k = 1, 2, …, n) = P(ĀB) = P(B) − P(Σ_{k=1}^{n} A_k B).
for h_k > 0 and for any real numbers x_k (k = 1, 2, …, n). Here the "product" of the (commutative) operations Δ means that they are to be performed one after the other. It is easy to prove that if condition 5′ holds for h_1 = h_2 = … = h_n = h > 0, it is valid in general.

Conversely, it can be shown that every function F(x_1, x_2, …, x_n) fulfilling conditions 1–5 may be considered as a distribution function. This follows from § 7 of Chapter II.
If the distribution function of the random vector ξ = (ξ_1, ξ_2, …, ξ_n) is F(x_1, x_2, …, x_n) and B is a Borel set of the n-dimensional space, then

P(ξ ∈ B) = ∫…∫_B dF(x_1, …, x_n),
s, N 8nF(x1,x z, . . . , x n)
....... * - ) ° а , , г , , . . . э * , (3)
/ 0 * 1 , * 2 , . . . , x„) > о
/( V ! ; . . . . *„) = urn .
л-o "
Further we have
Xi Хц
F(Xi, ...,*„) = J —co
... J
—oo
f{ t и • • tn) dtl . . . dtn ; (4)
hence in particular
+00 +00
J . . . J /( * ! ,. . . , x„) dx1. . . d x n = 1 .
—00 — 'oo
(5)
Further
ьi
< bk; к = 1 , 2 , . . . , и) = j' . . . J /(* 1 , ... , x„) dx1. . . dxn, (6 )
űi an
In other words: the probability that the endpoint of the random vector ξ lies in a Borel set B of the n-dimensional space is equal to the integral of f(x_1, …, x_n) on B.
i.e., if the two-dimensional distribution function of (ξ, η) is equal to the product of the distribution functions of ξ and η. From (1) it is readily deduced that

P(a ≤ ξ < b, c ≤ η < d) = P(a ≤ ξ < b) P(c ≤ η < d)   (2)

and, more generally, for any two Borel sets A and B (cf. Theorem 2 below):

P(ξ_k < x_k; k = 1, 2, …, n) = ∏_{k=1}^{n} P(ξ_k < x_k)   (3)

holds. If the random variables ξ_1, ξ_2, …, ξ_n are independent, any k (k < n) chosen arbitrarily from them are independent as well. To see this, it suffices to substitute x_j = +∞, where the j-s are the indices of the random variables which do not figure among the chosen k.

The converse of this relation does not hold. For example the fact that ξ, η, ζ are pairwise independent does not imply their independence. We have already seen this in the preceding Chapter.

If ξ_1, ξ_2, …, ξ_n are discrete random variables, the above definition of independence is equivalent to the definition given in the preceding Chapter. We shall prove now some simple theorems about independent random variables.
Theorem 1. A constant is independent of every random variable.
Proof. If η = c is a constant, then

P(ξ < x, η < y) = P(ξ < x) for c < y, and 0 otherwise;

hence (1) is valid.
Proof. If B_1, …, B_n are Borel subsets of the real axis, it follows from (3) that

P(ξ_1 ∈ B_1, …, ξ_n ∈ B_n) = ∏_{k=1}^{n} P(ξ_k ∈ B_k).   (4)

In fact, if B_1, …, B_n are unions of finitely many intervals, (4) follows from (3). Let now B_2, B_3, …, B_n be fixed and let B_1 alone be considered as variable: thus both sides of (4) represent a measure. The theorem about the unique extension of a measure (Ch. II, § 7, Theorem 2) can be applied here and it follows that (4) is true for any Borel set B_1. Let now B_1 be an arbitrary, fixed Borel set and let B_3, …, B_n be fixed sets, each of them being the union of finitely many intervals. By repeating the preceding reasoning it can be seen that (4) remains valid if B_2 too is an arbitrary Borel set. By progressing in this manner (4) can be proved. Theorem 2 follows immediately from (4).
In particular it follows from Theorem 2 that the random variables

η_k = a_k ξ_k + b_k   (k = 1, 2, …, n),

where a_k and b_k are arbitrary constants, are independent if ξ_1, …, ξ_n are independent.

Furthermore, it follows from Theorem 2 that for independent random variables Formula (3) remains valid if for one or several values of k on both sides one of the expressions ≤ x_k, > x_k, or ≥ x_k is written instead of < x_k.
Proof. (5) follows from (3) because of Formula (3) of § 3. Conversely, (3) is obtained by integrating (5).

Theorem 4. Let ξ_1, ξ_2, …, ξ_n be independent random variables and let h(x_1, …, x_k) be a Borel-measurable function of k variables (k < n). Then the random variables

h(ξ_1, …, ξ_k), ξ_{k+1}, …, ξ_n

are independent.

The proof is similar to that of Theorem 2.
The independence of two random vectors, ξ = (ξ_1, …, ξ_n) and η = (η_1, …, η_m), can be defined as follows: ξ and η are said to be independent, if the equality

P(ξ_1 < x_1, …, ξ_n < x_n; η_1 < y_1, …, η_m < y_m) =
= P(ξ_1 < x_1, …, ξ_n < x_n) P(η_1 < y_1, …, η_m < y_m)   (6)

is identically fulfilled in the variables x_j and y_k.
f(x) = 1/(b − a)   for a < x < b,   (1)

and f(x) = 0 outside (a, b). The corresponding distribution function is

F(x) = 0 for x ≤ a,   F(x) = (x − a)/(b − a) for a < x ≤ b,   F(x) = 1 for x > b.

Further, for a ≤ c < d ≤ b,

∫_c^d f(x) dx = (d − c)/(b − a).   (4)
Hence

f(x_1, …, x_n) = ∏_{k=1}^{n} f_k(x_k),   (6)

where f_k(x_k) (k = 1, …, n) is the density function of a random variable uniformly distributed on the interval (a_k, b_k); consequently ξ_1, …, ξ_n are independent. Conversely: if ξ_k is uniformly distributed on (a_k, b_k) and if the ξ_k are independent, the vector ξ = (ξ_1, …, ξ_n) is uniformly distributed in the parallelepiped a_k ≤ x_k ≤ b_k (k = 1, 2, …, n).

For an infinite interval (or for a domain of infinite volume) the uniform distribution can be defined by means of the theory of conditional probability spaces. We shall return to this in Chapter V.
F_2(x) = F_1((x − m)/a)   (1a)

(for a > 0), or

F_2(x) = 1 − F_1((x − m)/a + 0)   (1b)

(for a < 0).

If F_1(x) is absolutely continuous, F_2(x) is absolutely continuous as well. In this case, we obtain for the density functions f_i(x) = F_i′(x) (i = 1, 2)

f_2(x) = (1/|a|) f_1((x − m)/a).   (2)
IV, § 7] THE NORMAL DISTRIBUTION
where

x'_k = a_{k0} + \sum_{i=1}^{n} a_{ki}\,x_i \qquad (k = 1, \ldots, n).
For n = 1, (4) reduces to (2).
Let us now return to the normal distribution. We shall call every distribution normal which is similar to that obtained as the limit of the binomial distribution, i.e. similar to the distribution with density function

\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}.

Every normal density function can therefore be written in the form

f(x) = \frac{1}{\sigma}\,\varphi\!\left(\frac{x-m}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-m)^2}{2\sigma^2}\right),   (5a)

and the corresponding distribution function is \Phi\!\left(\frac{x-m}{\sigma}\right), where

\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt.   (6b)
then the density function of the random vector ζ = (ξ, η) is equal to the product of the density functions of ξ and η, i.e. to

h(x, y) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac{1}{2}\left[\frac{(x-m_1)^2}{\sigma_1^2} + \frac{(y-m_2)^2}{\sigma_2^2}\right]\right\}.   (7a)
A random vector having a density function of the form (7a) or one similar
to it is said to be normally distributed (or Gaussian). Since all distributions
having density functions of type (7a) are similar to each other, the two-
dimensional normal distributions form a family. The density function (7a)
(with m₁ = m₂ = 0) is represented on Fig. 19.
A simple calculation shows that the most general form of the two-dimensional normal density function is given by

f(x, y) = \frac{\sqrt{AC - B^2}}{2\pi}\exp\left\{-\frac{1}{2}\left[A(x-m_1)^2 + 2B(x-m_1)(y-m_2) + C(y-m_2)^2\right]\right\},

where A and C are positive, B is a real number such that B² < AC, and m₁ and m₂ are arbitrary real numbers. If B ≠ 0, ξ and η are not independent. In fact, in this case the density function cannot be decomposed into two factors, one depending only on x and the other only on y.
We introduce now the concept of the projection of a probability distribution. Let ζ = (ζ₁, …, ζₙ) be an n-dimensional random vector. The projection of the distribution of ζ upon the line g having the direction cosines g₁, …, gₙ is defined as the distribution of the random variable

ζ_g = \sum_{k=1}^{n} g_k\,ζ_k.
f_k(x_k) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_{k-1}\,dx_{k+1} \cdots dx_n \qquad (k = 1, \ldots, n).   (9)
\frac{1}{(2\pi)^{\frac{n}{2}}\prod_{k=1}^{n}\sigma_k}\exp\left\{-\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}\,(x'_i - m_i)(x'_j - m_j)\right\},   (17)

where the quadratic form \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij}\,x_i x_j
is positive definite. It is known that a positive definite quadratic form can
be transformed into a sum of squares. Thus if the density function of ξ' has the form (17), there exists an orthogonal transformation with matrix C = (c_{ij}) such that the n-dimensional density function of the random variables

ζ_k = \sum_{j=1}^{n} c_{jk}\,(ξ_j - m_j)

has the form (12). Note that the factor 1\big/\prod_{k=1}^{n}\sigma_k is equal to the positive square root of the determinant |b_{ij}|. The matrix B = (b_{ij}) can be written as CSC*,
where C* is the transpose of C and S is the diagonal matrix

S = \begin{pmatrix} \frac{1}{\sigma_1^2} & 0 & \ldots & 0\\ 0 & \frac{1}{\sigma_2^2} & \ldots & 0\\ \vdots & & & \vdots\\ 0 & 0 & \ldots & \frac{1}{\sigma_n^2} \end{pmatrix};

hence

|B| = |S| = \prod_{k=1}^{n} \frac{1}{\sigma_k^2}.

Consequently, the density function (17) can be written as
Thus the n-dimensional normal distributions form a family. It is of some interest to study the case of an m-dimensional vector ξ = (ξ₁, …, ξₙ, 0, …, 0), where m > n and the n-dimensional vector (ξ₁, …, ξₙ) has a density function of the form (12). By applying the orthogonal transformation
Formula (21) expresses that the point (ξ'₁, …, ξ'_m) lies in an n-dimensional
subspace of the m-dimensional space. A distribution of this kind is said to
be a degenerate m-dimensional normal distribution.
f(y) = \frac{1}{\sqrt{2\pi}\,\sigma y}\exp\left(-\frac{(\ln y - m)^2}{2\sigma^2}\right) \quad \text{for } y > 0, \qquad f(y) = 0 \quad \text{for } y \le 0.   (2)
A random variable having the density function (2) is said to be lognormal.
The lognormal distribution is of great importance in the theory of crushing
of materials. The distribution of the grains of a granular material (stone,
metal or crystal powder, etc.), in particular of a product produced by a
breaking-process, is lognormal under rather general conditions. This density function is represented by the curve seen on Fig. 20.
Take now another example. Let the random vector ζ be uniformly distributed on the circumference of the unit circle; what is the density function of the projection ξ of ζ on the x-axis? We obtain from (1)
Let two independent random variables ξ and η be given having the distribution functions F(x) and G(y) respectively. Consider the sum ζ = ξ + η; let H(z) be its distribution function. We have clearly

ξ₁ + ξ₂ = ξ₂ + ξ₁ \quad \text{and} \quad (ξ₁ + ξ₂) + ξ₃ = ξ₁ + (ξ₂ + ξ₃).

From this follows for the distribution functions that

F_1 * F_2 = F_2 * F_1 \quad \text{and} \quad (F_1 * F_2) * F_3 = F_1 * (F_2 * F_3).
h(z) = \int_{-\infty}^{+\infty} f(x)\,g(z-x)\,dx = \int_{-\infty}^{+\infty} f(z-y)\,g(y)\,dy,   (2)

H(z) = \int_{-\infty}^{+\infty}\int_{-\infty}^{z} f(x-y)\,dx\,dG(y) = \int_{-\infty}^{z}\left(\int_{-\infty}^{+\infty} f(x-y)\,dG(y)\right)dx.   (3)
From (4) follows immediately (2). Further it can be seen that the distribution of ζ = ξ + η is absolutely continuous, provided that one of ξ and η has an absolutely continuous distribution, regardless of the distribution of the other.
The function h(x) defined by (2) is called the convolution of the density functions f(x) and g(x) and is denoted by h = f * g. It is easy to show that h(x) is a density function; as a matter of fact (2) implies h(x) ≥ 0 and

\int_{-\infty}^{+\infty} h(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,dx \int_{-\infty}^{+\infty} g(y)\,dy = 1.
h(x) = \begin{cases} \dfrac{x-(a+c)}{(b-a)(d-c)} & \text{for } a+c < x \le b+c,\\[4pt] \dfrac{1}{d-c} & \text{for } b+c < x \le a+d,\\[4pt] \dfrac{(b+d)-x}{(b-a)(d-c)} & \text{for } a+d < x \le b+d. \end{cases}   (7)
The graph of the function y = h(x) is an isosceles trapezoid with its base on the x-axis (Fig. 21 represents the case a = −1, b = 0, c = −1, d = +1).
Note that h(x) is everywhere continuous, though f(x) and g(x) have jumps.
(The convolution in general smoothes out discontinuities.)
IV, § 9] THE CONVOLUTION OF DISTRIBUTIONS
Fig. 21
distribution. Thus for instance the density function of the sum of three independent random variables uniformly distributed on (−1, +1) is given by

h(x) = \begin{cases} \dfrac{3-x^2}{8} & \text{for } |x| \le 1,\\[4pt] \dfrac{(3-|x|)^2}{16} & \text{for } 1 < |x| \le 3,\\[4pt] 0 & \text{for } |x| > 3. \end{cases}   (8)
The function h(x) (cf. Fig. 22) is not only continuous but also everywhere differentiable. The curve already has a bell-shaped form like the Gaussian curve; by adding more and more independent random variables with uniform distribution on (−1, +1), this similarity becomes still closer: we have here a particular case of the central limit theorem to be dealt with later. The density function of the sum of n mutually independent random variables with uniform distribution on (−1, +1) is
f_n(x) = \begin{cases} \dfrac{1}{2^n (n-1)!}\displaystyle\sum_{k=0}^{\left\lfloor\frac{n+x}{2}\right\rfloor} (-1)^k \binom{n}{k}\,(n+x-2k)^{n-1} & \text{for } |x| \le n,\\[8pt] 0 & \text{otherwise.} \end{cases}   (9)
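The smoothing effect described above is easy to reproduce numerically. The sketch below (an added illustration, not from the text) convolves the density of the uniform distribution on (−1, +1) with itself twice and checks that the result is a proper density whose maximum is 3/8, the value of the density of the sum of three such variables at the origin:

```python
import numpy as np

# Numerical illustration (added, not from the text): repeatedly convolving the
# density of the uniform distribution on (-1, +1) with itself produces an ever
# smoother, bell-shaped density, as described above for sums of independent
# uniform random variables.
dx = 0.001
x = np.arange(-1.0, 1.0, dx)
f = np.full_like(x, 0.5)        # density of the uniform distribution on (-1, 1)

h = f
for _ in range(2):              # after the loop, h approximates the density
    h = np.convolve(h, f) * dx  # of a sum of three; Riemann-sum scaling

total = h.sum() * dx            # a density must integrate to 1
peak = h.max()                  # the density of the sum of three at 0 is 3/8
```

The discretization is fine enough that the integral is 1 to within rounding error and the peak agrees with 3/8 to three decimal places.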
GENERAL THEORY OF RANDOM VARIABLES [IV, § 9
respectively. It is shown in the kinetic theory of gases that these three random variables are independent, normally distributed, and have the same density function:
\frac{1}{\sigma}\,\varphi\!\left(\frac{x}{\sigma}\right).
The physical meaning of ξ, η and ζ having identical distributions is that
the pressure of the gas has the same value in every direction; m = 0 means
that the gas does not move as a whole, only its molecules move at random.
We wish to determine the density function of the absolute value of the
velocity
v = \sqrt{ξ^2 + η^2 + ζ^2}.   (18)
Clearly, \frac{ξ}{σ}, \frac{η}{σ}, \frac{ζ}{σ} have the density function φ(x); hence, by (17), the density function of \frac{v}{σ} is g_3(x). According to Formula (1) of § 8 the density v(x) of v is \frac{1}{σ}\,g_3\!\left(\frac{x}{σ}\right). Since \Gamma\!\left(\frac{3}{2}\right) = \frac{1}{2}\Gamma\!\left(\frac{1}{2}\right) = \frac{\sqrt{\pi}}{2}, we have

v(x) = \sqrt{\frac{2}{\pi}}\,\frac{x^2}{\sigma^3}\,e^{-\frac{x^2}{2\sigma^2}} \qquad (x > 0).   (19)
(The curve representing y = v(x) is drawn on Fig. 23 for σ = 1.) Note that
σ has the physical meaning

\sigma = \sqrt{\frac{kT}{M}},   (20)

where T is the absolute temperature, M the mass of the molecules, and k is Boltzmann's constant.
Let it further be noted that h_2(x) = \frac{1}{2}\,e^{-\frac{x}{2}}: the χ²-distribution with 2 degrees of freedom is an exponential distribution.
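This identity is easy to check by simulation. The following Python sketch (added here as an illustration) draws sums of two squared standard normal variables and compares the sample mean and median with those of the exponential density \frac{1}{2}e^{-x/2}:

```python
import math
import random

# Simulation check (added, not from the text): if Z1 and Z2 are independent
# standard normal variables, Z1^2 + Z2^2 follows the chi-squared distribution
# with 2 degrees of freedom, i.e. the exponential density (1/2) e^{-x/2},
# which has mean 2 and median 2 ln 2.
random.seed(1)
n = 100_000
samples = [random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 for _ in range(n)]
mean = sum(samples) / n
median = sorted(samples)[n // 2]
```

Both sample statistics agree with the exponential values to within the Monte Carlo error of a few hundredths.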
4. Convolution of exponential distributions.
The exponential distribution was introduced in the previous section in
connection with radioactive disintegration; but it occurs also in many
other problems of physics and technology. In what follows, we give an
example from the textile industry; namely the problem of the tearing of the
yarn on the loom. At a given moment, the yarn is or is not torn, according as the section of the yarn, submitted at this moment to a certain stress, does or does not yield to the latter. Evidently this does not depend on the time during which the loom has worked uninterruptedly. Let ξ be the random variable representing this time-interval, i.e. the time between the start of the work and the first rupture of the yarn; let F(x) denote the distribution function and f(x) the density function of ξ; for F(x) one obtains, as in the case of the radioactive disintegration, the functional equation
By (23), Formula (24) holds for n = 1. Assume its validity for a certain value of n. Since ζ_{n+1} = ζ_n + ξ_{n+1} and further ζ_n is independent of ξ_{n+1}, Formula (2) can be applied here. Thus we obtain

f_{n+1}(t) = \int_0^t f_n(u)\,f_1(t-u)\,du = \frac{\lambda^{n+1}\,t^n\,e^{-\lambda t}}{n!}.
ζ = \frac{ζ_0}{\sqrt{ζ_1^2 + ζ_2^2 + \cdots + ζ_n^2}},   (4)

where ζ₀, ζ₁, …, ζₙ are independent random variables having the same normal distribution with density function

\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}.
Let q_n(z) be the density function of ζ. We know already the density function of the denominator of (4) (cf. Formula (17) of § 9), hence we obtain from (3)

q_n(z) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\,\frac{1}{(1+z^2)^{\frac{n+1}{2}}}.   (5)

In particular, for n = 1,

q_1(z) = \frac{1}{\pi(1+z^2)}.   (6)
2. Distribution of the ratio of two independent random variables having χ²-distributions.
In mathematical statistics one is often interested in the density function h(z) of the ratio of two independent random variables ζ and η having χ²-distributions with n and m degrees of freedom, respectively.
It follows easily from Formula (12) of § 9 and from (3) of the present
IV, § 10] DISTRIBUTION OF A FUNCTION OF RANDOM VARIABLES
section that

h(z) = \frac{\Gamma\!\left(\frac{n+m}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{m}{2}\right)}\cdot\frac{z^{\frac{n}{2}-1}}{(1+z)^{\frac{n+m}{2}}} \qquad (z > 0).   (7)
3. The beta distribution.
If ζ is the ratio considered in the previous example, let τ denote the random variable τ = \frac{ζ}{1+ζ} and k(x) the density function of τ. By (1) of § 8 we obtain

k(x) = \frac{\Gamma\!\left(\frac{n+m}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{m}{2}\right)}\,x^{\frac{n}{2}-1}(1-x)^{\frac{m}{2}-1} \qquad \text{for } 0 < x < 1.   (8)
The distribution function K(x) = \int_0^x k(t)\,dt is thus

K(x) = \frac{\Gamma\!\left(\frac{n+m}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)\Gamma\!\left(\frac{m}{2}\right)}\int_0^x t^{\frac{n}{2}-1}(1-t)^{\frac{m}{2}-1}\,dt = B_{\frac{n}{2},\frac{m}{2}}(x) \qquad \text{for } 0 < x < 1,   (9)

where

B_{a,b}(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\int_0^x t^{a-1}(1-t)^{b-1}\,dt   (10)

is, up to a numerical factor, Euler's incomplete beta integral. The distribution B(a, b) (a > 0, b > 0) having (10) for its distribution function is called the beta-distribution of order (a, b).
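A convenient way to sample from the beta distribution, consistent with its construction above as a transformed ratio, uses the standard fact that X/(X + Y) is B(a, b)-distributed when X and Y are independent gamma variables with shape parameters a and b and a common scale. The Python sketch below (added as an illustration, not from the text) checks the expectation a/(a + b):

```python
import random

# Illustration (added, not from the text): if X and Y are independent gamma
# variables with shape parameters a and b and a common scale, then X/(X + Y)
# has the beta distribution B(a, b); its expectation is a/(a + b).
random.seed(3)
a, b, n = 2.0, 3.0, 100_000
ratios = []
for _ in range(n):
    x = random.gammavariate(a, 1.0)
    y = random.gammavariate(b, 1.0)
    ratios.append(x / (x + y))
mean = sum(ratios) / n
```

For a = 2, b = 3 the expectation is 0.4, and the Monte Carlo mean agrees to about two decimal places.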
4. Order statistics.
In nonparametric statistics the following problem is of importance: Let ξ₁, ξ₂, …, ξₙ be independent random variables with the same continuous distribution; let F(x) be the distribution function of ξ_k. Arrange the values of ξ₁, …, ξₙ in increasing order¹ and denote by ξ*_k the k-th of these ordered values; hence, in particular,

ξ*_1 = \min_{1 \le k \le n} ξ_k, \qquad ξ*_n = \max_{1 \le k \le n} ξ_k.   (11)

ξ*_k is called the k-th order statistic of the sample (ξ₁, …, ξₙ).

¹ The probability that equal values occur is 0.
it follows from the theorem of total probability that this distribution function F(x) is given by

F(x) = \frac{N_1}{N}\,\Phi\!\left(\frac{x}{\sigma_1}\right) + \frac{N_2}{N}\,\Phi\!\left(\frac{x}{\sigma_2}\right),

i.e. F(x) is the mixture of the distribution functions of the errors of the two methods, taken with the weights \frac{N_1}{N} and \frac{N_2}{N}.
It is easy to extend the notion of the mixture to a nondenumerable set of distribution functions. If F(t, x) is a distribution function in x for each value of the parameter t, F(t, x) is a measurable function of t for each fixed value of x, and G(t) is an arbitrary distribution function, then the Stieltjes integral

H(x) = \int_{-\infty}^{+\infty} F(t, x)\,dG(t)   (20)
H(x) = \sum_{n=1}^{\infty} p_n F_n(x),   (22)

and, by (22),

H(x) = \int_0^x \frac{(p\lambda)^k\,t^{k-1}\,e^{-p\lambda t}}{(k-1)!}\,dt.   (23)
For a random variable with the Cauchy density

f(x) = \frac{1}{\pi(1+x^2)}

the expectation does not exist, since in this case the integral (5) does not converge.
Let us now consider some examples.

1. Expectation of the uniform distribution.
If ξ is a random variable uniformly distributed on the interval (a, b), it follows from (5) that

E(ξ) = \frac{a+b}{2}.

For a normally distributed random variable with density \frac{1}{\sigma}\,\varphi\!\left(\frac{x-m}{\sigma}\right) one finds similarly E(ξ) = m. For the χ-distribution with n degrees of freedom, with density

h_n(x) = \frac{x^{n-1}\,e^{-\frac{x^2}{2}}}{2^{\frac{n}{2}-1}\,\Gamma\!\left(\frac{n}{2}\right)} \qquad (x > 0),

a simple calculation gives

E(χ_n^2) = n.

Similarly, for the expectation of χ_n,

E(χ_n) = \sqrt{2}\,\frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)};

hence

E(χ_n^2) \sim [E(χ_n)]^2 \qquad \text{as } n \to \infty.
5. Expectation of the beta distribution.
Let ξ be a random variable with a beta distribution B(a, b); its density function is

f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,x^{a-1}(1-x)^{b-1} \qquad (0 < x < 1);

hence

E(ξ) = \frac{a}{a+b}.
6. Order statistics.
Let ξ₁, …, ξₙ be independent random variables each uniformly distributed on the interval (0, 1). Let ξ*_k be the random variable which assumes the k-th of the values ξ₁, …, ξₙ ranked according to increasing magnitude; by Formula (14) of § 10

F_k(x) = B_{k,\,n+1-k}(x).

Hence E(ξ*_k) = \frac{k}{n+1}; the expectations of the ξ*_k subdivide the interval (0, 1) into n + 1 equal intervals, as could also be guessed by a symmetry argument.
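The values E(ξ*_k) = k/(n + 1) can be confirmed by simulation; the sketch below (an added illustration, not part of the text) averages the order statistics of repeated samples of n uniform variables:

```python
import random

# Simulation (added, not from the text) of E(xi*_k) = k/(n + 1) for the order
# statistics of n independent variables uniform on (0, 1).
random.seed(4)
n, trials = 4, 50_000
totals = [0.0] * n
for _ in range(trials):
    sample = sorted(random.random() for _ in range(n))
    for k in range(n):
        totals[k] += sample[k]
means = [t / trials for t in totals]
expected = [k / (n + 1) for k in range(1, n + 1)]   # 0.2, 0.4, 0.6, 0.8
```

For n = 4 the four averages come out close to 0.2, 0.4, 0.6 and 0.8, the equal subdivision described above.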
We hinted already at the analogy between probability distributions and
distributions of masses. Consider now the distribution of the unit mass on
a line, such that between the abscissas a and b > a there should lie a mass
F(b) − F(a), where F(x) is a given distribution function. If x₀ is the center of gravity of this distribution, we know that

x_0 = \int_{-\infty}^{+\infty} x\,dF(x),
Since P(B | A) ≤ \frac{P(B)}{P(A)}, the existence of E(ξ) implies the existence of the conditional expectation E(ξ | A) for any event A such that P(A) > 0.
If F(x | A) is the conditional distribution function of ξ with respect to the condition A, then

E(ξ \mid A) = \int_{-\infty}^{+\infty} x\,dF(x \mid A).   (6)
IV, § 11] THE GENERAL NOTION OF EXPECTATION
Clearly, since

E(ξ \mid A) = \frac{\displaystyle\int_A ξ\,dP}{P(A)} = \int_{\Omega} ξ\,dQ,

where Q(B) = P(B | A) is a probability measure, all results valid for ordinary expectations are also valid for conditional expectations.
We shall now give some often used theorems.
Theorem 1. The relation

E\left(\sum_{k=1}^{n} c_k ξ_k\right) = \sum_{k=1}^{n} c_k\,E(ξ_k)

holds for any random variables ξ_k with finite expectation and for any constants c_k. Thus the functional E is linear.
This theorem is a direct consequence of (3) and of the corresponding
properties of the integral.
Let ξ and η be two normally distributed independent random variables with density functions

\frac{1}{\sigma_1}\,\varphi\!\left(\frac{x-m_1}{\sigma_1}\right) \quad \text{and} \quad \frac{1}{\sigma_2}\,\varphi\!\left(\frac{x-m_2}{\sigma_2}\right).

The density function of ξ + η is, as we have seen already, \frac{1}{\sigma}\,\varphi\!\left(\frac{x-m}{\sigma}\right), where m = m_1 + m_2 and \sigma = \sqrt{\sigma_1^2 + \sigma_2^2}. It was proved above that the parameter m figuring in the density function is the expectation of the distribution. Hence the relation m = m_1 + m_2 is a consequence of Theorem 1.
Similarly, because of Theorem 1, the expectation of the gamma distribution of order n is \frac{n}{\lambda}, since the gamma distribution is the distribution of the sum of n independent exponentially distributed random variables, each with expectation \frac{1}{\lambda}.

E(ξ) = \sum_{n=1}^{\infty} E(ξ \mid A_n)\,P(A_n).   (7)
E(ξη) = E(ξ)\,E(η).   (9)
Proof. Assume first ξ ≥ 0. Let A_k be the event kh ≤ η < (k+1)h; evidently, the events A_k (k = 0, ±1, ±2, …) form a complete system of events. Hence, by Theorem 2,

E(ξη) = \sum_{k=-\infty}^{+\infty} P(A_k)\,E(ξη \mid A_k).   (10)

The conditional expectations E(ξη | A_k) exist, since η is bounded under the condition A_k.
Since, however, ξ and η are independent, we have
If we put this into (10), the series on the right-hand side can be seen to converge, thus E(ξη) exists; further, (9) holds, since the sums

\sum_{k=-\infty}^{+\infty} kh\,P(A_k) \quad \text{and} \quad \sum_{k=-\infty}^{+\infty} (k+1)h\,P(A_k)

tend to E(η) as h → 0. Thus (9) is proved for ξ ≥ 0. The restriction ξ ≥ 0
can be eliminated as follows: put

ξ_1 = \frac{|ξ| + ξ}{2}, \qquad ξ_2 = \frac{|ξ| - ξ}{2};

then ξ_1 ≥ 0, ξ_2 ≥ 0 and ξ = ξ_1 − ξ_2. Since η is independent of ξ_1 and ξ_2, we have

E(ξη) = E(ξ_1 η) - E(ξ_2 η) = [E(ξ_1) - E(ξ_2)]\,E(η) = E(ξ)\,E(η),   (13)

and herewith Theorem 3 is proved.
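Theorem 3 can be illustrated numerically; the following sketch (added here, not part of the text) estimates E(ξη) for an independent pair and contrasts it with a dependent pair, for which the product rule fails:

```python
import random

# Numerical check (added, not from the text) of Theorem 3: for independent xi
# and eta, E(xi * eta) = E(xi) E(eta).  For a dependent pair the rule fails,
# e.g. E(xi * xi) = E(xi^2) = 4/3 here, which differs from E(xi)^2 = 1.
random.seed(5)
n = 200_000
xi = [random.uniform(0, 2) for _ in range(n)]       # E(xi) = 1
eta = [random.expovariate(1.0) for _ in range(n)]   # E(eta) = 1
e_prod = sum(x * e for x, e in zip(xi, eta)) / n    # close to E(xi)E(eta) = 1
e_sq = sum(x * x for x in xi) / n                   # close to E(xi^2) = 4/3
```

The first average settles near 1, the second near 4/3, showing that independence is essential for the product rule.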
\int_{-\infty}^{+\infty} |x|\,dF(x)

exists. Hence
If we add term by term Equations (17) and (18) and let x tend to infinity we
obtain, by (14) and (15), Formula (16).
Conversely, the existence of the integrals on the right-hand side of (16) implies the existence of the expectation E(ξ). In fact, the convergence of the integrals implies for x > 0 that x[1 − F(x)] and xF(−x) tend to 0 as x → +∞; hence (14) and (15) are valid. Because of (17) and (18), the second part of Theorem 5 follows.
Theorem 5 has the following graphical interpretation: Draw the curve representing F(x) and the line y = 1. The expectation is equal to the difference of the areas of the domains marked by + and − on Fig. 24. The (evident) fact follows that a distribution symmetric with respect to x = a has expectation a, if this expectation exists. A distribution is said to be symmetric with respect to a if

F(a - x) = 1 - F(a + x + 0).
Theorem 6. If H(x) is a continuous function which is of bounded variation on every finite interval¹ and ξ is a random variable with the distribution function F(x), then

E(H(ξ)) = \int_{-\infty}^{+\infty} H(x)\,dF(x)   (19)

and

E(H(ξ)) = \int_{-\infty}^{+\infty} y\,dF(H^{-1}(y)).   (20)

Relation (19) results from (20) by the transformation x = H^{-1}(y) of the variable of integration.
Examples. 1. The expectations E(ξⁿ), if they exist, are expressed by
¹ Relation (19) holds for every Borel function H(x) provided that the expectation E[H(ξ)] exists; cf. § 17, Exercise 47.
IV, § 13] THE MEDIAN AND THE QUANTILES
If the distribution of the random vector ζ = (ξ₁, …, ξₙ) is known, then so are the components E(ξ_k) (k = 1, …, n) of its expectation. They can be considered as the components of an n-dimensional vector

E(ζ) = (E(ξ_1), \ldots, E(ξ_n)),

called the expectation vector of the random vector ζ. In the three-dimensional case, the expectation vector specifies the center of gravity of the corresponding mass-distribution.
Let us calculate for example the expectation vector of a normally distributed n-dimensional random vector η = (η₁, …, ηₙ), where the density function of η is given by Formula (18) of § 7. By the definition of the n-dimensional normal distribution, the components η_k can be exhibited in the form

η_k = m_k + \sum_{j=1}^{n} c_{kj}\,ξ_j,

where the ξ_j are normally distributed independent random variables with density function \frac{1}{\sigma_j}\,\varphi\!\left(\frac{x}{\sigma_j}\right) and thus expectation E(ξ_j) = 0; hence E(η_k) = m_k. Thus we have found the probabilistic meaning of the parameters m_k figuring in Formula (18) of § 7.
\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt,
(The inequality also holds for 0 < λ < 1, but in this case it is trivial, since every probability is at most equal to 1.)
Proof. From

m = E(ξ) = \int_0^{\infty} x\,dF(x)
IV, § 14] STANDARD DEVIATION AND VARIANCE
follows
m \ge \int_{\lambda m}^{\infty} x\,dF(x) \ge \lambda m \int_{\lambda m}^{\infty} dF(x) = \lambda m\,[1 - F(\lambda m)],

which proves (1).
If F(x) is continuous and strictly increasing and if x = Q(y) is the y-quantile, i.e. the inverse function of y = F(x), then (1) can be written in the form

Q\!\left(1 - \frac{1}{\lambda}\right) \le \lambda m.

In particular (for ξ ≥ 0), the upper quartile can never exceed four times the expectation.
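The Markov inequality, and with it this bound on the upper quartile, can be illustrated as follows (an added sketch, not the book's proof); for an exponential variable the bound is very far from sharp:

```python
import random

# Check of the Markov inequality (added sketch, not from the text): for a
# nonnegative xi with expectation m, P(xi >= lam * m) <= 1/lam.
random.seed(6)
n, lam = 100_000, 4.0
samples = [random.expovariate(1.0) for _ in range(n)]   # expectation m = 1
m = sum(samples) / n
tail = sum(1 for s in samples if s >= lam * m) / n      # about e^-4 ~ 0.018
```

Here the tail probability is roughly 0.018, comfortably below the Markov bound 1/λ = 0.25.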
where F(x) is the distribution function of the random variable ξ. If this distribution function is absolutely continuous and if we put F'(x) = f(x), then we have

D^2(ξ) = \int_{-\infty}^{+\infty} x^2 f(x)\,dx - \left(\int_{-\infty}^{+\infty} x f(x)\,dx\right)^2.

1. Uniform distribution. For a random variable uniformly distributed on the interval (a, b),

D(ξ) = \frac{b-a}{2\sqrt{3}}.
2. Normal distribution.
Let ξ be a random variable with density function

\frac{1}{\sigma}\,\varphi\!\left(\frac{x-m}{\sigma}\right) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-m)^2}{2\sigma^2}\right).

We know that E(ξ) = m. By the transformation of the variable \frac{x-m}{\sigma} = u we obtain

D^2(ξ) = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{+\infty} (x-m)^2\exp\left(-\frac{(x-m)^2}{2\sigma^2}\right)dx = \frac{\sigma^2}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} u^2\,e^{-\frac{u^2}{2}}\,du = \sigma^2.
3. Exponential distribution. Here

D^2(ξ) = \lambda\int_0^{\infty}\left(x - \frac{1}{\lambda}\right)^2 e^{-\lambda x}\,dx = \frac{1}{\lambda^2}

and

D(ξ) = \frac{1}{\lambda}.

The standard deviation of the exponential distribution is numerically equal to its expectation.
4. Student's distribution.
Let ξ be a random variable having Student's distribution with n degrees of freedom; its density function is given by Formula (5) of § 10. Since f(x) is an even function, E(ξ) = 0 for n ≥ 2. [For n = 1 (i.e. in the case of the
Cauchy distribution) the expectation does not exist.] Further,

D^2(ξ) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\int_{-\infty}^{+\infty}\frac{x^2}{(1+x^2)^{\frac{n+1}{2}}}\,dx.

Take for new variable of integration y = \frac{x^2}{1+x^2}; then

D^2(ξ) = \frac{1}{n-2} \qquad \text{for } n \ge 3.
5. Beta distribution. For the beta distribution B(a, b),

D^2(ξ) = \frac{ab}{(a+b)^2(a+b+1)}.
6. Convolution of normal distributions.
Let ξ and η be independent normally distributed random variables with densities

\frac{1}{\sigma_1}\,\varphi\!\left(\frac{x-m_1}{\sigma_1}\right) \quad \text{and} \quad \frac{1}{\sigma_2}\,\varphi\!\left(\frac{x-m_2}{\sigma_2}\right).
The variance of ξ², where ξ has the density function φ(x), is, according to Theorem 6 of § 11 and Formula (3), equal to

D^2(ξ^2) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} x^4\,e^{-\frac{x^2}{2}}\,dx - \left[E(ξ^2)\right]^2 = 2.
q = 0.6745\,σ \approx 0.477\sqrt{2}\,σ. Either of these two forms can be taken as a definition of the quartile deviation of the normal distribution. If a normal distribution is given in the form \frac{1}{σ}\,\varphi\!\left(\frac{x-m}{σ}\right), the expectation m and standard deviation σ can be obtained immediately, without calculation; if it is brought to a form involving \frac{x-m}{q}, the expectation m and quartile deviation q can be read off without any further computation.
If the distribution function F(x) of the random variable ξ is continuous and strictly increasing for 0 < F(x) < 1, then the value of ξ lies with probability \frac{1}{2} in the interval \left(Q\!\left(\frac{1}{4}\right), Q\!\left(\frac{3}{4}\right)\right). Clearly, every interval \left(Q(\delta), Q\!\left(\delta + \frac{1}{2}\right)\right) with 0 < \delta < \frac{1}{2} possesses the same property. If the distribution is symmetric with respect to the origin and if its density function is monotone decreasing for x > 0, then \left(Q\!\left(\frac{1}{4}\right), Q\!\left(\frac{3}{4}\right)\right) is the smallest interval possessing this property. In this case
respect to the origin, D²(ξ) = E(ξ²). Apply the Markov inequality (§ 13, Theorem 1) to the random variable ξ²; then we obtain

P(|ξ| \ge \lambda D(ξ)) \le \frac{1}{\lambda^2}.   (3)

From (2) and (3) it follows that q(ξ) \le \sqrt{2}\,D(ξ), which proves (1).
The inequality (1) is sharp. This is shown by the following example: let the distribution of the random variable ξ be the mixture, with weights \frac{1}{4}, \frac{1}{2}, \frac{1}{4}, of three normal distributions with the same standard deviation ε (> 0) and expectations −1, 0, +1. Since ε can be chosen arbitrarily small, it follows from the example that the factor \sqrt{2} figuring in (1) cannot be replaced by a smaller number.
The quartile deviation q(£) is mostly used when the standard deviation
of £ is infinite, e.g. in the case of the Cauchy distribution.
The standard deviation of a random variable that is uniformly distributed on the interval (m − a, m + a) is \frac{a}{\sqrt{3}}. If ξ is an arbitrary random variable with E(ξ) = m and D(ξ) = σ, the interval

(m - σ\sqrt{3},\; m + σ\sqrt{3})   (4)
d(ξ) = E(|ξ - E(ξ)|)

is also used as a measure of fluctuation. By Theorem 6 of § 11

d(ξ) = \int_{-\infty}^{+\infty} |x - E(ξ)|\,dF(x).

Clearly d(ξ) \le D(ξ); for the exponential distribution, for instance,

d(ξ) = \frac{2D(ξ)}{e}.
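The last relation can be checked by simulation (an added illustration, not from the text; the constant below reflects the value d(ξ) = 2/(eλ) for the exponential distribution with parameter λ):

```python
import math
import random

# Numerical check (added, not from the text) that for the exponential
# distribution the mean deviation d(xi) = E|xi - E(xi)| equals 2 D(xi)/e,
# where D(xi) = 1/lam is the standard deviation.
random.seed(7)
lam, n = 2.0, 200_000
samples = [random.expovariate(lam) for _ in range(n)]
mean = sum(samples) / n
d = sum(abs(s - mean) for s in samples) / n     # close to 2/(e*lam)
```

For λ = 2 the exact value is 1/e ≈ 0.368, and the Monte Carlo estimate agrees to within a few thousandths.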
Of course Theorem 4 of § 9 is also valid in the general case.
IV, § 16] VARIANCE IN HIGHER DIMENSIONAL CASE
ζ_g = \sum_{k=1}^{n} a_k\,(ξ_k - m_k)   (1)

D^2(ζ_g) = \sum_{i=1}^{n}\sum_{j=1}^{n} D_{ij}\,a_i a_j   (3)
D = \begin{pmatrix} D_{11} & \ldots & D_{1n}\\ \vdots & & \vdots\\ D_{n1} & \ldots & D_{nn} \end{pmatrix}.   (4)
\sum_{i=1}^{n}\sum_{j=1}^{n} D_{ij}\,x_i x_j = c^2
¹ Since D_{ij} is the covariance of ξ_i and ξ_j, the dispersion matrix is also called the covariance matrix.
is called the dispersion ellipsoid of the distribution. It is easy to see that the
dispersion matrix is invariant under a shift of the coordinate system. Under
the rotation of the coordinate system D is transformed as a matrix of a
tensor. Let in fact C = (c_{ij}) be an orthogonal matrix and

D'_{ij} = E(ξ'_i ξ'_j) = \sum_{k=1}^{n}\sum_{h=1}^{n} c_{ik}\,c_{jh}\,D_{kh}.

If the matrix (D'_{ij}) is denoted by D', we have

D' = CDC*,
and suppose that the random vector η = (η₁, η₂) is uniformly distributed inside this ellipse. The elements of the dispersion matrix of η, i.e. the numbers d_{ij} = E(η_i η_j) (i, j = 1, 2), are given by

d_{11} = \frac{C}{AC - B^2}, \qquad d_{12} = d_{21} = -\frac{B}{AC - B^2}, \qquad d_{22} = \frac{A}{AC - B^2}.   (7)
Let ξ = (ξ₁, ξ₂) be any random vector. Choose the numbers A, B, C such that the dispersion matrix of a random vector uniformly distributed in the ellipse (5) coincides with that of ξ. We put, therefore,

A = \frac{D_{22}}{|D|}, \qquad B = -\frac{D_{12}}{|D|}, \qquad C = \frac{D_{11}}{|D|}.   (8)
k(ξ) = \frac{\sqrt{AC - B^2}}{4\pi},   (10)

i.e. the reciprocal of the area of the ellipse (9), is called the concentration of ξ.
If A, B, C are chosen according to (8), the matrix \begin{pmatrix} A & B\\ B & C \end{pmatrix} is the inverse of \begin{pmatrix} D_{11} & D_{12}\\ D_{12} & D_{22} \end{pmatrix}.
The case of higher dimensions turns out to be quite similar. The equation of the ellipsoid of concentration is here

\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{A_{ij}}{d}\,x_i x_j = n + 2,   (11)

where d is the value of the determinant |D_{ij}| and A_{ij} the value of the cofactor of the element in the i-th row and j-th column. The concentration, that is to say the reciprocal of the volume of the ellipsoid (11), is equal to
Of course, this holds only for d > 0. If d = 0, the point (ξ₁, …, ξₙ) lies, with probability 1, on a hyperplane of at most n − 1 dimensions;
\sum_{j=1}^{n} x_j\,(ξ_j - m_j) = 0.
D_{11} = \frac{C}{AC - B^2}, \qquad D_{12} = -\frac{B}{AC - B^2}, \qquad D_{22} = \frac{A}{AC - B^2}.

It follows that

A = \frac{D_{22}}{|D|}, \qquad B = -\frac{D_{12}}{|D|}, \qquad C = \frac{D_{11}}{|D|},

where |D| = D_{11}D_{22} - D_{12}^2. If we put

\varrho = \frac{D_{12}}{\sqrt{D_{11}D_{22}}}, \qquad \sigma_1 = \sqrt{D_{11}}, \qquad \sigma_2 = \sqrt{D_{22}},   (14)
we find

f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\varrho^2}}\exp\left\{-\frac{1}{2(1-\varrho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\varrho\,xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\}.   (15)
The number ϱ is the correlation coefficient R(ξ, η) of the random variables ξ and η. We have already introduced this quantity for discrete distributions. It is similarly defined in the general case and its properties are the same. Thus

R(ξ, η) = \frac{E([ξ - E(ξ)][η - E(η)])}{D(ξ)\,D(η)} = \frac{E(ξη) - E(ξ)E(η)}{D(ξ)\,D(η)}.   (16)
Theorems 1, 2, 3 and 5 of Chapter III, § 11 are valid and can be proved in
nearly the same way.
f(x, y) = \frac{1}{\sigma_1}\,\varphi\!\left(\frac{x}{\sigma_1}\right)\cdot\frac{1}{\sigma_2}\,\varphi\!\left(\frac{y}{\sigma_2}\right);   (17)
hence ξ and η are independent. This theorem is easily generalized to any
number of dimensions.
\sum_{k=1}^{n}\frac{1}{\sigma_k^2}\left(\sum_{j=1}^{n} c_{jk}\,(y_j - m_j)\right)^2;   (23)

and consequently

g(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{\frac{n}{2}}\prod_{k=1}^{n}\sigma_k}\exp\left\{-\sum_{k=1}^{n}\frac{(y_k - m_k)^2}{2\sigma_k^2}\right\},

which proves the independence of the random variables η₁, …, ηₙ.
Remark. If, instead of assuming that the random vector (η₁, …, ηₙ) has an n-dimensional normal distribution, only the weaker condition is assumed that the components η₁, …, ηₙ are each normally distributed, then the assertion of Theorem 2 is false. This can be seen from the following example: Let the density function of the random vector (ξ, η) be h(x, y), with marginal densities

f(x) = g(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}},

i.e. ξ and η are normally distributed with expectation 0 and standard deviation 1. Since h(x, y) is an even function of both x and y, it follows that R(ξ, η) = 0. The random variables ξ and η, however, are not independent, since evidently h(x, y) ≠ f(x)g(y); thus ξ and η are each normally distributed and are uncorrelated, but they still are dependent.
From Theorem 2 it follows that

η_1 = \sum_{k=1}^{n} a_k ξ_k \quad \text{and} \quad η_2 = \sum_{k=1}^{n} b_k ξ_k

are independent if and only if \sum_{k=1}^{n} a_k b_k = 0.

Proof. Since

R(η_1, η_2) = \frac{\displaystyle\sum_{k=1}^{n} a_k b_k}{\sqrt{\displaystyle\sum_{k=1}^{n} a_k^2\,\sum_{k=1}^{n} b_k^2}},

the necessary condition that η₁ and η₂ should be uncorrelated is that \sum_{k=1}^{n} a_k b_k = 0.
We shall now show that the random vector (η₁, η₂) is normally distributed. There can be found an orthogonal matrix (c_{kl}) whose first two rows are

c_{1k} = \frac{a_k}{\sqrt{\sum_{l=1}^{n} a_l^2}} \qquad \text{and} \qquad c_{2k} = \frac{b_k}{\sqrt{\sum_{l=1}^{n} b_l^2}} \qquad (k = 1, \ldots, n).
\sum_{k=1}^{n} a_k b_k = 0

means that these two directions are orthogonal, and η₁, η₂ are (up to a numerical factor) the projections of the random vector (ξ₁, …, ξₙ) on these directions. Our result may thus be formulated as follows: if ξ₁, …, ξₙ are mutually independent random variables with the same normal distribution, then the projections of the random vector ξ = (ξ₁, …, ξₙ) on two lines d₁, d₂ are independent if and only if d₁ and d₂ are orthogonal.
§ 17. Exercises
1. Let the distribution function F(x) of the random variable ξ be continuous and strictly increasing for −∞ < x < +∞. Determine the distribution function of the following random variables:
of the lognormal distribution and calculate its extrema and points of inflexion. Calculate the expectation and standard deviation of the lognormal distribution.
4. a) Show that if the random variable ξ has a lognormal distribution, the same holds for η = cξ^a (c > 0; a ≠ 0).
IV, § 17] EXERCISES
b) Suppose that the diameters of the particles of a certain kind of sand possess
a lognormal distribution; let f(x) be the density function of this distribution (cf. Exer
cise 3), with m = −0.5, σ = 0.3; x is measured in millimeters. The sand particles
are supposed to have spherical form. Find the total weight of the sand particles which
have diameters less than 0.5 mm, if the total weight of a certain amount of sand is
given.
5. Let the random variable η have a lognormal distribution with density function

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left(-\frac{(\ln x - m)^2}{2\sigma^2}\right) \qquad \text{for } x > 0.

If the curve of y = f(x) is drawn on paper whose horizontal axis has a logarithmic subdivision, then (apart from a numerical factor) one obtains a normal curve. It does not coincide with the density function of ln η, but is shifted to the left over a distance σ².
6. Let the random point (ξ, η) have a normal distribution on the plane, with density function

f(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right).

\frac{1}{2\pi\sigma_1\sigma_2}\exp\left[-\frac{1}{2}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right)\right].
8. Let the density function of the probability distribution of the life-time of the tubes of a radio receiver with 6 tubes be λ²t e^{−λt} for t > 0, where λ = 0.25 if the unit of time is a year. Find the probability that during 6 years none of the tubes has to be replaced. (The life-times of the individual tubes are supposed to be independent of each other.)
h) the distribution with density function f(x) = \frac{c^{m-1}}{(m-2)!}\,x^{-m}\,e^{-\frac{c}{x}} for x > 0 (c > 0; m = 2, 3, …).
10. a) Let the point (ξ, η) be uniformly distributed in the interior of the unit circle. We put

\varrho = \sqrt{ξ^2 + η^2}, \qquad \varphi = \arctan\frac{η}{ξ}.
f(x) = \frac{2}{\pi\,(e^x + e^{-x})}.

Find the distribution of ζ = ξ + η.
14. Let ξ be a random variable with density function
16. Let the random variables ξ₁, ξ₂, …, ξₙ be independent and uniformly distributed on the interval (0, 1). Determine the density function of ζ = \sum_{l=1}^{n} ξ_l.
d) Show that the random variables \left\{\frac{ξ_k}{ξ_{k+1}}\right\} are uniformly distributed in the interval (0, 1).
18. The random variables ξ₁, ξ₂, …, ξₙ are called exchangeable if their n-dimensional distribution function F(x₁, …, xₙ) is a symmetric function of its variables. (Exchangeable random variables have thus the same distribution and consequently the same expectation.)
a) Choose at random and independently, with a constant probability density, n points in the interval (0, 1). Let their abscissas be ξ₁, ξ₂, …, ξₙ. The interval (0, 1) is subdivided by these points into n + 1 subintervals of the respective lengths η₁, η₂, …, η_{n+1}. Show that

E(η_k) = \frac{1}{n+1}.

Hint. The η₁, η₂, …, η_{n+1} are exchangeable random variables and we have

\sum_{l=1}^{n+1} η_l = 1.
Hint. ξ*_k = η_1 + η_2 + \cdots + η_k.
d) Which is larger:
22. Let ξ₁, ξ₂, …, ξₙ be independent random variables and let the density function of ξ_k (k = 1, 2, …, n) be

λ(k + h - 1)\,e^{-λ(k+h-1)x} \qquad \text{for } x > 0,

where λ > 0 and h is a real number. Find the distribution function of the sum

η = \sum_{k=1}^{n} ξ_k

and show that ζ = exp(−λη) has a beta distribution.
23. Let h_n(x) be the density function of Student's distribution with n degrees of freedom. Show that

\lim_{n\to\infty}\frac{1}{\sqrt{n}}\,h_n\!\left(\frac{x}{\sqrt{n}}\right) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}.
24. The substances A₁, A₂, …, A_{n+1} form a radioactive chain, i.e. if an A₁ atom disintegrates it is transformed into an A₂ atom, similarly the A₂ atoms into A₃ atoms, and so on. The A_{n+1} atoms are not radioactive. Suppose that at the instant t = 0 the number of A₁ atoms is N₁, the number of A₂ atoms is N₂, …, while there are Nₙ atoms of Aₙ. Find the density function of the time interval needed for an atom chosen at random to change into an A_{n+1} atom.
25. Let λ be the disintegration constant of a radioactive atom. Let there be N atoms present at the time 0.
a) Calculate the standard deviation of the number of atoms disintegrated up to the time t.
b) Calculate the expectation and the standard deviation of the half-period (i.e. of the random time interval till the \frac{N}{2}-th disintegration, if N is even).
26. a) Let η_k (k = 1, 2, ...) be the time required for the transformation of a radioactive atom A₁ into an A_{k+1} atom, through the intermediary states A₂, ..., A_k, i.e. the duration of the process

A₁ → A₂ → ... → A_{k+1}.

Let further λ_k be the disintegration constant of the A_k atoms, g_k(t) the density function of η_k, and ξ_k(t) the number of A_k atoms which are present at the time t. It is assumed
IV, § 17] EXERCISES 237
that at the moment 0 there are only A₁ atoms present and their number is equal to N. Find the distribution function of η_k and of ξ_k(t) (k = 1, 2, ...).
Hint. Let P_k(t) be the probability that at the time t an atom is in the state A_k. These probabilities can be calculated in the following way: The probability that an atom A_k changes into an atom A_{k+1} during a time interval (t, t + Δt) is, by the definition of g_k(t), equal to g_k(t)Δt + o(Δt). On the other hand, the probability of this event can also be expressed as P_k(t)λ_kΔt + o(Δt); the possibility that during the time interval (t, t + Δt) an atom passes through several successive disintegrations can be neglected. Hence we have

P_k(t) = g_k(t)/λ_k.  (1)
Since the disintegrations of the individual atoms are independent, we obtain
Remark. The atoms A_{n+1} not being radioactive, M_{n+1}(t) is evidently an increasing function of time, hence m_{n+1} = +∞.
d) Show that t = 0 is a zero of order k − 1 of the function M_k(t).
27. Let ξ, η, ζ be the components of the velocity of a molecule of a gas in a container. Let the random variables ξ, η, ζ be independent and uniformly distributed on the interval (−A, +A). Calculate the density function f_A(x) of the energy of this molecule. Determine further the limit

lim_{A→+∞} A³ f_A(x) = w(x).

Hint. Let the mass of the molecule be denoted by m and its energy by E; then

E = (m/2)(ξ² + η² + ζ²),
hence

p(t) = w(t) e^{−βt} / ∫₀^{+∞} w(u) e^{−βu} du,

where c′ is a positive constant. Calculate under these conditions, for the limiting case c′ → +∞, the value of β, the function p(t), and the distribution of the velocity of the molecule.
Hint. With the above notations we have for c′ → +∞

E/N = 3/(2β).
E 3kT
It is known from statistical mechanics that ---- = ------ , where к is Boltzmann’s
N 2
constant and T the absolute temperature. So ß = ^ and
2 s /T exp - T r
p(0 = — —— —— •
V n (k T ) 2
Let the velocity of a molecule be denoted by v and its kinetic energy by E_kin = mv²/2; then the density function of v will be given by

f(v) = p(E_kin) · dE_kin/dv = 4π (m/(2πkT))^{3/2} v² exp(−mv²/(2kT)).

This derivation of the Maxwell distribution coincides essentially with the one usually given in textbooks of statistical mechanics. (We return to this question in Chapter V, § 3.)
29. a) Calculate from the Maxwell distribution the mean velocity of the molecules
of a gas having the absolute temperature T and consisting of molecules of mass m.
b) Show that the average kinetic energy at the absolute temperature T of the molecules of a gas is equal to (3/2)kT (k is Boltzmann's constant).
c) Compare the mean kinetic energy of a molecule with the kinetic energy of a
molecule moving with mean velocity. Which of the two is larger?
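For part c), a quick numerical check in reduced units kT = m = 1 (an illustration, not part of the text; it uses the fact that the Maxwell velocity components are independent normals):

```python
import numpy as np

rng = np.random.default_rng(1)
# Reduced units kT = m = 1; Maxwell velocity components are iid N(0, 1).
comps = rng.standard_normal((500_000, 3))
speed = np.linalg.norm(comps, axis=1)

mean_kinetic = 0.5 * (speed**2).mean()          # -> (3/2) kT = 1.5
kinetic_at_mean_speed = 0.5 * speed.mean()**2   # -> 4/pi ~ 1.27

print(mean_kinetic, kinetic_at_mean_speed)      # the mean kinetic energy is larger
```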
30. a) Consider a gas containing N molecules per cm³ and calculate the mean free path of a molecule.
Hint. The molecules are considered as spheres of radius r and are supposed to be distributed in space according to a Poisson distribution, i.e. the probability that a volume V contains no molecules is e^{−NV}. The probability that the volume ΔV contains just one molecule is NΔV + o(ΔV). The meaning of the statement that "a molecule covers a distance s without collision and then collides on a segment of length Δs with another molecule" is the following: a cylinder of radius 2r and height s does not contain the center of any of the molecules, and another cylinder of radius 2r and height Δs contains the center of at least one of the molecules. Thus the probability in question is
4πr²N e^{−4πr²Ns} Δs + o(Δs),

i.e. the distribution of the free path is an exponential distribution with density function

4πr²N e^{−4πr²Ns}.

Hence the length of the mean free path is 1/(4πr²N).
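As a numerical illustration of the result (the molecular data below are rough, assumed figures, not taken from the text):

```python
import math

N = 2.7e19   # molecules per cm^3 (order of magnitude for a gas at STP; assumed)
r = 1.5e-8   # effective molecular radius in cm (assumed)

rate = 4 * math.pi * r**2 * N      # collisions per unit path length
mean_free_path = 1 / rate          # = 1/(4*pi*r^2*N), in cm
print(mean_free_path)              # on the order of 1e-5 cm
```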
b) Calculate the mean time interval between two consecutive collisions of a molecule.
Hint. Let the length of the free path be denoted by s and the velocity of the molecule by v; then τ = s/v, where τ denotes the time interval studied. s and v can be assumed to be independent, thus E(τ) = E(s)E(1/v); the first of these two factors is known from Exercise 30 a), the second can be computed from the Maxwell distribution.
31. Calculate the standard deviation of the velocity and kinetic energy of a gas
molecule, if the absolute temperature of the gas is T and the mass of its molecules m.
32. Let the endpoint of a three-dimensional random vector ξ possess a uniform distribution on the surface of the unit sphere. Let θ be the angle between the vector ξ and the positive x-axis. Show that the density function of θ is given by (sin θ)/2 for 0 ≤ θ ≤ π.
D²(ζ) = Σ_{k=1}^{n} P_k D²(η_k) + D²(μ),

where μ is a random variable assuming the values M₁, M₂, ..., M_n with probabilities P₁, P₂, ..., P_n
f(x) = (1/(√(2π)σ)) exp(−(x − m)²/(2σ²)).

Deduce E(ξ) = m from the fact that the function y = f(x) satisfies the differential equation σ²y′ = −(x − m)y.
b) Let the density function of the random variable ξ be given by

f(x) = (λ^{m−1}/(m − 2)!) x^{m−2} e^{−λx}  (x > 0),

where m ≥ 3 is a positive integer and λ > 0. Calculate E(ξ) from the fact that the function y = f(x) satisfies the differential equation

y′ = ((m − 2)/x − λ) y.
c) Apply the same method in general to Pearson’s distributions (cf. Exercise 9).
39. Suppose that there are 9 barbers working in a hairdressing salon. One shaving takes 10 minutes. Someone coming in sees that all barbers are working and 3 customers are waiting for service. What waiting time can he expect till he is served?

Hint. Assume that the moments of finishing of the individual shavings are independent and uniformly distributed on the time interval (0, 10).
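The hint can be checked by simulation: the newcomer is fourth in the queue, so he is served at the fourth-earliest of the 9 independent finishing moments, i.e. at the 4th order statistic of 9 uniform variables on (0, 10), whose expectation is 10·4/(9 + 1) = 4 minutes. A sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 100_000

# 9 busy barbers finish at independent Uniform(0, 10) moments.
finish = np.sort(rng.uniform(0, 10, size=(trials, 9)), axis=1)

# With 3 customers ahead of him, the newcomer is served at the 4th finish.
wait = finish[:, 3]
print(wait.mean())   # close to 10 * 4/(9 + 1) = 4 minutes
```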
40. Let ξ₁, ξ₂, ..., ξ_n be independent positive random variables having the same distribution. Prove that

E(ξ₁/(ξ₁ + ξ₂ + ... + ξ_n)) = 1/n.
41. Prove that if the standard deviation of the random variable ξ with the distribution function F(x) exists, then

E(ξ) = ∫₀^{+∞} (1 − F(x) − F(−x)) dx

and

E(ξ²) = 2 ∫₀^{+∞} x (1 − F(x) + F(−x)) dx.
42. Calculate the dispersion matrix of a nondegenerate n-dimensional normal distribution.

Hint. Let the n-dimensional density function of the random variables η₁, ..., η_n be

f(x₁, ..., x_n) = (√|B|/(2π)^{n/2}) exp(−(1/2) Σ_{j=1}^{n} Σ_{l=1}^{n} b_{jl} x_j x_l),  (1)

where |B| is the determinant of the matrix B = (b_{jl}). There can be given independent normally distributed random variables ξ_k such that E(ξ_k) = 0 and

η_j = Σ_{k=1}^{n} c_{jk} ξ_k  (j = 1, 2, ..., n),  (2)

where C = (c_{jk}) is an orthogonal matrix. Let σ_k = D(ξ_k) and let S be the diagonal matrix having for its elements the numbers 1/σ_k². Then B, S, and C are connected by the relation B = CSC*, where C* is the transpose of the matrix C. If we put D_{jl} = E(η_j η_l), then we have by (2)

D_{jl} = Σ_{k=1}^{n} c_{jk} c_{lk} σ_k².  (3)

Hence the matrix D = (D_{jl}) can be written in the form D = CS⁻¹C*, where S⁻¹ denotes the inverse of the matrix S, and thus BD = CSC*CS⁻¹C* = E, where E is the unit matrix of order n. Thus the dispersion matrix D of the normal distribution is the inverse of the matrix B of the quadratic form figuring in the exponent of (1).
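The hint can be replayed numerically: diagonalize a positive definite B as CSC*, build η = Cξ from independent ξ_k with D²(ξ_k) = 1/s_k, and check that the empirical dispersion matrix is B⁻¹ (a sketch with arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(3)

# An arbitrary symmetric positive definite matrix B for the quadratic form.
A = rng.standard_normal((3, 3))
B = A @ A.T + 3 * np.eye(3)

# B = C S C* with C orthogonal and S = diag(s_k); take xi_k ~ N(0, 1/s_k).
s, C = np.linalg.eigh(B)
xi = rng.standard_normal((400_000, 3)) / np.sqrt(s)
eta = xi @ C.T

empirical_D = eta.T @ eta / len(eta)          # dispersion matrix of eta
print(np.max(np.abs(empirical_D - np.linalg.inv(B))))   # close to 0
```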
43. a) Using the result of the preceding exercise, find a new proof for Theorem 2 of § 16.

b) Let ξ₁, ..., ξ_n be independent normally distributed random variables with E(ξ_k) = 0, D(ξ_k) = σ; show that if the matrix C = (c_{jk}) is orthogonal, then the random variables

η_j = Σ_{k=1}^{n} c_{jk} ξ_k

are independent.

c) Determine the ellipsoid of concentration of the n-dimensional normal distribution and prove Formula (12) of § 16.

d) What is the geometric meaning of Exercise b)?

Hint. The components ξ₁, ξ₂, ..., ξ_n of an n-dimensional normally distributed random vector are independent iff the axes of the ellipsoid of concentration are parallel to the coordinate axes. If the random variables ξ₁, ..., ξ_n have the same normal distribution, then the ellipsoid of concentration is an n-dimensional sphere; thus the condition required is fulfilled for every choice of the coordinate system.
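Part b) can be illustrated numerically: rotating an iid normal vector by an orthogonal matrix leaves the covariance σ²E, and for jointly normal variables vanishing covariances mean independence (a sketch; the matrix and σ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# A random orthogonal matrix C via the QR decomposition.
C, _ = np.linalg.qr(rng.standard_normal((4, 4)))

sigma = 2.0
xi = sigma * rng.standard_normal((300_000, 4))   # independent N(0, sigma^2)
eta = xi @ C.T                                   # eta_j = sum_k c_jk xi_k

cov = eta.T @ eta / len(eta)
print(np.max(np.abs(cov - sigma**2 * np.eye(4))))   # close to 0
```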
44. a) When considering errors of measurements the following rule is often used: If the random variables ξ₁, ..., ξ_n are independent, if the first partial derivatives of the function g(x₁, ..., x_n) are continuous and if η = g(ξ₁, ..., ξ_n), then approximately

D²(η) ≈ Σ_{k=1}^{n} (∂g/∂x_k)² D²(ξ_k),

where the partial derivatives are taken at the point (E(ξ₁), ..., E(ξ_n)).
register any particle arriving before the end of this time interval. The number of particles counted is thus smaller than the number of the particles actually arriving. The average number of particles registered during unit time is called the "virtual density of events" and is denoted by P; the average number of the particles actually arriving during unit time is called the "actual density of events" and is denoted by p. (Every arriving particle renders the apparatus insensitive for a time interval h, regardless of whether the particle was registered by the apparatus.) As to the arrival of the particles, the usual assumption made in the study of radioactive radiations is introduced, namely that the probability of the arrival of n particles during a time interval t is given by

(pt)^n e^{−pt}/n!  (n = 0, 1, ...).
a) Determine the virtual density of events P.
b) Determine that value of the actual density of events which makes the virtual
density maximal.
Hint. The probability that a particle arrives during a time interval Δt and is registered is equal to the probability that the particle arrived during the time interval considered and no other particle arrived during the preceding time interval of length h. This probability is approximately p e^{−ph} Δt, hence P = p e^{−ph}. If the passive period h is known and P was experimentally determined, the above transcendental equation is obtained for p. By differentiating we find that P has its maximal value if p = 1/h; then P = 1/(eh).
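The relation P = p e^{−ph} is easy to reproduce by simulating such a paralyzable counter (a sketch; the rate, dead time and time horizon are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
p, h, T = 3.0, 0.4, 20_000.0      # actual density, passive period, total time

# Poisson stream of arrivals on (0, T).
n = rng.poisson(p * T)
times = np.sort(rng.uniform(0.0, T, n))

# A particle is registered iff no other particle arrived during the
# preceding interval of length h; the first arrival is always registered.
registered = 1 + np.count_nonzero(np.diff(times) >= h)

P_empirical = registered / T
print(P_empirical, p * np.exp(-p * h))    # both close to 0.90
```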
c) Calculate the distribution, expectation and standard deviation of the time interval between two consecutive registered particle arrivals.

Hint. Suppose that an arrival was registered at the time t = 0 and let W(t) be the probability that the next registered arrival takes place before the time t. It is easy to see that W(t) satisfies the following (retarded) difference-differential equation

W′(t) = p(1 − W(t − h))  (t > h)  (1)

and fulfils the initial condition W(t) = 0 for 0 < t ≤ h. The solution of (1) is given by

W(t) = Σ_{k=1}^{n} (−1)^{k−1} p^k (t − kh)^k / k!  for nh ≤ t < (n + 1)h  (n = 1, 2, ...).  (2)
Integrate (1) from h to +∞; this yields for the expectation of the time between consecutive registered arrivals the value e^{ph}/p, in accordance with P = p e^{−ph}. It turns out that the fact that the apparatus has a passive period diminishes the relative standard deviation of the distribution.
d) If the radiation has too high an intensity, a "scaler" is commonly used in order to make the observations easier. This apparatus registers only every k-th particle. (In practice k is a power of 2.) Calculate the virtual density of events for this case too.

Hint. First calculate the probability that during the interval (t, t + Δt) there arrives a "k-th particle", i.e. a particle having in the list of arriving particles a serial number which is divisible by k. Clearly, the probability of this event is

(Σ_{n=1}^{∞} p^{nk} t^{nk−1} e^{−pt}/(nk − 1)!) Δt + o(Δt).

As the factor of Δt depends also on t, the process is not stationary. But this dependence on t is very weak when t is large; in fact, it can be shown that

lim_{t→+∞} Σ_{n=1}^{∞} p^{nk} t^{nk−1} e^{−pt}/(nk − 1)! = p/k.
E(H(ξ)) = ∫_{−∞}^{+∞} H(x) dF(x)

holds without restriction for every Borel-measurable function H(x) such that the expectation E(H(ξ)) exists.

Hint. The value of E(H(ξ)) depends only on the distribution of H(ξ), hence on the distribution of ξ, since for every Borel set B, P(H(ξ) ∈ B) = P(ξ ∈ H⁻¹(B)), where H⁻¹(B) denotes the set of the real numbers x for which H(x) ∈ B. Hence E(H(ξ)) does not depend on the probability space [Ω, 𝒜, P] on which the random variable ξ is defined; thus let Ω be the real axis, 𝒜 the set of all Borel subsets of Ω and P the Lebesgue–Stieltjes measure defined on Ω by P(I_{ab}) = F(b) − F(a), where I_{ab} is an arbitrary interval a ≤ x < b. Under these conditions ξ(x) = x (−∞ < x < +∞) has the distribution function F(x), hence
Let 𝒮 = [Ω, 𝒜, ℬ, P(A | B)] be a conditional probability space (cf. Ch. II, § 11).

P(ξ⁻¹(I_{cd}) | ξ⁻¹(I_{ab})) > 0.
Let J be an open or half-open interval with endpoints α and β (α may be equal to −∞ and β to +∞). Let c₀ be any point in the interior of J, i.e. α < c₀ < β. Take a sequence of intervals I_{a_n b_n} (n = 1, 2, ...) with

a_{n+1} ≤ a_n < c₀ < b_n ≤ b_{n+1},  lim_{n→+∞} a_n = α,  lim_{n→+∞} b_n = β,
and put

F_n(x) = (P(ξ⁻¹(I_{a_n x}) | ξ⁻¹(I_{a_n b_n})) − P(ξ⁻¹(I_{a_n c}) | ξ⁻¹(I_{a_n b_n}))) / P(ξ⁻¹(I_{cd}) | ξ⁻¹(I_{a_n b_n}))  for a_n ≤ x < b_n

(c < d and F_n(d) − F_n(c) > 0 follow from our assumptions). Furthermore, for a_n ≤ x < b_n and for N ≥ n

F_N(x) = F_n(x).

Therefore the value of F_n(x) does not depend on n and we can omit the index n by writing simply F(x) = F_n(x).
The function F(x) is defined everywhere on the interval (α, β); it is nondecreasing and left-continuous; for I_{cd} ⊆ J we have F(d) − F(c) > 0 and for c ≤ a < b ≤ d the relation

P(ξ⁻¹(I_{ab}) | ξ⁻¹(I_{cd})) = (F(b) − F(a)) / (F(d) − F(c))

holds.
in J; let α and β denote the endpoints of J. Then there exists a nondecreasing, left-continuous function F(x) defined on (α, β), such that for I_{cd} ⊆ J we have F(d) − F(c) > 0 and for I_{ab} ⊆ I_{cd} the relation

P(ξ⁻¹(I_{ab}) | ξ⁻¹(I_{cd})) = (F(b) − F(a)) / (F(d) − F(c))  (3)

holds.
A function F(x) having the above properties will be called a distribution function of ξ on (α, β). Under the assumptions of Theorem 1, the random variable ξ thus possesses a distribution function on (α, β). The distribution function F(x) of ξ is evidently not uniquely determined, since for λ > 0 and for arbitrary μ the function G(x) = λF(x) + μ is also a distribution function of ξ on (α, β). Conversely, if F(x) and G(x) are distribution functions of ξ on (α, β) and if the conditions of Theorem 1 are fulfilled, then for any two subintervals I_{cd} and I_{γδ} of (α, β) with I_{cd} ⊆ J and I_{γδ} ⊆ J there can be found an interval I_{ef} ⊆ J such that I_{cd} ⊆ I_{ef} and I_{γδ} ⊆ I_{ef}. Thus we have

(F(d) − F(c))/(G(d) − G(c)) = (F(f) − F(e))/(G(f) − G(e)) = (F(δ) − F(γ))/(G(δ) − G(γ)).
Hence

G(x) = λF(x) + μ  for α < x < β  (4)

with some constants λ > 0 and μ.
f(x) = F′(x).  (8)
248 M O R E A B O U T R A N D O M V A R IA B L E S [V, § 1
F(x) = λ ∫₀^x g(t) dt + μ  for x ≥ 0,  (10)
F(x) = −λ ∫_x^0 g(t) dt + μ  for x < 0
F(h⁻¹(y)) = G(y),

hence

g(y) = f(h⁻¹(y)) (h⁻¹(y))′.
then η = cξ (c > 0) has the same density function.

E(ξ | a < ξ < b) = ∫_a^b x f(x) dx / ∫_a^b f(x) dx.  (11)
(Clearly, the value of E(ξ | a < ξ < b) does not depend on the choice of F(x) or f(x).)
E(ξ | a < ξ < b) = (a + b)/2

for all a < b.
Example 7. If ξ is logarithmically uniformly distributed on the positive semi-axis, then

E(ξ | a < ξ < b) = (b − a)/ln(b/a)  for 0 < a < b.
Example 8. If ξ is uniformly distributed on the whole real axis, |ξ| is uniformly distributed on the positive semi-axis. The distribution function of ξ² is thus √x and its density function is 1/(2√x) for x > 0. Hence for 0 < a < b

E(ξ² | a < |ξ| < b) = (∫_{a²}^{b²} √x dx) / (2(b − a)) = (a² + ab + b²)/3,

in accordance with the fact that under the condition a < ξ < b, ξ is uniformly distributed on the interval (a, b) and the standard deviation of such a distribution is (b − a)/(2√3). (Cf. Ch. IV, § 14.)
Distribution functions and density functions of an r-dimensional random vector on a conditional probability space can be defined in a similar way. Let I be an "interval" of the r-dimensional space, i.e. the set of the points x = (x₁, ..., x_r) whose coordinates satisfy a_k ≤ x_k < b_k (k = 1, 2, ..., r), and let F(x₁, ..., x_r) be a function of r variables. As in Chapter IV, § 3, we introduce the notation

Δ_I F = Δ_{h₁}^{(1)} Δ_{h₂}^{(2)} ... Δ_{h_r}^{(r)} F(a₁, ..., a_r),

where h_k = b_k − a_k (k = 1, 2, ..., r). We have the following theorem:
G(x₁, ..., x_r) = λF(x₁, ..., x_r) + μ  (14)

f = Π_{i=1}^{r} f_i(x_i),  (17)
where the nonnegative function f(x) is equal to F′(x). Conversely, from (17) follows (16) and thus the independence of the random variables ξ₁, ..., ξ_r.

P(A | B) = (∫_A g(x) dx) / (∫_B g(x) dx).
Put ζ(ω) = ω. Then 𝒮 = [Ω, ℬ, P] is a conditional probability space and ζ is a random vector on 𝒮. If I_x denotes the interval

min(0, x_i) ≤ t_i < max(0, x_i)  (i = 1, 2, ..., r),

then the distribution function of ζ is given by

F(x₁, ..., x_r) = (−1)^k ∫_{I_x} g(x) dx,

where k is the number of the values of i for which x_i < 0 and g(x) is the density function of ζ.

In the case g(x) ≡ 1, ζ is uniformly distributed on the whole space E_r. In this particular case we can put

F(x₁, ..., x_r) = x₁ x₂ ... x_r.
P(ζ⁻¹(B) | ζ⁻¹(C)) = Δ_B F / Δ_C F.
From Theorem 5 we can easily deduce
H(x) = ∫₀^{+∞} (F(x − y) − F(0)) dG(y),

P(a ≤ ζ < b | c ≤ ζ < d) = (H(b) − H(a)) / (H(d) − H(c));
h(x) = ∫₀^{+∞} f(x − y) dG(y)

is a density function of ζ = ξ + η. Finally, if G(y) is absolutely continuous and g(y) = G′(y), then

h(x) = ∫₀^{+∞} f(x − y) g(y) dy.
ζ_n = ξ₁ + ξ₂ + ... + ξ_n.

We obtain by induction

h_n(x) = x^{(n/2)−1}  for x > 0.
(η = P(B | A_k) for every ω ∈ Ω such that ξ(ω) = x_k). Instead of (1), the notation η = P_ξ(B) will also be used.

Let U denote any Borel set of real numbers and ξ⁻¹(U) the set of all ω ∈ Ω such that ξ(ω) ∈ U. Let further 𝒜_ξ be the family of the sets ξ⁻¹(U). The family 𝒜_ξ is a σ-algebra, since

ξ⁻¹(∪_k U_k) = ∪_k ξ⁻¹(U_k)  and  ξ⁻¹(U̅) = Ω − ξ⁻¹(U).

P_ξ(Ω) = 1.  (8)
One can prove in a similar manner that, with probability 1,

P_ξ(B₁) ≤ P_ξ(B₂)  for B₁ ⊆ B₂.
ξ and define the random variables P_ξ(B_k) (k = 1, 2, ...) and P_ξ(B) as above. Then, for A ∈ 𝒜_ξ,

P(AB_k) = ∫_A P_ξ(B_k) dP  (10)

and

P(AB) = ∫_A P_ξ(B) dP.

But from (10) and from

Σ_{k=1}^{∞} P(AB_k) = P(AB)

it follows that

P(AB) = ∫_A (Σ_{k=1}^{∞} P_ξ(B_k)) dP,

hence Σ_{k=1}^{∞} P_ξ(B_k) fulfils relation (6) which defines P_ξ(B). Thus with probability 1

P_ξ(B) = Σ_{k=1}^{∞} P_ξ(B_k).  (11)
The elements ω for which the relation (11) does not hold form thus a set C of measure zero, i.e. P(C) = 0. Since P_ξ(B) is determined only almost everywhere, one cannot expect to prove more than this. The exceptional set C depends on the sets B_k, and the union of the exceptional sets corresponding to the individual sequences {B_k} is not necessarily a set of measure zero, since the set of all sequences {B_k} is nondenumerable if 𝒜 has infinitely many elements. Thus we cannot state that for a fixed ξ, P_ξ(B) as a function of B is a measure; in general this is not true.
In practice, however, this fact causes scarcely any difficulty at all. In most cases, the conditional probability P_ξ(B) = P(B | ξ = x) is studied simultaneously for nondenumerably many B only when the conditional distribution of a random variable η is to be determined with respect to the condition ξ = x, i.e. if the probabilities

P(η < y | ξ = x)

are to be considered for every real value of y. If these conditional probabilities can be defined in such a manner that P(η < y | ξ = x) is a distribution function with probability 1, then this function is said to be the conditional distribution function of η with respect to the condition ξ = x and is denoted by F(y | x):

F(y | x) = ∫_{−∞}^{y} f(t | x) dt.
In fact, this relation is valid for every rational y and hence for every real y as well. Herewith our statement is proved.

Thus we have defined the conditional probabilities P(B | A) even for P(A) = 0; but let it be emphasized that in the latter case the conditional probability P(B | A) is only defined if A can be considered as a level set of a random variable ξ, i.e. if there exists an x such that A is the set of the elements ω of Ω for which ξ(ω) = x. Then P(B | A) is defined by P(B | A) = P(B | ξ = x). However, a set of probability zero can be obtained as a level set of different random variables; thus e.g. A may be defined by any one of the conditions ξ₁ = x₁ and ξ₂ = x₂. Thus it is possible that
P(B)(F_B(b) − F_B(a)) = ∫_a^b P(B | ξ = x) dF(x),  (12)

where F_B(x) denotes the distribution function of ξ with respect to the condition B and where we have chosen for U the interval [a, b]. It follows by a well-known theorem of Lebesgue that (if F(x) is the distribution function of ξ)
f(x) = ∫_{−∞}^{+∞} h(x, y) dy

be the density function of ξ. Let ξ⁻¹(U) and η⁻¹(V) denote the events ξ ∈ U and η ∈ V respectively, where U and V are Borel sets on the real axis. Assume that the function f(x) is positive for x ∈ U. Then the conditional density function g(y | x) of η with respect to the condition ξ = x is given, for the x values which fulfil f(x) > 0, by

g(y | x) = h(x, y)/f(x).

P(η ∈ V | ξ = x) = P(η ∈ V).  (17)

Consequently, if the random variables ξ and η are independent, then the conditional distribution function of η with respect to the condition ξ = x is identical with the ordinary (unconditional) distribution function of η. Conversely, if (17) is valid for every Borel set V and for every x ∈ U with P(ξ ∈ U) = 1, then ξ and η are independent.
3. Let (ξ, η) be a normally distributed random vector with the density function

h(x, y) = (1/(2π)) exp(−(x² + y²)/2).

Let ρ and θ (0 ≤ θ < 2π) be the polar coordinates of the point (ξ, η). Find the conditional distribution of θ with respect to the condition ρ = r > 0. We have
χ_B(ω) = 1 for ω ∈ B,
χ_B(ω) = 0 otherwise.

Since χ_B is measurable with respect to 𝒜_ξ, we have, with probability 1,

P_ξ(B) = χ_B(ω).
6. (Particular case of 5.) Let Ω be the interval [0, 1], let ℬ be the set of Borel subsets of Ω and P the Lebesgue measure. Put

ξ(ω) = ω  (0 ≤ ω ≤ 1).

P_ξ(B) = 1 for ω ∈ B,
P_ξ(B) = 0 otherwise.
7. Let Ω be the unit square of the plane (x, y), ℬ the class of the Borel subsets of Ω and P the two-dimensional Lebesgue measure. Put ξ(x, y) = x. Since, for every B ∈ ℬ and for any Borel set U of the real axis (according to the theorem of Fubini),

P(Bξ⁻¹(U)) = ∫_U (∫_{B_x} dy) dx,

we find

P(B | ξ = x₀) = ∫_{B_{x₀}} dy = μ(B_{x₀}),

where B_{x₀} represents the intersection of B by the line x = x₀ and μ the one-dimensional Lebesgue measure. In this case P(B | ξ = x₀) is thus, as a function of B, a measure on the σ-algebra ℬ, for every x₀.
Let 𝒮 = [Ω, 𝒜, ℬ, P(A | B)] be a conditional probability space and ξ a random variable on 𝒮. Let B ∈ 𝒜 and C ∈ ℬ be given sets with P(B | C) > 0 and let 𝒜_ξ be the least σ-algebra with respect to which ξ is measurable. Consider the measures μ_C(A) = P(A | C) and ν_C(A) = P(AB | C) on 𝒜_ξ. ν_C(A) is absolutely continuous with respect to μ_C(A); there exists thus, by the Radon–Nikodym theorem, a function P_ξ(B | C), measurable with respect to 𝒜_ξ, such that

P(AB | C) = ∫_A P_ξ(B | C) dμ_C  for A ∈ 𝒜_ξ.  (1)

Let us point out the following circumstance. If A(x) is the set of all ω ∈ Ω for which ξ(ω) = x, it may happen that the sets CA(x) belong to the family ℬ for some values of x, or even for every one of its values, and thus P(B | CA(x)) is defined. But a priori it is not at all certain that P(B | CA(x)) coincides with P_ξ(B | C), i.e. that

P(AB | C) = ∫_A P(B | CA(x)) dμ_C  for A ∈ 𝒜_ξ.

This regularity property does not follow from the axioms and, if necessary, it must be postulated as an additional axiom.
Consider now the following important particular case: Let Ω be an arbitrary set and 𝒜 a σ-algebra of subsets of Ω. Let further μ be a σ-finite measure on 𝒜 and let ℬ be the family of sets

𝒮 = [Ω, 𝒜, ℬ, P(A | B)]

P(AB | C) = ∫_A P_ξ(B) dμ_C  (4)

for A ∈ 𝒜_ξ and B ⊆ C ∈ ℬ.
P(ξ⁻¹(U) | ξ⁻¹(V)) = μ(ξ⁻¹(U ∩ V)) / μ(ξ⁻¹(V)),  (8)

if ξ⁻¹(V) ∈ ℬ.
In this case the conditional density function of η with respect to the condition ξ = x is equal, for f(x) > 0, to g(y | x) = h(x, y)/f(x), and

P(η⁻¹(V) | ξ⁻¹(U)) = (∫_U ∫_V h(x, y) dy dx) / ∫_U f(x) dx = (∫_U (∫_V g(y | x) dy) f(x) dx) / ∫_U f(x) dx.  (10)
h(x, y) = 1 for |x² − y²| < 1,
h(x, y) = 0 otherwise.

The density function f(x) of ξ is

f(x) = ∫_{−∞}^{+∞} h(x, y) dy,

hence

f(x) = 2(√(x² + 1) − √(x² − 1)) for |x| > 1,
f(x) = 2√(x² + 1) otherwise.

Similarly

g(y) = 2(√(y² + 1) − √(y² − 1)) for |y| > 1,
g(y) = 2√(y² + 1) otherwise.

It follows that

f(x | y) = 1/(2√(y² + 1)) for |y| ≤ 1, |x| < √(y² + 1),
f(x | y) = 0 otherwise.
is thus

h(x, y) = f(x)g(y),

where g(y) is an ordinary density function. Hence the conditional density function g(y | x) of η with respect to the condition ξ = x is

g(y | x) = g(y).

The conditional density function g(y | x) does not depend on the value of x.
3. Let (ξ₁, ξ₂, ..., ξ_n) be a random vector uniformly distributed in the whole n-dimensional space and let η_n = ξ₁² + ξ₂² + ... + ξ_n². Determine the conditional density function of ξ_k with respect to the condition η_n = y (y > 0). We know already that the density function of η_n is y^{(n/2)−1} for y > 0 (§ 1, Example 10). It follows that the two-dimensional density function of ξ_k and η_n is

f_n(x, y) = (y − x²)^{(n−3)/2} for x² < y,  (12)

and 0 otherwise. For the conditional density function of ξ_k with respect to the condition η_n = y we find thus

∫_{−√y}^{+√y} f_n(x | y) dx = 1.  (13)

From (12) and (13) it follows

c_n = Γ(n/2) / (Γ(1/2) Γ((n−1)/2)),  (14)

f_n(x | y) = (Γ(n/2) / (√(πy) Γ((n−1)/2))) (1 − x²/y)^{(n−3)/2}  for −√y < x < +√y.  (15)
hence every ξ_k (k fixed) has in the limit a normal (conditional) distribution, if the condition imposed is η_n = nσ² and n tends to infinity.
4. We deduce now the Maxwell distribution from the preceding example. Let ξ_k, η_k, ζ_k (k = 1, 2, ..., n) be the components of the velocities of n atoms of a certain amount of gas. We assume that the (a priori) distribution of the point (ξ₁, η₁, ζ₁, ..., ξ_n, η_n, ζ_n) is uniform on the whole 3n-dimensional phase space. Consider the conditional distribution of the velocity components with respect to the condition that the total kinetic energy of the gas be constant. This kinetic energy is given by

E = (m/2) Σ_{k=1}^{n} (ξ_k² + η_k² + ζ_k²),

where m represents the mass of a particle of the gas. The conditional density function of the distribution studied is, by the above example,

h_n(x | E) = (Γ(3n/2) / (√(2πE/m) Γ((3n−1)/2))) (1 − mx²/(2E))^{(3n−3)/2}  for |x| < √(2E/m).  (17)
By taking into account that E = (3/2)kTn (k Boltzmann's constant, T the absolute temperature of the gas) we find for the conditional density function h_n(x | T) of the velocity components ξ_k, η_k, ζ_k at constant temperature T

h_n(x | T) = (Γ(3n/2) / (√(3πkTn/m) Γ((3n−1)/2))) (1 − mx²/(3kTn))^{(3n−3)/2},  (18)

hence

lim_{n→+∞} h_n(x | T) = √(m/(2πkT)) e^{−mx²/(2kT)}.  (19)
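The convergence stated in (19) can be checked numerically in reduced units m = kT = 1, where the conditional density of a component reads Γ(3n/2)(1 − x²/(3n))^{(3n−3)/2} / (√(3πn) Γ((3n−1)/2)) (a sketch of the computation, not part of the text):

```python
import math

def h_n(x, n):
    # Conditional density of a velocity component in reduced units m = kT = 1.
    lg = math.lgamma(1.5 * n) - math.lgamma(1.5 * n - 0.5)
    return math.exp(lg) / math.sqrt(3 * math.pi * n) \
        * (1 - x**2 / (3 * n)) ** ((3 * n - 3) / 2)

def maxwell_limit(x):
    # The limiting normal density (1/sqrt(2 pi)) exp(-x^2/2).
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

for n in (10, 100, 1000):
    print(n, h_n(1.0, n))     # approaches maxwell_limit(1.0)
```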
η_{3n} = Σ_{k=1}^{n} (ξ_k² + η_k² + ζ_k²)

∫₀^{√y} V_n(v | y) dv = 1.  (22)
Hence

D_n = 4Γ(3n/2) / (√π Γ((3n−3)/2)).  (23)
If W_n(v | T) denotes the conditional density function of the velocity of the particles at a given absolute temperature T, we have

W_n(v | T) = V_n(v | 3nkT/m) = (4v²/√π) (m/(3nkT))^{3/2} (Γ(3n/2)/Γ((3n−3)/2)) (1 − mv²/(3nkT))^{(3n−5)/2}.  (24)

The distribution with the density function (24) is called the Maxwell distribution of order n. As we have already seen, it tends for n → ∞ to the ordinary Maxwell distribution, i.e.

lim_{n→∞} W_n(v | T) = √(2/π) (m/(kT))^{3/2} v² e^{−mv²/(2kT)}  (0 < v < +∞).  (25)
hence

∫_A E(η | ξ) dP = ∫_A η dP,  (1)

E(E(η | ξ)) = E(η).  (2)
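Relation (2) is easy to verify by simulation when ξ is discrete: averaging η within each level set of ξ and then weighting by P(ξ = k) recovers E(η) (a sketch with an arbitrary toy model):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

xi = rng.integers(0, 3, n)                 # discrete conditioning variable
eta = xi**2 + rng.standard_normal(n)       # eta depends on xi plus noise

cond_mean = np.array([eta[xi == k].mean() for k in range(3)])  # E(eta | xi = k)
weights = np.array([(xi == k).mean() for k in range(3)])       # P(xi = k)

print(cond_mean @ weights, eta.mean())     # both estimate E(eta) = 5/3
```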
In particular, if η = η_B, where η_B is the indicator of the set B, i.e.

η_B = 1 for ω ∈ B,
η_B = 0 otherwise,

then

ν(A) = ∫_A η_B dP = P(AB),

and

E(η_B | ξ) = P_ξ(B).

The conditional probability P_ξ(B) of B for a given value of ξ may thus also be considered as a conditional expectation.
Of course one may ask whether E(η | ξ) is with probability 1 equal to the expectation of the conditional distribution of η for a given value of ξ (i.e. to the expectation of the distribution P_ξ(η⁻¹(V))). The answer is affirmative, provided that P_ξ(η⁻¹(V)) is with probability 1 a probability distribution. This can always be achieved, as we have already seen. In this case

E(η | ξ) = ∫_Ω η dP_ξ  (3)
with probability 1. In order to prove this it suffices to show that for every A ∈ 𝒜_ξ the relation

∫_A (∫_Ω η dP_ξ) dP = ∫_A η dP  (4)

holds. Obviously, this relation is fulfilled for η = η_B, where η_B is the indicator of the set B; indeed in this case

∫_Ω η_B dP_ξ = P_ξ(B),  ∫_A η_B dP = P(AB),

and (4) will be reduced to the relation

∫_A P_ξ(B) dP = P(AB)

defining P_ξ(B). Hence (4) holds when η takes on a denumerable set of values. From this, because of the known properties of the Lebesgue integral, it can be shown that (4) is generally valid.
If ξ and η are independent, it follows from (3) that we have, with probability 1,

E(η | ξ) = E(η).  (5)

Furthermore, the following theorem can be stated for arbitrary random variables ξ and η: If f(x) is a Borel-measurable function such that E(f(ξ)η) exists, then we have, with probability 1,

E(f(ξ)η | ξ) = f(ξ)E(η | ξ).  (6)

To prove this it suffices to show that

E(f(ξ)η) = E(f(ξ)E(η | ξ)).  (8)

Thus if ξ and η are independent, E(η | ξ) = E(η) with probability 1 and from this follows the desired result.
Consider now another important property of the conditional expectation. Let ξ and η be two random variables and g(x) a Borel-measurable function.

Indeed, we have by (1) for every A ∈ 𝒜_ξ

∫_A E(c₁η₁ + c₂η₂ | ξ) dP = ∫_A (c₁η₁ + c₂η₂) dP.

Nevertheless we cannot state that E(η | ξ) is a linear functional with probability 1, since (11) holds with probability 1 only and the exceptional sets corresponding to every pair (η₁, η₂) may together even cover the whole space Ω.
and

g(y) = ∫_{−∞}^{+∞} g(y | x) f(x) dx,  (6)

hence by (5)

g(y | x) = f(x | y) g(y) / ∫_{−∞}^{+∞} f(x | t) g(t) dt.  (8)
Formula (8) may be considered as a generalization of Bayes' theorem to the case of absolutely continuous distributions. With this formula one can express the conditional density function of η for a given value of ξ by means of the conditional density function of ξ for a given value of η and the unconditional density function of η. It follows from (8) that

P(a < η < b | ξ = x) = (∫_a^b f(x | y) g(y) dy) / (∫_{−∞}^{+∞} f(x | t) g(t) dt)  (9)

or

P(a < η < b | ξ = x) = (∫_a^b f(x | y) dG(y)) / (∫_{−∞}^{+∞} f(x | t) dG(t)),  (10)

where G(y) is the ordinary distribution function of η.
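Formula (9) can be exercised numerically. Below, η has the standard normal density g and f(x | y) is the N(y, 1) density; the posterior computed by quadrature should match the known closed form N(x/2, 1/2) (a sketch; the grid and the observed x are arbitrary choices):

```python
import numpy as np

y = np.linspace(-8.0, 8.0, 4001)
dy = y[1] - y[0]

g = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)           # prior density of eta
x = 1.4                                              # observed value of xi
f_xy = np.exp(-(x - y)**2 / 2) / np.sqrt(2 * np.pi)  # f(x | y)

post = f_xy * g
post /= post.sum() * dy                              # normalization as in (9)

mean = (y * post).sum() * dy
var = ((y - mean)**2 * post).sum() * dy
print(mean, var)    # close to x/2 = 0.7 and 1/2
```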
Let it be mentioned that h(x, y), f(x), g(y) are only defined up to a constant factor. If f(x | y) and g(y | x) are computed by (3) and (4) or (8), this factor disappears. The density functions f(x | y) and g(y | x) so obtained are already normed so that their integral from −∞ to +∞ has the value 1.
Remarks.

1. It was implicitly shown in proving (3) that the random variables η − E(η | ξ) and E(η | ξ) are uncorrelated.

2. The assertion of Theorem 1 may be written in the form

Then by Theorem 1

0 ≤ K(ξ, η) ≤ 1.  (7)

E(ξη) = E(ξ E(η | ξ)) = E(ξ)E(η),

thus R(ξ, η) = 0. The following example shows that K(ξ, η) = 0 does not imply the independence of ξ and η: Let the point (ξ, η) be uniformly
E([η − E(η | ξ)]²) = 0,

hence, with probability 1, η = E(η | ξ). For every Borel-measurable g we have

E([η − g(ξ)]²) ≥ E([η − E(η | ξ)]²),  (11)

and the sign of equality holds in (11) iff

g(ξ) = E(η | ξ)

is valid with probability 1.
In particular, it follows from Theorem 4 that for any two real numbers a and b

E([η − (aξ + b)]²) ≥ E([η − E(η | ξ)]²).  (12)

a = R(ξ, η) D(η)/D(ξ),  b = E(η) − aE(ξ).  (13)

The line y = ax + b with the coefficients (13) is the regression line of η on ξ; for it

E([η − (aξ + b)]²) = D²(η)(1 − R²(ξ, η)).  (15)
K²(ξ, η) ≥ R²(ξ, η),  (17)

where y = g(x) runs through the set of all Borel-measurable functions for which the expectation and variance of g(ξ) exist. The relation

φ(ξ, η) = (Σ_k Σ_j (P(A_k B_j) − P(A_k)P(B_j))² / (P(A_k)P(B_j)))^{1/2}
1 For the general case see A. Rényi [28].
(2)

It is clear that φ(ξ, η) is zero iff ξ and η are independent. If the number of the values x_k is n and that of the y_j's is m with m ≥ n, then

φ²(ξ, η) ≤ n − 1.  (3)

It can be seen from (4) that in (3) the sign of equality holds iff for every k and for every j either P(A_k B_j) = P(B_j) or P(A_k B_j) = 0. Since, however,
Let A and B be Borel sets on the x-axis and the y-axis respectively; put

P₁(A) = P(ξ⁻¹(A)) (6a)

and

P₂(B) = P(η⁻¹(B)). (6b)

Let A × B denote the set of the points of the (x, y)-plane for which x ∈ A and y ∈ B. Define the measure Q(C) for the two-dimensional Borel sets of the form C = A × B by

P(C) = ∫_C k(x, y) dQ (8)

holds. If F(x) and G(y) are the distribution functions of ξ and η, respectively, and if A and B are any two Borel subsets of the real axis, then the function k(x, y) satisfies the relation

P(ξ ∈ A, η ∈ B) = ∫∫_{x∈A, y∈B} k(x, y) dF(x) dG(y). (9)
In particular, if ξ and η are discrete random variables,

k(x, y) = P(A_k B_j)/(P(A_k) P(B_j)) for x = x_k, y = y_j; (10)
k(x, y) = 0 otherwise.

If the joint distribution of ξ and η is absolutely continuous with the density function h(x, y) and if f(x) and g(y) are the density functions of ξ and η respectively, then we evidently have

k(x, y) = h(x, y)/(f(x) g(y)). (11)
We can now define the contingency for arbitrary regularly dependent random variables ξ and η by

φ(ξ, η) = ( ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} [k(x, y) − 1]² dF(x) dG(y) )^{1/2} (12)

or equivalently by

φ²(ξ, η) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} k²(x, y) dF(x) dG(y) − 1. (13a)
But by assumption

R(u(ξ), v(η)) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} u(x) v(y) [k(x, y) − 1] dF(x) dG(y). (17)

By applying the Schwarz inequality and (12), we obtain

R²(u(ξ), v(η)) ≤ φ²(ξ, η),

which proves the theorem.
The quantity

ψ(ξ, η) = sup_{u,v} |R(u(ξ), v(η))|, (18)

where u(x) and v(y) run through all Borel-measurable functions for which the expectations and variances of u(ξ) and v(η) exist, can also be considered
a) ψ(ξ, η) = ψ(η, ξ);

b) 0 ≤ ψ(ξ, η) ≤ 1;

c) if y = a(x) and y = b(x) are strictly monotonic functions, then ψ(a(ξ), b(η)) = ψ(ξ, η);

d) ψ(ξ, η) = 0 iff ξ and η are independent;

e) if there exists between ξ and η a relation of the form U(ξ) = V(η), where U(x) and V(y) are Borel-measurable functions with D(U(ξ)) > 0, then ψ(ξ, η) = 1;

f) we have

|R(ξ, η)| ≤ min(K_ξ(η), K_η(ξ)) ≤ max(K_ξ(η), K_η(ξ)) ≤ ψ(ξ, η) ≤ φ(ξ, η).
Proof. Properties a), b), and c) are direct consequences of the definition. If ξ and η are independent, clearly ψ(ξ, η) = 0. Conversely, if
the smallest positive real number A satisfying for all sequences {x_n} with Σ_n x_n² < +∞ the inequality

|Σ_n Σ_m E(ξ_n ξ_m) x_n x_m| ≤ A Σ_n x_n², (19)

i.e. the least upper bound of the quadratic form

Σ_n Σ_m E(ξ_n ξ_m) x_n x_m,

K_ξ(η) ≤ A. (20)
For the proof we need a lemma which is a generalization of the Bessel inequality, well known from the theory of orthogonal series, to the case of quasiorthogonal functions.

|Σ_n Σ_m E(ξ_n ξ_m) x_n x_m| ≤ B Σ_n x_n²; (21)

then for every random variable η for which E(η²) exists we have

Σ_n E²(η ξ_n) ≤ B E(η²). (22)
a_n = E(η ξ_n). (24)

Obviously,

ζ_n = f_n(ξ_n). (29)

Then by definition of the maximal correlation

Thus the lemma can be applied to the sequence {ξ_n} with B = d, provided

K_{ξ_n}(η) = D_{ξ_n}(η)/D(η), (34)

where D_{ξ_n}(η) = D(E(η | ξ_n)) is the standard deviation of E(η | ξ_n). Then according to Theorem 5 of § 6

R(f_n(ξ_n), η) = K_{ξ_n}(η). (35)

Hence by (32) and (33)
¹ With the help of this generalization Rényi succeeded in proving that every positive integer n can be written in the form n = p + P, where p is a prime and P is the product of at most K prime factors; K denotes here a universal constant. Cf. A. Rényi [2].
F_{n+m}(x₁, x₂, …, x_n, +∞, …, +∞) = F_n(x₁, x₂, …, x_n) (n, m = 1, 2, …). (1)
For A ⊆ Ω, let Π_n A denote the set of all elements of Ω_n which can be brought to the form y = Π_n ω with ω ∈ A.

Let now A ⊆ Ω_n be any subset of Ω_n. We shall call the set of elements ω = (ω₁, ω₂, …) such that Π_n ω = (ω₁, …, ω_n) ∈ A an n-dimensional cylinder with base A; we shall denote this set by Π_n⁻¹(A). If A is Borel-measurable, the corresponding cylinder set is said to be a Borel cylinder set. Let 𝔅 be the set of all Borel cylinder sets; 𝔅 is an algebra of sets. To see this let us remark that an n-dimensional cylinder set is at the same time an (n + m)-dimensional cylinder set as well. In fact

A + B = Π_N⁻¹(A′ + B′),
A − B = Π_N⁻¹(A′ − B′);

Π_N⁻¹(A_N) = Π_{N+M}⁻¹(A_{N+M})

follows, because of (1),

P_{n+m}(A_{n+m}) = P_n(A_n).

Consequently, the definition of P(A) does not depend on the base figuring in the construction of A.
Clearly, the set function P(A) is nonnegative; it is easy to show that it is (finitely) additive. If A ∈ 𝔅, B ∈ 𝔅, AB = 0, then, because of

(We made use of the fact that the value of P(A) does not depend on the dimension of the chosen base of A.) It is further clear that P(Ω) = P_N(Ω_N) = 1. It remains to prove that P(A) is not only additive but also σ-additive on 𝔅. By Theorem 3, § 7 of Chapter II it suffices to show that P has the following property:
§ 9. Exercises
1. Let there be given in the plane a circle C₁ of radius R with its center in the origin, and a circle C₂ concentric with C₁ having a radius r < R. Let us draw a line d at random which intersects C₁, so that if the equation of d is written in the form

x cos φ + y sin φ = q,

φ and q are independent random variables, φ being uniformly distributed in (0, π) and q in (−R, +R). Let ξ denote the length of the chord of d inside C₂. Determine the distribution function, expectation, and standard deviation of ξ.
Hint. Let first φ be fixed. Then the distribution function of ξ is

0 for x < 0.

(At the point x = 0 the distribution function has thus a jump of the value 1 − r/R.) This expression being independent of φ, the conditional density function of ξ under the condition ξ > 0 is

0 for x < 0 and x > 2r,
x / (4r √(r² − x²/4)) for 0 < x < 2r.
This leads to

E(ξ) = πr²/(2R) and D(ξ) = (r²/(2R)) √(32R/(3r) − π²).
2. Let d be a line chosen at random as in Exercise 1. Let B be a convex domain in the circle C₁. Let ξ denote the length of the chord of d inside B. Calculate the expectation of ξ.

Hint. We have E(ξ) = E(E(ξ | φ)), where φ has the same meaning as in Exercise 1. E(ξ | φ) is equal to the integral of the lengths of the chords of the domain B lying in a given direction, divided by 2R; for fixing φ means restriction to the chords which form an angle φ + π/2 with the x-axis. Hence E(ξ | φ) = |B|/(2R), |B| being the area of B. We see that E(ξ) = |B|/(2R). It is not necessary to require that B be convex, nor that it be simply connected.
3. Let there be given in the plane a curve L consisting of a finite number of convex arcs and contained in a circle C of radius R. Choose at random (in the sense explained in Exercise 1) a line d intersecting C. What is the expectation of the number of points of intersection of this line with L?

Hint. Consider first the particular case when L is a segment of length l of a straight line. In this case the number ν of points of intersection is 0 or 1. If φ is the angle between the normal to d and the segment L, the expectation under the condition of fixed φ is E(ν | φ) = l |cos φ| / (2R). This leads to

E(ν) = (2/π) ∫₀^{π/2} (l cos φ)/(2R) dφ = l/(πR).

From this it follows for polygons, and by a limit procedure for all piecewise convex (or concave) curves L, that E(ν) = |L|/(πR), where |L| is the length of the curve L.
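The segment case of Exercise 3 can be simulated directly; the particular placement of the segment (centered at the origin, along the x-axis) and the numbers R, l are illustrative assumptions:

```python
import math, random

# Monte Carlo check of the segment case in Exercise 3: a random line
# x cos(phi) + y sin(phi) = q (phi uniform on (0, pi), q uniform on (-R, R))
# meets a segment of length l on the x-axis centered at the origin with
# expected number of intersections l / (pi R).
rng = random.Random(2)
R, l, n = 2.0, 1.0, 400_000
hits = 0
for _ in range(n):
    phi = rng.uniform(0.0, math.pi)
    q = rng.uniform(-R, R)
    c = math.cos(phi)
    # the line meets y = 0 at x = q / cos(phi)
    if abs(c) > 1e-12 and abs(q / c) <= l / 2:
        hits += 1
nu_mean = hits / n
exact = l / (math.pi * R)
print(nu_mean, exact)
```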
4. Calculate E(ξⁿ) for n = 2, 3, … under the conditions of Exercise 1.

Hint. We have

E(ξⁿ) = (2ⁿ rⁿ⁺¹ / R) ∫₀^{π/2} sinⁿ⁺¹ ϑ dϑ.
Hint. The pressure of the gas is equal to the expectation of the quantity of motion imparted by the molecules of the gas during unit time to a unit surface of the vessel wall. We assume that the shocks are perfectly elastic. If a molecule of mass m and velocity v strikes the wall in a direction which forms an angle ϑ with the normal vector of the wall, then the quantity of motion imparted by the molecule will be 2mv cos ϑ. In order to strike a unit surface K of the wall during a time interval (t, t + 1), the molecule of velocity v moving in a direction which makes an angle ϑ with the normal vector to the wall has to be included at the time t in an oblique cylinder of (unit) base K and height v cos ϑ. Under the assumption that the molecules are uniformly distributed in the vessel, the probability of the shock in question is v cos ϑ / W, where W denotes the volume of the vessel. Hence the expectation of the quantity of motion imparted to the wall by the considered molecule will be 2mv cos ϑ · (v cos ϑ)/W = 4e cos² ϑ / W, where e is the kinetic energy of the molecule. The quantity 4e cos² ϑ / W is a random variable. Hence we have to calculate its expectation.
(Here the relation E(E(ζ | η)) = E(ζ) is to be applied.) If the velocity components are supposed to be independent and to have normal distributions with the density function (1/(σ√(2π))) exp(−x²/(2σ²)), where σ = √(kT/m), then ϑ and e are independent and the distribution of the direction of the velocity vector is uniform. We know already (Ch. IV, § 17, Exercise 29b) that E(e) = (3/2)kT. Since E(cos² ϑ) = 1/3, we find

E(4e cos² ϑ / W) = kT/W
for the expectation of the "pressure" exerted upon the wall by one molecule. Since there are N molecules in a gram molecule of gas, we find for n gram molecules, because of the additivity of the expectation, the value

p = nNkT/W = NkT/V = RT/V,

where V = W/n is the molar volume and R = Nk the ideal gas constant.
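The momentum balance in Exercise 5 can be sketched numerically. The block below keeps only the velocity component normal to the wall (which is all that enters the calculation), restricts to molecules moving toward the wall, and uses arbitrary illustrative values of m, kT, and W:

```python
import math, random

# Sketch of Exercise 5: each velocity component is N(0, sigma) with
# sigma = sqrt(kT/m); a molecule moving toward the wall (vz > 0) strikes
# it with probability density vz / W per unit time and transfers momentum
# 2 m vz per shock, so the expected pressure per molecule is kT / W.
rng = random.Random(3)
m, kT, W, n = 1.0, 1.5, 1.0, 400_000
sigma = math.sqrt(kT / m)
total = 0.0
for _ in range(n):
    vz = rng.gauss(0.0, sigma)       # component normal to the wall
    if vz > 0:                       # only these molecules hit the wall
        total += 2.0 * m * vz * vz / W
pressure = total / n
print(pressure, kT / W)
```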
6. Let ξ₁, ξ₂, …, ξ_n be independent random variables uniformly distributed in the interval (0, 1). Let them be arranged into an increasing sequence and let the k-th element of this sequence be denoted by ξ_k*.

a) Show that the conditional density function of ξ₁*, ξ₂*, …, ξ_k* with respect to the condition ξ*_{k+1} = c is given by

f_k(x₁, x₂, …, x_k | ξ*_{k+1} = c) = k!/cᵏ for 0 < x₁ < x₂ < … < x_k < c,
and 0 otherwise.
7. Let ξ₁, ξ₂, …, ξ_n, … be independent random variables. Consider the sums ζ_n = ξ₁ + ξ₂ + … + ξ_n. Show that under the condition ζ_n = x the random variables ζ_k and ζ_l are independent for k < n < l.
8. Let the random vector (ξ, η) have the normal density function

h(x, y) = (√(AC − B²)/(2π)) exp[−(1/2)(Ax² + 2Bxy + Cy²)].

Prove the following relations:

R(ξ, η) = −B/√(AC),

E(η | ξ) = −(B/C)ξ, E(ξ | η) = −(B/A)η,

φ(ξ, η) = |r|/√(1 − r²) and ψ(ξ, η) = |r|,

where r = R(ξ, η) is the correlation coefficient, φ(ξ, η) the contingency, and ψ(ξ, η) the maximal correlation of the random variables ξ and η.
10. If the functions a(x) and b(x) are strictly monotone, then

φ(a(ξ), b(η)) = φ(ξ, η).

11. If (ξ, η) is uniformly distributed in a circle, ψ(ξ, η) = —.
ξ(ω) = 1 for ω ∈ A, 0 otherwise,
η(ω) = 1 for ω ∈ B, 0 otherwise,

then

φ²(ξ, η) = ψ²(ξ, η) = K_ξ²(η) = K_η²(ξ) = R²(ξ, η) =
= [P(AB) − P(A)P(B)]² / (P(A)[1 − P(A)] P(B)[1 − P(B)]),

provided that 0 < P(A) < 1 and 0 < P(B) < 1.
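For indicator variables the identity above is an algebraic one and can be checked exactly; the joint probabilities below are hypothetical values chosen for illustration:

```python
# For xi = indicator of A and eta = indicator of B, the squared correlation
# equals the mean-square contingency phi^2.
p11, p10, p01, p00 = 0.30, 0.20, 0.10, 0.40    # P(AB), P(AB'), P(A'B), P(A'B')
pA, pB = p11 + p10, p11 + p01

# squared correlation of the two indicators
R2 = (p11 - pA * pB) ** 2 / (pA * (1 - pA) * pB * (1 - pB))

# phi^2 = sum over the four cells of (P(cell) - product of marginals)^2 / product
cells = [(p11, pA * pB), (p10, pA * (1 - pB)),
         (p01, (1 - pA) * pB), (p00, (1 - pA) * (1 - pB))]
phi2 = sum((pr - q) ** 2 / q for pr, q in cells)
print(R2, phi2)
```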
13. Prove the following variant of Bayes' theorem: Let ξ be a random variable with an absolutely continuous distribution with the density function f(x) and let η be a discrete random variable. Let y_k (k = 1, 2, …) denote the possible values of η and p_k(x) the conditional probability P(η = y_k | ξ = x). Let f_k(x) be the conditional density function of ξ given η = y_k. We have

f_k(x) = p_k(x) f(x) / ∫_{−∞}^{+∞} p_k(t) f(t) dt.

Hint. By definition

∫_A p_k(x) f(x) dx = P(ξ ∈ A, η = y_k),

hence

P(ξ < x | η = y_k) = P(ξ < x, η = y_k)/P(η = y_k) = ∫_{−∞}^{x} p_k(t) f(t) dt / ∫_{−∞}^{+∞} p_k(t) f(t) dt.
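A miniature instance of Exercise 13 can be simulated; the particular choice of f and p₁ below (uniform ξ, Bernoulli η) is an illustrative assumption, not from the text:

```python
import random

# Exercise 13 in miniature: xi uniform on (0, 1), so f(x) = 1, and
# P(eta = 1 | xi = x) = p_1(x) = x.  The formula gives
#   f_1(x) = x * 1 / (1/2) = 2x,  hence  E(xi | eta = 1) = 2/3.
rng = random.Random(4)
vals = []
for _ in range(300_000):
    x = rng.random()
    if rng.random() < x:          # the event eta = 1 occurred
        vals.append(x)
cond_mean = sum(vals) / len(vals)
print(cond_mean)
```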
14. Suppose that the probability of an event A is a random variable ξ with density function p(x) (p(x) = 0 for x < 0 and x > 1). Perform n independent experiments for which the value P(A) = ξ is constant and denote by η_n the number of the experiments in which A occurred. Let p_{nk}(x) be the conditional (a posteriori) density function of ξ with respect to the condition η_n = k (k = 0, 1, 2, …, n); according to the preceding exercise

p_{nk}(x) = xᵏ(1 − x)^{n−k} p(x) / ∫₀¹ tᵏ(1 − t)^{n−k} p(t) dt.

a) Show that if ξ has a beta distribution of order (r, s), then ξ has under the condition η_n = k a beta distribution of order (k + r, n − k + s).

b) If p(x) is continuous and positive on (0, 1) and if t is a constant (0 < t < 1), then
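Part a) is a statement about kernels and can be checked exactly: the unnormalized posterior and the claimed beta kernel should differ only by a constant factor. The values of r, s, n, k below are arbitrary illustrative choices:

```python
# Exercise 14 a): with a Beta(r, s) prior, the posterior after k occurrences
# in n trials is proportional to x^(k+r-1) (1-x)^(n-k+s-1), i.e. Beta(k+r, n-k+s).
r, s, n, k = 2.0, 3.0, 10, 4

def prior(x):            # Beta(r, s) kernel; the normalizing constant is irrelevant
    return x ** (r - 1) * (1 - x) ** (s - 1)

def posterior(x):        # likelihood x^k (1-x)^(n-k) times the prior
    return x ** k * (1 - x) ** (n - k) * prior(x)

def claimed(x):          # Beta(k+r, n-k+s) kernel
    return x ** (k + r - 1) * (1 - x) ** (n - k + s - 1)

# the ratio must be constant over the whole interval (0, 1)
ratios = [posterior(i / 100) / claimed(i / 100) for i in range(1, 100)]
print(min(ratios), max(ratios))
```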
15. Let ξ be a random variable and let ξ₁, …, ξ_n be random variables which are for every fixed value of ξ independent and have a normal distribution with expectation ξ and standard deviation σ (σ > 0 is a constant). Let p(x) be the density function of ξ. Study the conditional density function p_n(x | y) of ξ under the condition

(ξ₁ + ξ₂ + … + ξ_n)/n = y

and show that if p(x) is positive and continuous, we have, for fixed x and y,

lim_{n→∞} p_n(y + x/√n | y) / (√n/(σ√(2π))) = e^{−x²/(2σ²)}.
16. Let ξ be a random variable with an exponential distribution. For every given value of ξ, let ξ₁, ξ₂, …, ξ_n be independent normally distributed random variables with expectation ξ and standard deviation σ > 0. Determine the conditional distribution of ξ with respect to the condition

(ξ₁ + ξ₂ + … + ξ_n)/n = y.
17. Let μ be a random variable having the density function p(t). Let for every given value of μ the random variables ξ₁, …, ξ_n be independent, normally distributed, with expectation μ and standard deviation σ > 0. Show that ξ₁, …, ξ_n are exchangeable (cf. Ch. IV, § 17, Exercise 18).
18. Let ξ₁, …, ξ_N be independent random variables having the same distribution and finite variance. Put η_n = ξ₁ + ξ₂ + … + ξ_n (n = 1, 2, …, N). Calculate the correlation ratio K_{η_n}(η_N) (n < N).

19. Let ξ₁, …, ξ_N be independent random variables with the same distribution. Put η_n = ξ₁ + … + ξ_n (n = 1, 2, …, N). Calculate the contingency φ(η_n, η_m) (n < m ≤ N).
20. Let the random variables ξ₁, ξ₂, …, ξ_n be independent and uniformly distributed in the interval (0, 1). Let ξ_k* denote the k-th order statistic of the sample ξ₁, …, ξ_n (see Exercise 17 of Ch. IV). Compute K_{ξ_k*}(ξ_l*) and φ(ξ_k*, ξ_l*) for k < l ≤ n.
21. Suppose that the probability p of an event A is a random variable on a conditional probability space. Let g(t) = 1/(t(1 − t)) (0 < t < 1) be its density function. Let p be constant during the course of n independent experiments and let η_n denote the number of those experiments in which the event A occurred. Calculate the a posteriori density function and the conditional expectation of the random variable p with respect to the condition η_n = k (0 < k < n).

Hint. According to Bayes' theorem the a posteriori density function of p with respect to the condition η_n = k is

g_k(p) = p^{k−1}(1 − p)^{n−k−1} / ∫₀¹ t^{k−1}(1 − t)^{n−k−1} dt;

the "a posteriori distribution" of p is thus a beta distribution of order (k, n − k) and the conditional expectation is k/n.
22. Let ξ be a random variable with Poisson distribution and expectation λ, where λ is a random variable with a logarithmically uniform distribution on (0, +∞). Calculate the a posteriori density function and the conditional expectation of λ with respect to the condition ξ = n ≥ 1.

Hint. Bayes' theorem gives for the conditional density function of λ with respect to the condition ξ = n:

g_n(λ) = λ^{n−1} e^{−λ} / (n − 1)!;

the a posteriori distribution of λ is thus a gamma distribution of order n.
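The posterior in Exercise 22 integrates to 1 and has expectation n; a quick numerical sketch (n and the integration grid are arbitrary choices):

```python
import math

# Exercise 22: g_n(lam) = lam^(n-1) e^(-lam) / (n-1)! is a gamma density of
# order n, so its total mass is 1 and its mean (the conditional expectation
# of lambda given xi = n) equals n.  Midpoint rule on (0, 60).
n = 5
step, upper = 1e-3, 60.0
mass = mean = 0.0
for i in range(int(upper / step)):
    lam = (i + 0.5) * step
    g = lam ** (n - 1) * math.exp(-lam) / math.factorial(n - 1)
    mass += g * step
    mean += lam * g * step
print(mass, mean)
```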
23. Let the random variable ξ have a normal distribution N(μ, σ), where μ is a random variable uniformly distributed on the whole real axis. Determine the a posteriori density function and the expectation of μ with respect to the condition ξ = a.

Hint. According to Bayes' theorem

g(μ | ξ = a) = (1/(σ√(2π))) exp[−(μ − a)²/(2σ²)].
f(x₁, …, x_n) = (1/(σ√(2π))ⁿ) exp[−(1/(2σ²)) Σ_{k=1}^{n} (x_k − m)²].

Put

x̄ = (1/n) Σ_{k=1}^{n} x_k.

We have

Σ_{k=1}^{n} (x_k − m)² = Σ_{k=1}^{n} (x_k − x̄)² + n(x̄ − m)²,

hence

f(x₁, …, x_n) = (√n/(σ√(2π))) exp[−n(x̄ − m)²/(2σ²)] · (1/((σ√(2π))^{n−1} √n)) exp[−(1/(2σ²)) Σ_{k=1}^{n} (x_k − x̄)²].

The density function of ζ_n is

g(x̄) = (√n/(σ√(2π))) exp[−n(x̄ − m)²/(2σ²)];

hence the conditional density function of the random vector (ξ₁, …, ξ_n) for ζ_n = x̄ is obtained by dividing f by g(x̄). This function does not depend on m; a property which is expressed by saying that ζ_n is a sufficient statistic for the parameter m.
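The whole argument rests on the algebraic decomposition of the sum of squares; it can be verified on arbitrary numbers:

```python
# The decomposition used in Exercise 24:
#   sum (x_k - m)^2 = sum (x_k - xbar)^2 + n (xbar - m)^2
xs = [1.2, -0.7, 3.1, 0.4, 2.2]     # arbitrary sample values
m = 0.9                             # arbitrary value of the parameter
n = len(xs)
xbar = sum(xs) / n
lhs = sum((x - m) ** 2 for x in xs)
rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - m) ** 2
print(lhs, rhs)
```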
25. a) Let there be given n independent random variables ξ₁, …, ξ_n with the same normal distribution N(0, σ). Put

η_i = Σ_{k=1}^{n} c_ik ξ_k (i = 1, 2, …, n),

where the matrix (c_ik) is orthogonal and c_1k = 1/√n (k = 1, 2, …, n). Then η₁ = √n ζ̄, where ζ̄ = (ξ₁ + … + ξ_n)/n, and

Σ_{i=1}^{n} η_i² = Σ_{k=1}^{n} ξ_k², Σ_{i=1}^{n} η_i² = η₁² + Σ_{i=2}^{n} η_i²,

hence

Σ_{i=2}^{n} η_i² = Σ_{k=1}^{n} (ξ_k − ζ̄)².

We know (cf. Ch. IV, § 17, Exercise 43) that η₁, …, η_n are independent normally distributed random variables with expectation 0 and standard deviation σ; hence

ζ̄ = η₁/√n and τ = √(Σ_{i=2}^{n} η_i²)

are independent; τ has a χ-distribution with (n − 1) degrees of freedom.
b) Let ξ₁, …, ξ_n be independent random variables with the same normal distribution. Let the expectation μ and the standard deviation σ of the ξ_k be independent random variables on a conditional probability space, μ being uniformly distributed on the whole real axis and σ logarithmically uniformly distributed in the interval (0, +∞). Put

ζ̄ = (ξ₁ + ξ₂ + … + ξ_n)/n and r = Σ_{k=1}^{n} (ξ_k − ζ̄)².

Determine the a posteriori distribution of μ and σ² under the condition ζ̄ = x and r = z. Show that given these conditions σ and (x − μ)/σ are independent.

Hint. The density function of the vector (μ, σ²) with respect to the condition ζ̄ = x, r = z is, according to Bayes' theorem and the result of Exercise 25 a), proportional to

(1/σ^{n+2}) exp[−z/(2σ²)] exp[−n(x − μ)²/(2σ²)];

thus σ and (x − μ)/σ are independent.
26. Let there be given a sequence of pairwise independent events A_n (n = 1, 2, …) with P(A_n) ≥ α > 0 (n = 1, 2, …) and an arbitrary random variable η of finite variance. Show that

lim_{n→+∞} E(η | A_n) = E(η). (1)

If β is the indicator of the event B (0 < P(B) < 1), it follows from (1) that

lim_{n→+∞} P(B | A_n) = P(B). (2)
hence, by Theorem 3 of § 7,

Σ_{n=1}^{∞} (P(A_n)/(1 − P(A_n))) [E(η | A_n) − E(η)]² ≤ D²(η).

Thus

lim_{n→+∞} E(η | A_n) = E(η),

which proves (1).

Remark. Cf. Ch. VII, § 10, Theorem 1.
27. Let a sequence of pairwise independent events A_n (n = 1, 2, …) be given and assume

Σ_{n=1}^{∞} P(A_n) = +∞.

Let

B = lim sup_{n→+∞} A_n = Π_{k=1}^{∞} Σ_{n=k}^{∞} A_n

denote the event that infinitely many of the events A_n occur simultaneously. Show that P(B) = 1.
Hint. Let C be any event with 0 < P(C) < 1. As in Exercise 26, it follows from Theorem 3, § 7 that

Σ_{n=1}^{∞} (P(A_n)/(1 − P(A_n))) [P(C | A_n) − P(C)]² ≤ P(C)[1 − P(C)]. (1)

Apply (1) to C = C_k = Σ_{n=k}^{∞} A_n. Obviously, P(C_k) > 0. It follows from (1), in view of P(C_k | A_n) = 1 for n ≥ k, that P(C_k) = 1; hence P(C̄_k) = 0. Since B = Π_{k=1}^{∞} C_k, we have B̄ = Σ_{k=1}^{∞} C̄_k and hence P(B̄) = 0, which is equivalent to P(B) = 1.
Remark. The assertion of Exercise 27 is a sharper form of the Borel–Cantelli lemma (cf. Ch. VII, § 5).
28. Let ξ and η be arbitrary random variables, f(x) and g(x) Borel-measurable functions such that

E(f(ξ)) = E(g(η)) = 0, D(f(ξ)) = D(g(η)) = 1

and

R(f(ξ), g(η)) = E(f(ξ) g(η)) = ψ(ξ, η),
or, to put it otherwise, suppose that R(u(ξ), v(η)) assumes its maximal value for u = f and v = g. Then the following equations hold with probability 1, where λ = ψ(ξ, η):

E(f(ξ) | η) = λ g(η) (1)

and

E(g(η) | ξ) = λ f(ξ), (2)

hence also

E(E(f(ξ) | η) | ξ) = λ² f(ξ) (3)

and

E(E(g(η) | ξ) | η) = λ² g(η). (4)
Hint. We have

ψ(ξ, η) = E(f(ξ) g(η)) = E(E(f(ξ) g(η) | ξ)) = E(f(ξ) E(g(η) | ξ)),

hence according to Schwarz' inequality

On the other hand, if f*(ξ) fulfils E(f*(ξ)) = 0 and D(f*(ξ)) = 1, then we conclude that D² ≤ ψ²(ξ, η). Hence D² = ψ²(ξ, η). Since in Schwarz' inequality equality holds only in the case of proportionality, we must have E(g(η) | ξ) = λ f(ξ), which proves (2). But

E(f(ξ) E(g(η) | ξ)) = ψ(ξ, η).

On the other hand, by (2),

E(f(ξ) E(g(η) | ξ)) = λ E(f²(ξ)) = λ,

hence λ = ψ(ξ, η). Equation (1) is proved in a similar way.
29. With the notations of Exercise 28 we have

E(f(ξ) | g(η)) = λ g(η)

and

E(g(η) | f(ξ)) = λ f(ξ).

Hence the regression curve of ξ* = f(ξ) with respect to η* = g(η), as well as that of η* with respect to ξ*, is a straight line (or, as it is expressed, the regression of ξ* and η* is linear).
(f(ξ), h(ξ)) = E(f(ξ) h(ξ));

L²_ξ is a Hilbert space. Further we define Af(ξ), for f(ξ) ∈ L²_ξ, by

Af(ξ) = E(E(f(ξ) | η) | ξ).

Show that Af(ξ) belongs also to L²_ξ and that the linear transformation A of the space L²_ξ is positive and symmetric, i.e. it fulfils the relations
CHARACTERISTIC FUNCTIONS
E(Π_{k=1}^{n} ζ_k) = Π_{k=1}^{n} E(ζ_k). (3)
If A(x) = a(x) + ib(x) is a complex-valued Borel function of the real variable x and ξ is a real random variable, further if the expectation of ζ = A(ξ) exists, then the latter can be calculated by

It is easy to prove that for every random variable ζ with complex values

|E(ζ)| ≤ E(|ζ|). (5)
§ 2. Characteristic functions and their basic properties

φ_ξ(t) = ∫_{−∞}^{+∞} e^{ixt} dF(x) (2)

φ_ξ(t) = Σ_k p_k e^{ix_k t}. (3)

First of all, let it be noted that every distribution function has a characteristic function, since the Stieltjes integral (2) always exists, in view of |e^{ixt}| = 1. If ξ assumes nonnegative integer values only, with

P(ξ = k) = p_k (k = 0, 1, …),

we see that

φ_ξ(t) = Σ_{k=0}^{∞} p_k e^{ikt},

where
IM O - £ ( ^ ‘ IЛ ) Р Ш I ^ A M < ~ • (6 )
Consequently, it follows that

Σ_{h=1}^{n} Σ_{k=1}^{n} φ_ξ(t_h − t_k) z_h z̄_k ≥ 0, (10)

since

Σ_{h=1}^{n} Σ_{k=1}^{n} φ_ξ(t_h − t_k) z_h z̄_k = E(|Σ_{k=1}^{n} e^{i t_k ξ} z_k|²).
the characteristic function of their sum is equal to the product of the characteristic functions of the individual terms:

φ_{ξ₁+ξ₂+…+ξ_n}(t) = Π_{k=1}^{n} φ_{ξ_k}(t).

From

φ_{ξ₁+ξ₂}(t) = φ_{ξ₁}(t) φ_{ξ₂}(t)

the independence of ξ₁ and ξ₂ does not follow. Let for instance be ξ₁ = ξ₂ = ξ, where ξ has a Cauchy distribution: φ_ξ(t) = e^{−|t|}.
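The multiplication theorem can be checked exactly in a small discrete case; Bernoulli variables with an arbitrary parameter p are used purely for illustration:

```python
import cmath, math

# The characteristic function of a sum of independent terms is the product
# of their characteristic functions: exact check for two independent
# Bernoulli(p) variables, whose sum is binomial of order 2.
p, q = 0.3, 0.7

def phi_bernoulli(t):
    return q + p * cmath.exp(1j * t)

def phi_sum(t):
    # binomial(2, p): P(0) = q^2, P(1) = 2pq, P(2) = p^2
    return q * q + 2 * p * q * cmath.exp(1j * t) + p * p * cmath.exp(2j * t)

err = max(abs(phi_sum(t) - phi_bernoulli(t) ** 2)
          for t in (0.0, 0.5, 1.0, 2.5, math.pi))
print(err)
```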
If the integral ∫_{−∞}^{+∞} |x| dF(x) exists, the integral

∫_{−∞}^{+∞} x e^{ixt} dF(x)

converges uniformly in t. Hence
exists for j = 1, 2, …, k, we have¹

|φ_ξ(t)| ≤ (1/|t|^r) ∫_{−∞}^{+∞} |f^{(r)}(x)| dx. (17)

Since by assumption |f^{(r)}(x)| is integrable on (−∞, +∞), (14) follows from (16) by Riemann's lemma concerning the Fourier integral.

Inequality (17) is obviously of interest for the study of the behaviour of φ_ξ(t) for large values of |t|.
Remark. According to Theorem 7, the "smoothness" (differentiability) of φ_ξ(t) is determined by the behaviour of f(x) for |x| → +∞; by Theorem 8 the "smoothness" of f(x) determines the behaviour of φ_ξ(t) for |t| → ∞. The two theorems are therefore in a certain sense dual.
φ_ξ(t) is even a holomorphic function in the whole band |v| < R of the complex plane t = u + iv.

Proof. If the assumptions of the theorem are fulfilled, φ_ξ(t) is, because of (11), arbitrarily often differentiable at the point t = 0 and we have φ_ξ^{(n)}(0) = iⁿ M_n. From this (20) follows immediately. Because of (13), for every real t₀ and every n

lim sup_{n→∞} (|φ^{(n)}(t₀)| / n!)^{1/n} ≤ 1/R. (23)
M_n = E(ξⁿ) (n = 1, 2, …).
φ_ξ(2πn/d) = 1 for n = 0, ±1, ±2, …;

φ_ξ(2πn/d) = Σ_k p_k = 1.
Since 1 − cos(t₀x − a) is positive except for x = (2kπ + a)/t₀ (k = 0, ±1, …) (for which values it is equal to 0), all jumps of F(x) must therefore belong to the arithmetic progression dk + r with d = 2π/t₀ and r = a/t₀.
φ_ξ(t) = Σ_k p_k φ_{ξ_k}(t).

We know that

F(x) = Σ_k p_k F_k(x).
|∫_{x−it}^{x} e^{−z²/2} dz| ≤ e^{−x²/2} ∫₀^{t} e^{u²/2} du (1)

implies

lim_{x→∞} |∫_{x−it}^{x} e^{−z²/2} dz| = 0. (2)

Hence

(1/√(2π)) ∫_{−∞}^{+∞} e^{−(x−it)²/2} dx = (1/√(2π)) ∫_{−∞}^{+∞} e^{−x²/2} dx = 1 (3)

and consequently

φ_ξ(t) = e^{−t²/2}. (4)
If the random variable ξ is N(m, σ), the random variable ξ′ = (ξ − m)/σ is N(0, 1) and ξ = σξ′ + m. From φ_{ξ′}(t) = e^{−t²/2} and from Theorem 3 in § 2 follows

φ_ξ(t) = e^{imt − σ²t²/2}. (5)
φ_ξ(t) = λ ∫₀^{∞} e^{ixt} e^{−λx} dx = 1/(1 − it/λ). (6)

From this it follows immediately by Theorem 6 of § 2 that a random variable ξ_k having a Γ-distribution of order k and expectation k/λ has the characteristic function

φ_{ξ_k}(t) = 1/(1 − it/λ)ᵏ. (7)

For the χ²-distribution this gives

φ(t) = 1/(1 − 2it)^{n/2}. (8)
Example 6. The characteristic function of the binomial distribution. Let ξ be a random variable having a binomial distribution of order n with parameter p; according to § 15 of Chapter III
Since the unicity Theorem 1b follows from the inversion Formula (1), it suffices to prove the latter. Before beginning the proof we have to make first some remarks. It was pointed out in § 2 that φ(−t) is the complex conjugate of φ(t). Thus if Re{z} denotes the real part of the complex number z, (1) can be rewritten in the form

F(b) − F(a) = (1/2π) ∫_{−∞}^{+∞} Re{φ(t) (e^{−ita} − e^{−itb})/(it)} dt. (2)

The real parts of φ(t) and of (e^{−ita} − e^{−itb})/(it) are even functions, while their imaginary parts are odd functions. Therefore the same holds for

g(t) = φ(t) (e^{−ita} − e^{−itb})/(it). (3)

(1/2π) ∫_{−∞}^{+∞} φ(t) (e^{−ita} − e^{−itb})/(it) dt. (6)

But if this integral exists, its value is by Formula (5) equal to F(b) − F(a).
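The inversion formula can be checked numerically; the standard normal distribution and the interval (a, b) = (−1, 1) are arbitrary illustrative choices:

```python
import cmath, math

# Numerical check of F(b) - F(a) = (1/2pi) int phi(t)(e^{-ita} - e^{-itb})/(it) dt
# for the standard normal distribution, phi(t) = e^{-t^2/2}.
a, b = -1.0, 1.0
step, T = 1e-3, 40.0
total = 0.0
for i in range(int(T / step)):
    t = (i + 0.5) * step
    g = math.exp(-t * t / 2) * (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t)
    total += 2.0 * g.real * step      # the real part of the integrand is even in t
approx = total / (2 * math.pi)
exact = math.erf(1 / math.sqrt(2))    # F(1) - F(-1) for N(0, 1)
print(approx, exact)
```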
For the proof of Formula (2) we need two simple lemmas.

Lemma 1. Put

S(a, T) = (2/π) ∫₀^{T} (sin at / t) dt, (7)

and the convergence is uniform for |a| ≥ δ > 0, where δ is an arbitrarily small positive number.
Proof. If we put

S(x) = (2/π) ∫₀^{x} (sin u / u) du,

we have

S(a, T) = S(aT). (10)
Put

c_n = (2/π) ∫_{nπ}^{(n+1)π} (sin u / u) du;

then we have

c_n = (−1)ⁿ (2/π) ∫₀^{π} (sin u / (nπ + u)) du (n = 0, 1, 2, …); (11)

S(x) = Σ_{k=0}^{n−1} c_k + (2/π) ∫_{nπ}^{x} (sin u / u) du for nπ ≤ x < (n + 1)π. (12)
|S(x)| ≤ 2. (16)

Thus (8) is proved. (9) follows from the well-known formula

S(∞) = (2/π) ∫₀^{∞} (sin u / u) du = 1. (17)

The uniform convergence follows from (10).
Lemma 2. Put

D(T, z, a, b) = (1/2π) ∫_{−T}^{+T} (sin t(z − a) − sin t(z − b))/t dt (18)

and

D(z, a, b) = D(+∞, z, a, b) = (1/2π) ∫_{−∞}^{+∞} (sin t(z − a) − sin t(z − b))/t dt. (19)

For every real z, a, b and for every positive T,

D(z, a, b) = 0 for z < a or b < z.
Proof. Since

On the other hand, since a and b are points of continuity of F(x), we have by Lemma 2

F(b) − F(a) = ∫_{−∞}^{+∞} D(z, a, b) dF(z). (23)

In order to prove (2) it suffices thus to prove that the order of integration may be reversed on the right hand side of Formula (22). The difficulty is that the integral (19) representing D(z, a, b) is not absolutely convergent. But by Lemma 2 we know that D(T, z, a, b) − D(z, a, b) tends uniformly to zero on the whole real axis, except for the intervals a − δ < z < a + δ and b − δ < z < b + δ, where δ is a small positive number. Furthermore, on these intervals |D(T, z, a, b)| ≤ 2. Since a and b are continuity points of F(x), we have
∫_{−∞}^{+∞} |φ(t)| dt (27)

exists. Then

f(x) = lim_{h→0} ∫₀^{+∞} … [φ(t) e^{−itx} + φ(−t) e^{itx}] dt. (28)
Since (27) exists and because of

the limit and the integration can be interchanged according to the theorem of Lebesgue, hence

It is easy to show that the integral figuring on the right hand side of (29) is a uniformly continuous and bounded function of x. This leads to
in this case

|∫_{|x|>A} e^{ixt} dF(x)| < — . (34)
F_n(+A) > 1 − —

hold, hence

Now |F_n(x) − F(x)| ≤ 2 and, according to the theorem of Lebesgue, limit and integration can be interchanged; hence the right hand side of (37), and by (36) φ_n(t) − φ(t) too, tend for n → ∞ uniformly to zero if |t| ≤ T. Thus we proved that the condition of Theorem 3 is necessary.
We show now that it is sufficient as well, i.e. that from (32), with φ(t) continuous at t = 0, follows (31). According to a well-known theorem of Helly every sequence {F_n(x)} possesses a subsequence {F_{n_k}(x)} that converges to a monotone nondecreasing function F(x) at all continuity points of the latter.

We show first that this function F(x) is necessarily a distribution function. It suffices to show that F(+∞) = 1, F(−∞) = 0, and that F(x) is left-continuous. This latter condition can always be realized by a suitable modification of F(x) at its points of discontinuity. Since F(x) is a limit of distribution functions, we have always 0 ≤ F(x) ≤ 1. Hence it suffices to prove that F(+∞) − F(−∞) = 1. First we prove the following formula:
In fact

I_n(x) = (1/π) ∫_{−∞}^{+∞} ((1 − cos xt)/t²) φ_n(t) dt =
= (1/π) ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} ((1 − cos xt)/t²) cos yt dt dF_n(y), (39)

(1/π) ∫_{−∞}^{+∞} ((1 − cos xt)/t²) dt = |x|. (40)

From (40) it follows that for x > 0

(1/π) ∫_{−∞}^{+∞} ((1 − cos xt)/t²) cos yt dt = x − |y| for |y| ≤ x, and 0 for |y| > x. (41)

Hence by (39)

I_n(x) = ∫_{−x}^{+x} (x − |y|) dF_n(y). (42)
or

(T > 0 fixed arbitrarily) follows from the already proved necessity of the condition of Theorem 3. Herewith our theorem is completely proved.
Let us add some remarks.
F_n(x) = 1 for x > n, 0 for x ≤ n.

For every finite x, lim_{n→∞} F_n(x) = 0; nevertheless φ_n(t) = e^{int} does not tend to a limit (except for t = 2kπ (k = 0, ±1, ±2, …)).
2. We have proved that if the characteristic functions of the functions F_n(x) converge to a function φ(t) continuous at t = 0, then the functions F_n(x) converge to a distribution function F(x) with characteristic function φ(t). If we omit the condition that φ(t) is continuous at the origin, our proposition is no longer valid. Thus for instance let F_n(x) be the distribution function of the uniform distribution on the interval (−n, +n); then

φ_n(t) = (sin nt)/(nt),

and thus the limit

lim_{n→∞} φ_n(t) = φ(t)

exists for every real t and is given by

φ(t) = 1 for t = 0, 0 otherwise;

thus φ(t) is not continuous for t = 0. The sequence F_n(x) converges when n → ∞ for every x to 1/2; F(x) is therefore identically equal to 1/2 and is thus not a distribution function.
P(ξ = 2n + 1) = 4/(π²(2n + 1)²) (n = 0, ±1, ±2, …), (47)

hence

Σ_{n=−∞}^{+∞} P(ξ = 2n + 1) = 1,

and we find

φ_ξ(t) = (8/π²) Σ_{n=0}^{∞} cos((2n + 1)t)/(2n + 1)² = 1 − 2|t|/π for |t| ≤ π. (48)
P(η = 0) + Σ_{n=−∞}^{+∞} P(η = 4n + 2) = 1. (49)

The function φ_η(t) is periodic with period π. Let the real axis be partitioned into the subintervals

(2k − 1)π/2 ≤ t < (2k + 1)π/2 (k = 0, ±1, ±2, …);
then we see that the functions φ_ξ(t) and φ_η(t) are identical on the intervals with an even index k and are of opposite sign on the intervals with an odd index k.
Now φ(t) can never be zero. In fact, if for a value t₀ we had φ(t₀) = 0, then by (1) we would have φ(t₀/2) = 0 and thus also φ(t₀/2ⁿ) = 0 (n = 1, 2, …). As φ(t) is continuous, we would have φ(0) = 0; this is impossible since φ(0) = 1 (cf. Theorem 1 of § 2). Put

ψ(t) = ln φ(t); (2)

then, by (1),

ψ(2t) = 3ψ(t) + ψ(−t). (3)

Put

δ(t) = ψ(t) − ψ(−t). (4)

If in (3) t is replaced by −t and the equality so obtained is subtracted from (3), we find

δ(2t) = 2δ(t). (5)
δ(t)/t = δ(t/2ⁿ)/(t/2ⁿ) (n = 1, 2, …). (6)

The right side of (6) tends for n → ∞ to δ′(0), i.e. to zero. Hence

δ(t) = 0. (7)

It follows that ψ(t) = ψ(−t) and by (3) that

ψ(2t) = 4ψ(t). (8)

This leads to

ψ(t)/t² = ψ(t/2ⁿ)/(t/2ⁿ)². (9)

The right hand side of (9) tends to −1/2, since ψ(0) = ψ′(0) = 0, ψ″(0) = −1. Hence

ψ(t) = −t²/2 and φ(t) = e^{−t²/2};
By assumption, the characteristic function of (ξ + η)/√2 is also equal to
φ(t). By Theorems 3 and 6 of § 2, however, the characteristic function of
(ξ + η)/√2 is φ²(t/√2), hence

    φ(t) = φ²(t/√2),   (10)

hence ψ(t) = −t²/2 and φ(t) = exp(−t²/2), which proves our theorem.
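As a quick numerical illustration (a modern sketch, not part of the original text; the name `phi` is ours), one can check that the standard normal characteristic function satisfies the functional equation (10) exactly:

```python
import math

def phi(t):
    # Characteristic function of the standard normal distribution.
    return math.exp(-t * t / 2.0)

# phi(t) = phi(t/sqrt(2))**2: the distribution of (xi + eta)/sqrt(2)
# is again standard normal.
checks = [abs(phi(t) - phi(t / math.sqrt(2.0)) ** 2)
          for t in (0.0, 0.3, 1.0, 2.5)]
```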
Theorem 2 can be rephrased, by using the notion of families of distri-
butions, as follows:

Theorem 2'. Let F(x) be a distribution function with

    ∫_{−∞}^{+∞} x dF(x) = 0,   ∫_{−∞}^{+∞} x² dF(x) = 1.

If the family of distributions {F((x − m)/σ)}, σ > 0, is closed with respect
to the operation of convolution, i.e. if for any real numbers m₁, m₂ and for
any positive numbers σ₁, σ₂ there can be found constants m and σ (m real,
σ positive) such that

    F((x − m₁)/σ₁) * F((x − m₂)/σ₂) = F((x − m)/σ),   (13)
then

    F(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du;   (14)

{F((x − m)/σ)} is thus the family of the normal distributions.
Indeed, if φ(t) denotes the characteristic function of F(x), then (13) implies

    φ(σ₁ t) φ(σ₂ t) = e^{iat} φ(√(σ₁² + σ₂²) t).   (15)

For σ₁ = σ₂ = 1/√2, (15) reduces (with a = 0, since the expectation is zero) to (10);
hence Theorem 2' follows from Theorem 2.
Theorem 2' explains to some extent the fact that errors of measurements
are usually normally distributed. In effect, the condition, that the sum of
two independent errors of measurement belongs to the same family of
distributions as the two errors themselves, cannot be fulfilled, in the case
of a finite variance, by other than normal distributions. The condition that
F(x) should have finite variance is necessary for the validity of Theorem 2'.
Thus, for instance, for the distribution function
as pointed out above. There exist, however, other stable distributions, e.g.
the Cauchy distribution. Stable distributions will be dealt with in § 8.
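The stability of the Cauchy distribution mentioned here can be illustrated by simulation. A minimal sketch (our own, assuming NumPy; sample sizes are arbitrary choices): the mean of n independent standard Cauchy variables is again standard Cauchy, so its quartiles stay at −1, 0, +1.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200_000, 8

# Mean of n independent standard Cauchy variables, m Monte Carlo repetitions.
means = rng.standard_cauchy(size=(m, n)).mean(axis=1)

# Quartiles of the standard Cauchy distribution are -1, 0, +1; the sample
# mean reproduces them because the Cauchy family is stable.
q = np.percentile(means, [25, 50, 75])
```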
We deal now with some further remarkable properties of normal distri-
butions. If ξ and η are independent normally distributed random variables,
their sum ξ + η is, as we know already, normally distributed too. We shall
now prove that the converse of this statement is also true; this result is
due to H. Cramér.
    φ_ξ(t) = ∫_{−∞}^{+∞} e^{itx} dF(x),   φ_η(t) = ∫_{−∞}^{+∞} e^{itx} dG(x).   (18)
We show now that the definition of φ_ξ(t) and φ_η(t) can be extended to
all complex values of t, so that φ_ξ(t) and φ_η(t) are entire functions of the
complex variable t. Let us first suppose t = iv (v real) and let A and B
be any two positive numbers. We have

    ∫_{−A}^{+A} e^{−vx} dF(x) · ∫_{−B}^{+B} e^{−vy} dG(y) ≤ ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} e^{−v(x+y)} dF(x) dG(y) =

    = φ_{ξ+η}(iv) = e^{v²/2}.   (19)

The definition of φ_ξ(t) and φ_η(t) can thus be extended to every complex t.
It is easy to see that φ_ξ(t) and φ_η(t) are holomorphic on the whole complex
plane, hence they are entire functions of t.
Because of (17), φ_ξ(t) ≠ 0 and φ_η(t) ≠ 0 for every t. Hence ln φ_ξ(t) and
ln φ_η(t) are entire functions too, where that branch of the logarithmic
function is to be taken for which ln 1 = 0. If a > 0 and b > 0 are such
that F(a) − F(−a) > 1/2 and G(b) − G(−b) > 1/2, then

    φ_ξ(iv) = ∫_{−∞}^{+∞} e^{−vx} dF(x) ≥ (1/2) e^{−a|v|}

and

    φ_η(iv) = ∫_{−∞}^{+∞} e^{−vy} dG(y) ≥ (1/2) e^{−b|v|}.

Hence, for t = u + iv,

    |φ_ξ(t)| ≤ φ_ξ(iv) = e^{v²/2}/φ_η(iv) ≤ 2 e^{(v²/2)+b|v|} ≤ 2 e^{(|t|²/2)+b|t|}.   (24)

Similarly, we obtain

    |φ_η(t)| ≤ 2 e^{(|t|²/2)+a|t|}.   (25)
If the real part of z is denoted by Re(z), we have by (24) and (25) that
Re(ln φ_ξ(t)) and Re(ln φ_η(t)) are bounded from above by ln 2 + (|t|²/2) + (a + b)|t|
on the whole t-plane. According
to a well-known theorem of H. A. Schwarz the relation

    f(z) = i Im(f(0)) + (1/2π) ∫_0^{2π} Re(f(Re^{iθ})) · (Re^{iθ} + z)/(Re^{iθ} − z) dθ   (27)

holds for |z| < R for every function f(z) holomorphic on |z| ≤ R. It follows
from (27) that |ln φ_ξ(t)|/|t|² and |ln φ_η(t)|/|t|² are bounded on the whole plane;
they are thus, according to Liouville's theorem, constant. Hence φ_ξ(t) =
= exp(ct²) and φ_η(t) = exp(dt²). Because of

    φ_ξ(t) φ_η(t) = e^{−t²/2},

there follows

    φ_ξ(t) = exp(−σ₁²t²/2)   (28)

and

    φ_η(t) = exp(−σ₂²t²/2)   (29)

with σ₁² + σ₂² = 1; ξ and η are thus normally distributed.
    ∏_{k=1}^{r} (f_k(t))^{α_k} = e^{−t²/2},   (30)

then it follows from Cramér's theorem that the functions f_k(t) (k = 1, 2, ..., r)
are characteristic functions of normal distributions. In fact, if N denotes
the common denominator of the numbers α₁, ..., α_r, we have

    ∏_{k=1}^{r} (f_k(t))^{Nα_k} = e^{−Nt²/2},   (31)

where the Nα_k (k = 1, 2, ..., r) are integers. Hence

    f₁(t) · [(f₁(t))^{Nα₁−1} ∏_{k=2}^{r} (f_k(t))^{Nα_k}] = e^{−Nt²/2},   (32)

so that e^{−Nt²/2} is the product of two characteristic functions, one of which
is f₁(t); by Cramér's theorem f₁(t), and similarly every f_k(t), is thus the
characteristic function of a normal distribution. More generally, the following holds:

Theorem 4. If the characteristic functions f₁(t), ..., f_r(t) satisfy

    ∏_{k=1}^{r} (f_k(t))^{α_k} = e^{imt − σ²t²/2},   (33)

where m is a real number and σ, α₁, α₂, ..., α_r are positive numbers, then
the functions f_k(t) (k = 1, 2, ..., r) are characteristic functions of normal
distributions.
Proof. The following proof is due to Yu. V. Linnik and A. A. Singer [1].
It consists of five steps.

Step 1. We may first replace the f_k(t) by real and even characteristic
functions g_k(t) for which

    ∏_{k=1}^{r} (g_k(t))^{α_k} = e^{−t²/2}   (34)

holds if |t| < δ. Furthermore g_k(t) is a real and even function. If we prove
from (34) that g_k(t) is the characteristic function of a normal distribution,
then the theorem of Cramér implies the same conclusion for f_k(t). It
follows from (34) that g_k(t) ≠ 0 for |t| < δ, hence we may take the log-
arithm of the two sides of Equation (34):

    Σ_{k=1}^{r} α_k ln (1/g_k(t)) = t²/2.   (35)
Let G_k(x) be the distribution function corresponding to the characteristic
function g_k(t). It follows from the assumptions concerning g_k(t) that G_k(x)
is symmetric with respect to the origin; hence we have for any a > 0

    g_k(t) = ∫_{−∞}^{+∞} cos tx dG_k(x) ≤ 1 − ∫_{−a}^{+a} (1 − cos tx) dG_k(x).

Since for |t| < π/(2a) the relation

    ∫_{−a}^{+a} (1 − cos tx) dG_k(x) < 1

holds, and since for 0 ≤ x < 1 we have x ≤ ln (1/(1 − x)), it follows from
(35) for a ≥ π/(2δ) that

    Σ_{k=1}^{r} α_k ∫_{−a}^{+a} (1 − cos tx) dG_k(x) ≤ t²/2   for |t| < π/(2a).   (36)
(40)
1 Cf. e.g. E. Lukács [2].
332 C H A R A C T E R IS T IC F U N C T IO N S [VI, § 5
Z 4 = 7 and =Z h 4 = P’
]= i j i
A ij = 4 Z 4 4 = 2^
;= i y=i
Now we show by induction that the integrals
k=i f ~9k7( (r
0
- ^ )(0)]) =
5 н (о = е п - - у ; /у!-
j =1 lj'
V I, § 5 ] O N T H E N O R M A L D IS T R IB U T IO N 333
Í h = U t i j l j = 2q.
1=1
We show now that the right hand side of (43) has the order of magnitude
O(t²) when t → 0. In fact, if ν < 2q is an odd number, then by the induction
hypothesis g_k^{(ν)}(t) = O(|t|). Hence it suffices to consider terms for which
all the l_j are even. If ν is even, ν ≤ 2q − 2, then we have
d k O )
J _ - 1 = 0 (i2)
9 k (0
t
k =l
У Р (0 - a f q) (0)] = 0 {t2). (44)
ff* (0 = 9k *
у / N0
for a* = N00Lk. Without restriction of generality we may thus assume
afc > 1 (к = 1, . . . ,r). Now raise the two sides of (34) to the power 2q,
differentiate 2q times and put t = 0. By introducing the notation yk(t) =
= [gk(t)]2ai‘‘l we obtain thus
I . (45)
/, +... +/,= 2 9 ll - • • • ‘г- Ш 1=0
The quantities γ_k^{(l)}(t) can be evaluated by means of the formula of Faà
di Bruno:
/
fi°(0 = Z 2q<xk (2qcck - 1) . . . (2qock - v + 1) [gk (г)]2««*-’ x
V=1
х Г т г ^ т .П ^ т т • <46>
where in the inner sum the summation is to be taken over the /}-s and
/,—
s such that /) = v, £ i f = /. Because of
<?f-1)(0) = 0, sg n # > (0 ) = ( - l ) '
and
2 ^ 0/д. (2//aÄ— 1) . . . (2^гал —v + 1) > 0 for v < 2q,
it follows that all nonzero terms on the left hand side of (45) have the
sign of ( —l)9. The right hand side is
d^ e~tt(1
t o= (2# (2?)! H 24 (0), (47)
Thus
(-1)«
H 2a (0) = —— — .
2eW q\ 2«
Since on the left hand side of Equation (45) there occur the terms
^ ( 0 ) too, the relation
^ ( O ) . i 2«
* w ?! 2q
must hold, wherefrom
i' tot 1
lim su p rV V ' (49)
the circle |t| < 1/e. Suppose that not all g_k(t) are entire functions; then
the same holds for the functions h_k(t). Let h_{k₀}(t) be that function h_k(t) which
has the smallest radius of convergence; denote this radius by R.
Take 0 < r < R and put h̄_k(t) = h_k(r + t). Then
П (50)
k=l
and
ГТ ( - ^ ) . | gfc = e 2 (51)
2
in the circle |t − r| < 2/e. From this it follows, for an r sufficiently near
to R, that h_{k₀}(t) is regular at the point t = R. This, however, contradicts
the known theorem according to which the sum of a power series with
positive coefficients having a radius of convergence equal to R is singular
at the point +R.¹
Step 5. The proof of Theorem 4 can now be finished like that of Cramér’s
theorem. If we choose the numbers ak > 0 (k = 1, 2 , . . . , r) such that
    ∫_{−a_k}^{+a_k} dG_k(x) > 1/2   (k = 1, 2, ..., r),

then
Proof. Linnik has shown that the proof of this theorem can be reduced
to that of Theorem 4 as follows. Take a₁ = a₂ = ... = a_r = 1, which
does not restrict the generality. η₁ and η₂ are by assumption independent,
hence we have

    E(e^{i(uη₁ + vη₂)}) = E(e^{iuη₁}) E(e^{ivη₂}).   (53)
In a neighbourhood of the origin the φ_k(t) are not all zero. Put therefore
ψ_k(t) = ln φ_k(t); then
Y j ( x - и) ф*к(и + bkv) du = “ IÉ * (V ) I +
0
X
+ f (x - u)
■' k= 1
É
Фк(м)du-
о
If we put
B(v) = Y4>* ( V ) , (56)
*=i
we obtain
r x + bkv t b/cV
Since |φ_k(x)| ≤ 1, the relation B″(0) ≤ 0 must hold. Equality cannot hold
here, since then the ξ_k would be constants with probability 1. Hence we
can put B″(0) = −a² < 0, and we have thus
i Cjktk
= U= 1.2,...,«)
k=l
are mutually independent, too.
We have thus S, = J n and
i =
k =Él S - n - j==1 Í r « r - J=Í2 ^
which shows the independence of £, and rj.
2. The condition is sufficient. We may assume E(ξ_k) = 0. By the assump-
tion of the theorem

    E(e^{i(uξ + vη)}) = E(e^{iuξ}) E(e^{ivη}).   (62)

If we differentiate both sides of (62) with respect to v (which is allowed
because of Theorem 7 of § 2) and substitute v = 0 afterwards, we obtain

    E(η e^{iuξ}) = E(e^{iuξ}) E(η),   (63)

where

    φ(u) = E(e^{iuξ_k})

is the characteristic function of the random variables ξ_k. From
ч-fi—
{
^-1±«—nf-x:1
n k=
s№
j= j<k 1 1
(«j
1 J. Kawata and H. Sakamoto [1].
2 A. A. Singer [1].
    = (n − 1) a² (φ(u))ⁿ.   (67)

If we divide by (n − 1)(φ(u))ⁿ, we find

    φ″(u)/φ(u) − (φ′(u)/φ(u))² = −a².   (68)

The left hand side of (68) is the second derivative of ln φ(u). If we integrate
twice and consider that φ′(0) = iE(ξ_k) = 0, we find

    ln φ(u) = −a²u²/2.   (69)
    φ_ζ(t) = E(e^{i(t,ζ)}) = ∫_{−∞}^{+∞} ... ∫_{−∞}^{+∞} e^{i(t,x)} dF(x).   (2)

1. φ_ζ(0) = 1.

2. For every t

    |φ_ζ(t)| ≤ 1.
9>4(m), (4)
where
n
= u = (u1, . . . , u n), m= (5)
i=i
It suffices therefore to determine the characteristic function of £. Since,
by assumption, the random variables are independent, we have
    Σ_{h=1}^{n} Σ_{j=1}^{n} b_{hj} t_h t_j

is positive definite and the matrix B = (b_{hj}) is the inverse of the matrix
A = (a_{hj}), the elements of which are the coefficients in the expression of
the density function of η. The matrix B is thus the dispersion matrix of η.
In fact, a simple calculation shows that the matrix B can be written in the
form B = C S C⁻¹, where S is the diagonal matrix with the elements σ_k².
On the other hand, we have proved (cf. Ch. IV, § 17, Exercise 42) that the
density function g(y) of η with y = (y₁, ..., y_n) is of the normal form: if the
quadratic form

    Σ_{h=1}^{n} Σ_{j=1}^{n} a_{hj} z_h z_j

is positive definite and its determinant is denoted by |A|, then the characteristic
function of η is given by (7), where m = (m₁, ..., m_n) and where (b_{hj}) = B =
= A⁻¹ is the dispersion matrix of η.
There exists an inversion formula for «-dimensional characteristic func
tions too. It is given by
n S t ) ^ E ( e i^ ‘) = <pt(au), (12)
I «1=1,
k=1
the theorem follows from the uniqueness theorem.
<pft ) = <p„(t).
(P d O = (P d t ) (P n S t )- O 3)
It follows from (12) that
<Pftd) = <pfta) <p„{ta), (14)
and the proof can be finished as that of Theorem 3.
Proof. It follows from the assumption that the random variables e^{it_kξ_k}
are also independent, hence

    φ_ζ(t) = E(∏_{k=1}^{n} e^{it_kξ_k}) = ∏_{k=1}^{n} E(e^{it_kξ_k}) = ∏_{k=1}^{n} φ_{ξ_k}(t_k).   (16)
then by (15) it follows that the characteristic functions of F(x) and G(x)
are identical. Hence, because of the uniqueness, F(x) = G(x) and the
random variables ξ_k are independent.
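The factorization criterion can be observed empirically. The sketch below is a modern illustration of ours (assuming NumPy): for two independent standard normal components, the empirical two-dimensional characteristic function agrees with the product of the one-dimensional ones, and with the theoretical value e^{−(t₁²+t₂²)/2}.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)   # xi_1
y = rng.normal(size=n)   # xi_2, independent of xi_1

t1, t2 = 0.7, -1.2
# Empirical two-dimensional characteristic function at (t1, t2) ...
joint = np.mean(np.exp(1j * (t1 * x + t2 * y)))
# ... and the product of the one-dimensional ones.
product = np.mean(np.exp(1j * t1 * x)) * np.mean(np.exp(1j * t2 * y))
```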
As an application of the preceding theorem we prove now the following
theorem due to M. Kac (cf. Ch. III, § 11, Theorem 6).
    φ_ξ(t₁) = Σ_{k=0}^{∞} E(ξᵏ)(it₁)ᵏ/k!   and   φ_η(t₂) = Σ_{l=0}^{∞} E(ηˡ)(it₂)ˡ/l!

for every complex value of t₁ and t₂. If we put ζ = (ξ, η) and t = (t₁, t₂),
it follows from (17) that

    φ_ζ(t) = Σ_{k=0}^{∞} Σ_{l=0}^{∞} E(ξᵏηˡ)(it₁)ᵏ(it₂)ˡ/(k! l!) = φ_ξ(t₁) φ_η(t₂).

Hence the theorem is proved.
Theorem 3 of § 4 may also be extended to the case of higher dimensions:
furthermore, for any A > 0 the convergence in (19) is uniform for |t| ≤ A,
where |t| = √(t₁² + ... + t_n²).
The proof of this theorem is omitted here, as it is essentially the same
as that of Theorem 3 of § 4.
As an application of Theorem 7 we show that the multinomial distri-
bution tends to the normal distribution. Suppose that the random vector
ζ_N = (ξ_{N1}, ..., ξ_{Nn}) (N = 1, 2, ...) has a multinomial distribution

    P(ξ_{N1} = k₁, ..., ξ_{Nn} = k_n) = (N!/(k₁! ... k_n!)) p₁^{k₁} ... p_n^{k_n},   (20)

    Σ_{j=1}^{n} k_j = N,   p_j > 0,   Σ_{j=1}^{n} p_j = 1.

Let η_N denote the random vector (η_{N1}, ..., η_{Nn}) with

    η_{Nj} = (ξ_{Nj} − Np_j)/√(Np_j)   (j = 1, 2, ..., n).

We obtain for the characteristic function of η_N

    φ_{η_N}(t) = exp(−i√N Σ_{j=1}^{n} t_j √p_j) · [Σ_{j=1}^{n} p_j exp(it_j/√(Np_j))]^N,

hence

    ln φ_{η_N}(t) = −i√N Σ_{j=1}^{n} t_j √p_j + N ln[1 + Σ_{j=1}^{n} p_j (exp(it_j/√(Np_j)) − 1)].   (22)

Thus

    lim_{N→∞} ln φ_{η_N}(t) = −(1/2)[Σ_{j=1}^{n} t_j² − (Σ_{j=1}^{n} t_j √p_j)²].   (23)
The limit distribution has the characteristic function given by the exponential
of the right hand side of (23); it is a degenerate normal distribution: in effect,
the random variables η_{Nk} are connected by the linear relation

    Σ_{k=1}^{n} η_{Nk} √p_k = 0,

and the limit distribution is concentrated on the hyperplane

    Σ_{k=1}^{n} x_k √p_k = 0.
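The limit behaviour above is easy to watch in simulation. The following sketch is our illustration (assuming NumPy; N, p, and the sample size are arbitrary choices): the standardized components have variance 1 − p_j, and they satisfy the linear relation identically.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 2000, 50_000
p = np.array([0.2, 0.3, 0.5])

counts = rng.multinomial(N, p, size=m)      # the vector zeta_N
eta = (counts - N * p) / np.sqrt(N * p)     # eta_Nj = (xi_Nj - N p_j)/sqrt(N p_j)

# Var(xi_Nj) = N p_j (1 - p_j), so each component of eta has variance 1 - p_j,
var_emp = eta.var(axis=0)
# and the components satisfy sum_j eta_Nj * sqrt(p_j) = 0 identically.
residual = float(np.abs(eta @ np.sqrt(p)).max())
```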
where γ and σ > 0 are real constants and G(u) is a nondecreasing bounded
function. (Formula of Lévy and Khinchin.)

If the distribution has a finite variance, (1) may be written in a form
involving a nondecreasing bounded function K(u). If, for instance, K(u)
has a single jump at u = h, that is

    K(u) = 0 for u < h,   K(u) = 1 for u ≥ h,

one obtains, with a suitable centering constant, a characteristic function of the form

    φ(t) = exp{λ(e^{ith} − 1)}.
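The characteristic function exp{λ(e^{ith} − 1)} belongs to h times a Poisson variable with parameter λ; this is easy to confirm numerically. A minimal sketch of ours (assuming NumPy; λ, h, t are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, h = 3.0, 2.0
xi = h * rng.poisson(lam, size=200_000)   # h times a Poisson(lambda) variable

t = 0.4
empirical = np.mean(np.exp(1j * t * xi))
theoretical = np.exp(lam * (np.exp(1j * t * h) - 1.0))
```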
§ 8. Stable distributions
    [φ(t)]^{1/n} = φ(q_n t) e^{ib_n t}

with q_n > 0 and a suitable real constant b_n; hence [φ(t)]^{1/n} (n = 1, 2, ...) is again a characteristic
function.
For a detailed study of stable distributions we refer to the books cited
in the footnote of the preceding paragraph. Lévy calls only those
nondegenerate distribution functions F(x) stable for which to any two
positive numbers c₁ and c₂ there exists a positive number c such that
F(c₁x) * F(c₂x) = F(cx). Distributions which we called stable above
are called quasi-stable by Lévy. It can be shown that a distribution with
characteristic function φ(t) is stable in the sense of P. Lévy iff ln φ(t)
may be written in the form (3) with γ = 0 for α ≠ 1 and β = 0 for α = 1.
Thus the following result is valid:
Theorem 2. If a distribution with the characteristic function φ(t) is stable
in the sense of P. Lévy, ln φ(t) can be written in the form
    φ(t₀/qⁿ) = 0   (n = 1, 2, ...)   (9a)

and

    φ(qⁿ t₀) = 0   (n = 1, 2, ...),   (9b)

which is impossible both for q > 1 (9a) and for q < 1 (9b), in view of
φ(0) = 1 and the fact that φ(t) is continuous at t = 0.
Thus ψ(t) = ln φ(t) is continuous too, and ψ(0) = 0. Let c₁, c₂, ..., c_n
be any n positive numbers; according to (7) there exists a c > 0 with

    Σ_{k=1}^{n} ψ(c_k t) = ψ(ct).

Hence, for c₁ = c₂ = ... = c_n = 1 there exists a c(n) with

    nψ(t) = ψ(c(n) t).   (10)

Thus

    (n/m) ψ(t) = (1/m) ψ(c(n) t) = ψ(c(n) t / c(m)).

If we put c(n/m) = c(n)/c(m), we obtain

    (n/m) ψ(t) = ψ(c(n/m) t).   (11)
From ψ(at) = ψ(bt) and a < b it follows that ψ(t) = ψ((a/b)ⁿ t), and because
of the continuity of ψ(t), we have ψ(t) = ψ(0) = 0, which is impossible
since the distribution was supposed to be nondegenerate. Hence necessarily
a = b and c(r) is thus unique. Since further

    ψ(c(r) c(s) t) = r ψ(c(s) t) = rs ψ(t)

for every t, (13) holds for any two positive rational numbers r, s. Let now
q be a rational number, q > 1, and t a real number such that ψ(t) ≠ 0.
Then

    ψ([c(q)]ⁿ t) = qⁿ ψ(t).   (14)

We have necessarily c(q) > 1, since c(q) ≤ 1 would imply
a contradiction. One shows further that

    λψ(t) = ψ(c(λ) t)   (15)

and

    c(λμ) = c(λ) c(μ)   (16)

are valid for every positive value of λ and μ, and that c(λ), as a function
of the real variable λ > 0, is increasing. Put g(x) = ln c(eˣ); then

    g(x + y) = g(x) + g(y)   (18)

is valid. Hence (cf. Ch. III, § 13) g(x) is a linear function, and

    c(x) = x^{1/α}   for x > 0.   (20)

This leads to

    λψ(t) = ψ(λ^{1/α} t)   (21)

for every real t and positive λ. Thus for t > 0 we have

    ψ(λt) = λ^α ψ(t) = t^α ψ(λ),   (22)

whence, for suitable real constants c₀ and c₁,

    ψ(t) = −(c₀ + ic₁ sgn t) |t|^α.   (24)

Because of |φ(t)| ≤ 1, c₀ ≥ 0 holds. It remains to show that 0 < α ≤ 2.
But α > 2 would imply φ″(0) = 0 and hence D²(ξ) = 0, and thus ξ would
be a constant. Herewith (6a) is proved.
where the numbers N_k are integers. It is easy to see that f(x) ∈ C and
g(x) ∈ K imply f(x) · g(x) ∈ C.

A sequence of functions {f_n(x)} (f_n(x) ∈ C; n = 1, 2, ...) is said to be
regular if for every h(x) ∈ C the limit
exists. Two regular sequences {f_n(x)} and {g_n(x)} are said to be equivalent
if for every h(x) ∈ C the corresponding limits coincide; the generalized
function F(x) ~ {f_n(x)} is defined by this limit, which exists by assumption
and remains the same when {f_n(x)} is replaced by another sequence
equivalent to it.

If F(x) ~ {f_n(x)} is regular, the sequence {λf_n(ax + b)} is evidently
regular for any two real numbers a, b and for any complex number λ.
We put therefore λF(ax + b) ~ {λf_n(ax + b)}. If F(x) ~ {f_n(x)} and
G(x) ~ {g_n(x)}, the sequence {f_n(x) + g_n(x)} is again regular; we put
F(x) + G(x) ~ {f_n(x) + g_n(x)}. Finally, if {f_n(x)} is regular and if g(x) ∈ K,
the sequence {f_n(x) g(x)} is regular. For F(x) ~ {f_n(x)} we put
F(x)g(x) ~ {f_n(x) · g(x)}.

If {f_n(x)} is regular, {f_n′(x)} is also regular, because if h(x) ∈ C we have

    lim_{n→∞} ∫_{−∞}^{+∞} f_n′(x) h(x) dx = −∫_{−∞}^{+∞} F(x) h′(x) dx.
Proof. f(x) = O(|x|^{−k}) for every integer k, hence the integral (7) exists
for every real number t; further

    ∫_{−∞}^{+∞} |f(x)| · |x|^k dx < +∞.

The integral on the right hand side exists, since f(x) ∈ C; hence φ(t) =
= O(|t|^{−N}) for |t| → +∞ and for every integer N. By (8) the same holds for
φ^{(k)}(t), since (ix)^k f(x) ∈ C; hence Theorem 1 is proved.

The function φ(t) is the Fourier transform of f(x).

    f(x) = (1/2π) ∫_{−∞}^{+∞} φ(t) e^{−itx} dt.   (10)

Proof. The preceding theorem guarantees the existence of ∫_{−∞}^{+∞} |φ(t)| dt, and
(10) follows from Theorem 2 of § 4.
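The inversion formula (10) can be verified numerically for the standard normal law, where φ(t) = e^{−t²/2}. A sketch of ours (assuming NumPy; truncation T and grid size are arbitrary choices, justified because φ decays rapidly):

```python
import numpy as np

def invert(phi, x, T=20.0, m=200_001):
    # f(x) = (1/2pi) * integral_{-T}^{T} phi(t) e^{-itx} dt, Riemann sum;
    # the tails beyond |t| = T are negligible for phi(t) = e^{-t^2/2}.
    t = np.linspace(-T, T, m)
    dt = t[1] - t[0]
    return float(np.real(np.sum(phi(t) * np.exp(-1j * t * x))) * dt / (2 * np.pi))

phi_normal = lambda t: np.exp(-t**2 / 2)
f0 = invert(phi_normal, 0.0)   # density of N(0,1) at 0
f1 = invert(phi_normal, 1.0)   # density of N(0,1) at 1
```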
    φ(t) = ∫_{−∞}^{+∞} f(x) e^{itx} dx,   γ(t) = ∫_{−∞}^{+∞} g(x) e^{itx} dx,
then we have

    ∫_{−∞}^{+∞} f(x) g(x) dx = (1/2π) ∫_{−∞}^{+∞} φ(−t) γ(t) dt.   (11)

Proof. By Theorem 2,

    ∫_{−∞}^{+∞} f(x) g(x) dx = (1/2π) ∫_{−∞}^{+∞} γ(t) φ(−t) dt.
    Φ(t) = ∫_{−∞}^{+∞} F(x) e^{itx} dx.   (12)

If h(x) is a function of the class C and χ(t) is its Fourier transform, we
have

    ∫_{−∞}^{+∞} Φ(t) χ(t) dt = 2π ∫_{−∞}^{+∞} F(x) h(−x) dx.   (13)

We say that the generalized function Φ(t) is the Fourier transform of
F(x).
P roof . Let y(t) £ C and
+ oo
P roof . Put
_ + 00
Let f(x) ∈ D and let F(x) ~ {f_n(x)} be the generalized function corre-
sponding to it (in a unique manner) according to Theorem 5. Write
f(x) ~ F(x).

Let now ξ be a random variable on a conditional probability space with
density function f(x), and assume that f(x) ∈ D. The characteristic function
Φ_ξ(t) of ξ is defined by (12),
with F(x) ~ f(x) in the sense of the above correspondence; Φ_ξ(t) is thus a generalized function. If

    ∫_{−∞}^{+∞} f(x) dx = 1,

then Φ_ξ(t) corresponds to the ordinary characteristic function

    φ_ξ(t) = ∫_{−∞}^{+∞} f(x) e^{ixt} dx.

Furthermore, by Theorem 1,

    ∫_{−∞}^{+∞} Φ_ξ(t) χ(t) dt = 2π ∫_{−∞}^{+∞} F(x) h(−x) dx.   (19)

By the definition of F(x), the right hand sides of (18) and (19) coincide;
we thus get (17).
Since

    ∫_{−∞}^{+∞} e^{−x²/(2n)} e^{itx} dx = √(2πn) · e^{−nt²/2},   (21)
we obtain
hence f_k(x) ~ {exp(ikx − x²/(2n))}. And since
    Φ(t) = ∫_{−∞}^{+∞} F(x) e^{itx} dx.
—00
Proof. Let f_n(x) denote the density function of ζ_n and σ the standard
deviation of the ξ_k; the assumption f(x) ≤ M implies f_n(x) ∈ D. We show that for every
h(x) ∈ C

    lim_{n→∞} σ√(2πn) ∫_{−∞}^{+∞} f_n(x) h(x) dx = ∫_{−∞}^{+∞} h(x) dx.   (27)
Relation (27) proves the theorem. Indeed, if it holds for every h(x) ∈ C,
let h(a, b, ε, x) be a function of the class C such that

    h(a, b, ε, x) = 0 for x ≤ a − ε,   h(a, b, ε, x) = 1 for a ≤ x ≤ b,
    h(a, b, ε, x) = 0 for x ≥ b + ε.¹   (29)
We have then

    P(a < ζ_n < b | c < ζ_n < d) ≤

    ≤ ∫_{−∞}^{+∞} f_n(x) h(a, b, ε, x) dx / ∫_{−∞}^{+∞} f_n(x) h(c + ε, d − ε, ε, x) dx.   (30)

Since

    P(a < ζ_n < b | c < ζ_n < d) = ∫_a^b f_n(x) dx / ∫_c^d f_n(x) dx,   (31)
    h(a, b, ε, x) = k((x − a + ε)/ε) − k((x − b)/ε),

where we put

    k(x) = 0 for x ≤ 0,

    k(x) = ∫_0^x exp(−1/(t(1 − t))) dt / ∫_0^1 exp(−1/(t(1 − t))) dt for 0 < x < 1,

    k(x) = 1 for x ≥ 1.
where

    l = lim inf_{n→∞} P(a < ζ_n < b | c < ζ_n < d)

and

    L = lim sup_{n→∞} P(a < ζ_n < b | c < ζ_n < d).

When ε → 0, the first and the last member of the threefold inequality (32)
tend to (b − a)/(d − c), hence (27) implies (26).

Let now F_n(x) ~ f_n(x) and let φ_n(t) be the Fourier transform of
σ√(2πn) f_n(x). By Theorem 6 it suffices to prove that Φ_n(t) → δ(t), where
Φ_n(t) is the generalized function corresponding to φ_n(t) (n = 1, 2, ...)
and δ(t) is Dirac's delta.
Put

    φ(t) = ∫_{−∞}^{+∞} f(x) e^{itx} dx.

We see that φ_n(t) = σ√(2πn) · φⁿ(t). We have to show that for every χ(t) ∈ C
the integrals ∫_{−∞}^{+∞} φ_n(t) χ(t) dt converge to the corresponding integral for
Dirac's delta, i.e. to χ(0).
The proof can be carried out by means of the method of Laplace (cf. Ch.
III, § 18, Exercise 27).

By Theorem 11 of § 2 we have |φ(t)| < 1 for t ≠ 0; furthermore, by
Theorem 8 of § 2,

    lim_{|t|→∞} φ(t) = 0.

Hence there can be assigned to every ε > 0 a q = q(ε) with 0 < q(ε) < 1
such that |φ(t)| ≤ q(ε) for |t| ≥ ε. The part of the integral with |t| ≥ ε is
therefore negligible, while near t = 0

    ln φ(t) = −(σ²t²/2)(1 + η(t))   with   lim_{t→0} η(t) = 0.
—s
+eajn
for every h(x) ∈ C, where on the right hand side there figures an ordinary Stieltjes
integral. Then Φ(t), the Fourier transform of F(x), will be considered as
the characteristic function of the random variable ξ.
Example. Suppose that ξ is uniformly distributed on the set of the integers,
i.e. the distribution function of ξ is given by [x] ([x] represents the integer
part of x, i.e. the largest integer smaller than or equal to x). In this case

    F(x) = Σ_{k=−∞}^{+∞} δ(x − k).   (38)

    Σ_{k=−∞}^{+∞} e^{−k²/(2n)} e^{ikt} = √(2πn) Σ_{k=−∞}^{+∞} e^{−n(t − 2kπ)²/2}.   (41)

This is a formula known from the theory of θ-functions. We shall need
it later on.
Now follows a theorem similar to Theorem 7.1
and their variance finite. Suppose further that the greatest common divisor
of the values assumed by ξ₁ − ξ₂ with positive probability is equal to 1. Put
ζ_n = ξ₁ + ξ₂ + ... + ξ_n (n = 1, 2, ...); then for any two integers k and l

    ∫_{−∞}^{+∞} F_n(x) h(x) dx = σ√(2πn) Σ_{k=−∞}^{+∞} P(ζ_n = k) h(k),
§ 10. Exercises
    φ_ξ(t) = Σ_{n=1}^{∞} p_n e^{iλ_n t}

is absolutely and uniformly convergent, hence it can be integrated term by term.
Furthermore, since for every nonzero real number x

    lim_{T→∞} (1/2T) ∫_{−T}^{+T} e^{ixt} dt = 0,

the theorem follows immediately.
2. Let ξ be an integer-valued random variable and let φ_ξ(t) be its characteristic
function. Prove that

    P(ξ = k) = (1/2π) ∫_{−π}^{+π} φ_ξ(t) e^{−ikt} dt   (k = 0, ±1, ±2, ...).

3. Prove the theorem of de Moivre and Laplace by means of the result of the preceding
exercise.

Hint. By Exercise 2,
we have

    H(f(x)) ≤ ln √(2πe),

where equality holds only for f(x) = (2π)^{−1/2} exp(−x²/2). Hence H(f(x)) assumes
its largest value in the case of the normal distribution. (In information theory the
number H(f(x)) is called the entropy of the distribution with the density function f(x);
cf. Appendix.)
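The extremal property of the normal entropy can be checked on a grid. A modern sketch of ours (assuming NumPy; the comparison density, a variance-1 uniform, is our arbitrary choice):

```python
import numpy as np

x = np.linspace(-10, 10, 200_001)
dx = x[1] - x[0]

# Two densities with zero mean and unit variance:
normal = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
uniform = np.where(np.abs(x) < np.sqrt(3), 1 / (2 * np.sqrt(3)), 0.0)

def entropy(f):
    # H(f) = -integral f ln f dx, with the convention 0 * ln 0 = 0.
    mask = f > 0
    return float(-np.sum(f[mask] * np.log(f[mask])) * dx)

H_normal = entropy(normal)    # ln sqrt(2 pi e)
H_uniform = entropy(uniform)  # ln(2 sqrt(3)), which is smaller
```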
5. If E(ξ) exists, then we know that φ_ξ(t) is differentiable at t = 0 and φ_ξ′(0) = iE(ξ).
Show that the differentiability of φ_ξ(t) does not necessarily imply the existence of E(ξ).

Hint. Put P(ξ = +n) = P(ξ = −n) = c/(n² ln n) (n = 2, 3, ...). We find

    φ_ξ(t) = 2c Σ_{n=2}^{∞} cos nt/(n² ln n),   φ_ξ′(t) = −2c Σ_{n=2}^{∞} sin nt/(n ln n).

The trigonometric series φ_ξ′(t) is uniformly convergent¹ and φ_ξ′(0) = 0. Nevertheless,
E(ξ) does not exist.
6. Let ξ be a random variable and M_α = E(|ξ|^α), α > 0. Suppose that M_α is finite.
Show that if 0 < β < α,

    (M_β)^{1/β} ≤ (M_α)^{1/α}.

Hint. For positive a and b, p > 1, q = p/(p − 1), we have² ab ≤ a^p/p + b^q/q.
Apply this inequality with

    a = |ξ|^β/(M_α)^{β/α},   b = 1,   p = α/β.
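The moment inequality of Exercise 6 can be checked in closed form for an exponential variable, where the absolute moments are Gamma values. A sketch of ours (the exponent pairs are arbitrary choices):

```python
import math

# For xi ~ Exp(1): M_a = E(xi**a) = Gamma(a + 1).
def M(a):
    return math.gamma(a + 1.0)

pairs = [(1.0, 2.0), (0.5, 3.0), (2.0, 4.0)]   # (beta, alpha) with beta < alpha
results = [(M(b) ** (1 / b), M(a) ** (1 / a)) for b, a in pairs]
# Each pair satisfies (M_beta)^(1/beta) <= (M_alpha)^(1/alpha).
```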
7. Study the limit distribution of the multinomial distribution.

8. a) Show that if E(|ξ|) = ∫_{−∞}^{+∞} |x| dF(x) exists, then

    E(|ξ|) = (1/π) ∫_{−∞}^{+∞} (1 − Re(φ(t)))/t² dt.   (1)

Hint. From

    1 − Re(φ(t)) = ∫_{−∞}^{+∞} (1 − cos xt) dF(x)

follows

    ∫_{−∞}^{+∞} (1 − Re(φ(t)))/t² dt = ∫_{−∞}^{+∞} (∫_{−∞}^{+∞} (1 − cos xt)/t² dt) dF(x) = π ∫_{−∞}^{+∞} |x| dF(x).
b) If we add to the assumption of a) the further assumption that the variance of the
ξ_n exists, show that

    lim_{n→∞} E(|ζ_n|)/D(ζ_n) = √(2/π).
Hint. If rp{t) is the characteristic function of the random variables £„, we have
9>CWi(0 = 9>M=)1 •
L
Since (pit) is real for every real t, we obtain, by taking into account Exercise 8,
_ CO
</(?*) Jn r tp'(u)
~n7F\
0(C „) = - 31 .) V (") -----
и du-
—со
10. If Φ(t) is the Fourier transform of the generalized function F(x), the Fourier
transform of F(ax + b) is

    (1/|a|) e^{−ibt/a} Φ(t/a).

11. With the same notations, the Fourier transform of F′(x) is −it Φ(t).

12. a) With the preceding notations, the Fourier transform of xⁿF(x) is (−i)ⁿ Φ⁽ⁿ⁾(t).

b) If the conditional density function of ξ is x^{2n} (n = 1, 2, ...), the (generalized)
characteristic function of ξ is 2π(−1)ⁿ δ^{(2n)}(t), where δ^{(2n)} denotes the (2n)-th
derivative of Dirac's delta (cf. § 9, p. 355).
13. a) Let ξ₁, ..., ξ_n be independent random variables having the same normal
distribution, E(ξ_k) = m, ζ = (1/n)(ξ₁ + ξ₂ + ... + ξ_n), and
14. Prove that the following property characterizes the normal distribution: If
f(x) (−∞ < x < +∞) is a continuously differentiable positive density function
such that for any three real numbers x, y, z the function

    f(x − t) f(y − t) f(z − t)

has its maximum at t = (x + y + z)/3, then

    f(x) = (1/(σ√(2π))) exp(−x²/(2σ²)).

Hint. By assumption, f(x) is positive. If we put g(x) = f′(x)/f(x) and s = (x + y + z)/3,
we have

    g(x − s) + g(y − s) + g(z − s) = 0.   (2)

For x = y = z, we obtain g(0) = 0. Take now any two x and y and put z = −(x + y);
then s = 0, and the relation (2) together with g(0) = 0 leads to

    g(x) + g(y) + g(−x − y) = 0,   whence   g(x + y) = g(x) + g(y).
<">■>
is the characteristic function of an infinitely divisible distribution.
20. We know (Ch. IV, § 10) that the quotient of two independent N(0, 1) random
variables has a Cauchy distribution. Show that this property is not characteristic for
the normal distribution: if ξ and η are independent, have the same distribution with
zero expectation, and if ξ/η has a Cauchy distribution, then it does not follow that
ξ and η are normally distributed.

Hint. Take for the density function of ξ and η

    f(x) = (√2/π) · 1/(1 + x⁴)   (−∞ < x < +∞).
1 f(x) is the density function of £_z; where £ is N (0, 1); this distribution is some
times called the “inverse normal distribution”.
2 Cf. R. G. Laha [1].
CHAPTER VII

LAWS OF LARGE NUMBERS
    M_α = E(|ξ − M|^α)   (2)

(thus M_α is the α-th absolute central moment of ξ), then we get

    P(ξ ≥ (t + ln M(ε))/ε) ≤ e^{−t}.   (6)

In order to get the sharpest possible bound we have to choose ε such that
the expression (t + ln M(ε))/ε is minimal, or at least nearly minimal.

In § 4 an improvement of Chebyshev's inequality, which is due to Bern-
stein, will be deduced from inequality (6).
§ 2. Stochastic convergence
for any fixed £ > 0 we shall say that the sequence £„ (n = 1 , 2 , . . . ) con
verges in probability (or stochastically) to the constant a and indicate this by
lim st £„ = a (3)
n-*-co
or by
Z„^a. ( 4)
In particular, if we put λ = ε√(n/(pq)), (5) becomes (6);
if now n tends to infinity, the expression on the right of (6) tends to 0, which
proves the theorem.
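Bernoulli's law of large numbers is easy to observe by simulation. The following is an illustrative sketch of ours (assuming NumPy; p, ε, and the number of trials are arbitrary choices): the probability of a deviation larger than ε shrinks as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)
p, eps, trials = 0.5, 0.05, 2000

def deviation_prob(n):
    # Monte Carlo estimate of P(|zeta_n - p| > eps) for the relative
    # frequency zeta_n of an event of probability p in n experiments.
    freqs = rng.binomial(n, p, size=trials) / n
    return float(np.mean(np.abs(freqs - p) > eps))

probs = [deviation_prob(n) for n in (10, 100, 1000)]
```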
The definition of stochastic convergence can also be given in the follow-
ing form: the sequence ζ_n (n = 1, 2, ...) converges stochastically to the
number p when to every pair ε, δ of positive numbers (however small) there
can be chosen a number N = N(ε, δ) so that for every n > N

    P(|ζ_n − p| > ε) < δ.

If F_n(x) denotes the distribution function of ζ_n, stochastic convergence to p
means that

    lim_{n→∞} F_n(x) = 0 for x < p,   1 for x > p.   (8)

If D_p(x) denotes the (degenerate) distribution function of the constant p,
(8) is equivalent to
hence

    P(ζ_n < x) ≤ P(ζ_n < x | Ā_n) + P(A_n).   (13b)

But

    P(ζ_n < x | Ā_n) ≤ P(ζ < x + ε | Ā_n) ≤ P(ζ < x + ε)/P(Ā_n),   (14)

hence

    lim sup_{n→∞} P(ζ_n < x) ≤ P(ζ < x + ε) for every ε > 0.   (15)

Similarly,

    P(ζ_n < x) ≥ P(Ā_n) P(ζ_n < x | Ā_n) ≥ P(ζ < x − ε) − P(A_n),   (13c)

hence

    lim inf_{n→∞} P(ζ_n < x) ≥ P(ζ < x − ε) for every ε > 0.   (16)

Since ε can be chosen arbitrarily small, (15) and (16) imply the statement
of Theorem 1.
    ζ_n = (ξ₁ + ξ₂ + ... + ξ_n)/n,

where ξ_k = 1 if the event A occurs at the k-th experiment and ξ_k = 0
otherwise. By Chebyshev's inequality

    P(|ζ_n − M| > ε) ≤ pq/(nε²) ≤ 1/(4nε²),

which proves the statement of Theorem 1.
Suppose now that the (pairwise independent) random variables ξ_k satisfy

    lim_{n→∞} (1/n) Σ_{k=1}^{n} E(ξ_k) = M

and that

    S_n = √(Σ_{k=1}^{n} D²(ξ_k))

satisfies¹

    lim_{n→∞} S_n/n = 0.   (3)

Then for the random variable

    ζ_n = (1/n) Σ_{k=1}^{n} ξ_k

the relation

    ζ_n ⇒ M

is valid.
¹ This condition is certainly fulfilled, e.g., if the random variables ξ_k (or at least
the numbers D_k) are uniformly bounded.
    ζ_n* = (1/n) Σ_{k=1}^{n} (ξ_k − M_k).

Taking into account that E(ζ_n*) = 0 and D²(ζ_n*) = S_n²/n², we obtain the re-
lation

    lim st ζ_n* = 0.
    n→∞

Now

    ζ_n = ζ_n* + (1/n) Σ_{k=1}^{n} M_k,

and by assumption

    |(1/n) Σ_{k=1}^{n} M_k − M| < ε/2

if n is large enough. As |ζ_n − M| > ε can hold only if |ζ_n*| > ε/2, it
follows that

    lim st (ζ_n − M) = 0.   (4)
    n→∞
The assumptions of the above theorem can still be weakened. Instead of the
pairwise independence of the ξ_k it suffices to assume that there is no
strong positive correlation between most of the pairs. More precisely, the follow-
ing theorem holds, due essentially to S. N. Bernstein:
Then

    ζ_n = (1/n) Σ_{k=1}^{n} ξ_k

converges in probability to M:

    ζ_n ⇒ M.   (5)
if this is done, the remaining part of the proof can be repeated word for word.
We prove therefore (6). We have

    Σ_{i=1}^{n−k} D_i D_{i+k} ≤ S_n²;
n In
— X *(*)|-
1
1
( 8)
Hence by condition b)
z)2(C„) <; — + 2K (— X а д ) .
where

    ζ_n = (1/n) Σ_{k=1}^{n} ξ_k.
Thus we have

Theorem 4. Let ξ₁, ξ₂, ... be pairwise independent and identically distributed
random variables and suppose that the expectation

    E(ξ_k) = M   (9)

exists. Then for

    ζ_n = (1/n) Σ_{k=1}^{n} ξ_k

one has

    ζ_n ⇒ M.   (10)
    (1/n) Σ_{k=1}^{n} E(ξ_k*) = (1/n) Σ_{k=1}^{n} ∫_{−k}^{+k} x dF(x).   (12)
Since by assumption the expectation E(ξ_k) exists, we can write

    lim_{n→∞} (1/n) Σ_{k=1}^{n} E(ξ_k*) = 0,   (13)
and consequently
l i m ~ Z ^ ) = °- <15)
n~ со И fc-1
If we put
C —k £ {.+ n k = 1
i
fc= .-+ 1
Theorem 2 implies
lim st C*r = 0 .
n-*-oo
P(C*r # и < s.
lim P ( \ U > £ ) = 0 ,
CO
t (ill"
lim <pn(t) - lim 1 H----- £ — = 1, (21)
Л -* 00 П -* 00 ^ ^ , /
    lim_{x→∞} x[F(−x) + (1 − F(x))] = 0,

and that these conditions are not only sufficient but necessary as well for

    ζ_n = (1/n) Σ_{k=1}^{n} ξ_k

to converge in probability to a constant as n → ∞. As regards the proof
of this theorem of Kolmogorov, cf. § 14, Exercise 24.
We now give an example to which the law of large numbers does not
apply. If the random variables ξ_k are completely independent and if all
have the same Cauchy distribution (with the density function 1/(π(1 + x²))),
then

    ζ_n = (1/n) Σ_{k=1}^{n} ξ_k
has the same Cauchy distribution as ξ_k. This can be interpreted as follows: when
we take a sample from a population having a Cauchy distribution with den-
sity function 1/(π(1 + (x − m)²)), we do not obtain more information concern-
ing the number m from the mean of a sample, however large, than from a
single observation.
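This failure of the law of large numbers shows up clearly in simulation. A sketch of ours (assuming NumPy; sample sizes are arbitrary): the interquartile range of the sample mean of Cauchy observations does not shrink as n grows, whereas for a finite-variance sample it would decrease like 1/√n.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 50_000

def iqr_of_mean(n):
    # Interquartile range of the mean of n standard Cauchy observations.
    means = rng.standard_cauchy(size=(m, n)).mean(axis=1)
    q25, q75 = np.percentile(means, [25, 75])
    return float(q75 - q25)

# The spread stays near 2 (the IQR of the standard Cauchy) for every n:
spreads = [iqr_of_mean(n) for n in (1, 10, 100)]
```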
It was already observed that the choice of ε minimizing (t + ln M(ε))/ε
makes the inequality (1) as sharp as possible.

We prove now the following

Theorem. Let ξ = ξ₁ + ξ₂ + ... + ξ_n, D(ξ) = D, and M(ε) = E(e^{εξ}); then

    ln M(ε) ≤ (ε²D²/2)(1 + (εK/3) e^{εK}).   (2)
P ro o f . Since (2)
k —1
    e^{εξ_k} = Σ_{n=0}^{∞} εⁿ ξ_kⁿ/n!

and the ξ_k are bounded, the series is uniformly convergent and the expecta-
tion of e^{εξ_k} can be calculated term by term from the power series. Thus we
obtain

    E(e^{εξ_k}) = 1 + ε²D_k²/2 + Σ_{n=3}^{∞} εⁿ E(ξ_kⁿ)/n!.   (3)

Since

    E(|ξ_k|ⁿ) ≤ D_k² Kⁿ⁻²,

we obtain

    Σ_{n=3}^{∞} εⁿ |E(ξ_kⁿ)|/n! ≤ D_k² Σ_{n=3}^{∞} εⁿ Kⁿ⁻²/n!.

As

    1/n! ≤ 1/(6(n − 3)!),

it follows that

    E(e^{εξ_k}) ≤ 1 + (ε²D_k²/2)(1 + (εK/3) e^{εK}),

which leads to

    M(ε) = E(e^{εξ}) ≤ exp[(ε²D²/2)(1 + (εK/3) e^{εK})],

hence

    P(ξ > [t + (ε²D²/2)(1 + (εK/3) e^{εK})]/ε) < e^{−t}.   (4)
Put

    ε = √(2t)/D.   (5)

Then (4) leads to (6); substitution of λ = √(2t) gives

    P(ξ > λD(1 + (λK/(6D)) e^{λK/D})) < e^{−λ²/2}.   (7)

Thus if λ is large and λK/D is small, we obtain a much sharper inequality than
that of Chebyshev.
If we apply the obtained result to −ξ as well, we find that

    P(|ξ| > λD(1 + (λK/(6D)) e^{λK/D})) < 2e^{−λ²/2}.   (8)

If λK/D ≤ 1, then

    e^{λK/D} < 3.

From this, putting μ = λ(1 + (λK/(6D)) e^{λK/D}), we obtain from (8), because of λ ≤ μ,

    P(|ξ| > μD) < 2 exp[−μ²/(2(1 + μK/(2D))²)].   (9)
We may free ourselves from the condition E(ξ_k) = 0 by applying (9) to the random variables ξ_k − E(ξ_k), i.e. to

$$\zeta - M = \sum_{k=1}^{n} [\xi_k - E(\xi_k)]:$$

$$P(|\zeta - M| > \mu D) < 2 \exp\left[-\frac{\mu^2}{2\left(1 + \dfrac{\mu K}{2D}\right)^2}\right]. \tag{10}$$

In this formula

$$M = \sum_{k=1}^{n} M_k \quad\text{and}\quad D = \sqrt{\sum_{k=1}^{n} D_k^2},$$

while μ is a positive number such that μ ≤ D/K.
Let us apply now this result to the case where the ξ_k have a common distribution. Let M_1 be the expectation of the ξ_k and D_1² their variance. Then the expectation of the sum ζ = Σ_{k=1}^n ξ_k is equal to nM_1 and its variance to nD_1². It follows from (10), for μ ≤ D_1√n / K, that

$$P(|\zeta - n M_1| > \mu D_1 \sqrt{n}) < 2 \exp\left[-\frac{\mu^2}{2\left(1 + \dfrac{\mu K}{2 D_1 \sqrt{n}}\right)^2}\right]. \tag{11}$$

If the ξ_k (k = 1, 2, ..., n) are the indicators of an event A in a sequence of n independent experiments, we get from (11) the inequality (12). Thus, for instance, Chebyshev's inequality guarantees

$$P\left(\left|\frac{\zeta_n}{n} - \frac{1}{2}\right| > \frac{1}{20}\right) < \frac{1}{100} \tag{13}$$

only for n > 10 000, while by using (12) we find that (13) holds already for n > 1283.

If we take ε = 1/50, we find, applying Chebyshev's inequality, that

$$P\left(\left|\frac{\zeta_n}{n} - \frac{1}{2}\right| > \frac{1}{50}\right) < \frac{1}{100}$$

is valid for n > 62 500; while applying (12) we see that it is valid already for n > 7164.

In these examples ε > 0 and δ > 0 were given and we wanted to estimate the least number n_0 = n_0(ε, δ) such that for n ≥ n_0

$$P(|\zeta_n - p| > \varepsilon) < \delta.$$
Lemma A. If

$$\sum_{n=1}^{\infty} P(A_n) < +\infty \tag{1}$$

and

$$A_\infty = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k, \tag{2}$$

then

$$P(A_\infty) = 0. \tag{3}$$

Remark. The right side of (2) is denoted in set theory by lim sup A_n. Clearly

$$P(A_\infty) \le \sum_{k=n}^{\infty} P(A_k) \tag{4}$$

for every n. Because of (1), the right hand side of (4) tends to zero as n → +∞; hence we have (3).
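Lemma A can be illustrated by simulation; in the sketch below (a modern aside, with the probabilities P(A_n) = 1/n² chosen by us so that (1) holds) only a handful of the events occur in any run, and the average number of occurrences is close to the finite sum Σ P(A_n) = π²/6:

```python
import random

random.seed(2)

# Independent events A_n with P(A_n) = 1/n**2, so sum P(A_n) < +infinity.
# Lemma A: with probability 1 only finitely many A_n occur; the expected
# total number of occurrences equals the finite sum of the P(A_n).
def occurrences(N):
    return sum(random.random() < 1.0 / n**2 for n in range(1, N + 1))

counts = [occurrences(10_000) for _ in range(500)]
print(sum(counts) / len(counts))   # close to pi**2/6, about 1.64
```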
If the A_n are completely independent, (1) is not only sufficient, but also necessary in order that with probability 1 at most finitely many of the A_n should occur: if Σ_{n=1}^∞ P(A_n) = +∞, then P(A_∞) is not only positive but equal to 1 (Lemma B). If

$$\sum_{n=1}^{\infty} P(A_n) = +\infty \tag{9}$$

and

$$\liminf_{n \to \infty} \frac{\displaystyle\sum_{k=1}^{n} \sum_{l=1}^{n} P(A_k A_l)}{\left(\displaystyle\sum_{k=1}^{n} P(A_k)\right)^2} = 1, \tag{10}$$

then (5) holds; thus there occur with probability 1 infinitely many of the events A_n.
Proof. Put

$$\alpha_n = \begin{cases} 1 & \text{if } A_n \text{ occurs,} \\ 0 & \text{otherwise.} \end{cases}$$

By Chebyshev's inequality,

$$P\left(\Big|\sum_{k=1}^{n} \alpha_k - \sum_{k=1}^{n} P(A_k)\Big| > \varepsilon \sum_{k=1}^{n} P(A_k)\right) \le \frac{D^2\left(\sum_{k=1}^{n} \alpha_k\right)}{\varepsilon^2 \left(\sum_{k=1}^{n} P(A_k)\right)^2}. \tag{11}$$
Now E(α_k α_l) = P(A_k A_l); hence, if we put

$$d_n = \frac{D^2\left(\sum_{k=1}^{n} \alpha_k\right)}{\left(\sum_{k=1}^{n} P(A_k)\right)^2},$$

it follows from (10) that

$$\liminf_{n \to \infty} d_n = 0. \tag{15}$$

It follows from this that one can choose an infinite subsequence of positive integers n_1 < n_2 < ... < n_j < ..., such that

$$\sum_{j=1}^{\infty} d_{n_j} < +\infty; \tag{16}$$

hence by Lemma A we have with probability 1

$$\Big|\sum_{k=1}^{n_j} \alpha_k - \sum_{k=1}^{n_j} P(A_k)\Big| \le \varepsilon \sum_{k=1}^{n_j} P(A_k),$$

except for a finite number of values of j. Thus by (9) the series Σ_{k=1}^∞ α_k is divergent with probability 1, which proves our statement.

The lemmas just proved will serve us well in proofs dealing with improvements of the law of large numbers.
§ 6. Kolmogorov’s inequality
Theorem (Kolmogorov's inequality). If η_1, η_2, ..., η_n are completely independent random variables with expectations M_j = E(η_j) and finite variances D_j² = D²(η_j), then

$$P\left(\max_{1 \le k \le n} \Big|\sum_{j=1}^{k} (\eta_j - M_j)\Big| \ge \varepsilon\right) \le \frac{\sum_{k=1}^{n} D_k^2}{\varepsilon^2}. \tag{1}$$

Proof. Put η_k* = η_k − M_k and ζ_k = Σ_{j=1}^k η_j* (k = 1, 2, ..., n). Let further A_k denote the event that ζ_k is the first among the random variables ζ_1, ζ_2, ..., ζ_n which is not less in absolute value than ε. Then

$$P\left(\max_{1 \le k \le n} |\zeta_k| \ge \varepsilon\right) = \sum_{k=1}^{n} P(A_k). \tag{2}$$
The right hand side of (3) becomes smaller if the term k = 0 is omitted from the summation. Hence, for j > k,

$$E(\eta_j^* \zeta_k \mid A_k) = \frac{E(\eta_j^* \zeta_k \alpha_k)}{P(A_k)} = \frac{E(\eta_j^*)\, E(\zeta_k \alpha_k)}{P(A_k)} = 0, \tag{6}$$

since η_j* is independent of ζ_k α_k and E(η_j*) = 0. Similarly, we obtain that

$$E(\eta_i^* \eta_j^* \mid A_k) = 0 \tag{7}$$

because of the independence of η_i*, η_j*, α_k for j > i > k. (5), (6) and (7) lead to

$$\sum_{k=1}^{n} D_k^2 \ge \varepsilon^2 \sum_{k=1}^{n} P(A_k),$$

which proves (1).
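The inequality can be checked by simulation; the sketch below is a modern aside with parameters of our own choosing, taking η_k = ±1 with probability 1/2 each (so M_k = 0, D_k² = 1):

```python
import random

random.seed(3)

# Empirical check of Kolmogorov's inequality: the probability that the
# maximal absolute partial sum of n independent +-1 steps reaches eps
# is at most (sum of the variances)/eps**2 = n/eps**2.
n, eps, trials = 50, 12.0, 20_000
hits = 0
for _ in range(trials):
    s, peak = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        peak = max(peak, abs(s))
    if peak >= eps:
        hits += 1

lhs = hits / trials
bound = n / eps**2
print(lhs, "<=", bound)
```

The empirical left-hand side comes out well below the bound n/ε² ≈ 0.35, as the theorem requires.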
Lemma 1. If

$$P(\lim_{n \to \infty} \eta_n = 0) = 1, \tag{1}$$

then for every ε > 0

$$\lim_{n \to \infty} P\left(\sup_{m \ge n} |\eta_m| \ge \varepsilon\right) = 0, \tag{2}$$

and consequently

$$\lim_{n \to \infty} \operatorname{st} \eta_n = 0. \tag{3}$$

Proof. We show first that (2) follows from (1). Let ε > 0; let A_n(ε) denote the event sup_{m≥n} |η_m| ≥ ε and C the event lim_{n→∞} η_n = 0; put further B_n(ε) = C A_n(ε). Then B_{n+1}(ε) ⊆ B_n(ε) and the set ∩_{n=1}^∞ B_n(ε) is obviously empty. It follows from this (cf. Ch. II, § 7, Theorem 3) that lim_{n→∞} P(B_n(ε)) = 0; since P(C) = 1, we have P(B_n(ε)) = P(A_n(ε)).

Thus all relations

$$|\zeta_{n+k} - p| < \varepsilon \quad (k = 1, 2, \ldots)$$

are simultaneously fulfilled with a probability > 1 − δ for ε > 0 and δ > 0 however small, if the index n is larger than a number n_0 depending on ε and δ.
Proof of Theorem 1. If the inequality Δ_N > ε is fulfilled for an N such that 2^s ≤ N < 2^{s+1}, then Δ_{2^l, 2^{l+1}} > ε is fulfilled for at least one l ≥ s. Hence

$$P(\Delta_N > \varepsilon) \le \sum_{l=s}^{\infty} P(\Delta_{2^l,\, 2^{l+1}} > \varepsilon) \quad\text{for } 2^s \le N < 2^{s+1}. \tag{5}$$

If

$$\zeta_n = \frac{1}{n}\sum_{k=1}^{n} \xi_k,$$

then

$$P(\lim_{n \to \infty} \zeta_n = 0) = 1, \tag{9}$$

and hence

$$P(\lim_{n \to \infty} \zeta_n = M) = 1.$$

The hypothesis of the existence of the variance in Theorem 1 is therefore superfluous.

It follows from Δ_N > ε, 2^s ≤ N < 2^{s+1} that Δ_{2^l, 2^{l+1}} ≥ ε for at least one l ≥ s; hence, by Kolmogorov's inequality,

$$P(\Delta_N > \varepsilon) \le \frac{1}{\varepsilon^2}\sum_{l=s}^{\infty} \frac{1}{4^l} \sum_{k=1}^{2^{l+1}} D_k^2. \tag{11}$$
Now it can be shown that the right hand side of inequality (11) tends to zero as s increases (hence as N increases too) provided that the series Σ_{k=1}^∞ D_k²/k² is convergent. To show this we need the following lemma due to L. Kronecker.

Lemma 2. If the series Σ_{k=1}^∞ a_k is convergent and q_1 ≤ q_2 ≤ ... ≤ q_n ≤ ... is an increasing sequence of positive numbers tending to +∞, then

$$\lim_{n \to \infty} \frac{1}{q_n} \sum_{k=1}^{n} a_k q_k = 0. \tag{12}$$
Proof. Put r_n = Σ_{k=n}^∞ a_k and choose a number n_0 = n_0(ε) (ε is an arbitrary small positive number) large enough in order that n > n_0 should imply |r_n| < ε/3. It is easy to see that (with q_0 = 0)

$$\frac{1}{q_n}\sum_{k=1}^{n} a_k q_k = \frac{1}{q_n}\sum_{k=1}^{n} r_k (q_k - q_{k-1}) - r_{n+1}. \tag{13}$$

Hence, if A = max_k |r_k|,

$$\left|\frac{1}{q_n}\sum_{k=1}^{n} a_k q_k\right| \le \frac{A\, q_{n_0}}{q_n} + \frac{2\varepsilon}{3},$$

so that for n large enough

$$\left|\frac{1}{q_n}\sum_{k=1}^{n} a_k q_k\right| < \varepsilon,$$

which proves the lemma.

This lemma and the convergence of the series Σ_{k=1}^∞ D_k²/k² imply immediately

$$\lim_{n \to \infty} \frac{1}{n^2}\sum_{k=1}^{n} D_k^2 = 0;$$

hence the right hand side of (11) tends to 0 as N → ∞. This and Lemma 1 lead to Theorem 2.
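Kronecker's lemma is easy to check on a concrete convergent series; in the sketch below (our choice of example) a_k = (−1)^{k+1}/k, whose series converges to ln 2, and q_k = k, so a_k q_k = (−1)^{k+1}:

```python
# Kronecker's lemma: if sum a_k converges and q_n increases to infinity,
# then (1/q_n) * sum_{k<=n} a_k*q_k -> 0.  Here a_k = (-1)**(k+1)/k and
# q_k = k, so each term a_k*q_k is just (-1)**(k+1).
def weighted_average(n):
    return sum((-1) ** (k + 1) for k in range(1, n + 1)) / n

for n in (9, 999, 99_999):
    print(n, weighted_average(n))   # 1/9, 1/999, 1/99999 -> 0
```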
Put

$$\zeta_n = \frac{1}{n}\sum_{k=1}^{n} \xi_k, \qquad \zeta_n^* = \frac{1}{n}\sum_{k=1}^{n} \xi_k^*, \qquad \zeta_n^{**} = \frac{1}{n}\sum_{k=1}^{n} \xi_k^{**}.$$

Let F(x) denote the distribution function of the ξ_k. Since we have assumed M = 0, we have to show that

$$P(\lim_{n \to \infty} \zeta_n^* = 0) = 1 \tag{14a}$$

and

$$P(\lim_{n \to \infty} \zeta_n^{**} = 0) = 1. \tag{14b}$$

Now

$$0 \le D^2(\xi_k^*) \le E(\xi_k^{*2}) = \int_{-k}^{k} x^2\, dF(x).$$

Hence, because of

$$\sum_{k=j+1}^{\infty} \frac{1}{k^2} \le \sum_{k=j+1}^{\infty} \frac{1}{k(k-1)} = \frac{1}{j},$$

we have

$$\sum_{k=1}^{\infty} \frac{D^2(\xi_k^*)}{k^2} \le \sum_{j=1}^{\infty}\left(\int_{j-1 < |x| \le j} x^2\, dF(x)\right)\sum_{k=j}^{\infty}\frac{1}{k^2} \le 2\int_{-\infty}^{+\infty} |x|\, dF(x).$$

Hence (14a) must hold. Now consider the random variables ξ_k**. We have

$$P(\xi_k^{**} \ne 0) = \int_{|x| > k} dF(x), \tag{15}$$

hence

$$\sum_{k=1}^{\infty} P(\xi_k^{**} \ne 0) \le \int_{-\infty}^{+\infty} |x|\, dF(x). \tag{16}$$
Thus |ξ_n|/n > 1 holds with probability 1 only for a finite number of values of n. Since the ξ_n are independent, it follows from Lemma B of § 5 that the series Σ_{n=1}^∞ P(|ξ_n|/n > 1) is convergent. Now since

$$P\left(\frac{|\xi_n|}{n} > 1\right) = \int_{|x| > n} dF(x),$$

the series

$$\sum_{n=1}^{\infty} \int_{|x| > n} dF(x) = \sum_{n=1}^{\infty} n \int_{n < |x| \le n+1} dF(x)$$

is convergent. But we have

$$\int_{n < |x| \le n+1} |x|\, dF(x) \le (n+1)\int_{n < |x| \le n+1} dF(x);$$

hence ∫_{−∞}^{+∞} |x| dF(x) exists and M = E(ξ_1) exists, too. Hence, by the first part of the proof, the assertion follows.

Put

$$\Delta_N = \sup_{-\infty < x < +\infty} |F_N(x) - F(x)|.$$
[VII, § 8] GLIVENKO'S THEOREM 401
Then

$$P(\lim_{N \to \infty} \Delta_N = 0) = 1. \tag{1}$$

For every fixed x,

$$P(\lim_{N \to \infty} F_N(x) = F(x)) = 1$$

and

$$P(\lim_{N \to \infty} F_N(x + 0) = F(x + 0)) = 1.$$

Thus it follows from (2) that

$$P\left(\limsup_{N \to \infty} \Delta_N > \frac{1}{M}\right) = 0. \tag{3}$$
This particular form shows clearly that the strong law of large numbers and Glivenko's theorem have a definite meaning even for the practical case when only finitely many observations are made. In fact, whenever a large sample is studied, this theorem is implicitly used; hence it has the right to be called the fundamental theorem of mathematical statistics. On the other hand, it must be noticed that Glivenko's theorem does not give any information on how the N_0 figuring in (4) depends on ε and δ. This question will be answered by a theorem of Kolmogorov dealt with later on (cf. Ch. VII, § 10).
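Glivenko's theorem is also easy to watch at work numerically. The sketch below is a modern aside with assumptions of ours (F is taken to be the uniform distribution on (0, 1), so F(x) = x, and Δ_N can be computed exactly from the ordered sample):

```python
import random

random.seed(5)

# Delta_N = sup over x of |F_N(x) - F(x)| for a uniform sample,
# computed from the order statistics: at the i-th ordered point u[i]
# the empirical distribution function jumps from i/N to (i+1)/N.
def delta(N):
    u = sorted(random.random() for _ in range(N))
    return max(max((i + 1) / N - u[i], u[i] - i / N) for i in range(N))

for N in (100, 10_000):
    print(N, delta(N))
```

The supremum distance shrinks as N grows (roughly like 1/√N), which is the quantitative behaviour described by the theorem of Kolmogorov referred to above.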
From this bound follows the convergence of the series

$$\sum_{n=1}^{\infty} P(|\zeta_n| > \varepsilon), \tag{2}$$

and because of Lemma A of § 5 we find that the inequality |ζ_n| ≤ ε is fulfilled with probability 1 for every sufficiently large n. This implies the strong law of large numbers. Thus we obtained another proof of this law. Notice, however, that a supplementary hypothesis was needed for this proof: the random variables were supposed to be bounded. This hypothesis allows us to prove a far more precise theorem, called the law of the iterated logarithm:
Put

$$\zeta_k = \xi_1 + \xi_2 + \cdots + \xi_k - kM.$$

Let B_k denote the event ζ_n − ζ_k > −2√n D and A the event ζ_n ≥ x − 2√n D. If both A_k and B_k occur, A occurs as well. The events A_k (k = 1, 2, ..., n) mutually exclude each other, thus the same holds for the events A_kB_k; the events A_k and B_k are evidently independent, since B_k depends only on the random variables ξ_{k+1}, ..., ξ_n and A_k depends only on ξ_1, ..., ξ_k. Since A_1B_1 + ... + A_nB_n ⊆ A, the independence of A_k and B_k implies

$$P(A) \ge \sum_{k=1}^{n} P(A_k)\, P(B_k).$$

Since lim_{k→∞} ln ln N_{k+1} / ln ln N_k = 1, there exists a number k_0 depending only on ε such that for k > k_0 we have

$$\frac{x_k^2}{2 N_k D^2\left(1 + \dfrac{x_k K}{2\sqrt{N_k}\, D^2}\right)} > (1 + \varepsilon)\ln\ln N_k \ge (1 + \varepsilon)\ln(k \ln \gamma).$$
[VII, § 9] THE LAW OF THE ITERATED LOGARITHM 405
Proof. We use elements of the theory of Hilbert spaces. Let ℋ be the set of all random variables ξ for which E(ξ²) exists. Put (ξ, η) = E(ξη) and ||ξ|| = (ξ, ξ)^{1/2}; ℋ is then a Hilbert space. Let α_n denote the indicator of the event A_n:

$$\alpha_n = \begin{cases} 1 & \text{for } \omega \in A_n, \\ 0 & \text{for } \omega \notin A_n. \end{cases}$$

If β is the indicator of B and if α_n − d = γ_n, we can write (1) in the form (4). We show that (4) implies (3) for every β ∈ ℋ (hence not merely for the β which are indicators of sets).

Let ℋ_1 denote the set of those elements of ℋ which are linear combinations of the γ_n or limits of such elements, in the sense of strong convergence, that is, δ_n → δ means that lim_{n→∞} ||δ_n − δ|| = 0. In other words, ℋ_1 is the least subspace of ℋ containing the elements γ_n (n = 0, 1, ...). Obviously, (4) implies (3) when β is a finite linear combination of the γ_n, and also when β ∈ ℋ_1. In fact, in the latter case there exists for every ε > 0 a γ = Σ_{k=1}^n c_k γ_k with ||β − γ|| < ε. Because of Schwarz's inequality and of ||γ_n|| ≤ 1 we have

$$|(\beta, \gamma_n) - (\gamma, \gamma_n)| = |(\beta - \gamma, \gamma_n)| \le \|\beta - \gamma\| \cdot \|\gamma_n\| \le \varepsilon.$$

By (4) lim_{n→∞} (γ, γ_n) = 0, thus lim sup_{n→∞} |(β, γ_n)| ≤ ε; since ε > 0 can be chosen arbitrarily small, (3) is therefore proved for every β ∈ ℋ_1.

Let now ℋ_2 be the set of elements δ of ℋ such that (δ, γ_n) = 0 for n = 0, 1, .... ℋ_2 is then the subspace of ℋ orthogonal to ℋ_1. For β ∈ ℋ_2, (3) is trivial. Now, according to a well-known theorem of the theory of Hilbert spaces,¹ every β ∈ ℋ can be written in the form β = β_1 + β_2, where β_1 ∈ ℋ_1 and β_2 ∈ ℋ_2. Furthermore,
Proof. Choose two numbers a and b such that P(a < ξ < b) > 0 and a, b are two points of continuity of the distribution function of ξ. Then P(a < ζ_n < b) > 0 for n ≥ n_0. Let A_0 = Ω and let A_k denote the event a < ζ_{n_0+k+1} < b (k = 1, 2, ...). For k ≥ 1 we have

$$P(A_n \mid A_k) = P\left(a < \frac{(n_0+k+1)\,\zeta_{n_0+k+1} + \xi_{n_0+k+2} + \cdots + \xi_{n_0+n+1}}{n_0+n+1} < b \;\Big|\; A_k\right) \le$$

$$\le P\left(a - \frac{(n_0+k+1)\,b}{n_0+n+1} < \frac{\xi_{n_0+k+2} + \cdots + \xi_{n_0+n+1}}{n_0+n+1} < b - \frac{(n_0+k+1)\,a}{n_0+n+1}\right)$$

for any ε > 0 whenever n is large enough. Similarly, for sufficiently large n, Theorem 1 of § 2 leads to

$$\lim_{n \to \infty} P(A_n \mid A_k) = P(a < \xi < b).$$

Q(B) ≤ P(B); hence if P(B) = 0, then Q(B) = 0. From this the assertion of our theorem follows directly, by Theorem 3 of Chapter II, § 7. According to the Radon–Nikodym theorem the derivative

$$\frac{dQ}{dP} = \alpha(\omega) \tag{2}$$

exists.
Let A_n′ be a mixing sequence of events in the probability space [Ω, 𝒜, P_1] with density d_1 and A_n″ a mixing sequence of events in the probability space [Ω, 𝒜, P_2] with density d_2 (0 < d_1 < d_2 < 1); put A_n = A_n′Ω_1 + A_n″Ω_2. Then clearly we have for every event B ∈ 𝒜

$$P(A_n B) = P(\Omega_1)\, P_1(A_n' B) + P(\Omega_2)\, P_2(A_n'' B),$$

hence

$$\lim_{n \to \infty} P(A_n B) = Q(B),$$

where

$$Q(B) = d_1 P(B \Omega_1) + d_2 P(B \Omega_2).$$

Let the random variable α = α(ω) be defined in the following manner:

$$\alpha(\omega) = \begin{cases} d_1 & \text{if } \omega \in \Omega_1, \\ d_2 & \text{if } \omega \in \Omega_2; \end{cases}$$

then

$$Q(B) = \int_B \alpha\, dP.$$

Thus the sequence of events {A_n} is stable but not mixing, since its density is not constant but assumes two distinct values with positive probabilities. Clearly, stable sequences of events with densities having an arbitrary prescribed discrete distribution can be constructed in a similar manner. Now we shall prove the generalization of Theorem 1 of § 10 concerning stable sequences of events.
exist, then the sequence of events {A_n} is a stable sequence of events.

$$\lim_{n \to \infty} (\xi, \alpha_n) = (\xi, \alpha),$$

i.e. the sequence α_n converges to α in the sense of weak convergence in the Hilbert space. (A sequence of elements a_n of a Hilbert space ℋ is said to converge weakly to a (a ∈ ℋ) if for any element ξ ∈ ℋ

$$\lim_{n \to \infty} (\xi, a_n) = (\xi, a).$$

This fact is denoted by a_n → a.) The preceding discussion contains the proof of the following

Theorem 3. Let α_n denote the indicator of the event A_n and ℋ the Hilbert space formed by the random variables with finite second moments defined on the probability space [Ω, 𝒜, P]. The sequence of events {A_n} belonging to the probability space [Ω, 𝒜, P] is stable, iff α_n converges weakly in ℋ to an element α ∈ ℋ. If the sequence of events {A_n} is stable and if α_n → α, then α is the density of the sequence of events {A_n}.

A stable sequence of events {A_n} is mixing, iff there exists a number d (0 ≤ d ≤ 1) such that for every event A of positive probability lim_{n→∞} P(A_n | A) = d.

Theorem 4. From any sequence of events one can select a stable subsequence.
further, by assumption,

$$P(A_n A_k) = p_2 \quad\text{if } n > k,$$

hence

$$\lim_{n \to \infty} P(A_n A_k) = p_2 \quad (k = 1, 2, \ldots).$$

It follows that

$$p_k = d^k,$$

i.e.

$$P(A_{n_1} A_{n_2} \cdots A_{n_k}) = P(A_{n_1})\, P(A_{n_2}) \cdots P(A_{n_k}),$$

whenever 1 ≤ n_1 < n_2 < ... < n_k. But this means that the events A_n are independent. Now we prove a theorem due to B. de Finetti.

$$p_k = \int_0^1 x^k\, dF(x) \quad (k = 1, 2, \ldots). \tag{2}$$
Proof. By Theorem 1 the sequence {A_n} is stable. Let α denote the density of this sequence. Then p_2 = ∫_Ω α² dP. Similarly,

$$p_3 = P(A_n A_k A_l) \quad\text{if } n > k > l,$$

thus

$$p_3 = \lim_{n \to \infty} P(A_n A_k A_l) = \int_{\Omega} \alpha\, \alpha_k\, \alpha_l\, dP,$$

hence, by taking the limit first for k → ∞, then for l → ∞, we obtain that

$$p_3 = \int_{\Omega} \alpha^3\, dP.$$

In the same way one obtains the relations

$$\sum_{j=0}^{l} (-1)^j \binom{l}{j} p_{k+j} = (-1)^l \Delta^l p_k \tag{6}$$

and

$$(-1)^l \Delta^l p_k \ge 0. \tag{7}$$

Sequences of numbers having property (7) are called absolutely monotonic sequences. Hence an absolutely monotonic sequence is nonincreasing, its first differences form a nondecreasing sequence (i.e. the sequence is convex), its second differences form a nonincreasing sequence, etc. Note that inequality (7) can be obtained from the representation of the sequence of numbers p_k in (2) or (3), since

$$(-1)^l \Delta^l p_k = \int_0^1 x^k (1-x)^l\, dF(x) = \int_{\Omega} \alpha^k (1-\alpha)^l\, dP. \tag{8}$$
Hence we can see from (5) that, given the sequence p_k, the joint distribution function of a finite number of random variables chosen arbitrarily from α_1, α_2, ... is given as well; the conditions of compatibility are obviously fulfilled, and thus the existence of the (exchangeable) sequence of events with the required properties is ensured by the fundamental theorem of Kolmogorov.

$$\lim_{n \to \infty} \frac{\alpha_1 + \alpha_2 + \cdots + \alpha_n}{n} = \alpha. \tag{9}$$

Proof. Let

$$\theta_n = \frac{\alpha_1 + \alpha_2 + \cdots + \alpha_n}{n} - \alpha.$$
Thus the series Σ_j E(θ_{n_j}²) is convergent, hence (by the Beppo Levi theorem) Σ_j θ_{n_j}² converges with probability 1.

$$p_k = \int_0^1 x^k\, dF(x), \tag{17}$$

where F(x) is the distribution function of the beta distribution with parameters a = M/R and b = (N − M)/R. That is (cf. Ch. IV, § 10, (10)), we have

$$F(x) = \frac{\Gamma\!\left(\frac{N}{R}\right)}{\Gamma\!\left(\frac{M}{R}\right)\Gamma\!\left(\frac{N-M}{R}\right)} \int_0^x t^{\frac{M}{R}-1} (1-t)^{\frac{N-M}{R}-1}\, dt. \tag{18}$$
Thus, by Theorem 4, in case of Pólya's urn model the relative frequency of the drawings yielding a red ball among the first n drawings converges with probability 1 to a random variable having a beta distribution of order (M/R, (N − M)/R). From this it follows that the distribution of this relative frequency converges to the mentioned beta distribution. In fact, if a sequence η_n of random variables converges with probability 1 to a random variable η, then (cf. § 7) η_n tends in probability to η and thus (cf. Theorem 1 of § 2) the distribution of η_n tends to that of η. Hence we have

Theorem 5. Let in Pólya's urn scheme ν_n denote the number of red balls drawn in the course of the first n drawings; then

$$\lim_{n \to \infty} P\left(\frac{\nu_n}{n} < x\right) = \frac{\Gamma\!\left(\frac{N}{R}\right)}{\Gamma\!\left(\frac{M}{R}\right)\Gamma\!\left(\frac{N-M}{R}\right)} \int_0^x t^{\frac{M}{R}-1}(1-t)^{\frac{N-M}{R}-1}\, dt,$$
i.e. the limit distribution of the relative frequency of the red balls drawn is a beta distribution of order (M/R, (N − M)/R). In particular, if M = R = 1 and N = 2, the relative frequency of red balls will be in the limit uniformly distributed on the interval (0, 1).
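The uniform limit in the case M = R = 1, N = 2 shows up clearly in simulation. The sketch below is a modern aside (the number of draws and runs are our choices): the urn starts with one red and one white ball, and after each drawing the ball drawn is replaced together with one ball of the same colour.

```python
import random

random.seed(6)

# Polya's urn with M = R = 1, N = 2: the limiting relative frequency
# of red is uniformly distributed on (0, 1) by Theorem 5.
def limit_fraction(draws=400):
    red, total = 1, 2
    for _ in range(draws):
        if random.random() < red / total:
            red += 1
        total += 1
    return red / total

limits = [limit_fraction() for _ in range(2000)]
# For a uniform limit, about one quarter of the runs end below 0.25:
print(sum(x < 0.25 for x in limits) / len(limits))
```

The printed proportion is close to 0.25, and the same holds for any other subinterval of (0, 1) in proportion to its length.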
Furthermore, it is easy to see that Formula (10) of Chapter III, § 3 is a special case of the present Formula (5). As is seen from this example, the general theory of stable sequences of events permits a deeper insight into some particular problems already discussed.
we show for the least σ-algebra 𝒜_n, relative to which the random variables ξ_1, ξ_2, ..., ξ_n are measurable, that 𝒜_n ⊆ 𝒮 (n = 1, 2, ...). Hence, according to the lemma, 𝒜^{(1)} ⊆ 𝒮, therefore C ∈ 𝒮 and, consequently, P_2(C) = P(C). Notice that Theorem 2 of § 10 can also be deduced from this.

$$\sum_{n=1}^{\infty} D^2(\eta_n^*) < +\infty. \tag{3}$$

Remark. It is easy to see from the zero-one law that Σ_{n=1}^∞ η_n converges either with probability 1 or with probability 0.
Proof. We show first that the conditions (1), (2), (3) are sufficient. From (1) and the Borel–Cantelli lemma it follows that with probability 1 η_n = η_n* for sufficiently large values of n; hence the series Σ_{n=1}^∞ η_n and Σ_{n=1}^∞ η_n* are, with probability 1, simultaneously convergent or divergent. Thus it suffices to show that Σ_{n=1}^∞ η_n* converges with probability 1. Because of (2), it suffices to prove this for the series Σ_{n=1}^∞ δ_n, where δ_n = η_n* − E(η_n*). We know that the random variables δ_n are completely independent, further that E(δ_n) = 0 and Σ_{n=1}^∞ D²(δ_n) < +∞. Hence for an ε > 0, however small, Kolmogorov's inequality gives

$$P\left(\max_{n \le m \le N} \Big|\sum_{k=n}^{m} \delta_k\Big| > \varepsilon\right) \le \frac{1}{\varepsilon^2}\sum_{k=n}^{N} D^2(\delta_k). \tag{4}$$
[VII, § 14] KOLMOGOROV'S THREE-SERIES THEOREM 421
Hence it suffices to consider the series Σ_{n=1}^∞ η_n*, the sum of which, a random variable itself, will be denoted by η*. In what follows we shall need a lemma due to J. L. Doob, which gives for the characteristic function φ_k*(t) of η_k* the inequality

$$|\varphi_k^*(t)| \le e^{-D_k^2 t^2/3}. \tag{9}$$

Indeed,

$$|\varphi_k^*(t)| \le 1 - \frac{D_k^2 t^2}{2} + \frac{D_k^2 t^2}{6} = 1 - \frac{D_k^2 t^2}{3} \le e^{-D_k^2 t^2/3}$$

for |t| small enough.
Since Σ_{n=1}^N η_n* converges to η* with probability 1 as N → ∞, the distribution function of Σ_{n=1}^N η_n* converges to the distribution function of η* at every point of continuity; hence

$$\prod_{n=1}^{\infty} \varphi_n(t) = \psi(t),$$

and the series

$$\sum_{n=1}^{\infty} D^2(\eta_n) \tag{13}$$

is convergent. However, the series Σ_{n=1}^∞ η_n converges with probability 1 for any rearrangement of its terms; the sum and the set of its points of divergence depend, of course, on the rearrangement in question.
Theorem 1 is stronger than the law of large numbers. Thus, for instance, Theorem 2 of § 7 can be deduced from the present Theorem 2 as follows: Let ξ_1, ξ_2, ... be independent random variables with E(ξ_n) = 0, D(ξ_n) = D_n, and assume

$$\sum_{n=1}^{\infty} \frac{D_n^2}{n^2} < +\infty.$$

If we put η_n = ξ_n/n, the hypotheses of Theorem 2 are fulfilled; hence the series Σ_{n=1}^∞ η_n converges with probability 1. According to Kronecker's lemma (Lemma 2 of § 7) with q_n = n it follows that with probability 1

$$\lim_{n \to \infty} \frac{\sum_{k=1}^{n} k\, \eta_k}{n} = \lim_{n \to \infty} \frac{1}{n}\sum_{k=1}^{n} \xi_k = 0,$$
$$D^2(\xi_n \mid B_n) = D_n^2 \quad (n = 1, 2, \ldots), \tag{1}$$

$$p_n = P(B_n \mid C) \tag{3}$$

and assume that p_n > 0 (n = 1, 2, ...). Suppose further that

$$\sum_{n=1}^{\infty} p_n = +\infty \tag{4}$$

and

$$\sum_{n=1}^{\infty} \frac{p_n D_n^2}{\left(\sum_{k=1}^{n} p_k\right)^2} < +\infty. \tag{5}$$

Define

$$S_n(V) = \sum_{\substack{1 \le k \le n \\ \xi_k \in V}} \xi_k \quad\text{and}\quad N_n(V) = \sum_{\substack{1 \le k \le n \\ \xi_k \in V}} 1. \tag{6}$$

S_n(V) is thus the sum of the ξ_k (1 ≤ k ≤ n) whose values belong to V and N_n(V) is their number. Then

$$P\left(\lim_{n \to \infty} \frac{S_n(V)}{N_n(V)} = M \;\Big|\; C\right) = 1.$$
Proof. Let us put

$$\varepsilon_k = \varepsilon_k(\omega) = \begin{cases} 1 & \text{if } B_k \text{ occurs,} \\ 0 & \text{otherwise,} \end{cases}$$

and

$$\delta_k = \frac{\varepsilon_k (\xi_k - M)}{\sum_{j=1}^{k} p_j}.$$
Consider the series Σ_{k=1}^∞ δ_k. The δ_k are independent under condition C; further, E(δ_k | C) = 0 and

$$D^2(\delta_k \mid C) = \frac{p_k D_k^2}{\left(\sum_{j=1}^{k} p_j\right)^2};$$

because of (5) the series Σ_{k=1}^∞ D²(δ_k | C) is convergent, and it follows that

$$P\left(\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k (\xi_k - M)}{\sum_{k=1}^{n} p_k} = 0 \;\Big|\; C\right) = 1. \tag{7}$$
Put now

$$\eta_k = \frac{\varepsilon_k - p_k}{\sum_{j=1}^{k} p_j}.$$

By repeating the preceding reasoning for the series Σ_{k=1}^∞ η_k we find that E(η_k | C) = 0 and

$$D^2(\eta_k \mid C) = \frac{p_k (1 - p_k)}{\left(\sum_{j=1}^{k} p_j\right)^2}.$$

The series Σ_{k=1}^∞ D²(η_k | C) converges by the lemma of Abel and Dini;¹ it follows (as in the proof of (7)) that

$$P\left(\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k}{\sum_{k=1}^{n} p_k} = 1 \;\Big|\; C\right) = 1. \tag{8}$$
From (7) and (8) it follows that

$$P\left(\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k \xi_k}{\sum_{k=1}^{n} \varepsilon_k} = M \;\Big|\; C\right) = 1. \tag{9}$$

Since Σ_{k=1}^n ε_kξ_k = S_n(V) and Σ_{k=1}^n ε_k = N_n(V), Theorem 1 is herewith proved.

The quotient S_n(V)/N_n(V) can be considered as the empirical conditional average of the sample values lying in V.

$$\sum_{n=1}^{\infty} \frac{p_n\, |M_n - M|}{\sum_{k=1}^{n} p_k} < +\infty. \tag{10}$$

$$M = E(\xi_n \mid B_n) = P(\xi_n = 1 \mid B_n) = p.$$
§ 16. Exercises
$$\lim_{n \to \infty} \frac{1}{n^2}\sum_{k=1}^{n} D_k^2 = 0.$$

$$\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} R_{ij}\, x_i x_j \le C \sum_{i=1}^{\infty} x_i^2$$

for every system of real values x_i such that Σ_{i=1}^∞ x_i² converges; C is a positive constant. Given these conditions,

$$\lim_{n \to \infty} \operatorname{st} \frac{1}{n}\sum_{k=1}^{n} \xi_k = M.$$
3. Prove the following theorem: If f(x, y) is a uniformly continuous function of two variables, if lim st ξ_n = ξ and lim st η_n = η, then

$$\lim_{n \to \infty} \operatorname{st} f(\xi_n, \eta_n) = f(\xi, \eta).$$

4. Let ξ_1, ξ_2, ... be independent random variables with P(ξ_k = ±√k) = 1/2 and put ζ_n = (1/n) Σ_{k=1}^n ξ_k. Show that the characteristic function of ζ_n is

$$\varphi_n(t) = \prod_{k=1}^{n} \cos\frac{t\sqrt{k}}{n}$$

and

$$\lim_{n \to \infty} \varphi_n(t) = e^{-t^2/4},$$

so that the law of large numbers does not hold for the sequence ξ_n.

5. Let P(ξ_n = ±n^δ) = 1/2. Show that the law of large numbers holds for the sequence ξ_n if 0 < δ < 1/2.
6. Let the events A_1, A_2, ..., A_n be the possible results of an experiment. Let there be performed N such independent experiments. The probability that the event A_k occurs exactly ν_k(N) times (k = 1, 2, ..., n), and in a given order, is equal to

$$\pi_N = \prod_{k=1}^{n} p_k^{\nu_k(N)},$$

where p_k = P(A_k). Since π_N depends on the sequence ν_k(N) (k = 1, 2, ..., n) and the ν_k(N) are random variables, π_N is a random variable as well. Obviously

$$E\left(\frac{1}{N}\log_2 \frac{1}{\pi_N}\right) = -\sum_{k=1}^{n} p_k \log_2 p_k.$$

The quantity H(𝒜) = −Σ_{k=1}^n p_k log₂ p_k is called the entropy of the complete system of events 𝒜 = (A_1, A_2, ..., A_n) (cf. Appendix). Prove the limit relation

$$\lim_{N \to \infty} \operatorname{st} \frac{1}{N}\log_2 \frac{1}{\pi_N} = H(\mathscr{A}).$$
7. Let an urn contain a_0 white and b_0 red balls. If we draw from the urn a white ball, we put it back and besides we add to the urn a_1 white balls and b_1 red balls. If we draw a red ball, we put it back and add to the urn a_2 white and b_2 red balls, where a_1 + b_1 = a_2 + b_2, a_2 > 0. The same procedure is repeated after all subsequent drawings. Let ξ_n denote the number of white balls drawn in the first n drawings. Prove the relation

$$\lim_{n \to \infty} \operatorname{st} \frac{\xi_n}{n} = \frac{a_2}{b_1 + a_2}.$$

Hint. It is easy to show that lim_{n→∞} E(ξ_n/n) = a_2/(b_1 + a_2); further, lim_{n→∞} D(ξ_n/n) = 0; hence our statement follows from Chebyshev's inequality.
8. a) Let η_n (n = 1, 2, ...) be bounded random variables, |η_n| ≤ C. The necessary and sufficient condition that η_n should converge in probability to zero is the fulfilment of the relation

$$\lim_{n \to \infty} E(|\eta_n|) = 0. \tag{1}$$

Hint. Applying Markov's inequality to the random variable |η_n|, we obtain

$$P(|\eta_n| \ge \varepsilon) \le \frac{E(|\eta_n|)}{\varepsilon}.$$

Hint. Evidently, lim st (f(ζ_n) − f(c)) = 0. Since f(x) is bounded, it follows from a) that

$$\lim_{n \to \infty} E(f(\zeta_n)) = f(c).$$
9. Let f(x) and g(x) be continuous functions on the closed interval [0, 1] which fulfil the relation 0 ≤ f(x) ≤ C g(x), where C is a positive constant. Then

$$\lim_{n \to \infty} \int_0^1 \cdots \int_0^1 \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{g(x_1) + g(x_2) + \cdots + g(x_n)}\, dx_1 \ldots dx_n = \frac{\int_0^1 f(x)\, dx}{\int_0^1 g(x)\, dx}.$$

Hint. Let ξ_1, ξ_2, ..., ξ_n be independent random variables uniformly distributed on [0, 1], and put

$$\eta_n = \frac{f(\xi_1) + f(\xi_2) + \cdots + f(\xi_n)}{n} \quad\text{and}\quad \zeta_n = \frac{g(\xi_1) + g(\xi_2) + \cdots + g(\xi_n)}{n}.$$

We have thus

$$\lim_{n \to \infty} \operatorname{st} \eta_n = \int_0^1 f(x)\, dx, \qquad \lim_{n \to \infty} \operatorname{st} \zeta_n = \int_0^1 g(x)\, dx,$$

and, since ∫_0^1 g(x) dx > 0, we have by the result of Exercise 3

$$\lim_{n \to \infty} \operatorname{st} \frac{\eta_n}{\zeta_n} = \frac{\int_0^1 f(x)\, dx}{\int_0^1 g(x)\, dx}.$$

Since further 0 ≤ η_n/ζ_n ≤ C, we get from the result of Exercise 8 b)

$$\lim_{n \to \infty} E\left(\frac{\eta_n}{\zeta_n}\right) = \frac{\int_0^1 f(x)\, dx}{\int_0^1 g(x)\, dx},$$

q.e.d.
$$f(x) = \lim_{n \to \infty} \frac{(-1)^{n-1}\, n^n\, \varphi^{(n-1)}\!\left(\dfrac{n}{x}\right)}{x^n\, (n-1)!} \quad\text{for } x > 0.$$

Hint. Let ξ_1, ξ_2, ..., ξ_n, ... be independent random variables having the same exponential distribution with expectation x; then

$$\lim_{n \to \infty} E(f(\zeta_n)) = f(x).$$

Now we have (cf. Formula (24) of Ch. IV, § 9)
a) $$P\left(\lim_{n \to \infty} \frac{f_n^{(r)}}{n} = \frac{1}{10}\right) = 1 \quad (r = 0, 1, \ldots, 9),$$

b) $$P\left(\limsup_{n \to \infty} \frac{f_n^{(r)} - \dfrac{n}{10}}{\sqrt{2n \ln\ln n}} \le \frac{3}{10}\right) = 1.$$

Hint. Let the random variable ξ_n^{(r)} be equal either to 1 or to 0 according as the n-th digit in the decimal expansion of η is equal to r or distinct from it; then the ξ_n^{(r)} (n = 1, 2, ...) are independent and have the same distribution, P(ξ_n^{(r)} = 1) = 1/10, P(ξ_n^{(r)} = 0) = 9/10 (r = 0, 1, ..., 9). a) is obtained from the strong law of large numbers, b) from the law of the iterated logarithm.
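Part a), Borel's theorem on normal numbers, can be illustrated directly; the sketch below (a modern aside, sampling the i.i.d. digits themselves rather than expanding a real number) shows each digit's relative frequency settling near 1/10:

```python
import random

random.seed(7)

# The decimal digits of a uniformly distributed number are independent
# and uniform on {0, ..., 9}; by the strong law of large numbers each
# digit's relative frequency among the first n digits tends to 1/10.
n = 100_000
digits = [random.randrange(10) for _ in range(n)]
freqs = [digits.count(r) / n for r in range(10)]
print([round(f, 3) for f in freqs])
```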
$$P(\xi_k = \pm 1) = \frac{1}{2} \quad (k = 1, 2, \ldots, n).$$

d) Show that

$$E(\zeta_n) = \sum_{k=1}^{n} k\, P(\zeta_n = k) \quad\text{for } n = 2, 3, 4, \ldots$$

and conclude from it that E(ζ_n) ≈ √(2n/π).

e) Show finally that

$$\lim_{n \to \infty} P(\zeta_n < x\sqrt{n}) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^x e^{-u^2/2}\, du & \text{for } x > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Hint. Let p_{n,k} = P(ζ_n = k) (k = −1, 0, 1, ..., n). We have the following recursive formulas:

$$p_{n+1,k} = \tfrac{1}{2}(p_{n,k-1} + p_{n,k+1}) \quad\text{for } k = 2, 3, \ldots, n+1,$$

$$p_{n+1,1} = \tfrac{1}{2}(p_{n,0} + p_{n,1} + p_{n,2}),$$

$$p_{n+1,-1} = \tfrac{1}{2}(p_{n,-1} + p_{n,0}),$$

$$p_{n+1,0} = \tfrac{1}{2}\, p_{n,1}.$$

a) follows by induction.
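The √(2n/π) growth in d) is easy to observe by Monte Carlo. The sketch below (a modern aside with parameters of our own) estimates E(|ξ_1 + ... + ξ_n|), which has the same √(2n/π) asymptotics as the quantity in the exercise:

```python
import math
import random

random.seed(8)

# For the symmetric random walk, E(|S_n|) ~ sqrt(2n/pi) as n -> infinity;
# estimate E(|S_n|) over many independent walks and compare.
n, trials = 1000, 4000
est = sum(abs(sum(random.choice((-1, 1)) for _ in range(n)))
          for _ in range(trials)) / trials
print(est, math.sqrt(2 * n / math.pi))
```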
Let A_k (k ≥ n) denote the event that ζ_k is the first among ζ_n, ζ_{n+1}, ... whose absolute value is not less than ε. If m > k, it can be shown, as in the proof of Kolmogorov's inequality, that

$$E(\eta \mid A_k) \ge E(\zeta_m^2 \mid A_k) \ge E(\zeta_k^2 \mid A_k) \ge \varepsilon^2.$$

Hence

$$E(\eta) \ge \varepsilon^2 \sum_{k \ge n} P(A_k) = \varepsilon^2\, P\left(\sup_{k \ge n} |\zeta_k| \ge \varepsilon\right). \tag{4}$$

(3) and (4) imply (2).
15. Deduce Theorem 2 of § 7 from Exercise 14.
16. Prove the following generalization of Exercise 14. If ξ_1, ξ_2, ... are completely independent, E(ξ_k) = 0, D²(ξ_k) = D_k², and ζ_k = ξ_1 + ... + ξ_k, further if 0 < B_1 ≤ B_2 ≤ ... is a nondecreasing sequence of positive numbers such that Σ_{k=1}^∞ D_k²/B_k² < +∞, we have, for ε > 0,

$$P\left(\sup_{k \ge n} \frac{|\zeta_k|}{B_k} \ge \varepsilon\right) \le \frac{1}{\varepsilon^2}\left(\frac{1}{B_n^2}\sum_{k=1}^{n} D_k^2 + \sum_{k=n+1}^{\infty} \frac{D_k^2}{B_k^2}\right).$$
α) Σ_{k=1}^∞ p_k = +∞;

β) the series

$$\sum_{k=1}^{\infty} \frac{p_k D_k^2}{\left(\sum_{i=1}^{k} p_i\right)^2}$$

is convergent. Then with probability 1

$$\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k (\xi_k - M + 1)}{\sum_{k=1}^{n} p_k} = 1. \tag{5}$$

If we apply the theorem in Exercise 17 to the sequence η_k = ε_k, we have again, with conditional probability 1 under condition C,

$$\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k}{\sum_{k=1}^{n} p_k} = 1. \tag{6}$$

From (5) and (6) it follows that

$$P\left(\lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varepsilon_k \xi_k}{\sum_{k=1}^{n} \varepsilon_k} = M \;\Big|\; C\right) = 1.$$
Hence the series Σ_{n=1}^∞ P(|ζ_n| > ε) is convergent and we can use Lemma A of § 5. (The idea of this proof is due to F. P. Cantelli.)

20. If ξ_1, ξ_2, ... are identically distributed random variables with finite variance, then for the validity of the strong law of large numbers the still weaker condition suffices that the ξ_k are pairwise uncorrelated (instead of completely independent).

Hint. Let E(ξ_k) = 0, D(ξ_k) = D and

$$\zeta_n = \frac{1}{n}\sum_{k=1}^{n} \xi_k.$$

According to Chebyshev's inequality, P(|ζ_{n²}| > ε) ≤ D²/(n²ε²); hence the series Σ_{n=1}^∞ P(|ζ_{n²}| > ε) is convergent. By the Borel–Cantelli lemma, with probability 1,

$$|\zeta_{n^2}| < \varepsilon \tag{7}$$

and

$$\max_{n^2 \le N < (n+1)^2} \Big|\sum_{k=n^2+1}^{N} \xi_k\Big| \le \varepsilon\, n^2 \tag{8}$$

for a sufficiently large n. (7) and (8) lead, with probability 1, for n² ≤ N < (n+1)², to the inequality |ζ_N| < 2ε for a large enough n, which proves our statement.

21. If ξ_1, ξ_2, ... are pairwise uncorrelated, if E(ξ_k) = 0, D²(ξ_k) = D_k², and if the series Σ_{k=1}^∞ D_k²/k^{3/2} is convergent, then the strong law of large numbers is valid.
belonging to the sequence q_n (q_n ≥ 2, q_n an integer), where the "digits" ε_n(x) may take on the values 0, 1, ..., q_n − 1 (n = 1, 2, ...). If η is a random variable uniformly distributed on the interval (0, 1), let f_n(k) denote the number of digits ε_j(η) equal to k (j = 1, 2, ..., n). Assume that the sequence q_n fulfils the conditions lim_{n→∞} q_n = +∞ and Σ_{n=1}^∞ 1/q_n = +∞. Show that

$$P\left(\lim_{n \to \infty} \frac{f_n(k)}{\sum_{j=1}^{n} \frac{1}{q_j}} = 1\right) = 1 \quad\text{for } k = 0, 1, \ldots.$$

Hint. Let

$$\zeta_{n,k} = \begin{cases} 1 & \text{for } \varepsilon_n(\eta) = k, \\ 0 & \text{otherwise.} \end{cases}$$

The convergence of the series required in Exercise 17 follows from the Abel–Dini theorem. Thus we can apply the result of Exercise 17. The statement of the present exercise can also be obtained as a particular case of Theorem 1 of § 15.
23. Let η_n be the frequency of the event A in a sequence of n independent experiments, while P(A) = p (0 < p < 1; q = 1 − p). Prove the complete form of the law of the iterated logarithm, i.e.

$$P\left(\limsup_{n \to \infty} \frac{\eta_n - np}{\sqrt{2npq \ln\ln n}} = 1\right) = 1 \tag{9}$$

and

$$P\left(\liminf_{n \to \infty} \frac{\eta_n - np}{\sqrt{2npq \ln\ln n}} = -1\right) = 1. \tag{10}$$

Hint. From

$$P\left(\frac{\eta_n - np}{\sqrt{npq}} > x\right) = 1 - \Phi(x) + O\left(\frac{1}{\sqrt{n}}\right)$$

¹ Cf. J. L. Doob [2], p. 158, Theorem 5.2.
it follows that

$$P\left(\frac{\eta_n - np}{\sqrt{2npq \ln\ln n}} > 1 - \varepsilon\right) > \frac{c}{\ln n}$$

for n large enough. Let A_k denote the event

$$\frac{\eta_{n_k} - n_k p}{\sqrt{2 n_k pq \ln\ln n_k}} > 1 - \varepsilon.$$

Thus the series Σ_{k=1}^∞ P(A_k) is divergent. It is easy to show that the sequence A_k fulfils the condition of Lemma C of § 5. Hence there occur with probability 1 infinitely many events A_k. Since ε > 0 is arbitrarily small, (9) is obtained. (10) can be proved in a similar way.
24. Let ξ_1, ξ_2, ... be pairwise independent random variables with common distribution function F(x). Put ζ_n = (1/n) Σ_{k=1}^n ξ_k and show that in order that lim st ζ_n = 0 should hold, the following two conditions are sufficient:

a) $$\lim_{n \to \infty} \int_{-n}^{+n} x\, dF(x) = 0,$$

b) $$\lim_{n \to \infty} n \int_{|x| > n} dF(x) = 0$$

(theorem of Kolmogorov).

Hint. Let

$$\xi_{nk} = \begin{cases} \xi_k & \text{for } |\xi_k| \le n, \\ 0 & \text{otherwise,} \end{cases}$$

and

$$\zeta_n^* = \frac{1}{n}\sum_{k=1}^{n} \xi_{nk}.$$

Then

$$E(e^{it\zeta_n}) = 1 + \delta_n,$$

where

$$|\delta_n| \le n[F(-n) + 1 - F(n)] + |t|\left|\int_{-n}^{+n} x\, dF(x)\right| + \frac{t^2}{2n^2}\int_{-n}^{+n} x^2\, dF(x).$$

Hence by a) and b) follows

$$\lim_{n \to \infty} E(e^{it\zeta_n}) = 1$$

for every real t.
CHAPTER VIII

THE LIMIT THEOREMS OF PROBABILITY THEORY

§ 1. The central limit theorems

$$\zeta_n = \xi_1 + \xi_2 + \cdots + \xi_n.$$

We know already that E(ζ_n) = np and D(ζ_n) = √(npq). The linear transformation which transforms the random variable ζ_n into a random variable having expectation zero and standard deviation one is

$$\zeta_n^* = \frac{\zeta_n - np}{\sqrt{npq}},$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du \tag{2}$$

denotes the standard normal distribution function.
$$E(e^{it\zeta_n^*}) = \left[\varphi\left(\frac{t}{D\sqrt{n}}\right)\right]^n. \tag{4}$$

From Theorem 9 of Chapter VI, § 2 it follows, because of E(η_k) = 0, that

$$\varphi\left(\frac{t}{D\sqrt{n}}\right) = 1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right). \tag{5}$$

Since

$$\lim_{n \to +\infty} \left(1 + \frac{x_n}{n}\right)^n = e^x \quad\text{if } \lim_{n \to +\infty} x_n = x, \tag{6}$$

442 THE LIMIT THEOREMS OF PROBABILITY THEORY [VIII, § 1]
$$E(\xi_k) = M_k, \quad D^2(\xi_k) = D_k^2 > 0, \quad E(|\xi_k - M_k|^3) = H_k^3,$$

$$S_n = \sqrt{D_1^2 + D_2^2 + \cdots + D_n^2}, \tag{8}$$

$$K_n = \sqrt[3]{H_1^3 + H_2^3 + \cdots + H_n^3}. \tag{9}$$

If

$$\lim_{n \to +\infty} \frac{K_n}{S_n} = 0 \tag{11}$$

is fulfilled, then

$$\lim_{n \to +\infty} F_n(x) = \Phi(x) \quad (-\infty < x < +\infty). \tag{12}$$

Remark. The condition (11) is evidently fulfilled when all ξ_k have the same distribution. In effect, in this case D_k = D, H_k = H, S_n = D√n, K_n = H n^{1/3}, hence K_n/S_n = (H/D) n^{−1/6} → 0.

¹ Later on it was proved by Markov that Liapunov's theorem can also be proved by the method of moments.

[VIII, § 1] THE CENTRAL LIMIT THEOREMS 443
It is again fulfilled when the random variables ξ_k − M_k are uniformly bounded and lim_{n→∞} S_n = +∞. In fact, from |ξ_k − M_k| ≤ C follows

$$H_k^3 \le C D_k^2,$$

hence

$$\frac{K_n}{S_n} \le \sqrt[3]{\frac{C}{S_n}}.$$

Liapunov's condition can be generalized as follows: for some β > 2, let

$$\lim_{n \to +\infty} \frac{K_n(\beta)}{S_n} = 0, \tag{13}$$

where

$$K_n(\beta) = \left(\sum_{k=1}^{n} E(|\xi_k - M_k|^{\beta})\right)^{1/\beta}. \tag{14}$$
Lindeberg proved the central limit theorem under still more general con
ditions. His condition is, in a certain sense, necessary as well. It is formulated
in the following theorem due to Lindeberg:
$$S_n = \sqrt{\sum_{k=1}^{n} D_k^2} \tag{15}$$

and let F_k(x) be the distribution function of ξ_k − M_k. If for every positive ε the so-called Lindeberg condition

$$\lim_{n \to +\infty} \frac{1}{S_n^2}\sum_{k=1}^{n} \int_{|x| > \varepsilon S_n} x^2\, dF_k(x) = 0 \tag{16}$$

is fulfilled, then for

$$\zeta_n = \frac{\sum_{k=1}^{n} (\xi_k - M_k)}{S_n} \tag{17}$$

we have

$$\lim_{n \to +\infty} P(\zeta_n < x) = \Phi(x). \tag{18}$$
00
Remark. From Liapunov’s condition (11) one can deduce (16); indeed
we have
+ CO
i t Í t f |д г |М В Д - 4 - Ш * . (19)
k =1 J k= 1 J [ °n /
|;c!>8<Sn - со
Similarly, (16) can be deduced from (13), too. Hence it suffices to prove
Theorem 3 (Lindeberg’s theorem); then Liapunov’s theorem (Theorem 2)
will also be proved.
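The conclusion (18) can be watched numerically before going into the proof. The sketch below is a modern aside with assumptions of ours (identically distributed uniform summands, so M_k = 1/2 and D_k² = 1/12, and n = 48 terms): the standardized sum lands in (−1, 1) with probability close to Φ(1) − Φ(−1) ≈ 0.6827.

```python
import math
import random

random.seed(10)

# Central limit theorem for i.i.d. uniform terms: standardize the sum
# of n uniforms (mean n/2, variance n/12) and estimate
# P(-1 < zeta_n < 1), which should be near Phi(1) - Phi(-1).
n, trials = 48, 20_000
inside = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * 0.5) / math.sqrt(n / 12.0)
    if -1.0 < z < 1.0:
        inside += 1
print(inside / trials)   # close to 0.68
```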
$$\left| e^{ix} - \sum_{j=0}^{k-1} \frac{(ix)^j}{j!} \right| \le \frac{|x|^k}{k!}; \tag{21}$$

(21) holds for k = 1; if (21) holds for any k, it follows that (21) holds for k + 1 too; hence by induction (21) holds for every k. Thus the lemma is proved. We have therefore

$$e^{\frac{itx}{S_n}} = 1 + \frac{itx}{S_n} + \vartheta_1 \frac{x^2 t^2}{2 S_n^2}, \quad\text{where } |\vartheta_1| \le 1, \tag{24}$$

and

$$e^{\frac{itx}{S_n}} = 1 + \frac{itx}{S_n} - \frac{x^2 t^2}{2 S_n^2} + \vartheta_2 \frac{|x|^3 |t|^3}{6 S_n^3}, \quad\text{where } |\vartheta_2| \le 1. \tag{25}$$

Now let ε > 0 be given. The integral (20) can be separated into two parts:

$$\varphi_k\left(\frac{t}{S_n}\right) = \int_{|x| \le \varepsilon S_n} e^{\frac{itx}{S_n}}\, dF_k(x) + \int_{|x| > \varepsilon S_n} e^{\frac{itx}{S_n}}\, dF_k(x). \tag{26}$$
Consider first the first integral on the right side of (26). Because of (25) it equals

∫_{−εS_n}^{+εS_n} ( 1 + itx/S_n − ( t² x² )/( 2 S_n² ) ) dF_k(x) + θ'_k ( |t|³ /( 6 S_n³ ) ) ∫_{−εS_n}^{+εS_n} |x|³ dF_k(x),  |θ'_k| ≤ 1.  (27)

The second integral on the right side of (26) is estimated by means of (24). If we add (27) and (29), we obtain by (28), (30) and by taking into account that E(η_k) = 0,

φ_k( t/S_n ) = 1 − ( t² D_k² )/( 2 S_n² ) + R_k  (31)

with

| R_k | ≤ ( ε |t|³ D_k² )/( 6 S_n² ) + ( t² / S_n² ) ∫_{|x|>εS_n} x² dF_k(x).  (32)
We show next that

lim_{n→+∞} ( max_{1≤k≤n} D_k ) / S_n = 0.  (33)

In fact

D_k² = ∫_{|x|≤εS_n} x² dF_k(x) + ∫_{|x|>εS_n} x² dF_k(x) ≤ ε² S_n² + Σ_{k=1}^{n} ∫_{|x|>εS_n} x² dF_k(x),

hence

( max_{1≤k≤n} D_k² ) / S_n² ≤ ε² + (1/S_n²) Σ_{k=1}^{n} ∫_{|x|>εS_n} x² dF_k(x).  (34)

It follows, because of (16), that

lim sup_{n→+∞} ( max_{1≤k≤n} D_k ) / S_n ≤ ε.  (35)

Since ε > 0 may be chosen arbitrarily small, (33) is proved.
Choose now n_0(ε) such that for n > n_0(ε)

(1/S_n²) Σ_{k=1}^{n} ∫_{|x|>εS_n} x² dF_k(x) < ε.  (39)

This can be done because of (16) and (33). Let further ε < 1/t², thus

1/2 < 1 − ( t² D_k² )/( 2 S_n² ) < 1.  (40)

Hence from (39) and (40) it follows for n > n_0(ε) that

| Σ_{k=1}^{n} ln φ_k( t/S_n ) + t²/2 | ≤ ( |t|³ ε )/6 + t² ε + t⁴ ε².  (41)

Since ε > 0 can be chosen arbitrarily small, (41) implies

lim_{n→+∞} Π_{k=1}^{n} φ_k( t/S_n ) = lim_{n→+∞} E( e^{itζ_n} ) = e^{−t²/2}.  (42)
Theorem 1 does not follow from Theorem 2, since it is possible that ∫_{−∞}^{+∞} x² dF(x) exists but ∫_{−∞}^{+∞} |x|³ dF(x) = +∞, and even that ∫_{−∞}^{+∞} |x|^β dF(x) = +∞ for every β > 2.
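A concrete distribution with this property is easy to exhibit. The following sketch (our example, not the book's) uses the Pareto-type density f(x) = 3x⁻⁴ for x ≥ 1, whose truncated moments integrate in closed form:

```python
import math

# Sketch: the density f(x) = 3*x**(-4) for x >= 1 has a finite second moment
# but an infinite third absolute moment, so Theorem 1 applies to i.i.d. sums
# with this distribution while Liapunov's Theorem 2 does not.
def second_moment(X):
    return 3.0 * (1.0 - 1.0 / X)   # integral of x**2 * 3*x**(-4) from 1 to X

def third_moment(X):
    return 3.0 * math.log(X)       # integral of x**3 * 3*x**(-4) from 1 to X

assert abs(second_moment(1e12) - 3.0) < 1e-9          # converges (to 3)
assert third_moment(1e12) > third_moment(1e6) + 40    # grows without bound
```

The truncated second moment stabilizes at 3 while the truncated third moment grows like 3 ln X, which is the divergence described above.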
Let us add that Lindeberg first proved his theorem by a different method,
viz. by a direct study of the convolution of the distributions (see § 12).
Lindeberg's condition (16) is, as was shown by W. Feller, necessary as well, in the following sense: If ξ_1, ξ_2, . . . are independent random variables with finite expectation and finite standard deviation, if F_k(x) is the distribution function of ξ_k − E(ξ_k), and if we put S_n = D(ξ_1 + · · · + ξ_n) and ζ_n = ( Σ_{k=1}^{n} ( ξ_k − E(ξ_k) ) )/S_n, then the convergence of the distribution function of ζ_n to Φ(x), together with max_{1≤k≤n} D(ξ_k)/S_n → 0, implies that for every ε > 0

lim_{n→+∞} (1/S_n²) Σ_{k=1}^{n} ∫_{|x|>εS_n} x² dF_k(x) = 0.
Lindeberg's condition implies thus that the variables ( ξ_k − E(ξ_k) )/S_n are, in a certain sense, "uniformly small" with great probability. We do not prove here that (16) is a necessary condition. Neither do we deal with further generalizations of the central limit theorem.¹
The results of this section can be generalized in the following way: instead of a sequence ξ_k (k = 1, 2, . . .) consider a matrix (ξ_nk) (k = 1, 2, . . ., k_n; n = 1, 2, . . .) of random variables such that the variables ξ_n1, ξ_n2, . . ., ξ_nk_n are independent for every n, and put

ζ_n = Σ_{k=1}^{k_n} ξ_nk.
By the same method which served to prove Theorem 3 we can prove the following, somewhat more general, theorem:

Theorem 4. Let ξ_nk (k = 1, 2, . . ., k_n) be for every n (n = 1, 2, . . .) independent random variables with finite variance. Put M_nk = E(ξ_nk), D_nk = D(ξ_nk) and let F_nk(x) denote the distribution function of ξ_nk − M_nk.

1 Cf. the books of B. V. Gnedenko and A. N. Kolmogorov [1] and W. Feller [7], Vol. 2, containing the detailed discussion of many further results in this domain.
VIII, § 2] THE LOCAL FORM OF THE CENTRAL LIMIT THEOREM 449
In the preceding section we have seen that the distribution function F_n(x) of the standardized sum ζ*_n of n independent random variables ξ_1, ξ_2, . . ., ξ_n converges, under certain conditions, to the distribution function of the normal distribution as n → ∞. It is therefore natural to ask under which conditions the density function of ζ*_n (if it exists) tends to the density function of the normal distribution. For this the conditions must certainly be stronger, since it is known that F_n(x) → Φ(x) does not necessarily imply F_n′(x) → Φ′(x). We prove first in this respect a theorem due to B. V. Gnedenko:
Fig. 25: the density functions f₁(x) and f₂(x), plotted for −3 ≤ x ≤ 3.
Lemma. If the density function g(x) is bounded, g(x) ≤ K, then the characteristic function

ψ(t) = ∫_{−∞}^{+∞} g(x) e^{itx} dx  (4)

is absolutely square-integrable. Indeed, for every v > 0

∫_{−v}^{+v} ψ(t) dt = 2 ∫_{−∞}^{+∞} g(x) ( sin vx )/x dx,  (5)

and one obtains

∫_{−T}^{+T} |ψ(t)|² dt ≤ 2 ∫_{−∞}^{+∞} h(x) ( 1 − cos 2Tx )/( 2Tx² ) dx,  (8)

where h(x) denotes the (likewise bounded) density of the difference of two independent random variables with density g(x).
f_n(x) = (1/2π) ∫_{−∞}^{+∞} [ φ( t/√n ) ]^n e^{−itx} dt.  (10)
The second integral does not depend on n and becomes arbitrarily small by choosing T sufficiently large. It suffices thus to study the first integral. In order to evaluate it, we separate it into two parts. For u → 0 we have

φ(u) = 1 − u²/2 + o(u²)

and |φ(u)| ≤ 1; hence there is an ε > 0 such that |φ(u)| ≤ e^{−u²/4} for |u| ≤ ε, and it follows that

∫_{T}^{ε√n} | φ( t/√n ) |^n dt ≤ ∫_{T}^{+∞} e^{−u²/4} du,  (15)
which tends to zero as T → +∞. Next we choose q = q(ε) with 0 < q < 1, so that |φ(t)| ≤ q when |t| ≥ ε > 0. This is possible since, according to Theorem 8 of Chapter VI, § 2,

lim_{t→+∞} |φ(t)| = 0.

Since we have already shown that ∫_{−∞}^{+∞} |φ(u)|² du is finite, and since lim_{n→+∞} √n q^{n−2} = 0, the integral (16) tends also to zero as n → +∞. All these estimates are valid uniformly in x; (2) holds thus uniformly for −∞ < x < +∞. Theorem 1 is herewith proved.
When f(x) is not bounded but f_k(x) is bounded for some given k, (2) remains still valid. This can be shown by a slight modification of the above proof. The condition that f_k(x) be bounded for a value of k (and, consequently, also for every n ≥ k) is evidently necessary for the uniform convergence of f_n(x) to (1/√(2π)) e^{−x²/2}.
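The local limit statement can be illustrated by simulation. The following Monte Carlo sketch (ours; the uniform summands, seed and cell width are arbitrary choices) estimates the density of a standardized sum near 0 by a histogram cell:

```python
import math
import random

random.seed(1)
# Sketch: the density of the standardized sum of n uniform(0, 1) variables,
# estimated near 0 by counting hits in a small cell, approaches the normal
# density phi(0) = 1/sqrt(2*pi) ~ 0.3989.
n, trials, h = 12, 100_000, 0.25
hits = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n / 2) / math.sqrt(n / 12)   # standardize: mean n/2, variance n/12
    if abs(z) < h:
        hits += 1
density_at_0 = hits / (trials * 2 * h)    # histogram estimate of f_n(0)
assert abs(density_at_0 - 1 / math.sqrt(2 * math.pi)) < 0.02
```

With n = 12 the estimate already falls within a few thousandths of φ(0); the uniform distribution is bounded, so the boundedness condition of the theorem is satisfied.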
Consider now sums ζ_n = Σ_{k=1}^{n} ξ_k of independent, identically distributed random variables with common distribution function F(x).

Theorem 1. If the limit relation

lim_{y→+∞} ( y² [ F(−y) + (1 − F(y)) ] ) / ( ∫_{−y}^{+y} x² dF(x) ) = 0  (2)

holds, then (1) is valid with suitably chosen sequences of numbers {A_n} and {S_n}.
Notes
1. Condition (2) is not only sufficient but also necessary for the validity
of (1). But this will not be proved here.
2. If the standard deviation of the random variables exists, i.e. if ∫_{−∞}^{+∞} x² dF(x) is finite, (2) is evidently true; this follows immediately from the inequality

y² [ F(−y) + (1 − F(y)) ] ≤ ∫_{|x|≥y} x² dF(x).
Thus we can see that Theorem 1 of the present section comprises Theorem 1
of §1.
3. If the standard deviation D and the expectation M of ξ_k exist and if M = 0, then in (1) A_n = 0 and S_n = D√n. Conversely, if (1) holds with A_n = 0 and S_n = D√n, the standard deviation of ξ_k exists and is equal to D. In fact, in this case Theorem 3 of Chapter VI, § 4 permits us to state the following: If φ(t) is the characteristic function of the random variables ξ_k, we have
lim_{n→+∞} [ φ( t/(D√n) ) ]^n = e^{−t²/2},

hence

lim_{n→+∞} ( ln φ( t/(D√n) ) − ln φ(0) ) / ( t/(D√n) )² = −D²/2,
and from this it follows that D² = ∫_{−∞}^{+∞} x² dF(x). Thus if ∫_{−∞}^{+∞} x² dF(x) does not exist, the sequence of numbers {S_n} for which (1) holds cannot have the order of magnitude √n. (Clearly, by the proof of Theorem 1, S_n tends to infinity faster than √n.)
VIII, § 3] DOMAIN OF ATTRACTION OF NORMAL DISTRIBUTION 455
Proof of Theorem 1. We need first the following

Lemma. If we have

( y² [ F(−y) + (1 − F(y)) ] ) / ( ∫_{−y}^{+y} x² dF(x) ) ≤ α < 1  for y ≥ y_0 > 0,  (3)

then ∫_{−∞}^{+∞} |x| dF(x) exists.
and by (3)

∫_{y<|x|≤Y} |x| dF(x) ≤ Y ( 1 − F(Y) + F(−Y) ) + α ∫_{y}^{Y} (1/x²) ( ∫_{−x}^{+x} t² dF(t) ) dx,

from which a bound not depending on Y results; this is inequality (5). Since the right hand side of (5) does not depend on Y, we conclude that ∫_{−∞}^{+∞} |x| dF(x) exists.
It follows from

A(y) = ( 1 − F(y) )^{−1}  (8)

and

y² / ( ( 1 − F(y) ) ∫_0^y x² dF(x) ) ≥ 1 / ( ( 1 − F(y) ) ( F(y) − 1/2 ) )  (9)

that

lim_{y→+∞} A(y) = +∞.  (10)
Evidently C_n → +∞ and

n ( 1 − F(C_n) + F(−C_n) ) = o(1).  (12)

Put now

S_n² = n ∫_{−C_n}^{+C_n} x² dF(x),  (13)

and let φ(t) be the characteristic function of ξ_k, φ_n(t) the characteristic function of ζ_n/S_n. We have

φ_n(t) = E( e^{itζ_n/S_n} ) = [ φ( t/S_n ) ]^n.  (14)

However, we have

φ( t/S_n ) = 1 + ∫_{−C_n}^{+C_n} ( e^{itx/S_n} − 1 ) dF(x) + ∫_{|x|>C_n} ( e^{itx/S_n} − 1 ) dF(x),  (15)

and, by the lemma of § 1, the modulus of the third-order remainder satisfies

( |t|³ /( 6 S_n³ ) ) n ∫_{−C_n}^{+C_n} |x|³ dF(x) = θ_n ( |t|³ C_n )/( 6 S_n ),  (18)

with a θ_n which remains in absolute value below a bound not depending on n. Since lim_{n→+∞} C_n/S_n = 0, we get

lim_{n→+∞} φ_n(t) = e^{−t²/2},  (19)

which implies the statement of Theorem 1.
As regards the question whether distributions other than the normal also have a domain of attraction, the following example shows that this is possible. Let ξ_1, ξ_2, . . . be completely independent random variables possessing a common stable distribution of order α (0 < α ≤ 2) and characteristic function e^{−|t|^α}; then the characteristic function of n^{−1/α} Σ_{k=1}^{n} ξ_k is exactly e^{−|t|^α}.
Thus any stable distribution has a domain of attraction which contains at least the distribution itself. The domain of attraction of a stable distribution with 0 < α < 2 is very narrow compared with that of the normal distribution; it contains only distributions very similar to the stable distribution considered. As regards the determination of the domain of attraction in the case of a stable distribution with 0 < α < 2, we refer to the book of Gnedenko and Kolmogorov [1].
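The defining identity of stability can be checked mechanically. In the sketch below (our illustration, not the book's) the characteristic function of n^{−1/α} Σ_{k=1}^{n} ξ_k is computed from e^{−|t|^α} and compared with the original:

```python
import math

# Sketch: if phi(t) = exp(-|t|**alpha) is the common characteristic function,
# then the characteristic function of n**(-1/alpha) * sum of n i.i.d. copies is
# phi(t / n**(1/alpha))**n = exp(-|t|**alpha) again -- the stability property.
def stable_cf(t, alpha):
    return math.exp(-abs(t) ** alpha)

def normalized_sum_cf(t, alpha, n):
    return stable_cf(t / n ** (1.0 / alpha), alpha) ** n

for alpha in (0.5, 1.0, 2.0):
    assert abs(normalized_sum_cf(1.7, alpha, 1000) - stable_cf(1.7, alpha)) < 1e-9
```

For α = 2 this is the normal law, for α = 1 the Cauchy law; the identity holds exactly, up to floating-point rounding in the power.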
Theorem 1. Let ξ_n1, ξ_n2, . . ., ξ_nk_n be independent random variables assuming nonnegative integral values only. Put P_nk(r) = P(ξ_nk = r) and

R_nk = Σ_{r=2}^{∞} P_nk(r).  (2)

Suppose that

lim_{n→+∞} Σ_{k=1}^{k_n} P_nk(1) = λ,  (A)

lim_{n→+∞} max_{1≤k≤k_n} ( 1 − P_nk(0) ) = 0,  (B)

lim_{n→+∞} Σ_{k=1}^{k_n} R_nk = 0.  (C)

Then the distribution of η_n = ξ_n1 + ξ_n2 + · · · + ξ_nk_n tends to the Poisson distribution with expectation λ.
VIII, § 4] CONVERGENCE TO THE POISSON DISTRIBUTION 459
Proof. Let g_nk(z) denote the generating function of the random variable ξ_nk:

g_nk(z) = Σ_{r=0}^{∞} P_nk(r) z^r  ( |z| ≤ 1 ).  (4)

Clearly

| g_nk(z) − P_nk(0) − P_nk(1) z | ≤ R_nk  for |z| ≤ 1.  (5)

Since

P_nk(0) − ( 1 − P_nk(1) ) = −R_nk,  (6)

we can write

| g_nk(z) − 1 − P_nk(1)(z − 1) | ≤ 2 R_nk.  (7)
The identity (38) of § 1 implies, since |g_nk(z)| ≤ 1 and |1 + P_nk(1)(z − 1)| ≤ 1,

| Π_{k=1}^{k_n} g_nk(z) − Π_{k=1}^{k_n} ( 1 + P_nk(1)(z − 1) ) | ≤ 2 Σ_{k=1}^{k_n} R_nk.  (8)

If

max_{1≤k≤k_n} P_nk(1) ≤ max_{1≤k≤k_n} ( 1 − P_nk(0) ) ≤ 1/4,

which is because of (B) fulfilled for n > n_0, then identity (38) of § 1 leads to

| Π_{k=1}^{k_n} ( 1 + P_nk(1)(z − 1) ) − exp( (z − 1) Σ_{k=1}^{k_n} P_nk(1) ) | ≤ max_{1≤k≤k_n} ( 1 − P_nk(0) ) |z − 1|² Σ_{k=1}^{k_n} P_nk(1).  (9)
It follows now by our assumptions from (8) and (9) that

lim_{n→+∞} Π_{k=1}^{k_n} g_nk(z) = e^{λ(z−1)}.  (10)

Since Π_{k=1}^{k_n} g_nk(z) is the generating function of η_n and e^{λ(z−1)} that of the Poisson distribution with expectation λ, the theorem is proved.
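The convergence (10) of the generating functions can be observed numerically. In the following sketch (ours) each ξ_nk takes the value 1 with probability λ/n and 0 otherwise, so that R_nk = 0 and conditions (A), (B), (C) hold:

```python
import math

# Sketch of the law of rare events via generating functions: with
# P_nk(1) = lam/n the product in (10) is (1 + lam*(z-1)/n)**n, which tends to
# the Poisson generating function exp(lam*(z-1)).
def product_gf(z, lam, n):
    return (1 + lam * (z - 1) / n) ** n

def poisson_gf(z, lam):
    return math.exp(lam * (z - 1))

assert abs(product_gf(0.3, 2.0, 10**6) - poisson_gf(0.3, 2.0)) < 1e-5
assert abs(product_gf(0.9, 0.5, 10**6) - poisson_gf(0.9, 0.5)) < 1e-5
```

The error of the finite product is of order 1/n, in agreement with the bounds (8) and (9).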
The statement of the central limit theorem is valid for certain sequences of weakly dependent random variables. In the present and in the following two sections we prove some results in this direction. These results have practical importance, too, since in the applications the independence is often only approximately true. The following theorem¹ refers to samples taken from a finite population, a situation very often encountered in practice.
D_N² = (1/N) Σ_{k=1}^{N} ( a_Nk − M_N )².  (2)

1 Cf. P. Erdős and A. Rényi [1]. See also J. Hájek [2], where it is shown that Theorem 1 is essentially best possible.
VIII, § 5] CENTRAL LIMIT THEOREM FOR FINITE POPULATION 461
D*_{N,n}² = n D_N² ( 1 − (n − 1)/(N − 1) ),  (3)

ζ*_{N,n} = ζ_{N,n} − n M_N.  (4)

Put further

d_{N,n}(ε) = ( 1/( N D_N² ) ) Σ_{|a_Nk − M_N| > ε D*_{N,n}} ( a_Nk − M_N )².  (5)
If the condition

lim_{N→+∞} d_{N,n}(ε) = 0  (6)

is satisfied for every ε > 0, then we have for −∞ < x < +∞

lim_{N→+∞} P( ζ*_{N,n} / D*_{N,n} < x ) = Φ(x).  (7)
Note first that condition (6) implies that n must tend to infinity. Indeed, if d_{N,n}(ε) < 1/2, then

(1/2) D_N² ≤ (1/N) Σ_{|a_Nk − M_N| ≤ ε D*_{N,n}} ( a_Nk − M_N )² ≤ ε² D*_{N,n}² ≤ n ε² D_N².

Hence, for N > N_0 we have n = n(N) > 1/(2ε²), and since ε > 0 can be chosen arbitrarily small, we get

lim_{N→+∞} n(N) = +∞.  (8)
We may assume

M_N = 0.  (9)

In fact, if (9) is not fulfilled, consider instead of the numbers a_Nk the numbers a′_Nk = a_Nk − M_N; for these (9) is clearly fulfilled, and if Theorem 1 holds for the a′_Nk, it remains valid for the a_Nk too. Furthermore, we can assume

1 ≤ n ≤ N/2;  (10)

the random variables ζ*_{N,n} and −ζ*_{N,N−n} have indeed the same distribution, and if n > N/2, we may take instead of n the number N − n.
We compute now the characteristic function φ_{N,n}(t) of ζ*_{N,n}. Putting λ = n/N, we have

φ_{N,n}( t/D*_{N,n} ) = ( 1/( 2π B_{N,n} ) ) ∫_{−π}^{+π} Π_{k=1}^{N} ( 1 − λ + λ e^{i( t a_Nk/D*_{N,n} + ψ )} ) e^{−inψ} dψ,  (13)

where

B_{N,n} = C(N, n) λ^n ( 1 − λ )^{N−n}.  (14)
Indeed, if we calculate the value of the expression behind the integral sign, by taking in the product (N − m) times the first and m times the second term, we obtain a term multiplied by the factor e^{i(m−n)ψ}; such a term vanishes therefore when the integration is carried out, provided that m ≠ n.
If N → +∞ and N − n → +∞, which is certainly fulfilled in our case because of (8) and (10), it follows from Stirling's formula¹ that

B_{N,n} √( 2π N λ (1 − λ) ) → 1.  (17)
& 0 М ) = (1 - Я ) exp - A Í + -^ -1 1 +
1,/АА(1 - A) A v .J ]
1 If λ is fixed, (17) follows directly from the de Moivre-Laplace theorem. In our case λ depends on N, hence the latter theorem cannot be applied. But Stirling's formula leads easily to (17).
Now let ε > 0 be given and suppose |ψ| ≤ 2ε √( Nλ(1 − λ) ). If k is an index such that |t a_Nk| / D*_{N,n} ≤ ε, then (20) implies (23) with |θ_1| ≤ 1; (21) implies that for every value of k and for |ψ| ≤ 2ε √( Nλ(1 − λ) ),

| θ_k(ψ, t) − 1 | ≤ 2(1 − λ) |ψ| / √( Nλ(1 − λ) ) + ( t a_Nk / D*_{N,n} )²,  (24)

where |θ_2| remains bounded, and
Since n → +∞, the right hand side of (31) tends to zero as N → +∞ because of (28) and because

lim_{N→+∞} √N exp( −n( 1 − cos ε )/2 ) = 0.

From (18), (25), and (31) we obtain, since ε can be taken arbitrarily small,

lim_{N→+∞} φ_{N,n}( t/D*_{N,n} ) = ( 1/√(2π) ) ∫_{−∞}^{+∞} e^{−t²/2} e^{−ψ²/2} dψ = e^{−t²/2},  (32)

which concludes the proof of our theorem.
A case of particular importance occurs when M of the numbers a_Nk are equal to 1 and N − M are equal to zero. Then

P( ζ_{N,n} = m ) = C(M, m) C(N − M, n − m) / C(N, n),  (33)

i.e. ζ_{N,n} has a hypergeometric distribution. Furthermore, M_N = M/N and D_N = √( M(N − M) )/N. Condition (6) is satisfied when

( M (N − M) n (N − n) ) / N³ → +∞.  (34)
When M/N is constant or remains above a positive bound, this means that n must tend to +∞ with N. From nM/N → +∞ it follows, because of M ≤ N/2 and n ≤ N/2, that N → +∞, n → +∞, and M → +∞. Theorem 1 contains thus the following

Theorem 2. If p = M/N, λ = n/N and nM/N → +∞, then

lim Σ_{k ≤ np + x √( np(1−p)(1−λ) )} C(M, k) C(N − M, n − k) / C(N, n) = Φ(x).  (35)

A particular case of this theorem, when p = M/N is constant, was derived by S. N. Bernstein.
Note further that if p = M/N is constant and n increases more slowly than N (if, for instance, n²/N → 0), (35) can be derived from the de Moivre-Laplace theorem by approximating the terms of the hypergeometric distribution by those of the binomial distribution (see Chapter II, § 12, Exercise 18). However, the general case cannot be treated in this way.
Theorem 2 can also be proved directly by merely considering the asymp
totic behaviour of the terms of the hypergeometric distribution, but this
procedure leads to tiresome calculations.
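For moderate sizes the statement of Theorem 2 can be tested directly from the hypergeometric probabilities. The sketch below (our illustration; the particular N, M, n are arbitrary choices) compares the standardized hypergeometric tail with Φ(x):

```python
import math

# Sketch of (35): the hypergeometric cumulative probability up to
# np + x*sigma, with sigma**2 = n*p*(1-p)*(1-lam), p = M/N, lam = n/N,
# compared with the normal distribution function Phi(x).
def hyper_pmf(N, M, n, k):
    return math.comb(M, k) * math.comb(N - M, n - k) / math.comb(N, n)

def hyper_cdf_std(N, M, n, x):
    p, lam = M / N, n / N
    sigma = math.sqrt(n * p * (1 - p) * (1 - lam))
    kmax = int(n * p + x * sigma)
    return sum(hyper_pmf(N, M, n, k) for k in range(0, kmax + 1))

def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

assert abs(hyper_cdf_std(10_000, 4_000, 1_000, 1.0) - Phi(1.0)) < 0.02
```

Already for N = 10000 the agreement is within a few thousandths; the residual discrepancy is the usual continuity-correction effect of a lattice distribution.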
Consider the sums

ζ_n = Σ_{k=1}^{n} ξ_k.

Assume that there exist two sequences {C_n} and {S_n} with S_n → +∞ and a distribution function F(x) such that at every continuity point of F(x) the distribution function of

η_n = ( ζ_n − C_n ) / S_n

tends to F(x):

lim_{n→+∞} P( η_n < x ) = F(x).

Then the sequence of the random variables η_n is mixing.
Proof. We shall use the following lemma due to H. Cramér:¹ if the distribution function of θ_n tends to F(x) and ε_n → 0 in probability, then the distribution function of θ_n + ε_n tends to F(x) as well. By assumption

lim_{n→+∞} P( |ε_n| > δ ) = 0.
F(x − δ) − P( |ε_n| > δ ) ≤ P( θ_n + ε_n < x ) ≤ F(x + δ) + P( |ε_n| > δ ),  (5)

hence

F(x − δ) ≤ liminf_{n→+∞} P( θ_n + ε_n < x ) ≤ limsup_{n→+∞} P( θ_n + ε_n < x ) ≤ F(x + δ).  (6)
Since x is by assumption a continuity point of F(x) and S > 0 may be
taken arbitrarily small, (3) is proved.
Let now x be a continuity point of F(x) and suppose F(x) > 0. Then by assumption we can find an n_0 such that P( η_{n_0+k} < x ) > 0 for every k ≥ 1. Put A_0 = Ω and denote by A_k the event η_{n_0+k} < x (k = 1, 2, . . .). Then P(A_k) > 0 and, by assumption, for every fixed k

ζ_{n_0+k} / S_n → 0 in probability as n → +∞,

so that η_n and ( ζ_n − ζ_{n_0+k} − C_n ) / S_n have the same limit distribution.
VIII, § 6] APPLICATION OF MIXING THEOREMS 469
holds. The theorem is therefore proved for every x such that F(x) > 0. If x is a continuity point of F(x) such that F(x) = 0, the statement can be verified directly. Thus if the distribution functions of the random variables

η_n = ( Σ_{k=1}^{n} ξ_k − C_n ) / S_n

possess a limit distribution, then η_n is in the limit independent of any random variable θ in the following sense: For every y such that P(θ < y) > 0 the relation

lim_{n→+∞} P( η_n < x, θ < y ) = lim_{n→+∞} P( η_n < x ) P( θ < y )  (11)
holds. In particular, if lim_{n→+∞} P( η_n < x ) = F(x), where η_n = ( ζ_n − C_n )/S_n and F(x) is a nondegenerate distribution function, then η_n cannot converge in probability to a random variable η_∞.
Q(A) = ∫_A χ(ω) dP.
VIII, § 7] SUMS OF A RANDOM NUMBER OF RANDOM VARIABLES 471
If, putting

ζ_n = ( ξ_1 + ξ_2 + · · · + ξ_n ) / √n,  (1)

the relation

lim_{n→+∞} P( ζ_n < x ) = Φ(x)  (2)
1 The sequence of events {A_n} is also mixing with respect to [Ω, 𝒜, Q], since Q*(A) = Q(A | B) is also absolutely continuous with respect to P. Hence by Lemma 2, lim_{n→+∞} Q(A_n | B) = d.

2 Cf. P. Révész [1].
Proof. Put

C_nk = P( ν_n = k )  (n, k = 1, 2, . . .).  (5)

By a known theorem on the transformation of sequences, if lim_{k→+∞} s_k = s, then

lim_{n→+∞} Σ_{k=1}^{∞} C_nk s_k = s.

Now

P( ζ_{ν_n} < x ) = Σ_{k=1}^{∞} P( ζ_k < x, ν_n = k ).  (7)

From (2) and the above-mentioned theorem from the theory of series we obtain (4), and Theorem 1 is proved.
The situation is somewhat more complicated if we do not suppose that ν_n is independent of the variables ξ_k. In this case a stronger condition than (3) must be imposed upon ν_n. As an example we prove now a theorem which is a particular case of Anscombe's theorem.¹ The reasoning is inspired by W. Doeblin.
Here it is required that

ν_n / n → c  in probability,  (9)

where c is a positive constant.

Proof. Put

γ_n = ν_n / (cn),  (11)

so that, by (9), γ_n → 1 in probability. Furthermore

ζ_{ν_n} = ζ_{[cn]} √( [cn]/ν_n ) + ( Σ_{k=[cn]+1}^{ν_n} ξ_k ) / √(ν_n).  (12)
Now we need a simple lemma.
Let δ > 0 be arbitrary. Choose N and n_1 so that for n > n_1 the inequality P( |θ_n| > N ) < δ holds. Choose n_2 > n_1 such that for n > n_2 the inequality P( |γ_n − 1| > ε/N ) < δ is valid. Then P( |θ_n (γ_n − 1)| > ε ) < 2δ for n > n_2. Consequently θ_n (γ_n − 1) → 0 in probability, and the present lemma follows from Lemma 1 of § 6.
According to these two lemmas it suffices for the proof of Theorem 2 to show that

( Σ_{k=[cn]+1}^{ν_n} ξ_k ) / √(cn) → 0 in probability.  (13)

Let ε > 0 and δ > 0 be arbitrary; choose n_1 such that

P( |ν_n − cn| > δ ε² cn ) < δ  for n > n_1.  (14)
Clearly
£ Р Ш < «, (22)
k = m +1
Theorem 4. Let α be a positive discrete random variable; suppose (2) and

ν_n / n → α  in probability.  (23)

Then (4) is valid. The proof¹ rests upon Theorem 3 and uses the same method as the proof of Theorem 2.
any essential restriction of the generality it can be assumed that the state space is the set of the nonnegative integers. In this particular case (1) can be written in the following form: If n, k, j_0, j_1, . . ., j_n are any nonnegative integers, then

P( ζ_{n+1} = k | ζ_0 = j_0, . . ., ζ_n = j_n ) = P( ζ_{n+1} = k | ζ_n = j_n ).  (2)
Similarly, we can show that for arbitrary integers 0 < nx < n2 < . . . <
< ns < n,
Clearly for every positive integer m and for j ≥ 0 the relation

Σ_{k=0}^{∞} p_jk^{(m)} = 1  (5)
holds. In fact, the terms of the sum are the probabilities belonging to a com
plete system of events. Hence the matrix Пт, which has nonnegative terms
only, has the property that the sum of terms in each row is equal to 1. Such
matrices with nonnegative elements are called stochastic matrices. The matrix
Π_m can be computed from Π as follows. According to the theorem of complete probability (cf. Chapter III, § 2, Formula (2)) we have for 1 ≤ r ≤ m

P( ζ_{n+m} = k | ζ_n = j ) = Σ_{l=0}^{∞} P( ζ_{n+m} = k | ζ_{n+r} = l, ζ_n = j ) P( ζ_{n+r} = l | ζ_n = j ),  (6)

hence

p_jk^{(m)} = Σ_{l=0}^{∞} p_jl^{(r)} p_lk^{(m−r)}.  (7)

Thus we have Π_m = Π^m. The matrix of m-step transition probabilities is thus the m-th power of the matrix of one-step transition probabilities.
So far we have only considered transition probabilities, i.e. conditional probabilities. In order to determine from these the probability distribution of ζ_n, we must know the state of the system at the instant t = 0 or at least
the probabilities of the initial state of the system, i.e. the probability distribution P(ζ_0 = k) (k = 0, 1, . . .). With the notation P(ζ_n = k) = P_n(k) (n = 0, 1, . . .) one can thus write

P_n(k) = Σ_{j=0}^{∞} P_0(j) p_jk^{(n)}.  (10)

In particular, if the system starts from the state j_0, i.e. P_0(j_0) = 1, then

P_n(k) = p_{j_0 k}^{(n)}.  (11)

In the two-state example

Π = ( 1−λ  λ ; μ  1−μ ),

and one finds

P_n(1) = λ/(λ+μ) + (1 − λ − μ)^n ( P_0(1) − λ/(λ+μ) ),  (12a)

P_n(0) = μ/(λ+μ) + (1 − λ − μ)^n ( P_0(0) − μ/(λ+μ) ),  (12b)

where P_0(1) and P_0(0) are the probabilities that at time 0 the machine works and does not work, respectively. Since 0 < λ < 1 and 0 < μ < 1, we have always |1 − λ − μ| < 1; hence (12a) and (12b) lead to
on the initial distribution P0(j), then the Markov chain is called ergodic.
An initial distribution such that £„ has the same distribution for every value
of n, is called a stationary distribution. If the Markov chain is ergodic and
there exists a stationary distribution, the latter is evidently the limit distribution of ζ_n. It is easy to show that there exists a stationary distribution iff the system of equations

x_k = Σ_{j=0}^{∞} p_jk x_j  (k = 0, 1, . . .)  (14)

has a solution with x_k ≥ 0 and Σ_{k=0}^{∞} x_k = 1. In our example (14) reads

x_0 = (1 − λ) x_0 + μ x_1,
x_1 = λ x_0 + (1 − μ) x_1,  (15)

hence

x_1 = λ/(λ + μ),  x_0 = μ/(λ + μ).  (16)
In this example there exists a stationary distribution and the Markov chain
is ergodic.
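The two-state example can be verified numerically. The following sketch (ours; λ and μ are arbitrary values) iterates the distribution under the transition matrix and compares the result with (16):

```python
# Sketch: two-state chain with transition matrix [[1-lam, lam], [mu, 1-mu]].
# The stationary distribution is (mu/(lam+mu), lam/(lam+mu)), cf. (16), and
# P_n converges to it geometrically at rate |1 - lam - mu|, cf. (12a)-(12b).
lam, mu = 0.3, 0.5
P = [[1 - lam, lam], [mu, 1 - mu]]
dist = [1.0, 0.0]                       # start surely in state 0
for _ in range(200):                    # dist <- dist * P
    dist = [dist[0] * P[0][0] + dist[1] * P[1][0],
            dist[0] * P[0][1] + dist[1] * P[1][1]]
stationary = [mu / (lam + mu), lam / (lam + mu)]
assert abs(dist[0] - stationary[0]) < 1e-12
assert abs(dist[1] - stationary[1]) < 1e-12
```

With |1 − λ − μ| = 0.2 the deviation after 200 steps is far below machine precision, illustrating the geometric convergence stated in (13).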
The following theorem, due essentially to A. A. Markov, shows that this
holds under rather general conditions.
P_k = Σ_{j=0}^{N} P_j p_jk,  (19)

which satisfies

Σ_{k=0}^{N} P_k = 1.  (20)
Proof. Put

m_k^{(n)} = min_{0≤j≤N} p_jk^{(n)},  M_k^{(n)} = max_{0≤j≤N} p_jk^{(n)}.

Clearly

m_k^{(n)} ≤ m_k^{(n+1)}  and  M_k^{(n+1)} ≤ M_k^{(n)},  (25)

since for 0 ≤ j ≤ N

p_jk^{(n+1)} = Σ_{l=0}^{N} p_jl p_lk^{(n)}  (31)

holds, and for a certain l_0:
Let H be the set of all j (0 ≤ j ≤ N) for which p_{j_0 j}^{(s)} − p_{j_1 j}^{(s)} ≥ 0, and let H̄ be the complementary set of H, i.e. the set of those j (0 ≤ j ≤ N) for which p_{j_0 j}^{(s)} − p_{j_1 j}^{(s)} < 0 holds. Put

A = Σ_{j∈H} ( p_{j_0 j}^{(s)} − p_{j_1 j}^{(s)} ),  B = Σ_{j∈H̄} ( p_{j_0 j}^{(s)} − p_{j_1 j}^{(s)} ).

Then

A + B = Σ_{j=0}^{N} p_{j_0 j}^{(s)} − Σ_{j=0}^{N} p_{j_1 j}^{(s)} = 1 − 1 = 0,

hence B = −A, and it follows from (33) that

A ≤ 1 − d,

hence (36) is valid in both cases. It follows thus from (35) and (36) that

M_k^{(n+s)} − m_k^{(n+s)} ≤ (1 − d) ( M_k^{(n)} − m_k^{(n)} ).  (37)
Q_k = Σ_{l=0}^{N} Q_l p_lk  (k = 0, 1, . . ., N)  (40)

implies

Q_k = Σ_{l=0}^{N} Q_l p_lk^{(n)}.

Because of lim_{n→+∞} p_lk^{(n)} = P_k and Σ_{l=0}^{N} Q_l = 1 there follows Q_k = P_k, which was to be proven. The numbers P_k fulfill the equations

P_k = Σ_{j=0}^{N} P_j p_jk^{(n)}  (41)

for every n = 1, 2, . . .. Equation (19) is a particular case of (41).
If the distribution of (0 is known, P(C0 — i) = P0(i), then from (10a) one
can derive the relation
Finally, let us mention the following particular case: assume that for the matrix of the transition probabilities (p_jk) the sum of every column is also equal to 1:

Σ_{j=0}^{N} p_jk = 1  for k = 0, 1, . . ., N.

It follows from (42) that the probabilities of the N + 1 states are in the limit equal to each other, regardless of the initial distribution.
A particular class of the Markov chains is that of the so-called additive Markov chains. If ξ_0, ξ_1, . . . are independent random variables and if we put ζ_n = ξ_0 + ξ_1 + · · · + ξ_n, the random variables ζ_n (n = 0, 1, . . .) form a Markov chain, since the conditional distribution of ζ_{n+1} given ζ_0, . . ., ζ_n depends on ζ_n only. For homogeneous Markov chains with a finite number of states the suitably standardized sum

η_n = ζ_0 + ζ_1 + · · · + ζ_n

is under general conditions in the limit normally distributed. This will be proved here only for the simplest case of a chain with two states.
η_n = Σ_{k=0}^{n} ζ_k.  (45)

Then

lim_{n→+∞} P( ( η_n − nλ/(λ+μ) ) / √( nλμ(2−λ−μ)/(λ+μ)³ ) < x ) = Φ(x).  (46)

Indeed, E(η_n) = nλ/(λ+μ) + O(1) and D²(η_n) = nλμ(2−λ−μ)/(λ+μ)³ + O(1); hence (46) states that η_n is asymptotically normally distributed.
VIII, § 8] LIMIT DISTRIBUTIONS FOR MARKOV CHAINS 485

Proof. If t_n denotes the instant when the system returns for the n-th time to the state A_1, we have 0 < t_1 < t_2 < . . . < t_n < . . ., ζ_{t_n} = 1; put δ_1 = t_1 and δ_n = t_n − t_{n−1} (n ≥ 2). The δ_n (n ≥ 2) are independent and

P( δ_n = 1 ) = 1 − μ

and

P( δ_n = k ) = μ λ (1 − λ)^{k−2}  for k ≥ 2.

Hence

E(δ_n) = (λ + μ)/λ  and  D²(δ_n) = μ(2 − λ − μ)/λ².

If ζ_0 = 1, δ_1 has the same distribution as the other δ_n (n ≥ 2); if ζ_0 = 0, we have P( δ_1 = k ) = (1 − λ)^{k−1} λ for k ≥ 1, hence E(δ_1) = 1/λ.
By the central limit theorem, applied to the sums t_k = δ_1 + δ_2 + · · · + δ_k, we get

lim_{k→+∞} P( ( t_k − k(λ+μ)/λ ) / ( √( kμ(2−λ−μ) ) / λ ) < x ) = Φ(x).  (47)

Now obviously P(η_n < k) = P(t_k > n); in fact η_n < k means that up to the moment n the system was less than k times in the state A_1, thus its k-th entrance into the state A_1 occurs after the moment n, hence t_k > n, and conversely. If we put

k = [ nλ/(λ+μ) + x √( nλμ(2−λ−μ)/(λ+μ)³ ) ],

then

n = k(λ+μ)/λ − x √( kμ(2−λ−μ) )/λ + O(1),

hence

P( η_n < k ) = P( ( t_k − k(λ+μ)/λ ) / ( √( kμ(2−λ−μ) )/λ ) > −x + O( 1/√k ) ).  (48)

Thus (47) leads to

lim_{n→+∞} P( ( η_n − nλ/(λ+μ) ) / √( nλμ(2−λ−μ)/(λ+μ)³ ) < x ) = 1 − Φ(−x) = Φ(x).  (49)
We use the following property of the exponential distribution: for any x and y one has

P( ζ > x + y | ζ > y ) = P( ζ > x ).  (1)

Property (1) is characteristic of the exponential distribution. In fact, it is equivalent to

G(x + y) = G(x) G(y)  (2)

if F(x) is the distribution function of ζ and G(x) = 1 − F(x); we know already (cf. Chapter III, § 13) that the only nonincreasing solutions of (2),
1 For the method see A. Rényi [9], [10].
VIII, § 9] LIMIT DISTRIBUTIONS FOR "ORDER STATISTICS" 487
the trivial solutions G(x) = 0 and G(x) = 1 excepted, are the functions of the form G(x) = exp(−Ax) with A > 0. The meaning of (1) becomes particularly clear if we interpret ζ as the duration of an event which takes a certain lapse of time to occur. In this case (1) expresses the fact that the future duration of an event which is still in course at a moment y does not depend on the time passed already since the beginning of this event.
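Equation (2) can be checked numerically for G(x) = exp(−Ax); the sketch below (our illustration, with an arbitrary value of A) does so:

```python
import math

# Sketch: the memoryless property (1) of the exponential law, in the
# multiplicative form (2): G(x + y) = G(x) * G(y) with G(x) = exp(-A*x).
A = 1.7
def G(x):
    return math.exp(-A * x)

for x, y in [(0.2, 1.3), (2.0, 0.5)]:
    assert abs(G(x + y) - G(x) * G(y)) < 1e-12                 # identity (2)
    assert abs(G(x + y) / G(y) - G(x)) < 1e-12                 # conditional form (1)
```

No other nonincreasing survival function besides the trivial ones satisfies this product identity, which is the characterization stated above.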
Arrange the random variables ξ_1, ξ_2, . . ., ξ_n in increasing order and let

ξ*_k = f_k( ξ_1, ξ_2, . . ., ξ_n )

be the k-th of the ranked variables ξ_j. Then¹

1 F(x) being continuous, the probability that two of the ξ_j are equal is zero; this possibility can thus be omitted.
ε_k = ln ( 1 / ( 1 − F(ξ*_{n+1−k}) ) )  (k = 1, 2, . . ., n)  (10)

and, as the ξ_k are independent, the same is valid for the variables ln( 1/(1 − F(ξ_k)) ).
Consider now the distribution of ξ*_k. Let y = F^{−1}(x) (0 < x < 1) be the inverse function of x = F(y) (−∞ < y < +∞). Then the relation

P( F(ξ_k) < x ) = x

holds for 0 < x < 1, i.e. the random variables F(ξ_k) are uniformly distributed in the interval (0, 1). The random variables F(ξ*_k) are thus the ordered elements of a sample selected from a population uniformly distributed on (0, 1).
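This reduction to the uniform case is convenient for simulation. The sketch below (ours; seed and sample sizes are arbitrary) checks the standard fact that the k-th of n uniform order statistics has expectation k/(n + 1):

```python
import random

random.seed(7)
# Sketch: the k-th order statistic of a uniform(0, 1) sample of size n has
# expectation k/(n+1); we verify this by averaging over many samples.
n, k, trials = 9, 3, 20_000
total = 0.0
for _ in range(trials):
    sample = sorted(random.random() for _ in range(n))
    total += sample[k - 1]          # the k-th smallest value
est = total / trials
assert abs(est - k / (n + 1)) < 0.01
```

Via the transformation ξ*_k = F^{−1}(U*_k) the same statement carries over to an arbitrary continuous F, which is the device used repeatedly below.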
Starting from this point of view, many results on order statistics can be derived quite easily. As an example, consider the following problem: What is the limit distribution of the random variables ξ*_k when both n and k tend to infinity in such a way that k/n tends to a limit q (0 < q < 1)? In particular we consider the case k = [nq] + 1; ξ*_{[nq]+1} is called the sample quantile of order q.
We prove a theorem which implies in particular that the sample quantile
of order q is in the limit normally distributed, provided that the distribution
function F(x) fulfills certain conditions.
D = ( 1 / f( F^{−1}(q) ) ) √( q(1 − q) ).  (20)
Proof. We consider first the limit distribution of ξ*_{n+1−k(n)} in the exponential case (21). By (12)

ξ*_{n+1−k(n)} = Σ_{j=1}^{n+1−k(n)} ε_j / ( n + 1 − j ),  (22)

where the ε_j are independent and exponentially distributed with expectation 1. Since

Σ_{j=1}^{N} 1/j = ln N + C + O(1/N),

we get

M_n = E( ξ*_{n+1−k(n)} ) = ln (1/q) + O(1/n).  (24)
Since

Σ_{j=N_1}^{N_2} 1/j² = 1/N_1 − 1/N_2 + O(1/N_1²),

we get

D_n² = D²( ξ*_{n+1−k(n)} ) = Σ_{j=1}^{n+1−k(n)} 1/( n + 1 − j )² = ( (1 − q)/(qn) )( 1 + o(1) ),

and since

Σ_{j=N_1}^{N_2} 1/j³ = 1/(2N_1²) − 1/(2N_2²) + O(1/N_1³),

it follows that the sum of the third absolute central moments of the terms of (22) is O(1/n²). Thus Liapunov's form of the central limit theorem (Theorem 4 of § 1) can be applied to the sums (22). Taking into account the lemmas of Sections 6 and 7 we get

lim_{n→+∞} P( ( ξ*_{n+1−k(n)} − ln (1/q) ) / ( √( (1−q)/q ) / √n ) < x ) = Φ(x).  (28)
1 Cf. K. Knopp [1 ].
Returning to the general distribution function F(x), we have by (29), applying the transformation y = F^{−1}( 1 − e^{−x} ),

ξ*_{n+1−k(n)} = F^{−1}(1 − q) + ( ϑ_n x √( q(1 − q) ) ) / ( √n f( F^{−1}(1 − q) ) ),  (30)

where lim_{n→+∞} ϑ_n = 1; further

F^{−1}( 1 − q + x √( q(1 − q) ) / √n ) = F^{−1}(1 − q) + ( x √( q(1 − q) ) ) / ( √n f( F^{−1}(1 − q) ) ) + o( 1/√n ).
Now (29), the continuity of f(x), and the lemmas of § 6 and § 7 imply (18),
hence the theorem is proved.
The theorem states that the sample quantile of order q of a sample of n elements is for sufficiently large n nearly normally distributed. Define now the empirical distribution function F_n(x) of the sample ξ_1, . . ., ξ_n by

F_n(x) = 0 for x ≤ ξ*_1,  F_n(x) = k/n for ξ*_k < x ≤ ξ*_{k+1} (k = 1, . . ., n − 1),  F_n(x) = 1 for ξ*_n < x.
Theorem 1 (Smirnov).

Theorem 2 (Kolmogorov).

lim_{n→+∞} P( √n sup_{−∞<x<+∞} | F_n(x) − F(x) | < y ) = K(y) for y > 0, and = 0 otherwise,

where

K(y) = Σ_{k=−∞}^{+∞} (−1)^k e^{−2k²y²}.  (2)
Notice that in these two theorems the limit distributions do not depend on F(x). It suffices that F(x) is continuous; this guarantees the validity of these and all further theorems in this section. The values of the function K(y) figuring in Kolmogorov's theorem are given in Table 8 at the end of this book.
The theorems of Smirnov and Kolmogorov may serve to test the hypothe
sis that a sample of size n was drawn from a population with a given con
tinuous distribution function F(x).
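The function K(y) of (2) is easy to evaluate; the following sketch (ours) computes the series and checks the familiar value K(1.36) ≈ 0.95 used when the Kolmogorov test is applied at the 5% level:

```python
import math

# Sketch: the Kolmogorov limit distribution of (2),
# K(y) = sum over all integers k of (-1)**k * exp(-2*k*k*y*y).
def K(y, terms=100):
    return sum((-1) ** k * math.exp(-2 * k * k * y * y)
               for k in range(-terms, terms + 1))

assert abs(K(1.36) - 0.9505) < 0.001   # the classical 5% critical value
assert K(3.0) > 0.99999                # K(y) -> 1 rapidly as y grows
```

The series converges extremely fast: for y around 1 only a handful of terms contribute beyond machine precision, which is why short tables such as Table 8 suffice in practice.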
The theorems of Kolmogorov and Smirnov refer to the maximal deviation
between Fn(x) and F(x). Often it is more convenient to consider the maximum
1 For the proof of Theorem 1 cf. § 13, Exercise 23.
for F(x) ≥ a > 0 of the relative deviation ( F_n(x) − F(x) ) / F(x). The following theorems are concerned with this relative deviation.¹
Theorem 3. We have

lim_{n→+∞} P( √n sup_{x_a ≤ x < +∞} ( F_n(x) − F(x) ) / F(x) < y ) = √(2/π) ∫_0^{y √( a/(1−a) )} e^{−t²/2} dt  for y > 0,

and = 0 otherwise, where x_a is defined by F(x_a) = a, 0 < a < 1.
Theorem 4. We have

lim_{n→+∞} P( √n sup_{x_a ≤ x < +∞} | F_n(x) − F(x) | / F(x) < y ) = L( y √( a/(1−a) ) )  for y > 0,

and = 0 otherwise, where

L(z) = (4/π) Σ_{k=0}^{∞} ( (−1)^k / (2k + 1) ) exp( −π²(2k + 1)² / (8z²) )  (3)

and x_a is defined by F(x_a) = a, 0 < a < 1. The values of the function L(z) defined by (3) are tabulated in Table 9.
We may be interested in the maximum of the relative deviation over an
interval (xa, x b), where xa and x b are defined by F{xa) = a and F(xb) — b
(0 < a < b < 1). This problem is solved by
Theorem 5. If 0 < a < b < 1, F(x_a) = a, F(x_b) = b, then the limit

lim_{n→+∞} P( √n sup_{x_a ≤ x ≤ x_b} ( F_n(x) − F(x) ) / F(x) < y )

exists for every y and is given by an explicit, though rather involved, double integral.¹

1 Cf. A. Rényi [9].
VIII, § 10] EMPIRICAL DISTRIBUTION FUNCTIONS 495
i.e. the probability that the empirical distribution function remains everywhere under the theoretical distribution function tends to zero. According to Theorem 3 the same holds if we restrict ourselves to values of x superior to x_a (a > 0). However, if we consider an interval [x_a, x_b] with 0 < a <
< b < 1, then by Theorem 5,

lim_{n→+∞} P( sup_{x_a ≤ x ≤ x_b} ( F_n(x) − F(x) ) < 0 )  (4)

is positive: it equals the probability that a normally distributed random point of the plane lies in a certain angle, the two coordinates of the point having the respective standard deviations √( a(1−b)/(b−a) ) and 1. Now this probability is equal to

(1/π) arc tan √( (b−a)/( a(1−b) ) ) = 1/2 − (1/π) arc sin √( a(1−b)/( b(1−a) ) ).  (5)
As a matter of fact, for a normal distribution symmetrical in x and y the probability of the random point lying in an angle γ is γ/(2π); an affine transformation leads from this to the general case. Thus we have
P( sup_{−∞<x<+∞} ( F_n(x) − G_n(x) ) < z √(2/n) ) = 0 for z ≤ 0, = 1 − C(2n, n − c)/C(2n, n) for 0 < z ≤ √(n/2), = 1 otherwise,

and

P( sup_{−∞<x<+∞} | F_n(x) − G_n(x) | < z √(2/n) ) = 0 for z ≤ 1/√(2n), = ( 1/C(2n, n) ) Σ_{k=−[n/c]}^{[n/c]} (−1)^k C(2n, n − kc) for 1/√(2n) < z ≤ √(n/2), = 1 otherwise,

where c = { z√(2n) }. The values of these probabilities are tabulated in Table 7, for n ≤ 30; for n > 30 Theorem 8 can already be applied.
First we prove Theorems 9 and 10; Theorems 7 and 8 can then be derived by passing to the limit. Collect the random variables ξ_1, . . ., ξ_n, η_1, . . ., η_n into one sequence and arrange these 2n numbers in increasing order; let ζ*_k denote the k-th number in this ordered sequence. One can suppose that ζ*_1 < ζ*_2 < . . . < ζ*_{2n}. Put

θ_k = +1 if ζ*_k is one of the ξ-s, and θ_k = −1 otherwise.

Thus in the sequence θ_1, θ_2, . . ., θ_{2n}, n numbers are equal to +1 and n numbers are equal to −1. Put S_k = θ_1 + θ_2 + · · · + θ_k. We prove first that the relations

sup_{−∞<x<+∞} n( F_n(x) − G_n(x) ) = max_{1≤k≤2n} S_k
and, similarly,

sup_{−∞<x<+∞} n | F_n(x) − G_n(x) | = max_{1≤k≤2n} n | F_n(ζ*_k + 0) − G_n(ζ*_k + 0) | = max_{1≤k≤2n} | S_k |
are valid. In order to compute the probability in question we have thus to find the number of the sequences θ_1, . . ., θ_{2n} fulfilling this condition and then divide this number by C(2n, n). We arrived thus at a combinatorial problem. Its solution will be facilitated by the following geometrical representation: Assign to every sequence θ_1, . . ., θ_{2n} a broken line in the (x, y) plane starting from the point (0, 0) with the points (S_k, k) (k = 1, 2, . . ., 2n) as vertices. (Here (a, b) denotes the point with coordinates x = a, y = b.) There corresponds thus to every sequence θ_1, . . ., θ_{2n} a "path" in the plane; all paths start from (0, 0) and end at (0, 2n); all are composed of segments forming with the x-axis an angle either of +45° or of −45°. We have to determine the number of those paths which do not intersect the line x = z√(2n). Let this number be denoted by U*_n(z). If a path intersects the line x = z√(2n), it is clear that it reaches the line x = { z√(2n) } = c, too.
Thus we have to count those paths which lie everywhere below the line x = c. First we count the paths which intersect the line x = c. If a path intersects the line x = c, we uniquely assign to it a path which is identical with the original one up to the first intersection with the line x = c and from this point on is the reflection of the original path with
respect to the line x = c. The new path ends at the point (2c, 2n). By this procedure, we assign to every path going from (0, 0) to (0, 2n) and intersecting the line x = c, in a one-to-one manner, a path which goes from (0, 0) to (2c, 2n) and is composed of segments which again form an angle of ±45° with the x-axis. The number of paths having one or more points in common with the line x = c is thus equal to the total number of the paths going from (0, 0) to (2c, 2n). This number is \binom{2n}{n-c}. Hence

U_n^*(z) = \binom{2n}{n} - \binom{2n}{n-c}.
Because of the lemma, Theorem 9 is herewith proved.
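The reflection argument can be checked by brute force for small n. The following Python sketch — an illustrative addition, not part of the original proof — enumerates all balanced ±1 sequences and compares the number of paths whose partial sums stay strictly below c with \binom{2n}{n} - \binom{2n}{n-c}:

```python
from itertools import combinations
from math import comb

def paths_below(n, c):
    """Count sequences of n steps +1 and n steps -1 whose partial sums
    S_1, ..., S_2n all stay strictly below the level c."""
    total = 0
    for plus_positions in combinations(range(2 * n), n):
        plus = set(plus_positions)
        s, ok = 0, True
        for i in range(2 * n):
            s += 1 if i in plus else -1
            if s >= c:          # the path reaches the line x = c
                ok = False
                break
        if ok:
            total += 1
    return total

n, c = 5, 2
# Reflection principle: the paths reaching level c are counted by C(2n, n-c).
assert paths_below(n, c) == comb(2 * n, n) - comb(2 * n, n - c)
```

For c = 1 the count reduces to the Catalan numbers, as expected for paths that never become positive.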
Proof of Theorem 10. We use a similar argument. The number of paths going from (0, 0) to (0, 2n) and having no point in common either with x = z\sqrt{2n} or with x = -z\sqrt{2n} is equal to the number of paths going from (0, 0) to (0, 2n) and having no point in common with the lines x = \pm c. Let this number be denoted by U_n(z).

Let N_+ and N_- denote the number of paths intersecting x = c and x = -c, respectively. Let N_{+-} (and N_{-+}) denote the number of the paths which after intersecting x = c (and x = -c) intersect also x = -c (and x = c), respectively, etc. Let N_0 denote the number of the paths which do not intersect either x = c or x = -c. It can be shown, as in Chapter II (§ 3, Theorem 9), that

N_0 = N - N_+ - N_- + N_{+-} + N_{-+} - N_{+-+} - N_{-+-} + \ldots.   (7)
Theorem 7 can now be obtained from Theorem 9 by passing to the limit, by means of the relation

\lim_{n \to +\infty} \frac{\binom{2n}{n+c}}{\binom{2n}{n}} = e^{-2z^2} \qquad (c = \{z\sqrt{2n}\}).
Theorem 8 can be derived from Theorem 10 in a similar way.
§ 11. Random walk problems

In this section we shall study limit theorems of another type than those encountered so far. As we do not strive for the greatest possible generality but rather wish to present the different types of limit distributions, we shall restrict ourselves mainly to the simplest case, i.e. to the case of the one-dimensional random walk (classical ruin problem). We shall find in the study of this simple problem a lot of surprising laws which contribute to a better understanding of the nature of chance. Theorems 1 and 2 are concerned with the problem of random walk in r-space.
Let the random variables \xi_1, \xi_2, \ldots be independent and let each of them assume the values +1 and -1 with probability \frac{1}{2}. The random variable

\zeta_n = \sum_{k=1}^{n} \xi_k   (1)
then describes the position, after n steps, of a point performing a random walk on the set G_1 of integers of the real line; for general r, let G_r denote the set of all points of the r-dimensional space with integer coordinates, the "r-dimensional lattice". Imagine a point which moves "at random" over this lattice. We understand by a "random walk" the following: If the moving point can be found at a time t = n at a certain lattice point, then the probability that at the time t = n + 1 it can be found at one of the adjacent points of the lattice is equal to \frac{1}{2r}, for each of the 2r adjacent points which have r - 1 coordinates equal to those of the preceding point and one coordinate differing by ±1. If the position of the point at the time t = n is given by the vector \zeta_n^{(r)}, then the random vectors \zeta_n^{(r)} (n = 0, 1, \ldots) form a homogeneous additive Markov chain, namely

\zeta_n^{(r)} = \zeta_0^{(r)} + \sum_{k=1}^{n} \xi_k^{(r)},
The probability that the moving point returns at the moment 2n to its starting point is

P_{2n}^{(r)} = \frac{1}{(2r)^{2n}} \sum_{n_1 + \cdots + n_r = n} \frac{(2n)!}{(n_1! \cdots n_r!)^2} = \frac{\binom{2n}{n}}{(2r)^{2n}} \sum_{n_1 + \cdots + n_r = n} \left(\frac{n!}{n_1! \cdots n_r!}\right)^2.   (2)

In particular

P_{2n}^{(1)} = \frac{\binom{2n}{n}}{2^{2n}},
1 Cf. G. Pólya [2].
P_{2n}^{(2)} = \frac{\binom{2n}{n}^2}{4^{2n}},

and

P_{2n}^{(3)} = \frac{\binom{2n}{n}}{6^{2n}} \sum_{k+l \le n} \left(\frac{n!}{k!\,l!\,(n-k-l)!}\right)^2.
By Stirling's formula

P_{2n}^{(1)} \sim \frac{1}{\sqrt{\pi n}}   (3a)

and

P_{2n}^{(2)} \sim \frac{1}{\pi n}.   (3b)

For r \ge 3 note first that

\sum_{n_1 + \cdots + n_r = n} \frac{n!}{n_1! \cdots n_r!} = r^n.
On the other hand, it is easy to see that among the polynomial coefficients the largest are those in which the numbers n_1, n_2, \ldots, n_r differ at most by ±1 from each other (cf. Chapter III, § 18, Exercise 3). Hence

P_{2n}^{(r)} \le \frac{\binom{2n}{n}}{(4r)^n} \max_{n_1 + \cdots + n_r = n} \frac{n!}{n_1! \cdots n_r!} = O\!\left(\frac{1}{n^{r/2}}\right).   (3c)
On the other hand, it can be proved that P_{2n}^{(r)} can be represented by the following integral:

P_{2n}^{(r)} = \frac{1}{(2\pi)^r} \int_{-\pi}^{+\pi} \cdots \int_{-\pi}^{+\pi} \left(\frac{\cos x_1 + \cdots + \cos x_r}{r}\right)^{2n} dx_1 \cdots dx_r.   (4)

It follows from (3a), (3b) and (3c) that

\sum_{n=1}^{\infty} P_{2n}^{(r)} = +\infty \ \text{for } r = 1 \text{ and } r = 2, \qquad \sum_{n=1}^{\infty} P_{2n}^{(r)} < +\infty \ \text{for } r \ge 3.
In the latter case the Borel–Cantelli lemma permits us to state that for r \ge 3 the moving point returns with probability 1 at most finitely many times to its initial position.

For r = 1 and r = 2 we shall show that with probability 1 the moving point will sooner or later (and therefore infinitely often) return to its initial position. In order to prove this, consider the time interval which passes until the first return of the moving point. Let Q_{2n}^{(r)} denote the probability that the point walking at random on the r-dimensional lattice reaches its initial position for the first time after 2n steps. Obviously,
P_{2n}^{(r)} = Q_{2n}^{(r)} + \sum_{k=1}^{n-1} P_{2k}^{(r)} Q_{2n-2k}^{(r)}.   (5)
Put

G_r(x) = \sum_{n=1}^{\infty} P_{2n}^{(r)} x^n \quad \text{and} \quad H_r(x) = \sum_{n=1}^{\infty} Q_{2n}^{(r)} x^n.   (6)

It follows from (5) that

H_r(x) = \frac{G_r(x)}{1 + G_r(x)}   (9a)

and

G_r(x) = \frac{H_r(x)}{1 - H_r(x)}.   (9b)
Clearly,

Q^{(r)} = \sum_{k=1}^{\infty} Q_{2k}^{(r)} = H_r(1) = \lim_{x \to 1-0} \frac{G_r(x)}{1 + G_r(x)},   (10)

where Q^{(r)} denotes the probability that the moving point returns at least once to the origin. For r = 1 and r = 2 the series \sum_{k=1}^{\infty} P_{2k}^{(r)} is divergent, hence Q^{(r)} = 1, while for r \ge 3

Q^{(r)} = \frac{\sum_{k=1}^{\infty} P_{2k}^{(r)}}{1 + \sum_{k=1}^{\infty} P_{2k}^{(r)}},

hence 0 < Q^{(r)} < 1. (E.g., for r = 3, Q^{(3)} \approx 0.35.) Thus we have proved Theorem 1; at the same time we have obtained
Theorem 2. For r \ge 3 a point performing random walk over the lattice G_r has a probability less than 1 to return to its original position.

It can be shown in a similar manner that for r = 1 and r = 2 the moving point passes infinitely many times through every point of the lattice with probability 1, while this is not true for r \ge 3.
In what follows we shall deal with the case r = 1 only. First we give the explicit form of the probability Q_{2k}^{(1)}. The generating function (6) is here

H_1(x) = \frac{G_1(x)}{1 + G_1(x)} = 1 - \sqrt{1 - x} = \sum_{k=1}^{\infty} \binom{\frac{1}{2}}{k} (-1)^{k-1} x^k.

Hence

Q_{2k}^{(1)} = \binom{\frac{1}{2}}{k} (-1)^{k-1} = \frac{\binom{2k-2}{k-1}}{k\,2^{2k-1}}.   (11)

A simple calculation shows that

Q_{2k}^{(1)} \sim \frac{1}{2\sqrt{\pi}\,k^{3/2}}.   (12)
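The explicit expression for the first-return probabilities can be cross-checked against the recursion (5); the following sketch (Python, added for illustration, not part of the original text) computes Q^{(1)}_{2k} recursively from the return probabilities and compares it with the closed form:

```python
from math import comb

def P(n):
    """P^(1)_{2n}: probability of being at 0 after 2n steps (r = 1)."""
    return comb(2 * n, n) / 4 ** n

def Q_recursive(N):
    """First-return probabilities from the renewal relation
    P_{2n} = Q_{2n} + sum_{k=1}^{n-1} P_{2k} Q_{2n-2k}."""
    Q = {}
    for n in range(1, N + 1):
        Q[n] = P(n) - sum(P(k) * Q[n - k] for k in range(1, n))
    return Q

Q = Q_recursive(8)
for k in range(1, 9):
    closed_form = comb(2 * k - 2, k - 1) / (k * 2 ** (2 * k - 1))
    assert abs(Q[k] - closed_form) < 1e-12
```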
VIII, § 11] RANDOM WALK PROBLEMS 505
Let \nu_1 be the number of the steps in which the moving point first returns to its initial position; hence \nu_1 is a random variable and P(\nu_1 = 2k) = Q_{2k}^{(1)}. It follows from the asymptotic behaviour (12) of the sequence Q_{2k}^{(1)} that the expectation of \nu_1 is infinite. Let \varphi(t) be the characteristic function of \nu_1:

\varphi(t) = 1 - \sqrt{1 - e^{2it}},

hence

\lim_{n \to +\infty} \left[\varphi\!\left(\frac{t}{n^2}\right)\right]^n = \exp(-\sqrt{-2it}).   (13)
But we have

\exp(-\sqrt{-2it}) = \frac{1}{\sqrt{2\pi}} \int_0^{+\infty} \frac{\exp\left(ixt - \frac{1}{2x}\right)}{x^{3/2}}\,dx,   (14)

and

\frac{1}{\sqrt{2\pi}} \int_0^{z} \frac{e^{-\frac{1}{2u}}}{u^{3/2}}\,du = 2\left(1 - \Phi\left(\frac{1}{\sqrt{z}}\right)\right);   (15)

hence for every z > 0

\lim_{n \to +\infty} P\left(\frac{\nu_1 + \nu_2 + \cdots + \nu_n}{n^2} < z\right) = 2\left(1 - \Phi\left(\frac{1}{\sqrt{z}}\right)\right)   (16)

is valid.
For every positive integer n the relation

\frac{1}{2^{2n}} \sum_{k=0}^{n} \binom{2k}{k} \binom{2n-2k}{n-k} = 1   (20)

holds.
Remark. Relation (20) is a corollary of (19); in fact, if we add the probabilities P(\pi_{2n} = 2k) for k = 0, 1, \ldots, n, we obtain 1, remembering that \pi_{2n} is always even. But since we wish to use (20) for the proof of (19), we have to prove (20) directly.
We start from the expansion

\frac{1}{\sqrt{1-x}} = \sum_{k=0}^{\infty} \binom{2k}{k} \frac{x^k}{4^k}.   (21)

Let us take the square of both sides of (21); since on the left side we get

\frac{1}{1-x} = \sum_{k=0}^{\infty} x^k,

(20) is obtained by comparing the coefficients of x^n on both sides.
Now we prove (19) by induction. Clearly, (19) is true for n = 1; in effect

P(\pi_2 = 0) = P(\pi_2 = 2) = \frac{1}{2}.
Suppose that (19) is valid for n < N and let \nu_1 denote the least index j for which \zeta_j = 0; \nu_1 is necessarily an even number. Furthermore

P(\pi_{2N} = 2k) = \sum_{l=1}^{N} P(\pi_{2N} = 2k,\ \nu_1 = 2l) + P(\pi_{2N} = 2k,\ \nu_1 > 2N).

But for l < N

P(\pi_{2N} = 2k,\ \nu_1 = 2l) = \frac{1}{2}\,Q_{2l}^{(1)}\left[P(\pi_{2N-2l} = 2k - 2l) + P(\pi_{2N-2l} = 2k)\right],

since up to the moment 2l the path lies, with probability \frac{1}{2} each, entirely on the positive or entirely on the negative side. The probability P(\pi_{2N} = 2k,\ \nu_1 > 2N) is evidently zero for 0 < k < N; if k = 0 or k = N, it is equal to \frac{1}{2} P(\nu_1 > 2N).
Theorem 6. By Stirling's formula

\binom{2k}{k} \frac{1}{2^{2k}} \sim \frac{1}{\sqrt{\pi k}},

hence

\lim_{n \to +\infty} P\left(\frac{\pi_{2n}}{2n} < x\right) = \frac{1}{\pi} \int_0^x \frac{dt}{\sqrt{t(1-t)}} = \frac{2}{\pi} \arcsin \sqrt{x} \qquad (0 \le x \le 1).

Now since \pi_{2n} \le \pi_{2n+1} \le \pi_{2n} + 1, the limit distribution of \frac{\pi_{2n+1}}{2n+1} coincides with that of \frac{\pi_{2n}}{2n}.
where P_n(x) denotes the n-th Legendre polynomial,

P_n(x) = \frac{1}{2^n n!}\,\frac{d^n}{dx^n}(x^2 - 1)^n.

Proof. We see that the left side of (28) is the coefficient of x^n in the power series expansion of

\frac{1}{\sqrt{(1 - e^{it}x)(1 - e^{-it}x)}} = \frac{1}{\sqrt{1 - 2x\cos t + x^2}}.

On the other hand we know that¹ this is just the generating function of the Legendre polynomials; hence

E\left(\exp\left(\frac{it\,\pi_{2n}}{2n}\right)\right) = e^{\frac{it}{2}}\,P_n\left(\cos\frac{t}{2n}\right).   (30)
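The identity E[exp(it π_{2n}/(2n))] = e^{it/2} P_n(cos(t/(2n))) is easily checked numerically; the sketch below (Python, illustrative only, with P_n computed by the Bonnet recurrence) compares both sides for one choice of n and t:

```python
import cmath
from math import comb, cos

def legendre_P(n, x):
    """P_n(x) via the Bonnet recurrence (k+1)P_{k+1} = (2k+1)xP_k - kP_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

n, t = 6, 0.7
# Left side: expectation over the exact distribution of pi_2n.
lhs = sum(comb(2 * k, k) * comb(2 * n - 2 * k, n - k) / 4 ** n
          * cmath.exp(1j * t * 2 * k / (2 * n))
          for k in range(n + 1))
# Right side: e^{it/2} P_n(cos(t/2n)).
rhs = cmath.exp(1j * t / 2) * legendre_P(n, cos(t / (2 * n)))
assert abs(lhs - rhs) < 1e-10
```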
Remark. Theorem 6 expresses an interesting paradoxical fact. The derivative of the distribution function F(x) = \frac{2}{\pi} \arcsin \sqrt{x}, i.e.

F'(x) = \frac{1}{\pi\sqrt{x(1-x)}},

takes on its minimum at x = \frac{1}{2} and tends to +\infty as x \to 0 or x \to 1. It would seem quite natural that the moving point would pass approximately half of its time on the positive and the other half on the negative semiaxis. However, Theorem 6 shows that this is not the case. Or, to put it in the terms of coin tossing: one would consider it most probable that both players are leading during nearly \frac{1}{2} of the whole time. But this is not so; on the contrary, \frac{1}{2} is the least probable value for the fraction of time during which one of the players is leading. However, a little reflexion shows this to be quite natural; indeed, \zeta_n varies quite slowly, because of \zeta_{n+1} - \zeta_n = \pm 1; if \zeta_n reaches for a certain n a large positive value, \zeta_{n+k} will remain positive for a long time, and a similar reasoning holds for large negative values too.

Theorem 6 is due to P. Lévy.¹
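The paradox is visible already in the exact distribution (19): the following sketch (Python, illustrative only, not part of the original text) tabulates P(π_{2n} = 2k) and confirms that the extreme values k = 0 and k = n are the most probable and the central value the least probable:

```python
from math import comb

def time_positive_dist(n):
    """Exact distribution of pi_2n/2 by formula (19)."""
    return [comb(2 * k, k) * comb(2 * n - 2 * k, n - k) / 4 ** n
            for k in range(n + 1)]

p = time_positive_dist(20)
assert abs(sum(p) - 1.0) < 1e-12   # this is relation (20)
assert p[0] == p[20] == max(p)     # extreme fractions are the most probable
assert p[10] == min(p)             # an even split is the least probable
```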
The theorem can be generalized. It was proved by Erdős and Kac² that if \xi_1, \xi_2, \ldots are independent random variables with E(\xi_k) = 0 and D(\xi_k) = 1, then the fraction of the partial sums \zeta_1, \zeta_2, \ldots, \zeta_n which are positive obeys in the limit the same arc sine law; in the identically distributed case (24) is valid even if the variance does not exist.

We now determine the exact distribution of \theta_n (the number of the zeros in the sequence \zeta_1, \zeta_2, \ldots, \zeta_n) for even values of n. We prove first
Theorem 7. For every positive integer n

P(\theta_{2n} = k) = \frac{2^k \binom{2n-k}{n}}{2^{2n}} \qquad (k = 0, 1, \ldots, n)   (32)

holds.
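The formula P(θ_{2n} = k) = 2^k C(2n−k, n)/2^{2n} can be verified by exhaustive enumeration for small n; the sketch below (Python, illustrative only) counts the zeros over all 2^{2n} equally likely walks:

```python
from itertools import product
from math import comb

def zeros_distribution(n):
    """Exact distribution of theta_2n (zeros among zeta_1, ..., zeta_2n),
    obtained by enumerating all 2^(2n) equally likely walks."""
    counts = [0] * (n + 1)
    for steps in product((1, -1), repeat=2 * n):
        s, zeros = 0, 0
        for step in steps:
            s += step
            zeros += (s == 0)
        counts[zeros] += 1
    return [c / 2 ** (2 * n) for c in counts]

n = 4
empirical = zeros_distribution(n)
for k in range(n + 1):
    assert abs(empirical[k] - 2 ** k * comb(2 * n - k, n) / 4 ** n) < 1e-12
```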
From this we derive (32) by the method used in the proof of Theorem 5. We obtain from (32) by Stirling's formula

E(\theta_n) \approx \sqrt{\frac{2n}{\pi}},   (33)

and, in the same circle of ideas,

\lim_{n \to +\infty} P\left(\frac{\varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_n}{n} < x\right) = \frac{2}{\pi} \arcsin \sqrt{\frac{1+x}{2}}.   (34)
hence

|E(\varepsilon_n \varepsilon_m)| \le C_2 \sqrt{\frac{n}{m-n}};   (36a)

putting

\Delta_N = \frac{1}{\ln N} \sum_{n=1}^{N} \frac{\varepsilon_n}{n},   (38)

we find

E(\Delta_N) = 0 \quad \text{and} \quad E(\Delta_N^2) \le \frac{C_3}{\ln N}.

Hence, by applying Chebyshev's inequality, we obtain that the series \sum_{k=1}^{\infty} P(|\Delta_{2^{k^2}}| > \varepsilon) converges for every \varepsilon > 0. Thus by the Borel–Cantelli lemma the inequality |\Delta_{2^{k^2}}| < \varepsilon is satisfied with probability 1 for all sufficiently large k. But for 2^{k^2} < n \le 2^{(k+1)^2} we have

|\Delta_n| \le |\Delta_{2^{k^2}}| + o(1).
Theorem 9.

\lim_{n \to +\infty} P\left(\max_{1 \le k \le n} \zeta_k < x\sqrt{n}\right) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^x e^{-\frac{u^2}{2}}\,du & \text{for } x > 0,\\[4pt] 0 & \text{otherwise.} \end{cases}   (40)

Theorem 10.

\lim_{n \to +\infty} P\left(\max_{1 \le k \le n} |\zeta_k| < x\sqrt{n}\right) = \begin{cases} \dfrac{4}{\pi} \displaystyle\sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} \exp\left(-\frac{(2k+1)^2 \pi^2}{8x^2}\right) & \text{for } x > 0,\\[4pt] 0 & \text{otherwise.} \end{cases}   (41)
Theorem 9 can be derived from the following formula (cf. Chapter III, § 18, Exercise 19):¹

P\left(\max_{1 \le k \le n} \zeta_k \ge c\right) = 2P(\zeta_n > c) + P(\zeta_n = c).   (42)
It was shown by Erdős and Kac2 that Theorems 9 and 10 can be amply
generalized. They can be extended to the sums of independent, identically
distributed random variables.3
It is interesting to compare Theorems 9 and 10 with the results of the
preceding section. Those results can be put in the following form:
Theorem 11.

\lim_{n \to +\infty} P\left(\max_{1 \le k \le 2n} \zeta_k < x\sqrt{2n} \,\middle|\, \zeta_{2n} = 0\right) = \begin{cases} 1 - e^{-2x^2} & \text{for } x > 0,\\ 0 & \text{otherwise.} \end{cases}   (43)

Theorem 12.

\lim_{n \to +\infty} P\left(\max_{1 \le k \le 2n} |\zeta_k| < x\sqrt{2n} \,\middle|\, \zeta_{2n} = 0\right) = \begin{cases} \displaystyle\sum_{k=-\infty}^{+\infty} (-1)^k e^{-2k^2x^2} & \text{for } x > 0,\\ 0 & \text{otherwise.} \end{cases}   (44)
Theorems 11 and 12 describe the properties of paths which after 2n steps return to the origin. One expects that under this condition the path does not deviate as far from its origin as in the general case. Indeed the expectation of the distribution (43) is

\int_0^{\infty} 4x^2 e^{-2x^2}\,dx = \frac{1}{2}\sqrt{\frac{\pi}{2}} = 0.627,

while that of the distribution (40) is \sqrt{\frac{2}{\pi}} = 0.798.
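The two expectations can be reproduced by direct numerical integration; the following sketch (Python, illustrative only, not part of the original text) evaluates the mean of the distribution (43) by a Riemann sum:

```python
from math import sqrt, pi, exp

# Expectation of the conditional limit distribution (43): density 4x e^{-2x^2}.
dx = 1e-4
mean_conditional = sum(x * 4 * x * exp(-2 * x * x) * dx
                       for x in (i * dx for i in range(1, 100000)))

assert abs(mean_conditional - 0.5 * sqrt(pi / 2)) < 1e-3   # about 0.627
assert abs(sqrt(2 / pi) - 0.798) < 1e-3                    # mean of (40)
```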
For another proof of Theorem 9 see Chapter VII, § 16, Exercise 13.

² Cf. P. Erdős and M. Kac [1].
³ For the extension to random variables which are not identically distributed, see A. Rényi [9].
VIII, § 12] PROOF OF LIMIT THEOREMS BY OPERATOR METHOD 515
A(B + C) = AB + AC.
is a contraction operator.
Lemma 2. Let F(x) and G(x) be any two distribution functions. The operators A_F and A_G associated with them are commutative and A_F A_G = A_H, where H = H(x) is the convolution of the distribution functions F(x) and G(x), i.e.

H(x) = \int_{-\infty}^{+\infty} F(x - y)\,dG(y).
If the condition

\lim_{n \to \infty} \frac{K_n}{S_n} = 0   (5)

is fulfilled, then

\lim_{n \to \infty} F_n(x) = \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{u^2}{2}}\,du.   (6)

Proof. Clearly it suffices to show that A_{F_n} f \to A_{\Phi} f, uniformly in x, for every f \in C_3.
Now, expanding f(x + y) into a Taylor series up to the third term and integrating, we obtain an estimate of \|U_{nk} f - V_{nk} f\| in terms of the third absolute moments

\int_{-\infty}^{+\infty} |y|^3\,dF_{nk}(y).   (12)

Because of the Hölder inequality one has for every random variable \xi the inequality D(\xi) \le (E(|\xi|^3))^{1/3}; hence, summing over k, there follows

\|A_{F_n} f - A_{\Phi} f\| \le C\left(\frac{K_n}{S_n}\right)^3 \to 0.

Thus we proved that if f \in C_3, then for any value of x (and even uniformly in x)

\int_{-\infty}^{+\infty} f(x + y)\,dF_n(y) \to \int_{-\infty}^{+\infty} f(x + y)\,d\Phi(y).
From this follows that (6) holds for every x. Indeed, if \varepsilon > 0 is arbitrary, let f_\varepsilon(x) be a function belonging to C_3 with the following properties: f_\varepsilon(x) = 1 if x \le 0, f_\varepsilon(x) = 0 if x \ge \varepsilon, and f_\varepsilon(x) is decreasing if x lies between 0 and \varepsilon. Such a function can be given readily: take

f_\varepsilon(x) = 1 \ \text{for } x \le 0, \qquad f_\varepsilon(x) = 0 \ \text{for } \varepsilon \le x,

completed for 0 < x < \varepsilon by a polynomial in \frac{x}{\varepsilon} chosen so that f_\varepsilon is three times continuously differentiable.
Then

\Phi(x + \varepsilon) \ge \int_{-\infty}^{+\infty} f_\varepsilon(x + y)\,d\Phi(y) \ge \Phi(x),   (22)

and

F_n(x + \varepsilon) \ge \int_{-\infty}^{+\infty} f_\varepsilon(x + y)\,dF_n(y) \ge F_n(x).   (23)

Hence

\limsup_{n \to \infty} F_n(x) \le \Phi(x + \varepsilon),   (24)

and

\liminf_{n \to \infty} F_n(x + \varepsilon) \ge \Phi(x).   (25)

Since (24) and (25) are valid for every positive \varepsilon, it follows that (6) is fulfilled for every x. Theorem 1 is herewith proved.

Now we pass to the proof of the Lindeberg theorem by the operator method. We prove the theorem in its most general form, i.e. we present the proof of Theorem 4 of § 1.
Now we pass to the proof of the Lindeberg theorem by the operator
method. We prove the theorem in its most general form, i.e. we present
the proof of Theorem 4 of § 1.
Proof of Theorem 4, § 1 by the operator method. We may assume without restriction of generality that M_{nk} = E(\xi_{nk}) = 0 (k = 1, 2, \ldots, n). Put \zeta_n = \sum_{k=1}^{n} \xi_{nk}, and let F_n(x) denote the distribution function of \zeta_n.
It suffices to show that for every f \in C_3

\lim_{n \to \infty} A_{F_n} f = A_{\Phi} f.   (28)

As we have seen in the proof of Theorem 1, it then follows that for every real x

\lim_{n \to \infty} F_n(x) = \Phi(x).

Let U_{nk} denote the operator associated with the distribution function F_{nk}(x) of the random variable \xi_{nk} and V_{nk} the operator associated with the normal distribution with expectation 0 and standard deviation D_{nk}. Then according to our assumptions

A_{F_n} = U_{n1} U_{n2} \cdots U_{nn} \quad \text{and} \quad A_{\Phi} = V_{n1} V_{n2} \cdots V_{nn}.   (29)

Further, by Lemma 4, for every f \in C_3

\|A_{F_n} f - A_{\Phi} f\| \le \sum_{k=1}^{n} \|U_{nk} f - V_{nk} f\|.
Put

\sup_x |f''(x)| = M_1 \quad \text{and} \quad \sup_x |f'''(x)| = M_2;

then

\left| U_{nk} f - f(x) - \frac{1}{2} D_{nk}^2 f''(x) \right| \le \frac{1}{6}\,\varepsilon M_2 D_{nk}^2 + M_1 \int_{|y| > \varepsilon} y^2\,dF_{nk}(y),   (35)
hence

\lim_{n \to \infty} \sum_{k=1}^{n} \int_{|y| > \varepsilon} y^2\,dF_{nk}(y) = 0;   (39)

furthermore

D_{nk}^2 = \int_{|x| \le \varepsilon} x^2\,dF_{nk}(x) + \int_{|x| > \varepsilon} x^2\,dF_{nk}(x) \le \varepsilon^2 + \int_{|x| > \varepsilon} x^2\,dF_{nk}(x),

therefore
Theorem 2. Let \xi_{nk} (k = 1, 2, \ldots, n) be independent random variables assuming only the values 0 and 1, with

P(\xi_{nk} = 1) = 1 - P(\xi_{nk} = 0) = p_{nk}.   (41)
Put

\lambda_n = \sum_{k=1}^{n} p_{nk}   (42)

and suppose that

\lim_{n \to \infty} \lambda_n = \lambda   (43)

and

\lim_{n \to \infty} \max_{1 \le k \le n} p_{nk} = 0.   (44)

Then the distribution of

\zeta_n = \xi_{n1} + \xi_{n2} + \cdots + \xi_{nn}   (45)

converges to the Poisson distribution with expectation \lambda.
Remark. Theorem 2 is a particular case of Theorem 1 of § 4; the latter
can be proved in a similar manner. Merely for simplicity’s sake we restrict
ourselves to the proof of Theorem 2.
Proof. Let K denote the set of all real-valued bounded functions f(x) (x = 0, 1, 2, \ldots) defined on the nonnegative integers. Put \|f\| = \sup_x |f(x)|. Let there be associated with every probability distribution \mathcal{P} = \{p_0, p_1, \ldots\} an operator A_{\mathcal{P}} defined by

A_{\mathcal{P}} f = \sum_{r=0}^{\infty} f(x + r)\,p_r   (46)

for every f \in K. Clearly, A_{\mathcal{P}} maps the set K into itself, A_{\mathcal{P}} is a linear contraction operator; further, if \mathcal{P} and \mathcal{Q} are any two distributions defined on the nonnegative integers, then A_{\mathcal{P}} A_{\mathcal{Q}} = A_{\mathcal{R}}, where \mathcal{R} = \mathcal{P} * \mathcal{Q}, i.e. \mathcal{R} is the convolution of the distributions \mathcal{P} and \mathcal{Q}; that is, if \mathcal{P} = \{p_n\} and \mathcal{Q} = \{q_n\}, then \mathcal{R} = \{r_n\}, where

r_n = \sum_{k=0}^{n} p_k q_{n-k}.
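The relation A_P A_Q = A_{P*Q} can be checked directly; the following sketch (Python, illustrative only) implements the operator (46) for finitely supported distributions:

```python
def operator(dist):
    """A_P f(x) = sum_r f(x + r) p_r for a distribution on 0, 1, 2, ..."""
    def A(f):
        return lambda x: sum(f(x + r) * p for r, p in enumerate(dist))
    return A

def convolve(p, q):
    """Convolution r_n = sum_k p_k q_{n-k} of two finite distributions."""
    n = len(p) + len(q) - 1
    return [sum(p[k] * q[m - k] for k in range(len(p))
                if 0 <= m - k < len(q))
            for m in range(n)]

P, Q = [0.5, 0.5], [0.25, 0.5, 0.25]
f = lambda x: x * x
lhs = operator(P)(operator(Q)(f))(3)     # (A_P A_Q f)(3)
rhs = operator(convolve(P, Q))(f)(3)     # (A_{P*Q} f)(3)
assert abs(lhs - rhs) < 1e-12
```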
Let U_{nk} denote the operator associated with the distribution \mathcal{P}_{nk} of the random variable \xi_{nk} and V_{nk} the operator associated with the Poisson distribution with parameter p_{nk}. Then U_{n1} U_{n2} \cdots U_{nn} is nothing else than the operator associated with the distribution of the random variable \zeta_n, while V_{n1} V_{n2} \cdots V_{nn} is the operator associated with the Poisson distribution with parameter \lambda_n (taking into account that the convolution of the Poisson distributions with parameters \lambda and \mu is the Poisson distribution with parameter \lambda + \mu).
holds. In fact, if (47) holds for every f \in K, choose for f the function for which f(0) = 1 and f(x) = 0 for x \ge 1; then it follows from (47) that for every r (and even uniformly in r)

P(\zeta_n = r) \to \frac{\lambda^r e^{-\lambda}}{r!}.

Now

U_{nk} f - V_{nk} f = f(x)\left(1 - p_{nk} - e^{-p_{nk}}\right) + f(x+1)\left(p_{nk} - p_{nk} e^{-p_{nk}}\right) - \sum_{r=2}^{\infty} f(x + r)\,\frac{p_{nk}^r\,e^{-p_{nk}}}{r!},   (51)
and thus Theorem 2 follows.

Next we prove by the operator method the following theorem. Let \xi_1, \xi_2, \ldots be independent, identically distributed random variables whose common distribution function F(x) is symmetric with respect to 0 and satisfies

\lim_{y \to +\infty} \frac{y^2 (1 - F(y))}{\int_0^y x^2\,dF(x)} = 0.   (58)

Put \zeta_n = \xi_1 + \xi_2 + \cdots + \xi_n. Then there exists a sequence of numbers S_n such that for every x

\lim_{n \to \infty} P\left(\frac{\zeta_n}{S_n} < x\right) = \Phi(x).   (59)
Proof. Put

\delta(y) = \frac{y^2 (1 - F(y))}{\int_0^y x^2\,dF(x)};   (60)

then by assumption

\lim_{y \to +\infty} \delta(y) = 0.   (61)

Put further

\Delta(y) = \frac{y^2}{(1 - F(y)) \int_0^y x^2\,dF(x)};   (62)

then, as was shown in § 3,

\lim_{y \to +\infty} \Delta(y) = +\infty.   (63)

By our assumption \Delta(y) is continuous for y > y_0. Let C_n denote the least positive number for which

\Delta(C_n) = n^2;   (64)

then C_n \to \infty, furthermore

n(1 - F(C_n)) = \sqrt{\delta(C_n)}.   (65)

Put

S_n^2 = n \int_{-C_n}^{+C_n} x^2\,dF(x) = \frac{2C_n^2}{\sqrt{\delta(C_n)}};   (66)

then

\frac{C_n}{S_n} = \frac{\sqrt[4]{\delta(C_n)}}{\sqrt{2}}.   (67)
Now let U_{nk} be the operator associated with the distribution of the random variable \frac{\xi_k}{S_n} and V_{nk} the operator associated with the normal distribution with expectation 0 and standard deviation \frac{1}{\sqrt{n}} (k = 1, 2, \ldots, n). Then U_{n1} U_{n2} \cdots U_{nn} is the operator associated with the distribution function F_n(x) of the random variable \frac{\zeta_n}{S_n}, while V_{n1} V_{n2} \cdots V_{nn} is the operator associated with the standard normal distribution function \Phi(x) (having expectation 0 and standard deviation 1). Thus by Lemma 4 for every f \in C_3 we have

\|A_{F_n} f - A_{\Phi} f\| \le \sum_{k=1}^{n} \|U_{nk} f - V_{nk} f\| = n\|U_{n1} f - V_{n1} f\|.

Hence it suffices to prove that

\lim_{n \to \infty} n\,\|U_{n1} f - V_{n1} f\| = 0.   (68)
Now

U_{n1} f = \int_{-\infty}^{+\infty} f\left(x + \frac{y}{S_n}\right) dF(y).   (69)

On the other hand, if in the integral on the right hand side of (69) the integrand is expanded into a Taylor series up to the third term and if it is taken into account that by our assumption the distribution with the distribution function F(y) is symmetric with respect to the point 0, then the first-order term vanishes and there remains a term of the form

R_n = \frac{1}{6 S_n^3} \int_{-C_n}^{+C_n} y^3 f'''\left(x + \Theta\,\frac{y}{S_n}\right) dF(y) \qquad (|\Theta| \le 1).   (72)
If we apply the operators to the function f(x) = e^{itx}, we obtain A_{F_n} f = e^{itx}\varphi_n(t) and, similarly, A_F f = e^{itx}\varphi(t). Hence, from the fact that for every f \in C_3

A_{F_n} f \to A_F f,   (76)

it follows that for every real t, \varphi_n(t) \to \varphi(t).
Therefore the operator method proves slightly more than the characteristic function method. In effect, we prove for every f \in C_3 the validity of (76), and even that (76) is fulfilled uniformly in x. This makes the proof of the relation F_n(x) \to F(x) simpler, because while the implication of the relation F_n(x) \to F(x) by the relation \varphi_n(t) \to \varphi(t) is a comparatively deep theorem (the so-called continuity theorem of characteristic functions, cf. Theorem 3 of Chapter VI, § 4), it is quite easy to see that (76) implies F_n(x) \to F(x) (for every x which is a continuity point of F(x)). On the other hand, the method by which we proved (76) in each of the above discussed cases can be applied to distributions of sums of independent random variables only, while the method of characteristic functions can be applied in other cases too (cf. e.g. § 5 or Exercise 26 of § 13).
§ 13. Exercises
1. Prove Theorem 2' of Chapter VI, § 5 by means of the central limit theorem
(Chapter VIII, § 1, Theorem 1).
Hint. If F(x) is a distribution function with expectation 0 and variance 1 such that

F\left(\frac{x}{\sigma_1}\right) * F\left(\frac{x}{\sigma_2}\right) = F\left(\frac{x}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right),

then F(x) is equal to the n-fold convolution of F(x\sqrt{n}); this converges to the normal distribution as n \to +\infty.
2. Let \xi_1, \xi_2, \ldots be independent random variables and suppose

P(\xi_n = a_n) = P(\xi_n = -a_n) = \frac{1}{2}.

Under what conditions on the positive numbers a_n does Liapunov's condition of the central limit theorem hold for the random variables \xi_n?

Hint. Put a_n^* = \max_{1 \le k \le n} a_k, K_n^3 = \sum_{k=1}^{n} E(|\xi_k|^3) = \sum_{k=1}^{n} a_k^3 and S_n^2 = \sum_{k=1}^{n} a_k^2. It follows that

\left(\frac{a_n^*}{S_n}\right)^3 \le \left(\frac{K_n}{S_n}\right)^3 \le \frac{a_n^*}{S_n},

and Liapunov's condition \lim_{n \to +\infty} \frac{K_n}{S_n} = 0 is fulfilled iff \lim_{n \to +\infty} \frac{a_n^*}{S_n} = 0.
3. a) Let \xi_\lambda be a random variable having a Poisson distribution with expectation \lambda. Show by the method of characteristic functions that the distribution function of the random variable \frac{\xi_\lambda - \lambda}{\sqrt{\lambda}} tends to the normal distribution function as \lambda \to +\infty.
4. Let \varepsilon_n(x) denote the n-th digit in the decimal expansion of x (0 < x < 1); the values of \varepsilon_n(x) are thus the numbers 0, 1, \ldots, 9. Put S_n(x) = \sum_{k=1}^{n} \varepsilon_k(x). If E_n(y) is the set of the numbers x for which

\frac{S_n(x) - \frac{9n}{2}}{\frac{\sqrt{33}}{2}\sqrt{n}} < y,

and if |E_n(y)| denotes the Lebesgue measure of E_n(y), show that \lim_{n \to \infty} |E_n(y)| = \Phi(y).

Hint. We choose a point \eta at random in (0, 1); i.e. \eta is a random variable uniformly distributed in the interval (0, 1). The random variables \xi_n = \varepsilon_n(\eta) are then independent and identically distributed; the central limit theorem can be applied. We have:

E(\xi_k) = \frac{9}{2}, \qquad D(\xi_k) = \frac{\sqrt{33}}{2}.
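The two moments quoted in the hint are verified at once (Python, illustrative only):

```python
from math import sqrt

digits = range(10)
mean = sum(digits) / 10                          # E(xi_k)
var = sum((d - mean) ** 2 for d in digits) / 10  # D^2(xi_k)

assert mean == 9 / 2
assert abs(sqrt(var) - sqrt(33) / 2) < 1e-12
```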
5. Every x (0 < x < 1) can be expanded into a series

x = \sum_{n=1}^{\infty} \frac{\varepsilon_n(x)}{3^n},

where \varepsilon_n(x) may take on the values 0, 1, -1. As in Exercise 4 put S_n(x) = \sum_{k=1}^{n} \varepsilon_k(x). Now if E_n(y) denotes the set of numbers x (0 < x < 1) such that

\frac{S_n(x)}{\sqrt{\frac{2n}{3}}} < y,

show that \lim_{n \to \infty} |E_n(y)| = \Phi(y).

Hint. Choose at random a point \eta \in (0, 1) and put \xi_n = \varepsilon_n(\eta). It is easy to see that the random variables \xi_n are independent. Furthermore

E(\xi_n) = 0, \qquad D(\xi_n) = \sqrt{\frac{2}{3}}.
6. Let \xi_1, \xi_2, \ldots be independent random variables with the same normal distribution; let m denote their common expectation and \sigma their common standard deviation. Put

\bar\xi_n = \frac{1}{n} \sum_{k=1}^{n} \xi_k, \qquad \sigma_n^* = \sqrt{\frac{1}{n-1} \sum_{k=1}^{n} (\xi_k - \bar\xi_n)^2}, \qquad \tau_n = \frac{(\bar\xi_n - m)\sqrt{n}}{\sigma_n^*}.

Show that

\lim_{n \to +\infty} P(\tau_n < x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{u^2}{2}}\,du.

Hint. \tau_n has Student's distribution with n - 1 degrees of freedom (cf. Ch. IV, § 10); its density function tends to the standard normal density. Another proof can be obtained by noticing that the distribution function of \frac{(\bar\xi_n - m)\sqrt{n}}{\sigma} is the normal distribution function and that \lim st\, \frac{\sigma_n^*}{\sigma} = 1; the result follows from this.
8. Consider a homogeneous Markov chain with the states A_0, A_1, \ldots, A_N and the transition probabilities

p_{k,k+1} = 1 - \frac{k}{N} \quad (k = 0, 1, \ldots, N-1), \qquad p_{k,k-1} = \frac{k}{N} \quad (k = 1, 2, \ldots, N),   (1)

p_{kl} = 0 for |k - l| \ne 1. Show that

E(\zeta_n) = \frac{N}{2} + \left(E(\zeta_0) - \frac{N}{2}\right)\left(1 - \frac{2}{N}\right)^n.

(This example contains the statistical justification of Newton's law of cooling.)
9. Let Galton's desk be modified in the following manner (cf. Fig. 26): From the N-th row on, the number of pegs is alternatingly equal to the number in the (N - 1)-th row and in the N-th row. On the whole desk there are N + n rows of pegs. Determine the distribution of the balls in the containers when the number n of balls is large.
10. The random variables \zeta_0, \zeta_1, \ldots, \zeta_n, \ldots form a homogeneous Markov chain; all take on values in (0, 1); let the conditional distribution of \zeta_{n+1} under the condition \zeta_n = y be absolutely continuous for every value of y (0 < y < 1); let p(x, y) be the corresponding conditional density function. We assume that for 0 < x < 1 and 0 < y < 1 the function p(x, y) is always positive and that for every x (0 < x < 1)

\int_0^1 p(x, y)\,dy = 1

holds, further that p(x, y) is continuous. Let p_n(x, y) be the conditional density function of \zeta_n under the condition \zeta_0 = y. Show that the relation

\lim_{n \to +\infty} p_n(x, y) = 1
11. Let a moving point perform a random walk on a plane regular triangular
lattice. If the moving point is at the moment t = n at an arbitrary lattice-point, it
may pass at the moment t = n + 1 with the same probability to any of the 6 neigh
bouring lattice points. Show that the moving point will return with probability 1 to
its initial position, but that the expectation of the time passing until this return is
infinite.
The following Exercises 12 through 18 all deal with homogeneous Markov chains with a finite number of states, fulfilling the conditions of Theorem 1, § 8. The notations are the same. The states are denoted by A_0, A_1, \ldots, A_N. The random variable \zeta_n is equal to k if the system is in the state A_k at the time n (k = 0, 1, \ldots, N). We put P(\zeta_0 = j) = P_j(0), p_{jk}^{(n)} = P(\zeta_{n+m} = k \mid \zeta_m = j), p_{jk}^{(1)} = p_{jk} and P_k^{(n)} = P(\zeta_n = k). We assume that \min_{j,k} p_{jk} = d > 0. According to Theorem 1 of § 8 the limits \lim_{n \to \infty} p_{jk}^{(n)} = P_k exist and are independent of j. Furthermore \sum_{k=0}^{N} P_k = 1.
12. Let

\eta_n^{(k)} = \begin{cases} 1 & \text{if the system is in state } A_k \text{ at the time } t = n,\\ 0 & \text{otherwise.} \end{cases}

We put \zeta_n^{(k)} = \sum_{l=1}^{n} \eta_l^{(k)}. Show that

\lim st\, \frac{\zeta_n^{(k)}}{n} = P_k \qquad (k = 0, 1, \ldots, N),

i.e. the system passes approximately a fraction P_k of the whole time in the state A_k.

Hint. We have

E(\eta_n^{(k)}) = P(\zeta_n = k) = P_k^{(n)},

and |R(\eta_n^{(k)}, \eta_{n+m}^{(k)})| \le C(1 - d)^m, where R(\eta_n^{(k)}, \eta_{n+m}^{(k)}) is the correlation coefficient of \eta_n^{(k)} and \eta_{n+m}^{(k)} and C is positive. Thus the result follows from Theorem 3 of Chapter VII, § 3.
14. Assume that at t = 0 the system is in the state A_k. It returns to it for the first time after a certain number of steps. Let this random number be denoted by \nu^{(k)}. Show that P(\nu^{(k)} > n) \le (1 - d)^n.

Hint. We have P(\nu^{(k)} > 1) = 1 - p_{kk} \le 1 - d; hence the inequality is true for n = 1. Suppose, for a proof by induction, that the inequality is true for n. Then

P(\nu^{(k)} > n + 1) = \sum_{j \ne k} P(\nu^{(k)} > n,\ \zeta_n = j)\,(1 - p_{jk}) \le (1 - d)\,P(\nu^{(k)} > n) \le (1 - d)^{n+1},

hence the inequality holds for every n.
15. Show that

a) E(\nu^{(k)}) is finite;

b) E(\nu^{(k)}) = \frac{1}{P_k}.

Hint. a) follows from Exercise 14. Let further V_k(z) denote the generating function of \nu^{(k)}; we have

V_k(z) = 1 - \frac{1}{U_k(z)},

where

U_k(z) = \sum_{n=0}^{\infty} p_{kk}^{(n)} z^n.

The relations \lim_{n \to \infty} p_{kk}^{(n)} = P_k and E(\nu^{(k)}) = V_k'(1) lead to b).
16. Let the numbers \rho_r^{(k)} (r = 0, 1, \ldots) denote the values of n for which \eta_n^{(k)} = 1 (\rho_0^{(k)} < \rho_1^{(k)} < \cdots); \eta_n^{(k)} is defined here as in Exercise 12. Show that the standardized distribution of \rho_r^{(k)} tends to the normal distribution as r \to +\infty.

17. Show that the distribution of the random variables \zeta_n^{(k)} introduced in Exercise 12 tends, after standardization, to the normal distribution as n \to +\infty. (Generalization of Theorem 2 in § 8.)

Hint. It is easy to see that P(\zeta_n^{(k)} < r) = P(\rho_r^{(k)} > n); for if the system passes less than r times through the state A_k during the first n steps, then it will return to it for the r-th time after the moment t = n, and conversely. Thus we are back to Exercise 16.
18. Put

P_{jk}^{*(n)} = P(\zeta_n = k \mid \zeta_{n+1} = j) \qquad (n = 1, 2, \ldots;\ j, k = 0, 1, \ldots, N).

Show that the limits \lim_{n \to +\infty} P_{jk}^{*(n)} = P_{jk}^* exist and form a stochastic matrix.

Hint. We have

P_{jk}^{*(n)} = \frac{P(\zeta_n = k)\,p_{kj}}{P(\zeta_{n+1} = j)}.

Hence

\lim_{n \to +\infty} P_{jk}^{*(n)} = \frac{P_k\,p_{kj}}{P_j} = P_{jk}^*,

and

\sum_{k=0}^{N} P_{jk}^* = \frac{1}{P_j} \sum_{k=0}^{N} P_k\,p_{kj}.

But the P_k satisfy the system of equations

\sum_{k=0}^{N} P_k\,p_{kj} = P_j \qquad (j = 0, 1, \ldots, N),

hence we find that \sum_{k=0}^{N} P_{jk}^* = 1; the transition probabilities P_{jk}^* define thus again a Markov chain.
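The conclusion of Exercise 18 can be illustrated numerically; the sketch below (Python, with an arbitrarily chosen positive transition matrix) builds the reversed transition probabilities P*_{jk} = P_k p_{kj}/P_j and checks that they form a stochastic matrix:

```python
def stationary(P, iters=2000):
    """Stationary distribution of a finite stochastic matrix, by iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[k] * P[k][j] for k in range(n)) for j in range(n)]
    return pi

P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
pi = stationary(P)

# Reversed transition probabilities P*_{jk} = P_k p_{kj} / P_j.
P_star = [[pi[k] * P[k][j] / pi[j] for k in range(3)] for j in range(3)]
for row in P_star:
    assert abs(sum(row) - 1.0) < 1e-9   # P* is again a stochastic matrix
```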
Hint. We have
^ « -^ С М т /М т Г
and, consequently,
0
20. Suppose F(x) = x (0 < x < 1). Show that n\zeta_k^* and n(1 - \zeta_{n+1-j}^*) are independent in the limit as n \to +\infty and have gamma distributions of order k and j, respectively:

\lim_{n \to +\infty} P\left(n\zeta_k^* < x,\ n(1 - \zeta_{n+1-j}^*) < y\right) = \left(\int_0^x \frac{t^{k-1} e^{-t}}{(k-1)!}\,dt\right)\left(\int_0^y \frac{t^{j-1} e^{-t}}{(j-1)!}\,dt\right).
density function of

\frac{\xi_n^* - m - \sigma\left(\sqrt{2\ln n} - \dfrac{\ln\ln n + \ln 4\pi}{2\sqrt{2\ln n}}\right)}{\dfrac{\sigma}{\sqrt{2\ln n}}}

tends to the density e^{-x} e^{-e^{-x}} of the double exponential limit distribution.
I t is e a s y t o p r o v e t h a t
Remark. We can derive from this result the theorem of Smirnov (§ 10, Theorem 1),
24. (Wilcoxon's test for the comparison of two samples.) Let \xi_1, \ldots, \xi_m and \eta_1, \ldots, \eta_n be independent, identically distributed random variables with the common continuous distribution function F(x). Let the numbers \xi_k and \eta_j be united into a single sequence, let them be arranged in increasing order and investigate the "places" occupied by the \xi's. Let \nu_1, \nu_2, \ldots, \nu_m denote the ranks of the elements \xi_1, \ldots, \xi_m in this sequence. Put

W = \nu_1 + \nu_2 + \cdots + \nu_m - \frac{m(m+1)}{2}.

a) Show that W is equal to the number of pairs (\xi_i, \eta_j) such that \xi_i > \eta_j.

b) Show that E(W) = \frac{mn}{2}.

c) Let G_{nm}(z) be the generating function of W:

G_{nm}(z) = \sum_k P(W = k)\,z^k.

Show that

G_{nm}(z) = \frac{C_{n+m}(z)}{C_n(z)\,C_m(z)},

where we have put

C_l(z) = \prod_{j=1}^{l} \frac{1 - z^j}{j(1 - z)}.

d) Show that

D^2(W) = \frac{mn(m + n + 1)}{12}.

e) Derive from c) that the distribution of W^* = \frac{W - E(W)}{D(W)} tends to the normal distribution as n \to +\infty, m \to +\infty, if \frac{m}{n} tends to a constant. (Cf. Ch. II, § 12, Exercise 46 and Ch. III, § 18, Exercise 45.)
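The moments in b) and d) can be verified by exhaustive enumeration of the equally likely rank configurations (Python, illustrative only):

```python
from itertools import combinations

def wilcoxon_moments(m, n):
    """Exact mean and variance of W when all rank assignments of the
    xi-sample are equally likely (null hypothesis, F continuous)."""
    N = m + n
    values = [sum(ranks) - m * (m + 1) // 2
              for ranks in combinations(range(1, N + 1), m)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

m, n = 4, 5
mean, var = wilcoxon_moments(m, n)
assert mean == m * n / 2                          # b)
assert abs(var - m * n * (m + n + 1) / 12) < 1e-9  # d)
```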
25. Let \xi_1, \ldots, \xi_m have the continuous distribution function F(x) and \eta_1, \ldots, \eta_n the continuous distribution function G(x), all being independent, and let L denote the number of those triplets (i, j, k) for which \eta_i > \xi_k, \eta_j > \xi_k and i < j. We put

\bar{L} = \frac{L}{m\binom{n}{2}}.

Show that if F(x) = G(x), then E(\bar{L}) = \frac{1}{3}, and if F(x) \ne G(x), then E(\bar{L}) > \frac{1}{3}.
26. Let there be performed N independent experiments. Let the possible outcomes of every experiment be the events A_1, \ldots, A_r. Let p_k = P(A_k) (k = 1, 2, \ldots, r) and let \nu_k denote the number of occurrences of the event A_k, where \sum_{k=1}^{r} \nu_k = N. If

\chi_N^2 = \sum_{k=1}^{r} \frac{(\nu_k - Np_k)^2}{Np_k},

then the distribution of \chi_N^2 tends as N \to +\infty to the \chi^2-distribution with r - 1 degrees of freedom:

\lim_{N \to +\infty} P(\chi_N^2 < x) = \frac{1}{2^{\frac{r-1}{2}}\,\Gamma\left(\frac{r-1}{2}\right)} \int_0^x t^{\frac{r-3}{2}} e^{-\frac{t}{2}}\,dt.
28. Let \xi_{n1}, \ldots, \xi_{nn} be random variables which assume only the values 0 and 1 and let \eta_{nk} be the sum of all products of k distinct elements of the sequence \xi_{n1}, \ldots, \xi_{nn}:

\eta_{nk} = \sum_{1 \le i_1 < \cdots < i_k \le n} \xi_{ni_1} \xi_{ni_2} \cdots \xi_{ni_k}.

Show that if E(\eta_{nk}) tends to \frac{\lambda^k}{k!} (k = 1, 2, \ldots) as n \to +\infty, then the distribution of the sum

\zeta_n = \sum_{l=1}^{n} \xi_{nl} = \eta_{n1}

tends to the Poisson distribution with expectation \lambda.
29. Show that

P(\pi_{2n} = 2k,\ \zeta_{2n} = 0) = \frac{\binom{2n}{n}}{(n+1)\,2^{2n}} \qquad (k = 0, 1, \ldots, n)

and

P(\pi_{2n} = 2k,\ \zeta_{2n} = -2j) = \frac{1}{2^{2n}}\cdot\frac{j\,\binom{2n-2k}{n-k}\binom{2k}{k-j}}{(n-k)(n-k+1)}

for k = 0, 1, \ldots, n and j = 1, 2, \ldots, n.
30. By using the results of Exercise 29 show the following: If y_n is any sequence of integers such that y_n and n are of the same parity and \lim_{n \to +\infty} \frac{y_n}{\sqrt{n}} = y (y is here an arbitrary real number), then

\lim_{n \to +\infty} P\left(\frac{\pi_n}{n} < x \,\middle|\, \zeta_n = y_n\right) = \int_0^x f(t \mid y)\,dt

with

f(t \mid y) = \frac{y^2\,e^{\frac{y^2}{2}}}{\sqrt{2\pi t^3}} \int_{\frac{y}{\sqrt{t}}}^{+\infty} \frac{e^{-\frac{u^2}{2}}}{u^2}\,du

for 0 < t \le 1, y > 0, and f(t \mid y) = f(1 - t \mid -y) for y < 0.
Remark. For y = 0 the conditional limit distribution of \frac{\pi_n}{n} with respect to the condition \frac{\zeta_n}{\sqrt{n}} \to 0 is thus uniform on (0, 1). If we notice that \frac{\zeta_n}{\sqrt{n}} is, in the limit, normally distributed, it follows that

\lim_{n \to +\infty} P\left(\frac{\pi_n}{n} < x \,\middle|\, \zeta_n > 0\right) = \sqrt{\frac{2}{\pi}} \int_0^{+\infty} \left(\int_0^x f(t \mid y)\,dt\right) e^{-\frac{y^2}{2}}\,dy;

from these results, from P(\zeta_n > 0) \to \frac{1}{2} and from the symmetry relation f(t \mid -y) = f(1 - t \mid y), the arc sine law can be easily derived.
CHAPTER IX

APPENDIX. INTRODUCTION TO INFORMATION THEORY

§ 1. Hartley's formula
It follows
Thus for every e > 0 we can find a number к such that if we take the
elements of E by ordered groups of k, then the identification of one element
requires on the average less than log2 N + e binary digits.
The formula

I(E_N) = \log_2 N   (1)

can be characterized by the following postulates:

A. I(E_{NM}) = I(E_N) + I(E_M);

B. I(E_N) \le I(E_{N+1});

C. I(E_2) = 1.
Postulate C is the definition of the unit; it is not more and not less
arbitrary than the choice of the unit of some physical quantity. The meaning
of Postulate В is evident: the larger a set, the more information is gained
by the characterization of its elements. Postulate A may be justified as
follows.
A set E_{NM} of NM elements may be decomposed into N subsets each of M elements; let these be denoted by E_M^{(1)}, \ldots, E_M^{(N)}. In order to characterize an element of E_{NM} we can proceed in two steps. First we specify that subset to which the element in question belongs. Let this subset be denoted by E_M^{(j)}. We need for this specification an information I(E_N), since there are N subsets. Next we identify the element in E_M^{(j)}. The amount of information needed for this purpose is equal to I(E_M), since the subset E_M^{(j)} contains M elements. Now these two informations completely characterize an element of E_{NM}; Postulate A expresses thus that the information is an additive quantity.
IX, § 1] HARTLEY'S FORMULA 543
Proof. Let P be an integer larger than 2. Define for every integer r the integer s(r) by

2^{s(r)} \le P^r < 2^{s(r)+1};   (2)

then

\frac{s(r)}{r} \le \log_2 P < \frac{s(r)+1}{r}.   (3)

Hence

\lim_{r \to \infty} \frac{s(r)}{r} = \log_2 P.   (4)

Since f(n) = I(E_n) is nondecreasing by B, it follows from (2) that

f(2^{s(r)}) \le f(P^r) \le f(2^{s(r)+1}).   (6)

By A,

f(a^k) = k\,f(a)   (7)

and, by C, f(2) = 1; hence it follows from (6) that

s(r) \le r\,f(P) \le s(r) + 1,   (8)

thus, by (4), f(P) = \log_2 P.
Proof. Let P > 1 be any power of a prime number and f(n) = I(E_n) a function satisfying A*, B*, C. Put

g(n) = f(n) - \frac{f(P)}{\log_2 P}\,\log_2 n.   (10)

Clearly, g(n) fulfills A*. Furthermore, if we put

\varepsilon_n = g(n+1) - g(n),   (11)

then B* implies

\lim_{n \to \infty} \varepsilon_n = 0.   (12)

Evidently

g(P) = 0.   (13)

Define now

n' = \left[\frac{n}{P}\right] \quad \text{if}\ \left(\left[\frac{n}{P}\right], P\right) = 1, \qquad n' = \left[\frac{n}{P}\right] - 1 \quad \text{otherwise},   (14)

where (a, b) denotes the greatest common divisor of the integers a and b.
¹ Cf. P. Erdős [2] and the article of D. K. Fadeev, The notion of entropy in a finite probabilistic pattern (Arbeiten zur Informationstheorie, Vol. I). Fadeev found this theorem independently of Erdős. The proof given here (cf. A. Rényi [29], [30], [37]) is considerably simpler than that of the above two authors.
Clearly
n′ < n/P  (15)
and
n = Pn′ + l,
where (n′, P) = 1 and 0 ≤ l < 2P. According to (13), g(Pn′) = g(n′), hence we can write
g(n) = g(n′) + [g(n) − g(Pn′)] = g(n′) + Σ_{k=Pn′}^{n−1} ε_k.  (16a)
Iterating the step n → n′, we reach n^(k) = 1 after at most (log₂ n)/(log₂ P) + 1 steps; hence for every n the quantity g(n)/log₂ n is bounded by an average of the ε_k. Thus, according to (12), g(n)/log₂ n → 0 as n → ∞. Let c denote the limit of the left hand side of (19). We conclude that for every P > 1 which is a power of a prime number f(P) = c log₂ P.
§ 2. Shannon’s formula
H = Σ_{k=1}^{n} p_k log₂ (1/p_k).  (3)
Formula (3) was first established by Shannon and in what follows we
shall call it Shannon's formula. Simultaneously with and independently of
Shannon the same formula was also found by N. Wiener.
In particular, if p₁ = p₂ = … = p_n = 1/n, Shannon's formula reduces
to Hartley's formula (cf. Formula (1) of § 1). Analysing the above heuristic
considerations it is clear that we implicitly used three assumptions, namely
1. The selection of the considered element from the set E depends on
chance; actually, we are dealing with the observed value of a random
variable.
2. All elements of E are equiprobable; the probability that an element of E belongs to E_k is therefore p_k = N_k/N.
3. If the chosen element belongs to E_k, this fact furnishes the information log₂ (1/p_k); since this case occurs with probability p_k, the average information is given by (3).
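Under these assumptions, formula (3) is straightforward to evaluate. A minimal sketch (the probabilities below are arbitrary illustrations, not taken from the text):

```python
import math

def shannon_entropy(p):
    """Shannon's formula (3): H = sum of p_k * log2(1/p_k), in binary units."""
    return sum(pk * math.log2(1.0 / pk) for pk in p if pk > 0)

# Uniform probabilities reduce to Hartley's formula log2(n):
assert abs(shannon_entropy([0.25] * 4) - 2.0) < 1e-12
# A biased two-valued source carries less than one bit per observation:
print(shannon_entropy([0.9, 0.1]))  # ≈ 0.469
```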
Furthermore, we require:
IV. The following relation holds:
I(p₁, p₂, …, p_n) = Σ_{k=1}^{n} p_k log₂ (1/p_k),  (5)
i.e. that the occurrence of the sure event does not give us any information. In fact, if n = 2, p₁ = 1, p₂ = 0, it follows from IV that
I(p₁, …, p_{m+n}) = I(s_m, p_{m+1}, …, p_{m+n}) + s_m I(p₁/s_m, …, p_m/s_m),  (7a)
where s_m = p₁ + … + p_m, and
I(p₁, p₂, …, p_{m+n}) = I(p₁ + p₂, p₃, …, p_{m+n}) + (p₁ + p₂) I(p₁/(p₁+p₂), p₂/(p₁+p₂)).  (7c)
I(p₁₁, …, p_{1m₁}, …, p_{n1}, …, p_{nm_n}) = I(s₁, …, s_n) + Σ_{j=1}^{n} s_j I(p_{j1}/s_j, …, p_{jm_j}/s_j),  (4′)
where we have put
s_j = Σ_{i=1}^{m_j} p_{ji}  (j = 1, 2, …, n).  (8)
By assumption,
Σ_{j=1}^{n} s_j = Σ_{j=1}^{n} Σ_{i=1}^{m_j} p_{ji} = 1.
In fact, if in (4″) all m_j are equal to m and all p_{ji} are equal to 1/(mn), the left hand side is equal to f(nm) and the right hand side to f(n) + f(m), hence we get (9).
e) If we apply (4′) to the case when all probabilities are equal and if we unite them all except the first one, we obtain
lim_{n→∞} [f(n) − f(n − 1)] = 0.  (11)
Put
d_k = f(k) − f(k − 1), so that f(n − 1) = d₂ + d₃ + … + d_{n−1}.
Averaging gives
f(N − 1)/N = (1/N) Σ_{k=2}^{N−1} d_k.  (15)
Because of (12) the right hand side of (15) tends to zero for N → ∞. Hence we have
lim_{N→∞} (1/N) Σ_{k=2}^{N−1} d_k = 0.  (16)
we find that if
lim_{n→∞} [ a s_n + (1 − a) (s₁ + s₂ + … + s_n)/n ] = s  (0 < a < 1),
then we have also lim_{n→∞} s_n = s. We need only the particular case
However, (4) does not follow from (22). This is most easily demonstrated by the quantity
−log₂ Σ_{k=1}^{n} p_k²,  (23)
which fulfills Postulates I–III and Formula (22), without fulfilling (4). (If it fulfilled (4), it would be equal, by the just-proved theorem, to Σ_{k=1}^{n} p_k log₂ (1/p_k), which is not the case.) We shall see in § 6 that the quantity (23) too can be considered as a measure of the information associated with the distribution 𝒫 = (p₁, …, p_n). In fact, we shall define a class of information measures depending on a parameter α which contains both Shannon's information (for α = 1) and the quantity (23) (for α = 2).
We add some further remarks.¹
1. Whether we speak about information or about uncertainty means essentially the same thing: in the first case we consider an experiment which has been performed, in the second case an experiment not yet performed. The two terminologies will be used alternately in order to obtain the simplest possible formulation of our results.
2. The quantity (5) is frequently called the entropy of the distribution 𝒫 = (p₁, …, p_n). Indeed, there is a strong connection between the notion of
entropy in thermodynamics and the notion of information (or uncertainty).
L. Boltzmann was the first to emphasize the probabilistic meaning of the
thermodynamical entropy and thus he may be considered as a pioneer of
information theory. It would even be proper to call Formula (5) the Boltzmann–Shannon formula. Boltzmann proved that the entropy of a physical
system can be considered as a measure of the disorder in the system. In the case of a physical system having many degrees of freedom (e.g. a perfect gas) the number measuring the disorder of the system also measures the uncertainty concerning the states of the individual particles.
3. In order to avoid possible misunderstandings it should be emphasized that when we speak about information, what we have in mind is not the subjective "information" possessed by a particular observer. The terminology is somewhat misleading, as it seems to suggest that the information depends somehow on the observer. In reality the information contained in an observation is a quantity independent of whether it does or does not reach the perception of an observer (be it a man, a registering device, or a computer). The notion of uncertainty should also be interpreted in an objective sense; what we have in mind is not the subjective "uncertainty" existing in the mind of the observer concerning the outcomes of an experiment; it is an uncertainty due to the fact that really several possibilities are to be taken into account. The measure of uncertainty does not depend on anything other than these possible events, and in this sense it is entirely objective. The above-mentioned relation between information and thermodynamical entropy is noteworthy in this respect too.
x′₁, x′₂, …, x′_n. The observation of a random variable assuming the values x′₁, x′₂, …, x′_n with probabilities p₁, p₂, …, p_n contains the same amount of information as the observation of ξ. Consequently, if h(x) is a function such that h(x) ≠ h(x′) for x ≠ x′, we have I(h(ξ)) = I(ξ). However, without the condition h(x) ≠ h(x′) for x ≠ x′ we can state only that I(h(ξ)) ≤ I(ξ). This follows from the evident inequality
It suffices for this to apply (2) to the convex function y = x log₂ x (x > 0); with x_k = p_k, w_k = 1/n (k = 1, 2, …, n) we get (3). The equality sign holds
Σ_{k=1}^{n} q_k = Σ_{j=1}^{m} p_j Σ_{k=1}^{n} w_{jk} = Σ_{j=1}^{m} p_j = 1;
hence Q = (q₁, q₂, …, q_n) is a probability distribution and we find
I(Q) ≥ I(𝒫).  (4)
In fact, by putting
g(x) = x log₂ x,  x_j = p_j,  w_j = w_{jk}  (j = 1, 2, …, m)
and
Σ_{j=1}^{m} r_{jk} = q_k  (k = 1, 2, …, n),  (10b)
then we have
I((ξ, η)) = I(η) + I(ξ | η).  (13)
Formula (13) follows from (9), (10b), (11) and (12):
I(ξ | η) = I((ξ, η)) + Σ_{j=1}^{m} Σ_{k=1}^{n} r_{jk} log₂ q_k = I((ξ, η)) − I(η).
It follows from the definition that I(ξ | η) = I(ξ) when ξ and η are independent, hence (13) reduces in this case to the relation obtained in the preceding section:
I((ξ, η)) = I(ξ) + I(η).  (14)
We may consider (13) as a generalization of the theorem on the additivity of the information: the information contained in the pair of values (ξ, η) is the sum of the information contained in the value of η and of the conditional information contained in the value of ξ when we know that η takes on a certain value.
Now we show that in general the relation
I((ξ, η)) ≤ I(ξ) + I(η)  (15)
holds, where the sign of equality occurs only if ξ and η are independent. According to (13), relation (15) is equivalent to
I(ξ | η) ≤ I(ξ),  (16)
which means that the "conditional" uncertainty of ξ for a known value of η cannot exceed the "unconditional" uncertainty of ξ. By taking (11) into account we can write
I(ξ | η) = − Σ_{k=1}^{n} Σ_{j=1}^{m} q_k p_{j|k} log₂ p_{j|k}.  (17)
Since p_j = Σ_{k=1}^{n} q_k p_{j|k}, the convexity of x log₂ x gives
p_j log₂ p_j ≤ Σ_{k=1}^{n} q_k p_{j|k} log₂ p_{j|k}.  (18)
From (17) and (18), (16) follows immediately, and hence (15) too. The sign of equality in (18) can only hold if all p_{j|k} (k = 1, 2, …, n) are equal, i.e. when ξ and η are independent. We conclude from (13) that
I(ξ, η) = I(ξ) − I(ξ | η).  (21a)
(We must not confuse I(ξ, η) with the information I((ξ, η)) associated with the two-dimensional distribution of ξ and η.) According to (20)
I(ξ, η) ≥ 0,  (23)
where the equality sign holds only if ξ and η are independent. Hence if ξ and η are not independent, the value of η always gives information about ξ. On the other hand, from (21a) and (21b) follows
Here too, it is easy to find the cases in which the equality sign holds. In fact, if I(ξ, η) = I(ξ), then I(ξ | η) = 0, which can occur only if the value of ξ is uniquely determined by the value of η, i.e. if ξ = f(η). Similarly, I(ξ, η) = I(η) can occur only if η = g(ξ). The quantity I(ξ, η) can be considered as a measure of the stochastic dependence between the random variables ξ and η.
The relation I(ξ, η) = I(η, ξ), expressing that η contains (on the average) just as much information about ξ as ξ about η, seems at first glance surprising, but a deeper consideration shows it to be quite natural.
The following example is enlightening. Let η be a random variable symmetrically distributed with respect to the origin, with P(η = 0) = 0, and put ξ = η². There corresponds to every value of η one and only one value of ξ, while conversely ξ determines η only up to its sign. In spite of this, ξ gives just as much information on η as η gives on ξ (viz. I(ξ)); the difference is that this information suffices for the complete characterization of ξ but does not determine η completely (only the value of |η|). In fact, I(η) = I(ξ) + 1 (if we know already the absolute value of η, η can still take on the values ±|η| with probability 1/2, hence one unit of uncertainty must be added).
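This example can be verified numerically. In the sketch below, η takes the values ±1, ±2 with probability 1/4 each (an arbitrary choice satisfying the symmetry assumption), and ξ = η²:

```python
import math
from collections import Counter

def entropy(dist):
    # Shannon entropy of a {value: probability} mapping, in bits
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

# eta symmetric about 0 with P(eta = 0) = 0; xi = eta**2
eta = {-2: 0.25, -1: 0.25, 1: 0.25, 2: 0.25}
xi = Counter()
joint = {}
for y, p in eta.items():
    xi[y * y] += p
    joint[(y * y, y)] = p

H_eta, H_xi, H_joint = entropy(eta), entropy(xi), entropy(joint)
mutual = H_xi + H_eta - H_joint   # I(xi, eta)

assert abs(H_eta - (H_xi + 1)) < 1e-12   # I(eta) = I(xi) + 1
assert abs(mutual - H_xi) < 1e-12        # xi is fully determined by eta
```

Here I(η) = 2 bits, I(ξ) = 1 bit, and the mutual information equals I(ξ), exactly as the text describes.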
We prove now the inequality
I(ξ, f(η)) ≤ I(ξ, η).  (25)
If instead of η we observe a function f(η) of η, then we obtain from the value of f(η) at most as much information on ξ as from the value of η; the uncertainty of ξ given the value of f(η) is thus not less than its uncertainty given the value of η.
The same example which served to derive Shannon's formula can be used to get a heuristic idea of the notion of gain of information. Let E be a set containing N elements and let E₁, …, E_n be a partition of this set. If N_k denotes the number of elements of E_k and p_k = N_k/N, then clearly
log₂ N = I(𝒫) + Σ_{k=1}^{n} p_k log₂ N_k,  (1)
where 𝒫 = (p₁, …, p_n).
Now let E′ be a nonempty subset of E and let E′_k (k = 1, 2, …, n) denote the intersection of E_k and E′. Let N′_k be the number of elements of E′_k, N′ the number of elements of E′, and put q_k = N′_k/N′. Then we have Σ_{k=1}^{n} N′_k = N′, hence Σ_{k=1}^{n} q_k = 1. Suppose that we know about an element chosen at random that it belongs to E′; what amount of information is furnished hereby about η, the index of the class containing the chosen element? The original (a priori) distribution of η was 𝒫 = (p₁, p₂, …, p_n); after the information telling us that the chosen element belongs to E′, η has the (a posteriori) distribution Q = (q₁, q₂, …, q_n). At the
first sight one could think that the information gained is I(𝒫) − I(Q). This, however, cannot be true, since I(𝒫) − I(Q) may be negative, while the gain of information must always be positive. The quantity I(𝒫) − I(Q) is the decrease of uncertainty of η; we are, however, looking for the gain of information with respect to η resulting from the knowledge that the chosen element belongs to E′. Let the quantity looked for be denoted by I(Q ‖ 𝒫)¹; it can be determined by the following reasoning: The statement ξ ∈ E′ contains the information log₂ (N/N′). This information consists of two parts: first, the information given by the proposition ξ ∈ E′ about the value of η; next, the information given by this proposition about the value of ξ if η is already known. The second part is easy to calculate; in fact, if η = k, the information obtained is equal to log₂ (N_k/N′_k), and since this information presents itself with probability q_k, the information about the value of ξ is
Σ_{k=1}^{n} q_k log₂ (N_k/N′_k).
Hence
I(Q ‖ 𝒫) = log₂ (N/N′) − Σ_{k=1}^{n} q_k log₂ (N_k/N′_k),
and since (N/N′) · (N′_k/N_k) = q_k/p_k, we find that
I(Q ‖ 𝒫) = Σ_{k=1}^{n} q_k log₂ (q_k/p_k).  (3)
The quantity I(Q ‖ 𝒫) depends only on the distributions 𝒫 and Q; it
¹ We use a double bar ‖ in I(Q ‖ 𝒫) in order to avoid confusion with the conditional information I(ξ | η).
I(Q ‖ 𝒫) ≥ 0.  (4)
The equality sign occurs in (4) only if the distributions 𝒫 and Q are identical. I(Q ‖ 𝒫) is defined only if every p_k is positive and if there exists a one-to-one correspondence between the individual terms of the two distributions. The quantity I(Q ‖ 𝒫), defined by (3), will be called the gain of information resulting from the replacement of the (a priori) distribution 𝒫 by the (a posteriori) distribution Q.
The gain of information is one of the most important notions in information theory; it may even be considered as the fundamental one, from which all others can be derived. In § 6 we shall build up information theory in this fashion; the gain of information, as a basic concept, will be defined by postulates.
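As a numerical illustration of the definition (3) and property (4) (the distributions below are arbitrary examples, and the helper name is ours):

```python
import math

def info_gain(q, p):
    """Gain of information (3): I(Q || P) = sum q_k * log2(q_k / p_k).
    Defined only when every p_k is positive."""
    assert all(pk > 0 for pk in p)
    return sum(qk * math.log2(qk / pk) for qk, pk in zip(q, p) if qk > 0)

p = [0.5, 0.25, 0.25]                # a priori distribution
q = [0.7, 0.2, 0.1]                  # a posteriori distribution
assert info_gain(q, p) >= 0          # property (4)
assert abs(info_gain(p, p)) < 1e-12  # zero only for identical distributions
```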
The relative information introduced in the preceding section can be expressed as follows by means of the gain of information. Let ξ and η be random variables assuming the distinct values x₁, x₂, …, x_m and y₁, y₂, …, y_n with positive probabilities p_j = P(ξ = x_j) and q_k = P(η = y_k) respectively; put 𝒫 = (p₁, …, p_m), Q = (q₁, q₂, …, q_n), and
𝒫_k = (p_{1|k}, p_{2|k}, …, p_{m|k}).
Then we have
Σ_{k=1}^{n} q_k I(𝒫_k ‖ 𝒫) = Σ_{j=1}^{m} Σ_{k=1}^{n} r_{jk} log₂ (r_{jk}/(p_j q_k)).  (6)
From this, (5) can be derived by Formula (22) of § 3. Formula (5) means that the amount of information on ξ which is contained in the value of η is equal to the expectation of the gain of information obtained by replacing the distribution 𝒫 of ξ by the conditional distribution 𝒫_k.
If 𝒫 = (p₁, …, p_n) is any distribution having n terms and if ℰ_n = (1/n, 1/n, …, 1/n), we have
I(𝒫 ‖ ℰ_n) = Σ_{k=1}^{n} p_k log₂ (n p_k) = log₂ n − I(𝒫) = I(ℰ_n) − I(𝒫).  (7)
The gain of information obtained by replacing the uniform distribution by the distribution 𝒫 is thus equal in this case to the decrease of uncertainty. But in general the quantities I(Q ‖ 𝒫) and I(𝒫) − I(Q) are not equal. Though in general I(𝒫_k ‖ 𝒫) ≠ I(𝒫) − I(𝒫_k), Formula (5) still expresses that the averages of these two quantities are equal. For according to the first definition of relative information,
But only the sums on the two sides of (8) are equal; the single terms do not necessarily have the same value.
The following symmetric expression is also often considered in information theory:
J(𝒫, Q) = I(Q ‖ 𝒫) + I(𝒫 ‖ Q).  (9)
Let us remark that while certain terms of the sum (3) defining I(Q ‖ 𝒫) may be negative, and we know only that the sum itself is nonnegative, on the right hand side of (10) all terms are nonnegative.
The relative information can be expressed by means of the gain of information in still another way. If ℛ is the distribution {r_{jk}} and 𝒫 ∗ Q the distribution {p_j q_k}, then it follows from Formula (22) of § 3 that
I(ξ, η) = I(ℛ ‖ 𝒫 ∗ Q).  (11)
The information concerning ξ contained in the value of η is thus equal to the gain of information obtained by replacing the direct product of the distributions of ξ and η by their actual joint distribution.
Subdivide the set of all sequences of length n into two classes: let the first class consist of the sequences for which
| (1/n) log₂ (1/π_n) − I(𝒫) | < ε/2,  (4)
and the second of the remaining ones. According to (3) the probability that a sequence belongs to the second class is less than δ. Let C_n denote the number of sequences of the first class, and let q₁, q₂, …, q_{C_n} be their probabilities. By (4) we have
C_n < 2^{n(I(𝒫)+ε/2)}.
Now let us number the sequences of the first class from 1 to C_n and write these numbers in the binary system. For this, n(I(𝒫) + ε/2) + 1 binary digits are needed. There can be found an n₂ such that for n > n₂ the inequality holds. Put n₀ = max (n₁, n₂); it is clear that n₀ depends only on ε, δ, and 𝒫
and satisfies the requirements of the theorem. It is easy to show that with large probability n(I(𝒫) − ε) 0–1-symbols are not sufficient to describe the outcome of the sequence of experiments. To see this, subdivide again the set of the sequences into two classes: let the first class contain the sequences for which
and the second the remaining ones. Choose an n₃ such that for n > n₃ the probability of (9) exceeds 1 − δ; this is possible because of (2). Let D_n denote the number of sequences in the first class and let r₁, r₂, …, r_{D_n} be the corresponding probabilities. We have then
r_i < 2^{−n(I(𝒫)−ε/2)}  (i = 1, 2, …, D_n).
Furthermore, by assumption
Σ_{i=1}^{D_n} r_i ≥ 1 − δ.  (11)
If we select some outcomes and assign to them sequences of zeros and ones of length not exceeding n(I(𝒫) − ε), the number of these sequences will be less than 2^{n(I(𝒫)−ε)+1}; hence the total probability of the selected outcomes will be at most
2^{n(I(𝒫)−ε)+1} · 2^{−n(I(𝒫)−ε/2)} + δ = 2^{1−nε/2} + δ,
which is less than 2δ for n large enough; yet these outcomes were assumed to carry a probability > ρ > 0. If we choose δ such that 2δ < ρ, then this contradicts what was just proved.
Theorem 1 is therefore completely proved. It can be sharpened in the following manner:
Theorem 2. For every δ > 0 there can be given an n₀ such that for n > n₀ the outcome of n independent experiments can be uniquely expressed, with probability > 1 − δ, by at most nI(𝒫) + K√n 0–1-symbols; K is here a positive constant which depends only on δ. However, there corresponds to every ρ between 0 and 1 a constant K′ and an integer n₀ such that a unique characterization of the outcome of a sequence of experiments becomes impossible (with a probability > ρ, for n > n₀) by less than nI(𝒫) − K′√n 0–1-symbols.
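The content of Theorem 1 can be illustrated empirically: for a Bernoulli source, almost all of the probability mass is carried by sequences whose per-symbol information is close to I(𝒫). The parameters p, n, and ε below are arbitrary illustrative choices:

```python
import math
from itertools import product

p = 0.3     # Bernoulli source P = (p, 1 - p)
n = 14      # sequence length
eps = 0.35  # tolerance around the entropy
H = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

# Sum the probabilities of the sequences whose per-symbol
# information -(1/n) * log2(pi_n) lies within eps of I(P).
typical_prob = 0.0
for seq in product([0, 1], repeat=n):
    k = sum(seq)
    prob = p ** k * (1 - p) ** (n - k)
    if abs(-math.log2(prob) / n - H) < eps:
        typical_prob += prob

assert typical_prob > 0.8   # most of the mass is on "typical" sequences
```

As n grows (with ε fixed), this fraction tends to 1, which is exactly what makes the nI(𝒫)-digit description of Theorem 1 possible.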
0 denoting one of the signs and 1 the other. The question is then: how many 0 or 1 symbols are necessary for the transmission of the information contained in n signs ξ₁, ξ₂, …, ξ_n furnished by the source? According to Theorem 1, with probability arbitrarily near to 1, less than n(I(𝒫) + ε) symbols are required, provided that n is sufficiently large. This shows the importance of the quantity I(𝒫) for communication engineering.
Let us mention an important particular case. If p₁ = p₂ = … = p_r = 1/r, then I(𝒫) = log₂ r; therefore, in order to encode a signal of such a source into 0–1-symbols, on the average log₂ r symbols are necessary. (Of course this can be shown directly.) If, for instance, a number written in the decimal system is transcribed into the binary system, the number of digits increases on the average by the factor log₂ 10 = 3.3219…. This is of importance for computers, which work in the binary system.
If the source emits signals x_k with probabilities p_k (k = 1, 2, …, r) and if the channel can transmit s different signs, approximately nI(𝒫)/log₂ s signs are necessary in order to transmit a message of n signs if the most economic coding is applied.
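A sketch of this count (the helper name is ours):

```python
import math

def symbols_needed(n, entropy_bits, s):
    """Approximate number of s-ary channel symbols needed to transmit a
    message of n source signs carrying entropy_bits per sign, assuming
    the most economic coding and ignoring rounding."""
    return n * entropy_bits / math.log2(s)

# A uniform source of r = 10 signs has I(P) = log2(10) = 3.3219... bits
# per sign, so binary transcription lengthens a message by that factor:
assert abs(symbols_needed(1, math.log2(10), 2) - math.log2(10)) < 1e-12
# A channel with more signs (s = 3) needs fewer symbols:
assert symbols_needed(100, math.log2(10), 3) < symbols_needed(100, math.log2(10), 2)
```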
It is to be noticed that optimal or nearly optimal codings are very complicated and are feasible only for long sequences of signals. Hence in practice codes are usually employed which take the statistical nature of the source into account to some extent, but are easier to handle than the nearly optimal codings. In particular, the signals are coded one by one or in small groups (as, for instance, in the encoding of letters into Morse signals).
The message sources encountered in practice are generally much more
complicated than those described above. The individual signals are, in
general, not independent of each other. E.g. in every natural language the
letters have not only different probabilities, but the probability of a letter
depends also on the letters preceding it in the text. This can also be taken
into account, but we do not deal with these questions here.
The channels actually used in communication are also much more complicated than those discussed above. In practice it is of great importance to know how to transmit information through a channel which, with a certain probability, distorts the transmitted signal. Then one cannot be sure that the received signal is identical to the emitted one. (E.g. in broadcasting the distortions caused by the transmission through the atmosphere are perceived as noise.) Such channels are called noisy channels. Information theory takes this into account, but our brief introduction does not permit us to go into these questions.
Σ_{j=1}^{n} p_j ≤ 1.
Let ξ be an incomplete random variable taking on the values x_k with probabilities p_k. If we put q₁ = q₂ = 1, we find
g(1, p) + g(p, 1) = 0.  (5)
We conclude from (4) and (5) that
g(1, p) = c log₂ (1/p)
with c > 0. According to Postulate III, c = 1; thus
g(1, p) = log₂ (1/p),
and by (6)
q̄_k = q_k / Σ_{j=1}^{n} q_j.
F(Q, 𝒫, x) will be called the conditional distribution function of the gain of information. Now we can formulate our further requirements:
Postulate IV. I(Q ‖ 𝒫) depends only on the function F(Q, 𝒫, x).
Because of this postulate we can also write I[F(x)] instead of I(Q ‖ 𝒫), where F(x) = F(Q, 𝒫, x).
Notes. 1. If Q = 𝒟_q and 𝒫 = 𝒟_p, we have
Postulate IV is thus fulfilled, and (8) expresses that for the degenerate distribution function
D_c(x) = 0 for x < c,  D_c(x) = 1 for x ≥ c  (10)
(where c can be any real number) we have the relation I[D_c(x)] = c.
hold. This is the case if we take q_k = t w_k and p_k = t w_k 2^{−a_k}. If we choose the number t such that these form distributions, we are led to the quantities
I_α(Q ‖ 𝒫) = (1/(α − 1)) log₂ ( (Σ_{k=1}^{n} q_k^α p_k^{1−α}) / (Σ_{k=1}^{n} q_k) )  for α ≠ 1  (13a)
and
I₁(Q ‖ 𝒫) = ( Σ_{k=1}^{n} q_k log₂ (q_k/p_k) ) / ( Σ_{k=1}^{n} q_k ).  (13b)
Proof.² Instead of I_α(Q ‖ 𝒫) we also use the notation I_α(F(Q, 𝒫, x)). Then (13a) and (13b) are written as
I_α(F) = (1/(α − 1)) log₂ ∫_{−∞}^{+∞} 2^{(α−1)x} dF(x)  (15)
and
I₁(F) = ∫_{−∞}^{+∞} x dF(x).  (16)
From these formulae we see that I_α(Q ‖ 𝒫) satisfies, for every α, Postulates I through VI. It remains still to show that no other functional can satisfy all these Postulates. A simple calculation shows that
F(Q₁ ∗ Q₂, 𝒫₁ ∗ 𝒫₂, x) = ∫_{−∞}^{+∞} F(Q₁, 𝒫₁, x − y) dF(Q₂, 𝒫₂, y),  (17)
I[D_c(x)] = c  (20)
holds for every real c, where D_c(x) is the degenerate distribution function of the constant c (see Formula (10)).
Let
ψ_A(t) = I[(1 − t) D_{−A}(x) + t D_A(x)].  (21)
hence
I[F] = I[ (1 − Σ_{k=1}^{n} w_k φ_A(x_k)) D_{−A}(x) + (Σ_{k=1}^{n} w_k φ_A(x_k)) D_A(x) ],  (25)
or, according to (22), by writing φ_A^{−1}(t) instead of ψ_A(t),
Lemma. Let φ₁(x) and φ₂(x) be two continuous and strictly increasing functions in the interval [J, K]. Suppose that for arbitrarily chosen numbers x₁, x₂, …, x_n in [J, K] and for positive numbers w₁, w₂, …, w_n with Σ_{k=1}^{n} w_k = 1 we always have
φ₁^{−1}( Σ_{k=1}^{n} w_k φ₁(x_k) ) = φ₂^{−1}( Σ_{k=1}^{n} w_k φ₂(x_k) );  (27)
then
φ₂(x) = a φ₁(x) + β  (28)
holds, where a > 0 and β are two constants. (Conversely, (28) implies (27).)
φ_A(x) = ( φ_B(x) − φ_B(−A) ) / ( φ_B(A) − φ_B(−A) )  for 0 < A < B.  (30)
Now we investigate how φ(x) can be chosen such that it fulfils also Postulate V.
for all values of a and b and for 0 < t < 1. It follows from the lemma that the corresponding relation holds. This relation being fulfilled for every y, we may interchange x and y, hence
h(x) = 2^{(α−1)x}  (42)
with α ≠ 1, hence
φ(x) = ( 2^{(α−1)x} − 1 ) / c,  (43)
where c ≠ 0 is a constant.
Thus if we put for any incomplete distribution 𝒫 = (p₁, …, p_n)
I_α(𝒫) = (1/(1 − α)) log₂ ( (Σ_{k=1}^{n} p_k^α) / (Σ_{k=1}^{n} p_k) ),  (45a)
we find that
I_α(Q ‖ 𝒫) = I_α(𝒫) − I_α(Q).  (46)
(46) shows that the quantity I_α(𝒫) may be considered as a measure of the amount of information corresponding to the distribution 𝒫 (or else as a measure of the uncertainty of a random variable with the distribution 𝒫). We call I_α(𝒫) the information of order α. It is easy to see that
I₁(Q ‖ ℰ_n) = log₂ n − ( Σ_{k=1}^{n} q_k log₂ (1/q_k) ) / ( Σ_{k=1}^{n} q_k ).  (47)
For any incomplete distribution 𝒫 = (p₁, …, p_n) we put
I₁(𝒫) = ( Σ_{k=1}^{n} p_k log₂ (1/p_k) ) / ( Σ_{k=1}^{n} p_k )  (45b)
and
I_α(𝒫) = (1/(1 − α)) log₂ ( (Σ_{k=1}^{n} p_k^α) / (Σ_{k=1}^{n} p_k) )  for α ≠ 1.  (45c)
Proof. We have
I_α(Q ‖ 𝒫) = (1/(α − 1)) log₂ ( ( Σ_{k=1}^{n} q_k (q_k/p_k)^{α−1} ) / ( Σ_{k=1}^{n} q_k ) ),  (50)
from which Theorem 3 follows by the same theorem on mean values (cf. footnote) as above.
If α is negative or zero, the properties of I_α(𝒫) and I_α(Q ‖ 𝒫) differ essentially from those of Shannon's information. As can be seen from Theorem 3, I_α(Q ‖ 𝒫) is, for complete distributions, positive only when α is positive. The following property is particularly undesirable: Let α < 0; modify the complete distribution 𝒫 = (p₁, …, p_n) by letting p₁ tend to zero; then I_α(𝒫) tends to infinity. On the other hand, I₀(𝒫) is always equal to log₂ n whenever 𝒫 contains n positive terms. I_α(𝒫) with α ≤ 0 is thus quite inadequate to measure information, and we consider only I_α(𝒫) with positive α as true measures of information.
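The family I_α can be tabulated directly; the sketch below checks the limit α → 1, the identity with the quantity (23) of § 2 for α = 2, and the blow-up for negative α (the distributions are arbitrary illustrations):

```python
import math

def renyi_entropy(p, alpha):
    """Information of order alpha, formula (45c) for a complete
    distribution; alpha = 1 gives Shannon's formula."""
    if alpha == 1:
        return sum(pk * math.log2(1 / pk) for pk in p if pk > 0)
    return math.log2(sum(pk ** alpha for pk in p)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
# alpha -> 1 recovers Shannon's information:
assert abs(renyi_entropy(p, 1.0001) - renyi_entropy(p, 1)) < 1e-3
# order 2 is the quantity (23), -log2(sum of p_k^2):
assert abs(renyi_entropy(p, 2) + math.log2(sum(x * x for x in p))) < 1e-12
# for negative alpha the measure blows up as one probability tends to 0:
assert renyi_entropy([0.999, 0.001], -1) > renyi_entropy([0.9, 0.1], -1)
```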
Let us now consider some distinctive features of Shannon's information among the informations of the family I_α(𝒫), or of I₁(Q ‖ 𝒫) among the informations of the family I_α(Q ‖ 𝒫). One of these properties is given by
Theorem 4. If ξ and η are two random variables with the discrete finite distributions 𝒫 and Q, and if ℛ denotes the two-dimensional distribution of the pair (ξ, η), then
I_α(ℛ) ≤ I_α(𝒫) + I_α(Q)
holds for every 𝒫 and Q with the mentioned properties if and only if α = 1.
P(ξ = 0, η = 0) = pq + ε,
P(ξ = 0, η = 1) = p(1 − q) − ε,
P(ξ = 1, η = 0) = (1 − p)q − ε,
P(ξ = 1, η = 1) = (1 − p)(1 − q) + ε
with
0 < p < 1,  0 < q < 1,  p ≠ 1/2,  q ≠ 1/2,
and
|ε| < min ( pq, (1 − p)q, (1 − q)p, (1 − p)(1 − q) ).
I_α(Q ‖ 𝒫) + I_α(𝒫 ‖ Q) ≥ 0,
where the equality sign can only hold if 𝒫 = Q.
Proof. For α ≠ 1 we have
I_α(Q ‖ 𝒫) + I_α(𝒫 ‖ Q) = (1/(α − 1)) log₂ [ ( Σ_{k=1}^{n} q_k^α p_k^{1−α} ) ( Σ_{k=1}^{n} p_k^α q_k^{1−α} ) ].  (54)
It is easy to see that the right hand side of (54) is not identically zero; e.g. it is different from 0 if we put n = 2, q₁ = q₂, p₁ ≠ p₂.
λ(α) = ( Σ_{k=1}^{r} p_k^α log₂ (1/p_k) ) / ( Σ_{k=1}^{r} p_k^α ).  (1)
Since
λ′(α) = −ln 2 [ ( Σ_{k=1}^{r} p_k^α log₂² (1/p_k) ) / ( Σ_{k=1}^{r} p_k^α ) − ( ( Σ_{k=1}^{r} p_k^α log₂ (1/p_k) ) / ( Σ_{k=1}^{r} p_k^α ) )² ] ≤ 0,  (2)
λ(α) is a nonincreasing function of α. Define ρ(α) by
log₂ (1/ρ(α)) = λ(α) = ( Σ_{k=1}^{r} p_k^α log₂ (1/p_k) ) / ( Σ_{k=1}^{r} p_k^α ),  (4)
so that
ρ(1) < ρ(α)  for α > 1.
Now let B_n(α) be the event π_n > ρ(α)^n. Consider the conditional information contained in the outcome of the sequence of experiments under the condition B_n(α). Put for this
C_n(α) = Σ n! / (n₁! n₂! … n_r!),  (5)
where the summation is extended over all r-tuples (n₁, …, n_r) with Σ_{k=1}^{r} n_k = n and Π_{k=1}^{r} p_k^{n_k} > ρ(α)^n. Then
P(B_n(α)) > C_n(α) ρ(α)^n.  (6)
Furthermore,
E(π_n^{α−1}) = ( Σ_{k=1}^{r} p_k^α )^n.  (7)
Hence, because of Markov's inequality,
P(B_n(α)) ≤ ( Σ_{k=1}^{r} p_k^α )^n ρ(α)^{−n(α−1)}  for α > 1.  (9)
Put
q_k(α) = p_k^α / Σ_{j=1}^{r} p_j^α.  (10)
If Q_α denotes the distribution (q₁(α), …, q_r(α)), we get from (4) by a simple calculation that
Furthermore, we have
I₁(Q_α) = α log₂ (1/ρ(α)) − (α − 1) I_α(𝒫) = I_α(𝒫) + (α/(1 − α)) I₁(Q_α ‖ 𝒫).  (13)
Choose suitable integers n_k(α) with Σ_{k=1}^{r} n_k(α) = n and n_k(α) ≈ n q_k(α), for which
C_n(α) ≥ n! / Π_{k=1}^{r} n_k(α)!.  (16)
But according to Stirling's formula, apart from a factor which is polynomial in n,
n! / Π_{k=1}^{r} n_k(α)! ≥ 2^{n I₁(Q_α)}.  (17)
Relations (12), (16) and (17) lead to the required estimate, with
ρ(α) = 2^{−Σ_{k=1}^{r} q_k(α) log₂ (1/p_k)}.  (19)
r
Let vk be the number o f experiments with outcome Ak, let n„ = [ ] p'kk and
k = 1
let Bn(oi) be the event nn > p(d)n. Now if Bn(a) occurs, the outcome o f the
sequence o f experiments may be characterized completely by a sequence o f
0 - 1-symbols o f length
not
*/i(Q I) = n/a (J2) + — — (20)
If, however, q > 0 and e > 0 are arbitrarily small positive numbers and n
is large enough, then n f lf Q J — e) 0 —1-symbols are not sufficient with
probability > Q.
Remarks. 1. The quantity I^{(M)}(𝒫) = log₂ (1/max_k p_k) may also be considered as an information measure of the distribution 𝒫; it has the following properties:
a) 0 ≤ I^{(M)}(𝒫) ≤ log₂ r;
b) if ℛ = 𝒫 ∗ Q, we have
I^{(M)}(ℛ) = I^{(M)}(𝒫) + I^{(M)}(Q);
and for every positive α
I_α(𝒫) ≥ log₂ (1/max_k p_k).  (21)
and
I₁(𝒫) = Σ_{k=1}^{∞} p_k log₂ (1/p_k),  (2)
if the series on the right hand sides of (1) and (2) converge. The series (2) does not always converge. For instance, for
p_k = c / (k log₂² (k + 1))  (k = 1, 2, …)
the series (2) diverges, since Σ_{n=1}^{∞} 1/(n log₂ (n + 1)) diverges. However, the series (1) always converges for α > 1. In the case of discrete infinite distributions the measure of order α of the amount of information is thus always defined if α > 1.
Let η be a second random variable which takes on the same values as ξ, but has a different probability distribution P(η = x_k) = q_k (k = 1, 2, …). Let the gain of information of order α, obtained if the distribution 𝒫 = (p₁, p₂, …) is replaced by Q = (q₁, q₂, …), be defined by
I_α(Q ‖ 𝒫) = (1/(α − 1)) log₂ Σ_{k=1}^{∞} q_k^α p_k^{1−α}  (3)
and
I₁(Q ‖ 𝒫) = Σ_{k=1}^{∞} q_k log₂ (q_k/p_k),  (4)
if the series on the right hand side of (3) or (4) converges (which is not always the case). The series (3) converges, according to Hölder's inequality, always for 0 < α < 1.
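A sketch of formulas (3) and (4) for finite distributions (the two-point distributions below are arbitrary illustrations):

```python
import math

def renyi_divergence(q, p, alpha):
    """Gain of information of order alpha, formulas (3) and (4):
    I_alpha(Q || P) = log2(sum q_k^alpha * p_k^(1-alpha)) / (alpha - 1)."""
    if alpha == 1:
        return sum(qk * math.log2(qk / pk) for qk, pk in zip(q, p) if qk > 0)
    s = sum(qk ** alpha * pk ** (1 - alpha) for qk, pk in zip(q, p))
    return math.log2(s) / (alpha - 1)

q, p = [0.6, 0.4], [0.5, 0.5]
# alpha -> 1 recovers the Shannon gain (4):
assert abs(renyi_divergence(q, p, 1.001) - renyi_divergence(q, p, 1)) < 1e-2
# for 0 < alpha < 1 the sum converges (Hoelder) and the gain is nonnegative:
assert renyi_divergence(q, p, 0.5) >= 0
```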
Let now ξ be a random variable having a continuous distribution. We want now to extend the definition of the measure of order α of the amount of information, i.e. I_α(ξ), to this case. If we do this in a straightforward way, we obtain that this quantity is, in general, infinite. If for instance ξ is uniformly distributed on (0, 1), we know (cf. Ch. VII, § 14, Exercise 12) that the digits of the binary expansion of ξ are completely independent random variables which take on the values 0 and 1 with probability 1/2. Hence the exact knowledge of the value of ξ furnishes an information 1 + 1 + 1 + … which is infinite. Or, to put it more precisely, the amount of information furnished would be infinite if the value of ξ could be known exactly. Practically, however, a continuous quantity can only be determined up to a finite number of decimal (or binary) digits.
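This discretization effect can be illustrated numerically. The sketch below uses the density f(x) = 2x on (0, 1) (an arbitrary smooth example) and checks that I₁(ξ_N) − log₂ N approaches the integral of f log₂ (1/f), as stated by the theorem proved below:

```python
import math

# Density f(x) = 2x on (0, 1); its integral of f * log2(1/f) equals
# (0.5 - ln 2) / ln 2, about -0.279 binary units.
h = (0.5 - math.log(2)) / math.log(2)

def discretized_excess(N):
    """I_1(xi_N) - log2(N) for xi_N = [N*xi]/N, computed exactly from
    p_Nk = P(k/N <= xi < (k+1)/N) = (2k + 1) / N**2."""
    p = [(2 * k + 1) / N ** 2 for k in range(N)]
    return sum(pk * math.log2(1 / pk) for pk in p) - math.log2(N)

# I_1(xi_N) itself grows like log2(N), but the excess converges:
assert abs(discretized_excess(4096) - h) < 1e-3
```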
ξ_N = [Nξ]/N,  (5)
where [x] denotes the largest integer not exceeding x. Suppose α > 0 and let I_α(ξ₁) be finite (this is only a restriction for α < 1). It follows from Jensen's inequality that I_α(ξ_N) is finite for every N, and the inequality
I_α(ξ_N) ≤ I_α(ξ₁) + log₂ N  (6)
is valid. If 0 < α < 1 and if we put
p_{Nk} = P( k/N ≤ ξ < (k+1)/N )  (k = 0, ±1, ±2, …; N = 1, 2, …),
then we have the inequality
Σ_{k=−∞}^{+∞} p_{Nk}^α ≤ N^{1−α} Σ_{j=−∞}^{+∞} p_{1j}^α,
from which (6) follows; for α > 1, (6) can be proved in a similar manner.
When the distribution is continuous, the information I_α(ξ_N) tends to infinity as N → ∞; however, in many cases the limit
lim_{N→∞} ( I_α(ξ_N) − log₂ N )
exists. In particular, if ξ has the density function f(x) and
∫_{−∞}^{+∞} f(x) log₂ (1/f(x)) dx
exists, then
lim_{N→∞} ( I₁(ξ_N) − log₂ N ) = ∫_{−∞}^{+∞} f(x) log₂ (1/f(x)) dx.
We have then
I₁(ξ_N) − log₂ N = Σ_{k=−∞}^{+∞} p_{Nk} log₂ (1/(N p_{Nk})) = ∫_{−∞}^{+∞} f_N(x) log₂ (1/f_N(x)) dx,  (12)
where f_N(x) = N p_{Nk} for k/N ≤ x < (k+1)/N.
If
F(x) = ∫_{−∞}^{x} f(u) du,  (13)
then f_N(x) → f(x) for almost every x. Now we shall use Jensen's inequality in the following form: If g(x) is a concave function and if p(x) and h(x) are measurable functions with p(x) ≥ 0 and ∫_a^b p(x) dx = 1, then we have
∫_a^b g(h(x)) p(x) dx ≤ g( ∫_a^b h(x) p(x) dx ).  (15)
This inequality can be proved in the same way as the usual form of Jensen's inequality. If we apply (15) with g(x) = log₂ x, h(x) = 1/f(x) and
p(x) = f(x)/p_{Nk}  for k/N ≤ x < (k+1)/N,
then we get
∫_{k/N}^{(k+1)/N} f(x) log₂ (1/f(x)) dx ≤ p_{Nk} log₂ (1/(N p_{Nk})),  (16)
and, by summing over k,
∫_{−∞}^{+∞} f(x) log₂ (1/f(x)) dx ≤ Σ_{k=−∞}^{+∞} p_{Nk} log₂ (1/(N p_{Nk})),
i.e.
∫_{−∞}^{+∞} f(x) log₂ (1/f(x)) dx ≤ I₁(ξ_N) − log₂ N.
If f(x) ≤ K, we have also f_N(x) ≤ K; thus the functions f_N(x) are uniformly bounded. Hence, by the convergence theorem of Lebesgue,
lim_{N→∞} ∫_{−A}^{+A} f_N(x) log₂ (1/f_N(x)) dx = ∫_{−A}^{+A} f(x) log₂ (1/f(x)) dx  (20)
for every A > 0.
According to Jensen's inequality, we have
Σ_{k=lN}^{(l+1)N−1} p_{Nk} log₂ (1/p_{Nk}) ≥ p_{1l} log₂ (1/p_{1l}).  (21)
Since we have assumed that I₁(ξ₁) and ∫_{−∞}^{+∞} f(x) log₂ (1/f(x)) dx are finite, relations (20), (22a) and (22b) show immediately that the theorem is true for α = 1.
Consider now the case α > 1. We get from Fatou's lemma¹ that
lim inf_{N→∞} ∫_{−∞}^{+∞} f_N(x)^α dx ≥ ∫_{−∞}^{+∞} f(x)^α dx.  (23)
On the other hand, according to Jensen's inequality,
∫_{−∞}^{+∞} f_N(x)^α dx ≤ ∫_{−∞}^{+∞} f(x)^α dx.  (24)
Hence
lim_{N→∞} ∫_{−∞}^{+∞} f_N(x)^α dx = ∫_{−∞}^{+∞} f(x)^α dx,  (25)
hence (10) is proved for α > 1. We have still to examine the case 0 < α < 1.
¹ Cf. F. Riesz and B. Sz.-Nagy [1], p. 30.
592 INTRODUCTION TO INFORMATION THEORY [IX, § 8
Since we supposed $I_\alpha(\xi_1)$ to be finite, we can find for every $\varepsilon > 0$ an $A > 0$ such that

$$\sum_{|k| > AN} p_{Nk}^{\alpha} < \frac{\varepsilon}{2}. \qquad (29)$$

From (27), (28) and (29) we conclude that (25) remains valid for $0 < \alpha < 1$. Theorem 1 is thus completely proved.
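Theorem 1 (the relation $I_1(\xi_N) - \log_2 N \to \int f \log_2(1/f)\,dx$) can be checked numerically. The following sketch discretizes an exponential variable on a grid of mesh $1/N$; the choice of the exponential distribution, the truncation bound `k_max`, and the function names are assumptions of this illustration, not part of the text.

```python
import math

def discretized_entropy(F, N, k_max):
    # Shannon entropy I_1(xi_N) of the discretized variable xi_N = [N*xi]/N,
    # with p_Nk = F((k+1)/N) - F(k/N); the sum is truncated at k_max*N cells.
    H = 0.0
    for k in range(k_max * N):
        p = F((k + 1) / N) - F(k / N)
        if p > 0:
            H -= p * math.log2(p)
    return H

# Exponential distribution with mean 1: F(x) = 1 - exp(-x).
# Its dimension-1 entropy is  ∫ f log2(1/f) dx = log2 e ≈ 1.4427 bits.
F = lambda x: 1 - math.exp(-x)
for N in (10, 100, 1000):
    print(N, discretized_entropy(F, N, 60) - math.log2(N))
```

As $N$ grows, the printed differences approach $\log_2 e$, in accordance with the theorem.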
The quantities $I_{\alpha,1}(\xi)$ behave under a change of scale according to

$$I_{\alpha,1}(c\xi) = I_{\alpha,1}(\xi) + \log_2 c. \qquad (32)$$
IX, § 8] DEFINITION OF INFORMATION FOR GENERAL DISTRIBUTIONS 593
These facts are explained by realizing that $I_{\alpha,1}(\xi)$ is the limit of a difference between two informations.

All that we have said can be extended to the case of $r$-dimensional random vectors ($r = 2, 3, \ldots$) with an absolutely continuous distribution. Let $f(x_1, \ldots, x_r)$ be the density function of the random vector $(\xi^{(1)}, \ldots, \xi^{(r)})$. Put $\xi_N^{(k)} = \frac{[N\xi^{(k)}]}{N}$ ($k = 1, 2, \ldots, r$). If $I_\alpha((\xi^{(1)}, \ldots, \xi^{(r)}))$ is finite,¹ we have

$$\lim_{N\to\infty}\frac{I_\alpha((\xi_N^{(1)}, \ldots, \xi_N^{(r)}))}{\log_2 N} = r. \qquad (33)$$

The dimension of the (absolutely continuous) distribution of a random vector of $r$ components is thus equal to $r$; the notion of dimension in information theory is thus in accordance with the notion of geometrical dimension.
Furthermore, for $\alpha > 0$, $\alpha \ne 1$, let $h(\omega)$ be a density with

$$\int_\Omega h(\omega)\,d\mu = 1.$$

The gain of information of order $\alpha$ (or of order 1) obtained if $\mathscr{P}$ is replaced by $\mathscr{Q}$ is defined¹ by the formulas
In this case we obtain for the gain of information, from (37) and (38), the formulas (39). Consider first the case $0 < \alpha < 1$. It is clear that $p_N(x) \to p(x)$ and $q_N(x) \to q(x)$ almost everywhere; further

$$I_\alpha(\mathscr{Q}_N \,\|\, \mathscr{P}_N) = \frac{1}{\alpha - 1}\log_2\int_{-\infty}^{+\infty} q_N(x)^{\alpha}\, p_N(x)^{1-\alpha}\,dx. \qquad (41)$$
According to Lebesgue's theorem we have for every $A > 0$

$$\lim_{N\to\infty}\int_{-A}^{+A} q_N(x)^{\alpha} p_N(x)^{1-\alpha}\,dx = \int_{-A}^{+A} q(x)^{\alpha} p(x)^{1-\alpha}\,dx. \qquad (42)$$

Since

$$\int_{|x| > A} q_N(x)^{\alpha} p_N(x)^{1-\alpha}\,dx$$

can be made arbitrarily small for a sufficiently large $A$, uniformly in $N$, Theorem 2 is proved for $0 < \alpha < 1$.
Now suppose $\alpha > 1$. We have, according to Jensen's inequality,

$$\int_{-\infty}^{+\infty}\frac{q_N(x)^{\alpha}}{p_N(x)^{\alpha-1}}\,dx \le \int_{-\infty}^{+\infty}\frac{q(x)^{\alpha}}{p(x)^{\alpha-1}}\,dx, \qquad (43)$$

and on the other hand by Fatou's lemma

$$\liminf_{N\to\infty}\int_{-\infty}^{+\infty}\frac{q_N(x)^{\alpha}}{p_N(x)^{\alpha-1}}\,dx \ge \int_{-\infty}^{+\infty}\frac{q(x)^{\alpha}}{p(x)^{\alpha-1}}\,dx, \qquad (44)$$

which settles the case $\alpha > 1$.
Finally, let $\alpha = 1$. From $x\log_2 x \ge -\dfrac{\log_2 e}{e}$ we deduce

$$\lim_{N\to\infty}\int_{-\infty}^{+\infty} q_N(x)\log_2\frac{q_N(x)}{p_N(x)}\,dx = \int_{-\infty}^{+\infty} q(x)\log_2\frac{q(x)}{p(x)}\,dx. \qquad (48)$$
Theorem 1. If $\mathscr{P} = (p_1, \ldots, p_r)$ and $\mathscr{Q}_n = (q_{n1}, \ldots, q_{nr})$ are probability distributions and if

$$\lim_{n\to\infty} I_\alpha(\mathscr{Q}_n \,\|\, \mathscr{P}) = 0 \quad (\alpha > 0), \qquad (1)$$

then

$$\lim_{n\to\infty} q_{nk} = p_k \quad (k = 1, 2, \ldots, r). \qquad (2)$$
Proof. If (2) does not hold, there exists a subsequence $n_1 < n_2 < \cdots < n_s < \cdots$ of the integers with

$$\lim_{s\to\infty} q_{n_s k} = p'_k \quad \text{and} \quad \sum_{k=1}^{r}(p'_k - p_k)^2 \ne 0. \qquad (3)$$

Obviously, $\sum_{k=1}^{r} p'_k = 1$; further, if we put $\mathscr{P}' = (p'_1, \ldots, p'_r)$, it follows from (3) that

$$\lim_{s\to\infty} I_\alpha(\mathscr{Q}_{n_s} \,\|\, \mathscr{P}) = I_\alpha(\mathscr{P}' \,\|\, \mathscr{P}). \qquad (4)$$

According to (1), $I_\alpha(\mathscr{P}' \,\|\, \mathscr{P}) = 0$, but this is possible only if $\mathscr{P}' = \mathscr{P}$, i.e. if $p'_k = p_k$ for $k = 1, 2, \ldots, r$, which contradicts (3). Thus Theorem 1 is proved.
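Theorem 1 can be illustrated numerically: when a sequence of distributions drifts toward $\mathscr{P}$, the gain of information of order 1 shrinks to 0, and conversely a nonzero divergence keeps the distributions apart. The particular distributions below are choices of this sketch, not from the text.

```python
import math

def kl2(q, p):
    # Gain of information of order 1: I_1(Q || P) = sum q_k log2(q_k / p_k)
    return sum(qk * math.log2(qk / pk) for qk, pk in zip(q, p) if qk > 0)

P = [0.5, 0.3, 0.2]
for n in (1, 10, 100, 1000):
    # Q_n approaches P as n grows; the divergence decreases to 0
    Qn = [0.5 + 0.3 / n, 0.3 - 0.1 / n, 0.2 - 0.2 / n]
    print(n, kl2(Qn, P))
```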
As an application of this theorem we shall now prove a theorem about
ergodicity of homogeneous Markov chains, which, essentially, is contained
in Theorem 1 of Chapter VIII, § 8. We give here a new proof of this result,
only to show how the methods of information theory may be used to prove
theorems on limit distributions.
Theorem 2. Let us consider a homogeneous Markov chain with a finite number of states $A_0, \ldots, A_N$; let the probability of transition from $A_j$ to $A_k$ in $n$ steps be denoted by $p_{jk}^{(n)}$ ($n = 1, 2, \ldots$). For $p_{jk}^{(1)}$ we write simply $p_{jk}$. If there exists an integer $s \ge 1$ such that $p_{jk}^{(s)} > 0$ for $j, k = 0, 1, \ldots, N$, then the equations

$$\sum_{j=0}^{N} x_j p_{jk} = x_k \quad (k = 0, 1, \ldots, N) \qquad (5a)$$

¹ We have here a particular case of the Perron-Frobenius theorem; cf. F. R. Gantmacher [1], Vol. 2, p. 46.
IX, § 9] INFORMATION THEORY OF LIMIT THEOREMS 599
But this inequality is an equality; hence the same must hold for every inequality (5b), i.e.

$$|x_k| = \sum_{j=0}^{N} p_{jk}|x_j|. \qquad (5c)$$

We find then by induction that

$$p_k = \sum_{j=0}^{N} p_j p_{jk}^{(h)} \quad (h = 1, 2, \ldots). \qquad (5d)$$

Since (5d) is valid for $h = s$, it follows that no $p_k$ can be zero. Because of the homogeneity of the equations, (5a) has thus a positive system of solutions $p_0, p_1, \ldots, p_N$ with $\sum_{k=0}^{N} p_k = 1$.¹ Put

$$\sum_{j=0}^{N} \pi_{jk} = 1. \qquad (10)$$
1 This solution is unique; this is a corollary of (7) and need not be proved here
separately.
Furthermore, by definition, the inequality (13) holds. Since $\sum_{k=0}^{N} p_{ik} = 1$, it follows that

$$\sum_{k=0}^{N} p_k \pi_{ik} = p_i \sum_{k=0}^{N} p_{ik} = p_i. \qquad (14)$$

If we multiply the inequality (13) by $p_k$ and then take the sum over $k$, we obtain (15). Obviously

$$\sum_{k=0}^{N} q'_{jk} = \sum_{k=0}^{N} q_{jk} = 1.$$
Let $\mathscr{Q}_j$ and $\mathscr{Q}'_j$ denote the distributions $(q_{j0}, \ldots, q_{jN})$ and $(q'_{j0}, \ldots, q'_{jN})$, respectively. If we put $\pi_{jk} = p_j p_{jk}/p_k$, Jensen's inequality implies (17) by the same argument that led to (15). But, because of (16), the relations (18a) and

$$\lim_{l\to\infty} I_1(\mathscr{Q}_j^{(l)} \,\|\, \mathscr{P}) = \gamma \qquad (18b)$$

hold; hence there is equality in (17). Since (17) is derived from Jensen's inequality, it follows that equality can hold only if $q_{jl} = \lambda p_l$ ($l = 0, 1, \ldots, N$). Since $\sum_{l=0}^{N} q_{jl} = \sum_{l=0}^{N} p_l = 1$, we must have $\lambda = 1$; consequently, $\mathscr{Q}_j = \mathscr{P}$. But then $I_1(\mathscr{Q}_j \,\|\, \mathscr{P}) = 0$, hence by (18b) $\gamma = 0$. Theorem 2 is herewith proved.
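The mechanism of this proof — the divergence of the $n$-step distribution from the stationary distribution decreases monotonically to 0 — can be seen numerically. The two-state chain below, with all $p_{jk} > 0$ (so the hypothesis of Theorem 2 holds with $s = 1$), is an assumed example for this sketch.

```python
import math

def matmul(A, B):
    # plain 2-D list matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kl2(q, p):
    # I_1(Q || P) in bits
    return sum(qk * math.log2(qk / pk) for qk, pk in zip(q, p) if qk > 0)

P = [[0.9, 0.1], [0.4, 0.6]]   # transition matrix, all entries positive
pi = [0.8, 0.2]                # stationary: 0.8*0.9 + 0.2*0.4 = 0.8

Pn, divs = P, []
for n in range(1, 30):
    divs.append(kl2(Pn[0], pi))  # divergence of the n-step row from pi
    Pn = matmul(Pn, P)
print(divs[0], divs[-1])
```

The sequence `divs` is nonincreasing and tends to 0, which is exactly the content of the theorem for this chain.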
The idea of proving theorems on limit distributions by means of information theory is due to Yu. V. Linnik. He proved in this way the central limit theorem under Lindeberg conditions by using Shannon's entropy: he proved the convergence of the distribution with the density function $p_n(x)$ to the normal distribution by showing that

$$\lim_{n\to\infty}\int_{-\infty}^{+\infty} p_n(x)\log_2\frac{p_n(x)}{\varphi(x)}\,dx = 0, \qquad (19)$$

where

$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}.$$
To say that the distribution with the density function $p_n(x)$ tends to the normal distribution means, therefore, that the entropy of this distribution tends to the entropy of the normal distribution. But we can prove that for a density function $p(x)$ such that

$$\int_{-\infty}^{+\infty} x^2 p(x)\,dx = 1 \qquad (23)$$

the inequality

$$\int_{-\infty}^{+\infty} p(x)\log_2\frac{1}{p(x)}\,dx \le \int_{-\infty}^{+\infty} \varphi(x)\log_2\frac{1}{\varphi(x)}\,dx \qquad (24)$$

holds, since because of (21) and (23), (24) is equivalent to the well-known inequality

$$\int_{-\infty}^{+\infty} p(x)\log_2\frac{p(x)}{\varphi(x)}\,dx \ge 0. \qquad (25)$$
The statement of the central limit theorem may therefore be expressed as follows: the entropy of the standardized sum of independent random variables tends, as the number of the variables tends to infinity, to the maximum of the entropy of all random variables with unit variance. Thus the central limit theorem of probability theory is closely connected with the second law of thermodynamics.¹

IX, § 10] EXTENSION OF INFORMATION THEORY 603
$$P(A \mid B) = \frac{P(AB)}{P(B)}. \qquad (1)$$
if this limit exists and is finite. If it does not exist, the information in question will be characterized by the following two quantities:

$$\liminf_{N\to\infty} I(\mathscr{P} \mid \mathscr{Q}_N) = \underline{I}(\mathscr{P}) \quad \text{and} \quad \limsup_{N\to\infty} I(\mathscr{P} \mid \mathscr{Q}_N) = \overline{I}(\mathscr{P}).$$
$$\varepsilon_N(n) = \begin{cases} 1 & \text{if } n \text{ is odd,} \\ 0 & \text{if } n \text{ is even.} \end{cases}$$

Then

$$I_1(\varepsilon_N(n) \mid \mathscr{Q}_N) = \frac{\left[\frac{N}{2}\right]}{N}\log_2\frac{N}{\left[\frac{N}{2}\right]} + \frac{N - \left[\frac{N}{2}\right]}{N}\log_2\frac{N}{N - \left[\frac{N}{2}\right]},$$

and by (3)

$$I_1(\varepsilon(n)) = 1. \qquad (5)$$

It follows in the same way that

$$P(\varepsilon(n) = r \mid \mathscr{Q}_N) = \frac{N - 2r + 1}{N},$$

and if

$$\lim_{N\to\infty}\frac{2^{[\log_2 N]}}{N} = \gamma, \qquad \frac{1}{2} < \gamma \le 1,$$

then we have
§ 11. Exercises
2. a) Let some integer $n$ ($1 \le n \le 2000$) be divided by 6, 10, 22, and 35 and let the remainders be given, while we assume that the remainders are compatible. How much information is thus given concerning the number $n$?
Hint. The information is equal to $\log_2 2000 \approx 10.96$ (i.e. we get full information on $n$). In fact the remainders mentioned determine $n$ modulo the least common multiple of 6, 10, 22 and 35, which is equal to $2310 > 2000$; hence $n$ is uniquely determined.
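The claim in the hint is easy to verify mechanically: the least common multiple of the moduli exceeds 2000, so no two numbers in the range share all four remainders. The helper `lcm` is a small assumed utility of this sketch.

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

moduli = [6, 10, 22, 35]
M = 1
for m in moduli:
    M = lcm(M, m)
print(M)  # 2310 > 2000, so the four remainders determine n uniquely

# check directly: no two distinct n in 1..2000 share all four remainders
seen = {}
for n in range(1, 2001):
    key = tuple(n % m for m in moduli)
    assert key not in seen
    seen[key] = n
```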
b) Let the number $n$ be expressed in the system of base $(-2)$, i.e. put

$$n = \sum_{k=0}^{\infty} b_k(-2)^k,$$

where $b_k$ can take on the values 0 or 1 only. How much information on $n$ is contained in $b_k$?

Hint. Put $N = \sum_{i=0}^{r} 2^{2i+1}$; then
r 2r+1—N — 1
i y - п К Ж -Z )
where $p$ runs through all primes. Let $N_k(x)$ denote the number of integers $n$ smaller than $x$ with $V(n) - U(n) = k$; then

$$\lim_{x\to\infty}\frac{N_k(x)}{x} = d_k.$$

¹ Cf. A. Rényi [16].
IX, § 11] EXERCISES 607
3. a) Expand $x$ ($0 < x < 1$) into the $q$-adic expansion

$$x = \sum_{n=1}^{\infty}\frac{\varepsilon_n(x)}{q^n},$$

where $q$ is a positive integer $\ge 2$, and $\varepsilon_n(x)$ can take on the values $0, 1, \ldots, q - 1$ ($n = 1, 2, \ldots$). How much information with respect to $x$ is contained in the value of $\varepsilon_n(x)$?

b) Expand $x$ ($0 < x < 1$) into the Cantor series

$$x = \sum_{n=1}^{\infty}\frac{\varepsilon_n(x)}{q_1 q_2 \cdots q_n},$$

where $q_1, q_2, \ldots, q_n, \ldots$ are positive integers $\ge 2$, and $\varepsilon_n(x)$ can take on the values $0, 1, \ldots, q_n - 1$. How much information with respect to $x$ is contained in the value of $\varepsilon_n(x)$?
c) Expand $x$ ($0 < x < 1$) into a regular continued fraction

$$x = \cfrac{1}{a_1(x) + \cfrac{1}{a_2(x) + \cdots}},$$

where each $a_n(x)$ can be an arbitrary positive integer. How much information about $x$ is contained in the value of $a_n(x)$?

Hint. Let $m_n(k)$ denote the measure of the set of those $x$ for which $a_n(x) = k$. As is known¹, $\lim_{n\to\infty} m_n(k) = \log_2\left(1 + \frac{1}{k(k+2)}\right)$. Let it be remarked that, contrary to Exercises 3.a) and 3.b), the random variables $a_n(x)$ in this example are not independent; the total information contained in a sequence of several digits $a_n(x)$ is not equal to the sum of the informations contained in the individual digits.
4. Let a differentiable function $f(x)$ be defined in $[0, A]$ and suppose $f(0) = 0$ and $|f'(x)| \le B$. Find an upper bound for the information necessary in order to determine the value of $f(x)$ at every point of $[0, A]$ with an error not exceeding $\varepsilon > 0$.

Hint. Put $x_k = \frac{k\varepsilon}{B}$ $\left(k = 0, 1, \ldots, \left[\frac{AB}{\varepsilon}\right]\right)$ and $x_{\left[\frac{AB}{\varepsilon}\right]+1} = A$. Let the curve of $f(x)$ be approximated by a polygonal line $y = \varphi(x)$ which can have for its slope in each of the intervals $(x_k, x_{k+1})$ either $+B$ or $-B$. If $\varphi(x)$ is already defined for $0 \le x \le x_k$, then let the slope in $(x_k, x_{k+1})$ be so chosen that $|f(x_{k+1}) - \varphi(x_{k+1})| \le \varepsilon$. Obviously, this is always possible. Since $f(x) - \varphi(x)$ is monotone in every interval $(x_k, x_{k+1})$, the inequality $|f(x) - \varphi(x)| \le \varepsilon$ holds in the open intervals $(x_k, x_{k+1})$ ($k = 0, 1, \ldots$) as well. Clearly, the number of possible functions $\varphi(x)$ is equal to $2^{\left[\frac{AB}{\varepsilon}\right]+1}$. In order to determine $f(x)$ up to an error $\varepsilon$ there suffice therefore $\frac{AB}{\varepsilon} + 1$ bits of information.
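The greedy construction of the hint can be sketched in a few lines: at each node, move the polygonal line up or down by $\varepsilon$ (slope $\pm B$ over a step of $\varepsilon/B$), always toward the function value, so one bit per interval suffices. The function $\sin x$ and the parameter values are assumptions of this illustration.

```python
import math

def polygonal_approx(f, A, B, eps):
    # Greedy choice of slopes +-B at nodes x_k = k*eps/B: one bit per interval.
    n = math.ceil(A * B / eps)          # number of intervals
    step = eps / B
    phi, bits = [f(0.0)], []
    for k in range(n):
        target = f(min((k + 1) * step, A))
        up, down = phi[-1] + eps, phi[-1] - eps
        if abs(up - target) <= abs(down - target):
            phi.append(up); bits.append(1)
        else:
            phi.append(down); bits.append(0)
    return phi, bits

f = lambda x: math.sin(x)               # |f'| <= 1 = B and f(0) = 0, as required
A, B, eps = 3.0, 1.0, 0.05
phi, bits = polygonal_approx(f, A, B, eps)
# error at the nodes stays within eps, as the induction in the hint shows
err = max(abs(v - f(min(k * eps / B, A))) for k, v in enumerate(phi))
print(len(bits), err)
```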
5. We have $n$ apparently identical coins. One of them is false and heavier than the others. We possess a balance with two scales but without weights. How many weighings are necessary to find the false coin?

Hint. The amount of information needed is equal to $\log_2 n$. Only weighings with an equal number of coins in both scales are worth performing. Three cases are possible: equilibrium, right scale heavier, and left scale heavier. One weighing thus furnishes at most $\log_2 3$ bits of information; hence at least

$$\left\{\frac{\log_2 n}{\log_2 3}\right\} \text{ weighings}$$

are needed ($\{x\}$ denotes the smallest integer greater than or equal to $x$). It is easy to see that this number of weighings is sufficient. In fact, let $k$ be defined by $3^{k-1} < n \le 3^k$. At the first weighing we put $\lceil n/3\rceil$ coins in each of the scales. We know then to which of the three sets, each containing at most $3^{k-1}$ coins, the false coin belongs. Proceeding in this manner, the false coin will be found after at most $k$ weighings.
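The ternary strategy of the hint can be simulated directly; the weighing is modelled by testing which third contains the heavy coin. The value $n = 100$ (so $3^4 < 100 \le 3^5$, i.e. 5 weighings) is an assumed example.

```python
import math

def find_false_coin(coins, heavy):
    # coins: candidate indices; heavy: index of the heavier coin
    # returns (found_coin, number_of_weighings)
    weighings = 0
    while len(coins) > 1:
        m = math.ceil(len(coins) / 3)
        left, right, rest = coins[:m], coins[m:2 * m], coins[2 * m:]
        weighings += 1                  # compare left scale against right scale
        if heavy in left:
            coins = left
        elif heavy in right:
            coins = right
        else:
            coins = rest
    return coins[0], weighings

n = 100
worst = max(find_false_coin(list(range(n)), h)[1] for h in range(n))
print(worst)  # 5 = {log2 100 / log2 3}
```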
6. The “Bar-Kochba” game is played as follows: Player A thinks of any object, player B asks questions which can be answered by “yes” or “no”, and has to guess the object A thought of from the answers. Naturally, A has to answer all questions honestly.

a) The players agree that A thinks of some nonnegative integer $< N$. What is the minimal number of questions permitting B to find out the considered integer? Give an “optimal” sequence of questions.

Hint. Obviously, at least $\{\log_2 N\}$ questions are needed, since each answer provides at most one bit of information and we need $\log_2 N$ bits. An optimal system of questions is to ask whether in the binary representation of the number $x$ the first, the second, ..., digit is 0. The aim is achieved by $\{\log_2 N\}$ questions, since the binary representation of an integer is unique.
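The optimal system of the hint — one question per binary digit — can be played out in code. `answers_for` stands for player A's truthful yes/no answers; the name and the example values are assumptions of this sketch.

```python
import math

def guess(N, answers_for):
    # Ask: "is the k-th binary digit of x equal to 1?" for k = 0..s-1,
    # where s = {log2 N} questions suffice because binary expansion is unique.
    s = max(1, math.ceil(math.log2(N)))
    x = 0
    for k in range(s):
        if answers_for(k):
            x |= 1 << k
    return x, s

N = 1000                                 # {log2 1000} = 10 questions
secret = 618
x, questions = guess(N, lambda k: (secret >> k) & 1 == 1)
print(x, questions)
```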
b) Suppose $N = 2^s$. How many optimal systems of questions exist? That is: how many systems of exactly $s$ questions determine $x$ whatever it may be?

Hint. The number of the possible sequences of answers to $s$ questions is evidently $2^s$. There corresponds thus to every integer $x$ ($x = 0, 1, \ldots, 2^s - 1$) in a one-to-one manner a sequence of $s$ yes-or-no answers. Every question can be put in the following form: Does $x$ belong to a subset $A$ of the sequence $0, 1, \ldots, 2^s - 1$? Thus to an optimal sequence of $s$ questions there correspond $s$ subsets of the set $M = \{0, 1, \ldots, 2^s - 1\}$; let these be denoted by $A_1, A_2, \ldots, A_s$. According to what has been said, $A_1$ has to contain exactly $2^{s-1}$ elements. Let $\bar{A}$ always denote the set complementary to $A$ with respect to $M$. Then $A_1 A_2$ and $\bar{A}_1 A_2$ both have to contain $2^{s-2}$ elements; $A_1 A_2 A_3$, $\bar{A}_1 A_2 A_3$, $A_1 \bar{A}_2 A_3$, and $\bar{A}_1 \bar{A}_2 A_3$ have to contain $2^{s-3}$ elements, and so on.
$$\binom{2^s}{2^{s-1}}\binom{2^{s-1}}{2^{s-2}}^2 \cdots \binom{2}{1}^{2^{s-1}} = (2^s)!\,.$$

If we regard the systems of questions which differ only in the order of the questions as identical, then the number looked for is $\dfrac{(2^s)!}{s!}$.
Remark. In the Bar-Kochba game the questions are, in general, formulated while taking into account the answers already obtained. (In the language of set theory: if the first answers have shown that the object belongs to a subset $A$ of the set $M$ of all possible objects, then the next question is whether it belongs to some subset $B$ of the set $A$.) It follows from what has been said that the questioner suffers no disadvantage by being obliged to put his questions simultaneously.
7. Suppose that in the Bar-Kochba game the players agree that the objects allowed to be thought of are the $n$ elements of a given set $M$. Suppose that the questions are asked at random, or in other words, all possible questions have the same probability, independently of the answers already obtained.
a) What is the probability that the questioner finds out the object by k questions?
b) Find the limit of the probability obtained in a) as $n$ and $k$ both tend to $+\infty$ such that

$$\lim_{n\to\infty}(k - \log_2 n) = c.$$
Hints. We may suppose that the elements of the set $M$ are the numbers $1, 2, \ldots, n$. Each possible question is equivalent to asking whether the number thought of belongs to a certain subset of $M$. The number of possible questions is thus equal to the number of subsets of $M$, i.e. to $2^n$. (For the sake of simplicity we include the two trivial questions corresponding to the whole set and the empty set.) Let $A_1, A_2, \ldots, A_k$ be the sets chosen at random by the questioner: i.e. he asks whether the number thought of belongs to these sets. By assumption, each of the sets $A_i$ is, with probability $2^{-n}$, equal to an arbitrary subset of $M$. Put
Remarks

1. The number of questions needed (if $n$ is sufficiently large) in order to find the number with a probability $> 0.99$ by means of this random strategy exceeds only by 7 the number of questions needed in the case the optimal strategy is employed. In fact, $\exp(-2^{-7}) > 0.99$. This result is surprising, since one would be inclined to guess that the random strategy is much less advantageous than the optimal strategy.
2. When the questions are asked at random it may happen that the same question
occurs twice. But the corresponding probability is so small, if n is large, that it is
not worth while to exclude this possibility, though of course this would slightly increase
the chances of success.
8. Certain players play the Bar-Kochba game in the following manner: There are $r + 1$ players; $r$ players think of some object, the last player asks them questions. The same question is addressed to every player, who answers by “yes” or “no”, according to what is true concerning the object he had thought of.

a) Each of the players thinks of one of the numbers $1, 2, \ldots, n$ ($n > r$), but each of a different number. The questions are asked at random, as in the preceding exercise. What is the probability that the questioner finds all numbers by $k$ questions?

b) $n = r$ and the players agree to think each of a different number of the sequence $1, 2, \ldots, n$; hence it is a permutation of numbers which is to be found. What is the probability that the questioner finds the permutation by $k$ questions? Calculate approximately this probability for $k = 2\log_2 n + c$.

Hints. a) We are led to the following urn problem: we put $n$ balls into $2^k$ urns, independently of each other, each ball having the same probability $2^{-k}$ to get into any one of the urns. Among the $n$ balls there are $r$ red balls, the others are white. What is the probability that all the red balls get into different urns? This probability is

$$P_{n,k,r} = \prod_{j=1}^{r-1}\left(1 - \frac{j}{2^k}\right);$$

thus
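The product formula for the urn problem can be checked against a direct simulation: place the red balls uniformly at random into $2^k$ urns and count how often they all land in different urns. The values $r = 5$, $k = 8$ and the seed are assumptions of this sketch.

```python
import math
import random

def p_distinct(r, k):
    # exact probability that r balls, dropped uniformly into 2^k urns,
    # all land in different urns: product over j of (1 - j/2^k)
    p = 1.0
    for j in range(1, r):
        p *= 1.0 - j / 2.0 ** k
    return p

random.seed(1)
r, k, trials = 5, 8, 20000
hits = 0
for _ in range(trials):
    urns = [random.randrange(2 ** k) for _ in range(r)]
    hits += len(set(urns)) == r
print(hits / trials, p_distinct(r, k))
```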
Remark. This problem is due to K. L. Chung, but his proof differs from ours. If instead of the complete additivity only simple additivity is required, i.e. that $f(nm) = f(n) + f(m)$ if $(n, m) = 1$, then the condition (B) does not imply $f(n) = c\log n$. (The last step of the proof cannot be carried out in this case.)
10. Let $\mathscr{P} = \{p_k\}$ be any distribution on the positive integers with

$$\sum_{k=1}^{\infty} k p_k = \lambda > 1.$$

Show that the entropy

$$\sum_{k=1}^{\infty} p_k \log_2\frac{1}{p_k} = I_1(\mathscr{P})$$

takes on its maximal value if

$$p_k = \frac{1}{\lambda}\left(1 - \frac{1}{\lambda}\right)^{k-1} \qquad (k = 1, 2, \ldots).$$
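A quick numerical sanity check of the maximizing (geometric) distribution: among distributions with mean $\lambda = 3$, the geometric one has larger entropy than, say, the uniform distribution on $\{1, \ldots, 5\}$ with the same mean. The competitor and the truncation bound are choices of this sketch.

```python
import math

def entropy2(p):
    # Shannon entropy in bits
    return -sum(x * math.log2(x) for x in p if x > 0)

lam = 3.0
K = 200  # truncation point; the neglected tail mass is astronomically small
geo = [(1 / lam) * (1 - 1 / lam) ** (k - 1) for k in range(1, K)]
mean = sum(k * p for k, p in enumerate(geo, start=1))

unif = [0.2] * 5                     # uniform on {1,...,5}, also mean 3
print(mean, entropy2(geo), entropy2(unif))
```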
11. Let $\mathscr{P}$ and $\mathscr{Q}$ be two distributions, absolutely continuous with respect to Lebesgue measure, with density functions $p(x)$ and $q(x)$, and further let $\mathscr{Q}$ be absolutely continuous with respect to $\mathscr{P}$. It follows from Theorem 2 of § 8 that the gain of information is nonnegative in this case too, i.e. we have the inequalities

$$\int_{-\infty}^{+\infty} q(x)\log_2\frac{q(x)}{p(x)}\,dx \ge 0$$

and

$$\frac{1}{\alpha - 1}\log_2\int_{-\infty}^{+\infty}\frac{q(x)^{\alpha}}{p(x)^{\alpha-1}}\,dx \ge 0 \quad \text{for } \alpha > 0,\ \alpha \ne 1.$$

Prove these inequalities directly (without passing to the limit) by Jensen's inequality generalized for functions, i.e. by inequality (15) of § 8.
12. a) Let $\xi$ be a positive random variable having an absolutely continuous distribution, with $E(\xi) = \lambda > 0$. Show that the entropy (of order 1) of $\xi$ is maximal if the distribution of $\xi$ is exponential.

Hint. Let $f(x)$ be a density function in $(0, +\infty)$ with

$$\int_0^{\infty} x f(x)\,dx = \lambda$$

and put

$$g(x) = \frac{1}{\lambda}\exp\left(-\frac{x}{\lambda}\right).$$

We have then
13. If the random variables $\eta_1, \ldots, \eta_n$ are obtained from $\xi_1, \ldots, \xi_n$ by a linear transformation with matrix $C$, then

$$I_{\alpha,n}((\eta_1, \eta_2, \ldots, \eta_n)) = I_{\alpha,n}((\xi_1, \xi_2, \ldots, \xi_n)) + \log_2\bigl|\,\|C\|\,\bigr|,$$

where $\|C\|$ denotes the determinant of $C$.
14. Let $\xi_1, \ldots, \xi_n$ be independent random variables with absolutely continuous distributions. We have

$$I_{\alpha,n}((\xi_1, \ldots, \xi_n)) = I_{\alpha,1}(\xi_1) + I_{\alpha,1}(\xi_2) + \cdots + I_{\alpha,1}(\xi_n).$$

15. It follows that

$$I(\xi, \eta) = I_1(\mathscr{R} \,\|\, \mathscr{P} * \mathscr{Q}),$$

because of Formula (38) of § 8.
16. In the following exercises we always use natural (Napier's) logarithms $\ln$.

a) Calculate the entropy (of dimension 1 and of order 1) of the normal distribution; i.e. show that

$$\int_{-\infty}^{+\infty}\varphi(x)\ln\frac{1}{\varphi(x)}\,dx = \ln\left(\sigma\sqrt{2\pi e}\right),$$

where

$$\varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x - m)^2}{2\sigma^2}\right).$$
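The value $\ln(\sigma\sqrt{2\pi e})$ can be confirmed by direct numerical integration of $\int \varphi \ln(1/\varphi)\,dx$. The step size, the integration range, and the choice $\sigma = 2$ are assumptions of this sketch.

```python
import math

def normal_entropy_numeric(sigma, h=1e-3, span=10.0):
    # midpoint-rule approximation of the entropy  ∫ φ ln(1/φ) dx  (in nats)
    total, x = 0.0, -span * sigma
    while x < span * sigma:
        xm = x + h / 2
        phi = math.exp(-xm * xm / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma)
        if phi > 0:
            total -= phi * math.log(phi) * h
        x += h
    return total

sigma = 2.0
closed = math.log(sigma * math.sqrt(2 * math.pi * math.e))
print(normal_entropy_numeric(sigma), closed)
```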
b) Calculate the entropy (of dimension $r$ and of order 1) of the $r$-dimensional normal distribution.

Hint. Let the $r$-dimensional density function of the random variables $\xi_1, \ldots, \xi_r$, after an orthogonal transformation to principal axes, be

$$f(y_1, \ldots, y_r) = \frac{1}{(2\pi)^{\frac{r}{2}}\sigma_1\sigma_2\cdots\sigma_r}\exp\left(-\frac{1}{2}\sum_{j=1}^{r}\frac{y_j^2}{\sigma_j^2}\right),$$

with $\sigma_1\sigma_2\cdots\sigma_r = \|B\|^{-\frac{1}{2}}$. According to Exercise 13, the entropy is invariant under such a transformation, since the absolute value of the determinant of an orthogonal transformation is equal to 1. Hence, according to Exercise 14,

$$I(\xi, \eta) = \ln\sqrt{\frac{AC}{AC - B^2}}.$$

If $B = 0$, i.e. if $\xi$ and $\eta$ are independent, we find of course that $I(\xi, \eta) = 0$.
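With unit variances the quadratic form has $AC = 1$ and $B = \rho$, so the formula reduces to $I(\xi,\eta) = -\frac{1}{2}\ln(1 - \rho^2)$; this can be checked by integrating $\iint f \ln\bigl(f/(f_1 f_2)\bigr)$ on a grid. The grid spacing, the cutoff, and $\rho = 0.6$ are assumptions of this sketch.

```python
import math

def gaussian_mi_numeric(rho, h=0.05, span=6.0):
    # midpoint-rule value of I(xi, eta) = ∫∫ f ln( f / (f1 f2) ) dx dy
    # for a bivariate normal with unit variances and correlation rho
    c = 1.0 / (2 * math.pi * math.sqrt(1 - rho * rho))
    total = 0.0
    n = int(2 * span / h)
    for i in range(n):
        x = -span + (i + 0.5) * h
        for j in range(n):
            y = -span + (j + 0.5) * h
            f = c * math.exp(-(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho * rho)))
            f1f2 = math.exp(-(x * x + y * y) / 2) / (2 * math.pi)
            total += f * math.log(f / f1f2) * h * h
    return total

rho = 0.6
closed = -0.5 * math.log(1 - rho * rho)   # = ln sqrt(AC/(AC - B^2)) here
print(gaussian_mi_numeric(rho), closed)
```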
17. Let $\xi$ be a random variable with absolutely continuous distribution and density function $f(x)$. Let the standard deviation $\sigma$ of $\xi$ be finite and positive. Show that the entropy of order $\alpha$ is maximal for the density

$$f_\alpha(x) = \begin{cases} \dfrac{\Gamma\left(\dfrac{1}{\alpha-1} + \dfrac{3}{2}\right)}{c\,\sqrt{\pi}\;\Gamma\left(\dfrac{1}{\alpha-1} + 1\right)}\left(1 - \dfrac{x^2}{c^2}\right)^{\frac{1}{\alpha-1}} & \text{for } |x| < c, \\[2mm] 0 & \text{otherwise,} \end{cases}$$

where we have put $c = \sigma\sqrt{\dfrac{3\alpha - 1}{\alpha - 1}}$.

Hint. Put

$$m = E(\xi), \qquad \sigma^2 = \int_{-\infty}^{+\infty}(x - m)^2 f(x)\,dx, \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x - m)^2}{2\sigma^2}\right).$$

We have then a relation between the three integrals which implies a); b) can be proved in the same fashion. Let it be noticed that $f_\alpha(x)$ tends to $\varphi(x)$ as $\alpha \to 1$.
18. Let $f(x)$ and $f_n(x)$ be density functions such that $f_n(x) = 0$ ($n = 1, 2, \ldots$) for every value of $x$ for which $f(x) = 0$; suppose further that all integrals

$$\int_{-\infty}^{+\infty}\frac{f_n(x)^2}{f(x)}\,dx \qquad (n = 1, 2, \ldots)$$

are finite. Show that

$$\sup_E\left|\int_E f_n(x)\,dx - \int_E f(x)\,dx\right| \le \left(\int_{-\infty}^{+\infty}\frac{(f_n(x) - f(x))^2}{f(x)}\,dx\right)^{\frac{1}{2}},$$

where $E$ runs through all measurable subsets of the set of real numbers; and clearly

$$\int_{-\infty}^{+\infty}\frac{(f_n(x) - f(x))^2}{f(x)}\,dx = \int_{-\infty}^{+\infty}\frac{f_n(x)^2}{f(x)}\,dx - 1.$$
TABLES
Table 1

n n! log n! n n! log n!
Table 2

Binomial coefficients $\binom{n}{k}$ for $n \le 30$¹

n\k 0 1 2 3 4 5 6 7 8
2 1 2 1
3 1 3 - 3 1
4 1 4 6 4 1
5 1 5 10 10 5 1
6 1 6 15 20 15 6 1
7 1 7 21 35 35 21 7 1
8 1 8 28 56 70 56 28 8 1
9 1 9 36 84 126 126 84 36 9
10 1 10 45 120 210 252 210 120 45
11 1 11 55 165 330 462 462 330 165
12 1 12 66 220 495 792 924 792 495
13 1 13 78 286 715 1287 1716 1716 1287
14 1 14 91 364 1001 2002 3003 3432 3003
15 1 15 105 455 1365 3003 5005 6435 6435
16 1 16 120 560 1820 4368 8008 11440 12870
17 1 17 136 680 2380 6188 12376 19448 24310
18 1 18 153 816 3060 8568 18564 31824 43758
19 1 19 171 969 3876 11628 27132 50388 75582
20 1 20 190 1140 4845 15504 38760 77520 125970
21 1 21 210 1330 5985 20349 54264 116280 203490
22 1 22 231 1540 7315 26334 74613 170544 319770
23 1 23 253 1771 8855 33649 100947 245157 490314
24 1 24 276 2024 10626 42504 134596 346104 735471
25 1 25 300 2300 12650 53130 177100 480700 1081575
26 1 26 325 2600 14950 65780 230230 657800 1562275
27 1 27 351 2925 17550 80730 296010 888030 2220075
28 1 28 378 3276 20475 98280 376740 1184040 3108105
29 1 29 406 3654 23751 118755 475020 1560780 4292145
30 1 30 435 4060 27405 142506 593775 2035800 5852925
¹ For $n > 15$ values of $\binom{n}{k}$ are given for $k \le \frac{n}{2}$ only; the further values can be obtained from the symmetry relation $\binom{n}{k} = \binom{n}{n-k}$.
Table 2 (continued)

n\k 9 10 11 12 13 14 15
9 1
10 10 1
11 55 11 1
12 220 66 12 1
13 715 286 78 13 1
14 2002 1001 364 91 14 1
15 5005 3003 1365 455 105 15 1
16 11440 8008 4368 1820 560 120 16
17 24310 19448 12376 6188 2380 680 136
18 48620 43758 31824 18564 8568 3060 816
19 92378 92378 75582 50388 27132 11628 3876
20 167960 184756 167960 125970 77520 38760 15504
21 293930 352716 352716 293930 203490 116280 54264
22 497420 646646 705432 646646 497420 319770 170544
23 817190 1144066 1352078 1352078 1144066 817190 490314
24 1307504 1961256 2496144 2704156 2496144 1961256 1307504
25 2042975 3268760 4457400 5200300 5200300 4457400 3268760
26 3124550 5311735 7726160 9657700 10400600 9657700 7726160
27 4686825 8436285 13037895 17383860 20058300 20058300 17383860
28 6906900 13123110 21474180 30421755 37442160 40116600 37442160
29 10015005 20030010 34597290 51895935 67863915 77558760 77558760
30 14307150 30045015 54627300 86493225 119759850 145422675 155117520
Table 3

k\λ 0.1 0.2 0.3 0.4 0.5

k\λ 1 2 3 4 5

Table 3 (continued)

k\λ 6 7 8 9 10
I
0 0.00247 0.00091 0.00033 0.00012 0.00004
1 0.01487 0.00638 0.00268 0.00111 0.00045
2 0.04461 0.02234 0.01073 0.00499 0.00227
3 0.08923 0.05212 0.02862 0.01499 0.00756
4 0.13385 0.09122 0.05725 0.03373 0.01891
5 0.16062 0.12772 0.09160 0.06072 0.03783
6 0.16062 0.14900 0.12214 0.09109 0.06305
7 0.13768 0.14900 0.13959 0.11712 0.09007
8 0.10326 0.13038 0.13959 0.13176 0.11260
9 0.06883 0.10140 0.12408 0.13176 0.12511
10 0.04130 0.07098 0.09926 0.11858 0.12511
11 0.02252 0.04517 0.07219 0.09702 0.11374
12 0.01126 0.02635 0.04812 0.07276 0.09478
13 0.00519 0.01418 0.02961 0.05037 0.07290
14 0.00222 0.00709 0.01692 0.03238 0.05207
15 0.00089 0.00331 0.00902 0.01943 0.03471
16 0.00033 0.00144 0.00451 0.01093 0.02169
17 0.00011 0.00059 0.00212 0.00578 0.01276
18 0.00003 0.00023 0.00094 0.00289 0.00709
19 0.00001 0.00008 0.00039 0.00137 0.00373
20 0.00003 0.00015 0.00061 0.00186
21 0.00006 0.00026 0.00088
22 0.00002 0.00010 0.00040
23 0.00004 0.00017
24 0.00001 0.00007
25 0.00002
26 0.00001
Table 3 (continued)

k\λ 11 12 13 14 15
0 0.00001
1 0.00018 0.00007 0.00002 0.00001
2 0.00101 0.00044 0.00019 0.00008 0.00003
3 0.00370 0.00177 0.00082 0.00038 0.00017
4 0.01018 0.00530 0.00269 0.00133 0.00064
5 0.02241 0.01274 0.00699 0.00373 0.00193
6 0.04109 0.02548 0.01515 0.00869 0.00483
7 0.06457 0.04368 0.02814 0.01739 0.01037
8 0.08879 0.06552 0.04573 0.03043 0.01944
9 0.10853 0.08736 0.06605 0.04734 0.03240
10 0.11938 0.10484 0.08587 0.06628 0.04861
11 0.11938 0.11437 0.10148 0.08435 0.06628
12 0.10943 0.11437 0.10994 0.09841 0.08285
13 0.09259 0.10557 0.10994 0.10599 0.09560
14 0.07275 0.09048 0.10209 0.10599 0.10244
15 0.05335 0.07239 0.08847 0.09892 0.10244
16 0.03668 0.05429 0.07188 0.08655 0.09603
17 0.02373 0.03832 0.05497 0.07128 0.08473
18 0.01450 0.02555 0.03970 0.05544 0.07061
19 0.00839 0.01613 0.02716 0.04085 0.05574
20 0.00461 0.00968 0.01765 0.02859 0.04181
21 0.00241 0.00553 0.01093 0.01906 0.02986
22 0.00121 0.00301 0.00645 0.01213 0.02036
23 0.00057 0.00157 0.00365 0.00738 0.01328
24 0.00026 0.00078 0.00197 0.00430 0.00830
25 0.00011 0.00037 0.00102 0.00241 0.00498
26 0.00004 0.00017 0.00051 0.00129 0.00287
27 0.00002 0.00007 0.00024 0.00067 0.00159
28 0.00003 0.00011 0.00033 0.00085
29 0.00001 0.00005 0.00016 0.00044
30 0.00002 0.00007 0.00022
31 0.00003 0.00010
32 0.00001 0.00005
33 0.00002
34 0.00001
Table 3 (continued)

k\λ 16 17 18 19 20
0
1
2 0.00001
3 0.00007 0.00003
4 0.00030 0.00014 0.00006 0.00003 0.00001
5 0.00098 0.00049 0.00024 0.00011 0.00005
6 0.00262 0.00138 0.00071 0.00036 0.00018
7 0.00599 0.00337 0.00185 0.00099 0.00052
8 0.01198 0.00716 0.00416 0.00236 0.00130
9 0.02131 0.01352 0.00832 0.00498 0.00290
10 0.03409 0.02300 0.01498 0.00946 0.00581
11 0.04959 0.03554 0.02452 0.01635 0.01057
12 0.06612 0.05035 0.03678 0.02588 0.01762
13 0.08138 0.06584 0.05092 0.03783 0.02711
14 0.09301 0.07996 0.06548 0.05135 0.03874
15 0.09921 0.09062 0.07857 0.06504 0.05165
16 0.09921 0.09628 0.08839 0.07724 0.06456
17 0.09338 0.09628 0.09359 0.08632 0.07595
18 0.08300 0.09093 0.09359 0.09112 0.08439
19 0.06989 0.08136 0.08867 0.09112 0.08883
20 0.05592 0.06915 0.07980 0.08656 0.08883
21 0.04260 0.05598 0.06840 0.07832 0.08460
22 0.03098 0.04326 0.05596 0.06764 0.07691
23 0.02155 0.03197 0.04380 0.05587 0.06688
24 0.01437 0.02265 0.03285 0.04423 0.05573
25 0.00919 0.01540 0.02365 0.03362 0.04458
26 0.00566 0.01007 0.01637 0.02456 0.03429
27 0.00335 0.00634 0.01091 0.01728 0.02540
28 0.00191 0.00385 0.00701 0.01173 0.01814
29 0.00105 0.00225 0.00435 0.00768 0.01251
30 0.00056 0.00127 0.00261 0.00486 0.00834
31 0.00029 0.00070 0.00151 0.00298 0.00538
32 0.00014 0.00037 0.00085 0.00177 0.00336
33 0.00007 0.00019 0.00046 0.00102 0.00203
34 0.00003 0.00009 0.00024 0.00057 0.00119
35 0.00001 0.00004 0.00012 0.00030 0.00068
36 0.00002 0.00006 0.00016 0.00038
37 0.00001 0.00003 0.00008 0.00020
38 0.00001 0.00004 0.00010
39 0.00002 0.00005
40 0.00002
41 0.00001
Table 4 (continued)

n\λ 5.0 5.5 6.0 6.5

1 0.99326 0.99591 0.99752 0.99850
2 95957 97345 98265 98872
3 87535 91162 93804 95696
4 73497 79830 84880 88816
5 55951 64248 71494 77633
6 38404 47108 55433 63096
7 23782 31396 39370 47347
8 13337 19051 25603 32724
9 06809 10564 15276 20843
10 03183 05378 08392 12262
11 01369 02525 04262 06684
12 00545 01099 02009 03389
13 00202 00445 00883 01603
14 00070 00169 00363 00710
15 00023 00060 00140 00296
16 00007 00020 00051 00116
17 00002 00006 00017 00044
18 00001 00002 00006 00015
19 00001 00002 00005
20 00001 00001
Table 5

The function $\varphi(x) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}$

x φ(x) x φ(x) x φ(x) x φ(x)
0.00 0.3989
0.01 0.3989 0.41 0.3668 0.81 0.2874 1.21 0.1919
0.02 0.3989 0.42 0.3653 0.82 0.2850 1.22 0.1895
0.03 0.3988 0.43 0.3637 0.83 0.2827 1.23 0.1872
0.04 0.3986 0.44 0.3621 0.84 0.2803 1.24 0.1849
0.05 0.3984 0.45 0.3605 0.85 0.2780 1.25 0.1826
0.06 0.3982 0.46 0.3589 0.86 0.2756 1.26 0.1804
0.07 0.3980 0.47 0.3572 0.87 0.2732 1.27 0.1781
0.08 0.3977 0.48 0.3555 0.88 0.2709 1.28 0.1758
0.09 0.3973 0.49 0.3538 0.89 0.2685 1.29 0.1736
0.10 0.3970 0.50 0.3521 0.90 0.2661 1.30 0.1714
0.11 0.3965 0.51 0.3503 0.91 0.2637 1.31 0.1691
0.12 0.3961 0.52 0.3485 0.92 0.2613 1.32 0.1669
0.13 0.3956 0.53 0.3467 0.93 0.2589 1.33 0.1647
0.14 0.3951 0.54 0.3448 0.94 0.2565 1.34 0.1626
0.15 0.3945 0.55 0.3429 0.95 0.2541 1.35 0.1604
0.16 0.3939 0.56 0.3410 0.96 0.2516 1.36 0.1582
0.17 0.3932 0.57 0.3391 0.97 0.2492 1.37 0.1561
0.18 0.3925 0.58 0.3372 0.98 0.2468 1.38 0.1539
0.19 0.3918 0.59 0.3352 0.99 0.2444 1.39 0.1518
0.20 0.3910 0.60 0.3332 1.00 0.2420 1.40 0.1497
0.21 0.3902 0.61 0.3312 1.01 0.2396 1.41 0.1476
0.22 0.3894 0.62 0.3292 1.02 0.2371 1.42 0.1456
0.23 0.3885 0.63 0.3271 1.03 0.2347 1.43 0.1435
0.24 0.3876 0.64 0.3251 1.04 0.2323 1.44 0.1415
0.25 0.3867 0.65 0.3230 1.05 0.2299 1.45 0.1394
0.26 0.3857 0.66 0.3209 1.06 0.2275 1.46 0.1374
0.27 0.3847 0.67 0.3187 1.07 0.2251 1.47 0.1354
0.28 0.3836 0.68 0.3166 1.08 0.2227 1.48 0.1334
0.29 0.3825 0.69 0.3144 1.09 0.2203 1.49 0.1315
0.30 0.3814 0.70 0.3123 1.10 0.2179 1.50 0.1295
0.31 0.3802 0.71 0.3101 1.11 0.2155 1.51 0.1276
0.32 0.3790 0.72 0.3079 1.12 0.2131 1.52 0.1257
0.33 0.3778 0.73 0.3056 1.13 0.2107 1.53 0.1238
0.34 0.3765 0.74 0.3034 1.14 0.2083 1.54 0.1219
0.35 0.3752 0.75 0.3011 1.15 0.2059 1.55 0.1200
0.36 0.3739 0.76 0.2989 1.16 0.2036 1.56 0.1182
0.37 0.3725 0.77 0.2966 1.17 0.2012 1.57 0.1163
0.38 0.3712 0.78 0.2943 1.18 0.1989 1.58 0.1145
0.39 0.3697 0.79 0.2920 1.19 0.1965 1.59 0.1127
0.40 0.3683 0.80 0.2897 1.20 0.1942 1.60 0.1109
Table 6

The function $\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt$

x Φ(x) x Φ(x) x Φ(x) x Φ(x)
0.00 0.5000
0.01 0.5040 0.41 0.6591 0.81 0.7910 1.21 0.8869
0.02 0.5080 0.42 0.6628 0.82 0.7939 1.22 0.8888
0.03 0.5120 0.43 0.6664 0.83 0.7967 1.23 0.8907
0.04 0.5160 0.44 0.6700 0.84 0.7995 1.24 0.8925
0.05 0.5199 0.45 0.6736 0.85 0.8023 1.25 0.8944
0.06 0.5239 0.46 0.6772 0.86 0.8051 1.26 0.8962
0.07 0.5279 0.47 0.6808 0.87 0.8078 1.27 0.8980
0.08 0.5319 0.48 0.6844 0.88 0.8106 1.28 0.8997
0.09 0.5359 0.49 0.6879 0.89 0.8133 1.29 0.9015
0.10 0.5398 0.50 0.6915 0.90 0.8159 1.30 0.9032
0.11 0.5438 0.51 0.6950 0.91 0.8186 1.31 0.9049
0.12 0.5478 0.52 0.6985 0.92 0.8212 1.32 0.9066
0.13 0.5517 0.53 0.7019 0.93 0.8238 1.33 0.9082
0.14 0.5557 0.54 0.7054 0.94 0.8264 1.34 0.9099
0.15 0.5596 0.55 0.7088 0.95 0.8289 1.35 0.9115
0.16 0.5636 0.56 0.7123 0.96 0.8315 1.36 0.9131
0.17 0.5675 0.57 0.7157 0.97 0.8340 1.37 0.9147
0.18 0.5714 0.58 0.7190 0.98 0.8365 1.38 0.9162
0.19 0.5753 0.59 0.7224 0.99 0.8389 1.39 0.9177
0.20 0.5793 0.60 0.7257 1.00 0.8413 1.40 0.9192
0.21 0.5832 0.61 0.7291 1.01 0.8438 1.41 0.9207
0.22 0.5871 0.62 0.7324 1.02 0.8461 1.42 0.9222
0.23 0.5910 0.63 0.7357 1.03 0.8485 1.43 0.9236
0.24 0.5948 0.64 0.7389 1.04 0.8508 1.44 0.9251
0.25 0.5987 0.65 0.7422 1.05 0.8531 1.45 0.9265
0.26 0.6026 0.66 0.7454 1.06 0.8554 1.46 0.9279
0.27 0.6064 0.67 0.7486 1.07 0.8577 1.47 0.9292
0.28 0.6103 0.68 0.7517 1.08 0.8599 1.48 0.9306
0.29 0.6141 0.69 0.7549 1.09 0.8621 1.49 0.9319
0.30 0.6179 0.70 0.7580 1.10 0.8643 1.50 0.9332
0.31 0.6217 0.71 0.7611 1.11 0.8665 1.51 0.9345
0.32 0.6255 0.72 0.7642 1.12 0.8686 1.52 0.9357
0.33 0.6293 0.73 0.7673 1.13 0.8708 1.53 0.9370
0.34 0.6331 0.74 0.7703 1.14 0.8729 1.54 0.9382
0.35 0.6368 0.75 0.7734 1.15 0.8749 1.55 0.9394
0.36 0.6406 0.76 0.7764 1.16 0.8770 1.56 0.9406
0.37 0.6443 0.77 0.7794 1.17 0.8790 1.57 0.9418
0.38 0.6480 0.78 0.7823 1.18 0.8810 1.58 0.9429
0.39 0.6517 0.79 0.7853 1.19 0.8830 1.59 0.9441
0.40 0.6554 0.80 0.7881 1.20 0.8849 1.60 0.9452
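Table 6 tabulates the standard normal distribution function Φ(x). Its entries can be recomputed from the error function; a brief Python sketch (the name `Phi` is mine, not the book's notation for code):

```python
import math

def Phi(x):
    """Standard normal distribution function, as tabulated in Table 6."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Spot-check against printed entries (rounded to four decimals):
for x, table in [(0.00, 0.5000), (0.50, 0.6915), (1.00, 0.8413), (1.60, 0.9452)]:
    assert abs(Phi(x) - table) < 5e-5
```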
Table 6 (continued)
Table 8 (continued)

z K(z) z K(z) z K(z)

Table 9
1.0  0.0000 0.0000
1.5  0.0000 0.0000 0.0000 0.0002 0.0009
2.0  0.0000 0.0001 0.0008 0.0036 0.0101 0.0212
2.5  0.0001 0.0022 0.0112 0.0299 0.0578 0.0925
3.0  0.0000 0.0015 0.0151 0.0474 0.0941 0.1487 0.2061
3.5  0.0001 0.0092 0.0491 0.1136 0.1879 0.2628 0.3341
4.0  0.0006 0.0291 0.1052 0.2001 0.2942 0.3804 0.4571
4.5  0.0031 0.0643 0.1776 0.2951 0.4001 0.4901 0.5665
5.0  0.0096 0.1134 0.2582 0.3895 0.4985 0.5873 0.6598
5.5  0.0225 0.1726 0.3406 0.4784 0.5863 0.6707 0.7374
6.0  0.0428 0.2375 0.4204 0.5591 0.6627 0.7409 0.8005
6.5  0.0707 0.3045 0.4952 0.6310 0.7282 0.7989 0.8509
7.0  0.1053 0.3708 0.5638 0.6940 0.7834 0.8461 0.8904
7.5  0.1452 0.4347 0.6258 0.7484 0.8294 0.8838 0.9207
8.0  0.1889 0.4959 0.6811 0.7951 0.8671 0.9135 0.9436
8.5  0.2348 0.5513 0.7301 0.8345 0.8977 0.9365 0.9606
9.0  0.2819 0.6031 0.7731 0.8676 0.9221 0.9540 0.9729
9.5  0.3290 0.6506 0.8104 0.8950 0.9414 0.9672 0.9817
10.0 0.3754 0.6938 0.8427 0.9175 0.9564 0.9770 0.9878
11.0 0.4640 0.7678 0.8939 0.9505 0.9768 0.9891 0.9949
12.0 0.5450 0.8270 0.9303 0.9714 0.9882 0.9951 0.9980
13.0 0.6174 0.8734 0.9555 0.9841 0.9943 0.9980 0.9993
14.0 0.6812 0.9090 0.9724 0.9915 0.9974 0.9992 0.9998
15.0 0.7367 0.9358 0.9833 0.9956 0.9988 0.9997 0.9999
16.0 0.7844 0.9555 0.9902 0.9978 0.9995 0.9999 1.0000
17.0 0.8249 0.9697 0.9944 0.9990 0.9998 1.0000
18.0 0.8591 0.9797 0.9969 0.9995 0.9999
19.0 0.8876 0.9867 0.9983 0.9998 1.0000
20.0 0.9112 0.9915 0.9991 0.9999
21.0 0.9304 0.9946 0.9996 1.0000
22.0 0.9459 0.9967 0.9998
23.0 0.9584 0.9980 0.9999
24.0 0.9683 0.9988 1.0000
25.0 0.9760 0.9993
30.0 0.9949 1.0000
35.0 0.9991
40.0 0.9999
43.0 1.0000
Table 9 (continued)

a: 0.08 0.09 0.1 0.2 0.3 0.4 0.5
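Table 8 lists values of K(z); assuming this is Kolmogorov's limit distribution of § 10, Chapter VIII, it is given by the rapidly convergent series K(z) = Σ_{k=−∞}^{∞} (−1)^k exp(−2k²z²), which a short Python sketch can evaluate (the function name `K` and the truncation at 100 terms are my own choices):

```python
import math

def K(z, terms=100):
    """Kolmogorov limit distribution: sum over k of (-1)^k * exp(-2 k^2 z^2)."""
    if z <= 0:
        return 0.0
    s = 1.0  # the k = 0 term
    for k in range(1, terms + 1):
        # terms for +k and -k coincide, hence the factor 2
        s += 2 * (-1) ** k * math.exp(-2 * k * k * z * z)
    return s

# A classical reference value of this distribution:
assert abs(K(1.0) - 0.7300) < 1e-4
```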
These notes wish to call attention to books and papers which may be useful to the
reader for further study of subjects dealt with in the present textbook, including
books and papers to which reference was made in the text. For topics which are
treated in detail in some current textbook, we mention only books in which the
reader can find further references.
As regards topics not discussed in standard textbooks, the sources of the material
contained in this book are given in greater detail. These bibliographical notes often
contain remarks on the historical development of the problems dealt with, but
a full account of the history of probability theory was of course impossible.
For the history of Probability Calculus up to Laplace see Todhunter [1].
Concerning less-known theorems or methods from other branches of mathematics,
we refer to some current textbook readily accessible to the reader.
The notes are restricted to the most important methodological questions. On several
occasions the method of exposition chosen in the present book is compared in the
notes to that in other textbooks.
Chapter I
Glivenko was the first to stress in his textbook (Glivenko [3]; cf. also Kolmogorov
[9]) the advantage of discussing the algebra of events as a Boolean algebra before
the introduction of the notion of probability. It seems to us that the understanding
of Kolmogorov’s axiomatic theory is hereby facilitated. On the general theory of
measure and integration over a Boolean algebra instead of over a field of sets see
Carathéodory [1]. Recent results on probability as a measure on a Boolean algebra
are summarized in Kappos [1].
§§ 1-4. On Boolean algebras in general see Birkhoff [1], Glivenko [2]. We did
not give a system of independent axioms for Boolean algebras, since it seemed to
us of much more importance to present the rules of Boolean algebra in a way which
makes clear the duality of the two basic operations.
§ 5. See Stone [1]. We follow here Frink [1]; as to the Lemma see Hausdorff [1]
and Frink [2].
§ 6. The unsolved problem mentioned in Exercise 7 was first formulated by Dedekind
(cf. Birkhoff [1], p. 147). Concerning Exercise 11 see e.g. Gavrilov [1].
REMARKS AND BIBLIOGRAPHICAL NOTES 639
Chapter II
Chapter III
Chapter IV
Chapter V
§ 1. The content of this section appeared first in the present book (German edition,
1962).
§ 2. We follow here Kolmogorov [5]. For the Radon-Nikodym theorem, see e.g.
Halmos [1].
§ 3. Concerning the new deduction of the Maxwell distribution given here, see
Rényi [19].
§ 4. We follow here Kolmogorov [5].
§ 6 and § 7. See Gebelein [1], Rényi [26], [28], Csáki-Fischer [1], [2]. On the
Lemma of Theorem 3 of § 7, see Boas [1]. On Theorem 3, see Rényi [26]. On the
applications of the large sieve of Linnik in number theory, see Linnik [1], Rényi [2],
Bateman-Chowla-Erdős [1].
§ 7. Exercises 1-4 treat problems of integral geometry from the point of view of
probability theory (see Blaschke [1]). Exercise 6: cf. Hajós-Rényi [1]; Exercises
28-30: Rényi [28].
Chapter VI
Chapter VII
Chapter VIII
§ 1. Chebyshev [1], Markov [1], Liapunov [1], Lindeberg [1], Pólya [1], Feller [1],
Khinchin [3], Gnedenko-Kolmogorov [1], Kolmogorov [5], [11], Prékopa-Rényi-
Urbanik [1].
§ 2. Gnedenko [2].
§ 3. Lévy [4], Feller [1], Khinchin [5], and also Gnedenko-Kolmogorov [1].
§ 5. Erdős-Rényi [1]; for the particular case p = const, cf. Bernstein [4].
§ 6. For the lemma, cf. Cramér [3]. Theorem 3 was first proved, under certain
restrictions, by a different method (cf. Rényi [3]). This result was generalized by
Kolmogorov [10]. For the simpler proof given here, cf. Rényi [24]. Theorem 3 may be
applied to prove limit theorems for dependent random variables; cf. Révész [1].
On the central limit theorem for dependent random variables, see the fundamental
paper of Bernstein [3].
§ 7. Anscombe [1], Doeblin [2], Rényi [22], [31].
§ 8. On the theory and applications of Markov chains and Markov processes
see Markov [1], Kolmogorov [3], [7]; Doeblin [1], [2], Feller [2], [3], [6], [7],
Doob [2], Chung [1], Bartlett [1], Bharucha-Reid [1], Wiener [2], Chandrasekhar
[1], Einstein [1], Hostinsky [1], Lévy [3], Rényi [11].
§ 9. Rényi [9], [10], van Dantzig [1], Malmquist [1]; further references are to
be found in Wilks [1], Wang [1].
§ 10. Kolmogorov [6], N. V. Smirnov [1], [2], Gnedenko [1], Gnedenko-Koroljuk
[1], Doob [1], Feller [5], Donsker [1].
§ 11. For Theorem 1: Pólya [2]. See also Dvoretzky-Erdős [1]. On the arc sine
law (Theorem 6) cf. Lévy [2], Erdős-Kac [2]; Sparre-Andersen [1], [2]; Chung-
Feller [1], Rényi [33]. On Lemma 2, Rényi [36]; for other generalizations, Spitzer
[1]. For Theorem 8, see Erdős-Hunt [1]; for Theorem 9, Erdős-Kac [1]; for a
generalization of it, Rényi [9].
§ 12. Lindeberg [1], Krickeberg [1].
§ 13. Exercise 5: Rényi [17]; for a similar general system of independent functions,
see Steinhaus, Kac and Ryll-Nardzewski [1]-[10], Rényi [7]. Exercise 8: Kac [1];
Exercise 24: Wilcoxon [1] and Rényi [12]; Exercise 25: Lehmann [1] and Rényi
[12]; the equivalence of the problems considered in these two papers is proved in
E. Csáki [1]. Exercise 28: Erdős-Rényi [3]. The result of Exercise 30 is due to
Chung and Feller [1]; as regards the presentation given here, cf. Rényi [33].
Appendix
On the concepts of entropy and information see Boltzmann [1], Hartley [1],
Shannon [1], Wiener [1], Shannon-Weaver [1], Woodward [1], Barnard [1], Jeffreys
[1], and the papers of Khinchin, Fadeev, Kolmogorov, Gelfand, Jaglom, etc., in
Arbeiten zur Informationstheorie I-III. On the role of the notion of information in
statistics, see the works of Fisher [1]-[3] and of Kullback [1]. The notion of the
dimension of a probability distribution and that of the entropy of the corresponding
dimension were introduced in a paper of Balatoni and Rényi (Arbeiten zur
Informationstheorie I) and were further developed in Rényi [27], [30]. Measures of
information differing from the Shannon measure were considered earlier, e.g. by
Bhattacharyya [1] and Schützenberger [1]; the theory of entropy and information
of order α is developed in Rényi [34], [37].
Part of the material appeared for the first time in the German edition of this book.
This appendix covers merely the basic notions of information theory; their
application to the transmission of information through a noisy channel, coding
theory, etc. are not dealt with here. Besides the already mentioned works of Shannon
and Khinchin, see also those of Feinstein [1], [2], McMillan [1] and Wolfowitz
[1], [2], [3].
§ 1. Concerning the theorem of Erdős on additive number-theoretical functions,
which was rediscovered by Fadeev, see Erdős [2]; the simple proof given in the text
is due to Rényi [29].
§ 2. For the theorem of Mercer, see Knopp [1].
§ 6. On the mean value theorem, see de Finetti [2], Kolmogorov [4], Nagumo [1],
Aczél [1], Hardy-Littlewood-Pólya [1] (where further references can be found;
this book contains also all other inequalities used in the Appendix, e.g. the inequalities
of Jensen and of Hölder).
§ 9. The idea that quantities of information theory may be used for the proof of
limit theorems is due to Linnik [2].
On the theorem of Perron-Frobenius, see Gantmacher [1].
§ 11. For Exercise 2c, see Rényi [16] and Kac [2]; Exercise 3c: Khinchin [6], for
the generalizations: Rényi [21]. Exercise 4: Kolmogorov-Tikhomirov (Arbeiten zur
Informationstheorie III). The content of Exercise 9 is due to Chung (unpublished
communication), the proof given here differs from that of Chung. Exercise 17b: cf.
Moriguti [1].
Tables
REFERENCES
Aczél, J.
[1] On mean values, Bull. Amer. Math. Soc. 54, 393-400 (1948).
[2] On composed Poisson distributions, III, Acta Math. Acad. Sci. Hung. 3, 219-224 (1952).
Aczél, J., L. Jánossy and A. Rényi
[1] On composed Poisson distributions, I, Acta Math. Acad. Sci. Hung. 1, 209-224 (1950).
Alexandrov, P. S. (Александров, П. С.)
[1] Введение в общую теорию множеств и функций (Introduction to the general theory of sets and functions), OGIZ, Moscow-Leningrad 1948.
Anscombe, F. J.
[1] Large sample theory of sequential estimation, Proc. Cambridge Phil. Soc. 48, 600 (1952).
Arató, M. and A. Rényi
[1] Probabilistic proof of a theorem on the approximation of continuous functions by means of generalized Bernstein polynomials, Acta Math. Acad. Sci. Hung. 8, 91-98 (1957).
ARBEITEN ZUR INFORMATIONSTHEORIE I-III (Teil I von A. J. Chintschin, D. K. Faddejew, A. N. Kolmogoroff, A. Rényi und J. Balatoni; Teil II von I. M. Gelfand, A. M. Jaglom, A. N. Kolmogoroff, Chiang Tse-Pei, I. P. Zaregradski; Teil III von A. N. Kolmogoroff und W. M. Tichomirow), VEB Deutscher Verlag der Wissenschaften, Berlin 1957 bzw. 1960.
Aumann, G.
[1] Reelle Funktionen, Springer-Verlag, Berlin-Göttingen-Heidelberg 1954.
646 REFERENCES
Baticle, E.
[2] Sur une loi de probabilité a priori pour l'interprétation des résultats de tirages dans une urne, C. R. Acad. Sci. Paris 228, 902-904 (1949).
Bauer, H.
[1] Wahrscheinlichkeitstheorie und Grundzüge der Masstheorie, Sammlung Göschen 1216/1216a, de Gruyter, Berlin 1964.
Bayes, Th.
[1] Essay towards solving a problem in the doctrine of chances, "Ostwald's Klassiker der Exakten Wissenschaften", Nr. 169, W. Engelmann, Leipzig 1908.
Bernoulli, J.
[1] Ars Coniectandi (1713) I-II, III-IV, "Ostwald's Klassiker der Exakten Wissenschaften", Nr. 108, W. Engelmann, Leipzig 1899.
Bernstein, S. N. (Бернштейн, С. Н.)
[1] Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités, Soobshch. Charkovskovo Mat. Obshch. (2) 13, 1-2 (1912).
[2] Опыт аксиоматического обоснования теории вероятностей (An attempt at an axiomatic foundation of probability theory), Charkovskovo Zap. Mat. ot-va 15, 209-274 (1917).
[3] Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes, Math. Ann. 97, 1-59 (1926).
[4] Теория вероятностей (Probability theory), 4th ed., Goztehizdat, Moscow 1946.
Bharucha-Reid, A. T.
[1] Elements of the theory of Markov processes and their applications, McGraw-Hill, New York 1960.
Bhattacharyya, A.
[1] On some analogues of the amount of information and their use in statistical estimation, Sankhya 8, 1-14 (1946).
Bienaymé, M.
[1] Considérations à l'appui de la découverte de Laplace sur la loi des probabilités dans la méthode des moindres carrés, C. R. Acad. Sci. Paris 37, 309-324 (1853).
Birkhoff, G.
[1] Lattice theory, 3rd ed., American Mathematical Society Colloquium Publications 25, AMS, Providence 1967.
Blanc-Lapierre, A. et R. Fortet
[1] Théorie des fonctions aléatoires, Masson et Cie., Paris 1953.
Blaschke, W.
[1] Vorlesungen über Integralgeometrie, 3. Aufl., VEB Deutscher Verlag der Wissenschaften, Berlin 1955.
Blum, J. R., D. L. Hanson and L. H. Koopmans
[1] On the strong law of large numbers for a class of stochastic processes, Zeitschrift für Wahrscheinlichkeitstheorie 2, 1-11 (1963).
Boas, R. P. Jr.
[1] A general moment problem, Amer. J. Math. 63, 361-370 (1941).
Bochner, S. and K. Chandrasekharan
[1] Fourier transforms, Princeton Univ. Press, Princeton 1949.
Boltzmann, L.
[1] Vorlesungen über Gastheorie, Johann Ambrosius Barth, Leipzig 1896.
Borel, É.
[1] Sur les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Mat. Palermo 26, 247-271 (1909).
[2] Éléments de la théorie des probabilités, Hermann et Fils, Paris 1909.
Cantelli, F. P.
[1] La tendenza ad un limite nel senso del calcolo delle probabilità, Rend. Circ. Mat. Palermo 16, 191-201 (1916).
Carathéodory, C.
[1] Entwurf einer Algebraisierung des Integralbegriffes, Sitzungsber. Math.-Naturwiss. Klasse Bayer. Akad. Wiss., München 1938, S. 24-28.
Chandrasekhar, S.
[1] Stochastic problems in physics and astronomy, Rev. Mod. Phys. 15, 1-89 (1943).
Chebyshev, P. L. (Чебышев, П. Л.)
[1] Теория вероятностей (Theory of probability), Akad. izd., Moscow 1936.
Chung, K. L.
[1] Markov chains with stationary transition probabilities, Springer-Verlag, Berlin-Göttingen-Heidelberg 1960.
Chung, K. L. and P. Erdős
[1] Probability limit theorems assuming only the first moment, I, Mem. Amer. Math. Soc. 6, 1-19 (1950).
Chung, K. L. and W. Feller
[1] On fluctuations in coin-tossing, Proc. Nat. Acad. Sci. USA 35, 605-608 (1949).
Cramér, H.
[1] Über eine Eigenschaft der normalen Verteilungsfunktion, Math. Z. 41, 405-414 (1936).
[2] Random variables and probability distributions, Cambridge Univ. Press, Cambridge 1937.
[3] Mathematical methods of statistics, Princeton Univ. Press, Princeton 1946.
Cramér, H. and H. Wold
[1] Some theorems on distribution functions, J. London Math. Soc. 11, 290-294 (1936).
Csáki, E.
[1] On two modifications of the Wilcoxon test, Publ. Math. Inst. Hung. Acad. Sci. 4, 313-319 (1959).
Csáki, P. and J. Fischer
[1] On bivariate stochastic connection, Publ. Math. Inst. Hung. Acad. Sci. 5, 311-323 (1960).
[2] Contributions to the problem of maximal correlation, Publ. Math. Inst. Hung. Acad. Sci. 5, 325-337 (1960).
Császár, Á.
[1] Sur la structure des espaces de probabilité conditionnelle, Acta Math. Acad. Sci. Hung. 6, 337-361 (1955).
[2] Sur une caractérisation de la répartition normale de probabilités, Acta Math. Acad. Sci. Hung. 7, 359-382 (1956).
van Dantzig, D.
[1] Mathematische Statistiek, "Kadercursus Statistiek, 1947-1948", Mathematisch Centrum, Amsterdam 1948.
Darmois, G.
[1] Analyse générale des liaisons stochastiques, Revue Inst. Internat. Stat. 21, 2-8 (1953).
Doeblin, W.
[1] Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples, Bull. Soc. Math. Roumaine Sci. 39(1), 57-115 (1937); 39(2), 3-61 (1937).
[2] Éléments d'une théorie générale des chaînes simples constantes de Markov, Ann. Sci. École Norm. Sup. (3) 57, 61-111 (1940).
Donsker, M. D.
[1] Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems, Ann. Math. Stat. 23, 277-281 (1952).
Doob, J. L.
[1] Heuristic approach to the Kolmogorov-Smirnov theorems, Ann. Math. Stat. 20, 393 (1949).
[2] Stochastic processes, Wiley-Chapman, New York-London 1953.
Dugué, D.
[1] Arithmétique des lois de probabilités, Mém. Sci. Math., No. 137, Gauthier-Villars, Paris 1957.
Dumas, M.
[1] Sur les lois de probabilités divergentes et la formule de Fisher, Interméd. Rech. Math. 9 (1947), Supplément 127-130.
[2] Interprétation de résultats de tirages exhaustifs, C. R. Acad. Sci. Paris 228, 904-906 (1949).
(See also the note by É. Borel following Dumas' article.)
Dvoretzky, A. and P. Erdős
[1] Some problems on random walk in space, Proc. 2nd Berkeley Symp. Math. Stat. Prob. 1950, Univ. California Press, Berkeley-Los Angeles 1951, 353-367.
Eggenberger, F. und G. Pólya
[1] Über die Statistik verketteter Vorgänge, Z. angew. Math. Mech. 3, 279-289 (1923).
Einstein, A.
[1] Zur Theorie der Brownschen Bewegung, Ann. Physik 19, 371-381 (1906).
Erdős, P.
[1] On the law of the iterated logarithm, Ann. Math. 43, 419-436 (1942).
[2] On the distribution function of additive functions, Ann. Math. 47, 1-20 (1946).
Erdős, P. and G. A. Hunt
[1] Changes of signs of sums of random variables, Pacific J. Math. 3, 673-687 (1953).
Erdős, P. and M. Kac
[1] On certain limit theorems of the theory of probability, Bull. Amer. Math. Soc. 52, 292-302 (1946).
[2] On the number of positive sums of independent random variables, Bull. Amer. Math. Soc. 53, 1011-1020 (1947).
Erdős, P. and A. Rényi
[1] On the central limit theorem for samples from a finite population, Publ. Math. Inst. Hung. Acad. Sci. 4, 49-61 (1959).
[2] On Cantor's series with convergent Σ 1/q_n, Ann. Univ. Sci. Budapest, Rolando Eötvös nom., Sect. Math. 2, 93-109 (1959).
Erdős, P. and A. Rényi
[3] On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci. 5, 17-61 (1960).
[4] On a classical problem of probability theory, Publ. Math. Inst. Hung. Acad. Sci. 6, 215-220 (1961).
Esseen, C. G.
[1] Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law, Acta Math. 77, 1-125 (1945).
Feinstein, A.
[1] A new basic theorem of information theory, Trans. Inst. Radio Eng., 2-22 (1954).
[2] Foundations of information theory, McGraw-Hill, New York 1958.
Feldheim, E.
[1] Étude de la stabilité des lois de probabilité, Dissertation, Univ. Paris, Paris 1937.
[2] Neuere Beweise und Verallgemeinerung der wahrscheinlichkeitstheoretischen Sätze von Simmons, Mat. Fiz. Lapok 45, 99-114 (1938).
Feller, W.
[1] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung, Math. Z. 40, 521-559 (1935); 42, 301-312 (1937).
[2] Zur Theorie der stochastischen Prozesse. Existenz- und Eindeutigkeitssätze, Math. Ann. 113, 113-160 (1936).
[3] On the integro-differential equations of purely discontinuous Markov processes, Trans. Amer. Math. Soc. 48, 488-515 (1940). Errata: ibidem 58, 474 (1945).
[4] The law of the iterated logarithm for identically distributed random variables, Ann. Math. 47, 631-638 (1946).
[5] On the Kolmogorov-Smirnov limit theorems for empirical distributions, Ann. Math. Stat. 19, 177-189 (1948).
[6] On the theory of stochastic processes, with particular reference to applications, Proc. Berkeley Symp. Math. Stat. Prob. 1945, 1946, Univ. California Press, Berkeley-Los Angeles 1949, 403-432.
[7] An introduction to probability theory and its applications, Vols 1-2, Wiley, New York 1950-1966.
de Finetti, B.
[1] Funzione caratteristica di un fenomeno aleatorio, Mem. R. Accad. Lincei (6) 4, 85-133 (1930).
[2] Sul concetto di media, Giorn. Ist. Ital. Att. 2, 369-396 (1931).
Fisher, R. A.
[1] Statistical methods for research workers, 10th edition, Oliver-Boyd Ltd., Edinburgh-London 1948.
[2] The design of experiments, Oliver-Boyd Ltd., London-Edinburgh 1949.
[3] Contributions to mathematical statistics, Wiley-Chapman, New York-London 1950.
Fisher, R. A. and F. Yates
[1] Statistical tables for biological, agricultural and medical research, Oliver-Boyd Ltd., London-Edinburgh 1949.
Fisz, M.
[1] Probability theory and mathematical statistics, 3rd ed., Wiley, New York 1963.
Florek, K., E. Marczewski and C. Ryll-Nardzewski
[1] Remarks on the Poisson stochastic process, I, Studia Math. 13, 122-129 (1953).
Fréchet, M.
[1] Recherches théoriques modernes, Fascicule 3 du Tome 1 du Traité du calcul des probabilités par É. Borel et divers auteurs, Gauthier-Villars, Paris 1937.
[2] Les probabilités associées à un système d'événements compatibles et dépendants, I-II, Hermann et Cie., Paris 1940 and 1943.
Frink, O.
[1] Representations of Boolean algebras, Bull. Amer. Math. Soc. 47, 755-756 (1941).
[2] A proof of the maximal chain theorem, Amer. J. Math. 74, 676-678 (1952).
Hajós, G. and A. Rényi
[1] Elementary proofs of some basic facts concerning order statistics, Acta Math. Acad. Sci. Hung. 5, 1-6 (1954).
Halmos, P. R.
[1] Measure theory, van Nostrand, New York 1950.
Hardy, G. H.
[1] Divergent series, Clarendon Press, Oxford 1949.
Hardy, G. H., J. E. Littlewood and G. Pólya
[1] Inequalities, 2nd edition, Cambridge Univ. Press, Cambridge 1952.
Hardy, G. H. and W. W. Rogosinski
[1] Fourier series, 3rd edition, Cambridge Univ. Press, Cambridge 1956.
Hardy, G. H. and E. M. Wright
[1] An introduction to the theory of numbers, 4th edition, Clarendon Press, Oxford 1960.
Harris, T. E.
[1] The theory of branching processes, Springer-Verlag, Berlin-Heidelberg-New York 1963.
Hartley, R. V.
[1] Transmission of information, Bell Syst. Techn. J. 7, 535-563 (1928).
Hausdorff, F.
[1] Grundzüge der Mengenlehre, B. G. Teubner, Leipzig 1914.
Helmert, R.
[1] Über die Wahrscheinlichkeit der Potenzsummen der Beobachtungsfehler und über einige damit im Zusammenhang stehende Fragen, Z. Math. Phys. 21, 192-219 (1876).
Hille, E.
[1] Functional analysis and semi-groups, Amer. Math. Soc. Coll. Publ., Vol. 31, New York 1948.
Hostinský, B.
[1] Méthodes générales du calcul des probabilités, Mém. Sci. Math., Nr. 52, Gauthier-Villars, Paris 1931.
Hurwitz, A. und R. Courant
[1] Funktionentheorie, Springer, Berlin 1929.
Jeffreys, H.
[1] Theory of probability, 2nd edition, Clarendon Press, Oxford 1948.
Jordan, Ch.
[1] On probability, Proc. Phys. Math. Soc. Japan 7, 96-109 (1925).
[2] Statistique mathématique, Gauthier-Villars, Paris 1927.
[3] Le théorème de probabilité de Poincaré, généralisé au cas de plusieurs variables indépendantes, Acta Sci. Math. Szeged 7, 103-111 (1934).
[4] Calculus of finite differences, 2nd edition, Chelsea Publ. Comp., New York 1950.
[5] Fejezetek a klasszikus valószínűségszámításból (Chapters from the classical calculus of probabilities), Akadémiai Kiadó, Budapest 1956.
Kac, M.
[1] Random walk and the theory of Brownian motion, Amer. Math. Monthly 54, 369-391 (1947).
[2] A remark on the preceding paper by A. Rényi, Publ. Inst. Math. Beograd 8, 163-165 (1955).
[3] Probability and related topics in physical sciences, Lectures in applied mathematics, Vol. I, Intersci. Publ., London-New York 1959.
[4] Statistical independence in probability, analysis and number theory, Math. Assoc. America 1959.
Kantorovitch, L. V. (Канторович, Л. В.)
[1] Sur un problème de M. Steinhaus, Fund. Math. 14, 266-270 (1929).
Kappos, D. A.
[1] Strukturtheorie der Wahrscheinlichkeitsfelder und -räume, Springer-Verlag, Berlin-Göttingen-Heidelberg 1960.
Kawata, T. and H. Sakamoto
[1] On the characterization of the normal population by the independence of the sample mean and the sample variance, J. Math. Soc. Japan 1, 111-115 (1949).
Khinchin, A. J. (Хинчин, А. Я.)
[1] Über dyadische Brüche, Math. Z. 18, 109-118 (1923).
[2] Sur les classes d'événements équivalents, Mat. Sbornik 39:3, 40-43 (1932).
[3] Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Springer, Berlin 1933.
[4] Korrelationstheorie der stationären stochastischen Prozesse, Math. Ann. 109, 604-615 (1934).
[5] Sul dominio di attrazione della legge di Gauss, Giorn. Ist. Ital. Att. 6, 378-393 (1935).
[6] Kettenbrüche, B. G. Teubner, Leipzig 1956.
[7] О классах эквивалентных событий (On classes of equivalent events), Dokladi Akad. Nauk SSSR 85, 713-714 (1952).
Khinchin, A. J. und A. N. Kolmogorov (Хинчин, А. Я. и А. Н. Колмогоров)
[1] Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden, Mat. Sbornik 32, 668-677 (1925).
(Khinchin) Chintschin, A. J. et P. Lévy
[1] Sur les lois stables, C. R. Acad. Sci. Paris 202, 374-376 (1936).
Knopp, K.
[1] Theorie und Anwendung der unendlichen Reihen, Springer, Berlin 1924.
Koller, S.
[1] Graphische Tafeln zur Beurteilung statistischer Zahlen, Steinkopff, Dresden-Leipzig 1943.
Kolmogorov, A. N. (Колмогоров, А. Н.)
[1] Über das Gesetz des iterierten Logarithmus, Math. Ann. 101, 126-136 (1929).
[2] Sur la loi forte des grands nombres, C. R. Acad. Sci. Paris 191, 910-912 (1930).
[3] Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, Math. Ann. 104, 415-458 (1930).
[4] Sur la notion de la moyenne, Atti R. Accad. Naz. Lincei 12, 388-391 (1930).
[5] Foundations of the theory of probability, Chelsea, New York 1956.
[6] Sulla determinazione empirica di una legge di distribuzione, Giorn. Ist. Ital. Att. 4, 83-91 (1933).
[7] Цепи Маркова со счетным множеством возможных состояний (Markov chains with a denumerable set of possible states), Bull. Mosk. Univ. 1, 1 (1937).
[8] О логарифмически нормальном законе распределения размеров частиц при дроблении (On the lognormal law of distribution of the sizes of particles under fragmentation), Dokl. Akad. Nauk SSSR 31, 99-101 (1941).
[9] Algèbres de Boole métriques complètes, VI Zjazd Matematyków Polskich, Warsaw 20-23. IX. 1948, Inst. Math. Univ. Krakow, 1950, 22-30.
Laha, R. G.
[1] An example of a non-normal distribution where the quotient follows the Cauchy law, Proc. Nat. Acad. Sci. USA 44, 222-223 (1958).
Laplace, P. S.
[1] Théorie analytique des probabilités, 1795. Oeuvres Complètes de Laplace, t. 7, Gauthier-Villars, Paris 1886.
[2] Essai philosophique sur les probabilités, I-II, Gauthier-Villars, Paris 1921.
Lehmann, E. L.
[1] Consistency and unbiasedness of certain nonparametric tests, Ann. Math. Stat. 22, 165-180 (1951).
Lévy, P.
[1] Calcul des probabilités, Gauthier-Villars, Paris 1925.
[2] Sur certains processus stochastiques homogènes, Comp. Math. 7, 283-339 (1939).
[3] Processus stochastiques et mouvement brownien, Gauthier-Villars, Paris 1948.
[4] Théorie de l'addition des variables aléatoires, 2e éd., Gauthier-Villars, Paris 1954.
Lighthill, M. J.
[1] An introduction to Fourier analysis and generalised functions, Cambridge Univ. Press, Cambridge 1959.
Lindeberg, J. W.
[1] Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Math. Z. 15, 211-225 (1922).
Linnik, Yu. V. (Линник, Ю. В.)
[1] The large sieve, Dokl. Akad. Nauk SSSR 30, 292-294 (1941).
[2] Теоретико-информационное доказательство центральной предельной теоремы в условиях Линдеберга (An information-theoretic proof of the central limit theorem under Lindeberg conditions), Teor. Verojatn. Prim. 4, 311-321 (1959).
[3] Разложения вероятностных законов (Decompositions of probability laws), Izd. Univ. Leningrad 1960.
Linnik, Yu. V. and A. A. Singer (Линник, Ю. В. и А. А. Зингер)
[1] Об одном аналитическом обобщении теоремы Крамера (On an analytic generalization of Cramér's theorem), Vestnik Leningr. Univ. 11, 51-56 (1955).
Ljapunov, A. M. (Ляпунов, А. М.)
[1] Избранные труды (Selected works), Akad. izd., Moscow 1948, pp. 179-250.
Malmquist, S.
[1] On a property of order statistics from a rectangular distribution, Skand. Aktuarietidskrift 33, 214-222 (1950).
Marczewski, E.
[1] Remarks on the Poisson stochastic process, II, Studia Math. 13, 130-136 (1953).
Markov, A. A. (Марков, А. А.)
[1] Wahrscheinlichkeitsrechnung, B. G. Teubner, Leipzig 1912.
McMillan, B.
[1] The basic theorems of information theory, Ann. Math. Stat. 24, 196-219 (1953).
Medgyessy, P.
[1] Decomposition of superpositions of distribution functions, Akad. Kiadó, Budapest 1961.
von Mises, R.
[1] Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Deuticke, Leipzig-Wien 1931.
[2] Wahrscheinlichkeit, Statistik und Wahrheit, Springer-Verlag, Berlin 1952.
Mogyoródi, J.
[1] On a consequence of a mixing theorem of A. Rényi, MTA Mat. Kut. Int. Közl. 9, 263-267 (1964).
Molina, E. C.
[1] Poisson's exponential binomial limit, van Nostrand, New York 1942.
Moriguti, S.
[1] A lower bound for a probability moment of an absolutely continuous distribution with finite variance, Ann. Math. Stat. 23, 286-289 (1952).
Nagumo, M.
[1] Über eine Klasse von Mittelwerten, Japan. J. Math. 7, 71-79 (1930).
Neveu, J.
[1] Mathematical foundations of the calculus of probability, Holden-Day Inc., San Francisco 1965.
Neyman, J.
[1] L'estimation statistique traitée comme un problème classique de probabilité, Act. Sci. Industr., Nr. 739, Gauthier-Villars, Paris 1938.
[2] First course in probability and statistics, H. Holt et Co., New York 1950.
Onicescu, O. et G. Mihoc
[1] La dépendance statistique. Chaînes et familles de chaînes discontinues, Act. Sci. Industr., Nr. 503, Gauthier-Villars, Paris 1937.
Onicescu, O., G. Mihoc şi C. T. Ionescu-Tulcea
[1] Calculul probabilităţilor şi aplicaţii, Bucureşti 1956.
Parzen, E.
[1] Modern probability theory and its applications, Wiley, New York 1960.
Pearson, E. S. and H. O. Hartley
[1] Biometrika tables for statisticians, Cambridge Univ. Press, Cambridge 1954.
Pearson, K.
[1] Early statistical papers, Cambridge Univ. Press, Cambridge 1948.
Poincaré, H.
[1] Calcul des probabilités, Carré-Naud, Paris 1912.
Poisson, S. D.
[1] Recherches sur la probabilité des jugements, Bachelier, Paris 1837.
Pólya, G.
[1] Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentproblem, Math. Z. 8, 171-181 (1920).
[2] Über eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Straßennetz, Math. Ann. 84, 149-160 (1921).
Pólya, G. und G. Szegő
[1] Aufgaben und Lehrsätze aus der Analysis, I-II, Springer, Berlin 1925.
Popper, K.
[1] Philosophy of science: A personal report, British Philosophy in the Mid-Century, ed. by C. A. Mace, 1956, p. 191.
[2] The logic of scientific discovery, Hutchinson, London 1959.
Prékopa, A.
[1] On composed Poisson distributions, IV, Acta Math. Acad. Sci. Hung. 3, 317-326 (1952).
[2] Valószínűségelmélet műszaki alkalmazásokkal (Probability theory with technical applications), Műszaki Könyvkiadó, Budapest 1962.
Prékopa, A., A. Rényi and K. Urbanik
[1] О предельном распределении для сумм независимых случайных величин на бикомпактных коммутативных топологических группах (On the limit distribution of sums of independent random variables over bicompact commutative topological groups), Acta Math. Acad. Sci. Hung. 7, 11-16 (1956).
Reichenbach, H.
[1] Wahrscheinlichkeitslehre, Sijthoff, Leiden 1935.
Rényi, A.
[1] Simple proof of a theorem of Borel and of the law of the iterated logarithm, Mat. Tidsskrift B, 41-48 (1948).
[2] О представлении четных чисел в виде суммы простого и почти простого числа (On the representation of even numbers as sums of a prime and an almost prime number), Izvestia Akad. Nauk SSSR, Ser. Mat. 12, 57-78 (1948).
[3] К теории предельных теорем для сумм независимых случайных величин (On limit theorems of sums of independent random variables), Acta Math. Acad. Sci. Hung. 1, 99-108 (1950).
[4] On the algebra of distributions, Publ. Math. Debrecen 1, 135-149 (1950).
[5] On composed Poisson distributions, II, Acta Math. Acad. Sci. Hung. 2, 83-98 (1951).
[6] On some problems concerning Poisson processes, Publ. Math. Debrecen 2, 66-73 (1951).
[7] On a conjecture of H. Steinhaus, Ann. Soc. Polon. Math. 25, 279-287 (1952).
[8] On projections of probability distributions, Acta Math. Acad. Sci. Hung. 3, 131-142 (1952).
[9] On the theory of order statistics, Acta Math. Acad. Sci. Hung. 4, 191-232 (1953).
[10] Eine neue Methode in der Theorie der geordneten Stichproben, Bericht über die Mathematiker-Tagung Berlin 1953, VEB Deutscher Verlag der Wissenschaften, Berlin 1953, 203-213.
[11] Kémiai reakciók tárgyalása a sztochasztikus folyamatok elmélete segítségével (On describing chemical reactions by means of stochastic processes), A Magyar Tudományos Akadémia Alkalmazott Matematikai Intézetének Közleményei 2, 596-600 (1953) (In Hungarian).
[12] Újabb kritériumok két minta összehasonlítására (Some new criteria for comparison of two samples), A Magyar Tudományos Akadémia Alkalmazott Matematikai Intézetének Közleményei 2, 243-265 (1953) (In Hungarian).
[13] Valószínűségszámítás (Probability theory), Tankönyvkiadó, Budapest 1954 (In Hungarian).
[14] Axiomatischer Aufbau der Wahrscheinlichkeitsrechnung, Bericht über die Tagung Wahrscheinlichkeitsrechnung und Mathematische Statistik, VEB Deutscher Verlag der Wissenschaften, Berlin 1954, 7-15.
[15] On a new axiomatic theory of probability, Acta Math. Acad. Sci. Hung. 6, 285-335 (1955).
[16] On the density of sequences of integers, Publ. Inst. Math. Beograd 8, 157-162 (1955).
[17] A számjegyek eloszlása valós számok Cantor-féle előállításaiban (The distribution of the digits in Cantor's representation of the real numbers), Mat. Lapok 7, 77-100 (1956) (In Hungarian).
[18] On conditional probability spaces generated by a dimensionally ordered set of measures, Teor. Verojatn. prim. 1, 61-71 (1956).
[19] A new deduction of Maxwell's law of velocity distribution, Isv. Mat. Inst. Sofia 2, 45-53 (1957).
[20] A remark on the theorem of Simmons, Acta Sci. Math. Szeged 18, 21-22 (1957).
[21] Representations for real numbers and their ergodic properties, Acta Math. Acad. Sci. Hung. 8, 477-493 (1957).
[22] On the asymptotic distribution of the sum of a random number of independent random variables, Acta Math. Acad. Sci. Hung. 8, 193-199 (1957).
Rényi, A.
[23] Quelques remarques sur les probabilités des événements dépendants, J. Math. pures appl. 37, 393-398 (1958).
[24] On mixing sequences of sets, Acta Math. Acad. Sci. Hung. 9, 215-228 (1958).
[25] Probabilistic methods in number theory, Proceedings of the International Congress of Mathematicians, Edinburgh 1958, 529-539.
[26] New version of the probabilistic generalization of the large sieve, Acta Math. Acad. Sci. Hung. 10, 217-226 (1959).
[27] On the dimension and entropy of probability distributions, Acta Math. Acad. Sci. Hung. 10, 193-215 (1959).
[28] On measures of dependence, Acta Math. Acad. Sci. Hung. 10, 441-451 (1959).
[29] On a theorem of P. Erdős and its applications in information theory, Mathematica Cluj 1 (24), 341-344 (1959).
[30] Dimension, entropy and information, Transactions of the II. Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, Praha 1960, 545-556.
[31] On the central limit theorem for the sum of a random number of independent random variables, Acta Math. Acad. Sci. Hung. 11, 97-102 (1960).
[32] Az aprítás matematikai elméletéről (On the mathematical theory of chopping), Építőanyag 1-8 (1960) (In Hungarian).
[33] Bolyongási problémákra vonatkozó határeloszlástételek (Limit theorems in random walk problems), A Magyar Tudományos Akadémia III (Matematikai és Fizikai) Osztályának Közleményei 10, 149-170 (1960) (In Hungarian).
[34] Az információelmélet néhány alapvető kérdése (Some fundamental problems of information theory), A Magyar Tudományos Akadémia III (Matematikai és Fizikai) Osztályának Közleményei 10, 251-282 (1960) (In Hungarian).
[35] Egy általános módszer valószínűségszámítási tételek bizonyítására (A general method for proving theorems in probability theory), A Magyar Tudományos Akadémia III (Matematikai és Fizikai) Osztályának Közleményei 11, 79-105 (1961) (In Hungarian).
[36] Legendre polynomials and probability theory, Ann. Univ. Sci. Budapest, R. Eötvös nom., Sect. Math. 3-4, 247-251 (1961).
[37] On measures of entropy and information, Proc. Fourth Berkeley Symposium on Math. Stat. Prob. 1960, Vol. I, Univ. California Press, Berkeley-Los Angeles 1961, 547-561.
[38] On stable sequences of events, Sankhya A 25, 293-302 (1963).
[39] On certain representations of real numbers and on equivalent events, Acta Sci. Math. Szeged 26, 63-74 (1965).
[40] Új módszerek és eredmények a kombinatorikus analízisben (New methods and results in combinatorial analysis), A Magyar Tudományos Akadémia III (Matematikai és Fizikai) Osztályának Közleményei 16, 75-105, 159-177 (1966) (In Hungarian).
[41] Sur les espaces simples des probabilités conditionnelles, Ann. Inst. H. Poincaré B 1, 3-19 (1964).
[42] On the foundations of information theory, Review of the International Statistical Institute 33, 1-14 (1965).
Rényi, A. and P. Révész
[1] On mixing sequences of random variables, Acta Math. Acad. Sci. Hung. 9, 389-393 (1958).
[2] A study of sequences of equivalent events as special stable sequences, Publicationes Mathematicae Debrecen 10, 319-325 (1963).
Saxer, W.
[1] Versicherungsmathematik, II, Springer-Verlag, Berlin-Göttingen-Heidelberg 1958.
Schmetterer, L.
[1] Einführung in die mathematische Statistik, Springer-Verlag, Wien 1956.
Schützenberger, M. P.
[1] Contributions aux applications statistiques de la théorie de l'information, Inst. Stat. Univ. Paris (A) 2575, 1-115 (1953).
Shannon, C. E.
[1] A mathematical theory of communication, Bell Syst. Techn. J. 27, 379-423, 623-653 (1948).
Shannon, C. E. and W. Weaver
[1] The mathematical theory of communication, Univ. Illinois Press, Urbana 1949.
Singer, A. A. (Зингер, А. А.)
[1] О независимых выборках из нормальной совокупности (On independent samples from a normal population), Uspehi Mat. Nauk 6, 172-175 (1951).
Skitovich, V. P. (Скитович, В. П.)
[1] Об одном свойстве нормального распределения (On a property of the normal distribution), Dokl. Akad. Nauk SSSR 89, 217-219 (1953).
Slutsky, E.
[1] Über stochastische Asymptoten und Grenzwerte, Metron 5, 1-90 (1925).
Smirnov, N. V. (Смирнов, Н. В.)
[1] Über die Verteilung allgemeiner Glieder in der Variationsreihe, Metron 12, 59-81 (1935).
[2] Приближение законов распределения случайных величин по эмпирическим данным (Approximation of the laws of distribution of random variables by means of empirical data), Uspehi Mat. Nauk 10, 179-206 (1944).
Smirnov, V. I. (Смирнов, В. И.)
[1] Lehrgang der höheren Mathematik, Teil III, 3. Aufl., VEB Deutscher Verlag der Wissenschaften, Berlin 1961.
von Smoluchowski, M.
[1] Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen, Phys. Z. 17, 557-571, 585-599 (1916).
Sparre-Andersen, E.
[1] On the number of positive sums of random variables, Skand. Aktuarietidskrift, 1949, 27-36.
[2] On the fluctuations of sums of random variables, I-II, Math. Scand. 1, 263-285 (1953); 2, 193-223 (1954).
Spitzer, F.
[1] A combinatorial lemma and its application to probability theory, Trans. Amer. Math. Soc. 82, 323-339 (1956).
Steinhaus, H.
[1] Les probabilités dénombrables et leur rapport à la théorie de la mesure, Fund. Math. 285-310 (1923).
[2] Sur la probabilité de la convergence des séries, Studia Math. 2, 21-39 (1951).
Steinhaus, H., M. Kac et C. Ryll-Nardzewski
[1]-[10] Sur les fonctions indépendantes, I, Studia Mathematica 6, 46-58 (1936); II, ibidem 6, 59-66 (1936); III, ibidem 6, 89-97 (1936); IV, ibidem 7, 1-15 (1938); V, ibidem 7, 96-100 (1938); VI, ibidem 9, 121-132 (1940); VII, ibidem 10, 1-20 (1948); VIII, ibidem 11, 133-144 (1949); IX, ibidem 12, 102-107 (1951); X, ibidem 13, 1-17 (1953).
Stone, M. H.
[1] The theory of representations for Boolean algebras, Trans. Amer. Math. Soc. 40, 37-111 (1936).
Student
[1] 'Student's' collected papers, edited by E. S. Pearson and J. Wishart, London 1942.
Szász, G.
[1] Introduction to lattice theory (transl. from the Hungarian), Akad. Kiadó, Budapest 1963.
Szőkefalvi-Nagy, B.
[1] Spektraldarstellung linearer Transformationen des Hilbertschen Raumes, Springer, Berlin 1942.
Titchmarsh, E. C.
[1] Theory of functions, Clarendon Press, Oxford 1952.
Todhunter, I.
[1] History of the mathematical theory of probability, Macmillan, Cambridge-London 1865.
Uspenski, J. W. (Успенский, Ю. В.)
[1] Introduction to mathematical probability, McGraw-Hill, New York-London 1937.
Veksler, V., L. Groshev and B. Isaev (Векслер, В., Л. Грошев и Б. Исаев)
[1] Ионизационные методы исследования излучений (Ionisation methods in the study of radiations), Gostehizdat, Moscow 1949.
Waerden, B. L. van der
[1] Mathematische Statistik, Springer-Verlag, Berlin-Göttingen-Heidelberg 1957.
Wald, A.
[1] Die Widerspruchsfreiheit des Kollektivbegriffes der Wahrscheinlichkeitsrechnung, Erg. Math. Koll. 8, Wien 1935-1936.
Wang, Shou Yen
[1] On the limiting distribution of the ratio of two empirical distributions, Acta Math. Sinica 5, 253 (1955).
Widder, D. V.
[1] The Laplace transform, Princeton Univ. Press, Princeton 1946.
Wiener, N.
[1] Cybernetics or control and communication in the animal and the machine, Act. Sci. Industr., Nr. 1053, Hermann et Cie, Paris 1948.
[2] Extrapolation, interpolation and smoothing of stationary time series, Wiley, New York 1949.
Wilcoxon, F.
[1] Individual comparisons by ranking methods, Biometrics Bull. 1, 80-83 (1945).
Wilks, S. S.
[1] Order statistics, Bull. Amer. Math. Soc. 54, 6-50 (1948).
Wolfowitz, J.
[1] The coding of messages subject to chance errors, Illinois J. Math. 1, 591-606 (1957).
[2] Information theory for mathematicians, Ann. Math. Stat. 29, 351-356 (1958).
[3] Coding theorems of information theory, Springer-Verlag, Berlin-Göttingen-Heidelberg 1961.
Woodward, P. M.
[1] Probability and information theory with applications to radar, Pergamon Press, London 1953.
Zygmund, A.
[1] Trigonometrical series, Warsaw 1935; Dover, New York 1955.
[2] Trigonometric series, I-II, Cambridge Univ. Press, Cambridge 1959.
AUTHOR AND SUBJECT INDEX
Feldheim, E., 167, 640, 641, 649
Feller, W., 447, 448, 453, 639, 641, 642
Fermi-Dirac statistics, 43
Finetti, B. de, 413, 639, 643, 649
Fischer, J., 640, 647
Fisher, R. A., 339, 642, 643, 649
Fisz, M., 639, 649
Florek, K., 640, 649
Fortet, R., 639, 646
Fourier-Stieltjes transform, 302
Fourier transform, 356, 357
Fréchet, M., 37, 639, 650
frequency, 30
—, relative, 30
Frink, O., 21, 638, 650
Frobenius, 598, 643
fundamental theorem of mathematical statistics, 400
gain, conditional distribution function of, 572
—, measure of, 574
—, of information, 562
Galton's desk, 152
gamma distribution, 202
Gantmacher, F. R., 598, 643, 650
Gauss, C. F., 641, 650
Gauss curve, 152
Gaussian density function, 191
Gaussian distribution function, 157, 187
Gaussian random variable, 156
Gavrilov, M. A., 28, 638, 650
Geary, R. C., 339, 641, 650
Gebelein, H., 283, 640, 650
Gelfand, A. N., 642, 646
Gelfand-distributions, 353
generalized functions, 354
generating function, 135
geometric distribution, 90
Glivenko, V. I., 9, 401, 492, 638, 641, 650
Gnedenko, B. V., 348, 448, 449, 458, 496, 639, 641, 642, 650
—, theorem of, 449
Groshev, L., 640, 659
Gumbel, A. J., 37
Hájek, J., 434, 460, 641, 650
Hajós, G., 640, 651
half line period, 127
Halmos, P. R., 48, 639, 651
Hanson, L., 475
Hardy, G. H., 307, 368, 552, 574, 580, 640, 641, 643, 651
Harris, T. E., 651
Hartley, H. O., 643
Hartley, R. V., 642, 643, 651
Hartley's formula, 542
Hausdorff, F., 23, 415, 638, 651
Helly, E., 319, 641
Helmert, R., 198, 640, 651
Hille, E., 431, 641, 651
Hirschfeld, A. O., 283
Hostinsky, B., 642, 651
Hunt, G. A., 512, 642, 648
Hurwitz, A., 641, 651
hypergeometric distribution, 88
incomplete probability distribution, 569
incomplete random variable, 569
independent events, 57
independent random variables, 99, 182
infinitely divisible distribution, 347
infinitesimal random variable, 448
information, 540, 554, 592
—, of order alpha, 579, 586
integral geometry, 69
Ionescu-Tulcea, C. T., 639, 658
Isaev, B., 640, 659
Jaglom, A. M., 642, 645
Jánossy, L., 640, 645
Jeffreys, H., 562, 639, 641, 642, 651
Jensen inequality, 555
joint distribution function, 178
Jordan, Ch., 37, 639, 651
Kac, M., 345, 511, 514, 639, 641, 642, 643, 648, 651, 659
Kantorovich, L. V., 120, 640
Kappos, D. A., 638, 639
Kawata, T., 339, 641
Khinchin, A., 347, 380, 453, 548, 607, 639, 641, 642, 643, 645
Knopp, K., 150, 426, 472, 491, 640, 641, 643
Koller, S., 643
Kolmogorov, A. N., 9, 33, 69, 276, 383, 396, 402, 420, 438, 448, 458, 493, 576, 638, 639, 640, 641, 642, 643, 645, 650, 652
Kolmogorov probability space, 97
Kolmogorov's formula, 348
Kolmogorov's fundamental theorem, 286
— inequality, 392
Koopmans, L. H., 646
Koroljuk, V. S., 496, 642, 650
Krickeberg, K., 642, 653
Kronecker, L., 397
Kullback, S., 642, 653
Ky Fan, 641, 653
Laguerre polynomials, 169
Laha, R. G., 372, 641, 653
Laplace curve, 152
—, method of, 164
Laplace, P. S., 153, 639, 653
large sieve, 286
lattice, 21
— distribution, 308
law of errors, 440
— of large numbers, due to Bernstein, 379
— — — — Khinchin, 380
— — — — Kolmogorov, 383
— — — — Markov, 378
— of the iterated logarithm, 402
Lebesgue measure, 52
Lebesgue-Stieltjes measure, 52
Legendre polynomials, 509
Lehmann, E. L., 642, 653
level set, 172
Lévy, P., 348, 350, 453, 511, 639, 641, 642, 652, 653
Lévy-Khinchin formula, 347
Liapunov, A. M., 517, 641, 642
Liapunov's condition, 442
Lighthill, M. J., 353, 641, 653
Lindeberg, J. W., 642, 653
Lindeberg's condition, 443, 447
— theorem, 520
linear operator, 515
Linnik, Yu. V., 286, 329, 336, 605, 640, 641, 643, 653
Littlewood, J. E., 368, 574, 580, 643, 651
Ljapunov, A. M., 653
Lobachevski, N. I., 198, 640, 654
Loève, M., 639, 654
logarithmically uniform distribution, 249
lognormal distribution, 194
Lomnicki, A., 639, 654
Lösch, F., 199, 640, 654
Luce, R. D., 639, 654
Lukács, E., 331, 339, 641, 654
Malmquist, S., 489, 640, 642, 654
Marczewski, E., 640, 649, 654
marginal distribution, 190
Markov, A. A., 442, 642, 654
—, theorem of, 479
Markov chain, 475
— —, additive, 483
— —, ergodic, 479
— —, homogeneous, 476
— —, reversible, 534
Markov inequality, 218
maximal correlation, 283
Maxwell distribution, 200, 239
— —, of order n, 269
Maxwell-Boltzmann statistics, 43
McMillan, B., 643, 654
measure, 49
—, complete, 50
—, outer, 50
—, σ-finite, 49
measurable set, 50
Medgyessy, P., 654
median, 217
Mensov, D. E., 641
Mercer theorem, 552, 643
Mihoc, G., 639, 640, 655
Mikusinski, J., 353, 641
Mises, R. von, 639, 654
mixing sequence of random variables, 467
mixture of distributions, 207, 131
modulus of dependence, 283
Mogyoródi, J., 475, 654
Moivre-Laplace theorem, 153
Molina, F. C., 643, 654
moment, 137, 217
— generating function, 138
monotone class, 418
Monte Carlo method, 69
Moriguti, S., 654
mutually independent random variables, 252
Nagumo, M., 576, 643, 654
n-dimensional cylinder, 287
negative binomial distribution, 92
negligible random variable, 448
Neveu, J., 655
Newton, I., 531
Neyman, J., 639, 655
non-atomic probability space, 81
normal curve, 152
normal distribution, 156, 186
normally distributed random vector, 191
Obreskov, N. G., 168
Onicescu, O., 639, 640, 655
operator method, 515
order statistics, 205, 235, 486
Pearson, K., 32, 198, 276, 279, 640, 642, 655
Pearson distribution, 233
Poisson, S. D., 640, 655
Poisson distribution, 123, 202
Poisson's summation formula, 365
Pólya, G., 169, 368, 509, 510, 574, 580, 640, 641, 642, 643, 648, 651, 655
Pólya distribution, 94
polyhypergeometric distribution, 89
polynomial distribution, 87
Popper, K., 639, 655
positive definite function, 304
Post-Widder inversion formula, 432
Prékopa, A., 640, 642, 655
probability algebra, 34
— distribution, 84
— of an event, 32
projection of distribution, 189
Rényi, A., 643, 645, 648, 650, 651, 655, 656, 657, 658
Révész, P., 471, 641, 642, 657, 658
Richter, H., 639, 658
Riemann, B., 307
Riesz, F., 407, 411, 658
ring of sets, 48
Robbins, H., 641, 658
Sakamoto, H., 339, 641, 652
sample space, 21
Saxer, W., 640, 658
Schmetterer, L., 639
Schoblik, F., 199, 654
Schützenberger, M. P., 642
Schwartz, L., 353
Schwarz, H. A., 329, 641
semiinvariant, 139
sequences of mixing sets, 406
Shannon, C. E., 547, 567, 569, 642, 658
Shannon's formula, 547
— gain of information, 574
— information, 579
similar distributions, 186
Simmons theorem, 167
Simson distribution, 197
Singer, A. A., 330, 339, 641, 653, 658
FURTHER TITLES TO BE RECOMMENDED IN THIS LINE
Á. ÁDÁM
STUDIES IN MATHEMATICAL STATISTICS
Theory and Application
EDITED BY I. VINCZE AND K. SARKADI
Distributors
KULTURA
Budapest 62. P.O.B. 149