Athanasios Papoulis - Probability, Random Variables and Stochastic Processes
x_i ≥ 0 with distribution F_x(x) = 1 − e^−x − x e^−x, and we wish to generate an RN sequence y_i > 0 with distribution F_y(y) = 1 − e^−y. In this example F_y^−1(u) = −ln(1 − u); hence
F_y^−1(F_x(x)) = −ln[1 − F_x(x)] = −ln(e^−x + x e^−x)
Inserting into (8-145), we obtain
y_i = −ln(e^−x_i + x_i e^−x_i)
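The mapping above can be checked numerically. The sketch below (my own, not part of the text) draws Erlang-2 samples x_i, applies y_i = −ln(e^−x_i + x_i e^−x_i) as in (8-145), and verifies that the resulting sequence behaves like an exponential RV with F_y(y) = 1 − e^−y.

```python
# Numerical sketch of (8-145): map Erlang-2 samples x_i, with
# F_x(x) = 1 - e^-x - x e^-x, onto y_i = -ln(e^-x_i + x_i e^-x_i).
# The y_i should then be exponential with F_y(y) = 1 - e^-y.
import math
import random

random.seed(1)
n = 100_000
# An Erlang-2 sample is the sum of two independent Exp(1) samples.
xs = [-math.log(random.random()) - math.log(random.random()) for _ in range(n)]
ys = [-math.log(math.exp(-x) + x * math.exp(-x)) for x in xs]

mean_y = sum(ys) / n                          # Exp(1) has mean 1
frac_below = sum(y <= 1.0 for y in ys) / n    # F_y(1) = 1 - e^-1 ~ 0.632
print(round(mean_y, 3), round(frac_below, 3))
```

The two printed values should be close to 1 and 0.632, the mean and F_y(1) of the target exponential distribution.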
REJECTION METHOD. In the percentile transformation method, we used the inverse of the function F(x). However, inverting a function is not a simple task. To overcome this difficulty, we develop next a method that avoids inversion. The problem under consideration is the generation of an RN sequence y_i with distribution F_y(y) in terms of the RN sequence x_i as in (8-145).
The proposed method is based on the relative frequency interpretation of the conditional density
f(x | M) = f(x) P(M | x) / P(M)
MAXIMUM LIKELIHOOD. The density f(X, θ) of the sample vector X is the product
f(X, θ) = f(x_1, θ) ··· f(x_n, θ)
of the n samples x_i of x. This density, considered as a function of θ, is called the likelihood function of X. The value θ̂ of θ that maximizes f(X, θ) is the ML estimate of θ. The logarithm
L(X, θ) = ln f(X, θ) = Σ_{i=1}^n ln f(x_i, θ)   (9-38)
is the log-likelihood function of X. From the monotonicity of the logarithm, it follows that θ̂ also maximizes the function L(X, θ). If θ is in the interior of the domain Θ of θ, then θ̂ is a root of the equation
∂L(X, θ)/∂θ = Σ_{i=1}^n [1/f(x_i, θ)] ∂f(x_i, θ)/∂θ = 0   (9-39)
Example 9-12. Suppose that f(x, θ) = θ e^−θx U(x). In this case,
f(X, θ) = θ^n e^−θ(x_1 + ··· + x_n)   L(X, θ) = n ln θ − θ n x̄
Hence
∂L(X, θ)/∂θ = n/θ − n x̄ = 0
Thus the ML estimator of θ equals 1/x̄. This estimator is biased because E{1/x̄} = nθ/(n − 1).
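The bias of this estimator is easy to observe by simulation. The following sketch (my own check, not from the text) averages the ML estimate 1/x̄ over many samples of size n and compares it with nθ/(n − 1).

```python
# Monte Carlo check that the ML estimator 1/xbar of the exponential
# parameter theta is biased, with E{1/xbar} = n*theta/(n - 1).
import math
import random

random.seed(2)
theta, n, trials = 2.0, 5, 200_000
est_sum = 0.0
for _ in range(trials):
    # n Exp(theta) samples via inverse transform, then their mean
    xbar = sum(-math.log(random.random()) / theta for _ in range(n)) / n
    est_sum += 1.0 / xbar            # the ML estimate for this sample
emp = est_sum / trials               # empirical E{1/xbar}
predicted = n * theta / (n - 1)      # = 2.5 for n = 5, theta = 2
print(round(emp, 3), predicted)
```

For n = 5 and θ = 2 the empirical average settles near 2.5 rather than the true θ = 2, matching the factor n/(n − 1).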
The ML method can be used to estimate any parameter. However, for moderate values of n, the estimate is not efficient. The method is used primarily for large values of n. This is based on the following important result.
Asymptotic properties. For large n, the distribution of the ML estimator θ̂ approaches a normal curve with mean θ and variance 1/nI where
I = E{|∂L(x, θ)/∂θ|²} = ∫ |∂L(x, θ)/∂θ|² f(x, θ) dx   (9-40)
Thus
θ̂ is asymptotically N(θ, 1/√(nI))   (9-41)
The number I is called the information about θ contained in x. Using integration by parts, we can show that (see Prob. 9-24)
I = −E{∂²L(x, θ)/∂θ²}
We show later [see (9-46)] that the variance of any estimator of θ cannot be smaller than 1/nI. From this it follows that the ML estimator is asymptotically normal, unbiased, with minimum variance. In the next example, we demonstrate the validity of the above theorem. The proof will not be given.
Example 9-13. Suppose that the RV x is N(η, σ) where η is a known constant. We wish to find the ML estimate v̂ of its variance v = σ². In this problem,
f(x, v) = (1/√(2πv)) exp{−(x − η)²/2v}
L(X, v) = −(n/2) ln(2πv) − (1/2v) Σ_{i=1}^n (x_i − η)²
Inserting into (9-39), we obtain
∂L(X, v)/∂v = −n/2v + (1/2v²) Σ_{i=1}^n (x_i − η)² = 0
This yields the estimator
v̂ = (1/n) Σ_{i=1}^n (x_i − η)²
As we know [see (8-67)], E{v̂} = v and Var{v̂} = 2v²/n. Furthermore, for large n the RV v̂ is nearly normal (CLT) as in (9-41). To complete the validity of (9-41), it suffices to show that nI = 1/Var{v̂} = n/2v². This follows from the identity
∂²L(X, v)/∂v² = n/2v² − (1/v³) Σ_{i=1}^n (x_i − η)²
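A small simulation (my own sketch, not part of the text) confirms the two facts used above: with known mean η, the estimator v̂ = (1/n) Σ (x_i − η)² is unbiased for v = σ² and its variance is close to 2v²/n.

```python
# Check E{vhat} = v and Var{vhat} = 2*v^2/n for the ML variance
# estimator of Example 9-13 (known mean eta).
import random

random.seed(3)
eta, sigma, n, trials = 1.0, 2.0, 20, 100_000
v = sigma * sigma                    # true variance, v = 4
vals = []
for _ in range(trials):
    s = sum((random.gauss(eta, sigma) - eta) ** 2 for _ in range(n))
    vals.append(s / n)               # vhat for this sample
mean_v = sum(vals) / trials
var_v = sum((w - mean_v) ** 2 for w in vals) / trials
print(round(mean_v, 2), round(var_v, 3), 2 * v * v / n)
```

The empirical variance of v̂ should land near 2v²/n = 1.6, which is the reciprocal of nI = n/2v² as claimed.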
The Rao-Cramér Bound
A basic problem in estimation is the determination of the best estimator θ̂ of a parameter θ. It is easy to show that if θ̂ exists, it is unique (see Prob. 9-39). However, in general, the problem of determining the best estimator of θ, or even of showing that such an estimator exists, is not simple. In the following, we determine the greatest lower bound of the variance of most estimators. This result can be used to establish whether a particular estimator is the best or that it is close to the best. We shall assume that the density f(x, θ) of x is differentiable with respect to θ and that the boundary of the domain of x does not depend on θ. Differentiating the area condition ∫ f(x, θ) dx = 1 with respect to θ, we obtain the identity
∫ [∂f(x, θ)/∂θ] dx = 0   (9-42)
A density satisfying the conditions leading to this identity will be called regular.
We show next that
E{|∂L(X, θ)/∂θ|²} = nI   (9-43)
where L(X, θ) = ln f(X, θ) is the log-likelihood of X and nI is the information about θ contained in X [see also (9-40)].
Proof. From the identity L(x, θ) = ln f(x, θ) and (9-42), it follows that
E{∂L(x, θ)/∂θ} = ∫ [∂f(x, θ)/∂θ] dx = 0
This shows that the mean of the function ∂L(x, θ)/∂θ is 0; hence its variance equals E{|∂L(x, θ)/∂θ|²}. Inserting into (9-38), we obtain (9-43) because the RVs ln f(x_i, θ) are independent.
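Identity (9-43) can be spot-checked by simulation. The sketch below (my own, using the exponential density of Example 9-12 as the test case) estimates E{|∂L(X, θ)/∂θ|²} and compares it with nI = n/θ².

```python
# Check E{(dL/dtheta)^2} = n*I for f(x, theta) = theta*e^(-theta*x),
# where dL/dtheta = n/theta - sum(x_i) and I = 1/theta^2.
import math
import random

random.seed(6)
theta, n, trials = 2.0, 8, 200_000
acc = 0.0
for _ in range(trials):
    s = sum(-math.log(random.random()) / theta for _ in range(n))
    acc += (n / theta - s) ** 2      # squared score for this sample
emp = acc / trials                   # should approach n*I = n/theta^2
print(round(emp, 3))
```

With n = 8 and θ = 2 the printed value should be near nI = 8/4 = 2.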
We shall use (9-43) to determine the greatest lower bound of the variance of an arbitrary estimator θ̂ of θ. Suppose first that θ̂ = g(X) is an unbiased estimator of θ:
E{θ̂} = ∫ g(X) f(X, θ) dX = θ
Differentiating with respect to θ, we obtain
1 = ∫ g(X) [∂f(X, θ)/∂θ] dX = ∫ g(X) [∂L(X, θ)/∂θ] f(X, θ) dX
This yields
E{g(X) ∂L(X, θ)/∂θ} = 1   (9-44)
Multiplying the identity E{∂L(X, θ)/∂θ} = 0 by θ and subtracting from (9-44), we obtain
E{[g(X) − θ] ∂L(X, θ)/∂θ} = 1   (9-45)
We shall use this identity to prove the following important result.
THEOREM. The variance E{[g(X) − θ]²} of any unbiased estimator θ̂ of θ cannot be smaller than 1/nI:
σ_θ̂² ≥ 1/nI   (9-46)
Proof. The proof is based on Schwarz's inequality
E²{zw} ≤ E{z²} E{w²}   (9-47)
Squaring both sides of (9-45) and applying (9-47) to the RVs z = g(X) − θ and w = ∂L(X, θ)/∂θ, we obtain
1 ≤ E{[g(X) − θ]²} E{|∂L(X, θ)/∂θ|²}   (9-48)
and (9-46) results.
We shall now determine the class of functions for which the estimator θ̂ is best, that is, for which (9-46) is an equality. As we know, (9-47) is an equality iff z = cw. Hence (9-48) is an equality iff g(X) − θ = c ∂L(X, θ)/∂θ. To find c, we insert into (9-48) and use (9-43). This yields c = 1/nI; hence
∂L(X, θ)/∂θ = nI [g(X) − θ]   (9-49)
Thus the estimate θ̂ = g(X) is best iff the log-likelihood function L(X, θ) satisfies (9-49).
COROLLARY. If θ̂ = g(X) is a biased estimator of θ with mean E{θ̂} = τ(θ), then
σ_θ̂² ≥ [τ′(θ)]² / nI   (9-50)
Proof. The statistic θ̂ = g(X) is an unbiased estimator of the parameter τ = τ(θ). We can, therefore, apply (9-46) provided that we replace θ by τ(θ) and nI by the information about τ contained in X. Since
∂L[X, θ(τ)]/∂τ = [∂L(X, θ)/∂θ] (dθ/dτ)
and θ′(τ) = 1/τ′(θ), we obtain
E{|∂L[X, θ(τ)]/∂τ|²} = nI / [τ′(θ)]²
and (9-50) results. Reasoning as in (9-49), we conclude that (9-50) is an equality iff
∂L[X, θ(τ)]/∂τ = (nI / [τ′(θ)]²) [g(X) − τ(θ)]   (9-51)
Note. If f(x, θ) is a density of exponential type, that is, if
f(x, θ) = h(x) exp{a(θ)q(x) + b(θ)}   (9-52)
then the statistic θ̂ = (1/n) Σ q(x_i) is the best estimator of the parameter τ(θ) = −b′(θ)/a′(θ). This follows readily from (9-51).
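As a numerical sketch of this note (my construction, not the book's): the exponential density f(x, θ) = θ e^−θx U(x) is of type (9-52) with h(x) = 1, a(θ) = −θ, q(x) = x, b(θ) = ln θ. Hence τ(θ) = −b′(θ)/a′(θ) = 1/θ, the best estimator of τ is x̄, and its variance should attain the bound (9-50), [τ′(θ)]²/nI = 1/(nθ²) since I = 1/θ².

```python
# Check that xbar attains the Rao-Cramer bound for tau = 1/theta
# when f(x, theta) = theta*e^(-theta*x): Var{xbar} = 1/(n*theta^2).
import math
import random

random.seed(4)
theta, n, trials = 2.0, 10, 200_000
means = []
for _ in range(trials):
    means.append(sum(-math.log(random.random()) / theta
                     for _ in range(n)) / n)
m = sum(means) / trials
var = sum((w - m) ** 2 for w in means) / trials
bound = 1.0 / (n * theta * theta)    # = 0.025 here
print(round(m, 3), round(var, 4), bound)
```

The empirical mean should be near τ = 1/θ = 0.5 and the empirical variance should essentially equal the bound, illustrating that for exponential-type densities the best estimator is efficient, not merely close to efficient.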
9-3 HYPOTHESIS TESTING
A statistical hypothesis is an assumption about the value of one or more parameters of a statistical model. Hypothesis testing is a process of establishing the validity of a hypothesis. This topic is fundamental in a variety of applications: Is Mendel's theory of heredity valid? Is the number of particles emitted from a radioactive substance Poisson distributed? Does the value of a parameter in a scientific investigation equal a specific constant? Are two events independent? Does the mean of an RV change if certain factors of the experiment are modified? Does smoking decrease life expectancy? Do voting patterns depend on sex? Do IQ scores depend on parental education? The list is endless.
We shall introduce the main concepts of hypothesis testing in the context of the following problem: The distribution of an RV x is a known function F(x, θ) depending on a parameter θ. We wish to test the assumption θ = θ₀ against the assumption θ ≠ θ₀. The assumption that θ = θ₀ is denoted by H₀ and is called the null hypothesis. The assumption that θ ≠ θ₀ is denoted by H₁ and is called the alternative hypothesis. The values that θ might take under the alternative hypothesis form a set Θ₁ in the parameter space. If Θ₁ consists of a single point θ = θ₁, the hypothesis H₁ is called simple; otherwise it is called composite. The null hypothesis is in most cases simple.
The purpose of hypothesis testing is to establish whether experimental evidence supports the rejection of the null hypothesis. The decision is based on the location of the observed sample X of x. Suppose that under hypothesis H₀ the density f(X, θ₀) of the sample vector X is negligible in a certain region D_c of the sample space, taking significant values only in the complement D̄_c of D_c. It is reasonable then to reject H₀ if X is in D_c and to accept H₀ if X is in D̄_c. The set D_c is called the critical region of the test, and the set D̄_c is called the region of acceptance of H₀. The test is thus specified in terms of the set D_c.
We should stress that the purpose of hypothesis testing is not to determine whether H₀ or H₁ is true. It is to establish whether the evidence supports the rejection of H₀. The terms "accept" and "reject" must, therefore, be interpreted accordingly. Suppose, for example, that we wish to establish whether the hypothesis H₀ that a coin is fair is true. To do so, we toss the coin 100 times and observe that heads show k times. If k differs greatly from 50, we reject H₀, that is, we decide on the basis of the evidence that the fair-coin hypothesis should be rejected. If k = 49, we accept H₀, that is, we decide that the evidence does not support the rejection of the fair-coin hypothesis. The evidence alone, however, does not lead to the conclusion that the coin is fair. We could as well have concluded that p = 0.49.
In hypothesis testing two kinds of errors might occur depending on the location of X:
1. Suppose first that H₀ is true. If X ∈ D_c, we reject H₀ even though it is true. We then say that a Type I error is committed. The probability for such an error is denoted by α and is called the significance level of the test. Thus
α = P{X ∈ D_c | H₀}   (9-53)
The difference 1 − α = P{X ∉ D_c | H₀} equals the probability that we accept H₀ when true. In this notation, P{··· | H₀} is not a conditional probability. The symbol H₀ merely indicates that H₀ is true.
2. Suppose next that H₀ is false. If X ∉ D_c, we accept H₀ even though it is false. We then say that a Type II error is committed. The probability for such an error is a function β(θ) of θ called the operating characteristic (OC) of the test. Thus
β(θ) = P{X ∉ D_c | H₁}   (9-54)
The difference 1 − β(θ) is the probability that we reject H₀ when false. This is denoted by P(θ) and is called the power of the test. Thus
P(θ) = 1 − β(θ) = P{X ∈ D_c | H₁}   (9-55)
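To make (9-53) through (9-55) concrete, here is a small sketch with hypothetical numbers (the RV, the sample size, and the threshold are my assumptions, not from the text): x is N(η, 1), we test η = 0 against η = 1 using x̄ of n samples, and the critical region is x̄ > c. Since x̄ is N(η, 1/√n), both error probabilities reduce to values of the normal CDF.

```python
# Type I and Type II error probabilities for a one-sided Gaussian
# mean test: reject H0 (eta = 0) when xbar > c; H1 is eta = 1.
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, c = 16, 0.5
alpha = 1.0 - Phi(c * math.sqrt(n))      # P{xbar > c | H0}, as in (9-53)
beta = Phi((c - 1.0) * math.sqrt(n))     # P{xbar <= c | H1}, as in (9-54)
power = 1.0 - beta                       # as in (9-55)
print(round(alpha, 4), round(beta, 4), round(power, 4))
# prints 0.0228 0.0228 0.9772
```

With n = 16 and c = 0.5 the two error probabilities happen to be equal; moving c trades α against β, which is the tension discussed next.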
Fundamental note. Hypothesis testing is not a part of statistics. It is part of decision theory based on statistics. Statistical considerations alone cannot lead to a decision. They merely lead to the following probabilistic statements:
If H₀ is true, then P{X ∈ D_c} = α
If H₀ is false, then P{X ∉ D_c} = β(θ)   (9-56)
Guided by these statements, we "reject" H₀ if X ∈ D_c and we "accept" H₀ if X ∉ D_c. These decisions are not based on (9-56) alone. They take into consideration other, often subjective, factors, for example, our prior knowledge concerning the truth of H₀, or the consequences of a wrong decision.
The test of a hypothesis is specified in terms of its critical region. The region D_c is chosen so as to keep the probabilities of both types of errors small. However, both probabilities cannot be arbitrarily small because a decrease in α results in an increase in β. In most applications, it is more important to control α. The selection of the region D_c proceeds thus as follows:
Assign a value to the Type I error probability α and search for a region D_c of the sample space so as to minimize the Type II error probability for a specific θ. If the resulting β(θ) is too large, increase α to its largest tolerable value; if β(θ) is still too large, increase the number n of samples.
A test is called most powerful if β(θ) is minimum. In general, the critical region of a most powerful test depends on θ. If it is the same for every θ ∈ Θ₁, the test is uniformly most powerful. Such a test does not always exist. The determination of the critical region of a most powerful test involves a search in the n-dimensional sample space. In the following, we introduce a simpler approach.
TEST STATISTIC. Prior to any experimentation, we select a function
q = g(X)
of the sample vector X. We then find a set R_c of the real line where under hypothesis H₀ the density of q is negligible, and we reject H₀ if the value q = g(X) of q is in R_c. The set R_c is the critical region of the test; the RV q is the test statistic. In the selection of the function g(X) we are guided by the point estimate of θ.
In a hypothesis test based on a test statistic, the two types of errors are expressed in terms of the region R_c of the real line and the density f_q(q, θ) of the test statistic q:
α = P{q ∈ R_c | H₀} = ∫_{R_c} f_q(q, θ₀) dq   (9-57)
β(θ) = P{q ∉ R_c | H₁} = ∫_{R̄_c} f_q(q, θ) dq   (9-58)
To carry out the test, we determine first the function f_q(q, θ). We then assign a value to α and we search for a region R_c minimizing β(θ). The search
[FIGURE 9-10: the density f_q(q, θ) of the test statistic, panels (a), (b), (c), with the critical regions R_c marked on the q axis.]
is now limited to the real line. We shall assume that the function f_q(q, θ) has a single maximum. This is the case for most practical tests.
Our objective is to test the hypothesis θ = θ₀ against each of the hypotheses θ ≠ θ₀, θ > θ₀, and θ < θ₀. To be concrete, we shall assume that the function f_q(q, θ) is concentrated on the right of f_q(q, θ₀) for θ > θ₀ and on its left for θ < θ₀, as in Fig. 9-10.
H₁: θ ≠ θ₀
Under the stated assumptions, the most likely values of q are on the right of f_q(q, θ₀) if θ > θ₀ and on its left if θ < θ₀. It is, therefore, desirable to reject H₀ if q is either too large or too small.
The RV x has a Poisson distribution with mean θ. We wish to find the Bayes estimate θ̂ of θ under the assumption that θ is the value of an RV θ with prior density f(θ) ~ θ^b e^−cθ U(θ). Show that
θ̂ = (n x̄ + b + 1)/(n + c)
Suppose that the IQ scores of children in a certain grade are the samples of an N(η, σ) RV x. We test 10 children and obtain the following averages: x̄ = 90, s = 5. Find the 0.95 confidence interval of η and of σ.
The RVs x_i are i.i.d. and N(0, σ). We observe that x₁² + ··· + x₁₀² = 4. Find the 0.95 confidence interval of σ.
The readings of a voltmeter introduce an error ν with mean 0. We wish to estimate its standard deviation σ. We measure a calibrated source V = 3 V four times and obtain the values 2.90, 3.15, 3.05, and 2.96. Assuming that ν is normal, find the 0.95 confidence interval of σ.
The RV x has the Erlang density f(x) ~ c²x e^−cx U(x). We observe the samples x_i = 3.1, 3.4, 3.3. Find the ML estimate ĉ of c.
The RV x has the truncated exponential density f(x) = c e^−c(x−x₀) U(x − x₀). Find the ML estimate ĉ of c in terms of the n samples x_i of x.
The time to failure of a bulb is an RV x with density c e^−cx U(x). We test 80 bulbs and find that 200 hours later, 62 of them are still good. Find the ML estimate of c.
The RV x has a Poisson distribution with mean θ. Show that the ML estimate of θ equals x̄.
Show that if L(x, θ) = ln f(x, θ) is the likelihood function of an RV x, then
E{|∂L(x, θ)/∂θ|²} = −E{∂²L(x, θ)/∂θ²}
9-25. We are given an RV x with mean η and standard deviation σ = 2, and we wish to test the hypothesis η = 8 against η = 8.7 with α = 0.01 using as the test statistic the sample mean x̄ of n samples. (a) Find the critical region R_c of the test and the resulting β if n = 64. (b) Find n and R_c if β = 0.05.
9-26. A new car is introduced with the claim that its average mileage in highway driving is at least 28 miles per gallon. Seventeen cars are tested, and the following mileage is obtained:
19 20 24 25 26 26.8 27.2 27.5
28 28.2 28.4 29 30 31 32 33.3 35
Can we conclude with significance level at most 0.05 that the claim is true?
9-27. The weights of cereal boxes are the values of an RV x with mean η. We measure 64 boxes and find that x̄ = 7.7 oz. and s = 1.5 oz. Test the hypothesis H₀: η = 8 oz. against H₁: η ≠ 8 oz. with α = 0.1 and α = 0.01.
9-28. Brand A batteries cost more than brand B batteries. Their life lengths are two RVs x and y. We test 16 batteries of brand A and 26 batteries of brand B and find these values, in hours:
Test the hypothesis η_x = η_y against η_x > η_y with α =
9-29. A coin is tossed 64 times, and heads shows 22 times. Test the hypothesis that the coin is fair with significance level 0.05. We toss a coin 16 times, and heads shows k times. If k is such that k₁ < k < k₂, we accept the hypothesis that the coin is fair with significance level α = 0.05. Find k₁ and k₂ and the resulting β error.
9-30. In a production process, the number of defective units per hour is a Poisson distributed RV x with parameter λ = 5. A new process is introduced, and it is observed that the hourly defectives in a 22-hour period are
x_i = 3 0 5 4 2 6 4 1 5 3 7 4 0 8 3 2 4 3 6 5 6 9
Test the hypothesis λ = 5 against λ < 5 with α = 0.05.
9-31. A die is tossed 102 times, and the ith face shows k_i = 18, 15, 19, 17, 13, and 20 times. Test the hypothesis that the die is fair with α = 0.05 using the chi-square test.
9-32. A computer prints out 1000 numbers consisting of the 10 integers j = 0, 1, ..., 9. The number n_j of times j appears equals
n_j = 85 110 118 91 78 105 122 94 101 96
Test the hypothesis that the numbers j are uniformly distributed between 0 and 9, with α = 0.05.
9-33. The number x of particles emitted from a radioactive substance in 1 second is a Poisson RV with mean θ. In 50 seconds, 1058 particles are emitted. Test the hypothesis θ₀ = 20 against θ ≠ 20 with α = 0.05 using the asymptotic approximation.
9-34. The RVs x and y are N(η_x, σ_x) and N(η_y, σ_y) respectively and independent. Test the hypothesis σ_x = σ_y against σ_x ≠ σ_y using as the test statistic the ratio (see Prob. 6-19)
q = [Σ_{i=1}^n (x_i − x̄)²/(n − 1)] / [Σ_{i=1}^m (y_i − ȳ)²/(m − 1)]
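For the last problem, the null distribution of the ratio q can be tabulated by simulation when F tables are not at hand. The sketch below (my own illustration, not a solution from the text) draws the sample sizes of the problem, n = 16 and m = 26, and estimates the two-sided 5% critical values of q under σ_x = σ_y.

```python
# Monte Carlo null distribution of the variance ratio
# q = s_x^2 / s_y^2 for n = 16 samples of x and m = 26 samples of y.
import random

random.seed(5)

def svar(zs):
    # unbiased sample variance
    m = sum(zs) / len(zs)
    return sum((z - m) ** 2 for z in zs) / (len(zs) - 1)

n_x, n_y, trials = 16, 26, 20_000
qs = []
for _ in range(trials):
    sx = svar([random.gauss(0, 1) for _ in range(n_x)])
    sy = svar([random.gauss(0, 1) for _ in range(n_y)])
    qs.append(sx / sy)
qs.sort()
# empirical two-sided 5% critical values of q when sigma_x = sigma_y
lo, hi = qs[int(0.025 * trials)], qs[int(0.975 * trials)]
print(round(lo, 2), round(hi, 2))
```

The two printed values approximate the 2.5% and 97.5% percentiles of the F(15, 25) distribution; H₀ is rejected when the observed q falls outside this interval.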