
ii Argue that $L(p_0)/L(p_a) < k$ if and only if $\sum_{i=1}^{n} y_i > k^*$ for some constant $k^*$.
iii Give the rejection region for the most powerful test of H0 versus Ha .
b Recall that $\sum_{i=1}^{n} Y_i$ has a binomial distribution with parameters n and p. Indicate how to determine the values of any constants contained in the rejection region derived in part [a(iii)].
c Is the test derived in part (a) uniformly most powerful for testing H0 : p = p0 versus
Ha : p > p0 ? Why or why not?
*10.103 Let Y1 , Y2 , . . . , Yn denote a random sample from a uniform distribution over the interval (0, θ).
a Find the most powerful α-level test for testing H0 : θ = θ0 against Ha : θ = θa , where
θa < θ0 .
b Is the test in part (a) uniformly most powerful for testing H0 : θ = θ0 against Ha : θ < θ0 ?
*10.104 Refer to the random sample of Exercise 10.103.
a Find the most powerful α-level test for testing H0 : θ = θ0 against Ha : θ = θa , where
θa > θ0 .
b Is the test in part (a) uniformly most powerful for testing H0 : θ = θ0 against Ha : θ > θ0 ?
c Is the most powerful α-level test that you found in part (a) unique?

10.11 Likelihood Ratio Tests


Theorem 10.1 provides a method of constructing most powerful tests for simple
hypotheses when the distribution of the observations is known except for the value of
a single unknown parameter. This method can sometimes be used to find uniformly
most powerful tests for composite hypotheses that involve a single parameter. In
many cases, the distribution of concern has more than one unknown parameter. In
this section, we present a very general method that can be used to derive tests of
hypotheses. The procedure works for simple or composite hypotheses and whether
or not other parameters with unknown values are present.
Suppose that a random sample is selected from a distribution and that the likelihood function $L(y_1, y_2, \ldots, y_n \mid \theta_1, \theta_2, \ldots, \theta_k)$ is a function of k parameters, $\theta_1, \theta_2, \ldots, \theta_k$. To simplify notation, let $\Theta$ denote the vector of all k parameters—that is, $\Theta = (\theta_1, \theta_2, \ldots, \theta_k)$—and write the likelihood function as $L(\Theta)$. It may be the case that we are interested in testing hypotheses about only one of the parameters, say, $\theta_1$. For example, if, as in Example 10.24, we take a sample from a normally distributed population with unknown mean $\mu$ and unknown variance $\sigma^2$, then the likelihood function depends on the two parameters $\mu$ and $\sigma^2$, and $\Theta = (\mu, \sigma^2)$. If we are interested in testing hypotheses about only the mean $\mu$, then $\sigma^2$—a parameter not of particular interest to us—is called a nuisance parameter. Thus, the likelihood function may be a function with both unknown nuisance parameters and a parameter of interest.
Suppose that the null hypothesis specifies that $\Theta$ (may be a vector) lies in a particular set of possible values—say, $\Omega_0$—and that the alternative hypothesis specifies that $\Theta$ lies in another set of possible values $\Omega_a$, which does not overlap $\Omega_0$. For example, if we sample from a population with an exponential distribution with mean λ (in this case, λ is the only parameter of the distribution, and $\Theta = \lambda$), we might be interested in testing $H_0 : \lambda = \lambda_0$ versus $H_a : \lambda \ne \lambda_0$. In this exponential example, $\Omega_0$ contains only the single value $\lambda_0$ and $\Omega_a = \{\lambda > 0 : \lambda \ne \lambda_0\}$. Denote the union of the two sets, $\Omega_0$ and $\Omega_a$, by $\Omega$; that is, $\Omega = \Omega_0 \cup \Omega_a$. In the exponential example, $\Omega = \{\lambda_0\} \cup \{\lambda > 0 : \lambda \ne \lambda_0\} = \{\lambda : \lambda > 0\}$, the set of all possible values for λ. Either or both of the hypotheses $H_0$ and $H_a$ can be composite because they might contain multiple values of the parameter of interest or because other unknown parameters may be present.
Let $L(\hat{\Omega}_0)$ denote the maximum (actually the supremum) of the likelihood function for all $\Theta \in \Omega_0$. That is, $L(\hat{\Omega}_0) = \max_{\Theta \in \Omega_0} L(\Theta)$. Notice that $L(\hat{\Omega}_0)$ represents the best explanation for the observed data for all $\Theta \in \Omega_0$ and can be found by using methods similar to those used in Section 9.7. Similarly, $L(\hat{\Omega}) = \max_{\Theta \in \Omega} L(\Theta)$ represents the best explanation for the observed data for all $\Theta \in \Omega = \Omega_0 \cup \Omega_a$. If $L(\hat{\Omega}_0) = L(\hat{\Omega})$, then a best explanation for the observed data can be found inside $\Omega_0$, and we should not reject the null hypothesis $H_0 : \Theta \in \Omega_0$. However, if $L(\hat{\Omega}_0) < L(\hat{\Omega})$, then the best explanation for the observed data can be found inside $\Omega_a$, and we should consider rejecting $H_0$ in favor of $H_a$. A likelihood ratio test is based on the ratio $L(\hat{\Omega}_0)/L(\hat{\Omega})$.

A Likelihood Ratio Test

Define λ by
$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \frac{\max_{\Theta \in \Omega_0} L(\Theta)}{\max_{\Theta \in \Omega} L(\Theta)}.$$
A likelihood ratio test of $H_0 : \Theta \in \Omega_0$ versus $H_a : \Theta \in \Omega_a$ employs λ as a test statistic, and the rejection region is determined by $\lambda \le k$.

It can be shown that $0 \le \lambda \le 1$. A value of λ close to zero indicates that the likelihood of the sample is much smaller under $H_0$ than it is under $H_a$; therefore, the data suggest favoring $H_a$ over $H_0$. The actual value of k is chosen so that the test attains the desired value of α. A brief numerical sketch of the ratio appears below; Example 10.24 then illustrates the mechanics of the method analytically.
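The following is a minimal sketch, assuming NumPy and SciPy are available; the simulated data, the helper name neg_log_lik, and the log-variance parameterization are ours, not the text's. It maximizes the normal log-likelihood twice—once with $\mu$ fixed at $\mu_0$ (over $\Omega_0$) and once with both parameters free (over $\Omega$)—and forms λ from the difference of the two maxima.

```python
# A sketch of lambda = max over Omega_0 of L(Theta) / max over Omega of L(Theta)
# for normal data with H0: mu = mu_0 and sigma^2 unknown (a nuisance parameter).
# Data and helper names are illustrative assumptions, not from the text.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
y = rng.normal(loc=0.3, scale=1.0, size=25)   # simulated sample
mu0 = 0.0                                     # value specified by H0

def neg_log_lik(params, y):
    """Negative normal log-likelihood; params = (mu, log of sigma^2)."""
    mu, log_s2 = params
    return -np.sum(stats.norm.logpdf(y, loc=mu, scale=np.sqrt(np.exp(log_s2))))

# Maximize over Omega_0 (mu pinned at mu0, sigma^2 free) and over Omega (both free).
fit0 = optimize.minimize(lambda p: neg_log_lik((mu0, p[0]), y), x0=[0.0])
fit = optimize.minimize(neg_log_lik, x0=[y.mean(), 0.0], args=(y,))

log_lambda = fit.fun - fit0.fun     # ln L(Omega_0 hat) - ln L(Omega hat) <= 0
print("lambda =", np.exp(log_lambda))          # always in (0, 1]
print("-2 ln(lambda) =", -2.0 * log_lambda)
```

Parameterizing the variance through its logarithm keeps $\sigma^2 > 0$ without handing the optimizer an explicit constraint; the constrained maximum can never exceed the unconstrained one, which is why $\lambda \le 1$.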

EXAMPLE 10.24 Suppose that $Y_1, Y_2, \ldots, Y_n$ constitute a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. We want to test $H_0 : \mu = \mu_0$ versus $H_a : \mu > \mu_0$. Find the appropriate likelihood ratio test.

Solution In this case, $\Theta = (\mu, \sigma^2)$. Notice that $\Omega_0$ is the set $\{(\mu_0, \sigma^2) : \sigma^2 > 0\}$, $\Omega_a = \{(\mu, \sigma^2) : \mu > \mu_0, \sigma^2 > 0\}$, and hence that $\Omega = \Omega_0 \cup \Omega_a = \{(\mu, \sigma^2) : \mu \ge \mu_0, \sigma^2 > 0\}$. The constant value of the variance $\sigma^2$ is completely unspecified. We must now find $L(\hat{\Omega}_0)$ and $L(\hat{\Omega})$.
For the normal distribution, we have
$$L(\Theta) = L(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n}(y_i - \mu)^2\right].$$

Restricting $\mu$ to $\Omega_0$ implies that $\mu = \mu_0$, and we can find $L(\hat{\Omega}_0)$ if we determine the value of $\sigma^2$ that maximizes $L(\mu, \sigma^2)$ subject to the constraint that $\mu = \mu_0$. From Example 9.15, we see that when $\mu = \mu_0$ the value of $\sigma^2$ that maximizes $L(\mu_0, \sigma^2)$ is
$$\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n}(y_i - \mu_0)^2.$$
Thus, $L(\hat{\Omega}_0)$ is obtained by replacing $\mu$ with $\mu_0$ and $\sigma^2$ with $\hat{\sigma}_0^2$ in $L(\mu, \sigma^2)$, which gives
$$L(\hat{\Omega}_0) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}_0^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}_0^2} \sum_{i=1}^{n}(y_i - \mu_0)^2\right] = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}_0^2}\right)^{n/2} e^{-n/2}.$$

We now turn to finding $L(\hat{\Omega})$. As in Example 9.15, it is easier to work with $\ln L(\mu, \sigma^2)$:
$$\ln[L(\mu, \sigma^2)] = -\frac{n}{2}\ln \sigma^2 - \frac{n}{2}\ln 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n}(y_i - \mu)^2.$$

Taking derivatives with respect to $\mu$ and $\sigma^2$, we obtain
$$\frac{\partial\{\ln[L(\mu, \sigma^2)]\}}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n}(y_i - \mu),$$
$$\frac{\partial\{\ln[L(\mu, \sigma^2)]\}}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n}(y_i - \mu)^2.$$

We need to find the maximum of $L(\mu, \sigma^2)$ over the set $\Omega = \{(\mu, \sigma^2) : \mu \ge \mu_0, \sigma^2 > 0\}$. Notice that
$$\frac{\partial L(\mu, \sigma^2)}{\partial \mu} \;\begin{cases} < 0, & \text{if } \mu > \bar{y}, \\ = 0, & \text{if } \mu = \bar{y}, \\ > 0, & \text{if } \mu < \bar{y}. \end{cases}$$
Thus, over the set $\Omega = \{(\mu, \sigma^2) : \mu \ge \mu_0, \sigma^2 > 0\}$, $\ln L(\mu, \sigma^2)$ [and also $L(\mu, \sigma^2)$] is maximized at $\hat{\mu}$, where
$$\hat{\mu} = \begin{cases} \bar{y}, & \text{if } \bar{y} > \mu_0, \\ \mu_0, & \text{if } \bar{y} \le \mu_0. \end{cases}$$

Just as earlier, the value of $\sigma^2$ in $\Omega$ that maximizes $L(\mu, \sigma^2)$ is
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat{\mu})^2.$$
$L(\hat{\Omega})$ is obtained by replacing $\mu$ with $\hat{\mu}$ and $\sigma^2$ with $\hat{\sigma}^2$, which yields
$$L(\hat{\Omega}) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}^2} \sum_{i=1}^{n}(y_i - \hat{\mu})^2\right] = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} \left(\frac{1}{\hat{\sigma}^2}\right)^{n/2} e^{-n/2}.$$

Thus,
$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \left(\frac{\hat{\sigma}^2}{\hat{\sigma}_0^2}\right)^{n/2} = \begin{cases} \left[\dfrac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \mu_0)^2}\right]^{n/2}, & \text{if } \bar{y} > \mu_0, \\[2ex] 1, & \text{if } \bar{y} \le \mu_0. \end{cases}$$

Notice that λ is always less than or equal to 1. Thus, "small" values of λ are those less than some $k < 1$. Because
$$\sum_{i=1}^{n}(y_i - \mu_0)^2 = \sum_{i=1}^{n}[(y_i - \bar{y}) + (\bar{y} - \mu_0)]^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2,$$
if $k < 1$, it follows that the rejection region, $\lambda \le k$, is equivalent to
$$\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \mu_0)^2} < k^{2/n} = k'$$
$$\frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2} < k'$$
$$\frac{1}{1 + \dfrac{n(\bar{y} - \mu_0)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}} < k'.$$

This inequality in turn is equivalent to
$$\frac{n(\bar{y} - \mu_0)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} > \frac{1}{k'} - 1 = k''$$
$$\frac{n(\bar{y} - \mu_0)^2}{\dfrac{1}{n-1} \sum_{i=1}^{n}(y_i - \bar{y})^2} > (n - 1)k''$$
or, because $\bar{y} > \mu_0$ when $\lambda < k < 1$,
$$\frac{\sqrt{n}\,(\bar{y} - \mu_0)}{s} > \sqrt{(n - 1)k''},$$
where
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n}(y_i - \bar{y})^2.$$

Notice that $\sqrt{n}\,(\bar{Y} - \mu_0)/S$ is the t statistic employed in previous sections. Consequently, the likelihood ratio test is equivalent to the t test of Section 10.8.
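As a quick numerical check of this equivalence (ours, not the text's), substituting $\sum(y_i - \mu_0)^2 = \sum(y_i - \bar{y})^2 + n(\bar{y} - \mu_0)^2$ into λ shows that, when $\bar{y} > \mu_0$, $\lambda = [1 + t^2/(n-1)]^{-n/2}$, so λ and t determine one another monotonically. A short sketch, assuming NumPy is available and using simulated data:

```python
# Check that lambda from Example 10.24 is a monotone function of the t statistic:
# when ybar > mu0, lambda = [1 + t^2/(n - 1)]^(-n/2), so {lambda <= k} matches
# {t >= c} for a suitable c.  The data below are simulated, not from the text.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=5.4, scale=2.0, size=12)
mu0, n = 5.0, len(y)
ybar, s = y.mean(), y.std(ddof=1)              # sample mean and std (divisor n - 1)

t = np.sqrt(n) * (ybar - mu0) / s
if ybar > mu0:
    lam = (np.sum((y - ybar) ** 2) / np.sum((y - mu0) ** 2)) ** (n / 2)
else:
    lam = 1.0                                  # lambda = 1 when ybar <= mu0

print("t =", t)
print("lambda (direct) =", lam)
print("lambda (via t)  =", (1 + t ** 2 / (n - 1)) ** (-n / 2))  # agrees when ybar > mu0
```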

Situations in which the likelihood ratio test assumes a well-known form are not
uncommon. In fact, all the tests of Sections 10.8 and 10.9 can be obtained by the
likelihood ratio method. For most practical problems, the likelihood ratio method
produces the best possible test, in terms of power.
Unfortunately, the likelihood ratio method does not always produce a test statistic
with a known probability distribution, such as the t statistic of Example 10.24. If the
sample size is large, however, we can obtain an approximation to the distribution of λ
if some reasonable “regularity conditions” are satisfied by the underlying population
distribution(s). These are general conditions that hold for most (but not all) of the
distributions that we have considered. The regularity conditions mainly involve the
existence of derivatives, with respect to the parameters, of the likelihood function.
Another key condition is that the region over which the likelihood function is positive
cannot depend on unknown parameter values.

THEOREM 10.2 Let $Y_1, Y_2, \ldots, Y_n$ have joint likelihood function $L(\Theta)$. Let $r_0$ denote the number of free parameters that are specified (fixed) by $H_0 : \Theta \in \Omega_0$ and let $r$ denote the number of free parameters specified (fixed) by the statement $\Theta \in \Omega$. Then, for large n, $-2\ln(\lambda)$ has approximately a $\chi^2$ distribution with $r_0 - r$ df.

The proof of this result is beyond the scope of this text. Theorem 10.2 allows us to use the table of the $\chi^2$ distribution to find rejection regions with fixed α when n is large. Notice that $-2\ln(\lambda)$ is a decreasing function of λ. Because the likelihood ratio test specifies that we use RR: $\{\lambda < k\}$, this rejection region may be rewritten as RR: $\{-2\ln(\lambda) > -2\ln(k) = k^*\}$. For large sample sizes, if we desire an α-level test, Theorem 10.2 implies that $k^* \approx \chi^2_\alpha$. That is, a large-sample likelihood ratio test has rejection region given by
$$-2\ln(\lambda) > \chi^2_\alpha, \quad \text{where } \chi^2_\alpha \text{ is based on } r_0 - r \text{ df}.$$
The size of the sample necessary for a "good" approximation varies from application to application. It is important to realize that large-sample likelihood ratio tests are based on $-2\ln(\lambda)$, where λ is the original likelihood ratio, $\lambda = L(\hat{\Omega}_0)/L(\hat{\Omega})$. A short sketch of this large-sample recipe follows.
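The following is a minimal sketch, assuming SciPy is available; the observed value of $-2\ln(\lambda)$ below is a hypothetical placeholder, not a number from the text.

```python
# Large-sample likelihood ratio test per Theorem 10.2: reject H0 when
# -2 ln(lambda) exceeds the upper-alpha point of chi-square with r0 - r df.
from scipy import stats

neg2_log_lambda = 5.8     # hypothetical observed value of -2 ln(lambda)
df, alpha = 1, 0.05       # df = r0 - r for the problem at hand

cutoff = stats.chi2.ppf(1 - alpha, df)        # chi-square upper-alpha cutoff
p_value = stats.chi2.sf(neg2_log_lambda, df)  # approximate p-value
print(f"cutoff = {cutoff:.3f}, reject H0: {neg2_log_lambda > cutoff}, p = {p_value:.4f}")
```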

EXAMPLE 10.25 Suppose that an engineer wishes to compare the number of complaints per week filed by union stewards for two different shifts at a manufacturing plant. One hundred independent observations on the number of complaints gave means $\bar{x} = 20$ for shift 1 and $\bar{y} = 22$ for shift 2. Assume that the number of complaints per week on the ith shift has a Poisson distribution with mean $\theta_i$, for $i = 1, 2$. Use the likelihood ratio method to test $H_0 : \theta_1 = \theta_2$ versus $H_a : \theta_1 \ne \theta_2$ with $\alpha \approx .01$.

Solution The likelihood of the sample is now the joint probability function of all the $x_i$'s and $y_j$'s and is given by
$$L(\theta_1, \theta_2) = \frac{1}{k}\, \theta_1^{\sum x_i} e^{-n\theta_1}\, \theta_2^{\sum y_j} e^{-n\theta_2},$$
where $k = x_1! \cdots x_n!\, y_1! \cdots y_n!$ and $n = 100$. In this example, $\Theta = (\theta_1, \theta_2)$ and $\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2 = \theta\}$, where θ is unknown. Hence, under $H_0$ the likelihood function is a function of the single parameter θ, and
$$L(\theta) = \frac{1}{k}\, \theta^{\sum x_i + \sum y_j}\, e^{-2n\theta}.$$
Notice that, for $\Theta \in \Omega_0$, $L(\theta)$ is maximized when θ is equal to its maximum likelihood estimate,
$$\hat{\theta} = \frac{1}{2n}\left(\sum_{i=1}^{n} x_i + \sum_{j=1}^{n} y_j\right) = \frac{1}{2}(\bar{x} + \bar{y}).$$
In this example, $\Omega_a = \{(\theta_1, \theta_2) : \theta_1 \ne \theta_2\}$ and $\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$. Using the general likelihood $L(\theta_1, \theta_2)$, a function of both $\theta_1$ and $\theta_2$, we see that $L(\theta_1, \theta_2)$ is maximized when $\hat{\theta}_1 = \bar{x}$ and $\hat{\theta}_2 = \bar{y}$, the maximum likelihood estimates of $\theta_1$ and $\theta_2$, respectively. That is, $L(\theta_1, \theta_2)$ is maximized when both $\theta_1$ and $\theta_2$ are replaced by their maximum likelihood estimates. Thus,
$$\lambda = \frac{L(\hat{\Omega}_0)}{L(\hat{\Omega})} = \frac{k^{-1}\, \hat{\theta}^{\,n\bar{x} + n\bar{y}}\, e^{-2n\hat{\theta}}}{k^{-1}\, \hat{\theta}_1^{\,n\bar{x}}\, \hat{\theta}_2^{\,n\bar{y}}\, e^{-n\hat{\theta}_1 - n\hat{\theta}_2}} = \frac{\hat{\theta}^{\,n\bar{x} + n\bar{y}}}{\bar{x}^{\,n\bar{x}}\, \bar{y}^{\,n\bar{y}}},$$
where the exponential factors cancel because $n\hat{\theta}_1 + n\hat{\theta}_2 = n\bar{x} + n\bar{y} = 2n\hat{\theta}$.
Notice that λ is a complicated function of $\bar{x}$ and $\bar{y}$. The observed value of $\hat{\theta}$ is $(1/2)(\bar{x} + \bar{y}) = (1/2)(20 + 22) = 21$. The observed value of λ is
$$\lambda = \frac{21^{(100)(20+22)}}{20^{(100)(20)}\, 22^{(100)(22)}},$$
and hence
$$-2\ln(\lambda) = -2\,[\,4200\ln(21) - 2000\ln(20) - 2200\ln(22)\,] = 9.53.$$
In this application, the number of free parameters in $\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0, \theta_2 > 0\}$ is $k = 2$. In $\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2 = \theta\}$, $r_0 = 1$ of these free parameters is fixed. In the set $\Omega$, $r = 0$ of the parameters are fixed. Theorem 10.2 implies that $-2\ln(\lambda)$ has approximately a $\chi^2$ distribution with $r_0 - r = 1 - 0 = 1$ df. Small values of λ correspond to large values of $-2\ln(\lambda)$, so the rejection region for a test at approximately the $\alpha = .01$ level contains the values of $-2\ln(\lambda)$ that exceed $\chi^2_{.01} = 6.635$, the value that cuts off an area of .01 in the right-hand tail of a $\chi^2$ density with 1 df. Because the observed value of $-2\ln(\lambda)$ is larger than $\chi^2_{.01}$, we reject $H_0 : \theta_1 = \theta_2$. We conclude, at approximately the $\alpha = .01$ level of significance, that the mean numbers of complaints filed by the union stewards do differ.
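The arithmetic in this example is easy to reproduce; a short sketch, assuming NumPy and SciPy are available:

```python
# Reproducing the computation of Example 10.25:
# -2 ln(lambda) = -2[ n(xbar + ybar) ln(theta_hat) - n xbar ln(xbar) - n ybar ln(ybar) ].
import numpy as np
from scipy import stats

n, xbar, ybar = 100, 20.0, 22.0
theta_hat = 0.5 * (xbar + ybar)               # pooled Poisson mean, 21

neg2_log_lambda = -2.0 * (n * (xbar + ybar) * np.log(theta_hat)
                          - n * xbar * np.log(xbar)
                          - n * ybar * np.log(ybar))
print(round(neg2_log_lambda, 2))              # 9.53, as in the text
print(stats.chi2.ppf(0.99, df=1))             # 6.635, so H0 is rejected at alpha = .01
```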

Exercises
10.105 Let $Y_1, Y_2, \ldots, Y_n$ denote a random sample from a normal distribution with mean µ (unknown) and variance $\sigma^2$. For testing $H_0 : \sigma^2 = \sigma_0^2$ against $H_a : \sigma^2 > \sigma_0^2$, show that the likelihood ratio test is equivalent to the $\chi^2$ test given in Section 10.9.
10.106 A survey of voter sentiment was conducted in four midcity political wards to compare the fraction of voters favoring candidate A. Random samples of 200 voters were polled in each of the four wards, with the results as shown in the accompanying table. The numbers of voters favoring A in the four samples can be regarded as four independent binomial random variables. Construct a likelihood ratio test of the hypothesis that the fractions of voters favoring candidate A are the same in all four wards. Use α = .05. A numerical sketch of the computation follows the table.

                            Ward
Opinion            1     2     3     4   Total
Favor A           76    53    59    48     236
Do not favor A   124   147   141   152     564
Total            200   200   200   200     800
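One way to organize the computation this exercise asks for is sketched below, assuming NumPy and SciPy are available; the helper binom_log_lik is our name, and the binomial coefficients are omitted because they cancel in the ratio. Under $H_0$ the four wards share one proportion, estimated by pooling; under $\Omega$ each ward gets its own estimate, and Theorem 10.2 gives $4 - 1 = 3$ df.

```python
# Sketch of the likelihood ratio test of equal proportions across four wards.
# The binomial coefficients cancel between numerator and denominator of lambda,
# so they are dropped from the log-likelihood below.
import numpy as np
from scipy import stats

favor = np.array([76, 53, 59, 48])            # voters favoring A, by ward
n = np.array([200, 200, 200, 200])            # sample size in each ward

p_pool = favor.sum() / n.sum()                # pooled estimate under H0 (236/800)
p_i = favor / n                               # separate estimates under Omega

def binom_log_lik(p, y, n):
    """Binomial log-likelihood at proportion(s) p, without the constant term."""
    return np.sum(y * np.log(p) + (n - y) * np.log(1 - p))

neg2_log_lambda = -2.0 * (binom_log_lik(p_pool, favor, n)
                          - binom_log_lik(p_i, favor, n))
print("-2 ln(lambda) =", round(neg2_log_lambda, 2))
print("chi-square cutoff at alpha = .05, 3 df:", stats.chi2.ppf(0.95, df=3))
```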

10.107 Let $S_1^2$ and $S_2^2$ denote, respectively, the variances of independent random samples of sizes n and m selected from normal distributions with means $\mu_1$ and $\mu_2$ and common variance $\sigma^2$. If $\mu_1$ and $\mu_2$ are unknown, construct a likelihood ratio test of $H_0 : \sigma^2 = \sigma_0^2$ against $H_a : \sigma^2 = \sigma_a^2$, assuming that $\sigma_a^2 > \sigma_0^2$.
10.108 Suppose that $X_1, X_2, \ldots, X_{n_1}$, $Y_1, Y_2, \ldots, Y_{n_2}$, and $W_1, W_2, \ldots, W_{n_3}$ are independent random samples from normal distributions with respective unknown means $\mu_1$, $\mu_2$, and $\mu_3$ and variances $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$.
a Find the likelihood ratio test for $H_0 : \sigma_1^2 = \sigma_2^2 = \sigma_3^2$ against the alternative of at least one inequality.
b Find an approximate critical region for the test in part (a) if $n_1$, $n_2$, and $n_3$ are large and α = .05.
*10.109 Let $X_1, X_2, \ldots, X_m$ denote a random sample from the exponential density with mean $\theta_1$ and let $Y_1, Y_2, \ldots, Y_n$ denote an independent random sample from an exponential density with mean $\theta_2$.
a Find the likelihood ratio criterion for testing $H_0 : \theta_1 = \theta_2$ versus $H_a : \theta_1 \ne \theta_2$.
b Show that the test in part (a) is equivalent to an exact F test. [Hint: Transform $\sum X_i$ and $\sum Y_j$ to $\chi^2$ random variables.]

*10.110 Show that a likelihood ratio test depends on the data only through the value of a sufficient
statistic. [Hint: Use the factorization criterion.]
10.111 Suppose that we are interested in testing the simple null hypothesis $H_0 : \theta = \theta_0$ versus the simple alternative hypothesis $H_a : \theta = \theta_a$. According to the Neyman–Pearson lemma, the test that maximizes the power at $\theta_a$ has a rejection region determined by
$$\frac{L(\theta_0)}{L(\theta_a)} < k.$$
In the context of a likelihood ratio test, if we are interested in the simple $H_0$ and $H_a$, as stated, then $\Omega_0 = \{\theta_0\}$, $\Omega_a = \{\theta_a\}$, and $\Omega = \{\theta_0, \theta_a\}$.
a Show that the likelihood ratio λ is given by
$$\lambda = \frac{L(\theta_0)}{\max\{L(\theta_0), L(\theta_a)\}} = \frac{1}{\max\left\{1, \dfrac{L(\theta_a)}{L(\theta_0)}\right\}}.$$
b Argue that $\lambda < k$ if and only if, for some constant $k'$,
$$\frac{L(\theta_0)}{L(\theta_a)} < k'.$$
c What do the results in parts (a) and (b) imply about likelihood ratio tests when both the null and alternative hypotheses are simple?
