
Lecturer: Nguyen Duong Nguyen, Mathematics Department, Faculty of Basic Science, FTU

Chapter 7. Statistical hypothesis testing in economics and business

7.1 Concepts
7.1.1 Statistical Hypothesis
Definition. A statistical hypothesis is a hypothesis about: the probability distribution
of a random variable, the characteristic parameters of a random variable (E(X), V(X),
the proportion p...) or the independence of random variables.
● The hypothesis under study is denoted by H0 and is called the null hypothesis.
● Together with H0 we also consider a statement that contradicts it, called the alternative hypothesis and denoted by H1, so that if H0 is rejected, H1 is accepted.
Example. Consider the height of young people in province A. We can state the following pair of statistical hypotheses: H0: the average height of young people in province A is $\mu = 168$ cm. The corresponding alternative hypothesis can then be H1: $\mu > 168$ cm, or H1: $\mu < 168$ cm, or H1: $\mu \neq 168$ cm.

Definition. The method of using statistical tools, based on the information obtained on
the survey sample, to find a conclusion about accepting or rejecting a statistical
hypothesis is called statistical hypothesis testing.
Principle of small probability: If an event has a very small probability, it can in practice be assumed that the event will not occur in a single trial.
7.1.2 Standardized test statistic (Statistical hypothesis testing criterion)
From the original random variable X in the population, create a random sample
of size n: W = (X1, X2, …, Xn) and choose a statistic G = f( X1, X2, …, Xn, θ0), where
θ0 is a parameter related to the hypothesis to be tested.
If H0 is true, the probability distribution of G is determined.
The G-statistic is called the standardized test statistic (or simply as the test
statistic).
7.1.3 Rejection region
Given a small probability α (usually 0.05 or 0.01), we can find a corresponding region $W_\alpha$ such that, under the assumption that the hypothesis H0 is true, the probability that G takes a value in $W_\alpha$ equals α. This condition is written as
$$P(G \in W_\alpha \mid H_0) = \alpha.$$
Since α is small, by the principle of small probability the event $(G \in W_\alpha)$ can be considered not to occur in a single trial.
The value α is called the significance level of the test.
$W_\alpha$ is called the rejection region (or critical region) for H0 at the significance level α.
Note. For a given significance level α, it is possible to find an infinite number of
corresponding rejection regions.
7.1.4 The value of the test statistic (Observed value of the testing criterion)
From a specific sample w = (x1, x2, …, xn), a specific value of the test statistic
G is calculated:
Gqs = f( x1, x2, …, xn, θ0).
This value is called the value of the test statistic.
7.1.5 Statistical hypothesis testing rule
+) If $G_{qs} \in W_\alpha$, we conclude that H0 is false: reject H0 and accept H1.
+) If $G_{qs} \notin W_\alpha$, this does not confirm that H0 is true; it only means that this specific sample does not allow us to conclude that H0 is false. Therefore we can only say that, based on this specific sample, there is no basis to reject H0 (in practice, H0 is still accepted).
7.1.6 Type 1 error and type 2 error: With the above test rule, two errors can be
made:
1. Type 1 error: rejecting the hypothesis H0 when H0 is in fact true.
The probability of a type 1 error is α.
Indeed, if H0 is true then the probability that $(G \in W_\alpha)$ equals α:
$$P(G \in W_\alpha \mid H_0) = \alpha.$$
But whenever $(G \in W_\alpha)$ occurs we reject H0. Thus, the probability of making a type 1 error is α.
2. Type 2 error: accepting the hypothesis H0 when H0 is false.
A type 2 error occurs when $G_{qs} \notin W_\alpha$ while H1 is true. Let β denote the probability of making a type 2 error:
$$P(G \notin W_\alpha \mid H_1) = \beta.$$
The event of not making a type 2 error is the event $(G \in W_\alpha)$ while H1 is true. It is the complement of the event $(G \notin W_\alpha \mid H_1)$, so its probability is
$$P(G \in W_\alpha \mid H_1) = 1 - \beta.$$
The probability $1 - \beta$ is called the power of the test.
A statistical test would be ideal if it minimized both the probability of a type 1 error and the probability of a type 2 error. However, such an ideal test does not exist: for a fixed sample size n, decreasing the probability of a type 1 error increases the probability of a type 2 error, and vice versa.
In practice, we proceed as follows. After fixing a significance level α (that is, fixing the probability of a type 1 error at α) and a sample size n, among the infinitely many rejection regions available we choose the "best" one: the region for which the probability of a type 2 error is smallest, or equivalently the power of the test is largest. Thus, we need to find the rejection region $W_\alpha$ satisfying the conditions
$$\begin{cases} P(G \in W_\alpha \mid H_0) = \alpha, \\ P(G \in W_\alpha \mid H_1) = 1 - \beta \to \max. \end{cases}$$
7.2. Hypothesis test for the expected value
Let X be the original random variable in a population. X has the normal distribution with parameters μ and σ², $X \sim N(\mu, \sigma^2)$, where the expected value $E(X) = \mu$ is unknown.
If there is a basis to hypothesize that the value of μ equals $\mu_0$, we state the statistical hypothesis
H0: $\mu = \mu_0$.
To test this hypothesis, we draw from the population a sample of size n:
W = (X1, X2, …, Xn)

a) The case where $V(X) = \sigma^2$ is known (the population variance is known)
Choose the test statistic
$$U = \frac{(\bar{X} - \mu_0)\sqrt{n}}{\sigma}.$$
If the hypothesis H0 is true, then
$$U = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma}$$
and U has the standard normal distribution N(0; 1).
With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
+) The case of the one-tail test (the right-tail test) H1: μ > μ0: With the given significance level α, we can find the standard normal critical value $u_\alpha$ such that
$$P(G \in W_\alpha \mid H_0) = P(U > u_\alpha) = \alpha.$$
The rejection region is
$$W_\alpha = (u_\alpha; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: μ < μ0: With the given significance level α, we can find the standard normal critical value $u_{1-\alpha}$ such that
$$P(G \in W_\alpha \mid H_0) = P(U < u_{1-\alpha}) = P(U < -u_\alpha) = \alpha.$$
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha).$$
+) The case of the two-tail test H1: μ ≠ μ0: With the given significance level α, we can find two standard normal critical values $u_{1-\alpha/2}$ and $u_{\alpha/2}$ such that
$$P(G \in W_\alpha \mid H_0) = P(U < u_{1-\alpha/2}) + P(U > u_{\alpha/2}) = P(U < -u_{\alpha/2}) + P(U > u_{\alpha/2}) = P(|U| > u_{\alpha/2}) = \alpha.$$
The rejection region is
$$W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty).$$

From a specific sample w = (x1, x2, …, xn), we calculate the value of the test statistic
$$U_{qs} = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma}$$
and compare it with $W_\alpha$ to conclude:
- If $U_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $U_{qs} \notin W_\alpha$, there is no basis to reject H0.
Example 1. MSG (monosodium glutamate) is packed on an automatic line with a prescribed packed weight of 453 grams. Assume the packed weight of MSG is a normally distributed random variable with a standard deviation of 36 grams. A random check of 81 MSG packages gave an average weight of 448 grams. Can we conclude at the 1% significance level that the MSG packages are underweight?
Solution. Let X be the packed weight of MSG. According to the assumption, $X \sim N(\mu, \sigma^2)$ with σ = 36 grams. The average packed weight of MSG is E(X) = μ. This is a test of the parameter μ of a normally distributed random variable when the population variance is known.
The pair of hypotheses is: H0: μ = 453, H1: μ < 453.
The test statistic is
$$U = \frac{(\bar{X} - 453)\sqrt{81}}{36}.$$
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha).$$
With α = 0.01, we have $u_\alpha = u_{0.01} = 2.33$. Thus
$$W_\alpha = (-\infty; -2.33).$$
According to the data, $\bar{x} = 448$. Therefore, the value of the test statistic is
$$U_{qs} = \frac{(448 - 453)\sqrt{81}}{36} = -1.25.$$
Since $U_{qs} \notin W_\alpha$, we cannot reject H0. At the 1% significance level, the given sample provides insufficient evidence to infer that the MSG packages are underweight.
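The calculation in Example 1 can be checked with a few lines of Python. This is only a sketch using the standard library; the function name `z_test_left_tail` is illustrative and not part of the lecture, and the critical value is computed from the normal quantile rather than read from a table, so it comes out as about 2.326 instead of the rounded 2.33.

```python
from math import sqrt
from statistics import NormalDist


def z_test_left_tail(x_bar, mu0, sigma, n, alpha):
    """Left-tail test of H0: mu = mu0 against H1: mu < mu0 (sigma known)."""
    u_qs = (x_bar - mu0) * sqrt(n) / sigma      # observed value of U
    u_alpha = NormalDist().inv_cdf(1 - alpha)   # P(U > u_alpha) = alpha
    reject = u_qs < -u_alpha                    # W_alpha = (-inf; -u_alpha)
    return u_qs, u_alpha, reject


# Data of Example 1: 81 packages, mean 448 g, sigma = 36 g, norm 453 g.
u_qs, u_alpha, reject = z_test_left_tail(
    x_bar=448, mu0=453, sigma=36, n=81, alpha=0.01
)
print(f"U_qs = {u_qs:.2f}, u_alpha = {u_alpha:.3f}, reject H0: {reject}")
```

The decision agrees with the solution above: the observed value does not fall in the rejection region, so H0 is not rejected.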
● Calculating the probability of a type 2 error (β): Let $\mu_0$ denote the hypothesized value of μ and $\mu_1$ the actual value of μ.
For a one-tail test, the probability of a type 2 error is
$$\beta = P\left(U < u_\alpha - \frac{|\mu_0 - \mu_1|}{\sigma}\sqrt{n}\right),$$
where U has the standard normal distribution N(0; 1).
For a two-tail test, the probability of a type 2 error is approximately
$$\beta \approx P\left(U < u_{\alpha/2} - \frac{|\mu_0 - \mu_1|}{\sigma}\sqrt{n}\right), \qquad (7.2)$$
where U has the standard normal distribution N(0; 1).
Example 2. Referring to Example 1, if the actual average packed weight of MSG is 441 grams, what is the probability of a type 2 error?
Solution. With $\mu_0 = 453$, $\mu_1 = 441$, σ = 36, n = 81 and $u_{0.01} = 2.33$, we have
$$\beta = P\left(U < u_\alpha + \frac{\mu_1 - \mu_0}{\sigma}\sqrt{n}\right) = P\left(U < 2.33 + \frac{441 - 453}{36}\sqrt{81}\right) = P(U < -0.67) = P(U > 0.67).$$
Moreover, $u_{0.2514} = 0.67$. Thus
$$\beta = P(U > 0.67) = P(U > u_{0.2514}) = 0.2514.$$
Therefore, if the actual average packed weight of MSG is 441 grams, the probability of a type 2 error is 0.2514.
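As a rough check of Example 2, the sketch below evaluates the one-tail type 2 error probability with the standard normal distribution function. The helper name `type2_error_one_tail` is an assumption made for illustration; it takes the rounded table value 2.33 as input so that the result can be compared with the figure in the text, and it assumes the actual mean lies on the side named by H1.

```python
from math import sqrt
from statistics import NormalDist


def type2_error_one_tail(mu0, mu1, sigma, n, u_alpha):
    """beta = P(U < u_alpha - |mu0 - mu1| * sqrt(n) / sigma), U ~ N(0; 1)."""
    shift = abs(mu0 - mu1) * sqrt(n) / sigma
    return NormalDist().cdf(u_alpha - shift)


# Example 2: hypothesized mean 453 g, actual mean 441 g, u_0.01 = 2.33.
beta = type2_error_one_tail(mu0=453, mu1=441, sigma=36, n=81, u_alpha=2.33)
power = 1 - beta
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

The power of the test, 1 - β, follows directly from the same number.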
● Finding the sample size given α and β: Suppose the probability of a type 1 error is to be α and the probability of a type 2 error is not to exceed β when the actual value $\mu_1$ differs from $\mu_0$ by a given amount Δ. The minimum sample size k required is the smallest positive integer satisfying:
+) if the test is a one-tail test:
$$k \ge \frac{\sigma^2 (u_\alpha + u_\beta)^2}{\Delta^2};$$
+) if the test is a two-tail test:
$$k \ge \frac{\sigma^2 (u_{\alpha/2} + u_\beta)^2}{\Delta^2}.$$

b) The case where $V(X) = \sigma^2$ is unknown
Choose the test statistic
$$T = \frac{(\bar{X} - \mu_0)\sqrt{n}}{S}.$$
If the null hypothesis H0 is true, then
$$T = \frac{(\bar{X} - \mu)\sqrt{n}}{S}$$
and T has the Student distribution with (n - 1) degrees of freedom.
With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
+) The case of the one-tail test (the right-tail test) H1: μ > μ0: With the given significance level α, we can find the Student critical value $t_\alpha^{(n-1)}$ such that
$$P(G \in W_\alpha \mid H_0) = P(T > t_\alpha^{(n-1)}) = \alpha.$$
We get the rejection region
$$W_\alpha = (t_\alpha^{(n-1)}; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: μ < μ0: With the given significance level α, we can find the Student critical value $t_{1-\alpha}^{(n-1)}$ such that
$$P(G \in W_\alpha \mid H_0) = P(T < t_{1-\alpha}^{(n-1)}) = P(T < -t_\alpha^{(n-1)}) = \alpha.$$
We get the rejection region
$$W_\alpha = (-\infty; -t_\alpha^{(n-1)}).$$
+) The case of the two-tail test H1: μ ≠ μ0: With the given significance level α, we can find two Student critical values $t_{\alpha/2}^{(n-1)}$ and $t_{1-\alpha/2}^{(n-1)}$ such that
$$P(G \in W_\alpha \mid H_0) = P(T < t_{1-\alpha/2}^{(n-1)}) + P(T > t_{\alpha/2}^{(n-1)}) = P(T < -t_{\alpha/2}^{(n-1)}) + P(T > t_{\alpha/2}^{(n-1)}) = P(|T| > t_{\alpha/2}^{(n-1)}) = \alpha.$$
We get the two-sided rejection region
$$W_\alpha = (-\infty; -t_{\alpha/2}^{(n-1)}) \cup (t_{\alpha/2}^{(n-1)}; +\infty). \qquad (7.5)$$
From a specific sample w = (x1, x2, …, xn), we calculate $\bar{x}$ and s, then the value of the test statistic
$$T_{qs} = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s}.$$
Compare $T_{qs}$ with the rejection region $W_\alpha$ and conclude:
- If $T_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $T_{qs} \notin W_\alpha$, there is no basis to reject H0.
Example. The time norm for completing a product is 14 minutes. Tracking the time taken by 25 workers to complete the product gave the following data:
Time to complete one product (minutes)    Number of workers
10-12                                     2
12-14                                     6
14-16                                     10
16-18                                     4
18-20                                     3
Is it necessary to change the norm? Draw a conclusion at the 5% significance level, given that the time to complete a product is a normally distributed random variable.

Solution. Let X be the time to complete one product. We have $X \sim N(\mu, \sigma^2)$, and the average time to complete a product is E(X) = μ.
This is a problem of testing a hypothesis about the parameter μ of a random variable X with the normal distribution $N(\mu, \sigma^2)$ when σ² is unknown.
According to the requirements of the problem, we must test the following pair of hypotheses:
H0: μ = 14, H1: μ ≠ 14.
The test statistic is
$$T = \frac{(\bar{X} - 14)\sqrt{25}}{S}.$$
The rejection region is
$$W_\alpha = (-\infty; -t_{\alpha/2}^{(n-1)}) \cup (t_{\alpha/2}^{(n-1)}; +\infty).$$
With n = 25 and α = 0.05, we have $t_{\alpha/2}^{(n-1)} = t_{0.025}^{(24)} = 2.064$.
Thus, $W_\alpha = (-\infty; -2.064) \cup (2.064; +\infty)$.

From the specific sample, we use the following table to calculate $\bar{x}$ and s (each class is represented by its midpoint xi):

xi      ni      ni·xi     ni·xi²
11      2       22        242
13      6       78        1014
15      10      150       2250
17      4       68        1156
19      3       57        1083
Sum     25      375       5745

$$\bar{x} = \frac{1}{n}\sum n_i x_i = \frac{375}{25} = 15,$$
$$ms = \frac{1}{n}\sum n_i x_i^2 - \bar{x}^2 = \frac{5745}{25} - 15^2 = 4.8,$$
$$s^2 = \frac{n}{n-1}\,ms = \frac{25}{24}(4.8) = 5.$$
The value of the test statistic is
$$T_{qs} = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s} = \frac{15 - 14}{\sqrt{5}}\sqrt{25} = 2.236.$$
Since $T_{qs} \in W_\alpha$, we reject H0 and accept H1.
Conclusion: At the 5% significance level, there is enough evidence to infer that it is necessary to change the time norm to complete one product.
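The grouped-data computation above can be reproduced with the short sketch below. It uses class midpoints as in the table, and it takes the critical value 2.064 from the text's Student table as a given constant, since the Python standard library has no Student quantile function; all variable names are illustrative.

```python
from math import sqrt

# Class midpoints and frequencies from the worker-time example.
midpoints = [11, 13, 15, 17, 19]
counts = [2, 6, 10, 4, 3]

n = sum(counts)
x_bar = sum(c * x for c, x in zip(counts, midpoints)) / n
# ms: mean of squares minus squared mean (divisor n).
ms = sum(c * x**2 for c, x in zip(counts, midpoints)) / n - x_bar**2
s2 = n / (n - 1) * ms          # sample variance with divisor n - 1
s = sqrt(s2)

mu0 = 14
t_qs = (x_bar - mu0) * sqrt(n) / s

t_crit = 2.064                 # t_{0.025}^{(24)}, table value quoted in the text
reject = abs(t_qs) > t_crit    # two-tail rejection region
print(f"x_bar = {x_bar}, s^2 = {s2:.4f}, T_qs = {t_qs:.3f}, reject H0: {reject}")
```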
● Calculating the probability of a type 2 error (β): Let $\mu_0$ denote the hypothesized value of μ and $\mu_1$ the actual value of μ.
For a one-tail test, the probability of a type 2 error is
$$\beta = P\left(T < t_\alpha^{(n-1)} - \frac{|\mu_0 - \mu_1|}{s}\sqrt{n}\right),$$
where T has the Student distribution with (n - 1) degrees of freedom.
For a two-tail test, the probability of a type 2 error is approximately
$$\beta \approx P\left(T < t_{\alpha/2}^{(n-1)} - \frac{|\mu_0 - \mu_1|}{s}\sqrt{n}\right),$$
where T has the Student distribution with (n - 1) degrees of freedom.
● Finding the sample size given α and β: Suppose the probability of a type 1 error is to be α and the probability of a type 2 error is not to exceed β when the actual value $\mu_1$ differs from $\mu_0$ by a given amount Δ. The minimum sample size k required is the smallest positive integer satisfying:
+) if the test is a one-tail test:
$$k \ge \frac{s^2 \left(t_\alpha^{(n-1)} + t_\beta^{(n-1)}\right)^2}{\Delta^2};$$
+) if the test is a two-tail test:
$$k \ge \frac{s^2 \left(t_{\alpha/2}^{(n-1)} + t_\beta^{(n-1)}\right)^2}{\Delta^2},$$
where s² is the variance of a preliminary sample of size n, and $t_\alpha^{(n-1)}$, $t_{\alpha/2}^{(n-1)}$ and $t_\beta^{(n-1)}$ are the Student critical values corresponding to α, α/2 and β with n - 1 degrees of freedom.

7.3. Hypothesis testing on two expected values of two random variables


Suppose there are two populations in which the original random variables X1 and X2 have $E(X_1) = \mu_1$, $V(X_1) = \sigma_1^2$ and $E(X_2) = \mu_2$, $V(X_2) = \sigma_2^2$. If μ1 and μ2 are unknown but there is a basis for assuming that they are equal, we state the statistical hypothesis
H0: μ1 = μ2.
To test this hypothesis, we draw two independent samples of sizes n1 and n2 from the two populations:
$$W_1 = (X_{11}, X_{12}, \dots, X_{1n_1}),$$
$$W_2 = (X_{21}, X_{22}, \dots, X_{2n_2}).$$

a) The case where the variances $V(X_1) = \sigma_1^2$ and $V(X_2) = \sigma_2^2$ are known and X1, X2 are normally distributed: $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$
Choose the test statistic
$$U = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$
The statistic U has the standard normal distribution N(0; 1).
If the hypothesis H0 is true, we have
$$U = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
and U still has the standard normal distribution N(0; 1).


With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
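+) The case of the one-tail test (the right-tail test) H1: μ1 > μ2:
The rejection region is
$$W_\alpha = (u_\alpha; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: μ1 < μ2:
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha).$$
+) The case of the two-tail test H1: μ1 ≠ μ2:
The rejection region is
$$W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty).$$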
From two specific samples drawn from X1 and X2 respectively, $w_1 = (x_{11}, x_{12}, \dots, x_{1n_1})$ and $w_2 = (x_{21}, x_{22}, \dots, x_{2n_2})$, we calculate the sample means
$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}, \qquad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$$
and the value of the test statistic
$$U_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$
Compare $U_{qs}$ with the rejection region $W_\alpha$ to conclude:
- If $U_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $U_{qs} \notin W_\alpha$, there is no basis to reject H0.

b) The case where the variances $V(X_1) = \sigma_1^2$ and $V(X_2) = \sigma_2^2$ are unknown but assumed equal ($\sigma_1^2 = \sigma_2^2$) and X1, X2 are normally distributed: $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$
Choose the test statistic
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}.$$
The statistic T has the Student distribution with (n1 + n2 - 2) degrees of freedom.
If the hypothesis H0 is true, then
$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
and T still has the Student distribution with (n1 + n2 - 2) degrees of freedom.
With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
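+) The case of the one-tail test (the right-tail test) H1: μ1 > μ2:
The rejection region is
$$W_\alpha = (t_\alpha^{(n_1+n_2-2)}; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: μ1 < μ2:
The rejection region is
$$W_\alpha = (-\infty; -t_\alpha^{(n_1+n_2-2)}).$$
+) The case of the two-tail test H1: μ1 ≠ μ2:
The rejection region is
$$W_\alpha = (-\infty; -t_{\alpha/2}^{(n_1+n_2-2)}) \cup (t_{\alpha/2}^{(n_1+n_2-2)}; +\infty).$$
From two specific samples drawn from X1 and X2 respectively, $w_1 = (x_{11}, x_{12}, \dots, x_{1n_1})$ and $w_2 = (x_{21}, x_{22}, \dots, x_{2n_2})$, we calculate $\bar{x}_1$, $\bar{x}_2$, $s_1^2$, $s_2^2$ and the value of the test statistic
$$T_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{where } s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}.$$
Finally, compare $T_{qs}$ with $W_\alpha$ and draw the conclusion.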


Example. Two different methods of raising chickens were tested. After a month, the following weight gains were recorded:

Method I:  n1 = 100 chickens, $\bar{x}_1 = 1.1$ kg, $s_1^2 = 0.04$
Method II: n2 = 150 chickens, $\bar{x}_2 = 1.2$ kg, $s_2^2 = 0.09$

Can we conclude at the 5% significance level that method II is more effective than method I? Assume that the weight gain of chickens is normally distributed and $\sigma_1^2 = \sigma_2^2$.

Solution. Let X1 and X2 be the weight gain of chickens raised by methods I and II, respectively. We have $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$ with $\sigma_1^2 = \sigma_2^2$.
The pair of hypotheses: H0: μ1 = μ2, H1: μ1 < μ2.
The test statistic is
$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$
The rejection region is
$$W_\alpha = (-\infty; -t_\alpha^{(n_1+n_2-2)}).$$
At the significance level 0.05, $t_\alpha^{(n_1+n_2-2)} = t_{0.05}^{(248)} \approx u_{0.05} = 1.645$. So the rejection region is
$$W_\alpha = (-\infty; -1.645).$$
From the specific sample obtained, we calculate
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(100 - 1)0.04 + (150 - 1)0.09}{100 + 150 - 2}} \approx 0.265,$$
$$T_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{1.1 - 1.2}{0.265\sqrt{\dfrac{1}{100} + \dfrac{1}{150}}} \approx -2.923.$$
Since $T_{qs} \in W_\alpha$, we reject H0 and accept H1.
Conclusion: At the significance level of 0.05, it can be concluded that method II is more effective than method I.
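A small sketch of the pooled-variance calculation for the chicken example follows. The function name `pooled_t` is hypothetical, and the critical value 1.645 is the normal approximation quoted in the text. Because the code does not round the pooled standard deviation to 0.265 before dividing, it gives about -2.927 rather than -2.923; the decision is unchanged.

```python
from math import sqrt


def pooled_t(x1, s1_sq, n1, x2, s2_sq, n2):
    """Pooled standard deviation and observed T for H0: mu1 = mu2."""
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2))
    t_qs = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t_qs


# Method I: n1 = 100, mean 1.1, variance 0.04; Method II: n2 = 150, mean 1.2, variance 0.09.
sp, t_qs = pooled_t(1.1, 0.04, 100, 1.2, 0.09, 150)

t_crit = 1.645               # t_{0.05}^{(248)} approximated by u_{0.05}, as in the text
reject = t_qs < -t_crit      # left-tail rejection region
print(f"s_p = {sp:.4f}, T_qs = {t_qs:.3f}, reject H0: {reject}")
```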

c) The case where the variances $V(X_1) = \sigma_1^2$ and $V(X_2) = \sigma_2^2$ are unknown, X1 and X2 follow some probability distributions that are not necessarily normal, and the two samples are drawn independently with sizes n1 > 30 and n2 > 30:
The test statistic is
$$U = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$
With n1 > 30 and n2 > 30, the statistic U has approximately the standard normal distribution N(0; 1).
If the hypothesis H0 is true, we have
$$U = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$
With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
+) The case of the one-tail test (the right-tail test) H1: μ1 > μ2:
The rejection region is
$$W_\alpha = (u_\alpha; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: μ1 < μ2:
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha).$$
+) The case of the two-tail test H1: μ1 ≠ μ2:
The rejection region is
$$W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty).$$
From two specific samples drawn from X1 and X2 respectively, $w_1 = (x_{11}, x_{12}, \dots, x_{1n_1})$ and $w_2 = (x_{21}, x_{22}, \dots, x_{2n_2})$, we calculate $\bar{x}_1$, $\bar{x}_2$; $s_1^2$, $s_2^2$ and the value of the test statistic
$$U_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$
Compare $U_{qs}$ with the rejection region $W_\alpha$ to conclude:
- If $U_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $U_{qs} \notin W_\alpha$, there is no basis to reject H0.
Example 1. To study the weekly consumption of a product at agents in province A, sales revenue was randomly collected at 101 agents, with the following results:

Sales revenue (million VND)    25    26    27    28    29    30
Number of agents               10    18    30    22    15    6

Sales revenue was also randomly collected at 101 agents in province B, giving
$$\sum_{i=1}^{101} x_{Bi} = 2525, \qquad \sum_{i=1}^{101} x_{Bi}^2 = 63425.$$
Sales revenue is known to be a normally distributed random variable. Can we conclude at the 5% significance level that the average revenue of agents in the two provinces is the same?

Solution. Let X1 and X2 be the sales revenue of agents in provinces A and B respectively.
The pair of statistical hypotheses: H0: μ1 = μ2, H1: μ1 ≠ μ2.
The test statistic is
$$U = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$
The rejection region is
$$W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty).$$
With α = 0.05, $u_{\alpha/2} = u_{0.025} = 1.96$, so
$$W_\alpha = (-\infty; -1.96) \cup (1.96; +\infty).$$
From the two samples (n1 = n2 = 101) we obtain
$$\bar{x}_1 = 27.3168, \; s_1^2 = 1.8386; \qquad \bar{x}_2 = 25, \; s_2^2 = 3.$$
The value of the test statistic is
$$U_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \approx 10.585 \in W_\alpha,$$
so we reject H0 and accept H1.
Therefore, at the significance level of 5%, we can conclude that the average revenue of agents in the two provinces A and B is different.
Example 2. Two classes take the same statistics course and the results of the final exam are as follows:

Class A: n1 = 64, $\bar{x}_1 = 73.2$, s1 = 10.9
Class B: n2 = 68, $\bar{x}_2 = 76.6$, s2 = 11.2

Can we conclude at the 5% significance level that the average exam result of class B is higher than that of class A?
Solution. Let X1 and X2 be the statistics exam results of students in class A and class B respectively. The average exam results in class A and class B are then μ1 and μ2 respectively.

This is the problem of testing the pair of hypotheses H0: μ1 = μ2, H1: μ1 < μ2 when the variances $\sigma_1^2$ and $\sigma_2^2$ are unknown.
Since n1 = 64 > 30 and n2 = 68 > 30, the test statistic is
$$U = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha) = (-\infty; -u_{0.05}).$$
We have $u_{0.05} = 1.645$. Thus, $W_\alpha = (-\infty; -1.645)$.
From the specific sample, the value of the test statistic is
$$U_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{73.2 - 76.6}{\sqrt{\dfrac{10.9^2}{64} + \dfrac{11.2^2}{68}}} = -1.76731.$$
Because $U_{qs} \in W_\alpha$, we reject H0 and accept H1.
Conclusion: At the 5% significance level, we conclude that the average exam result of class B is higher than that of class A.
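The large-sample comparison of the two classes can be checked with the sketch below. The helper `two_sample_z` is an illustrative name, not something defined in the lecture; the critical value is computed from the normal quantile (about 1.6449) instead of the rounded table value 1.645.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, s1, n1, x2, s2, n2):
    """Observed U for H0: mu1 = mu2 with large independent samples."""
    return (x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)


# Class A: n1 = 64, mean 73.2, s1 = 10.9; Class B: n2 = 68, mean 76.6, s2 = 11.2.
u_qs = two_sample_z(73.2, 10.9, 64, 76.6, 11.2, 68)

u_alpha = NormalDist().inv_cdf(0.95)   # upper 5% critical value
reject = u_qs < -u_alpha               # left-tail test, H1: mu1 < mu2
print(f"U_qs = {u_qs:.5f}, u_alpha = {u_alpha:.4f}, reject H0: {reject}")
```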
7.4. Hypothesis testing of the parameter p of a random variable with a zero-one distribution
Assume that the original random variable X in the population has the zero-one distribution with parameter p: X ~ A(p). If p is unknown but there is a basis to assume that its value equals p0, we state the statistical hypothesis
H0: p = p0.
From the population, draw a random sample of size n:
W = (X1, X2, …, Xn)
Suppose n and p satisfy the conditions
$$np > 5 \quad \text{and} \quad n(1 - p) > 5.$$
Choose the test statistic
$$U = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}},$$
where f is the sample proportion. If H0 is true, $U = \dfrac{(f - p)\sqrt{n}}{\sqrt{p(1 - p)}}$ has approximately the standard normal distribution N(0; 1). Therefore, with the significance level α and depending on the alternative hypothesis H1, the rejection region is determined as follows:
+) The case of the one-tail test (the right-tail test) H1: p > p0:
The rejection region is
$$W_\alpha = (u_\alpha; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: p < p0:
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha).$$
+) The case of the two-tail test H1: p ≠ p0:
The rejection region is
$$W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty).$$
From a specific sample w = (x1, x2, …, xn), we calculate the sample proportion f, then find the value of the test statistic according to the formula
$$U_{qs} = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}}.$$
Compare $U_{qs}$ with the rejection region $W_\alpha$ to conclude:
- If $U_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $U_{qs} \notin W_\alpha$, there is no basis to reject H0.
Example. Disease A can be cured with drug H. The company that manufactures drug H claims that 85% of patients taking its drug recover from the disease. Drug H was tested on 250 patients with disease A, and 195 of them recovered. Can we conclude at the 5% significance level that the company's claimed recovery rate is higher than the actual rate?
Solution. Let p be the proportion of patients with disease A who recover from the disease when taking drug H.
The pair of statistical hypotheses is: H0: p = 0.85; H1: p < 0.85.
Because np0 = 250(0.85) = 212.5 > 5 and n(1 - p0) = 250(0.15) = 37.5 > 5, we choose the test statistic
$$U = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}} = \frac{(f - 0.85)\sqrt{250}}{\sqrt{0.85(1 - 0.85)}}.$$
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha).$$
With the significance level α = 0.05, we have $u_\alpha = u_{0.05} = 1.645$. Thus, the rejection region is
$$W_\alpha = (-\infty; -1.645).$$
For n = 250 and m = 195, the sample proportion is
$$f = \frac{m}{n} = \frac{195}{250} = 0.78.$$
The value of the test statistic is
$$U_{qs} = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}} = \frac{(0.78 - 0.85)\sqrt{250}}{\sqrt{0.85(1 - 0.85)}} = -3.0997.$$
Since $U_{qs} \in W_\alpha$, we reject H0 and accept H1.
Conclusion: At the 5% significance level, there is enough evidence to infer that the recovery rate claimed by the company manufacturing drug H is higher than the actual rate.
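The drug example can be verified with the following sketch. The function `proportion_z` is a hypothetical helper written for this illustration; it also checks the large-sample conditions on n and p0 used in this section before computing the statistic.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(m, n, p0):
    """Sample proportion f and observed U for H0: p = p0."""
    if not (n * p0 > 5 and n * (1 - p0) > 5):
        raise ValueError("normal approximation conditions are not met")
    f = m / n
    u_qs = (f - p0) * sqrt(n) / sqrt(p0 * (1 - p0))
    return f, u_qs


# 195 recoveries among 250 patients, claimed rate 85%.
f, u_qs = proportion_z(m=195, n=250, p0=0.85)

u_alpha = NormalDist().inv_cdf(0.95)   # about 1.645
reject = u_qs < -u_alpha               # left-tail test, H1: p < 0.85
print(f"f = {f:.2f}, U_qs = {u_qs:.4f}, reject H0: {reject}")
```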
7.5. Hypothesis testing of two parameters p of two random variables with zero-one distributions
Suppose there are two populations under study, where X1 ~ A(p1) and X2 ~ A(p2). If p1 and p2 are unknown but there is a basis to believe that p1 = p2, we state the statistical hypothesis
H0: p1 = p2.
From the two populations, draw two random samples of sizes n1 and n2 (n1 > 30 and n2 > 30):
$$W_1 = (X_{11}, X_{12}, \dots, X_{1n_1})$$
and
$$W_2 = (X_{21}, X_{22}, \dots, X_{2n_2}).$$
Choose the test statistic
$$U = \frac{(f_1 - f_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}.$$
With n1 > 30 and n2 > 30, U has approximately the standard normal distribution N(0; 1).
If H0 is true (p1 = p2 = p), the test statistic becomes
$$U = \frac{f_1 - f_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$
Since p is unknown, it is replaced by its estimate
$$\bar{f} = \frac{n_1 f_1 + n_2 f_2}{n_1 + n_2}.$$
Therefore, we have the test statistic
$$U = \frac{f_1 - f_2}{\sqrt{\bar{f}(1 - \bar{f})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}},$$
which has approximately the standard normal distribution N(0; 1) if n1 > 30 and n2 > 30.
With the significance level α and depending on the alternative hypothesis H1, the rejection region is determined as follows:
+) The case of the one-tail test (the right-tail test) H1: p1 > p2:
$$W_\alpha = (u_\alpha; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: p1 < p2:
$$W_\alpha = (-\infty; -u_\alpha).$$
+) The case of the two-tail test H1: p1 ≠ p2:
$$W_\alpha = (-\infty; -u_{\alpha/2}) \cup (u_{\alpha/2}; +\infty).$$
For specific samples $w_1 = (x_{11}, x_{12}, \dots, x_{1n_1})$ and $w_2 = (x_{21}, x_{22}, \dots, x_{2n_2})$, we obtain the sample proportions $f_1$, $f_2$ and $\bar{f}$. The value of the test statistic is then calculated according to the formula
$$U_{qs} = \frac{f_1 - f_2}{\sqrt{\bar{f}(1 - \bar{f})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$
Compare $U_{qs}$ with the rejection region $W_\alpha$ to conclude:
- If $U_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $U_{qs} \notin W_\alpha$, there is no basis to reject H0.
Example. In two factories A and B, there are the following data on employees:
Factory A has 200 workers, in 1997, 30 people quit their jobs. Factory B has 350
workers, in 1997, 65 people quit their jobs. Can we conclude at the 5% significance
level that the proportion of workers leaving factory A is lower than that of factory B?
Solution. Let p1 and p2 be the proportion of workers quitting in factories A and B,
respectively.
The pair of statistical hypotheses are: H0: p1 = p2; H1: p1 < p2
The test statistic is
$$U = \frac{f_1 - f_2}{\sqrt{\bar{f}(1 - \bar{f})\left(\dfrac{1}{200} + \dfrac{1}{350}\right)}}.$$
The rejection region is
$$W_\alpha = (-\infty; -u_\alpha) = (-\infty; -1.645).$$
We have
$$f_1 = \frac{30}{200} = 0.15; \qquad f_2 = \frac{65}{350} = \frac{13}{70},$$
$$\bar{f} = \frac{n_1 f_1 + n_2 f_2}{n_1 + n_2} = \frac{200(0.15) + 350(13/70)}{550} = \frac{19}{110}.$$
From there, we obtain the value of the test statistic:
$$U_{qs} = \frac{0.15 - 13/70}{\sqrt{(19/110)(1 - 19/110)\left(\dfrac{1}{200} + \dfrac{1}{350}\right)}} = -1.0659 \notin W_\alpha.$$
Thus, we cannot reject H0, which means that there is not enough evidence to infer that the proportion of workers quitting in factory A is lower than in factory B.
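The factory example can be reproduced with the sketch below. The function `two_proportion_z` is an illustrative helper; it works from the raw counts, so the pooled proportion is computed as (m1 + m2)/(n1 + n2), which equals the weighted form used in the text.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(m1, n1, m2, n2):
    """Sample proportions, pooled proportion and observed U for H0: p1 = p2."""
    f1, f2 = m1 / n1, m2 / n2
    f_bar = (m1 + m2) / (n1 + n2)      # same as (n1*f1 + n2*f2) / (n1 + n2)
    u_qs = (f1 - f2) / sqrt(f_bar * (1 - f_bar) * (1 / n1 + 1 / n2))
    return f1, f2, f_bar, u_qs


# Factory A: 30 of 200 workers quit; factory B: 65 of 350 workers quit.
f1, f2, f_bar, u_qs = two_proportion_z(30, 200, 65, 350)

u_alpha = NormalDist().inv_cdf(0.95)   # about 1.645
reject = u_qs < -u_alpha               # left-tail test, H1: p1 < p2
print(f"f1 = {f1:.4f}, f2 = {f2:.4f}, f_bar = {f_bar:.4f}, U_qs = {u_qs:.4f}")
print("reject H0:", reject)
```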
7.6. Hypothesis testing for the variance
Assume that the original random variable X in the population has the normal distribution $N(\mu, \sigma^2)$, where $V(X) = \sigma^2$ is unknown but there is a basis to assume that its value equals $\sigma_0^2$. We state the statistical hypothesis
H0: $\sigma^2 = \sigma_0^2$.
To test this hypothesis, we draw from the population a random sample of size n:
W = (X1, X2, …, Xn)
and choose the test statistic
$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}.$$
If the hypothesis H0 is true, the statistic $\chi^2$ has the chi-square distribution with n - 1 degrees of freedom. Therefore, with a significance level α and depending on the form of the alternative hypothesis H1, the rejection region $W_\alpha$ is constructed as follows:

+) The case of the one-tail test (the right-tail test) H1: $\sigma^2 > \sigma_0^2$:
The rejection region is
$$W_\alpha = (\chi^{2(n-1)}_{\alpha}; +\infty).$$
+) The case of the one-tail test (the left-tail test) H1: $\sigma^2 < \sigma_0^2$:
The rejection region is
$$W_\alpha = (-\infty; \chi^{2(n-1)}_{1-\alpha}).$$
+) The case of the two-tail test H1: $\sigma^2 \neq \sigma_0^2$:
The rejection region is
$$W_\alpha = (-\infty; \chi^{2(n-1)}_{1-\alpha/2}) \cup (\chi^{2(n-1)}_{\alpha/2}; +\infty).$$
With a specific sample w = (x1, x2, …, xn), we calculate the sample variance $s^2$ and the value of the test statistic
$$\chi^2_{qs} = \frac{(n - 1)s^2}{\sigma_0^2}.$$
Compare $\chi^2_{qs}$ with the rejection region $W_\alpha$ to conclude:
- If $\chi^2_{qs} \in W_\alpha$, reject H0 and accept H1.
- If $\chi^2_{qs} \notin W_\alpha$, there is no basis to reject H0.

Example. The weight of chicks at birth is a normally distributed random variable. Because it is suspected that the uniformity of chick weight has decreased, 12 chicks were weighed and the sample variance was found to be $s^2 = 11.41$ (grams)². At the 5% significance level, draw a conclusion about this suspicion, given that the dispersion of chick weight is normally 10 (grams)².
Solution. Let X be the weight of chicks at birth. According to the assumption, $X \sim N(\mu, \sigma^2)$. The dispersion of chick weight is measured by $\sigma^2$: the larger $\sigma^2$ is, the less uniform the weights. This is the hypothesis testing problem
H0: $\sigma^2 = 10$; H1: $\sigma^2 > 10$.
The test statistic is
$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{11 S^2}{10}.$$
The rejection region is
$$W_\alpha = (\chi^{2(n-1)}_{\alpha}; +\infty).$$
With the significance level 0.05, we have $\chi^{2(n-1)}_{\alpha} = \chi^{2(11)}_{0.05} = 19.68$. Therefore, the rejection region is
$$W_\alpha = (19.68; +\infty).$$
From the specific sample, the value of the test statistic is
$$\chi^2_{qs} = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{11(11.41)}{10} = 12.551.$$
Since $\chi^2_{qs} \notin W_\alpha$, we cannot reject the hypothesis H0, which means there is not enough evidence to infer that the uniformity of chick weight has decreased.
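The variance test for the chick example reduces to one line of arithmetic, sketched below. The helper name `variance_chi2` is hypothetical, and the critical value 19.68 is taken from the chi-square table quoted in the text, because the Python standard library does not provide chi-square quantiles.

```python
def variance_chi2(s2, n, sigma0_sq):
    """Observed chi-square statistic (n - 1) * s^2 / sigma0^2."""
    return (n - 1) * s2 / sigma0_sq


# 12 chicks, sample variance 11.41 g^2, usual variance 10 g^2.
chi2_qs = variance_chi2(s2=11.41, n=12, sigma0_sq=10)

chi2_crit = 19.68              # chi-square critical value for alpha = 0.05, 11 d.f. (from the text)
reject = chi2_qs > chi2_crit   # right-tail rejection region
print(f"chi2_qs = {chi2_qs:.3f}, reject H0: {reject}")
```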


7.7. Hypothesis testing on two variances of two normally distributed random
variables
Suppose there are two study populations in which the original random variables

X1 and X 2 have the same normal distribution X1 ~ N(1, 12 ) , X2 ~ N(2 , 22 ) . If 12

and 22 are unknown, but there is a basis to assume that their values are equal, we
make a statistical hypothesis
H 0 : 12 22 .

To test the above hypothesis, from two populations, draw two independent
samples of size n1 and n2:

$$W_1 = (X_{11}, X_{12}, \ldots, X_{1n_1})$$
$$W_2 = (X_{21}, X_{22}, \ldots, X_{2n_2})$$

Choose the test statistic:
$$F = \frac{S_1^2}{S_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2} \quad (\text{if } S_1^2 \ge S_2^2).$$
The F-statistic has the Fisher distribution with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom.
If the hypothesis H0 is true, the test statistic takes the form
$$F = \frac{S_1^2}{S_2^2},$$
and it still follows the $F(n_1 - 1;\ n_2 - 1)$ distribution.
With a significance level α and depending on the form of the alternative hypothesis H1, the rejection region Wα is built in the following cases:

+) The case of the one-tail test (the right-tail test), $H_1: \sigma_1^2 > \sigma_2^2$:
The rejection region Wα is
$$W_\alpha = \left(f_{\alpha}^{(n_1-1,\,n_2-1)};\ +\infty\right)$$

+) The case of the one-tail test (the left-tail test), $H_1: \sigma_1^2 < \sigma_2^2$:
The rejection region Wα is
$$W_\alpha = \left(-\infty;\ f_{1-\alpha}^{(n_1-1,\,n_2-1)}\right)$$

+) The case of the two-tail test, $H_1: \sigma_1^2 \ne \sigma_2^2$:
The rejection region Wα is
$$W_\alpha = \left(-\infty;\ f_{1-\alpha/2}^{(n_1-1,\,n_2-1)}\right) \cup \left(f_{\alpha/2}^{(n_1-1,\,n_2-1)};\ +\infty\right).$$

For specific samples $w_1 = (x_{11}, x_{12}, \ldots, x_{1n_1})$ and $w_2 = (x_{21}, x_{22}, \ldots, x_{2n_2})$, we can calculate the sample variances $s_1^2$ and $s_2^2$. From there, the value of the test statistic can be calculated according to the following formula:
$$F_{qs} = \frac{s_1^2}{s_2^2}.$$
Compare $F_{qs}$ with the rejection region $W_\alpha$ to conclude:


- If $F_{qs} \in W_\alpha$, reject H0, accept H1.

- If $F_{qs} \notin W_\alpha$, there is no basis to reject H0.
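
The decision rule for the three forms of H1 can be sketched in Python as follows. This is only an illustration: the function names and the labels "greater", "less" and "two-sided" are assumptions introduced here, and the critical values are expected to be read from an F table by the caller. The numbers in the usage lines are hypothetical and serve only to exercise the function.

```python
def f_statistic(s1_sq, s2_sq):
    """Observed value F_qs = s1^2 / s2^2 (the form of the statistic under H0)."""
    return s1_sq / s2_sq


def in_rejection_region(f_obs, alternative, lower=None, upper=None):
    """Return True if f_obs lies in the rejection region W_alpha.

    alternative: "greater"   for H1: var1 > var2  (right tail, needs `upper`)
                 "less"      for H1: var1 < var2  (left tail, needs `lower`)
                 "two-sided" for H1: var1 != var2 (needs both)
    lower, upper: critical values taken from an F table by the caller.
    """
    if alternative == "greater":
        return f_obs > upper
    if alternative == "less":
        return f_obs < lower
    if alternative == "two-sided":
        return f_obs < lower or f_obs > upper
    raise ValueError("unknown alternative: " + alternative)


# Hypothetical numbers, not taken from the lecture.
f_obs = f_statistic(8.0, 4.0)
print(in_rejection_region(f_obs, "two-sided", lower=0.5, upper=1.5))
```

Passing the critical values in as arguments keeps the sketch independent of any particular statistics library.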

Example 1. To compare the accuracy of two measuring devices, samples were taken and the following results were obtained:
Device A: measured 25 times, variance of error = 14.5
Device B: measured 21 times, variance of error = 17.2
Assume that the measurement error is a normally distributed random variable. Conduct a test at the 5% significance level to determine whether the two devices have the same accuracy.
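Solution. Let $X_1$ and $X_2$ be the measurement errors of device B and device A, respectively, so that $n_1 = 21$, $s_1^2 = 17.2$ and $n_2 = 25$, $s_2^2 = 14.5$. We have $X_1 \sim N(\mu_1, \sigma_1^2)$, $X_2 \sim N(\mu_2, \sigma_2^2)$.

This is the problem of testing the hypothesis pair $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \ne \sigma_2^2$.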
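The test statistic is:
$$F = \frac{S_1^2}{S_2^2}.$$
The rejection region is
$$W_\alpha = \left(-\infty;\ f_{1-\alpha/2}^{(n_1-1,\,n_2-1)}\right) \cup \left(f_{\alpha/2}^{(n_1-1,\,n_2-1)};\ +\infty\right).$$
With the significance level α = 0.05, we have
$$f_{1-\alpha/2}^{(n_1-1,\,n_2-1)} = f_{0.975}^{(20,24)} = \frac{1}{f_{0.025}^{(24,20)}} = \frac{1}{2.41} = 0.4149 \quad \text{and} \quad f_{\alpha/2}^{(n_1-1,\,n_2-1)} = f_{0.025}^{(20,24)} = 2.33.$$
Thus, the rejection region is
$$W_\alpha = (-\infty;\ 0.4149) \cup (2.33;\ +\infty).$$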
For the specific sample, we have
$$F_{qs} = \frac{s_1^2}{s_2^2} = \frac{17.2}{14.5} = 1.1862.$$
Since $F_{qs} \notin W_\alpha$, we cannot reject the hypothesis H0.


Conclusion: At the significance level of 5%, there is not enough evidence to conclude that the accuracy of the two devices differs.
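
F tables usually list only upper-tail critical values, so the lower critical value in the solution is obtained from the reciprocal relation with the degrees of freedom swapped. The Python sketch below reproduces that step and the comparison. The table values 2.33 and 2.41 are the ones quoted in the solution; the variable names are illustrative assumptions.

```python
# Sample data: X1 is device B and X2 is device A, as in the solution.
n1, s1_sq = 21, 17.2
n2, s2_sq = 25, 14.5
df = (n1 - 1, n2 - 1)    # degrees of freedom (20, 24)

# Upper-tail table values for alpha / 2 = 0.025, quoted in the solution.
f_upper_20_24 = 2.33     # f_{0.025} with (20, 24) degrees of freedom
f_upper_24_20 = 2.41     # f_{0.025} with (24, 20) degrees of freedom

# Lower critical value f_{0.975}^{(20,24)} = 1 / f_{0.025}^{(24,20)}.
f_lower_20_24 = 1.0 / f_upper_24_20

f_obs = s1_sq / s2_sq

# Two-tail test: reject H0 if f_obs falls below the lower or above the upper value.
reject_h0 = f_obs < f_lower_20_24 or f_obs > f_upper_20_24

print(df, round(f_lower_20_24, 4), round(f_obs, 4), reject_h0)
```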


Example 2. Measuring the height of 200 randomly selected young people in residential area A gave the following data:
Height (cm):              155  160  165  170  175
Number of young people:    30   60   50   50   10
In residential area B, the heights of 200 randomly selected young people were also measured, and the sample standard deviation was found to be 4.15 cm. Assume that the heights of young people in region A and region B are normally distributed random variables. At the 5% level of significance, can we conclude that the height of young people in region B is more uniform than that of young people in region A? It is given that
$$f_{0.05}^{(199,199)} = 1.26334; \quad f_{0.95}^{(199,199)} = 0.791552.$$

Solution. Let $X_A$ and $X_B$ be the heights of young people in regions A and B, respectively. We have $X_A \sim N(\mu_A, \sigma_A^2)$, $X_B \sim N(\mu_B, \sigma_B^2)$.

The pair of statistical hypotheses is
$$H_0: \sigma_A^2 = \sigma_B^2; \quad H_1: \sigma_A^2 > \sigma_B^2$$

The test statistic is:
$$F = \frac{S_A^2}{S_B^2}.$$
The rejection region is
$$W_\alpha = \left(f_{\alpha}^{(n_A-1,\,n_B-1)};\ +\infty\right) = \left(f_{0.05}^{(199,199)};\ +\infty\right) = (1.26334;\ +\infty).$$

From the data, we obtain
$$s_A^2 = 32.3492 \quad \text{and} \quad s_B^2 = 4.15^2 = 17.2225.$$
Therefore, the value of the test statistic is
$$F_{qs} = \frac{32.3492}{17.2225} = 1.8783.$$
Since $F_{qs} \in W_\alpha$, reject H0, accept H1.

Conclusion: At the significance level of 5%, there is enough evidence to infer that the height of young people in region B is more uniform than that of young people in region A.
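
The sample variance for region A is not written out step by step in the solution. A minimal Python sketch of that calculation from the frequency table, followed by the right-tail comparison, is given below. It assumes the sample variance uses the (n − 1) divisor, takes the critical value 1.26334 from the problem statement, and uses illustrative variable names.

```python
# Frequency table for region A.
heights = [155, 160, 165, 170, 175]   # cm
counts = [30, 60, 50, 50, 10]         # number of young people at each height

n_a = sum(counts)
mean_a = sum(x * k for x, k in zip(heights, counts)) / n_a

# Sample variance of region A with the (n - 1) divisor.
s_a_sq = sum(k * (x - mean_a) ** 2 for x, k in zip(heights, counts)) / (n_a - 1)

# Region B: only the sample standard deviation is given.
s_b_sq = 4.15 ** 2

critical = 1.26334    # f_{0.05} with (199, 199) degrees of freedom, from the problem

f_obs = s_a_sq / s_b_sq

# Right-tail test for H1: variance of A > variance of B.
reject_h0 = f_obs > critical

print(n_a, mean_a, round(s_a_sq, 4), round(s_b_sq, 4), round(f_obs, 4), reject_h0)
```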
