Chapter 7 XSTKE
7.1 Concepts
7.1.1 Statistical Hypothesis
Definition. A statistical hypothesis is a hypothesis about: the probability distribution
of a random variable, the characteristic parameters of a random variable (E(X), V(X),
the proportion p...) or the independence of random variables.
● The given statistical hypothesis is denoted by H0 and is called the null hypothesis.
● When studying a statistical hypothesis, we also consider a statement that contradicts it, called the alternative hypothesis and denoted by H1, so that if the hypothesis H0 is rejected, we accept the hypothesis H1.
Example. Consider the height of young people in province A. We can form the
following pairs of statistical hypotheses: H0: the average height of young people in
province A is μ = 168 cm; the corresponding alternative hypothesis can be
H1: μ > 168 cm, or H1: μ < 168 cm, or H1: μ ≠ 168 cm.
Definition. The method of using statistical tools, based on the information obtained on
the survey sample, to find a conclusion about accepting or rejecting a statistical
hypothesis is called statistical hypothesis testing.
Principle of small probability: If an event has a very small probability, it can in
practice be assumed that the event will not occur in a single trial.
7.1.2 Standardized test statistic (Statistical hypothesis testing criterion)
From the original random variable X in the population, create a random sample
of size n: W = (X1, X2, …, Xn) and choose a statistic G = f( X1, X2, …, Xn, θ0), where
θ0 is a parameter related to the hypothesis to be tested.
If H0 is true, the probability distribution of G is determined.
The G-statistic is called the standardized test statistic (or simply as the test
statistic).
7.1.3 Rejection region
For a given small probability α (α is usually taken as 0.05 or 0.01), a
region Wα can be found such that, under the assumption that the hypothesis H0 is
true, the probability that G takes a value in the region Wα is equal to α. This
condition is written as:
P(G ∈ Wα / H0) = α.
Since α is quite small, according to the principle of small probability, the
event (G ∈ Wα) can be considered not to occur in a single trial.
The value α is called the significance level of the test.
Wα is called the rejection region (or critical region) of H0 at the significance
level α.
Note. For a given significance level α, it is possible to find an infinite number of
corresponding rejection regions.
Lecturer: Nguyen Duong Nguyen, Mathematics Department, Faculty of Basic Science, FTU
7.1.4 The value of the test statistic (Observed value of the testing criterion)
From a specific sample w = (x1, x2, …, xn), a specific value of the test statistic
G is calculated:
Gqs = f( x1, x2, …, xn, θ0).
This value is called the value of the test statistic.
7.1.5 Statistical hypothesis testing rule
+) If Gqs ∈ Wα, then H0 is considered to be false; hence the conclusion: reject H0 and accept H1.
But if (G ∈ Wα) then we immediately reject H0. Thus, the probability of making
a type 1 error is α.
2. Type 2 error: the error of accepting the hypothesis H0 while H0 is false.
A type 2 error occurs when Gqs ∉ Wα while H1 is true. Suppose the
probability of making a type 2 error is β:
P(G ∉ Wα / H1) = β.
Then, the event of not making a type 2 error is the event G ∈ Wα while H1
is true:
(G ∈ Wα / H1).
This event is the complement of the event (G ∉ Wα / H1), so its probability is
P(G ∈ Wα / H1) = 1 – β.
The probability 1 – β is called the power of the test.
A statistical test would be ideal if it minimized both the probability of a type 1
error and the probability of a type 2 error. However, such an ideal test does not exist.
For a fixed sample size n, decreasing the probability of a type 1 error increases the
probability of a type 2 error, and vice versa.
In practice, we proceed as follows: after fixing a significance level α (that is,
fixing the probability of a type 1 error at α) and a sample size n, among the infinitely
many rejection regions that can be found we choose the "best" one, namely the region
for which the probability of a type 2 error is smallest or, equivalently, the power of
the test is largest. Thus, we need to find the rejection region Wα satisfying the
following conditions:
P(G ∈ Wα / H0) = α
P(G ∈ Wα / H1) = 1 – β → max
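The trade-off between the two error probabilities can be illustrated numerically. The sketch below assumes a left-tail test on a standard normal statistic U, for which β = P(U > −uα + d). The shift d = (μ0 − μ1)√n/σ and its value 3.0 are hypothetical choices made only for this illustration, and the function name `type2_error` is not from the notes.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)

def type2_error(alpha, shift):
    """beta of a left-tail U-test: P(U > -u_alpha + shift)."""
    u_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # upper critical value u_alpha
    return 1 - STD_NORMAL.cdf(-u_alpha + shift)

shift = 3.0  # hypothetical value of (mu0 - mu1) * sqrt(n) / sigma
betas = {alpha: type2_error(alpha, shift) for alpha in (0.10, 0.05, 0.01)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f} -> beta = {beta:.4f}, power = {1 - beta:.4f}")
```

With the sample size held fixed, the printed β grows as α shrinks, which is the behaviour described above.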
7.2. Hypothesis test for the expected value
Let X be the original random variable in a population. X has the normal
distribution with parameters μ and σ², X ~ N(μ, σ²), where the expected value
E(X) = μ is unknown. If there is a basis to assume that its value is μ0, we make the
statistical hypothesis:
H0: μ = μ0.
To test the above hypothesis, from the population, we set up a sample of size n:
W = (X1, X2, …, Xn)
+) The case of the one-tail test (the left-tail test) H1: μ < μ0: With the given
significance level α, it is possible to find a standard critical value u1-α such that:
P(G ∈ Wα/H0) = P(U < u1-α) = P(U < -uα) = α
The rejection region Wα is
Wα = (−∞; −uα).
+) The case of the two-tail test H1: μ ≠ μ0: With the given significance level α, we can
find two standard critical values u1-α/2 and uα/2 such that:
P(G ∈ Wα/H0) = P(U < u1-α/2) + P(U > uα/2) = P(U < -uα/2) + P(U > uα/2) = P(|U| > uα/2) = α
The rejection region Wα is
Wα = (−∞; −uα/2) ∪ (uα/2; +∞).
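As an illustration, the critical values and rejection regions of a U-test can be computed instead of being read from a table. This is a minimal sketch using only Python's standard library; the function names `rejection_region` and `in_region` and the `tail` labels are hypothetical, and the right-tail branch follows the same pattern as the two cases shown above.

```python
from statistics import NormalDist
import math

def rejection_region(alpha, tail):
    """Return the rejection region of a U-test as a list of open intervals."""
    z = NormalDist()  # standard normal N(0, 1)
    if tail == "right":   # H1: mu > mu0
        return [(z.inv_cdf(1 - alpha), math.inf)]
    if tail == "left":    # H1: mu < mu0
        return [(-math.inf, -z.inv_cdf(1 - alpha))]
    if tail == "two":     # H1: mu != mu0
        u = z.inv_cdf(1 - alpha / 2)
        return [(-math.inf, -u), (u, math.inf)]
    raise ValueError("tail must be 'right', 'left' or 'two'")

def in_region(value, region):
    """True if the observed statistic lies in any interval of the region."""
    return any(low < value < high for low, high in region)

left = rejection_region(0.05, "left")
two = rejection_region(0.05, "two")
print(left, two)
```

A decision then reduces to `in_region(u_qs, region)`: a true result means rejecting H0, a false result means there is no basis to reject it.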
From a specific sample w = (x1, x2, …, xn), calculating the value of the test
statistic:
X ~ N(μ, σ²) with σ = 36 grams. Thus, the average packed weight of MSG is E(X) = μ.
This is a test of the parameter μ of a normally distributed random variable when the
variance of the population is known.
The pair of hypotheses is: H0: μ = 453, H1: μ < 453.
The test statistic is:
According to the assumption, x̄ = 448. Therefore, the value of the test statistic is
(7.2)
where U has the standard normal distribution N (0; 1).
Example 2. Refer to Example 1. If the actual average packed weight of MSG is 441
grams, what is the probability of a type 2 error?
Solution. We have
$$\beta = P\left(U > -u_\alpha - \frac{(\mu_1 - \mu_0)\sqrt{n}}{\sigma}\right) = P\left(U > -2.33 - \frac{(441 - 453)\sqrt{81}}{36}\right) = P(U > 0.67).$$
Moreover, $u_{0.2514} = 0.67$.
Thus
$$\beta = P(U > 0.67) = P(U > u_{0.2514}) = 0.2514.$$
Therefore, if the actual average packed weight of MSG is 441 grams, the
probability of a type 2 error is 0.2514.
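The arithmetic of this example can be checked with a few lines of Python. The sketch below takes the critical value 2.33 from the solution rather than recomputing it, so the result should agree with 0.2514 up to rounding; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist
import math

mu0, mu1 = 453, 441   # hypothesized and actual mean weight (grams)
sigma, n = 36, 81     # known standard deviation and sample size
u_alpha = 2.33        # table value used in the solution

# Bound of the "do not reject H0" event, expressed for U ~ N(0, 1)
bound = -u_alpha - (mu1 - mu0) * math.sqrt(n) / sigma
beta = 1 - NormalDist().cdf(bound)   # P(U > bound)
print(round(bound, 2), round(beta, 4))
```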
Find the sample size given α and β: The minimum sample size k that needs to be
investigated so that the probability of making a type 1 error is α, the probability
of making a type 2 error does not exceed β, and the actual value μ1 deviates
from the value μ0 by the given value Δ is the smallest positive integer
satisfying the following formula:
+) if the test is the one-tail test:
$$k \ge \frac{\sigma^2 (u_\alpha + u_\beta)^2}{\Delta^2}.$$
+) if the test is the two-tail test:
$$k \ge \frac{\sigma^2 (u_{\alpha/2} + u_\beta)^2}{\Delta^2}.$$
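The two sample-size formulas can be turned into a small helper. This is a sketch under stated assumptions: the function name `min_sample_size` and the inputs used in the calls (σ = 36, α = 0.01, β = 0.05, Δ = 12) are hypothetical and serve only to show how the rounding up to the smallest integer works.

```python
from statistics import NormalDist
import math

def min_sample_size(sigma, alpha, beta, delta, two_tail=False):
    """Smallest integer k with k >= sigma^2 (u_a + u_beta)^2 / delta^2."""
    z = NormalDist()
    a = alpha / 2 if two_tail else alpha
    u_a = z.inv_cdf(1 - a)       # u_alpha (one-tail) or u_{alpha/2} (two-tail)
    u_b = z.inv_cdf(1 - beta)    # u_beta
    return math.ceil(sigma**2 * (u_a + u_b) ** 2 / delta**2)

# Hypothetical inputs: sigma = 36, alpha = 0.01, beta = 0.05, delta = 12
k_one = min_sample_size(36, 0.01, 0.05, 12)
k_two = min_sample_size(36, 0.01, 0.05, 12, two_tail=True)
print(k_one, k_two)
```

The two-tail test needs the larger sample because uα/2 exceeds uα for the same α.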
significance level α, it is possible to find the Student critical value $t_{\alpha}^{(n-1)}$ such that:
+) The case of the one-tail test (the left-tail test) H1: μ < μ0: With the given
significance level α, it is possible to find the Student critical value $t_{1-\alpha}^{(n-1)}$ such that:
+) The case of the two-tail test H1: μ ≠ μ0: With the given significance level α, we can
find two Student critical values $t_{\alpha/2}^{(n-1)}$ and $t_{1-\alpha/2}^{(n-1)}$ such that:
(7.5)
From a specific sample w = (x1, x2, …, xn), we calculate x̄ and s, then calculate
the value of the test statistic:
$$T_{qs} = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s}$$
Compare Tqs with the rejection region Wα and conclude:
- If Tqs ∈ Wα then reject H0, accept H1.
- If Tqs ∉ Wα then there is no basis to reject H0.
Example. The time norm to complete a product is 14 minutes. Is it necessary to
change the norm if, tracking the time taken by 25 workers to complete the product,
we obtain the following data?
Time to complete one product (minutes)    Number of workers
10-12                                     2
12-14                                     6
14-16                                     10
16-18                                     4
18-20                                     3
Draw a conclusion at the 5% significance level, knowing that the time to complete a
product is a normally distributed random variable.
With n = 25, we have $t_{\alpha/2}^{(n-1)} = t_{0.025}^{(24)} = 2.064$.
From the specific sample, we make the following table to calculate x̄ and s:
xi     ni    nixi    nixi²
11     2     22      242
13     6     78      1014
15     10    150     2250
17     4     68      1156
19     3     57      1083
Σ      25    375     5745
$$\bar{x} = \frac{1}{n}\sum n_i x_i = \frac{375}{25} = 15$$
$$ms = \frac{1}{n}\sum n_i x_i^2 - \bar{x}^2 = \frac{5745}{25} - 15^2 = 4.8$$
$$s^2 = \frac{n}{n-1}\,ms = \frac{25}{24}(4.8) = 5$$
The value of the test statistic is
$$T_{qs} = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s} = \frac{(15 - 14)\sqrt{25}}{\sqrt{5}} = 2.236$$
Since Tqs = 2.236 > 2.064, Tqs ∈ Wα, so we reject H0 and accept H1.
Conclusion: At the 5% significance level, there is enough evidence to infer that
it is necessary to change the time norm to complete one product.
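The calculation in this example can be reproduced from the grouped data. The sketch below uses the class midpoints as representative values and takes the critical value 2.064 from the solution instead of computing it; the variable names are chosen for this illustration only.

```python
import math

# Class midpoints and frequencies from the worker table
midpoints = [11, 13, 15, 17, 19]
counts = [2, 6, 10, 4, 3]
mu0 = 14          # time norm under H0 (minutes)
t_crit = 2.064    # table value t_{0.025}^{(24)} quoted in the solution

n = sum(counts)
mean = sum(c * x for x, c in zip(midpoints, counts)) / n
ms = sum(c * x**2 for x, c in zip(midpoints, counts)) / n - mean**2
s2 = n / (n - 1) * ms                       # adjusted sample variance s^2
t_qs = (mean - mu0) * math.sqrt(n) / math.sqrt(s2)
reject = abs(t_qs) > t_crit                 # two-tail rejection rule
print(n, mean, round(s2, 4), round(t_qs, 3), reject)
```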
Calculating the probability of a type 2 error (β): Let μ0 denote the
hypothesized value of μ and μ1 the true value of μ
from the value μ0 by the given value Δ is the smallest positive integer
satisfying the following formula:
+) if the test is the one-tail test:
+) if the test is the two-tail test:
where s² is the variance of a preliminary sample of size n, and $t_{\alpha}^{(n-1)}$, $t_{\alpha/2}^{(n-1)}$ and $t_{\beta}^{(n-1)}$ is
X2 have E(X1) = μ1, V(X1) = σ1² and E(X2) = μ2, V(X2) = σ2². If E(X1) = μ1 and E(X2)
= μ2 are unknown but there is a basis for assuming that their values are equal, we make
the statistical hypothesis:
H0: μ1 = μ2.
To test the above hypothesis, from two populations, draw two independent
samples of size n1 and n2:
W1 = (X11, X12, ..., X1n1)
W2 = (X21, X22, ..., X2n2)
a) In the case the variances V(X1) = σ1² and V(X2) = σ2² are known and we assume
+) The case of the one-tail test (the left-tail test) H1: μ1 < μ2:
The rejection region is:
$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}; \quad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}$$
b) In the case the variances V(X1) = σ1² and V(X2) = σ2² are unknown but assumed
to be equal (σ1² = σ2²), and X1, X2 have the normal distributions
X1 ~ N(μ1, σ1²), X2 ~ N(μ2, σ2²)
We know that T has the Student distribution with (n1 + n2 – 2) degrees of
freedom.
If the hypothesis H0 is correct, then we have
and T still has the Student distribution with (n1 + n2 – 2) degrees of freedom.
With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
+) The case of the one-tail test (the right-tail test) H1: μ1 > μ2:
The rejection region is:
$$W_\alpha = \left(t_{\alpha}^{(n_1+n_2-2)};\ +\infty\right)$$
+) The case of the one-tail test (the left-tail test) H1: μ1 < μ2:
σ1² = σ2².
Solution. Let X1 and X2 be the weight gain of chickens when applying breeding
methods I and II, respectively. We have X1 ~ N(μ1, σ1²), X2 ~ N(μ2, σ2²) with σ1² = σ2².
The pair of hypotheses: H0: μ1 = μ2, H1: μ1 < μ2
The test statistic is
Rejection region is
$$W_\alpha = \left(-\infty;\ -t_{\alpha}^{(n_1+n_2-2)}\right)$$
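rejection region is
Wα = (−∞; −1.645)
$$T_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{1.1 - 1.2}{0.265\sqrt{\dfrac{1}{100} + \dfrac{1}{150}}} = -2.923$$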
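The value of the pooled statistic can be verified numerically. The sketch below plugs in the summary figures from the example and compares the result with the left-tail bound −1.645 quoted in the solution; the variable names are chosen for this illustration only.

```python
import math

x1_bar, x2_bar = 1.1, 1.2   # sample means from the example
s_p = 0.265                 # pooled standard deviation from the example
n1, n2 = 100, 150           # sample sizes
critical = -1.645           # left-tail bound quoted in the solution

t_qs = (x1_bar - x2_bar) / (s_p * math.sqrt(1 / n1 + 1 / n2))
reject = t_qs < critical    # reject H0 when T_qs lies in (-inf, -1.645)
print(round(t_qs, 3), reject)
```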
c) In the case the variances V(X1) = σ1² and V(X2) = σ2² are unknown, X1 and X2
follow some probability distributions, not necessarily normal, and the two samples
are drawn independently with sizes n1 > 30 and n2 > 30:
The test statistic is the following statistic
$$U = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
With n1 > 30 and n2 > 30, the statistic U has an approximately standard normal
distribution N(0, 1).
With the given significance level α, depending on the form of the alternative
hypothesis H1, the "best" rejection region is constructed according to the following
cases:
+) The case of the one-tail test (the right-tail test) H1: μ1 > μ2:
+) The case of the one-tail test (the left-tail test) H1: μ1 < μ2:
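The rejection region is:
Wα = (−∞; −uα)
From the specific samples, we calculate x̄1, x̄2; s1², s2²
and the value of the test statistic
$$U_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}.$$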
Compare Uqs with the rejection region Wα to conclude:
- If Uqs ∈ Wα then reject H0, accept H1
- If Uqs ∉ Wα then there is no basis to reject H0.
Example 1. To study the current weekly consumption of a product at agents in
province A, the sales revenue of 101 randomly selected agents was recorded, with the
following results:
Sales revenue (million VND)    25    26    27    28    29    30
Number of agents               10    18    30    22    15    6
The sales revenue of 101 randomly selected agents in province B was also recorded, giving
$$\sum_{i=1}^{101} x_{Bi} = 2525, \quad \sum_{i=1}^{101} x_{Bi}^2 = 63425.$$
Knowing that sales revenue is a normally distributed random variable, can we conclude
at the 5% significance level that the average revenue of trading agents in the two
provinces is the same?
x̄1 = 27.3168; s1² = 1.8403
x̄2 = 25; s2² = 3
The value of the test statistic is
$$U_{qs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = 10.5831 \in W_\alpha$$
so we reject H0 and accept H1.
Therefore, at the 5% significance level, we can conclude that the average revenues of
business agents in the two provinces A and B are different.
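The sample statistics of this example can be recomputed from the frequency table and the two sums. The sketch below assumes the two-tail critical value u0.025 = 1.96, the standard normal table value at α = 0.05. Because it carries full precision, it gives s1² ≈ 1.8386 and Uqs ≈ 10.585, slightly different from the rounded figures in the solution; the decision is unaffected.

```python
import math

# Province A: revenue values (million VND) and number of agents
values = [25, 26, 27, 28, 29, 30]
counts = [10, 18, 30, 22, 15, 6]
# Province B: sample size and the two sums given in the example
n2, sum_b, sum_sq_b = 101, 2525, 63425

n1 = sum(counts)
x1_bar = sum(c * v for v, c in zip(values, counts)) / n1
s1_sq = (sum(c * v**2 for v, c in zip(values, counts)) - n1 * x1_bar**2) / (n1 - 1)
x2_bar = sum_b / n2
s2_sq = (sum_sq_b - n2 * x2_bar**2) / (n2 - 1)

u_qs = (x1_bar - x2_bar) / math.sqrt(s1_sq / n1 + s2_sq / n2)
u_crit = 1.96                       # assumed table value u_{0.025}
reject = abs(u_qs) > u_crit         # two-tail rejection rule
print(round(x1_bar, 4), round(s1_sq, 4), x2_bar, round(s2_sq, 4), round(u_qs, 3), reject)
```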
Example 2. Two classes study statistics together and the results of the final exam are
as follows:
Class A    n1 = 64    x̄1 = 73.2    s1 = 10.9
Can we conclude at the 5% significance level that the average exam result of class B is
higher than that of class A?
Solution. Let X1 and X2 be the statistics exam results of students in class A and
class B, respectively. Thus, the average exam results in class A and class B are μ1 and μ2,
respectively.
This is the problem of testing the hypothesis pair H0: μ1 = μ2, H1: μ1 < μ2 when the
If H0 is true, $U = \dfrac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}}$ has the approximate standard normal distribution
N(0,1). Therefore, with the significance level α and depending on the alternative
hypothesis H1, the rejection regions are determined as follows:
+) The case of the one-tail test (the right-tail test) H1: p > p0
The rejection region is
Wα = (uα; +∞).
+) The case of the one-tail test (the left-tail test) H1: p < p0
The rejection region is
Wα = (−∞; −uα).
From a specific sample w = (x1, x2, …, xn), we can calculate the sample
frequency f, then find the value of the test statistic according to the formula:
$$U_{qs} = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}}.$$
Compare Uqs with the rejection region Wα to conclude:
- If Uqs ∈ Wα, reject H0, accept H1
- If Uqs ∉ Wα, there is no basis to reject H0.
Example. Disease A can be cured with drug H. The company that manufactures drug
H claims that 85% of patients recover from the disease after taking its drug.
Drug H was tested on 250 patients with disease A, and 195 of them recovered.
Can we conclude at the 5% significance level that the recovery rate claimed by the
company is higher than the actual rate?
Solution. Let p be the proportion of patients with disease A who recover from the
disease when taking drug H.
The pair of statistical hypotheses is: H0: p = 0.85; H1: p < 0.85
Because np0 = 250(0.85) = 212.5 > 5 and n(1 − p0) = 250(0.15) = 37.5 > 5, we
choose the test statistic
$$U = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}} = \frac{(f - 0.85)\sqrt{250}}{\sqrt{0.85(1 - 0.85)}}$$
With the significance level α = 0.05, we have uα = u0.05 = 1.645. Thus, the
rejection region is
Wα = (−∞; −1.645).
For n = 250, m = 195, we calculate the sample proportion as follows
$$f = \frac{m}{n} = \frac{195}{250} = 0.78.$$
We have the value of the test statistic
$$U_{qs} = \frac{(f - p_0)\sqrt{n}}{\sqrt{p_0(1 - p_0)}} = \frac{(0.78 - 0.85)\sqrt{250}}{\sqrt{0.85(1 - 0.85)}} = -3.0997$$
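The steps of this example, including the check that the normal approximation applies, can be sketched in code. The critical value 1.645 is taken from the solution; the variable names are chosen for this illustration only.

```python
import math

n, m = 250, 195      # patients tested and patients who recovered
p0 = 0.85            # recovery rate claimed under H0
u_alpha = 1.645      # u_{0.05}, as quoted in the solution

# The normal approximation is used only when both expected counts exceed 5
assert n * p0 > 5 and n * (1 - p0) > 5

f = m / n                                             # sample proportion
u_qs = (f - p0) * math.sqrt(n) / math.sqrt(p0 * (1 - p0))
reject = u_qs < -u_alpha                              # left-tail region (-inf, -1.645)
print(f, round(u_qs, 4), reject)
```

Here the statistic falls inside the left-tail region, so under these figures H0 is rejected in favour of H1: p < 0.85.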
and
W2 = (X21, X22, ..., X2n2).
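$$U = \frac{(f_1 - f_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}.$$
With n1 > 30 and n2 > 30, U has the approximate standard normal distribution N(0; 1).
If H0 is true (p1 = p2 = p), the test statistic becomes:
$$U = \frac{f_1 - f_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$
Since p is unknown, it is replaced by its estimate:
$$\bar{f} = \frac{n_1 f_1 + n_2 f_2}{n_1 + n_2}.$$
The test statistic then becomes
$$U = \frac{f_1 - f_2}{\sqrt{\bar{f}(1 - \bar{f})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
with the approximate standard normal distribution N(0; 1) if n1 > 30 and n2 > 30.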
With the significance level α and depending on the alternative hypothesis H1,
the rejection regions are determined as follows:
+) The case of the one-tail test (the right-tail test): H1: p1 > p2:
Wα = (uα; +∞).
+) The case of the one-tail test (the left-tail test): H1: p1 < p2:
Wα = (−∞; −uα).
For specific samples w1 = (x11, x12, ..., x1n1) and w2 = (x21, x22, ..., x2n2), we get the
sample proportions f1, f2 and f̄. From there, the value of the test statistic is calculated
according to the following formula:
$$U_{qs} = \frac{f_1 - f_2}{\sqrt{\bar{f}(1 - \bar{f})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}.$$
Compare Uqs with the rejection region Wα to conclude:
We have
$$f_1 = \frac{30}{200} = 0.15; \quad f_2 = \frac{65}{350} = \frac{13}{70}.$$
$$\bar{f} = \frac{n_1 f_1 + n_2 f_2}{n_1 + n_2} = \frac{200(0.15) + 350(13/70)}{550} = \frac{19}{110}.$$
Thus, we cannot reject H0, which means that there is not enough evidence to infer that
the proportion of workers quitting in factory A is lower than in factory B.
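The figures of this example can be checked with a short script. The sketch below assumes a left-tail test H1: p1 < p2 at α = 0.05 (critical value 1.645), since the excerpt does not state the significance level, and treats index 1 as factory A and index 2 as factory B.

```python
import math

m1, n1 = 30, 200     # counts behind f1 (assumed to be factory A)
m2, n2 = 65, 350     # counts behind f2 (assumed to be factory B)
u_alpha = 1.645      # u_{0.05}; alpha = 0.05 is an assumption here

f1, f2 = m1 / n1, m2 / n2
f_bar = (n1 * f1 + n2 * f2) / (n1 + n2)        # pooled sample proportion
u_qs = (f1 - f2) / math.sqrt(f_bar * (1 - f_bar) * (1 / n1 + 1 / n2))
reject = u_qs < -u_alpha                       # left-tail region (-inf, -1.645)
print(round(f_bar, 4), round(u_qs, 3), reject)
```

Under these assumptions the statistic stays outside the rejection region, in line with the conclusion that H0 cannot be rejected.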
7.6. Hypothesis testing for the variance
Assume that the original random variable X in the population has the normal
distribution N(μ, σ²), where V(X) = σ² is unknown but there is a basis to assume that its
value is σ0². We make the statistical hypothesis:
H0: σ² = σ0².
To test the above hypothesis, from the population, we create a random sample
of size n:
W = (X1, X2, …, Xn)
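and choose the test statistic:
$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2}$$
+) The case of the one-tail test (the right-tail test) H1: σ² > σ0²:
The rejection region Wα is
$$W_\alpha = \left(\chi^{2(n-1)}_{\alpha};\ +\infty\right)$$
+) The case of the one-tail test (the left-tail test) H1: σ² < σ0²
The rejection region Wα is
$$W_\alpha = \left(-\infty;\ \chi^{2(n-1)}_{1-\alpha}\right)$$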
With a specific sample w = (x1, x2, …, xn), we can calculate the sample variance
s² and the value of the test statistic
$$\chi^2_{qs} = \frac{(n - 1)s^2}{\sigma_0^2}.$$
Compare χ²qs with the rejection region Wα to conclude:
- If χ²qs ∈ Wα then reject H0, accept H1
- If χ²qs ∉ Wα then there is no basis to reject H0.
the variance was found to be s² = 11.41 (grams)². At the 5% significance level, draw a
conclusion about the above suspicion, knowing that, normally, the dispersion of chick
weight is 10 (grams)².
Solution. Let X be the weight of chicks at birth. According to the assumption,
X ~ N(μ, σ²). Therefore, the uniformity (or the dispersion) of chick weight is σ². This
H0: σ² = 10; H1: σ² > 10
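The test statistic is
$$\chi^2 = \frac{(n - 1)S^2}{\sigma_0^2} = \frac{11 S^2}{10}.$$
The rejection region is
$$W_\alpha = \left(\chi^{2(n-1)}_{\alpha};\ +\infty\right).$$
With the significance level α = 0.05, we have $\chi^{2(n-1)}_{\alpha} = \chi^{2(11)}_{0.05} = 19.68$. Therefore,
Wα = (19.68; +∞).
The value of the test statistic is
$$\chi^2_{qs} = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{11(11.41)}{10} = 12.551.$$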
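The comparison of the observed statistic with the critical value can be written out as a short check. The sketch below takes the table value 19.68 from the solution and assumes n = 12, which is implied by the 11 degrees of freedom.

```python
n = 12                # sample size implied by the 11 degrees of freedom
s2 = 11.41            # sample variance (grams^2)
sigma0_sq = 10        # hypothesized variance under H0 (grams^2)
chi2_crit = 19.68     # table value chi^2_{0.05} with 11 d.o.f., from the solution

chi2_qs = (n - 1) * s2 / sigma0_sq
reject = chi2_qs > chi2_crit      # right-tail region (19.68, +inf)
print(round(chi2_qs, 3), reject)
```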
Since χ²qs ∉ Wα, we cannot reject the hypothesis H0, which means there is not
X1 and X2 have the normal distributions X1 ~ N(μ1, σ1²), X2 ~ N(μ2, σ2²). If σ1²
and σ2² are unknown, but there is a basis to assume that their values are equal, we
make a statistical hypothesis
H0: σ1² = σ2².
To test the above hypothesis, from two populations, draw two independent
samples of size n1 and n2:
W2 = (X21, X22, ..., X2n2)
+) The case of the one-tail test (the right-tail test): H1: σ1² > σ2²:
The rejection region Wα is
$$W_\alpha = \left(f_{\alpha}^{(n_1-1,\,n_2-1)};\ +\infty\right)$$
+) The case of the one-tail test (the left-tail test) H1: σ1² < σ2²:
The rejection region Wα is
$$W_\alpha = \left(-\infty;\ f_{1-\alpha}^{(n_1-1,\,n_2-1)}\right)$$
For specific samples w1 = (x11, x12, ..., x1n1) and w2 = (x21, x22, ..., x2n2), we can
calculate the sample variances s1² and s2². From there, the value of the test statistic can
be calculated according to the following formula:
$$F_{qs} = \frac{s_1^2}{s_2^2}.$$
Compare Fqs with the rejection region Wα to conclude:
Example 1. To compare the accuracy of two measuring devices, samples were taken
and the following results were obtained:
Device A                    Device B
Measured 25 times           Measured 21 times
Variance of error = 14.5    Variance of error = 17.2
Assume that the measurement error is a normally distributed random variable.
Conduct a test at the 5% significance level to infer whether the accuracy of the two
devices is the same.
Solution. Let X1 and X2 be the measurement errors of device B and device A, respectively.
This is the problem of testing the hypothesis pair: H0: σ1² = σ2², H1: σ1² ≠ σ2²
The test statistic is:
$$F = \frac{S_1^2}{S_2^2}.$$
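The rejection region is
$$W_\alpha = \left(-\infty;\ f_{1-\alpha/2}^{(n_1-1,\,n_2-1)}\right) \cup \left(f_{\alpha/2}^{(n_1-1,\,n_2-1)};\ +\infty\right).$$
With the significance level α = 0.05, we have
$$f_{1-\alpha/2}^{(n_1-1,\,n_2-1)} = f_{0.975}^{(20,24)} = \frac{1}{f_{0.025}^{(24,20)}} = \frac{1}{2.41} = 0.4149 \quad\text{and}\quad f_{\alpha/2}^{(n_1-1,\,n_2-1)} = f_{0.025}^{(20,24)} = 2.33$$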
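The remaining step of the test, comparing the observed ratio with the two critical values, can be sketched as follows. The code assumes that the reported "variance of error" figures are the sample variances s1² (device B) and s2² (device A), and it uses the table values from the solution rather than computing them.

```python
n1, s1_sq = 21, 17.2   # device B (X1 in the solution)
n2, s2_sq = 25, 14.5   # device A (X2 in the solution)

f_upper = 2.33         # table value f_{0.025}^{(20,24)}
f_lower = 1 / 2.41     # f_{0.975}^{(20,24)} = 1 / f_{0.025}^{(24,20)}

f_qs = s1_sq / s2_sq
reject = f_qs < f_lower or f_qs > f_upper   # two-tail rejection rule
print(round(f_lower, 4), round(f_qs, 4), reject)
```

Under these assumptions the ratio lies between the two critical values, so the data give no basis to reject H0.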
Solution. Let XA and XB be the height of the young people in regions A and B,
We obtain that