
211MAT1302 - Unit - IV : TESTING OF HYPOTHESIS FOR LARGE SAMPLES

Syllabus: Sampling distributions – Statistical hypothesis – Large sample tests based on Normal distribution for single proportion, difference of proportions, single mean and difference of means.

4.1 Sampling
Definition 4.1.1. A population consists of a collection of individual units, which may be persons or experimental outcomes, whose characteristics are to be studied.

Definition 4.1.2. A sample is a portion of the population that is studied to learn about the characteristics of the population.

Note 4.1.3. For practical purposes, a sample may be regarded as a large sample if its size $n > 30$; otherwise it is called a small sample.

Types of Sampling
Some of the commonly known and frequently used types of sampling are:

1. Purposive sampling

2. Random sampling

3. Simple sampling

4. Stratified sampling

Parameters and Statistics

Definition 4.1.4. The statistical constants of the population, such as the mean ($\mu$) and the variance ($\sigma^2$), are usually referred to as parameters.
The statistical measures computed from the sample observations alone, e.g., the mean ($\bar{x}$) and the variance ($s^2$), are called statistics.

Sampling distribution of a Statistic


If we draw a sample of size $n$ from a given finite population of size $N$, then the total number of possible samples is
$$ {}^{N}C_{n} = \frac{N!}{n!\,(N-n)!} = k \ \text{(say)}. $$
For each of these $k$ samples, we can compute some statistic $t$, in particular the mean ($\bar{x}$), the variance ($s^2$), etc., as given below:

Sample number    Statistic
                 $t$          $\bar{x}$          $s^2$
1                $t_1$        $\bar{x}_1$        $s_1^2$
2                $t_2$        $\bar{x}_2$        $s_2^2$
3                $t_3$        $\bar{x}_3$        $s_3^2$
...              ...          ...                ...
k                $t_k$        $\bar{x}_k$        $s_k^2$
The set of values of the statistic so obtained, one for each sample, is called
the sampling distribution of the statistic.
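The construction above can be sketched in a few lines of Python. This is only an illustration: the population values and the sample size are hypothetical, and the statistic is taken to be the sample mean.

```python
from itertools import combinations
from math import comb
from statistics import mean, pstdev

# Hypothetical finite population of size N = 5 (illustrative values only).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Enumerate all k = C(N, n) samples drawn without replacement.
samples = list(combinations(population, n))
k = len(samples)

# One value of the statistic per sample: this list is the sampling
# distribution of the sample mean.
sample_means = [mean(s) for s in samples]

# The standard deviation of the sampling distribution is the standard error.
standard_error = pstdev(sample_means)

print("number of samples k =", k, "and C(N, n) =", comb(len(population), n))
print("sampling distribution of the mean:", sorted(sample_means))
print("standard error of the mean:", round(standard_error, 4))
```

For a realistic population the number of samples $k$ grows very quickly, so full enumeration like this is practical only for small $N$.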

Standard Error

Definition 4.1.5. The standard deviation of the sampling distribution of a statistic is known as its standard error, abbreviated as S.E.

Note 4.1.6. For large samples, the S.E. of some well-known statistics are given below:

Statistic                                                       S.E.
Sample mean ($\bar{x}$)                                         $\sigma/\sqrt{n}$
Sample S.D. ($s$)                                               $\sqrt{\sigma^2/(2n)}$
Sample variance ($s^2$)                                         $\sigma^2\sqrt{2/n}$
Sample proportion ($p$)                                         $\sqrt{PQ/n}$
Difference of two sample means ($\bar{x}_1 - \bar{x}_2$)        $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$
Difference of two sample SDs ($s_1 - s_2$)                      $\sqrt{\sigma_1^2/(2n_1) + \sigma_2^2/(2n_2)}$
Difference of two sample proportions ($p_1 - p_2$)              $\sqrt{P_1Q_1/n_1 + P_2Q_2/n_2}$

Note 4.1.7. S.E. plays a very important role in large sample theory. If $t$ is any statistic, then for large samples
$$ Z = \frac{t - E(t)}{SE(t)} \sim N(0, 1). $$
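A minimal sketch of how a few of these standard errors and the standardised statistic $Z$ could be computed. The function names are hypothetical, chosen only for this illustration, and the numerical inputs are arbitrary.

```python
from math import sqrt

def se_mean(sigma, n):
    """S.E. of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def se_proportion(P, n):
    """S.E. of the sample proportion: sqrt(PQ / n) with Q = 1 - P."""
    return sqrt(P * (1 - P) / n)

def se_diff_means(sigma1, n1, sigma2, n2):
    """S.E. of the difference of two sample means."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def z_score(t, expected, se):
    """Standardised statistic Z = (t - E(t)) / SE(t)."""
    return (t - expected) / se

# Arbitrary illustrative inputs.
print(se_mean(10.0, 100))        # sigma = 10, n = 100
print(se_proportion(0.5, 100))   # P = 0.5, n = 100
print(z_score(52.0, 50.0, se_mean(10.0, 100)))
```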

Test of Significance

Definition 4.1.8. A test of significance enables us to decide, on the basis of the sample results, whether
i) the deviation between the observed sample statistic and the hypothetical parameter value is significant, or
ii) the deviation between two sample statistics is significant.

Null Hypothesis

Definition 4.1.9. For applying the test of significance, we first set up a hypothesis, that is, a definite statement about the population parameter. Such a hypothesis, which is usually a hypothesis of no difference, is called the null hypothesis and is denoted by $H_0$.

Alternative Hypothesis
Definition 4.1.10. Any hypothesis which is complementary to the null hypothesis is called an alternative hypothesis and is denoted by $H_1$.
Example: If we want to test the null hypothesis that the population has a specified mean ($\mu_0$, say), i.e., $H_0 : \mu = \mu_0$, then the alternative hypothesis would be
(i) $H_1 : \mu \neq \mu_0$ (ii) $H_1 : \mu > \mu_0$ (iii) $H_1 : \mu < \mu_0$
The alternative hypothesis (i) is known as a two tailed alternative, (ii) is known as a right tailed alternative and (iii) is known as a left tailed alternative.

Errors in Sampling
Definition 4.1.11. There are two types of errors in sampling:
i) Type I Error: Reject $H_0$ when it is true.
ii) Type II Error: Accept $H_0$ when it is false.
Critical Region and Level of significance (LOS)
Definition 4.1.12. A region (corresponding to a statistic $t$) in the sample space $S$ which amounts to rejection of $H_0$ is termed the critical region or region of rejection.
If $w$ is the critical region and $t$ is the value of the statistic based on a random sample of size $n$, then $P(t \in w \mid H_0) = \alpha$ and $P(t \in \bar{w} \mid H_1) = \beta$, where $\bar{w}$, the complementary set of $w$, is called the acceptance region.

The probability α that a random value of the statistic t belongs to the critical
region is known as the level of significance.

Note 4.1.13. The level of significance is normally taken as 5% or 1%.


Critical value or Significant value
Definition 4.1.14. The value of the test statistic Z for which the critical
region and acceptance region are separated is called the critical value or the
significant value of Z and denoted by Zα , when α is the LOS.

Note 4.1.15. When $Z = \frac{t - E(t)}{SE(t)} \sim N(0, 1)$, we have $P(|Z| < 1.96) = 95\%$ and $P(|Z| > 1.96) = 5\%$. Thus $Z = \pm 1.96$ separates the critical region and the acceptance region at 5% LOS for a two tailed test.

Note 4.1.16. The critical value of $Z$ for a single tailed test (right or left) at LOS $\alpha$ is the same as that for a two tailed test at LOS $2\alpha$.

Note 4.1.17. The critical values for some standard LOSs are given below for large samples.

Nature of test    1% (0.01)                2% (0.02)                 5% (0.05)                 10% (0.10)
Two tailed        $|Z_\alpha| = 2.58$      $|Z_\alpha| = 2.33$       $|Z_\alpha| = 1.96$       $|Z_\alpha| = 1.645$
Right tailed      $Z_\alpha = 2.33$        $Z_\alpha = 2.055$        $Z_\alpha = 1.645$        $Z_\alpha = 1.28$
Left tailed       $Z_\alpha = -2.33$       $Z_\alpha = -2.055$       $Z_\alpha = -1.645$       $Z_\alpha = -1.28$
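The tabulated values can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `critical_value` and its `tail` labels are assumptions made for this illustration. The computed values agree with the table up to rounding.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def critical_value(alpha, tail="two"):
    """Critical value of Z at level of significance alpha.

    tail is 'two', 'right' or 'left' (labels chosen for this sketch).
    For a two tailed test the positive value |Z_alpha| is returned.
    """
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")

for alpha in (0.01, 0.02, 0.05, 0.10):
    print(alpha,
          round(critical_value(alpha, "two"), 3),
          round(critical_value(alpha, "right"), 3),
          round(critical_value(alpha, "left"), 3))
```

The loop also illustrates Note 4.1.16: the right tailed value at LOS $\alpha$ coincides with the two tailed value at LOS $2\alpha$.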

4.2 Testing of Hypothesis


Procedure for Testing of Hypothesis (for large samples)

1. Null Hypothesis ($H_0$) is defined.

2. Alternative Hypothesis ($H_1$) is also defined after a careful study of the problem, and the nature of the test (two tailed or one tailed) is decided.

3. Choose an appropriate LOS (5% or 1%).

4. Compute the test statistic $Z = \frac{t - E(t)}{SE(t)}$ under the null hypothesis.

5. Compare the computed value of $Z$ in step (4) with the significant value at the given LOS.

(a) If $|Z| < 1.96$, $H_0$ may be accepted at 5% LOS.
(b) If $|Z| > 1.96$, $H_0$ may be rejected at 5% LOS.
(c) If $|Z| < 2.58$, $H_0$ may be accepted at 1% LOS.
(d) If $|Z| > 2.58$, $H_0$ may be rejected at 1% LOS.

For a single tailed test (right or left), we compare the computed value of $|Z|$ with 1.645 (at 5% LOS) and 2.33 (at 1% LOS) and accept or reject $H_0$ accordingly.
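Step 5 of the procedure can be sketched as a small decision function. The name `decide` and the `tail` labels are hypothetical. One assumption differs slightly from the wording above: for a single tailed test the sketch compares the signed value of $Z$ in the direction of $H_1$, rather than $|Z|$, so that a large deviation in the wrong direction does not lead to rejection.

```python
# Critical values copied from the table in Note 4.1.17.
CRITICAL = {
    ("two", 0.01): 2.58, ("two", 0.05): 1.96,
    ("one", 0.01): 2.33, ("one", 0.05): 1.645,
}

def decide(z, alpha=0.05, tail="two"):
    """Return 'reject H0' or 'accept H0' for a computed large-sample Z.

    tail is 'two', 'right' or 'left'; alpha must be 0.05 or 0.01.
    """
    if tail == "two":
        reject = abs(z) > CRITICAL[("two", alpha)]
    elif tail == "right":
        reject = z > CRITICAL[("one", alpha)]
    elif tail == "left":
        reject = z < -CRITICAL[("one", alpha)]
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return "reject H0" if reject else "accept H0"

# Arbitrary illustrative value of Z.
print(decide(2.10, alpha=0.05, tail="two"))
print(decide(2.10, alpha=0.01, tail="two"))
```

The same computed $Z$ can therefore be significant at 5% LOS and not significant at 1% LOS, which is why the LOS is fixed in step 3 before $Z$ is computed.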

Interval Estimation of Population parameter
We have $P(|Z| \le 1.96) = 0.95$
⇒ $P\left(\left|\frac{t - E(t)}{SE(t)}\right| \le 1.96\right) = 0.95$
⇒ $P\left(t - 1.96\,SE(t) \le E(t) \le t + 1.96\,SE(t)\right) = 0.95$

This means that, with 95% confidence, the parameter $E(t)$ lies between $t - 1.96\,SE(t)$ and $t + 1.96\,SE(t)$. Thus $\left(t - 1.96\,SE(t),\ t + 1.96\,SE(t)\right)$ are the 95% confidence limits for $E(t)$.

Similarly, $\left(t - 2.58\,SE(t),\ t + 2.58\,SE(t)\right)$ are the 99% confidence limits for $E(t)$ and $\left(t - 2.33\,SE(t),\ t + 2.33\,SE(t)\right)$ are the 98% confidence limits for $E(t)$.
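The three sets of limits differ only in the multiplier of the standard error, as the following sketch shows. The function name `confidence_limits` and the input values are hypothetical.

```python
def confidence_limits(t, se, z=1.96):
    """Return (lower, upper) limits t -/+ z * SE(t); z = 1.96 gives 95% limits."""
    margin = z * se
    return t - margin, t + margin

# Hypothetical values: a statistic t = 50.0 with standard error 2.0.
for level, z in (("95%", 1.96), ("98%", 2.33), ("99%", 2.58)):
    low, high = confidence_limits(50.0, 2.0, z)
    print(level, round(low, 2), round(high, 2))
```

A higher confidence level uses a larger multiplier and therefore gives wider limits for the same statistic and standard error.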

TEST OF SIGNIFICANCE FOR LARGE SAMPLES


Type I: Test of significance for single proportion

The test statistic is
$$ Z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} $$
where $n$ is the sample size, $p$ is the sample proportion, $P$ is the population proportion and $Q = 1 - P$.
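Note 4.2.1. The 95% confidence limits for $P$ are given by $\left(p - 1.96\sqrt{\frac{pq}{n}},\ p + 1.96\sqrt{\frac{pq}{n}}\right)$ and the 98% confidence limits for $P$ are given by $\left(p - 2.33\sqrt{\frac{pq}{n}},\ p + 2.33\sqrt{\frac{pq}{n}}\right)$, where $q = 1 - p$.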
Problem 4.2.2. The fatality rate of typhoid patients is believed to be 17.26%.
In a certain year, 640 patients suffering from typhoid were treated in a
metropolitan hospital and only 63 patients died. Can you consider the hospi-
tal efficient?
Solution:
Null Hypothesis, $H_0 : p = P$
Alternative Hypothesis, $H_1 : p < P$ (one tailed (left) test)
Let the LOS be 1%. ∴ $Z_\alpha = -2.33$

Here, $p = \frac{63}{640} = 0.0984$, $P = 17.26\% = 0.1726$, $Q = 1 - P = 0.8274$ and $n = 640$.

The test statistic,
$$ Z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.0984 - 0.1726}{\sqrt{\frac{(0.1726)(0.8274)}{640}}} = -4.9672. $$

Since $|Z| > |Z_\alpha|$, we reject the null hypothesis $H_0$ and accept the alternative hypothesis $H_1$, i.e., the difference between $p$ and $P$ is significant.
That is, the hospital is efficient in bringing down the fatality rate of typhoid patients at 1% LOS.
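The computation in Problem 4.2.2 can be reproduced with a short sketch. The function name `z_single_proportion` is hypothetical. The code keeps $p = 63/640$ unrounded, so the result differs from the hand computation in the third decimal place.

```python
from math import sqrt

def z_single_proportion(x, n, P):
    """Z = (p - P) / sqrt(PQ / n), with p = x / n and Q = 1 - P."""
    p = x / n
    Q = 1 - P
    return (p - P) / sqrt(P * Q / n)

# Data of Problem 4.2.2: 63 deaths among 640 patients, believed rate 17.26%.
z = z_single_proportion(63, 640, 0.1726)
print(round(z, 4))

# Left tailed test at 1% LOS: reject H0 when Z < -2.33.
print("reject H0" if z < -2.33 else "accept H0")
```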
Problem 4.2.3. A random sample of 500 apples was taken from a large consignment and 60 were found to be defective. Obtain the 98% confidence limits for the percentage of bad apples in the consignment.
Solution: Given $n = 500$, $p$ = proportion of bad apples in the sample $= \frac{60}{500} = 0.12$, $q = 1 - p = 0.88$.

The 98% confidence limits for the population proportion are
$$ \left(p - 2.33\sqrt{\frac{pq}{n}},\ p + 2.33\sqrt{\frac{pq}{n}}\right) $$
i.e.,
$$ \left(0.12 - 2.33\sqrt{\frac{(0.12)(0.88)}{500}},\ 0.12 + 2.33\sqrt{\frac{(0.12)(0.88)}{500}}\right) $$
i.e., $(0.0861, 0.1539)$.
∴ The 98% confidence limits for the percentage of bad apples in the consignment are (8.61, 15.39).

Problem 4.2.4. A salesman in a departmental store claims that at most 60 percent of the shoppers entering the store leave without making a purchase. A random sample of 50 shoppers showed that 35 of them left without making a purchase. Are these sample results consistent with the claim of the salesman? Use a level of significance of 0.05.
Solution: Here $n = 50$, $p$ = sample proportion $= \frac{35}{50} = 0.7$
$P$ = population proportion $= 60\% = 0.6$; $Q = 1 - P = 1 - 0.6 = 0.4$

Null Hypothesis, $H_0 : p = P$
Alternative Hypothesis, $H_1 : p > P$ (one tailed (right) test)

The LOS is 0.05, ∴ $Z_\alpha = 1.645$

The test statistic,
$$ Z = \frac{p - P}{\sqrt{\frac{PQ}{n}}} = \frac{0.7 - 0.6}{\sqrt{\frac{0.6 \times 0.4}{50}}} = 1.4434 $$

Since $Z = 1.4434 < 1.645 = Z_\alpha$, we accept the null hypothesis,
i.e., the difference between $p$ and $P$ is not significant.
∴ The sample results are consistent with the claim of the salesman.
Type II: Test of significance for difference of proportions

To test the significance of the difference between two sample proportions $p_1$ and $p_2$, the test statistic is
$$ Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
where $P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $Q = 1 - P$.

Problem 4.2.5. A random sample of 400 men and 600 women were asked whether they would like to have a flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the hypothesis that the proportions of men and women in favour of the proposal are the same at the 5% level.

Solution: Given, sample sizes $n_1 = 400$ and $n_2 = 600$.
Proportion of men, $p_1 = \frac{200}{400} = 0.5$
Proportion of women, $p_2 = \frac{325}{600} = 0.5417$

Null Hypothesis, $H_0$: Assume that there is no significant difference between the opinions of men and women as far as the proposal of the flyover is concerned,
i.e., $H_0 : p_1 = p_2$.
Alternative Hypothesis, $H_1 : p_1 \neq p_2$ (two tailed)
Given, the LOS is 5%, ∴ $|Z_\alpha| = 1.96$

The test statistic,
$$ Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
where
$$ P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(400)(0.5) + (600)(0.5417)}{400 + 600} = 0.525 $$
and $Q = 1 - P = 1 - 0.525 = 0.475$.

$$ \therefore Z = \frac{0.5 - 0.5417}{\sqrt{(0.525)(0.475)\left(\frac{1}{400} + \frac{1}{600}\right)}} = -1.29 $$

Since $|Z| = 1.29 < 1.96$, we accept the null hypothesis at 5% LOS,
i.e., there is no significant difference of opinion between men and women as far as the proposal of the flyover is concerned.

Problem 4.2.6. 15.5% of a random sample of 1600 undergraduates were smokers, whereas 20% of a random sample of 900 postgraduates were smokers in a state. Can we conclude that a smaller proportion of undergraduates than of postgraduates are smokers, at 1% LOS?

Solution: Given, sample sizes $n_1 = 1600$ and $n_2 = 900$.
Proportion of smokers in UG, $p_1 = 15.5\% = 0.155$
Proportion of smokers in PG, $p_2 = 20\% = 0.20$

Null Hypothesis, $H_0 : p_1 = p_2$
Alternative Hypothesis, $H_1 : p_1 < p_2$ (one tailed)
Given, the LOS is 1%, ∴ $|Z_\alpha| = 2.33$ (one tailed)

The test statistic,
$$ Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
where
$$ P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(1600)(0.155) + (900)(0.20)}{1600 + 900} = 0.1712 $$
and $Q = 1 - P = 1 - 0.1712 = 0.8288$.

$$ \therefore Z = \frac{0.155 - 0.20}{\sqrt{(0.1712)(0.8288)\left(\frac{1}{1600} + \frac{1}{900}\right)}} = -2.8671 $$

Since $|Z| = 2.8671 > 2.33 = |Z_\alpha|$, we reject the null hypothesis at 1% LOS,
i.e., we accept the alternative hypothesis $H_1$,
i.e., we conclude that the proportion of smokers in UG is less than the proportion of smokers in PG.
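Both problems of this type follow the same computation, which can be sketched as below. The function name `z_two_proportions` is hypothetical, and the function takes the observed counts rather than the proportions, so that $P = (n_1 p_1 + n_2 p_2)/(n_1 + n_2)$ reduces to the total count divided by the total sample size. For Problem 4.2.6 the counts are $0.155 \times 1600 = 248$ and $0.20 \times 900 = 180$.

```python
from math import sqrt

def z_two_proportions(x1, n1, x2, n2):
    """Z for the difference of two sample proportions p1 = x1/n1, p2 = x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)  # pooled proportion
    Q = 1 - P
    return (p1 - p2) / sqrt(P * Q * (1 / n1 + 1 / n2))

# Problem 4.2.5: 200 of 400 men and 325 of 600 women in favour.
z_flyover = z_two_proportions(200, 400, 325, 600)
print(round(z_flyover, 4))

# Problem 4.2.6: 248 of 1600 undergraduates and 180 of 900 postgraduates smoke.
z_smokers = z_two_proportions(248, 1600, 180, 900)
print(round(z_smokers, 4))
```

With $p_2 = 325/600$ kept unrounded, the first value comes out close to $-1.29$; rounding $p_2$ before the division shifts the second decimal place slightly, but the conclusion at 5% LOS is unaffected.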
Type III: Test of significance for single mean

To test whether a given sample of size $n$ and mean $\bar{x}$ has been drawn from a population with mean $\mu$, we set up the null hypothesis that there is no significant difference between $\bar{x}$ and $\mu$. The test statistic is
$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} $$
where $\sigma$ is the standard deviation of the population and $n$ is the sample size.

Problem 4.2.7. A sample of 900 members has a mean of 3.4 cm and an S.D. of 2.61 cm. Is the sample drawn from a large population with mean 3.25 cm and S.D. 2.61 cm?

Solution: Given $n = 900$, $\bar{x} = 3.4$, $s = 2.61$, $\mu = 3.25$, $\sigma = 2.61$

Null Hypothesis, $H_0$: The sample has been drawn from the population with mean $\mu = 3.25$ (or) $\bar{x} = \mu$
Alternative Hypothesis, $H_1 : \bar{x} \neq \mu$ (two tailed)
Let the LOS be 5%. ∴ $|Z_\alpha| = 1.96$.
The test statistic
$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.7241 $$

∵ $|Z| = 1.7241 < 1.96$, we accept the null hypothesis at 5% LOS,
i.e., the sample has been drawn from the population with mean $\mu = 3.25$.

Problem 4.2.8. The mean breaking strength of the cables supplied by a manufacturer is 1800 with an S.D. of 100. By a new technique in the manufacturing process, it is claimed that the breaking strength of the cable has increased. In order to test this claim, a sample of 50 cables is tested and it is found that the mean breaking strength is 1850. Can we support the claim at 1% LOS?

Solution: Given $n = 50$, $\bar{x} = 1850$, $\mu = 1800$, $\sigma = 100$

Null Hypothesis, $H_0$: There is no significant difference between the mean breaking strength of the sample and that of the population (or) $\bar{x} = \mu$
Alternative Hypothesis, $H_1 : \bar{x} > \mu$ (one tailed test)
Given, the LOS is 1%. ∴ $|Z_\alpha| = 2.33$ (one tailed)
The test statistic
$$ Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1850 - 1800}{100/\sqrt{50}} = 3.5355 $$

∵ $|Z| = 3.5355 > 2.33$, we reject the null hypothesis at 1% LOS,
i.e., we accept the alternative hypothesis $H_1$,
i.e., based on the sample data, we may support the claim of an increase in breaking strength.
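The two single-mean problems above use the same statistic, sketched here with a hypothetical function name `z_single_mean`.

```python
from math import sqrt

def z_single_mean(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n)) for a large sample."""
    return (xbar - mu) / (sigma / sqrt(n))

# Problem 4.2.7: n = 900, sample mean 3.4, population mean 3.25, S.D. 2.61.
z_members = z_single_mean(3.4, 3.25, 2.61, 900)
print(round(z_members, 4))

# Problem 4.2.8: n = 50, sample mean 1850, population mean 1800, S.D. 100.
z_cables = z_single_mean(1850, 1800, 100, 50)
print(round(z_cables, 4))
```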

Type IV: Test of significance for difference of means

Let $\bar{x}_1$ be the mean of a sample of size $n_1$ from a population with mean $\mu_1$ and variance $\sigma_1^2$, and let $\bar{x}_2$ be the mean of a sample of size $n_2$ from a population with mean $\mu_2$ and variance $\sigma_2^2$.
To test whether there is any significant difference between $\bar{x}_1$ and $\bar{x}_2$, we use the test statistic
$$ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$

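Note: If the samples have been drawn from the same population, then $\sigma_1^2 = \sigma_2^2 = \sigma^2$
$$ \therefore Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}} = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

If $\sigma$ is not known, we can use the estimate of $\sigma^2$ given by
$$ \sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} $$
where $s_1^2$ and $s_2^2$ are the variances of sample 1 and sample 2 respectively.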

Problem 4.2.9. The means of two large samples of 1000 and 2000 members are 67.5 inches and 68.0 inches respectively. Can the samples be regarded as drawn from the same population of standard deviation 2.5 inches?

Solution: Given, sample sizes $n_1 = 1000$ and $n_2 = 2000$.
Sample means $\bar{x}_1 = 67.5$ inches and $\bar{x}_2 = 68$ inches
Population S.D. $\sigma = 2.5$ inches.

Null Hypothesis, $H_0$: The samples have been drawn from the same population of S.D. 2.5 inches (or) $\mu_1 = \mu_2$ and $\sigma = 2.5$ inches.
Alternative Hypothesis, $H_1 : \mu_1 \neq \mu_2$ (two tailed test)
Let the LOS be 5%. ∴ $|Z_\alpha| = 1.96$.
The test statistic
$$ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{67.5 - 68}{2.5\sqrt{\frac{1}{1000} + \frac{1}{2000}}} = -5.1640 $$

Since $|Z| = 5.1640 > 1.96$, we reject the null hypothesis at 5% LOS,
i.e., the samples are not drawn from the same population of S.D. 2.5 inches.

Problem 4.2.10. A simple sample of heights of 6400 Englishmen has a mean of 170 cm and an S.D. of 6.4 cm, while a simple sample of heights of 1600 Americans has a mean of 172 cm and an S.D. of 6.3 cm. Do the data indicate that Americans are, on average, taller than Englishmen?

Solution: Sample of Englishmen: size $n_1 = 6400$, mean height $\bar{x}_1 = 170$ cm and S.D. $s_1 = 6.4$ cm.
Sample of Americans: size $n_2 = 1600$, mean height $\bar{x}_2 = 172$ cm and S.D. $s_2 = 6.3$ cm.

Null Hypothesis, $H_0$: There is no significant difference between the mean height of Englishmen and the mean height of Americans, i.e., $\mu_1 = \mu_2$.
Alternative Hypothesis, $H_1 : \mu_1 < \mu_2$ (one tailed test)
Let the LOS be 1%. ∴ $|Z_\alpha| = 2.33$ (one tailed test).
The test statistic
$$ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where
$$ \sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} = \frac{6400 \times 6.4^2 + 1600 \times 6.3^2}{6400 + 1600} = 40.706 $$
$$ \therefore Z = \frac{170 - 172}{\sqrt{40.706}\,\sqrt{\frac{1}{6400} + \frac{1}{1600}}} = -11.2152 $$

Since $|Z| = 11.2152 > |Z_\alpha| = 2.33$, we reject the null hypothesis at 1% LOS,
i.e., we accept the alternative hypothesis.
We conclude that, on average, Americans are taller than Englishmen.
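The difference-of-means computations in Problems 4.2.9 and 4.2.10 can be sketched as follows. The function names `pooled_variance` and `z_diff_means` are hypothetical; the pooled estimate is the one given in the Note above, used when $\sigma$ is not known.

```python
from math import sqrt

def pooled_variance(n1, s1, n2, s2):
    """Estimate of sigma^2: (n1*s1^2 + n2*s2^2) / (n1 + n2)."""
    return (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)

def z_diff_means(xbar1, n1, xbar2, n2, sigma):
    """Z = (xbar1 - xbar2) / (sigma * sqrt(1/n1 + 1/n2)), common S.D. sigma."""
    return (xbar1 - xbar2) / (sigma * sqrt(1 / n1 + 1 / n2))

# Problem 4.2.9: known common S.D. of 2.5 inches.
z_same_population = z_diff_means(67.5, 1000, 68.0, 2000, 2.5)
print(round(z_same_population, 4))

# Problem 4.2.10: sigma estimated from the two sample S.D.s.
var_hat = pooled_variance(6400, 6.4, 1600, 6.3)
z_heights = z_diff_means(170, 6400, 172, 1600, sqrt(var_hat))
print(round(var_hat, 3), round(z_heights, 4))
```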
