Lecture 5 Stratified Sampling
Principles of Stratification
The principal reasons for using stratified random sampling rather than simple
random sampling include:
➢ You use simple random sampling to choose subjects from within each of your
nine groups, selecting a roughly equal sample size from each one.
➢ You can then collect data on salaries and job histories from each of the
members of your sample to investigate your question.
Advantages:
Notations
N : Population size
k : Number of strata
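The population of N units is divided into k strata, and a sample is drawn from each stratum:

Stratum        Stratum 1    Stratum 2    …    Stratum k
Stratum size   N1 units     N2 units     …    Nk units
Sample size    n1 units     n2 units     …    nk units

$$N = \sum_{i=1}^{k} N_i, \qquad n = \sum_{i=1}^{k} n_i$$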
Estimation of Population Mean and Its Variance
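Let
$$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$$ : sample mean of the ith stratum
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} y_{ij} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} W_i \bar{Y}_i$$ : population mean, where $W_i = \frac{N_i}{N}$ and $\bar{Y}_i$ is the population mean of the ith stratum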
An estimator $\bar{y}_{st}$ (st for stratified) for the population mean $\bar{Y}$ can be written as
$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{y}_i = \sum_{i=1}^{k} W_i \bar{y}_i, \quad \text{where } W_i = \frac{N_i}{N}$$
If in every stratum the sample estimator 𝑦̅𝑖 is unbiased and samples are drawn
independently in different strata, then 𝑦̅𝑠𝑡 is an unbiased estimator of the population
mean and its sampling variance is given by
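$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 \left(1 - \frac{n_i}{N_i}\right)\frac{S_i^2}{n_i} = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{S_i^2}{n_i},$$
where
$$S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2$$
is the population variance of the ith stratum. When the $S_i^2$ are unknown, the variance is estimated by replacing $S_i^2$ with the sample variance $s_i^2$:
$$var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{s_i^2}{n_i},$$
where
$$s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$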
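The unbiasedness and variance results can be traced in a few lines. The sketch below assumes simple random sampling without replacement within each stratum, so that $Var(\bar{y}_i) = (1 - n_i/N_i)\,S_i^2/n_i$, and uses the independence of the samples across strata. It also shows why the two forms of the variance are the same.

```latex
\begin{align*}
E(\bar{y}_{st}) &= \sum_{i=1}^{k} W_i\,E(\bar{y}_i)
                 = \sum_{i=1}^{k} W_i\,\bar{Y}_i = \bar{Y},\\
Var(\bar{y}_{st}) &= \sum_{i=1}^{k} W_i^2\,Var(\bar{y}_i)
                   = \sum_{i=1}^{k} W_i^2\left(1-\frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}\\
                  &= \sum_{i=1}^{k} \frac{N_i^2}{N^2}\cdot\frac{N_i-n_i}{N_i}\cdot\frac{S_i^2}{n_i}
                   = \frac{1}{N^2}\sum_{i=1}^{k} N_i\,(N_i-n_i)\,\frac{S_i^2}{n_i}.
\end{align*}
```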
Allocation Problem and Choice of Sample Sizes in Different Strata
In stratified sampling, the allocation of the sample to different strata is done by the
consideration of three factors, viz.,
➢ The sample size cannot be determined by minimizing both cost and variability
simultaneously.
➢ The cost function is directly proportional to the sample size whereas
variability is inversely proportional to the sample size.
There are four methods of allocation of sample sizes to different strata in a stratified
sampling procedure and these are:
i. Equal allocation
ii. Proportional allocation
iii. Neyman allocation
iv. Optimum allocation
1. Equal Allocation
Choose the sample size 𝑛𝑖 to be the same for all the strata. Let n be the sample
size and k be the number of strata, then
$$n_i = \frac{n}{k} \quad \text{for all } i = 1, 2, \ldots, k.$$
2. Proportional Allocation
$$n_i \propto N_i$$
$$n_i = \left(\frac{n}{N}\right) N_i$$
• This means that the sampling fraction is the same in all strata.
• It gives a self-weighting sample by which numerous estimates can be
made with greater speed and a higher degree of precision.
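As a minimal sketch, equal and proportional allocation can be computed as follows. The function names and the illustrative inputs (two strata of 20 and 10 units, total sample size 12) are assumptions made for this example; the results are left unrounded.

```python
def equal_allocation(n, k):
    """Equal allocation: n_i = n / k for each of the k strata."""
    return [n / k] * k


def proportional_allocation(n, stratum_sizes):
    """Proportional allocation: n_i = (n / N) * N_i."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]


# Illustrative inputs (assumed): strata of 20 and 10 units, total sample of 12.
sizes = [20, 10]
print(equal_allocation(12, len(sizes)))    # [6.0, 6.0]
print(proportional_allocation(12, sizes))  # [8.0, 4.0]
```

Under proportional allocation the sampling fraction n_i / N_i is 12/30 = 0.4 in both strata, which is what makes the sample self-weighting.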
3. Neyman Allocation
$$n_i \propto N_i S_i$$ and
$$n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}$$
This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint $n = \sum_{i=1}^{k} n_i$ (prespecified).
➢ This method may be difficult to use because the values of $S_i$ are usually unknown.
➢ However, the stratum variances may be obtained from previous surveys or
from a specially planned pilot survey.
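A minimal sketch of Neyman allocation, n_i = n N_i S_i / Σ N_i S_i, is given below. The function name is hypothetical, and the inputs (stratum sizes 20 and 10 with stratum variances 2.74 and 10.89, total sample size 12) are illustrative assumptions that would in practice come from a previous or pilot survey.

```python
import math


def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Neyman allocation: n_i = n * N_i * S_i / sum(N_i * S_i)."""
    weights = [N_i * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Illustrative inputs (assumed): S_i is the square root of the stratum variance.
sizes = [20, 10]
sds = [math.sqrt(2.74), math.sqrt(10.89)]
allocation = neyman_allocation(12, sizes, sds)
print([round(n_i, 2) for n_i in allocation])
```

With these inputs the two strata receive about 6 units each: the smaller stratum is more variable, so it gets a larger share than the 4 units it would receive under proportional allocation.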
4. Optimum Allocation
In this method of allocation, the sample sizes 𝑛𝑖 in the respective strata are
determined with a view
➢ to minimize $V(\bar{y}_{st})$ for a specified cost of conducting the sample survey, or
➢ to minimize the cost for a specified value of $V(\bar{y}_{st})$.
The simplest cost function in stratified sampling that can be taken is
$$C = c + \sum_{i=1}^{k} n_i c_i$$
where
✓ overhead cost c is constant and
✓ 𝑐𝑖 is the average cost of surveying one unit in the ith stratum, which may
depend upon the nature and size of the units in the stratum.
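The optimum $n_i$ are obtained by minimizing the function
$$\psi = V(\bar{y}_{st}) + \lambda C,$$
where $\lambda$ is a Lagrange multiplier. Setting $\partial\psi/\partial n_i = 0$ gives
$$n_i = \frac{W_i S_i}{\sqrt{\lambda c_i}} \qquad \ldots (1)$$
and, summing over the strata,
$$n = \sum_{i=1}^{k} \frac{W_i S_i}{\sqrt{\lambda c_i}} \qquad \ldots (2)$$
Dividing (1) by (2),
$$n_i = n\,\frac{W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}} \qquad \ldots (3)$$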
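Thus, relation (3) leads to the following important conclusions: in a given stratum, we have to take a larger sample if
➢ the stratum size $N_i$ is larger,
➢ the stratum variability $S_i$ is larger, or
➢ the cost per unit $c_i$ is lower.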
Note:
➢ If the $c_i$'s are the same from stratum to stratum, relation (3) will lead to the
Neyman allocation.
➢ Similarly, if the $c_i$'s and $S_i$'s do not vary from stratum to stratum, relation (3) will
lead to proportional allocation.
The total sample size n required for estimating the population mean with a specified cost
C is given by
$$n = \frac{(C - c)\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}}$$
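The fixed-cost case can be sketched in code. The example assumes the standard optimum-allocation rule n_i ∝ N_i S_i / √c_i and the cost function C = c + Σ n_i c_i, so that n = (C − c) Σ(N_i S_i/√c_i) / Σ(N_i S_i √c_i); the factor 1/N in W_i cancels from both ratios. The function name and all numeric inputs are hypothetical.

```python
import math


def optimum_allocation(budget, overhead, stratum_sizes, stratum_sds, unit_costs):
    """Return (n, [n_i]) for a fixed total cost, with n_i proportional to
    N_i * S_i / sqrt(c_i)."""
    strata = list(zip(stratum_sizes, stratum_sds, unit_costs))
    a = [N_i * S_i / math.sqrt(c_i) for N_i, S_i, c_i in strata]
    b = [N_i * S_i * math.sqrt(c_i) for N_i, S_i, c_i in strata]
    n = (budget - overhead) * sum(a) / sum(b)
    return n, [n * a_i / sum(a) for a_i in a]


# Illustrative inputs (assumed): budget 30, overhead 10, unit costs 1 and 4.
n, allocation = optimum_allocation(30, 10, [20, 10], [2.0, 3.0], [1.0, 4.0])
print(n, allocation)  # 11.0 [8.0, 3.0]
```

The sampling cost here is 8 × 1 + 3 × 4 = 20, which together with the overhead of 10 exhausts the budget of 30. Setting all unit costs equal reduces the rule to Neyman allocation.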
If V is fixed, we find
The values of the stratum variances can be obtained from earlier surveys or from a
knowledge of the measurements within each stratum.
                     Stratum I                            Stratum II
Family #   Family size   Family #   Family size   Family #   Family size
    1           5           11           6            1           3
    2           7           12           9            2           9
    3           3           13           5            3           2
    4           9           14           4            4           6
    5           8           15           8            5          11
    6           6           16           7            6           4
    7           5           17           6            7           8
    8           6           18           4            8           2
    9           6           19           6            9           3
   10           4           20           6           10           2
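Solution:
              Stratum 1                              Stratum 2
  j    $Y_{1j}$   $(Y_{1j} - \bar{Y}_1)^2$    $Y_{2j}$   $(Y_{2j} - \bar{Y}_2)^2$
  1       5          1                           3          4
  2       7          1                           9         16
  3       3          9                           2          9
  4       9          9                           6          1
  5       8          4                          11         36
  6       6          0                           4          1
  7       5          1                           8          9
  8       6          0                           2          9
  9       6          0                           3          4
 10       4          4                           2          9
 11       6          0
 12       9          9
 13       5          1
 14       4          4
 15       8          4
 16       7          1
 17       6          0
 18       4          4
 19       6          0
 20       6          0

$\sum_{j=1}^{20} Y_{1j} = 120$, $\sum_{j=1}^{20} (Y_{1j} - \bar{Y}_1)^2 = 52$
$\sum_{j=1}^{10} Y_{2j} = 50$, $\sum_{j=1}^{10} (Y_{2j} - \bar{Y}_2)^2 = 98$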
1. Stratum Mean
Mean of Stratum 1
$$\bar{Y}_1 = \frac{120}{20} = 6$$
Mean of Stratum 2
$$\bar{Y}_2 = \frac{50}{10} = 5$$
Stratum Variance
Variance of Stratum 1
$$S_1^2 = \frac{1}{N_1 - 1}\sum_{j=1}^{20} (y_{1j} - \bar{Y}_1)^2 = \frac{1}{19} \times 52 = 2.74$$
Variance of Stratum 2
$$S_2^2 = \frac{1}{N_2 - 1}\sum_{j=1}^{10} (y_{2j} - \bar{Y}_2)^2 = \frac{1}{9} \times 98 = 10.89$$
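The stratum means and variances can be checked with Python's standard library. This is only a verification sketch: the family sizes are copied from the data table, and `statistics.variance` is used because it divides by the number of values minus one, which matches the N_i − 1 divisor in S_i².

```python
from statistics import mean, variance

# Family sizes copied from the data table.
stratum_1 = [5, 7, 3, 9, 8, 6, 5, 6, 6, 4, 6, 9, 5, 4, 8, 7, 6, 4, 6, 6]
stratum_2 = [3, 9, 2, 6, 11, 4, 8, 2, 3, 2]

for label, values in (("Stratum 1", stratum_1), ("Stratum 2", stratum_2)):
    # variance() uses the (len(values) - 1) divisor, as in S_i^2.
    print(label, "mean:", mean(values), "variance:", round(variance(values), 2))
```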
Population Variance
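$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{S_i^2}{n_i}$$
$$= \frac{1}{N^2}\left[N_1 (N_1 - n_1)\frac{S_1^2}{n_1} + N_2 (N_2 - n_2)\frac{S_2^2}{n_2}\right]$$
$$= \frac{1}{30^2}\left[20(20 - 8)\frac{2.74}{8} + 10(10 - 4)\frac{10.89}{4}\right] = 0.27$$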
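The arithmetic can be reproduced with a short function. This is a sketch of the variance formula for the stratified mean; the function name is hypothetical, and the inputs are the stratum sizes (20, 10), sample sizes (8, 4) and the unrounded stratum variances 52/19 and 98/9 from this example.

```python
def var_stratified_mean(stratum_sizes, sample_sizes, stratum_variances):
    """Var(y_st) = (1 / N^2) * sum of N_i * (N_i - n_i) * S_i^2 / n_i."""
    N = sum(stratum_sizes)
    total = sum(
        N_i * (N_i - n_i) * S2_i / n_i
        for N_i, n_i, S2_i in zip(stratum_sizes, sample_sizes, stratum_variances)
    )
    return total / N**2


v = var_stratified_mean([20, 10], [8, 4], [52 / 19, 98 / 9])
print(round(v, 2))  # 0.27
```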
Selected sample:
        Stratum I                  Stratum II
Selected Sl. #   Value     Selected Sl. #   Value
      13           5              5           11
      05           8              4            6
      19           6              3            2
      06           6              7            4
      18           4
      16           7
      15           8
      14           4
Sample mean of Stratum 1
$$\bar{y}_1 = \frac{48}{8} = 6.0$$
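The sample mean of Stratum I, and the sample variance that feeds into the variance estimate, can be checked directly from the selected values. The helper name below is hypothetical; it applies the usual definitions with the n_i − 1 divisor for s_i².

```python
def sample_mean_and_variance(values):
    """Return (mean, s^2), where s^2 uses the n - 1 divisor."""
    n = len(values)
    ybar = sum(values) / n
    s2 = sum((y - ybar) ** 2 for y in values) / (n - 1)
    return ybar, s2


# Stratum I values as listed in the selected-sample table.
stratum_1_sample = [5, 8, 6, 6, 4, 7, 8, 4]
ybar_1, s2_1 = sample_mean_and_variance(stratum_1_sample)
print(ybar_1, round(s2_1, 2))  # 6.0 2.57
```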
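Estimate of the variance of $\bar{y}_{st}$, using the sample variances $s_1^2 = 2.57$ and $s_2^2 = 14.92$ computed from the selected sample values:
$$var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{s_i^2}{n_i}$$
$$= \frac{1}{30^2}\left[20(20 - 8)\frac{2.57}{8} + 10(10 - 4)\frac{14.92}{4}\right]$$
$$= \frac{1}{900}(77.1 + 223.8)$$
$$= \frac{300.9}{900} = 0.334$$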