
Lecture 5: Stratified Sampling

When is Stratified Sampling Used?

➢ A representative sample is important for a good estimate of the population
parameter.
➢ If the population is homogenous with respect to the characteristic under
study, then the method of simple random sampling will yield a homogenous
sample and in turn, the sample mean will serve as a good estimator of
population mean.
➢ Thus, if the population is homogenous with respect to the characteristic
under study, then the sample drawn through simple random sampling is
expected to provide a representative sample.
➢ Moreover, the variance of sample mean not only depends on the sample size
and sampling fraction but also on the population variance.
➢ To increase the precision of an estimator, we need a sampling scheme that
reduces the heterogeneity within the groups being sampled.
➢ If the population is heterogeneous with respect to the characteristic under
study, then one such sampling procedure is stratified sampling.

Definition of Stratified Sampling

➢ Stratified random sampling is a sampling plan in which
✓ the population is divided into several non-overlapping strata and
✓ a random sample is selected from each stratum, with the strata formed so
that units within a stratum are homogeneous while the strata differ from
one another.
➢ Strata are generally formed on the basis of some known characteristics of
the population that are believed to be related to the variable of interest.
➢ Suppose that a population of N individuals can be subdivided into k
mutually exclusive and collectively exhaustive groups, or strata.
➢ Stratified random sampling is the selection of independent simple random
samples from each stratum of the population.
➢ Let the k strata in the population contain N1, N2, ..., Nk members, such that
N1 + N2 + ... + Nk = N.
➢ Let the numbers in the samples be n1, n2, ..., nk. Then the total number
of sample members is n1 + n2 + ... + nk = n.
• The sample sizes in the strata may be chosen in proportion to the stratum
sizes or equal across strata.
• A proportionate stratified sample is achieved if the sampling fraction is
the same for every stratum. Under this design, the sample size in a stratum
is proportional to the size of the population in the stratum.
• For each individual stratum, stratum mean, proportion, variance, and other
characteristics are computed.
• These estimates are then properly weighted to form a combined estimate
for the entire population.

Principles of Stratification

➢ The strata should be non-overlapping and exhaustive so that they together
comprise the whole population.
➢ The strata should be made as homogenous as possible within strata and
heterogeneous between strata.
➢ Strata are to be formed on the basis of some known characteristics of the
population, which are believed to have some relationship with the subject
of inquiry and variables of interest.
➢ When stratification with respect to the characteristics under study becomes
difficult for practical reasons, administrative convenience may be
considered as the basis for forming the strata.
➢ To improve the sampling design, strata should be formed on the basis of
natural characteristics as far as possible.
➢ Past data, intuition, expert judgment or preliminary findings from pilot
surveys may also be used to set up the strata.
➢ This, however, requires that we have prior knowledge of the nature of the
population from which we are sampling.

The principal reasons for using stratified random sampling rather than simple
random sampling include:

1. Stratification may produce a smaller error of estimation than would be
produced by a simple random sample of the same size. This result is
particularly true if measurements within strata are very homogeneous.
2. The cost per observation in the survey may be reduced by stratification of
the population elements into convenient groupings.
3. Estimates of population parameters may be desired for subgroups of the
population. These subgroups should then be identified.

Steps Involved in Stratified Sampling

Step 1: Define your population and subgroups


➢ You should begin by clearly defining the population from which your sample
will be taken.

Choosing characteristics for stratification


➢ You must also choose the characteristic that you will use to divide your
groups.
➢ This choice is very important:
✓ since each member of the population can be placed in only one
subgroup,
✓ the classification of each subject into a subgroup should be clear and
obvious.

Stratifying by multiple characteristics


➢ You can choose to stratify by multiple different characteristics at once, so
long as you can clearly match every subject to exactly one subgroup.
➢ In this case, to get the total number of subgroups, you multiply the numbers
of strata for each characteristic.
➢ For instance, if you were stratifying by both race and gender identity, using
four groups for the former and three for the latter, you would have 4 x 3 = 12
groups in total.

Example: Stratifying by multiple characteristics


➢ Suppose we are interested in the salaries and job histories of the
university's graduates.
➢ Your population is all graduates of the university within the last ten years.
➢ You will stratify by both gender identity and degree received.

Step 2: Separate the population into strata


➢ Next, collect a list of every member of the population, and assign each
member to a stratum.
➢ You must ensure that the strata are mutually exclusive (there is no overlap
between them), but that together, they contain the entire population.

Example: Separating the population into strata
➢ You compile a list of every graduate’s name, gender identity, and the degree
that they obtained.
➢ Using this list, you stratify on two characteristics:
✓ gender identity, with three strata (male, female, and other), and
✓ degree, with three strata (bachelor’s, master’s, and doctorate).
➢ Combining these characteristics, you have nine groups in total.
➢ Each graduate must be assigned to exactly one group.

Characteristic      Strata
Gender identity     Male, Female, Other
Degree              Bachelor's, Master's, Doctorate

Groups/strata:
1. Male bachelor's graduates
2. Female bachelor's graduates
3. Other bachelor's graduates
4. Male master's graduates
5. Female master's graduates
6. Other master's graduates
7. Male doctoral graduates
8. Female doctoral graduates
9. Other doctoral graduates
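The nine groups above can be enumerated mechanically by crossing the two stratification variables. The following is only a minimal sketch: the record layout (a dictionary with `gender` and `degree` keys) and the helper name `assign_stratum` are assumptions made for illustration, not part of the lecture.

```python
from itertools import product

# Stratification variables from the example above.
genders = ["Male", "Female", "Other"]
degrees = ["bachelor's", "master's", "doctoral"]

# Cross every degree with every gender identity: 3 x 3 = 9 strata,
# listed in the same order as the numbered groups.
strata = [(gender, degree) for degree, gender in product(degrees, genders)]

def assign_stratum(graduate):
    """Return the single (gender, degree) stratum a graduate belongs to."""
    key = (graduate["gender"], graduate["degree"])
    if key not in strata:
        # Every unit must fall in exactly one stratum.
        raise ValueError(f"graduate does not fit any stratum: {key}")
    return key

# Hypothetical record, for illustration only.
example = {"name": "A. Graduate", "gender": "Female", "degree": "master's"}
print(len(strata))
print(assign_stratum(example))
```

Raising an error for an unmatched record is one way to enforce the requirement that the strata be exhaustive.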

Step 3: Decide on the sample size for each stratum


➢ First, you need to decide whether you want your sample to be
proportionate or disproportionate/equal.

Proportionate versus disproportionate sampling


➢ In proportionate sampling, the sample size of each stratum is proportional
to the subgroup's share of the population as a whole.
➢ Subgroups that are less represented in the greater population (for
example, rural populations, which make up a lower portion of the
population in most developed countries) will also be less represented in
the sample.
➢ In disproportionate sampling, the sample size of each stratum is not
proportional to its representation in the population as a whole.
✓ You might choose this method if you wish to study a particularly
underrepresented subgroup whose sample size would otherwise be too
low to allow you to draw any statistical conclusions.

Sample size
➢ Next, you can decide on your total sample size.
➢ This should be large enough to ensure you can draw statistical conclusions
about each subgroup.
➢ If you know your desired margin of error and confidence level as well as
estimated size and standard deviation of the population you are working
with, you can use a sample size calculator/determination formula to estimate
the necessary numbers.

Example: Sample size


➢ Because you need to ensure your sample size of doctoral graduates is large
enough, you decide to use disproportionate sampling.

➢ Even though doctoral graduates make up a small proportion of the overall
graduate population, your sample is about ⅓ bachelor's graduates, ⅓ master's
graduates, and ⅓ doctoral graduates.

Step 4: Randomly sample from each stratum
➢ Finally, you should use another probability sampling method, such as simple
random or systematic sampling, to sample from within each stratum.
➢ If properly done, the randomization inherent in such methods will allow you
to obtain a sample that is representative of that particular subgroup.

Example: Random sampling

➢ You use simple random sampling to choose subjects from within each of your
nine groups, selecting a roughly equal sample size from each one.
➢ You can then collect data on salaries and job histories from each of the
members of your sample to investigate your question.
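Steps 2 to 4 can be sketched in a few lines of code. This is an illustrative sketch rather than a prescribed procedure: the function name `stratified_sample`, the population counts, and the sample sizes are all hypothetical, and simple random sampling without replacement is assumed within each stratum.

```python
import random

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Draw an independent simple random sample from each stratum.

    population : list of units
    stratum_of : function mapping a unit to its stratum label
    sizes      : dict {stratum label: number of units to draw}
    """
    rng = random.Random(seed)
    # Step 2: separate the population into strata.
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # Step 4: simple random sampling without replacement within each stratum.
    return {label: rng.sample(strata[label], n_i) for label, n_i in sizes.items()}

# Hypothetical population: 60 bachelor's, 30 master's and 10 doctoral graduates.
population = (
    [("bachelor's", i) for i in range(60)]
    + [("master's", i) for i in range(30)]
    + [("doctorate", i) for i in range(10)]
)

# Step 3: disproportionate (equal) allocation, 5 graduates per degree.
sizes = {"bachelor's": 5, "master's": 5, "doctorate": 5}
sample = stratified_sample(population, lambda unit: unit[0], sizes, seed=1)
```

Fixing `seed` only makes the illustration reproducible; a real survey would document how the random numbers were generated.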

Advantages:

➢ The stratification can improve the efficiency of estimation under appropriate
conditions.
➢ The stratification may be administratively convenient and facilitate the
drawing of a sample.
➢ We may want to estimate characteristics of the separate strata as well as of
the overall population.

Disadvantages:

➢ Requires ancillary information
➢ Can be more time consuming to plan and implement

Notations

We use the following symbols and notations:

N : Population size

k : Number of strata

$N_i$ : Number of sampling units in the ith stratum

$N = \sum_{i=1}^{k} N_i$ : Total population size

$n_i$ : Number of sampling units to be drawn from the ith stratum

$n = \sum_{i=1}^{k} n_i$ : Total sample size

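[Figure: a population of N units is divided into k strata (Stratum 1 with $N_1$ units, Stratum 2 with $N_2$ units, ..., Stratum k with $N_k$ units, where $N = \sum_{i=1}^{k} N_i$). A sample is drawn from each stratum (Sample 1 with $n_1$ units, Sample 2 with $n_2$ units, ..., Sample k with $n_k$ units, where $n = \sum_{i=1}^{k} n_i$).]

Estimation of Population Mean and Its Variance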

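Let

Y : characteristic under study

$y_{ij}$ : value of the jth unit in the ith stratum

$\bar{Y}_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}$ : population mean of the ith stratum

$\bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}$ : sample mean of the ith stratum

$\bar{Y} = \frac{1}{N}\sum_{i=1}^{k}\sum_{j=1}^{N_i} y_{ij} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{Y}_i = \sum_{i=1}^{k} W_i \bar{Y}_i$ : population mean, where $W_i = \frac{N_i}{N}$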

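An estimator $\bar{y}_{st}$ (st for stratified) for the population mean $\bar{Y}$ can be written as

$$\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{y}_i = \sum_{i=1}^{k} W_i \bar{y}_i, \quad \text{where } W_i = \frac{N_i}{N},$$

which is different from the overall sample mean

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{y}_i.$$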
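As a brief check on how the two means relate (a derivation sketch that uses only the definitions above), note that under proportional allocation, where $n_i/n = N_i/N = W_i$, the overall sample mean coincides with the stratified estimator:

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{k} n_i \bar{y}_i = \sum_{i=1}^{k} \frac{n_i}{n}\,\bar{y}_i = \sum_{i=1}^{k} W_i \bar{y}_i = \bar{y}_{st}.$$

For any other allocation the two quantities generally differ, because $\bar{y}$ weights each stratum by its share of the sample rather than by its share of the population.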

If in every stratum the sample estimator $\bar{y}_i$ is unbiased and samples are drawn
independently in different strata, then $\bar{y}_{st}$ is an unbiased estimator of the population
mean and its sampling variance is given by

$$Var(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 \left(1 - \frac{n_i}{N_i}\right)\frac{S_i^2}{n_i} = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{S_i^2}{n_i},$$

where $S_i^2 = \frac{1}{N_i - 1}\sum_{j=1}^{N_i} (y_{ij} - \bar{Y}_i)^2$.

If a simple random sample is taken within each stratum, an unbiased estimator of
$Var(\bar{y}_{st})$ can be obtained as follows:

$$var(\bar{y}_{st}) = \sum_{i=1}^{k} W_i^2 \left(1 - \frac{n_i}{N_i}\right)\frac{s_i^2}{n_i} = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{s_i^2}{n_i},$$

where $s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$.
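The stratified estimator and its estimated variance translate directly into code. The sketch below assumes simple random sampling without replacement in every stratum and at least two observations per stratum; the function name `stratified_estimate` and the small data set are hypothetical and serve only to illustrate the formulas.

```python
from statistics import mean, variance

def stratified_estimate(stratum_sizes, samples):
    """Return (y_st, estimated Var(y_st)).

    stratum_sizes : list of the stratum sizes N_i
    samples       : list of lists with the observed y_ij of each stratum
    """
    N = sum(stratum_sizes)
    y_st = 0.0
    var_hat = 0.0
    for N_i, y_i in zip(stratum_sizes, samples):
        n_i = len(y_i)
        W_i = N_i / N
        # Weighted stratum mean: W_i * ybar_i
        y_st += W_i * mean(y_i)
        # W_i^2 * (1 - n_i/N_i) * s_i^2 / n_i, where s_i^2 uses divisor n_i - 1
        var_hat += W_i**2 * (1 - n_i / N_i) * variance(y_i) / n_i
    return y_st, var_hat

# Hypothetical data: two strata of sizes 20 and 10.
y_st, var_hat = stratified_estimate([20, 10], [[4, 6, 5, 5], [10, 12]])
print(round(y_st, 4), round(var_hat, 4))
```

`statistics.variance` divides by $n_i - 1$, which matches the definition of $s_i^2$ used here.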
Allocation Problem and Choice of Sample Sizes in Different Strata

In stratified sampling, the allocation of the sample to the different strata takes
three factors into consideration, viz.,

✓ the total number of units in the stratum, i.e. the stratum size,
✓ the variability within the stratum, and
✓ the cost of taking observations per sampling unit in the stratum.

A good allocation is one which

✓ minimizes the cost of the survey for a given variance or specified precision, or
✓ minimizes the variance (maximizes the precision) for a given budget.

➢ The sample size cannot be determined by minimizing both cost and variability
simultaneously.
➢ The cost function is directly proportional to the sample size whereas
variability is inversely proportional to the sample size.

There are four methods of allocation of sample sizes to different strata in a stratified
sampling procedure and these are:
i. Equal allocation
ii. Proportional allocation
iii. Neyman allocation
iv. Optimum allocation

1. Equal Allocation
Choose the sample size $n_i$ to be the same for all the strata. Let n be the sample
size and k be the number of strata, then

$$n_i = \frac{n}{k} \quad \text{for all } i = 1, 2, \ldots, k.$$

2. Proportional Allocation

For fixed k, select $n_i$ such that it is proportional to the stratum size $N_i$, i.e.,

$$n_i \propto N_i, \qquad n_i = \left(\frac{n}{N}\right) N_i.$$

• This means that the sampling fraction is the same in all strata.
• It gives a self-weighting sample by which numerous estimates can be
made with greater speed and a higher degree of precision.
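Equal and proportional allocation can be compared with a small sketch. The function names are hypothetical; the stratum sizes (20 and 10) and total sample size (12) are those of the worked example at the end of this lecture, and the results are left unrounded, so a practical implementation would still need a rounding rule.

```python
def equal_allocation(n, k):
    """n_i = n / k for every stratum (may be fractional)."""
    return [n / k] * k

def proportional_allocation(n, stratum_sizes):
    """n_i = (n / N) * N_i, so every stratum has sampling fraction n / N."""
    N = sum(stratum_sizes)
    return [n * N_i / N for N_i in stratum_sizes]

# Strata of 20 and 10 units with a total sample of n = 12.
equal = equal_allocation(12, 2)
proportional = proportional_allocation(12, [20, 10])
print(equal)         # [6.0, 6.0]
print(proportional)  # [8.0, 4.0]
```

With proportional allocation the sampling fraction is 12/30 = 0.4 in both strata, which is what makes the sample self-weighting.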

3. Neyman or Optimum Allocation


➢ This allocation of the total sample size to strata is called minimum
variance allocation and is due to Neyman.
➢ The allocation of samples among different strata is based on a joint
consideration of the stratum size and the stratum variation.
➢ In this allocation, it is assumed that the sampling cost per unit is the
same in all strata and that the total sample size is fixed. Sample
sizes are allocated by

$$n_i \propto N_i S_i, \qquad n_i = \frac{n N_i S_i}{\sum_{i=1}^{k} N_i S_i}.$$

This allocation arises when $Var(\bar{y}_{st})$ is minimized subject to the constraint
$n = \sum_{i=1}^{k} n_i$ (prespecified).

There are some limitations of this allocation.

➢ It may be difficult to use this method because the values of $S_i$ will usually be
unknown.
➢ However, the stratum variances may be obtained from previous surveys or
from a specially planned pilot survey.
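A minimal sketch of Neyman allocation follows. The function name and the standard deviations are hypothetical (in practice the $S_i$ would come from a previous or pilot survey, as noted above), and the sketch neither rounds the result nor caps $n_i$ at $N_i$.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """n_i = n * N_i * S_i / sum_j (N_j * S_j)."""
    weights = [N_i * S_i for N_i, S_i in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical inputs: N = (20, 10), S = (2, 4), total sample size n = 12.
neyman = neyman_allocation(12, [20, 10], [2.0, 4.0])
print(neyman)
```

In this illustration the smaller stratum is twice as variable, so it receives as many units as the larger one even though it holds only a third of the population.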

4. Optimum Allocation
In this method of allocation, the sample sizes $n_i$ in the respective strata are
determined with a view
➢ to minimize $V(\bar{y}_{st})$ for a specified cost of conducting the sample survey or
➢ to minimize the cost for a specified value of $V(\bar{y}_{st})$.
The simplest cost function in stratified sampling that can be taken is

$$C = c + \sum_{i=1}^{k} n_i c_i$$
where
✓ overhead cost c is constant and
✓ 𝑐𝑖 is the average cost of surveying one unit in the ith stratum, which may
depend upon the nature and size of the units in the stratum.

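In this allocation, the sample sizes are found by choosing the $n_i$ to minimize

$$\psi = V(\bar{y}_{st}) + \lambda C,$$

where $\lambda$ is an unknown constant (a Lagrange multiplier). Setting the derivative of $\psi$ with respect to each $n_i$ to zero gives

$$n_i = \frac{W_i S_i}{\sqrt{\lambda c_i}} \qquad (1)$$

and, summing over the strata,

$$n = \frac{1}{\sqrt{\lambda}}\sum_{i=1}^{k}\frac{W_i S_i}{\sqrt{c_i}} \qquad (2)$$

Dividing relation (1) by relation (2) eliminates $\lambda$ and we obtain

$$n_i = n\,\frac{N_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} N_i S_i/\sqrt{c_i}} \qquad (3)$$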

Thus, relation (3) leads to the following important conclusions: in a given
stratum, we have to take a larger sample if

i. The stratum size is larger;
ii. The stratum has larger variability; and
iii. The cost per unit is cheaper in the stratum.

Note:
➢ If the $c_i$ are the same from stratum to stratum, relation (3) will lead to the
Neyman allocation.
➢ Similarly, if the $c_i$ and the $S_i$ do not vary from stratum to stratum, relation (3) will
lead to proportional allocation.
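Relation (3) and its two special cases can be illustrated with a short sketch. The function name and all numerical inputs are hypothetical, and results are left unrounded.

```python
from math import sqrt

def optimum_allocation(n, stratum_sizes, stratum_sds, unit_costs):
    """Relation (3): n_i = n * (N_i S_i / sqrt(c_i)) / sum_j (N_j S_j / sqrt(c_j))."""
    weights = [
        N_i * S_i / sqrt(c_i)
        for N_i, S_i, c_i in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    total = sum(weights)
    return [n * w / total for w in weights]

sizes = [20, 10]
# Unequal costs: the more variable stratum is also four times as expensive.
alloc_cost = optimum_allocation(12, sizes, [2.0, 4.0], [1.0, 4.0])
# Equal costs: relation (3) reduces to Neyman allocation.
alloc_neyman = optimum_allocation(12, sizes, [2.0, 4.0], [1.0, 1.0])
# Equal costs and equal variability: it reduces to proportional allocation.
alloc_prop = optimum_allocation(12, sizes, [3.0, 3.0], [1.0, 1.0])
print(alloc_cost, alloc_neyman, alloc_prop)
```

Comparing the first two calls shows conclusion iii at work: raising the unit cost in the second stratum moves sample away from it.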

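The total sample size n required for estimating the population mean with a specified cost
C is obtained by substituting relation (3) into $C - c = \sum_{i=1}^{k} n_i c_i$ and solving for n:

$$n = \frac{(C - c)\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}}{\sum_{i=1}^{k} W_i S_i \sqrt{c_i}}$$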

If V is fixed, we find

$$n = \frac{\left(\sum_{i=1}^{k} W_i S_i \sqrt{c_i}\right)\left(\sum_{i=1}^{k} W_i S_i/\sqrt{c_i}\right)}{V + \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2}$$

The values of the stratum variances can be obtained from earlier surveys or from a
knowledge of the measurements within each stratum.
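The total sample size under optimum allocation can be sketched as two small functions, one for a fixed budget and one for a fixed variance. The fixed-budget version assumes the whole budget is spent, i.e. $C - c = \sum n_i c_i$ with $n_i$ given by relation (3). The function names and the numerical inputs are hypothetical.

```python
from math import sqrt

def n_for_fixed_cost(C, c, W, S, costs):
    """n = (C - c) * sum(W_i S_i / sqrt(c_i)) / sum(W_i S_i sqrt(c_i))."""
    a = sum(w * s / sqrt(ci) for w, s, ci in zip(W, S, costs))
    b = sum(w * s * sqrt(ci) for w, s, ci in zip(W, S, costs))
    return (C - c) * a / b

def n_for_fixed_variance(V, N, W, S, costs):
    """n = sum(W_i S_i sqrt(c_i)) * sum(W_i S_i / sqrt(c_i)) / (V + sum(W_i S_i^2) / N)."""
    a = sum(w * s / sqrt(ci) for w, s, ci in zip(W, S, costs))
    b = sum(w * s * sqrt(ci) for w, s, ci in zip(W, S, costs))
    return a * b / (V + sum(w * s**2 for w, s in zip(W, S)) / N)

# Hypothetical inputs: N = 30 split into strata of 20 and 10 units.
W = [20 / 30, 10 / 30]
S = [2.0, 4.0]
costs = [1.0, 4.0]
n_cost = n_for_fixed_cost(C=30.0, c=6.0, W=W, S=S, costs=costs)
n_var = n_for_fixed_variance(V=0.4, N=30, W=W, S=S, costs=costs)
print(n_cost, n_var)
```

The two inputs were chosen to be consistent with each other: a budget of 24 for fieldwork buys the same total sample size that is needed to reach a variance of 0.4.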

Example: Consider a population of 30 families living in two city blocks, block A
and block B. Block A consists of 20 families with 120 members and block B consists
of 10 families with 50 members. The table below shows the distribution of family
size in these two blocks. We treat the blocks as strata.

                  Stratum I                            Stratum II
Family #   Family size   Family #   Family size   Family #   Family size
    1           5           11           6            1           3
    2           7           12           9            2           9
    3           3           13           5            3           2
    4           9           14           4            4           6
    5           8           15           8            5          11
    6           6           16           7            6           4
    7           5           17           6            7           8
    8           6           18           4            8           2
    9           6           19           6            9           3
   10           4           20           6           10           2

1. Compute the stratum means and stratum variances.
2. Choose 8 families from stratum I and 4 families from stratum II at
random. Using the sample values,
(i) estimate the average family size, and
(ii) estimate the variance of the sample mean.

Solution:

Stratum 1

$Y_{1j}$    $(Y_{1j} - \bar{Y}_1)^2$
   5          1
   7          1
   3          9
   9          9
   8          4
   6          0
   5          1
   6          0
   6          0
   4          4
   6          0
   9          9
   5          1
   4          4
   8          4
   7          1
   6          0
   4          4
   6          0
   6          0

$\sum_{j=1}^{20} Y_{1j} = 120$, $\sum_{j=1}^{20} (Y_{1j} - \bar{Y}_1)^2 = 52$

Stratum 2

$Y_{2j}$    $(Y_{2j} - \bar{Y}_2)^2$
   3          4
   9         16
   2          9
   6          1
  11         36
   4          1
   8          9
   2          9
   3          4
   2          9

$\sum_{j=1}^{10} Y_{2j} = 50$, $\sum_{j=1}^{10} (Y_{2j} - \bar{Y}_2)^2 = 98$

1. Stratum Mean

Mean of Stratum 1

$$\bar{Y}_1 = \frac{120}{20} = 6$$

Mean of Stratum 2

$$\bar{Y}_2 = \frac{50}{10} = 5$$

Stratum Variance

Variance of Stratum 1

$$S_1^2 = \frac{1}{N_1 - 1}\sum_{j=1}^{20} (y_{1j} - \bar{Y}_1)^2 = \frac{1}{19}\times 52 = 2.74$$

Variance of Stratum 2

$$S_2^2 = \frac{1}{N_2 - 1}\sum_{j=1}^{10} (y_{2j} - \bar{Y}_2)^2 = \frac{1}{9}\times 98 = 10.89$$

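Variance of the stratified sample mean (with $n_1 = 8$ and $n_2 = 4$)

$$Var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{S_i^2}{n_i}$$

$$= \frac{1}{N^2}\left[N_1 (N_1 - n_1)\frac{S_1^2}{n_1} + N_2 (N_2 - n_2)\frac{S_2^2}{n_2}\right]$$

$$= \frac{1}{30^2}\left[20(20 - 8)\frac{2.74}{8} + 10(10 - 4)\frac{10.89}{4}\right] = 0.27$$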
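The hand calculation above can be reproduced with a few lines of code. This is only a verification sketch: the family sizes are copied from the table of the two blocks, the sample sizes are the 8 and 4 stated in the problem, and the variable names are chosen for illustration.

```python
from statistics import mean, variance

# Family sizes in the two strata, copied from the table above.
stratum_1 = [5, 7, 3, 9, 8, 6, 5, 6, 6, 4, 6, 9, 5, 4, 8, 7, 6, 4, 6, 6]
stratum_2 = [3, 9, 2, 6, 11, 4, 8, 2, 3, 2]
strata = [stratum_1, stratum_2]
sample_sizes = [8, 4]

N = sum(len(s) for s in strata)
stratum_means = [mean(s) for s in strata]
# statistics.variance divides by N_i - 1, matching the definition of S_i^2.
stratum_vars = [variance(s) for s in strata]

# Var(y_st) = (1 / N^2) * sum of N_i (N_i - n_i) S_i^2 / n_i
var_y_st = sum(
    len(s) * (len(s) - n_i) * S2 / n_i
    for s, n_i, S2 in zip(strata, sample_sizes, stratum_vars)
) / N**2

print(stratum_means)
print([round(v, 2) for v in stratum_vars])
print(round(var_y_st, 2))
```

The printed values agree with the figures obtained by hand: means 6 and 5, variances 2.74 and 10.89, and a variance of 0.27 for the stratified mean.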

Selected sample:

Stratum I

Selected Sl. #   Value
     13            5
     05            8
     19            6
     06            6
     18            4
     16            7
     15            8
     14            4

Stratum II

Selected Sl. #   Value
      5           11
      4            6
      3            2
      7            4

Sample mean of Stratum 1

$$\bar{y}_1 = \frac{48}{8} = 6.0$$

Sample mean of Stratum 2

$$\bar{y}_2 = \frac{23}{4} = 5.75$$
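Part 2(i) of the problem asks for the estimated average family size. As a short worked step, applying the stratified estimator $\bar{y}_{st} = \frac{1}{N}\sum_{i=1}^{k} N_i \bar{y}_i$ defined earlier to the two sample means above gives

$$\bar{y}_{st} = \frac{1}{30}\left(20 \times 6.0 + 10 \times 5.75\right) = \frac{177.5}{30} \approx 5.92.$$

This weights each stratum mean by its stratum size $N_i$ rather than by its sample size $n_i$.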

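Sample variance of Stratum 1

$$s_1^2 = \frac{1}{n_1 - 1}\sum_{j=1}^{8} (y_{1j} - \bar{y}_1)^2 = \frac{1}{7}\times 18 = 2.57$$

Sample variance of Stratum 2

$$s_2^2 = \frac{1}{n_2 - 1}\sum_{j=1}^{4} (y_{2j} - \bar{y}_2)^2 = \frac{1}{3}\times 44.75 = 14.92$$

Estimated variance of the stratified sample mean

$$var(\bar{y}_{st}) = \frac{1}{N^2}\sum_{i=1}^{k} N_i (N_i - n_i)\frac{s_i^2}{n_i}$$

$$= \frac{1}{30^2}\left[20(20 - 8)\frac{2.57}{8} + 10(10 - 4)\frac{14.92}{4}\right]$$

$$= \frac{1}{900}(77.1 + 223.8)$$

$$= \frac{300.9}{900} = 0.334$$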
