Lecture 6: Cluster Sampling

Situations in which cluster sampling is used

In many practical situations and many types of populations,
➢ a list of elements is not available and
➢ so the use of an element as a sampling unit is not feasible.
The method of cluster sampling or area sampling can be used in such situations.
➢ It is one of the assumptions in any sampling procedure that the population
can be divided into a finite number of distinct and identifiable units, called
sampling units.
➢ The smallest units into which the population can be divided are called
elements of the population.
➢ The groups of such elements are called clusters.

Sampling Procedure
In cluster sampling
➢ Divide the whole population into clusters according to some well-defined rule.
➢ Treat the clusters as sampling units.
➢ Choose a sample of clusters according to some procedure.
➢ Carry out a complete enumeration of the selected clusters, i.e. collect
information on all the elements in the selected clusters.
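The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population dictionary, the cluster labels, and the function name `cluster_sample` are hypothetical, and simple random sampling without replacement is assumed as the selection procedure.

```python
import random

def cluster_sample(clusters, n, seed=None):
    """Select n clusters by SRS without replacement and enumerate them fully.

    `clusters` maps a cluster label to the list of values of its elements.
    (Hypothetical helper for illustration, not from the lecture.)
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n)  # sample cluster labels, not elements
    # Complete enumeration: keep every element of each selected cluster
    return {label: list(clusters[label]) for label in chosen}

# Hypothetical population already divided into 5 clusters
population = {
    "A": [3, 1, 4],
    "B": [1, 5, 9],
    "C": [2, 6, 5],
    "D": [3, 5, 8],
    "E": [9, 7, 9],
}

sample = cluster_sample(population, n=2, seed=1)
for label, values in sample.items():
    print(label, values)
```

Only the cluster labels are needed as a sampling frame here, which is the practical appeal of the method when no list of elements exists.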
Area Sampling
➢ When the entire area containing the population is
✓ subdivided into smaller area segments and
✓ each element in the population is associated with one and only one
such area segment,
the procedure is called area sampling.

Conditions Under Which the Cluster Sampling is Used
Cluster sampling is preferred when
i. No reliable listing of elements is available and it is expensive to prepare one.
ii. Even if the list of elements is available, the location or identification of
the units may be difficult.
iii. A necessary condition for the validity of this procedure is that
✓ every unit of the population under study must correspond to one and
only one unit of the cluster so that
✓ the total number of sampling units in the frame may cover all the
units of the population under study without any omission or duplication.
✓ When this condition is not satisfied, bias is introduced.

Construction of Clusters:
➢ The clusters are constructed such that the sampling units are heterogeneous
within the clusters and homogeneous among the clusters.
➢ This is opposite to the construction of the strata in the stratified sampling.
➢ There are two options to construct the clusters
—equal size and
—unequal size.

Case of Equal Clusters
Layout of NM Population Elements in Clusters

                 Cluster
Element          1       2       3       ...   i       ...   N
1                y_11    y_21    y_31    ...   y_i1    ...   y_N1
2                y_12    y_22    y_32    ...   y_i2    ...   y_N2
3                y_13    y_23    y_33    ...   y_i3    ...   y_N3
...
j                y_1j    y_2j    y_3j    ...   y_ij    ...   y_Nj
...
M                y_1M    y_2M    y_3M    ...   y_iM    ...   y_NM

Cluster total    y_1     y_2     y_3     ...   y_i     ...   y_N
Cluster mean     ȳ_1     ȳ_2     ȳ_3     ...   ȳ_i     ...   ȳ_N

➢ Suppose the population is divided into N clusters and each cluster is of size M.
➢ Select a sample of n clusters from the N clusters by the method of SRS, generally WOR.
We shall consider the case of equal clusters.
➢ Suppose the population consists of N clusters, each of M elements, and that a
sample of n clusters is drawn by the method of simple random sampling.
N = number of clusters in the population
n = number of clusters in the sample
M = number of elements in each cluster
$y_{ij}$ = the value of the characteristic under study for the jth element
(j = 1, 2, ..., M) in the ith cluster (i = 1, 2, ..., N)

$\bar{y}_i = \frac{1}{M}\sum_{j=1}^{M} y_{ij}$ = the mean per element of the ith cluster

$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i$ = the mean of the cluster means in a sample of n clusters

$\bar{Y}_N = \frac{1}{N}\sum_{i=1}^{N} \bar{y}_i$ = the mean of the cluster means in the population

$\bar{Y} = \frac{1}{NM}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}$ = the mean per element in the population

$S_i^2 = \frac{1}{M-1}\sum_{j=1}^{M} (y_{ij} - \bar{y}_i)^2$ = the mean square between elements
within the ith cluster (i = 1, 2, ..., N)

$S_w^2 = \frac{1}{N}\sum_{i=1}^{N} S_i^2$ = the mean square within clusters (w for within)

$S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N} (\bar{y}_i - \bar{Y}_N)^2$ = the mean square between cluster means
in the population (b for between)

$S^2 = \frac{1}{NM-1}\sum_{i=1}^{N}\sum_{j=1}^{M} (y_{ij} - \bar{Y})^2$ = the mean square between elements
in the population

$\rho = \frac{\sum_{i=1}^{N}\sum_{j \neq k} (y_{ij} - \bar{Y})(y_{ik} - \bar{Y})}{(M-1)(NM-1)S^2}$ = the intracluster correlation coefficient
between elements within clusters.
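As a rough numerical check of the definitions above, the following sketch computes each quantity for a small made-up population. The data (N = 3 clusters of M = 4 elements) are hypothetical, and the sum in the numerator of ρ is taken over ordered pairs j ≠ k within each cluster, which is the convention assumed here.

```python
from itertools import permutations

# Hypothetical population: N = 3 clusters, each with M = 4 elements
clusters = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 5, 5, 7]]
N, M = len(clusters), len(clusters[0])

cluster_means = [sum(c) / M for c in clusters]           # y-bar_i
Y_bar_N = sum(cluster_means) / N                         # mean of cluster means
Y_bar = sum(sum(c) for c in clusters) / (N * M)          # mean per element

# Mean squares: within each cluster, within overall, between, and total
S_i2 = [sum((y - m) ** 2 for y in c) / (M - 1)
        for c, m in zip(clusters, cluster_means)]
S_w2 = sum(S_i2) / N
S_b2 = sum((m - Y_bar_N) ** 2 for m in cluster_means) / (N - 1)
S2 = sum((y - Y_bar) ** 2 for c in clusters for y in c) / (N * M - 1)

# Intracluster correlation: cross products over ordered pairs j != k
cross = sum((a - Y_bar) * (b - Y_bar)
            for c in clusters
            for a, b in permutations(c, 2))
rho = cross / ((M - 1) * (N * M - 1) * S2)

print("Y_bar =", Y_bar, " Y_bar_N =", Y_bar_N)
print("S_w2 =", S_w2, " S_b2 =", S_b2, " S2 =", S2)
print("rho =", rho)
```

With equal cluster sizes the mean of the cluster means and the mean per element coincide, which the first printed line illustrates.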

Equal Cluster Sampling: Estimation of Population Mean and its Variance

In simple random sampling without replacement (WOR) of n clusters, each containing
M elements, from a population of N clusters, the sample mean $\bar{y}_n$ is an unbiased
estimator of $\bar{Y}$ and its variance is given by

$$V(\bar{y}_n) = \frac{N-n}{Nn} S_b^2 = \frac{1-f}{n} S_b^2, \qquad f = \frac{n}{N}.$$

Estimate of variance
The estimate of the variance in the WOR case is

$$\hat{V}(\bar{y}_n) = \frac{N-n}{Nn} s_b^2 = \frac{1-f}{n} s_b^2,$$

where $s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y}_n)^2$ is the mean sum of squares between cluster means
in the sample.

Example: A social researcher wishes to estimate the average number of male
children in a given community. For this purpose, he prepared a list of 400
geographical clusters of 10 households each and a simple random sample of 4
clusters was selected. The relevant data appear in the accompanying table. Estimate
the average number of male children per household for the community and hence
obtain an estimate of the variance.
Table: Number of male children in each of the 10 households of the 4 sampled clusters.

Household        Cluster 1   Cluster 2   Cluster 3   Cluster 4
1                1           1           2           1
2                2           3           1           1
3                1           2           1           3
4                3           2           1           2
5                3           3           1           1
6                2           1           3           5
7                1           4           2           1
8                4           1           1           2
9                1           1           3           3
10               1           2           1           1
Total            19          20          16          20
Mean ȳ_i         1.9         2.0         1.6         2.0

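Solution: Here n = 4, M = 10, N = 400.

Hence an estimate of the average number of male children per household in the
community is

$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i = \frac{1.9 + 2.0 + 1.6 + 2.0}{4} = \frac{7.5}{4} = 1.875.$$

The estimated variance is

$$\widehat{Var}(\bar{y}_n) = \frac{N-n}{Nn} s_b^2, \quad \text{where} \quad s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{y}_n)^2.$$

Now

$$s_b^2 = \frac{1}{3}\left[(1.9 - 1.875)^2 + (2.0 - 1.875)^2 + (1.6 - 1.875)^2 + (2.0 - 1.875)^2\right] = \frac{0.1075}{3} = 0.036.$$

Hence

$$\widehat{Var}(\bar{y}_n) = \frac{N-n}{Nn} s_b^2 = \frac{400 - 4}{400 \times 4} \times 0.036 = 0.0089.$$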
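The arithmetic of this example can be reproduced with a short script. The data are taken from the table above; the variable names are chosen here for illustration only.

```python
# Male children per household (rows: households 1-10; columns: clusters 1-4)
table = [
    [1, 1, 2, 1],
    [2, 3, 1, 1],
    [1, 2, 1, 3],
    [3, 2, 1, 2],
    [3, 3, 1, 1],
    [2, 1, 3, 5],
    [1, 4, 2, 1],
    [4, 1, 1, 2],
    [1, 1, 3, 3],
    [1, 2, 1, 1],
]
N, n, M = 400, 4, 10

# Column sums give the cluster totals; dividing by M gives the cluster means
cluster_totals = [sum(col) for col in zip(*table)]
cluster_means = [t / M for t in cluster_totals]

# Estimate of the population mean: mean of the sampled cluster means
y_bar_n = sum(cluster_means) / n

# Mean sum of squares between cluster means in the sample
s_b2 = sum((m - y_bar_n) ** 2 for m in cluster_means) / (n - 1)

# Estimated variance under SRS without replacement
var_hat = (N - n) / (N * n) * s_b2

print(cluster_totals)                                         # [19, 20, 16, 20]
print(round(y_bar_n, 3), round(s_b2, 3), round(var_hat, 4))   # 1.875 0.036 0.0089
```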

Relative Efficiency of Cluster Sampling
In sampling nM elements from the population by simple random sampling, the
variance of the sample mean $\bar{y}$ is given by

$$V(\bar{y}) = \frac{(1-f) S^2}{nM}.$$

Thus, the relative efficiency of cluster sampling compared with simple random
sampling is given by

$$\text{Relative efficiency} = \frac{V_{SR}(\bar{y})}{V_C(\bar{y}_n)} = \frac{S^2}{M S_b^2}.$$

This shows that the efficiency of cluster sampling increases as the mean square
between clusters decreases.
Also, $(N-1) M S_b^2 = (NM-1) S^2 - N(M-1) S_w^2$.
Therefore, for a given $S^2$, relative efficiency will increase with an increase in the
mean square within clusters. These results suggest that the clusters should be formed
so that variation within clusters is maximum while variation between clusters is minimum.
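To see this effect numerically, the sketch below groups the same twelve hypothetical values into clusters in two different ways and computes the relative efficiency S²/(M S_b²) for each grouping. The data and the helper names `mean_squares` and `relative_efficiency` are assumptions made for illustration.

```python
def mean_squares(clusters):
    """Return (S2, S_w2, S_b2) for a population of equal-sized clusters."""
    N, M = len(clusters), len(clusters[0])
    means = [sum(c) / M for c in clusters]
    Y_bar = sum(means) / N  # equals the mean per element when sizes are equal
    S2 = sum((y - Y_bar) ** 2 for c in clusters for y in c) / (N * M - 1)
    S_w2 = sum(sum((y - m) ** 2 for y in c) / (M - 1)
               for c, m in zip(clusters, means)) / N
    S_b2 = sum((m - Y_bar) ** 2 for m in means) / (N - 1)
    return S2, S_w2, S_b2

def relative_efficiency(clusters):
    """Relative efficiency of cluster sampling versus SRS of nM elements."""
    M = len(clusters[0])
    S2, _, S_b2 = mean_squares(clusters)
    return S2 / (M * S_b2)

# The same values 1..12, grouped two ways (hypothetical data)
alike_within = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # homogeneous clusters
mixed_within = [[1, 6, 8, 11], [2, 4, 9, 12], [3, 5, 7, 10]]   # heterogeneous clusters

for name, pop in [("alike within", alike_within), ("mixed within", mixed_within)]:
    N, M = len(pop), len(pop[0])
    S2, S_w2, S_b2 = mean_squares(pop)
    # Check the identity (N-1) M S_b^2 = (NM-1) S^2 - N (M-1) S_w^2
    lhs = (N - 1) * M * S_b2
    rhs = (N * M - 1) * S2 - N * (M - 1) * S_w2
    print(name, "RE =", relative_efficiency(pop), "identity holds:", abs(lhs - rhs) < 1e-9)
```

S² is the same for both groupings because the elements are the same, so the difference in relative efficiency comes entirely from how the variation is split between and within clusters.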
Case of Unequal Clusters
➢ In practice, clusters of equal size are available only when they are planned.
➢ For example, in a screw manufacturing company, the packets of screws can
be prepared such that every packet contains the same number of screws.
➢ In real applications, it is hard to get clusters of equal size.
✓ For example, villages with equal areas are difficult to find, districts
with the same number of persons are difficult to find, and the number
of members in a household may not be the same in each household in a
given area.
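Let there be N clusters and let $M_i$ be the size of the ith cluster. Let

$M_0 = \sum_{i=1}^{N} M_i$

$\bar{M} = \frac{1}{N}\sum_{i=1}^{N} M_i = \frac{M_0}{N}$

$\bar{y}_i = \frac{1}{M_i}\sum_{j=1}^{M_i} y_{ij}$ : mean of the ith cluster

$\bar{Y} = \frac{1}{M_0}\sum_{i=1}^{N}\sum_{j=1}^{M_i} y_{ij} = \sum_{i=1}^{N} \frac{M_i}{M_0}\,\bar{y}_i = \frac{1}{N}\sum_{i=1}^{N} \frac{M_i}{\bar{M}}\,\bar{y}_i$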

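Mean of Cluster Means:
Consider the simple arithmetic mean of the cluster means as

$$\bar{\bar{y}}_c = \frac{1}{n}\sum_{i=1}^{n} \bar{y}_i.$$

An estimate of its variance is

$$\widehat{Var}(\bar{\bar{y}}_c) = \frac{N-n}{Nn} s_b^2,$$

where $s_b^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{y}_i - \bar{\bar{y}}_c)^2$.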
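A minimal sketch of the unequal-cluster quantities follows. The population of five clusters is hypothetical, and the sample of n = 3 clusters is drawn by SRS without replacement with a fixed seed, which is an assumption made only so that the run is repeatable.

```python
import random

# Hypothetical population of N = 5 clusters with unequal sizes M_i
population = [[2, 4], [1, 3, 5, 7], [6], [2, 2, 5], [4, 8, 9, 3, 1]]
N = len(population)
sizes = [len(c) for c in population]            # M_i
M0 = sum(sizes)                                 # total number of elements
M_bar = M0 / N                                  # average cluster size
cluster_means = [sum(c) / len(c) for c in population]

# Population mean per element, computed directly and via size-weighted cluster means
Y_bar = sum(sum(c) for c in population) / M0
Y_bar_weighted = sum(Mi / M_bar * yi for Mi, yi in zip(sizes, cluster_means)) / N

# Mean of cluster means from a sample of n clusters, with its variance estimate
n = 3
rng = random.Random(0)
sample_means = rng.sample(cluster_means, n)     # SRS without replacement
y_c = sum(sample_means) / n
s_b2 = sum((m - y_c) ** 2 for m in sample_means) / (n - 1)
var_hat = (N - n) / (N * n) * s_b2

print("Y_bar =", Y_bar, " weighted form =", Y_bar_weighted)
print("mean of cluster means =", y_c, " estimated variance =", var_hat)
```

The two expressions for the population mean agree, while the unweighted mean of cluster means ignores the sizes M_i and so generally differs from the population mean per element when the sizes are unequal.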
