Sampling Unit 8
Systematic sampling is a more commonly used selection procedure. Systematic sampling means
the selection of units at a fixed interval from a list, starting from a randomly determined point. In
other words, it involves selecting units from a list using a selection interval, say k, so that every
kth element on the list, following a random start selected between 1 and k, is included in the
sample. Systematic sampling is normally carried out as follows, depending on the case.
For example, consider a farm survey in which n = 8,000 farms are to be selected using
systematic sampling from N = 144,000 farms. The selection interval is
$k = N/n = 144{,}000/8{,}000 = 18$.
We must find a random start number between 1 and 18 inclusive, and then take every 18th unit
thereafter. If the random start number between 1 and 18 is 12, then the units in the sample would
correspond to the farms numbered 12, 30, 48, 66, 84, ..., 143994. Once the sampling interval (k)
is determined, the random selection of the starting point determines the whole sample. In this
case there are 18 possible samples that can be chosen, one for each start j = 1, 2, ..., 18.
For this design the sampling fraction is
$$ f = \frac{1}{k} = \frac{1}{N/n} = \frac{n}{N} = \frac{1}{18}. $$
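The farm example can be reproduced with a minimal Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n so that k is an integer; in practice the start would be drawn at random, for example with `random.randint(1, k)`.

```python
def systematic_sample(N, n, start):
    """Return the 1-based unit numbers of a 1-in-k systematic sample.

    Assumes N is an exact multiple of n, so that k = N / n is an integer.
    """
    if N % n != 0:
        raise ValueError("this sketch assumes N is a multiple of n")
    k = N // n
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k inclusive")
    # Every k-th unit, beginning at the chosen start.
    return list(range(start, N + 1, k))


farms = systematic_sample(N=144_000, n=8_000, start=12)
print(farms[:5], farms[-1], len(farms))
```

With start 12 the first five selected farms are 12, 30, 48, 66, 84, and the last is 143994, as in the text.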
8.1.2 When the interval k = N/n is not an integer
When N/n is not an integer and k is taken as the nearest integer, there is a method of taking a
systematic sample of 1 in k elements that will always lead to unbiased estimates of population
means, totals, and proportions. This method requires prior knowledge of the population size N.
The procedure of this method is as follows:
1. Choose a random number between 1 and N, where N is the number of elements in the
population. Let this number be j, where j = 1, 2, ..., N.
2. Compute the quotient j/k, where k is the sampling interval rounded to the nearest integer.
Express this quotient as an integer plus a remainder m.
3. Take the remainder m as the starting point of the 1-in-k systematic sample; if m = 0, start
with the kth element.
Example: Suppose we wish to take a 1-in-6 sample from a list of 26 employees. Since k is
determined (k = 6), the required sample size is $n = N/k = 26/6 = 4\tfrac{2}{6} \approx 4$.
We need to select 4 employees using systematic sampling with interval k = 6.
Instead of taking a random number between 1 and 6 to start the systematic sampling, we must
take a random number j between 1 and 26, and j will have 26 possible values.
Let j be 8. We then divide 8 by 6 and determine the remainder, i.e., $j/k = 8/6 = 1\tfrac{2}{6}$, so the
remainder is 2 (m = 2). We take a systematic sample 1 in 6, starting with the 2nd employee, and
obtain 2, 8, 14, 20 on the list.
When j = 24, the quotient is $j/k = 24/6 = 4$, and the remainder is m = 0. In this case we take a
systematic sample 1 in 6, starting with the kth employee (k = 6). Then we obtain 6, 12, 18, 24 on the list.
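The remainder rule used in this example can be sketched in Python as follows. The function names `remainder_start` and `remainder_sample` are hypothetical, and the sketch lists n units from the start, as the worked example does.

```python
def remainder_start(j, k):
    """Map a random number j (drawn from 1..N) to a start between 1 and k.

    The start is the remainder of j / k; a remainder of 0 means start at k.
    """
    m = j % k
    return m if m != 0 else k


def remainder_sample(j, k, n):
    """List n units of a 1-in-k sample, beginning at the remainder start."""
    start = remainder_start(j, k)
    return [start + i * k for i in range(n)]


print(remainder_sample(j=8, k=6, n=4))   # start is 2
print(remainder_sample(j=24, k=6, n=4))  # remainder 0, so start is 6
```

The two calls reproduce the selections 2, 8, 14, 20 and 6, 12, 18, 24 from the example.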
This method can be used whether or not N is a multiple of n. It consists of choosing a random start
between 1 and N and thereafter selecting every kth unit in a cyclical manner until a
sample of n units is obtained. The interval k is computed as N/n, rounded to the nearest integer if necessary.
Generally, if j is a random start, $1 \le j \le N$, the units with the following serial numbers will be
selected into the sample, for i = 0, 1, ..., n - 1:
$$ j + ik \ \text{ if } j + ik \le N, \qquad j + ik - N \ \text{ if } j + ik > N. $$
Example: For illustration, let the population size be N = 14 and the sample size n = 5. The
value of the interval is $k = N/n = 14/5 = 2\tfrac{4}{5} \approx 3$.
We choose a random start j between 1 and 14. Let j = 7. Then a systematic sample 1 in 3 would
produce a sample with the following serial numbers.
For i = 0, 1, 2, 3, 4 and j = 7, we obtain the following numbers using either j + ik or j + ik - N.
7 + 0 × 3 = 7, if i = 0
7 + 1 × 3 = 10, if i = 1
7 + 2 × 3 = 13, if i = 2
7 + 3 × 3 - 14 = 2, if i = 3
7 + 4 × 3 - 14 = 5, if i = 4
These numbers are located on the circle to show cyclical arrangement.
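The cyclical selection can be sketched in a few lines of Python. The function name `circular_systematic_sample` is hypothetical, and the interval k is passed in explicitly so that the rounding of N/n stays under the caller's control.

```python
def circular_systematic_sample(N, n, j, k):
    """Select n serial numbers from 1..N, every k-th unit, wrapping past N.

    j is the random start (1 <= j <= N) and k is the rounded interval.
    """
    if not 1 <= j <= N:
        raise ValueError("the random start j must lie between 1 and N")
    sample = []
    for i in range(n):
        position = j + i * k
        # Wrap around the list: equivalent to subtracting N when position > N.
        sample.append((position - 1) % N + 1)
    return sample


print(circular_systematic_sample(N=14, n=5, j=7, k=3))
```

For N = 14, n = 5, k = 3 and j = 7 this gives the serial numbers 7, 10, 13, 2, 5 computed above.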
8.2 The Estimated Mean and Its Variance
If the units are listed randomly, systematic sampling is similar to simple random sampling. But if the
units are listed in order by a specific attribute such as age group or income group, then it is similar to
stratified sampling. Such ordering reduces the sampling error relative to a randomly
ordered list by ensuring that each group is represented in the sample in a proportion equal to its
proportion in the underlying population.
Several formulas have been developed for the variance of $\bar{y}_{sy}$, the mean of a systematic sample.
However, a systematic sample can be treated just like a simple random sample for the purpose
of analysis, provided that certain conditions are met.
The basic requirement is that the list used as the sampling frame must not have any
intrinsic regularity or periodicity of its own.
Consider all k possible 1-in-k systematic samples.
Let $Y_{jr}$ denote the value of the rth element of the jth
systematic sample (r = 1, 2, …, n; j = 1, 2, …, k), with N = nk.
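The mean of the jth sample and the population mean are
$$ \bar{y}_j = \frac{1}{n}\sum_{r=1}^{n} y_{jr}, \qquad \bar{Y} = \frac{1}{N}\sum_{j=1}^{k}\sum_{r=1}^{n} Y_{jr} = \frac{1}{k}\sum_{j=1}^{k} \bar{y}_j, $$
so $\bar{Y}$ is the mean of the sample means. The population variance is
$$ S^2 = \frac{\sum_{j=1}^{k}\sum_{r=1}^{n} (Y_{jr} - \bar{Y})^2}{N - 1} = \frac{\sum_{j=1}^{k}\sum_{r=1}^{n} (Y_{jr} - \bar{Y})^2}{nk - 1}. $$
Variance of the mean:
$$ S_a^2 = \frac{\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2}{k - 1}. $$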
As in cluster sampling, the parameter $\rho_w$ is the correlation coefficient between pairs of units that
are in the same systematic sample. The variance of the means in terms of $\rho_w$ is
$$ S_a^2 = \frac{S^2 (N - 1)\,[\rho_w (n - 1) + 1]}{n^2 (k - 1)}. $$
In systematic sampling, the estimated mean, denoted by $\bar{y}_{sy}$, is
simply one of the $\bar{y}_j$'s, depending on which random number was chosen to start sampling.
Therefore,
$$ \bar{y}_{sy} = \frac{1}{n}\sum_{r=1}^{n} y_{jr} = \bar{y}_j. $$
Theorem 8.1: Let $y_{jr}$ denote the rth element of the jth systematic sample with N = nk, where j =
1, 2, …, k; r = 1, 2, …, n. Then the mean of a systematic sample, $\bar{y}_{sy}$, is an unbiased estimate of
$\bar{Y}$, and its variance is
$$ V(\bar{y}_{sy}) = \frac{1}{k}\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2 = \frac{k - 1}{k} S_a^2, \quad \text{where } S_a^2 = \frac{\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2}{k - 1}. $$
Prove this theorem.
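Theorem 8.1 can be checked numerically by enumerating all k possible systematic samples of a small population. The sketch below uses a hypothetical population of N = 12 values (not data from the text) and exact fractions so that the equalities hold without rounding error.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


# Hypothetical population in list order, N = 12, so k = 3 gives n = 4.
y = [3, 7, 4, 9, 6, 8, 5, 10, 7, 12, 9, 11]
k = 3

# The j-th systematic sample takes every k-th unit starting at position j.
samples = [y[j::k] for j in range(k)]
sample_means = [mean(s) for s in samples]

population_mean = mean(y)
mean_of_means = mean(sample_means)          # expected value of the estimator

squared_devs = sum((m - population_mean) ** 2 for m in sample_means)
S_a2 = squared_devs / (k - 1)
V_sy = squared_devs / k                     # variance over the k equally likely samples

print(mean_of_means == population_mean)     # unbiasedness
print(V_sy == Fraction(k - 1, k) * S_a2)    # variance identity
```

Both comparisons hold for any population whose size is a multiple of k, which is what the theorem asserts; the enumeration is only an illustration, not a proof.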
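Corollary: In terms of $\rho_w$, the variance of the mean of a systematic sample is
$$ V(\bar{y}_{sy}) = \frac{k - 1}{k} S_a^2 = \frac{N - 1}{N}\cdot\frac{S^2}{n}\,[\rho_w (n - 1) + 1] \approx \frac{S^2}{n}\,[\rho_w (n - 1) + 1], \quad \text{for large } N. $$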
The mean of a systematic sample is more precise than the mean of a SRS when $\rho_w$ is small and
negative. Show that $\rho_w < -\frac{1}{N - 1}$ if $V(\bar{y}_{sy}) < V(\bar{y})$.
If $\rho_w < -\frac{1}{N - 1}$, then $\bar{y}_{sy}$ is more precise. This can occur when the population is
ordered with respect to the study variable.
If $\rho_w = -\frac{1}{N - 1}$, or $\rho_w = 0$ for large N, then they are equally efficient. This can occur when
the population's elements are arranged in random order with respect to the variable under
study.
If $\rho_w > -\frac{1}{N - 1}$, or $\rho_w > 0$ for large N, then $\bar{y}_{sy}$ is less efficient. This can happen when the
population's elements are arranged in a list demonstrating a high degree of periodicity.
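The comparison with simple random sampling can be illustrated numerically. The sketch below assumes the usual definition of the within-sample correlation as the average product of deviations for pairs of units in the same systematic sample, divided by the average squared deviation; the function name `sy_versus_srs` and the ordered population 1, 2, ..., 12 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def sy_versus_srs(y, k):
    """Return (rho_w, V_sy, V_srs, S2) for a population y with N = n * k."""
    N = len(y)
    n = N // k
    Y_bar = Fraction(sum(y), N)
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    # Sum of cross-products over all pairs inside each systematic sample.
    cross = sum((a - Y_bar) * (b - Y_bar)
                for j in range(k)
                for a, b in combinations(y[j::k], 2))
    rho_w = 2 * cross / ((n - 1) * (N - 1) * S2)
    means = [Fraction(sum(y[j::k]), n) for j in range(k)]
    V_sy = sum((m - Y_bar) ** 2 for m in means) / k
    V_srs = Fraction(N - n, N) * S2 / n
    return rho_w, V_sy, V_srs, S2


y = list(range(1, 13))   # population ordered by the study variable
k, N, n = 3, 12, 4
rho_w, V_sy, V_srs, S2 = sy_versus_srs(y, k)

# Corollary: V_sy = (N-1)/N * S^2/n * [1 + (n-1) rho_w]
corollary = Fraction(N - 1, N) * S2 / n * (1 + (n - 1) * rho_w)
print(rho_w, V_sy, V_srs, corollary == V_sy)
```

For this ordered population the correlation falls below $-1/(N-1)$ and the systematic mean has the smaller variance, in line with the first case above.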
Consider a case where the population is in “random order”. That is, there is no trend or
stratification in the study variable, and no correlation between neighboring values. In this case
systematic sampling is equivalent to simple random sampling, $E(V_{sy}) = V_{ran}$. Then the sample
variance, for sample j and as in srs, is
$$ v(\bar{y}_{sy}) = \frac{1 - f}{n}\, s_{y_j}^2, \quad \text{where } s_{y_j}^2 = \frac{\sum_{r=1}^{n} (y_{jr} - \bar{y}_j)^2}{n - 1}. $$
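Under the random-order assumption, the variance estimate is computed from the single selected sample exactly as in simple random sampling. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def srs_style_variance(sample, N):
    """Estimate v(y_bar_sy) = (1 - f) / n * s^2 from one systematic sample.

    Valid only under the assumption that the list is in random order.
    """
    n = len(sample)
    y_bar = sum(sample) / n
    s2 = sum((v - y_bar) ** 2 for v in sample) / (n - 1)
    f = n / N                      # sampling fraction
    return (1 - f) * s2 / n


# Hypothetical sample of n = 4 values drawn from a population of N = 20.
v = srs_style_variance([2, 4, 6, 8], N=20)
print(v)
```

If the list is actually ordered or periodic, this estimate can overstate or understate the true variance, which is why the random-order condition matters.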
Suppose that the required sample of size n is selected from N in the form of two or more (say m)
systematic sub-samples of the same size with independent random starts. Assume that these m
sub-samples, each of size $n' = n/m$, are to be selected from the population of N units. Also let
N/n = k. Then select m random starts from 1 to $k'$, one for each of the m sub-samples, using srs
with replacement (wr) or without replacement (wor), where $k' = mk$.
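Let $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_m$ be the estimators of the population mean based on the m sub-samples, each
of size $n' = n/m$, where $\bar{y}_i = \frac{1}{n'}\sum_{j=1}^{n'} y_{ij}$, i = 1, 2, ..., m. Then:
i) $\bar{y}_{sy} = \frac{1}{m}\sum_{i=1}^{m} \bar{y}_i$ is an unbiased estimator of the population mean $\bar{Y}$.
ii) $V(\bar{y}_{sy}) = \frac{k - 1}{km(km - 1)}\sum_{i=1}^{k'} (\bar{y}_i - \bar{Y})^2$, where the sum runs over all $k' = mk$ possible
sub-samples, with its unbiased estimator given by
$$ v(\bar{y}_{sy}) = \frac{k - 1}{km(m - 1)}\sum_{i=1}^{m} (\bar{y}_i - \bar{y}_{sy})^2. $$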
Example:
A dairy research institute is interested in estimating the total milk yield of a cow in connection
with a breeding program. The total lactation period was taken as 300 days. It was decided to
select 3 systematic sub-samples without replacement, each of size 10 days, so as to arrive at a total
sample size of 30 days. Milk yield recorded for the selected days is given below. The random
starts used for the sub-samples were 1, 12, and 26.
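Estimate the total milk yield (in liters) and estimate its variance.
N = 300, n = 30, m = 3, k = N/n = 300/30 = 10.
Select $n' = n/m = 30/3 = 10$ days from 300 with an interval of $k' = mk = 3 \times 10 = 30$.
$$ \hat{Y}_{sy} = N \bar{y}_{sy} = \frac{N}{m}\sum_{i=1}^{m} \bar{y}_i = \frac{300}{3}\,(10.055 + 9.755 + 9.680) = 2949 \text{ liters} $$
$$ v(\hat{Y}_{sy}) = \frac{N^2 (k - 1)}{km(m - 1)}\sum_{i=1}^{m} (\bar{y}_i - \bar{y}_{sy})^2 $$
$$ v(\hat{Y}_{sy}) = \frac{(300)^2 (10 - 1)}{10 \times 3 \times 2}\left[(10.055 - 9.83)^2 + (9.755 - 9.83)^2 + (9.68 - 9.83)^2\right] = 1063.125 $$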
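The arithmetic of the milk-yield example can be verified with a short Python sketch. It uses only the three sub-sample means quoted in the calculation; the function name `subsample_total_and_variance` is hypothetical.

```python
def subsample_total_and_variance(sub_means, N, k):
    """Estimate the population total and its variance from m sub-sample means.

    Uses Y_hat = N * y_bar_sy and
    v(Y_hat) = N^2 (k - 1) / (k m (m - 1)) * sum (y_bar_i - y_bar_sy)^2.
    """
    m = len(sub_means)
    y_bar_sy = sum(sub_means) / m
    total = N * y_bar_sy
    ss = sum((y_i - y_bar_sy) ** 2 for y_i in sub_means)
    v_total = N ** 2 * (k - 1) / (k * m * (m - 1)) * ss
    return total, v_total


total, v_total = subsample_total_and_variance([10.055, 9.755, 9.680], N=300, k=10)
print(round(total, 3), round(v_total, 3))
```

Up to floating-point rounding, this returns the total of 2949 liters and the variance estimate 1063.125 obtained above.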
a) Linear Trend
Suppose a population has a linear trend that can be expressed in the form
$y_i = i$. That is, $y_1 = 1, y_2 = 2, y_3 = 3, \ldots, y_i = i, \ldots, y_N = N$, for a population of size N.
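$$ \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = \frac{N(N + 1)}{2}, \qquad \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = \frac{N(N + 1)(2N + 1)}{6} $$
$$ V(y) = S^2 = \frac{1}{N - 1}\left[\sum_{i=1}^{N} y_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} y_i\right)^2\right] = \frac{1}{N - 1}\left[\frac{N(N + 1)(2N + 1)}{6} - \frac{1}{N}\left(\frac{N(N + 1)}{2}\right)^2\right] $$
$$ S^2 = \frac{N(N + 1)}{12} $$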
Substituting this value into $V(\bar{y})$ gives
$$ V(\bar{y}) = \frac{S^2}{n}\cdot\frac{N - n}{N} = \frac{(k - 1)(N + 1)}{12} = \frac{(k - 1)(nk + 1)}{12} $$
Similarly, the means of the k possible systematic samples, $\bar{y}_j$, are in increasing order and
one unit apart from each other. The exact means are $\bar{y}_j = j + (n - 1)k/2$, but a constant common to
all of them does not change their variance, so they can be taken as $\bar{y}_1 = 1, \bar{y}_2 = 2, \ldots, \bar{y}_j = j$,
for j = 1, 2, ..., k.
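$$ V(\bar{y}_{sy}) = \frac{1}{k}\sum_{j=1}^{k} (\bar{y}_j - \bar{Y})^2 $$
$$ V(\bar{y}_{sy}) = \frac{1}{k}\left[\sum_{j=1}^{k} \bar{y}_j^2 - \frac{1}{k}\left(\sum_{j=1}^{k} \bar{y}_j\right)^2\right] = \frac{1}{k}\left[\frac{k(k + 1)(2k + 1)}{6} - \frac{1}{k}\left(\frac{k(k + 1)}{2}\right)^2\right] = \frac{k^2 - 1}{12} $$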
$V(\bar{y}_{sy}) \le V(\bar{y})$ if and only if $\frac{k^2 - 1}{12} \le \frac{(k - 1)(nk + 1)}{12}$, that is, $k + 1 \le nk + 1$. If n = 1, they are
equal; otherwise the systematic sample is much more effective.
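The two closed-form variances for the linear-trend population can be checked by brute force. The sketch below builds the population 1, 2, ..., N, enumerates the k systematic samples, and compares the results with $(k^2 - 1)/12$ and $(k - 1)(nk + 1)/12$; the function name and the choice n = 4, k = 5 are illustrative assumptions.

```python
from fractions import Fraction


def linear_trend_variances(n, k):
    """Return (V_sy, V_srs) for the population y_i = i, i = 1..N, N = n * k."""
    N = n * k
    y = list(range(1, N + 1))
    Y_bar = Fraction(sum(y), N)
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    V_srs = Fraction(N - n, N) * S2 / n
    # Means of the k possible systematic samples (exact, including the shift).
    means = [Fraction(sum(y[j::k]), n) for j in range(k)]
    V_sy = sum((m - Y_bar) ** 2 for m in means) / k
    return V_sy, V_srs


n, k = 4, 5
V_sy, V_srs = linear_trend_variances(n, k)
print(V_sy, Fraction(k * k - 1, 12))
print(V_srs, Fraction((k - 1) * (n * k + 1), 12))
```

The enumeration uses the exact sample means, so it also confirms that the common shift in the means has no effect on the variance.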
b) Periodicity:
This shows that all students with similar performance are selected into one sample, producing large
differences between the sample means. In this situation $\rho_w$ is very high, which implies low
precision; it can be calculated if grades are converted to numeric values. To overcome this
periodicity problem, arrange the grades in increasing or decreasing order before sample
selection is carried out.
Systematic sampling has the following advantages over simple random sampling.
It is easy and simple to apply; the sample is easier to draw since only one random number is
required; costs may be lower per unit to sample.
From an operational point of view, it is a convenient method for the selection of sampling units
in the field and can easily be taught to individuals who have little training in survey
methodology. It also allows easy checking of the selection.
Systematic sampling tends to distribute the sample more evenly over the listed
population.
Higher precision is often associated with systematic sampling, especially if the arrangement
of the units in the list is related to the characteristics of interest. It has an effect similar to
stratification and is often called “implicit stratification”.
It provides more efficient estimates of population mean or total in comparison to simple
random sample for populations with linear trend.
Disadvantage:
A serious disadvantage of this method lies in its use with populations having unforeseen
periodicity, which may substantially contribute to the bias in the estimate of the mean or total. In
this case $\rho_w > 0$, and for large N, $V(\bar{y}_{sy}) > V(\bar{y}_{SRS})$.
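The effect of periodicity can be seen in a small constructed case. The sketch below uses a hypothetical population whose values repeat with a period equal to the sampling interval k, so every systematic sample contains identical values; the function name `compare_variances` and the data are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def compare_variances(y, k):
    """Return (V_sy, V_srs) for a population y whose size is a multiple of k."""
    N = len(y)
    n = N // k
    Y_bar = Fraction(sum(y), N)
    S2 = sum((v - Y_bar) ** 2 for v in y) / (N - 1)
    V_srs = Fraction(N - n, N) * S2 / n
    means = [Fraction(sum(y[j::k]), n) for j in range(k)]
    V_sy = sum((m - Y_bar) ** 2 for m in means) / k
    return V_sy, V_srs


# Hypothetical periodic list: the pattern 1, 2, 3, 4 repeated five times.
y = [1, 2, 3, 4] * 5
V_sy, V_srs = compare_variances(y, k=4)   # interval equals the period
print(V_sy, V_srs, V_sy > V_srs)
```

Here each systematic sample consists of a single repeated value, so the sample means differ as much as the population values themselves and the systematic mean is far less precise than the SRS mean.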